Compare commits

...

12 Commits

Author SHA1 Message Date
Zamil Majdy
e51c287ae4 test: PR #12841 native E2E v3 — link rendering + click-through 2026-04-18 18:45:54 +07:00
Zamil Majdy
58ce293ec0 test: PR #12841 native UI proof screenshots (v2) 2026-04-18 15:51:06 +07:00
Zamil Majdy
88b515c191 test: add real-feature E2E screenshots for PR #12841 2026-04-18 13:03:50 +07:00
Zamil Majdy
39f04b8990 test: add E2E screenshots for PR #12841 2026-04-18 09:01:08 +07:00
Zamil Majdy
b1b45e57e2 chore(skill/pr-test): add native-mode option, keep docker as fallback
Running `poetry run app` + `pnpm dev` against infra-only docker is 3–8
minutes faster per iteration than rebuilding the full compose stack on
every backend change. Document it as the preferred path for iterative
PR testing and keep the existing docker-compose path as an explicit
fallback for Dockerfile/compose-level changes or production-parity runs.
2026-04-18 08:37:37 +07:00
Zamil Majdy
6b199d2b9c fix(backend/copilot): check_background_tool — auth, dry-run, list=true support
Self-review findings from a fresh pass — all 🟠 Should Fix.

- requires_auth=True on CheckBackgroundToolTool for consistency with
  other stateful tools (run_agent, run_block, continue_run_block).
- Dry-run guard on cancel=true: return a simulated 'cancelled' status
  without actually calling task.cancel(), matching run_block /
  run_mcp_tool's dry-run semantics.
- list=true parameter: enumerates every active background task in the
  session as BackgroundToolList → list[BackgroundToolListEntry]. Closes
  the UX gap where an agent that loses its background_ids to context
  compaction can no longer reach a parked task. No other params needed
  when list=true.
- background_id is now optional (required only when list=false).
- BackgroundToolList exported via tools/models.py and added to
  ToolResponseUnion in chat routes so frontend codegen picks it up.
- Registry gains list_background_tasks() snapshot helper.
- Regenerated openapi.json for the new type.

Tests:
- list=true returns active tasks with real bg_ids, age>=0, done=False
- list=true on empty registry returns []
- cancel=true under session.dry_run doesn't kill the real task
- requires_auth is True
- list_background_tasks registry-level snapshot
2026-04-18 08:10:37 +07:00
Zamil Majdy
38d3c506a1 fix(backend/copilot): register check_background_tool in PLATFORM_TOOL_NAMES
Adds 'check_background_tool' to the ToolName Literal so the permission
registry check (_assert_tool_names_consistent) passes.
2026-04-18 07:14:36 +07:00
Zamil Majdy
be500ba0e3 chore(backend): regenerate openapi.json for BackgroundToolStatus 2026-04-18 07:02:54 +07:00
Zamil Majdy
8915b2958c fix(backend/copilot): guard cancel race, assert cleanup with real bg_ids
- check_background_tool with cancel=true now checks task.done() before
  calling task.cancel(). If the task finished between the registry
  lookup and the cancel request, surface the real completed/error
  result instead of reporting 'cancelled' and losing the output.
- Registry test for cancel_all_background_tasks now captures the real
  bg_ids returned by register_background_task and asserts they're gone
  (instead of checking fabricated IDs).
- New test pins the cancel-after-completed race guard.
2026-04-18 06:54:59 +07:00
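The cancel-race guard this commit pins down can be sketched like this. It is a minimal sketch, not the real check_background_tool code — the function name and the return-dict shapes are illustrative:

```python
import asyncio
from typing import Any


async def cancel_with_race_guard(task: asyncio.Task) -> dict[str, Any]:
    # If the task finished between the registry lookup and the cancel
    # request, surface the real result instead of reporting 'cancelled'
    # and losing the output.
    if task.done():
        if task.cancelled():
            return {"status": "cancelled"}
        if task.exception() is not None:
            return {"status": "error", "error": str(task.exception())}
        return {"status": "completed", "result": task.result()}
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return {"status": "cancelled"}
```

The `task.done()` check is the whole fix: without it, a cancel request that loses the race would report 'cancelled' for work that actually completed.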
Zamil Majdy
453e90d0f4 fix(backend/copilot): address PR review — CancelledError propagation, error status, BackgroundToolStatus in codegen union
- _execute_tool_sync now catches asyncio.CancelledError and cancels the
  unregistered child task before re-raising. Prevents orphans when the
  handler is torn down before the per-tool timeout fires (child is not
  yet in the registry so cancel_all_background_tasks can't clean it up).
- check_background_tool now maps result.success=False to status='error'
  (not 'completed'), so an agent doesn't treat a failed finish as a win.
- BackgroundToolStatus moved to tools/models.py and added to the
  ToolResponseUnion in chat routes so frontend codegen picks it up.
- Tests: replace broad `except (CancelledError, BaseException)` catches
  with contextlib.suppress(asyncio.CancelledError) in the cleanup paths.
- New tests: handler cancellation propagates to child task; success=False
  result reports status='error'.
2026-04-18 06:51:37 +07:00
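The CancelledError propagation fix can be sketched as follows — a minimal sketch of the pattern, assuming a wait-based handler like `_execute_tool_sync`; the helper name is illustrative:

```python
import asyncio
from typing import Any


async def run_with_budget(coro, timeout: float) -> Any:
    task = asyncio.create_task(coro)
    try:
        await asyncio.wait({task}, timeout=timeout)
    except asyncio.CancelledError:
        # The handler itself was cancelled mid-wait. The child isn't in
        # the background registry yet, so cancel it here before
        # re-raising — otherwise it would keep running untracked.
        if not task.done():
            task.cancel()
        raise
    if not task.done():
        return task  # the real handler would park this in the registry
    return task.result()
```

Note that cancelling a coroutine blocked in `asyncio.wait` does not cancel the awaited tasks (unlike `asyncio.wait_for`), which is exactly why the explicit child cancel in the except branch is needed.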
Zamil Majdy
bca21e84e4 fix(backend/copilot): address PR review — orphan cleanup, cap, clamp wording, trim
- Cancel all background tasks in the stream's finally block
  (cancel_all_background_tasks) so orphan long-running work doesn't
  outlive the session when the user leaves or the stream errors.
- Cap per-session registry at MAX_BACKGROUND_TASKS_PER_SESSION=32;
  overflow evicts + cancels the oldest entry.
- Document ContextVar scoping: sub-AutoPilots get an isolated registry.
- Trim the 'still running' background payload message; structured
  fields (type, tool, background_id, timeout_seconds) carry the rest.
- Clarify check_background_tool's wait_seconds: values above the max
  are clamped (not rejected) and the agent should call again to wait
  longer.
- Comment the intentional 10-min default on BashExecTool (its own
  subprocess timeout is capped at 120s so the budget never fires in
  the normal path).
- Add registry tests: register/lookup, unregister, cancel_all,
  overflow eviction.
2026-04-18 06:45:02 +07:00
Zamil Majdy
c32a4017fe fix(backend/copilot): non-cancelling per-tool timeouts + check_background_tool
When a tool call exceeds its per-call time budget the handler no longer
cancels the task — it parks the asyncio.Task in a per-session background
registry and returns a synthetic tool result with a background_id. The
agent then uses the new check_background_tool to wait longer, poll
status, or cancel. This keeps the autopilot in control of slow sub-agent
and graph-execution work instead of the handler making an irreversible
choice, and removes the need for an exemption list.

Design
------
- BaseTool.timeout_seconds (default 10 min, None disables) decides when
  to park.
- run_agent / run_block / continue_run_block declare None — they manage
  their own lifecycles.
- _execute_tool_sync wraps the tool coroutine in asyncio.wait(timeout=...)
  (non-cancelling). On timeout → register_background_task + synthetic
  result with type='background' and background_id.
- New tool check_background_tool exposes wait_seconds (0..540) and
  cancel=true to the agent; drives its own wait via asyncio.wait.
- Background registry lives in its own module (sdk/background_registry.py)
  to avoid a TOOL_REGISTRY import cycle.
- Stream-level idle timeout kept as last-resort safety net (30 min) and
  now logs the unresolved tool calls for monitoring.

Security / ops
--------------
- _redact_args_for_log replaces values of sensitive keys (api_key, token,
  password, secret, credentials, authorization, auth) with '<redacted>'
  before logging, on top of the existing 200-char truncation.

Docs
----
- _SHARED_TOOL_NOTES now documents the background lifecycle and tells
  the agent to keep polling for legitimate long-running work rather than
  cancelling.

Tests
-----
- TestToolTimeout: timeout parks task (doesn't cancel), synthetic
  result has type='background' and a bg_id, None disables timeout.
- TestBaseToolDefaultTimeout: default 600s, per-tool overrides.
- TestCheckBackgroundTool: missing/unknown id → error, wait=0 → status,
  wait returns completed/still_running, cancel=true propagates, errored
  tasks surface as status='error'.

Ref: SECRT-2247
2026-04-18 06:39:40 +07:00
37 changed files with 1554 additions and 20 deletions

View File

@@ -5,7 +5,7 @@ user-invocable: true
argument-hint: "[worktree path or PR number] — tests the PR in the given worktree. Optional flags: --fix (auto-fix issues found)"
metadata:
author: autogpt-team
version: "2.0.0"
version: "2.1.0"
---
# Manual E2E Test
@@ -248,7 +248,87 @@ docker ps --format "{{.Names}}" | grep -E "rest_server|executor|copilot|websocke
done
```
### 3e. Build and start
**Native mode also:** when running the app natively (see 3e-native), kill any stray host processes and free the app ports before starting — otherwise `poetry run app` and `pnpm dev` will fail to bind.
```bash
# Kill stray native app processes from prior runs
pkill -9 -f "python.*backend" 2>/dev/null || true
pkill -9 -f "poetry run app" 2>/dev/null || true
pkill -9 -f "next-server|next dev" 2>/dev/null || true
# Free app ports (errors per port are ignored — port may simply be unused)
for port in 3000 8006 8001 8002 8005 8008; do
lsof -ti :$port -sTCP:LISTEN | xargs -r kill -9 2>/dev/null || true
done
```
### 3e-native. Run the app natively (PREFERRED for iterative dev)
Native mode runs infra (postgres, supabase, redis, rabbitmq, clamav) in docker but runs the backend and frontend directly on the host. This avoids the 3-8 minute `docker compose build` cycle on every backend change — code edits are picked up on process restart (seconds) instead of a full image rebuild.
**When to prefer native mode (default for this skill):**
- Iterative dev/debug loops where you're editing backend or frontend code between test runs
- Any PR that touches Python/TS source but not Dockerfiles, compose config, or infra images
- Fast repro of a failing scenario — restart `poetry run app` in a couple of seconds
**When to prefer docker mode (3e fallback):**
- Testing changes to `Dockerfile`, `docker-compose.yml`, or base images
- Production-parity smoke tests (exact container env, networking, volumes)
- CI-equivalent runs where you need the exact image that'll ship
**Note on 3b (copilot auth):** in native mode, the runtime `npm install -g @anthropic-ai/claude-code` step is NOT required. The `claude_agent_sdk` bundled CLI ships with the poetry venv and is on `PATH` when you run commands via `poetry run`. The OAuth token extraction still applies (same `refresh_claude_token.sh` call).
**Preamble:** before starting native, run the kill-stray + free-ports block from 3c's "Native mode also" subsection.
**1. Start infra only (one-time per session):**
```bash
cd $PLATFORM_DIR && docker compose --profile local up deps --detach --remove-orphans --build
```
This brings up postgres/supabase/redis/rabbitmq/clamav and skips all app services.
**2. Start the backend natively:**
```bash
cd $BACKEND_DIR && (poetry run app 2>&1 | tee .ign.application.logs) &
```
`poetry run app` spawns **all** app subprocesses — `rest_server`, `executor`, `copilot_executor`, `websocket`, `scheduler`, `notification_server`, `database_manager` — inside ONE parent process. No separate containers, no separate terminals. The `.ign.` filename prefix is already gitignored.
**3. Wait for the backend on :8006 BEFORE starting the frontend.** This ordering matters — the frontend's `pnpm dev` startup invokes `generate-api-queries`, which fetches `/openapi.json` from the backend. If the backend isn't listening yet, `pnpm dev` fails immediately.
```bash
for i in $(seq 1 60); do
if [ "$(curl -s -o /dev/null -w '%{http_code}' http://localhost:8006/docs 2>/dev/null)" = "200" ]; then
echo "Backend ready"
break
fi
sleep 2
done
```
**4. Start the frontend natively:**
```bash
cd $FRONTEND_DIR && (pnpm dev 2>&1 | tee .ign.frontend.logs) &
```
**5. Wait for the frontend on :3000:**
```bash
for i in $(seq 1 60); do
if [ "$(curl -s -o /dev/null -w '%{http_code}' http://localhost:3000 2>/dev/null)" = "200" ]; then
echo "Frontend ready"
break
fi
sleep 2
done
```
Once both are up, skip 3e/3f and go straight to **3g/3h** (feature flags / test user creation).
### 3e. Build and start (docker — fallback)
```bash
cd $PLATFORM_DIR && docker compose build --no-cache 2>&1 | tail -20
@@ -442,6 +522,22 @@ agent-browser --session-name pr-test snapshot | grep "text:"
### Checking logs
**Native mode:** when running via `poetry run app` + `pnpm dev`, all app logs stream to the `.ign.*.logs` files written by the `tee` pipes in 3e-native. `rest_server`, `executor`, `copilot_executor`, `websocket`, `scheduler`, `notification_server`, and `database_manager` are all subprocesses of the single `poetry run app` parent, so their output is interleaved in `.ign.application.logs`.
```bash
# Backend (all app subprocesses interleaved)
tail -f $BACKEND_DIR/.ign.application.logs
# Frontend (Next.js dev server)
tail -f $FRONTEND_DIR/.ign.frontend.logs
# Filter for errors across either log
grep -iE "error|exception|traceback" $BACKEND_DIR/.ign.application.logs | tail -20
grep -iE "error|exception|traceback" $FRONTEND_DIR/.ign.frontend.logs | tail -20
```
**Docker mode:**
```bash
# Backend REST server
docker logs autogpt_platform-rest_server-1 2>&1 | tail -30

View File

@@ -50,6 +50,8 @@ from backend.copilot.tools.models import (
AgentPreviewResponse,
AgentSavedResponse,
AgentsFoundResponse,
BackgroundToolList,
BackgroundToolStatus,
BlockDetailsResponse,
BlockListResponse,
BlockOutputResponse,
@@ -1323,6 +1325,8 @@ ToolResponseUnion = (
| MemorySearchResponse
| MemoryForgetCandidatesResponse
| MemoryForgetConfirmResponse
| BackgroundToolStatus
| BackgroundToolList
)

View File

@@ -71,6 +71,7 @@ ToolName = Literal[
"browser_act",
"browser_navigate",
"browser_screenshot",
"check_background_tool",
"connect_integration",
"continue_run_block",
"create_agent",

View File

@@ -163,6 +163,21 @@ perform multi-step work autonomously.
Use this when a task is complex enough to benefit from a separate
autopilot context, e.g. "research X and write a report" while the
parent autopilot handles orchestration.
### Long-running tool calls (backgrounded)
If any tool call exceeds its per-call time budget, the MCP handler
parks it in the background (the work keeps running) and returns a
result with ``"type": "background"``, a ``background_id`` (e.g.
``bg-abc123``), the original tool name, and a message.
Use **check_background_tool** to control the task:
- ``wait_seconds`` (0-540): wait up to N seconds for completion.
- ``cancel: true``: abort the background task and discard its result.
For legitimate long-running work (sub-autopilot, agent execution,
large code builds) **keep calling check_background_tool with a
longer wait_seconds** — do not cancel unless the task is clearly
stuck or no longer useful.
"""
# E2B-only notes — E2B has full internet access so gh CLI works there.

View File

@@ -0,0 +1,144 @@
"""Per-session registry of backgrounded tool calls.
When a tool exceeds its per-call ``timeout_seconds`` budget the in-flight
``asyncio.Task`` is parked here rather than being cancelled. The agent can
then use the ``check_background_tool`` tool (keyed by ``background_id``) to
wait longer, poll status, or cancel — keeping the autopilot in control of
slow sub-agents and graph executions.
Lives in its own module so that both ``tool_adapter.py`` (which registers
tasks during tool dispatch) and ``tools/check_background_tool.py`` (which
inspects them) can import the registry without creating a cycle via the
tool-registry import chain.
Scoping: the registry is a :class:`ContextVar`, so each execution context
(parent AutoPilot, and any sub-AutoPilot invoked via ``run_block``) gets an
independent registry. A sub-AutoPilot cannot see or cancel a parent's
background tasks — this is intentional isolation.
"""
import asyncio
import logging
import time
import uuid
from contextvars import ContextVar
from typing import Any
logger = logging.getLogger(__name__)
# Max wait a single check_background_tool call may block for. Kept below the
# stream-level idle timeout so the outer safety net still triggers if the
# whole session genuinely stalls.
MAX_BACKGROUND_WAIT_SECONDS = 9 * 60 # 9 minutes
# Upper bound on concurrent background tasks per session. Prevents a
# pathological agent from leaking asyncio.Tasks by timing out hundreds of
# tools back-to-back. When full, the oldest entry is cancelled and evicted
# so the newest registration still succeeds.
MAX_BACKGROUND_TASKS_PER_SESSION = 32
_background_tasks: ContextVar[dict[str, dict[str, Any]]] = ContextVar(
"_background_tasks",
default=None, # type: ignore[arg-type]
)
def init_registry() -> None:
"""Install a fresh per-session registry in the current context."""
_background_tasks.set({})
def register_background_task(task: asyncio.Task, tool_name: str) -> str:
"""Register *task* in the per-session background registry, returning the id.
If the registry is already at :data:`MAX_BACKGROUND_TASKS_PER_SESSION`,
the oldest entry is cancelled and evicted to make room.
"""
bg_id = f"bg-{uuid.uuid4().hex[:12]}"
registry = _background_tasks.get(None)
if registry is None:
# Registry isn't initialized (e.g. unit tests that bypass
# set_execution_context). Fall back to a fresh dict so we at least
# don't drop the task silently.
registry = {}
_background_tasks.set(registry)
if len(registry) >= MAX_BACKGROUND_TASKS_PER_SESSION:
oldest_id, oldest_entry = min(
registry.items(), key=lambda kv: kv[1]["started_at"]
)
oldest_task: asyncio.Task = oldest_entry["task"]
if not oldest_task.done():
oldest_task.cancel()
registry.pop(oldest_id, None)
logger.warning(
"Background registry full — evicted oldest entry %s (tool=%s)",
oldest_id,
oldest_entry["tool_name"],
)
registry[bg_id] = {
"task": task,
"tool_name": tool_name,
"started_at": time.monotonic(),
}
return bg_id
def get_background_task(background_id: str) -> dict[str, Any] | None:
"""Return the registered entry for *background_id*, or ``None``."""
registry = _background_tasks.get(None)
if registry is None:
return None
return registry.get(background_id)
def list_background_tasks() -> list[dict[str, Any]]:
"""Return a snapshot of every registered task in the current session.
Each entry: ``{background_id, tool_name, started_at, done}``. Used by
``check_background_tool(list=true)`` so the agent can recover IDs after
context compaction or a long pause.
"""
registry = _background_tasks.get(None)
if not registry:
return []
return [
{
"background_id": bg_id,
"tool_name": entry["tool_name"],
"started_at": entry["started_at"],
"done": entry["task"].done(),
}
for bg_id, entry in registry.items()
]
def unregister_background_task(background_id: str) -> None:
"""Drop a finished/cancelled task from the registry."""
registry = _background_tasks.get(None)
if registry is None:
return
registry.pop(background_id, None)
def cancel_all_background_tasks(reason: str = "stream ended") -> int:
"""Cancel every task in the registry and empty it.
Called from the stream's ``finally`` block so orphaned long-running
tools don't keep executing after the user leaves or the stream errors.
Returns the number of tasks that were cancelled.
"""
registry = _background_tasks.get(None)
if not registry:
return 0
cancelled = 0
for bg_id, entry in list(registry.items()):
task: asyncio.Task = entry["task"]
if not task.done():
task.cancel()
cancelled += 1
registry.pop(bg_id, None)
if cancelled:
logger.info("Cancelled %d orphaned background task(s) on %s", cancelled, reason)
return cancelled

View File

@@ -0,0 +1,155 @@
"""Tests for the background task registry."""
import asyncio
import contextlib
import pytest
from .background_registry import (
MAX_BACKGROUND_TASKS_PER_SESSION,
cancel_all_background_tasks,
get_background_task,
init_registry,
list_background_tasks,
register_background_task,
unregister_background_task,
)
@pytest.fixture(autouse=True)
def _init_for_each_test():
init_registry()
@pytest.mark.asyncio
async def test_register_and_lookup():
async def hang():
await asyncio.sleep(60)
task = asyncio.create_task(hang())
bg_id = register_background_task(task, "some_tool")
entry = get_background_task(bg_id)
assert entry is not None
assert entry["tool_name"] == "some_tool"
assert entry["task"] is task
task.cancel()
with contextlib.suppress(asyncio.CancelledError):
await task
@pytest.mark.asyncio
async def test_unregister_removes_entry():
async def hang():
await asyncio.sleep(60)
task = asyncio.create_task(hang())
bg_id = register_background_task(task, "some_tool")
unregister_background_task(bg_id)
assert get_background_task(bg_id) is None
task.cancel()
with contextlib.suppress(asyncio.CancelledError):
await task
@pytest.mark.asyncio
async def test_cancel_all_cancels_pending_tasks_and_empties_registry():
events = []
async def hang_with_cancel_trap(idx: int):
try:
await asyncio.sleep(60)
except asyncio.CancelledError:
events.append(idx)
raise
tasks = [asyncio.create_task(hang_with_cancel_trap(i)) for i in range(3)]
# Let the tasks start before cancellation.
await asyncio.sleep(0)
bg_ids = [register_background_task(t, f"tool_{i}") for i, t in enumerate(tasks)]
# Sanity check: all three actually got registered under real IDs.
for bg_id in bg_ids:
assert get_background_task(bg_id) is not None
count = cancel_all_background_tasks(reason="test")
assert count == 3
# Let the cancellations propagate.
for t in tasks:
with contextlib.suppress(asyncio.CancelledError):
await t
assert sorted(events) == [0, 1, 2]
# Registry should be empty now — verify using the actual IDs we registered.
for bg_id in bg_ids:
assert get_background_task(bg_id) is None
@pytest.mark.asyncio
async def test_registry_cap_evicts_oldest_on_overflow():
tasks: list[asyncio.Task] = []
ids: list[str] = []
async def hang():
await asyncio.sleep(60)
# Fill to capacity.
for _ in range(MAX_BACKGROUND_TASKS_PER_SESSION):
t = asyncio.create_task(hang())
tasks.append(t)
ids.append(register_background_task(t, "pool_tool"))
oldest_id = ids[0]
oldest_task = tasks[0]
assert get_background_task(oldest_id) is not None
# One more registration should evict + cancel the oldest.
extra_task = asyncio.create_task(hang())
extra_id = register_background_task(extra_task, "overflow_tool")
tasks.append(extra_task)
ids.append(extra_id)
assert get_background_task(oldest_id) is None
assert get_background_task(extra_id) is not None
# The evicted task was cancelled.
with contextlib.suppress(asyncio.CancelledError):
await oldest_task
assert oldest_task.cancelled()
# Cleanup.
for t in tasks[1:]:
t.cancel()
with contextlib.suppress(asyncio.CancelledError):
await t
@pytest.mark.asyncio
async def test_list_background_tasks_returns_snapshot():
async def hang():
await asyncio.sleep(60)
tasks = [asyncio.create_task(hang()) for _ in range(2)]
await asyncio.sleep(0)
bg_ids = [register_background_task(t, f"tool_{i}") for i, t in enumerate(tasks)]
snapshot = list_background_tasks()
assert len(snapshot) == 2
returned = {e["background_id"]: e for e in snapshot}
assert set(returned) == set(bg_ids)
for entry in snapshot:
assert entry["tool_name"].startswith("tool_")
assert entry["done"] is False
assert entry["started_at"] > 0
for t in tasks:
t.cancel()
with contextlib.suppress(asyncio.CancelledError):
await t
@pytest.mark.asyncio
async def test_list_background_tasks_empty():
assert list_background_tasks() == []

View File

@@ -107,6 +107,7 @@ from ..transcript import (
)
from ..transcript_builder import TranscriptBuilder
from .compaction import CompactionTracker, filter_compaction_messages
from .background_registry import cancel_all_background_tasks
from .env import build_sdk_env # noqa: F401 — re-export for backward compat
from .response_adapter import SDKResponseAdapter
from .security_hooks import create_security_hooks
@@ -162,9 +163,12 @@ _CIRCUIT_BREAKER_ERROR_MSG = (
)
# Idle timeout: abort the stream if no meaningful SDK message (only heartbeats)
# arrives for this many seconds. This catches hung tool calls (e.g. WebSearch
# hanging on a search provider that never responds).
_IDLE_TIMEOUT_SECONDS = 10 * 60 # 10 minutes
# arrives for this many seconds. Acts as a last-resort safety net — individual
# tools enforce their own timeouts at the MCP handler level (see BaseTool.
# timeout_seconds) and return a synthetic tool result to the agent on timeout.
# This stream-level timeout only fires if a tool's per-call timeout was
# disabled (timeout_seconds=None) or the SDK itself is stuck between messages.
_IDLE_TIMEOUT_SECONDS = 30 * 60 # 30 minutes
# Event types that are ephemeral / cosmetic and must NOT be counted toward
# ``events_yielded`` in the transient-retry loop. Counting them would prevent
@@ -1932,20 +1936,33 @@ async def _run_stream_attempt(
yield ev
yield StreamHeartbeat()
# Idle timeout: if no real SDK message for too long, a tool
# call is likely hung (e.g. WebSearch provider not responding).
# Idle timeout: last-resort safety net. Per-tool timeouts in
# the MCP handler normally catch hung tools first and return
# a synthetic tool result so the agent can recover. This only
# fires if a tool opted out of per-call timeouts or the SDK
# itself is stuck between messages.
idle_seconds = time.monotonic() - _last_real_msg_time
if idle_seconds >= _IDLE_TIMEOUT_SECONDS:
unresolved_ids = (
state.adapter.current_tool_calls.keys()
- state.adapter.resolved_tool_calls
)
unresolved_tools = {
tid: state.adapter.current_tool_calls[tid]
for tid in unresolved_ids
}
logger.error(
"%s Idle timeout after %.0fs with no SDK message — "
"aborting stream (likely hung tool call)",
"%s Idle timeout after %.0fs — unresolved tool calls: %s",
ctx.log_prefix,
idle_seconds,
", ".join(
f"{tc['name']}(id={tid[:12]})"
for tid, tc in unresolved_tools.items()
)
or "(none tracked)",
)
stream_error_msg = (
"A tool call appears to be stuck "
"(no response for 10 minutes). "
"Please try again."
"The session has been idle for too long. Please try again."
)
stream_error_code = "idle_timeout"
_append_error_marker(ctx.session, stream_error_msg, retryable=True)
@@ -2318,6 +2335,10 @@ async def _run_stream_attempt(
break
finally:
await _safe_close_sdk_client(sdk_client, ctx.log_prefix)
# Cancel any tool calls still parked in the background registry so
# orphaned long-running work (sub-AutoPilot, graph execution, etc.)
# doesn't keep running after the stream ends.
cancel_all_background_tasks(reason=f"stream ended ({ctx.log_prefix})")
# --- Post-stream processing (only on success) ---
if state.adapter.has_unresolved_tool_calls:

View File

@@ -37,6 +37,11 @@ from backend.copilot.tools import TOOL_REGISTRY
from backend.copilot.tools.base import BaseTool
from backend.util.truncate import truncate
# Background-task registry for tools that exceed their per-call timeout —
# lives in its own module to avoid a TOOL_REGISTRY import cycle with
# ``tools/check_background_tool.py``.
from .background_registry import init_registry as _init_background_registry
from .background_registry import register_background_task as _register_background_task
from .e2b_file_tools import (
E2B_FILE_TOOL_NAMES,
E2B_FILE_TOOLS,
@@ -134,6 +139,7 @@ def set_execution_context(
_pending_tool_outputs.set({})
_stash_event.set(asyncio.Event())
_consecutive_tool_failures.set({})
_init_background_registry()
def reset_stash_event() -> None:
@@ -248,15 +254,57 @@ async def _execute_tool_sync(
session: ChatSession,
args: dict[str, Any],
) -> dict[str, Any]:
"""Execute a tool synchronously and return MCP-formatted response."""
"""Execute a tool and return an MCP-formatted response.
Applies the tool's ``timeout_seconds`` budget (``None`` disables it).
On timeout the pending task is **not** cancelled — it is parked in the
background registry and a synthetic tool result is returned to the
agent along with a ``background_id``. The agent can then call
``check_background_tool`` to keep waiting, inspect status, or cancel.
This lets the autopilot decide on slow sub-agents / graph executions
instead of the handler making an irreversible choice.
"""
effective_id = f"sdk-{uuid.uuid4().hex[:12]}"
result = await base_tool.execute(
user_id=user_id,
session=session,
tool_call_id=effective_id,
**args,
task: asyncio.Task = asyncio.create_task(
base_tool.execute(
user_id=user_id,
session=session,
tool_call_id=effective_id,
**args,
),
name=f"tool:{base_tool.name}:{effective_id}",
)
timeout = base_tool.timeout_seconds
try:
if timeout is None:
result = await task
else:
# asyncio.wait (unlike wait_for) does NOT cancel on timeout — the
# task keeps running in the background.
await asyncio.wait({task}, timeout=timeout)
if not task.done():
bg_id = _register_background_task(task, base_tool.name)
logger.warning(
"Tool %s exceeded %ss budget — parked as "
"background_id=%s (args=%s)",
base_tool.name,
timeout,
bg_id,
_redact_args_for_log(args),
)
return _tool_background_result(base_tool.name, timeout, bg_id)
# Completed within budget — .result() re-raises any exception.
result = task.result()
except asyncio.CancelledError:
# The handler itself was cancelled (e.g. stream teardown) mid-wait.
# Cancel the child so it doesn't keep running untracked — the
# registry hasn't seen it yet, so cancel_all_background_tasks
# couldn't clean it up.
if not task.done():
task.cancel()
raise
text = (
result.output if isinstance(result.output, str) else json.dumps(result.output)
)
@@ -267,6 +315,65 @@ async def _execute_tool_sync(
}
def _tool_background_result(
tool_name: str, timeout: int, background_id: str
) -> dict[str, Any]:
"""Build a synthetic tool result when a call is parked as a background task.
The task is still running; the agent receives this so the stream can
continue and the autopilot can decide whether to keep waiting or cancel
via ``check_background_tool``.
"""
payload = {
"type": "background",
"tool": tool_name,
"timeout_seconds": timeout,
"background_id": background_id,
"message": (
f"Still running after {timeout}s — use check_background_tool "
"to wait longer or cancel."
),
}
return {
"content": [{"type": "text", "text": json.dumps(payload, ensure_ascii=False)}],
"isError": False,
}
# Keys that may carry credentials / PII. Values for these keys are replaced
# with '<redacted>' in monitoring logs.
_SENSITIVE_ARG_KEYS = frozenset(
{
"api_key",
"apikey",
"authorization",
"auth",
"credentials",
"password",
"secret",
"token",
}
)
def _redact_args_for_log(args: dict[str, Any]) -> str:
"""Render args for log monitoring, redacting sensitive keys and truncating
long string values."""
try:
rendered: dict[str, Any] = {}
for k, v in args.items():
if k.lower() in _SENSITIVE_ARG_KEYS:
rendered[k] = "<redacted>"
continue
if isinstance(v, str) and len(v) > 200:
rendered[k] = v[:200] + "…"
else:
rendered[k] = v
return json.dumps(rendered, default=str)[:500]
except (TypeError, ValueError):
return str(args)[:500]
def _mcp_error(message: str) -> dict[str, Any]:
return {
"content": [

View File

@@ -251,11 +251,16 @@ class TestTruncationAndStashIntegration:
# ---------------------------------------------------------------------------
def _make_mock_tool(name: str, output: str = "result") -> MagicMock:
def _make_mock_tool(
name: str,
output: str = "result",
timeout_seconds: int | None = 600,
) -> MagicMock:
"""Return a BaseTool mock that returns a successful StreamToolOutputAvailable."""
tool = MagicMock()
tool.name = name
tool.parameters = {"properties": {}, "required": []}
tool.timeout_seconds = timeout_seconds
tool.execute = AsyncMock(
return_value=StreamToolOutputAvailable(
toolCallId="test-id",
@@ -336,6 +341,216 @@ class TestCreateToolHandler:
assert mock_tool.execute.await_count == 2
class TestToolTimeout:
"""Tests for per-tool timeout behavior in _execute_tool_sync."""
@pytest.fixture(autouse=True)
def _init(self):
_init_ctx(session=_make_mock_session())
@pytest.mark.asyncio
async def test_timeout_parks_task_and_returns_background_id(self):
"""A tool that exceeds its timeout is moved to the background
registry (not cancelled); the handler returns a synthetic
type='background' result with a background_id."""
from backend.copilot.sdk.background_registry import (
get_background_task,
unregister_background_task,
)
mock_tool = _make_mock_tool("slow_tool", timeout_seconds=1)
async def hang_forever(*_args, **_kwargs):
await asyncio.sleep(60)
return StreamToolOutputAvailable(
toolCallId="t1",
output="late",
toolName="slow_tool",
success=True,
)
mock_tool.execute = AsyncMock(side_effect=hang_forever)
handler = create_tool_handler(mock_tool)
result = await handler({"arg": "v"})
# isError=False because the task is still running — the agent isn't
# being told about a failure, just about a delay.
assert result["isError"] is False
payload = json.loads(result["content"][0]["text"])
assert payload["type"] == "background"
assert payload["tool"] == "slow_tool"
assert payload["timeout_seconds"] == 1
assert payload["background_id"].startswith("bg-")
entry = get_background_task(payload["background_id"])
assert entry is not None
assert entry["tool_name"] == "slow_tool"
assert not entry["task"].done()
# Cleanup: cancel the parked task so the test doesn't leak it.
entry["task"].cancel()
try:
await entry["task"]
except BaseException:  # includes CancelledError; swallow cleanup noise
pass
unregister_background_task(payload["background_id"])
@pytest.mark.asyncio
async def test_timeout_does_not_cancel_tool_coroutine(self):
"""The task keeps running in the background after the timeout
budget is exceeded — cancellation is the agent's choice."""
from backend.copilot.sdk.background_registry import (
get_background_task,
unregister_background_task,
)
mock_tool = _make_mock_tool("slow_tool", timeout_seconds=1)
observed_cancel = asyncio.Event()
async def stays_alive(*_args, **_kwargs):
try:
await asyncio.sleep(3)
except asyncio.CancelledError:
observed_cancel.set()
raise
return StreamToolOutputAvailable(
toolCallId="t1",
output="eventual",
toolName="slow_tool",
success=True,
)
mock_tool.execute = AsyncMock(side_effect=stays_alive)
handler = create_tool_handler(mock_tool)
result = await handler({})
payload = json.loads(result["content"][0]["text"])
entry = get_background_task(payload["background_id"])
assert entry is not None
# Give the background task a brief moment; it should still be
# running and NOT cancelled.
await asyncio.sleep(0.1)
assert not observed_cancel.is_set()
assert not entry["task"].done()
# Let it complete so the test stays clean.
await entry["task"]
unregister_background_task(payload["background_id"])
@pytest.mark.asyncio
async def test_none_timeout_disables_wait_for(self):
"""When timeout_seconds is None, the tool runs to completion without
an outer timeout wrapper."""
mock_tool = _make_mock_tool(
"long_running_tool",
output="completed",
timeout_seconds=None,
)
async def slow_but_completes(*_args, **_kwargs):
await asyncio.sleep(0.05)
return StreamToolOutputAvailable(
toolCallId="t1",
output="completed",
toolName="long_running_tool",
success=True,
)
mock_tool.execute = AsyncMock(side_effect=slow_but_completes)
handler = create_tool_handler(mock_tool)
result = await handler({})
assert result["isError"] is False
assert "completed" in result["content"][0]["text"]
@pytest.mark.asyncio
async def test_handler_cancellation_cancels_child_task(self):
"""If the handler itself is cancelled before the tool completes,
the child task is cancelled too (no leak into the background
registry, since it wasn't parked yet)."""
import contextlib
mock_tool = _make_mock_tool("slow_tool", timeout_seconds=60)
child_cancelled = asyncio.Event()
async def hang_until_cancelled(*_args, **_kwargs):
try:
await asyncio.sleep(60)
except asyncio.CancelledError:
child_cancelled.set()
raise
mock_tool.execute = AsyncMock(side_effect=hang_until_cancelled)
from backend.copilot.sdk.tool_adapter import _execute_tool_sync
outer_task = asyncio.create_task(
_execute_tool_sync(mock_tool, "u", _make_mock_session(), {})
)
# Let the handler start waiting on the child.
await asyncio.sleep(0.05)
outer_task.cancel()
with contextlib.suppress(asyncio.CancelledError):
await outer_task
await asyncio.sleep(0)
assert child_cancelled.is_set()
@pytest.mark.asyncio
async def test_fast_tool_within_timeout_succeeds(self):
"""Tools that complete well under the timeout are unaffected."""
mock_tool = _make_mock_tool(
"fast_tool",
output="fast-ok",
timeout_seconds=30,
)
handler = create_tool_handler(mock_tool)
result = await handler({})
assert result["isError"] is False
assert "fast-ok" in result["content"][0]["text"]
class TestBaseToolDefaultTimeout:
"""The BaseTool default timeout and per-tool overrides."""
def test_default_timeout_is_ten_minutes(self):
from backend.copilot.tools.base import BaseTool
class _Plain(BaseTool):
@property
def name(self):
return "plain"
@property
def description(self):
return ""
@property
def parameters(self):
return {"type": "object", "properties": {}}
assert _Plain().timeout_seconds == 600
def test_run_agent_opts_out(self):
from backend.copilot.tools.run_agent import RunAgentTool
assert RunAgentTool().timeout_seconds is None
def test_run_block_opts_out(self):
from backend.copilot.tools.run_block import RunBlockTool
assert RunBlockTool().timeout_seconds is None
def test_continue_run_block_opts_out(self):
from backend.copilot.tools.continue_run_block import ContinueRunBlockTool
assert ContinueRunBlockTool().timeout_seconds is None
# ---------------------------------------------------------------------------
# Regression tests: bugs fixed by removing pre-launch mechanism
#

View File

@@ -13,6 +13,7 @@ from .agent_output import AgentOutputTool
from .ask_question import AskQuestionTool
from .base import BaseTool
from .bash_exec import BashExecTool
from .check_background_tool import CheckBackgroundToolTool
from .connect_integration import ConnectIntegrationTool
from .continue_run_block import ContinueRunBlockTool
from .create_agent import CreateAgentTool
@@ -81,6 +82,7 @@ TOOL_REGISTRY: dict[str, BaseTool] = {
"run_agent": RunAgentTool(),
"run_block": RunBlockTool(),
"continue_run_block": ContinueRunBlockTool(),
"check_background_tool": CheckBackgroundToolTool(),
"run_mcp_tool": RunMCPToolTool(),
"get_mcp_guide": GetMCPGuideTool(),
"view_agent_output": AgentOutputTool(),

View File

@@ -140,6 +140,21 @@ class BaseTool:
"""
return True
@property
def timeout_seconds(self) -> int | None:
"""Maximum seconds a single invocation may run before soft-timing out.
On timeout the MCP handler cancels the call and returns a synthetic
tool result to the agent (rather than hard-killing the stream), so
the agent can decide to retry, check progress via another tool, or
move on.
Return ``None`` to disable the per-call timeout — appropriate for
tools that manage their own lifecycle (e.g. ``run_agent`` polls an
execution, ``run_block`` can delegate to a sub-AutoPilot).
"""
return 10 * 60 # 10 minutes
def as_openai_tool(self) -> ChatCompletionToolParam:
"""Convert to OpenAI tool format."""
return ChatCompletionToolParam(

View File
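The soft-timeout contract this docstring describes (never cancel on expiry; hand the still-running task back for parking) can be sketched with stdlib asyncio alone. `run_with_soft_timeout` below is a hypothetical stand-in for the real `_execute_tool_sync`, which additionally registers the parked task in the background registry:

```python
import asyncio

async def run_with_soft_timeout(coro, timeout: float):
    """Wait up to `timeout` seconds, but never cancel the task on expiry."""
    task = asyncio.create_task(coro)
    done, _pending = await asyncio.wait({task}, timeout=timeout)
    if task in done:
        return "completed", task.result()
    # Timed out: the task keeps running; the caller parks it and returns
    # a synthetic type='background' result to the agent.
    return "background", task

async def demo():
    async def slow():
        await asyncio.sleep(0.2)
        return "late"

    status, task = await run_with_soft_timeout(slow(), timeout=0.05)
    assert status == "background" and not task.done()
    result = await task  # the parked task still completes normally
    return status, result

print(asyncio.run(demo()))
```

The key design point, mirrored in the tests above: `asyncio.wait` with a timeout leaves the task untouched on expiry, unlike `asyncio.wait_for`, which cancels it.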

@@ -42,6 +42,11 @@ class BashExecTool(BaseTool):
def name(self) -> str:
return "bash_exec"
# BaseTool.timeout_seconds=600 is inherited but never fires in practice:
# the `timeout` parameter on each call is capped at 120s by this tool's
# own subprocess timeout, so the MCP handler's budget is only a safety
# net for pathological hangs around sandbox setup/teardown.
@property
def description(self) -> str:
return (

View File

@@ -0,0 +1,290 @@
"""Tool for waiting on, polling, or cancelling a backgrounded tool call.
Long-running tool calls that exceed their per-call timeout are parked in the
background registry by :func:`_execute_tool_sync`. This tool lets the agent
decide whether to keep waiting, poll status, or cancel — so the autopilot
stays in control rather than the handler making an irreversible choice.
"""
import asyncio
import logging
import time
from typing import Any
from backend.copilot.model import ChatSession
from backend.copilot.sdk.background_registry import (
MAX_BACKGROUND_WAIT_SECONDS as _MAX_BACKGROUND_WAIT_SECONDS,
)
from backend.copilot.sdk.background_registry import (
get_background_task,
list_background_tasks,
unregister_background_task,
)
from .base import BaseTool
from .models import (
BackgroundToolList,
BackgroundToolListEntry,
BackgroundToolStatus,
ErrorResponse,
ToolResponseBase,
)
logger = logging.getLogger(__name__)
class CheckBackgroundToolTool(BaseTool):
"""Inspect, wait on, or cancel a backgrounded tool call."""
@property
def name(self) -> str:
return "check_background_tool"
@property
def requires_auth(self) -> bool:
# Parked tasks almost always originate from authenticated tools
# (run_agent, run_block). Require auth here too for consistency
# with those tools even though ContextVar scoping already prevents
# cross-session leakage.
return True
@property
def timeout_seconds(self) -> int | None:
# This tool drives its own wait loop up to _MAX_BACKGROUND_WAIT_SECONDS.
# Applying a second timeout on top would be redundant and could cancel
# the wait prematurely.
return None
@property
def description(self) -> str:
return (
"Inspect a backgrounded tool call by its background_id. "
"Use when a prior tool call returned type='background'. "
"Options: list=true to enumerate all active background tasks, "
"wait for completion up to wait_seconds (default 60, max "
f"{_MAX_BACKGROUND_WAIT_SECONDS}), just check status with "
"wait_seconds=0, or cancel=true to abort the task and "
"discard its result."
)
@property
def parameters(self) -> dict[str, Any]:
return {
"type": "object",
"properties": {
"list": {
"type": "boolean",
"description": (
"If true, return every active background task in "
"this session (no other params needed). Use to "
"recover background_ids after a context compaction."
),
"default": False,
},
"background_id": {
"type": "string",
"description": (
"The background_id returned by the timed-out tool. "
"Required unless list=true."
),
},
"wait_seconds": {
"type": "integer",
"description": (
"Max seconds to wait for completion. 0 = just check "
"status. Values above "
f"{_MAX_BACKGROUND_WAIT_SECONDS} are clamped to that "
"maximum — call again to keep waiting."
),
"default": 60,
},
"cancel": {
"type": "boolean",
"description": (
"If true, cancel the background task and discard "
"its result. Takes precedence over wait_seconds."
),
"default": False,
},
},
}
async def _execute(
self,
user_id: str | None,
session: ChatSession,
*,
list: bool = False,
background_id: str = "",
wait_seconds: int = 60,
cancel: bool = False,
**kwargs,
) -> ToolResponseBase:
if list:
return _list_response(session)
if not background_id:
return ErrorResponse(
message=(
"background_id is required (or pass list=true to "
"enumerate active tasks)."
),
session_id=session.session_id,
)
entry = get_background_task(background_id)
if entry is None:
return ErrorResponse(
message=(
f"No background task with id {background_id}. It may "
"have already completed (and been consumed) or never "
"existed."
),
session_id=session.session_id,
)
task: asyncio.Task = entry["task"]
tool_name: str = entry["tool_name"]
if cancel:
# Race guard: the task may have finished between the registry
# lookup and the cancel. If so, surface the real result rather
# than reporting 'cancelled' and losing the output.
if task.done():
return _status_from_finished_task(
session, tool_name, background_id, task
)
# Dry-run: simulate cancellation without touching the task, so
# the LLM can reason about the flow without real side effects.
if session.dry_run:
return BackgroundToolStatus(
message=(
f"[dry-run] Would cancel background task for '{tool_name}'."
),
session_id=session.session_id,
status="cancelled",
tool=tool_name,
background_id=background_id,
)
task.cancel()
unregister_background_task(background_id)
logger.info(
"Cancelled background task %s for tool %s by agent request",
background_id,
tool_name,
)
return BackgroundToolStatus(
message=f"Cancelled background task for '{tool_name}'.",
session_id=session.session_id,
status="cancelled",
tool=tool_name,
background_id=background_id,
)
if task.done():
return _status_from_finished_task(session, tool_name, background_id, task)
effective_wait = max(0, min(wait_seconds, _MAX_BACKGROUND_WAIT_SECONDS))
if effective_wait == 0:
return BackgroundToolStatus(
message=(
f"'{tool_name}' is still running. Call again with "
"wait_seconds>0 to wait, or cancel=true to abort."
),
session_id=session.session_id,
status="still_running",
tool=tool_name,
background_id=background_id,
)
await asyncio.wait({task}, timeout=effective_wait)
if task.done():
return _status_from_finished_task(session, tool_name, background_id, task)
return BackgroundToolStatus(
message=(
f"'{tool_name}' still running after waiting "
f"{effective_wait}s. Call again to keep waiting, or "
"cancel=true to abort."
),
session_id=session.session_id,
status="still_running",
tool=tool_name,
background_id=background_id,
waited_seconds=effective_wait,
)
def _list_response(session: ChatSession) -> BackgroundToolList:
"""Build the response for ``check_background_tool(list=true)``."""
now = time.monotonic()
entries = [
BackgroundToolListEntry(
background_id=e["background_id"],
tool=e["tool_name"],
age_seconds=round(now - e["started_at"], 2),
done=e["done"],
)
for e in list_background_tasks()
]
count = len(entries)
msg = (
f"{count} active background task(s)."
if count
else "No active background tasks."
)
return BackgroundToolList(
message=msg,
session_id=session.session_id,
tasks=entries,
)
def _status_from_finished_task(
session: ChatSession,
tool_name: str,
background_id: str,
task: asyncio.Task,
) -> ToolResponseBase:
"""Unregister a finished task and return its status."""
unregister_background_task(background_id)
if task.cancelled():
return BackgroundToolStatus(
message=f"Background task for '{tool_name}' was cancelled.",
session_id=session.session_id,
status="cancelled",
tool=tool_name,
background_id=background_id,
)
exc = task.exception()
if exc is not None:
return BackgroundToolStatus(
message=f"'{tool_name}' raised {type(exc).__name__}: {exc}",
session_id=session.session_id,
status="error",
tool=tool_name,
background_id=background_id,
)
result = task.result()
# A tool can complete with success=False without raising — preserve
# that as status="error" so the agent doesn't treat it as a win.
if not result.success:
return BackgroundToolStatus(
message=f"'{tool_name}' completed with an error.",
session_id=session.session_id,
status="error",
tool=tool_name,
background_id=background_id,
output=result.output,
)
return BackgroundToolStatus(
message=f"'{tool_name}' completed.",
session_id=session.session_id,
status="completed",
tool=tool_name,
background_id=background_id,
output=result.output,
)

View File
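The wait-budget handling in `_execute` reduces to a clamp into `[0, _MAX_BACKGROUND_WAIT_SECONDS]`. A tiny sketch, with `300` as a stand-in for the real constant (its actual value lives in `background_registry`):

```python
MAX_BACKGROUND_WAIT_SECONDS = 300  # stand-in; real constant lives in background_registry

def clamp_wait(wait_seconds: int) -> int:
    # Negative values degrade to a pure status check (0 seconds);
    # oversized values are clamped so a single call can never block
    # the stream indefinitely -- the agent calls again to keep waiting.
    return max(0, min(wait_seconds, MAX_BACKGROUND_WAIT_SECONDS))

print(clamp_wait(-5), clamp_wait(60), clamp_wait(10_000))
```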

@@ -0,0 +1,318 @@
"""Tests for CheckBackgroundToolTool."""
import asyncio
import contextlib
from unittest.mock import MagicMock
import pytest
from backend.copilot.response_model import StreamToolOutputAvailable
from backend.copilot.sdk.background_registry import (
init_registry,
register_background_task,
)
from .check_background_tool import CheckBackgroundToolTool
from .models import BackgroundToolList, BackgroundToolStatus
def _make_session() -> MagicMock:
session = MagicMock()
session.session_id = "s1"
session.dry_run = False
return session
def _completed_result(output: str = "ok") -> StreamToolOutputAvailable:
return StreamToolOutputAvailable(
toolCallId="tc-1",
output=output,
toolName="slow_tool",
success=True,
)
@pytest.fixture(autouse=True)
def _init_registry_for_each_test():
init_registry()
class TestCheckBackgroundTool:
@pytest.mark.asyncio
async def test_missing_background_id_returns_error(self):
tool = CheckBackgroundToolTool()
response = await tool._execute(
user_id="u",
session=_make_session(),
background_id="",
)
assert response.type.value == "error"
@pytest.mark.asyncio
async def test_unknown_background_id_returns_error(self):
tool = CheckBackgroundToolTool()
response = await tool._execute(
user_id="u",
session=_make_session(),
background_id="bg-does-not-exist",
)
assert response.type.value == "error"
assert "No background task" in response.message
@pytest.mark.asyncio
async def test_wait_zero_returns_still_running(self):
async def slow():
await asyncio.sleep(10)
return _completed_result()
task = asyncio.create_task(slow())
bg_id = register_background_task(task, "slow_tool")
tool = CheckBackgroundToolTool()
response = await tool._execute(
user_id="u",
session=_make_session(),
background_id=bg_id,
wait_seconds=0,
)
assert isinstance(response, BackgroundToolStatus)
assert response.status == "still_running"
assert response.background_id == bg_id
task.cancel()
with contextlib.suppress(asyncio.CancelledError):
await task
@pytest.mark.asyncio
async def test_wait_returns_completed_when_task_finishes(self):
async def fast():
await asyncio.sleep(0.05)
return _completed_result("final-output")
task = asyncio.create_task(fast())
bg_id = register_background_task(task, "slow_tool")
tool = CheckBackgroundToolTool()
response = await tool._execute(
user_id="u",
session=_make_session(),
background_id=bg_id,
wait_seconds=5,
)
assert isinstance(response, BackgroundToolStatus)
assert response.status == "completed"
assert response.output == "final-output"
@pytest.mark.asyncio
async def test_wait_times_out_and_returns_still_running(self):
async def slow():
await asyncio.sleep(10)
return _completed_result()
task = asyncio.create_task(slow())
bg_id = register_background_task(task, "slow_tool")
tool = CheckBackgroundToolTool()
response = await tool._execute(
user_id="u",
session=_make_session(),
background_id=bg_id,
wait_seconds=1,
)
assert isinstance(response, BackgroundToolStatus)
assert response.status == "still_running"
assert response.waited_seconds == 1
task.cancel()
with contextlib.suppress(asyncio.CancelledError):
await task
@pytest.mark.asyncio
async def test_cancel_true_cancels_and_removes_from_registry(self):
observed_cancel = asyncio.Event()
async def stays_until_cancelled():
try:
await asyncio.sleep(60)
except asyncio.CancelledError:
observed_cancel.set()
raise
return _completed_result()
task = asyncio.create_task(stays_until_cancelled())
# Let the task start before we cancel it.
await asyncio.sleep(0)
bg_id = register_background_task(task, "slow_tool")
tool = CheckBackgroundToolTool()
response = await tool._execute(
user_id="u",
session=_make_session(),
background_id=bg_id,
cancel=True,
)
assert isinstance(response, BackgroundToolStatus)
assert response.status == "cancelled"
with contextlib.suppress(asyncio.CancelledError):
await task
assert observed_cancel.is_set()
from backend.copilot.sdk.background_registry import get_background_task
assert get_background_task(bg_id) is None
@pytest.mark.asyncio
async def test_cancel_after_task_completed_returns_real_result(self):
"""If the task completes between registration and the agent's
cancel=true call, surface the real result instead of reporting
'cancelled' and losing the output (race guard)."""
async def finish_quickly():
return _completed_result("final-value")
task = asyncio.create_task(finish_quickly())
await task # definitely done by the time we register
bg_id = register_background_task(task, "slow_tool")
tool = CheckBackgroundToolTool()
response = await tool._execute(
user_id="u",
session=_make_session(),
background_id=bg_id,
cancel=True,
)
assert isinstance(response, BackgroundToolStatus)
assert response.status == "completed"
assert response.output == "final-value"
@pytest.mark.asyncio
async def test_errored_task_reports_error_status(self):
async def raises():
raise ValueError("boom")
task = asyncio.create_task(raises())
# Let the task complete before we query it.
try:
await task
except ValueError:
pass
bg_id = register_background_task(task, "broken_tool")
tool = CheckBackgroundToolTool()
response = await tool._execute(
user_id="u",
session=_make_session(),
background_id=bg_id,
)
assert isinstance(response, BackgroundToolStatus)
assert response.status == "error"
assert "boom" in response.message
@pytest.mark.asyncio
async def test_finished_task_with_success_false_reports_error(self):
"""A tool that completes with success=False (without raising) is
reported as status='error', not 'completed', so the agent doesn't
treat it as a win."""
async def finish_with_failure():
return StreamToolOutputAvailable(
toolCallId="tc-1",
output="partial",
toolName="broken_tool",
success=False,
)
task = asyncio.create_task(finish_with_failure())
await task
bg_id = register_background_task(task, "broken_tool")
tool = CheckBackgroundToolTool()
response = await tool._execute(
user_id="u",
session=_make_session(),
background_id=bg_id,
)
assert isinstance(response, BackgroundToolStatus)
assert response.status == "error"
assert response.output == "partial"
@pytest.mark.asyncio
async def test_list_true_returns_active_background_tasks(self):
"""list=true enumerates registered tasks so the agent can recover
forgotten background_ids."""
async def hang():
await asyncio.sleep(60)
tasks = [asyncio.create_task(hang()) for _ in range(2)]
await asyncio.sleep(0)
bg_ids = [register_background_task(t, f"tool_{i}") for i, t in enumerate(tasks)]
tool = CheckBackgroundToolTool()
response = await tool._execute(
user_id="u",
session=_make_session(),
list=True,
)
assert isinstance(response, BackgroundToolList)
assert len(response.tasks) == 2
returned_ids = {entry.background_id for entry in response.tasks}
assert returned_ids == set(bg_ids)
for entry in response.tasks:
assert entry.tool.startswith("tool_")
assert entry.age_seconds >= 0
assert entry.done is False
for t in tasks:
t.cancel()
with contextlib.suppress(asyncio.CancelledError):
await t
@pytest.mark.asyncio
async def test_list_true_empty_when_no_tasks(self):
tool = CheckBackgroundToolTool()
response = await tool._execute(
user_id="u",
session=_make_session(),
list=True,
)
assert isinstance(response, BackgroundToolList)
assert response.tasks == []
@pytest.mark.asyncio
async def test_cancel_in_dry_run_does_not_actually_cancel_task(self):
"""Under session.dry_run, cancel=true must not kill the real task."""
async def hang():
await asyncio.sleep(60)
task = asyncio.create_task(hang())
await asyncio.sleep(0)
bg_id = register_background_task(task, "slow_tool")
session = _make_session()
session.dry_run = True
tool = CheckBackgroundToolTool()
response = await tool._execute(
user_id="u",
session=session,
background_id=bg_id,
cancel=True,
)
assert isinstance(response, BackgroundToolStatus)
assert response.status == "cancelled"
assert "[dry-run]" in response.message
# Real task is still running.
assert not task.done()
# Cleanup.
task.cancel()
with contextlib.suppress(asyncio.CancelledError):
await task
def test_requires_auth_is_true(self):
tool = CheckBackgroundToolTool()
assert tool.requires_auth is True

View File
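The race guard pinned down by `test_cancel_after_task_completed_returns_real_result` is, at its core, a `task.done()` check before cancelling. A standalone sketch with hypothetical names (`cancel_or_result` is not the real tool method):

```python
import asyncio

async def cancel_or_result(task: asyncio.Task):
    # Race guard: the task may have finished between lookup and cancel;
    # if so, surface the real result instead of reporting 'cancelled'
    # and losing the output.
    if task.done():
        return "completed", task.result()
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return "cancelled", None

async def demo():
    fast = asyncio.create_task(asyncio.sleep(0, result="ok"))
    await asyncio.sleep(0.01)  # let it finish before we "cancel" it
    finished = await cancel_or_result(fast)

    slow = asyncio.create_task(asyncio.sleep(60))
    aborted = await cancel_or_result(slow)
    return finished, aborted

print(asyncio.run(demo()))
```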

@@ -28,6 +28,12 @@ class ContinueRunBlockTool(BaseTool):
def name(self) -> str:
return "continue_run_block"
@property
def timeout_seconds(self) -> int | None:
# Resumes an execution that may be a long-running (sub-AutoPilot)
# block — same lifecycle as run_block.
return None
@property
def description(self) -> str:
return "Resume block execution after a run_block call returned review_required. Pass the review_id."

View File

@@ -259,6 +259,45 @@ class ErrorResponse(ToolResponseBase):
details: dict[str, Any] | None = None
class BackgroundToolStatus(ToolResponseBase):
"""Status of a backgrounded tool call, returned by ``check_background_tool``."""
type: ResponseType = ResponseType.MCP_TOOL_OUTPUT
status: Literal["completed", "still_running", "cancelled", "error"] = Field(
description="Current state of the background task."
)
tool: str = Field(description="The name of the originally-backgrounded tool.")
background_id: str
output: Any | None = Field(
default=None,
description="Tool output when status=completed or status=error.",
)
waited_seconds: int | None = Field(default=None)
class BackgroundToolListEntry(BaseModel):
"""One row in a ``check_background_tool(list=true)`` response."""
background_id: str
tool: str = Field(description="Name of the originally-backgrounded tool.")
age_seconds: float = Field(
description="Seconds since the task was parked in the background."
)
done: bool = Field(
description="True if the task has finished but hasn't been consumed yet."
)
class BackgroundToolList(ToolResponseBase):
"""List of active background tasks, returned by ``check_background_tool(list=true)``."""
type: ResponseType = ResponseType.MCP_TOOL_OUTPUT
tasks: list[BackgroundToolListEntry] = Field(
default_factory=list,
description="All background tasks currently registered for this session.",
)
class InputValidationErrorResponse(ToolResponseBase):
"""Response when run_agent receives unknown input fields."""

View File

@@ -104,6 +104,13 @@ class RunAgentTool(BaseTool):
def name(self) -> str:
return "run_agent"
@property
def timeout_seconds(self) -> int | None:
# Agent executions can legitimately run 15-45+ min; the tool polls
# its own wait_for_result window and returns an execution_id for
# later progress checks, so the stream-level timeout isn't needed.
return None
@property
def description(self) -> str:
return (

View File

@@ -27,6 +27,13 @@ class RunBlockTool(BaseTool):
def name(self) -> str:
return "run_block"
@property
def timeout_seconds(self) -> int | None:
# May delegate to AutoPilotBlock (sub-autopilot), which runs its own
# multi-turn stream of 15-45+ min. Per-call timeout is disabled here
# and left to the block's own execution lifecycle.
return None
@property
def description(self) -> str:
return (

View File

@@ -1354,7 +1354,9 @@
},
{
"$ref": "#/components/schemas/MemoryForgetConfirmResponse"
},
{ "$ref": "#/components/schemas/BackgroundToolStatus" },
{ "$ref": "#/components/schemas/BackgroundToolList" }
],
"title": "Response Getv2[Dummy] Tool Response Type Export For Codegen"
}
@@ -8430,6 +8432,91 @@
"required": ["sso_url", "expires_at"],
"title": "AyrshareSSOResponse"
},
"BackgroundToolList": {
"properties": {
"type": {
"$ref": "#/components/schemas/ResponseType",
"default": "mcp_tool_output"
},
"message": { "type": "string", "title": "Message" },
"session_id": {
"anyOf": [{ "type": "string" }, { "type": "null" }],
"title": "Session Id"
},
"tasks": {
"items": { "$ref": "#/components/schemas/BackgroundToolListEntry" },
"type": "array",
"title": "Tasks",
"description": "All background tasks currently registered for this session."
}
},
"type": "object",
"required": ["message"],
"title": "BackgroundToolList",
"description": "List of active background tasks, returned by ``check_background_tool(list=true)``."
},
"BackgroundToolListEntry": {
"properties": {
"background_id": { "type": "string", "title": "Background Id" },
"tool": {
"type": "string",
"title": "Tool",
"description": "Name of the originally-backgrounded tool."
},
"age_seconds": {
"type": "number",
"title": "Age Seconds",
"description": "Seconds since the task was parked in the background."
},
"done": {
"type": "boolean",
"title": "Done",
"description": "True if the task has finished but hasn't been consumed yet."
}
},
"type": "object",
"required": ["background_id", "tool", "age_seconds", "done"],
"title": "BackgroundToolListEntry",
"description": "One row in a ``check_background_tool(list=true)`` response."
},
"BackgroundToolStatus": {
"properties": {
"type": {
"$ref": "#/components/schemas/ResponseType",
"default": "mcp_tool_output"
},
"message": { "type": "string", "title": "Message" },
"session_id": {
"anyOf": [{ "type": "string" }, { "type": "null" }],
"title": "Session Id"
},
"status": {
"type": "string",
"enum": ["completed", "still_running", "cancelled", "error"],
"title": "Status",
"description": "Current state of the background task."
},
"tool": {
"type": "string",
"title": "Tool",
"description": "The name of the originally-backgrounded tool."
},
"background_id": { "type": "string", "title": "Background Id" },
"output": {
"anyOf": [{}, { "type": "null" }],
"title": "Output",
"description": "Tool output when status=completed or status=error."
},
"waited_seconds": {
"anyOf": [{ "type": "integer" }, { "type": "null" }],
"title": "Waited Seconds"
}
},
"type": "object",
"required": ["message", "status", "tool", "background_id"],
"title": "BackgroundToolStatus",
"description": "Status of a backgrounded tool call, returned by ``check_background_tool``."
},
"BaseGraph-Input": {
"properties": {
"id": { "type": "string", "title": "Id" },

18 binary image files added (screenshots, 11–123 KiB each; not shown).