Compare commits

12 Commits
master...test-scree

| SHA1 |
|---|
| e51c287ae4 |
| 58ce293ec0 |
| 88b515c191 |
| 39f04b8990 |
| b1b45e57e2 |
| 6b199d2b9c |
| 38d3c506a1 |
| be500ba0e3 |
| 8915b2958c |
| 453e90d0f4 |
| bca21e84e4 |
| c32a4017fe |
@@ -5,7 +5,7 @@ user-invocable: true
argument-hint: "[worktree path or PR number] — tests the PR in the given worktree. Optional flags: --fix (auto-fix issues found)"
metadata:
  author: autogpt-team
- version: "2.0.0"
+ version: "2.1.0"
---

# Manual E2E Test
@@ -248,7 +248,87 @@ docker ps --format "{{.Names}}" | grep -E "rest_server|executor|copilot|websocke

```bash
done
```

**Native mode also:** when running the app natively (see 3e-native), kill any stray host processes and free the app ports before starting — otherwise `poetry run app` and `pnpm dev` will fail to bind.

```bash
# Kill stray native app processes from prior runs
pkill -9 -f "python.*backend" 2>/dev/null || true
pkill -9 -f "poetry run app" 2>/dev/null || true
pkill -9 -f "next-server|next dev" 2>/dev/null || true

# Free app ports (errors per port are ignored — port may simply be unused)
for port in 3000 8006 8001 8002 8005 8008; do
  lsof -ti :$port -sTCP:LISTEN | xargs -r kill -9 2>/dev/null || true
done
```
### 3e-native. Run the app natively (PREFERRED for iterative dev)

Native mode runs infra (postgres, supabase, redis, rabbitmq, clamav) in docker but runs the backend and frontend directly on the host. This avoids the 3-8 minute `docker compose build` cycle on every backend change — code edits are picked up on process restart (seconds) instead of a full image rebuild.

**When to prefer native mode (default for this skill):**
- Iterative dev/debug loops where you're editing backend or frontend code between test runs
- Any PR that touches Python/TS source but not Dockerfiles, compose config, or infra images
- Fast repro of a failing scenario — restart `poetry run app` in a couple of seconds

**When to prefer docker mode (3e fallback):**
- Testing changes to `Dockerfile`, `docker-compose.yml`, or base images
- Production-parity smoke tests (exact container env, networking, volumes)
- CI-equivalent runs where you need the exact image that'll ship

**Note on 3b (copilot auth):** in native mode, the runtime `npm install -g @anthropic-ai/claude-code` step is NOT required. The `claude_agent_sdk` bundled CLI ships with the poetry venv and is on `PATH` when you run commands via `poetry run`. The OAuth token extraction still applies (same `refresh_claude_token.sh` call).

**Preamble:** before starting native, run the kill-stray + free-ports block from 3c's "Native mode also" subsection.
**1. Start infra only (one-time per session):**

```bash
cd $PLATFORM_DIR && docker compose --profile local up deps --detach --remove-orphans --build
```

This brings up postgres/supabase/redis/rabbitmq/clamav and skips all app services.

**2. Start the backend natively:**

```bash
cd $BACKEND_DIR && (poetry run app 2>&1 | tee .ign.application.logs) &
```

`poetry run app` spawns **all** app subprocesses — `rest_server`, `executor`, `copilot_executor`, `websocket`, `scheduler`, `notification_server`, `database_manager` — inside ONE parent process. No separate containers, no separate terminals. The `.ign.` prefix is already gitignored.

**3. Wait for the backend on :8006 BEFORE starting the frontend.** This ordering matters — the frontend's `pnpm dev` startup invokes `generate-api-queries`, which fetches `/openapi.json` from the backend. If the backend isn't listening yet, `pnpm dev` fails immediately.

```bash
for i in $(seq 1 60); do
  if [ "$(curl -s -o /dev/null -w '%{http_code}' http://localhost:8006/docs 2>/dev/null)" = "200" ]; then
    echo "Backend ready"
    break
  fi
  sleep 2
done
```
**4. Start the frontend natively:**

```bash
cd $FRONTEND_DIR && (pnpm dev 2>&1 | tee .ign.frontend.logs) &
```

**5. Wait for the frontend on :3000:**

```bash
for i in $(seq 1 60); do
  if [ "$(curl -s -o /dev/null -w '%{http_code}' http://localhost:3000 2>/dev/null)" = "200" ]; then
    echo "Frontend ready"
    break
  fi
  sleep 2
done
```

Once both are up, skip 3e/3f and go straight to **3g/3h** (feature flags / test user creation).

### 3e. Build and start (docker — fallback)

```bash
cd $PLATFORM_DIR && docker compose build --no-cache 2>&1 | tail -20
```
@@ -442,6 +522,22 @@ agent-browser --session-name pr-test snapshot | grep "text:"

### Checking logs

**Native mode:** when running via `poetry run app` + `pnpm dev`, all app logs stream to the `.ign.*.logs` files written by the `tee` pipes in 3e-native. `rest_server`, `executor`, `copilot_executor`, `websocket`, `scheduler`, `notification_server`, and `database_manager` are all subprocesses of the single `poetry run app` parent, so their output is interleaved in `.ign.application.logs`.

```bash
# Backend (all app subprocesses interleaved)
tail -f $BACKEND_DIR/.ign.application.logs

# Frontend (Next.js dev server)
tail -f $FRONTEND_DIR/.ign.frontend.logs

# Filter for errors across either log
grep -iE "error|exception|traceback" $BACKEND_DIR/.ign.application.logs | tail -20
grep -iE "error|exception|traceback" $FRONTEND_DIR/.ign.frontend.logs | tail -20
```

**Docker mode:**

```bash
# Backend REST server
docker logs autogpt_platform-rest_server-1 2>&1 | tail -30
```
```diff
@@ -50,6 +50,8 @@ from backend.copilot.tools.models import (
     AgentPreviewResponse,
     AgentSavedResponse,
     AgentsFoundResponse,
+    BackgroundToolList,
+    BackgroundToolStatus,
     BlockDetailsResponse,
     BlockListResponse,
     BlockOutputResponse,
@@ -1323,6 +1325,8 @@ ToolResponseUnion = (
     | MemorySearchResponse
     | MemoryForgetCandidatesResponse
     | MemoryForgetConfirmResponse
+    | BackgroundToolStatus
+    | BackgroundToolList
 )
```
```diff
@@ -71,6 +71,7 @@ ToolName = Literal[
     "browser_act",
     "browser_navigate",
     "browser_screenshot",
+    "check_background_tool",
     "connect_integration",
     "continue_run_block",
     "create_agent",
```
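For context on the `Literal` union being extended above: membership in such a union can be checked at runtime with `typing.get_args`. A minimal sketch — the three names here are just an illustrative subset, not the full `ToolName` list:

```python
from typing import Literal, get_args

# Illustrative subset of the ToolName union — not the full list from the PR.
ToolName = Literal[
    "browser_act",
    "browser_navigate",
    "check_background_tool",
]


def is_known_tool(name: str) -> bool:
    """True if *name* is a member of the ToolName literal union."""
    return name in get_args(ToolName)


print(is_known_tool("check_background_tool"))  # True
print(is_known_tool("frobnicate"))             # False
```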
```diff
@@ -163,6 +163,21 @@ perform multi-step work autonomously.
 Use this when a task is complex enough to benefit from a separate
 autopilot context, e.g. "research X and write a report" while the
 parent autopilot handles orchestration.
+
+### Long-running tool calls (backgrounded)
+If any tool call exceeds its per-call time budget, the MCP handler
+parks it in the background (the work keeps running) and returns a
+result with ``"type": "background"``, a ``background_id`` (e.g.
+``bg-abc123``), the original tool name, and a message.
+
+Use **check_background_tool** to control the task:
+- ``wait_seconds`` (0-540): wait up to N seconds for completion.
+- ``cancel: true``: abort the background task and discard its result.
+
+For legitimate long-running work (sub-autopilot, agent execution,
+large code builds) **keep calling check_background_tool with a
+longer wait_seconds** — do not cancel unless the task is clearly
+stuck or no longer useful.
 """

 # E2B-only notes — E2B has full internet access so gh CLI works there.
```
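The backgrounded-result contract spelled out in the prompt above can be sketched from the agent's side. `decide_next_step` is a hypothetical helper, not part of the PR; the payload fields mirror the prompt text:

```python
import json


def decide_next_step(tool_result_text: str) -> str:
    """Inspect a tool result; if it was backgrounded, say how to follow up.

    Hypothetical helper — the real agent loop lives in the SDK, not here.
    """
    payload = json.loads(tool_result_text)
    if payload.get("type") != "background":
        return "done"
    # Keep waiting via check_background_tool, per the prompt guidance.
    return (
        f"check_background_tool(background_id={payload['background_id']!r}, "
        "wait_seconds=120)"
    )


backgrounded = json.dumps(
    {
        "type": "background",
        "tool": "run_agent",
        "background_id": "bg-abc123",
        "message": "Still running after 180s",
    }
)
print(decide_next_step(backgrounded))
# check_background_tool(background_id='bg-abc123', wait_seconds=120)
```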
@@ -0,0 +1,144 @@
```python
"""Per-session registry of backgrounded tool calls.

When a tool exceeds its per-call ``timeout_seconds`` budget the in-flight
``asyncio.Task`` is parked here rather than being cancelled. The agent can
then use the ``check_background_tool`` tool (keyed by ``background_id``) to
wait longer, poll status, or cancel — keeping the autopilot in control of
slow sub-agents and graph executions.

Lives in its own module so that both ``tool_adapter.py`` (which registers
tasks during tool dispatch) and ``tools/check_background_tool.py`` (which
inspects them) can import the registry without creating a cycle via the
tool-registry import chain.

Scoping: the registry is a :class:`ContextVar`, so each execution context
(parent AutoPilot, and any sub-AutoPilot invoked via ``run_block``) gets an
independent registry. A sub-AutoPilot cannot see or cancel a parent's
background tasks — this is intentional isolation.
"""

import asyncio
import logging
import time
import uuid
from contextvars import ContextVar
from typing import Any

logger = logging.getLogger(__name__)

# Max wait a single check_background_tool call may block for. Kept below the
# stream-level idle timeout so the outer safety net still triggers if the
# whole session genuinely stalls.
MAX_BACKGROUND_WAIT_SECONDS = 9 * 60  # 9 minutes

# Upper bound on concurrent background tasks per session. Prevents a
# pathological agent from leaking asyncio.Tasks by timing out hundreds of
# tools back-to-back. When full, the oldest entry is cancelled and evicted
# so the newest registration still succeeds.
MAX_BACKGROUND_TASKS_PER_SESSION = 32

_background_tasks: ContextVar[dict[str, dict[str, Any]]] = ContextVar(
    "_background_tasks",
    default=None,  # type: ignore[arg-type]
)


def init_registry() -> None:
    """Install a fresh per-session registry in the current context."""
    _background_tasks.set({})


def register_background_task(task: asyncio.Task, tool_name: str) -> str:
    """Register *task* in the per-session background registry, returning the id.

    If the registry is already at :data:`MAX_BACKGROUND_TASKS_PER_SESSION`,
    the oldest entry is cancelled and evicted to make room.
    """
    bg_id = f"bg-{uuid.uuid4().hex[:12]}"
    registry = _background_tasks.get(None)
    if registry is None:
        # Registry isn't initialized (e.g. unit tests that bypass
        # set_execution_context). Fall back to a fresh dict so we at least
        # don't drop the task silently.
        registry = {}
        _background_tasks.set(registry)

    if len(registry) >= MAX_BACKGROUND_TASKS_PER_SESSION:
        oldest_id, oldest_entry = min(
            registry.items(), key=lambda kv: kv[1]["started_at"]
        )
        oldest_task: asyncio.Task = oldest_entry["task"]
        if not oldest_task.done():
            oldest_task.cancel()
        registry.pop(oldest_id, None)
        logger.warning(
            "Background registry full — evicted oldest entry %s (tool=%s)",
            oldest_id,
            oldest_entry["tool_name"],
        )

    registry[bg_id] = {
        "task": task,
        "tool_name": tool_name,
        "started_at": time.monotonic(),
    }
    return bg_id


def get_background_task(background_id: str) -> dict[str, Any] | None:
    """Return the registered entry for *background_id*, or ``None``."""
    registry = _background_tasks.get(None)
    if registry is None:
        return None
    return registry.get(background_id)


def list_background_tasks() -> list[dict[str, Any]]:
    """Return a snapshot of every registered task in the current session.

    Each entry: ``{background_id, tool_name, started_at, done}``. Used by
    ``check_background_tool(list=true)`` so the agent can recover IDs after
    context compaction or a long pause.
    """
    registry = _background_tasks.get(None)
    if not registry:
        return []
    return [
        {
            "background_id": bg_id,
            "tool_name": entry["tool_name"],
            "started_at": entry["started_at"],
            "done": entry["task"].done(),
        }
        for bg_id, entry in registry.items()
    ]


def unregister_background_task(background_id: str) -> None:
    """Drop a finished/cancelled task from the registry."""
    registry = _background_tasks.get(None)
    if registry is None:
        return
    registry.pop(background_id, None)


def cancel_all_background_tasks(reason: str = "stream ended") -> int:
    """Cancel every task in the registry and empty it.

    Called from the stream's ``finally`` block so orphaned long-running
    tools don't keep executing after the user leaves or the stream errors.
    Returns the number of tasks that were cancelled.
    """
    registry = _background_tasks.get(None)
    if not registry:
        return 0
    cancelled = 0
    for bg_id, entry in list(registry.items()):
        task: asyncio.Task = entry["task"]
        if not task.done():
            task.cancel()
            cancelled += 1
        registry.pop(bg_id, None)
    if cancelled:
        logger.info("Cancelled %d orphaned background task(s) on %s", cancelled, reason)
    return cancelled
```
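The eviction rule in `register_background_task` can be exercised standalone. This is a self-contained re-sketch with the cap shrunk to 2 (the real module uses `MAX_BACKGROUND_TASKS_PER_SESSION = 32`), not the module itself:

```python
import asyncio
import time
import uuid

# Re-sketch of the registry's eviction rule with a tiny cap for demo purposes.
MAX_TASKS = 2
registry: dict[str, dict] = {}


def register(task: asyncio.Task, tool_name: str) -> str:
    if len(registry) >= MAX_TASKS:
        # Evict + cancel the oldest entry so the new registration succeeds.
        oldest_id, oldest = min(registry.items(), key=lambda kv: kv[1]["started_at"])
        oldest["task"].cancel()
        registry.pop(oldest_id)
    bg_id = f"bg-{uuid.uuid4().hex[:12]}"
    registry[bg_id] = {
        "task": task,
        "tool_name": tool_name,
        "started_at": time.monotonic(),
    }
    return bg_id


async def main() -> None:
    async def hang():
        await asyncio.sleep(60)

    ids = [register(asyncio.create_task(hang()), f"tool_{i}") for i in range(3)]
    # Cap is 2, so the first registration was evicted and cancelled.
    print(ids[0] in registry, ids[1] in registry, ids[2] in registry)

    for entry in registry.values():
        entry["task"].cancel()
    await asyncio.gather(
        *(e["task"] for e in registry.values()), return_exceptions=True
    )


asyncio.run(main())
# False True True
```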
@@ -0,0 +1,155 @@
```python
"""Tests for the background task registry."""

import asyncio
import contextlib

import pytest

from .background_registry import (
    MAX_BACKGROUND_TASKS_PER_SESSION,
    cancel_all_background_tasks,
    get_background_task,
    init_registry,
    list_background_tasks,
    register_background_task,
    unregister_background_task,
)


@pytest.fixture(autouse=True)
def _init_for_each_test():
    init_registry()


@pytest.mark.asyncio
async def test_register_and_lookup():
    async def hang():
        await asyncio.sleep(60)

    task = asyncio.create_task(hang())
    bg_id = register_background_task(task, "some_tool")

    entry = get_background_task(bg_id)
    assert entry is not None
    assert entry["tool_name"] == "some_tool"
    assert entry["task"] is task

    task.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await task


@pytest.mark.asyncio
async def test_unregister_removes_entry():
    async def hang():
        await asyncio.sleep(60)

    task = asyncio.create_task(hang())
    bg_id = register_background_task(task, "some_tool")
    unregister_background_task(bg_id)
    assert get_background_task(bg_id) is None

    task.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await task


@pytest.mark.asyncio
async def test_cancel_all_cancels_pending_tasks_and_empties_registry():
    events = []

    async def hang_with_cancel_trap(idx: int):
        try:
            await asyncio.sleep(60)
        except asyncio.CancelledError:
            events.append(idx)
            raise

    tasks = [asyncio.create_task(hang_with_cancel_trap(i)) for i in range(3)]
    # Let the tasks start before cancellation.
    await asyncio.sleep(0)
    bg_ids = [register_background_task(t, f"tool_{i}") for i, t in enumerate(tasks)]

    # Sanity check: all three actually got registered under real IDs.
    for bg_id in bg_ids:
        assert get_background_task(bg_id) is not None

    count = cancel_all_background_tasks(reason="test")
    assert count == 3

    # Let the cancellations propagate.
    for t in tasks:
        with contextlib.suppress(asyncio.CancelledError):
            await t
    assert sorted(events) == [0, 1, 2]

    # Registry should be empty now — verify using the actual IDs we registered.
    for bg_id in bg_ids:
        assert get_background_task(bg_id) is None


@pytest.mark.asyncio
async def test_registry_cap_evicts_oldest_on_overflow():
    tasks: list[asyncio.Task] = []
    ids: list[str] = []

    async def hang():
        await asyncio.sleep(60)

    # Fill to capacity.
    for _ in range(MAX_BACKGROUND_TASKS_PER_SESSION):
        t = asyncio.create_task(hang())
        tasks.append(t)
        ids.append(register_background_task(t, "pool_tool"))

    oldest_id = ids[0]
    oldest_task = tasks[0]
    assert get_background_task(oldest_id) is not None

    # One more registration should evict + cancel the oldest.
    extra_task = asyncio.create_task(hang())
    extra_id = register_background_task(extra_task, "overflow_tool")
    tasks.append(extra_task)
    ids.append(extra_id)

    assert get_background_task(oldest_id) is None
    assert get_background_task(extra_id) is not None
    # The evicted task was cancelled.
    with contextlib.suppress(asyncio.CancelledError):
        await oldest_task
    assert oldest_task.cancelled()

    # Cleanup.
    for t in tasks[1:]:
        t.cancel()
        with contextlib.suppress(asyncio.CancelledError):
            await t


@pytest.mark.asyncio
async def test_list_background_tasks_returns_snapshot():
    async def hang():
        await asyncio.sleep(60)

    tasks = [asyncio.create_task(hang()) for _ in range(2)]
    await asyncio.sleep(0)
    bg_ids = [register_background_task(t, f"tool_{i}") for i, t in enumerate(tasks)]

    snapshot = list_background_tasks()
    assert len(snapshot) == 2
    returned = {e["background_id"]: e for e in snapshot}
    assert set(returned) == set(bg_ids)
    for entry in snapshot:
        assert entry["tool_name"].startswith("tool_")
        assert entry["done"] is False
        assert entry["started_at"] > 0

    for t in tasks:
        t.cancel()
        with contextlib.suppress(asyncio.CancelledError):
            await t


@pytest.mark.asyncio
async def test_list_background_tasks_empty():
    assert list_background_tasks() == []
```
```diff
@@ -107,6 +107,7 @@ from ..transcript import (
 )
 from ..transcript_builder import TranscriptBuilder
 from .compaction import CompactionTracker, filter_compaction_messages
+from .background_registry import cancel_all_background_tasks
 from .env import build_sdk_env  # noqa: F401 — re-export for backward compat
 from .response_adapter import SDKResponseAdapter
 from .security_hooks import create_security_hooks
```
```diff
@@ -162,9 +163,12 @@ _CIRCUIT_BREAKER_ERROR_MSG = (
 )

 # Idle timeout: abort the stream if no meaningful SDK message (only heartbeats)
-# arrives for this many seconds. This catches hung tool calls (e.g. WebSearch
-# hanging on a search provider that never responds).
-_IDLE_TIMEOUT_SECONDS = 10 * 60  # 10 minutes
+# arrives for this many seconds. Acts as a last-resort safety net — individual
+# tools enforce their own timeouts at the MCP handler level (see BaseTool.
+# timeout_seconds) and return a synthetic tool result to the agent on timeout.
+# This stream-level timeout only fires if a tool's per-call timeout was
+# disabled (timeout_seconds=None) or the SDK itself is stuck between messages.
+_IDLE_TIMEOUT_SECONDS = 30 * 60  # 30 minutes

 # Event types that are ephemeral / cosmetic and must NOT be counted toward
 # ``events_yielded`` in the transient-retry loop. Counting them would prevent
```
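The last-resort check described in the comment above reduces to a monotonic-clock comparison. A sketch with a sub-second timeout so it runs instantly (the shipped value is 30 minutes):

```python
import time

IDLE_TIMEOUT_SECONDS = 0.2  # real code: 30 * 60

last_real_msg_time = time.monotonic()
time.sleep(0.3)  # simulate a stretch with only heartbeats, no real SDK messages

idle_seconds = time.monotonic() - last_real_msg_time
if idle_seconds >= IDLE_TIMEOUT_SECONDS:
    # In the real stream loop this is where the attempt is aborted with
    # stream_error_code = "idle_timeout".
    print("idle_timeout")
```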
```diff
@@ -1932,20 +1936,33 @@ async def _run_stream_attempt(
                     yield ev
                 yield StreamHeartbeat()

-                # Idle timeout: if no real SDK message for too long, a tool
-                # call is likely hung (e.g. WebSearch provider not responding).
+                # Idle timeout: last-resort safety net. Per-tool timeouts in
+                # the MCP handler normally catch hung tools first and return
+                # a synthetic tool result so the agent can recover. This only
+                # fires if a tool opted out of per-call timeouts or the SDK
+                # itself is stuck between messages.
                 idle_seconds = time.monotonic() - _last_real_msg_time
                 if idle_seconds >= _IDLE_TIMEOUT_SECONDS:
+                    unresolved_ids = (
+                        state.adapter.current_tool_calls.keys()
+                        - state.adapter.resolved_tool_calls
+                    )
+                    unresolved_tools = {
+                        tid: state.adapter.current_tool_calls[tid]
+                        for tid in unresolved_ids
+                    }
                     logger.error(
-                        "%s Idle timeout after %.0fs with no SDK message — "
-                        "aborting stream (likely hung tool call)",
+                        "%s Idle timeout after %.0fs — unresolved tool calls: %s",
                         ctx.log_prefix,
                         idle_seconds,
+                        ", ".join(
+                            f"{tc['name']}(id={tid[:12]})"
+                            for tid, tc in unresolved_tools.items()
+                        )
+                        or "(none tracked)",
                     )
                     stream_error_msg = (
-                        "A tool call appears to be stuck "
-                        "(no response for 10 minutes). "
-                        "Please try again."
+                        "The session has been idle for too long. Please try again."
                     )
                     stream_error_code = "idle_timeout"
                     _append_error_marker(ctx.session, stream_error_msg, retryable=True)
```
```diff
@@ -2318,6 +2335,10 @@
                 break
     finally:
         await _safe_close_sdk_client(sdk_client, ctx.log_prefix)
+        # Cancel any tool calls still parked in the background registry so
+        # orphaned long-running work (sub-AutoPilot, graph execution, etc.)
+        # doesn't keep running after the stream ends.
+        cancel_all_background_tasks(reason=f"stream ended ({ctx.log_prefix})")

     # --- Post-stream processing (only on success) ---
     if state.adapter.has_unresolved_tool_calls:
```
```diff
@@ -37,6 +37,11 @@ from backend.copilot.tools import TOOL_REGISTRY
 from backend.copilot.tools.base import BaseTool
 from backend.util.truncate import truncate

+# Background-task registry for tools that exceed their per-call timeout —
+# lives in its own module to avoid a TOOL_REGISTRY import cycle with
+# ``tools/check_background_tool.py``.
+from .background_registry import init_registry as _init_background_registry
+from .background_registry import register_background_task as _register_background_task
 from .e2b_file_tools import (
     E2B_FILE_TOOL_NAMES,
     E2B_FILE_TOOLS,
```
```diff
@@ -134,6 +139,7 @@ def set_execution_context(
     _pending_tool_outputs.set({})
     _stash_event.set(asyncio.Event())
     _consecutive_tool_failures.set({})
+    _init_background_registry()


 def reset_stash_event() -> None:
```
```diff
@@ -248,15 +254,57 @@ async def _execute_tool_sync(
     session: ChatSession,
     args: dict[str, Any],
 ) -> dict[str, Any]:
-    """Execute a tool synchronously and return MCP-formatted response."""
+    """Execute a tool and return an MCP-formatted response.
+
+    Applies the tool's ``timeout_seconds`` budget (``None`` disables it).
+    On timeout the pending task is **not** cancelled — it is parked in the
+    background registry and a synthetic tool result is returned to the
+    agent along with a ``background_id``. The agent can then call
+    ``check_background_tool`` to keep waiting, inspect status, or cancel.
+    This lets the autopilot decide on slow sub-agents / graph executions
+    instead of the handler making an irreversible choice.
+    """
     effective_id = f"sdk-{uuid.uuid4().hex[:12]}"
-    result = await base_tool.execute(
-        user_id=user_id,
-        session=session,
-        tool_call_id=effective_id,
-        **args,
+    task: asyncio.Task = asyncio.create_task(
+        base_tool.execute(
+            user_id=user_id,
+            session=session,
+            tool_call_id=effective_id,
+            **args,
+        ),
+        name=f"tool:{base_tool.name}:{effective_id}",
     )
+
+    timeout = base_tool.timeout_seconds
+    try:
+        if timeout is None:
+            result = await task
+        else:
+            # asyncio.wait (unlike wait_for) does NOT cancel on timeout — the
+            # task keeps running in the background.
+            await asyncio.wait({task}, timeout=timeout)
+            if not task.done():
+                bg_id = _register_background_task(task, base_tool.name)
+                logger.warning(
+                    "Tool %s exceeded %ss budget — parked as "
+                    "background_id=%s (args=%s)",
+                    base_tool.name,
+                    timeout,
+                    bg_id,
+                    _redact_args_for_log(args),
+                )
+                return _tool_background_result(base_tool.name, timeout, bg_id)
+            # Completed within budget — .result() re-raises any exception.
+            result = task.result()
+    except asyncio.CancelledError:
+        # The handler itself was cancelled (e.g. stream teardown) mid-wait.
+        # Cancel the child so it doesn't keep running untracked — the
+        # registry hasn't seen it yet, so cancel_all_background_tasks
+        # couldn't clean it up.
+        if not task.done():
+            task.cancel()
+        raise

     text = (
         result.output if isinstance(result.output, str) else json.dumps(result.output)
     )
```
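The load-bearing detail in the rewritten `_execute_tool_sync` is that `asyncio.wait` (unlike `asyncio.wait_for`) leaves the task running when the timeout expires, so it can be parked instead of cancelled. Isolated sketch:

```python
import asyncio


async def slow_tool() -> str:
    await asyncio.sleep(0.2)
    return "late result"


async def main() -> None:
    task = asyncio.create_task(slow_tool())

    # asyncio.wait returns after the timeout WITHOUT cancelling the task...
    await asyncio.wait({task}, timeout=0.05)
    print("done after budget?", task.done())  # False — still running

    # ...so the caller can park it (here we just await it) and collect the
    # result later, which is what the background registry enables.
    print("parked result:", await task)


asyncio.run(main())
```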
```diff
@@ -267,6 +315,65 @@ async def _execute_tool_sync(
     }


+def _tool_background_result(
+    tool_name: str, timeout: int, background_id: str
+) -> dict[str, Any]:
+    """Build a synthetic tool result when a call is parked as a background task.
+
+    The task is still running; the agent receives this so the stream can
+    continue and the autopilot can decide whether to keep waiting or cancel
+    via ``check_background_tool``.
+    """
+    payload = {
+        "type": "background",
+        "tool": tool_name,
+        "timeout_seconds": timeout,
+        "background_id": background_id,
+        "message": (
+            f"Still running after {timeout}s — use check_background_tool "
+            "to wait longer or cancel."
+        ),
+    }
+    return {
+        "content": [{"type": "text", "text": json.dumps(payload, ensure_ascii=False)}],
+        "isError": False,
+    }
+
+
+# Keys that may carry credentials / PII. Values for these keys are replaced
+# with '<redacted>' in monitoring logs.
+_SENSITIVE_ARG_KEYS = frozenset(
+    {
+        "api_key",
+        "apikey",
+        "authorization",
+        "auth",
+        "credentials",
+        "password",
+        "secret",
+        "token",
+    }
+)
+
+
+def _redact_args_for_log(args: dict[str, Any]) -> str:
+    """Render args for log monitoring, redacting sensitive keys and truncating
+    long string values."""
+    try:
+        rendered: dict[str, Any] = {}
+        for k, v in args.items():
+            if k.lower() in _SENSITIVE_ARG_KEYS:
+                rendered[k] = "<redacted>"
+                continue
+            if isinstance(v, str) and len(v) > 200:
+                rendered[k] = v[:200] + "…"
+            else:
+                rendered[k] = v
+        return json.dumps(rendered, default=str)[:500]
+    except (TypeError, ValueError):
+        return str(args)[:500]
+
+
 def _mcp_error(message: str) -> dict[str, Any]:
     return {
         "content": [
```
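To make the redaction behavior above concrete, here is a standalone mirror of `_redact_args_for_log`'s core loop (same key set and thresholds, re-sketched so it runs outside the codebase):

```python
import json

SENSITIVE = {
    "api_key", "apikey", "authorization", "auth",
    "credentials", "password", "secret", "token",
}


def redact(args: dict) -> str:
    """Redact sensitive keys, truncate long strings, cap total length."""
    rendered = {}
    for k, v in args.items():
        if k.lower() in SENSITIVE:
            rendered[k] = "<redacted>"
        elif isinstance(v, str) and len(v) > 200:
            rendered[k] = v[:200] + "…"
        else:
            rendered[k] = v
    return json.dumps(rendered, default=str)[:500]


# Key matching is case-insensitive; long values are clipped at 200 chars.
print(redact({"query": "hello", "API_KEY": "sk-123", "blob": "x" * 300}))
```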
```diff
@@ -251,11 +251,16 @@ class TestTruncationAndStashIntegration:
 # ---------------------------------------------------------------------------


-def _make_mock_tool(name: str, output: str = "result") -> MagicMock:
+def _make_mock_tool(
+    name: str,
+    output: str = "result",
+    timeout_seconds: int | None = 600,
+) -> MagicMock:
     """Return a BaseTool mock that returns a successful StreamToolOutputAvailable."""
     tool = MagicMock()
     tool.name = name
     tool.parameters = {"properties": {}, "required": []}
+    tool.timeout_seconds = timeout_seconds
     tool.execute = AsyncMock(
         return_value=StreamToolOutputAvailable(
             toolCallId="test-id",
```
@@ -336,6 +341,216 @@ class TestCreateToolHandler:
|
||||
assert mock_tool.execute.await_count == 2
|
||||
|
||||
|
||||
class TestToolTimeout:
|
||||
"""Tests for per-tool timeout behavior in _execute_tool_sync."""
|
||||
|
||||
@pytest.fixture(autouse=True)
|
||||
def _init(self):
|
||||
_init_ctx(session=_make_mock_session())
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_timeout_parks_task_and_returns_background_id(self):
|
||||
"""A tool that exceeds its timeout is moved to the background
|
||||
registry (not cancelled); the handler returns a synthetic
|
||||
type='background' result with a background_id."""
|
||||
from backend.copilot.sdk.background_registry import (
|
||||
get_background_task,
|
||||
unregister_background_task,
|
||||
)
|
||||
|
||||
mock_tool = _make_mock_tool("slow_tool", timeout_seconds=1)
|
||||
|
||||
async def hang_forever(*_args, **_kwargs):
|
||||
await asyncio.sleep(60)
|
||||
return StreamToolOutputAvailable(
|
||||
toolCallId="t1",
|
||||
output="late",
|
||||
toolName="slow_tool",
|
||||
success=True,
|
||||
)
|
||||
|
||||
mock_tool.execute = AsyncMock(side_effect=hang_forever)
|
||||
|
||||
handler = create_tool_handler(mock_tool)
|
||||
result = await handler({"arg": "v"})
|
||||
|
||||
# isError=False because the task is still running — the agent isn't
|
||||
# being told about a failure, just about a delay.
|
||||
assert result["isError"] is False
|
||||
payload = json.loads(result["content"][0]["text"])
|
||||
assert payload["type"] == "background"
|
||||
assert payload["tool"] == "slow_tool"
|
||||
assert payload["timeout_seconds"] == 1
|
||||
assert payload["background_id"].startswith("bg-")
|
||||
|
||||
entry = get_background_task(payload["background_id"])
|
||||
assert entry is not None
|
||||
assert entry["tool_name"] == "slow_tool"
|
||||
assert not entry["task"].done()
|
||||
|
||||
# Cleanup: cancel the parked task so the test doesn't leak it.
|
||||
entry["task"].cancel()
|
||||
try:
|
||||
await entry["task"]
|
||||
except (asyncio.CancelledError, BaseException):
|
||||
pass
|
||||
unregister_background_task(payload["background_id"])
|
||||
|
||||
    @pytest.mark.asyncio
    async def test_timeout_does_not_cancel_tool_coroutine(self):
        """The task keeps running in the background after the timeout
        budget is exceeded — cancellation is the agent's choice."""
        from backend.copilot.sdk.background_registry import (
            get_background_task,
            unregister_background_task,
        )

        mock_tool = _make_mock_tool("slow_tool", timeout_seconds=1)
        observed_cancel = asyncio.Event()

        async def stays_alive(*_args, **_kwargs):
            try:
                await asyncio.sleep(3)
            except asyncio.CancelledError:
                observed_cancel.set()
                raise
            return StreamToolOutputAvailable(
                toolCallId="t1",
                output="eventual",
                toolName="slow_tool",
                success=True,
            )

        mock_tool.execute = AsyncMock(side_effect=stays_alive)

        handler = create_tool_handler(mock_tool)
        result = await handler({})
        payload = json.loads(result["content"][0]["text"])

        entry = get_background_task(payload["background_id"])
        assert entry is not None
        # Give the background task a brief moment; it should still be
        # running and NOT cancelled.
        await asyncio.sleep(0.1)
        assert not observed_cancel.is_set()
        assert not entry["task"].done()

        # Let it complete so the test stays clean.
        await entry["task"]
        unregister_background_task(payload["background_id"])

    @pytest.mark.asyncio
    async def test_none_timeout_disables_wait_for(self):
        """When timeout_seconds is None, the tool runs to completion without
        an outer timeout wrapper."""
        mock_tool = _make_mock_tool(
            "long_running_tool",
            output="completed",
            timeout_seconds=None,
        )

        async def slow_but_completes(*_args, **_kwargs):
            await asyncio.sleep(0.05)
            return StreamToolOutputAvailable(
                toolCallId="t1",
                output="completed",
                toolName="long_running_tool",
                success=True,
            )

        mock_tool.execute = AsyncMock(side_effect=slow_but_completes)

        handler = create_tool_handler(mock_tool)
        result = await handler({})

        assert result["isError"] is False
        assert "completed" in result["content"][0]["text"]

    @pytest.mark.asyncio
    async def test_handler_cancellation_cancels_child_task(self):
        """If the handler itself is cancelled before the tool completes,
        the child task is cancelled too (no leak into the background
        registry, since it wasn't parked yet)."""
        import contextlib

        mock_tool = _make_mock_tool("slow_tool", timeout_seconds=60)
        child_cancelled = asyncio.Event()

        async def hang_until_cancelled(*_args, **_kwargs):
            try:
                await asyncio.sleep(60)
            except asyncio.CancelledError:
                child_cancelled.set()
                raise

        mock_tool.execute = AsyncMock(side_effect=hang_until_cancelled)

        from backend.copilot.sdk.tool_adapter import _execute_tool_sync

        outer_task = asyncio.create_task(
            _execute_tool_sync(mock_tool, "u", _make_mock_session(), {})
        )
        # Let the handler start waiting on the child.
        await asyncio.sleep(0.05)
        outer_task.cancel()
        with contextlib.suppress(asyncio.CancelledError):
            await outer_task
        await asyncio.sleep(0)
        assert child_cancelled.is_set()

    @pytest.mark.asyncio
    async def test_fast_tool_within_timeout_succeeds(self):
        """Tools that complete well under the timeout are unaffected."""
        mock_tool = _make_mock_tool(
            "fast_tool",
            output="fast-ok",
            timeout_seconds=30,
        )

        handler = create_tool_handler(mock_tool)
        result = await handler({})

        assert result["isError"] is False
        assert "fast-ok" in result["content"][0]["text"]


class TestBaseToolDefaultTimeout:
    """The BaseTool default timeout and per-tool overrides."""

    def test_default_timeout_is_ten_minutes(self):
        from backend.copilot.tools.base import BaseTool

        class _Plain(BaseTool):
            @property
            def name(self):
                return "plain"

            @property
            def description(self):
                return ""

            @property
            def parameters(self):
                return {"type": "object", "properties": {}}

        assert _Plain().timeout_seconds == 600

    def test_run_agent_opts_out(self):
        from backend.copilot.tools.run_agent import RunAgentTool

        assert RunAgentTool().timeout_seconds is None

    def test_run_block_opts_out(self):
        from backend.copilot.tools.run_block import RunBlockTool

        assert RunBlockTool().timeout_seconds is None

    def test_continue_run_block_opts_out(self):
        from backend.copilot.tools.continue_run_block import ContinueRunBlockTool

        assert ContinueRunBlockTool().timeout_seconds is None


# ---------------------------------------------------------------------------
# Regression tests: bugs fixed by removing pre-launch mechanism
#

@@ -13,6 +13,7 @@ from .agent_output import AgentOutputTool
from .ask_question import AskQuestionTool
from .base import BaseTool
from .bash_exec import BashExecTool
from .check_background_tool import CheckBackgroundToolTool
from .connect_integration import ConnectIntegrationTool
from .continue_run_block import ContinueRunBlockTool
from .create_agent import CreateAgentTool
@@ -81,6 +82,7 @@ TOOL_REGISTRY: dict[str, BaseTool] = {
    "run_agent": RunAgentTool(),
    "run_block": RunBlockTool(),
    "continue_run_block": ContinueRunBlockTool(),
    "check_background_tool": CheckBackgroundToolTool(),
    "run_mcp_tool": RunMCPToolTool(),
    "get_mcp_guide": GetMCPGuideTool(),
    "view_agent_output": AgentOutputTool(),

@@ -140,6 +140,21 @@ class BaseTool:
        """
        return True

    @property
    def timeout_seconds(self) -> int | None:
        """Maximum seconds a single invocation may run before soft-timing out.

        On timeout the MCP handler stops waiting, parks the still-running
        task in the background registry, and returns a synthetic tool
        result to the agent (rather than hard-killing the stream), so the
        agent can decide to retry, check progress via another tool, or
        move on.

        Return ``None`` to disable the per-call timeout — appropriate for
        tools that manage their own lifecycle (e.g. ``run_agent`` polls an
        execution, ``run_block`` can delegate to a sub-AutoPilot).
        """
        return 10 * 60  # 10 minutes

    def as_openai_tool(self) -> ChatCompletionToolParam:
        """Convert to OpenAI tool format."""
        return ChatCompletionToolParam(

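The soft-timeout behaviour described in the docstring above can be sketched in a few lines. This is a minimal illustration, not the PR's implementation: `call_with_soft_timeout` and `_parked` are hypothetical names, and the real handler builds an MCP result rather than a plain dict.

```python
import asyncio

# On timeout the child task is parked, not cancelled, and a synthetic
# "background" result is returned to the caller.
_parked: dict[str, asyncio.Task] = {}

async def call_with_soft_timeout(coro, timeout_seconds, bg_id):
    task = asyncio.ensure_future(coro)
    if timeout_seconds is None:
        return await task  # no outer timeout wrapper at all
    done, _pending = await asyncio.wait({task}, timeout=timeout_seconds)
    if task in done:
        return task.result()
    _parked[bg_id] = task  # park it; the coroutine keeps running
    return {"type": "background", "background_id": bg_id}

async def demo():
    async def slow():
        await asyncio.sleep(0.2)
        return "eventual"

    synthetic = await call_with_soft_timeout(slow(), 0.05, "bg-1")
    still_running = not _parked["bg-1"].done()
    eventual = await _parked["bg-1"]  # the parked task still completes
    return synthetic, still_running, eventual

synthetic, still_running, eventual = asyncio.run(demo())
print(synthetic, still_running, eventual)
```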
@@ -42,6 +42,11 @@ class BashExecTool(BaseTool):
    def name(self) -> str:
        return "bash_exec"

    # BaseTool.timeout_seconds=600 is inherited but never fires in practice:
    # the `timeout` parameter on each call is capped at 120s by this tool's
    # own subprocess timeout, so the MCP handler's budget is only a safety
    # net for pathological hangs around sandbox setup/teardown.

    @property
    def description(self) -> str:
        return (

@@ -0,0 +1,290 @@
"""Tool for waiting on, polling, or cancelling a backgrounded tool call.

Long-running tool calls exceed their per-call timeout and are parked in the
background registry by :func:`_execute_tool_sync`. This tool lets the agent
decide whether to keep waiting, poll status, or cancel — so the autopilot
stays in control rather than the handler making an irreversible choice.
"""

import asyncio
import logging
import time
from typing import Any

from backend.copilot.model import ChatSession
from backend.copilot.sdk.background_registry import (
    MAX_BACKGROUND_WAIT_SECONDS as _MAX_BACKGROUND_WAIT_SECONDS,
)
from backend.copilot.sdk.background_registry import (
    get_background_task,
    list_background_tasks,
    unregister_background_task,
)

from .base import BaseTool
from .models import (
    BackgroundToolList,
    BackgroundToolListEntry,
    BackgroundToolStatus,
    ErrorResponse,
    ToolResponseBase,
)

logger = logging.getLogger(__name__)


class CheckBackgroundToolTool(BaseTool):
    """Inspect, wait on, or cancel a backgrounded tool call."""

    @property
    def name(self) -> str:
        return "check_background_tool"

    @property
    def requires_auth(self) -> bool:
        # Parked tasks almost always originate from authenticated tools
        # (run_agent, run_block). Require auth here too for consistency
        # with those tools even though ContextVar scoping already prevents
        # cross-session leakage.
        return True

    @property
    def timeout_seconds(self) -> int | None:
        # This tool drives its own wait loop up to _MAX_BACKGROUND_WAIT_SECONDS.
        # Applying a second timeout on top would be redundant and could cancel
        # the wait prematurely.
        return None

    @property
    def description(self) -> str:
        return (
            "Inspect a backgrounded tool call by its background_id. "
            "Use when a prior tool call returned type='background'. "
            "Options: list=true to enumerate all active background tasks, "
            "wait for completion up to wait_seconds (default 60, max "
            f"{_MAX_BACKGROUND_WAIT_SECONDS}), just check status with "
            "wait_seconds=0, or cancel=true to abort the task and "
            "discard its result."
        )

    @property
    def parameters(self) -> dict[str, Any]:
        return {
            "type": "object",
            "properties": {
                "list": {
                    "type": "boolean",
                    "description": (
                        "If true, return every active background task in "
                        "this session (no other params needed). Use to "
                        "recover background_ids after a context compaction."
                    ),
                    "default": False,
                },
                "background_id": {
                    "type": "string",
                    "description": (
                        "The background_id returned by the timed-out tool. "
                        "Required unless list=true."
                    ),
                },
                "wait_seconds": {
                    "type": "integer",
                    "description": (
                        "Max seconds to wait for completion. 0 = just check "
                        "status. Values above "
                        f"{_MAX_BACKGROUND_WAIT_SECONDS} are clamped to that "
                        "maximum — call again to keep waiting."
                    ),
                    "default": 60,
                },
                "cancel": {
                    "type": "boolean",
                    "description": (
                        "If true, cancel the background task and discard "
                        "its result. Takes precedence over wait_seconds."
                    ),
                    "default": False,
                },
            },
        }

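The parameter combinations above resolve to distinct modes in a fixed precedence order (list, then background_id validation, then cancel, then status-vs-wait). A hypothetical dispatch sketch, purely illustrative of that precedence:

```python
# resolve_mode is an illustrative helper, not part of the PR; it mirrors the
# precedence the parameters schema describes.
def resolve_mode(args: dict) -> str:
    if args.get("list"):
        return "list"
    if not args.get("background_id"):
        return "error: background_id required"
    if args.get("cancel"):
        return "cancel"  # takes precedence over wait_seconds
    if args.get("wait_seconds", 60) == 0:
        return "status-only"
    return "wait"

modes = [
    resolve_mode({"list": True}),
    resolve_mode({}),
    resolve_mode({"background_id": "bg-1", "cancel": True, "wait_seconds": 30}),
    resolve_mode({"background_id": "bg-1", "wait_seconds": 0}),
    resolve_mode({"background_id": "bg-1"}),
]
print(modes)
```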
    async def _execute(
        self,
        user_id: str | None,
        session: ChatSession,
        *,
        list: bool = False,
        background_id: str = "",
        wait_seconds: int = 60,
        cancel: bool = False,
        **kwargs,
    ) -> ToolResponseBase:
        if list:
            return _list_response(session)

        if not background_id:
            return ErrorResponse(
                message=(
                    "background_id is required (or pass list=true to "
                    "enumerate active tasks)."
                ),
                session_id=session.session_id,
            )

        entry = get_background_task(background_id)
        if entry is None:
            return ErrorResponse(
                message=(
                    f"No background task with id {background_id}. It may "
                    "have already completed (and been consumed) or never "
                    "existed."
                ),
                session_id=session.session_id,
            )

        task: asyncio.Task = entry["task"]
        tool_name: str = entry["tool_name"]

        if cancel:
            # Race guard: the task may have finished between the registry
            # lookup and the cancel. If so, surface the real result rather
            # than reporting 'cancelled' and losing the output.
            if task.done():
                return _status_from_finished_task(
                    session, tool_name, background_id, task
                )
            # Dry-run: simulate cancellation without touching the task, so
            # the LLM can reason about the flow without real side effects.
            if session.dry_run:
                return BackgroundToolStatus(
                    message=(
                        f"[dry-run] Would cancel background task for '{tool_name}'."
                    ),
                    session_id=session.session_id,
                    status="cancelled",
                    tool=tool_name,
                    background_id=background_id,
                )
            task.cancel()
            unregister_background_task(background_id)
            logger.info(
                "Cancelled background task %s for tool %s by agent request",
                background_id,
                tool_name,
            )
            return BackgroundToolStatus(
                message=f"Cancelled background task for '{tool_name}'.",
                session_id=session.session_id,
                status="cancelled",
                tool=tool_name,
                background_id=background_id,
            )

        if task.done():
            return _status_from_finished_task(session, tool_name, background_id, task)

        effective_wait = max(0, min(wait_seconds, _MAX_BACKGROUND_WAIT_SECONDS))
        if effective_wait == 0:
            return BackgroundToolStatus(
                message=(
                    f"'{tool_name}' is still running. Call again with "
                    "wait_seconds>0 to wait, or cancel=true to abort."
                ),
                session_id=session.session_id,
                status="still_running",
                tool=tool_name,
                background_id=background_id,
            )

        await asyncio.wait({task}, timeout=effective_wait)
        if task.done():
            return _status_from_finished_task(session, tool_name, background_id, task)

        return BackgroundToolStatus(
            message=(
                f"'{tool_name}' still running after waiting "
                f"{effective_wait}s. Call again to keep waiting, or "
                "cancel=true to abort."
            ),
            session_id=session.session_id,
            status="still_running",
            tool=tool_name,
            background_id=background_id,
            waited_seconds=effective_wait,
        )


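The clamp-and-wait step at the heart of `_execute` can be isolated into a small runnable sketch. `clamp_wait` and `wait_once` are illustrative names; the cap value 300 is assumed for the example only (the real cap is `MAX_BACKGROUND_WAIT_SECONDS` from the registry module):

```python
import asyncio

MAX_BACKGROUND_WAIT_SECONDS = 300  # assumed value for illustration

def clamp_wait(wait_seconds: int) -> int:
    # Negative values become 0 (status-only probe); large values are capped,
    # and the caller is expected to call again to keep waiting.
    return max(0, min(wait_seconds, MAX_BACKGROUND_WAIT_SECONDS))

async def wait_once(task: asyncio.Task, wait_seconds: int) -> str:
    effective = clamp_wait(wait_seconds)
    if effective > 0:
        # asyncio.wait returns after the timeout OR when the task finishes;
        # it never cancels the task itself.
        await asyncio.wait({task}, timeout=effective)
    return "completed" if task.done() else "still_running"

async def demo():
    task = asyncio.create_task(asyncio.sleep(0.05))
    first = await wait_once(task, 0)        # status-only probe
    second = await wait_once(task, 10_000)  # clamped, but task finishes fast
    return first, second

first, second = asyncio.run(demo())
print(first, second)
```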
def _list_response(session: ChatSession) -> BackgroundToolList:
    """Build the response for ``check_background_tool(list=true)``."""
    now = time.monotonic()
    entries = [
        BackgroundToolListEntry(
            background_id=e["background_id"],
            tool=e["tool_name"],
            age_seconds=round(now - e["started_at"], 2),
            done=e["done"],
        )
        for e in list_background_tasks()
    ]
    count = len(entries)
    msg = (
        f"{count} active background task(s)."
        if count
        else "No active background tasks."
    )
    return BackgroundToolList(
        message=msg,
        session_id=session.session_id,
        tasks=entries,
    )


def _status_from_finished_task(
    session: ChatSession,
    tool_name: str,
    background_id: str,
    task: asyncio.Task,
) -> ToolResponseBase:
    """Unregister a finished task and return its status."""
    unregister_background_task(background_id)

    if task.cancelled():
        return BackgroundToolStatus(
            message=f"Background task for '{tool_name}' was cancelled.",
            session_id=session.session_id,
            status="cancelled",
            tool=tool_name,
            background_id=background_id,
        )

    exc = task.exception()
    if exc is not None:
        return BackgroundToolStatus(
            message=f"'{tool_name}' raised {type(exc).__name__}: {exc}",
            session_id=session.session_id,
            status="error",
            tool=tool_name,
            background_id=background_id,
        )

    result = task.result()
    # A tool can complete with success=False without raising — preserve
    # that as status="error" so the agent doesn't treat it as a win.
    if not result.success:
        return BackgroundToolStatus(
            message=f"'{tool_name}' completed with an error.",
            session_id=session.session_id,
            status="error",
            tool=tool_name,
            background_id=background_id,
            output=result.output,
        )
    return BackgroundToolStatus(
        message=f"'{tool_name}' completed.",
        session_id=session.session_id,
        status="completed",
        tool=tool_name,
        background_id=background_id,
        output=result.output,
    )
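The triage order in `_status_from_finished_task` matters: `task.exception()` raises `CancelledError` on a cancelled task, so cancellation must be checked first. A self-contained sketch of that ordering (`triage` is a hypothetical helper; the real function returns response models, and "success" here is reduced to a bare bool):

```python
import asyncio

def triage(task: asyncio.Task) -> str:
    # Mirrors _status_from_finished_task: cancellation first (exception()
    # would raise on a cancelled task), then raised exceptions, then the
    # result's own success flag (a plain bool in this sketch).
    if task.cancelled():
        return "cancelled"
    if task.exception() is not None:
        return "error"
    return "completed" if task.result() else "error"

async def _raise():
    raise ValueError("boom")

async def demo():
    ok = asyncio.create_task(asyncio.sleep(0, result=True))
    failed = asyncio.create_task(asyncio.sleep(0, result=False))
    boom = asyncio.create_task(_raise())
    gone = asyncio.create_task(asyncio.sleep(10))
    gone.cancel()
    for t in (ok, failed, boom, gone):
        try:
            await t
        except (ValueError, asyncio.CancelledError):
            pass
    return [triage(t) for t in (ok, failed, boom, gone)]

statuses = asyncio.run(demo())
print(statuses)
```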
@@ -0,0 +1,318 @@
"""Tests for CheckBackgroundToolTool."""

import asyncio
import contextlib
from unittest.mock import MagicMock

import pytest

from backend.copilot.response_model import StreamToolOutputAvailable
from backend.copilot.sdk.background_registry import (
    init_registry,
    register_background_task,
)

from .check_background_tool import CheckBackgroundToolTool
from .models import BackgroundToolList, BackgroundToolStatus


def _make_session() -> MagicMock:
    session = MagicMock()
    session.session_id = "s1"
    session.dry_run = False
    return session


def _completed_result(output: str = "ok") -> StreamToolOutputAvailable:
    return StreamToolOutputAvailable(
        toolCallId="tc-1",
        output=output,
        toolName="slow_tool",
        success=True,
    )


@pytest.fixture(autouse=True)
def _init_registry_for_each_test():
    init_registry()


class TestCheckBackgroundTool:
    @pytest.mark.asyncio
    async def test_missing_background_id_returns_error(self):
        tool = CheckBackgroundToolTool()
        response = await tool._execute(
            user_id="u",
            session=_make_session(),
            background_id="",
        )
        assert response.type.value == "error"

    @pytest.mark.asyncio
    async def test_unknown_background_id_returns_error(self):
        tool = CheckBackgroundToolTool()
        response = await tool._execute(
            user_id="u",
            session=_make_session(),
            background_id="bg-does-not-exist",
        )
        assert response.type.value == "error"
        assert "No background task" in response.message

    @pytest.mark.asyncio
    async def test_wait_zero_returns_still_running(self):
        async def slow():
            await asyncio.sleep(10)
            return _completed_result()

        task = asyncio.create_task(slow())
        bg_id = register_background_task(task, "slow_tool")

        tool = CheckBackgroundToolTool()
        response = await tool._execute(
            user_id="u",
            session=_make_session(),
            background_id=bg_id,
            wait_seconds=0,
        )
        assert isinstance(response, BackgroundToolStatus)
        assert response.status == "still_running"
        assert response.background_id == bg_id

        task.cancel()
        with contextlib.suppress(asyncio.CancelledError):
            await task

    @pytest.mark.asyncio
    async def test_wait_returns_completed_when_task_finishes(self):
        async def fast():
            await asyncio.sleep(0.05)
            return _completed_result("final-output")

        task = asyncio.create_task(fast())
        bg_id = register_background_task(task, "slow_tool")

        tool = CheckBackgroundToolTool()
        response = await tool._execute(
            user_id="u",
            session=_make_session(),
            background_id=bg_id,
            wait_seconds=5,
        )
        assert isinstance(response, BackgroundToolStatus)
        assert response.status == "completed"
        assert response.output == "final-output"

    @pytest.mark.asyncio
    async def test_wait_times_out_and_returns_still_running(self):
        async def slow():
            await asyncio.sleep(10)
            return _completed_result()

        task = asyncio.create_task(slow())
        bg_id = register_background_task(task, "slow_tool")

        tool = CheckBackgroundToolTool()
        response = await tool._execute(
            user_id="u",
            session=_make_session(),
            background_id=bg_id,
            wait_seconds=1,
        )
        assert isinstance(response, BackgroundToolStatus)
        assert response.status == "still_running"
        assert response.waited_seconds == 1

        task.cancel()
        with contextlib.suppress(asyncio.CancelledError):
            await task

    @pytest.mark.asyncio
    async def test_cancel_true_cancels_and_removes_from_registry(self):
        observed_cancel = asyncio.Event()

        async def stays_until_cancelled():
            try:
                await asyncio.sleep(60)
            except asyncio.CancelledError:
                observed_cancel.set()
                raise
            return _completed_result()

        task = asyncio.create_task(stays_until_cancelled())
        # Let the task start before we cancel it.
        await asyncio.sleep(0)

        bg_id = register_background_task(task, "slow_tool")

        tool = CheckBackgroundToolTool()
        response = await tool._execute(
            user_id="u",
            session=_make_session(),
            background_id=bg_id,
            cancel=True,
        )
        assert isinstance(response, BackgroundToolStatus)
        assert response.status == "cancelled"

        with contextlib.suppress(asyncio.CancelledError):
            await task
        assert observed_cancel.is_set()

        from backend.copilot.sdk.background_registry import get_background_task

        assert get_background_task(bg_id) is None

    @pytest.mark.asyncio
    async def test_cancel_after_task_completed_returns_real_result(self):
        """If the task completes between registration and the agent's
        cancel=true call, surface the real result instead of reporting
        'cancelled' and losing the output (race guard)."""

        async def finish_quickly():
            return _completed_result("final-value")

        task = asyncio.create_task(finish_quickly())
        await task  # definitely done by the time we register
        bg_id = register_background_task(task, "slow_tool")

        tool = CheckBackgroundToolTool()
        response = await tool._execute(
            user_id="u",
            session=_make_session(),
            background_id=bg_id,
            cancel=True,
        )
        assert isinstance(response, BackgroundToolStatus)
        assert response.status == "completed"
        assert response.output == "final-value"

    @pytest.mark.asyncio
    async def test_errored_task_reports_error_status(self):
        async def raises():
            raise ValueError("boom")

        task = asyncio.create_task(raises())
        # Let the task complete before we query it.
        try:
            await task
        except ValueError:
            pass
        bg_id = register_background_task(task, "broken_tool")

        tool = CheckBackgroundToolTool()
        response = await tool._execute(
            user_id="u",
            session=_make_session(),
            background_id=bg_id,
        )
        assert isinstance(response, BackgroundToolStatus)
        assert response.status == "error"
        assert "boom" in response.message

    @pytest.mark.asyncio
    async def test_finished_task_with_success_false_reports_error(self):
        """A tool that completes with success=False (without raising) is
        reported as status='error', not 'completed', so the agent doesn't
        treat it as a win."""

        async def finish_with_failure():
            return StreamToolOutputAvailable(
                toolCallId="tc-1",
                output="partial",
                toolName="broken_tool",
                success=False,
            )

        task = asyncio.create_task(finish_with_failure())
        await task
        bg_id = register_background_task(task, "broken_tool")

        tool = CheckBackgroundToolTool()
        response = await tool._execute(
            user_id="u",
            session=_make_session(),
            background_id=bg_id,
        )
        assert isinstance(response, BackgroundToolStatus)
        assert response.status == "error"
        assert response.output == "partial"

    @pytest.mark.asyncio
    async def test_list_true_returns_active_background_tasks(self):
        """list=true enumerates registered tasks so the agent can recover
        forgotten background_ids."""

        async def hang():
            await asyncio.sleep(60)

        tasks = [asyncio.create_task(hang()) for _ in range(2)]
        await asyncio.sleep(0)
        bg_ids = [register_background_task(t, f"tool_{i}") for i, t in enumerate(tasks)]

        tool = CheckBackgroundToolTool()
        response = await tool._execute(
            user_id="u",
            session=_make_session(),
            list=True,
        )
        assert isinstance(response, BackgroundToolList)
        assert len(response.tasks) == 2
        returned_ids = {entry.background_id for entry in response.tasks}
        assert returned_ids == set(bg_ids)
        for entry in response.tasks:
            assert entry.tool.startswith("tool_")
            assert entry.age_seconds >= 0
            assert entry.done is False

        for t in tasks:
            t.cancel()
            with contextlib.suppress(asyncio.CancelledError):
                await t

    @pytest.mark.asyncio
    async def test_list_true_empty_when_no_tasks(self):
        tool = CheckBackgroundToolTool()
        response = await tool._execute(
            user_id="u",
            session=_make_session(),
            list=True,
        )
        assert isinstance(response, BackgroundToolList)
        assert response.tasks == []

    @pytest.mark.asyncio
    async def test_cancel_in_dry_run_does_not_actually_cancel_task(self):
        """Under session.dry_run, cancel=true must not kill the real task."""

        async def hang():
            await asyncio.sleep(60)

        task = asyncio.create_task(hang())
        await asyncio.sleep(0)
        bg_id = register_background_task(task, "slow_tool")

        session = _make_session()
        session.dry_run = True

        tool = CheckBackgroundToolTool()
        response = await tool._execute(
            user_id="u",
            session=session,
            background_id=bg_id,
            cancel=True,
        )
        assert isinstance(response, BackgroundToolStatus)
        assert response.status == "cancelled"
        assert "[dry-run]" in response.message
        # Real task is still running.
        assert not task.done()

        # Cleanup.
        task.cancel()
        with contextlib.suppress(asyncio.CancelledError):
            await task

    def test_requires_auth_is_true(self):
        tool = CheckBackgroundToolTool()
        assert tool.requires_auth is True

@@ -28,6 +28,12 @@ class ContinueRunBlockTool(BaseTool):
    def name(self) -> str:
        return "continue_run_block"

    @property
    def timeout_seconds(self) -> int | None:
        # Resumes an execution that may be a long-running (sub-AutoPilot)
        # block — same lifecycle as run_block.
        return None

    @property
    def description(self) -> str:
        return "Resume block execution after a run_block call returned review_required. Pass the review_id."

@@ -259,6 +259,45 @@ class ErrorResponse(ToolResponseBase):
    details: dict[str, Any] | None = None


class BackgroundToolStatus(ToolResponseBase):
    """Status of a backgrounded tool call, returned by ``check_background_tool``."""

    type: ResponseType = ResponseType.MCP_TOOL_OUTPUT
    status: Literal["completed", "still_running", "cancelled", "error"] = Field(
        description="Current state of the background task."
    )
    tool: str = Field(description="The name of the originally-backgrounded tool.")
    background_id: str
    output: Any | None = Field(
        default=None,
        description="Tool output when status=completed or status=error.",
    )
    waited_seconds: int | None = Field(default=None)


class BackgroundToolListEntry(BaseModel):
    """One row in a ``check_background_tool(list=true)`` response."""

    background_id: str
    tool: str = Field(description="Name of the originally-backgrounded tool.")
    age_seconds: float = Field(
        description="Seconds since the task was parked in the background."
    )
    done: bool = Field(
        description="True if the task has finished but hasn't been consumed yet."
    )


class BackgroundToolList(ToolResponseBase):
    """List of active background tasks, returned by ``check_background_tool(list=true)``."""

    type: ResponseType = ResponseType.MCP_TOOL_OUTPUT
    tasks: list[BackgroundToolListEntry] = Field(
        default_factory=list,
        description="All background tasks currently registered for this session.",
    )


class InputValidationErrorResponse(ToolResponseBase):
    """Response when run_agent receives unknown input fields."""

@@ -104,6 +104,13 @@ class RunAgentTool(BaseTool):
    def name(self) -> str:
        return "run_agent"

    @property
    def timeout_seconds(self) -> int | None:
        # Agent executions can legitimately run 15-45+ min; the tool polls
        # its own wait_for_result window and returns an execution_id for
        # later progress checks, so the stream-level timeout isn't needed.
        return None

    @property
    def description(self) -> str:
        return (

@@ -27,6 +27,13 @@ class RunBlockTool(BaseTool):
    def name(self) -> str:
        return "run_block"

    @property
    def timeout_seconds(self) -> int | None:
        # May delegate to AutoPilotBlock (sub-autopilot), which runs its own
        # multi-turn stream of 15-45+ min. Per-call timeout is disabled here
        # and left to the block's own execution lifecycle.
        return None

    @property
    def description(self) -> str:
        return (

@@ -1354,7 +1354,9 @@
            },
            {
              "$ref": "#/components/schemas/MemoryForgetConfirmResponse"
            },
            { "$ref": "#/components/schemas/BackgroundToolStatus" },
            { "$ref": "#/components/schemas/BackgroundToolList" }
          ],
          "title": "Response Getv2[Dummy] Tool Response Type Export For Codegen"
        }

@@ -8430,6 +8432,91 @@
|
||||
"required": ["sso_url", "expires_at"],
|
||||
"title": "AyrshareSSOResponse"
|
||||
},
|
||||
"BackgroundToolList": {
|
||||
"properties": {
|
||||
"type": {
|
||||
"$ref": "#/components/schemas/ResponseType",
|
||||
"default": "mcp_tool_output"
|
||||
},
|
||||
"message": { "type": "string", "title": "Message" },
|
||||
"session_id": {
|
||||
"anyOf": [{ "type": "string" }, { "type": "null" }],
|
||||
"title": "Session Id"
|
||||
},
|
||||
"tasks": {
|
||||
"items": { "$ref": "#/components/schemas/BackgroundToolListEntry" },
|
||||
"type": "array",
|
||||
"title": "Tasks",
|
||||
      "description": "All background tasks currently registered for this session."
    }
  },
  "type": "object",
  "required": ["message"],
  "title": "BackgroundToolList",
  "description": "List of active background tasks, returned by ``check_background_tool(list=true)``."
},
"BackgroundToolListEntry": {
  "properties": {
    "background_id": { "type": "string", "title": "Background Id" },
    "tool": {
      "type": "string",
      "title": "Tool",
      "description": "Name of the originally-backgrounded tool."
    },
    "age_seconds": {
      "type": "number",
      "title": "Age Seconds",
      "description": "Seconds since the task was parked in the background."
    },
    "done": {
      "type": "boolean",
      "title": "Done",
      "description": "True if the task has finished but hasn't been consumed yet."
    }
  },
  "type": "object",
  "required": ["background_id", "tool", "age_seconds", "done"],
  "title": "BackgroundToolListEntry",
  "description": "One row in a ``check_background_tool(list=true)`` response."
},
"BackgroundToolStatus": {
  "properties": {
    "type": {
      "$ref": "#/components/schemas/ResponseType",
      "default": "mcp_tool_output"
    },
    "message": { "type": "string", "title": "Message" },
    "session_id": {
      "anyOf": [{ "type": "string" }, { "type": "null" }],
      "title": "Session Id"
    },
    "status": {
      "type": "string",
      "enum": ["completed", "still_running", "cancelled", "error"],
      "title": "Status",
      "description": "Current state of the background task."
    },
    "tool": {
      "type": "string",
      "title": "Tool",
      "description": "The name of the originally-backgrounded tool."
    },
    "background_id": { "type": "string", "title": "Background Id" },
    "output": {
      "anyOf": [{}, { "type": "null" }],
      "title": "Output",
      "description": "Tool output when status=completed or status=error."
    },
    "waited_seconds": {
      "anyOf": [{ "type": "integer" }, { "type": "null" }],
      "title": "Waited Seconds"
    }
  },
  "type": "object",
  "required": ["message", "status", "tool", "background_id"],
  "title": "BackgroundToolStatus",
  "description": "Status of a backgrounded tool call, returned by ``check_background_tool``."
},
"BaseGraph-Input": {
  "properties": {
    "id": { "type": "string", "title": "Id" },
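The schemas above can be sanity-checked with a small sketch. The payload values below are illustrative assumptions; only the field names, types, `enum` values, and `required` lists come from the schema itself.

```python
import json

# Hypothetical check_background_tool responses; values are invented for
# illustration, while the keys mirror the schemas in this diff.
status_payload = {
    "type": "mcp_tool_output",            # schema default
    "message": "Background task finished.",
    "session_id": None,                   # nullable per anyOf [string, null]
    "status": "completed",                # must be one of the enum values
    "tool": "web_search",
    "background_id": "bg-123",
    "output": {"results": []},            # present when status=completed/error
    "waited_seconds": 5,
}

list_entry = {
    "background_id": "bg-123",
    "tool": "web_search",
    "age_seconds": 12.5,
    "done": True,
}

# Verify each payload carries every field its schema marks as required.
for payload, required in [
    (status_payload, ["message", "status", "tool", "background_id"]),
    (list_entry, ["background_id", "tool", "age_seconds", "done"]),
]:
    missing = [k for k in required if k not in payload]
    assert not missing, f"missing required fields: {missing}"

print(status_payload["status"])  # → completed
```

This only checks key presence, not types; a real validator (e.g. a JSON Schema library) would also enforce the `enum` and `anyOf` constraints.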
BIN  pr-12841/ui-native-v3/ui-01-async-top.png (new file, 100 KiB)
BIN  pr-12841/ui-native-v3/ui-02-async-running.png (new file, 107 KiB)
BIN  pr-12841/ui-native-v3/ui-03-async-completed.png (new file, 123 KiB)
BIN  pr-12841/ui-native-v3/ui-04-sub-session-opened.png (new file, 78 KiB)
BIN  pr-12841/ui-proof/02-ui-inline-wait60-result.png (new file, 96 KiB)
BIN  pr-12841/ui-proof/03-reasoning-modal-tool-card.png (new file, 121 KiB)
BIN  pr-12841/ui-proof/04-reasoning-modal-tool-output-expanded.png (new file, 116 KiB)
BIN  pr-12841/ui-proof/05-async-path-call1-status-running.png (new file, 101 KiB)
BIN  pr-12841/ui-proof/06-async-path-call2-completed.png (new file, 108 KiB)
BIN  pr-12841/ui-proof/07-ui-assistant-rendered-json.png (new file, 81 KiB)
BIN  test-screenshots/PR-12841/01-logged-in.png (new file, 33 KiB)
BIN  test-screenshots/PR-12841/02-copilot-page.png (new file, 90 KiB)
BIN  test-screenshots/PR-12841/03-check-background-list-ui.png (new file, 95 KiB)
BIN  test-screenshots/PR-12841/03-check-background-tool-call.png (new file, 11 KiB)
BIN  test-screenshots/PR-12841/04-check-background-unknown-ui.png (new file, 99 KiB)
BIN  test-screenshots/PR-12841/05-parked-then-completed-ui.png (new file, 102 KiB)
BIN  test-screenshots/PR-12841/05a-parked-response-top.png (new file, 102 KiB)
BIN  test-screenshots/PR-12841/06-parked-then-cancelled-ui.png (new file, 98 KiB)