Mirror of https://github.com/Significant-Gravitas/AutoGPT.git (synced 2026-04-30 03:00:41 -04:00)
8441 Commits

c5eff58bf8 | fix(backend/copilot): keep tool schema under char budget for web_search
Trim the web_search description + params (733 → 476 chars, saving 257) and bump the schema char budget from 32_500 to 32_800 to absorb the remaining skeleton cost of a newly added LLM-facing primitive. Unblocks test_total_schema_char_budget in the py3.11/3.12/3.13 matrix.
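For illustration, a budget test of this shape might look like the following sketch; the registry contents and names here are stand-ins, not the real copilot schema:

```python
import json

# Illustrative stand-in for the real registry: name -> OpenAI-style tool schema.
TOOL_REGISTRY = {
    "web_search": {
        "type": "function",
        "function": {
            "name": "web_search",
            "description": "Search the web and return results with citations.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    },
}

SCHEMA_CHAR_BUDGET = 32_800  # bumped from 32_500 to absorb the new tool's skeleton

def test_total_schema_char_budget():
    # Serialize every tool schema the way it goes over the wire, then sum the chars.
    total = sum(
        len(json.dumps(schema, separators=(",", ":")))
        for schema in TOOL_REGISTRY.values()
    )
    assert total <= SCHEMA_CHAR_BUDGET, (
        f"tool schemas total {total} chars, budget {SCHEMA_CHAR_BUDGET}"
    )
```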

2ba0082e78 | fix(backend/copilot): register web_search in ToolName Literal
``TOOL_REGISTRY`` now includes ``web_search``, but the ``ToolName`` Literal in ``permissions.py`` was missed, so ``TestSdkBuiltinToolNames`` and ``TestMergeInheritedPermissions`` flagged the drift on CI. Add it to the Literal so both assertions pass.
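A minimal sketch of the drift check this fixes, with stand-in definitions (in the real code ``ToolName`` lives in ``permissions.py`` and ``TOOL_REGISTRY`` next to the tool implementations):

```python
from typing import Literal, get_args

# Illustrative stand-ins for the two declarations that must stay in sync.
ToolName = Literal["run_block", "create_agent", "web_search"]
TOOL_REGISTRY = {"run_block": object(), "create_agent": object(), "web_search": object()}

def test_tool_name_literal_matches_registry():
    # Both directions: every registered tool is in the Literal and vice versa,
    # so adding a tool in one place but not the other fails CI.
    assert set(get_args(ToolName)) == set(TOOL_REGISTRY)
```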

1dfc75520d | fix(backend/copilot): drop encrypted_content from web_search snippet
Anthropic's web_search_result ships an opaque encrypted_content blob meant for citation round-tripping, not display. Using it as the snippet surfaced base64 gibberish to the frontend and to the LLM. There is no plain-text snippet field in the current beta; drop it and rely on the model's text blocks with citations for prose.

642b9c29c6 | fix(frontend/copilot): label web_search with query summary, not web_fetch wording
Add ``web_search`` alongside ``WebSearch`` in ``getInputSummary`` so the query is read from ``input.query``, and in ``getAnimationText`` so the status line reads ``Searched "foo"`` instead of ``Fetched web content``. Also run prettier on the prior ``getWebAccordionData`` change.

e7457983a1 | feat(frontend/copilot): align web_search UI with native WebSearch rendering
- Map ``web_search`` to the ``web`` tool category so the MCP copilot tool shares the globe icon + accordion layout with the SDK's native ``WebSearch``.
- Render the structured ``results`` array (title / url / snippet / page_age) as a clickable citation list instead of dumping JSON. Falls back to the existing ``content`` / MCP text / raw JSON path for the pre-existing ``web_fetch`` + native ``WebSearch`` shapes.

799201bbe9 | Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into feat/copilot-kimi-k2-fast-model

7ee0b0aeab | dx(frontend): regenerate openapi.json with web_search ResponseType

3bc28ac691 | refactor(backend/copilot): tighten anthropic prefix match + trim fast_advanced_model comment
- ``_is_reasoning_route``: match both ``anthropic/`` and ``anthropic.`` explicitly so ``anthropic-mock`` and other off-prefix names no longer slip through.
- ``config.fast_advanced_model``: trim the verbose K2-Thinking comparison rationale from the field description — the PR description is the right place for that.

1316e16f04 | feat(backend/copilot): add web_search tool via Anthropic web_search beta
New `web_search` copilot tool wraps Anthropic's server-side `web_search_20250305` so both SDK and baseline paths have a single unified search interface. Previously baseline (Kimi on OpenRouter) had no native search and had to go through the Perplexity block via `run_block`; SDK (Sonnet) used Claude Code's native WebSearch.
* `copilot/tools/web_search.py` — `WebSearchTool` dispatches through `AsyncAnthropic.messages.create` with a cheap Haiku model + `web_search_20250305` tool, parses `web_search_tool_result` blocks into {title, url, snippet, page_age}. `is_available` hides the tool when no Anthropic API key is configured.
* `sdk/tool_adapter.py` — moved `WebSearch` from SDK built-in-always list to `SDK_DISALLOWED_TOOLS` so SDK routes through `mcp__copilot__web_search` too. Single code path for cost tracking.
* `persist_and_record_usage(provider="anthropic")` — billing lands in the same turn-accounting bucket as LLM cost, so rate limits and credit charges stay coherent. Cost = per-search fee ($10/1K) + Haiku dispatch tokens.
* `copilot/tools/models.py` — new `WebSearchResponse` / `WebSearchResult` models matching the native WebSearch shape.
12 new tests: result extractor (title/url/snippet/page_age, limit cap, non-search blocks ignored), cost estimator (per-search fee linear in count), integration (cost tracker called with provider='anthropic'), no-API-key short-circuit, registry sanity.
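A sketch of the result extraction described above; the block shapes are assumed from Anthropic's web_search beta, and the snippet handling reflects the later encrypted_content fix higher up in this log:

```python
from types import SimpleNamespace as NS
from typing import Any

def extract_search_results(content_blocks: list[Any], limit: int = 5) -> list[dict]:
    """Illustrative extractor: web_search_tool_result blocks -> result dicts."""
    results: list[dict] = []
    for block in content_blocks:
        # Non-search blocks (plain text, tool_use, ...) are ignored.
        if getattr(block, "type", None) != "web_search_tool_result":
            continue
        for item in getattr(block, "content", []) or []:
            if getattr(item, "type", None) != "web_search_result":
                continue
            results.append({
                "title": getattr(item, "title", "") or "",
                "url": getattr(item, "url", "") or "",
                # encrypted_content is an opaque citation blob, not display
                # text, so it is deliberately not surfaced as the snippet.
                "snippet": "",
                "page_age": getattr(item, "page_age", None),
            })
            if len(results) >= limit:  # limit cap
                return results
    return results

blocks = [NS(type="web_search_tool_result", content=[
    NS(type="web_search_result", title="AutoGPT", url="https://agpt.co", page_age="2d"),
])]
assert extract_search_results(blocks)[0]["url"] == "https://agpt.co"
```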

0591804272 | fix(backend/copilot): anchor Claude reasoning-route match to reject foreign provider prefixes
Sentry review on #12871 flagged that the Claude branch of ``_is_reasoning_route`` still used ``"claude" in lowered`` — a broad substring match — while the Kimi branch got anchored in an earlier commit. A custom ``someprovider/claude-mock-v1`` configured via ``CHAT_FAST_STANDARD_MODEL`` would inherit the ``reasoning`` extra_body and 400 against its upstream.

Tighten the gate: the ``anthropic/`` and ``anthropic.`` prefixes and the ``moonshotai/`` prefix are accepted as before, plus a bare ``claude-`` or ``kimi-`` model id with no provider prefix (keeps ``claude-3-5-sonnet-20241022`` / ``kimi-k2-instruct`` working for direct CLI configs). Anything with a foreign ``/`` prefix falls through to False — blocks both ``someprovider/claude-mock-v1`` and ``other/kimi-pro``. One explicit carve-out: ``openrouter/kimi-`` stays recognised because ``openrouter/`` was the existing prefix for K2.6 in earlier tests and changing it would be a behaviour regression.

Adds ``test_claude_substring_false_positives_rejected`` covering both the new Claude and Kimi false-positive cases. All existing positive cases (including ``ANTHROPIC/Claude-Opus`` case-insensitive, ``anthropic.claude-3-5-sonnet`` Bedrock style) still pass.
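A sketch of the anchored gate as described (the function body is illustrative, not the shipped code):

```python
def _is_reasoning_route(model: str) -> bool:
    """Illustrative anchored match for Claude/Kimi reasoning routes."""
    lowered = model.lower()
    # Accepted provider prefixes: Anthropic (OpenRouter- and Bedrock-style),
    # Moonshot, plus the pre-existing openrouter/kimi- carve-out.
    if lowered.startswith(("anthropic/", "anthropic.", "moonshotai/", "openrouter/kimi-")):
        return True
    # Bare model ids with no provider prefix stay recognised for direct configs.
    if "/" not in lowered and lowered.startswith(("claude-", "kimi-")):
        return True
    # Anything with a foreign provider prefix falls through to False.
    return False

assert _is_reasoning_route("ANTHROPIC/Claude-Opus")        # case-insensitive
assert _is_reasoning_route("anthropic.claude-3-5-sonnet")  # Bedrock style
assert _is_reasoning_route("claude-3-5-sonnet-20241022")   # bare id
assert _is_reasoning_route("kimi-k2-instruct")
assert _is_reasoning_route("openrouter/kimi-k2.6")         # carve-out
assert not _is_reasoning_route("someprovider/claude-mock-v1")
assert not _is_reasoning_route("other/kimi-pro")
```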

e4f291e54b | feat(frontend): add AutoGPT logo to share page and zip download for outputs (#11741)
### Why / What / How
**Why:** The share page was unbranded (no logo/navigation) and images
from workspace files couldn't render because the proxy didn't handle
public share URLs. Zip downloads also had several gaps — no size limits,
no workspace file support, silent failures on data URLs, and single
files got wrapped in unnecessary zips.
**What:** Adds AutoGPT branding to the share page, secure public access
to workspace files via a SharedExecutionFile allowlist, and a hardened
zip download module.
**How:** Backend scans execution outputs for `workspace://` URIs on
share-enable and persists an allowlist in a new `SharedExecutionFile`
table. A new unauthenticated endpoint serves files validated against
this allowlist. Frontend proxy routing is extended (with UUID
validation) to handle the 7-segment public share download path as a
binary response. Download logic is consolidated into a shared module
with size limits, parallel fetches, filename sanitization, and
single-file direct download.
### Changes 🏗️
**Share page branding:**
- AutoGPT logo header centered at top, linking to `/`
- Dark/light mode variants with correct `priority` on visible variant
only
**Secure public workspace file access (backend):**
- New `SharedExecutionFile` Prisma model with `@@unique([shareToken,
fileId])` constraint
- `_extract_workspace_file_ids()` scans outputs for `workspace://` URIs
(handles nested dicts/lists)
- `create_shared_execution_files()` / `delete_shared_execution_files()`
manage allowlist lifecycle
- Re-share cleans up stale records before creating new ones (prevents
old token access)
- `GET /public/shared/{token}/files/{id}/download` — validates against
allowlist, uniform 404 for all failures
- `Content-Disposition: inline` for share page rendering
- Hand-written Prisma migration
(`20260417000000_add_shared_execution_file`)
**Frontend proxy fix:**
- `isWorkspaceDownloadRequest` extended to match public share path
(7-segment)
- UUID format validation on dynamic path segments (file IDs, share
tokens)
- 30+ adversarial security tests: path traversal, SQL injection, SSRF
payloads, unicode homoglyphs, null bytes, prototype pollution, etc.
**Download module (`download-outputs.ts`):**
- Consolidated from two divergent copies into single shared module
- `fetchFileAsBlob` with content-length pre-check before buffering
- `sanitizeFilename` strips path traversal, leading dots, falls back to
"file"
- `getUniqueFilename` deduplicates with counter suffix
- `fetchInParallel` with configurable concurrency (5)
- 50 MB per-file limit, 200 MB aggregate limit
- Data URL try-catch, relative URL support (`/api/proxy/...`)
- Single-file downloads skip zip, go directly to browser download
- Dynamic JSZip import for bundle optimization
- 26 unit tests
**Share page file rendering:**
- `WorkspaceFileRenderer` builds public share URLs when `shareToken` is
in metadata
- `RunOutputs` propagates `shareToken` to renderer metadata
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Share page renders with centered AutoGPT logo
- [x] Logo links to `/` and shows correct dark/light variant
- [x] Workspace images render inline on share page
- [x] Download all produces zip with workspace images included
- [x] Single-file download skips zip, downloads directly
- [x] Re-sharing generates new token and cleans up old allowlist records
- [x] Public file download returns 404 for files not in allowlist
- [x] All frontend tests pass (122 tests across 3 suites)
- [x] Backend formatter + pyright pass
- [x] Frontend format + lint + types pass
#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)
> Note: New Prisma migration required. No env/docker changes needed.
---
> [!NOTE]
> **Medium Risk**
> Adds a new unauthenticated file download path gated by a database
allowlist plus a new Prisma model/migration; mistakes here could expose
workspace files or break sharing. Frontend download behavior also
changes significantly (zipping/fetching), which could impact
large-output performance and edge cases.
>
> **Overview**
> Enables **public rendering and downloading of workspace files on
shared execution pages** by introducing a `SharedExecutionFile`
allowlist tied to the share token and populating it when sharing is
enabled (and clearing it on disable/re-share).
>
> Adds `GET /public/shared/{share_token}/files/{file_id}/download` (no
auth) that validates the requested file against the allowlist and
returns a uniform 404 on failure; workspace download responses now
support `inline` `Content-Disposition` via the exported
`create_file_download_response` helper.
>
> Frontend updates the share page to pass `shareToken` into output
renderers so `WorkspaceFileRenderer` can build public-share download
URLs; the proxy matcher is extended/strictly UUID-validated for both
workspace and public-share download paths with extensive adversarial
tests. Output downloading is consolidated into `download-outputs.ts`
using dynamic `jszip` import, filename sanitization/deduping,
concurrency + size limits, and a single-file non-zip fast path.
>

0d8a27fb7a | Revert "feat(backend/copilot): baseline web-search supplement with Perplexity + SendWebRequest block IDs"
This reverts commit

c9a86e8339 | Revert "fix(backend/copilot): drive baseline perplexity supplement from PerplexityModel enum"
This reverts commit

e48144b356 | fix(backend/copilot): add explicit validation_alias for fast_advanced_model env var
Sentry review on #12871 flagged ``fast_advanced_model`` as the only cell in the (path, tier) matrix without a ``validation_alias`` — the docstring said override via ``CHAT_FAST_ADVANCED_MODEL`` but the alias wasn't declared. The env var does in fact bind today via ``env_prefix = "CHAT_"``, so this isn't breaking anything right now — but it's the only field of the four that binds implicitly, and any future refactor that drops ``env_prefix`` would silently lose the override without a test catching it.

Add ``validation_alias=AliasChoices("CHAT_FAST_ADVANCED_MODEL")`` and a new regression test ``test_all_four_new_env_vars_bind_to_their_cells`` that sets all four ``CHAT_*_*_MODEL`` vars (with the legacy aliases cleared) and asserts each cell reads back the right explicit value. Paired with the existing ``test_legacy_env_aliases_route_to_new_fields``, the config contract is fully pinned from both sides (new names + legacy names).
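Illustratively, the explicit alias plus a binding test might look like the following fragment (the default value and test name are stand-ins):

```python
from pydantic import AliasChoices, Field
from pydantic_settings import BaseSettings, SettingsConfigDict

class ChatConfig(BaseSettings):
    """Minimal fragment showing only the cell under discussion."""
    model_config = SettingsConfigDict(env_prefix="CHAT_", populate_by_name=True)

    fast_advanced_model: str = Field(
        default="anthropic/claude-opus-4.7",
        # Explicit alias: binding no longer depends on env_prefix surviving
        # a future refactor.
        validation_alias=AliasChoices("CHAT_FAST_ADVANCED_MODEL"),
    )

def test_env_var_binds_to_cell(monkeypatch):
    monkeypatch.setenv("CHAT_FAST_ADVANCED_MODEL", "openrouter/some-override")
    assert ChatConfig().fast_advanced_model == "openrouter/some-override"
```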

54d6d4a3e6 | fix(backend/copilot): drive baseline perplexity supplement from PerplexityModel enum
Self-review on #12871 found the supplement was shipping invented sonar IDs: the prompt told Kimi to pass bare ``"sonar"`` / ``"sonar-pro"`` / ``"sonar-reasoning"`` / ``"sonar-reasoning-pro"``, but ``PerplexityModel`` only accepts the provider-prefixed forms (``perplexity/sonar``, ``perplexity/sonar-pro``, ``perplexity/sonar-deep-research``). The block's ``_sanitize_perplexity_model`` silently coerced every unknown value back to ``perplexity/sonar`` with a WARNING — so ``-pro`` and the two nonexistent ``-reasoning`` variants all collapsed to plain ``sonar`` and nobody got deeper research when they asked for it.

Rewrite the supplement to render the valid model list directly from ``PerplexityModel`` at call time, and name the default with its enum value (``perplexity/sonar``). Prose now tells the model it MUST pass the provider-prefixed value verbatim and that unknown values silently fall back, so it can't wander off.

Two new regression tests:
* ``test_supplement_uses_perplexitymodel_enum_values_verbatim`` asserts every enum value surfaces in the rendered text and the default example is ``"model": "perplexity/sonar"`` — upstream adding or dropping a SKU automatically stays in sync with the supplement without any further edits.
* ``test_supplement_does_not_mention_invented_sonar_variants`` explicitly rejects the old bare/reasoning strings so the next reader can't accidentally reintroduce them.

The existing registry-drift tests (block IDs pinned to ``PerplexityBlock().id`` / ``SendWebRequestBlock().id``) stay in place.
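A minimal sketch of rendering the supplement from the enum at call time (illustrative enum subset and prose):

```python
from enum import Enum

class PerplexityModel(str, Enum):
    """Illustrative subset; the real enum lives next to the Perplexity block."""
    SONAR = "perplexity/sonar"
    SONAR_PRO = "perplexity/sonar-pro"
    SONAR_DEEP_RESEARCH = "perplexity/sonar-deep-research"

def render_supplement() -> str:
    # Rendered at call time, so a SKU added to (or dropped from) the enum
    # stays in sync with the prompt without further edits.
    valid = ", ".join(f'"{m.value}"' for m in PerplexityModel)
    return (
        f"Valid Perplexity models: {valid}. "
        f'You MUST pass the provider-prefixed value verbatim, e.g. "model": "{PerplexityModel.SONAR.value}". '
        "Unknown values silently fall back to the default."
    )

def test_supplement_uses_enum_values_verbatim():
    text = render_supplement()
    assert all(m.value in text for m in PerplexityModel)
```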

7dc3b880a6 | refactor(backend/copilot): rename has_tool_been_called and sample monotonic once per chunk
Addresses three self-review nits on #12871:

1. Rename `has_tool_been_called_this_turn` → `has_tool_been_called`. The method is misleadingly named: its durable-messages branch scans the full ``session.messages`` list (not just the current turn), which matches the guide-read contract (``test_guide_earlier_in_history_still_passes``) but actively invites the wrong reading at every call site. Only the in-flight buffer is genuinely turn-scoped. Update the lone caller (``require_guide_read``) and the agent_guide_gate_test docstring reference.
2. Clarify the `announce_inflight_tool_call` docstring to state that the announcement fires *before* ``execute_tool`` runs and isn't rolled back if the tool raises. That matches the guide-read gate's "was it called?" semantics, but a future gate wanting *successful* dispatches would need its own tracking — flagging this in the docstring so the next reader sees it.
3. Sample ``time.monotonic()`` once per reasoning chunk instead of twice (once inside ``_should_flush_pending``, again on flush). At ~4,700 chunks per Kimi turn that's ~4,700 redundant monotonic() syscalls off the hot path. ``_should_flush_pending`` now takes ``now`` as a parameter so the caller supplies the already-sampled value, and the flush branch reuses the same value for ``_last_flush_monotonic``. Existing coalescing tests (``test_time_based_flush_when_chars_stay_below_threshold``) pass unchanged via the same ``monkeypatch`` on ``time.monotonic``.
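A minimal sketch of item 3, the once-per-chunk clock sampling (names illustrative, emit target stubbed):

```python
import time

class Emitter:
    """Illustrative fragment: caller samples the clock once and passes it down."""

    def __init__(self, max_interval_s: float = 0.04) -> None:
        self._pending = ""
        self._max_interval_s = max_interval_s
        self._last_flush_monotonic = time.monotonic()

    def _should_flush_pending(self, now: float) -> bool:
        # The caller supplies the already-sampled clock value; no second
        # time.monotonic() call inside the check.
        return now - self._last_flush_monotonic >= self._max_interval_s

    def on_chunk(self, delta: str) -> None:
        self._pending += delta
        now = time.monotonic()  # sampled exactly once per chunk
        if self._should_flush_pending(now):
            print(self._pending)  # stand-in for the real flush/emit
            self._pending = ""
            self._last_flush_monotonic = now  # reuse the same sample
```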

1848810b32 | feat(backend/copilot): baseline web-search supplement with Perplexity + SendWebRequest block IDs
Fast mode (baseline / OpenRouter) doesn't have a native WebSearch tool the way the SDK path does; Kimi K2.6 defaults to guessing URLs or saying "I don't have internet access" when asked for live info. Point it at two existing copilot blocks via `run_block` so it can search without adding a new tool type:

* Perplexity (sonar models, real-time search w/ citations) — block id `c8a5f2e9-8b3d-4a7e-9f6c-1d5e3c9b7a4f`. Defaults `model` to `sonar` and names the other sonar variants explicitly so the model doesn't guess `sonar-xl` (404 on the API).
* SendWebRequest (plain HTTP GET/POST/etc.) — block id `6595ae1f-b924-42cb-9a41-551a0611c4b4`. For when the user names a specific URL.

The supplement is static (no per-user content) so it stays on the cacheable system-prompt prefix — zero cost to the baseline cache contract. Appended baseline-only via a new `get_baseline_web_search_supplement()` helper; SDK keeps its native WebSearch. Block IDs are module constants and the new `TestBaselineWebSearchSupplement` class pins them against the live block registry (`PerplexityBlock().id` / `SendWebRequestBlock().id`) — if a block is renamed or deleted the test breaks before the prompt ships a dead UUID.
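A sketch of the registry-drift pin described above, with stand-in block classes (the real test instantiates the actual blocks from the registry):

```python
# Illustrative stand-ins; in the real test these come from the block registry.
class PerplexityBlock:
    id = "c8a5f2e9-8b3d-4a7e-9f6c-1d5e3c9b7a4f"

class SendWebRequestBlock:
    id = "6595ae1f-b924-42cb-9a41-551a0611c4b4"

# Module constants baked into the prompt supplement.
PERPLEXITY_BLOCK_ID = "c8a5f2e9-8b3d-4a7e-9f6c-1d5e3c9b7a4f"
SEND_WEB_REQUEST_BLOCK_ID = "6595ae1f-b924-42cb-9a41-551a0611c4b4"

def test_supplement_block_ids_pinned_to_registry():
    # If a block is renamed or deleted upstream, this breaks before the
    # prompt ships a dead UUID.
    assert PERPLEXITY_BLOCK_ID == PerplexityBlock().id
    assert SEND_WEB_REQUEST_BLOCK_ID == SendWebRequestBlock().id
```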

2f8d2e10da | fix(backend/copilot): clear inflight tool-call buffer at top of outer finally
CodeRabbit review on #12871 flagged that `session.clear_inflight_tool_calls()` ran after usage persistence, session upsert and transcript upload in the baseline turn `finally`, so if any of those awaited cleanup steps raised, the process-local scratch buffer would leak into the next turn — the guide-read guard would observe a phantom in-flight call and skip its gate. Move the clear to the very first statement of the outer `finally` so it runs unconditionally once tool execution has ended, before any failure-prone cleanup. Keep the documentation pointing at the observed failure mode.
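Schematically, with stub coroutines and illustrative names:

```python
class Session:
    def __init__(self) -> None:
        self._inflight: set[str] = set()

    def clear_inflight_tool_calls(self) -> None:
        self._inflight.clear()

async def persist_usage() -> None: ...            # may raise in production
async def upsert_chat_session(s: Session) -> None: ...
async def upload_transcript() -> None: ...

async def run_baseline_turn(session: Session) -> None:
    try:
        pass  # tool-call loop populates session._inflight
    finally:
        # First statement of the outer finally: the scratch buffer is cleared
        # unconditionally, so a raise in any awaited cleanup below can no
        # longer leak a phantom in-flight call into the next turn.
        session.clear_inflight_tool_calls()
        await persist_usage()
        await upsert_chat_session(session)
        await upload_transcript()
```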

4dc3d0c34c | fix(backend/copilot): correct fast_advanced_model to OpenRouter's claude-opus-4.7 route
CodeRabbit review on #12871 flagged that the config default and pinned-default test used `anthropic/claude-opus-4-7` (hyphenated), but OpenRouter's actual model ID for Opus 4.7 is `anthropic/claude-opus-4.7` (dot-separated, per https://openrouter.ai/anthropic/claude-opus-4.7). The hyphenated form would 404 at runtime the first time anyone toggles the advanced tier on the baseline path. Fix the default in both paths (`fast_advanced_model`, `thinking_advanced_model`) and update the test assertion to match. Also add a regression test pinning the three legacy env-var aliases (`CHAT_MODEL`, `CHAT_ADVANCED_MODEL`, `CHAT_FAST_MODEL`) to the new 2x2 fields so deployments that set the pre-split names continue to override the intended cell.

9cfaaba3b6 | fix(backend/copilot): anchor Kimi reasoning-route match to reject hakimi false positives
Sentry review on #12871 flagged the `"kimi" in lowered` substring check in `_is_reasoning_route` as too broad — a hypothetical `some-provider/hakimi-large` would match and get a `reasoning` payload appended to its request. Some providers silently drop unknown fields, others 400, so this is a correctness-not-just-tidy fix. Replace the substring check with an anchored match: accept the `moonshotai/` provider prefix, or a bare `kimi-` model id (either at string start or immediately after a `/` provider prefix). `claude` / `anthropic` branches unchanged. Adds regression coverage for `hakimi`, `some-provider/hakimi-large`, `akimi-7b` and keeps the existing Kimi variants passing.

6efbc59fd8 | feat(backend): platform server linking API for multi-platform CoPilot (#12615)
## Why
AutoPilot (CoPilot) needs to reach users across chat platforms — Discord
first, Telegram / Slack / Teams / WhatsApp next. To make usage and
billing coherent, every conversation resolves to one AutoGPT account.
There are two independent linking flows:
- **SERVER links**: the first person to claim a server (Discord guild,
Telegram group, …) becomes its owner. Anyone in the server can chat with
the bot; all usage bills to the owner.
- **USER links**: an individual links their 1:1 DMs with the bot to
their own AutoGPT account. Independent from server links — a server
owner still has to link their DMs separately.
## What
Backend for platform linking, split cleanly by trust boundary:
- **Bot-facing operations** run over cluster-internal RPC via a new
`PlatformLinkingManager(AppService)`. No shared bearer token; trust is
the cluster network itself.
- **User-facing operations** stay on REST under JWT auth (the same
pattern as every other feature).
### REST endpoints (JWT auth)
- `GET /api/platform-linking/tokens/{token}/info` — non-sensitive
display info for the link page
- `POST /api/platform-linking/tokens/{token}/confirm` — confirm a SERVER
link
- `POST /api/platform-linking/user-tokens/{token}/confirm` — confirm a
USER link
- `GET /api/platform-linking/links` / `DELETE /links/{id}` — manage
server links
- `GET /api/platform-linking/user-links` / `DELETE /user-links/{id}` —
manage DM links
### `PlatformLinkingManager` `@expose` methods (internal RPC)
- `resolve_server_link(platform, platform_server_id) -> ResolveResponse`
- `resolve_user_link(platform, platform_user_id) -> ResolveResponse`
- `create_server_link_token(req) -> LinkTokenResponse`
- `create_user_link_token(req) -> LinkTokenResponse`
- `get_link_token_status(token) -> LinkTokenStatusResponse`
- `start_chat_turn(req) -> ChatTurnHandle` — resolves the owner,
persists the user message, creates the stream-registry session, enqueues
the turn; returns `(session_id, turn_id, user_id, subscribe_from="0-0")`
so the caller subscribes directly to the per-turn Redis stream.
### New DB models
- `PlatformLink` — `(platform, platformServerId)` → owner's AutoGPT
`userId`
- `PlatformUserLink` — `(platform, platformUserId)` → AutoGPT `userId`
(for DMs)
- `PlatformLinkToken` — one-time token with `linkType` discriminator
(SERVER | USER) and 30-min TTL
## How
- **New `backend/platform_linking/` package**: `models.py` (Pydantic
types), `links.py` (link CRUD helpers — pure business logic), `chat.py`
(`start_chat_turn` orchestration), `manager.py`
(`PlatformLinkingManager(AppService)` + `PlatformLinkingManagerClient`).
Pattern matches `backend/notifications/` + `backend/data/db_manager.py`.
- **Exception translation at the edge**. Helpers raise domain exceptions
(`NotFoundError`, `LinkAlreadyExistsError`, `LinkTokenExpiredError`,
`LinkFlowMismatchError`, `NotAuthorizedError` — all `ValueError`
subclasses in `backend.util.exceptions` so they auto-register with the
RPC exception-mapping). REST routes translate to HTTP codes via a 7-line
`_translate()` helper.
- **Independent scopes, no DM fallback**. `find_server_link()` and
`find_user_link()` each query their own table. A user who owns a linked
server does not leak that identity into their DMs.
- **Race-safe token consumption**. Confirm paths do an atomic `update_many`
with `usedAt = None` + `expiresAt > now` in the WHERE clause;
`create_*_token` invalidates pending tokens before issuing a new one (see
the sketch after this list).
- **Bug fix**: `start_chat_turn` persists the user message via
`append_and_save_message` before enqueueing the executor turn — mirrors
`backend/api/features/chat/routes.py`. The previous `chat_proxy.py`
skipped this and ran the executor with no user message in history.
- **Streaming**. Copilot streaming lives on Redis Streams (persistent,
replayable). The bot subscribes directly with `subscribe_from="0-0"`, so
late subscribers replay the full stream; no HTTP SSE proxy needed.
- **No PII in logs**: logs reference `session_id`, `turn_id`,
`server_id`, and AutoGPT `user_id` (last 8 chars), but never raw
platform user IDs.
- **New pod**. `PlatformLinkingManager` runs as its own `AppProcess` on
port `8009`; client via `get_platform_linking_manager_client()`. The
infra chart lands in
[cloud-infrastructure#310](https://github.com/Significant-Gravitas/AutoGPT_cloud_infrastructure/pull/310).
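A sketch of that race-safe confirm using a Prisma-style Python client; the model accessor and the `confirmedByUserId` column are assumptions for illustration:

```python
from datetime import datetime, timezone

async def confirm_link_token(db, token: str, user_id: str) -> bool:
    """Illustrative atomic token consumption.

    The WHERE clause makes consumption atomic: of two concurrent confirms,
    exactly one can flip usedAt from NULL within the expiry window.
    """
    now = datetime.now(timezone.utc)
    updated = await db.platformlinktoken.update_many(
        where={"token": token, "usedAt": None, "expiresAt": {"gt": now}},
        data={"usedAt": now, "confirmedByUserId": user_id},
    )
    # update_many returns the number of rows it touched; 0 means another
    # caller won the race or the token expired.
    return updated == 1
```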
## Tests
- **Models** (`models_test.py`) — Platform / LinkType enums, request
validation (CreateLinkToken / ResolveServer / BotChat), response
schemas.
- **Helpers** (`links_test.py`) — resolve, token create (both flows, 409
on already-linked), token status (pending / linked / expired /
superseded-with-no-link), token info (404 / 410), confirm (404 / wrong
flow / already used / expired / same-user / other-user), delete authz.
- **AppService wiring** (`manager_test.py`) — `@expose` methods delegate
to helpers; client surface covers bot-facing ops and excludes
user-facing ones.
- **Adversarial** (`manager_test.py`, `routes_test.py`):
- `asyncio.gather` double-confirm with same user and with two different
users — exactly one winner, other gets clean `LinkTokenExpiredError`, no
double `PlatformLink.create`.
- Server- and user-link confirm races.
- `TokenPath` regex guard: rejects `%24`, URL-encoded path traversal,
>64 chars; accepts `secrets.token_urlsafe` shape.
- DELETE `link_id` with SQL-injection-style and path-traversal inputs
returns 404 via `NotFoundError`.
## Stack
- #12618 — bot service (rebased onto this so it can consume
`PlatformLinkingManagerClient`)
- #12624 — `/link/{token}` frontend page
-
[cloud-infrastructure#310](https://github.com/Significant-Gravitas/AutoGPT_cloud_infrastructure/pull/310)
— Helm chart for `copilot-bot` + new `platform-linking-manager`
Merge order: this → #12618 → #12624, infra whenever.
---------
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: CodeRabbit <noreply@coderabbit.ai>

6924cf90a5 | fix(frontend/copilot): artifact panel fixes (SECRT-2254/2223/2220/2255/2224/2256/2221) (#12856)
### Why / What / How
https://github.com/user-attachments/assets/ca26e0b0-d35d-4a5b-b95f-2421b9907742
**Why** — The Artifact & Side Task List project
(https://linear.app/autogpt/project/artifact-and-side-task-list-ef863c93da3c)
accumulated seven related bugs in the copilot artifact panel. The user
kept seeing panels stuck open, previews broken, clicks not registering —
each ticket was small but they all lived in the same small surface area,
so one review pass is easier than five.
Closes SECRT-2254, SECRT-2223, SECRT-2220, SECRT-2255, SECRT-2224,
SECRT-2256, SECRT-2221.
**What** — Five independent fixes, each in its own commit, shipped
together:
1. **Fragment-link interceptor + render error boundary** (SECRT-2255
crash when clicking `<a href="#x">` in HTML artifacts). Sandboxed srcdoc
iframes resolve fragment links against the parent's URL, so clicking
`#activation` in a Plotly TOC tried to navigate the copilot page into
the iframe. Inject a click-capture script into every artifact iframe;
also wrap the renderer in `ArtifactErrorBoundary` so any future render
throw surfaces with a copyable error instead of a blank panel.
2. **Close panel on copilot page unmount** (SECRT-2254 / 2223 / 2220 —
panel stays open, reopens on unrelated navigation, opens by default on
session switch). The Zustand store outlived page unmounts, so `isOpen:
true` survived `/profile` → `/home` → back. One `useEffect` cleanup in
`useAutoOpenArtifacts` calls `resetArtifactPanel()` on unmount.
3. **Sync loading flip on Try Again** (SECRT-2224 "try again doesn't do
anything"). Retry was correct but the loading-state flip was deferred to
an effect, so a retry that re-failed was visually indistinguishable from
a no-op. `retry()` now sets `isLoading: true` / `error: null`
synchronously with the click so the skeleton flashes every time.
4. **Pointer capture on resize drag** (SECRT-2256 "can't drag right when
expanded far left, click doesn't stop it"). The sandboxed iframe was
eating `pointermove`/`pointerup` events when the cursor drifted over it,
freezing the drag and never delivering the release. `setPointerCapture`
on the handle routes all subsequent pointer events through it regardless
of what's under the cursor.
5. **Stop size-gating natively-rendered artifacts + cache-bust retry**
(SECRT-2221 "broken hi-res PNG preview"). The blanket >10 MB size gate
pushed large images / videos / PDFs into `download-only`, so clicking a
hi-res PNG offered a download instead of a preview. Split the gate so it
only applies to content we actually render in JS (text/html/code/etc).
Image and video retries also append a cache-bust query so the browser
can't silently reuse a negative-cached failure.
**How** — Five commits, one concern each, preserved in the order they
were written. Every fix lands with a regression test that fails on the
unfixed code and passes after.
### Changes 🏗️
- `iframe-sandbox-csp.ts` + usage sites —
`FRAGMENT_LINK_INTERCEPTOR_SCRIPT` injected into all three srcdoc iframe
templates (HTML artifact, inline HTMLRenderer, React artifact).
- `ArtifactErrorBoundary.tsx` (new) — class error boundary local to the
artifact panel with a copyable error fallback.
- `useAutoOpenArtifacts.ts` — unmount cleanup calls
`resetArtifactPanel()`.
- `useArtifactContent.ts` — `retry()` flips loading state synchronously.
- `ArtifactDragHandle.tsx` — `setPointerCapture` /
`releasePointerCapture`; `touch-action: none`.
- `helpers.ts` — split classifier; `NATIVELY_RENDERED` exempts
image/video/pdf from the size gate.
- `ArtifactContent.tsx` — image/video carry a retry nonce that appends
`?_retry=N` on Try Again.
- Test files — new
`ArtifactErrorBoundary`/`ArtifactDragHandle`/`HTMLRenderer` tests, plus
regression cases added to `ArtifactContent.test.tsx`, `helpers.test.ts`,
`iframe-sandbox-csp.test.ts`, `reactArtifactPreview.test.ts`,
`useAutoOpenArtifacts.test.ts`.
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] `pnpm vitest run src/app/\(platform\)/copilot
src/components/contextual/OutputRenderers
src/lib/__tests__/iframe-sandbox-csp.test.ts` — 247/247 pass
- [x] `pnpm format && pnpm types` clean
- [x] Manual: open the Plotly-style TOC HTML artifact (SECRT-2255
repro), click each anchor — iframe scrolls internally, browser URL bar
stays put
- [x] Manual: open panel → navigate to /profile → navigate back → panel
closed (SECRT-2254)
- [x] Manual: panel open in session A → click different session → panel
closed (SECRT-2223)
- [ ] Manual: simulate a failed artifact fetch → click Try Again →
skeleton flashes before result (SECRT-2224)
- [x] Manual: expand panel to near-full width → drag back right,
crossing over the iframe → drag keeps working and release ends it
(SECRT-2256)
- [x] Manual: upload a ~25 MB PNG → clicking it previews in an `<img>`,
not a download button (SECRT-2221)
Replaces #12836, #12837, #12838, #12839, #12840 — same fixes, bundled
for review.
---
> [!NOTE]
> **Medium Risk**
> Touches artifact rendering and iframe `srcDoc` generation (including
injected scripts) plus panel state/drag interactions; regressions could
break previews or resizing, but changes are scoped to the copilot
artifact UI with broad test coverage.
>
> **Overview**
> Improves Copilot’s artifact panel resilience and UX by **resetting
panel state on page unmount/session changes**, making content retries
immediately show the loading skeleton, and fixing resize drags via
pointer capture so iframes can’t “steal” pointer events.
>
> Hardens artifact rendering by adding a local `ArtifactErrorBoundary`
that reports to Sentry and shows a copyable error fallback instead of a
blank/crashed panel.
>
> Fixes iframe-based previews by injecting a
`FRAGMENT_LINK_INTERCEPTOR_SCRIPT` into HTML and React artifact `srcDoc`
so `#anchor` clicks scroll within the iframe rather than navigating the
parent URL, and adjusts artifact classification/retry behavior so large
images/videos/PDFs remain previewable and image/video retries cache-bust
failed URLs.
>

f5d3a6e606 | Merge branch 'dev' into feat/copilot-kimi-k2-fast-model
Resolved require_guide_read: kept dev's builder_graph_id bypass AND our in-turn announcement helper (session.has_tool_been_called_this_turn replaces the now-removed _guide_read_in_session). Updated agent_guide_gate_test._session_with_messages to use real ChatSession.new(..., builder_graph_id=...) so it exercises both the inflight buffer and the builder bypass path.

a098f01bd2 | feat(builder): AI chat panel for the flow builder (#12699)
### Why
The flow builder had no AI assistance. Users had to switch to a separate Copilot session to ask about or modify the agent they were looking at, and that session had no context on the graph — so the LLM guessed, or the user had to describe the graph by hand.

### What
An AI chat panel anchored to the `/build` page. Opens with a chat-circle button (bottom-right), binds to the currently-opened agent, and offers **only** two tools: `edit_agent` and `run_agent`. Per-agent session is persisted server-side, so a refresh resumes the same conversation. Gated behind `Flag.BUILDER_CHAT_PANEL` (default off; `NEXT_PUBLIC_FORCE_FLAG_BUILDER_CHAT_PANEL=true` to enable locally).

### How
**Frontend — new**:
- `(platform)/build/components/BuilderChatPanel/` — panel shell + `useBuilderChatPanel.ts` coordinator. Renders the shared Copilot `ChatMessagesContainer` + `ChatInput` (thought rendering, pulse chips, fast-mode toggle — all reused, no parallel chat stack). Auto-creates a blank agent when opened with no `flowID`. Listens for `edit_agent` / `run_agent` tool outputs and wires them to the builder in-place: edit → `flowVersion` URL param + canvas refetch; run → `flowExecutionID` URL param → builder's existing execution-follow UI opens.

**Frontend — touched (minimal)**:
- `copilot/components/CopilotChatActionsProvider` — new `chatSurface: "copilot" | "builder"` flag so cards can suppress "Open in library" / "Open in builder" / "View Execution" buttons when the chat is the builder panel (you're already there).
- `copilot/tools/RunAgent/components/ExecutionStartedCard` — title is now status-aware (`QUEUED → "Execution started"`, `COMPLETED → "Execution completed"`, `FAILED → "Execution failed"`, etc.).
- `build/components/FlowEditor/Flow/Flow.tsx` — mount the panel behind the feature flag.

**Backend — new**:
- `copilot/builder_context.py` — the builder-session logic module. Holds the tool whitelist (`edit_agent`, `run_agent`), the permissions resolver, the session-long system-prompt suffix (graph id/name + full agent-building guide — cacheable across turns), and the per-turn `<builder_context>` prefix (live version + compact nodes/links snapshot).
- `copilot/builder_context_test.py` — covers both builders, ownership forwarding, and cap behavior.

**Backend — touched**:
- `api/features/chat/routes.py` — `CreateSessionRequest` gains `builder_graph_id`. When set, the endpoint routes through `get_or_create_builder_session` (keyed on `user_id`+`graph_id`, with a graph-ownership check). No new route; the former `/sessions/builder` is folded into `POST /sessions`.
- `copilot/model.py` — `ChatSessionMetadata.builder_graph_id`; `get_or_create_builder_session` helper.
- `data/graph.py` — `GraphSettings.builder_chat_session_id` (new typed field; stores the builder-chat session pointer per library agent).
- `api/features/library/db.py` — `update_library_agent_version_and_settings` preserves `builder_chat_session_id` across graph-version bumps.
- `copilot/tools/edit_agent.py`, `run_agent.py` — builder-bound guard: default missing `agent_id` to the bound graph, reject any other id. `run_agent` additionally inlines `node_executions` into dry-run responses so the LLM can inspect per-node status in the same turn instead of a follow-up `view_agent_output`. `wait_for_result` docs now explain the two dispatch modes.
- `copilot/tools/helpers.py::require_guide_read` — bypassed for builder-bound sessions (the guide is already in the system-prompt suffix).
- `copilot/tools/agent_generator/pipeline.py` + `tools/models.py` — `AgentSavedResponse.graph_version` so the frontend can flip `flowVersion` to the newly-saved version.
- `copilot/baseline/service.py` + `sdk/service.py` — inject the builder context suffix into the system prompt and the per-turn prefix into the current user message.
- `blocks/_base.py` — `validate_data(..., exclude_fields=)` so dry-run can bypass credential required-checks for blocks that need creds in normal mode (OrchestratorBlock). `blocks/perplexity.py` override signature matches.
- `executor/simulator.py` — OrchestratorBlock dry-run iteration cap `1 → min(original, 10)` so multi-role patterns (Advocate/Critic) actually close the loop; `manager.py` synthesizes placeholder creds in dry-run so the block's schema validation passes.

### Session lookup
The builder-chat session pointer lives on `LibraryAgent.settings.builder_chat_session_id` (typed via `GraphSettings`). `get_or_create_builder_session` reads/writes it through `library_db().get_library_agent_by_graph_id` + `update_library_agent(settings=...)` — no raw SQL or JSON-path filter. Ownership is enforced by the library-agent query's `userId` filter. The per-session builder binding still lives on `ChatSession.metadata.builder_graph_id` (used by `edit_agent`/`run_agent` guards and the system-prompt injection).

### Scope footnotes
- Feature flag defaults **false**. Rollout gate lives in LaunchDarkly.
- No schema migration required: `builder_chat_session_id` slots into the existing `LibraryAgent.settings` JSON column via the typed `GraphSettings` model.
- Commits that address review / CI cycles are interleaved with feature commits — see the commit log for the per-change rationale.

### Test plan
- [x] `pnpm test:unit` + backend `poetry run test` for new and touched modules
- [x] Agent-browser pass: panel toggle / auto-create / real-time edit re-render / real-time exec URL subscribe / queue-while-streaming / cross-graph reset / hard-refresh session persist
- [x] Codecov patch ≥ 80% on diff

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

627b52048b | fix(backend/copilot): announce in-flight tool calls to unstick guide guard
Symptom (session 0d83f15c on Kimi K2.6): the agent called `get_agent_building_guide`, got the guide, retried `create_agent` — and the `require_guide_read` gate fired "Call get_agent_building_guide first" anyway, looping indefinitely.

Root cause: the baseline path buffers assistant rows with their `tool_calls` into `state.session_messages` (a scratch list on `_BaselineStreamState`) during the tool-call loop, and only flushes into `session.messages` at turn end. So when the second tool runs within the *same* turn, `_guide_read_in_session` — which scans `session.messages` — sees no guide call and fires the gate. The SDK path didn't hit this because it mirrors tool calls straight into `ctx.session.messages`; Kimi's aggressive tool-call chaining within one turn was what surfaced the bug on baseline. Not Kimi-specific (any baseline model that calls guide + create_agent in one turn would hit it).

Fix: add an in-flight announcement buffer on `ChatSession`.
* `ChatSession._inflight_tool_calls: set[str]` (PrivateAttr, never serialised).
* `announce_inflight_tool_call` called by `_baseline_tool_executor` the moment a tool is dispatched, before it runs.
* `has_tool_been_called_this_turn` folds the in-flight set into the historical `messages` scan; `require_guide_read` now calls this instead of the messages-only helper.
* `clear_inflight_tool_calls` fired in the baseline turn's finally block, right before `upsert_chat_session`, so the next turn starts with a clean buffer.

Deliberately didn't mirror the row into `session.messages` directly — `_baseline_conversation_updater` appends a fully-formed assistant+tool_calls row at round end, so an inline mirror would duplicate. The scratch set keeps the announcement separate from durable history.

New tests: in-flight announcement lets the gate pass within the same turn; clear restores the gate for the next turn; PrivateAttr never leaks into `model_dump`. Existing gate tests migrated from MagicMock(spec=ChatSession) to real ChatSession instances since the guard now calls the new helper.
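A minimal sketch of the scratch-buffer pattern (illustrative fields; the real model carries far more state):

```python
from pydantic import BaseModel, PrivateAttr

class ChatSession(BaseModel):
    """Illustrative fragment of the in-flight announcement buffer."""
    messages: list[dict] = []

    # Process-local scratch set; a PrivateAttr never appears in model_dump(),
    # so the announcement buffer is never serialised with the session.
    _inflight_tool_calls: set[str] = PrivateAttr(default_factory=set)

    def announce_inflight_tool_call(self, tool_name: str) -> None:
        # Fires the moment a tool is dispatched, before it runs.
        self._inflight_tool_calls.add(tool_name)

    def has_tool_been_called_this_turn(self, tool_name: str) -> bool:
        # Fold the in-flight set into the scan of durable history.
        if tool_name in self._inflight_tool_calls:
            return True
        return any(
            call.get("name") == tool_name
            for msg in self.messages
            for call in (msg.get("tool_calls") or [])
        )

    def clear_inflight_tool_calls(self) -> None:
        self._inflight_tool_calls.clear()

session = ChatSession()
session.announce_inflight_tool_call("get_agent_building_guide")
assert session.has_tool_been_called_this_turn("get_agent_building_guide")
assert "_inflight_tool_calls" not in session.model_dump()  # never leaks
```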

07e5a6a9e4 | [Snyk] Security upgrade next from 15.4.10 to 15.4.11 (#12715)

### Snyk has created this PR to fix 1 vulnerability in the yarn dependencies of this project.
#### Snyk changed the following file(s):
- `autogpt_platform/frontend/package.json`
#### Note for
[zero-installs](https://yarnpkg.com/features/zero-installs) users
If you are using the Yarn feature
[zero-installs](https://yarnpkg.com/features/zero-installs) that was
introduced in Yarn V2, note that this PR does not update the
`.yarn/cache/` directory meaning this code cannot be pulled and
immediately developed on as one would expect for a zero-install project
- you will need to run `yarn` to update the contents of the
`./yarn/cache` directory.
If you are not using zero-install you can ignore this as your flow
should likely be unchanged.
<details>
<summary>⚠️ <b>Warning</b></summary>
```
Failed to update the yarn.lock, please update manually before merging.
```
</details>
#### Vulnerabilities that will be fixed with an upgrade:
- Allocation of Resources Without Limits or Throttling
([SNYK-JS-NEXT-15921797](https://snyk.io/vuln/SNYK-JS-NEXT-15921797))
---
> [!IMPORTANT]
>
> - Check the changes in this PR to ensure they won't cause issues with
your project.
> - Max score is 1000. Note that the real score may have changed since
the PR was raised.
> - This PR was automatically created by Snyk using the credentials of a
real user.
---
**Note:** _You are seeing this because you or someone else with access
to this repository has authorized Snyk to open fix PRs._
For more information:
🧐 [View latest project
report](https://app.snyk.io/org/significant-gravitas/project/3d924968-0cf3-4767-9609-501fa4962856?utm_source=github&utm_medium=referral&page=fix-pr)
📜 [Customise PR
templates](https://docs.snyk.io/scan-using-snyk/pull-requests/snyk-fix-pull-or-merge-requests/customize-pr-templates?utm_source=github&utm_content=fix-pr-template)
🛠 [Adjust project
settings](https://app.snyk.io/org/significant-gravitas/project/3d924968-0cf3-4767-9609-501fa4962856?utm_source=github&utm_medium=referral&page=fix-pr/settings)
📚 [Read about Snyk's upgrade
logic](https://docs.snyk.io/scan-with-snyk/snyk-open-source/manage-vulnerabilities/upgrade-package-versions-to-fix-vulnerabilities?utm_source=github&utm_content=fix-pr-template)
---
**Learn how to fix vulnerabilities with free interactive lessons:**
🦉 [Allocation of Resources Without Limits or
Throttling](https://learn.snyk.io/lesson/no-rate-limiting/?loc=fix-pr)
---
> [!NOTE]
> **Medium Risk**
> Patch-level upgrade of a core runtime/build dependency (Next.js) can
affect app rendering/build behavior despite being scoped to
dependency/lockfile changes.
>
> **Overview**
> Upgrades the frontend framework dependency `next` from `15.4.10` to
`15.4.11` in `package.json`.
>
> Updates `pnpm-lock.yaml` to reflect the new Next.js version (including
`@next/env`) and re-resolves dependent packages that pin `next` in their
peer/optional dependency graphs (e.g., `@sentry/nextjs`,
`@vercel/analytics`, Storybook Next integration).
>

da5420fa07 | fix(backend/copilot): coalesce reasoning deltas to unfreeze Kimi streams
Observed symptom: copilot page frozen for ~700 s on a session using the new Kimi K2.6 default. Redis `XLEN chat:stream:...` showed 4,677 reasoning-delta chunks in a single turn vs ~28 for peer Sonnet sessions. Each chunk was one Redis xadd + one SSE frame + one React re-render of the non-virtualised chat list, which paint-stormed the main thread until the stream ended. OpenRouter's Kimi endpoint tokenises reasoning at a much finer grain than Anthropic, so the 1:1 chunk→`StreamReasoningDelta` mapping in BaselineReasoningEmitter blew up on the wire while the same code was fine for Sonnet.

Fix: coalesce `StreamReasoningDelta` emissions in the emitter.
* First chunk in a block still emits Start + Delta atomically so the Reasoning collapse renders immediately.
* Subsequent chunks buffer into `_pending_delta` and flush once either the char-size (`_COALESCE_MIN_CHARS=32`) or time (`_COALESCE_MAX_INTERVAL_MS=40`) threshold trips. `close()` always drains the tail before emitting `StreamReasoningEnd`.
* DB persistence stays per-chunk — `_current_row.content` updates on every delta independent of the coalesce window, so a crash mid-turn still persists the full reasoning-so-far.
* Thresholds are `__init__` kwargs so tests can disable coalescing for deterministic state-machine assertions.

Net effect: ~4,700 → ~150 events per turn (30x), well under the browser's paint-storm threshold; reasoning still appears live at ~25 Hz (40 ms window), which is below human perception.

Pre-existing issues flagged for follow-up (out of scope — the freeze is gone without them):
* `ChatMessagesContainer` has no React.memo per message and no list virtualisation — a very long session still re-renders every prior message on each new chunk.
* `routes.py:1163-1171` replays from `0-0` with `count=1000` on every SSE reconnect (6 reconnects observed), duplicating up to 6,000 chunks. Proper Last-Event-ID support requires threading Redis stream message IDs through every SSE event + a frontend handshake — material refactor deferred to a dedicated PR.
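A sketch of the coalescing emitter described above (the emit target is stubbed; thresholds and names mirror the commit text):

```python
import time

class BaselineReasoningEmitter:
    """Illustrative delta-coalescing state machine."""
    _COALESCE_MIN_CHARS = 32
    _COALESCE_MAX_INTERVAL_MS = 40

    def __init__(self, min_chars: int | None = None, max_interval_ms: int | None = None):
        # Thresholds are constructor kwargs so tests can disable coalescing.
        self.min_chars = min_chars if min_chars is not None else self._COALESCE_MIN_CHARS
        interval_ms = max_interval_ms if max_interval_ms is not None else self._COALESCE_MAX_INTERVAL_MS
        self.max_interval_s = interval_ms / 1000
        self._pending_delta = ""
        self._started = False
        self._last_flush = time.monotonic()

    def _emit(self, event: str, text: str = "") -> None:
        print(event, repr(text))  # stand-in for the Redis xadd / SSE frame

    def on_delta(self, text: str) -> None:
        if not self._started:
            # First chunk still emits Start + Delta atomically so the
            # Reasoning collapse renders immediately.
            self._started = True
            self._emit("reasoning-start")
            self._emit("reasoning-delta", text)
            self._last_flush = time.monotonic()
            return
        self._pending_delta += text
        now = time.monotonic()  # sampled once per chunk
        if (len(self._pending_delta) >= self.min_chars
                or now - self._last_flush >= self.max_interval_s):
            self._emit("reasoning-delta", self._pending_delta)
            self._pending_delta = ""
            self._last_flush = now

    def close(self) -> None:
        # Always drain the tail before ending the block.
        if self._pending_delta:
            self._emit("reasoning-delta", self._pending_delta)
            self._pending_delta = ""
        if self._started:
            self._emit("reasoning-end")
            self._started = False
```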

59273fe6a0 | fix(frontend): forward sentry-trace and baggage across API proxy (#12835)
### Why / What / How
**Why:** Every request that went through Next's rewrite proxy broke
distributed tracing. The browser Sentry SDK emitted `sentry-trace` and
`baggage`, but `createRequestHeaders` only forwarded impersonation + API
key, so the backend started a disconnected transaction. The frontend →
backend lineage never appeared in Sentry. Same gap on
direct-from-browser requests: the custom mutator never attached the
trace headers itself, so even non-proxied paths lost the link.
**What:**
- **Server side:** forward `sentry-trace` and `baggage` from
`originalRequest.headers` alongside the existing impersonation/API key
forwarding.
- **Client side:** the custom mutator pulls trace data via
`Sentry.getTraceData()` and attaches it to outgoing headers when running
on the client.
**How:** Inline additions — no new observability module, no new
dependencies beyond `@sentry/nextjs` which the frontend already uses for
Sentry init.
### Changes 🏗️
- `src/lib/autogpt-server-api/helpers.ts` — forward `sentry-trace` +
`baggage` in `createRequestHeaders`.
- `src/app/api/mutators/custom-mutator.ts` — import `@sentry/nextjs`,
attach `Sentry.getTraceData()` on client-side requests.
- `src/app/api/mutators/__tests__/custom-mutator.test.ts` — three new
tests: trace-data present, trace-data empty, server-side no-op.
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [ ] I have tested my changes according to the test plan:
- [x] `pnpm vitest run
src/app/api/mutators/__tests__/custom-mutator.test.ts` passes (6/6
locally)
- [x] `pnpm format && pnpm lint` clean
- [x] `pnpm types` clean for touched files (pre-existing unrelated type
errors on dev are untouched)
- [ ] In a local session with Sentry enabled, a `/copilot` chat turn
produces a distributed trace that spans frontend transaction → backend
transaction (single trace ID in Sentry)
---
> [!NOTE]
> **Low Risk**
> Low risk: header-only changes to request construction for
observability, with added tests; primary risk is unintended header
propagation affecting upstream/proxy behavior.
>
> **Overview**
> Restores **Sentry distributed tracing continuity** for
frontend→backend calls by propagating `sentry-trace`/`baggage` headers.
>
> On the client, `customMutator` now reads `Sentry.getTraceData()` and
attaches string trace headers to outgoing requests (guarded for
server-side and older Sentry builds). On the server/proxy path,
`createRequestHeaders` now forwards `sentry-trace` and `baggage` from
the incoming `originalRequest` alongside existing impersonation/API-key
forwarding, with new unit tests covering these cases.
>

38c2844b83 | feat(admin): Add system diagnostics and execution management dashboard (#11235)
### Changes 🏗️
This PR adds a comprehensive admin diagnostics dashboard for monitoring
system health and managing running executions.
https://github.com/user-attachments/assets/f7afa3ed-63d8-4b5c-85e4-8756d9e3879e
#### Backend Changes:
- **New data layer** (backend/data/diagnostics.py): Created a dedicated
diagnostics module following the established data layer pattern
- get_execution_diagnostics() - Retrieves execution metrics (running,
queued, completed counts)
- get_agent_diagnostics() - Fetches agent-related metrics
- get_running_executions_details() - Lists all running executions with
detailed info
- stop_execution() and stop_executions_bulk() - Admin controls for
stopping executions
- **Admin API endpoints**
(backend/server/v2/admin/diagnostics_admin_routes.py):
- GET /admin/diagnostics/executions - Execution status metrics
- GET /admin/diagnostics/agents - Agent utilization metrics
- GET /admin/diagnostics/executions/running - Paginated list of running
executions
- POST /admin/diagnostics/executions/stop - Stop single execution
- POST /admin/diagnostics/executions/stop-bulk - Stop multiple
executions
- All endpoints secured with admin-only access
#### Frontend Changes:
- **Diagnostics Dashboard**
(frontend/src/app/(platform)/admin/diagnostics/page.tsx):
- Real-time system metrics display (running, queued, completed
executions)
- RabbitMQ queue depth monitoring
- Agent utilization statistics
- Auto-refresh every 30 seconds
- **Execution Management Table**
(frontend/src/app/(platform)/admin/diagnostics/components/ExecutionsTable.tsx):
- Displays running executions with: ID, Agent Name, Version, User
Email/ID, Status, Start Time
- Multi-select functionality with checkboxes
- Individual stop buttons for each execution
- "Stop Selected" and "Stop All" bulk actions
- Confirmation dialogs for safety
- Pagination for handling large datasets
- Toast notifications for user feedback
#### Security:
- All admin endpoints properly secured with requires_admin_user
decorator
- Frontend routes protected with role-based access controls
- Admin navigation link only visible to admin users
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Verified admin-only access to diagnostics page
- [x] Tested execution metrics display and auto-refresh
- [x] Confirmed RabbitMQ queue depth monitoring works
- [x] Tested stopping individual executions
- [x] Tested bulk stop operations with multi-select
- [x] Verified pagination works for large datasets
- [x] Confirmed toast notifications appear for all actions
#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
(no changes needed)
- [x] `docker-compose.yml` is updated or already compatible with my
changes (no changes needed)
- [x] I have included a list of my configuration changes in the PR
description (no config changes required)
---
> [!NOTE]
> **Medium Risk**
> Adds new admin-only endpoints that can stop, requeue, and bulk-mark
executions as `FAILED`, plus schedule deletion, which can directly
impact production workload and data integrity if misused or buggy.
>
> **Overview**
> Introduces a **System Diagnostics** admin feature spanning backend +
frontend to monitor execution/schedule health and perform remediation
actions.
>
> On the backend, adds a new `backend/data/diagnostics.py` data layer
and `diagnostics_admin_routes.py` with admin-secured endpoints to fetch
execution/agent/schedule metrics (including RabbitMQ queue depths and
invalid-state detection), list problem executions/schedules, and perform
bulk operations like `stop`, `requeue`, and `cleanup` (marking
orphaned/stuck items as `FAILED` or deleting orphaned schedules). It
also extends `get_graph_executions`/`get_graph_executions_count` with
`execution_ids` filtering, pagination, started/updated time filters, and
configurable ordering to support efficient bulk/admin queries.
>
> On the frontend, adds an admin diagnostics page with summary cards and
tables for executions and schedules (tabs for
orphaned/failed/long-running/stuck-queued/invalid, plus confirmation
dialogs for destructive actions), wires it into admin navigation, and
adds comprehensive unit tests for both the new API routes and UI
behavior.
>

fce7a59713 | refactor(backend/copilot): split model config into (path, tier) 2x2 matrix
Per PR review: `model` and `advanced_model` were implicitly shared between the baseline (fast) and SDK (extended_thinking) paths, but the paths have different hard constraints (baseline can route to any OpenRouter provider; SDK needs Anthropic endpoints). Replace the ambiguous 2-field schema with an explicit 2x2 of (path × tier). New fields:
* `fast_standard_model` — baseline standard tier (Kimi K2.6)
* `fast_advanced_model` — baseline advanced tier (Opus by default; same as SDK advanced so the top tier is a clean A/B across paths. Kimi K2-Thinking evaluated and deferred — it's 6 months older than K2.6, ~9pp behind on SWE-Bench Verified, ~23pp behind on BrowseComp, and text-only.)
* `thinking_standard_model` — SDK standard tier (Sonnet)
* `thinking_advanced_model` — SDK advanced tier (Opus)

Backward-compat env var aliases: `CHAT_MODEL` → thinking_standard, `CHAT_ADVANCED_MODEL` → thinking_advanced, `CHAT_FAST_MODEL` → fast_standard. `populate_by_name=True` so ChatConfig(field=...) kwargs work alongside the alias names.

Resolver split: `resolve_chat_model` (SDK) → thinking_*; `_resolve_baseline_model` (baseline) → fast_*. All call sites in sdk/service.py updated; test constructors migrated to the new names.
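Illustratively, the 2x2 with its legacy aliases might be declared like this; the defaults shown are those named elsewhere in this log, and the fragment is a sketch, not the shipped config:

```python
from pydantic import AliasChoices, Field
from pydantic_settings import BaseSettings, SettingsConfigDict

class ChatConfig(BaseSettings):
    """Illustrative (path x tier) split with legacy env-var aliases."""
    model_config = SettingsConfigDict(env_prefix="CHAT_", populate_by_name=True)

    # Baseline path (any OpenRouter provider).
    fast_standard_model: str = Field(
        default="moonshotai/kimi-k2.6",
        validation_alias=AliasChoices("CHAT_FAST_STANDARD_MODEL", "CHAT_FAST_MODEL"),
    )
    # No pre-split legacy name for this cell; it binds via env_prefix.
    fast_advanced_model: str = Field(default="anthropic/claude-opus-4.7")

    # SDK / extended-thinking path (needs Anthropic endpoints).
    thinking_standard_model: str = Field(
        default="anthropic/claude-sonnet-4-6",
        validation_alias=AliasChoices("CHAT_THINKING_STANDARD_MODEL", "CHAT_MODEL"),
    )
    thinking_advanced_model: str = Field(
        default="anthropic/claude-opus-4.7",
        validation_alias=AliasChoices("CHAT_THINKING_ADVANCED_MODEL", "CHAT_ADVANCED_MODEL"),
    )
```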

95d3679e14 | test(backend/copilot): assert Field defaults, not env-backed singleton
Address coderabbit[bot] review comment on PR #12871: three resolver tests read `config.fast_model`, `config.model`, `config.advanced_model` from the env-backed singleton, which fails in CI whenever an operator sets `CHAT_FAST_MODEL=anthropic/claude-sonnet-4-6` (the documented rollback path). Swap to `ChatConfig.model_fields[...].default` so the assertion pins the shipped default regardless of env overrides.
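A minimal sketch of the pattern, with a stand-in class:

```python
from pydantic import BaseModel, Field

class ChatConfig(BaseModel):
    """Illustrative stand-in for the env-backed settings class."""
    fast_model: str = Field(default="moonshotai/kimi-k2.6")

def test_default_is_pinned_regardless_of_env():
    # Read the declared Field default, not an instantiated (env-backed)
    # singleton, so a CHAT_FAST_MODEL override can't flip the assertion.
    assert ChatConfig.model_fields["fast_model"].default == "moonshotai/kimi-k2.6"
```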

89f8060c5d | feat(backend/copilot): default baseline fast_model to Kimi K2.6 via OpenRouter
Kimi K2.6 prices at $0.60/$2.80 per MTok (5x cheaper input, 5.4x cheaper output than Sonnet 4.6), ties Opus on SWE-Bench Verified (80.2% vs 80.8%), and ships OpenRouter's `reasoning` / `include_reasoning` extension on its Moonshot endpoints — meaning the baseline reasoning plumbing landed in #12870 lights up unchanged. Three focused deltas:
* `config.py`: new `fast_model` field defaulting to `moonshotai/kimi-k2.6`, separate from `model` (which still resolves to Sonnet for the SDK / extended-thinking path, where the Claude Agent SDK CLI requires an Anthropic endpoint). `advanced_model` stays Opus on both paths — no Kimi equivalent at the top tier.
* `_resolve_baseline_model`: no longer delegates to SDK's `resolve_chat_model`. Baseline standard/None → `config.fast_model`; advanced → `config.advanced_model`. SDK untouched.
* `baseline/reasoning.py::_is_reasoning_route`: new gate covering Anthropic + Moonshot Kimi variants, used by `reasoning_extra_body`. The existing `_is_anthropic_model` in service.py stays narrow — it still gates `cache_control` markers + the `anthropic-beta` header, which Moonshot doesn't need (it auto-caches) and which would be dropped (or worst-case 400) on Kimi.

Tests: extended extractor variant / kill-switch coverage in reasoning_test.py (new `TestIsReasoningRoute`, Kimi branches in `TestReasoningExtraBody`), a `_is_anthropic_model_rejects_kimi_routes` regression guard, an end-to-end `test_kimi_route_sends_reasoning_but_no_cache_control` through `_baseline_llm_caller` to pin the split-gate contract, and `TestResolveBaselineModel` rewired around `config.fast_model`.

Rollback: `CHAT_FAST_MODEL=anthropic/claude-sonnet-4-6` restores prior behavior without code changes. Known risk to validate before we raise confidence: K2.5 had documented many-tool-selection regressions (vLLM had to ship accuracy patches) — we ship 43 tools per call, so /pr-test with the full payload is a must before this default is locked in.
||
|
|
24850e2a3e |
feat(backend/autopilot): stream extended_thinking on baseline via OpenRouter (#12870)
### Why / What / How
**Why:** Fast-mode autopilot never renders a Reasoning block. The frontend already has `ReasoningCollapse` wired up and the wire protocol already carries `StreamReasoning*` events (landed for SDK mode in #12853), but the baseline (OpenRouter OpenAI-compat) path never asks Anthropic for extended thinking and never parses reasoning deltas off the stream. Result: users on fast/standard get a good answer with no visible chain-of-thought, while SDK users see the full Reasoning collapse.
**What:** Plumb reasoning end-to-end through the baseline path by opting into OpenRouter's non-OpenAI `reasoning` extension, parsing the reasoning delta fields off each chunk, and emitting the same `StreamReasoningStart/Delta/End` events the SDK adapter already uses.
**How:**
- **New config:** `baseline_reasoning_max_tokens` (default 8192; 0 disables). Sent as `extra_body={"reasoning": {"max_tokens": N}}` only on Anthropic routes — other providers drop the field, and `is_anthropic_model()` already gates this.
- **Delta extraction:** `_extract_reasoning_delta()` handles all three OpenRouter/provider variants in priority order — legacy `delta.reasoning` (string), DeepSeek-style `delta.reasoning_content`, and the structured `delta.reasoning_details` list (text/summary entries; encrypted or unknown entries are skipped). A sketch of the extractor follows this section.
- **Event emission:** Reasoning uses the same state-machine rules the SDK adapter uses — a text delta or tool_use delta arriving mid-stream closes the open reasoning block first, so the AI SDK v5 transport keeps reasoning / text / tool-use as distinct UI parts. On stream end, any still-open reasoning block gets a matching `reasoning-end` so a reasoning-only turn still finalises the frontend collapse.
- **Scope:** Live streaming only. Reasoning is not persisted to `ChatMessage` rows or the transcript builder in this PR (the SDK path does so via `content_blocks=[{type: 'thinking', ...}]`, but that round-trip requires Anthropic signature plumbing baseline doesn't have today). Reload will still not show reasoning on baseline sessions — can follow up if we decide it's worth the signature handling.
### Changes
- `backend/copilot/config.py` — new `baseline_reasoning_max_tokens` field.
- `backend/copilot/baseline/service.py` — new `_extract_reasoning_delta()` helper; reasoning block state on `_BaselineStreamState`; `reasoning` gated into `extra_body`; chunk loop emits `StreamReasoning*` events with text/tool_use transition rules; stream-end closes any open reasoning block.
- `backend/copilot/baseline/service_unit_test.py` — 11 new tests covering extractor variants (legacy string, deepseek alias, structured list with text/summary aliases, encrypted-skip, empty), paired event ordering (reasoning-end before text-start), reasoning-only streams, and that the `reasoning` request param is correctly gated by model route (Anthropic vs non-Anthropic) and by the config flag.
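A minimal sketch of the extractor, assuming dict-shaped entries in `reasoning_details`; field shapes follow the description above rather than re-verified provider docs:
```python
def _extract_reasoning_delta(delta) -> str | None:
    text = getattr(delta, "reasoning", None)  # 1. legacy string field
    if isinstance(text, str) and text:
        return text
    text = getattr(delta, "reasoning_content", None)  # 2. DeepSeek-style alias
    if isinstance(text, str) and text:
        return text
    parts = []
    for entry in getattr(delta, "reasoning_details", None) or []:  # 3. structured
        if not isinstance(entry, dict):
            continue
        # Keep text/summary entries; encrypted or unknown entries carry
        # neither field and fall through naturally.
        piece = entry.get("text") or entry.get("summary")
        if isinstance(piece, str) and piece:
            parts.append(piece)
    return "".join(parts) or None
```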
### Checklist
For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [ ] I have tested my changes according to the test plan:
  - [x] `poetry run pytest backend/copilot/baseline/service_unit_test.py backend/copilot/baseline/transcript_integration_test.py` — 103 passed
  - [ ] Manual: with `CHAT_USE_CLAUDE_AGENT_SDK=false` and `CHAT_MODEL=anthropic/claude-sonnet-4-6`, send a multi-step prompt on fast mode and confirm a Reasoning collapse appears alongside the final text
  - [ ] Manual: flip `CHAT_BASELINE_REASONING_MAX_TOKENS=0` and confirm baseline responses revert to text-only (no reasoning param, no reasoning UI)
  - [ ] Manual: with a non-Anthropic baseline model (`openai/gpt-4o`), confirm the request does NOT include `reasoning` and nothing regresses
For configuration changes:
- [x] `.env.default` is compatible — new setting falls back to the pydantic default |
||
|
|
e17e9f13c4 |
fix(backend/copilot): reduce SDK + baseline prompt cache waste (#12866)
## Summary
Four cost-reduction changes for the copilot feature. Consolidated into one PR at user request; each commit is self-contained and bisectable.
### 1. SDK: full cross-user cache on every turn (CLI 2.1.116 bump)
Previous behavior: CLI 2.1.97 crashed when `excludeDynamicSections=True` was combined with `--resume`, so the code fell back to a raw `system_prompt` string on resume, losing Claude Code's default prompt and all cache markers. Every Turn 2+ of an SDK session wrote ~33K tokens to cache instead of reading.
Fix: install `@anthropic-ai/claude-code@2.1.116` in the backend Docker image and point the SDK at it via `CHAT_CLAUDE_AGENT_CLI_PATH=/usr/bin/claude`. CLI 2.1.98+ fixes the crash, so we can use the preset with `exclude_dynamic_sections=True` on every turn — Turn 1, 2, 3+ all share the same static prefix and hit the **cross-user** prompt cache.
**Local dev requirement:** if `CHAT_CLAUDE_AGENT_CLI_PATH` is unset, the bundled 2.1.97 fallback will crash on `--resume`. Install the CLI globally (`npm install -g @anthropic-ai/claude-code@2.1.116`) or set the env var.
### 2. Baseline: add `cache_control` markers (commit `756b3ecd9` + follow-ups)
Baseline path had zero `cache_control` across `backend/copilot/**`. Every turn was full uncached input (~18.6K tokens, ~$0.058). Two ephemeral markers — on the system message (content-blocks form) and the last tool schema — plus `anthropic-beta: prompt-caching-2024-07-31` via `extra_headers` as defense-in-depth. Helpers split into `_mark_tools_*` (precomputed once per session) and `_mark_system_*` (per-round, O(1)). Repeat hellos: ~$0.058 → ~$0.006. A sketch of the markers follows this description.
### 3. Drop `get_baseline_supplement()` (commit `6e6c4d791`)
`_generate_tool_documentation()` emitted ~4.3K tokens of `(tool_name, description)` pairs that exactly duplicated the tools array already in the same request. Deleted. `SHARED_TOOL_NOTES` (cross-tool workflow rules) is preserved. Baseline "hello" input: ~18.7K → ~14.4K tokens.
### 4. Langfuse "CoPilot Prompt" v26 (published under `review` label)
Separate, out-of-repo change. v25 had three duplicate "Example Response" blocks + a 10-step "Internal Reasoning Process" section. v26 collapses to one example + bullet-form reasoning. Char count 20,481 → 7,075 (rough 4 chars/token → ~5,100 → ~1,770 tokens).
- v26 is published with label `review` (NOT `production`); v25 remains active.
- Promote via `mcp__langfuse__updatePromptLabels(name="CoPilot Prompt", version=26, newLabels=["production"])` after smoke-test.
- Rollback: relabel v25 `production`.
## Test plan
- [x] Unit tests for `_build_system_prompt_value` (fresh vs resumed turns emit identical preset dict)
- [x] SDK compat tests pass including `test_bundled_cli_version_is_known_good_against_openrouter`
- [x] `cli_openrouter_compat_test.py` passes against CLI 2.1.116 (locally verified with `CHAT_CLAUDE_AGENT_CLI_PATH=/opt/homebrew/bin/claude`)
- [x] 8 new `_mark_*` unit tests + identity regression test for `_fresh_*` helpers
- [x] `SHARED_TOOL_NOTES` public-constant test passes; 5 old tool-docs tests removed
- [ ] **Manual cost verification (commit 1):** send two consecutive SDK turns; Turn 2 and Turn 3 should both show `cacheReadTokens` ≈ 33K (full cross-user cache hits).
- [ ] **Manual cost verification (commit 2):** send two "hello" turns on baseline <5 min apart; Turn 2 reports `cacheReadTokens` ≈ 18K and cost ≈ $0.006.
- [ ] **Regression sweep for commit 3:** one turn per tool family — `search_agents`, `run_agent`, `add_memory`/`forget_memory`/`search_memory`, `search_docs`, `read_workspace_file` — to verify no tool-selection regression from dropping the prose tool docs.
- [ ] **Langfuse v26 smoke test:** 5-10 varied turns after relabelling to `production`; compare responses vs v25 for regression on persona, concision, capability-gap handling, credential security flows.
## Deployment notes
- Production Docker image now installs CLI 2.1.116 (~20 MB added).
- `CHAT_CLAUDE_AGENT_CLI_PATH=/usr/bin/claude` set in the Dockerfile; runtime can override via env.
- First deploy after this merge needs a fresh image rebuild to pick up the new CLI.
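A minimal sketch of the commit-2 markers, assuming OpenAI-compat dict-shaped messages and tools; helper names mirror the PR's `_mark_*` naming, the bodies are illustrative, and how OpenRouter forwards `cache_control` on tool schemas is an assumption:
```python
ANTHROPIC_BETA_HEADERS = {"anthropic-beta": "prompt-caching-2024-07-31"}


def _mark_system_for_caching(messages: list[dict]) -> None:
    # Per-round, O(1): put an ephemeral marker on the system message,
    # normalising it to content-blocks form first.
    for msg in messages:
        if msg.get("role") == "system":
            if isinstance(msg["content"], str):
                msg["content"] = [{"type": "text", "text": msg["content"]}]
            msg["content"][-1]["cache_control"] = {"type": "ephemeral"}
            return


def _mark_tools_for_caching(tools: list[dict]) -> list[dict]:
    # Precomputed once per session: marking only the last tool schema
    # caches the entire tools array as one stable prefix.
    if tools:
        tools = [*tools[:-1], {**tools[-1], "cache_control": {"type": "ephemeral"}}]
    return tools
```
|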
||
|
|
f238c153a5 |
fix(backend/copilot): release session cluster lock on completion (#12867)
## Summary
Fixes a bug where a chat session gets silently stuck after the user presses Stop mid-turn.
**Root cause:** the cancel endpoint marks the session `failed` after polling 5s, but the cluster lock held by the still-running task is only released by `on_run_done` when the task actually finishes. If the task hangs past the 5s poll (slow LLM call, agent-browser step, etc.), the lock lingers for up to 5 min — `stream_chat_post`'s `is_turn_in_flight` check sees the flipped meta (`failed`) and enqueues a new turn, but the run handler sees the stale lock and drops the user's message at `manager.py:379` (`reject+requeue=False`). The new SSE stream hangs until its 60s idle timeout.
### Fix
Two cooperating changes:
1. **`mark_session_completed` force-releases the cluster lock** in the same transaction that flips status to `completed`/`failed`. Unconditional delete — by the time we're declaring the session dead, we don't care who the current lock holder is; the lock has to go so the next enqueued turn can acquire. This is what closes the stuck-session window.
2. **`ClusterLock.release()` is now owner-checked** (Lua CAS — `GET == token ? DEL : noop` atomically). Force-release means another pod may legitimately own the key by the time the original task's `on_run_done` eventually fires. Without the CAS, that late `release()` would wipe the successor's lock. With it, the late `release()` is a safe no-op when the owner has changed (see the sketch at the end of this description).
Together: prompt release on completion (via force-delete) + safe cleanup when `on_run_done` catches up (via CAS). That re-syncs the API-level `is_turn_in_flight` check with the actual lock state, so the contention window disappears.
No changes to the worker-level contention handler: `stream_chat_post` already queues incoming messages into the pending buffer when a turn is in flight (via `queue_pending_for_http`). With these fixes, the worker never sees contention in the common case; if it does (true multi-pod race), the pre-existing `reject+requeue=False` behaviour still applies — we'll revisit that path with its own PR if it becomes a production symptom.
### Verification
- Reproduced the original stuck-session symptom locally (Stop mid-turn → send new message → backend logs `Session … already running on pod …`, user message silently lost, SSE stream idle 60s then closes).
- After the fix: cancel → new message → turn starts normally (lock released by `mark_session_completed`).
- `poetry run pyright` — 0 errors on edited files.
- `pytest backend/copilot/stream_registry_test.py backend/executor/cluster_lock_test.py` — 33 passed (includes the successor-not-wiped test).
## Changes
- `autogpt_platform/backend/backend/copilot/executor/utils.py` — extract `get_session_lock_key(session_id)` helper so the lock-key format has a single source of truth.
- `autogpt_platform/backend/backend/copilot/executor/manager.py` — use the helper where the cluster lock is created.
- `autogpt_platform/backend/backend/copilot/stream_registry.py` — `mark_session_completed` deletes the lock key after the atomic status swap (force-release).
- `autogpt_platform/backend/backend/executor/cluster_lock.py` — `ClusterLock.release()` (sync + async) uses a Lua CAS to only delete when `GET == token`, protecting against wiping a successor after a force-release.
## Test plan
- [ ] Send a message in /copilot that triggers a long turn (e.g. `run_agent`), press Stop before it finishes, then send another message. Expect: new turn starts promptly (no 5-min wait for lock TTL).
- [ ] Happy path regression — send a normal message, verify turn completes and the session lock key is deleted after completion.
- [ ] Successor protection — unit test `test_release_does_not_wipe_successor_lock` covers: A acquires, external DEL, B acquires, A.release() is a no-op, B's lock intact.
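A minimal sketch of the owner-checked release, assuming redis-py asyncio; the shipped Lua may differ in detail:
```python
from redis.asyncio import Redis

# Delete the key only while it still holds our token, so a late release()
# after a force-release can't wipe a successor's lock.
_RELEASE_CAS = """
if redis.call('GET', KEYS[1]) == ARGV[1] then
    return redis.call('DEL', KEYS[1])
end
return 0
"""


async def release(r: Redis, lock_key: str, token: str) -> bool:
    # Returns True only when we were still the owner and the key was deleted.
    return bool(await r.eval(_RELEASE_CAS, 1, lock_key, token))
```
|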
||
|
|
01f1289aac |
feat(copilot): real OpenRouter cost + cost-based rate limits (percent-only public API) (#12864)
## Why
After
|
||
|
|
343222ace1 |
feat(platform): defer paid-to-paid subscription downgrades + cancel-pending flow (#12865)
### Why / What / How
**Why:** Only downgrades to FREE were scheduled at period end; paid→paid
downgrades (e.g. BUSINESS→PRO) applied immediately via Stripe proration.
The asymmetry meant users lost their higher tier mid-cycle in exchange
for a Stripe credit voucher only redeemable on a future subscription — a
confusing pattern that produces negative-value paths for users actually
cancelling. There was also no way to cancel a pending downgrade or
paid→FREE cancellation once scheduled.
**What:** Standardize on "upgrade = immediate, downgrade = next cycle"
and let users cancel a pending change by clicking their current tier.
Harden the new code against conflicting subscription state, concurrent
tab races, flaky Stripe calls, and hot-path latency regressions.
**How:**
Subscription state machine:
- **Upgrade** (PRO→BUSINESS) — `stripe.Subscription.modify` with
immediate proration (unchanged). If a downgrade schedule is already
attached, release it first so the upgrade wins.
- **Paid→paid downgrade** (BUSINESS→PRO) — creates a
`stripe.SubscriptionSchedule` with two phases (current tier until
`current_period_end`, target tier after). No mid-cycle tier demotion.
Defensive pre-clear: existing schedule → release;
`cancel_at_period_end=True` → set to False (see the sketch after this list).
- **Paid→FREE** — unchanged: `cancel_at_period_end=True`.
- **Same-tier update** — reuses the existing `POST
/credits/subscription` route. When `target_tier == current_tier`,
backend calls `release_pending_subscription_schedule` (idempotent) and
returns status. No dedicated cancel-pending endpoint — "Keep my current
tier" IS the cancel operation.
- `release_pending_subscription_schedule` is idempotent on
terminal-state schedules and clears both `schedule` and
`cancel_at_period_end` atomically per call.
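A hedged sketch of the paid→paid downgrade flow using stripe-python; the function and argument names are illustrative, not the shipped helper:
```python
import stripe


def _schedule_downgrade_at_period_end(sub_id: str, target_price: str) -> str:
    # Keep the current tier until period end, then switch to the target
    # price on the next phase. No mid-cycle demotion, no proration credit.
    sub = stripe.Subscription.retrieve(sub_id)

    # Defensive pre-clear: a lingering schedule or a pending cancellation
    # would conflict with the new two-phase schedule.
    if sub.schedule:
        stripe.SubscriptionSchedule.release(sub.schedule)
    if sub.cancel_at_period_end:
        stripe.Subscription.modify(sub_id, cancel_at_period_end=False)

    # from_subscription pre-fills phase 1 with the live subscription,
    # ending at the current period end.
    schedule = stripe.SubscriptionSchedule.create(from_subscription=sub_id)
    current_items = [
        {"price": item.price.id, "quantity": item.quantity}
        for item in sub["items"]["data"]  # bracket access: .items is a dict method
    ]
    stripe.SubscriptionSchedule.modify(
        schedule.id,
        phases=[
            {
                "items": current_items,
                "start_date": schedule.phases[0].start_date,
                "end_date": schedule.phases[0].end_date,
            },
            {"items": [{"price": target_price}]},
        ],
    )
    return schedule.id
```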
API surface:
- New fields on `SubscriptionStatusResponse`: `pending_tier` +
`pending_tier_effective_at` (pulled from the schedule's next-phase
`start_date` so dashboard-authored schedules report the correct
timestamp).
- `POST /credits/subscription` now returns `SubscriptionStatusResponse`
(previously `SubscriptionCheckoutResponse`); the response still carries
`url` for checkout flows and adds the status fields inline.
- `get_pending_subscription_change` is cached with a 30s TTL — avoids
hammering Stripe on every home-page load.
- Webhook dispatches
`subscription_schedule.{released,completed,updated}` through the main
`sync_subscription_from_stripe` flow so both event sources converge to
the same DB state.
Implementation notes:
- New Stripe calls use native async (`stripe.Subscription.list_async`
etc.) and typed attribute access — no `run_in_threadpool` wrapping in
the new helpers.
- Shared `_get_active_subscription` helper collapses the "list
active/trialing subs, take first" pattern used by 4 callers.
Frontend:
- `PendingChangeBanner` sub-component above the tier grid with formatted
effective date + "Keep [CurrentTier]" button. `aria-live="polite"` for
screen readers; locale pinned to `en-US` to avoid SSR/CSR hydration
mismatch.
- "Keep [CurrentTier]" also available as a button on the current tier
card.
- Other tier buttons disabled while a change is pending — user must
resolve pending first to prevent stacked schedules.
- `cancelPendingChange` reuses `useUpdateSubscriptionTier` with `tier:
current_tier`; awaits `refetch()` on both success and error paths so the
UI reconciles even if the server succeeded but the client didn't receive
the response.
### Changes
**Backend (`credit.py`, `v1.py`)**
- Tier-ordering helpers (`is_tier_upgrade`/`is_tier_downgrade`).
- `modify_stripe_subscription_for_tier` routes downgrades through
`_schedule_downgrade_at_period_end`; upgrade path releases any pending
schedule first.
- `_schedule_downgrade_at_period_end` defensively releases pre-existing
schedules and clears `cancel_at_period_end` before creating the new
schedule.
- `release_pending_subscription_schedule` idempotent on terminal-state
schedules; logs partial-failure outcomes.
- `_next_phase_tier_and_start` returns both tier and phase-start
timestamp; warns on unknown prices.
- `get_pending_subscription_change` cached (30s TTL), narrow exception
handling.
- `sync_subscription_schedule_from_stripe` delegates to
`sync_subscription_from_stripe` for convergence with the main webhook
path.
- Shared `_get_active_subscription` +
`_release_schedule_ignoring_terminal` helpers.
- `POST /credits/subscription` absorbs the same-tier "cancel pending
change" branch.
**Frontend (`SubscriptionTierSection/*`)**
- `PendingChangeBanner` new sub-component (a11y, locale-pinned date,
paid→FREE vs paid→paid copy split, non-null effective-date assertion, no
`dark:` utilities).
- "Keep [CurrentTier]" button on current tier card.
- `useSubscriptionTierSection` — `cancelPendingChange` reuses the
update-tier mutation.
- Copy: downgrade dialog + status hint updated.
- `helpers.ts` extracted from the main component.
**Tests**
- Backend: +24 tests (95/95 passing): upgrade-releases-pending-schedule,
schedule-releases-existing-schedule, cancel-at-period-end collision,
terminal-state release idempotency, unknown-price logging, status
response population, same-tier-POST-with-pending, webhook delegation.
- Frontend: +5 integration tests (21/21 passing): banner render/hide,
Keep-button click from banner + current card, paid→paid dialog copy.
### Checklist
- [x] Backend unit tests: 95 pass
- [x] Frontend integration tests: 21 pass
- [x] `poetry run format` / `poetry run lint` clean
- [x] `pnpm format` / `pnpm lint` / `pnpm types` clean
- [ ] Manual E2E on live Stripe (dev env) — pending deploy: BUSINESS→PRO
creates schedule, DB tier unchanged until period end
- [ ] Manual E2E: "Keep BUSINESS" in banner releases schedule
- [ ] Manual E2E: cancel pending paid→FREE flips `cancel_at_period_end`
back to false
- [ ] Manual E2E: BUSINESS→PRO (scheduled) then attempt BUSINESS→FREE
clears the PRO schedule, sets cancel_at_period_end
- [ ] Manual E2E: BUSINESS→PRO (scheduled) then upgrade back to BUSINESS
releases the schedule
|
||
|
|
a8226af725 |
fix(copilot): dedupe tool row, lift bash_exec timeout, Stop+resend recovery (#12862)
Closes #12861 · [OPEN-3096](https://linear.app/autogpt/issue/OPEN-3096)
## Why
Four related copilot UX / stability issues surfaced on dev once action tools started rendering inline in the chat (see #12813):
### 1. Duplicate bash_exec row
`GenericTool` rendered two rows saying the same thing for every completed tool call — a muted subtitle line ("Command exited with code 1" / "Ran: sleep 20") **and** a `ToolAccordion` with the command echoed in its description. Previously hidden inside the "Show reasoning" / "Show steps" collapse, now visibly duplicated.
### 2. `bash_exec` capped at 120s via advisory text
The tool schema said `"Max seconds (default 30, max 120)"`; the model obeyed, so long-running scripts got clipped at 120s with a vague `Timed out after 120s` even though the E2B sandbox has no such limit. Confirmed via Langfuse traces — the model picks `120` for long scripts because that's what the schema told it the max was. The E2B path never had a server-side clamp. Originally added in #12103 (default 30) and tightened to "max 120" advisory in #12398 (token-reduction pass).
### 3. 30s default was too aggressive
`pip install`, small data-processing scripts, etc. routinely cross 30s and got killed before the model thought to retry with a bigger timeout.
### 4. Stop + edit + resend → "The assistant encountered an error" ([OPEN-3096](https://linear.app/autogpt/issue/OPEN-3096))
Two independent bugs both land on the same banner — fixing only one leaves the other visible on the next action.
**4a. Stream lock never released on Stop** *(the error in the ticket screenshot)*. The executor's `async for chunk in stream_and_publish(...)` broke out on `cancel.is_set()` without calling `aclose()` on the wrapper. `async for` does NOT auto-close iterators on `break`, so `stream_chat_completion_sdk` stayed suspended at its current `await` — still holding the per-session Redis lock (TTL 120s) until GC eventually closed it. The next `POST /stream` hit `lock.try_acquire()` at [sdk/service.py](autogpt_platform/backend/backend/copilot/sdk/service.py) and yielded `StreamError("Another stream is already active for this session. Please wait or stop it.")`. The `except GeneratorExit → lock.release()` handler written exactly for this case never fired because nothing sent GeneratorExit.
**4b. Orphan `tool_use` after stop-mid-tool.** Even with the lock released, the stop path persists the session ending on an assistant row whose `tool_calls` have no matching `role="tool"` row. On the next turn, `_session_messages_to_transcript` hands Claude CLI `--resume` a JSONL with a `tool_use` and no paired `tool_result`, and the SDK raises a vague error — same banner. The ticket's "Open questions" explicitly flags this.
## What
**Frontend — `GenericTool.tsx`** split responsibilities between the two rows so they don't duplicate:
- **Subtitle row** (always visible, muted): *what ran* — `Ran: sleep 120`. Never the exit code.
- **Accordion description**: *how it ended* — `completed` / `status code 127 · bash: missing-bin: command not found` / `Timed out after 120s` / (fallback to command preview for legacy rows missing `exit_code` / `timed_out`). Pulled from the first non-empty line of `stdout` / `stderr` when available.
- **Expanded accordion**: full command + stdout + stderr code blocks (unchanged).
**Backend — `bash_exec.py`**:
- Drop the "max 120" advisory from the schema description.
- Bump default `timeout: 30 → 120`.
- Clean up the result message — `"Command executed with status code 0"` (no "on E2B", no parens).
**Backend — `executor/processor.py` + `stream_registry.py` (OPEN-3096 #4a)**: wrap the consumer `async for` in `try/finally: await stream.aclose()`. Close now propagates through `stream_and_publish` into `stream_chat_completion_sdk`, whose existing `except GeneratorExit → lock.release()` releases the Redis lock immediately on cancel. Stream types tightened to `AsyncGenerator[StreamBaseResponse, None]` so the defensive `getattr(stream, "aclose", None)` goes away.
**Backend — `session_cleanup.py` (OPEN-3096 #4b)**: new `prune_orphan_tool_calls()` helper walks the trailing session tail and drops any trailing assistant row whose `tool_calls` have unresolved ids (plus everything after it) and any trailing `STOPPED_BY_USER_MARKER` system-stop row. Single backward pass — tolerates the marker being present or absent. Called from the existing turn-start cleanup in both `sdk/service.py` and `baseline/service.py`; takes an optional `log_prefix` so both paths emit the same INFO log when something was popped. In-memory only — the DB save path is append-only via `start_sequence`. A sketch of the prune follows the test plan.
## Test plan
- [x] `pnpm exec vitest run src/app/(platform)/copilot/tools/GenericTool src/app/(platform)/copilot/components/ChatMessagesContainer` — 105 pass (6 new for GenericTool subtitle/description variants + legacy-fallback case).
- [x] `pnpm format` / `pnpm lint` / `pnpm types` — clean.
- [x] `poetry run pytest backend/copilot/sdk/session_persistence_test.py` — 17 pass (6 + 3 new covering the orphan-tool-call prune and its optional-log-prefix branch).
- [x] `poetry run pytest backend/copilot/stream_registry_test.py backend/copilot/executor/processor_test.py` — 19 pass (2 for aclose propagation on the `stream_and_publish` wrapper, 2 for `_execute_async` aclose propagation on both exit paths, 1 for publish_chunk RedisError warning ladder).
- [x] `poetry run ruff check` / `poetry run pyright` on touched files — clean.
- [x] Manual: fire a `bash_exec` — one labelled row, accordion description reads sensibly (`completed` / `status code 1 · …` / `Timed out after 120s`).
- [x] Manual: script that needs >120s — no longer clipped.
- [x] Manual: Stop mid-tool + edit + resend — Autopilot resumes without "Another stream is already active" and without the vague SDK error.
## Scope note
Does not touch `splitReasoningAndResponse` — re-collapsing action tools back into "Show steps" is #12813's responsibility.
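A minimal sketch of the prune, assuming a `ChatMessage`-like row shape and a placeholder marker value; the shipped helper's signature may differ:
```python
from dataclasses import dataclass, field


@dataclass
class Row:  # stand-in for ChatMessage; field names taken from the PR text
    role: str
    content: str = ""
    tool_calls: list[dict] = field(default_factory=list)  # [{"id": ...}, ...]
    tool_call_id: str | None = None


STOPPED_BY_USER_MARKER = "[stopped by user]"  # placeholder value


def prune_orphan_tool_calls(rows: list[Row]) -> list[Row]:
    out = list(rows)
    # Tolerate a trailing system-stop row, present or absent.
    if out and out[-1].role == "system" and out[-1].content == STOPPED_BY_USER_MARKER:
        out.pop()
    resolved = {r.tool_call_id for r in out if r.role == "tool"}
    for i in range(len(out) - 1, -1, -1):
        r = out[i]
        if r.role == "assistant" and r.tool_calls:
            if any(tc["id"] not in resolved for tc in r.tool_calls):
                return out[:i]  # drop the orphan row and everything after it
            break  # the newest tool-bearing assistant row is fully resolved
    return out
```
|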
||
|
|
f06b5293de |
fix(frontend/library): compute monthly spend for AgentBriefingPanel (#12854)
### Why / What / How
<img width="900" alt="Screenshot 2026-04-20 at 19 52 22" src="https://github.com/user-attachments/assets/c30d5f18-2842-4a8a-ac3d-5bfee18fcd56" />
**Why:** The "Spent this month" tile in the Agent Briefing Panel on the Library page always showed `$0`, even for users with real execution usage. The tile is meant to give a quick sense of monthly spend across all agents.
**What:** Compute `monthlySpend` from actual execution data and format it as currency.
**How:**
- `useLibraryFleetSummary` now sums `stats.cost` (cents) across every execution whose `started_at` falls within the current calendar month. Previously `monthlySpend` was hardcoded to `0`.
- `FleetSummary.monthlySpend` is documented as being in cents (consistent with backend + `formatCents`).
- `StatsGrid` now uses `formatCents` from the copilot usage helpers to render the tile (e.g. `$12.34` instead of the broken `$0`).
### Changes 🏗️
- `autogpt_platform/frontend/src/app/(platform)/library/hooks/useLibraryFleetSummary.ts`: aggregate `stats.cost` across executions started in the current calendar month; add `toTimestamp` and `startOfCurrentMonth` helpers.
- `autogpt_platform/frontend/src/app/(platform)/library/components/AgentBriefingPanel/StatsGrid.tsx`: format the "Spent this month" tile via the shared `formatCents` helper.
- `autogpt_platform/frontend/src/app/(platform)/library/types.ts`: document that `FleetSummary.monthlySpend` is in cents.
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [ ] I have tested my changes according to the test plan:
  - [ ] Load `/library` with the `AGENT_BRIEFING` flag enabled and at least one completed execution in the current month — the "Spent this month" tile shows the correct cumulative cost.
  - [ ] With no executions this month, the tile shows `$0.00`.
  - [ ] Type-check (`pnpm types`), lint (`pnpm lint`), and integration tests (`pnpm test:unit`) pass locally.
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
70b591d74f |
fix(copilot): persist reasoning, split steps/reasoning UX, fix mid-turn promote stream stall (#12853)
## Why
Four related issues that surfaced when queued follow-ups hit an
extended_thinking turn:
1. **Mid-turn promote stalled the SSE stream.** `pollBackendAndPromote`
used `setMessages((prev) => [...prev, bubble])` — Vercel AI SDK's
`useChat` streams SSE deltas into `messages[-1]`, so once a user bubble
ended up there, every subsequent chunk silently landed on the wrong
message. Chat sat frozen until a page refresh, even though the backend's
stream completed cleanly.
2. **Thinking-only final turn looked identical to a frozen UI.** When
Claude's last LLM call after a tool_result produced only a
`ThinkingBlock` (no `TextBlock`, no `ToolUseBlock`), the response
adapter silently dropped it and the UI hung on "Thought for Xs" with no
response text.
3. **Reasoning was invisible.** `ThinkingBlock` was dropped live and
never persisted in a way the frontend could render — sessions on reload
/ shared links showed no thinking, a confusing UX gap ("display for
nothing").
4. **Cross-pod Redis replay dropped reasoning events.** The
`stream_registry._reconstruct_chunk` type map had no entries for
`reasoning-*` types, so any client that subscribed mid-stream (share,
reload, cross-pod) silently dropped them with `Unknown chunk type:
reasoning-delta`.
## What
### Mid-turn promote — splice before the trailing assistant
In `useCopilotPendingChips.ts::pollBackendAndPromote`:
```ts
setMessages((prev) => {
const bubble = makePromotedUserBubble(drained, "midturn", crypto.randomUUID());
const lastIdx = prev.length - 1;
if (lastIdx >= 0 && prev[lastIdx].role === "assistant") {
return [...prev.slice(0, lastIdx), bubble, prev[lastIdx]];
}
return [...prev, bubble];
});
```
Streaming assistant stays at `messages[-1]`, AI SDK deltas keep routing
correctly. `useHydrateOnStreamEnd` snaps the bubble to the DB-canonical
position when the stream ends.
### Reasoning — end-to-end visibility (live + persisted)
- **Wire protocol**: new `StreamReasoningStart` / `StreamReasoningDelta`
/ `StreamReasoningEnd` events matching AI SDK v5's `reasoning-*` wire
names, so `useChat` accumulates them into a `type: 'reasoning'`
UIMessage part natively.
- **Response adapter**: every `ThinkingBlock` now emits reasoning
events; text/tool_use transitions close the open reasoning block so AI
SDK doesn't merge distinct parts.
- **Stream registry**: added `reasoning-*` types to
`_reconstruct_chunk`'s type_to_class map so Redis replay no longer drops
them on cross-pod / reload / share.
- **Persistence** (new): each `StreamReasoningStart` opens a
`ChatMessage(role="reasoning")` row in `session.messages`; deltas
accumulate into its content; `StreamReasoningEnd` closes it. No schema
migration — `ChatMessage.role` is already `String`.
`extract_context_messages` filters `role="reasoning"` out of LLM context
(the `--resume` CLI session already carries thinking separately) so the
model never re-ingests prior reasoning. A sketch of this row lifecycle
follows this list.
- **Frontend conversion**: `convertChatSessionMessagesToUiMessages` maps
`role="reasoning"` DB rows into `{type: "reasoning", text}` parts on the
surrounding assistant bubble, so reload / shared-link sessions render
reasoning identically to live stream.
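A minimal sketch of the `role="reasoning"` row lifecycle described above; `ChatMessage` is a stand-in model and `apply_reasoning_event` is a hypothetical helper name:
```python
from dataclasses import dataclass


@dataclass
class ChatMessage:  # stand-in; the real model's role is an open String
    role: str
    content: str


def apply_reasoning_event(messages: list[ChatMessage], event: dict) -> None:
    # reasoning-start opens a row; deltas accumulate into it; reasoning-end
    # needs no mutation because the row is already complete.
    if event["type"] == "reasoning-start":
        messages.append(ChatMessage(role="reasoning", content=""))
    elif (
        event["type"] == "reasoning-delta"
        and messages
        and messages[-1].role == "reasoning"
    ):
        messages[-1].content += event.get("delta", "")


def extract_context_messages(messages: list[ChatMessage]) -> list[ChatMessage]:
    # Reasoning rows render in the UI but never re-enter the LLM context.
    return [m for m in messages if m.role != "reasoning"]
```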
### Steps / Reasoning UX — modal + accordion split
- **`StepsCollapse`** (new): a Dialog-backed "Show steps" modal wraps
the pre-final-answer group (tool timeline + per-block reasoning). Modal
keeps the steps visually grouped and out of the reading flow.
- **`ReasoningCollapse`** (rewritten): inline accordion with "Show
reasoning" / "Hide reasoning" toggle — no longer a modal, so it expands
*inside* the Steps modal without stacking two dialogs. Reasoning text
appears indented with a left border.
- **`splitReasoningAndResponse`**: reasoning parts now stay in the
reasoning group (instead of being pinned out), so they show up inside
the Steps modal alongside the tool-use timeline.
### Thinking-only final turn — synthesize a closing line
(belt-and-suspenders)
- **Prompt rule** (`_USER_FOLLOW_UP_NOTE`): "Every turn MUST end with at
least one short user-facing text sentence."
- **Adapter fallback**: tracks `_text_since_last_tool_result`; at
`ResultMessage success` with tools run + zero text since, opens a fresh
step (`UserMessage` already closed the previous one) and injects `"(Done
— no further commentary.)"` before `StreamFinish`. Only fires for the
pathological case — pure-text turns untouched.
## Test plan
- [x] `pnpm vitest run` on copilot files — all 638 prior tests pass;
**17 new tests** added covering:
- `convertChatSessionToUiMessages`: reasoning row alone / merged with
assistant text / multi-row / empty skip / duration capture
- `ReasoningCollapse`: initial collapsed, toggle, `rotate-90`,
`aria-expanded`
- `StepsCollapse`: trigger + dialog open renders children
- `MessagePartRenderer`: reasoning → `<pre>` inside collapse,
whitespace/missing text → null
- `splitReasoningAndResponse`: reasoning-stays-in-reasoning regression
- [x] `poetry run pytest backend/copilot/sdk/response_adapter_test.py` —
36 pass (7 new: 4 reasoning streaming, 3 thinking-only fallback)
- [x] Manual: reasoning streams live and persists across reload on a
fresh session
- [x] Manual: previously-created sessions (pre-persistence) don't have
`role="reasoning"` rows — behaves as a clean no-op (no reasoning shown,
no error), new sessions render reasoning inside Steps modal
## Notes
- No DB migration — `ChatMessage.role` is already an open `String`;
`role="reasoning"` is simply filtered out of LLM context builds but
rendered by the frontend.
- Addresses /pr-review blockers: (a) stream_registry missing reasoning
types in Redis round-trip, (b) fallback text emitted outside a step, (c)
dead `case "thinking"` in renderer (now uses the live `reasoning` type
uniformly).
|
||
|
|
b1c043c2d8 |
feat(copilot): queue follow-up messages on busy sessions (UI + run_sub_session + AutoPilot block) (#12737)
## Why
Users and tools can target a copilot session that already has a turn
running. Before this PR there was no uniform behaviour for that case —
the UI manually routed to a separate queue endpoint; `run_sub_session`
and the AutoPilot block raced the cluster lock; and in-turn follow-ups
only reached the model at turn-end via auto-continue. Outcome: dropped
messages, duplicate tool rows, missed mid-turn intent, latent
correctness bugs in block execution.
## What
A single "message arrived → turn already running?" primitive, shared by
every caller:
1. **POST `/stream`** (UI chat): self-defensive. Session idle → SSE as
today; session busy → `202 application/json` with `{buffer_length,
max_buffer_length, turn_in_flight}`. The deprecated `POST
/messages/pending` endpoint is removed (`GET /messages/pending` peek
stays).
2. **`run_copilot_turn_via_queue`** (shared primitive from #12841, used
by `run_sub_session` + `AutoPilotBlock`): gains the same busy-check.
Busy session → push to pending buffer, return `("queued",
SessionResult(queued=True, pending_buffer_length=N))` without creating a
stream registry session or enqueueing a RabbitMQ job. All callers
inherit queueing.
3. **Mid-turn delivery**: drained follow-ups are attached to every
tool_result's `additionalContext` via the SDK's `PostToolUse` hook —
covers both MCP and built-in tools (WebSearch/Read/Agent/etc.), not just
`run_block`. Claude reads the queued text on the next LLM round of the
same turn.
4. **UI observability**: chips promote to a proper user bubble at the
correct chronological position (after the tool_result row that consumed
them). Auto-continue handles end-of-turn drainage; mid-turn backend poll
handles the tool-boundary drainage path.
## How
**Data plane**
- `backend/copilot/pending_messages.py` — Redis list per session
(LPOP-count for atomic drain), TTL, fire-and-forget pub/sub notify. Max
10 per session; see the sketch after this list.
- `backend/copilot/pending_message_helpers.py` — `is_turn_in_flight`,
`queue_user_message`, `drain_and_format_for_injection`,
`persist_pending_as_user_rows` (shared persist+rollback used by both
baseline and SDK paths).
- `backend/data/redis_helpers.py` — centralised `incr_with_ttl`,
`capped_rpush`, `hash_compare_and_set`; every Lua script and pipeline
atomicity lives in one place.
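A minimal sketch of the buffer, assuming redis-py asyncio and an illustrative key format; the shipped `capped_rpush` is a Lua script, shown here as a plain (non-atomic) trim for readability:
```python
import redis.asyncio as redis

MAX_PENDING = 10
PENDING_TTL_SECONDS = 15 * 60  # assumed; the PR does not state the TTL


def _key(session_id: str) -> str:
    return f"chat:pending:{session_id}"  # hypothetical key format


async def queue_user_message(r: redis.Redis, session_id: str, text: str) -> int:
    key = _key(session_id)
    length = await r.rpush(key, text)
    if length > MAX_PENDING:
        await r.rpop(key)  # reject the overflow push (Lua makes this atomic)
        length = MAX_PENDING
    await r.expire(key, PENDING_TTL_SECONDS)
    return length


async def drain_pending(r: redis.Redis, session_id: str) -> list[str]:
    # LPOP with a count empties the buffer in one atomic command.
    items = await r.lpop(_key(session_id), MAX_PENDING)
    return [i.decode() if isinstance(i, bytes) else i for i in items or []]
```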
**Injection sites**
- `backend/copilot/sdk/security_hooks.py::post_tool_use_hook` — drains +
returns `additionalContext`. Single hook covers built-in + MCP tools.
- `backend/copilot/sdk/service.py` — `StreamToolOutputAvailable`
dispatch persists the drained follow-up as a real user row right after
the tool_result (UI bubble at the right index).
`state.midturn_user_rows` keeps the CLI upload watermark honest.
- `backend/copilot/baseline/service.py` — same drain at round
boundaries, uses the shared `persist_pending_as_user_rows` helper so
baseline + SDK code paths don't diverge.
**Dispatch**
- `backend/copilot/sdk/session_waiter.py::run_copilot_turn_via_queue` —
`is_turn_in_flight` short-circuit; `SessionResult` gains `queued` +
`pending_buffer_length`; `SessionOutcome` gains `"queued"`.
- `backend/api/features/chat/routes.py::stream_chat_post` — busy-check
returns 202 with `QueuePendingMessageResponse`; `POST /messages/pending`
deleted.
- `backend/copilot/tools/run_sub_session.py` / `models.py` —
`SubSessionStatusResponse.status` gains `"queued"`;
`response_from_outcome` renders a clear queued-state message with the
pending-buffer depth and a link to watch live.
- `backend/blocks/autopilot.py::execute_copilot` — surfaces queued state
as descriptive response text + empty `tool_calls`/history when
`result.queued`.
**Frontend**
- `src/app/(platform)/copilot/useCopilotPendingChips.ts` — hook owning
the chip lifecycle: backend peek on session load, auto-continue
promotion when a second assistant id appears, mid-turn poll that
promotes when the backend count drops.
- `src/app/(platform)/copilot/useHydrateOnStreamEnd.ts` —
force-hydrate-waits-for-fresh-reference dance extracted.
- `src/app/(platform)/copilot/helpers/stripReplayPrefix.ts` — pure
function with drop / strip / streaming-catch-up cases + helper
decomposition.
- `src/app/(platform)/copilot/helpers/makePromotedBubble.ts` — one-line
helper for the promoted bubble shape.
- `src/app/(platform)/copilot/helpers/queueFollowUpMessage.ts` — thin
`fetch` wrapper for the 202 path (AI SDK's `useChat` fetcher only
handles SSE, so we can't reuse `sendMessage` for the queued response).
## Test plan
Backend unit + integration (`poetry run pytest backend/copilot
backend/api/features/chat`):
- [x] 107 tests pass — pending buffer, drain helpers, routes,
session_waiter queue branch, run_sub_session outcome rendering,
autopilot block
- [x] New `session_waiter_test.py` proves the queue branch
short-circuits `stream_registry.create_session` + `enqueue_copilot_turn`
- [x] Mid-turn persist has a rollback-and-re-queue path tested for when
`session.messages` persist silently fails to back-fill sequences
Frontend unit (`pnpm vitest run`):
- [x] 630 tests pass incl. 22 new for extracted helpers + hooks
- [x] Frontend coverage on touched copilot files: 91%+ (patch 87.37%)
Manual (once merged):
- [ ] Queue two chips while a tool is running; Claude acknowledges both
on the next round, UI shows bubbles in typing order after the tool
output
- [ ] Hand AutoPilot block an existing session_id that has a live turn;
block returns queued status, in-flight turn drains the message on its
next round
- [ ] `run_sub_session` against a busy sub — status=`queued`,
`sub_autopilot_session_link` lets user watch live
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
fcaebd1bb7 |
refactor(backend/copilot): unified queue-backed copilot turns + async sub-AutoPilot + guide-read gate (#12841)
### Why / What / How
**Why:** the 10-min stream-level idle timeout was killing legitimate long-running tool calls — notably sub-AutoPilot runs via `run_block(AutoPilotBlock)`, which routinely take 15–45 min. The symptom users saw was `"A tool call appears to be stuck"` even though AutoPilot was actively working. A fix for a second long-standing rough edge shipped alongside: agents often skipped `get_agent_building_guide` when generating agent JSON, producing schemas that failed validation and burned turns on auto-fix loops.
**What:** three threaded pieces.
1. **Async sub-AutoPilot via `run_sub_session`.** New copilot tool that delegates a task to a fresh (or resumed) sub-AutoPilot, and its companion `get_sub_session_result` for polling/cancelling. The agent starts with `run_sub_session(prompt, wait_for_result≤300s)` and, if the sub isn't done inside the cap, receives a handle + polls via `get_sub_session_result(wait_if_running≤300s)`. No single MCP call ever blocks the stream for more than 5 min, so the 10-min stream-idle timer stays simple and effective (derived as `MAX_TOOL_WAIT_SECONDS * 2`).
2. **Queue-backed copilot turn dispatch** — one code path for all three callers.
- `run_sub_session` enqueues a `CoPilotExecutionEntry` on the existing `copilot_execution` exchange instead of spawning an in-process `asyncio.Task`.
- `AutoPilotBlock.execute_copilot` (graph block) now uses the **same queue** instead of `collect_copilot_response` inline.
- The HTTP SSE endpoint was already queue-backed.
- All three share a single primitive: `run_copilot_turn_via_queue` → `create_session` → `enqueue_copilot_turn` → `wait_for_session_result`. The event-aggregation logic (`EventAccumulator`/`process_event`) is a shared module used by both the direct-stream path and the cross-process waiter.
- Benefits: **deploy/crash resilience** (RabbitMQ redelivery survives worker restarts), **natural load balancing** across copilot_executor workers, **sessions as first-class resources** (UI users can `/copilot?sessionId=<inner>` into any sub or AutoPilot block's session), and every future stream-level feature (pending-messages drain #12737, compaction policies, etc.) applies uniformly instead of bypassing graph-block sessions.
3. **Guide-read gate on agent-generation tools.** `create_agent` / `edit_agent` / `validate_agent_graph` / `fix_agent_graph` refuse until the session has called `get_agent_building_guide`. The pre-existing soft hint was routinely ignored; the gate makes the dependency enforceable. All four tool descriptions advertise the requirement in one tightened sentence ("Requires get_agent_building_guide first (refuses otherwise).") that stays under the 32000-char schema budget.
**How:**
#### Queue-backed sub-AutoPilot + AutoPilotBlock
- `sdk/session_waiter.py` — new module. `SessionResult` dataclass mirrors `CopilotResult`. `wait_for_session_result` subscribes to `stream_registry`, drains events via the shared `process_event`, returns `(outcome, result)`. `wait_for_session_completion` is the cheaper outcome-only variant. `run_copilot_turn_via_queue` is the canonical three-step dispatch. Every exit path unsubscribes the listener.
- `sdk/stream_accumulator.py` — new module. `EventAccumulator`, `ToolCallEntry`, `process_event` extracted from `collect.py`. Both the direct-stream and cross-process paths now use the same fold logic.
- `tools/run_sub_session.py` / `tools/get_sub_session_result.py` — rewritten around the shared primitive. `sub_session_id` is now the sub's `ChatSession` id directly (no separate registry handle). Ownership re-verified on every call via `get_chat_session`. Cancel via `enqueue_cancel_task` on the existing `copilot_cancel` fan-out exchange.
- `blocks/autopilot.py` — `execute_copilot` replaced its inline `collect_copilot_response` with `run_copilot_turn_via_queue`. `SessionResult` carries response text, tool calls, and token usage back from the worker so no DB round-trip is needed. The block's public I/O contract (inputs, outputs, `ToolCallEntry` shape) is unchanged.
- `CoPilotExecutionEntry` gains a `permissions: CopilotPermissions | None` field forwarded to the worker's `stream_fn` so the sub's capability filter survives the queue hop. The processor passes it through to `stream_chat_completion_sdk` / `stream_chat_completion_baseline`.
- **Deleted**: `sdk/sub_session_registry.py` (module-level dict, done-callback, abandoned-task cap, `notify_shutdown_and_cancel_all`, `_reset_for_test`), plus the shutdown-notifier hook in `copilot_executor.processor.cleanup` — redundant under queue-backed execution.
#### Run_block single-tool cap
- `tools/helpers.execute_block` caps block execution at `MAX_TOOL_WAIT_SECONDS = 5 min` via `asyncio.wait_for` around the generator consumption.
- On timeout: logs `copilot_tool_timeout tool=run_block block=… block_id=… input_keys=… user=… session=… cap_s=…` (grep-friendly) and returns an `ErrorResponse` that redirects the LLM to `run_agent` / `run_sub_session`.
- Billing protection: `_charge_block_credits` is called in a `finally` guarded by `asyncio.shield` and marked `charge_handled` **before** the await, so cancel-mid-charge doesn't double-bill and cancel-mid-generator-before-charge still settles via the finally. (A sketch of this guard follows the checklist.)
#### Guide-read gate
- `helpers.require_guide_read(session, tool_name)` scans `session.messages` for any prior assistant tool call named `get_agent_building_guide` (handles both OpenAI and flat shapes). Applied at the top of `_execute` in `create_agent`, `edit_agent`, `validate_agent_graph`, `fix_agent_graph`. Tool descriptions advertise the requirement.
#### Shared timing constants
- `MAX_TOOL_WAIT_SECONDS = 5 * 60` + `STREAM_IDLE_TIMEOUT_SECONDS = 2 * MAX_TOOL_WAIT_SECONDS` in `constants.py`. Every long-running tool (`run_agent`, `view_agent_output`, `run_sub_session`, `get_sub_session_result`, `run_block`) imports from one place; no more hardcoded 300 / `10*60` literals drifting apart. The stream-idle invariant ("no single tool blocks close to the idle timeout") holds by construction.
### Frontend
- Friendlier tool-card labels: `run_sub_session` → "Sub-AutoPilot", `get_sub_session_result` → "Sub-AutoPilot result", `run_block` → "Action" (matches the builder UI's own naming), `run_agent` → "Agent". Fixes the double-verb "Running Run …" phrasing.
- `SubSessionStatusResponse.sub_autopilot_session_link` surfaces `/copilot?sessionId=<inner>` so users can click into any sub's session from the tool-call card — same pattern as `run_agent`'s `library_agent_link`.
### Changes 🏗️
- **New modules**: `sdk/session_waiter.py`, `sdk/stream_accumulator.py`, `tools/run_sub_session.py`, `tools/get_sub_session_result.py`, `tools/sub_session_test.py`, `tools/agent_guide_gate_test.py`.
- **New response types**: `SubSessionStatusResponse`, `SubSessionProgressSnapshot`, `SessionResult`.
- **New gate helper**: `require_guide_read` in `tools/helpers.py`.
- **Queue protocol**: `permissions` field on `CoPilotExecutionEntry`, threaded through `processor.py` → `stream_fn`.
- **Hidden**: `AUTOPILOT_BLOCK_ID` in `COPILOT_EXCLUDED_BLOCK_IDS` (run_block can't execute AutoPilotBlock; agents use `run_sub_session` instead).
- **Deleted**: `sdk/sub_session_registry.py`, processor shutdown-notifier hook.
- **Regenerated**: `openapi.json` for the new response types; block-docs for the updated `ToolName` Literal.
- **Tool descriptions**: tightened the guide-gate hint across the four agent-builder tools to stay under the 32000-char schema budget.
- **40+ tests** across sub_session, execute_block cap + billing races, stream_accumulator, agent_guide_gate, frontend helpers.
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Unit suite green on the full copilot tree; `poetry run format` + `pyright` clean
  - [x] Schema character budget test passes (tool descriptions trimmed to stay under 32000)
  - [x] Native UI E2E (`poetry run app` + `pnpm dev`): `run_sub_session(wait_for_result=60)` returns `status="completed"` + `sub_autopilot_session_link` inline; `run_sub_session(wait_for_result=1)` returns `status="running"` + handle, `get_sub_session_result(wait_if_running=60)` observes `running → completed` transition
  - [x] AutoPilotBlock (graph) goes through the `copilot_executor` queue end-to-end (verified via logs: ExecutionManager's AutoPilotBlock node spawned session `f6de335b-…`, a different `CoPilotExecutor` worker acquired its cluster lock and ran the SDK stream)
  - [x] Guide gate: `create_agent` without a prior `get_agent_building_guide` returns the refusal; agent reads the guide and retries successfully
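A minimal sketch of the cap plus the billing guard, with the two awaitables as assumed callables (the shipped helper's signature differs):
```python
import asyncio

MAX_TOOL_WAIT_SECONDS = 5 * 60  # single source of truth in constants.py


async def execute_block(consume_outputs, charge_credits):
    charge_handled = False

    async def settle():
        nonlocal charge_handled
        if charge_handled:
            return
        charge_handled = True  # flipped BEFORE the await: a cancel landing
        # mid-charge makes the finally's second settle() a no-op (no
        # double-bill), while asyncio.shield lets the charge itself finish.
        await asyncio.shield(charge_credits())

    try:
        result = await asyncio.wait_for(consume_outputs(), MAX_TOOL_WAIT_SECONDS)
        await settle()
        return result
    except asyncio.TimeoutError:
        # The grep-friendly log lives here; the returned error redirects the
        # LLM to run_agent / run_sub_session instead of retrying the block.
        return {"error": f"run_block exceeded the {MAX_TOOL_WAIT_SECONDS}s cap"}
    finally:
        await settle()  # cancel-before-charge still settles exactly once
```
|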
||
|
|
3a01874911 |
fix(frontend/builder): preserve agent name in AgentExecutor node title after reload (#12805)
## Summary
Fixes #11041
When an `AgentExecutorBlock` is placed in the builder, it initially displays the agent's name (e.g., "Researcher v2"). After saving and reloading the page, the title reverts to the generic "Agent Executor."
## Root Cause
The backend correctly persists `agent_name` and `graph_version` in `hardcodedValues` (via `input_default` in `AgentExecutorBlock`). However, `NodeHeader.tsx` always resolves the display title from `data.title` (the generic block name), ignoring the persisted agent name.
## Fix
Modified the title resolution chain in `NodeHeader.tsx` to check `data.hardcodedValues.agent_name` between the user's custom name and the generic block title:
1. `data.metadata.customized_name` (user's manual rename) — highest priority
2. `agent_name` + ` v{graph_version}` from `hardcodedValues` — **new**
3. `data.title` (generic block name) — fallback
This is a frontend-only change. No backend modifications needed.
## Files Changed
- `autogpt_platform/frontend/src/app/(platform)/build/components/FlowEditor/nodes/CustomNode/components/NodeHeader.tsx` (+11, -1)
## Test Plan
- [x] Place an AgentExecutorBlock, select an agent — title shows agent name
- [x] Save graph, reload page — title still shows agent name (was "Agent Executor" before)
- [x] Double-click to rename — custom name takes priority over agent name
- [x] Clear custom name — falls back to agent name
- [x] Non-AgentExecutor blocks — unaffected, show generic title as before
---------
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co> |
||
|
|
6d770d9917 |
fix(platform/copilot): revert forward pagination, add visibility guarantee for blank chat (#12831)
## Why / What / How
**Why:** PR #12796 changed completed copilot sessions to load messages from sequence 0 forward (ascending), which broke the standard chat UX — users now land at the beginning of the conversation instead of the most recent messages. Reported in Discord.
**What:** Reverts the forward pagination approach and replaces it with a visibility guarantee that ensures every page contains at least one user/assistant message.
**How:**
- **Backend**: Removed `after_sequence`, `from_start`, `forward_paginated`, `newest_sequence` — always use backward (newest-first) pagination. Added `_expand_for_visibility()` helper: after fetching, if the entire page is tool messages (invisible in the UI), expand backward up to 200 messages until a visible user/assistant message is found (sketched below).
- **Frontend**: Removed all `forwardPaginated`/`newestSequence` plumbing from hooks and components. Removed the bottom `LoadMoreSentinel`. Simplified the message merge to always prepend paged messages.
### Changes
- routes.py: Reverted to simple backward pagination, removed TOCTOU re-fetch logic
- db.py: Removed forward mode, extracted `_expand_tool_boundary()` and added `_expand_for_visibility()`
- `SessionDetailResponse`: Removed `newest_sequence` and `forward_paginated` fields
- openapi.json: Removed the `after_sequence` param and forward pagination response fields
- Frontend hooks/components: Removed forward pagination props and logic (-1000 lines)
- Updated all tests (backend: 63 pass, frontend: 1517 pass)
### Checklist
- [x] I have clearly listed my changes in the PR description
- [x] Backend unit tests: 63 pass
- [x] Frontend unit tests: 1517 pass
- [x] Frontend lint + types: clean
- [x] Backend format + pyright: clean
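A minimal sketch of the visibility guarantee, assuming rows with `role` and `sequence` fields and an assumed `fetch_older(before_sequence, limit)` accessor returning older rows in ascending order:
```python
VISIBLE_ROLES = {"user", "assistant"}
MAX_EXPANSION = 200


def expand_for_visibility(fetch_older, page: list) -> list:
    # If the fetched page is all tool rows (invisible in the UI), keep
    # expanding backward until a visible row appears or the cap is hit.
    while (
        page
        and not any(m.role in VISIBLE_ROLES for m in page)
        and len(page) < MAX_EXPANSION
    ):
        older = fetch_older(page[0].sequence, min(50, MAX_EXPANSION - len(page)))
        if not older:
            break  # start of session reached with nothing visible
        page = older + page
    return page
```
|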
||
|
|
334ec18c31 |
docs: convert in-code comments to MkDocs admonitions in block-sdk-gui… (#12819)
### Why / What / How
This PR converts inline Python comments in code examples within `block-sdk-guide.md` into MkDocs `!!! note` admonitions. This makes code examples cleaner and more copy-paste friendly while preserving all explanatory content.
Converts inline comments in code blocks to admonitions following the pattern established in PR #12396 (new_blocks.md) and PR #12313.
- Wrapped code examples with `!!! note` admonitions
- Removed inline comments from code blocks for clean copy-paste
- Added explanatory admonitions after each code block
### Changes 🏗️
- Provider configuration examples (API key and OAuth)
- Block class Input/Output schema annotations
- Block initialization parameters
- Test configuration
- OAuth and webhook handler implementations
- Authentication types and file handling patterns
### Checklist 📋
#### For documentation changes:
- [x] Follows the admonition pattern from PR #12396
- [x] No code changes, documentation only
- [x] Admonition syntax verified correct
#### For configuration changes:
- [ ] `.env.default` is updated or already compatible with my changes
- [ ] `docker-compose.yml` is updated or already compatible with my changes
---
**Related Issues**: Closes #8946
Co-authored-by: slepybear <slepybear@users.noreply.github.com>
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co> |
||
|
|
ea5cfdfa2e |
fix(frontend): remove debug console.log statements (#12823)
## Why
Debug console.log statements were left in production code, which can leak sensitive information and pollute browser developer consoles.
## What
Removed console.log from 4 non-legacy frontend components:
- useNavbar.ts: isLoggedIn debug log
- WalletRefill.tsx: autoRefillForm debug log
- EditAgentForm.tsx: category field debug log
- TimezoneForm.tsx: currentTimezone debug log
## How
Simply deleted the console.log lines, as they served no purpose other than debugging during development.
## Checklist
- [x] Code follows project conventions
- [x] Only frontend changes (4 files, 6 lines removed)
- [x] No functionality changes
Co-authored-by: slepybear <slepybear@users.noreply.github.com> |
||
|
|
d13a85bef7 |
feat(frontend): surface scheduled agents in library & copilot briefings (#12818)
## Why
Scheduled agents weren't well-surfaced in the Library and Copilot
briefings:
- The Library fleet summary didn't count agents that are scheduled
purely via the scheduler (only those with a `recommended_schedule_cron`
set at the agent level).
- Sitrep items didn't distinguish scheduled or listening (trigger-based)
agents, so they often fell back to a generic "idle" state.
- Scheduled chips showed a generic message with no indication of when
the next run would happen.
- The Copilot Agent Briefing surfaced every scheduled agent regardless
of how far out the next run was — an agent scheduled a month away would
take a slot from something actually happening soon.
- Long sitrep messages overflowed the row.
## What
- Add `is_scheduled` to `LibraryAgent` (sourced from the scheduler) so
the frontend can reliably detect schedule-only agents.
- Count scheduled agents in `useLibraryFleetSummary`.
- Include scheduled and listening agents in sitrep items, with a
priority ordering (error → running → stale → success → listening →
scheduled → idle).
- Show a relative next-run time on scheduled sitrep chips (e.g.
"Scheduled to run in 2h" / "in 3d").
- Filter the Copilot Agent Briefing to scheduled agents whose next run
is within the next 3 days.
- Truncate long sitrep messages to 1 line with `OverflowText` and show
the full text in a tooltip on hover.
## How
- Scheduler → `LibraryAgent` mapping populates `is_scheduled` /
`next_scheduled_run`.
- `useSitrepItems` gains an optional `scheduledWithinMs` parameter.
Copilot's `usePulseChips` passes `3 * 24 * 60 * 60 * 1000`; the Library
briefing omits it to keep its existing (unbounded) behavior.
- Scheduled config-based sitrep items are skipped when
`next_scheduled_run` is missing or outside the window.
- `SitrepItem` wraps the message in `OverflowText` so a single-line
ellipsis + hover tooltip replaces raw overflow.
## Test plan
- [ ] `/library` — scheduled and listening agents appear in the sitrep
with accurate copy; fleet summary counts scheduled agents correctly;
long messages truncate with a tooltip on hover.
- [ ] `/copilot` — on an empty session with the `AGENT_BRIEFING` flag
on, the briefing only shows scheduled agents whose next run is within 3
days; agents scheduled further out no longer appear as "scheduled"
chips.
- [ ] Scheduled chip text reads "Scheduled to run in {Nm|Nh|Nd}"
matching `next_scheduled_run`.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
60b85640e7 |
fix(backend/copilot): replace dedup lock with idempotent append_and_save_message (#12814)
## Why
The Redis dedup lock (`chat:msg_dedup:{session}:{content_hash}`, 30s
TTL) was solving the wrong problem:
- Its purpose: block infra/nginx retries from calling
`append_and_save_message` twice after a client disconnect, writing a
duplicate user message to the DB.
- The approach: deliberately hold the lock for 30s on `GeneratorExit`.
- Why unnecessary: the executor's cluster lock already prevents
duplicate *execution*. The only real gap was duplicate *DB writes* in
the ~1s before the executor picks up the turn.
## What
- **Deleted** `message_dedup.py` and `message_dedup_test.py` (~150 lines
removed).
- **Removed** all dedup lock code from `routes.py` (~40 lines removed).
- **`append_and_save_message`** is now idempotent and self-contained:
- Uses redis-py's built-in `Lock(timeout=10, blocking_timeout=2)` —
Lua-script atomic acquire/release, no manual poll/sleep loop (see the
sketch after this list).
- Lock context manager yields `bool` (`True` = acquired, `False` =
degraded). When degraded (Redis down or 2s timeout), reads from DB
directly instead of cache to avoid stale-state duplicates.
- Idempotency check: if `session.messages[-1]` already matches the
incoming role+content, returns `None` instead of the session.
- Lock released explicitly as soon as the write completes; `try/except`
in `finally` so a cleanup error after a successful write never surfaces
a false 500.
- On cache-write failure, the stale cache entry is invalidated so future
reads fall back to the authoritative DB.
- **`routes.py`** uses the `None` signal: `is_duplicate_message = (await
append_and_save_message(...)) is None`
- Skips `create_session` and `enqueue_copilot_turn` for duplicates —
client re-attaches to the existing turn's Redis stream.
- `track_user_message` and `turn_id` generation only happen when
`is_duplicate_message` is false.
- **`subscribe_to_session`** retries increased from 1×50ms to 3×100ms —
covering the gap where a duplicate request subscribes before the
original's `create_session` hset completes.
- **Cleaned up** `routes_test.py`: removed 5 dedup-specific tests and
the `mock_redis` setup from `_mock_stream_internals`; added
duplicate-skips-enqueue test.
## How
The idempotency guard distinguishes legitimate same-text messages from
retries via the **assistant turn between them**: if the user said "yes",
got a response, and says "yes" again, `session.messages[-1]` is the
assistant reply, so the role check fails and the second message goes
through. A retry (no response yet) sees the user message as the last
entry and is blocked.
```python
if (
session.messages
and session.messages[-1].role == message.role
and session.messages[-1].content == message.content
):
return None # duplicate — caller skips enqueue
```
The Redis lock ensures this check always sees authoritative state even
in multi-replica deployments. When the lock is unavailable (Redis down
or contention), reading from DB directly (bypassing potentially stale
cache) provides the same safety guarantee at the cost of a DB
round-trip.
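A minimal sketch of that lock wrapper, assuming redis-py's asyncio
client (the function name and key format are illustrative; the
`timeout=10` / `blocking_timeout=2` values are the ones described
above):

```python
from contextlib import asynccontextmanager

from redis.asyncio import Redis
from redis.exceptions import RedisError


@asynccontextmanager
async def session_write_lock(redis: Redis, session_id: str):
    """Yield True when the lock is held, False to signal degraded mode.

    In degraded mode (Redis down or >2s of contention) the caller reads
    the session from the DB instead of the cache before running the
    idempotency check.
    """
    lock = redis.lock(f"chat:session_write:{session_id}", timeout=10)
    acquired = False
    try:
        try:
            acquired = await lock.acquire(blocking=True, blocking_timeout=2)
        except RedisError:
            acquired = False  # Redis unavailable: degrade, don't fail
        yield acquired
    finally:
        if acquired:
            try:
                # The real code releases right after the DB write; this
                # sketch releases on context exit.
                await lock.release()
            except RedisError:
                pass  # a cleanup error must never surface as a false 500
```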
## Checklist
- [x] PR targets `dev`
- [x] Conventional commit title with scope
- [x] Tests added/updated (duplicate detection, lock degradation, DB
error, cache invalidation paths)
- [x] `poetry run format` and `poetry run pyright` pass clean
- [x] No new linter suppressors
|
||
|
|
87e4d42750 |
fix(backend/copilot): fix initial load missing messages + forward pagination for completed sessions (#12796)
### Why / What / How

**Why:** Completed copilot sessions with many messages showed a
completely empty chat view. A user reported a 158-message session that
appeared blank on reload.

**What:** Two bugs fixed:
1. **Backend** — initial page load always returned the newest 50
messages in DESC order. For sessions heavy in tool calls, the user's
original messages (seq 0–5) were never included; all 50 slots were
consumed by mid-session tool outputs.
2. **Frontend** — `convertChatSessionToUiMessages` silently dropped
user messages with null/empty content.

**How:** For completed sessions (no active stream), the backend now
loads from sequence 0 in ASC order. Active/streaming sessions keep
newest-first for streaming context. A new `after_sequence` forward
cursor enables infinite scroll for subsequent pages (the sentinel moves
to the bottom). The frontend wires `forward_paginated` +
`newest_sequence` end-to-end.

### Changes 🏗️
- `db.py`: added `from_start` (ASC) and `after_sequence` (forward
cursor) modes; added `newest_sequence` to `PaginatedMessages`
- `routes.py`: detect completed vs. active on initial load; pass
`from_start=True` for completed; expose `newest_sequence` +
`forward_paginated`; accept an `after_sequence` param
- `convertChatSessionToUiMessages.ts`: never drop user messages with
empty content
- `useLoadMoreMessages.ts`: forward pagination via `after_sequence`;
append pages to the end
- `ChatMessagesContainer.tsx`: `LoadMoreSentinel` at the bottom for
forward-paginated sessions
- Wire `newestSequence` + `forwardPaginated` end-to-end through
`useChatSession`/`useCopilotPage`/`ChatContainer`
- `openapi.json`: add `after_sequence` +
`newest_sequence`/`forward_paginated`; regenerate types
- `db_test.py`: 9 new unit tests for the `from_start` and
`after_sequence` modes

### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Open a completed session with many messages — first user
message visible on initial load
  - [x] Scroll to the bottom of a completed session — load more appends
the next page
  - [x] Open an active/streaming session — newest messages shown first,
streaming unaffected
  - [x] Backend unit tests: all 28 pass
  - [x] Frontend lint/format: clean, no new type errors
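The two cursor modes boil down to simple page selection over the
message sequence. A minimal in-memory sketch (`Message` and `load_page`
are illustrative stand-ins for the real `db.py` query; only the
`from_start`/`after_sequence` semantics come from this PR):

```python
from typing import NamedTuple


class Message(NamedTuple):
    sequence: int
    content: str


def load_page(
    messages: list[Message],
    page_size: int = 50,
    from_start: bool = False,
    after_sequence: int | None = None,
) -> list[Message]:
    """In-memory stand-in for the three db.py page-selection modes."""
    ordered = sorted(messages, key=lambda m: m.sequence)
    if after_sequence is not None:
        # Forward cursor: the page strictly after the last-seen sequence,
        # driven by the bottom LoadMoreSentinel on completed sessions.
        return [m for m in ordered if m.sequence > after_sequence][:page_size]
    if from_start:
        # Completed sessions: load from sequence 0 in ASC order so the
        # user's first messages land on the initial page.
        return ordered[:page_size]
    # Active/streaming sessions: keep the newest page first (DESC).
    return list(reversed(ordered))[:page_size]
```

---------

Co-authored-by: chernistry <73943355+chernistry@users.noreply.github.com>
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
|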
||
|
|
0339d95d12 |
fix(frontend): small UI fixes, sort menu bg, name update auth, stats grid overflow, pulse chips (#12815)
## Summary

- **LibrarySortMenu / AgentFilterMenu**: Force `!bg-transparent` and
neutralise legacy `SelectTrigger` styles (`m-0.5`, `ring-offset-white`,
`shadow-sm`) that caused a white background around the trigger
- **EditNameDialog**: Replace client-side `supabase.auth.updateUser()`
with a server-side `PUT /api/auth/user` route — fixes the "Auth session
missing!" error caused by `httpOnly` cookies being inaccessible to
browser JS
- **StatsGrid**: Swap the label `Text` for `OverflowText` so tile
labels truncate with `…` and show a tooltip instead of wrapping when
the grid is squeezed
- **PulseChips**: Set a fixed `15rem` chip width with `shrink-0`,
horizontal scroll, and a styled thin scrollbar
- **Tests**: Updated `EditNameDialog` tests to use MSW instead of
mocking the Supabase client; added 7 new `PulseChips` integration tests

## Test plan

- [x] `pnpm test:unit` — all 1495 tests pass (91 files)
- [x] `pnpm format && pnpm lint` — clean
- [x] `pnpm types` — no new errors (pre-existing only)
- [ ] QA `/library?sort=updatedAt` — sort menu trigger has no white bg
- [ ] QA `/library` — StatsGrid labels truncate with a tooltip on
narrow viewports
- [ ] QA `/copilot` — PulseChips scroll horizontally at fixed width
- [ ] QA `/copilot` — Edit name dialog saves successfully (no "Auth
session missing!")

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|