Mirror of https://github.com/Significant-Gravitas/AutoGPT.git
Synced 2026-04-30 03:00:41 -04:00 at commit `23b5e1272ed9bbb4133eec6ebfd49ca456258b09`
8428 commits

---

**`23b5e1272e`** feat: add OpenAI GPT-image models to image blocks
Add `gpt-image-1`, `gpt-image-1.5`, `gpt-image-2`, and `gpt-image-1-mini` as model options in:

- `AIImageGeneratorBlock`
- `AIImageEditorBlock`
- `AIImageCustomizerBlock`

Changes:

- Expand model enums in all three blocks
- Update credentials to accept `Union[Replicate, OpenAI]`
- Add OpenAI API branches using `images.generate` and `images.edit`
- Add block pricing (8-20 credits) consistent with existing tiers
- Rename `GeminiImageModel` -> `ImageCustomizerModel` (backwards compatible)
- Rename `FluxKontextModelName` -> `ImageEditorModel` (backwards-compatible alias)
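The model-to-provider routing this change implies can be sketched as follows. This is a minimal illustration, not the repo's actual code: the enum values for the new models come from the commit message, but the enum/helper names and the pre-existing Replicate model are placeholders.

```python
from enum import Enum


class ImageGenModel(str, Enum):
    # Pre-existing Replicate-routed model (illustrative placeholder)
    FLUX_SCHNELL = "flux-schnell"
    # Newly added OpenAI models from this commit
    GPT_IMAGE_1 = "gpt-image-1"
    GPT_IMAGE_1_5 = "gpt-image-1.5"
    GPT_IMAGE_2 = "gpt-image-2"
    GPT_IMAGE_1_MINI = "gpt-image-1-mini"


# Models that take the new OpenAI branch; everything else stays on Replicate.
OPENAI_MODELS = {
    ImageGenModel.GPT_IMAGE_1,
    ImageGenModel.GPT_IMAGE_1_5,
    ImageGenModel.GPT_IMAGE_2,
    ImageGenModel.GPT_IMAGE_1_MINI,
}


def provider_for(model: ImageGenModel) -> str:
    """Pick the credential/API branch for a given model."""
    return "openai" if model in OPENAI_MODELS else "replicate"
```

With a `Union[Replicate, OpenAI]` credential field, a block would dispatch on `provider_for(...)` to decide which client branch to execute.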

---

**`c56c1e5dd6`** fix(backend/copilot): disable ask_question tool pending UX rework (#12887)
### Why / What / How

**Why:** The in-conversation Question GUI is unreliable in production — users submitting answers can get their messages dropped, and the agent gets stuck on the auto-generated "please proceed" step with no way to make progress. Discord report: https://discord.com/channels/1126875755960336515/1496474512966029472/1496537943287005365 (see attached video). Pause/queue semantics still need a rework; until then, the right call is to stop the model from reaching for this tool.

**What:** Removes `ask_question` from the copilot tool registry so the model never sees or calls it. Historical sessions that already contain `ask_question` tool calls still render (frontend renderers + response model untouched), so this is non-destructive to existing chats. Re-enabling once the UX is reworked is a small revert.

**How:**

- Drop the `AskQuestionTool` import + registry entry from `backend/copilot/tools/__init__.py`.
- Drop `"ask_question"` from the `ToolName` literal in `backend/copilot/permissions.py` — required because a runtime consistency check asserts the literal matches `TOOL_REGISTRY.keys()`.
- Delete the "Clarifying — Before or During Building" section from `backend/copilot/sdk/agent_generation_guide.md` so the SDK-mode system prompt no longer instructs the model to call `ask_question`.
- Drop the three `prompting_test.py` tests that asserted the guide mentions that section.
- Keep `ask_question.py`, its unit test, `ClarificationNeededResponse`, and the frontend `AskQuestion`/`ClarificationQuestionsCard` components untouched so old sessions still render and re-enabling is a small revert.

### Changes 🏗️

- `backend/copilot/tools/__init__.py` — remove `AskQuestionTool` import and `"ask_question"` entry in `TOOL_REGISTRY`.
- `backend/copilot/permissions.py` — remove `"ask_question"` from the `ToolName` literal.
- `backend/copilot/sdk/agent_generation_guide.md` — remove the "Clarifying — Before or During Building" section.
- `backend/copilot/prompting_test.py` — remove `TestAgentGenerationGuideContainsClarifySection` and the now-unused `Path` import.

### Checklist 📋

#### For code changes:

- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [ ] I have tested my changes according to the test plan:
  - [x] `poetry run pytest backend/copilot/tools/ backend/copilot/permissions_test.py backend/copilot/prompting_test.py` — 805+78 tests pass; the consistency check between the `ToolName` literal and `TOOL_REGISTRY` still holds.
  - [ ] Smoke-test in dev: start a copilot session and confirm the model no longer lists/calls `ask_question` (its OpenAI tool schema is gone from `get_available_tools()` and from the SDK `allowed_tools`).
  - [ ] Load a historical session that contains an `ask_question` tool call in its transcript — confirm the frontend still renders the question card (no regression on legacy sessions).
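The runtime consistency check mentioned above forces the literal and the registry to be edited together. A minimal sketch of that invariant (registry contents here are placeholders, not the real tool set):

```python
from typing import Literal, get_args

# Hypothetical mirror of the tool registry after the removal:
# "ask_question" is absent from both the registry and the literal.
TOOL_REGISTRY: dict[str, object] = {
    "create_agent": object(),
    "run_block": object(),
}

ToolName = Literal["create_agent", "run_block"]


def assert_tool_names_consistent() -> None:
    """The check the PR relies on: the ToolName literal must match
    TOOL_REGISTRY.keys(), so disabling a tool touches both places."""
    assert set(get_args(ToolName)) == set(TOOL_REGISTRY), (
        "ToolName literal and TOOL_REGISTRY diverged"
    )
```

Removing a tool only from the registry (or only from the literal) would make this assertion fail at import/test time, which is why the PR edits `permissions.py` alongside `tools/__init__.py`.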

---

**`6fcbe95645`** Merge branch 'master' into dev (autogpt-platform-beta-v0.6.57)

---

**`9703da3dfd`** refactor(backend/copilot): Moonshot module + cache_control widening + partial-messages default-on + title cost (#12882)
## Why

Several loose ends from the Kimi SDK-default merge (#12878), plus follow-ups surfaced during review + E2E testing:

1. **Kimi-specific pricing lived inline in `sdk/service.py`** alongside unrelated SDK plumbing — any future non-Anthropic vendor would have piled onto the same file.
2. **Moonshot's Anthropic-compat endpoint honours `cache_control: {type: ephemeral}`**, but the baseline cache-marking gate (`_is_anthropic_model`) was narrow enough to exclude it → Moonshot fell back to automatic prefix caching, which drifts readily between turns.
3. **Kimi reasoning rendered AFTER the answer text** on dev, because the summary-walk hoist only reorders within one `AssistantMessage.content` list and Moonshot splits each turn into multiple sequential AssistantMessages (text-only, then thinking-only).
4. **Title generation's LLM call bypassed cost tracking** — the admin dashboard under-reported total provider spend by the aggregate of those per-session calls.
5. **The cost override** was using the requested primary model, not the actually-executed model — when the SDK fallback activates, the override mis-routes pricing.

## What

### Moonshot module

New `backend/copilot/moonshot.py`:

- `is_moonshot_model(model)` — prefix check against `moonshotai/`
- `rate_card_usd(model)` — published Moonshot rates, default `(0.60, 2.80)` per MTok, with a per-slug override slot
- `override_cost_usd(...)` — moved from `sdk/service.py`; replaces the CLI's Sonnet-rate estimate with the real rate card
- `moonshot_supports_cache_control(model)` — narrow gate for cache markers

The rate card is **not canonical** — authoritative cost comes from the OpenRouter `/generation` reconcile; this module only improves the in-turn estimate and the reconcile's lookup-fail fallback. Signal authority: reconcile >> rate card >> CLI.

### Baseline cache-control widened to Moonshot

- New `_supports_prompt_cache_markers` = `_is_anthropic_model OR is_moonshot_model`
- Both call sites (system-message cache dict, last-tool cache marker) switched to the wider gate
- OpenAI / Grok / Gemini still return `false` — those endpoints 400 on the unknown field

**Measured impact in /pr-test:** baseline Kimi continuation turns jumped to ~98% cache hit (334 uncached + 12.8K cache_read on a 13.1K prompt).

### SDK partial-messages default-on (fixes the reasoning-order bug)

- `CHAT_SDK_INCLUDE_PARTIAL_MESSAGES` flipped from `default=False` → `default=True`
- The Kimi stream now emits `reasoning-start → reasoning-delta* → reasoning-end → text-start → text-delta*` in the correct order — verified in /pr-test
- Kill-switch: set `CHAT_SDK_INCLUDE_PARTIAL_MESSAGES=false` to fall back to summary-only emission

### SDK cost override scoped to Moonshot

- The call site now explicitly gates on `if _is_moonshot_model(active_model)` — Anthropic turns trust the CLI's number directly
- Added `_RetryState.observed_model`, populated from `AssistantMessage.model` and preferred over `state.options.model`, so fallback-model turns bill correctly (addresses CodeRabbit review)

### Title cost capture

- `_generate_session_title` now returns `(title, ChatCompletion)` so the caller controls cost persistence
- `_update_title_async` runs title-persist and cost-record as independent best-effort steps
- `_title_usage_from_response` helper reads `prompt_tokens / completion_tokens / cost_usd` (OpenRouter's `usage.cost` off `model_extra`)
- Provider label derived from `ChatConfig.base_url` (`open_router` / `openai`)
- No exception suppressors — an `isinstance(cost_raw, (int, float))` check replaces the inner `float()` try/except

### Misc

- Kimi tool-name whitespace strip in the response adapter — Kimi occasionally emits tool names with leading spaces the CLI dispatcher can't resolve
- TODO marker on the rate card for post-prod-soak removal

## How

- Detection is **prefix-based** (`moonshotai/`) — future Kimi SKUs transparently inherit the rate card + cache-control gate
- Baseline cache-marking was already structured; only the gate changes
- Partial-messages default-on relies on the adapter's diff-based reconcile (shipped in #12878), which has soaked stable
- The title cost path mirrors `tools/web_search.py`'s pattern for reading OpenRouter's `usage.cost`

## Test plan

- [x] `pytest backend/copilot/moonshot_test.py` — 21 tests
- [x] `pytest backend/copilot/baseline/service_unit_test.py` — updated for the widened gate
- [x] `pytest backend/copilot/sdk/*_test.py backend/copilot/service_test.py` — no regressions
- [x] Full E2E on local native stack — 10/10 scenarios pass (see test-report comment)
- [x] Measured: baseline Kimi ~98% cache hit on continuation, SDK Kimi ~62% (capped by Moonshot's prefix ceiling)

## Deferred

SDK-path Moonshot cache hit rate stays at ~62% on long prompts. `native_tokens_cached=18432` regardless of turn/session suggests a Moonshot-side cap on cached prefix size. Not fixable by our code — requires the proxy rewriting requests or an upstream Moonshot change.
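The module's shape, as described above, can be sketched like this. The prefix and the default `(0.60, 2.80)` per-MTok rate come from the PR text; `estimate_cost_usd` and the override table are illustrative additions, and the real authority remains the OpenRouter reconcile.

```python
_MOONSHOT_PREFIX = "moonshotai/"

# Default published (input, output) USD rates per MTok; per-slug overrides
# would slot in here. NOT canonical: the OpenRouter /generation reconcile
# is the authoritative cost signal.
_DEFAULT_RATE = (0.60, 2.80)
_RATE_OVERRIDES: dict[str, tuple[float, float]] = {}


def is_moonshot_model(model: str) -> bool:
    """Prefix-based detection: future Kimi SKUs inherit automatically."""
    return model.startswith(_MOONSHOT_PREFIX)


def rate_card_usd(model: str) -> tuple[float, float]:
    return _RATE_OVERRIDES.get(model, _DEFAULT_RATE)


def estimate_cost_usd(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """In-turn estimate from the rate card (illustrative helper)."""
    in_rate, out_rate = rate_card_usd(model)
    return (prompt_tokens * in_rate + completion_tokens * out_rate) / 1_000_000
```

At the default rate, a ~30K-prompt / ~80-completion turn estimates to roughly $0.018, in line with the real-cost figure quoted for Kimi elsewhere in this log.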

---

**`ebb0d3b95b`** feat(backend/copilot): LaunchDarkly per-user model routing (#12881)
## Summary

Per-user model routing for the copilot via LaunchDarkly. Replaces the pure-env-var pick on every `(mode, tier)` cell of the model matrix with an LD-first resolver that falls back to the `ChatConfig` default. Lets us roll out non-default routes (e.g. Kimi K2.6 on baseline standard) to a user cohort without shipping a deploy.

|          | standard                          | advanced                          |
|----------|-----------------------------------|-----------------------------------|
| fast     | `copilot-fast-standard-model`     | `copilot-fast-advanced-model`     |
| thinking | `copilot-thinking-standard-model` | `copilot-thinking-advanced-model` |

All four flags are **string-valued** — the value IS the model identifier (e.g. `"anthropic/claude-sonnet-4-6"` or `"moonshotai/kimi-k2.6"`).

## What ships

- **New module `backend/copilot/model_router.py`** with a single `resolve_model(mode, tier, user_id, *, config)` coroutine. That's the one place both paths consult.
- **4 new `Flag` enum values** in `backend/util/feature_flag.py` (reusing the existing `get_feature_flag_value` helper, which already supports arbitrary return types).
- **`baseline/service.py::_resolve_baseline_model`** → async, takes `user_id`.
- **`sdk/service.py::_resolve_sdk_model_for_request`** → takes `user_id`, consults LD for both the standard and advanced thinking cells.
- **Default flip**: the `fast_standard_model` default goes back to `anthropic/claude-sonnet-4-6`. Non-Anthropic routes now ship via LD targeting — safer rollback, per-user cohort control, no redeploy required to flip.

## Behavior preserved

- The `config.claude_agent_model` explicit override still wins unconditionally (existing escape hatch for ops).
- `use_claude_code_subscription=true` on the standard thinking tier still returns `None` so the CLI picks the model tied to the user's Claude Code subscription.
- All legacy env var aliases (`CHAT_MODEL`, `CHAT_ADVANCED_MODEL`, `CHAT_FAST_MODEL`) still bind to their cells.
- LD client exceptions / misconfigured (non-string) flag values fall back silently to the config default with a single warning log — never fails the request.

## Files

| File | Change |
|---|---|
| `backend/copilot/model_router.py` | new — `resolve_model` + `_config_default` + `_FLAG_BY_CELL` map |
| `backend/copilot/model_router_test.py` | new — 11 cases |
| `backend/util/feature_flag.py` | add 4 string-valued `Flag` entries |
| `backend/copilot/config.py` | flip `fast_standard_model` default to Sonnet |
| `backend/copilot/baseline/service.py` | `_resolve_baseline_model` → async + LD resolver |
| `backend/copilot/sdk/service.py` | `_resolve_sdk_model_for_request` → LD resolver + user_id |
| `backend/copilot/baseline/transcript_integration_test.py` | update tests for new signature + default |

## Test plan

- [x] `poetry run pytest backend/copilot/model_router_test.py backend/copilot/baseline/transcript_integration_test.py backend/copilot/sdk/service_test.py backend/copilot/config_test.py` — **112 passing**
- [x] 11 resolver cases: missing user → fallback, LD string wins, whitespace stripped, non-string value → fallback, empty string → fallback, LD exception → fallback + warn, each of the 4 cells routes to its distinct flag
- [x] Legacy env aliases still bind to their new fields
- [ ] Manual dev-env smoke: flip `copilot-fast-standard-model` LD targeting to `moonshotai/kimi-k2.6` for one user and confirm baseline uses Kimi while other users stay on Sonnet
- [ ] Confirm the SDK path still honors subscription mode (LD not consulted when `use_claude_code_subscription=true` + standard tier)

## Rollout

1. Merge this PR → the default stays Sonnet / Opus across the matrix, no behavior change.
2. Create the 4 LD flags as string-typed in the LaunchDarkly console (defaults matching config, so no drift if targeting is empty).
3. Add per-user / per-cohort targeting in LD for the routes we want to roll out (Kimi on baseline standard for a percentage, etc.).

---

**`b98bcf31c8`** feat(backend/copilot): SDK fast tier defaults to Kimi K2.6 via OpenRouter + vendor-aware cost + cross-model fix (#12878)
## Summary

Make Kimi K2.6 the default for the SDK (extended-thinking) copilot path, mirroring the baseline default landed in #12871. The SDK already routes through OpenRouter (see [`build_sdk_env`](autogpt_platform/backend/backend/copilot/sdk/env.py) — `ANTHROPIC_BASE_URL` is set to OpenRouter's Anthropic-compatible `/v1/messages` endpoint), but the model resolver was unconditionally stripping the vendor prefix, which prevented routing to anything except Anthropic models. This PR unblocks Kimi (and any other non-Anthropic OpenRouter vendor) on the SDK fast tier and flips the default to match the baseline path.

## Why

After #12871 the baseline (`fast_*`) path runs Kimi K2.6 by default — ~5x cheaper than Sonnet at SWE-Bench parity — but the SDK (`thinking_*`) path was still pinned to Sonnet because:

1. **Model name normalization stripped the vendor prefix.** `_normalize_model_name("moonshotai/kimi-k2.6")` returned `"kimi-k2.6"`, which OpenRouter cannot route — the unprefixed form only resolves for Anthropic models. The docstring on `thinking_standard_model` claimed "the Claude Agent SDK CLI only speaks to Anthropic endpoints", but the env builder shows the CLI happily talks to OpenRouter's `/messages` endpoint, which routes to any vendor in the catalog.
2. **The default was `anthropic/claude-sonnet-4-6`.** The same model on a more expensive route.
3. **The cost label was hardcoded to `provider="anthropic"`** on the SDK path's `persist_and_record_usage` call, making cost-analytics rows misleading once Kimi runs.

## What

1. **`_normalize_model_name`** ([sdk/service.py](autogpt_platform/backend/backend/copilot/sdk/service.py)) — when `config.openrouter_active` is True, the canonical `vendor/model` slug is preserved unchanged so OpenRouter can route to the correct provider. Direct-Anthropic mode keeps the existing strip-prefix + dot-to-hyphen conversion (the Anthropic API requires both) and now **raises `ValueError`** when paired with a non-Anthropic vendor slug — a silent strip would have sent `kimi-k2.6` to the Anthropic API and produced an opaque `model_not_found`.
2. **`thinking_standard_model`** ([config.py](autogpt_platform/backend/backend/copilot/config.py)) — default flipped from `anthropic/claude-sonnet-4-6` to `moonshotai/kimi-k2.6`. Field description rewritten; rollback to Sonnet is one env var (`CHAT_THINKING_STANDARD_MODEL=anthropic/claude-sonnet-4.6`).
3. **`@model_validator(mode="after")` on `ChatConfig`** ([config.py:_validate_sdk_model_vendor_compatibility](autogpt_platform/backend/backend/copilot/config.py)) — fail at config load when `use_openrouter=False` is paired with a non-Anthropic SDK slug. The runtime guard in `_normalize_model_name` is kept as defence-in-depth, but the validator turns a per-request 500 into a boot-time error message the operator sees once, before any traffic lands. Covers `thinking_standard_model`, `thinking_advanced_model`, and `claude_agent_fallback_model`. Subscription mode is exempt (the resolver returns `None` and never normalizes). The credential-missing case (`use_openrouter=True` + no `api_key`) is intentionally NOT a boot-time error so CI builds and OpenAPI-schema export jobs that construct `ChatConfig()` without secrets keep working — the runtime guard still catches it on the first SDK turn.
4. **Cost provider attribution** ([sdk/service.py:stream_chat_completion_sdk](autogpt_platform/backend/backend/copilot/sdk/service.py)) — `persist_and_record_usage` now passes `provider="open_router" if config.openrouter_active else "anthropic"` instead of a hardcoded `"anthropic"`. The dollar value still comes from `ResultMessage.total_cost_usd`; this just fixes the analytics label.
5. **Baseline rollback example** ([config.py:fast_standard_model description](autogpt_platform/backend/backend/copilot/config.py)) — same dot-vs-hyphen footgun fix (CodeRabbit catch).
6. **Tests** — `TestNormalizeModelName` (sdk/) monkeypatches a deterministic config per case (the helper-test variants were passing accidentally based on ambient env). New `TestSdkModelVendorCompatibility` class in `config_test.py` covers all five validator shapes (default-Kimi + direct-Anthropic raises, anthropic override succeeds, openrouter mode succeeds, subscription mode skips the check, advanced + fallback tier also validated, empty fallback skipped). `_ENV_VARS_TO_CLEAR` extended to all model/SDK/subscription env aliases so a leftover dev `.env` value can't mask validator behaviour. New `_make_direct_safe_config` helper for direct-Anthropic tests.

## Test plan

- [x] `poetry run pytest backend/copilot/config_test.py backend/copilot/sdk/service_test.py backend/copilot/sdk/service_helpers_test.py backend/copilot/sdk/env_test.py backend/copilot/sdk/p0_guardrails_test.py` — 238 pass
- [x] `poetry run pytest backend/copilot/` — 2560 pass + 5 pre-existing integration failures (need real API keys / browser env, unrelated)
- [x] CI green on `feat/copilot-sdk-kimi-default` (35 pass / 0 fail / 1 neutral)
- [x] Manual: SDK extended_thinking turn against Kimi K2.6 via OpenRouter on the native dev stack — the request lands with `model=moonshotai/kimi-k2.6`, the response streams back, multi-turn `--resume` recalls facts across turns. Backend log: `[SDK] Per-request model override: standard (moonshotai/kimi-k2.6)`.
- [x] Manual: rollback path — `CHAT_THINKING_STANDARD_MODEL=anthropic/claude-sonnet-4.6` resumes Sonnet routing.

## Known follow-ups (not in this PR)

These surfaced during manual testing and will need separate PRs:

- **SDK CLI cost is wrong for non-Anthropic models.** `ResultMessage.total_cost_usd` comes from a static Anthropic pricing table baked into the CLI binary; for Kimi K2.6 it falls back to Sonnet rates, **over-billing ~5x** ($0.089 vs the real ~$0.018 for ~30K prompt + ~80 completion). The `provider` label is now correct but the dollar value isn't. Needs either a per-model rate-card override on our side or a CLI patch upstream.
- **Mid-session model switch (Kimi → Opus) breaks.** Kimi's `ThinkingBlock`s have no Anthropic `signature` field; when the user toggles standard → advanced after a Kimi turn, Opus rejects the replayed transcript with `Invalid signature in thinking block`. Needs transcript scrubbing on model switch (similar to the existing `TestStripStaleThinkingBlocks` pattern).
- **Reasoning UI ordering on Kimi.** Moonshot/OpenRouter places `reasoning` AFTER text in the response; the SDK's `AssistantMessage.content` reflects that order, and `response_adapter` emits SSE events in the same order — so reasoning lands BELOW the answer in the UI instead of above. Needs `ThinkingBlock` hoisting in `response_adapter.py`.
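The normalization rule in item 1 can be sketched as a standalone function. Behaviour follows the PR text (preserve the slug under OpenRouter; strip the prefix and map dots to hyphens for direct Anthropic; refuse non-Anthropic vendors); the flat signature is an assumption made for the sketch.

```python
def normalize_model_name(slug: str, openrouter_active: bool) -> str:
    """Vendor-aware model-name resolution (illustrative sketch).

    OpenRouter routes on the full canonical ``vendor/model`` slug, so it
    must pass through untouched. The direct Anthropic API wants the bare
    hyphenated name, and silently stripping a foreign vendor prefix would
    yield an opaque model_not_found, so that case raises instead.
    """
    if openrouter_active:
        return slug  # e.g. "moonshotai/kimi-k2.6" stays as-is
    vendor, _, model = slug.partition("/")
    if vendor != "anthropic":
        raise ValueError(f"cannot route {slug!r} in direct-Anthropic mode")
    return model.replace(".", "-")  # "claude-sonnet-4.6" -> "claude-sonnet-4-6"
```

The boot-time `ChatConfig` validator described in item 3 enforces the same vendor compatibility once at load, so this per-request guard only fires as defence-in-depth.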

---

**`4f11867d92`** feat(backend/copilot): TodoWrite for baseline copilot (#12879)
## Summary
Add `TodoWrite` to baseline copilot so the "task checklist" UI works on
non-Claude models (Kimi, GPT, Grok, etc.) the same way it works on the
SDK path. Baseline previously had no `TodoWrite` tool at all — only SDK
mode did via the Claude Code CLI's built-in — so models on baseline just
couldn't reach for a planning checklist.
This closes the last clear feature gap blocking baseline from being the
primary copilot path without giving up model flexibility.
## What ships
- **New MCP tool `TodoWrite`** in `TOOL_REGISTRY`, with a schema matching
the one the frontend's `GenericTool.helpers.ts` (`getToolCategory → "todo"`)
already renders as the **Steps** accordion. The tool is a stateless echo —
the canonical list lives in the model's latest tool-call args and replays
from the transcript on subsequent turns.
- **Prompt guidance** in `SHARED_TOOL_NOTES` teaching the model when to
use it (3+ step tasks; always send the full list; exactly one
`in_progress` at a time).
- **Sharpened `run_sub_session` guidance** in the same prompt section —
framed explicitly as the context-isolation primitive for baseline.
Clearer for the model, no dual-primitive confusion.
## How the SDK path stays untouched
- SDK mode keeps using the CLI-native `TodoWrite` built-in.
- `BASELINE_ONLY_MCP_TOOLS = {"TodoWrite"}` in `sdk/tool_adapter.py`
filters the baseline MCP wrapper out of SDK's `allowed_tools` — no name
shadowing.
- `SDK_BUILTIN_TOOL_NAMES` is now an explicit allowlist (not
auto-derived from capitalization) so the classification stays coherent
when a capitalized tool is platform-owned.
## Files
| File | Change |
|---|---|
| `backend/copilot/tools/todo_write.py` | new — `TodoWriteTool` |
| `backend/copilot/tools/__init__.py` | register in `TOOL_REGISTRY` |
| `backend/copilot/tools/models.py` | add `TodoItem` + `TodoWriteResponse` + `ResponseType.TODO_WRITE` |
| `backend/copilot/permissions.py` | explicit `SDK_BUILTIN_TOOL_NAMES`; `apply_tool_permissions` maps baseline-only tools to the CLI name for SDK |
| `backend/copilot/sdk/tool_adapter.py` | `BASELINE_ONLY_MCP_TOOLS` filter |
| `backend/copilot/prompting.py` | `TodoWrite` + sharpened `run_sub_session` guidance |
| `backend/api/features/chat/routes.py` | add `TodoWriteResponse` to `ToolResponseUnion` |
| `backend/copilot/tools/todo_write_test.py` | new — schema + execute tests |
| `frontend/src/app/api/openapi.json` | regenerated |
| `tools/tool_schema_test.py` | budget bumped `32_800 → 34_000` (actual 33_865; 1_065 over the old budget, 135 headroom under the new) |
## Test plan
- [x] `poetry run pytest backend/copilot/
backend/api/features/chat/routes_test.py` — **1010 passing**
- [x] Tool schema char budget regression gate passes
- [x] `_assert_tool_names_consistent` passes
- [x] **E2E on local native stack (Kimi K2.6 via OpenRouter,
`CHAT_USE_CLAUDE_AGENT_SDK=false`)**: baseline called `TodoWrite` on a
3-step prompt, SSE stream carried the exact `{content, activeForm,
status}` shape the UI expects, "Steps" dialog renders `Task list — 0/3
completed` with all three items (see test-report comment below).
- [x] Negative cases covered: two `in_progress` → rejected, missing
`activeForm` → rejected, non-list `todos` → rejected.
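The negative cases above imply validation rules along these lines. The helper name, return shape, and exact messages are illustrative, not the repo's actual `TodoWriteTool` internals.

```python
REQUIRED_KEYS = ("content", "activeForm", "status")


def validate_todos(todos) -> list[str]:
    """Return a list of validation errors; empty means the list is valid.

    Rules mirrored from the test plan: `todos` must be a list, every item
    needs content/activeForm/status, and at most one item may be
    `in_progress` at a time.
    """
    if not isinstance(todos, list):
        return ["todos must be a list"]
    errors: list[str] = []
    for i, item in enumerate(todos):
        if not isinstance(item, dict):
            errors.append(f"item {i}: must be an object")
            continue
        for key in REQUIRED_KEYS:
            if key not in item:
                errors.append(f"item {i}: missing {key!r}")
    in_progress = sum(
        1 for t in todos
        if isinstance(t, dict) and t.get("status") == "in_progress"
    )
    if in_progress > 1:
        errors.append("at most one item may be in_progress")
    return errors
```

Since the tool is a stateless echo, validation like this is the only server-side logic; the accepted list simply rides the tool-call args back through the transcript.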

---

**`33a608ec78`** feat(platform/copilot): live baseline streaming + render flag + Sonar web_search + simulator cost tracking + reconnect fixes (#12873)
### Why / What / How

**Why.** Four problems on the baseline copilot path that compound:

- Extended-thinking turns froze the UI for minutes because Kimi K2.6 events were buffered in `state.pending_events: list` until the full `tool_call_loop` iteration finished (reasoning arrived in one lump at the end).
- The SSE stream replayed 1000 events on every reconnect, and the frontend opened multiple SSE streams in quick succession on tab-focus thrash (reconnect storm → UI flickers, tab freezes).
- The `web_search` tool hit Anthropic's server-side beta directly via a dispatch-model round-trip that fed entire page contents back through the model for a second inference pass (observed $0.072 on a 74K-token call).
- The simulator dry-run path ran on Gemini Flash without any cost tracking at all, so every dry-run was free on the platform's microdollar ledger.

**What.** Grouped deltas, all targeting reliability, cost, and UX of the copilot live-answer pipeline:

- **Live per-token baseline streaming.** `state.pending_events` is now an `asyncio.Queue` drained concurrently by the outer async generator. The tool-call loop runs as a background task; reasoning / text / tool events reach the SSE wire during the upstream OpenRouter stream, not after it. `None` is the close sentinel; inner-task exceptions are re-raised via `await loop_task` once the sentinel arrives. An `emitted_events: list` mirror preserves post-hoc test inspection. Coalescing widened 32/40 → 64/50 ms to halve the React re-render rate on extended-thinking turns while staying under the ~100 ms perceptual threshold.
- **Reasoning render flag** — `ChatConfig.render_reasoning_in_ui: bool = True`, wired through both `BaselineReasoningEmitter` and `SDKResponseAdapter`. When False, the wire `StreamReasoning*` events are suppressed while the persisted `ChatMessage(role='reasoning')` rows always survive (decoupled from the render flag so audit/replay is unaffected); the service-layer yield filter does the gating. Tokens are still billed upstream; this is an operator kill-switch for UI-level flicker investigations.
- **Reconnect storm mitigations** — `ChatConfig.stream_replay_count: int = 200` (was hard-coded 1000) caps the `stream_registry.subscribe_to_session` XREAD size. Frontend `useCopilotStream::handleReconnect` adds a 1500 ms debounce via `lastReconnectResumeAtRef`, so tab-focus thrash doesn't fan out into 5–6 parallel replays in the same second.
- **web_search rewritten to Perplexity Sonar via OpenRouter** — single unified credential; real `usage.cost` flows through `persist_and_record_usage(provider='open_router')`. Two tiers via a `deep` param: `perplexity/sonar` (~$0.005/call, quick) and `perplexity/sonar-deep-research` (~$0.50–$1.30/call, multi-step research). Replaces the Anthropic-native + server-tool dispatches; drops the hardcoded pricing constants entirely.
- **Synthesised answer surfaced end-to-end** — Sonar already writes a web-grounded answer on the same call we pay for; the new `WebSearchResponse.answer` field passes it through, and the accordion UI renders it above citations so the agent doesn't re-fetch URLs that are usually bot-protected anyway.
- **Deep-tier cost warning + UI affordances** — the `deep` param description is explicit that it's ~100× pricier; UI labels read "Researching / Researched / N research sources" when `deep=true` so users know what's running.
- **Simulator cost tracking + cheaper default** — `google/gemini-2.5-flash` → `google/gemini-2.5-flash-lite` (3× cheaper tokens), and every dry-run now hits `persist_and_record_usage(provider='open_router')` with real `usage.cost`. Previously each sim was free against the user's microdollar budget.
- **Typed access everywhere** — cost extractors now use `openai.types.CompletionUsage.model_extra["cost"]` and `openai.types.chat.ChatCompletion` / `Annotation` / `AnnotationURLCitation` with no `getattr` / duck typing. Mirrors the baseline service's `_extract_usage_cost` pattern; keep in sync.

**How.** Key file touches:

1. `copilot/config.py` — `render_reasoning_in_ui`, `stream_replay_count`, `simulation_model` default.
2. `copilot/baseline/service.py` — `_BaselineStreamState.pending_events: asyncio.Queue`, `_emit` / `_emit_all` helpers; the outer generator runs `tool_call_loop` as a background task and yields from the queue concurrently.
3. `copilot/baseline/reasoning.py` — `BaselineReasoningEmitter(render_in_ui=...)`, coalescing bumped to 64 chars / 50 ms.
4. `copilot/sdk/service.py` — `state.adapter.render_reasoning_in_ui` threaded through every adapter construction.
5. `copilot/sdk/response_adapter.py` — `render_reasoning_in_ui` wiring + service-layer yield filter gating for wire suppression while persistence stays intact.
6. `copilot/stream_registry.py` — `count=config.stream_replay_count`.
7. `frontend/.../useCopilotStream.ts::handleReconnect` — 1500 ms debounce.
8. `copilot/tools/web_search.py` + `models.py` — Sonar quick/deep paths, `WebSearchResponse.answer` + typed extractors.
9. `frontend/.../GenericTool/*` — `answer` render + deep-aware labels / accordion titles.
10. `executor/simulator.py` + `executor/manager.py` + `copilot/config.py` — cost tracking + model swap + `user_id` threading.

### Changes

- `copilot/config.py` — new `render_reasoning_in_ui`, `stream_replay_count`; `simulation_model` default flipped to Flash-Lite.
- `copilot/baseline/service.py` — `pending_events: asyncio.Queue` refactor; the outer gen runs the loop as a task, yields from the queue live.
- `copilot/baseline/reasoning.py` — `BaselineReasoningEmitter(render_in_ui=...)` + 64/50 coalesce.
- `copilot/sdk/service.py` + `response_adapter.py` — `render_reasoning_in_ui` wire suppression (persistence preserved).
- `copilot/stream_registry.py` — replay cap from config.
- `copilot/tools/web_search.py` + `models.py` — Sonar quick/deep + `answer` field + typed extractors.
- `copilot/tools/helpers.py` — tool description tightens the `deep=true` cost warning.
- `frontend/.../useCopilotStream.ts` — reconnect debounce.
- `frontend/.../GenericTool/GenericTool.tsx` + `helpers.ts` + tests — render `answer`, deep-aware verbs / titles.
- `executor/simulator.py` + `simulator_test.py` + `executor/manager.py` — cost tracking + model swap + `user_id` plumbing.

### Follow-up (deferred to a separate PR)

SDK per-token streaming via `include_partial_messages=True` was attempted (commits `599e83543` + `530fa8f95`) and reverted here. The two-signal model (StreamEvent partial deltas + AssistantMessage summary) needs proper per-block diff tracking — when the partial stream delivers a subset of the final block content, emit only `summary.text[len(already_emitted):]` from the summary rather than gating on a binary flag. Binary gating truncated replies in the field when the partial stream delivered less than the summary (observed: "The analysis template you" cut off mid-sentence because the partial stream had delivered that much and the rest only lived in the summary). SDK reasoning still renders end-of-phase (as today); this PR's baseline per-token streaming is unaffected.

### Checklist

For code changes:

- [x] Changes listed above
- [x] Test plan below
- [x] Tested according to the test plan:
  - [x] `poetry run pytest backend/copilot/baseline/ backend/copilot/sdk/ backend/copilot/tools/web_search_test.py backend/executor/simulator_test.py` — all pass (155 baseline + 927 SDK + web_search + simulator)
  - [x] `pnpm types && pnpm vitest run src/app/(platform)/copilot/tools/GenericTool/` — pass
  - [x] Manual: baseline live-streaming — Kimi K2.6 reasoning arrives token-by-token, coalesced (no end-of-stream burst).
  - [x] Manual: quick web_search via copilot UI — ~$0.005/call, answer + citations rendered, cost logged as `provider=open_router`.
  - [x] Manual: deep web_search — dispatched only on explicit research phrasing; `sonar-deep-research` billed, UI labels say "Researched" / "N research sources".
  - [x] Manual: simulator dry-run — Gemini Flash-Lite, `[simulator] Turn usage` log entry, PlatformCostLog row visible.
  - [x] Manual: reconnect debounce — tab-focus thrash no longer produces parallel XREADs in the backend log.
  - [ ] Manual: `CHAT_RENDER_REASONING_IN_UI=false` smoke-check — reasoning collapse absent, no persisted reasoning row on reload.

For configuration changes:

- [x] `.env.default` — new config knobs fall back to pydantic defaults; existing `CHAT_MODEL`/`CHAT_FAST_MODEL`/`CHAT_ADVANCED_MODEL` legacy envs still honored upstream (unchanged by this PR).

### Companion PR

PR #12876 closes the `run_block`-via-copilot cost-leak gap (registers `PerplexityBlock` / `FactCheckerBlock` in `BLOCK_COSTS`; documents the credit/microdollar wallet boundary). Separate because the credit-wallet side is orthogonal to the copilot microdollar / rate-limit surface this PR ships.
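The queue-drain pattern this PR describes for live streaming reduces to a small standalone sketch: a background producer fills an `asyncio.Queue`, the consumer yields events as they arrive, `None` closes the stream, and awaiting the task re-raises any producer exception. Names here are illustrative, not the service's actual identifiers.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable


async def stream_events(
    produce: Callable[[asyncio.Queue], Awaitable[None]],
) -> AsyncIterator[object]:
    """Yield events live while `produce` runs as a background task."""
    queue: asyncio.Queue = asyncio.Queue()

    async def runner() -> None:
        try:
            await produce(queue)
        finally:
            await queue.put(None)  # close sentinel, emitted even on error

    task = asyncio.create_task(runner())
    while True:
        event = await queue.get()
        if event is None:
            break
        yield event  # reaches the consumer before produce() returns
    await task  # re-raises any exception from the producer
```

Compared with buffering everything in a list and yielding after the loop finishes, the consumer here sees each event as soon as it is enqueued, which is exactly the fix for the end-of-turn "lump" described above.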

---

**`e3f6d36759`** feat(backend/blocks): register 13 paid blocks + document credit/microdollar wallet boundary (#12876)
### Why / What / How **Why.** Audit of `BLOCK_COSTS` against `credentials_store.py` system credentials revealed **13 paid blocks** running for free from the credit wallet's perspective — `BLOCK_COSTS.get(type(block))` returned `None`, `cost = 0`, no `spend_credits` deduction. Users without their own API key consumed system credentials with zero credit drain. Separately, the credit wallet (user-facing prepaid balance) and the copilot microdollar counter (operator-side meter that gates `daily_cost_limit_microdollars`) were never documented as separate systems, so future readers kept tripping on the "why isn't this block charging my limit?" question. **What.** Three deltas, all credit-wallet-side: - **Register the 13 paid blocks in `BLOCK_COSTS`** with reasonable per-call credit prices (1 credit = $0.01). Pricing researched against the providers' published rates with ~2-3x markup. - **Document the credit/microdollar boundary** in `copilot/rate_limit.py`: credits = user-facing prepaid wallet with marketplace-creator charging; microdollars = operator-side meter that only ticks on copilot LLM turns (baseline / SDK / web_search / simulator). Block execution bills credits, not microdollars — explicit contract. - **Populate `provider_cost`** on PerplexityBlock so PlatformCostLog rows carry the real OpenRouter `x-total-cost` value via the existing `executor/cost_tracking.log_system_credential_cost` path (separate flow from credit deduction). 
### Block costs registered

| Provider | Block | Credits | Raw cost / markup |
|---|---|---|---|
| Perplexity (OpenRouter) | PerplexityBlock — Sonar | 1 | $0.001-0.005 / call |
| | PerplexityBlock — Sonar Pro | 5 | $0.025 / call |
| | PerplexityBlock — Sonar Deep Research | 10 | up to $0.05 / call |
| Jina | FactCheckerBlock | 1 | $0.005 / call |
| Mem0 | AddMemoryBlock | 1 | $0.0004 / call (1c floor) |
| | SearchMemoryBlock | 1 | $0.004 / call |
| | GetAllMemoriesBlock | 1 | $0.004 / call |
| | GetLatestMemoryBlock | 1 | $0.004 / call |
| ScreenshotOne | ScreenshotWebPageBlock | 2 | $0.0085 / call (2.4x) |
| Nvidia | NvidiaDeepfakeDetectBlock | 2 | est $0.005 (no public SKU) |
| Smartlead | CreateCampaignBlock | 2 | $0.0065 send-equivalent (3x) |
| | AddLeadToCampaignBlock | 1 | $0.0065 (1.5x) |
| | SaveCampaignSequencesBlock | 1 | config-only |
| ZeroBounce | ValidateEmailsBlock | 2 | $0.008 / email (2.5x) |
| E2B + Anthropic | ClaudeCodeBlock | **100** | $0.50-$2 / typical session (E2B sandbox + in-sandbox Claude) |

**Not in scope** — already covered via the SDK `ProviderBuilder.with_base_cost()` pattern in their respective `_config.py`: Exa, Linear, Airtable, Bannerbear, Wolfram, Firecrawl, Wordpress, Baas, Stagehand, Dataforseo.

### How

1. `backend/data/block_cost_config.py` — 13 new `BlockCost` entries (3 Perplexity models + Fact Checker + 11 from this round).
2. `backend/copilot/rate_limit.py` — boundary docstring.
3. `backend/blocks/perplexity.py` — populate `NodeExecutionStats.provider_cost` so PlatformCostLog rows carry the real OpenRouter `x-total-cost` value.
4. Tests — `TestUnregisteredBlockRunsFree` regression + `TestNewlyRegisteredBlockCosts` pinning every new entry by `cost_amount` so a future refactor can't quietly drop one.

The companion Notion "Platform System Credentials" database has been updated with a new `Platform Credit Cost` column populated across all 30 provider rows.
### Scope trim

An earlier revision piped block execution cost into the **copilot microdollar counter** via `_record_block_microdollar_cost` in `copilot/tools/helpers.py::execute_block`. That was reverted in `16ae0f7b5` — the microdollar counter stays scoped to copilot LLM turns only; the credit wallet handles block execution. The pipe-through crossed a boundary we explicitly want to keep.

### Changes

- `backend/data/block_cost_config.py` — 13 × `BlockCost` entries across 7 providers.
- `backend/blocks/perplexity.py` — populate `provider_cost` on the execution stats (feeds PlatformCostLog).
- `backend/copilot/rate_limit.py` — boundary docstring only (no behaviour change).
- `backend/copilot/tools/helpers_test.py` — `TestUnregisteredBlockRunsFree` + `TestNewlyRegisteredBlockCosts` (8 new regression tests).
- `backend/blocks/block_cost_tracking_test.py` — provider-cost extraction pins.

### Checklist

For code changes:

- [x] Changes listed above
- [x] Test plan below
- [x] Tested according to the test plan:
  - [x] `poetry run pytest backend/copilot/tools/helpers_test.py backend/copilot/tools/run_block_test.py backend/copilot/tools/continue_run_block_test.py backend/blocks/block_cost_tracking_test.py backend/blocks/test/test_perplexity.py` — passes
  - [x] `poetry run pytest backend/executor/manager_cost_tracking_test.py backend/copilot/rate_limit_test.py backend/copilot/token_tracking_test.py` — passes (confirms docstring edits didn't regress the LLM-turn microdollar path)
  - [x] Pyright clean on all touched files
  - [ ] Manual: run PerplexityBlock via copilot `run_block` — credits deduct, PlatformCostLog row visible with `provider_cost`, no microdollar-counter tick.
  - [ ] Manual: run an unregistered block via copilot — no error, no credit drain, no silent billing.
  - [ ] Manual: run ClaudeCodeBlock via builder — 100 credits deducted from wallet.
### Companion PR

PR #12873 ships the copilot microdollar / rate-limit work (web_search cost, simulator cost, reasoning / reconnect fixes). This PR is credit-wallet only. |
||
|
|
c1b9ed1f5e |
fix(backend/copilot): allow multiple compactions per turn (#12834)
### Why / What / How
**Why:** The old `CompactionTracker` set a `_done` flag after the first
completion and short-circuited every subsequent compaction in the same
turn. That blocked the SDK-internal compaction from running after a
pre-query compaction had already fired, so prompt-too-long errors
couldn't actually recover — retries saw the flag, bailed, and we re-hit
the context limit.
**What:** Drop the `_done` flag, track attempts and completions as
separate lists, and expose counters + an observability metadata builder
so callers can record compaction activity per turn.
**How:**
- Remove `_done` and `_compact_start` short-circuits.
- Track `_attempted_sources` / `_completed_sources` /
`_completed_count`.
- Expose `attempt_count`, `completed_count`, and
`get_observability_metadata()` / `get_log_summary()` for downstream
instrumentation (no caller change required in this PR).
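The reworked tracker can be sketched as follows — a minimal sketch using the names from this description; the real class in `backend/copilot/sdk/compaction.py` carries more state (transcript-path queueing, hook wiring) than shown here:

```python
# Multi-compaction tracker sketch: no _done flag, so a second compaction
# in the same turn (e.g. SDK-internal after a pre-query one) is recorded
# instead of short-circuited.
from collections import Counter

class CompactionTracker:
    def __init__(self) -> None:
        self._attempted_sources: list[str] = []
        self._completed_sources: list[str] = []

    def on_attempt(self, source: str) -> None:
        self._attempted_sources.append(source)

    def on_complete(self, source: str) -> None:
        self._completed_sources.append(source)

    @property
    def attempt_count(self) -> int:
        return len(self._attempted_sources)

    @property
    def completed_count(self) -> int:
        return len(self._completed_sources)

    def get_log_summary(self) -> str:
        # Per-source completion counts, e.g. "pre_query:1, sdk_internal:2"
        counts = Counter(self._completed_sources)
        return ", ".join(f"{src}:{n}" for src, n in counts.items())

tracker = CompactionTracker()
tracker.on_attempt("pre_query"); tracker.on_complete("pre_query")
# Subsequent SDK-internal compactions in the same turn are no longer blocked:
tracker.on_attempt("sdk_internal"); tracker.on_complete("sdk_internal")
tracker.on_attempt("sdk_internal"); tracker.on_complete("sdk_internal")
assert tracker.completed_count == 3
assert "sdk_internal:2" in tracker.get_log_summary()
```

With the old `_done` flag, the second and third calls above would have been no-ops, which is exactly the failure mode that left prompt-too-long retries stuck.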
### Changes 🏗️
- `backend/copilot/sdk/compaction.py` — rewritten `CompactionTracker`
internals; adds properties + observability helpers.
- `backend/copilot/sdk/compaction_test.py` — tests for multi-compaction
flow + new counters.
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [ ] I have tested my changes according to the test plan:
- [ ] `poetry run pytest backend/copilot/sdk/compaction_test.py -xvs`
passes
- [ ] Local chat that hits prompt-too-long now recovers via SDK
compaction instead of failing the turn
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> **Medium Risk**
> Changes core streaming compaction state transitions and persistence
timing, which could affect UI event sequencing or compaction completion
behavior under concurrency; coverage is improved with new
multi-compaction tests.
>
> **Overview**
> Fixes `CompactionTracker` so compaction is no longer single-shot per
turn: removes the `_done`/event-gate behavior, queues multiple
`on_compact()` hook firings via a pending transcript-path deque, and
allows subsequent SDK-internal compactions after a pre-query compaction
within the same query.
>
> Adds lightweight instrumentation by tracking attempt/completion
sources and counts, plus `get_observability_metadata()` and
`get_log_summary()` (including source summaries like `sdk_internal:2`).
Updates/expands tests to cover multi-compaction flows, transcript-path
handling, and the new counters/metadata.
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
|
||
|
|
45bc167184 |
feat(backend/copilot): Kimi K2.6 fast default + 4-config matrix + coalesced reasoning + web_search tool (#12871)
### Why / What / How

**Why.** Four unrelated but interlocking problems on the baseline (OpenRouter) copilot path, all blocking us from making Kimi K2.6 the default fast model:

1. **Cost / capability gap on the default.** Kimi K2.6 prices at $0.60 / $2.80 per MTok — ~5x cheaper input and ~5.4x cheaper output than Sonnet 4.6 — while tying Opus on SWE-Bench Verified (80.2% vs 80.8%) and beating it on SWE-Bench Pro (58.6% vs 53.4%). OpenRouter exposes the same `reasoning` / `include_reasoning` extension on Moonshot endpoints that #12870 plumbed for Anthropic, so the reasoning collapse lights up end-to-end without per-provider code.
2. **Kimi reasoning deltas freeze the UI.** K2.6 emits ~4,700 reasoning-delta SSE events per turn vs ~28 on Sonnet — the AI SDK v6 Reasoning UIMessagePart can't keep up and the tab locks. Needs a coalescing buffer upstream.
3. **Kimi loops on `require_guide_read`.** The guide-guard checks `session.messages` for a prior `agent_building_guide` call, but tool calls aren't flushed to `session.messages` until the end of the turn — mid-turn the check keeps returning False and Kimi calls the guide-load tool repeatedly in the same turn. Needs an in-flight tracker that lives on `ChatSession`.
4. **No `web_search` tool on either path.** Kimi doesn't have a native web-search equivalent and the SDK path's native `WebSearch` (the Claude Code CLI's built-in) doesn't carry cost accounting. We need one implementation that both paths share and that reports cost through the same tracker as every other tool call.

**What.** Five grouped deltas on the baseline service, tool layer, and config:

- **Kimi K2.6 default.** `fast_standard_model` defaults to `moonshotai/kimi-k2.6`. Full 2×2 model matrix below. Rollback is one env var.
- **4-config model matrix.** `fast_standard_model` / `fast_advanced_model` / `thinking_standard_model` / `thinking_advanced_model`. Each cell is independent, so baseline can run a cheap provider at the standard tier without leaking into the SDK path (which is Anthropic-only by CLI contract). Legacy env vars (`CHAT_MODEL`, `CHAT_FAST_MODEL`, `CHAT_ADVANCED_MODEL`) stay aliased via `validation_alias` so live deployments keep resolving to the same effective cell.
- **Reasoning delta coalescing.** `BaselineReasoningEmitter` buffers deltas and flushes on a char-count OR time-interval threshold (32 chars / 40 ms). ~4,700 → ~150 SSE events per turn on Kimi; no perceptible change on Sonnet (which was already well under the threshold).
- **In-flight tool-call tracker.** `ChatSession._inflight_tool_calls` PrivateAttr is populated when a tool-call block is emitted and cleared at turn end. `session.has_tool_been_called_this_turn(name)` now returns True mid-turn, not just after the tool-result lands in `session.messages` — which is what `require_guide_read` needs to cut the loop.
- **New `web_search` copilot tool.** Wraps Anthropic's server-side `web_search_20250305` beta via `AsyncAnthropic` (direct — OpenRouter can't proxy server-side tool execution). Dispatches through `claude-haiku-4-5` with `max_uses=1`. Cost estimated from published rates ($0.010 per search + Haiku tokens) since the Anthropic Messages API doesn't report cost on the response; reported to `persist_and_record_usage(provider='anthropic')` on both paths. SDK native `WebSearch` moved from `_SDK_BUILTIN_ALWAYS` into `SDK_DISALLOWED_TOOLS` so both paths now dispatch through `mcp__copilot__web_search`.

**How.**

1. `copilot/config.py` — 2×2 model fields with `AliasChoices` preserving legacy env var names. `populate_by_name = True` so `ChatConfig(fast_standard_model=...)` works in tests.
2. `copilot/baseline/service.py::_resolve_baseline_model` — resolves the active baseline cell from `mode` + `tier`, no longer delegates to the SDK resolver.
3. `copilot/baseline/reasoning.py` — `BaselineReasoningEmitter` gains `_pending_delta` / `_last_flush_monotonic` and flushes on `len(_pending_delta) >= _COALESCE_MIN_CHARS` OR `monotonic() - _last_flush_monotonic >= _COALESCE_MAX_INTERVAL_MS / 1000`. `_is_reasoning_route` rewritten as an anchored prefix match covering `anthropic/`, `anthropic.`, `moonshotai/`, and `openrouter/kimi-` — split from the narrower `_is_anthropic_model` gate that still governs `cache_control` markers (which Kimi doesn't support).
4. `copilot/model.py::ChatSession` — `_inflight_tool_calls: set[str] = PrivateAttr(default_factory=set)` plus `announce_inflight_tool_call` / `clear_inflight_tool_calls` / `has_tool_been_called_this_turn`.
5. `copilot/tools/helpers.py::require_guide_read` — check `session.has_tool_been_called_this_turn(_AGENT_GUIDE_TOOL_NAME)` before falling back to scanning `session.messages`.
6. `copilot/tools/web_search.py` — new `WebSearchTool` + `_extract_results` + `_estimate_cost_usd`. `is_available` gated on `Settings().secrets.anthropic_api_key` so the deployment can roll back just by unsetting the key.
7. `copilot/tools/__init__.py` — registers `web_search` in `TOOL_REGISTRY` so it becomes `mcp__copilot__web_search` in the SDK path.
8. `copilot/sdk/tool_adapter.py` — `WebSearch` moves to `SDK_DISALLOWED_TOOLS`.

### Changes

- `copilot/config.py` — 2×2 model matrix with legacy env alias preservation; `populate_by_name=True`.
- `copilot/baseline/service.py::_resolve_baseline_model` — resolves against the new matrix.
- `copilot/baseline/reasoning.py` — `BaselineReasoningEmitter` coalescing buffer; `_is_reasoning_route` rewritten as anchored prefix match (covers `anthropic/`, `anthropic.`, `moonshotai/`, `openrouter/kimi-`).
- `copilot/model.py::ChatSession` — `_inflight_tool_calls` PrivateAttr + helpers.
- `copilot/baseline/service.py::_baseline_tool_executor` — calls `announce_inflight_tool_call` after emitting `StreamToolInputAvailable`; `clear_inflight_tool_calls` in the outer `finally` before persist.
- `copilot/tools/helpers.py::require_guide_read` — reads the new tracker first.
- `copilot/tools/web_search.py` (new) — Anthropic `web_search_20250305` wrapper + cost estimator.
- `copilot/tools/web_search_test.py` (new) — extractor / cost / dispatch / registry tests (12 total).
- `copilot/tools/models.py` — `WebSearchResponse` + `WebSearchResult` + `ResponseType.WEB_SEARCH`.
- `copilot/tools/__init__.py` — registers `web_search`.
- `copilot/sdk/tool_adapter.py` — moves native `WebSearch` to `SDK_DISALLOWED_TOOLS`.

### Checklist

For code changes:

- [x] Changes listed above
- [x] Test plan below
- [ ] Tested according to the test plan:
  - [x] `poetry run pytest backend/copilot/baseline/` — all pass
  - [x] `poetry run pytest backend/copilot/sdk/` — all pass (SDK resolver untouched)
  - [x] `poetry run pytest backend/copilot/tools/web_search_test.py` — 12 pass
  - [ ] Manual: send a multi-step prompt on fast mode with default config; confirm backend routes to `moonshotai/kimi-k2.6`, SSE stream carries `reasoning-start/delta/end` (coalesced), Reasoning collapse renders + survives hard reload.
  - [ ] Manual: 43-tool payload reliability on Kimi — watch for malformed tool-call JSON or wrong-tool selection.
  - [ ] Manual: `CHAT_FAST_STANDARD_MODEL=anthropic/claude-sonnet-4-6` restart confirms Sonnet routing (rollback path works).
  - [ ] Manual: SDK path (`CHAT_USE_CLAUDE_AGENT_SDK=true`) still selects the SDK service and uses `thinking_standard_model` = Sonnet (no Kimi leaked into extended thinking).
  - [ ] Manual: prompt that forces `web_search` — confirm results render, `persist_and_record_usage(provider='anthropic')` runs, cost lands in the per-user ledger.
  - [ ] Manual: ask Kimi a question that would require `agent_building_guide` — confirm the guide loads exactly once per turn (no loop).

For configuration changes:

- [x] `.env.default` — all four model fields fall back to the pydantic defaults; legacy `CHAT_MODEL` / `CHAT_FAST_MODEL` / `CHAT_ADVANCED_MODEL` remain honored via `AliasChoices`. |
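The delta-coalescing idea above can be sketched as follows. The thresholds (32 chars / 40 ms) come from the description; the class and field names here are illustrative, not the real `BaselineReasoningEmitter`:

```python
# Coalescing-buffer sketch: buffer reasoning deltas and flush once either
# a char-count or a time-interval threshold trips, collapsing thousands
# of tiny SSE events into a few hundred larger ones.
import time

_COALESCE_MIN_CHARS = 32
_COALESCE_MAX_INTERVAL_MS = 40

class CoalescingBuffer:
    def __init__(self) -> None:
        self._pending = ""
        self._last_flush = time.monotonic()
        self.flushed: list[str] = []  # stands in for emitted SSE events

    def add_delta(self, delta: str) -> None:
        self._pending += delta
        now = time.monotonic()
        if (len(self._pending) >= _COALESCE_MIN_CHARS
                or (now - self._last_flush) * 1000 >= _COALESCE_MAX_INTERVAL_MS):
            self.flushed.append(self._pending)
            self._pending = ""
            self._last_flush = now

    def finish(self) -> None:
        # Flush whatever remains at end of turn so no text is dropped.
        if self._pending:
            self.flushed.append(self._pending)
            self._pending = ""

buf = CoalescingBuffer()
for _ in range(100):   # 100 tiny 4-char deltas...
    buf.add_delta("abcd")
buf.finish()
# ...collapse into ~13 batches via the char threshold, content intact:
assert "".join(buf.flushed) == "abcd" * 100
assert 13 <= len(buf.flushed) <= 100
```

The time-interval arm of the OR exists so a slow token stream still flushes promptly; the char arm is what does the heavy batching on a fast stream like Kimi's.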
||
|
|
e4f291e54b |
feat(frontend): add AutoGPT logo to share page and zip download for outputs (#11741)
### Why / What / How
**Why:** The share page was unbranded (no logo/navigation) and images
from workspace files couldn't render because the proxy didn't handle
public share URLs. Zip downloads also had several gaps — no size limits,
no workspace file support, silent failures on data URLs, and single
files got wrapped in unnecessary zips.
**What:** Adds AutoGPT branding to the share page, secure public access
to workspace files via a SharedExecutionFile allowlist, and a hardened
zip download module.
**How:** Backend scans execution outputs for `workspace://` URIs on
share-enable and persists an allowlist in a new `SharedExecutionFile`
table. A new unauthenticated endpoint serves files validated against
this allowlist. Frontend proxy routing is extended (with UUID
validation) to handle the 7-segment public share download path as a
binary response. Download logic is consolidated into a shared module
with size limits, parallel fetches, filename sanitization, and
single-file direct download.
### Changes 🏗️
**Share page branding:**
- AutoGPT logo header centered at top, linking to `/`
- Dark/light mode variants with correct `priority` on visible variant
only
**Secure public workspace file access (backend):**
- New `SharedExecutionFile` Prisma model with `@@unique([shareToken,
fileId])` constraint
- `_extract_workspace_file_ids()` scans outputs for `workspace://` URIs
(handles nested dicts/lists)
- `create_shared_execution_files()` / `delete_shared_execution_files()`
manage allowlist lifecycle
- Re-share cleans up stale records before creating new ones (prevents
old token access)
- `GET /public/shared/{token}/files/{id}/download` — validates against
allowlist, uniform 404 for all failures
- `Content-Disposition: inline` for share page rendering
- Hand-written Prisma migration
(`20260417000000_add_shared_execution_file`)
**Frontend proxy fix:**
- `isWorkspaceDownloadRequest` extended to match public share path
(7-segment)
- UUID format validation on dynamic path segments (file IDs, share
tokens)
- 30+ adversarial security tests: path traversal, SQL injection, SSRF
payloads, unicode homoglyphs, null bytes, prototype pollution, etc.
**Download module (`download-outputs.ts`):**
- Consolidated from two divergent copies into single shared module
- `fetchFileAsBlob` with content-length pre-check before buffering
- `sanitizeFilename` strips path traversal, leading dots, falls back to
"file"
- `getUniqueFilename` deduplicates with counter suffix
- `fetchInParallel` with configurable concurrency (5)
- 50 MB per-file limit, 200 MB aggregate limit
- Data URL try-catch, relative URL support (`/api/proxy/...`)
- Single-file downloads skip zip, go directly to browser download
- Dynamic JSZip import for bundle optimization
- 26 unit tests
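The sanitize/dedupe rules above can be illustrated with a behavioral sketch — written in Python for brevity, though the real module (`download-outputs.ts`) is TypeScript; names and exact edge-case handling are assumptions, not the shipped code:

```python
# Sketch of the filename rules: strip any path component, strip leading
# dots, replace unsafe characters, fall back to "file", and dedupe
# collisions with a counter suffix.
import os
import re

def sanitize_filename(name: str) -> str:
    base = os.path.basename(name.replace("\\", "/"))  # drop path traversal
    base = base.lstrip(".")                           # no hidden/dot files
    base = re.sub(r'[<>:"/\\|?*]', "_", base)         # unsafe chars -> _
    return base or "file"

def get_unique_filename(name: str, taken: set[str]) -> str:
    if name not in taken:
        return name
    stem, dot, ext = name.rpartition(".")
    if not dot:
        stem, ext = name, ""
    n = 1
    while True:
        candidate = f"{stem} ({n}){dot}{ext}"
        if candidate not in taken:
            return candidate
        n += 1

assert sanitize_filename("../../etc/passwd") == "passwd"
assert sanitize_filename("...") == "file"
assert get_unique_filename("report.csv", {"report.csv"}) == "report (1).csv"
```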
**Share page file rendering:**
- `WorkspaceFileRenderer` builds public share URLs when `shareToken` is
in metadata
- `RunOutputs` propagates `shareToken` to renderer metadata
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Share page renders with centered AutoGPT logo
- [x] Logo links to `/` and shows correct dark/light variant
- [x] Workspace images render inline on share page
- [x] Download all produces zip with workspace images included
- [x] Single-file download skips zip, downloads directly
- [x] Re-sharing generates new token and cleans up old allowlist records
- [x] Public file download returns 404 for files not in allowlist
- [x] All frontend tests pass (122 tests across 3 suites)
- [x] Backend formatter + pyright pass
- [x] Frontend format + lint + types pass
#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)
> Note: New Prisma migration required. No env/docker changes needed.
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> **Medium Risk**
> Adds a new unauthenticated file download path gated by a database
allowlist plus a new Prisma model/migration; mistakes here could expose
workspace files or break sharing. Frontend download behavior also
changes significantly (zipping/fetching), which could impact
large-output performance and edge cases.
>
> **Overview**
> Enables **public rendering and downloading of workspace files on
shared execution pages** by introducing a `SharedExecutionFile`
allowlist tied to the share token and populating it when sharing is
enabled (and clearing it on disable/re-share).
>
> Adds `GET /public/shared/{share_token}/files/{file_id}/download` (no
auth) that validates the requested file against the allowlist and
returns a uniform 404 on failure; workspace download responses now
support `inline` `Content-Disposition` via the exported
`create_file_download_response` helper.
>
> Frontend updates the share page to pass `shareToken` into output
renderers so `WorkspaceFileRenderer` can build public-share download
URLs; the proxy matcher is extended/strictly UUID-validated for both
workspace and public-share download paths with extensive adversarial
tests. Output downloading is consolidated into `download-outputs.ts`
using dynamic `jszip` import, filename sanitization/deduping,
concurrency + size limits, and a single-file non-zip fast path.
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
|
||
|
|
6efbc59fd8 |
feat(backend): platform server linking API for multi-platform CoPilot (#12615)
## Why
AutoPilot (CoPilot) needs to reach users across chat platforms — Discord
first, Telegram / Slack / Teams / WhatsApp next. To make usage and
billing coherent, every conversation resolves to one AutoGPT account.
There are two independent linking flows:
- **SERVER links**: the first person to claim a server (Discord guild,
Telegram group, …) becomes its owner. Anyone in the server can chat with
the bot; all usage bills to the owner.
- **USER links**: an individual links their 1:1 DMs with the bot to
their own AutoGPT account. Independent from server links — a server
owner still has to link their DMs separately.
## What
Backend for platform linking, split cleanly by trust boundary:
- **Bot-facing operations** run over cluster-internal RPC via a new
`PlatformLinkingManager(AppService)`. No shared bearer token; trust is
the cluster network itself.
- **User-facing operations** stay on REST under JWT auth (the same
pattern as every other feature).
### REST endpoints (JWT auth)
- `GET /api/platform-linking/tokens/{token}/info` — non-sensitive
display info for the link page
- `POST /api/platform-linking/tokens/{token}/confirm` — confirm a SERVER
link
- `POST /api/platform-linking/user-tokens/{token}/confirm` — confirm a
USER link
- `GET /api/platform-linking/links` / `DELETE /links/{id}` — manage
server links
- `GET /api/platform-linking/user-links` / `DELETE /user-links/{id}` —
manage DM links
### `PlatformLinkingManager` `@expose` methods (internal RPC)
- `resolve_server_link(platform, platform_server_id) -> ResolveResponse`
- `resolve_user_link(platform, platform_user_id) -> ResolveResponse`
- `create_server_link_token(req) -> LinkTokenResponse`
- `create_user_link_token(req) -> LinkTokenResponse`
- `get_link_token_status(token) -> LinkTokenStatusResponse`
- `start_chat_turn(req) -> ChatTurnHandle` — resolves the owner,
persists the user message, creates the stream-registry session, enqueues
the turn; returns `(session_id, turn_id, user_id, subscribe_from="0-0")`
so the caller subscribes directly to the per-turn Redis stream.
### New DB models
- `PlatformLink` — `(platform, platformServerId)` → owner's AutoGPT
`userId`
- `PlatformUserLink` — `(platform, platformUserId)` → AutoGPT `userId`
(for DMs)
- `PlatformLinkToken` — one-time token with `linkType` discriminator
(SERVER | USER) and 30-min TTL
## How
- **New `backend/platform_linking/` package**: `models.py` (Pydantic
types), `links.py` (link CRUD helpers — pure business logic), `chat.py`
(`start_chat_turn` orchestration), `manager.py`
(`PlatformLinkingManager(AppService)` + `PlatformLinkingManagerClient`).
Pattern matches `backend/notifications/` + `backend/data/db_manager.py`.
- **Exception translation at the edge**. Helpers raise domain exceptions
(`NotFoundError`, `LinkAlreadyExistsError`, `LinkTokenExpiredError`,
`LinkFlowMismatchError`, `NotAuthorizedError` — all `ValueError`
subclasses in `backend.util.exceptions` so they auto-register with the
RPC exception-mapping). REST routes translate to HTTP codes via a 7-line
`_translate()` helper.
- **Independent scopes, no DM fallback**. `find_server_link()` and
`find_user_link()` each query their own table. A user who owns a linked
server does not leak that identity into their DMs.
- **Race-safe token consumption**. Confirm paths do atomic `update_many`
with `usedAt = None` + `expiresAt > now` in the WHERE clause;
`create_*_token` invalidates pending tokens before issuing a new one.
- **Bug fix**: `start_chat_turn` persists the user message via
`append_and_save_message` before enqueueing the executor turn — mirrors
`backend/api/features/chat/routes.py`. The previous `chat_proxy.py`
skipped this and ran the executor with no user message in history.
- **Streaming**. Copilot streaming lives on Redis Streams (persistent,
replayable). The bot subscribes directly with `subscribe_from="0-0"`, so
late subscribers replay the full stream; no HTTP SSE proxy needed.
- **No PII in logs**: logs reference `session_id`, `turn_id`,
`server_id`, and AutoGPT `user_id` (last 8 chars), but never raw
platform user IDs.
- **New pod**. `PlatformLinkingManager` runs as its own `AppProcess` on
port `8009`; client via `get_platform_linking_manager_client()`. The
infra chart lands in
[cloud-infrastructure#310](https://github.com/Significant-Gravitas/AutoGPT_cloud_infrastructure/pull/310).
## Tests
- **Models** (`models_test.py`) — Platform / LinkType enums, request
validation (CreateLinkToken / ResolveServer / BotChat), response
schemas.
- **Helpers** (`links_test.py`) — resolve, token create (both flows, 409
on already-linked), token status (pending / linked / expired /
superseded-with-no-link), token info (404 / 410), confirm (404 / wrong
flow / already used / expired / same-user / other-user), delete authz.
- **AppService wiring** (`manager_test.py`) — `@expose` methods delegate
to helpers; client surface covers bot-facing ops and excludes
user-facing ones.
- **Adversarial** (`manager_test.py`, `routes_test.py`):
- `asyncio.gather` double-confirm with same user and with two different
users — exactly one winner, other gets clean `LinkTokenExpiredError`, no
double `PlatformLink.create`.
- Server- and user-link confirm races.
- `TokenPath` regex guard: rejects `%24`, URL-encoded path traversal,
>64 chars; accepts `secrets.token_urlsafe` shape.
- DELETE `link_id` with SQL-injection-style and path-traversal inputs
returns 404 via `NotFoundError`.
## Stack
- #12618 — bot service (rebased onto this so it can consume
`PlatformLinkingManagerClient`)
- #12624 — `/link/{token}` frontend page
-
[cloud-infrastructure#310](https://github.com/Significant-Gravitas/AutoGPT_cloud_infrastructure/pull/310)
— Helm chart for `copilot-bot` + new `platform-linking-manager`
Merge order: this → #12618 → #12624, infra whenever.
---------
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: CodeRabbit <noreply@coderabbit.ai>
|
||
|
|
6924cf90a5 |
fix(frontend/copilot): artifact panel fixes (SECRT-2254/2223/2220/2255/2224/2256/2221) (#12856)
### Why / What / How
https://github.com/user-attachments/assets/ca26e0b0-d35d-4a5b-b95f-2421b9907742
**Why** — The Artifact & Side Task List project
(https://linear.app/autogpt/project/artifact-and-side-task-list-ef863c93da3c)
accumulated seven related bugs in the copilot artifact panel. The user
kept seeing panels stuck open, previews broken, clicks not registering —
each ticket was small but they all lived in the same small surface area,
so one review pass is easier than five.
Closes SECRT-2254, SECRT-2223, SECRT-2220, SECRT-2255, SECRT-2224,
SECRT-2256, SECRT-2221.
**What** — Five independent fixes, each in its own commit, shipped
together:
1. **Fragment-link interceptor + render error boundary** (SECRT-2255
crash when clicking `<a href="#x">` in HTML artifacts). Sandboxed srcdoc
iframes resolve fragment links against the parent's URL, so clicking
`#activation` in a Plotly TOC tried to navigate the copilot page into
the iframe. Inject a click-capture script into every artifact iframe;
also wrap the renderer in `ArtifactErrorBoundary` so any future render
throw surfaces with a copyable error instead of a blank panel.
2. **Close panel on copilot page unmount** (SECRT-2254 / 2223 / 2220 —
panel stays open, reopens on unrelated navigation, opens by default on
session switch). The Zustand store outlived page unmounts, so `isOpen:
true` survived `/profile` → `/home` → back. One `useEffect` cleanup in
`useAutoOpenArtifacts` calls `resetArtifactPanel()` on unmount.
3. **Sync loading flip on Try Again** (SECRT-2224 "try again doesn't do
anything"). Retry was correct but the loading-state flip was deferred to
an effect, so a retry that re-failed was visually indistinguishable from
a no-op. `retry()` now sets `isLoading: true` / `error: null`
synchronously with the click so the skeleton flashes every time.
4. **Pointer capture on resize drag** (SECRT-2256 "can't drag right when
expanded far left, click doesn't stop it"). The sandboxed iframe was
eating `pointermove`/`pointerup` events when the cursor drifted over it,
freezing the drag and never delivering the release. `setPointerCapture`
on the handle routes all subsequent pointer events through it regardless
of what's under the cursor.
5. **Stop size-gating natively-rendered artifacts + cache-bust retry**
(SECRT-2221 "broken hi-res PNG preview"). The blanket >10 MB size gate
pushed large images / videos / PDFs into `download-only`, so clicking a
hi-res PNG offered a download instead of a preview. Split the gate so it
only applies to content we actually render in JS (text/html/code/etc).
Image and video retries also append a cache-bust query so the browser
can't silently reuse a negative-cached failure.
**How** — Five commits, one concern each, preserved in the order they
were written. Every fix lands with a regression test that fails on the
unfixed code and passes after.
### Changes 🏗️
- `iframe-sandbox-csp.ts` + usage sites —
`FRAGMENT_LINK_INTERCEPTOR_SCRIPT` injected into all three srcdoc iframe
templates (HTML artifact, inline HTMLRenderer, React artifact).
- `ArtifactErrorBoundary.tsx` (new) — class error boundary local to the
artifact panel with a copyable error fallback.
- `useAutoOpenArtifacts.ts` — unmount cleanup calls
`resetArtifactPanel()`.
- `useArtifactContent.ts` — `retry()` flips loading state synchronously.
- `ArtifactDragHandle.tsx` — `setPointerCapture` /
`releasePointerCapture`; `touch-action: none`.
- `helpers.ts` — split classifier; `NATIVELY_RENDERED` exempts
image/video/pdf from the size gate.
- `ArtifactContent.tsx` — image/video carry a retry nonce that appends
`?_retry=N` on Try Again.
- Test files — new
`ArtifactErrorBoundary`/`ArtifactDragHandle`/`HTMLRenderer` tests, plus
regression cases added to `ArtifactContent.test.tsx`, `helpers.test.ts`,
`iframe-sandbox-csp.test.ts`, `reactArtifactPreview.test.ts`,
`useAutoOpenArtifacts.test.ts`.
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] `pnpm vitest run src/app/\(platform\)/copilot
src/components/contextual/OutputRenderers
src/lib/__tests__/iframe-sandbox-csp.test.ts` — 247/247 pass
- [x] `pnpm format && pnpm types` clean
- [x] Manual: open the Plotly-style TOC HTML artifact (SECRT-2255
repro), click each anchor — iframe scrolls internally, browser URL bar
stays put
- [x] Manual: open panel → navigate to /profile → navigate back → panel
closed (SECRT-2254)
- [x] Manual: panel open in session A → click different session → panel
closed (SECRT-2223)
- [ ] Manual: simulate a failed artifact fetch → click Try Again →
skeleton flashes before result (SECRT-2224)
- [x] Manual: expand panel to near-full width → drag back right,
crossing over the iframe → drag keeps working and release ends it
(SECRT-2256)
- [x] Manual: upload a ~25 MB PNG → clicking it previews in an `<img>`,
not a download button (SECRT-2221)
Replaces #12836, #12837, #12838, #12839, #12840 — same fixes, bundled
for review.
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> **Medium Risk**
> Touches artifact rendering and iframe `srcDoc` generation (including
injected scripts) plus panel state/drag interactions; regressions could
break previews or resizing, but changes are scoped to the copilot
artifact UI with broad test coverage.
>
> **Overview**
> Improves Copilot’s artifact panel resilience and UX by **resetting
panel state on page unmount/session changes**, making content retries
immediately show the loading skeleton, and fixing resize drags via
pointer capture so iframes can’t “steal” pointer events.
>
> Hardens artifact rendering by adding a local `ArtifactErrorBoundary`
that reports to Sentry and shows a copyable error fallback instead of a
blank/crashed panel.
>
> Fixes iframe-based previews by injecting a
`FRAGMENT_LINK_INTERCEPTOR_SCRIPT` into HTML and React artifact `srcDoc`
so `#anchor` clicks scroll within the iframe rather than navigating the
parent URL, and adjusts artifact classification/retry behavior so large
images/videos/PDFs remain previewable and image/video retries cache-bust
failed URLs.
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
|
||
|
|
07e5a6a9e4 |
[Snyk] Security upgrade next from 15.4.10 to 15.4.11 (#12715)

### Snyk has created this PR to fix 1 vulnerability in the yarn
dependencies of this project.
#### Snyk changed the following file(s):
- `autogpt_platform/frontend/package.json`
#### Note for
[zero-installs](https://yarnpkg.com/features/zero-installs) users
If you are using the Yarn feature
[zero-installs](https://yarnpkg.com/features/zero-installs) introduced in
Yarn v2, note that this PR does not update the `.yarn/cache/` directory,
meaning this code cannot be pulled and immediately developed on as one
would expect for a zero-install project; you will need to run `yarn` to
update the contents of the `.yarn/cache` directory.
If you are not using zero-installs, you can ignore this as your flow
should be unchanged.
<details>
<summary>⚠️ <b>Warning</b></summary>
```
Failed to update the yarn.lock, please update manually before merging.
```
</details>
#### Vulnerabilities that will be fixed with an upgrade:
| | Issue |
| :-: | :- |
| | Allocation of Resources Without Limits or Throttling<br/>[SNYK-JS-NEXT-15921797](https://snyk.io/vuln/SNYK-JS-NEXT-15921797) |
---
> [!IMPORTANT]
>
> - Check the changes in this PR to ensure they won't cause issues with
your project.
> - Max score is 1000. Note that the real score may have changed since
the PR was raised.
> - This PR was automatically created by Snyk using the credentials of a
real user.
---
**Note:** _You are seeing this because you or someone else with access
to this repository has authorized Snyk to open fix PRs._
For more information:
🧐 [View latest project
report](https://app.snyk.io/org/significant-gravitas/project/3d924968-0cf3-4767-9609-501fa4962856?utm_source=github&utm_medium=referral&page=fix-pr)
📜 [Customise PR
templates](https://docs.snyk.io/scan-using-snyk/pull-requests/snyk-fix-pull-or-merge-requests/customize-pr-templates?utm_source=github&utm_content=fix-pr-template)
🛠 [Adjust project
settings](https://app.snyk.io/org/significant-gravitas/project/3d924968-0cf3-4767-9609-501fa4962856?utm_source=github&utm_medium=referral&page=fix-pr/settings)
📚 [Read about Snyk's upgrade
logic](https://docs.snyk.io/scan-with-snyk/snyk-open-source/manage-vulnerabilities/upgrade-package-versions-to-fix-vulnerabilities?utm_source=github&utm_content=fix-pr-template)
---
**Learn how to fix vulnerabilities with free interactive lessons:**
🦉 [Allocation of Resources Without Limits or
Throttling](https://learn.snyk.io/lesson/no-rate-limiting/?loc=fix-pr)
---
> [!NOTE]
> **Medium Risk**
> Patch-level upgrade of a core runtime/build dependency (Next.js) can
affect app rendering/build behavior despite being scoped to
dependency/lockfile changes.
>
> **Overview**
> Upgrades the frontend framework dependency `next` from `15.4.10` to
`15.4.11` in `package.json`.
>
> Updates `pnpm-lock.yaml` to reflect the new Next.js version (including
`@next/env`) and re-resolves dependent packages that pin `next` in their
peer/optional dependency graphs (e.g., `@sentry/nextjs`,
`@vercel/analytics`, Storybook Next integration).
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
|
||
|
|
a098f01bd2 |
feat(builder): AI chat panel for the flow builder (#12699)
### Why

The flow builder had no AI assistance. Users had to switch to a separate Copilot session to ask about or modify the agent they were looking at, and that session had no context on the graph — so the LLM guessed, or the user had to describe the graph by hand.

### What

An AI chat panel anchored to the `/build` page. Opens with a chat-circle button (bottom-right), binds to the currently-opened agent, and offers **only** two tools: `edit_agent` and `run_agent`. Per-agent session is persisted server-side, so a refresh resumes the same conversation. Gated behind `Flag.BUILDER_CHAT_PANEL` (default off; `NEXT_PUBLIC_FORCE_FLAG_BUILDER_CHAT_PANEL=true` to enable locally).

### How

**Frontend — new**:
- `(platform)/build/components/BuilderChatPanel/` — panel shell + `useBuilderChatPanel.ts` coordinator. Renders the shared Copilot `ChatMessagesContainer` + `ChatInput` (thought rendering, pulse chips, fast-mode toggle — all reused, no parallel chat stack). Auto-creates a blank agent when opened with no `flowID`. Listens for `edit_agent` / `run_agent` tool outputs and wires them to the builder in-place: edit → `flowVersion` URL param + canvas refetch; run → `flowExecutionID` URL param → builder's existing execution-follow UI opens.

**Frontend — touched (minimal)**:
- `copilot/components/CopilotChatActionsProvider` — new `chatSurface: "copilot" | "builder"` flag so cards can suppress "Open in library" / "Open in builder" / "View Execution" buttons when the chat is the builder panel (you're already there).
- `copilot/tools/RunAgent/components/ExecutionStartedCard` — title is now status-aware (`QUEUED → "Execution started"`, `COMPLETED → "Execution completed"`, `FAILED → "Execution failed"`, etc.).
- `build/components/FlowEditor/Flow/Flow.tsx` — mount the panel behind the feature flag.

**Backend — new**:
- `copilot/builder_context.py` — the builder-session logic module. Holds the tool whitelist (`edit_agent`, `run_agent`), the permissions resolver, the session-long system-prompt suffix (graph id/name + full agent-building guide — cacheable across turns), and the per-turn `<builder_context>` prefix (live version + compact nodes/links snapshot).
- `copilot/builder_context_test.py` — covers both builders, ownership forwarding, and cap behavior.

**Backend — touched**:
- `api/features/chat/routes.py` — `CreateSessionRequest` gains `builder_graph_id`. When set, the endpoint routes through `get_or_create_builder_session` (keyed on `user_id`+`graph_id`, with a graph-ownership check). No new route; the former `/sessions/builder` is folded into `POST /sessions`.
- `copilot/model.py` — `ChatSessionMetadata.builder_graph_id`; `get_or_create_builder_session` helper.
- `data/graph.py` — `GraphSettings.builder_chat_session_id` (new typed field; stores the builder-chat session pointer per library agent).
- `api/features/library/db.py` — `update_library_agent_version_and_settings` preserves `builder_chat_session_id` across graph-version bumps.
- `copilot/tools/edit_agent.py`, `run_agent.py` — builder-bound guard: default missing `agent_id` to the bound graph, reject any other id. `run_agent` additionally inlines `node_executions` into dry-run responses so the LLM can inspect per-node status in the same turn instead of a follow-up `view_agent_output`. `wait_for_result` docs now explain the two dispatch modes.
- `copilot/tools/helpers.py::require_guide_read` — bypassed for builder-bound sessions (the guide is already in the system-prompt suffix).
- `copilot/tools/agent_generator/pipeline.py` + `tools/models.py` — `AgentSavedResponse.graph_version` so the frontend can flip `flowVersion` to the newly-saved version.
- `copilot/baseline/service.py` + `sdk/service.py` — inject the builder context suffix into the system prompt and the per-turn prefix into the current user message.
- `blocks/_base.py` — `validate_data(..., exclude_fields=)` so dry-run can bypass credential required-checks for blocks that need creds in normal mode (OrchestratorBlock). `blocks/perplexity.py` override signature matches.
- `executor/simulator.py` — OrchestratorBlock dry-run iteration cap `1 → min(original, 10)` so multi-role patterns (Advocate/Critic) actually close the loop; `manager.py` synthesizes placeholder creds in dry-run so the block's schema validation passes.

### Session lookup

The builder-chat session pointer lives on `LibraryAgent.settings.builder_chat_session_id` (typed via `GraphSettings`). `get_or_create_builder_session` reads/writes it through `library_db().get_library_agent_by_graph_id` + `update_library_agent(settings=...)` — no raw SQL or JSON-path filter. Ownership is enforced by the library-agent query's `userId` filter. The per-session builder binding still lives on `ChatSession.metadata.builder_graph_id` (used by `edit_agent`/`run_agent` guards and the system-prompt injection).

### Scope footnotes

- Feature flag defaults **false**. Rollout gate lives in LaunchDarkly.
- No schema migration required: `builder_chat_session_id` slots into the existing `LibraryAgent.settings` JSON column via the typed `GraphSettings` model.
- Commits that address review / CI cycles are interleaved with feature commits — see the commit log for the per-change rationale.

### Test plan

- [x] `pnpm test:unit` + backend `poetry run test` for new and touched modules
- [x] Agent-browser pass: panel toggle / auto-create / real-time edit re-render / real-time exec URL subscribe / queue-while-streaming / cross-graph reset / hard-refresh session persist
- [x] Codecov patch ≥ 80% on diff

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
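The `get_or_create_builder_session` keying described above (per `user_id` + `graph_id`, with the pointer persisted on `LibraryAgent.settings.builder_chat_session_id`) can be sketched with in-memory dicts standing in for the library and session tables. The real helper is async and goes through `library_db()`; everything below is illustrative:

```python
def get_or_create_builder_session(user_id: str, graph_id: str,
                                  library: dict, sessions: dict) -> str:
    # Ownership check: the library-agent lookup is keyed by (user_id, graph_id),
    # mirroring the userId filter on the real library-agent query.
    agent = library.get((user_id, graph_id))
    if agent is None:
        raise PermissionError("graph not owned by user")
    settings = agent.setdefault("settings", {})
    session_id = settings.get("builder_chat_session_id")
    if session_id is None:
        # First open: create a session bound to the graph and persist
        # the pointer back onto the agent's settings.
        session_id = f"session-{len(sessions) + 1}"
        sessions[session_id] = {"metadata": {"builder_graph_id": graph_id}}
        settings["builder_chat_session_id"] = session_id
    return session_id
```

A second call with the same `(user_id, graph_id)` returns the same session id, which is what makes a hard refresh resume the conversation.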
||
|
|
59273fe6a0 |
fix(frontend): forward sentry-trace and baggage across API proxy (#12835)
### Why / What / How
**Why:** Every request that went through Next's rewrite proxy broke
distributed tracing. The browser Sentry SDK emitted `sentry-trace` and
`baggage`, but `createRequestHeaders` only forwarded impersonation + API
key, so the backend started a disconnected transaction. The frontend →
backend lineage never appeared in Sentry. Same gap on
direct-from-browser requests: the custom mutator never attached the
trace headers itself, so even non-proxied paths lost the link.
**What:**
- **Server side:** forward `sentry-trace` and `baggage` from
`originalRequest.headers` alongside the existing impersonation/API key
forwarding.
- **Client side:** the custom mutator pulls trace data via
`Sentry.getTraceData()` and attaches it to outgoing headers when running
on the client.
**How:** Inline additions — no new observability module, no new
dependencies beyond `@sentry/nextjs` which the frontend already uses for
Sentry init.
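The actual change is TypeScript (`createRequestHeaders` plus the custom mutator), but the server-side forwarding logic is simple enough to sketch language-neutrally in Python. The allow-list and base-header contents here are illustrative:

```python
# Only the two tracing headers are forwarded; everything else from the
# incoming request (cookies, etc.) is intentionally dropped.
TRACE_HEADERS = ("sentry-trace", "baggage")

def create_request_headers(incoming: dict, base: dict) -> dict:
    headers = dict(base)  # existing impersonation / API-key headers
    for name in TRACE_HEADERS:
        value = incoming.get(name)
        if value:
            headers[name] = value  # preserves the distributed-trace lineage
    return headers
```

With the headers forwarded, the backend's Sentry SDK continues the browser's trace instead of starting a disconnected transaction.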
### Changes 🏗️
- `src/lib/autogpt-server-api/helpers.ts` — forward `sentry-trace` +
`baggage` in `createRequestHeaders`.
- `src/app/api/mutators/custom-mutator.ts` — import `@sentry/nextjs`,
attach `Sentry.getTraceData()` on client-side requests.
- `src/app/api/mutators/__tests__/custom-mutator.test.ts` — three new
tests: trace-data present, trace-data empty, server-side no-op.
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [ ] I have tested my changes according to the test plan:
- [x] `pnpm vitest run
src/app/api/mutators/__tests__/custom-mutator.test.ts` passes (6/6
locally)
- [x] `pnpm format && pnpm lint` clean
- [x] `pnpm types` clean for touched files (pre-existing unrelated type
errors on dev are untouched)
- [ ] In a local session with Sentry enabled, a `/copilot` chat turn
produces a distributed trace that spans frontend transaction → backend
transaction (single trace ID in Sentry)
---
> [!NOTE]
> **Low Risk**
> Low risk: header-only changes to request construction for
observability, with added tests; primary risk is unintended header
propagation affecting upstream/proxy behavior.
>
> **Overview**
> Restores **Sentry distributed tracing continuity** for
frontend→backend calls by propagating `sentry-trace`/`baggage` headers.
>
> On the client, `customMutator` now reads `Sentry.getTraceData()` and
attaches string trace headers to outgoing requests (guarded for
server-side and older Sentry builds). On the server/proxy path,
`createRequestHeaders` now forwards `sentry-trace` and `baggage` from
the incoming `originalRequest` alongside existing impersonation/API-key
forwarding, with new unit tests covering these cases.
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
|
||
|
|
38c2844b83 |
feat(admin): Add system diagnostics and execution management dashboard (#11235)
### Changes 🏗️
This PR adds a comprehensive admin diagnostics dashboard for monitoring
system health and managing running executions.
https://github.com/user-attachments/assets/f7afa3ed-63d8-4b5c-85e4-8756d9e3879e
#### Backend Changes:
- **New data layer** (backend/data/diagnostics.py): Created a dedicated
diagnostics module following the established data layer pattern
- get_execution_diagnostics() - Retrieves execution metrics (running,
queued, completed counts)
- get_agent_diagnostics() - Fetches agent-related metrics
- get_running_executions_details() - Lists all running executions with
detailed info
- stop_execution() and stop_executions_bulk() - Admin controls for
stopping executions
- **Admin API endpoints**
(backend/server/v2/admin/diagnostics_admin_routes.py):
- GET /admin/diagnostics/executions - Execution status metrics
- GET /admin/diagnostics/agents - Agent utilization metrics
- GET /admin/diagnostics/executions/running - Paginated list of running
executions
- POST /admin/diagnostics/executions/stop - Stop single execution
- POST /admin/diagnostics/executions/stop-bulk - Stop multiple
executions
- All endpoints secured with admin-only access
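A minimal in-memory sketch of the `stop_execution` / `stop_executions_bulk` semantics named above. The real data layer queries the database; the dict of execution rows and the `TERMINATED` status value here are stand-ins:

```python
# Only executions still in flight can be stopped.
STOPPABLE = {"RUNNING", "QUEUED"}

def stop_execution(executions: dict, execution_id: str) -> bool:
    row = executions.get(execution_id)
    if row is None or row["status"] not in STOPPABLE:
        return False  # unknown id, or already finished
    row["status"] = "TERMINATED"
    return True

def stop_executions_bulk(executions: dict, ids: list) -> int:
    # Returns how many executions were actually stopped, so the admin UI
    # can report partial success in its toast notification.
    return sum(stop_execution(executions, i) for i in ids)
```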
#### Frontend Changes:
- **Diagnostics Dashboard**
(frontend/src/app/(platform)/admin/diagnostics/page.tsx):
- Real-time system metrics display (running, queued, completed
executions)
- RabbitMQ queue depth monitoring
- Agent utilization statistics
- Auto-refresh every 30 seconds
- **Execution Management Table**
(frontend/src/app/(platform)/admin/diagnostics/components/ExecutionsTable.tsx):
- Displays running executions with: ID, Agent Name, Version, User
Email/ID, Status, Start Time
- Multi-select functionality with checkboxes
- Individual stop buttons for each execution
- "Stop Selected" and "Stop All" bulk actions
- Confirmation dialogs for safety
- Pagination for handling large datasets
- Toast notifications for user feedback
#### Security:
- All admin endpoints properly secured with requires_admin_user
decorator
- Frontend routes protected with role-based access controls
- Admin navigation link only visible to admin users
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Verified admin-only access to diagnostics page
- [x] Tested execution metrics display and auto-refresh
- [x] Confirmed RabbitMQ queue depth monitoring works
- [x] Tested stopping individual executions
- [x] Tested bulk stop operations with multi-select
- [x] Verified pagination works for large datasets
- [x] Confirmed toast notifications appear for all actions
#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
(no changes needed)
- [x] `docker-compose.yml` is updated or already compatible with my
changes (no changes needed)
- [x] I have included a list of my configuration changes in the PR
description (no config changes required)
---
> [!NOTE]
> **Medium Risk**
> Adds new admin-only endpoints that can stop, requeue, and bulk-mark
executions as `FAILED`, plus schedule deletion, which can directly
impact production workload and data integrity if misused or buggy.
>
> **Overview**
> Introduces a **System Diagnostics** admin feature spanning backend +
frontend to monitor execution/schedule health and perform remediation
actions.
>
> On the backend, adds a new `backend/data/diagnostics.py` data layer
and `diagnostics_admin_routes.py` with admin-secured endpoints to fetch
execution/agent/schedule metrics (including RabbitMQ queue depths and
invalid-state detection), list problem executions/schedules, and perform
bulk operations like `stop`, `requeue`, and `cleanup` (marking
orphaned/stuck items as `FAILED` or deleting orphaned schedules). It
also extends `get_graph_executions`/`get_graph_executions_count` with
`execution_ids` filtering, pagination, started/updated time filters, and
configurable ordering to support efficient bulk/admin queries.
>
> On the frontend, adds an admin diagnostics page with summary cards and
tables for executions and schedules (tabs for
orphaned/failed/long-running/stuck-queued/invalid, plus confirmation
dialogs for destructive actions), wires it into admin navigation, and
adds comprehensive unit tests for both the new API routes and UI
behavior.
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
|
||
|
|
24850e2a3e |
feat(backend/autopilot): stream extended_thinking on baseline via OpenRouter (#12870)
### Why / What / How

**Why:** Fast-mode autopilot never renders a Reasoning block. The frontend already has `ReasoningCollapse` wired up and the wire protocol already carries `StreamReasoning*` events (landed for SDK mode in #12853), but the baseline (OpenRouter OpenAI-compat) path never asks Anthropic for extended thinking and never parses reasoning deltas off the stream. Result: users on fast/standard get a good answer with no visible chain-of-thought, while SDK users see the full Reasoning collapse.

**What:** Plumb reasoning end-to-end through the baseline path by opting into OpenRouter's non-OpenAI `reasoning` extension, parsing the reasoning delta fields off each chunk, and emitting the same `StreamReasoningStart/Delta/End` events the SDK adapter already uses.

**How:**
- **New config:** `baseline_reasoning_max_tokens` (default 8192; 0 disables). Sent as `extra_body={"reasoning": {"max_tokens": N}}` only on Anthropic routes — other providers drop the field, and `is_anthropic_model()` already gates this.
- **Delta extraction:** `_extract_reasoning_delta()` handles all three OpenRouter/provider variants in priority order — legacy `delta.reasoning` (string), DeepSeek-style `delta.reasoning_content`, and the structured `delta.reasoning_details` list (text/summary entries; encrypted or unknown entries are skipped).
- **Event emission:** Reasoning uses the same state-machine rules the SDK adapter uses — a text delta or tool_use delta arriving mid-stream closes the open reasoning block first, so the AI SDK v5 transport keeps reasoning / text / tool-use as distinct UI parts. On stream end, any still-open reasoning block gets a matching `reasoning-end` so a reasoning-only turn still finalises the frontend collapse.
- **Scope:** Live streaming only. Reasoning is not persisted to `ChatMessage` rows or the transcript builder in this PR (the SDK path does so via `content_blocks=[{type: 'thinking', ...}]`, but that round-trip requires Anthropic signature plumbing the baseline doesn't have today). Reload will still not show reasoning on baseline sessions — can follow up if we decide it's worth the signature handling.

### Changes

- `backend/copilot/config.py` — new `baseline_reasoning_max_tokens` field.
- `backend/copilot/baseline/service.py` — new `_extract_reasoning_delta()` helper; reasoning block state on `_BaselineStreamState`; `reasoning` gated into `extra_body`; chunk loop emits `StreamReasoning*` events with text/tool_use transition rules; stream-end closes any open reasoning block.
- `backend/copilot/baseline/service_unit_test.py` — 11 new tests covering extractor variants (legacy string, deepseek alias, structured list with text/summary aliases, encrypted-skip, empty), paired event ordering (reasoning-end before text-start), reasoning-only streams, and that the `reasoning` request param is correctly gated by model route (Anthropic vs non-Anthropic) and by the config flag.

### Checklist

For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [ ] I have tested my changes according to the test plan:
- [x] `poetry run pytest backend/copilot/baseline/service_unit_test.py backend/copilot/baseline/transcript_integration_test.py` — 103 passed
- [ ] Manual: with `CHAT_USE_CLAUDE_AGENT_SDK=false` and `CHAT_MODEL=anthropic/claude-sonnet-4-6`, send a multi-step prompt on fast mode and confirm a Reasoning collapse appears alongside the final text
- [ ] Manual: flip `CHAT_BASELINE_REASONING_MAX_TOKENS=0` and confirm baseline responses revert to text-only (no reasoning param, no reasoning UI)
- [ ] Manual: with a non-Anthropic baseline model (`openai/gpt-4o`), confirm the request does NOT include `reasoning` and nothing regresses

For configuration changes:
- [x] `.env.default` is compatible — new setting falls back to the pydantic default |
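The three delta variants the extractor handles can be sketched as follows, in the priority order listed above. Real chunks are SDK objects rather than dicts, and the `reasoning.text` / `reasoning.summary` type strings are assumptions about the structured-list entries:

```python
def extract_reasoning_delta(delta: dict):
    # 1. Legacy OpenRouter field: delta.reasoning (plain string).
    if isinstance(delta.get("reasoning"), str):
        return delta["reasoning"]
    # 2. DeepSeek-style alias: delta.reasoning_content.
    if isinstance(delta.get("reasoning_content"), str):
        return delta["reasoning_content"]
    # 3. Structured list: delta.reasoning_details, keeping text/summary
    #    entries and skipping encrypted or unknown ones.
    parts = []
    for entry in delta.get("reasoning_details") or []:
        if entry.get("type") in ("reasoning.text", "reasoning.summary"):
            parts.append(entry.get("text") or entry.get("summary") or "")
    return "".join(parts) or None
```

Returning `None` for a chunk with no reasoning lets the caller keep the text/tool-use state machine untouched on ordinary deltas.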
||
|
|
e17e9f13c4 |
fix(backend/copilot): reduce SDK + baseline prompt cache waste (#12866)
## Summary

Four cost-reduction changes for the copilot feature. Consolidated into one PR at user request; each commit is self-contained and bisectable.

### 1. SDK: full cross-user cache on every turn (CLI 2.1.116 bump)

Previous behavior: CLI 2.1.97 crashed when `excludeDynamicSections=True` was combined with `--resume`, so the code fell back to a raw `system_prompt` string on resume, losing Claude Code's default prompt and all cache markers. Every Turn 2+ of an SDK session wrote ~33K tokens to cache instead of reading.

Fix: install `@anthropic-ai/claude-code@2.1.116` in the backend Docker image and point the SDK at it via `CHAT_CLAUDE_AGENT_CLI_PATH=/usr/bin/claude`. CLI 2.1.98+ fixes the crash, so we can use the preset with `exclude_dynamic_sections=True` on every turn — Turn 1, 2, 3+ all share the same static prefix and hit the **cross-user** prompt cache.

**Local dev requirement:** if `CHAT_CLAUDE_AGENT_CLI_PATH` is unset, the bundled 2.1.97 fallback will crash on `--resume`. Install the CLI globally (`npm install -g @anthropic-ai/claude-code@2.1.116`) or set the env var.

### 2. Baseline: add `cache_control` markers (commit `756b3ecd9` + follow-ups)

Baseline path had zero `cache_control` across `backend/copilot/**`. Every turn was full uncached input (~18.6K tokens, ~$0.058). Two ephemeral markers — on the system message (content-blocks form) and the last tool schema — plus `anthropic-beta: prompt-caching-2024-07-31` via `extra_headers` as defense-in-depth. Helpers split into `_mark_tools_*` (precomputed once per session) and `_mark_system_*` (per-round, O(1)). Repeat hellos: ~$0.058 → ~$0.006.

### 3. Drop `get_baseline_supplement()` (commit `6e6c4d791`)

`_generate_tool_documentation()` emitted ~4.3K tokens of `(tool_name, description)` pairs that exactly duplicated the tools array already in the same request. Deleted. `SHARED_TOOL_NOTES` (cross-tool workflow rules) is preserved. Baseline "hello" input: ~18.7K → ~14.4K tokens.

### 4. Langfuse "CoPilot Prompt" v26 (published under `review` label)

Separate, out-of-repo change. v25 had three duplicate "Example Response" blocks + a 10-step "Internal Reasoning Process" section. v26 collapses to one example + bullet-form reasoning. Char count 20,481 → 7,075 (rough 4 chars/token → ~5,100 → ~1,770 tokens).

- v26 is published with label `review` (NOT `production`); v25 remains active.
- Promote via `mcp__langfuse__updatePromptLabels(name="CoPilot Prompt", version=26, newLabels=["production"])` after smoke-test.
- Rollback: relabel v25 `production`.

## Test plan

- [x] Unit tests for `_build_system_prompt_value` (fresh vs resumed turns emit identical preset dict)
- [x] SDK compat tests pass including `test_bundled_cli_version_is_known_good_against_openrouter`
- [x] `cli_openrouter_compat_test.py` passes against CLI 2.1.116 (locally verified with `CHAT_CLAUDE_AGENT_CLI_PATH=/opt/homebrew/bin/claude`)
- [x] 8 new `_mark_*` unit tests + identity regression test for `_fresh_*` helpers
- [x] `SHARED_TOOL_NOTES` public-constant test passes; 5 old tool-docs tests removed
- [ ] **Manual cost verification (commit 1):** send two consecutive SDK turns; Turn 2 and Turn 3 should both show `cacheReadTokens` ≈ 33K (full cross-user cache hits).
- [ ] **Manual cost verification (commit 2):** send two "hello" turns on baseline <5 min apart; Turn 2 reports `cacheReadTokens` ≈ 18K and cost ≈ $0.006.
- [ ] **Regression sweep for commit 3:** one turn per tool family — `search_agents`, `run_agent`, `add_memory`/`forget_memory`/`search_memory`, `search_docs`, `read_workspace_file` — to verify no tool-selection regression from dropping the prose tool docs.
- [ ] **Langfuse v26 smoke test:** 5-10 varied turns after relabelling to `production`; compare responses vs v25 for regression on persona, concision, capability-gap handling, credential security flows.

## Deployment notes

- Production Docker image now installs CLI 2.1.116 (~20 MB added).
- `CHAT_CLAUDE_AGENT_CLI_PATH=/usr/bin/claude` set in the Dockerfile; runtime can override via env.
- First deploy after this merge needs a fresh image rebuild to pick up the new CLI. |
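The two ephemeral markers from change 2 amount to a pair of small helpers. This is a hedged sketch (helper names abbreviated from the `_mark_system_*` / `_mark_tools_*` split, field shapes following Anthropic's prompt-caching request format):

```python
def mark_system(system_text: str) -> list:
    # Content-blocks form of the system message, with one ephemeral marker
    # so the system prefix is cached across turns.
    return [{
        "type": "text",
        "text": system_text,
        "cache_control": {"type": "ephemeral"},
    }]

def mark_tools(tools: list) -> list:
    # Marking the last tool caches the entire tools-array prefix.
    # Copies the entries so the caller's schemas are not mutated
    # (the real helper precomputes this once per session).
    if not tools:
        return tools
    marked = [dict(t) for t in tools]
    marked[-1]["cache_control"] = {"type": "ephemeral"}
    return marked
```

Anthropic caches everything up to and including a `cache_control` block, which is why one marker per section is enough.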
||
|
|
f238c153a5 |
fix(backend/copilot): release session cluster lock on completion (#12867)
## Summary

Fixes a bug where a chat session gets silently stuck after the user presses Stop mid-turn.

**Root cause:** the cancel endpoint marks the session `failed` after polling 5s, but the cluster lock held by the still-running task is only released by `on_run_done` when the task actually finishes. If the task hangs past the 5s poll (slow LLM call, agent-browser step, etc.), the lock lingers for up to 5 min — `stream_chat_post`'s `is_turn_in_flight` check sees the flipped meta (`failed`) and enqueues a new turn, but the run handler sees the stale lock and drops the user's message at `manager.py:379` (`reject+requeue=False`). The new SSE stream hangs until its 60s idle timeout.

### Fix

Two cooperating changes:

1. **`mark_session_completed` force-releases the cluster lock** in the same transaction that flips status to `completed`/`failed`. Unconditional delete — by the time we're declaring the session dead, we don't care who the current lock holder is; the lock has to go so the next enqueued turn can acquire. This is what closes the stuck-session window.
2. **`ClusterLock.release()` is now owner-checked** (Lua CAS — `GET == token ? DEL : noop` atomically). Force-release means another pod may legitimately own the key by the time the original task's `on_run_done` eventually fires. Without the CAS, that late `release()` would wipe the successor's lock. With it, the late `release()` is a safe no-op when the owner has changed.

Together: prompt release on completion (via force-delete) + safe cleanup when `on_run_done` catches up (via CAS). That re-syncs the API-level `is_turn_in_flight` check with the actual lock state, so the contention window disappears.

No changes to the worker-level contention handler: `stream_chat_post` already queues incoming messages into the pending buffer when a turn is in flight (via `queue_pending_for_http`). With these fixes, the worker never sees contention in the common case; if it does (true multi-pod race), the pre-existing `reject+requeue=False` behaviour still applies — we'll revisit that path with its own PR if it becomes a production symptom.

### Verification

- Reproduced the original stuck-session symptom locally (Stop mid-turn → send new message → backend logs `Session … already running on pod …`, user message silently lost, SSE stream idle 60s then closes).
- After the fix: cancel → new message → turn starts normally (lock released by `mark_session_completed`).
- `poetry run pyright` — 0 errors on edited files.
- `pytest backend/copilot/stream_registry_test.py backend/executor/cluster_lock_test.py` — 33 passed (includes the successor-not-wiped test).

## Changes

- `autogpt_platform/backend/backend/copilot/executor/utils.py` — extract `get_session_lock_key(session_id)` helper so the lock-key format has a single source of truth.
- `autogpt_platform/backend/backend/copilot/executor/manager.py` — use the helper where the cluster lock is created.
- `autogpt_platform/backend/backend/copilot/stream_registry.py` — `mark_session_completed` deletes the lock key after the atomic status swap (force-release).
- `autogpt_platform/backend/backend/executor/cluster_lock.py` — `ClusterLock.release()` (sync + async) uses a Lua CAS to only delete when `GET == token`, protecting against wiping a successor after a force-release.

## Test plan

- [ ] Send a message in /copilot that triggers a long turn (e.g. `run_agent`), press Stop before it finishes, then send another message. Expect: new turn starts promptly (no 5-min wait for lock TTL).
- [ ] Happy path regression — send a normal message, verify turn completes and the session lock key is deleted after completion.
- [ ] Successor protection — unit test `test_release_does_not_wipe_successor_lock` covers: A acquires, external DEL, B acquires, A.release() is a no-op, B's lock intact. |
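The owner-checked release in change 2 can be illustrated with a minimal in-memory sketch. A dict stands in for Redis here; in the real code the compare-and-delete runs atomically as a Lua script, and the class and method names below only mirror the PR's description:

```python
class ClusterLock:
    _store: dict = {}  # stand-in for the shared Redis keyspace

    def __init__(self, key: str, token: str):
        self.key, self.token = key, token

    def acquire(self) -> bool:
        # SET NX semantics: only succeeds if no one currently holds the key.
        if self.key in self._store:
            return False
        self._store[self.key] = self.token
        return True

    def force_release(self) -> None:
        # Unconditional delete, as mark_session_completed now does.
        self._store.pop(self.key, None)

    def release(self) -> bool:
        # CAS: delete only if we still own the key (GET == token ? DEL : noop),
        # so a late release never wipes a successor's lock.
        if self._store.get(self.key) == self.token:
            del self._store[self.key]
            return True
        return False
```

The successor-protection scenario from the test plan plays out as: A acquires, A is force-released, B acquires, and A's late `release()` returns `False` without touching B's lock.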
||
|
|
01f1289aac |
feat(copilot): real OpenRouter cost + cost-based rate limits (percent-only public API) (#12864)
## Why
After
|
||
|
|
343222ace1 |
feat(platform): defer paid-to-paid subscription downgrades + cancel-pending flow (#12865)
### Why / What / How
**Why:** Only downgrades to FREE were scheduled at period end; paid→paid
downgrades (e.g. BUSINESS→PRO) applied immediately via Stripe proration.
The asymmetry meant users lost their higher tier mid-cycle in exchange
for a Stripe credit voucher only redeemable on a future subscription — a
confusing pattern that produces negative-value paths for users actually
cancelling. There was also no way to cancel a pending downgrade or
paid→FREE cancellation once scheduled.
**What:** Standardize on "upgrade = immediate, downgrade = next cycle"
and let users cancel a pending change by clicking their current tier.
Harden the new code against conflicting subscription state, concurrent
tab races, flaky Stripe calls, and hot-path latency regressions.
**How:**
Subscription state machine:
- **Upgrade** (PRO→BUSINESS) — `stripe.Subscription.modify` with
immediate proration (unchanged). If a downgrade schedule is already
attached, release it first so the upgrade wins.
- **Paid→paid downgrade** (BUSINESS→PRO) — creates a
`stripe.SubscriptionSchedule` with two phases (current tier until
`current_period_end`, target tier after). No mid-cycle tier demotion.
Defensive pre-clear: existing schedule → release;
`cancel_at_period_end=True` → set to False.
- **Paid→FREE** — unchanged: `cancel_at_period_end=True`.
- **Same-tier update** — reuses the existing `POST
/credits/subscription` route. When `target_tier == current_tier`,
backend calls `release_pending_subscription_schedule` (idempotent) and
returns status. No dedicated cancel-pending endpoint — "Keep my current
tier" IS the cancel operation.
- `release_pending_subscription_schedule` is idempotent on
terminal-state schedules and clears both `schedule` and
`cancel_at_period_end` atomically per call.
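The "upgrade = immediate, downgrade = next cycle" routing above reduces to a tier ordering. A minimal sketch of the `is_tier_upgrade` / `is_tier_downgrade` helpers named in the Changes list — the numeric ranks are assumptions; only the relative ordering FREE < PRO < BUSINESS matters:

```python
# Hypothetical sketch of the tier-ordering helpers; rank values assumed.
_TIER_RANK = {"FREE": 0, "PRO": 1, "BUSINESS": 2}

def is_tier_upgrade(current: str, target: str) -> bool:
    return _TIER_RANK[target] > _TIER_RANK[current]

def is_tier_downgrade(current: str, target: str) -> bool:
    return _TIER_RANK[target] < _TIER_RANK[current]

def change_kind(current: str, target: str) -> str:
    """Route a tier-change request per the state machine above."""
    if is_tier_upgrade(current, target):
        return "apply_immediately"          # Stripe modify + proration
    if is_tier_downgrade(current, target):
        return "schedule_at_period_end"     # schedule / cancel_at_period_end
    return "release_pending_schedule"       # same tier = cancel pending change

assert change_kind("PRO", "BUSINESS") == "apply_immediately"
assert change_kind("BUSINESS", "PRO") == "schedule_at_period_end"
assert change_kind("PRO", "PRO") == "release_pending_schedule"
```

Keeping the comparison in one place is what lets the upgrade path and the downgrade-scheduling path stay symmetric across the four callers.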
API surface:
- New fields on `SubscriptionStatusResponse`: `pending_tier` +
`pending_tier_effective_at` (pulled from the schedule's next-phase
`start_date` so dashboard-authored schedules report the correct
timestamp).
- `POST /credits/subscription` now returns `SubscriptionStatusResponse`
(previously `SubscriptionCheckoutResponse`); the response still carries
`url` for checkout flows and adds the status fields inline.
- `get_pending_subscription_change` is cached with a 30s TTL — avoids
hammering Stripe on every home-page load.
- Webhook dispatches
`subscription_schedule.{released,completed,updated}` through the main
`sync_subscription_from_stripe` flow so both event sources converge to
the same DB state.
Implementation notes:
- New Stripe calls use native async (`stripe.Subscription.list_async`
etc.) and typed attribute access — no `run_in_threadpool` wrapping in
the new helpers.
- Shared `_get_active_subscription` helper collapses the "list
active/trialing subs, take first" pattern used by 4 callers.
Frontend:
- `PendingChangeBanner` sub-component above the tier grid with formatted
effective date + "Keep [CurrentTier]" button. `aria-live="polite"` for
screen readers; locale pinned to `en-US` to avoid SSR/CSR hydration
mismatch.
- "Keep [CurrentTier]" also available as a button on the current tier
card.
- Other tier buttons disabled while a change is pending — user must
resolve pending first to prevent stacked schedules.
- `cancelPendingChange` reuses `useUpdateSubscriptionTier` with `tier:
current_tier`; awaits `refetch()` on both success and error paths so the
UI reconciles even if the server succeeded but the client didn't receive
the response.
### Changes
**Backend (`credit.py`, `v1.py`)**
- Tier-ordering helpers (`is_tier_upgrade`/`is_tier_downgrade`).
- `modify_stripe_subscription_for_tier` routes downgrades through
`_schedule_downgrade_at_period_end`; upgrade path releases any pending
schedule first.
- `_schedule_downgrade_at_period_end` defensively releases pre-existing
schedules and clears `cancel_at_period_end` before creating the new
schedule.
- `release_pending_subscription_schedule` idempotent on terminal-state
schedules; logs partial-failure outcomes.
- `_next_phase_tier_and_start` returns both tier and phase-start
timestamp; warns on unknown prices.
- `get_pending_subscription_change` cached (30s TTL), narrow exception
handling.
- `sync_subscription_schedule_from_stripe` delegates to
`sync_subscription_from_stripe` for convergence with the main webhook
path.
- Shared `_get_active_subscription` +
`_release_schedule_ignoring_terminal` helpers.
- `POST /credits/subscription` absorbs the same-tier "cancel pending
change" branch.
**Frontend (`SubscriptionTierSection/*`)**
- `PendingChangeBanner` new sub-component (a11y, locale-pinned date,
paid→FREE vs paid→paid copy split, non-null effective-date assertion, no
`dark:` utilities).
- "Keep [CurrentTier]" button on current tier card.
- `useSubscriptionTierSection` — `cancelPendingChange` reuses the
update-tier mutation.
- Copy: downgrade dialog + status hint updated.
- `helpers.ts` extracted from the main component.
**Tests**
- Backend: +24 tests (95/95 passing): upgrade-releases-pending-schedule,
schedule-releases-existing-schedule, cancel-at-period-end collision,
terminal-state release idempotency, unknown-price logging, status
response population, same-tier-POST-with-pending, webhook delegation.
- Frontend: +5 integration tests (21/21 passing): banner render/hide,
Keep-button click from banner + current card, paid→paid dialog copy.
### Checklist
- [x] Backend unit tests: 95 pass
- [x] Frontend integration tests: 21 pass
- [x] `poetry run format` / `poetry run lint` clean
- [x] `pnpm format` / `pnpm lint` / `pnpm types` clean
- [ ] Manual E2E on live Stripe (dev env) — pending deploy: BUSINESS→PRO
creates schedule, DB tier unchanged until period end
- [ ] Manual E2E: "Keep BUSINESS" in banner releases schedule
- [ ] Manual E2E: cancel pending paid→FREE flips `cancel_at_period_end`
back to false
- [ ] Manual E2E: BUSINESS→PRO (scheduled) then attempt BUSINESS→FREE
clears the PRO schedule, sets cancel_at_period_end
- [ ] Manual E2E: BUSINESS→PRO (scheduled) then upgrade back to BUSINESS
releases the schedule
|
||
|
|
a8226af725 |
fix(copilot): dedupe tool row, lift bash_exec timeout, Stop+resend recovery (#12862)
Closes #12861 · [OPEN-3096](https://linear.app/autogpt/issue/OPEN-3096)
## Why
Four related copilot UX / stability issues surfaced on dev once action tools started rendering inline in the chat (see #12813):
### 1. Duplicate bash_exec row
`GenericTool` rendered two rows saying the same thing for every completed tool call — a muted subtitle line ("Command exited with code 1" / "Ran: sleep 20") **and** a `ToolAccordion` with the command echoed in its description. Previously hidden inside the "Show reasoning" / "Show steps" collapse, now visibly duplicated.
### 2. `bash_exec` capped at 120s via advisory text
The tool schema said `"Max seconds (default 30, max 120)"`; the model obeyed, so long-running scripts got clipped at 120s with a vague `Timed out after 120s` even though the E2B sandbox has no such limit. Confirmed via Langfuse traces — the model picks `120` for long scripts because that's what the schema told it the max was. E2B path never had a server-side clamp. Originally added in #12103 (default 30) and tightened to "max 120" advisory in #12398 (token-reduction pass).
### 3. 30s default was too aggressive
`pip install`, small data-processing scripts, etc. routinely cross 30s and got killed before the model thought to retry with a bigger timeout.
### 4. Stop + edit + resend → "The assistant encountered an error" ([OPEN-3096](https://linear.app/autogpt/issue/OPEN-3096))
Two independent bugs both land on the same banner — fixing only one leaves the other visible on the next action.
**4a. Stream lock never released on Stop** *(the error in the ticket screenshot)*. The executor's `async for chunk in stream_and_publish(...)` broke out on `cancel.is_set()` without calling `aclose()` on the wrapper. `async for` does NOT auto-close iterators on `break`, so `stream_chat_completion_sdk` stayed suspended at its current `await` — still holding the per-session Redis lock (TTL 120s) until GC eventually closed it.
The next `POST /stream` hit `lock.try_acquire()` at [sdk/service.py](autogpt_platform/backend/backend/copilot/sdk/service.py) and yielded `StreamError("Another stream is already active for this session. Please wait or stop it.")`. The `except GeneratorExit → lock.release()` handler written exactly for this case never fired because nothing sent GeneratorExit.
**4b. Orphan `tool_use` after stop-mid-tool.** Even with the lock released, the stop path persists the session ending on an assistant row whose `tool_calls` have no matching `role="tool"` row. On the next turn, `_session_messages_to_transcript` hands Claude CLI `--resume` a JSONL with a `tool_use` and no paired `tool_result`, and the SDK raises a vague error — same banner. The ticket's "Open questions" explicitly flags this.
## What
**Frontend — `GenericTool.tsx`** split responsibilities between the two rows so they don't duplicate:
- **Subtitle row** (always visible, muted): *what ran* — `Ran: sleep 120`. Never the exit code.
- **Accordion description**: *how it ended* — `completed` / `status code 127 · bash: missing-bin: command not found` / `Timed out after 120s` / (fallback to command preview for legacy rows missing `exit_code` / `timed_out`). Pulled from the first non-empty line of `stdout` / `stderr` when available.
- **Expanded accordion**: full command + stdout + stderr code blocks (unchanged).
**Backend — `bash_exec.py`**:
- Drop the "max 120" advisory from the schema description.
- Bump default `timeout: 30 → 120`.
- Clean up the result message — `"Command executed with status code 0"` (no "on E2B", no parens).
**Backend — `executor/processor.py` + `stream_registry.py` (OPEN-3096 #4a)**: wrap the consumer `async for` in `try/finally: await stream.aclose()`. Close now propagates through `stream_and_publish` into `stream_chat_completion_sdk`, whose existing `except GeneratorExit → lock.release()` releases the Redis lock immediately on cancel.
Stream types tightened to `AsyncGenerator[StreamBaseResponse, None]` so the defensive `getattr(stream, "aclose", None)` goes away.
**Backend — `session_cleanup.py` (OPEN-3096 #4b)**: new `prune_orphan_tool_calls()` helper walks the trailing session tail and drops any trailing assistant row whose `tool_calls` have unresolved ids (plus everything after it) and any trailing `STOPPED_BY_USER_MARKER` system-stop row. Single backward pass — tolerates the marker being present or absent. Called from the existing turn-start cleanup in both `sdk/service.py` and `baseline/service.py`; takes an optional `log_prefix` so both paths emit the same INFO log when something was popped. In-memory only — the DB save path is append-only via `start_sequence`.
## Test plan
- [x] `pnpm exec vitest run src/app/(platform)/copilot/tools/GenericTool src/app/(platform)/copilot/components/ChatMessagesContainer` — 105 pass (6 new for GenericTool subtitle/description variants + legacy-fallback case).
- [x] `pnpm format` / `pnpm lint` / `pnpm types` — clean.
- [x] `poetry run pytest backend/copilot/sdk/session_persistence_test.py` — 17 pass (6 + 3 new covering the orphan-tool-call prune and its optional-log-prefix branch).
- [x] `poetry run pytest backend/copilot/stream_registry_test.py backend/copilot/executor/processor_test.py` — 19 pass (2 for aclose propagation on the `stream_and_publish` wrapper, 2 for `_execute_async` aclose propagation on both exit paths, 1 for publish_chunk RedisError warning ladder).
- [x] `poetry run ruff check` / `poetry run pyright` on touched files — clean.
- [x] Manual: fire a `bash_exec` — one labelled row, accordion description reads sensibly (`completed` / `status code 1 · …` / `Timed out after 120s`).
- [x] Manual: script that needs >120s — no longer clipped.
- [x] Manual: Stop mid-tool + edit + resend — Autopilot resumes without "Another stream is already active" and without the vague SDK error.
## Scope note
Does not touch `splitReasoningAndResponse` — re-collapsing action tools back into "Show steps" is #12813's responsibility. |
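The `async for`-doesn't-close pitfall at the heart of 4a is easy to reproduce in isolation. A minimal sketch (names illustrative, not the repo's code): `break` leaves the generator suspended, so its `finally` — standing in for `lock.release()` — only runs when the consumer explicitly calls `aclose()`.

```python
import asyncio

cleanup_ran: list[bool] = []

async def stream_with_lock():
    """Stand-in for stream_chat_completion_sdk: holds a 'lock' until closed."""
    try:
        for i in range(10):
            yield i
    finally:
        cleanup_ran.append(True)  # e.g. lock.release() on GeneratorExit

async def consume(close_explicitly: bool) -> bool:
    cleanup_ran.clear()
    stream = stream_with_lock()
    try:
        async for chunk in stream:
            if chunk == 2:
                break  # simulate cancel.is_set() mid-stream
    finally:
        if close_explicitly:
            await stream.aclose()  # the fix: send GeneratorExit now
    return bool(cleanup_ran)

# break alone leaves the generator suspended; its finally is deferred to GC.
assert asyncio.run(consume(close_explicitly=False)) is False
# With the try/finally aclose(), cleanup runs before the consumer returns.
assert asyncio.run(consume(close_explicitly=True)) is True
```

This is why the lock stayed held for the TTL window: nothing delivered GeneratorExit, so the `except GeneratorExit → lock.release()` handler never got a chance to run.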
||
|
|
f06b5293de |
fix(frontend/library): compute monthly spend for AgentBriefingPanel (#12854)
### Why / What / How
<img width="900" alt="Screenshot 2026-04-20 at 19 52 22" src="https://github.com/user-attachments/assets/c30d5f18-2842-4a8a-ac3d-5bfee18fcd56" />
**Why:** The "Spent this month" tile in the Agent Briefing Panel on the Library page always showed `$0`, even for users with real execution usage. The tile is meant to give a quick sense of monthly spend across all agents.
**What:** Compute `monthlySpend` from actual execution data and format it as currency.
**How:**
- `useLibraryFleetSummary` now sums `stats.cost` (cents) across every execution whose `started_at` falls within the current calendar month. Previously `monthlySpend` was hardcoded to `0`.
- `FleetSummary.monthlySpend` is documented as being in cents (consistent with backend + `formatCents`).
- `StatsGrid` now uses `formatCents` from the copilot usage helpers to render the tile (e.g. `$12.34` instead of the broken `$0`).
### Changes 🏗️
- `autogpt_platform/frontend/src/app/(platform)/library/hooks/useLibraryFleetSummary.ts`: aggregate `stats.cost` across executions started in the current calendar month; add `toTimestamp` and `startOfCurrentMonth` helpers.
- `autogpt_platform/frontend/src/app/(platform)/library/components/AgentBriefingPanel/StatsGrid.tsx`: format the "Spent this month" tile via shared `formatCents` helper.
- `autogpt_platform/frontend/src/app/(platform)/library/types.ts`: document that `FleetSummary.monthlySpend` is in cents.
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [ ] I have tested my changes according to the test plan:
  - [ ] Load `/library` with the `AGENT_BRIEFING` flag enabled and at least one completed execution in the current month — the "Spent this month" tile shows the correct cumulative cost.
  - [ ] With no executions this month, the tile shows `$0.00`.
  - [ ] Type-check (`pnpm types`), lint (`pnpm lint`), and integration tests (`pnpm test:unit`) pass locally.
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
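The hook itself is TypeScript; the Python sketch below is only a language-neutral illustration of the aggregation rule. Field names follow the PR text (`cost` in cents, `started_at`); the function names are hypothetical, not the repo's API.

```python
from datetime import datetime, timezone

def monthly_spend_cents(executions: list[dict], now: datetime) -> int:
    """Sum per-execution cost (cents) for executions started this calendar month."""
    month_start = now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    return sum(e["cost"] for e in executions if e["started_at"] >= month_start)

def format_cents(cents: int) -> str:
    """Render a cent amount as dollars, mirroring the shared formatCents helper."""
    return f"${cents / 100:.2f}"

now = datetime(2026, 4, 20, tzinfo=timezone.utc)
executions = [
    {"cost": 1234, "started_at": datetime(2026, 4, 5, tzinfo=timezone.utc)},
    {"cost": 500, "started_at": datetime(2026, 3, 30, tzinfo=timezone.utc)},  # prior month, excluded
]
assert format_cents(monthly_spend_cents(executions, now)) == "$12.34"
```

Keeping the unit in cents end-to-end (backend → `FleetSummary` → formatter) is what avoids the classic off-by-100 display bug.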
||
|
|
70b591d74f |
fix(copilot): persist reasoning, split steps/reasoning UX, fix mid-turn promote stream stall (#12853)
## Why
Four related issues that surfaced when queued follow-ups hit an
extended_thinking turn:
1. **Mid-turn promote stalled the SSE stream.** `pollBackendAndPromote`
used `setMessages((prev) => [...prev, bubble])` — Vercel AI SDK's
`useChat` streams SSE deltas into `messages[-1]`, so once a user bubble
ended up there, every subsequent chunk silently landed on the wrong
message. Chat sat frozen until a page refresh, even though the backend's
stream completed cleanly.
2. **Thinking-only final turn looked identical to a frozen UI.** When
Claude's last LLM call after a tool_result produced only a
`ThinkingBlock` (no `TextBlock`, no `ToolUseBlock`), the response
adapter silently dropped it and the UI hung on "Thought for Xs" with no
response text.
3. **Reasoning was invisible.** `ThinkingBlock` was dropped live and
never persisted in a way the frontend could render — sessions on reload
/ shared links showed no thinking, a confusing UX gap ("display for
nothing").
4. **Cross-pod Redis replay dropped reasoning events.** The
`stream_registry._reconstruct_chunk` type map had no entries for
`reasoning-*` types, so any client that subscribed mid-stream (share,
reload, cross-pod) silently dropped them with `Unknown chunk type:
reasoning-delta`.
## What
### Mid-turn promote — splice before the trailing assistant
In `useCopilotPendingChips.ts::pollBackendAndPromote`:
```ts
setMessages((prev) => {
const bubble = makePromotedUserBubble(drained, "midturn", crypto.randomUUID());
const lastIdx = prev.length - 1;
if (lastIdx >= 0 && prev[lastIdx].role === "assistant") {
return [...prev.slice(0, lastIdx), bubble, prev[lastIdx]];
}
return [...prev, bubble];
});
```
Streaming assistant stays at `messages[-1]`, AI SDK deltas keep routing
correctly. `useHydrateOnStreamEnd` snaps the bubble to the DB-canonical
position when the stream ends.
### Reasoning — end-to-end visibility (live + persisted)
- **Wire protocol**: new `StreamReasoningStart` / `StreamReasoningDelta`
/ `StreamReasoningEnd` events matching AI SDK v5's `reasoning-*` wire
names, so `useChat` accumulates them into a `type: 'reasoning'`
UIMessage part natively.
- **Response adapter**: every `ThinkingBlock` now emits reasoning
events; text/tool_use transitions close the open reasoning block so AI
SDK doesn't merge distinct parts.
- **Stream registry**: added `reasoning-*` types to
`_reconstruct_chunk`'s type_to_class map so Redis replay no longer drops
them on cross-pod / reload / share.
- **Persistence** (new): each `StreamReasoningStart` opens a
`ChatMessage(role="reasoning")` row in `session.messages`; deltas
accumulate into its content; `StreamReasoningEnd` closes it. No schema
migration — `ChatMessage.role` is already `String`.
`extract_context_messages` filters `role="reasoning"` out of LLM context
(the `--resume` CLI session already carries thinking separately) so the
model never re-ingests prior reasoning.
- **Frontend conversion**: `convertChatSessionMessagesToUiMessages` maps
`role="reasoning"` DB rows into `{type: "reasoning", text}` parts on the
surrounding assistant bubble, so reload / shared-link sessions render
reasoning identically to live stream.
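The persistence rule above boils down to a role filter when rebuilding LLM context. A minimal sketch under the PR's described message shape (the row dicts here are assumptions; the real `extract_context_messages` operates on `ChatMessage` objects):

```python
def extract_context_messages(messages: list[dict]) -> list[dict]:
    """Drop role="reasoning" rows when building LLM context: the resumed
    CLI session already carries prior thinking, so re-ingesting it would
    duplicate tokens. The frontend still renders these rows."""
    return [m for m in messages if m.get("role") != "reasoning"]

session = [
    {"role": "user", "content": "build me an agent"},
    {"role": "reasoning", "content": "The user wants a scheduled scraper..."},
    {"role": "assistant", "content": "On it."},
]
assert [m["role"] for m in extract_context_messages(session)] == ["user", "assistant"]
```

Because `ChatMessage.role` is an open string column, this works with no schema migration: old sessions simply have no `"reasoning"` rows to filter.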
### Steps / Reasoning UX — modal + accordion split
- **`StepsCollapse`** (new): a Dialog-backed "Show steps" modal wraps
the pre-final-answer group (tool timeline + per-block reasoning). Modal
keeps the steps visually grouped and out of the reading flow.
- **`ReasoningCollapse`** (rewritten): inline accordion with "Show
reasoning" / "Hide reasoning" toggle — no longer a modal, so it expands
*inside* the Steps modal without stacking two dialogs. Reasoning text
appears indented with a left border.
- **`splitReasoningAndResponse`**: reasoning parts now stay in the
reasoning group (instead of being pinned out), so they show up inside
the Steps modal alongside the tool-use timeline.
### Thinking-only final turn — synthesize a closing line
(belt-and-suspenders)
- **Prompt rule** (`_USER_FOLLOW_UP_NOTE`): "Every turn MUST end with at
least one short user-facing text sentence."
- **Adapter fallback**: tracks `_text_since_last_tool_result`; at
`ResultMessage success` with tools run + zero text since, opens a fresh
step (`UserMessage` already closed the previous one) and injects `"(Done
— no further commentary.)"` before `StreamFinish`. Only fires for the
pathological case — pure-text turns untouched.
## Test plan
- [x] `pnpm vitest run` on copilot files — all 638 prior tests pass;
**17 new tests** added covering:
- `convertChatSessionToUiMessages`: reasoning row alone / merged with
assistant text / multi-row / empty skip / duration capture
- `ReasoningCollapse`: initial collapsed, toggle, `rotate-90`,
`aria-expanded`
- `StepsCollapse`: trigger + dialog open renders children
- `MessagePartRenderer`: reasoning → `<pre>` inside collapse,
whitespace/missing text → null
- `splitReasoningAndResponse`: reasoning-stays-in-reasoning regression
- [x] `poetry run pytest backend/copilot/sdk/response_adapter_test.py` —
36 pass (7 new: 4 reasoning streaming, 3 thinking-only fallback)
- [x] Manual: reasoning streams live and persists across reload on a
fresh session
- [x] Manual: previously-created sessions (pre-persistence) don't have
`role="reasoning"` rows — behaves as a clean no-op (no reasoning shown,
no error), new sessions render reasoning inside Steps modal
## Notes
- No DB migration — `ChatMessage.role` is already an open `String`;
`role="reasoning"` is simply filtered out of LLM context builds but
rendered by the frontend.
- Addresses /pr-review blockers: (a) stream_registry missing reasoning
types in Redis round-trip, (b) fallback text emitted outside a step, (c)
dead `case "thinking"` in renderer (now uses the live `reasoning` type
uniformly).
|
||
|
|
b1c043c2d8 |
feat(copilot): queue follow-up messages on busy sessions (UI + run_sub_session + AutoPilot block) (#12737)
## Why
Users and tools can target a copilot session that already has a turn
running. Before this PR there was no uniform behaviour for that case —
the UI manually routed to a separate queue endpoint, `run_sub_session`
and the AutoPilot block raced the cluster lock, and in-turn follow-ups
only reached the model at turn-end via auto-continue. Outcome: dropped
messages, duplicate tool rows, missed mid-turn intent, latent
correctness bugs in block execution.
## What
A single "message arrived → turn already running?" primitive, shared by
every caller:
1. **POST `/stream`** (UI chat): self-defensive. Session idle → SSE as
today; session busy → `202 application/json` with `{buffer_length,
max_buffer_length, turn_in_flight}`. The deprecated `POST
/messages/pending` endpoint is removed (`GET /messages/pending` peek
stays).
2. **`run_copilot_turn_via_queue`** (shared primitive from #12841, used
by `run_sub_session` + `AutoPilotBlock`): gains the same busy-check.
Busy session → push to pending buffer, return `("queued",
SessionResult(queued=True, pending_buffer_length=N))` without creating a
stream registry session or enqueueing a RabbitMQ job. All callers
inherit queueing.
3. **Mid-turn delivery**: drained follow-ups are attached to every
tool_result's `additionalContext` via the SDK's `PostToolUse` hook —
covers both MCP and built-in tools (WebSearch/Read/Agent/etc.), not just
`run_block`. Claude reads the queued text on the next LLM round of the
same turn.
4. **UI observability**: chips promote to a proper user bubble at the
correct chronological position (after the tool_result row that consumed
them). Auto-continue handles end-of-turn drainage; mid-turn backend poll
handles the tool-boundary drainage path.
## How
**Data plane**
- `backend/copilot/pending_messages.py` — Redis list per session
(LPOP-count for atomic drain), TTL, fire-and-forget pub/sub notify. MAX
10 per session.
- `backend/copilot/pending_message_helpers.py` — `is_turn_in_flight`,
`queue_user_message`, `drain_and_format_for_injection`,
`persist_pending_as_user_rows` (shared persist+rollback used by both
baseline and SDK paths).
- `backend/data/redis_helpers.py` — centralised `incr_with_ttl`,
`capped_rpush`, `hash_compare_and_set`; every Lua script and pipeline
atomicity lives in one place.
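The data-plane contract above (capped push, atomic drain) can be sketched without Redis. This is an in-process simulation of the semantics only — in the real code `capped_rpush` and the drain are Redis RPUSH / LPOP-with-count operations made atomic via Lua and pipelines:

```python
MAX_PENDING = 10  # per-session cap from the PR text

def capped_rpush(buf: list, msg: str) -> bool:
    """Queue a follow-up; reject when the session buffer is full."""
    if len(buf) >= MAX_PENDING:
        return False
    buf.append(msg)
    return True

def drain_all(buf: list) -> list:
    """Atomically take every queued follow-up (LPOP with count in Redis)."""
    drained, buf[:] = buf[:], []
    return drained

buf: list = []
assert capped_rpush(buf, "also add retries")
assert capped_rpush(buf, "and log failures")
# At the next tool boundary the whole buffer is drained in one step,
# so two concurrent drainers can never each receive the same message.
assert drain_all(buf) == ["also add retries", "and log failures"]
assert buf == []
```

The single-step drain is the important property: it is what lets the `PostToolUse` hook and the end-of-turn auto-continue share one buffer without double-delivering a follow-up.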
**Injection sites**
- `backend/copilot/sdk/security_hooks.py::post_tool_use_hook` — drains +
returns `additionalContext`. Single hook covers built-in + MCP tools.
- `backend/copilot/sdk/service.py` — `StreamToolOutputAvailable`
dispatch persists the drained follow-up as a real user row right after
the tool_result (UI bubble at the right index).
`state.midturn_user_rows` keeps the CLI upload watermark honest.
- `backend/copilot/baseline/service.py` — same drain at round
boundaries, uses the shared `persist_pending_as_user_rows` helper so
baseline + SDK code paths don't diverge.
**Dispatch**
- `backend/copilot/sdk/session_waiter.py::run_copilot_turn_via_queue` —
`is_turn_in_flight` short-circuit; `SessionResult` gains `queued` +
`pending_buffer_length`; `SessionOutcome` gains `"queued"`.
- `backend/api/features/chat/routes.py::stream_chat_post` — busy-check
returns 202 with `QueuePendingMessageResponse`; `POST /messages/pending`
deleted.
- `backend/copilot/tools/run_sub_session.py` / `models.py` —
`SubSessionStatusResponse.status` gains `"queued"`;
`response_from_outcome` renders a clear queued-state message with the
pending-buffer depth and a link to watch live.
- `backend/blocks/autopilot.py::execute_copilot` — surfaces queued state
as descriptive response text + empty `tool_calls`/history when
`result.queued`.
**Frontend**
- `src/app/(platform)/copilot/useCopilotPendingChips.ts` — hook owning
the chip lifecycle: backend peek on session load, auto-continue
promotion when a second assistant id appears, mid-turn poll that
promotes when the backend count drops.
- `src/app/(platform)/copilot/useHydrateOnStreamEnd.ts` —
force-hydrate-waits-for-fresh-reference dance extracted.
- `src/app/(platform)/copilot/helpers/stripReplayPrefix.ts` — pure
function with drop / strip / streaming-catch-up cases + helper
decomposition.
- `src/app/(platform)/copilot/helpers/makePromotedBubble.ts` — one-line
helper for the promoted bubble shape.
- `src/app/(platform)/copilot/helpers/queueFollowUpMessage.ts` — thin
`fetch` wrapper for the 202 path (AI SDK's `useChat` fetcher only
handles SSE, so we can't reuse `sendMessage` for the queued response).
## Test plan
Backend unit + integration (`poetry run pytest backend/copilot
backend/api/features/chat`):
- [x] 107 tests pass — pending buffer, drain helpers, routes,
session_waiter queue branch, run_sub_session outcome rendering,
autopilot block
- [x] New `session_waiter_test.py` proves the queue branch
short-circuits `stream_registry.create_session` + `enqueue_copilot_turn`
- [x] Mid-turn persist has a rollback-and-re-queue path tested for when
`session.messages` persist silently fails to back-fill sequences
Frontend unit (`pnpm vitest run`):
- [x] 630 tests pass incl. 22 new for extracted helpers + hooks
- [x] Frontend coverage on touched copilot files: 91%+ (patch 87.37%)
Manual (once merged):
- [ ] Queue two chips while a tool is running; Claude acknowledges both
on the next round, UI shows bubbles in typing order after the tool
output
- [ ] Hand AutoPilot block an existing session_id that has a live turn;
block returns queued status, in-flight turn drains the message on its
next round
- [ ] `run_sub_session` against a busy sub — status=`queued`,
`sub_autopilot_session_link` lets user watch live
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
fcaebd1bb7 |
refactor(backend/copilot): unified queue-backed copilot turns + async sub-AutoPilot + guide-read gate (#12841)
### Why / What / How
**Why:** the 10-min stream-level idle timeout was killing legitimate long-running tool calls — notably sub-AutoPilot runs via `run_block(AutoPilotBlock)`, which routinely take 15–45 min. The symptom users saw was `"A tool call appears to be stuck"` even though AutoPilot was actively working. A second long-standing rough edge was shipped alongside: agents often skipped `get_agent_building_guide` when generating agent JSON, producing schemas that failed validation and burned turns on auto-fix loops.
**What:** three threaded pieces.
1. **Async sub-AutoPilot via `run_sub_session`.** New copilot tool that delegates a task to a fresh (or resumed) sub-AutoPilot, and its companion `get_sub_session_result` for polling/cancelling. The agent starts with `run_sub_session(prompt, wait_for_result≤300s)` and, if the sub isn't done inside the cap, receives a handle + polls via `get_sub_session_result(wait_if_running≤300s)`. No single MCP call ever blocks the stream for more than 5 min, so the 10-min stream-idle timer stays simple and effective (derived as `MAX_TOOL_WAIT_SECONDS * 2`).
2. **Queue-backed copilot turn dispatch** — one code path for all three callers.
   - `run_sub_session` enqueues a `CoPilotExecutionEntry` on the existing `copilot_execution` exchange instead of spawning an in-process `asyncio.Task`.
   - `AutoPilotBlock.execute_copilot` (graph block) now uses the **same queue** instead of `collect_copilot_response` inline.
   - The HTTP SSE endpoint was already queue-backed.
   - All three share a single primitive: `run_copilot_turn_via_queue` → `create_session` → `enqueue_copilot_turn` → `wait_for_session_result`. The event-aggregation logic (`EventAccumulator`/`process_event`) is a shared module used by both the direct-stream path and the cross-process waiter.
   - Benefits: **deploy/crash resilience** (RabbitMQ redelivery survives worker restarts), **natural load balancing** across copilot_executor workers, **sessions as first-class resources** (UI users can `/copilot?sessionId=<inner>` into any sub or AutoPilot block's session), and every future stream-level feature (pending-messages drain #12737, compaction policies, etc.) applies uniformly instead of bypassing graph-block sessions.
3. **Guide-read gate on agent-generation tools.** `create_agent` / `edit_agent` / `validate_agent_graph` / `fix_agent_graph` refuse until the session has called `get_agent_building_guide`. The pre-existing soft hint was routinely ignored; the gate makes the dependency enforceable. All four tool descriptions advertise the requirement in one tightened sentence ("Requires get_agent_building_guide first (refuses otherwise).") that stays under the 32000-char schema budget.
**How:**
#### Queue-backed sub-AutoPilot + AutoPilotBlock
- `sdk/session_waiter.py` — new module. `SessionResult` dataclass mirrors `CopilotResult`. `wait_for_session_result` subscribes to `stream_registry`, drains events via shared `process_event`, returns `(outcome, result)`. `wait_for_session_completion` is the cheaper outcome-only variant. `run_copilot_turn_via_queue` is the canonical three-step dispatch. Every exit path unsubscribes the listener.
- `sdk/stream_accumulator.py` — new module. `EventAccumulator`, `ToolCallEntry`, `process_event` extracted from `collect.py`. Both the direct-stream and cross-process paths now use the same fold logic.
- `tools/run_sub_session.py` / `tools/get_sub_session_result.py` — rewritten around the shared primitive. `sub_session_id` is now the sub's `ChatSession` id directly (no separate registry handle). Ownership re-verified on every call via `get_chat_session`. Cancel via `enqueue_cancel_task` on the existing `copilot_cancel` fan-out exchange.
- `blocks/autopilot.py` — `execute_copilot` replaced its inline `collect_copilot_response` with `run_copilot_turn_via_queue`. `SessionResult` carries response text, tool calls, and token usage back from the worker so no DB round-trip is needed. The block's public I/O contract (inputs, outputs, `ToolCallEntry` shape) is unchanged.
- `CoPilotExecutionEntry` gains a `permissions: CopilotPermissions | None` field forwarded to the worker's `stream_fn` so the sub's capability filter survives the queue hop. The processor passes it through to `stream_chat_completion_sdk` / `stream_chat_completion_baseline`.
- **Deleted**: `sdk/sub_session_registry.py` (module-level dict, done-callback, abandoned-task cap, `notify_shutdown_and_cancel_all`, `_reset_for_test`), plus the shutdown-notifier hook in `copilot_executor.processor.cleanup` — redundant under queue-backed execution.
#### Run_block single-tool cap (3)
- `tools/helpers.execute_block` caps block execution at `MAX_TOOL_WAIT_SECONDS = 5 min` via `asyncio.wait_for` around the generator consumption.
- On timeout: logs `copilot_tool_timeout tool=run_block block=… block_id=… input_keys=… user=… session=… cap_s=…` (grep-friendly) and returns an `ErrorResponse` that redirects the LLM to `run_agent` / `run_sub_session`.
- Billing protection: `_charge_block_credits` is called in a `finally` guarded by `asyncio.shield` and marked `charge_handled` **before** the await so cancel-mid-charge doesn't double-bill and cancel-mid-generator-before-charge still settles via the finally.
#### Guide-read gate
- `helpers.require_guide_read(session, tool_name)` scans `session.messages` for any prior assistant tool call named `get_agent_building_guide` (handles both OpenAI and flat shapes). Applied at the top of `_execute` in `create_agent`, `edit_agent`, `validate_agent_graph`, `fix_agent_graph`. Tool descriptions advertise the requirement.
#### Shared timing constants
- `MAX_TOOL_WAIT_SECONDS = 5 * 60` + `STREAM_IDLE_TIMEOUT_SECONDS = 2 * MAX_TOOL_WAIT_SECONDS` in `constants.py`. Every long-running tool (`run_agent`, `view_agent_output`, `run_sub_session`, `get_sub_session_result`, `run_block`) imports from one place; no more hardcoded 300 / `10*60` literals drifting apart. Stream-idle invariant ("no single tool blocks close to the idle timeout") holds by construction.
### Frontend
- Friendlier tool-card labels: `run_sub_session` → "Sub-AutoPilot", `get_sub_session_result` → "Sub-AutoPilot result", `run_block` → "Action" (matches the builder UI's own naming), `run_agent` → "Agent". Fixes the double-verb "Running Run …" phrasing.
- `SubSessionStatusResponse.sub_autopilot_session_link` surfaces `/copilot?sessionId=<inner>` so users can click into any sub's session from the tool-call card — same pattern as `run_agent`'s `library_agent_link`.
### Changes 🏗️
- **New modules**: `sdk/session_waiter.py`, `sdk/stream_accumulator.py`, `tools/run_sub_session.py`, `tools/get_sub_session_result.py`, `tools/sub_session_test.py`, `tools/agent_guide_gate_test.py`.
- **New response types**: `SubSessionStatusResponse`, `SubSessionProgressSnapshot`, `SessionResult`.
- **New gate helper**: `require_guide_read` in `tools/helpers.py`.
- **Queue protocol**: `permissions` field on `CoPilotExecutionEntry`, threaded through `processor.py` → `stream_fn`.
- **Hidden**: `AUTOPILOT_BLOCK_ID` in `COPILOT_EXCLUDED_BLOCK_IDS` (run_block can't execute AutoPilotBlock; agents use `run_sub_session` instead).
- **Deleted**: `sdk/sub_session_registry.py`, processor shutdown-notifier hook.
- **Regenerated**: `openapi.json` for the new response types; block-docs for the updated `ToolName` Literal.
- **Tool descriptions**: tightened the guide-gate hint across the four agent-builder tools to stay under the 32000-char schema budget.
- **40+ tests** across sub_session, execute_block cap + billing races, stream_accumulator, agent_guide_gate, frontend helpers.
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Unit suite green on the full copilot tree; `poetry run format` + `pyright` clean
  - [x] Schema character budget test passes (tool descriptions trimmed to stay under 32000)
  - [x] Native UI E2E (`poetry run app` + `pnpm dev`): `run_sub_session(wait_for_result=60)` returns `status="completed"` + `sub_autopilot_session_link` inline; `run_sub_session(wait_for_result=1)` returns `status="running"` + handle, `get_sub_session_result(wait_if_running=60)` observes `running → completed` transition
  - [x] AutoPilotBlock (graph) goes through `copilot_executor` queue end-to-end (verified via logs: ExecutionManager's AutoPilotBlock node spawned session `f6de335b-…`, a different `CoPilotExecutor` worker acquired its cluster lock and ran the SDK stream)
  - [x] Guide gate: `create_agent` without a prior `get_agent_building_guide` returns the refusal; agent reads the guide and retries successfully |
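The start-then-poll contract in piece 1 can be sketched as a loop where no single call waits longer than `MAX_TOOL_WAIT_SECONDS`. Everything below is illustrative (a fake sub-session advances by "waited seconds"); only the constant derivation and the bounded-wait shape come from the PR text:

```python
MAX_TOOL_WAIT_SECONDS = 5 * 60
STREAM_IDLE_TIMEOUT_SECONDS = 2 * MAX_TOOL_WAIT_SECONDS  # 10 min, derived

def delegate(sub, wait_caps: list[int]) -> str:
    """Start a sub-session, then poll until it reports completed.
    Each call is clamped to MAX_TOOL_WAIT_SECONDS, so the 10-min
    stream-idle timer can never fire on a healthy sub-session."""
    status = sub.run(wait=min(wait_caps[0], MAX_TOOL_WAIT_SECONDS))
    for cap in wait_caps[1:]:
        if status == "completed":
            break
        status = sub.poll(wait=min(cap, MAX_TOOL_WAIT_SECONDS))
    return status

class FakeSub:
    """Stand-in sub-AutoPilot that needs `total` seconds of work."""
    def __init__(self, total: int):
        self.remaining = total
    def _advance(self, wait: int) -> str:
        self.remaining -= wait
        return "completed" if self.remaining <= 0 else "running"
    run = poll = _advance

# A ~700s job is covered by one 300s initial wait plus two 300s polls.
assert delegate(FakeSub(700), [300, 300, 300]) == "completed"
```

The invariant "no single tool blocks close to the idle timeout" holds by construction: every wait argument is clamped before it reaches the blocking call.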
||
|
|
1c0c7a6b44 |
fix(copilot): add gh auth status check to Tool Discovery Priority section (#12832)
## Problem
The CoPilot system prompt contains a `gh auth status` instruction in the E2B-specific `GitHub CLI` section, but models pattern-match to `connect_integration` from the **Tool Discovery Priority** section — which is where the actual decision to call an external service is made. Because the GitHub auth check lives in a separate, later section, it's not salient at the point of decision-making. This causes the model to call `connect_integration(provider='github')` even when `gh` is already authenticated via `GH_TOKEN`, unnecessarily prompting the user.
## Fix
Add a 3-line callout directly inside the **Tool Discovery Priority** section:
```
> 🔑 **GitHub exception:** Before calling `connect_integration` for GitHub,
> always run `gh auth status` first. If it shows `Logged in`, proceed
> directly with `gh`/`git` — no integration connection needed.
```
This places the rule at the exact location where the model decides which tool path to take, preventing the miss.
## Why this works
- **Placement over repetition**: The existing instruction isn't wrong — it's just in the wrong spot relative to where the decision is made
- **Negative framing**: Explicitly says "before calling `connect_integration`" which directly intercepts the incorrect reflex
- **Minimal change**: 4 lines added, zero removed

Co-authored-by: Toran Bruce Richards <22963551+Torantulino@users.noreply.github.com>
|
||
|
|
3a01874911 |
fix(frontend/builder): preserve agent name in AgentExecutor node title after reload (#12805)
## Summary
Fixes #11041

When an `AgentExecutorBlock` is placed in the builder, it initially displays the agent's name (e.g., "Researcher v2"). After saving and reloading the page, the title reverts to the generic "Agent Executor."
## Root Cause
The backend correctly persists `agent_name` and `graph_version` in `hardcodedValues` (via `input_default` in `AgentExecutorBlock`). However, `NodeHeader.tsx` always resolves the display title from `data.title` (the generic block name), ignoring the persisted agent name.
## Fix
Modified the title resolution chain in `NodeHeader.tsx` to check `data.hardcodedValues.agent_name` between the user's custom name and the generic block title:
1. `data.metadata.customized_name` (user's manual rename) — highest priority
2. `agent_name` + ` v{graph_version}` from `hardcodedValues` — **new**
3. `data.title` (generic block name) — fallback

This is a frontend-only change. No backend modifications needed.
## Files Changed
- `autogpt_platform/frontend/src/app/(platform)/build/components/FlowEditor/nodes/CustomNode/components/NodeHeader.tsx` (+11, -1)
## Test Plan
- [x] Place an AgentExecutorBlock, select an agent — title shows agent name
- [x] Save graph, reload page — title still shows agent name (was "Agent Executor" before)
- [x] Double-click to rename — custom name takes priority over agent name
- [x] Clear custom name — falls back to agent name
- [x] Non-AgentExecutor blocks — unaffected, show generic title as before

---------

Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
|
||
|
|
6d770d9917 |
fix(platform/copilot): revert forward pagination, add visibility guarantee for blank chat (#12831)
## Why / What / How
**Why:** PR #12796 changed completed copilot sessions to load messages from sequence 0 forward (ascending), which broke the standard chat UX — users now land at the beginning of the conversation instead of the most recent messages. Reported in Discord.

**What:** Reverts the forward pagination approach and replaces it with a visibility guarantee that ensures every page contains at least one user/assistant message.

**How:**
- **Backend**: Removed `after_sequence`, `from_start`, `forward_paginated`, `newest_sequence` — always use backward (newest-first) pagination. Added `_expand_for_visibility()` helper: after fetching, if the entire page is tool messages (invisible in UI), expand backward up to 200 messages until a visible user/assistant message is found.
- **Frontend**: Removed all `forwardPaginated`/`newestSequence` plumbing from hooks and components. Removed bottom `LoadMoreSentinel`. Simplified message merge to always prepend paged messages.
### Changes
- `routes.py`: Reverted to simple backward pagination, removed TOCTOU re-fetch logic
- `db.py`: Removed forward mode, extracted `_expand_tool_boundary()` and added `_expand_for_visibility()`
- `SessionDetailResponse`: Removed `newest_sequence` and `forward_paginated` fields
- `openapi.json`: Removed `after_sequence` param and forward pagination response fields
- Frontend hooks/components: Removed forward pagination props and logic (-1000 lines)
- Updated all tests (backend: 63 pass, frontend: 1517 pass)
### Checklist
- [x] I have clearly listed my changes in the PR description
- [x] Backend unit tests: 63 pass
- [x] Frontend unit tests: 1517 pass
- [x] Frontend lint + types: clean
- [x] Backend format + pyright: clean
|
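The visibility guarantee described above can be sketched with a flat list of message dicts. A minimal sketch, assuming in-memory messages: the real `_expand_for_visibility()` works on DB rows, and all names except the 200-message cap are hypothetical.

```python
VISIBLE_ROLES = {"user", "assistant"}

def expand_for_visibility(messages, page_start, page_end, max_page=200):
    """Grow a backward-paginated page until it contains a visible message.

    If the fetched window is all tool messages (invisible in the UI),
    walk the start index backward, up to max_page messages total.
    """
    has_visible = any(
        m["role"] in VISIBLE_ROLES for m in messages[page_start:page_end]
    )
    while not has_visible and page_start > 0 and page_end - page_start < max_page:
        page_start -= 1
        has_visible = messages[page_start]["role"] in VISIBLE_ROLES
    return messages[page_start:page_end]
```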
||
|
|
334ec18c31 |
docs: convert in-code comments to MkDocs admonitions in block-sdk-gui… (#12819)
### Why / What / How
<!-- Why: Why does this PR exist? What problem does it solve, or what's broken/missing without it? -->
This PR converts inline Python comments in code examples within `block-sdk-guide.md` into MkDocs `!!! note` admonitions. This makes code examples cleaner and more copy-paste friendly while preserving all explanatory content.
<!-- What: What does this PR change? Summarize the changes at a high level. -->
Converts inline comments in code blocks to admonitions following the pattern established in PR #12396 (new_blocks.md) and PR #12313.
<!-- How: How does it work? Describe the approach, key implementation details, or architecture decisions. -->
- Wrapped code examples with `!!! note` admonitions
- Removed inline comments from code blocks for clean copy-paste
- Added explanatory admonitions after each code block
### Changes 🏗️
- Provider configuration examples (API key and OAuth)
- Block class Input/Output schema annotations
- Block initialization parameters
- Test configuration
- OAuth and webhook handler implementations
- Authentication types and file handling patterns
### Checklist 📋
#### For documentation changes:
- [x] Follows the admonition pattern from PR #12396
- [x] No code changes, documentation only
- [x] Admonition syntax verified correct
#### For configuration changes:
- [ ] `.env.default` is updated or already compatible with my changes
- [ ] `docker-compose.yml` is updated or already compatible with my changes

---

**Related Issues**: Closes #8946

Co-authored-by: slepybear <slepybear@users.noreply.github.com>
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
|
||
|
|
ea5cfdfa2e |
fix(frontend): remove debug console.log statements (#12823)
## Why
Debug console.log statements were left in production code, which can leak sensitive information and pollute browser developer consoles.
## What
Removed console.log from 4 non-legacy frontend components:
- `useNavbar.ts`: isLoggedIn debug log
- `WalletRefill.tsx`: autoRefillForm debug log
- `EditAgentForm.tsx`: category field debug log
- `TimezoneForm.tsx`: currentTimezone debug log
## How
Simply deleted the console.log lines as they served no purpose other than debugging during development.
## Checklist
- [x] Code follows project conventions
- [x] Only frontend changes (4 files, 6 lines removed)
- [x] No functionality changes

Co-authored-by: slepybear <slepybear@users.noreply.github.com>
|
||
|
|
d13a85bef7 |
feat(frontend): surface scheduled agents in library & copilot briefings (#12818)
## Why
Scheduled agents weren't well-surfaced in the Library and Copilot
briefings:
- The Library fleet summary didn't count agents that are scheduled
purely via the scheduler (only those with a `recommended_schedule_cron`
set at the agent level).
- Sitrep items didn't distinguish scheduled or listening (trigger-based)
agents, so they often fell back to a generic "idle" state.
- Scheduled chips showed a generic message with no indication of when
the next run would happen.
- The Copilot Agent Briefing surfaced every scheduled agent regardless
of how far out the next run was — an agent scheduled a month away would
take a slot from something actually happening soon.
- Long sitrep messages overflowed the row.
## What
- Add `is_scheduled` to `LibraryAgent` (sourced from the scheduler) so
the frontend can reliably detect schedule-only agents.
- Count scheduled agents in `useLibraryFleetSummary`.
- Include scheduled and listening agents in sitrep items, with a
priority ordering (error → running → stale → success → listening →
scheduled → idle).
- Show a relative next-run time on scheduled sitrep chips (e.g.
"Scheduled to run in 2h" / "in 3d").
- Filter the Copilot Agent Briefing to scheduled agents whose next run
is within the next 3 days.
- Truncate long sitrep messages to 1 line with `OverflowText` and show
the full text in a tooltip on hover.
## How
- Scheduler → `LibraryAgent` mapping populates `is_scheduled` /
`next_scheduled_run`.
- `useSitrepItems` gains an optional `scheduledWithinMs` parameter.
Copilot's `usePulseChips` passes `3 * 24 * 60 * 60 * 1000`; the Library
briefing omits it to keep its existing (unbounded) behavior.
- Scheduled config-based sitrep items are skipped when
`next_scheduled_run` is missing or outside the window.
- `SitrepItem` wraps the message in `OverflowText` so a single-line
ellipsis + hover tooltip replaces raw overflow.
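The scheduled-chip copy above follows a simple minutes → hours → days cutover, and the Copilot briefing filter uses a fixed 3-day window. The real implementation is frontend TypeScript; this is a hypothetical Python sketch of the same logic, with the function name invented for illustration.

```python
def scheduled_chip_label(delta_ms: int) -> str:
    """Render 'Scheduled to run in Nm|Nh|Nd' from a future next-run delta."""
    minutes = delta_ms // 60_000
    if minutes < 60:
        return f"Scheduled to run in {minutes}m"
    hours = minutes // 60
    if hours < 24:
        return f"Scheduled to run in {hours}h"
    return f"Scheduled to run in {hours // 24}d"

# The Copilot briefing window from the PR: 3 days, in milliseconds.
SCHEDULED_WITHIN_MS = 3 * 24 * 60 * 60 * 1000
```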
## Test plan
- [ ] `/library` — scheduled and listening agents appear in the sitrep
with accurate copy; fleet summary counts scheduled agents correctly;
long messages truncate with a tooltip on hover.
- [ ] `/copilot` — on an empty session with the `AGENT_BRIEFING` flag
on, the briefing only shows scheduled agents whose next run is within 3
days; agents scheduled further out no longer appear as "scheduled"
chips.
- [ ] Scheduled chip text reads "Scheduled to run in {Nm|Nh|Nd}"
matching `next_scheduled_run`.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
60b85640e7 |
fix(backend/copilot): replace dedup lock with idempotent append_and_save_message (#12814)
## Why
The Redis dedup lock (`chat:msg_dedup:{session}:{content_hash}`, 30s
TTL) was solving the wrong problem:
- Its purpose: block infra/nginx retries from calling
`append_and_save_message` twice after a client disconnect, writing a
duplicate user message to the DB.
- The approach: deliberately hold the lock for 30s on `GeneratorExit`.
- Why unnecessary: the executor's cluster lock already prevents
duplicate *execution*. The only real gap was duplicate *DB writes* in
the ~1s before the executor picks up the turn.
## What
- **Deleted** `message_dedup.py` and `message_dedup_test.py` (~150 lines
removed).
- **Removed** all dedup lock code from `routes.py` (~40 lines removed).
- **`append_and_save_message`** is now idempotent and self-contained:
- Uses redis-py's built-in `Lock(timeout=10, blocking_timeout=2)` —
Lua-script atomic acquire/release, no manual poll/sleep loop.
- Lock context manager yields `bool` (`True` = acquired, `False` =
degraded). When degraded (Redis down or 2s timeout), reads from DB
directly instead of cache to avoid stale-state duplicates.
- Idempotency check: if `session.messages[-1]` already matches the
incoming role+content, returns `None` instead of the session.
- Lock released explicitly as soon as the write completes; `try/except`
in `finally` so a cleanup error after a successful write never surfaces
a false 500.
- On cache-write failure, the stale cache entry is invalidated so future
reads fall back to the authoritative DB.
- **`routes.py`** uses the `None` signal: `is_duplicate_message = (await
append_and_save_message(...)) is None`
- Skips `create_session` and `enqueue_copilot_turn` for duplicates —
client re-attaches to the existing turn's Redis stream.
- `track_user_message` and `turn_id` generation only happen when
`is_duplicate_message` is false.
- **`subscribe_to_session`** retry window increased from 1×50ms to
3×100ms — covers the window where a duplicate request subscribes before
the original's `create_session` hset completes.
- **Cleaned up** `routes_test.py`: removed 5 dedup-specific tests and
the `mock_redis` setup from `_mock_stream_internals`; added
duplicate-skips-enqueue test.
## How
The idempotency guard distinguishes legit same-text messages from
retries via the **assistant turn between them**: if the user said "yes",
got a response, and says "yes" again, `session.messages[-1]` is the
assistant reply, so the role check fails and the second message goes
through. A retry (no response yet) sees the user message as the last
entry and is blocked.
```python
if (
session.messages
and session.messages[-1].role == message.role
and session.messages[-1].content == message.content
):
return None # duplicate — caller skips enqueue
```
The Redis lock ensures this check always sees authoritative state even
in multi-replica deployments. When the lock is unavailable (Redis down
or contention), reading from DB directly (bypassing potentially stale
cache) provides the same safety guarantee at the cost of a DB
round-trip.
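Stripped of the Redis lock, the idempotency guard itself is a pure check on the tail of the message list. A dependency-free sketch, where this `ChatMessage` dataclass stands in for the real model:

```python
from dataclasses import dataclass


@dataclass
class ChatMessage:
    role: str
    content: str


def is_retry_duplicate(history: list[ChatMessage], incoming: ChatMessage) -> bool:
    # A retry sees its own user message as the last entry.  A legitimate
    # repeat ("yes" -> assistant reply -> "yes") has an assistant message
    # in between, so the role check fails and the message goes through.
    return bool(history) and (
        history[-1].role == incoming.role
        and history[-1].content == incoming.content
    )
```

In the real code this check runs while holding the redis-py `Lock(timeout=10, blocking_timeout=2)`, so the tail of the list reflects authoritative state even across replicas.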
## Checklist
- [x] PR targets `dev`
- [x] Conventional commit title with scope
- [x] Tests added/updated (duplicate detection, lock degradation, DB
error, cache invalidation paths)
- [x] `poetry run format` and `poetry run pyright` pass clean
- [x] No new linter suppressors
|
||
|
|
87e4d42750 |
fix(backend/copilot): fix initial load missing messages + forward pagination for completed sessions (#12796)
### Why / What / How
**Why:** Completed copilot sessions with many messages showed a completely empty chat view. A user reported a 158-message session that appeared blank on reload.

**What:** Two bugs fixed:
1. **Backend** — initial page load always returned the newest 50 messages in DESC order. For sessions heavy in tool calls, the user's original messages (seq 0–5) were never included; all 50 slots consumed by mid-session tool outputs.
2. **Frontend** — `convertChatSessionToUiMessages` silently dropped user messages with null/empty content.

**How:** For completed sessions (no active stream), the backend now loads from sequence 0 in ASC order. Active/streaming sessions keep newest-first for streaming context. A new `after_sequence` forward cursor enables infinite-scroll for subsequent pages (sentinel moves to bottom). The frontend wires `forward_paginated` + `newest_sequence` end-to-end.
### Changes 🏗️
- `db.py`: added `from_start` (ASC) and `after_sequence` (forward cursor) modes; added `newest_sequence` to `PaginatedMessages`
- `routes.py`: detect completed vs active on initial load; pass `from_start=True` for completed; expose `newest_sequence` + `forward_paginated`; accept `after_sequence` param
- `convertChatSessionToUiMessages.ts`: never drop user messages with empty content
- `useLoadMoreMessages.ts`: forward pagination via `after_sequence`; append pages to end
- `ChatMessagesContainer.tsx`: `LoadMoreSentinel` at bottom for forward-paginated sessions
- Wire `newestSequence` + `forwardPaginated` end-to-end through `useChatSession`/`useCopilotPage`/`ChatContainer`
- `openapi.json`: add `after_sequence` + `newest_sequence`/`forward_paginated`; regenerate types
- `db_test.py`: 9 new unit tests for `from_start` and `after_sequence` modes
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Open a completed session with many messages — first user message visible on initial load
  - [x] Scroll to bottom of completed session — load more appends next page
  - [x] Open active/streaming session — newest messages shown first, streaming unaffected
  - [x] Backend unit tests: all 28 pass
  - [x] Frontend lint/format: clean, no new type errors

---------

Co-authored-by: chernistry <73943355+chernistry@users.noreply.github.com>
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
|
||
|
|
0339d95d12 |
fix(frontend): small UI fixes, sort menu bg, name update auth, stats grid overflow, pulse chips (#12815)
## Summary
- **LibrarySortMenu / AgentFilterMenu**: Force `!bg-transparent` and neutralise legacy `SelectTrigger` styles (`m-0.5`, `ring-offset-white`, `shadow-sm`) that caused a white background around the trigger
- **EditNameDialog**: Replace client-side `supabase.auth.updateUser()` with server-side `PUT /api/auth/user` route — fixes "Auth session missing!" error caused by `httpOnly` cookies being inaccessible to browser JS
- **StatsGrid**: Swap label `Text` for `OverflowText` so tile labels truncate with `…` and show a tooltip instead of wrapping when the grid is squeezed
- **PulseChips**: Set fixed `15rem` chip width with `shrink-0`, horizontal scroll, and styled thin scrollbar
- **Tests**: Updated `EditNameDialog` tests to use MSW instead of mocking Supabase client; added 7 new `PulseChips` integration tests
## Test plan
- [x] `pnpm test:unit` — all 1495 tests pass (91 files)
- [x] `pnpm format && pnpm lint` — clean
- [x] `pnpm types` — no new errors (pre-existing only)
- [ ] QA `/library?sort=updatedAt` — sort menu trigger has no white bg
- [ ] QA `/library` — StatsGrid labels truncate with tooltip on narrow viewports
- [ ] QA `/copilot` — PulseChips scroll horizontally at fixed width
- [ ] QA `/copilot` — Edit name dialog saves successfully (no "Auth session missing!")

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
f410929560 |
feat(platform): Add xAI Grok 4.20 models from OpenRouter (#12620)
Requested by @Torantulino

Adds the 2 xAI Grok 4.20 models available on OpenRouter that are missing from the platform.
## Why
`x-ai/grok-4.20` and `x-ai/grok-4.20-multi-agent` are xAI's current flagship models (released March 2026) and are available via OpenRouter, but weren't accessible from the platform's LLM blocks.
## Changes
**`autogpt_platform/backend/backend/blocks/llm.py`**
- Added `GROK_4_20` and `GROK_4_20_MULTI_AGENT` enum members
- Added corresponding `MODEL_METADATA` entries (open_router provider, 2M context window, price tier 3)

**`autogpt_platform/backend/backend/data/block_cost_config.py`**
- Added `MODEL_COST` entries at 5 credits each (flagship tier, $2/M in)

**`docs/integrations/block-integrations/llm.md`**
- Added new model IDs to all LLM block tables

| Model | Pricing | Context |
|-------|---------|---------|
| `x-ai/grok-4.20` | $2/M in, $6/M out | 2M |
| `x-ai/grok-4.20-multi-agent` | $2/M in, $6/M out | 2M |

Both models use the standard OpenRouter chat completions API — no special handling needed.

Resolves: SECRT-2196

---------

Co-authored-by: Torantulino <22963551+Torantulino@users.noreply.github.com>
Co-authored-by: Toran Bruce Richards <Torantulino@users.noreply.github.com>
Co-authored-by: Otto (AGPT) <otto@agpt.co>
|
||
|
|
2bbec09e1a |
feat(platform): subscription tier billing via Stripe Checkout (#12727)
## Why
Introducing paid subscription tiers (PRO, BUSINESS) so we can charge for AutoPilot capacity beyond the free tier. Without a billing integration, all users share the same rate limits regardless of their willingness to pay for additional capacity.
## What
End-to-end subscription billing system using Stripe Checkout Sessions:

**Backend:**
- `SubscriptionTier` enum (`FREE`, `PRO`, `BUSINESS`, `ENTERPRISE`) on the `User` model
- `POST /credits/subscription` — creates a Stripe Checkout Session for paid upgrades; for FREE tier or when `ENABLE_PLATFORM_PAYMENT` is off, sets tier directly
- `GET /credits/subscription` — returns current tier, monthly cost (cents), and all tier costs
- `POST /credits/stripe_webhook` — handles `customer.subscription.created/updated/deleted`, `checkout.session.completed`, `charge.dispute.*`, `refund.created`
- `sync_subscription_from_stripe()` — keeps `User.subscriptionTier` in sync from webhook events; guards against out-of-order delivery (cancelled event after new sub created), ENTERPRISE overwrite, and duplicate webhook replay
- Open-redirect protection on `success_url`/`cancel_url` via `_validate_checkout_redirect_url()`
- `_cancel_customer_subscriptions()` — cancels both active and trialing subs; propagates errors so callers can avoid updating DB tier on Stripe failure
- `_cleanup_stale_subscriptions()` — best-effort cancellation of old subs when a new one becomes active (paid-to-paid upgrade), to prevent double-billing
- `get_stripe_customer_id()` with idempotency key to prevent duplicate Stripe customers on concurrent requests
- `cache_none=False` sentinel fix in `@cached` decorator so Stripe price lookups retry on transient error instead of poisoning the cache with `None`
- Stripe Price IDs read from LaunchDarkly (`stripe-price-id-pro`, `stripe-price-id-business`). If not configured, upgrade returns 422.

**Frontend:**
- `SubscriptionTierSection` component on billing page: tier cards (FREE/PRO/BUSINESS), upgrade/downgrade buttons, per-tier cost display, Stripe redirect on upgrade
- Confirmation dialog for downgrades
- ENTERPRISE users see a read-only admin-managed banner
- Success toast on return from Stripe Checkout (`?subscription=success`)
- Uses generated `useGetSubscriptionStatus` / `useUpdateSubscriptionTier` hooks
## How
- Paid upgrades use Stripe Checkout Sessions (not server-side subscription creation) — Stripe handles PCI-compliant card collection and the subscription lifecycle
- Tier is synced back via webhook on `customer.subscription.created/updated/deleted`
- Downgrade to FREE cancels via Stripe API immediately; a `stripe.StripeError` during cancellation returns 502 with a generic message (no Stripe detail leakage)
- LaunchDarkly flags: `stripe-price-id-pro` (string), `stripe-price-id-business` (string), `enable-platform-payment` (bool)
- `ENABLE_PLATFORM_PAYMENT=false` bypasses Stripe for beta/internal access (sets tier directly)
## Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] `ENABLE_PLATFORM_PAYMENT=false` → tier change updates directly, no Stripe redirect
  - [x] `ENABLE_PLATFORM_PAYMENT=true` with price IDs configured → paid upgrade redirects to Stripe Checkout
  - [x] Stripe webhook `customer.subscription.created` → `User.subscriptionTier` updated
  - [x] Unrecognised price ID in webhook → logs warning, tier unchanged
  - [x] ENTERPRISE user webhook event → tier not overwritten
  - [x] Empty `STRIPE_WEBHOOK_SECRET` → 503 (prevents HMAC bypass)
  - [x] Open-redirect attack on `success_url`/`cancel_url` → 422
#### For configuration changes:
- [x] No `.env` or `docker-compose.yml` changes
- [x] LaunchDarkly flags to create: `stripe-price-id-pro` (string), `stripe-price-id-business` (string), `enable-platform-payment` (bool)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: majdyz <majdy.zamil@gmail.com>
|
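The open-redirect guard on `success_url`/`cancel_url` boils down to scheme plus host allowlisting. A hypothetical sketch: the real `_validate_checkout_redirect_url()` and its allowlist are not shown in this PR, so the function name here is borrowed loosely and the host set is invented.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; the real deployment's hosts are not in the PR.
ALLOWED_HOSTS = {"platform.agpt.co", "localhost"}


def is_safe_checkout_redirect(url: str) -> bool:
    # Reject anything that is not plain http(s) to a known host — this
    # covers javascript:, data:, protocol-relative, and external hosts.
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and parsed.hostname in ALLOWED_HOSTS
```

A caller would map a `False` result to the 422 response described in the checklist.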
||
|
|
31b88a6e56 |
feat(frontend): add Agent Briefing Panel (#12764)
## Summary
<img width="800" height="772" alt="Screenshot_2026-04-13_at_18 29 19" src="https://github.com/user-attachments/assets/3da6eaf2-1485-4c08-9651-18f2f4220eba" />
<img width="800" height="285" alt="Screenshot_2026-04-13_at_18 29 24" src="https://github.com/user-attachments/assets/6a5f981a-1e1d-4d22-a33d-9e1b0e7555a7" />
<img width="800" height="288" alt="Screenshot_2026-04-13_at_18 29 27" src="https://github.com/user-attachments/assets/f97b4611-7c23-4fc9-a12d-edf6314a77ef" />
<img width="800" height="433" alt="Screenshot_2026-04-13_at_18 29 31" src="https://github.com/user-attachments/assets/e6d7241d-84f3-4936-b8cd-e0b12df392bb" />
<img width="700" height="554" alt="Screenshot_2026-04-13_at_18 29 40" src="https://github.com/user-attachments/assets/92c08f21-f950-45cd-8c1d-529905a6e85f" />

Implements the Agent Intelligence Layer — real-time agent awareness across the Library and Copilot pages.
### Core Features
- **Agent Briefing Panel** — stats grid with fleet-wide counts (running, recently completed, needs attention, scheduled, idle, monthly spend) and tab-driven content below
- **Enhanced Library Cards** — StatusBadge, run counts, contextual action buttons (See tasks, Start, Chat) with consistent icon-left styling
- **Situation Report Items** — prioritized sitrep with error-first ranking, "See task" deep-links for completed runs, and "Ask AutoPilot" bridge
- **Home Pulse Chips** — agent status chips on Copilot empty state with hover-reveal actions (slide-up animation + backdrop blur on desktop, always visible on touch)
- **Edit Display Name** — pencil icon on Copilot greeting to update Supabase user metadata inline
### Backend
- **Execution count API** — batch `COUNT(*)` query on `AgentGraphExecution` grouped by `agentGraphId` for the current user, avoiding loading full execution rows. Wired into `list_library_agents` and `list_favorite_library_agents` via `execution_count_override` on `LibraryAgent.from_db()`
### UI Polish
- Subtler gradient on AgentBriefingPanel (reduced opacity on background + animated border)
- Consistent button styles across all action buttons (icon-left, same sizing)
- Removed duplicate "Open in builder" menu item (kept "Edit agent")
- "Recently completed" tab replaces "Listening" in briefing panel, showing agents with completed runs in last 72h
## Changes
### Backend
- `backend/api/features/library/db.py` — added `_fetch_execution_counts()` batch COUNT query, wired into list endpoints
- `backend/api/features/library/model.py` — added `execution_count_override` param to `LibraryAgent.from_db()`
### Frontend — New files
- `EditNameDialog/EditNameDialog.tsx` — modal to update display name via Supabase auth
- `PulseChips/PulseChips.module.css` — hover-reveal animation + glass panel styles
### Frontend — Modified files
- `EmptySession.tsx` — added EditNameDialog and PulseChips
- `PulseChips.tsx` — redesigned with See/Ask buttons, hover overlay on desktop
- `usePulseChips.ts` — added agentID for deep-linking
- `AgentBriefingPanel.tsx` — subtler gradient, adjusted padding
- `AgentBriefingPanel.module.css` — reduced conic gradient opacity
- `BriefingTabContent.tsx` — added "completed" tab routing
- `StatsGrid.tsx` — replaced Listening with Recently completed, reordered tabs
- `SitrepItem.tsx` — consistent button styles, "See task" link for completed items, updated copilot prompt
- `ContextualActionButton.tsx` — icon-left, smaller icon, renamed Run to Start
- `LibraryAgentCard.tsx` — icon-left on all buttons, EyeIcon for See tasks
- `AgentCardMenu.tsx` — removed duplicate "Open in builder"
- `useAgentStatus.ts` — added completed count to FleetSummary
- `useLibraryFleetSummary.ts` — added recent completion tracking
- `types.ts` — added `completed` to FleetSummary and AgentStatusFilter
## Test plan
- [ ] Library page renders Agent Briefing Panel with stats grid
- [ ] "Recently completed" tab shows agents with completed runs in last 72h
- [ ] Agent cards show real execution counts (not 0)
- [ ] Action buttons have consistent styling with icon on the left
- [ ] "See task" on completed items deep-links to agent page with execution selected
- [ ] "Ask AutoPilot" generates last-run-specific prompt for completed items
- [ ] Copilot empty state shows PulseChips with hover-reveal actions on desktop
- [ ] PulseChips show See/Ask buttons always on touch screens
- [ ] Pencil icon on greeting opens edit name dialog
- [ ] Name update persists via Supabase and refreshes greeting
- [ ] `pnpm format && pnpm lint && pnpm types` pass
- [ ] `poetry run format` passes for backend changes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: John Ababseh <jababseh7@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Bentlybro <Github@bentlybro.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: CodeRabbit <noreply@coderabbit.ai>
Co-authored-by: majdyz <zamil.majdy@agpt.co>
|
||
|
|
d357956d98 |
refactor(backend/copilot): make session-file helper fns public to fix Pyright warnings (#12812)
## Why
After PR #12804 was squashed into dev, two module-level helper functions in `backend/copilot/sdk/service.py` remained private (`_`-prefixed) while being directly imported by name in `sdk/transcript_test.py`. Pyright reports `reportAttributeAccessIssue` when tests (even those excluded from CI lint) import private symbols from outside their defining module.
## What
Rename two helpers to remove the underscore prefix:
- `_process_cli_restore` → `process_cli_restore`
- `_read_cli_session_from_disk` → `read_cli_session_from_disk`

Update call sites in `service.py` and imports/calls/docstrings in `sdk/transcript_test.py`.
## How
Pure rename — no logic change. Both functions were already module-level helpers with no reason to be private; the underscore was convention carried over during the refactor but they are directly unit-tested and should be public. All 66 `sdk/transcript_test.py` tests pass after the rename.
## Checklist
- [x] Tests pass (`poetry run pytest backend/copilot/sdk/transcript_test.py`)
- [x] No `_`-prefixed symbols imported across module boundaries
- [x] No linter suppressors added
|
||
|
|
697ffa81f0 | fix(backend/copilot): update transcript_test to use strip_for_upload after upload_cli_session removal | ||
|
|
2b4727e8b2 |
chore: merge master into dev, resolve baseline/transcript conflicts
Conflicts in baseline/service.py, baseline/transcript_integration_test.py,
and transcript.py arose because dev-only commit
|
||
|
|
0d4b31e8a1 |
refactor(backend/copilot): unified transcript context — extract_context_messages, mode-gated --resume, compaction-aware gap-fill (#12804)
### Why / What / How
**Why:** The copilot had two separate GCS paths (`cli-sessions/` and
`chat-transcripts/`), redundant function names
(`upload_cli_session`/`restore_cli_session`), and no shared context
strategy between modes. When switching from baseline→SDK or
SDK→baseline, the receiving mode discarded the stored transcript and
fell back to full DB reconstruction — loading all raw messages instead
of the compacted form — causing inflated context, wasted tokens, and
loss of CLI compaction summaries.
**What:**
- Single GCS path (`cli-sessions/`) for both modes — `chat-transcripts/`
removed
- Unified public API: `upload_transcript` / `download_transcript` /
`TranscriptDownload`
- `TranscriptMode = Literal["sdk", "baseline"]` persisted in
`.meta.json` — SDK skips `--resume` when `mode != "sdk"`
(baseline-written JSONL has stripped fields / synthetic IDs)
- `extract_context_messages(download, session_messages)` — shared
context primitive used by **both SDK and baseline**: reads compacted
transcript content + fills only the DB gap (messages after watermark),
so CLI compaction summaries are preserved across mode switches
- Watermark fix: `_jsonl_covered = transcript_msg_count + 2` when a real
transcript is present, preventing false gap detection after `--resume`
- Baseline gap-fill: `_append_gap_to_builder` converts `ChatMessage` →
JSONL entries; no more silently discarded stale transcripts
**How:**
```
SDK turn (mode="sdk" transcript available):
──► --resume [full CLI session restored natively]
──► inject gap prefix if DB has messages after watermark
SDK turn (mode="baseline" transcript available):
──► cannot --resume (synthetic CLI IDs)
──► extract_context_messages(download, session_messages):
returns transcript JSONL (compacted, isCompactSummary preserved) + gap
excludes session_messages[-1] (current turn — caller injects it separately)
──► format as <conversation_history> + "Now, the user says: {current}"
Baseline turn (any transcript):
──► _load_prior_transcript → TranscriptDownload
──► extract_context_messages(download, session_messages) + session_messages[-1]
replaces full session.messages DB read
──► LLM messages: [compacted history + gap] + [current user turn]
Transcript unavailable — both SDK (use_resume=False) and baseline:
──► extract_context_messages(None, session_messages) returns session_messages[:-1]
(all prior DB messages except the current user turn at [-1])
──► graceful fallback — no crash, no empty context
──► covers: first turn, GCS error, corrupt JSONL, missing .meta.json
──► next successful response uploads a fresh transcript
```
`extract_context_messages` is the shared primitive — both modes call the
same function, which handles:
- `download=None` (first turn, GCS unavailable) → falls back to
`session_messages[:-1]`
- Empty/corrupt content → falls back to `session_messages[:-1]`
- `bytes` content (raw GCS) or `str` content (pre-decoded baseline path)
- `isCompactSummary=True` entries → preserved so CLI compaction survives
mode switches
- Missing/corrupt `.meta.json` → `message_count` defaults to `0`, `mode`
defaults to `"sdk"`
**Why `[:-1]` and not all messages?** `session_messages[-1]` is always
the current user turn being handled right now. Both callers inject it
separately — SDK wraps it as `"Now, the user says: ..."`, baseline
appends it as the final message in the LLM array. Returning it inside
`extract_context_messages` would double-inject it.
### Changes 🏗️
- **`transcript.py`**: `CliSessionRestore` → `TranscriptDownload` +
`mode` field; `upload_cli_session` → `upload_transcript`;
`restore_cli_session` → `download_transcript`; add `TranscriptMode`,
`detect_gap`, `extract_context_messages`; import `ChatMessage` via
relative path to match `service.py` style
- **`sdk/service.py`**: mode-check before `--resume`; `_RestoreResult`
carries `baseline_download` + `context_messages` + `transcript_content`;
`_build_query_message` accepts `prior_messages` override;
`_restore_cli_session_for_turn` populates `context_messages` via
`extract_context_messages` and sets `transcript_content` to prevent
duplicate DB reconstruction; watermark fix (`_jsonl_covered =
transcript_msg_count + 2`)
- **`baseline/service.py`**: `_load_prior_transcript` returns `(bool,
TranscriptDownload | None)`; LLM context replaced with
`extract_context_messages(download, messages)`; `_append_gap_to_builder`
+ `detect_gap` call; `upload_transcript(mode="baseline")`
- **`sdk/transcript.py`**: updated re-exports, old aliases removed
- **`scripts/download_transcripts.py`**: updated for `bytes | str`
content type
- **Test files**: 179 tests total; `transcript_test.py`,
`baseline/transcript_integration_test.py`,
`sdk/service_helpers_test.py`, `sdk/test_transcript_watermark.py`,
`test/copilot/test_transcript_watermark.py` all updated/added
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] 179 unit tests pass — `transcript_test`,
`baseline/transcript_integration_test`, `sdk/service_helpers_test`,
`sdk/test_transcript_watermark`
- [x] pyright 0 errors on all changed files
- [x] SDK `--resume` path still works when `mode="sdk"` transcript is
present
- [x] SDK fallback uses `extract_context_messages` (compacted baseline
content + gap) when `mode="baseline"` transcript is stored — no more
full DB reconstruction
- [x] Baseline uses `extract_context_messages` per turn instead of full
`session.messages` DB read
- [x] `isCompactSummary=True` entries preserved across mode switches
- [x] Watermark (`_jsonl_covered`) fix prevents false gap detection
after `--resume`
- [x] Baseline gap detection no longer silently discards stale
transcripts
- [x] `TranscriptDownload.content` accepts `bytes | str` — backward
compatible
- [x] Transcript unavailable (GCS error, first turn, corrupt file)
gracefully falls back to `session_messages[:-1]` without crash — applies
to both SDK and baseline paths
---------
Co-authored-by: chernistry <73943355+chernistry@users.noreply.github.com>
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
|
||
|
|
0cd0a76305 |
fix(backend/copilot): baseline always uploads when GCS has no transcript
_load_prior_transcript was returning False for missing/invalid transcripts, which caused should_upload_transcript to suppress the upload. The original intent was to protect against overwriting a *newer* GCS version — but a missing or corrupt file is not 'newer'. Only stale (watermark ahead) and download errors (unknown GCS state) should suppress upload. Also renames transcript_covers_prefix → transcript_upload_safe throughout to accurately describe what the flag means. |
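The suppression rule described above can be sketched as a small decision table — an illustration of the stated intent only (the GCS-state labels and this standalone function shape are assumptions; in the codebase the signal flows through `_load_prior_transcript` and `should_upload_transcript`):

```python
def transcript_upload_safe(gcs_state: str, gcs_count: int, local_watermark: int) -> bool:
    """Whether baseline may overwrite the GCS transcript. Only two cases
    suppress upload: a *newer* GCS version (our watermark is behind) and a
    download error (unknown GCS state). A missing or corrupt transcript is
    not 'newer' — uploading a fresh one is safe."""
    if gcs_state == "error":
        return False  # unknown GCS state — don't risk clobbering
    if gcs_state in ("missing", "corrupt"):
        return True   # nothing newer to protect
    return gcs_count <= local_watermark  # GCS ahead of us → stale, suppress
```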
||
|
|
d01a51be0e |
Add check for GitHub account connection status (#12807)
Added an instruction to check GitHub authentication status before prompting the user. This prevents the copilot from repeatedly and unnecessarily asking the user to add their GitHub credentials when they are already connected, which is currently a prevalent bug. ### Changes 🏗️ - Added one line to `autogpt_platform/backend/backend/copilot/prompting.py` instructing AutoPilot to run `gh auth status` before prompting the user to connect their GitHub account. Co-authored-by: Toran Bruce Richards <22963551+Torantulino@users.noreply.github.com> |
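For reference, the check the prompt now instructs the copilot to run — real `gh` CLI usage, shown here only to illustrate the added instruction (exact invocation in the prompt may differ):

```shell
# Exits 0 and prints the logged-in account if GitHub credentials
# are already connected; non-zero if authentication is missing.
gh auth status
```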
||
|
|
bd2efed080 |
fix(frontend): allow zooming out more in the builder (#12690)
Reduced minZoom on the builder canvas from 0.1 to 0.05 to allow zooming out further when working with large agent graphs. Fixes #9325 Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co> |
||
|
|
5fccd8a762 | Merge branch 'master' of github.com:Significant-Gravitas/AutoGPT into dev | ||
|
|
2740b2be3a |
fix(backend/copilot): disable fallback model to fix prod CLI rejection (#12802)
### Why / What / How **Why:** `fffbe0aad8` changed both `ChatConfig.model` and `ChatConfig.claude_agent_fallback_model` to `claude-sonnet-4-6`. The Claude Code CLI rejects this with `Error: Fallback model cannot be the same as the main model`, causing every standard-mode copilot turn to fail with exit code 1 — the session "completes" in ~30s but produces no response and drops the transcript. **What:** Set `claude_agent_fallback_model` default to `""`. `_resolve_fallback_model()` already returns `None` on empty string, which means the `--fallback-model` flag is simply not passed to the CLI. On 529 overload errors the turn will surface normally instead of silently retrying with a fallback. **How:** One-line config change + test update. ### Changes 🏗️ - `ChatConfig.claude_agent_fallback_model` default: `"claude-sonnet-4-6"` → `""` - Update `test_fallback_model_default` to assert the empty default ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] `poetry run pytest backend/copilot/sdk/p0_guardrails_test.py` #### For configuration changes: - [x] `.env.default` is updated or already compatible with my changes - [x] `docker-compose.yml` is updated or already compatible with my changes |
||
|
|
d27d22159d | Merge branch 'master' of github.com:Significant-Gravitas/AutoGPT into dev |