AutoGPT

mirror of https://github.com/Significant-Gravitas/AutoGPT.git synced 2026-04-30 03:00:41 -04:00

Author	SHA1	Message	Date
anvyle	f7601d06ed	fix(copilot): resume decompose_goal countdown from server timestamp Reopening a session was restarting the client countdown from a fresh 60s, even though the server had been counting the whole time. Now the timer reflects real elapsed time so the user sees the actual remaining seconds (or 0, which auto-approves immediately). - backend: stamp UTC created_at on TaskDecompositionResponse via a default factory. The timestamp is set when the tool returns and persisted in the message content JSON, so it survives DB round-trips. - frontend: lazy-init secondsLeft from (auto_approve_seconds - (Date.now() - created_at)), clamped to [0, total]. Older messages without created_at fall back to a fresh full countdown (existing behaviour). - Test: assert created_at is stamped within the duration of _execute(). Note: openapi.json regen is skipped in this commit because the existing REST server is in use; the frontend reads tool output as opaque JSON via custom helpers, so the regen is not required for the feature to work. Regen later for completeness. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-10 17:44:50 +02:00
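The remaining-time arithmetic described in f7601d06ed can be sketched as follows. The real logic lives in the TypeScript frontend; this Python version only mirrors the formula, and the function name `seconds_left` and its signature are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def seconds_left(
    created_at: Optional[datetime],
    auto_approve_seconds: int,
    now: Optional[datetime] = None,
) -> int:
    """Remaining countdown seconds, clamped to [0, auto_approve_seconds]."""
    if created_at is None:
        # Older messages carry no timestamp: fall back to a full countdown.
        return auto_approve_seconds
    now = now or datetime.now(timezone.utc)
    elapsed = (now - created_at).total_seconds()
    remaining = auto_approve_seconds - elapsed
    # Clamp so clock skew cannot yield a negative or over-long timer.
    return int(max(0, min(auto_approve_seconds, remaining)))
```

A result of 0 corresponds to the case where the client auto-approves immediately on reopening the session.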
anvyle	fb86fcb67d	feat(copilot): add server-side auto-approve fallback for decompose_goal The decompose_goal countdown was purely client-side: if the user closed the tab before the timer ran out, the agent never got built. Add a server-side timer that fires the same approval message even when no client is connected. - backend/copilot/model.py: add append_message_if helper that appends a message inside the session lock only if a predicate is satisfied. Used by the auto-approve task to no-op when the user has already acted. - backend/copilot/tools/decompose_goal.py: when the tool returns, schedule a fire-and-forget asyncio task (same _background_tasks pattern as agent_browser.py) that sleeps 90s, re-checks the session, and if no user message has appeared since, appends "Approved. Please build the agent." and enqueues a new copilot turn. Stays in process; restart-resilience is a documented follow-up. - backend/copilot/tools/models.py: expose auto_approve_seconds on TaskDecompositionResponse so the frontend countdown is sourced from the backend instead of a hard-coded constant. - frontend DecomposeGoal.tsx: seed secondsLeft from output.auto_approve_seconds with a 60s fallback for older sessions. - Regenerate openapi.json with the new field. - Tests: 9 new unit tests covering the predicate, the auto-approve flow (idle / user-acted / errors swallowed) and _schedule_auto_approve. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-10 16:34:46 +02:00
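A minimal sketch of the server-side fallback from fb86fcb67d, assuming a simplified in-memory session. `Session`, `schedule_auto_approve`, and the "message count unchanged" predicate are hypothetical stand-ins; the real helper checks whether a user message has appeared, and enqueuing the follow-up copilot turn is omitted here.

```python
import asyncio
import logging
from dataclasses import dataclass, field
from typing import Callable, List, Set

logger = logging.getLogger(__name__)
APPROVAL_MESSAGE = "Approved. Please build the agent."


@dataclass
class Session:
    """Hypothetical stand-in for a chat session guarded by a lock."""
    messages: List[str] = field(default_factory=list)
    lock: asyncio.Lock = field(default_factory=asyncio.Lock)


async def append_message_if(
    session: Session, message: str, predicate: Callable[[Session], bool]
) -> bool:
    """Append inside the session lock only if the predicate still holds."""
    async with session.lock:
        if not predicate(session):
            return False
        session.messages.append(message)
        return True


# Strong references keep fire-and-forget tasks from being garbage collected.
_background_tasks: Set[asyncio.Task] = set()


def schedule_auto_approve(session: Session, delay: float, seen_count: int) -> asyncio.Task:
    """Sleep, then approve unless the session changed in the meantime."""
    async def _run() -> None:
        try:
            await asyncio.sleep(delay)
            await append_message_if(
                session, APPROVAL_MESSAGE, lambda s: len(s.messages) == seen_count
            )
        except Exception:
            # Errors are logged and swallowed so the timer never crashes the app.
            logger.exception("auto-approve failed")

    task = asyncio.create_task(_run())
    _background_tasks.add(task)
    task.add_done_callback(_background_tasks.discard)
    return task
```

Re-checking the predicate inside the lock is what makes the task a no-op when the user has already acted.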
anvyle	94f065a7e0	fix(frontend/copilot): remove setInitialPrompt conflict and reset edit mode on new message - Remove setInitialPrompt() from handleModify() — the inline editor is the sole editing UX; pre-filling the chat input simultaneously creates a conflicting interface where chat-input submission loses inline edits - Add useEffect to reset isEditing when showActions goes false (new message arrives while editing), preventing users from being stuck in edit mode with no way to submit Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-09 23:15:16 +02:00
anvyle	8d5e8a9e3f	fix(backend/copilot): add decompose_goal to ToolName Literal in permissions.py The ToolName Literal must stay in sync with TOOL_REGISTRY keys. Adds 'decompose_goal' to the platform tools section to fix CI test failures. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-09 23:09:14 +02:00
anvyle	02b972cfc4	fix(backend/copilot): regenerate openapi.json with TaskDecompositionResponse schema The API schema was missing DecompositionStepModel and TaskDecompositionResponse after the merge. Regenerated with export-api-schema and formatted with prettier. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-09 22:51:18 +02:00
anvyle	31ce418d5e	fix(backend/copilot): resolve merge conflict with dev branch in models.py Merge upstream dev changes (Graphiti memory responses) alongside the TaskDecompositionResponse added in this PR. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-09 22:44:02 +02:00
anvyle	70689ce326	fix(frontend/copilot): guard isPending flag on error and filter empty steps from approval - Prevent simultaneous pending + error state when output-error has null payload: isPending is now false when isError is true - Filter out steps with empty descriptions before building the approval message, preventing malformed input from reaching the LLM Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-09 22:40:39 +02:00
anvyle	9004a3ada1	fix(copilot): guard auto-approve against race condition when isLastMessage changes Add showActions to the auto-approve useEffect dependency array and condition. This prevents the approval from firing after isLastMessage becomes false (e.g. when a new message arrives just as the timer expires), closing the race condition flagged by Sentry. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-09 22:25:27 +02:00
anvyle	5e9cee524d	fix(copilot): address PR review comments on decompose_goal tool - Add TaskDecompositionResponse to ToolResponseUnion for OpenAPI codegen - Remove LLM-controllable require_approval param (hardcoded to True) - Validate each step is a dict before calling .get() - Validate step descriptions are non-empty - Validate action values against allowlist, coerce unknown to DEFAULT_ACTION - Align MAX_STEPS=8 with agent_generation_guide.md (was 10) - Add DEFAULT_ACTION constant; use enum in schema - Add model_validator to sync step_count with len(steps) - Fix handleModify: pre-fill chat input via setInitialPrompt instead of sending dangling message - Add approvedRef guard on handleModify to prevent double-clicks - Fix eslint-disable: rewrite auto-approve effect without dependency suppression - Fix hardcoded light-mode colors (bg-white, border-slate-200, text-zinc-800) → semantic tokens - Fix error card: render ToolErrorCard whenever isError=true, not only when output is present - Fix hint text: only show approve hint when requires_approval=true - Remove dead `action` prop from StepItem - Add aria-label to all StepStatusIcon states - Tighten parseOutput type guards (Array.isArray check, no false positives) - Rename isOperating → isPending for clarity - Add backend unit tests for DecomposeGoalTool (16 cases) - Add frontend unit tests for helpers.tsx (20 cases) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-09 22:23:11 +02:00
anvyle	b9d47a8cf5	fix(copilot): auto-size editable step textareas on initial render and input - Replace <input type="text"> with <textarea> for step descriptions - Use ref callback to set height from scrollHeight on every render so long descriptions wrap to multiple lines by default without interaction - Bump countdown ring container from 20px to 24px and text from 9px to 11px for better legibility Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-09 22:10:51 +02:00
anvyle	5fa33111de	feat(copilot): add auto-approve timer with editable steps to decompose_goal UI - Replace static Approve/Modify buttons with a 60s countdown timer that auto-approves when it expires - Timer ring animates inline within "Starting in [N]s" text using SVG strokeDasharray; hover on the text swaps it to "Start now" via Tailwind named groups (group/label) - Clicking Modify stops the timer, enters editable mode where steps can be renamed, deleted, or inserted between existing steps - In edit mode only Approve is shown; timer and Modify are hidden - showActions gated on isLastMessage (server-derived) so the timer never re-appears when returning to a session with prior messages - Forward isLastMessage through ChatMessagesContainer → MessagePartRenderer Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-09 21:50:43 +02:00
Zamil Majdy	87539c03a4	fix(frontend): unify copilot auth headers and propagate impersonation header (#12718 ) ### Why Admin user impersonation was silently broken for the copilot/autopilot chat feature. The SSE stream requests and message feedback requests made direct HTTP calls to the backend with only a Bearer token — missing the `X-Act-As-User-Id` header that the impersonation feature requires. This meant that when an admin impersonated a user and used copilot chat, messages were processed and feedback was recorded under the admin's identity, not the impersonated user's. The impersonation header was also read inconsistently: `custom-mutator.ts` accessed `sessionStorage` directly (breaking cross-tab impersonation), while other callers had no impersonation support at all. ### What - `src/lib/impersonation.ts`: Added `getSystemHeaders()` — a single function that returns all cross-cutting request headers, currently `X-Act-As-User-Id` when impersonation is active. Uses `ImpersonationState.get()` which handles both `sessionStorage` (same-tab) and cookie fallback (cross-tab). Added `IMPERSONATION_COOKIE_NAME` constant to `constants.ts` to replace the previously hardcoded local string. - `src/app/(platform)/copilot/helpers.ts`: Added `getCopilotAuthHeaders()` — combines `getWebSocketToken()` (JWT) with `getSystemHeaders()` (impersonation) into a single async call for direct backend requests. - `src/app/(platform)/copilot/useCopilotStream.ts`: Replaced local `getAuthHeaders()` (JWT only) with shared `getCopilotAuthHeaders()` in both `prepareSendMessagesRequest` and `prepareReconnectToStreamRequest`. - `src/app/(platform)/copilot/components/ChatMessagesContainer/useMessageFeedback.ts`: Switched from `getWebSocketToken()` to `getCopilotAuthHeaders()` for feedback POST requests. - `src/app/api/mutators/custom-mutator.ts`: Replaced raw `sessionStorage.getItem(IMPERSONATION_STORAGE_KEY)` with `getSystemHeaders()` (fixes cross-tab support for all generated API calls). 
- Tests: New unit tests for `getCopilotAuthHeaders` (4 cases), `customMutator` impersonation header propagation (2 cases), and `ImpersonationState`/`ImpersonationCookie`/`ImpersonationSession` (full coverage across 3 describe blocks, 18 cases). ### How it works `getSystemHeaders()` calls `ImpersonationState.get()` which reads `sessionStorage` first and falls back to the impersonation cookie when `sessionStorage` is empty (cross-tab scenario). The returned header map is spread into every outbound request, so a single update to `getSystemHeaders()` propagates to all callers automatically. `getCopilotAuthHeaders()` wraps both the JWT fetch and the impersonation header into one `async` call. Callers no longer need to know about impersonation — they just spread the returned headers into their fetch options. ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] As admin, impersonate a user and open copilot/autopilot chat — messages processed in the context of the impersonated user - [x] As admin, impersonate a user and submit feedback (upvote/downvote) — feedback recorded against the impersonated user - [x] Without impersonation active, copilot chat works normally - [x] Frontend unit tests pass: `pnpm test:unit`	2026-04-09 14:54:53 +00:00
Zamil Majdy	f112555fc3	feat(backend/copilot): hide session-level dry_run from LLM (#12711 ) ### Why During autopilot sessions with `dry_run=True`, the LLM was leaking awareness of simulation mode through three channels: 1. `dry_run` appeared as a required parameter in `RunBlockTool`'s schema, so the LLM could see and pass it. 2. `is_dry_run: true` appeared in the serialized MCP tool result JSON the LLM received, causing it to narrate that execution was simulated. 3. The `[DRY RUN]` prefix on response messages told the LLM explicitly that credentials were absent or execution was skipped. This broke the illusion of a seamless preview experience: users watching an autopilot dry-run would see the LLM comment on simulation rather than treating the run as real. ### What Backend: - `copilot/model.py`: `ChatSessionInfo.dry_run` is the single source of truth, stored in the `metadata` JSON column (no migration needed). Set at session creation; never changes. - `copilot/tools/run_block.py`: Removed `dry_run` from the tool schema and `_execute` params entirely. Block always reads `session.dry_run`. - `copilot/tools/run_agent.py`: Kept `dry_run` as an optional schema parameter (LLM may request a per-call test run in normal sessions), but `session.dry_run=True` unconditionally forces it True. Removed from `required`. - `copilot/tools/models.py`: `BlockOutputResponse.is_dry_run: bool | None = None`, so the field is absent from normal-run output (was always `false`). - `copilot/tools/base.py`: `model_dump_json(exclude_none=True)` omits `None` fields from serialized output, keeping payloads clean. - `copilot/sdk/tool_adapter.py`: `_strip_llm_fields` removes `is_dry_run` from MCP tool result JSON after stashing for the frontend SSE stream. Stripping is conditional on `session.dry_run`; in normal sessions `is_dry_run` remains visible so the LLM can reason about individual simulated calls. 
Extracted `_make_truncating_wrapper` (was `_truncating`) for direct unit testing. - `blocks/autopilot.py`: `dry_run` propagates from `execution_context.dry_run` so nested AutoPilot sessions inherit the parent's simulation mode. Frontend: - `useCopilotUIStore`: Added `isDryRun` / `setIsDryRun` state persisted to localStorage (`COPILOT_DRY_RUN` key). - `useChatSession`: Accepts `dryRun` option; creates session with `dry_run: true` when enabled; resets session when the toggle changes. - `DryRunToggleButton`: New UI control for toggling dry_run mode. - `RunAgent.tsx` / `helpers.tsx`: Added `AgentOutputResponse` type handling and `ExecutionStartedCard` rendering for the `agent_output` response type. - OpenAPI: `is_dry_run` on `BlockOutputResponse` changed to `boolean | null` (was `boolean`). ### How it works Three-layer defense: 1. Schema layer: `run_block` exposes no `dry_run` parameter. `run_agent` keeps it optional so the LLM can request test runs in normal sessions, but `session.dry_run` always wins. 2. Response layer: `is_dry_run: bool | None = None` + `exclude_none=True` means the field is absent from the serialized JSON in non-dry-run mode, so there is no leakage at rest. 3. Transport layer: When `session.dry_run=True`, `_strip_llm_fields` removes `is_dry_run` from the MCP result before the LLM sees it, while the stashed copy (for the frontend SSE stream) retains the full payload. Stash-before-strip ordering: `_make_truncating_wrapper` stashes the full tool output before calling `_strip_llm_fields`. This ensures `StreamToolOutputAvailable` events carry the complete payload, so the frontend's "Simulated" badge renders correctly, while the LLM only ever sees the stripped version. Session-level flag: `ChatSessionInfo.dry_run` is set at session creation and never changes. No LLM tool call can alter it. 
`_strip_llm_fields` fast path: Stripping is skipped when none of the `_STRIP_FROM_LLM` field names appear in the raw text (string scan before JSON parse), keeping the common non-dry-run path allocation-free. ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] `poetry run pytest backend/copilot/tools/test_dry_run.py`: all tests pass - [x] `poetry run pytest backend/copilot/sdk/tool_adapter_test.py`: all tests pass (including new `TestStripLlmFields` suite) - [x] Pre-commit hooks pass (Ruff, Black, isort, pyright, tsc, OpenAPI export + orval generate) - [x] Verify LLM tool result JSON for a dry_run session does not contain `is_dry_run` - [x] Verify frontend SSE stream still delivers `is_dry_run: true` for "Simulated" badge rendering	2026-04-09 14:46:04 +00:00
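The stash-before-strip ordering and the string-scan fast path from f112555fc3 might look roughly like this. It is a sketch under assumptions: the names `strip_llm_fields`, `wrap_tool_output`, and `STRIP_FROM_LLM` are illustrative, only top-level JSON keys are handled, and the real adapter also performs truncation.

```python
import json
from typing import List

# Illustrative counterpart of the _STRIP_FROM_LLM set named in the commit.
STRIP_FROM_LLM = ("is_dry_run",)


def strip_llm_fields(raw: str, session_dry_run: bool) -> str:
    """Remove LLM-hidden fields from a JSON tool result in dry-run sessions."""
    if not session_dry_run:
        return raw  # normal sessions keep is_dry_run visible to the LLM
    if not any(name in raw for name in STRIP_FROM_LLM):
        return raw  # fast path: cheap string scan, no JSON parse
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return raw  # not JSON: leave the payload untouched
    if not isinstance(data, dict):
        return raw
    for name in STRIP_FROM_LLM:
        data.pop(name, None)
    return json.dumps(data)


def wrap_tool_output(raw: str, session_dry_run: bool, stash: List[str]) -> str:
    """Stash the full payload for the frontend, then strip for the LLM."""
    stash.append(raw)
    return strip_llm_fields(raw, session_dry_run)
```

Because the stash happens first, the frontend copy still carries `is_dry_run` while the returned string does not.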
Nicholas Tindle	e68dadd2c9	feat(backend): add Graphiti temporal knowledge graph memory for CoPilot (#12720 ) ## Summary Add Graphiti temporal knowledge graph memory to CoPilot, giving AutoPilot persistent cross-session memory with entities, relationships, and temporal validity tracking. - 3 new CoPilot tools (`graphiti_store`, `graphiti_search`, `graphiti_delete_user_data`) as BaseTool implementations — automatically available in both SDK and baseline/fast modes via existing TOOL_REGISTRY bridge - FalkorDB as graph database backend with per-user physical isolation via `driver.clone(database=group_id)` - graphiti-core Python library for in-process knowledge graph operations (no separate MCP server needed) - MemoryEpisodeLog append-only replay table for migration safety - LaunchDarkly flag `graphiti-memory` for per-user rollout - OpenRouter for extraction LLM, direct OpenAI for embeddings ### Memory Quality - Episode body uses `"Speaker: content"` format matching graphiti's extraction prompt expectations - Only user messages ingested (Zep Cloud `ignore_roles` approach) — assistant responses excluded from graph - `custom_extraction_instructions` suppress meta-entity pollution (no more "assistant", "human", block names as entities) - `ep.content` attribute correctly surfaced in search results and warm context - Per-user asyncio.Queue serializes ingestion (graphiti-core requirement) ### Architecture Decision Custom BaseTool implementations over MCP — the existing `create_copilot_mcp_server()` in `tool_adapter.py` already wraps every BaseTool as MCP for the SDK path. One implementation serves both execution paths with zero extra infrastructure. 
## Test plan - [x] Set LaunchDarkly flag `graphiti-memory` to true for test user - [x] Verify FalkorDB is healthy: `docker compose up falkordb` - [x] S1: Send message with user facts ("my assistant is Sarah, CC her on client stuff, CRM is HubSpot") - [x] Verify agent calls `graphiti_store` to save memories - [x] S2 (new session): Ask "Who should I CC on outgoing client proposals?" - [x] Verify agent calls `graphiti_search` before answering - [x] Verify agent answers correctly from memory (Sarah) - [x] Verify graph entities are clean (no "assistant"/"human"/block names) - [x] Verify MemoryEpisodeLog has replay entries - [ ] Verify `GRAPHITI_MEMORY=false` in LaunchDarkly → tools return "not enabled" error 🤖 Generated with [Claude Code](https://claude.com/claude-code) Note (Medium Risk): Adds a new persistence layer and background ingestion flow for chat memory plus new dependencies/services (FalkorDB, `graphiti-core`) and prompt/tooling changes; rollout is gated by a LaunchDarkly flag but failures could impact chat latency or resource usage. Overview: Enables optional, per-user Graphiti temporal memory for CoPilot (gated by LaunchDarkly `graphiti-memory`), including warm-start recall on the first turn and background ingestion of user messages after each turn in both `baseline` and SDK chat paths. Adds Graphiti infrastructure: new `memory_search`/`memory_store` tools and response types, a per-user cached Graphiti client with safe `group_id` derivation, a FalkorDB driver tweak for full-text queries, and a serialized per-user ingestion queue with graceful failure/timeout handling. Introduces new runtime configuration and local dev support (`GRAPHITI_*` env vars, new `falkordb` docker service/volume), updates permissions/OpenAPI enums, and adds dependencies (`graphiti-core`, `falkordb`, `cachetools`) plus unit tests for the new modules. --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 13:56:52 +00:00
Zamil Majdy	d113687878	fix(copilot): P0 guardrails, transient retry, and security hardening (#12636 ) ### Why The copilot's Claude Code CLI integration had several production reliability gaps reported from live deployments: - No transient retry: 429 rate-limit errors, 5xx server errors, and ECONNRESET connection resets surfaced immediately as failures — there was no retry mechanism. - Subagent permission errors: CLI subprocesses wrote temp files to `/tmp/claude-0/` which was inaccessible inside E2B sandboxes, causing subagent spawning to report "agent completed" without actually running. - Missing security hardening in non-OpenRouter modes: Security env vars (`CLAUDE_CODE_DISABLE_CLAUDE_MDS`, `CLAUDE_CODE_SKIP_PROMPT_HISTORY`, `CLAUDE_CODE_DISABLE_AUTO_MEMORY`, `CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC`) were only applied in the OpenRouter path, leaving subscription and direct Anthropic modes unprotected in multi-tenant deployment. - No resource guardrails: No per-query budget cap, turn limit, or fallback model meant a single runaway query could burn unlimited tokens/spend. - Lossy transcript reconstruction: When no transcript file was available (storage failure or compaction drop), the old code injected a truncated plain-text summary that cut tool results at 500 chars and dropped `tool_use`/`tool_result` structural linkage, causing the LLM to lose conversation context. ### What - SDK guardrails (`config.py`, `sdk/service.py`): Added `fallback_model` (auto-failover on 529 overloaded), `max_turns=1000` (runaway prevention), `max_budget_usd=100.0` (per-query cost cap). All configurable via env-backed `ChatConfig` fields. - Transient retry (`sdk/service.py`, `constants.py`): Exponential backoff (1s, 2s, 4s) for 429/5xx/ECONNRESET errors, retried only when `events_yielded == 0` to avoid breaking partial streams. `_TRANSIENT_ERROR_PATTERNS` extended with status-code-specific patterns to avoid false positives. 
- Workspace isolation (`sdk/env.py`): `CLAUDE_CODE_TMPDIR` now set in all auth modes so CLI subprocesses write to the per-session workspace directory rather than `/tmp/`. - Security hardening (`sdk/env.py`): Security env vars applied uniformly across all three auth modes (subscription, direct Anthropic, OpenRouter) via restructured `build_sdk_env()`. - Transcript reconstruction (`sdk/service.py`): `_session_messages_to_transcript()` converts `ChatMessage.tool_calls` and `ChatMessage.tool_call_id` to proper `tool_use`/`tool_result` JSONL blocks for `--resume`, restoring full structural fidelity. - Model normalization refactor (`sdk/service.py`): `_resolve_fallback_model()` and `_normalize_model_name()` extracted to share prefix-stripping and dot→hyphen conversion logic between primary and fallback model resolution. ### How it works Transient retry: `_can_retry_transient()` checks the retry budget and returns the next backoff delay (or `None` when exhausted). Retries are gated on `events_yielded == 0` — if any events were already streamed to the client, we cannot retry without breaking the SSE stream mid-response. After all retries are exhausted, `FRIENDLY_TRANSIENT_MSG` is surfaced to the user. Transcript reconstruction: When `--resume` has no on-disk session file, `_session_messages_to_transcript()` builds a JSONL transcript from `session.messages`, emitting `tool_use` blocks for assistant tool calls and `tool_result` blocks (with matching IDs) for their results. This gives Claude CLI the same structural fidelity as an on-disk session — preserving tool call/result pairing that the old plain-text injection lost. `build_sdk_env()` restructure: The three auth modes now share a common "epilogue" block that applies workspace isolation and security hardening env vars regardless of which mode is active, eliminating the previous pattern of repeating `if sdk_cwd: env["CLAUDE_CODE_TMPDIR"] = sdk_cwd` in each branch. 
### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] 729 unit tests passing: `env_test.py`, `p0_guardrails_test.py`, `retry_scenarios_test.py` (incl. integration tests for both transient retry paths), `service_test.py`, `sdk_compat_test.py`, `response_adapter_test.py` - [x] E2E tested: live copilot session (API + UI), multi-turn, security env vars verified in all 3 auth modes, guardrail defaults confirmed - [x] `_session_messages_to_transcript()`: 7 unit tests covering empty input, tool_use blocks, tool_result blocks, no truncation (10K chars preserved), parent UUID chain, malformed argument handling	2026-04-09 21:10:39 +07:00
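The retry gate described in d113687878 can be illustrated with a small sketch. The function name echoes `_can_retry_transient()`, but the signature, the constants, and folding the `events_yielded` check into the same function are assumptions; only the 1s, 2s, 4s schedule and the "no events streamed yet" condition come from the commit message.

```python
from typing import Optional

MAX_TRANSIENT_RETRIES = 3   # assumed budget, yielding delays of 1s, 2s, 4s
BASE_DELAY_SECONDS = 1.0


def can_retry_transient(attempt: int, events_yielded: int) -> Optional[float]:
    """Return the next backoff delay in seconds, or None if no retry is allowed.

    `attempt` counts retries already performed (0 for the first failure).
    """
    if events_yielded > 0:
        # Part of the response was already streamed; a retry would
        # break the SSE stream mid-response.
        return None
    if attempt >= MAX_TRANSIENT_RETRIES:
        return None  # retry budget exhausted
    return BASE_DELAY_SECONDS * (2 ** attempt)
```

A caller would sleep for the returned delay and re-issue the query, surfacing a friendly error message once `None` comes back.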
anvyle	aca81f3e40	Merge branch 'dev' of https://github.com/Significant-Gravitas/AutoGPT into feat/task-decomposition-copilot	2026-04-09 12:27:23 +02:00
anvyle	629fb4d3bb	fix(copilot): allow sub-instructions companion text and restore streaming render - Revert ChatMessagesContainer streaming filter — decompose_goal now visible during stream - Remove text suppression in splitReasoningAndResponse — table message is allowed alongside sub-instructions box Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-09 12:27:05 +02:00
anvyle	703d34364d	chore(frontend): update openapi.json snapshot Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 23:37:25 +02:00
anvyle	f330699a89	fix(copilot): improve decompose_goal UX — pin box post-stream, suppress companion text - Move decomposition prompt from prompting.py to agent_generation_guide.md as a required pre-build gate - Add tool-decompose_goal to CUSTOM_TOOL_TYPES so it renders individually (not collapsed) - Add task_decomposition to INTERACTIVE_RESPONSE_TYPES so the box is pinned to response after streaming - Filter out text parts (table) from response when decompose_goal is pinned - Hide decompose_goal during streaming so the box only appears once all reasoning is complete and Approve is immediately actionable Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-08 23:36:17 +02:00
Otto	7acfdf5974	docs(skill): add coverage guidance to pr-address skill (#12695 ) Requested by @majdyz ## Why As we enforce patch coverage targets via Codecov (see #12694), the `pr-address` skill needs to guide agents to verify test coverage when they write new code while addressing review comments. Without this, an agent could address a comment by adding untested code and create a new CI failure to fix. ## What Adds a Coverage section to `.claude/skills/pr-address/SKILL.md` with: - The `pytest --cov` command to check coverage locally on changed files - Clear rules: new code needs tests, don't remove existing tests, clean up dead test code when deleting code ## Impact Agents using `/pr-address` will now run coverage checks as part of their workflow and won't land untested new code. Linear: SECRT-2217 Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>	2026-04-08 17:05:54 +00:00
Zamil Majdy	ef477ae4b9	fix(backend): convert AttributeError to ValueError in _generate_schema (#12714 ) ## Why `POST /api/graphs` was returning 500 when an agent graph contained an Agent Input block without a `name` field. Root cause: `GraphModel._generate_schema` calls `model_construct(input_default)` (which skips Pydantic validation) to build a list of field objects. If `input_default` doesn't include `name`, the constructed `Input` object has no `name` attribute. The subsequent dict comprehension (`p.name: {...}`) then raises `AttributeError`, which is not handled and falls through to the generic `Exception → 500` catch-all in `rest_api.py`. The `ValueError → 400` handler already exists but is never reached. ## What - In `_generate_schema`, wrap the `return {…}` block in `try/except AttributeError` and re-raise as `ValueError`. - Added a unit test that directly exercises `GraphModel._generate_schema` with a nameless `AgentInputBlock.Input` and asserts `ValueError` is raised. ## How `rest_api.py` already has: ```python app.add_exception_handler(ValueError, handle_internal_http_error(400)) ``` The only change needed was to ensure `AttributeError` gets converted before it propagates. The fix is a single `try/except` block: no new exception types, no new handlers. Note: In Pydantic v2, `ValidationError` is a subclass of `ValueError`, but exception handlers are resolved by the most specific matching class, so the existing separate handler for `pydantic.ValidationError` still takes precedence and is unrelated to this fix. ## Checklist - [x] My changes follow the project coding style - [x] I've written/updated tests for the changes - [x] Tests pass locally (`poetry run pytest backend/data/graph_test.py::test_generate_schema_raises_value_error_when_name_missing`) autogpt-platform-beta-v0.6.54	2026-04-09 00:05:01 +07:00
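The conversion in ef477ae4b9 boils down to the pattern below. This is a standalone sketch: `generate_schema` and the `SimpleNamespace` field objects are hypothetical stand-ins for `GraphModel._generate_schema` and objects built with `model_construct`, which may lack a `name` attribute.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable


def generate_schema(fields: Iterable[Any]) -> Dict[str, Any]:
    """Build a schema dict, turning a missing `name` into a ValueError."""
    try:
        return {
            "type": "object",
            "properties": {p.name: {"title": p.name} for p in fields},
        }
    except AttributeError as exc:
        # Re-raise as ValueError so an existing ValueError -> 400 handler
        # applies instead of a generic 500.
        raise ValueError(f"Invalid input field: {exc}") from exc
```

Calling `generate_schema([SimpleNamespace()])` raises `ValueError`, while a field with a `name` produces a normal schema entry.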
Zamil Majdy	705bd27930	fix(backend): wrap PlatformCostLog metadata in SafeJson to fix silent DataError (#12713 ) ## Changes - Wrap `metadata` field in `SafeJson()` when calling `PrismaLog.prisma().create()` in `log_platform_cost` - Add `platform_cost_integration_test.py` with DB round-trip tests for the fix ## Why `PrismaLog.prisma().create()` was silently failing with a `DataError` because passing a plain Python `dict` to a `Json?`-typed Prisma field is not allowed: ``` DataError: Invalid argument type. `metadata` should be of type NullableJsonNullValueInput or Json ``` The error was swallowed silently by `logger.exception` in the background task, so no rows ever landed in `PlatformCostLog` — which is why the dev admin cost dashboard showed no data after #12696 was merged. ## How Wrap `entry.metadata` in `SafeJson()` (already used throughout the codebase, lives in `backend/util/json.py`) before passing it to the Prisma create call. `SafeJson` extends `prisma.Json`, sanitizes PostgreSQL-incompatible control characters, and handles Pydantic-model conversion. Add two integration tests in `platform_cost_integration_test.py` (following the `credit_integration_test.py` pattern) that write a record to a real DB and read it back — confirming both metadata round-trip and NULL metadata work correctly. ## Test plan - [x] Integration tests verify metadata persists/reads correctly via Prisma - [x] Unit tests updated: `isinstance(data["metadata"], Json)` confirms the field is wrapped - [x] Verified on dev executor pod: cost rows now appear in the admin dashboard after fix	2026-04-08 23:59:06 +07:00
Zamil Majdy	fa6ea36488	fix(backend): make User RPC model forward-compatible during rolling deploys (#12707 ) ## Why A Sentry `AttributeError: 'dict' object has no attribute 'timezone'` was traced to the scheduler accessing `user.timezone` on a value that was a raw `dict` instead of a typed `User` model. Root cause (two-part): 1. `User.model_config` had `extra='forbid'`. During a rolling deploy, the database manager (newer pod) can return fields that the client (older pod) doesn't yet know about. `extra='forbid'` caused `TypeAdapter(User).validate_python()` to raise `ValidationError` on those unknown fields. 2. `DynamicClient._get_return` had a silent `try/except` that swallowed the `ValidationError` and fell back to returning the raw `dict`. The scheduler then received a `dict` and crashed on `.timezone`. ## What - `backend/data/model.py`: Change `User.model_config` `extra='forbid'` → `extra='ignore'`. Unknown fields from a newer database manager are silently dropped, making the RPC layer forward-compatible during rolling deploys. This is the primary fix. - `backend/util/service.py`: Restore the `try/except` fallback in `_get_return`, but make it observable: log the full error message at `WARNING` (so ValidationError details — field name, value — appear in logs) and call `sentry_sdk.capture_exception(e)` so every fallback is tracked and alerted without crashing the caller. The raw result is still returned as before (continuity). - `backend/util/service_test.py`: Add `TestGetReturn` with two direct unit tests: valid dict (including an unknown future field) → typed `User` returned; invalid dict (missing required fields) → fallback returns raw dict (no crash). Uses a typed `_SupportsGetReturn` Protocol + `cast` instead of `# type: ignore` suppressors. - `backend/executor/utils_test.py`: Fix misleading docstring; move inner imports to module top level per code style. 
## How `extra='ignore'` is the standard Pydantic pattern for forward-compatible models at service boundaries. It means a rolling deploy where the DB manager has a new column will not break older client pods — the extra field is simply dropped on deserialization. The restored `_get_return` fallback preserves continuity (callers don't crash) while the `logger.warning` + `sentry_sdk.capture_exception` ensure no schema mismatch goes undetected. Silent degradation is replaced by observable degradation. ## Checklist - [x] Changes are backward-compatible (unknown fields ignored, not rejected) - [x] Regression tests added for `_get_return` typed deserialization contract - [x] Fallback preserved with observable logging and Sentry capture (no silent degradation) - [x] `extra='ignore'` is consistent with forward-compatibility requirements at service boundaries - [x] No `# type: ignore` suppressors introduced	2026-04-08 23:49:30 +07:00
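The two behaviours described in #12707 can be sketched with the standard library alone, as a stand-in for Pydantic: unknown fields are dropped instead of rejected, and a failed conversion logs a warning and returns the raw dict. `UserSketch`, `parse_user`, and `get_return` are hypothetical names for illustration. The real code uses `TypeAdapter(User).validate_python()` and also calls `sentry_sdk.capture_exception`, which is omitted here.

```python
from __future__ import annotations

import logging
from dataclasses import dataclass, fields
from typing import Any

logger = logging.getLogger("rpc_sketch")


@dataclass
class UserSketch:
    # Hypothetical stand-in for the real User model.
    id: str
    timezone: str


def parse_user(raw: dict[str, Any]) -> UserSketch:
    # Mimics extra='ignore': keep known fields, silently drop the rest.
    known = {f.name for f in fields(UserSketch)}
    return UserSketch(**{k: v for k, v in raw.items() if k in known})


def get_return(raw: dict[str, Any]) -> UserSketch | dict[str, Any]:
    # Observable fallback: log the failure, then return the raw value
    # so the caller keeps running (the real code also reports to Sentry).
    try:
        return parse_user(raw)
    except TypeError as e:  # raised when a required field is missing
        logger.warning("Failed to convert RPC result to UserSketch: %s", e)
        return raw
```

With this shape, a payload from a newer pod that carries an extra field still yields a typed object, and only a payload that is actually malformed takes the logged fallback path.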
Zamil Majdy	cab061a12d	fix(frontend): suppress Sentry noise from expected 401s in OnboardingProvider (#12708 ) ## Why `OnboardingProvider` was generating a Sentry alert (BUILDER-7ME: "Authorization header is missing") on every behave test run. The root cause: when a user's session expires mid-flow, they get redirected to `/login`. The provider remounts on the login page, calls `getV1CheckIfOnboardingIsCompleted()` while unauthenticated, and the 401 falls into the catch block which calls `console.error`. Sentry's `captureConsole` integration auto-captures all `console.error` calls as events, triggering the alert. This is expected behavior — the auth middleware handles the redirect, there's nothing broken. It was just noisy. ## What - In `OnboardingProvider`'s `initializeOnboarding` catch block, return early and silently on `ApiError` with status 401 — no `console.error`, no toast - Only unexpected errors (non-401) still surface via `console.error` and the destructive toast ## How ```ts } catch (error) { if (error instanceof ApiError && error.status === 401) { return; } // ... existing error handling } ``` ## Checklist - [x] `pnpm format && pnpm lint && pnpm types` pass - [x] Change is minimal and scoped to the one catch block - [x] No new test needed — this is a logging/noise fix, not a behavioral change	2026-04-08 23:40:49 +07:00
Zamil Majdy	6552d9bfdd	fix(backend/executor): OrchestratorBlock dry-run credentials + Responses API status field (#12709 ) ## Why Two bugs block OrchestratorBlock from working correctly: 1. Dry-run always fails with "credentials required" even when `OPEN_ROUTER_API_KEY` is set on dev. The n8n conversion dry-run hits this. 2. Agent-mode OrchestratorBlock fails on the second LLM call with `Error code: 400 – Unknown parameter: 'input[2].status'` when using OpenAI models (Responses API path). ## What Bug 1, manager.py credential null (`backend/executor/manager.py`): The dry-run path called `input_data[field_name] = None` to "clear" the credential slot, but `_execute` in `_base.py` filters out `None` values before calling `input_schema(...)`. This drops the required `credentials` field from the schema constructor, causing a Pydantic validation error. Fix: Don't null out the field. If the user already has credential metadata in `input_data` (normal case), leave it intact. If not (no credentials configured), synthesise a minimal `CredentialsMetaInput`-compatible placeholder from the platform credentials so schema construction passes. The actual `APIKeyCredentials` (platform key) is still injected via `extra_exec_kwargs`. Bug 2, Responses API `status` field (`backend/blocks/orchestrator.py`): OpenAI returns output items (function calls, messages) with a `status: "completed"` field. When `_convert_raw_response_to_dict` serialises these items and they are stored in `conversation_history`, they are sent back as input on the next call, but OpenAI rejects `status` on input items. Fix: Strip `status` from each output item before it enters the history. ## How - `manager.py` lines 311-314: removed the `input_data[field_name] = None` nullification; added a conditional placeholder when no credential metadata is present. - `orchestrator.py` `_convert_raw_response_to_dict`: filter `k != "status"` when extracting Responses API output items. - Tests added for both fixes. 
## Checklist - [x] Tests written and passing (94 total, all green) - [x] Pre-commit hooks passed (Black, Ruff, isort, typecheck) - [x] No out-of-scope changes	2026-04-08 23:40:08 +07:00
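The `status` filter from #12709 can be sketched as follows. This is a minimal illustration that assumes output items are plain dicts; `strip_status` is a hypothetical helper name, not the function in `orchestrator.py`.

```python
from typing import Any


def strip_status(output_items: list[dict[str, Any]]) -> list[dict[str, Any]]:
    # Drop the "status" key from every item so the items can be replayed
    # as input on the next call; all other keys are kept unchanged.
    return [
        {k: v for k, v in item.items() if k != "status"}
        for item in output_items
    ]
```

The helper returns new dicts rather than mutating the originals, so the raw response stays available for logging.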
anvyle	5bb919e7b5	feat(copilot): add task decomposition for agent building Add a decompose_goal tool that breaks user goals into sub-instructions before building. Users see a plan checklist and can approve or modify before the agent is created, improving transparency and control. - Backend: DecomposeGoalTool, TaskDecompositionResponse model, system prompt update - Frontend: DecomposeGoal component with StepItem checklist, approve/modify buttons Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-08 14:33:49 +02:00
Zamil Majdy	ff8cdda4e8	feat(platform/admin): cost tracking for system credentials (#12696 ) ## Why When system-managed credentials are used (AutoGPT pays the API bills), there was no visibility into which providers were being called, how much each costs, or which users were driving usage. This makes it impossible to set appropriate per-user limits or reconcile expenses with actual API invoices. ## What End-to-end platform cost tracking for all 22 system-credential providers + both copilot modes: - Every block execution that uses system credentials records a `PlatformCostLog` row (provider, cost, tokens, user, execution IDs) - Copilot turns (SDK + baseline) are tracked with model name, token counts, and actual USD cost - Admin dashboard at `/admin/platform-costs` shows cost breakdown by provider and user with date/provider/user filters and paginated raw logs - Admin API endpoints with 30s TTL cache: `GET /platform-costs/dashboard` and `GET /platform-costs/logs` ## How ### Core hook `cost_tracking.py` calls `log_system_credential_cost()` after each block node execution. It reads `NodeExecutionStats.provider_cost` (set by `merge_stats()` inside each block) and dispatches a fire-and-forget `INSERT` via `log_platform_cost_safe()`. ### Per-block tracking Each block calls `self.merge_stats(NodeExecutionStats(provider_cost=..., provider_cost_type=...))`: \| Tracking type \| Providers \| Amount \| \|---\|---\|---\| \| `cost_usd` \| OpenRouter, Exa \| Actual USD from API response \| \| `tokens` \| OpenAI, Anthropic, Groq, Ollama, Jina \| Token count from response.usage \| \| `characters` \| Unreal Speech, ElevenLabs, D-ID \| Input text length \| \| `sandbox_seconds` \| E2B \| Walltime \| \| `walltime_seconds` \| FAL, Revid, Replicate \| Walltime \| \| `per_run` \| Google Maps, Apollo, SmartLead, etc. 
\| 1 per execution \| OpenRouter cost: extracted via `with_raw_response.create()` and `raw.headers.get("x-total-cost")` with `math.isfinite` + `>= 0` validation (replaces private `_response` access). ### Copilot tracking `token_tracking.py` writes a `PlatformCostLog` row per copilot LLM turn via an async fire-and-forget queue bounded by a `Semaphore(50)`. SDK path uses `sdk_msg.total_cost_usd`; baseline path uses the `x-total-cost` header from OpenRouter streaming responses. ### Executor drain `drain_pending_cost_logs()` is called before `executor.shutdown()` using a module-level loop registry (`_active_node_execution_loops`) so that pending log tasks from each worker thread's event loop are awaited before the process exits. Tasks are filtered by `task.get_loop() is current_loop` to avoid cross-loop `RuntimeError` in Python ≥ 3.10. ### CoPilot executor lifecycle Worker threads connect Prisma on startup and disconnect on cleanup (even on failure). If `db.connect()` fails during `@func_retry`, the event loop is stopped and joined before re-raising so no loop is leaked across retry attempts. ### Schema ```prisma model PlatformCostLog { id String @id @default(uuid()) createdAt DateTime @default(now()) userId String? graphExecId String? nodeExecId String? blockName String provider String trackingType String costMicrodollars BigInt @default(0) inputTokens Int? outputTokens Int? duration Float? model String? } ``` ### Admin dashboard React page with three tabs (By Provider / By User / Raw Logs) driven by two generated Orval hooks (`useGetV2GetPlatformCostDashboard`, `useGetV2GetPlatformCostLogs`). Filters are URL-based (`searchParams`) for bookmarkability. Pagination for raw logs. Per-provider estimated totals using configurable cost-per-unit multipliers. 
## Test plan - [x] Migration applies cleanly - [x] Block execution with system credentials creates PlatformCostLog row - [x] Copilot conversation records cost log with tokens + model - [x] `/admin/platform-costs` dashboard renders with correct data - [x] Date/provider/user filters work correctly - [x] Non-admin users get 403 on cost endpoints - [x] Executor drain completes before process exit (no lost logs) --------- Co-authored-by: Zamil Majdy <majdyz@users.noreply.github.com> Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>	2026-04-08 10:05:33 +00:00
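The `x-total-cost` validation described above (`math.isfinite` plus `>= 0`) might look roughly like this. `parse_total_cost` and `to_microdollars` are hypothetical helpers. The conversion assumes that `costMicrodollars` stores USD multiplied by 1,000,000, which the column name suggests but the PR text does not state.

```python
from __future__ import annotations

import math


def parse_total_cost(header_value: str | None) -> float | None:
    # Return the cost in USD, or None when the header is missing,
    # unparseable, non-finite (nan/inf), or negative.
    if header_value is None:
        return None
    try:
        cost = float(header_value)
    except ValueError:
        return None
    if not math.isfinite(cost) or cost < 0:
        return None
    return cost


def to_microdollars(cost_usd: float) -> int:
    # Assumption: 1 USD == 1_000_000 microdollars, stored as an integer.
    return round(cost_usd * 1_000_000)
```

Rejecting `nan` and `inf` explicitly matters because `float()` parses both strings without raising an error.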
Zamil Majdy	c51097d8ac	dx(orchestrate): harden agent fleet scripts — idle detection, pagination, fake-resolution guard, parallelism (#12704 ) ### Why / What / How Why: A series of production failures exposed gaps in the agent fleet tooling: 1. Agents using `_wait_idle`/`wait_for_claude_idle` would time out waiting for `❯` while a settings-error dialog blocked progress — because the dialog can appear above the last 3 captured lines. 2. The run-loop's adaptive backoff used `POLL_CURRENT * 3 / 2` which stalls at 1 forever in bash integer arithmetic, and printed the interval before recomputing it. 3. `pr-address` agents were silently missing review threads when a PR had >100 threads across multiple pages — they'd stop at page 1, address 69/111 threads, and falsely report "done". 4. `resolveReviewThread` was being called without a committed fix — producing false "0 unresolved" signals that bypassed verification. 5. The onboarding bypass in `/pr-test` had no timeout on curl calls, so the step could hang forever if the backend wasn't ready yet. 6. The orchestrator's own verification query used `first: 1` which can't reliably count unresolved threads across all pages. 
What: - Idle detection hardened in both `spawn-agent.sh` and `run-loop.sh` — full-pane check for 'Enter to confirm' so the dialog is never missed - Adaptive backoff arithmetic fixed (`POLL_CURRENT + POLL_CURRENT/2 + 1` always increments); log ordering corrected; `POLL_IDLE_MAX` made env-configurable - `pr-address/SKILL.md`: mandatory cursor-pagination loop collecting ALL thread IDs before addressing anything; prominent ⚠️ warning with the PR #12636 incident (142 threads, 2 pages, agent stopped at 69) - `pr-address/SKILL.md`: new "Parallel thread resolution" section — batch by file, one commit per file group, concurrent reply subshells with 3s gaps, sequential resolves - `pr-address/SKILL.md`: "Verify actual count" section now uses paginated loop (not single first:100 query) - `orchestrate/SKILL.md`: verification query fixed to paginate all pages; new "Thread resolution integrity" section with anti-patterns; fake-resolution detection query; state-staleness recovery; RUNNING-count confusion explained - `/pr-test` onboarding bypass: `--max-time 30` on curl calls; hard-fail on bypass failure How: All changes are to DX skill files and orchestration scripts — no production code modified. Each fix is a separate commit so the change history is readable. 
### Changes 🏗️ Scripts: - `run-loop.sh`: `wait_for_claude_idle` — add 'Enter to confirm' dialog check (reset elapsed on dialog); fix backoff arithmetic stall; fix log ordering; make `POLL_IDLE_MAX` env-configurable; reset poll interval when `waiting_approval` agents present - `spawn-agent.sh`: `_wait_idle` — capture full pane (not just `tail -3`) for 'Enter to confirm' check; wait-for-idle before sending agent objective to prevent stuck pasted-text SKILL.md files: - `pr-address/SKILL.md`: - ⚠️ WARNING + totalCount step + cursor-pagination loop before addressing any threads - "Parallel thread resolution" section: group by file, batch commits, concurrent replies, sequential resolves - "Verify actual count" section: full paginated loop instead of single first:100 query - "What counts as a valid resolution" with explicit anti-patterns (Acknowledged, Accepted, no-commit resolves) - Rate limits table (403 secondary vs 429 primary), 2-3 min recovery - `git rev-parse HEAD` pattern with `${FULL_SHA:0:9}` short SHA - `orchestrate/SKILL.md`: - Thread resolution integrity section + fake-resolution detection query - Verification query fixed to paginate all pages - State file staleness recovery (stale `loop_window`, closed windows, repair recipes) - RUNNING count confusion: explains `waiting_approval` included in regex - Idle check before re-briefing agents - `pr-test/SKILL.md`: - `--max-time 30` on onboarding bypass curl calls - Hard-fail (`exit 1`) if bypass verification fails ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Verified adaptive backoff increments correctly (no longer stalls at 1) - [x] Verified 'Enter to confirm' dialog handled in both wait functions - [x] Verified pagination loop collects all thread IDs across pages - [x] Verified PR #12636 onboarding bypass works end-to-end (11/11 scenarios PASS) --------- 
Co-authored-by: Zamil Majdy <majdy.zamil@gmail.com>	2026-04-08 17:11:55 +07:00
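The backoff stall fixed in #12704 is easy to reproduce with integer arithmetic. The sketch below mirrors the two bash expressions in Python using `//`. Capping the result at a maximum is an assumption about how `POLL_IDLE_MAX` is applied, and the function names are hypothetical.

```python
def old_backoff(poll: int) -> int:
    # Mirrors POLL_CURRENT * 3 / 2 in bash: integer division truncates,
    # so a value of 1 maps back to 1 and never grows.
    return poll * 3 // 2


def new_backoff(poll: int, poll_max: int) -> int:
    # Mirrors POLL_CURRENT + POLL_CURRENT/2 + 1: the "+ 1" guarantees
    # progress. The cap at poll_max is an assumption for this sketch.
    return min(poll + poll // 2 + 1, poll_max)
```

Starting from 1, the old formula returns 1 on every step, while the new one strictly increases until it reaches the cap.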
Zamil Majdy	f3306d9211	Merge branch 'master' of github.com:Significant-Gravitas/AutoGPT into dev	2026-04-08 16:17:09 +07:00
Zamil Majdy	f5e2eccda7	dx(orchestrate): fix stale-review gate and add pr-test evaluation rules to SKILL.md (#12701 ) ## Changes ### verify-complete.sh - CHANGES_REQUESTED reviews are now compared against the latest commit timestamp. If the review was submitted before the latest commit, it is treated as stale and does not block verification. - Added fail-closed guard: if the `gh pr view` fetch fails, the script exits 1 (rather than treating missing data as "no blocking reviews") - Fixed edge case: a `CHANGES_REQUESTED` review with a null `submittedAt` is now counted as fresh/blocking (previously silently skipped) - Combined two separate `gh pr view` calls into one (`--json commits,reviews`) to reduce API calls and ensure consistency ### SKILL.md (orchestrate skill) - Added `### /pr-test result evaluation` section with explicit pass/partial/fail handling table - PARTIAL on any headline feature scenario = immediate blocker: re-brief the agent, fix, and re-run from scratch. Never approve or output ORCHESTRATOR:DONE with a PARTIAL headline result. - Concrete incident callout: PR #12699 S5 (Apply suggestions) was PARTIAL — AI never output JSON action blocks — but was nearly approved. This rule prevents recurrence. - Updated `verify-complete.sh` description throughout to include "no fresh CHANGES_REQUESTED" - Added staleness rule documentation: a review only blocks if submitted after the latest commit ## Why Two separate incidents prompted these changes: 1. verify-complete.sh false positive: An automated bot (autogpt-pr-reviewer) submitted a `CHANGES_REQUESTED` review in April. An agent then pushed fixing commits. The old script still blocked on the stale review, preventing the PR from being verified as done. 2. Missed PARTIAL signal: PR #12699 had a PARTIAL result on its headline scenario (S5 Apply button) because the AI emitted direct builder tool calls instead of JSON action blocks. The orchestrator nearly approved it. 
The new SKILL.md rule makes PARTIAL = blocker explicit. ## Checklist - [x] I have read the contribution guide - [x] My changes follow the code style of this project - [x] Changes are limited to the scope of this PR (< 20% unrelated changes) - [x] All new and existing tests pass	2026-04-08 08:58:42 +07:00
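The staleness rule in `verify-complete.sh` can be expressed as a small filter. This is a Python sketch, not the script itself. It assumes each review is a dict with `state` and `submittedAt` keys in ISO 8601 form, and it treats a review submitted at exactly the commit time as stale, which the PR text does not specify.

```python
from __future__ import annotations

from datetime import datetime
from typing import Any


def parse_ts(ts: str) -> datetime:
    # Accept the trailing "Z" form on Python versions before 3.11.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def blocking_reviews(
    reviews: list[dict[str, Any]], latest_commit_at: str
) -> list[dict[str, Any]]:
    latest = parse_ts(latest_commit_at)
    blocking = []
    for review in reviews:
        if review.get("state") != "CHANGES_REQUESTED":
            continue
        submitted = review.get("submittedAt")
        # A null submittedAt counts as fresh (blocking); otherwise the
        # review blocks only if it was submitted after the latest commit.
        if submitted is None or parse_ts(submitted) > latest:
            blocking.append(review)
    return blocking
```

The fail-closed guard from the PR would sit outside this function: if fetching the reviews fails, the caller should exit non-zero rather than pass an empty list.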
Zamil Majdy	58b230ff5a	dx: add /orchestrate skill — Claude Code agent fleet supervisor with spare worktree lifecycle (#12691 ) ### Why When running multiple Claude Code agents in parallel worktrees, they frequently get stuck: an agent exits and sits at a shell prompt, freezes mid-task, or waits on an approval prompt with no human watching. Fixing this currently requires manually checking each tmux window. ### What Adds a `/orchestrate` skill — a meta-agent supervisor that manages a fleet of Claude Code agents across tmux windows and spare worktrees. It auto-discovers available worktrees, spawns agents, monitors them, kicks idle/stuck ones, auto-approves safe confirmations, and recycles worktrees on completion. ### How to use Prerequisites: - One tmux session already running (the skill adds windows to it; it does not create a new session) - Spare worktrees on `spare/N` branches (e.g. `AutoGPT3` on `spare/3`, `AutoGPT7` on `spare/7`) Basic workflow: ``` /orchestrate capacity → see how many spare worktrees are free /orchestrate start → enter task list, agents spawn automatically /orchestrate status → check what's running /orchestrate add → add one more task to the next free worktree /orchestrate stop → mark inactive (agents finish current work) /orchestrate poll → one manual poll cycle (debug / on-demand) ``` Worktree lifecycle: ```text spare/N branch → /orchestrate add → new window + feat/branch + claude running ↓ ORCHESTRATOR:DONE ↓ kill window + git checkout spare/N ↓ spare/N (free again) ``` Windows are always capped by worktree count — no creep. 
### Changes - `.claude/skills/orchestrate/SKILL.md` — skill definition with 5 subcommands, state file schema, spawn/recycle helpers, approval policy - `.claude/skills/orchestrate/scripts/classify-pane.sh` — pane state classifier: `idle` (shell foreground), `running` (non-shell), `waiting_approval` (pattern match), `complete` (ORCHESTRATOR:DONE) - `.claude/skills/orchestrate/scripts/poll-cycle.sh` — poll loop: reads/updates state file atomically, outputs JSON action list, stuck detection via output-hash sampling State detection: \| State \| Detection method \| \|---\|---\| \| `idle` \| `pane_current_command` is a shell (zsh/bash/fish) \| \| `running` \| `pane_current_command` is non-shell (claude/node) \| \| `stuck` \| pane hash unchanged for N consecutive polls \| \| `waiting_approval` \| pattern match on last 40 lines of pane output \| \| `complete` \| `ORCHESTRATOR:DONE` string present in pane output \| Safety policy for auto-approvals: git ops, package installs, tests, docker compose → approve. `rm -rf` outside worktree, force push, `sudo`, secrets → escalate to user. State file lives at `~/.claude/orchestrator-state.json` (outside repo, never committed). 
### Checklist #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] `classify-pane.sh`: idle shell → `idle`, running process → `running`, `ORCHESTRATOR:DONE` → `complete`, approval prompt → `waiting_approval`, nonexistent window → `error` - [x] `poll-cycle.sh`: inactive state → `[]`, empty agents array → `[]`, spare worktree discovery, stuck detection (3-poll hash cycle) - [x] Real agent spawn in `autogpt1` tmux session — agent ran, output `ORCHESTRATOR:DONE`, recycle verified - [x] Upfront JSON validation before `set -e`-guarded jq reads - [x] Idle timer reset only on `idle → running` transition (not stuck), preventing false stuck-detections - [x] Classify fallback only triggers when output is empty (no double-JSON on classify exit 1)	2026-04-08 00:18:32 +07:00
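Stuck detection by output-hash sampling, as listed in the state table above, can be sketched like this. `StuckDetector` is a hypothetical class and SHA-256 is an arbitrary choice of hash. The real `poll-cycle.sh` is a shell script and may count polls differently.

```python
from __future__ import annotations

import hashlib


class StuckDetector:
    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self._last_hash: str | None = None
        self._unchanged = 0

    def observe(self, pane_output: str) -> bool:
        # Hash the captured pane text; identical hashes on consecutive
        # polls mean the agent produced no new output.
        digest = hashlib.sha256(pane_output.encode("utf-8")).hexdigest()
        if digest == self._last_hash:
            self._unchanged += 1
        else:
            self._last_hash = digest
            self._unchanged = 1
        # Report stuck once the same hash was seen `threshold` times in a row.
        return self._unchanged >= self.threshold
```

Any change in the pane output resets the counter, so a slow but active agent is not flagged.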
Krzysztof Czerwinski	67bdef13e7	feat(platform): load copilot messages from newest first with cursor-based pagination (#12328 ) Copilot chat sessions with long histories loaded all messages at once, causing slow initial loads. This PR adds cursor-based pagination so only the most recent messages load initially, with older messages fetched on demand as the user scrolls up. ### Changes 🏗️ Backend: - Cursor-based pagination on `GET /sessions/{session_id}` (`limit`, `before_sequence` params) - `user_id` relation filter on the paginated query — ownership check and message fetch now run in parallel - Backward boundary expansion to keep tool-call / assistant message pairs intact at page edges - Unit tests for paginated queries Frontend: - `useLoadMoreMessages` hook + `LoadMoreSentinel` (IntersectionObserver) for infinite scroll upward - `ScrollPreserver` to maintain scroll position when older messages are prepended - Session-keyed `Conversation` remount with one-frame opacity hide to eliminate scroll flash on switch - Scrollbar moved to the correct scroll container; loading spinner no longer causes overflow ### Checklist 📋 - [x] Pagination: only recent messages load initially; older pages load on scroll-up - [x] Scroll position preserved on prepend; no flash on session switch - [x] Tool-call boundary pairs stay intact across page edges - [x] Stream reconnection still works on initial load --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>	2026-04-07 12:43:47 +00:00
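The newest-first cursor pagination in #12328 can be illustrated on an in-memory list. The parameter names `limit` and `before_sequence` come from the PR. `fetch_page`, the `sequence` key, and the returned cursor are assumptions for this sketch, and the backward boundary expansion for tool-call pairs is not modelled.

```python
from __future__ import annotations

from typing import Any


def fetch_page(
    messages: list[dict[str, Any]],
    limit: int,
    before_sequence: int | None = None,
) -> tuple[list[dict[str, Any]], int | None]:
    # `messages` is assumed to be sorted ascending by "sequence".
    eligible = [
        m for m in messages
        if before_sequence is None or m["sequence"] < before_sequence
    ]
    # Take the newest `limit` messages from the eligible range.
    page = eligible[-limit:] if limit > 0 else []
    has_more = len(eligible) > len(page)
    # The cursor for the next (older) page is the oldest sequence returned.
    next_cursor = page[0]["sequence"] if page and has_more else None
    return page, next_cursor
```

A client would call `fetch_page(messages, limit)` first, then pass the returned cursor as `before_sequence` each time the user scrolls up, stopping when the cursor is `None`.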
Ubbe	e67dd93ee8	refactor(frontend): remove stale feature flags and stabilize share execution (#12697 ) ## Why Stale feature flags add noise to the codebase and make it harder to understand which flags are actually gating live features. Four flags were defined but never referenced anywhere in the frontend, and the "Share Execution Results" flag has been stable long enough to remove its gate. ## What - Remove 4 unused flags from the `Flag` enum and `defaultFlags`: `NEW_BLOCK_MENU`, `GRAPH_SEARCH`, `ENABLE_ENHANCED_OUTPUT_HANDLING`, `AGENT_FAVORITING` - Remove the `SHARE_EXECUTION_RESULTS` flag and its conditional — the `ShareRunButton` now always renders ## How - Deleted enum entries and default values in `use-get-flag.ts` - Removed the `useGetFlag` call and conditional wrapper around `<ShareRunButton />` in `SelectedRunActions.tsx` ## Changes - `src/services/feature-flags/use-get-flag.ts` — removed 5 flags from enum + defaults - `src/app/(platform)/library/.../SelectedRunActions.tsx` — removed flag import, condition; share button always renders ### Checklist - [x] My PR is small and focused on one change - [x] I've tested my changes locally - [x] `pnpm format && pnpm lint` pass 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 19:28:40 +07:00
Otto	3140a60816	fix(frontend/builder): allow horizontal scroll for JSON output data (#12638 ) Requested by @Abhi1992002 ## Why JSON output data in the "Complete Output Data" dialog and node output panel gets clipped — text overflows and is hidden with no way to scroll right. Reported by Zamil in #frontend. ## What The `ContentRenderer` wrapper divs used `overflow-hidden` which prevented the `JSONRenderer`'s `overflow-x-auto` from working. Changed both wrapper divs from `overflow-hidden` to `overflow-x-auto`. ```diff - overflow-hidden [&>]:rounded-xlarge [&>]:!text-xs [&_pre]:whitespace-pre-wrap [&_pre]:break-words + overflow-x-auto [&>]:rounded-xlarge [&>]:!text-xs [&_pre]:whitespace-pre-wrap [&_pre]:break-words - overflow-hidden [&>]:rounded-xlarge [&>]:!text-xs + overflow-x-auto [&>]:rounded-xlarge [&>]:!text-xs ``` ## Scope - 1 file changed (`ContentRenderer.tsx`) - 2 lines: `overflow-hidden` → `overflow-x-auto` - CSS only, no logic changes Resolves SECRT-2206 Co-authored-by: Abhimanyu Yadav <122007096+Abhi1992002@users.noreply.github.com> Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>	2026-04-07 19:11:09 +07:00
Nicholas Tindle	41c2ee9f83	feat(platform): add copilot artifact preview panel (#12629 ) ### Why / What / How Copilot artifacts were not previewing reliably: PDFs downloaded instead of rendering, Python code could still render like markdown, JSX/TSX artifacts were brittle, HTML dashboards/charts could fail to execute, and users had to manually open artifact panes after generation. The pane also got stuck at maximized width when trying to drag it smaller. This PR adds a dedicated copilot artifact panel and preview pipeline across the backend/frontend boundary. It preserves artifact metadata needed for classification, adds extension-first preview routing, introduces dedicated preview/rendering paths for HTML/CSV/code/PDF/React artifacts, auto-opens new or edited assistant artifacts, and fixes the maximized-pane resize path so dragging exits maximized mode immediately. ### Changes 🏗️ - add artifact card and artifact panel UI in copilot, including persisted panel state and resize/maximize/minimize behavior - add shared artifact extraction/classification helpers and auto-open behavior for new or edited assistant messages with artifacts - add preview/rendering support for HTML, CSV, PDF, code, and React artifact files - fix code artifacts such as Python to render through the code renderer with a dark code surface instead of markdown-style output - improve JSX/TSX preview behavior with provider wrapping, fallback export selection, and explicit runtime error surfaces - allow script execution inside HTML previews so embedded chart dashboards can render - update workspace artifact/backend API handling and regenerate the frontend OpenAPI client - add regression coverage for artifact helpers, React preview runtime, auto-open behavior, code rendering, and panel store behavior - post-review hardening: correct download path for cross-origin URLs, defer scroll restore until content mounts, gate auto-open behind the ARTIFACTS flag, parse CSVs with RFC 4180-compliant quoted newlines 
+ BOM handling, distinguish 413 vs 409 on upload, normalize empty session_id, and keep AnimatePresence mounted so the panel exit animation plays ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] `pnpm format` - [x] `pnpm lint` - [x] `pnpm types` - [x] `pnpm test:unit` #### For configuration changes: - [x] `.env.default` is updated or already compatible with my changes - [x] `docker-compose.yml` is updated or already compatible with my changes - [x] I have included a list of my configuration changes in the PR description (under Changes) <!-- CURSOR_SUMMARY --> --- > [!NOTE] > Medium Risk > Adds a new Copilot artifact preview surface that executes user/AI-generated HTML/React in sandboxed iframes and changes workspace file upload/listing behavior, so regressions could affect file handling and client security assumptions despite sandboxing safeguards. > > Overview > Adds an Artifacts feature (flagged by `Flag.ARTIFACTS`) to Copilot: workspace file links/attachments now render as `ArtifactCard`s and can open a new resizable/minimizable `ArtifactPanel` with history, auto-open behavior, copy/download actions, and persisted panel width. > > Introduces a richer artifact preview pipeline with type classification and dedicated renderers for HTML, CSV, PDF, code (Shiki-highlighted), and React/TSX (transpiled and executed in a sandboxed iframe), plus safer download filename handling and content caching/scroll restore. > > Extends the workspace backend API by adding `GET /workspace/files` pagination, standardizing operation IDs in OpenAPI, attaching `metadata.origin` on uploads/agent-created files, normalizing empty `session_id`, improving upload error mapping (409 vs 413), and hardening post-quota soft-delete error handling; updates and expands test coverage accordingly. 
> > <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit `b732d10eca`.</sup> <!-- /CURSOR_SUMMARY --> --------- Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 11:24:22 +00:00
Ubbe	ca748ee12a	feat(frontend): refine AutoPilot onboarding — branding, auto-advance, soft cap, polish (#12686 ) ### Why / What / How Why: The onboarding flow had inconsistent branding ("Autopilot" vs "AutoPilot"), a heavy progress bar that dominated the header, an extra click on the role screen, and no guidance on how many pain points to select — leading to users selecting everything or nothing useful. What: Copy & brand fixes, UX improvements (auto-advance, soft cap), and visual polish (progress bar, checkmark badges, purple focus inputs). How: - Replaced all "Autopilot" with "AutoPilot" (capital P) across screens 1-3 - Removed the `?` tooltip on screen 1 (users will learn about AutoPilot from the access email) - Changed name label to conversational "What should I call you?" - Screen 2: auto-advances 350ms after role selection (except "Other" which still shows input + button) - Screen 3: soft cap of 3 selections with green confirmation text and shake animation on overflow attempt - Thinned progress bar from ~10px to 3px (Linear/Notion style) - Added purple checkmark badges on selected cards - Updated Input atom focus state to purple ring ### Changes 🏗️ - WelcomeStep: "AutoPilot" branding, removed tooltip, conversational label - RoleStep: Updated subtitle, auto-advance on non-"Other" role select, Continue button only for "Other" - PainPointsStep: Soft cap of 3 with dynamic helper text and shake animation - usePainPointsStep: Added `atLimit`/`shaking` state, wrapped `togglePainPoint` with cap logic - store.ts: `togglePainPoint` returns early when at 3 and adding - ProgressBar: 3px height, removed glow shadow - SelectableCard: Added purple checkmark badge on selected state - Input atom: Focus ring changed from zinc to purple - tailwind.config.ts: Added `shake` keyframe and `animate-shake` utility ### Checklist 📋 #### For code changes: - [ ] I have clearly listed my changes in the PR description - [ ] I have made a test plan - [ ] I have tested my changes according 
to the test plan: - [ ] Navigate through full onboarding flow (screens 1→2→3→4) - [ ] Verify "AutoPilot" branding on all screens (no "Autopilot") - [ ] Verify screen 2 auto-advances after tapping a role (non-"Other") - [ ] Verify "Other" role still shows text input and Continue button - [ ] Verify Back button works correctly from screen 2 and 3 - [ ] Select 3 pain points and verify green "3 selected" text - [ ] Attempt 4th selection and verify shake animation + swap message - [ ] Deselect one and verify can select a different one - [ ] Verify checkmark badges appear on selected cards - [ ] Verify progress bar is thin (3px) and subtle - [ ] Verify input focus state is purple across onboarding inputs - [ ] Verify "Something else" + other text input still works on screen 3 --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 17:58:36 +07:00
Zamil Majdy	243b12778f	dx: improve pr-test skill — inline screenshots, flow captions, and test evaluation (#12692 ) ## Changes ### 1. Inline image enforcement (Step 7) - Added `CRITICAL` warning: never post a bare directory tree link - Added post-comment verification block that greps for `![` tags and exits 1 if none found — agents can't silently skip inline embedding ### 2. Structured screenshot captions (Step 6) - `SCREENSHOT_EXPLANATIONS` now requires Flow (which scenario), Steps (exact actions taken), Evidence (what this proves) - Good/bad example included so agents know what format is expected - A bare "shows the page" caption is explicitly rejected ### 3. Test completeness evaluation (Step 8) — new step After posting screenshots, the agent must evaluate coverage against the test plan and post a formal GitHub review: - `APPROVE` — every scenario tested with screenshot + DB/API evidence, no blockers - `REQUEST_CHANGES` — lists exact gaps: untested scenarios, missing evidence, confirmed bugs - Per-scenario checklist (✅/❌) required in the review body - Cannot auto-approve without ticking every item in the test plan ## Why - Agents were posting `https://github.com/.../tree/test-screenshots/...` instead of `![name](url)` inline - Screenshot captions were too vague to be useful ("shows the page") - No mechanism to catch incomplete test runs — agent could skip scenarios and still post a passing report ## Checklist - [x] `.claude/skills/pr-test/SKILL.md` updated - [x] No production code changes — skill/dx only - [x] Pre-commit hooks pass	2026-04-07 16:04:08 +07:00
An Vy Le	43c81910ae	fix(backend/copilot): skip AI blocks without model property in fix_ai_model_parameter (#12688 ) ### Why / What / How Why: Some AI-category blocks do not expose a `"model"` input property in their `inputSchema`. The `fix_ai_model_parameter` fixer was unconditionally injecting a default model value (e.g. `"gpt-4o"`) into any node whose block has category `"AI"`, regardless of whether that block actually accepts a `model` input. This causes the agent JSON to include an invalid field for those blocks. What: Guard the model-injection logic with a check that `"model"` exists in the block's `inputSchema.properties` before attempting to set or validate the field. AI blocks that have no model selector are now skipped entirely. How: In `fix_ai_model_parameter`, after confirming `is_ai_block`, extract `input_properties` from the block's `inputSchema.properties` and `continue` if `"model"` is absent. The subsequent `model_schema` lookup is also simplified to reuse the already-fetched `input_properties` dict. A regression test is added to cover this case. ### Changes 🏗️ - `backend/copilot/tools/agent_generator/fixer.py`: In `fix_ai_model_parameter`, skip AI-category nodes whose block `inputSchema.properties` does not contain a `"model"` key; reuse `input_properties` for the subsequent `model_schema` lookup. - `backend/copilot/tools/agent_generator/fixer_test.py`: Add `test_ai_block_without_model_property_is_skipped` to `TestFixAiModelParameter`. ### Checklist 📋 #### For code changes: - [ ] I have clearly listed my changes in the PR description - [ ] I have made a test plan - [ ] I have tested my changes according to the test plan: - [ ] Run `poetry run pytest backend/copilot/tools/agent_generator/fixer_test.py` — all 50 tests pass (49 pre-existing + 1 new) Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-06 17:14:11 +00:00
anvyle	261959104a	fix(backend/copilot): skip AI blocks without model property in fix_ai_model_parameter Some AI-category blocks do not expose a "model" input property in their inputSchema. The fixer was injecting a default model value into these blocks, which is incorrect. Now checks for the presence of "model" in inputSchema properties before attempting to set or validate the model field. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-06 19:00:47 +02:00
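The guard described in the two commits above can be sketched as follows. This is a minimal illustration, not the repository's code: the node and block dictionary shapes, the `categories` and `input_default` keys, and the `_sketch` function name are assumptions, and `"gpt-4o"` is only the example default quoted in the commit message.

```python
from typing import Any

# Assumed default; the commit message gives "gpt-4o" only as an example.
DEFAULT_MODEL = "gpt-4o"


def fix_ai_model_parameter_sketch(
    nodes: list[dict[str, Any]],
    blocks_by_id: dict[str, dict[str, Any]],
) -> list[dict[str, Any]]:
    """Inject a default model only into AI blocks that accept a `model` input."""
    for node in nodes:
        block = blocks_by_id.get(node.get("block_id", ""))
        # Hypothetical category check; the real fixer's test may differ.
        is_ai_block = bool(block) and "AI" in block.get("categories", [])
        if not is_ai_block:
            continue

        # The guard: skip AI blocks whose input schema has no "model" property.
        input_properties = block.get("inputSchema", {}).get("properties", {})
        if "model" not in input_properties:
            continue

        # Only set the default when the node has not chosen a model itself.
        node.setdefault("input_default", {}).setdefault("model", DEFAULT_MODEL)
    return nodes
```

Fetching `input_properties` once and reusing it mirrors the simplification the commit mentions for the later `model_schema` lookup.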
Ubbe	a11199aa67	dx(frontend): set up React integration testing with Vitest + RTL + MSW (#12667 ) ## Summary - Establish React integration tests (Vitest + RTL + MSW) as the primary frontend testing strategy (~90% of tests) - Update all contributor documentation (TESTING.md, CONTRIBUTING.md, AGENTS.md) to reflect the integration-first convention - Add `NuqsTestingAdapter` and `TooltipProvider` to the shared test wrapper so page-level tests work out of the box - Write 8 integration tests for the library page as a reference example for the pattern ## Why We had the testing infrastructure (Vitest, RTL, MSW, Orval-generated handlers) but no established convention for page-level integration tests. Most existing tests were for stores or small components. Since our frontend is client-first, we need a documented, repeatable pattern for testing full pages with mocked APIs. ## What - Docs: Rewrote `TESTING.md` as a comprehensive guide. Updated testing sections in `CONTRIBUTING.md`, `frontend/AGENTS.md`, `platform/AGENTS.md`, and `autogpt_platform/AGENTS.md` - Test infra: Added `NuqsTestingAdapter` (for `nuqs` query state hooks) and `TooltipProvider` (for Radix tooltips) to `test-utils.tsx` - Reference tests: `library/__tests__/main.test.tsx` with 8 tests covering agent rendering, tabs, folders, search bar, and Jump Back In ## How - Convention: tests live in `__tests__/` next to `page.tsx`, named descriptively (`main.test.tsx`, `search.test.tsx`) - Pattern: `setupHandlers()` → `render(<Page />)` → `findBy*` assertions - MSW handlers from `@/app/api/__generated__/endpoints/{tag}/{tag}.msw.ts` for API mocking - Custom `render()` from `@/tests/integrations/test-utils` wraps all required providers ## Test plan - [x] All 422 unit/integration tests pass (`pnpm test:unit`) - [x] `pnpm format` clean - [x] `pnpm lint` clean (no new errors) - [x] `pnpm types` — pre-existing onboarding type errors only, no new errors 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- 
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co> Co-authored-by: Reinier van der Leer <pwuts@agpt.co>	2026-04-06 13:17:08 +00:00
Zamil Majdy	5f82a71d5f	feat(copilot): add Fast/Thinking mode toggle with full tool parity (#12623 ) ### Why / What / How Users need a way to choose between fast, cheap responses (Sonnet) and deep reasoning (Opus) in the copilot. Previously only the SDK/Opus path existed, and the baseline path was a degraded fallback with no tool calling, no file attachments, no E2B sandbox, and no permission enforcement. This PR adds a copilot mode toggle and brings the baseline (fast) path to full feature parity with the SDK (extended thinking) path. ### Changes 🏗️ #### 1. Mode toggle (UI → full stack) - Add Fast / Thinking mode toggle to ChatInput footer (Phosphor `Brain`/`Zap` icons via lucide-react) - Thread `mode: "fast" \| "extended_thinking" \| null` from `StreamChatRequest` → RabbitMQ queue → executor → service selection - Fast → baseline service (Sonnet 4 via OpenRouter), Thinking → SDK service (Opus 4.6) - Toggle gated behind `CHAT_MODE_OPTION` feature flag with server-side enforcement - Mode persists in localStorage with SSR-safe init #### 2. 
Baseline service full tool parity - Tool call persistence: Store structured `ChatMessage` entries (assistant + tool results) instead of flat concatenated text — enables frontend to render tool call details and maintain context across turns - E2B sandbox: Wire up `get_or_create_sandbox()` so `bash_exec` routes to E2B (image download, Python/PIL compression, filesystem access) - File attachments: Accept `file_ids`, download workspace files, embed images as OpenAI vision blocks, save non-images to working dir - Permissions: Filter tool list via `CopilotPermissions` (whitelist/blacklist) - URL context: Pass `context` dict to user message for URL-shared content - Execution context: Pass `sandbox`, `sdk_cwd`, `permissions` to `set_execution_context()` - Model: Changed `fast_model` from `google/gemini-2.5-flash` to `anthropic/claude-sonnet-4` for reliable function calling - Temp dir cleanup: Lazy `mkdtemp` (only when files attached) + `shutil.rmtree` in finally #### 3. Transcript support for Fast mode - Baseline service now downloads / validates / loads / appends / uploads transcripts (parity with SDK) - Enables seamless mode switching mid-conversation via shared transcript - Upload shielded from cancellation, bounded at 5s timeout #### 4. Feature-flag infrastructure fixes - `FORCE_FLAG_*` env-var overrides on both backend and frontend for local dev / E2E - LaunchDarkly context parity (frontend mirrors backend user context) - `CHAT_MODE_OPTION` default flipped to `false` to match backend #### 5. 
Other hardening - Double-submit ref guard in `useChatInput` + reconnect dedup in `useCopilotStream` - `copilotModeRef` pattern to read latest mode without recreating transport - Shared `CopilotMode` type across frontend files - File name collision handling with numeric suffix - Path sanitization in file description hints (`os.path.basename`) ### Test plan - [x] 30 new unit tests: `_env_flag_override` (12), `envFlagOverride` (8), `_filter_tools_by_permissions` (4), `_prepare_baseline_attachments` (6) - [x] E2E tested on dev: fast mode creates E2B sandbox, calls 7-10 tools, generates and renders images - [x] Mode switching mid-session works (shared transcript + session messages) - [x] Server-side flag gate enforced (crafted `mode=fast` stripped when flag off) - [x] All 37 CI checks green - [x] Verified via agent-browser: workspace images render correctly in all message positions 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Zamil Majdy <majdy.zamil@gmail.com>	2026-04-06 19:54:36 +07:00
Nicholas Tindle	1a305db162	ci(frontend): add Playwright E2E coverage reporting to Codecov (#12665 ) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-04 00:55:09 -05:00
Zamil Majdy	48a653dc63	fix(copilot): prevent duplicate side effects from double-submit and stale-cache race (#12660) ## Why #12604 (intermediate persistence) introduced three bugs on dev: 1. Duplicate user messages: `set_turn_duration` calls `invalidate_session_cache()`, which deletes the Redis key. Concurrent `get_chat_session()` calls re-populate it from the DB with stale data. The executor loads this stale cache, misses the user message, and re-appends it. 2. Tool outputs lost on hydration: intermediate flushes save assistant messages to the DB before `StreamToolInputAvailable` sets `tool_calls` on them. Since `_save_session_to_db` is append-only (uses `start_sequence`), the `tool_calls` update is lost because subsequent flushes start past that index. On page refresh / SSE reconnect, tool UIs (SetupRequirementsCard, run_block output, etc.) are invisible. 3. Sessions stuck running: if a tool call hangs (e.g. WebSearch provider not responding), the stream never completes, `mark_session_completed` never runs, and the `active_stream` flag stays stale in Redis. 
## What - In-place cache update in `set_turn_duration` — replaces `invalidate_session_cache()` with a read-modify-write that patches the duration on the cached session, eliminating the stale-cache repopulation window - tool_calls backfill — tracks the flush watermark and assistant message index; when `StreamToolInputAvailable` sets `tool_calls` on an already-flushed assistant, updates the DB record directly via `update_message_tool_calls()` - Improved message dedup — `is_message_duplicate()` / `maybe_append_user_message()` scans trailing same-role messages (current turn) instead of only checking `messages[-1]` - Idle timeout — aborts the stream with a retryable error if no meaningful SDK message arrives for 10 minutes, preventing hung tool calls from leaving sessions stuck ## Changes - `copilot/db.py` — `update_message_tool_calls()`, in-place cache update in `set_turn_duration` - `copilot/model.py` — `is_message_duplicate()`, `maybe_append_user_message()` - `copilot/sdk/service.py` — flush watermark tracking, tool_calls backfill, idle timeout - `copilot/baseline/service.py` — use `maybe_append_user_message()` - `copilot/model_test.py` — unit tests for dedup - `copilot/db_test.py` — unit tests for set_turn_duration cache update ## Checklist - [x] My PR title follows [conventional commit](https://www.conventionalcommits.org/) format - [x] Out-of-scope changes are less than 20% of the PR - [x] Changes to `data/*.py` validated for user ID checks (N/A) - [x] Protected routes updated in middleware (N/A)	2026-04-04 01:09:42 +07:00
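The improved dedup described above (scanning the trailing run of same-role messages rather than only the last message) might look roughly like this. It is a sketch under stated assumptions: the `Message` dataclass, the `_sketch` function names, and their signatures are hypothetical and are not taken from `copilot/model.py`.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical stand-in for the real chat message model."""
    role: str
    content: str


def is_message_duplicate_sketch(messages: list[Message], role: str, content: str) -> bool:
    """Return True if `content` already appears in the trailing run of `role` messages."""
    for msg in reversed(messages):
        if msg.role != role:
            # A message from another role ends the current turn's trailing run.
            break
        if msg.content == content:
            return True
    return False


def maybe_append_user_message_sketch(messages: list[Message], content: str) -> bool:
    """Append a user message unless the current turn already contains it."""
    if is_message_duplicate_sketch(messages, "user", content):
        return False
    messages.append(Message(role="user", content=content))
    return True
```

Stopping at the first message of a different role keeps the check scoped to the current turn, so a user can still legitimately repeat an earlier message in a later turn.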
Toran Bruce Richards	f6ddcbc6cb	feat(platform): Add all 12 Z.ai GLM models via OpenRouter (#12672 ) ## Summary Add Z.ai (Zhipu AI) GLM model family to the platform LLM blocks, routed through OpenRouter. This enables users to select any of the 12 Z.ai models across all LLM-powered blocks (AI Text Generator, AI Conversation, AI Structured Response, AI Text Summarizer, AI List Generator). ## Gap Analysis All 12 Z.ai models currently available on OpenRouter's API were missing from the AutoGPT platform: \| Model \| Context Window \| Max Output \| Price Tier \| Cost \| \|-------\|---------------\|------------\|------------\|------\| \| GLM 4 32B \| 128K \| N/A \| Tier 1 \| 1 \| \| GLM 4.5 \| 131K \| 98K \| Tier 2 \| 2 \| \| GLM 4.5 Air \| 131K \| 98K \| Tier 1 \| 1 \| \| GLM 4.5 Air (Free) \| 131K \| 96K \| Tier 1 \| 1 \| \| GLM 4.5V (vision) \| 65K \| 16K \| Tier 2 \| 2 \| \| GLM 4.6 \| 204K \| 204K \| Tier 1 \| 1 \| \| GLM 4.6V (vision) \| 131K \| 131K \| Tier 1 \| 1 \| \| GLM 4.7 \| 202K \| 65K \| Tier 1 \| 1 \| \| GLM 4.7 Flash \| 202K \| N/A \| Tier 1 \| 1 \| \| GLM 5 \| 80K \| 131K \| Tier 2 \| 2 \| \| GLM 5 Turbo \| 202K \| 131K \| Tier 3 \| 4 \| \| GLM 5V Turbo (vision) \| 202K \| 131K \| Tier 3 \| 4 \| ## Changes - `autogpt_platform/backend/backend/blocks/llm.py`: Added 12 `LlmModel` enum entries and corresponding `MODEL_METADATA` with context windows, max output tokens, display names, and price tiers sourced from OpenRouter API - `autogpt_platform/backend/backend/data/block_cost_config.py`: Added `MODEL_COST` entries for all 12 models, with costs scaled to match pricing (1 for budget, 2 for mid-range, 4 for premium) ## How it works All Z.ai models route through the existing OpenRouter provider (`open_router`) — no new provider or API client code needed. Users with an OpenRouter API key can immediately select any Z.ai model from the model dropdown in any LLM block. 
## Related - Linear: REQ-83 --------- Co-authored-by: AutoGPT CoPilot <copilot@agpt.co>	2026-04-03 15:48:33 +00:00
Zamil Majdy	98f13a6e5d	feat(copilot): add create -> dry-run -> fix loop to agent generation (#12578 ) ## Summary - Instructs the copilot LLM to automatically dry-run agents after creating or editing them, inspect the output for wiring/data-flow issues, and fix iteratively before presenting the agent as ready to the user - Updates tool descriptions (run_agent, get_agent_building_guide), prompting supplement, and agent generation guide with clear workflow instructions and error pattern guidance - Adds Tool Discovery Priority to shared tool notes (find_block -> run_mcp_tool -> SendAuthenticatedWebRequestBlock -> manual API) - Adds 37 tests: prompt regression tests + functional tests (tool schema validation, Pydantic model, guide workflow ordering) - Frontend: Fixes host-scoped credential UX — replaces duplicate credentials for the same host instead of stacking them, wires up delete functionality with confirmation modal, updates button text contextually ("Update headers" vs "Add headers") ## Test plan - [x] All 37 `dry_run_loop_test.py` tests pass (prompt content, tool schemas, Pydantic model, guide ordering) - [x] Existing `tool_schema_test.py` passes (110 tests including character budget gate) - [x] Ruff lint and format pass - [x] Pyright type checking passes - [x] Frontend: `pnpm lint`, `pnpm types` pass - [x] Manual verification: confirm copilot follows the create -> dry-run -> fix workflow when asked to build an agent - [x] Manual verification: confirm host-scoped credentials replace instead of duplicate	2026-04-03 14:48:57 +00:00
Zamil Majdy	613978a611	ci: add gitleaks secret scanning to pre-commit hooks (#12649 ) ### Why / What / How Why: We had no local pre-commit protection against accidentally committing secrets. The existing `detect-secrets` hook only ran on `pre-push`, which is too late — secrets are already in git history by that point. GitHub's push protection only covers known provider patterns and runs server-side. What: Adds a 3-layer defense against secret leaks: local pre-commit hooks (gitleaks + detect-secrets), and a CI workflow as a safety net. How: - Moved `detect-secrets` from `pre-push` to `pre-commit` stage - Added `gitleaks` as a second pre-commit hook (Go binary, faster and more comprehensive rule set) - Added `.gitleaks.toml` config with allowlists for known false positives (test fixtures, dev docker JWTs, Firebase public keys, lock files, docs examples) - Added `repo-secret-scan.yml` CI workflow using `gitleaks-action` on PRs/pushes to master/dev ### Changes 🏗️ - `.pre-commit-config.yaml`: Moved `detect-secrets` to pre-commit stage, added baseline arg, added `gitleaks` hook - `.gitleaks.toml`: New config with tuned allowlists for this repo's false positives - `.secrets.baseline`: Empty baseline for detect-secrets to track known findings - `.github/workflows/repo-secret-scan.yml`: New CI workflow running gitleaks on every PR and push ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Ran `gitleaks detect --no-git` against the full repo — only `.env` files (gitignored) remain as findings - [x] Verified gitleaks catches a test secret file correctly - [x] Pre-commit hooks pass on commit (both detect-secrets and gitleaks passed) #### For configuration changes: - [x] `.env.default` is updated or already compatible with my changes - [x] `docker-compose.yml` is updated or already compatible with my changes - [x] I have included a 
list of my configuration changes in the PR description (under Changes)	2026-04-03 14:01:26 +00:00
Zamil Majdy	2b0e8a5a9f	feat(platform): add rate-limit tiering system for CoPilot (#12581 ) ## Summary - Adds a four-tier subscription system (FREE/PRO/BUSINESS/ENTERPRISE) for CoPilot with configurable multipliers (1x/5x/20x/60x) applied on top of the base LaunchDarkly/config limits - Stores user tier in the database (`User.subscriptionTier` column as a Prisma enum, defaults to PRO for beta testing) with admin API endpoints for tier management - Includes tier info in usage status responses and OTEL/Langfuse trace metadata for observability ## Tier Structure \| Tier \| Multiplier \| Daily Tokens \| Weekly Tokens \| Notes \| \|------\|-----------\|-------------\|--------------\|-------\| \| FREE \| 1x \| 2.5M \| 12.5M \| Base tier (unused during beta) \| \| PRO \| 5x \| 12.5M \| 62.5M \| Default on sign-up (beta) \| \| BUSINESS \| 20x \| 50M \| 250M \| Manual upgrade for select users \| \| ENTERPRISE \| 60x \| 150M \| 750M \| Highest tier, custom \| ## Changes - `rate_limit.py`: `SubscriptionTier` enum (FREE/PRO/BUSINESS/ENTERPRISE), `TIER_MULTIPLIERS`, `get_user_tier()`, `set_user_tier()`, update `get_global_rate_limits()` to apply tier multiplier and return 3-tuple, add `tier` field to `CoPilotUsageStatus` - `rate_limit_admin_routes.py`: Add `GET/POST /admin/rate_limit/tier` endpoints, include `tier` in `UserRateLimitResponse` - `routes.py` (chat): Include tier in `/usage` endpoint response - `sdk/service.py`: Send `subscription_tier` in OTEL/Langfuse trace metadata - `schema.prisma`: Add `SubscriptionTier` enum and `subscriptionTier` column to `User` model (default: PRO) - `config.py`: Update docs to reflect tier system - Migration: `20260326200000_add_rate_limit_tier` — creates enum, migrates STANDARD→PRO, adds BUSINESS, sets default to PRO ## Test plan - [x] 72 unit tests all passing (43 rate_limit + 11 admin routes + 18 chat routes) - [ ] Verify FREE tier users get base limits (2.5M daily, 12.5M weekly) - [ ] Verify PRO tier users get 5x limits (12.5M daily, 
62.5M weekly) - [ ] Verify BUSINESS tier users get 20x limits (50M daily, 250M weekly) - [ ] Verify ENTERPRISE tier users get 60x limits (150M daily, 750M weekly) - [ ] Verify admin can read and set user tiers via API - [ ] Verify tier info appears in Langfuse traces - [ ] Verify migration applies cleanly (creates enum, migrates STANDARD users to PRO, adds BUSINESS, default PRO) --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>	2026-04-03 13:36:01 +00:00
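The tier arithmetic in the table above can be checked with a short sketch. The enum members and multipliers come from the commit message; the function name, its signature, and the order of the returned 3-tuple are assumptions made for illustration, and the real `get_global_rate_limits()` may differ.

```python
from enum import Enum


class SubscriptionTier(str, Enum):
    FREE = "FREE"
    PRO = "PRO"
    BUSINESS = "BUSINESS"
    ENTERPRISE = "ENTERPRISE"


# Multipliers as listed in the commit message (1x / 5x / 20x / 60x).
TIER_MULTIPLIERS: dict[SubscriptionTier, int] = {
    SubscriptionTier.FREE: 1,
    SubscriptionTier.PRO: 5,
    SubscriptionTier.BUSINESS: 20,
    SubscriptionTier.ENTERPRISE: 60,
}


def apply_tier_multiplier(
    base_daily: int, base_weekly: int, tier: SubscriptionTier
) -> tuple[int, int, SubscriptionTier]:
    """Scale the base limits by the tier multiplier.

    The (daily, weekly, tier) tuple layout is an assumption for this sketch.
    """
    multiplier = TIER_MULTIPLIERS[tier]
    return base_daily * multiplier, base_weekly * multiplier, tier


# Base limits from the FREE row of the table: 2.5M daily, 12.5M weekly.
daily, weekly, tier = apply_tier_multiplier(2_500_000, 12_500_000, SubscriptionTier.BUSINESS)
print(daily, weekly, tier.value)  # 50000000 250000000 BUSINESS
```

Keeping the multipliers in a single mapping means the base limits can continue to come from LaunchDarkly or config while the tier only scales them.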
Zamil Majdy	08bb05141c	dx: enhance pr-address skill with detailed codecov coverage guidance (#12662 ) Enhanced pr-address skill codecov section with local coverage commands, priority guide, and troubleshooting steps.	2026-04-03 13:15:46 +00:00
Nicholas Tindle	3ccaa5e103	ci(frontend): make frontend coverage checks informational (non-blocking) (#12663) ### Why / What / How Why: Frontend test coverage is still ramping up. The default component status checks (project + patch at 80%) would block merges for insufficient coverage on frontend changes, which isn't practical yet. What: Override the platform-frontend component's coverage statuses to be `informational: true`, so they report but don't block merges. How: Added explicit `statuses` to the `platform-frontend` component in `codecov.yml` with `informational: true` on both project and patch checks, overriding the `default_rules`. ### Changes 🏗️ - `codecov.yml`: Added `informational: true` to platform-frontend component's project and patch status checks ### Checklist 📋 #### For code changes: - [ ] I have clearly listed my changes in the PR description - [ ] I have made a test plan - [ ] I have tested my changes according to the test plan: - [ ] Verify Codecov frontend status checks show as informational (non-blocking) on PRs touching frontend code #### For configuration changes: - [x] `.env.default` is updated or already compatible with my changes - [x] `docker-compose.yml` is updated or already compatible with my changes - [x] I have included a list of my configuration changes in the PR description (under Changes) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 12:22:05 +00:00
Krzysztof Czerwinski	09e42041ce	fix(frontend): AutoPilot notification follow-ups: branding, UX, persistence, and cross-tab sync (#12428) AutoPilot (copilot) notifications had several follow-up issues after initial implementation: old "Otto" branding, UX quirks, a service-worker crash, notification state that didn't persist or sync across tabs, a broken notification sound, and noisy Sentry alerts from SSR. ### Changes 🏗️ - Rename "Otto" → "AutoPilot" in all notification surfaces: browser notifications, document title badge, permission dialog copy, and notification banner copy - Agent Activity icon: changed from `Bell` to `Pulse` (Phosphor) in the navbar dropdown - Centered dialog buttons: the "Stay in the loop" permission dialog buttons are now centered instead of right-aligned - Service worker notification fix: wrapped `new Notification()` in try-catch so it degrades gracefully in service worker / PWA contexts instead of throwing `TypeError: Illegal constructor` - Persist notification state: `completedSessionIDs` is now stored in localStorage (`copilot-completed-sessions`) so it survives page refreshes and new tabs - Cross-tab sync: a `storage` event listener keeps `completedSessionIDs` and `document.title` in sync across all open tabs, so clearing a notification in one tab clears it everywhere - Fix notification sound: corrected the sound file path from `/sounds/notification.mp3` to `/notification.mp3` and added a `.gitignore` exception (root `.gitignore` has a blanket `.mp3` ignore rule from legacy AutoGPT agent days) - Fix SSR Sentry noise: guarded the Copilot Zustand store initialization with a client-side check so `storage.get()` is never called during SSR, eliminating spurious Sentry alerts (BUILDER-7CB, 7CC, 7C7) while keeping the Sentry reporting in `local-storage.ts` intact for genuinely unexpected SSR access ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have 
tested my changes according to the test plan: - [x] Verify "AutoPilot" appears (not "Otto") in browser notification, document title, permission dialog, and banner - [x] Verify Pulse icon in navbar Agent Activity dropdown - [x] Verify "Stay in the loop" dialog buttons are centered - [x] Open two tabs on copilot → trigger completion → both tabs show badge/checkmark - [x] Click completed session in tab 1 → badge clears in both tabs - [x] Refresh a tab → completed session state is preserved - [x] Verify notification sound plays on completion - [x] Verify no Sentry alerts from SSR localStorage access --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 11:44:22 +00:00


8252 Commits