## Summary
Upgrade the frontend **Docker image** from **Node.js v21** (EOL since
June 2024) to **Node.js v22 LTS** (supported through April 2027).
> **Scope:** This only affects the **Dockerfile** used for local
development (`docker compose`) and CI. It does **not** affect Vercel
(which manages its own Node.js runtime) or Kubernetes (the frontend Helm
chart was removed in Dec 2025 — the frontend is deployed exclusively via
Vercel).
## Why
- Node v21.7.3 has a **known TransformStream race condition bug**
causing `TypeError: controller[kState].transformAlgorithm is not a
function` — this is
[BUILDER-3KF](https://significant-gravitas.sentry.io/issues/BUILDER-3KF)
with **567,000+ Sentry events**
- The error is entirely in Node.js internals
(`node:internal/webstreams/transformstream`), zero first-party code
- Node 21 is **not an LTS release** and has been EOL since June 2024
- `package.json` already declares `"engines": { "node": "22.x" }` — the
Dockerfile was inconsistent
- Node 22.x LTS (v22.22.1) fixes the TransformStream bug
- Next.js 15.4.x requires Node 18.18+, so Node 22 is fully compatible
## Changes
- `autogpt_platform/frontend/Dockerfile`: `node:21-alpine` →
`node:22.22-alpine3.23` (both `base` and `prod` stages)
## Test plan
- [ ] Verify frontend Docker image builds successfully via `docker
compose`
- [ ] Verify frontend starts and serves pages correctly in local Docker
environment
- [ ] Monitor Sentry for BUILDER-3KF — should drop to zero for
Docker-based runs
## Why
Admins reviewing marketplace submissions currently approve blindly —
they can see raw metadata in the admin table but cannot see what the
listing actually looks like (images, video, branding, layout). This
risks approving inappropriate content. With full-scale production
approaching, this is critical.
Additionally, when a creator un-publishes an agent, users who already
added it to their library lose access — breaking their workflows.
Product decided on a "you added it, you keep it" model.
## What
- **Admin preview page** at `/admin/marketplace/preview/[id]` — renders
the listing exactly as it would appear on the public marketplace
- **Add to Library** for admins to test-run pending agents before
approving
- **Library membership grants graph access** — if you added an agent to
your library, you keep access even if it's un-published or rejected
- **Preview button** on every submission row in the admin marketplace
table
- **Cross-reference comments** on original functions to prevent
SECRT-2162-style regressions
## How
### Backend
**Admin preview (`store/db.py`):**
- `get_store_agent_details_as_admin()` queries `StoreListingVersion`
directly, bypassing the APPROVED-only `StoreAgent` DB view
- Validates `CreatorProfile` FK integrity, reads all fields including
`recommendedScheduleCron`
**Admin add-to-library (`library/_add_to_library.py`):**
- Extracted shared logic into `resolve_graph_for_library()` +
`add_graph_to_library()` — eliminates duplication between public and
admin paths
- Admin path uses `get_graph_as_admin()` to bypass marketplace status
checks
- Handles concurrent double-click race via `UniqueViolationError` catch
**Library membership grants graph access (`data/graph.py`):**
- `get_graph()` now falls back to `LibraryAgent` lookup if ownership and
marketplace checks fail
- Only for authenticated users with non-deleted, non-archived library
records
- `validate_graph_execution_permissions()` updated to match — library
membership grants execution access too
**New endpoints (`store_admin_routes.py`):**
- `GET /admin/submissions/{id}/preview` — returns `StoreAgentDetails`
- `POST /admin/submissions/{id}/add-to-library` — creates `LibraryAgent`
via admin path
### Frontend
- Preview page reuses `AgentInfo` + `AgentImages` with admin banner
- Shows instructions, recommended schedule, and slug
- "Add to My Library" button wired to admin endpoint
- Preview button added to `ExpandableRow` (header + version history)
- Categories column uncommented in version history table
### Testing (19 tests)
**Graph access control (9 in `graph_test.py`):** Owner access,
marketplace access, library member access (unpublished),
deleted/archived/anonymous denied, null FK denied, efficiency checks
**Admin bypass (5 in `store_admin_routes_test.py`):** Preview uses
StoreListingVersion not StoreAgent, admin path uses get_graph_as_admin,
regular path uses get_graph, library member can view in builder
**Security (3):** Non-admin 403 on preview, non-admin 403 on
add-to-library, nonexistent 404
**SECRT-2162 regression (2):** Admin access to pending agent, export
with sub-graphs
### Checklist
- [x] Changes clearly listed
- [x] Test plan made
- [x] 19 backend tests pass
- [x] Frontend lints and types clean
## Test plan
- [x] Navigate to `/admin/marketplace`, click Preview on a PENDING
submission
- [x] Verify images, video, description, categories, instructions,
schedule render correctly
- [x] Click "Add to My Library", verify agent appears in library and
opens in builder
- [x] Verify non-admin users get 403
- [x] Verify un-publishing doesn't break access for users who already
added it
🤖 Generated with [Claude Code](https://claude.com/claude-code)
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> **High Risk**
> Adds new admin-only endpoints that bypass marketplace
approval/ownership checks and changes `get_graph`/execution
authorization to grant access via library membership, which impacts
security-sensitive access control paths.
>
> **Overview**
> Adds **admin preview + review workflow support** for marketplace
submissions: new admin routes to `GET /admin/submissions/{id}/preview`
(querying `StoreListingVersion` directly) and `POST
/admin/submissions/{id}/add-to-library` (admin bypass to pull pending
graphs into an admin’s library).
>
> Refactors library add-from-store logic into shared helpers
(`resolve_graph_for_library`, `add_graph_to_library`) and introduces an
admin variant `add_store_agent_to_library_as_admin`, including restore
of archived/deleted entries and dedup/race handling.
>
> Changes core graph access rules: `get_graph()` now falls back to
**library membership** (non-deleted/non-archived, version-specific) when
ownership and marketplace approval don’t apply, and
`validate_graph_execution_permissions()` is updated accordingly.
Frontend adds a preview link and a dedicated admin preview page with
“Add to My Library”; tests expand significantly to lock in the new
bypass and access-control behavior.
>
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
a362415d12. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
### Why / What / How
**Why:** GitHub's code scanning detected a HIGH severity security
vulnerability in `/autogpt_platform/backend/backend/util/json.py:172`.
The error handler in `sanitize_json()` was logging sensitive data
(potentially including secrets, API keys, credentials) as clear text
when serialization fails.
**What:** This PR removes the logging of actual data content from the
error handler while preserving useful debugging metadata (error type,
error message, and data type).
**How:** Removed the `"Data preview: %s"` format parameter and the
corresponding `truncate(str(data), 100)` argument from the
logger.error() call. The error handler now logs only safe metadata that
helps debugging without exposing sensitive information.
### Changes 🏗️
- **Security Fix**: Modified `sanitize_json()` function in
`backend/util/json.py`
- Removed logging of data content (`truncate(str(data), 100)`) from the
error handler
- Retained logging of error type (`type(e).__name__`)
- Retained logging of truncated error message (`truncate(str(e), 200)`)
- Retained logging of data type (`type(data).__name__`)
- Error handler still provides useful debugging information without
exposing secrets
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Verified the code passes type checking (`poetry run pyright
backend/util/json.py`)
- [x] Verified the code passes linting (`poetry run ruff check
backend/util/json.py`)
- [x] Verified all pre-commit hooks pass
- [x] Reviewed the diff to ensure only the sensitive data logging was
removed
- [x] Confirmed that useful debugging information (error type, error
message, data type) is still logged
#### For configuration changes:
- N/A - No configuration changes required
## Summary
- **pr-address skill**: Add explicit rule against empty commits for CI
re-triggers, and strengthen push-immediately guidance with rationale
- **Platform CLAUDE.md**: Add "split PRs by concern" guideline under
Creating Pull Requests
### Changes
- Updated `.claude/skills/pr-address/SKILL.md`
- Updated `autogpt_platform/CLAUDE.md`
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan
#### Test plan:
- [x] Documentation-only changes — no functional tests needed
- [x] Verified markdown renders correctly
### Changes 🏗️
- When the AutoPilot block executes a copilot session via
`collect_copilot_response`, it calls `stream_chat_completion_sdk`
directly, bypassing the copilot executor and stream registry. This means
the frontend sees no `active_stream` on the session and cannot connect
via SSE — users see a frozen chat with no updates until the turn fully
completes.
- Fix: register a `stream_registry` session in
`collect_copilot_response` and publish each chunk to Redis as events are
consumed. This allows the frontend to detect `active_stream=true` and
connect via the SSE reconnect endpoint for live streaming updates during
AutoPilot execution.
- Error handling is graceful — if stream registry fails, AutoPilot still
works normally, just without real-time frontend updates.
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Trigger an AutoPilot block execution that creates a new chat
session
- [x] Verify the new session appears in the sidebar with streaming
indicator
- [x] Click on the session while AutoPilot is still executing — verify
SSE connects and messages stream in real-time
- [x] Verify that after AutoPilot completes, the session shows as
complete (no active_stream)
- [x] Test reconnection: disconnect and reconnect while AutoPilot is
running — verify stream resumes (found and fixed GeneratorExit bug that
caused stuck sessions)
- [x] E2E: 10 stream events published to Redis (StreamStart,
3×ToolInput, 3×ToolOutput, TextStart, TextEnd, StreamFinish)
- [x] E2E: Redis xadd latency 0.2–3.4ms per chunk
- [x] E2E: Chat sessions registered in Redis (confirmed via redis-cli)
## Summary
- Allow `/tmp` as a valid writable directory in E2B sandbox file tools
(`write_file`, `read_file`, `edit_file`, `glob`, `grep`)
- The E2B sandbox is already fully isolated, so restricting writes to
only `/home/user` was unnecessarily limiting — scripts and tools
commonly use `/tmp` for temporary files
- Extract `is_within_allowed_dirs()` helper in `context.py` to
centralize the allowed-directory check for both path resolution and
symlink escape detection
## Changes
- `context.py`: Add `E2B_ALLOWED_DIRS` tuple and `E2B_ALLOWED_DIRS_STR`,
introduce `is_within_allowed_dirs()`, update `resolve_sandbox_path()` to
use it
- `e2b_file_tools.py`: Update `_check_sandbox_symlink_escape()` to use
`is_within_allowed_dirs()`, update tool descriptions
- Tests: Add coverage for `/tmp` paths in both `context_test.py` and
`e2b_file_tools_test.py`
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] All 59 existing + new tests pass (`poetry run pytest
backend/copilot/context_test.py
backend/copilot/sdk/e2b_file_tools_test.py`)
- [x] `poetry run format` and `poetry run lint` pass clean
- [x] Verify `/tmp` write works in live E2B sandbox
- [x] E2E: Write file to /tmp/test.py in E2B sandbox via copilot
- [x] E2E: Execute script from /tmp — output "Hello, World!"
- [x] E2E: E2B sandbox lifecycle (create, use, pause) works correctly
## Summary
- Filter SDK-provisioned default credentials from credentials API list
endpoints
- Reuse `CredentialsMetaResponse` model from internal router in external
API (removes duplicate `CredentialSummary`)
- Add `is_sdk_default()` helper for identifying platform-provisioned
credentials
- Add `provider_matches()` to credential store for consistent provider
filtering
- Add tests for credential filtering behavior
### Changes
- `backend/data/model.py` — add `is_sdk_default()` helper
- `backend/api/features/integrations/router.py` — filter SDK defaults
from list endpoints
- `backend/api/external/v1/integrations.py` — reuse
`CredentialsMetaResponse`, filter SDK defaults
- `backend/integrations/credentials_store.py` — add `provider_matches()`
- `backend/sdk/registry.py` — update credential registration
- `backend/api/features/integrations/router_test.py` — new tests
- `backend/api/features/integrations/conftest.py` — test fixtures
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan
#### Test plan:
- [x] Unit tests for credential filtering (`router_test.py`)
- [x] Verify SDK default credentials excluded from API responses
- [x] Verify user-created credentials still returned normally
## Why
Agent generation and building needs a way to test-run agents without
requiring real credentials or producing side effects. Currently, every
execution hits real APIs, consumes credits, and requires valid
credentials — making it impossible to debug or validate agent graphs
during the build phase without real consequences.
## Summary
Adds a `dry_run` execution mode to the copilot's `run_block` and
`run_agent` tools. When `dry_run=True`, every block execution is
simulated by an LLM instead of calling the real service — no real API
calls, no credentials consumed, no side effects.
Inspired by
[Significant-Gravitas/agent-simulator](https://github.com/Significant-Gravitas/agent-simulator).
### How it works
- **`backend/executor/simulator.py`** (new): `simulate_block()` builds a
prompt from the block's name, description, input/output schemas, and
actual input values, then calls `gpt-4o-mini` via the existing
OpenRouter client with JSON mode. Retries up to 5 times on JSON parse
failures. Missing output pins are filled with `None` (or `""` for the
`error` pin). Long inputs (>20k chars) are truncated before sending to
the LLM.
- **`ExecutionContext`**: Added `dry_run: bool = False` field; threaded
through `add_graph_execution()` so graph-level dry runs propagate to
every block execution.
- **`execute_block()` helper**: When `dry_run=True`, the function
short-circuits before any credential injection or credit checks, calls
`simulate_block()`, and returns a `[DRY RUN]`-prefixed
`BlockOutputResponse`.
- **`RunBlockTool`**: New `dry_run` boolean parameter.
- **`RunAgentTool`**: New `dry_run` boolean parameter; passes
`ExecutionContext(dry_run=True)` to graph execution.
### Tests
11 tests in `backend/copilot/tools/test_dry_run.py`:
- Correct output tuples from LLM response
- JSON retry logic (3 total calls when first 2 fail)
- All-retries-exhausted yields `SIMULATOR ERROR`
- Missing output pins filled with `None`/`""`
- No-client case
- Input truncation at 20k chars
- `execute_block(dry_run=True)` skips real `block.execute()`
- Response format: `[DRY RUN]` message, `success=True`
- `dry_run=False` unchanged (real path)
- `RunBlockTool` parameter presence
- `dry_run` kwarg forwarding
## Test plan
- [x] Run `pytest backend/copilot/tools/test_dry_run.py -v` — all 11
pass
- [x] Call `run_block` with `dry_run=true` in copilot; verify no real
API calls occur and output contains `[DRY RUN]`
- [x] Call `run_agent` with `dry_run=true`; verify execution is created
with `dry_run=True` in context
- [x] E2E: Simulate button (flask icon) present in builder alongside
play button
- [x] E2E: Simulated run labeled with "(Simulated)" suffix and badge in
Library
- [x] E2E: No credits consumed during dry-run
## Summary
**Critical CI fix** — litellm was compromised in a supply chain attack
(versions 1.82.7/1.82.8 contained infostealer malware) and PyPI
subsequently yanked many litellm versions including the 1.7x range that
stagehand 0.5.x depended on. This breaks `poetry lock` in CI for all
PRs.
- Bump `stagehand` from `^0.5.1` to `^3.4.0` — Stagehand v3 is a
Stainless-generated HTTP API client that **no longer depends on
litellm**, completely removing litellm from our dependency tree
- Migrate stagehand blocks to use `AsyncStagehand` + session-based API
(`sessions.start`, `session.navigate/act/observe/extract`)
- Net reduction of ~430 lines in `poetry.lock` from dropping litellm and
its transitive dependencies
## Why
All CI pipelines are blocked because `poetry lock` fails to resolve
yanked litellm versions that stagehand 0.5.x required.
## Test plan
- [x] CI passes (poetry lock resolves, backend tests green)
- [ ] Verify stagehand blocks still function with the new session-based
API
## Summary
- Implements **infrastructure-level parallel tool execution** for
CoPilot: all tools called in a single LLM turn now execute concurrently
with zero changes to individual tool implementations or LLM prompts.
- Adds `pre_launch_tool_call()` to `tool_adapter.py`: when an
`AssistantMessage` with `ToolUseBlock`s arrives, all tools are
immediately fired as `asyncio.Task`s before the SDK dispatches MCP
handlers. Each MCP handler then awaits its pre-launched task instead of
executing fresh.
- Adds a `_tool_task_queues` `ContextVar` (initialized per-session in
`set_execution_context()`) so concurrent sessions never share task
queues.
- DRY refactor: extracts `prepare_block_for_execution()`,
`check_hitl_review()`, and `BlockPreparation` dataclass into
`helpers.py` so the execution pipeline is reusable.
- 10 unit tests for the parallel pre-launch infrastructure (queue
enqueue/dequeue, MCP prefix stripping, fallback path, `CancelledError`
handling, multi-same-tool FIFO ordering).
## Root cause
The Claude Agent SDK CLI sends MCP tool calls as sequential
request-response pairs: it waits for each `control_response` before
issuing the next `mcp_message`. Even though Python dispatches handlers
with `start_soon`, the CLI never issues call B until call A's response
is sent — blocks always ran sequentially. The pre-launch pattern fixes
this at the infrastructure level by starting all tasks before the SDK
even dispatches the first handler.
## Test plan
- [x] `poetry run pytest backend/copilot/sdk/tool_adapter_test.py` — 27
tests pass (10 new parallel infra tests)
- [x] `poetry run pytest backend/copilot/tools/helpers_test.py` — 20
tests pass
- [x] `poetry run pytest backend/copilot/tools/run_block_test.py
backend/copilot/tools/test_run_block_details.py` — all pass
- [x] Manually test in CoPilot: ask the agent to run two blocks
simultaneously — verify both start executing before either completes
- [x] E2E: Both GetCurrentTimeBlock and CalculatorBlock executed
concurrently (time=09:35:42, 42×7=294)
- [x] E2E: Pre-launch mechanism active — two run_block events at same
timestamp (3ms apart)
- [x] E2E: Arg-mismatch fallback tested — system correctly cancels and
falls back to direct execution
## Summary
- Renames `SmartDecisionMakerBlock` to `OrchestratorBlock` across the
entire codebase
- The block supports iteration/agent mode and general tool
orchestration, so "Smart Decision Maker" no longer accurately describes
its capabilities
- Block UUID (`3b191d9f-356f-482d-8238-ba04b6d18381`) remains unchanged
— fully backward compatible with existing graphs
## Changes
- Renamed block class, constants, file names, test files, docs, and
frontend enum
- Updated copilot agent generator (helpers, validator, fixer) references
- Updated agent generation guide documentation
- No functional changes — pure rename refactor
### For code changes
- [x] I have clearly listed my changes in the PR description
- [x] I have made corresponding changes to the documentation
- [x] My changes do not generate new warnings or errors
- [x] New and existing unit tests pass locally with my changes
## Test plan
- [x] All pre-commit hooks pass (typecheck, lint, format)
- [x] Existing graphs with this block continue to load and execute (same
UUID)
- [x] Agent mode / iteration mode works as before
- [x] Copilot agent generator correctly references the renamed block
Requested by @majdyz
Follow-up to #12513. Anthropic/OpenAI 401, 403, and 429 errors are
user-caused (bad API keys, forbidden, rate limits) and should not hit
Sentry as exceptions.
### Changes
**Changes in `blocks/llm.py`:**
- Anthropic `APIError` handler (line ~950): check `status_code` — use
`logger.warning()` for 401/403/429, keep `logger.error()` for server
errors
- Generic `Exception` handler in LLM block `run()` (line ~1467): same
pattern — `logger.warning()` for user-caused status codes,
`logger.exception()` for everything else
- Extracted `USER_ERROR_STATUS_CODES = (401, 403, 429)` module-level
constant
- Added `break` to short-circuit retry loop for user-caused errors
- Removed double-logging from inner Anthropic handler
**Changes in `blocks/test/test_llm.py`:**
- Added 8 regression tests covering 401/403/429 fast-exit and 500 retry
behavior
**Sentry issues addressed:**
- AUTOGPT-SERVER-8B6, 8B7, 8B8 — `[LLM-Block] Anthropic API error: Error
code: 401 - invalid x-api-key`
- Any OpenAI 401/403/429 errors hitting the generic exception handler
Part of SECRT-2166
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan
#### Test plan:
- [x] Unit tests for 401/403/429 Anthropic errors → warning log, no
retry
- [x] Unit tests for 500 Anthropic errors → error log, retry
- [x] Unit tests for 401/403/429 OpenAI errors → warning log, no retry
- [x] Unit tests for 500 OpenAI errors → error log, retry
- [x] Verified USER_ERROR_STATUS_CODES constant is used consistently
- [x] Verified no double-logging in Anthropic handler path
---
Co-authored-by: Zamil Majdy (@majdyz) <zamil.majdy@agpt.co>
---------
Co-authored-by: Zamil Majdy (@majdyz) <zamil.majdy@agpt.co>
## Summary
- Adds `CopilotPermissions` model (`copilot/permissions.py`) — a
capability filter that restricts which tools and blocks the
AutoPilot/Copilot may use during a single execution
- Exposes 4 new `advanced=True` fields on `AutoPilotBlock`: `tools`,
`tools_exclude`, `blocks`, `blocks_exclude`
- Threads permissions through the full execution path: `AutoPilotBlock`
→ `collect_copilot_response` → `stream_chat_completion_sdk` →
`run_block`
- Implements recursion inheritance via contextvar: sub-agent executions
can only be *more* restrictive than their parent
## Design
**Tool filtering** (`tools` + `tools_exclude`):
- `tools_exclude=True` (default): `tools` is a **blacklist** — listed
tools denied, all others allowed. Empty list = allow all.
- `tools_exclude=False`: `tools` is a **whitelist** — only listed tools
are allowed.
- Users specify short names (`run_block`, `web_fetch`, `Read`, `Task`,
…) — mapped to full SDK format internally.
- Validated eagerly at block-run time with a clear error listing valid
names.
**Block filtering** (`blocks` + `blocks_exclude`):
- Same semantics as tool filtering, applied inside `run_block` via
contextvar.
- Each entry can be a full UUID, an 8-char partial UUID (first segment),
or a case-insensitive block name.
- Validated against the live block registry; invalid identifiers surface
a helpful error before the session is created.
**Recursion inheritance**:
- `_inherited_permissions` contextvar stores the parent execution's
permissions.
- On each `AutoPilotBlock.run()`, the child's permissions are merged
with the parent via `merged_with_parent()` — effective allowed sets are
intersected (tools) and the parent chain is kept for block checks.
- Sub-agents can never expand what the parent allowed.
## Test plan
- [x] 68 new unit tests in `copilot/permissions_test.py` and
`blocks/autopilot_permissions_test.py`
- [x] Block identifier matching: full UUID, partial UUID, name,
case-insensitivity
- [x] Tool allow/deny list semantics including edge cases (empty list,
unknown tool)
- [x] Parent/child merging and recursion ceiling correctness
- [x] `validate_tool_names` / `validate_block_identifiers` with mock
block registry
- [x] `apply_tool_permissions` SDK tool-list integration
- [x] `AutoPilotBlock.run()` — invalid tool/block yields error before
session creation
- [x] `AutoPilotBlock.run()` — valid permissions forwarded to
`execute_copilot`
- [x] Existing `AutoPilotBlock` block tests still pass (2/2)
- [x] All hooks pass (pyright, ruff, black, isort)
- [x] E2E: CoPilot chat works end-to-end with E2B sandbox (12s stream)
- [x] E2E: Permission fields render in Builder UI (Tools combobox,
exclude toggles)
- [x] E2E: Agent with restricted permissions (whitelist web_fetch only)
executes correctly
- [x] E2E: Permission values preserved through API round-trip
## Why
Admins cannot download submitted-but-not-yet-approved agents from
`/admin/marketplace`. Clicking "Download" fails silently with a Server
Components render error. This blocks admins from reviewing agents that
companies have submitted.
## What
Remove the redundant ownership/marketplace check from
`get_graph_as_admin()` that was silently tightened in PR #11323 (Nov
2025). Add regression tests for both the admin download path and the
non-admin marketplace access control.
## How
**Root cause:** In PR #11323, Reinier refactored an inline
`StoreListingVersion` query (which had no status filter) into a call to
`is_graph_published_in_marketplace()` (which requires `submissionStatus:
APPROVED`). This was collateral cleanup — his PR focused on sub-agent
execution permissions — but it broke admin download of pending agents.
**Fix:** Remove the ownership/marketplace check from
`get_graph_as_admin()`, keeping only the null guard. This is safe
because `get_graph_as_admin` is only callable through admin-protected
routes (`requires_admin_user` at router level).
**Tests added:**
- `test_admin_can_access_pending_agent_not_owned` — admin can access a
graph they don't own that isn't APPROVED
- `test_admin_download_pending_agent_with_subagents` — admin export
includes sub-graphs
- `test_get_graph_non_owner_approved_marketplace_agent` — protects PR
#11323: non-owners CAN access APPROVED agents
- `test_get_graph_non_owner_pending_marketplace_agent_denied` — protects
PR #11323: non-owners CANNOT access PENDING agents
### Checklist
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] 4 regression tests pass locally
- [x] Admin can download pending agents (verified via unit test)
- [x] Non-admin marketplace access control preserved
## Test plan
- [ ] Verify admin can download a submitted-but-not-approved agent from
`/admin/marketplace`
- [ ] Verify non-admin users still cannot access admin endpoints
- [ ] Verify the download succeeds without console errors
🤖 Generated with [Claude Code](https://claude.com/claude-code)
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> **Medium Risk**
> Changes access-control behavior for admin graph retrieval; risk is
mitigated by route-level admin auth but misuse of `get_graph_as_admin()`
outside admin-protected routes would expose non-approved graphs.
>
> **Overview**
> Admins can now download/review **submitted-but-not-approved**
marketplace agents: `get_graph_as_admin()` no longer enforces ownership
or *marketplace APPROVED* checks, only returning `None` when the graph
doesn’t exist.
>
> Adds regression tests covering the admin download/export path
(including sub-graphs) and confirming non-admin behavior is unchanged:
non-owners can fetch **APPROVED** marketplace graphs but cannot access
**pending** ones.
>
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
a6d2d69ae4. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Summary
- Adds a two-layer circuit breaker to prevent AutoPilot from looping
infinitely when tool calls fail with empty parameters
- **Tool-level**: After 3 consecutive identical failures per tool,
returns a hard-stop message instructing the model to output content as
text instead of retrying
- **Stream-level**: After 6 consecutive empty tool calls (`input: {}`),
aborts the stream entirely with a user-visible error and retry button
## Background
In session `c5548b48`, the model completed all research successfully but
then spent 51+ minutes in an infinite loop trying to write output —
every tool call was sent with `input: {}` (likely due to context
saturation preventing argument serialization). 21+ identical failing
tool calls with no circuit breaker.
## Changes
- `tool_adapter.py`: Added `_check_circuit_breaker`,
`_record_tool_failure`, `_clear_tool_failures` functions with a
`ContextVar`-based tracker. Integrated into both `create_tool_handler`
(BaseTool) and the `_truncating` wrapper (all tools).
- `service.py`: Added empty-tool-call detection in the main stream loop
that counts consecutive `AssistantMessage`s with empty
`ToolUseBlock.input` and aborts after the limit.
- `test_circuit_breaker.py`: 7 unit tests covering threshold behavior,
per-args tracking, reset on success, and uninitialized tracker safety.
## Test plan
- [x] Unit tests pass (`pytest
backend/copilot/sdk/test_circuit_breaker.py` — 8/8 passing)
- [x] Pre-commit hooks pass (Ruff, Black, isort, typecheck all pass)
- [x] E2E: CoPilot tool calls work normally (GetCurrentTimeBlock
returned 09:16:39 UTC)
- [x] E2E: Circuit breaker pass-through verified (successful calls don't
trigger breaker)
- [x] E2E: Circuit breaker code integrated into tool_adapter truncating
wrapper
## Summary
- Add "Critical Requirements" section making screenshots at every step,
PR comment posting, state verification, negative tests, and full
evidence reports non-negotiable
- Add "State Manipulation for Realistic Testing" section with Redis CLI,
DB query, and API before/after patterns
- Strengthen fix mode to require before/after screenshot pairs, rebuild
only affected services, and commit after each fix
- Expand test report format to include API evidence and screenshot
evidence columns
- Bump version to 2.0.0
## Test plan
- [x] Run `/pr-test` on an existing PR and verify it follows the new
critical requirements
- [x] Verify screenshots are posted to PR comment
- [x] Verify fix mode produces before/after screenshot pairs
Bumps the development-dependencies group with 2 updates in the
/autogpt_platform/autogpt_libs directory:
[pytest-cov](https://github.com/pytest-dev/pytest-cov) and
[ruff](https://github.com/astral-sh/ruff).
Updates `pytest-cov` from 7.0.0 to 7.1.0
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/pytest-dev/pytest-cov/blob/master/CHANGELOG.rst">pytest-cov's
changelog</a>.</em></p>
<blockquote>
<h2>7.1.0 (2026-03-21)</h2>
<ul>
<li>
<p>Fixed total coverage computation to always be consistent, regardless
of reporting settings.
Previously some reports could produce different total counts, and
consequently can make --cov-fail-under behave different depending on
reporting options.
See <code>[#641](https://github.com/pytest-dev/pytest-cov/issues/641)
<https://github.com/pytest-dev/pytest-cov/issues/641></code>_.</p>
</li>
<li>
<p>Improve handling of ResourceWarning from sqlite3.</p>
<p>The plugin adds warning filter for sqlite3
<code>ResourceWarning</code> unclosed database (since 6.2.0).
It checks if there is already existing plugin for this message by
comparing filter regular expression.
When filter is specified on command line the message is escaped and does
not match an expected message.
A check for an escaped regular expression is added to handle this
case.</p>
<p>With this fix one can suppress <code>ResourceWarning</code> from
sqlite3 from command line::</p>
<p>pytest -W "ignore:unclosed database in <sqlite3.Connection
object at:ResourceWarning" ...</p>
</li>
<li>
<p>Various improvements to documentation.
Contributed by Art Pelling in
<code>[#718](https://github.com/pytest-dev/pytest-cov/issues/718)
<https://github.com/pytest-dev/pytest-cov/pull/718></code>_ and
"vivodi" in
<code>[#738](https://github.com/pytest-dev/pytest-cov/issues/738)
<https://github.com/pytest-dev/pytest-cov/pull/738></code><em>.
Also closed
<code>[#736](https://github.com/pytest-dev/pytest-cov/issues/736)
<https://github.com/pytest-dev/pytest-cov/issues/736></code></em>.</p>
</li>
<li>
<p>Fixed some assertions in tests.
Contributed by in Markéta Machová in
<code>[#722](https://github.com/pytest-dev/pytest-cov/issues/722)
<https://github.com/pytest-dev/pytest-cov/pull/722></code>_.</p>
</li>
<li>
<p>Removed unnecessary coverage configuration copying (meant as a backup
because reporting commands had configuration side-effects before
coverage 5.0).</p>
</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="66c8a526b1"><code>66c8a52</code></a>
Bump version: 7.0.0 → 7.1.0</li>
<li><a
href="f707662478"><code>f707662</code></a>
Make the examples use pypy 3.11.</li>
<li><a
href="6049a78478"><code>6049a78</code></a>
Make context test use the old ctracer (seems the new sysmon tracer
behaves di...</li>
<li><a
href="8ebf20bbbc"><code>8ebf20b</code></a>
Update changelog.</li>
<li><a
href="861d30e60d"><code>861d30e</code></a>
Remove the backup context manager - shouldn't be needed since coverage
5.0, ...</li>
<li><a
href="fd4c956014"><code>fd4c956</code></a>
Pass the precision on the nulled total (seems that there's some caching
goion...</li>
<li><a
href="78c9c4ecb0"><code>78c9c4e</code></a>
Only run the 3.9 on older deps.</li>
<li><a
href="4849a922e8"><code>4849a92</code></a>
Punctuation.</li>
<li><a
href="197c35e2f3"><code>197c35e</code></a>
Update changelog and hopefully I don't forget to publish release again
:))</li>
<li><a
href="14dc1c92d4"><code>14dc1c9</code></a>
Update examples to use 3.11 and make the adhoc layout example look a bit
more...</li>
<li>Additional commits viewable in <a
href="https://github.com/pytest-dev/pytest-cov/compare/v7.0.0...v7.1.0">compare
view</a></li>
</ul>
</details>
<br />
Updates `ruff` from 0.15.0 to 0.15.7
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/astral-sh/ruff/releases">ruff's
releases</a>.</em></p>
<blockquote>
<h2>0.15.7</h2>
<h2>Release Notes</h2>
<p>Released on 2026-03-19.</p>
<h3>Preview features</h3>
<ul>
<li>Display output severity in preview (<a
href="https://redirect.github.com/astral-sh/ruff/pull/23845">#23845</a>)</li>
<li>Don't show <code>noqa</code> hover for non-Python documents (<a
href="https://redirect.github.com/astral-sh/ruff/pull/24040">#24040</a>)</li>
</ul>
<h3>Rule changes</h3>
<ul>
<li>[<code>pycodestyle</code>] Recognize <code>pyrefly:</code> as a
pragma comment (<code>E501</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/24019">#24019</a>)</li>
</ul>
<h3>Server</h3>
<ul>
<li>Don't return code actions for non-Python documents (<a
href="https://redirect.github.com/astral-sh/ruff/pull/23905">#23905</a>)</li>
</ul>
<h3>Documentation</h3>
<ul>
<li>Add company AI policy to contributing guide (<a
href="https://redirect.github.com/astral-sh/ruff/pull/24021">#24021</a>)</li>
<li>Document editor features for Markdown code formatting (<a
href="https://redirect.github.com/astral-sh/ruff/pull/23924">#23924</a>)</li>
<li>[<code>pylint</code>] Improve phrasing (<code>PLC0208</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/24033">#24033</a>)</li>
</ul>
<h3>Other changes</h3>
<ul>
<li>Use PEP 639 license information (<a
href="https://redirect.github.com/astral-sh/ruff/pull/19661">#19661</a>)</li>
</ul>
<h3>Contributors</h3>
<ul>
<li><a
href="https://github.com/tmimmanuel"><code>@tmimmanuel</code></a></li>
<li><a
href="https://github.com/DimitriPapadopoulos"><code>@DimitriPapadopoulos</code></a></li>
<li><a
href="https://github.com/amyreese"><code>@amyreese</code></a></li>
<li><a href="https://github.com/statxc"><code>@statxc</code></a></li>
<li><a href="https://github.com/dylwil3"><code>@dylwil3</code></a></li>
<li><a
href="https://github.com/hunterhogan"><code>@hunterhogan</code></a></li>
<li><a
href="https://github.com/renovate"><code>@renovate</code></a></li>
</ul>
<h2>Install ruff 0.15.7</h2>
<h3>Install prebuilt binaries via shell script</h3>
<pre lang="sh"><code>curl --proto '=https' --tlsv1.2 -LsSf
https://releases.astral.sh/github/ruff/releases/download/0.15.7/ruff-installer.sh
| sh
</code></pre>
<h3>Install prebuilt binaries via powershell script</h3>
<pre lang="sh"><code>powershell -ExecutionPolicy Bypass -c "irm
https://releases.astral.sh/github/ruff/releases/download/0.15.7/ruff-installer.ps1
| iex"
</tr></table>
</code></pre>
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md">ruff's
changelog</a>.</em></p>
<blockquote>
<h2>0.15.7</h2>
<p>Released on 2026-03-19.</p>
<h3>Preview features</h3>
<ul>
<li>Display output severity in preview (<a
href="https://redirect.github.com/astral-sh/ruff/pull/23845">#23845</a>)</li>
<li>Don't show <code>noqa</code> hover for non-Python documents (<a
href="https://redirect.github.com/astral-sh/ruff/pull/24040">#24040</a>)</li>
</ul>
<h3>Rule changes</h3>
<ul>
<li>[<code>pycodestyle</code>] Recognize <code>pyrefly:</code> as a
pragma comment (<code>E501</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/24019">#24019</a>)</li>
</ul>
<h3>Server</h3>
<ul>
<li>Don't return code actions for non-Python documents (<a
href="https://redirect.github.com/astral-sh/ruff/pull/23905">#23905</a>)</li>
</ul>
<h3>Documentation</h3>
<ul>
<li>Add company AI policy to contributing guide (<a
href="https://redirect.github.com/astral-sh/ruff/pull/24021">#24021</a>)</li>
<li>Document editor features for Markdown code formatting (<a
href="https://redirect.github.com/astral-sh/ruff/pull/23924">#23924</a>)</li>
<li>[<code>pylint</code>] Improve phrasing (<code>PLC0208</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/24033">#24033</a>)</li>
</ul>
<h3>Other changes</h3>
<ul>
<li>Use PEP 639 license information (<a
href="https://redirect.github.com/astral-sh/ruff/pull/19661">#19661</a>)</li>
</ul>
<h3>Contributors</h3>
<ul>
<li><a
href="https://github.com/tmimmanuel"><code>@tmimmanuel</code></a></li>
<li><a
href="https://github.com/DimitriPapadopoulos"><code>@DimitriPapadopoulos</code></a></li>
<li><a
href="https://github.com/amyreese"><code>@amyreese</code></a></li>
<li><a href="https://github.com/statxc"><code>@statxc</code></a></li>
<li><a href="https://github.com/dylwil3"><code>@dylwil3</code></a></li>
<li><a
href="https://github.com/hunterhogan"><code>@hunterhogan</code></a></li>
<li><a
href="https://github.com/renovate"><code>@renovate</code></a></li>
</ul>
<h2>0.15.6</h2>
<p>Released on 2026-03-12.</p>
<h3>Preview features</h3>
<ul>
<li>Add support for <code>lazy</code> import parsing (<a
href="https://redirect.github.com/astral-sh/ruff/pull/23755">#23755</a>)</li>
<li>Add support for star-unpacking of comprehensions (PEP 798) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/23788">#23788</a>)</li>
<li>Reject semantic syntax errors for lazy imports (<a
href="https://redirect.github.com/astral-sh/ruff/pull/23757">#23757</a>)</li>
<li>Drop a few rules from the preview default set (<a
href="https://redirect.github.com/astral-sh/ruff/pull/23879">#23879</a>)</li>
<li>[<code>airflow</code>] Flag <code>Variable.get()</code> calls
outside of task execution context (<code>AIR003</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/23584">#23584</a>)</li>
<li>[<code>airflow</code>] Flag runtime-varying values in DAG/task
constructor arguments (<code>AIR304</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/23631">#23631</a>)</li>
<li>[<code>flake8-bugbear</code>] Implement
<code>delattr-with-constant</code> (<code>B043</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/23737">#23737</a>)</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="0ef39de46c"><code>0ef39de</code></a>
Bump 0.15.7 (<a
href="https://redirect.github.com/astral-sh/ruff/issues/24049">#24049</a>)</li>
<li><a
href="beb543b5c6"><code>beb543b</code></a>
[ty] ecosystem-analyzer: Fail on newly panicking projects (<a
href="https://redirect.github.com/astral-sh/ruff/issues/24043">#24043</a>)</li>
<li><a
href="378fe73092"><code>378fe73</code></a>
Don't show noqa hover for non-Python documents (<a
href="https://redirect.github.com/astral-sh/ruff/issues/24040">#24040</a>)</li>
<li><a
href="b5665bd18e"><code>b5665bd</code></a>
[<code>pylint</code>] Improve phrasing (<code>PLC0208</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/issues/24033">#24033</a>)</li>
<li><a
href="6e20f22190"><code>6e20f22</code></a>
test: migrate <code>show_settings</code> and <code>version</code> tests
to use <code>CliTest</code> (<a
href="https://redirect.github.com/astral-sh/ruff/issues/23702">#23702</a>)</li>
<li><a
href="f99b284c1f"><code>f99b284</code></a>
Drain file watcher events during test setup (<a
href="https://redirect.github.com/astral-sh/ruff/issues/24030">#24030</a>)</li>
<li><a
href="744c996c35"><code>744c996</code></a>
[ty] Filter out unsatisfiable inference attempts during generic call
narrowin...</li>
<li><a
href="16160958bd"><code>1616095</code></a>
[ty] Avoid inferring intersection types for call arguments (<a
href="https://redirect.github.com/astral-sh/ruff/issues/23933">#23933</a>)</li>
<li><a
href="7f275f431b"><code>7f275f4</code></a>
[ty] Pin mypy_primer in <code>setup_primer_project.py</code> (<a
href="https://redirect.github.com/astral-sh/ruff/issues/24020">#24020</a>)</li>
<li><a
href="7255e362e4"><code>7255e36</code></a>
[<code>pycodestyle</code>] Recognize <code>pyrefly:</code> as a pragma
comment (<code>E501</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/issues/24019">#24019</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/astral-sh/ruff/compare/0.15.0...0.15.7">compare
view</a></li>
</ul>
</details>
<br />
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore <dependency name> major version` will close this
group update PR and stop Dependabot creating any more for the specific
dependency's major version (unless you unignore this specific
dependency's major version or upgrade to it yourself)
- `@dependabot ignore <dependency name> minor version` will close this
group update PR and stop Dependabot creating any more for the specific
dependency's minor version (unless you unignore this specific
dependency's minor version or upgrade to it yourself)
- `@dependabot ignore <dependency name>` will close this group update PR
and stop Dependabot creating any more for the specific dependency
(unless you unignore this specific dependency or upgrade to it yourself)
- `@dependabot unignore <dependency name>` will remove all of the ignore
conditions of the specified dependency
- `@dependabot unignore <dependency name> <ignore condition>` will
remove the ignore condition of the specified dependency and ignore
conditions
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Requested by @majdyz
### Why / What / How
**Why:** PR descriptions currently explain the *what* and *how* but not
the *why*. Without motivation context, reviewers can't judge whether an
approach fits the problem. Nick flagged this in standup: "The PR
descriptions you use are explaining the what not the why."
**What:** Adds a consistent Why / What / How structure to PR
descriptions across the entire workflow — template, CLAUDE.md guidance,
and all PR-related skills (`/pr-review`, `/pr-test`, `/pr-address`).
**How:**
- **`.github/PULL_REQUEST_TEMPLATE.md`**: Replaced the old vague
`Changes` heading with a single `Why / What / How` section with guiding
comments
- **`autogpt_platform/CLAUDE.md`**: Added bullet under "Creating Pull
Requests" requiring the Why/What/How structure
- **`.claude/skills/pr-review/SKILL.md`**: Added "Read the PR
description" step before reading the diff, and "Description quality" to
the review checklist
- **`.claude/skills/pr-test/SKILL.md`**: Updated Step 1 to read the PR
description and understand Why/What/How before testing
- **`.claude/skills/pr-address/SKILL.md`**: Added "Read the PR
description" step before fetching comments
## Test plan
- [x] All five files reviewed for correct formatting and consistency
---
Co-authored-by: Zamil Majdy (@majdyz) <zamil.majdy@agpt.co>
## Summary
- Add `DB_STATEMENT_CACHE_SIZE` env var support for Prisma query engine
- Wires through as `statement_cache_size` URL parameter to control the
LRU prepared statement cache per connection in the Rust binary engine
## Why
Live investigation on dev pods showed the Prisma Rust engine growing
from 34MB to 932MB over ~1hr due to unbounded query plan cache. Despite
`pgbouncer=true` in the DATABASE_URL (which should disable caching), the
engine still caches.
This gives explicit control: setting `DB_STATEMENT_CACHE_SIZE=0`
disables the cache entirely.
## Live data (dev)
```
Fresh pod: Python=693MB, Engine=34MB, Total=727MB
Bloated: Python=2.1GB, Engine=932MB, Total=3GB
```
## Infra companion PR
[AutoGPT_cloud_infrastructure#299](https://github.com/Significant-Gravitas/AutoGPT_cloud_infrastructure/pull/299)
sets `DB_STATEMENT_CACHE_SIZE=0` along with `PYTHONMALLOC=malloc` and
memory limit changes.
## Test plan
- [ ] Deploy to dev and monitor Prisma engine memory over 1hr
- [ ] Verify queries still work correctly with cache disabled
- [ ] Compare engine RSS on fresh vs aged pods
## Summary
- Update /pr-test skill to consistently show screenshots inline to the
user with explanations
- Post PR comments with inline images and per-screenshot descriptions
(not just local file paths)
- Simplify GitHub Git API upload flow for screenshot hosting
## Changes
- Step 5: Take screenshots at every significant test step (aim for 1+
per scenario)
- Step 6 (new): Show every screenshot to the user via Read tool with 2-3
sentence explanations
- Step 7: Post PR comment with inline images, summary table, and
per-screenshot context
## Test plan
- [x] Tested end-to-end on PR #12512 — screenshots uploaded and rendered
correctly in PR comment
## Summary
- Replaces the arch-conditional chromium install (ARM64 vs AMD64) with a
single approach: always use the distro-packaged `chromium` and set
`AGENT_BROWSER_EXECUTABLE_PATH=/usr/bin/chromium`
- Removes `agent-browser install` entirely (it downloads Chrome for
Testing, which has no ARM64 binary)
- Removes the `entrypoint.sh` wrapper script that was setting the env
var at runtime
- Updates `autogpt_platform/db/docker/docker-compose.yml`: removes
`external: true` from the network declarations so the Supabase stack can
be brought up standalone (needed for the Docker integration tests in the
test plan below — without this, `docker compose up` fails unless the
platform stack is already running); also sets
`GOTRUE_MAILER_AUTOCONFIRM: true` for local dev convenience (no SMTP
setup required on first run — this compose file is not used in
production)
- Updates `autogpt_platform/docker-compose.platform.yml`: mounts the
`workspace` volume so agent-browser results (screenshots, snapshots) are
accessible from other services; without this the copilot workspace write
fails in Docker
## Verification
Tested via Docker build on arm64 (Apple Silicon):
```
=== Testing agent-browser with system chromium ===
✓ Example Domain
https://example.com/
=== SUCCESS: agent-browser launched with system chromium ===
```
agent-browser navigated to example.com in ~1.5s using system chromium
(v146 from Debian trixie).
## Test plan
- [x] Docker build test on arm64: `agent-browser open
https://example.com` succeeds with system chromium
- [x] Verify amd64 Docker build still works (CI)
## Summary
- Enable one-click import of workflows from other platforms (n8n,
Make.com, Zapier, etc.) into AutoGPT via CoPilot
- **No backend endpoint** — import is entirely client-side: the dialog
reads the file or fetches the n8n template URL, uploads the JSON to the
workspace via `uploadFileDirect`, stores the file reference in
`sessionStorage`, and redirects to CoPilot with `autosubmit=true`
- CoPilot receives the workflow JSON as a proper file attachment and
uses the existing agent-generator pipeline to convert it
- Library dialog redesigned: 2 tabs — "AutoGPT agent" (upload exported
agent JSON) and "Another platform" (file upload + optional n8n URL)
## How it works
1. User uploads a workflow JSON (or pastes an n8n template URL)
2. Frontend fetches/reads the JSON and uploads it to the user's
workspace via the existing file upload API
3. User is redirected to `/copilot?source=import&autosubmit=true`
4. CoPilot picks up the file from `sessionStorage` and sends it as a
`FileUIPart` attachment with a prompt to recreate the workflow as an
AutoGPT agent
## Test plan
- [x] Manual test: import a real n8n workflow JSON via the dialog
- [x] Manual test: paste an n8n template URL and verify it fetches +
converts
- [x] Manual test: import Make.com / Zapier workflow export JSON
- [x] Repeated imports don't cause 409 conflicts (filenames use
`crypto.randomUUID()`)
- [x] E2E: Import dialog has 2 tabs (AutoGPT agent + Another platform)
- [x] E2E: n8n quick-start template buttons present
- [x] E2E: n8n URL input enables Import button on valid URL
- [x] E2E: Workspace upload API returns file_id
Requested by @majdyz
User-caused errors (no payment method, webhook agent invocation, missing
credentials, bad API keys) were hitting Sentry via `logger.exception()`
in the `ValueError` handler, creating noise that obscures real bugs.
Additionally, a frontend crash on the copilot page (BUILDER-71J) needed
fixing.
**Changes:**
**Backend — rest_api.py**
- Set `log_error=False` for the `ValueError` exception handler (line
278), consistent with how `FolderValidationError` and `NotFoundError`
are already handled. User-caused 400 errors no longer trigger
`logger.exception()` → Sentry.
**Backend — executor/manager.py**
- Downgrade `ExecutionManager` input validation skip errors from `error`
to `warning` level. Missing credentials is expected user behavior, not
an internal error.
**Backend — blocks/llm.py**
- Sanitize unpaired surrogates in LLM prompt content before sending to
provider APIs. Prevents `UnicodeEncodeError: surrogates not allowed`
when httpx encodes the JSON body (AUTOGPT-SERVER-8AX).
**Frontend — package.json**
- Upgrade `ai` SDK from `6.0.59` to `6.0.134` to fix BUILDER-71J
(`TypeError: undefined is not an object (evaluating
'this.activeResponse.state')` on /copilot page). This is a known issue
in the Vercel AI SDK fixed in later patch versions.
**Sentry issues addressed:**
- `No payment method found` (ValueError → 400)
- `This agent is triggered by an external event (webhook)` (ValueError →
400)
- `Node input updated with non-existent credentials` (ValueError → 400)
- `[ExecutionManager] Skip execution, input validation error: missing
input {credentials}`
- `UnicodeEncodeError: surrogates not allowed` (AUTOGPT-SERVER-8AX)
- `TypeError: activeResponse.state` (BUILDER-71J)
Resolves SECRT-2166
---
Co-authored-by: Zamil Majdy (@majdyz) <zamil.majdy@agpt.co>
---------
Co-authored-by: Zamil Majdy (@majdyz) <zamil.majdy@agpt.co>
## Summary
Reduce CoPilot per-turn token overhead by systematically trimming tool
descriptions, parameter schemas, and system prompt content. All 35 MCP
tool schemas are passed on every SDK call — this PR reduces their size.
### Strategy
1. **Tool descriptions**: Trimmed verbose multi-sentence explanations to
concise single-sentence summaries while preserving meaning
2. **Parameter schemas**: Shortened parameter descriptions to essential
info, removed some `default` values (handled in code)
3. **System prompt**: Condensed `_SHARED_TOOL_NOTES` and storage
supplement template in `prompting.py`
4. **Cross-tool references**: Removed duplicate workflow hints (e.g.
"call find_block before run_block" appeared in BOTH tools — kept only in
the dependent tool). Critical cross-tool references retained (e.g.
`continue_run_block` in `run_block`, `fix_agent_graph` in
`validate_agent`, `get_doc_page` in `search_docs`, `web_fetch`
preference in `browser_navigate`)
### Token Impact
| Metric | Before | After | Reduction |
|--------|--------|-------|-----------|
| System Prompt | ~865 tokens | ~497 tokens | 43% |
| Tool Schemas | ~9,744 tokens | ~6,470 tokens | 34% |
| **Grand Total** | **~10,609 tokens** | **~6,967 tokens** | **34%** |
Saves **~3,642 tokens per conversation turn**.
### Key Decisions
- **Mostly description changes**: Tool logic, parameters, and types
unchanged. However, some schema-level `default` fields were removed
(e.g. `save` in `customize_agent`) — these are machine-readable
metadata, not just prose, and may affect LLM behavior.
- **Quality preserved**: All descriptions still convey what the tool
does and essential usage patterns
- **Cross-references trimmed carefully**: Kept prerequisite hints in the
dependent tool (run_block mentions find_block) but removed the reverse
(find_block no longer mentions run_block). Critical cross-tool guidance
retained where removal would degrade model behavior.
- **`run_time` description fixed**: Added missing supported values
(today, last 30 days, ISO datetime) per review feedback
### Future Optimization
The SDK passes all 35 tools on every call. The MCP protocol's
`list_tools()` handler supports dynamic tool registration — a follow-up
PR could implement lazy tool loading (register core tools + a discovery
meta-tool) to further reduce per-turn token cost.
### Changes
- Trimmed descriptions across 25 tool files
- Condensed `_SHARED_TOOL_NOTES` and `_build_storage_supplement` in
`prompting.py`
- Fixed `run_time` schema description in `agent_output.py`
### Checklist
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] All 273 copilot tests pass locally
- [x] All 35 tools load and produce valid schemas
- [x] Before/after token dumps compared
- [x] Formatting passes (`poetry run format`)
- [x] CI green
## Summary
- Adds `/pr-test` skill for automated E2E testing of PRs using docker
compose, agent-browser, and API calls
- Covers full environment setup (copy .env, configure copilot auth,
ARM64 Docker fix)
- Includes browser UI testing, direct API testing, screenshot capture,
and test report generation
- Has `--fix` mode for auto-fixing bugs found during testing (similar to
`/pr-address`)
- **Screenshot uploads use GitHub Git API** (blobs → tree → commit →
ref) — no local git operations, safe for worktrees
- **Subscription mode improvements:**
- Extract subscription auth logic to `sdk/subscription.py` — uses SDK's
bundled CLI binary instead of requiring `npm install -g
@anthropic-ai/claude-code`
- Auto-provision `~/.claude/.credentials.json` from
`CLAUDE_CODE_OAUTH_TOKEN` env var on container startup — no `claude
login` needed in Docker
- Add `scripts/refresh_claude_token.sh` — cross-platform helper
(macOS/Linux/Windows) to extract OAuth tokens from host and update
`backend/.env`
## Test plan
- [x] Validated skill on multiple PRs (#12482, #12483, #12499, #12500,
#12501, #12440, #12472) — all test scenarios passed
- [x] Confirmed screenshot upload via GitHub Git API renders correctly
on all 7 PRs
- [x] Verified subscription mode E2E in Docker:
`refresh_claude_token.sh` → `docker compose up` → copilot chat responds
correctly with no API keys (pure OAuth subscription)
- [x] Verified auto-provisioning of credentials file inside container
from `CLAUDE_CODE_OAUTH_TOKEN` env var
- [x] Confirmed bundled CLI detection
(`claude_agent_sdk._bundled/claude`) works without system-installed
`claude`
- [x] `poetry run pytest backend/copilot/sdk/service_test.py` — 24/24
tests pass
## Summary
Fixes 5 high-priority Sentry alerts from production:
- **AUTOGPT-SERVER-8AM**: Fix `TypeError: TypedDict does not support
instance and class checks` — `_value_satisfies_type` in `type.py` now
handles TypedDict classes that don't support `isinstance()` checks
- **AUTOGPT-SERVER-8AN**: Fix `ValueError: No payment method found`
triggering Sentry error — catch the expected ValueError in the
auto-top-up endpoint and return HTTP 422 instead
- **BUILDER-7F5**: Fix `Upload failed (409): File already exists` — add
`overwrite` query param to workspace upload endpoint and set it to
`true` from the frontend direct-upload
- **BUILDER-7F0**: Fix `LaTeX-incompatible input` KaTeX warnings
flooding Sentry — set `strict: false` on rehype-katex plugin to suppress
warnings for unrecognized Unicode characters
- **AUTOGPT-SERVER-89N**: Fix `Tool execution with manager failed:
validation error for dict[str,list[any]]` — make RPC return type
validation resilient (log warning instead of crash) and downgrade
SmartDecisionMaker tool execution errors to warnings
## Test plan
- [ ] Verify TypedDict type coercion works for
GithubMultiFileCommitBlock inputs
- [ ] Verify auto-top-up without payment method returns 422, not 500
- [ ] Verify file re-upload in copilot succeeds (overwrites instead of
409)
- [ ] Verify LaTeX rendering with Unicode characters doesn't produce
console warnings
- [ ] Verify SmartDecisionMaker tool execution failures are logged at
warning level
Requested by @kcze
When a user closes the OAuth sign-in popup without completing
authentication, the 'Waiting on sign-in process' modal was stuck open
with no way to dismiss it, forcing a page refresh.
Two bugs caused this:
1. `oauth-popup.ts` had no detection for the popup being closed by the
user. The promise would hang until the 5-minute timeout.
2. The modal's cancel button aborted a disconnected `AbortController`
instead of the actual OAuth flow's abort function, so clicking
cancel/close did nothing.
### Changes
- Add `popup.closed` polling (500ms) in `openOAuthPopup()` that rejects
the promise when the user closes the auth window
- Add reject-on-abort so the cancel button properly terminates the flow
- Replace the disconnected `oAuthPopupController` with a direct
`cancelOAuthFlow()` function that calls the real abort ref
- Handle popup-closed and user-canceled as silent cancellations (no
error toast)
### Testing
Tested manually ✅
- [x] Start OAuth flow → close popup window → modal dismisses
automatically ✅
- [x] Start OAuth flow → click cancel on modal → popup closes, modal
dismisses ✅
- [x] Complete OAuth flow normally → works as before ✅
Resolves SECRT-2054
---
Co-authored-by: Krzysztof Czerwinski (@kcze)
<krzysztof.czerwinski@agpt.co>
---------
Co-authored-by: Krzysztof Czerwinski <kpczerwinski@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Summary
### Before
<img width="500" height="501" alt="Screenshot 2026-03-20 at 21 50 31"
src="https://github.com/user-attachments/assets/6154cffb-6772-4c3d-a703-527c8ca0daff"
/>
### After
<img width="500" height="581" alt="Screenshot 2026-03-20 at 21 33 12"
src="https://github.com/user-attachments/assets/2f9bd69d-30c5-4d06-ad1e-ed76b184afe5"
/>
### Other minor fixes
- minor spacing adjustments in creator/search pages when empty and
between sections
### Summary
- Increase StoreCard height from 25rem to 26.5rem to prevent content
overflow
- Replace manual tooltip-based title truncation with `OverflowText`
component in StoreCard
- Adjust carousel indicator positioning and hide it on md+ when exactly
3 featured agents are shown
## Test plan
- [x] Verify marketplace cards display without text overflow
- [x] Verify featured section carousel indicators behave correctly
- [x] Check responsive behavior at common breakpoints
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Summary
<img width="1487" height="670" alt="Screenshot 2026-03-20 at 00 52 58"
src="https://github.com/user-attachments/assets/f09de2a0-3c5b-4bce-b6f4-8a853f6792cf"
/>
- Move the download button from inline next to "Add to library" to a
separate line below it
- Add contextual text: "Want to use this agent locally? Download here"
- Style the "Download here" as a violet ghost button link with the
download icon
## Test plan
- [ ] Visit a marketplace agent page
- [ ] Verify "Add to library" button renders in its row
- [ ] Verify "Want to use this agent locally? Download here" appears
below it
- [ ] Click "Download here" and confirm the agent downloads correctly
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reduces `line-clamp` from 3 to 2 on the marketplace `StoreCard`
description to prevent text from overlapping with the
absolutely-positioned run count and +Add button at the bottom of the
card.
Resolves SECRT-2156.
---
Co-authored-by: Abhimanyu Yadav (@Abhi1992002)
<122007096+Abhi1992002@users.noreply.github.com>
## Summary
- Fixes SmartDecisionMakerBlock conversation management to work with
OpenAI's Responses API, which was introduced in #12099 (commit 1240f38)
- The migration to `responses.create` updated the outbound LLM call but
missed the conversation history serialization — the `raw_response` is
now the entire `Response` object (not a `ChatCompletionMessage`), and
tool calls/results use `function_call` / `function_call_output` types
instead of role-based messages
- This caused a 400 error on the second LLM call in agent mode:
`"Invalid value: ''. Supported values are: 'assistant', 'system',
'developer', and 'user'."`
### Changes
**`smart_decision_maker.py`** — 6 functions updated:
| Function | Fix |
|---|---|
| `_convert_raw_response_to_dict` | Detects Responses API `Response`
objects, extracts output items as a list |
| `_get_tool_requests` | Recognizes `type: "function_call"` items |
| `_get_tool_responses` | Recognizes `type: "function_call_output"`
items |
| `_create_tool_response` | New `responses_api` kwarg produces
`function_call_output` format |
| `_update_conversation` | Handles list return from
`_convert_raw_response_to_dict` |
| Non-agent mode path | Same list handling for traditional execution |
**`test_smart_decision_maker_responses_api.py`** — 61 tests covering:
- Every branch of all 6 affected helper functions
- Chat Completions, Anthropic, and Responses API formats
- End-to-end agent mode and traditional mode conversation validity
## Test plan
- [x] 61 new unit tests all pass
- [x] 11 existing SmartDecisionMakerBlock tests still pass (no
regressions)
- [x] All pre-commit hooks pass (ruff, black, isort, pyright)
- [ ] CI integration tests
🤖 Generated with [Claude Code](https://claude.com/claude-code)
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> **Medium Risk**
> Updates core LLM invocation and agent conversation/tool-call
bookkeeping to match OpenAI’s Responses API, which can affect tool
execution loops and prompt serialization across providers. Risk is
mitigated by extensive new unit tests, but regressions could surface in
production agent-mode flows or token/usage accounting.
>
> **Overview**
> **Migrates OpenAI calls from Chat Completions to the Responses API
end-to-end**, including tool schema conversion, output parsing,
reasoning/text extraction, and updated token usage fields in
`LLMResponse`.
>
> **Fixes SmartDecisionMakerBlock conversation/tool handling for
Responses API** by treating `raw_response` as a Response object
(splitting it into `output` items for replay), recognizing
`function_call`/`function_call_output` entries, and emitting tool
outputs in the correct Responses format to prevent invalid follow-up
prompts.
>
> Also adjusts prompt compaction/token estimation to understand
Responses API tool items, changes
`get_execution_outputs_by_node_exec_id` to return list-valued
`CompletedBlockOutput`, removes `gpt-3.5-turbo` from model/cost/docs
lists, and adds focused unit tests plus a lightweight `conftest.py` to
run these tests without the full server stack.
>
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
ff292efd3d. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Otto <otto@agpt.co>
Co-authored-by: Krzysztof Czerwinski <kpczerwinski@gmail.com>
Requested by @majdyz
Adds TDD (test-driven development) guidance to CLAUDE.md files so Claude
Code follows a test-first workflow when fixing bugs or adding features.
**Changes:**
- **Parent `CLAUDE.md`**: Cross-cutting TDD workflow — write a failing
`xfail` test, implement the fix, remove the marker
- **Backend `CLAUDE.md`**: Concrete pytest example with
`@pytest.mark.xfail` pattern
- **Frontend `CLAUDE.md`**: Note about using Playwright `.fixme`
annotation for bug-fix tests
The workflow is: write a failing test first → confirm it fails for the
right reason → implement → confirm it passes. This ensures every bug fix
is covered by a test that would have caught the regression.
---
Co-authored-by: Zamil Majdy (@majdyz) <zamil.majdy@agpt.co>
Reverts Significant-Gravitas/AutoGPT#12099
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> **Medium Risk**
> Reverts the OpenAI integration in `llm_call` from the Responses API
back to `chat.completions`, which can change tool-calling, JSON-mode
behavior, and token accounting across core AI blocks. The change is
localized but touches the primary LLM execution path and associated
tests/docs.
>
> **Overview**
> Reverts the OpenAI path in `backend/blocks/llm.py` from the Responses
API back to `chat.completions`, including updating JSON-mode
(`response_format`), tool handling, and usage extraction to match the
Chat Completions response shape.
>
> Removes the now-unused `backend/util/openai_responses.py` helpers and
their unit tests, updates LLM tests to mock `chat.completions.create`,
and adds `gpt-3.5-turbo` to the supported model list, cost config, and
LLM docs.
>
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
7d6226d10e. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
## Summary
Reverts the invite system PRs due to security gaps identified during
review:
- The move from Supabase-native `allowed_users` gating to
application-level gating allows orphaned Supabase auth accounts (valid
JWT without a platform `User`)
- The auth middleware never verifies `User` existence, so orphaned users
get 500s instead of clean 403s
- OAuth/Google SSO signup completely bypasses the invite gate
- The DB trigger that atomically created `User` + `Profile` on signup
was dropped in favor of a client-initiated API call, introducing a
failure window
### Reverted PRs
- Reverts #12347 — Foundation: InvitedUser model, invite-gated signup,
admin UI
- Reverts #12374 — Tally enrichment: personalized prompts from form
submissions
- Reverts #12451 — Pre-check: POST /auth/check-invite endpoint
- Reverts #12452 (collateral) — Themed prompt categories /
SuggestionThemes UI. This PR built on top of #12374's
`suggested_prompts` backend field and `/chat/suggested-prompts`
endpoint, so it cannot remain without #12374. The copilot empty session
falls back to hardcoded default prompts.
### Migration
Includes a new migration (`20260319120000_revert_invite_system`) that:
- Drops the `InvitedUser` table and its enums (`InvitedUserStatus`,
`TallyComputationStatus`)
- Restores the `add_user_and_profile_to_platform()` trigger on
`auth.users`
- Backfills `User` + `Profile` rows for any auth accounts created during
the invite-gate window
### What's NOT reverted
- The `generate_username()` function (never dropped, still used by
backfill migration)
- The old `add_user_to_platform()` function (superseded by
`add_user_and_profile_to_platform()`)
- PR #12471 (admin UX improvements) — was never merged, no action needed
## Test plan
- [x] Verify migration: `InvitedUser` table dropped, enums dropped,
trigger restored
- [x] Verify backfill: no orphaned auth users, no users without Profile
- [x] Verify existing users can still log in (email + OAuth)
- [x] Verify CoPilot chat page loads with default prompts
- [ ] Verify new user signup creates `User` + `Profile` via the restored
trigger
- [ ] Verify admin `/admin/users` page loads without crashing
- [ ] Run backend tests: `poetry run test`
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
## Summary
<img width="400" height="339" alt="Screenshot 2026-03-19 at 22 53 23"
src="https://github.com/user-attachments/assets/2fa76b8f-424d-4764-90ac-b7a331f5f610"
/>
<img width="600" height="595" alt="Screenshot 2026-03-19 at 22 53 31"
src="https://github.com/user-attachments/assets/23f51cc7-b01e-4d83-97ba-2c43683877db"
/>
<img width="800" height="523" alt="Screenshot 2026-03-19 at 22 53 36"
src="https://github.com/user-attachments/assets/1e447b9a-1cca-428c-bccd-1730f1670b8e"
/>
Now that we have the `Give feedback` button on the Navigation bar,
collpase some of the links below `1280px` so there is more space and
they don't collide with each other...
- Collapse navbar link text to icon-only below 1280px (`xl` breakpoint)
to prevent crowding
- Wallet button shows only the wallet icon below 1280px instead of "Earn
credits" text
- Feedback button shows only the chat icon below 1280px instead of "Give
Feedback" text
- Added `whitespace-nowrap` to feedback button to prevent wrapping
## Changes
- `NavbarLink.tsx`: `lg:block` → `xl:block` for link text
- `Wallet.tsx`: `md:hidden`/`md:inline-block` →
`xl:hidden`/`xl:inline-block`
- `FeedbackButton.tsx`: wrap text in `hidden xl:inline` span, add
`whitespace-nowrap`
## Test plan
- [ ] Resize browser between 1024px–1280px and verify navbar shows only
icons
- [ ] At 1280px+ verify full text labels appear for links, wallet, and
feedback
- [ ] Verify mobile navbar still works correctly below `md` breakpoint
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Summary
https://github.com/user-attachments/assets/13da6d36-5f35-429b-a6cf-e18316bb8709
Replaces the flat list of suggestion pills in the CoPilot empty session
with themed prompt categories (Learn, Create, Automate, Organize), each
shown as a popover with contextual prompts.
- **Backend**: Changes `suggested_prompts` from a flat `list[str]` to a
themed `dict[str, list[str]]` keyed by category. Updates Tally
extraction LLM prompt to generate prompts per theme, and the
`/suggested-prompts` API to return grouped themes. Legacy `list[str]`
rows are preserved under a `"General"` key for backward compatibility.
- **Frontend**: Replaces inline pill buttons with a `SuggestionThemes`
popover component. Each theme button (with icon) opens a dropdown of 5
relevant prompts. Falls back to hardcoded defaults when the API has no
personalized prompts. Normalizes partial API responses by padding
missing themes with defaults. Legacy `"General"` prompts are distributed
round-robin across themes so existing users keep their personalized
suggestions.
### Changes 🏗️
- `backend/data/understanding.py`: `suggested_prompts` field changed
from `list[str]` to `dict[str, list[str]]`; legacy list rows preserved
under `"General"` key; list items validated as strings
- `backend/data/tally.py`: LLM prompt updated to generate themed
prompts; validation now per-theme with blank-string rejection
- `backend/api/features/chat/routes.py`: New `SuggestedTheme` model;
endpoint returns `themes[]`
- `frontend/copilot/components/EmptySession/EmptySession.tsx`: Uses
generated API types directly (no cast)
- `frontend/copilot/components/EmptySession/helpers.ts`:
`DEFAULT_THEMES` replaces `DEFAULT_QUICK_ACTIONS`; `getSuggestionThemes`
normalizes partial API responses and distributes legacy `"General"`
prompts across themes
-
`frontend/copilot/components/EmptySession/components/SuggestionThemes/`:
New popover component with theme icons and loading states
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Verify themed suggestion buttons render on CoPilot empty session
- [x] Click each theme button and confirm popover opens with prompts
- [x] Click a prompt and confirm it sends the message
- [x] Verify fallback to default themes when API returns no custom
prompts
- [x] Verify legacy users' personalized prompts are preserved and
visible
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
## Summary
Improves the `pr-address` skill with two fixes:
- **Full comment thread loading**: Adds `--paginate` to the inline
comments fetch and explicit instructions to reconstruct threads using
`in_reply_to_id`, reading root-to-last-reply before acting. Previously,
only the opening comment was visible — missing reviewer replies led to
wrong fixes.
- **Backtick-safe PR descriptions**: Adds instructions to write the PR
body to a temp file via `<<'PREOF'` heredoc before passing to `gh pr
edit/create`. Inlining the body directly causes backticks to be
shell-escaped, breaking markdown rendering.
## Test plan
- [ ] Run `/pr-address` on a PR with multi-reply inline comment threads
— verify the last reply is what gets acted on
- [ ] Update a PR description containing backticks — verify they render
correctly in GitHub
When OpenAI credentials are unavailable (fork PRs, dev envs without API
keys), both builder block search and store agent functionality break:
1. **Block search returns wrong results.** `unified_hybrid_search` falls
back to a zero vector when embedding generation fails. With ~200 blocks
in `UnifiedContentEmbedding`, the zero-vector semantic scores are
garbage, and lexical matching on short block names is too weak — "Store
Value" doesn't appear in the top results for query "Store Value".
2. **Store submission approval fails entirely.**
`review_store_submission` calls `ensure_embedding()` inside a
transaction. When it throws, the entire transaction rolls back — no
store submissions get approved, the `StoreAgent` materialized view stays
empty, and all marketplace e2e tests fail.
3. **Store search returns nothing.** Even when store data exists,
`hybrid_search` queries `UnifiedContentEmbedding` which has no store
agent rows (backfill failed). It succeeds with zero results rather than
throwing, so the existing exception-based fallback never triggers.
### Changes 🏗️
- Replace `unified_hybrid_search` with in-memory text search in
`_hybrid_search_blocks` (-> `_text_search_blocks`). All ~200 blocks are
already loaded in memory, and `_score_primary_fields` provides correct
deterministic text relevance scoring against block name, description,
and input schema field descriptions — the same rich text the embedding
pipeline uses. CamelCase block names are split via `split_camelcase()`
to match the tokenization from PR #12400.
- Make embedding generation in `review_store_submission` best-effort:
catch failures and log a warning instead of rolling back the approval
transaction. The backfill scheduler retries later when credentials
become available.
- Fall through to direct DB search when `hybrid_search` returns empty
results (not just when it throws). The fallback uses ad-hoc
`to_tsvector`/`plainto_tsquery` with `ts_rank_cd` ranking on
`StoreAgent` view fields, restoring the search quality of the original
pre-hybrid implementation (stemming, stop-word removal, relevance
ranking).
- Fix Playwright artifact upload in end-to-end test CI
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] `build.spec.ts`: 8/8 pass locally (was 0/7 before fix)
- [x] All 79 e2e tests pass in CI (was 15 failures before fix)
---
Co-authored-by: Reinier van der Leer (@Pwuts)
---------
Co-authored-by: Reinier van der Leer <pwuts@agpt.co>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
After the legacy builder was removed in #12082, the TallyPopup component
still showed a "Tutorial" button (bottom-right, next to "Give Feedback")
that navigated to `/build?resetTutorial=true`. Nothing handles that
param anymore, so clicking it did nothing.
This removes the dead button and its associated state/handler from
TallyPopup and useTallyPopup. The working tutorial (Shepherd.js
chalkboard icon in CustomControls) is unaffected.
**Changes:**
- `TallyPopup.tsx`: Remove Tutorial button JSX, unused imports
(`usePathname`, `useSearchParams`), and `isNewBuilder` check
- `useTallyPopup.ts`: Remove `showTutorial` state, `handleResetTutorial`
handler, unused `useRouter` import
Resolves SECRT-2109
---
Co-authored-by: Reinier van der Leer (@Pwuts) <pwuts@agpt.co>
Co-authored-by: Reinier van der Leer (@Pwuts) <pwuts@agpt.co>
### Before
<img width="600" height="489" alt="Screenshot 2026-03-18 at 19 24 41"
src="https://github.com/user-attachments/assets/bb8dc0fa-04cd-4f32-8125-2d7930b4acde"
/>
Formatted headings in user messages would look massive
### After
<img width="600" height="549" alt="Screenshot 2026-03-18 at 19 24 33"
src="https://github.com/user-attachments/assets/51230232-c914-42dd-821f-3b067b80bab4"
/>
Markdown headings (`# H1` through `###### H6`) and setext-style headings
(`====`) in user chat messages rendered at their full HTML heading size,
which looked disproportionately large in the chat bubble context.
### Changes 🏗️
- Added Tailwind CSS overrides on the user message `MessageContent`
wrapper to cap all heading elements (h1-h6) at `text-lg font-semibold`
- Only affects user messages in copilot chat (via `group-[.is-user]`
selector); assistant messages are unchanged
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [ ] Send a user message containing `# Heading 1` through `######
Heading 6` and verify they all render at constrained size
- [ ] Send a message with `====` separator pattern and verify it doesn't
render as a mega H1
- [ ] Verify assistant messages with headings still render normally
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Summary
- When a user has connected GitHub, `GH_TOKEN` is automatically injected
into the Claude Agent SDK subprocess environment so `gh` CLI commands
work without any manual auth step
- When GitHub is **not** connected, the copilot can call a new
`connect_integration(provider="github")` MCP tool, which surfaces the
same credential setup card used by regular GitHub blocks — the user
connects inline without leaving the chat
- After connecting, the copilot is instructed to retry the operation
automatically
## Changes
**Backend**
- `sdk/service.py`: `_get_github_token_for_user()` fetches OAuth2 or API
key credentials and injects `GH_TOKEN` + `GITHUB_TOKEN` into `sdk_env`
before the SDK subprocess starts (per-request, thread-safe via
`ClaudeAgentOptions.env`)
- `tools/connect_integration.py`: new `ConnectIntegrationTool` MCP tool
— returns `SetupRequirementsResponse` for a given provider (`github` for
now); extensible via `_PROVIDER_INFO` dict
- `tools/__init__.py`: registers `connect_integration` in
`TOOL_REGISTRY`
- `prompting.py`: adds GitHub CLI / `connect_integration` guidance to
`_SHARED_TOOL_NOTES`
**Frontend**
- `ConnectIntegrationTool/ConnectIntegrationTool.tsx`: thin wrapper
around the existing `SetupRequirementsCard` with a tailored retry
instruction
- `MessagePartRenderer.tsx`: dispatches `tool-connect_integration` to
the new component
## Test plan
- [ ] User with GitHub credentials: `gh pr list` works without any auth
step in copilot
- [ ] User without GitHub credentials: copilot calls
`connect_integration`, card renders with GitHub credential input, after
connecting copilot retries and `gh` works
- [ ] `GH_TOKEN` is NOT leaked across users (injected via
`ClaudeAgentOptions.env`, not `os.environ`)
- [ ] `connect_integration` with unknown provider returns a graceful
error message
## Summary
- On **amd64**: keep `agent-browser install` (Chrome for Testing —
pinned version tested with Playwright) + restore runtime libs
- On **arm64**: install system `chromium` package (Chrome for Testing
has no ARM64 binary) + skip `agent-browser install`
- An entrypoint script sets
`AGENT_BROWSER_EXECUTABLE_PATH=/usr/bin/chromium` at container startup
on arm64 (detected via presence of `/usr/bin/chromium`); on amd64 the
var is left unset so agent-browser uses Chrome for Testing as before
**Why not system chromium on amd64?** `agent-browser install` downloads
a specific Chrome for Testing version pinned to the Playwright version
in use. Using whatever Debian ships on amd64 could cause protocol
compatibility issues.
Introduced by #12301 (cc @Significant-Gravitas/zamil-majdy)
## Test plan
- [ ] `docker compose up --build` succeeds on ARM64 (Apple Silicon)
- [ ] `docker compose up --build` succeeds on x86_64
- [ ] Copilot browser tools (`browser_navigate`, `browser_act`,
`browser_screenshot`) work in a Copilot session on both architectures
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
Requested by @Swiftyos
The invite gate check in `get_or_activate_user()` runs after Supabase
creates the auth user, resulting in orphaned auth accounts with no
platform access when a non-invited user signs up. Users could create a
Supabase account but had no `User`, `Profile`, or `Onboarding` records —
they could log in but access nothing.
### Changes 🏗️
**Backend** (`v1.py`, `invited_user.py`):
- Add public `POST /api/auth/check-invite` endpoint (no auth required —
this is a pre-signup check)
- Add `check_invite_eligibility()` helper in the data layer
- Returns `{allowed: true}` when `enable_invite_gate` is disabled
- Extracted `is_internal_email()` helper to deduplicate `@agpt.co`
bypass logic (was duplicated between route and `get_or_activate_user`)
- Checks `InvitedUser` table for `INVITED` status
- Added IP-based Redis rate limiting (10 req/60 s per IP, fails open if
Redis unavailable, returns HTTP 429 when exceeded)
- Fixed Redis pipeline atomicity: `incr` + `expire` now sent in a single
pipeline round-trip, preventing a TTL-less key if `expire` had
previously failed after `incr`
- Fixed incorrect `await` on `pipe.incr()` / `pipe.expire()` — redis-py
async pipeline queue methods are synchronous; only `execute()` is
awaitable. The erroneous `await` was silently swallowed by the `except`
block, making the rate limiter completely non-functional
**Frontend** (`signup/actions.ts`):
- Call the generated `postV1CheckIfAnEmailIsAllowedToSignUp` client
(replacing raw `fetch`) before `supabase.auth.signUp()`
- `ApiError` (non-OK HTTP responses) logs a Sentry warning with the HTTP
status; network/other errors capture a Sentry exception
- If not allowed, return `not_allowed` error (existing
`EmailNotAllowedModal` handles this)
- Graceful fallback: if the pre-check fails (backend unreachable), falls
through to the existing flow — `get_or_activate_user()` remains as
defense-in-depth
**Tests** (`v1_test.py`, `invited_user_test.py`):
- 5 route-level tests covering: gate disabled → allowed, `@agpt.co`
bypass, eligible email, ineligible email, rate-limit exceeded
- Rate-limit test mock updated to use pipeline interface
(`pipeline().execute()` returns `[count, True]`)
- Existing `invited_user_test.py` updated to cover
`check_invite_eligibility` branches
**Not changed:**
- Google OAuth flow — already gated by OAuth provider settings
- `get_or_activate_user()` — stays as backend safety net
- All admin invite CRUD routes — unchanged
### Test plan
1. Email/password signup with invited email → signup proceeds normally
2. Email/password signup with non-invited email → `EmailNotAllowedModal`
shown, no Supabase user created
3. `enable_invite_gate=false` → all emails allowed
4. Backend unreachable during pre-check → falls through to existing flow
5. Same IP exceeds 10 requests/60 s → HTTP 429 returned
---
Co-authored-by: Craig Swift (@Swiftyos) <craigswift13@gmail.com>
---------
Co-authored-by: Craig Swift (@Swiftyos) <craigswift13@gmail.com>
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
## Summary
- Adds merge conflict detection as step 2 of the polling loop (between
CI check and comment check), including handling of the transient
`"UNKNOWN"` state
- Adds a "Resolving merge conflicts" section with step-by-step
instructions using 3-way merge (no force push needed since PRs are
squash-merged)
- Validates all three git conflict markers before staging to prevent
committing broken code
- Fixes `args` → `argument-hint` in skill frontmatter
## Test plan
- [ ] Verify skill renders correctly in Claude Code
## Summary
When the Claude SDK returns a prompt-too-long error (e.g. transcript +
query exceeds the model's context window), the streaming loop now
retries with escalating fallbacks instead of failing immediately:
1. **Attempt 1**: Use the transcript as-is (normal path)
2. **Attempt 2**: Compact the transcript via LLM summarization
(`compact_transcript`) and retry
3. **Attempt 3**: Drop the transcript entirely and fall back to
DB-reconstructed context (`_build_query_message`)
If all 3 attempts fail, a `StreamError(code="prompt_too_long")` is
yielded to the frontend.
### Key changes
**`service.py`**
- Add `_is_prompt_too_long(err)` — pattern-matches SDK exceptions for
prompt-length errors (`prompt is too long`, `prompt_too_long`,
`context_length_exceeded`, `request too large`)
- Wrap `async with ClaudeSDKClient` in a 3-attempt retry `for` loop with
compaction/fallback logic
- Move `current_message`, `_build_query_message`, and
`_prepare_file_attachments` before the retry loop (computed once,
reused)
- Skip transcript upload in `finally` when `transcript_caused_error`
(avoids persisting a broken/empty transcript)
- Reset `stream_completed` between retry iterations
- Document outer-scope variable contract in `_run_stream_attempt`
closure (which variables are reassigned between retries vs read-only)
**`transcript.py`**
- Add `compact_transcript(content, log_prefix, model)` — converts JSONL
→ messages → `compress_context` (LLM summarization with truncation
fallback) → JSONL
- Add helpers: `_flatten_assistant_content`,
`_flatten_tool_result_content`, `_transcript_to_messages`,
`_messages_to_transcript`, `_run_compression`
- Returns `None` when compaction fails or transcript is already within
budget (signals caller to fall through to DB fallback)
- Truncation fallback wrapped in 30s timeout to prevent unbounded CPU
time on large transcripts
- Accepts `model` parameter to avoid creating a new `ChatConfig()` on
every call
**`util/prompt.py`**
- Fix `_truncate_middle_tokens` edge case: returns empty string when
`max_tok < 1`, properly handles `max_tok < 3`
**`config.py`**
- E2B sandbox timeout raised from 5 min to 15 min to accommodate
compaction retries
**`prompt_too_long_test.py`** (new, 45 tests)
- `_is_prompt_too_long` positive/negative patterns, case sensitivity,
BaseException handling
- Flatten helpers for assistant/tool_result content blocks
- `_transcript_to_messages` / `_messages_to_transcript` roundtrip,
strippable types, empty content
- `compact_transcript` async tests: too few messages, not compacted,
successful compaction, compression failure
**`retry_scenarios_test.py`** (new, 27 tests)
- Full retry state machine simulation covering all 8 scenarios:
1. Normal flow (no retry)
2. Compact succeeds → retry succeeds
3. Compact fails → DB fallback succeeds
4. No transcript → DB fallback succeeds
5. Double fail → DB fallback on attempt 3
6. All 3 attempts exhausted
7. Non-prompt-too-long error (no retry)
8. Compaction returns identical content → DB fallback
- Edge cases: nested exceptions, case insensitivity, unicode content,
large transcripts, resume-after-compaction flow
**Shared test fixtures** (`conftest.py`)
- Extracted `build_test_transcript` helper used across 3 test files to
eliminate duplication
## Test plan
- [x] `_is_prompt_too_long` correctly identifies prompt-length errors (8
positive, 5 negative patterns)
- [x] `compact_transcript` compacts oversized transcripts via LLM
summarization
- [x] `compact_transcript` returns `None` on failure or when already
within budget
- [x] Retry loop state machine: all 8 scenarios verified with state
assertions
- [x] `TranscriptBuilder` works correctly after loading compacted
transcripts
- [x] `_messages_to_transcript` roundtrip preserves content including
unicode
- [x] `transcript_caused_error` prevents stale transcript upload
- [x] Truncation timeout prevents unbounded CPU time
- [x] All 139 unit tests pass locally
- [x] CI green (tests 3.11/3.12/3.13, types, CodeQL, linting)
Requested by @Torantulino
Add `google/nano-banana-2` (Gemini 3.1 Flash Image) support across all
three image blocks.
### Changes
**`ai_image_customizer.py`**
- Add `NANO_BANANA_2 = "google/nano-banana-2"` to `GeminiImageModel`
enum
- Update block description to reference Nano-Banana models generically
**`ai_image_generator_block.py`**
- Add `NANO_BANANA_2` to `ImageGenModel` enum
- Add generation branch (identical to NBP except model name)
**`flux_kontext.py` (AI Image Editor)**
- Rename `FluxKontextModelName` → `ImageEditorModel` (with
backwards-compatible alias)
- Add `NANO_BANANA_PRO` and `NANO_BANANA_2` to the editor
- Model-aware branching in `run_model()`: NB models use `image_input`
list (not `input_image`), no `seed`, and add `output_format`
**`block_cost_config.py`**
- Add NB2 cost entries for all three blocks (14 credits, matching NBP)
- Add NB Pro cost entry for editor block
- Update editor block refs from `.PRO`/`.MAX` to
`.FLUX_KONTEXT_PRO`/`.FLUX_KONTEXT_MAX`
Resolves SECRT-2047
---------
Co-authored-by: Torantulino <Torantulino@users.noreply.github.com>
Co-authored-by: Abhimanyu Yadav <122007096+Abhi1992002@users.noreply.github.com>
## Summary
- treat AddToListBlock.entry as optional rather than truthy so
0/""/False are appended
- extend block self-tests with a falsy entry case
## Testing
- Not run (pytest not available in environment)
Co-authored-by: DEEVEN SERU <144827577+DEVELOPER-DEEVEN@users.noreply.github.com>
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
Fixes issue where filenames with no dots until the end (or massive
extensions) bypassed truncation logic, causing OSError [Errno 36].
Limits extension preservation to 20 chars.
---------
Co-authored-by: DEVELOPER-DEEVEN <144827577+DEVELOPER-DEEVEN@users.noreply.github.com>
- Resolves#10657
- Partially based on #10913
### Changes 🏗️
- Run Pyright separately for each supported Python version
- Move type checking and linting into separate jobs
- Add `--skip-pyright` option to lint script
- Move `linter.py` into `backend/scripts`
- Move other scripts in `backend/` too for consistency
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- CI
---
Co-authored-by: @Joaco2603 <jpappa2603@gmail.com>
---------
Co-authored-by: Joaco2603 <jpappa2603@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Poetry v2.2.1 has bugfixes that are relevant in context of our
`.pre-commit-config.yaml`
### Changes 🏗️
- Update `poetry` from v2.1.1 to v2.2.1 (latest version supported by
Dependabot)
- Re-generate `poetry.lock`
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- CI
## Summary
- Updates the `pr-address` skill to poll for new PR comments while
waiting for CI, instead of blocking solely on `gh pr checks --watch
--fail-fast`
- Runs CI watch in the background and polls all 3 comment endpoints
every 30s
- Allows bot comments (coderabbitai, sentry) to be addressed in parallel
with CI rather than sequentially
## Test plan
- [ ] Run `/pr-address` on a PR with pending CI and verify it detects
new comments while CI is running
- [ ] Verify CI failures are still handled correctly after the combined
wait
* fix(backend): add resource limits to Jinja2 template rendering
Prevent DoS via computational exhaustion in FillTextTemplateBlock by:
- Subclassing SandboxedEnvironment to intercept ** and * operators
with caps on exponent size (1000) and string repeat length (10K)
- Replacing range() global with a capped version (max 10K items)
- Wrapping template.render() in a ThreadPoolExecutor with a 10s
timeout to kill runaway expressions
Addresses GHSA-ppw9-h7rv-gwq9 (CWE-400).
* address review: move helpers after TextFormatter, drop ThreadPoolExecutor
- Move _safe_range and _RestrictedEnvironment below TextFormatter
(helpers after the function that uses them)
- Remove ThreadPoolExecutor timeout wrapper from format_string() —
it has problematic behavior in async contexts and the static
interception (operator caps, range limit) already covers the
known attack vectors
* address review: extend sequence guard, harden format_email, add tests
- Extend * guard to cover list and tuple repetition, not just strings
(blocks {{ [0] * 999999999 }} and {{ (0,) * 999999999 }})
- Rename MAX_STRING_REPEAT → MAX_SEQUENCE_REPEAT
- Use _RestrictedEnvironment in format_email (defense-in-depth)
- Add tests: list repeat, tuple repeat, negative exponent, nested
exponentiation (18 tests total)
* add async timeout wrapper at block level
Wrap format_string calls in FillTextTemplateBlock and AgentOutputBlock
with asyncio.wait_for(asyncio.to_thread(...), timeout=10s).
This provides defense-in-depth: if an expression somehow bypasses the
static operator checks, the async timeout will cancel it. Uses
asyncio.to_thread for proper async integration (no event loop blocking)
and asyncio.wait_for for real cancellation on timeout.
* make format_string async with timeout kwarg
Move asyncio.wait_for + asyncio.to_thread into format_string() itself
with a timeout kwarg (default 10s). This way all callers get the
timeout automatically — no wrapper needed at each call site.
- format_string() is now async, callers use await
- format_email() is now async (calls format_string internally)
- Updated all callers: text.py, io.py, llm.py, smart_decision_maker.py,
email.py, notifications.py
- Tests updated to use asyncio.run()
* use Jinja2 native async rendering instead of to_thread
Switch from asyncio.to_thread(template.render) to Jinja2's native
enable_async=True + template.render_async(). No thread overhead,
proper async integration. asyncio.wait_for timeout still applies.
---------
Co-authored-by: Reinier van der Leer <pwuts@agpt.co>
## Summary
- Add direct `creator/slug` lookup to `find_agent` marketplace search,
bypassing full-text search when an exact identifier is provided
- Add direct UUID lookup to `find_block`, returning the block
immediately when a valid block ID is given
- Update tool descriptions and parameter hints to document the new
lookup capabilities
## Test plan
- [ ] Verify `find_agent` with a `creator/slug` query returns the exact
agent
- [ ] Verify `find_agent` falls back to search when slug lookup fails
- [ ] Verify `find_block` with a block UUID returns the exact block
- [ ] Verify `find_block` with a non-existent UUID falls through to
search
- [ ] Verify excluded block types/IDs are still filtered in direct
lookup
- Prefer f-strings except for debug statements
- Top-down module/function/class ordering
As suggested by @majdyz, this is more effective than commenting on every
single instance on PRs.
## Summary
- **Path validation fix**: `is_allowed_local_path()` now correctly
handles the SDK's nested conversation UUID path structure
(`<encoded-cwd>/<conversation-uuid>/tool-results/<file>`) instead of
only matching `<encoded-cwd>/tool-results/<file>`
- **`read_workspace_file` fallback**: When the model mistakenly calls
`read_workspace_file` for an SDK tool-result path (local disk, not cloud
storage), the tool now falls back to reading from local disk instead of
returning "file not found"
- **Cross-turn cleanup fix**: Stopped deleting
`~/.claude/projects/<encoded-cwd>/` between turns — tool-result files
now persist across `--resume` turns so the model can re-read them. Added
TTL-based stale directory sweeping (24h) to prevent unbounded disk
growth.
- **System prompt**: Added guidance telling the model to use `read_file`
(not `read_workspace_file`) for SDK tool-result paths
- **Symlink escape fix** (e2b_file_tools.py): Added `readlink -f`
canonicalization inside the E2B sandbox to detect symlink-based path
escapes before writes
- **Stash timeout increase**: `wait_for_stash` timeout increased from
0.5s to 2.0s, with a post-timeout `sleep(0)` fallback
### Root cause
Investigated via Langfuse trace `5116befdca6a6ff9a8af6153753e267d`
(session `d5841fd8`). The model ran 3 Perplexity deep research calls,
SDK truncated large outputs to `~/.claude/projects/.../tool-results/`
files. Model then called `read_workspace_file` (cloud DB) instead of
`read_file` (local disk), getting "file not found". Additionally, the
path validation check didn't account for the SDK's nested UUID directory
structure, and cleanup between turns deleted tool-result files that the
transcript still referenced.
## Test plan
- [x] All 653 copilot tests pass (excluding 1 pre-existing infra test)
- [x] Security test `test_read_claude_projects_settings_json_denied`
still passes — non-tool-result files under the project dir are still
blocked
- [x] `poetry run format` passes all checks
## Summary
- Teaches the `pr-address` skill to use `gh pr checks --watch
--fail-fast` for efficient CI waiting instead of manual polling
- Adds guidance on investigating failures with `gh run view
--log-failed`
- Adds explicit "between CI waits" section: re-fetch and address new bot
comments while CI runs
## Test plan
- [x] Verified the updated skill renders correctly
- [ ] Use `/pr-address` on a PR with pending CI to confirm the new flow
works
SendEmailBlock accepted user-supplied smtp_server and smtp_port inputs
and passed them directly to smtplib.SMTP() with no IP validation,
bypassing the platform's SSRF protections in request.py.
This fix:
- Makes _resolve_and_check_blocked public in request.py so non-HTTP
blocks can reuse the same IP validation
- Validates the SMTP server hostname via resolve_and_check_blocked()
before connecting
- Restricts allowed SMTP ports to standard values (25, 465, 587, 2525)
- Catches SMTPConnectError and SMTPServerDisconnected to prevent TCP
banner leakage in error messages
Fixes GHSA-4jwj-6mg5-wrwf
* fix(backend): add HMAC signing to Redis cache to prevent pickle deserialization attacks
Add HMAC-SHA256 integrity verification to all values stored in the shared
Redis cache. This prevents cache poisoning attacks where an attacker with
Redis access injects malicious pickled payloads that execute arbitrary code
on deserialization.
Changes:
- Sign pickled values with HMAC-SHA256 before storing in Redis
- Verify HMAC signature before deserializing cached values
- Reject tampered or unsigned (legacy) cache entries gracefully
(treated as cache misses, logged as warnings)
- Derive HMAC key from redis_password or unsubscribe_secret_key
- Add tests for HMAC round-trip, tamper detection, and legacy rejection
Fixes GHSA-rfg2-37xq-w4m9
* improve log message
---------
Co-authored-by: Reinier van der Leer <pwuts@agpt.co>
Replace NamedTemporaryFile(delete=False) with a direct Response,
preventing unbounded disk consumption on the public download endpoint.
Fixes: GHSA-374w-2pxq-c9jp
<!-- Clearly explain the need for these changes: -->
The previous confetti implementation using party-js was causing lag.
Replaced it with canvas-confetti for smoother, more performant
celebrations with enhanced visual effects.
### Changes 🏗️
- **New Confetti Component**: Reusable canvas-confetti wrapper with
AutoGPT purple color palette and Storybook stories demonstrating various
effects
- **Enhanced Wallet Confetti**: Dual simultaneous bursts at 45° and 135°
angles with larger particles (scalar 1.2) for better visibility
- **Enhanced Task Celebration**: Dual-burst confetti for task group and
individual task completion events
- **Onboarding Congrats Page**: Replaced party-js with canvas-confetti
for side-cannon animation effect
- **Dependency**: Added canvas-confetti v1.9.4, removed party-js
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Trigger task completion in wallet to see dual-burst confetti at
45° and 135° angles
- [x] Complete tasks/groups to verify celebration confetti displays with
larger particles
- [x] Visit onboarding congratulations page to see side-cannon effect
- [x] Verify confetti rendering performance and no console errors
### Changes
- Replace skipped legacy builder tests with 8 working Playwright e2e
tests
targeting the new Flow Editor
- Rewrite `BuildPage` page object to match new `data-id`/`data-testid`
selectors
- Update `agent-activity.spec.ts` to use new `BuildPage` API
### Tests added
- Build page loads successfully (canvas + control buttons)
- Add a block via block menu search
- Add multiple blocks
- Remove a block (select + Backspace)
- Save an agent (name/description, verify flowID in URL)
- Save and verify run button becomes enabled
- Copy and paste a node (Cmd+C/V)
- Run an agent from the builder
### Test plan
- [x] All 8 builder tests pass locally (`pnpm test:no-build
src/tests/build.spec.ts`)
- [x] `pnpm format`, `pnpm lint`, `pnpm types` all clean
- [x] CI passes
## Summary
- Detect transient Anthropic API errors (ECONNRESET, "socket connection
was closed unexpectedly") across all error paths in the copilot SDK
streaming loop
- Replace raw technical error messages with user-friendly text:
**"Anthropic connection interrupted — please retry"**
- Add `retryable` field to `StreamError` model so the frontend can
distinguish retryable errors
- Add **"Try Again" button** on the error card for transient errors,
which re-sends the last user message
### Background
Sentry issue
[AUTOGPT-SERVER-875](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-875)
— 25+ events since March 13, caused by Anthropic API infrastructure
instability (confirmed by their status page). Same SDK/code on dev and
prod, prod-only because of higher volume of long-running streaming
sessions.
### Changes
**Backend (`constants.py`, `service.py`, `response_adapter.py`,
`response_model.py`):**
- `is_transient_api_error()` — pattern-matching helper for known
transient error strings
- Intercept transient errors in 3 places: `AssistantMessage.error`,
stream exceptions, `BaseException` handler
- Use friendly message in error markers persisted to session (so it
shows properly on page refresh too)
- `StreamError.retryable` field for frontend consumption
**Frontend (`ChatContainer`, `ChatMessagesContainer`,
`MessagePartRenderer`):**
- Thread `onRetry` callback from `ChatContainer` →
`ChatMessagesContainer` → `MessagePartRenderer`
- Detect transient error text in error markers and show "Try Again"
button via existing `ErrorCard.onRetry`
- Clicking "Try Again" re-sends the last user message (backend
auto-cleans stale error markers)
Fixes SECRT-2128, SECRT-2129, SECRT-2130
## Test plan
- [ ] Verify transient error detection with `is_transient_api_error()`
for known patterns
- [ ] Confirm error card shows "Anthropic connection interrupted —
please retry" instead of raw socket error
- [ ] Confirm "Try Again" button appears on transient error cards
- [ ] Confirm "Try Again" re-sends the last user message successfully
- [ ] Confirm non-transient errors (e.g., "Prompt is too long") still
show original error text without retry button
- [ ] Verify error marker persists correctly on page refresh
### Changes
- Integrates the existing graph search components into the new builder's
control panel
- Search by block name/title, block type, node inputs/outputs, and
description with fuzzy matching
(Jaro-Winkler)
- Clicking a result zooms/navigates to the node on the canvas
- Keyboard shortcut Cmd/Ctrl+F to open search
- Arrow key navigation and Enter to select within results
- Styled to match the new builder's block menu card pattern
https://github.com/user-attachments/assets/41ed676d-83b1-4f00-8611-00d20987a7af
### Test plan
- [x] Open builder with a graph containing multiple nodes
- [x] Click magnifying glass icon in control panel — search panel opens
- [x] Type a query — results filter by name, type, inputs, outputs
- [x] Click a result — canvas zooms to that node
- [x] Use arrow keys + Enter to navigate and select results
- [x] Press Cmd/Ctrl+F — search panel opens
- [x] Press Escape or click outside — search panel closes and query
clears
Replaces the custom LibraryTabs component with the design system's
TabsLine component throughout the library page for better UI
consistency. Also wires up favorite animation refs and removes the
unused `agentGraphVersion` field from the test fixture.
### Changes 🏗️
- Replace `LibraryTabs` with `TabsLine` from design system in
`FavoritesSection`, `LibrarySubSection`, and `page.tsx`
- Add favorite animation ref registration in `FavoritesSection` and
`LibrarySubSection`
- Inline tab type definition as `{ id: string; title: string; icon: Icon
}` in component props
- Remove unused `agentGraphVersion` field from `load_store_agents.py`
test
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Library page renders with both "All" and "Favorites" tabs using
TabsLine component
- [x] Tab switching between all agents and favorites works correctly
- [x] Favorite animations reference the correct tab element
## Summary
Two bugs causing blocks to be invisible in CoPilot search:
### Bug 1: CamelCase block names not tokenized
Block names like `AITextGeneratorBlock` were indexed as single tokens in
the search database. PostgreSQL's `plainto_tsquery('english', ...)` and
the BM25 tokenizer both treat CamelCase as one word, so searching for
"text generator" produced zero lexical/BM25 match.
**Fix:** Split CamelCase names into separate words before indexing (e.g.
`"AI Text Generator Block"`) and in the BM25 tokenizer.
### Bug 2: Disabled blocks exhausting batch budget (root cause of 36
missing blocks)
The `batch_size` limit in `get_missing_items()` was applied **before**
filtering out disabled blocks. With 120+ disabled blocks and
`batch_size=100`, the first 100 missing entries were all disabled
(skipped via `continue`), leaving the 36 enabled blocks beyond the slice
boundary **never indexed**. This made core blocks like
`AITextGeneratorBlock`, `AIConversationBlock`, `AIListGeneratorBlock`,
etc. completely invisible to search.
**Fix:** Filter disabled blocks from the missing list before slicing by
`batch_size`.
### Changes
- **`content_handlers.py`**:
- Split CamelCase block names into space-separated words when building
`searchableText`
- Filter disabled blocks before applying `batch_size` slice so enabled
blocks aren't starved
- **`hybrid_search.py`**: Updated BM25 `tokenize()` to split CamelCase
tokens
### Evidence from local DB
```
Indexed blocks: 341
Total blocks: 497 (156 missing from index)
Missing (non-disabled): 36 — including AITextGeneratorBlock, AIConversationBlock, etc.
# batch_size analysis:
First 100 missing: 0 enabled, 100 disabled ← batch exhausted by disabled blocks
After 100: 36 enabled ← never reached!
```
## Test plan
- [ ] Verify CamelCase splitting: `AITextGeneratorBlock` → `AI Text
Generator Block`
- [ ] Run `poetry run pytest backend/api/features/store/` for
regressions
- [ ] After deploy, trigger embedding backfill and verify all 36 blocks
get indexed
- [ ] Search for "text generator" in CoPilot and verify
`AITextGeneratorBlock` appears
### Changes
- Add YouTube/Vimeo embed support to `VideoRenderer` — URLs render as
embedded
iframe players instead of plain text
- Add new `AudioRenderer` — HTTP audio URLs (.mp3, .wav, .ogg, .m4a,
.aac,
.flac) and data URIs render as inline audio players
- Add new `LinkRenderer` — any HTTP/HTTPS URL not claimed by a media
renderer
becomes a clickable link with an external-link icon
- Add media preview button to `FileInput` — uploaded audio, video, and
image
files show an Eye icon that opens a preview dialog reusing the
OutputRenderer
system
- Update `ContentRenderer` shortContent gate to allow new renderers
through in
node previews
https://github.com/user-attachments/assets/eea27fb7-3870-4a1e-8d08-ba23b6e07d74
### Test plan
- [x] `pnpm vitest run src/components/contextual/OutputRenderers/` — 36
tests
passing
- [x] `pnpm format && pnpm lint && pnpm types` — all clean
- [x] Manual: run a block that outputs a YouTube URL → embedded player
- [x] Manual: run a block that outputs an audio file URL → audio player
- [x] Manual: run a block that outputs a generic URL → clickable link
- [x] Manual: upload an audio/video/image file to a file input → Eye
icon
appears, clicking opens preview dialog
### Summary
- SECRT-2094: Fix store image delete button accidentally submitting the
edit form — the remove image <button> in ThumbnailImages.tsx was missing
type="button", causing it to act as a form submit inside the
EditAgentForm. This closed the modal and showed a success toast without
the user clicking "Update submission".
https://github.com/user-attachments/assets/86cbdd7d-90b1-473c-9709-e75e956dea6b
### Changes
- `frontend/.../ThumbnailImages.tsx` — added type="button" to image
remove button
### Changes 🏗️
- Added comprehensive architecture documentation at
`docs/platform/workspace-media-architecture.md` covering:
- Database models (`UserWorkspace`, `UserWorkspaceFile`)
- `WorkspaceManager` API with session scoping
- `store_media_file()` media normalization pipeline (input types, return
formats)
- Virus scanning responsibility boundaries
- Decision tree for choosing `WorkspaceManager` vs `store_media_file()`
- Configuration reference including `clamav_max_concurrency` and
`clamav_mark_failed_scans_as_clean`
- Common patterns with error handling examples
- Updated `autogpt_platform/backend/CLAUDE.md` with a "Workspace & Media
Files" section referencing the new docs
- Removed duplicate `scan_content_safe()` call from
`WriteWorkspaceFileTool` — `WorkspaceManager.write_file()` already scans
internally, so the tool was double-scanning every file
- Replaced removed comment in `workspace.py` with explicit ownership
comment clarifying that `WorkspaceManager` is the single scanning
boundary
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Verified `scan_content_safe()` is called inside
`WorkspaceManager.write_file()` (workspace.py:186)
- [x] Verified `store_media_file()` scans all input branches including
local paths (file.py:351)
- [x] Verified documentation accuracy against current source code after
merge with dev
- [x] CI checks all passing
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> **Low Risk**
> Mostly adds documentation and internal developer guidance; the only
code change is a comment clarifying `WorkspaceManager.write_file()` as
the single virus-scanning boundary, with no behavior change.
>
> **Overview**
> Adds a new `docs/platform/workspace-media-architecture.md` describing
the Workspace storage layer vs the `store_media_file()` media pipeline,
including session scoping and virus-scanning/persistence responsibility
boundaries.
>
> Updates backend `CLAUDE.md` to point contributors to the new doc when
working on CoPilot uploads/downloads or
`WorkspaceManager`/`store_media_file()`, and clarifies in
`WorkspaceManager.write_file()` (comment-only) that callers should not
duplicate virus scanning.
>
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
18fcfa03f8. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Our CI costs are skyrocketing, most of it because of
`platform-fullstack-ci.yml`. The `types` job currently uses in a
`big-boi` runner (= expensive), but doesn't need to.
Additionally, the "end-to-end tests" job is currently in
`platform-frontend-ci.yml` instead of `platform-fullstack-ci.yml`,
causing it not to run on backend changes (which it should).
### Changes 🏗️
- Simplify `check-api-types` job (renamed from `types`) and make it use
regular `ubuntu-latest` runner
- Export API schema from backend through CLI (instead of spinning it up
in docker)
- Fix dependency caching in `platform-fullstack-ci.yml` (based on recent
improvements in `platform-frontend-ci.yml`)
- Move `e2e_tests` job to `platform-fullstack-ci.yml`
Out-of-scope but necessary:
- Eliminate module-level init of OpenAI client in
`backend.copilot.service`
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- CI
The `@@agptfile:` expansion system previously used content-sniffing
(trying
`json.loads` then `csv.Sniffer`) to decide whether to parse file content
as
structured data. This was fragile — a file containing just `"42"` would
be
parsed as an integer, and the heuristics could misfire on ambiguous
content.
This PR replaces content-sniffing with **extension/MIME-based format
detection**.
When the file has a well-known extension (`.json`, `.csv`, etc.) or MIME
type
fragment (`workspace://id#application/json`), the content is parsed
accordingly.
Unknown formats or parse failures always fall back to plain string — no
surprises.
> [!NOTE]
> This PR builds on the `@@agptfile:` file reference protocol introduced
in #12332 and the structured data auto-parsing added in #12390.
>
> **What is `@@agptfile:`?**
> It is a special URI prefix (e.g. `@@agptfile:workspace:///report.csv`)
that the CoPilot SDK expands inline before sending tool arguments to
blocks. This lets the AI reference workspace files by name, and the SDK
automatically reads and injects the file content. See #12332 for the
full design.
### Changes 🏗️
**New utility: `backend/util/file_content_parser.py`**
- `infer_format(uri)` — determines format from file extension or MIME
fragment
- `parse_file_content(content, fmt)` — parses content, never raises
- Supported text formats: JSON, JSONL/NDJSON, CSV, TSV, YAML, TOML
- Supported binary formats: Parquet (via pyarrow), Excel/XLSX (via
openpyxl)
- JSON scalars (strings, numbers, booleans, null) stay as strings — only
containers (arrays, objects) are promoted
- CSV/TSV require ≥1 row and ≥2 columns to qualify as tabular data
- Added `openpyxl` dependency for Excel reading via pandas
- Case-insensitive MIME fragment matching per RFC 2045
- Shared `PARSE_EXCEPTIONS` constant to avoid duplication between
modules
**Updated `expand_file_refs_in_args` in `file_ref.py`**
- Bare refs now use `infer_format` + `parse_file_content` instead of the
old `_try_parse_structured` content-sniffing function
- Binary formats (parquet, xlsx) read raw bytes via `read_file_bytes`
- Embedded refs (text around `@@agptfile:`) still produce plain strings
- **Size guards**: Workspace and sandbox file reads now enforce a 10 MB
limit
(matching the existing local file limit) to prevent OOM on large files
**Updated `blocks/github/commits.py`**
- Consolidated `_create_blob` and `_create_binary_blob` into a single
function
with an `encoding` parameter
**Updated copilot system prompt**
- Documents the extension-based structured data parsing and supported
formats
**66 new tests** in `file_content_parser_test.py` covering:
- Format inference (extension, MIME, case-insensitive, precedence)
- All 8 format parsers (happy path + edge cases + fallbacks)
- Binary format handling (string input fallback, invalid bytes fallback)
- Unknown format passthrough
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] All 66 file_content_parser_test.py tests pass
- [x] All 31 file_ref_test.py tests pass
- [x] All 13 file_ref_integration_test.py tests pass
- [x] `poetry run format` passes clean (including pyright)
Adds a "Jump Back In" CTA at the top of the Library page to encourage
users to quickly rerun their most recently successful agent.
Closes SECRT-1536
### Changes 🏗️
- New `JumpBackIn` component with `useJumpBackIn` hook at
`library/components/JumpBackIn/`
- Fetches first page of library agents sorted by `updatedAt`
- Finds the first agent with a `COMPLETED` execution in
`recent_executions`
- Shows banner with agent name + "Jump Back In" button linking to
`/library/agents/{id}`
- Returns `null` (hidden) when loading or when no agent with a
successful run exists
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] `pnpm format`, `pnpm lint`, `pnpm types` all pass
- [x] Verified banner is hidden when no successful runs exist (edge
case)
- [x] Verified library page renders correctly with no visual regressions
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Refactors the Claude Code skills for a cleaner, more intuitive dev loop.
### Changes 🏗️
- **`/pr-review` (new)**: Actual code review skill — reads the PR diff,
fetches existing comments to avoid duplicates, and posts inline GitHub
comments with structured feedback (Blockers / Should Fix / Nice to Have
/ Nit) covering correctness, security, code quality, architecture, and
testing.
- **`/pr-address` (was `/babysit-pr`)**: Addresses review comments and
monitors CI until green. Renamed from `/babysit-pr` to `/pr-address` to
better reflect its purpose. Handles bot-specific feedback
(autogpt-reviewer, sentry, coderabbitai) and loops until all comments
are addressed and CI is green.
- **`/backend-check` + `/frontend-check` → `/check`**: Unified into a
single `/check` skill that auto-detects whether backend (Python) or
frontend (TypeScript) code changed and runs the appropriate formatting,
linting, type checking, and tests. Shared code quality rules applied to
both.
- **`/code-style` enhanced**: Now covers both Python and
TypeScript/React. Added learnings from real PR work: lazy `%s` logging,
TOCTOU awareness, SSE protocol rules (`data:` vs `: comment`), FastAPI
`Security()` vs `Depends()`, Redis pipeline atomicity, error path
sanitization, mock target rules after refactoring.
- **`/worktree` fixed**: Normal `git worktree` is now the default (was
branchlet-first). Branchlet moved to optional section. All paths derived
from `git rev-parse --show-toplevel`.
- **`/pr-create`, `/openapi-regen`, `/new-block` cleaned up**: Reference
`/check` and `/code-style` instead of duplicating instructions.
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Verified all skill files parse correctly (valid YAML frontmatter)
- [x] Verified skill auto-detection triggers updated in descriptions
- [x] Verified old backend-check and frontend-check directories removed
- [x] Verified pr-review and pr-address directories created with correct
content
## Summary
- Replaces all user-facing "block" terminology in the CoPilot activity
stream with plain-English labels ("Step failed", "action",
"Credentials", etc.)
- Adds `humanizeFileName()` utility to display file names without
extensions, with title-case and spaces (e.g. `executive_memo.md` →
`"Executive Memo"`)
- Updates error messages across RunBlock, RunAgent, and FindBlocks tools
to use friendly language
## Test plan
- [ ] Open CoPilot and trigger a block execution — verify animation text
says "Running" / "Step failed" instead of "Running the block" / "Error
running block"
- [ ] Trigger a file read/write action — verify the activity shows
humanized file names (e.g. `Reading "Executive Memo"` not `Reading
executive_memo.md`)
- [ ] Trigger FindBlocks — verify labels say "Searching for actions" and
"Results" instead of "Searching for blocks" and "Block results"
- [ ] Check the work-done stats bar — verify it shows "action" /
"actions" instead of "block run" / "block runs"
- [ ] Trigger a setup requirements card — verify labels say
"Credentials" and "Inputs" instead of "Block credentials" and "Block
inputs"
- [ ] Visit `/copilot/styleguide` — verify error test data no longer
contains "Block execution" text
Resolves: [SECRT-2025](https://linear.app/autogpt/issue/SECRT-2025)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Adds a "Run now" action to the schedule detail view and sidebar
dropdown, allowing users to immediately trigger a scheduled agent run
without waiting for the next cron execution.
### Changes 🏗️
- **`useSelectedScheduleActions.ts`**: Added
`usePostV1ExecuteGraphAgent` hook and `handleRunNow` function that
executes the agent using the schedule's stored `input_data` and
`input_credentials`. On success, invalidates runs query and navigates to
the new run
- **`SelectedScheduleActions.tsx`**: Added Play icon button as first
action button, with loading spinner while running
- **`SelectedScheduleView.tsx`**: Threads `onSelectRun` prop and
`schedule` object to action components (both mobile and desktop layouts)
- **`NewAgentLibraryView.tsx`**: Passes `onSelectRun` handler to enable
navigation to the new run after execution
- **`ScheduleActionsDropdown.tsx`**: Added "Run now" dropdown menu item
with same execution logic
- **`ScheduleListItem.tsx`**: Added `onRunCreated` prop passed to
dropdown
- **`SidebarRunsList.tsx`**: Connects sidebar dropdown to run
selection/navigation
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] `pnpm format`, `pnpm lint`, `pnpm types` all pass
- [x] Code review: follows existing patterns (mirrors "Run Again" in
SelectedRunActions)
- [x] No visual regressions on agent detail page
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Requested by @majdyz
When a user asks for Google Sheets integration, the CoPilot agent skips
block discovery entirely (despite 55+ Google Sheets blocks being
available), jumps straight to MCP, guesses a fake URL
(`https://sheets.googleapis.com/mcp`), and gets a raw HTML 404 error
page dumped into the conversation.
**Changes:**
1. **MCP guide** (`mcp_tool_guide.md`): Added "Check blocks first"
section directing the agent to use `find_block` before attempting MCP
for any service not in the known servers list. Explicitly prohibits
guessing/constructing MCP server URLs.
2. **Error handling** (`run_mcp_tool.py`): Detects HTML error pages in
HTTP responses (e.g. raw 404 pages from non-MCP endpoints) and returns a
clean one-liner like "This URL does not appear to host an MCP server"
instead of dumping the full HTML body.
**Note:** The main CoPilot system prompt (managed externally, not in
repo) should also be updated to reinforce block-first behavior in the
Capability Check section. This PR covers the in-repo changes.
Session reference: `9216df83-5f4a-48eb-9457-3ba2057638ae` (turn 3)
Ticket: [SECRT-2116](https://linear.app/autogpt/issue/SECRT-2116)
---
Co-authored-by: Zamil Majdy (@majdyz) <majdyz@gmail.com>
---------
Co-authored-by: Zamil Majdy (@majdyz) <majdyz@gmail.com>
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
Agent validation failures are expected when the LLM generates invalid
agent graphs (wrong block IDs, missing required inputs, bad output field
names). The validator catches these and returns proper error responses.
However, `validator.py:938` used `logger.error()`, which Sentry captures
as error events — flooding #platform-alerts with non-errors.
This changes it to `logger.warning()`, keeping the log visible for
debugging without triggering Sentry alerts.
Fixes SECRT-2120
---
Co-authored-by: Zamil Majdy (@majdyz) <zamil.majdy@agpt.co>
## Summary
- **Root cause**: `TranscriptBuilder` accumulates all raw SDK stream
messages including pre-compaction content. When the CLI compacts
mid-stream, the uploaded transcript was still uncompacted, causing
"Prompt is too long" errors on the next `--resume` turn.
- **Fix**: Detect mid-stream compaction via the `PreCompact` hook, read
the CLI's session file to get the compacted entries (summary +
post-compaction messages), and call
`TranscriptBuilder.replace_entries()` to sync it with the CLI's active
context. This ensures the uploaded transcript always matches what the
CLI sees.
- **Key changes**:
- `CompactionTracker`: stores `transcript_path` from `PreCompact` hook,
one-shot `compaction_just_ended` flag that correctly resets for multiple
compactions
- `read_compacted_entries()`: reads CLI session JSONL, finds
`isCompactSummary: true` entry, returns it + all entries after. Includes
path validation against the CLI projects directory.
- `TranscriptBuilder.replace_entries()`: clears and replaces all entries
with compacted ones, preserving `isCompactSummary` entries (which have
`type: "summary"` that would normally be stripped)
- `load_previous()`: also preserves `isCompactSummary` entries when
loading a previously compacted transcript
- Service stream loop: after compaction ends, reads compacted entries
and syncs TranscriptBuilder
## Test plan
- [x] 69 tests pass across `compaction_test.py` and `transcript_test.py`
- [x] Tests cover: one-shot flag behavior, multiple compactions within a
query, transcript path storage, path traversal rejection,
`read_compacted_entries` (7 tests), `replace_entries` (4 tests),
`load_previous` with compacted content (2 tests)
- [x] Pre-commit hooks pass (lint, format, typecheck)
- [ ] Manual test: trigger compaction in a multi-turn session and verify
the uploaded transcript reflects compaction
Requested by @torantula
Add support for shareable AutoPilot URLs that contain a prompt in the
URL hash fragment, inspired by [Lovable's
implementation](https://docs.lovable.dev/integrations/build-with-url).
**URL format:**
- `/copilot#prompt=URL-encoded-text` — pre-fills the input for the user
to review before sending
- `/copilot?autosubmit=true#prompt=...` — auto-creates a session and
sends the prompt immediately
**Example:**
```
https://platform.agpt.co/copilot#prompt=Create%20a%20todo%20apphttps://platform.agpt.co/copilot?autosubmit=true#prompt=Create%20a%20todo%20app
```
**Key design decisions:**
- Uses URL fragment (`#`) instead of query params — fragments never hit
the server, so prompts stay client-side only (better for privacy, no
backend URL length limits)
- URL is cleaned via `history.replaceState` immediately after extraction
to prevent re-triggering on navigation/reload
- Leverages existing `pendingMessage` + `createSession()` flow for
auto-submit — no new backend APIs needed
- For populate-only mode, passes `initialPrompt` down through component
tree to pre-fill the chat input
**Files changed:**
- `useCopilotPage.ts` — URL hash extraction logic + `initialPrompt`
state
- `CopilotPage.tsx` — passes `initialPrompt` to `ChatContainer`
- `ChatContainer.tsx` — passes `initialPrompt` to `EmptySession`
- `EmptySession.tsx` — passes `initialPrompt` to `ChatInput`
- `ChatInput.tsx` / `useChatInput.ts` — accepts `initialValue` to
pre-fill the textarea
Fixes SECRT-2119
---
Co-authored-by: Toran Bruce Richards (@Torantulino) <toran@agpt.co>
### Changes 🏗️
Adds `autogpt_platform/analytics/` — 14 SQL view definitions that expose
production data safely through a locked-down `analytics` schema.
**Security model:**
- Views use `security_invoker = false` (PostgreSQL 15+), so they execute
as their owner (`postgres`), not the caller
- `analytics_readonly` role only has access to `analytics.*` — cannot
touch `platform` or `auth` tables directly
**Files:**
- `backend/generate_views.py` — does everything; auto-reads credentials
from `backend/.env`
- `analytics/queries/*.sql` — 14 documented view definitions (auth, user
activity, executions, onboarding funnel, cohort retention)
---
### Running locally (dev)
```bash
cd autogpt_platform/backend
# First time only — creates analytics schema, role, grants
poetry run analytics-setup
# Create / refresh views (auto-reads backend/.env)
poetry run analytics-views
```
### Running in production (Supabase)
```bash
cd autogpt_platform/backend
# Step 1 — first time only (run in Supabase SQL Editor as postgres superuser)
poetry run analytics-setup --dry-run
# Paste the output into Supabase SQL Editor and run
# Step 2 — apply views (use direct connection host, not pooler)
poetry run analytics-views --db-url "postgresql://postgres:PASSWORD@db.<ref>.supabase.co:5432/postgres"
# Step 3 — set password for analytics_readonly so external tools can connect
# Run in Supabase SQL Editor:
# ALTER ROLE analytics_readonly WITH PASSWORD 'your-password';
```
---
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Setup + views applied cleanly on local Postgres 15
- [x] `analytics_readonly` can `SELECT` from all 14 `analytics.*` views
- [x] `analytics_readonly` gets `permission denied` on `platform.*` and
`auth.*` directly
---------
Co-authored-by: Otto (AGPT) <otto@agpt.co>
During Tally data extraction, the system now also generates personalized
quick-action prompts as part of the existing LLM extraction call
(configurable model, defaults to GPT-4o-mini, `temperature=0.0`). The
prompt asks the LLM for 5 candidates, then the code validates (filters
prompts >20 words) and keeps the top 3. These prompts are stored in the
existing `CoPilotUnderstanding.data` JSON field (at the top level, not
under `business`) and served to the frontend via a new API endpoint. The
copilot chat page uses them instead of hardcoded defaults when
available.
### Changes 🏗️
**Backend – Data models** (`understanding.py`):
- Added `suggested_prompts` field to `BusinessUnderstandingInput`
(optional) and `BusinessUnderstanding` (default empty list)
- Updated `from_db()` to deserialize `suggested_prompts` from top-level
of the data JSON
- Updated `merge_business_understanding_data()` with overwrite strategy
for prompts (full replace, not append)
- `format_understanding_for_prompt()` intentionally does **not** include
`suggested_prompts` — they are UI-only
**Backend – Prompt generation** (`tally.py`):
- Extended `_EXTRACTION_PROMPT` to request 5 suggested prompts alongside
the existing business understanding fields — all extracted in a single
LLM call (`temperature=0.0`)
- Post-extraction validation filters out prompts exceeding 20 words and
slices to the top 3
- Model is now configurable via `tally_extraction_llm_model` setting
(defaults to `openai/gpt-4o-mini`)
**Backend – API endpoint** (`routes.py`):
- Added `GET /api/chat/suggested-prompts` (auth required)
- Returns `{prompts: string[]}` from the user's cached business
understanding (48h Redis TTL)
- Returns empty array if no understanding or no prompts exist
**Frontend** (`EmptySession/`):
- `helpers.ts`: Extracted defaults to `DEFAULT_QUICK_ACTIONS`,
`getQuickActions()` now accepts optional custom prompts and falls back
to defaults
- `EmptySession.tsx`: Calls `useGetV2GetSuggestedPrompts` hook
(`staleTime: Infinity`) and passes results to `getQuickActions()` with
hardcoded fallback
- Fixed `useEffect` resize handler that previously used
`window.innerWidth` as a dependency (re-ran every render); now uses a
proper resize event listener
- Added skeleton loading state while prompts are being fetched
**Generated** (`__generated__/`):
- Regenerated Orval API client with new endpoint types and hooks
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Backend format + lint + pyright pass
- [x] Frontend format + lint pass
- [x] All existing tally tests pass (28/28)
- [x] All chat route tests pass (9/9)
- [x] All invited_user tests pass (7/7)
- [x] E2E: New user with tally data sees custom prompts on copilot page
- [x] E2E: User without tally data sees hardcoded default prompts
- [x] E2E: Clicking a custom prompt sends it as a chat message
Enable E2B `auto_resume` lifecycle option and reduce the safety-net
timeout from 3 hours to 5 minutes.
Currently, if the explicit per-turn `pause_sandbox_direct()` call fails
(process crash, network issue, fire-and-forget task cancellation), the
sandbox keeps running for up to **3 hours** before the safety-net
timeout fires. With this change, worst-case billing drops to **5
minutes**.
### Changes
- Add `auto_resume: True` to sandbox lifecycle config — paused sandboxes
wake transparently on SDK activity
- Reduce `e2b_sandbox_timeout` default from 10800s (3h) → 300s (5min)
- Add `e2b_sandbox_auto_resume` config field (default: `True`)
- Guard: `auto_resume` only added when `on_timeout == "pause"`
### What doesn't change
- Explicit per-turn `pause_sandbox_direct()` remains the primary
mechanism
- `connect()` / `_try_reconnect()` flow unchanged
- Redis key management unchanged
- No latency impact (resume is ~1-2s regardless of trigger)
### Risk
Very low — `auto_resume` is additive. If it doesn't work as advertised,
`connect()` still resumes paused sandboxes exactly as before.
Ref: https://e2b.dev/docs/sandbox/auto-resume
Linear: SECRT-2118
---
Co-authored-by: Zamil Majdy (@majdyz) <zamil.majdy@agpt.co>
### Changes 🏗️
- add invite-backed beta provisioning with a new `InvitedUser` platform
model, Prisma migration, and first-login activation path that
materializes `User`, `Profile`, `UserOnboarding`, and
`CoPilotUnderstanding`
- replace the legacy beta allowlist check with invite-backed gating for
email/password signup and Tally pre-seeding during activation
- add admin backend APIs and frontend `/admin/users` management UI for
listing, creating, revoking, retrying, and bulk-uploading invited users
- add the design doc for the beta invite system and extend backend
coverage for invite activation, bulk uploads, and auth-route behavior
- configuration changes: introduce the new invite/tally schema objects
and migration; no new env vars or docker service changes are required
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] `cd autogpt_platform/backend && poetry run format`
- [x] `cd autogpt_platform/backend && poetry run pytest -q` (run against
an isolated local Postgres database with non-conflicting service port
overrides)
#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)
## Summary
When opening a folder in the library, sub-folders were not displayed —
only agents were shown. This was caused by two issues:
1. The folder list query always fetched root-level folders (no
`parent_id` filter), so sub-folders were never requested
2. `showFolders` was set to `false` whenever a folder was selected,
hiding all folders from the view
### Changes 🏗️
- Pass `parent_id` to the `useGetV2ListLibraryFolders` hook so it
fetches child folders of the currently selected folder
- Remove the `!selectedFolderId` condition from `showFolders` so folders
render inside other folders
- Fetch the current folder via `useGetV2GetFolder` instead of searching
the (now differently-scoped) folder list
- Clean up breadcrumb: remove emoji icon, match folder name text size to
"My Library", replace `Button` with plain `<button>` to remove extra
padding/gap
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Open a folder in the library and verify sub-folders are displayed
- [x] Verify agents inside the folder still display correctly
- [x] Verify breadcrumb shows folder name without emoji, matching "My
Library" text size
- [x] Verify clicking "My Library" in breadcrumb navigates back to root
- [x] Verify root-level view still shows all top-level folders
- [x] Verify favorites tab does not show folders
The copilot's `@@agptfile:` reference system always produces strings
when expanding
file references. This breaks blocks that expect structured types — e.g.
`GoogleSheetsWriteBlock` expects `values: list[list[str]]`, but receives
a raw CSV
string instead. Additionally, the copilot's input coercion was
duplicating logic from
the executor instead of reusing the shared `convert()` utility, and the
coercion had
no type-aware gating — it would always call `convert()`, which could
incorrectly
transform values that already matched the expected type (e.g.
stringifying a valid
`list[str]` in a `str | list[str]` union).
### Changes 🏗️
**Structured data parsing for `@@agptfile:` bare references:**
- When an entire tool argument value is a bare `@@agptfile:` reference,
the resolved
content is now auto-parsed: JSON → native types, CSV/TSV →
`list[list[str]]`
- Embedded references within larger strings still do plain text
substitution
- Updated copilot system prompt to document the structured data
capability
**Shared type coercion utility (`coerce_inputs_to_schema`):**
- Extracted `coerce_inputs_to_schema()` into `backend/util/type.py` —
shared by both
the executor's `validate_exec()` and the copilot's `execute_block()`
- Uses Pydantic `model_fields` (not `__annotations__`) to include
inherited fields
- Added `_value_satisfies_type()` gate: only calls `convert()` when the
value doesn't
already match the target type, including recursive inner-element
checking for generics
**`_value_satisfies_type` — recursive type checking:**
- Handles `Any`, `Optional`, `Union`, `list[T]`, `dict[K,V]`, `set[T]`,
`tuple[T, ...]`,
heterogeneous `tuple[str, int, bool]`, bare generics, nested generics
- Guards against non-runtime origins (`Literal`, etc.) to prevent
`isinstance()` crashes
- Returns `False` (not `True`) for unhandled generic origins as a safe
fallback
**Test coverage:**
- 51 new tests for `_value_satisfies_type` and `coerce_inputs_to_schema`
in `type_test.py`
- 8 new tests for `execute_block` type coercion in `helpers_test.py`
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] All existing file_ref tests pass
- [x] All new type_test.py tests pass (51 tests covering
_value_satisfies_type and coerce_inputs_to_schema)
- [x] All new helpers_test.py tests pass (8 tests covering execute_block
coercion)
- [x] `poetry run format` passes clean
- [x] `poetry run lint` passes clean
- [x] Pyright type checking passes
Fixes the agent generator setting `gpt-5.2-2025-12-11` (or `gpt-4o`) as
the model for PerplexityBlocks instead of valid Perplexity models,
causing 100% failure rate for agents using Perplexity blocks.
### Changes 🏗️
- **Fixer: block-aware model validation** — `fix_ai_model_parameter()`
now reads the block's `inputSchema` to check for `enum` constraints on
the model field. Blocks with their own model enum (PerplexityBlock,
IdeogramBlock, CodexBlock, etc.) are validated against their own allowed
values with the correct default, instead of the hardcoded generic set
(`gpt-4o`, `claude-opus-4-6`). This also fixes `edit_agent` which runs
through the same fixer pipeline.
- **PerplexityBlock: runtime fallback** — Added a `field_validator` on
the model field that gracefully falls back to `SONAR` instead of
crashing when an invalid model value is encountered at runtime. Also
overrides `validate_data` to sanitize invalid model values *before* JSON
schema validation (which runs in `Block._execute` before Pydantic
instantiation), ensuring the fallback is actually reachable during block
execution.
- **DB migration** — Fixes existing PerplexityBlock nodes with invalid
model values in both `AgentNode.constantInput` and
`AgentNodeExecutionInputOutput` (preset overrides), matching the pattern
from the Gemini migration.
- **Tests** — Fixer tests for block-specific enum validation, plus
`validate_data`-level tests ensuring invalid models are sanitized before
JSON schema validation rejects them.
Resolves [SECRT-2097](https://linear.app/autogpt/issue/SECRT-2097)
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] All existing + new fixer tests pass
- [x] PerplexityBlock block test passes
- [x] 11 perplexity_test.py tests pass (field_validator + validate_data
paths)
- [x] Verified invalid model (`gpt-5.2-2025-12-11`) falls back to
`perplexity/sonar` at runtime
- [x] Verified valid Perplexity models are preserved by the fixer
- [x] Migration covers both constantInput and preset overrides
Adds a notification system for the Copilot (AutoPilot) so users know
when background chats finish processing — via in-app indicators, sounds,
browser notifications, and document title badges.
### Changes 🏗️
**Backend**
- Add `is_processing` field to `SessionSummaryResponse` — batch-checks
Redis for active stream status on each session in the list endpoint
- Fix `is_processing` always returning `false` due to bytes vs string
comparison (`b"running"` → `"running"`) with `decode_responses=True`
Redis client
- Add `CopilotCompletionPayload` model for WebSocket notification events
- Publish `copilot_completion` notification via WebSocket when a session
completes in `stream_registry.mark_session_completed`
**Frontend — Notification UI**
- Add `NotificationBanner` component — amber banner prompting users to
enable browser notifications (auto-hides when already enabled or
dismissed)
- Add `NotificationDialog` component — modal dialog for enabling
notifications, supports force-open from sidebar menu for testing
- Fix repeated word "response" in dialog copy
**Frontend — Sidebar**
- Add bell icon in sidebar header with popover menu containing:
- Notifications toggle (requests browser permission on enable; shows
toast if denied)
- Sound toggle (disabled when notifications are off)
- "Show notification popup" button (for testing the dialog)
- "Clear local data" button (resets all copilot localStorage keys)
- Bell icon states: `BellSlash` (disabled), `Bell` (enabled, no sound),
`BellRinging` (enabled + sound)
- Add processing indicator (PulseLoader) and completion checkmark
(CheckCircle) inline with chat title, to the left of the hamburger menu
- Processing indicator hides immediately when completion arrives (no
overlap with checkmark)
- Fix PulseLoader initial flash — start at `scale(0); opacity: 0` with
smoother keyframes
- Add 10s polling (`refetchInterval`) to session list so `is_processing`
updates automatically
- Clear document title badge when navigating to a completed chat
- Remove duplicate "Your chats" heading that appeared in both
SidebarHeader and SidebarContent
**Frontend — Notification Hook (`useCopilotNotifications`)**
- Listen for `copilot_completion` WebSocket events
- Track completed sessions in Zustand store
- Play notification sound (only for background sessions, not active
chat)
- Update `document.title` with unread count badge
- Send browser `Notification` when tab is hidden, with click-to-navigate
to the completed chat
- Reset document title on tab focus
**Frontend — Store & Storage**
- Add `completedSessionIDs`, `isNotificationsEnabled`, `isSoundEnabled`,
`showNotificationDialog`, `clearCopilotLocalData` to Zustand store
- Persist notification and sound preferences in localStorage
- On init, validate `isNotificationsEnabled` against actual
`Notification.permission`
- Add localStorage keys: `COPILOT_NOTIFICATIONS_ENABLED`,
`COPILOT_SOUND_ENABLED`, `COPILOT_NOTIFICATION_BANNER_DISMISSED`,
`COPILOT_NOTIFICATION_DIALOG_DISMISSED`
**Mobile**
- Add processing/completion indicators and sound toggle to MobileDrawer
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Open copilot, start a chat, switch to another chat — verify
processing indicator appears on the background chat
- [x] Wait for background chat to complete — verify checkmark appears,
processing indicator disappears
- [x] Enable notifications via bell menu — verify browser permission
prompt appears
- [x] With notifications enabled, complete a background chat while on
another tab — verify system notification appears with sound
- [x] Click system notification — verify it navigates to the completed
chat
- [x] Verify document title shows unread count and resets when
navigating to the chat or focusing the tab
- [x] Toggle sound off — verify no sound plays on completion
- [x] Toggle notifications off — verify no sound, no system
notification, no badge
- [x] Clear local data — verify all preferences reset
- [x] Verify notification banner hides when notifications already
enabled
- [x] Verify dialog auto-shows for first-time users and can be
force-opened from menu
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
When Supabase rejects a password reset token (expired, already used,
etc.), it redirects to the callback URL with `error`, `error_code`, and
`error_description` params instead of a `code`. Previously, the callback
only checked for `!code` and returned a generic "Missing verification
code" error, swallowing the actual Supabase error.
This meant the `ExpiredLinkMessage` UX (added in SECRT-1369 / #12123)
was never triggered for these cases — users just saw the email input
form again with no explanation.
Now the callback reads Supabase's error params and forwards them to
`/reset-password`, where the existing expired link detection picks them
up correctly.
**Note:** This doesn't fix the root cause of Pwuts's token expiry issue
(likely link preview/prefetch consuming the OTP), but it ensures users
see the proper "link expired" message with a "Request new link" button
instead of a confusing silent redirect.
---
Co-authored-by: Reinier van der Leer (@Pwuts) <pwuts@agpt.co>
## Summary
Adds the Cohere Command A family of models to AutoGPT Platform with
proper pricing configuration.
## Models Added
- **Command A 03.2025**: Flagship model (256k context, 8k output) - 3
credits
- **Command A Translate 08.2025**: State-of-the-art translation (8k
context, 8k output) - 3 credits
- **Command A Reasoning 08.2025**: First reasoning model (256k context,
32k output) - 6 credits
- **Command A Vision 07.2025**: First vision-capable model (128k
context, 8k output) - 3 credits
## Changes
- Added 4 new LlmModel enum entries with proper OpenRouter model IDs
- Added ModelMetadata for each model with correct context windows,
output limits, and price tiers
- Added pricing configuration in block_cost_config.py
## Testing
- [ ] Models appear in AutoGPT Platform model selector
- [ ] Pricing is correctly applied when using models
Resolves **SECRT-2083**
## Changes
- Added `MICROSOFT_PHI_4` to LlmModel enum (`microsoft/phi-4`)
- Configured model metadata:
- 16K context window
- 16K max output tokens
- OpenRouter provider
- Set cost tier: 1
- Input: $0.06 per 1M tokens
- Output: $0.14 per 1M tokens
## Details
Microsoft Phi-4 is a 14B parameter model available through OpenRouter.
This PR adds proper support in the autogpt_platform backend.
Resolves SECRT-2086
### Changes
- When a provider supports multiple credential types (e.g. GitHub with
both OAuth and API Key),
clicking "Add credential" now opens a tabbed dialog where users can
choose which type to use.
Previously, OAuth always took priority and API key was unreachable.
- Each credential in the list now shows a type-specific icon (provider
icon for OAuth, key for API Key,
password/lock for others) and a small label badge (e.g. "API Key",
"OAuth").
- The native dropdown options also include the credential type in
parentheses for clarity.
- Single credential type providers behave exactly as before — no dialog,
direct action.
https://github.com/user-attachments/assets/79f3a097-ea97-426b-a2d9-781d7dcdb8a4
## Test plan
- [x] Test with a provider that has only one credential type (e.g.
OpenAI with api_key only) — should
behave as before
- [x] Test with a provider that has multiple types (e.g. GitHub with
OAuth + API Key configured) —
should show tabbed dialog
- [x] Verify OAuth tab triggers the OAuth flow correctly
- [x] Verify API Key tab shows the inline form and creates credentials
- [x] Verify credential list shows correct icons and type badges
- [x] Verify dropdown options show type in parentheses
Two frontend fixes from release testing (2026-03-11):
**SECRT-2102:** The schedule dialog shows an "At [hh]:[mm]" time picker
when selecting Custom > Every x Minutes or Hours, which makes no sense
for sub-day intervals. Now only shows the time picker for Custom > Days
and other frequency types.
**SECRT-2103:** The "Unpublished changes" banner shows for agents the
user doesn't own or create. Root cause: `owner_user_id` is the library
copy owner, not the graph creator. Changed to use `can_access_graph`
which correctly reflects write access.
---
Co-authored-by: Reinier van der Leer (@Pwuts) <pwuts@agpt.co>
---------
Co-authored-by: Reinier van der Leer (@Pwuts) <reinier@agpt.co>
Co-authored-by: Reinier van der Leer <pwuts@agpt.co>
### Changes
- Restored missing `shepherd.js/dist/css/shepherd.css` base styles
import
- Added missing .new-builder-tutorial-disable and
.new-builder-tutorial-highlight CSS classes to
tutorial.css
- Fixed getFormContainerSelector() to include -node suffix matching the
actual DOM attribute
### What broke
The old legacy-builder/tutorial.ts was the only file importing
Shepherd's base CSS. When #12082 removed
the legacy builder, the new tutorial lost all base Shepherd styles
(close button positioning, modal
overlay, tooltip layout). The new tutorial's custom CSS overrides
depended on these base styles
existing.
Test plan
- [x] Start the tutorial from the builder (click the chalkboard icon)
- [x] Verify the close (X) button is positioned correctly in the
top-right of the popover
- [x] Verify the modal overlay dims the background properly
- [x] Verify element highlighting works when the tutorial points to
blocks/buttons
- [x] Verify non-target blocks are grayed out during the "select
calculator" step
- [x] Complete the full tutorial flow end-to-end (add block → configure
→ connect → save → run)
Closes SECRT-2105
### Changes 🏗️
Replace all user-facing MCP technical terminology with plain, friendly
language across the CoPilot UI and LLM prompting.
**Backend (`run_mcp_tool.py`)**
- Added `_service_name()` helper that extracts a readable name from an
MCP host (`mcp.sentry.dev` → `Sentry`)
- `agent_name` in `SetupRequirementsResponse`: `"MCP: mcp.sentry.dev"` →
`"Sentry"`
- Auth message: `"The MCP server at X requires authentication. Please
connect your credentials to continue."` → `"To continue, sign in to
Sentry and approve access."`
**Backend (`mcp_tool_guide.md`)**
- Added "Communication style" section with before/after examples to
teach the LLM to avoid "MCP server", "OAuth", "credentials" jargon in
responses to users
**Frontend (`MCPSetupCard.tsx`)**
- Button: `"Connect to mcp.sentry.dev"` → `"Connect Sentry"`
- Connected state: `"Connected to mcp.sentry.dev!"` → `"Connected to
Sentry!"`
- Retry message: `"I've connected the MCP server credentials. Please
retry."` → `"I've connected. Please retry."`
**Frontend (`helpers.tsx`)**
- Added `serviceNameFromHost()` helper (exported, mirrors the backend
logic)
- Run text: `"Discovering MCP tools on mcp.sentry.dev"` → `"Connecting
to Sentry…"`
- Run text: `"Connecting to MCP server"` → `"Connecting…"`
- Run text: `"Connect to MCP: mcp.sentry.dev"` → `"Connect Sentry"`
(uses `agent_name` which is now just `"Sentry"`)
- Run text: `"Discovered N tool(s) on mcp.sentry.dev"` → `"Connected to
Sentry"`
- Error text: `"MCP error"` → `"Connection error"`
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [ ] Open CoPilot and ask it to connect to a service (e.g. Sentry,
Notion)
- [ ] Verify the run text accordion title shows `"Connecting to
Sentry…"` instead of `"Discovering MCP tools on mcp.sentry.dev"`
- [ ] Verify the auth card button shows `"Connect Sentry"` instead of
`"Connect to mcp.sentry.dev"`
- [ ] Verify the connected state shows `"Connected to Sentry!"` instead
of `"Connected to mcp.sentry.dev!"`
- [ ] Verify the LLM response text avoids "MCP server", "OAuth",
"credentials" terminology
`responseType.ts` was accidentally committed inside
`src/app/api/__generated__/models/` despite that directory being listed
in `.gitignore` (added in PR #12238).
### Changes 🏗️
- Removes
`autogpt_platform/frontend/src/app/api/__generated__/models/responseType.ts`
from git tracking — the file is already covered by the `.gitignore` rule
`src/app/api/__generated__/` and should never have been committed.
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] No functional changes — only removes a stale tracked file that is
already gitignored
## Summary
- keep Gmail body extraction resilient when `html2text` converter raises
- fallback to raw HTML instead of failing extraction
- add regression test for converter failure path
Closes#12368
## Testing
- added unit test in
`autogpt_platform/backend/test/blocks/test_gmail.py`
---------
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
### Changes 🏗️
- The "View" modal for agent submissions hardcoded "Agent is awaiting
review" regardless of actual status
- Now displays "Agent approved", "Agent rejected", or "Agent is awaiting
review" based on the submission's actual status
- Shows review feedback in a highlighted section for rejected agents
when review comments are available
<img width="1127" height="788" alt="Screenshot 2026-03-11 at 9 02 29 AM"
src="https://github.com/user-attachments/assets/840e0fb1-22c2-4fda-891b-967c8b8b4043"
/>
<img width="1105" height="680" alt="Screenshot 2026-03-11 at 9 02 46 AM"
src="https://github.com/user-attachments/assets/f0c407e6-c58e-4ec8-9988-9f5c69bfa9a7"
/>
## Test plan
- [x] Submit an agent and verify the view modal shows "Agent is awaiting
review"
- [x] View an approved agent submission and verify it shows "Agent
approved"
- [x] View a rejected agent submission and verify it shows "Agent
rejected"
- [x] View a rejected agent with review comments and verify the feedback
section appears
Closes SECRT-2092
### Changes
- Surface backend error details (file size limit, invalid file type,
virus detected, etc.) in the upload failed toast instead of showing a
generic "Upload Failed" message
- The backend already returns specific error messages (e.g., "File too
large. Maximum size is 50MB") but the frontend was discarding them with
a catch-all handler
<img width="1222" height="411" alt="Screenshot 2026-03-11 at 9 13 30 AM"
src="https://github.com/user-attachments/assets/34ab3d90-fffa-4788-917a-fe2a7f4144b9"
/>
## Test plan
- [x] Upload an image larger than 50MB to a store submission → should
see "File too large. Maximum size is 50MB"
- [x] Upload an unsupported file type → should see file type error
message
- [x] Upload a valid image → should still work normally
Resolves SECRT-2093
### Motivation 🎯
Fixes the issue where deleting a schedule shows an error screen instead
of gracefully handling the deletion. Previously, when a user deleted a
schedule, a race condition occurred where the query cache refetch
completed before the URL
state updated, causing the component to try rendering a schedule that no
longer existed (resulting in a 404 error screen).
### Changes 🏗️
**1. Fixed deletion order to prevent error screen flash**
- `useSelectedScheduleActions.ts` - Call `onDeleted()` callback
**before** invalidating queries to clear selection first
- `ScheduleActionsDropdown.tsx` - Same fix for sidebar dropdown deletion
**2. Added smart auto-selection logic**
- `useNewAgentLibraryView.ts`:
- Added query to fetch current schedules list
- Added `handleScheduleDeleted(deletedScheduleId)` function that:
- Auto-selects the first remaining schedule if others exist
- Clears selection to show empty state if no schedules remain
**3. Wired up smart deletion handler throughout component tree**
- `NewAgentLibraryView.tsx` - Passes `handleScheduleDeleted` to child
components
- `SelectedScheduleView.tsx` - Changed callback from
`onClearSelectedRun` to `onScheduleDeleted` and passes schedule ID
- `SidebarRunsList.tsx` - Added `onScheduleDeleted` prop and passes it
through to list items
### Checklist 📋
**Test Plan:**
- [] Create 2-3 test schedules for an agent
- [] Delete a schedule from the detail view (trash icon in actions) when
other schedules exist → Verify next schedule auto-selects without error
- [] Delete a schedule from the sidebar dropdown (three-dot menu) when
other schedules exist → Verify next schedule auto-selects without error
- [] Delete all schedules until only one remains → Verify empty state
shows gracefully without error
- [] Verify "Schedule deleted" toast appears on successful deletion
- [] Verify no error screen appears at any point during deletion flow
## Summary
Adds four missing Mistral AI flagship models to address the critical
coverage gap identified in
[SECRT-2082](https://linear.app/autogpt/issue/SECRT-2082).
## Models Added
| Model | Context | Max Output | Price Tier | Use Case |
|-------|---------|------------|------------|----------|
| **Mistral Large 3** | 262K | None | 2 (Medium) | Flagship reasoning
model, 41B active params (675B total), MoE architecture |
| **Mistral Medium 3.1** | 131K | None | 2 (Medium) | Balanced
performance/cost, 8x cheaper than traditional large models |
| **Mistral Small 3.2** | 131K | 131K | 1 (Low) | Fast, cost-efficient,
high-volume use cases |
| **Codestral 2508** | 256K | None | 1 (Low) | Code generation
specialist (FIM, correction, test gen) |
## Problem
Previously, the platform only offered:
- Mistral Nemo (1 official model)
- dolphin-mistral (third-party Ollama fine-tune)
This left significant gaps in Mistral's lineup, particularly:
- No flagship reasoning model
- No balanced mid-tier option
- No code-specialized model
- Missing multimodal capabilities (Large 3, Medium 3.1, Small 3.2 all
support text+image)
## Changes
**File:** `autogpt_platform/backend/backend/blocks/llm.py`
- Added 4 enum entries in `LlmModel` class
- Added 4 metadata entries in `MODEL_METADATA` dict
- All models use OpenRouter provider
- Follows existing pattern for model additions
## Testing
- ✅ Enum values match OpenRouter model IDs
- ✅ Metadata follows existing format
- ✅ Context windows verified from OpenRouter API
- ✅ Price tiers assigned appropriately
## Closes
- SECRT-2082
---
**Note:** All models are available via OpenRouter and tested. This
brings Mistral coverage in line with other major providers (OpenAI,
Anthropic, Google).
Requested by @0ubbe
Refines the `pickBestVoice()` function to ensure non-robotic voices are
always preferred:
- **Filter out known low-quality engines** — eSpeak, Festival, MBROLA,
Flite, and Pico voices are deprioritized
- **Prefer remote/cloud-backed voices** — `localService: false` voices
are typically higher quality
- **Expand preferred voices list** — added Moira, Tessa (macOS), Jenny,
Aria, Guy (Windows OneCore)
- **Smarter fallback chain** — English default → English → any default →
first available
The previous fallback could select eSpeak or Festival voices on Linux
systems, resulting in robotic output. Now those are filtered out unless
they're the only option.
---
Co-authored-by: Ubbe <ubbe@users.noreply.github.com>
---------
Co-authored-by: Ubbe <hi@ubbe.dev>
Co-authored-by: Lluis Agusti <hi@llu.lu>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
## Summary
- Workspace file downloads (images, CSVs, etc.) were silently truncated
(~10 KB lost from the end) when served through the Next.js proxy
- Root cause: `new NextResponse(response.body)` passes a
`ReadableStream` directly, which Next.js / Vercel silently truncates for
larger files
- Fix: fully buffer with `response.arrayBuffer()` before forwarding, and
set `Content-Length` from the actual buffer size
- Keeps the auth proxy intact — no signed URLs (which would be public
and expire, breaking chat history)
## Root cause verification
Confirmed locally on session `080f27f9-0379-4085-a67a-ee34cc40cd62`:
- Backend `write_workspace_file` logs **978,831 bytes** written
- Direct backend download (`curl
localhost:8006/api/workspace/files/.../download`): **978,831 bytes** ✅
- Browser download through Next.js proxy: **truncated** ❌
## Why not signed URLs?
- Signed URLs are effectively public — anyone with the link can download
the file (privacy concern)
- Signed URLs expire, but chat history persists — reopening a
conversation later would show broken downloads
- Buffering is fine: workspace files are capped at 100 MB, Vercel
function memory is 1 GB
## Related
- Discord thread: `#Truncated File Bug` channel
- Related PR #12319 (signed URL approach) — this fix is simpler and
preserves auth
## Test plan
- [ ] Download a workspace file (CSV, PNG, any type) through the copilot
UI
- [ ] Verify downloaded file size matches the original
- [ ] Verify PNGs open correctly and CSVs have all rows
cc @Swiftyos @uberdot @AdarshRawat1
## Summary
- Integrates existing Human-In-The-Loop (HITL) review infrastructure
into CoPilot's direct block execution (`run_block`) for blocks marked
with `is_sensitive_action=True`
- Removes the `PendingHumanReview → AgentGraphExecution` FK constraint
to support synthetic CoPilot session IDs (migration included)
- Adds `ReviewRequiredResponse` model + frontend `ReviewRequiredCard`
component to surface review status in the chat UI
- Auto-approval works within a CoPilot session: once a block is
approved, subsequent executions of the same block in the same session
are auto-approved (using `copilot-session-{session_id}` as
`graph_exec_id` and `copilot-node-{block_id}` as `node_id`)
## Test plan
- [x] All 11 `run_block_test.py` tests pass (3 new sensitive action
tests)
- [ ] Manual: Execute a block with `is_sensitive_action=True` in CoPilot
→ verify ReviewRequiredResponse is returned and rendered
- [ ] Manual: Approve in review panel → re-execute the same block →
verify auto-approval kicks in
- [ ] Manual: Verify non-sensitive blocks still execute without review
bfb843a renamed `validate_url` to `validate_url_host` in
`agent_browser`, `run_mcp_tool`, and MCP routes, but the corresponding
test files still patched the old name, causing `AttributeError` in CI.
Updates all mock patch targets and assertions across 3 test files:
- `agent_browser_test.py`
- `test_run_mcp_tool.py`
- `mcp/test_routes.py`
---
Co-authored-by: Zamil Majdy (@majdyz) <zamil.majdy@agpt.co>
Co-authored-by: Reinier van der Leer (@Pwuts) <pwuts@agpt.co>
* Fix SSRF via user-controlled ollama_host field
Validate ollama_host against BLOCKED_IP_NETWORKS before passing to
ollama.AsyncClient(). The server-configured default (env: OLLAMA_HOST)
is allowed without validation; user-supplied values that differ are
checked for private/internal IP resolution.
Fixes GHSA-6jx2-4h7q-3fx3
* Generalize validate_ollama_host to validate_host; fix description line length
* Rename to validate_untrusted_host with whitelist parameter
* Apply PR suggestion: include whitelist in error message; run formatting
* Move whitelist check after URL normalization; match on netloc
* revert unrelated formatting changes
* Dedup validate_url and validate_untrusted_host; normalize whitelist
* Move _resolve_and_check_blocked after calling functions
* dedup and clean up
* make trusted_hostnames truly optional
---------
Co-authored-by: Reinier van der Leer <pwuts@agpt.co>
## Summary
- **Discriminated union support (oneOf)**: Added a new `OneOfField`
component that properly
renders Pydantic discriminated unions. Hides the unusable parent object
handle, auto-populates
the discriminator value, shows a dropdown with variant titles (e.g.,
"Username" / "UserId"), and
filters out the internal discriminator field from the form.
Non-discriminated `oneOf` schemas
fall back to existing `AnyOfField` behavior.
- **Collapsible object outputs**: Object-type outputs with nested keys
(e.g.,
`PersonLookupResponse.Url`, `PersonLookupResponse.profile`) are now
collapsed by default behind a
caret toggle. Nested keys show short names instead of the full
`Parent.Key` prefix.
- **Node layout cleanup**: Removed excessive bottom margin (`mb-6`) from
`FormRenderer`, hide the
Advanced toggle when no advanced fields exist, and add rounded bottom
corners on OUTPUT-type
blocks.
<img width="440" height="427" alt="Screenshot 2026-03-10 at 11 31 55 AM"
src="https://github.com/user-attachments/assets/06cc5414-4e02-4371-bdeb-1695e7cb2c97"
/>
<img width="371" height="320" alt="Screenshot 2026-03-10 at 11 36 52 AM"
src="https://github.com/user-attachments/assets/1a55f87a-c602-4f4d-b91b-6e49f810e5d5"
/>
## Test plan
- [x] Add a Twitter Get User block — verify "Identifier" shows a
dropdown (Username/UserId) with
no unusable parent handle, discriminator field is hidden, and the block
can run without staying
INCOMPLETE
- [x] Add any block with object outputs (e.g., PersonLookupResponse) —
verify nested keys are
collapsed by default and expand on click with short labels
- [x] Verify blocks without advanced fields don't show the Advanced
toggle
- [x] Verify existing `anyOf` schemas (optional types, 3+ variant
unions) still render correctly
- [x] Check OUTPUT-type blocks have rounded bottom corners
---------
Co-authored-by: Reinier van der Leer <pwuts@agpt.co>
Co-authored-by: eureka928 <meobius123@gmail.com>
## Summary
Fixes [OPEN-3025](https://linear.app/autogpt/issue/OPEN-3025) —
**107,571+ Server Action errors** in production
Removes the orphaned `askOtto` Server Action that was left behind after
the Otto chat widget removal in PR #12082.
## Problem
Next.js Server Actions that are never imported are excluded from the
server manifest. Old client bundles still reference the action ID,
causing "not found" errors.
**Sentry impact:**
- **BUILDER-3BN:** 107,571 events
- **BUILDER-729:** 285 events
- **BUILDER-3QH:** 1,611 events
- **36+ users affected**
## Root Cause
1. **Mar 2025:** Otto widget added to `/build` page with `askOtto`
Server Action
2. **Feb 2026:** Otto widget removed (PR #12082), but `actions.ts` left
behind
3. **Result:** Dead code → not in manifest → errors
## Evidence
```bash
# Zero imports across frontend:
grep -r "askOtto" src/ --exclude="actions.ts"
# → No results
# Server manifest missing the action:
cat .next/server/server-reference-manifest.json
# → Only includes login/supabase actions, NOT build/actions
```
## Changes
- ❌ Delete
`autogpt_platform/frontend/src/app/(platform)/build/actions.ts`
## Testing
1. Verify no imports of `askOtto` in codebase ✅
2. Check Sentry for error drop after deploy
3. Monitor for new "Server Action not found" errors
## Checklist
- [x] Dead code confirmed (zero imports)
- [x] Sentry issues documented
- [x] Clear commit message with context
## Summary
Fixes undo in the Builder not working correctly when deleting nodes.
When a node is deleted, React Flow fires `onNodesChange` (node removal)
and `onEdgesChange` (cascading edge cleanup) as separate callbacks —
each independently pushing to the undo history stack. This creates
intermediate states that break undo:
- Single undo restores a partial state (e.g. edges pointing to a deleted
node)
- Multiple undos required to fully restore the graph
- Redo also produces inconsistent states
Resolves#10999
### Changes 🏗️
- **`historyStore.ts`** — Added microtask-based batching to
`pushState()`. Multiple calls within the same synchronous execution
(same event loop tick) are coalesced into a single history entry,
keeping only the first pre-change snapshot. Uses `queueMicrotask` so all
cascading store updates from a single user action settle before the
history entry is committed.
- Reset `pendingState` in `initializeHistory()` and `clear()` to prevent
stale batched state from leaking across graph loads or navigation.
**Side benefit:** Copy/paste operations that add multiple nodes and
edges now also produce a single history entry instead of one per
node/edge.
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Place 3 blocks A, B, C and connect A→B→C
- [x] Delete block C (removes node + cascading edge B→C)
- [x] Delete connection A→B
- [x] Undo — connection A→B restored (single undo, not multiple)
- [x] Undo — block C and connection B→C restored
- [x] Redo — block C removed again with its connections
- [x] Copy/paste multiple connected blocks — single undo reverts entire
paste
---------
Co-authored-by: Reinier van der Leer <pwuts@agpt.co>
Co-authored-by: Abhimanyu Yadav <122007096+Abhi1992002@users.noreply.github.com>
## Summary
- **Problem**: When the LLM calls a tool with large file content, it
must rewrite all content token-by-token. This is wasteful since the
files are already accessible on disk.
- **Solution**: Introduces an \`@@agptfile:\` reference protocol. The
LLM passes a file path reference; the processor loads and substitutes
the content before executing the tool.
### Protocol
\`\`\`
@@agptfile:<uri>[<start>-<end>]
\`\`\`
**Supported URI types:**
| URI | Source |
|-----|--------|
| \`workspace://<file_id>\` | Persistent workspace file by ID |
| \`workspace:///<path>\` | Workspace file by virtual path |
| \`/absolute/path\` | Absolute host or sandbox path |
**Line range** is optional; omitting it reads the whole file.
### Backend changes
- Rename \`@file:\` → \`@@agptfile:\` prefix for uniqueness; extract
\`FILE_REF_PREFIX\` constant
- Extract shared execution-context ContextVars into
\`backend/copilot/context.py\` — eliminates duplicate ContextVar objects
that caused \`e2b_file_tools.py\` to always see empty context
- \`tool_adapter.py\` imports ContextVars from \`context.py\` (single
source of truth)
- \`expand_file_refs_in_string\` raises \`FileRefExpansionError\` on
failure (instead of inline error strings), blocking tool execution and
returning a clear error hint to the model
- Tighten URI regex: only expand refs starting with \`workspace://\` or
\`/\`
- Aggregate budget: 1 MB total expansion cap across all refs in one
string
- Per-file cap: 200 KB per individual ref
- Fix \`_read_file_handler\` to pass \`get_sdk_cwd()\` to
\`is_allowed_local_path\` — ephemeral working directory files were
incorrectly blocked
- Fix \`_is_allowed_local\` in \`e2b_file_tools.py\` to pass
\`get_sdk_cwd()\`
- Restrict local path allow-list to \`tool-results/\` subdirectory only
(was entire session project dir)
- Add \`raise_on_error\` param + remove two-pass \`_FILE_REF_ERROR_RE\`
detection
- Update system prompt docs and tool_adapter error messages
### Frontend changes
- \`BlockInputCard\`: hidden by default with Show/Hide toggle + \`mb-2\`
spacing
## Test plan
- [ ] \`poetry run pytest backend/copilot/ -x
--ignore=backend/copilot/sdk/file_ref_integration_test.py\` passes
- [ ] \`@@agptfile:workspace:///<path>[1-50]\` expands correctly in tool
calls
- [ ] Invalid line ranges produce \`[file-ref error: ...]\` inline
messages
- [ ] Files outside \`sdk_cwd\` / \`tool-results/\` are rejected
- [ ] Block input card shows hidden by default with toggle
## Summary
Port the agent generation pipeline from the external AgentGenerator
service into local copilot tools, making the Claude Agent SDK itself
handle validation, fixing, and block recommendation — no separate inner
LLM calls needed.
Key capabilities:
- **Local agent generation**: Create, edit, and customize agents
entirely within the SDK session
- **Graph validation**: 9 validation checks (block existence, link
references, type compatibility, IO blocks, etc.)
- **Graph fixing**: 17+ auto-fix methods (ID repair, link rewiring, type
conversion, credential stripping, dynamic block sink names, etc.)
- **MCP tool blocks**: Guide and fixer support for MCPToolBlock nodes
with proper dynamic input schema handling
- **Sub-agent composition**: AgentExecutorBlock support with library
agent schema enrichment
- **Embedding fallback**: Falls back to OpenRouter for embeddings when
`openai_internal_api_key` is unavailable
- **Actionable error messages**: Excluded block types (MCP, Agent)
return specific hints redirecting to the correct tool
### New Tools
- `validate_agent_graph` — run 9 validation checks on agent JSON
- `fix_agent_graph` — apply 17+ auto-fixes to agent JSON
- `get_blocks_for_goal` — recommend blocks for a given goal (with
optimized descriptions)
### Refactored Tools
- `create_agent`, `edit_agent`, `customize_agent` — accept `agent_json`
for local generation with shared fix→validate→save pipeline
- `find_block` — added `include_schemas` parameter, excludes MCP/Agent
blocks with actionable hints
- `run_block` — actionable error messages for excluded block types
- `find_library_agent` — enriched with `graph_version`, `input_schema`,
`output_schema` for sub-agent composition
### Architecture
- Split 2,558-line `validation.py` into `fixer.py`, `validator.py`,
`helpers.py`, `pipeline.py`
- Extracted shared `fix_validate_and_save()` pipeline (was duplicated
across 3 tools)
- Shared `OPENROUTER_BASE_URL` constant across codebase
- Comprehensive test coverage: 78+ unit tests for fixer/validator, 8
run_block tests, 17 SDK compat tests
## Test plan
- [x] `poetry run format` passes
- [x] `poetry run pytest -s -vvv backend/copilot/` — all tests pass
- [x] CI green on all Python versions (3.11, 3.12, 3.13)
- [x] Manual E2E: copilot generates agents with correct IO blocks,
links, and node structure
- [x] Manual E2E: MCP tool blocks use bare field names for dynamic
inputs
- [x] Manual E2E: sub-agent composition with AgentExecutorBlock
## Summary
- Add 8 Claude Code skills under \`.claude/skills/\` that act as
**auto-triggered guidelines** — the LLM invokes them automatically based
on context, no manual \`/command\` needed
- Skills: \`pr-review\`, \`pr-create\`, \`new-block\`,
\`openapi-regen\`, \`backend-check\`, \`frontend-check\`,
\`worktree-setup\`, \`code-style\`
- Each skill has an explicit TRIGGER condition so the LLM knows when to
apply it without being asked
## Changes
### Skills (all auto-triggered by context)
| Skill | Trigger |
|-------|---------|
| \`pr-review\` | User shares a PR URL or asks to address review
comments |
| \`pr-create\` | User asks to create a PR, push changes for review, or
submit work |
| \`new-block\` | User asks to create a new block or add a new
integration |
| \`openapi-regen\` | API routes change, new endpoints added, or
frontend types are stale |
| \`backend-check\` | Backend Python code has been modified |
| \`frontend-check\` | Frontend TypeScript/React code has been modified
|
| \`worktree-setup\` | User asks to work on a branch in isolation or set
up a worktree |
| \`code-style\` | Writing or reviewing Python code |
## Test plan
- [ ] Verify skills appear automatically in Claude Code when context
matches (no \`/command\` needed)
- [ ] Modify frontend code — confirm \`frontend-check\` fires
automatically
- [ ] Ask Claude to "create a PR" — confirm \`pr-create\` fires without
\`/pr-create\`
- [ ] Share a PR URL — confirm \`pr-review\` fires automatically
---------
Co-authored-by: Krzysztof Czerwinski <kpczerwinski@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
## Summary
### Before
- E2B sandboxes ran continuously between CoPilot turns, billing for idle
time
- Sandbox timeout caused **termination** (kill), losing all session
state
- No explicit cleanup when sessions were deleted — sandboxes leaked
- Single timeout concept with no separation between pause and kill
semantics
### After
- **Per-turn pause**: `pause_sandbox()` is called in the `finally` block
after every CoPilot turn, stopping billing instantly between turns
(paused sandboxes cost \$0 compute)
- **Auto-pause safety net**: Sandboxes are created with
`lifecycle={"on_timeout": "pause"}` (`pause_timeout` = 4h default) so
they auto-pause rather than terminate if the explicit pause is missed
- **Auto-reconnect**: `AsyncSandbox.connect()` in e2b SDK v2
auto-resumes paused sandboxes transparently — no extra code needed
- **Session delete cleanup**: `kill_sandbox()` is now called in
`delete_chat_session()` to explicitly terminate sandboxes and free
resources
- **Two distinct timeouts**: `pause_timeout` (4h, e2b auto-pause) vs
`redis_ttl` (12h, session key lifetime)
### Key Changes
| File | Change |
|------|--------|
| `pyproject.toml` | Bump `e2b-code-interpreter` `1.x` → `2.x` |
| `e2b_sandbox.py` | Add `pause_sandbox()`, `kill_sandbox()`,
`_act_on_sandbox()` helper; `lifecycle={"on_timeout": "pause"}`;
separate `pause_timeout` / `redis_ttl` params |
| `sdk/service.py` | Call `pause_sandbox()` in `finally` block
**before** transcript upload; use walrus operator for type-safe
`e2b_api_key` narrowing |
| `model.py` | Call `kill_sandbox()` in `delete_chat_session()`; inline
import to avoid circular dependency |
| `config.py` | Add `e2b_active` property; rename `e2b_sandbox_timeout`
default to 4h |
| `e2b_sandbox_test.py` | Add `test_pause_then_reconnect_reuses_sandbox`
test; update all `sandbox_timeout` → `pause_timeout` |
### Verified E2E
- Used real `E2B_API_KEY` from k8s dev cluster to manually verify:
sandbox created → paused → `is_running() == False` → reconnected via
`connect()` → state preserved → killed
## Test plan
- [x] `poetry run pytest backend/copilot/tools/e2b_sandbox_test.py` —
all 19 tests pass
- [x] CI: test (3.11, 3.12, 3.13), types — all green
- [x] E2E verified with real E2B credentials
## Summary
<img width="400" height="227" alt="Screenshot 2026-03-09 at 22 43 10"
src="https://github.com/user-attachments/assets/0116e260-860d-4466-9763-e02de2766e50"
/>
<img width="600" height="618" alt="Screenshot 2026-03-09 at 22 43 14"
src="https://github.com/user-attachments/assets/beaa6aca-afa8-483f-ac06-439bf162c951"
/>
- When the copilot stream finishes, tool calls that require user
interaction (credentials, inputs, clarification) are now **pinned**
outside the "Show reasoning" collapse instead of being hidden
- Added `isInteractiveToolPart()` helper that checks tool output's
`type` field against a set of interactive response types
- Modified `splitReasoningAndResponse()` to extract interactive tools
from reasoning into the visible response section
- Added styleguide section with 3 demos: `setup_requirements`,
`agent_details`, and `agent_saved` pinning scenarios
### Interactive response types kept visible:
`setup_requirements`, `agent_details`, `block_details`, `need_login`,
`input_validation_error`, `clarification_needed`, `suggested_goal`,
`agent_preview`, `agent_saved`
Error responses remain in reasoning (LLM explains them in final text).
Closes SECRT-2088
## Test plan
- [ ] Verify copilot stream with interactive tool (e.g. run_agent
requiring credentials) keeps the tool card visible after stream ends
- [ ] Verify non-interactive tools (find_block, bash_exec) still
collapse into "Show reasoning"
- [ ] Verify styleguide page at `/copilot/styleguide` renders the new
"Reasoning Collapse: Interactive Tool Pinning" section correctly
- [ ] Verify `pnpm types`, `pnpm lint`, `pnpm format` all pass
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Requested by @0ubbe
The **New Chat** button was visible on the Autopilot homepage where
clicking it has no effect (since `sessionId` is already `null`). This
hides the button when no chat session is active, so it only appears when
the user is viewing a conversation and wants to start a new one.
**Changes:**
- `ChatSidebar.tsx` — hide button in both collapsed and expanded sidebar
states when `sessionId` is null
- `MobileDrawer.tsx` — same fix for mobile drawer
---
Co-authored-by: Ubbe <ubbe@users.noreply.github.com>
Requested by @olivia-1421
Moves the microphone/recording button from the left-side tools group to
the right side, next to the submit button. The left side is now reserved
for the attachment/upload (plus) button only.
**Before:** `[ 📎🎤 ] .................. [ ➤ ]`
**After:** `[ 📎 ] .................. [ 🎤 ➤ ]`
---
Co-authored-by: Olivia <olivia-1421@users.noreply.github.com>
---------
Co-authored-by: Ubbe <hi@ubbe.dev>
Co-authored-by: Lluis Agusti <hi@llu.lu>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
When a webhook-triggered agent is executed directly (e.g. via Copilot)
without actual webhook data, `GraphExecution.from_db()` crashes with
`KeyError: 'payload'` because it does a hard key access on
`exec.input_data["payload"]` for webhook blocks.
This caused 232 Sentry events (AUTOGPT-SERVER-821) and multiple
INCOMPLETE graph executions due to retries.
**Changes:**
1. **Defensive fix in `from_db()`** — use `.get("payload")` instead of
`["payload"]` to handle missing keys gracefully (matches existing
pattern for input blocks using `.get("value")`)
2. **Upfront refusal in `_construct_starting_node_execution_input()`** —
refuse execution of webhook/webhook_manual blocks when no payload is
provided. The check is placed after `nodes_input_masks` application, so
legitimate webhook triggers (which inject payload via
`nodes_input_masks`) pass through fine.
Resolves [SENTRY-1113: Copilot is able to manually initiate runs for
triggered agents (which
fails)](https://linear.app/autogpt/issue/SENTRY-1113/copilot-is-able-to-manually-initiate-runs-for-triggered-agents-which)
---
Co-authored-by: Reinier van der Leer (@Pwuts) <pwuts@agpt.co>
## Summary
- Fixes tool results not being captured in the CoPilot transcript during
SDK-based streaming
- Adds `transcript_builder.add_user_message()` call with `tool_result`
content block when a `StreamToolOutputAvailable` event is received
- Ensures transcript accurately reflects the full conversation including
tool outputs, which is critical for Langfuse tracing and debugging
## Context
After the transcript refactor in #12318, tool call results from the SDK
streaming loop were not being recorded in the transcript. This meant
Langfuse traces were missing tool outputs, making it hard to debug agent
behavior.
## Test plan
- [ ] Verify CoPilot conversation with tool calls captures tool results
in Langfuse traces
- [ ] Verify transcript includes tool_result content blocks after tool
execution
## Summary
Centralizes all prompt building logic into a new
`backend/copilot/prompting.py` module with clear SDK vs baseline and
local vs E2B distinctions.
### Key Changes
**New `prompting.py` module:**
- `get_sdk_supplement(use_e2b, cwd)` - For SDK mode (NO tool docs -
Claude gets schemas automatically)
- `get_baseline_supplement(use_e2b, cwd)` - For baseline mode (WITH
auto-generated tool docs from TOOL_REGISTRY)
- Handles local/E2B storage differences
**SDK mode (`sdk/service.py`):**
- Removed 165+ lines of duplicate constants
- Now imports and uses `get_sdk_supplement()`
- Cleaner, more maintainable
**Baseline mode (`baseline/service.py`):**
- Now appends `get_baseline_supplement()` to system prompt
- Baseline mode finally gets tool documentation!
**Enhanced tool descriptions:**
- `create_agent`: Added feedback loop workflow (suggested_goal,
clarifying_questions)
- `run_mcp_tool`: Added known server URLs, 2-step workflow, auth
handling
**Tests:**
- Updated to verify SDK excludes tool docs, baseline includes them
- All existing tests pass
### Architecture Benefits
✅ Single source of truth for prompt supplements
✅ Clear SDK vs baseline distinction (SDK doesn't need tool docs)
✅ Clear local vs E2B distinction (storage systems)
✅ Easy to maintain and update
✅ Eliminates code duplication
## Test plan
- [x] Unit tests pass (TestPromptSupplement class)
- [x] SDK mode excludes tool documentation
- [x] Baseline mode includes tool documentation
- [x] E2B vs local mode differences handled correctly
Adds folder management capabilities to the CoPilot, allowing users to
organize agents into folders directly from the chat interface.
<img width="823" height="356" alt="Screenshot 2026-03-05 at 5 26 30 PM"
src="https://github.com/user-attachments/assets/4c55f926-1e71-488f-9eb6-fca87c4ab01b"
/>
<img width="797" height="150" alt="Screenshot 2026-03-05 at 5 28 40 PM"
src="https://github.com/user-attachments/assets/5c9c6f8b-57ac-4122-b17d-b9f091bb7c4e"
/>
<img width="763" height="196" alt="Screenshot 2026-03-05 at 5 28 36 PM"
src="https://github.com/user-attachments/assets/d1b22b5d-921d-44ac-90e8-a5820bb3146d"
/>
<img width="756" height="199" alt="Screenshot 2026-03-05 at 5 30 17 PM"
src="https://github.com/user-attachments/assets/40a59748-f42e-4521-bae0-cc786918a9b5"
/>
### Changes
**Backend -- 6 new CoPilot tools** (`manage_folders.py`):
- `create_folder` -- Create folders with optional parent, icon, and
color
- `list_folders` -- List folder tree or children of a specific folder,
with optional `include_agents` to show agents inside each folder
- `update_folder` -- Rename or change icon/color
- `move_folder` -- Reparent a folder or move to root
- `delete_folder` -- Soft-delete (agents moved to root, not deleted)
- `move_agents_to_folder` -- Bulk-move agents into a folder or back to
root
**Backend -- DatabaseManager RPC registration**:
- Registered all 7 folder DB functions (`create_folder`, `list_folders`,
`get_folder_tree`, `update_folder`, `move_folder`, `delete_folder`,
`bulk_move_agents_to_folder`) in `DatabaseManager` and
`DatabaseManagerAsyncClient` so they work via RPC in the CoPilotExecutor
process
- `manage_folders.py` uses `db_accessors.library_db()` pattern
(consistent with all other copilot tools) instead of direct Prisma
imports
**Backend -- folder_id threading**:
- `create_agent` and `customize_agent` tools accept optional `folder_id`
to save agents directly into a folder
- `save_agent_to_library` -> `create_graph_in_library` ->
`create_library_agent` pipeline passes `folder_id` through
- `create_library_agent` refactored from `asyncio.gather` to sequential
loop to support conditional `folderId` assignment on the main graph only
(not sub-graphs)
**Backend -- system prompt and models**:
- Added folder tool descriptions and usage guidance to Otto's system
prompt
- Added `FolderAgentSummary` model for lightweight agent info in folder
listings
- Added 6 `ResponseType` enum values and corresponding Pydantic response
models (`FolderInfo`, `FolderTreeInfo`, `FolderCreatedResponse`, etc.)
**Frontend -- FolderTool UI component**:
- `FolderTool.tsx` -- Renders folder operations in chat using the
`file-tree` molecule component for tree view, with `FileIcon` for agents
and `FolderIcon` for folders (both `text-neutral-600`)
- `helpers.ts` -- Type guards, output parsing, animation text helpers,
and `FolderAgentSummary` type
- `MessagePartRenderer.tsx` -- Routes 6 folder tool types to
`FolderTool` component
- Flat folder list view shows agents inside `FolderCard` when
`include_agents` is set
**Frontend -- file-tree molecule**:
- Fixed 3 pre-existing lint errors in `file-tree.tsx` (unused `ref`,
`handleSelect`, `className` params)
- Updated tree indicator line color from `bg-neutral-100` to
`bg-neutral-400` for visibility
- Added `file-tree.stories.tsx` with 5 stories: Default, AllExpanded,
FoldersOnly, WithInitialSelection, NoIndicator
- Added `ui/scroll-area.tsx` (dependency of file-tree, was missing from
non-legacy ui folder)
### Checklist
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Create a folder via copilot chat ("create a folder called
Marketing")
- [x] List folders ("show me my folders")
- [x] List folders with agents ("show me my folders and the agents in
them")
- [x] Update folder name/icon/color ("rename Marketing folder to Sales")
- [x] Move folder to a different parent ("move Sales into the Projects
folder")
- [x] Delete a folder and verify agents move to root
- [x] Move agents into a folder ("put my newsletter agent in the
Marketing folder")
- [x] Create agent with folder_id ("create a scraper agent and save it
in my Tools folder")
- [x] Verify FolderTool UI renders loading, success, error, and empty
states correctly
- [x] Verify folder tree renders nested folders with file-tree component
- [x] Verify agents appear as FileIcon nodes in tree view when
include_agents is true
- [x] Verify file-tree storybook stories render correctly
These changes were part of #12206, but here they are separately for
easier review.
This is all primarily to make the v2 API (#11678) work possible/easier.
### Changes 🏗️
- Fix relations between `Profile`, `StoreListing`, and `AgentGraph`
- Redefine `StoreSubmission` view with more efficient joins (100x
speed-up on dev DB) and more consistent field names
- Clean up query functions in `store/db.py`
- Clean up models in `store/model.py`
- Add missing fields to `StoreAgent` and `StoreSubmission` views
- Rename ambiguous `agent_id` -> `graph_id`
- Clean up API route definitions & docs in `store/routes.py`
- Make routes more consistent
- Avoid collision edge-case between `/agents/{username}/{agent_name}`
and `/agents/{store_listing_version_id}/*`
- Replace all usages of legacy `BackendAPI` for store endpoints with
generated client
- Remove scope requirements on public store endpoints in v1 external API
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Test all Marketplace views (including admin views)
- [x] Download an agent from the marketplace
- [x] Submit an agent to the Marketplace
- [x] Approve/reject Marketplace submission
## Summary
- **Frontend:** Group consecutive completed generic tool parts into
collapsible summary rows with a "Reasoning" collapse for finalized
messages. Merge consecutive assistant messages on hydration to avoid
split bubbles. Extract GenericTool helpers. Add `reconnectExhausted`
state and a brief delay before refetching session to reduce stale
`active_stream` reconnect cycles.
- **Backend:** Make transcript upload fire-and-forget instead of
blocking the generator exit. The 30s upload timeout in
`_try_upload_transcript` was delaying `mark_session_completed()`,
keeping the SSE stream alive with only heartbeats after the LLM had
finished — causing the UI to stay stuck in "streaming" state.
## Test plan
- [ ] Send a message in Copilot that triggers multiple tool calls —
verify they collapse into a grouped summary row once completed
- [ ] Verify the final text response appears below the collapsed
reasoning section
- [ ] Confirm the stream properly closes after the agent finishes (no
stuck "Stop" button)
- [ ] Refresh mid-stream and verify reconnection works correctly
- [ ] Click Stop during streaming — verify the UI becomes responsive
immediately
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
## Summary
- File uploads routed through the Next.js API proxy (`/api/proxy/...`)
fail with HTTP 413 for files >4.5MB due to Vercel's serverless function
body size limit
- Created shared `uploadFileDirect` utility (`src/lib/direct-upload.ts`)
that uploads files directly from the browser to the Python backend,
bypassing the proxy entirely
- Updated `useWorkspaceUpload` to use direct upload instead of the
generated hook (which went through the proxy)
- Deduplicated the copilot page's inline upload logic to use the same
shared utility
## Changes 🏗️
- **New**: `src/lib/direct-upload.ts` — shared utility for
direct-to-backend file uploads (up to 256MB)
- **Updated**: `useWorkspaceUpload.ts` — replaced proxy-based generated
hook with `uploadFileDirect`
- **Updated**: `useCopilotPage.ts` — replaced inline upload logic with
shared `uploadFileDirect`, removed unused imports
## Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Upload a file >5MB via workspace file input (e.g. in agent
builder) — should succeed without 413
- [x] Upload a file >5MB via copilot chat — should succeed without 413
- [x] Upload a small file (<1MB) via both paths — should still work
- [x] Verify file delete still works from workspace file input
When `add_graph_execution` is called from a context where the global
Prisma client isn't connected (e.g. CoPilot tools, external API), the
call to `get_or_create_workspace(user_id)` crashes with
`ClientNotConnectedError` because it directly accesses
`UserWorkspace.prisma()`.
The fix adds `workspace_db` to the existing `if prisma.is_connected()`
fallback pattern, consistent with how all other DB calls in the function
already work.
**Sentry:** AUTOGPT-SERVER-83T (and ~15 related issues going back to Jan
2026)
---
Co-authored-by: Reinier van der Leer (@Pwuts) <pwuts@agpt.co>
Co-authored-by: Reinier van der Leer (@Pwuts) <pwuts@agpt.co>
description: React and Next.js performance optimization guidelines from Vercel Engineering. This skill should be used when writing, reviewing, or refactoring React/Next.js code to ensure optimal performance patterns. Triggers on tasks involving React components, Next.js pages, data fetching, bundle optimization, or performance improvements.
license: MIT
metadata:
author: vercel
version: "1.0.0"
---
# Vercel React Best Practices
Comprehensive performance optimization guide for React and Next.js applications, maintained by Vercel. Contains 45 rules across 8 categories, prioritized by impact to guide automated refactoring and code generation.
## When to Apply
Reference these guidelines when:
- Writing new React components or Next.js pages
- Implementing data fetching (client or server-side)
`useEffectEvent` provides a cleaner API for the same pattern: it creates a stable function reference that always calls the latest version of the handler.
Import directly from source files instead of barrel files to avoid loading thousands of unused modules. **Barrel files** are entry points that re-export multiple modules (e.g., `index.js` that does `export * from './module'`).
Popular icon and component libraries can have **up to 10,000 re-exports** in their entry file. For many React packages, **it takes 200-800ms just to import them**, affecting both development speed and production cold starts.
**Why tree-shaking doesn't help:** When a library is marked as external (not bundled), the bundler can't optimize it. If you bundle it to enable tree-shaking, builds become substantially slower analyzing the entire module graph.
When comparing arrays with expensive operations (sorting, deep equality, serialization), check lengths first. If lengths differ, the arrays cannot be equal.
In real-world applications, this optimization is especially valuable when the comparison runs in hot paths (event handlers, render loops).
Two O(n log n) sorts run even when `current.length` is 5 and `original.length` is 100. There is also overhead of joining the arrays and comparing the strings.
## Use toSorted() Instead of sort() for Immutability
`.sort()` mutates the array in place, which can cause bugs with React state and props. Use `.toSorted()` to create a new sorted array without mutation.
This applies to all CSS transforms and transitions (`transform`, `opacity`, `translate`, `scale`, `rotate`). The wrapper div allows browsers to use GPU acceleration for smoother animations.
Use explicit ternary operators (`? :`) instead of `&&` for conditional rendering when the condition can be `0`, `NaN`, or other falsy values that render.
**Incorrect (renders "0" when count is 0):**
```tsx
functionBadge({count}:{count: number}){
return(
<div>
{count&&<spanclassName="badge">{count}</span>}
</div>
)
}
// When count = 0, renders: <div>0</div>
// When count = 5, renders: <div><span class="badge">5</span></div>
This is especially helpful for large and static SVG nodes, which can be expensive to recreate on every render.
**Note:** If your project has [React Compiler](https://react.dev/learn/react-compiler) enabled, the compiler automatically hoists static JSX elements and optimizes component re-renders, making manual hoisting unnecessary.
When rendering content that depends on client-side storage (localStorage, cookies), avoid both SSR breakage and post-hydration flickering by injecting a synchronous script that updates the DOM before React hydrates.
var theme = localStorage.getItem('theme') || 'light';
var el = document.getElementById('theme-wrapper');
if (el) el.className = theme;
} catch (e) {}
})();
`,
}}
/>
</>
)
}
```
The inline script executes synchronously before showing the element, ensuring the DOM already has the correct value. No flickering, no hydration mismatch.
This pattern is especially useful for theme toggles, user preferences, authentication states, and any client-only data that should render immediately without flashing default values.
Reduce SVG coordinate precision to decrease file size. The optimal precision depends on the viewBox size, but in general reducing precision should be considered.
**Incorrect (excessive precision):**
```svg
<pathd="M 10.293847 20.847362 L 30.938472 40.192837"/>
When updating state based on the current state value, use the functional update form of setState instead of directly referencing the state variable. This prevents stale closures, eliminates unnecessary dependencies, and creates stable callback references.
**Incorrect (requires state as dependency):**
```tsx
functionTodoList() {
const[items,setItems]=useState(initialItems)
// Callback must depend on items, recreated on every items change
The first callback is recreated every time `items` changes, which can cause child components to re-render unnecessarily. The second callback has a stale closure bug—it will always reference the initial `items` value.
**Correct (stable callbacks, no stale closures):**
```tsx
functionTodoList() {
const[items,setItems]=useState(initialItems)
// Stable callback, never recreated
constaddItems=useCallback((newItems: Item[])=>{
setItems(curr=>[...curr,...newItems])
},[])// ✅ No dependencies needed
// Always uses latest state, no stale closure risk
1.**Stable callback references** - Callbacks don't need to be recreated when state changes
2.**No stale closures** - Always operates on the latest state value
3.**Fewer dependencies** - Simplifies dependency arrays and reduces memory leaks
4.**Prevents bugs** - Eliminates the most common source of React closure bugs
**When to use functional updates:**
- Any setState that depends on the current state value
- Inside useCallback/useMemo when state is needed
- Event handlers that reference state
- Async operations that update state
**When direct updates are fine:**
- Setting state to a static value: `setCount(0)`
- Setting state from props/arguments only: `setName(newName)`
- State doesn't depend on previous value
**Note:** If your project has [React Compiler](https://react.dev/learn/react-compiler) enabled, the compiler can automatically optimize some cases, but functional updates are still recommended for correctness and to prevent stale closure bugs.
Pass a function to `useState` for expensive initial values. Without the function form, the initializer runs on every render even though the value is only used once.
**Incorrect (runs on every render):**
```tsx
functionFilteredList({items}:{items: Item[]}){
// buildSearchIndex() runs on EVERY render, even after initialization
Use lazy initialization when computing initial values from localStorage/sessionStorage, building data structures (indexes, maps), reading from the DOM, or performing heavy transformations.
For simple primitives (`useState(0)`), direct references (`useState(props.value)`), or cheap literals (`useState({})`), the function form is unnecessary.
**Note:** If your project has [React Compiler](https://react.dev/learn/react-compiler) enabled, manual memoization with `memo()` and `useMemo()` is not necessary. The compiler automatically optimizes re-renders.
Use Next.js's `after()` to schedule work that should execute after a response is sent. This prevents logging, analytics, and other side effects from blocking the response.
`React.cache()` only works within one request. For data shared across sequential requests (user clicks button A then button B), use an LRU cache.
**Implementation:**
```typescript
import{LRUCache}from'lru-cache'
constcache=newLRUCache<string,any>({
max: 1000,
ttl: 5*60*1000// 5 minutes
})
exportasyncfunctiongetUser(id: string){
constcached=cache.get(id)
if(cached)returncached
constuser=awaitdb.user.findUnique({where:{id}})
cache.set(id,user)
returnuser
}
// Request 1: DB query, result cached
// Request 2: cache hit, no DB query
```
Use when sequential user actions hit multiple endpoints needing the same data within seconds.
**With Vercel's [Fluid Compute](https://vercel.com/docs/fluid-compute):** LRU caching is especially effective because multiple concurrent requests can share the same function instance and cache. This means the cache persists across requests without needing external storage like Redis.
**In traditional serverless:** Each invocation runs in isolation, so consider Redis for cross-process caching.
The React Server/Client boundary serializes all object properties into strings and embeds them in the HTML response and subsequent RSC requests. This serialized data directly impacts page weight and load time, so **size matters a lot**. Only pass fields that the client actually uses.
[wiki page on Contributing]: https://github.com/Significant-Gravitas/AutoGPT/wiki/Contributing
- type:checkboxes
attributes:
label:⚠️ Search for existing issues first ⚠️
description:>
Please [search the history](https://github.com/Significant-Gravitas/AutoGPT/issues)
to see if an issue already exists for the same problem.
options:
- label:I have searched the existing issues, and there is no existing issue for my problem
required:true
- type:markdown
attributes:
value:|
Please confirm that the issue you have is described well and precise in the title above ⬆️.
A good rule of thumb: What would you type if you were searching for the issue?
For example:
BAD - my AutoGPT keeps looping
GOOD - After performing execute_python_file, AutoGPT goes into a loop where it keeps trying to execute the file.
⚠️ SUPER-busy repo, please help the volunteer maintainers.
The less time we spend here, the more time we can spend building AutoGPT.
Please help us help you by following these steps:
- Search for existing issues, adding a comment when you have the same or similar issue is tidier than "new issue" and
newer issues will not be reviewed earlier, this is dependent on the current priorities set by our wonderful team
- Ask on our Discord if your issue is known when you are unsure (https://discord.gg/autogpt)
- Provide relevant info:
- Provide commit-hash (`git rev-parse HEAD` gets it) if possible
- If it's a pip/packages issue, mention this in the title and provide pip version, python version
- If it's a crash, provide traceback and describe the error you got as precise as possible in the title.
- type:dropdown
attributes:
label:Which Operating System are you using?
description:>
Please select the operating system you were using to run AutoGPT when this problem occurred.
options:
- Windows
- Linux
- MacOS
- Docker
- Devcontainer / Codespace
- Windows Subsystem for Linux (WSL)
- Other
validations:
required:true
nested_fields:
- type:text
attributes:
label:Specify the system
description:Please specify the system you are working on.
- type:dropdown
attributes:
label:Which version of AutoGPT are you using?
description:|
Please select which version of AutoGPT you were using when this issue occurred.
If you downloaded the code from the [releases page](https://github.com/Significant-Gravitas/AutoGPT/releases/) make sure you were using the latest code.
**If you weren't please try with the [latest code](https://github.com/Significant-Gravitas/AutoGPT/releases/)**.
If installed with git you can run `git branch` to see which version of AutoGPT you are running.
options:
- Latest Release
- Stable (branch)
- Master (branch)
validations:
required:true
- type:dropdown
attributes:
label:What LLM Provider do you use?
description:>
If you are using AutoGPT with `SMART_LLM=gpt-3.5-turbo`, your problems may be caused by
the [limitations](https://github.com/Significant-Gravitas/AutoGPT/issues?q=is%3Aissue+label%3A%22AI+model+limitation%22) of GPT-3.5.
options:
- Azure
- Groq
- Anthropic
- Llamafile
- Other (detail in issue)
validations:
required:true
- type:dropdown
attributes:
label:Which area covers your issue best?
description:>
Select the area related to the issue you are reporting.
options:
- Installation and setup
- Memory
- Performance
- Prompt
- Commands
- Plugins
- AI Model Limitations
- Challenges
- Documentation
- Logging
- Agents
- Other
validations:
required:true
autolabels:true
nested_fields:
- type:text
attributes:
label:Specify the area
description:Please specify the area you think is best related to the issue.
- type:input
attributes:
label:What commit or version are you using?
description:It is helpful for us to reproduce to know what version of the software you were using when this happened. Please run `git log -n 1 --pretty=format:"%H"` to output the full commit hash.
validations:
required:true
- type:textarea
attributes:
label:Describe your issue.
description:Describe the problem you are experiencing. Try to describe only the issue and phrase it short but clear. ⚠️ Provide NO other data in this field
validations:
required:true
#Following are optional file content uploads
- type:markdown
attributes:
value:|
⚠️The following is OPTIONAL, please keep in mind that the log files may contain personal information such as credentials.⚠️
"The log files are located in the folder 'logs' inside the main AutoGPT folder."
- type:textarea
attributes:
label:Upload Activity Log Content
description:|
Upload the activity log content, this can help us understand the issue better.
To do this, go to the folder logs in your main AutoGPT folder, open activity.log and copy/paste the contents to this field.
⚠️ The activity log may contain personal data given to AutoGPT by you in prompt or input as well as
any personal information that AutoGPT collected out of files during last run. Do not add the activity log if you are not comfortable with sharing it. ⚠️
validations:
required:false
- type:textarea
attributes:
label:Upload Error Log Content
description:|
Upload the error log content, this will help us understand the issue better.
To do this, go to the folder logs in your main AutoGPT folder, open error.log and copy/paste the contents to this field.
⚠️ The error log may contain personal data given to AutoGPT by you in prompt or input as well as
any personal information that AutoGPT collected out of files during last run. Do not add the activity log if you are not comfortable with sharing it. ⚠️
First, check out our [wiki page on Contributing](https://github.com/Significant-Gravitas/AutoGPT/wiki/Contributing)
Please provide a searchable summary of the issue in the title above ⬆️.
- type:checkboxes
attributes:
label:Duplicates
description:Please [search the history](https://github.com/Significant-Gravitas/AutoGPT/issues) to see if an issue already exists for the same problem.
options:
- label:I have searched the existing issues
required:true
- type:textarea
attributes:
label:Summary 💡
description:Describe how it should work.
- type:textarea
attributes:
label:Examples 🌈
description:Provide a link to other implementations, or screenshots of the expected behavior.
- type:textarea
attributes:
label:Motivation 🔦
description:What are you trying to accomplish? How has the lack of this feature affected you? Providing context helps us come up with a solution that is more useful in the real world.
This file provides comprehensive onboarding information for GitHub Copilot coding agent to work efficiently with the AutoGPT repository.
## Repository Overview
**AutoGPT** is a powerful platform for creating, deploying, and managing continuous AI agents that automate complex workflows. This is a large monorepo (~150MB) containing multiple components:
- **AutoGPT Platform** (`autogpt_platform/`) - Main focus: Modern AI agent platform (Polyform Shield License)
- **Classic AutoGPT** (`classic/`) - Legacy agent system (MIT License)
- **Documentation** (`docs/`) - MkDocs-based documentation site
- **Infrastructure** - Docker configurations, CI/CD, and development tools
- `frontend/src/lib/supabase/` - Authentication and database client
**Protected Routes**: Update `frontend/lib/supabase/middleware.ts` when adding protected routes
### Agent Block System
Agents are built using a visual block-based system where each block performs a single action. Blocks are defined in `backend/blocks/` and must include:
- Block definition with input/output schemas
- Execution logic with proper error handling
- Tests validating functionality
### Database & ORM
**Prisma ORM** with PostgreSQL backend including pgvector for embeddings:
- Schema in `schema.prisma`
- Migrations in `backend/migrations/`
- Always run `prisma migrate dev` and `prisma generate` after schema changes
5. Shell environment variables have highest precedence
### Docker Environment Setup
- All services use hardcoded defaults (no `${VARIABLE}` substitutions)
- The `env_file` directive loads variables INTO containers at runtime
- Backend/Frontend services use YAML anchors for consistent configuration
- Copy `.env.default` files to `.env` for local development customization
## Advanced Development Patterns
### Adding New Blocks
1. Create file in `/backend/backend/blocks/`
2. Inherit from `Block` base class with input/output schemas
3. Implement `run` method with proper error handling
4. Generate block UUID using `uuid.uuid4()`
5. Register in block registry
6. Write tests alongside block implementation
7. Consider how inputs/outputs connect with other blocks in graph editor
### API Development
1. Update routes in `/backend/backend/api/features/`
2. Add/update Pydantic models in same directory
3. Write tests alongside route files
4. For `data/*.py` changes, validate user ID checks
5. Run `poetry run test` to verify changes
### Frontend Development
**📖 Complete Frontend Guide**: See `autogpt_platform/frontend/CONTRIBUTING.md` and `autogpt_platform/frontend/.cursorrules` for comprehensive patterns and conventions.
set +e # Ignore non-zero exit codes and continue execution
echo "Running the following command: poetry run agbenchmark --maintain --mock"
poetry run agbenchmark --maintain --mock
EXIT_CODE=$?
set -e # Stop ignoring non-zero exit codes
# Check if the exit code was 5, and if so, exit with 0 instead
if [ $EXIT_CODE -eq 5 ]; then
echo "regression_tests.json is empty."
fi
echo "Running the following command: poetry run agbenchmark --mock"
poetry run agbenchmark --mock
echo "Running the following command: poetry run agbenchmark --mock --category=data"
poetry run agbenchmark --mock --category=data
echo "Running the following command: poetry run agbenchmark --mock --category=coding"
poetry run agbenchmark --mock --category=coding
# echo "Running the following command: poetry run agbenchmark --test=WriteFile"
# poetry run agbenchmark --test=WriteFile
cd ../benchmark
poetry install
echo "Adding the BUILD_SKILL_TREE environment variable. This will attempt to add new elements in the skill tree. If new elements are added, the CI fails because they should have been pushed"
gh issue comment $PR_NUMBER --body "You changed AutoGPT's behaviour on ${{ runner.os }}. The cassettes have been updated and will be merged to the submodule when this Pull Request gets merged."
# required to fetch internal or private CodeQL packs
packages:read
# only required for workflows in private repositories
actions:read
contents:read
strategy:
fail-fast:false
matrix:
include:
- language:typescript
build-mode:none
- language:python
build-mode:none
# CodeQL supports the following values keywords for 'language': 'c-cpp', 'csharp', 'go', 'java-kotlin', 'javascript-typescript', 'python', 'ruby', 'swift'
# Use `c-cpp` to analyze code written in C, C++ or both
# Use 'java-kotlin' to analyze code written in Java, Kotlin or both
# Use 'javascript-typescript' to analyze code written in JavaScript, TypeScript or both
# To learn more about changing the languages that are analyzed or customizing the build mode for your analysis,
# see https://docs.github.com/en/code-security/code-scanning/creating-an-advanced-setup-for-code-scanning/customizing-your-advanced-setup-for-code-scanning.
# If you are analyzing a compiled language, you can modify the 'build-mode' for that language to customize how
# your codebase is analyzed, see https://docs.github.com/en/code-security/code-scanning/creating-an-advanced-setup-for-code-scanning/codeql-code-scanning-for-compiled-languages
steps:
- name:Checkout repository
uses:actions/checkout@v6
# Initializes the CodeQL tools for scanning.
- name:Initialize CodeQL
uses:github/codeql-action/init@v4
with:
languages:${{ matrix.language }}
build-mode:${{ matrix.build-mode }}
# If you wish to specify custom queries, you can do so here or in a config file.
# By default, queries listed here will override any specified in a config file.
# Prefix the list here with "+" to use these queries and those in the config file.
config:|
paths-ignore:
- classic/frontend/build/**
# For more details on CodeQL's query packs, refer to: https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning#using-queries-in-ql-packs
# queries: security-extended,security-and-quality
# If the analyze step fails for one of the languages you are analyzing with
# "We were unable to automatically build your code", modify the matrix above
# to set the build mode to "manual" for that language. Then modify this step
# to build your code.
# ℹ️ Command-line programs to run using the OS shell.
# 📚 See https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idstepsrun
- if:matrix.build-mode == 'manual'
shell:bash
run:|
echo 'If you are using a "manual" build mode for one or more of the' \
'languages you are analyzing, replace this with the commands to build' \
# Automatically run the setup steps when they are changed to allow for easy validation, and
# allow manual testing through the repository's "Actions" tab
on:
workflow_dispatch:
push:
paths:
- .github/workflows/copilot-setup-steps.yml
pull_request:
paths:
- .github/workflows/copilot-setup-steps.yml
jobs:
# The job MUST be called `copilot-setup-steps` or it will not be picked up by Copilot.
copilot-setup-steps:
runs-on:ubuntu-latest
timeout-minutes:45
# Set the permissions to the lowest permissions possible needed for your steps.
# Copilot will be given its own token for its operations.
permissions:
# If you want to clone the repository as part of your setup steps, for example to install dependencies, you'll need the `contents: read` permission. If you don't clone the repository in your setup steps, Copilot will do this for you automatically after the steps complete.
contents:read
# You can define any steps you want, and they will run before the agent starts.
# If you do not check out your code, Copilot will do this for you.
2. **For Each Documentation File** (up to ${{ inputs.max_blocks }} files):
a. Read the documentation file
b. Identify which block(s) it documents (look for the block class name)
c. Find and read the corresponding block implementation in `autogpt_platform/backend/backend/blocks/`
d. Improve the MANUAL sections:
**"How it works" section** (within `<!-- MANUAL: how_it_works -->` markers):
- Explain the technical flow of the block
- Describe what APIs or services it connects to
- Note any important configuration or prerequisites
- Keep it concise but informative (2-4 paragraphs)
**"Possible use case" section** (within `<!-- MANUAL: use_case -->` markers):
- Provide 2-3 practical, real-world examples
- Make them specific and actionable
- Show how this block could be used in an automation workflow
3. **Important Rules**
- ONLY modify content within `<!-- MANUAL: -->` and `<!-- END MANUAL -->` markers
- Do NOT modify auto-generated sections (inputs/outputs tables, descriptions)
- Keep content accurate based on the actual block implementation
- Write for users who may not be technical experts
4. **Output**
${{ inputs.dry_run == true && 'DRY RUN MODE: Show proposed changes for each file but do NOT actually edit the files. Describe what you would change.' || 'LIVE MODE: Actually edit the files to improve the documentation.' }}
## Example Improvements
**Before (How it works):**
```
_Add technical explanation here._
```
**After (How it works):**
```
This block connects to the GitHub API to retrieve issue information. When executed,
it authenticates using your GitHub credentials and fetches issue details including
title, body, labels, and assignees.
The block requires a valid GitHub OAuth connection with repository access permissions.
It supports both public and private repositories you have access to.
```
**Before (Possible use case):**
```
_Add practical use case examples here._
```
**After (Possible use case):**
```
**Customer Support Automation**: Monitor a GitHub repository for new issues with
the "bug" label, then automatically create a ticket in your support system and
notify the on-call engineer via Slack.
**Release Notes Generation**: When a new release is published, gather all closed
issues since the last release and generate a summary for your changelog.
```
Begin by finding and listing the documentation files to process.
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.