Without specifying an explicit build target it would build the `migrate` stage because it is the last stage in the Dockerfile. This caused deployment failures.
- Follow-up to #12124 and 074be7ae
## Summary
Upgrades RabbitMQ from the end-of-life `rabbitmq:3.12-management` to
`rabbitmq:4.1.4`, aligning CI, local dev, and e2e testing with
production.
## Changes
### CI Workflow (`.github/workflows/platform-backend-ci.yml`)
- **Image:** `rabbitmq:3.12-management` → `rabbitmq:4.1.4`
- **Port:** Removed 15672 (management UI) — not used
- **Health check:** Added to prevent flaky tests from race conditions
during startup
### Docker Compose (`docker-compose.platform.yml`,
`docker-compose.test.yaml`)
- **Image:** `rabbitmq:management` → `rabbitmq:4.1.4`
- **Port:** Removed 15672 (management UI) — not used
## Why
- RabbitMQ 3.12 is EOL
- We don't use the management interface, so `-management` variant is
unnecessary
- CI and local dev/e2e should match production (4.1.4)
## Testing
CI validates that backend tests pass against RabbitMQ 4.1.4 on Python
3.11, 3.12, and 3.13.
---
Closes SECRT-1703
[SECRT-2006: Dev deployment failing: poetry not found in container
PATH](https://linear.app/autogpt/issue/SECRT-2006)
- Follow-up to #12090
### Changes 🏗️
- Remove now-broken Poetry path config values
- Remove usage of now-broken `poetry run` in container run command
- Add trigger on `backend/Dockerfile` changes to Frontend CI workflow
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- If it works, CI will pass
## Summary
Adds the ability to delete chat sessions from the CoPilot interface.
## Changes
### Backend
- Add `DELETE /api/chat/sessions/{session_id}` endpoint in `routes.py`
- Returns 204 on success, 404 if not found or not owned by user
- Reuses existing `delete_chat_session` function from `model.py`
### Frontend
- Add delete button (trash icon) that appears on hover for each chat
session
- Add confirmation dialog before deletion using existing
`DeleteConfirmDialog` component
- Refresh session list after successful delete
- Clear current session selection if the deleted session was active
- Update OpenAPI spec with new endpoint
## Testing
1. Hover over a chat session in sidebar → trash icon appears
2. Click trash icon → confirmation dialog
3. Confirm deletion → session removed, list refreshes
4. If deleted session was active, selection is cleared
## Screenshots
Delete button appears on hover, confirmation dialog on click.
## Related Issues
Closes SECRT-1928
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<details><summary><h3>Greptile Summary</h3></summary>
Adds the ability to delete chat sessions from the CoPilot interface — a
new `DELETE /api/chat/sessions/{session_id}` backend endpoint and a
corresponding delete button with confirmation dialog in the
`ChatSidebar` frontend component.
- **Backend route** (`routes.py`): Clean implementation reusing the
existing `delete_chat_session` model function with proper auth guards
and 204/404 responses. No issues.
- **Frontend** (`ChatSidebar.tsx`): Adds hover-visible trash icon per
session, confirmation dialog, mutation with cache invalidation, and
active session clearing on delete. However, it uses a `__legacy__`
component (`DeleteConfirmDialog`) which violates the project's style
guide — new code should use the modern design system components. Error
handling only logs to console without user-facing feedback (project
convention is to use toast notifications for mutation errors).
`isDeleting` is destructured but unused.
- **OpenAPI spec** updated correctly.
- **Unrelated file included**:
`notes/plan-SECRT-1959-graph-edge-desync.md` is a planning document for
a different ticket and should be removed from this PR. The `notes/`
directory is newly introduced and both plan files should be reconsidered
for inclusion.
</details>
<details><summary><h3>Confidence Score: 3/5</h3></summary>
- Functionally correct but has style guide violations and includes
unrelated files that should be addressed before merge.
- The core feature implementation (backend DELETE endpoint and frontend
mutation logic) is sound and follows existing patterns. Score is lowered
because: (1) the frontend uses a legacy component explicitly prohibited
by the project's style guide, (2) mutation errors are not surfaced to
the user, and (3) the PR includes an unrelated planning document for a
different ticket.
- Pay close attention to `ChatSidebar.tsx` for the legacy component
import and error handling, and
`notes/plan-SECRT-1959-graph-edge-desync.md` which should be removed.
</details>
<details><summary><h3>Sequence Diagram</h3></summary>
```mermaid
sequenceDiagram
participant User
participant ChatSidebar as ChatSidebar (Frontend)
participant ReactQuery as React Query
participant API as DELETE /api/chat/sessions/{id}
participant Model as model.delete_chat_session
participant DB as db.delete_chat_session (Prisma)
participant Redis as Redis Cache
User->>ChatSidebar: Click trash icon on session
ChatSidebar->>ChatSidebar: Show DeleteConfirmDialog
User->>ChatSidebar: Confirm deletion
ChatSidebar->>ReactQuery: deleteSession({ sessionId })
ReactQuery->>API: DELETE /api/chat/sessions/{session_id}
API->>Model: delete_chat_session(session_id, user_id)
Model->>DB: delete_many(where: {id, userId})
DB-->>Model: bool (deleted count > 0)
Model->>Redis: Delete session cache key
Model->>Model: Clean up session lock
Model-->>API: True
API-->>ReactQuery: 204 No Content
ReactQuery->>ChatSidebar: onSuccess callback
ChatSidebar->>ReactQuery: invalidateQueries(sessions list)
ChatSidebar->>ChatSidebar: Clear sessionId if deleted was active
```
</details>
<sub>Last reviewed commit: 44a92c6</sub>
<!-- greptile_other_comments_section -->
<details><summary><h4>Context used (3)</h4></summary>
- Context from `dashboard` - autogpt_platform/frontend/CLAUDE.md
([source](https://app.greptile.com/review/custom-context?memory=39861924-d320-41ba-a1a7-a8bff44f780a))
- Context from `dashboard` - autogpt_platform/frontend/CONTRIBUTING.md
([source](https://app.greptile.com/review/custom-context?memory=cc4f1b17-cb5c-4b63-b218-c772b48e20ee))
- Context from `dashboard` - autogpt_platform/CLAUDE.md
([source](https://app.greptile.com/review/custom-context?memory=6e9dc5dc-8942-47df-8677-e60062ec8c3a))
</details>
<!-- /greptile_comment -->
---------
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
## Summary
Enhances the existing `ConcatenateListsBlock` and adds five new
companion blocks for comprehensive list manipulation, addressing issue
#11139 ("Implement block to concatenate lists").
### Changes
- **Enhanced `ConcatenateListsBlock`** with optional deduplication
(`deduplicate`) and None-value filtering (`remove_none`), plus an output
`length` field
- **New `FlattenListBlock`**: Recursively flattens nested list
structures with configurable `max_depth`
- **New `InterleaveListsBlock`**: Round-robin interleaving of elements
from multiple lists
- **New `ZipListsBlock`**: Zips corresponding elements from multiple
lists with support for padding to longest or truncating to shortest
- **New `ListDifferenceBlock`**: Computes set difference between two
lists (regular or symmetric)
- **New `ListIntersectionBlock`**: Finds common elements between two
lists, preserving order
### Helper Utilities
Extracted reusable helper functions for validation, flattening,
deduplication, interleaving, chunking, and statistics computation to
support the blocks and enable future reuse.
### Test Coverage
Comprehensive test suite with 188 test functions across 29 test classes
covering:
- Built-in block test harness validation for all 6 blocks
- Manual edge-case tests for each block (empty inputs, large lists,
mixed types, nested structures)
- Internal method tests for all block classes
- Unit tests for all helper utility functions
Closes#11139
## Test plan
- [x] All files pass Python syntax validation (`ast.parse`)
- [x] Built-in `test_input`/`test_output` tests defined for all blocks
- [x] Manual tests cover edge cases: empty lists, large lists, mixed
types, nested structures, deduplication, None removal
- [x] Helper function tests validate all utility functions independently
- [x] All block IDs are valid UUID4
- [x] Block categories set to `BlockCategory.BASIC` for consistency with
existing list blocks
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<details><summary><h3>Greptile Summary</h3></summary>
Enhanced `ConcatenateListsBlock` with deduplication and None-filtering
options, and added five new list manipulation blocks
(`FlattenListBlock`, `InterleaveListsBlock`, `ZipListsBlock`,
`ListDifferenceBlock`, `ListIntersectionBlock`) with comprehensive
helper functions and test coverage.
**Key Changes:**
- Enhanced `ConcatenateListsBlock` with `deduplicate` and `remove_none`
options, plus `length` output field
- Added `FlattenListBlock` for recursively flattening nested lists with
configurable `max_depth`
- Added `InterleaveListsBlock` for round-robin element interleaving
- Added `ZipListsBlock` with support for padding/truncation
- Added `ListDifferenceBlock` and `ListIntersectionBlock` for set
operations
- Extracted 12 reusable helper functions for validation, flattening,
deduplication, etc.
- Comprehensive test suite with 188 test functions covering edge cases
**Minor Issues:**
- Helper function `_deduplicate_list` has redundant logic in the `else`
branch that duplicates the `if` branch
- Three helper functions (`_filter_empty_collections`,
`_compute_list_statistics`, `_chunk_list`) are defined but unused -
consider removing unless planned for future use
- The `_make_hashable` function uses `hash(repr(item))` for unhashable
types, which correctly treats structurally identical dicts/lists as
duplicates
</details>
<details><summary><h3>Confidence Score: 4/5</h3></summary>
- Safe to merge with minor style improvements recommended
- The implementation is well-structured with comprehensive test coverage
(188 tests), proper error handling, and follows existing block patterns.
All blocks use valid UUID4 IDs and correct categories. The helper
functions provide good code reuse. The minor issues are purely stylistic
(redundant code, unused helpers) and don't affect functionality or
safety.
- No files require special attention - both files are well-tested and
follow project conventions
</details>
<details><summary><h3>Sequence Diagram</h3></summary>
```mermaid
sequenceDiagram
participant User
participant Block as List Block
participant Helper as Helper Functions
participant Output
User->>Block: Input (lists/parameters)
Block->>Helper: _validate_all_lists()
Helper-->>Block: validation result
alt validation fails
Block->>Output: error message
else validation succeeds
Block->>Helper: _concatenate_lists_simple() / _flatten_nested_list() / etc.
Helper-->>Block: processed result
opt deduplicate enabled
Block->>Helper: _deduplicate_list()
Helper-->>Block: deduplicated result
end
opt remove_none enabled
Block->>Helper: _filter_none_values()
Helper-->>Block: filtered result
end
Block->>Output: result + length
end
Output-->>User: Block outputs
```
</details>
<sub>Last reviewed commit: a6d5445</sub>
<!-- greptile_other_comments_section -->
<sub>(2/5) Greptile learns from your feedback when you react with thumbs
up/down!</sub>
<!-- /greptile_comment -->
---------
Co-authored-by: Otto <otto@agpt.co>
## Summary
- Enable Claude Agent SDK built-in **WebSearch** tool (Brave Search via
Anthropic API) for the CoPilot SDK agent
- Explicitly **block WebFetch** via `SDK_DISALLOWED_TOOLS`. The agent
uses the SSRF-protected `mcp__copilot__web_fetch` MCP tool instead
- **Consolidate** all tool security constants (`BLOCKED_TOOLS`,
`WORKSPACE_SCOPED_TOOLS`, `DANGEROUS_PATTERNS`, `SDK_DISALLOWED_TOOLS`)
into `tool_adapter.py` as a single source of truth — previously
scattered across `tool_adapter.py`, `security_hooks.py`, and inline in
`service.py`
## Changes
- `tool_adapter.py`: Add `WebSearch` to `_SDK_BUILTIN_TOOLS`, add
`SDK_DISALLOWED_TOOLS`, move security constants here
- `security_hooks.py`: Import constants from `tool_adapter.py` instead
of defining locally
- `service.py`: Use `SDK_DISALLOWED_TOOLS` instead of inline `["Bash"]`
## Test plan
- [x] All 21 security hooks tests pass
- [x] Ruff lint clean
- [x] All pre-commit hooks pass
- [ ] Verify WebSearch works in CoPilot chat (manual test)
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<details><summary><h3>Greptile Summary</h3></summary>
Consolidates tool security constants into `tool_adapter.py` as single
source of truth, enables WebSearch (Brave via Anthropic API), and
explicitly blocks WebFetch to prevent SSRF attacks. The change improves
security by ensuring the agent uses the SSRF-protected
`mcp__copilot__web_fetch` tool instead of the built-in WebFetch which
can access internal networks like `localhost:8006`.
</details>
<details><summary><h3>Confidence Score: 5/5</h3></summary>
- This PR is safe to merge with minimal risk
- The changes improve security by blocking WebFetch (SSRF risk) while
enabling safe WebSearch. The consolidation of constants into a single
source of truth improves maintainability. All existing tests pass (21
security hooks tests), and the refactoring is straightforward with no
behavioral changes to existing security logic. The only suggestions are
minor improvements: adding a test for WebFetch blocking and considering
a lowercase alias for consistency.
- No files require special attention
</details>
<details><summary><h3>Sequence Diagram</h3></summary>
```mermaid
sequenceDiagram
participant Agent as SDK Agent
participant Hooks as Security Hooks
participant TA as tool_adapter.py
participant MCP as MCP Tools
Note over TA: SDK_DISALLOWED_TOOLS = ["Bash", "WebFetch"]
Note over TA: _SDK_BUILTIN_TOOLS includes WebSearch
Agent->>Hooks: Request WebSearch (Brave API)
Hooks->>TA: Check BLOCKED_TOOLS
TA-->>Hooks: Not blocked
Hooks-->>Agent: Allowed ✓
Agent->>Agent: Execute via Anthropic API
Agent->>Hooks: Request WebFetch (SSRF risk)
Hooks->>TA: Check BLOCKED_TOOLS
Note over TA: WebFetch in SDK_DISALLOWED_TOOLS
TA-->>Hooks: Blocked
Hooks-->>Agent: Denied ✗
Note over Agent: Use mcp__copilot__web_fetch instead
Agent->>Hooks: Request mcp__copilot__web_fetch
Hooks->>MCP: Validate (MCP tool, not SDK builtin)
MCP-->>Hooks: Has SSRF protection
Hooks-->>Agent: Allowed ✓
Agent->>MCP: Execute with SSRF checks
```
</details>
<sub>Last reviewed commit: 2d9975f</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
## Summary
- catch Jina reader client/server errors in ExtractWebsiteContentBlock
and surface a clear error output keyed to the user URL
- guard empty responses to return an explicit error instead of yielding
blank content
- add regression tests covering the happy path and HTTP client failures
via a monkeypatched fetch
## Testing
- not run (pytest unavailable in this environment)
---------
Co-authored-by: Nicholas Tindle <nicktindle@outlook.com>
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
- Resolve merge conflicts from merged SDK changes (PR #12103)
- Move sdk/ files from api/features/chat/sdk/ to copilot/sdk/
- Fix all imports to use backend.copilot.* paths
- Move new tools (bash_exec, sandbox, web_fetch, feature_requests,
check_operation_status) to copilot/tools/ with updated imports
- Add append_and_save_message to model.py (adapted to chat_db() pattern)
- Wire SDK service into copilot executor processor with feature flag
- Add track_user_message to routes.py stream handler
## Summary
<img width="1000" alt="image"
src="https://github.com/user-attachments/assets/18e8ef34-d222-453c-8b0a-1b25ef8cf806"
/>
<img width="250" alt="image"
src="https://github.com/user-attachments/assets/ba97556c-09c5-4f76-9f4e-49a2e8e57468"
/>
<img width="250" alt="image"
src="https://github.com/user-attachments/assets/68f7804a-fe74-442d-9849-39a229c052cf"
/>
<img width="250" alt="image"
src="https://github.com/user-attachments/assets/700690ba-f9fe-4726-8871-3bfbab586001"
/>
Full-stack MCP (Model Context Protocol) tool block integration that
allows users to connect to any MCP server, discover available tools,
authenticate via OAuth, and execute tools — all through the standard
AutoGPT credential system.
### Backend
- **MCPToolBlock** (`blocks/mcp/block.py`): New block using
`CredentialsMetaInput` pattern with optional credentials (`default={}`),
supporting both authenticated (OAuth) and public MCP servers. Includes
auto-lookup fallback for backward compatibility.
- **MCP Client** (`blocks/mcp/client.py`): HTTP transport with JSON-RPC
2.0, tool discovery, tool execution with robust error handling
(type-checked error fields, non-JSON response handling)
- **MCP OAuth Handler** (`blocks/mcp/oauth.py`): RFC 8414 discovery,
dynamic per-server OAuth with PKCE, token storage and refresh via
`raise_for_status=True`
- **MCP API Routes** (`api/features/mcp/routes.py`): `discover-tools`,
`oauth/login`, `oauth/callback` endpoints with credential cleanup,
defensive OAuth metadata validation
- **Credential system integration**:
- `CredentialsMetaInput` model_validator normalizes legacy
`"ProviderName.MCP"` format from Python 3.13's `str(StrEnum)` change
- `CredentialsFieldInfo.combine()` supports URL-based credential
discrimination (each MCP server gets its own credential entry)
- `aggregate_credentials_inputs` checks block schema defaults for
credential optionality
- Executor normalizes credential data for both Pydantic and JSON schema
validation paths
- Chat credential matching handles MCP server URL filtering
- `provider_matches()` helper used consistently for Python 3.13 StrEnum
compatibility
- **Pre-run validation**: `_validate_graph_get_errors` now calls
`get_missing_input()` for custom block-level validation (MCP tool
arguments)
- **Security**: HTML tag stripping loop to prevent XSS bypass, SSRF
protection (removed trusted_origins)
### Frontend
- **MCPToolDialog** (`MCPToolDialog.tsx`): Full tool discovery UI —
enter server URL, authenticate if needed, browse tools, select tool and
configure
- **OAuth popup** (`oauth-popup.ts`): Shared utility supporting
cross-origin MCP OAuth flows with BroadcastChannel + localStorage
fallback
- **Credential integration**: MCP-specific OAuth flow in
`useCredentialsInput`, server URL filtering in `useCredentials`, MCP
callback page
- **CredentialsSelect**: Auto-selects first available credential instead
of defaulting to "None", credentials listed before "None" in dropdown
- **Node rendering**: Dynamic tool input schema rendering on MCP nodes,
proper handling in both legacy and new flow editors
- **Block title persistence**: `customized_name` set at block creation
for both MCP and Agent blocks — no fallback logic needed, titles survive
save/load reliably
- **Stable credential ordering**: Removed `sortByUnsetFirst` that caused
credential inputs to jump when selected
### Tests (~2060 lines)
- Unit tests: block, client, tool execution
- Integration tests: mock MCP server with auth
- OAuth flow tests
- API endpoint tests
- Credential combining/optionality tests
- E2e tests (skipped in CI, run manually)
## Key Design Decisions
1. **Optional credentials via `default={}`**: MCP servers can be public
(no auth) or private (OAuth). The `credentials` field has `default={}`
making it optional at the schema level, so public servers work without
prompting for credentials.
2. **URL-based credential discrimination**: Each MCP server URL gets its
own credential entry in the "Run agent" form (via
`discriminator="server_url"`), so agents using multiple MCP servers
prompt for each independently.
3. **Model-level normalization**: Python 3.13 changed `str(StrEnum)` to
return `"ClassName.MEMBER"`. Rather than scattering fixes across the
codebase, a Pydantic `model_validator(mode="before")` on
`CredentialsMetaInput` handles normalization centrally, and
`provider_matches()` handles lookups.
4. **Credential auto-select**: `CredentialsSelect` component defaults to
the first available credential and notifies the parent state, ensuring
credentials are pre-filled in the "Run agent" dialog without requiring
manual selection.
5. **customized_name for block titles**: Both MCP and Agent blocks set
`customized_name` in metadata at creation time. This eliminates
convoluted runtime fallback logic (`agent_name`, hostname extraction) —
the title is persisted once and read directly.
## Test plan
- [x] Unit/integration tests pass (68 MCP + 11 graph = 79 tests)
- [x] Manual: MCP block with public server (DeepWiki) — no credentials
needed, tools discovered and executable
- [x] Manual: MCP block with OAuth server (Linear, Sentry) — OAuth flow
prompts correctly
- [x] Manual: "Run agent" form shows correct credential requirements per
MCP server
- [x] Manual: Credential auto-selects when exactly one matches,
pre-selects first when multiple exist
- [x] Manual: Credential ordering stays stable when
selecting/deselecting
- [x] Manual: MCP block title persists after save and refresh
- [x] Manual: Agent block title persists after save and refresh (via
customized_name)
- [ ] Manual: Shared agent with MCP block prompts new user for
credentials
---------
Co-authored-by: Otto <otto@agpt.co>
Co-authored-by: Ubbe <hi@ubbe.dev>
## Summary
Full integration of the **Claude Agent SDK** to replace the existing
one-turn OpenAI-compatible CoPilot implementation with a multi-turn,
tool-using AI agent.
### What changed
**Core SDK Integration** (`chat/sdk/` — new module)
- **`service.py`**: Main orchestrator — spawns Claude Code CLI as a
subprocess per user message, streams responses back via SSE. Handles
conversation history compression, session lifecycle, and error recovery.
- **`response_adapter.py`**: Translates Claude Agent SDK events (text
deltas, tool use, errors, result messages) into the existing CoPilot
`StreamEvent` protocol so the frontend works unchanged.
- **`tool_adapter.py`**: Bridges CoPilot's MCP tools (find_block,
run_block, create_agent, etc.) into the SDK's tool format. Handles
schema conversion and result serialization.
- **`security_hooks.py`**: Pre/Post tool-use hooks that enforce a strict
allowlist of tools, block path traversal, sandbox file operations to
per-session workspace directories, cap sub-agent spawning, and prevent
the model from accessing unauthorized system resources.
- **`transcript.py`**: JSONL transcript I/O utilities for the stateless
`--resume` feature (see below).
**Stateless Multi-Turn Resume** (new)
- Instead of compressing conversation history via LLM on every turn
(lossy and expensive), we capture Claude Code's native JSONL session
transcript via a **Stop hook** callback, persist it in the DB
(`ChatSession.sdkTranscript`), and restore it on the next turn via
`--resume <file>`.
- This preserves full tool call/result context across turns with zero
token overhead for history.
- Feature-flagged via `CLAUDE_AGENT_USE_RESUME` (default: off).
- DB migration: `ALTER TABLE "ChatSession" ADD COLUMN "sdkTranscript"
TEXT`.
**Sandboxed Tool Execution** (`chat/tools/`)
- **`bash_exec.py`**: Sandboxed bash execution using bubblewrap
(`bwrap`) with read-only root filesystem, per-session writable
workspace, resource limits (CPU, memory, file size), and network
isolation.
- **`sandbox.py`**: Shared bubblewrap sandbox infrastructure — generates
`bwrap` command lines with configurable mounts, environment, and
resource constraints.
- **`web_fetch.py`**: URL fetching tool with domain allowlist, size
limits, and content-type filtering.
- **`check_operation_status.py`**: Polling tool for long-running
operations (agent creation, block execution) so the SDK doesn't block
waiting.
- **`find_block.py`** / **`run_block.py`**: Enhanced with category
filtering, optimized response size (removed raw JSON schemas), and
better error handling.
**Security**
- Path traversal prevention: session IDs sanitized, all file ops
confined to workspace dirs, symlink resolution.
- Tool allowlist enforcement via SDK hooks — model cannot call arbitrary
tools.
- Built-in `Bash` tool blocked via `disallowed_tools` to prevent
bypassing sandboxed `bash_exec`.
- Sub-agent (`Task`) spawning capped at configurable limit (default:
10).
- CodeQL-clean path sanitization patterns.
**Streaming & Reconnection**
- SSE stream registry backed by Redis Streams for crash-resilient
reconnection.
- Long-running operation tracking with TTL-based cleanup.
- Atomic message append to prevent race conditions on concurrent writes.
**Configuration** (`config.py`)
- `use_claude_agent_sdk` — master toggle (default: on)
- `claude_agent_model` — model override for SDK path
- `claude_agent_max_buffer_size` — JSON parsing buffer (10MB)
- `claude_agent_max_subtasks` — sub-agent cap (10)
- `claude_agent_use_resume` — transcript-based resume (default: off)
- `thinking_enabled` — extended thinking for Claude models
**Tests**
- `sdk/response_adapter_test.py` — 366 lines covering all event
translation paths
- `sdk/security_hooks_test.py` — 165 lines covering tool blocking, path
traversal, subtask limits
- `chat/model_test.py` — 214 lines covering session model serialization
- `chat/service_test.py` — Integration tests including multi-turn resume
keyword recall
- `tools/find_block_test.py` / `run_block_test.py` — Extended with new
tool behavior tests
## Test plan
- [x] Unit tests pass (`sdk/response_adapter_test.py`,
`security_hooks_test.py`, `model_test.py`)
- [x] Integration test: multi-turn keyword recall via `--resume`
(`service_test.py::test_sdk_resume_multi_turn`)
- [x] Manual E2E: CoPilot chat sessions with tool calls, bash execution,
and multi-turn context
- [x] Pre-commit hooks pass (ruff, isort, black, pyright, flake8)
- [ ] Staging deployment with `claude_agent_use_resume=false` initially
- [ ] Enable resume in staging, verify transcript capture and recall
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<details><summary><h3>Greptile Summary</h3></summary>
This PR replaces the existing OpenAI-compatible CoPilot with a full
Claude Agent SDK integration, introducing multi-turn conversations,
stateless resume via JSONL transcripts, and sandboxed tool execution.
**Key changes:**
- **SDK integration** (`chat/sdk/`): spawns Claude Code CLI subprocess
per message, translates events to frontend protocol, bridges MCP tools
- **Stateless resume**: captures JSONL transcripts via Stop hook,
persists in `ChatSession.sdkTranscript`, restores with `--resume`
(feature-flagged, default off)
- **Sandboxed execution**: bubblewrap sandbox for bash commands with
filesystem whitelist, network isolation, resource limits
- **Security hooks**: tool allowlist enforcement, path traversal
prevention, workspace-scoped file operations, sub-agent spawn limits
- **Long-running operations**: delegates `create_agent`/`edit_agent` to
existing stream_registry infrastructure for SSE reconnection
- **Feature flag**: `CHAT_USE_CLAUDE_AGENT_SDK` with LaunchDarkly
support, defaults to enabled
**Security issues found:**
- Path traversal validation has logic errors in `security_hooks.py:82`
(tilde expansion order) and `service.py:266` (redundant `..` check)
- Config validator always prefers env var over explicit `False` value
(`config.py:162`)
- Race condition in `routes.py:323` — message persisted before task
registration, could duplicate on retry
- Resource limits in sandbox may fail silently (`sandbox.py:109`)
**Test coverage is strong** with 366 lines for response adapter, 165 for
security hooks, and integration tests for multi-turn resume.
</details>
<details><summary><h3>Confidence Score: 3/5</h3></summary>
- This PR is generally safe but has critical security issues in path
validation that must be fixed before merge
- Score reflects strong architecture and test coverage offset by real
security vulnerabilities: the tilde expansion bug in `security_hooks.py`
could allow sandbox escape, the race condition could cause message
duplication, and the silent ulimit failures could bypass resource
limits. The bubblewrap sandbox and allowlist enforcement are
well-designed, but the path validation bugs need fixing. The transcript
resume feature is properly feature-flagged. Overall the implementation
is solid but the security issues prevent a higher score.
- Pay close attention to
`backend/api/features/chat/sdk/security_hooks.py` (path traversal
vulnerability), `backend/api/features/chat/routes.py` (race condition),
`backend/api/features/chat/tools/sandbox.py` (silent resource limit
failures), and `backend/api/features/chat/sdk/service.py` (redundant
security check)
</details>
<details><summary><h3>Sequence Diagram</h3></summary>
```mermaid
sequenceDiagram
participant Frontend
participant Routes as routes.py
participant SDKService as sdk/service.py
participant ClaudeSDK as Claude Agent SDK CLI
participant SecurityHooks as security_hooks.py
participant ToolAdapter as tool_adapter.py
participant CoPilotTools as tools/*
participant Sandbox as sandbox.py (bwrap)
participant DB as Database
participant Redis as stream_registry
Frontend->>Routes: POST /chat (user message)
Routes->>SDKService: stream_chat_completion_sdk()
SDKService->>DB: get_chat_session()
DB-->>SDKService: session + messages
alt Resume enabled AND transcript exists
SDKService->>SDKService: validate_transcript()
SDKService->>SDKService: write_transcript_to_tempfile()
Note over SDKService: Pass --resume to SDK
else No resume
SDKService->>SDKService: _compress_conversation_history()
Note over SDKService: Inject history into user message
end
SDKService->>SecurityHooks: create_security_hooks()
SDKService->>ToolAdapter: create_copilot_mcp_server()
SDKService->>ClaudeSDK: spawn subprocess with MCP server
loop Streaming Conversation
ClaudeSDK->>SDKService: AssistantMessage (text/tool_use)
SDKService->>Frontend: StreamTextDelta / StreamToolInputAvailable
alt Tool Call
ClaudeSDK->>SecurityHooks: PreToolUse hook
SecurityHooks->>SecurityHooks: validate path, check allowlist
alt Tool blocked
SecurityHooks-->>ClaudeSDK: deny
else Tool allowed
SecurityHooks-->>ClaudeSDK: allow
ClaudeSDK->>ToolAdapter: call MCP tool
alt Long-running tool (create_agent, edit_agent)
ToolAdapter->>Redis: register task
ToolAdapter->>DB: save OperationPendingResponse
ToolAdapter->>ToolAdapter: spawn background task
ToolAdapter-->>ClaudeSDK: OperationStartedResponse
else Regular tool (find_block, bash_exec)
ToolAdapter->>CoPilotTools: execute()
alt bash_exec
CoPilotTools->>Sandbox: run_sandboxed()
Sandbox->>Sandbox: build bwrap command
Note over Sandbox: Network isolation,<br/>filesystem whitelist,<br/>resource limits
Sandbox-->>CoPilotTools: stdout, stderr, exit_code
end
CoPilotTools-->>ToolAdapter: result
ToolAdapter->>ToolAdapter: stash full output
ToolAdapter-->>ClaudeSDK: MCP response
end
SecurityHooks->>SecurityHooks: PostToolUse hook (log)
end
end
ClaudeSDK->>SDKService: UserMessage (ToolResultBlock)
SDKService->>ToolAdapter: pop_pending_tool_output()
SDKService->>Frontend: StreamToolOutputAvailable
end
ClaudeSDK->>SecurityHooks: Stop hook
SecurityHooks->>SDKService: transcript_path callback
SDKService->>SDKService: read_transcript_file()
SDKService->>DB: save transcript to session.sdkTranscript
ClaudeSDK->>SDKService: ResultMessage (success)
SDKService->>Frontend: StreamFinish
SDKService->>DB: upsert_chat_session()
```
</details>
<sub>Last reviewed commit: 28c1121</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
---------
Co-authored-by: Swifty <craigswift13@gmail.com>
## Summary
Disables the Print to Console block as requested by Nick Tindle.
## Changes
- Added `disabled=True` to PrintToConsoleBlock in `basic.py`
## Testing
- Block will no longer appear in the platform UI
- Existing graphs using this block should be checked (block ID:
`f3b1c1b2-4c4f-4f0d-8d2f-4c4f0d8d2f4c`)
Closes OPEN-3000
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<details><summary><h3>Greptile Summary</h3></summary>
Added `disabled=True` parameter to `PrintToConsoleBlock` in `basic.py`
per Nick Tindle's request (OPEN-3000).
- Block follows the same disabling pattern used by other blocks in the
codebase (e.g., `BlockInstallationBlock`, video blocks, Ayrshare blocks)
- Block will no longer appear in the platform UI for new graph creation
- Existing graphs using this block (ID:
`f3b1c1b2-4c4f-4f0d-8d2f-4c4f0d8d2f4c`) will need to be checked for
compatibility
- Comment properly documents the reason for disabling
</details>
<details><summary><h3>Confidence Score: 5/5</h3></summary>
- This PR is safe to merge with minimal risk
- Single-line change that adds a well-documented flag following existing
patterns used throughout the codebase. The change is non-destructive and
only affects UI visibility of the block for new graphs.
- No files require special attention
</details>
<sub>Last reviewed commit: 759003b</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Users can now search for existing feature requests and submit new ones
directly through the CoPilot chat interface. Requests are tracked in
Linear with customer need attribution.
### Changes 🏗️
**Backend:**
- Added `SearchFeatureRequestsTool` and `CreateFeatureRequestTool` to
the CoPilot chat tools registry
- Integrated with Linear GraphQL API for searching issues in the feature
requests project, creating new issues, upserting customers, and
attaching customer needs
- Added `linear_api_key` secret to settings for system-level Linear API
access
- Added response models (`FeatureRequestSearchResponse`,
`FeatureRequestCreatedResponse`, `FeatureRequestInfo`) to the tools
models
**Frontend:**
- Added `SearchFeatureRequestsTool` and `CreateFeatureRequestTool` UI
components with full streaming state handling (input-streaming,
input-available, output-available, output-error)
- Added helper utilities for output parsing, type guards, animation
text, and icon rendering
- Wired tools into `ChatMessagesContainer` for rendering in the chat
- Added styleguide examples covering all tool states
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Verified search returns matching feature requests from Linear
- [x] Verified creating a new feature request creates an issue and
customer need in Linear
- [x] Verified adding a need to an existing issue works via
`existing_issue_id`
- [x] Verified error states render correctly in the UI
- [x] Verified styleguide page renders all tool states
#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)
New secret: `LINEAR_API_KEY` — required for system-level Linear API
operations (defaults to empty string).
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<details><summary><h3>Greptile Summary</h3></summary>
Adds feature request search and creation tools to CoPilot chat,
integrating with Linear's GraphQL API to track user feedback. Users can
now search existing feature requests and submit new ones (or add their
need to existing issues) directly through conversation.
**Key changes:**
- Backend: `SearchFeatureRequestsTool` and `CreateFeatureRequestTool`
with Linear API integration via system-level `LINEAR_API_KEY`
- Frontend: React components with streaming state handling and accordion
UI for search results and creation confirmations
- Models: Added `FeatureRequestSearchResponse` and
`FeatureRequestCreatedResponse` to response types
- Customer need tracking: Upserts customers in Linear and attaches needs
to issues for better feedback attribution
**Issues found:**
- Missing `LINEAR_API_KEY` entry in `.env.default` (required per PR
description checklist)
- Hardcoded project/team IDs reduce maintainability
- Global singleton pattern could cause issues in async contexts
- Using `user_id` as customer name reduces readability in Linear
</details>
<details><summary><h3>Confidence Score: 4/5</h3></summary>
- Safe to merge with minor configuration fix required
- The implementation is well-structured with proper error handling, type
safety, and follows existing patterns in the codebase. The missing
`.env.default` entry is a straightforward configuration issue that must
be fixed before deployment but doesn't affect code quality. The other
findings are style improvements that don't impact functionality.
- Verify that `LINEAR_API_KEY` is added to `.env.default` before merging
</details>
<details><summary><h3>Sequence Diagram</h3></summary>
```mermaid
sequenceDiagram
participant User
participant CoPilot UI
participant LLM
participant FeatureRequestTool
participant LinearClient
participant Linear API
User->>CoPilot UI: Request feature via chat
CoPilot UI->>LLM: Send user message
LLM->>FeatureRequestTool: search_feature_requests(query)
FeatureRequestTool->>LinearClient: query(SEARCH_ISSUES_QUERY)
LinearClient->>Linear API: POST /graphql (search)
Linear API-->>LinearClient: searchIssues.nodes[]
LinearClient-->>FeatureRequestTool: Feature request data
FeatureRequestTool-->>LLM: FeatureRequestSearchResponse
alt No existing requests found
LLM->>FeatureRequestTool: create_feature_request(title, description)
FeatureRequestTool->>LinearClient: mutate(CUSTOMER_UPSERT_MUTATION)
LinearClient->>Linear API: POST /graphql (upsert customer)
Linear API-->>LinearClient: customer {id, name}
LinearClient-->>FeatureRequestTool: Customer data
FeatureRequestTool->>LinearClient: mutate(ISSUE_CREATE_MUTATION)
LinearClient->>Linear API: POST /graphql (create issue)
Linear API-->>LinearClient: issue {id, identifier, url}
LinearClient-->>FeatureRequestTool: Issue data
FeatureRequestTool->>LinearClient: mutate(CUSTOMER_NEED_CREATE_MUTATION)
LinearClient->>Linear API: POST /graphql (attach need)
Linear API-->>LinearClient: need {id, issue}
LinearClient-->>FeatureRequestTool: Need data
FeatureRequestTool-->>LLM: FeatureRequestCreatedResponse
else Existing request found
LLM->>FeatureRequestTool: create_feature_request(title, description, existing_issue_id)
FeatureRequestTool->>LinearClient: mutate(CUSTOMER_UPSERT_MUTATION)
LinearClient->>Linear API: POST /graphql (upsert customer)
Linear API-->>LinearClient: customer {id}
LinearClient-->>FeatureRequestTool: Customer data
FeatureRequestTool->>LinearClient: mutate(CUSTOMER_NEED_CREATE_MUTATION)
LinearClient->>Linear API: POST /graphql (attach need to existing)
Linear API-->>LinearClient: need {id, issue}
LinearClient-->>FeatureRequestTool: Need data
FeatureRequestTool-->>LLM: FeatureRequestCreatedResponse
end
LLM-->>CoPilot UI: Tool response + continuation
CoPilot UI-->>User: Display result with accordion UI
```
</details>
<sub>Last reviewed commit: af2e093</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
## Summary
Adds comprehensive error logging for OpenRouter/OpenAI API errors to
help diagnose issues like provider routing failures, context length
exceeded, rate limits, etc.
## Background
While investigating
[SECRT-1859](https://linear.app/autogpt/issue/SECRT-1859), we found that
when OpenRouter returns errors, the actual error details weren't being
captured or logged. Langfuse traces showed `provider_name: 'unknown'`
and `completion: null` without any insight into WHY all providers
rejected the request.
## Changes
- Add `_extract_api_error_details()` to extract rich information from
API errors including:
- Status code and request ID
- Response body (contains OpenRouter's actual error message)
- OpenRouter-specific headers (provider, model)
- Rate limit headers
- Add `_log_api_error()` helper that logs errors with context:
- Session ID for correlation
- Message count (helps identify context length issues)
- Model being used
- Retry count
- Update error handling in `_stream_chat_chunks()` and
`_generate_llm_continuation()` to use new logging
- Extract provider's error message from response body for better user
feedback
## Example log output
```
API error: {
'error_type': 'APIStatusError',
'error_message': 'Provider returned error',
'status_code': 400,
'request_id': 'req_xxx',
'response_body': {'error': {'message': 'context_length_exceeded', 'type': 'invalid_request_error'}},
'openrouter_provider': 'unknown',
'session_id': '44fbb803-...',
'message_count': 52,
'model': 'anthropic/claude-opus-4.5',
'retry_count': 0
}
```
## Testing
- [ ] Verified code passes linting (black, isort, ruff)
- [ ] Error details are properly extracted from different error types
## Refs
- Linear: SECRT-1859
- Thread:
https://discord.com/channels/1126875755960336515/1467066151002571034
---------
Co-authored-by: Reinier van der Leer <pwuts@agpt.co>
The frontend `e2e_test` doesn't have a working build cache setup,
causing really slow builds = slow test jobs. These changes reduce total
test runtime from ~12 minutes to ~5 minutes.
### Changes 🏗️
- Inject build cache config into docker compose config; let `buildx
bake` use GHA cache directly
- Add `docker-ci-fix-compose-build-cache.py` script
- Optimize `backend/Dockerfile` + root `.dockerignore`
- Replace broken DIY pnpm store caching with `actions/setup-node`
built-in cache management
- Add caching for test seed data created in DB
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- CI
### Changes 🏗️
The `find_block` AutoPilot tool was returning ~90K characters per
response (10 blocks). The bloat came from including full JSON Schema
objects (`input_schema`, `output_schema`) with all nested `$defs`,
`anyOf`, and type definitions for every block.
**What changed:**
- **`BlockInfoSummary` model**: Removed `input_schema` (raw JSON
Schema), `output_schema` (raw JSON Schema), and `categories`. Added
`output_fields` (compact field-level summaries matching the existing
`required_inputs` format).
- **`BlockListResponse` model**: Removed `usage_hint` (info now in
`message`).
- **`FindBlockTool._execute()`**: Now extracts compact `output_fields`
from output schema properties instead of including the entire raw
schema. Credentials handling is unchanged.
- **Test**: Added `test_response_size_average_chars_per_block` with
realistic block schemas (HTTP, Email, Claude Code) to measure and assert
response size stays under 2K chars/block.
- **`CLAUDE.md`**: Clarified `dev` vs `master` branching strategy.
**Result:** Average response size reduced from ~9,000 to ~1,300 chars
per block (~85% reduction). This directly reduces LLM token consumption,
latency, and API costs for AutoPilot interactions.
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Verified models import and serialize correctly
- [x] Verified response size: 3,970 chars for 3 realistic blocks (avg
1,323/block)
- [x] Lint (`ruff check`) and type check (`pyright`) pass on changed
files
- [x] Frontend compatibility preserved: `blocks[].name` and `count`
fields retained for `block_list` handler
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Toran Bruce Richards <toran.richards@gmail.com>
Agent generation completes on the backend but the UI does not
update/refresh to show the result.
### Changes 🏗️
![Uploading Screenshot 2026-02-13 at 00.44.54.png…]()
- **Stream start timeout (12s):** If the backend doesn't begin streaming
within 12 seconds of submitting a message, the stream is aborted and a
destructive toast is shown to the user.
- **Long-running tool polling:** Added `useLongRunningToolPolling` hook
that polls the session endpoint every 1.5s while a tool output is in an
operating state (`operation_started` / `operation_pending` /
`operation_in_progress`). When the backend completes, messages are
refreshed so the UI reflects the final result.
- **CreateAgent UI improvements:** Replaced the orbit loader / progress
bar with a mini-game, added expanded accordion for saved agents, and
improved the saved-agent card with image, icons, and links that open in
new tabs.
- **Backend tweaks:** Added `image_url` to `CreateAgentToolOutput`,
minor model/service updates for the dummy agent generator.
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Send a message and verify the stream starts within 12s or a toast
appears
- [x] Trigger agent creation and verify the UI updates when the backend
completes
- [x] Verify the saved-agent card renders correctly with image, links,
and icons
---------
Co-authored-by: Otto <otto@agpt.co>
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Store files created by sandbox blocks (Claude Code, Code Executor) to
the user's workspace for persistence across runs.
### Changes 🏗️
- **New `sandbox_files.py` utility** (`backend/util/sandbox_files.py`)
- Shared module for extracting files from E2B sandboxes
- Stores files to workspace via `store_media_file()` (includes virus
scanning, size limits)
- Returns `SandboxFileOutput` with path, content, and `workspace_ref`
- **Claude Code block** (`backend/blocks/claude_code.py`)
- Added `workspace_ref` field to `FileOutput` schema
- Replaced inline `_extract_files()` with shared utility
- Files from working directory now stored to workspace automatically
- **Code Executor block** (`backend/blocks/code_executor.py`)
- Added `files` output field to `ExecuteCodeBlock.Output`
- Creates `/output` directory in sandbox before execution
- Extracts all files (text + binary) from `/output` after execution
- Updated `execute_code()` to support file extraction with
`extract_files` param
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Create agent with Claude Code block, have it create a file, verify
`workspace_ref` in output
- [x] Create agent with Code Executor block, write file to `/output`,
verify `workspace_ref` in output
- [x] Verify files persist in workspace after sandbox disposal
- [x] Verify binary files (images, etc.) work correctly in Code Executor
- [x] Verify existing graphs using `content` field still work (backward
compat)
#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)
No configuration changes required - this is purely additive backend
code.
---
**Related:** Closes SECRT-1931
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> **Medium Risk**
> Adds automatic extraction and workspace storage of sandbox-written
files (including binaries for code execution), which can affect output
payload size, performance, and file-handling edge cases.
>
> **Overview**
> **Sandbox blocks now persist generated files to workspace.** A new
shared utility (`backend/util/sandbox_files.py`) extracts files from an
E2B sandbox (scoped by a start timestamp) and stores them via
`store_media_file`, returning `SandboxFileOutput` with `workspace_ref`.
>
> `ClaudeCodeBlock` replaces its inline file-scraping logic with this
utility and updates the `files` output schema to include
`workspace_ref`.
>
> `ExecuteCodeBlock` adds a `files` output and extends the executor
mixin to optionally extract/store files (text + binary) when an
`execution_context` is provided; related mocks/tests and docs are
updated accordingly.
>
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
343854c0cf. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
I'm getting circular import issues because there is a lot of
cross-importing between `backend.data`, `backend.blocks`, and other
modules. This change reduces block-related cross-imports and thus risk
of breaking circular imports.
### Changes 🏗️
- Strip down `backend.data.block`
- Move `Block` base class and related class/enum defs to
`backend.blocks._base`
- Move `is_block_auth_configured` to `backend.blocks._utils`
- Move `get_blocks()`, `get_io_block_ids()` etc. to `backend.blocks`
(`__init__.py`)
- Update imports everywhere
- Remove unused and poorly typed `Block.create()`
- Change usages from `block_cls.create()` to `block_cls()`
- Improve typing of `load_all_blocks` and `get_blocks`
- Move cross-import of `backend.api.features.library.model` from
`backend/data/__init__.py` to `backend/data/integrations.py`
- Remove deprecated attribute `NodeModel.webhook`
- Re-generate OpenAPI spec and fix frontend usage
- Eliminate module-level `backend.blocks` import from `blocks/agent.py`
- Eliminate module-level `backend.data.execution` and
`backend.executor.manager` imports from `blocks/helpers/review.py`
- Replace `BlockInput` with `GraphInput` for graph inputs
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- CI static type-checking + tests should be sufficient for this