mirror of
https://github.com/Significant-Gravitas/AutoGPT.git
synced 2026-02-19 02:54:28 -05:00
## Summary
Full integration of the **Claude Agent SDK** to replace the existing
one-turn OpenAI-compatible CoPilot implementation with a multi-turn,
tool-using AI agent.
### What changed
**Core SDK Integration** (`chat/sdk/` — new module)
- **`service.py`**: Main orchestrator — spawns Claude Code CLI as a
subprocess per user message, streams responses back via SSE. Handles
conversation history compression, session lifecycle, and error recovery.
- **`response_adapter.py`**: Translates Claude Agent SDK events (text
deltas, tool use, errors, result messages) into the existing CoPilot
`StreamEvent` protocol so the frontend works unchanged.
- **`tool_adapter.py`**: Bridges CoPilot's MCP tools (find_block,
run_block, create_agent, etc.) into the SDK's tool format. Handles
schema conversion and result serialization.
- **`security_hooks.py`**: Pre/Post tool-use hooks that enforce a strict
allowlist of tools, block path traversal, sandbox file operations to
per-session workspace directories, cap sub-agent spawning, and prevent
the model from accessing unauthorized system resources.
- **`transcript.py`**: JSONL transcript I/O utilities for the stateless
`--resume` feature (see below).
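
As a rough illustration of the translation `response_adapter.py` performs, the sketch below maps SDK-style content blocks to stream events. The `TextBlock` and `ToolUseBlock` dataclasses, the dict-shaped events, their field names, and the `StreamError` fallback are stand-ins assumed for this example; they are not the SDK's or CoPilot's actual types.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterator, List


# Hypothetical stand-ins for SDK content blocks (assumed shapes, not the real SDK classes).
@dataclass
class TextBlock:
    text: str


@dataclass
class ToolUseBlock:
    id: str
    name: str
    input: Dict[str, Any]


def translate(blocks: List[object]) -> Iterator[Dict[str, Any]]:
    """Map SDK-style blocks to frontend-style stream events (plain dicts here)."""
    for block in blocks:
        if isinstance(block, TextBlock):
            yield {"type": "StreamTextDelta", "delta": block.text}
        elif isinstance(block, ToolUseBlock):
            yield {
                "type": "StreamToolInputAvailable",
                "toolCallId": block.id,
                "toolName": block.name,
                "input": block.input,
            }
        else:
            # Assumed fallback: surface unknown blocks instead of dropping them silently.
            yield {"type": "StreamError", "message": f"unhandled block: {type(block).__name__}"}
```

Keeping the translation in one generator means the frontend protocol only has to be matched in a single place, which is the property the adapter is described as providing.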
**Stateless Multi-Turn Resume** (new)
- Instead of compressing conversation history via LLM on every turn
(lossy and expensive), we capture Claude Code's native JSONL session
transcript via a **Stop hook** callback, persist it in the DB
(`ChatSession.sdkTranscript`), and restore it on the next turn via
`--resume <file>`.
- This preserves full tool call/result context across turns with zero
token overhead for history.
- Feature-flagged via `CLAUDE_AGENT_USE_RESUME` (default: off).
- DB migration: `ALTER TABLE "ChatSession" ADD COLUMN "sdkTranscript"
TEXT`.
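
The restore half of this flow can be sketched roughly as follows: check that the stored transcript is well-formed JSONL, then write it to a temporary file whose path could be handed to `--resume`. The function bodies are assumptions made for illustration (they reuse names that appear in the sequence diagram, but the real `transcript.py` may validate differently).

```python
import json
import os
import tempfile


def validate_transcript(content: str) -> bool:
    """Return True if there is at least one line and every non-blank line is a JSON object."""
    lines = [ln for ln in content.splitlines() if ln.strip()]
    if not lines:
        return False
    for ln in lines:
        try:
            if not isinstance(json.loads(ln), dict):
                return False
        except json.JSONDecodeError:
            return False
    return True


def write_transcript_to_tempfile(content: str, directory: str) -> str:
    """Write the transcript to a new .jsonl file inside `directory` and return its path."""
    fd, path = tempfile.mkstemp(suffix=".jsonl", dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(content)
    return path
```

If validation fails, falling back to history compression (as the `claude_agent_use_resume` description suggests) avoids passing a corrupt file to the CLI.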
**Sandboxed Tool Execution** (`chat/tools/`)
- **`bash_exec.py`**: Sandboxed bash execution using bubblewrap
(`bwrap`) with read-only root filesystem, per-session writable
workspace, resource limits (CPU, memory, file size), and network
isolation.
- **`sandbox.py`**: Shared bubblewrap sandbox infrastructure — generates
`bwrap` command lines with configurable mounts, environment, and
resource constraints.
- **`web_fetch.py`**: URL fetching tool with domain allowlist, size
limits, and content-type filtering.
- **`check_operation_status.py`**: Polling tool for long-running
operations (agent creation, block execution) so the SDK doesn't block
waiting.
- **`find_block.py`** / **`run_block.py`**: Enhanced with category
filtering, optimized response size (removed raw JSON schemas), and
better error handling.
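
To make the sandbox description concrete, here is a minimal sketch of how a `bwrap` command line with a read-only root, a writable per-session workspace, and network isolation might be assembled. The function name, the limit values, and the choice to chain `ulimit` calls with `&&` are assumptions for this example, not the contents of `sandbox.py`; the `bwrap` flags themselves are standard bubblewrap options.

```python
from typing import List


def build_bwrap_command(workspace: str, command: str, allow_network: bool = False) -> List[str]:
    """Build (but do not run) a bubblewrap argv for one sandboxed shell command."""
    argv = [
        "bwrap",
        "--ro-bind", "/", "/",            # read-only view of the root filesystem
        "--dev", "/dev",                  # minimal /dev
        "--proc", "/proc",                # fresh /proc
        "--tmpfs", "/tmp",                # private, empty /tmp
        "--bind", workspace, workspace,   # the only writable host path
        "--chdir", workspace,
        "--die-with-parent",              # kill the sandbox if the parent exits
    ]
    if not allow_network:
        argv.append("--unshare-net")      # new, empty network namespace
    # Hypothetical limits: 30 s of CPU time and 102400 blocks of file size.
    # Chaining with && means the command does not run if a limit cannot be set.
    limits = "ulimit -t 30 && ulimit -f 102400"
    argv += ["bash", "-c", f"{limits} && {command}"]
    return argv
```

Chaining the limits this way is one possible answer to the concern that resource limits might fail silently: a failed `ulimit` aborts the command rather than letting it run unconstrained.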
**Security**
- Path traversal prevention: session IDs sanitized, all file ops
confined to workspace dirs, symlink resolution.
- Tool allowlist enforcement via SDK hooks — model cannot call arbitrary
tools.
- Built-in `Bash` tool blocked via `disallowed_tools` to prevent
bypassing sandboxed `bash_exec`.
- Sub-agent (`Task`) spawning capped at configurable limit (default:
10).
- CodeQL-clean path sanitization patterns.
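
A minimal sketch of the workspace confinement idea, assuming a hypothetical layout of one directory per session under a workspace root. The helper names and the allowed session-ID character set are illustrative assumptions; the actual checks in `security_hooks.py` may differ. Note the ordering: tilde expansion and symlink resolution happen before the containment check.

```python
import re
from pathlib import Path

# Assumed policy: session IDs may only contain letters, digits, "_" and "-".
_UNSAFE_CHARS = re.compile(r"[^A-Za-z0-9_-]")


def sanitize_session_id(session_id: str) -> str:
    """Strip every character outside the assumed safe set."""
    cleaned = _UNSAFE_CHARS.sub("", session_id)
    if not cleaned:
        raise ValueError("session id is empty after sanitization")
    return cleaned


def resolve_in_workspace(workspace_root: Path, session_id: str, user_path: str) -> Path:
    """Resolve `user_path` and refuse anything that lands outside the session workspace."""
    workspace = (workspace_root / sanitize_session_id(session_id)).resolve()
    candidate = Path(user_path).expanduser()   # expand "~" first ...
    if not candidate.is_absolute():
        candidate = workspace / candidate
    resolved = candidate.resolve()             # ... then collapse ".." and symlinks
    if not resolved.is_relative_to(workspace): # Python 3.9+
        raise PermissionError(f"path escapes workspace: {user_path}")
    return resolved
```

Checking the fully resolved path, rather than searching the raw string for `..`, is what makes a separate `..` substring check redundant.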
**Streaming & Reconnection**
- SSE stream registry backed by Redis Streams for crash-resilient
reconnection.
- Long-running operation tracking with TTL-based cleanup.
- Atomic message append to prevent race conditions on concurrent writes.
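
The reconnection behaviour can be pictured with a small in-memory stand-in: entries get increasing IDs, the log is capped, and a reconnecting client asks for everything after the last ID it saw. This is only an analogy under stated assumptions (class and method names are hypothetical); the real registry uses Redis Streams, which is what lets the data survive a process crash.

```python
from collections import deque
from itertools import count
from typing import Deque, List, Tuple


class ReplayableStream:
    """In-memory illustration of a capped, replayable event log (not Redis)."""

    def __init__(self, max_length: int = 10000) -> None:
        # The oldest entries are dropped once the cap is reached.
        self._entries: Deque[Tuple[int, str]] = deque(maxlen=max_length)
        self._ids = count(1)

    def append(self, payload: str) -> int:
        """Store a payload and return its monotonically increasing ID."""
        entry_id = next(self._ids)
        self._entries.append((entry_id, payload))
        return entry_id

    def read_after(self, last_seen_id: int = 0) -> List[Tuple[int, str]]:
        """Return every retained entry newer than `last_seen_id`."""
        return [(i, p) for i, p in self._entries if i > last_seen_id]
```

A client that disconnects after entry 2 would call `read_after(2)` on reconnect and receive only the events it missed, provided they have not yet been trimmed by the length cap.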
**Configuration** (`config.py`)
- `use_claude_agent_sdk` — master toggle (default: on)
- `claude_agent_model` — model override for SDK path
- `claude_agent_max_buffer_size` — JSON parsing buffer (10MB)
- `claude_agent_max_subtasks` — sub-agent cap (10)
- `claude_agent_use_resume` — transcript-based resume (default: off)
- `thinking_enabled` — extended thinking for Claude models
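
The `claude_agent_model` field description in `config.py` says that, when unset, the SDK model is derived from `model` by stripping the OpenRouter provider prefix. A minimal sketch of that fallback, assuming the prefix is everything up to the first `/` (the helper name is hypothetical, and the real code may normalize the name further):

```python
from typing import Optional


def resolve_sdk_model(model: str, claude_agent_model: Optional[str] = None) -> str:
    """Prefer the explicit override; otherwise drop a leading 'provider/' prefix."""
    if claude_agent_model:
        return claude_agent_model
    # "anthropic/claude-opus-4.6" becomes "claude-opus-4.6"; names without "/" pass through.
    return model.split("/", 1)[-1]
```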
**Tests**
- `sdk/response_adapter_test.py` — 366 lines covering all event
translation paths
- `sdk/security_hooks_test.py` — 165 lines covering tool blocking, path
traversal, subtask limits
- `chat/model_test.py` — 214 lines covering session model serialization
- `chat/service_test.py` — Integration tests including multi-turn resume
keyword recall
- `tools/find_block_test.py` / `run_block_test.py` — Extended with new
tool behavior tests
## Test plan
- [x] Unit tests pass (`sdk/response_adapter_test.py`,
`security_hooks_test.py`, `model_test.py`)
- [x] Integration test: multi-turn keyword recall via `--resume`
(`service_test.py::test_sdk_resume_multi_turn`)
- [x] Manual E2E: CoPilot chat sessions with tool calls, bash execution,
and multi-turn context
- [x] Pre-commit hooks pass (ruff, isort, black, pyright, flake8)
- [ ] Staging deployment with `claude_agent_use_resume=false` initially
- [ ] Enable resume in staging, verify transcript capture and recall
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<details><summary><h3>Greptile Summary</h3></summary>
This PR replaces the existing OpenAI-compatible CoPilot with a full
Claude Agent SDK integration, introducing multi-turn conversations,
stateless resume via JSONL transcripts, and sandboxed tool execution.
**Key changes:**
- **SDK integration** (`chat/sdk/`): spawns Claude Code CLI subprocess
per message, translates events to frontend protocol, bridges MCP tools
- **Stateless resume**: captures JSONL transcripts via Stop hook,
persists in `ChatSession.sdkTranscript`, restores with `--resume`
(feature-flagged, default off)
- **Sandboxed execution**: bubblewrap sandbox for bash commands with
filesystem whitelist, network isolation, resource limits
- **Security hooks**: tool allowlist enforcement, path traversal
prevention, workspace-scoped file operations, sub-agent spawn limits
- **Long-running operations**: delegates `create_agent`/`edit_agent` to
existing stream_registry infrastructure for SSE reconnection
- **Feature flag**: `CHAT_USE_CLAUDE_AGENT_SDK` with LaunchDarkly
support, defaults to enabled
**Security issues found:**
- Path traversal validation has logic errors in `security_hooks.py:82`
(tilde expansion order) and `service.py:266` (redundant `..` check)
- Config validator always prefers env var over explicit `False` value
(`config.py:162`)
- Race condition in `routes.py:323` — message persisted before task
registration, could duplicate on retry
- Resource limits in sandbox may fail silently (`sandbox.py:109`)
**Test coverage is strong** with 366 lines for response adapter, 165 for
security hooks, and integration tests for multi-turn resume.
</details>
<details><summary><h3>Confidence Score: 3/5</h3></summary>
- This PR is generally safe but has critical security issues in path
validation that must be fixed before merge
- Score reflects strong architecture and test coverage offset by real
security vulnerabilities: the tilde expansion bug in `security_hooks.py`
could allow sandbox escape, the race condition could cause message
duplication, and the silent ulimit failures could bypass resource
limits. The bubblewrap sandbox and allowlist enforcement are
well-designed, but the path validation bugs need fixing. The transcript
resume feature is properly feature-flagged. Overall the implementation
is solid but the security issues prevent a higher score.
- Pay close attention to
`backend/api/features/chat/sdk/security_hooks.py` (path traversal
vulnerability), `backend/api/features/chat/routes.py` (race condition),
`backend/api/features/chat/tools/sandbox.py` (silent resource limit
failures), and `backend/api/features/chat/sdk/service.py` (redundant
security check)
</details>
<details><summary><h3>Sequence Diagram</h3></summary>
```mermaid
sequenceDiagram
participant Frontend
participant Routes as routes.py
participant SDKService as sdk/service.py
participant ClaudeSDK as Claude Agent SDK CLI
participant SecurityHooks as security_hooks.py
participant ToolAdapter as tool_adapter.py
participant CoPilotTools as tools/*
participant Sandbox as sandbox.py (bwrap)
participant DB as Database
participant Redis as stream_registry
Frontend->>Routes: POST /chat (user message)
Routes->>SDKService: stream_chat_completion_sdk()
SDKService->>DB: get_chat_session()
DB-->>SDKService: session + messages
alt Resume enabled AND transcript exists
SDKService->>SDKService: validate_transcript()
SDKService->>SDKService: write_transcript_to_tempfile()
Note over SDKService: Pass --resume to SDK
else No resume
SDKService->>SDKService: _compress_conversation_history()
Note over SDKService: Inject history into user message
end
SDKService->>SecurityHooks: create_security_hooks()
SDKService->>ToolAdapter: create_copilot_mcp_server()
SDKService->>ClaudeSDK: spawn subprocess with MCP server
loop Streaming Conversation
ClaudeSDK->>SDKService: AssistantMessage (text/tool_use)
SDKService->>Frontend: StreamTextDelta / StreamToolInputAvailable
alt Tool Call
ClaudeSDK->>SecurityHooks: PreToolUse hook
SecurityHooks->>SecurityHooks: validate path, check allowlist
alt Tool blocked
SecurityHooks-->>ClaudeSDK: deny
else Tool allowed
SecurityHooks-->>ClaudeSDK: allow
ClaudeSDK->>ToolAdapter: call MCP tool
alt Long-running tool (create_agent, edit_agent)
ToolAdapter->>Redis: register task
ToolAdapter->>DB: save OperationPendingResponse
ToolAdapter->>ToolAdapter: spawn background task
ToolAdapter-->>ClaudeSDK: OperationStartedResponse
else Regular tool (find_block, bash_exec)
ToolAdapter->>CoPilotTools: execute()
alt bash_exec
CoPilotTools->>Sandbox: run_sandboxed()
Sandbox->>Sandbox: build bwrap command
Note over Sandbox: Network isolation,<br/>filesystem whitelist,<br/>resource limits
Sandbox-->>CoPilotTools: stdout, stderr, exit_code
end
CoPilotTools-->>ToolAdapter: result
ToolAdapter->>ToolAdapter: stash full output
ToolAdapter-->>ClaudeSDK: MCP response
end
SecurityHooks->>SecurityHooks: PostToolUse hook (log)
end
end
ClaudeSDK->>SDKService: UserMessage (ToolResultBlock)
SDKService->>ToolAdapter: pop_pending_tool_output()
SDKService->>Frontend: StreamToolOutputAvailable
end
ClaudeSDK->>SecurityHooks: Stop hook
SecurityHooks->>SDKService: transcript_path callback
SDKService->>SDKService: read_transcript_file()
SDKService->>DB: save transcript to session.sdkTranscript
ClaudeSDK->>SDKService: ResultMessage (success)
SDKService->>Frontend: StreamFinish
SDKService->>DB: upsert_chat_session()
```
</details>
<sub>Last reviewed commit: 28c1121</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
---------
Co-authored-by: Swifty <craigswift13@gmail.com>
188 lines
6.6 KiB
Python
"""Configuration management for chat system."""

import os

from pydantic import Field, field_validator
from pydantic_settings import BaseSettings


class ChatConfig(BaseSettings):
    """Configuration for the chat system."""

    # OpenAI API Configuration
    model: str = Field(
        default="anthropic/claude-opus-4.6", description="Default model to use"
    )
    title_model: str = Field(
        default="openai/gpt-4o-mini",
        description="Model to use for generating session titles (should be fast/cheap)",
    )
    api_key: str | None = Field(default=None, description="OpenAI API key")
    base_url: str | None = Field(
        default="https://openrouter.ai/api/v1",
        description="Base URL for API (e.g., for OpenRouter)",
    )

    # Session TTL Configuration - 12 hours
    session_ttl: int = Field(default=43200, description="Session TTL in seconds")

    # Streaming Configuration
    stream_timeout: int = Field(default=300, description="Stream timeout in seconds")
    max_retries: int = Field(
        default=3,
        description="Max retries for fallback path (SDK handles retries internally)",
    )
    max_agent_runs: int = Field(default=30, description="Maximum number of agent runs")
    max_agent_schedules: int = Field(
        default=30, description="Maximum number of agent schedules"
    )

    # Long-running operation configuration
    long_running_operation_ttl: int = Field(
        default=600,
        description="TTL in seconds for long-running operation tracking in Redis (safety net if pod dies)",
    )

    # Stream registry configuration for SSE reconnection
    stream_ttl: int = Field(
        default=3600,
        description="TTL in seconds for stream data in Redis (1 hour)",
    )
    stream_max_length: int = Field(
        default=10000,
        description="Maximum number of messages to store per stream",
    )

    # Redis Streams configuration for completion consumer
    stream_completion_name: str = Field(
        default="chat:completions",
        description="Redis Stream name for operation completions",
    )
    stream_consumer_group: str = Field(
        default="chat_consumers",
        description="Consumer group name for completion stream",
    )
    stream_claim_min_idle_ms: int = Field(
        default=60000,
        description="Minimum idle time in milliseconds before claiming pending messages from dead consumers",
    )

    # Redis key prefixes for stream registry
    task_meta_prefix: str = Field(
        default="chat:task:meta:",
        description="Prefix for task metadata hash keys",
    )
    task_stream_prefix: str = Field(
        default="chat:stream:",
        description="Prefix for task message stream keys",
    )
    task_op_prefix: str = Field(
        default="chat:task:op:",
        description="Prefix for operation ID to task ID mapping keys",
    )
    internal_api_key: str | None = Field(
        default=None,
        description="API key for internal webhook callbacks (env: CHAT_INTERNAL_API_KEY)",
    )

    # Langfuse Prompt Management Configuration
    # Note: Langfuse credentials are in Settings().secrets (settings.py)
    langfuse_prompt_name: str = Field(
        default="CoPilot Prompt",
        description="Name of the prompt in Langfuse to fetch",
    )

    # Claude Agent SDK Configuration
    use_claude_agent_sdk: bool = Field(
        default=True,
        description="Use Claude Agent SDK for chat completions",
    )
    claude_agent_model: str | None = Field(
        default=None,
        description="Model for the Claude Agent SDK path. If None, derives from "
        "the `model` field by stripping the OpenRouter provider prefix.",
    )
    claude_agent_max_buffer_size: int = Field(
        default=10 * 1024 * 1024,  # 10MB (default SDK is 1MB)
        description="Max buffer size in bytes for Claude Agent SDK JSON message parsing. "
        "Increase if tool outputs exceed the limit.",
    )
    claude_agent_max_subtasks: int = Field(
        default=10,
        description="Max number of sub-agent Tasks the SDK can spawn per session.",
    )
    claude_agent_use_resume: bool = Field(
        default=True,
        description="Use --resume for multi-turn conversations instead of "
        "history compression. Falls back to compression when unavailable.",
    )

    # Extended thinking configuration for Claude models
    thinking_enabled: bool = Field(
        default=True,
        description="Enable adaptive thinking for Claude models via OpenRouter",
    )

    @field_validator("api_key", mode="before")
    @classmethod
    def get_api_key(cls, v):
        """Get API key from environment if not provided."""
        if v is None:
            # Try to get from environment variables
            # First check for CHAT_API_KEY (Pydantic prefix)
            v = os.getenv("CHAT_API_KEY")
            if not v:
                # Fall back to OPEN_ROUTER_API_KEY
                v = os.getenv("OPEN_ROUTER_API_KEY")
            if not v:
                # Fall back to OPENAI_API_KEY
                v = os.getenv("OPENAI_API_KEY")
        return v

    @field_validator("base_url", mode="before")
    @classmethod
    def get_base_url(cls, v):
        """Get base URL from environment if not provided."""
        if v is None:
            # Check for OpenRouter or custom base URL
            v = os.getenv("CHAT_BASE_URL")
            if not v:
                v = os.getenv("OPENROUTER_BASE_URL")
            if not v:
                v = os.getenv("OPENAI_BASE_URL")
            if not v:
                v = "https://openrouter.ai/api/v1"
        return v

    @field_validator("internal_api_key", mode="before")
    @classmethod
    def get_internal_api_key(cls, v):
        """Get internal API key from environment if not provided."""
        if v is None:
            v = os.getenv("CHAT_INTERNAL_API_KEY")
        return v

    @field_validator("use_claude_agent_sdk", mode="before")
    @classmethod
    def get_use_claude_agent_sdk(cls, v):
        """Get use_claude_agent_sdk from environment if not provided."""
        # Check environment variable - default to True if not set
        env_val = os.getenv("CHAT_USE_CLAUDE_AGENT_SDK", "").lower()
        if env_val:
            return env_val in ("true", "1", "yes", "on")
        # Default to True (SDK enabled by default)
        return True if v is None else v

    # Prompt paths for different contexts
    PROMPT_PATHS: dict[str, str] = {
        "default": "prompts/chat_system.md",
        "onboarding": "prompts/onboarding_system.md",
    }

    class Config:
        """Pydantic config."""

        env_file = ".env"
        env_file_encoding = "utf-8"
        extra = "ignore"  # Ignore extra environment variables
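
The `get_use_claude_agent_sdk` validator in this file consults `CHAT_USE_CLAUDE_AGENT_SDK` before looking at the value it was given, so an explicitly passed `False` can be overridden by the environment (the review comment on `config.py:162` points at this). One possible alternative precedence, shown as a standalone sketch: explicit value first, then the environment, then the default. The helper name and the `environ` parameter are hypothetical, and whether explicit-first is the intended behaviour is an assumption.

```python
import os
from typing import Mapping, Optional

_TRUTHY = {"true", "1", "yes", "on"}


def resolve_flag(
    explicit: Optional[bool],
    env_name: str,
    default: bool,
    environ: Optional[Mapping[str, str]] = None,
) -> bool:
    """Resolve a boolean flag: explicit value, then environment variable, then default."""
    if explicit is not None:
        return explicit  # an explicit False is respected
    env = os.environ if environ is None else environ
    raw = env.get(env_name, "").strip().lower()
    if raw:
        return raw in _TRUTHY
    return default
```

For example, `resolve_flag(False, "CHAT_USE_CLAUDE_AGENT_SDK", True, {"CHAT_USE_CLAUDE_AGENT_SDK": "true"})` returns `False`, whereas the validator above would return `True` in the same situation.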