Zamil Majdy 52b3aebf71 feat(backend/sdk): Claude Agent SDK integration for CoPilot (#12103)
## Summary

Full integration of the **Claude Agent SDK** to replace the existing
one-turn OpenAI-compatible CoPilot implementation with a multi-turn,
tool-using AI agent.

### What changed

**Core SDK Integration** (`chat/sdk/` — new module)
- **`service.py`**: Main orchestrator — spawns Claude Code CLI as a
subprocess per user message, streams responses back via SSE. Handles
conversation history compression, session lifecycle, and error recovery.
- **`response_adapter.py`**: Translates Claude Agent SDK events (text
deltas, tool use, errors, result messages) into the existing CoPilot
`StreamEvent` protocol so the frontend works unchanged (see the sketch
after this list).
- **`tool_adapter.py`**: Bridges CoPilot's MCP tools (find_block,
run_block, create_agent, etc.) into the SDK's tool format. Handles
schema conversion and result serialization.
- **`security_hooks.py`**: Pre/Post tool-use hooks that enforce a strict
allowlist of tools, block path traversal, sandbox file operations to
per-session workspace directories, cap sub-agent spawning, and prevent
the model from accessing unauthorized system resources.
- **`transcript.py`**: JSONL transcript I/O utilities for the stateless
`--resume` feature (see below).
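
A minimal sketch of the event translation described for `response_adapter.py`, assuming dataclass-style events named after those in the sequence diagram below; the block attributes and class shapes are illustrative assumptions, not the actual implementation:

```python
# Illustrative sketch only: translate Claude Agent SDK content blocks
# into CoPilot StreamEvents. Event classes and block attributes are
# assumptions modeled on the event names used elsewhere in this PR.
from dataclasses import dataclass
from typing import Any, Iterator


@dataclass
class StreamTextDelta:
    text: str


@dataclass
class StreamToolInputAvailable:
    tool_name: str
    tool_input: dict[str, Any]


def adapt_assistant_message(message: Any) -> Iterator[object]:
    """Yield zero or more frontend events for one SDK AssistantMessage."""
    for block in getattr(message, "content", []):
        if getattr(block, "type", None) == "text":
            yield StreamTextDelta(text=block.text)
        elif getattr(block, "type", None) == "tool_use":
            yield StreamToolInputAvailable(
                tool_name=block.name, tool_input=block.input
            )
        # error and result blocks map to StreamError / StreamFinish analogously
```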

**Stateless Multi-Turn Resume** (new)
- Instead of compressing conversation history via LLM on every turn
(lossy and expensive), we capture Claude Code's native JSONL session
transcript via a **Stop hook** callback, persist it in the DB
(`ChatSession.sdkTranscript`), and restore it on the next turn via
`--resume <file>` (sketched below).
- This preserves full tool call/result context across turns with zero
token overhead for history.
- Feature-flagged via `CLAUDE_AGENT_USE_RESUME` (default: off).
- DB migration: `ALTER TABLE "ChatSession" ADD COLUMN "sdkTranscript"
TEXT`.
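
A rough sketch of the transcript round trip, using hypothetical helper and DB-accessor names (the real logic lives in `transcript.py` and `service.py`):

```python
# Hypothetical sketch of the stateless resume round trip; function and
# attribute names other than ChatSession.sdkTranscript are assumptions.
import tempfile
from pathlib import Path


async def on_stop_hook(transcript_path: str, session, db) -> None:
    # Stop hook fires when the SDK turn ends: capture the native JSONL
    # transcript and persist it on the chat session.
    session.sdkTranscript = Path(transcript_path).read_text()
    await db.save_chat_session(session)  # illustrative DB call


def prepare_resume(session) -> str | None:
    # Next turn: write the stored transcript to a temp file and hand it
    # to the CLI via --resume instead of re-injecting compressed history.
    if not session.sdkTranscript:
        return None
    tmp = tempfile.NamedTemporaryFile(mode="w", suffix=".jsonl", delete=False)
    tmp.write(session.sdkTranscript)
    tmp.close()
    return tmp.name  # passed to the CLI as: --resume <file>
```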

**Sandboxed Tool Execution** (`chat/tools/`)
- **`bash_exec.py`**: Sandboxed bash execution using bubblewrap
(`bwrap`) with read-only root filesystem, per-session writable
workspace, resource limits (CPU, memory, file size), and network
isolation.
- **`sandbox.py`**: Shared bubblewrap sandbox infrastructure — generates
`bwrap` command lines with configurable mounts, environment, and
resource constraints (see the sketch after this list).
- **`web_fetch.py`**: URL fetching tool with domain allowlist, size
limits, and content-type filtering.
- **`check_operation_status.py`**: Polling tool for long-running
operations (agent creation, block execution) so the SDK doesn't block
waiting.
- **`find_block.py`** / **`run_block.py`**: Enhanced with category
filtering, optimized response size (removed raw JSON schemas), and
better error handling.
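
A sketch of how a bubblewrap invocation for `bash_exec` might be assembled; the specific mounts, limits, and timeouts here are illustrative assumptions, not the exact configuration in `sandbox.py`:

```python
# Hypothetical sketch: read-only system mounts, a single writable
# per-session workspace, no network, and ulimit-based resource caps.
import subprocess


def build_bwrap_command(workspace: str, command: str) -> list[str]:
    return [
        "bwrap",
        "--ro-bind", "/usr", "/usr",          # read-only system dirs
        "--ro-bind-try", "/lib", "/lib",
        "--ro-bind-try", "/lib64", "/lib64",
        "--ro-bind-try", "/bin", "/bin",
        "--bind", workspace, "/workspace",     # only writable mount
        "--tmpfs", "/tmp",
        "--proc", "/proc",
        "--dev", "/dev",
        "--unshare-net",                       # network isolation
        "--unshare-pid",
        "--die-with-parent",
        "--chdir", "/workspace",
        "/bin/sh", "-c",
        # cap CPU seconds and max file size before running the command
        f"ulimit -t 30; ulimit -f 10240; {command}",
    ]


def run_sandboxed(workspace: str, command: str) -> subprocess.CompletedProcess:
    return subprocess.run(
        build_bwrap_command(workspace, command),
        capture_output=True, text=True, timeout=60,
    )
```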

**Security**
- Path traversal prevention: session IDs sanitized, all file ops
confined to workspace dirs, symlink resolution (sketched below).
- Tool allowlist enforcement via SDK hooks — model cannot call arbitrary
tools.
- Built-in `Bash` tool blocked via `disallowed_tools` to prevent
bypassing sandboxed `bash_exec`.
- Sub-agent (`Task`) spawning capped at configurable limit (default:
10).
- CodeQL-clean path sanitization patterns.
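
A minimal sketch of the workspace-confinement and allowlist checks a `PreToolUse` hook could apply; the helper name, tool names, and the simplified return shape are illustrative, not the SDK's actual hook response format:

```python
# Illustrative sketch: resolve symlinks and require the target to stay
# inside the per-session workspace, plus a strict tool allowlist.
from pathlib import Path

ALLOWED_TOOLS = {"find_block", "run_block", "bash_exec", "web_fetch"}  # illustrative


def is_path_allowed(requested: str, workspace: Path) -> bool:
    try:
        resolved = (workspace / requested).resolve(strict=False)
        resolved.relative_to(workspace.resolve())
        return True
    except ValueError:
        # Raised when the resolved path escapes the workspace
        return False


def pre_tool_use(tool_name: str, tool_input: dict, workspace: Path) -> dict:
    if tool_name not in ALLOWED_TOOLS:
        return {"decision": "deny", "reason": f"tool {tool_name!r} not allowlisted"}
    path = tool_input.get("path")
    if path and not is_path_allowed(path, workspace):
        return {"decision": "deny", "reason": "path escapes session workspace"}
    return {"decision": "allow"}
```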

**Streaming & Reconnection**
- SSE stream registry backed by Redis Streams for crash-resilient
reconnection (see the sketch after this list).
- Long-running operation tracking with TTL-based cleanup.
- Atomic message append to prevent race conditions on concurrent writes.
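
A sketch of the Redis Streams pattern behind reconnection, assuming illustrative key names and the `redis.asyncio` client; the actual registry lives in the existing stream_registry infrastructure:

```python
# Illustrative sketch: events are appended with XADD and a reconnecting
# SSE client replays everything after the last ID it saw.
import redis.asyncio as redis

r = redis.Redis()


async def publish_event(session_id: str, event: dict) -> None:
    # event must be a flat field dict (str -> str/bytes/number)
    await r.xadd(f"chat:stream:{session_id}", event)
    await r.expire(f"chat:stream:{session_id}", 3600)  # TTL-based cleanup


async def replay_events(session_id: str, last_id: str = "0"):
    # On reconnect, read everything the client missed, then keep tailing.
    while True:
        batches = await r.xread(
            {f"chat:stream:{session_id}": last_id}, block=5000, count=100
        )
        for _stream, entries in batches:
            for entry_id, fields in entries:
                last_id = entry_id
                yield fields
```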

**Configuration** (`config.py`)
- `use_claude_agent_sdk` — master toggle (default: on)
- `claude_agent_model` — model override for SDK path
- `claude_agent_max_buffer_size` — JSON parsing buffer (10MB)
- `claude_agent_max_subtasks` — sub-agent cap (10)
- `claude_agent_use_resume` — transcript-based resume (default: off)
- `thinking_enabled` — extended thinking for Claude models
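
Roughly how these settings might be declared, assuming a pydantic-settings style config class (field names and defaults mirror the list above; the actual declarations are in `config.py`):

```python
# Illustrative sketch of the new configuration fields and defaults.
from pydantic_settings import BaseSettings


class ChatConfig(BaseSettings):
    use_claude_agent_sdk: bool = True                      # master toggle
    claude_agent_model: str | None = None                  # model override for SDK path
    claude_agent_max_buffer_size: int = 10 * 1024 * 1024   # 10MB JSON parsing buffer
    claude_agent_max_subtasks: int = 10                    # sub-agent cap
    claude_agent_use_resume: bool = False                  # transcript-based resume
    thinking_enabled: bool = False                         # extended thinking
```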

**Tests**
- `sdk/response_adapter_test.py` — 366 lines covering all event
translation paths
- `sdk/security_hooks_test.py` — 165 lines covering tool blocking, path
traversal, subtask limits
- `chat/model_test.py` — 214 lines covering session model serialization
- `chat/service_test.py` — Integration tests including multi-turn resume
keyword recall
- `tools/find_block_test.py` / `run_block_test.py` — Extended with new
tool behavior tests
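
For illustration, a pytest sketch in the spirit of `security_hooks_test.py`, exercising the hypothetical `is_path_allowed` helper from the Security section above (repeated here so the example is self-contained):

```python
# Illustrative path-traversal test cases; not copied from the real suite.
from pathlib import Path

import pytest


def is_path_allowed(requested: str, workspace: Path) -> bool:
    try:
        (workspace / requested).resolve(strict=False).relative_to(workspace.resolve())
        return True
    except ValueError:
        return False


@pytest.mark.parametrize(
    "candidate,expected",
    [
        ("notes.txt", True),          # stays inside the workspace
        ("../../etc/passwd", False),  # classic traversal attempt
        ("subdir/../..", False),      # resolves above the workspace
    ],
)
def test_workspace_confinement(tmp_path: Path, candidate: str, expected: bool):
    assert is_path_allowed(candidate, workspace=tmp_path) is expected
```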

## Test plan
- [x] Unit tests pass (`sdk/response_adapter_test.py`,
`security_hooks_test.py`, `model_test.py`)
- [x] Integration test: multi-turn keyword recall via `--resume`
(`service_test.py::test_sdk_resume_multi_turn`)
- [x] Manual E2E: CoPilot chat sessions with tool calls, bash execution,
and multi-turn context
- [x] Pre-commit hooks pass (ruff, isort, black, pyright, flake8)
- [ ] Staging deployment with `claude_agent_use_resume=false` initially
- [ ] Enable resume in staging, verify transcript capture and recall

<!-- greptile_comment -->

<h2>Greptile Overview</h2>

<details><summary><h3>Greptile Summary</h3></summary>

This PR replaces the existing OpenAI-compatible CoPilot with a full
Claude Agent SDK integration, introducing multi-turn conversations,
stateless resume via JSONL transcripts, and sandboxed tool execution.

**Key changes:**
- **SDK integration** (`chat/sdk/`): spawns Claude Code CLI subprocess
per message, translates events to frontend protocol, bridges MCP tools
- **Stateless resume**: captures JSONL transcripts via Stop hook,
persists in `ChatSession.sdkTranscript`, restores with `--resume`
(feature-flagged, default off)
- **Sandboxed execution**: bubblewrap sandbox for bash commands with
filesystem whitelist, network isolation, resource limits
- **Security hooks**: tool allowlist enforcement, path traversal
prevention, workspace-scoped file operations, sub-agent spawn limits
- **Long-running operations**: delegates `create_agent`/`edit_agent` to
existing stream_registry infrastructure for SSE reconnection
- **Feature flag**: `CHAT_USE_CLAUDE_AGENT_SDK` with LaunchDarkly
support, defaults to enabled

**Security issues found:**
- Path traversal validation has logic errors in `security_hooks.py:82`
(tilde expansion order) and `service.py:266` (redundant `..` check)
- Config validator always prefers env var over explicit `False` value
(`config.py:162`)
- Race condition in `routes.py:323` — message persisted before task
registration, could duplicate on retry
- Resource limits in sandbox may fail silently (`sandbox.py:109`)

**Test coverage is strong** with 366 lines for response adapter, 165 for
security hooks, and integration tests for multi-turn resume.
</details>


<details><summary><h3>Confidence Score: 3/5</h3></summary>

- This PR is generally safe but has critical security issues in path
validation that must be fixed before merge
- Score reflects strong architecture and test coverage offset by real
security vulnerabilities: the tilde expansion bug in `security_hooks.py`
could allow sandbox escape, the race condition could cause message
duplication, and the silent ulimit failures could bypass resource
limits. The bubblewrap sandbox and allowlist enforcement are
well-designed, but the path validation bugs need fixing. The transcript
resume feature is properly feature-flagged. Overall the implementation
is solid but the security issues prevent a higher score.
- Pay close attention to
`backend/api/features/chat/sdk/security_hooks.py` (path traversal
vulnerability), `backend/api/features/chat/routes.py` (race condition),
`backend/api/features/chat/tools/sandbox.py` (silent resource limit
failures), and `backend/api/features/chat/sdk/service.py` (redundant
security check)
</details>


<details><summary><h3>Sequence Diagram</h3></summary>

```mermaid
sequenceDiagram
    participant Frontend
    participant Routes as routes.py
    participant SDKService as sdk/service.py
    participant ClaudeSDK as Claude Agent SDK CLI
    participant SecurityHooks as security_hooks.py
    participant ToolAdapter as tool_adapter.py
    participant CoPilotTools as tools/*
    participant Sandbox as sandbox.py (bwrap)
    participant DB as Database
    participant Redis as stream_registry

    Frontend->>Routes: POST /chat (user message)
    Routes->>SDKService: stream_chat_completion_sdk()
    
    SDKService->>DB: get_chat_session()
    DB-->>SDKService: session + messages
    
    alt Resume enabled AND transcript exists
        SDKService->>SDKService: validate_transcript()
        SDKService->>SDKService: write_transcript_to_tempfile()
        Note over SDKService: Pass --resume to SDK
    else No resume
        SDKService->>SDKService: _compress_conversation_history()
        Note over SDKService: Inject history into user message
    end
    
    SDKService->>SecurityHooks: create_security_hooks()
    SDKService->>ToolAdapter: create_copilot_mcp_server()
    SDKService->>ClaudeSDK: spawn subprocess with MCP server
    
    loop Streaming Conversation
        ClaudeSDK->>SDKService: AssistantMessage (text/tool_use)
        SDKService->>Frontend: StreamTextDelta / StreamToolInputAvailable
        
        alt Tool Call
            ClaudeSDK->>SecurityHooks: PreToolUse hook
            SecurityHooks->>SecurityHooks: validate path, check allowlist
            alt Tool blocked
                SecurityHooks-->>ClaudeSDK: deny
            else Tool allowed
                SecurityHooks-->>ClaudeSDK: allow
                ClaudeSDK->>ToolAdapter: call MCP tool
                
                alt Long-running tool (create_agent, edit_agent)
                    ToolAdapter->>Redis: register task
                    ToolAdapter->>DB: save OperationPendingResponse
                    ToolAdapter->>ToolAdapter: spawn background task
                    ToolAdapter-->>ClaudeSDK: OperationStartedResponse
                else Regular tool (find_block, bash_exec)
                    ToolAdapter->>CoPilotTools: execute()
                    alt bash_exec
                        CoPilotTools->>Sandbox: run_sandboxed()
                        Sandbox->>Sandbox: build bwrap command
                        Note over Sandbox: Network isolation,<br/>filesystem whitelist,<br/>resource limits
                        Sandbox-->>CoPilotTools: stdout, stderr, exit_code
                    end
                    CoPilotTools-->>ToolAdapter: result
                    ToolAdapter->>ToolAdapter: stash full output
                    ToolAdapter-->>ClaudeSDK: MCP response
                end
                
                SecurityHooks->>SecurityHooks: PostToolUse hook (log)
            end
        end
        
        ClaudeSDK->>SDKService: UserMessage (ToolResultBlock)
        SDKService->>ToolAdapter: pop_pending_tool_output()
        SDKService->>Frontend: StreamToolOutputAvailable
    end
    
    ClaudeSDK->>SecurityHooks: Stop hook
    SecurityHooks->>SDKService: transcript_path callback
    SDKService->>SDKService: read_transcript_file()
    SDKService->>DB: save transcript to session.sdkTranscript
    
    ClaudeSDK->>SDKService: ResultMessage (success)
    SDKService->>Frontend: StreamFinish
    SDKService->>DB: upsert_chat_session()
```
</details>


<sub>Last reviewed commit: 28c1121</sub>

<!-- greptile_other_comments_section -->

<!-- /greptile_comment -->

---------

Co-authored-by: Swifty <craigswift13@gmail.com>
2026-02-13 15:49:03 +00:00

AutoGPT: Build, Deploy, and Run AI Agents


Deutsch | Español | français | 日本語 | 한국어 | Português | Русский | 中文

AutoGPT is a powerful platform that allows you to create, deploy, and manage continuous AI agents that automate complex workflows.

Hosting Options

  • Download to self-host (Free!)
  • Join the Waitlist for the cloud-hosted beta (Closed Beta - Public release Coming Soon!)

How to Self-Host the AutoGPT Platform

Note

Setting up and hosting the AutoGPT Platform yourself is a technical process. If you'd rather something that just works, we recommend joining the waitlist for the cloud-hosted beta.

System Requirements

Before proceeding with the installation, ensure your system meets the following requirements:

Hardware Requirements

  • CPU: 4+ cores recommended
  • RAM: Minimum 8GB, 16GB recommended
  • Storage: At least 10GB of free space

Software Requirements

  • Operating Systems:
    • Linux (Ubuntu 20.04 or newer recommended)
    • macOS (10.15 or newer)
    • Windows 10/11 with WSL2
  • Required Software (with minimum versions):
    • Docker Engine (20.10.0 or newer)
    • Docker Compose (2.0.0 or newer)
    • Git (2.30 or newer)
    • Node.js (16.x or newer)
    • npm (8.x or newer)
    • VSCode (1.60 or newer) or any modern code editor

Network Requirements

  • Stable internet connection
  • Access to required ports (will be configured in Docker)
  • Ability to make outbound HTTPS connections

Updated Setup Instructions:

We've moved to a fully maintained and regularly updated documentation site.

👉 Follow the official self-hosting guide here

This tutorial assumes you have Docker, VSCode, git and npm installed.


Skip the manual steps and get started in minutes using our automatic setup script.

For macOS/Linux:

curl -fsSL https://setup.agpt.co/install.sh -o install.sh && bash install.sh

For Windows (PowerShell):

powershell -c "iwr https://setup.agpt.co/install.bat -o install.bat; ./install.bat"

This will install dependencies, configure Docker, and launch your local instance — all in one go.

🧱 AutoGPT Frontend

The AutoGPT frontend is where users interact with our powerful AI automation platform. It offers multiple ways to engage with and leverage our AI agents. This is the interface where you'll bring your AI automation ideas to life:

Agent Builder: For those who want to customize, our intuitive, low-code interface allows you to design and configure your own AI agents.

Workflow Management: Build, modify, and optimize your automation workflows with ease. You build your agent by connecting blocks, where each block performs a single action.

Deployment Controls: Manage the lifecycle of your agents, from testing to production.

Ready-to-Use Agents: Don't want to build? Simply select from our library of pre-configured agents and put them to work immediately.

Agent Interaction: Whether you've built your own or are using pre-configured agents, easily run and interact with them through our user-friendly interface.

Monitoring and Analytics: Keep track of your agents' performance and gain insights to continually improve your automation processes.

Read this guide to learn how to build your own custom blocks.

💽 AutoGPT Server

The AutoGPT Server is the powerhouse of our platform. This is where your agents run. Once deployed, agents can be triggered by external sources and can operate continuously. It contains all the essential components that make AutoGPT run smoothly.

Source Code: The core logic that drives our agents and automation processes.

Infrastructure: Robust systems that ensure reliable and scalable performance.

Marketplace: A comprehensive marketplace where you can find and deploy a wide range of pre-built agents.

🐙 Example Agents

Here are two examples of what you can do with AutoGPT:

  1. Generate Viral Videos from Trending Topics

    • This agent reads topics on Reddit.
    • It identifies trending topics.
    • It then automatically creates a short-form video based on the content.
  2. Identify Top Quotes from Videos for Social Media

    • This agent subscribes to your YouTube channel.
    • When you post a new video, it transcribes it.
    • It uses AI to identify the most impactful quotes to generate a summary.
    • Then, it writes a post to automatically publish to your social media.

These examples show just a glimpse of what you can achieve with AutoGPT! You can create customized workflows to build agents for any use case.


License Overview:

🛡️ Polyform Shield License: All code and content within the autogpt_platform folder is licensed under the Polyform Shield License. This new project is our in-development platform for building, deploying, and managing agents.
Read more about this effort

🦉 MIT License: All other portions of the AutoGPT repository (i.e., everything outside the autogpt_platform folder) are licensed under the MIT License. This includes the original stand-alone AutoGPT Agent, along with projects such as Forge, agbenchmark and the AutoGPT Classic GUI.
We also publish additional work under the MIT Licence in other repositories, such as GravitasML which is developed for and used in the AutoGPT Platform. See also our MIT Licenced Code Ability project.


Mission

Our mission is to provide the tools, so that you can focus on what matters:

  • 🏗️ Building - Lay the foundation for something amazing.
  • 🧪 Testing - Fine-tune your agent to perfection.
  • 🤝 Delegating - Let AI work for you, and have your ideas come to life.

Be part of the revolution! AutoGPT is here to stay, at the forefront of AI innovation.

📖 Documentation | 🚀 Contributing


🤖 AutoGPT Classic

Below is information about the classic version of AutoGPT.

🛠️ Build your own Agent - Quickstart

🏗️ Forge

Forge your own agent! Forge is a ready-to-go toolkit to build your own agent application. It handles most of the boilerplate code, letting you channel all your creativity into the things that set your agent apart. All tutorials are located here. Components from forge can also be used individually to speed up development and reduce boilerplate in your agent project.

🚀 Getting Started with Forge This guide will walk you through the process of creating your own agent and using the benchmark and user interface.

📘 Learn More about Forge

🎯 Benchmark

Measure your agent's performance! The agbenchmark can be used with any agent that supports the agent protocol, and the integration with the project's CLI makes it even easier to use with AutoGPT and forge-based agents. The benchmark offers a stringent testing environment. Our framework allows for autonomous, objective performance evaluations, ensuring your agents are primed for real-world action.

📦 agbenchmark on Pypi | 📘 Learn More about the Benchmark

💻 UI

Makes agents easy to use! The frontend gives you a user-friendly interface to control and monitor your agents. It connects to agents through the agent protocol, ensuring compatibility with many agents from both inside and outside of our ecosystem.

The frontend works out-of-the-box with all agents in the repo. Just use the CLI to run your agent of choice!

📘 Learn More about the Frontend

⌨️ CLI

To make it as easy as possible to use all of the tools offered by the repository, a CLI is included at the root of the repo:

$ ./run
Usage: cli.py [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  agent      Commands to create, start and stop agents
  benchmark  Commands to start the benchmark and list tests and categories
  setup      Installs dependencies needed for your system.

Just clone the repo, install dependencies with ./run setup, and you should be good to go!

🤔 Questions? Problems? Suggestions?

Get help - Discord 💬

Join us on Discord

To report a bug or request a feature, create a GitHub Issue. Please ensure someone else hasn't created an issue for the same topic.

🤝 Sister projects

🔄 Agent Protocol

To maintain a uniform standard and ensure seamless compatibility with many current and future applications, AutoGPT employs the agent protocol standard by the AI Engineer Foundation. This standardizes the communication pathways from your agent to the frontend and benchmark.

