AutoGPT

mirror of https://github.com/Significant-Gravitas/AutoGPT.git synced 2026-02-13 08:14:58 -05:00

Author	SHA1	Message	Date
Zamil Majdy	b915e67a9b	feat(chat/sdk): add stateless multi-turn resume via JSONL transcripts Capture Claude Code CLI session transcripts via the Stop hook and persist them in the DB. On subsequent turns, write the transcript to a temp file and pass --resume so the CLI restores full conversation context without lossy history compression. Key changes: - transcript.py: read/write/validate JSONL transcript utilities - security_hooks: register Stop hook to capture transcript_path - service.py: resume strategy with fallback to compression - schema.prisma: add sdkTranscript column to ChatSession - Feature flag: CLAUDE_AGENT_USE_RESUME (default off)	2026-02-13 13:48:04 +04:00
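The transcript utilities named in b915e67a9b are not shown in this log. A minimal sketch of validating a JSONL transcript and writing it to a temp file might look like the following. The function names `validate_transcript` and `write_transcript_to_tempfile` are hypothetical, and the rule that every non-empty line must parse as a JSON object is an assumption about what "validate" means here.

```python
import json
import os
import tempfile


def validate_transcript(text: str) -> bool:
    """Return True if every non-empty line parses as a JSON object (assumed rule)."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return False
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            return False
        if not isinstance(entry, dict):
            return False
    return True


def write_transcript_to_tempfile(text: str) -> str:
    """Write a validated transcript to a temp .jsonl file and return its path."""
    if not validate_transcript(text):
        raise ValueError("transcript is not valid JSONL")
    fd, path = tempfile.mkstemp(suffix=".jsonl")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(text)
    return path
```

A caller would presumably treat a failed validation as the signal to use the compression fallback the commit mentions. How the temp file is handed to `--resume` is not shown in this log.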
Zamil Majdy	32e9dda30d	fix(chat/sdk): resolve relative paths in security hooks and unify workspace access The security hook's path validation blocked SDK Read/Write tools because it didn't resolve relative paths against sdk_cwd. Since the SDK sets cwd, Claude naturally uses relative paths like "test.txt" which failed the absolute path prefix check. Now relative paths are joined with sdk_cwd before validation, and denial messages include the allowed workspace path. Also clarifies the workspace model: SDK Read/Write + bash_exec share the same ephemeral session directory, while workspace_file tools provide persistent cloud storage across sessions.	2026-02-13 10:40:41 +04:00
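The relative-path fix in 32e9dda30d can be sketched as below. This is an illustration under assumptions, not the hook's real code: `resolve_and_validate` is a hypothetical name, the denial message wording is invented, and `os.path.normpath` is used for the containment check (it collapses `..` but does not resolve symlinks).

```python
import os
from typing import Tuple


def resolve_and_validate(path: str, sdk_cwd: str) -> Tuple[bool, str]:
    """Join relative paths with sdk_cwd, then check the result stays inside it."""
    if not os.path.isabs(path):
        # The SDK sets cwd, so "test.txt" really means <sdk_cwd>/test.txt.
        path = os.path.join(sdk_cwd, path)
    resolved = os.path.normpath(path)
    workspace = os.path.normpath(sdk_cwd)
    # Compare against workspace + separator so "/tmp/ws-evil" cannot pass for "/tmp/ws".
    allowed = resolved == workspace or resolved.startswith(workspace + os.sep)
    if allowed:
        return True, resolved
    # Hypothetical wording; the commit only says the message includes the allowed path.
    return False, f"Path is outside the allowed workspace: {workspace}"
```

Appending the separator before the prefix check matters: a bare `startswith(workspace)` would accept a sibling directory whose name merely begins with the workspace name.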
Zamil Majdy	cb45e7957b	feat: fix openapi.json	2026-02-12 23:39:47 +04:00
Zamil Majdy	f1d02fb8f3	fix(chat/sdk): move cwd setup inside try block to ensure cleanup Move _make_sdk_cwd() and os.makedirs() inside the try block so the finally cleanup always runs, preventing /tmp dir leaks if setup fails.	2026-02-12 23:32:26 +04:00
Zamil Majdy	47de6b6420	feat(chat): add check_operation_status tool for long-running ops Lets the CoPilot agent query whether a create_agent/edit_agent operation is still running, completed, or failed. Accepts operation_id or task_id from a previous operation_started response and looks up the task status in Redis via stream_registry.	2026-02-12 23:30:51 +04:00
Zamil Majdy	62cd2eea89	fix(chat/sandbox): use --symlink for compat paths on Debian 13 On Debian 13 (bookworm+), /bin, /lib, /sbin, /lib64 are symlinks to /usr/*. bwrap --ro-bind cannot create a symlink as a mount target inside the sandbox, causing "execvp: No such file or directory" because the ELF dynamic linker at /lib64/ld-linux-x86-64.so.2 is unreachable. Detect symlinks at runtime with os.path.islink() and use bwrap --symlink instead of --ro-bind. Falls back to --ro-bind on older distros where these are real directories.	2026-02-12 22:55:29 +04:00
Zamil Majdy	ae61ec692e	Merge branch 'dev' into feat/copitlot-claude-code	2026-02-12 22:27:50 +04:00
Zamil Majdy	9296bd8736	fix(chat/sandbox): fix bwrap inside Docker containers Three fixes for bubblewrap sandbox: - Fix --tmpdir (invalid) to --tmpfs (correct bwrap option) - Add --unshare-user so bwrap can create namespaces inside unprivileged Docker containers (no CAP_SYS_ADMIN needed) - Reorder mounts: --tmpfs /tmp first, then --bind workspace on top, so the workspace directory is visible through the fresh tmpfs	2026-02-12 22:22:39 +04:00
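The mount ordering from 9296bd8736 can be illustrated with a small argument builder. `build_bwrap_prefix` is a hypothetical name and the list is deliberately incomplete: it shows only the three points the commit makes (`--tmpfs` rather than `--tmpdir`, `--unshare-user`, and the workspace bound after the fresh `/tmp`).

```python
from typing import List


def build_bwrap_prefix(workspace: str) -> List[str]:
    """Return the leading bwrap arguments in the order the commit describes."""
    return [
        "bwrap",
        "--unshare-user",                 # lets bwrap create namespaces without CAP_SYS_ADMIN
        "--tmpfs", "/tmp",                # fresh, empty /tmp first ...
        "--bind", workspace, workspace,   # ... then the workspace mounted on top of it
    ]
```

bwrap applies mount options in command-line order, so swapping the last two entries would hide a workspace that lives under `/tmp` beneath the empty tmpfs.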
Zamil Majdy	308113c03d	fix(chat/sdk): remove obsolete Bash allowlist tests The SDK built-in Bash tool is now unconditionally blocked (bash_exec MCP tool with bubblewrap is used instead). Remove tests that expected safe Bash commands to be allowed and replace with a single test that verifies Bash is always denied.	2026-02-12 22:19:30 +04:00
Zamil Majdy	51abf13254	feat(chat): use LaunchDarkly flag for copilot SDK rollout Replace static CHAT_USE_CLAUDE_AGENT_SDK env var with a LaunchDarkly feature flag (copilot-sdk) for per-user rollout control. The env var value serves as the default when LD is not configured or the flag doesn't exist yet.	2026-02-12 22:02:28 +04:00
Zamil Majdy	54b03d3a29	fix(frontend): remove python_exec from openapi.json ResponseType enum The python_exec tool was removed from the backend but the generated openapi.json still referenced the enum value.	2026-02-12 21:55:25 +04:00
Zamil Majdy	239dff5ebd	feat(chat/sandbox): add resource limits to bubblewrap sandbox Add ulimit-based resource caps inside the bwrap sandbox to prevent fork bombs and resource exhaustion: - max 64 processes (stops fork bombs) - 512 MB virtual memory - 50 MB max file size - 256 open file descriptors Limits are applied via `sh -c 'ulimit ...; exec "$@"'` wrapper inside the sandbox, so they're inherited by all child processes.	2026-02-12 21:47:49 +04:00
Zamil Majdy	1dd53db21c	feat(chat/sandbox): bubblewrap sandbox for bash_exec, remove python_exec - Replace `--ro-bind / /` with whitelist-only filesystem: only /usr, /etc, /bin, /lib, /sbin mounted read-only. /app, /root, /home, /opt, /var are completely invisible inside the sandbox. - Add `--clearenv` to wipe all inherited env vars (API keys, DB passwords). Only safe vars (PATH, HOME=workspace, LANG) are explicitly set. - Remove python_exec tool — bash_exec can run `python3 -c` or heredocs with identical bubblewrap protection, reducing attack surface. - Remove all fallback security code (import hooks, blocked modules, network command lists). Tools now hard-require bubblewrap — disabled on platforms without bwrap. - Clean up security_hooks.py: remove ~200 lines of dead bash validation code, add Bash to BLOCKED_TOOLS as defence-in-depth. - Wire up long-running tool callback in SDK service for create_agent/edit_agent delegation to Redis Streams background infrastructure.	2026-02-12 21:44:40 +04:00
Zamil Majdy	06c16ee2fe	fix(chat/sdk): non-blocking long-running tools, tighten security - Long-running tools (create_agent) now run in background and return immediately with an operation_id. Add check_operation MCP tool for polling results. Prevents 3+ min blocking and survives page refresh. - Fix CodeQL path traversal alert: use normpath+startswith sanitizer in _make_sdk_cwd() instead of assert. - Tighten _read_file_handler: restrict from ~/.claude/ to only ~/.claude/projects/**/tool-results/ (sentry review feedback). - Fix bash redirect bypass: strip quoted strings before checking for unquoted > operator, catches `echo hello>file` (sentry review).	2026-02-12 20:39:33 +04:00
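The redirect-bypass fix in 06c16ee2fe (strip quoted strings, then look for an unquoted `>`) might be sketched like this. The helper name and the regular expressions are assumptions. The sketch ignores heredocs, backslash escapes outside quotes, and other shell syntax, so it is an illustration rather than a complete parser.

```python
import re

# Single-quoted strings have no escapes; double-quoted strings may contain \" sequences.
_SINGLE = r"'[^']*'"
_DOUBLE = r'"(?:\\.|[^"\\])*"'
_QUOTED = re.compile(_SINGLE + "|" + _DOUBLE)


def has_unquoted_redirect(command: str) -> bool:
    """Return True if `>` appears anywhere outside a quoted string."""
    stripped = _QUOTED.sub("", command)
    return ">" in stripped
```

Checking the stripped text rather than whitespace-split tokens is what catches `echo hello>file`, where the operator is not a separate token.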
Zamil Majdy	8d2a649ee5	refactor(chat/sdk): remove Langfuse tracing — OpenRouter handles observability Delete tracing.py (~408 lines) and all TracedSession/hook references from the SDK path. OpenRouter already provides token usage, cost tracking, and request logging, making manual Langfuse integration redundant. This also fixes the broken 'Langfuse' object has no attribute 'trace' warning on every request.	2026-02-12 20:24:27 +04:00
Nicholas Tindle	cb166dd6fb	feat(blocks): Store sandbox files to workspace (#12073) Store files created by sandbox blocks (Claude Code, Code Executor) to the user's workspace for persistence across runs. ### Changes 🏗️ - New `sandbox_files.py` utility (`backend/util/sandbox_files.py`) - Shared module for extracting files from E2B sandboxes - Stores files to workspace via `store_media_file()` (includes virus scanning, size limits) - Returns `SandboxFileOutput` with path, content, and `workspace_ref` - Claude Code block (`backend/blocks/claude_code.py`) - Added `workspace_ref` field to `FileOutput` schema - Replaced inline `_extract_files()` with shared utility - Files from working directory now stored to workspace automatically - Code Executor block (`backend/blocks/code_executor.py`) - Added `files` output field to `ExecuteCodeBlock.Output` - Creates `/output` directory in sandbox before execution - Extracts all files (text + binary) from `/output` after execution - Updated `execute_code()` to support file extraction with `extract_files` param ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Create agent with Claude Code block, have it create a file, verify `workspace_ref` in output - [x] Create agent with Code Executor block, write file to `/output`, verify `workspace_ref` in output - [x] Verify files persist in workspace after sandbox disposal - [x] Verify binary files (images, etc.) work correctly in Code Executor - [x] Verify existing graphs using `content` field still work (backward compat) #### For configuration changes: - [x] `.env.default` is updated or already compatible with my changes - [x] `docker-compose.yml` is updated or already compatible with my changes - [x] I have included a list of my configuration changes in the PR description (under Changes) No configuration changes required - this is purely additive backend code. --- Related: Closes SECRT-1931 --- > [!NOTE] > Medium Risk > Adds automatic extraction and workspace storage of sandbox-written files (including binaries for code execution), which can affect output payload size, performance, and file-handling edge cases. > > Overview > Sandbox blocks now persist generated files to workspace. A new shared utility (`backend/util/sandbox_files.py`) extracts files from an E2B sandbox (scoped by a start timestamp) and stores them via `store_media_file`, returning `SandboxFileOutput` with `workspace_ref`. > > `ClaudeCodeBlock` replaces its inline file-scraping logic with this utility and updates the `files` output schema to include `workspace_ref`. > > `ExecuteCodeBlock` adds a `files` output and extends the executor mixin to optionally extract/store files (text + binary) when an `execution_context` is provided; related mocks/tests and docs are updated accordingly. > > Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit `343854c0cf`. --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-12 15:56:59 +00:00
Zamil Majdy	9589474709	Merge branch 'dev' into feat/copitlot-claude-code	2026-02-12 19:40:32 +04:00
Swifty	3d31f62bf1	Revert "added feature request tooling" This reverts commit `b8b6c9de23`.	2026-02-12 16:39:24 +01:00
Swifty	b8b6c9de23	added feature request tooling	2026-02-12 16:38:17 +01:00
Zamil Majdy	749a78723a	refactor(chat/sdk): deduplicate code and remove anthropic fallback - Extract shared `make_session_path()` into sandbox.py (single source of truth for workspace path sanitization), replace duplicate in service.py - Delete anthropic_fallback.py (~360 lines) — redundant third code path; routes.py already falls back to non-SDK service - Remove dead `traced_session()`, `get_tool_definitions()`, `get_tool_handlers()`, `_current_tool_call_id` ContextVar - Fix hardcoded model in tracing — pass actual resolved model - Fix inconsistent model name splitting in anthropic fallback	2026-02-12 19:26:29 +04:00
Zamil Majdy	bec2e1ddee	fix(chat/tools): sanitize session_id in sandbox workspace path Align with SDK's _make_sdk_cwd() to prevent path traversal and ensure python_exec/bash_exec share the same workspace as SDK file tools.	2026-02-12 19:08:47 +04:00
Zamil Majdy	ec1ab06e0d	chore(chat): bump default max_subtasks from 3 to 10	2026-02-12 19:07:42 +04:00
Zamil Majdy	f31cb49557	feat(chat/tools): add sandboxed python_exec, bash_exec, web_fetch tools and enable Task - Add sandbox.py with network-isolated execution via unshare --net (Linux) and import/command blocklist fallback (macOS dev) - Add python_exec tool: runs Python in subprocess with no network, workspace-scoped - Add bash_exec tool: full Bash scripting with no network, workspace-scoped - Add web_fetch tool: SSRF-protected URL fetching via backend Requests utility - Remove SDK built-in Bash from allowlist (replaced by sandboxed bash_exec) - Enable SDK built-in Task (sub-agents) with per-session rate limit (default 3) - Add claude_agent_max_subtasks config field	2026-02-12 19:07:19 +04:00
Zamil Majdy	fd28c386f4	Merge branch 'dev' into feat/copitlot-claude-code	2026-02-12 18:50:11 +04:00
Zamil Majdy	3bea584659	feat(chat/sdk): route SDK through OpenRouter with observability (#12084 ) ## Summary - Routes Claude Agent SDK API calls through OpenRouter via `ANTHROPIC_BASE_URL` / `ANTHROPIC_AUTH_TOKEN` env vars, enabling per-call token and cost tracking on the OpenRouter dashboard - Adds `sdk_model` and `sdk_max_budget_usd` config fields for SDK-specific model selection and budget control - Emits `StreamUsage` from SDK `ResultMessage` so the frontend receives token counts, and persists usage to `session.usage` - Fixes Langfuse tracing to use the configured model name instead of a hardcoded default - Updates Anthropic fallback to use `config.api_key` / `config.base_url` (OpenRouter routing) instead of raw `ANTHROPIC_API_KEY` env var ## Test plan - [ ] Deploy and send a CoPilot message — verify the API call appears on the OpenRouter dashboard - [ ] Check Langfuse trace shows correct model name (e.g. `claude-opus-4.6` not hardcoded `claude-sonnet-4-20250514`) - [ ] Verify frontend receives `StreamUsage` with `promptTokens` / `completionTokens` values - [ ] Set `CHAT_SDK_MAX_BUDGET_USD` and verify budget is respected - [ ] Test fallback path (without `claude-agent-sdk` installed) still works via OpenRouter <!-- greptile_comment --> <h2>Greptile Overview</h2> <details><summary><h3>Greptile Summary</h3></summary> Routes Claude Agent SDK API calls through OpenRouter for enhanced observability and cost tracking. The PR enables per-call token tracking on the OpenRouter dashboard by configuring the SDK to use `ANTHROPIC_BASE_URL` and `ANTHROPIC_AUTH_TOKEN` environment variables derived from the chat configuration. 
Key changes: - Added `sdk_model` and `sdk_max_budget_usd` configuration fields for SDK-specific control - Implemented automatic model name resolution that strips OpenRouter provider prefixes - Updated SDK client initialization to route through OpenRouter with proper environment variables - Emits `StreamUsage` events from SDK `ResultMessage` for frontend token visibility - Persists usage data to `session.usage` for historical tracking - Fixed Langfuse tracing to use the configured model name instead of hardcoded defaults - Updated fallback path to use OpenRouter routing instead of direct Anthropic API </details> <details><summary><h3>Confidence Score: 4/5</h3></summary> - Safe to merge with minor observations - the implementation is solid and the changes are well-structured - The code quality is high with proper error handling, clear separation of concerns, and good defensive coding practices. The changes integrate cleanly with existing patterns. Minor observations include missing validation for sdk_max_budget_usd and a potential edge case in model name resolution, but these don't block merging - No files require special attention - all changes follow existing patterns and maintain consistency </details> <details><summary><h3>Sequence Diagram</h3></summary> ```mermaid sequenceDiagram participant Frontend participant Backend participant SDK as Claude Agent SDK participant OpenRouter participant Anthropic participant Langfuse Frontend->>Backend: POST /chat/completions Backend->>Backend: Load config (api_key, base_url) Backend->>Backend: Resolve SDK model (strip OpenRouter prefix) Backend->>Backend: Build SDK env vars (ANTHROPIC_BASE_URL, ANTHROPIC_AUTH_TOKEN) Backend->>Langfuse: Initialize TracedSession with model name Backend->>SDK: ClaudeSDKClient(model, env, max_budget_usd) SDK->>SDK: Use ANTHROPIC_BASE_URL from env SDK->>OpenRouter: POST /messages (via configured base_url) OpenRouter->>Anthropic: Forward request with routing Anthropic-->>OpenRouter: Stream 
response chunks OpenRouter-->>SDK: Stream response with usage data loop For each SDK message SDK-->>Backend: AssistantMessage/UserMessage/ResultMessage Backend->>Langfuse: log_sdk_message() Backend->>Backend: SDKResponseAdapter.convert_message() Backend->>Backend: Extract usage from ResultMessage Backend->>Backend: Persist Usage to session.usage Backend-->>Frontend: StreamUsage(promptTokens, completionTokens) Backend-->>Frontend: StreamTextDelta/StreamToolInput/etc end Backend->>Langfuse: Log final generation with model name Backend->>Backend: Save session with usage data Backend-->>Frontend: StreamFinish ``` </details> <!-- greptile_other_comments_section --> <!-- /greptile_comment -->	2026-02-12 21:47:39 +07:00
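The model-name resolution and environment setup summarized in 3bea584659 might look roughly like this. Both function names are hypothetical, and treating the provider prefix as everything before the first `/` is an assumption; the real resolution may handle more cases.

```python
from typing import Dict


def resolve_sdk_model(model: str) -> str:
    """Drop an OpenRouter-style provider prefix, e.g. 'anthropic/<name>' -> '<name>'."""
    return model.split("/", 1)[-1]


def build_sdk_env(api_key: str, base_url: str) -> Dict[str, str]:
    """Environment variables the PR says route the SDK through OpenRouter."""
    return {
        "ANTHROPIC_BASE_URL": base_url,
        "ANTHROPIC_AUTH_TOKEN": api_key,
    }
```

A name with no prefix passes through unchanged, so the same helper is safe to call on a model string that is already bare.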
Abhimanyu Yadav	4f6055f494	refactor(frontend): remove default expiration date from API key credentials form (#12092) ### Changes 🏗️ Removed the default expiration date for API keys in the credentials modal. Previously, API keys were set to expire the next day by default, but now the expiration date field starts empty, allowing users to explicitly choose whether they want to set an expiration date. ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Open the API key credentials modal and verify the expiration date field is empty by default - [x] Test creating an API key with and without an expiration date - [x] Verify both scenarios work correctly ## Greptile Overview ### Greptile Summary Removed the default expiration date for API key credentials in the credentials modal. Previously, API keys were automatically set to expire the next day at midnight. Now the expiration date field starts empty, allowing users to explicitly choose whether to set an expiration. - Removed `getDefaultExpirationDate()` helper function that calculated tomorrow's date - Changed default `expiresAt` value from calculated date to empty string - Backend already supports optional expiration (`expires_at?: number`), so no backend changes needed - Form submission correctly handles empty expiration by passing `undefined` to the API ### Confidence Score: 5/5 - This PR is safe to merge with minimal risk - The changes are straightforward and well-contained. The refactor removes a helper function and changes a default value. The backend API already supports optional expiration dates, and the form submission logic correctly handles empty values by passing undefined. The change improves UX by not forcing a default expiration date on users. - No files require special attention	2026-02-12 12:57:06 +00:00
Otto	695a185fa1	fix(frontend): remove fixed min-height from CoPilot message container (#12091) ## Summary Removes the `min-h-screen` class from `ConversationContent` in ChatMessagesContainer, which was causing fixed height layout issues in the CoPilot chat interface. ## Changes - Removed `min-h-screen` from ConversationContent className ## Linear Fixes [SECRT-1944](https://linear.app/autogpt/issue/SECRT-1944) ## Greptile Overview ### Greptile Summary Removes the `min-h-screen` (100vh) class from `ConversationContent` that was causing the chat message container to enforce a minimum viewport height. The parent container already handles height constraints with `h-full min-h-0` and flexbox layout, so the fixed minimum height was creating layout conflicts. The component now properly grows within its flex container using `flex-1`. ### Confidence Score: 5/5 - This PR is safe to merge with minimal risk - The change removes a single problematic CSS class that was causing fixed height layout issues. The parent container already handles height constraints properly with flexbox, and removing min-h-screen allows the component to size correctly within its flex parent. This is a targeted, low-risk bug fix with no logic changes. - No files require special attention	2026-02-12 12:46:29 +00:00
Reinier van der Leer	113e87a23c	refactor(backend): Reduce circular imports (#12068) I'm getting circular import issues because there is a lot of cross-importing between `backend.data`, `backend.blocks`, and other modules. This change reduces block-related cross-imports and thus risk of breaking circular imports. ### Changes 🏗️ - Strip down `backend.data.block` - Move `Block` base class and related class/enum defs to `backend.blocks._base` - Move `is_block_auth_configured` to `backend.blocks._utils` - Move `get_blocks()`, `get_io_block_ids()` etc. to `backend.blocks` (`__init__.py`) - Update imports everywhere - Remove unused and poorly typed `Block.create()` - Change usages from `block_cls.create()` to `block_cls()` - Improve typing of `load_all_blocks` and `get_blocks` - Move cross-import of `backend.api.features.library.model` from `backend/data/__init__.py` to `backend/data/integrations.py` - Remove deprecated attribute `NodeModel.webhook` - Re-generate OpenAPI spec and fix frontend usage - Eliminate module-level `backend.blocks` import from `blocks/agent.py` - Eliminate module-level `backend.data.execution` and `backend.executor.manager` imports from `blocks/helpers/review.py` - Replace `BlockInput` with `GraphInput` for graph inputs ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - CI static type-checking + tests should be sufficient for this	2026-02-12 12:07:49 +00:00
Abhimanyu Yadav	d09f1532a4	feat(frontend): replace legacy builder with new flow editor (#12081) ### Changes 🏗️ This PR completes the migration from the legacy builder to the new Flow editor by removing all legacy code and feature flags. Removed: - Old builder view toggle functionality (`BuilderViewTabs.tsx`) - Legacy debug panel (`RightSidebar.tsx`) - Feature flags: `NEW_FLOW_EDITOR` and `BUILDER_VIEW_SWITCH` - `useBuilderView` hook and related view-switching logic Updated: - Simplified `build/page.tsx` to always render the new Flow editor - Added CSS styling (`flow.css`) to properly render Phosphor icons in React Flow handles Tests: - Skipped e2e test suite in `build.spec.ts` (legacy builder tests) - Follow-up PR (#12082) will add new e2e tests for the Flow editor ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Create a new flow and verify it loads correctly - [x] Add nodes and connections to verify basic functionality works - [x] Verify that node handles render correctly with the new CSS - [x] Check that the UI is clean without the old debug panel or view toggles #### For configuration changes: - [x] `.env.default` is updated or already compatible with my changes - [x] `docker-compose.yml` is updated or already compatible with my changes	2026-02-12 11:16:01 +00:00
Zamil Majdy	d7f7a2747f	fix(backend/chat): Atomic message append to prevent race condition Replace the read-modify-write pattern in stream_chat_post with an atomic append_and_save_message helper that acquires the session lock before re-fetching and appending. This prevents message loss when concurrent requests modify the same session.	2026-02-12 09:10:43 +04:00
Zamil Majdy	68849e197c	format	2026-02-12 08:26:26 +04:00
Zamil Majdy	211478bb29	Revert "style: run ruff format and isort" This reverts commit `40b58807ab`.	2026-02-12 08:25:22 +04:00
Zamil Majdy	0e88dd15b2	feat(chat): add hook-based tracing integration for Claude Agent SDK - Add create_tracing_hooks() for fine-grained tool timing - Add merge_hooks() utility to combine security + tracing hooks - Captures precise pre/post timing for tool executions - Tracks tool failures via PostToolUseFailure hook - Integrates seamlessly with existing security hooks	2026-02-12 03:35:16 +00:00
Zamil Majdy	7f3c227f0a	feat(chat): add modular Langfuse tracing for Claude Agent SDK - Create tracing.py with TracedSession context manager - Automatically trace user messages, SDK messages, and results - Capture tool calls with input/output and timing - Log usage and cost from SDK ResultMessage - No-op when Langfuse not configured (zero overhead) - Clean integration into service.py via context manager	2026-02-12 03:33:37 +00:00
Zamil Majdy	40b58807ab	style: run ruff format and isort	2026-02-12 03:25:19 +00:00
Zamil Majdy	d0e2e6f013	security(service): strengthen path validation for SDK cleanup - Add empty check after session_id sanitization - Add assertion for defense-in-depth - Add explicit '..' traversal check in cleanup - Replace glob with os.listdir to avoid glob injection - Add validation that project_dir stays under ~/.claude/projects - Add warning logs for rejected paths Addresses CodeQL alert about uncontrolled data in path expression	2026-02-12 03:07:08 +00:00
Zamil Majdy	efdc8d73cc	fix(security_hooks): use json.dumps for pattern matching and log warning - Use json.dumps instead of str() for more predictable pattern matching - Log warning when SDK not available and security hooks are disabled Addresses CodeRabbit review feedback	2026-02-12 02:55:04 +00:00
Zamil Majdy	a34810d8a2	revert: remove Bash command extraction from GenericTool Keep it simple - just show 'Bash completed' instead of special handling to extract command names like 'jq completed'	2026-02-12 02:53:37 +00:00
Zamil Majdy	038b7d5841	feat(copilot): show specific command name for Bash tool - Extract command name (jq, grep, etc.) from Bash tool input - Display 'jq completed' instead of 'Bash completed' - Add ripgrep and tree to Dockerfile (match ALLOWED_BASH_COMMANDS)	2026-02-12 02:48:19 +00:00
Zamil Majdy	cac93b0cc9	fix(chat): increase SDK buffer limit and add jq - Add sdk_max_buffer_size config option (default 10MB, was 1MB) - Pass max_buffer_size to ClaudeAgentOptions to prevent crashes on large tool outputs - Install jq in Dockerfile for JSON processing capabilities Fixes AUTOGPT-SERVER-7V2	2026-02-12 02:41:12 +00:00
Zamil Majdy	a78145505b	fix(copilot): merge split assistant messages to prevent Anthropic API errors (#12062 ) ## Summary - When the copilot model responds with both text content AND a long-running tool call (e.g., `create_agent`), the streaming code created two separate consecutive assistant messages — one with text, one with `tool_calls`. This caused Anthropic's API to reject with `"unexpected tool_use_id found in tool_result blocks"` because the `tool_result` couldn't find a matching `tool_use` in the immediately preceding assistant message. - Added a defensive merge of consecutive assistant messages in `to_openai_messages()` (fixes existing corrupt sessions too) - Fixed `_yield_tool_call` to add tool_calls to the existing current-turn assistant message instead of creating a new one - Changed `accumulated_tool_calls` assignment to use `extend` to prevent overwriting tool_calls added by long-running tool flow ## Test plan - [x] All 23 chat feature tests pass (`backend/api/features/chat/`) - [x] All 44 prompt utility tests pass (`backend/util/prompt_test.py`) - [x] All pre-commit hooks pass (ruff, isort, black, pyright) - [ ] Manual test: create an agent via copilot, then ask a follow-up question — should no longer get 400 error <!-- greptile_comment --> <h2>Greptile Overview</h2> <details><summary><h3>Greptile Summary</h3></summary> Fixes a critical bug where long-running tool calls (like `create_agent`) caused Anthropic API 400 errors due to split assistant messages. The fix ensures tool calls are added to the existing assistant message instead of creating new ones, and adds a defensive merge function to repair any existing corrupt sessions. 
Key changes: - Added `_merge_consecutive_assistant_messages()` to defensively merge split assistant messages in `to_openai_messages()` - Modified `_yield_tool_call()` to append tool calls to the current-turn assistant message instead of creating a new one - Changed `accumulated_tool_calls` from assignment to `extend` to preserve tool calls already added by long-running tool flow Impact: Resolves the issue where users received 400 errors after creating agents via copilot and asking follow-up questions. </details> <details><summary><h3>Confidence Score: 4/5</h3></summary> - Safe to merge with minor verification recommended - The changes are well-targeted and solve a real API compatibility issue. The logic is sound: searching backwards for the current assistant message is correct, and using `extend` instead of assignment prevents overwriting. The defensive merge in `to_openai_messages()` also fixes existing corrupt sessions. All existing tests pass according to the PR description. - No files require special attention - changes are localized and defensive </details> <details><summary><h3>Sequence Diagram</h3></summary> ```mermaid sequenceDiagram participant User participant StreamAPI as stream_chat_completion participant Chunks as _stream_chat_chunks participant ToolCall as _yield_tool_call participant Session as ChatSession User->>StreamAPI: Send message StreamAPI->>Chunks: Stream chat chunks alt Text + Long-running tool call Chunks->>StreamAPI: Text delta (content) StreamAPI->>Session: Append assistant message with content Chunks->>ToolCall: Tool call detected Note over ToolCall: OLD: Created new assistant message<br/>NEW: Appends to existing assistant ToolCall->>Session: Search backwards for current assistant ToolCall->>Session: Append tool_call to existing message ToolCall->>Session: Add pending tool result end StreamAPI->>StreamAPI: Merge accumulated_tool_calls Note over StreamAPI: Use extend (not assign)<br/>to preserve existing tool_calls StreamAPI->>Session: 
to_openai_messages() Session->>Session: _merge_consecutive_assistant_messages() Note over Session: Defensive: Merges any split<br/>assistant messages Session-->>StreamAPI: Merged messages StreamAPI->>User: Return response ``` </details> <!-- greptile_other_comments_section --> <!-- /greptile_comment -->	2026-02-12 01:52:17 +00:00
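The defensive merge described in a78145505b can be sketched as below. This is an illustration, not the repository's `_merge_consecutive_assistant_messages()`: the message shape (dicts with `role`, `content`, `tool_calls`) and the choice to join text with a newline are assumptions.

```python
from typing import Any, Dict, List

Message = Dict[str, Any]


def merge_consecutive_assistant_messages(messages: List[Message]) -> List[Message]:
    """Fold adjacent assistant messages into one so tool_calls stay with their text."""
    merged: List[Message] = []
    for msg in messages:
        prev = merged[-1] if merged else None
        if prev is not None and prev.get("role") == "assistant" and msg.get("role") == "assistant":
            # Combine text content, skipping empty or None parts.
            parts = [p for p in (prev.get("content"), msg.get("content")) if p]
            prev["content"] = "\n".join(parts) if parts else None
            # Concatenate tool calls from both messages, preserving order.
            calls = list(prev.get("tool_calls") or []) + list(msg.get("tool_calls") or [])
            if calls:
                prev["tool_calls"] = calls
        else:
            merged.append(dict(msg))  # shallow copy so the caller's list is not mutated
    return merged
```

After merging, a following `tool` message finds its matching tool call in the immediately preceding assistant message, which is the condition the PR says the API enforces.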
Zamil Majdy	2025aaf5f2	fix(backend/chat): Preserve full MCP tool output for frontend widgets The SDK CLI truncates large tool results (writing them to disk), which breaks frontend widget rendering (e.g., find_block's block list cards). Stash the full MCP tool output before the SDK sees it, then use the stash in the response adapter so the frontend always receives the complete JSON for proper widget parsing.	2026-02-11 23:13:42 +04:00
Zamil Majdy	ae9bce3bae	feat(backend/chat): Add sandboxed Bash and notify SDK of restrictions - Allow Bash tool with command allowlist (jq, grep, head, tail, etc.) validated via shlex.split for proper quote handling - Add workspace path validation for Bash absolute paths - Add SDK built-in tools (Read/Write/Edit/Glob/Grep/Bash) to allowed_tools - Append Bash restrictions to system prompt (SDK doesn't know our allowlist) - Add default_factory to BlockInfoSummary schema fields - Add 12 Bash sandbox tests covering safe/dangerous commands, substitution, redirection, /dev/ access, path escaping	2026-02-11 22:35:39 +04:00
Zamil Majdy	3107d889fc	feat(frontend/copilot): Add generic tool widget for unrecognized tools SDK built-in tools (Read, Glob, Grep, etc.) have no dedicated frontend widget, so tool calls silently disappeared. Add a GenericTool component that shows a spinning gear + "Running {tool}…" for any tool-* part type that doesn't match a known case.	2026-02-11 22:08:03 +04:00
Zamil Majdy	f174fb6303	fix(backend/chat): Strip MCP prefix from SDK tool names for frontend rendering The Vercel AI SDK frontend renders tool widgets based on tool name (e.g. "tool-find_block", "tool-run_agent"). The SDK sends tool names with the MCP prefix (mcp__copilot__find_block) which didn't match any frontend switch case, causing tool execution to be invisible. Strip the mcp__copilot__ prefix in the response adapter so tool events reach the correct frontend widget handlers.	2026-02-11 22:01:59 +04:00
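The prefix stripping in f174fb6303 amounts to a one-line transformation. In the sketch below, `strip_mcp_prefix` is a hypothetical helper name; the prefix value and the tool names are taken from the commit message.

```python
MCP_PREFIX = "mcp__copilot__"


def strip_mcp_prefix(tool_name: str) -> str:
    """Map 'mcp__copilot__find_block' to 'find_block'; leave other names untouched."""
    if tool_name.startswith(MCP_PREFIX):
        return tool_name[len(MCP_PREFIX):]
    return tool_name
```

With the prefix removed, the frontend part type becomes `"tool-" + strip_mcp_prefix(name)`, for example `tool-find_block`, which matches the switch cases the commit refers to.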
Zamil Majdy	920a4c5f15	feat(backend/chat): Allow Read/Write/Edit/Glob/Grep in SDK within workspace Move these tools from fully-blocked to workspace-scoped: they are now allowed when the file path stays within the SDK working directory (/tmp/copilot-<session>/) or the tool-results directory (~/.claude/projects/…/tool-results/). This enables the SDK's built-in oversized tool result handling and workspace file operations. - Add _validate_workspace_path() with normpath-based path validation - Pass sdk_cwd from service.py into create_security_hooks() - Add 20 unit tests covering allowed/denied paths, traversal attacks	2026-02-11 20:39:33 +04:00
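
The normpath-based validation named in the commit above might look roughly like the sketch below. It is an illustration under assumptions: `is_within_workspace` and its `allowed_roots` parameter are hypothetical, and the real `_validate_workspace_path()` may differ in signature and behavior. The example assumes POSIX paths.

```python
import os
from typing import Sequence


def is_within_workspace(path: str, allowed_roots: Sequence[str]) -> bool:
    """Return True if the normalized absolute path lies under an allowed root."""
    normalized = os.path.normpath(path)  # collapses "..", "." and "//"
    if not os.path.isabs(normalized):
        return False
    for root in allowed_roots:
        root_norm = os.path.normpath(root)
        # Compare against root + separator so "/tmp/copilot-abcd" does not
        # match the root "/tmp/copilot-abc" by string prefix alone.
        if normalized == root_norm or normalized.startswith(root_norm + os.sep):
            return True
    return False
```

`os.path.normpath` works on the string only and does not resolve symlinks. A stricter check could use `os.path.realpath`, at the cost of touching the filesystem.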
Zamil Majdy	e95fadbb86	Merge branch 'dev' into feat/copitlot-claude-code	2026-02-11 20:23:56 +04:00
Zamil Majdy	b14b3803ad	feat(backend/chat): Add StreamStartStep/StreamFinishStep to SDK adapter The non-SDK path emits step boundaries (StartStep/FinishStep) around each LLM turn and tool cycle. The SDK adapter was missing these, causing the frontend to lack visual step framing for tool calls. Now the SDK adapter emits: - StreamStartStep after init and before each new LLM turn - StreamFinishStep after tool results and before final finish	2026-02-11 20:18:27 +04:00
Otto	36aeb0b2b3	docs(blocks): clarify HumanInTheLoop output descriptions for agent builder (#12069) ## Problem The agent builder (LLM) misinterprets the HumanInTheLoop block outputs. It thinks `approved_data` and `rejected_data` will yield status strings like "APPROVED" or "REJECTED" instead of understanding that the actual input data passes through. This leads to unnecessary complexity - the agent builder adds comparison blocks to check for status strings that don't exist. ## Solution Enriched the block docstring and all input/output field descriptions to make it explicit that: 1. The output is the actual data itself, not a status string 2. The routing is determined by which output pin fires 3. How to use the block correctly (connect downstream blocks to appropriate output pins) ## Changes - Updated block docstring with clear "How it works" and "Example usage" sections - Enhanced `data` input description to explain data flow - Enhanced `name` input description for reviewer context - Enhanced `approved_data` output to explicitly state it's NOT a status string - Enhanced `rejected_data` output to explicitly state it's NOT a status string - Enhanced `review_message` output for clarity ## Testing Documentation-only change to schema descriptions. No functional changes. Fixes SECRT-1930 <!-- greptile_comment --> <h2>Greptile Overview</h2> <details><summary><h3>Greptile Summary</h3></summary> Enhanced documentation for the `HumanInTheLoopBlock` to clarify how output pins work. The key improvement explicitly states that output pins (`approved_data` and `rejected_data`) yield the actual input data, not status strings like "APPROVED" or "REJECTED". This prevents the agent builder (LLM) from misinterpreting the block's behavior and adding unnecessary comparison blocks. 
Key changes: - Added "How it works" and "Example usage" sections to the block docstring - Clarified that routing is determined by which output pin fires, not by comparing output values - Enhanced all input/output field descriptions with explicit data flow explanations - Emphasized that downstream blocks should be connected to the appropriate output pin based on desired workflow path This is a documentation-only change with no functional modifications to the code logic. </details> <details><summary><h3>Confidence Score: 5/5</h3></summary> - This PR is safe to merge with no risk - Documentation-only change that accurately reflects the existing code behavior. No functional changes, no runtime impact, and the enhanced descriptions correctly explain how the block outputs work based on verification of the implementation code. - No files require special attention </details> <!-- greptile_other_comments_section --> <!-- /greptile_comment --> Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>	2026-02-11 15:43:58 +00:00
Ubbe	2a189c44c4	fix(frontend): API stream issues leaking into prompt (#12063) ## Changes 🏗️ When the backend API has an error, prevent it from leaking into the stream and instead handle it gracefully via toast. ## Checklist 📋 ### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Run the app locally and trust the changes <!-- greptile_comment --> <h2>Greptile Overview</h2> <details><summary><h3>Greptile Summary</h3></summary> This PR fixes an issue where backend API stream errors were leaking into the chat prompt instead of being handled gracefully. The fix involves both backend and frontend changes to ensure error events conform to the AI SDK's strict schema. Key Changes: - Backend (`response_model.py`): Added custom `to_sse()` method for `StreamError` that only emits `type` and `errorText` fields, stripping extra fields like `code` and `details` that cause AI SDK validation failures - Backend (`prompt.py`): Added validation step after context compression to remove orphaned tool responses without matching tool calls, preventing "unexpected tool_use_id" API errors - Frontend (`route.ts`): Implemented SSE stream normalization with `normalizeSSEStream()` and `normalizeSSEEvent()` functions to strip non-conforming fields from error events before they reach the AI SDK - Frontend (`ChatMessagesContainer.tsx`): Added toast notifications for errors and improved error display UI with deduplication logic The changes ensure a clean separation between internal error metadata (useful for logging/debugging) and the strict schema required by the AI SDK on the frontend. 
</details> <details><summary><h3>Confidence Score: 4/5</h3></summary> - This PR is safe to merge with low risk - The changes are well-structured and address a specific bug with proper error handling. The dual-layer approach (backend filtering in `to_sse()` + frontend normalization) provides defense-in-depth. However, the lack of automated tests for the new error normalization logic and the potential for edge cases in SSE parsing prevent a perfect score. - Pay close attention to `autogpt_platform/frontend/src/app/api/chat/sessions/[sessionId]/stream/route.ts` - the SSE normalization logic should be tested with various error scenarios </details> <details><summary><h3>Sequence Diagram</h3></summary> ```mermaid sequenceDiagram participant User participant Frontend as ChatMessagesContainer participant Proxy as /api/chat/.../stream participant Backend as Backend API participant AISDK as AI SDK User->>Frontend: Send message Frontend->>Proxy: POST with message Proxy->>Backend: Forward request with auth Backend->>Backend: Process message alt Success Path Backend->>Proxy: SSE stream (text-delta, etc.) Proxy->>Proxy: normalizeSSEStream (pass through) Proxy->>AISDK: Forward SSE events AISDK->>Frontend: Update messages Frontend->>User: Display response else Error Path Backend->>Backend: StreamError.to_sse() Note over Backend: Only emit {type, errorText} Backend->>Proxy: SSE error event Proxy->>Proxy: normalizeSSEEvent() Note over Proxy: Strip extra fields (code, details) Proxy->>AISDK: {type: "error", errorText: "..."} AISDK->>Frontend: error state updated Frontend->>Frontend: Toast notification (deduplicated) Frontend->>User: Show error UI + toast end ``` </details> <!-- greptile_other_comments_section --> <!-- /greptile_comment --> --------- Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by: Otto-AGPT <otto@agpt.co>	2026-02-11 22:46:37 +08:00
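
The backend half of the fix above (emitting only `type` and `errorText`) can be sketched as follows. This is a hypothetical stand-in: the dataclass below only borrows the `StreamError` and `to_sse()` names from the PR description, and the real model in `response_model.py` may be defined differently. The `data: ...` framing is standard server-sent events.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class StreamError:
    """Hypothetical stand-in for the backend error event model."""

    errorText: str
    code: Optional[str] = None      # internal metadata, kept off the wire
    details: Optional[dict] = None  # internal metadata, kept off the wire
    type: str = "error"

    def to_sse(self) -> str:
        # Emit only the two fields the PR description says the AI SDK accepts.
        payload = {"type": self.type, "errorText": self.errorText}
        return f"data: {json.dumps(payload)}\n\n"
```

`code` and `details` remain on the object for logging, while the serialized event carries only the strict two-field shape.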


7970 Commits