AutoGPT

mirror of https://github.com/Significant-Gravitas/AutoGPT.git synced 2026-02-07 05:15:09 -05:00

Author	SHA1	Message	Date
Nicholas Tindle	29ee85c86f	fix: add virus scanning to WorkspaceManager.write_file() (#11990) ## Summary Adds virus scanning at the `WorkspaceManager.write_file()` layer for defense in depth. ## Problem Previously, virus scanning was only performed at entry points: - `store_media_file()` in `backend/util/file.py` - `WriteWorkspaceFileTool` in `backend/api/features/chat/tools/workspace_files.py` This created a trust boundary where any new caller of `WorkspaceManager.write_file()` would need to remember to scan first. ## Solution Add `scan_content_safe()` call directly in `WorkspaceManager.write_file()` before persisting to storage. This ensures all content is scanned regardless of the caller. ## Changes - Added import for `scan_content_safe` from `backend.util.virus_scanner` - Added virus scan call after file size validation, before storage ## Testing Existing tests should pass. The scan is a no-op in test environments where ClamAV isn't running. Closes https://linear.app/autogpt/issue/OPEN-2993 --- > [!NOTE] > Medium Risk > Introduces a new required async scan step in the workspace write path, which can add latency or cause new failures if the scanner/ClamAV is misconfigured or unavailable. > > Overview > Adds a defense-in-depth virus scan to `WorkspaceManager.write_file()` by invoking `scan_content_safe()` after file-size validation and before any storage/database persistence. > > This centralizes scanning so any caller writing workspace files gets the same malware check without relying on upstream entry points to remember to scan. > > Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit `0f5ac68b92`.	2026-02-06 04:38:32 +00:00
Nicholas Tindle	85b6520710	feat(blocks): Add video editing blocks (#11796) This PR adds general-purpose video editing blocks for the AutoGPT Platform, enabling automated video production workflows like documentary creation, marketing videos, tutorial assembly, and content repurposing. ### Changes 🏗️ New blocks added in `backend/blocks/video/`: - `VideoDownloadBlock` - Download videos from URLs (YouTube, Vimeo, news sites, direct links) using yt-dlp - `VideoClipBlock` - Extract time segments from videos with start/end time validation - `VideoConcatBlock` - Merge multiple video clips with optional transitions (none, crossfade, fade_black) - `VideoTextOverlayBlock` - Add text overlays/captions with positioning and timing options - `VideoNarrationBlock` - Generate AI narration via ElevenLabs and mix with video audio (replace, mix, or ducking modes) Dependencies required: - `yt-dlp` - For video downloading - `moviepy` - For video editing operations Implementation details: - All blocks follow the SDK pattern with proper error handling and exception chaining - Proper resource cleanup in `finally` blocks to prevent memory leaks - Input validation (e.g., end_time > start_time) - Test mocks included for CI ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Blocks follow the SDK pattern with `BlockSchemaInput`/`BlockSchemaOutput` - [x] Resource cleanup is implemented in `finally` blocks - [x] Exception chaining is properly implemented - [x] Input validation is in place - [x] Test mocks are provided for CI environments #### For configuration changes: - [ ] `.env.default` is updated or already compatible with my changes - [x] `docker-compose.yml` is updated or already compatible with my changes - [ ] I have included a list of my configuration changes in the PR description (under Changes) N/A - No configuration changes required. --- > [!NOTE] > Medium Risk > Adds new multimedia blocks that invoke ffmpeg/MoviePy and introduces new external dependencies (plus container packages), which can impact runtime stability and resource usage; download/overlay blocks are present but disabled due to sandbox/policy concerns. > > Overview > Adds a new `backend.blocks.video` module with general-purpose video workflow blocks (download, clip, concat w/ transitions, loop, add-audio, text overlay, and ElevenLabs-powered narration), including shared utilities for codec selection, filename cleanup, and an ffmpeg-based chapter-strip workaround for MoviePy. > > Extends credentials/config to support ElevenLabs (`ELEVENLABS_API_KEY`, provider enum, system credentials, and cost config) and adds new dependencies (`elevenlabs`, `yt-dlp`) plus Docker runtime packages (`ffmpeg`, `imagemagick`). > > Improves file/reference handling end-to-end by embedding MIME types in `workspace://...#mime` outputs and updating frontend rendering to detect video vs image from MIME fragments (and broaden supported audio/video extensions), with optional enhanced output rendering behind a feature flag in the legacy builder UI. > > Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit `da7a44d794`. --------- Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com> Co-authored-by: Nicholas Tindle <ntindle@users.noreply.github.com> Co-authored-by: Otto <otto@agpt.co> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-05 22:22:33 +00:00
Bently	bfa942e032	feat(platform): Add Claude Opus 4.6 model support (#11983 ) ## Summary Adds support for Anthropic's newly released Claude Opus 4.6 model. ## Changes - Added `claude-opus-4-6` to the `LlmModel` enum - Added model metadata: 200K context window (1M beta), 128K max output tokens - Added block cost config (same pricing tier as Opus 4.5: $5/MTok input, $25/MTok output) - Updated chat config default model to Claude Opus 4.6 ## Model Details From [Anthropic's docs](https://docs.anthropic.com/en/docs/about-claude/models): - API ID: `claude-opus-4-6` - Context window: 200K tokens (1M beta) - Max output: 128K tokens (up from 64K on Opus 4.5) - Extended thinking: Yes - Adaptive thinking: Yes (new, Opus 4.6 exclusive) - Knowledge cutoff: May 2025 (reliable), Aug 2025 (training) - Pricing: $5/MTok input, $25/MTok output (same as Opus 4.5) --------- Co-authored-by: Toran Bruce Richards <toran.richards@gmail.com>	2026-02-05 19:19:51 +00:00
Bently	3ca2387631	feat(blocks): Implement Text Encode block (#11857 ) ## Summary Implements a `TextEncoderBlock` that encodes plain text into escape sequences (the reverse of `TextDecoderBlock`). ## Changes ### Block Implementation - Added `encoder_block.py` with `TextEncoderBlock` in `autogpt_platform/backend/backend/blocks/` - Uses `codecs.encode(text, "unicode_escape").decode("utf-8")` for encoding - Mirrors the structure and patterns of the existing `TextDecoderBlock` - Categorised as `BlockCategory.TEXT` ### Documentation - Added Text Encoder section to `docs/integrations/block-integrations/text.md` (the auto-generated docs file for TEXT category blocks) - Expanded "How it works" with technical details on the encoding method, validation, and edge cases - Added 3 structured use cases per docs guidelines: JSON payload preparation, Config/ENV generation, Snapshot fixtures - Added Text Encoder to the overview table in `docs/integrations/README.md` - Removed standalone `encoder_block.md` (TEXT category blocks belong in `text.md` per `CATEGORY_FILE_MAP` in `generate_block_docs.py`) ### Documentation Formatting (CodeRabbit feedback) - Added blank lines around markdown tables (MD058) - Added `text` language tags to fenced code blocks (MD040) - Restructured use case section with bold headings per coding guidelines ## How Docs Were Synced The `check-docs-sync` CI job runs `poetry run python scripts/generate_block_docs.py --check` which expects blocks to be documented in category-grouped files. Since `TextEncoderBlock` uses `BlockCategory.TEXT`, the `CATEGORY_FILE_MAP` maps it to `text.md` — not a standalone file. The block entry was added to `text.md` following the exact format used by the generator (with `<!-- MANUAL -->` markers for hand-written sections). 
## Related Issue Fixes #11111 --------- Co-authored-by: Otto <otto@agpt.co> Co-authored-by: lif <19658300+majiayu000@users.noreply.github.com> Co-authored-by: Aryan Kaul <134673289+aryancodes1@users.noreply.github.com> Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co> Co-authored-by: Nick Tindle <nick@ntindle.com>	2026-02-05 17:31:02 +00:00
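The encoding call quoted in the Text Encoder commit above can be tried in isolation. This is a minimal sketch, not the block's actual code: the helper names `encode_text` and `decode_text` are hypothetical, and only the `codecs.encode(text, "unicode_escape").decode("utf-8")` expression comes from the commit message.

```python
import codecs


def encode_text(text: str) -> str:
    """Encode plain text into escape sequences (hypothetical helper name)."""
    # unicode_escape produces ASCII-only bytes, so decoding them as UTF-8 is lossless.
    return codecs.encode(text, "unicode_escape").decode("utf-8")


def decode_text(escaped: str) -> str:
    """Reverse the encoding for strings produced by encode_text (hypothetical helper)."""
    return escaped.encode("utf-8").decode("unicode_escape")


sample = "line one\nline two\tcafé"
escaped = encode_text(sample)
print(escaped)                         # newline, tab, and é now appear as escape sequences
print(decode_text(escaped) == sample)  # round-trip check
```

Control characters and non-ASCII characters become backslash escapes, which is why the commit lists JSON payload preparation and snapshot fixtures as use cases.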
Otto	ed07f02738	fix(copilot): edit_agent updates existing agent instead of creating duplicate (#11981 ) ## Summary When editing an agent via CoPilot's `edit_agent` tool, the code was always creating a new `LibraryAgent` entry instead of updating the existing one to point to the new graph version. This caused duplicate agents to appear in the user's library. ## Changes In `save_agent_to_library()`: - When `is_update=True`, now checks if there's an existing library agent for the graph using `get_library_agent_by_graph_id()` - If found, uses `update_agent_version_in_library()` to update the existing library agent to point to the new version - Falls back to creating a new library agent if no existing one is found (e.g., if editing a graph that wasn't added to library yet) ## Testing - Verified lint/format checks pass - Plan reviewed and approved by Staff Engineer Plan Reviewer agent ## Related Fixes [SECRT-1857](https://linear.app/autogpt/issue/SECRT-1857) --------- Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>	2026-02-05 15:02:26 +00:00
Swifty	b121030c94	feat(frontend): Add progress indicator during agent generation [SECRT-1883] (#11974 ) ## Summary - Add asymptotic progress bar that appears during long-running chat tasks - Progress bar shows after 10 seconds with "Working on it..." label and percentage - Uses half-life formula: ~50% at 30s, ~75% at 60s, ~87.5% at 90s, etc. - Creates the classic "game loading bar" effect that never reaches 100% https://github.com/user-attachments/assets/3c59289e-793c-4a08-b3fc-69e1eef28b1f ## Test plan - [x] Start a chat that triggers agent generation - [x] Wait 10+ seconds for the progress bar to appear - [x] Verify progress bar is centered with label and percentage - [x] Verify progress follows expected timing (~50% at 30s) - [x] Verify progress bar disappears when task completes --------- Co-authored-by: Otto <otto@agpt.co>	2026-02-05 15:37:51 +01:00
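The half-life timing in the progress indicator commit above can be reproduced with a few lines of arithmetic. This is a sketch of the formula only, written in Python for consistency with the backend examples; the function names and the 30-second half-life parameter are assumptions inferred from the stated checkpoints (~50% at 30s, ~75% at 60s, ~87.5% at 90s), not the frontend's code.

```python
SHOW_AFTER_S = 10.0  # the commit says the bar appears after 10 seconds


def asymptotic_progress(elapsed_s: float, half_life_s: float = 30.0) -> float:
    """Fraction complete; the remaining distance to 1.0 halves every half-life."""
    if elapsed_s <= 0:
        return 0.0
    return 1.0 - 0.5 ** (elapsed_s / half_life_s)


def should_show(elapsed_s: float) -> bool:
    """Hypothetical visibility check for the progress bar."""
    return elapsed_s >= SHOW_AFTER_S


for t in (30, 60, 90):
    print(t, f"{asymptotic_progress(t):.1%}")
```

Because each half-life only halves what is left, the value approaches 100% without reaching it in exact arithmetic, which gives the "loading bar" effect the commit describes.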
Swifty	e40233a3ac	fix(backend/chat): Guide find_agent users toward action with CTAs (#11976 ) When users search for agents, guide them toward creating custom agents if no results are found or after showing results. This improves user engagement by offering a clear next step. ### Changes 🏗️ - Updated `agent_search.py` to add CTAs in search responses - Added messaging to inform users they can create custom agents based on their needs - Applied to both "no results found" and "agents found" scenarios ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Search for agents in marketplace with matching results - [x] Search for agents in marketplace with no results - [x] Search for agents in library with matching results - [x] Search for agents in library with no results - [x] Verify CTA message appears in all cases --------- Co-authored-by: Otto <otto@agpt.co>	2026-02-05 15:36:55 +01:00
Swifty	3ae5eabf9d	fix(backend/chat): Use latest prompt label in non-production environments (#11977 ) In non-production environments, the chat service now fetches prompts with the `latest` label instead of the default production-labeled prompt. This makes it easier to test and iterate on prompt changes in dev/staging without needing to promote them to production first. ### Changes 🏗️ - Updated `_get_system_prompt_template()` in chat service to pass `label="latest"` when `app_env` is not `PRODUCTION` - Production environments continue using the default behavior (production-labeled prompts) ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Verified that in non-production environments, prompts with `latest` label are fetched - [x] Verified that production environments still use the default (production) labeled prompts Co-authored-by: Otto <otto@agpt.co>	2026-02-05 14:54:39 +01:00
Otto	a077ba9f03	fix(platform): YouTube block yields only error on failure (#11980 ) ## Summary Fixes [SECRT-1889](https://linear.app/autogpt/issue/SECRT-1889): The YouTube transcription block was yielding both `video_id` and `error` when the transcript fetch failed. ## Problem The block yielded `video_id` immediately upon extracting it from the URL, before attempting to fetch the transcript. If the transcript fetch failed, both outputs were present. ```python # Before video_id = self.extract_video_id(input_data.youtube_url) yield "video_id", video_id # ← Yielded before transcript attempt transcript = self.get_transcript(video_id, credentials) # ← Could fail here ``` ## Solution Wrap the entire operation in try/except and only yield outputs after all operations succeed: ```python # After try: video_id = self.extract_video_id(input_data.youtube_url) transcript = self.get_transcript(video_id, credentials) transcript_text = self.format_transcript(transcript=transcript) # Only yield after all operations succeed yield "video_id", video_id yield "transcript", transcript_text except Exception as e: yield "error", str(e) ``` This follows the established pattern in other blocks (e.g., `ai_image_generator_block.py`). ## Testing - All 10 unit tests pass (`test/blocks/test_youtube.py`) - Lint/format checks pass Co-authored-by: Toran Bruce Richards <toran.richards@gmail.com>	2026-02-05 11:51:32 +00:00
Bently	5401d54eaa	fix(backend): Handle StreamHeartbeat in CoPilot stream handler (#11928 ) ### Changes 🏗️ Fixes AUTOGPT-SERVER-7JA (123 events since Jan 27, 2026). #### Problem `StreamHeartbeat` was added to keep SSE connections alive during long-running tool executions (yielded every 15s while waiting). However, the main `stream_chat_completion` handler's `elif` chain didn't have a case for it: ``` StreamTextStart → ✅ handled StreamTextDelta → ✅ handled StreamTextEnd → ✅ handled StreamToolInputStart → ✅ handled StreamToolInputAvailable → ✅ handled StreamToolOutputAvailable → ✅ handled StreamFinish → ✅ handled StreamError → ✅ handled StreamUsage → ✅ handled StreamHeartbeat → ❌ fell through to 'Unknown chunk type' error ``` This meant every heartbeat during tool execution generated a Sentry error instead of keeping the connection alive. #### Fix Add `StreamHeartbeat` to the `elif` chain and yield it through. The route handler already calls `to_sse()` on all yielded chunks, and `StreamHeartbeat.to_sse()` correctly returns `: heartbeat\n\n` (SSE comment format, ignored by clients but keeps proxies/load balancers happy). 1 file changed, 3 insertions.	2026-02-05 12:04:46 +01:00
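The `StreamHeartbeat` fix above amounts to one extra branch in a type dispatch. The sketch below is a simplified stand-in, not the real handler: the class bodies, the `data:` payload format, and the `handle_chunks` function are hypothetical, and only the `: heartbeat\n\n` wire format is taken from the commit message.

```python
from dataclasses import dataclass


@dataclass
class StreamTextDelta:
    delta: str

    def to_sse(self) -> str:
        # Hypothetical payload format, for illustration only.
        return f"data: {self.delta}\n\n"


@dataclass
class StreamHeartbeat:
    def to_sse(self) -> str:
        # A line starting with ":" is an SSE comment, which clients ignore.
        return ": heartbeat\n\n"


def handle_chunks(chunks):
    """Yield known chunk types; reject anything unrecognized."""
    for chunk in chunks:
        if isinstance(chunk, StreamTextDelta):
            yield chunk
        elif isinstance(chunk, StreamHeartbeat):
            # The previously missing case: pass heartbeats through unchanged.
            yield chunk
        else:
            raise ValueError(f"Unknown chunk type: {type(chunk).__name__}")


wire = [c.to_sse() for c in handle_chunks([StreamTextDelta("hi"), StreamHeartbeat()])]
print(wire)
```

Without the heartbeat branch, every keep-alive chunk would land in the final `else`, which matches the stream of errors the commit reports.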
Otto	5ac89d7c0b	fix(test): fix timing bug in test_block_credit_reset (#11978 ) ## Summary Fixes the flaky `test_block_credit_reset` test that was failing on multiple PRs with `assert 0 == 1000`. ## Root Cause The test calls `disable_test_user_transactions()` which sets `updatedAt` to 35 days ago from the actual current time. It then mocks `time_now` to January 1st. The bug: If the test runs in early February, 35 days ago is January — the same month as the mocked `time_now`. The credit refill logic only triggers when the balance snapshot is from a different month, so no refill happens and the balance stays at 0. ## Fix After calling `disable_test_user_transactions()`, explicitly set `updatedAt` to December of the previous year. This ensures it's always in a different month than the mocked `month1` (January), regardless of when the test runs. ## Testing CI will verify the fix.	2026-02-05 11:56:26 +01:00
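The date arithmetic behind the flaky test above is easy to check by hand. The sketch below models it under stated assumptions: `needs_refill` and `simulate` are hypothetical names, the refill rule is reduced to a year/month comparison, and the mocked `time_now` is assumed to be January 1st of the year in which the test runs.

```python
from datetime import datetime, timedelta


def needs_refill(snapshot_at: datetime, now: datetime) -> bool:
    """Refill only when the balance snapshot is from a different month."""
    return (snapshot_at.year, snapshot_at.month) != (now.year, now.month)


def simulate(real_today: datetime, apply_fix: bool) -> bool:
    # Assumption: the mock pins "now" to January 1st of the current year.
    mocked_now = datetime(real_today.year, 1, 1)
    # What the commit says disable_test_user_transactions() does: 35 days before the real clock.
    snapshot_at = real_today - timedelta(days=35)
    if apply_fix:
        # The fix: pin the snapshot to December of the previous year.
        snapshot_at = datetime(mocked_now.year - 1, 12, 1)
    return needs_refill(snapshot_at, mocked_now)


print(simulate(datetime(2026, 2, 5), apply_fix=False))   # early February: snapshot lands in January
print(simulate(datetime(2026, 2, 5), apply_fix=True))
print(simulate(datetime(2026, 3, 20), apply_fix=False))  # later in the year the bug is hidden
```

On 5 February, 35 days earlier is 1 January, the same month as the mocked clock, so no refill is triggered; pinning the snapshot to December removes the dependence on the real date.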
Otto	4f908d5cb3	fix(platform): Improve Linear Search Block [SECRT-1880] (#11967 ) ## Summary Implements [SECRT-1880](https://linear.app/autogpt/issue/SECRT-1880) - Improve Linear Search Block ## Changes ### Models (`models.py`) - Added `State` model with `id`, `name`, and `type` fields for workflow state information - Added `state: State \| None` field to `Issue` model ### API Client (`_api.py`) - Updated `try_search_issues()` to: - Add `max_results` parameter (default 10, was ~50) to reduce token usage - Add `team_id` parameter for team filtering - Return `createdAt`, `state`, `project`, and `assignee` fields in results - Fixed `try_get_team_by_name()` to return descriptive error message when team not found instead of crashing with `IndexError` ### Block (`issues.py`) - Added `max_results` input parameter (1-100, default 10) - Added `team_name` input parameter for optional team filtering - Added `error` output field for graceful error handling - Added categories (`PRODUCTIVITY`, `ISSUE_TRACKING`) - Updated test fixtures to include new fields ## Breaking Changes \| Change \| Before \| After \| Mitigation \| \|--------\|--------\|-------\|------------\| \| Default result count \| ~50 \| 10 \| Users can set `max_results` up to 100 if needed \| ## Non-Breaking Changes - `state` field added to `Issue` (optional, defaults to `None`) - `max_results` param added (has default value) - `team_name` param added (optional, defaults to `None`) - `error` output added (follows established pattern from GitHub blocks) ## Testing - [x] Format/lint checks pass - [x] Unit test fixtures updated Resolves SECRT-1880 --------- Co-authored-by: Toran Bruce Richards <toran.richards@gmail.com> Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com> Co-authored-by: Toran Bruce Richards <Torantulino@users.noreply.github.com>	2026-02-04 22:54:46 +00:00
Reinier van der Leer	c1aa684743	fix(platform/chat): Filter host-scoped credentials for `run_agent` tool (#11905 ) - Fixes [SECRT-1851: \[Copilot\] `run_agent` tool doesn't filter host-scoped credentials](https://linear.app/autogpt/issue/SECRT-1851) - Follow-up to #11881 ### Changes 🏗️ - Filter host-scoped credentials for `run_agent` tool - Tighten validation on host input field in `HostScopedCredentialsModal` - Use netloc (w/ port) rather than just hostname (w/o port) as host scope ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - Create graph that requires host-scoped credentials to work - Create host-scoped credentials with a different host - Try to have Copilot run the graph - [x] -> no matching credentials available - Create new credentials - [x] -> works --------- Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>	2026-02-04 16:27:14 +00:00
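The difference between hostname and netloc mentioned in the commit above can be seen with Python's standard `urllib.parse`. This is only an illustration of the two values; the `host_scope` helper and the example URLs are hypothetical, and the platform's actual matching logic is not shown in the log.

```python
from urllib.parse import urlparse


def host_scope(url: str) -> str:
    """Hypothetical helper: use the netloc (host plus port) as the credential scope."""
    return urlparse(url).netloc


parsed = urlparse("https://api.example.com:8443/v1/items")
print(parsed.hostname)  # host without the port
print(parsed.netloc)    # host with the port

# Two services on the same host but different ports get different scopes.
print(host_scope("https://api.example.com:8443/v1") == host_scope("https://api.example.com:9000/v1"))
```

Note that `netloc` also carries any `user:password@` prefix present in the URL, so a real implementation would likely need to strip or reject that part.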
Nicholas Tindle	1eabc60484	Merge commit from fork. Fixes GHSA-rc89-6g7g-v5v7 / CVE-2026-22038. The `logger.info()` calls were explicitly logging API keys via `get_secret_value()`, exposing credentials in plaintext logs. Changes: - Replace info-level credential logging with debug-level provider logging - Remove all explicit secret value logging from observe/act/extract blocks Co-authored-by: Otto <otto@agpt.co>	2026-02-03 11:16:57 -06:00
Swifty	f4bf492f24	feat(platform): Add Redis-based SSE reconnection for long-running CoPilot operations (#11877 ) ## Changes 🏗️ Adds Redis-based SSE reconnection support for long-running CoPilot operations (like Agent Generator), enabling clients to reconnect and resume receiving updates after disconnection. ### What this does: - Stream Registry - Redis-backed task tracking with message persistence via Redis Streams - SSE Reconnection - Clients can reconnect to active tasks using `task_id` and `last_message_id` - Duplicate Message Fix - Filters out in-progress assistant messages from session response when active stream exists - Completion Consumer - Handles background task completion notifications via Redis Streams ### Architecture: ``` 1. User sends message → Backend creates task in Redis 2. SSE chunks written to Redis Stream for persistence 3. Client receives chunks via SSE subscription 4. If client disconnects → Task continues in background 5. Client reconnects → GET /sessions/{id} returns active_stream info 6. Client subscribes to /tasks/{task_id}/stream with last_message_id 7. Missed messages replayed from Redis Stream ``` ### Key endpoints: - `GET /sessions/{session_id}` - Returns `active_stream` info if task is running - `GET /tasks/{task_id}/stream?last_message_id=X` - SSE endpoint for reconnection - `GET /tasks/{task_id}` - Get task status - `POST /operations/{op_id}/complete` - Webhook for external service completion ### Duplicate message fix: When `GET /sessions/{id}` detects an active stream: 1. Filters out the in-progress assistant message from response 2. Returns `last_message_id="0-0"` so client replays stream from beginning 3. 
Client receives complete response only through SSE (single source of truth) ### Frontend changes: - Task persistence in localStorage for cross-tab reconnection - Stream event dispatcher handles reconnection flow - Deduplication logic prevents duplicate messages ### Testing: - Manual testing of reconnection scenarios - Verified duplicate message fix works correctly ## Related - Resolves SSE timeout issues for Agent Generator - Fixes duplicate message bug on reconnection	2026-02-03 16:52:06 +01:00
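The replay step in the reconnection flow above depends on Redis Stream IDs being ordered, so that a client can ask for everything after `last_message_id`. The sketch below imitates that behaviour in memory, with no Redis connection; `parse_id`, `replay_after`, and the sample entries are hypothetical, and the real service presumably reads from Redis Streams instead.

```python
def parse_id(message_id: str) -> tuple[int, int]:
    """Split a Redis Stream ID of the form '<ms>-<seq>' into a comparable tuple."""
    ms, seq = message_id.split("-")
    return int(ms), int(seq)


def replay_after(entries, last_message_id: str = "0-0"):
    """Return entries whose ID is strictly greater than last_message_id."""
    cutoff = parse_id(last_message_id)
    return [(mid, payload) for mid, payload in entries if parse_id(mid) > cutoff]


entries = [
    ("1700000000000-0", "chunk-a"),
    ("1700000000000-1", "chunk-b"),
    ("1700000000005-0", "chunk-c"),
]

print(replay_after(entries))                     # "0-0" replays the whole stream
print(replay_after(entries, "1700000000000-0"))  # resume after the first chunk
```

Returning `last_message_id="0-0"` for an active stream, as the duplicate message fix does, therefore makes the client replay the response from the beginning over SSE.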
Zamil Majdy	81e48c00a4	feat(copilot): add customize_agent tool for marketplace templates (#11943 ) ## Summary Adds a new copilot tool that allows users to customize marketplace/template agents using natural language before adding them to their library. This exposes the Agent Generator's `/api/template-modification` endpoint to the copilot, which was previously not available. ## Changes - service.py: Add `customize_template_external` to call Agent Generator's template modification endpoint - core.py: - Add `customize_template` wrapper function - Extract `graph_to_json` as a reusable function (was previously inline in `get_agent_as_json`) - customize_agent.py: New tool that: - Takes marketplace agent ID (format: `creator/slug`) - Fetches template from store via `store_db.get_agent()` - Calls Agent Generator for customization - Handles clarifying questions from the generator - Saves customized agent to user's library - __init__.py: Register the tool in `TOOL_REGISTRY` for auto-discovery ## Usage Flow 1. User searches marketplace: "Find me a newsletter agent" 2. Copilot calls `find_agent` → returns `autogpt/newsletter-writer` 3. User: "Customize that agent to post to Discord instead of email" 4. Copilot calls: ``` customize_agent( agent_id="autogpt/newsletter-writer", modifications="Post to Discord instead of sending email" ) ``` 5. Agent Generator may ask clarifying questions (e.g., "What Discord channel?") 6. Customized agent is saved to user's library ## Test plan - [x] Verified tool imports correctly - [x] Verified tool is registered in `TOOL_REGISTRY` - [x] Verified OpenAI function schema is valid - [x] Ran existing tests (`pytest backend/api/features/chat/tools/`) - all pass - [x] Type checker (`pyright`) passes with 0 errors - [ ] Manual testing with copilot (requires Agent Generator service)	2026-02-03 14:59:25 +00:00
Otto	7dc53071e8	fix(backend): Add retry and error handling to block initialization (#11946 ) ## Summary Adds retry logic and graceful error handling to `initialize_blocks()` to prevent transient DB errors from crashing server startup. ## Problem When a transient database error occurs during block initialization (e.g., Prisma P1017 "Server has closed the connection"), the entire server fails to start. This is overly aggressive since: 1. Blocks are already registered in memory 2. The DB sync is primarily for tracking/schema storage 3. One flaky connection shouldn't prevent the server from starting Triggered by: [Sentry AUTOGPT-SERVER-7PW](https://significant-gravitas.sentry.io/issues/7238733543/) ## Solution - Add retry decorator (3 attempts with exponential backoff) for DB operations - On failure after retries, log a warning and continue to the next block - Blocks remain available in memory even if DB sync fails - Log summary of any failed blocks at the end ## Changes - `autogpt_platform/backend/backend/data/block.py`: Wrap block DB sync in retry logic with graceful fallback ## Testing - Existing block initialization behavior unchanged on success - On transient DB errors: retries up to 3 times, then continues with warning	2026-02-03 12:43:30 +00:00
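The retry-then-continue behaviour described in the block initialization commit above can be sketched as follows. This is an illustration under assumptions, not the code in `block.py`: the function names, the 0.5-second base delay, and the synchronous style are hypothetical, and only "3 attempts with exponential backoff, then warn and continue" comes from the commit.

```python
import time


def sync_with_retry(sync_one, block_name, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call sync_one(block_name) up to `attempts` times with exponential backoff."""
    for attempt in range(attempts):
        try:
            sync_one(block_name)
            return True
        except Exception:
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)  # 0.5s, 1.0s, ...
    return False


def initialize_blocks_sketch(block_names, sync_one, sleep=time.sleep):
    """Sync every block; collect failures instead of aborting startup."""
    failed = [
        name for name in block_names
        if not sync_with_retry(sync_one, name, sleep=sleep)
    ]
    if failed:
        print(f"warning: DB sync failed for {len(failed)} block(s): {failed}")
    return failed
```

Passing `sleep` as a parameter is a design choice for this sketch: it lets a test record the backoff delays instead of actually waiting.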
Zamil Majdy	4878665c66	Merge branch 'master' into dev	2026-02-03 16:01:23 +04:00
Zamil Majdy	678ddde751	refactor(backend): unify context compression into compress_context() (#11937 ) ## Background This PR consolidates and unifies context window management for the CoPilot backend. ### Problem The CoPilot backend had two separate implementations of context window management: 1. `service.py` → `_manage_context_window()` - Chat service streaming/continuation 2. `prompt.py` → `compress_prompt()` - Sync LLM blocks This duplication led to inconsistent behavior, maintenance burden, and duplicate code. --- ## Solution: Unified `compress_context()` A single async function that handles both use cases: \| Caller \| Usage \| Behavior \| \|--------\|-------\|----------\| \| Chat service \| `compress_context(msgs, client=openai_client)` \| Summarization → Truncation \| \| LLM blocks \| `compress_context(msgs, client=None)` \| Truncation only (no API call) \| --- ## Strategy Order \| Step \| Description \| Runs When \| \|------\|-------------\|-----------\| \| 1. LLM Summarization \| Summarize old messages into single context message, keep recent 15 \| Only if `client` provided \| \| 2. Content Truncation \| Progressively truncate message content (8192→4096→...→128 tokens) \| If still over limit \| \| 3. Middle-out Deletion \| Delete messages one at a time from center outward \| If still over limit \| \| 4. First/Last Trim \| Truncate system prompt and last message content \| Last resort \| ### Why This Order? 1. Summarization first (if available) - Preserves semantic meaning of old messages 2. Content truncation before deletion - Keeps all conversation turns, just shorter 3. Middle-out deletion - More granular than dropping all old messages at once 4. 
First/last trim - Only touch system prompt as last resort --- ## Key Fixes \| Issue \| Before \| After \| \|-------\|--------\|-------\| \| Socket leak \| `AsyncOpenAI` client never closed \| `async with` context manager \| \| Timeout ignored \| `timeout=30` passed to `create()` (invalid) \| `client.with_options(timeout=30)` \| \| OpenAI tool messages \| Not truncated \| Properly truncated \| \| Tool pair integrity \| OpenAI format only \| Both OpenAI + Anthropic formats \| --- ## Tool Format Support `_ensure_tool_pairs_intact()` now supports both formats: ### OpenAI Format ```python # Assistant with tool_calls {"role": "assistant", "tool_calls": [{"id": "call_1", ...}]} # Tool response {"role": "tool", "tool_call_id": "call_1", "content": "result"} ``` ### Anthropic Format ```python # Assistant with tool_use {"role": "assistant", "content": [{"type": "tool_use", "id": "toolu_1", ...}]} # Tool result {"role": "user", "content": [{"type": "tool_result", "tool_use_id": "toolu_1", ...}]} ``` --- ## Files Changed \| File \| Change \| \|------\|--------\| \| `backend/util/prompt.py` \| +450 lines: Add `CompressResult`, `compress_context()`, helpers \| \| `backend/api/features/chat/service.py` \| -380 lines: Remove duplicate, use thin wrapper \| \| `backend/blocks/llm.py` \| Migrate `llm_call()` to use `compress_context(client=None)` \| \| `backend/util/prompt_test.py` \| +400 lines: Comprehensive tests (OpenAI + Anthropic) \| ### Removed - `compress_prompt()` - Replaced by `compress_context(client=None)` - `_manage_context_window()` - Replaced by `compress_context(client=openai_client)` --- ## API ```python async def compress_context( messages: list[dict], target_tokens: int = 120_000, *, model: str = "gpt-4o", client: AsyncOpenAI \| None = None, # None = truncation only keep_recent: int = 15, reserve: int = 2_048, start_cap: int = 8_192, floor_cap: int = 128, ) -> CompressResult: ... 
@dataclass class CompressResult: messages: list[dict] token_count: int was_compacted: bool error: str \| None = None original_token_count: int = 0 messages_summarized: int = 0 messages_dropped: int = 0 ``` --- ## Tests Added \| Test Class \| Coverage \| \|------------\|----------\| \| `TestMsgTokens` \| Token counting for regular messages, OpenAI tool calls, Anthropic tool_use \| \| `TestTruncateToolMessageContent` \| OpenAI + Anthropic tool message truncation \| \| `TestEnsureToolPairsIntact` \| OpenAI format (3 tests), Anthropic format (3 tests), edge cases (3 tests) \| \| `TestCompressContext` \| No compression, truncation-only, tool pair preservation, error handling \| --- ## Checklist - [x] Code follows project conventions - [x] Linting passes (`poetry run format`) - [x] Type checking passes (`pyright`) - [x] Tests added for all new functions - [x] Both OpenAI and Anthropic tool formats supported - [x] Backward compatible behavior preserved - [x] All review comments addressed	2026-02-03 10:36:10 +00:00
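Two of the strategy steps in the `compress_context()` commit above, progressive content truncation and middle-out deletion, reduce to simple orderings. The sketch below shows only those orderings; both function names are hypothetical, halving is inferred from the `8192→4096→...→128` sequence, and the exact tie-breaking of the middle-out order is an assumption.

```python
def cap_schedule(start_cap: int = 8_192, floor_cap: int = 128) -> list[int]:
    """Per-message token caps to try, halving from start_cap down to floor_cap."""
    caps = []
    cap = start_cap
    while cap >= floor_cap:
        caps.append(cap)
        cap //= 2
    return caps


def middle_out_order(n: int) -> list[int]:
    """Indices 0..n-1 ordered from the center of the list outward."""
    center = (n - 1) / 2
    # sorted() is stable, so of two equally distant indices the lower one comes first.
    return sorted(range(n), key=lambda i: abs(i - center))


print(cap_schedule())
print(middle_out_order(5))
```

Deleting in this order removes mid-conversation messages first and leaves the system prompt (index 0) and the most recent message for last, consistent with the "first/last trim only as a last resort" step.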
Otto	aef6f57cfd	fix(scheduler): route db calls through DatabaseManager (#11941 ) ## Summary Routes `increment_onboarding_runs` and `cleanup_expired_oauth_tokens` through the DatabaseManager RPC client instead of calling Prisma directly. ## Problem The Scheduler service never connects its Prisma client. While `add_graph_execution()` in `utils.py` has a fallback that routes through DatabaseManager when Prisma isn't connected, subsequent calls in the scheduler were hitting Prisma directly: - `increment_onboarding_runs()` after successful graph execution - `cleanup_expired_oauth_tokens()` in the scheduled job These threw `ClientNotConnectedError`, caught by generic exception handlers but spamming Sentry (~696K events since December per the original analysis in #11926). ## Solution Follow the same pattern as `utils.py`: 1. Add `cleanup_expired_oauth_tokens` to `DatabaseManager` and `DatabaseManagerAsyncClient` 2. Update scheduler to use `get_database_manager_async_client()` for both calls ## Changes - database.py: Import and expose `cleanup_expired_oauth_tokens` in both manager classes - scheduler.py: Use `db.increment_onboarding_runs()` and `db.cleanup_expired_oauth_tokens()` via the async client ## Impact - Eliminates Sentry error spam from scheduler - Onboarding run counters now actually increment for scheduled executions - OAuth token cleanup now actually runs ## Testing Deploy to staging with scheduled graphs and verify: 1. No more `ClientNotConnectedError` in scheduler logs 2. `UserOnboarding.agentRuns` increments on scheduled runs 3. Expired OAuth tokens get cleaned up Refs: #11926 (original fix that was closed)	2026-02-03 09:54:49 +00:00
Krzysztof Czerwinski	14cee1670a	fix(backend): Prevent leaking Redis connections in `ws_api` (#11869) Fixing https://github.com/Significant-Gravitas/AutoGPT/pull/11297#discussion_r2496833421 ### Changes 🏗️ 1. `event_bus.py` - Added a `close` method to `AsyncRedisEventBus` - Added an `__init__` method to track the `_pubsub` instance attribute - Added an `async def close()` method that closes the PubSub connection safely - Modified `listen_events()` to store the pubsub reference in `self._pubsub` 2. `ws_api.py` - Added cleanup in `event_broadcaster` - Wrapped the worker coroutines in a `try`/`finally` block - The `finally` block calls `close()` on both event buses to ensure cleanup happens on any exit (including exceptions before retry)	2026-02-03 08:07:48 +00:00
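The cleanup pattern in the `ws_api` commit above, closing both event buses in a `finally` block, can be sketched with plain `asyncio`. Everything here is a stand-in: `FakeEventBus` replaces the real Redis-backed bus, the simulated failure is artificial, and the function bodies are assumptions about the shape of the fix rather than the project's code.

```python
import asyncio


class FakeEventBus:
    """Stand-in for an event bus that holds a PubSub connection (hypothetical)."""

    def __init__(self) -> None:
        self._pubsub = None
        self.closed = False

    async def listen_events(self) -> None:
        self._pubsub = object()  # stands in for a real PubSub handle
        raise ConnectionError("simulated broker failure")

    async def close(self) -> None:
        self._pubsub = None
        self.closed = True


async def event_broadcaster(bus_a: FakeEventBus, bus_b: FakeEventBus) -> None:
    try:
        await asyncio.gather(bus_a.listen_events(), bus_b.listen_events())
    finally:
        # Runs on normal exit and on exceptions alike, so connections are not leaked.
        await bus_a.close()
        await bus_b.close()


async def main() -> tuple[bool, bool]:
    a, b = FakeEventBus(), FakeEventBus()
    try:
        await event_broadcaster(a, b)
    except ConnectionError:
        pass  # a retry wrapper would reconnect here
    return a.closed, b.closed


result = asyncio.run(main())
print(result)
```

Because the `finally` block runs before the exception reaches any retry logic, each retry starts without a dangling PubSub connection from the previous attempt.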
Zamil Majdy	d81d1ce024	refactor(backend): extract context window management and fix LLM continuation (#11936 ) ## Summary Fixes CoPilot becoming unresponsive after long-running tools complete, and refactors context window management into a reusable function. ## Problem After `create_agent` completes, `_generate_llm_continuation()` was sending ALL messages to OpenRouter without any context compaction. When conversations exceeded ~50 messages, OpenRouter rejected requests with `provider_name: 'unknown'` (no provider would accept). Evidence: Langfuse session [44fbb803-092e-4ebd-b288-852959f4faf5](https://cloud.langfuse.com/project/cmk5qhf210003ad079sd8utjt/sessions/44fbb803-092e-4ebd-b288-852959f4faf5) showed: - Successful calls: 32-50 messages, known providers - Failed calls: 52+ messages, `provider: unknown`, `completion: null` ## Changes ### Refactor: Extract reusable `_manage_context_window()` - Counts tokens and checks against 120k threshold - Summarizes old messages while keeping recent 15 - Ensures tool_call/tool_response pairs stay intact - Progressive truncation if still over limit - Returns `ContextWindowResult` dataclass with messages, token count, compaction status, and errors - Helper `_messages_to_dicts()` reduces code duplication ### Fix: Update `_generate_llm_continuation()` - Now calls `_manage_context_window()` before making LLM calls - Adds retry logic with exponential backoff (matching `_stream_chat_chunks` behavior) ### Cleanup: Update `_stream_chat_chunks()` - Replaced inline context management with call to `_manage_context_window()` - Eliminates code duplication between the two functions ## Testing - Syntax check: ✅ - Ruff lint: ✅ - Import verification: ✅ ## Checklist - [x] My code follows the style guidelines of this project - [x] I have performed a self-review of my own code - [x] My changes generate no new warnings - [x] I have checked that my changes do not break existing functionality --------- Co-authored-by: Otto <otto@agpt.co>	2026-02-03 04:41:43 +00:00
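The compaction steps listed in #11936 above (check a token budget, summarize older messages, keep the most recent ones, then truncate progressively) might look roughly like the sketch below. Everything here is an assumption made for illustration: the word-count tokenizer, the placeholder `summarize()`, and the field names on `ContextWindowResult` are not taken from the repository, and tool_call/tool_response pairing is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class ContextWindowResult:
    # Field names are illustrative, not the project's actual dataclass.
    messages: list
    token_count: int
    was_compacted: bool


def count_tokens(messages: list) -> int:
    # Assumption: crude whitespace word count instead of a real tokenizer.
    return sum(len(str(m.get("content", "")).split()) for m in messages)


def summarize(messages: list) -> dict:
    # Placeholder for an LLM-generated summary of the older messages.
    return {
        "role": "system",
        "content": f"[summary of {len(messages)} earlier messages]",
    }


def manage_context_window(messages, token_limit=120_000, keep_recent=15):
    total = count_tokens(messages)
    if total <= token_limit:
        return ContextWindowResult(list(messages), total, False)

    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = [summarize(old)] if old else []

    # Progressive truncation: drop the oldest kept message until it fits.
    while len(recent) > 1 and count_tokens(summary + recent) > token_limit:
        recent = recent[1:]

    compacted = summary + recent
    return ContextWindowResult(compacted, count_tokens(compacted), True)
```

Returning a small result object rather than a bare list lets both callers inspect whether compaction happened without recomputing token counts.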
Zamil Majdy	2dd341c369	refactor: enrich description with context before calling Agent Generator (#11932 ) ## Summary Updates the Agent Generator client to enrich the description with context before calling, instead of sending `user_instruction` as a separate parameter. ## Context Companion PR to Significant-Gravitas/AutoGPT-Agent-Generator#105 which removes unused parameters from the decompose API. ## Changes - Enrich `description` with `context` (e.g., clarifying question answers) before sending - Remove `user_instruction` from request payload ## How it works Both input boxes and chat box work the same way - the frontend constructs a formatted message with answers and sends it as a user message. The backend then enriches the description with this context before calling the external Agent Generator service.	2026-02-03 02:31:07 +00:00
Otto	f7350c797a	fix(copilot): use messages_dict in fallback context compaction (#11922 ) ## Summary Fixes a bug where the fallback path in context compaction passes `recent_messages` (already sliced) instead of `messages_dict` (full conversation) to `_ensure_tool_pairs_intact`. This caused the function to fail to find assistant messages that exist in the original conversation but were outside the sliced window, resulting in orphan tool_results being sent to Anthropic and rejected with: ``` messages.66.content.0: unexpected tool_use_id found in tool_result blocks: toolu_vrtx_019bi1PDvEn7o5ByAxcS3VdA ``` ## Changes - Pass `messages_dict` and `slice_start` (relative to full conversation) instead of `recent_messages` and `reduced_slice_start` (relative to already-sliced list) ## Testing This is a targeted fix for the fallback path. The bug only manifests when: 1. Token count > 120k (triggers compaction) 2. Initial compaction + summary still exceeds limit (triggers fallback) 3. A tool_result's corresponding assistant is in `messages_dict` but not in `recent_messages` ## Related - Fixes SECRT-1861 - Related: SECRT-1839 (original fix that missed this code path)	2026-02-02 13:01:05 +00:00
Guofang.Tang	1081590384	feat(backend): cover webhook ingress URL route (#11747 ) ### Changes 🏗️ - Add a unit test to verify webhook ingress URL generation matches the FastAPI route. ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] poetry run pytest backend/integrations/webhooks/utils_test.py --confcutdir=backend/integrations/webhooks #### For configuration changes: - [x] .env.default is updated or already compatible with my changes - [x] docker-compose.yml is updated or already compatible with my changes - [x] I have included a list of my configuration changes in the PR description (under Changes) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * Tests * Added a unit test that validates webhook ingress URL generation matches the application's resolved route (scheme, host, and path) for provider-specific webhook endpoints, improving confidence in routing behavior and helping prevent regressions. <sub>✏️ Tip: You can customize this high-level summary in your review settings.</sub> <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Reinier van der Leer <pwuts@agpt.co>	2026-02-01 20:29:15 +00:00
Otto	7e37de8e30	fix: Include graph schemas for marketplace agents in Agent Generator (#11920 ) ## Problem When marketplace agents are included in the `library_agents` payload sent to the Agent Generator service, they were missing required fields (`graph_id`, `graph_version`, `input_schema`, `output_schema`). This caused Pydantic validation to fail with HTTP 422 Unprocessable Entity. Root cause: The `MarketplaceAgentSummary` TypedDict had a different shape than `LibraryAgentInfo` expected by the Agent Generator: - Agent Generator expects: `graph_id`, `graph_version`, `name`, `description`, `input_schema`, `output_schema` - MarketplaceAgentSummary had: `name`, `description`, `sub_heading`, `creator`, `is_marketplace_agent` ## Solution 1. Add `agent_graph_id` to `StoreAgent` model - The field was already in the database view but not exposed 2. Include `agentGraphId` in hybrid search SQL query - Carry the field through the search CTEs 3. Update `search_marketplace_agents_for_generation()` - Now fetches full graph schemas using `get_graph()` and returns `LibraryAgentSummary` (same type as library agents) 4. Update deduplication logic - Use `graph_id` instead of name for more accurate deduplication ## Changes - `backend/api/features/store/model.py`: Add optional `agent_graph_id` field to `StoreAgent` - `backend/api/features/store/hybrid_search.py`: Include `agentGraphId` in SQL query columns - `backend/api/features/store/db.py`: Map `agentGraphId` when creating `StoreAgent` objects - `backend/api/features/chat/tools/agent_generator/core.py`: Update `search_marketplace_agents_for_generation()` to fetch and include full graph schemas ## Testing - [ ] Agent creation on dev with marketplace agents in context - [ ] Verify no 422 errors from Agent Generator - [ ] Verify marketplace agents can be used as sub-agents Fixes: SECRT-1817 --------- Co-authored-by: majdyz <majdyz@users.noreply.github.com> Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>	2026-01-31 19:17:36 +00:00
Otto	2abbb7fbc8	hotfix(backend): use discriminator for credential matching in run_block (#11908 ) Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com> Co-authored-by: Nicholas Tindle <ntindle@users.noreply.github.com> Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-30 21:50:21 -06:00
Nicholas Tindle	05b60db554	fix(backend/chat): Include input schema in discovery and validate unknown fields (#11916 ) Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-30 21:00:43 -06:00
Zamil Majdy	18a1661fa3	feat: add library agent fetching with two-phase search for sub-agent support (#11889 ) ## Context When users ask the chat to create agents, they may want to compose workflows that reuse their existing agents as sub-agents. For this to work, the Agent Generator service needs to know what agents the user has available. Challenge: Users can have large libraries with many agents. Fetching all of them would be slow and provide too much context to the LLM. ## Solution This PR implements search-based library agent fetching with a two-phase search strategy: 1. Phase 1 (Initial Search): When the user describes their goal, we search for relevant library agents using the goal as the search query 2. Phase 2 (Step-Based Enrichment): After the goal is decomposed into steps, we extract keywords from those steps and search for additional relevant agents This ensures we find agents that are relevant to both the high-level goal AND the specific steps identified. ### Example Flow ``` User goal: "Create an agent that fetches weather and sends a summary email" Phase 1: Search for "weather email summary" → finds "Weather Fetcher" agent Phase 2: After decomposition identifies steps like "send email notification" → searches "send email notification" → finds "Gmail Sender" agent ``` ### Changes Library Agent Fetching: - `get_library_agents_for_generation()` - Search-based fetching from user's library - `search_marketplace_agents_for_generation()` - Search public marketplace - `get_all_relevant_agents_for_generation()` - Combines both with deduplication Two-Phase Search: - `extract_search_terms_from_steps()` - Extracts keywords from decomposed steps - `enrich_library_agents_from_steps()` - Searches for additional agents based on steps - Integrated into `create_agent.py` as "Step 1.5" after goal decomposition Type Safety: - Added `TypedDict` definitions: `LibraryAgentSummary`, `MarketplaceAgentSummary`, `DecompositionStep`, `DecompositionResult` ### Design Decisions - 
Search-based, not fetch-all: Scalable for large libraries - Library agents prioritized: They have full schemas; marketplace agents have basic info only - Deduplication by name and graph_id: Prevents duplicates across searches - Graceful degradation: Failures don't block agent generation - Limited to 3 search terms: Avoids excessive API calls during enrichment ## Related PR - Agent Generator: https://github.com/Significant-Gravitas/AutoGPT-Agent-Generator/pull/103 ## Test plan - [x] `test_library_agents.py` - 19 tests covering all new functions - [x] `test_service.py` - 4 tests for library_agents passthrough - [ ] Integration test: Create agent with library sub-agent composition	2026-01-31 00:18:21 +00:00
Reinier van der Leer	350ad3591b	fix(backend/chat): Filter credentials for graph execution by scopes (#11881 ) [SECRT-1842: run_agent tool does not correctly use credentials - agents fail with insufficient auth scopes](https://linear.app/autogpt/issue/SECRT-1842) ### Changes 🏗️ - Include scopes in credentials filter in `backend.api.features.chat.tools.utils.match_user_credentials_to_graph` ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - CI must pass - It's broken now and a simple change so we'll test in the dev deployment	2026-01-30 11:01:51 +00:00
Bently	de0ec3d388	chore(llm): remove deprecated Claude 3.7 Sonnet model with migration and defensive handling (#11841 ) ## Summary Remove `claude-3-7-sonnet-20250219` from LLM model definitions ahead of Anthropic's API retirement, with comprehensive migration and defensive error handling. ## Background Anthropic is retiring Claude 3.7 Sonnet (`claude-3-7-sonnet-20250219`) on February 19, 2026 at 9:00 AM PT. This PR removes the model from the platform and migrates existing users to prevent service interruptions. ## Changes ### Code Changes - Remove `CLAUDE_3_7_SONNET` enum member from `LlmModel` in `llm.py` - Remove corresponding `ModelMetadata` entry - Remove `CLAUDE_3_7_SONNET` from `StagehandRecommendedLlmModel` enum - Remove `CLAUDE_3_7_SONNET` from block cost config - Add `CLAUDE_4_5_SONNET` to `StagehandRecommendedLlmModel` enum - Update Stagehand block defaults from `CLAUDE_3_7_SONNET` to `CLAUDE_4_5_SONNET` (staying in Claude family) - Add defensive error handling in `CredentialsFieldInfo.discriminate()` for deprecated model values ### Database Migration - Adds migration `20260126120000_migrate_claude_3_7_to_4_5_sonnet` - Migrates `AgentNode.constantInput` model references - Migrates `AgentNodeExecutionInputOutput.data` preset overrides ### Documentation - Updated `docs/integrations/block-integrations/llm.md` to remove deprecated model - Updated `docs/integrations/block-integrations/stagehand/blocks.md` to remove deprecated model and add Claude 4.5 Sonnet ## Notes - Agent JSON files in `autogpt_platform/backend/agents/` still reference this model in their provider mappings. These are auto-generated and should be regenerated separately. ## Testing - [ ] Verify LLM block still functions with remaining models - [ ] Confirm no import errors in affected files - [ ] Verify migration runs successfully - [ ] Verify deprecated model gives helpful error message instead of KeyError	2026-01-30 08:40:55 +00:00
Otto	582c6cad36	fix(e2e): Make E2E test data deterministic and fix flaky tests (#11890 ) ## Summary Fixes flaky E2E marketplace and library tests that were causing PRs to be removed from the merge queue. ## Root Cause 1. Test data was probabilistic - `e2e_test_data.py` used random chances (40% approve, then 20-50% feature), which could result in 0 featured agents 2. Library pagination threshold wrong - Checked `>= 10`, but page size is 20 3. Fixed timeouts - Used `waitForTimeout(2000)` / `waitForTimeout(10000)` instead of proper waits ## Changes ### Backend (`e2e_test_data.py`) - Add guaranteed minimums: 8 featured agents, 5 featured creators, 10 top agents - First N submissions are deterministically approved and featured - Increase agents per user from 15 → 25 (for pagination with page_size=20) - Fix library agent creation to use constants instead of hardcoded `10` ### Frontend Tests - `library.spec.ts`: Fix pagination threshold to `PAGE_SIZE` (20) - `library.page.ts`: Replace 2s timeout with `networkidle` + `waitForFunction` - `marketplace.page.ts`: Add `networkidle` wait, 30s waits in `getFirst*` methods - `marketplace.spec.ts`: Replace 10s timeout with `waitForFunction` - `marketplace-creator.spec.ts`: Add `networkidle` + element waits ## Related - Closes SECRT-1848, SECRT-1849 - Should unblock #11841 and other PRs in merge queue --------- Co-authored-by: Ubbe <hi@ubbe.dev>	2026-01-30 05:12:35 +00:00
Zamil Majdy	b2eb4831bd	feat(chat): improve agent generator error propagation (#11884 ) ## Summary - Add helper functions in `service.py` to create standardized error responses with `error_type` classification - Update service functions to return error dicts instead of `None`, preserving error details from the Agent Generator microservice - Update `core.py` to pass through error responses properly - Update `create_agent.py` to handle error responses with user-friendly messages based on error type ## Error Types Now Propagated \| Error Type \| Description \| User Message \| \|------------\|-------------\|--------------\| \| `llm_parse_error` \| LLM returned unparseable response \| "The AI had trouble understanding this request" \| \| `llm_timeout` / `timeout` \| Request timed out \| "The request took too long" \| \| `llm_rate_limit` / `rate_limit` \| Rate limited \| "The service is currently busy" \| \| `validation_error` \| Agent validation failed \| "The generated agent failed validation" \| \| `connection_error` \| Could not connect to Agent Generator \| Generic error message \| \| `http_error` \| HTTP error from Agent Generator \| Generic error message \| \| `unknown` \| Unclassified error \| Generic error message \| ## Motivation This enables better debugging for issues like SECRT-1817 where decomposition failed due to transient LLM errors but the root cause was unclear in the logs. Now: 1. Error details from the Agent Generator microservice are preserved 2. Users get more helpful error messages based on error type 3. Debugging is easier with `error_type` in response details ## Related PR - Agent Generator side: https://github.com/Significant-Gravitas/AutoGPT-Agent-Generator/pull/102 ## Test Plan - [ ] Test decomposition with various error scenarios (timeout, parse error) - [ ] Verify user-friendly messages are shown based on error type - [ ] Check that error details are logged properly	2026-01-29 19:53:40 +00:00
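The error-type table in #11884 above amounts to a lookup from `error_type` to a user-facing message. A minimal sketch, assuming a hypothetical `user_message_for()` helper and a made-up generic fallback string (the commit message only says "Generic error message"):

```python
# Messages for the classified error types are quoted from the table above.
USER_MESSAGES = {
    "llm_parse_error": "The AI had trouble understanding this request",
    "llm_timeout": "The request took too long",
    "timeout": "The request took too long",
    "llm_rate_limit": "The service is currently busy",
    "rate_limit": "The service is currently busy",
    "validation_error": "The generated agent failed validation",
}

# Assumption: placeholder wording for connection_error, http_error and unknown.
GENERIC_MESSAGE = "Something went wrong while generating the agent"


def user_message_for(error: dict) -> str:
    """Pick a user-friendly message based on the error's `error_type`."""
    return USER_MESSAGES.get(error.get("error_type", "unknown"), GENERIC_MESSAGE)
```

Unlisted or missing error types fall through to the generic message, which mirrors the last three rows of the table.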
Reinier van der Leer	4cd5da678d	refactor(claude): Split `autogpt_platform/CLAUDE.md` into project-specific files (#11788 ) Split `autogpt_platform/CLAUDE.md` into project-specific files, to make the scope of the instructions clearer. Also, some minor improvements: - Change references to other Markdown files to @file/path.md syntax that Claude recognizes - Update ambiguous/incorrect/outdated instructions - Remove trailing slashes - Fix broken file path references in other docs (including comments)	2026-01-29 17:33:02 +00:00
Nicholas Tindle	7668c17d9c	feat(platform): add User Workspace for persistent CoPilot file storage (#11867 ) Implements persistent User Workspace storage for CoPilot, enabling blocks to save and retrieve files across sessions. Files are stored in session-scoped virtual paths (`/sessions/{session_id}/`). Fixes SECRT-1833 ### Changes 🏗️ Database & Storage: - Add `UserWorkspace` and `UserWorkspaceFile` Prisma models - Implement `WorkspaceStorageBackend` abstraction (GCS for cloud, local filesystem for self-hosted) - Add `workspace_id` and `session_id` fields to `ExecutionContext` Backend API: - Add REST endpoints: `GET/POST /api/workspace/files`, `GET/DELETE /api/workspace/files/{id}`, `GET /api/workspace/files/{id}/download` - Add CoPilot tools: `list_workspace_files`, `read_workspace_file`, `write_workspace_file` - Integrate workspace storage into `store_media_file()` - returns `workspace://file-id` references Block Updates: - Refactor all file-handling blocks to use unified `ExecutionContext` parameter - Update media-generating blocks to persist outputs to workspace (AIImageGenerator, AIImageCustomizer, FluxKontext, TalkingHead, FAL video, Bannerbear, etc.) 
Frontend: - Render `workspace://` image references in chat via proxy endpoint - Add "AI cannot see this image" overlay indicator CoPilot Context Mapping: - Session = Agent (graph_id) = Run (graph_exec_id) - Files scoped to `/sessions/{session_id}/` ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [ ] I have tested my changes according to the test plan: - [ ] Create CoPilot session, generate image with AIImageGeneratorBlock - [ ] Verify image returns `workspace://file-id` (not base64) - [ ] Verify image renders in chat with visibility indicator - [ ] Verify workspace files persist across sessions - [ ] Test list/read/write workspace files via CoPilot tools - [ ] Test local storage backend for self-hosted deployments #### For configuration changes: - [x] `.env.default` is updated or already compatible with my changes - [x] `docker-compose.yml` is updated or already compatible with my changes - [x] I have included a list of my configuration changes in the PR description (under Changes) 🤖 Generated with [Claude Code](https://claude.ai/code) <!-- CURSOR_SUMMARY --> --- > [!NOTE] > Medium Risk > Introduces a new persistent file-storage surface area (DB tables, storage backends, download API, and chat tools) and rewires `store_media_file()`/block execution context across many blocks, so regressions could impact file handling, access control, or storage costs. > > Overview > Adds a persistent per-user Workspace (new `UserWorkspace`/`UserWorkspaceFile` models plus `WorkspaceManager` + `WorkspaceStorageBackend` with GCS/local implementations) and wires it into the API via a new `/api/workspace/files/{file_id}/download` route (including header-sanitized `Content-Disposition`) and shutdown lifecycle hooks. 
> > Extends `ExecutionContext` to carry execution identity + `workspace_id`/`session_id`, updates executor tooling to clone node-specific contexts, and updates `run_block` (CoPilot) to create a session-scoped workspace and synthetic graph/run/node IDs. > > Refactors `store_media_file()` to require `execution_context` + `return_format` and to support `workspace://` references; migrates many media/file-handling blocks and related tests to the new API and to persist generated media as `workspace://...` (or fall back to data URIs outside CoPilot), and adds CoPilot chat tools for listing/reading/writing/deleting workspace files with safeguards against context bloat. > > <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit `6abc70f793`. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup> <!-- /CURSOR_SUMMARY --> --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: Reinier van der Leer <pwuts@agpt.co>	2026-01-29 05:49:47 +00:00
Nicholas Tindle	e0dfae5732	fix(platform): evaluate chat flag after auth for correct redirect (#11873 ) Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-28 14:58:02 -06:00
Zamil Majdy	d855f79874	fix(platform): reduce Sentry alert spam for expected errors (#11872 ) ## Summary - Add `InvalidInputError` for validation errors (search term too long, invalid pagination) - returns 400 instead of 500 - Remove redundant try/catch blocks in library routes - global exception handlers already handle `ValueError`→400 and `NotFoundError`→404 - Aggregate embedding backfill errors and log once at the end instead of per content type to prevent Sentry issue spam ## Test plan - [x] Verify validation errors (search term >100 chars) return 400 Bad Request - [x] Verify NotFoundError still returns 404 - [x] Verify embedding errors are logged once at the end with aggregated counts Fixes AUTOGPT-SERVER-7K5, BUILDER-6NC --------- Co-authored-by: Swifty <craigswift13@gmail.com>	2026-01-29 01:28:27 +07:00
Nicholas Tindle	0953983944	feat(platform): disable onboarding redirects and add $5 signup bonus (#11862 ) Disable automatic onboarding redirects on signup/login while keeping the checklist/wallet functional. Users now receive $5 (500 credits) on their first visit to /copilot. ### Changes 🏗️ - Frontend: `shouldShowOnboarding()` now returns `false`, disabling auto-redirects to `/onboarding` - Backend: Added `VISIT_COPILOT` onboarding step with 500 credit ($5) reward - Frontend: Copilot page automatically completes `VISIT_COPILOT` step on mount - Database: Migration to add `VISIT_COPILOT` to `OnboardingStep` enum NOTE: `/onboarding/1-welcome` now redirects to `/library`, because `shouldShowOnboarding()` always returns `false`. Users land directly on `/copilot` after signup/login and receive $5 invisibly (not shown in checklist UI). ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] New user signup (email/password) → lands on `/copilot`, wallet shows 500 credits - [x] Verified credits are only granted once (idempotent via onboarding reward mechanism) - [x] Existing user login (already granted flag set) → lands on `/copilot`, no duplicate credits - [x] Checklist/wallet remains functional #### For configuration changes: - [x] `.env.default` is updated or already compatible with my changes - [x] `docker-compose.yml` is updated or already compatible with my changes - [x] I have included a list of my configuration changes in the PR description (under Changes) No configuration changes required. --- OPEN-2967 🤖 Generated with [Claude Code](https://claude.ai/code) <!-- CURSOR_SUMMARY --> --- > [!NOTE] > Introduces a new onboarding step and adjusts onboarding flow. 
> > - Adds `VISIT_COPILOT` onboarding step (+500 credits) with DB enum migration and API/type updates > - Copilot page auto-completes `VISIT_COPILOT` on mount to grant the welcome bonus > - Changes `/onboarding/enabled` to require user context and return `false` when `CHAT` feature is enabled (skips legacy onboarding) > - Wallet now refreshes credits on any onboarding `step_completed` notification; confetti limited to visible tasks > - Test flows updated to accept redirects to `copilot`/`library` and verify authenticated state > > <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit `ec5a5a4dfd`. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup> <!-- /CURSOR_SUMMARY --> --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com> Co-authored-by: Nicholas Tindle <ntindle@users.noreply.github.com>	2026-01-28 07:22:46 +00:00
Zamil Majdy	0058cd3ba6	fix(frontend): auto-poll for long-running tool completion (#11866 ) ## Summary Fixes the issue where the "Creating Agent" spinner doesn't auto-update when agent generation completes - user had to refresh the browser. Changes: - Frontend polling: Add `onOperationStarted` callback to trigger polling when `operation_started` is received via SSE - Polling backoff: 2s, 4s, 6s, 8s... up to 30s max - Message deduplication: Use content-based keys (role + content) instead of timestamps to prevent duplicate messages - Message ordering: Preserve server message order instead of timestamp-based sorting - Debug cleanup: Remove verbose console.log/console.info statements ## Test plan - [ ] Start agent generation in copilot - [ ] Verify "Creating Agent" spinner appears - [ ] Wait for completion (2-5 min) WITHOUT refreshing - [ ] Verify agent carousel appears automatically when done - [ ] Verify no duplicate messages in chat - [ ] Verify message order is correct (user → assistant → tool_call → tool_response)	2026-01-28 10:03:21 +07:00
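The polling schedule and content-based deduplication described in #11866 above can be illustrated as follows. The real change lives in the frontend; this Python sketch only mirrors the stated behavior (2s, 4s, 6s, 8s... capped at 30s, and keys built from role + content), and the function names are hypothetical.

```python
def polling_delay(attempt: int, step: float = 2.0, max_delay: float = 30.0) -> float:
    """Delay in seconds before poll number `attempt` (1-based), capped at max_delay."""
    return min(step * attempt, max_delay)


def dedupe_messages(messages: list) -> list:
    """Drop repeated messages, keyed on (role, content), keeping server order."""
    seen = set()
    unique = []
    for message in messages:
        key = (message["role"], message["content"])
        if key not in seen:
            seen.add(key)
            unique.append(message)
    return unique
```

Keying on content rather than timestamps means the same message fetched twice (once via SSE, once via polling) collapses to one entry even if the two copies carry different timestamps.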
Nicholas Tindle	ea035224bc	feat(copilot): Increase max_agent_runs and max_agent_schedules (#11865 ) <!-- Clearly explain the need for these changes: --> Config change to increase the maximum number of times an agent can run in a chat and the maximum number of schedules Copilot can create in one chat. <!-- CURSOR_SUMMARY --> --- > [!NOTE] > Increases per-chat operational limits for Copilot. > > - Bumps `max_agent_runs` default from `3` to `30` in `ChatConfig` > - Bumps `max_agent_schedules` default from `3` to `30` in `ChatConfig` > > <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit `93cbae6d27`. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup> <!-- /CURSOR_SUMMARY -->	2026-01-28 01:08:02 +00:00
Bently	67405f7eb9	fix(copilot): ensure tool_call/tool_response pairs stay intact during context compaction (#11863 ) ## Summary Fixes context compaction breaking tool_call/tool_response pairs, causing API validation errors. ## Problem When context compaction slices messages with `messages[-KEEP_RECENT:]`, a naive slice can separate an assistant message containing `tool_calls` from its corresponding tool response messages. This causes API validation errors like: ``` messages.0.content.1: unexpected 'tool_use_id' found in 'tool_result' blocks: orphan_12345. Each 'tool_result' block must have a corresponding 'tool_use' block in the previous message. ``` ## Solution Added `_ensure_tool_pairs_intact()` helper function that: 1. Detects orphan tool responses in a slice (tool messages whose `tool_call_id` has no matching assistant message) 2. Extends the slice backwards to include the missing assistant messages 3. Falls back to removing orphan tool responses if the assistant cannot be found (edge case) Applied this safeguard to: - The initial `KEEP_RECENT` slice (line ~990) - The progressive fallback slices when still over token limit (line ~1079) ## Testing - Syntax validated with `python -m py_compile` - Logic reviewed for correctness ## Linear Fixes SECRT-1839 --- Debugged by Toran & Orion in #agpt Discord	2026-01-28 00:21:54 +00:00
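The three steps attributed to `_ensure_tool_pairs_intact()` in #11863 above (detect orphans, extend the slice backwards, fall back to dropping orphans) could be sketched like this. It is an illustration under assumptions, not the project's implementation: messages are assumed to be OpenAI-style dicts where assistant messages carry `tool_calls` with an `id` and tool messages carry `tool_call_id`, and the signature is hypothetical.

```python
def _call_ids(message: dict) -> set:
    """IDs of the tool calls issued by an assistant message (empty otherwise)."""
    if message.get("role") != "assistant":
        return set()
    return {call["id"] for call in message.get("tool_calls") or []}


def ensure_tool_pairs_intact(all_messages: list, start: int) -> list:
    recent = all_messages[start:]

    # 1. Detect tool responses whose assistant message is outside the slice.
    available = set().union(*(_call_ids(m) for m in recent))
    orphans = {
        m.get("tool_call_id")
        for m in recent
        if m.get("role") == "tool" and m.get("tool_call_id") not in available
    }

    # 2. Walk backwards through the full conversation to pull in the
    #    assistant messages that issued the orphaned calls.
    new_start = start
    for index in range(start - 1, -1, -1):
        if not orphans:
            break
        ids = _call_ids(all_messages[index])
        if ids & orphans:
            orphans -= ids
            new_start = index

    # 3. Fallback: drop any tool response that still has no matching call.
    result = all_messages[new_start:]
    known = set().union(*(_call_ids(m) for m in result))
    return [
        m for m in result
        if m.get("role") != "tool" or m.get("tool_call_id") in known
    ]
```

Note that the search runs over the full conversation with an index relative to it; searching only the already-sliced list cannot find an assistant message that lies before the slice.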
Zamil Majdy	171ff6e776	feat(backend): persist long-running tool results to survive SSE disconnects (#11856 ) ## Summary Agent generation (`create_agent`, `edit_agent`) can take 1-5 minutes. Previously, if the user closed their browser tab during this time: 1. The SSE connection would die 2. The tool execution would be cancelled via `CancelledError` 3. The result would be lost - even if the agent-generator service completed successfully This PR ensures long-running tool operations survive SSE disconnections. ### Changes 🏗️ Backend: - base.py: Added `is_long_running` property to `BaseTool` for tools to opt-in to background execution - create_agent.py / edit_agent.py: Set `is_long_running = True` - models.py: Added `OperationStartedResponse`, `OperationPendingResponse`, `OperationInProgressResponse` types - service.py: Modified `_yield_tool_call()` to: - Check if tool is `is_long_running` - Save "pending" message to chat history immediately - Spawn background task that runs independently of SSE - Return `operation_started` immediately (don't wait) - Update chat history with result when background task completes - Track running operations for idempotency (prevents duplicate ops on refresh) - db.py: Added `update_tool_message_content()` to update pending messages - model.py: Added `invalidate_session_cache()` to clear Redis after background completion Frontend: - useChatMessage.ts: Added operation message types - helpers.ts: Handle `operation_started`, `operation_pending`, `operation_in_progress` response types - PendingOperationWidget: New component to display operation status with spinner - ChatMessage.tsx: Render `PendingOperationWidget` for operation messages ### How It Works ``` User Request → Save "pending" message → Spawn background task → Return immediately ↓ Task runs independently of SSE ↓ On completion: Update message in chat history ↓ User refreshes → Loads history → Sees result ``` ### User Experience 1. User requests agent creation 2. 
Sees "Agent creation started. You can close this tab - check your library in a few minutes." 3. Can close browser tab safely 4. When they return, chat shows the completed result (or error) ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] pyright passes (0 errors) - [x] TypeScript checks pass - [x] Formatters applied ### Test Plan 1. Start agent creation in copilot 2. Close browser tab immediately after seeing "operation_started" 3. Wait 2-3 minutes 4. Reopen chat 5. Verify: Chat history shows completion message and agent appears in library --------- Co-authored-by: Ubbe <hi@ubbe.dev>	2026-01-28 05:09:34 +07:00
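The flow described in #11856 above (save a "pending" message, spawn a background task, return immediately, update the history on completion, and refuse duplicate operations) can be sketched with plain `asyncio`. The in-memory `chat_history` and `running` dicts and the `start_long_running()` name are hypothetical stand-ins for the database and the real service code.

```python
import asyncio

chat_history: dict = {}  # message_id -> content (stand-in for the DB)
running: dict = {}       # message_id -> background task (idempotency guard)


async def start_long_running(message_id: str, tool) -> dict:
    """Start `tool` (an async callable) in the background and return at once."""
    if message_id in running:
        # A refresh or retry must not start the same operation twice.
        return {"type": "operation_in_progress"}

    chat_history[message_id] = "pending"

    async def runner() -> None:
        try:
            chat_history[message_id] = await tool()
        except Exception as exc:
            chat_history[message_id] = f"error: {exc}"
        finally:
            running.pop(message_id, None)

    # The task is referenced from `running`, so it keeps going even if the
    # caller (for example an SSE stream) goes away.
    running[message_id] = asyncio.create_task(runner())
    return {"type": "operation_started"}
```

Because the result is written to the history rather than returned to the caller, a user who reconnects later simply reloads the history and sees the outcome.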
Swifty	2134d777be	fix(backend): exclude disabled blocks from chat search and indexing (#11854 ) ## Summary Disabled blocks (e.g., webhook blocks without `platform_base_url` configured) were being indexed and returned in chat tool search results. This PR ensures they are properly filtered out. ### Changes 🏗️ - find_block.py: Skip disabled blocks when enriching search results - content_handlers.py: - Skip disabled blocks during embedding indexing - Update `get_stats()` to only count enabled blocks for accurate coverage metrics ### Why Blocks can be disabled for various reasons (missing OAuth config, no platform URL for webhooks, etc.). These blocks shouldn't appear in search results since users cannot use them. ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Verified disabled blocks are filtered from search results - [x] Verified disabled blocks are not indexed - [x] Verified stats accurately reflect enabled block count	2026-01-27 15:21:13 +00:00
Zamil Majdy	3e9d5d0d50	fix(backend): handle race condition in review processing gracefully (#11845 ) ## Summary - Fixes race condition when multiple concurrent requests try to process the same reviews (e.g., double-click, multiple browser tabs) - Previously the second request would fail with "Reviews not found, access denied, or not in WAITING status" - Now handles this gracefully by treating already-processed reviews with the same decision as success ## Changes - Added `get_reviews_by_node_exec_ids()` function that fetches reviews regardless of status - Modified `process_all_reviews_for_execution()` to handle already-processed reviews - Updated route to use idempotent validation ## Test plan - [x] Linter passes (`poetry run ruff check`) - [x] Type checker passes (`poetry run pyright`) - [x] Formatter passes (`poetry run format`) - [ ] Manual testing: double-click approve button should not cause errors Fixes AUTOGPT-SERVER-7HE	2026-01-27 21:43:31 +07:00
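The idempotent handling described in this commit can be illustrated with a small sketch. The `Review` class and `apply_decision` function are hypothetical and are not the repository's `process_all_reviews_for_execution()`; the sketch only shows the rule that a repeated request with the same decision counts as success.

```python
from dataclasses import dataclass


@dataclass
class Review:
    node_exec_id: str
    status: str  # "WAITING", "APPROVED", or "REJECTED" (assumed values)


def apply_decision(review: Review, decision: str) -> bool:
    """Apply a decision; return True if the review ends up in that state."""
    if review.status == "WAITING":
        review.status = decision
        return True
    if review.status == decision:
        # Already processed with the same decision (double-click, second
        # browser tab): treat as success instead of raising.
        return True
    raise ValueError(
        f"review {review.node_exec_id} already {review.status}, "
        f"cannot apply {decision}"
    )
```

In a real service the status check and update would need to happen atomically in the database; the in-memory version here ignores that.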
Swifty	fac10c422b	fix(backend): add SSE heartbeats to prevent tool execution timeouts (#11855 ) ## Summary Long-running chat tools (like `create_agent` and `edit_agent`) were timing out because no SSE data was sent during tool execution. GCP load balancers and proxies have idle connection timeouts (~60 seconds), and when the external Agent Generator service takes longer than this, the connection would drop. This PR adds SSE heartbeat comments during tool execution to keep connections alive. ### Changes 🏗️ - response_model.py: Added `StreamHeartbeat` response type that emits SSE comments (`: heartbeat\n\n`) - service.py: Modified `_yield_tool_call()` to: - Run tool execution in a background asyncio task - Yield heartbeat events every 15 seconds while waiting - Handle task failures with explicit error responses (no silent failures) - Handle cancellation gracefully - create_agent.py: Improved error messages with more context and details - edit_agent.py: Improved error messages with more context and details ### How It Works ``` Tool Call → Background Task Started │ ├── Every 15 seconds: yield `: heartbeat\n\n` (SSE comment) │ └── Task Complete → yield tool result OR error response ``` SSE comments (`: heartbeat\n\n`) are: - Ignored by SSE clients (don't trigger events) - Keep TCP connections alive through proxies/load balancers - Don't affect the AI SDK data protocol ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] All chat service tests pass (17 tests) - [x] Verified heartbeats are sent during long tool execution - [x] Verified errors are properly reported to frontend	2026-01-27 15:41:58 +01:00
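The heartbeat loop described in this commit can be sketched roughly as follows. This is an assumption-laden illustration rather than the actual `_yield_tool_call()`: `stream_with_heartbeats` is a hypothetical name, the result is emitted as a single `data:` line, and only the 15-second default comes from the commit message.

```python
import asyncio
from typing import AsyncIterator, Awaitable

# An SSE comment line: clients ignore it, but it keeps the connection busy.
HEARTBEAT = ": heartbeat\n\n"


async def stream_with_heartbeats(
    work: Awaitable[str], interval: float = 15.0
) -> AsyncIterator[str]:
    """Yield heartbeats while `work` runs, then its result or an error."""
    task = asyncio.ensure_future(work)
    try:
        while not task.done():
            # Wait up to `interval` seconds for the task to finish.
            done, _ = await asyncio.wait({task}, timeout=interval)
            if not done:
                yield HEARTBEAT
        try:
            payload = task.result()
        except Exception as exc:
            # Report failures explicitly instead of ending silently.
            payload = f"error: {exc}"
        yield f"data: {payload}\n\n"
    finally:
        # If the consumer goes away early, do not leave the task dangling.
        if not task.done():
            task.cancel()
```

Note that this version cancels the work when the stream is closed, which matches this commit but not the later background-persistence change in #11856.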
Bently	91c7896859	fix(backend): implement context window management for long chat sessions (#11848 ) ## Changes 🏗️ Implements automatic context window management to prevent chat failures when conversations exceed token limits. ### Problem - Issue: [SECRT-1800] Long chat conversations stop working when context grows beyond model limits (~113k tokens observed) - Root Cause: Chat service sends ALL messages to LLM without token-aware compression, eventually exceeding Claude Opus 4.5's 200k context window ### Solution Implements a sliding window with summarization strategy: 1. Monitors token count before sending to LLM (triggers at 120k tokens) 2. Keeps last 15 messages completely intact (preserves recent conversation flow) 3. Summarizes older messages using gpt-4o-mini (fast & cheap) 4. Rebuilds context: `[system_prompt] + [summary] + [recent_15_messages]` 5. Full history preserved in database (only compresses when sending to LLM) ### Changes Made - Added `_summarize_messages()` helper function to create concise summaries using gpt-4o-mini - Modified `_stream_chat_chunks()` to implement token counting and conditional summarization - Integrated existing `estimate_token_count()` utility for accurate token measurement - Added graceful fallback - continues with original messages if summarization fails ## Motivation and Context 🎯 Without context management, users with long chat sessions (250+ messages) experience: - Complete chat failure when hitting 200k token limit - Lost conversation context - Poor user experience This fix enables: - ✅ Unlimited conversation length - ✅ Transparent operation (no UX changes) - ✅ Preserved conversation quality (recent messages intact) - ✅ Cost-efficient (~$0.0001 per summarization) ## Testing 🧪 ### Expected Behavior - Conversations < 120k tokens: No change (normal operation) - Conversations > 120k tokens: - Log message: `Context summarized: {tokens} tokens, kept last 15 messages + summary` - Chat continues working smoothly - Recent context 
remains intact ### How to Verify 1. Start a chat session in copilot 2. Send 250-600 messages (or 50+ with large code blocks) 3. Check logs for "Context summarized:" message 4. Verify chat continues working without errors 5. Verify conversation quality remains good ## Checklist ✅ - [x] My code follows the style guidelines of this project - [x] I have performed a self-review of my own code - [x] I have commented my code, particularly in hard-to-understand areas - [x] My changes generate no new warnings - [x] I have tested my changes and verified they work as expected	2026-01-27 15:37:17 +01:00
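The sliding-window strategy described in this commit can be sketched as below. The function names, the character-based token estimate, and the assumption that the first message is the system prompt are all illustrative; the commit itself uses `estimate_token_count()` and a gpt-4o-mini call, neither of which is reproduced here.

```python
from typing import Callable

Message = dict[str, str]


def rough_token_count(messages: list[Message]) -> int:
    """Crude stand-in for a tokenizer: about four characters per token."""
    return sum(len(m["content"]) // 4 + 1 for m in messages)


def compress_context(
    messages: list[Message],
    summarize: Callable[[list[Message]], str],
    token_limit: int = 120_000,
    keep_recent: int = 15,
) -> list[Message]:
    """Return [system_prompt] + [summary] + [recent messages] when over limit."""
    if rough_token_count(messages) <= token_limit:
        return messages
    system, rest = messages[0], messages[1:]  # assumes messages[0] is the system prompt
    if len(rest) <= keep_recent:
        return messages
    older, recent = rest[:-keep_recent], rest[-keep_recent:]
    try:
        summary_text = summarize(older)
    except Exception:
        # Graceful fallback: continue with the original messages.
        return messages
    summary = {
        "role": "system",
        "content": f"Summary of earlier conversation: {summary_text}",
    }
    return [system, summary, *recent]
```

Only the list sent to the model is rebuilt; the input list is left untouched, which mirrors the commit's point that the full history stays in the database.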
Swifty	bab436231a	refactor(backend): remove Langfuse tracing from chat system (#11829 ) We are removing Langfuse tracing from the chat/copilot system in favor of using OpenRouter's broadcast feature, which keeps our codebase simpler. Langfuse prompt management is retained for fetching system prompts. ### Changes 🏗️ Removed Langfuse tracing: - Removed `@observe` decorators from all 11 chat tool files - Removed `langfuse.openai` wrapper (now using standard `openai` client) - Removed `start_as_current_observation` and `propagate_attributes` context managers from `service.py` - Removed `update_current_trace()`, `update_current_span()`, `span.update()` calls Retained Langfuse prompt management: - `langfuse.get_prompt()` for fetching system prompts - `_is_langfuse_configured()` check for prompt availability - Configuration for `langfuse_prompt_name` Files modified: - `backend/api/features/chat/service.py` - `backend/api/features/chat/tools/*.py` (11 tool files) ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Verified `poetry run format` passes - [x] Verified no `@observe` decorators remain in chat tools - [x] Verified Langfuse prompt fetching is still functional (code preserved)	2026-01-27 13:07:42 +01:00
Swifty	d5c0f5b2df	refactor(backend): remove page context from chat service (#11844 ) ### Background The chat service previously supported including page context (URL and content) in user messages. This functionality is being removed. ### Changes 🏗️ - Removed page context handling from `stream_chat_completion` in the chat service - User messages are now passed directly without URL/content context injection - Removed associated logging for page context ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Verify chat functionality works without page context - [x] Confirm no regressions in basic chat message handling	2026-01-26 16:00:48 +00:00
Swifty	75ecc4de92	fix(backend): enforce block disabled flag on execution endpoints (#11839 ) ## Summary This PR adds security checks to prevent execution of disabled blocks across all block execution endpoints. - Add `disabled` flag check to main web API endpoint (`/api/blocks/{block_id}/execute`) - Add `disabled` flag check to external API endpoint (`/api/blocks/{block_id}/execute`) - Add `disabled` flag check to chat tool block execution Previously, block execution endpoints only checked if a block existed but did not verify the `disabled` flag, allowing any authenticated user to execute disabled blocks. ## Test plan - [x] Verify disabled blocks return 403 Forbidden on main API endpoint - [x] Verify disabled blocks return 403 Forbidden on external API endpoint - [x] Verify disabled blocks return error response in chat tool execution - [x] Verify enabled blocks continue to execute normally	2026-01-26 13:56:24 +00:00
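The check added by this commit amounts to verifying the `disabled` flag after the existence check and before execution. A minimal, framework-free sketch follows; `Block`, `BlockDisabledError`, and `get_executable_block` are hypothetical names, and the mapping to a 403 response is left to whatever web framework wraps it.

```python
from dataclasses import dataclass


@dataclass
class Block:
    id: str
    disabled: bool = False


class BlockDisabledError(Exception):
    """Hypothetical error that an API layer could map to 403 Forbidden."""

    status_code = 403


def get_executable_block(blocks: dict[str, Block], block_id: str) -> Block:
    """Look up a block and refuse to return it if it is disabled."""
    block = blocks.get(block_id)
    if block is None:
        raise KeyError(block_id)
    if block.disabled:
        # Existence alone is not enough: disabled blocks must not run.
        raise BlockDisabledError(f"block {block_id} is disabled")
    return block
```

Routing every execution path through one lookup helper like this keeps the web API, external API, and chat tool from drifting apart.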
Swifty	cfb7dc5aca	feat(backend): Add PostHog analytics and OpenRouter tracing to chat system (#11828 ) Adds analytics tracking to the chat copilot system for better observability of user interactions and agent operations. ### Changes 🏗️ PostHog Analytics Integration: - Added `posthog` dependency (v7.6.0) to track chat events - Created new tracking module (`backend/api/features/chat/tracking.py`) with events: - `chat_message_sent` - When a user sends a message - `chat_tool_called` - When a tool is called (includes tool name) - `chat_agent_run_success` - When an agent runs successfully - `chat_agent_scheduled` - When an agent is scheduled - `chat_trigger_setup` - When a trigger is set up - Added PostHog configuration to settings: - `POSTHOG_API_KEY` - API key for PostHog - `POSTHOG_HOST` - PostHog host URL (defaults to `https://us.i.posthog.com`) OpenRouter Tracing: - Added `user` and `session_id` fields to chat completion API calls for OpenRouter tracing - Added `posthogDistinctId` and `posthogProperties` (with environment) to API calls Files Changed: - `backend/api/features/chat/tracking.py` - New PostHog tracking module - `backend/api/features/chat/service.py` - Added user message tracking and OpenRouter tracing - `backend/api/features/chat/tools/__init__.py` - Added tool call tracking - `backend/api/features/chat/tools/run_agent.py` - Added agent run/schedule tracking - `backend/util/settings.py` - Added PostHog configuration fields - `pyproject.toml` - Added posthog dependency ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Verified code passes linting and formatting - [x] Verified PostHog client initializes correctly when API key is provided - [x] Verified tracking is gracefully skipped when PostHog is not configured #### For configuration changes: - [ ] `.env.default` is updated or already compatible with my changes 
- [x] `docker-compose.yml` is updated or already compatible with my changes - [x] I have included a list of my configuration changes in the PR description (under Changes) New environment variables (optional): - `POSTHOG_API_KEY` - PostHog project API key - `POSTHOG_HOST` - PostHog host URL (optional, defaults to US cloud)	2026-01-26 12:26:15 +00:00


1076 Commits