- Fix executionStatus.value crash when status is a string not enum
- Update snapshot with new LibraryAgent fields (execution_count, etc.)
- Update test mocks to include include_executions=True parameter
- Fix potential KeyError/AttributeError in agent deduplication by using
.get() with defaults instead of direct dict access
Fixes AUTOGPT-SERVER-7N6 - Some DB records have settings stored as
a JSON string instead of a dict, causing GraphSettings.model_validate
to fail with 'str' object has no attribute 'value'.
- Add RecentExecution model with status, correctness_score, and activity_summary
- Expose recent_executions in LibraryAgent for quality assessment
- Always pass error_details to user-facing messages for better debugging
- Update ExecutionSummary TypedDict for sub-agent composition
- Re-raise DatabaseError in get_library_agent_by_id to not swallow DB failures
- Add error details sanitization to strip sensitive info (paths, URLs, etc.)
- Clean up redundant inline comments in edit_agent.py
The ERROR status filter was too aggressive - a single failed execution
would exclude an agent from sub-agent composition, even if it had many
successful runs. Removed the filter for now.
Future enhancement: Add quality filtering based on execution success rate
or correctness_score (stored in AgentGraphExecution stats) rather than
the binary ERROR status.
Filter out library agents with ERROR status when searching for
sub-agent composition candidates. This prevents recommending broken
or draft agents that have failed executions.
Add configurable include_library parameter (default True) to allow
controlling whether user's library agents are included in the search
results for sub-agent composition.
[SECRT-1842: run_agent tool does not correctly use credentials - agents
fail with insufficient auth
scopes](https://linear.app/autogpt/issue/SECRT-1842)
### Changes 🏗️
- Include scopes in credentials filter in
`backend.api.features.chat.tools.utils.match_user_credentials_to_graph`
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- CI must pass
- It's broken now and a simple change so we'll test in the dev
deployment
## Summary
Remove `claude-3-7-sonnet-20250219` from LLM model definitions ahead of
Anthropic's API retirement, with comprehensive migration and defensive
error handling.
## Background
Anthropic is retiring Claude 3.7 Sonnet (`claude-3-7-sonnet-20250219`)
on **February 19, 2026 at 9:00 AM PT**. This PR removes the model from
the platform and migrates existing users to prevent service
interruptions.
## Changes
### Code Changes
- Remove `CLAUDE_3_7_SONNET` enum member from `LlmModel` in `llm.py`
- Remove corresponding `ModelMetadata` entry
- Remove `CLAUDE_3_7_SONNET` from `StagehandRecommendedLlmModel` enum
- Remove `CLAUDE_3_7_SONNET` from block cost config
- Add `CLAUDE_4_5_SONNET` to `StagehandRecommendedLlmModel` enum
- Update Stagehand block defaults from `CLAUDE_3_7_SONNET` to
`CLAUDE_4_5_SONNET` (staying in Claude family)
- Add defensive error handling in `CredentialsFieldInfo.discriminate()`
for deprecated model values
### Database Migration
- Adds migration `20260126120000_migrate_claude_3_7_to_4_5_sonnet`
- Migrates `AgentNode.constantInput` model references
- Migrates `AgentNodeExecutionInputOutput.data` preset overrides
### Documentation
- Updated `docs/integrations/block-integrations/llm.md` to remove
deprecated model
- Updated `docs/integrations/block-integrations/stagehand/blocks.md` to
remove deprecated model and add Claude 4.5 Sonnet
## Notes
- Agent JSON files in `autogpt_platform/backend/agents/` still reference
this model in their provider mappings. These are auto-generated and
should be regenerated separately.
## Testing
- [ ] Verify LLM block still functions with remaining models
- [ ] Confirm no import errors in affected files
- [ ] Verify migration runs successfully
- [ ] Verify deprecated model gives helpful error message instead of
KeyError
The "View in Library" link was returning 404 because the path was
missing the /agents/ segment. Fixed in both create_agent.py and
edit_agent.py to match the correct route used elsewhere.
- Add null checks for .lower() on agent names that could be None
- Add isinstance guard for non-string step values in extract_search_terms
- Re-raise DatabaseError instead of swallowing it in agent_search
- Remove inline comments per style guidelines
- Add LLM continuation call when background tool execution fails with
exception (previously users saw no explanation for errors)
- Improve validation error messages with more helpful guidance
- Add error_details parameter to include technical context in error
responses when needed
- Update create_agent to pass error details for validation failures
## Summary
Fixes flaky E2E marketplace and library tests that were causing PRs to
be removed from the merge queue.
## Root Cause
1. **Test data was probabilistic** - `e2e_test_data.py` used random
chances (40% approve, then 20-50% feature), which could result in 0
featured agents
2. **Library pagination threshold wrong** - Checked `>= 10`, but page
size is 20
3. **Fixed timeouts** - Used `waitForTimeout(2000)` /
`waitForTimeout(10000)` instead of proper waits
## Changes
### Backend (`e2e_test_data.py`)
- Add guaranteed minimums: 8 featured agents, 5 featured creators, 10
top agents
- First N submissions are deterministically approved and featured
- Increase agents per user from 15 → 25 (for pagination with
page_size=20)
- Fix library agent creation to use constants instead of hardcoded `10`
### Frontend Tests
- `library.spec.ts`: Fix pagination threshold to `PAGE_SIZE` (20)
- `library.page.ts`: Replace 2s timeout with `networkidle` +
`waitForFunction`
- `marketplace.page.ts`: Add `networkidle` wait, 30s waits in
`getFirst*` methods
- `marketplace.spec.ts`: Replace 10s timeout with `waitForFunction`
- `marketplace-creator.spec.ts`: Add `networkidle` + element waits
## Related
- Closes SECRT-1848, SECRT-1849
- Should unblock #11841 and other PRs in merge queue
---------
Co-authored-by: Ubbe <hi@ubbe.dev>
Previously, searching for "flight price drop alert" required that exact
phrase to be in the agent name/description. Now it splits into individual
words and matches agents containing ANY of: flight, price, drop, alert.
This fixes the issue where "flight price tracker" wasn't found when
searching for "flight price drop alert" even though they share keywords.
Added logging to help diagnose library search issues:
- Log the query and user_id when tool is called
- Log the number of results returned from database
When users paste a library URL or agent UUID, the find_library_agent
tool now does direct ID lookup first (both by graph_id and library
agent ID) before falling back to text search.
This fixes the issue where searching by UUID would fail because
it was only doing text matching on agent names/descriptions.
When users paste library URLs (e.g., /library/agents/{id}), the ID is
the LibraryAgent primary key, not the graph_id. The previous code only
looked up by graph_id, causing "agent not found" errors.
Now get_library_agent_by_id() tries both lookup strategies:
1. First by graph_id (AgentGraph primary key)
2. Then by library agent ID (LibraryAgent primary key)
This fixes the issue where users couldn't reference agents by pasting
their library URLs in chat.
When users mention agents by UUID in their goal description, we now:
1. Extract UUID v4 patterns from the search_query text
2. Fetch those agents directly by graph_id
3. Include them in the library_agents list for the LLM
This ensures explicitly referenced agents are always available to the
Agent Generator, even if text search wouldn't find them.
Added:
- extract_uuids_from_text(): extracts UUID v4 patterns from text
- get_library_agent_by_graph_id(): fetches a single agent by graph_id
- Integration in get_all_relevant_agents_for_generation()
- Move store_db, get_graph, get_graph_all_versions imports to top-level
- Catch specific NotFoundError instead of generic Exception
- Cleaner code organization following standard Python conventions
get_agent_as_json claimed to accept both graph IDs and library agent IDs
but only tried direct graph lookup. When a library agent ID was passed,
the function would return None (agent_not_found error).
Now the function:
1. First tries direct graph lookup with the provided ID
2. If not found, resolves the ID as a library agent ID to get the graph_id
3. Then fetches the graph using the resolved graph_id
- Add TypedDict types for agent summaries (LibraryAgentSummary, MarketplaceAgentSummary, DecompositionResult)
- Add extract_search_terms_from_steps() to extract keywords from decomposed instructions
- Add enrich_library_agents_from_steps() for two-phase search after decomposition
- Integrate enrichment into create_agent.py flow
- Add comprehensive tests for new functionality
- Add try/except error handling to get_library_agents_for_generation
for graceful degradation (consistent with marketplace search)
- Add null checks when deduplicating agents by name to prevent
AttributeError if agent name is None
- Use actual graph ID from current_agent in edit_agent.py to properly
exclude the agent being edited (agent_id might be a library agent ID)
- Add get_library_agents_for_generation() with search_term support
- Add search_marketplace_agents_for_generation() for marketplace search
- Add get_all_relevant_agents_for_generation() combining both sources
- Update service.py to pass library_agents in all requests
- Update create_agent.py to fetch and pass relevant library agents
- Update edit_agent.py to fetch and pass relevant library agents
- Add tests for library agent fetching and passthrough
## Summary
- Add helper functions in `service.py` to create standardized error
responses with `error_type` classification
- Update service functions to return error dicts instead of `None`,
preserving error details from the Agent Generator microservice
- Update `core.py` to pass through error responses properly
- Update `create_agent.py` to handle error responses with user-friendly
messages based on error type
## Error Types Now Propagated
| Error Type | Description | User Message |
|------------|-------------|--------------|
| `llm_parse_error` | LLM returned unparseable response | "The AI had
trouble understanding this request" |
| `llm_timeout` / `timeout` | Request timed out | "The request took too
long" |
| `llm_rate_limit` / `rate_limit` | Rate limited | "The service is
currently busy" |
| `validation_error` | Agent validation failed | "The generated agent
failed validation" |
| `connection_error` | Could not connect to Agent Generator | Generic
error message |
| `http_error` | HTTP error from Agent Generator | Generic error message
|
| `unknown` | Unclassified error | Generic error message |
## Motivation
This enables better debugging for issues like SECRT-1817 where
decomposition failed due to transient LLM errors but the root cause was
unclear in the logs. Now:
1. Error details from the Agent Generator microservice are preserved
2. Users get more helpful error messages based on error type
3. Debugging is easier with `error_type` in response details
## Related PR
- Agent Generator side:
https://github.com/Significant-Gravitas/AutoGPT-Agent-Generator/pull/102
## Test Plan
- [ ] Test decomposition with various error scenarios (timeout, parse
error)
- [ ] Verify user-friendly messages are shown based on error type
- [ ] Check that error details are logged properly
Split `autogpt_platform/CLAUDE.md` into project-specific files, to make
the scope of the instructions clearer.
Also, some minor improvements:
- Change references to other Markdown files to @file/path.md syntax that
Claude recognizes
- Update ambiguous/incorrect/outdated instructions
- Remove trailing slashes
- Fix broken file path references in other docs (including comments)
Implements persistent User Workspace storage for CoPilot, enabling
blocks to save and retrieve files across sessions. Files are stored in
session-scoped virtual paths (`/sessions/{session_id}/`).
Fixes SECRT-1833
### Changes 🏗️
**Database & Storage:**
- Add `UserWorkspace` and `UserWorkspaceFile` Prisma models
- Implement `WorkspaceStorageBackend` abstraction (GCS for cloud, local
filesystem for self-hosted)
- Add `workspace_id` and `session_id` fields to `ExecutionContext`
**Backend API:**
- Add REST endpoints: `GET/POST /api/workspace/files`, `GET/DELETE
/api/workspace/files/{id}`, `GET /api/workspace/files/{id}/download`
- Add CoPilot tools: `list_workspace_files`, `read_workspace_file`,
`write_workspace_file`
- Integrate workspace storage into `store_media_file()` - returns
`workspace://file-id` references
**Block Updates:**
- Refactor all file-handling blocks to use unified `ExecutionContext`
parameter
- Update media-generating blocks to persist outputs to workspace
(AIImageGenerator, AIImageCustomizer, FluxKontext, TalkingHead, FAL
video, Bannerbear, etc.)
**Frontend:**
- Render `workspace://` image references in chat via proxy endpoint
- Add "AI cannot see this image" overlay indicator
**CoPilot Context Mapping:**
- Session = Agent (graph_id) = Run (graph_exec_id)
- Files scoped to `/sessions/{session_id}/`
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [ ] I have tested my changes according to the test plan:
- [ ] Create CoPilot session, generate image with AIImageGeneratorBlock
- [ ] Verify image returns `workspace://file-id` (not base64)
- [ ] Verify image renders in chat with visibility indicator
- [ ] Verify workspace files persist across sessions
- [ ] Test list/read/write workspace files via CoPilot tools
- [ ] Test local storage backend for self-hosted deployments
#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)
🤖 Generated with [Claude Code](https://claude.ai/code)
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> **Medium Risk**
> Introduces a new persistent file-storage surface area (DB tables,
storage backends, download API, and chat tools) and rewires
`store_media_file()`/block execution context across many blocks, so
regressions could impact file handling, access control, or storage
costs.
>
> **Overview**
> Adds a **persistent per-user Workspace** (new
`UserWorkspace`/`UserWorkspaceFile` models plus `WorkspaceManager` +
`WorkspaceStorageBackend` with GCS/local implementations) and wires it
into the API via a new `/api/workspace/files/{file_id}/download` route
(including header-sanitized `Content-Disposition`) and shutdown
lifecycle hooks.
>
> Extends `ExecutionContext` to carry execution identity +
`workspace_id`/`session_id`, updates executor tooling to clone
node-specific contexts, and updates `run_block` (CoPilot) to create a
session-scoped workspace and synthetic graph/run/node IDs.
>
> Refactors `store_media_file()` to require `execution_context` +
`return_format` and to support `workspace://` references; migrates many
media/file-handling blocks and related tests to the new API and to
persist generated media as `workspace://...` (or fall back to data URIs
outside CoPilot), and adds CoPilot chat tools for
listing/reading/writing/deleting workspace files with safeguards against
context bloat.
>
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
6abc70f793. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Reinier van der Leer <pwuts@agpt.co>
## Summary
- Add `InvalidInputError` for validation errors (search term too long,
invalid pagination) - returns 400 instead of 500
- Remove redundant try/catch blocks in library routes - global exception
handlers already handle `ValueError`→400 and `NotFoundError`→404
- Aggregate embedding backfill errors and log once at the end instead of
per content type to prevent Sentry issue spam
## Test plan
- [x] Verify validation errors (search term >100 chars) return 400 Bad
Request
- [x] Verify NotFoundError still returns 404
- [x] Verify embedding errors are logged once at the end with aggregated
counts
Fixes AUTOGPT-SERVER-7K5, BUILDER-6NC
---------
Co-authored-by: Swifty <craigswift13@gmail.com>
Disable automatic onboarding redirects on signup/login while keeping the
checklist/wallet functional. Users now receive $5 (500 credits) on their
first visit to /copilot.
### Changes 🏗️
- **Frontend**: `shouldShowOnboarding()` now returns `false`, disabling
auto-redirects to `/onboarding`
- **Backend**: Added `VISIT_COPILOT` onboarding step with 500 credit
($5) reward
- **Frontend**: Copilot page automatically completes `VISIT_COPILOT`
step on mount
- **Database**: Migration to add `VISIT_COPILOT` to `OnboardingStep`
enum
NOTE: /onboarding/1-welcome -> /library now as shouldShowOnboardin is
always false
Users land directly on `/copilot` after signup/login and receive $5
invisibly (not shown in checklist UI).
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] New user signup (email/password) → lands on `/copilot`, wallet
shows 500 credits
- [x] Verified credits are only granted once (idempotent via onboarding
reward mechanism)
- [x] Existing user login (already granted flag set) → lands on
`/copilot`, no duplicate credits
- [x] Checklist/wallet remains functional
#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)
No configuration changes required.
---
OPEN-2967
🤖 Generated with [Claude Code](https://claude.ai/code)
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> Introduces a new onboarding step and adjusts onboarding flow.
>
> - Adds `VISIT_COPILOT` onboarding step (+500 credits) with DB enum
migration and API/type updates
> - Copilot page auto-completes `VISIT_COPILOT` on mount to grant the
welcome bonus
> - Changes `/onboarding/enabled` to require user context and return
`false` when `CHAT` feature is enabled (skips legacy onboarding)
> - Wallet now refreshes credits on any onboarding `step_completed`
notification; confetti limited to visible tasks
> - Test flows updated to accept redirects to `copilot`/`library` and
verify authenticated state
>
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
ec5a5a4dfd. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Nicholas Tindle <ntindle@users.noreply.github.com>
## Summary
Fixes the issue where the "Creating Agent" spinner doesn't auto-update
when agent generation completes - user had to refresh the browser.
**Changes:**
- **Frontend polling**: Add `onOperationStarted` callback to trigger
polling when `operation_started` is received via SSE
- **Polling backoff**: 2s, 4s, 6s, 8s... up to 30s max
- **Message deduplication**: Use content-based keys (role + content)
instead of timestamps to prevent duplicate messages
- **Message ordering**: Preserve server message order instead of
timestamp-based sorting
- **Debug cleanup**: Remove verbose console.log/console.info statements
## Test plan
- [ ] Start agent generation in copilot
- [ ] Verify "Creating Agent" spinner appears
- [ ] Wait for completion (2-5 min) WITHOUT refreshing
- [ ] Verify agent carousel appears automatically when done
- [ ] Verify no duplicate messages in chat
- [ ] Verify message order is correct (user → assistant → tool_call →
tool_response)
<!-- Clearly explain the need for these changes: -->
Config change to increase the max times an agent can run in the chat and
the max number of scheduels created by copilot in one chat
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> Increases per-chat operational limits for Copilot.
>
> - Bumps `max_agent_runs` default from `3` to `30` in `ChatConfig`
> - Bumps `max_agent_schedules` default from `3` to `30` in `ChatConfig`
>
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
93cbae6d27. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
## Summary
Fixes context compaction breaking tool_call/tool_response pairs, causing
API validation errors.
## Problem
When context compaction slices messages with `messages[-KEEP_RECENT:]`,
a naive slice can separate an assistant message containing `tool_calls`
from its corresponding tool response messages. This causes API
validation errors like:
```
messages.0.content.1: unexpected 'tool_use_id' found in 'tool_result' blocks: orphan_12345.
Each 'tool_result' block must have a corresponding 'tool_use' block in the previous message.
```
## Solution
Added `_ensure_tool_pairs_intact()` helper function that:
1. Detects orphan tool responses in a slice (tool messages whose
`tool_call_id` has no matching assistant message)
2. Extends the slice backwards to include the missing assistant messages
3. Falls back to removing orphan tool responses if the assistant cannot
be found (edge case)
Applied this safeguard to:
- The initial `KEEP_RECENT` slice (line ~990)
- The progressive fallback slices when still over token limit (line
~1079)
## Testing
- Syntax validated with `python -m py_compile`
- Logic reviewed for correctness
## Linear
Fixes SECRT-1839
---
*Debugged by Toran & Orion in #agpt Discord*
## Summary
Agent generation (`create_agent`, `edit_agent`) can take 1-5 minutes.
Previously, if the user closed their browser tab during this time:
1. The SSE connection would die
2. The tool execution would be cancelled via `CancelledError`
3. The result would be lost - even if the agent-generator service
completed successfully
This PR ensures long-running tool operations survive SSE disconnections.
### Changes 🏗️
**Backend:**
- **base.py**: Added `is_long_running` property to `BaseTool` for tools
to opt-in to background execution
- **create_agent.py / edit_agent.py**: Set `is_long_running = True`
- **models.py**: Added `OperationStartedResponse`,
`OperationPendingResponse`, `OperationInProgressResponse` types
- **service.py**: Modified `_yield_tool_call()` to:
- Check if tool is `is_long_running`
- Save "pending" message to chat history immediately
- Spawn background task that runs independently of SSE
- Return `operation_started` immediately (don't wait)
- Update chat history with result when background task completes
- Track running operations for idempotency (prevents duplicate ops on
refresh)
- **db.py**: Added `update_tool_message_content()` to update pending
messages
- **model.py**: Added `invalidate_session_cache()` to clear Redis after
background completion
**Frontend:**
- **useChatMessage.ts**: Added operation message types
- **helpers.ts**: Handle `operation_started`, `operation_pending`,
`operation_in_progress` response types
- **PendingOperationWidget**: New component to display operation status
with spinner
- **ChatMessage.tsx**: Render `PendingOperationWidget` for operation
messages
### How It Works
```
User Request → Save "pending" message → Spawn background task → Return immediately
↓
Task runs independently of SSE
↓
On completion: Update message in chat history
↓
User refreshes → Loads history → Sees result
```
### User Experience
1. User requests agent creation
2. Sees "Agent creation started. You can close this tab - check your
library in a few minutes."
3. Can close browser tab safely
4. When they return, chat shows the completed result (or error)
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] pyright passes (0 errors)
- [x] TypeScript checks pass
- [x] Formatters applied
### Test Plan
1. Start agent creation in copilot
2. Close browser tab immediately after seeing "operation_started"
3. Wait 2-3 minutes
4. Reopen chat
5. Verify: Chat history shows completion message and agent appears in
library
---------
Co-authored-by: Ubbe <hi@ubbe.dev>
## Summary
Disabled blocks (e.g., webhook blocks without `platform_base_url`
configured) were being indexed and returned in chat tool search results.
This PR ensures they are properly filtered out.
### Changes 🏗️
- **find_block.py**: Skip disabled blocks when enriching search results
- **content_handlers.py**:
- Skip disabled blocks during embedding indexing
- Update `get_stats()` to only count enabled blocks for accurate
coverage metrics
### Why
Blocks can be disabled for various reasons (missing OAuth config, no
platform URL for webhooks, etc.). These blocks shouldn't appear in
search results since users cannot use them.
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Verified disabled blocks are filtered from search results
- [x] Verified disabled blocks are not indexed
- [x] Verified stats accurately reflect enabled block count
## Summary
- Fixes race condition when multiple concurrent requests try to process
the same reviews (e.g., double-click, multiple browser tabs)
- Previously the second request would fail with "Reviews not found,
access denied, or not in WAITING status"
- Now handles this gracefully by treating already-processed reviews with
the same decision as success
## Changes
- Added `get_reviews_by_node_exec_ids()` function that fetches reviews
regardless of status
- Modified `process_all_reviews_for_execution()` to handle
already-processed reviews
- Updated route to use idempotent validation
## Test plan
- [x] Linter passes (`poetry run ruff check`)
- [x] Type checker passes (`poetry run pyright`)
- [x] Formatter passes (`poetry run format`)
- [ ] Manual testing: double-click approve button should not cause
errors
Fixes AUTOGPT-SERVER-7HE
## Summary
Long-running chat tools (like `create_agent` and `edit_agent`) were
timing out because no SSE data was sent during tool execution. GCP load
balancers and proxies have idle connection timeouts (~60 seconds), and
when the external Agent Generator service takes longer than this, the
connection would drop.
This PR adds SSE heartbeat comments during tool execution to keep
connections alive.
### Changes 🏗️
- **response_model.py**: Added `StreamHeartbeat` response type that
emits SSE comments (`: heartbeat\n\n`)
- **service.py**: Modified `_yield_tool_call()` to:
- Run tool execution in a background asyncio task
- Yield heartbeat events every 15 seconds while waiting
- Handle task failures with explicit error responses (no silent
failures)
- Handle cancellation gracefully
- **create_agent.py**: Improved error messages with more context and
details
- **edit_agent.py**: Improved error messages with more context and
details
### How It Works
```
Tool Call → Background Task Started
│
├── Every 15 seconds: yield `: heartbeat\n\n` (SSE comment)
│
└── Task Complete → yield tool result OR error response
```
SSE comments (`: heartbeat\n\n`) are:
- Ignored by SSE clients (don't trigger events)
- Keep TCP connections alive through proxies/load balancers
- Don't affect the AI SDK data protocol
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] All chat service tests pass (17 tests)
- [x] Verified heartbeats are sent during long tool execution
- [x] Verified errors are properly reported to frontend
## Changes 🏗️
Implements automatic context window management to prevent chat failures
when conversations exceed token limits.
### Problem
- **Issue**: [SECRT-1800] Long chat conversations stop working when
context grows beyond model limits (~113k tokens observed)
- **Root Cause**: Chat service sends ALL messages to LLM without
token-aware compression, eventually exceeding Claude Opus 4.5's 200k
context window
### Solution
Implements a sliding window with summarization strategy:
1. Monitors token count before sending to LLM (triggers at 120k tokens)
2. Keeps last 15 messages completely intact (preserves recent
conversation flow)
3. Summarizes older messages using gpt-4o-mini (fast & cheap)
4. Rebuilds context: `[system_prompt] + [summary] +
[recent_15_messages]`
5. Full history preserved in database (only compresses when sending to
LLM)
### Changes Made
- **Added** `_summarize_messages()` helper function to create concise
summaries using gpt-4o-mini
- **Modified** `_stream_chat_chunks()` to implement token counting and
conditional summarization
- **Integrated** existing `estimate_token_count()` utility for accurate
token measurement
- **Added** graceful fallback - continues with original messages if
summarization fails
## Motivation and Context 🎯
Without context management, users with long chat sessions (250+
messages) experience:
- Complete chat failure when hitting 200k token limit
- Lost conversation context
- Poor user experience
This fix enables:
- ✅ Unlimited conversation length
- ✅ Transparent operation (no UX changes)
- ✅ Preserved conversation quality (recent messages intact)
- ✅ Cost-efficient (~$0.0001 per summarization)
## Testing 🧪
### Expected Behavior
- Conversations < 120k tokens: No change (normal operation)
- Conversations > 120k tokens:
- Log message: `Context summarized: {tokens} tokens, kept last 15
messages + summary`
- Chat continues working smoothly
- Recent context remains intact
### How to Verify
1. Start a chat session in copilot
2. Send 250-600 messages (or 50+ with large code blocks)
3. Check logs for "Context summarized:" message
4. Verify chat continues working without errors
5. Verify conversation quality remains good
## Checklist ✅
- [x] My code follows the style guidelines of this project
- [x] I have performed a self-review of my own code
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] My changes generate no new warnings
- [x] I have tested my changes and verified they work as expected
We are removing Langfuse tracing from the chat/copilot system in favor
of using OpenRouter's broadcast feature, which keeps our codebase
simpler. Langfuse prompt management is retained for fetching system
prompts.
### Changes 🏗️
**Removed Langfuse tracing:**
- Removed `@observe` decorators from all 11 chat tool files
- Removed `langfuse.openai` wrapper (now using standard `openai` client)
- Removed `start_as_current_observation` and `propagate_attributes`
context managers from `service.py`
- Removed `update_current_trace()`, `update_current_span()`,
`span.update()` calls
**Retained Langfuse prompt management:**
- `langfuse.get_prompt()` for fetching system prompts
- `_is_langfuse_configured()` check for prompt availability
- Configuration for `langfuse_prompt_name`
**Files modified:**
- `backend/api/features/chat/service.py`
- `backend/api/features/chat/tools/*.py` (11 tool files)
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Verified `poetry run format` passes
- [x] Verified no `@observe` decorators remain in chat tools
- [x] Verified Langfuse prompt fetching is still functional (code
preserved)
### Background
The chat service previously supported including page context (URL and
content) in user messages. This functionality is being removed.
### Changes 🏗️
- Removed page context handling from `stream_chat_completion` in the
chat service
- User messages are now passed directly without URL/content context
injection
- Removed associated logging for page context
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Verify chat functionality works without page context
- [x] Confirm no regressions in basic chat message handling
## Summary
This PR adds security checks to prevent execution of disabled blocks
across all block execution endpoints.
- Add `disabled` flag check to main web API endpoint
(`/api/blocks/{block_id}/execute`)
- Add `disabled` flag check to external API endpoint
(`/api/blocks/{block_id}/execute`)
- Add `disabled` flag check to chat tool block execution
Previously, block execution endpoints only checked if a block existed
but did not verify the `disabled` flag, allowing any authenticated user
to execute disabled blocks.
## Test plan
- [x] Verify disabled blocks return 403 Forbidden on main API endpoint
- [x] Verify disabled blocks return 403 Forbidden on external API
endpoint
- [x] Verify disabled blocks return error response in chat tool
execution
- [x] Verify enabled blocks continue to execute normally
Adds analytics tracking to the chat copilot system for better
observability of user interactions and agent operations.
### Changes 🏗️
**PostHog Analytics Integration:**
- Added `posthog` dependency (v7.6.0) to track chat events
- Created new tracking module (`backend/api/features/chat/tracking.py`)
with events:
- `chat_message_sent` - When a user sends a message
- `chat_tool_called` - When a tool is called (includes tool name)
- `chat_agent_run_success` - When an agent runs successfully
- `chat_agent_scheduled` - When an agent is scheduled
- `chat_trigger_setup` - When a trigger is set up
- Added PostHog configuration to settings:
- `POSTHOG_API_KEY` - API key for PostHog
- `POSTHOG_HOST` - PostHog host URL (defaults to
`https://us.i.posthog.com`)
**OpenRouter Tracing:**
- Added `user` and `session_id` fields to chat completion API calls for
OpenRouter tracing
- Added `posthogDistinctId` and `posthogProperties` (with environment)
to API calls
**Files Changed:**
- `backend/api/features/chat/tracking.py` - New PostHog tracking module
- `backend/api/features/chat/service.py` - Added user message tracking
and OpenRouter tracing
- `backend/api/features/chat/tools/__init__.py` - Added tool call
tracking
- `backend/api/features/chat/tools/run_agent.py` - Added agent
run/schedule tracking
- `backend/util/settings.py` - Added PostHog configuration fields
- `pyproject.toml` - Added posthog dependency
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Verified code passes linting and formatting
- [x] Verified PostHog client initializes correctly when API key is
provided
- [x] Verified tracking is gracefully skipped when PostHog is not
configured
#### For configuration changes:
- [ ] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)
**New environment variables (optional):**
- `POSTHOG_API_KEY` - PostHog project API key
- `POSTHOG_HOST` - PostHog host URL (optional, defaults to US cloud)
## Summary
- Add support for delegating agent generation to an external
microservice when `AGENTGENERATOR_HOST` is configured
- Falls back to built-in LLM-based implementation when not configured
(default behavior)
- Add comprehensive tests for the service client and core integration
(34 tests)
## Changes
- Add `agentgenerator_host`, `agentgenerator_port`,
`agentgenerator_timeout` settings to `backend/util/settings.py`
- Add `service.py` client for external Agent Generator API endpoints:
- `/api/decompose-description` - Break down goals into steps
- `/api/generate-agent` - Generate agent from instructions
- `/api/update-agent` - Generate patches to update existing agents
- `/api/blocks` - Get available blocks
- `/health` - Health check
- Update `core.py` to delegate to external service when configured
- Export `is_external_service_configured` and
`check_external_service_health` from the module
## Related PRs
- Infrastructure repo:
https://github.com/Significant-Gravitas/AutoGPT-cloud-infrastructure/pull/273
## Test plan
- [x] All 34 new tests pass (`poetry run pytest test/agent_generator/
-v`)
- [ ] Deploy with `AGENTGENERATOR_HOST` configured and verify external
service is used
- [ ] Verify built-in implementation still works when
`AGENTGENERATOR_HOST` is empty
## Summary
This PR implements comprehensive improvements to the human-in-the-loop
(HITL) review system, including safety features, architectural changes,
and bug fixes:
### Key Features
- **SECRT-1798: One-time safety popup** - Shows informational popup
before first run of AI-generated agents with sensitive actions/HITL
blocks
- **SECRT-1795: Auto-approval toggle UX** - Toggle in pending reviews
panel to auto-approve future actions from the same node
- **Node-specific auto-approval** - Changed from execution-specific to
node-specific using special key pattern
`auto_approve_{graph_exec_id}_{node_id}`
- **Consolidated approval checking** - Merged `check_auto_approval` into
`check_approval` using single OR query for better performance
- **Race condition prevention** - Added execution status check before
resuming to prevent duplicate execution when approving while graph is
running
- **Parallel auto-approval creation** - Uses `asyncio.gather` for better
performance when creating multiple auto-approval records
## Changes
### Backend Architecture
- **`human_review.py`**:
- Added `check_approval()` function that checks both normal and
auto-approval in single query
- Added `create_auto_approval_record()` for node-specific auto-approval
using special key pattern
- Added `get_auto_approve_key()` helper to generate consistent
auto-approval keys
- **`review/routes.py`**:
- Added execution status check before resuming to prevent race
conditions
- Refactored auto-approval record creation to use parallel execution
with `asyncio.gather`
- Removed obvious comments for cleaner code
- **`review/model.py`**: Added `auto_approve_future_actions` field to
`ReviewRequest`
- **`blocks/helpers/review.py`**: Updated to use consolidated
`check_approval` via database manager client
- **`executor/database.py`**: Exposed `check_approval` through
DatabaseManager RPC for block execution context
- **`data/block.py`**: Fixed safe mode checks for sensitive action
blocks
### Frontend
- **New `AIAgentSafetyPopup`** component with localStorage-based
one-time display
- **`PendingReviewsList`**:
- Replaced "Approve all future actions" button with toggle
- Toggle resets data to original values and disables editing when
enabled
- Shows warning message explaining auto-approval behavior
- **`RunAgentModal`**: Integrated safety popup before first run
- **`usePendingReviews`**: Added polling for real-time badge updates
- **`FloatingSafeModeToggle` & `SafeModeToggle`**: Simplified visibility
logic
- **`local-storage.ts`**: Added localStorage key for popup state
tracking
### Bug Fixes
- Fixed "Client is not connected to query engine" error by using
database manager client pattern
- Fixed race condition where approving reviews while graph is RUNNING
could queue execution twice
- Fixed migration to only drop FK constraint, not non-existent column
- Fixed card data reset when auto-approve toggle changes
### Code Quality
- Removed duplicate/obvious comments
- Moved imports to top-level instead of local scope in tests
- Used walrus operator for cleaner conditional assignments
- Parallel execution for auto-approval record creation
## Test plan
- [ ] Create an AI-generated agent with sensitive actions (e.g., email
sending)
- [ ] First run should show the safety popup before starting
- [ ] Subsequent runs should not show the popup
- [ ] Clear localStorage (`AI_AGENT_SAFETY_POPUP_SHOWN`) to verify popup
shows again
- [ ] Create an agent with human-in-the-loop blocks
- [ ] Run it and verify the pending reviews panel appears
- [ ] Enable the "Auto-approve all future actions" toggle
- [ ] Verify editing is disabled and shows warning message
- [ ] Click "Approve" and verify subsequent blocks from same node
auto-approve
- [ ] Verify auto-approval persists across multiple executions of same
graph
- [ ] Disable toggle and verify editing works normally
- [ ] Verify "Reject" button still works regardless of toggle state
- [ ] Test race condition: Approve reviews while graph is RUNNING
(should skip resume)
- [ ] Test race condition: Approve reviews while graph is REVIEW (should
resume)
- [ ] Verify pending reviews badge updates in real-time when new reviews
are created