mark_session_completed() is now the SINGLE place that publishes
StreamFinish to the turn stream. Simplified API:
- mark_session_completed(session_id) → completed
- mark_session_completed(session_id, error_message='...') → failed
Flow: set status (Lua CAS) → StreamError if failed → StreamFinish.
The processor intercepts StreamFinish from generators, calls
mark_session_completed instead. Removed _mark_task_failed (redundant).
Removed cleanup_turn_stream (streams have TTL, eager deletion raced
with _stream_listener xread).
Keep the improved collapsible <details> UI but restore the original
copywriting: 'The assistant encountered an error. Please try sending
your message again.'
- Move Redis stream cleanup from processor into stream_registry.cleanup_turn_stream()
- Remove obvious comment on CancelledError raise
- Remove unused get_redis_async import from processor
- Remove unnecessary comment in ChatMessagesContainer
- Save session on CancelledError to prevent message loss
Previously, sending the same message twice would cause the second one to disappear
due to aggressive fingerprint-based deduplication that applied to all messages.
Now fingerprint deduplication only applies to assistant messages (where it's needed
for hydration/stream boundary dedup). User messages are only deduplicated by ID,
allowing users to send the same text multiple times.
Fixes issue where duplicate user messages would be filtered out.
Previously, loading indicators were shown separately above accordions, causing them to stack.
Now the loading text only appears when there's no accordion, keeping the UI cleaner.
Changes:
- GenericTool: Only show loading text when not showing accordion
- RunAgentTool: Only show loading text when not showing accordion or other content
- For TodoWrite and running agents, the accordion now contains all the UI without duplicate indicators
This fixes the issue where parallel tasks show stacked loading indicators above their respective accordions.
When an agent generation fails and retries, both the error accordion and the
new loading accordion were stacking on top of each other instead of replacing.
Changes:
- Modified CreateAgent and EditAgent tools to show errors inline (no accordion)
- Errors now get replaced when a new attempt starts, instead of stacking
- Error display includes technical details in collapsible section and retry button
- Prevents confusing UI where old error and new loading state both appear
Also includes previous fixes:
- Skip incomplete tool calls during hydration to prevent "No tool invocation found" errors
- Use last_message_id="0-0" in resume endpoint to replay all chunks for proper tool call delivery
- Remove dead GET /tasks/{task_id}/stream and GET /tasks/{task_id} endpoints
- Rename CancelCoPilotEvent.task_id to session_id
- Rename enqueue_cancel_task param to session_id
- Rename all task_id local vars in manager.py to session_id
- Rename Redis lock key copilot:task:{id}:lock to copilot:session:{id}:lock
- Rename StreamStart.taskId to sessionId
- Remove set_task_id from SDKResponseAdapter (uses self.session_id)
- Remove task_id from CancelTaskResponse and CoPilotExecutionEntry
- Update agent_generator dummy/service docstrings
- Regenerate openapi.json and frontend generated types
- Update tests
- Change logger.info to logger.debug for UserMessage, flush operations
- Reduces production log noise and overhead from tool call serialization
- Addresses PR review comment about debug logging
- Increase timeout from 10 minutes to COPILOT_CONSUMER_TIMEOUT_SECONDS (1 hour) + 5min buffer
- Prevents false failures when users refresh during long-running agent generation (20-30 min)
- Fixes critical bug where valid tasks were marked as failed after 10 minutes
- Use shared constant instead of hardcoded value for consistency
The get_all_relevant_agents_for_generation and decompose_goal functions
were moved from create_agent to agent_generator module. Tests need to
mock these at the import location (create_agent) not the definition
location (agent_generator).
Fixes 3 failing tests:
- test_vague_goal_returns_suggested_goal_response
- test_unachievable_goal_returns_suggested_goal_response
- test_clarifying_questions_returns_clarification_needed_response
- Revert out-of-scope changes: .gitignore, auth/dependencies.py, auth/jwt_utils.py,
backend/.gitignore, sdk/response_adapter.py
- Raise ValueError instead of silently truncating search queries in
agent_generator/core.py so the AI can adjust its search term
- Remove .application.logs artifact
- Introduce turn_id (per-turn UUID) for Redis stream key isolation,
preventing cross-turn event accumulation on refresh
- Remove incremental DB saves during streaming — persist only at end of turn
- Remove debug/timing logging artifacts added during development
- Remove accidentally committed analysis MD files
- Clean up excessive logging throughout copilot backend
## Summary
Adds a warning to the Getting Started docs clarifying that **Podman and
podman-compose are not supported**.
## Problem
Users on Windows using `podman-compose` instead of Docker get errors
like:
```
Error: the specified Containerfile or Dockerfile does not exist, ..\..\autogpt_platform\backend\Dockerfile
```
This is because Podman handles relative paths differently than Docker,
causing incorrect path resolution on Windows.
## Solution
- Added a clear warning section after the Windows WSL 2 notes
- Explains the error users might see
- Directs them to install Docker Desktop instead
Closes#11358
<!-- greptile_comment -->
<details><summary><h3>Greptile Summary</h3></summary>
Adds a "Podman Not Supported" warning section to the Getting Started
documentation, placed after the Windows/WSL 2 installation notes. The
section clarifies that Docker is required, shows the typical error
message users encounter when using Podman, and directs them to install
Docker Desktop instead. This addresses issue #11358 where Windows users
using `podman-compose` hit path resolution errors.
- Adds `### ⚠️ Podman Not Supported` section under Manual Setup, after
Windows Installation Note
- Includes the specific error message users see with Podman for easy
identification
- Links to Docker Desktop installation docs as the recommended solution
- Formatting is consistent with existing sections in the document (emoji
headings, code blocks for errors)
</details>
<details><summary><h3>Confidence Score: 5/5</h3></summary>
- This PR is safe to merge — it only adds a documentation warning
section with no code changes.
- The change is a small, well-written documentation addition that adds a
Podman compatibility warning. It touches only one markdown file,
introduces no code changes, and is consistent with the existing document
structure and style. No issues were found.
- No files require special attention.
</details>
<details><summary><h3>Flowchart</h3></summary>
```mermaid
flowchart TD
A[User wants to run AutoGPT] --> B{Which container runtime?}
B -->|Docker / Docker Desktop| C[docker compose up -d --build]
C --> D[AutoGPT starts successfully]
B -->|Podman / podman-compose| E[podman-compose up -d --build]
E --> F[Error: Containerfile or Dockerfile does not exist]
F --> G[New warning section directs user to install Docker Desktop]
G --> C
```
</details>
<sub>Last reviewed commit: 23ea6bd</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Backend: Subscribe resume stream from latest position ($) instead of 0-0.
Hydrated messages from REST already contain all history — replay only
delivers NEW chunks (tool progress, heartbeats, StreamFinish) going
forward. Prevents intro text duplication on refresh.
Frontend: Include toolCallId in message fingerprint so tool-only messages
(no text content) are properly deduplicated across hydration/stream
boundary.
- Keep assistant message with tool_calls in GET /sessions response during
active stream so frontend renders correct tool UI (mini game) on refresh
- Use last_message_id from Redis instead of 0-0 for SSE replay position
- Match EditAgent saved-output card to CreateAgent sparkles design
- Verified StreamFinish flow is correct (mark_task_completed publishes it)
The delete call was causing a feedback loop: stream completes → ref cleared
→ hasActiveStream still true (stale query) → resume triggers again → 73+
rapid-fire GET /stream requests. The ref should persist for the full page
lifecycle since a fresh page load creates a new Map via useRef.
- Re-add missing get_all_relevant_agents_for_generation import in create_agent.py
- Fix test assertions for generate_agent_external (2 args, not 4) and
generate_agent_patch_external (3 args, not 5) after operation_id/task_id removal
- Move json import to module level in response_model.py (was inside to_sse())
- Track fire-and-forget asyncio.create_task for title update to prevent GC
- Remove redundant inline import asyncio in service.py
Remove the entire long-running/async tool execution infrastructure that
became redundant after the executor system was introduced. All tools now
execute uniformly through execute_tool() with heartbeat support.
Backend cleanup:
- Remove _execute_long_running_tool_with_streaming and is_long_running
- Delete completion_consumer.py and completion_handler.py (dead code)
- Remove _generate_llm_continuation (both variants), _update_pending_operation,
_mark_operation_started/completed
- Remove operation_id plumbing from stream_registry, routes, executor
- Remove LongRunningCallback from SDK tool_adapter
- Remove dead config fields (stream_completion_name, etc.)
Frontend cleanup:
- Remove OperationInProgressResponse checks from CreateAgent/EditAgent
- Remove debug logging from useCopilotPage
- Delete dead task-level stream route and useAsymptoticProgress hook
- Regenerate OpenAPI types
Bug fixes:
- #3: Pessimistic input lock — disable chat while session loading or errored
- #3: Replace O(N) SCAN with O(1) direct lookup in get_active_task_for_session
- #1: Content-based deduplication to prevent intro message replay on resume
- Fix parallel_tool_calls_test to match new _yield_tool_call signature
Credentials, inputs, and login prompts in copilot tool outputs were
hidden inside collapsible accordions — users could accidentally collapse
them, hiding blocking actionable UI. This PR extracts all blocking
requirements out of accordions so they're always visible.
### Changes 🏗️
- **RunAgent & RunBlock**: Extract `SetupRequirementsCard` (credentials
picker) out of `ToolAccordion` — renders standalone, always visible
- **RunAgent**: Also extract `AgentDetailsCard` (inputs needed) and
`need_login` message out of accordion
- **SetupRequirementsCard (RunBlock)**: Input form always visible
(removed toggle button and animation), unified "Proceed" button disabled
until credentials + inputs are satisfied
- **SetupRequirementsCard (RunAgent)**: "Proceed" button disabled until
all credentials are selected
- **Both cards**: Added titled box with border for credentials section
("Block credentials" / "Agent credentials"), matching the existing
inputs box pattern
- **CredentialsFlatView**: "Add" button uses `variant="primary"` when
user has no credentials (was `secondary`)
- **Styleguide**: Added mock `CredentialsProvidersContext` with two
scenarios:
- No credentials → shows "add new" flow
- Has credentials → shows selection list with existing accounts
- **CreateAgent & EditAgent**: Picked up user-initiated styling
refinements
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] `pnpm format && pnpm lint && pnpm types` all pass
- [ ] Visit `/copilot/styleguide` and verify:
- [ ] "Setup requirements — no credentials" shows add-credential button
(primary variant)
- [ ] "Setup requirements — has credentials" shows credential selection
dropdown
- [ ] Both RunAgent and RunBlock setup requirements render outside
accordion
- [ ] Trigger a copilot agent run that requires credentials — credential
picker always visible
- [ ] Trigger a copilot block run that requires credentials + inputs —
both sections visible, "Proceed" disabled until ready
- [ ] Trigger a copilot agent run that returns "agent details" — card
renders outside accordion
- [ ] Verify other output types (execution_started, error) still render
inside accordions
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
StreamFinishStep is only needed for multi-step LLM continuations (tool → LLM → tool).
For synchronous long-running tools, there's only 1 step, so step separation isn't needed.
Frontend doesn't listen to finish_step events (confirmed via grep - no matches).
Only StreamFinish is required to signal stream completion.
Simplifies success path to match actual requirements.
CRITICAL BUG FIX: Addresses issue #2 from manual testing - agent generation
result doesn't show without refresh.
Root Cause:
_execute_long_running_tool_with_streaming() was publishing the tool result
but NOT publishing StreamFinish/StreamFinishStep events. The error path
correctly published these events (lines 1682-1683), but the success path
was missing them.
Impact:
- Frontend never received StreamFinish signal after tool completion
- AI SDK waited indefinitely for stream to end
- UI showed mini-game forever until page refresh
- useLongRunningToolPolling hook was removed assuming sync execution would work
- But sync execution wasn't signaling completion properly
Fix:
After publishing tool result chunk, now also publish:
1. StreamFinishStep() - signals step completion
2. StreamFinish() - signals stream end
3. mark_task_completed(status="completed") - updates stream registry
This matches the error path behavior and ensures frontend knows when
long-running tools complete.
Also added CRITICAL_BUGS_ANALYSIS.md documenting all 3 manual test issues:
1. Intro message replay on retry
2. Agent generation result not showing (FIXED by this commit)
3. Red button/lock delay after refresh
Addresses autogpt-reviewer blocker #1 (dead ContextVar).
The _current_tool_use_id ContextVar was being set in security_hooks.py
but never consumed. Now the long-running callback uses it as the primary
source (with DB query as fallback), which is faster and more reliable.
Flow:
1. security_hooks.py sets _current_tool_use_id for long-running tools
2. Callback reads from ContextVar first (no DB query needed)
3. Falls back to _find_latest_tool_use_id() if not set
4. Final fallback generates sdk-{uuid} if neither works
This fixes the "dead code" issue identified in PR review.
- test_message_deduplication: verify message IDs are unique
- test_event_ordering: verify StreamStart < text deltas < StreamFinish
- test_stream_completeness: verify exactly 1 start, N text, 1 finish
- test_text_delta_consistency: verify all deltas share same block ID and build coherent text
All tests pass using COPILOT_TEST_MODE=true with dummy implementations.
Improves test coverage for reconnection flow identified in RECONNECTION_FLOW_REVIEW.md.
- Complete analysis of reconnection implementation
- Verified backend and frontend coordination
- Identified test coverage gaps
- All implementation is correct, just needs tests
- Increase agentgenerator_timeout from 600s to 1800s (30 minutes)
- Remove async complexity (operation_id/task_id) from agent generation
- Tools now block synchronously on HTTP response (simpler, cleaner)
- Remove dual Redis subscription issues
- Fix duplicate Otto intro with $ fallback and continuous deduplication
- Fix bare except statements and TypeScript errors
Previous async approach created duplicate subscriptions and race conditions.
Synchronous approach is simpler and safer - tools already run async in service.py.
- Remove ActiveTask.task_id field (was redundant, always equaled session_id)
- Update OperationCompleteMessage to use session_id instead of task_id
- Remove all task_id references from stream_registry
- Simplify code by using session_id directly throughout
This completes the task_id -> session_id refactor with no more intermediate
variables or backwards compatibility fields.
The commit dd10e1b33 accidentally included 84 files with formatting-only
changes (black/ruff reformatting) that were unrelated to the task_id removal.
This commit reverts those files back to their original state while keeping
the actual copilot-related changes.
Files reverted: blocks, tests, utilities (84 files)
Files kept: copilot, frontend copilot components (15 files)
Implements proper async execution for agent generator tools while maintaining
blocking UX to prevent race conditions and ensure predictable behavior.
Architecture:
- Agent Generator executes asynchronously (returns 202 Accepted)
- HTTP request stays open and waits for completion (no polling, event-driven)
- Heartbeats keep SSE connection alive during long operations
- Session stays locked - user cannot send new messages during generation
- When complete, continuation happens in same request (no concurrent LLM calls)
Changes:
- Add 'blocking' flag to ActiveTask model and Redis storage
- Update create_task() to accept blocking parameter
- Modify _execute_long_running_tool_with_streaming():
* For agent tools (create_agent, edit_agent, customize_agent):
- Pass operation_id/session_id to enable async on Agent Generator
- Create task with blocking=True
- Subscribe to stream_registry for completion
- Wait via event-driven subscription (no polling!)
- Forward progress chunks to SSE
* For other tools: keep synchronous execution
- Update completion_handler to skip LLM continuation if blocking=True
(HTTP request handles continuation instead)
Benefits:
- No HTTP timeout issues (Agent Generator doesn't block their service)
- No polling waste (event-driven via Redis Pub/Sub)
- No race conditions (session stays locked)
- Predictable UX (stop button visible, can't chat during generation)
- Efficient (heartbeats keep connection alive, progress updates forwarded)
Related to #<issue-number>
- Add DEBUG_CONVERSATION logs to track messages through the system
- Log raw message input, session history, OpenAI message conversion
- Log text deltas being yielded to frontend
- Log when LLM continuation is triggered
- This will help debug the 'double response then stuck' issue
Related to fix/copilot-synchronous-long-running-tools refactor
- Remove separate task_id creation for long-running tools
- Update stream_registry to use session_id as primary identifier
- Update all stream_registry calls across codebase to use session_id
- Keep ActiveTask.task_id field for backwards compatibility (equals session_id)
- Fix mini game showing forever by ensuring results reach correct stream
- Remove _task_id parameter from stream_chat_completion
- Update processor, service, sdk/service, routes, completion handlers
Root cause: Long-running tools were creating separate task_ids and publishing
to different streams than the frontend was subscribed to. Now everything uses
session_id, ensuring results reach the frontend properly.
Files modified:
- backend/copilot/stream_registry.py
- backend/api/features/chat/routes.py
- backend/copilot/executor/processor.py
- backend/copilot/service.py
- backend/copilot/sdk/service.py
- backend/copilot/completion_handler.py
- backend/copilot/completion_consumer.py
- frontend EditAgent/CreateAgent components (remove duplicate loader)
- frontend ChatMessagesContainer (remove unused imports)
Tests: 139/140 passed (1 SDK initialization failure unrelated to changes)
Hide the MorphingTextAnimation status text when the accordion is shown to
avoid duplication. Previously showed:
1. "Editing the agent" (status text)
2. "Editing agent, this may take a few minutes..." (accordion title)
Now only shows the accordion with its title, which contains more useful
information (includes the mini game and full messaging).
The useEffect for resuming streams was only depending on sessionId, but it
checked hasActiveStream and hydratedMessages inside. This caused a race
condition where the effect would run before those values loaded, return early,
and never re-run when they became available.
Added hasActiveStream, hydratedMessages, status, and resumeStream to the
dependency array so the effect re-runs when these values change.
Fixes the critical bug where:
1. User sends create/edit agent request
2. User refreshes page
3. SSE stream never reconnects
4. Final result never arrives (stuck with red button forever)
5. User has to keep refreshing manually to see progress
Publish the initial OperationInProgressResponse to stream_registry.publish_chunk()
so it's available when users refresh/reconnect. Previously it was only yielded
to the current SSE connection but not persisted, causing the mini game UI to
disappear after page refresh.
Fixes: "I see the agent generating UI and I see the game but once I refresh it's gone"
Yield initial StreamToolOutputAvailable with OperationInProgressResponse before
starting synchronous execution of long-running tools (create_agent, edit_agent).
This ensures the frontend receives output immediately and can render the accordion
UI with mini-game while the synchronous operation is running. Previously, the
frontend only had input but no output until the tool completed, preventing the
UI from rendering.
Flow is now:
1. StreamToolInputAvailable - creates tool part
2. StreamToolOutputAvailable (OperationInProgressResponse) - triggers UI rendering
3. [synchronous execution with heartbeats]
4. StreamToolOutputAvailable (final result) - updates with actual result
Addresses user feedback: "why don't we update the event before calling the sync request"
Set hasExpandableContent=true unconditionally in CreateAgent and EditAgent
components to ensure the accordion (with mini game) renders immediately when
the tool starts executing, not just after output is received.
This fixes the issue where users only saw the "Creating agent..." text with
spinner but no expandable UI until the agent generation completed.
When tool is executing (state=input-available, no output yet), expand
accordion to show mini game. Streaming updates work fine - SSE delivers
updates in real-time regardless of sync/async execution.
Changes:
- getAccordionMeta: accept null, show mini game message when no output
- Remove '&& output' check to render accordion during execution
- Add null guards before type check functions
Revert "fix(copilot): show mini game during synchronous agent operations"
These changes broke real-time streaming - UI was not updating without refresh.
Need a different approach for showing mini game that doesn't break streaming.
Removed '&& output' check to allow accordion to render when no output yet.
Added null checks before type guard calls to satisfy TypeScript.
This shows the mini game during agent creation/editing.
With synchronous execution, getAccordionMeta() received null output
and fell through to error case, hiding the MiniGame.
Now explicitly handles null output and expands accordion to show
mini game while waiting for agent creation/editing to complete.
Instead of automatically searching for library agents using the user's
full description (which caused 'Search term too long' errors), we now:
1. Add library_agent_ids parameter to create_agent and edit_agent tools
2. Update tool descriptions to instruct the copilot to search first
3. Copilot uses find_library_agent to search for relevant agents
4. Copilot passes the agent IDs to the tool
5. Tool fetches agents by ID and passes to generator
This approach:
- Separates concerns: copilot handles search, tool handles generation
- Lets copilot make intelligent decisions about relevance
- Avoids search term length issues
- Gives copilot more control over sub-agent composition
New function: get_library_agents_by_ids() to fetch multiple agents by ID
Tool descriptions now explicitly instruct:
'IMPORTANT: Before calling this tool, search for relevant existing
agents using find_library_agent that could be used as building blocks.
Pass their IDs in the library_agent_ids parameter.'
When creating or editing agents with long descriptions, the full user
description was being passed as a search query to list_library_agents
and get_store_agents. This caused 'Search term is too long' errors
because the DB enforces a 100-character limit on search terms.
Now truncating search queries to 100 chars in:
- get_library_agents_for_generation
- search_marketplace_agents_for_generation
This fixes the error while preserving the sub-agent composition feature
that searches for relevant existing agents during agent creation/editing.
Fixes issue where users couldn't create agents with long descriptions
like 'Create an agent that generates funny, humorous images of bekantan
(proboscis monkeys) using AI image generation...'
CRITICAL: We were accidentally triggering async/webhook mode by passing
_operation_id and _task_id to the tool. This caused:
- Dummy agent generator to return 202 Accepted
- 30s background task with Redis webhook
- Massive polling (hundreds of xread operations)
- Complex async flow instead of simple synchronous blocking
FIX: Don't pass operation_id/task_id to the tool execution. These are only
for our internal task tracking, not for controlling tool execution mode.
Now:
- Dummy blocks for 30s and returns result directly (synchronous)
- No 202 Accepted, no webhook, no polling loop
- Single SSE stream with final result
- Much simpler flow
The polling code remains as fallback for when the REAL external
agent-generator service needs to return 202 (queue overload, etc.)
STALE TASK DETECTION:
- Auto-complete tasks running >10 minutes without updates
- Prevents infinite polling when StreamFinish sent but task not marked complete
- Logged as warning for debugging
REDUCED POLLING:
- Changed POLL_INTERVAL_MS from 1.5s to 5s (less aggressive)
- Polling is fallback for legacy async tools (202 Accepted)
- Modern synchronous tools use SSE stream only (no polling)
Addresses user concerns:
- "Can't the code check if task is stale?" -> Yes, now it does
- "Why polling so aggressive?" -> Reduced from 1.5s to 5s
- "If we use polling, why Redis?" -> Redis streams for real-time,
polling only for legacy async fallback
CRITICAL BUG FIX: The synchronous long-running tool execution was never
marking the stream_registry task as completed in the success path, causing:
- Frontend infinite polling loop (active_task=True persists forever)
- Same messages rendered repeatedly
- "Something Went Wrong" toast after timeout
Root cause: service.py:1548 returned without calling
stream_registry.mark_task_completed() and _mark_operation_completed()
in the success path, while the error path (line 1516) correctly did.
Fix: Added task completion cleanup before yielding final
StreamToolOutputAvailable in the success path.
This commit completes the synchronous long-running tool execution refactor by:
1. **Fixes all PR review issues**:
- Move logging import to module level in response_model.py
- Change INFO logging to DEBUG with isEnabledFor guards to prevent sensitive data leakage
- Fix tool name lookup in SDK to check function.name (OpenAI format)
- Add heartbeats during synchronous execution to keep SSE connection alive
- Fix polling timeout to release Redis lock and mark task as failed
- Track background tasks in dummy.py to prevent garbage collection
2. **Add frontend support for callProviderMetadata**:
- Update CreateAgent.tsx to detect long-running via callProviderMetadata.isLongRunning
- Update EditAgent.tsx with same detection logic
- Maintain backwards compatibility with operation_started/pending output types
3. **Remove dead backend code**:
- Remove unused _background_tasks set from service.py (only used in dummy.py now)
- Remove dead _execute_long_running_tool function (never called)
- Add comments clarifying legacy vs new execution paths
4. **Update frontend comments**:
- Clarify that useLongRunningToolPolling is for async fallback and legacy sessions
- Document that modern synchronous execution bypasses polling
The synchronous execution flow now works end-to-end:
- Backend sets callProviderMetadata.isLongRunning on tool input
- Backend blocks synchronously with heartbeats
- Frontend shows mini-game immediately when detecting isLongRunning
- Backend returns actual result after completion
- Frontend displays result (single LLM continuation)
Async fallback path (202 Accepted + webhooks) still supported for external services.
Removes background task spawning for long-running tools (create_agent,
edit_agent) in favor of synchronous blocking execution. This simplifies
the conversation flow by returning actual results to Claude in a single
LLM continuation instead of two separate continuations.
Changes:
- Remove asyncio.create_task() background task spawning
- Execute tools synchronously with await
- Fix database access to use get_chat_session().messages
- Remove unused pending_msg/started_msg variables
- Fix dummy agent generator to match real behavior
- Add callProviderMetadata for frontend mini-game indication
Benefits:
- Single LLM continuation with actual result (not operation started)
- Clearer conversation flow
- Frontend mini-game still shows via SSE during blocking execution
- Simplified codebase without background task tracking
Requested by @majdyz
Concurrent writers (incremental streaming saves from PR #12173 and
long-running tool callbacks) can race to persist messages with the same
`(sessionId, sequence)` pair, causing unique constraint violations on
`ChatMessage`.
**Root cause:** The streaming loop tracks `saved_msg_count` in-memory,
but the long-running tool callback (`_build_long_running_callback`) also
appends messages and calls `upsert_chat_session` independently — without
coordinating sequence numbers. When the streaming loop does its next
incremental save with the stale `saved_msg_count`, it tries to insert at
a sequence that already exists.
**Fix:** Multi-layered defense-in-depth approach:
1. **Collision detection with retry** (db.py): `add_chat_messages_batch`
uses `create_many()` in a transaction. On `UniqueViolationError`,
queries `MAX(sequence)+1` from DB and retries with the correct offset
(max 5 attempts).
2. **Robust sequence tracking** (db.py): `get_next_sequence()` uses
indexed `find_first` with `order={"sequence": "desc"}` for O(1) MAX
lookup, immune to deleted messages.
3. **Session-based counter** (model.py): Added `saved_message_count`
field to `ChatSession`. `upsert_chat_session` returns the session with
updated count, eliminating tuple returns throughout the codebase.
4. **MessageCounter dataclass** (sdk/service.py): Replaced list[int]
mutable reference pattern with a clean `MessageCounter` dataclass for
shared state between streaming loop and long-running callbacks.
5. **Session locking** (sdk/service.py): Prevent concurrent streams on
the same session using Redis `SET NX EX` distributed locks with TTL
refresh on heartbeats (config.stream_ttl = 3600s).
6. **Atomic operations** (db.py): Single timestamp for all messages and
session update in batch operations for consistency. Parallel queries
with `asyncio.gather` for lower latency.
7. **Config-based TTL** (sdk/service.py, config.py): Consolidated all
TTL constants to use `config.stream_ttl` (3600s) with lock refresh on
heartbeats.
### Key implementation details
- **create_many**: Uses `sessionId` directly (not nested
`Session.connect`) as `create_many` doesn't support nested creates
- **Type narrowing**: Added explicit `assert session is not None`
statements for pyright type checking in async contexts
- **Parallel operations**: Use `asyncio.gather` for independent DB
operations (create_many + session update)
- **Single timestamp**: All messages in a batch share the same
`createdAt` timestamp for atomicity
### Changes
- `backend/copilot/db.py`: Collision detection with `create_many` +
retry, indexed sequence lookup, single timestamp, parallel queries
- `backend/copilot/model.py`: Added `saved_message_count` field,
simplified return types
- `backend/copilot/sdk/service.py`: MessageCounter dataclass, session
locking with refresh, config-based TTL, type narrowing
- `backend/copilot/service.py`: Updated all callers to handle new return
types
- `backend/copilot/config.py`: Increased long_running_operation_ttl to
3600s with clarified docstring
- `backend/copilot/*_test.py`: Tests updated for new signatures
---------
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
## Summary
Fixes critical reliability issues where long-running copilot sessions
were forcibly terminated and failures showed no error messages to users.
## Issues Fixed
1. **Silent failures**: Tasks failed but frontend showed "stopped" with
zero explanation
2. **Premature timeout**: Sessions auto-expired after 5 minutes even
when actively running
## Changes
### Error propagation to frontend
- Add `error_message` parameter to `mark_task_completed()`
- When `status="failed"`, publish `StreamError` before `StreamFinish` so
frontend displays reason
- Update all failure callers with specific error messages:
- Session not found: `"Session {id} not found"`
- Tool setup failed: `"Failed to setup tool {name}: {error}"`
- Task cancelled: `"Task was cancelled"`
### Remove stream timeout
- Delete `stream_timeout` config (was 300s/5min)
- Remove auto-expiry logic in `get_active_task_for_session()`
- Sessions now run indefinitely — user controls stopping via UI
## Why
**Auto-expiry was broken:**
- Used `created_at` (task start) not last activity
- SDK sessions with multiple LLM calls + subagent Tasks easily run
20-30+ minutes
- A task publishing chunks every second still got killed at 5min mark
- Hard timeout is inappropriate for long-running AI agents
**Error propagation was missing:**
- `mark_task_completed(status="failed")` only sent `StreamFinish`
- No `StreamError` event = frontend had no message to show user
- Backend logs showed errors but user saw nothing
## Test Plan
- [x] Formatter, linter, type-check pass
- [ ] Start a copilot session with Task tool (spawns subagent)
- [ ] Verify session runs beyond 5 minutes without auto-expiry
- [ ] Cancel a running session → frontend shows "Task was cancelled"
error
- [ ] Trigger a tool setup failure → frontend shows error message
- [ ] Session continues running until user clicks stop or task completes
## Files Changed
- `backend/copilot/config.py` — removed `stream_timeout`
- `backend/copilot/stream_registry.py` — removed auto-expiry, added
error propagation
- `backend/copilot/service.py` — error messages for 2 failure paths
- `backend/copilot/executor/processor.py` — error message for
cancellation
## Summary
Fixes multiple reliability issues in the copilot's Claude Agent SDK
streaming pipeline — tool outputs getting stuck, parallel tool calls
flushing prematurely, messages lost on page refresh, and SSE
reconnection failures.
## Changes
### Backend: Streaming loop rewrite (`sdk/service.py`)
- **Non-cancelling heartbeat pattern**: Replace `asyncio.timeout()` with
`asyncio.wait()` for SDK message iteration. The old approach corrupted
the SDK's internal anyio memory stream when timeouts fired
mid-`__anext__()`, causing `StopAsyncIteration` on the next call and
silently dropping all in-flight tool results.
- **Hook synchronization**: Add `wait_for_stash()` before
`convert_message()` — the SDK fires PostToolUse hooks via `start_soon()`
(fire-and-forget), so the next message can arrive before the hook
stashes its output. The new asyncio.Event-based mechanism bridges this
gap without arbitrary sleeps.
- **Error handling**: Add `asyncio.CancelledError` handling at both
inner (streaming loop) and outer (session) levels, plus pending task
cleanup in `finally` block to prevent leaked coroutines. Catch
`Exception` from `done.pop().result()` for SDK error messages.
- **Safety-net flush**: After streaming loop ends, flush any remaining
unresolved tool calls so the frontend stops showing spinners even if the
stream drops unexpectedly.
- **StreamFinish fallback**: Emit `StreamFinishStep` + `StreamFinish`
when stream ends without `ResultMessage` (StopAsyncIteration) so the
frontend transitions to "ready" state.
- **Incremental session saves**: Save session to PostgreSQL after each
tool input/output event (not just at stream end), so page refresh and
other devices see recent messages.
- **Enhanced logging**: All log lines now include `session_id[:12]`
prefix and tool call resolution state (unresolved/current/resolved
counts).
### Backend: Response adapter (`sdk/response_adapter.py`)
- **Parallel tool call support**: Skip `_flush_unresolved_tool_calls()`
when an AssistantMessage contains only ToolUseBlocks (parallel
continuation) — the prior tools are still executing concurrently and
haven't finished yet.
- **Duplicate output prevention**: Skip already-resolved tool results in
both UserMessage (ToolResultBlock) and parent_tool_use_id handling to
prevent duplicate `StreamToolOutputAvailable` events.
- **`has_unresolved_tool_calls` property**: Used by the streaming loop
to decide whether to wait for PostToolUse hooks.
- **`session_id` parameter**: Passed through for structured logging.
### Backend: Hook synchronization (`sdk/tool_adapter.py`)
- **`_stash_event` ContextVar**: asyncio.Event signaled by
`stash_pending_tool_output()` whenever a PostToolUse hook stashes
output.
- **`wait_for_stash()`**: Awaits the event with configurable timeout —
replaces the racy "hope the hook finished" approach.
### Backend: Security hooks (`sdk/security_hooks.py`)
- Enhanced logging in `post_tool_use_hook` — log whether tool is
built-in, preview of stashed output, warning when `tool_response` is
None.
### Backend: Incremental save optimization (`model.py`)
- **`existing_message_count` parameter** on `upsert_chat_session`: Skip
the DB query to count existing messages when the caller already tracks
this (streaming loop).
- **`skip_existence_check` parameter** on `_save_session_to_db`: Skip
the `get_chat_session` existence query when we know the session row
already exists. Reduces from 4 DB round trips to 2 per incremental save.
### Backend: SDK version bump (`pyproject.toml`, `poetry.lock`)
- Bump `claude-agent-sdk` from `^0.1.0` to `^0.1.39`.
### Backend: New tests
- **`sdk_compat_test.py`** (new file): SDK compatibility tests — verify
the installed SDK exposes every class, attribute, and method the copilot
integration relies on. Catches SDK upgrade breakage immediately.
- **`response_adapter_test.py`**: 9 new tests covering
flush-at-ResultMessage, flush-at-next-AssistantMessage, stashed output
flush, wait_for_stash signaling/timeout/fast-path, parallel tool call
non-premature-flush, text-message flush of prior tools, and
already-resolved tool skip in UserMessage.
### Frontend: Session hydration (`convertChatSessionToUiMessages.ts`)
- **`isComplete` option**: When session has no active stream, mark
dangling tool calls (no output in DB) as `output-available` with empty
output — stops stale spinners after page refresh.
### Frontend: Chat session hook (`useChatSession.ts`)
- Reorder `hasActiveStream` memo before `hydratedMessages` so
`isComplete` flag is available.
- Pass `{ isComplete: !hasActiveStream }` to
`convertChatSessionMessagesToUiMessages`.
### Frontend: Copilot page hook (`useCopilotPage.ts`)
- **Cache invalidation on stream end**: Invalidate React Query session
cache when stream transitions active → idle, so next hydration fetches
fresh messages from backend (staleTime: Infinity otherwise keeps stale
data).
- **Resume ref reset**: Reset `hasResumedRef` on stream end to allow
re-resume if SSE drops but backend task is still running.
- **Remove old `resolveInProgressTools` effect**: Replaced by
backend-side safety-net flush + hydration-time `isComplete` marking.
## Test plan
- [ ] Existing copilot tests pass (`pytest backend/copilot/ -x -q`)
- [ ] SDK compat tests pass (`pytest
backend/copilot/sdk/sdk_compat_test.py -v`)
- [ ] Tool outputs (bash_exec, web_fetch, WebSearch) appear in the UI
instead of getting stuck
- [ ] Parallel tool calls (e.g. multiple WebSearch) complete and display
results without premature flush
- [ ] Page refresh during active stream reconnects and recovers messages
- [ ] Opening session from another device shows recent tool results
- [ ] SSE drop → automatic reconnection without losing messages
- [ ] Long-running tools (create_agent) still delegate to background
infrastructure
The todo card rendered by GenericTool was collapsed by default,
requiring users to click to see their checklist items. Now passes
`defaultExpanded` when the category is `"todo"` so the task list is
immediately visible.
**File changed:**
`autogpt_platform/frontend/src/app/(platform)/copilot/tools/GenericTool/GenericTool.tsx`
Resolves [SECRT-2017](https://linear.app/autogpt/issue/SECRT-2017)
(#12082)
### Changes 🏗️
This PR removes old builder code and monitoring components as part of
the migration to the new flow editor:
- **NewControlPanel**: Simplified component by removing unused props
(`flowExecutionID`, `visualizeBeads`, `pinSavePopover`,
`pinBlocksPopover`, `nodes`, `onNodeSelect`, `onNodeHover`) and cleaned
up commented legacy code
- **Import paths**: Updated all references from
`legacy-builder/CustomNode` to `FlowEditor/nodes/CustomNode`
- **GraphContent**: Fixed type safety by properly handling
`customized_name` metadata and using `categoryColorMap` instead of
`getPrimaryCategoryColor`
- **useNewControlPanel**: Removed unused state and query parameter
handling related to old builder
- Removed dead code and commented-out imports throughout
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Verify NewControlPanel renders correctly
- [x] Test BlockMenu functionality
- [x] Test Save Control
- [x] Test Undo/Redo buttons
- [x] Verify graph search menu still works with updated imports
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<details><summary><h3>Greptile Summary</h3></summary>
This PR removes legacy builder components and monitoring page (~12,000
lines of code), simplifying `NewControlPanel` to focus only on the new
flow editor.
**Key changes:**
- Removed entire `legacy-builder/` directory (36 files) containing old
CustomNode, CustomEdge, Flow, and control components
- Deleted `/monitoring` page and all related components (9 files)
- Deleted `useAgentGraph` hook (1,043 lines) that was only used by
legacy components
- Simplified `NewControlPanel` by removing unused props
(`flowExecutionID`, `nodes`, `onNodeSelect`, etc.) and commented-out
code
- Updated imports in `NewSearchGraph` components to reference new
`FlowEditor/nodes/CustomNode` instead of deleted
`legacy-builder/CustomNode`
- Removed `/monitoring` from protected pages in `helpers.ts`
- Updated test files to remove monitoring-related test helpers
**Minor style issues:**
- `useNewControlPanel` hook returns unused state (`blockMenuSelected`)
that should be cleaned up
- Unnecessary double negation (`!!`) in `GraphContent.tsx:136`
</details>
<details><summary><h3>Confidence Score: 4/5</h3></summary>
- This PR is safe to merge with minor style improvements recommended
- The refactor is a straightforward deletion of legacy code with no
references remaining in the codebase. All imports have been updated
correctly, tests cleaned up, and routing configuration updated. The only
issues are minor unused code that could be cleaned up but won't cause
runtime errors.
- No files require special attention - the unused state in
`useNewControlPanel.ts` is a minor style issue
</details>
<details><summary><h3>Sequence Diagram</h3></summary>
```mermaid
sequenceDiagram
participant User
participant NewControlPanel
participant BlockMenu
participant NewSaveControl
participant UndoRedoButtons
participant Store as blockMenuStore (Zustand)
Note over NewControlPanel: Simplified component (removed props & legacy code)
User->>NewControlPanel: Render
NewControlPanel->>useNewControlPanel: Call hook (unused return)
NewControlPanel->>BlockMenu: Render
BlockMenu->>Store: Access state via useBlockMenuStore
Store-->>BlockMenu: Return search, filters, etc.
NewControlPanel->>NewSaveControl: Render
NewControlPanel->>UndoRedoButtons: Render
Note over NewControlPanel,Store: State management moved from hook to Zustand store
Note over User: Legacy components (CustomNode, Flow, etc.) completely removed
```
</details>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
## Summary
- **Workspace file tools**: `write_workspace_file` now accepts plain
text `content`, `source_path` (copy from ephemeral disk), and graceful
fallback for invalid base64. `read_workspace_file` gains `save_to_path`
to download workspace files to the ephemeral working directory. Both
validate paths against session-specific ephemeral directory.
- **Context reconstruction**: `_format_conversation_context` now
includes tool call summaries and tool results (not just user/assistant
text), fixing agent amnesia when transcript is unavailable or stale.
- **Transcript upload protection**: Moved transcript upload from inside
the inner `try` block to the `finally` block, ensuring it always runs
even on streaming exceptions — prevents transcript loss that caused
staleness on subsequent turns.
- **Agent inactivity timeout**: Configurable timeout (default 300s)
kills hung Claude agents that stop producing SDK messages.
- **SDK system prompt**: Restructured with clear sections for shell
commands, two storage systems, file transfer workflows, and long-running
tools.
- **Path validation hardening**: `_validate_ephemeral_path` uses
`os.path.realpath` for both session dir and target path, fixing macOS
`/tmp` → `/private/tmp` symlink mismatch. Empty-string params normalised
to `None` to prevent dispatch assertion failures.
## Test plan
- [x] `_format_conversation_context` — empty, user, assistant, tool
calls, tool results, full conversation (query_builder_test.py)
- [x] `_build_query_message` — resume up-to-date, stale transcript gap,
zero msg count, no resume single/multi (query_builder_test.py)
- [x] `_validate_ephemeral_path` — valid path, traversal, cross-session,
symlink escape, nested (workspace_files_test.py)
- [x] `_resolve_write_content` — no sources, multiple sources, plain
text, base64, invalid base64, source_path, not found, outside ephemeral,
empty strings (workspace_files_test.py)
- [ ] Verify transcript upload occurs even after streaming error
- [ ] Verify agent inactivity timeout kills hung agents (300s default)
---------
Co-authored-by: Otto (AGPT) <otto@agpt.co>
## Summary
- **Block background Task agents**: The SDK's `Task` tool with
`run_in_background=true` stalls the SSE stream (no messages flow while
they execute) and the agents get killed when the main agent's turn ends
and we SIGTERM the CLI. The `PreToolUse` hook now denies these and tells
the agent to run tasks in the foreground instead.
- **Add heartbeats to SDK streaming**: Replaced the `async for` loop
with an explicit async iterator + `asyncio.wait_for(15s)`. Sends
`StreamHeartbeat` when the CLI is idle (e.g. during long tool execution)
to keep SSE connections alive through proxies/LBs.
- **Fix summarization hallucination**: The `_summarize_messages_llm`
prompt forced the LLM to produce ALL 9 sections ("You MUST include
ALL"), causing fabrication when the conversation didn't have content for
every section. Changed to optional sections with explicit
anti-hallucination instructions.
## Context
Session `7a9dda34-1068-4cfb-9132-5daf8ad31253` exhibited both issues:
1. The copilot tried to spin up background agents to create files in
parallel, then stopped responding
2. On resume, the copilot hallucinated having completed a "comprehensive
competitive analysis" with "9 deliverables" that never happened
## Test plan
- [x] All 26 security hooks tests pass (3 new: background blocked,
foreground allowed, limit enforced)
- [x] All 44 prompt utility tests pass
- [x] Linting and typecheck pass
- [ ] Manual test: copilot session where agent attempts to use Task tool
— should run foreground only
- [ ] Manual test: long-running tool execution — SSE should stay alive
via heartbeats
- [ ] Manual test: resume a multi-turn session — no hallucinated context
in summary
## Summary
- The stop button was completely disconnected — clicking it only aborted
the client-side SSE fetch while the executor kept running indefinitely
- The executor already had full cancel infrastructure (RabbitMQ FANOUT
consumer, `CancelCoPilotEvent`, `threading.Event`, periodic cancel
checks) but nobody ever published a cancel message
- This PR wires up the missing pieces: a cancel REST endpoint, a publish
function, and frontend integration
## Changes
- **`executor/utils.py`**: Add `enqueue_cancel_task()` to publish
`CancelCoPilotEvent` to the existing FANOUT exchange
- **`routes.py`**: Add `POST /sessions/{session_id}/cancel` that finds
the active task, publishes cancel, and polls Redis until the task
confirms stopped (up to 10s timeout)
- **`cancel/route.ts`**: Next.js API proxy route for the cancel endpoint
- **`useCopilotPage.ts`**: Wrap AI SDK's `stop()` to also call the
backend cancel API — `sdkStop()` fires first for instant UI feedback,
then the cancel API waits for executor confirmation
## Test plan
- [ ] Start a copilot chat session and send a message
- [ ] Click "Stop generating" while streaming
- [ ] Verify executor logs show `Received cancel for {task_id}` and
`Cancelled during streaming`
- [ ] Verify the cancel endpoint returns `{"cancelled": true}` (not
timeout)
- [ ] Verify frontend transitions to idle state
- [ ] Verify clicking stop when no task is running returns
`{"cancelled": false, "reason": "no_active_task"}`
When the LLM returns multiple tool calls in a single response (e.g.
multiple web fetches for a research task), they now execute concurrently
instead of sequentially. This can dramatically reduce latency for
multi-tool turns.
**Before:** Tool calls execute one after another — 7 web fetches × 2s
each = 14s total
**After:** All tool calls fire concurrently — 7 web fetches = ~2s total
### Changes
- **`service.py`**: New `_execute_tool_calls_parallel()` function that
spawns tool calls as concurrent `asyncio` tasks, collecting stream
events via `asyncio.Queue`
- **`service.py`**: `_yield_tool_call()` now accepts an optional
`session_lock` parameter for concurrent-safe session mutations
- **`base.py`**: Session lock exposed via `contextvars` so tools that
need it can access it without interface changes
- **`run_agent.py`**: Rate-limit counters (`successful_agent_runs`,
`successful_agent_schedules`) protected with the session lock to prevent
race conditions
### Concurrency Safety
| Shared State | Risk | Mitigation |
|---|---|---|
| `session.messages` (long-running tools only) | Race on append + upsert
| `session_lock` wraps mutations |
| `session.successful_agent_runs` counter | Bypass max-runs check |
`session_lock` wraps read-check-increment |
| Tool-internal state (DB queries, API calls) | None — stateless | No
mitigation needed |
### Testing
- Added `parallel_tool_calls_test.py` with tests for:
- Parallel timing verification (sum vs max of delays)
- Single tool call regression
- Retryable error propagation
- Shared session lock verification
- Cancellation cleanup
Closes SECRT-2016
---------
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
## Summary
- Add `SUGGESTED_GOAL` response type and `SuggestedGoalResponse` model
to backend; vague/unachievable goals now return a structured suggestion
instead of a generic error
- Add `SuggestedGoalCard` frontend component (amber styling, "Use this
goal" button) that lets users accept and re-submit a refined goal in one
click
- Add error recovery buttons ("Try again", "Simplify goal") to the error
output block
- Update copilot system prompt with explicit guidance for handling
`suggested_goal` and `clarifying_questions` feedback loops
- Add `create_agent_test.py` covering all four decomposition result
types
## Test plan
- [ ] Trigger vague goal (e.g. "monitor social media") →
`SuggestedGoalCard` renders with amber styling
- [ ] Trigger unachievable goal (e.g. "read my mind") → card shows goal
type "Goal cannot be accomplished" with reason
- [ ] Click "Use this goal" → sends message and triggers new
`create_agent` call with the suggested goal
- [ ] Trigger an error → "Try again" and "Simplify goal" buttons appear
below the error
- [ ] Clarifying questions answered → LLM re-calls `create_agent` with
context (system prompt guidance)
- [ ] Backend tests pass: `poetry run pytest
backend/api/features/chat/tools/create_agent_test.py -xvs` (requires
Docker services)
<!-- greptile_comment -->
<details><summary><h3>Greptile Summary</h3></summary>
Replaced generic `ErrorResponse` with structured `SuggestedGoalResponse`
for vague/unachievable goals in the copilot agent creation flow. Added
frontend `SuggestedGoalCard` component with amber styling and "Use this
goal" button for one-click goal refinement. Enhanced system prompt with
explicit feedback loop handling for `suggested_goal` and
`clarifying_questions`. Added comprehensive test coverage for all four
decomposition result types.
**Key improvements:**
- Better UX: Users can now accept refined goals with one click instead
of manually retyping
- Clearer error recovery: Added "Try again" and "Simplify goal" buttons
to error blocks
- Structured data: Backend now returns `suggested_goal`, `reason`,
`original_goal`, and `goal_type` fields instead of embedding everything
in error messages
**Issue found:**
- The `reason` field from the backend is not being passed to or
displayed by the `SuggestedGoalCard` component, so users won't see the
explanation for why their goal was rejected (especially important for
unachievable goals where it explains what blocks are missing)
</details>
<details><summary><h3>Confidence Score: 4/5</h3></summary>
- Safe to merge after fixing the missing `reason` field in the frontend
component
- Implementation is well-structured with good test coverage and follows
established patterns. The issue with the missing `reason` field is
straightforward to fix but important for UX - users won't understand why
their goal was rejected without it. All other changes are solid: backend
properly returns structured data, tests cover all cases, and the
component integration follows the project's conventions.
-
autogpt_platform/frontend/src/app/(platform)/copilot/tools/CreateAgent/CreateAgent.tsx
and SuggestedGoalCard.tsx need the `reason` prop added
</details>
<details><summary><h3>Flowchart</h3></summary>
```mermaid
flowchart TD
Start[User submits goal to create_agent] --> Decompose[decompose_goal analyzes request]
Decompose --> CheckType{Decomposition result type?}
CheckType -->|clarifying_questions| Questions[Return ClarificationNeededResponse]
Questions --> UserAnswers[User answers questions]
UserAnswers --> Retry[Retry with context]
Retry --> Decompose
CheckType -->|vague_goal| VagueResponse[Return SuggestedGoalResponse<br/>goal_type: vague]
VagueResponse --> ShowSuggestion[Frontend: SuggestedGoalCard<br/>amber styling]
ShowSuggestion --> UserAccepts{User clicks<br/>Use this goal?}
UserAccepts -->|Yes| NewGoal[Send suggested goal]
NewGoal --> Decompose
UserAccepts -->|No| End1[User refines manually]
CheckType -->|unachievable_goal| UnachievableResponse[Return SuggestedGoalResponse<br/>goal_type: unachievable<br/>reason: missing blocks]
UnachievableResponse --> ShowSuggestion
CheckType -->|success| Generate[generate_agent creates workflow]
Generate --> SaveOrPreview{save parameter?}
SaveOrPreview -->|true| Save[Save to library<br/>AgentSavedResponse]
SaveOrPreview -->|false| Preview[AgentPreviewResponse]
CheckType -->|error| ErrorFlow[Return ErrorResponse]
ErrorFlow --> ShowError[Frontend: Show error with<br/>Try again & Simplify goal buttons]
ShowError --> UserRetry{User action?}
UserRetry -->|Try again| Decompose
UserRetry -->|Simplify goal| GetHelp[Ask LLM to simplify]
GetHelp --> Decompose
Save --> End2[Done]
Preview --> End2
End1 --> End2
```
</details>
<sub>Last reviewed commit: 2f37aee</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
### Changes 🏗️
Adds a **Pre-completion Checks (MANDATORY)** section to
`frontend/CLAUDE.md` that instructs Claude Code agents to always run the
following commands in order before reporting frontend work as done:
1. `pnpm format` — auto-fix formatting issues
2. `pnpm lint` — check for lint errors and fix them
3. `pnpm types` — check for type errors and fix them
This ensures code quality gates are enforced consistently by AI agents
working on the frontend.
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Verified `pnpm format` passes cleanly
- [x] Verified `pnpm lint` passes cleanly
- [x] Verified `pnpm types` passes cleanly
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
### Changes 🏗️
Some fixes to make running e2e more predictable...
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] e2e are imdempotent
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
## Changes 🏗️
<img width="600" height="416" alt="Screenshot 2026-02-19 at 18 05 39"
src="https://github.com/user-attachments/assets/930116ad-b611-4398-bee7-4e33ca4dc688"
/>
Make the mini game a snake 🐍 game, so we don't use assets (_possible
license issues_ ), and it's simpler...
## Checklist 📋
### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Run the app and test
## Changes 🏗️
<img width="800" height="969" alt="Screenshot 2026-02-18 at 20 30 36"
src="https://github.com/user-attachments/assets/30d7d211-98c1-4159-94e1-86e81e29ad43"
/>
- Make the spinner centred when the chat is loading
## Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Run the app and test locally
## Summary
### SDK built-in tool output forwarding
- WebSearch, Read, TodoWrite outputs now render in the frontend —
PostToolUse hook stashes outputs before SDK truncation, response adapter
flushes unresolved tool calls via `_flush_unresolved_tool_calls` +
`parent_tool_use_id` handling
- Multi-call stash upgraded to `dict[str, list[str]]` to support
multiple calls to the same built-in tool in one turn
### Transcript-based `--resume` with staleness detection
- Simplified to single upload block after `async with` (Stop hook +
`appendFileSync` guarantees), extracted `_try_upload_transcript` helper
- **NEW**: `message_count` watermark + timestamp metadata stored
alongside transcript — on the next turn, detects staleness and
compresses only the missed messages instead of the full history (hybrid:
transcript via `--resume` + compressed gap)
- Removed redundant dual-strategy code and dead
`find_cli_transcript`/`read_fallback_transcript` functions
### Frontend stream reconnection
- **NEW**: Enabled `resume: true` on `useChat` with
`prepareReconnectToStreamRequest` — page refresh reconnects to active
backend streams via Redis replay (backend `resume_session_stream`
endpoint was already wired up)
### GenericTool.tsx UI overhaul
- Tool-specific icons (terminal, globe, file, search, edit, checklist)
with category-based display
- TodoWrite checklist rendering with status indicators
(completed/in-progress/pending)
- WebSearch/MCP content display via `extractMcpText` for MCP-style
content blocks + raw JSON fallback
- Defensive TodoItem filter per coderabbit review
- Proper accordion content per tool category (bash, web, file, search,
edit, todo)
### Image support
- MCP tool results now include `{"type": "image"}` content blocks when
workspace file responses contain `content_base64` with image MIME types
### Security & cleanup
- `AskUserQuestion` added to `SDK_DISALLOWED_TOOLS` (interactive CLI
tool, no terminal in copilot)
- 36 per-operation `[TIMING]`/`[TASK_LOOKUP]` diagnostic logs downgraded
info→debug
- Silent exception fixes: warning logs for swallowed errors in
stream_registry + service
## Test plan
- [ ] Verify copilot multi-turn conversations use `--resume` (check logs
for `Using --resume`)
- [ ] Verify stale transcript detection fills gap (check logs for
`Transcript stale: covers N of M messages`)
- [ ] Verify page refresh reconnects to active stream (check network tab
for GET to `/stream` returning SSE)
- [ ] Verify WebSearch, Read, TodoWrite tool outputs render in frontend
accordion
- [ ] Verify GenericTool icons and accordion content display correctly
for each tool type
- [ ] Verify production log volume is reduced (no more `[TIMING]` at
info level)
---------
Co-authored-by: Otto (AGPT) <otto@agpt.co>
## Summary
Replaces the "Advanced" switch/toggle on builder nodes with a chevron
control, matching the UX pattern used for the outputs section.
Resolves
[OPEN-3006](https://linear.app/autogpt/issue/OPEN-3006/replace-advanced-switch-with-chevron-on-builder-nodes)
Before
<img width="443" height="348" alt="Screenshot 2026-02-17 at 9 01 31 pm"
src="https://github.com/user-attachments/assets/40e98669-3136-4e53-8d46-df18ea32c4d7"
/>
After
<img width="443" height="348" alt="Screenshot 2026-02-17 at 9 00 21 pm"
src="https://github.com/user-attachments/assets/0836e3ac-1d0a-43d7-9392-c9d5741b32b6"
/>
## Changes
- `NodeAdvancedToggle.tsx` — Replaced switch component with a chevron
expand/collapse toggle
## Testing
Tested and verified by @kpczerwinski
<!-- greptile_comment -->
<details><summary><h3>Greptile Summary</h3></summary>
Replaces the `Switch` toggle for the "Advanced" section on builder nodes
with a chevron (`CaretDownIcon`) expand/collapse control, matching the
existing UX pattern used in `OutputHandler.tsx`. The change is clean and
consistent with the codebase.
- Swapped `Switch` component for a ghost `Button` + `CaretDownIcon` with
a `rotate-180` transition for visual feedback
- Pattern closely mirrors the output section toggle in
`OutputHandler.tsx` (lines 120-136)
- Removed the top border separator and rounded bottom corners from the
container, adjusting the visual spacing
- Toggle logic correctly inverts the `showAdvanced` boolean state
- Uses Phosphor Icons and design system components per project
conventions
</details>
<details><summary><h3>Confidence Score: 5/5</h3></summary>
- This PR is safe to merge — it is a small, focused UI change with no
logic or security concerns.
- Single file changed with a straightforward UI component swap. The new
implementation follows an established pattern already in use in
OutputHandler.tsx. Toggle logic is correct and all conventions (Phosphor
Icons, design system components, Tailwind styling) are followed.
- No files require special attention.
</details>
<details><summary><h3>Sequence Diagram</h3></summary>
```mermaid
sequenceDiagram
participant User
participant NodeAdvancedToggle
participant nodeStore
User->>NodeAdvancedToggle: Click chevron button
NodeAdvancedToggle->>nodeStore: setShowAdvanced(nodeId, !showAdvanced)
nodeStore-->>NodeAdvancedToggle: Updated showAdvanced state
NodeAdvancedToggle->>NodeAdvancedToggle: Rotate CaretDownIcon (0° ↔ 180°)
Note over NodeAdvancedToggle: Advanced fields shown/hidden via FormCreator
```
</details>
<sub>Last reviewed commit: ad66080</sub>
<!-- greptile_other_comments_section -->
**Context used:**
- Context from `dashboard` - autogpt_platform/frontend/CLAUDE.md
([source](https://app.greptile.com/review/custom-context?memory=39861924-d320-41ba-a1a7-a8bff44f780a))
- Context from `dashboard` -
autogpt_platform/frontend/src/app/(platform)/build/components/FlowEditor/ARCHITECTURE_FLOW_EDITOR.md
([source](https://app.greptile.com/review/custom-context?memory=0c5511fe-9aeb-4cf1-bbe9-798f2093b748))
<!-- /greptile_comment -->
---------
Co-authored-by: Krzysztof Czerwinski <kpczerwinski@gmail.com>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Ubbe <0ubbe@users.noreply.github.com>
Co-authored-by: Ubbe <hi@ubbe.dev>
Two targeted changes to the CoPilot feature request tools:
1. **Remove description from search results** — The
`search_feature_requests` tool no longer returns issue descriptions.
Only the title is needed for duplicate detection, reducing unnecessary
data exposure.
2. **Prevent PII in created issues** — Updated the
`create_feature_request` tool description and parameter descriptions to
explicitly instruct the LLM to never include personally identifiable
information (names, emails, company names, etc.) in Linear issue titles
and descriptions.
Resolves [SECRT-2010](https://linear.app/autogpt/issue/SECRT-2010)
The copilot executor runs each worker in its own thread with a dedicated
event loop (`asyncio.new_event_loop()`). `aiohttp.ClientSession` is
bound to the event loop where it was created — using it from a different
loop causes `asyncio.timeout()` to fail with:
```
RuntimeError: Timeout context manager should be used inside a task
```
This was the root cause of transcript upload failures tracked in
SECRT-2009 and [Sentry
#7272473694](https://significant-gravitas.sentry.io/issues/7272473694/).
### Fix
**One `GCSWorkspaceStorage` instance per event loop** instead of a
single shared global.
- `get_workspace_storage()` now returns a per-loop GCS instance (keyed
by `id(asyncio.get_running_loop())`). Local storage remains shared since
it has no async I/O.
- `shutdown_workspace_storage()` closes the instance for the **current**
loop only, so `session.close()` always runs on the loop that created the
session.
- `CoPilotProcessor.cleanup()` shuts down workspace storage on the
worker's own loop, then stops the loop.
- Manager cleanup submits `cleanup_worker` to each thread pool worker
before shutting down the executor — replacing the old approach of
creating a temporary event loop that couldn't close cross-loop sessions.
### Changes
| File | Change |
|------|--------|
| `util/workspace_storage.py` | `GCSWorkspaceStorage` back to simple
single-session class; `get_workspace_storage()` returns per-loop GCS
instance; `shutdown_workspace_storage()` scoped to current loop |
| `copilot/executor/processor.py` | Added `CoPilotProcessor.cleanup()`
and `cleanup_worker()` |
| `copilot/executor/manager.py` | Calls `cleanup_worker` on each thread
pool worker during shutdown |
Fixes SECRT-2009
---------
Co-authored-by: Reinier van der Leer <pwuts@agpt.co>
## Summary
Fixes two bugs in the copilot executor:
### SECRT-2008: WorkspaceManager bypasses db_accessors
`backend/util/workspace.py` imported 6 workspace functions directly from
`backend/data/workspace.py`, which call `prisma()` directly. In the
copilot executor (no Prisma connection), these fail.
**Fix:** Replace direct imports with `workspace_db()` from
`db_accessors`, routing through the database_manager HTTP client when
Prisma is unavailable. Also:
- Register all 6 workspace functions in `DatabaseManager` and
`DatabaseManagerAsyncClient`
- Add `UniqueViolationError` to the service `EXCEPTION_MAPPING` so it's
properly re-raised over HTTP (needed for race-condition retry logic)
### SECRT-2009: Transcript upload asyncio.timeout error
`asyncio.create_task()` at line 696 of `service.py` creates an orphaned
background task in the executor's thread event loop.
`gcloud-aio-storage`'s `asyncio.timeout()` fails in this context.
**Fix:** Replace `create_task` with direct `await`. The upload runs
after streaming completes (all chunks already yielded), so no
user-facing latency impact. The function already has internal try/except
error handling.
Uncouple Copilot task execution from the REST API server. This should
help performance and scalability, and allows task execution to continue
regardless of the state of the user's connection.
- Resolves#12023
### Changes 🏗️
- Add `backend.copilot.executor`->`CoPilotExecutor` (setup similar to
`backend.executor`->`ExecutionManager`).
This executor service uses RabbitMQ-based task distribution, and sticks
with the existing Redis Streams setup for task output. It uses a cluster
lock mechanism to ensure a task is only executed by one pod, and the
`DatabaseManager` for pooled DB access.
- Add `backend.data.db_accessors` for automatic choice of direct/proxied
DB access
Chat requests now flow: API → RabbitMQ → CoPilot Executor → Redis
Streams → SSE Client. This enables horizontal scaling of chat processing
and isolates long-running LLM operations from the API service.
- Move non-API Copilot stuff into `backend.copilot` (from
`backend.api.features.chat`)
- Updated import paths for all usages
- Move `backend.executor.database` to `backend.data.db_manager` and add
methods for copilot executor
- Updated import paths for all usages
- Make `backend.copilot.db` RPC-compatible (-> DB ops return ~~Prisma~~
Pydantic models)
- Make `backend.data.workspace` RPC-compatible
- Make `backend.data.graphs.get_store_listed_graphs` RPC-compatible
DX:
- Add `copilot_executor` service to Docker setup
Config:
- Add `Config.num_copilot_workers` (default 5) and
`Config.copilot_executor_port` (default 8008)
- Remove unused `Config.agent_server_port`
> [!WARNING]
> **This change adds a new microservice to the system, with entrypoint
`backend.copilot.executor`.**
> The `docker compose` setup has been updated, but if you run the
Platform on something else, you'll have to update your deployment config
to include this new service.
>
> When running locally, the `CoPilotExecutor` uses port 8008 by default.
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Copilot works
- [x] Processes messages when triggered
- [x] Can use its tools
#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)
---------
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
## Summary
Addresses SENTRY-1051: Shiki warning about multiple highlighter
instances.
## Problem
The `@streamdown/code` package creates a **new Shiki highlighter for
each language** encountered. When users view AI chat responses with code
blocks in multiple languages (JavaScript, Python, JSON, YAML, etc.),
this creates 10+ highlighter instances, triggering Shiki's warning:
> "10 instances have been created. Shiki is supposed to be used as a
singleton, consider refactoring your code to cache your highlighter
instance"
This causes memory bloat and performance degradation.
## Solution
Introduced a custom code highlighting plugin that properly implements
the singleton pattern:
### New files:
- `src/lib/shiki-highlighter.ts` - Singleton highlighter management
- `src/lib/streamdown-code-plugin.ts` - Drop-in replacement for
`@streamdown/code`
### Key features:
- **Single shared highlighter** - One instance serves all code blocks
- **Preloaded common languages** - JS, TS, Python, JSON, Bash, YAML,
etc.
- **Lazy loading** - Additional languages loaded on demand
- **Result caching** - Avoids re-highlighting identical code blocks
### Changes:
- Added `shiki` as direct dependency
- Updated `message.tsx` to use the new plugin
## Testing
- [ ] Verify code blocks render correctly in AI chat
- [ ] Confirm no Shiki singleton warnings in console
- [ ] Test with multiple languages in same conversation
## Related
- Linear: SENTRY-1051
- Sentry: Multiple Shiki instances warning
<!-- greptile_comment -->
<details><summary><h3>Greptile Summary</h3></summary>
Replaced `@streamdown/code` with a custom singleton-based Shiki
highlighter implementation to resolve memory bloat from creating
multiple highlighter instances per language. The new implementation
creates a single shared highlighter with preloaded common languages (JS,
TS, Python, JSON, etc.) and lazy-loads additional languages on demand.
Results are cached to avoid re-highlighting identical code blocks.
**Key changes:**
- Added `shiki` v3.21.0 as a direct dependency
- Created `shiki-highlighter.ts` with singleton pattern and language
management utilities
- Created `streamdown-code-plugin.ts` as a drop-in replacement for
`@streamdown/code`
- Updated `message.tsx` to import from the new plugin instead of
`@streamdown/code`
The implementation follows React best practices with async highlighting
and callback-based notifications. The cache key uses code length +
prefix/suffix for efficient lookups on large code blocks.
</details>
<details><summary><h3>Confidence Score: 4/5</h3></summary>
- Safe to merge with minor considerations for edge cases
- The implementation is solid with proper singleton pattern, caching,
and async handling. The code is well-structured and addresses the stated
problem. However, there's a subtle potential race condition in the
callback handling where multiple concurrent requests for the same cache
key could trigger duplicate highlight operations before the first
completes. The cache key generation using prefix/suffix could
theoretically cause false cache hits for large files with identical
prefixes and suffixes. Despite these edge cases, the implementation
should work correctly for the vast majority of use cases.
- No files require special attention
</details>
<details><summary><h3>Sequence Diagram</h3></summary>
```mermaid
sequenceDiagram
participant UI as Streamdown Component
participant Plugin as Custom Code Plugin
participant Cache as Token Cache
participant Singleton as Shiki Highlighter (Singleton)
participant Callbacks as Pending Callbacks
UI->>Plugin: highlight(code, lang)
Plugin->>Cache: Check cache key
alt Cache hit
Cache-->>Plugin: Return cached result
Plugin-->>UI: Return highlighted tokens
else Cache miss
Plugin->>Callbacks: Register callback
Plugin->>Singleton: Get highlighter instance
alt First call
Singleton->>Singleton: Create highlighter with preloaded languages
end
Singleton-->>Plugin: Return highlighter
alt Language not loaded
Plugin->>Singleton: Load language dynamically
end
Plugin->>Singleton: codeToTokens(code, lang, themes)
Singleton-->>Plugin: Return tokens
Plugin->>Cache: Store result
Plugin->>Callbacks: Notify all waiting callbacks
Callbacks-->>UI: Async callback with result
end
```
</details>
<sub>Last reviewed commit: 96c793b</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
## Summary
Updates the session delete confirmation in CoPilot to use the new
`Dialog` component from `molecules/Dialog` instead of the legacy
`DeleteConfirmDialog`.
## Changes
- **ChatSidebar**: Use Dialog component for delete confirmation
(desktop)
- **CopilotPage**: Use Dialog component for delete confirmation (mobile)
## Behavior
- Dialog stays **open** during deletion with loading state on button
- Cancel button **disabled** while delete is in progress
- Delete button shows **loading spinner** during deletion
- Dialog only closes on successful delete or when cancel is clicked (if
not deleting)
## Screenshots
*Dialog uses the same styling as other molecules/Dialog instances in the
app*
## Requested by
@0ubbe
<!-- greptile_comment -->
<details><summary><h3>Greptile Summary</h3></summary>
Replaces the legacy `DeleteConfirmDialog` component with the new
`molecules/Dialog` component for session delete confirmations in both
desktop (ChatSidebar) and mobile (CopilotPage) views. The new
implementation maintains the same behavior: dialog stays open during
deletion with a loading state on the delete button and disabled cancel
button, closing only on successful deletion or cancel click.
</details>
<details><summary><h3>Confidence Score: 5/5</h3></summary>
- This PR is safe to merge with minimal risk
- This is a straightforward component replacement that maintains the
same behavior and UX. The Dialog component API is properly used with
controlled state, the loading states are correctly implemented, and both
mobile and desktop views are handled consistently. The changes are
well-tested patterns used elsewhere in the codebase.
- No files require special attention
</details>
<details><summary><h3>Flowchart</h3></summary>
```mermaid
flowchart TD
A[User clicks delete button] --> B{isMobile?}
B -->|Yes| C[CopilotPage Dialog]
B -->|No| D[ChatSidebar Dialog]
C --> E[Set sessionToDelete state]
D --> E
E --> F[Dialog opens with controlled.isOpen]
F --> G{User action?}
G -->|Cancel| H{isDeleting?}
H -->|No| I[handleCancelDelete: setSessionToDelete null]
H -->|Yes| J[Cancel button disabled]
G -->|Confirm Delete| K[handleConfirmDelete called]
K --> L[deleteSession mutation]
L --> M[isDeleting = true]
M --> N[Button shows loading spinner]
M --> O[Cancel button disabled]
L --> P{Mutation result?}
P -->|Success| Q[Invalidate sessions query]
Q --> R[Clear sessionId if current]
R --> S[setSessionToDelete null]
S --> T[Dialog closes]
P -->|Error| U[Show toast error]
U --> V[setSessionToDelete null]
V --> W[Dialog closes]
```
</details>
<sub>Last reviewed commit: 275950c</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
---------
Co-authored-by: Lluis Agusti <hi@llu.lu>
Co-authored-by: Ubbe <hi@ubbe.dev>
The `LINEAR_API_KEY` environment variable name is too generic — it
matches the key name used by integrations/blocks, meaning that if set
globally, it could inadvertently grant all users access to Linear
through the blocks system rather than restricting it to the copilot
feature-request tool.
This renames the setting to `COPILOT_LINEAR_API_KEY` to make it clear
this key is scoped exclusively to the copilot's feature-request
functionality, preventing it from being picked up as a general-purpose
Linear credential.
### Changes 🏗️
- Renamed `linear_api_key` → `copilot_linear_api_key` in `Secrets`
settings model (`backend/util/settings.py`)
- Updated all references in the copilot feature-request tool
(`backend/api/features/chat/tools/feature_requests.py`)
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Verified the rename is consistent across all references (settings
+ feature_requests tool)
- [x] No other files reference the old `linear_api_key` setting name
#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)
> **Note:** The env var changes from `LINEAR_API_KEY` to
`COPILOT_LINEAR_API_KEY`. Any deployment using the old name will need to
update accordingly.
<!-- greptile_comment -->
<details><summary><h3>Greptile Summary</h3></summary>
Renamed `LINEAR_API_KEY` to `COPILOT_LINEAR_API_KEY` in settings and the
copilot feature-request tool to prevent unintended access through Linear
blocks.
**Key changes:**
- Updated `Secrets.linear_api_key` → `Secrets.copilot_linear_api_key` in
`backend/util/settings.py`
- Updated all references in
`backend/api/features/chat/tools/feature_requests.py`
- The rename prevents the copilot Linear key from being picked up by the
Linear blocks integration (which uses `LINEAR_API_KEY` via
`ProviderBuilder` in `backend/blocks/linear/_config.py`)
**Issues found:**
- `.env.default` still references `LINEAR_API_KEY` instead of
`COPILOT_LINEAR_API_KEY`
- Frontend styleguide has a hardcoded error message with the old
variable name
</details>
<details><summary><h3>Confidence Score: 3/5</h3></summary>
- Generally safe but requires fixing `.env.default` before deployment
- The code changes are correct and achieve the intended security
improvement by preventing scope leakage. However, the PR is incomplete -
`.env.default` wasn't updated (critical for deployment) and a frontend
error message reference was missed. These issues will cause
configuration problems for anyone deploying with the new variable name.
- Check `autogpt_platform/backend/.env.default` and
`autogpt_platform/frontend/src/app/(platform)/copilot/styleguide/page.tsx`
- both need updates to match the renamed variable
</details>
<details><summary><h3>Flowchart</h3></summary>
```mermaid
flowchart TD
A[".env file<br/>COPILOT_LINEAR_API_KEY"] --> B["Secrets model<br/>copilot_linear_api_key"]
B --> C["feature_requests.py<br/>_get_linear_config()"]
C --> D["Creates APIKeyCredentials<br/>for copilot feature requests"]
E[".env file<br/>LINEAR_API_KEY"] --> F["ProviderBuilder<br/>in blocks/linear/_config.py"]
F --> G["Linear blocks integration<br/>for user workflows"]
style A fill:#90EE90
style B fill:#90EE90
style C fill:#90EE90
style D fill:#90EE90
style E fill:#FFD700
style F fill:#FFD700
style G fill:#FFD700
```
</details>
<sub>Last reviewed commit: 86dc57a</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Without specifying an explicit build target it would build the `migrate` stage because it is the last stage in the Dockerfile. This caused deployment failures.
- Follow-up to #12124 and 074be7ae
- Follow-up to #12124
Changes:
- Update `run` commands for all backend services in `docker-compose.platform.yml` to match the deployment commands used in production
- Add trigger on `docker-compose(.platform)?.yml` changes to the Frontend CI workflow
## Summary
Upgrades RabbitMQ from the end-of-life `rabbitmq:3.12-management` to
`rabbitmq:4.1.4`, aligning CI, local dev, and e2e testing with
production.
## Changes
### CI Workflow (`.github/workflows/platform-backend-ci.yml`)
- **Image:** `rabbitmq:3.12-management` → `rabbitmq:4.1.4`
- **Port:** Removed 15672 (management UI) — not used
- **Health check:** Added to prevent flaky tests from race conditions
during startup
### Docker Compose (`docker-compose.platform.yml`,
`docker-compose.test.yaml`)
- **Image:** `rabbitmq:management` → `rabbitmq:4.1.4`
- **Port:** Removed 15672 (management UI) — not used
## Why
- RabbitMQ 3.12 is EOL
- We don't use the management interface, so `-management` variant is
unnecessary
- CI and local dev/e2e should match production (4.1.4)
## Testing
CI validates that backend tests pass against RabbitMQ 4.1.4 on Python
3.11, 3.12, and 3.13.
---
Closes SECRT-1703
[SECRT-2006: Dev deployment failing: poetry not found in container
PATH](https://linear.app/autogpt/issue/SECRT-2006)
- Follow-up to #12090
### Changes 🏗️
- Remove now-broken Poetry path config values
- Remove usage of now-broken `poetry run` in container run command
- Add trigger on `backend/Dockerfile` changes to Frontend CI workflow
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- If it works, CI will pass
## Summary
Adds the ability to delete chat sessions from the CoPilot interface.
## Changes
### Backend
- Add `DELETE /api/chat/sessions/{session_id}` endpoint in `routes.py`
- Returns 204 on success, 404 if not found or not owned by user
- Reuses existing `delete_chat_session` function from `model.py`
### Frontend
- Add delete button (trash icon) that appears on hover for each chat
session
- Add confirmation dialog before deletion using existing
`DeleteConfirmDialog` component
- Refresh session list after successful delete
- Clear current session selection if the deleted session was active
- Update OpenAPI spec with new endpoint
## Testing
1. Hover over a chat session in sidebar → trash icon appears
2. Click trash icon → confirmation dialog
3. Confirm deletion → session removed, list refreshes
4. If deleted session was active, selection is cleared
## Screenshots
Delete button appears on hover, confirmation dialog on click.
## Related Issues
Closes SECRT-1928
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<details><summary><h3>Greptile Summary</h3></summary>
Adds the ability to delete chat sessions from the CoPilot interface — a
new `DELETE /api/chat/sessions/{session_id}` backend endpoint and a
corresponding delete button with confirmation dialog in the
`ChatSidebar` frontend component.
- **Backend route** (`routes.py`): Clean implementation reusing the
existing `delete_chat_session` model function with proper auth guards
and 204/404 responses. No issues.
- **Frontend** (`ChatSidebar.tsx`): Adds hover-visible trash icon per
session, confirmation dialog, mutation with cache invalidation, and
active session clearing on delete. However, it uses a `__legacy__`
component (`DeleteConfirmDialog`) which violates the project's style
guide — new code should use the modern design system components. Error
handling only logs to console without user-facing feedback (project
convention is to use toast notifications for mutation errors).
`isDeleting` is destructured but unused.
- **OpenAPI spec** updated correctly.
- **Unrelated file included**:
`notes/plan-SECRT-1959-graph-edge-desync.md` is a planning document for
a different ticket and should be removed from this PR. The `notes/`
directory is newly introduced and both plan files should be reconsidered
for inclusion.
</details>
<details><summary><h3>Confidence Score: 3/5</h3></summary>
- Functionally correct but has style guide violations and includes
unrelated files that should be addressed before merge.
- The core feature implementation (backend DELETE endpoint and frontend
mutation logic) is sound and follows existing patterns. Score is lowered
because: (1) the frontend uses a legacy component explicitly prohibited
by the project's style guide, (2) mutation errors are not surfaced to
the user, and (3) the PR includes an unrelated planning document for a
different ticket.
- Pay close attention to `ChatSidebar.tsx` for the legacy component
import and error handling, and
`notes/plan-SECRT-1959-graph-edge-desync.md` which should be removed.
</details>
<details><summary><h3>Sequence Diagram</h3></summary>
```mermaid
sequenceDiagram
participant User
participant ChatSidebar as ChatSidebar (Frontend)
participant ReactQuery as React Query
participant API as DELETE /api/chat/sessions/{id}
participant Model as model.delete_chat_session
participant DB as db.delete_chat_session (Prisma)
participant Redis as Redis Cache
User->>ChatSidebar: Click trash icon on session
ChatSidebar->>ChatSidebar: Show DeleteConfirmDialog
User->>ChatSidebar: Confirm deletion
ChatSidebar->>ReactQuery: deleteSession({ sessionId })
ReactQuery->>API: DELETE /api/chat/sessions/{session_id}
API->>Model: delete_chat_session(session_id, user_id)
Model->>DB: delete_many(where: {id, userId})
DB-->>Model: bool (deleted count > 0)
Model->>Redis: Delete session cache key
Model->>Model: Clean up session lock
Model-->>API: True
API-->>ReactQuery: 204 No Content
ReactQuery->>ChatSidebar: onSuccess callback
ChatSidebar->>ReactQuery: invalidateQueries(sessions list)
ChatSidebar->>ChatSidebar: Clear sessionId if deleted was active
```
</details>
<sub>Last reviewed commit: 44a92c6</sub>
<!-- greptile_other_comments_section -->
<details><summary><h4>Context used (3)</h4></summary>
- Context from `dashboard` - autogpt_platform/frontend/CLAUDE.md
([source](https://app.greptile.com/review/custom-context?memory=39861924-d320-41ba-a1a7-a8bff44f780a))
- Context from `dashboard` - autogpt_platform/frontend/CONTRIBUTING.md
([source](https://app.greptile.com/review/custom-context?memory=cc4f1b17-cb5c-4b63-b218-c772b48e20ee))
- Context from `dashboard` - autogpt_platform/CLAUDE.md
([source](https://app.greptile.com/review/custom-context?memory=6e9dc5dc-8942-47df-8677-e60062ec8c3a))
</details>
<!-- /greptile_comment -->
---------
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
## Changes 🏗️
Make the UX nicer when running long tasks in Copilot, like creating an
agent, editing it or running a task.
## Checklist 📋
### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Run locally and play the game!
<!-- greptile_comment -->
<details><summary><h3>Greptile Summary</h3></summary>
This PR replaces the static progress bar and idle wait screens with an
interactive mini-game across the Create, Edit, and Run Agent copilot
tools. The existing mini-game (a simple runner with projectile-dodge
boss encounters) is significantly overhauled into a two-mode game: a
runner mode with animated tree obstacles and a duel mode featuring a
melee boss fight with attack, guard, and movement mechanics.
Sprite-based rendering replaces the previous shape-drawing approach.
- **Create/Edit/Run Agent UX**: All three tool views now show the
mini-game with contextual overlays during long-running operations,
replacing the progress bar in EditAgent and adding the game to RunAgent
- **Game mechanics overhaul**: Boss encounters changed from
projectile-dodging to melee duel with attack (Z), block (X), movement
(arrows), and jump (Space) controls
- **Sprite rendering**: Added 9 sprite sheet assets for characters,
trees, and boss animations with fallback to shape rendering if images
fail to load
- **UI overlays**: Added React-managed overlay states for idle,
boss-intro, boss-defeated, and game-over screens with continue/retry
buttons
- **Minor issues found**: Unused `isRunActive` variable in
`MiniGame.tsx`, unreachable "leaving" boss phase in `useMiniGame.ts`,
and a missing `expanded` property in `getAccordionMeta` return type
annotation in `EditAgent.tsx`
- **Unused asset**: `archer-shoot.png` is included in the PR but never
imported or referenced in any code
</details>
<details><summary><h3>Confidence Score: 4/5</h3></summary>
- This PR is safe to merge — it only affects the copilot mini-game UX
with no backend or data model changes.
- The changes are entirely frontend/cosmetic, scoped to the copilot
tools' waiting UX. The mini-game logic is self-contained in a
canvas-based hook and doesn't affect any application state, API calls,
or routing. The issues found are minor (unused variable, dead code, type
annotation gap, unused asset) and don't impact runtime behavior.
- `useMiniGame.ts` has the most complex logic changes (boss AI, death
animations, sprite rendering) and contains unreachable dead code in the
"leaving" phase handler. `EditAgent.tsx` has a return type annotation
that doesn't include `expanded`.
</details>
<details><summary><h3>Flowchart</h3></summary>
```mermaid
flowchart TD
A[Game Idle] -->|"Start button"| B[Run Mode]
B -->|"Jump over trees"| C{Score >= Threshold?}
C -->|No| B
C -->|"Yes, obstacles clear"| D[Boss Intro Overlay]
D -->|"Continue button"| E[Duel Mode]
E -->|"Attack Z / Guard X / Move ←→"| F{Boss HP <= 0?}
F -->|No| G{Player hit & not guarding?}
G -->|No| E
G -->|Yes| H[Player Death Animation]
H --> I[Game Over Overlay]
I -->|"Retry button"| B
F -->|Yes| J[Boss Death Animation]
J --> K[Boss Defeated Overlay]
K -->|"Continue button"| L[Reset Boss & Resume Run]
L --> B
```
</details>
<sub>Last reviewed commit: ad80e24</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
## Summary
- Removes the deprecated `OldAgentLibraryView` directory (13 files,
~2200 lines deleted)
- Removes the `NEW_AGENT_RUNS` feature flag from the `Flag` enum and
defaults
- Removes the legacy agent library page at `library/legacy/[id]`
- Moves shared `CronScheduler` components to
`src/components/contextual/CronScheduler/`
- Moves `agent-run-draft-view` and `agent-status-chip` to
`legacy-builder/` (co-located with their only consumer)
- Updates all import paths in consuming files (`AgentInfoStep`,
`SaveControl`, `RunnerInputUI`, `useRunGraph`)
## Test plan
- [x] `pnpm format` passes
- [x] `pnpm types` passes (no TypeScript errors)
- [x] No remaining references to `OldAgentLibraryView`,
`NEW_AGENT_RUNS`, or `new-agent-runs` in the codebase
- [x] Verify `RunnerInputUI` dialog still works in the legacy builder
- [x] Verify `AgentInfoStep` cron scheduling works in the publish modal
- [x] Verify `SaveControl` cron scheduling works in the legacy builder
🤖 Generated with [Claude Code](https://claude.com/claude-code)
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<details><summary><h3>Greptile Summary</h3></summary>
This PR removes deprecated code from the legacy agent library view
system and consolidates the codebase to use the new agent runs
implementation exclusively. The refactor successfully removes ~2200
lines of code across 13 deleted files while properly relocating shared
components.
**Key changes:**
- Removed the entire `OldAgentLibraryView` directory and its 13
component files
- Removed the `NEW_AGENT_RUNS` feature flag from the `Flag` enum and
defaults
- Deleted the legacy agent library page route at `library/legacy/[id]`
- Moved `CronScheduler` components to
`src/components/contextual/CronScheduler/` for shared use across the
application
- Moved `agent-run-draft-view` and `agent-status-chip` to
`legacy-builder/` directory, co-locating them with their only consumer
- Updated `useRunGraph.ts` to import `GraphExecutionMeta` from the
generated API models instead of the deleted custom type definition
- Updated all import paths in consuming components (`AgentInfoStep`,
`SaveControl`, `RunnerInputUI`)
**Technical notes:**
- The new import path for `GraphExecutionMeta`
(`@/app/api/__generated__/models/graphExecutionMeta`) will be generated
when running `pnpm generate:api` from the OpenAPI spec
- All references to the old code have been cleanly removed from the
codebase
- The refactor maintains proper separation of concerns by moving shared
components to contextual locations
</details>
<details><summary><h3>Confidence Score: 4/5</h3></summary>
- This PR is safe to merge with minimal risk, pending manual
verification of the UI components mentioned in the test plan
- The refactor is well-structured and all code changes are correct. The
score of 4 (rather than 5) reflects that the PR author has marked three
manual testing items as incomplete in the test plan: verifying
`RunnerInputUI` dialog, `AgentInfoStep` cron scheduling, and
`SaveControl` cron scheduling. While the code changes are sound, these
UI components should be manually tested before merging to ensure the
moved components work correctly in their new locations.
- No files require special attention. The author should complete the
manual testing checklist items for `RunnerInputUI`, `AgentInfoStep`, and
`SaveControl` as noted in the test plan.
</details>
<details><summary><h3>Sequence Diagram</h3></summary>
```mermaid
sequenceDiagram
participant Dev as Developer
participant FE as Frontend Build
participant API as Backend API
participant Gen as Generated Types
Note over Dev,Gen: Refactor: Remove OldAgentLibraryView & NEW_AGENT_RUNS flag
Dev->>FE: Delete OldAgentLibraryView (13 files, ~2200 lines)
Dev->>FE: Remove NEW_AGENT_RUNS from Flag enum
Dev->>FE: Delete library/legacy/[id]/page.tsx
Dev->>FE: Move CronScheduler → src/components/contextual/
Dev->>FE: Move agent-run-draft-view → legacy-builder/
Dev->>FE: Move agent-status-chip → legacy-builder/
Dev->>FE: Update RunnerInputUI import path
Dev->>FE: Update SaveControl import path
Dev->>FE: Update AgentInfoStep import path
Dev->>FE: Update useRunGraph.ts
FE->>Gen: Import GraphExecutionMeta from generated models
Note over Gen: Type available after pnpm generate:api
Gen-->>API: Uses OpenAPI spec schema
API-->>FE: Type-safe GraphExecutionMeta model
```
</details>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
## Summary
- Add `nodrag` class to the key name input wrapper in
`WrapIfAdditionalTemplate.tsx`
- This prevents the node from being dragged when users try to select
text in the key name input field
- Follows the same pattern used by other input components like
`TextWidget.tsx`
## Test plan
- [x] Open the new builder
- [x] Add a custom node with an Object input field
- [x] Try to select text in the key name input by clicking and dragging
- [x] Verify that text selection works without moving the block
Co-authored-by: Claude <noreply@anthropic.com>
## Summary
Enhances the existing `ConcatenateListsBlock` and adds five new
companion blocks for comprehensive list manipulation, addressing issue
#11139 ("Implement block to concatenate lists").
### Changes
- **Enhanced `ConcatenateListsBlock`** with optional deduplication
(`deduplicate`) and None-value filtering (`remove_none`), plus an output
`length` field
- **New `FlattenListBlock`**: Recursively flattens nested list
structures with configurable `max_depth`
- **New `InterleaveListsBlock`**: Round-robin interleaving of elements
from multiple lists
- **New `ZipListsBlock`**: Zips corresponding elements from multiple
lists with support for padding to longest or truncating to shortest
- **New `ListDifferenceBlock`**: Computes set difference between two
lists (regular or symmetric)
- **New `ListIntersectionBlock`**: Finds common elements between two
lists, preserving order
### Helper Utilities
Extracted reusable helper functions for validation, flattening,
deduplication, interleaving, chunking, and statistics computation to
support the blocks and enable future reuse.
### Test Coverage
Comprehensive test suite with 188 test functions across 29 test classes
covering:
- Built-in block test harness validation for all 6 blocks
- Manual edge-case tests for each block (empty inputs, large lists,
mixed types, nested structures)
- Internal method tests for all block classes
- Unit tests for all helper utility functions
Closes#11139
## Test plan
- [x] All files pass Python syntax validation (`ast.parse`)
- [x] Built-in `test_input`/`test_output` tests defined for all blocks
- [x] Manual tests cover edge cases: empty lists, large lists, mixed
types, nested structures, deduplication, None removal
- [x] Helper function tests validate all utility functions independently
- [x] All block IDs are valid UUID4
- [x] Block categories set to `BlockCategory.BASIC` for consistency with
existing list blocks
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<details><summary><h3>Greptile Summary</h3></summary>
Enhanced `ConcatenateListsBlock` with deduplication and None-filtering
options, and added five new list manipulation blocks
(`FlattenListBlock`, `InterleaveListsBlock`, `ZipListsBlock`,
`ListDifferenceBlock`, `ListIntersectionBlock`) with comprehensive
helper functions and test coverage.
**Key Changes:**
- Enhanced `ConcatenateListsBlock` with `deduplicate` and `remove_none`
options, plus `length` output field
- Added `FlattenListBlock` for recursively flattening nested lists with
configurable `max_depth`
- Added `InterleaveListsBlock` for round-robin element interleaving
- Added `ZipListsBlock` with support for padding/truncation
- Added `ListDifferenceBlock` and `ListIntersectionBlock` for set
operations
- Extracted 12 reusable helper functions for validation, flattening,
deduplication, etc.
- Comprehensive test suite with 188 test functions covering edge cases
**Minor Issues:**
- Helper function `_deduplicate_list` has redundant logic in the `else`
branch that duplicates the `if` branch
- Three helper functions (`_filter_empty_collections`,
`_compute_list_statistics`, `_chunk_list`) are defined but unused -
consider removing unless planned for future use
- The `_make_hashable` function uses `hash(repr(item))` for unhashable
types, which correctly treats structurally identical dicts/lists as
duplicates
</details>
<details><summary><h3>Confidence Score: 4/5</h3></summary>
- Safe to merge with minor style improvements recommended
- The implementation is well-structured with comprehensive test coverage
(188 tests), proper error handling, and follows existing block patterns.
All blocks use valid UUID4 IDs and correct categories. The helper
functions provide good code reuse. The minor issues are purely stylistic
(redundant code, unused helpers) and don't affect functionality or
safety.
- No files require special attention - both files are well-tested and
follow project conventions
</details>
<details><summary><h3>Sequence Diagram</h3></summary>
```mermaid
sequenceDiagram
participant User
participant Block as List Block
participant Helper as Helper Functions
participant Output
User->>Block: Input (lists/parameters)
Block->>Helper: _validate_all_lists()
Helper-->>Block: validation result
alt validation fails
Block->>Output: error message
else validation succeeds
Block->>Helper: _concatenate_lists_simple() / _flatten_nested_list() / etc.
Helper-->>Block: processed result
opt deduplicate enabled
Block->>Helper: _deduplicate_list()
Helper-->>Block: deduplicated result
end
opt remove_none enabled
Block->>Helper: _filter_none_values()
Helper-->>Block: filtered result
end
Block->>Output: result + length
end
Output-->>User: Block outputs
```
</details>
<sub>Last reviewed commit: a6d5445</sub>
<!-- greptile_other_comments_section -->
<sub>(2/5) Greptile learns from your feedback when you react with thumbs
up/down!</sub>
<!-- /greptile_comment -->
---------
Co-authored-by: Otto <otto@agpt.co>
## Summary
- Enable Claude Agent SDK built-in **WebSearch** tool (Brave Search via
Anthropic API) for the CoPilot SDK agent
- Explicitly **block WebFetch** via `SDK_DISALLOWED_TOOLS`. The agent
uses the SSRF-protected `mcp__copilot__web_fetch` MCP tool instead
- **Consolidate** all tool security constants (`BLOCKED_TOOLS`,
`WORKSPACE_SCOPED_TOOLS`, `DANGEROUS_PATTERNS`, `SDK_DISALLOWED_TOOLS`)
into `tool_adapter.py` as a single source of truth — previously
scattered across `tool_adapter.py`, `security_hooks.py`, and inline in
`service.py`
## Changes
- `tool_adapter.py`: Add `WebSearch` to `_SDK_BUILTIN_TOOLS`, add
`SDK_DISALLOWED_TOOLS`, move security constants here
- `security_hooks.py`: Import constants from `tool_adapter.py` instead
of defining locally
- `service.py`: Use `SDK_DISALLOWED_TOOLS` instead of inline `["Bash"]`
## Test plan
- [x] All 21 security hooks tests pass
- [x] Ruff lint clean
- [x] All pre-commit hooks pass
- [ ] Verify WebSearch works in CoPilot chat (manual test)
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<details><summary><h3>Greptile Summary</h3></summary>
Consolidates tool security constants into `tool_adapter.py` as single
source of truth, enables WebSearch (Brave via Anthropic API), and
explicitly blocks WebFetch to prevent SSRF attacks. The change improves
security by ensuring the agent uses the SSRF-protected
`mcp__copilot__web_fetch` tool instead of the built-in WebFetch which
can access internal networks like `localhost:8006`.
</details>
<details><summary><h3>Confidence Score: 5/5</h3></summary>
- This PR is safe to merge with minimal risk
- The changes improve security by blocking WebFetch (SSRF risk) while
enabling safe WebSearch. The consolidation of constants into a single
source of truth improves maintainability. All existing tests pass (21
security hooks tests), and the refactoring is straightforward with no
behavioral changes to existing security logic. The only suggestions are
minor improvements: adding a test for WebFetch blocking and considering
a lowercase alias for consistency.
- No files require special attention
</details>
<details><summary><h3>Sequence Diagram</h3></summary>
```mermaid
sequenceDiagram
participant Agent as SDK Agent
participant Hooks as Security Hooks
participant TA as tool_adapter.py
participant MCP as MCP Tools
Note over TA: SDK_DISALLOWED_TOOLS = ["Bash", "WebFetch"]
Note over TA: _SDK_BUILTIN_TOOLS includes WebSearch
Agent->>Hooks: Request WebSearch (Brave API)
Hooks->>TA: Check BLOCKED_TOOLS
TA-->>Hooks: Not blocked
Hooks-->>Agent: Allowed ✓
Agent->>Agent: Execute via Anthropic API
Agent->>Hooks: Request WebFetch (SSRF risk)
Hooks->>TA: Check BLOCKED_TOOLS
Note over TA: WebFetch in SDK_DISALLOWED_TOOLS
TA-->>Hooks: Blocked
Hooks-->>Agent: Denied ✗
Note over Agent: Use mcp__copilot__web_fetch instead
Agent->>Hooks: Request mcp__copilot__web_fetch
Hooks->>MCP: Validate (MCP tool, not SDK builtin)
MCP-->>Hooks: Has SSRF protection
Hooks-->>Agent: Allowed ✓
Agent->>MCP: Execute with SSRF checks
```
</details>
<sub>Last reviewed commit: 2d9975f</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
## Summary
- catch Jina reader client/server errors in ExtractWebsiteContentBlock
and surface a clear error output keyed to the user URL
- guard empty responses to return an explicit error instead of yielding
blank content
- add regression tests covering the happy path and HTTP client failures
via a monkeypatched fetch
## Testing
- not run (pytest unavailable in this environment)
---------
Co-authored-by: Nicholas Tindle <nicktindle@outlook.com>
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
## Changes 🏗️
This PR updates the Claude Block Docs Review CI workflow to update
existing comments instead of creating new ones on each push.
### What's Changed:
1. **Concurrency group** - Prevents race conditions if the workflow runs
twice simultaneously
2. **Comment cleanup step** - Deletes any previous Claude review comment
before posting a new one
3. **Marker instruction** - Instructs Claude to include a `<!--
CLAUDE_DOCS_REVIEW -->` marker in its comment for identification
### Why:
Previously, every PR push would create a new review comment, cluttering
the PR with multiple comments. Now only the most recent review is shown.
### Testing:
1. Create a PR that triggers this workflow (modify a file in
`docs/integrations/` or `autogpt_platform/backend/backend/blocks/`)
2. Verify first run creates comment with marker
3. Push another commit
4. Verify old comment is deleted and new comment is created (not
accumulated)
Requested by @Bentlybro
---
## Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [ ] I have made a test plan
- [ ] I have tested my changes according to the test plan (will be
tested on merge)
#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<details><summary><h3>Greptile Summary</h3></summary>
Added concurrency control and comment deduplication to prevent multiple
Claude review comments from accumulating on PRs. The workflow now
deletes previous review comments (identified by `<!-- CLAUDE_DOCS_REVIEW
-->` marker) before posting new ones, and uses concurrency groups to
prevent race conditions.
</details>
<details><summary><h3>Confidence Score: 5/5</h3></summary>
- This PR is safe to merge with minimal risk
- The changes are well-contained, follow GitHub Actions best practices,
and use built-in GitHub APIs safely. The concurrency control prevents
race conditions, and the comment cleanup logic uses proper filtering
with `head -1` to handle edge cases. The HTML comment marker approach is
standard and reliable.
- No files require special attention
</details>
<details><summary><h3>Sequence Diagram</h3></summary>
```mermaid
sequenceDiagram
participant GH as GitHub PR Event
participant WF as Workflow
participant API as GitHub API
participant Claude as Claude Action
GH->>WF: PR opened/synchronized
WF->>WF: Check concurrency group
Note over WF: Cancel any in-progress runs<br/>for same PR number
WF->>API: Query PR comments
API-->>WF: Return all comments
WF->>WF: Filter for CLAUDE_DOCS_REVIEW marker
alt Previous comment exists
WF->>API: DELETE comment by ID
API-->>WF: Comment deleted
else No previous comment
WF->>WF: Skip deletion
end
WF->>Claude: Run code review
Claude->>API: POST new comment with marker
API-->>Claude: Comment created
```
</details>
<sub>Last reviewed commit: fb1b436</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
## Summary
<img width="1000" alt="image"
src="https://github.com/user-attachments/assets/18e8ef34-d222-453c-8b0a-1b25ef8cf806"
/>
<img width="250" alt="image"
src="https://github.com/user-attachments/assets/ba97556c-09c5-4f76-9f4e-49a2e8e57468"
/>
<img width="250" alt="image"
src="https://github.com/user-attachments/assets/68f7804a-fe74-442d-9849-39a229c052cf"
/>
<img width="250" alt="image"
src="https://github.com/user-attachments/assets/700690ba-f9fe-4726-8871-3bfbab586001"
/>
Full-stack MCP (Model Context Protocol) tool block integration that
allows users to connect to any MCP server, discover available tools,
authenticate via OAuth, and execute tools — all through the standard
AutoGPT credential system.
### Backend
- **MCPToolBlock** (`blocks/mcp/block.py`): New block using
`CredentialsMetaInput` pattern with optional credentials (`default={}`),
supporting both authenticated (OAuth) and public MCP servers. Includes
auto-lookup fallback for backward compatibility.
- **MCP Client** (`blocks/mcp/client.py`): HTTP transport with JSON-RPC
2.0, tool discovery, tool execution with robust error handling
(type-checked error fields, non-JSON response handling)
- **MCP OAuth Handler** (`blocks/mcp/oauth.py`): RFC 8414 discovery,
dynamic per-server OAuth with PKCE, token storage and refresh via
`raise_for_status=True`
- **MCP API Routes** (`api/features/mcp/routes.py`): `discover-tools`,
`oauth/login`, `oauth/callback` endpoints with credential cleanup,
defensive OAuth metadata validation
- **Credential system integration**:
- `CredentialsMetaInput` model_validator normalizes legacy
`"ProviderName.MCP"` format from Python 3.13's `str(StrEnum)` change
- `CredentialsFieldInfo.combine()` supports URL-based credential
discrimination (each MCP server gets its own credential entry)
- `aggregate_credentials_inputs` checks block schema defaults for
credential optionality
- Executor normalizes credential data for both Pydantic and JSON schema
validation paths
- Chat credential matching handles MCP server URL filtering
- `provider_matches()` helper used consistently for Python 3.13 StrEnum
compatibility
- **Pre-run validation**: `_validate_graph_get_errors` now calls
`get_missing_input()` for custom block-level validation (MCP tool
arguments)
- **Security**: HTML tag stripping loop to prevent XSS bypass, SSRF
protection (removed trusted_origins)
### Frontend
- **MCPToolDialog** (`MCPToolDialog.tsx`): Full tool discovery UI —
enter server URL, authenticate if needed, browse tools, select tool and
configure
- **OAuth popup** (`oauth-popup.ts`): Shared utility supporting
cross-origin MCP OAuth flows with BroadcastChannel + localStorage
fallback
- **Credential integration**: MCP-specific OAuth flow in
`useCredentialsInput`, server URL filtering in `useCredentials`, MCP
callback page
- **CredentialsSelect**: Auto-selects first available credential instead
of defaulting to "None", credentials listed before "None" in dropdown
- **Node rendering**: Dynamic tool input schema rendering on MCP nodes,
proper handling in both legacy and new flow editors
- **Block title persistence**: `customized_name` set at block creation
for both MCP and Agent blocks — no fallback logic needed, titles survive
save/load reliably
- **Stable credential ordering**: Removed `sortByUnsetFirst` that caused
credential inputs to jump when selected
### Tests (~2060 lines)
- Unit tests: block, client, tool execution
- Integration tests: mock MCP server with auth
- OAuth flow tests
- API endpoint tests
- Credential combining/optionality tests
- E2e tests (skipped in CI, run manually)
## Key Design Decisions
1. **Optional credentials via `default={}`**: MCP servers can be public
(no auth) or private (OAuth). The `credentials` field has `default={}`
making it optional at the schema level, so public servers work without
prompting for credentials.
2. **URL-based credential discrimination**: Each MCP server URL gets its
own credential entry in the "Run agent" form (via
`discriminator="server_url"`), so agents using multiple MCP servers
prompt for each independently.
3. **Model-level normalization**: Python 3.13 changed `str(StrEnum)` to
return `"ClassName.MEMBER"`. Rather than scattering fixes across the
codebase, a Pydantic `model_validator(mode="before")` on
`CredentialsMetaInput` handles normalization centrally, and
`provider_matches()` handles lookups.
4. **Credential auto-select**: `CredentialsSelect` component defaults to
the first available credential and notifies the parent state, ensuring
credentials are pre-filled in the "Run agent" dialog without requiring
manual selection.
5. **customized_name for block titles**: Both MCP and Agent blocks set
`customized_name` in metadata at creation time. This eliminates
convoluted runtime fallback logic (`agent_name`, hostname extraction) —
the title is persisted once and read directly.
## Test plan
- [x] Unit/integration tests pass (68 MCP + 11 graph = 79 tests)
- [x] Manual: MCP block with public server (DeepWiki) — no credentials
needed, tools discovered and executable
- [x] Manual: MCP block with OAuth server (Linear, Sentry) — OAuth flow
prompts correctly
- [x] Manual: "Run agent" form shows correct credential requirements per
MCP server
- [x] Manual: Credential auto-selects when exactly one matches,
pre-selects first when multiple exist
- [x] Manual: Credential ordering stays stable when
selecting/deselecting
- [x] Manual: MCP block title persists after save and refresh
- [x] Manual: Agent block title persists after save and refresh (via
customized_name)
- [ ] Manual: Shared agent with MCP block prompts new user for
credentials
---------
Co-authored-by: Otto <otto@agpt.co>
Co-authored-by: Ubbe <hi@ubbe.dev>
## Summary
Full integration of the **Claude Agent SDK** to replace the existing
one-turn OpenAI-compatible CoPilot implementation with a multi-turn,
tool-using AI agent.
### What changed
**Core SDK Integration** (`chat/sdk/` — new module)
- **`service.py`**: Main orchestrator — spawns Claude Code CLI as a
subprocess per user message, streams responses back via SSE. Handles
conversation history compression, session lifecycle, and error recovery.
- **`response_adapter.py`**: Translates Claude Agent SDK events (text
deltas, tool use, errors, result messages) into the existing CoPilot
`StreamEvent` protocol so the frontend works unchanged.
- **`tool_adapter.py`**: Bridges CoPilot's MCP tools (find_block,
run_block, create_agent, etc.) into the SDK's tool format. Handles
schema conversion and result serialization.
- **`security_hooks.py`**: Pre/Post tool-use hooks that enforce a strict
allowlist of tools, block path traversal, sandbox file operations to
per-session workspace directories, cap sub-agent spawning, and prevent
the model from accessing unauthorized system resources.
- **`transcript.py`**: JSONL transcript I/O utilities for the stateless
`--resume` feature (see below).
**Stateless Multi-Turn Resume** (new)
- Instead of compressing conversation history via LLM on every turn
(lossy and expensive), we capture Claude Code's native JSONL session
transcript via a **Stop hook** callback, persist it in the DB
(`ChatSession.sdkTranscript`), and restore it on the next turn via
`--resume <file>`.
- This preserves full tool call/result context across turns with zero
token overhead for history.
- Feature-flagged via `CLAUDE_AGENT_USE_RESUME` (default: off).
- DB migration: `ALTER TABLE "ChatSession" ADD COLUMN "sdkTranscript"
TEXT`.
**Sandboxed Tool Execution** (`chat/tools/`)
- **`bash_exec.py`**: Sandboxed bash execution using bubblewrap
(`bwrap`) with read-only root filesystem, per-session writable
workspace, resource limits (CPU, memory, file size), and network
isolation.
- **`sandbox.py`**: Shared bubblewrap sandbox infrastructure — generates
`bwrap` command lines with configurable mounts, environment, and
resource constraints.
- **`web_fetch.py`**: URL fetching tool with domain allowlist, size
limits, and content-type filtering.
- **`check_operation_status.py`**: Polling tool for long-running
operations (agent creation, block execution) so the SDK doesn't block
waiting.
- **`find_block.py`** / **`run_block.py`**: Enhanced with category
filtering, optimized response size (removed raw JSON schemas), and
better error handling.
**Security**
- Path traversal prevention: session IDs sanitized, all file ops
confined to workspace dirs, symlink resolution.
- Tool allowlist enforcement via SDK hooks — model cannot call arbitrary
tools.
- Built-in `Bash` tool blocked via `disallowed_tools` to prevent
bypassing sandboxed `bash_exec`.
- Sub-agent (`Task`) spawning capped at configurable limit (default:
10).
- CodeQL-clean path sanitization patterns.
**Streaming & Reconnection**
- SSE stream registry backed by Redis Streams for crash-resilient
reconnection.
- Long-running operation tracking with TTL-based cleanup.
- Atomic message append to prevent race conditions on concurrent writes.
**Configuration** (`config.py`)
- `use_claude_agent_sdk` — master toggle (default: on)
- `claude_agent_model` — model override for SDK path
- `claude_agent_max_buffer_size` — JSON parsing buffer (10MB)
- `claude_agent_max_subtasks` — sub-agent cap (10)
- `claude_agent_use_resume` — transcript-based resume (default: off)
- `thinking_enabled` — extended thinking for Claude models
**Tests**
- `sdk/response_adapter_test.py` — 366 lines covering all event
translation paths
- `sdk/security_hooks_test.py` — 165 lines covering tool blocking, path
traversal, subtask limits
- `chat/model_test.py` — 214 lines covering session model serialization
- `chat/service_test.py` — Integration tests including multi-turn resume
keyword recall
- `tools/find_block_test.py` / `run_block_test.py` — Extended with new
tool behavior tests
## Test plan
- [x] Unit tests pass (`sdk/response_adapter_test.py`,
`security_hooks_test.py`, `model_test.py`)
- [x] Integration test: multi-turn keyword recall via `--resume`
(`service_test.py::test_sdk_resume_multi_turn`)
- [x] Manual E2E: CoPilot chat sessions with tool calls, bash execution,
and multi-turn context
- [x] Pre-commit hooks pass (ruff, isort, black, pyright, flake8)
- [ ] Staging deployment with `claude_agent_use_resume=false` initially
- [ ] Enable resume in staging, verify transcript capture and recall
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<details><summary><h3>Greptile Summary</h3></summary>
This PR replaces the existing OpenAI-compatible CoPilot with a full
Claude Agent SDK integration, introducing multi-turn conversations,
stateless resume via JSONL transcripts, and sandboxed tool execution.
**Key changes:**
- **SDK integration** (`chat/sdk/`): spawns Claude Code CLI subprocess
per message, translates events to frontend protocol, bridges MCP tools
- **Stateless resume**: captures JSONL transcripts via Stop hook,
persists in `ChatSession.sdkTranscript`, restores with `--resume`
(feature-flagged, default off)
- **Sandboxed execution**: bubblewrap sandbox for bash commands with
filesystem whitelist, network isolation, resource limits
- **Security hooks**: tool allowlist enforcement, path traversal
prevention, workspace-scoped file operations, sub-agent spawn limits
- **Long-running operations**: delegates `create_agent`/`edit_agent` to
existing stream_registry infrastructure for SSE reconnection
- **Feature flag**: `CHAT_USE_CLAUDE_AGENT_SDK` with LaunchDarkly
support, defaults to enabled
**Security issues found:**
- Path traversal validation has logic errors in `security_hooks.py:82`
(tilde expansion order) and `service.py:266` (redundant `..` check)
- Config validator always prefers env var over explicit `False` value
(`config.py:162`)
- Race condition in `routes.py:323` — message persisted before task
registration, could duplicate on retry
- Resource limits in sandbox may fail silently (`sandbox.py:109`)
**Test coverage is strong** with 366 lines for response adapter, 165 for
security hooks, and integration tests for multi-turn resume.
</details>
<details><summary><h3>Confidence Score: 3/5</h3></summary>
- This PR is generally safe but has critical security issues in path
validation that must be fixed before merge
- Score reflects strong architecture and test coverage offset by real
security vulnerabilities: the tilde expansion bug in `security_hooks.py`
could allow sandbox escape, the race condition could cause message
duplication, and the silent ulimit failures could bypass resource
limits. The bubblewrap sandbox and allowlist enforcement are
well-designed, but the path validation bugs need fixing. The transcript
resume feature is properly feature-flagged. Overall the implementation
is solid but the security issues prevent a higher score.
- Pay close attention to
`backend/api/features/chat/sdk/security_hooks.py` (path traversal
vulnerability), `backend/api/features/chat/routes.py` (race condition),
`backend/api/features/chat/tools/sandbox.py` (silent resource limit
failures), and `backend/api/features/chat/sdk/service.py` (redundant
security check)
</details>
<details><summary><h3>Sequence Diagram</h3></summary>
```mermaid
sequenceDiagram
participant Frontend
participant Routes as routes.py
participant SDKService as sdk/service.py
participant ClaudeSDK as Claude Agent SDK CLI
participant SecurityHooks as security_hooks.py
participant ToolAdapter as tool_adapter.py
participant CoPilotTools as tools/*
participant Sandbox as sandbox.py (bwrap)
participant DB as Database
participant Redis as stream_registry
Frontend->>Routes: POST /chat (user message)
Routes->>SDKService: stream_chat_completion_sdk()
SDKService->>DB: get_chat_session()
DB-->>SDKService: session + messages
alt Resume enabled AND transcript exists
SDKService->>SDKService: validate_transcript()
SDKService->>SDKService: write_transcript_to_tempfile()
Note over SDKService: Pass --resume to SDK
else No resume
SDKService->>SDKService: _compress_conversation_history()
Note over SDKService: Inject history into user message
end
SDKService->>SecurityHooks: create_security_hooks()
SDKService->>ToolAdapter: create_copilot_mcp_server()
SDKService->>ClaudeSDK: spawn subprocess with MCP server
loop Streaming Conversation
ClaudeSDK->>SDKService: AssistantMessage (text/tool_use)
SDKService->>Frontend: StreamTextDelta / StreamToolInputAvailable
alt Tool Call
ClaudeSDK->>SecurityHooks: PreToolUse hook
SecurityHooks->>SecurityHooks: validate path, check allowlist
alt Tool blocked
SecurityHooks-->>ClaudeSDK: deny
else Tool allowed
SecurityHooks-->>ClaudeSDK: allow
ClaudeSDK->>ToolAdapter: call MCP tool
alt Long-running tool (create_agent, edit_agent)
ToolAdapter->>Redis: register task
ToolAdapter->>DB: save OperationPendingResponse
ToolAdapter->>ToolAdapter: spawn background task
ToolAdapter-->>ClaudeSDK: OperationStartedResponse
else Regular tool (find_block, bash_exec)
ToolAdapter->>CoPilotTools: execute()
alt bash_exec
CoPilotTools->>Sandbox: run_sandboxed()
Sandbox->>Sandbox: build bwrap command
Note over Sandbox: Network isolation,<br/>filesystem whitelist,<br/>resource limits
Sandbox-->>CoPilotTools: stdout, stderr, exit_code
end
CoPilotTools-->>ToolAdapter: result
ToolAdapter->>ToolAdapter: stash full output
ToolAdapter-->>ClaudeSDK: MCP response
end
SecurityHooks->>SecurityHooks: PostToolUse hook (log)
end
end
ClaudeSDK->>SDKService: UserMessage (ToolResultBlock)
SDKService->>ToolAdapter: pop_pending_tool_output()
SDKService->>Frontend: StreamToolOutputAvailable
end
ClaudeSDK->>SecurityHooks: Stop hook
SecurityHooks->>SDKService: transcript_path callback
SDKService->>SDKService: read_transcript_file()
SDKService->>DB: save transcript to session.sdkTranscript
ClaudeSDK->>SDKService: ResultMessage (success)
SDKService->>Frontend: StreamFinish
SDKService->>DB: upsert_chat_session()
```
</details>
<sub>Last reviewed commit: 28c1121</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
---------
Co-authored-by: Swifty <craigswift13@gmail.com>
## Summary
Adds an automated workflow that detects potential merge conflicts
between open PRs, helping contributors coordinate proactively.
**Example output:** [See comment on PR
#12057](https://github.com/Significant-Gravitas/AutoGPT/pull/12057#issuecomment-3897330632)
## How it works
1. **Triggered on PR events** — runs when a PR is opened, pushed to, or
reopened
2. **Compares against all open PRs** targeting the same base branch
3. **Detects overlaps** at multiple levels:
- File overlap (same files modified)
- Line overlap (same line ranges modified)
- Actual merge conflicts (attempts real merges)
4. **Posts a comment** on the PR with findings
## Features
- Full file paths with common prefix extraction for readability
- Conflict size (number of conflict regions + lines affected)
- Conflict types (content, added, deleted, modified/deleted, etc.)
- Last-updated timestamps for each PR
- Risk categorization (conflict, medium, low)
- Ignores noise files (openapi.json, lock files)
- Updates existing comment on subsequent pushes (no spam)
- Filters out PRs older than 14 days
- Clone-once optimization for fast merge testing (~48s for 19 PRs)
## Files
- `.github/scripts/detect_overlaps.py` — main detection script
- `.github/workflows/pr-overlap-check.yml` — workflow definition
## Summary
Disables the Print to Console block as requested by Nick Tindle.
## Changes
- Added `disabled=True` to PrintToConsoleBlock in `basic.py`
## Testing
- Block will no longer appear in the platform UI
- Existing graphs using this block should be checked (block ID:
`f3b1c1b2-4c4f-4f0d-8d2f-4c4f0d8d2f4c`)
Closes OPEN-3000
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<details><summary><h3>Greptile Summary</h3></summary>
Added `disabled=True` parameter to `PrintToConsoleBlock` in `basic.py`
per Nick Tindle's request (OPEN-3000).
- Block follows the same disabling pattern used by other blocks in the
codebase (e.g., `BlockInstallationBlock`, video blocks, Ayrshare blocks)
- Block will no longer appear in the platform UI for new graph creation
- Existing graphs using this block (ID:
`f3b1c1b2-4c4f-4f0d-8d2f-4c4f0d8d2f4c`) will need to be checked for
compatibility
- Comment properly documents the reason for disabling
</details>
<details><summary><h3>Confidence Score: 5/5</h3></summary>
- This PR is safe to merge with minimal risk
- Single-line change that adds a well-documented flag following existing
patterns used throughout the codebase. The change is non-destructive and
only affects UI visibility of the block for new graphs.
- No files require special attention
</details>
<sub>Last reviewed commit: 759003b</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Users can now search for existing feature requests and submit new ones
directly through the CoPilot chat interface. Requests are tracked in
Linear with customer need attribution.
### Changes 🏗️
**Backend:**
- Added `SearchFeatureRequestsTool` and `CreateFeatureRequestTool` to
the CoPilot chat tools registry
- Integrated with Linear GraphQL API for searching issues in the feature
requests project, creating new issues, upserting customers, and
attaching customer needs
- Added `linear_api_key` secret to settings for system-level Linear API
access
- Added response models (`FeatureRequestSearchResponse`,
`FeatureRequestCreatedResponse`, `FeatureRequestInfo`) to the tools
models
**Frontend:**
- Added `SearchFeatureRequestsTool` and `CreateFeatureRequestTool` UI
components with full streaming state handling (input-streaming,
input-available, output-available, output-error)
- Added helper utilities for output parsing, type guards, animation
text, and icon rendering
- Wired tools into `ChatMessagesContainer` for rendering in the chat
- Added styleguide examples covering all tool states
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Verified search returns matching feature requests from Linear
- [x] Verified creating a new feature request creates an issue and
customer need in Linear
- [x] Verified adding a need to an existing issue works via
`existing_issue_id`
- [x] Verified error states render correctly in the UI
- [x] Verified styleguide page renders all tool states
#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)
New secret: `LINEAR_API_KEY` — required for system-level Linear API
operations (defaults to empty string).
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<details><summary><h3>Greptile Summary</h3></summary>
Adds feature request search and creation tools to CoPilot chat,
integrating with Linear's GraphQL API to track user feedback. Users can
now search existing feature requests and submit new ones (or add their
need to existing issues) directly through conversation.
**Key changes:**
- Backend: `SearchFeatureRequestsTool` and `CreateFeatureRequestTool`
with Linear API integration via system-level `LINEAR_API_KEY`
- Frontend: React components with streaming state handling and accordion
UI for search results and creation confirmations
- Models: Added `FeatureRequestSearchResponse` and
`FeatureRequestCreatedResponse` to response types
- Customer need tracking: Upserts customers in Linear and attaches needs
to issues for better feedback attribution
**Issues found:**
- Missing `LINEAR_API_KEY` entry in `.env.default` (required per PR
description checklist)
- Hardcoded project/team IDs reduce maintainability
- Global singleton pattern could cause issues in async contexts
- Using `user_id` as customer name reduces readability in Linear
</details>
<details><summary><h3>Confidence Score: 4/5</h3></summary>
- Safe to merge with minor configuration fix required
- The implementation is well-structured with proper error handling, type
safety, and follows existing patterns in the codebase. The missing
`.env.default` entry is a straightforward configuration issue that must
be fixed before deployment but doesn't affect code quality. The other
findings are style improvements that don't impact functionality.
- Verify that `LINEAR_API_KEY` is added to `.env.default` before merging
</details>
<details><summary><h3>Sequence Diagram</h3></summary>
```mermaid
sequenceDiagram
participant User
participant CoPilot UI
participant LLM
participant FeatureRequestTool
participant LinearClient
participant Linear API
User->>CoPilot UI: Request feature via chat
CoPilot UI->>LLM: Send user message
LLM->>FeatureRequestTool: search_feature_requests(query)
FeatureRequestTool->>LinearClient: query(SEARCH_ISSUES_QUERY)
LinearClient->>Linear API: POST /graphql (search)
Linear API-->>LinearClient: searchIssues.nodes[]
LinearClient-->>FeatureRequestTool: Feature request data
FeatureRequestTool-->>LLM: FeatureRequestSearchResponse
alt No existing requests found
LLM->>FeatureRequestTool: create_feature_request(title, description)
FeatureRequestTool->>LinearClient: mutate(CUSTOMER_UPSERT_MUTATION)
LinearClient->>Linear API: POST /graphql (upsert customer)
Linear API-->>LinearClient: customer {id, name}
LinearClient-->>FeatureRequestTool: Customer data
FeatureRequestTool->>LinearClient: mutate(ISSUE_CREATE_MUTATION)
LinearClient->>Linear API: POST /graphql (create issue)
Linear API-->>LinearClient: issue {id, identifier, url}
LinearClient-->>FeatureRequestTool: Issue data
FeatureRequestTool->>LinearClient: mutate(CUSTOMER_NEED_CREATE_MUTATION)
LinearClient->>Linear API: POST /graphql (attach need)
Linear API-->>LinearClient: need {id, issue}
LinearClient-->>FeatureRequestTool: Need data
FeatureRequestTool-->>LLM: FeatureRequestCreatedResponse
else Existing request found
LLM->>FeatureRequestTool: create_feature_request(title, description, existing_issue_id)
FeatureRequestTool->>LinearClient: mutate(CUSTOMER_UPSERT_MUTATION)
LinearClient->>Linear API: POST /graphql (upsert customer)
Linear API-->>LinearClient: customer {id}
LinearClient-->>FeatureRequestTool: Customer data
FeatureRequestTool->>LinearClient: mutate(CUSTOMER_NEED_CREATE_MUTATION)
LinearClient->>Linear API: POST /graphql (attach need to existing)
Linear API-->>LinearClient: need {id, issue}
LinearClient-->>FeatureRequestTool: Need data
FeatureRequestTool-->>LLM: FeatureRequestCreatedResponse
end
LLM-->>CoPilot UI: Tool response + continuation
CoPilot UI-->>User: Display result with accordion UI
```
</details>
<sub>Last reviewed commit: af2e093</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
## Summary
Applies the CI performance optimizations from #12090 to Claude Code
workflows.
## Changes
### `claude.yml` & `claude-dependabot.yml`
- **pnpm caching**: Replaced manual `actions/cache` with `setup-node`
built-in `cache: "pnpm"`
- Removes 4 steps (set pnpm store dir, cache step, manual config) → 1
step
### `claude-ci-failure-auto-fix.yml`
- **Added dev environment setup** with optimized caching
- Now Claude can run lint/tests when fixing CI failures (previously
could only edit files)
- Uses the same optimized caching patterns
## Dependency
This PR is based on #12090 and will merge after it.
## Testing
- Workflow YAML syntax validated
- Patterns match proven #12090 implementation
- CI caching changes fail gracefully to uncached builds
## Linear
Fixes [SECRT-1950](https://linear.app/autogpt/issue/SECRT-1950)
## Future Enhancements
E2E test data caching could be added to Claude workflows if needed for
running integration tests. Currently Claude workflows set up a dev
environment but don't run E2E tests by default.
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<details><summary><h3>Greptile Summary</h3></summary>
Applies proven CI performance optimizations to Claude workflows by
simplifying pnpm caching and adding dev environment setup to the
auto-fix workflow.
**Key changes:**
- Replaced manual pnpm cache configuration (4 steps) with built-in
`setup-node` `cache: "pnpm"` support in `claude.yml` and
`claude-dependabot.yml`
- Added complete dev environment setup (Python/Poetry + Node.js/pnpm) to
`claude-ci-failure-auto-fix.yml` so Claude can run linting and tests
when fixing CI failures
- Correctly orders `corepack enable` before `setup-node` to ensure pnpm
is available for caching
The changes mirror the optimizations from PR #12090 and maintain
consistency across all Claude workflows.
</details>
<details><summary><h3>Confidence Score: 5/5</h3></summary>
- This PR is safe to merge with minimal risk
- The changes are CI infrastructure optimizations that mirror proven
patterns from PR #12090. The pnpm caching simplification reduces
complexity without changing functionality (caching failures gracefully
fall back to uncached builds). The dev environment setup in the auto-fix
workflow is additive and enables Claude to run linting/tests. All YAML
syntax is correct and the step ordering follows best practices.
- No files require special attention
</details>
<details><summary><h3>Sequence Diagram</h3></summary>
```mermaid
sequenceDiagram
participant GHA as GitHub Actions
participant Corepack as Corepack
participant SetupNode as setup-node@v6
participant Cache as GHA Cache
participant pnpm as pnpm
Note over GHA,pnpm: Before (Manual Caching)
GHA->>SetupNode: Set up Node.js 22
SetupNode-->>GHA: Node.js ready
GHA->>Corepack: Enable corepack
Corepack-->>GHA: pnpm available
GHA->>pnpm: Configure store directory
pnpm-->>GHA: Store path set
GHA->>Cache: actions/cache (manual key)
Cache-->>GHA: Cache restored/missed
GHA->>pnpm: Install dependencies
pnpm-->>GHA: Dependencies installed
Note over GHA,pnpm: After (Built-in Caching)
GHA->>Corepack: Enable corepack
Corepack-->>GHA: pnpm available
GHA->>SetupNode: Set up Node.js 22<br/>cache: "pnpm"<br/>cache-dependency-path: pnpm-lock.yaml
SetupNode->>Cache: Auto-detect pnpm store
Cache-->>SetupNode: Cache restored/missed
SetupNode-->>GHA: Node.js + cache ready
GHA->>pnpm: Install dependencies
pnpm-->>GHA: Dependencies installed
```
</details>
<sub>Last reviewed commit: f1681a0</sub>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
---------
Co-authored-by: Reinier van der Leer <pwuts@agpt.co>
Co-authored-by: Ubbe <hi@ubbe.dev>
## Summary
Adds comprehensive error logging for OpenRouter/OpenAI API errors to
help diagnose issues like provider routing failures, context length
exceeded, rate limits, etc.
## Background
While investigating
[SECRT-1859](https://linear.app/autogpt/issue/SECRT-1859), we found that
when OpenRouter returns errors, the actual error details weren't being
captured or logged. Langfuse traces showed `provider_name: 'unknown'`
and `completion: null` without any insight into WHY all providers
rejected the request.
## Changes
- Add `_extract_api_error_details()` to extract rich information from
API errors including:
- Status code and request ID
- Response body (contains OpenRouter's actual error message)
- OpenRouter-specific headers (provider, model)
- Rate limit headers
- Add `_log_api_error()` helper that logs errors with context:
- Session ID for correlation
- Message count (helps identify context length issues)
- Model being used
- Retry count
- Update error handling in `_stream_chat_chunks()` and
`_generate_llm_continuation()` to use new logging
- Extract provider's error message from response body for better user
feedback
## Example log output
```
API error: {
'error_type': 'APIStatusError',
'error_message': 'Provider returned error',
'status_code': 400,
'request_id': 'req_xxx',
'response_body': {'error': {'message': 'context_length_exceeded', 'type': 'invalid_request_error'}},
'openrouter_provider': 'unknown',
'session_id': '44fbb803-...',
'message_count': 52,
'model': 'anthropic/claude-opus-4.5',
'retry_count': 0
}
```
## Testing
- [ ] Verified code passes linting (black, isort, ruff)
- [ ] Error details are properly extracted from different error types
## Refs
- Linear: SECRT-1859
- Thread:
https://discord.com/channels/1126875755960336515/1467066151002571034
---------
Co-authored-by: Reinier van der Leer <pwuts@agpt.co>
The frontend `e2e_test` doesn't have a working build cache setup,
causing really slow builds = slow test jobs. These changes reduce total
test runtime from ~12 minutes to ~5 minutes.
### Changes 🏗️
- Inject build cache config into docker compose config; let `buildx
bake` use GHA cache directly
- Add `docker-ci-fix-compose-build-cache.py` script
- Optimize `backend/Dockerfile` + root `.dockerignore`
- Replace broken DIY pnpm store caching with `actions/setup-node`
built-in cache management
- Add caching for test seed data created in DB
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- CI
### Changes 🏗️
The `find_block` AutoPilot tool was returning ~90K characters per
response (10 blocks). The bloat came from including full JSON Schema
objects (`input_schema`, `output_schema`) with all nested `$defs`,
`anyOf`, and type definitions for every block.
**What changed:**
- **`BlockInfoSummary` model**: Removed `input_schema` (raw JSON
Schema), `output_schema` (raw JSON Schema), and `categories`. Added
`output_fields` (compact field-level summaries matching the existing
`required_inputs` format).
- **`BlockListResponse` model**: Removed `usage_hint` (info now in
`message`).
- **`FindBlockTool._execute()`**: Now extracts compact `output_fields`
from output schema properties instead of including the entire raw
schema. Credentials handling is unchanged.
- **Test**: Added `test_response_size_average_chars_per_block` with
realistic block schemas (HTTP, Email, Claude Code) to measure and assert
response size stays under 2K chars/block.
- **`CLAUDE.md`**: Clarified `dev` vs `master` branching strategy.
**Result:** Average response size reduced from ~9,000 to ~1,300 chars
per block (~85% reduction). This directly reduces LLM token consumption,
latency, and API costs for AutoPilot interactions.
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Verified models import and serialize correctly
- [x] Verified response size: 3,970 chars for 3 realistic blocks (avg
1,323/block)
- [x] Lint (`ruff check`) and type check (`pyright`) pass on changed
files
- [x] Frontend compatibility preserved: `blocks[].name` and `count`
fields retained for `block_list` handler
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Toran Bruce Richards <toran.richards@gmail.com>
## Summary
- Remove left and right borders from tables rendered in CoPilot chat
- Increase cell padding (py-3 → py-3.5) for better spacing between text
and lines
- Applies to both Streamdown (main chat) and MarkdownRenderer (tool
outputs)
Design feedback from Olivia to make tables "breathe" more.
## Test plan
- [ ] Open CoPilot chat and trigger a response containing a table
- [ ] Verify tables no longer have left/right borders
- [ ] Verify increased spacing between rows
- [ ] Check both light and dark modes
🤖 Generated with [Claude Code](https://claude.com/claude-code)
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<details><summary><h3>Greptile Summary</h3></summary>
Improved CoPilot chat table styling by removing left and right borders
and increasing vertical padding from `py-3` to `py-3.5`. Changes apply
to both:
- Streamdown-rendered tables (via CSS selector in `globals.css`)
- MarkdownRenderer tables (via Tailwind classes)
The changes make tables "breathe" more per design feedback from Olivia.
**Issue Found:**
- The CSS padding value in `globals.css:192` is `0.625rem` (`py-2.5`)
but should be `0.875rem` (`py-3.5`) to match the PR description and the
MarkdownRenderer implementation.
</details>
<details><summary><h3>Confidence Score: 2/5</h3></summary>
- This PR has a logical error that will cause inconsistent table styling
between Streamdown and MarkdownRenderer tables
- The implementation has an inconsistency where the CSS file uses
`py-2.5` padding while the PR description and MarkdownRenderer use
`py-3.5`. This will result in different table padding between the two
rendering systems, contradicting the goal of consistent styling
improvements.
- Pay close attention to `autogpt_platform/frontend/src/app/globals.css`
- the padding value needs to be corrected to match the intended design
</details>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Resolves OPEN-2693: Make exact timestamp of runs accessible through UI.
The NewAgentLibraryView shows relative timestamps ("2 days ago") for
runs and schedules, but unlike the OldAgentLibraryView it didn't show
the exact timestamp on hover. This PR adds a native `title` tooltip so
users can see the full date/time by hovering.
### Changes 🏗️
- Added `descriptionTitle` prop to `SidebarItemCard` that renders as a
`title` attribute on the description text
- `TaskListItem` now passes the exact `run.started_at` timestamp via
`descriptionTitle`
- `ScheduleListItem` now passes the exact `schedule.next_run_time`
timestamp via `descriptionTitle`
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [ ] Open an agent in the library view
- [ ] Hover over a run's relative timestamp (e.g. "2 days ago") and
confirm the full date/time tooltip appears
- [ ] Hover over a schedule's relative timestamp and confirm the full
date/time tooltip appears
🤖 Generated with [Claude Code](https://claude.com/claude-code)
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<details><summary><h3>Greptile Summary</h3></summary>
Added native tooltip functionality to show exact timestamps in the
library view. The implementation adds a `descriptionTitle` prop to
`SidebarItemCard` that renders as a `title` attribute on the description
text. This allows users to hover over relative timestamps (e.g., "2 days
ago") to see the full date/time.
**Changes:**
- Added optional `descriptionTitle` prop to `SidebarItemCard` component
(SidebarItemCard.tsx:10)
- `TaskListItem` passes `run.started_at` as the tooltip value
(TaskListItem.tsx:84-86)
- `ScheduleListItem` passes `schedule.next_run_time` as the tooltip
value (ScheduleListItem.tsx:32)
- Unrelated fix included: Sentry configuration updated to suppress
cross-origin stylesheet errors (instrumentation-client.ts:25-28)
**Note:** The PR includes two separate commits - the main timestamp
tooltip feature and a Sentry error suppression fix. The PR description
only documents the timestamp feature.
</details>
<details><summary><h3>Confidence Score: 5/5</h3></summary>
- This PR is safe to merge with minimal risk
- The changes are straightforward and limited in scope - adding an
optional prop that forwards a native HTML attribute for tooltip
functionality. The Text component already supports forwarding arbitrary
HTML attributes through its spread operator (...rest), ensuring the
`title` attribute works correctly. Both the timestamp tooltip feature
and the Sentry configuration fix are low-risk improvements with no
breaking changes.
- No files require special attention
</details>
<details><summary><h3>Sequence Diagram</h3></summary>
```mermaid
sequenceDiagram
participant User
participant TaskListItem
participant ScheduleListItem
participant SidebarItemCard
participant Text
participant Browser
User->>TaskListItem: Hover over run timestamp
TaskListItem->>SidebarItemCard: Pass descriptionTitle (run.started_at)
SidebarItemCard->>Text: Render with title attribute
Text->>Browser: Forward title attribute to DOM
Browser->>User: Display native tooltip with exact timestamp
User->>ScheduleListItem: Hover over schedule timestamp
ScheduleListItem->>SidebarItemCard: Pass descriptionTitle (schedule.next_run_time)
SidebarItemCard->>Text: Render with title attribute
Text->>Browser: Forward title attribute to DOM
Browser->>User: Display native tooltip with exact timestamp
```
</details>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
## Summary
- Adds `ignoreErrors` to the Sentry client configuration
(`instrumentation-client.ts`) to filter out `SecurityError:
CSSStyleSheet.cssRules getter: Not allowed to access cross-origin
stylesheet` errors
- These errors are caused by Sentry Replay (rrweb) attempting to
serialize DOM snapshots that include cross-origin stylesheets (from
browser extensions or CDN-loaded CSS)
- This was reported via Sentry on production, occurring on any page when
logged in
## Changes
- **`frontend/instrumentation-client.ts`**: Added `ignoreErrors: [/Not
allowed to access cross-origin stylesheet/]` to `Sentry.init()` config
## Test plan
- [ ] Verify the error no longer appears in Sentry after deployment
- [ ] Verify Sentry Replay still works correctly for other errors
- [ ] Verify no regressions in error tracking (other errors should still
be captured)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<details><summary><h3>Greptile Summary</h3></summary>
Adds error filtering to Sentry client configuration to suppress
cross-origin stylesheet security errors that occur when Sentry Replay
(rrweb) attempts to serialize DOM snapshots containing stylesheets from
browser extensions or CDN-loaded CSS. This prevents noise in Sentry
error logs without affecting the capture of legitimate errors.
</details>
<details><summary><h3>Confidence Score: 5/5</h3></summary>
- This PR is safe to merge with minimal risk
- The change adds a simple error filter to suppress benign cross-origin
stylesheet errors that are caused by Sentry Replay itself. The regex
pattern is specific and only affects client-side error reporting, with
no impact on application functionality or legitimate error capture
- No files require special attention
</details>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Agent generation completes on the backend but the UI does not
update/refresh to show the result.
### Changes 🏗️
![Uploading Screenshot 2026-02-13 at 00.44.54.png…]()
- **Stream start timeout (12s):** If the backend doesn't begin streaming
within 12 seconds of submitting a message, the stream is aborted and a
destructive toast is shown to the user.
- **Long-running tool polling:** Added `useLongRunningToolPolling` hook
that polls the session endpoint every 1.5s while a tool output is in an
operating state (`operation_started` / `operation_pending` /
`operation_in_progress`). When the backend completes, messages are
refreshed so the UI reflects the final result.
- **CreateAgent UI improvements:** Replaced the orbit loader / progress
bar with a mini-game, added expanded accordion for saved agents, and
improved the saved-agent card with image, icons, and links that open in
new tabs.
- **Backend tweaks:** Added `image_url` to `CreateAgentToolOutput`,
minor model/service updates for the dummy agent generator.
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Send a message and verify the stream starts within 12s or a toast
appears
- [x] Trigger agent creation and verify the UI updates when the backend
completes
- [x] Verify the saved-agent card renders correctly with image, links,
and icons
---------
Co-authored-by: Otto <otto@agpt.co>
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Store files created by sandbox blocks (Claude Code, Code Executor) to
the user's workspace for persistence across runs.
### Changes 🏗️
- **New `sandbox_files.py` utility** (`backend/util/sandbox_files.py`)
- Shared module for extracting files from E2B sandboxes
- Stores files to workspace via `store_media_file()` (includes virus
scanning, size limits)
- Returns `SandboxFileOutput` with path, content, and `workspace_ref`
- **Claude Code block** (`backend/blocks/claude_code.py`)
- Added `workspace_ref` field to `FileOutput` schema
- Replaced inline `_extract_files()` with shared utility
- Files from working directory now stored to workspace automatically
- **Code Executor block** (`backend/blocks/code_executor.py`)
- Added `files` output field to `ExecuteCodeBlock.Output`
- Creates `/output` directory in sandbox before execution
- Extracts all files (text + binary) from `/output` after execution
- Updated `execute_code()` to support file extraction with
`extract_files` param
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Create agent with Claude Code block, have it create a file, verify
`workspace_ref` in output
- [x] Create agent with Code Executor block, write file to `/output`,
verify `workspace_ref` in output
- [x] Verify files persist in workspace after sandbox disposal
- [x] Verify binary files (images, etc.) work correctly in Code Executor
- [x] Verify existing graphs using `content` field still work (backward
compat)
#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)
No configuration changes required - this is purely additive backend
code.
---
**Related:** Closes SECRT-1931
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> **Medium Risk**
> Adds automatic extraction and workspace storage of sandbox-written
files (including binaries for code execution), which can affect output
payload size, performance, and file-handling edge cases.
>
> **Overview**
> **Sandbox blocks now persist generated files to workspace.** A new
shared utility (`backend/util/sandbox_files.py`) extracts files from an
E2B sandbox (scoped by a start timestamp) and stores them via
`store_media_file`, returning `SandboxFileOutput` with `workspace_ref`.
>
> `ClaudeCodeBlock` replaces its inline file-scraping logic with this
utility and updates the `files` output schema to include
`workspace_ref`.
>
> `ExecuteCodeBlock` adds a `files` output and extends the executor
mixin to optionally extract/store files (text + binary) when an
`execution_context` is provided; related mocks/tests and docs are
updated accordingly.
>
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
343854c0cf. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
### Changes 🏗️
Removed the default expiration date for API keys in the credentials
modal. Previously, API keys were set to expire the next day by default,
but now the expiration date field starts empty, allowing users to
explicitly choose whether they want to set an expiration date.
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Open the API key credentials modal and verify the expiration date
field is empty by default
- [x] Test creating an API key with and without an expiration date
- [x] Verify both scenarios work correctly
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<details><summary><h3>Greptile Summary</h3></summary>
Removed the default expiration date for API key credentials in the
credentials modal. Previously, API keys were automatically set to expire
the next day at midnight. Now the expiration date field starts empty,
allowing users to explicitly choose whether to set an expiration.
- Removed `getDefaultExpirationDate()` helper function that calculated
tomorrow's date
- Changed default `expiresAt` value from calculated date to empty string
- Backend already supports optional expiration (`expires_at?: number`),
so no backend changes needed
- Form submission correctly handles empty expiration by passing
`undefined` to the API
</details>
<details><summary><h3>Confidence Score: 5/5</h3></summary>
- This PR is safe to merge with minimal risk
- The changes are straightforward and well-contained. The refactor
removes a helper function and changes a default value. The backend API
already supports optional expiration dates, and the form submission
logic correctly handles empty values by passing undefined. The change
improves UX by not forcing a default expiration date on users.
- No files require special attention
</details>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
## Summary
Removes the `min-h-screen` class from `ConversationContent` in
ChatMessagesContainer, which was causing fixed height layout issues in
the CoPilot chat interface.
## Changes
- Removed `min-h-screen` from ConversationContent className
## Linear
Fixes [SECRT-1944](https://linear.app/autogpt/issue/SECRT-1944)
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<details><summary><h3>Greptile Summary</h3></summary>
Removes the `min-h-screen` (100vh) class from `ConversationContent` that
was causing the chat message container to enforce a minimum viewport
height. The parent container already handles height constraints with
`h-full min-h-0` and flexbox layout, so the fixed minimum height was
creating layout conflicts. The component now properly grows within its
flex container using `flex-1`.
</details>
<details><summary><h3>Confidence Score: 5/5</h3></summary>
- This PR is safe to merge with minimal risk
- The change removes a single problematic CSS class that was causing
fixed height layout issues. The parent container already handles height
constraints properly with flexbox, and removing min-h-screen allows the
component to size correctly within its flex parent. This is a targeted,
low-risk bug fix with no logic changes.
- No files require special attention
</details>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
I'm getting circular import issues because there is a lot of
cross-importing between `backend.data`, `backend.blocks`, and other
modules. This change reduces block-related cross-imports and thus risk
of breaking circular imports.
### Changes 🏗️
- Strip down `backend.data.block`
- Move `Block` base class and related class/enum defs to
`backend.blocks._base`
- Move `is_block_auth_configured` to `backend.blocks._utils`
- Move `get_blocks()`, `get_io_block_ids()` etc. to `backend.blocks`
(`__init__.py`)
- Update imports everywhere
- Remove unused and poorly typed `Block.create()`
- Change usages from `block_cls.create()` to `block_cls()`
- Improve typing of `load_all_blocks` and `get_blocks`
- Move cross-import of `backend.api.features.library.model` from
`backend/data/__init__.py` to `backend/data/integrations.py`
- Remove deprecated attribute `NodeModel.webhook`
- Re-generate OpenAPI spec and fix frontend usage
- Eliminate module-level `backend.blocks` import from `blocks/agent.py`
- Eliminate module-level `backend.data.execution` and
`backend.executor.manager` imports from `blocks/helpers/review.py`
- Replace `BlockInput` with `GraphInput` for graph inputs
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- CI static type-checking + tests should be sufficient for this
(#12081)
### Changes 🏗️
This PR completes the migration from the legacy builder to the new Flow
editor by removing all legacy code and feature flags.
**Removed:**
- Old builder view toggle functionality (`BuilderViewTabs.tsx`)
- Legacy debug panel (`RightSidebar.tsx`)
- Feature flags: `NEW_FLOW_EDITOR` and `BUILDER_VIEW_SWITCH`
- `useBuilderView` hook and related view-switching logic
**Updated:**
- Simplified `build/page.tsx` to always render the new Flow editor
- Added CSS styling (`flow.css`) to properly render Phosphor icons in
React Flow handles
**Tests:**
- Skipped e2e test suite in `build.spec.ts` (legacy builder tests)
- Follow-up PR (#12082) will add new e2e tests for the Flow editor
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Create a new flow and verify it loads correctly
- [x] Add nodes and connections to verify basic functionality works
- [x] Verify that node handles render correctly with the new CSS
- [x] Check that the UI is clean without the old debug panel or view
toggles
#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
## Summary
- When the copilot model responds with both text content AND a
long-running tool call (e.g., `create_agent`), the streaming code
created two separate consecutive assistant messages — one with text, one
with `tool_calls`. This caused Anthropic's API to reject with
`"unexpected tool_use_id found in tool_result blocks"` because the
`tool_result` couldn't find a matching `tool_use` in the immediately
preceding assistant message.
- Added a defensive merge of consecutive assistant messages in
`to_openai_messages()` (fixes existing corrupt sessions too)
- Fixed `_yield_tool_call` to add tool_calls to the existing
current-turn assistant message instead of creating a new one
- Changed `accumulated_tool_calls` assignment to use `extend` to prevent
overwriting tool_calls added by long-running tool flow
## Test plan
- [x] All 23 chat feature tests pass (`backend/api/features/chat/`)
- [x] All 44 prompt utility tests pass (`backend/util/prompt_test.py`)
- [x] All pre-commit hooks pass (ruff, isort, black, pyright)
- [ ] Manual test: create an agent via copilot, then ask a follow-up
question — should no longer get 400 error
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<details><summary><h3>Greptile Summary</h3></summary>
Fixes a critical bug where long-running tool calls (like `create_agent`)
caused Anthropic API 400 errors due to split assistant messages. The fix
ensures tool calls are added to the existing assistant message instead
of creating new ones, and adds a defensive merge function to repair any
existing corrupt sessions.
**Key changes:**
- Added `_merge_consecutive_assistant_messages()` to defensively merge
split assistant messages in `to_openai_messages()`
- Modified `_yield_tool_call()` to append tool calls to the current-turn
assistant message instead of creating a new one
- Changed `accumulated_tool_calls` from assignment to `extend` to
preserve tool calls already added by long-running tool flow
**Impact:** Resolves the issue where users received 400 errors after
creating agents via copilot and asking follow-up questions.
</details>
<details><summary><h3>Confidence Score: 4/5</h3></summary>
- Safe to merge with minor verification recommended
- The changes are well-targeted and solve a real API compatibility
issue. The logic is sound: searching backwards for the current assistant
message is correct, and using `extend` instead of assignment prevents
overwriting. The defensive merge in `to_openai_messages()` also fixes
existing corrupt sessions. All existing tests pass according to the PR
description.
- No files require special attention - changes are localized and
defensive
</details>
<details><summary><h3>Sequence Diagram</h3></summary>
```mermaid
sequenceDiagram
participant User
participant StreamAPI as stream_chat_completion
participant Chunks as _stream_chat_chunks
participant ToolCall as _yield_tool_call
participant Session as ChatSession
User->>StreamAPI: Send message
StreamAPI->>Chunks: Stream chat chunks
alt Text + Long-running tool call
Chunks->>StreamAPI: Text delta (content)
StreamAPI->>Session: Append assistant message with content
Chunks->>ToolCall: Tool call detected
Note over ToolCall: OLD: Created new assistant message<br/>NEW: Appends to existing assistant
ToolCall->>Session: Search backwards for current assistant
ToolCall->>Session: Append tool_call to existing message
ToolCall->>Session: Add pending tool result
end
StreamAPI->>StreamAPI: Merge accumulated_tool_calls
Note over StreamAPI: Use extend (not assign)<br/>to preserve existing tool_calls
StreamAPI->>Session: to_openai_messages()
Session->>Session: _merge_consecutive_assistant_messages()
Note over Session: Defensive: Merges any split<br/>assistant messages
Session-->>StreamAPI: Merged messages
StreamAPI->>User: Return response
```
</details>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
## Problem
The agent builder (LLM) misinterprets the HumanInTheLoop block outputs.
It thinks `approved_data` and `rejected_data` will yield status strings
like "APPROVED" or "REJECTED" instead of understanding that the actual
input data passes through.
This leads to unnecessary complexity - the agent builder adds comparison
blocks to check for status strings that don't exist.
## Solution
Enriched the block docstring and all input/output field descriptions to
make it explicit that:
1. The output is the actual data itself, not a status string
2. The routing is determined by which output pin fires
3. How to use the block correctly (connect downstream blocks to
appropriate output pins)
## Changes
- Updated block docstring with clear "How it works" and "Example usage"
sections
- Enhanced `data` input description to explain data flow
- Enhanced `name` input description for reviewer context
- Enhanced `approved_data` output to explicitly state it's NOT a status
string
- Enhanced `rejected_data` output to explicitly state it's NOT a status
string
- Enhanced `review_message` output for clarity
## Testing
Documentation-only change to schema descriptions. No functional changes.
Fixes SECRT-1930
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<details><summary><h3>Greptile Summary</h3></summary>
Enhanced documentation for the `HumanInTheLoopBlock` to clarify how
output pins work. The key improvement explicitly states that output pins
(`approved_data` and `rejected_data`) yield the actual input data, not
status strings like "APPROVED" or "REJECTED". This prevents the agent
builder (LLM) from misinterpreting the block's behavior and adding
unnecessary comparison blocks.
**Key changes:**
- Added "How it works" and "Example usage" sections to the block
docstring
- Clarified that routing is determined by which output pin fires, not by
comparing output values
- Enhanced all input/output field descriptions with explicit data flow
explanations
- Emphasized that downstream blocks should be connected to the
appropriate output pin based on desired workflow path
This is a documentation-only change with no functional modifications to
the code logic.
</details>
<details><summary><h3>Confidence Score: 5/5</h3></summary>
- This PR is safe to merge with no risk
- Documentation-only change that accurately reflects the existing code
behavior. No functional changes, no runtime impact, and the enhanced
descriptions correctly explain how the block outputs work based on
verification of the implementation code.
- No files require special attention
</details>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
## Changes 🏗️
<img width="800" height="621" alt="Screenshot 2026-02-11 at 19 32 39"
src="https://github.com/user-attachments/assets/e97be1a7-972e-4ae0-8dfa-6ade63cf287b"
/>
When the BE API has an error, prevent it from leaking into the stream
and instead handle it gracefully via toast.
## Checklist 📋
### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Run the app locally and trust the changes
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<details><summary><h3>Greptile Summary</h3></summary>
This PR fixes an issue where backend API stream errors were leaking into
the chat prompt instead of being handled gracefully. The fix involves
both backend and frontend changes to ensure error events conform to the
AI SDK's strict schema.
**Key Changes:**
- **Backend (`response_model.py`)**: Added custom `to_sse()` method for
`StreamError` that only emits `type` and `errorText` fields, stripping
extra fields like `code` and `details` that cause AI SDK validation
failures
- **Backend (`prompt.py`)**: Added validation step after context
compression to remove orphaned tool responses without matching tool
calls, preventing "unexpected tool_use_id" API errors
- **Frontend (`route.ts`)**: Implemented SSE stream normalization with
`normalizeSSEStream()` and `normalizeSSEEvent()` functions to strip
non-conforming fields from error events before they reach the AI SDK
- **Frontend (`ChatMessagesContainer.tsx`)**: Added toast notifications
for errors and improved error display UI with deduplication logic
The changes ensure a clean separation between internal error metadata
(useful for logging/debugging) and the strict schema required by the AI
SDK on the frontend.
</details>
<details><summary><h3>Confidence Score: 4/5</h3></summary>
- This PR is safe to merge with low risk
- The changes are well-structured and address a specific bug with proper
error handling. The dual-layer approach (backend filtering in `to_sse()`
+ frontend normalization) provides defense-in-depth. However, the lack
of automated tests for the new error normalization logic and the
potential for edge cases in SSE parsing prevent a perfect score.
- Pay close attention to
`autogpt_platform/frontend/src/app/api/chat/sessions/[sessionId]/stream/route.ts`
- the SSE normalization logic should be tested with various error
scenarios
</details>
<details><summary><h3>Sequence Diagram</h3></summary>
```mermaid
sequenceDiagram
participant User
participant Frontend as ChatMessagesContainer
participant Proxy as /api/chat/.../stream
participant Backend as Backend API
participant AISDK as AI SDK
User->>Frontend: Send message
Frontend->>Proxy: POST with message
Proxy->>Backend: Forward request with auth
Backend->>Backend: Process message
alt Success Path
Backend->>Proxy: SSE stream (text-delta, etc.)
Proxy->>Proxy: normalizeSSEStream (pass through)
Proxy->>AISDK: Forward SSE events
AISDK->>Frontend: Update messages
Frontend->>User: Display response
else Error Path
Backend->>Backend: StreamError.to_sse()
Note over Backend: Only emit {type, errorText}
Backend->>Proxy: SSE error event
Proxy->>Proxy: normalizeSSEEvent()
Note over Proxy: Strip extra fields (code, details)
Proxy->>AISDK: {type: "error", errorText: "..."}
AISDK->>Frontend: error state updated
Frontend->>Frontend: Toast notification (deduplicated)
Frontend->>User: Show error UI + toast
end
```
</details>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
---------
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: Otto-AGPT <otto@agpt.co>
### Changes 🏗️
Added `min-w-0` class to the ContentCard component in the ToolAccordion
to prevent content overflow issues. This CSS fix ensures that the card
properly respects its container width constraints and allows text
truncation to work correctly when content is too wide.
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Verified that tool content displays correctly in the accordion
- [x] Confirmed that long content properly truncates instead of
overflowing
- [x] Tested with various screen sizes to ensure responsive behavior
#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<details><summary><h3>Greptile Summary</h3></summary>
Added `min-w-0` class to `ContentCard` component to fix text truncation
overflow in grid layouts. This is a standard CSS fix that allows grid
items to shrink below their content size, enabling `truncate` classes on
child elements (`ContentCardTitle`, `ContentCardSubtitle`) to work
correctly. The fix follows the same pattern already used in
`ContentCardHeader` (line 54) and `ToolAccordion` (line 54).
</details>
<details><summary><h3>Confidence Score: 5/5</h3></summary>
- Safe to merge with no risk
- Single-line CSS fix that addresses a well-known flexbox/grid layout
issue. The change follows existing patterns in the codebase and is
thoroughly tested. No logic changes, no breaking changes, no side
effects.
- No files require special attention
</details>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
## Summary
Blocks marked `disabled=True` (like BlockInstallationBlock) were not
being checked during graph validation, allowing them to be used via
direct API calls despite being hidden from the UI.
This adds a security check in `_validate_graph_get_errors()` to reject
any graph containing disabled blocks.
## Security Advisory
GHSA-4crw-9p35-9x54
## Linear
SECRT-1927
## Changes
- Added `block.disabled` check in graph validation (6 lines)
## Testing
- Graphs with disabled blocks → rejected with clear error message
- Graphs with valid blocks → unchanged behavior
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<details><summary><h3>Greptile Summary</h3></summary>
Adds critical security validation to prevent execution of disabled
blocks (like `BlockInstallationBlock`) via direct API calls. The fix
validates that `block.disabled` is `False` during graph validation in
`_validate_graph_get_errors()` on line 747-750, ensuring disabled blocks
are rejected before graph creation or execution. This closes a
vulnerability where blocks marked disabled in the UI could still be used
through API endpoints.
</details>
<details><summary><h3>Confidence Score: 5/5</h3></summary>
- This PR is safe to merge and addresses a critical security
vulnerability
- The fix is minimal (6 lines), correctly placed in the validation flow,
includes clear security context (GHSA reference), and follows existing
validation patterns. The check is positioned after block existence
validation and before input validation, ensuring disabled blocks are
caught early in both graph creation and execution paths.
- No files require special attention
</details>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
---------
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
# your codebase is analyzed, see https://docs.github.com/en/code-security/code-scanning/creating-an-advanced-setup-for-code-scanning/codeql-code-scanning-for-compiled-languages
Concatenates two or more lists into a single list.
This block accepts a list of lists and combines all their elements
in order into one flat output list. It supports options for
deduplication and None-filtering to provide flexible list merging
capabilities for workflow pipelines.
"""
classInput(BlockSchemaInput):
lists:List[List[Any]]=SchemaField(
description="A list of lists to concatenate together. All lists will be combined in order into a single list.",
placeholder="e.g., [[1, 2], [3, 4], [5, 6]]",
)
deduplicate:bool=SchemaField(
description="If True, remove duplicate elements from the concatenated result while preserving order.",
default=False,
advanced=True,
)
remove_none:bool=SchemaField(
description="If True, remove None values from the concatenated result.",
default=False,
advanced=True,
)
classOutput(BlockSchemaOutput):
concatenated_list:List[Any]=SchemaField(
description="The concatenated list containing all elements from all input lists in order."
)
length:int=SchemaField(
description="The total number of elements in the concatenated list."
)
error:str=SchemaField(
description="Error message if concatenation failed due to invalid input types."
)
@@ -700,7 +902,7 @@ class ConcatenateListsBlock(Block):
def__init__(self):
super().__init__(
id="3cf9298b-5817-4141-9d80-7c2cc5199c8e",
description="Concatenates multiple lists into a single list. All elements from all input lists are combined in order.",
description="Concatenates multiple lists into a single list. All elements from all input lists are combined in order. Supports optional deduplication and None removal.",
categories={BlockCategory.BASIC},
input_schema=ConcatenateListsBlock.Input,
output_schema=ConcatenateListsBlock.Output,
@@ -709,29 +911,497 @@ class ConcatenateListsBlock(Block):
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.