The frontend `e2e_test` doesn't have a working build cache setup,
causing really slow builds = slow test jobs. These changes reduce total
test runtime from ~12 minutes to ~5 minutes.
### Changes 🏗️
- Inject build cache config into docker compose config; let `buildx
bake` use GHA cache directly
- Add `docker-ci-fix-compose-build-cache.py` script
- Optimize `backend/Dockerfile` + root `.dockerignore`
- Replace broken DIY pnpm store caching with `actions/setup-node`
built-in cache management
- Add caching for test seed data created in DB
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- CI
### Changes 🏗️
The `find_block` AutoPilot tool was returning ~90K characters per
response (10 blocks). The bloat came from including full JSON Schema
objects (`input_schema`, `output_schema`) with all nested `$defs`,
`anyOf`, and type definitions for every block.
**What changed:**
- **`BlockInfoSummary` model**: Removed `input_schema` (raw JSON
Schema), `output_schema` (raw JSON Schema), and `categories`. Added
`output_fields` (compact field-level summaries matching the existing
`required_inputs` format).
- **`BlockListResponse` model**: Removed `usage_hint` (info now in
`message`).
- **`FindBlockTool._execute()`**: Now extracts compact `output_fields`
from output schema properties instead of including the entire raw
schema. Credentials handling is unchanged.
- **Test**: Added `test_response_size_average_chars_per_block` with
realistic block schemas (HTTP, Email, Claude Code) to measure and assert
response size stays under 2K chars/block.
- **`CLAUDE.md`**: Clarified `dev` vs `master` branching strategy.
**Result:** Average response size reduced from ~9,000 to ~1,300 chars
per block (~85% reduction). This directly reduces LLM token consumption,
latency, and API costs for AutoPilot interactions.
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Verified models import and serialize correctly
- [x] Verified response size: 3,970 chars for 3 realistic blocks (avg
1,323/block)
- [x] Lint (`ruff check`) and type check (`pyright`) pass on changed
files
- [x] Frontend compatibility preserved: `blocks[].name` and `count`
fields retained for `block_list` handler
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Toran Bruce Richards <toran.richards@gmail.com>
## Summary
- Remove left and right borders from tables rendered in CoPilot chat
- Increase cell padding (py-3 → py-3.5) for better spacing between text
and lines
- Applies to both Streamdown (main chat) and MarkdownRenderer (tool
outputs)
Design feedback from Olivia to make tables "breathe" more.
## Test plan
- [ ] Open CoPilot chat and trigger a response containing a table
- [ ] Verify tables no longer have left/right borders
- [ ] Verify increased spacing between rows
- [ ] Check both light and dark modes
🤖 Generated with [Claude Code](https://claude.com/claude-code)
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<details><summary><h3>Greptile Summary</h3></summary>
Improved CoPilot chat table styling by removing left and right borders
and increasing vertical padding from `py-3` to `py-3.5`. Changes apply
to both:
- Streamdown-rendered tables (via CSS selector in `globals.css`)
- MarkdownRenderer tables (via Tailwind classes)
The changes make tables "breathe" more per design feedback from Olivia.
**Issue Found:**
- The CSS padding value in `globals.css:192` is `0.625rem` (`py-2.5`)
but should be `0.875rem` (`py-3.5`) to match the PR description and the
MarkdownRenderer implementation.
</details>
<details><summary><h3>Confidence Score: 2/5</h3></summary>
- This PR has a logical error that will cause inconsistent table styling
between Streamdown and MarkdownRenderer tables
- The implementation has an inconsistency where the CSS file uses
`py-2.5` padding while the PR description and MarkdownRenderer use
`py-3.5`. This will result in different table padding between the two
rendering systems, contradicting the goal of consistent styling
improvements.
- Pay close attention to `autogpt_platform/frontend/src/app/globals.css`
- the padding value needs to be corrected to match the intended design
</details>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Resolves OPEN-2693: Make exact timestamp of runs accessible through UI.
The NewAgentLibraryView shows relative timestamps ("2 days ago") for
runs and schedules, but unlike the OldAgentLibraryView it didn't show
the exact timestamp on hover. This PR adds a native `title` tooltip so
users can see the full date/time by hovering.
### Changes 🏗️
- Added `descriptionTitle` prop to `SidebarItemCard` that renders as a
`title` attribute on the description text
- `TaskListItem` now passes the exact `run.started_at` timestamp via
`descriptionTitle`
- `ScheduleListItem` now passes the exact `schedule.next_run_time`
timestamp via `descriptionTitle`
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [ ] Open an agent in the library view
- [ ] Hover over a run's relative timestamp (e.g. "2 days ago") and
confirm the full date/time tooltip appears
- [ ] Hover over a schedule's relative timestamp and confirm the full
date/time tooltip appears
🤖 Generated with [Claude Code](https://claude.com/claude-code)
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<details><summary><h3>Greptile Summary</h3></summary>
Added native tooltip functionality to show exact timestamps in the
library view. The implementation adds a `descriptionTitle` prop to
`SidebarItemCard` that renders as a `title` attribute on the description
text. This allows users to hover over relative timestamps (e.g., "2 days
ago") to see the full date/time.
**Changes:**
- Added optional `descriptionTitle` prop to `SidebarItemCard` component
(SidebarItemCard.tsx:10)
- `TaskListItem` passes `run.started_at` as the tooltip value
(TaskListItem.tsx:84-86)
- `ScheduleListItem` passes `schedule.next_run_time` as the tooltip
value (ScheduleListItem.tsx:32)
- Unrelated fix included: Sentry configuration updated to suppress
cross-origin stylesheet errors (instrumentation-client.ts:25-28)
**Note:** The PR includes two separate commits - the main timestamp
tooltip feature and a Sentry error suppression fix. The PR description
only documents the timestamp feature.
</details>
<details><summary><h3>Confidence Score: 5/5</h3></summary>
- This PR is safe to merge with minimal risk
- The changes are straightforward and limited in scope - adding an
optional prop that forwards a native HTML attribute for tooltip
functionality. The Text component already supports forwarding arbitrary
HTML attributes through its spread operator (...rest), ensuring the
`title` attribute works correctly. Both the timestamp tooltip feature
and the Sentry configuration fix are low-risk improvements with no
breaking changes.
- No files require special attention
</details>
<details><summary><h3>Sequence Diagram</h3></summary>
```mermaid
sequenceDiagram
participant User
participant TaskListItem
participant ScheduleListItem
participant SidebarItemCard
participant Text
participant Browser
User->>TaskListItem: Hover over run timestamp
TaskListItem->>SidebarItemCard: Pass descriptionTitle (run.started_at)
SidebarItemCard->>Text: Render with title attribute
Text->>Browser: Forward title attribute to DOM
Browser->>User: Display native tooltip with exact timestamp
User->>ScheduleListItem: Hover over schedule timestamp
ScheduleListItem->>SidebarItemCard: Pass descriptionTitle (schedule.next_run_time)
SidebarItemCard->>Text: Render with title attribute
Text->>Browser: Forward title attribute to DOM
Browser->>User: Display native tooltip with exact timestamp
```
</details>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
## Summary
- Adds `ignoreErrors` to the Sentry client configuration
(`instrumentation-client.ts`) to filter out `SecurityError:
CSSStyleSheet.cssRules getter: Not allowed to access cross-origin
stylesheet` errors
- These errors are caused by Sentry Replay (rrweb) attempting to
serialize DOM snapshots that include cross-origin stylesheets (from
browser extensions or CDN-loaded CSS)
- This was reported via Sentry on production, occurring on any page when
logged in
## Changes
- **`frontend/instrumentation-client.ts`**: Added `ignoreErrors: [/Not
allowed to access cross-origin stylesheet/]` to `Sentry.init()` config
## Test plan
- [ ] Verify the error no longer appears in Sentry after deployment
- [ ] Verify Sentry Replay still works correctly for other errors
- [ ] Verify no regressions in error tracking (other errors should still
be captured)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<details><summary><h3>Greptile Summary</h3></summary>
Adds error filtering to Sentry client configuration to suppress
cross-origin stylesheet security errors that occur when Sentry Replay
(rrweb) attempts to serialize DOM snapshots containing stylesheets from
browser extensions or CDN-loaded CSS. This prevents noise in Sentry
error logs without affecting the capture of legitimate errors.
</details>
<details><summary><h3>Confidence Score: 5/5</h3></summary>
- This PR is safe to merge with minimal risk
- The change adds a simple error filter to suppress benign cross-origin
stylesheet errors that are caused by Sentry Replay itself. The regex
pattern is specific and only affects client-side error reporting, with
no impact on application functionality or legitimate error capture
- No files require special attention
</details>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Agent generation completes on the backend but the UI does not
update/refresh to show the result.
### Changes 🏗️
![Uploading Screenshot 2026-02-13 at 00.44.54.png…]()
- **Stream start timeout (12s):** If the backend doesn't begin streaming
within 12 seconds of submitting a message, the stream is aborted and a
destructive toast is shown to the user.
- **Long-running tool polling:** Added `useLongRunningToolPolling` hook
that polls the session endpoint every 1.5s while a tool output is in an
operating state (`operation_started` / `operation_pending` /
`operation_in_progress`). When the backend completes, messages are
refreshed so the UI reflects the final result.
- **CreateAgent UI improvements:** Replaced the orbit loader / progress
bar with a mini-game, added expanded accordion for saved agents, and
improved the saved-agent card with image, icons, and links that open in
new tabs.
- **Backend tweaks:** Added `image_url` to `CreateAgentToolOutput`,
minor model/service updates for the dummy agent generator.
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Send a message and verify the stream starts within 12s or a toast
appears
- [x] Trigger agent creation and verify the UI updates when the backend
completes
- [x] Verify the saved-agent card renders correctly with image, links,
and icons
---------
Co-authored-by: Otto <otto@agpt.co>
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Store files created by sandbox blocks (Claude Code, Code Executor) to
the user's workspace for persistence across runs.
### Changes 🏗️
- **New `sandbox_files.py` utility** (`backend/util/sandbox_files.py`)
- Shared module for extracting files from E2B sandboxes
- Stores files to workspace via `store_media_file()` (includes virus
scanning, size limits)
- Returns `SandboxFileOutput` with path, content, and `workspace_ref`
- **Claude Code block** (`backend/blocks/claude_code.py`)
- Added `workspace_ref` field to `FileOutput` schema
- Replaced inline `_extract_files()` with shared utility
- Files from working directory now stored to workspace automatically
- **Code Executor block** (`backend/blocks/code_executor.py`)
- Added `files` output field to `ExecuteCodeBlock.Output`
- Creates `/output` directory in sandbox before execution
- Extracts all files (text + binary) from `/output` after execution
- Updated `execute_code()` to support file extraction with
`extract_files` param
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Create agent with Claude Code block, have it create a file, verify
`workspace_ref` in output
- [x] Create agent with Code Executor block, write file to `/output`,
verify `workspace_ref` in output
- [x] Verify files persist in workspace after sandbox disposal
- [x] Verify binary files (images, etc.) work correctly in Code Executor
- [x] Verify existing graphs using `content` field still work (backward
compat)
#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)
No configuration changes required - this is purely additive backend
code.
---
**Related:** Closes SECRT-1931
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> **Medium Risk**
> Adds automatic extraction and workspace storage of sandbox-written
files (including binaries for code execution), which can affect output
payload size, performance, and file-handling edge cases.
>
> **Overview**
> **Sandbox blocks now persist generated files to workspace.** A new
shared utility (`backend/util/sandbox_files.py`) extracts files from an
E2B sandbox (scoped by a start timestamp) and stores them via
`store_media_file`, returning `SandboxFileOutput` with `workspace_ref`.
>
> `ClaudeCodeBlock` replaces its inline file-scraping logic with this
utility and updates the `files` output schema to include
`workspace_ref`.
>
> `ExecuteCodeBlock` adds a `files` output and extends the executor
mixin to optionally extract/store files (text + binary) when an
`execution_context` is provided; related mocks/tests and docs are
updated accordingly.
>
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
343854c0cf. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
### Changes 🏗️
Removed the default expiration date for API keys in the credentials
modal. Previously, API keys were set to expire the next day by default,
but now the expiration date field starts empty, allowing users to
explicitly choose whether they want to set an expiration date.
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Open the API key credentials modal and verify the expiration date
field is empty by default
- [x] Test creating an API key with and without an expiration date
- [x] Verify both scenarios work correctly
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<details><summary><h3>Greptile Summary</h3></summary>
Removed the default expiration date for API key credentials in the
credentials modal. Previously, API keys were automatically set to expire
the next day at midnight. Now the expiration date field starts empty,
allowing users to explicitly choose whether to set an expiration.
- Removed `getDefaultExpirationDate()` helper function that calculated
tomorrow's date
- Changed default `expiresAt` value from calculated date to empty string
- Backend already supports optional expiration (`expires_at?: number`),
so no backend changes needed
- Form submission correctly handles empty expiration by passing
`undefined` to the API
</details>
<details><summary><h3>Confidence Score: 5/5</h3></summary>
- This PR is safe to merge with minimal risk
- The changes are straightforward and well-contained. The refactor
removes a helper function and changes a default value. The backend API
already supports optional expiration dates, and the form submission
logic correctly handles empty values by passing undefined. The change
improves UX by not forcing a default expiration date on users.
- No files require special attention
</details>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
## Summary
Removes the `min-h-screen` class from `ConversationContent` in
ChatMessagesContainer, which was causing fixed height layout issues in
the CoPilot chat interface.
## Changes
- Removed `min-h-screen` from ConversationContent className
## Linear
Fixes [SECRT-1944](https://linear.app/autogpt/issue/SECRT-1944)
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<details><summary><h3>Greptile Summary</h3></summary>
Removes the `min-h-screen` (100vh) class from `ConversationContent` that
was causing the chat message container to enforce a minimum viewport
height. The parent container already handles height constraints with
`h-full min-h-0` and flexbox layout, so the fixed minimum height was
creating layout conflicts. The component now properly grows within its
flex container using `flex-1`.
</details>
<details><summary><h3>Confidence Score: 5/5</h3></summary>
- This PR is safe to merge with minimal risk
- The change removes a single problematic CSS class that was causing
fixed height layout issues. The parent container already handles height
constraints properly with flexbox, and removing min-h-screen allows the
component to size correctly within its flex parent. This is a targeted,
low-risk bug fix with no logic changes.
- No files require special attention
</details>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
I'm getting circular import issues because there is a lot of
cross-importing between `backend.data`, `backend.blocks`, and other
modules. This change reduces block-related cross-imports and thus risk
of breaking circular imports.
### Changes 🏗️
- Strip down `backend.data.block`
- Move `Block` base class and related class/enum defs to
`backend.blocks._base`
- Move `is_block_auth_configured` to `backend.blocks._utils`
- Move `get_blocks()`, `get_io_block_ids()` etc. to `backend.blocks`
(`__init__.py`)
- Update imports everywhere
- Remove unused and poorly typed `Block.create()`
- Change usages from `block_cls.create()` to `block_cls()`
- Improve typing of `load_all_blocks` and `get_blocks`
- Move cross-import of `backend.api.features.library.model` from
`backend/data/__init__.py` to `backend/data/integrations.py`
- Remove deprecated attribute `NodeModel.webhook`
- Re-generate OpenAPI spec and fix frontend usage
- Eliminate module-level `backend.blocks` import from `blocks/agent.py`
- Eliminate module-level `backend.data.execution` and
`backend.executor.manager` imports from `blocks/helpers/review.py`
- Replace `BlockInput` with `GraphInput` for graph inputs
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- CI static type-checking + tests should be sufficient for this
(#12081)
### Changes 🏗️
This PR completes the migration from the legacy builder to the new Flow
editor by removing all legacy code and feature flags.
**Removed:**
- Old builder view toggle functionality (`BuilderViewTabs.tsx`)
- Legacy debug panel (`RightSidebar.tsx`)
- Feature flags: `NEW_FLOW_EDITOR` and `BUILDER_VIEW_SWITCH`
- `useBuilderView` hook and related view-switching logic
**Updated:**
- Simplified `build/page.tsx` to always render the new Flow editor
- Added CSS styling (`flow.css`) to properly render Phosphor icons in
React Flow handles
**Tests:**
- Skipped e2e test suite in `build.spec.ts` (legacy builder tests)
- Follow-up PR (#12082) will add new e2e tests for the Flow editor
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Create a new flow and verify it loads correctly
- [x] Add nodes and connections to verify basic functionality works
- [x] Verify that node handles render correctly with the new CSS
- [x] Check that the UI is clean without the old debug panel or view
toggles
#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
## Summary
- When the copilot model responds with both text content AND a
long-running tool call (e.g., `create_agent`), the streaming code
created two separate consecutive assistant messages — one with text, one
with `tool_calls`. This caused Anthropic's API to reject with
`"unexpected tool_use_id found in tool_result blocks"` because the
`tool_result` couldn't find a matching `tool_use` in the immediately
preceding assistant message.
- Added a defensive merge of consecutive assistant messages in
`to_openai_messages()` (fixes existing corrupt sessions too)
- Fixed `_yield_tool_call` to add tool_calls to the existing
current-turn assistant message instead of creating a new one
- Changed `accumulated_tool_calls` assignment to use `extend` to prevent
overwriting tool_calls added by long-running tool flow
## Test plan
- [x] All 23 chat feature tests pass (`backend/api/features/chat/`)
- [x] All 44 prompt utility tests pass (`backend/util/prompt_test.py`)
- [x] All pre-commit hooks pass (ruff, isort, black, pyright)
- [ ] Manual test: create an agent via copilot, then ask a follow-up
question — should no longer get 400 error
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<details><summary><h3>Greptile Summary</h3></summary>
Fixes a critical bug where long-running tool calls (like `create_agent`)
caused Anthropic API 400 errors due to split assistant messages. The fix
ensures tool calls are added to the existing assistant message instead
of creating new ones, and adds a defensive merge function to repair any
existing corrupt sessions.
**Key changes:**
- Added `_merge_consecutive_assistant_messages()` to defensively merge
split assistant messages in `to_openai_messages()`
- Modified `_yield_tool_call()` to append tool calls to the current-turn
assistant message instead of creating a new one
- Changed `accumulated_tool_calls` from assignment to `extend` to
preserve tool calls already added by long-running tool flow
**Impact:** Resolves the issue where users received 400 errors after
creating agents via copilot and asking follow-up questions.
</details>
<details><summary><h3>Confidence Score: 4/5</h3></summary>
- Safe to merge with minor verification recommended
- The changes are well-targeted and solve a real API compatibility
issue. The logic is sound: searching backwards for the current assistant
message is correct, and using `extend` instead of assignment prevents
overwriting. The defensive merge in `to_openai_messages()` also fixes
existing corrupt sessions. All existing tests pass according to the PR
description.
- No files require special attention - changes are localized and
defensive
</details>
<details><summary><h3>Sequence Diagram</h3></summary>
```mermaid
sequenceDiagram
participant User
participant StreamAPI as stream_chat_completion
participant Chunks as _stream_chat_chunks
participant ToolCall as _yield_tool_call
participant Session as ChatSession
User->>StreamAPI: Send message
StreamAPI->>Chunks: Stream chat chunks
alt Text + Long-running tool call
Chunks->>StreamAPI: Text delta (content)
StreamAPI->>Session: Append assistant message with content
Chunks->>ToolCall: Tool call detected
Note over ToolCall: OLD: Created new assistant message<br/>NEW: Appends to existing assistant
ToolCall->>Session: Search backwards for current assistant
ToolCall->>Session: Append tool_call to existing message
ToolCall->>Session: Add pending tool result
end
StreamAPI->>StreamAPI: Merge accumulated_tool_calls
Note over StreamAPI: Use extend (not assign)<br/>to preserve existing tool_calls
StreamAPI->>Session: to_openai_messages()
Session->>Session: _merge_consecutive_assistant_messages()
Note over Session: Defensive: Merges any split<br/>assistant messages
Session-->>StreamAPI: Merged messages
StreamAPI->>User: Return response
```
</details>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
## Problem
The agent builder (LLM) misinterprets the HumanInTheLoop block outputs.
It thinks `approved_data` and `rejected_data` will yield status strings
like "APPROVED" or "REJECTED" instead of understanding that the actual
input data passes through.
This leads to unnecessary complexity - the agent builder adds comparison
blocks to check for status strings that don't exist.
## Solution
Enriched the block docstring and all input/output field descriptions to
make it explicit that:
1. The output is the actual data itself, not a status string
2. The routing is determined by which output pin fires
3. How to use the block correctly (connect downstream blocks to
appropriate output pins)
## Changes
- Updated block docstring with clear "How it works" and "Example usage"
sections
- Enhanced `data` input description to explain data flow
- Enhanced `name` input description for reviewer context
- Enhanced `approved_data` output to explicitly state it's NOT a status
string
- Enhanced `rejected_data` output to explicitly state it's NOT a status
string
- Enhanced `review_message` output for clarity
## Testing
Documentation-only change to schema descriptions. No functional changes.
Fixes SECRT-1930
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<details><summary><h3>Greptile Summary</h3></summary>
Enhanced documentation for the `HumanInTheLoopBlock` to clarify how
output pins work. The key improvement explicitly states that output pins
(`approved_data` and `rejected_data`) yield the actual input data, not
status strings like "APPROVED" or "REJECTED". This prevents the agent
builder (LLM) from misinterpreting the block's behavior and adding
unnecessary comparison blocks.
**Key changes:**
- Added "How it works" and "Example usage" sections to the block
docstring
- Clarified that routing is determined by which output pin fires, not by
comparing output values
- Enhanced all input/output field descriptions with explicit data flow
explanations
- Emphasized that downstream blocks should be connected to the
appropriate output pin based on desired workflow path
This is a documentation-only change with no functional modifications to
the code logic.
</details>
<details><summary><h3>Confidence Score: 5/5</h3></summary>
- This PR is safe to merge with no risk
- Documentation-only change that accurately reflects the existing code
behavior. No functional changes, no runtime impact, and the enhanced
descriptions correctly explain how the block outputs work based on
verification of the implementation code.
- No files require special attention
</details>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
## Changes 🏗️
<img width="800" height="621" alt="Screenshot 2026-02-11 at 19 32 39"
src="https://github.com/user-attachments/assets/e97be1a7-972e-4ae0-8dfa-6ade63cf287b"
/>
When the BE API has an error, prevent it from leaking into the stream
and instead handle it gracefully via toast.
## Checklist 📋
### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Run the app locally and trust the changes
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<details><summary><h3>Greptile Summary</h3></summary>
This PR fixes an issue where backend API stream errors were leaking into
the chat prompt instead of being handled gracefully. The fix involves
both backend and frontend changes to ensure error events conform to the
AI SDK's strict schema.
**Key Changes:**
- **Backend (`response_model.py`)**: Added custom `to_sse()` method for
`StreamError` that only emits `type` and `errorText` fields, stripping
extra fields like `code` and `details` that cause AI SDK validation
failures
- **Backend (`prompt.py`)**: Added validation step after context
compression to remove orphaned tool responses without matching tool
calls, preventing "unexpected tool_use_id" API errors
- **Frontend (`route.ts`)**: Implemented SSE stream normalization with
`normalizeSSEStream()` and `normalizeSSEEvent()` functions to strip
non-conforming fields from error events before they reach the AI SDK
- **Frontend (`ChatMessagesContainer.tsx`)**: Added toast notifications
for errors and improved error display UI with deduplication logic
The changes ensure a clean separation between internal error metadata
(useful for logging/debugging) and the strict schema required by the AI
SDK on the frontend.
</details>
<details><summary><h3>Confidence Score: 4/5</h3></summary>
- This PR is safe to merge with low risk
- The changes are well-structured and address a specific bug with proper
error handling. The dual-layer approach (backend filtering in `to_sse()`
+ frontend normalization) provides defense-in-depth. However, the lack
of automated tests for the new error normalization logic and the
potential for edge cases in SSE parsing prevent a perfect score.
- Pay close attention to
`autogpt_platform/frontend/src/app/api/chat/sessions/[sessionId]/stream/route.ts`
- the SSE normalization logic should be tested with various error
scenarios
</details>
<details><summary><h3>Sequence Diagram</h3></summary>
```mermaid
sequenceDiagram
participant User
participant Frontend as ChatMessagesContainer
participant Proxy as /api/chat/.../stream
participant Backend as Backend API
participant AISDK as AI SDK
User->>Frontend: Send message
Frontend->>Proxy: POST with message
Proxy->>Backend: Forward request with auth
Backend->>Backend: Process message
alt Success Path
Backend->>Proxy: SSE stream (text-delta, etc.)
Proxy->>Proxy: normalizeSSEStream (pass through)
Proxy->>AISDK: Forward SSE events
AISDK->>Frontend: Update messages
Frontend->>User: Display response
else Error Path
Backend->>Backend: StreamError.to_sse()
Note over Backend: Only emit {type, errorText}
Backend->>Proxy: SSE error event
Proxy->>Proxy: normalizeSSEEvent()
Note over Proxy: Strip extra fields (code, details)
Proxy->>AISDK: {type: "error", errorText: "..."}
AISDK->>Frontend: error state updated
Frontend->>Frontend: Toast notification (deduplicated)
Frontend->>User: Show error UI + toast
end
```
</details>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
---------
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: Otto-AGPT <otto@agpt.co>
I'm getting circular import issues because there is a lot of cross-importing between `backend.data`, `backend.blocks`, and other components. This change reduces block-related cross-imports and thus risk of breaking circular imports.
### Changes 🏗️
Added `min-w-0` class to the ContentCard component in the ToolAccordion
to prevent content overflow issues. This CSS fix ensures that the card
properly respects its container width constraints and allows text
truncation to work correctly when content is too wide.
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Verified that tool content displays correctly in the accordion
- [x] Confirmed that long content properly truncates instead of
overflowing
- [x] Tested with various screen sizes to ensure responsive behavior
#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<details><summary><h3>Greptile Summary</h3></summary>
Added `min-w-0` class to `ContentCard` component to fix text truncation
overflow in grid layouts. This is a standard CSS fix that allows grid
items to shrink below their content size, enabling `truncate` classes on
child elements (`ContentCardTitle`, `ContentCardSubtitle`) to work
correctly. The fix follows the same pattern already used in
`ContentCardHeader` (line 54) and `ToolAccordion` (line 54).
</details>
<details><summary><h3>Confidence Score: 5/5</h3></summary>
- Safe to merge with no risk
- Single-line CSS fix that addresses a well-known flexbox/grid layout
issue. The change follows existing patterns in the codebase and is
thoroughly tested. No logic changes, no breaking changes, no side
effects.
- No files require special attention
</details>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
## Summary
Blocks marked `disabled=True` (like BlockInstallationBlock) were not
being checked during graph validation, allowing them to be used via
direct API calls despite being hidden from the UI.
This adds a security check in `_validate_graph_get_errors()` to reject
any graph containing disabled blocks.
## Security Advisory
GHSA-4crw-9p35-9x54
## Linear
SECRT-1927
## Changes
- Added `block.disabled` check in graph validation (6 lines)
## Testing
- Graphs with disabled blocks → rejected with clear error message
- Graphs with valid blocks → unchanged behavior
<!-- greptile_comment -->
<h2>Greptile Overview</h2>
<details><summary><h3>Greptile Summary</h3></summary>
Adds critical security validation to prevent execution of disabled
blocks (like `BlockInstallationBlock`) via direct API calls. The fix
validates that `block.disabled` is `False` during graph validation in
`_validate_graph_get_errors()` on line 747-750, ensuring disabled blocks
are rejected before graph creation or execution. This closes a
vulnerability where blocks marked disabled in the UI could still be used
through API endpoints.
</details>
<details><summary><h3>Confidence Score: 5/5</h3></summary>
- This PR is safe to merge and addresses a critical security
vulnerability
- The fix is minimal (6 lines), correctly placed in the validation flow,
includes clear security context (GHSA reference), and follows existing
validation patterns. The check is positioned after block existence
validation and before input validation, ensuring disabled blocks are
caught early in both graph creation and execution paths.
- No files require special attention
</details>
<!-- greptile_other_comments_section -->
<!-- /greptile_comment -->
---------
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
### Changes
- Removed `defaultExpanded` prop from `ToolAccordion` in CreateAgent,
EditAgent, RunAgent, and RunBlock components to streamline the code and
improve readability.
### Impact
- This refactor enhances maintainability by reducing complexity in the
component structure while preserving existing functionality.
### Changes 🏗️
- Removed conditional expansion logic from all tool components
- Simplified ToolAccordion implementation across all affected components
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Create and run agents with various tools to verify accordion
behavior works correctly
- [x] Verify that UI components expand and collapse as expected
- [x] Test with different output types to ensure proper rendering
---------
Co-authored-by: Ubbe <hi@ubbe.dev>
Co-authored-by: Lluis Agusti <hi@llu.lu>
## Summary
Enables Anthropic's extended thinking feature for Claude models in
CoPilot via OpenRouter. This keeps the model's chain-of-thought
reasoning internal rather than outputting it to users.
## Problem
The CoPilot prompt was designed for a thinking agent (with
`<internal_reasoning>` tags), but extended thinking wasn't enabled on
the API side. This caused the model to output its reasoning as regular
text, leaking internal analysis to users.
## Solution
Added thinking configuration to the OpenRouter `extra_body` for
Anthropic models:
```python
extra_body["provider"] = {
"anthropic": {
"thinking": {
"type": "enabled",
"budget_tokens": config.thinking_budget_tokens,
}
}
}
```
## Configuration
New settings in `ChatConfig`:
| Setting | Default | Description |
|---------|---------|-------------|
| `thinking_enabled` | `True` | Enable extended thinking for Claude
models |
| `thinking_budget_tokens` | `10000` | Token budget for thinking
(1000-100000) |
## Changes
- `config.py`: Added `thinking_enabled` and `thinking_budget_tokens`
settings
- `service.py`: Added thinking config to all 3 places where `extra_body`
is built for LLM calls
## Testing
- Verify CoPilot responses no longer include internal reasoning text
- Check that Claude's extended thinking is working (should see thinking
tokens in usage)
- Confirm non-Anthropic models are unaffected
## Related
Discussion:
https://discord.com/channels/1126875755960336515/1126875756925046928/1470779843552612607
---------
Co-authored-by: Swifty <craigswift13@gmail.com>