## Changes 🏗️
Allow dynamic URLs in the CORS config by matching them via regex. This
helps because we currently have front-end preview deployments which are
isolated ( _nice, they don't pollute or override other domains_ ) like:
```
https://autogpt-git-{branch_name}-{commit}-significant-gravitas.vercel.app
```
The front-end builds and works there, but as soon as you log in, any API
requests to endpoints that need auth fail due to CORS, since our
current CORS config does not support dynamically generated domains.
### Changes
After these changes we can specify dynamic domains to be allowed under
CORS. For safety, I also disabled `localhost` when the API runs in
production.
### Before
```yml
cors:
  allowOrigin: "https://dev-builder.agpt.co" # could only specify full URL strings, not dynamic ones
```
### After
```yml
cors:
  allowOrigins:
    - "https://dev-builder.agpt.co"
    - "regex:https://autogpt-git-[a-z0-9-]+\\.vercel\\.app" # dynamic domains supported via regex
```
### Files
- add `build_cors_params` utility to parse literal/regex origins and
block localhost in production (`backend/server/utils/cors.py`; sketched
below)
- apply the helper in both `AgentServer` and `WebsocketServer` so CORS
logic and validations remain consistent
- add reusable `override_config` testing helper and update existing
WebSocket tests to cover the shared CORS behavior
- introduce targeted unit tests for the new CORS helper
(`backend/server/utils/cors_test.py`)
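For reference, a minimal sketch of what such a helper could look like (illustrative only; the `regex:` prefix handling and the production localhost guard are inferred from this description, and the actual implementation in `backend/server/utils/cors.py` may differ):
```python
def build_cors_params(allow_origins: list[str], is_production: bool) -> dict:
    """Split configured origins into literal and regex entries for CORSMiddleware."""
    literals: list[str] = []
    patterns: list[str] = []
    for origin in allow_origins:
        if origin.startswith("regex:"):
            patterns.append(origin.removeprefix("regex:"))
        else:
            literals.append(origin)
    if is_production:
        # Safety guard: never allow localhost origins in production
        literals = [o for o in literals if "localhost" not in o and "127.0.0.1" not in o]
    return {
        "allow_origins": literals,
        # Starlette's CORSMiddleware accepts one allow_origin_regex string,
        # so multiple patterns are OR-ed together
        "allow_origin_regex": "|".join(f"(?:{p})" for p in patterns) or None,
    }
```
Both servers can then apply the same rules with `app.add_middleware(CORSMiddleware, **build_cors_params(...))`, keeping the validation logic in one place.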
## Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] We will know once we make the origin config changes on infra and
test with this...
Replaces legacy stop/requeue functions in diagnostics.py with robust implementations using add_graph_execution and stop_graph_execution. Updates admin diagnostics routes to use these new methods, ensuring proper cascading and parallel handling. Adds comprehensive tests for admin routes, including edge cases and validation for bulk operations. Enhances get_graph_executions to support filtering by execution_ids for efficiency.
Introduces detection and reporting of executions in impossible states (QUEUED with startedAt, RUNNING without startedAt) to backend diagnostics, API, and frontend. Adds a new read-only admin endpoint and UI tab for manual investigation of data corruption cases, updates metrics and OpenAPI spec, and refactors queries to support filtering by startedAt.
Refactored admin diagnostics routes to remove redundant try/except blocks and streamline response handling. Added utility functions in diagnostics.py for fetching all orphaned and stuck queued execution IDs, and for counting failed executions. Updated execution.py to support offset in get_graph_executions. These changes improve maintainability, error logging, and enable bulk operations for admin endpoints.
The following models have been deprecated:
- deepseek-r1-distill-llama-70b
- gemma2-9b-it
- llama3-70b-8192
- llama3-8b-8192
- google/gemini-flash-1.5
I have removed them and set up a migration that converts all old model
references to their new versions. The model changes map as follows:
- llama3-70b-8192 → llama-3.3-70b-versatile
- llama3-8b-8192 → llama-3.1-8b-instant
- google/gemini-flash-1.5 → google/gemini-2.5-flash
- deepseek-r1-distill-llama-70b → gpt-5-chat-latest
- gemma2-9b-it → gpt-5-chat-latest
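Roughly, the migration walks existing graphs and rewrites deprecated model values; here is a sketch under the assumption that the model name is stored in each node's input data (the actual migration may be plain SQL):
```python
MODEL_MIGRATION = {
    "llama3-70b-8192": "llama-3.3-70b-versatile",
    "llama3-8b-8192": "llama-3.1-8b-instant",
    "google/gemini-flash-1.5": "google/gemini-2.5-flash",
    "deepseek-r1-distill-llama-70b": "gpt-5-chat-latest",
    "gemma2-9b-it": "gpt-5-chat-latest",
}

def migrate_node_model(node_inputs: dict) -> dict:
    """Replace a deprecated model name with its successor, if present."""
    model = node_inputs.get("model")
    if model in MODEL_MIGRATION:
        node_inputs["model"] = MODEL_MIGRATION[model]
    return node_inputs
```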
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Check to see if old models were removed
- [x] Check to see if the migration worked and converted old models to
new ones in graphs
## Summary
Add AI-generated correctness score field to execution activity status
generation to provide quantitative assessment of how well executions
achieved their intended purpose.
New page:
<img width="1000" height="229" alt="image"
src="https://github.com/user-attachments/assets/5cb907cf-5bc7-4b96-8128-8eecccde9960"
/>
Old page:
<img width="1000" alt="image"
src="https://github.com/user-attachments/assets/ece0dfab-1e50-4121-9985-d585f7fcd4d2"
/>
## What Changed
- Added `correctness_score` field (float 0.0-1.0) to
`GraphExecutionStats` model
- **REFACTORED**: Removed duplicate `llm_utils.py` and reused existing
`AIStructuredResponseGeneratorBlock` logic
- Updated activity status generator to use structured responses instead
of plain text
- Modified prompts to include correctness assessment with 5-tier scoring
system:
- 0.0-0.2: Failure
- 0.2-0.4: Poor
- 0.4-0.6: Partial Success
- 0.6-0.8: Mostly Successful
- 0.8-1.0: Success
- Updated manager.py to extract and set both activity_status and
correctness_score
- Fixed tests to work with existing structured response interface
## Technical Details
- **Code Reuse**: Eliminated duplication by using existing
`AIStructuredResponseGeneratorBlock` instead of creating new LLM
utilities
- Added JSON validation with retry logic for malformed responses
- Maintained backward compatibility for existing activity status
functionality
- Score is clamped to the valid 0.0-1.0 range and validated (see the
sketch after this list)
- All type errors resolved and linting passes
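For illustration, the clamping and tier mapping could look like this (hypothetical helper names; the real logic lives in the activity status generator):
```python
def clamp_score(raw: float) -> float:
    """Clamp the LLM-reported correctness score to the valid range."""
    return max(0.0, min(1.0, raw))

def score_tier(score: float) -> str:
    """Map a 0.0-1.0 correctness score onto the 5-tier scale."""
    if score < 0.2:
        return "Failure"
    if score < 0.4:
        return "Poor"
    if score < 0.6:
        return "Partial Success"
    if score < 0.8:
        return "Mostly Successful"
    return "Success"
```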
## Test Plan
- [x] All existing tests pass with refactored structure
- [x] Structured LLM call functionality tested with success and error
cases
- [x] Activity status generation tested with various execution scenarios
- [x] Integration tests verify both fields are properly set in execution
stats
- [x] No code duplication - reuses existing block logic
🤖 Generated with [Claude Code](https://claude.ai/code)
---------
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Zamil Majdy <majdyz@users.noreply.github.com>
BREAKING CHANGE: Removed deprecated use_auto_prompt field from Input
schema. Existing workflows using this field will need to be updated to
use the type field set to "auto" instead.
## Summary of Changes 📝
This PR comprehensively updates all Exa search blocks to match the
latest Exa API specification and adds significant new functionality
through the Websets API integration.
### Core API Updates 🔄
- **Migration to Exa SDK**: Replaced manual API calls with the official
`exa_py` AsyncExa SDK across all blocks for better reliability and
maintainability
- **Removed deprecated fields**: Eliminated
`use_auto_prompt`/`useAutoprompt` field (breaking change)
- **Fixed incomplete field definitions**: Corrected `user_location`
field definition
- **Added new input fields**: Added `moderation` and `context` fields
for enhanced content filtering
### Enhanced Content Settings 🛠️
- **Text field improvements**: Support both boolean and advanced object
configurations (example after this list)
- **New content options**:
- Added `livecrawl` settings (never, fallback, always, preferred)
- Added `subpages` support for deeper content retrieval
- Added `extras` settings for links and images
- Added `context` settings for additional contextual information
- **Updated settings**: Enhanced `highlight` and `summary`
configurations with new query and schema options
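For example, a block input might accept either form (illustrative field names, loosely following the Exa contents options):
```python
# Simple boolean form
contents = {"text": True}

# Advanced object form using the new options
contents = {
    "text": {"max_characters": 2000, "include_html_tags": False},
    "livecrawl": "preferred",  # never | fallback | always | preferred
    "subpages": 2,             # fetch linked subpages for deeper retrieval
    "extras": {"links": 5, "image_links": 3},
    "context": True,           # return additional contextual information
}
```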
### Comprehensive Cost Tracking 💰
- Added detailed cost tracking models:
- `CostDollars` for monetary costs
- `CostCredits` for API credit tracking
- `CostDuration` for time-based costs
- New output fields: `request_id`, `resolved_search_type`,
`cost_dollars`
- Improved response handling to conditionally yield fields based on
availability
### New Websets API Integration 🚀
Added eight new specialized blocks for Exa's Websets API:
- **`websets.py`**: Core webset management (create, get, list, delete)
- **`websets_search.py`**: Search operations within websets
- **`websets_items.py`**: Individual item management (add, get, update,
delete)
- **`websets_enrichment.py`**: Data enrichment operations
- **`websets_import_export.py`**: Bulk import/export functionality
- **`websets_monitor.py`**: Monitor and track webset changes
- **`websets_polling.py`**: Poll for updates and changes
### New Special-Purpose Blocks 🎯
- **`code_context.py`**: Code search capabilities for finding relevant
code snippets from open source repositories, documentation, and Stack
Overflow
- **`research.py`**: Asynchronous research capabilities that explore the
web, gather sources, synthesize findings, and return structured results
with citations
### Code Organization Improvements 📁
- **Removed legacy code**: Deleted `model.py` file containing deprecated
API models
- **Centralized helpers**: Consolidated shared models and utilities in
`helpers.py`
- **Improved modularity**: Each webset operation is now in its own
dedicated file
### Other Changes 🔧
- Updated `.gitignore` for better development workflow
- Updated `CLAUDE.md` with project-specific instructions
- Updated documentation in `docs/content/platform/new_blocks.md` with
error handling, data models, and file input guidelines
- Improved webhook block implementations with SDK integration
### Files Changed 📂
- **Modified (11 files)**:
- `.gitignore`
- `autogpt_platform/CLAUDE.md`
- `autogpt_platform/backend/backend/blocks/exa/answers.py`
- `autogpt_platform/backend/backend/blocks/exa/contents.py`
- `autogpt_platform/backend/backend/blocks/exa/helpers.py`
- `autogpt_platform/backend/backend/blocks/exa/search.py`
- `autogpt_platform/backend/backend/blocks/exa/similar.py`
- `autogpt_platform/backend/backend/blocks/exa/webhook_blocks.py`
- `autogpt_platform/backend/backend/blocks/exa/websets.py`
- `docs/content/platform/new_blocks.md`
- **Added (8 files)**:
- `autogpt_platform/backend/backend/blocks/exa/code_context.py`
- `autogpt_platform/backend/backend/blocks/exa/research.py`
- `autogpt_platform/backend/backend/blocks/exa/websets_enrichment.py`
- `autogpt_platform/backend/backend/blocks/exa/websets_import_export.py`
- `autogpt_platform/backend/backend/blocks/exa/websets_items.py`
- `autogpt_platform/backend/backend/blocks/exa/websets_monitor.py`
- `autogpt_platform/backend/backend/blocks/exa/websets_polling.py`
- `autogpt_platform/backend/backend/blocks/exa/websets_search.py`
- **Deleted (1 file)**:
- `autogpt_platform/backend/backend/blocks/exa/model.py`
### Migration Guide 🚦
For users with existing workflows using the deprecated `use_auto_prompt`
field:
1. Remove the `use_auto_prompt` field from your input configuration
2. Set the `type` field to `ExaSearchTypes.AUTO` (or "auto" in JSON) to
achieve the same behavior
3. Review any custom content settings as the structure has been enhanced
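Concretely, an input payload that previously looked like the first form below becomes the second (illustrative layout):
```python
# Before (no longer valid: use_auto_prompt was removed)
input_data = {"query": "latest AI research", "use_auto_prompt": True}

# After
input_data = {"query": "latest AI research", "type": "auto"}
```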
### Testing Recommendations ✅
- Test existing workflows to ensure they handle the breaking change
- Verify cost tracking fields are properly returned
- Test new content settings options (livecrawl, subpages, extras,
context)
- Validate websets functionality if using the new Websets API blocks
🤖 Generated with [Claude Code](https://claude.com/claude-code)
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] made + ran a test agent for the blocks and flows between them
[Exa
Tests_v44.json](https://github.com/user-attachments/files/23226143/Exa.Tests_v44.json)
---
> [!NOTE]
> Migrates Exa blocks to AsyncExa SDK, adds comprehensive
Websets/research/code-context blocks, updates existing
search/content/answers/similar, deletes legacy models, adjusts
tests/docs; breaking: remove `use_auto_prompt` in favor of
`type="auto"`.
>
> - **Backend — Exa integration (SDK migration & BREAKING)**:
> - Replace manual HTTP calls with `exa_py.AsyncExa` across `search`,
`similar`, `contents`, `answers`, and webhooks; richer outputs
(citations, context, costs, resolved search type).
> - BREAKING: remove `Input.use_auto_prompt`; use `type = "auto"`.
> - Centralize models/utilities in `exa/helpers.py` (content settings,
cost models, result mappers).
> - **New Blocks**:
> - **Websets**: management (`websets.py`), searches, items,
enrichments, imports/exports, monitors, polling (new files under
`exa/websets_*`).
> - **Research**: async research task create/get/wait/list
(`exa/research.py`).
> - **Code Context**: code snippet/context retrieval
(`exa/code_context.py`).
> - **Removals**:
> - Delete deprecated `exa/model.py`.
> - **Docs & DX**:
> - Update `docs/new_blocks.md` (error handling, models, file input) and
`CLAUDE.md`; ignore backend logs in `.gitignore`.
> - **Frontend Tests**:
> - Split/extend “e” block tests and improve block add robustness in
Playwright (`build.spec.ts`, `build.page.ts`).
>
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
6e5e572322. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
## Summary by CodeRabbit
* **New Features**
* Added multiple Exa research and webset management blocks for task
creation, monitoring, and completion tracking.
* Introduced new search capabilities including code context retrieval,
content search, and enhanced filtering options.
* Added webset enrichment, import/export, and item management
functionality.
* Expanded search with location-based and category filters.
* **Documentation**
* Updated guidance on error handling, data models, and file input
handling.
* **Refactor**
* Modernized backend API integration with improved response structure
and error reporting.
* Simplified configuration options for search operations.
---------
Co-authored-by: Claude <noreply@anthropic.com>
- Resolves #11316
- Durable fix to replace #11318
### Changes 🏗️
- Expand graph execution permissions check
- Don't require library membership for execution as sub-graph
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Can run sub-agent with non-latest graph version
- [x] Can run sub-agent that is available in Marketplace but not added
to Library
## Summary
Fix critical queue blocking issue where rate-limited user messages
prevent other users' executions from being processed, causing the 135
late executions reported in production.
## Root Cause Analysis
When a user exceeds `max_concurrent_graph_executions_per_user` (25), the
executor uses `basic_nack(requeue=True)` which sends the message to the
**FRONT** of the RabbitMQ queue. This creates an infinite blocking loop
where:
1. Rate-limited message goes to front of queue
2. Gets processed, hits rate limit again
3. Goes back to front of queue
4. Blocks all other users' messages indefinitely
## Solution Implementation
### 🔧 Core Changes
- **New setting**: `requeue_by_republishing` (default: `True`) in
`backend/util/settings.py`
- **Smart `_ack_message`**: Automatically uses republishing when
`requeue=True` and setting enabled
- **Efficient implementation**: Uses existing `self.run_client`
connection instead of creating new ones
- **Integration test**: Real RabbitMQ test validates queue ordering
behavior
### 🔄 Technical Implementation
**Before (blocking):**
```python
basic_nack(delivery_tag, requeue=True) # Goes to FRONT of queue ❌
```
**After (non-blocking):**
```python
if requeue and self.config.requeue_by_republishing:
    # First: Republish to BACK of queue
    self.run_client.publish_message(...)
    # Then: Reject without requeue
    basic_nack(delivery_tag, requeue=False)
```
### 📊 Impact
- ✅ **Other users' executions no longer blocked** by rate-limited users
- ✅ **Fair queue processing** - FIFO behavior maintained for all users
- ✅ **Rate limiting still works** - just doesn't block others
- ✅ **Configurable** - can revert to old behavior with
`requeue_by_republishing=False`
- ✅ **Zero performance impact** - uses existing connections
## Test Plan
- **Integration test**: `test_requeue_integration.py` validates real
RabbitMQ queue ordering
- **Scenario testing**: Confirms rate-limited messages go to back of
queue
- **Cross-user validation**: Verifies other users' messages process
correctly
- **Setting test**: Confirms configuration loads with correct defaults
## Deployment Strategy
This is a **hotfix** that can be deployed immediately:
- **Backward compatible**: Old behavior available via config
- **Safe default**: New behavior is safer than current state
- **No breaking changes**: All existing functionality preserved
- **Immediate relief**: Resolves production queue blocking
## Files Modified
- `backend/executor/manager.py`: Enhanced `_ack_message` logic and
`_requeue_message_to_back` method
- `backend/util/settings.py`: Added `requeue_by_republishing`
configuration field
- `test_requeue_integration.py`: Integration test for queue ordering
validation
## Related Issues
Fixes the 135 late executions issue where messages were stuck in QUEUED
state despite available executor capacity (583m/600m utilization).
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
---------
Co-authored-by: Claude <noreply@anthropic.com>
Implements foundational backend infrastructure for chat-based agent
interaction system. Users will be able to discover, configure, and run
marketplace agents through conversational AI.
**Note:** Chat routes are behind a feature flag
### Changes 🏗️
**Core Chat System:**
- Chat service with LLM orchestration (Claude 3.5 Sonnet, Haiku, GPT-4)
- REST API routes for sessions and messages
- Database layer for chat persistence
- System prompts and configuration
**5 Conversational Tools:**
1. `find_agent` - Search marketplace by keywords
2. `get_agent_details` - Fetch agent info, inputs, credentials
3. `get_required_setup_info` - Check user readiness, missing credentials
4. `run_agent` - Execute agents immediately
5. `setup_agent` - Configure scheduled execution with cron
**Testing:**
- 28 tests across chat tools (23 passing, 5 skipped for scheduler)
- Test fixtures for simple, LLM, and Firecrawl agents
- Service and data layer tests
**Bug Fixes:**
- Fixed `setup_agent.py` to create schedules instead of immediate
execution
- Fixed graph lookup to use UUID instead of username/slug
- Fixed credential matching by provider/type instead of ID
- Fixed internal tool calls to use `._execute()` instead of `.execute()`
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] All 28 chat tool tests pass (23 pass, 5 skip - require scheduler)
- [x] Code formatting and linting pass
- [x] Tool execution flow validated through unit tests
- [x] Agent discovery, details, and execution tested
- [x] Credential parsing and matching tested
#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)
No configuration changes required - all existing settings compatible.
### Changes 🏗️
`add_store_agent_to_library` does not add sub-agents to the user's
library, and this check can cause issues.
### Checklist 📋
#### For code changes:
- [ ] I have clearly listed my changes in the PR description
- [ ] I have made a test plan
- [ ] I have tested my changes according to the test plan:
- [ ] ...
<details>
<summary>Example test plan</summary>
- [ ] Create from scratch and execute an agent with at least 3 blocks
- [ ] Import an agent from file upload, and confirm it executes
correctly
- [ ] Upload agent to marketplace
- [ ] Import an agent from marketplace and confirm it executes correctly
- [ ] Edit an agent from monitor, and confirm it executes correctly
</details>
#### For configuration changes:
- [ ] `.env.default` is updated or already compatible with my changes
- [ ] `docker-compose.yml` is updated or already compatible with my
changes
- [ ] I have included a list of my configuration changes in the PR
description (under **Changes**)
<details>
<summary>Examples of configuration changes</summary>
- Changing ports
- Adding new services that need to communicate with each other
- Secrets or environment variable changes
- New or infrastructure changes such as databases
</details>
Backend now calculates and returns the total number of scheduled execution runs in the next hour and 24 hours, not just unique schedules. The frontend displays these new metrics in the diagnostics admin panel. The OpenAPI schema is updated to reflect the new fields.
Introduces backend and frontend support for stopping all long-running executions and cleaning up all stuck queued executions via new admin endpoints. Updates diagnostics logic to ensure both cancel signals and DB status updates are performed, adds corresponding API routes, and enhances the admin UI to expose these bulk actions. Also updates the sidebar icon for diagnostics.
Extract error messages from the stats JSON field in failed execution details. Update the admin diagnostics route to always count the actual number of failed executions within the specified time window, ensuring accurate pagination.
Introduces backend endpoints and models for schedule diagnostics, including orphaned schedule detection, listing, and bulk cleanup. Updates the frontend to display schedule health metrics and a new schedules table with management actions. OpenAPI spec is updated to document the new endpoints and models.
Add extensive diagnostic capabilities for on-call engineers to monitor and manage execution health.
Backend Enhancements:
- Add 18 diagnostic metrics covering failures, orphaned executions, stuck queued, throughput, and queue health
- Implement orphaned execution detection (>24h old, not in executor)
- Add stuck queued detection (QUEUED >1h, never started)
- Add long-running execution detection (RUNNING >24h)
- Monitor both execution and cancel RabbitMQ queues
- Track failure rates (1h, 24h) and execution throughput metrics
New Backend Endpoints (15 total):
- GET /admin/diagnostics/executions/orphaned - List orphaned executions
- GET /admin/diagnostics/executions/stuck-queued - List stuck queued executions
- GET /admin/diagnostics/executions/long-running - List long-running executions
- GET /admin/diagnostics/executions/failed - List failed executions with error messages
- POST /admin/diagnostics/executions/cleanup-all-orphaned - Cleanup all orphaned (operates on entire dataset)
- POST /admin/diagnostics/executions/requeue - Requeue single stuck execution
- POST /admin/diagnostics/executions/requeue-bulk - Requeue selected executions
- POST /admin/diagnostics/executions/requeue-all-stuck - Requeue all stuck queued (operates on entire dataset)
Execution Management:
- Dual-mode stop: Active executions (cancel signals) vs orphaned (direct DB cleanup)
- Intelligent Stop All: Auto-splits active/orphaned, executes in parallel
- Requeue functionality for stuck QUEUED executions with credit cost warnings
- Stop sends cancel signals to RabbitMQ for graceful termination
- Cleanup orphaned updates DB directly without cancel signals
- ALL endpoints operate on entire datasets (not limited to pagination)
Frontend Enhancements:
- 5-tab filtering interface: All, Orphaned, Stuck Queued, Long-Running, Failed
- Clickable alert cards (🟠🔴🟡) automatically switch to relevant tabs
- Tab badges show live counts from diagnostics metrics
- Age column displays execution duration (e.g., "245d 12h")
- Orange row highlighting for orphaned executions (>24h old)
- Error message column for failed executions with hover tooltips
- Click-to-copy for execution IDs and user IDs with visual feedback
- Status badge colors match library view (blue=RUNNING, yellow=QUEUED, red=FAILED)
Tab-Specific Actions:
- Stuck Queued: Cleanup All OR Requeue All buttons with cost warnings
- Stuck Queued per-row: 🟠 Cleanup OR 🔵 Requeue buttons
- Orphaned: Cleanup All (operates on ALL orphaned)
- Long-Running: Stop All (sends cancel signals)
- Failed: View-only with error details
- All: Stop All (intelligent split of active/orphaned)
Alert Cards:
- 🟠 Orphaned: Shows count with RUNNING/QUEUED breakdown, click to view
- 🔴 Failed (24h): Shows count with hourly rate, click to view
- 🟡 Long-Running: Shows count with oldest execution age, click to view
Updated Diagnostic Info Card:
- Color-coded explanations for each execution type
- When to cleanup vs requeue vs stop
- Credit cost implications clearly documented
- Queue health thresholds explained
Provides ~70% coverage of on-call guide requirements for troubleshooting execution issues, orphaned database records, and system health monitoring.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
### Changes 🏗️
- Prevent user onboarding task progress from being wiped by merging
arrays on the backend instead of replacing them
- New endpoint for onboarding reset
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Tasks are not being reset
- [x] `/onboarding/reset` works
- #11273
- Bump `apscheduler` to v3.11.1, which contains a fix for the issue
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] "It's a rather ugly solution but the test proves that it works."
~the maintainer
- [x] CI passes
This PR addresses the need for consistent error handling across all
blocks in the AutoGPT platform. Previously, each block had to manually
define an `error` field in its output schema, leading to code
duplication and potential inconsistencies. Some blocks might forget to
include the error field, making error handling unpredictable.
### Changes 🏗️
- **Created `BlockSchemaOutput` base class**: New base class that
extends `BlockSchema` with a standardized `error` field
- **Created `BlockSchemaInput` base class**: Added for consistency and
future extensibility
- **Updated 140+ block implementations**: Changed all block `Output`
classes from `class Output(BlockSchema):` to `class
Output(BlockSchemaOutput):`
- **Removed manual error field definitions**: Eliminated hundreds of
duplicate `error: str = SchemaField(...)` definitions
- **Updated type annotations**: Changed `Block[BlockSchema,
BlockSchema]` to `Block[BlockSchemaInput, BlockSchemaOutput]` throughout
the codebase
- **Fixed imports**: Added `BlockSchemaInput` and `BlockSchemaOutput`
imports to all relevant files
- **Maintained backward compatibility**: Updated `EmptySchema` to
inherit from `BlockSchemaOutput`
**Key Benefits:**
- Consistent error handling across all blocks
- Reduced code duplication (removed ~200 lines of repetitive error field
definitions)
- Type safety improvements with distinct input/output schema types
- Blocks can still override error field with more specific descriptions
when needed
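A condensed sketch of the new base classes (import paths and the exact `SchemaField` description text are assumptions):
```python
from backend.data.block import BlockSchema  # existing base class
from backend.data.model import SchemaField  # assumed import path

class BlockSchemaInput(BlockSchema):
    """Base class for block Input schemas; added for consistency and future extensibility."""

class BlockSchemaOutput(BlockSchema):
    """Base class for block Output schemas with a standardized error field."""

    error: str = SchemaField(description="Error message if the block execution failed")
```
Blocks then declare `class Output(BlockSchemaOutput):` and inherit the `error` field, overriding it only when a more specific description is needed.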
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Verified `poetry run format` passes (all linting, formatting, and
type checking)
- [x] Tested block instantiation works correctly (MediaDurationBlock,
UnrealTextToSpeechBlock)
- [x] Confirmed error fields are automatically present in all updated
blocks
- [x] Verified block loading system works (successfully loads 353+
blocks)
- [x] Tested backward compatibility with EmptySchema
- [x] Confirmed blocks can still override error field with custom
descriptions
- [x] Validated core schema inheritance chain works correctly
#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)
*Note: No configuration changes were needed for this refactoring.*
🤖 Generated with [Claude Code](https://claude.ai/code)
---------
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Lluis Agusti <hi@llu.lu>
Co-authored-by: Ubbe <hi@ubbe.dev>
### Changes 🏗️
- Increased `max_field_size` in `aiohttp.ClientSession` to 16KB to
handle servers with large headers (e.g., long CSP headers).
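The change amounts to something like this (sketch; the session is created elsewhere with more options):
```python
import aiohttp

async def make_session() -> aiohttp.ClientSession:
    # aiohttp's default max_field_size is 8190 bytes; servers that send
    # very long header values (e.g. large CSP headers) exceed it and the
    # response fails to parse. 16 KB gives comfortable headroom.
    return aiohttp.ClientSession(max_field_size=16384)
```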
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Add unit test that checks it can now parse headers over 8k size
---------
Co-authored-by: seer-by-sentry[bot] <157164994+seer-by-sentry[bot]@users.noreply.github.com>
Co-authored-by: Swifty <craigswift13@gmail.com>
Co-authored-by: Ubbe <hi@ubbe.dev>
### Changes 🏗️
- Added validation to ensure that the `summary` and `final_summary`
returned by the LLM are strings.
- Raises a `ValueError` if the LLM returns a list or other non-string
type, providing a descriptive error message to aid debugging.
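A sketch of the added guard (helper name is hypothetical):
```python
def _ensure_string_summary(summary: object, field_name: str) -> str:
    """Validate that the LLM returned a plain string for a summary field."""
    if not isinstance(summary, str):
        raise ValueError(
            f"Expected {field_name} to be a string, got "
            f"{type(summary).__name__}: {summary!r}"
        )
    return summary
```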
Fixes
[AUTOGPT-SERVER-6M4](https://sentry.io/organizations/significant-gravitas/issues/6978480131/).
The issue: the LLM returned a list of strings instead of a single-string
summary, causing `_combine_summaries` to fail on `join`.
This fix was generated by Seer in Sentry, triggered by Craig Swift. 👁️
Run ID: 2230933
Not quite right? [Click here to continue debugging with
Seer.](https://sentry.io/organizations/significant-gravitas/issues/6978480131/?seerDrawer=true)
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Added a unit test to verify that a ValueError is raised when the
LLM returns a list instead of a string for summary or final_summary.
---------
Co-authored-by: seer-by-sentry[bot] <157164994+seer-by-sentry[bot]@users.noreply.github.com>
Co-authored-by: Swifty <craigswift13@gmail.com>
Marketplace sort-by functionality was not working on the frontend. This
PR fixes it.
### Changes 🏗️
- Add type hints for sort-by
- Fix marketplace sort-by drop-downs
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] tested locally
### Changes 🏗️
Enhanced SQL query security in the store search functionality by
implementing proper parameterization to prevent SQL injection
vulnerabilities.
**Security Improvements:**
- Replaced string interpolation with PostgreSQL positional parameters
(`$1`, `$2`, etc.) for all user inputs
- Added ORDER BY whitelist validation to prevent injection via
`sorted_by` parameter
- Parameterized search term, creators array, category, and pagination
values
- Fixed variable naming conflict (`sql_where_clause` vs `where_clause`)
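The pattern looks roughly like this (illustrative table and column names; the real store search query is larger):
```python
# ORDER BY identifiers cannot be parameterized, so whitelist them instead
ALLOWED_SORT_COLUMNS = {"updated_at", "rating", "runs"}

def build_search_query(search_term: str, sorted_by: str) -> tuple[str, list]:
    order_by = sorted_by if sorted_by in ALLOWED_SORT_COLUMNS else "updated_at"
    # User input travels as positional parameters, never via string interpolation
    sql = f"""
        SELECT * FROM "StoreAgent"
        WHERE search_text ILIKE '%' || $1 || '%'
        ORDER BY {order_by} DESC
        LIMIT $2 OFFSET $3
    """
    return sql, [search_term, 20, 0]
```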
**Testing:**
- Added 4 comprehensive tests validating SQL injection prevention across
different attack vectors
- Tests verify that malicious input in search queries, filters, sorting,
and categories are safely handled
- All 10 tests in db_test.py pass successfully
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] All existing tests pass (10/10 tests passing)
- [x] New security tests validate SQL injection prevention
- [x] Verified parameterized queries handle malicious input safely
- [x] Code formatting passes (`poetry run format`)
#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)
*Note: No configuration changes required for this security fix*
## Summary
Fix critical issue where pre-execution permission validation broke
execution of graphs that reference older versions of sub-graphs.
## Problem
The `validate_graph_execution_permissions` function was checking for the
specific version of a graph in the user's library. This caused failures
when:
1. A parent graph references an older version of a sub-graph
2. The user updates the sub-graph to a newer version
3. The older version is no longer in their library
4. Execution of the parent graph fails with `GraphNotInLibraryError`
## Root Cause
In `backend/executor/utils.py` line 523, the function was checking for
the exact version, but sub-graphs legitimately reference older versions
that may no longer be in the library.
## Solution
### 1. Remove Version-Specific Check (backend/executor/utils.py)
- Remove `graph_version=graph.version` parameter from validation call
- Add explanatory comment about version-agnostic behavior
- Now only checks that the graph ID exists in user's library (any
version)
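A sketch of the change (parameter names approximated from this description):
```python
# Before: failed when the referenced sub-graph version was no longer in the library
await validate_graph_execution_permissions(
    user_id, graph_id=graph.id, graph_version=graph.version
)

# After: version-agnostic, any library version of the graph grants permission
await validate_graph_execution_permissions(user_id, graph_id=graph.id)
```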
### 2. Enhance Documentation (backend/data/graph.py)
- Update function docstring to explain version-agnostic behavior
- Document that `None` (now default) allows execution of any version
- Clarify this is important for sub-graph version compatibility
## Technical Details
The `validate_graph_execution_permissions` function was already designed
to handle version-agnostic checks when `graph_version=None`. By omitting
the version parameter, we skip the version check and only verify:
- Graph exists in user's library
- Graph is not deleted/archived
- User has execution permissions
## Impact
- ✅ Parent graphs can execute even when they reference older sub-graph
versions
- ✅ Sub-graph updates don't break existing parent graphs
- ✅ Maintains security: still checks library membership and permissions
- ✅ No breaking changes: version-specific validation still available
when needed
## Example Scenario Fixed
1. User creates parent graph that uses sub-graph v1
2. User updates sub-graph to v2 (v1 removed from library)
3. Parent graph still references sub-graph v1
4. **Before**: Execution fails with `GraphNotInLibraryError`
5. **After**: Execution succeeds (version-agnostic permission check)
## Testing
- [x] Code formatting and linting passes
- [x] Type checking passes
- [x] No breaking changes to existing functionality
- [x] Security still maintained through library membership checks
## Files Changed
- `backend/executor/utils.py`: Remove version-specific permission check
- `backend/data/graph.py`: Enhanced documentation for version-agnostic
behavior
Closes #[issue-number-if-applicable]
Co-authored-by: Claude <noreply@anthropic.com>
📨 Fix: Handle Oversized Notification Emails
## Summary
This PR adds logic to detect and handle oversized notification emails
exceeding Postmark's 5 MB limit. Instead of retrying indefinitely, the
system now sends a lightweight summary email with key stats and a
dashboard link.
## Changes
- Added a size check in `EmailSender.send_templated()`
- Sends a summary email when the payload exceeds ~4.5 MB
- Prevents infinite retries and queue clogging
- Added logs for oversized detection
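A sketch of the guard (threshold value and helper names are illustrative):
```python
import logging

logger = logging.getLogger(__name__)

MAX_EMAIL_BYTES = int(4.5 * 1024 * 1024)  # stay safely under Postmark's 5 MB limit

def send_templated(self, recipient: str, subject: str, body: str) -> None:
    if len(body.encode("utf-8")) > MAX_EMAIL_BYTES:
        logger.warning("Notification email for %s exceeds size limit", recipient)
        # Fall back to a lightweight summary with key stats and a dashboard link
        self._send_summary_email(recipient, subject)  # hypothetical helper
        return
    self._send(recipient, subject, body)  # hypothetical helper
```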
Fixes #11119
---------
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
- Resolves #11251
This fixes all the warnings mentioned in #11251, reducing noise and
making our logs and error alerts more useful :)
### Changes 🏗️
- Remove "Block {block_name} has multiple credential inputs" warning
(not actually an issue)
- Rename `json` attribute of `MainCodeExecutionResult` to `json_data`;
retain serialized name through a field alias
- Replace `Path(regex=...)` with `Path(pattern=...)` in
`get_shared_execution` endpoint parameter config
- Change Uvicorn's WebSocket module to new Sans-I/O implementation for
WS server
- Disable Uvicorn's WebSocket module for REST server
- Remove deprecated `enable_cleanup_closed=True` argument in
`CloudStorageHandler` implementation
- Replace Prisma transaction timeout `int` argument with a `timedelta`
value
- Update Sentry SDK to latest version (v2.42.1)
- Broaden filter for cleanup warnings from indirect dependency `litellm`
- Fix handling of `MissingConfigError` in REST server endpoints
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- Check that the warnings are actually gone
- [x] Deploy to dev environment and run a graph; check for any warnings
- Test WebSocket server
- [x] Run an agent in the Builder; make sure real-time execution updates
still work
### Changes 🏗️
- Modifies the LLM block to extract the actual response from the
dictionary returned by the LLM, instead of yielding the entire
dictionary. This addresses
[AUTOGPT-SERVER-6EY](https://sentry.io/organizations/significant-gravitas/issues/6950850822/).
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] After applying the fix, I ran the agent that triggered the Sentry
error and confirmed that it now completes successfully without errors.
---------
Co-authored-by: seer-by-sentry[bot] <157164994+seer-by-sentry[bot]@users.noreply.github.com>
Co-authored-by: Swifty <craigswift13@gmail.com>
Fixes
[AUTOGPT-SERVER-6KZ](https://sentry.io/organizations/significant-gravitas/issues/6976926125/).
The issue: redirect handling stripped the URL scheme, causing
subsequent requests to fail validation and hit a 404.
- Ensures the hostname in the URL is properly IDNA-encoded after
validation.
- Reconstructs the netloc with the encoded hostname and preserves the
port if it exists.
This fix was generated by Seer in Sentry, triggered by Craig Swift. 👁️
Run ID: 2204774
Not quite right? [Click here to continue debugging with
Seer.](https://sentry.io/organizations/significant-gravitas/issues/6976926125/?seerDrawer=true)
### Changes 🏗️
**backend/util/request.py:**
- Fixed URL validation to properly preserve port numbers when
reconstructing netloc
- Ensures IDNA-encoded hostname is combined with port (if present)
before URL reconstruction
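A sketch of the reconstruction (assumes a hostname is present):
```python
from urllib.parse import urlparse, urlunparse

def idna_encode_url(url: str) -> str:
    parsed = urlparse(url)
    # IDNA-encode the hostname, then re-attach the port if one was given
    hostname = parsed.hostname.encode("idna").decode("ascii")
    netloc = f"{hostname}:{parsed.port}" if parsed.port else hostname
    return urlunparse(parsed._replace(netloc=netloc))
```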
**Test Results:**
- ✅ Tested request to https://www.target.com/ (original failing URL from
Sentry issue)
- ✅ Status: 200, Content retrieved successfully (339,846 bytes)
- ✅ Port preservation verified for URLs with explicit ports
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Tested request to https://www.target.com/ (original failing URL)
- [x] Verified status code 200 and successful content retrieval
- [x] Verified port preservation in URL validation
#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)
Co-authored-by: seer-by-sentry[bot] <157164994+seer-by-sentry[bot]@users.noreply.github.com>
Co-authored-by: Swifty <craigswift13@gmail.com>
Removes CLAUDE_3_5_SONNET and CLAUDE_3_5_HAIKU from LlmModel enum, model
metadata, and cost configuration since they are deprecated
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Verify the models are gone from the llm blocks
Categories and Creators were not sanitized in the full-text search.
### Changes 🏗️
- apply sanitization to categories and creators
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] run tests to check it still works
Added error output pins to all Firecrawl blocks as standard on the
AutoGPT platform. The base block execution code already handles error
yielding, so no try-catch logic was needed.
- FirecrawlScrapeBlock: Added error output pin for scrape failures
- FirecrawlCrawlBlock: Added error output pin for crawl failures
- FirecrawlExtractBlock: Added error output pin for extraction failures
- FirecrawlMapBlock: Added error output pin for map failures
- FirecrawlSearchBlock: Added error output pin for search failures
Resolves #11253
### Checklist 📋
#### For code changes:
- [ ] I have clearly listed my changes in the PR description
- [ ] I have made a test plan
- [ ] I have tested my changes according to the test plan:
- [ ] ...
#### For configuration changes:
- [ ] `.env.default` is updated or already compatible with my changes
- [ ] `docker-compose.yml` is updated or already compatible with my
changes
- [ ] I have included a list of my configuration changes in the PR
description (under **Changes**)
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Toran Bruce Richards <Torantulino@users.noreply.github.com>
We're currently seeing errors in the `DatabaseManager` while it's
shutting down, like:
```
WARNING [DatabaseManager] Termination request: SystemExit; 0 executing cleanup.
INFO [DatabaseManager] ⏳ Disconnecting Database...
INFO [PID-1|THREAD-29|DatabaseManager|Prisma-82fb1994-4b87-40c1-8869-fbd97bd33fc8] Releasing connection started...
INFO [PID-1|THREAD-29|DatabaseManager|Prisma-82fb1994-4b87-40c1-8869-fbd97bd33fc8] Releasing connection completed successfully.
INFO [DatabaseManager] Terminated.
ERROR POST /create_or_add_to_user_notification_batch failed: Failed to create or add to notification batch for user {user_id} and type AGENT_RUN: NoneType: None
```
This indicates two issues:
- The service doesn't wait for pending RPC calls to finish before
terminating
- We're using `logger.exception` outside an error-handling context,
causing the confusing and not very useful `NoneType: None` to be printed
instead of error info (see the sketch below)
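A minimal demonstration of the logging issue (stand-in function; not the actual service code):
```python
import logging

logging.basicConfig()
logger = logging.getLogger(__name__)

def do_work():
    raise RuntimeError("boom")  # stand-in for the failing RPC handler

try:
    do_work()
except Exception:
    logger.exception("Batch update failed")  # traceback is appended here

# Outside an except block there is no active exception, so the same call
# appends the puzzling "NoneType: None" instead of a traceback:
logger.exception("Batch update failed")
# The fix: use logger.error(...) when not handling an exception.
```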
### Changes 🏗️
- Implement graceful shutdown in `AppService` so in-flight RPC calls can
finish
- Add tests for graceful shutdown
- Prevent `AppService` accepting new requests during shutdown
- Rework `AppService` lifecycle management; add support for async
`lifespan`
- Fix `AppService` endpoint error logging
- Improve logging in `AppProcess` and `AppService`
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- Deploy to Dev cluster, then `kubectl rollout restart` the different
services a few times
- [x] -> `DatabaseManager` doesn't break on re-deployment
- [x] -> `Scheduler` doesn't break on re-deployment
- [x] -> `NotificationManager` doesn't break on re-deployment
Updated block costs in `backend/backend/data/block_cost_config.py`:
- **AIShortformVideoCreatorBlock**: Updated from 50 credits to 307
- **AIAdMakerVideoCreatorBlock**: Added cost of 714 credits
- **AIScreenshotToVideoAdBlock**: Added cost of 612 credits
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Verify AIShortformVideoCreatorBlock costs 307 credits when
executed
- [x] Verify AIAdMakerVideoCreatorBlock costs 714 credits when executed
- [x] Verify AIScreenshotToVideoAdBlock costs 612 credits when executed
This PR converts Jinja2 TemplateError exceptions to ValueError in the
TextFormatter class to ensure proper error handling and HTTP status code
responses (400 instead of 500).
### Changes 🏗️
- Added import for `jinja2.exceptions.TemplateError` in
`backend/util/text.py:6`
- Wrapped template rendering in try-catch block in `format_string`
method (`backend/util/text.py:105-109`)
- Convert `TemplateError` to `ValueError` to ensure proper 400 HTTP
status code for client errors
- Added warning logging for template rendering errors before re-raising
as ValueError
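A condensed sketch of the wrapping (the real `TextFormatter` configures its own sandboxed environment):
```python
import logging

from jinja2 import Environment
from jinja2.exceptions import TemplateError

logger = logging.getLogger(__name__)

def format_string(template: str, **values) -> str:
    try:
        return Environment().from_string(template).render(**values)
    except TemplateError as e:
        logger.warning("Template rendering failed: %s", e)
        # Re-raise as ValueError so the API layer returns 400 instead of 500
        raise ValueError(f"Invalid template: {e}") from e
```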
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Verified that invalid Jinja2 templates now raise ValueError
instead of TemplateError
- [x] Confirmed that valid templates continue to work correctly
- [x] Checked that warning logs are generated for template errors
- [x] Validated that the exception chain is preserved with `from e`
#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)
- Run ruff, isort, and black on Python files
- Run prettier on TypeScript files
- Remove unused LaunchDarklyIntegration import from metrics.py
Co-authored-by: Nicholas Tindle <ntindle@users.noreply.github.com>