mirror of
https://github.com/Significant-Gravitas/AutoGPT.git
synced 2026-04-30 03:00:41 -04:00
5e2ae3cec54c093ae569f2f324534f70dfa7360f
87 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
728c40def5 |
fix(backend): replace multiprocessing queue with thread safe queue in ExecutionQueue (#11618)
<!-- Clearly explain the need for these changes: -->
The `ExecutionQueue` class was using `multiprocessing.Manager().Queue()`, which spawns a subprocess for inter-process communication. However, analysis showed that `ExecutionQueue` is only accessed from threads within the same process, not across processes. This caused:
- Unnecessary subprocess spawning per graph execution
- IPC overhead for every queue operation
- Potential resource leaks if Manager processes weren't properly cleaned up
- Limited scalability when many graphs execute concurrently
### Changes
<!-- Concisely describe all of the changes made in this pull request: -->
- Replaced `multiprocessing.Manager().Queue()` with `queue.Queue()` in the `ExecutionQueue` class
- Updated imports: removed `from multiprocessing import Manager` and `from queue import Empty`, added `import queue`
- Updated exception handling from `except Empty:` to `except queue.Empty:`
- Added a comprehensive docstring explaining the bug and fix

**File changed:** `autogpt_platform/backend/backend/data/execution.py`
### Checklist
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  <!-- Put your test plan here: -->
  - [x] Verified `ExecutionQueue` uses `queue.Queue` (not `multiprocessing.Manager().Queue()`)
  - [x] Tested all queue operations: `add()`, `get()`, `empty()`, `get_or_none()`
  - [x] Verified thread safety with concurrent producer/consumer threads (100 items)
  - [x] Verified a multi-producer/consumer scenario (3 producers, 2 consumers, 150 items)
  - [x] Confirmed no subprocess spawning when creating multiple queues
  - [x] Code passes Black formatting check
#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my changes
- [x] I have included a list of my configuration changes in the PR description (under **Changes**)

> No configuration changes required - this is a code-only fix with no external API changes.
---------
Co-authored-by: Otto <otto@agpt.co>
Co-authored-by: Zamil Majdy <majdyz@users.noreply.github.com>
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co> |
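The shape of the fix can be sketched as follows. This is a minimal illustrative version, not the actual class from `backend/data/execution.py`; the four methods are the ones named in the test plan, and everything else is assumed:

```python
import queue
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


class ExecutionQueue(Generic[T]):
    """Thread-safe queue for producers/consumers in the SAME process.

    queue.Queue is internally synchronized, so no Manager subprocess or
    IPC round-trip is needed, which was the point of the fix above.
    """

    def __init__(self) -> None:
        self._queue: "queue.Queue[T]" = queue.Queue()

    def add(self, item: T) -> T:
        self._queue.put(item)
        return item

    def get(self) -> T:
        # Blocks until an item is available
        return self._queue.get()

    def empty(self) -> bool:
        return self._queue.empty()

    def get_or_none(self) -> Optional[T]:
        try:
            return self._queue.get_nowait()
        except queue.Empty:  # was `except Empty:` before the import change
            return None
```

Because `queue.Queue` handles locking itself, no extra synchronization is needed for the multi-producer/multi-consumer scenarios in the test plan.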
||
|
|
7668c17d9c |
feat(platform): add User Workspace for persistent CoPilot file storage (#11867)
Implements persistent User Workspace storage for CoPilot, enabling
blocks to save and retrieve files across sessions. Files are stored in
session-scoped virtual paths (`/sessions/{session_id}/`).
Fixes SECRT-1833
### Changes 🏗️
**Database & Storage:**
- Add `UserWorkspace` and `UserWorkspaceFile` Prisma models
- Implement `WorkspaceStorageBackend` abstraction (GCS for cloud, local
filesystem for self-hosted)
- Add `workspace_id` and `session_id` fields to `ExecutionContext`
**Backend API:**
- Add REST endpoints: `GET/POST /api/workspace/files`, `GET/DELETE
/api/workspace/files/{id}`, `GET /api/workspace/files/{id}/download`
- Add CoPilot tools: `list_workspace_files`, `read_workspace_file`,
`write_workspace_file`
- Integrate workspace storage into `store_media_file()` - returns
`workspace://file-id` references
**Block Updates:**
- Refactor all file-handling blocks to use unified `ExecutionContext`
parameter
- Update media-generating blocks to persist outputs to workspace
(AIImageGenerator, AIImageCustomizer, FluxKontext, TalkingHead, FAL
video, Bannerbear, etc.)
**Frontend:**
- Render `workspace://` image references in chat via proxy endpoint
- Add "AI cannot see this image" overlay indicator
**CoPilot Context Mapping:**
- Session = Agent (graph_id) = Run (graph_exec_id)
- Files scoped to `/sessions/{session_id}/`
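The storage abstraction and the `workspace://` reference flow can be sketched roughly like this. This is an illustrative in-memory sketch only: the method names, the simplified `store_media_file` signature (the real one takes an `execution_context`), and the id scheme are assumptions, not the actual interface:

```python
import uuid
from abc import ABC, abstractmethod


class WorkspaceStorageBackend(ABC):
    """Sketch of the backend abstraction (GCS for cloud, local FS for
    self-hosted); method names here are assumptions."""

    @abstractmethod
    def write(self, path: str, data: bytes) -> str:
        """Store data at a virtual path; return a file id."""

    @abstractmethod
    def read(self, file_id: str) -> bytes:
        """Return the stored bytes for a file id."""


class LocalWorkspaceStorage(WorkspaceStorageBackend):
    """In-memory stand-in for the local-filesystem backend."""

    def __init__(self) -> None:
        self._files: dict[str, bytes] = {}

    def write(self, path: str, data: bytes) -> str:
        file_id = str(uuid.uuid4())
        self._files[file_id] = data
        return file_id

    def read(self, file_id: str) -> bytes:
        return self._files[file_id]


def store_media_file(backend: WorkspaceStorageBackend, session_id: str,
                     filename: str, data: bytes) -> str:
    # Files are scoped to session-level virtual paths, and blocks get back
    # a `workspace://<file-id>` reference instead of inline base64
    path = f"/sessions/{session_id}/{filename}"
    return f"workspace://{backend.write(path, data)}"
```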
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [ ] I have tested my changes according to the test plan:
- [ ] Create CoPilot session, generate image with AIImageGeneratorBlock
- [ ] Verify image returns `workspace://file-id` (not base64)
- [ ] Verify image renders in chat with visibility indicator
- [ ] Verify workspace files persist across sessions
- [ ] Test list/read/write workspace files via CoPilot tools
- [ ] Test local storage backend for self-hosted deployments
#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)
🤖 Generated with [Claude Code](https://claude.ai/code)
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> **Medium Risk**
> Introduces a new persistent file-storage surface area (DB tables,
storage backends, download API, and chat tools) and rewires
`store_media_file()`/block execution context across many blocks, so
regressions could impact file handling, access control, or storage
costs.
>
> **Overview**
> Adds a **persistent per-user Workspace** (new
`UserWorkspace`/`UserWorkspaceFile` models plus `WorkspaceManager` +
`WorkspaceStorageBackend` with GCS/local implementations) and wires it
into the API via a new `/api/workspace/files/{file_id}/download` route
(including header-sanitized `Content-Disposition`) and shutdown
lifecycle hooks.
>
> Extends `ExecutionContext` to carry execution identity +
`workspace_id`/`session_id`, updates executor tooling to clone
node-specific contexts, and updates `run_block` (CoPilot) to create a
session-scoped workspace and synthetic graph/run/node IDs.
>
> Refactors `store_media_file()` to require `execution_context` +
`return_format` and to support `workspace://` references; migrates many
media/file-handling blocks and related tests to the new API and to
persist generated media as `workspace://...` (or fall back to data URIs
outside CoPilot), and adds CoPilot chat tools for
listing/reading/writing/deleting workspace files with safeguards against
context bloat.
>
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
|
||
|
|
033f58c075 |
fix(backend): Make Redis event bus gracefully handle connection failures (#11817)
## Summary
Adds graceful error handling to `AsyncRedisEventBus` and `RedisEventBus` so that connection failures log exceptions with full tracebacks while remaining non-breaking. This allows DatabaseManager to operate without Redis connectivity.
## Problem
DatabaseManager was failing with "Authentication required" when trying to publish notifications via `AsyncRedisNotificationEventBus`. The service has no Redis credentials configured, causing `increment_onboarding_runs` to fail.
## Root Cause
When `increment_onboarding_runs` publishes a notification:
1. It calls `AsyncRedisNotificationEventBus().publish()`
2. This attempts to connect to Redis via `get_redis_async()`
3. The connection fails due to missing credentials
4. The exception propagates, failing the entire DB operation

The previous fix (#11775) made the cache module lazy, but didn't address the notification bus, which also requires Redis.
## Solution
Wrap Redis operations in try-except blocks:
- `publish_event`: logs the exception with traceback and continues without publishing
- `listen_events`: logs the exception with traceback and returns an empty generator
- `wait_for_event`: returns None on connection failure

Using `logger.exception()` instead of `logger.warning()` ensures full stack traces are captured for debugging while keeping operations non-breaking. This allows services to operate without Redis when the event bus is only used for non-critical notifications.
## Changes
- Modified `backend/data/event_bus.py`:
  - Added graceful error handling to `RedisEventBus` and `AsyncRedisEventBus`
  - All Redis operations now catch exceptions and log with `logger.exception()`
- Added `backend/data/event_bus_test.py`:
  - Tests verify graceful degradation when Redis is unavailable
  - Tests verify normal operation when Redis is available
## Test Plan
- [x] New tests verify graceful degradation when Redis is unavailable
- [x] Existing notification tests still pass
- [x] DatabaseManager can increment onboarding runs without Redis
## Related Issues
Fixes https://significant-gravitas.sentry.io/issues/7205834440/ (AUTOGPT-SERVER-76D) |
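The error-handling pattern described above can be sketched like this. `_publish_raw` is a stand-in for the real Redis publish call, and the class is illustrative, not the actual `RedisEventBus`:

```python
import logging

logger = logging.getLogger(__name__)


class GracefulEventBus:
    """Sketch of the non-breaking publish pattern: Redis failures are
    logged with a full traceback instead of propagating to the caller."""

    def __init__(self, publish_raw):
        self._publish_raw = publish_raw  # e.g. a redis client's publish()

    def publish_event(self, channel: str, payload: str) -> bool:
        try:
            self._publish_raw(channel, payload)
            return True
        except Exception:
            # logger.exception captures the stack trace, unlike logger.warning
            logger.exception("Failed to publish event to %s", channel)
            return False
```

The caller (e.g. `increment_onboarding_runs`) keeps working even when the publish fails, which is the whole point of the change.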
||
|
|
8b25e62959 |
feat(backend,frontend): add explicit safe mode toggles for HITL and sensitive actions (#11756)
## Summary
This PR introduces two explicit safe mode toggles for controlling agent execution behavior, providing clearer and more granular control over when agents should pause for human review.
### Key Changes
**New Safe Mode Settings:**
- **`human_in_the_loop_safe_mode`** (bool, default `true`) - Controls whether human-in-the-loop (HITL) blocks pause for review
- **`sensitive_action_safe_mode`** (bool, default `false`) - Controls whether sensitive action blocks pause for review

**New Computed Properties on LibraryAgent:**
- `has_human_in_the_loop` - Indicates if the agent contains HITL blocks
- `has_sensitive_action` - Indicates if the agent contains sensitive action blocks

**Block Changes:**
- Renamed `requires_human_review` to `is_sensitive_action` on blocks for clarity
- Blocks marked `is_sensitive_action=True` pause only when `sensitive_action_safe_mode=True`
- HITL blocks pause when `human_in_the_loop_safe_mode=True`

**Frontend Changes:**
- Two separate toggles in Agent Settings based on the block types present
- Toggle visibility based on the `has_human_in_the_loop` and `has_sensitive_action` computed properties
- Settings cog hidden if neither toggle applies
- Proper state management for both toggles with defaults

**AI-Generated Agent Behavior:**
- AI-generated agents set `sensitive_action_safe_mode=True` by default
- This ensures sensitive actions are reviewed for AI-generated content
## Changes
**Backend:**
- `backend/data/graph.py` - Updated `GraphSettings` with two boolean toggles (non-optional with defaults), added `has_sensitive_action` computed property
- `backend/data/block.py` - Renamed `requires_human_review` to `is_sensitive_action`, updated review logic
- `backend/data/execution.py` - Updated `ExecutionContext` with both safe mode fields
- `backend/api/features/library/model.py` - Added `has_human_in_the_loop` and `has_sensitive_action` to `LibraryAgent`
- `backend/api/features/library/db.py` - Updated to use the `sensitive_action_safe_mode` parameter
- `backend/executor/utils.py` - Simplified execution context creation

**Frontend:**
- `useAgentSafeMode.ts` - Rewritten to support two independent toggles
- `AgentSettingsModal.tsx` - Shows two separate toggles
- `SelectedSettingsView.tsx` - Shows two separate toggles
- Regenerated API types with the new schema
## Test Plan
- [x] All backend tests pass (Python 3.11, 3.12, 3.13)
- [x] All frontend tests pass
- [x] Backend format and lint pass
- [x] Frontend format and lint pass
- [x] Pre-commit hooks pass
---------
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co> |
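The pause rules described above can be condensed into a single decision function. This is an illustrative sketch (the function name and flat-argument shape are assumptions; the real logic lives in the block review code and `ExecutionContext`):

```python
def should_pause_for_review(
    *,
    is_hitl_block: bool,
    is_sensitive_action: bool,
    human_in_the_loop_safe_mode: bool = True,   # default per this PR
    sensitive_action_safe_mode: bool = False,   # default per this PR
) -> bool:
    """HITL blocks pause when their safe mode is on; sensitive-action
    blocks pause only when sensitive_action_safe_mode is on."""
    if is_hitl_block and human_in_the_loop_safe_mode:
        return True
    if is_sensitive_action and sensitive_action_safe_mode:
        return True
    return False
```

With the defaults above, HITL blocks pause out of the box while sensitive actions run through unless the user (or an AI-generated agent) opts in.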
||
|
|
f31c160043 |
feat(platform): add endedAt field and fix execution analytics timestamps (#11759)
## Summary
This PR adds proper execution end time tracking and fixes timestamp
handling throughout the execution analytics system.
### Key Changes
1. **Added `endedAt` field to database schema** - Executions now have a
dedicated field for tracking when they finish
2. **Fixed timestamp nullable handling** - `started_at` and `ended_at`
are now properly nullable in types
3. **Fixed chart aggregation** - Reduced threshold from ≥3 to ≥1
executions per day
4. **Improved timestamp display** - Moved timestamps to expandable
details section in analytics table
5. **Fixed nullable timestamp bugs** - Updated all frontend code to
handle null timestamps correctly
## Problem Statement
### Issue 1: Missing Execution End Times
Previously, executions used `updatedAt` (last DB update) as a proxy for
"end time". This broke when adding correctness scores retroactively -
the end time would change to whenever the score was added, not when the
execution actually finished.
### Issue 2: Chart Shows Only One Data Point
The accuracy trends chart showed only one data point despite having
executions across multiple days. Root cause: aggregation required ≥3
executions per day.
### Issue 3: Incorrect Type Definitions
Manually maintained types defined `started_at` and `ended_at` as
non-nullable `Date`, contradicting reality where QUEUED executions
haven't started yet.
## Solution
### Database Schema (`schema.prisma`)
```prisma
model AgentGraphExecution {
// ...
startedAt DateTime?
endedAt DateTime? // NEW FIELD
// ...
}
```
### Execution Lifecycle
- **QUEUED**: `startedAt = null`, `endedAt = null` (not started)
- **RUNNING**: `startedAt = set`, `endedAt = null` (in progress)
- **COMPLETED/FAILED/TERMINATED**: `startedAt = set`, `endedAt = set`
(finished)
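The lifecycle above can be sketched as a small status-transition helper. This is illustrative only (the real logic is in `update_graph_execution_stats()`); the function name and tuple return are assumptions:

```python
from datetime import datetime, timezone
from typing import Optional, Tuple

# Terminal statuses per the migration above
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "TERMINATED"}


def apply_status(
    status: str,
    started_at: Optional[datetime],
    ended_at: Optional[datetime],
) -> Tuple[Optional[datetime], Optional[datetime]]:
    """Set startedAt when the run begins; freeze endedAt only when the
    execution reaches a terminal status, so later DB updates (e.g. adding
    a correctness score) can no longer move the end time."""
    now = datetime.now(timezone.utc)
    if status == "RUNNING" and started_at is None:
        started_at = now
    if status in TERMINAL_STATUSES and ended_at is None:
        ended_at = now
    return started_at, ended_at
```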
### Migration Strategy
```sql
-- Add endedAt column
ALTER TABLE "AgentGraphExecution" ADD COLUMN "endedAt" TIMESTAMP(3);
-- Backfill ONLY terminal executions (prevents marking RUNNING executions as ended)
UPDATE "AgentGraphExecution"
SET "endedAt" = "updatedAt"
WHERE "endedAt" IS NULL
AND "executionStatus" IN ('COMPLETED', 'FAILED', 'TERMINATED');
```
## Changes by Component
### Backend
**`schema.prisma`**
- Added `endedAt` field to `AgentGraphExecution`
**`execution.py`**
- Made `started_at` and `ended_at` optional with Field descriptions
- Updated `from_db()` to use `endedAt` instead of `updatedAt`
- `update_graph_execution_stats()` sets `endedAt` when status becomes
terminal
**`execution_analytics_routes.py`**
- Removed `created_at`/`updated_at` from `ExecutionAnalyticsResult` (DB
metadata, not execution data)
- Kept only `started_at`/`ended_at` (actual execution runtime)
- Made settings global (avoid recreation)
- Moved OpenAI key validation to `_process_batch` (only check when LLM
actually runs)
**`analytics.py`**
- Fixed aggregation: `COUNT(*) >= 1` (was 3) - include all days with ≥1
execution
- Uses `createdAt` for chart grouping (when execution was queued)
**`late_execution_monitor.py`**
- Handle optional `started_at` with fallback to `datetime.min` for
sorting
- Display "Not started" when `started_at` is null
### Frontend
**Type Definitions**
- Fixed manually maintained `types.ts`: `started_at: Date | null` (was
non-nullable)
- Generated types were already correct
**Analytics Components**
- `AnalyticsResultsTable.tsx`: Show only `started_at`/`ended_at` in
2-column expandable grid
- `ExecutionAnalyticsForm.tsx`: Added filter explanation UI
**Monitoring Components** - Fixed null handling bugs:
- `OldAgentLibraryView.tsx`: Handle null in reduce function
- `agent-runs-selector-list.tsx`: Safe sorting with `?.getTime() ?? 0`
- `AgentFlowList.tsx`: Filter/sort with null checks
- `FlowRunsStatus.tsx`: Filter null timestamps
- `FlowRunsTimeline.tsx`: Filter executions with null timestamps before
rendering
- `monitoring/page.tsx`: Safe sorting
- `ActivityItem.tsx`: Fallback to "recently" for null timestamps
## Benefits
✅ **Accurate End Times**: `endedAt` is frozen when execution finishes,
not updated later
✅ **Type Safety**: Nullable types match reality, exposing real bugs
✅ **Better UX**: Chart shows all days with data (not just days with ≥3
executions)
✅ **Bug Fixes**: 7+ frontend components now handle null timestamps
correctly
✅ **Documentation**: Field descriptions explain when timestamps are null
## Testing
### Backend
```bash
cd autogpt_platform/backend
poetry run format # ✅ All checks passed
poetry run lint # ✅ All checks passed
```
### Frontend
```bash
cd autogpt_platform/frontend
pnpm format # ✅ All checks passed
pnpm lint # ✅ All checks passed
pnpm types # ✅ All type errors fixed
```
### Test Data Generation
Created script to generate 35 test executions across 7 days with
correctness scores:
```bash
poetry run python scripts/generate_test_analytics_data.py
```
## Migration Notes
⚠️ **Important**: The migration only backfills `endedAt` for executions
with terminal status (COMPLETED, FAILED, TERMINATED). Active executions
(QUEUED, RUNNING) correctly keep `endedAt = null`.
## Breaking Changes
None - this is backward compatible:
- `endedAt` is nullable, existing code that doesn't use it is unaffected
- Frontend already used generated types which were correct
- Migration safely backfills historical data
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> Introduces explicit execution end-time tracking and normalizes
timestamp handling across backend and frontend.
>
> - Adds `endedAt` to `AgentGraphExecution` (schema + migration);
backfills terminal executions; sets `endedAt` on terminal status updates
> - Makes `GraphExecutionMeta.started_at/ended_at` optional; updates
`from_db()` to use DB `endedAt`; exposes timestamps in
`ExecutionAnalyticsResult`
> - Moves OpenAI key validation into batch processing; instantiates
`Settings` once
> - Accuracy trends: reduce daily aggregation threshold to `>= 1`;
optional historical series
> - Monitoring/analytics UI: results table shows/export
`started_at`/`ended_at`; adds chart filter explainer
> - Frontend null-safety: update types (`Date | null`) and fix
sorting/filtering/rendering for nullable timestamps across monitoring
and library views
> - Late execution monitor: safe sorting/display when `started_at` is
null
> - OpenAPI specs updated for new/nullable fields
>
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
|
||
|
|
47a3a5ef41 |
feat(backend,frontend): optional credentials flag for blocks at agent level (#11716)
This feature allows agent makers to mark credential fields as optional.
When credentials are not configured for an optional block, the block
will be skipped during execution rather than causing a validation error.
**Use case:** An agent with multiple notification channels (Discord,
Twilio, Slack) where the user only needs to configure one - unconfigured
channels are simply skipped.
### Changes 🏗️
#### Backend
**Data Model Changes:**
- `backend/data/graph.py`: Added `credentials_optional` property to
`Node` model that reads from node metadata
- `backend/data/execution.py`: Added `nodes_to_skip` field to
`GraphExecutionEntry` model to track nodes that should be skipped
**Validation Changes:**
- `backend/executor/utils.py`:
- Updated `_validate_node_input_credentials()` to return a tuple of
`(credential_errors, nodes_to_skip)`
- Nodes with `credentials_optional=True` and missing credentials are
added to `nodes_to_skip` instead of raising validation errors
- Updated `validate_graph_with_credentials()` to propagate
`nodes_to_skip` set
- Updated `validate_and_construct_node_execution_input()` to return
`nodes_to_skip`
- Updated `add_graph_execution()` to pass `nodes_to_skip` to execution
entry
**Execution Changes:**
- `backend/executor/manager.py`:
- Added skip logic in `_on_graph_execution()` dispatch loop
- When a node is in `nodes_to_skip`, it is marked as `COMPLETED` without
execution
- No outputs are produced, so downstream nodes won't trigger
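The `(credential_errors, nodes_to_skip)` split described above can be sketched like this. The dict shapes and function name are illustrative assumptions, not the actual `_validate_node_input_credentials` signature:

```python
from typing import Dict, List, Set, Tuple


def validate_node_credentials(
    nodes: List[dict],
    configured_credentials: Set[str],
) -> Tuple[Dict[str, str], Set[str]]:
    """Nodes with credentials_optional=True and missing credentials are
    collected in nodes_to_skip; required-credential nodes still error."""
    errors: Dict[str, str] = {}
    nodes_to_skip: Set[str] = set()
    for node in nodes:
        missing = [c for c in node["required_credentials"]
                   if c not in configured_credentials]
        if not missing:
            continue
        if node["credentials_optional"]:
            nodes_to_skip.add(node["id"])  # skip instead of failing validation
        else:
            errors[node["id"]] = f"Missing credentials: {missing}"
    return errors, nodes_to_skip
```

In the executor, a node in `nodes_to_skip` is marked `COMPLETED` without running, so it produces no outputs and downstream nodes never trigger.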
#### Frontend
**Node Store:**
- `frontend/src/app/(platform)/build/stores/nodeStore.ts`:
- Added `credentials_optional` to node metadata serialization in
`convertCustomNodeToBackendNode()`
- Added `getCredentialsOptional()` and `setCredentialsOptional()` helper
methods
**Credential Field Component:**
-
`frontend/src/components/renderers/input-renderer/fields/CredentialField/CredentialField.tsx`:
- Added "Optional - skip block if not configured" switch toggle
- Switch controls the `credentials_optional` metadata flag
- Placeholder text updates based on optional state
**Credential Field Hook:**
-
`frontend/src/components/renderers/input-renderer/fields/CredentialField/useCredentialField.ts`:
- Added `disableAutoSelect` parameter
- When credentials are optional, auto-selection of credentials is
disabled
**Feature Flags:**
- `frontend/src/services/feature-flags/use-get-flag.ts`: Minor refactor
(condition ordering)
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Built an agent using the Smart Decision Maker and downstream blocks
to test this
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> Introduces optional credentials across graph execution and UI,
allowing nodes to be skipped (no outputs, no downstream triggers) when
their credentials are not configured.
>
> - Backend
> - Adds `Node.credentials_optional` (from node `metadata`) and computes
required credential fields in `Graph.credentials_input_schema` based on
usage.
> - Validates credentials with `_validate_node_input_credentials` →
returns `(errors, nodes_to_skip)`; plumbs `nodes_to_skip` through
`validate_graph_with_credentials`,
`_construct_starting_node_execution_input`,
`validate_and_construct_node_execution_input`, and `add_graph_execution`
into `GraphExecutionEntry`.
> - Executor: dispatch loop skips nodes in `nodes_to_skip` (marks
`COMPLETED`); `execute_node`/`on_node_execution` accept `nodes_to_skip`;
`SmartDecisionMakerBlock.run` filters tool functions whose
`_sink_node_id` is in `nodes_to_skip` and errors only if all tools are
filtered.
> - Models: `GraphExecutionEntry` gains `nodes_to_skip` field. Tests and
snapshots updated accordingly.
>
> - Frontend
> - Builder: credential field uses `custom/credential_field` with an
"Optional – skip block if not configured" toggle; `nodeStore` persists
`credentials_optional` and history; UI hides optional toggle in run
dialogs.
> - Run dialogs: compute required credentials from
`credentials_input_schema.required`; allow selecting "None"; avoid
auto-select for optional; filter out incomplete creds before execute.
> - Minor schema/UI wiring updates (`uiSchema`, form context flags).
>
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
|
||
|
|
71157bddd7 |
feat(backend): add agent mode support to SmartDecisionMakerBlock with autonomous tool execution loops (#11547)
## Summary
<img width="2072" height="1836" alt="image" src="https://github.com/user-attachments/assets/9d231a77-6309-46b9-bc11-befb5d8e9fcc" />

**🚀 Major Feature: Agent Mode Support**
Adds autonomous agent mode to SmartDecisionMakerBlock, enabling it to execute tools directly in loops until tasks are completed, rather than just yielding tool calls for external execution.
## ⭐ Key New Features
### 🤖 Agent Mode with Tool Execution Loops
- **New `agent_mode_max_iterations` parameter** controls execution behavior:
  - `0` = Traditional mode (single LLM call, yield tool calls)
  - `1+` = Agent mode with iteration limit
  - `-1` = Infinite agent mode (loop until finished)
### 🔄 Autonomous Tool Execution
- **Direct tool execution** instead of yielding for external handling
- **Multi-iteration loops** with conversation state management
- **Automatic completion detection** when the LLM stops making tool calls
- **Iteration limit handling** with graceful completion messages
### 🏗️ Proper Database Operations
- **Replaced manual execution ID generation** with proper `upsert_execution_input`/`upsert_execution_output`
- **Real NodeExecutionEntry objects** from database results
- **Proper execution status management**: QUEUED → RUNNING → COMPLETED/FAILED
### 🔧 Enhanced Type Safety
- **Pydantic models** replace TypedDict: `ToolInfo` and `ExecutionParams`
- **Runtime validation** with better error messages
- **Improved developer experience** with IDE support
## 🔧 Technical Implementation
### Agent Mode Flow:
```python
# Agent mode enabled with iterations
if input_data.agent_mode_max_iterations != 0:
    async for result in self._execute_tools_agent_mode(...):
        yield result  # "conversations", "finished"
    return

# Traditional mode (existing behavior)
# Single LLM call + yield tool calls for external execution
```
### Tool Execution with Database Operations:
```python
# Before: Manual execution IDs
tool_exec_id = f"{node_exec_id}_tool_{sink_node_id}_{len(input_data)}"

# After: Proper database operations
node_exec_result, final_input_data = await db_client.upsert_execution_input(
    node_id=sink_node_id,
    graph_exec_id=execution_params.graph_exec_id,
    input_name=input_name,
    input_data=input_value,
)
```
### Type Safety with Pydantic:
```python
# Before: Dict access prone to errors
execution_params["user_id"]

# After: Validated model access
execution_params.user_id  # Runtime validation + IDE support
```
## 🧪 Comprehensive Test Coverage
- **Agent mode execution tests** with multi-iteration scenarios
- **Database operation verification**
- **Type safety validation**
- **Backward compatibility** for traditional mode
- **Enhanced dynamic fields tests**
## 📊 Usage Examples
### Traditional Mode (Existing Behavior):
```python
SmartDecisionMakerBlock.Input(
    prompt="Search for keywords",
    agent_mode_max_iterations=0,  # Default
)
# → Yields tool calls for external execution
```
### Agent Mode (New Feature):
```python
SmartDecisionMakerBlock.Input(
    prompt="Complete this task using available tools",
    agent_mode_max_iterations=5,  # Max 5 iterations
)
# → Executes tools directly until task completion or iteration limit
```
### Infinite Agent Mode:
```python
SmartDecisionMakerBlock.Input(
    prompt="Analyze and process this data thoroughly",
    agent_mode_max_iterations=-1,  # No limit, run until finished
)
# → Executes tools autonomously until the LLM indicates completion
```
## ✅ Backward Compatibility
- **Zero breaking changes** to existing functionality
- **Traditional mode remains the default** (`agent_mode_max_iterations=0`)
- **All existing tests pass**
- **Same API for tool definitions and execution**

This transforms the SmartDecisionMakerBlock from a simple tool-call generator into an autonomous agent capable of complex multi-step task execution! 🎯
🤖 Generated with [Claude Code](https://claude.ai/code)
---------
Co-authored-by: Claude <noreply@anthropic.com> |
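The `agent_mode_max_iterations` semantics (0 / N / -1) boil down to a loop-control pattern that can be sketched like this. This is an illustrative synchronous sketch, not the block's actual async implementation; `llm_step` stands in for one LLM call returning its tool calls:

```python
from typing import Callable, List, Tuple


def run_agent_loop(
    max_iterations: int,
    llm_step: Callable[[], List[str]],
) -> Tuple[List[str], str]:
    """0 = single traditional call; N > 0 = loop capped at N iterations;
    -1 = loop until the LLM makes no more tool calls (empty list)."""
    if max_iterations == 0:
        # Traditional mode: yield the tool calls for external execution
        return llm_step(), "traditional"

    iteration = 0
    while max_iterations < 0 or iteration < max_iterations:
        iteration += 1
        tool_calls = llm_step()
        if not tool_calls:
            # Completion detected: LLM stopped requesting tools
            return [], "finished"
        # ... agent mode would execute tool_calls directly here and
        # feed the results back into the conversation ...
    return [], "iteration_limit"
```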
||
|
|
c1e21d07e6 |
feat(platform): add execution accuracy alert system (#11562)
## Summary
<img width="1263" height="883" alt="image" src="https://github.com/user-attachments/assets/98d4f449-1897-4019-a599-846c27df4191" />
<img width="398" height="190" alt="image" src="https://github.com/user-attachments/assets/0138ac02-420d-4f96-b980-74eb41e3c968" />
- Add execution accuracy monitoring with moving averages and Discord alerts
- Dashboard visualization for accuracy trends and alert detection
- Hourly monitoring for marketplace agents (≥10 executions in 30 days)
- Generated API client integration with type-safe models
## Features
- **Moving Average Analysis**: 3-day vs 7-day comparison with configurable thresholds
- **Discord Notifications**: Hourly alerts for accuracy drops ≥10%
- **Dashboard UI**: Real-time trends visualization with alert status
- **Type Safety**: Generated API hooks and models throughout
- **Error Handling**: Graceful OpenAI configuration handling
- **PostgreSQL Optimization**: Window functions for efficient trend queries
## Test plan
- [x] Backend accuracy monitoring logic tested with sample data
- [x] Frontend components use generated API hooks (no manual fetch)
- [x] Discord notification integration working
- [x] Admin authentication and authorization working
- [x] All formatting and linting checks passing
- [x] Error handling for missing OpenAI configuration
- [x] Test data available with `test-accuracy-agent-001`
🤖 Generated with [Claude Code](https://claude.ai/code)
---------
Co-authored-by: Claude <noreply@anthropic.com> |
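The 3-day vs 7-day moving-average comparison can be sketched like this. This is an assumption-heavy sketch: the function name is invented, and it treats the "≥10% drop" threshold as an absolute difference between averages, which may not match the actual implementation:

```python
from typing import List


def accuracy_drop_alert(daily_scores: List[float], threshold: float = 0.10) -> bool:
    """Alert when the 3-day moving average falls below the 7-day moving
    average by at least `threshold`. `daily_scores` holds per-day
    accuracy values in [0, 1], oldest first."""
    if len(daily_scores) < 7:
        return False  # not enough history for a 7-day window
    avg7 = sum(daily_scores[-7:]) / 7
    avg3 = sum(daily_scores[-3:]) / 3
    return (avg7 - avg3) >= threshold
```

In the real system the per-day averages come from PostgreSQL window functions, and the alert is evaluated hourly for marketplace agents with ≥10 executions in the last 30 days.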
||
|
|
e4102bf0fb |
fix(backend): resolve HITL execution context validation and re-enable tests (#11509)
## Summary
Fix critical validation errors in GraphExecutionEntry and
NodeExecutionEntry models that were preventing HITL block execution, and
re-enable HITL test suite.
## Root Cause
After introducing ExecutionContext as a mandatory field, existing
messages in the execution queue lacked this field, causing validation
failures:
```
[GraphExecutor] [ExecutionManager] Could not parse run message: 1 validation error for GraphExecutionEntry
execution_context
Field required [type=missing, input_value={'user_id': '26db15cb-a29...
```
## Changes Made
### 🔧 Execution Model Fixes (`backend/data/execution.py`)
- **Add backward compatibility**: `execution_context: ExecutionContext =
Field(default_factory=ExecutionContext)`
- **Prevent shared mutable defaults**: Use `default_factory` instead of
direct instantiation to avoid mutation issues
- **Ignore unknown fields**: Add `model_config = {"extra": "ignore"}`
for future compatibility
- **Applied to both models**: GraphExecutionEntry and NodeExecutionEntry
### 🧪 Re-enable HITL Test Suite
- **Remove test skips**: Remove `pytestmark = pytest.mark.skip()` from
`human_review_test.py`
- **Remove test skips**: Remove `pytestmark = pytest.mark.skip()` from
`review_routes_test.py`
- **Restore test coverage**: HITL functionality now properly tested in
CI
## Default ExecutionContext Behavior
When execution_context is missing from old messages, provides safe
defaults:
- `safe_mode: bool = True` (HITL blocks require approval by default)
- `user_timezone: str = "UTC"` (safe timezone default)
- `root_execution_id: Optional[str] = None`
- `parent_execution_id: Optional[str] = None`
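The shared-mutable-default pitfall this PR avoids can be shown with a stdlib `dataclasses` analogue of the Pydantic `Field(default_factory=...)` pattern (the real models are Pydantic; this sketch only illustrates why a factory is used instead of a direct default instance):

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ExecutionContext:
    # Safe defaults applied when old queue messages lack the field
    safe_mode: bool = True
    user_timezone: str = "UTC"
    root_execution_id: Optional[str] = None
    parent_execution_id: Optional[str] = None


@dataclass
class GraphExecutionEntry:
    user_id: str
    # default_factory gives each entry its OWN context instance; a plain
    # `= ExecutionContext()` default would be one shared, mutable object
    execution_context: ExecutionContext = field(default_factory=ExecutionContext)
```

Mutating one entry's context can never leak into another entry, which is the isolation property listed under "Impact" below.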
## Impact
- ✅ **Fixes deployment validation errors** in dev environment
- ✅ **Maintains backward compatibility** with existing queue messages
- ✅ **Restores proper HITL test coverage** in CI
- ✅ **Ensures isolation**: Each execution gets its own ExecutionContext
instance
- ✅ **Future-proofs**: Protects against message format changes
## Testing
- [x] HITL test suite re-enabled and should pass in CI
- [x] Existing executions continue to work with sensible defaults
- [x] New executions receive proper ExecutionContext from caller
- [x] Verified `default_factory` prevents shared mutable instances
## Files Changed
- `backend/data/execution.py` - Add backward-compatible ExecutionContext
defaults
- `backend/data/human_review_test.py` - Re-enable test suite
- `backend/server/v2/executions/review/review_routes_test.py` -
Re-enable test suite
Resolves the "Field required" validation error preventing HITL block
execution in dev environment.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
Co-authored-by: Claude <noreply@anthropic.com>
|
||
|
|
7b951c977e |
feat(platform): implement graph-level Safe Mode toggle for HITL blocks (#11455)
## Summary

This PR implements a graph-level Safe Mode toggle system for Human-in-the-Loop (HITL) blocks. When Safe Mode is ON (default), HITL blocks require manual review before proceeding. When OFF, they execute automatically.

## 🔧 Backend Changes
- **Database**: Added `metadata` JSON column to `AgentGraph` table with migration
- **API**: Updated `execute_graph` endpoint to accept `safe_mode` parameter
- **Execution**: Enhanced execution context to use graph metadata as default with API override capability
- **Auto-detection**: Automatically populate `has_human_in_the_loop` for graphs containing HITL blocks
- **Block Detection**: HITL block ID: `8b2a7b3c-6e9d-4a5f-8c1b-2e3f4a5b6c7d`

## 🎨 Frontend Changes
- **Component**: New `FloatingSafeModeToggle` with dual variants:
  - **White variant**: For library pages, integrates with action buttons
  - **Black variant**: For builders, floating positioned
- **Integration**: Added toggles to both new/legacy builders and library pages
- **API Integration**: Direct graph metadata updates via `usePutV1UpdateGraphVersion`
- **Query Management**: React Query cache invalidation for consistent UI updates
- **Conditional Display**: Toggle only appears when graph contains HITL blocks

## 🛠 Technical Implementation
- **Safe Mode ON** (default): HITL blocks require manual review before proceeding
- **Safe Mode OFF**: HITL blocks execute automatically without intervention
- **Priority**: Backend API `safe_mode` parameter takes precedence over graph metadata
- **Detection**: Auto-populates `has_human_in_the_loop` metadata field
- **Positioning**: Proper z-index and responsive positioning for floating elements

## 🚧 Known Issues (Work in Progress)

### High Priority
- [ ] **Toggle state persistence**: Always shows "ON" regardless of actual state - query invalidation issue
- [ ] **LibraryAgent metadata**: Missing metadata field causing TypeScript errors
- [ ] **Tooltip z-index**: Still covered by some UI elements despite high z-index

### Medium Priority
- [ ] **HITL detection**: Logic needs improvement for reliable block detection
- [ ] **Error handling**: Removing HITL blocks from graph causes save errors
- [ ] **TypeScript**: Fix type mismatches between GraphModel and LibraryAgent

### Low Priority
- [ ] **Frontend API**: Add `safe_mode` parameter to execution calls once OpenAPI is regenerated
- [ ] **Performance**: Consider debouncing rapid toggle clicks

## 🧪 Test Plan
- [ ] Verify toggle appears only when graph has HITL blocks
- [ ] Test toggle persistence across page refreshes
- [ ] Confirm API calls update graph metadata correctly
- [ ] Validate execution behavior respects safe mode setting
- [ ] Check styling consistency across builder and library contexts

## 🔗 Related
- Addresses requirements for graph-level HITL configuration
- Builds on existing FloatingReviewsPanel infrastructure
- Integrates with existing graph metadata system

🤖 Generated with [Claude Code](https://claude.ai/code)
|
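The precedence rule this PR describes (an explicit API `safe_mode` parameter overrides the graph-metadata default, and Safe Mode is ON when nothing is set) can be sketched as follows. The function name and metadata key are illustrative, not the actual AutoGPT backend code:

```python
from typing import Optional


def resolve_safe_mode(api_safe_mode: Optional[bool], graph_metadata: dict) -> bool:
    """Return the effective safe-mode setting for one execution."""
    if api_safe_mode is not None:
        return api_safe_mode  # explicit API parameter takes precedence
    # fall back to the graph-level metadata default; ON if unset
    return graph_metadata.get("safe_mode", True)
```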
||
|
|
3d08c22dd5 |
feat(platform): add Human In The Loop block with review workflow (#11380)
## Summary

This PR implements a comprehensive Human In The Loop (HITL) block that allows agents to pause execution and wait for human approval/modification of data before continuing.

https://github.com/user-attachments/assets/c027d731-17d3-494c-85ca-97c3bf33329c

## Key Features
- Added WAITING_FOR_REVIEW status to AgentExecutionStatus enum
- Created PendingHumanReview database table for storing review requests
- Implemented HumanInTheLoopBlock that extracts input data and creates review entries
- Added API endpoints at /api/executions/review for fetching and reviewing pending data
- Updated execution manager to properly handle waiting status and resume after approval

## Frontend Components
- PendingReviewCard for individual review handling
- PendingReviewsList for multiple reviews
- FloatingReviewsPanel for graph builder integration
- Integrated review UI into 3 locations: legacy library, new library, and graph builder

## Technical Implementation
- Added proper type safety throughout with SafeJson handling
- Optimized database queries using count functions instead of full data fetching
- Fixed imports to be top-level instead of local
- All formatters and linters pass

## Test plan
- [ ] Test Human In The Loop block creation in graph builder
- [ ] Test block execution pauses and creates pending review
- [ ] Test review UI appears in all 3 locations
- [ ] Test data modification and approval workflow
- [ ] Test rejection workflow
- [ ] Test execution resumes after approval

🤖 Generated with [Claude Code](https://claude.ai/code)

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

* **New Features**
  * Added Human-In-The-Loop review workflows to pause executions for human validation.
  * Users can approve or reject pending tasks, optionally editing submitted data and adding a message.
  * New "Waiting for Review" execution status with UI indicators across run lists, badges, and activity views.
  * Review management UI: pending review cards, list view, and a floating reviews panel for quick access.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Claude <noreply@anthropic.com>
|
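The pause/approve/reject flow this PR describes can be sketched with a simplified status enum. `apply_review` is a hypothetical helper, not the real execution-manager code, and the enum shows only the statuses relevant here:

```python
from enum import Enum


class AgentExecutionStatus(str, Enum):
    RUNNING = "RUNNING"
    WAITING_FOR_REVIEW = "WAITING_FOR_REVIEW"  # status added by this PR
    COMPLETED = "COMPLETED"
    TERMINATED = "TERMINATED"


def apply_review(status: AgentExecutionStatus, approved: bool) -> AgentExecutionStatus:
    """Resume a paused execution on approval; terminate it on rejection."""
    if status is not AgentExecutionStatus.WAITING_FOR_REVIEW:
        return status  # nothing pending review, leave unchanged
    return (
        AgentExecutionStatus.RUNNING if approved else AgentExecutionStatus.TERMINATED
    )
```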
||
|
|
901bb31e14 |
feat(backend): parameterize activity status generation with customizable prompts (#11407)
## Summary
Implement comprehensive parameterization of the activity status
generation system to enable custom prompts for admin analytics
dashboard.
## Changes Made
### Core Function Enhancement (`activity_status_generator.py`)
- **Extract hardcoded prompts to constants**: `DEFAULT_SYSTEM_PROMPT`
and `DEFAULT_USER_PROMPT`
- **Add prompt parameters**: `system_prompt`, `user_prompt` with
defaults to maintain backward compatibility
- **Template substitution system**: User prompt supports
`{{GRAPH_NAME}}` and `{{EXECUTION_DATA}}` placeholders
- **Skip existing flag**: `skip_existing` parameter allows admin to
force regeneration of existing data
- **Maintain manager compatibility**: All existing calls continue to
work with default parameters
### Admin API Enhancement (`execution_analytics_routes.py`)
- **Custom prompt fields**: `system_prompt` and `user_prompt` optional
fields in `ExecutionAnalyticsRequest`
- **Skip existing control**: `skip_existing` boolean flag for admin
regeneration option
- **Template documentation**: Clear documentation of placeholder system
in field descriptions
- **Backward compatibility**: All existing API calls work unchanged
### Template System Design
- **Simple placeholder replacement**: `{{GRAPH_NAME}}` → actual graph
name, `{{EXECUTION_DATA}}` → JSON execution data
- **No dependencies**: Uses simple `string.replace()` for maximum
compatibility
- **JSON safety**: Execution data properly serialized as indented JSON
- **Validation tested**: Template substitution verified to work
correctly
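The placeholder substitution described above can be sketched with plain `str.replace()`; the helper name is illustrative, but the `{{GRAPH_NAME}}`/`{{EXECUTION_DATA}}` placeholders and indented-JSON serialization follow the PR text:

```python
import json


def render_user_prompt(template: str, graph_name: str, execution_data: dict) -> str:
    """Substitute template placeholders with graph name and JSON execution data."""
    return template.replace("{{GRAPH_NAME}}", graph_name).replace(
        "{{EXECUTION_DATA}}", json.dumps(execution_data, indent=2)
    )
```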
## Key Features
### For Regular Users (Manager Integration)
- **No changes required**: Existing manager.py calls work unchanged
- **Default behavior preserved**: Same prompts and logic as before
- **Feature flag compatibility**: LaunchDarkly integration unchanged
### For Admin Analytics Dashboard
- **Custom system prompts**: Admins can override the AI evaluation
criteria
- **Custom user prompts**: Admins can modify the analysis instructions
with execution data templates
- **Force regeneration**: `skip_existing=False` allows reprocessing
existing executions with new prompts
- **Complete model list**: Access to all LLM models from `llm.py` (70+
models including GPT, Claude, Gemini, etc.)
## Technical Validation
- ✅ Template substitution tested and working
- ✅ Default behavior preserved for existing code
- ✅ Admin API parameter validation working
- ✅ All imports and function signatures correct
- ✅ Backward compatibility maintained
## Use Cases Enabled
- **A/B testing**: Compare different prompt strategies on same execution
data
- **Custom evaluation**: Tailor success criteria for specific graph
types
- **Prompt optimization**: Iterate on prompt design based on admin
feedback
- **Bulk reprocessing**: Regenerate activity status with improved
prompts
## Testing
- Template substitution functionality verified
- Function signatures and imports validated
- Code formatting and linting passed
- Backward compatibility confirmed
## Breaking Changes
None - all existing functionality preserved with default parameters.
## Related Issues
Resolves the requirement to expose prompt customization on the frontend
execution analytics dashboard.
---------
Co-authored-by: Claude <noreply@anthropic.com>
|
||
|
|
d6ee402483 |
feat(platform): Add execution analytics admin endpoint with feature flag bypass (#11327)
This PR adds a comprehensive execution analytics admin endpoint that generates AI-powered activity summaries and correctness scores for graph executions, with proper feature flag bypass for admin use.

### Changes 🏗️

**Backend Changes:**
- Added admin endpoint: `/api/executions/admin/execution_analytics`
- Implemented feature flag bypass with `skip_feature_flag=True` parameter for admin operations
- Fixed async database client usage (`get_db_async_client`) to resolve async/await errors
- Added batch processing with configurable size limits to handle large datasets
- Comprehensive error handling and logging for troubleshooting
- Renamed entire feature from "Activity Backfill" to "Execution Analytics" for clarity

**Frontend Changes:**
- Created clean admin UI for execution analytics generation at `/admin/execution-analytics`
- Built form with graph ID input, model selection dropdown, and optional filters
- Implemented results table with status badges and detailed execution information
- Added CSV export functionality for analytics results
- Integrated with generated TypeScript API client for proper authentication
- Added proper error handling with toast notifications and loading states

**Database & API:**
- Fixed critical async/await issue by switching from sync to async database client
- Updated router configuration and endpoint naming for consistency
- Generated proper TypeScript types and API client integration
- Applied feature flag filtering at API level while bypassing it for admin operations

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:

**Test Plan:**
- [x] Admin can access execution analytics page at `/admin/execution-analytics`
- [x] Form validation works correctly (requires graph ID, validates inputs)
- [x] API endpoint `/api/executions/admin/execution_analytics` responds correctly
- [x] Authentication works properly through generated API client
- [x] Analytics generation works with different LLM models (gpt-4o-mini, gpt-4o, etc.)
- [x] Results display correctly with appropriate status badges (success/failed/skipped)
- [x] CSV export functionality downloads correct data
- [x] Error handling displays appropriate toast messages
- [x] Feature flag bypass works for admin users (generates analytics regardless of user flags)
- [x] Batch processing handles multiple executions correctly
- [x] Loading states show proper feedback during processing

#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my changes
- [x] No configuration changes required for this feature

**Related to:** PR #11325 (base correctness score functionality)

🤖 Generated with [Claude Code](https://claude.ai/code)

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Zamil Majdy <majdyz@users.noreply.github.com>
|
||
|
|
6037f80502 |
feat(backend): Add correctness score to execution activity generation (#11325)
## Summary

Add AI-generated correctness score field to execution activity status generation to provide quantitative assessment of how well executions achieved their intended purpose.

New page:
<img width="1000" height="229" alt="image" src="https://github.com/user-attachments/assets/5cb907cf-5bc7-4b96-8128-8eecccde9960" />

Old page:
<img width="1000" alt="image" src="https://github.com/user-attachments/assets/ece0dfab-1e50-4121-9985-d585f7fcd4d2" />

## What Changed
- Added `correctness_score` field (float 0.0-1.0) to `GraphExecutionStats` model
- **REFACTORED**: Removed duplicate `llm_utils.py` and reused existing `AIStructuredResponseGeneratorBlock` logic
- Updated activity status generator to use structured responses instead of plain text
- Modified prompts to include correctness assessment with 5-tier scoring system:
  - 0.0-0.2: Failure
  - 0.2-0.4: Poor
  - 0.4-0.6: Partial Success
  - 0.6-0.8: Mostly Successful
  - 0.8-1.0: Success
- Updated manager.py to extract and set both activity_status and correctness_score
- Fixed tests to work with existing structured response interface

## Technical Details
- **Code Reuse**: Eliminated duplication by using existing `AIStructuredResponseGeneratorBlock` instead of creating new LLM utilities
- Added JSON validation with retry logic for malformed responses
- Maintained backward compatibility for existing activity status functionality
- Score is clamped to valid 0.0-1.0 range and validated
- All type errors resolved and linting passes

## Test Plan
- [x] All existing tests pass with refactored structure
- [x] Structured LLM call functionality tested with success and error cases
- [x] Activity status generation tested with various execution scenarios
- [x] Integration tests verify both fields are properly set in execution stats
- [x] No code duplication - reuses existing block logic

🤖 Generated with [Claude Code](https://claude.ai/code)

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Zamil Majdy <majdyz@users.noreply.github.com>
|
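The clamping and 5-tier labelling described in this PR can be sketched as follows. The tier boundaries come from the PR text; the helper names are illustrative, not the actual backend functions:

```python
def clamp_score(score: float) -> float:
    """Clamp a correctness score into the valid 0.0-1.0 range."""
    return max(0.0, min(1.0, score))


def score_tier(score: float) -> str:
    """Map a clamped score to the 5-tier label from the PR's prompt design."""
    s = clamp_score(score)
    if s < 0.2:
        return "Failure"
    if s < 0.4:
        return "Poor"
    if s < 0.6:
        return "Partial Success"
    if s < 0.8:
        return "Mostly Successful"
    return "Success"
```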
||
|
|
4922f88851 |
feat(backend/executor): Implement cascading stop for nested graph executions (#11277)
## Summary

Fixes critical issue where child executions spawned by `AgentExecutorBlock` continue running after parent execution is stopped. Implements parent-child execution tracking and recursive cascading stop logic to ensure entire execution trees are terminated together.

## Background

When a parent graph execution containing `AgentExecutorBlock` nodes is stopped, only the parent was terminated. Child executions continued running, leading to:

- ❌ Orphaned child executions consuming credits
- ❌ No user control over execution trees
- ❌ Race conditions where children start after parent stops
- ❌ Resource leaks from abandoned executions

## Core Changes

### 1. Database Schema (`schema.prisma` + migration)

```sql
-- Add nullable parent tracking field
ALTER TABLE "AgentGraphExecution" ADD COLUMN "parentGraphExecutionId" TEXT;

-- Add self-referential foreign key with graceful deletion
ALTER TABLE "AgentGraphExecution"
ADD CONSTRAINT "AgentGraphExecution_parentGraphExecutionId_fkey"
FOREIGN KEY ("parentGraphExecutionId") REFERENCES "AgentGraphExecution"("id")
ON DELETE SET NULL ON UPDATE CASCADE;

-- Add index for efficient child queries
CREATE INDEX "AgentGraphExecution_parentGraphExecutionId_idx"
ON "AgentGraphExecution"("parentGraphExecutionId");
```

### 2. Parent ID Propagation (`backend/blocks/agent.py`)

```python
# Extract current graph execution ID and pass as parent to child
execution = add_graph_execution(
    # ... other params
    parent_graph_exec_id=graph_exec_id,  # NEW: Track parent relationship
)
```

### 3. Data Layer (`backend/data/execution.py`)

```python
async def get_child_graph_executions(parent_exec_id: str) -> list[GraphExecution]:
    """Get all child executions of a parent execution."""
    children = await AgentGraphExecution.prisma().find_many(
        where={"parentGraphExecutionId": parent_exec_id, "isDeleted": False}
    )
    return [GraphExecution.from_db(child) for child in children]
```

### 4. Cascading Stop Logic (`backend/executor/utils.py`)

```python
async def stop_graph_execution(
    user_id: str,
    graph_exec_id: str,
    wait_timeout: float = 15.0,
    cascade: bool = True,  # NEW parameter
):
    # 1. Find all child executions
    if cascade:
        children = await _get_child_executions(graph_exec_id)
        # 2. Stop all children recursively in parallel
        if children:
            await asyncio.gather(
                *[stop_graph_execution(user_id, child.id, wait_timeout, True) for child in children],
                return_exceptions=True,  # Don't fail parent if child fails
            )
    # 3. Stop the parent execution
    # ... existing stop logic
```

### 5. Race Condition Prevention (`backend/executor/manager.py`)

```python
# Before executing queued child, check if parent was terminated
if parent_graph_exec_id:
    parent_exec = get_db_client().get_graph_execution_meta(parent_graph_exec_id, user_id)
    if parent_exec and parent_exec.status == ExecutionStatus.TERMINATED:
        # Skip execution, mark child as terminated
        get_db_client().update_graph_execution_stats(
            graph_exec_id=graph_exec_id,
            status=ExecutionStatus.TERMINATED,
        )
        return  # Don't start orphaned child
```

## How It Works

### Before (Broken)

```
User stops parent execution
  ↓
Parent terminates ✓
  ↓
Child executions keep running ✗
  ↓
User cannot stop children ✗
```

### After (Fixed)

```
User stops parent execution
  ↓
Query database for all children
  ↓
Recursively stop all children in parallel
  ↓
Wait for children to terminate
  ↓
Stop parent execution
  ↓
All executions in tree stopped ✓
```

### Race Prevention

```
Child in QUEUED status
  ↓
Parent stopped
  ↓
Child picked up by executor
  ↓
Pre-flight check: parent TERMINATED?
  ↓
Yes → Skip execution, mark child TERMINATED
  ↓
Child never runs ✓
```

## Edge Cases Handled

✅ **Deep nesting** - Recursive cascading handles multi-level trees
✅ **Queued children** - Pre-flight check prevents execution
✅ **Race conditions** - Child spawned during stop operation
✅ **Partial failures** - `return_exceptions=True` continues on error
✅ **Multiple children** - Parallel stop via `asyncio.gather()`
✅ **No parent** - Backward compatible (nullable field)
✅ **Already completed** - Existing status check handles it

## Performance Impact

- **Stop operation**: O(depth) with parallel execution vs O(1) before
- **Memory**: +36 bytes per execution (one UUID reference)
- **Database**: +1 query per tree level, indexed for efficiency

## API Changes (Backward Compatible)

### `stop_graph_execution()` - New Optional Parameter

```python
# Before
async def stop_graph_execution(user_id: str, graph_exec_id: str, wait_timeout: float = 15.0)

# After
async def stop_graph_execution(user_id: str, graph_exec_id: str, wait_timeout: float = 15.0, cascade: bool = True)
```

**Default `cascade=True`** means existing callers get the new behavior automatically.

### `add_graph_execution()` - New Optional Parameter

```python
async def add_graph_execution(..., parent_graph_exec_id: Optional[str] = None)
```

## Security & Safety

- ✅ **User verification** - Users can only stop their own executions (parent + children)
- ✅ **No cycles** - Self-referential FK prevents infinite loops
- ✅ **Graceful degradation** - Errors in child stops don't block parent stop
- ✅ **Rate limits** - Existing execution rate limits still apply

## Testing Checklist

### Database Migration
- [x] Migration runs successfully
- [x] Prisma client regenerates without errors
- [x] Existing tests pass

### Core Functionality
- [ ] Manual test: Stop parent with running child → child stops
- [ ] Manual test: Stop parent with queued child → child never starts
- [ ] Unit test: Cascading stop with multiple children
- [ ] Unit test: Deep nesting (3+ levels)
- [ ] Integration test: Race condition prevention

## Breaking Changes

**None** - All changes are backward compatible with existing code.

## Rollback Plan

If issues arise:
1. **Code rollback**: Revert PR, redeploy
2. **Database rollback**: Drop column and constraints (non-destructive)

---

**Note**: This branch contains additional unrelated changes from merging with `dev`. The core cascading stop feature involves only:
- `schema.prisma` + migration
- `backend/data/execution.py`
- `backend/executor/utils.py`
- `backend/blocks/agent.py`
- `backend/executor/manager.py`

All other file changes are from dev branch updates and not part of this feature.

🤖 Generated with [Claude Code](https://claude.ai/code)

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

* **New Features**
  * Nested graph executions: parent-child tracking and retrieval of child executions
* **Improvements**
  * Cascading stop: stopping a parent optionally terminates child executions
  * Parent execution IDs propagated through runs and surfaced in logs
  * Per-user/graph concurrent execution limits enforced
* **Bug Fixes**
  * Skip enqueuing children if parent is terminated; robust handling when parent-status checks fail
* **Tests**
  * Updated tests to cover parent linkage in graph creation

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Claude <noreply@anthropic.com>
|
||
|
|
b7ae2c2fd2 |
fix(backend): move DatabaseError to backend.util.exceptions for better layer separation (#11177)
## Summary

Move DatabaseError from store-specific exceptions to generic backend exceptions for proper layer separation, while also fixing store exception inheritance to ensure proper HTTP status codes.

## Problem

1. **Poor Layer Separation**: DatabaseError was defined in store-specific exceptions but represents infrastructure concerns that affect the entire backend
2. **Incorrect HTTP Status Codes**: Store exceptions inherited from Exception instead of ValueError, causing 500 responses for client errors
3. **Reusability Issues**: Other backend modules couldn't use DatabaseError for DB operations
4. **Blanket Catch Issues**: Store-specific catches were affecting generic database operations

## Solution

### Move DatabaseError to Generic Location
- Move from backend.server.v2.store.exceptions to backend.util.exceptions
- Update all 23 references in backend/server/v2/store/db.py to use new location
- Remove from StoreError inheritance hierarchy

### Fix Complete Store Exception Hierarchy
- MediaUploadError: Changed from Exception to ValueError inheritance (client errors → 400)
- StoreError: Changed from Exception to ValueError inheritance (business logic errors → 400)
- Store NotFound exceptions: Changed to inherit from NotFoundError (→ 404)
- DatabaseError: Now properly inherits from Exception (infrastructure errors → 500)

## Benefits

### ✅ Proper Layer Separation
- Database errors are infrastructure concerns, not store-specific business logic
- Store exceptions focus on business validation and client errors
- Clean separation between infrastructure and business logic layers

### ✅ Correct HTTP Status Codes
- DatabaseError: 500 (server infrastructure errors)
- Store NotFound errors: 404 (via existing NotFoundError handler)
- Store validation errors: 400 (via existing ValueError handler)
- Media upload errors: 400 (client validation errors)

### ✅ Architectural Improvements
- DatabaseError now reusable across entire backend
- Eliminates blanket catch issues affecting generic DB operations
- All store exceptions use global exception handlers properly
- Future store exceptions automatically get proper status codes

## Files Changed

- **backend/util/exceptions.py**: Add DatabaseError class
- **backend/server/v2/store/exceptions.py**: Remove DatabaseError, fix inheritance hierarchy
- **backend/server/v2/store/db.py**: Update all DatabaseError references to new location

## Result

- ✅ **No more stack trace spam**: Expected business logic errors handled properly
- ✅ **Proper HTTP semantics**: 500 for infrastructure, 400/404 for client errors
- ✅ **Better architecture**: Clean layer separation and reusable components
- ✅ **Fixes original issue**: AgentNotFoundError now returns 404 instead of 500

This addresses the logging issue mentioned by @zamilmajdy while also implementing the architectural improvements suggested by @Pwuts.
|
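The resulting hierarchy and its mapping to HTTP status codes can be sketched as follows. The simplified class bodies and the `status_code_for` dispatcher are illustrative stand-ins for the backend's actual exception classes and global FastAPI handlers:

```python
class NotFoundError(Exception):
    """Generic not-found error, mapped to HTTP 404 by a global handler."""


class DatabaseError(Exception):
    """Infrastructure failure -> HTTP 500 (deliberately NOT a ValueError)."""


class StoreError(ValueError):
    """Store business-logic error -> HTTP 400 via the generic ValueError handler."""


class AgentNotFoundError(NotFoundError):
    """Missing store agent -> HTTP 404 (previously surfaced as 500)."""


def status_code_for(exc: Exception) -> int:
    # Mimics generic exception handlers: check the most specific base first.
    if isinstance(exc, NotFoundError):
        return 404
    if isinstance(exc, ValueError):
        return 400
    return 500
```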
||
|
|
934cb3a9c7 |
feat(backend): Make execution limit per user per graph and reduce to 25 (#11169)
## Summary

- Changed max_concurrent_graph_executions_per_user from 50 to 25 concurrent executions
- Updated the limit to be per user per graph instead of globally per user
- Users can now run different graphs concurrently without being limited by executions of other graphs
- Enhanced database query to filter by both user_id and graph_id

## Changes Made

- **Settings**: Reduced default limit from 50 to 25 and updated description to clarify per-graph scope
- **Database Layer**: Modified `get_graph_executions_count` to accept optional `graph_id` parameter
- **Executor Manager**: Updated rate limiting logic to check per-user-per-graph instead of per-user globally
- **Logging**: Enhanced warning messages to include graph_id context

## Test plan

- [ ] Verify that users can run up to 25 concurrent executions of the same graph
- [ ] Verify that users can run different graphs concurrently without interference
- [ ] Test rate limiting behavior when limit is exceeded for a specific graph
- [ ] Confirm logging shows correct graph_id context in rate limit messages

## Impact

This change improves the user experience by allowing concurrent execution of different graphs while still preventing resource exhaustion from running too many instances of the same graph.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-authored-by: Claude <noreply@anthropic.com>
|
||
|
|
05a72f4185 |
feat(backend): implement user rate limiting for concurrent graph executions (#11128)
## Summary

Add configurable rate limiting to prevent users from exceeding the maximum number of concurrent graph executions, defaulting to 50 per user.

## Changes Made

### Configuration (`backend/util/settings.py`)
- Add `max_concurrent_graph_executions_per_user` setting (default: 50, range: 1-1000)
- Configurable via environment variables or settings file

### Database Query Function (`backend/data/execution.py`)
- Add `get_graph_executions_count()` function for efficient count queries
- Supports filtering by user_id, statuses, and time ranges
- Used to check current RUNNING/QUEUED executions per user

### Database Manager Integration (`backend/executor/database.py`)
- Expose `get_graph_executions_count` through DatabaseManager RPC interface
- Follows existing patterns for database operations
- Enables proper service-to-service communication

### Rate Limiting Logic (`backend/executor/manager.py`)
- Inline rate limit check in `_handle_run_message()` before cluster lock
- Use existing `db_client` pattern for consistency
- Reject and requeue executions when limit exceeded
- Graceful error handling - proceed if rate limit check fails
- Enhanced logging with user_id and current/max execution counts

## Technical Implementation

- **Database approach**: Query actual execution statuses for accuracy
- **RPC pattern**: Use DatabaseManager client following existing codebase patterns
- **Fail-safe design**: Proceed with execution if rate limit check fails
- **Requeue on limit**: Rejected executions are requeued for later processing
- **Early rejection**: Check rate limit before expensive cluster lock operations

## Rate Limiting Flow

1. Parse incoming graph execution request
2. Query database via RPC for user's current RUNNING/QUEUED execution count
3. Compare against configured limit (default: 50)
4. If limit exceeded: reject and requeue message
5. If within limit: proceed with normal execution flow

## Configuration Example

```env
MAX_CONCURRENT_GRAPH_EXECUTIONS_PER_USER=25  # Reduce to 25 for stricter limits
```

## Test plan

- [x] Basic functionality tested - settings load correctly, database function works
- [x] ExecutionManager imports and initializes without errors
- [x] Database manager exposes the new function through RPC
- [x] Code follows existing patterns and conventions
- [ ] Integration testing with actual rate limiting scenarios
- [ ] Performance testing to ensure minimal impact on execution pipeline

🤖 Generated with [Claude Code](https://claude.ai/code)

---------

Co-authored-by: Claude <noreply@anthropic.com>
|
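The fail-safe decision step in the flow this PR describes can be sketched as follows. `should_run` and its `count_active` callback are hypothetical stand-ins for the DatabaseManager RPC call, not the actual manager code:

```python
from typing import Callable


def should_run(
    count_active: Callable[[str], int],  # e.g. RPC returning RUNNING/QUEUED count
    user_id: str,
    limit: int = 50,  # default from max_concurrent_graph_executions_per_user
) -> bool:
    """Decide whether a new graph execution may proceed for this user."""
    try:
        active = count_active(user_id)
    except Exception:
        # Fail-safe design: proceed if the rate limit check itself fails.
        return True
    return active < limit
```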
||
|
|
dd84fb5c66 |
feat(platform): Add public share links for agent run results (#10938)
<!-- Clearly explain the need for these changes: -->

This PR adds the ability for users to share their agent run results publicly via shareable links. Users can generate a public link that allows anyone to view the outputs of a specific agent execution without requiring authentication. This feature enables users to share their agent results with clients, colleagues, or the community.

https://github.com/user-attachments/assets/5508f430-07d0-4cd3-87bc-301b0b005cce

### Changes 🏗️

#### Backend Changes
- **Database Schema**: Added share tracking fields to `AgentGraphExecution` model in Prisma schema:
  - `isShared`: Boolean flag to track if execution is shared
  - `shareToken`: Unique token for the share URL
  - `sharedAt`: Timestamp when sharing was enabled
- **API Endpoints**: Added three new REST endpoints in `/backend/backend/server/routers/v1.py`:
  - `POST /graphs/{graph_id}/executions/{graph_exec_id}/share`: Enable sharing for an execution
  - `DELETE /graphs/{graph_id}/executions/{graph_exec_id}/share`: Disable sharing
  - `GET /share/{share_token}`: Retrieve shared execution data (public endpoint)
- **Data Models**:
  - Created `SharedExecutionResponse` model for public-safe execution data
  - Added `ShareRequest` and `ShareResponse` Pydantic models for type-safe API responses
  - Updated `GraphExecutionMeta` to include share status fields
- **Security**:
  - All share management endpoints verify user ownership before allowing changes
  - Public endpoint only exposes OUTPUT block data, no intermediate execution details
  - Share tokens are UUIDs for security

#### Frontend Changes
- **ShareButton Component** (`/frontend/src/components/ShareButton.tsx`):
  - Modal dialog for managing share settings
  - Copy-to-clipboard functionality for share links
  - Clear warnings about public accessibility
  - Uses Orval-generated API hooks for enable/disable operations
- **Share Page** (`/frontend/src/app/(no-navbar)/share/[token]/page.tsx`):
  - Clean, navigation-free page for viewing shared executions
  - Reuses existing `RunOutputs` component for consistent output rendering
  - Proper error handling for invalid/disabled share links
  - Loading states during data fetch
- **API Integration**:
  - Fixed custom mutator to properly set Content-Type headers for POST requests with empty bodies
  - Generated TypeScript types via Orval for type-safe API calls

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:

<!-- Test plan: -->
- [x] Enable sharing for an agent execution and verify share link is generated
- [x] Copy share link and verify it copies to clipboard
- [x] Open share link in incognito/private browser and verify outputs are displayed
- [x] Disable sharing and verify share link returns 404
- [x] Try to enable/disable sharing for another user's execution (should fail with 404)
- [x] Verify share page shows proper loading and error states
- [x] Test that only OUTPUT blocks are shown in shared view, no intermediate data
|
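The share-token lifecycle this PR describes can be sketched as below. `ShareState` is an illustrative stand-in for the Prisma model fields (`isShared`, `shareToken`, `sharedAt`), using a random UUID as the unguessable token:

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ShareState:
    """Sketch of the share-tracking fields on AgentGraphExecution."""

    is_shared: bool = False
    share_token: Optional[str] = None
    shared_at: Optional[datetime] = None

    def enable(self) -> str:
        """Enable sharing: mint a fresh UUID token and record the timestamp."""
        self.is_shared = True
        self.share_token = str(uuid.uuid4())
        self.shared_at = datetime.now(timezone.utc)
        return self.share_token

    def disable(self) -> None:
        """Disable sharing: clear the token so old links stop resolving."""
        self.is_shared = False
        self.share_token = None
        self.shared_at = None
```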
||
|
|
27fccdbf31 |
fix(backend/executor): Make graph execution status transitions atomic and enforce state machine (#10863)
## Summary

- Fixed race condition issues in `update_graph_execution_stats` function
- Implemented atomic status transitions using database-level constraints
- Added state machine enforcement to prevent invalid status transitions
- Eliminated code duplication and improved error handling

## Problem

The `update_graph_execution_stats` function had race condition vulnerabilities where concurrent status updates could cause invalid transitions like RUNNING → QUEUED. The function was not durable and could result in executions moving backwards in their lifecycle, causing confusion and potential system inconsistencies.

## Root Cause Analysis

1. **Race Conditions**: The function used a broad OR clause that allowed updates from multiple source statuses without validating the specific transition
2. **No Atomicity**: No atomic check to ensure the status hadn't changed between read and write operations
3. **Missing State Machine**: No enforcement of valid state transitions according to execution lifecycle rules

## Solution Implementation

### 1. Atomic Status Transitions
- Use database-level atomicity by including the current allowed source statuses in the WHERE clause during updates
- This ensures only valid transitions can occur at the database level

### 2. State Machine Enforcement

Define valid transitions as a module constant `VALID_STATUS_TRANSITIONS`:
- `INCOMPLETE` → `QUEUED`, `RUNNING`, `FAILED`, `TERMINATED`
- `QUEUED` → `RUNNING`, `FAILED`, `TERMINATED`
- `RUNNING` → `COMPLETED`, `TERMINATED`, `FAILED`
- `TERMINATED` → `RUNNING` (for resuming halted execution)
- `COMPLETED` and `FAILED` are terminal states with no allowed transitions

### 3. Improved Error Handling
- Early validation with clear error messages for invalid parameters
- Graceful handling when transitions fail - return current state instead of None
- Proper logging of invalid transition attempts

### 4. Code Quality Improvements
- Eliminated code duplication in fetch logic
- Added proper type hints and casting
- Made status transitions constant for better maintainability

## Benefits

✅ **Prevents Invalid Regressions**: No more RUNNING → QUEUED transitions
✅ **Atomic Operations**: Database-level consistency guarantees
✅ **Clear Error Messages**: Better debugging and monitoring
✅ **Maintainable Code**: Clean logic flow without duplication
✅ **Race Condition Safe**: Handles concurrent updates gracefully

## Test Plan

- [x] Function imports and basic structure validation
- [x] Code formatting and linting checks pass
- [x] Type checking passes for modified files
- [x] Pre-commit hooks validation

## Technical Details

The key insight is using the database query itself to enforce valid transitions by filtering on allowed source statuses in the WHERE clause. This makes the operation truly atomic and eliminates the race condition window that existed in the previous implementation.

🤖 Generated with [Claude Code](https://claude.ai/code)

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
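The transition table this PR describes, expressed as code. In the real fix the allowed source statuses are enforced in the SQL WHERE clause for atomicity; this pure-Python check only illustrates the state machine itself, with statuses as plain strings:

```python
# Transitions taken from the PR's VALID_STATUS_TRANSITIONS description.
VALID_STATUS_TRANSITIONS = {
    "INCOMPLETE": {"QUEUED", "RUNNING", "FAILED", "TERMINATED"},
    "QUEUED": {"RUNNING", "FAILED", "TERMINATED"},
    "RUNNING": {"COMPLETED", "TERMINATED", "FAILED"},
    "TERMINATED": {"RUNNING"},  # resuming a halted execution
    "COMPLETED": set(),  # terminal state
    "FAILED": set(),  # terminal state
}


def is_valid_transition(src: str, dst: str) -> bool:
    """Check whether a status transition is allowed by the state machine."""
    return dst in VALID_STATUS_TRANSITIONS.get(src, set())
```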
||
|
|
e16e69ca55 |
feat(library, executor): Make "Run Again" work with credentials (#10821)
- Resolves [OPEN-2549: Make "Run again" work with credentials in `AgentRunDetailsView`](https://linear.app/autogpt/issue/OPEN-2549/make-run-again-work-with-credentials-in-agentrundetailsview) - Resolves #10237 ### Changes 🏗️ - feat(frontend/library): Make "Run Again" button work for runs with credentials - feat(backend/executor): Store passed-in credentials on `GraphExecution` ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - Go to `/library/agents/[id]` for an agent with credentials inputs - Run the agent manually - [x] -> runs successfully - [x] -> "Run again" shows among the action buttons on the newly created run - Click "Run again" - [x] -> runs successfully |
||
|
|
2bb8e91040 |
feat(backend): Add user timezone support to backend (#10707)
Co-authored-by: Swifty <craigswift13@gmail.com> resolve issue #10692 where scheduled time and actual run |
||
|
|
aa256f21cd |
feat(platform/library): Infinite scroll in Agent Runs list (#10709)
- Resolves #10645 ### Changes 🏗️ - Implement infinite scroll in the Agent Runs list (on `/library/agents/[id]`) - Add horizontal scroll support to `ScrollArea` and `InfiniteScroll` components - Fix `InfiniteScroll` triggering twice - Fix date handling by React Queries - Add response mutator to parse dates coming out of API - Make legacy `GraphExecutionMeta` compatible with generated type ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - Open `/library/agents/[id]` - [x] Agent runs list loads - Scroll agent runs list to the end - [x] More runs are loaded and appear in the list |
||
|
|
28d85ad61c |
feat(backend/AM): Integrate AutoMod content moderation (#10539)
Copy of [feat(backend/AM): Integrate AutoMod content moderation - By Bentlybro - PR #10490](https://github.com/Significant-Gravitas/AutoGPT/pull/10490) cos i messed it up 🤦

Adds AutoMod input and output moderation to the execution flow. Introduces a new AutoMod manager and models, updates settings for moderation configuration, and modifies execution result handling to support moderation-cleared data. Moderation failures now clear sensitive data and mark executions as failed.

<img width="921" height="816" alt="image" src="https://github.com/user-attachments/assets/65c0fee8-d652-42bc-9553-ff507bc067c5" />

### Changes 🏗️

I have made some small changes to ``autogpt_platform\backend\backend\executor\manager.py`` to send the needed info to the AutoMod system, which collects the data, combines it, and makes the API call to AM, and based on its reply lets it run or not! I also had to make small changes to ``autogpt_platform\backend\backend\data\execution.py`` to add checks that allow me to clear the content from the blocks if it was flagged.

I am working on finalizing the AM repo, then that will be public.

To note: we will want to set this up behind LaunchDarkly first for testing on the team before we roll it out any more.

### Checklist 📋

#### For code changes:

- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Set up and run the platform with ``automod_enabled`` set to False and it works normally
  - [x] Set up and run the platform with ``automod_enabled`` set to True, set the AM URL and API Key, and test it runs safe blocks normally
  - [x] Test AM with content that would trigger it to flag and watch it stop and clear all the blocks' outputs

Message @Bentlybro for the URL and an API key to AM for local testing!

## Changes made to Settings.py

I have added a few new options to settings.py for the AutoMod config:

```python
# AutoMod configuration
automod_enabled: bool = Field(
    default=False,
    description="Whether AutoMod content moderation is enabled",
)
automod_api_url: str = Field(
    default="",
    description="AutoMod API base URL - Make sure it ends in /api",
)
automod_timeout: int = Field(
    default=30,
    description="Timeout in seconds for AutoMod API requests",
)
automod_retry_attempts: int = Field(
    default=3,
    description="Number of retry attempts for AutoMod API requests",
)
automod_retry_delay: float = Field(
    default=1.0,
    description="Delay between retries for AutoMod API requests in seconds",
)
automod_fail_open: bool = Field(
    default=False,
    description="If True, allow execution to continue if AutoMod fails",
)
automod_moderate_inputs: bool = Field(
    default=True,
    description="Whether to moderate block inputs",
)
automod_moderate_outputs: bool = Field(
    default=True,
    description="Whether to moderate block outputs",
)
```

and

```python
automod_api_key: str = Field(default="", description="AutoMod API key")
```

---------

Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co> |
||
|
|
3fe88b6106 |
refactor(backend): Refactor log client and resource cleanup (#10558)
## Summary

- Created centralized service client helpers with thread caching in `util/clients.py`
- Refactored service client management to eliminate health checks and improve performance
- Enhanced logging in process cleanup to include error details
- Improved retry mechanisms and resource cleanup across the platform
- Updated multiple services to use new centralized client patterns

## Key Changes

### New Centralized Client Factory (`util/clients.py`)

- Added thread-cached factory functions for all major service clients:
  - Database managers (sync and async)
  - Scheduler client
  - Notification manager
  - Execution event bus (Redis-based)
  - RabbitMQ execution queue (sync and async)
  - Integration credentials store
- All clients use the `@thread_cached` decorator for performance optimization

### Service Client Improvements

- **Removed health checks**: Eliminated unnecessary health check calls from `get_service_client()` to reduce startup overhead
- **Enhanced retry support**: Database manager clients now use request retry by default
- **Better error handling**: Improved error propagation and logging

### Enhanced Logging and Cleanup

- **Process termination logs**: Added error details to termination messages in `util/process.py`
- **Retry mechanism updates**: Improved retry logic with better error handling in `util/retry.py`
- **Resource cleanup**: Better resource management across executors and monitoring services

### Updated Service Usage

- Refactored 21+ files to use new centralized client patterns
- Updated all executor, monitoring, and notification services
- Maintained backward compatibility while improving performance

## Files Changed

- **Created**: `backend/util/clients.py` - Centralized client factory with thread caching
- **Modified**: 21 files across blocks, executor, monitoring, and utility modules
- **Key areas**: Service client initialization, resource cleanup, retry mechanisms

## Test Plan

- [x] Verify all existing tests pass
- [x] Validate service startup and client initialization
- [x] Test resource cleanup on process termination
- [x] Confirm retry mechanisms work correctly
- [x] Validate thread caching performance improvements
- [x] Ensure no breaking changes to existing functionality

## Breaking Changes

None - all changes maintain backward compatibility.

## Additional Notes

This refactoring centralizes client management patterns that were scattered across the codebase, making them more consistent and performant through thread caching. The removal of health checks reduces startup time while maintaining reliability through improved retry mechanisms.

🤖 Generated with [Claude Code](https://claude.ai/code) |
||
|
|
69d873debc |
fix(backend): improve executor reliability and error handling (#10526)
This PR improves the reliability of the executor system by addressing several race conditions and improving error handling throughout the execution pipeline. ### Changes 🏗️ - **Consolidated exception handling**: Now using `BaseException` to properly catch all types of interruptions including `CancelledError` and `SystemExit` - **Atomic stats updates**: Moved node execution stats updates to be atomic with graph stats updates to prevent race conditions - **Improved cleanup handling**: Added proper timeout handling (3600s) for stuck executions during cleanup - **Fixed concurrent update race conditions**: Node execution updates are now properly synchronized with graph execution updates - **Better error propagation**: Improved error type preservation and status management throughout the execution chain - **Graph resumption support**: Added proper handling for resuming terminated and failed graph executions - **Removed deprecated methods**: Removed `update_node_execution_stats` in favor of atomic updates ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Execute a graph with multiple nodes and verify stats are updated correctly - [x] Cancel a running graph execution and verify proper cleanup - [x] Simulate node failures and verify error propagation - [x] Test graph resumption after termination/failure - [x] Verify no race conditions in concurrent node execution updates #### For configuration changes: - [x] `.env.example` is updated or already compatible with my changes - [x] `docker-compose.yml` is updated or already compatible with my changes - [x] I have included a list of my configuration changes in the PR description (under **Changes**) 🤖 Generated with [Claude Code](https://claude.ai/code) --------- Co-authored-by: Claude <noreply@anthropic.com> |
||
|
|
e632549175 |
feat(backend): Add AI-generated activity status for agent executions (#10487)
## Summary - Adds AI-generated activity status summaries for agent execution results - Provides users with conversational, non-technical summaries of what their agents accomplished - Includes comprehensive execution data analysis with honest failure reporting ## Changes Made - **Backend**: Added `ActivityStatusGenerator` module with async LLM integration - **Database**: Extended `GraphExecutionStats` and `Stats` models with `activity_status` field - **Frontend**: Added "Smart Agent Execution Summary" display with disclaimer tooltip - **Settings**: Added `execution_enable_ai_activity_status` toggle (disabled by default) - **Testing**: Comprehensive test suite with 12 test cases covering all scenarios ## Key Features - Collects execution data including graph structure, node relations, errors, and I/O samples - Generates user-friendly summaries from first-person perspective - Honest reporting of failures and invalid inputs (no sugar-coating) - Payload optimization for LLM context limits - Full async implementation with proper error handling ## Test Plan - [x] All existing tests pass - [x] New comprehensive test suite covers success/failure scenarios - [x] Feature toggle testing (enabled/disabled states) - [x] Frontend integration displays correctly - [x] Error handling and edge cases covered 🤖 Generated with [Claude Code](https://claude.ai/code) --------- Co-authored-by: Claude <noreply@anthropic.com> |
||
|
|
686d811062 |
feat(backend): Agent Executor reliability; make RPC to DB manager durable (#10516)
Some failure on DB RPC can cause agent execution failure. This change makes sure the error chance is minimized. ### Changes 🏗️ * Enable request retry * Increase transaction timeout * Use better typing on the DB query * Gracefully handles insufficient balance ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: <!-- Put your test plan here: --> - [x] Manual tests |
||
|
|
b71d0ec805 |
feat(backend): Ensure json column value serializable & remove excessive transaction and locking (#10441)
### Changes 🏗️ 1. JSON columns have to hold JSON-serializable values, but sometimes the data is not, so `SafeJson` is introduced to make sure that the data can be serialized to a string and deserialized back before being persisted into the database. 2. Locks & transactions were used in cases where they are not needed, which reduces database & Redis performance. ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: <!-- Put your test plan here: --> - [x] CI tests |
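The `SafeJson` idea above can be sketched in a few lines. This is an assumption-laden illustration (the real helper lives in the backend codebase and may handle more cases): round-trip the value through a JSON string, coercing non-serializable leaves such as datetimes to strings, so the column value is guaranteed to serialize and load back.

```python
import json
from datetime import datetime


def safe_json(value):
    """Coerce a value into something guaranteed to survive a JSON
    round-trip: non-serializable leaves fall back to str()."""
    return json.loads(json.dumps(value, default=str))


# A payload that plain json.dumps would reject without default=str:
example = safe_json({"when": datetime(2025, 1, 1), "n": 3})
```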
||
|
|
08b05621c1 |
feat(block;backend): Truncate execution update payload on large data & Improve ReadSpreadsheetBlock performance (#10395)
### Changes 🏗️ This PR introduces several key improvements to message handling, block functionality, and execution reliability: - **Renamed CSV block to Spreadsheet block** with enhanced CSV/Excel processing capabilities - **Added message size limiting and truncation** for Redis communication to prevent connection issues - **Optimized FileReadBlock** to yield content chunks instead of duplicated outputs for better performance - **Improved execution termination handling** with better timeout management and event publishing - **Enhanced continuous retry decorator** with async function support - **Implemented payload truncation** to prevent Redis connection issues from oversized messages ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Verified backend starts without errors - [x] Confirmed message truncation works for large payloads - [x] Tested spreadsheet block functionality with CSV and Excel files - [x] Validated execution termination improvements - [x] Checked FileReadBlock chunk processing #### For configuration changes: - [x] `.env.example` is updated or already compatible with my changes - [x] `docker-compose.yml` is updated or already compatible with my changes - [x] I have included a list of my configuration changes in the PR description (under **Changes**) 🤖 Generated with [Claude Code](https://claude.ai/code) --------- Co-authored-by: Claude <noreply@anthropic.com> |
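The payload-truncation idea above can be sketched as a size check before publishing. The function name, marker shape, and the tiny size limit here are assumptions for illustration; the real threshold is a platform setting: if the serialized execution update would exceed the message-size limit, publish a small truncation marker instead of an oversized message that could break the Redis connection.

```python
import json

MAX_MESSAGE_BYTES = 64  # illustrative limit, far smaller than any real one


def truncate_execution_update(payload: dict) -> dict:
    """Return the payload unchanged if it fits within the limit,
    otherwise a small marker recording the original size."""
    size = len(json.dumps(payload).encode())
    if size <= MAX_MESSAGE_BYTES:
        return payload
    return {"truncated": True, "original_size_bytes": size}
```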
||
|
|
309114a727 | Merge commit from fork | ||
|
|
4ffb99bfb0 |
feat(backend): Add block error rate monitoring and Discord alerts (#10332)
## Summary

This PR adds a simple block error rate monitoring system that runs every 24 hours (configurable) and sends Discord alerts when blocks exceed the error rate threshold.

## Changes Made

**Modified Files:**

- `backend/executor/scheduler.py` - Added `report_block_error_rates` function and scheduled job
- `backend/util/settings.py` - Added configuration options
- `backend/.env.example` - Added environment variable examples
- Refactored scheduled job logic in scheduler.py into separate files

## Configuration

```bash
# Block Error Rate Monitoring
BLOCK_ERROR_RATE_THRESHOLD=0.5  # 50% error rate threshold
BLOCK_ERROR_RATE_CHECK_INTERVAL_SECS=86400  # 24 hours
```

## How It Works

1. **Scheduled Job**: Runs every 24 hours (configurable via `BLOCK_ERROR_RATE_CHECK_INTERVAL_SECS`)
2. **Error Rate Calculation**: Queries the last 24 hours of node executions and calculates error rates per block
3. **Threshold Check**: Alerts on blocks with ≥50% error rate (configurable via `BLOCK_ERROR_RATE_THRESHOLD`)
4. **Discord Alert**: Sends alert to Discord using the existing `discord_system_alert` function
5. **Manual Execution**: Available via the `execute_report_block_error_rates()` scheduler client method

## Alert Format

```
Block Error Rate Alert:
🚨 Block 'DeprecatedGPT3Block' has 75.0% error rate (75/100) in the last 24 hours
🚨 Block 'BrokenImageBlock' has 60.0% error rate (30/50) in the last 24 hours
```

## Testing

Can be tested manually via:

```python
from backend.executor.scheduler import SchedulerClient

client = SchedulerClient()
result = client.execute_report_block_error_rates()
```

## Implementation Notes

- Follows the same pattern as the `report_late_executions` function
- Only checks blocks with ≥10 executions to avoid noise
- Uses existing Discord notification infrastructure
- Configurable threshold and check interval
- Proper error handling and logging

## Test plan

- [x] Verify configuration loads correctly
- [x] Test error rate calculation with existing database
- [x] Confirm Discord integration works
- [x] Test manual execution via scheduler client
- [x] Verify scheduled job runs correctly

🤖 Generated with [Claude Code](https://claude.ai/code)

---------

Co-authored-by: Claude AI <claude@anthropic.com>
Co-authored-by: Claude <noreply@anthropic.com> |
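The threshold check described above can be sketched as a pure function over per-block counts. The function name and the input shape are assumptions for illustration; the ≥10-execution floor, the 50% threshold, and the alert wording follow the PR description.

```python
def blocks_over_threshold(stats, threshold=0.5, min_executions=10):
    """Given per-block (errors, total) counts for the last 24 h, return
    alert lines for blocks at or above the error-rate threshold."""
    alerts = []
    for block, (errors, total) in stats.items():
        if total < min_executions:
            continue  # skip low-volume blocks to avoid noise
        rate = errors / total
        if rate >= threshold:
            alerts.append(
                f"🚨 Block '{block}' has {rate:.1%} error rate "
                f"({errors}/{total}) in the last 24 hours"
            )
    return alerts
```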
||
|
|
095199bfa6 |
feat(backend): implement KV data storage blocks (#10294)
This PR introduces key-value storage blocks. ### Changes 🏗️ - **Database Schema**: Add AgentNodeExecutionKeyValueData table with composite primary key (userId, key) - **Persistence Blocks**: Create PersistInformationBlock and RetrieveInformationBlock in persistence.py - **Scope-based Storage**: Support for within_agent (per agent) vs across_agents (global user) persistence - **Key Structure**: Use formal # delimiter for storage keys: `agent#{graph_id}#{key}` and `global#{key}` ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Run all 244 block tests - all passing ✅ - [x] Test PersistInformationBlock with mock data storage - [x] Test RetrieveInformationBlock with mock data retrieval - [x] Verify scope-based key generation (within_agent vs across_agents) - [x] Verify database function integration through all manager classes - [x] Run lint and type checking - all passing ✅ - [x] Verify database migration is included and valid #### For configuration changes: - [x] `.env.example` is updated or already compatible with my changes - [x] `docker-compose.yml` is updated or already compatible with my changes - [x] I have included a list of my configuration changes in the PR description (under **Changes**) Note: This change adds database schema and new blocks but doesn't require environment or docker-compose changes as it uses existing database infrastructure. --------- Co-authored-by: Claude <noreply@anthropic.com> |
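The key structure described above (`agent#{graph_id}#{key}` for per-agent scope, `global#{key}` for cross-agent scope) can be sketched directly; the helper name and the scope strings `within_agent` / `across_agents` follow the PR text, though the real implementation may differ.

```python
def make_storage_key(scope: str, key: str, graph_id: str = "") -> str:
    """Build a KV storage key using the '#' delimiter described above."""
    if scope == "within_agent":
        return f"agent#{graph_id}#{key}"  # per-agent persistence
    if scope == "across_agents":
        return f"global#{key}"  # global per-user persistence
    raise ValueError(f"unknown scope: {scope}")
```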
||
|
|
f1cc2afbda |
feat(backend): improve stop graph execution reliability (#10293)
## Summary - Enhanced graph execution cancellation and cleanup mechanisms - Improved error handling and logging for graph execution lifecycle - Added timeout handling for graph termination with proper status updates - Exposed a new API for stopping graph based on only graph_id or user_id - Refactored logging metadata structure for better error tracking ## Key Changes ### Backend - **Graph Execution Management**: Enhanced `stop_graph_execution` with timeout-based waiting and proper status transitions - **Execution Cleanup**: Added proper cancellation waiting with timeout handling in executor manager - **Logging Improvements**: Centralized `LogMetadata` class and improved error logging consistency - **API Enhancements**: Added bulk graph execution stopping functionality - **Error Handling**: Better exception handling and status management for failed/cancelled executions ### Frontend - **Status Safety**: Added null safety checks for status chips to prevent runtime errors - **Execution Control**: Simplified stop execution request handling ## Test Plan - [x] Verify graph execution can be properly stopped and reaches terminal state - [x] Test timeout scenarios for stuck executions - [x] Validate proper cleanup of running node executions when graph is cancelled - [x] Check frontend status chips handle undefined statuses gracefully - [x] Test bulk execution stopping functionality 🤖 Generated with [Claude Code](https://claude.ai/code) --------- Co-authored-by: Claude <noreply@anthropic.com> |
||
|
|
198b3d9f45 |
fix(backend): Avoid swallowing exception on graph execution failure (#10260)
Graph execution that fails due to interruption or an unknown error should be enqueued back to the queue. However, swallowing the error meant the execution was never marked as a failure. ### Changes 🏗️ * Avoid repeatedly updating the graph execution status on each node execution step. * Added a guard rail to avoid completing a graph execution with a non-completed execution status. * Avoid acknowledging messages from the queue if the graph execution is not yet completed. ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: <!-- Put your test plan here: --> - [x] Run graph execution, kill the process, re-run the process --------- Co-authored-by: Swifty <craigswift13@gmail.com> |
||
|
|
2f12e8d731 |
feat(platform): Add Host-scoped credentials support for blocks HTTP requests (#10215)
Currently, we don't have a secure way to pass Authorization headers when calling the `SendWebRequestBlock`. This hinders the integration of third-party applications that do not yet have native block support. ### Changes 🏗️ Add Host-scoped credentials support for the newly introduced SendAuthenticatedWebRequestBlock. <img width="1000" alt="image" src="https://github.com/user-attachments/assets/0d3d577a-2b9b-4f0f-9377-0e00a069ba37" /> <img width="1000" alt="image" src="https://github.com/user-attachments/assets/a59b9f16-c89c-453d-a628-1df0dfd60fb5" /> ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: <!-- Put your test plan here: --> - [x] Uses `https://api.openai.com/v1/images/edits` through SendWebRequestBlock by passing the api-key through host-scoped credentials. |
||
|
|
443995d79a | fix(backend): Fix broken executor due to missing nodes_input_masks on GraphExecutionEntry | ||
|
|
efa4b6d2a0 |
feat(platform/library): Triggered-agent support (#10167)
This pull request adds support for setting up (webhook-)triggered agents in the Library. It contains changes throughout the entire stack to make everything work in the various phases of a triggered agent's lifecycle: setup, execution, updates, deletion. Setting up agents with webhook triggers was previously only possible in the Builder, limiting their use to the agent's creator only. To make it work in the Library, this change uses the previously introduced `AgentPreset` to store information on, instead of on the graph's nodes to which only a graph's creator has access. - Initial ticket: #10111 - Builds on #9786   ### Changes 🏗️ Frontend: - Amend the Library's `AgentRunDraftView` to handle creating and editing Presets - Add `hideIfSingleCredentialAvailable` parameter to `CredentialsInput` - Add multi-select support to `TypeBasedInput` - Add Presets section to `AgentRunsSelectorList` - Amend `AgentRunSummaryCard` for use for Presets - Add `AgentStatusChip` to display general agent status (for now: Active / Inactive / Error) - Add Preset loading logic and create/update/delete handlers logic to `AgentRunsPage` - Rename `IconClose` to `IconCross` API: - Add `LibraryAgent` properties `has_external_trigger`, `trigger_setup_info`, `credentials_input_schema` - Add `POST /library/agents/{library_agent_id}/setup_trigger` endpoint - Remove redundant parameters from `POST /library/presets/{preset_id}/execute` endpoint Backend: - Add `POST /library/agents/{library_agent_id}/setup_trigger` endpoint - Extract non-node-related logic from `on_node_activate` into `setup_webhook_for_block` - Add webhook-related logic to `update_preset` and `delete_preset` endpoints - Amend webhook infrastructure to work with AgentPresets - Add preset trigger support to webhook ingress endpoint - Amend executor stack to work with passed-in node input (`nodes_input_masks`, generalized from `node_credentials_input_map`) - Amend graph validation to work with passed-in node input - Add 
`AgentPreset`->`IntegrationWebhook` relation - Add `WebhookWithRelations` model - Change behavior of `BaseWebhooksManager.get_manual_webhook(..)` to avoid unnecessary changes of the webhook URL: ignore `events` to find matching webhook, and update `events` if necessary. - Fix & improve `AgentPreset` API, models, and DB logic - Add `isDeleted` filter to get/list queries - Add `user_id` attribute to `LibraryAgentPreset` model - Add separate `credentials` property to `LibraryAgentPreset` model - Fix `library_db.update_preset(..)` replacement of existing `InputPresets` - Make `library_db.update_preset(..)` more usage-friendly with separate parameters for updateable properties - Add `user_id` checks to various DB functions - Fix error handling in various endpoints - Fix cache race condition on `load_webhook_managers()` ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - Test existing functionality - [x] Auto-setup and -teardown of webhooks on save in the builder still works - [x] Running an agent normally from the Library still works - Test new functionality - [x] Setting up a trigger in the Library - [x] Updating a trigger in the Library - [x] Disabling and re-enabling a trigger in the Library - [x] Deleting a trigger in the Library - [x] Triggers set up in the Library result in a new run when the webhook receives a payload |
||
|
|
3df6dcd26b |
feat(blocks): Improve SmartDecisionBlock & AIStructuredResponseGeneratorBlock (#10198)
Main issues: * `AIStructuredResponseGeneratorBlock` is not able to produce a list of objects. * `SmartDecisionBlock` is not able to call tools with some optional inputs. ### Changes 🏗️ * Allow persisting `null` / `None` value as execution output. * Provide `multiple_tool_calls` option for `SmartDecisionBlock`. * Provide `list_result` option for `AIStructuredResponseGeneratorBlock` ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: <!-- Put your test plan here: --> - [x] Run `SmartDecisionBlock` & `AIStructuredResponseGeneratorBlock` |
||
|
|
97e72cb485 |
feat(backend): Make execution engine async-first (#10138)
This change introduces async execution for blocks and the execution engine. Parallelism will be achieved through single-process asynchronous execution instead of process concurrency. ### Changes 🏗️ * Support async execution for the graph executor * Removed process creation for node execution * Updated all blocks to support async execution ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: <!-- Put your test plan here: --> - [x] Manual graph executions, tested many of the impacted blocks. |
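The async-first idea above can be sketched in a few lines, assuming nothing about the real executor's API: node executions become coroutines scheduled on one event loop, so parallelism comes from a single process rather than one process per node.

```python
import asyncio


async def run_block(name: str) -> str:
    """Stand-in for a block's async execute(): real blocks await I/O here."""
    await asyncio.sleep(0.01)
    return f"{name}: done"


async def run_graph(node_names):
    """Run all node coroutines concurrently on one event loop."""
    return await asyncio.gather(*(run_block(n) for n in node_names))


results = asyncio.run(run_graph(["fetch", "summarize"]))
```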
||
|
|
210d457ecd |
feat(executor): Improve execution ordering to allow depth-first execution (#10142)
Allowing depth-first execution will unlock faster processing latency and a better sense of progress. <img width="950" alt="image" src="https://github.com/user-attachments/assets/e2a0e11a-8bc5-4a65-a10d-b5b6c6383354" /> ### Changes 🏗️ * Prioritize adding a new execution over processing execution output * Make sure to enqueue each node once when processing output instead of draining a single node and move on. ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: <!-- Put your test plan here: --> - [x] Run company follower count finder agent. --------- Co-authored-by: Swifty <craigswift13@gmail.com> |
||
|
|
2647417e9f |
feat(executor;frontend): Move output processing step from node executor to graph executor & simplify input beads calculation (#10066)
**Goal**: Allow parallel runs within a single node. Currently, we prevent this to avoid unexpected ordering of the execution. ### Changes 🏗️ #### Executor changes We decoupled the node execution output processing, which is responsible for deciding the next executions, from the node executor code. Currently, `execute_node` does two big things: * Runs the block's execute(...) (which yields outputs). * Immediately enqueues the next nodes based on those outputs. This PR makes: * execute_node(node_exec) -> stream of (output_name, data). That purely runs the block and yields each output as soon as it's available. * Move _enqueue_next_nodes into the graph executor. So the next execution is handled serially by the graph executor to avoid concurrency issues. #### UI changes The change to the executor also fixes the behavior of the execution update to the UI. We will report the execution output to the UI as soon as it is available, not when the node execution is fully completed. This, however, broke the bead calculation logic that assumes each execution update will never overlap. So the change in this PR makes the bead calculation take the overlapping / duplicated execution updates into account, and simplifies the overall calculation logic. ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: <!-- Put your test plan here: --> - [x] Execute this agent and observe its concurrency ordering <img width="1424" alt="image" src="https://github.com/user-attachments/assets/0fe8259f-9091-4ecc-b824-ce8e8819c2d2" /> |
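The executor split described above can be sketched with plain generators (all names here are illustrative stand-ins, not the real executor's API): `execute_node` purely yields `(output_name, data)` pairs as they are produced, and only the graph executor, consuming the stream serially, decides what runs next.

```python
def execute_node(block_fn, inputs):
    """Purely run the block and stream each (output_name, data) pair;
    no downstream enqueueing happens here anymore."""
    yield from block_fn(inputs)


def sample_block(inputs):
    # A toy block with two outputs, yielded as they become available.
    yield "result", inputs["x"] * 2
    yield "status", "ok"


def graph_executor(inputs):
    """Consume the stream serially; this is the single place that would
    decide the next executions (enqueueing omitted in this sketch)."""
    next_executions = []
    for output_name, data in execute_node(sample_block, inputs):
        next_executions.append((output_name, data))
    return next_executions
```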
||
|
|
b244726b20 |
fix(backend): Include webhook block executions in GraphExecution queries (#9984)
- Resolves #9752 - Follow-up fix to #9940 ### Changes 🏗️ - `GRAPH_EXECUTION_INCLUDE` -> `graph_execution_include(include_block_ids)` - Add `get_io_block_ids()` and `get_webhook_block_ids()` to `backend.data.blocks` ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [ ] I have tested my changes according to the test plan: - [ ] Payload for webhook-triggered runs is shown on `/library/agents/[id]` |
||
|
|
17e973a8cb |
fix(platform): Optimise Query Indexes (#10038)
# Query Optimization for AgentNodeExecution Tables
## Overview
This PR describes the database index optimizations applied to improve
the performance of slow queries in the AutoGPT platform backend.
## Problem Analysis
The following queries were identified as consuming significant database
resources:
### 1. Complex Filtering Query (19.3% of total time)
```sql
SELECT ... FROM "AgentNodeExecution"
WHERE "agentNodeId" = $1
AND "agentGraphExecutionId" = $2
AND "executionStatus" = $3
AND "id" NOT IN (
SELECT "referencedByInputExecId"
FROM "AgentNodeExecutionInputOutput"
WHERE "name" = $4 AND "referencedByInputExecId" IS NOT NULL
)
ORDER BY "addedTime" ASC
```
### 2. Multi-table JOIN Query (8.9% of total time)
```sql
SELECT ... FROM "AgentNodeExecution"
LEFT JOIN "AgentNode" ON ...
LEFT JOIN "AgentBlock" ON ...
WHERE "AgentBlock"."id" IN (...)
AND "executionStatus" != $11
AND "agentGraphExecutionId" IN (...)
ORDER BY "queuedTime" DESC, "addedTime" DESC
```
### 3. Bulk Graph Execution Queries (multiple variations, ~10% combined)
Multiple queries filtering by `agentGraphExecutionId` with various
ordering requirements.
## Optimization Strategy
### 1. Composite Indexes for AgentNodeExecution
Added the following composite indexes to the `AgentNodeExecution` model:
```prisma
@@index([agentGraphExecutionId, agentNodeId, executionStatus])
@@index([addedTime, queuedTime])
```
#### Benefits:
- **Index 1** (`[agentGraphExecutionId, agentNodeId, executionStatus]`):
Covers the exact WHERE clause of the complex filtering query, allowing
index-only scans, and reduces the filtering cost for queries that filter
by graph execution and status
- **Index 2** (`[addedTime, queuedTime]`): Optimizes ORDER BY operations
on the time fields when filtering by graph execution
### 2. Composite Index for AgentNodeExecutionInputOutput
Added a composite index (shown below alongside the model's existing constraints):
```prisma
// Input and Output pin names are unique for each AgentNodeExecution.
@@unique([referencedByInputExecId, referencedByOutputExecId, name])
@@index([referencedByOutputExecId])
// Composite index for `upsert_execution_input`.
@@index([name, time])
```
#### Benefits:
- Dramatically improves the NOT IN subquery performance in Query 1
- Allows the database to use an index scan instead of a full table scan
- Reduces the subquery execution time from O(n) to O(log n)
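As a sanity check of the strategy, the planner behavior can be reproduced in a miniature SQLite model (schema heavily simplified, column names from the PR, index names `idx_exec`/`idx_io` purely illustrative; Postgres is the production database, but index selection for these equality filters is comparable):

```python
import sqlite3

# Miniature in-memory model of the two tables with the proposed indexes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE AgentNodeExecution (
    id TEXT PRIMARY KEY,
    agentNodeId TEXT,
    agentGraphExecutionId TEXT,
    executionStatus TEXT,
    addedTime REAL
);
CREATE TABLE AgentNodeExecutionInputOutput (
    name TEXT,
    time REAL,
    referencedByInputExecId TEXT
);
CREATE INDEX idx_exec ON AgentNodeExecution
    (agentGraphExecutionId, agentNodeId, executionStatus);
CREATE INDEX idx_io ON AgentNodeExecutionInputOutput (name, time);
""")

# Ask the planner how it would run the problematic Query 1.
plan = conn.execute("""
EXPLAIN QUERY PLAN
SELECT id FROM AgentNodeExecution
WHERE agentNodeId = ? AND agentGraphExecutionId = ? AND executionStatus = ?
  AND id NOT IN (
    SELECT referencedByInputExecId FROM AgentNodeExecutionInputOutput
    WHERE name = ? AND referencedByInputExecId IS NOT NULL
  )
ORDER BY addedTime ASC
""", ("n1", "g1", "QUEUED", "input")).fetchall()

for row in plan:
    print(row[3])  # the detail column of each plan step
```

With the indexes in place, both the outer filter and the `NOT IN` subquery show up as index searches rather than full table scans.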
## Expected Performance Improvements
1. **Query 1 (19.3% of total time)**:
- Expected improvement: 80-90% reduction in execution time
- The composite index on `[agentGraphExecutionId, agentNodeId,
executionStatus]` will allow index-only scans
- The subquery will benefit from the new index on
`AgentNodeExecutionInputOutput`
2. **Query 2 (8.9% of total time)**:
- Expected improvement: 50-70% reduction in execution time
- The leading `agentGraphExecutionId` column of the new composite index will
reduce the initial filtering cost
3. **Bulk Queries (10% combined)**:
- Expected improvement: 60-80% reduction in execution time
- Composite indexes including time fields will optimize sorting
operations
## Migration Considerations
1. **Index Creation Time**: Creating these indexes on existing large
tables may take time
2. **Storage Impact**: Each index requires additional storage space
3. **Write Performance**: Slight decrease in INSERT/UPDATE performance
due to index maintenance
## Additional Optimizations
### NotificationEvent Table Index
Added index for notification batch queries:
```prisma
@@index([userNotificationBatchId])
```
This optimizes the query:
```sql
SELECT ... FROM "NotificationEvent"
WHERE "userNotificationBatchId" IN (...)
```
#### Benefits:
- Eliminates full table scans when filtering by batch ID
- Improves query performance from O(n) to O(log n)
- Particularly beneficial for users with many notification events
## Future Optimizations
Consider these additional optimizations if needed:
1. Partitioning `AgentNodeExecution` table by `createdAt` or
`agentGraphExecutionId`
2. Implementing materialized views for frequently accessed aggregate
data
3. Adding covering indexes for specific query patterns
4. Implementing query result caching at the application level
---------
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
|
||
|
|
ac532ca4b9 |
fix(backend): Graph execution update on terminate (#9952)
Resolves #9947

### Changes 🏗️

Backend:
- Send a graph execution update after terminating a run
- Don't wipe the graph execution stats when they are not passed in to `update_graph_execution_stats`

Frontend:
- Don't hide the output of stopped runs

### Checklist 📋

#### For code changes:

- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - Go to `/library/agents/[id]`
  - Run an agent that takes a while (long enough to click stop and see the effect)
  - Hit stop after it has executed a few nodes
    - [x] -> run status should change to "Stopped"
    - [x] -> run stats (steps, duration, cost) should stay the same or increase only one last time
    - [x] -> output so far should be visible
    - [x] -> shown information should stay the same after refreshing the page

---

Co-authored-by: Krzysztof Czerwinski <34861343+kcze@users.noreply.github.com> |
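The "don't wipe stats" behavior can be sketched as follows (only the name `update_graph_execution_stats` comes from the PR; the merge-dict shape is a hypothetical illustration of the intended semantics):

```python
from typing import Optional

# Hypothetical sketch: only overwrite stats fields that are actually provided,
# so a terminate-triggered update without stats doesn't wipe existing values.

def update_graph_execution_stats(existing: dict, stats: Optional[dict] = None) -> dict:
    if stats is None:
        return existing  # nothing passed in: keep the stats already recorded
    # Merge rather than replace, so omitted fields survive the update
    return {**existing, **stats}

before = {"steps": 7, "duration": 12.5, "cost": 3}
print(update_graph_execution_stats(before))                # stats untouched
print(update_graph_execution_stats(before, {"steps": 8}))  # only steps changes
```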
||
|
|
8de88395f1 |
fix(backend): Continue stats accounting on aborted or broken executions (#9921)
This is a follow-up to https://github.com/Significant-Gravitas/AutoGPT/pull/9903

Previously, a continued graph execution restarted all execution stats from zero, making the reported stats misleading.

### Changes 🏗️

Continue the execution stats when continuing the graph execution.

### Checklist 📋

#### For code changes:

- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Existing tests, plus a manual graph run with the graph execution aborted midway. |
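The continuation behavior can be sketched like this (function and field names are hypothetical; the point is that counters are seeded from the persisted stats instead of zero):

```python
# Hypothetical sketch: when a graph execution is continued, the new executor
# starts its counters from the previously persisted stats, not from zero.

def continue_stats(persisted: dict) -> dict:
    # Seed the in-memory counters with what was already accounted for
    return dict(persisted)

def record_node_execution(stats: dict, cost: float) -> None:
    stats["node_count"] = stats.get("node_count", 0) + 1
    stats["cost"] = stats.get("cost", 0.0) + cost

stats = continue_stats({"node_count": 5, "cost": 0.12})
record_node_execution(stats, 0.03)
print(stats)  # counters continue from 5 nodes / 0.12, not from zero
```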
||
|
|
0726a00fb7 |
fix(backend): Include sub-graphs in graph-level credentials support (#9862)
The Library Agent credentials UX (#9789) currently doesn't work for sub-graphs.

### Changes 🏗️

- Include sub-graphs when generating `Graph.credentials_input_schema`
- Propagate `node_credentials_input_map` into `AgentExecutionBlock` executions
- Fix: also apply `node_credentials_input_map` in `_enqueue_next_nodes`

### Checklist 📋

#### For code changes:

- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - Import a graph with sub-graphs that need credentials
  - Run this agent from the Library
    - [x] -> Should work |
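Walking sub-graphs when building the credentials schema can be sketched as a simple recursion (the `Graph` class and field names below are illustrative stand-ins, not the actual backend models; only `credentials_input_schema` is named in the PR):

```python
from dataclasses import dataclass, field

# Hypothetical miniature of a graph with nested sub-graphs.
@dataclass
class Graph:
    credentials_fields: dict = field(default_factory=dict)
    sub_graphs: list = field(default_factory=list)

def credentials_input_schema(graph: Graph) -> dict:
    schema = dict(graph.credentials_fields)
    for sub in graph.sub_graphs:
        # Recurse so credentials needed by nested agents surface at the top level
        schema.update(credentials_input_schema(sub))
    return schema

inner = Graph({"github": "api_key"})
outer = Graph({"openai": "api_key"}, [inner])
print(credentials_input_schema(outer))
```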
||
|
|
ac8ef9bdb2 |
feat(backend): Introduce late execution check scheduled job (#9914)
Introduce a late execution check scheduled job. The late threshold duration is configurable. This initial version only reports the error to Sentry.

### Changes 🏗️

* Added a late execution check scheduled job
* Moved the registration of the weekly notification processing job out of the API call; the scheduler service now calls it directly

### Checklist 📋

#### For code changes:

- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Manual firing of the scheduled job through an exposed API |
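The core of such a check can be sketched as a filter over executions older than a configurable threshold (the threshold value, function name, and record shape below are all assumptions for illustration):

```python
from datetime import datetime, timedelta, timezone

# Assumed config knob: how long an execution may sit before being "late".
LATE_THRESHOLD = timedelta(minutes=15)

def find_late_executions(executions: list, now: datetime) -> list:
    # Flag runs still in a non-terminal state past the threshold;
    # the real job would report these to Sentry.
    return [
        e for e in executions
        if e["status"] in ("QUEUED", "RUNNING")
        and now - e["started_at"] > LATE_THRESHOLD
    ]

now = datetime.now(timezone.utc)
runs = [
    {"id": "a", "status": "QUEUED", "started_at": now - timedelta(hours=1)},
    {"id": "b", "status": "RUNNING", "started_at": now - timedelta(minutes=1)},
]
late = find_late_executions(runs, now)
print([e["id"] for e in late])  # -> ['a']
```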
||
|
|
d7077b5161 |
feat(backend): Continue instead of retrying aborted/broken agent execution (#9903)
Currently, the agent/graph execution engine consumes the execution queue and acknowledges each message only after fully completing or failing its execution. However, if the agent executor fails due to a hardware/resource issue, or does not manage to acknowledge the execution message, another agent executor will pick it up and start the execution again from the beginning. The scope of this PR is to make the next executor continue the pre-existing execution instead of starting it all over from the beginning.

### Changes 🏗️

* Removed `start_node_execs` from `GraphExecutionEntry`
  * Populate the starting graph nodes from the DB query instead (fetching Running & Queued node executions)
* Removed `get_incomplete_node_executions` from the DB manager
  * Use `get_node_executions` with a status filter instead
* Allow graph execution to end in a non-FAILED/COMPLETED status: e.g., when the executor is interrupted, the execution stays in the running status and other executors can continue the task

### Checklist 📋

#### For code changes:

- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Run an agent, stop the executor midway, re-run the executor; the execution should be continued instead of restarted. |
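The resume logic described above can be sketched as deriving the work list from persisted node executions rather than from the queue message (function name and record shape are hypothetical; the RUNNING/QUEUED statuses come from the PR):

```python
# Hypothetical sketch: instead of carrying start_node_execs in the queue
# message, the executor re-derives its starting points from persisted node
# executions that are still in a non-terminal state.

def get_resume_points(node_executions: list) -> list:
    # Anything left RUNNING or QUEUED is picked up and continued
    return [ne for ne in node_executions if ne["status"] in ("RUNNING", "QUEUED")]

persisted = [
    {"id": "n1", "status": "COMPLETED"},
    {"id": "n2", "status": "RUNNING"},   # interrupted mid-flight
    {"id": "n3", "status": "QUEUED"},
]
print([ne["id"] for ne in get_resume_points(persisted)])  # -> ['n2', 'n3']
```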
||
|
|
9fa62c03f6 |
feat(backend): Improve cancel execution reliability (#9889)
When an executor dies, an ongoing execution is not retried and simply gets stuck in the running status. This change avoids such a scenario by allowing execution of an entry that is not in QUEUED status, at the low-probability risk of double execution.

### Changes 🏗️

* Allow non-QUEUED-status entries to be re-executed
* Improve cleanup of the node & graph executor
* Consume cancellation requests on a separate thread to avoid being blocked by other messages
* Remove the unused retry loop in the execution manager

### Checklist 📋

#### For code changes:

- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Run an agent, kill the server, re-run it; the agent execution is restarted. |
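The dedicated cancellation-consumer thread can be sketched like this (queue names, the `cancelled` set, and timings are illustrative; the PR only states that cancellation consumption moves to its own thread):

```python
import queue
import threading
import time

# Hypothetical sketch: cancellation requests get their own consumer thread,
# so they are never blocked behind regular execution messages.
cancel_requests = queue.Queue()
cancelled = set()

def cancel_consumer(stop: threading.Event) -> None:
    while not stop.is_set():
        try:
            exec_id = cancel_requests.get(timeout=0.1)
        except queue.Empty:
            continue
        cancelled.add(exec_id)  # mark so the executor loop can abort this run

stop = threading.Event()
t = threading.Thread(target=cancel_consumer, args=(stop,), daemon=True)
t.start()

cancel_requests.put("graph-exec-123")
time.sleep(0.3)  # give the consumer a moment to pick it up
stop.set()
t.join()
print(cancelled)  # -> {'graph-exec-123'}
```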