Commit Graph

232 Commits

Author SHA1 Message Date
Nicholas Tindle
7668c17d9c feat(platform): add User Workspace for persistent CoPilot file storage (#11867)
Implements persistent User Workspace storage for CoPilot, enabling
blocks to save and retrieve files across sessions. Files are stored in
session-scoped virtual paths (`/sessions/{session_id}/`).

Fixes SECRT-1833

### Changes 🏗️

**Database & Storage:**
- Add `UserWorkspace` and `UserWorkspaceFile` Prisma models
- Implement `WorkspaceStorageBackend` abstraction (GCS for cloud, local
filesystem for self-hosted)
- Add `workspace_id` and `session_id` fields to `ExecutionContext`

**Backend API:**
- Add REST endpoints: `GET/POST /api/workspace/files`, `GET/DELETE
/api/workspace/files/{id}`, `GET /api/workspace/files/{id}/download`
- Add CoPilot tools: `list_workspace_files`, `read_workspace_file`,
`write_workspace_file`
- Integrate workspace storage into `store_media_file()` - returns
`workspace://file-id` references

**Block Updates:**
- Refactor all file-handling blocks to use unified `ExecutionContext`
parameter
- Update media-generating blocks to persist outputs to workspace
(AIImageGenerator, AIImageCustomizer, FluxKontext, TalkingHead, FAL
video, Bannerbear, etc.)

**Frontend:**
- Render `workspace://` image references in chat via proxy endpoint
- Add "AI cannot see this image" overlay indicator

**CoPilot Context Mapping:**
- Session = Agent (graph_id) = Run (graph_exec_id)
- Files scoped to `/sessions/{session_id}/`

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [ ] I have tested my changes according to the test plan:
- [ ] Create CoPilot session, generate image with AIImageGeneratorBlock
  - [ ] Verify image returns `workspace://file-id` (not base64)
  - [ ] Verify image renders in chat with visibility indicator
  - [ ] Verify workspace files persist across sessions
  - [ ] Test list/read/write workspace files via CoPilot tools
  - [ ] Test local storage backend for self-hosted deployments

#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)

🤖 Generated with [Claude Code](https://claude.ai/code)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Introduces a new persistent file-storage surface area (DB tables,
storage backends, download API, and chat tools) and rewires
`store_media_file()`/block execution context across many blocks, so
regressions could impact file handling, access control, or storage
costs.
> 
> **Overview**
> Adds a **persistent per-user Workspace** (new
`UserWorkspace`/`UserWorkspaceFile` models plus `WorkspaceManager` +
`WorkspaceStorageBackend` with GCS/local implementations) and wires it
into the API via a new `/api/workspace/files/{file_id}/download` route
(including header-sanitized `Content-Disposition`) and shutdown
lifecycle hooks.
> 
> Extends `ExecutionContext` to carry execution identity +
`workspace_id`/`session_id`, updates executor tooling to clone
node-specific contexts, and updates `run_block` (CoPilot) to create a
session-scoped workspace and synthetic graph/run/node IDs.
> 
> Refactors `store_media_file()` to require `execution_context` +
`return_format` and to support `workspace://` references; migrates many
media/file-handling blocks and related tests to the new API and to
persist generated media as `workspace://...` (or fall back to data URIs
outside CoPilot), and adds CoPilot chat tools for
listing/reading/writing/deleting workspace files with safeguards against
context bloat.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
6abc70f793. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Reinier van der Leer <pwuts@agpt.co>
2026-01-29 05:49:47 +00:00
Zamil Majdy
fb58827c61 feat(backend;frontend): Implement node-specific auto-approval, safety popup, and race condition fixes (#11810)
## Summary

This PR implements comprehensive improvements to the human-in-the-loop
(HITL) review system, including safety features, architectural changes,
and bug fixes:

### Key Features
- **SECRT-1798: One-time safety popup** - Shows informational popup
before first run of AI-generated agents with sensitive actions/HITL
blocks
- **SECRT-1795: Auto-approval toggle UX** - Toggle in pending reviews
panel to auto-approve future actions from the same node
- **Node-specific auto-approval** - Changed from execution-specific to
node-specific using special key pattern
`auto_approve_{graph_exec_id}_{node_id}`
- **Consolidated approval checking** - Merged `check_auto_approval` into
`check_approval` using single OR query for better performance
- **Race condition prevention** - Added execution status check before
resuming to prevent duplicate execution when approving while graph is
running
- **Parallel auto-approval creation** - Uses `asyncio.gather` for better
performance when creating multiple auto-approval records

## Changes

### Backend Architecture
- **`human_review.py`**: 
- Added `check_approval()` function that checks both normal and
auto-approval in single query
- Added `create_auto_approval_record()` for node-specific auto-approval
using special key pattern
- Added `get_auto_approve_key()` helper to generate consistent
auto-approval keys
- **`review/routes.py`**: 
- Added execution status check before resuming to prevent race
conditions
- Refactored auto-approval record creation to use parallel execution
with `asyncio.gather`
  - Removed obvious comments for cleaner code
- **`review/model.py`**: Added `auto_approve_future_actions` field to
`ReviewRequest`
- **`blocks/helpers/review.py`**: Updated to use consolidated
`check_approval` via database manager client
- **`executor/database.py`**: Exposed `check_approval` through
DatabaseManager RPC for block execution context
- **`data/block.py`**: Fixed safe mode checks for sensitive action
blocks

### Frontend
- **New `AIAgentSafetyPopup`** component with localStorage-based
one-time display
- **`PendingReviewsList`**: 
  - Replaced "Approve all future actions" button with toggle
- Toggle resets data to original values and disables editing when
enabled
  - Shows warning message explaining auto-approval behavior
- **`RunAgentModal`**: Integrated safety popup before first run
- **`usePendingReviews`**: Added polling for real-time badge updates
- **`FloatingSafeModeToggle` & `SafeModeToggle`**: Simplified visibility
logic
- **`local-storage.ts`**: Added localStorage key for popup state
tracking

### Bug Fixes
- Fixed "Client is not connected to query engine" error by using
database manager client pattern
- Fixed race condition where approving reviews while graph is RUNNING
could queue execution twice
- Fixed migration to only drop FK constraint, not non-existent column
- Fixed card data reset when auto-approve toggle changes

### Code Quality
- Removed duplicate/obvious comments
- Moved imports to top-level instead of local scope in tests
- Used walrus operator for cleaner conditional assignments
- Parallel execution for auto-approval record creation

## Test plan
- [ ] Create an AI-generated agent with sensitive actions (e.g., email
sending)
- [ ] First run should show the safety popup before starting
- [ ] Subsequent runs should not show the popup
- [ ] Clear localStorage (`AI_AGENT_SAFETY_POPUP_SHOWN`) to verify popup
shows again
- [ ] Create an agent with human-in-the-loop blocks
- [ ] Run it and verify the pending reviews panel appears
- [ ] Enable the "Auto-approve all future actions" toggle
- [ ] Verify editing is disabled and shows warning message
- [ ] Click "Approve" and verify subsequent blocks from same node
auto-approve
- [ ] Verify auto-approval persists across multiple executions of same
graph
- [ ] Disable toggle and verify editing works normally
- [ ] Verify "Reject" button still works regardless of toggle state
- [ ] Test race condition: Approve reviews while graph is RUNNING
(should skip resume)
- [ ] Test race condition: Approve reviews while graph is REVIEW (should
resume)
- [ ] Verify pending reviews badge updates in real-time when new reviews
are created
2026-01-25 04:05:25 +07:00
Zamil Majdy
8b25e62959 feat(backend,frontend): add explicit safe mode toggles for HITL and sensitive actions (#11756)
## Summary

This PR introduces two explicit safe mode toggles for controlling agent
execution behavior, providing clearer and more granular control over
when agents should pause for human review.

### Key Changes

**New Safe Mode Settings:**
- **`human_in_the_loop_safe_mode`** (bool, default `true`) - Controls
whether human-in-the-loop (HITL) blocks pause for review
- **`sensitive_action_safe_mode`** (bool, default `false`) - Controls
whether sensitive action blocks pause for review

**New Computed Properties on LibraryAgent:**
- `has_human_in_the_loop` - Indicates if agent contains HITL blocks
- `has_sensitive_action` - Indicates if agent contains sensitive action
blocks

**Block Changes:**
- Renamed `requires_human_review` to `is_sensitive_action` on blocks for
clarity
- Blocks marked as `is_sensitive_action=True` pause only when
`sensitive_action_safe_mode=True`
- HITL blocks pause when `human_in_the_loop_safe_mode=True`

**Frontend Changes:**
- Two separate toggles in Agent Settings based on block types present
- Toggle visibility based on `has_human_in_the_loop` and
`has_sensitive_action` computed properties
- Settings cog hidden if neither toggle applies
- Proper state management for both toggles with defaults

**AI-Generated Agent Behavior:**
- AI-generated agents set `sensitive_action_safe_mode=True` by default
- This ensures sensitive actions are reviewed for AI-generated content

## Changes

**Backend:**
- `backend/data/graph.py` - Updated `GraphSettings` with two boolean
toggles (non-optional with defaults), added `has_sensitive_action`
computed property
- `backend/data/block.py` - Renamed `requires_human_review` to
`is_sensitive_action`, updated review logic
- `backend/data/execution.py` - Updated `ExecutionContext` with both
safe mode fields
- `backend/api/features/library/model.py` - Added
`has_human_in_the_loop` and `has_sensitive_action` to `LibraryAgent`
- `backend/api/features/library/db.py` - Updated to use
`sensitive_action_safe_mode` parameter
- `backend/executor/utils.py` - Simplified execution context creation

**Frontend:**
- `useAgentSafeMode.ts` - Rewritten to support two independent toggles
- `AgentSettingsModal.tsx` - Shows two separate toggles
- `SelectedSettingsView.tsx` - Shows two separate toggles
- Regenerated API types with new schema

## Test Plan

- [x] All backend tests pass (Python 3.11, 3.12, 3.13)
- [x] All frontend tests pass
- [x] Backend format and lint pass
- [x] Frontend format and lint pass
- [x] Pre-commit hooks pass

---------

Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
2026-01-21 00:56:02 +00:00
Swifty
e55f05c7a8 feat(backend): add chat search tools and BM25 reranking (#11782)
This PR adds new chat tools for searching blocks and documentation,
along with BM25 reranking for improved search relevance.

### Changes 🏗️

**New Chat Tools:**
- `find_block` - Search for available blocks by name/description using
hybrid search
- `run_block` - Execute a block directly with provided inputs and
credentials
- `search_docs` - Search documentation with section-level granularity  
- `get_doc_page` - Retrieve full documentation page content

**Search Improvements:**
- Added BM25 reranking to hybrid search for better lexical relevance
- Documentation handler now chunks markdown by headings (##) for
finer-grained embeddings
- Section-based content IDs (`doc_path::section_index`) for precise doc
retrieval
- Startup embedding backfill in scheduler for immediate searchability

**Other Changes:**
- New response models for block and documentation search results
- Updated orphan cleanup to handle section-based doc embeddings
- Added `rank-bm25` dependency for BM25 scoring
- Removed max message limit check in chat service

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Run find_block tool to search for blocks (e.g., "current time")
  - [x] Run run_block tool to execute a found block
  - [x] Run search_docs tool to search documentation
  - [x] Run get_doc_page tool to retrieve full doc content
- [x] Verify BM25 reranking improves search relevance for exact term
matches
  - [x] Verify documentation sections are properly chunked and embedded

#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)

**Dependencies added:** `rank-bm25` for BM25 scoring algorithm
2026-01-16 16:18:10 +01:00
Zamil Majdy
8b83bb8647 feat(backend): unified hybrid search with embedding backfill for all content types (#11767)
## Summary

This PR extends the embedding system to support **blocks** and
**documentation** content types in addition to store agents, and
introduces **unified hybrid search** across all content types using a
single `UnifiedContentEmbedding` table.

### Key Changes

1. **Unified Hybrid Search Architecture**
   - Added `search` tsvector column to `UnifiedContentEmbedding` table
- New `unified_hybrid_search()` function searches across all content
types (agents, blocks, docs)
- Updated `hybrid_search()` for store agents to use
`UnifiedContentEmbedding.search`
   - Removed deprecated `search` column from `StoreListingVersion` table

2. **Pluggable Content Handler Architecture**
   - Created abstract `ContentHandler` base class for extensibility
- Implemented handlers: `StoreAgentHandler`, `BlockHandler`,
`DocumentationHandler`
   - Registry pattern for easy addition of new content types

3. **Block Embeddings**
   - Discovers all blocks using `get_blocks()`
- Extracts searchable text from: name, description, categories,
input/output schemas

4. **Documentation Embeddings**
   - Scans `/docs/` directory for `.md` and `.mdx` files
   - Extracts title from first `#` heading or uses filename as fallback

5. **Hybrid Search Graceful Degradation**
- Falls back to lexical-only search if query embedding generation fails
   - Redistributes semantic weight proportionally to other components
   - Logs warning instead of throwing error

6. **Database Migrations**
- `20260115200000_add_unified_search_tsvector`: Adds search column to
UnifiedContentEmbedding with auto-update trigger
- `20260115210000_remove_storelistingversion_search`: Removes deprecated
search column and updates StoreAgent view

7. **Orphan Cleanup**
- `cleanup_orphaned_embeddings()` removes embeddings for deleted content
   - Always runs after backfill, even at 100% coverage

### Review Comments Addressed

-  SQL parameter index bug when user_id provided (embeddings.py)
-  Early return skipping cleanup at 100% coverage (scheduler.py)
-  Inconsistent return structure across code paths (scheduler.py)
-  SQL UNION syntax error - added parentheses for ORDER BY/LIMIT
(hybrid_search.py)
-  Version numeric ordering in aggregations (migration)
-  Embedding dimension uses EMBEDDING_DIM constant

### Files Changed

- `backend/api/features/store/content_handlers.py` (NEW): Handler
architecture
- `backend/api/features/store/embeddings.py`: Refactored to use handlers
- `backend/api/features/store/hybrid_search.py`: Unified search +
graceful degradation
- `backend/executor/scheduler.py`: Process all content types, consistent
returns
- `migrations/20260115200000_add_unified_search_tsvector/`: Add tsvector
to unified table
- `migrations/20260115210000_remove_storelistingversion_search/`: Remove
old search column
- `schema.prisma`: Updated UnifiedContentEmbedding and
StoreListingVersion models
- `*_test.py`: Added tests for unified_hybrid_search

## Test Plan

1.  All tests passing on Python 3.11, 3.12, 3.13
2.  Types check passing
3.  CodeRabbit and Sentry reviews addressed
4. Deploy to staging and verify:
   - Backfill job processes all content types
   - Search results include blocks and docs
   - Search works without OpenAI API (graceful degradation)

🤖 Generated with [Claude Code](https://claude.ai/code)

---------

Co-authored-by: Swifty <craigswift13@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-16 09:47:19 +01:00
Swifty
5ac941fe2f feat(backend): add hybrid search for store listings, docs and blocks (#11721)
This PR adds hybrid search functionality combining semantic embeddings
with traditional text search for improved store listing discovery.

### Changes 🏗️

- Add `embeddings.py` - OpenAI-based embedding generation and similarity
search
- Add `hybrid_search.py` - Combines vector similarity with text matching
for better search results
- Add `backfill_embeddings.py` - Script to generate embeddings for
existing store listings
- Update `db.py` - Integrate hybrid search into store database queries
- Update `schema.prisma` - Add embedding storage fields and indexes
- Add migrations for embedding columns and HNSW index for vector search

### Architecture Decisions 🏛️

**Fail-Fast Approach (No Silent Fallbacks)**

We explicitly chose NOT to implement graceful degradation when hybrid
search fails. Here's why:

 **Benefits:**
- Errors surface immediately → faster fixes
- Tests verify hybrid search actually works (not just fallback)
- Consistent search quality for all users
- Forces proper infrastructure setup (API keys, database)

 **Why Not Fallback:**
- Silent degradation hides production issues
- Users get inconsistent results without knowing why
- Tests can pass even when hybrid search is broken
- Reduces operational visibility

**How We Prevent Failures:**
1. Embedding generation in approval flow (db.py:1545)
2. Error logging with `logger.error` (not warning)
3. Clear error messages (ValueError explains what's wrong)
4. Comprehensive test coverage (9/9 tests passing)

If embeddings fail, it indicates a real infrastructure issue (missing
API key, OpenAI down, database issues) that needs immediate attention,
not silent degradation.

### Test Coverage 

**All tests passing (1625 total):**
- 9/9 hybrid_search tests (including fail-fast validation)
- 3/3 db search integration tests
- Full schema compatibility (public/platform schemas)
- Error handling verification

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Test hybrid search returns relevant results
  - [x] Test embedding generation for new listings
  - [x] Test backfill script on existing data
  - [x] Verify search performance with embeddings
  - [x] Test fail-fast behavior when embeddings unavailable

#### For configuration changes:

- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] Configuration: Requires `openai_internal_api_key` in secrets

---------

Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-15 04:17:03 +00:00
Reinier van der Leer
b01ea3fcbd fix(backend/executor): Centralize increment_runs calls & make add_graph_execution more robust (#11764)
[OPEN-2946: \[Scheduler\] Error executing graph <graph_id> after 19.83s:
ClientNotConnectedError: Client is not connected to the query engine,
you must call `connect()` before attempting to query
data.](https://linear.app/autogpt/issue/OPEN-2946)

- Follow-up to #11375
  <sub>(broken `increment_runs` call)</sub>
- Follow-up to #11380
  <sub>(direct `get_graph_execution` call)</sub>

### Changes 🏗️

- Move `increment_runs` call from `scheduler._execute_graph` to
`executor.utils.add_graph_execution` so it can be made through
`DatabaseManager`
  - Add `increment_onboarding_runs` to `DatabaseManager`
- Remove now-redundant `increment_onboarding_runs` calls in other places
- Make `add_graph_execution` more resilient
  - Split up large try/except block
  - Fix direct `get_graph_execution` call

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - CI + a thorough review
2026-01-15 04:08:19 +00:00
Nicholas Tindle
47a3a5ef41 feat(backend,frontend): optional credentials flag for blocks at agent level (#11716)
This feature allows agent makers to mark credential fields as optional.
When credentials are not configured for an optional block, the block
will be skipped during execution rather than causing a validation error.

**Use case:** An agent with multiple notification channels (Discord,
Twilio, Slack) where the user only needs to configure one - unconfigured
channels are simply skipped.

### Changes 🏗️

#### Backend

**Data Model Changes:**
- `backend/data/graph.py`: Added `credentials_optional` property to
`Node` model that reads from node metadata
- `backend/data/execution.py`: Added `nodes_to_skip` field to
`GraphExecutionEntry` model to track nodes that should be skipped

**Validation Changes:**
- `backend/executor/utils.py`:
- Updated `_validate_node_input_credentials()` to return a tuple of
`(credential_errors, nodes_to_skip)`
- Nodes with `credentials_optional=True` and missing credentials are
added to `nodes_to_skip` instead of raising validation errors
- Updated `validate_graph_with_credentials()` to propagate
`nodes_to_skip` set
- Updated `validate_and_construct_node_execution_input()` to return
`nodes_to_skip`
- Updated `add_graph_execution()` to pass `nodes_to_skip` to execution
entry

**Execution Changes:**
- `backend/executor/manager.py`:
  - Added skip logic in `_on_graph_execution()` dispatch loop
- When a node is in `nodes_to_skip`, it is marked as `COMPLETED` without
execution
  - No outputs are produced, so downstream nodes won't trigger

#### Frontend

**Node Store:**
- `frontend/src/app/(platform)/build/stores/nodeStore.ts`:
- Added `credentials_optional` to node metadata serialization in
`convertCustomNodeToBackendNode()`
- Added `getCredentialsOptional()` and `setCredentialsOptional()` helper
methods

**Credential Field Component:**
-
`frontend/src/components/renderers/input-renderer/fields/CredentialField/CredentialField.tsx`:
  - Added "Optional - skip block if not configured" switch toggle
  - Switch controls the `credentials_optional` metadata flag
  - Placeholder text updates based on optional state

**Credential Field Hook:**
-
`frontend/src/components/renderers/input-renderer/fields/CredentialField/useCredentialField.ts`:
  - Added `disableAutoSelect` parameter
- When credentials are optional, auto-selection of credentials is
disabled

**Feature Flags:**
- `frontend/src/services/feature-flags/use-get-flag.ts`: Minor refactor
(condition ordering)

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Build an agent using smart decision maker and down stream blocks
to test this

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Introduces optional credentials across graph execution and UI,
allowing nodes to be skipped (no outputs, no downstream triggers) when
their credentials are not configured.
> 
> - Backend
> - Adds `Node.credentials_optional` (from node `metadata`) and computes
required credential fields in `Graph.credentials_input_schema` based on
usage.
> - Validates credentials with `_validate_node_input_credentials` →
returns `(errors, nodes_to_skip)`; plumbs `nodes_to_skip` through
`validate_graph_with_credentials`,
`_construct_starting_node_execution_input`,
`validate_and_construct_node_execution_input`, and `add_graph_execution`
into `GraphExecutionEntry`.
> - Executor: dispatch loop skips nodes in `nodes_to_skip` (marks
`COMPLETED`); `execute_node`/`on_node_execution` accept `nodes_to_skip`;
`SmartDecisionMakerBlock.run` filters tool functions whose
`_sink_node_id` is in `nodes_to_skip` and errors only if all tools are
filtered.
> - Models: `GraphExecutionEntry` gains `nodes_to_skip` field. Tests and
snapshots updated accordingly.
> 
> - Frontend
> - Builder: credential field uses `custom/credential_field` with an
"Optional – skip block if not configured" toggle; `nodeStore` persists
`credentials_optional` and history; UI hides optional toggle in run
dialogs.
> - Run dialogs: compute required credentials from
`credentials_input_schema.required`; allow selecting "None"; avoid
auto-select for optional; filter out incomplete creds before execute.
>   - Minor schema/UI wiring updates (`uiSchema`, form context flags).
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
5e01fd6a3e. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Nicholas Tindle <ntindle@users.noreply.github.com>
2026-01-09 14:11:35 +00:00
Nicholas Tindle
79d45a15d0 feat(platform): Deduplicate insufficient funds Discord + email notifications (#11672)
Add Redis-based deduplication for insufficient funds notifications (both
Discord alerts and user emails) when users run out of credits. This
prevents spamming users and the PRODUCT Discord channel with repeated
alerts for the same user+agent combination.

### Changes 🏗️

- **Redis-based deduplication** (`backend/executor/manager.py`):
- Add `INSUFFICIENT_FUNDS_NOTIFIED_PREFIX` constant for Redis key prefix
- Add `INSUFFICIENT_FUNDS_NOTIFIED_TTL_SECONDS` (30 days) as fallback
cleanup
- Implement deduplication in `_handle_insufficient_funds_notif` using
Redis `SET NX`
- Skip both email (`ZERO_BALANCE`) and Discord notifications for
duplicate alerts per user+agent
- Add `clear_insufficient_funds_notifications(user_id)` function to
remove all notification flags for a user

- **Clear flags on credit top-up** (`backend/data/credit.py`):
- Call `clear_insufficient_funds_notifications` in `_top_up_credits`
after successful auto-charge
- Call `clear_insufficient_funds_notifications` in `fulfill_checkout`
after successful manual top-up
- This allows users to receive notifications again if they run out of
funds in the future

- **Comprehensive test coverage**
(`backend/executor/manager_insufficient_funds_test.py`):
  - Test first-time notification sends both email and Discord alert
  - Test duplicate notifications are skipped for same user+agent
  - Test different agents for same user get separate alerts
  - Test clearing notifications removes all keys for a user
  - Test handling when no notification keys exist
- Test notifications still sent when Redis fails (graceful degradation)

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] First insufficient funds alert sends both email and Discord
notification
  - [x] Duplicate alerts for same user+agent are skipped
  - [x] Different agents for same user each get their own notification
  - [x] Topping up credits clears notification flags
  - [x] Redis failure gracefully falls back to sending notifications
  - [x] 30-day TTL provides automatic cleanup as fallback
  - [x] Manually test this works with scheduled agents
 

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Introduces Redis-backed deduplication for insufficient-funds alerts
and resets flags on successful credit additions.
> 
> - **Dedup insufficient-funds alerts** in `executor/manager.py` using
Redis `SET NX` with `INSUFFICIENT_FUNDS_NOTIFIED_PREFIX` and 30‑day TTL;
skips duplicate ZERO_BALANCE email + Discord alerts per
`user_id`+`graph_id`, with graceful fallback if Redis fails.
> - **Reset notification flags on credit increases** by adding
`clear_insufficient_funds_notifications(user_id)` and invoking it when
enabling/adding positive `GRANT`/`TOP_UP` transactions in
`data/credit.py`.
> - **Tests** (`executor/manager_insufficient_funds_test.py`):
first-time vs duplicate behavior, per-agent separation, clearing keys
(including no-key and Redis-error cases), and clearing on
`_add_transaction`/`_enable_transaction`.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
1a4413b3a1. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Ubbe <hi@ubbe.dev>
Co-authored-by: Claude <noreply@anthropic.com>
2025-12-30 18:10:30 +00:00
Reinier van der Leer
de78d062a9 refactor(backend/api): Clean up API file structure (#11629)
We'll soon be needing a more feature-complete external API. To make way
for this, I'm moving some files around so:
- We can more easily create new versions of our external API
- The file structure of our internal API is more homogeneous

These changes are quite opinionated, but IMO in any case they're better
than the chaotic structure we have now.

### Changes 🏗️

- Move `backend/server` -> `backend/api`
- Move `backend/server/routers` + `backend/server/v2` ->
`backend/api/features`
  - Change absolute sibling imports to relative imports
- Move `backend/server/v2/AutoMod` -> `backend/executor/automod`
- Combine `backend/server/routers/analytics_*test.py` ->
`backend/api/features/analytics_test.py`
- Sort OpenAPI spec file

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - CI tests
  - [x] Clicking around in the app -> no obvious breakage
2025-12-20 20:33:10 +00:00
Reinier van der Leer
3dbc03e488 feat(platform): OAuth API & Single Sign-On (#11617)
We want to provide Single Sign-On for multiple AutoGPT apps that use the
Platform as their backend.

### Changes 🏗️

Backend:
- DB + logic + API for OAuth flow (w/ tests)
  - DB schema additions for OAuth apps, codes, and tokens
  - Token creation/validation/management logic
- OAuth flow endpoints (app info, authorize, token exchange, introspect,
revoke)
  - E2E OAuth API integration tests
- Other OAuth-related endpoints (upload app logo, list owned apps,
external `/me` endpoint)
    - App logo asset management
  - Adjust external API middleware to support auth with access token
  - Expired token clean-up job
    - Add `OAUTH_TOKEN_CLEANUP_INTERVAL_HOURS` setting (optional)
- `poetry run oauth-tool`: dev tool to test the OAuth flows and register
new OAuth apps
- `poetry run export-api-schema`: dev tool to quickly export the OpenAPI
schema (much quicker than spinning up the backend)

Frontend:
- Frontend UI for app authorization (`/auth/authorize`)
  - Re-redirect after login/signup
- Frontend flow to batch-auth integrations on request of the client app
(`/auth/integrations/setup-wizard`)
  - Debug `CredentialInputs` component
- Add `/profile/oauth-apps` management page
- Add `isOurProblem` flag to `ErrorCard` to hide action buttons when the
error isn't our fault
- Add `showTitle` flag to `CredentialsInput` to hide built-in title for
layout reasons

DX:
- Add [API
guide](https://github.com/Significant-Gravitas/AutoGPT/blob/pwuts/sso/docs/content/platform/integrating/api-guide.md)
and [OAuth
guide](https://github.com/Significant-Gravitas/AutoGPT/blob/pwuts/sso/docs/content/platform/integrating/oauth-guide.md)

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Manually verify test coverage of OAuth API tests
  - Test `/auth/authorize` using `poetry run oauth-tool test-server`
    - [x] Works
    - [x] Looks okay
- Test `/auth/integrations/setup-wizard` using `poetry run oauth-tool
test-server`
    - [x] Works
    - [x] Looks okay
  - Test `/profile/oauth-apps` page
    - [x] All owned OAuth apps show up
    - [x] Enabling/disabling apps works
- [ ] ~~Uploading logos works~~ can only test this once deployed to dev

#### For configuration changes:

- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)
2025-12-19 21:05:16 +01:00
Zamil Majdy
71157bddd7 feat(backend): add agent mode support to SmartDecisionMakerBlock with autonomous tool execution loops (#11547)
## Summary

<img width="2072" height="1836" alt="image"
src="https://github.com/user-attachments/assets/9d231a77-6309-46b9-bc11-befb5d8e9fcc"
/>

**🚀 Major Feature: Agent Mode Support**

Adds autonomous agent mode to SmartDecisionMakerBlock, enabling it to
execute tools directly in loops until tasks are completed, rather than
just yielding tool calls for external execution.

##  **Key New Features**

### 🤖 **Agent Mode with Tool Execution Loops**
- **New `agent_mode_max_iterations` parameter** controls execution
behavior:
  - `0` = Traditional mode (single LLM call, yield tool calls)
  - `1+` = Agent mode with iteration limit
  - `-1` = Infinite agent mode (loop until finished)

### 🔄 **Autonomous Tool Execution**  
- **Direct tool execution** instead of yielding for external handling
- **Multi-iteration loops** with conversation state management
- **Automatic completion detection** when LLM stops making tool calls
- **Iteration limit handling** with graceful completion messages

### 🏗️ **Proper Database Operations**
- **Replace manual execution ID generation** with proper
`upsert_execution_input`/`upsert_execution_output`
- **Real NodeExecutionEntry objects** from database results
- **Proper execution status management**: QUEUED → RUNNING →
COMPLETED/FAILED

### 🔧 **Enhanced Type Safety**
- **Pydantic models** replace TypedDict: `ToolInfo` and
`ExecutionParams`
- **Runtime validation** with better error messages
- **Improved developer experience** with IDE support

## 🔧 **Technical Implementation**

### Agent Mode Flow:
```python
# Agent mode enabled with iterations
if input_data.agent_mode_max_iterations != 0:
    async for result in self._execute_tools_agent_mode(...):
        yield result  # "conversations", "finished"
    return

# Traditional mode (existing behavior)  
# Single LLM call + yield tool calls for external execution
```

### Tool Execution with Database Operations:
```python
# Before: Manual execution IDs
tool_exec_id = f"{node_exec_id}_tool_{sink_node_id}_{len(input_data)}"

# After: Proper database operations
node_exec_result, final_input_data = await db_client.upsert_execution_input(
    node_id=sink_node_id,
    graph_exec_id=execution_params.graph_exec_id,
    input_name=input_name, 
    input_data=input_value,
)
```

### Type Safety with Pydantic:
```python
# Before: Dict access prone to errors
execution_params["user_id"]  

# After: Validated model access
execution_params.user_id  # Runtime validation + IDE support
```

## 🧪 **Comprehensive Test Coverage**

- **Agent mode execution tests** with multi-iteration scenarios
- **Database operation verification** 
- **Type safety validation**
- **Backward compatibility** for traditional mode
- **Enhanced dynamic fields tests**

## 📊 **Usage Examples**

### Traditional Mode (Existing Behavior):
```python
SmartDecisionMakerBlock.Input(
    prompt="Search for keywords",
    agent_mode_max_iterations=0  # Default
)
# → Yields tool calls for external execution
```

### Agent Mode (New Feature):
```python  
SmartDecisionMakerBlock.Input(
    prompt="Complete this task using available tools",
    agent_mode_max_iterations=5  # Max 5 iterations
)
# → Executes tools directly until task completion or iteration limit
```

### Infinite Agent Mode:
```python
SmartDecisionMakerBlock.Input(
    prompt="Analyze and process this data thoroughly", 
    agent_mode_max_iterations=-1  # No limit, run until finished
)
# → Executes tools autonomously until LLM indicates completion
```

##  **Backward Compatibility**

- **Zero breaking changes** to existing functionality
- **Traditional mode remains default** (`agent_mode_max_iterations=0`)
- **All existing tests pass**
- **Same API for tool definitions and execution**

This transforms the SmartDecisionMakerBlock from a simple tool call
generator into a powerful autonomous agent capable of complex multi-step
task execution! 🎯

🤖 Generated with [Claude Code](https://claude.ai/code)

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-12-12 09:58:06 +00:00
Zamil Majdy
c1e21d07e6 feat(platform): add execution accuracy alert system (#11562)
## Summary

<img width="1263" height="883" alt="image"
src="https://github.com/user-attachments/assets/98d4f449-1897-4019-a599-846c27df4191"
/>
<img width="398" height="190" alt="image"
src="https://github.com/user-attachments/assets/0138ac02-420d-4f96-b980-74eb41e3c968"
/>

- Add execution accuracy monitoring with moving averages and Discord
alerts
- Dashboard visualization for accuracy trends and alert detection  
- Hourly monitoring for marketplace agents (≥10 executions in 30 days)
- Generated API client integration with type-safe models

## Features
- **Moving Average Analysis**: 3-day vs 7-day comparison with
configurable thresholds
- **Discord Notifications**: Hourly alerts for accuracy drops ≥10%
- **Dashboard UI**: Real-time trends visualization with alert status
- **Type Safety**: Generated API hooks and models throughout
- **Error Handling**: Graceful OpenAI configuration handling
- **PostgreSQL Optimization**: Window functions for efficient trend
queries

## Test plan
- [x] Backend accuracy monitoring logic tested with sample data
- [x] Frontend components using generated API hooks (no manual fetch)
- [x] Discord notification integration working
- [x] Admin authentication and authorization working
- [x] All formatting and linting checks passing
- [x] Error handling for missing OpenAI configuration
- [x] Test data available with `test-accuracy-agent-001`

🤖 Generated with [Claude Code](https://claude.ai/code)

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-12-08 19:28:57 +00:00
Krzysztof Czerwinski
c880db439d feat(platform): Backend completion of Onboarding tasks (#11375)
Make onboarding task completion backend-authoritative which prevents
cheating (previously users could mark all tasks as completed instantly
and get rewards) and makes task completion more reliable. Completion of
tasks is moved backend with exception of introductory onboarding tasks
and visit-page type tasks.

### Changes 🏗️

- Move incrementing run counter backend and make webhook-triggered and
scheduled task execution count as well
- Use user timezone for calculating run streak
- Frontend task completion is moved from update onboarding state to
separate endpoint and guarded so only frontend tasks can be completed
- Graph creation, execution and add marketplace agent to library accept
`source`, so appropriate tasks can be completed
- Replace `client.ts` api calls with orval generated and remove no
longer used functions from `client.ts`
- Add `resolveResponse` helper function that unwraps orval generated
call result to 2xx response

Small changes&bug fixes:
- Make Redis notification bus serialize all payload fields
- Fix confetti when group is finished
- Collapse finished group when opening Wallet
- Play confetti only for tasks that are listed in the Wallet UI

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Onboarding can be finished
  - [x] All tasks can be finished and work properly
  - [x] Confetti works properly
2025-12-05 02:32:28 +00:00
Nicholas Tindle
113df689dc feat(platform): Improve Google Sheets/Drive integration with unified credentials (#11520)
Simplifies and improves the Google Sheets/Drive integration by merging
credentials with the file picker and using narrower OAuth scopes.

### Changes 🏗️

- Merge Google credentials and file picker into a single unified input
field for better UX
- Create spreadsheets using Drive API instead of Sheets API for proper
scope support
- Simplify Google Drive OAuth scope to only use `drive.file` (narrowest
permission needed)
- Clean up unused imports (NormalizedPickedFile)

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Test creating a new Google Spreadsheet with
GoogleSheetsCreateSpreadsheetBlock
- [x] Test reading from existing spreadsheets with GoogleSheetsReadBlock
  - [x] Test writing to spreadsheets with GoogleSheetsWriteBlock
  - [x] Verify OAuth flow works with simplified scopes
  - [x] Verify file picker works with merged credentials field

#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Unifies Google Drive picker and credentials with auto-credentials
across backend and frontend, updates all Sheets blocks and execution to
use it, and adds Drive-based spreadsheet creation plus supporting tests
and UI fixes.
> 
> - **Backend**:
> - **Google Drive model/field**: Introduce `GoogleDriveFile` (with
`_credentials_id`) and `GoogleDriveFileField()` for unified auth+picker
(`backend/blocks/google/_drive.py`).
> - **Sheets blocks**: Replace `GoogleDrivePickerField` and explicit
credentials with `GoogleDriveFileField` across all Sheets blocks;
preserve and emit credentials for chaining; add Drive service; create
spreadsheets via Drive API then manage via Sheets API.
> - **IO block**: Add `AgentGoogleDriveFileInputBlock` providing a Drive
picker input.
> - **Execution**: Support auto-generated credentials via
`BlockSchema.get_auto_credentials_fields()`; acquire/release multiple
credential locks; pass creds by `credentials_kwarg`
(`executor/manager.py`, `data/block.py`, `util/test.py`).
> - **Tests**: Add validation tests for duplicate/unique
`auto_credentials.kwarg_name` and defaults.
> - **Frontend**:
> - **Picker**: Enhance Google Drive picker to require/use saved
platform credentials, pass `_credentials_id`, validate scopes, and
manage dialog z-index/interaction; expose `requirePlatformCredentials`.
> - **UI**: Update dialogs/CSS to keep Google picker on top and prevent
overlay interactions.
> - **Types**: Extend `GoogleDrivePickerConfig` with `auto_credentials`
and related typings.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
7d25534def. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
2025-12-04 14:40:30 +00:00
Zamil Majdy
7b951c977e feat(platform): implement graph-level Safe Mode toggle for HITL blocks (#11455)
## Summary

This PR implements a graph-level Safe Mode toggle system for
Human-in-the-Loop (HITL) blocks. When Safe Mode is ON (default), HITL
blocks require manual review before proceeding. When OFF, they execute
automatically.

## 🔧 Backend Changes

- **Database**: Added `metadata` JSON column to `AgentGraph` table with
migration
- **API**: Updated `execute_graph` endpoint to accept `safe_mode`
parameter
- **Execution**: Enhanced execution context to use graph metadata as
default with API override capability
- **Auto-detection**: Automatically populate `has_human_in_the_loop` for
graphs containing HITL blocks
- **Block Detection**: HITL block ID:
`8b2a7b3c-6e9d-4a5f-8c1b-2e3f4a5b6c7d`

## 🎨 Frontend Changes

- **Component**: New `FloatingSafeModeToggle` with dual variants:
  - **White variant**: For library pages, integrates with action buttons
  - **Black variant**: For builders, floating positioned  
- **Integration**: Added toggles to both new/legacy builders and library
pages
- **API Integration**: Direct graph metadata updates via
`usePutV1UpdateGraphVersion`
- **Query Management**: React Query cache invalidation for consistent UI
updates
- **Conditional Display**: Toggle only appears when graph contains HITL
blocks

## 🛠 Technical Implementation

- **Safe Mode ON** (default): HITL blocks require manual review before
proceeding
- **Safe Mode OFF**: HITL blocks execute automatically without
intervention
- **Priority**: Backend API `safe_mode` parameter takes precedence over
graph metadata
- **Detection**: Auto-populates `has_human_in_the_loop` metadata field
- **Positioning**: Proper z-index and responsive positioning for
floating elements

## 🚧 Known Issues (Work in Progress)

### High Priority
- [ ] **Toggle state persistence**: Always shows "ON" regardless of
actual state - query invalidation issue
- [ ] **LibraryAgent metadata**: Missing metadata field causing
TypeScript errors
- [ ] **Tooltip z-index**: Still covered by some UI elements despite
high z-index

### Medium Priority  
- [ ] **HITL detection**: Logic needs improvement for reliable block
detection
- [ ] **Error handling**: Removing HITL blocks from graph causes save
errors
- [ ] **TypeScript**: Fix type mismatches between GraphModel and
LibraryAgent

### Low Priority
- [ ] **Frontend API**: Add `safe_mode` parameter to execution calls
once OpenAPI is regenerated
- [ ] **Performance**: Consider debouncing rapid toggle clicks

## 🧪 Test Plan

- [ ] Verify toggle appears only when graph has HITL blocks
- [ ] Test toggle persistence across page refreshes  
- [ ] Confirm API calls update graph metadata correctly
- [ ] Validate execution behavior respects safe mode setting
- [ ] Check styling consistency across builder and library contexts

## 🔗 Related

- Addresses requirements for graph-level HITL configuration
- Builds on existing FloatingReviewsPanel infrastructure
- Integrates with existing graph metadata system

🤖 Generated with [Claude Code](https://claude.ai/code)
2025-12-02 09:55:55 +00:00
Zamil Majdy
3d08c22dd5 feat(platform): add Human In The Loop block with review workflow (#11380)
## Summary
This PR implements a comprehensive Human In The Loop (HITL) block that
allows agents to pause execution and wait for human
approval/modification of data before continuing.



https://github.com/user-attachments/assets/c027d731-17d3-494c-85ca-97c3bf33329c


## Key Features
- Added WAITING_FOR_REVIEW status to AgentExecutionStatus enum
- Created PendingHumanReview database table for storing review requests
- Implemented HumanInTheLoopBlock that extracts input data and creates
review entries
- Added API endpoints at /api/executions/review for fetching and
reviewing pending data
- Updated execution manager to properly handle waiting status and resume
after approval

## Frontend Components
- PendingReviewCard for individual review handling
- PendingReviewsList for multiple reviews
- FloatingReviewsPanel for graph builder integration
- Integrated review UI into 3 locations: legacy library, new library,
and graph builder

## Technical Implementation
- Added proper type safety throughout with SafeJson handling
- Optimized database queries using count functions instead of full data
fetching
- Fixed imports to be top-level instead of local
- All formatters and linters pass

## Test plan
- [ ] Test Human In The Loop block creation in graph builder
- [ ] Test block execution pauses and creates pending review
- [ ] Test review UI appears in all 3 locations
- [ ] Test data modification and approval workflow
- [ ] Test rejection workflow
- [ ] Test execution resumes after approval

🤖 Generated with [Claude Code](https://claude.ai/code)

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Added Human-In-The-Loop review workflows to pause executions for human
validation.
* Users can approve or reject pending tasks, optionally editing
submitted data and adding a message.
* New "Waiting for Review" execution status with UI indicators across
run lists, badges, and activity views.
* Review management UI: pending review cards, list view, and a floating
reviews panel for quick access.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-11-27 12:07:46 +07:00
Zamil Majdy
901bb31e14 feat(backend): parameterize activity status generation with customizable prompts (#11407)
## Summary

Implement comprehensive parameterization of the activity status
generation system to enable custom prompts for admin analytics
dashboard.

## Changes Made

### Core Function Enhancement (`activity_status_generator.py`)
- **Extract hardcoded prompts to constants**: `DEFAULT_SYSTEM_PROMPT`
and `DEFAULT_USER_PROMPT`
- **Add prompt parameters**: `system_prompt`, `user_prompt` with
defaults to maintain backward compatibility
- **Template substitution system**: User prompt supports
`{{GRAPH_NAME}}` and `{{EXECUTION_DATA}}` placeholders
- **Skip existing flag**: `skip_existing` parameter allows admin to
force regeneration of existing data
- **Maintain manager compatibility**: All existing calls continue to
work with default parameters

### Admin API Enhancement (`execution_analytics_routes.py`)
- **Custom prompt fields**: `system_prompt` and `user_prompt` optional
fields in `ExecutionAnalyticsRequest`
- **Skip existing control**: `skip_existing` boolean flag for admin
regeneration option
- **Template documentation**: Clear documentation of placeholder system
in field descriptions
- **Backward compatibility**: All existing API calls work unchanged

### Template System Design
- **Simple placeholder replacement**: `{{GRAPH_NAME}}` → actual graph
name, `{{EXECUTION_DATA}}` → JSON execution data
- **No dependencies**: Uses simple `string.replace()` for maximum
compatibility
- **JSON safety**: Execution data properly serialized as indented JSON
- **Validation tested**: Template substitution verified to work
correctly

## Key Features

### For Regular Users (Manager Integration)
- **No changes required**: Existing manager.py calls work unchanged
- **Default behavior preserved**: Same prompts and logic as before
- **Feature flag compatibility**: LaunchDarkly integration unchanged

### For Admin Analytics Dashboard
- **Custom system prompts**: Admins can override the AI evaluation
criteria
- **Custom user prompts**: Admins can modify the analysis instructions
with execution data templates
- **Force regeneration**: `skip_existing=False` allows reprocessing
existing executions with new prompts
- **Complete model list**: Access to all LLM models from `llm.py` (70+
models including GPT, Claude, Gemini, etc.)

## Technical Validation
-  Template substitution tested and working
-  Default behavior preserved for existing code
-  Admin API parameter validation working
-  All imports and function signatures correct
-  Backward compatibility maintained

## Use Cases Enabled
- **A/B testing**: Compare different prompt strategies on same execution
data
- **Custom evaluation**: Tailor success criteria for specific graph
types
- **Prompt optimization**: Iterate on prompt design based on admin
feedback
- **Bulk reprocessing**: Regenerate activity status with improved
prompts

## Testing
- Template substitution functionality verified
- Function signatures and imports validated
- Code formatting and linting passed
- Backward compatibility confirmed

## Breaking Changes
None - all existing functionality preserved with default parameters.

## Related Issues
Resolves the requirement to expose prompt customization on the frontend
execution analytics dashboard.

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-11-19 13:38:08 +00:00
Swifty
9438817702 fix(platform): Capture Sentry Block Errors Correctly (#11404)
Currently we are capturing block errors via the scope only, this change
captures the error directly.

### Changes 🏗️

- capture the error as well as the scope in the executor manager
- Update the block error message to include additional details
- remove the __str__ function from blockerror as it is no longer needed

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  <!-- Put your test plan here: -->
  - [x] Checked that errors are still captured in dev
2025-11-19 12:21:47 +01:00
Zamil Majdy
02757d68f3 fix(backend): resolve marketplace agent access in get_graph_execution endpoint (#11396)
## Summary
Fixes critical issue where `GET
/graphs/{graph_id}/executions/{graph_exec_id}` failed for marketplace
agents with "Graph not found" errors due to incorrect version access
checking.

## Root Cause
The endpoint was checking access to the **latest version** of a graph
instead of the **specific version used in the execution**. This broke
marketplace agents when:

1. User executes a marketplace agent (e.g., v3)
2. Graph owner later publishes a new version (e.g., v4) 
3. User tries to view execution details
4. **BUG**: Code checked access to latest version (v4) instead of
execution version (v3)
5. If v4 wasn't published to marketplace → access denied → "Graph not
found"

## Original Problematic Code
```python
# routers/v1.py - get_graph_execution (WRONG ORDER)
graph = await graph_db.get_graph(graph_id=graph_id, user_id=user_id)  #  Uses LATEST version
if not graph:
    raise HTTPException(404, f"Graph #{graph_id} not found")

result = await execution_db.get_graph_execution(...)  # Gets execution data
```

## Solution
**Reordered operations** to check access against the **execution's
specific version**:

```python
# NEW CODE (CORRECT ORDER)
result = await execution_db.get_graph_execution(...)  #  Get execution FIRST

if not await graph_db.get_graph(
    graph_id=result.graph_id,
    version=result.graph_version,  #  Use execution's version, not latest!
    user_id=user_id,
):
    raise HTTPException(404, f"Graph #{graph_id} not found")
```

### Key Changes Made

1. **Fixed version access logic** (routers/v1.py:1075-1095):
   - Reordered operations to get execution data first 
   - Check access using `result.graph_version` instead of latest version
   - Applied same fix to external API routes

2. **Enhanced `get_graph()` marketplace fallback**
(data/graph.py:919-935):
   - Added proper marketplace lookup when user doesn't own the graph
   - Supports version-specific marketplace access checking
   - Maintains security by only allowing approved, non-deleted listings

3. **Activity status generator fix**
(activity_status_generator.py:139-144):
   - Use `skip_access_check=True` for internal system operations

4. **Missing block handling** (data/graph.py:94-103):
- Added `_UnknownBlockBase` placeholder for graceful handling of deleted
blocks

## Example Scenario Fixed
1. **User**: Installs marketplace agent "Blog Writer" v3
2. **Owner**: Later publishes v4 (not to marketplace yet)  
3. **User**: Runs the agent (executes v3)
4. **Before**: Viewing execution details fails because code checked v4
access
5. **After**:  Viewing execution details works because code checks v3
access

## Impact
-  **Marketplace agents work correctly**: Users can view execution
details for any marketplace agent version they've used
-  **Backward compatibility**: Existing owned graphs continue working
-  **Security maintained**: Only allows access to versions user
legitimately executed
-  **Version-aware access control**: Proper access checking for
specific versions, not just latest

## Testing
- [x] Marketplace agents: Execution details now accessible for all
executed versions
- [x] Owned graphs: Continue working as before
- [x] Version scenarios: Access control works correctly for specific
versions
- [x] Missing blocks: Graceful handling without errors

**Root issue resolved**: Version mismatch between execution version and
access check version that was breaking marketplace agent execution
viewing.

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-11-18 15:44:10 +00:00
Swifty
a66219fc1f fix(platform): Remove un-runnable agents from schedule (#11374)
Currently when an agent fails validation during a scheduled run, we
raise an error then try again, regardless of why.

This change removed the agent schedule and notifies the user 

### Changes 🏗️

- add schedule_id to the GraphExecutionJobArgs
- add agent_name to the GraphExecutionJobArgs
- Delete schedule on GraphValidationError
- Notify the user with a message that include the agent name

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  <!-- Put your test plan here: -->
  - [x] I have ensured the scheduler tests work with these changes
2025-11-17 15:24:40 +00:00
Reinier van der Leer
536e2a5ec8 fix(blocks): Make Smart Decision Maker tool pin handling consistent and reliable (#11363)
- Resolves #11345

### Changes 🏗️

- Move tool use routing logic from frontend to backend: routing info was
being baked into graph links by the frontend, inconsistently, causing
issues
- Rework tool use routing to use target node ID instead of target block
name
- Add a bit of magic to `NodeOutputs` component to show tool node title
instead of ID

DX:
- Removed `build` from `.prettierignore` -> re-enable formatting for
builder components

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Use SDM block in a graph; verify it works
  - [x] Use SDM block with agent executor block as tool; verify it works
  - Tests for `parse_execution_output` pass (checked by CI)
2025-11-12 18:55:38 +01:00
Zamil Majdy
d6ee402483 feat(platform): Add execution analytics admin endpoint with feature flag bypass (#11327)
This PR adds a comprehensive execution analytics admin endpoint that
generates AI-powered activity summaries and correctness scores for graph
executions, with proper feature flag bypass for admin use.

### Changes 🏗️

**Backend Changes:**
- Added admin endpoint: `/api/executions/admin/execution_analytics`
- Implemented feature flag bypass with `skip_feature_flag=True`
parameter for admin operations
- Fixed async database client usage (`get_db_async_client`) to resolve
async/await errors
- Added batch processing with configurable size limits to handle large
datasets
- Comprehensive error handling and logging for troubleshooting
- Renamed entire feature from "Activity Backfill" to "Execution
Analytics" for clarity

**Frontend Changes:**
- Created clean admin UI for execution analytics generation at
`/admin/execution-analytics`
- Built form with graph ID input, model selection dropdown, and optional
filters
- Implemented results table with status badges and detailed execution
information
- Added CSV export functionality for analytics results
- Integrated with generated TypeScript API client for proper
authentication
- Added proper error handling with toast notifications and loading
states

**Database & API:**
- Fixed critical async/await issue by switching from sync to async
database client
- Updated router configuration and endpoint naming for consistency
- Generated proper TypeScript types and API client integration
- Applied feature flag filtering at API level while bypassing for admin
operations

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:

**Test Plan:**
- [x] Admin can access execution analytics page at
`/admin/execution-analytics`
- [x] Form validation works correctly (requires graph ID, validates
inputs)
- [x] API endpoint `/api/executions/admin/execution_analytics` responds
correctly
- [x] Authentication works properly through generated API client
- [x] Analytics generation works with different LLM models (gpt-4o-mini,
gpt-4o, etc.)
- [x] Results display correctly with appropriate status badges
(success/failed/skipped)
- [x] CSV export functionality downloads correct data
- [x] Error handling displays appropriate toast messages
- [x] Feature flag bypass works for admin users (generates analytics
regardless of user flags)
- [x] Batch processing handles multiple executions correctly
- [x] Loading states show proper feedback during processing

#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] No configuration changes required for this feature

**Related to:** PR #11325 (base correctness score functionality)

🤖 Generated with [Claude Code](https://claude.ai/code)

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Zamil Majdy <majdyz@users.noreply.github.com>
2025-11-10 10:27:44 +00:00
Zamil Majdy
6037f80502 feat(backend): Add correctness score to execution activity generation (#11325)
## Summary
Add AI-generated correctness score field to execution activity status
generation to provide quantitative assessment of how well executions
achieved their intended purpose.

New page:
<img width="1000" height="229" alt="image"
src="https://github.com/user-attachments/assets/5cb907cf-5bc7-4b96-8128-8eecccde9960"
/>

Old page:
<img width="1000" alt="image"
src="https://github.com/user-attachments/assets/ece0dfab-1e50-4121-9985-d585f7fcd4d2"
/>


## What Changed
- Added `correctness_score` field (float 0.0-1.0) to
`GraphExecutionStats` model
- **REFACTORED**: Removed duplicate `llm_utils.py` and reused existing
`AIStructuredResponseGeneratorBlock` logic
- Updated activity status generator to use structured responses instead
of plain text
- Modified prompts to include correctness assessment with 5-tier scoring
system:
  - 0.0-0.2: Failure
  - 0.2-0.4: Poor 
  - 0.4-0.6: Partial Success
  - 0.6-0.8: Mostly Successful
  - 0.8-1.0: Success
- Updated manager.py to extract and set both activity_status and
correctness_score
- Fixed tests to work with existing structured response interface

## Technical Details
- **Code Reuse**: Eliminated duplication by using existing
`AIStructuredResponseGeneratorBlock` instead of creating new LLM
utilities
- Added JSON validation with retry logic for malformed responses
- Maintained backward compatibility for existing activity status
functionality
- Score is clamped to valid 0.0-1.0 range and validated
- All type errors resolved and linting passes

## Test Plan
- [x] All existing tests pass with refactored structure
- [x] Structured LLM call functionality tested with success and error
cases
- [x] Activity status generation tested with various execution scenarios
- [x] Integration tests verify both fields are properly set in execution
stats
- [x] No code duplication - reuses existing block logic

🤖 Generated with [Claude Code](https://claude.ai/code)

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Zamil Majdy <majdyz@users.noreply.github.com>
2025-11-06 04:42:13 +00:00
Reinier van der Leer
de7c5b5c31 Merge branch 'master' into dev 2025-11-05 20:17:27 +01:00
Reinier van der Leer
d68dceb9c1 fix(backend/executor): Improve graph execution permission check (#11323)
- Resolves #11316
- Durable fix to replace #11318

### Changes 🏗️

- Expand graph execution permissions check
  - Don't require library membership for execution as sub-graph

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Can run sub-agent with non-latest graph version
- [x] Can run sub-agent that is available in Marketplace but not added
to Library
2025-11-05 17:13:41 +00:00
Zamil Majdy
193866232c hotfix(backend): fix rate-limited messages blocking queue by republishing to back (#11326)
## Summary
Fix critical queue blocking issue where rate-limited user messages
prevent other users' executions from being processed, causing the 135
late executions reported in production.

## Root Cause Analysis
When a user exceeds `max_concurrent_graph_executions_per_user` (25), the
executor uses `basic_nack(requeue=True)` which sends the message to the
**FRONT** of the RabbitMQ queue. This creates an infinite blocking loop
where:
1. Rate-limited message goes to front of queue
2. Gets processed, hits rate limit again  
3. Goes back to front of queue
4. Blocks all other users' messages indefinitely

## Solution Implementation

### 🔧 Core Changes
- **New setting**: `requeue_by_republishing` (default: `True`) in
`backend/util/settings.py`
- **Smart `_ack_message`**: Automatically uses republishing when
`requeue=True` and setting enabled
- **Efficient implementation**: Uses existing `self.run_client`
connection instead of creating new ones
- **Integration test**: Real RabbitMQ test validates queue ordering
behavior

### 🔄 Technical Implementation
**Before (blocking):**
```python
basic_nack(delivery_tag, requeue=True)  # Goes to FRONT of queue 
```

**After (non-blocking):**
```python
if requeue and self.config.requeue_by_republishing:
    # First: Republish to BACK of queue
    self.run_client.publish_message(...)
    # Then: Reject without requeue
    basic_nack(delivery_tag, requeue=False)
```

### 📊 Impact
-  **Other users' executions no longer blocked** by rate-limited users
-  **Fair queue processing** - FIFO behavior maintained for all users
-  **Rate limiting still works** - just doesn't block others
-  **Configurable** - can revert to old behavior with
`requeue_by_republishing=False`
-  **Zero performance impact** - uses existing connections

## Test Plan
- **Integration test**: `test_requeue_integration.py` validates real
RabbitMQ queue ordering
- **Scenario testing**: Confirms rate-limited messages go to back of
queue
- **Cross-user validation**: Verifies other users' messages process
correctly
- **Setting test**: Confirms configuration loads with correct defaults

## Deployment Strategy
This is a **hotfix** that can be deployed immediately:
- **Backward compatible**: Old behavior available via config
- **Safe default**: New behavior is safer than current state
- **No breaking changes**: All existing functionality preserved
- **Immediate relief**: Resolves production queue blocking

## Files Modified
- `backend/executor/manager.py`: Enhanced `_ack_message` logic and
`_requeue_message_to_back` method
- `backend/util/settings.py`: Added `requeue_by_republishing`
configuration field
- `test_requeue_integration.py`: Integration test for queue ordering
validation

## Related Issues
Fixes the 135 late executions issue where messages were stuck in QUEUED
state despite available executor capacity (583m/600m utilization).

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-11-05 16:24:07 +00:00
Zamil Majdy
910fd2640d hotfix(backend): Temporarily disable library existence check for graph execution (#11318)
### Changes 🏗️

add_store_agent_to_library does not add subagents to the user library,
this check can cause issues.

### Checklist 📋

#### For code changes:
- [ ] I have clearly listed my changes in the PR description
- [ ] I have made a test plan
- [ ] I have tested my changes according to the test plan:
  <!-- Put your test plan here: -->
  - [ ] ...

<details>
  <summary>Example test plan</summary>
  
  - [ ] Create from scratch and execute an agent with at least 3 blocks
- [ ] Import an agent from file upload, and confirm it executes
correctly
  - [ ] Upload agent to marketplace
- [ ] Import an agent from marketplace and confirm it executes correctly
  - [ ] Edit an agent from monitor, and confirm it executes correctly
</details>

#### For configuration changes:

- [ ] `.env.default` is updated or already compatible with my changes
- [ ] `docker-compose.yml` is updated or already compatible with my
changes
- [ ] I have included a list of my configuration changes in the PR
description (under **Changes**)

<details>
  <summary>Examples of configuration changes</summary>

  - Changing ports
  - Adding new services that need to communicate with each other
  - Secrets or environment variable changes
  - New or infrastructure changes such as databases
</details>
2025-11-04 13:54:48 +00:00
Zamil Majdy
5506d59da1 fix(backend/executor): make graph execution permission check version-agnostic (#11283)
## Summary
Fix critical issue where pre-execution permission validation broke
execution of graphs that reference older versions of sub-graphs.

## Problem
The `validate_graph_execution_permissions` function was checking for the
specific version of a graph in the user's library. This caused failures
when:
1. A parent graph references an older version of a sub-graph  
2. The user updates the sub-graph to a newer version
3. The older version is no longer in their library
4. Execution of the parent graph fails with `GraphNotInLibraryError`

## Root Cause
In `backend/executor/utils.py` line 523, the function was checking for
the exact version, but sub-graphs legitimately reference older versions
that may no longer be in the library.

## Solution

### 1. Remove Version-Specific Check (backend/executor/utils.py)
- Remove `graph_version=graph.version` parameter from validation call
- Add explanatory comment about version-agnostic behavior
- Now only checks that the graph ID exists in user's library (any
version)

### 2. Enhance Documentation (backend/data/graph.py)  
- Update function docstring to explain version-agnostic behavior
- Document that `None` (now default) allows execution of any version
- Clarify this is important for sub-graph version compatibility

## Technical Details
The `validate_graph_execution_permissions` function was already designed
to handle version-agnostic checks when `graph_version=None`. By omitting
the version parameter, we skip the version check and only verify:
- Graph exists in user's library  
- Graph is not deleted/archived
- User has execution permissions

## Impact
-  Parent graphs can execute even when they reference older sub-graph
versions
-  Sub-graph updates don't break existing parent graphs  
-  Maintains security: still checks library membership and permissions
-  No breaking changes: version-specific validation still available
when needed

## Example Scenario Fixed
1. User creates parent graph that uses sub-graph v1
2. User updates sub-graph to v2 (v1 removed from library)  
3. Parent graph still references sub-graph v1
4. **Before**: Execution fails with `GraphNotInLibraryError`
5. **After**: Execution succeeds (version-agnostic permission check)

## Testing
- [x] Code formatting and linting passes
- [x] Type checking passes
- [x] No breaking changes to existing functionality
- [x] Security still maintained through library membership checks

## Files Changed
- `backend/executor/utils.py`: Remove version-specific permission check
- `backend/data/graph.py`: Enhanced documentation for version-agnostic
behavior

Closes #[issue-number-if-applicable]

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-29 14:13:23 +00:00
Zamil Majdy
4922f88851 feat(backend/executor): Implement cascading stop for nested graph executions (#11277)
## Summary
Fixes critical issue where child executions spawned by
`AgentExecutorBlock` continue running after parent execution is stopped.
Implements parent-child execution tracking and recursive cascading stop
logic to ensure entire execution trees are terminated together.

## Background
When a parent graph execution containing `AgentExecutorBlock` nodes is
stopped, only the parent was terminated. Child executions continued
running, leading to:
-  Orphaned child executions consuming credits
-  No user control over execution trees  
-  Race conditions where children start after parent stops
-  Resource leaks from abandoned executions

## Core Changes

### 1. Database Schema (`schema.prisma` + migration)
```sql
-- Add nullable parent tracking field
ALTER TABLE "AgentGraphExecution" ADD COLUMN "parentGraphExecutionId" TEXT;

-- Add self-referential foreign key with graceful deletion
ALTER TABLE "AgentGraphExecution" ADD CONSTRAINT "AgentGraphExecution_parentGraphExecutionId_fkey" 
  FOREIGN KEY ("parentGraphExecutionId") REFERENCES "AgentGraphExecution"("id") 
  ON DELETE SET NULL ON UPDATE CASCADE;

-- Add index for efficient child queries
CREATE INDEX "AgentGraphExecution_parentGraphExecutionId_idx" 
  ON "AgentGraphExecution"("parentGraphExecutionId");
```

### 2. Parent ID Propagation (`backend/blocks/agent.py`)
```python
# Extract current graph execution ID and pass as parent to child
execution = add_graph_execution(
    # ... other params
    parent_graph_exec_id=graph_exec_id,  # NEW: Track parent relationship
)
```

### 3. Data Layer (`backend/data/execution.py`)
```python
async def get_child_graph_executions(parent_exec_id: str) -> list[GraphExecution]:
    """Get all child executions of a parent execution."""
    children = await AgentGraphExecution.prisma().find_many(
        where={"parentGraphExecutionId": parent_exec_id, "isDeleted": False}
    )
    return [GraphExecution.from_db(child) for child in children]
```

### 4. Cascading Stop Logic (`backend/executor/utils.py`)
```python
async def stop_graph_execution(
    user_id: str,
    graph_exec_id: str,
    wait_timeout: float = 15.0,
    cascade: bool = True,  # NEW parameter
):
    # 1. Find all child executions
    if cascade:
        children = await _get_child_executions(graph_exec_id)
        
        # 2. Stop all children recursively in parallel
        if children:
            await asyncio.gather(
                *[stop_graph_execution(user_id, child.id, wait_timeout, True) 
                  for child in children],
                return_exceptions=True,  # Don't fail parent if child fails
            )
    
    # 3. Stop the parent execution
    # ... existing stop logic
```

### 5. Race Condition Prevention (`backend/executor/manager.py`)
```python
# Before executing queued child, check if parent was terminated
if parent_graph_exec_id:
    parent_exec = get_db_client().get_graph_execution_meta(parent_graph_exec_id, user_id)
    if parent_exec and parent_exec.status == ExecutionStatus.TERMINATED:
        # Skip execution, mark child as terminated
        get_db_client().update_graph_execution_stats(
            graph_exec_id=graph_exec_id,
            status=ExecutionStatus.TERMINATED,
        )
        return  # Don't start orphaned child
```

## How It Works

### Before (Broken)
```
User stops parent execution
    ↓
Parent terminates ✓
    ↓
Child executions keep running ✗
    ↓
User cannot stop children ✗
```

### After (Fixed)
```
User stops parent execution
    ↓
Query database for all children
    ↓
Recursively stop all children in parallel
    ↓
Wait for children to terminate
    ↓
Stop parent execution
    ↓
All executions in tree stopped ✓
```

### Race Prevention
```
Child in QUEUED status
    ↓
Parent stopped
    ↓
Child picked up by executor
    ↓
Pre-flight check: parent TERMINATED?
    ↓
Yes → Skip execution, mark child TERMINATED
    ↓
Child never runs ✓
```

## Edge Cases Handled
 **Deep nesting** - Recursive cascading handles multi-level trees  
 **Queued children** - Pre-flight check prevents execution  
 **Race conditions** - Child spawned during stop operation  
 **Partial failures** - `return_exceptions=True` continues on error  
 **Multiple children** - Parallel stop via `asyncio.gather()`  
 **No parent** - Backward compatible (nullable field)  
 **Already completed** - Existing status check handles it  

## Performance Impact
- **Stop operation**: O(depth) with parallel execution vs O(1) before
- **Memory**: +36 bytes per execution (one UUID reference)
- **Database**: +1 query per tree level, indexed for efficiency

## API Changes (Backward Compatible)

### `stop_graph_execution()` - New Optional Parameter
```python
# Before
async def stop_graph_execution(user_id: str, graph_exec_id: str, wait_timeout: float = 15.0)

# After  
async def stop_graph_execution(user_id: str, graph_exec_id: str, wait_timeout: float = 15.0, cascade: bool = True)
```
**Default `cascade=True`** means existing callers get the new behavior
automatically.

### `add_graph_execution()` - New Optional Parameter
```python
async def add_graph_execution(..., parent_graph_exec_id: Optional[str] = None)
```

## Security & Safety
-  **User verification** - Users can only stop their own executions
(parent + children)
-  **No cycles** - Self-referential FK prevents infinite loops  
-  **Graceful degradation** - Errors in child stops don't block parent
stop
-  **Rate limits** - Existing execution rate limits still apply

## Testing Checklist

### Database Migration
- [x] Migration runs successfully  
- [x] Prisma client regenerates without errors
- [x] Existing tests pass

### Core Functionality  
- [ ] Manual test: Stop parent with running child → child stops
- [ ] Manual test: Stop parent with queued child → child never starts
- [ ] Unit test: Cascading stop with multiple children
- [ ] Unit test: Deep nesting (3+ levels)
- [ ] Integration test: Race condition prevention

## Breaking Changes
**None** - All changes are backward compatible with existing code.

## Rollback Plan
If issues arise:
1. **Code rollback**: Revert PR, redeploy
2. **Database rollback**: Drop column and constraints (non-destructive)

---

**Note**: This branch contains additional unrelated changes from merging
with `dev`. The core cascading stop feature involves only:
- `schema.prisma` + migration
- `backend/data/execution.py` 
- `backend/executor/utils.py`
- `backend/blocks/agent.py`
- `backend/executor/manager.py`

All other file changes are from dev branch updates and not part of this
feature.

🤖 Generated with [Claude Code](https://claude.ai/code)

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Nested graph executions: parent-child tracking and retrieval of child
executions

* **Improvements**
* Cascading stop: stopping a parent optionally terminates child
executions
  * Parent execution IDs propagated through runs and surfaced in logs
  * Per-user/graph concurrent execution limits enforced

* **Bug Fixes**
* Skip enqueuing children if parent is terminated; robust handling when
parent-status checks fail

* **Tests**
  * Updated tests to cover parent linkage in graph creation
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-29 11:11:22 +00:00
Zamil Majdy
5fb142c656 fix(backend/executor): ensure cluster lock release on all execution submission failures (#11281)
## Root Cause
During rolling deployment, execution
`97058338-052a-4528-87f4-98c88416bb7f` got stuck in QUEUED state
because:

1. Pod acquired cluster lock successfully during shutdown  
2. Subsequent setup operations failed (ThreadPoolExecutor shutdown,
resource exhaustion, etc.)
3. **No error handling existed** around the critical section after lock
acquisition
4. Cluster lock remained stuck in Redis for 5 minutes (TTL timeout)
5. Other pods couldn't acquire the lock, leaving execution permanently
queued

## The Fix

### Problem: Critical Section Not Protected
The original code had no error handling for the entire critical section
after successful lock acquisition:
```python
# Original code - no error handling after lock acquired
current_owner = cluster_lock.try_acquire()
if current_owner != self.executor_id:
    return  # didn't get lock
    
# CRITICAL SECTION - any failure here leaves lock stuck
self._execution_locks[graph_exec_id] = cluster_lock  # Could fail: memory
logger.info("Acquired cluster lock...")              # Could fail: logging  
cancel_event = threading.Event()                     # Could fail: resources
future = self.executor.submit(...)                   # Could fail: shutdown
self.active_graph_runs[...] = (future, cancel_event) # Could fail: memory
```

### Solution: Wrap Entire Critical Section  
Protect ALL operations after successful lock acquisition:
```python
# Fixed code - comprehensive error handling
current_owner = cluster_lock.try_acquire()
if current_owner != self.executor_id:
    return  # didn't get lock

# Wrap ENTIRE critical section after successful acquisition
try:
    self._execution_locks[graph_exec_id] = cluster_lock
    logger.info("Acquired cluster lock...")
    cancel_event = threading.Event()
    future = self.executor.submit(...)
    self.active_graph_runs[...] = (future, cancel_event)
except Exception as e:
    # Release cluster lock before requeue
    cluster_lock.release()
    del self._execution_locks[graph_exec_id] 
    _ack_message(reject=True, requeue=True)
    return
```

### Why This Comprehensive Approach Works
- **Complete protection**: Any failure in critical section → lock
released
- **Proper cleanup order**: Lock released → message requeued → another
pod can try
- **Uses existing infrastructure**: Leverages established
`_ack_message()` requeue logic
- **Handles all scenarios**: ThreadPoolExecutor shutdown, resource
exhaustion, memory issues, logging failures

## Protected Failure Scenarios
1. **Memory exhaustion**: `_execution_locks` assignment or
`active_graph_runs` assignment
2. **Resource exhaustion**: `threading.Event()` creation fails
3. **ThreadPoolExecutor shutdown**: `executor.submit()` with "cannot
schedule new futures after shutdown"
4. **Logging system failures**: `logger.info()` calls fail
5. **Any unexpected exceptions**: Network issues, disk problems, etc.

## Validation
-  All existing tests pass  
-  Maintains exact same success path behavior
-  Comprehensive error handling for all failure points
-  Minimal code change with maximum protection

## Impact
- **Eliminates stuck executions** during pod lifecycle events (rolling
deployments, scaling, crashes)
- **Faster recovery**: Immediate requeue vs 5-minute Redis TTL wait
- **Higher reliability**: Handles ANY failure in the critical section
- **Production-ready**: Comprehensive solution for distributed lock
management

This prevents the exact race condition that caused execution
`97058338-052a-4528-87f4-98c88416bb7f` to be stuck for >300 seconds,
plus many other potential failure scenarios.

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-29 08:56:24 +00:00
Zamil Majdy
de70ede54a fix(backend): prevent execution of deleted agents and cleanup orphaned resources (#11243)
## Summary
Fix critical bug where deleted agents continue running scheduled and
triggered executions indefinitely, consuming credits without user
control.

## Problem
When agents are deleted from user libraries, their schedules and webhook
triggers remain active, leading to:
-  Uncontrolled resource consumption 
-  "Unknown agent" executions that charge credits
-  No way for users to stop orphaned executions
-  Accumulation of orphaned database records

## Solution

### 1. Prevention: Library Validation Before Execution
- Add `is_graph_in_user_library()` function with efficient database
queries
- Validate graph accessibility before all executions in
`validate_and_construct_node_execution_input()`
- Use specific `GraphNotInLibraryError` for clear error handling

### 2. Cleanup: Remove Schedules & Webhooks on Deletion
- Enhanced `delete_library_agent()` to clean up associated schedules and
webhooks
- Comprehensive cleanup functions for both scheduled and triggered
executions
- Proper database transaction handling

### 3. Error-Based Cleanup: Handle Existing Orphaned Resources
- Catch `GraphNotInLibraryError` in scheduler and webhook handlers
- Automatically clean up orphaned resources when execution fails
- Graceful degradation without breaking existing workflows

### 4. Migration: Clean Up Historical Orphans
- SQL migration to remove existing orphaned schedules and webhooks
- Performance index for faster cleanup queries
- Proper logging and error handling

## Key Changes

### Core Library Validation
```python
# backend/data/graph.py - Single source of truth
async def is_graph_in_user_library(graph_id: str, user_id: str, graph_version: Optional[int] = None) -> bool:
    where_clause = {"userId": user_id, "agentGraphId": graph_id, "isDeleted": False, "isArchived": False}
    if graph_version is not None:
        where_clause["agentGraphVersion"] = graph_version
    count = await LibraryAgent.prisma().count(where=where_clause)
    return count > 0
```

### Enhanced Agent Deletion
```python
# backend/server/v2/library/db.py
async def delete_library_agent(library_agent_id: str, user_id: str, soft_delete: bool = True) -> None:
    # ... existing deletion logic ...
    await _cleanup_schedules_for_graph(graph_id=graph_id, user_id=user_id)
    await _cleanup_webhooks_for_graph(graph_id=graph_id, user_id=user_id)
```

### Execution Prevention
```python
# backend/executor/utils.py
if not await gdb.is_graph_in_user_library(graph_id=graph_id, user_id=user_id, graph_version=graph.version):
    raise GraphNotInLibraryError(f"Graph #{graph_id} is not accessible in your library")
```

### Error-Based Cleanup
```python
# backend/executor/scheduler.py & backend/server/integrations/router.py
except GraphNotInLibraryError as e:
    logger.warning(f"Execution blocked for deleted/archived graph {graph_id}")
    await _cleanup_orphaned_resources_for_graph(graph_id, user_id)
```

## Technical Implementation

### Database Efficiency
- Use `count()` instead of `find_first()` for faster queries
- Add performance index: `idx_library_agent_user_graph_active`
- Follow existing `prisma.is_connected()` patterns

### Error Handling Hierarchy
- **`GraphNotInLibraryError`**: Specific exception for deleted/archived
graphs
- **`NotAuthorizedError`**: Generic authorization errors (preserved for
user ID mismatches)
- Clear error messages for better debugging

### Code Organization
- Single source of truth for library validation in
`backend/data/graph.py`
- Import from centralized location to avoid duplication
- Top-level imports following codebase conventions

## Testing & Validation

### Functional Testing
-  Library validation prevents execution of deleted agents
-  Cleanup functions remove schedules and webhooks properly  
-  Error-based cleanup handles orphaned resources gracefully
-  Migration removes existing orphaned records

### Integration Testing
-  All existing tests pass (including `test_store_listing_graph`)
-  No breaking changes to existing functionality
-  Proper error propagation and handling

### Performance Testing
-  Efficient database queries with proper indexing
-  Minimal overhead for normal execution flows
-  Cleanup operations don't impact performance

## Impact

### User Experience
- 🎯 **Immediate**: Deleted agents stop running automatically
- 🎯 **Ongoing**: No more unexpected credit charges from orphaned
executions
- 🎯 **Cleanup**: Historical orphaned resources are removed

### System Reliability
- 🔒 **Security**: Users can only execute agents they have access to
- 🧹 **Cleanup**: Automatic removal of orphaned database records
- 📈 **Performance**: Efficient validation with minimal overhead

### Developer Experience
- 🎯 **Clear Errors**: Specific exception types for better debugging
- 🔧 **Maintainable**: Centralized library validation logic
- 📚 **Documented**: Comprehensive error handling patterns

## Files Modified
- `backend/data/graph.py` - Library validation function
- `backend/server/v2/library/db.py` - Enhanced agent deletion with
cleanup
- `backend/executor/utils.py` - Execution validation and prevention
- `backend/executor/scheduler.py` - Error-based cleanup for schedules
- `backend/server/integrations/router.py` - Error-based cleanup for
webhooks
- `backend/util/exceptions.py` - Specific error type for deleted graphs
-
`migrations/20251023000000_cleanup_orphaned_schedules_and_webhooks/migration.sql`
- Historical cleanup

## Breaking Changes
None. All changes are backward compatible and preserve existing
functionality.

## Follow-up Tasks
- [ ] Monitor cleanup effectiveness in production
- [ ] Consider adding metrics for orphaned resource detection
- [ ] Potential optimization of cleanup batch operations

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-28 23:48:35 +00:00
Reinier van der Leer
5e5f45a713 fix(backend): Fix various warnings (#11252)
- Resolves #11251

This fixes all the warnings mentioned in #11251, reducing noise and
making our logs and error alerts more useful :)

### Changes 🏗️

- Remove "Block {block_name} has multiple credential inputs" warning
(not actually an issue)
- Rename `json` attribute of `MainCodeExecutionResult` to `json_data`;
retain serialized name through a field alias
- Replace `Path(regex=...)` with `Path(pattern=...)` in
`get_shared_execution` endpoint parameter config
- Change Uvicorn's WebSocket module to new Sans-I/O implementation for
WS server
- Disable Uvicorn's WebSocket module for REST server
- Remove deprecated `enable_cleanup_closed=True` argument in
`CloudStorageHandler` implementation
- Replace Prisma transaction timeout `int` argument with a `timedelta`
value
- Update Sentry SDK to latest version (v2.42.1)
- Broaden filter for cleanup warnings from indirect dependency `litellm`
- Fix handling of `MissingConfigError` in REST server endpoints

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - Check that the warnings are actually gone
- [x] Deploy to dev environment and run a graph; check for any warnings
  - Test WebSocket server
- [x] Run an agent in the Builder; make sure real-time execution updates
still work
2025-10-28 13:18:45 +00:00
Reinier van der Leer
e06e7ff33f fix(backend): Implement graceful shutdown in AppService to prevent RPC errors (#11240)
We're currently seeing errors in the `DatabaseManager` while it's
shutting down, like:

```
WARNING [DatabaseManager] Termination request: SystemExit; 0 executing cleanup.
INFO [DatabaseManager]  Disconnecting Database...
INFO [PID-1|THREAD-29|DatabaseManager|Prisma-82fb1994-4b87-40c1-8869-fbd97bd33fc8] Releasing connection started...
INFO [PID-1|THREAD-29|DatabaseManager|Prisma-82fb1994-4b87-40c1-8869-fbd97bd33fc8] Releasing connection completed successfully.
INFO [DatabaseManager] Terminated.
ERROR POST /create_or_add_to_user_notification_batch failed: Failed to create or add to notification batch for user {user_id} and type AGENT_RUN: NoneType: None
```

This indicates two issues:
- The service doesn't wait for pending RPC calls to finish before
terminating
- We're using `logger.exception` outside an error handling context,
causing the confusing and not much useful `NoneType: None` to be printed
instead of error info

### Changes 🏗️

- Implement graceful shutdown in `AppService` so in-flight RPC calls can
finish
  - Add tests for graceful shutdown
  - Prevent `AppService` accepting new requests during shutdown
- Rework `AppService` lifecycle management; add support for async
`lifespan`
- Fix `AppService` endpoint error logging
- Improve logging in `AppProcess` and `AppService`

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- Deploy to Dev cluster, then `kubectl rollout restart` the different
services a few times
    - [x] -> `DatabaseManager` doesn't break on re-deployment
    - [x] -> `Scheduler` doesn't break on re-deployment
    - [x] -> `NotificationManager` doesn't break on re-deployment
2025-10-25 14:47:19 +00:00
Reinier van der Leer
04df981115 fix(backend): Fix structured logging for cloud environments (#11227)
- Resolves #11226

### Changes 🏗️

- Drop use of `CloudLoggingHandler` which docs state isn't for use in
GKE
- For cloud logging, output only structured log entries to `stdout`

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Test deploy to dev and check logs
2025-10-21 12:48:41 +00:00
Zamil Majdy
11d55f6055 fix(backend/executor): Avoid running direct query in executor (#11224)
## Summary
- Fixes database connection warnings in executor logs: "Client is not
connected to the query engine, you must call `connect()` before
attempting to query data"
- Implements resilient database client pattern already used elsewhere in
the codebase
- Adds caching to reduce database load for user context lookups

## Changes
- Updated `get_user_context()` to check `prisma.is_connected()` and fall
back to database manager client
- Added `@cached(maxsize=1000, ttl_seconds=3600)` decorator for
performance optimization
- Updated database manager to expose `get_user_by_id` method

## Test plan
- [x] Verify executor pods no longer show Prisma connection warnings
- [x] Confirm user timezone is still correctly retrieved
- [x] Test fallback behavior when Prisma is disconnected

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-21 08:46:40 +00:00
Zamil Majdy
0bb2b87c32 fix(backend): resolve UserBalance migration issues and credit spending bug (#11192)
## Summary
Fix critical UserBalance migration and spending issues affecting users
with credits from transaction history but no UserBalance records.

## Root Issues Fixed

### Issue 1: UserBalance Migration Complexity
- **Problem**: Complex data migration with timestamp logic issues and
potential race conditions
- **Solution**: Simplified to idempotent table creation only,
application handles auto-population

### Issue 2: Credit Spending Bug  
- **Problem**: Users with $10.0 from transaction history couldn't spend
$0.16
- **Root Cause**: `_add_transaction` and `_enable_transaction` only
checked UserBalance table, returning 0 balance for users without records
- **Solution**: Enhanced both methods with transaction history fallback
logic

### Issue 3: Exception Handling Inconsistency
- **Problem**: Raw SQL unique violations raised different exception
types than Prisma ORM
- **Solution**: Convert raw SQL unique violations to
`UniqueViolationError` at source

## Changes Made

### Migration Cleanup
- **Idempotent operations**: Use `CREATE TABLE IF NOT EXISTS`, `CREATE
INDEX IF NOT EXISTS`
- **Inline foreign key**: Define constraint within `CREATE TABLE`
instead of separate `ALTER TABLE`
- **Removed data migration**: Application creates UserBalance records
on-demand
- **Safe to re-run**: No errors if table/index/constraint already exists

### Credit Logic Fixes
- **Enhanced `_add_transaction`**: Added transaction history fallback in
`user_balance_lock` CTE
- **Enhanced `_enable_transaction`**: Added same fallback logic for
payment fulfillment
- **Exception normalization**: Convert raw SQL unique violations to
`UniqueViolationError`
- **Simplified `onboarding_reward`**: Use standardized
`UniqueViolationError` catching

### SQL Fallback Pattern
```sql
COALESCE(
    (SELECT balance FROM UserBalance WHERE userId = ? FOR UPDATE),
    -- Fallback: compute from transaction history if UserBalance doesn't exist
    (SELECT COALESCE(ct.runningBalance, 0) 
     FROM CreditTransaction ct 
     WHERE ct.userId = ? AND ct.isActive = true AND ct.runningBalance IS NOT NULL 
     ORDER BY ct.createdAt DESC LIMIT 1),
    0
) as balance
```

## Impact

### Before
-  Users with transaction history but no UserBalance couldn't spend
credits
-  Migration had complex timestamp logic with potential bugs  
-  Raw SQL and Prisma exceptions handled differently
-  Error: "Insufficient balance of $10.0, where this will cost $0.16"

### After  
-  Seamless spending for all users regardless of UserBalance record
existence
-  Simple, idempotent migration that's safe to re-run
-  Consistent exception handling across all credit operations
-  Automatic UserBalance record creation during first transaction
-  Backward compatible - existing users unaffected

## Business Value
- **Eliminates user frustration**: Users can spend their credits
immediately
- **Smooth migration path**: From old User.balance to new UserBalance
table
- **Better reliability**: Atomic operations with proper error handling
- **Maintainable code**: Consistent patterns across credit operations

## Test Plan
- [ ] Manual testing with users who have transaction history but no
UserBalance records
- [ ] Verify migration can be run multiple times safely
- [ ] Test spending credits works for all user scenarios
- [ ] Verify payment fulfillment (`_enable_transaction`) works correctly
- [ ] Add comprehensive test coverage for this scenario

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-17 19:46:13 +07:00
Zamil Majdy
dfdd632161 fix(backend/util): handle nested Pydantic models in SafeJson (#11188)
## Summary

Fixes a critical serialization bug introduced in PR #11187 where
`SafeJson` failed to serialize dictionaries containing Pydantic models,
causing 500 Internal Server Errors in the executor service.

## Problem

The error manifested as:
```
CRITICAL: Operation Approaching Failure Threshold: Service communication: '_call_method_async'
Current attempt: 50/50
Error: HTTPServerError: HTTP 500: Server error '500 Internal Server Error' 
for url 'http://autogpt-database-manager.prod-agpt.svc.cluster.local:8005/create_graph_execution'
```

Root cause in `create_graph_execution`
(backend/data/execution.py:656-657):
```python
"credentialInputs": SafeJson(credential_inputs) if credential_inputs else Json({})
```

Where `credential_inputs: Mapping[str, CredentialsMetaInput]` is a dict
containing Pydantic models.

After PR #11187's refactor, `_sanitize_value()` only converted top-level
BaseModel instances to dicts, but didn't handle BaseModel instances
nested inside dicts/lists/tuples. This caused Prisma's JSON serializer
to fail with:
```
TypeError: Type <class 'backend.data.model.CredentialsMetaInput'> not serializable
```

## Solution

Added BaseModel handling to `_sanitize_value()` to recursively convert
Pydantic models to dicts before sanitizing:

```python
elif isinstance(value, BaseModel):
    # Convert Pydantic models to dict and recursively sanitize
    return _sanitize_value(value.model_dump(exclude_none=True))
```

This ensures all nested Pydantic models are properly serialized
regardless of nesting depth.

## Changes

- **backend/util/json.py**: Added BaseModel check to `_sanitize_value()`
function
- **backend/util/test_json.py**: Added 6 comprehensive tests covering:
  - Dict containing Pydantic models
  - Deeply nested Pydantic models  
  - Lists of Pydantic models in dicts
  - The exact CredentialsMetaInput scenario
  - Complex mixed structures
  - Models with control characters

## Testing

 All new tests pass  
 Verified fix resolves the production 500 error  
 Code formatted with `poetry run format`

## Related

- Fixes issues introduced in PR #11187
- Related to executor service 500 errors in production

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Bentlybro <Github@bentlybro.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-10-17 09:27:09 +00:00
Zamil Majdy
374f35874c feat(platform): Add LaunchDarkly flag for platform payment system (#11181)
## Summary

Implement selective rollout of payment functionality using LaunchDarkly
feature flags to enable gradual deployment to pilot users.

- Add `ENABLE_PLATFORM_PAYMENT` flag to control credit system behavior
- Update `get_user_credit_model` to use user-specific flag evaluation  
- Replace hardcoded `NEXT_PUBLIC_SHOW_BILLING_PAGE` with LaunchDarkly
flag
- Enable payment UI components only for flagged users
- Maintain backward compatibility with existing beta credit system
- Default to beta monthly credits when flag is disabled
- Fix tests to work with new async credit model function

## Key Changes

### Backend
- **Credit Model Selection**: The `get_user_credit_model()` function now
takes a `user_id` parameter and uses LaunchDarkly to determine which
credit model to return:
- Flag enabled → `UserCredit` (payment system enabled, no monthly
refills)
- Flag disabled → `BetaUserCredit` (current behavior with monthly
refills)
  
- **Flag Integration**: Added `ENABLE_PLATFORM_PAYMENT` flag and
integrated LaunchDarkly evaluation throughout the credit system

- **API Updates**: All credit-related endpoints now use the
user-specific credit model instead of a global instance

### Frontend
- **Dynamic UI**: Payment-related components (billing page, wallet
refill) now show/hide based on the LaunchDarkly flag
- **Removed Environment Variable**: Replaced
`NEXT_PUBLIC_SHOW_BILLING_PAGE` with runtime flag evaluation

### Testing
- **Test Fixes**: Updated all tests that referenced the removed global
`_user_credit_model` to use proper mocking of the new async function

## Deployment Strategy

This implementation enables a controlled rollout:
1. Deploy with flag disabled (default) - no behavior change for existing
users
2. Enable flag for pilot/beta users via LaunchDarkly dashboard
3. Monitor usage and feedback from pilot users
4. Gradually expand to more users
5. Eventually enable for all users once validated

## Test Plan

- [x] Unit tests pass for credit system components
- [x] Payment UI components show/hide correctly based on flag
- [x] Default behavior (flag disabled) maintains current functionality
- [x] Flag enabled users get payment system without monthly refills
- [x] Admin credit operations work correctly
- [x] Backward compatibility maintained

🤖 Generated with [Claude Code](https://claude.ai/code)

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-17 06:11:39 +00:00
Nicholas Tindle
b230b1b5cf feat(backend): Add Sentry user and tag tracking to node execution (#11170)
Integrates Sentry SDK to set user and contextual tags during node
execution for improved error tracking and user count analytics. Ensures
Sentry context is properly set and restored, and exceptions are captured
with relevant context before scope restoration.

<!-- Clearly explain the need for these changes: -->

### Changes 🏗️
Adds sentry tracking to block failures
<!-- Concisely describe all of the changes made in this pull request:
-->

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  <!-- Put your test plan here: -->
  - [x] Test to make sure the userid and block details show up in Sentry
  - [x] make sure other errors aren't contaminated 


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- New Features
- Added conditional support for feature flags when configured, enabling
targeted rollouts and experiments without impacting unconfigured
environments.

- Chores
- Enhanced error monitoring with richer contextual data during node
execution to improve stability and diagnostics.
- Updated metrics initialization to dynamically include feature flag
integrations when available, without altering behavior for unconfigured
setups.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-10-15 14:33:08 +00:00
Zamil Majdy
934cb3a9c7 feat(backend): Make execution limit per user per graph and reduce to 25 (#11169)
## Summary
- Changed max_concurrent_graph_executions_per_user from 50 to 25
concurrent executions
- Updated the limit to be per user per graph instead of globally per
user
- Users can now run different graphs concurrently without being limited
by executions of other graphs
- Enhanced database query to filter by both user_id and graph_id

## Changes Made
- **Settings**: Reduced default limit from 50 to 25 and updated
description to clarify per-graph scope
- **Database Layer**: Modified `get_graph_executions_count` to accept
optional `graph_id` parameter
- **Executor Manager**: Updated rate limiting logic to check
per-user-per-graph instead of per-user globally
- **Logging**: Enhanced warning messages to include graph_id context

## Test plan
- [ ] Verify that users can run up to 25 concurrent executions of the
same graph
- [ ] Verify that users can run different graphs concurrently without
interference
- [ ] Test rate limiting behavior when limit is exceeded for a specific
graph
- [ ] Confirm logging shows correct graph_id context in rate limit
messages

## Impact
This change improves the user experience by allowing concurrent
execution of different graphs while still preventing resource exhaustion
from running too many instances of the same graph.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-15 00:02:55 +00:00
seer-by-sentry[bot]
20acd8b51d fix(backend): Improve Postmark error handling and logging for notification delivery (#11052)
<!-- Clearly explain the need for these changes: -->
Fixes
[AUTOGPT-SERVER-5K6](https://sentry.io/organizations/significant-gravitas/issues/6887660207/).
The issue was that: Batch sending fails due to malformed data (422) and
inactive recipients (406); the 406 error is misclassified as a size
limit failure.

- Implements more robust error handling for Postmark API failures during
notification sending.
- Specifically handles inactive recipients (HTTP 406), malformed data
(HTTP 422), and oversized notifications.
- Adds detailed logging for each error case, including the notification
index and error message.
- Skips individual notifications that fail due to these errors,
preventing the entire batch from failing.
- Improves error handling for ValueErrors during send_templated calls,
specifically addressing oversized notifications.


This fix was generated by Seer in Sentry, triggered by Nicholas Tindle.
👁️ Run ID: 1675950

Not quite right? [Click here to continue debugging with
Seer.](https://sentry.io/organizations/significant-gravitas/issues/6887660207/?seerDrawer=true)

### Changes 🏗️

<!-- Concisely describe all of the changes made in this pull request:
-->
- Implements more robust error handling for Postmark API failures during
notification sending.
- Specifically handles inactive recipients (HTTP 406), malformed data
(HTTP 422), and oversized notifications.
- Adds detailed logging for each error case, including the notification
index and error message.
- Skips individual notifications that fail due to these errors,
preventing the entire batch from failing.
- Improves error handling for ValueErrors during send_templated calls,
specifically addressing oversized notifications.
- Also disables this in prod to prevent scaling issues until we work out
some of the more critical issues

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  <!-- Put your test plan here: -->
- [x] Test sending notifications with invalid email addresses to ensure
406 errors are handled correctly.
- [x] Test sending notifications with malformed data to ensure 422
errors are handled correctly.
- [x] Test sending oversized notifications to ensure they are skipped
and logged correctly.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- New Features
  - None

- Bug Fixes
- Individual email failures no longer abort a batch; processing
continues after per-recipient errors.
- Specific handling for inactive recipients and malformed messages to
prevent repeated delivery attempts.

- Chores
  - Improved error logging and diagnostics for email delivery scenarios.

- Tests
- Added tests covering email-sending error cases, user-deactivation on
inactive addresses, and batch-continuation behavior.

- Documentation
  - None
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: seer-by-sentry[bot] <157164994+seer-by-sentry[bot]@users.noreply.github.com>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Nicholas Tindle <ntindle@users.noreply.github.com>
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
2025-10-13 07:16:48 +00:00
Zamil Majdy
05a72f4185 feat(backend): implement user rate limiting for concurrent graph executions (#11128)
## Summary
Add configurable rate limiting to prevent users from exceeding the
maximum number of concurrent graph executions, defaulting to 50 per
user.

## Changes Made

### Configuration (`backend/util/settings.py`)
- Add `max_concurrent_graph_executions_per_user` setting (default: 50,
range: 1-1000)
- Configurable via environment variables or settings file

### Database Query Function (`backend/data/execution.py`) 
- Add `get_graph_executions_count()` function for efficient count
queries
- Supports filtering by user_id, statuses, and time ranges
- Used to check current RUNNING/QUEUED executions per user

### Database Manager Integration (`backend/executor/database.py`)
- Expose `get_graph_executions_count` through DatabaseManager RPC
interface
- Follows existing patterns for database operations
- Enables proper service-to-service communication

### Rate Limiting Logic (`backend/executor/manager.py`)
- Inline rate limit check in `_handle_run_message()` before cluster lock
- Use existing `db_client` pattern for consistency
- Reject and requeue executions when limit exceeded
- Graceful error handling - proceed if rate limit check fails
- Enhanced logging with user_id and current/max execution counts

## Technical Implementation
- **Database approach**: Query actual execution statuses for accuracy
- **RPC pattern**: Use DatabaseManager client following existing
codebase patterns
- **Fail-safe design**: Proceed with execution if rate limit check fails
- **Requeue on limit**: Rejected executions are requeued for later
processing
- **Early rejection**: Check rate limit before expensive cluster lock
operations

## Rate Limiting Flow
1. Parse incoming graph execution request
2. Query database via RPC for user's current RUNNING/QUEUED execution
count
3. Compare against configured limit (default: 50)
4. If limit exceeded: reject and requeue message
5. If within limit: proceed with normal execution flow

## Configuration Example
```env
MAX_CONCURRENT_GRAPH_EXECUTIONS_PER_USER=25  # Reduce to 25 for stricter limits
```

## Test plan
- [x] Basic functionality tested - settings load correctly, database
function works
- [x] ExecutionManager imports and initializes without errors
- [x] Database manager exposes the new function through RPC
- [x] Code follows existing patterns and conventions
- [ ] Integration testing with actual rate limiting scenarios
- [ ] Performance testing to ensure minimal impact on execution pipeline

🤖 Generated with [Claude Code](https://claude.ai/code)

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-11 08:02:34 +07:00
Bently
4856bd1f3a fix(backend): prevent sub-agent execution visibility across users (#11132)
Fixes a issue where sub-agent executions triggered by one user were
visible in the original agent author's execution library.
 ## Solution

Fixed the user_id attribution in
`autogpt_platform/backend/backend/executor/manager.py` by ensuring that
sub-agent executions always use the actual executor's user_id rather
than the agent author's user_id stored in node defaults.

### Changes
- Added user_id override in `execute_node()` function when preparing
AgentExecutorBlock input (line 194)
- Ensures sub-agent executions are correctly attributed to the user
running them, not the agent author
- Maintains proper privacy isolation between users in marketplace agent
scenarios

### Security Impact
- **Before**: When User B downloaded and ran a marketplace agent
containing sub-agents owned by User A, the sub-agent executions appeared
in User A's library
- **After**: Sub-agent executions now only appear in the library of the
user who actually ran them
- Prevents unauthorized access to execution data and user privacy
violation

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  <!-- Test plan: -->
  - [x] Create an agent with sub-agents as User A
  - [x] Publish agent to marketplace
  - [x] Run the agent as User B
- [x] Verify User A cannot see User B's sub-agent executions in their
library
  - [x] Verify User B can see their own sub-agent executions
  - [x] Verify primary agent executions remain correctly filtered
2025-10-09 11:17:26 +00:00
Zamil Majdy
59c27fe248 feat(backend): implement comprehensive rate-limited Discord alerting system (#11106)
## Summary
Implement comprehensive Discord alerting system with intelligent rate
limiting to prevent spam and provide proper visibility into system
failures across retry mechanisms and execution errors.

## Key Features

### 🚨 Rate-Limited Discord Alerting Infrastructure
- **Reusable rate-limited alerts**: `send_rate_limited_discord_alert()`
function for any Discord alerts
- **5-minute rate limiting**: Prevents spam for identical error
signatures (function+error+context)
- **Thread-safe**: Proper locking for concurrent alert attempts
- **Configurable channels**: Support custom Discord channels or default
to PLATFORM
- **Graceful failure handling**: Alert failures don't break main
application flow

### 🔄 Enhanced Retry Alert System
- **Unified threshold alerting**: Both general retries and
infrastructure retries alert at EXCESSIVE_RETRY_THRESHOLD (50 attempts)
- **Critical retry alerts**: Early warning when operations approach
failure threshold
- **Infrastructure monitoring**: Dedicated alerts for database, Redis,
RabbitMQ connection issues
- **Rate limited**: All retry alerts use rate limiting to prevent
overwhelming Discord channels

### 📊 Unknown Execution Error Alerts
- **Automatic error detection**: Alert for unexpected graph execution
failures
- **Rich context**: Include user ID, graph ID, execution ID, error type
and message
- **Filtered alerts**: Skip known errors (InsufficientBalanceError,
ModerationError)
- **Proper error tracking**: Ensure execution_stats.error is set for all
error types

## Technical Implementation

### Rate Limiting Strategy
```python
# Create unique signatures based on function+error+context
error_signature = f"{context}:{func_name}:{type(exception).__name__}:{str(exception)[:100]}"
```
- **5-minute windows**: ALERT_RATE_LIMIT_SECONDS = 300 prevents
duplicate alerts
- **Memory efficient**: Only store last alert timestamp per unique error
signature
- **Context awareness**: Same error in different contexts can send
separate alerts

### Alerting Hierarchy
1. **50 attempts**: Critical alert warning about approaching failure
(EXCESSIVE_RETRY_THRESHOLD)
2. **100 attempts**: Final infrastructure failure (conn_retry max_retry)
3. **Unknown execution errors**: Immediate rate-limited alerts for
unexpected failures

## Files Modified

### Core Implementation
- `backend/executor/manager.py`: Unknown execution error alerts with
rate limiting
- `backend/util/retry.py`: Comprehensive rate-limited alerting
infrastructure
- `backend/util/retry_test.py`: Full test coverage for rate limiting
functionality (14 tests)

### Code Quality Improvements
- **Inlined alert messages**: Eliminated unnecessary temporary variables
- **Simplified logic**: Removed excessive comments and redundant alerts
- **Consistent patterns**: All alert functions follow same clean code
style
- **DRY principle**: Reusable rate-limited alert system for future
monitoring needs

## Benefits

### 🛡️ Prevents Alert Spam
- **Rate limiting**: No more overwhelming Discord channels with
duplicate alerts
- **Intelligent deduplication**: Same errors rate limited while
different errors get through
- **Thread safety**: Concurrent operations handled correctly

### 🔍 Better System Visibility  
- **Unknown errors**: Issues that need investigation are properly
surfaced
- **Infrastructure monitoring**: Early warning for
database/Redis/RabbitMQ issues
- **Rich context**: All necessary debugging information included in
alerts

### 🧹 Maintainable Codebase
- **Reusable infrastructure**: `send_rate_limited_discord_alert()` for
future monitoring
- **Clean, consistent code**: Inlined messages, simplified logic, proper
abstractions
- **Comprehensive testing**: Rate limiting edge cases and real-world
scenarios covered

## Validation Results
-  All 14 retry tests pass including comprehensive rate limiting
coverage
-  Manager execution tests pass validating integration with execution
flow
-  Thread safety validated with concurrent alert attempt tests
-  Real-world scenarios tested including the specific spend_credits
spam issue that motivated this work
-  Code formatting, linting, and type checking all pass

## Before/After Comparison

### Before
- No rate limiting → Discord spam for repeated errors
- Unknown execution errors not monitored → Issues went unnoticed  
- Inconsistent alerting thresholds → Confusing monitoring
- Verbose code with temporary variables → Harder to maintain

### After  
-  Rate-limited intelligent alerting prevents spam
-  Unknown execution errors properly monitored with context
-  Unified 50-attempt threshold for consistent monitoring
-  Clean, maintainable code with reusable infrastructure

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-09 08:22:15 +07:00
Zamil Majdy
4e1557e498 fix(backend): Add dynamic input pin support for Smart Decision Maker Block (#11082)
## Summary

- Centralize dynamic field delimiters and helpers in
backend/data/dynamic_fields.py.
- Refactor SmartDecisionMaker: build function signatures with
dynamic-field mapping and re-map tool outputs back to original dynamic
names.
- Deterministic retry loop with retry-only feedback to avoid polluting
final conversation history.
- Update executor/utils.py and data/graph.py to use centralized
utilities.
- Update and extend tests: dynamic-field E2E flow, mapping verification,
output yielding, and retry validation; switch mocked llm_call to
AsyncMock; align tool-name expectations.
- Add a single-tool fallback in schema lookup to support mocked
scenarios.

## Validation

- Full backend test suite: 1125 passed, 88 skipped, 53 warnings (local).
- Backend lint/format pass.

## Scope

- Minimal and localized to SmartDecisionMaker and dynamic-field
utilities; unrelated pyright warnings remain unchanged.

## Risks/Mitigations

- Behavior is backward-compatible; dynamic-field constants are
centralized and reused.
- Output re-mapping only affects SmartDecisionMaker tool outputs and
matches existing link naming conventions.

## Checklist

- [x] Formatted and linted
- [x] All updated tests pass locally
- [x] No secrets introduced

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-04 14:23:13 +00:00
Zamil Majdy
57a06f7088 fix(blocks, security): Fixes for various DoS vulnerabilities (#10798)
This PR addresses multiple critical and medium security vulnerabilities
that could lead to Denial of Service (DoS) attacks. All fixes implement
defense-in-depth strategies with comprehensive testing.

### Changes 🏗️

#### **Critical Security Fixes:**

1. **GHSA-m2wr-7m3r-p52c - ReDoS in CodeExtractionBlock** 
- Fixed catastrophic backtracking in regex patterns `\s+[\s\S]*?` and
`\s+(.*?)`
   - Replaced with safer patterns: `[ \t]*\n([^\s\S]*?)`
   - Files: `backend/blocks/code_extraction_block.py`

2. **GHSA-955p-gpfx-r66j - AITextSummarizerBlock Memory Amplification**
   - Added 1MB text size limit and 100 chunk maximum
   - Prevents 10K input → 50G memory amplification attacks
   - Files: `backend/blocks/llm.py`

3. **GHSA-5cqw-g779-9f9x - RSS Feed XML Bomb DoS**
   - Added 10MB feed size limit and 30s timeout
   - Prevents deep XML parsing memory exhaustion
   - Files: `backend/blocks/rss.py`

4. **GHSA-7g34-7fvq-xxq6 - File Storage Disk Exhaustion**
   - Added 100MB per file and 1GB per execution directory limits
   - Prevents disk space exhaustion from file uploads
   - Files: `backend/util/file.py`

5. **GHSA-pppq-xx2w-7jpq - ExtractTextInformationBlock ReDoS**
   - Added 1MB text limit, 1000 match limit, and 5s timeout protection
   - Prevents lookahead pattern memory exhaustion
   - Files: `backend/blocks/text.py`

6. **GHSA-vw3v-whvp-33v5 - Docker Logging Disk Exhaustion**
- Added log rotation limits at Docker (10MB × 3 files) and application
levels
   - Prevents unbounded log growth causing disk exhaustion
- Files: `docker-compose.platform.yml`,
`autogpt_libs/autogpt_libs/logging/config.py`

#### **Additional Security Improvements:**

7. **StepThroughItemsBlock DoS Prevention**
   - Added 10,000 item limit and 1MB input size limit
   - Prevents large iteration DoS attacks
   - Files: `backend/blocks/iteration.py`

8. **XMLParserBlock XML Bomb Prevention**
   - Added 10MB XML input size limit
   - Files: `backend/blocks/xml_parser.py`

#### **Code Quality:**
- Fixed Python 3.10 typing compatibility issues
- Added comprehensive security test suite
- All code formatted and linted

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Created comprehensive security test suite covering all
vulnerabilities
  - [x] Verified ReDoS patterns are fixed and don't cause timeouts
  - [x] Confirmed memory limits prevent amplification attacks
  - [x] Tested file size limits prevent disk exhaustion
  - [x] Validated log rotation prevents unbounded growth
  - [x] Ensured backward compatibility for normal usage

#### For configuration changes:
- [x] `docker-compose.yml` is updated with logging limits
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)

### Test Plan 🧪

**Security Tests:**
1. **ReDoS Protection**: Tested with malicious regex inputs (large
spaces) - completes without hanging
2. **Memory Limits**: Verified 2MB text input gets truncated to 1MB,
chunk limits enforced
3. **File Size Limits**: Confirmed 200MB files rejected, directory size
limits enforced
4. **Iteration Limits**: Tested 20K item arrays rejected, large JSON
strings rejected
5. **Timeout Protection**: Dangerous regex patterns timeout after 5s
instead of hanging

**Compatibility Tests:**
- Normal functionality preserved for all blocks
- Existing tests pass with new security limits
- Performance impact minimal for typical usage

### Security Impact 🛡️

**Before:** Multiple attack vectors could cause:
- CPU exhaustion (ReDoS attacks)
- Memory exhaustion (amplification attacks)  
- Disk exhaustion (file/log bombs)
- Service unavailability

**After:** All attack vectors mitigated with:
- Input validation and size limits
- Timeout protections
- Resource quotas
- Defense-in-depth approach

All fixes maintain backward compatibility while preventing DoS attacks.

🤖 Generated with [Claude Code](https://claude.ai/code)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Adds robust DoS protections across blocks (regex, memory, iteration,
XML/RSS, file I/O) and enables app/Docker log rotation with
comprehensive tests.
> 
> - **Security hardening**:
> - Replace unsafe regex in `backend/blocks/code_extraction_block.py` to
prevent ReDoS; add safer extraction/removal patterns.
> - Constrain LLM summarizer chunking in `backend/blocks/llm.py` (1MB
cap, chunk/overlap validation, chunk count limit).
> - Limit RSS fetching in `backend/blocks/rss.py` (scheme validation,
10MB cap, timeout, bounded read) and return empty on failure.
>   - Impose XML size limit (10MB) in `backend/blocks/xml_parser.py`.
> - Add file upload/download limits in `backend/util/file.py`
(100MB/file, 1GB dir quota) and enforce scanning before write.
> - Enable rotating file logs in `autogpt_libs/logging/config.py` (size
+ backups) and Docker json-file log rotation in
`docker-compose.platform.yml`.
> - **Iteration block**:
> - Add item count/string size limits; fix yielded key for dicts; cap
iterations in `backend/blocks/iteration.py`.
> - **Tests**:
> - New `backend/blocks/test/test_security_fixes.py` covering ReDoS,
timeouts, memory/size and iteration limits, XML/file constraints.
> - **Misc**:
> - Typing fallback for `NotRequired` in `activity_status_generator.py`.
>   - Dependency updates in `backend/poetry.lock`.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
500e1578b1. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Nicholas Tindle <ntindle@users.noreply.github.com>
Co-authored-by: Zamil Majdy <majdyz@users.noreply.github.com>
Co-authored-by: Reinier van der Leer <Pwuts@users.noreply.github.com>
Co-authored-by: Reinier van der Leer <pwuts@agpt.co>
2025-10-02 12:55:55 +00:00
Zamil Majdy
258bf0b1a5 fix(backend): improve activity status generation accuracy and handle missing blocks gracefully (#11039)
## Summary
Fix critical issues where activity status generator incorrectly reported
failed executions as successful, and enhance AI evaluation logic to be
more accurate about actual task accomplishment.

## Changes Made

### 1. Missing Block Handling (`backend/data/graph.py`)
- **Replace ValueError with graceful degradation**: When blocks are
deleted/missing, return `_UnknownBlock` placeholder instead of crashing
- **Comprehensive interface implementation**: `_UnknownBlock` implements
all expected Block methods to prevent type errors
- **Warning logging**: Log missing blocks for debugging without breaking
execution flow
- **Removed unnecessary caching**: Direct constructor calls instead of
cached wrapper functions

### 2. Enhanced Activity Status AI Evaluation
(`backend/executor/activity_status_generator.py`)

#### Intention-Based Success Evaluation
- **Graph description analysis**: AI now reads graph description FIRST
to understand intended purpose
- **Purpose-driven evaluation**: Success is measured against what the
graph was designed to accomplish
- **Critical output analysis**: Enhanced detection of missing outputs
from key blocks (Output, Post, Create, Send, Publish, Generate)
- **Sub-agent failure detection**: Better identification when
AgentExecutorBlock produces no outputs

#### Improved Prompting
- **Intent-specific examples**: 'blog writing' → check for blog content,
'email automation' → check for sent emails
- **Primary evaluation criteria**: 'Did this execution accomplish what
the graph was designed to do?'
- **Enhanced checklist**: 7-point analysis including graph description
matching
- **Technical vs. goal completion**: Distinguish between workflow steps
completing vs. actual user goals achieved

#### Removed Database Error Handling
- **Eliminated try-catch blocks**: No longer needed around
`get_graph_metadata` and `get_graph` calls
- **Direct database calls**: Simplified error handling after fixing
missing block root cause
- **Cleaner code flow**: More predictable execution path without
redundant error handling

## Problem Solved
- **False success reports**: AI previously marked executions as
'successful' when critical output blocks produced no results
- **Missing block crashes**: System would fail when trying to analyze
executions with deleted/missing blocks
- **Intent-blind evaluation**: AI evaluated technical completion instead
of actual goal achievement
- **Database service errors**: 500 errors when missing blocks caused
graph loading failures

## Business Impact
- **More accurate user feedback**: Users get honest assessment of
whether their automations actually worked
- **Better task completion detection**: Clear distinction between
'workflow completed' vs. 'goal achieved'
- **Improved reliability**: System handles edge cases gracefully without
crashing
- **Enhanced user trust**: Truthful reporting builds confidence in the
platform

## Testing
-  Tested with problematic executions that previously showed false
successes
-  Confirmed missing block handling works without warnings
-  Verified enhanced prompt correctly identifies failures
-  Database calls work without try-catch protection

## Example Before/After

**Before (False Success):**
```
Graph: "Automated SEO Blog Writer"
Status: " I successfully completed your blog writing task!"
Reality: No blog content was actually created (critical output blocks had no outputs)
```

**After (Accurate Failure Detection):**
```
Graph: "Automated SEO Blog Writer"  
Status: " The task failed because the blog post creation step didn't produce any output."
Reality: Correctly identifies that the intended blog writing goal was not achieved
```

## Files Modified
- `backend/data/graph.py`: Missing block graceful handling with complete
interface
- `backend/executor/activity_status_generator.py`: Enhanced AI
evaluation with intention-based analysis

## Type of Change
- [x] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality) 
- [ ] Breaking change (fix or feature that would cause existing
functionality to not work as expected)
- [ ] This change requires a documentation update

## Checklist
- [x] My code follows the style guidelines of this project
- [x] I have performed a self-review of my own code
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] New and existing unit tests pass locally with my changes
- [x] Any dependent changes have been merged and published in downstream
modules

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-02 12:28:57 +00:00
Zamil Majdy
f314fbf14f fix(backend): resolve two critical long-running agent execution failures (#11011)
## Summary

Fix two production issues causing agent execution failures that occurred
this morning:

1. **AsyncRedisLock Release Error** (ExecutionID:
08b2c251-ee27-45de-b88d-1792823ca3ee)
   - Error: "Cannot release a lock that's no longer owned" 
- Root cause: Race condition where lock expires during long database
operations
   - Location: backend/executor/manager.py synchronized context manager

2. **Tool Call Parameter Validation** (ExecutionID:
766fd9a0-5f22-4a77-96e8-14c9d02f3292)
- Issue: LLM used typo'd parameter 'maximum_keyword_difficulty' instead
of 'max_keyword_difficulty'
- SmartDecisionMakerBlock silently accepted typo, setting correct
parameter to null
- Result: Downstream blocks received null values causing execution
failures

## Changes Made

### AsyncRedisLock Error Handling
- Add try-catch blocks around AsyncRedisLock.release() calls in
ExecutionManager and OAuth refresh
- Prevent crashes when locks expire between ownership check and release
- Log warnings instead of crashing execution

### Tool Call Parameter Validation  
- **Reject unknown parameters**: Raise ValueError for typo'd parameter
names with detailed error messages
- **Allow optional parameters**: Only validate missing REQUIRED
parameters
- **Safe parameter access**: Use .get() to handle optional parameters
with defaults
- **Clean code**: Extract parameters object once to eliminate
duplication

## Technical Implementation

**Lock Release Protection:**
```python
if await lock.locked() and await lock.owned():
    try:
        await lock.release()
    except Exception as e:
        logger.warning(f"Failed to release lock for key {key}: {e}")
```

**Parameter Validation Logic:**
```python
# Get parameters schema from tool definition  
if tool_def and "function" in tool_def and "parameters" in tool_def["function"]:
    parameters = tool_def["function"]["parameters"]
    expected_args = parameters.get("properties", {})
    required_params = set(parameters.get("required", []))

# Detect parameter typos and missing required params
unexpected_args = provided_args - expected_args_set  
missing_required_args = required_params - provided_args

if unexpected_args or missing_required_args:
    raise ValueError(error_msg)  # Detailed error explaining the problem
```

## Testing

- [x] All existing tests pass
- [x] Lock error handling prevents execution crashes  
- [x] Tool validation catches typos while allowing optional parameters
- [x] Maintains backward compatibility with existing workflows

## Impact

-  No more "Cannot release a lock" crashes during long database
operations
-  Tool calls with typo'd parameters are rejected with clear error
messages
-  Optional parameters work correctly with default values  
-  Production stability improved with graceful error handling

## Files Modified

- `backend/executor/manager.py` - AsyncRedisLock error handling in
synchronized context
- `backend/integrations/creds_manager.py` - OAuth refresh lock error
handling
- `backend/blocks/smart_decision_maker.py` - Tool call parameter
validation with typo detection

Fixes two critical production failures that were causing 2/5 agent runs
to fail this morning.

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-09-29 15:34:20 +00:00
Zamil Majdy
a97ff641c3 feat(backend): optimize FastAPI endpoints performance and alert system (#11000)
## Summary

Comprehensive performance optimization fixing event loop binding issues
and addressing all PR feedback.

### Original Performance Issues Fixed

**Event Loop Binding Problems:**
- JWT authentication dependencies were synchronous, causing thread pool
bottlenecks under high concurrency
- FastAPI's default thread pool (40 threads) was insufficient for
high-load scenarios
- Backend services lacked proper event loop configuration

**Security & Performance Improvements:**
- Security middleware converted from BaseHTTPMiddleware to pure ASGI for
better performance
- Added blocks endpoint to cacheable paths for improved response times
- Cross-platform uvloop detection with Windows compatibility

### Key Changes Made

#### 1. JWT Authentication Async Conversion
- **Files**: `autogpt_libs/auth/dependencies.py`,
`autogpt_libs/auth/jwt_utils.py`
- **Change**: Convert all JWT functions to async (`requires_user`,
`requires_admin_user`, `get_user_id`, `get_jwt_payload`)
- **Impact**: Eliminates thread pool blocking, improves concurrency
handling
- **Tests**: All 25+ authentication tests updated to async patterns

#### 2. FastAPI Thread Pool Optimization  
- **File**: `backend/server/rest_api.py:82-93`
- **Change**: Configure thread pool size via
`config.fastapi_thread_pool_size`
- **Default**: Increased from 40 to higher limit for sync operations
- **Impact**: Better handling of remaining sync dependencies

#### 3. Performance-Optimized Security Middleware
- **File**: `backend/server/middleware/security.py`
- **Change**: Pure ASGI implementation replacing BaseHTTPMiddleware
- **Headers**: HTTP spec compliant capitalization
(X-Content-Type-Options, X-Frame-Options, etc.)
- **Caching**: Added `/api/blocks` and `/api/v1/blocks` to cacheable
paths
- **Impact**: Reduced middleware overhead, improved header compliance

#### 4. Cross-Platform Event Loop Configuration
- **File**: `backend/server/rest_api.py:311-312`
- **Change**: Platform-aware uvloop detection: `'uvloop' if
platform.system() != 'Windows' else 'auto'`
- **Impact**: Windows compatibility while maintaining Unix performance
benefits
- **Verified**: 'auto' is valid uvicorn default parameter

#### 5. Enhanced Caching Infrastructure
- **File**: `autogpt_libs/utils/cache.py:118-132`
- **Change**: Per-event-loop asyncio.Lock instances prevent cross-loop
deadlocks
- **Impact**: Thread-safe caching across multiple event loops

#### 6. Database Query Limits & Performance
- **Files**: Multiple data layer files
- **Change**: Added configurable limits to prevent unbounded queries
- **Constants**: `MAX_GRAPH_VERSIONS_FETCH=50`,
`MAX_USER_API_KEYS_FETCH=500`, etc.
- **Impact**: Consistent performance regardless of data volume

#### 7. OpenAPI Documentation Improvements
- **File**: `backend/server/routers/v1.py:68-85`
- **Change**: Added proper response model and schema for blocks endpoint
- **Impact**: Better API documentation and type safety

#### 8. Error Handling & Retry Logic Fixes
- **File**: `backend/util/retry.py:63`
- **Change**: Accurate retry threshold comments referencing
EXCESSIVE_RETRY_THRESHOLD
- **Impact**: Clear documentation for debugging retry scenarios

### ntindle Feedback Addressed

 **HTTP Header Capitalization**: All headers now use proper HTTP spec
capitalization
 **Windows uvloop Compatibility**: Clean platform detection with inline
conditional
 **OpenAPI Response Model**: Blocks endpoint properly documented in
schema
 **Retry Comment Accuracy**: References actual threshold constants
instead of hardcoded numbers
 **Code Cleanliness**: Inline conditionals preferred over verbose if
statements

### Performance Testing Results

**Before Optimization:**
- High latency under concurrent load
- Thread pool exhaustion at ~40 concurrent requests
- Event loop binding issues causing timeouts

**After Optimization:**
- Improved concurrency handling with async JWT pipeline
- Configurable thread pool scaling
- Cross-platform event loop optimization
- Reduced middleware overhead

### Backward Compatibility

 **All existing functionality preserved**  
 **No breaking API changes**  
 **Enhanced test coverage with async patterns**  
 **Windows and Unix compatibility maintained**

### Files Modified

**Core Authentication & Performance:**
- `autogpt_libs/auth/dependencies.py` - Async JWT dependencies
- `autogpt_libs/auth/jwt_utils.py` - Async JWT utilities  
- `backend/server/rest_api.py` - Thread pool config + uvloop detection
- `backend/server/middleware/security.py` - ASGI security middleware

**Database & Limits:**
- `backend/data/includes.py` - Performance constants and configurable
includes
- `backend/data/api_key.py`, `backend/data/credit.py`,
`backend/data/graph.py`, `backend/data/integrations.py` - Query limits

**Caching & Infrastructure:**
- `autogpt_libs/utils/cache.py` - Per-event-loop lock safety
- `backend/server/routers/v1.py` - OpenAPI improvements
- `backend/util/retry.py` - Comment accuracy

**Testing:**
- `autogpt_libs/auth/dependencies_test.py` - 25+ async test conversions
- `autogpt_libs/auth/jwt_utils_test.py` - Async JWT test patterns

Ready for review and production deployment. 🚀

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-09-29 05:32:48 +00:00