- Add get_pending_reviews_by_node_exec_ids() for batch fetching reviews
- Replace N+1 query loop with single batch query using find_many
- Execute auto-approval record creation in parallel with asyncio.gather()
- Simplify graph_exec_id extraction logic
This reduces database round-trips from N queries to 1 query when processing
multiple reviews, significantly improving performance for bulk operations.
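A minimal sketch of the N+1 fix described above. The real code uses Prisma's `find_many` with an `in` filter; here a plain dict stands in for the database, and the row shape is an assumption:

```python
# In-memory stand-in for the reviews table (shape is illustrative).
FAKE_DB = {
    "exec-1": {"node_exec_id": "exec-1", "status": "WAITING"},
    "exec-2": {"node_exec_id": "exec-2", "status": "WAITING"},
    "exec-3": {"node_exec_id": "exec-3", "status": "APPROVED"},
}

def get_pending_reviews_by_node_exec_ids(node_exec_ids: list[str]) -> list[dict]:
    """One batched lookup for all ids, analogous to
    find_many(where={"nodeExecId": {"in": node_exec_ids}}) --
    replacing a loop that issued one query per id."""
    wanted = set(node_exec_ids)
    return [r for r in FAKE_DB.values() if r["node_exec_id"] in wanted]

reviews = get_pending_reviews_by_node_exec_ids(["exec-1", "exec-3"])
```

However many reviews are requested, the database sees a single round-trip.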
- Remove regex matching for old 'Review required for X execution' format
- Remove console.log debug statement
- Simplify code by removing unnecessary string parsing
- Remove fallback to displayName/nodeNameMap, use instructions directly
- Remove graph fetch and nodeNameMap logic (unused)
- Simplify PendingReviewCard to use review.instructions only
- Remove all obvious comments
- Clean up unused imports
- Show HITL instruction label by checking for user-provided names (not technical block names)
- Always show 'Review required for X' in header (not just the name)
- Default single reviews to expanded, multiple reviews to collapsed
- Detect HITL blocks by checking if name doesn't contain 'Block' suffix
- Only show instruction label when it's in payload (HITL blocks)
- Don't show label for sensitive action blocks (uses review.instructions)
- Add backward compatibility for old format reviews
- Simplify auto-approve message (remove review count)
- Remove obvious code comments
- Add back instructions display as label for HITL blocks
- Show user-provided name field (e.g., 'User profile data') above the input
- Only show label when instructions exist (HITL blocks)
- Other review types show no label, just the data
Backend changes:
- Use user-provided name directly as review message (not wrapped in 'Review required for X execution')
- Pass input_data.name from HITL block instead of generic block type name
Frontend changes:
- Show HITL block name in bold in collapse header
- Use HITL name (not node UUID) in auto-approve text
- Default all review groups to collapsed state
- Remove redundant instructions display from individual cards
- Fix lint error: remove unused instructions variable
This ensures HITL blocks show meaningful names like 'User profile data'
instead of UUIDs or generic 'HumanInTheLoopBlock' in the UI.
- Use 'Node #shortened-id' format in collapse header for consistency
- Remove special background colors and styling from collapse button
- Only show collapse header when there are multiple reviews in a group
- Don't show Node #id on individual cards when collapse header is visible (avoid redundancy)
- Show actual node name in auto-approve text instead of generic 'this node'
(e.g., 'Auto-approve future executions of Email Trigger')
- Move auto-approve toggle from individual review cards to node group level
- Change auto-approve tracking from node_exec_id to node_id for consistency
- When auto-approve is enabled, reset ALL review items in that node group to original values
- Add node identifier display to each review card (shortened format: Node #abc1...xyz9)
- Show single toggle per node with clear messaging about how many reviews it affects
This prevents confusion when multiple review items come from the same node,
making it clear that auto-approve applies to the entire node, not individual items.
- Convert review_routes_test.py from sync TestClient to async httpx.AsyncClient
- Use shared server fixture instead of creating new FastAPI app
- Add @pytest.mark.asyncio(loop_scope="session") to all test functions
- Convert all HTTP calls from sync to async (add await)
This fixes the 'RuntimeError: Event loop is closed' error that occurred
when OAuth tests ran after review_routes tests. The sync TestClient was
closing the session event loop, contaminating subsequent async tests.
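The contamination can be reproduced with nothing but the standard library; this is a simplified stand-in for what the sync `TestClient` did to the session loop:

```python
import asyncio

# A sync client that drives coroutines on a loop and then closes it
# leaves any later code sharing that loop with a dead loop.
loop = asyncio.new_event_loop()

async def ping():
    return "pong"

assert loop.run_until_complete(ping()) == "pong"
loop.close()  # the sync TestClient closed the session loop like this

coro = ping()
try:
    loop.run_until_complete(coro)  # a later test using the same loop
    contaminated = False
except RuntimeError:  # "Event loop is closed"
    contaminated = True
    coro.close()  # silence the "never awaited" warning
```

Switching the tests to `httpx.AsyncClient` keeps all awaits on the live session loop instead.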
## Summary
Refactors error handling in the embedding service to prevent Sentry
alert spam. Previously, batch operations would log one error per failed
file, causing hundreds of duplicate alerts. Now, exceptions bubble up
from individual functions and are aggregated at the batch level,
producing a single log entry showing all unique error types with counts.
## Changes
### Removed Error Swallowing
- Removed try/except blocks from `generate_embedding()`,
`store_content_embedding()`, `ensure_content_embedding()`,
`get_content_embedding()`, and `ensure_embedding()`
- These functions now raise exceptions instead of returning None/False
on failure
- Added docstring notes: "Raises exceptions on failure - caller should
handle"
### Improved Batch Error Aggregation
- Updated `backfill_all_content_types()` to aggregate unique errors
- Collects all exceptions from batch results
- Groups by error type and message, shows counts
- Single log entry per content type instead of per-file
### Example Output
Before: 50 separate error logs for same issue
After: `BLOCK: 50/100 embeddings failed. Errors: PrismaError: type
vector does not exist (50x)`
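A runnable sketch of the aggregation pattern (function names and the failing error are illustrative; the real code wraps Prisma calls):

```python
import asyncio
from collections import Counter

async def embed(item: str) -> str:
    # Individual functions now raise instead of returning None on failure.
    if item.startswith("bad"):
        raise ValueError("type vector does not exist")
    return f"embedding({item})"

async def backfill(items: list[str]) -> str:
    # Collect successes and exceptions together, then summarize once.
    results = await asyncio.gather(
        *(embed(i) for i in items), return_exceptions=True
    )
    errors = [r for r in results if isinstance(r, Exception)]
    counts = Counter(f"{type(e).__name__}: {e}" for e in errors)
    summary = ", ".join(f"{msg} ({n}x)" for msg, n in counts.items())
    return f"{len(errors)}/{len(items)} embeddings failed. Errors: {summary}"

line = asyncio.run(backfill(["ok-1", "bad-1", "bad-2"]))
```

One log line per content type, with each unique error counted, rather than one log line per file.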
## Motivation
This was triggered by the AUTOGPT-SERVER-7D2 Sentry issue where pgvector
errors created hundreds of duplicate alerts. Even after the root cause
was fixed (stale database connections), the error logging pattern would
create spam for any future issues.
## Impact
- ✅ Reduces Sentry noise - single alert per batch instead of per-file
- ✅ Better diagnostics - shows all unique error types with counts
- ✅ Cleaner code - removed ~24 lines of unnecessary error swallowing
- ✅ Proper exception propagation follows Python best practices
## Testing
- Existing tests should pass (error handling moved to batch level)
- Error aggregation logic tested via
asyncio.gather(return_exceptions=True)
## Related Issues
- Fixes Sentry alert spam from AUTOGPT-SERVER-7D2
Change deprecated @pytest.mark.asyncio(scope="session") to the correct
@pytest.mark.asyncio(loop_scope="session") syntax to eliminate deprecation
warnings in the test suite.
This affects 11 test functions in run_agent_test.py that were using the
old 'scope' parameter instead of the new 'loop_scope' parameter.
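The change is a one-word rename on the marker (assuming a pytest-asyncio version that has deprecated `scope`):

```python
import pytest

# Deprecated spelling (emits a DeprecationWarning in pytest-asyncio):
#     @pytest.mark.asyncio(scope="session")
# Current spelling:
mark = pytest.mark.asyncio(loop_scope="session")

@mark
async def test_something():
    assert True
```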
Add explicit loop_scope='function' to all async tests in human_review_test.py to prevent event loop conflicts with other tests in the suite.
The pytest config has asyncio_default_fixture_loop_scope='session' which causes all async tests to share the same event loop by default. This was causing the OAuth test to fail with 'Event loop is closed' when running the full test suite.
By explicitly using function scope for our async tests, each test gets its own isolated event loop, preventing contamination between tests.
Group pending reviews by node_id and display the block names from the graph metadata. Reviews from the same HITL block are now grouped together with a collapsible header.
Features:
- Fetch graph structure to get node metadata
- Group reviews by node_id
- Display customized block name or fall back to block_id
- Collapsible groups with caret icons
- Show review count per group
- Only show group headers when there are multiple groups
This improves UX when multiple reviews are from the same block, making it easier to understand context and batch-process related reviews.
- Pass external data value from parent to child component
- Add useEffect to sync child's internal state with parent's reviewDataMap
- Fixes issue where toggling auto-approve didn't reset edited input values
- Input now correctly reverts to original payload when auto-approve is enabled
- Use TYPE_CHECKING conditional imports to avoid event loop binding issues
- Keep local imports within functions to prevent async initialization conflicts
- All 25 human review tests now pass reliably (10 data layer + 15 API routes)
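The import pattern looks roughly like this (module names are stand-ins for whatever heavy module binds resources at import time):

```python
from typing import TYPE_CHECKING

# Type-only imports are never executed at runtime, so a module that binds
# clients or event loops on import is only loaded by the type checker.
if TYPE_CHECKING:
    from collections import OrderedDict  # stand-in for a heavy module

def describe(review) -> str:
    # Local import inside the function: resolved lazily at call time,
    # after any event loop is already up, avoiding init-time binding.
    from collections import OrderedDict
    return "ordered" if isinstance(review, OrderedDict) else "plain"
```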
- Fixes node_id fetching for frontend review grouping
Adds node_id field to PendingHumanReviewModel to enable frontend to:
- Group reviews from the same node together
- Show auto-approve toggle only on last review per node
- Apply auto-approval at node level (not per-review)
Changes:
- Fetch node_id from NodeExecution when loading reviews
- Add node_id parameter to PendingHumanReviewModel.from_db()
- Update test fixture with node_id
- Add temporary default value for backwards compatibility
Adds a 0.5s sleep after shutting down test services (ExecutionManager,
AgentServer, etc.) to ensure they fully clean up their event loops before
the next test starts. This prevents 'Event loop is closed' errors in CI
that were breaking the oauth_test.py fixture setup.
The issue occurred because:
1. SpinTestServer starts background services with their own event loops
2. When services shut down, event loops weren't fully cleaned up
3. Subsequent tests would encounter closed event loops
Fixes test isolation issue where review_routes_test.py would leave the
event loop in a bad state for tests running after it.
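The teardown change amounts to a short pause after stopping services; a sketch with fake services (the real fixture shuts down ExecutionManager, AgentServer, etc.):

```python
import asyncio

class FakeService:
    """Stand-in for a background service with its own event loop."""
    def __init__(self):
        self.stopped = False
    def stop(self):
        self.stopped = True

async def teardown_services(services: list) -> None:
    for svc in services:
        svc.stop()
    # Give service event loops time to finish closing before the
    # next test's fixtures spin up.
    await asyncio.sleep(0.5)

svcs = [FakeService(), FakeService()]
asyncio.run(teardown_services(svcs))
```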
## Changes 🏗️
- Improve the loading UX when switching between chats or loading a new
chat
- Make session/chat management logic easier to maintain
- Improve "Deep thinking" loading states
- Fix a bug that occurred when returning to a chat after navigating away
## Checklist 📋
### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Run the app locally and test the above
Introduces a new ClaudeCodeBlock that enables execution of coding tasks
using Anthropic's Claude Code in an E2B sandbox. This block unlocks
powerful agentic coding capabilities - Claude Code can autonomously
create files, install packages, run commands, and build complete
applications within a secure sandboxed environment.
Changes 🏗️
- New file backend/blocks/claude_code.py:
- ClaudeCodeBlock - Execute tasks using Claude Code in an E2B sandbox
- Dual credential support: E2B API key (sandbox) + Anthropic API key
(Claude Code)
- Session continuation support via session_id, sandbox_id, and
conversation_history
- Automatic file extraction with path, relative_path, name, and content
fields
- Configurable timeout, setup commands, and working directory
- dispose_sandbox option to keep sandbox alive for multi-turn
conversations
Checklist 📋
For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Create and execute ClaudeCodeBlock with a simple prompt ("Create a
hello world HTML file")
- [x] Verify files output includes correct path, relative_path, name,
and content
- [x] Test session continuation by passing session_id and sandbox_id
back
- [x] Build "Any API → Instant App" demo agent combining Firecrawl +
ClaudeCodeBlock + GitHub blocks
- [x] Verify generated files are pushed to GitHub with correct folder
structure using relative_path
Here are two example agents I made that can be used to test this block.
They require GitHub, Anthropic, and E2B access via API keys, which the
user sets on the platform (tested on dev).
The first agent is my
Any API → Instant App
"Transform any API documentation into a fully functional web
application. Just provide a docs URL and get a complete, ready-to-deploy
app pushed to a new GitHub repository."
[Any API → Instant
App_v36.json](https://github.com/user-attachments/files/24600326/Any.API.Instant.App_v36.json)
The second agent is my
Idea to project
"Simply enter your coding project's idea and this agent will make all of
the base initial code needed for you to start working on that project
and place it on github for you!"
[Idea to
project_v11.json](https://github.com/user-attachments/files/24600346/Idea.to.project_v11.json)
If you have any questions or issues let me know.
References
- https://e2b.dev/blog/python-guide-run-claude-code-in-an-e2b-sandbox
- https://github.com/e2b-dev/e2b-cookbook/tree/main/examples/anthropic-claude-code-in-sandbox-python
- https://code.claude.com/docs/en/cli-reference
I tried to use E2B's "anthropic-claude-code" template, but it kept
complaining that it was out of date, so the block manually spins up an
E2B instance, installs the latest Claude Code, and uses that.
<!-- Clearly explain the need for these changes: -->
We hit some snags when merging into the docs site; this fixes them.
### Changes 🏗️
updates paths, adds some guides
<!-- Concisely describe all of the changes made in this pull request:
-->
update to match reality
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
<!-- Put your test plan here: -->
- [x] deploy it and validate
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> Aligns block integrations documentation with GitBook.
>
> - Changes generator default output to
`docs/integrations/block-integrations` and writes overview `README.md`
and `SUMMARY.md` at `docs/integrations/`
> - Adds GitBook frontmatter and hint syntax to overview; prefixes block
links with `block-integrations/`
> - Introduces `generate_summary_md` to build GitBook navigation
(including optional `guides/`)
> - Preserves per-block manual sections and adds optional `extras` +
file-level `additional_content`
> - Updates sync checker to validate parent `README.md` and `SUMMARY.md`
> - Rewrites `docs/integrations/README.md` with GitBook frontmatter and
updated links; adds `docs/integrations/SUMMARY.md`
> - Adds new guides: `guides/llm-providers.md`,
`guides/voice-providers.md`
>
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
fdb7ff8111. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: bobby.gaffin <bobby.gaffin@agpt.co>
- Remove confusing Exclude/Include button functionality
- Remove blue border/background from auto-approve toggle for cleaner look
- Toggle now shows by default with subtle gray text
- Disable editing when auto-approve is enabled (not when excluded)
- Remove unused rejection message functionality
- Simpler UX: just approve/reject reviews directly
The auto-approve toggle was hidden because onToggleDisabled handler was not
passed to PendingReviewCard, causing showSimplified mode. This adds:
- disabledReviewsMap state to track excluded reviews
- handleToggleDisabled handler to toggle review exclusion
- Proper filtering in processReviews to handle disabled reviews
Add safety check before accessing matching_review.graph_exec_id to prevent
Pyright reportOptionalMemberAccess error. This check should never be triggered
in practice due to validation logic, but satisfies static type checking.
Fixes Pyright error: reportOptionalMemberAccess on line 164
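The guard looks roughly like this (class and function names are illustrative, not the actual project code):

```python
from typing import Optional

class Review:
    """Minimal stand-in for the pending-review model."""
    def __init__(self, graph_exec_id: str):
        self.graph_exec_id = graph_exec_id

def resolve_exec_id(matching_review: Optional[Review]) -> str:
    # Narrow the Optional before touching .graph_exec_id; this satisfies
    # Pyright's reportOptionalMemberAccess and is safe at runtime.
    if matching_review is None:
        # Should never happen given the earlier validation logic.
        raise ValueError("No matching review found")
    return matching_review.graph_exec_id

exec_id = resolve_exec_id(Review("graph-123"))
```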
Add get_user_by_id mocks to three review route tests that trigger the execution resume path.
Without these mocks, the tests attempt database access via Prisma when get_user_by_id is
called to fetch user.timezone, making tests non-deterministic and potentially causing failures.
Tests fixed:
- test_process_review_action_auto_approve_creates_auto_approval_records
- test_process_review_action_without_auto_approve_still_loads_settings
- test_process_review_action_auto_approve_only_applies_to_approved_reviews
Fixes CodeRabbit comment: 2719326546
- Update execution status to QUEUED before publishing to RabbitMQ to prevent duplicate queueing
- Verify status update succeeded before publishing message to queue
- Return early if status update fails (execution already queued by another request)
- Ensures only one concurrent request can successfully queue an execution
This prevents the race condition where two concurrent requests processing the final reviews
for the same execution could both publish to the queue if the queue publish happens before
the status update.
Fixes CodeRabbit comment: 2719411493
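The ordering fix is a compare-and-set before publish; a sketch where a dict stands in for the execution row and a list for the RabbitMQ channel (the real code does a conditional database UPDATE):

```python
execution = {"id": "exec-1", "status": "REVIEW"}
published: list[str] = []

def try_queue(row: dict) -> bool:
    # Conditional transition: succeeds only if the status is still REVIEW.
    if row["status"] != "REVIEW":
        return False  # another request already queued it; return early
    row["status"] = "QUEUED"
    # Publish only after the status flip succeeded, so at most one
    # concurrent request ever reaches this line for a given execution.
    published.append(row["id"])
    return True

first = try_queue(execution)
second = try_queue(execution)  # simulates the concurrent duplicate request
```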
- Change get_pending_review_by_node_exec_id to use find_first with userId filter in query (prevents cross-tenant existence leak)
- Add input_data parameter to check_approval() to use current data for auto-approvals instead of stale stored payload
- Validate all reviews in a request belong to the same execution to prevent cross-execution review processing
- Update HITLReviewHelper to pass input_data to check_approval()
Fixes CodeRabbit comments: 2719424510, 2719424508, 2719424506
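The cross-tenant fix puts the `userId` filter inside the query itself, so a review belonging to another user is indistinguishable from "not found". A sketch with dicts standing in for Prisma rows:

```python
REVIEWS = [
    {"node_exec_id": "n1", "userId": "alice"},
    {"node_exec_id": "n2", "userId": "bob"},
]

def get_pending_review_by_node_exec_id(node_exec_id: str, user_id: str):
    # Equivalent to find_first(where={"nodeExecId": node_exec_id,
    #                                 "userId": user_id}):
    # other users' reviews never surface, not even as an existence signal.
    return next(
        (r for r in REVIEWS
         if r["node_exec_id"] == node_exec_id and r["userId"] == user_id),
        None,
    )

mine = get_pending_review_by_node_exec_id("n1", "alice")
other = get_pending_review_by_node_exec_id("n2", "alice")  # bob's review
```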
- Replace global auto_approve_future_actions with per-review auto_approve_future
- Add individual toggle for each review in PendingReviewCard
- Track per-review auto-approval state in autoApproveFutureMap
- Send auto_approve_future field with each review item
- Update UI to show per-review toggle with explanation
- Automatically reset data to original when auto-approve is enabled per review
- Add user_id parameter to validate ownership before cancelling reviews
- Update call site in executor/utils.py to pass user_id
- Update all test assertions to expect user_id parameter
- Prevents cross-tenant cancellation if graph_exec_id is misused
- Add get_pending_review_by_node_exec_id() for direct review lookup
- Replace paginated search with direct lookup to avoid missing reviews beyond page 1
- Implement per-review auto_approve_future toggle for granular control
- Fix log deduplication for embedding generation warnings
- Remove unnecessary f-string prefixes per linter feedback
- Fix all test mocks to use correct functions (get_pending_reviews_for_user vs get_pending_review_by_node_exec_id)
- All 15 review route tests passing
- Change from global auto_approve_future_actions to per-review auto_approve_future flag
- Each review item can now individually specify auto-approval
- Better UX: users can auto-approve some actions but not others
- Example: auto-approve file reads but not file writes
- Backward compatible: auto_approve_future defaults to False
- Add test for per-review granularity
- Update all existing tests to use new structure
- Use log_once_per_task for embedding generation failures
- Prevents log spam when API key is missing
- Now shows single warning per task instead of per-file warnings
- Makes logs more readable and actionable
- Add defense-in-depth check that graph_exec_id belongs to user_id
- Validates ownership before creating auto-approval records
- Prevents potential misuse if function called from other contexts
- Addresses CodeRabbit security concern (comment 2718990979)
- Add user_timezone to ExecutionContext when resuming after review approval
- Fetch user to get timezone preference, defaulting to UTC if not set
- Make error deduplication more general using contextvars
- Replace global flag with log_once_per_task() helper for task-scoped logging
- Prevents log spam when processing batches (embeddings, etc.)
Addresses CodeRabbit comment about ExecutionContext not being exhaustive.
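A sketch of what a contextvars-based `log_once_per_task()` helper can look like (the name follows the commits; the implementation details are assumptions):

```python
import asyncio
import contextvars

# The contextvar holds the set of messages already logged in the current
# task; each asyncio task gets its own copy of the context, so the
# deduplication is naturally task-scoped.
_logged: contextvars.ContextVar = contextvars.ContextVar("logged", default=None)
LOG_LINES: list[str] = []  # stand-in for the real logger

def log_once_per_task(message: str) -> None:
    seen = _logged.get()
    if seen is None:
        seen = set()
        _logged.set(seen)
    if message not in seen:
        seen.add(message)
        LOG_LINES.append(message)

async def process_batch(items: list[str]) -> None:
    for _ in items:
        # Fires once per task, no matter how many items fail.
        log_once_per_task("openai_internal_api_key not set")

asyncio.run(process_batch(["a", "b", "c"]))
```

A batch of any size produces a single warning, while a fresh task starts with a clean slate.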
- Convert module-level TestClient to fixture to avoid event loop conflicts
- Add missing mock for get_pending_reviews_for_user in all tests
- Add client parameter to all test functions that use the test client
- Add missing mocks for get_graph_execution_meta in several tests
- Remove asyncio.gather to avoid event loop binding issues
- Process auto-approval creation sequentially with try/except for safety
All 14 review route tests now pass successfully.
Only log "openai_internal_api_key not set" error once per process instead
of on every embedding generation attempt. Reduces log spam when processing
batch operations without an API key configured.
- Use return_exceptions=True in asyncio.gather for auto-approval creation
to prevent endpoint failure when auto-approval fails (reviews already processed)
- Fix empty payload handling: use explicit None check instead of truthiness
- Distinguish auto-approvals from normal approvals: auto-approvals always
use current input_data, normal approvals preserve explicitly empty payloads
- Test cancellation of pending reviews when stopping execution in REVIEW status
- Test database manager pattern when Prisma is disconnected
- Test cascading stop to children with pending reviews
- Fix mock to simulate status transition from RUNNING to TERMINATED
Covers the bug fixes in stop_graph_execution() that handle:
1. Immediate termination of REVIEW status executions
2. Cleanup of pending reviews when stopping
3. Recursive cleanup of subagent reviews via cascade
Critical bug fix: stopping a graph in REVIEW status caused timeouts and orphaned reviews.
## Bugs Fixed
### 1. REVIEW Status Not Handled
Before:
- stop_graph_execution() only handled QUEUED, INCOMPLETE, RUNNING, COMPLETED, FAILED
- REVIEW status → waited 15 seconds → TimeoutError
- Graph remained stuck in REVIEW status
After:
- REVIEW status treated like QUEUED/INCOMPLETE (terminate immediately)
- No need to wait for executor since execution is paused
- Clean termination without timeouts
### 2. Orphaned Pending Reviews
Before:
- Stopping graph → status = TERMINATED
- Pending reviews remained in WAITING status
- User saw reviews for terminated execution in UI
- Could not approve/reject (backend validation rejects)
- Reviews stuck until manual cleanup
After:
- When stopping REVIEW execution, clean up pending reviews
- Mark all WAITING reviews as REJECTED
- reviewMessage: 'Execution was stopped by user'
- processed: true, reviewedAt: now()
- No orphaned reviews in UI
### 3. Subagent Reviews
Before:
- Parent graph with child (subagent) executions
- Child paused for HITL review
- Stop parent → recursively stops child
- Child reviews orphaned (same bugs as above)
After:
- Cascade stop properly handles child REVIEW status
- All child reviews cleaned up recursively
- Clean shutdown of entire execution tree
## Implementation
Changes to stop_graph_execution():
1. Added ExecutionStatus.REVIEW to immediate termination list
2. Check if status == REVIEW before marking TERMINATED
3. Update all WAITING reviews to REJECTED with message
4. Log cleanup for debugging
5. Then terminate execution normally
Cascade behavior preserved:
- Still recursively stops all child executions
- Each child's reviews cleaned up individually
- Parent waits for all children to complete cleanup
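The shape of the fix can be sketched as follows (names follow the PR text, but the real implementation talks to the database and executor; dicts stand in here):

```python
# REVIEW joins the statuses that terminate immediately -- the execution is
# paused, so there is no executor to wait on.
IMMEDIATE_TERMINATION = {"QUEUED", "INCOMPLETE", "REVIEW"}

def stop_graph_execution(execution: dict) -> dict:
    if execution["status"] == "REVIEW":
        # Clean up pending reviews so none are orphaned in the UI.
        for review in execution["pending_reviews"]:
            if review["status"] == "WAITING":
                review["status"] = "REJECTED"
                review["reviewMessage"] = "Execution was stopped by user"
                review["processed"] = True
    if execution["status"] in IMMEDIATE_TERMINATION:
        execution["status"] = "TERMINATED"  # no 15s timeout
    # Cascade: child (subagent) executions get the same treatment.
    for child in execution.get("children", []):
        stop_graph_execution(child)
    return execution

result = stop_graph_execution({
    "status": "REVIEW",
    "pending_reviews": [{"status": "WAITING"}],
    "children": [{
        "status": "REVIEW",
        "pending_reviews": [{"status": "WAITING"}],
        "children": [],
    }],
})
```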
Defense in depth: prevent users from seeing/clicking review panel before
execution pauses for review.
Before:
- Reviews panel could show while execution is RUNNING
- User could click to open panel and see pending reviews
- Confusing UX: why are reviews shown if graph hasn't paused yet?
- Could lead to frustration when backend rejects the approval attempt
After:
- Panel hidden if execution status is RUNNING or QUEUED
- Panel only shows when status is REVIEW (paused for review)
- Clear UX: reviews appear only when execution needs user input
Benefits:
1. **Better UX**: No confusion about when to approve reviews
2. **Prevents invalid attempts**: User can't try to approve while running
3. **Works with backend validation**: Frontend hides, backend rejects
4. **Clear state**: Panel visibility directly matches execution state
Changes:
- Added status check: hide if RUNNING or QUEUED
- Panel shows only when execution has paused (REVIEW/INCOMPLETE)
- Existing polling logic still works for real-time updates