Adds a node_id field to PendingHumanReviewModel to enable the frontend to:
- Group reviews from the same node together
- Show auto-approve toggle only on last review per node
- Apply auto-approval at node level (not per-review)
Changes:
- Fetch node_id from NodeExecution when loading reviews
- Add node_id parameter to PendingHumanReviewModel.from_db()
- Update test fixture with node_id
- Add temporary default value for backwards compatibility
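The grouping behaviour described above can be sketched as follows. This is a minimal illustration, not the actual frontend code (which is TypeScript); the `PendingReview` dataclass and field names are hypothetical stand-ins for the real model.

```python
from dataclasses import dataclass
from itertools import groupby

@dataclass
class PendingReview:
    review_id: str
    node_id: str

def group_reviews_by_node(reviews):
    """Group reviews by node_id; the last review in each group is the one
    that should carry the auto-approve toggle."""
    # groupby requires its input sorted by the grouping key; sorted() is
    # stable, so the original order within each node is preserved.
    ordered = sorted(reviews, key=lambda r: r.node_id)
    return {
        node_id: list(items)
        for node_id, items in groupby(ordered, key=lambda r: r.node_id)
    }

reviews = [
    PendingReview("r1", "node-a"),
    PendingReview("r2", "node-b"),
    PendingReview("r3", "node-a"),
]
grouped = group_reviews_by_node(reviews)
# Show the toggle only on the last review of each node's group.
toggle_reviews = {nid: items[-1].review_id for nid, items in grouped.items()}
```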
Adds a 0.5s sleep after shutting down test services (ExecutionManager,
AgentServer, etc.) to ensure they fully clean up their event loops before
the next test starts. This prevents 'Event loop is closed' errors in CI
that were breaking the oauth_test.py fixture setup.
The issue occurred because:
1. SpinTestServer starts background services with their own event loops
2. When services shut down, event loops weren't fully cleaned up
3. Subsequent tests would encounter closed event loops
Fixes test isolation issue where review_routes_test.py would leave the
event loop in a bad state for tests running after it.
- Remove confusing Exclude/Include button functionality
- Remove blue border/background from auto-approve toggle for cleaner look
- Toggle now shows by default with subtle gray text
- Disable editing when auto-approve is enabled (not when excluded)
- Remove unused rejection message functionality
- Simpler UX: just approve/reject reviews directly
The auto-approve toggle was hidden because the onToggleDisabled handler was not
passed to PendingReviewCard, triggering showSimplified mode. This adds:
- disabledReviewsMap state to track excluded reviews
- handleToggleDisabled handler to toggle review exclusion
- Proper filtering in processReviews to handle disabled reviews
Add safety check before accessing matching_review.graph_exec_id to prevent
Pyright reportOptionalMemberAccess error. This check should never be triggered
in practice due to validation logic, but satisfies static type checking.
Fixes Pyright error: reportOptionalMemberAccess on line 164
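The guard pattern looks roughly like the sketch below. The names (`Review`, `get_exec_id`) are hypothetical; the point is that an explicit `None` check narrows the `Optional` type so Pyright no longer reports `reportOptionalMemberAccess`.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Review:
    graph_exec_id: str

def get_exec_id(matching_review: Optional[Review]) -> str:
    # Without this guard, Pyright flags the attribute access below with
    # reportOptionalMemberAccess, since matching_review may be None.
    if matching_review is None:
        # Should be unreachable given upstream validation, but keeps both
        # the type checker and the runtime safe.
        raise ValueError("matching_review unexpectedly missing")
    return matching_review.graph_exec_id
```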
Add get_user_by_id mocks to three review route tests that trigger the execution resume path.
Without these mocks, the tests attempt database access via Prisma when get_user_by_id is
called to fetch user.timezone, making tests non-deterministic and potentially causing failures.
Tests fixed:
- test_process_review_action_auto_approve_creates_auto_approval_records
- test_process_review_action_without_auto_approve_still_loads_settings
- test_process_review_action_auto_approve_only_applies_to_approved_reviews
Fixes CodeRabbit comment: 2719326546
- Update execution status to QUEUED before publishing to RabbitMQ to prevent duplicate queueing
- Verify status update succeeded before publishing message to queue
- Return early if status update fails (execution already queued by another request)
- Ensures only one concurrent request can successfully queue an execution
This prevents the race condition where two concurrent requests processing the final reviews
for the same execution could both publish to the queue if the queue publish happens before
the status update.
Fixes CodeRabbit comment: 2719411493
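The claim-before-publish idea can be sketched with an in-memory stand-in for the executions table. The real code performs a conditional database update; here a lock-protected dict plays that role, and `published` stands in for the RabbitMQ queue — all names are illustrative.

```python
import threading

# In-memory stand-in for the executions table; the real code would use a
# conditional database update (e.g. only update when status is not QUEUED).
_executions = {"exec-1": "REVIEW"}
_lock = threading.Lock()
published = []

def try_mark_queued(exec_id: str) -> bool:
    """Atomically flip status to QUEUED; returns False if already queued."""
    with _lock:
        if _executions.get(exec_id) == "QUEUED":
            return False
        _executions[exec_id] = "QUEUED"
        return True

def resume_execution(exec_id: str) -> bool:
    # Claim the execution BEFORE publishing, so that of two concurrent
    # requests only one can actually enqueue the message.
    if not try_mark_queued(exec_id):
        return False  # already queued by another request; return early
    published.append(exec_id)  # stand-in for publishing to RabbitMQ
    return True

first = resume_execution("exec-1")
second = resume_execution("exec-1")
```

Publishing first and updating status second reopens the race: both requests could publish before either status write lands.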
- Change get_pending_review_by_node_exec_id to use find_first with userId filter in query (prevents cross-tenant existence leak)
- Add input_data parameter to check_approval() to use current data for auto-approvals instead of stale stored payload
- Validate all reviews in a request belong to the same execution to prevent cross-execution review processing
- Update HITLReviewHelper to pass input_data to check_approval()
Fixes CodeRabbit comments: 2719424510, 2719424508, 2719424506
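The same-execution validation amounts to a one-set check over the batch. A minimal sketch, with hypothetical dict-shaped review items:

```python
def validate_same_execution(reviews):
    """Reject a batch that mixes reviews from different graph executions."""
    exec_ids = {r["graph_exec_id"] for r in reviews}
    if len(exec_ids) > 1:
        raise ValueError(f"Reviews span multiple executions: {sorted(exec_ids)}")
    # Return the single execution id (or None for an empty batch).
    return exec_ids.pop() if exec_ids else None

ok = validate_same_execution([
    {"id": "r1", "graph_exec_id": "g1"},
    {"id": "r2", "graph_exec_id": "g1"},
])
```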
- Replace global auto_approve_future_actions with per-review auto_approve_future
- Add individual toggle for each review in PendingReviewCard
- Track per-review auto-approval state in autoApproveFutureMap
- Send auto_approve_future field with each review item
- Update UI to show per-review toggle with explanation
- Automatically reset data to original when auto-approve is enabled per review
- Add user_id parameter to validate ownership before cancelling reviews
- Update call site in executor/utils.py to pass user_id
- Update all test assertions to expect user_id parameter
- Prevents cross-tenant cancellation if graph_exec_id is misused
- Add get_pending_review_by_node_exec_id() for direct review lookup
- Replace paginated search with direct lookup to avoid missing reviews beyond page 1
- Implement per-review auto_approve_future toggle for granular control
- Fix log deduplication for embedding generation warnings
- Remove unnecessary f-string prefixes per linter feedback
- Fix all test mocks to use correct functions (get_pending_reviews_for_user vs get_pending_review_by_node_exec_id)
- All 15 review route tests passing
- Change from global auto_approve_future_actions to per-review auto_approve_future flag
- Each review item can now individually specify auto-approval
- Better UX: users can auto-approve some actions but not others
- Example: auto-approve file reads but not file writes
- Backward compatible: auto_approve_future defaults to False
- Add test for per-review granularity
- Update all existing tests to use new structure
- Use log_once_per_task for embedding generation failures
- Prevents log spam when API key is missing
- Now shows single warning per task instead of per-file warnings
- Makes logs more readable and actionable
- Add defense-in-depth check that graph_exec_id belongs to user_id
- Validates ownership before creating auto-approval records
- Prevents potential misuse if function called from other contexts
- Addresses CodeRabbit security concern (comment 2718990979)
- Add user_timezone to ExecutionContext when resuming after review approval
- Fetch user to get timezone preference, defaulting to UTC if not set
- Make error deduplication more general using contextvars
- Replace global flag with log_once_per_task() helper for task-scoped logging
- Prevents log spam when processing batches (embeddings, etc.)
Addresses CodeRabbit comment about ExecutionContext not being exhaustive.
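A `log_once_per_task()` helper built on contextvars might look like the sketch below. This is an assumption about the shape of the real helper, not its actual code; each asyncio task runs in its own context copy, so the "already logged" set is scoped per task rather than per process.

```python
import contextvars
import logging

logger = logging.getLogger("example")

# Per-context set of keys that have already been logged. Each asyncio task
# gets a copy of the context, so the dedup scope is the task, not the process.
_seen_keys: contextvars.ContextVar = contextvars.ContextVar("seen_keys")

def log_once_per_task(key: str, message: str) -> bool:
    """Log `message` only the first time `key` is seen in this task."""
    try:
        seen = _seen_keys.get()
    except LookupError:
        seen = set()
        _seen_keys.set(seen)
    if key in seen:
        return False  # already warned in this task; suppress the repeat
    seen.add(key)
    logger.warning(message)
    return True

first = log_once_per_task("missing_api_key", "openai_internal_api_key not set")
second = log_once_per_task("missing_api_key", "openai_internal_api_key not set")
```

This replaces a global flag, so a batch of 1000 embeddings produces one warning instead of 1000, while a different task still gets its own warning.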
- Convert module-level TestClient to fixture to avoid event loop conflicts
- Add missing mock for get_pending_reviews_for_user in all tests
- Add client parameter to all test functions that use the test client
- Add missing mocks for get_graph_execution_meta in several tests
- Remove asyncio.gather to avoid event loop binding issues
- Process auto-approval creation sequentially with try/except for safety
All 14 review route tests now pass successfully.
Only log "openai_internal_api_key not set" error once per process instead
of on every embedding generation attempt. Reduces log spam when processing
batch operations without an API key configured.
- Use return_exceptions=True in asyncio.gather for auto-approval creation
to prevent endpoint failure when auto-approval fails (reviews already processed)
- Fix empty payload handling: use explicit None check instead of truthiness
- Distinguish auto-approvals from normal approvals: auto-approvals always
use current input_data, normal approvals preserve explicitly empty payloads
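The `return_exceptions=True` behaviour is standard asyncio: failed coroutines come back as exception objects in the results list instead of propagating. A minimal sketch with a simulated failing record (names are illustrative):

```python
import asyncio

async def create_auto_approval(review_id: str):
    # Simulated failure for one record; the real code writes to the DB.
    if review_id == "bad":
        raise RuntimeError(f"could not create auto-approval for {review_id}")
    return review_id

async def main():
    # return_exceptions=True: one failed auto-approval does not fail the
    # whole endpoint -- the reviews themselves were already processed.
    return await asyncio.gather(
        create_auto_approval("r1"),
        create_auto_approval("bad"),
        create_auto_approval("r2"),
        return_exceptions=True,
    )

results = asyncio.run(main())
failures = [r for r in results if isinstance(r, Exception)]
```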
- Test cancellation of pending reviews when stopping execution in REVIEW status
- Test database manager pattern when Prisma is disconnected
- Test cascading stop to children with pending reviews
- Fix mock to simulate status transition from RUNNING to TERMINATED
Covers the bug fixes in stop_graph_execution() that handle:
1. Immediate termination of REVIEW status executions
2. Cleanup of pending reviews when stopping
3. Recursive cleanup of subagent reviews via cascade
Critical bug fix: stopping a graph in REVIEW status caused timeouts and orphaned reviews.
## Bugs Fixed
### 1. REVIEW Status Not Handled
Before:
- stop_graph_execution() only handled QUEUED, INCOMPLETE, RUNNING, COMPLETED, FAILED
- REVIEW status → waited 15 seconds → TimeoutError
- Graph remained stuck in REVIEW status
After:
- REVIEW status treated like QUEUED/INCOMPLETE (terminate immediately)
- No need to wait for executor since execution is paused
- Clean termination without timeouts
### 2. Orphaned Pending Reviews
Before:
- Stopping graph → status = TERMINATED
- Pending reviews remained in WAITING status
- User saw reviews for terminated execution in UI
- Could not approve/reject (backend validation rejects)
- Reviews stuck until manual cleanup
After:
- When stopping REVIEW execution, clean up pending reviews
- Mark all WAITING reviews as REJECTED
- reviewMessage: 'Execution was stopped by user'
- processed: true, reviewedAt: now()
- No orphaned reviews in UI
### 3. Subagent Reviews
Before:
- Parent graph with child (subagent) executions
- Child paused for HITL review
- Stop parent → recursively stops child
- Child reviews orphaned (same bugs as above)
After:
- Cascade stop properly handles child REVIEW status
- All child reviews cleaned up recursively
- Clean shutdown of entire execution tree
## Implementation
Changes to stop_graph_execution():
1. Added ExecutionStatus.REVIEW to immediate termination list
2. Check if status == REVIEW before marking TERMINATED
3. Update all WAITING reviews to REJECTED with message
4. Log cleanup for debugging
5. Then terminate execution normally
Cascade behavior preserved:
- Still recursively stops all child executions
- Each child's reviews cleaned up individually
- Parent waits for all children to complete cleanup
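The review-cleanup step (point 3 above) can be sketched as a pure-Python transformation. The `Review` dataclass and field names mirror the Prisma fields mentioned above but are stand-ins, not the real model:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class Review:
    status: str
    processed: bool = False
    review_message: str = ""
    reviewed_at: Optional[datetime] = None

def cleanup_pending_reviews(reviews):
    """Mark every WAITING review REJECTED when the execution is stopped,
    so no orphaned reviews are left in the UI."""
    for r in reviews:
        if r.status == "WAITING":
            r.status = "REJECTED"
            r.review_message = "Execution was stopped by user"
            r.processed = True
            r.reviewed_at = datetime.now(timezone.utc)
    return reviews

reviews = cleanup_pending_reviews([Review("WAITING"), Review("APPROVED")])
```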
Defense in depth: prevent users from seeing/clicking review panel before
execution pauses for review.
Before:
- Reviews panel could show while execution is RUNNING
- User could click to open panel and see pending reviews
- Confusing UX: why are reviews shown if graph hasn't paused yet?
- Could lead to frustration when backend rejects the approval attempt
After:
- Panel hidden if execution status is RUNNING or QUEUED
- Panel only shows when status is REVIEW (paused for review)
- Clear UX: reviews appear only when execution needs user input
Benefits:
1. **Better UX**: No confusion about when to approve reviews
2. **Prevents invalid attempts**: User can't try to approve while running
3. **Works with backend validation**: Frontend hides, backend rejects
4. **Clear state**: Panel visibility directly matches execution state
Changes:
- Added status check: hide if RUNNING or QUEUED
- Panel shows only when execution has paused (REVIEW/INCOMPLETE)
- Existing polling logic still works for real-time updates
Defense in depth: validate execution status before processing reviews.
Before:
- Reviews could be processed regardless of execution status
- Could cause race conditions and deadlocks
- User confusion when reviews processed but execution still running
After:
- Reject review processing with 409 Conflict if status is not REVIEW/INCOMPLETE
- Only allow processing when execution is actually paused for review
- Clear error message explaining why the request was rejected
Benefits:
1. **Prevention over cure**: Stop invalid requests before processing
2. **Clear semantics**: Reviews can only be processed when execution paused
3. **Better UX**: User gets immediate feedback if they try to approve too early
4. **Simpler resume logic**: No need for complex status checks since we validate upfront
Changes:
- Fetch graph execution metadata early in the endpoint
- Validate status is REVIEW or INCOMPLETE before processing
- Removed redundant status checks in resume logic (already validated)
- Simplified resume flow: just check if pending reviews remain
- Fixed comment: 'all pending reviews' not 'some reviews'
Changed AI_AGENT_SAFETY_POPUP_SHOWN from a boolean flag to an array of
agent IDs. This ensures users see the safety popup once per unique agent
instead of once globally.
Why this is better:
- Different agents have different capabilities (sensitive actions, HITL blocks)
- User should be aware of what THIS specific agent can do
- Not too annoying since it's still only once per agent, not every run
- Better safety awareness when switching between safe and risky agents
Changes:
- Store array of seen agent IDs in localStorage instead of single boolean
- Pass agentId to useAIAgentSafetyPopup hook and AIAgentSafetyPopup component
- Check if current agent ID is in the seen list before showing popup
- Add agent ID to list when user acknowledges popup
Testing:
- Clear localStorage or remove specific agent ID from array to re-trigger popup
- Each unique agent shows popup on first run only
When users approve/reject reviews but the execution status is not REVIEW
(due to race conditions or bugs), the reviews get marked as processed but
execution never resumes, leaving the graph stuck forever.
This fix ensures that:
- If no pending reviews remain after processing, we ALWAYS attempt to resume
- Only skip if status is COMPLETED or FAILED (already finished)
- Log warning if status is unexpected (not REVIEW) but still resume to prevent deadlock
- Prevents scenario where user has nothing to do (reviews processed) but graph never completes
Example deadlock scenario (now prevented):
1. Graph creates review, sets status to REVIEW
2. User approves review → marked as APPROVED
3. Status check finds unexpected state (not REVIEW)
4. OLD: Return without resuming → graph stuck forever
5. NEW: Log warning and resume anyway → graph completes
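The resume policy this commit describes reduces to a small decision function. The sketch below is an assumed shape, not the actual endpoint code:

```python
import logging

logger = logging.getLogger("example")

TERMINAL_STATUSES = {"COMPLETED", "FAILED"}

def should_resume(status: str, pending_reviews_remaining: int) -> bool:
    """After processing reviews, always attempt to resume once no pending
    reviews remain -- unless the execution is already finished. An
    unexpected status gets a warning but still resumes, to avoid the
    stuck-forever deadlock."""
    if pending_reviews_remaining > 0:
        return False  # more reviews to process; nothing to resume yet
    if status in TERMINAL_STATUSES:
        return False  # already finished; resuming would make no sense
    if status != "REVIEW":
        logger.warning(
            "Unexpected execution status %s after review processing; "
            "resuming anyway to prevent deadlock", status,
        )
    return True
```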
- Add user_id parameter to check_approval for data isolation consistency
- Fix message text: 'block' → 'node' in auto-approval message
- Use walrus operator for cleaner approval_result check
- Move imports to top-level in test file (avoid local imports)
- Remove obvious comments (Check if pending, Resume execution, Load settings)
Fixed race condition where user approves reviews while graph execution
is still RUNNING, which could queue the execution twice and cause
duplicate/conflicting execution instances.
Solution:
- Check graph execution status BEFORE resuming
- Only resume if status is REVIEW (execution paused for review)
- Skip resumption if RUNNING (will naturally pick up approved reviews)
- Skip if COMPLETED/other (already finished)
This ensures we never queue an execution that's already running,
while still allowing the running execution to pick up approved
reviews naturally.
Added tests:
- All review action tests now mock get_graph_execution_meta
- Tests verify execution only resumes when status is REVIEW
Fixed "Client is not connected to the query engine" error when
check_approval is called from block execution context. The function
is now accessed through the database manager async client (RPC),
similar to other HITL methods like get_or_create_human_review.
Changes:
- Add check_approval to DatabaseManager and DatabaseManagerAsyncClient
- Update HITLReviewHelper to call check_approval via database client
- Remove direct import of check_approval in review.py
Merge auto-approval check and normal approval check into a single
function using find_first with OR condition. This reduces database
queries by checking both the node_exec_id and auto_approve_key in
one query.
- Add auto-approval via special nodeExecId key pattern (auto_approve_{graph_exec_id}_{node_id})
- Create auto-approval records in PendingHumanReview when user approves with auto-approve flag
- Check for existing auto-approval before requiring human review
- Remove node_id parameter from get_or_create_human_review
- Load graph settings properly when resuming execution after review
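The special-key pattern is just string construction plus a membership check. A sketch (the helper names are hypothetical; only the `auto_approve_{graph_exec_id}_{node_id}` format comes from the change above):

```python
def auto_approve_key(graph_exec_id: str, node_id: str) -> str:
    """Synthetic nodeExecId under which an auto-approval record is stored."""
    return f"auto_approve_{graph_exec_id}_{node_id}"

def is_auto_approved(existing_keys: set, graph_exec_id: str, node_id: str) -> bool:
    # Before requiring a human review, check whether the user previously
    # approved this node with the auto-approve flag set.
    return auto_approve_key(graph_exec_id, node_id) in existing_keys

# Stand-in for records already persisted in PendingHumanReview.
keys = {auto_approve_key("g1", "n1")}
```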
- Add refetchInterval to execution details query to poll while running/review
- Add polling support to usePendingReviewsForExecution hook
- Poll pending reviews every 2 seconds when execution is in REVIEW status
- This ensures the "X Reviews Pending" badge updates without page refresh
Include autoApproveFuture in the key prop to force PendingReviewCard
to remount when the toggle changes, which resets its internal state
to the original payload data.
The nodeId column was never added to PendingHumanReview. The migration
should only drop the foreign key constraint linking nodeExecId to
AgentNodeExecution, not try to drop a column that doesn't exist.
- Remove nodeId column from PendingHumanReview schema (use in-memory tracking)
- Remove foreign key relation from PendingHumanReview to AgentNodeExecution
- Use ExecutionContext.auto_approved_node_ids for auto-approval tracking
- Add auto-approve toggle in frontend (default off)
- When toggle enabled: disable editing and use original data
- Backend looks up agentNodeId from AgentNodeExecution when auto-approving
- Update tests to reflect schema changes
## Summary
- Remove explicit schema qualification (`{schema}.vector` and
`OPERATOR({schema}.<=>)`) from pgvector queries in `embeddings.py` and
`hybrid_search.py`
- Use unqualified `::vector` type cast and `<=>` operator which work
because pgvector is in the search_path on all environments
## Problem
The previous approach tried to explicitly qualify the vector type with
schema names, but this failed because:
- **CI environment**: pgvector is in `public` schema → `platform.vector`
doesn't exist
- **Dev (Supabase)**: pgvector is in `platform` schema → `public.vector`
doesn't exist
## Solution
Use unqualified `::vector` and `<=>` operator. PostgreSQL resolves these
via `search_path`, which includes the schema where pgvector is installed
on all environments.
Tested on both local and dev environments with a test script that
verified:
- ✅ Unqualified `::vector` type cast
- ✅ Unqualified `<=>` operator in ORDER BY
- ✅ Unqualified `<=>` in SELECT (similarity calculation)
- ✅ Combined query patterns matching actual usage
## Test plan
- [ ] CI tests pass
- [ ] Marketplace approval works on dev after deployment
Fixes: AUTOGPT-SERVER-763, AUTOGPT-SERVER-764, AUTOGPT-SERVER-76B
## Summary
Adds graceful error handling to AsyncRedisEventBus and RedisEventBus so
that connection failures log exceptions with full traceback while
remaining non-breaking. This allows DatabaseManager to operate without
Redis connectivity.
## Problem
DatabaseManager was failing with "Authentication required" when trying
to publish notifications via AsyncRedisNotificationEventBus. The service
has no Redis credentials configured, causing `increment_onboarding_runs`
to fail.
## Root Cause
When `increment_onboarding_runs` publishes a notification:
1. Calls `AsyncRedisNotificationEventBus().publish()`
2. Attempts to connect to Redis via `get_redis_async()`
3. Connection fails due to missing credentials
4. Exception propagates, failing the entire DB operation
Previous fix (#11775) made the cache module lazy, but didn't address the
notification bus which also requires Redis.
## Solution
Wrap Redis operations in try-except blocks:
- `publish_event`: Logs exception with traceback, continues without
publishing
- `listen_events`: Logs exception with traceback, returns empty
generator
- `wait_for_event`: Returns None on connection failure
Using `logger.exception()` instead of `logger.warning()` ensures full
stack traces are captured for debugging while keeping operations
non-breaking.
This allows services to operate without Redis when only using event bus
for non-critical notifications.
## Changes
- Modified `backend/data/event_bus.py`:
- Added graceful error handling to `RedisEventBus` and
`AsyncRedisEventBus`
- All Redis operations now catch exceptions and log with
`logger.exception()`
- Added `backend/data/event_bus_test.py`:
- Tests verify graceful degradation when Redis is unavailable
- Tests verify normal operation when Redis is available
## Test Plan
- [x] New tests verify graceful degradation when Redis unavailable
- [x] Existing notification tests still pass
- [x] DatabaseManager can increment onboarding runs without Redis
## Related Issues
Fixes https://significant-gravitas.sentry.io/issues/7205834440/
(AUTOGPT-SERVER-76D)
## Changes 🏗️
On the **Old Builder**, when running an agent...
### Before
<img width="800" height="614" alt="Screenshot 2026-01-21 at 21 27 05"
src="https://github.com/user-attachments/assets/a3b2ec17-597f-44d2-9130-9e7931599c38"
/>
The credentials are there, but they are not being recognised; you need to
click on them to select them.

### After
<img width="1029" height="728" alt="Screenshot 2026-01-21 at 21 26 47"
src="https://github.com/user-attachments/assets/c6e83846-6048-439e-919d-6807674f2d5a"
/>
It uses the new credentials UI and correctly auto-selects existing ones.
### Other
Fixed a small timezone display glitch on the new library view.
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Run agent in old builder
- [x] Credentials are auto-selected and using the new collapsed system
credentials UI
## Summary
- Fixes AUTOGPT-SERVER-76H - Error parsing LibraryAgent from database
due to null values in GraphSettings fields
- When parsing LibraryAgent settings from the database, null values for
`human_in_the_loop_safe_mode` and `sensitive_action_safe_mode` were
causing Pydantic validation errors
- Adds `BeforeValidator` annotations to coerce null values to their
defaults (True and False respectively)
## Test plan
- [x] Verified with unit tests that GraphSettings can now handle
None/null values
- [x] Backend tests pass
- [x] Manually tested with all scenarios (None, empty dict, explicit
values)
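The `BeforeValidator` approach can be sketched with Pydantic v2. The field names below come from the change above; the `_default_if_none` helper is an assumed implementation of the coercion, not the project's actual code:

```python
from typing import Annotated
from pydantic import BaseModel, BeforeValidator

def _default_if_none(default):
    # Coerce a null from the database to the field's default before
    # Pydantic's bool validation runs (which would otherwise reject None).
    def validator(value):
        return default if value is None else value
    return validator

class GraphSettings(BaseModel):
    human_in_the_loop_safe_mode: Annotated[
        bool, BeforeValidator(_default_if_none(True))
    ] = True
    sensitive_action_safe_mode: Annotated[
        bool, BeforeValidator(_default_if_none(False))
    ] = False

# Null values from the DB no longer raise a validation error.
settings = GraphSettings(
    human_in_the_loop_safe_mode=None, sensitive_action_safe_mode=None
)
```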
Add new LLM Picker for the new Builder.
### Changes 🏗️
- Enrich `LlmModelMeta` (in `llm.py`) with human-readable model, creator,
and provider names and a price tier (note: this is a temporary measure;
all LlmModelMeta will be removed completely once the LLM Registry is ready)
- Add provider icons
- Add custom input field `LlmModelField` and its components&helpers
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] LLM model picker works correctly in the new Builder
- [x] Legacy LLM model picker works in the old Builder
Instead of disabling all safe modes when approving all future actions,
we now track the specific node IDs that should be auto-approved. This means
clicking "Approve all future actions" will only auto-approve future
reviews from the same blocks, not all reviews.
Changes:
- Add nodeId field to PendingHumanReview schema
- Add auto_approved_node_ids set to ExecutionContext
- Update review helper to check auto_approved_node_ids
- Change API from disable_future_reviews to auto_approve_node_ids
- Update frontend to pass node_ids when bulk approving
- Address PR feedback: remove barrel file, JSDoc comments, and cleanup