AutoGPT

mirror of https://github.com/Significant-Gravitas/AutoGPT.git synced 2026-01-09 07:08:09 -05:00

Commit Graph

Author	SHA1	Message	Date

Zamil Majdy

7b951c977e

feat(platform): implement graph-level Safe Mode toggle for HITL blocks (#11455)

## Summary

This PR implements a graph-level Safe Mode toggle system for
Human-in-the-Loop (HITL) blocks. When Safe Mode is ON (default), HITL
blocks require manual review before proceeding. When OFF, they execute
automatically.

## 🔧 Backend Changes

- **Database**: Added `metadata` JSON column to `AgentGraph` table with
migration
- **API**: Updated `execute_graph` endpoint to accept `safe_mode`
parameter
- **Execution**: Enhanced execution context to use graph metadata as
default with API override capability
- **Auto-detection**: Automatically populate `has_human_in_the_loop` for
graphs containing HITL blocks
- **Block Detection**: HITL block ID:
`8b2a7b3c-6e9d-4a5f-8c1b-2e3f4a5b6c7d`
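
The auto-detection step above could look roughly like the sketch below. It is a minimal illustration, not the code from this PR: the `Node` and `Graph` classes and the `populate_hitl_flag` helper are hypothetical stand-ins, and only the block ID and the `has_human_in_the_loop` key come from the description.

```python
from dataclasses import dataclass, field

# Block ID quoted in the commit message for the HITL block.
HITL_BLOCK_ID = "8b2a7b3c-6e9d-4a5f-8c1b-2e3f4a5b6c7d"


@dataclass
class Node:
    """Hypothetical stand-in for a graph node; only the block ID matters here."""
    block_id: str


@dataclass
class Graph:
    """Hypothetical stand-in for AgentGraph with its JSON `metadata` column."""
    nodes: list[Node]
    metadata: dict = field(default_factory=dict)


def populate_hitl_flag(graph: Graph) -> Graph:
    """Set metadata['has_human_in_the_loop'] based on the graph's nodes."""
    graph.metadata["has_human_in_the_loop"] = any(
        node.block_id == HITL_BLOCK_ID for node in graph.nodes
    )
    return graph
```

Recomputing the flag whenever the graph is saved would keep it in sync when HITL blocks are added or removed, although the PR does not state when the real detection runs.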

## 🎨 Frontend Changes

- **Component**: New `FloatingSafeModeToggle` with two variants:
  - **White variant**: For library pages, integrates with action buttons
  - **Black variant**: For builders, positioned as a floating element
- **Integration**: Added toggles to both new/legacy builders and library
pages
- **API Integration**: Direct graph metadata updates via
`usePutV1UpdateGraphVersion`
- **Query Management**: React Query cache invalidation for consistent UI
updates
- **Conditional Display**: Toggle only appears when graph contains HITL
blocks

## 🛠 Technical Implementation

- **Safe Mode ON** (default): HITL blocks require manual review before
proceeding
- **Safe Mode OFF**: HITL blocks execute automatically without
intervention
- **Priority**: Backend API `safe_mode` parameter takes precedence over
graph metadata
- **Detection**: Auto-populates `has_human_in_the_loop` metadata field
- **Positioning**: Proper z-index and responsive positioning for
floating elements
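
The priority rule above (API parameter first, then graph metadata, then the ON default) can be sketched as follows. This is an illustrative sketch only: the function name `resolve_safe_mode` and the `"safe_mode"` metadata key are assumptions, not identifiers confirmed by this PR.

```python
from typing import Optional


def resolve_safe_mode(
    api_safe_mode: Optional[bool],
    graph_metadata: Optional[dict],
) -> bool:
    """Pick the effective Safe Mode value for one execution.

    Order of precedence, as described in the list above:
    1. the `safe_mode` value passed to the API, if any;
    2. the value stored in graph metadata (the key name is assumed here);
    3. ON (True) as the default.
    """
    if api_safe_mode is not None:
        return api_safe_mode
    metadata = graph_metadata or {}
    return bool(metadata.get("safe_mode", True))
```

For example, `resolve_safe_mode(None, {"safe_mode": False})` yields `False`, while `resolve_safe_mode(True, {"safe_mode": False})` yields `True` because the API value wins.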

## 🚧 Known Issues (Work in Progress)

### High Priority
- [ ] **Toggle state persistence**: Toggle always shows "ON" regardless of
actual state (query invalidation issue)
- [ ] **LibraryAgent metadata**: Missing metadata field causing
TypeScript errors
- [ ] **Tooltip z-index**: Still covered by some UI elements despite
high z-index

### Medium Priority  
- [ ] **HITL detection**: Logic needs improvement for reliable block
detection
- [ ] **Error handling**: Removing HITL blocks from graph causes save
errors
- [ ] **TypeScript**: Fix type mismatches between GraphModel and
LibraryAgent

### Low Priority
- [ ] **Frontend API**: Add `safe_mode` parameter to execution calls
once OpenAPI is regenerated
- [ ] **Performance**: Consider debouncing rapid toggle clicks

## 🧪 Test Plan

- [ ] Verify toggle appears only when graph has HITL blocks
- [ ] Test toggle persistence across page refreshes  
- [ ] Confirm API calls update graph metadata correctly
- [ ] Validate execution behavior respects safe mode setting
- [ ] Check styling consistency across builder and library contexts

## 🔗 Related

- Addresses requirements for graph-level HITL configuration
- Builds on existing FloatingReviewsPanel infrastructure
- Integrates with existing graph metadata system

🤖 Generated with [Claude Code](https://claude.ai/code)

2025-12-02 09:55:55 +00:00

Zamil Majdy

193866232c

hotfix(backend): fix rate-limited messages blocking queue by republishing to back (#11326)

## Summary
Fix critical queue blocking issue where rate-limited user messages
prevent other users' executions from being processed, causing the 135
late executions reported in production.

## Root Cause Analysis
When a user exceeds `max_concurrent_graph_executions_per_user` (25), the
executor uses `basic_nack(requeue=True)` which sends the message to the
**FRONT** of the RabbitMQ queue. This creates an infinite blocking loop
where:
1. Rate-limited message goes to front of queue
2. Gets processed, hits rate limit again  
3. Goes back to front of queue
4. Blocks all other users' messages indefinitely
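
The loop described above can be reproduced with a toy model. The sketch below is a simplified simulation with a single consumer and an in-memory `deque`, not real RabbitMQ behavior. The `simulate` function and its parameters are hypothetical, and a rate-limited user is assumed to stay rate-limited for the whole run.

```python
from collections import deque


def simulate(messages, rate_limited_users, requeue_to_back, max_steps=20):
    """Return the messages that get executed, in order.

    messages: list of (user, job) tuples in arrival order.
    rate_limited_users: users whose messages are always rejected in this model.
    requeue_to_back: False mimics requeueing to the front of the queue,
        True mimics republishing to the back.
    max_steps: cap on consumer iterations so the blocking case terminates.
    """
    queue = deque(messages)
    executed = []
    for _ in range(max_steps):
        if not queue:
            break
        user, job = queue.popleft()
        if user in rate_limited_users:
            if requeue_to_back:
                queue.append((user, job))      # other messages get a turn
            else:
                queue.appendleft((user, job))  # same message is picked again
            continue
        executed.append((user, job))
    return executed


msgs = [("alice", "a1"), ("bob", "b1"), ("carol", "c1")]
blocked = simulate(msgs, {"alice"}, requeue_to_back=False)
fair = simulate(msgs, {"alice"}, requeue_to_back=True)
```

In this model `blocked` stays empty because the rate-limited message is re-delivered on every step, while `fair` contains the jobs from `bob` and `carol`.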

## Solution Implementation

### 🔧 Core Changes
- **New setting**: `requeue_by_republishing` (default: `True`) in
`backend/util/settings.py`
- **Smart `_ack_message`**: Automatically uses republishing when
`requeue=True` and setting enabled
- **Efficient implementation**: Uses existing `self.run_client`
connection instead of creating new ones
- **Integration test**: Real RabbitMQ test validates queue ordering
behavior

### 🔄 Technical Implementation
**Before (blocking):**
```python
basic_nack(delivery_tag, requeue=True)  # Goes to FRONT of queue ❌
```

**After (non-blocking):**
```python
if requeue and self.config.requeue_by_republishing:
    # First: Republish to BACK of queue
    self.run_client.publish_message(...)
    # Then: Reject without requeue
    basic_nack(delivery_tag, requeue=False)
```
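
The ordering in the snippet above (publish first, then reject without requeue) can be exercised without a broker. The sketch below uses a hypothetical `FakeBroker` that only records calls. The `reject` helper, the `publish_message(body)` signature, and the fallback to a plain requeue when publishing fails are all assumptions for illustration, not the PR's implementation.

```python
class FakeBroker:
    """Records calls in order; stands in for both the channel and run_client."""

    def __init__(self, publish_fails=False):
        self.events = []
        self.publish_fails = publish_fails

    def publish_message(self, body):
        # Hypothetical signature; the real call's arguments are elided above.
        if self.publish_fails:
            raise ConnectionError("publish failed")
        self.events.append(("publish", body))

    def basic_nack(self, delivery_tag, requeue):
        self.events.append(("nack", delivery_tag, requeue))


def reject(broker, delivery_tag, body, requeue, requeue_by_republishing=True):
    """Reject a delivery, optionally moving it to the back of the queue."""
    if requeue and requeue_by_republishing:
        try:
            # Publish the copy first so the message is never dropped
            # before its replacement exists.
            broker.publish_message(body)
        except ConnectionError:
            # Assumed fallback: keep the message via a normal requeue.
            broker.basic_nack(delivery_tag, requeue=True)
            return
        broker.basic_nack(delivery_tag, requeue=False)
        return
    broker.basic_nack(delivery_tag, requeue=requeue)
```

Publishing before rejecting means a failure between the two calls can leave a duplicate message but not a lost one, which is usually the safer trade-off for an execution queue.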

### 📊 Impact
- ✅ **Other users' executions no longer blocked** by rate-limited users
- ✅ **Fair queue processing** - FIFO behavior maintained for all users
- ✅ **Rate limiting still works** - just doesn't block others
- ✅ **Configurable** - can revert to old behavior with
`requeue_by_republishing=False`
- ✅ **Zero performance impact** - uses existing connections

## Test Plan
- **Integration test**: `test_requeue_integration.py` validates real
RabbitMQ queue ordering
- **Scenario testing**: Confirms rate-limited messages go to back of
queue
- **Cross-user validation**: Verifies other users' messages process
correctly
- **Setting test**: Confirms configuration loads with correct defaults

## Deployment Strategy
This is a **hotfix** that can be deployed immediately:
- **Backward compatible**: Old behavior available via config
- **Safe default**: New behavior is safer than current state
- **No breaking changes**: All existing functionality preserved
- **Immediate relief**: Resolves production queue blocking

## Files Modified
- `backend/executor/manager.py`: Enhanced `_ack_message` logic and
`_requeue_message_to_back` method
- `backend/util/settings.py`: Added `requeue_by_republishing`
configuration field
- `test_requeue_integration.py`: Integration test for queue ordering
validation

## Related Issues
Fixes the 135 late executions issue where messages were stuck in QUEUED
state despite available executor capacity (583m/600m utilization).

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>

2025-11-05 16:24:07 +00:00

2 Commits