2 Commits

Author SHA1 Message Date
Zamil Majdy
7b951c977e feat(platform): implement graph-level Safe Mode toggle for HITL blocks (#11455)
## Summary

This PR implements a graph-level Safe Mode toggle system for
Human-in-the-Loop (HITL) blocks. When Safe Mode is ON (default), HITL
blocks require manual review before proceeding. When OFF, they execute
automatically.

## 🔧 Backend Changes

- **Database**: Added `metadata` JSON column to `AgentGraph` table with
migration
- **API**: Updated `execute_graph` endpoint to accept `safe_mode`
parameter
- **Execution**: Enhanced execution context to use graph metadata as
default with API override capability
- **Auto-detection**: Automatically populate `has_human_in_the_loop` for
graphs containing HITL blocks
- **Block Detection**: HITL block ID:
`8b2a7b3c-6e9d-4a5f-8c1b-2e3f4a5b6c7d`

## 🎨 Frontend Changes

- **Component**: New `FloatingSafeModeToggle` with dual variants:
  - **White variant**: For library pages, integrates with action buttons
  - **Black variant**: For builders, floating positioned  
- **Integration**: Added toggles to both new/legacy builders and library
pages
- **API Integration**: Direct graph metadata updates via
`usePutV1UpdateGraphVersion`
- **Query Management**: React Query cache invalidation for consistent UI
updates
- **Conditional Display**: Toggle only appears when graph contains HITL
blocks

## 🛠 Technical Implementation

- **Safe Mode ON** (default): HITL blocks require manual review before
proceeding
- **Safe Mode OFF**: HITL blocks execute automatically without
intervention
- **Priority**: Backend API `safe_mode` parameter takes precedence over
graph metadata
- **Detection**: Auto-populates `has_human_in_the_loop` metadata field
- **Positioning**: Proper z-index and responsive positioning for
floating elements

## 🚧 Known Issues (Work in Progress)

### High Priority
- [ ] **Toggle state persistence**: Always shows "ON" regardless of
actual state - query invalidation issue
- [ ] **LibraryAgent metadata**: Missing metadata field causing
TypeScript errors
- [ ] **Tooltip z-index**: Still covered by some UI elements despite
high z-index

### Medium Priority  
- [ ] **HITL detection**: Logic needs improvement for reliable block
detection
- [ ] **Error handling**: Removing HITL blocks from graph causes save
errors
- [ ] **TypeScript**: Fix type mismatches between GraphModel and
LibraryAgent

### Low Priority
- [ ] **Frontend API**: Add `safe_mode` parameter to execution calls
once OpenAPI is regenerated
- [ ] **Performance**: Consider debouncing rapid toggle clicks

## 🧪 Test Plan

- [ ] Verify toggle appears only when graph has HITL blocks
- [ ] Test toggle persistence across page refreshes  
- [ ] Confirm API calls update graph metadata correctly
- [ ] Validate execution behavior respects safe mode setting
- [ ] Check styling consistency across builder and library contexts

## 🔗 Related

- Addresses requirements for graph-level HITL configuration
- Builds on existing FloatingReviewsPanel infrastructure
- Integrates with existing graph metadata system

🤖 Generated with [Claude Code](https://claude.ai/code)
2025-12-02 09:55:55 +00:00
Zamil Majdy
193866232c hotfix(backend): fix rate-limited messages blocking queue by republishing to back (#11326)
## Summary
Fix critical queue blocking issue where rate-limited user messages
prevent other users' executions from being processed, causing the 135
late executions reported in production.

## Root Cause Analysis
When a user exceeds `max_concurrent_graph_executions_per_user` (25), the
executor uses `basic_nack(requeue=True)` which sends the message to the
**FRONT** of the RabbitMQ queue. This creates an infinite blocking loop
where:
1. Rate-limited message goes to front of queue
2. Gets processed, hits rate limit again  
3. Goes back to front of queue
4. Blocks all other users' messages indefinitely

## Solution Implementation

### 🔧 Core Changes
- **New setting**: `requeue_by_republishing` (default: `True`) in
`backend/util/settings.py`
- **Smart `_ack_message`**: Automatically uses republishing when
`requeue=True` and setting enabled
- **Efficient implementation**: Uses existing `self.run_client`
connection instead of creating new ones
- **Integration test**: Real RabbitMQ test validates queue ordering
behavior

### 🔄 Technical Implementation
**Before (blocking):**
```python
basic_nack(delivery_tag, requeue=True)  # Goes to FRONT of queue 
```

**After (non-blocking):**
```python
if requeue and self.config.requeue_by_republishing:
    # First: Republish to BACK of queue
    self.run_client.publish_message(...)
    # Then: Reject without requeue
    basic_nack(delivery_tag, requeue=False)
```

### 📊 Impact
-  **Other users' executions no longer blocked** by rate-limited users
-  **Fair queue processing** - FIFO behavior maintained for all users
-  **Rate limiting still works** - just doesn't block others
-  **Configurable** - can revert to old behavior with
`requeue_by_republishing=False`
-  **Zero performance impact** - uses existing connections

## Test Plan
- **Integration test**: `test_requeue_integration.py` validates real
RabbitMQ queue ordering
- **Scenario testing**: Confirms rate-limited messages go to back of
queue
- **Cross-user validation**: Verifies other users' messages process
correctly
- **Setting test**: Confirms configuration loads with correct defaults

## Deployment Strategy
This is a **hotfix** that can be deployed immediately:
- **Backward compatible**: Old behavior available via config
- **Safe default**: New behavior is safer than current state
- **No breaking changes**: All existing functionality preserved
- **Immediate relief**: Resolves production queue blocking

## Files Modified
- `backend/executor/manager.py`: Enhanced `_ack_message` logic and
`_requeue_message_to_back` method
- `backend/util/settings.py`: Added `requeue_by_republishing`
configuration field
- `test_requeue_integration.py`: Integration test for queue ordering
validation

## Related Issues
Fixes the 135 late executions issue where messages were stuck in QUEUED
state despite available executor capacity (583m/600m utilization).

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-11-05 16:24:07 +00:00