## Summary
Fix critical queue blocking issue where rate-limited user messages
prevent other users' executions from being processed, causing the 135
late executions reported in production.
## Root Cause Analysis
When a user exceeds `max_concurrent_graph_executions_per_user` (25), the
executor uses `basic_nack(requeue=True)` which sends the message to the
**FRONT** of the RabbitMQ queue. This creates an infinite blocking loop
where:
1. Rate-limited message goes to front of queue
2. Gets processed, hits rate limit again
3. Goes back to front of queue
4. Blocks all other users' messages indefinitely
## Solution Implementation
### 🔧 Core Changes
- **New setting**: `requeue_by_republishing` (default: `True`) in
`backend/util/settings.py`
- **Smart `_ack_message`**: Automatically uses republishing when
`requeue=True` and setting enabled
- **Efficient implementation**: Uses existing `self.run_client`
connection instead of creating new ones
- **Integration test**: Real RabbitMQ test validates queue ordering
behavior
### 🔄 Technical Implementation
**Before (blocking):**
```python
basic_nack(delivery_tag, requeue=True) # Goes to FRONT of queue ❌
```
**After (non-blocking):**
```python
if requeue and self.config.requeue_by_republishing:
# First: Republish to BACK of queue
self.run_client.publish_message(...)
# Then: Reject without requeue
basic_nack(delivery_tag, requeue=False)
```
### 📊 Impact
- ✅ **Other users' executions no longer blocked** by rate-limited users
- ✅ **Fair queue processing** - FIFO behavior maintained for all users
- ✅ **Rate limiting still works** - just doesn't block others
- ✅ **Configurable** - can revert to old behavior with
`requeue_by_republishing=False`
- ✅ **Zero performance impact** - uses existing connections
## Test Plan
- **Integration test**: `test_requeue_integration.py` validates real
RabbitMQ queue ordering
- **Scenario testing**: Confirms rate-limited messages go to back of
queue
- **Cross-user validation**: Verifies other users' messages process
correctly
- **Setting test**: Confirms configuration loads with correct defaults
## Deployment Strategy
This is a **hotfix** that can be deployed immediately:
- **Backward compatible**: Old behavior available via config
- **Safe default**: New behavior is safer than current state
- **No breaking changes**: All existing functionality preserved
- **Immediate relief**: Resolves production queue blocking
## Files Modified
- `backend/executor/manager.py`: Enhanced `_ack_message` logic and
`_requeue_message_to_back` method
- `backend/util/settings.py`: Added `requeue_by_republishing`
configuration field
- `test_requeue_integration.py`: Integration test for queue ordering
validation
## Related Issues
Fixes the 135 late executions issue where messages were stuck in QUEUED
state despite available executor capacity (583m/600m utilization).
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
---------
Co-authored-by: Claude <noreply@anthropic.com>