Files
AutoGPT/autogpt_platform/backend
Zamil Majdy 4ffb99bfb0 feat(backend): Add block error rate monitoring and Discord alerts (#10332)
## Summary

This PR adds a simple block error rate monitoring system that runs every
24 hours (configurable) and sends Discord alerts when blocks exceed the
error rate threshold.

## Changes Made

**Modified Files:**
- `backend/executor/scheduler.py` - Added `report_block_error_rates`
function and scheduled job
- `backend/util/settings.py` - Added configuration options
- `backend/.env.example` - Added environment variable examples
- Refactor scheduled job logics in scheduler.py into seperate files

## Configuration

```bash
# Block Error Rate Monitoring
BLOCK_ERROR_RATE_THRESHOLD=0.5  # 50% error rate threshold
BLOCK_ERROR_RATE_CHECK_INTERVAL_SECS=86400  # 24 hours
```

## How It Works

1. **Scheduled Job**: Runs every 24 hours (configurable via
`BLOCK_ERROR_RATE_CHECK_INTERVAL_SECS`)
2. **Error Rate Calculation**: Queries last 24 hours of node executions
and calculates error rates per block
3. **Threshold Check**: Alerts on blocks with ≥50% error rate
(configurable via `BLOCK_ERROR_RATE_THRESHOLD`)
4. **Discord Alert**: Sends alert to Discord using existing
`discord_system_alert` function
5. **Manual Execution**: Available via
`execute_report_block_error_rates()` scheduler client method

## Alert Format

```
Block Error Rate Alert:
🚨 Block 'DeprecatedGPT3Block' has 75.0% error rate (75/100) in the last 24 hours
🚨 Block 'BrokenImageBlock' has 60.0% error rate (30/50) in the last 24 hours
```

## Testing

Can be tested manually via:
```python
from backend.executor.scheduler import SchedulerClient
client = SchedulerClient()
result = client.execute_report_block_error_rates()
```

## Implementation Notes

- Follows the same pattern as `report_late_executions` function
- Only checks blocks with ≥10 executions to avoid noise
- Uses existing Discord notification infrastructure
- Configurable threshold and check interval
- Proper error handling and logging

## Test plan

- [x] Verify configuration loads correctly
- [x] Test error rate calculation with existing database
- [x] Confirm Discord integration works
- [x] Test manual execution via scheduler client
- [x] Verify scheduled job runs correctly

🤖 Generated with [Claude Code](https://claude.ai/code)

---------

Co-authored-by: Claude AI <claude@anthropic.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-07-10 21:56:58 +00:00
..
2025-04-15 15:40:15 +00:00