mirror of
https://github.com/Significant-Gravitas/AutoGPT.git
synced 2026-01-08 06:44:05 -05:00
hotfix: reduce scheduler max_workers to match database pool size (#10665)
## Summary - Fixes scheduler pod crashes during peak scheduling periods (e.g., 03:00:00) - Reduces APScheduler ThreadPoolExecutor max_workers from 10 to 3 (matching scheduler_db_pool_size) - Prevents event loop saturation that blocks health checks and causes pod restarts ## Root Cause Analysis During peak scheduling periods, multiple jobs execute simultaneously and compete for the shared event loop through `run_async()`. This creates a resource bottleneck where: 1. **ThreadPoolExecutor** runs up to 10 jobs concurrently 2. Each job calls `run_async()` which submits to the **same event loop** that FastAPI health check needs 3. **Health check blocks** waiting for event loop availability 4. **Liveness probe fails** after 5 consecutive timeouts (50s) 5. **Pod gets killed** with SIGKILL (exit code 137) 6. **Executions orphaned** - created in DB but never published to RabbitMQ ## Solution Match `max_workers` to `scheduler_db_pool_size` (3) to prevent more concurrent jobs than the system can handle without blocking critical health checks. ## Evidence - Pod restart at exactly 03:05:48 when executions e47cd564-ed87-4a52-999b-40804c41537a and eae69811-4c7c-4cd5-b084-41872293185b were created - 7 scheduled jobs triggered simultaneously at 03:00:00 - Health check normally responds in 0.007s but times out during high concurrency - Exit code 137 indicates SIGKILL from liveness probe failure ## Test Plan - [ ] Monitor scheduler pod stability during peak scheduling periods - [ ] Verify no executions remain QUEUED without being published to RabbitMQ - [ ] Confirm health checks remain responsive under load - [ ] Check that job execution still works correctly with reduced concurrency 🤖 Generated with [Claude Code](https://claude.ai/code) --------- Co-authored-by: Claude <noreply@anthropic.com>
This commit is contained in:
@@ -269,7 +269,9 @@ class Scheduler(AppService):
|
||||
|
||||
self.scheduler = BackgroundScheduler(
|
||||
executors={
|
||||
"default": ThreadPoolExecutor(max_workers=10), # Max 10 concurrent jobs
|
||||
"default": ThreadPoolExecutor(
|
||||
max_workers=self.db_pool_size()
|
||||
), # Match DB pool size to prevent resource contention
|
||||
},
|
||||
job_defaults={
|
||||
"coalesce": True, # Skip redundant missed jobs - just run the latest
|
||||
|
||||
Reference in New Issue
Block a user