Add extensive diagnostic capabilities for on-call engineers to monitor and manage execution health.

Backend Enhancements:
- Add 18 diagnostic metrics covering failures, orphaned executions, stuck queued, throughput, and queue health
- Implement orphaned execution detection (>24h old, not in executor)
- Add stuck queued detection (QUEUED >1h, never started)
- Add long-running execution detection (RUNNING >24h)
- Monitor both execution and cancel RabbitMQ queues
- Track failure rates (1h, 24h) and execution throughput metrics

New Backend Endpoints (15 total):
- GET /admin/diagnostics/executions/orphaned - List orphaned executions
- GET /admin/diagnostics/executions/stuck-queued - List stuck queued executions
- GET /admin/diagnostics/executions/long-running - List long-running executions
- GET /admin/diagnostics/executions/failed - List failed executions with error messages
- POST /admin/diagnostics/executions/cleanup-all-orphaned - Cleanup all orphaned (operates on entire dataset)
- POST /admin/diagnostics/executions/requeue - Requeue single stuck execution
- POST /admin/diagnostics/executions/requeue-bulk - Requeue selected executions
- POST /admin/diagnostics/executions/requeue-all-stuck - Requeue all stuck queued (operates on entire dataset)

Execution Management:
- Dual-mode stop: Active executions (cancel signals) vs orphaned (direct DB cleanup)
- Intelligent Stop All: Auto-splits active/orphaned, executes in parallel
- Requeue functionality for stuck QUEUED executions with credit cost warnings
- Stop sends cancel signals to RabbitMQ for graceful termination
- Cleanup orphaned updates DB directly without cancel signals
- ALL endpoints operate on entire datasets (not limited to pagination)

Frontend Enhancements:
- 5-tab filtering interface: All, Orphaned, Stuck Queued, Long-Running, Failed
- Clickable alert cards (🟠 🔴 🟡) automatically switch to relevant tabs
- Tab badges show live counts from diagnostics metrics
- Age column displays execution duration (e.g., "245d 12h")
- Orange row highlighting for orphaned executions (>24h old)
- Error message column for failed executions with hover tooltips
- Click-to-copy for execution IDs and user IDs with visual feedback
- Status badge colors match library view (blue=RUNNING, yellow=QUEUED, red=FAILED)

Tab-Specific Actions:
- Stuck Queued: Cleanup All OR Requeue All buttons with cost warnings
- Stuck Queued per-row: 🟠 Cleanup OR 🔵 Requeue buttons
- Orphaned: Cleanup All (operates on ALL orphaned)
- Long-Running: Stop All (sends cancel signals)
- Failed: View-only with error details
- All: Stop All (intelligent split of active/orphaned)

Alert Cards:
- 🟠 Orphaned: Shows count with RUNNING/QUEUED breakdown, click to view
- 🔴 Failed (24h): Shows count with hourly rate, click to view
- 🟡 Long-Running: Shows count with oldest execution age, click to view

Updated Diagnostic Info Card:
- Color-coded explanations for each execution type
- When to cleanup vs requeue vs stop
- Credit cost implications clearly documented
- Queue health thresholds explained

Provides ~70% coverage of on-call guide requirements for troubleshooting execution issues, orphaned database records, and system health monitoring.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
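The detection thresholds listed in the commit message (orphaned: >24h old and not in the executor; stuck queued: QUEUED >1h; long-running: RUNNING >24h) can be illustrated with a small shell sketch. This is not the backend's implementation: the function name `classify_execution`, its arguments, and the rule that "orphaned" takes precedence over the other two categories are assumptions made for illustration only.

```shell
# Hypothetical helper, not the backend's actual logic.
# Usage: classify_execution STATUS AGE_SECONDS IN_EXECUTOR(yes|no)
classify_execution() {
  status=$1
  age_seconds=$2
  in_executor=$3
  hour=3600
  day=86400

  case "$status" in
    RUNNING|QUEUED)
      # Orphaned: older than 24h and unknown to the executor.
      # Assumption: this check wins over the other two categories.
      if [ "$age_seconds" -gt "$day" ] && [ "$in_executor" = "no" ]; then
        echo "orphaned"
        return 0
      fi
      ;;
  esac

  if [ "$status" = "QUEUED" ] && [ "$age_seconds" -gt "$hour" ]; then
    # Stuck queued: QUEUED for more than 1h and never started.
    echo "stuck-queued"
  elif [ "$status" = "RUNNING" ] && [ "$age_seconds" -gt "$day" ]; then
    # Long-running: RUNNING for more than 24h while still tracked.
    echo "long-running"
  else
    echo "ok"
  fi
}
```

For example, `classify_execution QUEUED 7200 yes` prints `stuck-queued`, because a two-hour-old QUEUED execution exceeds the 1h threshold but not the 24h one.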
AutoGPT Platform
Welcome to the AutoGPT Platform - a powerful system for creating and running AI agents to solve business problems. This platform enables you to harness the power of artificial intelligence to automate tasks, analyze data, and generate insights for your organization.
Getting Started
Prerequisites
- Docker
- Docker Compose V2 (comes with Docker Desktop, or can be installed separately)
Running the System
To run the AutoGPT Platform, follow these steps:
- Clone this repository to your local machine and navigate to the autogpt_platform directory within the repository:

      git clone <https://github.com/Significant-Gravitas/AutoGPT.git | git@github.com:Significant-Gravitas/AutoGPT.git>
      cd AutoGPT/autogpt_platform

- Run the following command:

      cp .env.default .env

  This command will copy the .env.default file to .env. You can modify the .env file to add your own environment variables.

- Run the following command:

      docker compose up -d

  This command will start all the necessary backend services defined in the docker-compose.yml file in detached mode.

- After all the services are in ready state, open your browser and navigate to http://localhost:3000 to access the AutoGPT Platform frontend.
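Services can take a while to reach a ready state after `docker compose up -d`. One way to wait for the frontend instead of refreshing the browser is to poll the URL. The helper below is a minimal sketch: `wait_for_url` is a hypothetical name, the attempt count and delay are arbitrary defaults, and it assumes `curl` is installed.

```shell
# Hypothetical helper: poll a URL until it answers or the attempts run out.
# Usage: wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_url() {
  url=$1
  attempts=${2:-30}
  delay=${3:-2}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: treat HTTP errors as failure, -s: silent, -o /dev/null: drop the body
    if curl -fs -o /dev/null "$url"; then
      echo "ready: $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "timed out waiting for $url" >&2
  return 1
}
```

Usage: `wait_for_url http://localhost:3000` returns 0 once the frontend responds, and returns 1 after 30 failed attempts with the defaults.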
Running Just Core services
You can now run the following to enable just the core services.
# For help
make help
# Run just Supabase + Redis + RabbitMQ
make start-core
# Stop core services
make stop-core
# View logs from core services
make logs-core
# Run formatting and linting for backend and frontend
make format
# Run migrations for backend database
make migrate
# Run backend server
make run-backend
# Run frontend development server
make run-frontend
Docker Compose Commands
Here are some useful Docker Compose commands for managing your AutoGPT Platform:
- docker compose up -d: Start the services in detached mode.
- docker compose stop: Stop the running services without removing them.
- docker compose rm: Remove stopped service containers.
- docker compose build: Build or rebuild services.
- docker compose down: Stop and remove containers and networks (named volumes are removed only when the -v flag is added).
- docker compose watch: Watch for changes in your services and automatically update them.
Sample Scenarios
Here are some common scenarios where you might use multiple Docker Compose commands:
- Updating and restarting a specific service:

      docker compose build api_srv
      docker compose up -d --no-deps api_srv

  This rebuilds the api_srv service and restarts it without affecting other services.

- Viewing logs for troubleshooting:

      docker compose logs -f api_srv ws_srv

  This shows and follows the logs for both api_srv and ws_srv services.

- Scaling a service for increased load:

      docker compose up -d --scale executor=3

  This scales the executor service to 3 instances to handle increased load.

- Stopping the entire system for maintenance:

      docker compose stop
      docker compose rm -f
      docker compose pull
      docker compose up -d

  This stops all services, removes containers, pulls the latest images, and restarts the system.

- Developing with live updates:

      docker compose watch

  This watches for changes in your code and automatically updates the relevant services.

- Checking the status of services:

      docker compose ps

  This shows the current status of all services defined in your docker-compose.yml file.
These scenarios demonstrate how to use Docker Compose commands in combination to manage your AutoGPT Platform effectively.
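As a further illustration of combining commands, the sketch below lists services that are defined in the compose file but not currently running. It is a hedged example rather than part of the platform's tooling: `services_not_running` is a hypothetical name, and it relies on `docker compose config --services` and `docker compose ps --services --status running` from Docker Compose V2.

```shell
# Hypothetical helper: print each defined service that is not running.
services_not_running() {
  defined=$(docker compose config --services)
  running=$(docker compose ps --services --status running)
  # Service names contain no whitespace, so word splitting is safe here.
  for svc in $defined; do
    # -x matches whole lines, -F treats the name as a literal string
    if ! printf '%s\n' "$running" | grep -qxF -- "$svc"; then
      echo "$svc"
    fi
  done
}
```

Run it from the autogpt_platform directory. An empty result means every defined service is running.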
Persisting Data
To persist data for PostgreSQL and Redis, you can modify the docker-compose.yml file to add volumes. Here's how:
- Open the docker-compose.yml file in a text editor.

- Add volume configurations for PostgreSQL and Redis services:

      services:
        postgres:
          # ... other configurations ...
          volumes:
            - postgres_data:/var/lib/postgresql/data
        redis:
          # ... other configurations ...
          volumes:
            - redis_data:/data

      volumes:
        postgres_data:
        redis_data:

- Save the file and run docker compose up -d to apply the changes.
This configuration will create named volumes for PostgreSQL and Redis, ensuring that your data persists across container restarts.
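After editing the file, it can be useful to confirm that Compose actually sees the new volumes before relying on them. The following is a rough sketch, not an official check: `volumes_declared` is a hypothetical helper, and it assumes `docker compose config --volumes` prints one declared volume name per line.

```shell
# Hypothetical helper: verify that each named volume is declared in the
# resolved compose configuration.
# Usage: volumes_declared NAME [NAME...]
volumes_declared() {
  declared=$(docker compose config --volumes)
  missing=0
  for name in "$@"; do
    if printf '%s\n' "$declared" | grep -qxF -- "$name"; then
      echo "declared: $name"
    else
      echo "missing: $name" >&2
      missing=1
    fi
  done
  # Non-zero exit status if any volume was not found
  return "$missing"
}
```

Usage: `volumes_declared postgres_data redis_data` exits with status 0 only if both names appear in the configuration.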
API Client Generation
The platform includes scripts for generating and managing the API client:
- pnpm fetch:openapi: Fetches the OpenAPI specification from the backend service (requires backend to be running on port 8006)
- pnpm generate:api-client: Generates the TypeScript API client from the OpenAPI specification using Orval
- pnpm generate:api: Runs both fetch and generate commands in sequence
Manual API Client Updates
If you need to update the API client after making changes to the backend API:
- Ensure the backend services are running:

      docker compose up -d

- Generate the updated API client:

      pnpm generate:api
This will fetch the latest OpenAPI specification and regenerate the TypeScript client code.
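Because fetching the specification requires the backend on port 8006, the two steps can be wrapped so that generation only runs when something answers on that port. This is a minimal sketch under assumptions: `generate_api_if_backend_up` is a hypothetical name, the backend is assumed to be reachable at http://localhost:8006, and `curl` is assumed to be installed.

```shell
# Hypothetical wrapper: regenerate the client only if the backend answers.
# Usage: generate_api_if_backend_up [BACKEND_URL]
generate_api_if_backend_up() {
  backend_url=${1:-http://localhost:8006}
  # Without -f, curl succeeds on any HTTP response, so this only checks
  # that a server is listening, not that a particular route exists.
  if curl -s -o /dev/null --max-time 5 "$backend_url"; then
    pnpm generate:api
  else
    echo "backend not reachable at $backend_url; start it with: docker compose up -d" >&2
    return 1
  fi
}
```

Run it from the directory that contains the frontend's package.json so that pnpm can resolve the generate:api script.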