AutoGPT/autogpt_platform/backend/backend/server/external/api.py
Zamil Majdy 1fdc02467b feat(backend): Add comprehensive Prometheus instrumentation for observability (#10923)
## Summary
- Implement comprehensive Prometheus metrics instrumentation for all FastAPI services
- Add custom business metrics for graph/block executions
- Enable dual publishing to both Grafana Cloud and internal Prometheus

## Related Infrastructure PR
- https://github.com/Significant-Gravitas/AutoGPT_cloud_infrastructure/pull/214

## Changes

### 📊 Metrics Infrastructure
- Added `prometheus-fastapi-instrumentator` dependency for automatic HTTP metrics
- Created centralized `instrumentation.py` module for consistent metrics across services
- Instrumented REST API, WebSocket, and External API services

### 📈 Automatic HTTP Metrics
All FastAPI services now automatically collect:
- **Request latency**: Histogram with custom buckets (10ms to 60s)
- **Request/response size**: Track payload sizes
- **Request counts**: By method, endpoint, and status code
- **Active requests**: Real-time count of in-progress requests
- **Error rates**: 4xx and 5xx responses
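
To illustrate how a latency histogram spanning 10ms to 60s records a request, here is a minimal dependency-free sketch. The intermediate bucket boundaries are assumptions; this description states only the 10ms and 60s endpoints. Prometheus histograms are cumulative, so one observation increments every bucket whose upper bound is at least the observed value.

```python
import math

# Hypothetical bucket upper bounds in seconds. Only the 10ms and 60s
# endpoints come from the PR description; the rest are illustrative.
BUCKETS = (0.01, 0.05, 0.1, 0.5, 1.0, 5.0, 10.0, 30.0, 60.0, math.inf)


def observe(counts, seconds):
    """Record one observation in cumulative (Prometheus-style) buckets."""
    for upper in BUCKETS:
        if seconds <= upper:
            counts[upper] += 1


counts = {upper: 0 for upper in BUCKETS}
for latency in (0.004, 0.2, 0.2, 45.0, 120.0):
    observe(counts, latency)

# A 120s request exceeds every finite bound, so it is visible only in the
# +Inf bucket, which always equals the total number of observations.
print(counts[60.0], counts[math.inf])
```

Requests slower than the largest finite bucket are still counted, but their latency can no longer be estimated from the histogram, which is why the upper bound matters for long-running endpoints.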

### 🎯 Custom Business Metrics
Added domain-specific metrics:
- **Graph executions**: Count by status (success/error/validation_error)
- **Block executions**: Count and duration by block_type and status
- **WebSocket connections**: Active connection gauge
- **Database queries**: Duration histogram by operation and table
- **RabbitMQ messages**: Count by queue and status
- **Authentication**: Attempts by method and status
- **API key usage**: By provider and block type
- **Rate limiting**: Hit count by endpoint
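
As a rough illustration of what a labeled counter such as the graph execution count looks like when scraped, the sketch below renders samples in the Prometheus text exposition format using only the standard library. The metric name `graph_executions_total` and the helper names are hypothetical; this description does not list the actual metric names, and the real services rely on the Prometheus client libraries rather than hand-written rendering.

```python
from collections import Counter

# Hypothetical metric name; the real names live in instrumentation.py.
METRIC = "graph_executions_total"

executions = Counter()


def record_graph_execution(status):
    """Increment the counter for one execution outcome."""
    executions[status] += 1


def render():
    """Render the counter in Prometheus text exposition format."""
    lines = [f"# TYPE {METRIC} counter"]
    for status in sorted(executions):
        lines.append(f'{METRIC}{{status="{status}"}} {executions[status]}')
    return "\n".join(lines) + "\n"


for status in ("success", "success", "error", "validation_error"):
    record_graph_execution(status)

print(render())
```

Each distinct label value produces its own time series, so labels like `status` or `block_type` should stay low-cardinality.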

### 🔌 Service Endpoints
Each service exposes metrics at `/metrics`:
- REST API (port 8006): `/metrics`
- WebSocket (port 8001): `/metrics`
- External API: `/external-api/metrics`
- Executor (port 8002): Already had metrics, now enhanced

### 🏷️ Kubernetes Integration
Updated Helm charts with pod annotations:
```yaml
prometheus.io/scrape: "true"
prometheus.io/port: "8006"  # or appropriate port
prometheus.io/path: "/metrics"
```
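
These annotations are a convention: neither Kubernetes nor Prometheus acts on them unless the scrape configuration contains relabeling rules that read them. The sketch below shows, in plain Python, the kind of target resolution such a configuration performs. The function name, the fallback port, and the pod IP are hypothetical and exist only for illustration.

```python
from typing import Dict, Optional


def scrape_url(pod_ip: str, annotations: Dict[str, str]) -> Optional[str]:
    """Return the metrics URL for a pod, or None if it has not opted in."""
    if annotations.get("prometheus.io/scrape") != "true":
        return None
    # Assumed fallback port for this sketch; real setups usually fall back
    # to the container port discovered by Kubernetes service discovery.
    port = annotations.get("prometheus.io/port", "80")
    path = annotations.get("prometheus.io/path", "/metrics")
    return f"http://{pod_ip}:{port}{path}"


rest_api_annotations = {
    "prometheus.io/scrape": "true",
    "prometheus.io/port": "8006",
    "prometheus.io/path": "/metrics",
}
print(scrape_url("10.0.0.12", rest_api_annotations))
```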

## Testing
- [x] Install dependencies: `poetry install`
- [x] Run services: `poetry run serve`
- [x] Check metrics endpoints are accessible
- [x] Verify metrics are being collected
- [x] Confirm Grafana Agent can scrape metrics
- [x] Test graph/block execution tracking
- [x] Verify WebSocket connection metrics

## Performance Impact
- Minimal overhead (~1-2ms per request)
- Metrics are collected asynchronously
- Can be disabled via `ENABLE_METRICS=false` env var
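
A minimal sketch of how such a toggle might be read, assuming metrics default to enabled and that common falsy spellings disable them. The helper name `metrics_enabled` and the accepted values other than `false` are assumptions; this description states only that `ENABLE_METRICS=false` disables collection.

```python
import os


def metrics_enabled(env=None) -> bool:
    """Return False only when ENABLE_METRICS is set to a falsy value."""
    env = os.environ if env is None else env
    value = env.get("ENABLE_METRICS", "true")
    # Accepted falsy spellings are an assumption of this sketch.
    return value.strip().lower() not in {"false", "0", "no", "off"}


if metrics_enabled():
    print("metrics instrumentation would be attached")
else:
    print("metrics instrumentation skipped")
```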

## Next Steps
1. Deploy to dev environment
2. Configure Grafana Cloud dashboards
3. Set up alerting rules based on metrics
4. Add more custom business metrics as needed

🤖 Generated with [Claude Code](https://claude.ai/code)

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-09-16 12:58:04 +07:00

from fastapi import FastAPI

from backend.monitoring.instrumentation import instrument_fastapi
from backend.server.middleware.security import SecurityHeadersMiddleware

from .routes.v1 import v1_router

external_app = FastAPI(
    title="AutoGPT External API",
    description="External API for AutoGPT integrations",
    docs_url="/docs",
    version="1.0",
)

external_app.add_middleware(SecurityHeadersMiddleware)
external_app.include_router(v1_router, prefix="/v1")

# Add Prometheus instrumentation
instrument_fastapi(
    external_app,
    service_name="external-api",
    expose_endpoint=True,
    endpoint="/metrics",
    include_in_schema=True,
)