Commit `bb20821634` (Zamil Majdy): feat(backend): Add k6 load testing infrastructure + fix critical performance issues (#10941)

File changed: `AutoGPT/autogpt_platform/backend/Dockerfile`
# AutoGPT Platform Load Testing Infrastructure

A comprehensive k6-based load testing suite for AutoGPT Platform API
with Grafana Cloud integration for real-time monitoring and performance
analysis.

## 🚀 Quick Start

### Prerequisites
- k6 installed ([Install
Guide](https://k6.io/docs/getting-started/installation/))
- Backend server running (port 8006)
- Valid test user credentials

### Running Tests

#### 1. Setup Test Users (First Time Only)
```bash
cd autogpt_platform/backend/load-tests
k6 run setup-test-users.js
```

#### 2. Basic Load Tests
```bash
# Test API connectivity and authentication
k6 run basic-connectivity-test.js

# Test core API endpoints (credits, profiles)
k6 run core-api-load-test.js

# Test graph operations (create, execute)
k6 run graph-execution-load-test.js

# Full platform integration test
k6 run scenarios/comprehensive-platform-load-test.js
```

#### 3. Run with Grafana Cloud (Optional)
```bash
# Set environment variables
export K6_CLOUD_TOKEN="your-grafana-cloud-token"
export K6_CLOUD_PROJECT_ID="your-project-id"

# Run with cloud monitoring
k6 run basic-connectivity-test.js --out cloud
```

## 📊 Test Scenarios

| Test | Purpose | Endpoints Tested | Load Pattern |
|------|---------|------------------|--------------|
| **Basic Connectivity** | Validate infrastructure | Auth, health checks | 1-10 VUs, 10s-5m |
| **Core API** | Test CRUD operations | /api/credits, /api/auth/user | 1-5 VUs, 30s-2m |
| **Graph Execution** | Test graph workflows | /api/graphs, /api/graphs/*/execute | 1-3 VUs, 1-3m |
| **Comprehensive** | End-to-end user journeys | All major endpoints | 1-2 VUs, 2-5m |
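The "Load Pattern" column maps onto k6's `stages` option. A minimal sketch of a ramping profile (the durations and targets here are illustrative, not copied from the actual scripts):

```javascript
// Ramping load pattern in the style of the "1-10 VUs" tests.
// In a real k6 script this object would be `export const options`.
const options = {
  stages: [
    { duration: '30s', target: 10 }, // ramp up to 10 virtual users
    { duration: '2m',  target: 10 }, // hold steady load
    { duration: '30s', target: 0 },  // ramp back down
  ],
};
```

k6 interpolates the VU count linearly between stages, so this produces a ramp-hold-ramp profile rather than an instant spike, which gives the backend's connection pools time to warm up.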

## 🔧 Configuration

### Environment Variables
```bash
# Target Environment
export K6_ENVIRONMENT="dev"    # dev, local, staging

# Load Test Parameters  
export VUS="5"                 # Virtual users (concurrent)
export DURATION="2m"           # Test duration
export REQUESTS_PER_VU="10"    # Requests per user

# Grafana Cloud (Optional)
export K6_CLOUD_TOKEN="your-token"
export K6_CLOUD_PROJECT_ID="your-project-id"
```
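Inside a k6 script these variables are read from the global `__ENV` object. A hypothetical sketch of how a config layer like `configs/environment.js` might resolve them (the names and structure are assumptions, not the actual file contents; `process.env` is used as a fallback so the sketch also runs under plain Node):

```javascript
// Resolve target environment and load parameters from environment
// variables. k6 exposes them on __ENV; Node uses process.env.
const env = typeof __ENV !== 'undefined' ? __ENV : process.env;

// Base URLs taken from the "Test Environments" section above.
const ENVIRONMENTS = {
  local: { baseUrl: 'http://localhost:8006' },
  dev: { baseUrl: 'https://dev-server.agpt.co' },
};

function getConfig() {
  const name = (env.K6_ENVIRONMENT || 'dev').toLowerCase();
  const config = ENVIRONMENTS[name];
  if (!config) throw new Error(`Unknown environment: ${name}`);
  return {
    ...config,
    vus: Number(env.VUS || 5),        // default: 5 concurrent users
    duration: env.DURATION || '2m',   // default: 2 minute run
  };
}
```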

### Test Environments
- **LOCAL**: localhost:8006 (development)
- **DEV**: dev-server.agpt.co (staging)

## 📈 Performance Thresholds

Current SLA targets:
- **Response Time P95**: < 2 seconds
- **Error Rate**: < 5%
- **Authentication Success**: > 95%
- **Graph Creation**: < 5 seconds
- **Graph Execution**: < 30 seconds
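Expressed as k6 threshold definitions, the SLA targets above might look like the following sketch. `http_req_duration` and `http_req_failed` are built-in k6 metrics; the auth and graph metrics are assumed custom metrics (k6 `Rate`/`Trend`) that the scripts would need to record themselves:

```javascript
// SLA targets from this section as k6 threshold expressions.
// In a real k6 script this object would be `export const options`.
const options = {
  thresholds: {
    http_req_duration: ['p(95)<2000'],     // response time P95 < 2s
    http_req_failed: ['rate<0.05'],        // error rate < 5%
    auth_success: ['rate>0.95'],           // assumed custom Rate metric
    graph_create_duration: ['p(95)<5000'], // assumed custom Trend, < 5s
    graph_exec_duration: ['p(95)<30000'],  // assumed custom Trend, < 30s
  },
};
```

When any threshold is crossed, k6 exits non-zero, so these same expressions can gate a CI job.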

## 🔍 Current Performance Issues Identified

⚠️ **Load testing reveals significant performance bottlenecks that need
optimization:**

### 📊 **Load Test Results**
| Endpoint | RPS | P95 Latency | Success Rate | Status |
|----------|-----|-------------|--------------|---------|
| Basic Connectivity | 40.6 | 926ms | 99.15% | ✅ |
| Core API | 4.6 | 24.2s | 99.83% | ⚠️ |
| Graph Execution | 1.1 | 47.8s | 70.28% | 🚨 |
| Comprehensive Platform | 0.3 | 44.2s | 96.25% | ⚠️ |

### 🚨 **Critical Issues Requiring Performance Work**
1. **Graph Operations**: 70% failure rate under load, P95 latency 47.8s
2. **Database Bottlenecks**: Transaction timeouts during concurrent
operations
3. **Query Optimization**: Graph creation involves multiple large
database operations
4. **Connection Pooling**: Database connection limits under high
concurrency

### ✅ **Configuration Fixes Applied**
- **Database Transaction Timeout**: Increased from 15s to 30s (bandaid
solution)
- **Block Execution API**: Fixed missing user_context parameter  
- **Credits API Error Handling**: Added proper exception handling
- **CI Tests**: Fixed test_execute_graph_block

**Note**: These are configuration fixes, not performance optimizations.
The underlying performance issues still need to be addressed through
query optimization, database tuning, and application-level improvements.

## 🛠️ Infrastructure Features

- **k6 Load Testing**: JavaScript-based scenarios with realistic user
workflows
- **Grafana Cloud Integration**: Real-time dashboards and alerting
- **Multi-Environment Support**: Dev, local, staging configurations
- **Authentication Testing**: Supabase JWT token validation
- **Performance Monitoring**: SLA validation with configurable
thresholds
- **Automated User Setup**: Test user creation and management
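For the Supabase JWT testing, a token-handling helper along these lines could live in `utils/auth.js`. This is a hypothetical sketch (the function name and caching approach are assumptions): it decodes a JWT payload to check expiry before reusing a cached token, without verifying the signature. `Buffer` is a Node API; inside k6 the same decode would use `encoding.b64decode`.

```javascript
// Return true if a JWT's `exp` claim is in the past (or within
// `skewSeconds` of expiring), so a fresh token should be fetched.
// Decoding only; no signature verification is performed here.
function isTokenExpired(jwt, skewSeconds = 30) {
  const payloadB64 = jwt.split('.')[1]; // header.payload.signature
  const payload = JSON.parse(
    Buffer.from(payloadB64, 'base64url').toString('utf8')
  );
  return payload.exp <= Date.now() / 1000 + skewSeconds;
}
```

Checking expiry locally avoids re-authenticating every iteration, which keeps the load test measuring the API under test rather than the auth endpoint.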

## 📁 Files Structure

```
load-tests/
├── basic-connectivity-test.js          # Infrastructure validation
├── core-api-load-test.js               # Core API testing  
├── graph-execution-load-test.js        # Graph operations
├── setup-test-users.js                 # User management
├── scenarios/
│   └── comprehensive-platform-load-test.js  # End-to-end testing
├── configs/
│   ├── environment.js                  # Environment settings
│   └── grafana-cloud.js               # Monitoring configuration
└── utils/
    └── auth.js                        # Authentication utilities
```

## 🎯 Next Steps for Performance Optimization

1. **Query Optimization**: Profile and optimize graph creation queries
2. **Database Tuning**: Optimize connection pooling and indexing
3. **Caching Strategy**: Implement appropriate caching for frequently
accessed data
4. **Load Balancing**: Fix uneven traffic distribution between pods
5. **Monitoring**: Use this load testing infrastructure to measure
improvements

## ✅ Test Plan
- [x] All load testing scenarios validated locally
- [x] Grafana Cloud integration working
- [x] Test user setup automated
- [x] Performance baselines established
- [x] Critical performance bottlenecks identified
- [x] CI tests passing (test_execute_graph_block fixed)
- [x] Configuration issues resolved
- [ ] **Performance optimizations still needed** (separate work)

**This PR provides the infrastructure to identify and monitor
performance issues. The actual performance optimizations are separate
work that should be prioritized based on these findings.**

---------

Co-authored-by: Claude <noreply@anthropic.com>
Committed: 2025-09-22 08:28:57 +07:00

The updated `autogpt_platform/backend/Dockerfile` (105 lines, 3.5 KiB):

```dockerfile
FROM debian:13-slim AS builder

# Set environment variables
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
ENV DEBIAN_FRONTEND=noninteractive

WORKDIR /app

RUN echo 'Acquire::http::Pipeline-Depth 0;\nAcquire::http::No-Cache true;\nAcquire::BrokenProxy true;\n' > /etc/apt/apt.conf.d/99fixbadproxy

# Install Node.js repository key and setup
RUN apt-get update --allow-releaseinfo-change --fix-missing \
    && apt-get install -y curl ca-certificates gnupg \
    && mkdir -p /etc/apt/keyrings \
    && curl -fsSL https://deb.nodesource.com/gpgkey/nodesource-repo.gpg.key | gpg --dearmor -o /etc/apt/keyrings/nodesource.gpg \
    && echo "deb [signed-by=/etc/apt/keyrings/nodesource.gpg] https://deb.nodesource.com/node_20.x nodistro main" | tee /etc/apt/sources.list.d/nodesource.list

# Update package list and install Python, Node.js, and build dependencies
RUN apt-get update \
    && apt-get install -y \
        python3.13 \
        python3.13-dev \
        python3.13-venv \
        python3-pip \
        build-essential \
        libpq5 \
        libz-dev \
        libssl-dev \
        postgresql-client \
        nodejs \
    && rm -rf /var/lib/apt/lists/*

ENV POETRY_HOME=/opt/poetry
ENV POETRY_NO_INTERACTION=1
ENV POETRY_VIRTUALENVS_CREATE=true
ENV POETRY_VIRTUALENVS_IN_PROJECT=true
ENV PATH=/opt/poetry/bin:$PATH

RUN pip3 install poetry --break-system-packages

# Copy and install dependencies
COPY autogpt_platform/autogpt_libs /app/autogpt_platform/autogpt_libs
COPY autogpt_platform/backend/poetry.lock autogpt_platform/backend/pyproject.toml /app/autogpt_platform/backend/
WORKDIR /app/autogpt_platform/backend
RUN poetry install --no-ansi --no-root

# Generate Prisma client
COPY autogpt_platform/backend/schema.prisma ./
RUN poetry run prisma generate

FROM debian:13-slim AS server_dependencies

WORKDIR /app

ENV POETRY_HOME=/opt/poetry \
    POETRY_NO_INTERACTION=1 \
    POETRY_VIRTUALENVS_CREATE=true \
    POETRY_VIRTUALENVS_IN_PROJECT=true \
    DEBIAN_FRONTEND=noninteractive
ENV PATH=/opt/poetry/bin:$PATH

# Install Python without upgrading system-managed packages
RUN apt-get update && apt-get install -y \
    python3.13 \
    python3-pip \
    && rm -rf /var/lib/apt/lists/*

# Copy only necessary files from builder
COPY --from=builder /app /app
COPY --from=builder /usr/local/lib/python3* /usr/local/lib/python3*
COPY --from=builder /usr/local/bin/poetry /usr/local/bin/poetry

# Copy Node.js installation for Prisma
COPY --from=builder /usr/bin/node /usr/bin/node
COPY --from=builder /usr/lib/node_modules /usr/lib/node_modules
COPY --from=builder /usr/bin/npm /usr/bin/npm
COPY --from=builder /usr/bin/npx /usr/bin/npx
COPY --from=builder /root/.cache/prisma-python/binaries /root/.cache/prisma-python/binaries

ENV PATH="/app/autogpt_platform/backend/.venv/bin:$PATH"

RUN mkdir -p /app/autogpt_platform/autogpt_libs
RUN mkdir -p /app/autogpt_platform/backend
COPY autogpt_platform/autogpt_libs /app/autogpt_platform/autogpt_libs
COPY autogpt_platform/backend/poetry.lock autogpt_platform/backend/pyproject.toml /app/autogpt_platform/backend/
WORKDIR /app/autogpt_platform/backend

FROM server_dependencies AS migrate
# Migration stage only needs schema and migrations - much lighter than full backend
COPY autogpt_platform/backend/schema.prisma /app/autogpt_platform/backend/
COPY autogpt_platform/backend/migrations /app/autogpt_platform/backend/migrations

FROM server_dependencies AS server
COPY autogpt_platform/backend /app/autogpt_platform/backend
RUN poetry install --no-ansi --only-root

ENV PORT=8000
CMD ["poetry", "run", "rest"]
```