AutoGPT

mirror of https://github.com/Significant-Gravitas/AutoGPT.git synced 2026-02-08 05:45:07 -05:00

Author	SHA1	Message	Date
Zamil Majdy	91dd9364bb	fix(backend): implement retry mechanism for SmartDecisionMaker tool call validation (#11015)

This PR fixes a critical production issue where SmartDecisionMakerBlock was silently accepting tool calls with typo'd parameter names (e.g., 'maximum_keyword_difficulty' instead of 'max_keyword_difficulty'), causing downstream blocks to receive null values and execution failures.

The solution implements comprehensive parameter validation with automatic retry when the LLM provides malformed tool calls, giving the LLM specific feedback to correct the errors.

### Changes 🏗️

Core Validation & Retry Logic (`backend/blocks/smart_decision_maker.py`)
- Add tool call parameter validation against function schema
- Implement retry mechanism using existing `create_retry_decorator` from `backend.util.retry`
- Validate provided parameters against expected schema properties and required fields
- Generate specific error messages for unknown parameters (typos) and missing required parameters
- Add error feedback to conversation history for LLM learning on retry attempts
- Use `input_data.retry` field to configure number of retry attempts

Comprehensive Test Coverage (`backend/blocks/test/test_smart_decision_maker.py`)
- Add `test_smart_decision_maker_parameter_validation` with 4 comprehensive test scenarios:
  1. Tool call with typo'd parameter (should retry and eventually fail with clear error)
  2. Tool call missing required parameter (should fail immediately with clear error)
  3. Valid tool call with optional parameter missing (should succeed)
  4. Valid tool call with all parameters provided (should succeed)
- Verify retry mechanism works correctly and respects retry count
- Mock LLM responses for controlled testing of validation logic

Load Tests Documentation Update (`load-tests/README.md`)
- Update documentation to reflect current orchestrator-based architecture
- Remove references to deprecated `run-tests.js` and `comprehensive-orchestrator.js`
- Streamline documentation to focus on working `orchestrator/orchestrator.js`
- Update NPM scripts and command examples for current workflow
- Clean up outdated file references to match actual infrastructure

Production Impact
- Prevents silent failures: Tool call parameter typos now cause retries instead of null downstream values
- Maintains compatibility: No breaking changes to existing SmartDecisionMaker functionality
- Improves reliability: LLM receives feedback to correct parameter errors
- Configurable retries: Uses existing `retry` field for user control
- Accurate documentation: Load-tests docs now match actual working infrastructure

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Run existing SmartDecisionMaker tests to ensure no regressions: `poetry run pytest backend/blocks/test/test_smart_decision_maker.py -xvs` ✅ All 4 tests passed
  - [x] Run new parameter validation test specifically: `poetry run pytest backend/blocks/test/test_smart_decision_maker.py::test_smart_decision_maker_parameter_validation -xvs` ✅ Passed with retry behavior confirmed
  - [x] Verify retry mechanism works by checking log output for retry attempts ✅ Confirmed in test logs
  - [x] Test tool call validation with different scenarios (typos, missing params, valid calls) ✅ All scenarios covered and working
  - [x] Run code formatting and linting: `poetry run format` ✅ All formatters passed
  - [x] Verify no breaking changes to existing SmartDecisionMaker functionality ✅ All existing tests pass
  - [x] Verify load-tests documentation accuracy ✅ README now matches actual orchestrator infrastructure

#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my changes
- [x] I have included a list of my configuration changes in the PR description (under Changes)

Note: No configuration changes were needed as this uses existing retry infrastructure and block schema validation.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>	2025-09-30 16:18:05 +00:00
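
The validation-and-retry flow described in commit 91dd9364bb can be sketched roughly as follows. This is a minimal illustration, not the code from `backend/blocks/smart_decision_maker.py`: the names `validate_tool_args`, `call_with_retry`, and `ask_llm` are hypothetical, the schema is assumed to be a JSON-Schema-style dict with `properties` and `required` keys, and a plain loop stands in for the project's `create_retry_decorator`. For simplicity the sketch retries on every validation error, whereas the commit's test plan says a missing required parameter fails immediately.

```python
from typing import Any, Callable


def validate_tool_args(args: dict[str, Any], schema: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means the call is valid."""
    expected = set(schema.get("properties", {}))
    required = set(schema.get("required", []))
    errors: list[str] = []

    # Parameters the schema does not define are most likely typos.
    unknown = sorted(set(args) - expected)
    if unknown:
        errors.append(
            f"Unknown parameters: {unknown}. Expected one of: {sorted(expected)}"
        )

    # Required parameters that the tool call left out.
    missing = sorted(required - set(args))
    if missing:
        errors.append(f"Missing required parameters: {missing}")

    return errors


def call_with_retry(
    ask_llm: Callable[[list[str]], dict[str, Any]],
    schema: dict[str, Any],
    max_attempts: int,
) -> dict[str, Any]:
    """Ask for tool arguments until they validate or attempts run out.

    `ask_llm` is a hypothetical callable that receives the accumulated error
    messages (standing in for the conversation history) and returns new args.
    """
    feedback: list[str] = []
    for _ in range(max_attempts):
        args = ask_llm(feedback)
        errors = validate_tool_args(args, schema)
        if not errors:
            return args
        feedback.extend(errors)  # fed back to the model on the next attempt
    raise ValueError("Tool call validation failed: " + "; ".join(feedback))
```

Under these assumptions, a call carrying `maximum_keyword_difficulty` is rejected with a message that lists the expected names, so a second attempt has what it needs to use `max_keyword_difficulty` instead.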
Zamil Majdy	a97ff641c3	feat(backend): optimize FastAPI endpoints performance and alert system (#11000)

## Summary

Comprehensive performance optimization fixing event loop binding issues and addressing all PR feedback.

### Original Performance Issues Fixed

Event Loop Binding Problems:
- JWT authentication dependencies were synchronous, causing thread pool bottlenecks under high concurrency
- FastAPI's default thread pool (40 threads) was insufficient for high-load scenarios
- Backend services lacked proper event loop configuration

Security & Performance Improvements:
- Security middleware converted from BaseHTTPMiddleware to pure ASGI for better performance
- Added blocks endpoint to cacheable paths for improved response times
- Cross-platform uvloop detection with Windows compatibility

### Key Changes Made

#### 1. JWT Authentication Async Conversion
- Files: `autogpt_libs/auth/dependencies.py`, `autogpt_libs/auth/jwt_utils.py`
- Change: Convert all JWT functions to async (`requires_user`, `requires_admin_user`, `get_user_id`, `get_jwt_payload`)
- Impact: Eliminates thread pool blocking, improves concurrency handling
- Tests: All 25+ authentication tests updated to async patterns

#### 2. FastAPI Thread Pool Optimization
- File: `backend/server/rest_api.py:82-93`
- Change: Configure thread pool size via `config.fastapi_thread_pool_size`
- Default: Increased from 40 to higher limit for sync operations
- Impact: Better handling of remaining sync dependencies

#### 3. Performance-Optimized Security Middleware
- File: `backend/server/middleware/security.py`
- Change: Pure ASGI implementation replacing BaseHTTPMiddleware
- Headers: HTTP spec compliant capitalization (X-Content-Type-Options, X-Frame-Options, etc.)
- Caching: Added `/api/blocks` and `/api/v1/blocks` to cacheable paths
- Impact: Reduced middleware overhead, improved header compliance

#### 4. Cross-Platform Event Loop Configuration
- File: `backend/server/rest_api.py:311-312`
- Change: Platform-aware uvloop detection: `'uvloop' if platform.system() != 'Windows' else 'auto'`
- Impact: Windows compatibility while maintaining Unix performance benefits
- Verified: 'auto' is valid uvicorn default parameter

#### 5. Enhanced Caching Infrastructure
- File: `autogpt_libs/utils/cache.py:118-132`
- Change: Per-event-loop asyncio.Lock instances prevent cross-loop deadlocks
- Impact: Thread-safe caching across multiple event loops

#### 6. Database Query Limits & Performance
- Files: Multiple data layer files
- Change: Added configurable limits to prevent unbounded queries
- Constants: `MAX_GRAPH_VERSIONS_FETCH=50`, `MAX_USER_API_KEYS_FETCH=500`, etc.
- Impact: Consistent performance regardless of data volume

#### 7. OpenAPI Documentation Improvements
- File: `backend/server/routers/v1.py:68-85`
- Change: Added proper response model and schema for blocks endpoint
- Impact: Better API documentation and type safety

#### 8. Error Handling & Retry Logic Fixes
- File: `backend/util/retry.py:63`
- Change: Accurate retry threshold comments referencing EXCESSIVE_RETRY_THRESHOLD
- Impact: Clear documentation for debugging retry scenarios

### ntindle Feedback Addressed
- ✅ HTTP Header Capitalization: All headers now use proper HTTP spec capitalization
- ✅ Windows uvloop Compatibility: Clean platform detection with inline conditional
- ✅ OpenAPI Response Model: Blocks endpoint properly documented in schema
- ✅ Retry Comment Accuracy: References actual threshold constants instead of hardcoded numbers
- ✅ Code Cleanliness: Inline conditionals preferred over verbose if statements

### Performance Testing Results

Before Optimization:
- High latency under concurrent load
- Thread pool exhaustion at ~40 concurrent requests
- Event loop binding issues causing timeouts

After Optimization:
- Improved concurrency handling with async JWT pipeline
- Configurable thread pool scaling
- Cross-platform event loop optimization
- Reduced middleware overhead

### Backward Compatibility
- ✅ All existing functionality preserved
- ✅ No breaking API changes
- ✅ Enhanced test coverage with async patterns
- ✅ Windows and Unix compatibility maintained

### Files Modified

Core Authentication & Performance:
- `autogpt_libs/auth/dependencies.py` - Async JWT dependencies
- `autogpt_libs/auth/jwt_utils.py` - Async JWT utilities
- `backend/server/rest_api.py` - Thread pool config + uvloop detection
- `backend/server/middleware/security.py` - ASGI security middleware

Database & Limits:
- `backend/data/includes.py` - Performance constants and configurable includes
- `backend/data/api_key.py`, `backend/data/credit.py`, `backend/data/graph.py`, `backend/data/integrations.py` - Query limits

Caching & Infrastructure:
- `autogpt_libs/utils/cache.py` - Per-event-loop lock safety
- `backend/server/routers/v1.py` - OpenAPI improvements
- `backend/util/retry.py` - Comment accuracy

Testing:
- `autogpt_libs/auth/dependencies_test.py` - 25+ async test conversions
- `autogpt_libs/auth/jwt_utils_test.py` - Async JWT test patterns

Ready for review and production deployment. 🚀

---------

Co-authored-by: Claude <noreply@anthropic.com>	2025-09-29 05:32:48 +00:00
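
The per-event-loop lock idea from change 5 of commit a97ff641c3 can be illustrated with a short sketch. It is not the code in `autogpt_libs/utils/cache.py`: the class name `PerLoopLock` and its `get` method are hypothetical, and the sketch assumes standard-library asyncio loops, which can be used as weak dictionary keys. The point is that an `asyncio.Lock` should only be awaited from one event loop, so each loop gets its own instance.

```python
import asyncio
import weakref


class PerLoopLock:
    """Hand out one asyncio.Lock per running event loop (illustrative only)."""

    def __init__(self) -> None:
        # Weak keys let a closed loop, and the lock created for it, be
        # garbage collected instead of accumulating forever.
        self._locks: "weakref.WeakKeyDictionary[asyncio.AbstractEventLoop, asyncio.Lock]" = (
            weakref.WeakKeyDictionary()
        )

    def get(self) -> asyncio.Lock:
        # Must be called from a coroutine; raises RuntimeError otherwise.
        loop = asyncio.get_running_loop()
        lock = self._locks.get(loop)
        if lock is None:
            lock = asyncio.Lock()
            self._locks[loop] = lock
        return lock


cache_lock = PerLoopLock()


async def cached_fetch(cache: dict, key: str) -> str:
    """Populate `cache[key]` once per key, guarded by the current loop's lock."""
    async with cache_lock.get():
        if key not in cache:
            await asyncio.sleep(0)  # stand-in for the real expensive call
            cache[key] = f"value-for-{key}"
        return cache[key]
```

Within one loop every caller shares the same lock, so concurrent misses for the same key are serialized; a second loop (for example in another thread) receives a different lock object and never waits on one that belongs to the first loop.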
Zamil Majdy	4c000086e6	feat(backend): implement clean k6 load testing infrastructure (#10978)

## Summary

Implement comprehensive k6 load testing infrastructure for the AutoGPT Platform with clean file organization, unified test runner, and cloud integration.

## Key Features

### 🗂️ Clean File Organization
- tests/basic/: Simple validation tests (connectivity, single endpoints)
- tests/api/: Core functionality tests (API endpoints, graph execution)
- tests/marketplace/: User-facing feature tests (public/library access)
- tests/comprehensive/: End-to-end scenario tests (complete user journeys)
- orchestrator/: Advanced test orchestration for full suites

### 🚀 Unified Test Runner
- Single entry point: `run-tests.js` for both local and cloud execution
- 7 available tests: From basic connectivity to comprehensive platform journeys
- Flexible execution: Run individual tests, comma-separated lists, or all tests
- Auto-configuration: Different VU/duration settings for local vs cloud execution

### 🔐 Advanced Authentication
- Pre-authenticated tokens: 24-hour JWT tokens eliminate Supabase rate limiting
- Configurable generation: Default 10 tokens, scalable to 150+ for high concurrency
- Graceful handling: Proper auth failure detection and recovery
- ES module compatible: Modern JavaScript with full import/export support

### ☁️ k6 Cloud Integration
- Cloud execution: Tests run on k6 cloud infrastructure for consistent results
- Real-time monitoring: Live dashboards with performance metrics
- URL tracking: Automatic test result URL capture and storage
- Sequential orchestration: Proper delays between tests for resource management

## Test Coverage

### Performance Validated
- Core API: 100 VUs successfully testing `/api/credits`, `/api/graphs`, `/api/blocks`, `/api/executions`
- Graph Execution: 80 VUs for complete workflow pipeline testing
- Marketplace: 150 VUs for public browsing, 100 VUs for authenticated library operations
- Authentication: 150+ concurrent users with pre-authenticated token scaling

### User Journey Simulation
- Dashboard workflows: Credits checking, graph management, execution monitoring
- Marketplace browsing: Public search, agent discovery, category filtering
- Library operations: Agent adding, favoriting, forking, detailed views
- Complete workflows: End-to-end platform usage with realistic user behavior

## Technical Implementation

### ES Module Compatibility
- Full ES module support with modern JavaScript imports/exports
- Proper module execution patterns for Node.js compatibility
- Clean separation between CommonJS legacy and modern ES modules

### Error Handling & Monitoring
- Separate metrics: HTTP status, authentication, JSON validation, overall success
- Graceful degradation: Auth failures don't crash VUs, proper error tracking
- Performance thresholds: Configurable P95/P99 latency and error rate limits
- Custom counters: Track operation types, success rates, user journey completion

### Infrastructure Benefits
- Rate limit protection: Pre-auth tokens prevent Supabase auth bottlenecks
- Scalable testing: Support for 150+ concurrent users with proper token management
- Cloud consistency: Tests run on dedicated k6 cloud servers for reliable results
- Development workflow: Local execution for debugging, cloud for performance validation

## Usage

### Quick Start

```bash
# Setup and verification
export SUPABASE_SERVICE_KEY="your-service-key"
node generate-tokens.js
node run-tests.js verify

# Local testing (development)
node run-tests.js run core-api-test DEV

# Cloud testing (performance)
node run-tests.js cloud all DEV
```

### NPM Scripts

```bash
npm run verify  # Quick setup check
npm test        # All tests locally
npm run cloud   # All tests in k6 cloud
```

## Validation Results
- ✅ Authentication: 100% success rate with fresh 24-hour tokens
- ✅ File Structure: All imports and references verified correct
- ✅ Test Execution: All 7 tests execute successfully with proper metrics
- ✅ Cloud Integration: k6 cloud execution working with proper credentials
- ✅ Documentation: Complete README with usage examples and troubleshooting

## Files Changed

### Core Infrastructure
- `run-tests.js`: Unified test runner supporting local/cloud execution
- `generate-tokens.js`: ES module compatible token generation with 24-hour expiry
- `README.md`: Comprehensive documentation with updated file references

### Organized Test Structure
- `tests/basic/connectivity-test.js`: Basic connectivity validation
- `tests/basic/single-endpoint-test.js`: Individual API endpoint testing
- `tests/api/core-api-test.js`: Core authenticated API endpoints
- `tests/api/graph-execution-test.js`: Graph workflow pipeline testing
- `tests/marketplace/public-access-test.js`: Public marketplace browsing
- `tests/marketplace/library-access-test.js`: Authenticated marketplace/library operations
- `tests/comprehensive/platform-journey-test.js`: Complete user journey simulation

### Configuration
- `configs/environment.js`: Environment URLs and performance settings
- `package.json`: NPM scripts and dependencies for unified workflow

This infrastructure provides a solid foundation for continuous performance monitoring and load testing of the AutoGPT Platform.

🤖 Generated with [Claude Code](https://claude.ai/code)

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
Co-authored-by: Reinier van der Leer <pwuts@agpt.co>	2025-09-25 12:51:54 +07:00
Zamil Majdy	50689218ed	feat(backend): implement comprehensive load testing performance fixes + database health improvements (#10965)	2025-09-24 14:22:57 +07:00

4 Commits