AutoGPT

mirror of https://github.com/Significant-Gravitas/AutoGPT.git synced 2026-02-07 21:35:34 -05:00

Author	SHA1	Message	Date
Nicholas Tindle	634bff8277	refactor(forge): replace Selenium with Playwright for web browsing - Remove selenium.py and test_selenium.py - Add playwright_browser.py with WebPlaywrightComponent - Update web component exports to use Playwright - Update dependencies in pyproject.toml/poetry.lock - Minor agent and reflexion strategy improvements - Update CLAUDE.md documentation Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-19 23:57:17 -06:00
Nicholas Tindle	d591f36c7b	fix(direct_benchmark): track cost from LLM provider Previously cost was hardcoded to 0.0. Now extracts cumulative cost from MultiProvider.get_incurred_cost() after each step execution. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-19 23:37:12 -06:00
Nicholas Tindle	a347bed0b1	feat(direct_benchmark): add incremental resume and selective reset Benchmarks now automatically save progress and resume from where they left off. State is persisted to .benchmark_state.json in reports dir. Features: - Auto-resume: runs skip already-completed challenges - --fresh: clear all state and start over - --retry-failures: re-run only failed challenges - --reset-strategy/model/challenge: selective resets - `state show/clear/reset` subcommands for state management - Config mismatch detection with auto-reset Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-19 23:32:27 -06:00
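A minimal sketch of the resume behaviour described in a347bed0b1, assuming a simple JSON layout. Only the file name `.benchmark_state.json` comes from the commit message; the function names, the `config`/`results` keys, and the `config:challenge` run keys are hypothetical and may differ from the real harness.

```python
import json
from pathlib import Path

STATE_FILE = ".benchmark_state.json"  # file name taken from the commit message


def load_state(reports_dir: Path, config: dict) -> dict:
    """Load saved progress; start clean if the file is missing or the config changed."""
    path = reports_dir / STATE_FILE
    if path.exists():
        state = json.loads(path.read_text())
        if state.get("config") == config:
            return state
    # Hypothetical layout: the config used for the run plus a pass/fail map.
    return {"config": config, "results": {}}


def save_result(reports_dir: Path, state: dict, run_key: str, passed: bool) -> None:
    """Record one finished challenge and persist the state immediately."""
    state["results"][run_key] = passed
    reports_dir.mkdir(parents=True, exist_ok=True)
    (reports_dir / STATE_FILE).write_text(json.dumps(state, indent=2))


def runs_to_execute(state: dict, run_keys: list[str], retry_failures: bool = False) -> list[str]:
    """Auto-resume skips completed runs; retry_failures selects only failed ones."""
    results = state["results"]
    if retry_failures:
        return [key for key in run_keys if results.get(key) is False]
    return [key for key in run_keys if key not in results]
```

Writing the file after every completed challenge, rather than once at the end, is what lets an interrupted run pick up where it stopped.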
Nicholas Tindle	4eeb6ee2b0	feat(direct_benchmark): add CI mode for non-interactive environments Add --ci flag that disables Rich Live display while preserving completion blocks. Auto-detects CI environment via CI env var or non-TTY stdout. Prints progress every 10 completions for visibility. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-19 23:21:10 -06:00
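The auto-detection described in 4eeb6ee2b0 can be sketched roughly as follows. The function names are illustrative, not the harness's own; the two signals (the `CI` env var and a non-TTY stdout) and the every-10-completions cadence are taken from the commit message.

```python
import os
import sys


def is_ci_environment(environ=None, stdout=None) -> bool:
    """Treat the run as CI when the CI env var is set or stdout is not a TTY."""
    environ = os.environ if environ is None else environ
    stdout = sys.stdout if stdout is None else stdout
    if environ.get("CI"):
        return True
    return not stdout.isatty()


def should_print_progress(completed: int, every: int = 10) -> bool:
    """Emit a plain progress line every `every` completions."""
    return completed > 0 and completed % every == 0
```

Accepting `environ` and `stdout` as parameters keeps the check testable without touching the real process environment.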
Nicholas Tindle	7db962b9f9	feat(direct_benchmark): dynamic column layout up to 10 wide - Calculate max columns based on terminal width (up to 10) - Reduced panel width from 35 to 30 chars to fit more - Wider terminals can now show more parallel runs side-by-side Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-19 23:15:16 -06:00
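As a rough illustration of the layout rule in 7db962b9f9, the column count can be derived from the terminal width like this. The 30-character panel width and the cap of 10 come from the commit message; the function name and the integer-division formula are assumptions, and the real UI may also account for borders or padding.

```python
import shutil

PANEL_WIDTH = 30   # panel width in characters, per the commit message
MAX_COLUMNS = 10   # upper bound on side-by-side panels, per the commit message


def max_columns(terminal_width: int, panel_width: int = PANEL_WIDTH, cap: int = MAX_COLUMNS) -> int:
    """Return how many panels fit side by side, always at least one."""
    return max(1, min(cap, terminal_width // panel_width))


if __name__ == "__main__":
    width = shutil.get_terminal_size().columns
    print(f"{width} columns wide -> {max_columns(width)} panel(s) per row")
```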
Nicholas Tindle	9108b21541	fix(direct_benchmark): parallel execution and always show completion blocks Fixes: - Use run_key (config:challenge) instead of just config_name for tracking active runs - allows multiple challenges from same config to run in parallel - Add asyncio.sleep(0) yields to let multiple tasks acquire semaphore and start before any proceed with work - Always print completion blocks (not just failures) for visibility This should properly show 8/8 active runs when running with --parallel 8. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-19 23:13:56 -06:00
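The two fixes in 9108b21541 (tracking by `config:challenge` and yielding after acquiring the semaphore) can be sketched as below. This is a simplified stand-in, not the harness code: `run_all` and the `peak` counter are hypothetical and exist only to show that several challenges from the same config count as separate active runs.

```python
import asyncio


async def run_all(pairs, parallel: int) -> int:
    """Run (config, challenge) pairs concurrently; return the peak number of active runs."""
    semaphore = asyncio.Semaphore(parallel)
    active: set[str] = set()
    peak = 0

    async def run_one(config: str, challenge: str) -> None:
        nonlocal peak
        # Keying by config alone would collapse parallel challenges into one entry.
        run_key = f"{config}:{challenge}"
        async with semaphore:
            active.add(run_key)
            peak = max(peak, len(active))
            # Yield so other tasks can acquire the semaphore before work starts.
            await asyncio.sleep(0)
            # ... the real harness would execute the challenge here ...
            active.discard(run_key)

    await asyncio.gather(*(run_one(c, ch) for c, ch in pairs))
    return peak
```

With four challenges from one config and `parallel=8`, all four register as active at once; with `parallel=2`, the semaphore holds the peak at two.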
Nicholas Tindle	ffe9325296	feat(direct_benchmark): multi-panel UI with copy-paste completion blocks UI improvements: - Multi-column layout: each active config gets its own panel showing challenge name and step history (last 6 steps with status) - Copy-paste completion blocks: when a challenge finishes (especially failures), prints a detailed block with all steps for easy debugging - Configurable logging: suppresses noisy LLM provider warnings unless --debug flag is set - Pass debug flag through harness to UI Example active runs panel: ┌─ one_shot/claude ─┬─ rewoo/claude ────┐ │ ReadFile │ WriteFile │ │ ✓ #1 read_file │ ✓ #1 think │ │ ✓ #2 write_file │ ✓ #2 plan │ │ ● step 3: ... │ ● step 3: ... │ └───────────────────┴───────────────────┘ Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-19 23:10:34 -06:00
Nicholas Tindle	0a616d9267	feat(direct_benchmark): add step-level logging with colored prefixes - Add step callback to AgentRunner for real-time step logging - BenchmarkUI now shows: - Active runs with current step info - Recent steps panel with colored config prefixes - Proper Live display refresh (implements __rich_console__) - Each config gets a distinct color for easy identification - Verbose mode prints step logs immediately with config prefix - Fix Live display not updating (pass UI object, not rendered content) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-19 23:02:20 -06:00
Nicholas Tindle	ab95077e5b	refactor(forge): remove VCR cassettes, use real API calls with skip for forks - Remove vcrpy and pytest-recording dependencies - Remove tests/vcr/ directory and vcr_cassettes submodule - Remove .gitmodules (only had cassette submodule) - Simplify CI workflow - no more cassette checkout/push/PAT_REVIEW - Tests requiring API keys now skip if not set (fork PRs) - Update CLAUDE.md files to remove cassette references - Fix broken agbenchmark path in pyproject.toml Security improvement: removes need for PAT with cross-repo write access. Fork PRs will have API-dependent tests skipped (GitHub protects secrets). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-19 22:51:57 -06:00
Nicholas Tindle	e477150979	Merge branch 'dev' into make-old-work	2026-01-19 22:30:46 -06:00
Nicholas Tindle	804430e243	refactor(classic): migrate from agbenchmark to direct_benchmark harness - Remove old benchmark/ folder with agbenchmark framework - Move challenges to direct_benchmark/challenges/ - Move analysis tools (analyze_reports.py, analyze_failures.py) to direct_benchmark/ - Move challenges_already_beaten.json to direct_benchmark/ - Update CI workflow to use direct_benchmark - Update CLAUDE.md files with new benchmarking instructions - Add benchmarking section to original_autogpt/CLAUDE.md The direct_benchmark harness directly instantiates agents without HTTP server overhead, enabling parallel execution with asyncio semaphore. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-19 22:29:51 -06:00
Nicholas Tindle	acb320d32d	feat(classic): add noninteractive mode env var and benchmark config logging - Add NONINTERACTIVE_MODE env var support to AppConfig for disabling user interaction during automated runs - Benchmark harness now sets NONINTERACTIVE_MODE=True when starting agents - Add agent configuration logging at server startup (model, strategy, etc.) - Harness logs env vars being passed to agent for verification - Add --agent-output flag to show full agent server output for debugging Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-19 19:40:24 -06:00
Nicholas Tindle	32f68d5999	feat(classic): add failure analysis tool and improve benchmark output Benchmark improvements: - Add analyze_failures.py for pattern detection and failure analysis - Add informative step output: tool name, args, result status, cost - Add --all and --matrix flags for comprehensive model/strategy testing - Add --analyze-only and --no-analyze flags for flexible analysis control - Auto-run failure analysis after benchmarks with markdown export - Fix directory creation bug in ReportManager (add parents=True) Prompt strategy enhancements: - Implement full plan_execute, reflexion, rewoo, tree_of_thoughts strategies - Add PROMPT_STRATEGY env var support for strategy selection - Add extended thinking support for Anthropic models - Add reasoning effort support for OpenAI o-series models LLM provider improvements: - Add thinking_budget_tokens config for Anthropic extended thinking - Add reasoning_effort config for OpenAI reasoning models - Improve error feedback for LLM self-correction Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-19 18:58:41 -06:00
Nicholas Tindle	49f56b4e8d	feat(classic): enhance strategy benchmark harness with model comparison and bug fixes - Add model comparison support to test harness (claude, openai, gpt5, opus presets) - Add --models, --smart-llm, --fast-llm, --list-models CLI args - Add real-time logging with timestamps and progress indicators - Fix success parsing bug: read results[0].success instead of non-existent metrics.success - Fix agbenchmark TestResult validation: use exception typename when value is empty - Fix WebArena challenge validation: use strings instead of integers in instantiation_dict - Fix Agent type annotations: create AnyActionProposal union for all prompt strategies - Add pytest integration tests for the strategy benchmark harness Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-19 18:07:14 -06:00
Swifty	bc75d70e7d	refactor(backend): Improve Langfuse tracing with v3 SDK patterns and @observe decorators (#11803) This PR improves the Langfuse tracing implementation in the chat feature by adopting the v3 SDK patterns, resulting in cleaner code and better observability. ### Changes 🏗️ - Simplified Langfuse client usage: Replace manual client initialization with `langfuse.get_client()` global singleton - Use v3 context managers: Switch to `start_as_current_observation()` and `propagate_attributes()` for automatic trace propagation - Auto-instrument OpenAI calls: Use `langfuse.openai` wrapper for automatic LLM call tracing instead of manual generation tracking - Add `@observe` decorators: All chat tools now have `@observe(as_type="tool")` decorators for automatic tool execution tracing: - `add_understanding` - `view_agent_output` (renamed from `agent_output`) - `create_agent` - `edit_agent` - `find_agent` - `find_block` - `find_library_agent` - `get_doc_page` - `run_agent` - `run_block` - `search_docs` - Remove manual trace lifecycle: Eliminated the verbose `finally` block that manually ended traces/generations - Rename tool: `agent_output` → `view_agent_output` for clarity ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Verified chat feature works with Langfuse tracing enabled - [x] Confirmed traces appear correctly in Langfuse dashboard with tool spans - [x] Tested tool execution flows show up as nested observations #### For configuration changes: - [x] `.env.default` is updated or already compatible with my changes - [x] `docker-compose.yml` is updated or already compatible with my changes - [x] I have included a list of my configuration changes in the PR description (under Changes) No configuration changes required - uses existing Langfuse environment variables.	2026-01-19 20:56:51 +00:00
Nicholas Tindle	bead811e73	docs(classic): add workspace, settings, and permissions documentation Document the layered configuration system including: - Workspace structure (.autogpt/ directory layout) - Settings location (environment variables, workspace YAML, agent YAML) - Permission system (check order, pattern syntax, approval scopes) - Default security behavior Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-19 12:17:10 -06:00
Nicholas Tindle	013f728ebf	feat(forge): improve tool call error feedback for LLM self-correction When tool calls fail validation, the error messages now include: - What arguments were actually provided - The expected parameter schema with types and required/optional indicators This helps LLMs understand and fix their mistakes when retrying, rather than just being told a parameter is missing. Example improved error: Invalid function call for write_file: 'contents' is a required property You provided: {"filename": 'story.txt'} Expected parameters: {"filename": string (required), "contents": string (required)} Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-19 11:49:17 -06:00
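A small sketch of how an error message in the shape shown in 013f728ebf could be assembled from a JSON-schema-style parameter description. The helper names and the schema layout (`properties`, `required`) are assumptions for illustration; forge's actual formatting code may differ.

```python
import json


def describe_parameters(schema: dict) -> str:
    """Render each parameter as `"name": type (required|optional)`."""
    required = set(schema.get("required", []))
    parts = []
    for name, spec in schema.get("properties", {}).items():
        kind = "required" if name in required else "optional"
        parts.append(f'"{name}": {spec.get("type", "any")} ({kind})')
    return "{" + ", ".join(parts) + "}"


def format_tool_call_error(tool_name: str, error: str, provided: dict, schema: dict) -> str:
    """Combine the validation error, the provided arguments, and the expected schema."""
    return "\n".join(
        [
            f"Invalid function call for {tool_name}: {error}",
            f"You provided: {json.dumps(provided)}",
            f"Expected parameters: {describe_parameters(schema)}",
        ]
    )


if __name__ == "__main__":
    schema = {
        "properties": {"filename": {"type": "string"}, "contents": {"type": "string"}},
        "required": ["filename", "contents"],
    }
    print(
        format_tool_call_error(
            "write_file",
            "'contents' is a required property",
            {"filename": "story.txt"},
            schema,
        )
    )
```

Echoing the provided arguments next to the expected schema gives the model both sides of the mismatch in a single retry prompt.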
Nicholas Tindle	cda9572acd	feat(forge): add lightweight web fetch component Add WebFetchComponent for fast HTTP-based page fetching without browser overhead. Uses trafilatura for intelligent content extraction. Commands: - fetch_webpage: Extract main content as text/markdown/xml - Removes navigation, ads, boilerplate automatically - Extracts page metadata (title, description, author, date) - Extracts and lists page links - Much faster than Selenium-based read_webpage - fetch_raw_html: Get raw HTML for structure inspection - Optional truncation for large pages Features: - Trafilatura-powered content extraction (best-in-class accuracy) - Automatic link extraction with relative URL resolution - Page metadata extraction (OG tags, meta tags) - Configurable timeout, max content length, max links - Proper error handling for timeouts and HTTP errors - 19 comprehensive tests Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-19 01:04:22 -06:00
Nicholas Tindle	c1a1767034	feat(docs): Add block documentation auto-generation system (#11707) - Add generate_block_docs.py script that introspects block code to generate markdown - Support manual content preservation via <!-- MANUAL: --> markers - Add migrate_block_docs.py to preserve existing manual content from git HEAD - Add CI workflow (docs-block-sync.yml) to fail if docs drift from code - Add Claude PR review workflow (docs-claude-review.yml) for doc changes - Add manual LLM enhancement workflow (docs-enhance.yml) - Add GitBook configuration (.gitbook.yaml, SUMMARY.md) - Fix non-deterministic category ordering (categories is a set) - Add comprehensive test suite (32 tests) - Generate docs for 444 blocks with 66 preserved manual sections 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Extensively test code generation for the docs pages <!-- CURSOR_SUMMARY --> --- > [!NOTE] > Introduces an automated documentation pipeline for blocks and integrates it into CI. 
> > - Adds `scripts/generate_block_docs.py` (+ tests) to introspect blocks and generate `docs/integrations/`, preserving `<!-- MANUAL: -->` sections > - New CI workflows: docs-block-sync (fails if docs drift), docs-claude-review (AI review for block/docs PRs), and docs-enhance (optional LLM improvements) > - Updates existing Claude workflows to use `CLAUDE_CODE_OAUTH_TOKEN` instead of `ANTHROPIC_API_KEY` > - Improves numerous block descriptions/typos and links across backend blocks to standardize docs output > - Commits initial generated docs including `docs/integrations/README.md` and many provider/category pages > > <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit `631e53e0f6`. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup> <!-- /CURSOR_SUMMARY --> --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-19 07:03:19 +00:00
Nicholas Tindle	e0784f8f6b	refactor(forge): simplify deeply nested error handling in Anthropic provider - Extract _get_tool_error_message helper method - Replace 20+ levels of nesting with simple for loop - Improve readability of tool_result construction - Update benchmark poetry.lock Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-19 00:15:33 -06:00
Nicholas Tindle	3040f39136	feat(forge): modernize web search with tiered provider system Replace basic DuckDuckGo-only search with a modern tiered system: 1. Tavily (primary) - AI-optimized results with content extraction - AI-generated answer summaries - Relevance scoring - Full page content extraction via search_and_extract command 2. Serper (secondary) - Fast, cheap Google SERP results - $0.30-1.00 per 1K queries - Real Google results without scraping 3. DDGS multi-engine (fallback) - Free, no API key required - Automatic fallback chain: DuckDuckGo → Bing → Brave → Google → etc. - 8 search backends supported Key changes: - Upgrade duckduckgo-search to ddgs v9.10 (renamed successor package) - Add Tavily and Serper API integrations - Implement automatic provider selection and fallback chain - Add search_and_extract command for research with content extraction - Add TAVILY_API_KEY and SERPER_API_KEY to env templates - Update benchmark httpx constraint for ddgs compatibility - 23 comprehensive tests for all providers and fallback scenarios Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-19 00:06:42 -06:00
Nicholas Tindle	515504c604	fix(classic): resolve pyright type errors in original_autogpt - Change Agent class to use ActionProposal instead of OneShotAgentActionProposal to support multiple prompt strategy types - Widen display_thoughts parameter type from AssistantThoughts to ModelWithSummary - Fix speak attribute access in agent_protocol_server with hasattr check - Add type: ignore comments for intentional thoughts field overrides in strategies - Remove unused OneShotAgentActionProposal import Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-18 23:53:23 -06:00
Nicholas Tindle	18edeaeaf4	fix(classic): fix linting and formatting errors across codebase - Fix 32+ flake8 E501 (line too long) errors by shortening descriptions - Remove unused import in todo.py - Fix test_todo.py argument order (config= keyword) - Add type annotations to fix pyright errors where straightforward - Add noqa comments for flake8 false positives in __init__.py - Remove unused nonlocal declarations in main.py - Run black and isort to fix formatting - Update CLAUDE.md with improved linting commands Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-18 23:37:28 -06:00
Nicholas Tindle	44182aff9c	feat(classic): add strategy benchmark test harness for CI - Add test_prompt_strategies.py harness to compare prompt strategies - Add pytest wrapper (test_strategy_benchmark.py) for CI integration - Fix serve command (remove invalid --port flag, use AP_SERVER_PORT env) - Fix test category (interface -> general) - Add aiohttp-retry dependency for agbenchmark - Add pytest markers: slow, integration, requires_agent Usage: poetry run python agbenchmark_config/test_prompt_strategies.py --quick poetry run pytest tests/integration/test_strategy_benchmark.py -v Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-18 23:36:19 -06:00
Nicholas Tindle	864c5a7846	fix(classic): approve+feedback now executes command then sends feedback Previously, when a user selected "Once" or "Always" with feedback (via Tab), the command was NOT executed because UserFeedbackProvided was raised before checking the approval scope. This fix changes the architecture from exception-based to return-value-based. Changes: - Add PermissionCheckResult class with allowed, scope, and feedback fields - Change check_command() to return PermissionCheckResult instead of bool - Update prompt_fn signature to return (ApprovalScope, feedback) tuple - Add pending_user_feedback mechanism to EpisodicActionHistory - Update execute() to handle feedback after successful command execution - Feedback message explicitly states "Command executed successfully" - Add on_auto_approve callback for displaying auto-approved commands - Add comprehensive tests for approval/denial with feedback scenarios Behavior: - Once + feedback → Execute command, then send feedback to agent - Always + feedback → Execute command, save permission, send feedback - Deny + feedback → Don't execute, send feedback to agent Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-18 22:32:43 -06:00
Nicholas Tindle	699fffb1a8	feat(classic): add Rich interactive selector for command approval Adds a custom Rich-based interactive selector for the command approval workflow. Features include: - Arrow key navigation for selecting approval options - Tab to add context to any selection (e.g., "Once + also check file x") - Dedicated inline feedback option with shadow placeholder text - Quick select with number keys 1-5 - Works within existing asyncio event loop (no prompt_toolkit dependency) Also adds UIProvider abstraction pattern for future UI implementations. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-18 21:49:43 -06:00
Nicholas Tindle	f0641c2d26	fix(classic): auto-advance plan steps in Plan-Execute strategy The strategy was stuck in a loop because it tracked plan steps but never advanced them - the record_step_success() method existed but was never called by the agent's execution loop. Fix by using a _pending_step_advance flag to track when an action has been proposed. On the next parse_response_content() call, advance the previous step before processing the new response. This keeps step tracking self-contained in the strategy without requiring agent changes. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-18 21:14:16 -06:00
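The deferred-advance approach described in f0641c2d26 can be illustrated with a stripped-down tracker. `_pending_step_advance`, `record_step_success()`, and `parse_response_content()` are named in the commit message; the `PlanTracker` class and everything else here is a hypothetical simplification, not the strategy's real code.

```python
class PlanTracker:
    """Toy stand-in for the step tracking inside a Plan-Execute strategy."""

    def __init__(self, steps: list[str]) -> None:
        self.steps = list(steps)
        self.current = 0
        self._pending_step_advance = False

    def record_step_success(self) -> None:
        """Move to the next plan step, stopping at the end of the plan."""
        if self.current < len(self.steps):
            self.current += 1

    def parse_response_content(self, response: str) -> str:
        # A previous call proposed an action; by the time the next response
        # arrives that action has run, so advance the plan before parsing.
        if self._pending_step_advance:
            self.record_step_success()
            self._pending_step_advance = False
        # Real parsing is omitted; every response is treated as a new proposal.
        self._pending_step_advance = True
        return response
```

Because the advance happens at the start of the next parse, the agent's execution loop never has to call `record_step_success()` itself.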
Nicholas Tindle	94b6f74c95	feat(classic): add multiple prompt strategies for agent reasoning Implement four new prompt strategies based on research papers: - ReWOO: Reasoning Without Observation (5x token efficiency) - Plan-and-Execute: Separate planning from execution phases - Reflexion: Verbal reinforcement learning with episodic memory - Tree of Thoughts: Deliberate problem solving with tree search Each strategy extends a new BaseMultiStepPromptStrategy base class with shared utilities. Strategies are selectable via PROMPT_STRATEGY environment variable or config.prompt_strategy setting. Fix JSONSchema generation issue where Optional/Union types created anyOf schemas without direct type field - resolved by storing plan/phase state in strategy instances rather than ActionProposal. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-18 20:33:10 -06:00
Nicholas Tindle	46aabab3ea	feat(classic): upgrade to Python 3.12+ with CI testing on 3.12, 3.13, 3.14 - Update Python version constraint from ^3.10 to ^3.12 in all pyproject.toml - Update classifiers to reflect Python 3.12, 3.13, 3.14 support - Update dependencies for Python 3.13+ compatibility: - chromadb: ^0.4.10 -> ^1.4.0 - numpy: >=1.26.0,<2.0.0 -> >=2.0.0 - watchdog: 4.0.0 -> ^6.0.0 - spacy: ^3.0.0 -> ^3.8.0 (numpy 2.x compatibility) - en-core-web-sm model: 3.7.1 -> 3.8.0 - httpx (benchmark): ^0.24.0 -> ^0.27.0 - Update tool configuration: - Black target-version: py310 -> py312 - Pyright pythonVersion: 3.10 -> 3.12 - Update Dockerfiles to use Python 3.12 - Update CI workflows to test on Python 3.12, 3.13, and 3.14 - Regenerate all poetry.lock files Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-18 20:25:11 -06:00
Nicholas Tindle	0a65df5102	fix(classic): always use native tool calling, fix N/A command loop - Remove openai_functions config option - native tool calling is now always enabled - Remove use_functions_api from BaseAgentConfiguration and prompt strategy - Add use_prefill config to disable prefill for Anthropic (prefill + tools incompatible) - Update anthropic dependency to ^0.45.0 for tools API support - Simplify prompt strategy to always expect tool_calls from LLM response This fixes the N/A command loop bug where models would output "N/A" as a command name when function calling was disabled. With native tool calling always enabled, models are forced to pick from valid tools only. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-18 19:54:40 -06:00
Nicholas Tindle	6fbd208fe3	chore: ignore .claude/settings.local.json in all directories Update gitignore to use glob pattern for settings.local.json files in any .claude directory. Also untrack the existing file. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-18 18:54:42 -06:00
Nicholas Tindle	8fc174ca87	refactor(classic): simplify log format by removing timestamps Remove asctime from log formats since terminal output already has timestamps from the logging infrastructure. Makes logs cleaner and easier to read. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-18 18:52:47 -06:00
Nicholas Tindle	cacc89790f	feat(classic): improve AutoGPT configuration and setup Environment loading: - Search for .env in multiple locations (cwd, ~/.autogpt, ~/.config/autogpt) - Allows running autogpt from any directory - Document search order in .env.template Setup simplification: - Remove interactive AI settings revision (was broken/unused) - Simplify to just printing current settings - Clean up unused imports Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-18 18:52:38 -06:00
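A sketch of the multi-location `.env` lookup from cacc89790f, assuming the three locations are checked in the order listed and that the file is named `.env` in each directory. The function name is hypothetical, and the authoritative search order is the one documented in `.env.template`.

```python
from pathlib import Path
from typing import Optional


def find_env_file(cwd: Optional[Path] = None, home: Optional[Path] = None) -> Optional[Path]:
    """Return the first .env found in cwd, ~/.autogpt, or ~/.config/autogpt."""
    cwd = Path.cwd() if cwd is None else cwd
    home = Path.home() if home is None else home
    candidates = [
        cwd / ".env",
        home / ".autogpt" / ".env",
        home / ".config" / "autogpt" / ".env",
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

The returned path could then be handed to whatever loader the application uses, for example python-dotenv's `load_dotenv(path)`.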
Nicholas Tindle	b9113bee02	feat(classic): enhance existing components with new capabilities CodeExecutorComponent: - Add timeout and env_vars parameters to execution commands - Add execute_shell_popen for streaming output - Improve error handling with CodeTimeoutError FileManagerComponent: - Add file_info, file_search, file_copy, file_move commands - Add directory_create, directory_list_tree commands - Better path validation and error messages GitOperationsComponent: - Add git_log, git_show, git_branch commands - Add git_stash, git_stash_pop, git_stash_list commands - Add git_cherry_pick, git_revert, git_reset commands - Add git_remote, git_fetch, git_pull, git_push commands UserInteractionComponent: - Add ask_multiple_choice for structured options - Add notify_user for non-blocking notifications - Add confirm_action for yes/no confirmations WebSearchComponent: - Minor error handling improvements WebSeleniumComponent: - Add get_page_content, execute_javascript commands - Add take_element_screenshot command - Add wait_for_element, scroll_page commands - Improve element interaction reliability Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-18 18:52:27 -06:00
Nicholas Tindle	3f65da03e7	feat(classic): add new exception types for enhanced error handling Add specialized exception classes for better error reporting: - CodeTimeoutError: For code execution timeouts - HTTPError: For HTTP request failures with status code/URL - DataProcessingError: For JSON/CSV processing errors Each exception includes helpful hints for users. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-18 18:52:10 -06:00
Nicholas Tindle	9e96d11b2d	feat(classic): add utility components for agent capabilities Add 6 new utility components to expand agent functionality: - ArchiveHandlerComponent: ZIP/TAR archive operations (create, extract, list) - ClipboardComponent: In-memory clipboard for copy/paste operations - DataProcessorComponent: CSV/JSON data manipulation and analysis - HTTPClientComponent: HTTP requests (GET, POST, PUT, DELETE) - MathUtilsComponent: Mathematical calculations and statistics - TextUtilsComponent: Text processing (regex, diff, encoding, hashing) All components follow the forge component pattern with: - CommandProvider for exposing commands - DirectiveProvider for resources/best practices - Comprehensive parameter validation Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-18 18:50:52 -06:00
Nicholas Tindle	4c264b7ae9	feat(classic): add TodoComponent with LLM-powered decomposition Add a task management component modeled after Claude Code's TodoWrite: - TodoItem with recursive sub_items for hierarchical task structure - todo_write: atomic list replacement with sub-items support - todo_read: retrieve current todos with nested structure - todo_clear: clear all todos - todo_decompose: use smart LLM to break down tasks into sub-steps Features: - Hierarchical task tracking with independent status per sub-item - MessageProvider shows todos in LLM context with proper indentation - DirectiveProvider adds best practices for task management - Graceful fallback when LLM provider not configured Integrates with: - original_autogpt Agent (full LLM decomposition support) - ForgeAgent (basic task tracking, no decomposition) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-18 18:49:48 -06:00
Nicholas Tindle	0adbc0bd05	fix(classic): update CI for removed frontend and helper scripts Remove references to deleted files (./run, cli.py, setup.py, frontend/) from CI workflows. Replace ./run agent start with direct poetry commands to start agent servers in background. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-18 17:41:11 -06:00
Nicholas Tindle	8f3291bc92	feat(classic): add workspace permissions system for agent commands Add a layered permission system that controls agent command execution: - Create autogpt.yaml in .autogpt/ folder with default allow/deny rules - File operations in workspace allowed by default - Sensitive files (.env, .key, .pem) blocked by default - Dangerous shell commands (sudo, rm -rf) blocked by default - Interactive prompts for unknown commands (y=agent, Y=workspace, n=deny) - Agent-specific permissions stored in .autogpt/agents/{id}/permissions.yaml Files added: - forge/forge/config/workspace_settings.py - Pydantic models for settings - forge/forge/permissions.py - CommandPermissionManager with pattern matching Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-18 17:39:33 -06:00
Nicholas Tindle	7a20de880d	chore: add .autogpt/ to gitignore The .autogpt/ directory is where AutoGPT stores agent data when running from any directory. This should not be committed to version control. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-18 17:02:47 -06:00
Nicholas Tindle	ef8a6d2528	feat(classic): make AutoGPT installable and runnable from any directory Add --workspace option to CLI that defaults to current working directory, allowing users to run `autogpt` from any folder. Agent data is now stored in `.autogpt/` subdirectory of the workspace instead of a hardcoded path. Changes: - Add -w/--workspace CLI option to run and serve commands - Remove dependency on forge package location for PROJECT_ROOT - Update config to use workspace instead of project_root - Store agent data in .autogpt/ within workspace directory - Update pyproject.toml files with proper PyPI metadata - Fix outdated tests to match current implementation Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-18 17:00:36 -06:00
Nicholas Tindle	fd66be2aaa	chore(classic): remove unneeded files and add CLAUDE.md docs - Remove deprecated Flutter frontend (replaced by autogpt_platform) - Remove shell scripts (run, setup, autogpt.sh, etc.) - Remove tutorials (outdated) - Remove CLI-USAGE.md and FORGE-QUICKSTART.md - Add CLAUDE.md files for Claude Code guidance Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-18 16:17:35 -06:00
Nicholas Tindle	ae2cc97dc4	feat(classic): add modern Anthropic models and fix deprecated API - Add Claude 3.5 v2, Claude 4 Sonnet, Claude 4 Opus, and Claude 4.5 Opus models - Add rolling aliases (CLAUDE_SONNET, CLAUDE_OPUS, CLAUDE_HAIKU) - Fix deprecated beta.tools.messages.create API call to use standard messages.create - Update anthropic SDK from ^0.25.1 to >=0.40,<1.0 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-18 16:15:16 -06:00
Nicholas Tindle	1b56ff13d9	test	2026-01-18 15:32:10 -06:00
Zamil Majdy	f31c160043	feat(platform): add endedAt field and fix execution analytics timestamps (#11759) ## Summary This PR adds proper execution end time tracking and fixes timestamp handling throughout the execution analytics system. ### Key Changes 1. Added `endedAt` field to database schema - Executions now have a dedicated field for tracking when they finish 2. Fixed timestamp nullable handling - `started_at` and `ended_at` are now properly nullable in types 3. Fixed chart aggregation - Reduced threshold from ≥3 to ≥1 executions per day 4. Improved timestamp display - Moved timestamps to expandable details section in analytics table 5. Fixed nullable timestamp bugs - Updated all frontend code to handle null timestamps correctly ## Problem Statement ### Issue 1: Missing Execution End Times Previously, executions used `updatedAt` (last DB update) as a proxy for "end time". This broke when adding correctness scores retroactively - the end time would change to whenever the score was added, not when the execution actually finished. ### Issue 2: Chart Shows Only One Data Point The accuracy trends chart showed only one data point despite having executions across multiple days. Root cause: aggregation required ≥3 executions per day. ### Issue 3: Incorrect Type Definitions Manually maintained types defined `started_at` and `ended_at` as non-nullable `Date`, contradicting reality where QUEUED executions haven't started yet. ## Solution ### Database Schema (`schema.prisma`) ```prisma model AgentGraphExecution { // ... startedAt DateTime? endedAt DateTime? // NEW FIELD // ... 
} ``` ### Execution Lifecycle - QUEUED: `startedAt = null`, `endedAt = null` (not started) - RUNNING: `startedAt = set`, `endedAt = null` (in progress) - COMPLETED/FAILED/TERMINATED: `startedAt = set`, `endedAt = set` (finished) ### Migration Strategy ```sql -- Add endedAt column ALTER TABLE "AgentGraphExecution" ADD COLUMN "endedAt" TIMESTAMP(3); -- Backfill ONLY terminal executions (prevents marking RUNNING executions as ended) UPDATE "AgentGraphExecution" SET "endedAt" = "updatedAt" WHERE "endedAt" IS NULL AND "executionStatus" IN ('COMPLETED', 'FAILED', 'TERMINATED'); ``` ## Changes by Component ### Backend `schema.prisma` - Added `endedAt` field to `AgentGraphExecution` `execution.py` - Made `started_at` and `ended_at` optional with Field descriptions - Updated `from_db()` to use `endedAt` instead of `updatedAt` - `update_graph_execution_stats()` sets `endedAt` when status becomes terminal `execution_analytics_routes.py` - Removed `created_at`/`updated_at` from `ExecutionAnalyticsResult` (DB metadata, not execution data) - Kept only `started_at`/`ended_at` (actual execution runtime) - Made settings global (avoid recreation) - Moved OpenAI key validation to `_process_batch` (only check when LLM actually runs) `analytics.py` - Fixed aggregation: `COUNT() >= 1` (was 3) - include all days with ≥1 execution - Uses `createdAt` for chart grouping (when execution was queued) `late_execution_monitor.py`* - Handle optional `started_at` with fallback to `datetime.min` for sorting - Display "Not started" when `started_at` is null ### Frontend Type Definitions - Fixed manually maintained `types.ts`: `started_at: Date \| null` (was non-nullable) - Generated types were already correct Analytics Components - `AnalyticsResultsTable.tsx`: Show only `started_at`/`ended_at` in 2-column expandable grid - `ExecutionAnalyticsForm.tsx`: Added filter explanation UI Monitoring Components - Fixed null handling bugs: - `OldAgentLibraryView.tsx`: Handle null in reduce function - 
`agent-runs-selector-list.tsx`: Safe sorting with `?.getTime() ?? 0` - `AgentFlowList.tsx`: Filter/sort with null checks - `FlowRunsStatus.tsx`: Filter null timestamps - `FlowRunsTimeline.tsx`: Filter executions with null timestamps before rendering - `monitoring/page.tsx`: Safe sorting - `ActivityItem.tsx`: Fallback to "recently" for null timestamps ## Benefits ✅ Accurate End Times: `endedAt` is frozen when execution finishes, not updated later ✅ Type Safety: Nullable types match reality, exposing real bugs ✅ Better UX: Chart shows all days with data (not just days with ≥3 executions) ✅ Bug Fixes: 7+ frontend components now handle null timestamps correctly ✅ Documentation: Field descriptions explain when timestamps are null ## Testing ### Backend ```bash cd autogpt_platform/backend poetry run format # ✅ All checks passed poetry run lint # ✅ All checks passed ``` ### Frontend ```bash cd autogpt_platform/frontend pnpm format # ✅ All checks passed pnpm lint # ✅ All checks passed pnpm types # ✅ All type errors fixed ``` ### Test Data Generation Created script to generate 35 test executions across 7 days with correctness scores: ```bash poetry run python scripts/generate_test_analytics_data.py ``` ## Migration Notes ⚠️ Important: The migration only backfills `endedAt` for executions with terminal status (COMPLETED, FAILED, TERMINATED). Active executions (QUEUED, RUNNING) correctly keep `endedAt = null`. ## Breaking Changes None - this is backward compatible: - `endedAt` is nullable, existing code that doesn't use it is unaffected - Frontend already used generated types which were correct - Migration safely backfills historical data <!-- CURSOR_SUMMARY --> --- > [!NOTE] > Introduces explicit execution end-time tracking and normalizes timestamp handling across backend and frontend. 
> > - Adds `endedAt` to `AgentGraphExecution` (schema + migration); backfills terminal executions; sets `endedAt` on terminal status updates > - Makes `GraphExecutionMeta.started_at/ended_at` optional; updates `from_db()` to use DB `endedAt`; exposes timestamps in `ExecutionAnalyticsResult` > - Moves OpenAI key validation into batch processing; instantiates `Settings` once > - Accuracy trends: reduce daily aggregation threshold to `>= 1`; optional historical series > - Monitoring/analytics UI: results table shows/export `started_at`/`ended_at`; adds chart filter explainer > - Frontend null-safety: update types (`Date \| null`) and fix sorting/filtering/rendering for nullable timestamps across monitoring and library views > - Late execution monitor: safe sorting/display when `started_at` is null > - OpenAPI specs updated for new/nullable fields > > <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit `1d987ca6e5`. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup> <!-- /CURSOR_SUMMARY --> --------- Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>	2026-01-16 21:44:24 +00:00
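The execution lifecycle and the null-safe sorting described in the commit above can be sketched in a few lines of Python. This is a minimal illustration, not the code from the PR: `ExecutionMeta`, `mark_status`, `sort_by_start`, and `format_start` are hypothetical names, and the sketch assumes naive (timezone-free) datetimes, since `datetime.min` cannot be compared with timezone-aware values.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

# Statuses after which an execution is considered finished (from the commit message).
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "TERMINATED"}


@dataclass
class ExecutionMeta:
    """Hypothetical stand-in for an execution record with nullable timestamps."""
    id: str
    status: str = "QUEUED"
    started_at: Optional[datetime] = None  # null until the execution starts
    ended_at: Optional[datetime] = None    # null until a terminal status is reached


def mark_status(execution: ExecutionMeta, status: str, now: datetime) -> ExecutionMeta:
    """Update the status and set each timestamp at most once."""
    execution.status = status
    if status == "RUNNING" and execution.started_at is None:
        execution.started_at = now
    if status in TERMINAL_STATUSES and execution.ended_at is None:
        # Frozen at the first terminal transition; later updates do not move it.
        execution.ended_at = now
    return execution


def sort_by_start(executions: List[ExecutionMeta]) -> List[ExecutionMeta]:
    """Sort by start time, treating a missing start as datetime.min (sorts first)."""
    return sorted(executions, key=lambda e: e.started_at or datetime.min)


def format_start(execution: ExecutionMeta) -> str:
    """Render the start time, or a placeholder when the execution has not started."""
    if execution.started_at is None:
        return "Not started"
    return execution.started_at.isoformat()
```

Setting `ended_at` only when it is still null mirrors the stated goal that the end time should not change when a record is updated later, for example when a correctness score is added retroactively.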
Nicholas Tindle	06550a87eb	feat(backend): add missed default credentials (#11760 ) ### Changes 🏗️ Fixed missing default credentials and provider name mismatch in the credentials store: 1. Provider name correction (`credentials_store.py:97-103`) - Changed `provider="unreal"` → `provider="unreal_speech"` to match the existing `unreal_speech_api_key` setting and block usage - Updated title from "Use Credits for Unreal" → "Use Credits for Unreal Speech" for clarity 2. Added missing OpenWeatherMap credentials (`credentials_store.py:219-226`) - New `openweathermap_credentials` definition with `APIKeyCredentials` - Uses existing `settings.secrets.openweathermap_api_key` setting that was previously defined but had no credential object - Added to `DEFAULT_CREDENTIALS` list 3. Fixed credentials not exposed in `get_all_creds()` (`credentials_store.py:343-354`) - Added `llama_api_credentials` conditional append (was defined but not returned to users) - Added `v0_credentials` conditional append (was defined but not returned to users) - Added `openweathermap_credentials` conditional append ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Verified provider name `unreal_speech` matches block usage in `text_to_speech_block.py` - [x] Confirmed `openweathermap_api_key` setting exists in secrets - [x] Confirmed `llama_api_key` and `v0_api_key` settings exist in secrets <!-- CURSOR_SUMMARY --> --- > [!NOTE] > Aligns backend credential definitions and exposes missing system creds; updates frontend to hide new built-ins. 
> > - Backend `credentials_store.py`: > - Corrects `provider` to `unreal_speech` and updates title > - Adds `openweathermap_credentials`; includes in `DEFAULT_CREDENTIALS` and `get_all_creds()` when key present > - Ensures `llama_api_credentials` and `v0_credentials` are returned by `get_all_creds()` > - Frontend `integrations/page.tsx`: > - Extends `hiddenCredentials` with IDs for `v0`, `webshare_proxy`, and `openweathermap` > > <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit `e7d46b76c6`. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup> <!-- /CURSOR_SUMMARY --> --------- Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com> Co-authored-by: Nicholas Tindle <ntindle@users.noreply.github.com>	2026-01-16 21:18:12 +00:00
Nicholas Tindle	088b9998dc	fix(frontend): Fix flaky agent-activity tests by targeting correct agent (#11790 ) This PR fixes flaky agent-activity Playwright tests that were failing intermittently in CI. Closes #11789 ### Changes 🏗️ - Navigate to specific agent by name: Replace `LibraryPage.clickFirstAgent(page)` with `LibraryPage.navigateToAgentByName(page, "Test Agent")` to ensure we're testing the correct agent rather than relying on the first agent in the list - Add retry mechanism for async data loading: Replace direct visibility check with `expect(...).toPass({ timeout: 15000 })` pattern to properly handle asynchronous agent data fetching - Increase timeout: Extended timeout from 8000ms to 15000ms to accommodate slower CI environments ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Verified the test file syntax is correct - [x] Changes target the correct file (`autogpt_platform/frontend/src/tests/agent-activity.spec.ts`) - [x] The retry mechanism follows Playwright best practices using `toPass()` #### For configuration changes: - [x] `.env.default` is updated or already compatible with my changes (N/A - no config changes) - [x] `docker-compose.yml` is updated or already compatible with my changes (N/A - no config changes) - [x] I have included a list of my configuration changes in the PR description (under Changes) (N/A - no config changes) --------- Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com> Co-authored-by: Nicholas Tindle <ntindle@users.noreply.github.com>	2026-01-16 20:33:47 +00:00
Nicholas Tindle	05c89fa5c0	feat(claude): add vercel-react-best-practices skill (#11777)	2026-01-16 09:40:58 -07:00
Swifty	8cc8295f14	feat(backend): add agent generator tools for chat copilot (#11781 ) This PR adds the ability to create and edit agents from natural language descriptions in the chat copilot. ### Changes 🏗️ - Added `agent_generator/` module with: - LLM client for OpenAI API calls - Core generation logic for decomposing goals and generating agent JSON - Fixer module to correct common LLM generation errors - Validator to ensure generated agents are structurally valid - Prompts for goal decomposition and agent generation - Utility functions for blocks info and agent saving - Added `CreateAgentTool` - creates new agents from natural language descriptions - Added `EditAgentTool` - edits existing agents using natural language patches - Added response models: `AgentPreviewResponse`, `AgentSavedResponse`, `ClarificationNeededResponse` - Registered new tools in the tools registry ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Run `poetry run format` to ensure code passes linting - [x] Test creating an agent via chat with a natural language description - [x] Test editing an existing agent via chat	2026-01-16 17:11:57 +01:00
Swifty	e55f05c7a8	feat(backend): add chat search tools and BM25 reranking (#11782 ) This PR adds new chat tools for searching blocks and documentation, along with BM25 reranking for improved search relevance. ### Changes 🏗️ New Chat Tools: - `find_block` - Search for available blocks by name/description using hybrid search - `run_block` - Execute a block directly with provided inputs and credentials - `search_docs` - Search documentation with section-level granularity - `get_doc_page` - Retrieve full documentation page content Search Improvements: - Added BM25 reranking to hybrid search for better lexical relevance - Documentation handler now chunks markdown by headings (##) for finer-grained embeddings - Section-based content IDs (`doc_path::section_index`) for precise doc retrieval - Startup embedding backfill in scheduler for immediate searchability Other Changes: - New response models for block and documentation search results - Updated orphan cleanup to handle section-based doc embeddings - Added `rank-bm25` dependency for BM25 scoring - Removed max message limit check in chat service ### Checklist 📋 #### For code changes: - [x] I have clearly listed my changes in the PR description - [x] I have made a test plan - [x] I have tested my changes according to the test plan: - [x] Run find_block tool to search for blocks (e.g., "current time") - [x] Run run_block tool to execute a found block - [x] Run search_docs tool to search documentation - [x] Run get_doc_page tool to retrieve full doc content - [x] Verify BM25 reranking improves search relevance for exact term matches - [x] Verify documentation sections are properly chunked and embedded #### For configuration changes: - [x] `.env.default` is updated or already compatible with my changes - [x] `docker-compose.yml` is updated or already compatible with my changes - [x] I have included a list of my configuration changes in the PR description (under Changes) Dependencies added: `rank-bm25` for BM25 scoring 
algorithm	2026-01-16 16:18:10 +01:00
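Two ideas from the commit above, heading-based chunking with `doc_path::section_index` content IDs and BM25 scoring, can be illustrated together. The sketch below is an assumption-laden example rather than the project's implementation: it uses a hand-written Okapi BM25 with a Lucene-style non-negative IDF instead of the `rank-bm25` package, it omits the hybrid (embedding) half of the search entirely, and the names `chunk_markdown_by_heading`, `tokenize`, and `bm25_scores` are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def chunk_markdown_by_heading(doc_path: str, markdown: str) -> List[Tuple[str, str]]:
    """Split markdown at level-2 headings and label chunks `doc_path::index`.

    Simplification: a line starting with "## " inside a fenced code block
    would also start a new chunk; a real splitter would need to track fences.
    """
    sections: List[List[str]] = []
    current: List[str] = []
    for line in markdown.splitlines():
        if line.startswith("## ") and current:
            sections.append(current)
            current = []
        current.append(line)
    if current:
        sections.append(current)
    texts = ["\n".join(lines).strip() for lines in sections]
    texts = [text for text in texts if text]  # drop whitespace-only chunks
    return [(f"{doc_path}::{index}", text) for index, text in enumerate(texts)]


def tokenize(text: str) -> List[str]:
    """Lowercase and split into word-like tokens, keeping underscores."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def bm25_scores(query: str, documents: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Return one BM25 score per document for the given query."""
    tokenized = [tokenize(document) for document in documents]
    if not tokenized:
        return []
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs or 1.0
    # Number of documents each term appears in.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    query_terms = set(tokenize(query))
    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query_terms:
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores.append(score)
    return scores
```

A reranking step would then sort candidate chunks by these scores, which favors chunks containing the exact query terms; how those scores are blended with embedding similarity is not described in the commit message and is left out here.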

7786 Commits