Add search_blocks and execute_block commands that expose platform blocks
to classic agents:
- search_blocks: Local search by name, description, or category (fast, offline)
- execute_block: Execute via platform API with automatic credential handling
The loader automatically discovers the platform backend from the monorepo
structure without requiring manual PYTHONPATH configuration.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Docker containers cannot have their mount bindings updated after creation.
When running benchmarks or multiple agent instances, the same container name
could be reused with a different workspace directory, causing the container
to still reference the OLD mount path. This resulted in "python: can't open
file '/workspace/temp*.py'" errors.
The fix: remove existing containers before creating new ones to ensure fresh
mount bindings to the current workspace directory.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When prompts encourage parallel tool execution and the LLM makes multiple
tool calls simultaneously, the Anthropic API requires a tool_result message
for EACH tool_use. Previously, we only created one tool result for the first
tool call, causing "tool_use ids were found without tool_result blocks" errors.
This fix:
- Adds _make_result_messages() to create results for ALL tool calls
- Maps tool names to their outputs from parallel execution results
- Handles errors per-tool from the _errors list
- Falls back gracefully when results are missing
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Update black version to match pre-commit hook (24.10.0) and reformat
all files with the new version.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add missing strategies (lats, multi_agent_debate) to PromptStrategyName
- Fix method override signatures for reasoning_effort parameter
- Fix Pydantic Field() overload issues with helper function
- Fix BeautifulSoup Tag type narrowing in web_fetch.py
- Fix Optional member access in playwright_browser.py and rewoo.py
- Convert hasattr patterns to getattr for proper type narrowing
- Add proper type casts for Literal types
- Fix file storage path type conversions
- Exclude legacy challenges/ from pyright checking
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Update .flake8 config to exclude workspace directories and ignore E203
- Fix import sorting (isort) across multiple files
- Fix code formatting (black) across multiple files
- Remove unused imports and fix line length issues (flake8)
- Fix f-strings without placeholders and unused variables
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add comprehensive sub-agent spawning infrastructure that enables prompt
strategies to coordinate multiple agents for advanced reasoning patterns.
New files:
- forge/agent/execution_context.py: ExecutionContext, ResourceBudget,
SubAgentHandle, and AgentFactory protocol for sub-agent lifecycle
- agent_factory/default_factory.py: DefaultAgentFactory implementation
- prompt_strategies/lats.py: Language Agent Tree Search using MCTS
with sub-agents for action expansion and evaluation
- prompt_strategies/multi_agent_debate.py: Multi-agent debate with
proposal, critique, and consensus phases
Key changes:
- BaseMultiStepPromptStrategy gains spawn_sub_agent(), run_sub_agent(),
spawn_and_run(), and run_parallel() methods
- Agent class accepts optional ExecutionContext and injects it into strategies
- Sub-agents enabled by default (enable_sub_agents=True)
- Resource limits: max_depth=5, max_sub_agents=25, max_cycles=25
All 7 strategies now available in benchmark:
one_shot, rewoo, plan_execute, reflexion, tree_of_thoughts, lats, multi_agent_debate
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Merge forge/, original_autogpt/, and direct_benchmark/ into a single Poetry
project to eliminate cross-project path dependency issues.
Changes:
- Create classic/pyproject.toml with merged dependencies from all three projects
- Remove individual pyproject.toml and poetry.lock files from subdirectories
- Update all CLAUDE.md files to reflect commands run from classic/ root
- Update all README.md files with new installation and usage instructions
All packages are now included via the packages directive:
- forge/forge (core agent framework)
- original_autogpt/autogpt (AutoGPT agent)
- direct_benchmark/direct_benchmark (benchmark harness)
CLI entry points preserved: autogpt, serve, direct-benchmark
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove selenium.py and test_selenium.py
- Add playwright_browser.py with WebPlaywrightComponent
- Update web component exports to use Playwright
- Update dependencies in pyproject.toml/poetry.lock
- Minor agent and reflexion strategy improvements
- Update CLAUDE.md documentation
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove vcrpy and pytest-recording dependencies
- Remove tests/vcr/ directory and vcr_cassettes submodule
- Remove .gitmodules (only had cassette submodule)
- Simplify CI workflow - no more cassette checkout/push/PAT_REVIEW
- Tests requiring API keys now skip if not set (fork PRs)
- Update CLAUDE.md files to remove cassette references
- Fix broken agbenchmark path in pyproject.toml
Security improvement: removes need for PAT with cross-repo write access.
Fork PRs will have API-dependent tests skipped (GitHub protects secrets).
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When tool calls fail validation, the error messages now include:
- What arguments were actually provided
- The expected parameter schema with types and required/optional indicators
This helps LLMs understand and fix their mistakes when retrying,
rather than just being told a parameter is missing.
Example improved error:
Invalid function call for write_file: 'contents' is a required property
You provided: {"filename": 'story.txt'}
Expected parameters: {"filename": string (required), "contents": string (required)}
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add WebFetchComponent for fast HTTP-based page fetching without browser
overhead. Uses trafilatura for intelligent content extraction.
Commands:
- fetch_webpage: Extract main content as text/markdown/xml
- Removes navigation, ads, boilerplate automatically
- Extracts page metadata (title, description, author, date)
- Extracts and lists page links
- Much faster than Selenium-based read_webpage
- fetch_raw_html: Get raw HTML for structure inspection
- Optional truncation for large pages
Features:
- Trafilatura-powered content extraction (best-in-class accuracy)
- Automatic link extraction with relative URL resolution
- Page metadata extraction (OG tags, meta tags)
- Configurable timeout, max content length, max links
- Proper error handling for timeouts and HTTP errors
- 19 comprehensive tests
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Extract _get_tool_error_message helper method
- Replace 20+ levels of nesting with simple for loop
- Improve readability of tool_result construction
- Update benchmark poetry.lock
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace basic DuckDuckGo-only search with a modern tiered system:
1. Tavily (primary) - AI-optimized results with content extraction
- AI-generated answer summaries
- Relevance scoring
- Full page content extraction via search_and_extract command
2. Serper (secondary) - Fast, cheap Google SERP results
- $0.30-1.00 per 1K queries
- Real Google results without scraping
3. DDGS multi-engine (fallback) - Free, no API key required
- Automatic fallback chain: DuckDuckGo → Bing → Brave → Google → etc.
- 8 search backends supported
Key changes:
- Upgrade duckduckgo-search to ddgs v9.10 (renamed successor package)
- Add Tavily and Serper API integrations
- Implement automatic provider selection and fallback chain
- Add search_and_extract command for research with content extraction
- Add TAVILY_API_KEY and SERPER_API_KEY to env templates
- Update benchmark httpx constraint for ddgs compatibility
- 23 comprehensive tests for all providers and fallback scenarios
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix 32+ flake8 E501 (line too long) errors by shortening descriptions
- Remove unused import in todo.py
- Fix test_todo.py argument order (config= keyword)
- Add type annotations to fix pyright errors where straightforward
- Add noqa comments for flake8 false positives in __init__.py
- Remove unused nonlocal declarations in main.py
- Run black and isort to fix formatting
- Update CLAUDE.md with improved linting commands
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Previously, when a user selected "Once" or "Always" with feedback (via Tab),
the command was NOT executed because UserFeedbackProvided was raised before
checking the approval scope. This fix changes the architecture from
exception-based to return-value-based.
Changes:
- Add PermissionCheckResult class with allowed, scope, and feedback fields
- Change check_command() to return PermissionCheckResult instead of bool
- Update prompt_fn signature to return (ApprovalScope, feedback) tuple
- Add pending_user_feedback mechanism to EpisodicActionHistory
- Update execute() to handle feedback after successful command execution
- Feedback message explicitly states "Command executed successfully"
- Add on_auto_approve callback for displaying auto-approved commands
- Add comprehensive tests for approval/denial with feedback scenarios
Behavior:
- Once + feedback → Execute command, then send feedback to agent
- Always + feedback → Execute command, save permission, send feedback
- Deny + feedback → Don't execute, send feedback to agent
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove openai_functions config option - native tool calling is now always enabled
- Remove use_functions_api from BaseAgentConfiguration and prompt strategy
- Add use_prefill config to disable prefill for Anthropic (prefill + tools incompatible)
- Update anthropic dependency to ^0.45.0 for tools API support
- Simplify prompt strategy to always expect tool_calls from LLM response
This fixes the N/A command loop bug where models would output "N/A" as a
command name when function calling was disabled. With native tool calling
always enabled, models are forced to pick from valid tools only.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove asctime from log formats since terminal output already has
timestamps from the logging infrastructure. Makes logs cleaner
and easier to read.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add specialized exception classes for better error reporting:
- CodeTimeoutError: For code execution timeouts
- HTTPError: For HTTP request failures with status code/URL
- DataProcessingError: For JSON/CSV processing errors
Each exception includes helpful hints for users.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add a task management component modeled after Claude Code's TodoWrite:
- TodoItem with recursive sub_items for hierarchical task structure
- todo_write: atomic list replacement with sub-items support
- todo_read: retrieve current todos with nested structure
- todo_clear: clear all todos
- todo_decompose: use smart LLM to break down tasks into sub-steps
Features:
- Hierarchical task tracking with independent status per sub-item
- MessageProvider shows todos in LLM context with proper indentation
- DirectiveProvider adds best practices for task management
- Graceful fallback when LLM provider not configured
Integrates with:
- original_autogpt Agent (full LLM decomposition support)
- ForgeAgent (basic task tracking, no decomposition)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add --workspace option to CLI that defaults to current working directory,
allowing users to run `autogpt` from any folder. Agent data is now stored
in `.autogpt/` subdirectory of the workspace instead of a hardcoded path.
Changes:
- Add -w/--workspace CLI option to run and serve commands
- Remove dependency on forge package location for PROJECT_ROOT
- Update config to use workspace instead of project_root
- Store agent data in .autogpt/ within workspace directory
- Update pyproject.toml files with proper PyPI metadata
- Fix outdated tests to match current implementation
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add Claude 3.5 v2, Claude 4 Sonnet, Claude 4 Opus, and Claude 4.5 Opus models
- Add rolling aliases (CLAUDE_SONNET, CLAUDE_OPUS, CLAUDE_HAIKU)
- Fix deprecated beta.tools.messages.create API call to use standard messages.create
- Update anthropic SDK from ^0.25.1 to >=0.40,<1.0
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
These models have become deprecated
- deepseek-r1-distill-llama-70b
- gemma2-9b-it
- llama3-70b-8192
- llama3-8b-8192
- google/gemini-flash-1.5
I have removed them and setup a migration, the migration is to convert
all the old versions of the model to new versions, the model changes
will happen like so
- llama3-70b-8192 → llama-3.3-70b-versatile
- llama3-8b-8192 → llama-3.1-8b-instant
- google/gemini-flash-1.5 → google/gemini-2.5-flash
- deepseek-r1-distill-llama-70b → gpt-5-chat-latest
- gemma2-9b-it → gpt-5-chat-latest
### Changes 🏗️
<!-- Concisely describe all of the changes made in this pull request:
-->
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
<!-- Put your test plan here: -->
- [x] Check to see if old models where removed
- [x] Check to see if migration worked and converted old models to new
one in graph
Make all changes necessary to make everything work with Poetry v2.0.0.
- Resolves#9196
## Changes
- Removed `--no-update` flag from `poetry lock` command in codebase
- Removed extra path arguments from `poetry -C [path] run [command]`
occurrences
- Regenerated all lock files in hierarchical order
- Added workaround for Poetry bug where `packages.[i].format` is now
suddenly required
Additionally:
- Fixed up .dockerignore
- Fixes .venv being erroneously copied over from local
- Fixes build context bloat (300MB -> 2.5MB)
- Fixed warnings about entrypoint script not being installed in docker
builds
### Relevant (breaking) changes in v2.0.0
- `--no-update` flag no longer exists for `poetry lock` as it has become
default behavior
- The `-C` option now actually changes the directory, so any path
arguments in `poetry run` commands can/must be removed
- Poetry v2.0.0 uses the new v2.1 lock file spec, so all lock files have
to be regenerated to avoid false-positive lock file updates and checks
on future PRs
- **BUG:** when specifying `poetry.tool.packages`, `format` is required
now
- python-poetry/poetry#9961
Full Poetry v2.0.0 release notes and change log:
https://python-poetry.org/blog/announcing-poetry-2.0.0
<!-- Clearly explain the need for these changes: -->
Nick wants others to be able to write tests besides Nick
### Changes 🏗️
<!-- Concisely describe all of the changes made in this pull request:
-->
- Fixes various import errors across the docs to fix dead links
- Adds Docs for making and debugging your own tests
---------
Co-authored-by: Swifty <craigswift13@gmail.com>
Restructuring the Repo to make it clear the difference between classic autogpt and the autogpt platform:
* Move the "classic" projects `autogpt`, `forge`, `frontend`, and `benchmark` into a `classic` folder
* Also rename `autogpt` to `original_autogpt` for absolute clarity
* Rename `rnd/` to `autogpt_platform/`
* `rnd/autogpt_builder` -> `autogpt_platform/frontend`
* `rnd/autogpt_server` -> `autogpt_platform/backend`
* Adjust any paths accordingly