AutoGPT

Mirror of https://github.com/Significant-Gravitas/AutoGPT.git (last synced 2026-02-13 08:14:58 -05:00)

Author	SHA1	Message	Date
Nick Tindle	711f0da63c	fix(classic): fix CI failures - install Playwright and auto-detect model - Add 'playwright install chromium' step to Forge CI workflow - Auto-detect default model from available API keys (ANTHROPIC_API_KEY, OPENAI_API_KEY, GROQ_API_KEY) in direct_benchmark harness - Prefer Claude > OpenAI > Groq, fallback to OpenAI if no keys found	2026-02-12 15:46:54 -06:00
Nicholas Tindle	804430e243	refactor(classic): migrate from agbenchmark to direct_benchmark harness - Remove old benchmark/ folder with agbenchmark framework - Move challenges to direct_benchmark/challenges/ - Move analysis tools (analyze_reports.py, analyze_failures.py) to direct_benchmark/ - Move challenges_already_beaten.json to direct_benchmark/ - Update CI workflow to use direct_benchmark - Update CLAUDE.md files with new benchmarking instructions - Add benchmarking section to original_autogpt/CLAUDE.md The direct_benchmark harness directly instantiates agents without HTTP server overhead, enabling parallel execution with asyncio semaphore. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-19 22:29:51 -06:00
Nicholas Tindle	32f68d5999	feat(classic): add failure analysis tool and improve benchmark output Benchmark improvements: - Add analyze_failures.py for pattern detection and failure analysis - Add informative step output: tool name, args, result status, cost - Add --all and --matrix flags for comprehensive model/strategy testing - Add --analyze-only and --no-analyze flags for flexible analysis control - Auto-run failure analysis after benchmarks with markdown export - Fix directory creation bug in ReportManager (add parents=True) Prompt strategy enhancements: - Implement full plan_execute, reflexion, rewoo, tree_of_thoughts strategies - Add PROMPT_STRATEGY env var support for strategy selection - Add extended thinking support for Anthropic models - Add reasoning effort support for OpenAI o-series models LLM provider improvements: - Add thinking_budget_tokens config for Anthropic extended thinking - Add reasoning_effort config for OpenAI reasoning models - Improve error feedback for LLM self-correction Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-19 18:58:41 -06:00
Nicholas Tindle	49f56b4e8d	feat(classic): enhance strategy benchmark harness with model comparison and bug fixes - Add model comparison support to test harness (claude, openai, gpt5, opus presets) - Add --models, --smart-llm, --fast-llm, --list-models CLI args - Add real-time logging with timestamps and progress indicators - Fix success parsing bug: read results[0].success instead of non-existent metrics.success - Fix agbenchmark TestResult validation: use exception typename when value is empty - Fix WebArena challenge validation: use strings instead of integers in instantiation_dict - Fix Agent type annotations: create AnyActionProposal union for all prompt strategies - Add pytest integration tests for the strategy benchmark harness Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-19 18:07:14 -06:00
Nicholas Tindle	e0784f8f6b	refactor(forge): simplify deeply nested error handling in Anthropic provider - Extract _get_tool_error_message helper method - Replace 20+ levels of nesting with simple for loop - Improve readability of tool_result construction - Update benchmark poetry.lock Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-19 00:15:33 -06:00
Nicholas Tindle	3040f39136	feat(forge): modernize web search with tiered provider system Replace basic DuckDuckGo-only search with a modern tiered system: 1. Tavily (primary) - AI-optimized results with content extraction - AI-generated answer summaries - Relevance scoring - Full page content extraction via search_and_extract command 2. Serper (secondary) - Fast, cheap Google SERP results - $0.30-1.00 per 1K queries - Real Google results without scraping 3. DDGS multi-engine (fallback) - Free, no API key required - Automatic fallback chain: DuckDuckGo → Bing → Brave → Google → etc. - 8 search backends supported Key changes: - Upgrade duckduckgo-search to ddgs v9.10 (renamed successor package) - Add Tavily and Serper API integrations - Implement automatic provider selection and fallback chain - Add search_and_extract command for research with content extraction - Add TAVILY_API_KEY and SERPER_API_KEY to env templates - Update benchmark httpx constraint for ddgs compatibility - 23 comprehensive tests for all providers and fallback scenarios Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-19 00:06:42 -06:00
Nicholas Tindle	46aabab3ea	feat(classic): upgrade to Python 3.12+ with CI testing on 3.12, 3.13, 3.14 - Update Python version constraint from ^3.10 to ^3.12 in all pyproject.toml - Update classifiers to reflect Python 3.12, 3.13, 3.14 support - Update dependencies for Python 3.13+ compatibility: - chromadb: ^0.4.10 -> ^1.4.0 - numpy: >=1.26.0,<2.0.0 -> >=2.0.0 - watchdog: 4.0.0 -> ^6.0.0 - spacy: ^3.0.0 -> ^3.8.0 (numpy 2.x compatibility) - en-core-web-sm model: 3.7.1 -> 3.8.0 - httpx (benchmark): ^0.24.0 -> ^0.27.0 - Update tool configuration: - Black target-version: py310 -> py312 - Pyright pythonVersion: 3.10 -> 3.12 - Update Dockerfiles to use Python 3.12 - Update CI workflows to test on Python 3.12, 3.13, and 3.14 - Regenerate all poetry.lock files Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-18 20:25:11 -06:00
Nicholas Tindle	0a65df5102	fix(classic): always use native tool calling, fix N/A command loop - Remove openai_functions config option - native tool calling is now always enabled - Remove use_functions_api from BaseAgentConfiguration and prompt strategy - Add use_prefill config to disable prefill for Anthropic (prefill + tools incompatible) - Update anthropic dependency to ^0.45.0 for tools API support - Simplify prompt strategy to always expect tool_calls from LLM response This fixes the N/A command loop bug where models would output "N/A" as a command name when function calling was disabled. With native tool calling always enabled, models are forced to pick from valid tools only. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-18 19:54:40 -06:00
Nicholas Tindle	fd66be2aaa	chore(classic): remove unneeded files and add CLAUDE.md docs - Remove deprecated Flutter frontend (replaced by autogpt_platform) - Remove shell scripts (run, setup, autogpt.sh, etc.) - Remove tutorials (outdated) - Remove CLI-USAGE.md and FORGE-QUICKSTART.md - Add CLAUDE.md files for Claude Code guidance Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-18 16:17:35 -06:00
Emmanuel Ferdman	e5368f3857	fix: Resolve `logger.warn(..)` deprecation warnings (#9938) This small PR resolves the deprecation warnings of the `logger` library: ``` DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead ```	2025-05-16 10:56:03 +02:00
Reinier van der Leer	d638c1f484	Fix Poetry v2.0.0 compatibility (#9197) Make all changes necessary to make everything work with Poetry v2.0.0. - Resolves #9196 ## Changes - Removed `--no-update` flag from `poetry lock` command in codebase - Removed extra path arguments from `poetry -C [path] run [command]` occurrences - Regenerated all lock files in hierarchical order - Added workaround for Poetry bug where `packages.[i].format` is now suddenly required Additionally: - Fixed up .dockerignore - Fixes .venv being erroneously copied over from local - Fixes build context bloat (300MB -> 2.5MB) - Fixed warnings about entrypoint script not being installed in docker builds ### Relevant (breaking) changes in v2.0.0 - `--no-update` flag no longer exists for `poetry lock` as it has become default behavior - The `-C` option now actually changes the directory, so any path arguments in `poetry run` commands can/must be removed - Poetry v2.0.0 uses the new v2.1 lock file spec, so all lock files have to be regenerated to avoid false-positive lock file updates and checks on future PRs - BUG: when specifying `poetry.tool.packages`, `format` is required now - python-poetry/poetry#9961 Full Poetry v2.0.0 release notes and change log: https://python-poetry.org/blog/announcing-poetry-2.0.0	2025-01-06 23:34:49 +01:00
Swifty	ef7cfbb860	refactor: AutoGPT Platform Stealth Launch Repo Re-Org (#8113) Restructuring the Repo to make it clear the difference between classic autogpt and the autogpt platform: * Move the "classic" projects `autogpt`, `forge`, `frontend`, and `benchmark` into a `classic` folder * Also rename `autogpt` to `original_autogpt` for absolute clarity * Rename `rnd/` to `autogpt_platform/` * `rnd/autogpt_builder` -> `autogpt_platform/frontend` * `rnd/autogpt_server` -> `autogpt_platform/backend` * Adjust any paths accordingly	2024-09-20 16:50:43 +02:00

12 Commits