AutoGPT

mirror of https://github.com/Significant-Gravitas/AutoGPT.git synced 2026-02-13 16:25:05 -05:00

Author	SHA1	Message	Date
Nicholas Tindle	791e1d8982	fix(classic): resolve CI lint, type, and test failures - Fix line-too-long in test_permissions.py docstring - Fix type annotation in validators.py (callable -> Callable) - Add --fresh flag to benchmark tests to prevent state resumption - Exclude direct_benchmark/adapters from pyright (optional deps) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-29 14:31:11 -06:00
Nicholas Tindle	57fbab500b	feat(classic): add external benchmark adapters for GAIA, SWE-bench, and AgentBench Integrate standard AI agent benchmarks into the direct_benchmark infrastructure using a plugin-based adapter pattern: - Add BenchmarkAdapter base class with setup(), load_challenges(), and evaluate() - Implement GAIAAdapter for the GAIA benchmark (requires HF token) - Implement SWEBenchAdapter for SWE-bench (requires Docker) - Implement AgentBenchAdapter for AgentBench multi-environment benchmark - Extend HarnessConfig with benchmark options (--benchmark, --benchmark-split, etc.) - Modify ParallelExecutor to use adapter's evaluate() for external benchmarks - Fix runner to record finish step (was being skipped, breaking answer extraction) - Add optional benchmarks dependency group with datasets and huggingface-hub - Increase default benchmark timeout to 900s Usage: poetry run direct-benchmark run --benchmark agent-bench --benchmark-subset dbbench --strategies one_shot --models claude Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-21 13:06:32 -06:00
Nicholas Tindle	326554d89a	style(classic): update black to 24.10.0 and reformat Update black version to match pre-commit hook (24.10.0) and reformat all files with the new version. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-20 10:51:54 -06:00
Nicholas Tindle	a4d7b0142f	fix(classic): resolve all pyright type errors - Add missing strategies (lats, multi_agent_debate) to PromptStrategyName - Fix method override signatures for reasoning_effort parameter - Fix Pydantic Field() overload issues with helper function - Fix BeautifulSoup Tag type narrowing in web_fetch.py - Fix Optional member access in playwright_browser.py and rewoo.py - Convert hasattr patterns to getattr for proper type narrowing - Add proper type casts for Literal types - Fix file storage path type conversions - Exclude legacy challenges/ from pyright checking Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-20 10:41:53 -06:00
Nicholas Tindle	572c3f5e0d	refactor(classic): consolidate Poetry projects into single pyproject.toml Merge forge/, original_autogpt/, and direct_benchmark/ into a single Poetry project to eliminate cross-project path dependency issues. Changes: - Create classic/pyproject.toml with merged dependencies from all three projects - Remove individual pyproject.toml and poetry.lock files from subdirectories - Update all CLAUDE.md files to reflect commands run from classic/ root - Update all README.md files with new installation and usage instructions All packages are now included via the packages directive: - forge/forge (core agent framework) - original_autogpt/autogpt (AutoGPT agent) - direct_benchmark/direct_benchmark (benchmark harness) CLI entry points preserved: autogpt, serve, direct-benchmark Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-20 00:49:56 -06:00

5 Commits
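
Commit 57fbab500b describes a plugin-based adapter pattern built around a BenchmarkAdapter base class with setup(), load_challenges(), and evaluate(). The log only names the class and its three methods, so the following is a minimal sketch of how such a pattern might look, not the repository's actual code. The method signatures, the Challenge dataclass, the EchoAdapter toy adapter, the ADAPTERS registry, and the run_benchmark helper are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, Type


@dataclass
class Challenge:
    """Hypothetical container for a single benchmark task."""
    name: str
    prompt: str
    expected: str


class BenchmarkAdapter(ABC):
    """Base class named in the commit; these signatures are assumed."""

    @abstractmethod
    def setup(self) -> None:
        """Prepare the benchmark (e.g. fetch data, check credentials)."""

    @abstractmethod
    def load_challenges(self) -> Iterator[Challenge]:
        """Yield the tasks the agent should attempt."""

    @abstractmethod
    def evaluate(self, challenge: Challenge, answer: str) -> bool:
        """Score the agent's answer for one task."""


class EchoAdapter(BenchmarkAdapter):
    """Toy adapter for illustration only; it is not part of AutoGPT."""

    def __init__(self) -> None:
        self.ready = False

    def setup(self) -> None:
        self.ready = True

    def load_challenges(self) -> Iterator[Challenge]:
        yield Challenge("echo-1", "Repeat the word: hello", "hello")

    def evaluate(self, challenge: Challenge, answer: str) -> bool:
        # Case- and whitespace-insensitive exact match.
        return answer.strip().lower() == challenge.expected


# Hypothetical registry mapping a benchmark name to its adapter class.
ADAPTERS: Dict[str, Type[BenchmarkAdapter]] = {"echo": EchoAdapter}


def run_benchmark(name: str, agent: Callable[[str], str]) -> Dict[str, bool]:
    """Look up an adapter by name, then let it load and score every task."""
    adapter = ADAPTERS[name]()
    adapter.setup()
    return {
        challenge.name: adapter.evaluate(challenge, agent(challenge.prompt))
        for challenge in adapter.load_challenges()
    }
```

In this sketch the harness never inspects benchmark-specific details: it picks an adapter by name and delegates scoring to that adapter's evaluate(), which mirrors the commit's note that ParallelExecutor uses the adapter's evaluate() for external benchmarks.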