Nicholas Tindle 49f56b4e8d feat(classic): enhance strategy benchmark harness with model comparison and bug fixes
- Add model comparison support to test harness (claude, openai, gpt5, opus presets)
- Add --models, --smart-llm, --fast-llm, --list-models CLI args
- Add real-time logging with timestamps and progress indicators
- Fix success parsing bug: read results[0].success instead of non-existent metrics.success
- Fix agbenchmark TestResult validation: use exception typename when value is empty
- Fix WebArena challenge validation: use strings instead of integers in instantiation_dict
- Fix Agent type annotations: create AnyActionProposal union for all prompt strategies
- Add pytest integration tests for the strategy benchmark harness

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 18:07:14 -06:00
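
Below is a minimal sketch of what a harness entry point with the features listed in the commit message could look like. The flag names (`--models`, `--smart-llm`, `--fast-llm`, `--list-models`), the preset names (claude, openai, gpt5, opus), and the `results[0].success` field come from the commit message; the preset contents, report layout, and helper names are illustrative assumptions, not code from the repository.

```python
# Hypothetical sketch of the strategy benchmark harness CLI described above.
# Only the flag names, preset names, and the results[0].success field are
# taken from the commit message; everything else is illustrative guesswork.
import argparse
import json
import logging

# Presets named in the commit message; the model IDs behind each preset are
# assumptions, not taken from the repository.
MODEL_PRESETS = {
    "claude": {"smart_llm": "claude-sonnet-4", "fast_llm": "claude-haiku"},
    "openai": {"smart_llm": "gpt-4o", "fast_llm": "gpt-4o-mini"},
    "gpt5": {"smart_llm": "gpt-5", "fast_llm": "gpt-5-mini"},
    "opus": {"smart_llm": "claude-opus-4", "fast_llm": "claude-sonnet-4"},
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Strategy benchmark harness")
    parser.add_argument("--models", nargs="+", choices=MODEL_PRESETS,
                        help="Model presets to compare in a single run")
    parser.add_argument("--smart-llm", help="Override the smart LLM")
    parser.add_argument("--fast-llm", help="Override the fast LLM")
    parser.add_argument("--list-models", action="store_true",
                        help="Print the available presets and exit")
    return parser


def challenge_succeeded(report: dict) -> bool:
    """Read success from results[0].success; the bug fixed in the commit was
    reading a non-existent metrics.success field instead."""
    results = report.get("results") or []
    return bool(results and results[0].get("success"))


if __name__ == "__main__":
    # Real-time logging with timestamps, as described in the commit message.
    logging.basicConfig(format="%(asctime)s %(levelname)s %(message)s",
                        level=logging.INFO)
    args = build_parser().parse_args()
    if args.list_models:
        print(json.dumps(MODEL_PRESETS, indent=2))
    else:
        for preset in args.models or ["openai"]:
            logging.info("Benchmarking with preset %s", preset)
```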

Auto-GPT Benchmarks

Built to benchmark the performance of agents, regardless of how they are implemented.

Objectively know how well your agent is performing in categories like code, retrieval, memory, and safety.

Save time and money through smart dependencies: challenges whose prerequisites fail are skipped automatically. The best part? It's all automated.

Scores:

(screenshot: benchmark scores)

Ranking overall:

Detailed results:

(screenshot: detailed results)

Click here to see the results and the raw data!!

More agents coming soon!