Benchmark improvements:
- Add analyze_failures.py for pattern detection and failure analysis
- Add informative step output: tool name, args, result status, cost
- Add --all and --matrix flags for comprehensive model/strategy testing
- Add --analyze-only and --no-analyze flags for flexible analysis control
- Auto-run failure analysis after benchmarks with markdown export
- Fix directory creation bug in ReportManager (add parents=True; see the first sketch below)

Prompt strategy enhancements:
- Implement full plan_execute, reflexion, rewoo, tree_of_thoughts strategies
- Add PROMPT_STRATEGY env var support for strategy selection (see the second sketch below)
- Add extended thinking support for Anthropic models
- Add reasoning effort support for OpenAI o-series models

LLM provider improvements:
- Add thinking_budget_tokens config for Anthropic extended thinking (see the third sketch below)
- Add reasoning_effort config for OpenAI reasoning models
- Improve error feedback for LLM self-correction

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
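The ReportManager fix is the standard pathlib pattern: `mkdir(exist_ok=True)` alone raises `FileNotFoundError` when intermediate directories are missing, while `parents=True` creates them. A minimal sketch; the class and attribute names are illustrative stand-ins, not the repository's exact code:

```python
from pathlib import Path


class ReportManager:
    """Illustrative stand-in for the benchmark's report writer."""

    def __init__(self, report_dir: str) -> None:
        self.report_dir = Path(report_dir)
        # The bug: mkdir(exist_ok=True) alone fails when intermediate
        # directories are missing. parents=True creates the whole chain.
        self.report_dir.mkdir(parents=True, exist_ok=True)

    def save(self, name: str, content: str) -> Path:
        path = self.report_dir / name
        path.write_text(content)
        return path
```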
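Strategy selection via the PROMPT_STRATEGY env var might look like the following. The strategy names come from the changelog above; the registry layout and class names are assumptions for illustration:

```python
import os


class PlanExecute: ...       # stand-ins for the real strategy classes
class Reflexion: ...
class ReWOO: ...
class TreeOfThoughts: ...


STRATEGIES = {
    "plan_execute": PlanExecute,
    "reflexion": Reflexion,
    "rewoo": ReWOO,
    "tree_of_thoughts": TreeOfThoughts,
}


def load_strategy(default: str = "plan_execute"):
    """Instantiate the strategy named by the PROMPT_STRATEGY env var."""
    name = os.environ.get("PROMPT_STRATEGY", default)
    if name not in STRATEGIES:
        valid = ", ".join(sorted(STRATEGIES))
        raise ValueError(f"Unknown PROMPT_STRATEGY {name!r}; expected one of: {valid}")
    return STRATEGIES[name]()
```

With a registry like this, switching strategies for a run is just `PROMPT_STRATEGY=reflexion` in the environment, no code changes needed.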
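The two provider knobs map onto public SDK parameters: Anthropic's `thinking` block with a token budget, and OpenAI's `reasoning_effort` for o-series models. A hedged sketch of how the configs might be forwarded; the function names, defaults, and model ids are illustrative, and the repo's actual config plumbing will differ:

```python
import anthropic
import openai


def anthropic_chat(prompt: str, thinking_budget_tokens: int | None = None):
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the env
    extra = {}
    if thinking_budget_tokens:
        # Extended thinking reserves an internal reasoning budget; note that
        # max_tokens must exceed the thinking budget.
        extra["thinking"] = {"type": "enabled", "budget_tokens": thinking_budget_tokens}
    return client.messages.create(
        model="claude-sonnet-4-20250514",  # illustrative model id
        max_tokens=8192,
        messages=[{"role": "user", "content": prompt}],
        **extra,
    )


def openai_chat(prompt: str, reasoning_effort: str | None = None):
    client = openai.OpenAI()  # reads OPENAI_API_KEY from the env
    extra = {}
    if reasoning_effort:
        # Accepted values for o-series models: "low", "medium", "high".
        extra["reasoning_effort"] = reasoning_effort
    return client.chat.completions.create(
        model="o3-mini",  # illustrative model id
        messages=[{"role": "user", "content": prompt}],
        **extra,
    )
```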
Auto-GPT Benchmarks
Built to benchmark the performance of agents regardless of how they are implemented.
Objectively measure how well your agent performs in categories such as code, retrieval, memory, and safety.
Save time and money through smart dependencies between tests. The best part? It's all automated.
Scores:
Ranking overall:
Detailed results:
Click here to see the results and the raw data!
More agents coming soon!