Auto-GPT Benchmark
A repo built to benchmark the performance of agents of all kinds, regardless of how they are set up or how they work.
Scores:
Radar chart for each agent coming soon!
Detailed results
⚠️ These results are still changing frequently. We will publish official benchmark results soon.
Interface
| Task | Auto-GPT | gpt-engineer | mini-agi | smol-developer |
|---|---|---|---|---|
| Write File | ❌ | ✅ | tbd | ✅ |
| Read File | ❌ | ❌ | tbd | ❌ |
| Search File | ❌ | ❌ | tbd | ❌ |
Code
| Task | Auto-GPT | gpt-engineer | mini-agi | smol-developer |
|---|---|---|---|---|
| Debug Simple Typo With Guidance | ❌ | ❌ | tbd | ❌ |
| Debug Simple Typo Without Guidance | ❌ | ❌ | tbd | ❌ |
| Basic Code Generation | ❌ | ✅ | tbd | ✅ |
| Create Simple Web Server | ❌ | ❌ | tbd | ❌ |
Memory
| Task | Auto-GPT |
|---|---|
| Basic Memory | ❌ |
| Remember Multiple Ids | ❌ |
| Remember Multiple Ids With Noise | ❌ |
| Remember Multiple Phrases With Noise | ❌ |
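Until the per-agent charts are published, the ✅/❌/tbd cells in the Interface and Code tables can be tallied into rough per-category pass counts. The sketch below is illustrative only. It hard-codes those two tables, and the `TABLES` layout and `tally` function are hypothetical, not part of the benchmark's own tooling. It assumes a "tbd" cell means the task has not been run yet for that agent.

```python
from collections import defaultdict

# Hypothetical tally of the results tables above; not the benchmark's own scoring code.
PASS, FAIL, TBD = "✅", "❌", "tbd"
AGENTS = ["Auto-GPT", "gpt-engineer", "mini-agi", "smol-developer"]

# Rows copied from the Interface and Code tables; cells follow the AGENTS column order.
TABLES = {
    "Interface": {
        "Write File": [FAIL, PASS, TBD, PASS],
        "Read File": [FAIL, FAIL, TBD, FAIL],
        "Search File": [FAIL, FAIL, TBD, FAIL],
    },
    "Code": {
        "Debug Simple Typo With Guidance": [FAIL, FAIL, TBD, FAIL],
        "Debug Simple Typo Without Guidance": [FAIL, FAIL, TBD, FAIL],
        "Basic Code Generation": [FAIL, PASS, TBD, PASS],
        "Create Simple Web Server": [FAIL, FAIL, TBD, FAIL],
    },
}


def tally(tables):
    """Return {category: {agent: (passed, graded)}}, skipping 'tbd' cells."""
    scores = {}
    for category, rows in tables.items():
        passed = defaultdict(int)
        graded = defaultdict(int)
        for cells in rows.values():
            for agent, cell in zip(AGENTS, cells):
                if cell == TBD:
                    continue  # not evaluated yet, so it counts as neither pass nor fail
                graded[agent] += 1
                if cell == PASS:
                    passed[agent] += 1
        scores[category] = {agent: (passed[agent], graded[agent]) for agent in AGENTS}
    return scores


if __name__ == "__main__":
    for category, per_agent in tally(TABLES).items():
        for agent, (p, n) in per_agent.items():
            # Agents with no graded tasks in a category are shown as "tbd".
            result = f"{p}/{n}" if n else "tbd"
            print(f"{category:10} {agent:15} {result}")
```

Counting "tbd" as ungraded rather than failed keeps mini-agi's missing results from being read as failures.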