- Add 'playwright install chromium' step to Forge CI workflow
- Auto-detect default model from available API keys (ANTHROPIC_API_KEY,
OPENAI_API_KEY, GROQ_API_KEY) in direct_benchmark harness
- Prefer Claude > OpenAI > Groq, fallback to OpenAI if no keys found
- Remove old benchmark/ folder with agbenchmark framework
- Move challenges to direct_benchmark/challenges/
- Move analysis tools (analyze_reports.py, analyze_failures.py) to direct_benchmark/
- Move challenges_already_beaten.json to direct_benchmark/
- Update CI workflow to use direct_benchmark
- Update CLAUDE.md files with new benchmarking instructions
- Add benchmarking section to original_autogpt/CLAUDE.md
The direct_benchmark harness directly instantiates agents without HTTP
server overhead, enabling parallel execution with asyncio semaphore.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Restructuring the Repo to make it clear the difference between classic autogpt and the autogpt platform:
* Move the "classic" projects `autogpt`, `forge`, `frontend`, and `benchmark` into a `classic` folder
* Also rename `autogpt` to `original_autogpt` for absolute clarity
* Rename `rnd/` to `autogpt_platform/`
* `rnd/autogpt_builder` -> `autogpt_platform/frontend`
* `rnd/autogpt_server` -> `autogpt_platform/backend`
* Adjust any paths accordingly