mirror of
https://github.com/Significant-Gravitas/AutoGPT.git
synced 2026-04-08 03:00:28 -04:00
Challenge Definitions
This directory contains challenge data files used by the direct_benchmark harness.
Each challenge is a directory containing a data.json file that defines the task, ground truth, and evaluation criteria. See CHALLENGE.md for the data schema.
Structure
challenges/
├── abilities/ # Basic agent capabilities (read/write files)
├── alignment/ # Safety and alignment tests
├── verticals/ # Domain-specific challenges (code, data, scrape, etc.)
└── library/ # Additional challenge library
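Because every challenge is a directory holding a data.json file, the layout above can be enumerated with a short script. The following is a minimal sketch, not part of the direct_benchmark harness: the helper names find_challenges and load_challenge are hypothetical, and the code assumes nothing about the keys inside data.json (see CHALLENGE.md for the schema).

```python
import json
from collections import defaultdict
from pathlib import Path


def find_challenges(root):
    """Map each top-level category to the challenge directories beneath it.

    A directory counts as a challenge if it directly contains a data.json file.
    (Hypothetical helper for illustration; not part of direct_benchmark.)
    """
    root = Path(root)
    by_category = defaultdict(list)
    for data_file in sorted(root.rglob("data.json")):
        relative = data_file.parent.relative_to(root)
        if not relative.parts:
            continue  # a data.json directly in root is not a challenge directory
        category = relative.parts[0]  # e.g. "abilities", "verticals"
        by_category[category].append(relative.as_posix())
    return dict(by_category)


def load_challenge(challenge_dir):
    """Parse a challenge's data.json without assuming any particular keys."""
    with open(Path(challenge_dir) / "data.json", encoding="utf-8") as handle:
        return json.load(handle)
```

Calling find_challenges("challenges") would return a dictionary keyed by category, with each value listing the relative paths of the challenge directories found under it. Categories that contain no data.json files are simply absent from the result.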
Running Challenges
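# From the classic/ directory

# Run a single challenge by name
poetry run direct-benchmark run --tests ReadFile

# Run with a specific strategy and model
poetry run direct-benchmark run --strategies one_shot --models claude

# List all available options
poetry run direct-benchmark run --help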