AutoGPT/classic/direct_benchmark/challenges/README.md
Nicholas Tindle 9cad616950 docs: fix broken paths and outdated references across all docs
- Remove references to deleted classic/benchmark/ (→ direct_benchmark)
- Remove references to deleted classic/frontend/
- Remove references to deleted FORGE-QUICKSTART.md, CLI-USAGE.md
- Update default model names: gpt-3.5-turbo/gpt-4-turbo → gpt-5.4
- Update root README: benchmark section, forge link, CLI section
- Update docs/content/classic/: index, setup, configuration
- Update docs/content/forge/: component config examples
- Update docs/content/challenges/: agbenchmark → direct_benchmark
- Rewrite challenges/README.md for current direct_benchmark usage
- Update .env.template, azure.yaml.template, all CLAUDE.md files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 15:56:45 +02:00

Challenge Definitions

This directory contains challenge data files used by the direct_benchmark harness.

Each challenge is a directory containing a data.json file that defines the task, ground truth, and evaluation criteria. See CHALLENGE.md for the data schema.

Structure

challenges/
├── abilities/          # Basic agent capabilities (read/write files)
├── alignment/          # Safety and alignment tests
├── verticals/          # Domain-specific challenges (code, data, scrape, etc.)
└── library/            # Additional challenge library
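
Because every challenge is a directory holding a data.json file, the set of available challenges can be discovered by walking this tree. The following is a minimal sketch, not part of the direct_benchmark harness: the function name find_challenges is hypothetical, and it assumes only that each data.json parses as JSON. It does not assume any particular keys; the schema is described in CHALLENGE.md.

```python
import json
from pathlib import Path


def find_challenges(root: Path) -> dict[str, object]:
    """Map each challenge directory (relative to root) to its parsed data.json.

    Hypothetical helper for illustration; not part of direct_benchmark.
    """
    challenges = {}
    # rglob descends through the category directories (abilities/, alignment/, ...).
    for data_file in sorted(root.rglob("data.json")):
        name = data_file.parent.relative_to(root).as_posix()
        with data_file.open(encoding="utf-8") as f:
            challenges[name] = json.load(f)
    return challenges
```

Calling find_challenges(Path("challenges")) from the directory that contains challenges/ would return one entry per challenge, keyed by its path below the root.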

Running Challenges

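# From the classic/ directory

# Run only the ReadFile challenge
poetry run direct-benchmark run --tests ReadFile
# Run with the one_shot strategy and the claude model
poetry run direct-benchmark run --strategies one_shot --models claude
# Show all available options
poetry run direct-benchmark run --help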
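
To script several runs, the commands above can be wrapped in a small driver. This is a sketch under stated assumptions: build_command and run_challenges are hypothetical names, only the run subcommand and the --tests flag shown above are used, and one process is started per challenge because this README does not say whether --tests accepts several names. What the exit code signifies is also not documented here, so the sketch only records it.

```python
import subprocess
from pathlib import Path


def build_command(test_name: str) -> list[str]:
    """Return the argv for running one challenge (hypothetical helper)."""
    return ["poetry", "run", "direct-benchmark", "run", "--tests", test_name]


def run_challenges(test_names: list[str], classic_dir: Path) -> dict[str, int]:
    """Run each challenge in turn and map its name to the process exit code."""
    results = {}
    for name in test_names:
        # The commands are meant to be run from the classic/ directory.
        completed = subprocess.run(build_command(name), cwd=classic_dir)
        results[name] = completed.returncode
    return results
```

For example, run_challenges(["ReadFile"], Path("classic")) would start the same process as the first command above, assuming the repository root is the current directory.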