
This is the official challenge library for https://github.com/Significant-Gravitas/Auto-GPT-Benchmarks

The goal of this repo is to make challenge creation easy for test-driven development with the Auto-GPT-Benchmarks package. It is essentially a library for crafting challenges using a DSL (JSON files, in this case).
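To give a feel for the JSON DSL, a challenge definition might look like the sketch below. The field names and values here are illustrative assumptions, not the package's authoritative schema; consult the default challenges shipped with agbenchmark for the exact format.

```json
{
  "name": "TestWriteFile",
  "category": ["interface"],
  "task": "Write the word 'Washington' to a .txt file.",
  "dependencies": [],
  "cutoff": 60,
  "ground": {
    "answer": "The word 'Washington' appears in a .txt file.",
    "should_contain": ["Washington"],
    "should_not_contain": [],
    "files": [".txt"],
    "eval": { "type": "file" }
  }
}
```

Each challenge pairs a natural-language task with machine-checkable ground truth, so the benchmark can grade agent output without manual review.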

This is the up-to-date dependency graph: https://sapphire-denys-23.tiiny.site/

How to use

Make sure you have the package installed with pip install agbenchmark.

If you just want the default challenges, you don't need this repo: installing the package gives you access to them.

To add new challenges as you develop, add this repo as a submodule in your project/agbenchmark folder. Any new challenges you add within the submodule will be registered automatically.
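The submodule setup above might be done as follows. This is a sketch: the repo URL and the `agbenchmark/challenges` destination path are assumptions, so substitute your project's actual layout and this repo's clone URL.

```shell
# From the root of your agent project (paths and URL are assumptions):
# 1. Register this challenges repo as a submodule inside agbenchmark/
git submodule add https://github.com/Significant-Gravitas/Auto-GPT-Benchmarks.git agbenchmark/challenges
# 2. Fetch the submodule's contents (and any nested submodules)
git submodule update --init --recursive
```

Collaborators cloning your project afterwards can pull the challenges in one step with `git clone --recurse-submodules`.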