AutoGPT

mirror of https://github.com/Significant-Gravitas/AutoGPT.git synced 2026-04-30 03:00:41 -04:00

Author	SHA1	Message	Date
Nicholas Tindle	4eeb6ee2b0	feat(direct_benchmark): add CI mode for non-interactive environments Add --ci flag that disables Rich Live display while preserving completion blocks. Auto-detects CI environment via CI env var or non-TTY stdout. Prints progress every 10 completions for visibility. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-19 23:21:10 -06:00
Nicholas Tindle	7db962b9f9	feat(direct_benchmark): dynamic column layout up to 10 wide - Calculate max columns based on terminal width (up to 10) - Reduced panel width from 35 to 30 chars to fit more - Wider terminals can now show more parallel runs side-by-side Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-19 23:15:16 -06:00
Nicholas Tindle	9108b21541	fix(direct_benchmark): parallel execution and always show completion blocks Fixes: - Use run_key (config:challenge) instead of just config_name for tracking active runs - allows multiple challenges from same config to run in parallel - Add asyncio.sleep(0) yields to let multiple tasks acquire semaphore and start before any proceed with work - Always print completion blocks (not just failures) for visibility This should properly show 8/8 active runs when running with --parallel 8. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-19 23:13:56 -06:00
Nicholas Tindle	ffe9325296	feat(direct_benchmark): multi-panel UI with copy-paste completion blocks UI improvements: - Multi-column layout: each active config gets its own panel showing challenge name and step history (last 6 steps with status) - Copy-paste completion blocks: when a challenge finishes (especially failures), prints a detailed block with all steps for easy debugging - Configurable logging: suppresses noisy LLM provider warnings unless --debug flag is set - Pass debug flag through harness to UI Example active runs panel: ┌─ one_shot/claude ─┬─ rewoo/claude ────┐ │ ReadFile │ WriteFile │ │ ✓ #1 read_file │ ✓ #1 think │ │ ✓ #2 write_file │ ✓ #2 plan │ │ ● step 3: ... │ ● step 3: ... │ └───────────────────┴───────────────────┘ Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-19 23:10:34 -06:00
Nicholas Tindle	0a616d9267	feat(direct_benchmark): add step-level logging with colored prefixes - Add step callback to AgentRunner for real-time step logging - BenchmarkUI now shows: - Active runs with current step info - Recent steps panel with colored config prefixes - Proper Live display refresh (implements __rich_console__) - Each config gets a distinct color for easy identification - Verbose mode prints step logs immediately with config prefix - Fix Live display not updating (pass UI object, not rendered content) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-19 23:02:20 -06:00
Nicholas Tindle	804430e243	refactor(classic): migrate from agbenchmark to direct_benchmark harness - Remove old benchmark/ folder with agbenchmark framework - Move challenges to direct_benchmark/challenges/ - Move analysis tools (analyze_reports.py, analyze_failures.py) to direct_benchmark/ - Move challenges_already_beaten.json to direct_benchmark/ - Update CI workflow to use direct_benchmark - Update CLAUDE.md files with new benchmarking instructions - Add benchmarking section to original_autogpt/CLAUDE.md The direct_benchmark harness directly instantiates agents without HTTP server overhead, enabling parallel execution with asyncio semaphore. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-19 22:29:51 -06:00

6 Commits