## Summary
This PR modernizes AutoGPT Classic to make it more useful for day-to-day
autonomous agent development. Major changes include consolidating the
project structure, adding new prompt strategies, modernizing the
benchmark system, and improving the development experience.
**Note: AutoGPT Classic is an experimental, unsupported project
preserved for educational/historical purposes. Dependencies will not be
actively updated.**
## Changes 🏗️
### Project Structure & Build System
- **Consolidated Poetry projects** - Merged `forge/`,
`original_autogpt/`, and benchmark packages into a single
`pyproject.toml` at `classic/` root
- **Removed old benchmark infrastructure** - Deleted the complex
`agbenchmark` package (3000+ lines) in favor of the new
`direct_benchmark` harness
- **Removed frontend** - Deleted `benchmark/frontend/` React app (no
longer needed)
- **Cleaned up CI workflows** - Simplified GitHub Actions workflows for
the consolidated project structure
- **Added CLAUDE.md** - Documentation for working with the codebase
using Claude Code
### New Direct Benchmark System
- **`direct_benchmark` harness** - New streamlined benchmark runner
with:
- Rich TUI with multi-panel layout showing parallel test execution
- Incremental resume and selective reset capabilities
- CI mode for non-interactive environments
- Step-level logging with colored prefixes
- "Would have passed" tracking for timed-out challenges
- Copy-paste completion blocks for sharing results
### Multiple Prompt Strategies
Added pluggable prompt strategy system supporting:
- **one_shot** - Single-prompt completion
- **plan_execute** - Plan first, then execute steps
- **rewoo** - Reasoning without observation (deferred tool execution)
- **react** - Reason + Act iterative loop
- **lats** - Language Agent Tree Search (MCTS-based exploration)
- **sub_agent** - Multi-agent delegation architecture
- **debate** - Multi-agent debate for consensus
### LLM Provider Improvements
- Added support for modern **Anthropic Claude models**
(claude-3.5-sonnet, claude-3-haiku, etc.)
- Added **Groq** provider support
- Improved tool call error feedback for LLM self-correction
- Fixed deprecated API usage
### Web Components
- **Replaced Selenium with Playwright** for web browsing (better async
support, faster)
- Added **lightweight web fetch component** for simple URL fetching
- **Modernized web search** with tiered provider system (Tavily, Serper,
Google)
### Agent Capabilities
- **Workspace permissions system** - Pattern-based allow/deny lists for
agent commands
- **Rich interactive selector** for command approval with scopes
(once/agent/workspace/deny)
- **TodoComponent** with LLM-powered task decomposition
- **Platform blocks integration** - Connect to AutoGPT Platform API for
additional blocks
- **Sub-agent architecture** - Agents can spawn and coordinate
sub-agents
### Developer Experience
- **Python 3.12+ support** with CI testing on 3.12, 3.13, 3.14
- **Current working directory as default workspace** - Run `autogpt`
from any project directory
- Simplified log format (removed timestamps)
- Improved configuration and setup flow
- External benchmark adapters for GAIA, SWE-bench, and AgentBench
### Bug Fixes
- Fixed N/A command loop when using native tool calling
- Fixed auto-advance plan steps in Plan-Execute strategy
- Fixed approve+feedback to execute command then send feedback
- Fixed parallel tool calls in action history
- Always recreate Docker containers for code execution
- Various pyright type errors resolved
- Linting and formatting issues fixed across codebase
## Test Plan
- [x] CI lint, type, and test checks pass
- [x] Run `poetry install` from `classic/` directory
- [x] Run `poetry run autogpt` and verify CLI starts
- [x] Run `poetry run direct-benchmark run --tests ReadFile` to verify
benchmark works
## Notes
- This is a WIP PR for personal use improvements
- The project is marked as **unsupported** - no active maintenance
planned
- Contains known vulnerabilities in dependencies (intentionally not
updated)
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> **Medium Risk**
> CI/build workflows are substantially reworked (runner matrix removal,
path/layout changes, new benchmark runner), so breakage is most likely
in automation and packaging rather than runtime behavior.
>
> **Overview**
> **Modernizes the `classic/` project layout and automation around a
single consolidated Poetry project** (root
`classic/pyproject.toml`/`poetry.lock`) and updates docs
(`classic/README.md`, new `classic/CLAUDE.md`) accordingly.
>
> **Replaces the old `agbenchmark` CI usage with `direct-benchmark` in
GitHub Actions**, including new/updated benchmark smoke and regression
workflows, standardized `working-directory: classic`, and a move to
**Python 3.12** on Ubuntu-only runners (plus updated caching, coverage
flags, and required `ANTHROPIC_API_KEY` wiring).
>
> Cleans up repo/dev tooling by removing the classic frontend workflow,
deleting the Forge VCR cassette submodule (`.gitmodules`) and associated
CI steps, consolidating `flake8`/`isort`/`pyright` pre-commit hooks to
run from `classic/`, updating ignores for new report/workspace
artifacts, and updating `classic/Dockerfile.autogpt` to build from
Python 3.12 with the consolidated project structure.
>
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
de67834dac. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
We have been submoduling Supabase for provisioning local Supabase
instances using docker-compose. Aside from the huge size of unrelated
code being pulled, there is also the risk of pulling unintentional
breaking change from the upstream to the platform.
The latest Supabase changes hide the 5432 port from the supabase-db
container and shift it to the supavisor, the instance that we are
currently not using. This causes an error in the existing setup.
## BREAKING CHANGES
This change will introduce different volume locations for the database
content, pulling this change will make the data content fresh from the
start. To keep your old data with this change, execute this command:
```
cp -r supabase/docker/volumes/db/data db/docker/volumes/db/data
```
### Changes 🏗️
The scope of this PR is snapshotting the current docker-compose code
obtained from the Supabase repository and embedding it into our
repository. This will eliminate the need for submodule / recursive
cloning and bringing the entire Supabase repository into the platform.
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
<!-- Put your test plan here: -->
- [x] Existing CI
Restructuring the Repo to make it clear the difference between classic autogpt and the autogpt platform:
* Move the "classic" projects `autogpt`, `forge`, `frontend`, and `benchmark` into a `classic` folder
* Also rename `autogpt` to `original_autogpt` for absolute clarity
* Rename `rnd/` to `autogpt_platform/`
* `rnd/autogpt_builder` -> `autogpt_platform/frontend`
* `rnd/autogpt_server` -> `autogpt_platform/backend`
* Adjust any paths accordingly
- Moved `autogpt` and `forge` to project root
- Removed `autogpts` directory
- Moved and renamed submodule `autogpts/autogpt/tests/vcr_cassettes` to `autogpt/tests/vcr_cassettes`
- When using CLI agents will be created in `agents` directory (instead of `autogpts`)
- Renamed relevant docs, code and config references from `autogpts/[forge|autogpt]` to `[forge|autogpt]` and from `*../../*` to `*../*`
- Updated `CODEOWNERS`, GitHub Actions and Docker `*.yml` configs
- Updated symbolic links in `docs`
* gfeat: specify directory of cassettes and automatically load them depending on module
fix: formatting for linter
test: commit newly generated cassettes to their respective folder
tests: update latest fixtures with master
fix: update .gitattributes with updated path to cassettes
fix: use cassettes from master instead of generating them myself
fix: update path in .gitattributes
fix: make sure to match default functionality by using test name for cassette directory
fix: actually add git submodule
ci: checkout git submodules in CI
ci: update git submodules separately to ensure it gets called
feat: add a hooks directory so we can update git submodules on post-checkout
feat: make sure we push the tests/cassettes submodule on merge into master
ci: remove unused code now that we are using git submodules to keep cassettes in sync
fix: simplify how we load the submodule and fix updating cassettes on merge to master
chore: remove echo of checkout hook, it's unneeded
ci: remove unneccesary step
* cassettes submodule
* cassettes submodule
* cassettes submodule
* cassettes submodule
* cassettes submodule
---------
Co-authored-by: Stefan Ayala <stefanayala3266@gmail.com>