AutoGPT/.github/workflows/classic-autogpt-docker-ci.yml
Nicholas Tindle e33b1e2105 feat(classic): update classic autogpt a bit to make it more useful for my day to day (#11797)
## Summary

This PR modernizes AutoGPT Classic to make it more useful for day-to-day
autonomous agent development. Major changes include consolidating the
project structure, adding new prompt strategies, modernizing the
benchmark system, and improving the development experience.

**Note: AutoGPT Classic is an experimental, unsupported project
preserved for educational/historical purposes. Dependencies will not be
actively updated.**

## Changes 🏗️

### Project Structure & Build System
- **Consolidated Poetry projects** - Merged `forge/`,
`original_autogpt/`, and benchmark packages into a single
`pyproject.toml` at `classic/` root
- **Removed old benchmark infrastructure** - Deleted the complex
`agbenchmark` package (3000+ lines) in favor of the new
`direct_benchmark` harness
- **Removed frontend** - Deleted `benchmark/frontend/` React app (no
longer needed)
- **Cleaned up CI workflows** - Simplified GitHub Actions workflows for
the consolidated project structure
- **Added CLAUDE.md** - Documentation for working with the codebase
using Claude Code

### New Direct Benchmark System
- **`direct_benchmark` harness** - New streamlined benchmark runner
with:
  - Rich TUI with multi-panel layout showing parallel test execution
  - Incremental resume and selective reset capabilities
  - CI mode for non-interactive environments
  - Step-level logging with colored prefixes
  - "Would have passed" tracking for timed-out challenges
  - Copy-paste completion blocks for sharing results

### Multiple Prompt Strategies
Added a pluggable prompt strategy system supporting:
- **one_shot** - Single-prompt completion
- **plan_execute** - Plan first, then execute steps
- **rewoo** - Reasoning without observation (deferred tool execution)
- **react** - Reason + Act iterative loop
- **lats** - Language Agent Tree Search (MCTS-based exploration)
- **sub_agent** - Multi-agent delegation architecture
- **debate** - Multi-agent debate for consensus
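
As a rough illustration of what "pluggable" could mean here, the sketch below keeps a name-to-builder registry and selects a strategy by name at run time. It is not the code in `classic/`: `STRATEGIES`, `register`, `build_prompt`, and the prompt texts are hypothetical names chosen for this example, and only two of the strategies listed above are stubbed in.

```python
from typing import Callable, Dict

# Hypothetical registry for illustration only; not the classic/ API.
PromptBuilder = Callable[[str], str]
STRATEGIES: Dict[str, PromptBuilder] = {}


def register(name: str) -> Callable[[PromptBuilder], PromptBuilder]:
    """Return a decorator that stores a prompt builder under `name`."""

    def decorator(fn: PromptBuilder) -> PromptBuilder:
        STRATEGIES[name] = fn
        return fn

    return decorator


@register("one_shot")
def one_shot(task: str) -> str:
    # Single-prompt completion: ask for the whole answer at once.
    return f"Complete the task in a single response.\nTask: {task}"


@register("plan_execute")
def plan_execute(task: str) -> str:
    # Plan first, then execute the steps of that plan.
    return f"Write a numbered plan, then execute it step by step.\nTask: {task}"


def build_prompt(strategy: str, task: str) -> str:
    """Look up the named strategy and build its prompt for `task`."""
    try:
        builder = STRATEGIES[strategy]
    except KeyError:
        known = ", ".join(sorted(STRATEGIES))
        raise ValueError(f"unknown strategy {strategy!r}; known: {known}") from None
    return builder(task)
```

With this shape, adding a strategy means registering one more function; callers only ever pass the strategy name, for example `build_prompt("plan_execute", "summarize README.md")`.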

### LLM Provider Improvements
- Added support for modern **Anthropic Claude models**
(claude-3.5-sonnet, claude-3-haiku, etc.)
- Added **Groq** provider support
- Improved tool call error feedback for LLM self-correction
- Fixed deprecated API usage

### Web Components
- **Replaced Selenium with Playwright** for web browsing (better async
support, faster)
- Added **lightweight web fetch component** for simple URL fetching
- **Modernized web search** with tiered provider system (Tavily, Serper,
Google)
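
One plausible reading of "tiered" is an ordered fallback: try the preferred provider first and move down the list when a provider is unconfigured or fails. The sketch below assumes that reading; `tiered_search` and the `(name, callable)` provider shape are hypothetical and do not reflect the real Tavily, Serper, or Google client APIs.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical provider signature: takes a query, returns result strings.
SearchFn = Callable[[str], List[str]]


def tiered_search(
    query: str,
    providers: Sequence[Tuple[str, Optional[SearchFn]]],
) -> Tuple[str, List[str]]:
    """Return (provider_name, results) from the first provider that succeeds.

    Providers are tried in the order given. A provider whose callable is
    None is treated as unconfigured (for example, no API key) and skipped.
    """
    errors: List[str] = []
    for name, search in providers:
        if search is None:
            continue
        try:
            return name, search(query)
        except Exception as exc:  # fall through to the next tier
            errors.append(f"{name}: {exc}")
    raise RuntimeError("no search provider succeeded: " + "; ".join(errors))
```

Returning the provider name alongside the results makes it easy to log which tier actually served a request.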

### Agent Capabilities
- **Workspace permissions system** - Pattern-based allow/deny lists for
agent commands
- **Rich interactive selector** for command approval with scopes
(once/agent/workspace/deny)
- **TodoComponent** with LLM-powered task decomposition
- **Platform blocks integration** - Connect to AutoGPT Platform API for
additional blocks
- **Sub-agent architecture** - Agents can spawn and coordinate
sub-agents
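
To make the pattern-based allow/deny idea from the workspace permissions item concrete, here is a minimal sketch using shell-style globs from the standard library. The `command:argument` string format, the `is_allowed` helper, and the deny-wins precedence are assumptions made for this example, not the documented behavior of the permissions system.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def is_allowed(command: str, allow: Iterable[str], deny: Iterable[str]) -> bool:
    """Check a command string against glob-style allow and deny lists.

    Assumed precedence: any matching deny pattern rejects the command;
    otherwise at least one allow pattern must match.
    """
    if any(fnmatchcase(command, pattern) for pattern in deny):
        return False
    return any(fnmatchcase(command, pattern) for pattern in allow)


# Hypothetical rules: reading files is fine, except anything ending in .env.
ALLOW = ["read_file:*"]
DENY = ["read_file:*.env"]
```

Under these rules `is_allowed("read_file:docs/readme.md", ALLOW, DENY)` is accepted, while `read_file:.env` and any command with no matching allow pattern are rejected.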

### Developer Experience
- **Python 3.12+ support** with CI testing on 3.12, 3.13, 3.14
- **Current working directory as default workspace** - Run `autogpt`
from any project directory
- Simplified log format (removed timestamps)
- Improved configuration and setup flow
- External benchmark adapters for GAIA, SWE-bench, and AgentBench

### Bug Fixes
- Fixed N/A command loop when using native tool calling
- Fixed auto-advancing of plan steps in the Plan-Execute strategy
- Fixed approve+feedback so the command is executed before the feedback is sent
- Fixed parallel tool calls in action history
- Always recreate Docker containers for code execution
- Resolved various pyright type errors
- Fixed linting and formatting issues across the codebase

## Test Plan

- [x] CI lint, type, and test checks pass
- [x] Run `poetry install` from `classic/` directory
- [x] Run `poetry run autogpt` and verify CLI starts
- [x] Run `poetry run direct-benchmark run --tests ReadFile` to verify
benchmark works

## Notes

- This is a WIP PR containing improvements for personal use
- The project is marked as **unsupported** - no active maintenance
planned
- Contains known vulnerabilities in dependencies (intentionally not
updated)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> CI/build workflows are substantially reworked (runner matrix removal,
path/layout changes, new benchmark runner), so breakage is most likely
in automation and packaging rather than runtime behavior.
> 
> **Overview**
> **Modernizes the `classic/` project layout and automation around a
single consolidated Poetry project** (root
`classic/pyproject.toml`/`poetry.lock`) and updates docs
(`classic/README.md`, new `classic/CLAUDE.md`) accordingly.
> 
> **Replaces the old `agbenchmark` CI usage with `direct-benchmark` in
GitHub Actions**, including new/updated benchmark smoke and regression
workflows, standardized `working-directory: classic`, and a move to
**Python 3.12** on Ubuntu-only runners (plus updated caching, coverage
flags, and required `ANTHROPIC_API_KEY` wiring).
> 
> Cleans up repo/dev tooling by removing the classic frontend workflow,
deleting the Forge VCR cassette submodule (`.gitmodules`) and associated
CI steps, consolidating `flake8`/`isort`/`pyright` pre-commit hooks to
run from `classic/`, updating ignores for new report/workspace
artifacts, and updating `classic/Dockerfile.autogpt` to build from
Python 3.12 with the consolidated project structure.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
de67834dac. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
2026-04-03 07:16:36 +00:00

name: Classic - AutoGPT Docker CI
on:
  push:
    branches: [master, dev]
    paths:
      - '.github/workflows/classic-autogpt-docker-ci.yml'
      - 'classic/original_autogpt/**'
      - 'classic/forge/**'
  pull_request:
    branches: [ master, dev, release-* ]
    paths:
      - '.github/workflows/classic-autogpt-docker-ci.yml'
      - 'classic/original_autogpt/**'
      - 'classic/forge/**'
concurrency:
  group: ${{ format('classic-autogpt-docker-ci-{0}', github.head_ref && format('pr-{0}', github.event.pull_request.number) || github.sha) }}
  cancel-in-progress: ${{ github.event_name == 'pull_request' }}
defaults:
  run:
    working-directory: classic/original_autogpt
env:
  IMAGE_NAME: auto-gpt
  DEPLOY_IMAGE_NAME: ${{ secrets.DOCKER_USER && format('{0}/', secrets.DOCKER_USER) || '' }}auto-gpt
  DEV_IMAGE_TAG: latest-dev
jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        build-type: [release, dev]
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - if: runner.debug
        run: |
          ls -al
          du -hs *
      - id: build
        name: Build image
        uses: docker/build-push-action@v6
        with:
          context: classic/
          file: classic/Dockerfile.autogpt
          build-args: BUILD_TYPE=${{ matrix.build-type }}
          tags: ${{ env.IMAGE_NAME }}
          labels: GIT_REVISION=${{ github.sha }}
          load: true # save to docker images
          # cache layers in GitHub Actions cache to speed up builds
          cache-from: type=gha,scope=autogpt-docker-${{ matrix.build-type }}
          cache-to: type=gha,scope=autogpt-docker-${{ matrix.build-type }},mode=max
      - name: Generate build report
        env:
          event_name: ${{ github.event_name }}
          event_ref: ${{ github.event.ref }}
          event_ref_type: ${{ github.event.ref }}
          build_type: ${{ matrix.build-type }}
          prod_branch: master
          dev_branch: dev
          repository: ${{ github.repository }}
          base_branch: ${{ github.ref_name != 'master' && github.ref_name != 'dev' && 'dev' || 'master' }}
          current_ref: ${{ github.ref_name }}
          commit_hash: ${{ github.event.after }}
          source_url: ${{ format('{0}/tree/{1}', github.event.repository.url, github.event.release && github.event.release.tag_name || github.sha) }}
          push_forced_label: ${{ github.event.forced && '☢️ forced' || '' }}
          new_commits_json: ${{ toJSON(github.event.commits) }}
          compare_url_template: ${{ format('/{0}/compare/{{base}}...{{head}}', github.repository) }}
          github_context_json: ${{ toJSON(github) }}
          job_env_json: ${{ toJSON(env) }}
          vars_json: ${{ toJSON(vars) }}
        run: .github/workflows/scripts/docker-ci-summary.sh >> $GITHUB_STEP_SUMMARY
        continue-on-error: true
  test:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    services:
      minio:
        image: minio/minio:edge-cicd
        options: >
          --name=minio
          --health-interval=10s --health-timeout=5s --health-retries=3
          --health-cmd="curl -f http://localhost:9000/minio/health/live"
    steps:
      - name: Check out repository
        uses: actions/checkout@v4
        with:
          submodules: true
      - if: github.event_name == 'push'
        name: Log in to Docker hub
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKER_USER }}
          password: ${{ secrets.DOCKER_PASSWORD }}
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - id: build
        name: Build image
        uses: docker/build-push-action@v6
        with:
          context: classic/
          file: classic/Dockerfile.autogpt
          build-args: BUILD_TYPE=dev # include pytest
          tags: >
            ${{ env.IMAGE_NAME }},
            ${{ env.DEPLOY_IMAGE_NAME }}:${{ env.DEV_IMAGE_TAG }}
          labels: GIT_REVISION=${{ github.sha }}
          load: true # save to docker images
          # cache layers in GitHub Actions cache to speed up builds
          cache-from: type=gha,scope=autogpt-docker-dev
          cache-to: type=gha,scope=autogpt-docker-dev,mode=max
      - id: test
        name: Run tests
        env:
          CI: true
          PLAIN_OUTPUT: True
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          S3_ENDPOINT_URL: http://minio:9000
          AWS_ACCESS_KEY_ID: minioadmin
          AWS_SECRET_ACCESS_KEY: minioadmin
        run: |
          set +e
          docker run --env CI --env OPENAI_API_KEY \
            --network container:minio \
            --env S3_ENDPOINT_URL --env AWS_ACCESS_KEY_ID --env AWS_SECRET_ACCESS_KEY \
            --entrypoint poetry ${{ env.IMAGE_NAME }} run \
            pytest -v --cov=autogpt --cov-branch --cov-report term-missing \
            --numprocesses=4 --durations=10 \
            original_autogpt/tests/unit original_autogpt/tests/integration 2>&1 | tee test_output.txt
          test_failure=${PIPESTATUS[0]}
          cat << $EOF >> $GITHUB_STEP_SUMMARY
          # Tests $([ $test_failure = 0 ] && echo '✅' || echo '❌')
          \`\`\`
          $(cat test_output.txt)
          \`\`\`
          $EOF
          exit $test_failure
      - if: github.event_name == 'push' && github.ref_name == 'master'
        name: Push image to Docker Hub
        run: docker push ${{ env.DEPLOY_IMAGE_NAME }}:${{ env.DEV_IMAGE_TAG }}