### Why / What / How
**Why:** We had no local pre-commit protection against accidentally
committing secrets. The existing `detect-secrets` hook only ran on
`pre-push`, which is too late — secrets are already in git history by
that point. GitHub's push protection only covers known provider patterns
and runs server-side.
**What:** Adds a 3-layer defense against secret leaks: two local pre-commit
hooks (gitleaks and detect-secrets) plus a CI workflow as a safety net.
**How:**
- Moved `detect-secrets` from `pre-push` to `pre-commit` stage
- Added `gitleaks` as a second pre-commit hook (Go binary, faster and
more comprehensive rule set)
- Added `.gitleaks.toml` config with allowlists for known false
positives (test fixtures, dev docker JWTs, Firebase public keys, lock
files, docs examples)
- Added `repo-secret-scan.yml` CI workflow using `gitleaks-action` on
PRs/pushes to master/dev
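
The scan-plus-allowlist idea behind the `.gitleaks.toml` config can be sketched in a few lines of Python. This is only an illustration and not how gitleaks or detect-secrets are implemented: the single rule, the allowlisted path patterns, and the function name `scan_text` are all hypothetical.

```python
import fnmatch
import re

# Hypothetical rule: AWS-style access key IDs ("AKIA" + 16 uppercase alphanumerics).
SECRET_PATTERN = re.compile(r"\bAKIA[0-9A-Z]{16}\b")

# Hypothetical allowlist of paths whose findings are treated as false positives.
ALLOWED_PATHS = ["tests/fixtures/*", "*.lock", "docs/*"]


def scan_text(path: str, text: str) -> list[tuple[int, str]]:
    """Return (line_number, match) pairs, skipping allowlisted paths."""
    if any(fnmatch.fnmatch(path, pattern) for pattern in ALLOWED_PATHS):
        return []
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in SECRET_PATTERN.finditer(line):
            findings.append((line_number, match.group(0)))
    return findings
```

Running the same check both locally and in CI is what gives the layering: a finding that slips past a skipped local hook would still be reported on the PR.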
### Changes 🏗️
- `.pre-commit-config.yaml`: Moved `detect-secrets` to pre-commit stage,
added baseline arg, added `gitleaks` hook
- `.gitleaks.toml`: New config with tuned allowlists for this repo's
false positives
- `.secrets.baseline`: Empty baseline for detect-secrets to track known
findings
- `.github/workflows/repo-secret-scan.yml`: New CI workflow running
gitleaks on every PR and push
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Ran `gitleaks detect --no-git` against the full repo; only `.env`
    files (gitignored) remain as findings
  - [x] Verified gitleaks catches a test secret file correctly
  - [x] Pre-commit hooks pass on commit (both detect-secrets and gitleaks
    passed)
#### For configuration changes:
- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)
## Summary
This PR modernizes AutoGPT Classic to make it more useful for day-to-day
autonomous agent development. Major changes include consolidating the
project structure, adding new prompt strategies, modernizing the
benchmark system, and improving the development experience.
**Note: AutoGPT Classic is an experimental, unsupported project
preserved for educational/historical purposes. Dependencies will not be
actively updated.**
## Changes 🏗️
### Project Structure & Build System
- **Consolidated Poetry projects** - Merged `forge/`,
`original_autogpt/`, and benchmark packages into a single
`pyproject.toml` at `classic/` root
- **Removed old benchmark infrastructure** - Deleted the complex
`agbenchmark` package (3000+ lines) in favor of the new
`direct_benchmark` harness
- **Removed frontend** - Deleted `benchmark/frontend/` React app (no
longer needed)
- **Cleaned up CI workflows** - Simplified GitHub Actions workflows for
the consolidated project structure
- **Added CLAUDE.md** - Documentation for working with the codebase
using Claude Code
### New Direct Benchmark System
- **`direct_benchmark` harness** - New streamlined benchmark runner
with:
  - Rich TUI with multi-panel layout showing parallel test execution
  - Incremental resume and selective reset capabilities
  - CI mode for non-interactive environments
  - Step-level logging with colored prefixes
  - "Would have passed" tracking for timed-out challenges
  - Copy-paste completion blocks for sharing results
### Multiple Prompt Strategies
Added pluggable prompt strategy system supporting:
- **one_shot** - Single-prompt completion
- **plan_execute** - Plan first, then execute steps
- **rewoo** - Reasoning without observation (deferred tool execution)
- **react** - Reason + Act iterative loop
- **lats** - Language Agent Tree Search (MCTS-based exploration)
- **sub_agent** - Multi-agent delegation architecture
- **debate** - Multi-agent debate for consensus
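
A pluggable strategy system like the one listed above is often built around a name-to-implementation registry. The following is a minimal sketch under that assumption; the `register` decorator, the `STRATEGIES` dict, and the prompt wording are hypothetical and are not taken from the AutoGPT Classic code.

```python
from typing import Callable

# Hypothetical registry mapping a strategy name to a prompt-building function.
PromptBuilder = Callable[[str], str]
STRATEGIES: dict[str, PromptBuilder] = {}


def register(name: str) -> Callable[[PromptBuilder], PromptBuilder]:
    """Decorator that registers a prompt builder under the given name."""
    def decorator(builder: PromptBuilder) -> PromptBuilder:
        STRATEGIES[name] = builder
        return builder
    return decorator


@register("one_shot")
def one_shot(task: str) -> str:
    return f"Complete the task in a single response: {task}"


@register("plan_execute")
def plan_execute(task: str) -> str:
    return f"First write a numbered plan, then carry out each step: {task}"


def build_prompt(strategy: str, task: str) -> str:
    """Look up the strategy by name and build the prompt for the task."""
    try:
        return STRATEGIES[strategy](task)
    except KeyError:
        raise ValueError(f"Unknown prompt strategy: {strategy!r}") from None
```

With this shape, adding a strategy means adding one decorated function, and the caller selects it by name.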
### LLM Provider Improvements
- Added support for modern **Anthropic Claude models**
(claude-3.5-sonnet, claude-3-haiku, etc.)
- Added **Groq** provider support
- Improved tool call error feedback for LLM self-correction
- Fixed deprecated API usage
### Web Components
- **Replaced Selenium with Playwright** for web browsing (better async
support, faster)
- Added **lightweight web fetch component** for simple URL fetching
- **Modernized web search** with tiered provider system (Tavily, Serper,
Google)
### Agent Capabilities
- **Workspace permissions system** - Pattern-based allow/deny lists for
agent commands
- **Rich interactive selector** for command approval with scopes
(once/agent/workspace/deny)
- **TodoComponent** with LLM-powered task decomposition
- **Platform blocks integration** - Connect to AutoGPT Platform API for
additional blocks
- **Sub-agent architecture** - Agents can spawn and coordinate
sub-agents
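
As an illustration of pattern-based allow/deny lists, here is a small sketch that uses glob patterns and gives deny rules precedence. The class name, the `"ask"` fallback, and the command string format are assumptions made for the example, not the actual permissions implementation.

```python
import fnmatch
from dataclasses import dataclass, field


@dataclass
class CommandPermissions:
    """Hypothetical allow/deny lists of glob patterns; deny takes precedence."""
    allow: list[str] = field(default_factory=list)
    deny: list[str] = field(default_factory=list)

    def check(self, command: str) -> str:
        """Return "deny", "allow", or "ask" (no rule matched; prompt the user)."""
        if any(fnmatch.fnmatchcase(command, p) for p in self.deny):
            return "deny"
        if any(fnmatch.fnmatchcase(command, p) for p in self.allow):
            return "allow"
        return "ask"
```

The `"ask"` result is where an interactive approval prompt would fit in: the user's choice could then be stored as a new allow or deny pattern.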
### Developer Experience
- **Python 3.12+ support** with CI testing on 3.12, 3.13, 3.14
- **Current working directory as default workspace** - Run `autogpt`
from any project directory
- Simplified log format (removed timestamps)
- Improved configuration and setup flow
- External benchmark adapters for GAIA, SWE-bench, and AgentBench
### Bug Fixes
- Fixed N/A command loop when using native tool calling
- Fixed auto-advance plan steps in Plan-Execute strategy
- Fixed approve+feedback to execute command then send feedback
- Fixed parallel tool calls in action history
- Always recreate Docker containers for code execution
- Various pyright type errors resolved
- Linting and formatting issues fixed across codebase
## Test Plan
- [x] CI lint, type, and test checks pass
- [x] Run `poetry install` from `classic/` directory
- [x] Run `poetry run autogpt` and verify CLI starts
- [x] Run `poetry run direct-benchmark run --tests ReadFile` to verify
benchmark works
## Notes
- This is a WIP PR for personal use improvements
- The project is marked as **unsupported** - no active maintenance
planned
- Contains known vulnerabilities in dependencies (intentionally not
updated)
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> **Medium Risk**
> CI/build workflows are substantially reworked (runner matrix removal,
path/layout changes, new benchmark runner), so breakage is most likely
in automation and packaging rather than runtime behavior.
>
> **Overview**
> **Modernizes the `classic/` project layout and automation around a
single consolidated Poetry project** (root
`classic/pyproject.toml`/`poetry.lock`) and updates docs
(`classic/README.md`, new `classic/CLAUDE.md`) accordingly.
>
> **Replaces the old `agbenchmark` CI usage with `direct-benchmark` in
GitHub Actions**, including new/updated benchmark smoke and regression
workflows, standardized `working-directory: classic`, and a move to
**Python 3.12** on Ubuntu-only runners (plus updated caching, coverage
flags, and required `ANTHROPIC_API_KEY` wiring).
>
> Cleans up repo/dev tooling by removing the classic frontend workflow,
deleting the Forge VCR cassette submodule (`.gitmodules`) and associated
CI steps, consolidating `flake8`/`isort`/`pyright` pre-commit hooks to
run from `classic/`, updating ignores for new report/workspace
artifacts, and updating `classic/Dockerfile.autogpt` to build from
Python 3.12 with the consolidated project structure.
>
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
de67834dac.</sup>
<!-- /CURSOR_SUMMARY -->
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
Our pre-commit hooks can use an update: the type check often fails based
on stale type definitions, the OpenAPI schema isn't synced/checked, and
the pre-push checks aren't installed by default.
### Changes 🏗️
- Regenerate the Prisma `.pyi` type stub in the `prisma generate` hook:
Pyright prefers `.pyi` over `.py`, so a stale stub shadows the
regenerated `types.py`
- Also run setup hooks (dependency install, `prisma generate`, `pnpm
generate:api`) on `post-checkout`, to keep types and packages in sync
after switching branches
- Switch these hooks to `git diff` checks because `post-checkout`
doesn't support file triggers/filters
- Add `Check & Install dependencies - AutoGPT Platform - Frontend` hook
- Add `Sync API types - AutoGPT Platform - Backend -> Frontend` hook
- Fix non-ASCII issue in `export-api-schema` (`ensure_ascii=False`)
- Exclude `pnpm-lock.yaml` from `detect-secrets` hook (integrity hashes
cause ~1800 false positives)
- Add `default_stages: [pre-commit]`
- Add `post-checkout`, `pre-push` to `default_install_hook_types`
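
The `git diff` check mentioned above can be sketched as follows: list the files that differ between the previous and the new ref, then rerun a setup step only if one of them matches a trigger pattern. The function names and the example patterns are hypothetical; the actual hooks may be written differently.

```python
import fnmatch
import subprocess


def changed_files(old_ref: str, new_ref: str) -> list[str]:
    """List files that differ between two refs, via `git diff --name-only`."""
    result = subprocess.run(
        ["git", "diff", "--name-only", old_ref, new_ref],
        capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def needs_rerun(files: list[str], patterns: list[str]) -> bool:
    """True if any changed file matches one of the trigger patterns."""
    return any(fnmatch.fnmatch(f, p) for f in files for p in patterns)
```

For example, `needs_rerun(changed_files(old, new), ["*/schema.prisma"])` would gate a `prisma generate` step, which stands in for the file filter that is unavailable on `post-checkout`.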
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Tested locally
## Changes 🏗️
- Run the API query generation as part of the `dev` command
  - Update the `README` to reflect this
- Add a CI job to generate queries and type-check to make sure we are not
  out of sync
  - The job runs on both front-end and back-end changes
- Generate the files via a script to load the BE URL dynamically from the
  env
- Remove generated files from Git
- Rename the `type-check` command to `types`
## Checklist 📋
### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] CI passes
- [x] `README` updates make sense
#### For configuration changes:
None
---------
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
## 🧢 Overview
This PR migrates the AutoGPT Platform frontend from [yarn
1](https://classic.yarnpkg.com/lang/en/) to [pnpm](https://pnpm.io/)
using **corepack** for automatic package manager management.
**yarn 1** is no longer maintained and a bit old; by moving to **pnpm** we
get:
- ⚡ Significantly faster install times
- 💾 Better disk space efficiency
- 🛠️ Better community support and maintenance
- 💆🏽♂️ A very easy config swap
## 🏗️ Changes
### Package Management Migration
- updated [corepack](https://github.com/nodejs/corepack) to use
[pnpm](https://pnpm.io/)
- Deleted `yarn.lock` and generated new `pnpm-lock.yaml`
- Updated `.gitignore`
### Documentation Updates
- `frontend/README.md`:
  - Added a comprehensive tech stack overview with links
  - Updated all commands to use pnpm
  - Added corepack setup instructions
  - Included a migration disclaimer for yarn users
- `backend/README.md`:
  - Updated installation instructions to use pnpm with corepack
- `AGENTS.md`:
  - Updated testing commands from yarn to pnpm
### CI/CD & Infrastructure
- **GitHub Workflows**:
  - Updated all jobs to use pnpm with `corepack enable`
  - Cleaned up the FE Playwright test workflow to avoid Sentry noise
- **Dockerfile**:
  - Updated to use pnpm with corepack, changed the lock file reference, and
    updated the cache mount path
### 📋 Checklist
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
**Test Plan:**
> assuming you are on the `frontend` folder
- [x] Clean installation works: `rm -rf node_modules && corepack enable
&& pnpm install`
- [x] Development server starts correctly: `pnpm dev`
- [x] Build process works: `pnpm build`
- [x] Linting and formatting work: `pnpm lint` and `pnpm format`
- [x] Type checking works: `pnpm type-check`
- [x] Tests run successfully: `pnpm test`
- [x] Storybook starts correctly: `pnpm storybook`
- [x] Docker build succeeds with new pnpm configuration
- [x] GitHub Actions workflow passes with pnpm commands
#### For configuration changes:
- [x] `.env.example` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)
# Query Optimization for AgentNodeExecution Tables
## Overview
This PR describes the database index optimizations applied to improve
the performance of slow queries in the AutoGPT platform backend.
## Problem Analysis
The following queries were identified as consuming significant database
resources:
### 1. Complex Filtering Query (19.3% of total time)
```sql
SELECT ... FROM "AgentNodeExecution"
WHERE "agentNodeId" = $1
AND "agentGraphExecutionId" = $2
AND "executionStatus" = $3
AND "id" NOT IN (
SELECT "referencedByInputExecId"
FROM "AgentNodeExecutionInputOutput"
WHERE "name" = $4 AND "referencedByInputExecId" IS NOT NULL
)
ORDER BY "addedTime" ASC
```
### 2. Multi-table JOIN Query (8.9% of total time)
```sql
SELECT ... FROM "AgentNodeExecution"
LEFT JOIN "AgentNode" ON ...
LEFT JOIN "AgentBlock" ON ...
WHERE "AgentBlock"."id" IN (...)
AND "executionStatus" != $11
AND "agentGraphExecutionId" IN (...)
ORDER BY "queuedTime" DESC, "addedTime" DESC
```
### 3. Bulk Graph Execution Queries (multiple variations, ~10% combined)
Multiple queries filtering by `agentGraphExecutionId` with various
ordering requirements.
## Optimization Strategy
### 1. Composite Indexes for AgentNodeExecution
Added the following composite indexes to the `AgentNodeExecution` model:
```prisma
@@index([agentGraphExecutionId, agentNodeId, executionStatus])
@@index([addedTime, queuedTime])
```
#### Benefits:
- **`[agentGraphExecutionId, agentNodeId, executionStatus]`**: Matches all
three equality predicates in the WHERE clause of the complex filtering
query, so matching rows can be located with an index scan. Because
`agentGraphExecutionId` is the leading column, the same index also serves
queries that filter by graph execution alone.
- **`[addedTime, queuedTime]`**: Targets ORDER BY operations on the time
fields
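
One way to sanity-check that a composite index is picked up for this kind of filter is to inspect the query plan. The sketch below does this with SQLite from the Python standard library purely for convenience; the production database planner may behave differently, and the simplified table definition and the index name `idx_exec_lookup` are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE AgentNodeExecution (
        id TEXT PRIMARY KEY,
        agentGraphExecutionId TEXT,
        agentNodeId TEXT,
        executionStatus TEXT,
        addedTime TEXT
    );
    CREATE INDEX idx_exec_lookup ON AgentNodeExecution (
        agentGraphExecutionId, agentNodeId, executionStatus
    );
    """
)

plan_rows = conn.execute(
    """
    EXPLAIN QUERY PLAN
    SELECT * FROM AgentNodeExecution
    WHERE agentNodeId = ? AND agentGraphExecutionId = ? AND executionStatus = ?
    ORDER BY addedTime ASC
    """,
    ("node-1", "graph-exec-1", "QUEUED"),
).fetchall()

# The last column of each plan row is a human-readable description of the step.
plan = [row[-1] for row in plan_rows]
for step in plan:
    print(step)
```

The order of the predicates in the WHERE clause does not matter here; what matters is that all three indexed columns are constrained by equality. On the real database, the equivalent check would be an `EXPLAIN` of the original query.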
### 2. Composite Index for AgentNodeExecutionInputOutput
Added the following composite index:
```prisma
// Input and Output pin names are unique for each AgentNodeExecution.
@@unique([referencedByInputExecId, referencedByOutputExecId, name])
@@index([referencedByOutputExecId])
// Composite index for `upsert_execution_input`.
@@index([name, time])
```
#### Benefits:
- Dramatically improves the NOT IN subquery performance in Query 1
- Allows the database to use an index scan instead of a full table scan
- Reduces the subquery execution time from O(n) to O(log n)
## Expected Performance Improvements
1. **Query 1 (19.3% of total time)**:
- Expected improvement: 80-90% reduction in execution time
- The composite index on `[agentGraphExecutionId, agentNodeId,
executionStatus]` will allow index scans on all three equality predicates
- The subquery will benefit from the new index on
`AgentNodeExecutionInputOutput`
2. **Query 2 (8.9% of total time)**:
- Expected improvement: 50-70% reduction in execution time
- The leading `agentGraphExecutionId` column of the composite index will
reduce the initial filtering cost
3. **Bulk Queries (10% combined)**:
- Expected improvement: 60-80% reduction in execution time
- Composite indexes including time fields will optimize sorting
operations
## Migration Considerations
1. **Index Creation Time**: Creating these indexes on existing large
tables may take time
2. **Storage Impact**: Each index requires additional storage space
3. **Write Performance**: Slight decrease in INSERT/UPDATE performance
due to index maintenance
## Additional Optimizations
### NotificationEvent Table Index
Added index for notification batch queries:
```prisma
@@index([userNotificationBatchId])
```
This optimizes the query:
```sql
SELECT ... FROM "NotificationEvent"
WHERE "userNotificationBatchId" IN (...)
```
#### Benefits:
- Eliminates full table scans when filtering by batch ID
- Improves query performance from O(n) to O(log n)
- Particularly beneficial for users with many notification events
## Future Optimizations
Consider these additional optimizations if needed:
1. Partitioning `AgentNodeExecution` table by `createdAt` or
`agentGraphExecutionId`
2. Implementing materialized views for frequently accessed aggregate
data
3. Adding covering indexes for specific query patterns
4. Implementing query result caching at the application level
---------
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
Running the tests locally takes a lot of time and leaves test data
behind in the DB, making it impractical to actually run locally.
I'm disabling the `pytest` hooks in the pre-commit config so the
pre-commit checks can reasonably be used without significant negative
impact to DX.
This doesn't impact UX and there is nothing to test.
- Prep work for #8782
- Prep work for #8779
### Changes 🏗️
- refactor(platform): Differentiate graph/node execution events
- fix(platform): Subscribe to execution updates by `graph_exec_id`
instead of `graph_id`+`graph_version`
- refactor(backend): Move all execution related models and functions
from `.data.graph` to `.data.execution`
- refactor(backend): Reorganize & refactor `.data.execution`
- fix(libs): Remove `load_dotenv` in `.auth.config` to fix test config
issues
- dx: Bump version of `black` in pre-commit config to v24.10.0 to match
poetry.lock
- Other minor refactoring in both frontend and backend
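
To illustrate the subscription change, here is a toy in-memory event bus keyed by `graph_exec_id`. Keying by execution rather than by graph and version presumably keeps two runs of the same graph from receiving each other's updates. The class name, method names, and dict-shaped events are hypothetical and do not mirror the platform's actual event code.

```python
from collections import defaultdict
from typing import Callable

# Hypothetical event shape: a plain dict with at least a "graph_exec_id" key.
Event = dict
Handler = Callable[[Event], None]


class ExecutionEventBus:
    """Toy in-memory bus that routes events by graph execution ID."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, graph_exec_id: str, handler: Handler) -> None:
        self._handlers[graph_exec_id].append(handler)

    def publish(self, event: Event) -> None:
        # .get avoids creating empty entries for executions nobody subscribed to.
        for handler in self._handlers.get(event["graph_exec_id"], []):
            handler(event)
```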
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - Run an agent in the builder
    - [x] -> works normally, node I/O is updated in real time
  - Run an agent in the library
    - [x] -> works normally
---------
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
- Resolves #8780
- Part of #8774
### Changes 🏗️
- Add new UI components
- Add `/agents/[id]` page, with sub-components:
- `AgentRunsSelectorList`
- `AgentRunSummaryCard`
- `AgentRunStatusChip`
- `AgentRunDetailsView`
- `AgentRunDraftView`
- `AgentScheduleDetailsView`
Backend improvements:
- Improve output of execution-related API endpoints: return
`GraphExecution` instead of `NodeExecutionResult[]`
- Reduce log spam from Prisma in tests
General frontend improvements:
- Hide nav link names on smaller screens to prevent navbar overflow
- Clean up styling and fix sizing of `agptui/Button`
Technical frontend improvements:
- Fix tailwind config size increments
- Rename `font-poppin` -> `font-poppins`
- Clean up component implementations and usages
- Yeet all occurrences of `variant="default"`
- Remove `default` button variant as duplicate of `outline`; make
`outline` the default
- Fix minor typing issues
DX:
- Add front end type-check step to `pre-commit` config
- Fix logging setup in conftest.py
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - `/agents/[id]` (new)
    - Go to page -> list of runs loads
    - Create new run -> runs; all I/O is visible
    - Click "Run again" -> runs again with same input
  - `/monitoring` (existing)
    - Go to page -> everything loads
    - Selecting agents and agent runs works
---------
Co-authored-by: Nicholas Tindle <nicktindle@outlook.com>
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
Co-authored-by: Swifty <craigswift13@gmail.com>
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
Make all changes necessary to make everything work with Poetry v2.0.0.
- Resolves #9196
## Changes
- Removed `--no-update` flag from `poetry lock` command in codebase
- Removed extra path arguments from `poetry -C [path] run [command]`
occurrences
- Regenerated all lock files in hierarchical order
- Added workaround for Poetry bug where `packages.[i].format` is now
suddenly required
Additionally:
- Fixed up .dockerignore
  - Fixes .venv being erroneously copied over from local
  - Fixes build context bloat (300MB -> 2.5MB)
- Fixed warnings about entrypoint script not being installed in docker
  builds
### Relevant (breaking) changes in v2.0.0
- `--no-update` flag no longer exists for `poetry lock` as it has become
default behavior
- The `-C` option now actually changes the directory, so any path
arguments in `poetry run` commands can/must be removed
- Poetry v2.0.0 uses the new v2.1 lock file spec, so all lock files have
to be regenerated to avoid false-positive lock file updates and checks
on future PRs
- **BUG:** when specifying `tool.poetry.packages`, `format` is required
  now
  - python-poetry/poetry#9961
Full Poetry v2.0.0 release notes and change log:
https://python-poetry.org/blog/announcing-poetry-2.0.0
- Resolves #8859
- Follow-up to #8751
### Changes
- Add `autogpt_libs` to the backend CI path filter
- Add `ruff format` step for `autogpt_libs` to `linter.py` and
`pre-commit` config
- Run `poetry run format` with the new setup
- fix naming of hooks
- fix `pyright` hooks (b0rked by repo restructure)
- fix `forge` path (b0rked by faulty replace-all when the repo was restructured)
- fix `black` hook to work on all Python versions
- add `poetry install` hooks
- add `ruff`, `isort`, `pyright`, `pytest`, and `prisma generate` hooks for `backend/`
- add `ruff` and `pyright` hooks for `autogpt_libs/`
Restructure the repo to make the difference between classic AutoGPT and the AutoGPT Platform clear:
* Move the "classic" projects `autogpt`, `forge`, `frontend`, and `benchmark` into a `classic` folder
* Also rename `autogpt` to `original_autogpt` for absolute clarity
* Rename `rnd/` to `autogpt_platform/`
* `rnd/autogpt_builder` -> `autogpt_platform/frontend`
* `rnd/autogpt_server` -> `autogpt_platform/backend`
* Adjust any paths accordingly
- **FIX ALL LINT/TYPE ERRORS IN AUTOGPT, FORGE, AND BENCHMARK**
### Linting
- Clean up linter configs for `autogpt`, `forge`, and `benchmark`
- Add type checking with Pyright
- Create unified pre-commit config
- Create unified linting and type checking CI workflow
### Testing
- Synchronize CI test setups for `autogpt`, `forge`, and `benchmark`
- Add missing pytest-cov to benchmark dependencies
- Mark GCS tests as slow to speed up pre-commit test runs
- Repair `forge` test suite
  - Add `AgentDB.close()` method for test DB teardown in db_test.py
  - Use actual temporary dir instead of forge/test_workspace/
- Move left-behind dependencies for moved `forge`-code from autogpt to forge
### Notable type changes
- Replace uses of `ChatModelProvider` by `MultiProvider`
- Removed unnecessary exports from various __init__.py
- Simplify `FileStorage.open_file` signature by removing `IOBase` from return type union
- Implement `S3BinaryIOWrapper(BinaryIO)` type interposer for `S3FileStorage`
- Expand overloads of `GCSFileStorage.open_file` for improved typing of read and write modes
Had to silence type checking for the extra overloads, because (I think) Pyright is reporting a false-positive:
https://github.com/microsoft/pyright/issues/8007
- Change `count_tokens`, `get_tokenizer`, `count_message_tokens` methods on `ModelProvider`s from class methods to instance methods
- Move `CompletionModelFunction.schema` method -> helper function `format_function_def_for_openai` in `forge.llm.providers.openai`
- Rename `ModelProvider` -> `BaseModelProvider`
- Rename `ChatModelProvider` -> `BaseChatModelProvider`
- Add type `ChatModelProvider` which is a union of all subclasses of `BaseChatModelProvider`
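
The renames and the new union type can be pictured roughly as below. This is a simplified sketch: the two subclasses are hypothetical stand-ins, and whitespace splitting is a placeholder for a real tokenizer.

```python
from typing import Union


class BaseChatModelProvider:
    """Stand-in for the shared base class."""

    def count_tokens(self, text: str) -> int:
        # Instance method, so each provider can use its own configured tokenizer.
        # Whitespace splitting is a placeholder, not a real tokenizer.
        return len(text.split())


class OpenAIProvider(BaseChatModelProvider):  # hypothetical subclass
    pass


class AnthropicProvider(BaseChatModelProvider):  # hypothetical subclass
    pass


# Type alias: "any concrete chat provider", spelled as a union of the subclasses.
ChatModelProvider = Union[OpenAIProvider, AnthropicProvider]


def total_tokens(provider: ChatModelProvider, messages: list[str]) -> int:
    return sum(provider.count_tokens(message) for message in messages)
```

Annotating with the union rather than the base class lets a type checker narrow to a concrete provider with `isinstance` checks.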
### Removed rather than fixed
- Remove deprecated and broken autogpt/agbenchmark_config/benchmarks.py
- Various base classes and properties on base classes in `forge.llm.providers.schema` and `forge.models.providers`
### Fixes for other issues that came to light
- Clean up `forge.agent_protocol.api_router`, `forge.agent_protocol.database`, and `forge.agent.agent`
- Add fallback behavior to `ImageGeneratorComponent`
- Remove test for deprecated failure behavior
- Fix `agbenchmark.challenges.builtin` challenge exclusion mechanism on Windows
- Fix `_tool_calls_compat_extract_calls` in `forge.llm.providers.openai`
- Add support for `any` (= no type specified) in `JSONSchema.typescript_type`