mirror of https://github.com/Significant-Gravitas/AutoGPT.git
synced 2026-04-08 03:00:28 -04:00

Compare commits: `feat/keep-...fix/classi` (1 commit)

Commit: 9cad616950

42 README.md
@@ -130,7 +130,7 @@ These examples show just a glimpse of what you can achieve with AutoGPT! You can
 All code and content within the `autogpt_platform` folder is licensed under the Polyform Shield License. This new project is our in-development platform for building, deploying and managing agents.</br>_[Read more about this effort](https://agpt.co/blog/introducing-the-autogpt-platform)_
 
 🦉 **MIT License:**
 
-All other portions of the AutoGPT repository (i.e., everything outside the `autogpt_platform` folder) are licensed under the MIT License. This includes the original stand-alone AutoGPT Agent, along with projects such as [Forge](https://github.com/Significant-Gravitas/AutoGPT/tree/master/classic/forge), [agbenchmark](https://github.com/Significant-Gravitas/AutoGPT/tree/master/classic/benchmark) and the [AutoGPT Classic GUI](https://github.com/Significant-Gravitas/AutoGPT/tree/master/classic/frontend).</br>We also publish additional work under the MIT Licence in other repositories, such as [GravitasML](https://github.com/Significant-Gravitas/gravitasml) which is developed for and used in the AutoGPT Platform. See also our MIT Licenced [Code Ability](https://github.com/Significant-Gravitas/AutoGPT-Code-Ability) project.
+All other portions of the AutoGPT repository (i.e., everything outside the `autogpt_platform` folder) are licensed under the MIT License. This includes the original stand-alone AutoGPT Agent, along with projects such as [Forge](https://github.com/Significant-Gravitas/AutoGPT/tree/master/classic/forge) and the [Direct Benchmark](https://github.com/Significant-Gravitas/AutoGPT/tree/master/classic/direct_benchmark).</br>We also publish additional work under the MIT License in other repositories, such as [GravitasML](https://github.com/Significant-Gravitas/gravitasml) which is developed for and used in the AutoGPT Platform. See also our MIT-licensed [Code Ability](https://github.com/Significant-Gravitas/AutoGPT-Code-Ability) project.
 
 ---
 
 ### Mission
@@ -150,7 +150,7 @@ Be part of the revolution! **AutoGPT** is here to stay, at the forefront of AI i
 ## 🤖 AutoGPT Classic
 
 > Below is information about the classic version of AutoGPT.
 
-**🛠️ [Build your own Agent - Quickstart](classic/FORGE-QUICKSTART.md)**
+**🛠️ [Build your own Agent - Forge](https://github.com/Significant-Gravitas/AutoGPT/tree/master/classic/forge)**
 
 ### 🏗️ Forge
@@ -161,46 +161,26 @@ This guide will walk you through the process of creating your own agent and usin
 
 📘 [Learn More](https://github.com/Significant-Gravitas/AutoGPT/tree/master/classic/forge) about Forge
 
-### 🎯 Benchmark
+### 🎯 Direct Benchmark
 
-**Measure your agent's performance!** The `agbenchmark` can be used with any agent that supports the agent protocol, and the integration with the project's [CLI] makes it even easier to use with AutoGPT and forge-based agents. The benchmark offers a stringent testing environment. Our framework allows for autonomous, objective performance evaluations, ensuring your agents are primed for real-world action.
+**Measure your agent's performance!** The `direct_benchmark` harness tests agents directly without the agent protocol overhead. It supports multiple prompt strategies (one_shot, reflexion, plan_execute, tree_of_thoughts, etc.) and model configurations, with parallel execution and detailed reporting.
 
 <!-- TODO: insert visual demonstrating the benchmark -->
 
-📦 [`agbenchmark`](https://pypi.org/project/agbenchmark/) on Pypi
-
-📘 [Learn More](https://github.com/Significant-Gravitas/AutoGPT/tree/master/classic/benchmark) about the Benchmark
-
-### 💻 UI
-
-**Makes agents easy to use!** The `frontend` gives you a user-friendly interface to control and monitor your agents. It connects to agents through the [agent protocol](#-agent-protocol), ensuring compatibility with many agents from both inside and outside of our ecosystem.
-
-<!-- TODO: insert screenshot of front end -->
-
-The frontend works out-of-the-box with all agents in the repo. Just use the [CLI] to run your agent of choice!
-
-📘 [Learn More](https://github.com/Significant-Gravitas/AutoGPT/tree/master/classic/frontend) about the Frontend
+📘 [Learn More](https://github.com/Significant-Gravitas/AutoGPT/tree/master/classic/direct_benchmark) about the Benchmark
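The strategy names listed above (one_shot, reflexion, and so on) are selected by name on the CLI. A tiny registry sketch shows one way such dispatch could work; the decorator, function names, and prompt wording here are illustrative assumptions, not the actual `direct_benchmark` API:

```python
# Hypothetical sketch of a prompt-strategy registry; names are illustrative,
# not the real direct_benchmark internals.
from typing import Callable, Dict

STRATEGIES: Dict[str, Callable[[str], str]] = {}

def strategy(name: str):
    """Register a prompt strategy under a CLI-selectable name."""
    def register(fn: Callable[[str], str]):
        STRATEGIES[name] = fn
        return fn
    return register

@strategy("one_shot")
def one_shot(task: str) -> str:
    # Single prompt, single answer
    return f"Task: {task}\nAnswer directly."

@strategy("reflexion")
def reflexion(task: str) -> str:
    # Answer, then critique and revise
    return f"Task: {task}\nAnswer, critique your answer, then revise."

def build_prompt(task: str, strategy_name: str = "one_shot") -> str:
    return STRATEGIES[strategy_name](task)
```

A registry like this is what lets `--strategies one_shot,reflexion` remain a plain string flag while new strategies are added without touching the CLI code.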
 ### ⌨️ CLI
 
 [CLI]: #-cli
 
-To make it as easy as possible to use all of the tools offered by the repository, a CLI is included at the root of the repo:
+AutoGPT Classic is run via Poetry from the `classic/` directory:
 
 ```shell
-$ ./run
-Usage: cli.py [OPTIONS] COMMAND [ARGS]...
-
-Options:
-  --help  Show this message and exit.
-
-Commands:
-  agent      Commands to create, start and stop agents
-  benchmark  Commands to start the benchmark and list tests and categories
-  setup      Installs dependencies needed for your system.
+cd classic
+poetry install
+poetry run autogpt        # Interactive CLI mode
+poetry run serve --debug  # Agent Protocol server
 ```
 
-Just clone the repo, install dependencies with `./run setup`, and you should be good to go!
+See the [classic README](https://github.com/Significant-Gravitas/AutoGPT/tree/master/classic) for full setup instructions.
 
 ## 🤔 Questions? Problems? Suggestions?
@@ -16,9 +16,9 @@ classic/
 │   └── forge/              # Core agent framework package
 ├── original_autogpt/
 │   └── autogpt/            # AutoGPT agent package
-├── direct_benchmark/
-│   └── direct_benchmark/   # Benchmark harness package
-└── benchmark/              # Challenge definitions (data, not code)
+└── direct_benchmark/
+    ├── direct_benchmark/   # Benchmark harness package
+    └── challenges/         # Challenge definitions (data)
 ```
 
 All packages are managed by a single `pyproject.toml` at the classic/ root.
@@ -112,7 +112,7 @@ The `forge` package is the foundation that other components depend on:
 ### Direct Benchmark
 Benchmark harness for testing agent performance:
 - `direct_benchmark/direct_benchmark/` - CLI and harness code
-- `benchmark/agbenchmark/challenges/` - Test cases organized by category (code, retrieval, data, etc.)
+- `direct_benchmark/challenges/` - Test cases organized by category (code, retrieval, data, etc.)
 - Reports generated in `direct_benchmark/reports/`
 
 ### Package Structure
@@ -24,8 +24,7 @@ classic/
 ├── poetry.lock          # Single lock file
 ├── forge/               # Core autonomous agent framework
 ├── original_autogpt/    # Original implementation
-├── direct_benchmark/    # Benchmark harness
-└── benchmark/           # Challenge definitions (data)
+└── direct_benchmark/    # Benchmark harness + challenge definitions
 ```
 
 ## Getting Started
@@ -76,7 +76,7 @@ This folder contains all the files you want the agent to have in its workspace B
 ### artifacts_out
 
 This folder contains all the files you would like the agent to generate. This folder is used to mock the agent.
-This allows to run agbenchmark --test=TestExample --mock and make sure our challenge actually works.
+This allows running the challenge with mock data to verify it works correctly.
 
 ### custom_python
@@ -1,13 +1,24 @@
-# This is the official challenge library for https://github.com/Significant-Gravitas/Auto-GPT-Benchmarks
+# Challenge Definitions
 
-The goal of this repo is to provide easy challenge creation for test driven development with the Auto-GPT-Benchmarks package. This is essentially a library to craft challenges using a dsl (jsons in this case).
+This directory contains challenge data files used by the `direct_benchmark` harness.
 
-This is the up to date dependency graph: https://sapphire-denys-23.tiiny.site/
+Each challenge is a directory containing a `data.json` file that defines the task, ground truth, and evaluation criteria. See `CHALLENGE.md` for the data schema.
 
-### How to use
+## Structure
 
-Make sure you have the package installed with `pip install agbenchmark`.
+```
+challenges/
+├── abilities/   # Basic agent capabilities (read/write files)
+├── alignment/   # Safety and alignment tests
+├── verticals/   # Domain-specific challenges (code, data, scrape, etc.)
+└── library/     # Additional challenge library
+```
 
-If you would just like to use the default challenges, don't worry about this repo. Just install the package and you will have access to the default challenges.
+## Running Challenges
 
-To add new challenges as you develop, add this repo as a submodule to your `project/agbenchmark` folder. Any new challenges you add within the submodule will get registered automatically.
+```bash
+# From the classic/ directory
+poetry run direct-benchmark run --tests ReadFile
+poetry run direct-benchmark run --strategies one_shot --models claude
+poetry run direct-benchmark run --help
+```
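Since each challenge is defined by a `data.json` with a task, ground truth, and evaluation criteria, a loader only needs to parse and sanity-check those fields. The field names in this sketch are an assumption about the schema (the authoritative layout is in `CHALLENGE.md`):

```python
# Illustrative only: the field names below are assumed, not taken from the
# real schema documented in CHALLENGE.md.
import json

challenge = {
    "name": "ReadFile",
    "category": ["abilities"],
    "task": "Read the file sample.txt and output its contents.",
    "ground": {"answer": "Hello World", "files": ["output.txt"]},
}

def validate_challenge(raw: str) -> dict:
    """Parse a challenge definition and check the minimal required fields."""
    data = json.loads(raw)
    for field in ("name", "task", "ground"):
        if field not in data:
            raise ValueError(f"missing field: {field}")
    return data

parsed = validate_challenge(json.dumps(challenge))
```

Validating at load time keeps a malformed challenge directory from surfacing as a confusing failure deep inside a benchmark run.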
@@ -99,8 +99,8 @@ def load_component_configs(self, json: str)  # Restore configs
 
 **Configuration (`BaseAgentConfiguration`):**
 ```python
-fast_llm: ModelName = "gpt-3.5-turbo-16k"
-smart_llm: ModelName = "gpt-4"
+fast_llm: ModelName = "gpt-5.4"
+smart_llm: ModelName = "gpt-5.4"
 big_brain: bool = True  # Use smart_llm
 cycle_budget: Optional[int] = 1  # Steps before approval needed
 send_token_limit: Optional[int]  # Prompt token budget
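The interplay of these fields can be sketched with a plain dataclass; the real `BaseAgentConfiguration` is a pydantic model, and the `active_llm()` helper is an illustrative assumption showing what `big_brain` controls:

```python
# Dataclass approximation of the fields above; active_llm() is a hypothetical
# helper, not part of the real class.
from dataclasses import dataclass
from typing import Optional

@dataclass
class AgentConfigSketch:
    fast_llm: str = "gpt-5.4"
    smart_llm: str = "gpt-5.4"
    big_brain: bool = True          # Use smart_llm
    cycle_budget: Optional[int] = 1  # Steps before approval needed

    def active_llm(self) -> str:
        # big_brain switches the agent onto the smart model
        return self.smart_llm if self.big_brain else self.fast_llm
```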
@@ -116,7 +116,7 @@ You can set sensitive variables in the `.json` file as well but it's recommended
     "github_username": null
   },
   "ActionHistoryConfiguration": {
-    "llm_name": "gpt-3.5-turbo",
+    "llm_name": "gpt-5.4-mini",
     "max_tokens": 1024,
     "spacy_language_model": "en_core_web_sm"
   },
@@ -129,7 +129,7 @@ You can set sensitive variables in the `.json` file as well but it's recommended
     "duckduckgo_max_attempts": 3
   },
   "WebSeleniumConfiguration": {
-    "llm_name": "gpt-3.5-turbo",
+    "llm_name": "gpt-5.4-mini",
     "web_browser": "chrome",
     "headless": true,
     "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36",
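The per-component configs above are keyed by component name, so a dump/restore round trip is essentially JSON serialization of a nested mapping. This dict-based sketch approximates what `dump_component_configs`/`load_component_configs` do; the real configs are pydantic models:

```python
# A minimal dump/restore sketch; the real component configs are pydantic
# models, so this dict round trip is an approximation.
import json

configs = {
    "ActionHistoryConfiguration": {"llm_name": "gpt-5.4-mini", "max_tokens": 1024},
    "WebSeleniumConfiguration": {"llm_name": "gpt-5.4-mini", "web_browser": "chrome"},
}

serialized = json.dumps(configs, indent=2)  # dump_component_configs analogue
restored = json.loads(serialized)           # load_component_configs analogue
```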
@@ -1,8 +1,10 @@
-"""Tests for EpisodicActionHistory cursor safety and task continuation.
+"""Test for cursor reset bug when clearing episode history between tasks.
 
-Covers:
-- Cursor >= len guard in current_episode (prevents IndexError)
-- History preserved across task changes (no clearing)
+Reproduces: IndexError in EpisodicActionHistory.current_episode when
+episodes.clear() is called without resetting cursor to 0.
+
+This is the exact crash from run_interaction_loop when the user starts a
+second task after finishing the first one.
 """
 
 from unittest.mock import MagicMock
@@ -14,14 +16,42 @@ def _make_history_with_episodes(n: int) -> EpisodicActionHistory:
     """Create a history with n completed episodes (cursor advanced past all)."""
     history = EpisodicActionHistory()
     for i in range(n):
+        # Directly append mock episodes and advance cursor,
+        # simulating what register_action + register_result does
         ep = MagicMock()
-        ep.result = MagicMock()
+        ep.result = MagicMock()  # has a result = completed
         history.episodes.append(ep)
         history.cursor += 1
     return history
 
 
-class TestEpisodicActionHistoryCursor:
+class TestEpisodicActionHistoryCursorReset:
+    def test_current_episode_after_clear_without_cursor_reset_crashes(self):
+        """REPRODUCER: This is the exact bug.
+
+        After completing a task, the interaction loop clears episodes but
+        doesn't reset cursor. On the next task, current_episode does
+        `self[self.cursor]` where cursor > len(episodes) -> IndexError.
+        """
+        history = _make_history_with_episodes(2)
+        assert history.cursor == 2
+        assert len(history.episodes) == 2
+
+        # This is what main.py line 759 does between tasks:
+        history.episodes.clear()
+
+        # cursor is still 2, but episodes is empty
+        assert history.cursor == 2
+        assert len(history.episodes) == 0
+
+        # This is what main.py line 687 calls at the start of the next task.
+        # BUG: cursor (2) != len(episodes) (0), so it falls through to
+        # self.episodes[2] on an empty list -> IndexError
+        #
+        # After the fix, this should return None (no current episode).
+        result = history.current_episode
+        assert result is None
+
     def test_current_episode_returns_none_on_empty_history(self):
         history = EpisodicActionHistory()
         assert history.current_episode is None
@@ -34,48 +64,26 @@ class TestEpisodicActionHistoryCursor:
     def test_current_episode_returns_episode_when_cursor_valid(self):
         history = EpisodicActionHistory()
         ep = MagicMock()
-        ep.result = None
+        ep.result = None  # not yet completed
         history.episodes.append(ep)
         history.cursor = 0
         assert history.current_episode is ep
 
-    def test_cursor_beyond_episodes_returns_none(self):
-        """Any cursor value beyond the episode list should return None."""
-        history = EpisodicActionHistory()
-        history.cursor = 100
-        assert history.current_episode is None
-
-    def test_cursor_safe_after_clear(self):
-        """Even if episodes are cleared without resetting cursor,
-        current_episode must not crash (>= guard)."""
-        history = _make_history_with_episodes(2)
-        history.episodes.clear()
-        assert history.cursor == 2
-        assert history.current_episode is None
-
-
-class TestHistoryPreservedAcrossTasks:
-    def test_episodes_survive_task_change(self):
-        """When user starts a new task, episodes from the previous task
-        should still be present — the compression system handles overflow."""
+    def test_clear_and_reset_allows_new_task(self):
+        """After properly clearing episodes AND resetting cursor,
+        the history should work correctly for a new task."""
         history = _make_history_with_episodes(3)
         assert len(history.episodes) == 3
         assert history.cursor == 3
 
-        # Simulate what main.py does on task change (no clearing)
-        # history is untouched — episodes remain
+        # Clean reset between tasks
+        history.episodes.clear()
+        history.cursor = 0
 
-        assert len(history.episodes) == 3
-        assert history.current_episode is None  # cursor at end
+        assert history.current_episode is None
+        assert len(history) == 0
 
-    def test_new_episode_appends_after_previous(self):
-        """New task actions append to existing history."""
-        history = _make_history_with_episodes(2)
-
-        # New task starts — add a new episode
-        new_ep = MagicMock()
-        new_ep.result = None
-        history.episodes.append(new_ep)
-        # cursor still at 2, which is now the new episode
-        assert history.current_episode is new_ep
-        assert len(history.episodes) == 3
+    def test_cursor_beyond_episodes_returns_none(self):
+        """Any cursor value beyond the episode list should return None,
+        not raise IndexError."""
+        history = EpisodicActionHistory()
+        history.cursor = 100  # way past empty list
+        assert history.current_episode is None
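The `>= len` guard these tests exercise can be sketched in a few lines. This is a minimal stand-in for the forge class, assuming only the `episodes`/`cursor` fields and `current_episode` property described above:

```python
# Minimal sketch of the cursor guard under test; the real
# EpisodicActionHistory in forge carries much more behavior.
class EpisodicActionHistory:
    def __init__(self):
        self.episodes = []
        self.cursor = 0

    def __len__(self):
        return len(self.episodes)

    @property
    def current_episode(self):
        # The guard: a cursor at or past the end means "no current episode"
        # instead of an IndexError on a cleared (or empty) list.
        if self.cursor >= len(self.episodes):
            return None
        return self.episodes[self.cursor]
```

With the guard in place, `episodes.clear()` followed by a stale cursor degrades to `None` rather than crashing the interaction loop.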
@@ -102,11 +102,11 @@
 ### LLM MODELS
 ################################################################################
 
-## SMART_LLM - Smart language model (Default: gpt-4-turbo)
-# SMART_LLM=gpt-4-turbo
+## SMART_LLM - Smart language model (Default: gpt-5.4)
+# SMART_LLM=gpt-5.4
 
-## FAST_LLM - Fast language model (Default: gpt-3.5-turbo)
-# FAST_LLM=gpt-3.5-turbo
+## FAST_LLM - Fast language model (Default: gpt-5.4)
+# FAST_LLM=gpt-5.4
 
 ## EMBEDDING_MODEL - Model to use for creating embeddings
 # EMBEDDING_MODEL=text-embedding-3-small
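Because both variables are commented out by default in the template, the application has to fall back to the documented defaults when they are unset. A resolution helper might look like this; the function name is an assumption, only the variable names and defaults come from the template above:

```python
# Hypothetical resolver for the .env defaults above; resolve_llms() itself
# is illustrative, not part of the codebase.
import os

def resolve_llms(env=None):
    env = os.environ if env is None else env
    smart = env.get("SMART_LLM", "gpt-5.4")
    fast = env.get("FAST_LLM", "gpt-5.4")
    return smart, fast
```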
@@ -130,8 +130,8 @@ speak: str  # What to say to user
 
 **`AppConfig`** (Pydantic BaseModel):
 ```python
-smart_llm: ModelName = "gpt-4-turbo"   # Complex reasoning
-fast_llm: ModelName = "gpt-3.5-turbo"  # Fast operations
+smart_llm: ModelName = "gpt-5.4"  # Complex reasoning
+fast_llm: ModelName = "gpt-5.4"   # Fast operations
 temperature: float = 0.0
 continuous_mode: bool = False
 continuous_limit: int = 0
@@ -254,8 +254,8 @@ config.disabled_commands.append("execute_python")
 
 ### Custom LLM
 ```bash
-SMART_LLM=gpt-4
-FAST_LLM=gpt-3.5-turbo
+SMART_LLM=gpt-5.4
+FAST_LLM=gpt-5.4-mini
 TEMPERATURE=0.7
 ```
@@ -17,7 +17,7 @@ Demo made by <a href=https://twitter.com/BlakeWerlinger>Blake Werlinger</a>
 - 🔌 Agent Protocol ([docs](https://agentprotocol.ai))
 - 💻 Easy to use UI
 - 🌐 Internet access for searches and information gathering
-- 🧠 Powered by a mix of GPT-4 and GPT-3.5 Turbo
+- 🧠 Powered by GPT-5.4, Claude Opus 4.6, and other modern LLMs
 - 🔗 Access to popular websites and platforms
 - 🗃️ File generation and editing capabilities
 - 🔌 Extensibility with Plugins
@@ -754,18 +754,10 @@ async def run_interaction_loop(
                 logger.info("User chose to exit after task completion.")
                 return
 
-            # Close the finish episode so the loop doesn't reuse it.
-            # AgentFinished is caught before execute() can register
-            # a result, leaving result=None — which the loop
-            # interprets as "episode in progress, reuse proposal".
-            from forge.models.action import ActionSuccessResult
-
-            agent.event_history.register_result(
-                ActionSuccessResult(outputs=e.message)
-            )
-
-            # Start new task in same workspace, keeping prior context
+            # Start new task in same workspace
             agent.state.task = next_task
+            agent.event_history.episodes.clear()  # Clear history for fresh context
+            agent.event_history.cursor = 0
 
             # Reset cycle budget for new task
             cycles_remaining = _get_cycle_budget(
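The key point of the hunk above is that `episodes.clear()` and `cursor = 0` must happen together; a stale cursor on an emptied list is exactly the IndexError the tests reproduce. A reduced sketch of that pairing, with a hypothetical `reset_for_new_task` helper standing in for the inline code in `run_interaction_loop`:

```python
# Sketch only: reset_for_new_task() is a hypothetical helper bundling the
# two lines the real loop performs inline between tasks.
class EventHistory:
    def __init__(self):
        self.episodes = []
        self.cursor = 0

    def reset_for_new_task(self):
        self.episodes.clear()
        self.cursor = 0  # omitting this line is the bug the tests cover

history = EventHistory()
history.episodes.extend(["step1", "step2"])
history.cursor = 2
history.reset_for_new_task()
```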
@@ -2,6 +2,6 @@ azure_api_type: azure
 azure_api_version: api-version-for-azure
 azure_endpoint: your-azure-openai-endpoint
 azure_model_map:
-  gpt-3.5-turbo-0125: gpt35-deployment-id-for-azure
-  gpt-4-turbo-preview: gpt4-deployment-id-for-azure
+  gpt-5.4: gpt54-deployment-id-for-azure
+  gpt-5.4-mini: gpt54mini-deployment-id-for-azure
   text-embedding-3-small: embedding-deployment-id-for-azure
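Azure OpenAI addresses models by deployment ID rather than model name, which is why the map above exists. A lookup sketch shows the translation; `deployment_for` is a hypothetical helper, only the mapping mirrors the YAML:

```python
# Illustrative lookup against the azure_model_map from the YAML above;
# deployment_for() is not a real function in the codebase.
azure_model_map = {
    "gpt-5.4": "gpt54-deployment-id-for-azure",
    "gpt-5.4-mini": "gpt54mini-deployment-id-for-azure",
    "text-embedding-3-small": "embedding-deployment-id-for-azure",
}

def deployment_for(model: str) -> str:
    try:
        return azure_model_map[model]
    except KeyError:
        raise ValueError(f"no Azure deployment configured for {model!r}")
```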
@@ -21,14 +21,12 @@ To learn more about submitting and beating challenges, please visit the [List of
 
 We look forward to your contributions and the exciting solutions that the community will develop together to make AutoGPT even better!
 
-!!! warning
+!!! info
 
-    We're slowly transitioning to agbenchmark. agbenchmark is a simpler way to improve AutoGPT. Simply run:
+    The benchmark system has been replaced by `direct_benchmark`. Run benchmarks with:
 
     ```
-    agbenchmark
+    poetry run direct-benchmark run --strategies one_shot --models claude
     ```
 
-    and beat as many challenges as possible.
-
-For more agbenchmark options, look at the [readme](https://github.com/Significant-Gravitas/Auto-GPT-Benchmarks/tree/master/agbenchmark).
+For more options, see the [direct_benchmark README](https://github.com/Significant-Gravitas/AutoGPT/tree/master/classic/direct_benchmark).
@@ -14,7 +14,7 @@ You can set configuration variables via the `.env` file. If you don't have a `.e
 - `ELEVENLABS_VOICE_ID`: ElevenLabs Voice ID. Optional.
 - `EMBEDDING_MODEL`: LLM Model to use for embedding tasks. Default: `text-embedding-3-small`
 - `EXIT_KEY`: Exit key accepted to exit. Default: n
-- `FAST_LLM`: LLM Model to use for most tasks. Default: `gpt-3.5-turbo-0125`
+- `FAST_LLM`: LLM Model to use for most tasks. Default: `gpt-5.4`
 - `GITHUB_API_KEY`: [Github API Key](https://github.com/settings/tokens). Optional.
 - `GITHUB_USERNAME`: GitHub Username. Optional.
 - `GOOGLE_API_KEY`: Google API key. Optional.
@@ -28,7 +28,7 @@ You can set configuration variables via the `.env` file. If you don't have a `.e
 - `PLAIN_OUTPUT`: Plain output, which disables the spinner. Default: False
 - `RESTRICT_TO_WORKSPACE`: Restrict file reading and writing to the workspace directory. Default: True
 - `SD_WEBUI_AUTH`: Stable Diffusion Web UI username:password pair. Optional.
-- `SMART_LLM`: LLM Model to use for "smart" tasks. Default: `gpt-4-turbo-preview`
+- `SMART_LLM`: LLM Model to use for "smart" tasks. Default: `gpt-5.4`
 - `STREAMELEMENTS_VOICE`: StreamElements voice to use. Default: Brian
 - `TEMPERATURE`: Value of temperature given to OpenAI. Value from 0 to 2. Lower is more deterministic, higher is more random. See https://platform.openai.com/docs/api-reference/completions/create#completions/create-temperature
 - `TEXT_TO_SPEECH_PROVIDER`: Text to Speech Provider. Options are `gtts`, `macos`, `elevenlabs`, and `streamelements`. Default: gtts
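Since `TEMPERATURE` arrives as a string from the environment and is documented as 0 to 2, parsing it defensively is worth a few lines. The helper name and clamping behavior here are assumptions, only the range comes from the docs above:

```python
# Hypothetical parser for the TEMPERATURE variable; clamping to the
# documented 0-2 range is an illustrative choice, not confirmed behavior.
def parse_temperature(raw, default: float = 0.0) -> float:
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return default
    return min(2.0, max(0.0, value))
```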
@@ -32,12 +32,11 @@ disciplines, as long as it can be done on a computer.
 
 Welcome to the AutoGPT Classic Documentation.
 
-The AutoGPT Classic project consists of four main components:
+The AutoGPT Classic project consists of three main components:
 
 - The [Agent](#agent) – also known as just "AutoGPT Classic"
-- The [Benchmark](#benchmark) – AKA `agbenchmark`
+- The [Benchmark](#benchmark) – `direct_benchmark`
 - The [Forge](#forge)
-- The [Frontend](#frontend)
 
 To tie these together, we also have a [CLI] at the root of the project.
@@ -65,15 +64,9 @@ If you'd like to see what's next, check out the [AutoGPT Platform](../index.md).
 
 ## 🎯 Benchmark
 
-**[🗒️ Readme](https://github.com/Significant-Gravitas/AutoGPT/blob/master/classic/benchmark/README.md)**
+**[🗒️ Readme](https://github.com/Significant-Gravitas/AutoGPT/blob/master/classic/direct_benchmark/README.md)**
 
-Measure your agent's performance! The `agbenchmark` can be used with any agent that supports the agent protocol, and the integration with the project's [CLI] makes it even easier to use with AutoGPT Classic and forge-based agents. The benchmark offers a stringent testing environment. Our framework allows for autonomous, objective performance evaluations, ensuring your agents are primed for real-world action.
-
-<!-- TODO: insert visual demonstrating the benchmark -->
-
-- 📦 [**`agbenchmark`**](https://pypi.org/project/agbenchmark/) on Pypi
-
-- 🔌 **Agent Protocol Standardization** - AutoGPT Classic uses the agent protocol from the AI Engineer Foundation to ensure compatibility with many agents, both from within and outside the project.
+Measure your agent's performance! The `direct_benchmark` harness tests agents directly with support for multiple prompt strategies (one_shot, reflexion, plan_execute, tree_of_thoughts, etc.) and model configurations. It supports parallel execution and detailed reporting.
 
 ---
@@ -91,16 +84,6 @@ Forge your own agent! The Forge is a ready-to-go template for your agent applica
 
 ---
 
-## 💻 Frontend
-
-**[🗒️ Readme](https://github.com/Significant-Gravitas/AutoGPT/blob/master/classic/frontend/README.md)**
-
-An easy-to-use and open source frontend for any Agent Protocol-compliant agent.
-
-- 🎮 **User-Friendly Interface** - Manage your agents effortlessly.
-
-- 🔄 **Seamless Integration** - Smooth connectivity between your agent and our benchmarking system.
-
----
-
 ## 🔧 CLI
@@ -104,7 +104,7 @@ If you don't know which to choose, you can safely go with OpenAI*.
 !!! attention
     To use AutoGPT with GPT-4 (recommended), you need to set up a paid OpenAI account
     with some money in it. Please refer to OpenAI for further instructions ([link][openai/help-gpt-4-access]).
-    Free accounts are [limited][openai/api-limits] to GPT-3.5 with only 3 requests per minute.
+    Free accounts are [limited][openai/api-limits] and may have reduced rate limits.
 
 1. Make sure you have a paid account with some credits set up: [Settings > Organization > Billing][openai/billing]
 1. Get your OpenAI API key from: [API keys][openai/api-keys]
@@ -123,14 +123,14 @@ If you don't know which to choose, you can safely go with OpenAI*.
     `azure_api_base`, `azure_api_version` and deployment IDs for the models that you
     want to use.
 
-    E.g. if you want to use `gpt-3.5-turbo` and `gpt-4-turbo`:
+    E.g. if you want to use `gpt-5.4` and `gpt-5.4-mini`:
 
     ```yaml
     # Please specify all of these values as double-quoted strings
     # Replace string in angled brackets (<>) to your own deployment Name
     azure_model_map:
-        gpt-3.5-turbo: "<gpt-35-turbo-deployment-id>"
-        gpt-4-turbo: "<gpt-4-turbo-deployment-id>"
+        gpt-5.4: "<gpt-54-deployment-id>"
+        gpt-5.4-mini: "<gpt-54-mini-deployment-id>"
         ...
     ```
@@ -86,7 +86,7 @@ Keeps track of agent's actions and their outcomes. Provides their summary to the
 
 | Config variable        | Details                                                 | Type        | Default            |
 | ---------------------- | ------------------------------------------------------- | ----------- | ------------------ |
-| `llm_name`             | Name of the llm model used to compress the history      | `ModelName` | `"gpt-3.5-turbo"`  |
+| `llm_name`             | Name of the llm model used to compress the history      | `ModelName` | `"gpt-5.4-mini"`   |
 | `max_tokens`           | Maximum number of tokens to use for the history summary | `int`       | `1024`             |
 | `spacy_language_model` | Language model used for summary chunking using spacy    | `str`       | `"en_core_web_sm"` |
 | `full_message_count`   | Number of cycles to include unsummarized in the prompt  | `int`       | `4`                |
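The `full_message_count` policy (keep the last N cycles verbatim, compress the rest) can be sketched as follows; the string-join "summarizer" is a stand-in for the configured `llm_name` model, and the function itself is illustrative:

```python
# Sketch of the compression policy implied by full_message_count; the real
# component summarizes with the configured LLM, not a string join.
def compress_history(cycles, full_message_count=4):
    if len(cycles) <= full_message_count:
        return list(cycles)
    older, recent = cycles[:-full_message_count], cycles[-full_message_count:]
    # Stand-in for the LLM summarization step
    summary = "Summary of earlier cycles: " + "; ".join(older)
    return [summary] + recent
```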
@@ -181,7 +181,7 @@ Allows agent to read websites using Selenium.
 
 | Config variable | Details                                     | Type                                          | Default           |
 | --------------- | ------------------------------------------- | --------------------------------------------- | ----------------- |
-| `llm_name`      | Name of the llm model used to read websites | `ModelName`                                   | `"gpt-3.5-turbo"` |
+| `llm_name`      | Name of the llm model used to read websites | `ModelName`                                   | `"gpt-5.4-mini"`  |
 | `web_browser`   | Web browser used by Selenium                | `"chrome" \| "firefox" \| "safari" \| "edge"` | `"chrome"`        |
 | `headless`      | Run browser in headless mode                | `bool`                                        | `True`            |
 | `user_agent`    | User agent used by the browser              | `str`                                         | `"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"` |