Compare commits

..

2 Commits

Author SHA1 Message Date
Nicholas Tindle
6aed43d708 fix(classic): register finish result before task continuation
AgentFinished is caught before execute() registers a result, leaving
the finish episode with result=None. The interaction loop sees this as
"episode in progress" and reuses the old finish proposal instead of
calling the LLM for the new task. Register a success result before
continuing so the loop calls propose_action() for the new task.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 17:45:30 +02:00
Nicholas Tindle
17e1578c46 feat(classic): preserve action history across task continuations
Stop clearing episodes when the user enters a new task after finishing.
The compression system (4 recent episodes full, older ones summarized,
1024 token budget) already handles context overflow. Keeping history
lets the agent build on prior work instead of starting from zero.

Restart the process for a clean slate.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 18:36:23 +02:00
18 changed files with 150 additions and 121 deletions

View File

@@ -130,7 +130,7 @@ These examples show just a glimpse of what you can achieve with AutoGPT! You can
All code and content within the `autogpt_platform` folder is licensed under the Polyform Shield License. This new project is our in-development platform for building, deploying and managing agents.</br>_[Read more about this effort](https://agpt.co/blog/introducing-the-autogpt-platform)_
🦉 **MIT License:**
All other portions of the AutoGPT repository (i.e., everything outside the `autogpt_platform` folder) are licensed under the MIT License. This includes the original stand-alone AutoGPT Agent, along with projects such as [Forge](https://github.com/Significant-Gravitas/AutoGPT/tree/master/classic/forge) and the [Direct Benchmark](https://github.com/Significant-Gravitas/AutoGPT/tree/master/classic/direct_benchmark).</br>We also publish additional work under the MIT License in other repositories, such as [GravitasML](https://github.com/Significant-Gravitas/gravitasml) which is developed for and used in the AutoGPT Platform. See also our MIT Licensed [Code Ability](https://github.com/Significant-Gravitas/AutoGPT-Code-Ability) project.
All other portions of the AutoGPT repository (i.e., everything outside the `autogpt_platform` folder) are licensed under the MIT License. This includes the original stand-alone AutoGPT Agent, along with projects such as [Forge](https://github.com/Significant-Gravitas/AutoGPT/tree/master/classic/forge), [agbenchmark](https://github.com/Significant-Gravitas/AutoGPT/tree/master/classic/benchmark) and the [AutoGPT Classic GUI](https://github.com/Significant-Gravitas/AutoGPT/tree/master/classic/frontend).</br>We also publish additional work under the MIT License in other repositories, such as [GravitasML](https://github.com/Significant-Gravitas/gravitasml) which is developed for and used in the AutoGPT Platform. See also our MIT Licensed [Code Ability](https://github.com/Significant-Gravitas/AutoGPT-Code-Ability) project.
---
### Mission
@@ -150,7 +150,7 @@ Be part of the revolution! **AutoGPT** is here to stay, at the forefront of AI i
## 🤖 AutoGPT Classic
> Below is information about the classic version of AutoGPT.
**🛠️ [Build your own Agent - Forge](https://github.com/Significant-Gravitas/AutoGPT/tree/master/classic/forge)**
**🛠️ [Build your own Agent - Quickstart](classic/FORGE-QUICKSTART.md)**
### 🏗️ Forge
@@ -161,26 +161,46 @@ This guide will walk you through the process of creating your own agent and usin
📘 [Learn More](https://github.com/Significant-Gravitas/AutoGPT/tree/master/classic/forge) about Forge
### 🎯 Direct Benchmark
### 🎯 Benchmark
**Measure your agent's performance!** The `direct_benchmark` harness tests agents directly without the agent protocol overhead. It supports multiple prompt strategies (one_shot, reflexion, plan_execute, tree_of_thoughts, etc.) and model configurations, with parallel execution and detailed reporting.
**Measure your agent's performance!** The `agbenchmark` can be used with any agent that supports the agent protocol, and the integration with the project's [CLI] makes it even easier to use with AutoGPT and forge-based agents. The benchmark offers a stringent testing environment. Our framework allows for autonomous, objective performance evaluations, ensuring your agents are primed for real-world action.
📘 [Learn More](https://github.com/Significant-Gravitas/AutoGPT/tree/master/classic/direct_benchmark) about the Benchmark
<!-- TODO: insert visual demonstrating the benchmark -->
📦 [`agbenchmark`](https://pypi.org/project/agbenchmark/) on PyPI
&ensp;|&ensp;
📘 [Learn More](https://github.com/Significant-Gravitas/AutoGPT/tree/master/classic/benchmark) about the Benchmark
### 💻 UI
**Makes agents easy to use!** The `frontend` gives you a user-friendly interface to control and monitor your agents. It connects to agents through the [agent protocol](#-agent-protocol), ensuring compatibility with many agents from both inside and outside of our ecosystem.
<!-- TODO: insert screenshot of front end -->
The frontend works out-of-the-box with all agents in the repo. Just use the [CLI] to run your agent of choice!
📘 [Learn More](https://github.com/Significant-Gravitas/AutoGPT/tree/master/classic/frontend) about the Frontend
### ⌨️ CLI
[CLI]: #-cli
AutoGPT Classic is run via Poetry from the `classic/` directory:
To make it as easy as possible to use all of the tools offered by the repository, a CLI is included at the root of the repo:
```shell
cd classic
poetry install
poetry run autogpt # Interactive CLI mode
poetry run serve --debug # Agent Protocol server
$ ./run
Usage: cli.py [OPTIONS] COMMAND [ARGS]...
Options:
--help Show this message and exit.
Commands:
agent Commands to create, start and stop agents
benchmark Commands to start the benchmark and list tests and categories
setup Installs dependencies needed for your system.
```
See the [classic README](https://github.com/Significant-Gravitas/AutoGPT/tree/master/classic) for full setup instructions.
Just clone the repo, install dependencies with `./run setup`, and you should be good to go!
## 🤔 Questions? Problems? Suggestions?

View File

@@ -16,9 +16,9 @@ classic/
│ └── forge/ # Core agent framework package
├── original_autogpt/
│ └── autogpt/ # AutoGPT agent package
├── direct_benchmark/
│   ├── direct_benchmark/ # Benchmark harness package
│   └── challenges/ # Challenge definitions (data)
├── direct_benchmark/
│   ├── direct_benchmark/ # Benchmark harness package
│   └── benchmark/ # Challenge definitions (data, not code)
```
All packages are managed by a single `pyproject.toml` at the classic/ root.
@@ -112,7 +112,7 @@ The `forge` package is the foundation that other components depend on:
### Direct Benchmark
Benchmark harness for testing agent performance:
- `direct_benchmark/direct_benchmark/` - CLI and harness code
- `direct_benchmark/challenges/` - Test cases organized by category (code, retrieval, data, etc.)
- `benchmark/agbenchmark/challenges/` - Test cases organized by category (code, retrieval, data, etc.)
- Reports generated in `direct_benchmark/reports/`
### Package Structure

View File

@@ -24,7 +24,8 @@ classic/
├── poetry.lock # Single lock file
├── forge/ # Core autonomous agent framework
├── original_autogpt/ # Original implementation
└── direct_benchmark/ # Benchmark harness + challenge definitions
├── direct_benchmark/ # Benchmark harness
└── benchmark/ # Challenge definitions (data)
```
## Getting Started

View File

@@ -76,7 +76,7 @@ This folder contains all the files you want the agent to have in its workspace B
### artifacts_out
This folder contains all the files you would like the agent to generate. This folder is used to mock the agent.
This allows running the challenge with mock data to verify it works correctly.
This allows to run agbenchmark --test=TestExample --mock and make sure our challenge actually works.
### custom_python

View File

@@ -1,24 +1,13 @@
# Challenge Definitions
# This is the official challenge library for https://github.com/Significant-Gravitas/Auto-GPT-Benchmarks
This directory contains challenge data files used by the `direct_benchmark` harness.
The goal of this repo is to provide easy challenge creation for test driven development with the Auto-GPT-Benchmarks package. This is essentially a library to craft challenges using a dsl (jsons in this case).
Each challenge is a directory containing a `data.json` file that defines the task, ground truth, and evaluation criteria. See `CHALLENGE.md` for the data schema.
This is the up to date dependency graph: https://sapphire-denys-23.tiiny.site/
## Structure
### How to use
```
challenges/
├── abilities/ # Basic agent capabilities (read/write files)
├── alignment/ # Safety and alignment tests
├── verticals/ # Domain-specific challenges (code, data, scrape, etc.)
└── library/ # Additional challenge library
```
Make sure you have the package installed with `pip install agbenchmark`.
## Running Challenges
If you would just like to use the default challenges, don't worry about this repo. Just install the package and you will have access to the default challenges.
```bash
# From the classic/ directory
poetry run direct-benchmark run --tests ReadFile
poetry run direct-benchmark run --strategies one_shot --models claude
poetry run direct-benchmark run --help
```
To add new challenges as you develop, add this repo as a submodule to your `project/agbenchmark` folder. Any new challenges you add within the submodule will get registered automatically.
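Since the schema itself isn't reproduced in this diff, here is a rough illustration of the shape a challenge's `data.json` can take. The field names below are indicative only — `CHALLENGE.md` is the authoritative reference for the actual schema:

```json
{
  "name": "ReadFile",
  "category": ["abilities"],
  "task": "Read the contents of file_to_read.txt and write them to output.txt",
  "dependencies": ["WriteFile"],
  "cutoff": 60,
  "ground": {
    "answer": "The file contents, copied verbatim",
    "should_contain": ["Hello World"],
    "files": ["output.txt"],
    "eval": {"type": "file"}
  }
}
```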

View File

@@ -99,8 +99,8 @@ def load_component_configs(self, json: str) # Restore configs
**Configuration (`BaseAgentConfiguration`):**
```python
fast_llm: ModelName = "gpt-5.4"
smart_llm: ModelName = "gpt-5.4"
fast_llm: ModelName = "gpt-3.5-turbo-16k"
smart_llm: ModelName = "gpt-4"
big_brain: bool = True # Use smart_llm
cycle_budget: Optional[int] = 1 # Steps before approval needed
send_token_limit: Optional[int] # Prompt token budget
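If it helps to see the `big_brain` switch in isolation, here is a minimal sketch. The class name, `active_llm` property, and model strings are invented for illustration — they are not the forge API:

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative sketch only (not the real BaseAgentConfiguration):
# how big_brain plausibly selects between the two configured models.
@dataclass
class AgentConfigSketch:
    fast_llm: str = "fast-model"     # placeholder model name
    smart_llm: str = "smart-model"   # placeholder model name
    big_brain: bool = True           # use smart_llm when True
    cycle_budget: Optional[int] = 1  # steps before user approval is needed

    @property
    def active_llm(self) -> str:
        # big_brain toggles between the two configured models
        return self.smart_llm if self.big_brain else self.fast_llm

cfg = AgentConfigSketch()
assert cfg.active_llm == "smart-model"
assert AgentConfigSketch(big_brain=False).active_llm == "fast-model"
```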

View File

@@ -116,7 +116,7 @@ You can set sensitive variables in the `.json` file as well but it's recommended
"github_username": null
},
"ActionHistoryConfiguration": {
"llm_name": "gpt-5.4-mini",
"llm_name": "gpt-3.5-turbo",
"max_tokens": 1024,
"spacy_language_model": "en_core_web_sm"
},
@@ -129,7 +129,7 @@ You can set sensitive variables in the `.json` file as well but it's recommended
"duckduckgo_max_attempts": 3
},
"WebSeleniumConfiguration": {
"llm_name": "gpt-5.4-mini",
"llm_name": "gpt-3.5-turbo",
"web_browser": "chrome",
"headless": true,
"user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36",

View File

@@ -1,10 +1,8 @@
"""Test for cursor reset bug when clearing episode history between tasks.
"""Tests for EpisodicActionHistory cursor safety and task continuation.
Reproduces: IndexError in EpisodicActionHistory.current_episode when
episodes.clear() is called without resetting cursor to 0.
This is the exact crash from run_interaction_loop when the user starts a
second task after finishing the first one.
Covers:
- Cursor >= len guard in current_episode (prevents IndexError)
- History preserved across task changes (no clearing)
"""
from unittest.mock import MagicMock
@@ -16,42 +14,14 @@ def _make_history_with_episodes(n: int) -> EpisodicActionHistory:
"""Create a history with n completed episodes (cursor advanced past all)."""
history = EpisodicActionHistory()
for i in range(n):
# Directly append mock episodes and advance cursor,
# simulating what register_action + register_result does
ep = MagicMock()
ep.result = MagicMock() # has a result = completed
ep.result = MagicMock()
history.episodes.append(ep)
history.cursor += 1
return history
class TestEpisodicActionHistoryCursorReset:
def test_current_episode_after_clear_without_cursor_reset_crashes(self):
"""REPRODUCER: This is the exact bug.
After completing a task, the interaction loop clears episodes but
doesn't reset cursor. On the next task, current_episode does
`self[self.cursor]` where cursor > len(episodes) -> IndexError.
"""
history = _make_history_with_episodes(2)
assert history.cursor == 2
assert len(history.episodes) == 2
# This is what main.py line 759 does between tasks:
history.episodes.clear()
# cursor is still 2, but episodes is empty
assert history.cursor == 2
assert len(history.episodes) == 0
# This is what main.py line 687 calls at the start of the next task.
# BUG: cursor (2) != len(episodes) (0), so it falls through to
# self.episodes[2] on an empty list -> IndexError
#
# After the fix, this should return None (no current episode).
result = history.current_episode
assert result is None
class TestEpisodicActionHistoryCursor:
def test_current_episode_returns_none_on_empty_history(self):
history = EpisodicActionHistory()
assert history.current_episode is None
@@ -64,26 +34,48 @@ class TestEpisodicActionHistoryCursorReset:
def test_current_episode_returns_episode_when_cursor_valid(self):
history = EpisodicActionHistory()
ep = MagicMock()
ep.result = None # not yet completed
ep.result = None
history.episodes.append(ep)
history.cursor = 0
assert history.current_episode is ep
def test_clear_and_reset_allows_new_task(self):
"""After properly clearing episodes AND resetting cursor,
the history should work correctly for a new task."""
history = _make_history_with_episodes(3)
# Clean reset between tasks
history.episodes.clear()
history.cursor = 0
assert history.current_episode is None
assert len(history) == 0
def test_cursor_beyond_episodes_returns_none(self):
"""Any cursor value beyond the episode list should return None,
not raise IndexError."""
"""Any cursor value beyond the episode list should return None."""
history = EpisodicActionHistory()
history.cursor = 100 # way past empty list
history.cursor = 100
assert history.current_episode is None
def test_cursor_safe_after_clear(self):
"""Even if episodes are cleared without resetting cursor,
current_episode must not crash (>= guard)."""
history = _make_history_with_episodes(2)
history.episodes.clear()
assert history.cursor == 2
assert history.current_episode is None
class TestHistoryPreservedAcrossTasks:
def test_episodes_survive_task_change(self):
"""When user starts a new task, episodes from the previous task
should still be present — the compression system handles overflow."""
history = _make_history_with_episodes(3)
assert len(history.episodes) == 3
assert history.cursor == 3
# Simulate what main.py does on task change (no clearing)
# history is untouched — episodes remain
assert len(history.episodes) == 3
assert history.current_episode is None # cursor at end
def test_new_episode_appends_after_previous(self):
"""New task actions append to existing history."""
history = _make_history_with_episodes(2)
# New task starts — add a new episode
new_ep = MagicMock()
new_ep.result = None
history.episodes.append(new_ep)
# cursor still at 2, which is now the new episode
assert history.current_episode is new_ep
assert len(history.episodes) == 3
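The cursor guard these tests exercise can be sketched as a standalone mock (not the actual `forge` implementation — real episodes carry actions and results, elided here):

```python
from dataclasses import dataclass, field
from typing import Optional, List

@dataclass
class Episode:
    result: Optional[object] = None  # None = episode still in progress

@dataclass
class EpisodicActionHistorySketch:
    episodes: List[Episode] = field(default_factory=list)
    cursor: int = 0

    @property
    def current_episode(self) -> Optional[Episode]:
        # >= guard: any cursor at or past the end means "no current
        # episode", instead of indexing past the list and raising.
        if self.cursor >= len(self.episodes):
            return None
        return self.episodes[self.cursor]

# Stale cursor after a clear no longer crashes
history = EpisodicActionHistorySketch(cursor=100)
assert history.current_episode is None

# A valid cursor still returns the in-progress episode
history2 = EpisodicActionHistorySketch(episodes=[Episode()], cursor=0)
assert history2.current_episode is history2.episodes[0]
```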

View File

@@ -102,11 +102,11 @@
### LLM MODELS
################################################################################
## SMART_LLM - Smart language model (Default: gpt-5.4)
# SMART_LLM=gpt-5.4
## SMART_LLM - Smart language model (Default: gpt-4-turbo)
# SMART_LLM=gpt-4-turbo
## FAST_LLM - Fast language model (Default: gpt-5.4)
# FAST_LLM=gpt-5.4
## FAST_LLM - Fast language model (Default: gpt-3.5-turbo)
# FAST_LLM=gpt-3.5-turbo
## EMBEDDING_MODEL - Model to use for creating embeddings
# EMBEDDING_MODEL=text-embedding-3-small

View File

@@ -130,8 +130,8 @@ speak: str # What to say to user
**`AppConfig`** (Pydantic BaseModel):
```python
smart_llm: ModelName = "gpt-5.4" # Complex reasoning
fast_llm: ModelName = "gpt-5.4" # Fast operations
smart_llm: ModelName = "gpt-4-turbo" # Complex reasoning
fast_llm: ModelName = "gpt-3.5-turbo" # Fast operations
temperature: float = 0.0
continuous_mode: bool = False
continuous_limit: int = 0
@@ -254,8 +254,8 @@ config.disabled_commands.append("execute_python")
### Custom LLM
```bash
SMART_LLM=gpt-5.4
FAST_LLM=gpt-5.4-mini
SMART_LLM=gpt-4
FAST_LLM=gpt-3.5-turbo
TEMPERATURE=0.7
```

View File

@@ -17,7 +17,7 @@ Demo made by <a href=https://twitter.com/BlakeWerlinger>Blake Werlinger</a>
- 🔌 Agent Protocol ([docs](https://agentprotocol.ai))
- 💻 Easy to use UI
- 🌐 Internet access for searches and information gathering
- 🧠 Powered by GPT-5.4, Claude Opus 4.6, and other modern LLMs
- 🧠 Powered by a mix of GPT-4 and GPT-3.5 Turbo
- 🔗 Access to popular websites and platforms
- 🗃️ File generation and editing capabilities
- 🔌 Extensibility with Plugins

View File

@@ -754,10 +754,18 @@ async def run_interaction_loop(
logger.info("User chose to exit after task completion.")
return
# Start new task in same workspace
# Close the finish episode so the loop doesn't reuse it.
# AgentFinished is caught before execute() can register
# a result, leaving result=None — which the loop
# interprets as "episode in progress, reuse proposal".
from forge.models.action import ActionSuccessResult
agent.event_history.register_result(
ActionSuccessResult(outputs=e.message)
)
# Start new task in same workspace, keeping prior context
agent.state.task = next_task
agent.event_history.episodes.clear() # Clear history for fresh context
agent.event_history.cursor = 0
# Reset cycle budget for new task
cycles_remaining = _get_cycle_budget(
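The failure mode the first commit fixes can be shown with a toy loop. Every name here is hypothetical — this is not the real `run_interaction_loop`, just the "result=None means episode in progress, so reuse its proposal" behavior described in the commit message:

```python
# Toy model of the proposal-reuse logic (illustrative only).
class Ep:
    def __init__(self, proposal, result=None):
        self.proposal = proposal
        self.result = result  # None = in progress

class Hist:
    def __init__(self, episodes):
        self.episodes = episodes

    @property
    def current_episode(self):
        return self.episodes[-1] if self.episodes else None

def next_proposal(history, propose_action):
    episode = history.current_episode
    if episode is not None and episode.result is None:
        # Episode looks unfinished: reuse its pending proposal
        return episode.proposal
    # Otherwise ask the LLM for a fresh action
    return propose_action()

# Finish episode left with result=None → old "finish" proposal is reused
h = Hist([Ep("finish")])
assert next_proposal(h, lambda: "new") == "finish"

# After registering a success result, the loop asks for a new action
h.episodes[-1].result = "success"
assert next_proposal(h, lambda: "new") == "new"
```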

View File

@@ -2,6 +2,6 @@ azure_api_type: azure
azure_api_version: api-version-for-azure
azure_endpoint: your-azure-openai-endpoint
azure_model_map:
gpt-5.4: gpt54-deployment-id-for-azure
gpt-5.4-mini: gpt54mini-deployment-id-for-azure
gpt-3.5-turbo-0125: gpt35-deployment-id-for-azure
gpt-4-turbo-preview: gpt4-deployment-id-for-azure
text-embedding-3-small: embedding-deployment-id-for-azure

View File

@@ -21,12 +21,14 @@ To learn more about submitting and beating challenges, please visit the [List of
We look forward to your contributions and the exciting solutions that the community will develop together to make AutoGPT even better!
!!! info
!!! warning
The benchmark system has been replaced by `direct_benchmark`. Run benchmarks with:
We're slowly transitioning to agbenchmark. agbenchmark is a simpler way to improve AutoGPT. Simply run:
```
poetry run direct-benchmark run --strategies one_shot --models claude
agbenchmark
```
and beat as many challenges as possible.
For more options, see the [direct_benchmark README](https://github.com/Significant-Gravitas/AutoGPT/tree/master/classic/direct_benchmark).
For more agbenchmark options, look at the [readme](https://github.com/Significant-Gravitas/Auto-GPT-Benchmarks/tree/master/agbenchmark).

View File

@@ -14,7 +14,7 @@ You can set configuration variables via the `.env` file. If you don't have a `.e
- `ELEVENLABS_VOICE_ID`: ElevenLabs Voice ID. Optional.
- `EMBEDDING_MODEL`: LLM Model to use for embedding tasks. Default: `text-embedding-3-small`
- `EXIT_KEY`: Exit key accepted to exit. Default: n
- `FAST_LLM`: LLM Model to use for most tasks. Default: `gpt-5.4`
- `FAST_LLM`: LLM Model to use for most tasks. Default: `gpt-3.5-turbo-0125`
- `GITHUB_API_KEY`: [Github API Key](https://github.com/settings/tokens). Optional.
- `GITHUB_USERNAME`: GitHub Username. Optional.
- `GOOGLE_API_KEY`: Google API key. Optional.
@@ -28,7 +28,7 @@ You can set configuration variables via the `.env` file. If you don't have a `.e
- `PLAIN_OUTPUT`: Plain output, which disables the spinner. Default: False
- `RESTRICT_TO_WORKSPACE`: Restrict file reading and writing to the workspace directory. Default: True
- `SD_WEBUI_AUTH`: Stable Diffusion Web UI username:password pair. Optional.
- `SMART_LLM`: LLM Model to use for "smart" tasks. Default: `gpt-5.4`
- `SMART_LLM`: LLM Model to use for "smart" tasks. Default: `gpt-4-turbo-preview`
- `STREAMELEMENTS_VOICE`: StreamElements voice to use. Default: Brian
- `TEMPERATURE`: Value of temperature given to OpenAI. Value from 0 to 2. Lower is more deterministic, higher is more random. See https://platform.openai.com/docs/api-reference/completions/create#completions/create-temperature
- `TEXT_TO_SPEECH_PROVIDER`: Text to Speech Provider. Options are `gtts`, `macos`, `elevenlabs`, and `streamelements`. Default: gtts

View File

@@ -32,11 +32,12 @@ disciplines, as long as it can be done on a computer.
Welcome to the AutoGPT Classic Documentation.
The AutoGPT Classic project consists of three main components:
The AutoGPT Classic project consists of four main components:
- The [Agent](#agent) &ndash; also known as just "AutoGPT Classic"
- The [Benchmark](#benchmark) &ndash; `direct_benchmark`
- The [Benchmark](#benchmark) &ndash; AKA `agbenchmark`
- The [Forge](#forge)
- The [Frontend](#frontend)
To tie these together, we also have a [CLI] at the root of the project.
@@ -64,9 +65,15 @@ If you'd like to see what's next, check out the [AutoGPT Platform](../index.md).
## 🎯 Benchmark
**[🗒️ Readme](https://github.com/Significant-Gravitas/AutoGPT/blob/master/classic/direct_benchmark/README.md)**
**[🗒️ Readme](https://github.com/Significant-Gravitas/AutoGPT/blob/master/classic/benchmark/README.md)**
Measure your agent's performance! The `direct_benchmark` harness tests agents directly with support for multiple prompt strategies (one_shot, reflexion, plan_execute, tree_of_thoughts, etc.) and model configurations. It supports parallel execution and detailed reporting.
Measure your agent's performance! The `agbenchmark` can be used with any agent that supports the agent protocol, and the integration with the project's [CLI] makes it even easier to use with AutoGPT Classic and forge-based agents. The benchmark offers a stringent testing environment. Our framework allows for autonomous, objective performance evaluations, ensuring your agents are primed for real-world action.
<!-- TODO: insert visual demonstrating the benchmark -->
- 📦 [**`agbenchmark`**](https://pypi.org/project/agbenchmark/) on PyPI
- 🔌 **Agent Protocol Standardization** - AutoGPT Classic uses the agent protocol from the AI Engineer Foundation to ensure compatibility with many agents, both from within and outside the project.
---
@@ -84,6 +91,16 @@ Forge your own agent! The Forge is a ready-to-go template for your agent applica
---
## 💻 Frontend
**[🗒️ Readme](https://github.com/Significant-Gravitas/AutoGPT/blob/master/classic/frontend/README.md)**
An easy-to-use and open source frontend for any Agent Protocol-compliant agent.
- 🎮 **User-Friendly Interface** - Manage your agents effortlessly.
- 🔄 **Seamless Integration** - Smooth connectivity between your agent and our benchmarking system.
---
## 🔧 CLI

View File

@@ -104,7 +104,7 @@ If you don't know which to choose, you can safely go with OpenAI*.
!!! attention
To use AutoGPT with GPT-4 (recommended), you need to set up a paid OpenAI account
with some money in it. Please refer to OpenAI for further instructions ([link][openai/help-gpt-4-access]).
Free accounts are [limited][openai/api-limits] and may have reduced rate limits.
Free accounts are [limited][openai/api-limits] to GPT-3.5 with only 3 requests per minute.
1. Make sure you have a paid account with some credits set up: [Settings > Organization > Billing][openai/billing]
1. Get your OpenAI API key from: [API keys][openai/api-keys]
@@ -123,14 +123,14 @@ If you don't know which to choose, you can safely go with OpenAI*.
`azure_api_base`, `azure_api_version` and deployment IDs for the models that you
want to use.
E.g. if you want to use `gpt-5.4` and `gpt-5.4-mini`:
E.g. if you want to use `gpt-3.5-turbo` and `gpt-4-turbo`:
```yaml
# Please specify all of these values as double-quoted strings
# Replace string in angled brackets (<>) to your own deployment Name
azure_model_map:
gpt-5.4: "<gpt-54-deployment-id>"
gpt-5.4-mini: "<gpt-54-mini-deployment-id>"
gpt-3.5-turbo: "<gpt-35-turbo-deployment-id>"
gpt-4-turbo: "<gpt-4-turbo-deployment-id>"
...
```

View File

@@ -86,7 +86,7 @@ Keeps track of agent's actions and their outcomes. Provides their summary to the
| Config variable | Details | Type | Default |
| ---------------------- | ------------------------------------------------------- | ----------- | ------------------ |
| `llm_name` | Name of the llm model used to compress the history | `ModelName` | `"gpt-5.4-mini"` |
| `llm_name` | Name of the llm model used to compress the history | `ModelName` | `"gpt-3.5-turbo"` |
| `max_tokens` | Maximum number of tokens to use for the history summary | `int` | `1024` |
| `spacy_language_model` | Language model used for summary chunking using spacy | `str` | `"en_core_web_sm"` |
| `full_message_count` | Number of cycles to include unsummarized in the prompt | `int` | `4` |
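The compression policy described by this table (and relied on by the second commit to keep history across tasks) can be sketched as follows. This is not the real `ActionHistoryComponent` — the function name and the string stand-in for the LLM summarization call are invented for illustration:

```python
from typing import List

def compress(episodes: List[str], full_message_count: int = 4) -> List[str]:
    # Keep the most recent `full_message_count` episodes verbatim;
    # replace older ones with summaries (stand-in for the LLM call
    # that would also respect the max_tokens budget).
    older = episodes[:-full_message_count]
    recent = episodes[-full_message_count:]
    summarized = [f"summary({e})" for e in older]
    return summarized + recent

eps = [f"ep{i}" for i in range(6)]
assert compress(eps) == [
    "summary(ep0)", "summary(ep1)", "ep2", "ep3", "ep4", "ep5"
]
```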
@@ -181,7 +181,7 @@ Allows agent to read websites using Selenium.
| Config variable | Details | Type | Default |
| ----------------------------- | ------------------------------------------- | --------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------- |
| `llm_name` | Name of the llm model used to read websites | `ModelName` | `"gpt-5.4-mini"` |
| `llm_name` | Name of the llm model used to read websites | `ModelName` | `"gpt-3.5-turbo"` |
| `web_browser` | Web browser used by Selenium | `"chrome" \| "firefox" \| "safari" \| "edge"` | `"chrome"` |
| `headless` | Run browser in headless mode | `bool` | `True` |
| `user_agent` | User agent used by the browser | `str` | `"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"` |