mirror of https://github.com/Significant-Gravitas/AutoGPT.git
synced 2026-04-08 03:00:28 -04:00

Compare commits: `feat/keep-...fix/classi` (1 commit)

Commit: 9cad616950

42 README.md
@@ -130,7 +130,7 @@ These examples show just a glimpse of what you can achieve with AutoGPT! You can
 All code and content within the `autogpt_platform` folder is licensed under the Polyform Shield License. This new project is our in-development platform for building, deploying and managing agents.</br>_[Read more about this effort](https://agpt.co/blog/introducing-the-autogpt-platform)_
 
 🦉 **MIT License:**
 
-All other portions of the AutoGPT repository (i.e., everything outside the `autogpt_platform` folder) are licensed under the MIT License. This includes the original stand-alone AutoGPT Agent, along with projects such as [Forge](https://github.com/Significant-Gravitas/AutoGPT/tree/master/classic/forge), [agbenchmark](https://github.com/Significant-Gravitas/AutoGPT/tree/master/classic/benchmark) and the [AutoGPT Classic GUI](https://github.com/Significant-Gravitas/AutoGPT/tree/master/classic/frontend).</br>We also publish additional work under the MIT Licence in other repositories, such as [GravitasML](https://github.com/Significant-Gravitas/gravitasml) which is developed for and used in the AutoGPT Platform. See also our MIT Licenced [Code Ability](https://github.com/Significant-Gravitas/AutoGPT-Code-Ability) project.
+All other portions of the AutoGPT repository (i.e., everything outside the `autogpt_platform` folder) are licensed under the MIT License. This includes the original stand-alone AutoGPT Agent, along with projects such as [Forge](https://github.com/Significant-Gravitas/AutoGPT/tree/master/classic/forge) and the [Direct Benchmark](https://github.com/Significant-Gravitas/AutoGPT/tree/master/classic/direct_benchmark).</br>We also publish additional work under the MIT License in other repositories, such as [GravitasML](https://github.com/Significant-Gravitas/gravitasml) which is developed for and used in the AutoGPT Platform. See also our MIT-licensed [Code Ability](https://github.com/Significant-Gravitas/AutoGPT-Code-Ability) project.
 
 ---
 
 ### Mission
@@ -150,7 +150,7 @@ Be part of the revolution! **AutoGPT** is here to stay, at the forefront of AI i
 ## 🤖 AutoGPT Classic
 
 > Below is information about the classic version of AutoGPT.
 
-**🛠️ [Build your own Agent - Quickstart](classic/FORGE-QUICKSTART.md)**
+**🛠️ [Build your own Agent - Forge](https://github.com/Significant-Gravitas/AutoGPT/tree/master/classic/forge)**
 
 ### 🏗️ Forge
@@ -161,46 +161,26 @@ This guide will walk you through the process of creating your own agent and usin
 
 📘 [Learn More](https://github.com/Significant-Gravitas/AutoGPT/tree/master/classic/forge) about Forge
 
-### 🎯 Benchmark
+### 🎯 Direct Benchmark
 
-**Measure your agent's performance!** The `agbenchmark` can be used with any agent that supports the agent protocol, and the integration with the project's [CLI] makes it even easier to use with AutoGPT and forge-based agents. The benchmark offers a stringent testing environment. Our framework allows for autonomous, objective performance evaluations, ensuring your agents are primed for real-world action.
+**Measure your agent's performance!** The `direct_benchmark` harness tests agents directly without the agent protocol overhead. It supports multiple prompt strategies (one_shot, reflexion, plan_execute, tree_of_thoughts, etc.) and model configurations, with parallel execution and detailed reporting.
 
 <!-- TODO: insert visual demonstrating the benchmark -->
 
-📦 [`agbenchmark`](https://pypi.org/project/agbenchmark/) on Pypi
-
-📘 [Learn More](https://github.com/Significant-Gravitas/AutoGPT/tree/master/classic/benchmark) about the Benchmark
-
-### 💻 UI
-
-**Makes agents easy to use!** The `frontend` gives you a user-friendly interface to control and monitor your agents. It connects to agents through the [agent protocol](#-agent-protocol), ensuring compatibility with many agents from both inside and outside of our ecosystem.
-
-<!-- TODO: insert screenshot of front end -->
-
-The frontend works out-of-the-box with all agents in the repo. Just use the [CLI] to run your agent of choice!
-
-📘 [Learn More](https://github.com/Significant-Gravitas/AutoGPT/tree/master/classic/frontend) about the Frontend
+📘 [Learn More](https://github.com/Significant-Gravitas/AutoGPT/tree/master/classic/direct_benchmark) about the Benchmark
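The strategy names listed above (one_shot, reflexion, and so on) are selected by name on the CLI. A tiny registry sketch shows one way such dispatch could work; the decorator, function names, and prompt wording here are illustrative assumptions, not the actual `direct_benchmark` API:

```python
# Hypothetical sketch of a prompt-strategy registry; names are illustrative,
# not the real direct_benchmark internals.
from typing import Callable, Dict

STRATEGIES: Dict[str, Callable[[str], str]] = {}

def strategy(name: str):
    """Register a prompt strategy under a CLI-selectable name."""
    def register(fn: Callable[[str], str]):
        STRATEGIES[name] = fn
        return fn
    return register

@strategy("one_shot")
def one_shot(task: str) -> str:
    # Single prompt, single answer
    return f"Task: {task}\nAnswer directly."

@strategy("reflexion")
def reflexion(task: str) -> str:
    # Answer, then critique and revise
    return f"Task: {task}\nAnswer, critique your answer, then revise."

def build_prompt(task: str, strategy_name: str = "one_shot") -> str:
    return STRATEGIES[strategy_name](task)
```

A registry like this is what lets `--strategies one_shot,reflexion` remain a plain string flag while new strategies are added without touching the CLI code.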
 ### ⌨️ CLI
 
 [CLI]: #-cli
 
-To make it as easy as possible to use all of the tools offered by the repository, a CLI is included at the root of the repo:
+AutoGPT Classic is run via Poetry from the `classic/` directory:
 
 ```shell
-$ ./run
-Usage: cli.py [OPTIONS] COMMAND [ARGS]...
-
-Options:
-  --help  Show this message and exit.
-
-Commands:
-  agent      Commands to create, start and stop agents
-  benchmark  Commands to start the benchmark and list tests and categories
-  setup      Installs dependencies needed for your system.
+cd classic
+poetry install
+poetry run autogpt        # Interactive CLI mode
+poetry run serve --debug  # Agent Protocol server
 ```
 
-Just clone the repo, install dependencies with `./run setup`, and you should be good to go!
+See the [classic README](https://github.com/Significant-Gravitas/AutoGPT/tree/master/classic) for full setup instructions.
 
 ## 🤔 Questions? Problems? Suggestions?
@@ -16,9 +16,9 @@ classic/
 │   └── forge/              # Core agent framework package
 ├── original_autogpt/
 │   └── autogpt/            # AutoGPT agent package
-├── direct_benchmark/
-│   └── direct_benchmark/   # Benchmark harness package
-└── benchmark/              # Challenge definitions (data, not code)
+└── direct_benchmark/
+    ├── direct_benchmark/   # Benchmark harness package
+    └── challenges/         # Challenge definitions (data)
 ```
 
 All packages are managed by a single `pyproject.toml` at the classic/ root.
@@ -112,7 +112,7 @@ The `forge` package is the foundation that other components depend on:
 ### Direct Benchmark
 Benchmark harness for testing agent performance:
 - `direct_benchmark/direct_benchmark/` - CLI and harness code
-- `benchmark/agbenchmark/challenges/` - Test cases organized by category (code, retrieval, data, etc.)
+- `direct_benchmark/challenges/` - Test cases organized by category (code, retrieval, data, etc.)
 - Reports generated in `direct_benchmark/reports/`
 
 ### Package Structure
@@ -24,8 +24,7 @@ classic/
 ├── poetry.lock          # Single lock file
 ├── forge/               # Core autonomous agent framework
 ├── original_autogpt/    # Original implementation
-├── direct_benchmark/    # Benchmark harness
-└── benchmark/           # Challenge definitions (data)
+└── direct_benchmark/    # Benchmark harness + challenge definitions
 ```
 
 ## Getting Started
@@ -76,7 +76,7 @@ This folder contains all the files you want the agent to have in its workspace B
 ### artifacts_out
 
 This folder contains all the files you would like the agent to generate. This folder is used to mock the agent.
-This allows to run agbenchmark --test=TestExample --mock and make sure our challenge actually works.
+This allows running the challenge with mock data to verify it works correctly.
 
 ### custom_python
@@ -1,13 +1,24 @@
-# This is the official challenge library for https://github.com/Significant-Gravitas/Auto-GPT-Benchmarks
+# Challenge Definitions
 
-The goal of this repo is to provide easy challenge creation for test driven development with the Auto-GPT-Benchmarks package. This is essentially a library to craft challenges using a dsl (jsons in this case).
+This directory contains challenge data files used by the `direct_benchmark` harness.
 
-This is the up to date dependency graph: https://sapphire-denys-23.tiiny.site/
+Each challenge is a directory containing a `data.json` file that defines the task, ground truth, and evaluation criteria. See `CHALLENGE.md` for the data schema.
 
-### How to use
+## Structure
 
-Make sure you have the package installed with `pip install agbenchmark`.
+```
+challenges/
+├── abilities/   # Basic agent capabilities (read/write files)
+├── alignment/   # Safety and alignment tests
+├── verticals/   # Domain-specific challenges (code, data, scrape, etc.)
+└── library/     # Additional challenge library
+```
 
-If you would just like to use the default challenges, don't worry about this repo. Just install the package and you will have access to the default challenges.
+## Running Challenges
 
-To add new challenges as you develop, add this repo as a submodule to your `project/agbenchmark` folder. Any new challenges you add within the submodule will get registered automatically.
+```bash
+# From the classic/ directory
+poetry run direct-benchmark run --tests ReadFile
+poetry run direct-benchmark run --strategies one_shot --models claude
+poetry run direct-benchmark run --help
+```
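Since each challenge is defined by a `data.json` with a task, ground truth, and evaluation criteria, a loader only needs to parse and sanity-check those fields. The field names in this sketch are an assumption about the schema (the authoritative layout is in `CHALLENGE.md`):

```python
# Illustrative only: the field names below are assumed, not taken from the
# real schema documented in CHALLENGE.md.
import json

challenge = {
    "name": "ReadFile",
    "category": ["abilities"],
    "task": "Read the file sample.txt and output its contents.",
    "ground": {"answer": "Hello World", "files": ["output.txt"]},
}

def validate_challenge(raw: str) -> dict:
    """Parse a challenge definition and check the minimal required fields."""
    data = json.loads(raw)
    for field in ("name", "task", "ground"):
        if field not in data:
            raise ValueError(f"missing field: {field}")
    return data

parsed = validate_challenge(json.dumps(challenge))
```

Validating at load time keeps a malformed challenge directory from surfacing as a confusing failure deep inside a benchmark run.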
@@ -99,8 +99,8 @@ def load_component_configs(self, json: str)  # Restore configs
 
 **Configuration (`BaseAgentConfiguration`):**
 ```python
-fast_llm: ModelName = "gpt-3.5-turbo-16k"
-smart_llm: ModelName = "gpt-4"
+fast_llm: ModelName = "gpt-5.4"
+smart_llm: ModelName = "gpt-5.4"
 big_brain: bool = True  # Use smart_llm
 cycle_budget: Optional[int] = 1  # Steps before approval needed
 send_token_limit: Optional[int]  # Prompt token budget
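The interplay of these fields can be sketched with a plain dataclass; the real `BaseAgentConfiguration` is a pydantic model, and the `active_llm()` helper is an illustrative assumption showing what `big_brain` controls:

```python
# Dataclass approximation of the fields above; active_llm() is a hypothetical
# helper, not part of the real class.
from dataclasses import dataclass
from typing import Optional

@dataclass
class AgentConfigSketch:
    fast_llm: str = "gpt-5.4"
    smart_llm: str = "gpt-5.4"
    big_brain: bool = True          # Use smart_llm
    cycle_budget: Optional[int] = 1  # Steps before approval needed

    def active_llm(self) -> str:
        # big_brain switches the agent onto the smart model
        return self.smart_llm if self.big_brain else self.fast_llm
```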
@@ -116,7 +116,7 @@ You can set sensitive variables in the `.json` file as well but it's recommended
     "github_username": null
   },
   "ActionHistoryConfiguration": {
-    "llm_name": "gpt-3.5-turbo",
+    "llm_name": "gpt-5.4-mini",
     "max_tokens": 1024,
     "spacy_language_model": "en_core_web_sm"
   },
@@ -129,7 +129,7 @@ You can set sensitive variables in the `.json` file as well but it's recommended
     "duckduckgo_max_attempts": 3
   },
   "WebSeleniumConfiguration": {
-    "llm_name": "gpt-3.5-turbo",
+    "llm_name": "gpt-5.4-mini",
     "web_browser": "chrome",
     "headless": true,
     "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36",
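The per-component configs above are keyed by component name, so a dump/restore round trip is essentially JSON serialization of a nested mapping. This dict-based sketch approximates what `dump_component_configs`/`load_component_configs` do; the real configs are pydantic models:

```python
# A minimal dump/restore sketch; the real component configs are pydantic
# models, so this dict round trip is an approximation.
import json

configs = {
    "ActionHistoryConfiguration": {"llm_name": "gpt-5.4-mini", "max_tokens": 1024},
    "WebSeleniumConfiguration": {"llm_name": "gpt-5.4-mini", "web_browser": "chrome"},
}

serialized = json.dumps(configs, indent=2)  # dump_component_configs analogue
restored = json.loads(serialized)           # load_component_configs analogue
```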
@@ -1,8 +1,10 @@
-"""Tests for EpisodicActionHistory cursor safety and task continuation.
+"""Test for cursor reset bug when clearing episode history between tasks.
 
-Covers:
-- Cursor >= len guard in current_episode (prevents IndexError)
-- History preserved across task changes (no clearing)
+Reproduces: IndexError in EpisodicActionHistory.current_episode when
+episodes.clear() is called without resetting cursor to 0.
+
+This is the exact crash from run_interaction_loop when the user starts a
+second task after finishing the first one.
 """
 
 from unittest.mock import MagicMock
@@ -14,14 +16,42 @@ def _make_history_with_episodes(n: int) -> EpisodicActionHistory:
     """Create a history with n completed episodes (cursor advanced past all)."""
     history = EpisodicActionHistory()
     for i in range(n):
+        # Directly append mock episodes and advance cursor,
+        # simulating what register_action + register_result does
         ep = MagicMock()
-        ep.result = MagicMock()
+        ep.result = MagicMock()  # has a result = completed
         history.episodes.append(ep)
         history.cursor += 1
     return history
 
 
-class TestEpisodicActionHistoryCursor:
+class TestEpisodicActionHistoryCursorReset:
+    def test_current_episode_after_clear_without_cursor_reset_crashes(self):
+        """REPRODUCER: This is the exact bug.
+
+        After completing a task, the interaction loop clears episodes but
+        doesn't reset cursor. On the next task, current_episode does
+        `self[self.cursor]` where cursor > len(episodes) -> IndexError.
+        """
+        history = _make_history_with_episodes(2)
+        assert history.cursor == 2
+        assert len(history.episodes) == 2
+
+        # This is what main.py line 759 does between tasks:
+        history.episodes.clear()
+
+        # cursor is still 2, but episodes is empty
+        assert history.cursor == 2
+        assert len(history.episodes) == 0
+
+        # This is what main.py line 687 calls at the start of the next task.
+        # BUG: cursor (2) != len(episodes) (0), so it falls through to
+        # self.episodes[2] on an empty list -> IndexError
+        #
+        # After the fix, this should return None (no current episode).
+        result = history.current_episode
+        assert result is None
+
     def test_current_episode_returns_none_on_empty_history(self):
         history = EpisodicActionHistory()
         assert history.current_episode is None
@@ -34,48 +64,26 @@ class TestEpisodicActionHistoryCursor:
     def test_current_episode_returns_episode_when_cursor_valid(self):
         history = EpisodicActionHistory()
         ep = MagicMock()
-        ep.result = None
+        ep.result = None  # not yet completed
         history.episodes.append(ep)
         history.cursor = 0
         assert history.current_episode is ep
 
-    def test_cursor_beyond_episodes_returns_none(self):
-        """Any cursor value beyond the episode list should return None."""
-        history = EpisodicActionHistory()
-        history.cursor = 100
-        assert history.current_episode is None
-
-    def test_cursor_safe_after_clear(self):
-        """Even if episodes are cleared without resetting cursor,
-        current_episode must not crash (>= guard)."""
-        history = _make_history_with_episodes(2)
-        history.episodes.clear()
-        assert history.cursor == 2
-        assert history.current_episode is None
-
-
-class TestHistoryPreservedAcrossTasks:
-    def test_episodes_survive_task_change(self):
-        """When user starts a new task, episodes from the previous task
-        should still be present — the compression system handles overflow."""
+    def test_clear_and_reset_allows_new_task(self):
+        """After properly clearing episodes AND resetting cursor,
+        the history should work correctly for a new task."""
         history = _make_history_with_episodes(3)
         assert len(history.episodes) == 3
         assert history.cursor == 3
 
-        # Simulate what main.py does on task change (no clearing)
-        # history is untouched — episodes remain
+        # Clean reset between tasks
+        history.episodes.clear()
+        history.cursor = 0
 
-        assert len(history.episodes) == 3
-        assert history.current_episode is None  # cursor at end
+        assert history.current_episode is None
+        assert len(history) == 0
 
-    def test_new_episode_appends_after_previous(self):
-        """New task actions append to existing history."""
-        history = _make_history_with_episodes(2)
-
-        # New task starts — add a new episode
-        new_ep = MagicMock()
-        new_ep.result = None
-        history.episodes.append(new_ep)
-        # cursor still at 2, which is now the new episode
-        assert history.current_episode is new_ep
-        assert len(history.episodes) == 3
+    def test_cursor_beyond_episodes_returns_none(self):
+        """Any cursor value beyond the episode list should return None,
+        not raise IndexError."""
+        history = EpisodicActionHistory()
+        history.cursor = 100  # way past empty list
+        assert history.current_episode is None
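The `>= len` guard these tests exercise can be sketched in a few lines. This is a minimal stand-in for the forge class, assuming only the `episodes`/`cursor` fields and `current_episode` property described above:

```python
# Minimal sketch of the cursor guard under test; the real
# EpisodicActionHistory in forge carries much more behavior.
class EpisodicActionHistory:
    def __init__(self):
        self.episodes = []
        self.cursor = 0

    def __len__(self):
        return len(self.episodes)

    @property
    def current_episode(self):
        # The guard: a cursor at or past the end means "no current episode"
        # instead of an IndexError on a cleared (or empty) list.
        if self.cursor >= len(self.episodes):
            return None
        return self.episodes[self.cursor]
```

With the guard in place, `episodes.clear()` followed by a stale cursor degrades to `None` rather than crashing the interaction loop.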
@@ -102,11 +102,11 @@
 ### LLM MODELS
 ################################################################################
 
-## SMART_LLM - Smart language model (Default: gpt-4-turbo)
-# SMART_LLM=gpt-4-turbo
+## SMART_LLM - Smart language model (Default: gpt-5.4)
+# SMART_LLM=gpt-5.4
 
-## FAST_LLM - Fast language model (Default: gpt-3.5-turbo)
-# FAST_LLM=gpt-3.5-turbo
+## FAST_LLM - Fast language model (Default: gpt-5.4)
+# FAST_LLM=gpt-5.4
 
 ## EMBEDDING_MODEL - Model to use for creating embeddings
 # EMBEDDING_MODEL=text-embedding-3-small
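Because both variables are commented out by default in the template, the application has to fall back to the documented defaults when they are unset. A resolution helper might look like this; the function name is an assumption, only the variable names and defaults come from the template above:

```python
# Hypothetical resolver for the .env defaults above; resolve_llms() itself
# is illustrative, not part of the codebase.
import os

def resolve_llms(env=None):
    env = os.environ if env is None else env
    smart = env.get("SMART_LLM", "gpt-5.4")
    fast = env.get("FAST_LLM", "gpt-5.4")
    return smart, fast
```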
@@ -130,8 +130,8 @@ speak: str  # What to say to user
 
 **`AppConfig`** (Pydantic BaseModel):
 ```python
-smart_llm: ModelName = "gpt-4-turbo"   # Complex reasoning
-fast_llm: ModelName = "gpt-3.5-turbo"  # Fast operations
+smart_llm: ModelName = "gpt-5.4"  # Complex reasoning
+fast_llm: ModelName = "gpt-5.4"   # Fast operations
 temperature: float = 0.0
 continuous_mode: bool = False
 continuous_limit: int = 0
@@ -254,8 +254,8 @@ config.disabled_commands.append("execute_python")
 
 ### Custom LLM
 ```bash
-SMART_LLM=gpt-4
-FAST_LLM=gpt-3.5-turbo
+SMART_LLM=gpt-5.4
+FAST_LLM=gpt-5.4-mini
 TEMPERATURE=0.7
 ```
@@ -17,7 +17,7 @@ Demo made by <a href=https://twitter.com/BlakeWerlinger>Blake Werlinger</a>
 - 🔌 Agent Protocol ([docs](https://agentprotocol.ai))
 - 💻 Easy to use UI
 - 🌐 Internet access for searches and information gathering
-- 🧠 Powered by a mix of GPT-4 and GPT-3.5 Turbo
+- 🧠 Powered by GPT-5.4, Claude Opus 4.6, and other modern LLMs
 - 🔗 Access to popular websites and platforms
 - 🗃️ File generation and editing capabilities
 - 🔌 Extensibility with Plugins
@@ -754,18 +754,10 @@ async def run_interaction_loop(
                 logger.info("User chose to exit after task completion.")
                 return
 
-            # Close the finish episode so the loop doesn't reuse it.
-            # AgentFinished is caught before execute() can register
-            # a result, leaving result=None — which the loop
-            # interprets as "episode in progress, reuse proposal".
-            from forge.models.action import ActionSuccessResult
-
-            agent.event_history.register_result(
-                ActionSuccessResult(outputs=e.message)
-            )
-
-            # Start new task in same workspace, keeping prior context
+            # Start new task in same workspace
             agent.state.task = next_task
+            agent.event_history.episodes.clear()  # Clear history for fresh context
+            agent.event_history.cursor = 0
 
             # Reset cycle budget for new task
             cycles_remaining = _get_cycle_budget(
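The key point of the hunk above is that `episodes.clear()` and `cursor = 0` must happen together; a stale cursor on an emptied list is exactly the IndexError the tests reproduce. A reduced sketch of that pairing, with a hypothetical `reset_for_new_task` helper standing in for the inline code in `run_interaction_loop`:

```python
# Sketch only: reset_for_new_task() is a hypothetical helper bundling the
# two lines the real loop performs inline between tasks.
class EventHistory:
    def __init__(self):
        self.episodes = []
        self.cursor = 0

    def reset_for_new_task(self):
        self.episodes.clear()
        self.cursor = 0  # omitting this line is the bug the tests cover

history = EventHistory()
history.episodes.extend(["step1", "step2"])
history.cursor = 2
history.reset_for_new_task()
```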
@@ -2,6 +2,6 @@ azure_api_type: azure
 azure_api_version: api-version-for-azure
 azure_endpoint: your-azure-openai-endpoint
 azure_model_map:
-  gpt-3.5-turbo-0125: gpt35-deployment-id-for-azure
-  gpt-4-turbo-preview: gpt4-deployment-id-for-azure
+  gpt-5.4: gpt54-deployment-id-for-azure
+  gpt-5.4-mini: gpt54mini-deployment-id-for-azure
   text-embedding-3-small: embedding-deployment-id-for-azure
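Azure OpenAI addresses models by deployment ID rather than model name, which is why the map above exists. A lookup sketch shows the translation; `deployment_for` is a hypothetical helper, only the mapping mirrors the YAML:

```python
# Illustrative lookup against the azure_model_map from the YAML above;
# deployment_for() is not a real function in the codebase.
azure_model_map = {
    "gpt-5.4": "gpt54-deployment-id-for-azure",
    "gpt-5.4-mini": "gpt54mini-deployment-id-for-azure",
    "text-embedding-3-small": "embedding-deployment-id-for-azure",
}

def deployment_for(model: str) -> str:
    try:
        return azure_model_map[model]
    except KeyError:
        raise ValueError(f"no Azure deployment configured for {model!r}")
```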
@@ -21,14 +21,12 @@ To learn more about submitting and beating challenges, please visit the [List of
 
 We look forward to your contributions and the exciting solutions that the community will develop together to make AutoGPT even better!
 
-!!! warning
+!!! info
 
-    We're slowly transitioning to agbenchmark. agbenchmark is a simpler way to improve AutoGPT. Simply run:
+    The benchmark system has been replaced by `direct_benchmark`. Run benchmarks with:
 
     ```
-    agbenchmark
+    poetry run direct-benchmark run --strategies one_shot --models claude
     ```
 
-    and beat as many challenges as possible.
-
-For more agbenchmark options, look at the [readme](https://github.com/Significant-Gravitas/Auto-GPT-Benchmarks/tree/master/agbenchmark).
+For more options, see the [direct_benchmark README](https://github.com/Significant-Gravitas/AutoGPT/tree/master/classic/direct_benchmark).
@@ -14,7 +14,7 @@ You can set configuration variables via the `.env` file. If you don't have a `.e
 - `ELEVENLABS_VOICE_ID`: ElevenLabs Voice ID. Optional.
 - `EMBEDDING_MODEL`: LLM Model to use for embedding tasks. Default: `text-embedding-3-small`
 - `EXIT_KEY`: Exit key accepted to exit. Default: n
-- `FAST_LLM`: LLM Model to use for most tasks. Default: `gpt-3.5-turbo-0125`
+- `FAST_LLM`: LLM Model to use for most tasks. Default: `gpt-5.4`
 - `GITHUB_API_KEY`: [Github API Key](https://github.com/settings/tokens). Optional.
 - `GITHUB_USERNAME`: GitHub Username. Optional.
 - `GOOGLE_API_KEY`: Google API key. Optional.
@@ -28,7 +28,7 @@ You can set configuration variables via the `.env` file. If you don't have a `.e
 - `PLAIN_OUTPUT`: Plain output, which disables the spinner. Default: False
 - `RESTRICT_TO_WORKSPACE`: Restrict file reading and writing to the workspace directory. Default: True
 - `SD_WEBUI_AUTH`: Stable Diffusion Web UI username:password pair. Optional.
-- `SMART_LLM`: LLM Model to use for "smart" tasks. Default: `gpt-4-turbo-preview`
+- `SMART_LLM`: LLM Model to use for "smart" tasks. Default: `gpt-5.4`
 - `STREAMELEMENTS_VOICE`: StreamElements voice to use. Default: Brian
 - `TEMPERATURE`: Value of temperature given to OpenAI. Value from 0 to 2. Lower is more deterministic, higher is more random. See https://platform.openai.com/docs/api-reference/completions/create#completions/create-temperature
 - `TEXT_TO_SPEECH_PROVIDER`: Text to Speech Provider. Options are `gtts`, `macos`, `elevenlabs`, and `streamelements`. Default: gtts
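Since `TEMPERATURE` arrives as a string from the environment and is documented as 0 to 2, parsing it defensively is worth a few lines. The helper name and clamping behavior here are assumptions, only the range comes from the docs above:

```python
# Hypothetical parser for the TEMPERATURE variable; clamping to the
# documented 0-2 range is an illustrative choice, not confirmed behavior.
def parse_temperature(raw, default: float = 0.0) -> float:
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return default
    return min(2.0, max(0.0, value))
```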
@@ -32,12 +32,11 @@ disciplines, as long as it can be done on a computer.
 
 Welcome to the AutoGPT Classic Documentation.
 
-The AutoGPT Classic project consists of four main components:
+The AutoGPT Classic project consists of three main components:
 
 - The [Agent](#agent) – also known as just "AutoGPT Classic"
-- The [Benchmark](#benchmark) – AKA `agbenchmark`
+- The [Benchmark](#benchmark) – `direct_benchmark`
 - The [Forge](#forge)
-- The [Frontend](#frontend)
 
 To tie these together, we also have a [CLI] at the root of the project.
@@ -65,15 +64,9 @@ If you'd like to see what's next, check out the [AutoGPT Platform](../index.md).
 
 ## 🎯 Benchmark
 
-**[🗒️ Readme](https://github.com/Significant-Gravitas/AutoGPT/blob/master/classic/benchmark/README.md)**
+**[🗒️ Readme](https://github.com/Significant-Gravitas/AutoGPT/blob/master/classic/direct_benchmark/README.md)**
 
-Measure your agent's performance! The `agbenchmark` can be used with any agent that supports the agent protocol, and the integration with the project's [CLI] makes it even easier to use with AutoGPT Classic and forge-based agents. The benchmark offers a stringent testing environment. Our framework allows for autonomous, objective performance evaluations, ensuring your agents are primed for real-world action.
-
-<!-- TODO: insert visual demonstrating the benchmark -->
-
-- 📦 [**`agbenchmark`**](https://pypi.org/project/agbenchmark/) on Pypi
-
-- 🔌 **Agent Protocol Standardization** - AutoGPT Classic uses the agent protocol from the AI Engineer Foundation to ensure compatibility with many agents, both from within and outside the project.
+Measure your agent's performance! The `direct_benchmark` harness tests agents directly with support for multiple prompt strategies (one_shot, reflexion, plan_execute, tree_of_thoughts, etc.) and model configurations. It supports parallel execution and detailed reporting.
 
 ---
@@ -91,16 +84,6 @@ Forge your own agent! The Forge is a ready-to-go template for your agent applica
 
 ---
 
-## 💻 Frontend
-
-**[🗒️ Readme](https://github.com/Significant-Gravitas/AutoGPT/blob/master/classic/frontend/README.md)**
-
-An easy-to-use and open source frontend for any Agent Protocol-compliant agent.
-
-- 🎮 **User-Friendly Interface** - Manage your agents effortlessly.
-
-- 🔄 **Seamless Integration** - Smooth connectivity between your agent and our benchmarking system.
-
----
-
 ## 🔧 CLI
@@ -104,7 +104,7 @@ If you don't know which to choose, you can safely go with OpenAI*.
 !!! attention
     To use AutoGPT with GPT-4 (recommended), you need to set up a paid OpenAI account
     with some money in it. Please refer to OpenAI for further instructions ([link][openai/help-gpt-4-access]).
-    Free accounts are [limited][openai/api-limits] to GPT-3.5 with only 3 requests per minute.
+    Free accounts are [limited][openai/api-limits] and may have reduced rate limits.
 
 1. Make sure you have a paid account with some credits set up: [Settings > Organization > Billing][openai/billing]
 1. Get your OpenAI API key from: [API keys][openai/api-keys]
@@ -123,14 +123,14 @@ If you don't know which to choose, you can safely go with OpenAI*.
     `azure_api_base`, `azure_api_version` and deployment IDs for the models that you
     want to use.
 
-    E.g. if you want to use `gpt-3.5-turbo` and `gpt-4-turbo`:
+    E.g. if you want to use `gpt-5.4` and `gpt-5.4-mini`:
 
     ```yaml
     # Please specify all of these values as double-quoted strings
     # Replace string in angled brackets (<>) to your own deployment Name
     azure_model_map:
-        gpt-3.5-turbo: "<gpt-35-turbo-deployment-id>"
-        gpt-4-turbo: "<gpt-4-turbo-deployment-id>"
+        gpt-5.4: "<gpt-54-deployment-id>"
+        gpt-5.4-mini: "<gpt-54-mini-deployment-id>"
         ...
     ```
@@ -86,7 +86,7 @@ Keeps track of agent's actions and their outcomes. Provides their summary to the
 
 | Config variable        | Details                                                 | Type        | Default            |
 | ---------------------- | ------------------------------------------------------- | ----------- | ------------------ |
-| `llm_name`             | Name of the llm model used to compress the history      | `ModelName` | `"gpt-3.5-turbo"`  |
+| `llm_name`             | Name of the llm model used to compress the history      | `ModelName` | `"gpt-5.4-mini"`   |
 | `max_tokens`           | Maximum number of tokens to use for the history summary | `int`       | `1024`             |
 | `spacy_language_model` | Language model used for summary chunking using spacy    | `str`       | `"en_core_web_sm"` |
 | `full_message_count`   | Number of cycles to include unsummarized in the prompt  | `int`       | `4`                |
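The `full_message_count` policy (keep the last N cycles verbatim, compress the rest) can be sketched as follows; the string-join "summarizer" is a stand-in for the configured `llm_name` model, and the function itself is illustrative:

```python
# Sketch of the compression policy implied by full_message_count; the real
# component summarizes with the configured LLM, not a string join.
def compress_history(cycles, full_message_count=4):
    if len(cycles) <= full_message_count:
        return list(cycles)
    older, recent = cycles[:-full_message_count], cycles[-full_message_count:]
    # Stand-in for the LLM summarization step
    summary = "Summary of earlier cycles: " + "; ".join(older)
    return [summary] + recent
```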
@@ -181,7 +181,7 @@ Allows agent to read websites using Selenium.
 
 | Config variable | Details                                     | Type                                          | Default           |
 | --------------- | ------------------------------------------- | --------------------------------------------- | ----------------- |
-| `llm_name`      | Name of the llm model used to read websites | `ModelName`                                   | `"gpt-3.5-turbo"` |
+| `llm_name`      | Name of the llm model used to read websites | `ModelName`                                   | `"gpt-5.4-mini"`  |
 | `web_browser`   | Web browser used by Selenium                | `"chrome" \| "firefox" \| "safari" \| "edge"` | `"chrome"`        |
 | `headless`      | Run browser in headless mode                | `bool`                                        | `True`            |
 | `user_agent`    | User agent used by the browser              | `str`                                         | `"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"` |