feat(forge): Component-specific configuration (#7170)

Remove many env vars and use component-level configuration that could be loaded from file instead.

### Changed

- `BaseAgent` provides `serialize_configs` and `deserialize_configs` that can save and load all component configuration as json `str`. Deserialized components/values overwrite existing values, so not all values need to be present in the serialized config.
- Decoupled `forge/content_processing/text.py` from `Config`
- Kept `execute_local_commands` in `Config` because it's needed to know if OS info should be included in the prompt
- Updated docs to reflect changes
- Renamed `Config` to `AppConfig`

### Added

- Added `ConfigurableComponent` class for components and following configs:
  - `ActionHistoryConfiguration`
  - `CodeExecutorConfiguration`
  - `FileManagerConfiguration` - now file manager allows to have multiple agents using the same workspace
  - `GitOperationsConfiguration`
  - `ImageGeneratorConfiguration`
  - `WebSearchConfiguration`
  - `WebSeleniumConfiguration`
- `BaseConfig` in `forge` and moved `Config` (now inherits from `BaseConfig`) back to `autogpt`
- Required `config_class` attribute for the `ConfigurableComponent` class that should be set to configuration class for a component
`--component-config-file` CLI option and `COMPONENT_CONFIG_FILE` env var and field in `Config`. This option allows to load configuration from a specific file, CLI option takes precedence over env var.
- Added comments to config models

### Removed

- Unused `change_agent_id` method from `FileManagerComponent`
- Unused `allow_downloads` from `Config` and CLI options (it should be in web component config if needed)
- CLI option `--browser-name` (the option is inside `WebSeleniumConfiguration`)
- Unused `workspace_directory` from CLI options
- No longer needed variables from `Config` and docs
- Unused fields from `Config`: `image_size`, `audio_to_text_provider`, `huggingface_audio_to_text_model`
- Removed `files` and `workspace` class attributes from `FileManagerComponent`
This commit is contained in:
Krzysztof Czerwinski
2024-06-19 09:14:01 +01:00
committed by GitHub
parent 02dc198a9f
commit c19ab2b24f
47 changed files with 772 additions and 722 deletions

View File

@@ -1,63 +0,0 @@
# 🖼 Image Generation configuration
| Config variable | Values | |
| ---------------- | ------------------------------- | -------------------- |
| `IMAGE_PROVIDER` | `dalle` `huggingface` `sdwebui` | **default: `dalle`** |
## DALL-e
In `.env`, make sure `IMAGE_PROVIDER` is commented (or set to `dalle`):
```ini
# IMAGE_PROVIDER=dalle # this is the default
```
Further optional configuration:
| Config variable | Values | |
| ---------------- | ------------------ | -------------- |
| `IMAGE_SIZE` | `256` `512` `1024` | default: `256` |
## Hugging Face
To use text-to-image models from Hugging Face, you need a Hugging Face API token.
Link to the appropriate settings page: [Hugging Face > Settings > Tokens](https://huggingface.co/settings/tokens)
Once you have an API token, uncomment and adjust these variables in your `.env`:
```ini
IMAGE_PROVIDER=huggingface
HUGGINGFACE_API_TOKEN=your-huggingface-api-token
```
Further optional configuration:
| Config variable | Values | |
| ------------------------- | ---------------------- | ---------------------------------------- |
| `HUGGINGFACE_IMAGE_MODEL` | see [available models] | default: `CompVis/stable-diffusion-v1-4` |
[available models]: https://huggingface.co/models?pipeline_tag=text-to-image
## Stable Diffusion WebUI
It is possible to use your own self-hosted Stable Diffusion WebUI with AutoGPT:
```ini
IMAGE_PROVIDER=sdwebui
```
!!! note
Make sure you are running WebUI with `--api` enabled.
Further optional configuration:
| Config variable | Values | |
| --------------- | ----------------------- | -------------------------------- |
| `SD_WEBUI_URL` | URL to your WebUI | default: `http://127.0.0.1:7860` |
| `SD_WEBUI_AUTH` | `{username}:{password}` | *Note: do not copy the braces!* |
## Selenium
```shell
sudo Xvfb :10 -ac -screen 0 1024x768x24 & DISPLAY=:10 <YOUR_CLIENT>
```

View File

@@ -4,18 +4,14 @@ Configuration is controlled through the `Config` object. You can set configurati
## Environment Variables
- `AUDIO_TO_TEXT_PROVIDER`: Audio To Text Provider. Only option currently is `huggingface`. Default: huggingface
- `AUTHORISE_COMMAND_KEY`: Key response accepted when authorising commands. Default: y
- `ANTHROPIC_API_KEY`: Set this if you want to use Anthropic models with AutoGPT
- `AZURE_CONFIG_FILE`: Location of the Azure Config file relative to the AutoGPT root directory. Default: azure.yaml
- `BROWSE_CHUNK_MAX_LENGTH`: When browsing website, define the length of chunks to summarize. Default: 3000
- `BROWSE_SPACY_LANGUAGE_MODEL`: [spaCy language model](https://spacy.io/usage/models) to use when creating chunks. Default: en_core_web_sm
- `CHAT_MESSAGES_ENABLED`: Enable chat messages. Optional
- `DISABLED_COMMANDS`: Commands to disable. Use comma separated names of commands. See the list of commands from built-in components [here](../../forge/components/built-in-components.md). Default: None
- `COMPONENT_CONFIG_FILE`: Path to the component configuration file (json) for an agent. Optional
- `DISABLED_COMMANDS`: Commands to disable. Use comma separated names of commands. See the list of commands from built-in components [here](../components/components.md). Default: None
- `ELEVENLABS_API_KEY`: ElevenLabs API Key. Optional.
- `ELEVENLABS_VOICE_ID`: ElevenLabs Voice ID. Optional.
- `EMBEDDING_MODEL`: LLM Model to use for embedding tasks. Default: `text-embedding-3-small`
- `EXECUTE_LOCAL_COMMANDS`: If shell commands should be executed locally. Default: False
- `EXIT_KEY`: Exit key accepted to exit. Default: n
- `FAST_LLM`: LLM Model to use for most tasks. Default: `gpt-3.5-turbo-0125`
- `GITHUB_API_KEY`: [Github API Key](https://github.com/settings/tokens). Optional.
@@ -23,26 +19,16 @@ Configuration is controlled through the `Config` object. You can set configurati
- `GOOGLE_API_KEY`: Google API key. Optional.
- `GOOGLE_CUSTOM_SEARCH_ENGINE_ID`: [Google custom search engine ID](https://programmablesearchengine.google.com/controlpanel/all). Optional.
- `GROQ_API_KEY`: Set this if you want to use Groq models with AutoGPT
- `HEADLESS_BROWSER`: Use a headless browser while AutoGPT uses a web browser. Setting to `False` will allow you to see AutoGPT operate the browser. Default: True
- `HUGGINGFACE_API_TOKEN`: HuggingFace API, to be used for both image generation and audio to text. Optional.
- `HUGGINGFACE_AUDIO_TO_TEXT_MODEL`: HuggingFace audio to text model. Default: CompVis/stable-diffusion-v1-4
- `HUGGINGFACE_IMAGE_MODEL`: HuggingFace model to use for image generation. Default: CompVis/stable-diffusion-v1-4
- `IMAGE_PROVIDER`: Image provider. Options are `dalle`, `huggingface`, and `sdwebui`. Default: dalle
- `IMAGE_SIZE`: Default size of image to generate. Default: 256
- `OPENAI_API_KEY`: *REQUIRED*- Your [OpenAI API Key](https://platform.openai.com/account/api-keys).
- `OPENAI_ORGANIZATION`: Organization ID in OpenAI. Optional.
- `PLAIN_OUTPUT`: Plain output, which disables the spinner. Default: False
- `RESTRICT_TO_WORKSPACE`: The restrict file reading and writing to the workspace directory. Default: True
- `SD_WEBUI_AUTH`: Stable Diffusion Web UI username:password pair. Optional.
- `SD_WEBUI_URL`: Stable Diffusion Web UI URL. Default: http://localhost:7860
- `SHELL_ALLOWLIST`: List of shell commands that ARE allowed to be executed by AutoGPT. Only applies if `SHELL_COMMAND_CONTROL` is set to `allowlist`. Default: None
- `SHELL_COMMAND_CONTROL`: Whether to use `allowlist` or `denylist` to determine what shell commands can be executed (Default: denylist)
- `SHELL_DENYLIST`: List of shell commands that ARE NOT allowed to be executed by AutoGPT. Only applies if `SHELL_COMMAND_CONTROL` is set to `denylist`. Default: sudo,su
- `SMART_LLM`: LLM Model to use for "smart" tasks. Default: `gpt-4-turbo-preview`
- `STREAMELEMENTS_VOICE`: StreamElements voice to use. Default: Brian
- `TEMPERATURE`: Value of temperature given to OpenAI. Value from 0 to 2. Lower is more deterministic, higher is more random. See https://platform.openai.com/docs/api-reference/completions/create#completions/create-temperature
- `TEXT_TO_SPEECH_PROVIDER`: Text to Speech Provider. Options are `gtts`, `macos`, `elevenlabs`, and `streamelements`. Default: gtts
- `USER_AGENT`: User-Agent given when browsing websites. Default: "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"
- `USE_AZURE`: Use Azure's LLM Default: False
- `USE_WEB_BROWSER`: Which web browser to use. Options are `chrome`, `firefox`, `safari` or `edge` Default: chrome
- `WIPE_REDIS_ON_START`: Wipes data / index on start. Default: True

View File

@@ -2,36 +2,36 @@
!!! note
This section is optional. Use the official Google API if search attempts return
error 429. To use the `google_official_search` command, you need to set up your
Google API key in your environment variables.
error 429. To use the `google` command, you need to set up your
Google API key in your environment variables or pass it with configuration to the [`WebSearchComponent`](../../forge/components/built-in-components.md).
Create your project:
1. Go to the [Google Cloud Console](https://console.cloud.google.com/).
2. If you don't already have an account, create one and log in
3. Create a new project by clicking on the *Select a Project* dropdown at the top of the
1. If you don't already have an account, create one and log in
1. Create a new project by clicking on the *Select a Project* dropdown at the top of the
page and clicking *New Project*
4. Give it a name and click *Create*
5. Set up a custom search API and add to your .env file:
5. Go to the [APIs & Services Dashboard](https://console.cloud.google.com/apis/dashboard)
6. Click *Enable APIs and Services*
7. Search for *Custom Search API* and click on it
8. Click *Enable*
9. Go to the [Credentials](https://console.cloud.google.com/apis/credentials) page
10. Click *Create Credentials*
11. Choose *API Key*
12. Copy the API key
13. Set it as the `GOOGLE_API_KEY` in your `.env` file
14. [Enable](https://console.developers.google.com/apis/api/customsearch.googleapis.com)
1. Give it a name and click *Create*
1. Set up a custom search API and add to your .env file:
1. Go to the [APIs & Services Dashboard](https://console.cloud.google.com/apis/dashboard)
1. Click *Enable APIs and Services*
1. Search for *Custom Search API* and click on it
1. Click *Enable*
1. Go to the [Credentials](https://console.cloud.google.com/apis/credentials) page
1. Click *Create Credentials*
1. Choose *API Key*
1. Copy the API key
1. Set it as the `GOOGLE_API_KEY` in your `.env` file
1. [Enable](https://console.developers.google.com/apis/api/customsearch.googleapis.com)
the Custom Search API on your project. (Might need to wait few minutes to propagate.)
Set up a custom search engine and add to your .env file:
15. Go to the [Custom Search Engine](https://cse.google.com/cse/all) page
16. Click *Add*
17. Set up your search engine by following the prompts.
1. Go to the [Custom Search Engine](https://cse.google.com/cse/all) page
1. Click *Add*
1. Set up your search engine by following the prompts.
You can choose to search the entire web or specific sites
18. Once you've created your search engine, click on *Control Panel*
19. Click *Basics*
20. Copy the *Search engine ID*
21. Set it as the `CUSTOM_SEARCH_ENGINE_ID` in your `.env` file
1. Once you've created your search engine, click on *Control Panel*
1. Click *Basics*
1. Copy the *Search engine ID*
1. Set it as the `CUSTOM_SEARCH_ENGINE_ID` in your `.env` file
_Remember that your free daily custom search quota allows only up to 100 searches. To increase this limit, you need to assign a billing account to the project to profit from up to 10K daily searches._

View File

@@ -60,10 +60,6 @@ Options:
--debug Enable Debug Mode
--gpt3only Enable GPT3.5 Only Mode
--gpt4only Enable GPT4 Only Mode
-b, --browser-name TEXT Specifies which web-browser to use when
using selenium to scrape the web.
--allow-downloads Dangerous: Allows AutoGPT to download files
natively.
--skip-news Specifies whether to suppress the output of
latest news on startup.
--install-plugin-deps Installs external dependencies for 3rd party
@@ -82,6 +78,7 @@ Options:
--override-directives If specified, --constraint, --resource and
--best-practice will override the AI's
directives instead of being appended to them
--component-config-file TEXT Path to the json configuration file.
--help Show this message and exit.
```
</details>
@@ -128,10 +125,6 @@ Options:
--debug Enable Debug Mode
--gpt3only Enable GPT3.5 Only Mode
--gpt4only Enable GPT4 Only Mode
-b, --browser-name TEXT Specifies which web-browser to use when using
selenium to scrape the web.
--allow-downloads Dangerous: Allows AutoGPT to download files
natively.
--install-plugin-deps Installs external dependencies for 3rd party
plugins.
--help Show this message and exit.

View File

@@ -1,19 +1,26 @@
# Built-in Components
This page lists all [🧩 Components](./components.md) and [⚙️ Protocols](./protocols.md) they implement that are natively provided. They are used by the AutoGPT agent.
Some components have additional configuration options listed in the table, see [Component configuration](./components.md/#ordering-components) to learn more.
!!! note
If a configuration field uses environment variable, it still can be passed using configuration model. **Value from the configuration takes precedence over env var!** Env var will be only applied if value in the configuration is not set.
## `SystemComponent`
Essential component to allow an agent to finish.
**DirectiveProvider**
- Constraints about API budget
**MessageProvider**
- Current time and date
- Remaining API budget and warnings if budget is low
**CommandProvider**
- `finish` used when task is completed
## `UserInteractionComponent`
@@ -21,6 +28,7 @@ Essential component to allow an agent to finish.
Adds ability to interact with user in CLI.
**CommandProvider**
- `ask_user` used to ask user for input
## `FileManagerComponent`
@@ -28,10 +36,19 @@ Adds ability to interact with user in CLI.
Adds ability to read and write persistent files to local storage, Google Cloud Storage or Amazon's S3.
Necessary for saving and loading agent's state (preserving session).
| Config variable | Details | Type | Default |
| ---------------- | -------------------------------------- | ----- | ---------------------------------- |
| `files_path` | Path to agent files, e.g. state | `str` | `agents/{agent_id}/`[^1] |
| `workspace_path` | Path to files that agent has access to | `str` | `agents/{agent_id}/workspace/`[^1] |
[^1] This option is set dynamically during component construction as opposed to by default inside the configuration model, `{agent_id}` is replaced with the agent's unique identifier.
**DirectiveProvider**
- Resource information that it's possible to read and write files
**CommandProvider**
- `read_file` used to read file
- `write_file` used to write file
- `list_folder` lists all files in a folder
@@ -40,7 +57,16 @@ Necessary for saving and loading agent's state (preserving session).
Lets the agent execute non-interactive Shell commands and Python code. Python execution works only if Docker is available.
| Config variable | Details | Type | Default |
| ------------------------ | ---------------------------------------------------- | --------------------------- | ----------------- |
| `execute_local_commands` | Enable shell command execution | `bool` | `False` |
| `shell_command_control` | Controls which list is used | `"allowlist" \| "denylist"` | `"allowlist"` |
| `shell_allowlist` | List of allowed shell commands | `List[str]` | `[]` |
| `shell_denylist` | List of prohibited shell commands | `List[str]` | `[]` |
| `docker_container_name` | Name of the Docker container used for code execution | `str` | `"agent_sandbox"` |
**CommandProvider**
- `execute_shell` execute shell command
- `execute_shell_popen` execute shell command with popen
- `execute_python_code` execute Python code
@@ -50,38 +76,84 @@ Lets the agent execute non-interactive Shell commands and Python code. Python ex
Keeps track of agent's actions and their outcomes. Provides their summary to the prompt.
| Config variable | Details | Type | Default |
| ---------------------- | ------------------------------------------------------- | ----------- | ------------------ |
| `model_name` | Name of the llm model used to compress the history | `ModelName` | `"gpt-3.5-turbo"` |
| `max_tokens` | Maximum number of tokens to use for the history summary | `int` | `1024` |
| `spacy_language_model` | Language model used for summary chunking using spacy | `str` | `"en_core_web_sm"` |
**MessageProvider**
- Agent's progress summary
**AfterParse**
- Register agent's action
**ExecutionFailuer**
**ExecutionFailure**
- Rewinds the agent's action, so it isn't saved
**AfterExecute**
- Saves the agent's action result in the history
## `GitOperationsComponent`
Adds ability to iteract with git repositories and GitHub.
| Config variable | Details | Type | Default |
| ----------------- | ----------------------------------------- | ----- | ------- |
| `github_username` | GitHub username, *ENV:* `GITHUB_USERNAME` | `str` | `None` |
| `github_api_key` | GitHub API key, *ENV:* `GITHUB_API_KEY` | `str` | `None` |
**CommandProvider**
- `clone_repository` used to clone a git repository
## `ImageGeneratorComponent`
Adds ability to generate images using various providers, see [Image Generation configuration](./../configuration/imagegen.md) to learn more.
Adds ability to generate images using various providers.
### Hugging Face
To use text-to-image models from Hugging Face, you need a Hugging Face API token.
Link to the appropriate settings page: [Hugging Face > Settings > Tokens](https://huggingface.co/settings/tokens)
### Stable Diffusion WebUI
It is possible to use your own self-hosted Stable Diffusion WebUI with AutoGPT. **Make sure you are running WebUI with `--api` enabled.**
| Config variable | Details | Type | Default |
| ------------------------- | ------------------------------------------------------------- | --------------------------------------- | --------------------------------- |
| `image_provider` | Image generation provider | `"dalle" \| "huggingface" \| "sdwebui"` | `"dalle"` |
| `huggingface_image_model` | Hugging Face image model, see [available models] | `str` | `"CompVis/stable-diffusion-v1-4"` |
| `huggingface_api_token` | Hugging Face API token, *ENV:* `HUGGINGFACE_API_TOKEN` | `str` | `None` |
| `sd_webui_url` | URL to self-hosted Stable Diffusion WebUI | `str` | `"http://localhost:7860"` |
| `sd_webui_auth` | Basic auth for Stable Diffusion WebUI, *ENV:* `SD_WEBUI_AUTH` | `str` of format `{username}:{password}` | `None` |
[available models]: https://huggingface.co/models?pipeline_tag=text-to-image
**CommandProvider**
- `generate_image` used to generate an image given a prompt
## `WebSearchComponent`
Allows agent to search the web.
Allows agent to search the web. Google credentials aren't required for DuckDuckGo. [Instructions how to set up Google API key](../../AutoGPT/configuration/search.md)
| Config variable | Details | Type | Default |
| -------------------------------- | ----------------------------------------------------------------------- | ----- | ------- |
| `google_api_key` | Google API key, *ENV:* `GOOGLE_API_KEY` | `str` | `None` |
| `google_custom_search_engine_id` | Google Custom Search Engine ID, *ENV:* `GOOGLE_CUSTOM_SEARCH_ENGINE_ID` | `str` | `None` |
| `duckduckgo_max_attempts` | Maximum number of attempts to search using DuckDuckGo | `int` | `3` |
**DirectiveProvider**
- Resource information that it's possible to search the web
**CommandProvider**
- `search_web` used to search the web using DuckDuckGo
- `google` used to search the web using Google, requires API key
@@ -89,10 +161,20 @@ Allows agent to search the web.
Allows agent to read websites using Selenium.
| Config variable | Details | Type | Default |
| ----------------------------- | ------------------------------------------- | --------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------- |
| `model_name` | Name of the llm model used to read websites | `ModelName` | `"gpt-3.5-turbo"` |
| `web_browser` | Web browser used by Selenium | `"chrome" \| "firefox" \| "safari" \| "edge"` | `"chrome"` |
| `headless` | Run browser in headless mode | `bool` | `True` |
| `user_agent` | User agent used by the browser | `str` | `"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"` |
| `browse_spacy_language_model` | Spacy language model used for chunking text | `str` | `"en_core_web_sm"` |
**DirectiveProvider**
- Resource information that it's possible to read websites
**CommandProvider**
- `read_website` used to read a specific url and look for specific topics or answer a question
## `ContextComponent`
@@ -100,9 +182,11 @@ Allows agent to read websites using Selenium.
Adds ability to keep up-to-date file and folder content in the prompt.
**MessageProvider**
- Content of elements in the context
**CommandProvider**
- `open_file` used to open a file into context
- `open_folder` used to open a folder into context
- `close_context_item` remove an item from the context
@@ -112,4 +196,5 @@ Adds ability to keep up-to-date file and folder content in the prompt.
Watches if agent is looping and switches to smart mode if necessary.
**AfterParse**
- Investigates what happened and switches to smart mode if necessary

View File

@@ -148,12 +148,12 @@ It gives an ability for the agent to ask user for input in the terminal.
yield self.ask_user
```
5. Since agent isn't always running in the terminal or interactive mode, we need to disable this component by setting `self._enabled` when it's not possible to ask for user input.
5. Since agent isn't always running in the terminal or interactive mode, we need to disable this component by setting `self._enabled=False` when it's not possible to ask for user input.
```py
def __init__(self, config: Config):
def __init__(self, interactive_mode: bool):
self.config = config
self._enabled = not config.noninteractive_mode
self._enabled = interactive_mode
```
The final component should look like this:
@@ -164,10 +164,10 @@ class MyUserInteractionComponent(CommandProvider):
"""Provides commands to interact with the user."""
# We pass config to check if we're in noninteractive mode
def __init__(self, config: Config):
def __init__(self, interactive_mode: bool):
self.config = config
# 5.
self._enabled = not config.noninteractive_mode
self._enabled = interactive_mode
# 4.
def get_commands(self) -> Iterator[Command]:
@@ -205,10 +205,10 @@ class MyAgent(Agent):
settings: AgentSettings,
llm_provider: MultiProvider,
file_storage: FileStorage,
legacy_config: Config,
app_config: Config,
):
# Call the parent constructor to bring in the default components
super().__init__(settings, llm_provider, file_storage, legacy_config)
super().__init__(settings, llm_provider, file_storage, app_config)
# Disable the default user interaction component by overriding it
self.user_interaction = MyUserInteractionComponent()
```
@@ -222,14 +222,14 @@ class MyAgent(Agent):
settings: AgentSettings,
llm_provider: MultiProvider,
file_storage: FileStorage,
legacy_config: Config,
app_config: Config,
):
# Call the parent constructor to bring in the default components
super().__init__(settings, llm_provider, file_storage, legacy_config)
super().__init__(settings, llm_provider, file_storage, app_config)
# Disable the default user interaction component
self.user_interaction = None
# Add our own component
self.my_user_interaction = MyUserInteractionComponent(legacy_config)
self.my_user_interaction = MyUserInteractionComponent(app_config)
```
## Learn more

View File

@@ -1,5 +1,11 @@
# Component Agents
!!! important
[Legacy plugins] no longer work with AutoGPT. They have been replaced by components,
although we're still working on a new system to load plug-in components.
[Legacy plugins]: https://github.com/Significant-Gravitas/Auto-GPT-Plugins
This guide explains the component-based architecture of AutoGPT agents. It's a new way of building agents that is more flexible and easier to extend. Components replace some agent's logic and plugins with a more modular and composable system.
Agent is composed of *components*, and each *component* implements a range of *protocols* (interfaces), each one providing a specific functionality, e.g. additional commands or messages. Each *protocol* is handled in a specific order, defined by the agent. This allows for a clear separation of concerns and a more modular design.

View File

@@ -15,7 +15,6 @@ nav:
- Options: AutoGPT/configuration/options.md
- Search: AutoGPT/configuration/search.md
- Voice: AutoGPT/configuration/voice.md
- Image Generation: AutoGPT/configuration/imagegen.md
- Usage: AutoGPT/usage.md
- Help us improve AutoGPT:
- Share your debug logs with us: AutoGPT/share-your-logs.md