mirror of
https://github.com/Significant-Gravitas/AutoGPT.git
synced 2026-01-09 15:17:59 -05:00
feat(forge): Component-specific configuration (#7170)
Remove many env vars and use component-level configuration that could be loaded from file instead.

### Changed

- `BaseAgent` provides `serialize_configs` and `deserialize_configs` that can save and load all component configuration as json `str`. Deserialized components/values overwrite existing values, so not all values need to be present in the serialized config.
- Decoupled `forge/content_processing/text.py` from `Config`
- Kept `execute_local_commands` in `Config` because it's needed to know if OS info should be included in the prompt
- Updated docs to reflect changes
- Renamed `Config` to `AppConfig`

### Added

- Added `ConfigurableComponent` class for components and following configs:
  - `ActionHistoryConfiguration`
  - `CodeExecutorConfiguration`
  - `FileManagerConfiguration` - now file manager allows to have multiple agents using the same workspace
  - `GitOperationsConfiguration`
  - `ImageGeneratorConfiguration`
  - `WebSearchConfiguration`
  - `WebSeleniumConfiguration`
- `BaseConfig` in `forge` and moved `Config` (now inherits from `BaseConfig`) back to `autogpt`
- Required `config_class` attribute for the `ConfigurableComponent` class that should be set to configuration class for a component
- `--component-config-file` CLI option and `COMPONENT_CONFIG_FILE` env var and field in `Config`. This option allows to load configuration from a specific file, CLI option takes precedence over env var.
- Added comments to config models

### Removed

- Unused `change_agent_id` method from `FileManagerComponent`
- Unused `allow_downloads` from `Config` and CLI options (it should be in web component config if needed)
- CLI option `--browser-name` (the option is inside `WebSeleniumConfiguration`)
- Unused `workspace_directory` from CLI options
- No longer needed variables from `Config` and docs
- Unused fields from `Config`: `image_size`, `audio_to_text_provider`, `huggingface_audio_to_text_model`
- Removed `files` and `workspace` class attributes from `FileManagerComponent`
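The partial-overwrite behavior described for `deserialize_configs` can be pictured with a small stand-alone sketch. It does not use the real `forge` classes: `WebSearchSettings`, `FileManagerSettings`, `dump_configs` and `load_configs` are hypothetical stand-ins, and skipping unknown components or fields is an assumption of this sketch.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, Optional


@dataclass
class WebSearchSettings:
    # Hypothetical stand-in for one component's configuration model.
    google_api_key: Optional[str] = None
    duckduckgo_max_attempts: int = 3


@dataclass
class FileManagerSettings:
    # Hypothetical stand-in for a second component's configuration model.
    files_path: str = "agents/demo/"
    workspace_path: str = "agents/demo/workspace/"


def dump_configs(configs: Dict[str, object]) -> str:
    """Dump every component's configuration into one JSON string."""
    return json.dumps({name: asdict(cfg) for name, cfg in configs.items()})


def load_configs(configs: Dict[str, object], serialized: str) -> None:
    """Overwrite only the components and fields present in the JSON."""
    for name, values in json.loads(serialized).items():
        cfg = configs.get(name)
        if cfg is None:
            continue  # assumption: unknown components are skipped
        for field_name, value in values.items():
            if hasattr(cfg, field_name):
                setattr(cfg, field_name, value)


configs = {
    "web_search": WebSearchSettings(),
    "file_manager": FileManagerSettings(),
}
# Only one field of one component is present; everything else is untouched.
load_configs(configs, '{"web_search": {"duckduckgo_max_attempts": 5}}')
```

Because the loader only touches keys that appear in the JSON, a config file can list just the values that differ from the defaults.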
committed by
GitHub
parent
02dc198a9f
commit
c19ab2b24f
@@ -1,19 +1,26 @@
# Built-in Components

This page lists all [🧩 Components](./components.md) and the [⚙️ Protocols](./protocols.md) they implement that are natively provided. They are used by the AutoGPT agent.
Some components have additional configuration options listed in the table; see [Component configuration](./components.md/#ordering-components) to learn more.

!!! note
    If a configuration field uses an environment variable, its value can still be passed through the configuration model. **The value from the configuration takes precedence over the env var!** The env var is applied only if the value in the configuration is not set.
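The precedence rule from the note can be sketched in a few lines. This is only an illustration: `resolve_setting` is a hypothetical helper, and treating `None` as "not set" is an assumption rather than something the diff confirms.

```python
import os
from typing import Optional


def resolve_setting(configured: Optional[str], env_name: str) -> Optional[str]:
    """Return the configured value if set, otherwise fall back to the env var."""
    if configured is not None:
        return configured
    return os.environ.get(env_name)


# Demo only: set the env var so both branches can be observed.
os.environ["GITHUB_USERNAME"] = "from-env"
explicit = resolve_setting("from-config", "GITHUB_USERNAME")
fallback = resolve_setting(None, "GITHUB_USERNAME")
```

Here `explicit` keeps the configured value even though the env var exists, while `fallback` picks up the env var because nothing was configured.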
## `SystemComponent`

Essential component to allow an agent to finish.

**DirectiveProvider**

- Constraints about API budget

**MessageProvider**

- Current time and date
- Remaining API budget and warnings if budget is low

**CommandProvider**

- `finish` used when task is completed

## `UserInteractionComponent`
@@ -21,6 +28,7 @@ Essential component to allow an agent to finish.
Adds ability to interact with user in CLI.

**CommandProvider**

- `ask_user` used to ask user for input

## `FileManagerComponent`
@@ -28,10 +36,19 @@ Adds ability to interact with user in CLI.
Adds ability to read and write persistent files to local storage, Google Cloud Storage or Amazon's S3.
Necessary for saving and loading agent's state (preserving session).

| Config variable | Details | Type | Default |
| ---------------- | -------------------------------------- | ----- | ---------------------------------- |
| `files_path` | Path to agent files, e.g. state | `str` | `agents/{agent_id}/`[^1] |
| `workspace_path` | Path to files that agent has access to | `str` | `agents/{agent_id}/workspace/`[^1] |

[^1]: This option is set dynamically during component construction rather than by default inside the configuration model; `{agent_id}` is replaced with the agent's unique identifier.
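One way to picture the dynamically set defaults from the footnote is a constructor that fills in any path the configuration left unset. The sketch below is not the real `FileManagerComponent`: `FileManagerSettings` and `FileManagerSketch` are hypothetical names, and using `None` to mean "unset" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileManagerSettings:
    # Hypothetical stand-in: both paths start out unset.
    files_path: Optional[str] = None
    workspace_path: Optional[str] = None


class FileManagerSketch:
    """Fills in agent-specific paths when the configuration leaves them unset."""

    def __init__(self, agent_id: str, config: Optional[FileManagerSettings] = None):
        self.config = config or FileManagerSettings()
        if self.config.files_path is None:
            self.config.files_path = f"agents/{agent_id}/"
        if self.config.workspace_path is None:
            self.config.workspace_path = f"agents/{agent_id}/workspace/"


default = FileManagerSketch("agent-1")
# An explicit workspace path is kept, so several agents could point at it.
shared = FileManagerSketch(
    "agent-2", FileManagerSettings(workspace_path="shared/workspace/")
)
```

Under these assumptions, giving two agents the same `workspace_path` while leaving `files_path` unset keeps their state separate but lets them share a workspace.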
**DirectiveProvider**

- Resource information that it's possible to read and write files

**CommandProvider**

- `read_file` used to read file
- `write_file` used to write file
- `list_folder` lists all files in a folder
@@ -40,7 +57,16 @@ Necessary for saving and loading agent's state (preserving session).
Lets the agent execute non-interactive Shell commands and Python code. Python execution works only if Docker is available.

| Config variable | Details | Type | Default |
| ------------------------ | ---------------------------------------------------- | --------------------------- | ----------------- |
| `execute_local_commands` | Enable shell command execution | `bool` | `False` |
| `shell_command_control` | Controls which list is used | `"allowlist" \| "denylist"` | `"allowlist"` |
| `shell_allowlist` | List of allowed shell commands | `List[str]` | `[]` |
| `shell_denylist` | List of prohibited shell commands | `List[str]` | `[]` |
| `docker_container_name` | Name of the Docker container used for code execution | `str` | `"agent_sandbox"` |
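As a rough illustration of how the shell options in the table could interact, the sketch below checks a command line against the selected list. `ShellPolicy` and `is_command_permitted` are hypothetical names, and matching only the first token of the command is an assumption; the real component's validation may differ.

```python
import shlex
from dataclasses import dataclass, field
from typing import List, Literal


@dataclass
class ShellPolicy:
    # Field names and defaults mirror the table above.
    execute_local_commands: bool = False
    shell_command_control: Literal["allowlist", "denylist"] = "allowlist"
    shell_allowlist: List[str] = field(default_factory=list)
    shell_denylist: List[str] = field(default_factory=list)


def is_command_permitted(command_line: str, policy: ShellPolicy) -> bool:
    """Decide whether a command may run under the given policy."""
    if not policy.execute_local_commands:
        return False  # shell execution is switched off entirely
    tokens = shlex.split(command_line)
    if not tokens:
        return False
    executable = tokens[0]  # assumption: only the executable name is checked
    if policy.shell_command_control == "allowlist":
        return executable in policy.shell_allowlist
    return executable not in policy.shell_denylist


allow_ls = ShellPolicy(execute_local_commands=True, shell_allowlist=["ls"])
deny_rm = ShellPolicy(
    execute_local_commands=True,
    shell_command_control="denylist",
    shell_denylist=["rm"],
)
```

With the defaults (`execute_local_commands=False` and an empty allowlist), nothing is permitted, which matches the conservative defaults in the table.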
**CommandProvider**

- `execute_shell` execute shell command
- `execute_shell_popen` execute shell command with popen
- `execute_python_code` execute Python code
@@ -50,38 +76,84 @@ Lets the agent execute non-interactive Shell commands and Python code. Python ex
Keeps track of agent's actions and their outcomes. Provides their summary to the prompt.

| Config variable | Details | Type | Default |
| ---------------------- | ------------------------------------------------------- | ----------- | ------------------ |
| `model_name` | Name of the llm model used to compress the history | `ModelName` | `"gpt-3.5-turbo"` |
| `max_tokens` | Maximum number of tokens to use for the history summary | `int` | `1024` |
| `spacy_language_model` | Language model used for summary chunking using spacy | `str` | `"en_core_web_sm"` |

**MessageProvider**

- Agent's progress summary

**AfterParse**

- Register agent's action

**ExecutionFailuer**
**ExecutionFailure**

- Rewinds the agent's action, so it isn't saved

**AfterExecute**

- Saves the agent's action result in the history
## `GitOperationsComponent`

Adds ability to interact with git repositories and GitHub.

| Config variable | Details | Type | Default |
| ----------------- | ----------------------------------------- | ----- | ------- |
| `github_username` | GitHub username, *ENV:* `GITHUB_USERNAME` | `str` | `None` |
| `github_api_key` | GitHub API key, *ENV:* `GITHUB_API_KEY` | `str` | `None` |

**CommandProvider**

- `clone_repository` used to clone a git repository
## `ImageGeneratorComponent`

Adds ability to generate images using various providers, see [Image Generation configuration](./../configuration/imagegen.md) to learn more.
Adds ability to generate images using various providers.

### Hugging Face

To use text-to-image models from Hugging Face, you need a Hugging Face API token.
Link to the appropriate settings page: [Hugging Face > Settings > Tokens](https://huggingface.co/settings/tokens)

### Stable Diffusion WebUI

It is possible to use your own self-hosted Stable Diffusion WebUI with AutoGPT. **Make sure you are running WebUI with `--api` enabled.**

| Config variable | Details | Type | Default |
| ------------------------- | ------------------------------------------------------------- | --------------------------------------- | --------------------------------- |
| `image_provider` | Image generation provider | `"dalle" \| "huggingface" \| "sdwebui"` | `"dalle"` |
| `huggingface_image_model` | Hugging Face image model, see [available models] | `str` | `"CompVis/stable-diffusion-v1-4"` |
| `huggingface_api_token` | Hugging Face API token, *ENV:* `HUGGINGFACE_API_TOKEN` | `str` | `None` |
| `sd_webui_url` | URL to self-hosted Stable Diffusion WebUI | `str` | `"http://localhost:7860"` |
| `sd_webui_auth` | Basic auth for Stable Diffusion WebUI, *ENV:* `SD_WEBUI_AUTH` | `str` of format `{username}:{password}` | `None` |

[available models]: https://huggingface.co/models?pipeline_tag=text-to-image
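The `{username}:{password}` format of `sd_webui_auth` could be split as in the sketch below. `split_basic_auth` is a hypothetical helper, not part of the component, and splitting on the first colon (so that passwords may contain colons) is an assumption.

```python
from typing import Optional, Tuple


def split_basic_auth(value: Optional[str]) -> Optional[Tuple[str, str]]:
    """Split 'username:password' into a pair; None means no auth configured."""
    if value is None:
        return None
    # Split on the first colon only, so the password may itself contain ':'.
    username, separator, password = value.partition(":")
    if not separator or not username:
        raise ValueError("expected a value of the form '{username}:{password}'")
    return username, password


credentials = split_basic_auth("alice:s3cret")
```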
**CommandProvider**

- `generate_image` used to generate an image given a prompt

## `WebSearchComponent`

Allows agent to search the web.
Allows agent to search the web. Google credentials aren't required for DuckDuckGo. [Instructions how to set up Google API key](../../AutoGPT/configuration/search.md)

| Config variable | Details | Type | Default |
| -------------------------------- | ----------------------------------------------------------------------- | ----- | ------- |
| `google_api_key` | Google API key, *ENV:* `GOOGLE_API_KEY` | `str` | `None` |
| `google_custom_search_engine_id` | Google Custom Search Engine ID, *ENV:* `GOOGLE_CUSTOM_SEARCH_ENGINE_ID` | `str` | `None` |
| `duckduckgo_max_attempts` | Maximum number of attempts to search using DuckDuckGo | `int` | `3` |

**DirectiveProvider**

- Resource information that it's possible to search the web

**CommandProvider**

- `search_web` used to search the web using DuckDuckGo
- `google` used to search the web using Google, requires API key
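A bounded retry loop is one plausible reading of `duckduckgo_max_attempts`. The sketch below takes the search function as a parameter so it stays self-contained; `search_with_retries` is a hypothetical name, and retrying when a search returns no results is an assumption, since the diff does not show when the real component retries.

```python
from typing import Callable, List


def search_with_retries(
    search: Callable[[str], List[str]],
    query: str,
    max_attempts: int = 3,
) -> List[str]:
    """Call `search` up to `max_attempts` times, stopping at the first non-empty result."""
    for _ in range(max_attempts):
        results = search(query)
        if results:
            return results
    return []  # every attempt came back empty
```

Passing the search callable in keeps the retry policy separate from whichever backend actually performs the search.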
@@ -89,10 +161,20 @@ Allows agent to search the web.
Allows agent to read websites using Selenium.

| Config variable | Details | Type | Default |
| ----------------------------- | ------------------------------------------- | --------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------- |
| `model_name` | Name of the llm model used to read websites | `ModelName` | `"gpt-3.5-turbo"` |
| `web_browser` | Web browser used by Selenium | `"chrome" \| "firefox" \| "safari" \| "edge"` | `"chrome"` |
| `headless` | Run browser in headless mode | `bool` | `True` |
| `user_agent` | User agent used by the browser | `str` | `"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"` |
| `browse_spacy_language_model` | Spacy language model used for chunking text | `str` | `"en_core_web_sm"` |

**DirectiveProvider**

- Resource information that it's possible to read websites

**CommandProvider**

- `read_website` used to read a specific url and look for specific topics or answer a question

## `ContextComponent`
@@ -100,9 +182,11 @@ Allows agent to read websites using Selenium.
Adds ability to keep up-to-date file and folder content in the prompt.

**MessageProvider**

- Content of elements in the context

**CommandProvider**

- `open_file` used to open a file into context
- `open_folder` used to open a folder into context
- `close_context_item` remove an item from the context
@@ -112,4 +196,5 @@ Adds ability to keep up-to-date file and folder content in the prompt.
Watches if agent is looping and switches to smart mode if necessary.

**AfterParse**

- Investigates what happened and switches to smart mode if necessary
@@ -148,12 +148,12 @@ It gives an ability for the agent to ask user for input in the terminal.
    yield self.ask_user
```

5. Since agent isn't always running in the terminal or interactive mode, we need to disable this component by setting `self._enabled` when it's not possible to ask for user input.
5. Since agent isn't always running in the terminal or interactive mode, we need to disable this component by setting `self._enabled=False` when it's not possible to ask for user input.

```py
def __init__(self, config: Config):
def __init__(self, interactive_mode: bool):
    self.config = config
    self._enabled = not config.noninteractive_mode
    self._enabled = interactive_mode
```
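To see what disabling a component through `_enabled` could mean in practice, here is a stand-alone sketch. `UserInteractionSketch` and `collect_commands` are hypothetical and do not use the real `CommandProvider` base class; that the agent skips components whose `_enabled` flag is false is an assumption drawn from step 5.

```python
from typing import Callable, Iterator, List


class UserInteractionSketch:
    """Hypothetical component that is only usable in interactive mode."""

    def __init__(self, interactive_mode: bool):
        self._enabled = interactive_mode

    def get_commands(self) -> Iterator[Callable[[str], str]]:
        yield self.ask_user

    def ask_user(self, question: str) -> str:
        return input(f"{question}\n> ")


def collect_commands(
    components: List[UserInteractionSketch],
) -> List[Callable[[str], str]]:
    """Gather commands, skipping components that are disabled."""
    commands: List[Callable[[str], str]] = []
    for component in components:
        if not component._enabled:
            continue  # a disabled component contributes nothing
        commands.extend(component.get_commands())
    return commands


interactive = collect_commands([UserInteractionSketch(interactive_mode=True)])
headless = collect_commands([UserInteractionSketch(interactive_mode=False)])
```

In this sketch the non-interactive agent ends up with no `ask_user` command at all, so the model is never offered an action it cannot complete.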
The final component should look like this:
@@ -164,10 +164,10 @@ class MyUserInteractionComponent(CommandProvider):
    """Provides commands to interact with the user."""

    # We pass config to check if we're in noninteractive mode
    def __init__(self, config: Config):
    def __init__(self, interactive_mode: bool):
        self.config = config
        # 5.
        self._enabled = not config.noninteractive_mode
        self._enabled = interactive_mode

    # 4.
    def get_commands(self) -> Iterator[Command]:
@@ -205,10 +205,10 @@ class MyAgent(Agent):
        settings: AgentSettings,
        llm_provider: MultiProvider,
        file_storage: FileStorage,
        legacy_config: Config,
        app_config: Config,
    ):
        # Call the parent constructor to bring in the default components
        super().__init__(settings, llm_provider, file_storage, legacy_config)
        super().__init__(settings, llm_provider, file_storage, app_config)
        # Disable the default user interaction component by overriding it
        self.user_interaction = MyUserInteractionComponent()
```
@@ -222,14 +222,14 @@ class MyAgent(Agent):
        settings: AgentSettings,
        llm_provider: MultiProvider,
        file_storage: FileStorage,
        legacy_config: Config,
        app_config: Config,
    ):
        # Call the parent constructor to bring in the default components
        super().__init__(settings, llm_provider, file_storage, legacy_config)
        super().__init__(settings, llm_provider, file_storage, app_config)
        # Disable the default user interaction component
        self.user_interaction = None
        # Add our own component
        self.my_user_interaction = MyUserInteractionComponent(legacy_config)
        self.my_user_interaction = MyUserInteractionComponent(app_config)
```

## Learn more
@@ -1,5 +1,11 @@
# Component Agents

!!! important
    [Legacy plugins] no longer work with AutoGPT. They have been replaced by components,
    although we're still working on a new system to load plug-in components.

[Legacy plugins]: https://github.com/Significant-Gravitas/Auto-GPT-Plugins

This guide explains the component-based architecture of AutoGPT agents. It's a new way of building agents that is more flexible and easier to extend. Components replace some of the agent's logic and plugins with a more modular and composable system.

An agent is composed of *components*, and each *component* implements a range of *protocols* (interfaces), each one providing a specific functionality, e.g. additional commands or messages. Each *protocol* is handled in a specific order, defined by the agent. This allows for a clear separation of concerns and a more modular design.
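The relationship between components and protocols can be sketched with plain Python protocols. None of the names below come from `forge`: `MessageProviderSketch`, `CommandProviderSketch`, `ClockComponent`, `FinishComponent` and `run_protocol` are hypothetical, and using `isinstance` checks to find the components that implement a protocol is an assumption about one possible mechanism.

```python
from typing import Iterator, List, Protocol, runtime_checkable


@runtime_checkable
class MessageProviderSketch(Protocol):
    def get_messages(self) -> Iterator[str]: ...


@runtime_checkable
class CommandProviderSketch(Protocol):
    def get_commands(self) -> Iterator[str]: ...


class ClockComponent:
    """Implements only the message protocol."""

    def get_messages(self) -> Iterator[str]:
        yield "The current time is 12:00"


class FinishComponent:
    """Implements both protocols."""

    def get_messages(self) -> Iterator[str]:
        yield "Remaining budget: $1.00"

    def get_commands(self) -> Iterator[str]:
        yield "finish"


def run_protocol(components: List[object], protocol: type, method: str) -> List[str]:
    """Call `method` on every component that implements `protocol`, in list order."""
    collected: List[str] = []
    for component in components:
        if isinstance(component, protocol):
            collected.extend(getattr(component, method)())
    return collected


components = [ClockComponent(), FinishComponent()]
# The agent decides the protocol order; here messages run before commands.
messages = run_protocol(components, MessageProviderSketch, "get_messages")
commands = run_protocol(components, CommandProviderSketch, "get_commands")
```

A component that implements no method of a given protocol is simply skipped for that protocol, which is what keeps each concern separate.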