feat(forge): Component-specific configuration (#7170)

Remove many env vars and use component-level configuration that could be loaded from file instead.

### Changed

- `BaseAgent` provides `serialize_configs` and `deserialize_configs` that can save and load all component configuration as json `str`. Deserialized components/values overwrite existing values, so not all values need to be present in the serialized config.
- Decoupled `forge/content_processing/text.py` from `Config`
- Kept `execute_local_commands` in `Config` because it's needed to know if OS info should be included in the prompt
- Updated docs to reflect changes
- Renamed `Config` to `AppConfig`

### Added

- Added `ConfigurableComponent` class for components and following configs:
  - `ActionHistoryConfiguration`
  - `CodeExecutorConfiguration`
  - `FileManagerConfiguration` - now file manager allows to have multiple agents using the same workspace
  - `GitOperationsConfiguration`
  - `ImageGeneratorConfiguration`
  - `WebSearchConfiguration`
  - `WebSeleniumConfiguration`
- `BaseConfig` in `forge` and moved `Config` (now inherits from `BaseConfig`) back to `autogpt`
- Required `config_class` attribute for the `ConfigurableComponent` class that should be set to configuration class for a component
- `--component-config-file` CLI option and `COMPONENT_CONFIG_FILE` env var and field in `Config`. This option allows to load configuration from a specific file, CLI option takes precedence over env var.
- Added comments to config models

### Removed

- Unused `change_agent_id` method from `FileManagerComponent`
- Unused `allow_downloads` from `Config` and CLI options (it should be in web component config if needed)
- CLI option `--browser-name` (the option is inside `WebSeleniumConfiguration`)
- Unused `workspace_directory` from CLI options
- No longer needed variables from `Config` and docs
- Unused fields from `Config`: `image_size`, `audio_to_text_provider`, `huggingface_audio_to_text_model`
- Removed `files` and `workspace` class attributes from `FileManagerComponent`
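
The overwrite behaviour described for `deserialize_configs` (values present in the serialized config replace existing ones, missing values are left alone) can be illustrated with a minimal sketch. This is not the `forge` implementation: `SketchAgent`, `WebSearchConfig` and `CodeExecutorConfig` are hypothetical stand-ins built on plain dataclasses, and the JSON layout (component name mapped to field values) is an assumption. Only the field names and defaults are taken from the docs changed in this commit.

```python
import json
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class WebSearchConfig:
    # Hypothetical stand-in for WebSearchConfiguration
    google_api_key: Optional[str] = None
    duckduckgo_max_attempts: int = 3


@dataclass
class CodeExecutorConfig:
    # Hypothetical stand-in for CodeExecutorConfiguration
    execute_local_commands: bool = False
    docker_container_name: str = "agent_sandbox"


class SketchAgent:
    """Hypothetical agent holding one config object per component name."""

    def __init__(self) -> None:
        self.configs = {
            "WebSearchComponent": WebSearchConfig(),
            "CodeExecutorComponent": CodeExecutorConfig(),
        }

    def serialize_configs(self) -> str:
        # Dump every component config into a single JSON string.
        return json.dumps({name: asdict(cfg) for name, cfg in self.configs.items()})

    def deserialize_configs(self, serialized: str) -> None:
        # Overwrite only the components and fields present in the JSON;
        # everything else keeps its current value.
        for name, values in json.loads(serialized).items():
            cfg = self.configs.get(name)
            if cfg is None:
                continue  # assumption: unknown components are skipped
            known = {f.name for f in fields(cfg)}
            for key, value in values.items():
                if key in known:
                    setattr(cfg, key, value)


agent = SketchAgent()
# A partial config: only one field of one component is given.
agent.deserialize_configs('{"WebSearchComponent": {"duckduckgo_max_attempts": 5}}')
```

After the call, only `duckduckgo_max_attempts` has changed; the code executor settings and the unset API key keep their defaults, which is why a config file does not need to list every value.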
2026-01-09 15:17:59 -05:00 · 2024-06-19 09:14:01 +01:00
parent 02dc198a9f
commit c19ab2b24f
47 changed files with 772 additions and 722 deletions
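
The commit message states that the `--component-config-file` CLI option takes precedence over the `COMPONENT_CONFIG_FILE` env var, and the updated docs state that a value set in a configuration model takes precedence over its env var. A minimal sketch of both rules follows; the helper names `resolve_config_file` and `resolve_field` are hypothetical and are not part of `forge` or `autogpt`.

```python
import os
from typing import Mapping, Optional


def resolve_config_file(
    cli_value: Optional[str],
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Pick the component config file: the CLI option wins over the env var."""
    if cli_value:
        return cli_value
    # Treat an empty env var the same as an unset one (an assumption).
    return env.get("COMPONENT_CONFIG_FILE") or None


def resolve_field(
    config_value: Optional[str],
    env_name: str,
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Pick a field value: the configuration wins, the env var is the fallback."""
    if config_value is not None:
        return config_value
    return env.get(env_name)


# Usage with an explicit mapping instead of the real environment:
fake_env = {"COMPONENT_CONFIG_FILE": "from_env.json", "GITHUB_USERNAME": "env-user"}
chosen_file = resolve_config_file("from_cli.json", fake_env)
chosen_user = resolve_field(None, "GITHUB_USERNAME", fake_env)
```

Passing the environment as a parameter keeps the helpers easy to test without touching the process environment.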
--- a/docs/content/forge/components/built-in-components.md
+++ b/docs/content/forge/components/built-in-components.md
@@ -1,19 +1,26 @@
 # Built-in Components

 This page lists all [🧩 Components](./components.md) and [⚙️ Protocols](./protocols.md) they implement that are natively provided. They are used by the AutoGPT agent.
+Some components have additional configuration options listed in the table; see [Component configuration](./components.md/#ordering-components) to learn more.
+
+!!! note
+    If a configuration field uses an environment variable, it can still be set through the configuration model. **A value from the configuration takes precedence over the env var!** The env var is applied only if the value in the configuration is not set.

 ## `SystemComponent`

 Essential component to allow an agent to finish.

 **DirectiveProvider**
+
 - Constraints about API budget
  
 **MessageProvider**
+
 - Current time and date
 - Remaining API budget and warnings if budget is low

 **CommandProvider**
+
 - `finish` used when task is completed

 ## `UserInteractionComponent`
@@ -21,6 +28,7 @@ Essential component to allow an agent to finish.
 Adds ability to interact with user in CLI.

 **CommandProvider**
+
 - `ask_user` used to ask user for input

 ## `FileManagerComponent`
@@ -28,10 +36,19 @@ Adds ability to interact with user in CLI.
 Adds ability to read and write persistent files to local storage, Google Cloud Storage or Amazon's S3.
 Necessary for saving and loading agent's state (preserving session).

+| Config variable  | Details                                | Type  | Default                            |
+| ---------------- | -------------------------------------- | ----- | ---------------------------------- |
+| `files_path`     | Path to agent files, e.g. state        | `str` | `agents/{agent_id}/`[^1]           |
+| `workspace_path` | Path to files that agent has access to | `str` | `agents/{agent_id}/workspace/`[^1] |
+
+[^1] This option is set dynamically during component construction rather than as a default inside the configuration model; `{agent_id}` is replaced with the agent's unique identifier.
+
 **DirectiveProvider**
+
 - Resource information that it's possible to read and write files

 **CommandProvider**
+
 - `read_file` used to read file
 - `write_file` used to write file
 - `list_folder` lists all files in a folder 
@@ -40,7 +57,16 @@ Necessary for saving and loading agent's state (preserving session).

 Lets the agent execute non-interactive Shell commands and Python code. Python execution works only if Docker is available.

+| Config variable          | Details                                              | Type                        | Default           |
+| ------------------------ | ---------------------------------------------------- | --------------------------- | ----------------- |
+| `execute_local_commands` | Enable shell command execution                       | `bool`                      | `False`           |
+| `shell_command_control`  | Controls which list is used                          | `"allowlist" \| "denylist"` | `"allowlist"`     |
+| `shell_allowlist`        | List of allowed shell commands                       | `List[str]`                 | `[]`              |
+| `shell_denylist`         | List of prohibited shell commands                    | `List[str]`                 | `[]`              |
+| `docker_container_name`  | Name of the Docker container used for code execution | `str`                       | `"agent_sandbox"` |
+
 **CommandProvider**
+
 - `execute_shell` execute shell command
 - `execute_shell_popen` execute shell command with popen
 - `execute_python_code` execute Python code
@@ -50,38 +76,84 @@ Lets the agent execute non-interactive Shell commands and Python code. Python ex

 Keeps track of agent's actions and their outcomes. Provides their summary to the prompt.

+| Config variable        | Details                                                 | Type        | Default            |
+| ---------------------- | ------------------------------------------------------- | ----------- | ------------------ |
+| `model_name`           | Name of the LLM model used to compress the history      | `ModelName` | `"gpt-3.5-turbo"`  |
+| `max_tokens`           | Maximum number of tokens to use for the history summary | `int`       | `1024`             |
+| `spacy_language_model` | Language model used for summary chunking using spacy    | `str`       | `"en_core_web_sm"` |
+
 **MessageProvider**
+
 - Agent's progress summary

 **AfterParse**
+
 - Register agent's action

-**ExecutionFailuer**
+**ExecutionFailure**
+
 - Rewinds the agent's action, so it isn't saved

 **AfterExecute**
+
 - Saves the agent's action result in the history

 ## `GitOperationsComponent`

+Adds ability to interact with git repositories and GitHub.
+
+| Config variable   | Details                                   | Type  | Default |
+| ----------------- | ----------------------------------------- | ----- | ------- |
+| `github_username` | GitHub username, *ENV:* `GITHUB_USERNAME` | `str` | `None`  |
+| `github_api_key`  | GitHub API key, *ENV:* `GITHUB_API_KEY`   | `str` | `None`  |
+
 **CommandProvider**
+
 - `clone_repository` used to clone a git repository

 ## `ImageGeneratorComponent`

-Adds ability to generate images using various providers, see [Image Generation configuration](./../configuration/imagegen.md) to learn more.
+Adds ability to generate images using various providers.
+
+### Hugging Face
+
+To use text-to-image models from Hugging Face, you need a Hugging Face API token.
+Link to the appropriate settings page: [Hugging Face > Settings > Tokens](https://huggingface.co/settings/tokens)
+
+### Stable Diffusion WebUI
+
+It is possible to use your own self-hosted Stable Diffusion WebUI with AutoGPT. **Make sure you are running WebUI with `--api` enabled.**
+
+| Config variable           | Details                                                       | Type                                    | Default                           |
+| ------------------------- | ------------------------------------------------------------- | --------------------------------------- | --------------------------------- |
+| `image_provider`          | Image generation provider                                     | `"dalle" \| "huggingface" \| "sdwebui"` | `"dalle"`                         |
+| `huggingface_image_model` | Hugging Face image model, see [available models]              | `str`                                   | `"CompVis/stable-diffusion-v1-4"` |
+| `huggingface_api_token`   | Hugging Face API token, *ENV:* `HUGGINGFACE_API_TOKEN`        | `str`                                   | `None`                            |
+| `sd_webui_url`            | URL to self-hosted Stable Diffusion WebUI                     | `str`                                   | `"http://localhost:7860"`         |
+| `sd_webui_auth`           | Basic auth for Stable Diffusion WebUI, *ENV:* `SD_WEBUI_AUTH` | `str` of format `{username}:{password}` | `None`                            |
+
+[available models]: https://huggingface.co/models?pipeline_tag=text-to-image

 **CommandProvider**
+
 - `generate_image` used to generate an image given a prompt

 ## `WebSearchComponent`

-Allows agent to search the web.
+Allows agent to search the web. Google credentials aren't required for DuckDuckGo. [Instructions for setting up a Google API key](../../AutoGPT/configuration/search.md)
+
+| Config variable                  | Details                                                                 | Type  | Default |
+| -------------------------------- | ----------------------------------------------------------------------- | ----- | ------- |
+| `google_api_key`                 | Google API key, *ENV:* `GOOGLE_API_KEY`                                 | `str` | `None`  |
+| `google_custom_search_engine_id` | Google Custom Search Engine ID, *ENV:* `GOOGLE_CUSTOM_SEARCH_ENGINE_ID` | `str` | `None`  |
+| `duckduckgo_max_attempts`        | Maximum number of attempts to search using DuckDuckGo                   | `int` | `3`     |

 **DirectiveProvider**
+
 - Resource information that it's possible to search the web

 **CommandProvider**
+
 - `search_web` used to search the web using DuckDuckGo
 - `google` used to search the web using Google, requires API key

@@ -89,10 +161,20 @@ Allows agent to search the web.

 Allows agent to read websites using Selenium.

+| Config variable               | Details                                     | Type                                          | Default                                                                                                                      |
+| ----------------------------- | ------------------------------------------- | --------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------- |
+| `model_name`                  | Name of the LLM model used to read websites | `ModelName`                                   | `"gpt-3.5-turbo"`                                                                                                            |
+| `web_browser`                 | Web browser used by Selenium                | `"chrome" \| "firefox" \| "safari" \| "edge"` | `"chrome"`                                                                                                                   |
+| `headless`                    | Run browser in headless mode                | `bool`                                        | `True`                                                                                                                       |
+| `user_agent`                  | User agent used by the browser              | `str`                                         | `"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"` |
+| `browse_spacy_language_model` | Spacy language model used for chunking text | `str`                                         | `"en_core_web_sm"`                                                                                                           |
+
 **DirectiveProvider**
+
 - Resource information that it's possible to read websites

 **CommandProvider**
+
 - `read_website` used to read a specific url and look for specific topics or answer a question

 ## `ContextComponent`
@@ -100,9 +182,11 @@ Allows agent to read websites using Selenium.
 Adds ability to keep up-to-date file and folder content in the prompt.

 **MessageProvider**
+
 - Content of elements in the context

 **CommandProvider**
+
 - `open_file` used to open a file into context
 - `open_folder` used to open a folder into context
 - `close_context_item` remove an item from the context
@@ -112,4 +196,5 @@ Adds ability to keep up-to-date file and folder content in the prompt.
 Watches if agent is looping and switches to smart mode if necessary.

 **AfterParse**
+
 - Investigates what happened and switches to smart mode if necessary
--- a/docs/content/forge/components/creating-components.md
+++ b/docs/content/forge/components/creating-components.md
@@ -148,12 +148,12 @@ It gives an ability for the agent to ask user for input in the terminal.
        yield self.ask_user
    ```

-5. Since agent isn't always running in the terminal or interactive mode, we need to disable this component by setting `self._enabled` when it's not possible to ask for user input.
+5. Since the agent isn't always running in a terminal or in interactive mode, we need to disable this component by setting `self._enabled=False` when it's not possible to ask for user input.

    ```py
-    def __init__(self, config: Config):
+    def __init__(self, interactive_mode: bool):
        self.config = config
-        self._enabled = not config.noninteractive_mode
+        self._enabled = interactive_mode
    ```

 The final component should look like this:
@@ -164,10 +164,10 @@ class MyUserInteractionComponent(CommandProvider):
    """Provides commands to interact with the user."""

    # We pass config to check if we're in noninteractive mode
-    def __init__(self, config: Config):
+    def __init__(self, interactive_mode: bool):
        self.config = config
        # 5.
-        self._enabled = not config.noninteractive_mode
+        self._enabled = interactive_mode

    # 4.
    def get_commands(self) -> Iterator[Command]:
@@ -205,10 +205,10 @@ class MyAgent(Agent):
        settings: AgentSettings,
        llm_provider: MultiProvider,
        file_storage: FileStorage,
-        legacy_config: Config,
+        app_config: AppConfig,
    ):
        # Call the parent constructor to bring in the default components
-        super().__init__(settings, llm_provider, file_storage, legacy_config)
+        super().__init__(settings, llm_provider, file_storage, app_config)
        # Disable the default user interaction component by overriding it
        self.user_interaction = MyUserInteractionComponent()
 ```
@@ -222,14 +222,14 @@ class MyAgent(Agent):
        settings: AgentSettings,
        llm_provider: MultiProvider,
        file_storage: FileStorage,
-        legacy_config: Config,
+        app_config: AppConfig,
    ):
        # Call the parent constructor to bring in the default components
-        super().__init__(settings, llm_provider, file_storage, legacy_config)
+        super().__init__(settings, llm_provider, file_storage, app_config)
        # Disable the default user interaction component
        self.user_interaction = None
        # Add our own component
-        self.my_user_interaction = MyUserInteractionComponent(legacy_config)
+        self.my_user_interaction = MyUserInteractionComponent(app_config)
 ```

 ## Learn more
--- a/docs/content/forge/components/introduction.md
+++ b/docs/content/forge/components/introduction.md
@@ -1,5 +1,11 @@
 # Component Agents

+!!! important
+    [Legacy plugins] no longer work with AutoGPT. They have been replaced by components,
+    although we're still working on a new system to load plug-in components.
+
+[Legacy plugins]: https://github.com/Significant-Gravitas/Auto-GPT-Plugins
+
 This guide explains the component-based architecture of AutoGPT agents. It's a new way of building agents that is more flexible and easier to extend. Components replace some agent's logic and plugins with a more modular and composable system.

 Agent is composed of *components*, and each *component* implements a range of *protocols* (interfaces), each one providing a specific functionality, e.g. additional commands or messages. Each *protocol* is handled in a specific order, defined by the agent. This allows for a clear separation of concerns and a more modular design.