AutoGPT



Bently 28d85ad61c feat(backend/AM): Integrate AutoMod content moderation (#10539)

Copy of [feat(backend/AM): Integrate AutoMod content moderation - By
Bentlybro - PR
#10490](https://github.com/Significant-Gravitas/AutoGPT/pull/10490) because
I messed it up 🤦

Adds AutoMod input and output moderation to the execution flow.
Introduces a new AutoMod manager and models, updates settings for
moderation configuration, and modifies execution result handling to
support moderation-cleared data. Moderation failures now clear sensitive
data and mark executions as failed.
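The flow described above can be sketched roughly as follows. This is a minimal, self-contained Python sketch, not the code from this PR: `BlockRun`, `run_block_with_moderation`, and the `is_flagged` callable (a stand-in for the AutoMod API call) are hypothetical names, and the status strings are assumptions. It shows the ordering: inputs are checked before the block runs, outputs are checked afterwards, and a flag on either side marks the run as failed without keeping any outputs.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class BlockRun:
    # Hypothetical result record for one block execution (not the real model).
    status: str = "QUEUED"
    outputs: Dict[str, List[Any]] = field(default_factory=dict)
    error: Optional[str] = None


def run_block_with_moderation(
    inputs: Dict[str, Any],
    run_block: Callable[[Dict[str, Any]], Dict[str, List[Any]]],
    is_flagged: Callable[[Any], bool],  # hypothetical stand-in for the AutoMod call
    moderate_inputs: bool = True,   # mirrors automod_moderate_inputs
    moderate_outputs: bool = True,  # mirrors automod_moderate_outputs
) -> BlockRun:
    run = BlockRun()
    # Input moderation: a flagged input means the block never runs.
    if moderate_inputs and is_flagged(inputs):
        run.status, run.error = "FAILED", "Input flagged by AutoMod"
        return run
    outputs = run_block(inputs)
    # Output moderation: flagged outputs are dropped instead of being stored.
    if moderate_outputs and is_flagged(outputs):
        run.status, run.error = "FAILED", "Output flagged by AutoMod"
        return run
    run.status, run.outputs = "COMPLETED", outputs
    return run
```

Passing the moderation check in as a callable keeps the sketch testable with a fake checker instead of a live AM endpoint.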

<img width="921" height="816" alt="image"
src="https://github.com/user-attachments/assets/65c0fee8-d652-42bc-9553-ff507bc067c5"
/>


### Changes 🏗️

I have made some small changes to
``autogpt_platform\backend\backend\executor\manager.py`` to send the
needed info to the AutoMod system, which collects the data, combines it,
makes the API call to AM and, based on its reply, either lets the
execution run or stops it!
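The "collects the data, combines it" step might look something like this. It is a hypothetical sketch that assumes block data arrives as nested dicts and lists and that AutoMod accepts a single combined text field; `collect_text`, `build_moderation_payload`, and the `stage`/`content` field names are illustrative only, since the AM API schema is not public yet.

```python
from typing import Any, Dict, List


def collect_text(data: Any) -> List[str]:
    """Recursively gather every non-blank string in nested dicts, lists and tuples."""
    if isinstance(data, str):
        return [data] if data.strip() else []
    if isinstance(data, dict):
        items = list(data.values())
    elif isinstance(data, (list, tuple)):
        items = list(data)
    else:
        # Numbers, booleans, None, etc. carry no text to moderate.
        return []
    texts: List[str] = []
    for item in items:
        texts.extend(collect_text(item))
    return texts


def build_moderation_payload(data: Dict[str, Any], stage: str) -> Dict[str, str]:
    # Hypothetical request body: the AM API schema is not public yet, so the
    # "stage" and "content" field names are illustrative only.
    return {"stage": stage, "content": "\n".join(collect_text(data))}
```

Flattening everything into one string means a single API call per check, at the cost of losing which field the flagged text came from.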

I also had to make small changes to
``autogpt_platform\backend\backend\data\execution.py`` to add checks
that allow me to clear the content from the blocks if it was flagged.
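A sketch of the clearing step, under the assumption that a flag anywhere in a run should wipe both the inputs and outputs of every node execution in that run and mark each one as failed. `NodeExecution` and `clear_flagged_run` are hypothetical stand-ins, not the real models in `execution.py`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class NodeExecution:
    # Hypothetical node-execution record; the real model lives in execution.py.
    node_id: str
    status: str
    input_data: Dict[str, Any] = field(default_factory=dict)
    output_data: Dict[str, List[Any]] = field(default_factory=dict)


def clear_flagged_run(executions: List[NodeExecution]) -> int:
    """Wipe inputs/outputs of every node in a flagged run and mark it failed.

    Returns the number of node executions that actually changed.
    """
    changed = 0
    for ex in executions:
        if ex.input_data or ex.output_data or ex.status != "FAILED":
            ex.input_data = {}
            ex.output_data = {}
            ex.status = "FAILED"
            changed += 1
    return changed
```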

I am working on finalizing the AM repo; once that is done, it will be made public.

To note: we will want to put this behind a LaunchDarkly flag first for
testing on the team before we roll it out any further.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Set up and run the platform with ``automod_enabled`` set to False
and check that it works normally
- [x] Set up and run the platform with ``automod_enabled`` set to True,
set the AM URL and API key, and test that it runs safe blocks normally
- [x] Test AM with content that would trigger a flag and watch it
stop and clear all the blocks' outputs

Message @Bentlybro for the URL and an API key to AM for local testing!

## Changes made to settings.py

I have added a few new AutoMod configuration options to settings.py!

```python
    # AutoMod configuration
    automod_enabled: bool = Field(
        default=False,
        description="Whether AutoMod content moderation is enabled",
    )
    automod_api_url: str = Field(
        default="",
        description="AutoMod API base URL - Make sure it ends in /api",
    )
    automod_timeout: int = Field(
        default=30,
        description="Timeout in seconds for AutoMod API requests",
    )
    automod_retry_attempts: int = Field(
        default=3,
        description="Number of retry attempts for AutoMod API requests",
    )
    automod_retry_delay: float = Field(
        default=1.0,
        description="Delay between retries for AutoMod API requests in seconds",
    )
    automod_fail_open: bool = Field(
        default=False,
        description="If True, allow execution to continue if AutoMod fails",
    )
    automod_moderate_inputs: bool = Field(
        default=True,
        description="Whether to moderate block inputs",
    )
    automod_moderate_outputs: bool = Field(
        default=True,
        description="Whether to moderate block outputs",
    )
```
and, for the API key:
```python
automod_api_key: str = Field(default="", description="AutoMod API key")
```
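One plausible reading of how `automod_retry_attempts`, `automod_retry_delay` and `automod_fail_open` interact is sketched below. It assumes `automod_retry_attempts` counts total attempts (the description says "retry attempts", so the real code may make one extra try) and that the per-request `automod_timeout` is enforced inside the hypothetical `call_automod` callable. `moderate_with_retries` is an illustrative name, not part of the PR.

```python
import time
from typing import Callable


def moderate_with_retries(
    call_automod: Callable[[], bool],
    retry_attempts: int = 3,   # mirrors automod_retry_attempts (assumed: total tries)
    retry_delay: float = 1.0,  # mirrors automod_retry_delay
    fail_open: bool = False,   # mirrors automod_fail_open
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True if execution may continue, False if it must be blocked.

    ``call_automod`` is a hypothetical callable that makes one AutoMod request
    (enforcing automod_timeout itself) and returns True when content is flagged.
    Any exception it raises counts as a failed attempt.
    """
    attempts = max(1, retry_attempts)
    for attempt in range(attempts):
        try:
            return not call_automod()
        except Exception:
            # Network error, timeout or bad response: wait before the next attempt, if any.
            if attempt < attempts - 1:
                sleep(retry_delay)
    # Every attempt failed, so fall back to the configured policy:
    # fail open lets execution continue, fail closed blocks it.
    return fail_open
```

Injecting `sleep` keeps the function testable without real delays. With `automod_fail_open` left at its default of `False`, an unreachable AutoMod blocks execution rather than letting unmoderated content through.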

---------

Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>

2025-08-11 09:39:28 +00:00