`AutoGPT/autogpt_platform/backend/backend/copilot/tools/IDEAS.md`

feat(backend/copilot): Copilot Executor Microservice (#12057)
Author: Reinier van der Leer (commit d23248f065)
Uncouple Copilot task execution from the REST API server. This should
improve performance and scalability, and it allows task execution to
continue regardless of the state of the user's connection.

- Resolves #12023

### Changes 🏗️

- Add `backend.copilot.executor`->`CoPilotExecutor` (setup similar to
`backend.executor`->`ExecutionManager`).

This executor service uses RabbitMQ-based task distribution, and sticks
with the existing Redis Streams setup for task output. It uses a cluster
lock mechanism to ensure a task is only executed by one pod, and the
`DatabaseManager` for pooled DB access.

- Add `backend.data.db_accessors` for automatic choice of direct/proxied
DB access
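
The accessor choice might look roughly like this; everything below except the module name `backend.data.db_accessors` is an assumption, not the actual implementation:

```python
# Hypothetical sketch of backend.data.db_accessors: pick direct DB access
# when the process owns a connection pool, otherwise proxy over RPC.
import os
from typing import Protocol


class WorkspaceDBAccessor(Protocol):
    async def get_file(self, file_id: str) -> bytes: ...


class DirectAccessor:
    """For processes with their own DB pool (e.g. the DatabaseManager)."""

    async def get_file(self, file_id: str) -> bytes:
        raise NotImplementedError  # would query the DB directly


class ProxiedAccessor:
    """For other services: forwards the call to the DatabaseManager."""

    async def get_file(self, file_id: str) -> bytes:
        raise NotImplementedError  # would go through the RPC client


def workspace_db() -> WorkspaceDBAccessor:
    # Assumed selection mechanism; the real check is presumably based on
    # which service the current process is running as.
    direct = os.getenv("DIRECT_DB_ACCESS") == "1"
    return DirectAccessor() if direct else ProxiedAccessor()
```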

Chat requests now flow: API → RabbitMQ → CoPilot Executor → Redis
Streams → SSE Client. This enables horizontal scaling of chat processing
and isolates long-running LLM operations from the API service.
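
Roughly, the two ends of that pipeline could look like this (illustrative only, using `pika` and `redis-py`; queue, stream, and lock names and the payload shape are made up):

```python
import json

import pika
import redis

QUEUE = "copilot_tasks"  # assumed queue name


def output_stream(task_id: str) -> str:
    return f"copilot:output:{task_id}"  # assumed per-task Redis Stream key


def enqueue_chat_task(task_id: str, message: str) -> None:
    """API side: publish the task to RabbitMQ and return immediately."""
    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = conn.channel()
    channel.queue_declare(queue=QUEUE, durable=True)
    channel.basic_publish(
        exchange="",
        routing_key=QUEUE,
        body=json.dumps({"task_id": task_id, "message": message}),
    )
    conn.close()


def run_worker() -> None:
    """Executor side: consume tasks, stream output chunks to Redis."""
    r = redis.Redis()
    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = conn.channel()
    channel.queue_declare(queue=QUEUE, durable=True)

    def handle(ch, method, _properties, body):
        task = json.loads(body)
        # Cluster lock: ensure only one pod processes this task
        lock = r.lock(f"copilot:lock:{task['task_id']}", timeout=300)
        if lock.acquire(blocking=False):
            try:
                for chunk in ("Hello", " world"):  # stand-in for LLM output
                    r.xadd(output_stream(task["task_id"]), {"delta": chunk})
            finally:
                lock.release()
        ch.basic_ack(delivery_tag=method.delivery_tag)

    channel.basic_consume(queue=QUEUE, on_message_callback=handle)
    channel.start_consuming()  # the SSE endpoint reads the stream with XREAD
```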

- Move non-API Copilot stuff into `backend.copilot` (from
`backend.api.features.chat`)
  - Updated import paths for all usages

- Move `backend.executor.database` to `backend.data.db_manager` and add
methods for copilot executor
  - Updated import paths for all usages
- Make `backend.copilot.db` RPC-compatible (-> DB ops return ~~Prisma~~
Pydantic models; see the sketch after this list)
  - Make `backend.data.workspace` RPC-compatible
  - Make `backend.data.graphs.get_store_listed_graphs` RPC-compatible
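
For illustration, "RPC-compatible" means converting ORM results into plain Pydantic models at the DB-access boundary, so return values serialize cleanly when the call crosses the RPC layer. The model and field names below are made up:

```python
from prisma import Prisma
from pydantic import BaseModel


class ChatSession(BaseModel):
    id: str
    user_id: str
    title: str | None = None


prisma = Prisma()  # assumes prisma.connect() is awaited at startup


async def get_chat_session(session_id: str) -> ChatSession | None:
    row = await prisma.chatsession.find_unique(where={"id": session_id})
    if row is None:
        return None
    # Convert at the boundary: callers on the other side of the RPC layer
    # receive a plain Pydantic model instead of a Prisma object.
    return ChatSession(id=row.id, user_id=row.user_id, title=row.title)
```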

DX:
- Add `copilot_executor` service to Docker setup

Config:
- Add `Config.num_copilot_workers` (default 5) and
`Config.copilot_executor_port` (default 8008); see the sketch below
- Remove unused `Config.agent_server_port`
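
Assuming the usual pydantic-settings style for `Config`, the new fields would look something like:

```python
from pydantic_settings import BaseSettings


class Config(BaseSettings):
    # Defaults per this PR; the field declarations here are an assumption
    num_copilot_workers: int = 5
    copilot_executor_port: int = 8008
```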

> [!WARNING]
> **This change adds a new microservice to the system, with entrypoint
`backend.copilot.executor`.**
> The `docker compose` setup has been updated, but if you run the
Platform on something else, you'll have to update your deployment config
to include this new service.
>
> When running locally, the `CoPilotExecutor` uses port 8008 by default.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Copilot works
    - [x] Processes messages when triggered
    - [x] Can use its tools

#### For configuration changes:

- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)

---------

Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>

# CoPilot Tools - Future Ideas

## Multimodal Image Support for CoPilot

**Problem:** CoPilot uses a vision-capable model but can't "see" workspace images. When a block generates an image and returns `workspace://abc123`, CoPilot can't evaluate it (e.g., checking blog thumbnail quality).

**Backend Solution:** When preparing messages for the LLM, detect `workspace://` image references and convert them to proper image content blocks:

```python
# Before sending to the LLM, scan for workspace image references
# and inject them as image content parts.

# Example message transformation:
# FROM: {"role": "assistant", "content": "Generated image: workspace://abc123"}
# TO:   {"role": "assistant", "content": [
#         {"type": "text", "text": "Generated image: workspace://abc123"},
#         {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}
#       ]}
```

**Where to implement:**

- In the chat stream handler, before calling the LLM
- Or in a message preprocessing step
- Need to fetch the image from the workspace, convert it to base64, and add it as image content (see the sketch below)
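
A minimal sketch of that preprocessing step, assuming OpenAI-style message dicts and a hypothetical `workspace.read_file(ref)` that returns `(mime_type, bytes)`:

```python
import base64
import re
from typing import Any

WORKSPACE_REF = re.compile(r"workspace://[A-Za-z0-9]+")


def inject_workspace_images(
    messages: list[dict[str, Any]],
    workspace,  # hypothetical workspace accessor
    max_bytes: int = 5 * 1024 * 1024,
) -> list[dict[str, Any]]:
    """Expand workspace:// image refs into image_url content parts."""
    out = []
    for msg in messages:
        content = msg.get("content")
        refs = WORKSPACE_REF.findall(content) if isinstance(content, str) else []
        if not refs:
            out.append(msg)
            continue
        parts: list[dict[str, Any]] = [{"type": "text", "text": content}]
        for ref in refs:
            mime, data = workspace.read_file(ref)  # hypothetical signature
            if not mime.startswith("image/") or len(data) > max_bytes:
                continue  # only inline reasonably sized images
            b64 = base64.b64encode(data).decode()
            parts.append(
                {"type": "image_url",
                 "image_url": {"url": f"data:{mime};base64,{b64}"}}
            )
        out.append({**msg, "content": parts})
    return out
```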

**Considerations:**

- Only do this for image MIME types (`image/png`, `image/jpeg`, etc.)
- May want a size limit (don't pass 10 MB images)
- Track which images were "shown" to the AI for the frontend indicator
- Cost implications: vision API calls are more expensive

**Frontend Solution:** Show a visual indicator on workspace files in chat:

- If the AI saw the image: normal display
- If the AI didn't see it: overlay an icon saying "AI can't see this image"

Requires response metadata indicating which `workspace://` refs were passed to the model.
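
That metadata could be as simple as a list of refs on the response (the shape below is an assumption):

```python
from pydantic import BaseModel, Field


class ChatResponseMetadata(BaseModel):
    # workspace:// refs that were inlined as image content for the model;
    # the frontend overlays the "can't see" icon on everything else.
    seen_image_refs: list[str] = Field(default_factory=list)
```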


## Output Post-Processing Layer for `run_block`

**Problem:** Many blocks produce large outputs that:

- Consume massive context (a 100 KB image becomes ~133 KB of base64 text, on the order of 30K tokens)
- Can't fit in the conversation
- Break things and cause high LLM costs

**Proposed Solution:** Instead of modifying individual blocks or `store_media_file()`, implement a centralized output processor in `run_block.py` that handles outputs before they're returned to CoPilot.

**Benefits:**

1. Centralized - one place to handle all output processing
2. Future-proof - new blocks automatically get output processing
3. Keeps blocks pure - they don't need to know about context constraints
4. Handles all large outputs - not just images

**Processing Rules:**

- Detect base64 data URIs → save to workspace, return a `workspace://` reference
- Truncate very long strings (>N chars) with a truncation note
- Summarize large arrays/lists (e.g., "Array with 1000 items, first 5: [...]")
- Handle nested large outputs in dicts recursively
- Cap total output size

**Implementation Location:** in `run_block.py`, after block execution and before returning `BlockOutputResponse`

**Example:**

```python
from typing import Any


def _process_outputs_for_context(
    outputs: dict[str, list[Any]],
    workspace_manager: WorkspaceManager,
    max_string_length: int = 10000,
    max_array_preview: int = 5,
) -> dict[str, list[Any]]:
    """Process block outputs to prevent context bloat."""
    processed: dict[str, list[Any]] = {}
    for name, values in outputs.items():
        # Pass the limits through so _process_value can enforce them
        processed[name] = [
            _process_value(v, workspace_manager, max_string_length, max_array_preview)
            for v in values
        ]
    return processed
```
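
And a possible `_process_value` companion applying the rules above, continuing in the same module (the `workspace_manager.save_file()` call is a hypothetical API that returns a `workspace://` reference):

```python
import re
from typing import Any

_DATA_URI = re.compile(r"^data:[\w/+.-]+;base64,")


def _process_value(
    value: Any,
    workspace_manager: "WorkspaceManager",  # from the surrounding module
    max_string_length: int,
    max_array_preview: int,
) -> Any:
    """Apply the processing rules to a single output value, recursively."""
    if isinstance(value, str):
        if _DATA_URI.match(value):
            # Offload inline base64 payloads to the workspace
            return workspace_manager.save_file(value)  # hypothetical API
        if len(value) > max_string_length:
            return value[:max_string_length] + f"... [truncated, {len(value)} chars total]"
        return value
    if isinstance(value, list) and len(value) > max_array_preview:
        preview = [
            _process_value(v, workspace_manager, max_string_length, max_array_preview)
            for v in value[:max_array_preview]
        ]
        return {
            "summary": f"Array with {len(value)} items, first {max_array_preview} shown",
            "preview": preview,
        }
    if isinstance(value, dict):
        return {
            k: _process_value(v, workspace_manager, max_string_length, max_array_preview)
            for k, v in value.items()
        }
    return value
```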