Files
AutoGPT/autogpt_platform/backend/backend/copilot/sdk/dummy.py
Otto 3d0ede9f34 feat(backend/copilot): attach uploaded images and PDFs as multimodal vision blocks (#12273)
Requested by @majdyz

When users upload images or PDFs to CoPilot, the AI couldn't see the
content because the CLI's Zod validator rejects large base64 in MCP tool
results and even small images were misidentified (the CLI silently drops
or corrupts image content blocks in tool results).

## Approach

Embed uploaded images directly as **vision content blocks** in the user
message via `client._transport.write()`. The SDK's `client.query()` only
accepts string content, so we bypass it for multimodal messages —
writing a properly structured user message with `[...image_blocks,
{"type": "text", "text": query}]` directly to the transport. This
ensures the CLI binary receives images as native vision blocks, matching
how the Anthropic API handles multimodal input.

For binary files accessed via workspace tools at runtime, we save them
to the SDK's ephemeral working directory (`sdk_cwd`) and return a file
path for the CLI's built-in `Read` tool to handle natively.

## Changes

### Vision content blocks for attached files — `service.py`
- `_prepare_file_attachments` downloads workspace files before the
query, converts images to base64 vision blocks (`{"type": "image",
"source": {"type": "base64", ...}}`)
- When vision blocks are present, writes multimodal user message
directly to `client._transport` instead of using `client.query()`
- Non-image files (PDFs, text) are saved to `sdk_cwd` with a hint to use
the Read tool

### File-path based access for workspace tools — `workspace_files.py`
- `read_workspace_file` saves binary files to `sdk_cwd` instead of
returning base64, returning a path for the Read tool

### SDK context for ephemeral directory — `tool_adapter.py`
- Added `sdk_cwd` context variable so workspace tools can access the
ephemeral directory
- Removed inline base64 multimodal block machinery
(`_extract_content_block`, `_strip_base64_from_text`, `_BLOCK_BUILDERS`,
etc.)

### Frontend — rendering improvements
- `MessageAttachments.tsx` — uses `OutputRenderers` system
(`globalRegistry` + `OutputItem`) for image/video preview rendering
instead of custom components
- `GenericTool.tsx` — uses `OutputRenderers` system for inline image
rendering of base64 content
- `routes.py` — returns 409 for duplicate workspace filenames

### Tests
- `tool_adapter_test.py` — removed multimodal extraction/stripping
tests, added `get_sdk_cwd` tests
- `service_test.py` — rewritten for `_prepare_file_attachments` with
file-on-disk assertions

Closes OPEN-3022

---------

Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
2026-03-05 09:09:59 +00:00

60 lines
1.8 KiB
Python

"""Dummy SDK service for testing copilot streaming.
Returns mock streaming responses without calling Claude Agent SDK.
Enable via COPILOT_TEST_MODE=true environment variable.
WARNING: This is for testing only. Do not use in production.
"""
import asyncio
import logging
import uuid
from collections.abc import AsyncGenerator
from typing import Any
from ..model import ChatSession
from ..response_model import StreamBaseResponse, StreamStart, StreamTextDelta
logger = logging.getLogger(__name__)
async def stream_chat_completion_dummy(
session_id: str,
message: str | None = None,
tool_call_response: str | None = None,
is_user_message: bool = True,
user_id: str | None = None,
retry_count: int = 0,
session: ChatSession | None = None,
context: dict[str, str] | None = None,
**_kwargs: Any,
) -> AsyncGenerator[StreamBaseResponse, None]:
"""Stream dummy chat completion for testing.
Returns a simple streaming response with text deltas to test:
- Streaming infrastructure works
- No timeout occurs
- Text arrives in chunks
- StreamFinish is sent by mark_session_completed
"""
logger.warning(
f"[TEST MODE] Using dummy copilot streaming for session {session_id}"
)
message_id = str(uuid.uuid4())
text_block_id = str(uuid.uuid4())
# Start the stream
yield StreamStart(messageId=message_id, sessionId=session_id)
# Simulate streaming text response with delays
dummy_response = "I counted: 1... 2... 3. All done!"
words = dummy_response.split()
for i, word in enumerate(words):
# Add space except for last word
text = word if i == len(words) - 1 else f"{word} "
yield StreamTextDelta(id=text_block_id, delta=text)
# Small delay to simulate real streaming
await asyncio.sleep(0.1)