Mirror of https://github.com/Significant-Gravitas/AutoGPT.git (synced 2026-04-08 03:00:28 -04:00)
Requested by @majdyz
## Problem
When users uploaded images or PDFs to CoPilot, the AI could not see the
content: the CLI's Zod validator rejects large base64 payloads in MCP tool
results, and even small images were misidentified (the CLI silently drops
or corrupts image content blocks in tool results).
## Approach
Embed uploaded images directly as **vision content blocks** in the user
message via `client._transport.write()`. The SDK's `client.query()` only
accepts string content, so we bypass it for multimodal messages —
writing a properly structured user message with `[...image_blocks,
{"type": "text", "text": query}]` directly to the transport. This
ensures the CLI binary receives images as native vision blocks, matching
how the Anthropic API handles multimodal input.
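The transport write described above can be sketched roughly as follows. This is a minimal illustration, not the SDK's actual wire format: `build_multimodal_user_message` and the newline-delimited JSON framing are assumptions; only the content-block shape (`image` block followed by a `text` block) comes from the description above.

```python
import base64
import json


def build_multimodal_user_message(
    image_bytes: bytes, media_type: str, query: str
) -> str:
    """Build a JSON-encoded user message: one vision block, then the text query.

    The content-block shape mirrors the Anthropic Messages API; the envelope
    and newline framing are illustrative assumptions.
    """
    image_block = {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(image_bytes).decode("ascii"),
        },
    }
    message = {
        "type": "user",
        "message": {
            "role": "user",
            # [...image_blocks, {"type": "text", "text": query}]
            "content": [image_block, {"type": "text", "text": query}],
        },
    }
    return json.dumps(message) + "\n"
```

Something of this shape would then be handed to `client._transport.write()` in place of the string-only `client.query()` path.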
For binary files accessed via workspace tools at runtime, we save them
to the SDK's ephemeral working directory (`sdk_cwd`) and return a file
path for the CLI's built-in `Read` tool to handle natively.
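The save-and-return-path idea is simple enough to sketch; the `save_to_sdk_cwd` helper and the `attachments` subdirectory are illustrative names, not code from the patch:

```python
import pathlib


def save_to_sdk_cwd(sdk_cwd: str, filename: str, data: bytes) -> str:
    """Persist binary content under the SDK's ephemeral working directory.

    Returns a path the CLI's built-in Read tool can open natively,
    instead of inlining base64 into the tool result.
    """
    target = pathlib.Path(sdk_cwd) / "attachments" / filename
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return str(target)
```

Because `sdk_cwd` is ephemeral, the files share the session's lifetime and need no separate cleanup path.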
## Changes
### Vision content blocks for attached files — `service.py`
- `_prepare_file_attachments` downloads workspace files before the
query, converts images to base64 vision blocks (`{"type": "image",
"source": {"type": "base64", ...}}`)
- When vision blocks are present, writes multimodal user message
directly to `client._transport` instead of using `client.query()`
- Non-image files (PDFs, text) are saved to `sdk_cwd` with a hint to use
the Read tool
### File-path based access for workspace tools — `workspace_files.py`
- `read_workspace_file` saves binary files to `sdk_cwd` instead of
returning base64, returning a path for the Read tool
### SDK context for ephemeral directory — `tool_adapter.py`
- Added `sdk_cwd` context variable so workspace tools can access the
ephemeral directory
- Removed inline base64 multimodal block machinery
(`_extract_content_block`, `_strip_base64_from_text`, `_BLOCK_BUILDERS`,
etc.)
### Frontend — rendering improvements
- `MessageAttachments.tsx` — uses `OutputRenderers` system
(`globalRegistry` + `OutputItem`) for image/video preview rendering
instead of custom components
- `GenericTool.tsx` — uses `OutputRenderers` system for inline image
rendering of base64 content
- `routes.py` — returns 409 for duplicate workspace filenames
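The duplicate-filename check reduces to a conflict test before write; this sketch uses plain Python rather than the actual route code, and `upload_workspace_file` is an illustrative name:

```python
from http import HTTPStatus


def upload_workspace_file(
    files: dict[str, bytes], name: str, data: bytes
) -> tuple[int, str]:
    """Reject duplicate workspace filenames with 409 Conflict (sketch)."""
    if name in files:
        return HTTPStatus.CONFLICT, f"File '{name}' already exists"
    files[name] = data
    return HTTPStatus.CREATED, name
```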
### Tests
- `tool_adapter_test.py` — removed multimodal extraction/stripping
tests, added `get_sdk_cwd` tests
- `service_test.py` — rewritten for `_prepare_file_attachments` with
file-on-disk assertions
Closes OPEN-3022
---------
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
60 lines
1.8 KiB
Python
"""Dummy SDK service for testing copilot streaming.
|
|
|
|
Returns mock streaming responses without calling Claude Agent SDK.
|
|
Enable via COPILOT_TEST_MODE=true environment variable.
|
|
|
|
WARNING: This is for testing only. Do not use in production.
|
|
"""
|
|
|
|
import asyncio
|
|
import logging
|
|
import uuid
|
|
from collections.abc import AsyncGenerator
|
|
from typing import Any
|
|
|
|
from ..model import ChatSession
|
|
from ..response_model import StreamBaseResponse, StreamStart, StreamTextDelta
|
|
|
|
logger = logging.getLogger(__name__)
|
|
|
|
|
|
async def stream_chat_completion_dummy(
|
|
session_id: str,
|
|
message: str | None = None,
|
|
tool_call_response: str | None = None,
|
|
is_user_message: bool = True,
|
|
user_id: str | None = None,
|
|
retry_count: int = 0,
|
|
session: ChatSession | None = None,
|
|
context: dict[str, str] | None = None,
|
|
**_kwargs: Any,
|
|
) -> AsyncGenerator[StreamBaseResponse, None]:
|
|
"""Stream dummy chat completion for testing.
|
|
|
|
Returns a simple streaming response with text deltas to test:
|
|
- Streaming infrastructure works
|
|
- No timeout occurs
|
|
- Text arrives in chunks
|
|
- StreamFinish is sent by mark_session_completed
|
|
"""
|
|
logger.warning(
|
|
f"[TEST MODE] Using dummy copilot streaming for session {session_id}"
|
|
)
|
|
|
|
message_id = str(uuid.uuid4())
|
|
text_block_id = str(uuid.uuid4())
|
|
|
|
# Start the stream
|
|
yield StreamStart(messageId=message_id, sessionId=session_id)
|
|
|
|
# Simulate streaming text response with delays
|
|
dummy_response = "I counted: 1... 2... 3. All done!"
|
|
words = dummy_response.split()
|
|
|
|
for i, word in enumerate(words):
|
|
# Add space except for last word
|
|
text = word if i == len(words) - 1 else f"{word} "
|
|
yield StreamTextDelta(id=text_block_id, delta=text)
|
|
# Small delay to simulate real streaming
|
|
await asyncio.sleep(0.1)
|
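The word-splitting delta logic can be exercised standalone; this sketch drops the `StreamTextDelta` models and delays so the reassembly property is easy to check:

```python
import asyncio
from collections.abc import AsyncGenerator


async def dummy_deltas(text: str) -> AsyncGenerator[str, None]:
    """Yield the message word by word, space-suffixed except the last word,
    mirroring the delta loop in the dummy service above."""
    words = text.split()
    for i, word in enumerate(words):
        yield word if i == len(words) - 1 else f"{word} "


async def collect(text: str) -> str:
    # Concatenating the deltas must reproduce the original message.
    return "".join([d async for d in dummy_deltas(text)])
```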