"""
|
|
Response models for Vercel AI SDK UI Stream Protocol.
|
|
|
|
This module implements the AI SDK UI Stream Protocol (v1) for streaming chat responses.
|
|
See: https://ai-sdk.dev/docs/ai-sdk-ui/stream-protocol
|
|
"""
|
|
|
|
import json
|
|
import logging
|
|
from enum import Enum
|
|
from typing import Any
|
|
|
|
from pydantic import BaseModel, Field
|
|
|
|
from backend.util.json import dumps as json_dumps
|
|
from backend.util.truncate import truncate
|
|
|
|
logger = logging.getLogger(__name__)
|
|
|
|
|
|
class ResponseType(str, Enum):
|
|
"""Types of streaming responses following AI SDK protocol."""
|
|
|
|
# Message lifecycle
|
|
START = "start"
|
|
FINISH = "finish"
|
|
|
|
# Step lifecycle (one LLM API call within a message)
|
|
START_STEP = "start-step"
|
|
FINISH_STEP = "finish-step"
|
|
|
|
# Text streaming
|
|
TEXT_START = "text-start"
|
|
TEXT_DELTA = "text-delta"
|
|
TEXT_END = "text-end"
|
|
|
|
# Tool interaction
|
|
TOOL_INPUT_START = "tool-input-start"
|
|
TOOL_INPUT_AVAILABLE = "tool-input-available"
|
|
TOOL_OUTPUT_AVAILABLE = "tool-output-available"
|
|
|
|
# Other
|
|
ERROR = "error"
|
|
USAGE = "usage"
|
|
HEARTBEAT = "heartbeat"
|
|
STATUS = "status"
|
|
|
|
|
|
class StreamBaseResponse(BaseModel):
|
|
"""Base response model for all streaming responses."""
|
|
|
|
type: ResponseType
|
|
|
|
def to_sse(self) -> str:
|
|
"""Convert to SSE format."""
|
|
json_str = self.model_dump_json(exclude_none=True)
|
|
return f"data: {json_str}\n\n"
|
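The base class above frames every protocol event as a `data: <json>\n\n` SSE block, while heartbeat/usage events (further down) use the `: <comment>\n\n` comment framing that parsers skip. A minimal stdlib-only sketch of the two framings, using hypothetical helper names (`frame_event`, `frame_comment`) rather than the module's pydantic models:

```python
import json

# Hypothetical helpers (not part of this module) illustrating the two SSE
# framings used by the response models in this file.

def frame_event(payload: dict) -> str:
    # Mirrors StreamBaseResponse.to_sse(): one JSON object per "data:" block
    return f"data: {json.dumps(payload, separators=(',', ':'))}\n\n"

def frame_comment(text: str) -> str:
    # Mirrors StreamHeartbeat/StreamUsage: SSE comments are ignored by clients
    return f": {text}\n\n"

print(frame_event({"type": "start", "messageId": "msg_1"}))
# → data: {"type":"start","messageId":"msg_1"}
print(frame_comment("heartbeat"))
# → : heartbeat
```

Each block ends with a blank line (`\n\n`), which is what delimits events in the SSE wire format.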


# ========== Message Lifecycle ==========


class StreamStart(StreamBaseResponse):
    """Start of a new message."""

    type: ResponseType = ResponseType.START
    messageId: str = Field(..., description="Unique message ID")
    sessionId: str | None = Field(
        default=None,
        description="Session ID for SSE reconnection.",
    )

    def to_sse(self) -> str:
        """Convert to SSE format, excluding non-protocol fields like sessionId."""
        data: dict[str, Any] = {
            "type": self.type.value,
            "messageId": self.messageId,
        }
        return f"data: {json.dumps(data)}\n\n"


class StreamFinish(StreamBaseResponse):
    """End of message/stream."""

    type: ResponseType = ResponseType.FINISH


class StreamStartStep(StreamBaseResponse):
    """Start of a step (one LLM API call within a message).

    The AI SDK uses this to add a step-start boundary to message.parts,
    enabling visual separation between multiple LLM calls in a single message.
    """

    type: ResponseType = ResponseType.START_STEP


class StreamFinishStep(StreamBaseResponse):
    """End of a step (one LLM API call within a message).

    The AI SDK uses this to reset activeTextParts and activeReasoningParts,
    so the next LLM call in a tool-call continuation starts with clean state.
    """

    type: ResponseType = ResponseType.FINISH_STEP


# ========== Text Streaming ==========


class StreamTextStart(StreamBaseResponse):
    """Start of a text block."""

    type: ResponseType = ResponseType.TEXT_START
    id: str = Field(..., description="Text block ID")


class StreamTextDelta(StreamBaseResponse):
    """Streaming text content delta."""

    type: ResponseType = ResponseType.TEXT_DELTA
    id: str = Field(..., description="Text block ID")
    delta: str = Field(..., description="Text content delta")


class StreamTextEnd(StreamBaseResponse):
    """End of a text block."""

    type: ResponseType = ResponseType.TEXT_END
    id: str = Field(..., description="Text block ID")


# ========== Tool Interaction ==========


class StreamToolInputStart(StreamBaseResponse):
    """Tool call started notification."""

    type: ResponseType = ResponseType.TOOL_INPUT_START
    toolCallId: str = Field(..., description="Unique tool call ID")
    toolName: str = Field(..., description="Name of the tool being called")


class StreamToolInputAvailable(StreamBaseResponse):
    """Tool input is ready for execution."""

    type: ResponseType = ResponseType.TOOL_INPUT_AVAILABLE
    toolCallId: str = Field(..., description="Unique tool call ID")
    toolName: str = Field(..., description="Name of the tool being called")
    input: dict[str, Any] = Field(
        default_factory=dict, description="Tool input arguments"
    )


_MAX_TOOL_OUTPUT_SIZE = 100_000  # ~100 KB; truncate to avoid bloating SSE/DB


class StreamToolOutputAvailable(StreamBaseResponse):
    """Tool execution result."""

    type: ResponseType = ResponseType.TOOL_OUTPUT_AVAILABLE
    toolCallId: str = Field(..., description="Tool call ID this responds to")
    output: str | dict[str, Any] = Field(..., description="Tool execution output")
    # Keep these for internal backend use
    toolName: str | None = Field(
        default=None, description="Name of the tool that was executed"
    )
    success: bool = Field(
        default=True, description="Whether the tool execution succeeded"
    )

    def model_post_init(self, __context: Any) -> None:
        """Truncate oversized outputs after construction."""
        self.output = truncate(self.output, _MAX_TOOL_OUTPUT_SIZE)

    def to_sse(self) -> str:
        """Convert to SSE format, excluding non-spec fields."""
        data = {
            "type": self.type.value,
            "toolCallId": self.toolCallId,
            "output": self.output,
        }
        return f"data: {json.dumps(data)}\n\n"


# ========== Other ==========


class StreamUsage(StreamBaseResponse):
    """Token usage statistics.

    Emitted as an SSE comment so the Vercel AI SDK parser ignores it
    (it uses z.strictObject() and rejects unknown event types).
    Usage data is recorded server-side (session DB + Redis counters).
    """

    type: ResponseType = ResponseType.USAGE
    prompt_tokens: int = Field(
        ...,
        serialization_alias="promptTokens",
        description="Number of uncached prompt tokens",
    )
    completion_tokens: int = Field(
        ...,
        serialization_alias="completionTokens",
        description="Number of completion tokens",
    )
    total_tokens: int = Field(
        ...,
        serialization_alias="totalTokens",
        description="Total number of tokens (raw, not weighted)",
    )
    cache_read_tokens: int = Field(
        default=0,
        serialization_alias="cacheReadTokens",
        description="Prompt tokens served from cache (10% cost)",
    )
    cache_creation_tokens: int = Field(
        default=0,
        serialization_alias="cacheCreationTokens",
        description="Prompt tokens written to cache (25% cost)",
    )

    def to_sse(self) -> str:
        """Emit as SSE comment so the AI SDK parser ignores it."""
        return f": usage {self.model_dump_json(exclude_none=True, by_alias=True)}\n\n"


class StreamError(StreamBaseResponse):
    """Error response."""

    type: ResponseType = ResponseType.ERROR
    errorText: str = Field(..., description="Error message text")
    code: str | None = Field(default=None, description="Error code")
    details: dict[str, Any] | None = Field(
        default=None, description="Additional error details"
    )

    def to_sse(self) -> str:
        """Convert to SSE format, only emitting fields required by the AI SDK protocol.

        The AI SDK uses z.strictObject({type, errorText}), which rejects
        any extra fields like `code` or `details`.
        """
        data = {
            "type": self.type.value,
            "errorText": self.errorText,
        }
        return f"data: {json_dumps(data)}\n\n"


class StreamHeartbeat(StreamBaseResponse):
    """Heartbeat to keep the SSE connection alive during long-running operations.

    Uses SSE comment format (: comment), which is ignored by clients but keeps
    the connection alive through proxies and load balancers.
    """

    type: ResponseType = ResponseType.HEARTBEAT
    toolCallId: str | None = Field(
        default=None, description="Tool call ID if heartbeat is for a specific tool"
    )

    def to_sse(self) -> str:
        """Convert to SSE comment format to keep connection alive."""
        return ": heartbeat\n\n"


class StreamStatus(StreamBaseResponse):
    """Transient status notification shown to the user during long operations.

    Used to provide feedback when the backend performs behind-the-scenes work
    (e.g., compacting conversation context on a retry) that would otherwise
    leave the user staring at an unexplained pause.

    Sent as a proper ``data:`` event so the frontend can display it to the
    user. The AI SDK stream parser gracefully skips unknown chunk types
    (logs a console warning), so this does not break the stream.
    """

    type: ResponseType = ResponseType.STATUS
    message: str = Field(..., description="Human-readable status message")