feat(chat/sdk): route SDK through OpenRouter with observability (#12084)

## Summary

- Routes Claude Agent SDK API calls through OpenRouter via `ANTHROPIC_BASE_URL` / `ANTHROPIC_AUTH_TOKEN` env vars, enabling per-call token and cost tracking on the OpenRouter dashboard
- Adds `sdk_model` and `sdk_max_budget_usd` config fields for SDK-specific model selection and budget control
- Emits `StreamUsage` from SDK `ResultMessage` so the frontend receives token counts, and persists usage to `session.usage`
- Fixes Langfuse tracing to use the configured model name instead of a hardcoded default
- Updates Anthropic fallback to use `config.api_key` / `config.base_url` (OpenRouter routing) instead of raw `ANTHROPIC_API_KEY` env var

## Test plan

- [ ] Deploy and send a CoPilot message, then verify the API call appears on the OpenRouter dashboard
- [ ] Check Langfuse trace shows correct model name (e.g. `claude-opus-4.6` not hardcoded `claude-sonnet-4-20250514`)
- [ ] Verify frontend receives `StreamUsage` with `promptTokens` / `completionTokens` values
- [ ] Set `CHAT_SDK_MAX_BUDGET_USD` and verify budget is respected
- [ ] Test fallback path (without `claude-agent-sdk` installed) still works via OpenRouter

<h2>Greptile Overview</h2>

<details><summary><h3>Greptile Summary</h3></summary>

Routes Claude Agent SDK API calls through OpenRouter for enhanced observability and cost tracking. The PR enables per-call token tracking on the OpenRouter dashboard by configuring the SDK to use `ANTHROPIC_BASE_URL` and `ANTHROPIC_AUTH_TOKEN` environment variables derived from the chat configuration.

Key changes:

- Added `sdk_model` and `sdk_max_budget_usd` configuration fields for SDK-specific control
- Implemented automatic model name resolution that strips OpenRouter provider prefixes
- Updated SDK client initialization to route through OpenRouter with proper environment variables
- Emits `StreamUsage` events from SDK `ResultMessage` for frontend token visibility
- Persists usage data to `session.usage` for historical tracking
- Fixed Langfuse tracing to use the configured model name instead of hardcoded defaults
- Updated fallback path to use OpenRouter routing instead of direct Anthropic API

</details>

<details><summary><h3>Confidence Score: 4/5</h3></summary>

- Safe to merge with minor observations - the implementation is solid and the changes are well-structured
- The code quality is high with proper error handling, clear separation of concerns, and good defensive coding practices. The changes integrate cleanly with existing patterns. Minor observations include missing validation for `sdk_max_budget_usd` and a potential edge case in model name resolution, but these don't block merging
- No files require special attention - all changes follow existing patterns and maintain consistency

</details>

<details><summary><h3>Sequence Diagram</h3></summary>

```mermaid
sequenceDiagram
    participant Frontend
    participant Backend
    participant SDK as Claude Agent SDK
    participant OpenRouter
    participant Anthropic
    participant Langfuse

    Frontend->>Backend: POST /chat/completions
    Backend->>Backend: Load config (api_key, base_url)
    Backend->>Backend: Resolve SDK model (strip OpenRouter prefix)
    Backend->>Backend: Build SDK env vars (ANTHROPIC_BASE_URL, ANTHROPIC_AUTH_TOKEN)
    Backend->>Langfuse: Initialize TracedSession with model name
    Backend->>SDK: ClaudeSDKClient(model, env, max_budget_usd)
    SDK->>SDK: Use ANTHROPIC_BASE_URL from env
    SDK->>OpenRouter: POST /messages (via configured base_url)
    OpenRouter->>Anthropic: Forward request with routing
    Anthropic-->>OpenRouter: Stream response chunks
    OpenRouter-->>SDK: Stream response with usage data

    loop For each SDK message
        SDK-->>Backend: AssistantMessage/UserMessage/ResultMessage
        Backend->>Langfuse: log_sdk_message()
        Backend->>Backend: SDKResponseAdapter.convert_message()
        Backend->>Backend: Extract usage from ResultMessage
        Backend->>Backend: Persist Usage to session.usage
        Backend-->>Frontend: StreamUsage(promptTokens, completionTokens)
        Backend-->>Frontend: StreamTextDelta/StreamToolInput/etc
    end

    Backend->>Langfuse: Log final generation with model name
    Backend->>Backend: Save session with usage data
    Backend-->>Frontend: StreamFinish
```

</details>
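
For reviewers who want to poke at the "strip OpenRouter provider prefix" step in isolation, here is a minimal standalone sketch. The function name `resolve_sdk_model` and its `override` parameter are hypothetical and chosen for illustration; the real helper in this PR reads its values from the chat config rather than taking arguments.

```python
from typing import Optional


def resolve_sdk_model(model: str, override: Optional[str] = None) -> str:
    """Pick the model name to hand to the SDK (illustrative sketch only).

    An explicit override wins. Otherwise everything up to and including
    the first "/" is dropped, so "anthropic/claude-opus-4.6" becomes
    "claude-opus-4.6". Names without a slash pass through unchanged.
    """
    if override:
        return override
    if "/" in model:
        # Split on the first slash only; any later slashes are kept.
        return model.split("/", 1)[1]
    return model
```

Because only the first slash is split on, a name with more than one slash keeps the remainder (for example `"a/b/c"` resolves to `"b/c"`). That may be the edge case the review refers to, though the review does not spell it out.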
2026-04-08 03:00:28 -04:00 · 2026-02-12 18:47:39 +04:00
parent d7f7a2747f
commit 3bea584659
6 changed files with 252 additions and 17 deletions
--- a/autogpt_platform/backend/backend/api/features/chat/config.py
+++ b/autogpt_platform/backend/backend/api/features/chat/config.py
@@ -97,9 +97,14 @@ class ChatConfig(BaseSettings):
        default=True,
        description="Use Claude Agent SDK for chat completions",
    )
-    sdk_max_buffer_size: int = Field(
+    claude_agent_model: str | None = Field(
+        default=None,
+        description="Model for the Claude Agent SDK path. If None, derives from "
+        "the `model` field by stripping the OpenRouter provider prefix.",
+    )
+    claude_agent_max_buffer_size: int = Field(
        default=10 * 1024 * 1024,  # 10MB (default SDK is 1MB)
-        description="Max buffer size in bytes for SDK JSON message parsing. "
+        description="Max buffer size in bytes for Claude Agent SDK JSON message parsing. "
        "Increase if tool outputs exceed the limit.",
    )

--- a/autogpt_platform/backend/backend/api/features/chat/sdk/anthropic_fallback.py
+++ b/autogpt_platform/backend/backend/api/features/chat/sdk/anthropic_fallback.py
@@ -6,11 +6,12 @@ directly when the Claude Agent SDK is not available.

 import json
 import logging
-import os
 import uuid
 from collections.abc import AsyncGenerator
 from typing import Any, cast

+import anthropic
+
 from ..config import ChatConfig
 from ..model import ChatMessage, ChatSession
 from ..response_model import (
@@ -23,7 +24,6 @@ from ..response_model import (
    StreamToolInputAvailable,
    StreamToolInputStart,
    StreamToolOutputAvailable,
-    StreamUsage,
 )
 from .tool_adapter import get_tool_definitions, get_tool_handlers

@@ -44,19 +44,27 @@ async def stream_with_anthropic(
    This function accumulates messages into the session for persistence.
    The caller should NOT yield an additional StreamFinish - this function handles it.
    """
-    import anthropic
-
-    # Only use ANTHROPIC_API_KEY - don't fall back to OpenRouter keys
-    api_key = os.getenv("ANTHROPIC_API_KEY")
+    # Use config.api_key (CHAT_API_KEY > OPEN_ROUTER_API_KEY > OPENAI_API_KEY)
+    # with config.base_url for OpenRouter routing — matching the non-SDK path.
+    api_key = config.api_key
    if not api_key:
        yield StreamError(
-            errorText="ANTHROPIC_API_KEY not configured for fallback",
+            errorText="No API key configured (set CHAT_API_KEY or OPENAI_API_KEY)",
            code="config_error",
        )
        yield StreamFinish()
        return

-    client = anthropic.AsyncAnthropic(api_key=api_key)
+    # Build kwargs for the Anthropic client — use base_url if configured
+    client_kwargs: dict[str, Any] = {"api_key": api_key}
+    if config.base_url:
+        # Strip /v1 suffix — Anthropic SDK adds its own version path
+        base = config.base_url.rstrip("/")
+        if base.endswith("/v1"):
+            base = base[:-3]
+        client_kwargs["base_url"] = base
+
+    client = anthropic.AsyncAnthropic(**client_kwargs)
    tool_definitions = get_tool_definitions()
    tool_handlers = get_tool_handlers()

@@ -220,12 +228,6 @@ async def stream_with_anthropic(
                            ChatMessage(role="assistant", content=accumulated_text)
                        )

-                    yield StreamUsage(
-                        promptTokens=final_message.usage.input_tokens,
-                        completionTokens=final_message.usage.output_tokens,
-                        totalTokens=final_message.usage.input_tokens
-                        + final_message.usage.output_tokens,
-                    )
                    yield StreamFinish()
                    return
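
The `/v1` stripping used when building `client_kwargs` above can be exercised on its own. The sketch below is a hypothetical standalone helper (`normalize_base_url` is not a name from this PR), and the OpenRouter URL in the usage note is only an example value.

```python
def normalize_base_url(base_url: str) -> str:
    """Drop trailing slashes and a trailing "/v1" path segment.

    Mirrors the inline logic in the diff: the Anthropic client is
    described there as appending its own version path, so the
    configured URL should not already end in one.
    """
    base = base_url.rstrip("/")
    if base.endswith("/v1"):
        base = base[: -len("/v1")]
    return base
```

With this helper, both `"https://openrouter.ai/api/v1"` and `"https://openrouter.ai/api/v1/"` normalize to `"https://openrouter.ai/api"`, while a URL without the suffix is returned with only its trailing slashes removed.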

--- a/autogpt_platform/backend/backend/api/features/chat/sdk/service.py
+++ b/autogpt_platform/backend/backend/api/features/chat/sdk/service.py
@@ -64,6 +64,48 @@ interpreters (python, node) are NOT available.
 """


+def _resolve_sdk_model() -> str | None:
+    """Resolve the model name for the Claude Agent SDK CLI.
+
+    Uses ``config.claude_agent_model`` if set, otherwise derives from
+    ``config.model`` by stripping the OpenRouter provider prefix (e.g.,
+    ``"anthropic/claude-opus-4.6"`` → ``"claude-opus-4.6"``).
+    """
+    if config.claude_agent_model:
+        return config.claude_agent_model
+    model = config.model
+    if "/" in model:
+        return model.split("/", 1)[1]
+    return model
+
+
+def _build_sdk_env() -> dict[str, str]:
+    """Build env vars for the SDK CLI process.
+
+    Routes API calls through OpenRouter (or a custom base_url) using
+    the same ``config.api_key`` / ``config.base_url`` as the non-SDK path.
+    This gives per-call token and cost tracking on the OpenRouter dashboard.
+
+    Only overrides ``ANTHROPIC_API_KEY`` when a valid proxy URL and auth
+    token are both present — otherwise returns an empty dict so the SDK
+    falls back to its default credentials.
+    """
+    env: dict[str, str] = {}
+    if config.api_key and config.base_url:
+        # Strip /v1 suffix — SDK expects the base URL without a version path
+        base = config.base_url.rstrip("/")
+        if base.endswith("/v1"):
+            base = base[:-3]
+        if not base or not base.startswith("http"):
+            # Invalid base_url — don't override SDK defaults
+            return env
+        env["ANTHROPIC_BASE_URL"] = base
+        env["ANTHROPIC_AUTH_TOKEN"] = config.api_key
+        # Must be explicitly empty so the CLI uses AUTH_TOKEN instead
+        env["ANTHROPIC_API_KEY"] = ""
+    return env
+
+
 def _make_sdk_cwd(session_id: str) -> str:
    """Create a safe, session-specific working directory path.

@@ -315,8 +357,19 @@ async def stream_chat_completion_sdk(
        try:
            from claude_agent_sdk import ClaudeAgentOptions, ClaudeSDKClient

+            # Fail fast when no API credentials are available at all
+            sdk_env = _build_sdk_env()
+            if not sdk_env and not os.environ.get("ANTHROPIC_API_KEY"):
+                raise RuntimeError(
+                    "No API key configured. Set OPEN_ROUTER_API_KEY "
+                    "(or CHAT_API_KEY) for OpenRouter routing, "
+                    "or ANTHROPIC_API_KEY for direct Anthropic access."
+                )
+
            mcp_server = create_copilot_mcp_server()

+            sdk_model = _resolve_sdk_model()
+
            # Initialize Langfuse tracing (no-op if not configured)
            tracer = TracedSession(session_id, user_id, system_prompt)

@@ -331,7 +384,9 @@ async def stream_chat_completion_sdk(
                allowed_tools=COPILOT_TOOL_NAMES,
                hooks=combined_hooks,  # type: ignore[arg-type]
                cwd=sdk_cwd,
-                max_buffer_size=config.sdk_max_buffer_size,
+                max_buffer_size=config.claude_agent_max_buffer_size,
+                # Only pass model/env when OpenRouter is configured
+                **({"model": sdk_model, "env": sdk_env} if sdk_env else {}),
            )

            adapter = SDKResponseAdapter(message_id=message_id)
@@ -385,6 +440,7 @@ async def stream_chat_completion_sdk(
                    for response in adapter.convert_message(sdk_msg):
                        if isinstance(response, StreamStart):
                            continue
+
                        yield response

                        if isinstance(response, StreamTextDelta):
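
The credential check and the conditional `model` / `env` kwargs in the hunks above combine into a small decision rule. The sketch below restates it as a pure function so the three cases are easy to test. `sdk_option_overrides` is a hypothetical name, and the process environment is passed in as a mapping instead of being read from `os.environ`.

```python
from typing import Any, Dict, Mapping, Optional


def sdk_option_overrides(
    sdk_env: Dict[str, str],
    sdk_model: Optional[str],
    environ: Mapping[str, str],
) -> Dict[str, Any]:
    """Return extra SDK option kwargs, or raise if no credentials exist.

    - Proxy env present: pass both the resolved model and the env through.
    - Proxy env empty but ANTHROPIC_API_KEY set: pass nothing, so the
      SDK keeps its default model and credentials.
    - Neither available: fail fast with a configuration error.
    """
    if not sdk_env and not environ.get("ANTHROPIC_API_KEY"):
        raise RuntimeError("No API key configured")
    if sdk_env:
        return {"model": sdk_model, "env": sdk_env}
    return {}
```

One consequence of this shape is that the resolved model is only applied when routing through the proxy. On the direct-Anthropic path the SDK default model is used, which matches the inline comment in the diff.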
--- a/autogpt_platform/backend/backend/api/features/chat/tools/init.py
+++ b/autogpt_platform/backend/backend/api/features/chat/tools/init.py
@@ -19,6 +19,7 @@ from .get_doc_page import GetDocPageTool
 from .run_agent import RunAgentTool
 from .run_block import RunBlockTool
 from .search_docs import SearchDocsTool
+from .web_fetch import WebFetchTool
 from .workspace_files import (
    DeleteWorkspaceFileTool,
    ListWorkspaceFilesTool,
@@ -45,6 +46,8 @@ TOOL_REGISTRY: dict[str, BaseTool] = {
    "view_agent_output": AgentOutputTool(),
    "search_docs": SearchDocsTool(),
    "get_doc_page": GetDocPageTool(),
+    # Web fetch for safe URL retrieval
+    "web_fetch": WebFetchTool(),
    # Workspace tools for CoPilot file operations
    "list_workspace_files": ListWorkspaceFilesTool(),
    "read_workspace_file": ReadWorkspaceFileTool(),
--- a/autogpt_platform/backend/backend/api/features/chat/tools/models.py
+++ b/autogpt_platform/backend/backend/api/features/chat/tools/models.py
@@ -40,6 +40,8 @@ class ResponseType(str, Enum):
    OPERATION_IN_PROGRESS = "operation_in_progress"
    # Input validation
    INPUT_VALIDATION_ERROR = "input_validation_error"
+    # Web fetch
+    WEB_FETCH = "web_fetch"


 # Base response model
@@ -427,3 +429,14 @@ class AsyncProcessingResponse(ToolResponseBase):
    status: str = "accepted"  # Must be "accepted" for detection
    operation_id: str | None = None
    task_id: str | None = None
+
+
+class WebFetchResponse(ToolResponseBase):
+    """Response for web_fetch tool."""
+
+    type: ResponseType = ResponseType.WEB_FETCH
+    url: str
+    status_code: int
+    content_type: str
+    content: str
+    truncated: bool = False
--- a/autogpt_platform/backend/backend/api/features/chat/tools/web_fetch.py
+++ b/autogpt_platform/backend/backend/api/features/chat/tools/web_fetch.py
@@ -0,0 +1,156 @@
+"""Web fetch tool — safely retrieve public web page content."""
+
+import logging
+from typing import Any
+
+import aiohttp
+import html2text
+
+from backend.api.features.chat.model import ChatSession
+from backend.api.features.chat.tools.base import BaseTool
+from backend.api.features.chat.tools.models import (
+    ErrorResponse,
+    ToolResponseBase,
+    WebFetchResponse,
+)
+from backend.util.request import Requests
+
+logger = logging.getLogger(__name__)
+
+# Limits
+_MAX_CONTENT_BYTES = 102_400  # 100 KB download cap
+_MAX_OUTPUT_CHARS = 50_000  # 50K char truncation for LLM context
+_REQUEST_TIMEOUT = aiohttp.ClientTimeout(total=15)
+
+# Content types we'll read as text
+_TEXT_CONTENT_TYPES = {
+    "text/html",
+    "text/plain",
+    "text/xml",
+    "text/csv",
+    "text/markdown",
+    "application/json",
+    "application/xml",
+    "application/xhtml+xml",
+    "application/rss+xml",
+    "application/atom+xml",
+}
+
+
+def _is_text_content(content_type: str) -> bool:
+    base = content_type.split(";")[0].strip().lower()
+    return base in _TEXT_CONTENT_TYPES or base.startswith("text/")
+
+
+def _html_to_text(html: str) -> str:
+    h = html2text.HTML2Text()
+    h.ignore_links = False
+    h.ignore_images = True
+    h.body_width = 0
+    return h.handle(html)
+
+
+class WebFetchTool(BaseTool):
+    """Safely fetch content from a public URL using SSRF-protected HTTP."""
+
+    @property
+    def name(self) -> str:
+        return "web_fetch"
+
+    @property
+    def description(self) -> str:
+        return (
+            "Fetch the content of a public web page by URL. "
+            "Returns readable text extracted from HTML by default. "
+            "Useful for reading documentation, articles, and API responses. "
+            "Only supports HTTP/HTTPS GET requests to public URLs "
+            "(private/internal network addresses are blocked)."
+        )
+
+    @property
+    def parameters(self) -> dict[str, Any]:
+        return {
+            "type": "object",
+            "properties": {
+                "url": {
+                    "type": "string",
+                    "description": "The public HTTP/HTTPS URL to fetch.",
+                },
+                "extract_text": {
+                    "type": "boolean",
+                    "description": (
+                        "If true (default), extract readable text from HTML. "
+                        "If false, return raw content."
+                    ),
+                    "default": True,
+                },
+            },
+            "required": ["url"],
+        }
+
+    @property
+    def requires_auth(self) -> bool:
+        return False
+
+    async def _execute(
+        self,
+        user_id: str | None,
+        session: ChatSession,
+        **kwargs: Any,
+    ) -> ToolResponseBase:
+        url: str = (kwargs.get("url") or "").strip()
+        extract_text: bool = kwargs.get("extract_text", True)
+        session_id = session.session_id if session else None
+
+        if not url:
+            return ErrorResponse(
+                message="Please provide a URL to fetch.",
+                error="missing_url",
+                session_id=session_id,
+            )
+
+        try:
+            client = Requests(raise_for_status=False, retry_max_attempts=1)
+            response = await client.get(url, timeout=_REQUEST_TIMEOUT)
+        except ValueError as e:
+            # validate_url raises ValueError for SSRF / blocked IPs
+            return ErrorResponse(
+                message=f"URL blocked: {e}",
+                error="url_blocked",
+                session_id=session_id,
+            )
+        except Exception as e:
+            logger.warning(f"[web_fetch] Request failed for {url}: {e}")
+            return ErrorResponse(
+                message=f"Failed to fetch URL: {e}",
+                error="fetch_failed",
+                session_id=session_id,
+            )
+
+        content_type = response.headers.get("content-type", "")
+        if not _is_text_content(content_type):
+            return ErrorResponse(
+                message=f"Non-text content type: {content_type.split(';')[0]}",
+                error="unsupported_content_type",
+                session_id=session_id,
+            )
+
+        raw = response.content[:_MAX_CONTENT_BYTES]
+        text = raw.decode("utf-8", errors="replace")
+
+        if extract_text and "html" in content_type.lower():
+            text = _html_to_text(text)
+
+        truncated = len(text) > _MAX_OUTPUT_CHARS
+        if truncated:
+            text = text[:_MAX_OUTPUT_CHARS]
+
+        return WebFetchResponse(
+            message=f"Fetched {url}" + (" (truncated)" if truncated else ""),
+            url=response.url,
+            status_code=response.status,
+            content_type=content_type.split(";")[0].strip(),
+            content=text,
+            truncated=truncated,
+            session_id=session_id,
+        )
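
The content-type check and the two size limits in `web_fetch.py` can be tried without any network access. This is a reduced sketch, not the tool itself: `TEXT_TYPES` is a shortened stand-in for `_TEXT_CONTENT_TYPES`, and `cap_text` is a hypothetical helper that takes both limits as parameters.

```python
from typing import Tuple

# Shortened stand-in for the allow-list in the diff (assumption).
TEXT_TYPES = {"application/json", "application/xml"}


def is_text_content(content_type: str) -> bool:
    """True for any text/* type or an allow-listed application/* type."""
    # Ignore parameters such as "; charset=utf-8" and normalize case.
    base = content_type.split(";")[0].strip().lower()
    return base in TEXT_TYPES or base.startswith("text/")


def cap_text(raw: bytes, max_bytes: int, max_chars: int) -> Tuple[str, bool]:
    """Apply the byte cap, decode leniently, then apply the char cap."""
    # A multi-byte character cut by the byte cap decodes to U+FFFD.
    text = raw[:max_bytes].decode("utf-8", errors="replace")
    truncated = len(text) > max_chars
    return text[:max_chars], truncated
```

Note that in this sketch, as in the diff, the `truncated` flag reflects only the character cap. A body cut by the byte cap alone is still reported as not truncated, which may be worth a follow-up.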