Mirror of https://github.com/Significant-Gravitas/AutoGPT.git, synced 2026-04-08 03:00:28 -04:00
## Why
Multiple Sentry issues paging on-call in prod:
1. **AUTOGPT-SERVER-8BP**: `ConversionError: Failed to convert
anthropic/claude-sonnet-4-6 to <enum 'LlmModel'>` — the copilot passes
OpenRouter-style provider-prefixed model names
(`anthropic/claude-sonnet-4-6`) to blocks, but the `LlmModel` enum only
recognizes the bare model ID (`claude-sonnet-4-6`).
2. **BUILDER-7GF**: `Error invoking postEvent: Method not found` —
Sentry SDK internal error on Chrome Mobile Android, not a platform bug.
3. **XMLParserBlock**: `BlockUnknownError raised by XMLParserBlock with
message: Error in input xml syntax` — user sent bad XML but the block
raised `SyntaxError`, which gets wrapped as `BlockUnknownError`
(unexpected) instead of `BlockExecutionError` (expected).
4. **AUTOGPT-SERVER-8BS**: `Virus scanning failed for Screenshot
2026-03-26 091900.png: range() arg 3 must not be zero` — empty (0-byte)
file upload causes `range(0, 0, 0)` in the virus scanner chunking loop,
and the failure is logged at `error` level which pages on-call.
5. **AUTOGPT-SERVER-8BT**: `ValueError: <Token var=<ContextVar
name='current_context'>> was created in a different Context` —
OpenTelemetry `context.detach()` fails when the SDK streaming async
generator is garbage-collected in a different context than where it was
created (client disconnect mid-stream).
6. **AUTOGPT-SERVER-8BW**: `RuntimeError: Attempted to exit cancel scope
in a different task than it was entered in` — anyio's
`TaskGroup.__aexit__` detects cancel scope entered in one task but
exited in another when `GeneratorExit` interrupts the SDK cleanup during
client disconnect.
7. **Workspace UniqueViolationError**: `UniqueViolationError: Unique
constraint failed on (workspaceId, path)` — race condition during
concurrent file uploads handled by `WorkspaceManager._persist_db_record`
retry logic, but Sentry still captures the exception at the raise site.
8. **Library UniqueViolationError**: `UniqueViolationError` on
`LibraryAgent (userId, agentGraphId, agentGraphVersion)` — race
conditions in `add_graph_to_library` and `create_library_agent` caused
crashes or silent data loss.
9. **Graph version collision**: `UniqueViolationError` on `AgentGraph
(id, version)` — copilot re-saving an agent at an existing version
collides with the primary key.
## What
### Backend: `LlmModel._missing_()` for provider-prefixed model names
- Adds `_missing_` classmethod to `LlmModel` enum that strips the
provider prefix (e.g., `anthropic/`) when direct lookup fails
- Self-contained in the enum — no changes to the generic type conversion
system
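
A minimal sketch of the approach, using a two-member stand-in enum (the real `LlmModel` has far more members; the member names here are illustrative):

```python
from enum import Enum


class LlmModel(str, Enum):
    # Illustrative subset; the real enum has many more members.
    CLAUDE_4_6_SONNET = "claude-sonnet-4-6"
    GPT_4O = "gpt-4o"

    @classmethod
    def _missing_(cls, value):
        # Direct value lookup failed; strip one OpenRouter-style
        # "provider/" prefix and retry. Returning None makes the enum
        # machinery raise the usual ValueError for unknown models.
        if isinstance(value, str) and "/" in value:
            _, _, bare = value.partition("/")
            try:
                return cls(bare)
            except ValueError:
                return None
        return None
```

Because `cls(bare)` re-enters the enum lookup, a doubly prefixed value like `extra/provider/model` is handled by stripping one segment per pass.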
### Frontend: Filter Sentry SDK noise
- Adds `postEvent: Method not found` to `ignoreErrors` — a known Sentry
SDK issue on certain mobile browsers
### Backend: XMLParserBlock — raise ValueError instead of SyntaxError
- Changed `_validate_tokens()` to raise `ValueError` instead of
`SyntaxError`
- Changed the `except SyntaxError` handler in `run()` to re-raise as
`ValueError`
- This ensures `Block.execute()` wraps XML parsing failures as
`BlockExecutionError` (expected/user-caused) instead of
`BlockUnknownError` (unexpected/alerts Sentry)
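
A sketch of why the exception type matters, with stand-in `BlockExecutionError`/`BlockUnknownError` classes and a toy `execute()` wrapper mirroring the described classification (the real `Block.execute()` is more involved):

```python
class BlockExecutionError(Exception):
    """Stand-in: expected, user-caused failure (no Sentry alert)."""


class BlockUnknownError(Exception):
    """Stand-in: unexpected failure (alerts Sentry)."""


def execute(fn, *args):
    # Sketch of the wrapping behavior: ValueError is treated as a
    # user error, anything else as an unexpected platform error.
    try:
        return fn(*args)
    except ValueError as e:
        raise BlockExecutionError(str(e)) from e
    except Exception as e:
        raise BlockUnknownError(str(e)) from e


def parse_xml(xml: str) -> str:
    # Toy validator standing in for the gravitasml-backed parser,
    # which raises SyntaxError on malformed input; re-raise as
    # ValueError so execute() classifies it as user-caused.
    try:
        if xml.count("<") != xml.count(">"):
            raise SyntaxError("Error in input xml syntax")
        return xml
    except SyntaxError as e:
        raise ValueError(str(e)) from e
```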
### Backend: Virus scanner — handle empty files + reduce alert noise
- Added early return for empty (0-byte) files in `scan_file()` to avoid
`range() arg 3 must not be zero` when `chunk_size` is 0
- Added `max(1, len(content))` guard on `chunk_size` as defense-in-depth
- Downgraded `scan_content_safe` failure log from `error` to `warning`
so single-file scan failures don't page on-call via Sentry
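
The failure mode and both guards can be reproduced with a small standalone chunking helper (`chunk_ranges` is illustrative, not the scanner's actual code):

```python
def chunk_ranges(content: bytes, max_chunk: int = 1024) -> list[tuple[int, int]]:
    # Before the fix: chunk_size could be 0 for an empty file, and
    # range(0, 0, 0) raises "range() arg 3 must not be zero".
    if not content:
        return []  # early return: an empty file has nothing to scan
    chunk_size = max(1, min(len(content), max_chunk))  # defense-in-depth
    return [
        (start, min(start + chunk_size, len(content)))
        for start in range(0, len(content), chunk_size)
    ]
```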
### Backend: Suppress SDK client cleanup errors on SSE disconnect
- Replaced `async with ClaudeSDKClient` in `_run_stream_attempt` with
manual `__aenter__`/`__aexit__` wrapped in new
`_safe_close_sdk_client()` helper
- `_safe_close_sdk_client()` catches `ValueError` (OTEL context token
mismatch) and `RuntimeError` (anyio cancel scope in wrong task) during
`__aexit__` and logs at `debug` level — these are expected when SSE
client disconnects mid-stream
- Added `_is_sdk_disconnect_error()` helper for defense-in-depth at the
outer `except BaseException` handler in `stream_chat_completion_sdk`
- Both Sentry errors (8BT and 8BW) are now suppressed without affecting
normal cleanup flow
### Backend: Filter workspace UniqueViolationError from Sentry alerts
- Added `before_send` filter in `_before_send()` to drop
`UniqueViolationError` events where the message contains `workspaceId`
and `path`
- The error is already handled by `WorkspaceManager._persist_db_record`
retry logic — it must propagate for the retry logic to work, so the fix
is at the Sentry filter level rather than catching/suppressing at source
### Backend: Library agent race condition fixes
- **`add_graph_to_library`**: Replaced check-then-create pattern with
create-then-catch-`UniqueViolationError`-then-update. On collision,
updates the existing row (restoring soft-deleted/archived agents)
instead of crashing.
- **`create_library_agent`**: Replaced `create` with `upsert` on the
`(userId, agentGraphId, agentGraphVersion)` composite unique constraint,
so concurrent adds restore soft-deleted entries instead of throwing.
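
The create-then-catch pattern, sketched against an in-memory dict instead of Prisma (`UniqueViolationError` and the row shape here are stand-ins):

```python
class UniqueViolationError(Exception):
    """Stand-in for the database unique-constraint error."""


_rows: dict[tuple, dict] = {}


def _create(key: tuple, data: dict) -> dict:
    # Simulates an INSERT that fails on a duplicate composite key.
    if key in _rows:
        raise UniqueViolationError(key)
    _rows[key] = dict(data)
    return _rows[key]


def add_graph_to_library(user_id: str, graph_id: str, version: int) -> dict:
    # Create first; on collision, update the existing row in place
    # (restoring soft-deleted/archived agents) instead of crashing.
    key = (user_id, graph_id, version)
    try:
        return _create(key, {"is_deleted": False})
    except UniqueViolationError:
        row = _rows[key]
        row["is_deleted"] = False  # restore on re-add
        return row
```

Unlike check-then-create, the window between the existence check and the insert is gone: a concurrent caller either wins the insert or lands in the update branch.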
### Backend: Graph version auto-increment on collision
- `__create_graph` now checks if the `(id, version)` already exists
before `create_many`, and auto-increments the version to `max_existing +
1` to avoid `UniqueViolationError` when the copilot re-saves an agent.
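
The version-bump rule in isolation (a hypothetical helper, not the actual `__create_graph` code):

```python
def next_version(existing_versions: set[int], requested: int) -> int:
    # If the requested (id, version) already exists, bump to
    # max_existing + 1 instead of colliding with the primary key.
    if requested in existing_versions:
        return max(existing_versions) + 1
    return requested
```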
### Backend: Workspace `get_or_create_workspace` upsert
- Changed from find-then-create to `upsert` to atomically handle
concurrent workspace creation.
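
`dict.setdefault` gives an in-process analogue of the upsert (illustrative only; the real change is a database-level upsert):

```python
def get_or_create_workspace(store: dict, user_id: str) -> dict:
    # One atomic step instead of find-then-create, so a concurrent
    # caller cannot slip between the lookup and the insert.
    return store.setdefault(user_id, {"userId": user_id, "files": {}})
```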
## Test plan
- [x] `LlmModel("anthropic/claude-sonnet-4-6")` resolves correctly
- [x] `LlmModel("claude-sonnet-4-6")` still works (no regression)
- [x] `LlmModel("invalid/nonexistent-model")` still raises `ValueError`
- [x] XMLParserBlock: unclosed tags, extra closing tags, empty XML all
raise `ValueError`
- [x] XMLParserBlock: `SyntaxError` from gravitasml library is caught
and re-raised as `ValueError`
- [x] Virus scanner: empty file (0 bytes) returns clean without hitting
ClamAV
- [x] Virus scanner: single-byte file scans normally (regression test)
- [x] Virus scanner: `scan_content_safe` logs at WARNING not ERROR on
failure
- [x] SDK disconnect: `_is_sdk_disconnect_error` correctly identifies
cancel scope and context var errors
- [x] SDK disconnect: `_is_sdk_disconnect_error` rejects unrelated
errors
- [x] SDK disconnect: `_safe_close_sdk_client` suppresses ValueError,
RuntimeError, and unexpected exceptions
- [x] SDK disconnect: `_safe_close_sdk_client` calls `__aexit__` on
clean exit
- [x] Library: `add_graph_to_library` creates new agent on first call
- [x] Library: `add_graph_to_library` updates existing on
UniqueViolationError
- [x] Library: `create_library_agent` uses upsert to handle concurrent
adds
- [x] All existing workspace overwrite tests still pass
- [x] All tests passing (existing + 4 XML syntax + 3 virus scanner + 10
SDK disconnect + library tests)
842 lines · 32 KiB · Python
from typing import cast
from unittest.mock import AsyncMock, MagicMock, patch

import anthropic
import httpx
import openai
import pytest

import backend.blocks.llm as llm
from backend.data.model import NodeExecutionStats

# TEST_CREDENTIALS_INPUT is a plain dict that satisfies AICredentials at runtime
# but not at the type level. Cast once here to avoid per-test suppressors.
_TEST_AI_CREDENTIALS = cast(llm.AICredentials, llm.TEST_CREDENTIALS_INPUT)


class TestLLMStatsTracking:
    """Test that LLM blocks correctly track token usage statistics."""

    @pytest.mark.asyncio
    async def test_llm_call_returns_token_counts(self):
        """Test that llm_call returns proper token counts in LLMResponse."""
        import backend.blocks.llm as llm

        # Mock the OpenAI Responses API response
        mock_response = MagicMock()
        mock_response.output_text = "Test response"
        mock_response.output = []
        mock_response.usage = MagicMock(input_tokens=10, output_tokens=20)

        # Test with mocked OpenAI response
        with patch("openai.AsyncOpenAI") as mock_openai:
            mock_client = AsyncMock()
            mock_openai.return_value = mock_client
            mock_client.responses.create = AsyncMock(return_value=mock_response)

            response = await llm.llm_call(
                credentials=llm.TEST_CREDENTIALS,
                llm_model=llm.DEFAULT_LLM_MODEL,
                prompt=[{"role": "user", "content": "Hello"}],
                max_tokens=100,
            )

            assert isinstance(response, llm.LLMResponse)
            assert response.prompt_tokens == 10
            assert response.completion_tokens == 20
            assert response.response == "Test response"

    @pytest.mark.asyncio
    async def test_ai_structured_response_block_tracks_stats(self):
        """Test that AIStructuredResponseGeneratorBlock correctly tracks stats."""
        from unittest.mock import patch

        import backend.blocks.llm as llm

        block = llm.AIStructuredResponseGeneratorBlock()

        # Mock the llm_call method
        async def mock_llm_call(*args, **kwargs):
            return llm.LLMResponse(
                raw_response="",
                prompt=[],
                response='<json_output id="test123456">{"key1": "value1", "key2": "value2"}</json_output>',
                tool_calls=None,
                prompt_tokens=15,
                completion_tokens=25,
                reasoning=None,
            )

        block.llm_call = mock_llm_call  # type: ignore

        # Run the block
        input_data = llm.AIStructuredResponseGeneratorBlock.Input(
            prompt="Test prompt",
            expected_format={"key1": "desc1", "key2": "desc2"},
            model=llm.DEFAULT_LLM_MODEL,
            credentials=llm.TEST_CREDENTIALS_INPUT,  # type: ignore
        )

        outputs = {}
        # Mock secrets.token_hex to return consistent ID
        with patch("secrets.token_hex", return_value="test123456"):
            async for output_name, output_data in block.run(
                input_data, credentials=llm.TEST_CREDENTIALS
            ):
                outputs[output_name] = output_data

        # Check stats
        assert block.execution_stats.input_token_count == 15
        assert block.execution_stats.output_token_count == 25
        assert block.execution_stats.llm_call_count == 1
        assert block.execution_stats.llm_retry_count == 0

        # Check output
        assert "response" in outputs
        assert outputs["response"] == {"key1": "value1", "key2": "value2"}

    @pytest.mark.asyncio
    async def test_ai_text_generator_block_tracks_stats(self):
        """Test that AITextGeneratorBlock correctly tracks stats through delegation."""
        import backend.blocks.llm as llm

        block = llm.AITextGeneratorBlock()

        # Mock the underlying structured response block
        async def mock_llm_call(input_data, credentials):
            # Simulate the structured block setting stats
            block.execution_stats = NodeExecutionStats(
                input_token_count=30,
                output_token_count=40,
                llm_call_count=1,
            )
            return "Generated text"  # AITextGeneratorBlock.llm_call returns a string

        block.llm_call = mock_llm_call  # type: ignore

        # Run the block
        input_data = llm.AITextGeneratorBlock.Input(
            prompt="Generate text",
            model=llm.DEFAULT_LLM_MODEL,
            credentials=llm.TEST_CREDENTIALS_INPUT,  # type: ignore
        )

        outputs = {}
        async for output_name, output_data in block.run(
            input_data, credentials=llm.TEST_CREDENTIALS
        ):
            outputs[output_name] = output_data

        # Check stats
        assert block.execution_stats.input_token_count == 30
        assert block.execution_stats.output_token_count == 40
        assert block.execution_stats.llm_call_count == 1

        # Check output - AITextGeneratorBlock returns the response directly, not in a dict
        assert outputs["response"] == "Generated text"

    @pytest.mark.asyncio
    async def test_stats_accumulation_with_retries(self):
        """Test that stats correctly accumulate across retries."""
        import backend.blocks.llm as llm

        block = llm.AIStructuredResponseGeneratorBlock()

        # Counter to track calls
        call_count = 0

        async def mock_llm_call(*args, **kwargs):
            nonlocal call_count
            call_count += 1

            # First call returns invalid format
            if call_count == 1:
                return llm.LLMResponse(
                    raw_response="",
                    prompt=[],
                    response='<json_output id="test123456">{"wrong": "format"}</json_output>',
                    tool_calls=None,
                    prompt_tokens=10,
                    completion_tokens=15,
                    reasoning=None,
                )
            # Second call returns correct format
            else:
                return llm.LLMResponse(
                    raw_response="",
                    prompt=[],
                    response='<json_output id="test123456">{"key1": "value1", "key2": "value2"}</json_output>',
                    tool_calls=None,
                    prompt_tokens=20,
                    completion_tokens=25,
                    reasoning=None,
                )

        block.llm_call = mock_llm_call  # type: ignore

        # Run the block with retry
        input_data = llm.AIStructuredResponseGeneratorBlock.Input(
            prompt="Test prompt",
            expected_format={"key1": "desc1", "key2": "desc2"},
            model=llm.DEFAULT_LLM_MODEL,
            credentials=llm.TEST_CREDENTIALS_INPUT,  # type: ignore
            retry=2,
        )

        outputs = {}
        # Mock secrets.token_hex to return consistent ID
        with patch("secrets.token_hex", return_value="test123456"):
            async for output_name, output_data in block.run(
                input_data, credentials=llm.TEST_CREDENTIALS
            ):
                outputs[output_name] = output_data

        # Check stats - token and call counts accumulate across both
        # attempts: attempt 1 (failed) + attempt 2 (success) = 2 total
        assert block.execution_stats.input_token_count == 30  # 10 + 20
        assert block.execution_stats.output_token_count == 40  # 15 + 25
        assert block.execution_stats.llm_call_count == 2  # one call per attempt
        assert block.execution_stats.llm_retry_count == 1

    @pytest.mark.asyncio
    async def test_ai_text_summarizer_multiple_chunks(self):
        """Test that AITextSummarizerBlock correctly accumulates stats across multiple chunks."""
        import backend.blocks.llm as llm

        block = llm.AITextSummarizerBlock()

        # Track calls to simulate multiple chunks
        call_count = 0

        async def mock_llm_call(input_data, credentials):
            nonlocal call_count
            call_count += 1

            # Create a mock block with stats to merge from
            mock_structured_block = llm.AIStructuredResponseGeneratorBlock()
            mock_structured_block.execution_stats = NodeExecutionStats(
                input_token_count=25,
                output_token_count=15,
                llm_call_count=1,
            )

            # Simulate merge_llm_stats behavior
            block.merge_llm_stats(mock_structured_block)

            if "final_summary" in input_data.expected_format:
                return {"final_summary": "Final combined summary"}
            else:
                return {"summary": f"Summary of chunk {call_count}"}

        block.llm_call = mock_llm_call  # type: ignore

        # Create long text that will be split into chunks
        long_text = " ".join(["word"] * 1000)  # Moderate size to force ~2-3 chunks

        input_data = llm.AITextSummarizerBlock.Input(
            text=long_text,
            model=llm.DEFAULT_LLM_MODEL,
            credentials=llm.TEST_CREDENTIALS_INPUT,  # type: ignore
            max_tokens=100,  # Small chunks
            chunk_overlap=10,
        )

        # Run the block
        outputs = {}
        async for output_name, output_data in block.run(
            input_data, credentials=llm.TEST_CREDENTIALS
        ):
            outputs[output_name] = output_data

        # Block finished - now grab and assert stats
        assert block.execution_stats is not None
        assert call_count > 1  # Should have made multiple calls
        assert block.execution_stats.llm_call_count > 0
        assert block.execution_stats.input_token_count > 0
        assert block.execution_stats.output_token_count > 0

        # Check output
        assert "summary" in outputs
        assert outputs["summary"] == "Final combined summary"

    @pytest.mark.asyncio
    async def test_ai_text_summarizer_real_llm_call_stats(self):
        """Test AITextSummarizer with real LLM call mocking to verify llm_call_count."""
        from unittest.mock import AsyncMock, MagicMock, patch

        import backend.blocks.llm as llm

        block = llm.AITextSummarizerBlock()

        # Mock the actual LLM call instead of the llm_call method
        call_count = 0

        async def mock_create(*args, **kwargs):
            nonlocal call_count
            call_count += 1

            mock_response = MagicMock()
            # Return different responses for chunk summary vs final summary
            if call_count == 1:
                mock_response.output_text = '<json_output id="test123456">{"summary": "Test chunk summary"}</json_output>'
            else:
                mock_response.output_text = '<json_output id="test123456">{"final_summary": "Test final summary"}</json_output>'
            mock_response.output = []
            mock_response.usage = MagicMock(input_tokens=50, output_tokens=30)
            return mock_response

        with patch("openai.AsyncOpenAI") as mock_openai:
            mock_client = AsyncMock()
            mock_openai.return_value = mock_client
            mock_client.responses.create = mock_create

            # Test with very short text (should only need 1 chunk + 1 final summary)
            input_data = llm.AITextSummarizerBlock.Input(
                text="This is a short text.",
                model=llm.DEFAULT_LLM_MODEL,
                credentials=llm.TEST_CREDENTIALS_INPUT,  # type: ignore
                max_tokens=1000,  # Large enough to avoid chunking
            )

            # Mock secrets.token_hex to return consistent ID
            with patch("secrets.token_hex", return_value="test123456"):
                outputs = {}
                async for output_name, output_data in block.run(
                    input_data, credentials=llm.TEST_CREDENTIALS
                ):
                    outputs[output_name] = output_data

            print(f"Actual calls made: {call_count}")
            print(f"Block stats: {block.execution_stats}")
            print(f"LLM call count: {block.execution_stats.llm_call_count}")

            # Should have made 2 calls: 1 for chunk summary + 1 for final summary
            assert block.execution_stats.llm_call_count >= 1
            assert block.execution_stats.input_token_count > 0
            assert block.execution_stats.output_token_count > 0

    @pytest.mark.asyncio
    async def test_ai_conversation_block_tracks_stats(self):
        """Test that AIConversationBlock correctly tracks stats."""
        import backend.blocks.llm as llm

        block = llm.AIConversationBlock()

        # Mock the llm_call method
        async def mock_llm_call(input_data, credentials):
            block.execution_stats = NodeExecutionStats(
                input_token_count=100,
                output_token_count=50,
                llm_call_count=1,
            )
            return {"response": "AI response to conversation"}

        block.llm_call = mock_llm_call  # type: ignore

        # Run the block
        input_data = llm.AIConversationBlock.Input(
            messages=[
                {"role": "user", "content": "Hello"},
                {"role": "assistant", "content": "Hi there!"},
                {"role": "user", "content": "How are you?"},
            ],
            model=llm.DEFAULT_LLM_MODEL,
            credentials=llm.TEST_CREDENTIALS_INPUT,  # type: ignore
        )

        outputs = {}
        async for output_name, output_data in block.run(
            input_data, credentials=llm.TEST_CREDENTIALS
        ):
            outputs[output_name] = output_data

        # Check stats
        assert block.execution_stats.input_token_count == 100
        assert block.execution_stats.output_token_count == 50
        assert block.execution_stats.llm_call_count == 1

        # Check output
        assert outputs["response"] == "AI response to conversation"

    @pytest.mark.asyncio
    async def test_ai_list_generator_basic_functionality(self):
        """Test that AIListGeneratorBlock correctly works with structured responses."""
        import backend.blocks.llm as llm

        block = llm.AIListGeneratorBlock()

        # Mock the llm_call to return a structured response
        async def mock_llm_call(input_data, credentials):
            # Update stats to simulate LLM call
            block.execution_stats = NodeExecutionStats(
                input_token_count=50,
                output_token_count=30,
                llm_call_count=1,
            )
            # Return a structured response with the expected format
            return {"list": ["item1", "item2", "item3"]}

        block.llm_call = mock_llm_call  # type: ignore

        # Run the block
        input_data = llm.AIListGeneratorBlock.Input(
            focus="test items",
            model=llm.DEFAULT_LLM_MODEL,
            credentials=llm.TEST_CREDENTIALS_INPUT,  # type: ignore
            max_retries=3,
        )

        outputs = {}
        async for output_name, output_data in block.run(
            input_data, credentials=llm.TEST_CREDENTIALS
        ):
            outputs[output_name] = output_data

        # Check stats
        assert block.execution_stats.input_token_count == 50
        assert block.execution_stats.output_token_count == 30
        assert block.execution_stats.llm_call_count == 1

        # Check output
        assert outputs["generated_list"] == ["item1", "item2", "item3"]
        # Check that individual items were yielded.
        # Note: outputs dict will only contain the last value for each key,
        # so we check that the list_item output exists
        assert "list_item" in outputs
        # The list_item output should be the last item in the list
        assert outputs["list_item"] == "item3"
        assert "prompt" in outputs

    @pytest.mark.asyncio
    async def test_merge_llm_stats(self):
        """Test the merge_llm_stats method correctly merges stats from another block."""
        import backend.blocks.llm as llm

        block1 = llm.AITextGeneratorBlock()
        block2 = llm.AIStructuredResponseGeneratorBlock()

        # Set stats on block2
        block2.execution_stats = NodeExecutionStats(
            input_token_count=100,
            output_token_count=50,
            llm_call_count=2,
            llm_retry_count=1,
        )
        block2.prompt = [{"role": "user", "content": "Test"}]

        # Merge stats from block2 into block1
        block1.merge_llm_stats(block2)

        # Check that stats were merged
        assert block1.execution_stats.input_token_count == 100
        assert block1.execution_stats.output_token_count == 50
        assert block1.execution_stats.llm_call_count == 2
        assert block1.execution_stats.llm_retry_count == 1
        assert block1.prompt == [{"role": "user", "content": "Test"}]

    @pytest.mark.asyncio
    async def test_stats_initialization(self):
        """Test that blocks properly initialize stats when not present."""
        import backend.blocks.llm as llm

        block = llm.AIStructuredResponseGeneratorBlock()

        # Initially stats should be initialized with zeros
        assert hasattr(block, "execution_stats")
        assert block.execution_stats.llm_call_count == 0

        # Mock llm_call
        async def mock_llm_call(*args, **kwargs):
            return llm.LLMResponse(
                raw_response="",
                prompt=[],
                response='<json_output id="test123456">{"result": "test"}</json_output>',
                tool_calls=None,
                prompt_tokens=10,
                completion_tokens=20,
                reasoning=None,
            )

        block.llm_call = mock_llm_call  # type: ignore

        input_data = llm.AIStructuredResponseGeneratorBlock.Input(
            prompt="Test",
            expected_format={"result": "desc"},
            model=llm.DEFAULT_LLM_MODEL,
            credentials=llm.TEST_CREDENTIALS_INPUT,  # type: ignore
        )

        # Run the block
        outputs = {}
        # Mock secrets.token_hex to return consistent ID
        with patch("secrets.token_hex", return_value="test123456"):
            async for output_name, output_data in block.run(
                input_data, credentials=llm.TEST_CREDENTIALS
            ):
                outputs[output_name] = output_data

        # Block finished - now grab and assert stats
        assert block.execution_stats is not None
        assert block.execution_stats.input_token_count == 10
        assert block.execution_stats.output_token_count == 20
        assert block.execution_stats.llm_call_count == 1  # Should have exactly 1 call

        # Check output
        assert "response" in outputs
        assert outputs["response"] == {"result": "test"}


class TestAITextSummarizerValidation:
    """Test that AITextSummarizerBlock validates LLM responses are strings."""

    @pytest.mark.asyncio
    async def test_summarize_chunk_rejects_list_response(self):
        """Test that _summarize_chunk raises ValueError when LLM returns a list instead of string."""
        import backend.blocks.llm as llm

        block = llm.AITextSummarizerBlock()

        # Mock llm_call to return a list instead of a string
        async def mock_llm_call(input_data, credentials):
            # Simulate LLM returning a list when it should return a string
            return {"summary": ["bullet point 1", "bullet point 2", "bullet point 3"]}

        block.llm_call = mock_llm_call  # type: ignore

        # Create input data
        input_data = llm.AITextSummarizerBlock.Input(
            text="Some text to summarize",
            model=llm.DEFAULT_LLM_MODEL,
            credentials=llm.TEST_CREDENTIALS_INPUT,  # type: ignore
            style=llm.SummaryStyle.BULLET_POINTS,
        )

        # Should raise ValueError with descriptive message
        with pytest.raises(ValueError) as exc_info:
            await block._summarize_chunk(
                "Some text to summarize",
                input_data,
                credentials=llm.TEST_CREDENTIALS,
            )

        error_message = str(exc_info.value)
        assert "Expected a string summary" in error_message
        assert "received list" in error_message
        assert "incorrectly formatted" in error_message

    @pytest.mark.asyncio
    async def test_combine_summaries_rejects_list_response(self):
        """Test that _combine_summaries raises ValueError when LLM returns a list instead of string."""
        import backend.blocks.llm as llm

        block = llm.AITextSummarizerBlock()

        # Mock llm_call to return a list instead of a string
        async def mock_llm_call(input_data, credentials):
            # Check if this is the final summary call
            if "final_summary" in input_data.expected_format:
                # Simulate LLM returning a list when it should return a string
                return {
                    "final_summary": [
                        "bullet point 1",
                        "bullet point 2",
                        "bullet point 3",
                    ]
                }
            else:
                return {"summary": "Valid summary"}

        block.llm_call = mock_llm_call  # type: ignore

        # Create input data
        input_data = llm.AITextSummarizerBlock.Input(
            text="Some text to summarize",
            model=llm.DEFAULT_LLM_MODEL,
            credentials=llm.TEST_CREDENTIALS_INPUT,  # type: ignore
            style=llm.SummaryStyle.BULLET_POINTS,
            max_tokens=1000,
        )

        # Should raise ValueError with descriptive message
        with pytest.raises(ValueError) as exc_info:
            await block._combine_summaries(
                ["summary 1", "summary 2"],
                input_data,
                credentials=llm.TEST_CREDENTIALS,
            )

        error_message = str(exc_info.value)
        assert "Expected a string final summary" in error_message
        assert "received list" in error_message
        assert "incorrectly formatted" in error_message

    @pytest.mark.asyncio
    async def test_summarize_chunk_accepts_valid_string_response(self):
        """Test that _summarize_chunk accepts valid string responses."""
        import backend.blocks.llm as llm

        block = llm.AITextSummarizerBlock()

        # Mock llm_call to return a valid string
        async def mock_llm_call(input_data, credentials):
            return {"summary": "This is a valid string summary"}

        block.llm_call = mock_llm_call  # type: ignore

        # Create input data
        input_data = llm.AITextSummarizerBlock.Input(
            text="Some text to summarize",
            model=llm.DEFAULT_LLM_MODEL,
            credentials=llm.TEST_CREDENTIALS_INPUT,  # type: ignore
        )

        # Should not raise any error
        result = await block._summarize_chunk(
            "Some text to summarize",
            input_data,
            credentials=llm.TEST_CREDENTIALS,
        )

        assert result == "This is a valid string summary"
        assert isinstance(result, str)

    @pytest.mark.asyncio
    async def test_combine_summaries_accepts_valid_string_response(self):
        """Test that _combine_summaries accepts valid string responses."""
        import backend.blocks.llm as llm

        block = llm.AITextSummarizerBlock()

        # Mock llm_call to return a valid string
        async def mock_llm_call(input_data, credentials):
            return {"final_summary": "This is a valid final summary string"}

        block.llm_call = mock_llm_call  # type: ignore

        # Create input data
        input_data = llm.AITextSummarizerBlock.Input(
            text="Some text to summarize",
            model=llm.DEFAULT_LLM_MODEL,
            credentials=llm.TEST_CREDENTIALS_INPUT,  # type: ignore
            max_tokens=1000,
        )

        # Should not raise any error
        result = await block._combine_summaries(
            ["summary 1", "summary 2"],
            input_data,
            credentials=llm.TEST_CREDENTIALS,
        )

        assert result == "This is a valid final summary string"
        assert isinstance(result, str)

    @pytest.mark.asyncio
    async def test_summarize_chunk_rejects_dict_response(self):
        """Test that _summarize_chunk raises ValueError when LLM returns a dict instead of string."""
        import backend.blocks.llm as llm

        block = llm.AITextSummarizerBlock()

        # Mock llm_call to return a dict instead of a string
        async def mock_llm_call(input_data, credentials):
            return {"summary": {"nested": "object", "with": "data"}}

        block.llm_call = mock_llm_call  # type: ignore

        # Create input data
        input_data = llm.AITextSummarizerBlock.Input(
            text="Some text to summarize",
            model=llm.DEFAULT_LLM_MODEL,
            credentials=llm.TEST_CREDENTIALS_INPUT,  # type: ignore
        )

        # Should raise ValueError
        with pytest.raises(ValueError) as exc_info:
            await block._summarize_chunk(
                "Some text to summarize",
                input_data,
                credentials=llm.TEST_CREDENTIALS,
            )

        error_message = str(exc_info.value)
        assert "Expected a string summary" in error_message
        assert "received dict" in error_message


def _make_anthropic_status_error(status_code: int) -> anthropic.APIStatusError:
    """Create an anthropic.APIStatusError with the given status code."""
    request = httpx.Request("POST", "https://api.anthropic.com/v1/messages")
    response = httpx.Response(status_code, request=request)
    return anthropic.APIStatusError(
        f"Error code: {status_code}", response=response, body=None
    )


def _make_openai_status_error(status_code: int) -> openai.APIStatusError:
    """Create an openai.APIStatusError with the given status code."""
    response = httpx.Response(
        status_code, request=httpx.Request("POST", "https://api.openai.com/v1/chat")
    )
    return openai.APIStatusError(
        f"Error code: {status_code}", response=response, body=None
    )


class TestUserErrorStatusCodeHandling:
    """Test that user-caused LLM API errors (401/403/429) break the retry loop
    and are logged as warnings, while server errors (500) trigger retries."""

    @pytest.mark.asyncio
    @pytest.mark.parametrize("status_code", [401, 403, 429])
    async def test_anthropic_user_error_breaks_retry_loop(self, status_code: int):
        """401/403/429 Anthropic errors should break immediately, not retry."""
        import backend.blocks.llm as llm

        block = llm.AIStructuredResponseGeneratorBlock()
        call_count = 0

        async def mock_llm_call(*args, **kwargs):
            nonlocal call_count
            call_count += 1
            raise _make_anthropic_status_error(status_code)

        with patch.object(block, "llm_call", new=AsyncMock(side_effect=mock_llm_call)):
            input_data = llm.AIStructuredResponseGeneratorBlock.Input(
                prompt="Test",
                expected_format={"key": "desc"},
                model=llm.DEFAULT_LLM_MODEL,
                credentials=_TEST_AI_CREDENTIALS,
                retry=3,
            )

            with pytest.raises(RuntimeError):
                async for _ in block.run(input_data, credentials=llm.TEST_CREDENTIALS):
                    pass

        assert (
            call_count == 1
        ), f"Expected exactly 1 call for status {status_code}, got {call_count}"

    @pytest.mark.asyncio
    @pytest.mark.parametrize("status_code", [401, 403, 429])
    async def test_openai_user_error_breaks_retry_loop(self, status_code: int):
        """401/403/429 OpenAI errors should break immediately, not retry."""
        import backend.blocks.llm as llm

        block = llm.AIStructuredResponseGeneratorBlock()
        call_count = 0

        async def mock_llm_call(*args, **kwargs):
            nonlocal call_count
            call_count += 1
            raise _make_openai_status_error(status_code)

        with patch.object(block, "llm_call", new=AsyncMock(side_effect=mock_llm_call)):
            input_data = llm.AIStructuredResponseGeneratorBlock.Input(
                prompt="Test",
                expected_format={"key": "desc"},
                model=llm.DEFAULT_LLM_MODEL,
                credentials=_TEST_AI_CREDENTIALS,
                retry=3,
            )

            with pytest.raises(RuntimeError):
                async for _ in block.run(input_data, credentials=llm.TEST_CREDENTIALS):
                    pass

        assert (
            call_count == 1
        ), f"Expected exactly 1 call for status {status_code}, got {call_count}"

    @pytest.mark.asyncio
    async def test_server_error_retries(self):
        """500 errors should be retried (not break immediately)."""
        import backend.blocks.llm as llm

        block = llm.AIStructuredResponseGeneratorBlock()
        call_count = 0

        async def mock_llm_call(*args, **kwargs):
            nonlocal call_count
            call_count += 1
            raise _make_anthropic_status_error(500)

        with patch.object(block, "llm_call", new=AsyncMock(side_effect=mock_llm_call)):
            input_data = llm.AIStructuredResponseGeneratorBlock.Input(
                prompt="Test",
                expected_format={"key": "desc"},
                model=llm.DEFAULT_LLM_MODEL,
                credentials=_TEST_AI_CREDENTIALS,
                retry=3,
            )

            with pytest.raises(RuntimeError):
                async for _ in block.run(input_data, credentials=llm.TEST_CREDENTIALS):
                    pass

        assert (
            call_count > 1
        ), f"Expected multiple retry attempts for 500, got {call_count}"

    @pytest.mark.asyncio
    async def test_user_error_logs_warning_not_exception(self):
        """User-caused errors should log with logger.warning, not logger.exception."""
        import backend.blocks.llm as llm

        block = llm.AIStructuredResponseGeneratorBlock()

        async def mock_llm_call(*args, **kwargs):
            raise _make_anthropic_status_error(401)

        with patch.object(block, "llm_call", new=AsyncMock(side_effect=mock_llm_call)):
            input_data = llm.AIStructuredResponseGeneratorBlock.Input(
                prompt="Test",
                expected_format={"key": "desc"},
                model=llm.DEFAULT_LLM_MODEL,
                credentials=_TEST_AI_CREDENTIALS,
            )

            with (
                patch.object(llm.logger, "warning") as mock_warning,
                patch.object(llm.logger, "exception") as mock_exception,
                pytest.raises(RuntimeError),
            ):
                async for _ in block.run(input_data, credentials=llm.TEST_CREDENTIALS):
                    pass

            mock_warning.assert_called_once()
            mock_exception.assert_not_called()


class TestLlmModelMissing:
    """Test that LlmModel handles provider-prefixed model names."""

    def test_provider_prefixed_model_resolves(self):
        """Provider-prefixed model string should resolve to the correct enum member."""
        assert (
            llm.LlmModel("anthropic/claude-sonnet-4-6")
            == llm.LlmModel.CLAUDE_4_6_SONNET
        )

    def test_bare_model_still_works(self):
        """Bare (non-prefixed) model string should still resolve correctly."""
        assert llm.LlmModel("claude-sonnet-4-6") == llm.LlmModel.CLAUDE_4_6_SONNET

    def test_invalid_prefixed_model_raises(self):
        """Unknown provider-prefixed model string should raise ValueError."""
        with pytest.raises(ValueError):
            llm.LlmModel("invalid/nonexistent-model")

    def test_slash_containing_value_direct_lookup(self):
        """Enum values with '/' (e.g., OpenRouter models) should resolve via direct lookup, not _missing_."""
        assert llm.LlmModel("google/gemini-2.5-pro") == llm.LlmModel.GEMINI_2_5_PRO

    def test_double_prefixed_slash_model(self):
        """Double-prefixed value should still resolve by stripping the first prefix."""
        assert (
            llm.LlmModel("extra/google/gemini-2.5-pro") == llm.LlmModel.GEMINI_2_5_PRO
        )