When the SSE connection reconnects, resume_session_stream replays from
"0-0" and the replayed UIMessage objects get new IDs from useChat,
bypassing the adjacent-only content dedup. Switch deduplicateMessages
to track all seen role+context+content fingerprints globally, scoped
by the preceding user message to avoid false positives when the
assistant legitimately gives identical answers to different prompts.
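A minimal sketch of scoped, global fingerprinting, written in Python for illustration (the real deduplicateMessages is frontend code). It assumes a message can be reduced to role, content and an optional context string. It also assumes that "scoped by the preceding user message" means keying on that message's content. The Message type is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    # Minimal stand-in for a chat message; real UIMessage objects carry more fields.
    id: str
    role: str
    content: str
    context: str = ""


def deduplicate_messages(messages: list[Message]) -> list[Message]:
    """Drop messages whose fingerprint was already seen under the same user prompt.

    Fingerprints ignore `id`, so replayed copies with fresh IDs are caught even
    when they are not adjacent to the original.
    """
    seen: set[tuple[str, str, str, str]] = set()
    result: list[Message] = []
    scope = ""  # content of the most recent user message ("" before the first one)
    for msg in messages:
        if msg.role == "user":
            scope = msg.content
        fingerprint = (scope, msg.role, msg.context, msg.content)
        if fingerprint in seen:
            continue  # replayed duplicate
        seen.add(fingerprint)
        result.append(msg)
    return result
```

Keying the scope on content rather than ID is what lets replayed copies match, because the IDs change on replay. One side effect of this particular sketch: if a user deliberately repeats the exact same prompt and gets the exact same answer, the second exchange is also collapsed.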
Add responses={404, 429} to the pending endpoint's @router.post decorator
so FastAPI auto-generates them in the OpenAPI spec. Previously these were
only manually added to openapi.json and the CI schema-check (export +
diff) stripped them. Also apply black formatting to the long warning
line that was failing the backend lint check.
Non-IBE (InsufficientBalanceError) exceptions from charge_node_usage (e.g. a
DB timeout) were re-raised and caught by the outer generic handler, incorrectly marking
a successful tool execution as failed. This could cause the LLM to
retry side-effectful operations. Now logs the error and continues to
the success path since the tool itself completed successfully.
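The control flow can be sketched as follows. It assumes IBE keeps its existing dedicated handling, shown here as a plain re-raise. finish_tool_call, the stand-in exception class and the returned dict shape are hypothetical.

```python
import logging

logger = logging.getLogger(__name__)


class InsufficientBalanceError(Exception):
    """Stand-in for the real IBE; only the name comes from the commit message."""


def finish_tool_call(charge_node_usage, result):
    """Hypothetical post-execution step: bill the node, then report success."""
    try:
        charge_node_usage()
    except InsufficientBalanceError:
        # Billing-specific failure: keeps its own handling (re-raised here).
        raise
    except Exception:
        # The tool already ran; a billing hiccup such as a DB timeout must not
        # flip it to "failed", or the LLM may retry a side-effectful call.
        logger.exception("charge_node_usage failed after a successful tool run")
    return {"status": "success", "result": result}
```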
The setup_test_data fixture creates a graph with credentials already
embedded in node defaults. The DB-stored credential schema may not
surface these as "missing" in build_missing_credentials_from_graph,
so assert the key exists rather than asserting a non-empty count.
- Extract duplicated GraphValidationError handler from _run_agent and
_schedule_agent into _handle_graph_validation_race helper method
- Use generator expressions instead of list comprehensions for
short-circuit evaluation in _build_setup_requirements_from_validation_error
- Improve mixed-error fallback message to be more user-friendly
- Add test for empty node_errors={} edge case
- Pin expected credential count in firecrawl fixture tests
- Add missing_credentials assertion to schedule race E2E test
- Add test for extras present with node_errors=None in service_test
- Add TODO(#12747) to _SystemPromptPreset for cleanup tracking
- Update docstring to note SDK version and migration path
- Add debug logging in _build_system_prompt_value for observability
- Document empty-string edge case in docstring
- Trim redundant block comment at call site to single line
- Add test for empty-string system_prompt with cache enabled
- Add test for CHAT_CLAUDE_AGENT_CROSS_USER_PROMPT_CACHE=false env var
- Backend: always pass tracking_type=None to _build_raw_where for
percentile and histogram queries so they compute stats on cost_usd
rows regardless of the caller's tracking_type filter.
- Frontend test: use getAllByText for "5", which appears in both the
Active Users card and the $1-2 bucket count.
- Frontend: fix prettier formatting in PlatformCostContent.tsx.
In non-E2B mode, to_sdk_names() failed to map whitelisted SDK built-in
file tool names (Write, Edit, Read) to their MCP-prefixed equivalents
(mcp__copilot__Write, etc.), causing them to be incorrectly filtered out
when users configured tool whitelists.
Add _SDK_TO_MCP mapping for non-E2B mode that maps Read->read_file,
Write->Write, Edit->Edit. Add test coverage for this case.
Also fix black formatting in permissions_test.py that was causing CI lint
failure.
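The non-E2B mapping above can be illustrated with a simplified helper. to_sdk_names_non_e2b is hypothetical, and so are two of its behaviours: prefixing the mapped name with mcp__copilot__ (combining the two sentences above), and passing unmapped names through unchanged.

```python
# Prefix taken from the commit message; the mapping mirrors the _SDK_TO_MCP
# entries listed above for non-E2B mode.
MCP_PREFIX = "mcp__copilot__"
_SDK_TO_MCP = {"Read": "read_file", "Write": "Write", "Edit": "Edit"}


def to_sdk_names_non_e2b(whitelist: list[str]) -> list[str]:
    """Translate user-whitelisted tool names into the names the SDK actually sees."""
    return [
        MCP_PREFIX + _SDK_TO_MCP[name] if name in _SDK_TO_MCP else name
        for name in whitelist
    ]
```

Without a translation step like this, a whitelist containing "Write" never matches the prefixed tool name, so the tool is filtered out.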
The 6th group_by (total agg no-tracking-type) only runs when
tracking_type is set. This test doesn't pass tracking_type, so the
expected count is 5, not 6.
Backend:
- Extract _build_raw_where() helper so raw SQL and Prisma WHERE share
filter logic (review item #4 — duplicated filter logic)
- Skip redundant total_agg_no_tracking_type_groups query when
tracking_type is None since it duplicates total_agg_groups (item #3)
- Convert CostBucket from TypedDict to BaseModel for consistency (nit #1)
- Replace fragile 8-way positional tuple unpack with indexed list access
Frontend:
- Make 12 SummaryCards data-driven via a cards config array (item #5)
- Use friendlier percentile labels: Typical/Upper/High/Peak Cost (P50/P75/P95/P99)
- Update test fixtures with all new dashboard fields (item #1)
- Add test assertions for new summary card labels, cost buckets, token
values, and user table columns
- Reject non-existent and non-file CLI paths at config validation time
instead of letting them fail with opaque OS errors at runtime
- Add negative test coverage for CLI path validator (non-existent,
non-executable, directory paths)
- Document breaking default changes (max_turns 1000->50, max_budget
$100->$5) in field descriptions with env var override instructions
- Narrow broad `except Exception` to `except (ImportError, AttributeError)`
in cli_openrouter_compat_test.py
Text parts in assistant messages were being rendered as plain <span>
elements, bypassing MessagePartRenderer's case "text" handler and
parseSpecialMarkers(). This broke styled error/system messages
([ERROR:], [RETRYABLE_ERROR:], [SYSTEM:] markers) and markdown
rendering in the builder chat panel.
Route all assistant message parts (text and tool) through
MessagePartRenderer so parseSpecialMarkers() runs on text content.
- Rename `_READ_TOOL_NAME` from `"Read"` to `"read_tool_result"` so the LLM
can distinguish it from `read_file` (working-directory tool). The new name
plus an updated description make its narrow scope (tool-results/ paths and
workspace:// URIs) unambiguous.
- Fix path leak in `_read_file_handler`: use `os.path.basename(file_path)` in
the "Path not allowed" error, consistent with write/edit handlers.
- Update `permissions.py` comment and all `permissions_test.py` assertions to
use the new `mcp__copilot__read_tool_result` name.
Baseline turn-start drain_pending_messages was unprotected — a transient
Redis error would propagate up and kill the entire turn stream, unlike the
already-protected mid-loop and SDK paths. Wrap with try/except + fallback
to [] so a Redis hiccup degrades gracefully.
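A minimal sketch of the fallback, assuming drain_pending_messages is an async callable that takes a session ID. The safe_drain wrapper is hypothetical, and the test uses the built-in ConnectionError as a stand-in for a Redis error.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def safe_drain(drain_pending_messages, session_id: str) -> list:
    """Turn-start drain that degrades to 'no pending messages' on error."""
    try:
        return await drain_pending_messages(session_id)
    except Exception:
        # asyncio.CancelledError is a BaseException, so cancellation still propagates.
        logger.warning(
            "drain_pending_messages failed for %s; continuing", session_id, exc_info=True
        )
        return []
```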
Also adds 404 (session not found) and 429 (rate-limit exceeded) response
codes to the pending endpoint's OpenAPI spec so TypeScript clients can
handle these error paths correctly.
- Remove redundant inner `ChatConfig` import in `_prewarm_cli` — it was
already imported at module scope on line 16 (style guide: inner imports
only for heavy optional deps)
- Correct stale comment in `sdk_compat_test.py`: 2.1.63/2.1.70 pre-date
the context-management regression and are OpenRouter-safe without any
env var; only 2.1.97+ requires CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS=1
- Update `_assert_no_forbidden_patterns` error message in
`cli_openrouter_compat_test.py`: remove the stale "above 0.1.45" ceiling
(we've already upgraded to 0.1.58) and point at the correct remediation
steps (add to _KNOWN_GOOD_BUNDLED_CLI_VERSIONS after bisect verification)
- Plug test coverage gap in `env_test.py`: add
`CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS == "1"` assertions to three
OpenRouter test methods that were missing it
(test_strips_trailing_v1, test_strips_trailing_v1_and_slash,
test_no_v1_suffix_left_alone) — guards against the env var being
accidentally dropped from a code path that the main test didn't exercise
The gated_processor fixture's fake_low_balance mock used **kwargs, but
production code calls _handle_low_balance with positional args via
asyncio.to_thread. This caused a silent TypeError caught by the broad
except handler, making the handle_low_balance assertion fail (0 calls
instead of 1). Updated mock to match the actual method signature.
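The failure mode is easy to reproduce in isolation. A keyword-only fake raises TypeError when called positionally, and a broad except hides the error. The handler names and the three positional arguments below are hypothetical; only the asyncio.to_thread call pattern comes from the description above.

```python
import asyncio

calls: list[tuple] = []


def fake_low_balance_kwargs_only(**kwargs):
    # Old-style mock: accepts keyword arguments only.
    calls.append(tuple(kwargs.values()))


def fake_low_balance(user_id, current_balance, transaction_cost):
    # Fixed mock: positional parameters mirror the (hypothetical) real signature.
    calls.append((user_id, current_balance, transaction_cost))


async def production_path(handler) -> None:
    # Positional call via asyncio.to_thread inside a broad except, so a
    # signature mismatch is swallowed instead of failing loudly.
    try:
        await asyncio.to_thread(handler, "user-1", 50, 10)
    except Exception:
        pass
```

unittest.mock.create_autospec on the real method is one way to keep a fake's signature in sync automatically.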
- Move tool_node_stats None guard before node_exec_future.set_result so
that when on_node_execution returns None (swallowed by @async_error_logged),
the future carries set_exception(RuntimeError) rather than set_result(None),
giving the tracking system an accurate error state
- Remove redundant `tool_node_stats is not None` check that was dead code
after the early-return guard was added
- Add a note to the _charge_extra_iterations_sync docstring explaining
why the block lookup is intentionally repeated rather than cached from
_charge_usage (two separate thread-pool workers, no shared mutable state)
- Add assertion to test_on_node_execution_charges_extra_iterations_when_gate_passes
verifying _handle_low_balance is called when extra_cost > 0
- Add test_on_node_execution_failed_ibe_sends_notification covering the
FAILED + InsufficientBalanceError path in on_node_execution (lines 822-836)
that was previously untested
- Add type="button" and focus-visible ring to Stop/Send buttons in PanelInput
- Add type="button" to Retry button in MessageList and Apply button in ActionList
- Fix MessageList to render plain text directly and only pass dynamic-tool parts
to MessagePartRenderer (text parts were being misrouted through a tool renderer)
- Replace clearGraphSessionCacheForTesting export with _graphSessionCache for
tests — avoids leaking test scaffolding into the production bundle
- Add toast notification in undo restore when target node was deleted between
apply and undo (prevents silent no-op)
- Fix misleading test: remove red-herring mockNodes.push from 'no auto-send' test
since the guard is isGraphLoaded===false, not the node array
- Add truncation-path coverage to helpers.test.ts (MAX_NODES/MAX_EDGES branches)
- Add deleted-node undo test to actionApplicators.test.ts
- UserTable: replace the `cost_bearing_request_count!` non-null assertion with
`?? 1` nullish coalescing; this eliminates the TypeScript anti-pattern and
guards against a theoretical divide-by-zero if the guard is refactored
- platform_cost_test: add assertions for `total_input_tokens` and
`total_output_tokens` in test_returns_dashboard_with_data to cover the
"Total Tokens" summary card computation path
- PlatformCostContent: add an h-32 skeleton placeholder for the cost-bucket
histogram section so the loading state reflects the loaded layout more
closely and reduces CLS
- sdk/service.py: wrap drain_pending_messages at turn start in try/except;
a transient Redis error no longer kills the entire turn (baseline mid-loop
drain was already protected, SDK was missed in round 5)
- baseline/service.py: pre-compute format_pending_as_user_message content
once per drained message and reuse it for both session.messages and
transcript_builder — eliminates the redundant second call per message
- routes.py: move _URL_LIMIT/_CONTENT_LIMIT out of the validator body into
module-level _CONTEXT_URL_MAX_LENGTH/_CONTEXT_CONTENT_MAX_LENGTH so the
contract limits are visible to tooling without reading the implementation
- Add "do NOT redirect to the Builder for credential setup" guardrail to
run_agent description, making it symmetric with create_agent/edit_agent
- Scrub error message text from race-path warning logs; log only node IDs
and field names to avoid leaking credential IDs/provider details
- Add code comment explaining the None-vs-filtering trade-off in
_build_setup_requirements_from_validation_error
- Add E2E tests for structural-error fallback on both run and schedule
paths (verifies ErrorResponse returned, not setup_requirements card)
- env_test: add missing CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS assertion
to test_no_anthropic_key_overrides_when_openrouter_flag_true_but_no_key
(the other three build_sdk_env test cases already assert it; this case
was the only one that didn't, leaving the env-var injection unverified
for the openrouter_active=False / no-key path)
- sdk_compat_test: add test_sdk_exposes_max_thinking_tokens_option
parallel to the existing test_sdk_exposes_cli_path_option — guards
against a future SDK rename/removal of max_thinking_tokens silently
disabling the Opus thinking-token cost cap
- Use `file_path` (caller-supplied) instead of `resolved` in Write/Edit
success messages to avoid leaking `/tmp/copilot-<session>/...` to the LLM
- Add partial-truncation guard to `_read_file_handler` (MCP `Read` tool):
when `offset`/`limit` are present but `file_path` is missing, return a
specific truncation message instead of the generic `file_path is required`
- Add `TestConcurrentEditLocking` test that uses `asyncio.gather` to verify
two parallel Edit calls on the same file are serialised by `_edit_locks`
- Add `autouse` fixture `_clear_edit_locks` to prevent module-level dict
from bleeding between test runs
When a caller filters the dashboard by tracking_type='tokens', total_agg_groups
only contains tokens rows so cost_bearing_requests=0 and avg_cost_microdollars_per_request
silently returned 0.0. Symmetrically, filtering by cost_usd gave zero token averages.
Add a parallel total_agg_no_tracking_type_groups query (using where_no_tracking_type,
mirroring the fix already applied to by_user_tracking_groups) and derive avg_cost_total,
avg_input_total, avg_output_total, cost_bearing_requests, and token_bearing_requests
from that unfiltered aggregate. The displayed grand totals (total_cost, total_requests,
total_input_tokens) remain scoped to the active filter.
Also adds test_global_avg_cost_nonzero_when_filtering_by_tokens to cover this case.
The existing compat test for SystemPromptPreset omitted exclude_dynamic_sections,
diverging from the actual dict _build_system_prompt_value produces. The new test
calls the production helper directly and passes its output through ClaudeAgentOptions,
so any SDK version that rejects the extra key is caught at test time.
Two new tests in TestGetPlatformCostDashboard:
1. test_cost_bearing_request_count_nonzero_when_filtering_by_tokens: verifies
that cost_bearing_request_count per user is correct even when the main
tracking_type filter is 'tokens' (regression guard for the bug where
by_user_tracking_groups used the filtered where-clause).
2. test_user_tracking_groups_excludes_tracking_type_filter: verifies that the
3rd group_by call (by_user_tracking_groups) does NOT receive a trackingType
constraint while the 1st call (by_provider) does.
When the caller filters the main view by e.g. tracking_type=tokens, the
by_user_tracking_groups query was also filtered, excluding all cost_usd rows
and making cost_bearing_request_count zero for every user. Use a separate
where_no_tracking_type filter (omitting tracking_type) for this sub-query so
cost_usd rows are always present for correct per-user avg cost denominators.
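A small in-memory model of the bug: filtering the per-user sub-query with the main where clause removes every cost_usd row. The helpers and rows are illustrative; userId and trackingType follow the field names used elsewhere in this log. Real code issues a Prisma group_by rather than filtering Python lists.

```python
from collections import Counter


def without_tracking_type(where: dict) -> dict:
    """Copy a Prisma-style where dict, dropping only its trackingType constraint."""
    return {key: value for key, value in where.items() if key != "trackingType"}


def cost_bearing_counts(rows: list[dict], where: dict) -> Counter:
    """Per-user count of cost_usd rows among rows matching `where` (equality only)."""
    matching = [row for row in rows if all(row.get(k) == v for k, v in where.items())]
    return Counter(row["userId"] for row in matching if row["trackingType"] == "cost_usd")
```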
Pyright rejects `list[dict[str, Unknown]]` being passed as `list[CostBucket]`
because list is invariant. Constructing CostBucket instances explicitly
satisfies the type checker across Python 3.11/3.12/3.13.
- Add CostBucket to openapi.json components/schemas so orval generates
a costBucket.ts file instead of an inline anonymous type
- Use $ref in cost_buckets items array for proper orval code generation
- Create costBucket.ts generated model; update platformCostDashboard.ts
to import from it instead of defining CostBucket inline
- Update PlatformCostContent.tsx import to use costBucket directly
The PR added 5 new cost summary cards (Avg Cost, P50, P75, P95, P99)
that also display $0.0000 when empty, so the test assertion needed to
change from 2 to 7 matching elements.
- Add `cost_bearing_request_count` to `UserCostSummary` via a new
group-by-(userId,trackingType) query; `UserTable` now divides by
this count instead of the mixed `request_count`, eliminating
denominator dilution for users with both tokens and cost_usd rows
- Guard the histogram CASE against NULL costMicrodollars: `NULL < N`
evaluates to unknown, so NULL rows fell through to ELSE '$10+'. Add
`AND "costMicrodollars" IS NOT NULL` to the histogram WHERE so NULL rows
are excluded instead of bucketed
- Respect the `tracking_type` dashboard filter in raw SQL percentile
and bucket queries; previously the filter was hardcoded to 'cost_usd'
even when the caller passed tracking_type='tokens', making those
queries return inconsistent data relative to the ORM queries
- Add p75 and p99 assertions to test_returns_dashboard_with_data
- Update openapi.json and generated TS model for new field
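You can check the NULL behaviour with SQLite, which uses the same three-valued comparison logic as PostgreSQL. A NULL cost makes every WHEN condition unknown, so the row lands in ELSE. The bucket edges, the labels other than '$1-2' and '$10+', and the table layout are hypothetical. The production query runs on PostgreSQL with quoted identifiers.

```python
import sqlite3

# Hypothetical bucket edges in microdollars ($1 = 1_000_000).
HISTOGRAM_SQL = """
SELECT CASE
         WHEN costMicrodollars < 1000000 THEN '$0-1'
         WHEN costMicrodollars < 2000000 THEN '$1-2'
         WHEN costMicrodollars < 10000000 THEN '$2-10'
         ELSE '$10+'
       END AS bucket,
       COUNT(*)
FROM usage
WHERE trackingType = 'cost_usd' {null_guard}
GROUP BY bucket
"""


def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE usage (trackingType TEXT, costMicrodollars INTEGER)")
    conn.executemany(
        "INSERT INTO usage VALUES (?, ?)",
        [("cost_usd", 1_500_000), ("cost_usd", None), ("tokens", None)],
    )
    return conn


def histogram(conn: sqlite3.Connection, guard_nulls: bool) -> dict[str, int]:
    guard = "AND costMicrodollars IS NOT NULL" if guard_nulls else ""
    return dict(conn.execute(HISTOGRAM_SQL.format(null_guard=guard)).fetchall())
```

Without the guard, the NULL cost_usd row is counted as '$10+'. With the guard, only the $1.50 row remains.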