The backend added total_cache_read_tokens and total_cache_creation_tokens
to UserCostSummary but the OpenAPI spec was not updated, causing frontend
build failures.
- Validate entries from localStorage before constructing the sessionModes map,
filtering out corrupt/unknown mode strings (addresses CodeRabbit review)
- Add removeSessionMode action and call it on session delete so the map does
not grow unboundedly
- Add recordSessionMode to the useEffect dependency array to avoid stale-closure risk
- Add clarifying comment to restoreSessionMode no-op branch
- Extend tests to cover removeSessionMode, no-op, and corrupt-localStorage behaviour
Using user message text as the context key caused the deduplicator to
drop the second assistant reply when a user asked the same question twice
in one session. Switching to user message ID (which is unique per turn)
fixes the false positive while still preventing SSE-replayed duplicates.
Adds a regression test covering the same-question-twice scenario.
Bug 1: Fallback cost estimation was using accumulated turn_prompt_tokens /
turn_completion_tokens across all tool-call rounds, causing compounding
over-estimation on the 2nd+ turn. Snapshot token counts before each call and
pass only the per-call delta to _estimate_cost_from_tokens.
Bug 2: turn_cache_creation_tokens was defined but never populated. Extract
cache_creation_input_tokens from prompt_tokens_details (available from some
providers such as Anthropic via OpenRouter).
Add regression tests for both fixes.
- Add claude_agent_thinking_effort config (default: 'low') to control
thinking depth. 'low' minimizes thinking token usage — the #1 cost
driver at 49% of total spend.
- Raise max_budget_usd from $5 to $15 — $5 was below p50 ($5.37),
causing half of all turns to get budget-killed mid-task.
- Log raw SDK usage dict to discover thinking token fields.
When OpenRouter's x-total-cost header is missing, estimate cost from
token counts using a known model pricing table so cost is always logged.
Also extract cache token details from streaming usage chunks
(prompt_tokens_details.cached_tokens) and pass them through to
PlatformCostLog.
On the dashboard side, add cache read/write columns to the logs table
and user table, and include cache tokens in the UserCostSummary backend
model so they surface in the API response.
The copilot mode (fast/extended_thinking) was stored as a single global
value. When switching between sessions, the mode indicator stayed on
whatever was last set globally rather than reflecting the mode each
session was created with.
Add a sessionModes map to the Zustand store that records the active
copilotMode when a session is created and restores it when the user
switches back to that session.
$5 was too aggressive — p50 cost is $5.37 so half of all turns were
getting budget-killed mid-task with no value delivered. $15 covers p75
($13.07) so ~75% of tasks complete. The thinking token cap is the
better cost lever but needs verification first.
Add Accept-Encoding: identity to ANTHROPIC_CUSTOM_HEADERS in
build_sdk_env() to prevent ZlibError decompression failures in the
CLI subprocess. Appended after any existing custom headers (OpenRouter
trace headers).
See: oven-sh/bun#23149, anthropics/claude-code#18302
The raw SQL WHERE clause builder was passing datetime parameters without
explicit type casts, causing PostgreSQL to fail with "operator does not
exist: timestamp without time zone >= text".
When the SSE connection reconnects, resume_session_stream replays from
"0-0" and the replayed UIMessage objects get new IDs from useChat,
bypassing the adjacent-only content dedup. Switch deduplicateMessages
to track all seen role+context+content fingerprints globally, scoped
by the preceding user message to avoid false positives when the
assistant legitimately gives identical answers to different prompts.
When the SSE connection reconnects, resume_session_stream replays from
"0-0" and the replayed UIMessage objects get new IDs from useChat,
bypassing the adjacent-only content dedup. Switch deduplicateMessages
to track all seen role+context+content fingerprints globally, scoped
by the preceding user message to avoid false positives when the
assistant legitimately gives identical answers to different prompts.
Add responses={404, 429} to the pending endpoint's @router.post decorator
so FastAPI auto-generates them in the OpenAPI spec. Previously these were
only manually added to openapi.json and the CI schema-check (export +
diff) stripped them. Also apply black formatting to the long warning
line that was failing the backend lint check.
Non-IBE exceptions from charge_node_usage (e.g. DB timeout) were
re-raised and caught by the outer generic handler, incorrectly marking
a successful tool execution as failed. This could cause the LLM to
retry side-effectful operations. Now logs the error and continues to
the success path since the tool itself completed successfully.
The setup_test_data fixture creates a graph with credentials already
embedded in node defaults. The DB-stored credential schema may not
surface these as "missing" in build_missing_credentials_from_graph,
so assert the key exists rather than asserting non-empty count.
- Extract duplicated GraphValidationError handler from _run_agent and
_schedule_agent into _handle_graph_validation_race helper method
- Use generator expressions instead of list comprehension for
short-circuit evaluation in _build_setup_requirements_from_validation_error
- Improve mixed-error fallback message to be more user-friendly
- Add test for empty node_errors={} edge case
- Pin expected credential count in firecrawl fixture tests
- Add missing_credentials assertion to schedule race E2E test
- Add test for extras present with node_errors=None in service_test
- Add TODO(#12747) to _SystemPromptPreset for cleanup tracking
- Update docstring to note SDK version and migration path
- Add debug logging in _build_system_prompt_value for observability
- Document empty-string edge case in docstring
- Trim redundant block comment at call site to single line
- Add test for empty-string system_prompt with cache enabled
- Add test for CHAT_CLAUDE_AGENT_CROSS_USER_PROMPT_CACHE=false env var
- Backend: always pass tracking_type=None to _build_raw_where for
percentile and histogram queries so they compute stats on cost_usd
rows regardless of the caller's tracking_type filter.
- Frontend test: use getAllByText for "5" which appears in both the
Active Users card and the $1-2 bucket count.
- Frontend: fix prettier formatting in PlatformCostContent.tsx.
In non-E2B mode, to_sdk_names() failed to map whitelisted SDK built-in
file tool names (Write, Edit, Read) to their MCP-prefixed equivalents
(mcp__copilot__Write, etc.), causing them to be incorrectly filtered out
when users configured tool whitelists.
Add _SDK_TO_MCP mapping for non-E2B mode that maps Read->read_file,
Write->Write, Edit->Edit. Add test coverage for this case.
Also fix black formatting in permissions_test.py that was causing CI lint
failure.
The 6th group_by (total agg no-tracking-type) only runs when
tracking_type is set. This test doesn't pass tracking_type, so the
expected count is 5, not 6.
Backend:
- Extract _build_raw_where() helper so raw SQL and Prisma WHERE share
filter logic (review item #4 — duplicated filter logic)
- Skip redundant total_agg_no_tracking_type_groups query when
tracking_type is None since it duplicates total_agg_groups (item #3)
- Convert CostBucket from TypedDict to BaseModel for consistency (nit #1)
- Replace fragile 8-way positional tuple unpack with indexed list access
Frontend:
- Make 12 SummaryCards data-driven via a cards config array (item #5)
- Use friendlier percentile labels: Typical/Upper/High/Peak Cost (P50/P75/P95/P99)
- Update test fixtures with all new dashboard fields (item #1)
- Add test assertions for new summary card labels, cost buckets, token
values, and user table columns
- Reject non-existent and non-file CLI paths at config validation time
instead of letting them fail with opaque OS errors at runtime
- Add negative test coverage for CLI path validator (non-existent,
non-executable, directory paths)
- Document breaking default changes (max_turns 1000->50, max_budget
$100->$5) in field descriptions with env var override instructions
- Narrow broad `except Exception` to `except (ImportError, AttributeError)`
in cli_openrouter_compat_test.py
Text parts in assistant messages were being rendered as plain <span>
elements, bypassing MessagePartRenderer's case "text" handler and
parseSpecialMarkers(). This broke styled error/system messages
([ERROR:], [RETRYABLE_ERROR:], [SYSTEM:] markers) and markdown
rendering in the builder chat panel.
Route all assistant message parts (text and tool) through
MessagePartRenderer so parseSpecialMarkers() runs on text content.
- Rename `_READ_TOOL_NAME` from `"Read"` to `"read_tool_result"` so the LLM
can distinguish it from `read_file` (working-directory tool). The new name
plus an updated description make its narrow scope (tool-results/ paths and
workspace:// URIs) unambiguous.
- Fix path leak in `_read_file_handler`: use `os.path.basename(file_path)` in
the "Path not allowed" error, consistent with write/edit handlers.
- Update `permissions.py` comment and all `permissions_test.py` assertions to
use the new `mcp__copilot__read_tool_result` name.
Baseline turn-start drain_pending_messages was unprotected — a transient
Redis error would propagate up and kill the entire turn stream, unlike the
already-protected mid-loop and SDK paths. Wrap with try/except + fallback
to [] so a Redis hiccup degrades gracefully.
Also adds 404 (session not found) and 429 (rate-limit exceeded) response
codes to the pending endpoint's OpenAPI spec so TypeScript clients can
handle these error paths correctly.
- Remove redundant inner `ChatConfig` import in `_prewarm_cli` — it was
already imported at module scope on line 16 (style guide: inner imports
only for heavy optional deps)
- Correct stale comment in `sdk_compat_test.py`: 2.1.63/2.1.70 pre-date
the context-management regression and are OpenRouter-safe without any
env var; only 2.1.97+ requires CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS=1
- Update `_assert_no_forbidden_patterns` error message in
`cli_openrouter_compat_test.py`: remove the stale "above 0.1.45" ceiling
(we've already upgraded to 0.1.58) and point at the correct remediation
steps (add to _KNOWN_GOOD_BUNDLED_CLI_VERSIONS after bisect verification)
- Plug test coverage gap in `env_test.py`: add
`CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS == "1"` assertions to three
OpenRouter test methods that were missing it
(test_strips_trailing_v1, test_strips_trailing_v1_and_slash,
test_no_v1_suffix_left_alone) — guards against the env var being
accidentally dropped from a code path that the main test didn't exercise
The gated_processor fixture's fake_low_balance mock used **kwargs, but
production code calls _handle_low_balance with positional args via
asyncio.to_thread. This caused a silent TypeError caught by the broad
except handler, making the handle_low_balance assertion fail (0 calls
instead of 1). Updated mock to match the actual method signature.
- Move tool_node_stats None guard before node_exec_future.set_result so
that when on_node_execution returns None (swallowed by @async_error_logged),
the future carries set_exception(RuntimeError) rather than set_result(None),
giving the tracking system an accurate error state
- Remove redundant `tool_node_stats is not None` check that was dead code
after the early-return guard was added
- Add explanatory comment in _charge_extra_iterations_sync docstring explaining
why the block lookup is intentionally repeated rather than cached from
_charge_usage (two separate thread-pool workers, no shared mutable state)
- Add assertion to test_on_node_execution_charges_extra_iterations_when_gate_passes
verifying _handle_low_balance is called when extra_cost > 0
- Add test_on_node_execution_failed_ibe_sends_notification covering the
FAILED + InsufficientBalanceError path in on_node_execution (lines 822-836)
that was previously untested
- Add type="button" and focus-visible ring to Stop/Send buttons in PanelInput
- Add type="button" to Retry button in MessageList and Apply button in ActionList
- Fix MessageList to render plain text directly and only pass dynamic-tool parts
to MessagePartRenderer (text parts were being misrouted through a tool renderer)
- Replace clearGraphSessionCacheForTesting export with _graphSessionCache for
tests — avoids leaking test scaffolding into the production bundle
- Add toast notification in undo restore when target node was deleted between
apply and undo (prevents silent no-op)
- Fix misleading test: remove red-herring mockNodes.push from 'no auto-send' test
since the guard is isGraphLoaded===false, not the node array
- Add truncation-path coverage to helpers.test.ts (MAX_NODES/MAX_EDGES branches)
- Add deleted-node undo test to actionApplicators.test.ts