Sessions created before the mode fix had no recorded mode. Previously,
restoreSessionMode left the global mode unchanged (whatever it was set
to on another session). It now defaults to extended_thinking when no
mode is recorded, so there is no need to clear localStorage.
Old sessions (created before the mode fix) didn't have a recorded
mode, so switching away and back would lose the mode. Now we record
the current mode for the departing session before switching.
500K chars (~125K tokens) per tool result was too generous: a few
large tool outputs could push context past 200K tokens. 100K chars
(~25K tokens) keeps individual results reasonable. The SDK writes
oversized results to tool-results/ files and returns a reference.
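
A minimal sketch of the capping behaviour described above, for
illustration only. The SDK's real implementation is not shown in this
change; cap_tool_result, its parameters, and the wording of the
reference string are hypothetical.

```python
import tempfile
from pathlib import Path

# ~25K tokens at roughly 4 chars per token
MAX_TOOL_RESULT_CHARS = 100_000


def cap_tool_result(
    result: str,
    out_dir: Path,
    name: str,
    max_chars: int = MAX_TOOL_RESULT_CHARS,
) -> str:
    """Return the result unchanged, or spill it to a file and return a reference."""
    if len(result) <= max_chars:
        return result
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{name}.txt"
    path.write_text(result, encoding="utf-8")
    # Hypothetical reference format; the SDK's actual wording may differ.
    return f"[result too large ({len(result)} chars); saved to {path}]"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(cap_tool_result("x" * 150_000, Path(tmp) / "tool-results", "demo"))
```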
Set CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=50 to compact at 50% of the 200K
context window (100K tokens) instead of the default 70% (140K tokens).
Calls with context >200K account for 54% of cost despite being only 3%
of calls. Earlier compaction keeps context smaller and reduces cache
creation.
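
The threshold arithmetic can be sketched as follows. This only
reproduces the numbers quoted above; how the CLI actually parses the
variable is not shown here, and compaction_threshold is a hypothetical
helper.

```python
import os

CONTEXT_WINDOW_TOKENS = 200_000
DEFAULT_COMPACT_PCT = 70  # default quoted in the message above


def compaction_threshold(env=None) -> int:
    """Token count at which compaction would trigger, given the override."""
    env = os.environ if env is None else env
    pct = int(env.get("CLAUDE_AUTOCOMPACT_PCT_OVERRIDE", DEFAULT_COMPACT_PCT))
    return CONTEXT_WINDOW_TOKENS * pct // 100
```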
Models without extended thinking (e.g. Sonnet) sometimes emit
<internal_reasoning>...</internal_reasoning> tags as visible text.
Extract ThinkingStripper to a shared module and apply it to the SDK
streaming path so these tags are stripped before reaching the SSE
client and the persisted message.
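
One way a streaming stripper can work, as a sketch only. The real
ThinkingStripper is not reproduced here; the class below is a
hypothetical stand-in that buffers partial tags so a tag split across
two chunks is still removed.

```python
OPEN = "<internal_reasoning>"
CLOSE = "</internal_reasoning>"


def _partial_suffix(text: str, tag: str) -> int:
    """Length of the longest proper prefix of tag that text ends with."""
    for k in range(min(len(text), len(tag) - 1), 0, -1):
        if text.endswith(tag[:k]):
            return k
    return 0


class TagStripper:
    def __init__(self) -> None:
        self._buf = ""
        self._inside = False

    def feed(self, chunk: str) -> str:
        """Consume one streamed chunk and return the text that is safe to emit."""
        self._buf += chunk
        out = []
        while True:
            if self._inside:
                end = self._buf.find(CLOSE)
                if end == -1:
                    # Drop hidden text, keep a tail that may start the close tag.
                    self._buf = self._buf[-(len(CLOSE) - 1):]
                    break
                self._buf = self._buf[end + len(CLOSE):]
                self._inside = False
            else:
                start = self._buf.find(OPEN)
                if start != -1:
                    out.append(self._buf[:start])
                    self._buf = self._buf[start + len(OPEN):]
                    self._inside = True
                    continue
                # Hold back a suffix that could be the start of an open tag.
                keep = _partial_suffix(self._buf, OPEN)
                cut = len(self._buf) - keep
                out.append(self._buf[:cut])
                self._buf = self._buf[cut:]
                break
        return "".join(out)

    def flush(self) -> str:
        """Emit held-back text at end of stream; unclosed hidden text is dropped."""
        rest = "" if self._inside else self._buf
        self._buf = ""
        return rest
```

Holding back a possible tag prefix delays a trailing "<" until the
next chunk or flush, which is the price of never emitting half a tag.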
effort=low on Sonnet causes <internal_reasoning> tags to leak into
visible output. Change the default to None (let the model decide). The
value is only passed to the SDK when explicitly set via
CHAT_CLAUDE_AGENT_THINKING_EFFORT.
Opus at $15/$75 per M tokens is unsustainable for agentic sessions
(1M+ context after 30+ turns = $7+/turn). Sonnet at $3/$15 per M
is 5x cheaper with comparable quality for most tasks.
Override via CHAT_MODEL=anthropic/claude-opus-4.6 for premium tier.
The backend added total_cache_read_tokens and total_cache_creation_tokens
to UserCostSummary but the OpenAPI spec was not updated, causing frontend
build failures.
- Validate entries from localStorage before constructing the sessionModes map,
filtering out corrupt/unknown mode strings (addresses CodeRabbit review)
- Add removeSessionMode action and call it on session delete so the map does
not grow unboundedly
- Add recordSessionMode to the useEffect dependency array to avoid stale-closure risk
- Add clarifying comment to restoreSessionMode no-op branch
- Extend tests to cover removeSessionMode, no-op, and corrupt-localStorage behaviour
Using user message text as the context key caused the deduplicator to
drop the second assistant reply when a user asked the same question twice
in one session. Switching to user message ID (which is unique per turn)
fixes the false positive while still preventing SSE-replayed duplicates.
Adds a regression test covering the same-question-twice scenario.
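
The keying change can be illustrated with a small Python sketch. The
real deduplicateMessages lives in the frontend; the Message type and
the choice to always keep user messages are assumptions made for this
example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    id: str
    role: str
    content: str


def deduplicate_messages(messages):
    """Drop replayed assistant messages, scoped by the preceding user message ID."""
    seen = set()
    result = []
    context_id = ""  # ID of the most recent user message
    for msg in messages:
        if msg.role == "user":
            context_id = msg.id
            result.append(msg)
            continue
        # Message IDs change on replay, so fingerprint on context + content.
        key = (msg.role, context_id, msg.content)
        if key in seen:
            continue
        seen.add(key)
        result.append(msg)
    return result
```

With user text as the context, the second "hello" reply in a repeated
question would share a fingerprint with the first and be dropped; with
the user message ID it does not.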
Bug 1: Fallback cost estimation was using accumulated turn_prompt_tokens /
turn_completion_tokens across all tool-call rounds, causing compounding
over-estimation on the 2nd+ turn. Snapshot token counts before each call and
pass only the per-call delta to _estimate_cost_from_tokens.
Bug 2: turn_cache_creation_tokens was defined but never populated. Extract
cache_creation_input_tokens from prompt_tokens_details (available from some
providers such as Anthropic via OpenRouter).
Add regression tests for both fixes.
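
A sketch of why Bug 1 compounds and how the per-call delta avoids it.
The helper names and the flat rates (3 and 15 micro-USD per token,
i.e. $3/$15 per M) are illustrative, not the real
_estimate_cost_from_tokens.

```python
def estimate_cost_micro_usd(prompt_tokens: int, completion_tokens: int) -> int:
    # Illustrative rates: $3 per M prompt tokens, $15 per M completion tokens.
    return prompt_tokens * 3 + completion_tokens * 15


def cost_with_accumulated_totals(calls) -> int:
    """Buggy: every round is charged for all tokens seen so far in the turn."""
    turn_prompt = turn_completion = total = 0
    for prompt, completion in calls:
        turn_prompt += prompt
        turn_completion += completion
        total += estimate_cost_micro_usd(turn_prompt, turn_completion)
    return total


def cost_with_per_call_delta(calls) -> int:
    """Fixed: snapshot the counters before each call and charge only the delta."""
    turn_prompt = turn_completion = total = 0
    for prompt, completion in calls:
        before_prompt, before_completion = turn_prompt, turn_completion
        turn_prompt += prompt
        turn_completion += completion
        total += estimate_cost_micro_usd(
            turn_prompt - before_prompt, turn_completion - before_completion
        )
    return total
```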
- Add claude_agent_thinking_effort config (default: 'low') to control
thinking depth. 'low' minimizes thinking token usage — the #1 cost
driver at 49% of total spend.
- Raise max_budget_usd from $5 to $15 — $5 was below p50 ($5.37),
causing half of all turns to get budget-killed mid-task.
- Log raw SDK usage dict to discover thinking token fields.
When OpenRouter's x-total-cost header is missing, estimate cost from
token counts using a known model pricing table so cost is always logged.
Also extract cache token details from streaming usage chunks
(prompt_tokens_details.cached_tokens) and pass them through to
PlatformCostLog.
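
A minimal sketch of the header-first, table-fallback approach. The
table contents, function names, and the None return for unknown models
are assumptions; only the x-total-cost header and the
prompt_tokens_details.cached_tokens field come from the description
above.

```python
# USD per million tokens: (prompt, completion). Illustrative entry only.
MODEL_PRICING = {
    "anthropic/claude-opus-4.6": (15.0, 75.0),
}


def resolve_cost_usd(headers: dict, model: str, usage: dict):
    """Prefer the provider-reported cost; otherwise estimate from token counts."""
    raw = headers.get("x-total-cost")
    if raw is not None:
        return float(raw)
    pricing = MODEL_PRICING.get(model)
    if pricing is None:
        return None  # unknown model: nothing to estimate from
    prompt_price, completion_price = pricing
    return (
        usage.get("prompt_tokens", 0) * prompt_price
        + usage.get("completion_tokens", 0) * completion_price
    ) / 1_000_000


def cached_tokens(usage: dict) -> int:
    """Read cached_tokens from a usage chunk, tolerating a missing details dict."""
    details = usage.get("prompt_tokens_details") or {}
    return details.get("cached_tokens", 0)
```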
On the dashboard side, add cache read/write columns to the logs table
and user table, and include cache tokens in the UserCostSummary backend
model so they surface in the API response.
The copilot mode (fast/extended_thinking) was stored as a single global
value. When switching between sessions, the mode indicator stayed on
whatever was last set globally rather than reflecting the mode each
session was created with.
Add a sessionModes map to the Zustand store that records the active
copilotMode when a session is created and restores it when the user
switches back to that session.
$5 was too aggressive — p50 cost is $5.37 so half of all turns were
getting budget-killed mid-task with no value delivered. $15 covers p75
($13.07) so ~75% of tasks complete. The thinking token cap is the
better cost lever but needs verification first.
Add Accept-Encoding: identity to ANTHROPIC_CUSTOM_HEADERS in
build_sdk_env() to prevent ZlibError decompression failures in the
CLI subprocess. Appended after any existing custom headers (OpenRouter
trace headers).
See: oven-sh/bun#23149, anthropics/claude-code#18302
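
A sketch of the append step, assuming ANTHROPIC_CUSTOM_HEADERS holds
newline-separated "Name: Value" lines. with_identity_encoding is a
hypothetical helper, not the actual build_sdk_env() code, and the
X-Trace-Id header in the usage below is made up.

```python
def with_identity_encoding(env: dict) -> dict:
    """Return a copy of env with Accept-Encoding: identity appended last."""
    existing = env.get("ANTHROPIC_CUSTOM_HEADERS", "")
    lines = [line for line in existing.splitlines() if line.strip()]
    lines.append("Accept-Encoding: identity")
    return {**env, "ANTHROPIC_CUSTOM_HEADERS": "\n".join(lines)}
```

For example, an env that already carries one trace header ends up with
two lines, the trace header first.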
The raw SQL WHERE clause builder was passing datetime parameters without
explicit type casts, causing PostgreSQL to fail with "operator does not
exist: timestamp without time zone >= text".
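
The fix amounts to casting each datetime placeholder explicitly. A
sketch, with a hypothetical build_where helper and "createdAt" column
standing in for the real builder:

```python
from datetime import datetime


def build_where(start=None, end=None):
    """Build a WHERE clause with numbered placeholders and explicit casts."""
    clauses, params = [], []
    if start is not None:
        params.append(start)
        # Without ::timestamp PostgreSQL may treat the parameter as text.
        clauses.append(f'"createdAt" >= ${len(params)}::timestamp')
    if end is not None:
        params.append(end)
        clauses.append(f'"createdAt" <= ${len(params)}::timestamp')
    where = " AND ".join(clauses) if clauses else "TRUE"
    return where, params
```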
When the SSE connection reconnects, resume_session_stream replays from
"0-0" and the replayed UIMessage objects get new IDs from useChat,
bypassing the adjacent-only content dedup. Switch deduplicateMessages
to track all seen role+context+content fingerprints globally, scoped
by the preceding user message to avoid false positives when the
assistant legitimately gives identical answers to different prompts.
Add responses={404, 429} to the pending endpoint's @router.post decorator
so FastAPI auto-generates them in the OpenAPI spec. Previously these were
only manually added to openapi.json and the CI schema-check (export +
diff) stripped them. Also apply black formatting to the long warning
line that was failing the backend lint check.
Non-IBE exceptions from charge_node_usage (e.g. DB timeout) were
re-raised and caught by the outer generic handler, incorrectly marking
a successful tool execution as failed. This could cause the LLM to
retry side-effectful operations. Now logs the error and continues to
the success path since the tool itself completed successfully.
The setup_test_data fixture creates a graph with credentials already
embedded in node defaults. The DB-stored credential schema may not
surface these as "missing" in build_missing_credentials_from_graph,
so assert the key exists rather than asserting non-empty count.
- Extract duplicated GraphValidationError handler from _run_agent and
_schedule_agent into _handle_graph_validation_race helper method
- Use generator expressions instead of list comprehension for
short-circuit evaluation in _build_setup_requirements_from_validation_error
- Improve mixed-error fallback message to be more user-friendly
- Add test for empty node_errors={} edge case
- Pin expected credential count in firecrawl fixture tests
- Add missing_credentials assertion to schedule race E2E test
- Add test for extras present with node_errors=None in service_test
- Add TODO(#12747) to _SystemPromptPreset for cleanup tracking
- Update docstring to note SDK version and migration path
- Add debug logging in _build_system_prompt_value for observability
- Document empty-string edge case in docstring
- Trim redundant block comment at call site to single line
- Add test for empty-string system_prompt with cache enabled
- Add test for CHAT_CLAUDE_AGENT_CROSS_USER_PROMPT_CACHE=false env var
- Backend: always pass tracking_type=None to _build_raw_where for
percentile and histogram queries so they compute stats on cost_usd
rows regardless of the caller's tracking_type filter.
- Frontend test: use getAllByText for "5" which appears in both the
Active Users card and the $1-2 bucket count.
- Frontend: fix prettier formatting in PlatformCostContent.tsx.
In non-E2B mode, to_sdk_names() failed to map whitelisted SDK built-in
file tool names (Write, Edit, Read) to their MCP-prefixed equivalents
(mcp__copilot__Write, etc.), causing them to be incorrectly filtered out
when users configured tool whitelists.
Add _SDK_TO_MCP mapping for non-E2B mode that maps Read->read_file,
Write->Write, Edit->Edit. Add test coverage for this case.
Also fix black formatting in permissions_test.py that was causing CI lint
failure.
The 6th group_by (total agg no-tracking-type) only runs when
tracking_type is set. This test doesn't pass tracking_type, so the
expected count is 5, not 6.