Backend:
- Cache stripe.Price.retrieve with 5-min TTL via _get_stripe_price_amount
to avoid 200-600ms Stripe round-trip on every GET /credits/subscription
- Use SubscriptionTier enum .value for FREE/ENTERPRISE in tier_costs dict
for consistency (instead of hardcoded strings)
- Rename misleading test names: "defaults_to_FREE" → "preserves_current_tier"
to reflect actual behaviour (unknown price IDs preserve tier, not reset)
- Update subscription_routes_test to mock _get_stripe_price_amount instead
of stripe.Price.retrieve directly, avoiding cached-result interference
Frontend:
- Handle ?subscription=success return from Stripe Checkout: refetch + toast
- Add downgrade confirmation Dialog before cancelling paid subscription
- Handle ENTERPRISE tier: render dedicated admin-managed plan card, not the
FREE/PRO/BUSINESS tier cards (which would show no "Current" badge)
- Track pendingTier (via variables) so only the clicked button shows "Updating..."
- Show "Pricing available soon" for paid tiers with cost=0 (unconfigured LD flags)
instead of misleading "Free"
- Move tierError state into the hook, set via changeTier internally
- Move TIER_ORDER constant to module scope (was magic array inside render body)
- Add aria-current="true" to active tier card for screen reader accessibility
- Add role="alert" to all error paragraph elements
- Improve tier descriptions with concrete capacity values
Black detected double blank lines between class definitions in
platform_cost_test.py (pulled from dev base). Normalise to a single
blank line so the CI merge-commit lint check passes.
An empty STRIPE_WEBHOOK_SECRET (the default) allows an attacker to
compute a valid HMAC-SHA256 signature over the same key and forge any
webhook event (customer.subscription.created, etc.), escalating any
user to an arbitrary subscription tier without paying.
Fix: return 503 immediately when stripe_webhook_secret is unset rather
than proceeding to signature verification. Also add run_in_threadpool
to get_stripe_customer_id and remove the duplicate trialing-sub test.
Merges origin/feat/subscription-tier-billing which had the open-redirect
guard, blocking-IO fix, and idempotency/ENTERPRISE guard.
Test added: test_stripe_webhook_unconfigured_secret_returns_503
- Guard stripe_webhook: return 503 when STRIPE_WEBHOOK_SECRET is empty.
An empty secret allows HMAC forgery (attacker computes a valid sig over
the same key), so we reject all webhook calls when unconfigured.
- Suppress raw Stripe error from 502 cancel response; log server-side instead.
- Wrap all blocking Stripe SDK calls in run_in_threadpool: Customer.create,
Subscription.list, Subscription.cancel, checkout.Session.create.
- cancel_stripe_subscription now also cancels 'trialing' subscriptions
(previously only 'active'), preventing billing after a FREE downgrade.
- session.url None now raises ValueError instead of returning empty string.
- Add tests: webhook 503 on missing secret, trialing-sub cancellation.
## Why
The platform cost tracking system had several gaps that made the admin
dashboard less accurate and harder to reason about:
**Q: Do we have per-model granularity on the provider page?**
The `model` column was stored in `PlatformCostLog` but the SQL
aggregation grouped only by `(provider, tracking_type)`, so all models
for a given provider collapsed into one row. Now grouped by `(provider,
tracking_type, model)` — each model gets its own row.
**Q: Why does Anthropic show `per_run` for OrchestratorBlock?**
Bug: `OrchestratorBlock._call_llm()` was building `NodeExecutionStats`
with only `input_token_count` and `output_token_count` — it dropped
`resp.provider_cost` entirely. For OpenRouter calls this silently
discarded the `cost_usd`. For the SDK (autopilot) path,
`ResultMessage.total_cost_usd` was never read. When `provider_cost` is
None and token counts are 0 (e.g. SDK error path), `resolve_tracking`
falls through to `per_run`. Fixed by propagating all cost/cache fields.
**Q: Why can't we get `cost_usd` for Anthropic direct API calls?**
The Anthropic Messages API does not return a dollar amount — only token
counts. OpenRouter returns cost via response headers, so it uses
`cost_usd` directly. The Claude Agent SDK *does* compute
`total_cost_usd` internally, so SDK-mode OrchestratorBlock runs now get
`cost_usd` tracking. For direct Anthropic LLM blocks the estimate uses
per-token rates (see cache section below).
**Q: What about labeling by source (autopilot vs block)?**
Already tracked: `block_name` stores `copilot:SDK`, `copilot:Baseline`,
or the actual block name. Visible in the raw logs table. Not added to
the provider group-by (would explode row count); use the logs table
filter instead.
**Q: Is there double-counting between `tokens`, `per_run`, and
`cost_usd`?**
No. `resolve_tracking()` uses a strict preference hierarchy — exactly
one tracking type per execution: `cost_usd` > `tokens` > provider
heuristics > `per_run`. A single execution produces exactly one
`PlatformCostLog` row.
**Q: Should we track Anthropic prompt cache tokens (PR #12725)?**
Yes — PR #12725 adds `cache_control` markers to Anthropic API calls,
which causes the API to return `cache_read_input_tokens` and
`cache_creation_input_tokens` alongside regular `input_tokens`. These
have different billing rates:
- Cache reads: **10%** of base input rate (much cheaper)
- Cache writes: **125%** of base input rate (slightly more expensive,
one-time)
- Uncached input: **100%** of base rate
Without tracking them separately, a flat-rate estimate on
`total_input_tokens` would be wrong in both directions.
## What
- **Per-model provider table**: SQL now groups by `(provider,
tracking_type, model)`. `ProviderCostSummary` and the frontend
`ProviderTable` show a model column.
- **Cache token columns**: New `cacheReadTokens` and
`cacheCreationTokens` columns in `PlatformCostLog` with matching
migration.
- **LLM block cache tracking**: `LLMResponse` captures
`cache_read_input_tokens` / `cache_creation_input_tokens` from Anthropic
responses. `NodeExecutionStats` gains `cache_read_token_count` /
`cache_creation_token_count`. Both propagate to `PlatformCostEntry` and
the DB.
- **Copilot path**: `token_tracking.persist_and_record_usage` now writes
cache tokens as dedicated `PlatformCostEntry` fields (was
metadata-only).
- **OrchestratorBlock bug fix**: `_call_llm()` now includes
`resp.provider_cost`, `resp.cache_read_tokens`,
`resp.cache_creation_tokens` in the stats merge. SDK path captures
`ResultMessage.total_cost_usd` as `provider_cost`.
- **Accurate cost estimation**: `estimateCostForRow` uses
token-type-specific rates for `tokens` rows (uncached=100%, reads=10%,
writes=125% of configured base rate).
## How
`resolve_tracking` priority is unchanged. For Anthropic LLM blocks the
tracking type remains `tokens` (Anthropic API returns no dollar amount).
For OrchestratorBlock in SDK/autopilot mode it now correctly uses
`cost_usd` because the Claude Agent SDK computes and returns
`total_cost_usd`. For OpenRouter through OrchestratorBlock it now
correctly uses `cost_usd` (was silently dropped before).
## Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] `ProviderCostSummary` SQL updated
- [x] Cache token fields present in `PlatformCostEntry` and
`PlatformCostLogCreateInput`
- [x] Prisma client regenerated — all type checks pass
- [x] Frontend `helpers.test.ts` updated for new `rateKey` format
- [x] Pre-commit hooks pass (Black, Ruff, isort, tsc, Prisma generate)
### Why / What / How
**Why:** The `ask_question` copilot tool previously only accepted a
single question per invocation. When the LLM needs to ask multiple
clarifying questions simultaneously, it either crams them into one text
field (requiring users to format numbered answers manually) or makes
multiple sequential tool calls (slow and disruptive UX).
**What:** Replace the single `question`/`options`/`keyword` parameters
with a `questions` array parameter so the LLM can ask multiple questions
in one tool call, each rendered as its own input box.
**How:** Simplified the tool to accept only `questions` (array of
question objects). Each item has `question` (required), `options`, and
`keyword`. The frontend `ClarificationQuestionsCard` already supports
rendering multiple questions — no frontend changes needed.
### Changes 🏗️
- `backend/copilot/tools/ask_question.py`: Replaced dual
question/questions schema with single `questions` array. Extracted
parsing into module-level `_parse_questions` and `_parse_one` helpers.
Follows backend code style: early returns, list comprehensions, top-down
ordering, functions under 40 lines.
- `backend/copilot/tools/ask_question_test.py`: Rewritten with 18
focused tests covering happy paths, keyword handling, options filtering,
and invalid input handling.
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [ ] I have tested my changes according to the test plan:
- [ ] Run `poetry run pytest backend/copilot/tools/ask_question_test.py`
— all tests pass
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
_cancel_customer_subscriptions previously only queried status="active",
leaving trialing subscriptions in place. A user on a trial who downgrades
to FREE, or upgrades to a different paid tier, would continue to be billed
once the trial ended. Query both "active" and "trialing" statuses and
dedupe by sub id to ensure every billable sub is cleaned up.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When Stripe processes a subscription upgrade, the old subscription's
customer.subscription.deleted event may arrive after the new subscription's
customer.subscription.created has already been handled. Unconditionally
setting the user to FREE in the cancel branch would immediately undo the
upgrade.
sync_subscription_from_stripe now checks Stripe for other active/trialing
subscriptions on the same customer before downgrading. If at least one
different active sub exists, the handler preserves the current tier and
returns without writing. Added a regression test that mocks Stripe
returning sub_new as active and asserts set_subscription_tier is never
awaited.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Security and quality fixes for PR #12727 subscription tier billing review:
- Open-redirect protection: validate success_url/cancel_url against
settings.config.frontend_base_url before passing to Stripe Checkout.
- Blocking I/O: wrap every synchronous Stripe SDK call (Subscription.list,
Subscription.cancel, checkout.Session.create) with run_in_threadpool via
a shared _cancel_customer_subscriptions helper.
- Info leakage: log raw Stripe errors server-side but return a generic
502 detail to the client ("Please try again or contact support.").
- Webhook idempotency: skip DB writes in sync_subscription_from_stripe
when the tier is already current, avoiding redundant writes on retry.
- ENTERPRISE guard in webhook: refuse to overwrite ENTERPRISE tier from
Stripe events (admin-managed, not self-service).
- create_subscription_checkout raises ValueError on empty session.url
instead of silently returning "".
- Tests: fixture-based client (no leaky try/finally), open-redirect test,
ENTERPRISE 403 test, webhook dispatch test, trialing status test,
multi-sub partial-cancel-failure test, idempotency test, renamed
misleading "defaults to FREE" tests to "preserves_current_tier".
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When a PRO user upgrades to BUSINESS via a fresh Checkout Session, Stripe
creates a new subscription without touching the existing one, leaving the
customer double-billed. Cleaning up in sync_subscription_from_stripe
rather than the API handler ensures an abandoned Checkout does not leave
the user without a subscription: we only cancel the old sub once the new
sub has actually become active.
Errors listing or cancelling stale subs are logged but not propagated —
the new subscription tier still gets persisted, and Stripe will retry
the webhook later if listing fails.
Addresses sentry[bot] comment 3061713750 on PR #12727.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
After a tier change, the rate-limit cache (get_user_tier, 5-minute TTL)
was not cleared, so CoPilot rate limits would continue enforcing the old tier
until the TTL expired. Call get_user_tier.cache_delete(user_id) via a local
import to avoid circular import issues.
Addresses sentry[bot] comment 3061725912 on PR #12727.
customer_update only accepts {address, name, shipping} per Stripe's TypedDict.
The payment_method key does not exist in CreateParamsCustomerUpdate, so pyright
was failing the type-check CI. Remove the invalid parameter — for Stripe
subscriptions the payment method used for the first invoice is automatically
saved to the customer by Stripe.
getSubscription() and setSubscriptionTier() in client.ts were replaced by
generated hooks (useGetSubscriptionStatus, useUpdateSubscriptionTier) and
are no longer called anywhere in the codebase. Remove them to avoid adding
further surface area to the deprecated BackendAPI.
- stripe.Subscription.list() is now wrapped in try-except; StripeError
is logged and re-raised so callers know the listing failed.
- stripe.Subscription.cancel() StripeError is now re-raised (was swallowed),
preventing set_subscription_tier from marking the user FREE when Stripe
cancellation failed.
- update_subscription_tier catches StripeError from cancel and returns HTTP 502
so DB tier is only updated if Stripe succeeds.
- Fix test patch path: use backend.data.credit.stripe.checkout.Session.create
instead of bare stripe.checkout.Session.create for import-refactor safety.
- Add tests for raise-on-list-failure, raise-on-cancel-failure, and
502 route response on cancel failure.
Addresses sentry[bot] comments 3061585490, 3061654688 on PR #12727.
Adds customer_update={payment_method: auto} so the payment method used
for subscription is set as the Stripe customer's default. Makes it show
pre-selected in future Checkout sessions (manual top-ups).
- Import get_subscription_price_id in v1.py
- get_subscription_status now calls stripe.Price.retrieve for PRO/BUSINESS
tiers to return actual unit_amount instead of hardcoded zeros
- UI will now show correct monthly costs when LD price IDs are configured
- Fix Button import from __legacy__ to design system in SubscriptionTierSection
- Update subscription status tests to mock the new Stripe price lookup
Drop the dual question/questions schema in favor of a single
`questions` array parameter. This removes ~175 lines of complexity
(the _execute_single path, duplicate params, precedence logic).
Restructured per backend code style rules:
- Top-down ordering: public _execute first, helpers below
- Early return with guard clauses, no deep nesting
- List comprehensions via walrus operator in _parse_questions
- Helpers extracted as module-level functions (not methods)
- Functions under 40 lines each
The frontend ClarificationQuestionsCard already renders arrays of
any length — no UI changes needed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add isinstance narrowing in test_execute_multiple_questions_ignores_single_params
to fix Pyright type-check CI failure (reportAttributeAccessIssue).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests that access `result.questions` without first narrowing the type
from `ToolResponseBase` to `ClarificationNeededResponse` cause Pyright
type-check failures. Added `assert isinstance(result,
ClarificationNeededResponse)` before accessing `.questions` in 4 tests.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When navigating back to a cached session, appliedActionKeys was reset to empty
but messages were preserved. This caused previously applied actions to reappear
as unapplied in the UI, allowing them to be re-applied and creating duplicate
undo entries. Clearing messages unconditionally on navigation ensures the
displayed action buttons always reflect the actual applied state.
- Restore top-level `required: ["question"]` in schema for LLM tool-
calling compatibility; validation handles the questions-only path
- Fix keyword null bug: `item.get("keyword")` returning None now
correctly falls back to `question-{idx}` instead of producing "None"
- Filter empty-string options in _build_question (`str(o).strip()`)
to avoid artifacts like "Email, , Slack"
- Revert session type hint to `ChatSession` to match base class contract
- Add tests for null keyword and empty-string options filtering
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove top-level `required: ["question"]` from schema so the
`questions`-only calling convention is valid for schema-compliant LLMs
- Move logger assignment below all imports (PEP 8 / isort)
- Remove duplicated option filtering in `_execute_single`; let
`_build_question` own that responsibility
- Fix `session` type hint to `ChatSession | None` to match the guard
- Add test for `questions` as non-list type (falls back to single path)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix falsy option filtering: use `if o is not None` instead of `if o`
so valid values like "0" are preserved
- Improve multi-question `message` field: join all questions with ";"
instead of only using the first question's text
- Add logging warnings for skipped invalid items in multi-question path
instead of silently dropping them
- Simplify schema: use `"required": ["question"]` instead of empty
required + anyOf (more LLM-friendly)
- Add missing test cases: session=None, single-item questions array,
duplicate keywords, falsy option values
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The ask_question tool previously only accepted a single question per
invocation, forcing the LLM to cram multiple queries into one text box
or make multiple sequential tool calls. This adds a `questions` parameter
(list of question objects) so multiple input fields render at once.
Backward-compatible: the existing `question`/`options`/`keyword` params
still work. When `questions` (plural) is provided, they take precedence.
The frontend ClarificationQuestionsCard already supports rendering
multiple questions — no frontend changes needed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests for GET/POST /credits/subscription covering:
- GET returns current tier (PRO, FREE default when None)
- POST FREE skips Stripe when payment disabled
- POST PRO sets tier directly for beta users (payment disabled)
- POST paid tier rejects missing success_url/cancel_url with 422
- POST paid tier creates Stripe Checkout Session and returns URL
- POST FREE with payment enabled cancels active Stripe subscription
- Remove useCallback from changeTier (not needed per project guidelines)
- Block self-service tier changes for ENTERPRISE users (admin-managed)
- Preserve current tier on unrecognized Stripe price_id instead of
defaulting to FREE (prevents accidental downgrades during price migration)
Tests for:
- Unknown/mismatched Stripe price_id defaults to FREE (not early return)
- None from LaunchDarkly price flags defaults to FREE
- BUSINESS tier mapping
- StripeError during cancel_stripe_subscription is logged, not raised
When sync_subscription_from_stripe encounters an unrecognized price_id
(e.g. LD flags unconfigured or price changed), it no longer returns early
leaving the user on a stale tier. Instead it defaults to FREE and logs a
warning, keeping the DB state consistent with Stripe's subscription status.
Also guard against None pro_price/biz_price from LaunchDarkly before
comparison to avoid silent mismatches.
EditAgentTool and RunAgentTool call useCopilotChatActions() which throws
if no provider is in the tree. Wrap the panel content with
CopilotChatActionsProvider wired to sendRawMessage so tool components
can send retry prompts without crashing.
The user message was saved to DB before the <user_context> prefix was added
to session.messages. Subsequent upsert_chat_session calls only append new
messages (slicing by existing_message_count), so the prefixed content was
never written to the DB. On page reload or --resume, the unprefixed version
was loaded, losing personalisation.
Fix: add update_message_content_by_sequence to db.py and call it after
injecting the prefix in both sdk/service.py and baseline/service.py.
Add customer.subscription.created to the sync handler so user tier is
upgraded immediately when the subscription is first created (not just on
subsequent updates/deletions).
dev branch already creates SubscriptionTier enum and subscriptionTier column in
20260326200000_add_rate_limit_tier. Remove duplicate DDL from our migration and
only add SUBSCRIPTION to CreditTransactionType using IF NOT EXISTS guard.
Add cancel_stripe_subscription() which lists and cancels all active Stripe
subscriptions for the customer, preventing continued billing after downgrade.
Call it from update_subscription_tier() when tier == FREE and payment is
enabled. Add two unit tests covering active and empty subscription scenarios.
Add tests for the /logs/export endpoint (success, truncated, filters, auth) and
fix missing import of get_platform_cost_logs_for_export in platform_cost_test.py.
Add upfront 422 validation when upgrading to a paid tier without providing
redirect URLs. Also catch stripe.StripeError alongside ValueError to return
a proper 422 instead of a 500 on Stripe API errors.