Round 3 follow-up: the drain-at-start in ``stream_chat_completion_baseline``
persisted pending messages to ``session.messages`` but never called
``transcript_builder.append_user`` for them. A mid-turn transcript
upload would be missing the drained text, which could produce a
malformed assistant-after-assistant structure on the next turn.
The drain block runs BEFORE ``transcript_builder`` is instantiated
(which happens after prompt/transcript async setup), so we can't call
``append_user`` in the drain block itself. Instead, we remember the
drained list and mirror it into the transcript right after the
single-message ``transcript_builder.append_user(content=message)``
call near the prompt-build site.
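A minimal sketch of the ordering fix; ``TranscriptBuilder`` and ``start_turn`` are illustrative stand-ins, not the real classes in ``baseline/service.py``:

```python
from dataclasses import dataclass, field

# Illustrative stand-in: the real transcript builder and turn setup are
# more involved; only the ordering matters here.
@dataclass
class TranscriptBuilder:
    entries: list = field(default_factory=list)

    def append_user(self, content: str) -> None:
        self.entries.append({"role": "user", "content": content})

def start_turn(message: str, drained: list[str]) -> TranscriptBuilder:
    # The drain ran earlier, before the builder existed; we only kept the
    # list. Mirror it right after the normal single-message append so a
    # mid-turn transcript upload includes the drained text.
    builder = TranscriptBuilder()
    builder.append_user(content=message)
    for text in drained:
        builder.append_user(content=text)
    return builder
```

With this ordering, a mid-turn upload sees the drained user messages in the same positions the model saw them.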
Also cleaned up the stray adjacent-string concatenation in the log
line (``"...turn start " "for session %s"`` → single string).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Critical: SDK path was double-injecting. The endpoint persisted the
message to ``session.messages`` AND the executor drained it from Redis
and concatenated it into ``current_message``, so the LLM saw each queued
message twice: once via the compacted history / gap context that
``_build_query_message`` pulls from ``session.messages``, and once via
the new query. Baseline avoided this via ``maybe_append_user_message``
dedup, but the SDK had no equivalent guard.
### Fix: Redis is the single source of truth
- Endpoint no longer persists to ``session.messages``. It only
pushes to Redis and returns.
- Baseline drain-at-start calls ``maybe_append_user_message`` (dedup
is a safety net, not the primary guard).
- SDK drain-at-start calls ``maybe_append_user_message`` too, so the
durable transcript records the queued messages. The concatenation
into ``current_message`` stays so the SDK CLI sees the content in
the first user message of the new turn.
### Baseline max-iterations silent-loss — Fixed
``tool_call_loop`` yields ``finished_naturally=False`` when
``iteration == max_iterations`` and then returns. Previously the drain
only skipped ``finished_naturally=True``, so messages drained on the
max-iterations final yield were appended to ``openai_messages`` and
silently lost (the loop was already exiting). Now the drain also
skips when ``loop_result.iterations >= _MAX_TOOL_ROUNDS``.
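The guard can be condensed into a small predicate; the constant value and field names here are assumptions based on the description above:

```python
# Sketch of the drain guard: skip the drain on the loop's final yield,
# whether it ended naturally or hit the iteration cap, because a message
# appended at that point would never reach another LLM call.
_MAX_TOOL_ROUNDS = 10  # assumed value for illustration

def should_drain(finished_naturally: bool, iterations: int) -> bool:
    if finished_naturally:
        return False  # loop is exiting; an appended message would be lost
    if iterations >= _MAX_TOOL_ROUNDS:
        return False  # max-iterations final yield: same silent-loss risk
    return True
```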
### API response cleanup
- ``QueuePendingMessageResponse``: dropped ``queued`` (always True) and
``detail`` (human-readable, clients shouldn't parse). Kept
``buffer_length``, ``max_buffer_length``, and ``turn_in_flight``.
### Tests
- Removed dead ``_FakePipeline`` class (the code switched to Lua EVAL
in round 1 so the pipeline fake was unused).
- Added ``test_drain_decodes_bytes_payloads`` so the ``bytes → str``
decode branch in ``drain_pending_messages`` is actually exercised
(real redis-py returns bytes when ``decode_responses=False``).
- Updated ``_FakeRedis.lists`` type hint to ``list[str | bytes]``.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Critical fix — the SDK mid-stream injection was structurally broken.
``ClaudeSDKClient.receive_response()`` explicitly returns after the
first ``ResultMessage``, so re-issuing ``client.query()`` and setting
``acc.stream_completed = False`` could never restart the iteration —
the next ``__anext__`` raised ``StopAsyncIteration`` and the injected
turn's response was never consumed. Replaced the broken mid-stream
path with a turn-start drain that works for both baseline and SDK.
### Changes
**Atomic push via Lua EVAL** (``pending_messages.py``)
- Replace the ``RPUSH`` + ``LTRIM`` + ``EXPIRE`` + ``LLEN`` pipeline
(which was ``transaction=False`` and racy against concurrent
``LPOP``) with a single Lua script so the push is atomic.
- Drop the unused ``enqueued_at`` field.
- Add 16k ``max_length`` cap on ``PendingMessage.content``.
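The shape of the atomic push looks roughly like the following; the Lua body and the wrapper are reconstructions from the description, not the actual script in ``pending_messages.py``:

```python
# Reconstruction: one Lua script replaces the racy RPUSH + LTRIM + EXPIRE
# + LLEN pipeline. KEYS[1] is the buffer key; ARGV = payload, max length,
# TTL seconds. Redis runs the whole script atomically, so a concurrent
# LPOP can never interleave between the push and the trim.
PUSH_SCRIPT = """
local len = redis.call('RPUSH', KEYS[1], ARGV[1])
local max_len = tonumber(ARGV[2])
if len > max_len then
  redis.call('LTRIM', KEYS[1], len - max_len, -1)
  len = max_len
end
redis.call('EXPIRE', KEYS[1], ARGV[3])
return len
"""

def push_pending_message(redis_client, session_id: str, payload: str,
                         max_len: int = 10, ttl_seconds: int = 3600) -> int:
    key = f"copilot:pending:{session_id}"
    # redis-py's eval(script, numkeys, *keys_and_args) ships the script and
    # returns the Lua return value (the post-trim buffer length).
    return redis_client.eval(PUSH_SCRIPT, 1, key, payload, max_len, ttl_seconds)
```

``LTRIM key len-max_len -1`` keeps the newest ``max_len`` entries, which matches the trim-oldest-on-overflow behaviour described for the buffer.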
**Baseline path** (``baseline/service.py``)
- Drain at turn start (atomic ``LPOP``): any message queued while the
session was idle or between turns is picked up before the first
LLM call.
- Mid-loop drain now skips the final ``tool_call_loop`` yield
(``finished_naturally=True``) — draining there would append a user
message the loop is about to exit past, silently losing it.
- Inject via ``format_pending_as_user_message`` so file IDs + context
are preserved in both ``openai_messages`` and the persisted session
transcript (previously the DB copy lost file/context metadata).
- Remove the ``finally`` ``clear_pending_messages`` — atomic drain at
turn start means any late push belongs to the next turn; clearing
here would racily clobber it.
**SDK path** (``sdk/service.py``)
- Remove the broken mid-stream injection block entirely.
- Drain at turn start (same atomic ``LPOP``) and merge the drained
messages into ``current_message`` before ``_build_query_message``,
so the SDK CLI sees them as part of the initial user message.
- Remove the ``finally`` ``clear_pending_messages``.
- Delete the unused ``_combine_pending_messages`` helper.
**Endpoint** (``api/features/chat/routes.py``)
- Enforce ``check_rate_limit`` / ``get_global_rate_limits`` — was
bypassing per-user daily/weekly token limits that ``/stream``
enforces.
- ``QueuePendingMessageRequest`` gets ``extra="forbid"`` and
``message: max_length=16_000``.
- Push-first, persist-second: if the Redis push fails we raise a 5xx;
previously the session DB got an orphan user message with no
corresponding queued entry and a retry would duplicate it.
- Log a warning when sanitised file IDs drop unknown entries.
- Persisted message content now uses ``format_pending_as_user_message``
so the session copy matches what the model actually sees on drain.
- Response returns ``buffer_length``, ``max_buffer_length``, and
``turn_in_flight`` so the frontend can show accurate feedback about
whether the message will hit the current turn or the next one.
**Tests** (``pending_messages_test.py``)
- ``_FakeRedis.eval`` emulates the Lua push script so the existing
push/drain/cap tests keep working under the new atomic path.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When a user sends a follow-up message while a copilot turn is still
streaming, we now queue it into a per-session Redis buffer and let the
executor currently processing the turn drain it between tool-call
rounds — the model sees the new message before its next LLM call.
Previously such messages were blocked at the RabbitMQ/cluster-lock
layer and only processed after the current turn completed.
### New module
`backend/copilot/pending_messages.py`
- Redis list buffer keyed by ``copilot:pending:{session_id}``
- Pub/sub notify channel as a wake-up hint for future blocking-wait use
- Cap of ``MAX_PENDING_MESSAGES=10`` — trims oldest on overflow
- 1h TTL matches ``stream_ttl`` default
- Helpers: ``push_pending_message``, ``drain_pending_messages``,
``peek_pending_count``, ``clear_pending_messages``,
``format_pending_as_user_message``
### New endpoint
`POST /sessions/{session_id}/messages/pending`
- Returns 202 + current buffer length
- Persists the message to the DB so it's in the transcript immediately
- Sanitises file IDs against the caller's workspace
- Does NOT start a new turn (unlike ``stream``)
### Baseline path (simple — in-process injection)
`backend/copilot/baseline/service.py`
- Between iterations of ``tool_call_loop``, drain pending and append to
the shared ``openai_messages`` list so the loop picks them up on the
next LLM call
- Persist session via ``upsert_chat_session`` after injection
- Finally-block safety net clears the buffer on early exit
### SDK path (in-process injection via live client.query)
`backend/copilot/sdk/service.py`
- When the SDK loop detects ``acc.stream_completed``, before breaking,
drain pending and send them via the existing open ``client.query()``
as a new user message; reset ``stream_completed`` to ``False`` and
``continue`` the async-for loop so we keep consuming CLI messages
- Combines multiple drained messages into a single ``query()`` call via
``_combine_pending_messages`` to preserve ordering
- Finally-block safety net clears the buffer on early exit
- This works because the Claude Agent SDK's ``ClaudeSDKClient`` is a
long-lived connection: ``query()`` writes a new user message to the
CLI's stdin and the same ``receive_response()`` stream picks up the
next turn's events, so we keep session continuity without releasing
the cluster lock or restarting the subprocess
### Tests
`backend/copilot/pending_messages_test.py`
- FakeRedis + FakePipeline so tests don't need a live Redis
- Covers push/drain, ordering, buffer cap (MAX_PENDING_MESSAGES),
clear, publish hook, malformed-payload handling, and the format
helper (plain / with context / with file_ids)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
### Why / What / How
**Why:** The `ask_question` copilot tool previously only accepted a
single question per invocation. When the LLM needs to ask multiple
clarifying questions simultaneously, it either crams them into one text
field (requiring users to format numbered answers manually) or makes
multiple sequential tool calls (slow and disruptive UX).
**What:** Replace the single `question`/`options`/`keyword` parameters
with a `questions` array parameter so the LLM can ask multiple questions
in one tool call, each rendered as its own input box.
**How:** Simplified the tool to accept only `questions` (array of
question objects). Each item has `question` (required), `options`, and
`keyword`. The frontend `ClarificationQuestionsCard` already supports
rendering multiple questions — no frontend changes needed.
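The resulting parameter schema has roughly this shape; it is inferred from the description above, and the exact schema in `ask_question.py` may differ:

```python
# Inferred shape of the questions-only tool schema: a single required
# `questions` array, each item carrying question (required), options,
# and keyword.
QUESTIONS_SCHEMA = {
    "type": "object",
    "properties": {
        "questions": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "question": {"type": "string"},
                    "options": {"type": "array", "items": {"type": "string"}},
                    "keyword": {"type": "string"},
                },
                "required": ["question"],
            },
        }
    },
    "required": ["questions"],
}
```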
### Changes 🏗️
- `backend/copilot/tools/ask_question.py`: Replaced dual
question/questions schema with single `questions` array. Extracted
parsing into module-level `_parse_questions` and `_parse_one` helpers.
Follows backend code style: early returns, list comprehensions, top-down
ordering, functions under 40 lines.
- `backend/copilot/tools/ask_question_test.py`: Rewritten with 18
focused tests covering happy paths, keyword handling, options filtering,
and invalid input handling.
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [ ] I have tested my changes according to the test plan:
- [ ] Run `poetry run pytest backend/copilot/tools/ask_question_test.py`
— all tests pass
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Import get_subscription_price_id in v1.py
- get_subscription_status now calls stripe.Price.retrieve for PRO/BUSINESS
tiers to return actual unit_amount instead of hardcoded zeros
- UI will now show correct monthly costs when LD price IDs are configured
- Fix Button import from __legacy__ to design system in SubscriptionTierSection
- Update subscription status tests to mock the new Stripe price lookup
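The price lookup can be sketched as follows; the retrieval is injected so the example runs without Stripe, and in the real code it would be `stripe.Price.retrieve`:

```python
# Sketch: return the price's actual unit_amount (in cents) instead of a
# hardcoded zero, falling back to 0 when no price id is configured in
# LaunchDarkly. Names are illustrative.
def monthly_cost_cents(retrieve_price, price_id) -> int:
    if not price_id:
        return 0
    price = retrieve_price(price_id)  # stripe.Price.retrieve(price_id)
    return int(price.get("unit_amount") or 0)
```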
Drop the dual question/questions schema in favor of a single
`questions` array parameter. This removes ~175 lines of complexity
(the _execute_single path, duplicate params, precedence logic).
Restructured per backend code style rules:
- Top-down ordering: public _execute first, helpers below
- Early return with guard clauses, no deep nesting
- List comprehensions via walrus operator in _parse_questions
- Helpers extracted as module-level functions (not methods)
- Functions under 40 lines each
The frontend ClarificationQuestionsCard already renders arrays of
any length — no UI changes needed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add isinstance narrowing in test_execute_multiple_questions_ignores_single_params
to fix Pyright type-check CI failure (reportAttributeAccessIssue).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests that access `result.questions` without first narrowing the type
from `ToolResponseBase` to `ClarificationNeededResponse` cause Pyright
type-check failures. Added `assert isinstance(result,
ClarificationNeededResponse)` before accessing `.questions` in 4 tests.
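The narrowing pattern, with illustrative stand-ins for the response classes named above:

```python
from dataclasses import dataclass, field

# Stand-ins for the real response types; only the narrowing pattern matters.
@dataclass
class ToolResponseBase:
    message: str

@dataclass
class ClarificationNeededResponse(ToolResponseBase):
    questions: list = field(default_factory=list)

def questions_of(result: ToolResponseBase) -> list:
    # After the assert, Pyright narrows `result` to
    # ClarificationNeededResponse, so `.questions` no longer triggers
    # reportAttributeAccessIssue.
    assert isinstance(result, ClarificationNeededResponse)
    return result.questions
```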
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When navigating back to a cached session, appliedActionKeys was reset to empty
but messages were preserved. This caused previously applied actions to reappear
as unapplied in the UI, allowing them to be re-applied and creating duplicate
undo entries. Clearing messages unconditionally on navigation ensures the
displayed action buttons always reflect the actual applied state.
- Restore top-level `required: ["question"]` in schema for LLM tool-
calling compatibility; validation handles the questions-only path
- Fix keyword null bug: `item.get("keyword")` returning None now
correctly falls back to `question-{idx}` instead of producing "None"
- Filter empty-string options in _build_question (`str(o).strip()`)
to avoid artifacts like "Email, , Slack"
- Revert session type hint to `ChatSession` to match base class contract
- Add tests for null keyword and empty-string options filtering
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove top-level `required: ["question"]` from schema so the
`questions`-only calling convention is valid for schema-compliant LLMs
- Move logger assignment below all imports (PEP 8 / isort)
- Remove duplicated option filtering in `_execute_single`; let
`_build_question` own that responsibility
- Fix `session` type hint to `ChatSession | None` to match the guard
- Add test for `questions` as non-list type (falls back to single path)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix falsy option filtering: use `if o is not None` instead of `if o`
so valid values like "0" are preserved
- Improve multi-question `message` field: join all questions with ";"
instead of only using the first question's text
- Add logging warnings for skipped invalid items in multi-question path
instead of silently dropping them
- Simplify schema: use `"required": ["question"]` instead of empty
required + anyOf (more LLM-friendly)
- Add missing test cases: session=None, single-item questions array,
duplicate keywords, falsy option values
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The ask_question tool previously only accepted a single question per
invocation, forcing the LLM to cram multiple queries into one text box
or make multiple sequential tool calls. This adds a `questions` parameter
(list of question objects) so multiple input fields render at once.
Backward-compatible: the existing `question`/`options`/`keyword` params
still work. When `questions` (plural) is provided, they take precedence.
The frontend ClarificationQuestionsCard already supports rendering
multiple questions — no frontend changes needed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests for GET/POST /credits/subscription covering:
- GET returns current tier (PRO, FREE default when None)
- POST FREE skips Stripe when payment disabled
- POST PRO sets tier directly for beta users (payment disabled)
- POST paid tier rejects missing success_url/cancel_url with 422
- POST paid tier creates Stripe Checkout Session and returns URL
- POST FREE with payment enabled cancels active Stripe subscription
- Remove useCallback from changeTier (not needed per project guidelines)
- Block self-service tier changes for ENTERPRISE users (admin-managed)
- Preserve current tier on unrecognized Stripe price_id instead of
defaulting to FREE (prevents accidental downgrades during price migration)
Tests for:
- Unknown/mismatched Stripe price_id defaults to FREE (not early return)
- None from LaunchDarkly price flags defaults to FREE
- BUSINESS tier mapping
- StripeError during cancel_stripe_subscription is logged, not raised
When sync_subscription_from_stripe encounters an unrecognized price_id
(e.g. LD flags unconfigured or price changed), it no longer returns early
leaving the user on a stale tier. Instead it defaults to FREE and logs a
warning, keeping the DB state consistent with Stripe's subscription status.
Also guard against None pro_price/biz_price from LaunchDarkly before
comparison to avoid silent mismatches.
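The mapping logic for this round can be sketched as follows (a later round above revises the unrecognized-price case to preserve the current tier; flag-helper names here are illustrative):

```python
import logging
from enum import Enum

logger = logging.getLogger(__name__)

class Tier(str, Enum):  # stand-in for SubscriptionTier
    FREE = "FREE"
    PRO = "PRO"
    BUSINESS = "BUSINESS"

def tier_for_price(price_id, pro_price, biz_price) -> Tier:
    # Guard against None LD flag values before comparing, so an
    # unconfigured flag can never silently match.
    if pro_price is not None and price_id == pro_price:
        return Tier.PRO
    if biz_price is not None and price_id == biz_price:
        return Tier.BUSINESS
    # No early return on an unrecognized price: default to FREE with a
    # warning so the DB tracks Stripe's actual subscription status.
    logger.warning("Unrecognized Stripe price_id %s; defaulting to FREE", price_id)
    return Tier.FREE
```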
EditAgentTool and RunAgentTool call useCopilotChatActions(), which throws
when no provider is in the tree. Wrap the panel content with
CopilotChatActionsProvider wired to sendRawMessage so tool components
can send retry prompts without crashing.
The user message was saved to DB before the <user_context> prefix was added
to session.messages. Subsequent upsert_chat_session calls only append new
messages (slicing by existing_message_count), so the prefixed content was
never written to the DB. On page reload or --resume, the unprefixed version
was loaded, losing personalisation.
Fix: add update_message_content_by_sequence to db.py and call it after
injecting the prefix in both sdk/service.py and baseline/service.py.
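A toy model of the bug and the fix; both functions are simplified stand-ins for the real db.py helpers:

```python
# Toy model of the append-only persistence that caused the bug: upsert only
# copies messages past existing_message_count, so an in-place edit to an
# already-persisted message never reaches the DB.
def upsert_append_only(db_messages: list, session_messages: list) -> None:
    existing_message_count = len(db_messages)
    db_messages.extend(session_messages[existing_message_count:])

# The fix, sketched: an explicit update keyed by sequence number, called
# after the <user_context> prefix is injected into the in-memory message.
def update_message_content_by_sequence(db_messages: list, sequence: int,
                                       content: str) -> None:
    db_messages[sequence] = content
```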
Add customer.subscription.created to the sync handler so user tier is
upgraded immediately when the subscription is first created (not just on
subsequent updates/deletions).
The dev branch already creates the SubscriptionTier enum and subscriptionTier
column in 20260326200000_add_rate_limit_tier. Remove the duplicate DDL from our
migration and only add SUBSCRIPTION to CreditTransactionType, guarded with
IF NOT EXISTS.
Add cancel_stripe_subscription() which lists and cancels all active Stripe
subscriptions for the customer, preventing continued billing after downgrade.
Call it from update_subscription_tier() when tier == FREE and payment is
enabled. Add two unit tests covering active and empty subscription scenarios.
Add tests for the /logs/export endpoint (success, truncated, filters, auth) and
fix missing import of get_platform_cost_logs_for_export in platform_cost_test.py.
Add upfront 422 validation when upgrading to a paid tier without providing
redirect URLs. Also catch stripe.StripeError alongside ValueError to return
a proper 422 instead of a 500 on Stripe API errors.
Pressing Escape while drafting a message was silently discarding the
user's text. Guard the handler so it only closes the panel when focus is
outside an editable element.
Remove get_subscription_cost (it referenced the deleted SUBSCRIPTION_COST_PRO/BUSINESS flags).
Subscription pricing is now handled by Stripe. Add GRAPHITI_MEMORY flag from dev.
- Remove `# type: ignore[attr-defined]` suppressors from `set_auto_top_up`
and `set_subscription_tier` — pyright resolves `CachedFunction.cache_delete`
through the import boundary without the suppressor
- Add `max(0, ...)` guard to `get_subscription_cost` to prevent negative
LaunchDarkly flag values from yielding negative costs
- Change `SubscriptionTierRequest.tier` from `str` to
`Literal["FREE", "PRO", "BUSINESS"]` so Pydantic rejects ENTERPRISE and
any unknown tier with a 422 at the schema layer
- Move `SubscriptionTier` and feature-flag imports from local function scope
to module-level in v1.py (top-level imports policy)
- Fix `test_sync_subscription_from_stripe_active` mock to use a proper async
`side_effect` function instead of calling an `AsyncMock` inline
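The real check happens in Pydantic via the `Literal` annotation on `SubscriptionTierRequest.tier`; this stdlib-only sketch shows the equivalent schema-layer rejection:

```python
from typing import Literal, get_args

TierName = Literal["FREE", "PRO", "BUSINESS"]

def validate_tier(value: str) -> str:
    # Pydantic rejects anything outside the Literal with a 422 at the
    # schema layer; here we raise ValueError for the same inputs.
    if value not in get_args(TierName):
        raise ValueError(f"unknown tier: {value}")
    return value
```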
POST /credits/subscription now returns {url} when Stripe checkout is needed.
Redirect user to Stripe on non-empty URL, refresh tier on empty URL (beta/FREE).
Remove credit-based tier validation; Stripe handles payment gating.
- Restore CreditTransactionType to original enum without SUBSCRIPTION
- Restore input/ctx fields in ValidationError schema
These changes were accidentally included from workspace drift; they are
not part of this PR and should come from their own respective PRs.
- Replace type: ignore suppressor with monkeypatch.setattr in AIConditionBlock test
- Replace bare sentry_sdk module with typed API imports in metrics/service/manager
- Fix CostLogRow field order: cache_read/creation_tokens after model
- Move /logs/export endpoint to correct position in paths (before analytics)
- Add model, block_name, tracking_type params to export endpoint schema
- Add PlatformCostExportResponse in correct schema position
- Add SUBSCRIPTION to CreditTransactionType enum
- Remove input/ctx from ValidationError schema
- Add model/block/type filter UI inputs and wire to hook/URL
- Make AnthropicIntegration and LaunchDarklyIntegration optional imports in metrics.py
- Add export CSV button wired to handleExport in LogsTable