The no-resume fallback in _build_query_message used raw msg_count (> 1) to
detect multi-message history and session.messages[:-1] for the compression
slice. After a turn-start drain appends pending messages, msg_count is inflated
and the fallback fires on what should be a fresh first turn, placing the current
user message into the history context and delivering a confusing split prompt to
the model.
Apply session_msg_ceiling to both branches:
- elif condition: effective_count > 1 instead of msg_count > 1
- compression slice: session.messages[:effective_count - 1] instead of [:-1]
With _pre_drain_msg_count=1 on a first turn with drained pending messages,
effective_count=1 so the fallback is correctly skipped and current_message
(which already contains both the original and pending text) is returned as-is.
Adds regression test covering the spurious-fallback scenario.
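A minimal sketch of the ceiling-aware fallback, with the resume branch omitted; `_compress_history` and the simplified signature are assumptions, not the shipped code:

```python
def _compress_history(messages) -> str:
    # Hypothetical stand-in for the real history-compression step.
    return "\n".join(str(m.get("content", "")) for m in messages)


def _build_query_message(
    session, current_message: str, session_msg_ceiling: int | None = None
) -> str:
    # Count only messages that existed before the turn-start drain, so drained
    # pending messages can never inflate the count and trigger the fallback.
    msg_count = len(session.messages)
    effective_count = (
        min(msg_count, session_msg_ceiling)
        if session_msg_ceiling is not None
        else msg_count
    )

    if effective_count > 1:
        # Compress only the history that predates the current user message,
        # bounded at the pre-drain count.
        history = session.messages[: effective_count - 1]
        return f"{_compress_history(history)}\n\n{current_message}"

    # Fresh first turn: current_message already carries any drained pending text.
    return current_message
```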
When use_resume=True and the transcript is stale, _build_query_message computes
a gap slice from session.messages[transcript_msg_count:-1]. Pending messages
drained at turn start are appended to session.messages AND concatenated into
current_message, so without the ceiling they appear in both gap_context and
current_message.
Capture _pre_drain_msg_count before drain_pending_messages() and pass it as
session_msg_ceiling to _build_query_message. The gap slice is now bounded at
the pre-drain count, preventing pending messages from leaking into the gap.
Adds two regression tests in query_builder_test.py.
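Caller-side, the fix is shaped roughly like this sketch; `drain_pending_messages`, `format_pending_as_user_message`, and `_build_query_message` are real names from this change, while the wrapper and the message-dict shape are assumptions:

```python
async def _prepare_turn(session, current_message: str, redis_client) -> str:
    # Record the count BEFORE draining, so _build_query_message can bound both
    # the stale-transcript gap slice and the no-resume compression slice.
    _pre_drain_msg_count = len(session.messages)

    for pm in await drain_pending_messages(redis_client, session.session_id):
        formatted = format_pending_as_user_message(pm)
        current_message = f"{current_message}\n\n{formatted}"
        session.messages.append({"role": "user", "content": formatted})

    return _build_query_message(
        session, current_message, session_msg_ceiling=_pre_drain_msg_count
    )
```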
Replace separate INCR + EXPIRE with a single Lua EVAL so the rate-limit
key can never be orphaned without a TTL. If the process died between the
two commands the key would persist indefinitely, permanently locking out
the user after hitting the 30-push limit.
Fixes sentry bug report on routes.py:1153.
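A sketch of the combined command, assuming redis-py's `eval`; the helper name and key layout are illustrative:

```python
import redis.asyncio as redis

# INCR and EXPIRE run inside one Lua script, so Redis applies them atomically:
# the counter key can no longer be created without a TTL even if the client
# process dies between the two steps.
_RATE_LIMIT_LUA = """
local current = redis.call('INCR', KEYS[1])
if current == 1 then
    redis.call('EXPIRE', KEYS[1], tonumber(ARGV[1]))
end
return current
"""


async def count_push(r: redis.Redis, user_id: str, window_seconds: int = 60) -> int:
    key = f"copilot:pending:rate:{user_id}"  # illustrative key layout
    return int(await r.eval(_RATE_LIMIT_LUA, 1, key, window_seconds))
```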
The token-budget check guards against over-spending but does not prevent
rapid-fire pushes from a client with a large budget. Add a Redis
INCR + EXPIRE fixed-window counter (30 calls per 60-second window per
user) to cap call frequency independently of token consumption.
Returns HTTP 429 with "Too many pending messages" when exceeded.
Fails open (Redis unavailable → allows request).
Adds test for the new 429 path.
Addresses autogpt-pr-reviewer "Should Fix: per-request rate limit".
Verify that content + context (url + content) + file_ids all appear in
the formatted output when all fields are present simultaneously.
Addresses autogpt-pr-reviewer 'format_pending_as_user_message never
tested with all fields simultaneously'.
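A sketch of such a test; the constructor shape is an assumption, and the `[Page URL: ...]` marker matches the exact-substring assertion used elsewhere in this stack:

```python
def test_format_includes_all_fields_simultaneously():
    pm = PendingMessage(
        content="also check the pricing page",
        context=PendingMessageContext(url="https://example.com/pricing", content="Pricing: $10/mo"),
        file_ids=["3f1d2c9a-0000-4000-8000-000000000001"],
    )
    formatted = format_pending_as_user_message(pm)
    assert "also check the pricing page" in formatted
    assert "[Page URL: https://example.com/pricing]" in formatted
    assert "3f1d2c9a-0000-4000-8000-000000000001" in formatted
```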
Extract `_resolve_workspace_files(user_id, file_ids)` helper from the
duplicated UUID-filter + workspace-DB-lookup logic in both
`stream_chat_post` and `queue_pending_message`.
Both endpoints now call the single helper; callers map the returned
`list[UserWorkspaceFile]` to IDs or file-description strings as before.
Also removes the redundant `if user_id:` guard from `stream_chat_post`'s
file-ID block — `Security(auth.get_user_id)` guarantees a non-empty string.
Addresses autogpt-pr-reviewer "Should Fix: Duplicated file-ID sanitization"
and coderabbitai nit on the if user_id guard.
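A sketch of the shared helper, assuming the UUID filter plus workspace lookup described above; the DB accessor is a placeholder and imports of the existing models are omitted:

```python
import uuid


async def _resolve_workspace_files(
    user_id: str, file_ids: list[str]
) -> list[UserWorkspaceFile]:
    """Drop malformed IDs, then return only files present in the caller's workspace."""
    valid_ids: list[str] = []
    for fid in file_ids:
        try:
            uuid.UUID(fid)
        except ValueError:
            continue  # skip non-UUID input
        valid_ids.append(fid)
    if not valid_ids:
        return []
    # Placeholder for the existing workspace-DB lookup that both endpoints used.
    return await workspace_db.get_files_by_ids(user_id=user_id, file_ids=valid_ids)
```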
- Replace broad `except Exception` with `except (json.JSONDecodeError,
ValidationError, TypeError, ValueError)` in drain_pending_messages so
unexpected non-data errors propagate instead of being silently swallowed
- Introduce `PendingMessageContext` Pydantic model to replace the raw
`dict[str, str]` for the context field, making the url/content contract
explicit and enabling typed attribute access instead of .get() calls
- Update routes.py to construct PendingMessageContext from the validated
request dict before passing to PendingMessage
- Update tests to use PendingMessageContext directly
Addresses coderabbitai review comments.
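A minimal sketch of the typed model and its construction in routes.py; whether the fields are optional is an assumption:

```python
from pydantic import BaseModel


class PendingMessageContext(BaseModel):
    """Typed replacement for the raw dict[str, str] context payload."""

    url: str | None = None
    content: str | None = None


# routes.py: build the typed context from the already-validated request dict.
context = PendingMessageContext(**request.context) if request.context else None
```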
Replace broad `url in content` assertion with exact `[Page URL: url]`
substring check so CodeQL does not flag it as Incomplete URL Substring
Sanitization.
- Use _pre_drain_msg_count for the transcript-load gate (the len > 1 check)
to avoid a spurious transcript load on a first turn with pending messages
- Use _pre_drain_msg_count for the Graphiti warm-context gate so the warm
context is not skipped when pending messages are drained on the first turn
- Add context.url/content length validators to QueuePendingMessageRequest
to prevent LLM context-window stuffing (2K url, 32K content caps)
- Rename underscore-prefixed active variables (_pm, _content, _pt)
to conventional names (pm, content, pt) per Python convention
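The length caps could be expressed roughly like this, assuming Pydantic v2; the request model name and the 16K message cap come from this stack, the validator wiring is illustrative:

```python
from pydantic import BaseModel, Field, field_validator


class QueuePendingMessageRequest(BaseModel):
    model_config = {"extra": "forbid"}

    message: str = Field(max_length=16_000)
    context: dict[str, str] | None = None
    file_ids: list[str] = Field(default_factory=list)

    @field_validator("context")
    @classmethod
    def _cap_context(cls, v: dict[str, str] | None) -> dict[str, str] | None:
        # Page context comes straight from the browser; cap it so a huge page
        # cannot stuff the LLM context window.
        if v is None:
            return v
        if len(v.get("url", "")) > 2_000:
            raise ValueError("context.url exceeds the 2K character cap")
        if len(v.get("content", "")) > 32_000:
            raise ValueError("context.content exceeds the 32K character cap")
        return v
```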
1. SDK pyright: the inner ``_fetch_transcript`` closure captured
``session`` which pyright couldn't narrow to non-None (the outer
scope casts it, but the narrowing doesn't propagate into the
nested async function). Added an explicit ``assert session is not
None`` at the top of the closure.
2. Lint: re-formatted ``platform_cost_test.py`` — some pre-existing
whitespace drift from an upstream merge was tripping Black on CI.
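The narrowing fix, as a fragment (the loader call is a placeholder):

```python
assert session is not None  # the outer scope has already established this

async def _fetch_transcript() -> str | None:
    # Pyright does not carry the outer narrowing into the closure, so the
    # assert is repeated here to keep session.* access type-safe.
    assert session is not None
    return await load_transcript(session.session_id)  # placeholder loader
```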
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. SDK retry tests failing with "Event loop is closed" — the
drain-at-start call in stream_chat_completion_sdk was reaching the
real ``drain_pending_messages`` (which hits Redis) instead of being
mocked. Added a ``drain_pending_messages`` stub returning ``[]`` to
the shared ``_make_sdk_patches`` helper so all retry-integration
tests skip the drain path.
2. API types check failing — the new
``POST /sessions/{id}/messages/pending`` endpoint wasn't reflected
in the frontend's ``openapi.json``. Regenerated via
``poetry run export-api-schema --output ../frontend/src/app/api/openapi.json``
and ``pnpm prettier --write``.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace maybe_append_user_message with direct session.messages.append
for pending drain in both baseline mid-loop and SDK drain-at-start:
pending messages are atomically popped from Redis and are never
stale-cache duplicates, so the dedup is wrong and causes
openai_messages/transcript to diverge from the DB record
- Add immediate upsert_chat_session after SDK drain-at-start so a
crash between drain and finally doesn't lose messages already removed
from Redis
- Capture _pre_drain_msg_count before the baseline drain-at-start:
use it for is_first_turn (prevents pending messages from flipping the
flag to False on an actual first turn) and for _load_prior_transcript
(prevents the stale-transcript check from firing on every turn that
drains pending messages, which would block transcript upload forever)
- Remove redundant if user_id: guards in queue_pending_message — user_id
is guaranteed non-empty by Security(auth.get_user_id); the guards made
the rate-limit check silently optional
Round 4 review nits:
- ``_PUSH_LUA`` block comment mentioned "returns 0 from our earlier
LLEN", which was a leftover from an earlier design that had a
separate LLEN check. The atomicity guarantee doesn't depend on it.
Reworded to describe Redis EVAL serialisation instead.
- ``clear_pending_messages`` docstring said "called at the end of a
turn" but the finally-block call sites were removed in round 2
when the atomic drain-at-start became the primary consumer. The
function is now only an operator/debug escape hatch. Docstring
updated to match.
No behavioural change.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Round 3 follow-up: the drain-at-start in ``stream_chat_completion_baseline``
persisted pending messages to ``session.messages`` but never called
``transcript_builder.append_user`` for them. A mid-turn transcript
upload would be missing the drained text, which could produce a
malformed assistant-after-assistant structure on the next turn.
The drain block runs BEFORE ``transcript_builder`` is instantiated
(which happens after prompt/transcript async setup), so we can't call
append_user in the drain block itself. Instead, we remember the
drained list and mirror it into the transcript right after the
single-message ``transcript_builder.append_user(content=message)``
call near the prompt-build site.
Also cleaned up the stray adjacent-string concatenation in the log
line (``"...turn start " "for session %s"`` → single string).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Critical: SDK path was double-injecting. The endpoint persisted the
message to ``session.messages`` AND the executor drained it from Redis
and concatenated into ``current_message`` — the LLM saw each queued
message twice (once via the compacted history / gap context that
``_build_query_message`` pulls from ``session.messages``, once via
the new query). Baseline avoided this via ``maybe_append_user_message``
dedup but SDK had no equivalent guard.
### Fix: Redis is the single source of truth
- Endpoint no longer persists to ``session.messages``. It only
pushes to Redis and returns.
- Baseline drain-at-start calls ``maybe_append_user_message`` (dedup
is a safety net, not the primary guard).
- SDK drain-at-start calls ``maybe_append_user_message`` too, so the
durable transcript records the queued messages. The concatenation
into ``current_message`` stays so the SDK CLI sees the content in
the first user message of the new turn.
### Baseline max-iterations silent-loss — Fixed
``tool_call_loop`` yields ``finished_naturally=False`` when
``iteration == max_iterations`` and then returns. Previously the drain
only skipped ``finished_naturally=True``, so messages drained on the
max-iterations final yield were appended to ``openai_messages`` and
silently lost (the loop was already exiting). Now the drain also
skips when ``loop_result.iterations >= _MAX_TOOL_ROUNDS``.
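After this fix the mid-loop drain guard is shaped roughly like the fragment below; ``finished_naturally``, ``iterations``, and ``_MAX_TOOL_ROUNDS`` are from the text above, the rest is assumed:

```python
turn_is_ending = (
    loop_result.finished_naturally
    or loop_result.iterations >= _MAX_TOOL_ROUNDS
)
if not turn_is_ending:
    # Safe to inject: the loop will make at least one more LLM call, so the
    # drained messages cannot be silently dropped at exit.
    for pm in await drain_pending_messages(redis_client, session_id):
        openai_messages.append(
            {"role": "user", "content": format_pending_as_user_message(pm)}
        )
```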
### API response cleanup
- ``QueuePendingMessageResponse``: dropped ``queued`` (always True) and
``detail`` (human-readable, clients shouldn't parse). Kept
``buffer_length``, ``max_buffer_length``, and ``turn_in_flight``.
### Tests
- Removed dead ``_FakePipeline`` class (the code switched to Lua EVAL
in round 1 so the pipeline fake was unused).
- Added ``test_drain_decodes_bytes_payloads`` so the ``bytes → str``
decode branch in ``drain_pending_messages`` is actually exercised
(real redis-py returns bytes when ``decode_responses=False``).
- Updated ``_FakeRedis.lists`` type hint to ``list[str | bytes]``.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Critical fix — the SDK mid-stream injection was structurally broken.
``ClaudeSDKClient.receive_response()`` explicitly returns after the
first ``ResultMessage``, so re-issuing ``client.query()`` and setting
``acc.stream_completed = False`` could never restart the iteration —
the next ``__anext__`` raised ``StopAsyncIteration`` and the injected
turn's response was never consumed. Replaced the broken mid-stream
path with a turn-start drain that works for both baseline and SDK.
### Changes
**Atomic push via Lua EVAL** (``pending_messages.py``)
- Replace the ``RPUSH`` + ``LTRIM`` + ``EXPIRE`` + ``LLEN`` pipeline
(which was ``transaction=False`` and racy against concurrent
``LPOP``) with a single Lua script so the push is atomic.
- Drop the unused ``enqueued_at`` field.
- Add 16k ``max_length`` cap on ``PendingMessage.content``.
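A sketch of what the atomic push can look like; the cap, TTL, and key layout follow this stack's description, but the script body here is illustrative rather than the shipped ``_PUSH_LUA``:

```python
# RPUSH, trim-to-cap, and TTL happen inside one EVAL, so a concurrent LPOP
# drain can never interleave between the push and the trim/expire.
_PUSH_LUA_SKETCH = """
redis.call('RPUSH', KEYS[1], ARGV[1])
redis.call('LTRIM', KEYS[1], -tonumber(ARGV[2]), -1)
redis.call('EXPIRE', KEYS[1], tonumber(ARGV[3]))
return redis.call('LLEN', KEYS[1])
"""

MAX_PENDING_MESSAGES = 10
PENDING_TTL_SECONDS = 3600  # matches the stream_ttl default


async def push_pending_message(r, session_id: str, payload_json: str) -> int:
    key = f"copilot:pending:{session_id}"
    return int(
        await r.eval(
            _PUSH_LUA_SKETCH, 1, key, payload_json, MAX_PENDING_MESSAGES, PENDING_TTL_SECONDS
        )
    )
```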
**Baseline path** (``baseline/service.py``)
- Drain at turn start (atomic ``LPOP``): any message queued while the
session was idle or between turns is picked up before the first
LLM call.
- Mid-loop drain now skips the final ``tool_call_loop`` yield
(``finished_naturally=True``) — draining there would append a user
message the loop is about to exit past, silently losing it.
- Inject via ``format_pending_as_user_message`` so file IDs + context
are preserved in both ``openai_messages`` and the persisted session
transcript (previously the DB copy lost file/context metadata).
- Remove the ``finally`` ``clear_pending_messages`` — atomic drain at
turn start means any late push belongs to the next turn; clearing
here would racily clobber it.
**SDK path** (``sdk/service.py``)
- Remove the broken mid-stream injection block entirely.
- Drain at turn start (same atomic ``LPOP``) and merge the drained
messages into ``current_message`` before ``_build_query_message``,
so the SDK CLI sees them as part of the initial user message.
- Remove the ``finally`` ``clear_pending_messages``.
- Delete the unused ``_combine_pending_messages`` helper.
**Endpoint** (``api/features/chat/routes.py``)
- Enforce ``check_rate_limit`` / ``get_global_rate_limits`` — was
bypassing per-user daily/weekly token limits that ``/stream``
enforces.
- ``QueuePendingMessageRequest`` gets ``extra="forbid"`` and
``message: max_length=16_000``.
- Push-first, persist-second: if the Redis push fails we raise 5xx;
previously the session DB got an orphan user message with no
corresponding queued entry and a retry would duplicate it.
- Log a warning when sanitised file IDs drop unknown entries.
- Persisted message content now uses ``format_pending_as_user_message``
so the session copy matches what the model actually sees on drain.
- Response returns ``buffer_length``, ``max_buffer_length``, and
``turn_in_flight`` so the frontend can show accurate feedback about
whether the message will hit the current turn or the next one.
**Tests** (``pending_messages_test.py``)
- ``_FakeRedis.eval`` emulates the Lua push script so the existing
push/drain/cap tests keep working under the new atomic path.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When a user sends a follow-up message while a copilot turn is still
streaming, we now queue it into a per-session Redis buffer and let the
executor currently processing the turn drain it between tool-call
rounds — the model sees the new message before its next LLM call.
Previously such messages were blocked at the RabbitMQ/cluster-lock
layer and only processed after the current turn completed.
### New module
`backend/copilot/pending_messages.py`
- Redis list buffer keyed by ``copilot:pending:{session_id}``
- Pub/sub notify channel as a wake-up hint for future blocking-wait use
- Cap of ``MAX_PENDING_MESSAGES=10`` — trims oldest on overflow
- 1h TTL matches ``stream_ttl`` default
- Helpers: ``push_pending_message``, ``drain_pending_messages``,
``peek_pending_count``, ``clear_pending_messages``,
``format_pending_as_user_message``
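The drain side, as a sketch consistent with the helpers listed above; the signature and logging are assumptions, and the narrowed exception tuple mirrors a later round in this stack:

```python
import json
import logging

from pydantic import ValidationError

logger = logging.getLogger(__name__)


async def drain_pending_messages(r, session_id: str) -> list[PendingMessage]:
    """Atomically pop every queued message for this session, oldest first."""
    key = f"copilot:pending:{session_id}"
    raw_items = await r.lpop(key, MAX_PENDING_MESSAGES) or []
    drained: list[PendingMessage] = []
    for raw in raw_items:
        if isinstance(raw, bytes):  # redis-py returns bytes when decode_responses=False
            raw = raw.decode("utf-8")
        try:
            drained.append(PendingMessage(**json.loads(raw)))
        except (json.JSONDecodeError, ValidationError, TypeError, ValueError):
            logger.warning("Dropping malformed pending message for session %s", session_id)
    return drained
```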
### New endpoint
`POST /sessions/{session_id}/messages/pending`
- Returns 202 + current buffer length
- Persists the message to the DB so it's in the transcript immediately
- Sanitises file IDs against the caller's workspace
- Does NOT start a new turn (unlike ``stream``)
### Baseline path (simple — in-process injection)
`backend/copilot/baseline/service.py`
- Between iterations of ``tool_call_loop``, drain pending and append to
the shared ``openai_messages`` list so the loop picks them up on the
next LLM call
- Persist session via ``upsert_chat_session`` after injection
- Finally-block safety net clears the buffer on early exit
### SDK path (in-process injection via live client.query)
`backend/copilot/sdk/service.py`
- When the SDK loop detects ``acc.stream_completed``, before breaking,
drain pending and send them via the existing open ``client.query()``
as a new user message; reset ``stream_completed`` to ``False`` and
``continue`` the async-for loop so we keep consuming CLI messages
- Combines multiple drained messages into a single ``query()`` call via
``_combine_pending_messages`` to preserve ordering
- Finally-block safety net clears the buffer on early exit
- This works because the Claude Agent SDK's ``ClaudeSDKClient`` is a
long-lived connection: ``query()`` writes a new user message to the
CLI's stdin and the same ``receive_response()`` stream picks up the
next turn's events, so we keep session continuity without releasing
the cluster lock or restarting the subprocess
### Tests
`backend/copilot/pending_messages_test.py`
- FakeRedis + FakePipeline so tests don't need a live Redis
- Covers push/drain, ordering, buffer cap (MAX_PENDING_MESSAGES),
clear, publish hook, malformed-payload handling, and the format
helper (plain / with context / with file_ids)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
### Why / What / How
**Why:** The `ask_question` copilot tool previously only accepted a
single question per invocation. When the LLM needs to ask multiple
clarifying questions simultaneously, it either crams them into one text
field (requiring users to format numbered answers manually) or makes
multiple sequential tool calls (slow and disruptive UX).
**What:** Replace the single `question`/`options`/`keyword` parameters
with a `questions` array parameter so the LLM can ask multiple questions
in one tool call, each rendered as its own input box.
**How:** Simplified the tool to accept only `questions` (array of
question objects). Each item has `question` (required), `options`, and
`keyword`. The frontend `ClarificationQuestionsCard` already supports
rendering multiple questions — no frontend changes needed.
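A sketch of the parsing helpers following the structure described above; the exact question-object shape handed to the card is an assumption:

```python
import logging

logger = logging.getLogger(__name__)


def _parse_questions(raw: object) -> list[dict[str, object]]:
    """Return validated question dicts, skipping items without usable question text."""
    if not isinstance(raw, list):
        return []
    return [
        parsed
        for idx, item in enumerate(raw)
        if (parsed := _parse_one(item, idx)) is not None
    ]


def _parse_one(item: object, idx: int) -> dict[str, object] | None:
    if not isinstance(item, dict):
        logger.warning("Skipping non-dict question item at index %d", idx)
        return None
    question = str(item.get("question", "")).strip()
    if not question:
        logger.warning("Skipping question item with empty text at index %d", idx)
        return None
    # A missing or null keyword falls back to a positional default, never "None".
    keyword = item.get("keyword") or f"question-{idx}"
    # Keep falsy-but-valid options like "0"; drop None and empty strings.
    options = [
        str(o).strip()
        for o in item.get("options") or []
        if o is not None and str(o).strip()
    ]
    return {"question": question, "keyword": keyword, "options": options}
```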
### Changes 🏗️
- `backend/copilot/tools/ask_question.py`: Replaced dual
question/questions schema with single `questions` array. Extracted
parsing into module-level `_parse_questions` and `_parse_one` helpers.
Follows backend code style: early returns, list comprehensions, top-down
ordering, functions under 40 lines.
- `backend/copilot/tools/ask_question_test.py`: Rewritten with 18
focused tests covering happy paths, keyword handling, options filtering,
and invalid input handling.
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [ ] I have tested my changes according to the test plan:
- [ ] Run `poetry run pytest backend/copilot/tools/ask_question_test.py`
— all tests pass
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Import get_subscription_price_id in v1.py
- get_subscription_status now calls stripe.Price.retrieve for PRO/BUSINESS
tiers to return actual unit_amount instead of hardcoded zeros
- UI will now show correct monthly costs when LD price IDs are configured
- Fix Button import from __legacy__ to design system in SubscriptionTierSection
- Update subscription status tests to mock the new Stripe price lookup
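Roughly, the lookup; `stripe.Price.retrieve` and `unit_amount` are the real Stripe API, the wrapper is illustrative:

```python
import stripe


def _monthly_unit_amount(tier: str) -> int:
    """Return the live price in cents for paid tiers instead of a hardcoded zero."""
    if tier not in ("PRO", "BUSINESS"):
        return 0
    price_id = get_subscription_price_id(tier)  # LD-configured price ID
    if not price_id:
        return 0
    price = stripe.Price.retrieve(price_id)
    return price.unit_amount or 0
```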
Drop the dual question/questions schema in favor of a single
`questions` array parameter. This removes ~175 lines of complexity
(the _execute_single path, duplicate params, precedence logic).
Restructured per backend code style rules:
- Top-down ordering: public _execute first, helpers below
- Early return with guard clauses, no deep nesting
- List comprehensions via walrus operator in _parse_questions
- Helpers extracted as module-level functions (not methods)
- Functions under 40 lines each
The frontend ClarificationQuestionsCard already renders arrays of
any length — no UI changes needed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add isinstance narrowing in test_execute_multiple_questions_ignores_single_params
to fix Pyright type-check CI failure (reportAttributeAccessIssue).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests that access `result.questions` without first narrowing the type
from `ToolResponseBase` to `ClarificationNeededResponse` cause Pyright
type-check failures. Added `assert isinstance(result,
ClarificationNeededResponse)` before accessing `.questions` in 4 tests.
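For example (the call shape is assumed):

```python
result = await tool._execute(session=session, questions=payload["questions"])
assert isinstance(result, ClarificationNeededResponse)  # narrows from ToolResponseBase
assert len(result.questions) == 2  # attribute access is now type-safe for Pyright
```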
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When navigating back to a cached session, appliedActionKeys was reset to empty
but messages were preserved. This caused previously applied actions to reappear
as unapplied in the UI, allowing them to be re-applied and creating duplicate
undo entries. Clearing messages unconditionally on navigation ensures the
displayed action buttons always reflect the actual applied state.
- Restore top-level `required: ["question"]` in schema for LLM tool-
calling compatibility; validation handles the questions-only path
- Fix keyword null bug: `item.get("keyword")` returning None now
correctly falls back to `question-{idx}` instead of producing "None"
- Filter empty-string options in _build_question (`str(o).strip()`)
to avoid artifacts like "Email, , Slack"
- Revert session type hint to `ChatSession` to match base class contract
- Add tests for null keyword and empty-string options filtering
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove top-level `required: ["question"]` from schema so the
`questions`-only calling convention is valid for schema-compliant LLMs
- Move logger assignment below all imports (PEP 8 / isort)
- Remove duplicated option filtering in `_execute_single`; let
`_build_question` own that responsibility
- Fix `session` type hint to `ChatSession | None` to match the guard
- Add test for `questions` as non-list type (falls back to single path)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix falsy option filtering: use `if o is not None` instead of `if o`
so valid values like "0" are preserved
- Improve multi-question `message` field: join all questions with ";"
instead of only using the first question's text
- Add logging warnings for skipped invalid items in multi-question path
instead of silently dropping them
- Simplify schema: use `"required": ["question"]` instead of empty
required + anyOf (more LLM-friendly)
- Add missing test cases: session=None, single-item questions array,
duplicate keywords, falsy option values
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The ask_question tool previously only accepted a single question per
invocation, forcing the LLM to cram multiple queries into one text box
or make multiple sequential tool calls. This adds a `questions` parameter
(list of question objects) so multiple input fields render at once.
Backward-compatible: the existing `question`/`options`/`keyword` params
still work. When `questions` (plural) is provided, they take precedence.
The frontend ClarificationQuestionsCard already supports rendering
multiple questions — no frontend changes needed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests for GET/POST /credits/subscription covering:
- GET returns current tier (PRO, FREE default when None)
- POST FREE skips Stripe when payment disabled
- POST PRO sets tier directly for beta users (payment disabled)
- POST paid tier rejects missing success_url/cancel_url with 422
- POST paid tier creates Stripe Checkout Session and returns URL
- POST FREE with payment enabled cancels active Stripe subscription
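One of these, sketched; fixture names and the response shape are assumptions, the route path is from the list above:

```python
def test_get_subscription_returns_current_tier(client, pro_user_auth_headers):
    resp = client.get("/credits/subscription", headers=pro_user_auth_headers)
    assert resp.status_code == 200
    assert resp.json()["tier"] == "PRO"
```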