Commit Graph

8372 Commits

anvyle
b2dab8afad fix(copilot): use Redis flag for cross-process auto-approve cancellation
The cancel endpoint runs in the AgentServer process while the asyncio
auto-approve task lives in the CoPilotExecutor process — separate memory.
The in-process dict cancel from the previous commit was a no-op across
processes.

- cancel_auto_approve now SETs a Redis key with TTL as the primary cancel
  signal, plus best-effort in-process task.cancel() for single-worker.
- _run_auto_approve checks the Redis key before firing. If set, skips.
- Tests stub get_redis_async with a fake to avoid real Redis connections.
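The cross-process signal can be sketched as follows. This is a minimal, self-contained sketch: a plain dict stands in for the shared Redis instance, and the key shape and delay are illustrative, not the commit's actual values.

```python
import asyncio

# A plain dict stands in for the shared Redis instance; a real
# implementation would SET the key with a TTL so stale flags expire.
_fake_redis: dict[str, str] = {}

def cancel_key(session_id: str) -> str:
    # Hypothetical key shape, for illustration only
    return f"copilot:auto-approve-cancelled:{session_id}"

async def cancel_auto_approve(session_id: str) -> None:
    # Primary cancel signal: visible to every worker process
    _fake_redis[cancel_key(session_id)] = "1"

async def _run_auto_approve(session_id: str, delay: float) -> bool:
    await asyncio.sleep(delay)
    if cancel_key(session_id) in _fake_redis:
        return False  # user clicked Modify in another process: skip
    return True  # no flag set: fire the auto-approval

async def demo() -> tuple[bool, bool]:
    fired_a = await _run_auto_approve("s1", 0)  # no flag: fires
    await cancel_auto_approve("s2")
    fired_b = await _run_auto_approve("s2", 0)  # flag set: skips
    return fired_a, fired_b
```

Because the flag lives in shared storage rather than process memory, it works regardless of which worker handles the cancel request.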

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 20:34:58 +02:00
anvyle
7b60e45604 feat(copilot): cancel server auto-approve when user clicks Modify + use generated types
Blocker fix: the server-side auto-approve timer fired even when the user
was editing steps via Modify, potentially building an agent against a plan
the user had explicitly chosen to change.

- backend: change _auto_approve_tasks set → _pending_auto_approvals dict
  keyed by session_id. Add cancel_auto_approve(session_id) that looks up
  and cancels the pending asyncio task.
- backend: new POST /sessions/{id}/cancel-auto-approve endpoint in
  chat/routes.py, following the existing cancel_session_task pattern.
- frontend: handleModify() now fires postV2CancelAutoApproveTask
  (generated hook) as a best-effort cancel before entering edit mode.
- helpers.tsx: import DecompositionStepModel from generated API types
  instead of hand-rolling the interface. TaskDecompositionOutput stays
  hand-rolled (runtime shape differs from generated type for created_at).
- Add session_id to TaskDecompositionOutput so the cancel call has it.
- Default step.status to "pending" where the generated type is optional.
- 2 new tests: cancel_auto_approve cancels pending task + returns false
  for unknown session.
- Regenerate openapi.json with the new endpoint.
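The dict-keyed cancellation above can be sketched as follows; a minimal sketch in which the session ids are illustrative and the 60s sleep is a stand-in for the real auto-approve delay.

```python
import asyncio

# Pending auto-approve tasks keyed by session_id, as in the commit
_pending_auto_approvals: dict[str, asyncio.Task] = {}

def cancel_auto_approve(session_id: str) -> bool:
    """Cancel the pending auto-approve task for a session, if any."""
    task = _pending_auto_approvals.pop(session_id, None)
    if task is None:
        return False  # unknown session: nothing to cancel
    task.cancel()
    return True

async def demo() -> tuple[bool, bool]:
    async def auto_approve() -> None:
        await asyncio.sleep(60)

    _pending_auto_approvals["sess-1"] = asyncio.create_task(auto_approve())
    cancelled = cancel_auto_approve("sess-1")  # True: task was pending
    unknown = cancel_auto_approve("no-such")   # False: nothing keyed
    await asyncio.sleep(0)  # let the cancellation propagate
    return cancelled, unknown
```

The bool return maps directly onto the two new tests: a pending task is cancelled, and an unknown session returns false.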

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 20:05:28 +02:00
anvyle
8f5b9fa791 fix(copilot): align server auto-approve timer with client at 60s
Remove the 30s grace period — both client and server now fire at 60s.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-13 19:47:53 +02:00
anvyle
ca7dc221df chore(frontend): regenerate openapi.json with TaskDecompositionResponse.created_at
The created_at field was added to TaskDecompositionResponse a few commits
back but openapi.json was never regenerated, so the check-api-types CI
job (which re-exports the schema and asserts no diff) was failing.
Re-exported via `poetry run export-api-schema` and formatted with prettier.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-10 21:57:33 +02:00
anvyle
98470c27e1 chore(backend): black-format platform_cost_test.py
Pre-existing formatting issue inherited from the dev merge — black wants
one blank line between TestUsdToMicrodollars and TestMaskEmail, not two.
This is unrelated to the decomposition feature but blocks CI lint.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-10 21:47:16 +02:00
anvyle
2760cb076f Merge remote-tracking branch 'origin/dev' into feat/task-decomposition-copilot 2026-04-10 21:46:51 +02:00
anvyle
fdfd53b45e fix(copilot): don't auto-approve decomposition on mount when deadline already passed
If the user reopened the tab between 60s and 90s after a decomposition
was created, the lazy initializer for ``secondsLeft`` would return 0
(server-stamped deadline already elapsed). The auto-approve useEffect
fires whenever ``secondsLeft === 0``, so it would silently send the
"Approved" message on mount with no user interaction — even if the user
came back specifically to click Modify.

Track in a ref whether the lazy init returned 0 because the deadline
had already passed (vs. 0 because the timer counted down from a
positive value), and skip the auto-approve in that case. The server's
own fallback timer (running 30s longer than the client) handles the
"user never returns" path, so the client doesn't need to silently fire
on mount. The user can still click Approve or Modify manually; the
server will inject its own approval at 90s if neither happens.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-10 21:42:28 +02:00
anvyle
ed989801d2 fix(copilot): index-based predicate so manual approve cancels server timer
The auto-approve task was firing a duplicate "Approved" message after the
agent had already been built manually. The predicate compared
ChatMessage.sequence against a baseline, but _save_session_to_db assigns
sequences in the DB without writing them back to the in-memory message
objects, and cache_chat_session writes those (sequence=None) objects to
Redis. So the predicate's loaded-from-cache view had None sequences for
freshly-appended messages, treated them as 0, and missed the user's
"Approved" entirely — leaving the timer to fire after the build had
already completed and re-injecting "Approved" for a duplicate turn.

Fix: capture len(session.messages) at schedule time and check for any
user-role message at index >= baseline. Indices are monotonic and require
no DB-side sequence bookkeeping.

Adds a regression test that constructs a session with sequence=None on
the user message, asserting the predicate detects it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-10 20:54:18 +02:00
anvyle
f467ead855 fix(copilot): disable decompose_goal Approve/Modify while message is streaming
After the build plan box appears, the assistant continues streaming a
short summary text. Clicking Approve or Modify in that 1-2s window failed
because the chat session is locked to the in-flight turn — sending a new
user message gets rejected.

- ChatMessagesContainer now forwards isCurrentlyStreaming through
  renderSegments → MessagePartRenderer → DecomposeGoalTool.
- DecomposeGoalTool computes actionsEnabled = showActions && !streaming
  and uses it to (a) disable the Approve, Modify, and timer buttons and
  (b) gate the auto-approve effect so the timer can hit 0 mid-stream
  without firing — the effect re-runs and approves once streaming ends.
- The countdown ring keeps ticking during streaming so it stays in sync
  with the server-side timer.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-10 18:49:17 +02:00
Zamil Majdy
b319c26cab feat(platform/admin): per-model cost breakdown, cache token tracking, OrchestratorBlock cost fix (#12726)
## Why

The platform cost tracking system had several gaps that made the admin
dashboard less accurate and harder to reason about:

**Q: Do we have per-model granularity on the provider page?**
The `model` column was stored in `PlatformCostLog` but the SQL
aggregation grouped only by `(provider, tracking_type)`, so all models
for a given provider collapsed into one row. Now grouped by `(provider,
tracking_type, model)` — each model gets its own row.

**Q: Why does Anthropic show `per_run` for OrchestratorBlock?**
Bug: `OrchestratorBlock._call_llm()` was building `NodeExecutionStats`
with only `input_token_count` and `output_token_count` — it dropped
`resp.provider_cost` entirely. For OpenRouter calls this silently
discarded the `cost_usd`. For the SDK (autopilot) path,
`ResultMessage.total_cost_usd` was never read. When `provider_cost` is
None and token counts are 0 (e.g. SDK error path), `resolve_tracking`
falls through to `per_run`. Fixed by propagating all cost/cache fields.

**Q: Why can't we get `cost_usd` for Anthropic direct API calls?**
The Anthropic Messages API does not return a dollar amount — only token
counts. OpenRouter returns cost via response headers, so it uses
`cost_usd` directly. The Claude Agent SDK *does* compute
`total_cost_usd` internally, so SDK-mode OrchestratorBlock runs now get
`cost_usd` tracking. For direct Anthropic LLM blocks the estimate uses
per-token rates (see cache section below).

**Q: What about labeling by source (autopilot vs block)?**
Already tracked: `block_name` stores `copilot:SDK`, `copilot:Baseline`,
or the actual block name. Visible in the raw logs table. Not added to
the provider group-by (would explode row count); use the logs table
filter instead.

**Q: Is there double-counting between `tokens`, `per_run`, and
`cost_usd`?**
No. `resolve_tracking()` uses a strict preference hierarchy — exactly
one tracking type per execution: `cost_usd` > `tokens` > provider
heuristics > `per_run`. A single execution produces exactly one
`PlatformCostLog` row.

**Q: Should we track Anthropic prompt cache tokens (PR #12725)?**
Yes — PR #12725 adds `cache_control` markers to Anthropic API calls,
which causes the API to return `cache_read_input_tokens` and
`cache_creation_input_tokens` alongside regular `input_tokens`. These
have different billing rates:
- Cache reads: **10%** of base input rate (much cheaper)
- Cache writes: **125%** of base input rate (slightly more expensive,
one-time)
- Uncached input: **100%** of base rate

Without tracking them separately, a flat-rate estimate on
`total_input_tokens` would be wrong in both directions.

## What

- **Per-model provider table**: SQL now groups by `(provider,
tracking_type, model)`. `ProviderCostSummary` and the frontend
`ProviderTable` show a model column.
- **Cache token columns**: New `cacheReadTokens` and
`cacheCreationTokens` columns in `PlatformCostLog` with matching
migration.
- **LLM block cache tracking**: `LLMResponse` captures
`cache_read_input_tokens` / `cache_creation_input_tokens` from Anthropic
responses. `NodeExecutionStats` gains `cache_read_token_count` /
`cache_creation_token_count`. Both propagate to `PlatformCostEntry` and
the DB.
- **Copilot path**: `token_tracking.persist_and_record_usage` now writes
cache tokens as dedicated `PlatformCostEntry` fields (was
metadata-only).
- **OrchestratorBlock bug fix**: `_call_llm()` now includes
`resp.provider_cost`, `resp.cache_read_tokens`,
`resp.cache_creation_tokens` in the stats merge. SDK path captures
`ResultMessage.total_cost_usd` as `provider_cost`.
- **Accurate cost estimation**: `estimateCostForRow` uses
token-type-specific rates for `tokens` rows (uncached=100%, reads=10%,
writes=125% of configured base rate).

## How

`resolve_tracking` priority is unchanged. For Anthropic LLM blocks the
tracking type remains `tokens` (Anthropic API returns no dollar amount).
For OrchestratorBlock in SDK/autopilot mode it now correctly uses
`cost_usd` because the Claude Agent SDK computes and returns
`total_cost_usd`. For OpenRouter through OrchestratorBlock it now
correctly uses `cost_usd` (was silently dropped before).

## Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] `ProviderCostSummary` SQL updated
- [x] Cache token fields present in `PlatformCostEntry` and
`PlatformCostLogCreateInput`
  - [x] Prisma client regenerated — all type checks pass
  - [x] Frontend `helpers.test.ts` updated for new `rateKey` format
  - [x] Pre-commit hooks pass (Black, Ruff, isort, tsc, Prisma generate)
2026-04-10 23:14:43 +07:00
Zamil Majdy
85921f227a Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into preview/all-active-prs 2026-04-10 22:59:30 +07:00
anvyle
f7601d06ed fix(copilot): resume decompose_goal countdown from server timestamp
Reopening a session was restarting the client countdown from a fresh 60s,
even though the server had been counting the whole time. Now the timer
reflects real elapsed time so the user sees the actual remaining seconds
(or 0, which auto-approves immediately).

- backend: stamp UTC created_at on TaskDecompositionResponse via a default
  factory. The timestamp is set when the tool returns and persisted in the
  message content JSON, so it survives DB round-trips.
- frontend: lazy-init secondsLeft from (auto_approve_seconds -
  (Date.now() - created_at)), clamped to [0, total]. Older messages
  without created_at fall back to a fresh full countdown (existing
  behaviour).
- Test: assert created_at is stamped within the duration of _execute().
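The clamp arithmetic can be sketched in plain Python (the real code is a React lazy state initializer; names here are illustrative):

```python
def initial_seconds_left(
    auto_approve_seconds: int, created_at_ms: int, now_ms: int
) -> int:
    elapsed = (now_ms - created_at_ms) // 1000
    # Clamp to [0, total]: 0 auto-approves immediately, and the total
    # is a fresh full countdown.
    return max(0, min(auto_approve_seconds, auto_approve_seconds - elapsed))

assert initial_seconds_left(60, 0, 25_000) == 35  # reopened after 25s
assert initial_seconds_left(60, 0, 90_000) == 0   # deadline passed
assert initial_seconds_left(60, 0, 0) == 60       # fresh open
```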

Note: openapi.json regen is skipped in this commit because the existing
REST server is in use; the frontend reads tool output as opaque JSON via
custom helpers, so the regen is not required for the feature to work.
Regen later for completeness.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-10 17:44:50 +02:00
Zamil Majdy
5844b13fb1 feat(backend/copilot): support multiple questions in ask_question tool (#12732)
### Why / What / How

**Why:** The `ask_question` copilot tool previously only accepted a
single question per invocation. When the LLM needs to ask multiple
clarifying questions simultaneously, it either crams them into one text
field (requiring users to format numbered answers manually) or makes
multiple sequential tool calls (slow and disruptive UX).

**What:** Replace the single `question`/`options`/`keyword` parameters
with a `questions` array parameter so the LLM can ask multiple questions
in one tool call, each rendered as its own input box.

**How:** Simplified the tool to accept only `questions` (array of
question objects). Each item has `question` (required), `options`, and
`keyword`. The frontend `ClarificationQuestionsCard` already supports
rendering multiple questions — no frontend changes needed.

### Changes 🏗️

- `backend/copilot/tools/ask_question.py`: Replaced dual
question/questions schema with single `questions` array. Extracted
parsing into module-level `_parse_questions` and `_parse_one` helpers.
Follows backend code style: early returns, list comprehensions, top-down
ordering, functions under 40 lines.
- `backend/copilot/tools/ask_question_test.py`: Rewritten with 18
focused tests covering happy paths, keyword handling, options filtering,
and invalid input handling.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [ ] I have tested my changes according to the test plan:
- [ ] Run `poetry run pytest backend/copilot/tools/ask_question_test.py`
— all tests pass

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 21:54:53 +07:00
anvyle
fb86fcb67d feat(copilot): add server-side auto-approve fallback for decompose_goal
The decompose_goal countdown was purely client-side: if the user closed the
tab before the timer ran out, the agent never got built. Add a server-side
timer that fires the same approval message even when no client is connected.

- backend/copilot/model.py: add append_message_if helper that appends a
  message inside the session lock only if a predicate is satisfied. Used
  by the auto-approve task to no-op when the user has already acted.
- backend/copilot/tools/decompose_goal.py: when the tool returns, schedule
  a fire-and-forget asyncio task (same _background_tasks pattern as
  agent_browser.py) that sleeps 90s, re-checks the session, and if no user
  message has appeared since, appends "Approved. Please build the agent."
  and enqueues a new copilot turn. Stays in process; restart-resilience
  is a documented follow-up.
- backend/copilot/tools/models.py: expose auto_approve_seconds on
  TaskDecompositionResponse so the frontend countdown is sourced from the
  backend instead of a hard-coded constant.
- frontend DecomposeGoal.tsx: seed secondsLeft from output.auto_approve_seconds
  with a 60s fallback for older sessions.
- Regenerate openapi.json with the new field.
- Tests: 9 new unit tests covering the predicate, the auto-approve flow
  (idle / user-acted / errors swallowed) and _schedule_auto_approve.
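The `append_message_if` idea can be sketched as follows. `ChatSession` here is a minimal stand-in, not the real model, and the predicate is illustrative.

```python
import asyncio
from dataclasses import dataclass, field

@dataclass
class ChatSession:
    messages: list[str] = field(default_factory=list)
    _lock: asyncio.Lock = field(default_factory=asyncio.Lock)

    async def append_message_if(self, predicate, message: str) -> bool:
        # Check and append under the same lock so a user action that
        # lands first makes the auto-approve a no-op.
        async with self._lock:
            if not predicate(self):
                return False
            self.messages.append(message)
            return True

async def demo() -> list[str]:
    session = ChatSession()
    no_user_reply = lambda s: not any(
        m.startswith("user:") for m in s.messages
    )
    approval = "user: Approved. Please build the agent."
    await session.append_message_if(no_user_reply, approval)
    # A second fire is skipped because the predicate now fails.
    await session.append_message_if(no_user_reply, approval)
    return session.messages
```

Holding the lock across both the check and the append is what makes the timer safe against racing with a real user message.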

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-10 16:34:46 +02:00
Zamil Majdy
c014e1aa35 merge(preview): merge all active PRs into preview/all-active-prs from fresh dev 2026-04-10 08:40:23 +07:00
Zamil Majdy
e59f576622 Merge remote-tracking branch 'origin/spare/13' into preview/all-active-prs 2026-04-10 08:39:34 +07:00
Zamil Majdy
c99fa32ae3 Merge remote-tracking branch 'origin/spare/3' into preview/all-active-prs 2026-04-10 08:39:34 +07:00
Zamil Majdy
b71789da50 Merge remote-tracking branch 'origin/feat/subscription-tier-billing' into preview/all-active-prs 2026-04-10 08:39:34 +07:00
Zamil Majdy
5661326e7e fix(platform): fetch real Stripe prices in subscription status endpoint
- Import get_subscription_price_id in v1.py
- get_subscription_status now calls stripe.Price.retrieve for PRO/BUSINESS
  tiers to return actual unit_amount instead of hardcoded zeros
- UI will now show correct monthly costs when LD price IDs are configured
- Fix Button import from __legacy__ to design system in SubscriptionTierSection
- Update subscription status tests to mock the new Stripe price lookup
2026-04-10 08:37:40 +07:00
Zamil Majdy
df3fe926f2 style(backend/copilot): apply Black formatting to ask_question
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 23:56:42 +00:00
Zamil Majdy
505af7e673 refactor(backend/copilot): simplify ask_question to questions-only API
Drop the dual question/questions schema in favor of a single
`questions` array parameter. This removes ~175 lines of complexity
(the _execute_single path, duplicate params, precedence logic).

Restructured per backend code style rules:
- Top-down ordering: public _execute first, helpers below
- Early return with guard clauses, no deep nesting
- List comprehensions via walrus operator in _parse_questions
- Helpers extracted as module-level functions (not methods)
- Functions under 40 lines each

The frontend ClarificationQuestionsCard already renders arrays of
any length — no UI changes needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 23:54:11 +00:00
Zamil Majdy
d896a1f9fa fix(backend/copilot): add missing isinstance assertion in test
Add isinstance narrowing in test_execute_multiple_questions_ignores_single_params
to fix Pyright type-check CI failure (reportAttributeAccessIssue).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 23:48:02 +00:00
Zamil Majdy
6aa5a808e0 fix(backend/copilot): add isinstance assertions to fix type-check CI
Tests that access `result.questions` without first narrowing the type
from `ToolResponseBase` to `ClarificationNeededResponse` cause Pyright
type-check failures. Added `assert isinstance(result,
ClarificationNeededResponse)` before accessing `.questions` in 4 tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 23:40:08 +00:00
anvyle
94f065a7e0 fix(frontend/copilot): remove setInitialPrompt conflict and reset edit mode on new message
- Remove setInitialPrompt() from handleModify() — the inline editor is the
  sole editing UX; pre-filling the chat input simultaneously creates a
  conflicting interface where chat-input submission loses inline edits
- Add useEffect to reset isEditing when showActions goes false (new message
  arrives while editing), preventing users from being stuck in edit mode with
  no way to submit

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 23:15:16 +02:00
anvyle
8d5e8a9e3f fix(backend/copilot): add decompose_goal to ToolName Literal in permissions.py
The ToolName Literal must stay in sync with TOOL_REGISTRY keys. Adds
'decompose_goal' to the platform tools section to fix CI test failures.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 23:09:14 +02:00
anvyle
02b972cfc4 fix(backend/copilot): regenerate openapi.json with TaskDecompositionResponse schema
The API schema was missing DecompositionStepModel and TaskDecompositionResponse
after the merge. Regenerated with export-api-schema and formatted with prettier.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 22:51:18 +02:00
anvyle
31ce418d5e fix(backend/copilot): resolve merge conflict with dev branch in models.py
Merge upstream dev changes (Graphiti memory responses) alongside the
TaskDecompositionResponse added in this PR.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 22:44:02 +02:00
anvyle
70689ce326 fix(frontend/copilot): guard isPending flag on error and filter empty steps from approval
- Prevent simultaneous pending + error state when output-error has null payload:
  isPending is now false when isError is true
- Filter out steps with empty descriptions before building the approval
  message, preventing malformed input from reaching the LLM

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 22:40:39 +02:00
anvyle
9004a3ada1 fix(copilot): guard auto-approve against race condition when isLastMessage changes
Add showActions to the auto-approve useEffect dependency array and
condition. This prevents the approval from firing after isLastMessage
becomes false (e.g. when a new message arrives just as the timer
expires), closing the race condition flagged by Sentry.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 22:25:27 +02:00
anvyle
5e9cee524d fix(copilot): address PR review comments on decompose_goal tool
- Add TaskDecompositionResponse to ToolResponseUnion for OpenAPI codegen
- Remove LLM-controllable require_approval param (hardcoded to True)
- Validate each step is a dict before calling .get()
- Validate step descriptions are non-empty
- Validate action values against allowlist, coerce unknown to DEFAULT_ACTION
- Align MAX_STEPS=8 with agent_generation_guide.md (was 10)
- Add DEFAULT_ACTION constant; use enum in schema
- Add model_validator to sync step_count with len(steps)
- Fix handleModify: pre-fill chat input via setInitialPrompt instead of sending dangling message
- Add approvedRef guard on handleModify to prevent double-clicks
- Fix eslint-disable: rewrite auto-approve effect without dependency suppression
- Fix hardcoded light-mode colors (bg-white, border-slate-200, text-zinc-800) → semantic tokens
- Fix error card: render ToolErrorCard whenever isError=true, not only when output is present
- Fix hint text: only show approve hint when requires_approval=true
- Remove dead `action` prop from StepItem
- Add aria-label to all StepStatusIcon states
- Tighten parseOutput type guards (Array.isArray check, no false positives)
- Rename isOperating → isPending for clarity
- Add backend unit tests for DecomposeGoalTool (16 cases)
- Add frontend unit tests for helpers.tsx (20 cases)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 22:23:11 +02:00
anvyle
b9d47a8cf5 fix(copilot): auto-size editable step textareas on initial render and input
- Replace <input type="text"> with <textarea> for step descriptions
- Use ref callback to set height from scrollHeight on every render so
  long descriptions wrap to multiple lines by default without interaction
- Bump countdown ring container from 20px to 24px and text from 9px to
  11px for better legibility

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 22:10:51 +02:00
anvyle
5fa33111de feat(copilot): add auto-approve timer with editable steps to decompose_goal UI
- Replace static Approve/Modify buttons with a 99s countdown timer that
  auto-approves when it expires
- Timer ring animates inline within "Starting in [N]s" text using SVG
  strokeDasharray; hover on the text swaps it to "Start now" via Tailwind
  named groups (group/label)
- Clicking Modify stops the timer, enters editable mode where steps can be
  renamed, deleted, or inserted between existing steps
- In edit mode only Approve is shown; timer and Modify are hidden
- showActions gated on isLastMessage (server-derived) so the timer never
  re-appears when returning to a session with prior messages
- Forward isLastMessage through ChatMessagesContainer → MessagePartRenderer

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 21:50:43 +02:00
Zamil Majdy
18c88b4da0 fix(frontend/builder): always clear messages on flowID change to keep action state consistent
When navigating back to a cached session, appliedActionKeys was reset to empty
but messages were preserved. This caused previously applied actions to reappear
as unapplied in the UI, allowing them to be re-applied and creating duplicate
undo entries. Clearing messages unconditionally on navigation ensures the
displayed action buttons always reflect the actual applied state.
2026-04-10 02:03:56 +07:00
Zamil Majdy
3a5ce570e0 fix(backend/copilot): address PR review round 4
- Restore top-level `required: ["question"]` in schema for LLM tool-
  calling compatibility; validation handles the questions-only path
- Fix keyword null bug: `item.get("keyword")` returning None now
  correctly falls back to `question-{idx}` instead of producing "None"
- Filter empty-string options in _build_question (`str(o).strip()`)
  to avoid artifacts like "Email, , Slack"
- Revert session type hint to `ChatSession` to match base class contract
- Add tests for null keyword and empty-string options filtering

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 18:56:37 +00:00
Zamil Majdy
5a3739e54d fix(backend/copilot): address PR review round 2
- Remove top-level `required: ["question"]` from schema so the
  `questions`-only calling convention is valid for schema-compliant LLMs
- Move logger assignment below all imports (PEP 8 / isort)
- Remove duplicated option filtering in `_execute_single`; let
  `_build_question` own that responsibility
- Fix `session` type hint to `ChatSession | None` to match the guard
- Add test for `questions` as non-list type (falls back to single path)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 18:43:11 +00:00
Zamil Majdy
72bc8a92df fix(frontend/builder): guard msg.parts with nullish coalescing to prevent runtime error 2026-04-10 01:41:15 +07:00
Zamil Majdy
cc29cf5e20 fix(backend/copilot): address PR review round 1
- Fix falsy option filtering: use `if o is not None` instead of `if o`
  so valid values like "0" are preserved
- Improve multi-question `message` field: join all questions with ";"
  instead of only using the first question's text
- Add logging warnings for skipped invalid items in multi-question path
  instead of silently dropping them
- Simplify schema: use `"required": ["question"]` instead of empty
  required + anyOf (more LLM-friendly)
- Add missing test cases: session=None, single-item questions array,
  duplicate keywords, falsy option values
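The falsy-filter pitfall from the first bullet, in isolation. Values are illustrative and assume options can arrive as raw JSON values such as the number 0.

```python
options = [0, None, "Email", ""]

# `if o` treats every falsy value as missing, so the number 0 and the
# empty string are dropped along with None:
buggy = [o for o in options if o]
assert buggy == ["Email"]

# `if o is not None` filters only genuinely absent values:
fixed = [o for o in options if o is not None]
assert fixed == [0, "Email", ""]
```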

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 18:39:55 +00:00
Zamil Majdy
a0efbbba90 feat(backend/copilot): support multiple questions in ask_question tool
The ask_question tool previously only accepted a single question per
invocation, forcing the LLM to cram multiple queries into one text box
or make multiple sequential tool calls. This adds a `questions` parameter
(list of question objects) so multiple input fields render at once.

Backward-compatible: the existing `question`/`options`/`keyword` params
still work. When `questions` (plural) is provided, they take precedence.
The frontend ClarificationQuestionsCard already supports rendering
multiple questions — no frontend changes needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 18:21:35 +00:00
Zamil Majdy
8ed959433a fix(frontend/builder): clear stale messages in retrySession so new session starts clean 2026-04-10 00:56:31 +07:00
Zamil Majdy
98f3e09580 fix(frontend/builder): reset hasSentSeedMessageRef in retrySession so seed is sent to new session 2026-04-10 00:39:10 +07:00
Zamil Majdy
9ec44dd109 test(backend): add route-level tests for subscription API endpoints
Tests for GET/POST /credits/subscription covering:
- GET returns current tier (PRO, FREE default when None)
- POST FREE skips Stripe when payment disabled
- POST PRO sets tier directly for beta users (payment disabled)
- POST paid tier rejects missing success_url/cancel_url with 422
- POST paid tier creates Stripe Checkout Session and returns URL
- POST FREE with payment enabled cancels active Stripe subscription
2026-04-10 00:19:06 +07:00
Zamil Majdy
bfb82b6246 fix(platform): address reviewer feedback on subscription endpoint
- Remove useCallback from changeTier (not needed per project guidelines)
- Block self-service tier changes for ENTERPRISE users (admin-managed)
- Preserve current tier on unrecognized Stripe price_id instead of
  defaulting to FREE (prevents accidental downgrades during price migration)
2026-04-10 00:08:54 +07:00
Zamil Majdy
63210770ce test(backend): add tests for get_subscription_price_id to improve coverage 2026-04-09 23:54:02 +07:00
Zamil Majdy
f2b8f81bb1 test(backend/copilot): add unit tests for update_message_content_by_sequence
Cover success, not-found (returns False + warning), and DB-error (returns
False + error log) paths to push patch coverage above the 80% threshold.
2026-04-09 23:52:39 +07:00
Zamil Majdy
68b51ae2d3 test(backend): add coverage for sync_subscription_from_stripe edge cases
Tests for:
- Unknown/mismatched Stripe price_id defaults to FREE (not early return)
- None from LaunchDarkly price flags defaults to FREE
- BUSINESS tier mapping
- StripeError during cancel_stripe_subscription is logged, not raised
2026-04-09 23:52:16 +07:00
Zamil Majdy
63ff214563 fix(backend): default to FREE tier on unknown Stripe price ID in webhook sync
When sync_subscription_from_stripe encounters an unrecognized price_id
(e.g. LD flags unconfigured or price changed), it no longer returns early
leaving the user on a stale tier. Instead it defaults to FREE and logs a
warning, keeping the DB state consistent with Stripe's subscription status.

Also guard against None pro_price/biz_price from LaunchDarkly before
comparison to avoid silent mismatches.
2026-04-09 23:41:51 +07:00
Zamil Majdy
9498daca31 fix(frontend/builder): wrap panel in CopilotChatActionsProvider to prevent crash
EditAgentTool and RunAgentTool call useCopilotChatActions() which throws
if no provider is in the tree. Wrap the panel content with
CopilotChatActionsProvider wired to sendRawMessage so tool components
can send retry prompts without crashing.
2026-04-09 23:41:06 +07:00
Zamil Majdy
ce0cb1e035 fix(backend/copilot): persist user-context prefix to DB in both SDK and baseline paths
The user message was saved to DB before the <user_context> prefix was added
to session.messages. Subsequent upsert_chat_session calls only append new
messages (slicing by existing_message_count), so the prefixed content was
never written to the DB. On page reload or --resume, the unprefixed version
was loaded, losing personalisation.

Fix: add update_message_content_by_sequence to db.py and call it after
injecting the prefix in both sdk/service.py and baseline/service.py.
2026-04-09 23:40:14 +07:00
Zamil Majdy
0d89f7bb33 fix(backend): handle customer.subscription.created webhook event
Add customer.subscription.created to the sync handler so user tier is
upgraded immediately when the subscription is first created (not just on
subsequent updates/deletions).
2026-04-09 23:39:16 +07:00
Zamil Majdy
aef9298be6 test(platform/admin): add cache token and retry cost accumulation tests
Add unit tests for:
- Anthropic cache_read_tokens/cache_creation_tokens in llm_call response
- cache token accumulation in AIStructuredResponseGeneratorBlock stats
- provider_cost persistence on exhausted retry path
- usd_to_microdollars None-safe branch
- explicit start param covering _build_where false branch
- cache token columns in platform_cost integration test
2026-04-09 23:33:21 +07:00