The cancel endpoint runs in the AgentServer process while the asyncio
auto-approve task lives in the CoPilotExecutor process — separate memory.
The in-process dict cancel from the previous commit was a no-op across
processes.
- cancel_auto_approve now SETs a Redis key with TTL as the primary cancel
signal, plus best-effort in-process task.cancel() for single-worker.
- _run_auto_approve checks the Redis key before firing. If set, skips.
- Tests stub get_redis_async with a fake to avoid real Redis connections.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Blocker fix: the server-side auto-approve timer fired even when the user
was editing steps via Modify, potentially building an agent against a plan
the user had explicitly chosen to change.
- backend: change _auto_approve_tasks set → _pending_auto_approvals dict
keyed by session_id. Add cancel_auto_approve(session_id) that looks up
and cancels the pending asyncio task.
- backend: new POST /sessions/{id}/cancel-auto-approve endpoint in
chat/routes.py, following the existing cancel_session_task pattern.
- frontend: handleModify() now fires postV2CancelAutoApproveTask
(generated hook) as a best-effort cancel before entering edit mode.
- helpers.tsx: import DecompositionStepModel from generated API types
instead of hand-rolling the interface. TaskDecompositionOutput stays
hand-rolled (runtime shape differs from generated type for created_at).
- Add session_id to TaskDecompositionOutput so the cancel call has it.
- Default step.status to "pending" where the generated type is optional.
- 2 new tests: cancel_auto_approve cancels pending task + returns false
for unknown session.
- Regenerate openapi.json with the new endpoint.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The created_at field was added to TaskDecompositionResponse a few commits
back but openapi.json was never regenerated, so the check-api-types CI
job (which re-exports the schema and asserts no diff) was failing.
Re-exported via poetry run export-api-schema and formatted with prettier.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pre-existing formatting issue inherited from the dev merge — black wants
one blank line between TestUsdToMicrodollars and TestMaskEmail, not two.
This is unrelated to the decomposition feature but blocks CI lint.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
If the user reopened the tab between 60s and 90s after a decomposition
was created, the lazy initializer for ``secondsLeft`` would return 0
(server-stamped deadline already elapsed). The auto-approve useEffect
fires whenever ``secondsLeft === 0``, so it would silently send the
"Approved" message on mount with no user interaction — even if the user
came back specifically to click Modify.
Track in a ref whether the lazy init returned 0 because the deadline
had already passed (vs. 0 because the timer counted down from a
positive value), and skip the auto-approve in that case. The server's
own fallback timer (running 30s longer than the client) handles the
"user never returns" path, so the client doesn't need to silently fire
on mount. The user can still click Approve or Modify manually; the
server will inject its own approval at 90s if neither happens.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The auto-approve task was firing a duplicate "Approved" message after the
agent had already been built manually. The predicate compared
ChatMessage.sequence against a baseline, but _save_session_to_db assigns
sequences in the DB without writing them back to the in-memory message
objects, and cache_chat_session writes those (sequence=None) objects to
Redis. So the predicate's loaded-from-cache view had None sequences for
freshly-appended messages, treated them as 0, and missed the user's
"Approved" entirely — leaving the timer to fire after the build had
already completed and re-injecting "Approved" for a duplicate turn.
Fix: capture len(session.messages) at schedule time and check for any
user-role message at index >= baseline. Indices are monotonic and require
no DB-side sequence bookkeeping.
Adds a regression test that constructs a session with sequence=None on
the user message, asserting the predicate detects it.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
After the build plan box appears, the assistant continues streaming a
short summary text. Clicking Approve or Modify in that 1-2s window failed
because the chat session is locked to the in-flight turn — sending a new
user message gets rejected.
- ChatMessagesContainer now forwards isCurrentlyStreaming through
renderSegments → MessagePartRenderer → DecomposeGoalTool.
- DecomposeGoalTool computes actionsEnabled = showActions && !streaming
and uses it to (a) disable the Approve, Modify, and timer buttons and
(b) gate the auto-approve effect so the timer can hit 0 mid-stream
without firing — the effect re-runs and approves once streaming ends.
- The countdown ring keeps ticking during streaming so it stays in sync
with the server-side timer.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
## Why
The platform cost tracking system had several gaps that made the admin
dashboard less accurate and harder to reason about:
**Q: Do we have per-model granularity on the provider page?**
The `model` column was stored in `PlatformCostLog` but the SQL
aggregation grouped only by `(provider, tracking_type)`, so all models
for a given provider collapsed into one row. Now grouped by `(provider,
tracking_type, model)` — each model gets its own row.
**Q: Why does Anthropic show `per_run` for OrchestratorBlock?**
Bug: `OrchestratorBlock._call_llm()` was building `NodeExecutionStats`
with only `input_token_count` and `output_token_count` — it dropped
`resp.provider_cost` entirely. For OpenRouter calls this silently
discarded the `cost_usd`. For the SDK (autopilot) path,
`ResultMessage.total_cost_usd` was never read. When `provider_cost` is
None and token counts are 0 (e.g. SDK error path), `resolve_tracking`
falls through to `per_run`. Fixed by propagating all cost/cache fields.
**Q: Why can't we get `cost_usd` for Anthropic direct API calls?**
The Anthropic Messages API does not return a dollar amount — only token
counts. OpenRouter returns cost via response headers, so it uses
`cost_usd` directly. The Claude Agent SDK *does* compute
`total_cost_usd` internally, so SDK-mode OrchestratorBlock runs now get
`cost_usd` tracking. For direct Anthropic LLM blocks the estimate uses
per-token rates (see cache section below).
**Q: What about labeling by source (autopilot vs block)?**
Already tracked: `block_name` stores `copilot:SDK`, `copilot:Baseline`,
or the actual block name. Visible in the raw logs table. Not added to
the provider group-by (would explode row count); use the logs table
filter instead.
**Q: Is there double-counting between `tokens`, `per_run`, and
`cost_usd`?**
No. `resolve_tracking()` uses a strict preference hierarchy — exactly
one tracking type per execution: `cost_usd` > `tokens` > provider
heuristics > `per_run`. A single execution produces exactly one
`PlatformCostLog` row.
**Q: Should we track Anthropic prompt cache tokens (PR #12725)?**
Yes — PR #12725 adds `cache_control` markers to Anthropic API calls,
which causes the API to return `cache_read_input_tokens` and
`cache_creation_input_tokens` alongside regular `input_tokens`. These
have different billing rates:
- Cache reads: **10%** of base input rate (much cheaper)
- Cache writes: **125%** of base input rate (slightly more expensive,
one-time)
- Uncached input: **100%** of base rate
Without tracking them separately, a flat-rate estimate on
`total_input_tokens` would be wrong in both directions.
## What
- **Per-model provider table**: SQL now groups by `(provider,
tracking_type, model)`. `ProviderCostSummary` and the frontend
`ProviderTable` show a model column.
- **Cache token columns**: New `cacheReadTokens` and
`cacheCreationTokens` columns in `PlatformCostLog` with matching
migration.
- **LLM block cache tracking**: `LLMResponse` captures
`cache_read_input_tokens` / `cache_creation_input_tokens` from Anthropic
responses. `NodeExecutionStats` gains `cache_read_token_count` /
`cache_creation_token_count`. Both propagate to `PlatformCostEntry` and
the DB.
- **Copilot path**: `token_tracking.persist_and_record_usage` now writes
cache tokens as dedicated `PlatformCostEntry` fields (was
metadata-only).
- **OrchestratorBlock bug fix**: `_call_llm()` now includes
`resp.provider_cost`, `resp.cache_read_tokens`,
`resp.cache_creation_tokens` in the stats merge. SDK path captures
`ResultMessage.total_cost_usd` as `provider_cost`.
- **Accurate cost estimation**: `estimateCostForRow` uses
token-type-specific rates for `tokens` rows (uncached=100%, reads=10%,
writes=125% of configured base rate).
## How
`resolve_tracking` priority is unchanged. For Anthropic LLM blocks the
tracking type remains `tokens` (Anthropic API returns no dollar amount).
For OrchestratorBlock in SDK/autopilot mode it now correctly uses
`cost_usd` because the Claude Agent SDK computes and returns
`total_cost_usd`. For OpenRouter through OrchestratorBlock it now
correctly uses `cost_usd` (was silently dropped before).
## Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] `ProviderCostSummary` SQL updated
- [x] Cache token fields present in `PlatformCostEntry` and
`PlatformCostLogCreateInput`
- [x] Prisma client regenerated — all type checks pass
- [x] Frontend `helpers.test.ts` updated for new `rateKey` format
- [x] Pre-commit hooks pass (Black, Ruff, isort, tsc, Prisma generate)
Reopening a session was restarting the client countdown from a fresh 60s,
even though the server had been counting the whole time. Now the timer
reflects real elapsed time so the user sees the actual remaining seconds
(or 0, which auto-approves immediately).
- backend: stamp UTC created_at on TaskDecompositionResponse via a default
factory. The timestamp is set when the tool returns and persisted in the
message content JSON, so it survives DB round-trips.
- frontend: lazy-init secondsLeft from (auto_approve_seconds -
(Date.now() - created_at)), clamped to [0, total]. Older messages
without created_at fall back to a fresh full countdown (existing
behaviour).
- Test: assert created_at is stamped within the duration of _execute().
Note: openapi.json regen is skipped in this commit because the existing
REST server is in use; the frontend reads tool output as opaque JSON via
custom helpers, so the regen is not required for the feature to work.
Regen later for completeness.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
### Why / What / How
**Why:** The `ask_question` copilot tool previously only accepted a
single question per invocation. When the LLM needs to ask multiple
clarifying questions simultaneously, it either crams them into one text
field (requiring users to format numbered answers manually) or makes
multiple sequential tool calls (slow and disruptive UX).
**What:** Replace the single `question`/`options`/`keyword` parameters
with a `questions` array parameter so the LLM can ask multiple questions
in one tool call, each rendered as its own input box.
**How:** Simplified the tool to accept only `questions` (array of
question objects). Each item has `question` (required), `options`, and
`keyword`. The frontend `ClarificationQuestionsCard` already supports
rendering multiple questions — no frontend changes needed.
### Changes 🏗️
- `backend/copilot/tools/ask_question.py`: Replaced dual
question/questions schema with single `questions` array. Extracted
parsing into module-level `_parse_questions` and `_parse_one` helpers.
Follows backend code style: early returns, list comprehensions, top-down
ordering, functions under 40 lines.
- `backend/copilot/tools/ask_question_test.py`: Rewritten with 18
focused tests covering happy paths, keyword handling, options filtering,
and invalid input handling.
### Checklist 📋
#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [ ] I have tested my changes according to the test plan:
- [ ] Run `poetry run pytest backend/copilot/tools/ask_question_test.py`
— all tests pass
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The decompose_goal countdown was purely client-side: if the user closed the
tab before the timer ran out, the agent never got built. Add a server-side
timer that fires the same approval message even when no client is connected.
- backend/copilot/model.py: add append_message_if helper that appends a
message inside the session lock only if a predicate is satisfied. Used
by the auto-approve task to no-op when the user has already acted.
- backend/copilot/tools/decompose_goal.py: when the tool returns, schedule
a fire-and-forget asyncio task (same _background_tasks pattern as
agent_browser.py) that sleeps 90s, re-checks the session, and if no user
message has appeared since, appends "Approved. Please build the agent."
and enqueues a new copilot turn. Stays in process; restart-resilience
is a documented follow-up.
- backend/copilot/tools/models.py: expose auto_approve_seconds on
TaskDecompositionResponse so the frontend countdown is sourced from the
backend instead of a hard-coded constant.
- frontend DecomposeGoal.tsx: seed secondsLeft from output.auto_approve_seconds
with a 60s fallback for older sessions.
- Regenerate openapi.json with the new field.
- Tests: 9 new unit tests covering the predicate, the auto-approve flow
(idle / user-acted / errors swallowed) and _schedule_auto_approve.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Import get_subscription_price_id in v1.py
- get_subscription_status now calls stripe.Price.retrieve for PRO/BUSINESS
tiers to return actual unit_amount instead of hardcoded zeros
- UI will now show correct monthly costs when LD price IDs are configured
- Fix Button import from __legacy__ to design system in SubscriptionTierSection
- Update subscription status tests to mock the new Stripe price lookup
Drop the dual question/questions schema in favor of a single
`questions` array parameter. This removes ~175 lines of complexity
(the _execute_single path, duplicate params, precedence logic).
Restructured per backend code style rules:
- Top-down ordering: public _execute first, helpers below
- Early return with guard clauses, no deep nesting
- List comprehensions via walrus operator in _parse_questions
- Helpers extracted as module-level functions (not methods)
- Functions under 40 lines each
The frontend ClarificationQuestionsCard already renders arrays of
any length — no UI changes needed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add isinstance narrowing in test_execute_multiple_questions_ignores_single_params
to fix Pyright type-check CI failure (reportAttributeAccessIssue).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests that access `result.questions` without first narrowing the type
from `ToolResponseBase` to `ClarificationNeededResponse` cause Pyright
type-check failures. Added `assert isinstance(result,
ClarificationNeededResponse)` before accessing `.questions` in 4 tests.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove setInitialPrompt() from handleModify() — the inline editor is the
sole editing UX; pre-filling the chat input simultaneously creates a
conflicting interface where chat-input submission loses inline edits
- Add useEffect to reset isEditing when showActions goes false (new message
arrives while editing), preventing users from being stuck in edit mode with
no way to submit
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The ToolName Literal must stay in sync with TOOL_REGISTRY keys. Adds
'decompose_goal' to the platform tools section to fix CI test failures.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The API schema was missing DecompositionStepModel and TaskDecompositionResponse
after the merge. Regenerated with export-api-schema and formatted with prettier.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Merge upstream dev changes (Graphiti memory responses) alongside the
TaskDecompositionResponse added in this PR.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Prevent simultaneous pending + error state when output-error has null payload:
isPending is now false when isError is true
- Filter out steps with empty descriptions before building the approval
message, preventing malformed input from reaching the LLM
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add showActions to the auto-approve useEffect dependency array and
condition. This prevents the approval from firing after isLastMessage
becomes false (e.g. when a new message arrives just as the timer
expires), closing the race condition flagged by Sentry.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add TaskDecompositionResponse to ToolResponseUnion for OpenAPI codegen
- Remove LLM-controllable require_approval param (hardcoded to True)
- Validate each step is a dict before calling .get()
- Validate step descriptions are non-empty
- Validate action values against allowlist, coerce unknown to DEFAULT_ACTION
- Align MAX_STEPS=8 with agent_generation_guide.md (was 10)
- Add DEFAULT_ACTION constant; use enum in schema
- Add model_validator to sync step_count with len(steps)
- Fix handleModify: pre-fill chat input via setInitialPrompt instead of sending a dangling message

- Add approvedRef guard on handleModify to prevent double-clicks
- Fix eslint-disable: rewrite auto-approve effect without dependency suppression
- Fix hardcoded light-mode colors (bg-white, border-slate-200, text-zinc-800) → semantic tokens
- Fix error card: render ToolErrorCard whenever isError=true, not only when output is present
- Fix hint text: only show approve hint when requires_approval=true
- Remove dead `action` prop from StepItem
- Add aria-label to all StepStatusIcon states
- Tighten parseOutput type guards (Array.isArray check, no false positives)
- Rename isOperating → isPending for clarity
- Add backend unit tests for DecomposeGoalTool (16 cases)
- Add frontend unit tests for helpers.tsx (20 cases)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace <input type="text"> with <textarea> for step descriptions
- Use ref callback to set height from scrollHeight on every render so
long descriptions wrap to multiple lines by default without interaction
- Bump countdown ring container from 20px to 24px and text from 9px to
11px for better legibility
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace static Approve/Modify buttons with a 99s countdown timer that
auto-approves when it expires
- Timer ring animates inline within "Starting in [N]s" text using SVG
strokeDasharray; hover on the text swaps it to "Start now" via Tailwind
named groups (group/label)
- Clicking Modify stops the timer, enters editable mode where steps can be
renamed, deleted, or inserted between existing steps
- In edit mode only Approve is shown; timer and Modify are hidden
- showActions gated on isLastMessage (server-derived) so the timer never
re-appears when returning to a session with prior messages
- Forward isLastMessage through ChatMessagesContainer → MessagePartRenderer
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When navigating back to a cached session, appliedActionKeys was reset to empty
but messages were preserved. This caused previously applied actions to reappear
as unapplied in the UI, allowing them to be re-applied and creating duplicate
undo entries. Clearing messages unconditionally on navigation ensures the
displayed action buttons always reflect the actual applied state.
- Restore top-level `required: ["question"]` in schema for LLM tool-
calling compatibility; validation handles the questions-only path
- Fix keyword null bug: `item.get("keyword")` returning None now
correctly falls back to `question-{idx}` instead of producing "None"
- Filter empty-string options in _build_question (`str(o).strip()`)
to avoid artifacts like "Email, , Slack"
- Revert session type hint to `ChatSession` to match base class contract
- Add tests for null keyword and empty-string options filtering
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove top-level `required: ["question"]` from schema so the
`questions`-only calling convention is valid for schema-compliant LLMs
- Move logger assignment below all imports (PEP 8 / isort)
- Remove duplicated option filtering in `_execute_single`; let
`_build_question` own that responsibility
- Fix `session` type hint to `ChatSession | None` to match the guard
- Add test for `questions` as non-list type (falls back to single path)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix falsy option filtering: use `if o is not None` instead of `if o`
so valid values like "0" are preserved
- Improve multi-question `message` field: join all questions with ";"
instead of only using the first question's text
- Add logging warnings for skipped invalid items in multi-question path
instead of silently dropping them
- Simplify schema: use `"required": ["question"]` instead of empty
required + anyOf (more LLM-friendly)
- Add missing test cases: session=None, single-item questions array,
duplicate keywords, falsy option values
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The ask_question tool previously only accepted a single question per
invocation, forcing the LLM to cram multiple queries into one text box
or make multiple sequential tool calls. This adds a `questions` parameter
(list of question objects) so multiple input fields render at once.
Backward-compatible: the existing `question`/`options`/`keyword` params
still work. When `questions` (plural) is provided, they take precedence.
The frontend ClarificationQuestionsCard already supports rendering
multiple questions — no frontend changes needed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests for GET/POST /credits/subscription covering:
- GET returns current tier (PRO, FREE default when None)
- POST FREE skips Stripe when payment disabled
- POST PRO sets tier directly for beta users (payment disabled)
- POST paid tier rejects missing success_url/cancel_url with 422
- POST paid tier creates Stripe Checkout Session and returns URL
- POST FREE with payment enabled cancels active Stripe subscription
- Remove useCallback from changeTier (not needed per project guidelines)
- Block self-service tier changes for ENTERPRISE users (admin-managed)
- Preserve current tier on unrecognized Stripe price_id instead of
defaulting to FREE (prevents accidental downgrades during price migration)
Tests for:
- Unknown/mismatched Stripe price_id defaults to FREE (not early return)
- None from LaunchDarkly price flags defaults to FREE
- BUSINESS tier mapping
- StripeError during cancel_stripe_subscription is logged, not raised
When sync_subscription_from_stripe encounters an unrecognized price_id
(e.g. LD flags unconfigured or price changed), it no longer returns early
leaving the user on a stale tier. Instead it defaults to FREE and logs a
warning, keeping the DB state consistent with Stripe's subscription status.
Also guard against None pro_price/biz_price from LaunchDarkly before
comparison to avoid silent mismatches.
EditAgentTool and RunAgentTool call useCopilotChatActions() which throws
if no provider is in the tree. Wrap the panel content with
CopilotChatActionsProvider wired to sendRawMessage so tool components
can send retry prompts without crashing.
The user message was saved to DB before the <user_context> prefix was added
to session.messages. Subsequent upsert_chat_session calls only append new
messages (slicing by existing_message_count), so the prefixed content was
never written to the DB. On page reload or --resume, the unprefixed version
was loaded, losing personalisation.
Fix: add update_message_content_by_sequence to db.py and call it after
injecting the prefix in both sdk/service.py and baseline/service.py.
Add customer.subscription.created to the sync handler so user tier is
upgraded immediately when the subscription is first created (not just on
subsequent updates/deletions).