Compare commits

...

107 Commits

Author SHA1 Message Date
Zamil Majdy
5cc72e7608 fix(backend): validate non-finite float in extract_openrouter_cost, restore UTILITIES comment
- Add math.isfinite() guard in extract_openrouter_cost so inf/nan header
  values are rejected instead of stored (Sentry finding)
- Add test cases for inf, -inf, and nan header values
- Restore the '# ------- UTILITIES ------- #' section separator in
  manager.py that was accidentally dropped during the drain-on-shutdown commit
2026-04-08 10:26:41 +07:00
Zamil Majdy
fa0650214d refactor(backend): extract shared _update_title_async to copilot/service.py
The function was duplicated in both baseline/service.py and sdk/service.py
with identical logic. Consolidate into the shared service module alongside
_generate_session_title which it wraps.
2026-04-08 10:14:29 +07:00
Zamil Majdy
3e016508d4 fix(backend): replace getattr/hasattr duck-typing with try/except in baseline cost extraction
Consistent with the pattern used in llm.extract_openrouter_cost():
use try/except (AttributeError, ValueError) instead of getattr(response, '_response', None)
+ hasattr(raw_resp, 'headers') so there is only one attribute-access pattern in the codebase.
2026-04-08 10:08:52 +07:00
Zamil Majdy
b80d7abda9 fix(backend): remove per-worker Prisma connect, route DB via DatabaseManagerAsyncClient
- Remove local import + db.connect()/disconnect() from CoPilotProcessor.on_executor_start
  DB calls already route through db_accessors (chat_db, user_db) which fall back to
  DatabaseManagerAsyncClient RPC when db.is_connected() is False
- Fix rate_limit._fetch_user_tier to use user_db().get_user_by_id() instead of
  PrismaUser.prisma() directly — avoids requiring Prisma connected on worker event loop
- Add subscription_tier field to User Pydantic model, mapped in User.from_db() so
  the RPC path returns the tier value without a direct Prisma connection
2026-04-08 10:05:18 +07:00
Zamil Majdy
0e310c788a fix(backend): fix TestGetPlatformCostDashboard mocks to match 3-query implementation
get_platform_cost_dashboard runs 3 concurrent queries (by_provider,
by_user, COUNT DISTINCT userId) but the unit tests only provided 2
side_effect values, causing StopAsyncIteration on the third call.
Updated all three test cases to supply a third mock return value and
corrected await_count assertion from 2 to 3.
2026-04-07 23:15:42 +07:00
Zamil Majdy
91af007c18 fix(backend): guard against non-finite cost_usd in persist_and_record_usage
float('inf') and float('nan') do not raise ValueError/TypeError so they
bypass the existing try/except. Passing them to usd_to_microdollars causes
OverflowError at round(inf * 1_000_000). Add math.isfinite(val) and val >= 0
check (matching the same pattern used in baseline/service.py and llm.py)
before assigning cost_float.
2026-04-07 22:56:20 +07:00
Zamil Majdy
e7ca81ed89 fix(backend): address coderabbitai nitpicks in cost tracking files
- token_tracking.py: convert logger.info %s calls to f-strings per style guide
- cost_tracking.py: simplify metadata=meta (was redundantly `meta or None`);
  move token_tracking imports to module level to remove # noqa: PLC0415 suppressors
- baseline/service.py: remove dead UnboundLocalError from except tuple since
  response is initialized to None before the try block
2026-04-07 22:54:52 +07:00
Zamil Majdy
5164fa878f fix(backend): fix race condition on _copilot_tasks concurrent iteration during drain
Add _pending_log_tasks_lock to token_tracking.py so that add/discard
operations on _pending_log_tasks are always lock-protected. Update
drain_pending_cost_logs in cost_tracking.py to acquire the copilot
tasks lock (not its own lock) when taking a snapshot of the copilot
set, preventing RuntimeError: Set changed size during iteration during
graceful shutdown when done callbacks fire concurrently.
2026-04-07 22:48:51 +07:00
Zamil Majdy
cf605ef5a3 fix(backend): fix race condition on _pending_log_tasks and uncapped total_users
- cost_tracking.py: add threading.Lock (_pending_log_tasks_lock) around all
  add/discard/iterate access to _pending_log_tasks; worker thread done callbacks
  and drain_pending_cost_logs() run concurrently across loops, causing
  RuntimeError: Set changed size during iteration without a lock

- platform_cost.py: add a separate COUNT(DISTINCT userId) query so total_users
  is accurate for >100 active users; previously it was silently capped at
  MAX_USER_ROWS=100 because it was derived from len(by_user_rows)
2026-04-07 22:25:25 +07:00
Zamil Majdy
e7bd05c6f1 fix(backend): filter drain tasks by current event loop to prevent cross-loop asyncio.wait() crash 2026-04-07 21:42:17 +07:00
Zamil Majdy
22fb3549e3 test(frontend): fix null-user dash assertion using getAllByText to handle multiple matches 2026-04-07 21:31:43 +07:00
Zamil Majdy
1c3fe1444e fix(backend): address 4 unresolved review threads on cost tracking
1. cost_tracking.py: replace shared _log_semaphore with per-loop dict
   (_log_semaphores + _get_log_semaphore()) — asyncio.Semaphore is not
   thread-safe and must not be shared across executor worker threads/loops

2. cost_tracking.py: only honor provider_cost_type when provider_cost is
   also present (not None); use tracking_amount (not raw stats.provider_cost)
   in usd_to_microdollars() to avoid unit mismatches

3. token_tracking.py: add semaphore to _schedule_cost_log (same pattern
   as cost_tracking.py) to bound concurrent DB inserts under load; fix
   forward-reference string in _pending_log_tasks type annotation

4. baseline/service.py: validate x-total-cost header with math.isfinite
   and max(0.0, cost) guard before accumulating — rejects nan/inf values
   that float() accepts but that should never reach the persistence path
2026-04-07 21:25:56 +07:00
Zamil Majdy
b89321a688 Merge remote-tracking branch 'origin/dev' into codex/platform-cost-tracking 2026-04-07 21:16:02 +07:00
Krzysztof Czerwinski
67bdef13e7 feat(platform): load copilot messages from newest first with cursor-based pagination (#12328)
Copilot chat sessions with long histories loaded all messages at once,
causing slow initial loads. This PR adds cursor-based pagination so only
the most recent messages load initially, with older messages fetched on
demand as the user scrolls up.

### Changes 🏗️

**Backend:**
- Cursor-based pagination on `GET /sessions/{session_id}` (`limit`,
`before_sequence` params)
- `user_id` relation filter on the paginated query — ownership check and
message fetch now run in parallel
- Backward boundary expansion to keep tool-call / assistant message
pairs intact at page edges
- Unit tests for paginated queries

**Frontend:**
- `useLoadMoreMessages` hook + `LoadMoreSentinel` (IntersectionObserver)
for infinite scroll upward
- `ScrollPreserver` to maintain scroll position when older messages are
prepended
- Session-keyed `Conversation` remount with one-frame opacity hide to
eliminate scroll flash on switch
- Scrollbar moved to the correct scroll container; loading spinner no
longer causes overflow

### Checklist 📋

- [x] Pagination: only recent messages load initially; older pages load
on scroll-up
- [x] Scroll position preserved on prepend; no flash on session switch
- [x] Tool-call boundary pairs stay intact across page edges
- [x] Stream reconnection still works on initial load

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
2026-04-07 12:43:47 +00:00
Ubbe
e67dd93ee8 refactor(frontend): remove stale feature flags and stabilize share execution (#12697)
## Why

Stale feature flags add noise to the codebase and make it harder to
understand which flags are actually gating live features. Four flags
were defined but never referenced anywhere in the frontend, and the
"Share Execution Results" flag has been stable long enough to remove its
gate.

## What

- Remove 4 unused flags from the `Flag` enum and `defaultFlags`:
`NEW_BLOCK_MENU`, `GRAPH_SEARCH`, `ENABLE_ENHANCED_OUTPUT_HANDLING`,
`AGENT_FAVORITING`
- Remove the `SHARE_EXECUTION_RESULTS` flag and its conditional — the
`ShareRunButton` now always renders

## How

- Deleted enum entries and default values in `use-get-flag.ts`
- Removed the `useGetFlag` call and conditional wrapper around
`<ShareRunButton />` in `SelectedRunActions.tsx`

## Changes

- `src/services/feature-flags/use-get-flag.ts` — removed 5 flags from
enum + defaults
- `src/app/(platform)/library/.../SelectedRunActions.tsx` — removed flag
import, condition; share button always renders

### Checklist

- [x] My PR is small and focused on one change
- [x] I've tested my changes locally
- [x] `pnpm format && pnpm lint` pass

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 19:28:40 +07:00
Otto
3140a60816 fix(frontend/builder): allow horizontal scroll for JSON output data (#12638)
Requested by @Abhi1992002 

## Why

JSON output data in the "Complete Output Data" dialog and node output
panel gets clipped — text overflows and is hidden with no way to scroll
right. Reported by Zamil in #frontend.

## What

The `ContentRenderer` wrapper divs used `overflow-hidden` which
prevented the `JSONRenderer`'s `overflow-x-auto` from working. Changed
both wrapper divs from `overflow-hidden` to `overflow-x-auto`.

```diff
- overflow-hidden [&>*]:rounded-xlarge [&>*]:!text-xs [&_pre]:whitespace-pre-wrap [&_pre]:break-words
+ overflow-x-auto [&>*]:rounded-xlarge [&>*]:!text-xs [&_pre]:whitespace-pre-wrap [&_pre]:break-words

- overflow-hidden [&>*]:rounded-xlarge [&>*]:!text-xs
+ overflow-x-auto [&>*]:rounded-xlarge [&>*]:!text-xs
```

## Scope
- 1 file changed (`ContentRenderer.tsx`)
- 2 lines: `overflow-hidden` → `overflow-x-auto`
- CSS only, no logic changes

Resolves SECRT-2206

Co-authored-by: Abhimanyu Yadav <122007096+Abhi1992002@users.noreply.github.com>
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
2026-04-07 19:11:09 +07:00
Nicholas Tindle
41c2ee9f83 feat(platform): add copilot artifact preview panel (#12629)
### Why / What / How

Copilot artifacts were not previewing reliably: PDFs downloaded instead
of rendering, Python code could still render like markdown, JSX/TSX
artifacts were brittle, HTML dashboards/charts could fail to execute,
and users had to manually open artifact panes after generation. The pane
also got stuck at maximized width when trying to drag it smaller.

This PR adds a dedicated copilot artifact panel and preview pipeline
across the backend/frontend boundary. It preserves artifact metadata
needed for classification, adds extension-first preview routing,
introduces dedicated preview/rendering paths for HTML/CSV/code/PDF/React
artifacts, auto-opens new or edited assistant artifacts, and fixes the
maximized-pane resize path so dragging exits maximized mode immediately.

### Changes 🏗️

- add artifact card and artifact panel UI in copilot, including
persisted panel state and resize/maximize/minimize behavior
- add shared artifact extraction/classification helpers and auto-open
behavior for new or edited assistant messages with artifacts
- add preview/rendering support for HTML, CSV, PDF, code, and React
artifact files
- fix code artifacts such as Python to render through the code renderer
with a dark code surface instead of markdown-style output
- improve JSX/TSX preview behavior with provider wrapping, fallback
export selection, and explicit runtime error surfaces
- allow script execution inside HTML previews so embedded chart
dashboards can render
- update workspace artifact/backend API handling and regenerate the
frontend OpenAPI client
- add regression coverage for artifact helpers, React preview runtime,
auto-open behavior, code rendering, and panel store behavior

- post-review hardening: correct download path for cross-origin URLs,
defer scroll restore until content mounts, gate auto-open behind the
ARTIFACTS flag, parse CSVs with RFC 4180-compliant quoted newlines + BOM
handling, distinguish 413 vs 409 on upload, normalize empty session_id,
and keep AnimatePresence mounted so the panel exit animation plays

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] `pnpm format`
  - [x] `pnpm lint`
  - [x] `pnpm types`
  - [x] `pnpm test:unit`

#### For configuration changes:

- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Adds a new Copilot artifact preview surface that executes
user/AI-generated HTML/React in sandboxed iframes and changes workspace
file upload/listing behavior, so regressions could affect file handling
and client security assumptions despite sandboxing safeguards.
> 
> **Overview**
> Adds an **Artifacts** feature (flagged by `Flag.ARTIFACTS`) to
Copilot: workspace file links/attachments now render as `ArtifactCard`s
and can open a new resizable/minimizable `ArtifactPanel` with history,
auto-open behavior, copy/download actions, and persisted panel width.
> 
> Introduces a richer artifact preview pipeline with type classification
and dedicated renderers for **HTML**, **CSV**, **PDF**, **code
(Shiki-highlighted)**, and **React/TSX** (transpiled and executed in a
sandboxed iframe), plus safer download filename handling and content
caching/scroll restore.
> 
> Extends the workspace backend API by adding `GET /workspace/files`
pagination, standardizing operation IDs in OpenAPI, attaching
`metadata.origin` on uploads/agent-created files, normalizing empty
`session_id`, improving upload error mapping (409 vs 413), and hardening
post-quota soft-delete error handling; updates and expands test coverage
accordingly.
> 
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
b732d10eca. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 11:24:22 +00:00
Zamil Majdy
630d6d4705 fix(backend): add semaphore to executor cost log tasks; fix type annotation
- Add `_log_semaphore = asyncio.Semaphore(50)` to cost_tracking.py to bound
  concurrent DB inserts (mirrors platform_cost.py's existing semaphore)
- Narrow `_extract_model_name` param type from `Any` to `str | dict | None`
- Add `test_get_dashboard_cache_hit` to verify TTL cache deduplicates DB calls
- Add `scope="col"` to all table `<th>` elements for screen-reader accessibility
- Add `(local time)` labels to date filter inputs to clarify timezone behaviour
2026-04-07 18:15:57 +07:00
Zamil Majdy
7c685c6677 fix(backend): update platform_cost_test to expect masked email in dashboard
_mask_email() is applied to by_user emails in get_platform_cost_dashboard().
Test now asserts 'a***@b.com' instead of 'a@b.com'.
2026-04-07 18:02:00 +07:00
Ubbe
ca748ee12a feat(frontend): refine AutoPilot onboarding — branding, auto-advance, soft cap, polish (#12686)
### Why / What / How

**Why:** The onboarding flow had inconsistent branding ("Autopilot" vs
"AutoPilot"), a heavy progress bar that dominated the header, an extra
click on the role screen, and no guidance on how many pain points to
select — leading to users selecting everything or nothing useful.

**What:** Copy & brand fixes, UX improvements (auto-advance, soft cap),
and visual polish (progress bar, checkmark badges, purple focus inputs).

**How:**
- Replaced all "Autopilot" with "AutoPilot" (capital P) across screens
1-3
- Removed the `?` tooltip on screen 1 (users will learn about AutoPilot
from the access email)
- Changed name label to conversational "What should I call you?"
- Screen 2: auto-advances 350ms after role selection (except "Other"
which still shows input + button)
- Screen 3: soft cap of 3 selections with green confirmation text and
shake animation on overflow attempt
- Thinned progress bar from ~10px to 3px (Linear/Notion style)
- Added purple checkmark badges on selected cards
- Updated Input atom focus state to purple ring

### Changes 🏗️

- **WelcomeStep**: "AutoPilot" branding, removed tooltip, conversational
label
- **RoleStep**: Updated subtitle, auto-advance on non-"Other" role
select, Continue button only for "Other"
- **PainPointsStep**: Soft cap of 3 with dynamic helper text and shake
animation
- **usePainPointsStep**: Added `atLimit`/`shaking` state, wrapped
`togglePainPoint` with cap logic
- **store.ts**: `togglePainPoint` returns early when at 3 and adding
- **ProgressBar**: 3px height, removed glow shadow
- **SelectableCard**: Added purple checkmark badge on selected state
- **Input atom**: Focus ring changed from zinc to purple
- **tailwind.config.ts**: Added `shake` keyframe and `animate-shake`
utility

### Checklist 📋

#### For code changes:
- [ ] I have clearly listed my changes in the PR description
- [ ] I have made a test plan
- [ ] I have tested my changes according to the test plan:
  - [ ] Navigate through full onboarding flow (screens 1→2→3→4)
  - [ ] Verify "AutoPilot" branding on all screens (no "Autopilot")
  - [ ] Verify screen 2 auto-advances after tapping a role (non-"Other")
  - [ ] Verify "Other" role still shows text input and Continue button
  - [ ] Verify Back button works correctly from screen 2 and 3
  - [ ] Select 3 pain points and verify green "3 selected" text
  - [ ] Attempt 4th selection and verify shake animation + swap message
  - [ ] Deselect one and verify can select a different one
  - [ ] Verify checkmark badges appear on selected cards
  - [ ] Verify progress bar is thin (3px) and subtle
  - [ ] Verify input focus state is purple across onboarding inputs
- [ ] Verify "Something else" + other text input still works on screen 3

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 17:58:36 +07:00
Zamil Majdy
bbdf13c7a8 test(backend): add missing cost-tracking tests for Exa research, Apollo people, and copilot baseline
- ExaCreateResearchBlock, ExaGetResearchBlock, ExaWaitForResearchBlock: verify
  merge_stats is called with provider_cost=cost_dollars.total when completed, and
  not called when costDollars is absent
- SearchPeopleBlock: verify provider_cost=len(people) with type='items'
- Copilot baseline: 4 tests for x-total-cost header extraction in
  _baseline_llm_caller — including accumulation across turns and extraction in
  the finally block when the stream raises
2026-04-07 17:44:15 +07:00
Zamil Majdy
e1ea4cf326 test(frontend): rewrite PlatformCostContent tests to mock Orval hooks
Component now uses React Query hooks (useGetV2GetPlatformCostDashboard,
useGetV2GetPlatformCostLogs) instead of server actions, so tests must
mock @/app/api/__generated__/endpoints/admin/admin rather than ../actions.

Adds 16 test cases covering loading state, empty/data renders, tabs,
filters, pagination, null email/user handling, and tracking type badges.
2026-04-07 17:34:15 +07:00
Zamil Majdy
db6b4444e0 fix(platform): address autogpt-reviewer should-fix items
- Remove dead _pending_log_tasks/schedule_cost_log/drain_pending_cost_logs
  from platform_cost.py (only cost_tracking.py and token_tracking.py have
  active task registries; drain comment updated to match)
- Replace vars(other) iteration in NodeExecutionStats.__iadd__ with
  type(other).model_fields to avoid any potential __pydantic_extra__ leakage
- Fix rate-override clear: onRateOverride(key, null) deletes the key so
  defaultRateFor() takes effect instead of pinning estimated cost to $0
- Type extract_openrouter_cost parameter as OpenAIChatCompletion
- Fix early-return guard in persist_and_record_usage: allow through when
  all token counts are 0 but cost_usd is provided (fully-cached responses)
- Add missing tests: LLM retry cost (only last attempt merged), zero-token
  copilot cost, Exa search + similar merge_stats coverage
2026-04-07 17:23:42 +07:00
Zamil Majdy
9b1175473b fix(backend/copilot): update stale comment in processor.py after cost log routing change
token_tracking.py now routes cost logs through DatabaseManagerAsyncClient
(platform_cost_db()), so the Prisma connect in on_executor_start() is for
copilot/db.py and rate_limit.py direct Prisma usage.
2026-04-07 17:11:56 +07:00
Zamil Majdy
752a238166 refactor(frontend): replace server actions with React Query hooks for cost dashboard
usePlatformCostContent.ts now calls useGetV2GetPlatformCostDashboard and
useGetV2GetPlatformCostLogs directly (with okData selector) so the browser
gets proper caching, deduplication, and background refetch.

actions.ts is retained as a plain helper module (no 'use server') because
the co-located test file imports from it; the functions are no longer called
by the hook.
2026-04-07 17:08:42 +07:00
Zamil Majdy
2a73d1baa9 fix(backend/copilot): route copilot cost logging through DatabaseManagerAsyncClient
The copilot executor's token_tracking.py was using schedule_cost_log()
which calls execute_raw_with_schema() directly on the Prisma singleton.
In the copilot_executor process, Prisma is not reliably connected due to
event-loop binding issues, causing ClientNotConnectedError on every turn.

Fix: route cost logging through platform_cost_db() -> DatabaseManagerAsyncClient
RPC (same approach already used by the block executor). Also fix
_copilot_block_name() to extract only the service tag from the log prefix
(e.g. "[SDK][session-id][T1]" -> "copilot:SDK") instead of the full suffix.

Update cost_tracking.py drain to drain token_tracking._pending_log_tasks,
and update token_tracking_test.py mocks to match new call site.
2026-04-07 16:58:41 +07:00
Zamil Majdy
254e6057f4 fix(backend/copilot): connect Prisma in copilot executor for cost log writes
schedule_cost_log() in token_tracking.py writes PlatformCostLog rows via
execute_raw_with_schema(), which requires an active Prisma connection.
Connect Prisma at on_executor_start() so cost tracking is not silently dropped
in the copilot executor process.
2026-04-07 16:54:00 +07:00
Zamil Majdy
a616e5a060 fix(backend): address PR review — email masking, semaphore, openrouter cost style
- Mask user emails in admin API responses (dashboard + logs) to reduce
  PII exposure in proxy/CDN logs; _mask_email() shows first 2 chars only
- Add _log_semaphore(50) in platform_cost.py to bound concurrent DB inserts
  and provide back-pressure under high load
- Refactor extract_openrouter_cost() to use try/except AttributeError
  instead of getattr/hasattr, and log a WARNING when _response is missing
  so SDK changes are detectable
- Add comment to usePlatformCostContent.ts explaining why server actions
  are used instead of React Query (server-side withRoleAccess constraint)
2026-04-07 16:38:07 +07:00
Zamil Majdy
c9461836c6 fix(backend): address all open review comments on platform cost tracking
- Normalize provider to lowercase at write time; drop LOWER() in filter so
  the (provider, createdAt) index is used without function overhead
- Drop COALESCE(trackingType, metadata->>'tracking_type') fallback — new rows
  always have trackingType set at write time
- Derive total_users from len(by_user_rows) instead of a separate
  COUNT(DISTINCT userId) query (saves one aggregation per dashboard load)
- Add 30-second TTLCache for dashboard endpoint (cachetools, maxsize=256)
- Add backpressure/bounds comment to _pending_log_tasks in platform_cost.py
- Convert f-string logger calls in token_tracking.py to lazy %s formatting
- Add 6 block-level tests for ExaCodeContextBlock and ExaContentsBlock cost
  paths: valid/invalid/zero cost_dollars strings and None cost_dollars
- Update existing tests to match provider-lowercasing and 2-query dashboard
2026-04-07 16:23:10 +07:00
Zamil Majdy
50a8df3d67 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into codex/platform-cost-tracking 2026-04-07 16:12:47 +07:00
Zamil Majdy
243b12778f dx: improve pr-test skill — inline screenshots, flow captions, and test evaluation (#12692)
## Changes

### 1. Inline image enforcement (Step 7)
- Added `CRITICAL` warning: never post a bare directory tree link
- Added post-comment verification block that greps for `![` tags and
exits 1 if none found — agents can't silently skip inline embedding

### 2. Structured screenshot captions (Step 6)
- `SCREENSHOT_EXPLANATIONS` now requires **Flow** (which scenario),
**Steps** (exact actions taken), **Evidence** (what this proves)
- Good/bad example included so agents know what format is expected
- A bare "shows the page" caption is explicitly rejected

### 3. Test completeness evaluation (Step 8) — new step
After posting screenshots, the agent must evaluate coverage against the
test plan and post a formal GitHub review:
- **`APPROVE`** — every scenario tested with screenshot + DB/API
evidence, no blockers
- **`REQUEST_CHANGES`** — lists exact gaps: untested scenarios, missing
evidence, confirmed bugs
- Per-scenario checklist (/) required in the review body
- Cannot auto-approve without ticking every item in the test plan

## Why

- Agents were posting `https://github.com/.../tree/test-screenshots/...`
instead of `![name](url)` inline
- Screenshot captions were too vague to be useful ("shows the page")
- No mechanism to catch incomplete test runs — agent could skip
scenarios and still post a passing report

## Checklist

- [x] `.claude/skills/pr-test/SKILL.md` updated
- [x] No production code changes — skill/dx only
- [x] Pre-commit hooks pass
2026-04-07 16:04:08 +07:00
Zamil Majdy
3f7a8dc44d fix(backend/copilot): use active_model instead of config.model for cost attribution
When fast mode is selected, the baseline copilot uses a different model
(active_model from _resolve_baseline_model) than the config default. Using
config.model for cost attribution would misattribute costs to the wrong model.
2026-04-07 15:54:36 +07:00
Zamil Majdy
1c15d6a6cc fix(frontend): update platform-costs tests for Skeleton loading state and removed duration_seconds
- Update loading state test to check for absent Skeleton elements (animate-pulse)
  rather than absent 'Loading...' text (which was removed in previous commit)
- Update helpers.test.ts to test sandbox_seconds instead of the removed duration_seconds
2026-04-07 15:52:38 +07:00
Zamil Majdy
a31be77408 fix(platform): address additional reviewer feedback on platform cost dashboard
- Extract _extract_model_name() helper in cost_tracking.py to replace nested isinstance checks
- Replace Lucide icons with Phosphor equivalents in admin/layout.tsx
- Replace Loading... text with Skeleton components in PlatformCostContent
- Switch Promise.all to Promise.allSettled in usePlatformCostContent for partial data resilience
- Fix hardcoded border-blue-600/text-blue-600 with design token border-primary/text-primary
- Remove dead duration_seconds case from helpers.ts and TrackingBadge (backend never emits it)
2026-04-07 15:46:39 +07:00
Zamil Majdy
1d45f2f18c fix(platform): fix baseline copilot OpenRouter cost extraction and credit_cost check
- Fix wrong attribute: baseline/service.py used getattr(response, 'response')
  but AsyncStream exposes the raw httpx response via '_response' (with
  underscore), matching the pattern in llm.py:extract_openrouter_cost().
  OpenRouter cost tracking in baseline copilot was silently failing.
- Fix falsy zero-cost guard: change `if credit_cost:` to `if credit_cost is
  not None:` so free-tier blocks (credit_cost=0) include the field in metadata.
2026-04-07 15:25:02 +07:00
Zamil Majdy
27e34e9514 fix(platform): drain pending cost logs on shutdown, remove dark: badges
- Wire drain_pending_cost_logs() into ExecutionManager.cleanup() so
  in-flight INSERT tasks are awaited before process exit during rolling
  deployments (uses the existing node_execution_loop; no-op if the loop
  was never started, e.g. in tests)
- Remove prohibited dark: Tailwind classes from TrackingBadge badges;
  light tokens (text-green-700, text-blue-700, …) now apply in all
  themes — design system handles dark mode via CSS variables
2026-04-07 15:07:41 +07:00
Zamil Majdy
16d696edcc fix(platform): address autogpt-reviewer blockers and should-fix items
- Fix LLM retry double-counting: track tokens per attempt but only merge
  provider_cost on the successful attempt, not across all retries
- Add drain_pending_cost_logs() to platform_cost.py; update cost_tracking
  to drain both executor and copilot task sets on shutdown
- Remove prohibited dark: Tailwind classes from PlatformCostContent error
  div, replace with Alert component (design system error variant)
- Add block-level cost tracking tests for: JinaEmbeddingBlock (with/without
  usage), UnrealTextToSpeechBlock (character count), GoogleMapsSearchBlock
  (place count), AddLeadToCampaignBlock (lead count)
- Add __iadd__ edge case tests: provider_cost_type first-write-to-None and
  None does not overwrite existing value
- Rename metadata key provider_cost_usd to provider_cost_raw (value unit
  varies by tracking type; only cost_usd uses USD)
- Add test verifying per_run providers have no provider_cost_raw in metadata
2026-04-07 15:05:44 +07:00
Zamil Majdy
f87bbd5966 fix(backend): route platform cost logging through DatabaseManagerAsyncClient
Fixes ClientNotConnectedError in the executor process by routing
log_platform_cost through the DatabaseManagerAsyncClient RPC proxy
instead of calling execute_raw_with_schema directly on the unconnected
module-level prisma instance.
2026-04-07 14:53:46 +07:00
Nicholas Tindle
b64d1ed9fa Merge branch 'dev' into codex/platform-cost-tracking 2026-04-06 14:31:13 -05:00
An Vy Le
43c81910ae fix(backend/copilot): skip AI blocks without model property in fix_ai_model_parameter (#12688)
### Why / What / How

**Why:** Some AI-category blocks do not expose a `"model"` input
property in their `inputSchema`. The `fix_ai_model_parameter` fixer was
unconditionally injecting a default model value (e.g. `"gpt-4o"`) into
any node whose block has category `"AI"`, regardless of whether that
block actually accepts a `model` input. This causes the agent JSON to
include an invalid field for those blocks.

**What:** Guard the model-injection logic with a check that `"model"`
exists in the block's `inputSchema.properties` before attempting to set
or validate the field. AI blocks that have no model selector are now
skipped entirely.

**How:** In `fix_ai_model_parameter`, after confirming `is_ai_block`,
extract `input_properties` from the block's `inputSchema.properties` and
`continue` if `"model"` is absent. The subsequent `model_schema` lookup
is also simplified to reuse the already-fetched `input_properties` dict.
A regression test is added to cover this case.

### Changes 🏗️

- `backend/copilot/tools/agent_generator/fixer.py`: In
`fix_ai_model_parameter`, skip AI-category nodes whose block
`inputSchema.properties` does not contain a `"model"` key; reuse
`input_properties` for the subsequent `model_schema` lookup.
- `backend/copilot/tools/agent_generator/fixer_test.py`: Add
`test_ai_block_without_model_property_is_skipped` to
`TestFixAiModelParameter`.

### Checklist 📋

#### For code changes:
- [ ] I have clearly listed my changes in the PR description
- [ ] I have made a test plan
- [ ] I have tested my changes according to the test plan:
- [ ] Run `poetry run pytest
backend/copilot/tools/agent_generator/fixer_test.py` — all 50 tests pass
(49 pre-existing + 1 new)

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 17:14:11 +00:00
Zamil Majdy
3895d95826 fix(platform): address reviewer comments — tests, a11y, and frontend polish
Backend:
- Add block cost tracking tests for ExaCodeContext, ExaContents, and
  SearchOrganizations blocks (high-severity reviewer ask)
- Add test verifying FAILED status skips cost log in manager
- Add test for empty org list tracking zero items cost

Frontend:
- Rename trackingBadge() → TrackingBadge component (PascalCase convention)
- Move toLocalInput/toUtcIso helpers from usePlatformCostContent.ts to helpers.ts
- Add aria-label to ProviderTable rate override inputs
- Add role="alert" to error state div in PlatformCostContent
- Add Clear Filters button next to Apply
- Fix text-gray-500 → text-muted-foreground in page.tsx (dark mode)
- Dark-mode-compatible error div styling
- Strengthen PlatformCostContent test assertion (exact count instead of >= 1)
- Add tab panel visibility tests and toLocalInput/toUtcIso unit tests
2026-04-06 21:49:23 +07:00
Zamil Majdy
181208528f fix(backend): update token_tracking_test mock targets after _schedule_log refactor
After extracting _schedule_log into schedule_cost_log() in platform_cost.py,
token_tracking no longer has log_platform_cost_safe as an attribute.
Update patch targets to backend.data.platform_cost.log_platform_cost_safe.
2026-04-06 21:37:38 +07:00
Zamil Majdy
0365a26c85 refactor(backend): add clarifying comment to NodeExecutionStats.__iadd__ vars() usage 2026-04-06 21:25:00 +07:00
Zamil Majdy
fb63ae54f0 refactor(platform): address review comments on platform cost tracking
Backend:
- Extract shared _schedule_log into schedule_cost_log() in platform_cost.py
  so both cost_tracking and token_tracking drain a single task set
- Add DEFAULT_DASHBOARD_DAYS=30 default for dashboard queries to avoid
  full-table scans when no date filter is provided
- Add MAX_PROVIDER_ROWS=500 / MAX_USER_ROWS=100 named constants
- Fix typing.Optional -> X | None union syntax in routes
- Fix logger f-strings to lazy %s format in platform_cost_routes
- Fix token_tracking condition to allow logging when cost_usd is set
  even if total_tokens is 0 (fully-cached responses)
- Fix test_get_dashboard_success to use real PlatformCostDashboard instance
- Add invalid input tests (422 for bad dates, page_size=0/201, page=0)
- Add test_does_not_raise_when_block_usage_cost_raises
- Add test_provider_cost_zero_is_not_none

Frontend:
- Fix TrackingBadge dark mode colors using design tokens
- Fix UserTable null key for deleted users (use unknown-{idx} fallback)
- Fix ProviderTable rate input from uncontrolled to controlled
- Fix "use server" directive on page component (not a server action)
- Add ARIA label and tabpanel roles to tab UI
- Fix LogsTable fragile cast with safe formatLogDate helper
2026-04-06 21:19:49 +07:00
Zamil Majdy
6de79fb73f fix: resolve merge conflicts with dev branch
Keep cost_usd field alongside new thinking_stripper and session_messages
fields added in dev for baseline copilot state.
2026-04-06 21:00:23 +07:00
Ubbe
a11199aa67 dx(frontend): set up React integration testing with Vitest + RTL + MSW (#12667)
## Summary
- Establish React integration tests (Vitest + RTL + MSW) as the primary
frontend testing strategy (~90% of tests)
- Update all contributor documentation (TESTING.md, CONTRIBUTING.md,
AGENTS.md) to reflect the integration-first convention
- Add `NuqsTestingAdapter` and `TooltipProvider` to the shared test
wrapper so page-level tests work out of the box
- Write 8 integration tests for the library page as a reference example
for the pattern

## Why
We had the testing infrastructure (Vitest, RTL, MSW, Orval-generated
handlers) but no established convention for page-level integration
tests. Most existing tests were for stores or small components. Since
our frontend is client-first, we need a documented, repeatable pattern
for testing full pages with mocked APIs.

## What
- **Docs**: Rewrote `TESTING.md` as a comprehensive guide. Updated
testing sections in `CONTRIBUTING.md`, `frontend/AGENTS.md`,
`platform/AGENTS.md`, and `autogpt_platform/AGENTS.md`
- **Test infra**: Added `NuqsTestingAdapter` (for `nuqs` query state
hooks) and `TooltipProvider` (for Radix tooltips) to `test-utils.tsx`
- **Reference tests**: `library/__tests__/main.test.tsx` with 8 tests
covering agent rendering, tabs, folders, search bar, and Jump Back In

## How
- Convention: tests live in `__tests__/` next to `page.tsx`, named
descriptively (`main.test.tsx`, `search.test.tsx`)
- Pattern: `setupHandlers()` → `render(<Page />)` → `findBy*` assertions
- MSW handlers from
`@/app/api/__generated__/endpoints/{tag}/{tag}.msw.ts` for API mocking
- Custom `render()` from `@/tests/integrations/test-utils` wraps all
required providers

## Test plan
- [x] All 422 unit/integration tests pass (`pnpm test:unit`)
- [x] `pnpm format` clean
- [x] `pnpm lint` clean (no new errors)
- [x] `pnpm types` — pre-existing onboarding type errors only, no new
errors

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
Co-authored-by: Reinier van der Leer <pwuts@agpt.co>
2026-04-06 13:17:08 +00:00
Zamil Majdy
5f82a71d5f feat(copilot): add Fast/Thinking mode toggle with full tool parity (#12623)
### Why / What / How

Users need a way to choose between fast, cheap responses (Sonnet) and
deep reasoning (Opus) in the copilot. Previously only the SDK/Opus path
existed, and the baseline path was a degraded fallback with no tool
calling, no file attachments, no E2B sandbox, and no permission
enforcement.

This PR adds a copilot mode toggle and brings the baseline (fast) path
to full feature parity with the SDK (extended thinking) path.

### Changes 🏗️

#### 1. Mode toggle (UI → full stack)
- Add Fast / Thinking mode toggle to ChatInput footer (Phosphor
`Brain`/`Zap` icons via lucide-react)
- Thread `mode: "fast" | "extended_thinking" | null` from
`StreamChatRequest` → RabbitMQ queue → executor → service selection
- Fast → baseline service (Sonnet 4 via OpenRouter), Thinking → SDK
service (Opus 4.6)
- Toggle gated behind `CHAT_MODE_OPTION` feature flag with server-side
enforcement
- Mode persists in localStorage with SSR-safe init

#### 2. Baseline service full tool parity
- **Tool call persistence**: Store structured `ChatMessage` entries
(assistant + tool results) instead of flat concatenated text — enables
frontend to render tool call details and maintain context across turns
- **E2B sandbox**: Wire up `get_or_create_sandbox()` so `bash_exec`
routes to E2B (image download, Python/PIL compression, filesystem
access)
- **File attachments**: Accept `file_ids`, download workspace files,
embed images as OpenAI vision blocks, save non-images to working dir
- **Permissions**: Filter tool list via `CopilotPermissions`
(whitelist/blacklist)
- **URL context**: Pass `context` dict to user message for URL-shared
content
- **Execution context**: Pass `sandbox`, `sdk_cwd`, `permissions` to
`set_execution_context()`
- **Model**: Changed `fast_model` from `google/gemini-2.5-flash` to
`anthropic/claude-sonnet-4` for reliable function calling
- **Temp dir cleanup**: Lazy `mkdtemp` (only when files attached) +
`shutil.rmtree` in finally

#### 3. Transcript support for Fast mode
- Baseline service now downloads / validates / loads / appends / uploads
transcripts (parity with SDK)
- Enables seamless mode switching mid-conversation via shared transcript
- Upload shielded from cancellation, bounded at 5s timeout

#### 4. Feature-flag infrastructure fixes
- `FORCE_FLAG_*` env-var overrides on both backend and frontend for
local dev / E2E
- LaunchDarkly context parity (frontend mirrors backend user context)
- `CHAT_MODE_OPTION` default flipped to `false` to match backend

#### 5. Other hardening
- Double-submit ref guard in `useChatInput` + reconnect dedup in
`useCopilotStream`
- `copilotModeRef` pattern to read latest mode without recreating
transport
- Shared `CopilotMode` type across frontend files
- File name collision handling with numeric suffix
- Path sanitization in file description hints (`os.path.basename`)

### Test plan
- [x] 30 new unit tests: `_env_flag_override` (12), `envFlagOverride`
(8), `_filter_tools_by_permissions` (4), `_prepare_baseline_attachments`
(6)
- [x] E2E tested on dev: fast mode creates E2B sandbox, calls 7-10
tools, generates and renders images
- [x] Mode switching mid-session works (shared transcript + session
messages)
- [x] Server-side flag gate enforced (crafted `mode=fast` stripped when
flag off)
- [x] All 37 CI checks green
- [x] Verified via agent-browser: workspace images render correctly in
all message positions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Zamil Majdy <majdy.zamil@gmail.com>
2026-04-06 19:54:36 +07:00
Zamil Majdy
d57da6c078 refactor(platform): extract usePlatformCostContent hook
PlatformCostContent.tsx was 248 lines mixing data fetching, URL state,
filter inputs, rate overrides, and rendering. Per frontend convention,
extract the stateful/effectful logic into a dedicated hook:

- usePlatformCostContent.ts (new, 142 lines) — owns:
  - dashboard/logs/pagination fetching via effect
  - URL ↔ filter input sync (startInput, endInput, providerInput, userInput)
  - rateOverrides state + handleRateOverride
  - toLocalInput/toUtcIso datetime helpers
  - updateUrl + handleFilter actions
  - totalEstimatedCost reducer
- PlatformCostContent.tsx (now 182 lines) — pure rendering only.
2026-04-05 15:56:32 +02:00
Zamil Majdy
689cd67a13 refactor(platform): address autogpt-reviewer feedback (batch 2)
- resolve_tracking: replace hardcoded provider string literals with
  ProviderName enum values + _CHARACTER_BILLED_PROVIDERS /
  _WALLTIME_BILLED_PROVIDERS frozensets (nice-to-have #2).
- NodeExecutionStats.__iadd__: replace double model_dump() with
  vars()-based iteration for ~10-50x speedup on each merge_stats() call
  (hot path — runs once per block per yield across 20+ blocks).
- Add 3 accumulation tests for provider_cost semantics:
  - Multiple provider_cost values sum (not last-write-wins)
  - None never overwrites a set value
  - provider_cost_type is last-write-wins (documented semantics)
2026-04-05 15:54:12 +02:00
Zamil Majdy
dca89d1586 refactor(platform): address autogpt-reviewer feedback (batch 1)
- cost_tracking.py: replace `Any` types with NodeExecutionEntry + Block
- Extract usd_to_microdollars utility in platform_cost.py, used by
  cost_tracking.py and copilot/token_tracking.py.
- llm.py: extract x-total-cost header parsing to extract_openrouter_cost()
  helper + 8 unit tests covering present/absent/empty/non-numeric/zero
  cases. Previously untested blocker.
- token_tracking.py: extract COPILOT_BLOCK_ID, COPILOT_CREDENTIAL_ID
  constants + _copilot_block_name() helper (clearer than inline
  f"copilot:{log_prefix.strip(' []')}".rstrip(":")).
- platform_cost.py: cap by_provider query at LIMIT 500 (defensive bound).
- TrackingBadge.tsx: drop dark: classes per frontend convention, add
  "items" badge color.
- PlatformCostContent.tsx: drop dark: classes from error banner,
  add role="tablist"/role="tab"/aria-selected to tabs, add htmlFor
  to filter input labels.
- admin/layout.tsx: Receipt icon moved from lucide-react to phosphor.
- ProviderTable.tsx: add "(unsaved)" label to Rate column header to
  signal per-session only.
2026-04-05 15:46:50 +02:00
Zamil Majdy
2f63fcd383 test(frontend): update platform-costs helpers tests for per-type rate estimation
Updates the test suite to match the new per-type rate estimation logic:
- rateOverrides now use composite keys (provider:tracking_type)
- trackingValue appends unit suffixes (tokens, chars, items)
- characters/items tracking reads from total_tracking_amount
- adds coverage for default rates across characters, items, duration types
2026-04-05 15:30:32 +02:00
Zamil Majdy
f04cd08e40 feat(platform): add trackingAmount column + per-type rate estimation
Problem
- cost_tracking.py was multiplying stats.provider_cost by 1M to get
  cost_microdollars regardless of tracking_type. When provider_cost_type
  was "items" or "characters", 5.0 items got stored as $5 USD.
- The dashboard had no way to aggregate item/character counts since
  they aren't naturally carried by inputTokens/outputTokens/duration.
- Dashboard estimation only handled cost_usd/tokens/per_run; characters,
  items, sandbox_seconds, walltime_seconds showed "-" always.

Fix
- Add PlatformCostLog.trackingAmount (Float?) column + migration.
- cost_tracking.py: only treat provider_cost as USD when tracking_type
  is "cost_usd"; always populate trackingAmount with resolve_tracking's
  amount so the dashboard can aggregate it.
- Dashboard query: SUM(trackingAmount) as total_tracking_amount.
- ProviderCostSummary (backend + regenerated TS): add total_tracking_amount.
- Frontend helpers: DEFAULT_COST_PER_1K_CHARS, DEFAULT_COST_PER_ITEM,
  DEFAULT_COST_PER_SECOND tables for characters/items/duration rates.
  estimateCostForRow dispatches per tracking_type and multiplies the
  correct amount by the correct rate.
- ProviderTable: show editable rate input for every tracking_type
  (not only per_run), with unit label ($/1K tokens, $/1K chars, $/item,
  $/second, $/run). Rate overrides keyed on "provider:tracking_type".
2026-04-05 15:23:31 +02:00
Zamil Majdy
44714f1b25 refactor(platform): use provider_cost_type Literal instead of output_size misuse
Blocks previously called merge_stats(NodeExecutionStats(output_size=...))
to signal "per-request" billing or "N items returned", but `output_size`
is semantically the output payload byte count and is always overridden
by the executor wrapper (manager.py:440 = len(json.dumps(output_data))).
Those calls were silently dead code.

Changes:
- Add ProviderCostType Literal enum on NodeExecutionStats with the
  canonical set of tracking types: cost_usd, tokens, characters,
  sandbox_seconds, walltime_seconds, per_run, items.
- Add provider_cost_type field to NodeExecutionStats so blocks can
  declare their billing model explicitly instead of resolve_tracking
  guessing from provider name.
- resolve_tracking honors provider_cost_type first, falling back to
  heuristics only when not set.
- Remove 26 dead merge_stats(output_size=1) calls across 15 blocks.
- Replace 5 merge_stats(output_size=len(X)) calls with explicit
  provider_cost+provider_cost_type (items/characters) so the count
  is preserved through the wrapper's output_size override.
- Clean up unused NodeExecutionStats imports in 14 files.
- Add tests for block-declared provider_cost_type pathway.
2026-04-05 14:56:44 +02:00
Zamil Majdy
78b95f8a76 fix(platform): add provider_cost tracking to Exa code_context block 2026-04-05 14:44:30 +02:00
Zamil Majdy
6f0c1dfa11 fix(platform): close tracking gaps found during audit
- resolve_tracking: read `script` field for elevenlabs in addition to
  `script_input`/`text` — VideoNarrationBlock uses `script`, was
  producing tracking_amount=0 characters before.
- exa/similar.py + exa/research.py (3 blocks): extract provider_cost
  from response.cost_dollars.total via merge_stats so tracking_type
  ends up as "cost_usd" with real dollar amounts instead of
  falling through to per_run.
- Add test for script field resolution.

Audit finding: `output_size` set via merge_stats in blocks is
always overridden by the executor wrapper (manager.py:440 computes
byte count of serialized output), and `walltime` is also set by
the wrapper (manager.py:667). So the existing merge_stats(output_size=1)
calls in ~15 blocks are dead code for cost tracking purposes; they
don't hurt but don't add data either. The real tracking data sources
are: (1) input/output_token_count from LLM blocks, (2) provider_cost
from APIs that return USD, (3) input_data for per-character TTS,
(4) auto-populated walltime for wall-clock billing.
2026-04-05 14:38:24 +02:00
Zamil Majdy
5e595231da test(platform): align actions tests with string date passthrough
The actions intentionally pass raw ISO strings (cast to Date) to the
generated client to avoid Date.toString() producing non-ISO output
that FastAPI rejects. Update the tests to match this behavior rather
than expecting Date instances.
2026-04-05 12:54:42 +02:00
Zamil Majdy
7b36bed8a5 fix(platform): address autogpt-reviewer feedback on cost tracking
- cost_tracking.py + token_tracking.py: switch back to asyncio.create_task
  for true fire-and-forget on hot path, but hold strong references in a
  module-level set (with done-callback discard) so tasks can't be GC'd
  mid-flight. Addresses both the "await blocks executor" concern and the
  "task may vanish before completion" concern.
- cost_tracking.py: `> 0` checks instead of truthy for output_size/walltime
  so legitimate zero values aren't stored as NULL.
- platform_cost_routes_test.py: add explicit 403 test for non-admin JWT
  and extend 401 test to cover /logs endpoint.
- actions.ts: forward raw ISO strings to generated client instead of Date
  objects — the client calls .toString() which produces human-readable
  format that FastAPI rejects with 422. Fixes timezone filter on the
  admin dashboard.
2026-04-05 12:50:03 +02:00
Zamil Majdy
372900c141 fix(platform): address 5 self-review items on cost tracking
- cost_tracking.py: drop asyncio.create_task fire-and-forget (risked task
  GC mid-flight per Python docs); await log_platform_cost_safe directly.
  Wrap body in try/except so logging never disrupts executor.
- token_tracking.py: same create_task fix; await directly.
- platform_cost.py: document that by_provider rows are keyed on
  (provider, tracking_type) so the same provider can appear multiple times.
- PlatformCostContent.tsx: convert datetime-local (naive local time) to
  UTC ISO before URL serialization so filter windows match admin's wall
  clock regardless of backend timezone. Convert back to local for input
  display.
2026-04-05 11:55:00 +02:00
Nicholas Tindle
1a305db162 ci(frontend): add Playwright E2E coverage reporting to Codecov (#12665)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 00:55:09 -05:00
Zamil Majdy
7afd2b249d fix(platform): address 9 should-fix items from PR review
1. Fix route path double-nesting: /api/admin/platform-costs/{dashboard,logs}
2. Fix falsy zero suppression: pass raw token counts instead of `or None`
3. Split 546-line PlatformCostContent into SummaryCard, ProviderTable,
   UserTable, LogsTable, TrackingBadge sub-components
4. Add merge_stats accumulation tests and integration test for
   on_node_execution -> log_system_credential_cost wiring
5. Add source citations for DEFAULT_COST_PER_RUN values
6. Extract MICRODOLLARS_PER_USD constant, use in all conversion sites
7. Parallelize COUNT + SELECT in get_platform_cost_logs with asyncio.gather
8. Remove dead block_name parameter from resolve_tracking()
9. Remove unrelated store.test.ts (added by this PR, not on dev)
2026-04-03 23:14:38 +02:00
Zamil Majdy
8d22653810 fix(platform): address 4 review blockers on cost tracking
- Fire-and-forget cost logging via asyncio.create_task() instead of await
  to avoid blocking executor and copilot streaming paths on DB INSERT
- Add trackingType column to PlatformCostLog schema, migration, and INSERT;
  update dashboard/logs queries to use COALESCE(column, JSONB) for backward
  compat and index-friendly GROUP BY
- Admin auth test now explicitly mocks get_jwt_payload to raise 401 instead
  of relying on bare FastAPI app behavior
- Blocker 3 (nullable user_id) was already addressed in prior commit
2026-04-03 22:43:57 +02:00
Zamil Majdy
48a653dc63 fix(copilot): prevent duplicate side effects from double-submit and stale-cache race (#12660)
## Why

#12604 (intermediate persistence) introduced two bugs on dev:

1. **Duplicate user messages** — `set_turn_duration` calls
`invalidate_session_cache()` which deletes the Redis key. Concurrent
`get_chat_session()` calls re-populate it from DB with stale data. The
executor loads this stale cache, misses the user message, and re-appends
it.

2. **Tool outputs lost on hydration** — Intermediate flushes save
assistant messages to DB before `StreamToolInputAvailable` sets
`tool_calls` on them. Since `_save_session_to_db` is append-only (uses
`start_sequence`), the `tool_calls` update is lost — subsequent flushes
start past that index. On page refresh / SSE reconnect, tool UIs
(SetupRequirementsCard, run_block output, etc.) are invisible.

3. **Sessions stuck running** — If a tool call hangs (e.g. WebSearch
provider not responding), the stream never completes,
`mark_session_completed` never runs, and the `active_stream` flag stays
stale in Redis.

## What

- **In-place cache update** in `set_turn_duration` — replaces
`invalidate_session_cache()` with a read-modify-write that patches the
duration on the cached session, eliminating the stale-cache repopulation
window
- **tool_calls backfill** — tracks the flush watermark and assistant
message index; when `StreamToolInputAvailable` sets `tool_calls` on an
already-flushed assistant, updates the DB record directly via
`update_message_tool_calls()`
- **Improved message dedup** — `is_message_duplicate()` /
`maybe_append_user_message()` scans trailing same-role messages (current
turn) instead of only checking `messages[-1]`
- **Idle timeout** — aborts the stream with a retryable error if no
meaningful SDK message arrives for 10 minutes, preventing hung tool
calls from leaving sessions stuck

## Changes

- `copilot/db.py` — `update_message_tool_calls()`, in-place cache update
in `set_turn_duration`
- `copilot/model.py` — `is_message_duplicate()`,
`maybe_append_user_message()`
- `copilot/sdk/service.py` — flush watermark tracking, tool_calls
backfill, idle timeout
- `copilot/baseline/service.py` — use `maybe_append_user_message()`
- `copilot/model_test.py` — unit tests for dedup
- `copilot/db_test.py` — unit tests for set_turn_duration cache update

## Checklist

- [x] My PR title follows [conventional
commit](https://www.conventionalcommits.org/) format
- [x] Out-of-scope changes are less than 20% of the PR
- [x] Changes to `data/*.py` validated for user ID checks (N/A)
- [x] Protected routes updated in middleware (N/A)
2026-04-04 01:09:42 +07:00
Toran Bruce Richards
f6ddcbc6cb feat(platform): Add all 12 Z.ai GLM models via OpenRouter (#12672)
## Summary

Add Z.ai (Zhipu AI) GLM model family to the platform LLM blocks, routed
through OpenRouter. This enables users to select any of the 12 Z.ai
models across all LLM-powered blocks (AI Text Generator, AI
Conversation, AI Structured Response, AI Text Summarizer, AI List
Generator).

## Gap Analysis

All 12 Z.ai models currently available on OpenRouter's API were missing
from the AutoGPT platform:

| Model | Context Window | Max Output | Price Tier | Cost |
|-------|---------------|------------|------------|------|
| GLM 4 32B | 128K | N/A | Tier 1 | 1 |
| GLM 4.5 | 131K | 98K | Tier 2 | 2 |
| GLM 4.5 Air | 131K | 98K | Tier 1 | 1 |
| GLM 4.5 Air (Free) | 131K | 96K | Tier 1 | 1 |
| GLM 4.5V (vision) | 65K | 16K | Tier 2 | 2 |
| GLM 4.6 | 204K | 204K | Tier 1 | 1 |
| GLM 4.6V (vision) | 131K | 131K | Tier 1 | 1 |
| GLM 4.7 | 202K | 65K | Tier 1 | 1 |
| GLM 4.7 Flash | 202K | N/A | Tier 1 | 1 |
| GLM 5 | 80K | 131K | Tier 2 | 2 |
| GLM 5 Turbo | 202K | 131K | Tier 3 | 4 |
| GLM 5V Turbo (vision) | 202K | 131K | Tier 3 | 4 |

## Changes

- **`autogpt_platform/backend/backend/blocks/llm.py`**: Added 12
`LlmModel` enum entries and corresponding `MODEL_METADATA` with context
windows, max output tokens, display names, and price tiers sourced from
OpenRouter API
- **`autogpt_platform/backend/backend/data/block_cost_config.py`**:
Added `MODEL_COST` entries for all 12 models, with costs scaled to match
pricing (1 for budget, 2 for mid-range, 4 for premium)

## How it works

All Z.ai models route through the existing OpenRouter provider
(`open_router`) — no new provider or API client code needed. Users with
an OpenRouter API key can immediately select any Z.ai model from the
model dropdown in any LLM block.

## Related

- Linear: REQ-83

---------

Co-authored-by: AutoGPT CoPilot <copilot@agpt.co>
2026-04-03 15:48:33 +00:00
Zamil Majdy
b00e16b438 fix(platform): fix model_test to use Optional fields for None-skip test
Use provider_cost (Optional) and error (Optional) instead of
walltime (non-nullable float) to test __iadd__ None-skip behavior.
2026-04-03 17:25:15 +02:00
Zamil Majdy
b5acfb7855 fix: resolve merge conflict with dev in helpers.test.ts 2026-04-03 17:12:07 +02:00
Zamil Majdy
1ee0bd6619 fix(platform): use round() for microdollar conversion and add cost tracking tests
- Fix float->int truncation bug in token_tracking.py and cost_tracking.py
  where int(cost * 1_000_000) would under-count (e.g. 0.0015 -> 1499
  instead of 1500). Now uses round() for correct rounding.
- Extract _resolve_tracking and _log_system_credential_cost from
  manager.py into dedicated cost_tracking.py module for testability.
- Add unit tests for all 8+ provider branches in resolve_tracking,
  log_system_credential_cost happy/skip paths, and model conversion.
- Add NodeExecutionStats.__iadd__ regression tests for None-skip behavior.
- Add frontend component tests for PlatformCostContent (14 tests) and
  actions.ts server actions (7 tests) to improve codecov patch coverage.
2026-04-03 17:04:07 +02:00
Zamil Majdy
98f13a6e5d feat(copilot): add create -> dry-run -> fix loop to agent generation (#12578)
## Summary
- Instructs the copilot LLM to automatically dry-run agents after
creating or editing them, inspect the output for wiring/data-flow
issues, and fix iteratively before presenting the agent as ready to the
user
- Updates tool descriptions (run_agent, get_agent_building_guide),
prompting supplement, and agent generation guide with clear workflow
instructions and error pattern guidance
- Adds Tool Discovery Priority to shared tool notes (find_block ->
run_mcp_tool -> SendAuthenticatedWebRequestBlock -> manual API)
- Adds 37 tests: prompt regression tests + functional tests (tool schema
validation, Pydantic model, guide workflow ordering)
- **Frontend**: Fixes host-scoped credential UX — replaces duplicate
credentials for the same host instead of stacking them, wires up delete
functionality with confirmation modal, updates button text contextually
("Update headers" vs "Add headers")

## Test plan
- [x] All 37 `dry_run_loop_test.py` tests pass (prompt content, tool
schemas, Pydantic model, guide ordering)
- [x] Existing `tool_schema_test.py` passes (110 tests including
character budget gate)
- [x] Ruff lint and format pass
- [x] Pyright type checking passes
- [x] Frontend: `pnpm lint`, `pnpm types` pass
- [x] Manual verification: confirm copilot follows the create -> dry-run
-> fix workflow when asked to build an agent
- [x] Manual verification: confirm host-scoped credentials replace
instead of duplicate
2026-04-03 14:48:57 +00:00
Zamil Majdy
613978a611 ci: add gitleaks secret scanning to pre-commit hooks (#12649)
### Why / What / How

**Why:** We had no local pre-commit protection against accidentally
committing secrets. The existing `detect-secrets` hook only ran on
`pre-push`, which is too late — secrets are already in git history by
that point. GitHub's push protection only covers known provider patterns
and runs server-side.

**What:** Adds a 3-layer defense against secret leaks: local pre-commit
hooks (gitleaks + detect-secrets), and a CI workflow as a safety net.

**How:** 
- Moved `detect-secrets` from `pre-push` to `pre-commit` stage
- Added `gitleaks` as a second pre-commit hook (Go binary, faster and
more comprehensive rule set)
- Added `.gitleaks.toml` config with allowlists for known false
positives (test fixtures, dev docker JWTs, Firebase public keys, lock
files, docs examples)
- Added `repo-secret-scan.yml` CI workflow using `gitleaks-action` on
PRs/pushes to master/dev

### Changes 🏗️

- `.pre-commit-config.yaml`: Moved `detect-secrets` to pre-commit stage,
added baseline arg, added `gitleaks` hook
- `.gitleaks.toml`: New config with tuned allowlists for this repo's
false positives
- `.secrets.baseline`: Empty baseline for detect-secrets to track known
findings
- `.github/workflows/repo-secret-scan.yml`: New CI workflow running
gitleaks on every PR and push

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Ran `gitleaks detect --no-git` against the full repo — only `.env`
files (gitignored) remain as findings
  - [x] Verified gitleaks catches a test secret file correctly
- [x] Pre-commit hooks pass on commit (both detect-secrets and gitleaks
passed)

#### For configuration changes:

- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)
2026-04-03 14:01:26 +00:00
Zamil Majdy
2b0e8a5a9f feat(platform): add rate-limit tiering system for CoPilot (#12581)
## Summary
- Adds a four-tier subscription system (FREE/PRO/BUSINESS/ENTERPRISE)
for CoPilot with configurable multipliers (1x/5x/20x/60x) applied on top
of the base LaunchDarkly/config limits
- Stores user tier in the database (`User.subscriptionTier` column as a
Prisma enum, defaults to PRO for beta testing) with admin API endpoints
for tier management
- Includes tier info in usage status responses and OTEL/Langfuse trace
metadata for observability

## Tier Structure
| Tier | Multiplier | Daily Tokens | Weekly Tokens | Notes |
|------|-----------|-------------|--------------|-------|
| FREE | 1x | 2.5M | 12.5M | Base tier (unused during beta) |
| PRO | 5x | 12.5M | 62.5M | Default on sign-up (beta) |
| BUSINESS | 20x | 50M | 250M | Manual upgrade for select users |
| ENTERPRISE | 60x | 150M | 750M | Highest tier, custom |

## Changes
- **`rate_limit.py`**: `SubscriptionTier` enum
(FREE/PRO/BUSINESS/ENTERPRISE), `TIER_MULTIPLIERS`, `get_user_tier()`,
`set_user_tier()`, update `get_global_rate_limits()` to apply tier
multiplier and return 3-tuple, add `tier` field to `CoPilotUsageStatus`
- **`rate_limit_admin_routes.py`**: Add `GET/POST
/admin/rate_limit/tier` endpoints, include `tier` in
`UserRateLimitResponse`
- **`routes.py`** (chat): Include tier in `/usage` endpoint response
- **`sdk/service.py`**: Send `subscription_tier` in OTEL/Langfuse trace
metadata
- **`schema.prisma`**: Add `SubscriptionTier` enum and
`subscriptionTier` column to `User` model (default: PRO)
- **`config.py`**: Update docs to reflect tier system
- **Migration**: `20260326200000_add_rate_limit_tier` — creates enum,
migrates STANDARD→PRO, adds BUSINESS, sets default to PRO

## Test plan
- [x] 72 unit tests all passing (43 rate_limit + 11 admin routes + 18
chat routes)
- [ ] Verify FREE tier users get base limits (2.5M daily, 12.5M weekly)
- [ ] Verify PRO tier users get 5x limits (12.5M daily, 62.5M weekly)
- [ ] Verify BUSINESS tier users get 20x limits (50M daily, 250M weekly)
- [ ] Verify ENTERPRISE tier users get 60x limits (150M daily, 750M
weekly)
- [ ] Verify admin can read and set user tiers via API
- [ ] Verify tier info appears in Langfuse traces
- [ ] Verify migration applies cleanly (creates enum, migrates STANDARD
users to PRO, adds BUSINESS, default PRO)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
2026-04-03 13:36:01 +00:00
Zamil Majdy
08bb05141c dx: enhance pr-address skill with detailed codecov coverage guidance (#12662)
Enhanced pr-address skill codecov section with local coverage commands,
priority guide, and troubleshooting steps.
2026-04-03 13:15:46 +00:00
Zamil Majdy
4190f75b0b test: additional coverage for platform cost and token tracking 2026-04-03 15:09:54 +02:00
Zamil Majdy
71315aa982 fix(backend): use actual provider in persist_and_record_usage cost logging
The provider field was hardcoded to "open_router" for all PlatformCostLog
entries, even when the SDK (Anthropic) path was the caller. Add a provider
parameter that defaults to "open_router" for backward compatibility and
pass "anthropic" from the SDK service layer.
2026-04-03 14:37:47 +02:00
Nicholas Tindle
3ccaa5e103 ci(frontend): make frontend coverage checks informational (non-blocking) (#12663)
### Why / What / How

**Why:** Frontend test coverage is still ramping up. The default
component status checks (project + patch at 80%) would block merges for
insufficient coverage on frontend changes, which isn't practical yet.

**What:** Override the platform-frontend component's coverage statuses
to be `informational: true`, so they report but don't block merges.

**How:** Added explicit `statuses` to the `platform-frontend` component
in `codecov.yml` with `informational: true` on both project and patch
checks, overriding the `default_rules`.

### Changes 🏗️

- **`codecov.yml`**: Added `informational: true` to platform-frontend
component's project and patch status checks

### Checklist 📋

#### For code changes:
- [ ] I have clearly listed my changes in the PR description
- [ ] I have made a test plan
- [ ] I have tested my changes according to the test plan:
- [ ] Verify Codecov frontend status checks show as informational
(non-blocking) on PRs touching frontend code

#### For configuration changes:

- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Low Risk**
> Low risk: Codecov configuration-only change that affects merge gating
for frontend coverage statuses but does not alter runtime code.
> 
> **Overview**
> Updates `codecov.yml` to override the `platform-frontend` component’s
coverage `statuses` so both **project** and **patch** checks are marked
`informational: true` (non-blocking), while leaving the default
component coverage rules unchanged for other components.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
f8e8426a31. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 12:22:05 +00:00
Zamil Majdy
960f893295 test(platform): add unit tests for platform cost helpers and data layer
Extract pure helper functions (formatMicrodollars, formatTokens,
formatDuration, estimateCostForRow, trackingValue, toDateOrUndefined)
from PlatformCostContent.tsx into helpers.ts for testability. Add 26
vitest cases covering all formatting and cost-estimation branches.

Add backend tests for _build_where and _json_or_none in
platform_cost.py (11 pytest cases covering filter combinations).
2026-04-03 14:15:28 +02:00
Zamil Majdy
759effab60 test(frontend): add unit tests for onboarding store and GenericTool helpers
Improve frontend patch coverage with comprehensive tests for the
onboarding wizard zustand store and GenericTool helper functions.
2026-04-03 13:25:52 +02:00
Zamil Majdy
45b6ada739 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into codex/platform-cost-tracking 2026-04-03 13:07:06 +02:00
Zamil Majdy
da544d3411 fix(platform): fix CI - regenerate API schema + fix Date type mismatch
- Regenerate openapi.json using export-api-schema command (CI-compatible)
- Convert string date params to Date objects before passing to generated
  API functions (orval generates Date | null for datetime fields)
- pnpm types passes cleanly
2026-04-02 20:56:08 +02:00
Zamil Majdy
54e5059d7c fix(platform): use generated Pagination type + estimate token costs
- Import Pagination from generated client instead of hand-written types
- Add DEFAULT_COST_PER_1K_TOKENS for OpenAI/Anthropic/Groq/Ollama
- estimateCostForRow now computes cost from token count when provider
  doesn't report actual USD (tokens * rate_per_1k / 1000)
- Added date comment for when default rates were checked
2026-04-02 20:43:23 +02:00
Zamil Majdy
1d7d2f77f3 feat(platform): tracking-aware dashboard with generated API client
Backend:
- ProviderCostSummary now includes tracking_type and total_duration_seconds
- CostLogRow includes tracking_type and duration
- SQL queries extract tracking_type from metadata JSON

Frontend:
- Replaced hand-written types/client with generated API client (orval)
- Actions use getV2GetPlatformCostDashboard/getV2GetPlatformCostLogs
- Provider table shows: Type badge, Usage metric, Known Cost, Estimated Cost
- Per-run providers have editable $/run input with defaults
- Summary cards show "Known Cost" vs "Estimated Total"
- Logs table shows tracking_type badge + duration column
- Color-coded badges: cost_usd(green), tokens(blue), duration(orange),
  characters(purple), per_run(gray)
2026-04-02 20:27:58 +02:00
Zamil Majdy
567bc73ec4 fix(blocks): regenerate block docs after merge with dev 2026-04-02 19:32:15 +02:00
Zamil Majdy
61ef54af05 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into codex/platform-cost-tracking 2026-04-02 19:28:37 +02:00
Zamil Majdy
405403e6b7 fix(backend): initialize response before try block to satisfy pyright 2026-04-02 19:21:22 +02:00
Zamil Majdy
ab16e63b0a fix(platform): pass model name to copilot cost tracking
Both SDK and Baseline paths now pass config.model to
persist_and_record_usage so PlatformCostLog records the actual
model (e.g. anthropic/claude-sonnet-4) for filtering/grouping.
2026-04-02 19:03:47 +02:00
Zamil Majdy
45d3193727 fix(platform): move baseline cost extraction to finally + accumulate multi-round costs
- Move x-total-cost header extraction to finally block so cost is
  captured even when stream errors mid-way (we already paid)
- Accumulate cost across multi-round tool-calling turns instead of
  overwriting with last round only
- Handle UnboundLocalError if response was never assigned
2026-04-02 19:00:44 +02:00
Zamil Majdy
9a08011d7d fix(platform): move opentelemetry import to top-level in both copilot paths 2026-04-02 18:57:36 +02:00
Zamil Majdy
6fa66ac7da feat(platform): add cost/token OTEL attributes to both copilot paths
Both SDK and Baseline copilot paths now set OpenTelemetry span
attributes for cost tracking before the trace context closes:
- gen_ai.usage.prompt_tokens
- gen_ai.usage.completion_tokens
- gen_ai.usage.cost_usd (when available)
- gen_ai.usage.cache_read_tokens (SDK only)
- gen_ai.usage.cache_creation_tokens (SDK only)

Also extracts x-total-cost from OpenRouter response headers in the
Baseline streaming path, giving actual USD cost for both modes.

These attributes flow to Langfuse/any OTEL backend for cost dashboards.
2026-04-02 18:55:12 +02:00
Zamil Majdy
4bad08394c feat(platform): extract OpenRouter cost from baseline copilot path
The baseline copilot path uses the same OpenRouter API but wasn't
extracting the x-total-cost header. Now extracts cost from the
streaming response headers and passes it to persist_and_record_usage,
giving us actual USD cost for both copilot modes.
2026-04-02 18:39:55 +02:00
Zamil Majdy
993c43b623 feat(platform): add merge_stats to remaining blocks (FAL, Revid, D-ID, E2B, YouTube, Weather, TTS, Enrichlayer)
Every system credential block now has explicit merge_stats tracking.
No block relies on the generic fallback anymore.
2026-04-02 18:22:02 +02:00
Zamil Majdy
a8a62eeefc feat(platform): add merge_stats tracking to all system credential blocks
Every block that uses system credentials now calls merge_stats with
meaningful data after the API response:
- Google Maps: output_size = number of places returned (= detail API calls)
- Apollo people/org: output_size = results count
- Apollo person: output_size = 1 per enrichment
- SmartLead: output_size = leads added or 1 per operation
- Ideogram: output_size = 1 per image
- Replicate: output_size = 1 per prediction
- Nvidia: output_size = 1 per inference
- ScreenshotOne: output_size = 1 per screenshot
- ZeroBounce: output_size = 1 per email validated
- Mem0: output_size = 1 per memory operation
2026-04-02 18:13:15 +02:00
Zamil Majdy
173614bcc5 fix(platform): audit and fix per-provider tracking accuracy
- Fix ElevenLabs/D-ID field name: script -> script_input
- Remove incorrect Google Maps api_calls formula, use per_run instead
- Remove D-ID from generation_seconds (walltime includes polling)
- Jina embeddings: extract total_tokens from response.usage
- Simplify tracking types: cost_usd, tokens, characters,
  sandbox_seconds, walltime_seconds, per_run
2026-04-02 17:58:24 +02:00
Zamil Majdy
fbe634fb19 fix(platform): handle null user_id in cost logs and fix 0.0 cost stored as NULL
- Add null-safe optional chaining for user_id.slice() in LogsTable, displaying
  "Deleted user" when user_id is null to prevent frontend crash
- Change `if cost_float` to `if cost_float is not None` in token_tracking.py
  so that a legitimate $0.00 cost is stored as 0 instead of NULL
2026-04-02 17:38:59 +02:00
Zamil Majdy
a338c72c42 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into codex/platform-cost-tracking 2026-04-02 17:36:14 +02:00
Zamil Majdy
7f4398efa3 feat(platform): provider-specific tracking types for accurate cost metrics
Replace one-size-fits-all tracking cascade with provider-aware logic:
- cost_usd: OpenRouter (x-total-cost header), Exa (cost_dollars)
- tokens: OpenAI, Anthropic, Groq, Ollama (token counts)
- characters: Unreal Speech, ElevenLabs (input text length)
- api_calls: Google Maps (1 nearby + N detail calls)
- sandbox_seconds: E2B (sandbox execution time)
- generation_seconds: FAL, Revid, D-ID, Replicate (video/image gen time)
- per_run: Apollo, SmartLead, ZeroBounce, Jina, etc.
2026-04-02 17:30:15 +02:00
Zamil Majdy
c2a054c511 fix(backend): prevent provider_cost loss on stats merge and widen costMicrodollars to BigInt
- NodeExecutionStats.__iadd__ was overwriting accumulated provider_cost
  with None when merging stats that lacked provider_cost (e.g. the final
  llm_call_count/llm_retry_count merge). Skip None values in __iadd__
  so existing data is never erased.
- Widen PlatformCostLog.costMicrodollars from Int (max ~$2,147) to
  BigInt to prevent theoretical overflow for high-cost aggregated
  node executions.
2026-04-02 17:28:27 +02:00
Zamil Majdy
83b00f4789 feat(platform): add copilot/autopilot cost tracking via token_tracking.py
Copilot uses OpenRouter via a separate code path (not through the block
executor). This integrates PlatformCostLog into the shared
persist_and_record_usage() function which is called by both SDK and
baseline copilot paths, capturing:
- Every LLM turn (main conversation, title gen, context compression)
- Tokens (prompt + completion + cache)
- Actual USD cost when available (SDK path provides cost_usd)
- Session ID for correlation
2026-04-02 17:17:53 +02:00
Zamil Majdy
95524e94b3 feat(platform): add tracking_type and tracking_amount to cost log metadata
Standardize cost tracking across providers:
- cost_usd: actual dollar cost (OpenRouter, Exa)
- tokens: total token count (LLM blocks)
- duration_seconds: execution time (video gen, sandboxes)
- per_run: flat per-request (all others)
2026-04-02 17:04:50 +02:00
Zamil Majdy
2c517ff9a1 feat(platform): add per-provider cost extraction
- OpenRouter: Extract actual USD cost from x-total-cost response header
- Exa (search, contents): Write cost_dollars.total to execution_stats
- LLM blocks: Store provider_cost in stats when available
- Add provider_cost field to NodeExecutionStats
- Hook now converts provider_cost to costMicrodollars in PlatformCostLog
- Metadata includes both credit_cost and provider_cost_usd when available
2026-04-02 16:57:34 +02:00
Zamil Majdy
7020ae2189 fix(backend): handle NULL userId in platform cost models and queries
Make user_id Optional[str] in UserCostSummary and CostLogRow to handle
cases where the referenced user has been deleted. Use .get() for safe
access to user_id from query result rows. Regenerate OpenAPI schema.
2026-04-02 16:54:09 +02:00
Zamil Majdy
b9336984be fix(platform): re-add credit_cost to platform cost log metadata
Include the block's credit cost (from block_cost_config) in the log
metadata so every entry has a known cost proxy even when the provider
doesn't expose actual dollar costs.
2026-04-02 16:37:28 +02:00
Zamil Majdy
9924dedddc fix(platform): address bot review comments (sentry + coderabbit)
- CRITICAL: Use execute_raw_with_schema for INSERT (not query_raw)
- Remove accidentally committed transcripts/
- Add dry_run guard to skip cost logging for simulated executions
- Change onDelete: Cascade → SetNull to preserve cost history
- Add standalone createdAt index for date-only queries
- Add deterministic tiebreaker (id) to pagination ORDER BY
- Update migration SQL to match schema changes
2026-04-02 16:26:01 +02:00
Zamil Majdy
c054799b4f fix: regenerate API schema and block docs 2026-04-02 16:23:12 +02:00
Zamil Majdy
f3b5d584a3 fix(platform): address PR review round 5
- Replace ServerCrash icon with Receipt for Platform Costs sidebar
2026-04-02 16:02:00 +02:00
Zamil Majdy
476d9dcf80 fix(platform): address PR review round 4
- Add tests for query parameter forwarding and pagination
2026-04-02 16:00:08 +02:00
Zamil Majdy
072b623f8b fix(platform): address PR review round 3
- Remove duplicate block_usage_cost call from cost logging
- Add case-insensitive provider filter using LOWER()
- Add platform_cost_routes_test.py with basic endpoint tests
2026-04-02 15:58:00 +02:00
Zamil Majdy
26b0c95936 fix(platform): address PR review round 2
- Parallelize dashboard queries with asyncio.gather for ~3x speedup
- Move json import to top-level
- Use consistent p. table alias across all dashboard queries
2026-04-02 15:55:03 +02:00
Zamil Majdy
308357de84 fix(platform): address PR review round 1
- Parameterize LIMIT/OFFSET in SQL queries to prevent injection
- Only log platform cost on successful block execution
- Convert model enum values to strings for proper logging
- Add error handling with try/catch/finally in frontend useEffect
- Drive filter state from URL params to prevent desync
- Add dark mode support using design tokens
- Return total_users count in dashboard for accurate reporting
- Add credit_cost to metadata as cost proxy until per-token pricing
2026-04-02 15:51:28 +02:00
Zamil Majdy
1a6c50c6cc feat(platform): add platform cost tracking for system credentials
Track real API costs incurred when users consume system-managed credentials.
Captures provider, tokens, duration, and model per block execution and
surfaces an admin dashboard with provider/user aggregation and raw logs.
2026-04-02 15:42:18 +02:00
242 changed files with 27506 additions and 2342 deletions

View File

@@ -95,6 +95,28 @@ Address comments **one at a time**: fix → commit → push → inline reply →
| Inline review (`pulls/{N}/comments`) | `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments/{ID}/replies -f body="🤖 Fixed in <commit-sha>: <description>"` |
| Conversation (`issues/{N}/comments`) | `gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments -f body="🤖 Fixed in <commit-sha>: <description>"` |
## Codecov coverage
Codecov patch target is **80%** on changed lines. Checks are **informational** (not blocking) but should be green.
### Running coverage locally
**Backend** (from `autogpt_platform/backend/`):
```bash
poetry run pytest -s -vv --cov=backend --cov-branch --cov-report term-missing
```
**Frontend** (from `autogpt_platform/frontend/`):
```bash
pnpm vitest run --coverage
```
### When codecov/patch fails
1. Find uncovered files: `git diff --name-only $(gh pr view --json baseRefName --jq '.baseRefName')...HEAD`
2. For each uncovered file — extract inline logic to `helpers.ts`/`helpers.py` and test those (highest ROI). Colocate tests as `*_test.py` (backend) or `__tests__/*.test.ts` (frontend).
3. Run coverage locally to verify, commit, push.
## Format and commit
After fixing, format the changed code:

View File

@@ -530,9 +530,19 @@ After showing all screenshots, output a **detailed** summary table:
# but Homebrew bash is 5.x; Linux typically has bash 5.x). If running on Bash <4, use a
# plain variable with a lookup function instead.
declare -A SCREENSHOT_EXPLANATIONS=(
["01-login-page.png"]="Shows the login page loaded successfully with SSO options visible."
["02-builder-with-block.png"]="The builder canvas displays the newly added block connected to the trigger."
# ... one entry per screenshot, using the same explanations you showed the user above
# Each explanation MUST answer three things:
# 1. FLOW: Which test scenario / user journey is this part of?
# 2. STEPS: What exact actions were taken to reach this state?
# 3. EVIDENCE: What does this screenshot prove (pass/fail/data)?
#
# Good example:
# ["03-cost-log-after-run.png"]="Flow: LLM block cost tracking. Steps: Logged in as tester@gmail.com → ran 'Cost Test Agent' → waited for COMPLETED status. Evidence: PlatformCostLog table shows 1 new row with cost_microdollars=1234 and correct user_id."
#
# Bad example (too vague — never do this):
# ["03-cost-log.png"]="Shows the cost log table."
["01-login-page.png"]="Flow: Login flow. Steps: Opened /login. Evidence: Login page renders with email/password fields and SSO options visible."
["02-builder-with-block.png"]="Flow: Block execution. Steps: Logged in → /build → added LLM block. Evidence: Builder canvas shows block connected to trigger, ready to run."
# ... one entry per screenshot using the flow/steps/evidence format above
)
TEST_RESULTS_TABLE="| 1 | Login flow | PASS | N/A | 01-login-before.png, 02-login-after.png |
@@ -547,6 +557,9 @@ Upload screenshots to the PR using the GitHub Git API (no local git operations
**This step is MANDATORY. Every test run MUST post a PR comment with screenshots. No exceptions.**
> **CRITICAL — NEVER post a bare directory link like `https://github.com/.../tree/...`.**
> Every screenshot MUST appear as `![name](raw_url)` inline in the PR comment so reviewers can see them without clicking any links. After posting, the verification step below greps the comment for `![` tags and exits 1 if none are found — the test run is considered incomplete until this passes.
```bash
# Upload screenshots via GitHub Git API (creates blobs, tree, commit, and ref remotely)
REPO="Significant-Gravitas/AutoGPT"
@@ -582,12 +595,25 @@ for img in "${SCREENSHOT_FILES[@]}"; do
done
TREE_JSON+=']'
# Step 2: Create tree, commit, and branch ref
# Step 2: Create tree, commit (with parent), and branch ref
TREE_SHA=$(echo "$TREE_JSON" | jq -c '{tree: .}' | gh api "repos/${REPO}/git/trees" --input - --jq '.sha')
COMMIT_SHA=$(gh api "repos/${REPO}/git/commits" \
-f message="test: add E2E test screenshots for PR #${PR_NUMBER}" \
-f tree="$TREE_SHA" \
--jq '.sha')
# Resolve existing branch tip as parent (avoids orphan commits on repeat runs)
PARENT_SHA=$(gh api "repos/${REPO}/git/refs/heads/${SCREENSHOTS_BRANCH}" --jq '.object.sha' 2>/dev/null || true)
if [ -n "$PARENT_SHA" ]; then
COMMIT_SHA=$(gh api "repos/${REPO}/git/commits" \
-f message="test: add E2E test screenshots for PR #${PR_NUMBER}" \
-f tree="$TREE_SHA" \
-f "parents[]=$PARENT_SHA" \
--jq '.sha')
else
# First commit on this branch — no parent
COMMIT_SHA=$(gh api "repos/${REPO}/git/commits" \
-f message="test: add E2E test screenshots for PR #${PR_NUMBER}" \
-f tree="$TREE_SHA" \
--jq '.sha')
fi
gh api "repos/${REPO}/git/refs" \
-f ref="refs/heads/${SCREENSHOTS_BRANCH}" \
-f sha="$COMMIT_SHA" 2>/dev/null \
@@ -656,17 +682,123 @@ ${IMAGE_MARKDOWN}
${FAILED_SECTION}
INNEREOF
gh api "repos/${REPO}/issues/$PR_NUMBER/comments" -F body=@"$COMMENT_FILE"
POSTED_BODY=$(gh api "repos/${REPO}/issues/$PR_NUMBER/comments" -F body=@"$COMMENT_FILE" --jq '.body')
rm -f "$COMMENT_FILE"
```
**The PR comment MUST include:**
1. A summary table of all scenarios with PASS/FAIL and before/after API evidence
2. Every successfully uploaded screenshot rendered inline; any failed uploads listed with manual attachment instructions
3. A 1-2 sentence explanation below each screenshot describing what it proves
3. A structured explanation below each screenshot covering: **Flow** (which scenario), **Steps** (exact actions taken to reach this state), **Evidence** (what this proves — pass/fail/data values). A bare "shows the page" caption is not acceptable.
This approach uses the GitHub Git API to create blobs, trees, commits, and refs entirely server-side. No local `git checkout` or `git push` — safe for worktrees and won't interfere with the PR branch.
**Verify inline rendering after posting — this is required, not optional:**
```bash
# 1. Confirm the posted comment body contains inline image markdown syntax
if ! echo "$POSTED_BODY" | grep -q '!\['; then
echo "❌ FAIL: No inline image tags in posted comment body. Re-check IMAGE_MARKDOWN and re-post."
exit 1
fi
# 2. Verify at least one raw URL actually resolves (catches wrong branch name, wrong path, etc.)
FIRST_IMG_URL=$(echo "$POSTED_BODY" | grep -o 'https://raw.githubusercontent.com[^)]*' | head -1)
if [ -n "$FIRST_IMG_URL" ]; then
HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" --max-time 10 "$FIRST_IMG_URL")
if [ "$HTTP_STATUS" = "200" ]; then
echo "✅ Inline images confirmed and raw URL resolves (HTTP 200)"
else
echo "❌ FAIL: Raw image URL returned HTTP $HTTP_STATUS — images will not render inline."
echo " URL: $FIRST_IMG_URL"
echo " Check branch name, path, and that the push succeeded."
exit 1
fi
else
echo "⚠️ Could not extract a raw URL from the comment — verify manually."
fi
```
## Step 8: Evaluate test completeness and post a GitHub review
After posting the PR comment, evaluate whether the test run actually covered everything it needed to. This is NOT a rubber-stamp — be critical. Then post a formal GitHub review so the PR author and reviewers can see the verdict.
### 8a. Evaluate against the test plan
Re-read `$RESULTS_DIR/test-plan.md` (written in Step 2) and `$RESULTS_DIR/test-report.md` (written in Step 5). For each scenario in the plan, answer:
> **Note:** `test-report.md` is written in Step 5. If it doesn't exist, write it before proceeding here — see the Step 5 template. Do not skip evaluation because the file is missing; create it from your notes instead.
| Question | Pass criteria |
|----------|--------------|
| Was it tested? | Explicit steps were executed, not just described |
| Is there screenshot evidence? | At least one before/after screenshot per scenario |
| Did the core feature work correctly? | Expected state matches actual state |
| Were negative cases tested? | At least one failure/rejection case per feature |
| Was DB/API state verified (not just UI)? | Raw API response or DB query confirms state change |
Build a verdict:
- **APPROVE** — every scenario tested, evidence present, no bugs found or all bugs are minor/known
- **REQUEST_CHANGES** — one or more: untested scenarios, missing evidence, bugs found, data not verified
### 8b. Post the GitHub review
```bash
EVAL_FILE=$(mktemp)
# === STEP A: Write header ===
cat > "$EVAL_FILE" << 'ENDEVAL'
## 🧪 Test Evaluation
### Coverage checklist
ENDEVAL
# === STEP B: Append ONE line per scenario — do this BEFORE calculating verdict ===
# Format: "- ✅ **Scenario N name**: <what was done and verified>"
# or "- ❌ **Scenario N name**: <what is missing or broken>"
# Examples:
# echo "- ✅ **Scenario 1 Login flow**: tested, screenshot evidence present, auth token verified via API" >> "$EVAL_FILE"
# echo "- ❌ **Scenario 3 Cost logging**: NOT verified in DB — UI showed entry but raw SQL query was skipped" >> "$EVAL_FILE"
#
# !!! IMPORTANT: append ALL scenario lines here before proceeding to STEP C !!!
# === STEP C: Derive verdict from the checklist — runs AFTER all lines are appended ===
FAIL_COUNT=$(grep -c "^- ❌" "$EVAL_FILE" || true)
if [ "$FAIL_COUNT" -eq 0 ]; then
VERDICT="APPROVE"
else
VERDICT="REQUEST_CHANGES"
fi
# === STEP D: Append verdict section ===
cat >> "$EVAL_FILE" << ENDVERDICT
### Verdict
ENDVERDICT
if [ "$VERDICT" = "APPROVE" ]; then
echo "✅ All scenarios covered with evidence. No blocking issues found." >> "$EVAL_FILE"
else
echo "$FAIL_COUNT scenario(s) incomplete or have confirmed bugs. See ❌ items above." >> "$EVAL_FILE"
echo "" >> "$EVAL_FILE"
echo "**Required before merge:** address each ❌ item above." >> "$EVAL_FILE"
fi
# === STEP E: Post the review ===
gh api "repos/${REPO}/pulls/$PR_NUMBER/reviews" \
--method POST \
-f body="$(cat "$EVAL_FILE")" \
-f event="$VERDICT"
rm -f "$EVAL_FILE"
```
**Rules:**
- Never auto-approve without checking every scenario in the test plan
- `REQUEST_CHANGES` if ANY scenario is untested, lacks DB/API evidence, or has a confirmed bug
- The evaluation body must list every scenario explicitly (✅ or ❌) — not just the failures
- If you find new bugs during evaluation, add them to the request-changes body and (if `--fix` flag is set) fix them before posting
## Fix mode (--fix flag)
When `--fix` is present, the standard is HIGHER. Do not just note issues — FIX them immediately.

View File

@@ -0,0 +1,224 @@
---
name: write-frontend-tests
description: "Analyze the current branch diff against dev, plan integration tests for changed frontend pages/components, and write them. TRIGGER when user asks to write frontend tests, add test coverage, or 'write tests for my changes'."
user-invocable: true
args: "[base branch] — defaults to dev. Optionally pass a specific base branch to diff against."
metadata:
author: autogpt-team
version: "1.0.0"
---
# Write Frontend Tests
Analyze the current branch's frontend changes, plan integration tests, and write them.
## References
Before writing any tests, read the testing rules and conventions:
- `autogpt_platform/frontend/TESTING.md` — testing strategy, file locations, examples
- `autogpt_platform/frontend/src/tests/AGENTS.md` — detailed testing rules, MSW patterns, decision flowchart
- `autogpt_platform/frontend/src/tests/integrations/test-utils.tsx` — custom render with providers
- `autogpt_platform/frontend/src/tests/integrations/vitest.setup.tsx` — MSW server setup
## Step 1: Identify changed frontend files
```bash
BASE_BRANCH="${ARGUMENTS:-dev}"
cd autogpt_platform/frontend
# Get changed frontend files (excluding generated, config, and test files)
git diff "$BASE_BRANCH"...HEAD --name-only -- src/ \
| grep -v '__generated__' \
| grep -v '__tests__' \
| grep -v '\.test\.' \
| grep -v '\.stories\.' \
| grep -v '\.spec\.'
```
Also read the diff to understand what changed:
```bash
git diff "$BASE_BRANCH"...HEAD --stat -- src/
git diff "$BASE_BRANCH"...HEAD -- src/ | head -500
```
## Step 2: Categorize changes and find test targets
For each changed file, determine:
1. **Is it a page?** (`page.tsx`) — these are the primary test targets
2. **Is it a hook?** (`use*.ts`) — test via the page that uses it
3. **Is it a component?** (`.tsx` in `components/`) — test via the parent page unless it's complex enough to warrant isolation
4. **Is it a helper?** (`helpers.ts`, `utils.ts`) — unit test directly if pure logic
**Priority order:**
1. Pages with new/changed data fetching or user interactions
2. Components with complex internal logic (modals, forms, wizards)
3. Hooks with non-trivial business logic
4. Pure helper functions
Skip: styling-only changes, type-only changes, config changes.
## Step 3: Check for existing tests
For each test target, check if tests already exist:
```bash
# For a page at src/app/(platform)/library/page.tsx
ls src/app/\(platform\)/library/__tests__/ 2>/dev/null
# For a component at src/app/(platform)/library/components/AgentCard/AgentCard.tsx
ls src/app/\(platform\)/library/components/AgentCard/__tests__/ 2>/dev/null
```
Note which targets have no tests (need new files) vs which have tests that need updating.
## Step 4: Identify API endpoints used
For each test target, find which API hooks are used:
```bash
# Find generated API hook imports in the changed files
grep -rn 'from.*__generated__/endpoints' src/app/\(platform\)/library/
grep -rn 'use[A-Z].*V[12]' src/app/\(platform\)/library/
```
For each API hook found, locate the corresponding MSW handler:
```bash
# If the page uses useGetV2ListLibraryAgents, find its MSW handlers
grep -rn 'getGetV2ListLibraryAgents.*Handler' src/app/api/__generated__/endpoints/library/library.msw.ts
```
List every MSW handler you will need (200 for happy path, 4xx for error paths).
## Step 5: Write the test plan
Before writing code, output a plan as a numbered list:
```
Test plan for [branch name]:
1. src/app/(platform)/library/__tests__/main.test.tsx (NEW)
- Renders page with agent list (MSW 200)
- Shows loading state
- Shows error state (MSW 422)
- Handles empty agent list
2. src/app/(platform)/library/__tests__/search.test.tsx (NEW)
- Filters agents by search query
- Shows no results message
- Clears search
3. src/app/(platform)/library/components/AgentCard/__tests__/AgentCard.test.tsx (UPDATE)
- Add test for new "duplicate" action
```
Present this plan to the user. Wait for confirmation before proceeding. If the user has feedback, adjust the plan.
## Step 6: Write the tests
For each test file in the plan, follow these conventions:
### File structure
```tsx
import { render, screen, waitFor } from "@/tests/integrations/test-utils";
import { server } from "@/mocks/mock-server";
// Import MSW handlers for endpoints the page uses
import {
getGetV2ListLibraryAgentsMockHandler200,
getGetV2ListLibraryAgentsMockHandler422,
} from "@/app/api/__generated__/endpoints/library/library.msw";
// Import the component under test
import LibraryPage from "../page";
describe("LibraryPage", () => {
test("renders agent list from API", async () => {
server.use(getGetV2ListLibraryAgentsMockHandler200());
render(<LibraryPage />);
expect(await screen.findByText(/my agents/i)).toBeDefined();
});
test("shows error state on API failure", async () => {
server.use(getGetV2ListLibraryAgentsMockHandler422());
render(<LibraryPage />);
expect(await screen.findByText(/error/i)).toBeDefined();
});
});
```
### Rules
- Use `render()` from `@/tests/integrations/test-utils` (NOT from `@testing-library/react` directly)
- Use `server.use()` to set up MSW handlers BEFORE rendering
- Use `findBy*` (async) for elements that appear after data fetching — NOT `getBy*`
- Use `getBy*` only for elements that are immediately present in the DOM
- Use `screen` queries — do NOT destructure from `render()`
- Use `waitFor` when asserting side effects or state changes after interactions
- Import `fireEvent` or `userEvent` from the test-utils for interactions
- Do NOT mock internal hooks or functions — mock at the API boundary via MSW
- Do NOT use `act()` manually — `render` and `fireEvent` handle it
- Keep tests focused: one behavior per test
- Use descriptive test names that read like sentences
### Test location
```
# For pages: __tests__/ next to page.tsx
src/app/(platform)/library/__tests__/main.test.tsx
# For complex standalone components: __tests__/ inside component folder
src/app/(platform)/library/components/AgentCard/__tests__/AgentCard.test.tsx
# For pure helpers: co-located .test.ts
src/app/(platform)/library/helpers.test.ts
```
### Custom MSW overrides
When the auto-generated faker data is not enough, override with specific data:
```tsx
import { http, HttpResponse } from "msw";
server.use(
http.get("http://localhost:3000/api/proxy/api/v2/library/agents", () => {
return HttpResponse.json({
agents: [
{ id: "1", name: "Test Agent", description: "A test agent" },
],
pagination: { total_items: 1, total_pages: 1, page: 1, page_size: 10 },
});
}),
);
```
Use the proxy URL pattern: `http://localhost:3000/api/proxy/api/v{version}/{path}` — this matches the MSW base URL configured in `orval.config.ts`.
## Step 7: Run and verify
After writing all tests:
```bash
cd autogpt_platform/frontend
pnpm test:unit --reporter=verbose
```
If tests fail:
1. Read the error output carefully
2. Fix the test (not the source code, unless there is a genuine bug)
3. Re-run until all pass
Then run the full checks:
```bash
pnpm format
pnpm lint
pnpm types
```

View File

@@ -179,21 +179,30 @@ jobs:
pip install pyyaml
# Resolve extends and generate a flat compose file that bake can understand
export NEXT_PUBLIC_SOURCEMAPS NEXT_PUBLIC_PW_TEST
docker compose -f docker-compose.yml config > docker-compose.resolved.yml
# Ensure NEXT_PUBLIC_SOURCEMAPS is in resolved compose
# (docker compose config on some versions drops this arg)
if ! grep -q "NEXT_PUBLIC_SOURCEMAPS" docker-compose.resolved.yml; then
echo "Injecting NEXT_PUBLIC_SOURCEMAPS into resolved compose (docker compose config dropped it)"
sed -i '/NEXT_PUBLIC_PW_TEST/a\ NEXT_PUBLIC_SOURCEMAPS: "true"' docker-compose.resolved.yml
fi
# Add cache configuration to the resolved compose file
python ../.github/workflows/scripts/docker-ci-fix-compose-build-cache.py \
--source docker-compose.resolved.yml \
--cache-from "type=gha" \
--cache-to "type=gha,mode=max" \
--backend-hash "${{ hashFiles('autogpt_platform/backend/Dockerfile', 'autogpt_platform/backend/poetry.lock', 'autogpt_platform/backend/backend/**') }}" \
--frontend-hash "${{ hashFiles('autogpt_platform/frontend/Dockerfile', 'autogpt_platform/frontend/pnpm-lock.yaml', 'autogpt_platform/frontend/src/**') }}" \
--frontend-hash "${{ hashFiles('autogpt_platform/frontend/Dockerfile', 'autogpt_platform/frontend/pnpm-lock.yaml', 'autogpt_platform/frontend/src/**') }}-sourcemaps" \
--git-ref "${{ github.ref }}"
# Build with bake using the resolved compose file (now includes cache config)
docker buildx bake --allow=fs.read=.. -f docker-compose.resolved.yml --load
env:
NEXT_PUBLIC_PW_TEST: true
NEXT_PUBLIC_SOURCEMAPS: true
- name: Set up tests - Cache E2E test data
id: e2e-data-cache
@@ -279,6 +288,11 @@ jobs:
cache: "pnpm"
cache-dependency-path: autogpt_platform/frontend/pnpm-lock.yaml
- name: Copy source maps from Docker for E2E coverage
run: |
FRONTEND_CONTAINER=$(docker compose -f ../docker-compose.resolved.yml ps -q frontend)
docker cp "$FRONTEND_CONTAINER":/app/.next/static .next-static-coverage
- name: Set up tests - Install dependencies
run: pnpm install --frozen-lockfile
@@ -289,6 +303,15 @@ jobs:
run: pnpm test:no-build
continue-on-error: false
- name: Upload E2E coverage to Codecov
if: ${{ !cancelled() }}
uses: codecov/codecov-action@v5
with:
token: ${{ secrets.CODECOV_TOKEN }}
flags: platform-frontend-e2e
files: ./autogpt_platform/frontend/coverage/e2e/cobertura-coverage.xml
disable_search: true
- name: Upload Playwright report
if: always()
uses: actions/upload-artifact@v4

36
.gitleaks.toml Normal file
View File

@@ -0,0 +1,36 @@
title = "AutoGPT Gitleaks Config"
[extend]
useDefault = true
[allowlist]
description = "Global allowlist"
paths = [
# Template/example env files (no real secrets)
'''\.env\.(default|example|template)$''',
# Lock files
'''pnpm-lock\.yaml$''',
'''poetry\.lock$''',
# Secrets baseline
'''\.secrets\.baseline$''',
# Build artifacts and caches (should not be committed)
'''__pycache__/''',
'''classic/frontend/build/''',
# Docker dev setup (local dev JWTs/keys only)
'''autogpt_platform/db/docker/''',
# Load test configs (dev JWTs)
'''load-tests/configs/''',
# Test files with fake/fixture keys (_test.py, test_*.py, conftest.py)
'''(_test|test_.*|conftest)\.py$''',
# Documentation (only contains placeholder keys in curl/API examples)
'''docs/.*\.md$''',
# Firebase config (public API keys by design)
'''google-services\.json$''',
'''classic/frontend/(lib|web)/''',
]
# CI test-only encryption key (marked DO NOT USE IN PRODUCTION)
regexes = [
'''dvziYgz0KSK8FENhju0ZYi8''',
# LLM model name enum values falsely flagged as API keys
'''Llama-\d.*Instruct''',
]

View File

@@ -23,9 +23,15 @@ repos:
- id: detect-secrets
name: Detect secrets
description: Detects high entropy strings that are likely to be passwords.
args: ["--baseline", ".secrets.baseline"]
files: ^autogpt_platform/
exclude: pnpm-lock\.yaml$
stages: [pre-push]
exclude: (pnpm-lock\.yaml|\.env\.(default|example|template))$
- repo: https://github.com/gitleaks/gitleaks
rev: v8.24.3
hooks:
- id: gitleaks
name: Detect secrets (gitleaks)
- repo: local
# For proper type checking, all dependencies need to be up-to-date.

467
.secrets.baseline Normal file
View File

@@ -0,0 +1,467 @@
{
"version": "1.5.0",
"plugins_used": [
{
"name": "ArtifactoryDetector"
},
{
"name": "AWSKeyDetector"
},
{
"name": "AzureStorageKeyDetector"
},
{
"name": "Base64HighEntropyString",
"limit": 4.5
},
{
"name": "BasicAuthDetector"
},
{
"name": "CloudantDetector"
},
{
"name": "DiscordBotTokenDetector"
},
{
"name": "GitHubTokenDetector"
},
{
"name": "GitLabTokenDetector"
},
{
"name": "HexHighEntropyString",
"limit": 3.0
},
{
"name": "IbmCloudIamDetector"
},
{
"name": "IbmCosHmacDetector"
},
{
"name": "IPPublicDetector"
},
{
"name": "JwtTokenDetector"
},
{
"name": "KeywordDetector",
"keyword_exclude": ""
},
{
"name": "MailchimpDetector"
},
{
"name": "NpmDetector"
},
{
"name": "OpenAIDetector"
},
{
"name": "PrivateKeyDetector"
},
{
"name": "PypiTokenDetector"
},
{
"name": "SendGridDetector"
},
{
"name": "SlackDetector"
},
{
"name": "SoftlayerDetector"
},
{
"name": "SquareOAuthDetector"
},
{
"name": "StripeDetector"
},
{
"name": "TelegramBotTokenDetector"
},
{
"name": "TwilioKeyDetector"
}
],
"filters_used": [
{
"path": "detect_secrets.filters.allowlist.is_line_allowlisted"
},
{
"path": "detect_secrets.filters.common.is_ignored_due_to_verification_policies",
"min_level": 2
},
{
"path": "detect_secrets.filters.heuristic.is_indirect_reference"
},
{
"path": "detect_secrets.filters.heuristic.is_likely_id_string"
},
{
"path": "detect_secrets.filters.heuristic.is_lock_file"
},
{
"path": "detect_secrets.filters.heuristic.is_not_alphanumeric_string"
},
{
"path": "detect_secrets.filters.heuristic.is_potential_uuid"
},
{
"path": "detect_secrets.filters.heuristic.is_prefixed_with_dollar_sign"
},
{
"path": "detect_secrets.filters.heuristic.is_sequential_string"
},
{
"path": "detect_secrets.filters.heuristic.is_swagger_file"
},
{
"path": "detect_secrets.filters.heuristic.is_templated_secret"
},
{
"path": "detect_secrets.filters.regex.should_exclude_file",
"pattern": [
"\\.env$",
"pnpm-lock\\.yaml$",
"\\.env\\.(default|example|template)$",
"__pycache__",
"_test\\.py$",
"test_.*\\.py$",
"conftest\\.py$",
"poetry\\.lock$",
"node_modules"
]
}
],
"results": {
"autogpt_platform/backend/backend/api/external/v1/integrations.py": [
{
"type": "Secret Keyword",
"filename": "autogpt_platform/backend/backend/api/external/v1/integrations.py",
"hashed_secret": "665b1e3851eefefa3fb878654292f16597d25155",
"is_verified": false,
"line_number": 289
}
],
"autogpt_platform/backend/backend/blocks/airtable/_config.py": [
{
"type": "Secret Keyword",
"filename": "autogpt_platform/backend/backend/blocks/airtable/_config.py",
"hashed_secret": "57e168b03afb7c1ee3cdc4ee3db2fe1cc6e0df26",
"is_verified": false,
"line_number": 29
}
],
"autogpt_platform/backend/backend/blocks/dataforseo/_config.py": [
{
"type": "Secret Keyword",
"filename": "autogpt_platform/backend/backend/blocks/dataforseo/_config.py",
"hashed_secret": "32ce93887331fa5d192f2876ea15ec000c7d58b8",
"is_verified": false,
"line_number": 12
}
],
"autogpt_platform/backend/backend/blocks/github/checks.py": [
{
"type": "Hex High Entropy String",
"filename": "autogpt_platform/backend/backend/blocks/github/checks.py",
"hashed_secret": "8ac6f92737d8586790519c5d7bfb4d2eb172c238",
"is_verified": false,
"line_number": 108
}
],
"autogpt_platform/backend/backend/blocks/github/ci.py": [
{
"type": "Hex High Entropy String",
"filename": "autogpt_platform/backend/backend/blocks/github/ci.py",
"hashed_secret": "90bd1b48e958257948487b90bee080ba5ed00caa",
"is_verified": false,
"line_number": 123
}
],
"autogpt_platform/backend/backend/blocks/github/example_payloads/pull_request.synchronize.json": [
{
"type": "Hex High Entropy String",
"filename": "autogpt_platform/backend/backend/blocks/github/example_payloads/pull_request.synchronize.json",
"hashed_secret": "f96896dafced7387dcd22343b8ea29d3d2c65663",
"is_verified": false,
"line_number": 42
},
{
"type": "Hex High Entropy String",
"filename": "autogpt_platform/backend/backend/blocks/github/example_payloads/pull_request.synchronize.json",
"hashed_secret": "b80a94d5e70bedf4f5f89d2f5a5255cc9492d12e",
"is_verified": false,
"line_number": 193
},
{
"type": "Hex High Entropy String",
"filename": "autogpt_platform/backend/backend/blocks/github/example_payloads/pull_request.synchronize.json",
"hashed_secret": "75b17e517fe1b3136394f6bec80c4f892da75e42",
"is_verified": false,
"line_number": 344
},
{
"type": "Hex High Entropy String",
"filename": "autogpt_platform/backend/backend/blocks/github/example_payloads/pull_request.synchronize.json",
"hashed_secret": "b0bfb5e4e2394e7f8906e5ed1dffd88b2bc89dd5",
"is_verified": false,
"line_number": 534
}
],
"autogpt_platform/backend/backend/blocks/github/statuses.py": [
{
"type": "Hex High Entropy String",
"filename": "autogpt_platform/backend/backend/blocks/github/statuses.py",
"hashed_secret": "8ac6f92737d8586790519c5d7bfb4d2eb172c238",
"is_verified": false,
"line_number": 85
}
],
"autogpt_platform/backend/backend/blocks/google/docs.py": [
{
"type": "Hex High Entropy String",
"filename": "autogpt_platform/backend/backend/blocks/google/docs.py",
"hashed_secret": "c95da0c6696342c867ef0c8258d2f74d20fd94d4",
"is_verified": false,
"line_number": 203
}
],
"autogpt_platform/backend/backend/blocks/google/sheets.py": [
{
"type": "Base64 High Entropy String",
"filename": "autogpt_platform/backend/backend/blocks/google/sheets.py",
"hashed_secret": "bd5a04fa3667e693edc13239b6d310c5c7a8564b",
"is_verified": false,
"line_number": 57
}
],
"autogpt_platform/backend/backend/blocks/linear/_config.py": [
{
"type": "Secret Keyword",
"filename": "autogpt_platform/backend/backend/blocks/linear/_config.py",
"hashed_secret": "b37f020f42d6d613b6ce30103e4d408c4499b3bb",
"is_verified": false,
"line_number": 53
}
],
"autogpt_platform/backend/backend/blocks/medium.py": [
{
"type": "Hex High Entropy String",
"filename": "autogpt_platform/backend/backend/blocks/medium.py",
"hashed_secret": "ff998abc1ce6d8f01a675fa197368e44c8916e9c",
"is_verified": false,
"line_number": 131
}
],
"autogpt_platform/backend/backend/blocks/replicate/replicate_block.py": [
{
"type": "Hex High Entropy String",
"filename": "autogpt_platform/backend/backend/blocks/replicate/replicate_block.py",
"hashed_secret": "8bbdd6f26368f58ea4011d13d7f763cb662e66f0",
"is_verified": false,
"line_number": 55
}
],
"autogpt_platform/backend/backend/blocks/slant3d/webhook.py": [
{
"type": "Hex High Entropy String",
"filename": "autogpt_platform/backend/backend/blocks/slant3d/webhook.py",
"hashed_secret": "36263c76947443b2f6e6b78153967ac4a7da99f9",
"is_verified": false,
"line_number": 100
}
],
"autogpt_platform/backend/backend/blocks/talking_head.py": [
{
"type": "Base64 High Entropy String",
"filename": "autogpt_platform/backend/backend/blocks/talking_head.py",
"hashed_secret": "44ce2d66222529eea4a32932823466fc0601c799",
"is_verified": false,
"line_number": 113
}
],
"autogpt_platform/backend/backend/blocks/wordpress/_config.py": [
{
"type": "Secret Keyword",
"filename": "autogpt_platform/backend/backend/blocks/wordpress/_config.py",
"hashed_secret": "e62679512436161b78e8a8d68c8829c2a1031ccb",
"is_verified": false,
"line_number": 17
}
],
"autogpt_platform/backend/backend/util/cache.py": [
{
"type": "Secret Keyword",
"filename": "autogpt_platform/backend/backend/util/cache.py",
"hashed_secret": "37f0c918c3fa47ca4a70e42037f9f123fdfbc75b",
"is_verified": false,
"line_number": 449
}
],
"autogpt_platform/frontend/src/app/(platform)/build/components/FlowEditor/nodes/helpers.ts": [
{
"type": "Secret Keyword",
"filename": "autogpt_platform/frontend/src/app/(platform)/build/components/FlowEditor/nodes/helpers.ts",
"hashed_secret": "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8",
"is_verified": false,
"line_number": 6
}
],
"autogpt_platform/frontend/src/app/(platform)/dictionaries/en.json": [
{
"type": "Secret Keyword",
"filename": "autogpt_platform/frontend/src/app/(platform)/dictionaries/en.json",
"hashed_secret": "8be3c943b1609fffbfc51aad666d0a04adf83c9d",
"is_verified": false,
"line_number": 5
}
],
"autogpt_platform/frontend/src/app/(platform)/dictionaries/es.json": [
{
"type": "Secret Keyword",
"filename": "autogpt_platform/frontend/src/app/(platform)/dictionaries/es.json",
"hashed_secret": "5a6d1c612954979ea99ee33dbb2d231b00f6ac0a",
"is_verified": false,
"line_number": 5
}
],
"autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/AgentInputsReadOnly/helpers.ts": [
{
"type": "Secret Keyword",
"filename": "autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/AgentInputsReadOnly/helpers.ts",
"hashed_secret": "cf678cab87dc1f7d1b95b964f15375e088461679",
"is_verified": false,
"line_number": 6
},
{
"type": "Secret Keyword",
"filename": "autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/AgentInputsReadOnly/helpers.ts",
"hashed_secret": "f72cbb45464d487064610c5411c576ca4019d380",
"is_verified": false,
"line_number": 8
}
],
"autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/RunAgentModal/components/ModalRunSection/helpers.ts": [
{
"type": "Secret Keyword",
"filename": "autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/RunAgentModal/components/ModalRunSection/helpers.ts",
"hashed_secret": "cf678cab87dc1f7d1b95b964f15375e088461679",
"is_verified": false,
"line_number": 5
},
{
"type": "Secret Keyword",
"filename": "autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/RunAgentModal/components/ModalRunSection/helpers.ts",
"hashed_secret": "f72cbb45464d487064610c5411c576ca4019d380",
"is_verified": false,
"line_number": 7
}
],
"autogpt_platform/frontend/src/app/(platform)/profile/(user)/integrations/page.tsx": [
{
"type": "Secret Keyword",
"filename": "autogpt_platform/frontend/src/app/(platform)/profile/(user)/integrations/page.tsx",
"hashed_secret": "cf678cab87dc1f7d1b95b964f15375e088461679",
"is_verified": false,
"line_number": 192
},
{
"type": "Secret Keyword",
"filename": "autogpt_platform/frontend/src/app/(platform)/profile/(user)/integrations/page.tsx",
"hashed_secret": "86275db852204937bbdbdebe5fabe8536e030ab6",
"is_verified": false,
"line_number": 193
}
],
"autogpt_platform/frontend/src/components/contextual/CredentialsInput/helpers.ts": [
{
"type": "Secret Keyword",
"filename": "autogpt_platform/frontend/src/components/contextual/CredentialsInput/helpers.ts",
"hashed_secret": "47acd2028cf81b5da88ddeedb2aea4eca4b71fbd",
"is_verified": false,
"line_number": 102
},
{
"type": "Secret Keyword",
"filename": "autogpt_platform/frontend/src/components/contextual/CredentialsInput/helpers.ts",
"hashed_secret": "8be3c943b1609fffbfc51aad666d0a04adf83c9d",
"is_verified": false,
"line_number": 103
}
],
"autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts": [
{
"type": "Base64 High Entropy String",
"filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
"hashed_secret": "9c486c92f1a7420e1045c7ad963fbb7ba3621025",
"is_verified": false,
"line_number": 73
},
{
"type": "Base64 High Entropy String",
"filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
"hashed_secret": "9277508c7a6effc8fb59163efbfada189e35425c",
"is_verified": false,
"line_number": 75
},
{
"type": "Base64 High Entropy String",
"filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
"hashed_secret": "8dc7e2cb1d0935897d541bf5facab389b8a50340",
"is_verified": false,
"line_number": 77
},
{
"type": "Base64 High Entropy String",
"filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
"hashed_secret": "79a26ad48775944299be6aaf9fb1d5302c1ed75b",
"is_verified": false,
"line_number": 79
},
{
"type": "Base64 High Entropy String",
"filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
"hashed_secret": "a3b62b44500a1612e48d4cab8294df81561b3b1a",
"is_verified": false,
"line_number": 81
},
{
"type": "Base64 High Entropy String",
"filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
"hashed_secret": "a58979bd0b21ef4f50417d001008e60dd7a85c64",
"is_verified": false,
"line_number": 83
},
{
"type": "Base64 High Entropy String",
"filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
"hashed_secret": "6cb6e075f8e8c7c850f9d128d6608e5dbe209a79",
"is_verified": false,
"line_number": 85
}
],
"autogpt_platform/frontend/src/lib/constants.ts": [
{
"type": "Secret Keyword",
"filename": "autogpt_platform/frontend/src/lib/constants.ts",
"hashed_secret": "27b924db06a28cc755fb07c54f0fddc30659fe4d",
"is_verified": false,
"line_number": 10
}
],
"autogpt_platform/frontend/src/tests/credentials/index.ts": [
{
"type": "Secret Keyword",
"filename": "autogpt_platform/frontend/src/tests/credentials/index.ts",
"hashed_secret": "c18006fc138809314751cd1991f1e0b820fabd37",
"is_verified": false,
"line_number": 4
}
]
},
"generated_at": "2026-04-02T13:10:54Z"
}

View File

@@ -30,7 +30,7 @@ See `/frontend/CONTRIBUTING.md` for complete patterns. Quick reference:
- Regenerate with `pnpm generate:api`
- Pattern: `use{Method}{Version}{OperationName}`
4. **Styling**: Tailwind CSS only, use design tokens, Phosphor Icons only
5. **Testing**: Add Storybook stories for new components, Playwright for E2E
5. **Testing**: Integration tests (Vitest + RTL + MSW) are the default (~90%, page-level). Playwright for E2E critical flows. Storybook for design system components. See `autogpt_platform/frontend/TESTING.md`
6. **Code conventions**: Function declarations (not arrow functions) for components/handlers
- Component props should be `interface Props { ... }` (not exported) unless the interface needs to be used outside the component
@@ -47,7 +47,9 @@ See `/frontend/CONTRIBUTING.md` for complete patterns. Quick reference:
## Testing
- Backend: `poetry run test` (runs pytest with a docker based postgres + prisma).
- Frontend: `pnpm test` or `pnpm test-ui` for Playwright tests. See `docs/content/platform/contributing/tests.md` for tips.
- Frontend integration tests: `pnpm test:unit` (Vitest + RTL + MSW, primary testing approach).
- Frontend E2E tests: `pnpm test` or `pnpm test-ui` for Playwright tests.
- See `autogpt_platform/frontend/TESTING.md` for the full testing strategy.
Always run the relevant linters and tests before committing.
Use conventional commit messages for all commits (e.g. `feat(backend): add API`).

View File

@@ -0,0 +1,98 @@
import logging
from datetime import datetime
from autogpt_libs.auth import get_user_id, requires_admin_user
from cachetools import TTLCache
from fastapi import APIRouter, Query, Security
from pydantic import BaseModel
from backend.data.platform_cost import (
CostLogRow,
PlatformCostDashboard,
get_platform_cost_dashboard,
get_platform_cost_logs,
)
from backend.util.models import Pagination
logger = logging.getLogger(__name__)
# Cache dashboard results for 30 seconds per unique filter combination.
# The table is append-only so stale reads are acceptable for analytics.
_DASHBOARD_CACHE_TTL = 30
_dashboard_cache: TTLCache[tuple, PlatformCostDashboard] = TTLCache(
maxsize=256, ttl=_DASHBOARD_CACHE_TTL
)
router = APIRouter(
prefix="/platform-costs",
tags=["platform-cost", "admin"],
dependencies=[Security(requires_admin_user)],
)
class PlatformCostLogsResponse(BaseModel):
logs: list[CostLogRow]
pagination: Pagination
@router.get(
"/dashboard",
response_model=PlatformCostDashboard,
summary="Get Platform Cost Dashboard",
)
async def get_cost_dashboard(
admin_user_id: str = Security(get_user_id),
start: datetime | None = Query(None),
end: datetime | None = Query(None),
provider: str | None = Query(None),
user_id: str | None = Query(None),
):
logger.info("Admin %s fetching platform cost dashboard", admin_user_id)
cache_key = (start, end, provider, user_id)
cached = _dashboard_cache.get(cache_key)
if cached is not None:
return cached
result = await get_platform_cost_dashboard(
start=start,
end=end,
provider=provider,
user_id=user_id,
)
_dashboard_cache[cache_key] = result
return result
@router.get(
"/logs",
response_model=PlatformCostLogsResponse,
summary="Get Platform Cost Logs",
)
async def get_cost_logs(
admin_user_id: str = Security(get_user_id),
start: datetime | None = Query(None),
end: datetime | None = Query(None),
provider: str | None = Query(None),
user_id: str | None = Query(None),
page: int = Query(1, ge=1),
page_size: int = Query(50, ge=1, le=200),
):
logger.info("Admin %s fetching platform cost logs", admin_user_id)
logs, total = await get_platform_cost_logs(
start=start,
end=end,
provider=provider,
user_id=user_id,
page=page,
page_size=page_size,
)
total_pages = (total + page_size - 1) // page_size
return PlatformCostLogsResponse(
logs=logs,
pagination=Pagination(
total_items=total,
total_pages=total_pages,
current_page=page,
page_size=page_size,
),
)

View File

@@ -0,0 +1,192 @@
from unittest.mock import AsyncMock
import fastapi
import fastapi.testclient
import pytest
import pytest_mock
from autogpt_libs.auth.jwt_utils import get_jwt_payload
from backend.data.platform_cost import PlatformCostDashboard
from . import platform_cost_routes
from .platform_cost_routes import router as platform_cost_router
app = fastapi.FastAPI()
app.include_router(platform_cost_router)
client = fastapi.testclient.TestClient(app)
@pytest.fixture(autouse=True)
def setup_app_admin_auth(mock_jwt_admin):
"""Setup admin auth overrides for all tests in this module"""
app.dependency_overrides[get_jwt_payload] = mock_jwt_admin["get_jwt_payload"]
# Clear TTL cache so each test starts cold.
platform_cost_routes._dashboard_cache.clear()
yield
app.dependency_overrides.clear()
def test_get_dashboard_success(
mocker: pytest_mock.MockerFixture,
) -> None:
real_dashboard = PlatformCostDashboard(
by_provider=[],
by_user=[],
total_cost_microdollars=0,
total_requests=0,
total_users=0,
)
mocker.patch(
"backend.api.features.admin.platform_cost_routes.get_platform_cost_dashboard",
AsyncMock(return_value=real_dashboard),
)
response = client.get("/platform-costs/dashboard")
assert response.status_code == 200
data = response.json()
assert "by_provider" in data
assert "by_user" in data
assert data["total_cost_microdollars"] == 0
def test_get_logs_success(
mocker: pytest_mock.MockerFixture,
) -> None:
mocker.patch(
"backend.api.features.admin.platform_cost_routes.get_platform_cost_logs",
AsyncMock(return_value=([], 0)),
)
response = client.get("/platform-costs/logs")
assert response.status_code == 200
data = response.json()
assert data["logs"] == []
assert data["pagination"]["total_items"] == 0
def test_get_dashboard_with_filters(
mocker: pytest_mock.MockerFixture,
) -> None:
real_dashboard = PlatformCostDashboard(
by_provider=[],
by_user=[],
total_cost_microdollars=0,
total_requests=0,
total_users=0,
)
mock_dashboard = AsyncMock(return_value=real_dashboard)
mocker.patch(
"backend.api.features.admin.platform_cost_routes.get_platform_cost_dashboard",
mock_dashboard,
)
response = client.get(
"/platform-costs/dashboard",
params={
"start": "2026-01-01T00:00:00",
"end": "2026-04-01T00:00:00",
"provider": "openai",
"user_id": "test-user-123",
},
)
assert response.status_code == 200
mock_dashboard.assert_called_once()
call_kwargs = mock_dashboard.call_args.kwargs
assert call_kwargs["provider"] == "openai"
assert call_kwargs["user_id"] == "test-user-123"
assert call_kwargs["start"] is not None
assert call_kwargs["end"] is not None
def test_get_logs_with_pagination(
mocker: pytest_mock.MockerFixture,
) -> None:
mocker.patch(
"backend.api.features.admin.platform_cost_routes.get_platform_cost_logs",
AsyncMock(return_value=([], 0)),
)
response = client.get(
"/platform-costs/logs",
params={"page": 2, "page_size": 25, "provider": "anthropic"},
)
assert response.status_code == 200
data = response.json()
assert data["pagination"]["current_page"] == 2
assert data["pagination"]["page_size"] == 25
def test_get_dashboard_requires_admin() -> None:
import fastapi
from fastapi import HTTPException
def reject_jwt(request: fastapi.Request):
raise HTTPException(status_code=401, detail="Not authenticated")
app.dependency_overrides[get_jwt_payload] = reject_jwt
try:
response = client.get("/platform-costs/dashboard")
assert response.status_code == 401
response = client.get("/platform-costs/logs")
assert response.status_code == 401
finally:
app.dependency_overrides.clear()
def test_get_dashboard_rejects_non_admin(mock_jwt_user, mock_jwt_admin) -> None:
"""Non-admin JWT must be rejected with 403 by requires_admin_user."""
app.dependency_overrides[get_jwt_payload] = mock_jwt_user["get_jwt_payload"]
try:
response = client.get("/platform-costs/dashboard")
assert response.status_code == 403
response = client.get("/platform-costs/logs")
assert response.status_code == 403
finally:
app.dependency_overrides[get_jwt_payload] = mock_jwt_admin["get_jwt_payload"]
def test_get_logs_invalid_page_size_too_large() -> None:
"""page_size > 200 must be rejected with 422."""
response = client.get("/platform-costs/logs", params={"page_size": 201})
assert response.status_code == 422
def test_get_logs_invalid_page_size_zero() -> None:
"""page_size = 0 (below ge=1) must be rejected with 422."""
response = client.get("/platform-costs/logs", params={"page_size": 0})
assert response.status_code == 422
def test_get_logs_invalid_page_negative() -> None:
"""page < 1 must be rejected with 422."""
response = client.get("/platform-costs/logs", params={"page": 0})
assert response.status_code == 422
def test_get_dashboard_invalid_date_format() -> None:
"""Malformed start date must be rejected with 422."""
response = client.get("/platform-costs/dashboard", params={"start": "not-a-date"})
assert response.status_code == 422
def test_get_dashboard_cache_hit(
mocker: pytest_mock.MockerFixture,
) -> None:
"""Second identical request returns cached result without calling the DB again."""
real_dashboard = PlatformCostDashboard(
by_provider=[],
by_user=[],
total_cost_microdollars=42,
total_requests=1,
total_users=1,
)
mock_fn = mocker.patch(
"backend.api.features.admin.platform_cost_routes.get_platform_cost_dashboard",
AsyncMock(return_value=real_dashboard),
)
client.get("/platform-costs/dashboard")
client.get("/platform-costs/dashboard")
mock_fn.assert_awaited_once() # second request hit the cache

View File

@@ -9,11 +9,14 @@ from pydantic import BaseModel
from backend.copilot.config import ChatConfig
from backend.copilot.rate_limit import (
SubscriptionTier,
get_global_rate_limits,
get_usage_status,
get_user_tier,
reset_user_usage,
set_user_tier,
)
from backend.data.user import get_user_by_email, get_user_email_by_id
from backend.data.user import get_user_by_email, get_user_email_by_id, search_users
logger = logging.getLogger(__name__)
@@ -33,6 +36,17 @@ class UserRateLimitResponse(BaseModel):
weekly_token_limit: int
daily_tokens_used: int
weekly_tokens_used: int
tier: SubscriptionTier
class UserTierResponse(BaseModel):
user_id: str
tier: SubscriptionTier
class SetUserTierRequest(BaseModel):
user_id: str
tier: SubscriptionTier
async def _resolve_user_id(
@@ -86,10 +100,10 @@ async def get_user_rate_limit(
logger.info("Admin %s checking rate limit for user %s", admin_user_id, resolved_id)
daily_limit, weekly_limit = await get_global_rate_limits(
daily_limit, weekly_limit, tier = await get_global_rate_limits(
resolved_id, config.daily_token_limit, config.weekly_token_limit
)
usage = await get_usage_status(resolved_id, daily_limit, weekly_limit)
usage = await get_usage_status(resolved_id, daily_limit, weekly_limit, tier=tier)
return UserRateLimitResponse(
user_id=resolved_id,
@@ -98,6 +112,7 @@ async def get_user_rate_limit(
weekly_token_limit=weekly_limit,
daily_tokens_used=usage.daily.used,
weekly_tokens_used=usage.weekly.used,
tier=tier,
)
@@ -125,10 +140,10 @@ async def reset_user_rate_limit(
logger.exception("Failed to reset user usage")
raise HTTPException(status_code=500, detail="Failed to reset usage") from e
daily_limit, weekly_limit = await get_global_rate_limits(
daily_limit, weekly_limit, tier = await get_global_rate_limits(
user_id, config.daily_token_limit, config.weekly_token_limit
)
usage = await get_usage_status(user_id, daily_limit, weekly_limit)
usage = await get_usage_status(user_id, daily_limit, weekly_limit, tier=tier)
try:
resolved_email = await get_user_email_by_id(user_id)
@@ -143,4 +158,102 @@ async def reset_user_rate_limit(
weekly_token_limit=weekly_limit,
daily_tokens_used=usage.daily.used,
weekly_tokens_used=usage.weekly.used,
tier=tier,
)
@router.get(
"/rate_limit/tier",
response_model=UserTierResponse,
summary="Get User Rate Limit Tier",
)
async def get_user_rate_limit_tier(
user_id: str,
admin_user_id: str = Security(get_user_id),
) -> UserTierResponse:
"""Get a user's current rate-limit tier. Admin-only.
Returns 404 if the user does not exist in the database.
"""
logger.info("Admin %s checking tier for user %s", admin_user_id, user_id)
resolved_email = await get_user_email_by_id(user_id)
if resolved_email is None:
raise HTTPException(status_code=404, detail=f"User {user_id} not found")
tier = await get_user_tier(user_id)
return UserTierResponse(user_id=user_id, tier=tier)
@router.post(
"/rate_limit/tier",
response_model=UserTierResponse,
summary="Set User Rate Limit Tier",
)
async def set_user_rate_limit_tier(
request: SetUserTierRequest,
admin_user_id: str = Security(get_user_id),
) -> UserTierResponse:
"""Set a user's rate-limit tier. Admin-only.
Returns 404 if the user does not exist in the database.
"""
try:
resolved_email = await get_user_email_by_id(request.user_id)
except Exception:
logger.warning(
"Failed to resolve email for user %s",
request.user_id,
exc_info=True,
)
resolved_email = None
if resolved_email is None:
raise HTTPException(status_code=404, detail=f"User {request.user_id} not found")
old_tier = await get_user_tier(request.user_id)
logger.info(
"Admin %s changing tier for user %s (%s): %s -> %s",
admin_user_id,
request.user_id,
resolved_email,
old_tier.value,
request.tier.value,
)
try:
await set_user_tier(request.user_id, request.tier)
except Exception as e:
logger.exception("Failed to set user tier")
raise HTTPException(status_code=500, detail="Failed to set tier") from e
return UserTierResponse(user_id=request.user_id, tier=request.tier)
class UserSearchResult(BaseModel):
user_id: str
user_email: Optional[str] = None
@router.get(
"/rate_limit/search_users",
response_model=list[UserSearchResult],
summary="Search Users by Name or Email",
)
async def admin_search_users(
query: str,
limit: int = 20,
admin_user_id: str = Security(get_user_id),
) -> list[UserSearchResult]:
"""Search users by partial email or name. Admin-only.
Queries the User table directly — returns results even for users
without credit transaction history.
"""
if len(query.strip()) < 3:
raise HTTPException(
status_code=400,
detail="Search query must be at least 3 characters.",
)
logger.info("Admin %s searching users with query=%r", admin_user_id, query)
results = await search_users(query, limit=max(1, min(limit, 50)))
return [UserSearchResult(user_id=uid, user_email=email) for uid, email in results]

View File

@@ -9,7 +9,7 @@ import pytest_mock
from autogpt_libs.auth.jwt_utils import get_jwt_payload
from pytest_snapshot.plugin import Snapshot
from backend.copilot.rate_limit import CoPilotUsageStatus, UsageWindow
from backend.copilot.rate_limit import CoPilotUsageStatus, SubscriptionTier, UsageWindow
from .rate_limit_admin_routes import router as rate_limit_admin_router
@@ -57,7 +57,7 @@ def _patch_rate_limit_deps(
mocker.patch(
f"{_MOCK_MODULE}.get_global_rate_limits",
new_callable=AsyncMock,
return_value=(2_500_000, 12_500_000),
return_value=(2_500_000, 12_500_000, SubscriptionTier.FREE),
)
mocker.patch(
f"{_MOCK_MODULE}.get_usage_status",
@@ -89,6 +89,7 @@ def test_get_rate_limit(
assert data["weekly_token_limit"] == 12_500_000
assert data["daily_tokens_used"] == 500_000
assert data["weekly_tokens_used"] == 3_000_000
assert data["tier"] == "FREE"
configured_snapshot.assert_match(
json.dumps(data, indent=2, sort_keys=True) + "\n",
@@ -162,6 +163,7 @@ def test_reset_user_usage_daily_only(
assert data["daily_tokens_used"] == 0
# Weekly is untouched
assert data["weekly_tokens_used"] == 3_000_000
assert data["tier"] == "FREE"
mock_reset.assert_awaited_once_with(target_user_id, reset_weekly=False)
@@ -192,6 +194,7 @@ def test_reset_user_usage_daily_and_weekly(
data = response.json()
assert data["daily_tokens_used"] == 0
assert data["weekly_tokens_used"] == 0
assert data["tier"] == "FREE"
mock_reset.assert_awaited_once_with(target_user_id, reset_weekly=True)
@@ -228,7 +231,7 @@ def test_get_rate_limit_email_lookup_failure(
mocker.patch(
f"{_MOCK_MODULE}.get_global_rate_limits",
new_callable=AsyncMock,
return_value=(2_500_000, 12_500_000),
return_value=(2_500_000, 12_500_000, SubscriptionTier.FREE),
)
mocker.patch(
f"{_MOCK_MODULE}.get_usage_status",
@@ -261,3 +264,303 @@ def test_admin_endpoints_require_admin_role(mock_jwt_user) -> None:
json={"user_id": "test"},
)
assert response.status_code == 403
# ---------------------------------------------------------------------------
# Tier management endpoints
# ---------------------------------------------------------------------------
def test_get_user_tier(
mocker: pytest_mock.MockerFixture,
target_user_id: str,
) -> None:
"""Test getting a user's rate-limit tier."""
mocker.patch(
f"{_MOCK_MODULE}.get_user_email_by_id",
new_callable=AsyncMock,
return_value=_TARGET_EMAIL,
)
mocker.patch(
f"{_MOCK_MODULE}.get_user_tier",
new_callable=AsyncMock,
return_value=SubscriptionTier.PRO,
)
response = client.get("/admin/rate_limit/tier", params={"user_id": target_user_id})
assert response.status_code == 200
data = response.json()
assert data["user_id"] == target_user_id
assert data["tier"] == "PRO"
def test_get_user_tier_user_not_found(
mocker: pytest_mock.MockerFixture,
target_user_id: str,
) -> None:
"""Test that getting tier for a non-existent user returns 404."""
mocker.patch(
f"{_MOCK_MODULE}.get_user_email_by_id",
new_callable=AsyncMock,
return_value=None,
)
response = client.get("/admin/rate_limit/tier", params={"user_id": target_user_id})
assert response.status_code == 404
def test_set_user_tier(
mocker: pytest_mock.MockerFixture,
target_user_id: str,
) -> None:
"""Test setting a user's rate-limit tier (upgrade)."""
mocker.patch(
f"{_MOCK_MODULE}.get_user_email_by_id",
new_callable=AsyncMock,
return_value=_TARGET_EMAIL,
)
mocker.patch(
f"{_MOCK_MODULE}.get_user_tier",
new_callable=AsyncMock,
return_value=SubscriptionTier.FREE,
)
mock_set = mocker.patch(
f"{_MOCK_MODULE}.set_user_tier",
new_callable=AsyncMock,
)
response = client.post(
"/admin/rate_limit/tier",
json={"user_id": target_user_id, "tier": "ENTERPRISE"},
)
assert response.status_code == 200
data = response.json()
assert data["user_id"] == target_user_id
assert data["tier"] == "ENTERPRISE"
mock_set.assert_awaited_once_with(target_user_id, SubscriptionTier.ENTERPRISE)
def test_set_user_tier_downgrade(
mocker: pytest_mock.MockerFixture,
target_user_id: str,
) -> None:
"""Test downgrading a user's tier from PRO to FREE."""
mocker.patch(
f"{_MOCK_MODULE}.get_user_email_by_id",
new_callable=AsyncMock,
return_value=_TARGET_EMAIL,
)
mocker.patch(
f"{_MOCK_MODULE}.get_user_tier",
new_callable=AsyncMock,
return_value=SubscriptionTier.PRO,
)
mock_set = mocker.patch(
f"{_MOCK_MODULE}.set_user_tier",
new_callable=AsyncMock,
)
response = client.post(
"/admin/rate_limit/tier",
json={"user_id": target_user_id, "tier": "FREE"},
)
assert response.status_code == 200
data = response.json()
assert data["user_id"] == target_user_id
assert data["tier"] == "FREE"
mock_set.assert_awaited_once_with(target_user_id, SubscriptionTier.FREE)
def test_set_user_tier_invalid_tier(
target_user_id: str,
) -> None:
"""Test that setting an invalid tier returns 422."""
response = client.post(
"/admin/rate_limit/tier",
json={"user_id": target_user_id, "tier": "invalid"},
)
assert response.status_code == 422
def test_set_user_tier_invalid_tier_uppercase(
target_user_id: str,
) -> None:
"""Test that setting an unrecognised uppercase tier (e.g. 'INVALID') returns 422.
Regression: ensures Pydantic enum validation rejects values that are not
members of SubscriptionTier, even when they look like valid enum names.
"""
response = client.post(
"/admin/rate_limit/tier",
json={"user_id": target_user_id, "tier": "INVALID"},
)
assert response.status_code == 422
body = response.json()
assert "detail" in body
def test_set_user_tier_email_lookup_failure_returns_404(
mocker: pytest_mock.MockerFixture,
target_user_id: str,
) -> None:
"""Test that email lookup failure returns 404 (user unverifiable)."""
mocker.patch(
f"{_MOCK_MODULE}.get_user_email_by_id",
new_callable=AsyncMock,
side_effect=Exception("DB connection failed"),
)
response = client.post(
"/admin/rate_limit/tier",
json={"user_id": target_user_id, "tier": "PRO"},
)
assert response.status_code == 404
def test_set_user_tier_user_not_found(
mocker: pytest_mock.MockerFixture,
target_user_id: str,
) -> None:
"""Test that setting tier for a non-existent user returns 404."""
mocker.patch(
f"{_MOCK_MODULE}.get_user_email_by_id",
new_callable=AsyncMock,
return_value=None,
)
response = client.post(
"/admin/rate_limit/tier",
json={"user_id": target_user_id, "tier": "PRO"},
)
assert response.status_code == 404
def test_set_user_tier_db_failure(
mocker: pytest_mock.MockerFixture,
target_user_id: str,
) -> None:
"""Test that DB failure on set tier returns 500."""
mocker.patch(
f"{_MOCK_MODULE}.get_user_email_by_id",
new_callable=AsyncMock,
return_value=_TARGET_EMAIL,
)
mocker.patch(
f"{_MOCK_MODULE}.get_user_tier",
new_callable=AsyncMock,
return_value=SubscriptionTier.FREE,
)
mocker.patch(
f"{_MOCK_MODULE}.set_user_tier",
new_callable=AsyncMock,
side_effect=Exception("DB connection refused"),
)
response = client.post(
"/admin/rate_limit/tier",
json={"user_id": target_user_id, "tier": "PRO"},
)
assert response.status_code == 500
def test_tier_endpoints_require_admin_role(mock_jwt_user) -> None:
"""Test that tier admin endpoints require admin role."""
app.dependency_overrides[get_jwt_payload] = mock_jwt_user["get_jwt_payload"]
response = client.get("/admin/rate_limit/tier", params={"user_id": "test"})
assert response.status_code == 403
response = client.post(
"/admin/rate_limit/tier",
json={"user_id": "test", "tier": "PRO"},
)
assert response.status_code == 403
# ─── search_users endpoint ──────────────────────────────────────────
def test_search_users_returns_matching_users(
mocker: pytest_mock.MockerFixture,
admin_user_id: str,
) -> None:
"""Partial search should return all matching users from the User table."""
mocker.patch(
_MOCK_MODULE + ".search_users",
new_callable=AsyncMock,
return_value=[
("user-1", "zamil.majdy@gmail.com"),
("user-2", "zamil.majdy@agpt.co"),
],
)
response = client.get("/admin/rate_limit/search_users", params={"query": "zamil"})
assert response.status_code == 200
results = response.json()
assert len(results) == 2
assert results[0]["user_email"] == "zamil.majdy@gmail.com"
assert results[1]["user_email"] == "zamil.majdy@agpt.co"
def test_search_users_empty_results(
mocker: pytest_mock.MockerFixture,
admin_user_id: str,
) -> None:
"""Search with no matches returns empty list."""
mocker.patch(
_MOCK_MODULE + ".search_users",
new_callable=AsyncMock,
return_value=[],
)
response = client.get(
"/admin/rate_limit/search_users", params={"query": "nonexistent"}
)
assert response.status_code == 200
assert response.json() == []
def test_search_users_short_query_rejected(
admin_user_id: str,
) -> None:
"""Query shorter than 3 characters should return 400."""
response = client.get("/admin/rate_limit/search_users", params={"query": "ab"})
assert response.status_code == 400
def test_search_users_negative_limit_clamped(
mocker: pytest_mock.MockerFixture,
admin_user_id: str,
) -> None:
"""Negative limit should be clamped to 1, not passed through."""
mock_search = mocker.patch(
_MOCK_MODULE + ".search_users",
new_callable=AsyncMock,
return_value=[],
)
response = client.get(
"/admin/rate_limit/search_users", params={"query": "test", "limit": -1}
)
assert response.status_code == 200
mock_search.assert_awaited_once_with("test", limit=1)
def test_search_users_requires_admin_role(mock_jwt_user) -> None:
"""Test that the search_users endpoint requires admin role."""
app.dependency_overrides[get_jwt_payload] = mock_jwt_user["get_jwt_payload"]
response = client.get("/admin/rate_limit/search_users", params={"query": "test"})
assert response.status_code == 403

View File

@@ -15,7 +15,8 @@ from pydantic import BaseModel, ConfigDict, Field, field_validator
from backend.copilot import service as chat_service
from backend.copilot import stream_registry
from backend.copilot.config import ChatConfig
from backend.copilot.config import ChatConfig, CopilotMode
from backend.copilot.db import get_chat_messages_paginated
from backend.copilot.executor.utils import enqueue_cancel_task, enqueue_copilot_turn
from backend.copilot.model import (
ChatMessage,
@@ -111,6 +112,11 @@ class StreamChatRequest(BaseModel):
file_ids: list[str] | None = Field(
default=None, max_length=20
) # Workspace file IDs attached to this message
mode: CopilotMode | None = Field(
default=None,
description="Autopilot mode: 'fast' for baseline LLM, 'extended_thinking' for Claude Agent SDK. "
"If None, uses the server default (extended_thinking).",
)
class CreateSessionRequest(BaseModel):
@@ -150,6 +156,8 @@ class SessionDetailResponse(BaseModel):
user_id: str | None
messages: list[dict]
active_stream: ActiveStreamInfo | None = None # Present if stream is still active
has_more_messages: bool = False
oldest_sequence: int | None = None
total_prompt_tokens: int = 0
total_completion_tokens: int = 0
metadata: ChatSessionMetadata = ChatSessionMetadata()
@@ -389,60 +397,78 @@ async def update_session_title_route(
async def get_session(
session_id: str,
user_id: Annotated[str, Security(auth.get_user_id)],
limit: int = Query(default=50, ge=1, le=200),
before_sequence: int | None = Query(default=None, ge=0),
) -> SessionDetailResponse:
"""
Retrieve the details of a specific chat session.
Looks up a chat session by ID for the given user (if authenticated) and returns all session data including messages.
If there's an active stream for this session, returns active_stream info for reconnection.
Supports cursor-based pagination via ``limit`` and ``before_sequence``.
When no pagination params are provided, returns the most recent messages.
Args:
session_id: The unique identifier for the desired chat session.
user_id: The optional authenticated user ID, or None for anonymous access.
user_id: The authenticated user's ID.
limit: Maximum number of messages to return (1-200, default 50).
before_sequence: Return messages with sequence < this value (cursor).
Returns:
SessionDetailResponse: Details for the requested session, including active_stream info if applicable.
SessionDetailResponse: Details for the requested session, including
active_stream info and pagination metadata.
"""
session = await get_chat_session(session_id, user_id)
if not session:
page = await get_chat_messages_paginated(
session_id, limit, before_sequence, user_id=user_id
)
if page is None:
raise NotFoundError(f"Session {session_id} not found.")
messages = [message.model_dump() for message in page.messages]
messages = [message.model_dump() for message in session.messages]
# Check if there's an active stream for this session
# Only check active stream on initial load (not on "load more" requests)
active_stream_info = None
active_session, last_message_id = await stream_registry.get_active_session(
session_id, user_id
)
logger.info(
f"[GET_SESSION] session={session_id}, active_session={active_session is not None}, "
f"msg_count={len(messages)}, last_role={messages[-1].get('role') if messages else 'none'}"
)
if active_session:
# Keep the assistant message (including tool_calls) so the frontend can
# render the correct tool UI (e.g. CreateAgent with mini game).
# convertChatSessionToUiMessages handles isComplete=false by setting
# tool parts without output to state "input-available".
active_stream_info = ActiveStreamInfo(
turn_id=active_session.turn_id,
last_message_id=last_message_id,
if before_sequence is None:
active_session, last_message_id = await stream_registry.get_active_session(
session_id, user_id
)
logger.info(
f"[GET_SESSION] session={session_id}, active_session={active_session is not None}, "
f"msg_count={len(messages)}, last_role={messages[-1].get('role') if messages else 'none'}"
)
if active_session:
active_stream_info = ActiveStreamInfo(
turn_id=active_session.turn_id,
last_message_id=last_message_id,
)
# Skip session metadata on "load more" — frontend only needs messages
if before_sequence is not None:
return SessionDetailResponse(
id=page.session.session_id,
created_at=page.session.started_at.isoformat(),
updated_at=page.session.updated_at.isoformat(),
user_id=page.session.user_id or None,
messages=messages,
active_stream=None,
has_more_messages=page.has_more,
oldest_sequence=page.oldest_sequence,
total_prompt_tokens=0,
total_completion_tokens=0,
)
# Sum token usage from session
total_prompt = sum(u.prompt_tokens for u in session.usage)
total_completion = sum(u.completion_tokens for u in session.usage)
total_prompt = sum(u.prompt_tokens for u in page.session.usage)
total_completion = sum(u.completion_tokens for u in page.session.usage)
return SessionDetailResponse(
id=session.session_id,
created_at=session.started_at.isoformat(),
updated_at=session.updated_at.isoformat(),
user_id=session.user_id or None,
id=page.session.session_id,
created_at=page.session.started_at.isoformat(),
updated_at=page.session.updated_at.isoformat(),
user_id=page.session.user_id or None,
messages=messages,
active_stream=active_stream_info,
has_more_messages=page.has_more,
oldest_sequence=page.oldest_sequence,
total_prompt_tokens=total_prompt,
total_completion_tokens=total_completion,
metadata=session.metadata,
metadata=page.session.metadata,
)
@@ -456,8 +482,9 @@ async def get_copilot_usage(
Returns current token usage vs limits for daily and weekly windows.
Global defaults sourced from LaunchDarkly (falling back to config).
Includes the user's rate-limit tier.
"""
daily_limit, weekly_limit = await get_global_rate_limits(
daily_limit, weekly_limit, tier = await get_global_rate_limits(
user_id, config.daily_token_limit, config.weekly_token_limit
)
return await get_usage_status(
@@ -465,6 +492,7 @@ async def get_copilot_usage(
daily_token_limit=daily_limit,
weekly_token_limit=weekly_limit,
rate_limit_reset_cost=config.rate_limit_reset_cost,
tier=tier,
)
@@ -516,7 +544,7 @@ async def reset_copilot_usage(
detail="Rate limit reset is not available (credit system is disabled).",
)
daily_limit, weekly_limit = await get_global_rate_limits(
daily_limit, weekly_limit, tier = await get_global_rate_limits(
user_id, config.daily_token_limit, config.weekly_token_limit
)
@@ -556,6 +584,7 @@ async def reset_copilot_usage(
user_id=user_id,
daily_token_limit=daily_limit,
weekly_token_limit=weekly_limit,
tier=tier,
)
if daily_limit > 0 and usage_status.daily.used < daily_limit:
raise HTTPException(
@@ -631,6 +660,7 @@ async def reset_copilot_usage(
daily_token_limit=daily_limit,
weekly_token_limit=weekly_limit,
rate_limit_reset_cost=config.rate_limit_reset_cost,
tier=tier,
)
return RateLimitResetResponse(
@@ -741,7 +771,7 @@ async def stream_chat_post(
# Global defaults sourced from LaunchDarkly, falling back to config.
if user_id:
try:
daily_limit, weekly_limit = await get_global_rate_limits(
daily_limit, weekly_limit, _ = await get_global_rate_limits(
user_id, config.daily_token_limit, config.weekly_token_limit
)
await check_rate_limit(
@@ -836,6 +866,7 @@ async def stream_chat_post(
is_user_message=request.is_user_message,
context=request.context,
file_ids=sanitized_file_ids,
mode=request.mode,
)
setup_time = (time.perf_counter() - stream_start_time) * 1000

View File

@@ -9,6 +9,7 @@ import pytest
import pytest_mock
from backend.api.features.chat import routes as chat_routes
from backend.copilot.rate_limit import SubscriptionTier
app = fastapi.FastAPI()
app.include_router(chat_routes.router)
@@ -331,14 +332,28 @@ def _mock_usage(
*,
daily_used: int = 500,
weekly_used: int = 2000,
daily_limit: int = 10000,
weekly_limit: int = 50000,
tier: "SubscriptionTier" = SubscriptionTier.FREE,
) -> AsyncMock:
"""Mock get_usage_status to return a predictable CoPilotUsageStatus."""
"""Mock get_usage_status and get_global_rate_limits for usage endpoint tests.
Mocks both ``get_global_rate_limits`` (returns the given limits + tier) and
``get_usage_status`` so that tests exercise the endpoint without hitting
LaunchDarkly or Prisma.
"""
from backend.copilot.rate_limit import CoPilotUsageStatus, UsageWindow
mocker.patch(
"backend.api.features.chat.routes.get_global_rate_limits",
new_callable=AsyncMock,
return_value=(daily_limit, weekly_limit, tier),
)
resets_at = datetime.now(UTC) + timedelta(days=1)
status = CoPilotUsageStatus(
daily=UsageWindow(used=daily_used, limit=10000, resets_at=resets_at),
weekly=UsageWindow(used=weekly_used, limit=50000, resets_at=resets_at),
daily=UsageWindow(used=daily_used, limit=daily_limit, resets_at=resets_at),
weekly=UsageWindow(used=weekly_used, limit=weekly_limit, resets_at=resets_at),
)
return mocker.patch(
"backend.api.features.chat.routes.get_usage_status",
@@ -369,6 +384,7 @@ def test_usage_returns_daily_and_weekly(
daily_token_limit=10000,
weekly_token_limit=50000,
rate_limit_reset_cost=chat_routes.config.rate_limit_reset_cost,
tier=SubscriptionTier.FREE,
)
@@ -376,11 +392,9 @@ def test_usage_uses_config_limits(
mocker: pytest_mock.MockerFixture,
test_user_id: str,
) -> None:
"""The endpoint forwards daily_token_limit and weekly_token_limit from config."""
mock_get = _mock_usage(mocker)
"""The endpoint forwards resolved limits from get_global_rate_limits to get_usage_status."""
mock_get = _mock_usage(mocker, daily_limit=99999, weekly_limit=77777)
mocker.patch.object(chat_routes.config, "daily_token_limit", 99999)
mocker.patch.object(chat_routes.config, "weekly_token_limit", 77777)
mocker.patch.object(chat_routes.config, "rate_limit_reset_cost", 500)
response = client.get("/usage")
@@ -391,6 +405,7 @@ def test_usage_uses_config_limits(
daily_token_limit=99999,
weekly_token_limit=77777,
rate_limit_reset_cost=500,
tier=SubscriptionTier.FREE,
)
@@ -526,3 +541,41 @@ def test_create_session_rejects_nested_metadata(
)
assert response.status_code == 422
class TestStreamChatRequestModeValidation:
"""Pydantic-level validation of the ``mode`` field on StreamChatRequest."""
def test_rejects_invalid_mode_value(self) -> None:
"""Any string outside the Literal set must raise ValidationError."""
from pydantic import ValidationError
from backend.api.features.chat.routes import StreamChatRequest
with pytest.raises(ValidationError):
StreamChatRequest(message="hi", mode="turbo") # type: ignore[arg-type]
def test_accepts_fast_mode(self) -> None:
from backend.api.features.chat.routes import StreamChatRequest
req = StreamChatRequest(message="hi", mode="fast")
assert req.mode == "fast"
def test_accepts_extended_thinking_mode(self) -> None:
from backend.api.features.chat.routes import StreamChatRequest
req = StreamChatRequest(message="hi", mode="extended_thinking")
assert req.mode == "extended_thinking"
def test_accepts_none_mode(self) -> None:
"""``mode=None`` is valid (server decides via feature flags)."""
from backend.api.features.chat.routes import StreamChatRequest
req = StreamChatRequest(message="hi", mode=None)
assert req.mode is None
def test_mode_defaults_to_none_when_omitted(self) -> None:
from backend.api.features.chat.routes import StreamChatRequest
req = StreamChatRequest(message="hi")
assert req.mode is None

View File

@@ -189,6 +189,7 @@ async def test_create_store_submission(mocker):
notifyOnAgentApproved=True,
notifyOnAgentRejected=True,
timezone="Europe/Delft",
subscriptionTier=prisma.enums.SubscriptionTier.FREE, # type: ignore[reportCallIssue,reportAttributeAccessIssue]
)
mock_agent = prisma.models.AgentGraph(
id="agent-id",

View File

@@ -12,7 +12,7 @@ import fastapi
from autogpt_libs.auth.dependencies import get_user_id, requires_user
from fastapi import Query, UploadFile
from fastapi.responses import Response
from pydantic import BaseModel
from pydantic import BaseModel, Field
from backend.data.workspace import (
WorkspaceFile,
@@ -131,9 +131,26 @@ class StorageUsageResponse(BaseModel):
file_count: int
class WorkspaceFileItem(BaseModel):
id: str
name: str
path: str
mime_type: str
size_bytes: int
metadata: dict = Field(default_factory=dict)
created_at: str
class ListFilesResponse(BaseModel):
files: list[WorkspaceFileItem]
offset: int = 0
has_more: bool = False
@router.get(
"/files/{file_id}/download",
summary="Download file by ID",
operation_id="getWorkspaceDownloadFileById",
)
async def download_file(
user_id: Annotated[str, fastapi.Security(get_user_id)],
@@ -158,6 +175,7 @@ async def download_file(
@router.delete(
"/files/{file_id}",
summary="Delete a workspace file",
operation_id="deleteWorkspaceFile",
)
async def delete_workspace_file(
user_id: Annotated[str, fastapi.Security(get_user_id)],
@@ -183,6 +201,7 @@ async def delete_workspace_file(
@router.post(
"/files/upload",
summary="Upload file to workspace",
operation_id="uploadWorkspaceFile",
)
async def upload_file(
user_id: Annotated[str, fastapi.Security(get_user_id)],
@@ -196,6 +215,9 @@ async def upload_file(
Files are stored in session-scoped paths when session_id is provided,
so the agent's session-scoped tools can discover them automatically.
"""
# Empty-string session_id drops session scoping; normalize to None.
session_id = session_id or None
config = Config()
# Sanitize filename — strip any directory components
@@ -250,16 +272,27 @@ async def upload_file(
manager = WorkspaceManager(user_id, workspace.id, session_id)
try:
workspace_file = await manager.write_file(
content, filename, overwrite=overwrite
content, filename, overwrite=overwrite, metadata={"origin": "user-upload"}
)
except ValueError as e:
raise fastapi.HTTPException(status_code=409, detail=str(e)) from e
# write_file raises ValueError for both path-conflict and size-limit
# cases; map each to its correct HTTP status.
message = str(e)
if message.startswith("File too large"):
raise fastapi.HTTPException(status_code=413, detail=message) from e
raise fastapi.HTTPException(status_code=409, detail=message) from e
# Post-write storage check — eliminates TOCTOU race on the quota.
# If a concurrent upload pushed us over the limit, undo this write.
new_total = await get_workspace_total_size(workspace.id)
if storage_limit_bytes and new_total > storage_limit_bytes:
await soft_delete_workspace_file(workspace_file.id, workspace.id)
try:
await soft_delete_workspace_file(workspace_file.id, workspace.id)
except Exception as e:
logger.warning(
f"Failed to soft-delete over-quota file {workspace_file.id} "
f"in workspace {workspace.id}: {e}"
)
raise fastapi.HTTPException(
status_code=413,
detail={
@@ -281,6 +314,7 @@ async def upload_file(
@router.get(
"/storage/usage",
summary="Get workspace storage usage",
operation_id="getWorkspaceStorageUsage",
)
async def get_storage_usage(
user_id: Annotated[str, fastapi.Security(get_user_id)],
@@ -301,3 +335,57 @@ async def get_storage_usage(
used_percent=round((used_bytes / limit_bytes) * 100, 1) if limit_bytes else 0,
file_count=file_count,
)
@router.get(
"/files",
summary="List workspace files",
operation_id="listWorkspaceFiles",
)
async def list_workspace_files(
user_id: Annotated[str, fastapi.Security(get_user_id)],
session_id: str | None = Query(default=None),
limit: int = Query(default=200, ge=1, le=1000),
offset: int = Query(default=0, ge=0),
) -> ListFilesResponse:
"""
List files in the user's workspace.
When session_id is provided, only files for that session are returned.
Otherwise, all files across sessions are listed. Results are paginated
via `limit`/`offset`; `has_more` indicates whether additional pages exist.
"""
workspace = await get_or_create_workspace(user_id)
# Treat empty-string session_id the same as omitted — an empty value
# would otherwise silently list files across every session instead of
# scoping to one.
session_id = session_id or None
manager = WorkspaceManager(user_id, workspace.id, session_id)
include_all = session_id is None
# Fetch one extra to compute has_more without a separate count query.
files = await manager.list_files(
limit=limit + 1,
offset=offset,
include_all_sessions=include_all,
)
has_more = len(files) > limit
page = files[:limit]
return ListFilesResponse(
files=[
WorkspaceFileItem(
id=f.id,
name=f.name,
path=f.path,
mime_type=f.mime_type,
size_bytes=f.size_bytes,
metadata=f.metadata or {},
created_at=f.created_at.isoformat(),
)
for f in page
],
offset=offset,
has_more=has_more,
)

View File

@@ -1,48 +1,28 @@
"""Tests for workspace file upload and download routes."""
import io
from datetime import datetime, timezone
from unittest.mock import AsyncMock, MagicMock, patch
import fastapi
import fastapi.testclient
import pytest
import pytest_mock
from backend.api.features.workspace import routes as workspace_routes
from backend.data.workspace import WorkspaceFile
from backend.api.features.workspace.routes import router
from backend.data.workspace import Workspace, WorkspaceFile
app = fastapi.FastAPI()
app.include_router(workspace_routes.router)
app.include_router(router)
@app.exception_handler(ValueError)
async def _value_error_handler(
request: fastapi.Request, exc: ValueError
) -> fastapi.responses.JSONResponse:
"""Mirror the production ValueError → 400 mapping from rest_api.py."""
"""Mirror the production ValueError → 400 mapping from the REST app."""
return fastapi.responses.JSONResponse(status_code=400, content={"detail": str(exc)})
client = fastapi.testclient.TestClient(app)
TEST_USER_ID = "3e53486c-cf57-477e-ba2a-cb02dc828e1a"
MOCK_WORKSPACE = type("W", (), {"id": "ws-1"})()
_NOW = datetime(2023, 1, 1, tzinfo=timezone.utc)
MOCK_FILE = WorkspaceFile(
id="file-aaa-bbb",
workspace_id="ws-1",
created_at=_NOW,
updated_at=_NOW,
name="hello.txt",
path="/session/hello.txt",
mime_type="text/plain",
size_bytes=13,
storage_path="local://hello.txt",
)
@pytest.fixture(autouse=True)
def setup_app_auth(mock_jwt_user):
@@ -53,25 +33,201 @@ def setup_app_auth(mock_jwt_user):
app.dependency_overrides.clear()
def _make_workspace(user_id: str = "test-user-id") -> Workspace:
return Workspace(
id="ws-001",
user_id=user_id,
created_at=datetime(2026, 1, 1, tzinfo=timezone.utc),
updated_at=datetime(2026, 1, 1, tzinfo=timezone.utc),
)
def _make_file(**overrides) -> WorkspaceFile:
defaults = {
"id": "file-001",
"workspace_id": "ws-001",
"created_at": datetime(2026, 1, 1, tzinfo=timezone.utc),
"updated_at": datetime(2026, 1, 1, tzinfo=timezone.utc),
"name": "test.txt",
"path": "/test.txt",
"storage_path": "local://test.txt",
"mime_type": "text/plain",
"size_bytes": 100,
"checksum": None,
"is_deleted": False,
"deleted_at": None,
"metadata": {},
}
defaults.update(overrides)
return WorkspaceFile(**defaults)
def _make_file_mock(**overrides) -> MagicMock:
"""Create a mock WorkspaceFile to simulate DB records with null fields."""
defaults = {
"id": "file-001",
"name": "test.txt",
"path": "/test.txt",
"mime_type": "text/plain",
"size_bytes": 100,
"metadata": {},
"created_at": datetime(2026, 1, 1, tzinfo=timezone.utc),
}
defaults.update(overrides)
mock = MagicMock(spec=WorkspaceFile)
for k, v in defaults.items():
setattr(mock, k, v)
return mock
# -- list_workspace_files tests --
@patch("backend.api.features.workspace.routes.get_or_create_workspace")
@patch("backend.api.features.workspace.routes.WorkspaceManager")
def test_list_files_returns_all_when_no_session(mock_manager_cls, mock_get_workspace):
mock_get_workspace.return_value = _make_workspace()
files = [
_make_file(id="f1", name="a.txt", metadata={"origin": "user-upload"}),
_make_file(id="f2", name="b.csv", metadata={"origin": "agent-created"}),
]
mock_instance = AsyncMock()
mock_instance.list_files.return_value = files
mock_manager_cls.return_value = mock_instance
response = client.get("/files")
assert response.status_code == 200
data = response.json()
assert len(data["files"]) == 2
assert data["has_more"] is False
assert data["offset"] == 0
assert data["files"][0]["id"] == "f1"
assert data["files"][0]["metadata"] == {"origin": "user-upload"}
assert data["files"][1]["id"] == "f2"
mock_instance.list_files.assert_called_once_with(
limit=201, offset=0, include_all_sessions=True
)
@patch("backend.api.features.workspace.routes.get_or_create_workspace")
@patch("backend.api.features.workspace.routes.WorkspaceManager")
def test_list_files_scopes_to_session_when_provided(
mock_manager_cls, mock_get_workspace, test_user_id
):
mock_get_workspace.return_value = _make_workspace(user_id=test_user_id)
mock_instance = AsyncMock()
mock_instance.list_files.return_value = []
mock_manager_cls.return_value = mock_instance
response = client.get("/files?session_id=sess-123")
assert response.status_code == 200
data = response.json()
assert data["files"] == []
assert data["has_more"] is False
mock_manager_cls.assert_called_once_with(test_user_id, "ws-001", "sess-123")
mock_instance.list_files.assert_called_once_with(
limit=201, offset=0, include_all_sessions=False
)
@patch("backend.api.features.workspace.routes.get_or_create_workspace")
@patch("backend.api.features.workspace.routes.WorkspaceManager")
def test_list_files_null_metadata_coerced_to_empty_dict(
mock_manager_cls, mock_get_workspace
):
"""Route uses `f.metadata or {}` for pre-existing files with null metadata."""
mock_get_workspace.return_value = _make_workspace()
mock_instance = AsyncMock()
mock_instance.list_files.return_value = [_make_file_mock(metadata=None)]
mock_manager_cls.return_value = mock_instance
response = client.get("/files")
assert response.status_code == 200
assert response.json()["files"][0]["metadata"] == {}
# -- upload_file metadata tests --
@patch("backend.api.features.workspace.routes.get_or_create_workspace")
@patch("backend.api.features.workspace.routes.get_workspace_total_size")
@patch("backend.api.features.workspace.routes.scan_content_safe")
@patch("backend.api.features.workspace.routes.WorkspaceManager")
def test_upload_passes_user_upload_origin_metadata(
mock_manager_cls, mock_scan, mock_total_size, mock_get_workspace
):
mock_get_workspace.return_value = _make_workspace()
mock_total_size.return_value = 100
written = _make_file(id="new-file", name="doc.pdf")
mock_instance = AsyncMock()
mock_instance.write_file.return_value = written
mock_manager_cls.return_value = mock_instance
response = client.post(
"/files/upload",
files={"file": ("doc.pdf", b"fake-pdf-content", "application/pdf")},
)
assert response.status_code == 200
mock_instance.write_file.assert_called_once()
call_kwargs = mock_instance.write_file.call_args
assert call_kwargs.kwargs.get("metadata") == {"origin": "user-upload"}
@patch("backend.api.features.workspace.routes.get_or_create_workspace")
@patch("backend.api.features.workspace.routes.get_workspace_total_size")
@patch("backend.api.features.workspace.routes.scan_content_safe")
@patch("backend.api.features.workspace.routes.WorkspaceManager")
def test_upload_returns_409_on_file_conflict(
mock_manager_cls, mock_scan, mock_total_size, mock_get_workspace
):
mock_get_workspace.return_value = _make_workspace()
mock_total_size.return_value = 100
mock_instance = AsyncMock()
mock_instance.write_file.side_effect = ValueError("File already exists at path")
mock_manager_cls.return_value = mock_instance
response = client.post(
"/files/upload",
files={"file": ("dup.txt", b"content", "text/plain")},
)
assert response.status_code == 409
assert "already exists" in response.json()["detail"]
# -- Restored upload/download/delete security + invariant tests --
def _upload(
filename: str = "hello.txt",
content: bytes = b"Hello, world!",
content_type: str = "text/plain",
):
"""Helper to POST a file upload."""
return client.post(
"/files/upload?session_id=sess-1",
files={"file": (filename, io.BytesIO(content), content_type)},
)
# ---- Happy path ----
_MOCK_FILE = WorkspaceFile(
id="file-aaa-bbb",
workspace_id="ws-001",
created_at=datetime(2026, 1, 1, tzinfo=timezone.utc),
updated_at=datetime(2026, 1, 1, tzinfo=timezone.utc),
name="hello.txt",
path="/sessions/sess-1/hello.txt",
mime_type="text/plain",
size_bytes=13,
storage_path="local://hello.txt",
)
def test_upload_happy_path(mocker: pytest_mock.MockFixture):
def test_upload_happy_path(mocker):
mocker.patch(
"backend.api.features.workspace.routes.get_or_create_workspace",
return_value=MOCK_WORKSPACE,
return_value=_make_workspace(),
)
mocker.patch(
"backend.api.features.workspace.routes.get_workspace_total_size",
@@ -82,7 +238,7 @@ def test_upload_happy_path(mocker: pytest_mock.MockFixture):
return_value=None,
)
mock_manager = mocker.MagicMock()
mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE)
mock_manager.write_file = mocker.AsyncMock(return_value=_MOCK_FILE)
mocker.patch(
"backend.api.features.workspace.routes.WorkspaceManager",
return_value=mock_manager,
@@ -96,10 +252,7 @@ def test_upload_happy_path(mocker: pytest_mock.MockFixture):
assert data["size_bytes"] == 13
# ---- Per-file size limit ----
def test_upload_exceeds_max_file_size(mocker: pytest_mock.MockFixture):
def test_upload_exceeds_max_file_size(mocker):
"""Files larger than max_file_size_mb should be rejected with 413."""
cfg = mocker.patch("backend.api.features.workspace.routes.Config")
cfg.return_value.max_file_size_mb = 0 # 0 MB → any content is too big
@@ -109,15 +262,11 @@ def test_upload_exceeds_max_file_size(mocker: pytest_mock.MockFixture):
assert response.status_code == 413
# ---- Storage quota exceeded ----
def test_upload_storage_quota_exceeded(mocker: pytest_mock.MockFixture):
def test_upload_storage_quota_exceeded(mocker):
mocker.patch(
"backend.api.features.workspace.routes.get_or_create_workspace",
return_value=MOCK_WORKSPACE,
return_value=_make_workspace(),
)
# Current usage already at limit
mocker.patch(
"backend.api.features.workspace.routes.get_workspace_total_size",
return_value=500 * 1024 * 1024,
@@ -128,27 +277,22 @@ def test_upload_storage_quota_exceeded(mocker: pytest_mock.MockFixture):
assert "Storage limit exceeded" in response.text
# ---- Post-write quota race (B2) ----
def test_upload_post_write_quota_race(mocker: pytest_mock.MockFixture):
"""If a concurrent upload tips the total over the limit after write,
the file should be soft-deleted and 413 returned."""
def test_upload_post_write_quota_race(mocker):
"""Concurrent upload tipping over limit after write should soft-delete + 413."""
mocker.patch(
"backend.api.features.workspace.routes.get_or_create_workspace",
return_value=MOCK_WORKSPACE,
return_value=_make_workspace(),
)
# Pre-write check passes (under limit), but post-write check fails
mocker.patch(
"backend.api.features.workspace.routes.get_workspace_total_size",
side_effect=[0, 600 * 1024 * 1024], # first call OK, second over limit
side_effect=[0, 600 * 1024 * 1024],
)
mocker.patch(
"backend.api.features.workspace.routes.scan_content_safe",
return_value=None,
)
mock_manager = mocker.MagicMock()
mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE)
mock_manager.write_file = mocker.AsyncMock(return_value=_MOCK_FILE)
mocker.patch(
"backend.api.features.workspace.routes.WorkspaceManager",
return_value=mock_manager,
@@ -160,17 +304,14 @@ def test_upload_post_write_quota_race(mocker: pytest_mock.MockFixture):
response = _upload()
assert response.status_code == 413
mock_delete.assert_called_once_with("file-aaa-bbb", "ws-1")
mock_delete.assert_called_once_with("file-aaa-bbb", "ws-001")
# ---- Any extension accepted (no allowlist) ----
def test_upload_any_extension(mocker: pytest_mock.MockFixture):
def test_upload_any_extension(mocker):
"""Any file extension should be accepted — ClamAV is the security layer."""
mocker.patch(
"backend.api.features.workspace.routes.get_or_create_workspace",
return_value=MOCK_WORKSPACE,
return_value=_make_workspace(),
)
mocker.patch(
"backend.api.features.workspace.routes.get_workspace_total_size",
@@ -181,7 +322,7 @@ def test_upload_any_extension(mocker: pytest_mock.MockFixture):
return_value=None,
)
mock_manager = mocker.MagicMock()
mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE)
mock_manager.write_file = mocker.AsyncMock(return_value=_MOCK_FILE)
mocker.patch(
"backend.api.features.workspace.routes.WorkspaceManager",
return_value=mock_manager,
@@ -191,16 +332,13 @@ def test_upload_any_extension(mocker: pytest_mock.MockFixture):
assert response.status_code == 200
# ---- Virus scan rejection ----
def test_upload_blocked_by_virus_scan(mocker: pytest_mock.MockFixture):
def test_upload_blocked_by_virus_scan(mocker):
"""Files flagged by ClamAV should be rejected and never written to storage."""
from backend.api.features.store.exceptions import VirusDetectedError
mocker.patch(
"backend.api.features.workspace.routes.get_or_create_workspace",
return_value=MOCK_WORKSPACE,
return_value=_make_workspace(),
)
mocker.patch(
"backend.api.features.workspace.routes.get_workspace_total_size",
@@ -211,7 +349,7 @@ def test_upload_blocked_by_virus_scan(mocker: pytest_mock.MockFixture):
side_effect=VirusDetectedError("Eicar-Test-Signature"),
)
mock_manager = mocker.MagicMock()
mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE)
mock_manager.write_file = mocker.AsyncMock(return_value=_MOCK_FILE)
mocker.patch(
"backend.api.features.workspace.routes.WorkspaceManager",
return_value=mock_manager,
@@ -219,18 +357,14 @@ def test_upload_blocked_by_virus_scan(mocker: pytest_mock.MockFixture):
response = _upload(filename="evil.exe", content=b"X5O!P%@AP...")
assert response.status_code == 400
assert "Virus detected" in response.text
mock_manager.write_file.assert_not_called()
# ---- No file extension ----
def test_upload_file_without_extension(mocker: pytest_mock.MockFixture):
def test_upload_file_without_extension(mocker):
"""Files without an extension should be accepted and stored as-is."""
mocker.patch(
"backend.api.features.workspace.routes.get_or_create_workspace",
return_value=MOCK_WORKSPACE,
return_value=_make_workspace(),
)
mocker.patch(
"backend.api.features.workspace.routes.get_workspace_total_size",
@@ -241,7 +375,7 @@ def test_upload_file_without_extension(mocker: pytest_mock.MockFixture):
return_value=None,
)
mock_manager = mocker.MagicMock()
mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE)
mock_manager.write_file = mocker.AsyncMock(return_value=_MOCK_FILE)
mocker.patch(
"backend.api.features.workspace.routes.WorkspaceManager",
return_value=mock_manager,
@@ -257,14 +391,11 @@ def test_upload_file_without_extension(mocker: pytest_mock.MockFixture):
assert mock_manager.write_file.call_args[0][1] == "Makefile"
# ---- Filename sanitization (SF5) ----
def test_upload_strips_path_components(mocker: pytest_mock.MockFixture):
def test_upload_strips_path_components(mocker):
"""Path-traversal filenames should be reduced to their basename."""
mocker.patch(
"backend.api.features.workspace.routes.get_or_create_workspace",
return_value=MOCK_WORKSPACE,
return_value=_make_workspace(),
)
mocker.patch(
"backend.api.features.workspace.routes.get_workspace_total_size",
@@ -275,28 +406,23 @@ def test_upload_strips_path_components(mocker: pytest_mock.MockFixture):
return_value=None,
)
mock_manager = mocker.MagicMock()
mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE)
mock_manager.write_file = mocker.AsyncMock(return_value=_MOCK_FILE)
mocker.patch(
"backend.api.features.workspace.routes.WorkspaceManager",
return_value=mock_manager,
)
# Filename with traversal
_upload(filename="../../etc/passwd.txt")
# write_file should have been called with just the basename
mock_manager.write_file.assert_called_once()
call_args = mock_manager.write_file.call_args
assert call_args[0][1] == "passwd.txt"
# ---- Download ----
def test_download_file_not_found(mocker: pytest_mock.MockFixture):
def test_download_file_not_found(mocker):
mocker.patch(
"backend.api.features.workspace.routes.get_workspace",
return_value=MOCK_WORKSPACE,
return_value=_make_workspace(),
)
mocker.patch(
"backend.api.features.workspace.routes.get_workspace_file",
@@ -307,14 +433,11 @@ def test_download_file_not_found(mocker: pytest_mock.MockFixture):
assert response.status_code == 404
# ---- Delete ----
def test_delete_file_success(mocker: pytest_mock.MockFixture):
def test_delete_file_success(mocker):
"""Deleting an existing file should return {"deleted": true}."""
mocker.patch(
"backend.api.features.workspace.routes.get_workspace",
return_value=MOCK_WORKSPACE,
return_value=_make_workspace(),
)
mock_manager = mocker.MagicMock()
mock_manager.delete_file = mocker.AsyncMock(return_value=True)
@@ -329,11 +452,11 @@ def test_delete_file_success(mocker: pytest_mock.MockFixture):
mock_manager.delete_file.assert_called_once_with("file-aaa-bbb")
def test_delete_file_not_found(mocker: pytest_mock.MockFixture):
def test_delete_file_not_found(mocker):
"""Deleting a non-existent file should return 404."""
mocker.patch(
"backend.api.features.workspace.routes.get_workspace",
return_value=MOCK_WORKSPACE,
return_value=_make_workspace(),
)
mock_manager = mocker.MagicMock()
mock_manager.delete_file = mocker.AsyncMock(return_value=False)
@@ -347,7 +470,7 @@ def test_delete_file_not_found(mocker: pytest_mock.MockFixture):
assert "File not found" in response.text
def test_delete_file_no_workspace(mocker: pytest_mock.MockFixture):
def test_delete_file_no_workspace(mocker):
"""Deleting when user has no workspace should return 404."""
mocker.patch(
"backend.api.features.workspace.routes.get_workspace",
@@ -357,3 +480,123 @@ def test_delete_file_no_workspace(mocker: pytest_mock.MockFixture):
response = client.delete("/files/file-aaa-bbb")
assert response.status_code == 404
assert "Workspace not found" in response.text
def test_upload_write_file_too_large_returns_413(mocker):
"""write_file raises ValueError("File too large: …") → must map to 413."""
mocker.patch(
"backend.api.features.workspace.routes.get_or_create_workspace",
return_value=_make_workspace(),
)
mocker.patch(
"backend.api.features.workspace.routes.get_workspace_total_size",
return_value=0,
)
mocker.patch(
"backend.api.features.workspace.routes.scan_content_safe",
return_value=None,
)
mock_manager = mocker.MagicMock()
mock_manager.write_file = mocker.AsyncMock(
side_effect=ValueError("File too large: 900 bytes exceeds 1MB limit")
)
mocker.patch(
"backend.api.features.workspace.routes.WorkspaceManager",
return_value=mock_manager,
)
response = _upload()
assert response.status_code == 413
assert "File too large" in response.text
def test_upload_write_file_conflict_returns_409(mocker):
"""Non-'File too large' ValueErrors from write_file stay as 409."""
mocker.patch(
"backend.api.features.workspace.routes.get_or_create_workspace",
return_value=_make_workspace(),
)
mocker.patch(
"backend.api.features.workspace.routes.get_workspace_total_size",
return_value=0,
)
mocker.patch(
"backend.api.features.workspace.routes.scan_content_safe",
return_value=None,
)
mock_manager = mocker.MagicMock()
mock_manager.write_file = mocker.AsyncMock(
side_effect=ValueError("File already exists at path: /sessions/x/a.txt")
)
mocker.patch(
"backend.api.features.workspace.routes.WorkspaceManager",
return_value=mock_manager,
)
response = _upload()
assert response.status_code == 409
assert "already exists" in response.text
@patch("backend.api.features.workspace.routes.get_or_create_workspace")
@patch("backend.api.features.workspace.routes.WorkspaceManager")
def test_list_files_has_more_true_when_limit_exceeded(
mock_manager_cls, mock_get_workspace
):
"""The limit+1 fetch trick must flip has_more=True and trim the page."""
mock_get_workspace.return_value = _make_workspace()
# Backend was asked for limit+1=3, and returned exactly 3 items.
files = [
_make_file(id="f1", name="a.txt"),
_make_file(id="f2", name="b.txt"),
_make_file(id="f3", name="c.txt"),
]
mock_instance = AsyncMock()
mock_instance.list_files.return_value = files
mock_manager_cls.return_value = mock_instance
response = client.get("/files?limit=2")
assert response.status_code == 200
data = response.json()
assert data["has_more"] is True
assert len(data["files"]) == 2
assert data["files"][0]["id"] == "f1"
assert data["files"][1]["id"] == "f2"
mock_instance.list_files.assert_called_once_with(
limit=3, offset=0, include_all_sessions=True
)
@patch("backend.api.features.workspace.routes.get_or_create_workspace")
@patch("backend.api.features.workspace.routes.WorkspaceManager")
def test_list_files_has_more_false_when_exactly_page_size(
mock_manager_cls, mock_get_workspace
):
"""Exactly `limit` rows means we're on the last page — has_more=False."""
mock_get_workspace.return_value = _make_workspace()
files = [_make_file(id="f1", name="a.txt"), _make_file(id="f2", name="b.txt")]
mock_instance = AsyncMock()
mock_instance.list_files.return_value = files
mock_manager_cls.return_value = mock_instance
response = client.get("/files?limit=2")
assert response.status_code == 200
data = response.json()
assert data["has_more"] is False
assert len(data["files"]) == 2
@patch("backend.api.features.workspace.routes.get_or_create_workspace")
@patch("backend.api.features.workspace.routes.WorkspaceManager")
def test_list_files_offset_is_echoed_back(mock_manager_cls, mock_get_workspace):
mock_get_workspace.return_value = _make_workspace()
mock_instance = AsyncMock()
mock_instance.list_files.return_value = []
mock_manager_cls.return_value = mock_instance
response = client.get("/files?offset=50&limit=10")
assert response.status_code == 200
assert response.json()["offset"] == 50
mock_instance.list_files.assert_called_once_with(
limit=11, offset=50, include_all_sessions=True
)

View File

@@ -18,6 +18,7 @@ from prisma.errors import PrismaError
import backend.api.features.admin.credit_admin_routes
import backend.api.features.admin.execution_analytics_routes
import backend.api.features.admin.platform_cost_routes
import backend.api.features.admin.rate_limit_admin_routes
import backend.api.features.admin.store_admin_routes
import backend.api.features.builder
@@ -329,6 +330,11 @@ app.include_router(
tags=["v2", "admin"],
prefix="/api/copilot",
)
app.include_router(
backend.api.features.admin.platform_cost_routes.router,
tags=["v2", "admin"],
prefix="/api/admin",
)
app.include_router(
backend.api.features.executions.review.routes.router,
tags=["v2", "executions", "review"],

View File

@@ -17,7 +17,7 @@ from backend.blocks.apollo.models import (
PrimaryPhone,
SearchOrganizationsRequest,
)
from backend.data.model import CredentialsField, SchemaField
from backend.data.model import CredentialsField, NodeExecutionStats, SchemaField
class SearchOrganizationsBlock(Block):
@@ -218,6 +218,11 @@ To find IDs, identify the values for organization_id when you call this endpoint
) -> BlockOutput:
query = SearchOrganizationsRequest(**input_data.model_dump())
organizations = await self.search_organizations(query, credentials)
self.merge_stats(
NodeExecutionStats(
provider_cost=float(len(organizations)), provider_cost_type="items"
)
)
for organization in organizations:
yield "organization", organization
yield "organizations", organizations

View File

@@ -21,7 +21,7 @@ from backend.blocks.apollo.models import (
SearchPeopleRequest,
SenorityLevels,
)
from backend.data.model import CredentialsField, SchemaField
from backend.data.model import CredentialsField, NodeExecutionStats, SchemaField
class SearchPeopleBlock(Block):
@@ -366,4 +366,9 @@ class SearchPeopleBlock(Block):
*(enrich_or_fallback(person) for person in people)
)
self.merge_stats(
NodeExecutionStats(
provider_cost=float(len(people)), provider_cost_type="items"
)
)
yield "people", people

View File

@@ -0,0 +1,712 @@
"""Unit tests for merge_stats cost tracking in individual blocks.
Covers the exa code_context, exa contents, and apollo organization blocks
to verify provider cost is correctly extracted and reported.
"""
from unittest.mock import AsyncMock, MagicMock, patch
import pytest
from pydantic import SecretStr
from backend.data.model import APIKeyCredentials, NodeExecutionStats
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
TEST_EXA_CREDENTIALS = APIKeyCredentials(
id="01234567-89ab-cdef-0123-456789abcdef",
provider="exa",
api_key=SecretStr("mock-exa-api-key"),
title="Mock Exa API key",
expires_at=None,
)
TEST_EXA_CREDENTIALS_INPUT = {
"provider": TEST_EXA_CREDENTIALS.provider,
"id": TEST_EXA_CREDENTIALS.id,
"type": TEST_EXA_CREDENTIALS.type,
"title": TEST_EXA_CREDENTIALS.title,
}
# ---------------------------------------------------------------------------
# ExaCodeContextBlock — cost_dollars is a string like "0.005"
# ---------------------------------------------------------------------------
class TestExaCodeContextBlockCostTracking:
@pytest.mark.asyncio
async def test_merge_stats_called_with_float_cost(self):
"""float(cost_dollars) parsed from API string and passed to merge_stats."""
from backend.blocks.exa.code_context import ExaCodeContextBlock
block = ExaCodeContextBlock()
api_response = {
"requestId": "req-1",
"query": "how to use hooks",
"response": "Here are some examples...",
"resultsCount": 3,
"costDollars": "0.005",
"searchTime": 1.2,
"outputTokens": 100,
}
mock_resp = MagicMock()
mock_resp.json.return_value = api_response
accumulated: list[NodeExecutionStats] = []
with (
patch(
"backend.blocks.exa.code_context.Requests.post",
new_callable=AsyncMock,
return_value=mock_resp,
),
patch.object(
block, "merge_stats", side_effect=lambda s: accumulated.append(s)
),
):
input_data = ExaCodeContextBlock.Input(
query="how to use hooks",
credentials=TEST_EXA_CREDENTIALS_INPUT, # type: ignore[arg-type]
)
results = []
async for output in block.run(
input_data,
credentials=TEST_EXA_CREDENTIALS,
):
results.append(output)
assert len(accumulated) == 1
assert accumulated[0].provider_cost == pytest.approx(0.005)
@pytest.mark.asyncio
async def test_invalid_cost_dollars_does_not_raise(self):
"""When cost_dollars cannot be parsed as float, merge_stats is not called."""
from backend.blocks.exa.code_context import ExaCodeContextBlock
block = ExaCodeContextBlock()
api_response = {
"requestId": "req-2",
"query": "query",
"response": "response",
"resultsCount": 0,
"costDollars": "N/A",
"searchTime": 0.5,
"outputTokens": 0,
}
mock_resp = MagicMock()
mock_resp.json.return_value = api_response
merge_calls: list[NodeExecutionStats] = []
with (
patch(
"backend.blocks.exa.code_context.Requests.post",
new_callable=AsyncMock,
return_value=mock_resp,
),
patch.object(
block, "merge_stats", side_effect=lambda s: merge_calls.append(s)
),
):
input_data = ExaCodeContextBlock.Input(
query="query",
credentials=TEST_EXA_CREDENTIALS_INPUT, # type: ignore[arg-type]
)
async for _ in block.run(
input_data,
credentials=TEST_EXA_CREDENTIALS,
):
pass
assert merge_calls == []
@pytest.mark.asyncio
async def test_zero_cost_is_tracked(self):
"""A zero cost_dollars string '0.0' should still be recorded."""
from backend.blocks.exa.code_context import ExaCodeContextBlock
block = ExaCodeContextBlock()
api_response = {
"requestId": "req-3",
"query": "query",
"response": "...",
"resultsCount": 1,
"costDollars": "0.0",
"searchTime": 0.1,
"outputTokens": 10,
}
mock_resp = MagicMock()
mock_resp.json.return_value = api_response
accumulated: list[NodeExecutionStats] = []
with (
patch(
"backend.blocks.exa.code_context.Requests.post",
new_callable=AsyncMock,
return_value=mock_resp,
),
patch.object(
block, "merge_stats", side_effect=lambda s: accumulated.append(s)
),
):
input_data = ExaCodeContextBlock.Input(
query="query",
credentials=TEST_EXA_CREDENTIALS_INPUT, # type: ignore[arg-type]
)
async for _ in block.run(
input_data,
credentials=TEST_EXA_CREDENTIALS,
):
pass
assert len(accumulated) == 1
assert accumulated[0].provider_cost == 0.0
# ---------------------------------------------------------------------------
# ExaContentsBlock — response.cost_dollars.total (CostDollars model)
# ---------------------------------------------------------------------------
class TestExaContentsBlockCostTracking:
@pytest.mark.asyncio
async def test_merge_stats_called_with_cost_dollars_total(self):
"""provider_cost equals response.cost_dollars.total when present."""
from backend.blocks.exa.contents import ExaContentsBlock
from backend.blocks.exa.helpers import CostDollars
block = ExaContentsBlock()
cost_dollars = CostDollars(total=0.012)
mock_response = MagicMock()
mock_response.results = []
mock_response.context = None
mock_response.statuses = None
mock_response.cost_dollars = cost_dollars
accumulated: list[NodeExecutionStats] = []
with (
patch(
"backend.blocks.exa.contents.AsyncExa",
return_value=MagicMock(
get_contents=AsyncMock(return_value=mock_response)
),
),
patch.object(
block, "merge_stats", side_effect=lambda s: accumulated.append(s)
),
):
input_data = ExaContentsBlock.Input(
urls=["https://example.com"],
credentials=TEST_EXA_CREDENTIALS_INPUT, # type: ignore[arg-type]
)
async for _ in block.run(
input_data,
credentials=TEST_EXA_CREDENTIALS,
):
pass
assert len(accumulated) == 1
assert accumulated[0].provider_cost == pytest.approx(0.012)
@pytest.mark.asyncio
async def test_no_merge_stats_when_cost_dollars_absent(self):
"""When response.cost_dollars is None, merge_stats is not called."""
from backend.blocks.exa.contents import ExaContentsBlock
block = ExaContentsBlock()
mock_response = MagicMock()
mock_response.results = []
mock_response.context = None
mock_response.statuses = None
mock_response.cost_dollars = None
accumulated: list[NodeExecutionStats] = []
with (
patch(
"backend.blocks.exa.contents.AsyncExa",
return_value=MagicMock(
get_contents=AsyncMock(return_value=mock_response)
),
),
patch.object(
block, "merge_stats", side_effect=lambda s: accumulated.append(s)
),
):
input_data = ExaContentsBlock.Input(
urls=["https://example.com"],
credentials=TEST_EXA_CREDENTIALS_INPUT, # type: ignore[arg-type]
)
async for _ in block.run(
input_data,
credentials=TEST_EXA_CREDENTIALS,
):
pass
assert accumulated == []
# ---------------------------------------------------------------------------
# SearchOrganizationsBlock — provider_cost = float(len(organizations))
# ---------------------------------------------------------------------------
class TestSearchOrganizationsBlockCostTracking:
@pytest.mark.asyncio
async def test_merge_stats_called_with_org_count(self):
"""provider_cost == number of returned organizations, type == 'items'."""
from backend.blocks.apollo._auth import TEST_CREDENTIALS as APOLLO_CREDS
from backend.blocks.apollo._auth import (
TEST_CREDENTIALS_INPUT as APOLLO_CREDS_INPUT,
)
from backend.blocks.apollo.models import Organization
from backend.blocks.apollo.organization import SearchOrganizationsBlock
block = SearchOrganizationsBlock()
fake_orgs = [Organization(id=str(i), name=f"Org{i}") for i in range(3)]
accumulated: list[NodeExecutionStats] = []
with (
patch.object(
SearchOrganizationsBlock,
"search_organizations",
new_callable=AsyncMock,
return_value=fake_orgs,
),
patch.object(
block, "merge_stats", side_effect=lambda s: accumulated.append(s)
),
):
input_data = SearchOrganizationsBlock.Input(
credentials=APOLLO_CREDS_INPUT, # type: ignore[arg-type]
)
results = []
async for output in block.run(
input_data,
credentials=APOLLO_CREDS,
):
results.append(output)
assert len(accumulated) == 1
assert accumulated[0].provider_cost == pytest.approx(3.0)
assert accumulated[0].provider_cost_type == "items"
@pytest.mark.asyncio
async def test_empty_org_list_tracks_zero(self):
"""An empty organization list results in provider_cost=0.0."""
from backend.blocks.apollo._auth import TEST_CREDENTIALS as APOLLO_CREDS
from backend.blocks.apollo._auth import (
TEST_CREDENTIALS_INPUT as APOLLO_CREDS_INPUT,
)
from backend.blocks.apollo.organization import SearchOrganizationsBlock
block = SearchOrganizationsBlock()
accumulated: list[NodeExecutionStats] = []
with (
patch.object(
SearchOrganizationsBlock,
"search_organizations",
new_callable=AsyncMock,
return_value=[],
),
patch.object(
block, "merge_stats", side_effect=lambda s: accumulated.append(s)
),
):
input_data = SearchOrganizationsBlock.Input(
credentials=APOLLO_CREDS_INPUT, # type: ignore[arg-type]
)
async for _ in block.run(
input_data,
credentials=APOLLO_CREDS,
):
pass
assert len(accumulated) == 1
assert accumulated[0].provider_cost == 0.0
assert accumulated[0].provider_cost_type == "items"
# ---------------------------------------------------------------------------
# JinaEmbeddingBlock — token count from usage.total_tokens
# ---------------------------------------------------------------------------
class TestJinaEmbeddingBlockCostTracking:
@pytest.mark.asyncio
async def test_merge_stats_called_with_token_count(self):
"""provider token count is recorded when API returns usage.total_tokens."""
from backend.blocks.jina._auth import TEST_CREDENTIALS as JINA_CREDS
from backend.blocks.jina._auth import TEST_CREDENTIALS_INPUT as JINA_CREDS_INPUT
from backend.blocks.jina.embeddings import JinaEmbeddingBlock
block = JinaEmbeddingBlock()
api_response = {
"data": [{"embedding": [0.1, 0.2, 0.3]}],
"usage": {"total_tokens": 42},
}
mock_resp = MagicMock()
mock_resp.json.return_value = api_response
accumulated: list[NodeExecutionStats] = []
with (
patch(
"backend.blocks.jina.embeddings.Requests.post",
new_callable=AsyncMock,
return_value=mock_resp,
),
patch.object(
block, "merge_stats", side_effect=lambda s: accumulated.append(s)
),
):
input_data = JinaEmbeddingBlock.Input(
texts=["hello world"],
credentials=JINA_CREDS_INPUT, # type: ignore[arg-type]
)
async for _ in block.run(input_data, credentials=JINA_CREDS):
pass
assert len(accumulated) == 1
assert accumulated[0].input_token_count == 42
@pytest.mark.asyncio
async def test_no_merge_stats_when_usage_absent(self):
"""When API response omits usage field, merge_stats is not called."""
from backend.blocks.jina._auth import TEST_CREDENTIALS as JINA_CREDS
from backend.blocks.jina._auth import TEST_CREDENTIALS_INPUT as JINA_CREDS_INPUT
from backend.blocks.jina.embeddings import JinaEmbeddingBlock
block = JinaEmbeddingBlock()
api_response = {
"data": [{"embedding": [0.1, 0.2, 0.3]}],
}
mock_resp = MagicMock()
mock_resp.json.return_value = api_response
accumulated: list[NodeExecutionStats] = []
with (
patch(
"backend.blocks.jina.embeddings.Requests.post",
new_callable=AsyncMock,
return_value=mock_resp,
),
patch.object(
block, "merge_stats", side_effect=lambda s: accumulated.append(s)
),
):
input_data = JinaEmbeddingBlock.Input(
texts=["hello"],
credentials=JINA_CREDS_INPUT, # type: ignore[arg-type]
)
async for _ in block.run(input_data, credentials=JINA_CREDS):
pass
assert accumulated == []
# ---------------------------------------------------------------------------
# UnrealTextToSpeechBlock — character count from input text length
# ---------------------------------------------------------------------------
class TestUnrealTextToSpeechBlockCostTracking:
@pytest.mark.asyncio
async def test_merge_stats_called_with_character_count(self):
"""provider_cost equals len(text) with type='characters'."""
from backend.blocks.text_to_speech_block import TEST_CREDENTIALS as TTS_CREDS
from backend.blocks.text_to_speech_block import (
TEST_CREDENTIALS_INPUT as TTS_CREDS_INPUT,
)
from backend.blocks.text_to_speech_block import UnrealTextToSpeechBlock
block = UnrealTextToSpeechBlock()
test_text = "Hello, world!"
with (
patch.object(
UnrealTextToSpeechBlock,
"call_unreal_speech_api",
new_callable=AsyncMock,
return_value={"OutputUri": "https://example.com/audio.mp3"},
),
patch.object(block, "merge_stats") as mock_merge,
):
input_data = UnrealTextToSpeechBlock.Input(
text=test_text,
credentials=TTS_CREDS_INPUT, # type: ignore[arg-type]
)
async for _ in block.run(input_data, credentials=TTS_CREDS):
pass
mock_merge.assert_called_once()
stats = mock_merge.call_args[0][0]
assert stats.provider_cost == float(len(test_text))
assert stats.provider_cost_type == "characters"
@pytest.mark.asyncio
async def test_empty_text_gives_zero_characters(self):
"""An empty text string results in provider_cost=0.0."""
from backend.blocks.text_to_speech_block import TEST_CREDENTIALS as TTS_CREDS
from backend.blocks.text_to_speech_block import (
TEST_CREDENTIALS_INPUT as TTS_CREDS_INPUT,
)
from backend.blocks.text_to_speech_block import UnrealTextToSpeechBlock
block = UnrealTextToSpeechBlock()
with (
patch.object(
UnrealTextToSpeechBlock,
"call_unreal_speech_api",
new_callable=AsyncMock,
return_value={"OutputUri": "https://example.com/audio.mp3"},
),
patch.object(block, "merge_stats") as mock_merge,
):
input_data = UnrealTextToSpeechBlock.Input(
text="",
credentials=TTS_CREDS_INPUT, # type: ignore[arg-type]
)
async for _ in block.run(input_data, credentials=TTS_CREDS):
pass
mock_merge.assert_called_once()
stats = mock_merge.call_args[0][0]
assert stats.provider_cost == 0.0
assert stats.provider_cost_type == "characters"
# ---------------------------------------------------------------------------
# GoogleMapsSearchBlock — item count from search_places results
# ---------------------------------------------------------------------------
class TestGoogleMapsSearchBlockCostTracking:
@pytest.mark.asyncio
async def test_merge_stats_called_with_place_count(self):
"""provider_cost equals number of returned places, type == 'items'."""
from backend.blocks.google_maps import TEST_CREDENTIALS as MAPS_CREDS
from backend.blocks.google_maps import (
TEST_CREDENTIALS_INPUT as MAPS_CREDS_INPUT,
)
from backend.blocks.google_maps import GoogleMapsSearchBlock
block = GoogleMapsSearchBlock()
fake_places = [{"name": f"Place{i}", "address": f"Addr{i}"} for i in range(4)]
accumulated: list[NodeExecutionStats] = []
with (
patch.object(
GoogleMapsSearchBlock,
"search_places",
return_value=fake_places,
),
patch.object(
block, "merge_stats", side_effect=lambda s: accumulated.append(s)
),
):
input_data = GoogleMapsSearchBlock.Input(
query="coffee shops",
credentials=MAPS_CREDS_INPUT, # type: ignore[arg-type]
)
async for _ in block.run(input_data, credentials=MAPS_CREDS):
pass
assert len(accumulated) == 1
assert accumulated[0].provider_cost == 4.0
assert accumulated[0].provider_cost_type == "items"
@pytest.mark.asyncio
async def test_empty_results_tracks_zero(self):
"""Zero places returned results in provider_cost=0.0."""
from backend.blocks.google_maps import TEST_CREDENTIALS as MAPS_CREDS
from backend.blocks.google_maps import (
TEST_CREDENTIALS_INPUT as MAPS_CREDS_INPUT,
)
from backend.blocks.google_maps import GoogleMapsSearchBlock
block = GoogleMapsSearchBlock()
accumulated: list[NodeExecutionStats] = []
with (
patch.object(
GoogleMapsSearchBlock,
"search_places",
return_value=[],
),
patch.object(
block, "merge_stats", side_effect=lambda s: accumulated.append(s)
),
):
input_data = GoogleMapsSearchBlock.Input(
query="nothing here",
credentials=MAPS_CREDS_INPUT, # type: ignore[arg-type]
)
async for _ in block.run(input_data, credentials=MAPS_CREDS):
pass
assert len(accumulated) == 1
assert accumulated[0].provider_cost == 0.0
assert accumulated[0].provider_cost_type == "items"
# ---------------------------------------------------------------------------
# SmartLeadAddLeadsBlock — item count from lead_list length
# ---------------------------------------------------------------------------
class TestSmartLeadAddLeadsBlockCostTracking:
@pytest.mark.asyncio
async def test_merge_stats_called_with_lead_count(self):
"""provider_cost equals number of leads uploaded, type == 'items'."""
from backend.blocks.smartlead._auth import TEST_CREDENTIALS as SL_CREDS
from backend.blocks.smartlead._auth import (
TEST_CREDENTIALS_INPUT as SL_CREDS_INPUT,
)
from backend.blocks.smartlead.campaign import AddLeadToCampaignBlock
from backend.blocks.smartlead.models import (
AddLeadsToCampaignResponse,
LeadInput,
)
block = AddLeadToCampaignBlock()
fake_leads = [
LeadInput(first_name="Alice", last_name="A", email="alice@example.com"),
LeadInput(first_name="Bob", last_name="B", email="bob@example.com"),
]
fake_response = AddLeadsToCampaignResponse(
ok=True,
upload_count=2,
total_leads=2,
block_count=0,
duplicate_count=0,
invalid_email_count=0,
invalid_emails=[],
already_added_to_campaign=0,
unsubscribed_leads=[],
is_lead_limit_exhausted=False,
lead_import_stopped_count=0,
bounce_count=0,
)
accumulated: list[NodeExecutionStats] = []
with (
patch.object(
AddLeadToCampaignBlock,
"add_leads_to_campaign",
new_callable=AsyncMock,
return_value=fake_response,
),
patch.object(
block, "merge_stats", side_effect=lambda s: accumulated.append(s)
),
):
input_data = AddLeadToCampaignBlock.Input(
campaign_id=123,
lead_list=fake_leads,
credentials=SL_CREDS_INPUT, # type: ignore[arg-type]
)
async for _ in block.run(input_data, credentials=SL_CREDS):
pass
assert len(accumulated) == 1
assert accumulated[0].provider_cost == 2.0
assert accumulated[0].provider_cost_type == "items"
# ---------------------------------------------------------------------------
# SearchPeopleBlock — item count from people list length
# ---------------------------------------------------------------------------
class TestSearchPeopleBlockCostTracking:
@pytest.mark.asyncio
async def test_merge_stats_called_with_people_count(self):
"""provider_cost equals number of returned people, type == 'items'."""
from backend.blocks.apollo._auth import TEST_CREDENTIALS as APOLLO_CREDS
from backend.blocks.apollo._auth import (
TEST_CREDENTIALS_INPUT as APOLLO_CREDS_INPUT,
)
from backend.blocks.apollo.models import Contact
from backend.blocks.apollo.people import SearchPeopleBlock
block = SearchPeopleBlock()
fake_people = [Contact(id=str(i), first_name=f"Person{i}") for i in range(5)]
accumulated: list[NodeExecutionStats] = []
with (
patch.object(
SearchPeopleBlock,
"search_people",
new_callable=AsyncMock,
return_value=fake_people,
),
patch.object(
block, "merge_stats", side_effect=lambda s: accumulated.append(s)
),
):
input_data = SearchPeopleBlock.Input(
credentials=APOLLO_CREDS_INPUT, # type: ignore[arg-type]
)
async for _ in block.run(input_data, credentials=APOLLO_CREDS):
pass
assert len(accumulated) == 1
assert accumulated[0].provider_cost == pytest.approx(5.0)
assert accumulated[0].provider_cost_type == "items"
@pytest.mark.asyncio
async def test_empty_people_list_tracks_zero(self):
"""An empty people list results in provider_cost=0.0."""
from backend.blocks.apollo._auth import TEST_CREDENTIALS as APOLLO_CREDS
from backend.blocks.apollo._auth import (
TEST_CREDENTIALS_INPUT as APOLLO_CREDS_INPUT,
)
from backend.blocks.apollo.people import SearchPeopleBlock
block = SearchPeopleBlock()
accumulated: list[NodeExecutionStats] = []
with (
patch.object(
SearchPeopleBlock,
"search_people",
new_callable=AsyncMock,
return_value=[],
),
patch.object(
block, "merge_stats", side_effect=lambda s: accumulated.append(s)
),
):
input_data = SearchPeopleBlock.Input(
credentials=APOLLO_CREDS_INPUT, # type: ignore[arg-type]
)
async for _ in block.run(input_data, credentials=APOLLO_CREDS):
pass
assert len(accumulated) == 1
assert accumulated[0].provider_cost == 0.0
assert accumulated[0].provider_cost_type == "items"

View File

@@ -9,6 +9,7 @@ from typing import Union
from pydantic import BaseModel
from backend.data.model import NodeExecutionStats
from backend.sdk import (
APIKeyCredentials,
Block,
@@ -116,3 +117,10 @@ class ExaCodeContextBlock(Block):
yield "cost_dollars", context.cost_dollars
yield "search_time", context.search_time
yield "output_tokens", context.output_tokens
# Parse cost_dollars (API returns as string, e.g. "0.005")
try:
cost_usd = float(context.cost_dollars)
self.merge_stats(NodeExecutionStats(provider_cost=cost_usd))
except (ValueError, TypeError):
pass

View File

@@ -4,6 +4,7 @@ from typing import Optional
from exa_py import AsyncExa
from pydantic import BaseModel
from backend.data.model import NodeExecutionStats
from backend.sdk import (
APIKeyCredentials,
Block,
@@ -223,3 +224,6 @@ class ExaContentsBlock(Block):
if response.cost_dollars:
yield "cost_dollars", response.cost_dollars
self.merge_stats(
NodeExecutionStats(provider_cost=response.cost_dollars.total)
)

View File

@@ -0,0 +1,575 @@
"""Tests for cost tracking in Exa blocks.
Covers the cost_dollars → provider_cost → merge_stats path for both
ExaContentsBlock and ExaCodeContextBlock.
"""
from unittest.mock import AsyncMock, MagicMock, patch
import pytest
from backend.blocks.exa._test import TEST_CREDENTIALS, TEST_CREDENTIALS_INPUT
from backend.data.model import NodeExecutionStats
class TestExaCodeContextCostTracking:
"""ExaCodeContextBlock parses cost_dollars (string) and calls merge_stats."""
@pytest.mark.asyncio
async def test_valid_cost_string_is_parsed_and_merged(self):
"""A numeric cost string like '0.005' is merged as provider_cost."""
from backend.blocks.exa.code_context import ExaCodeContextBlock
block = ExaCodeContextBlock()
merged: list[NodeExecutionStats] = []
block.merge_stats = lambda s: merged.append(s) # type: ignore[assignment]
api_response = {
"requestId": "req-1",
"query": "test query",
"response": "some code",
"resultsCount": 3,
"costDollars": "0.005",
"searchTime": 1.2,
"outputTokens": 100,
}
with patch("backend.blocks.exa.code_context.Requests") as mock_requests_cls:
mock_resp = MagicMock()
mock_resp.json.return_value = api_response
mock_requests_cls.return_value.post = AsyncMock(return_value=mock_resp)
outputs = []
async for key, value in block.run(
block.Input(query="test query", credentials=TEST_CREDENTIALS_INPUT), # type: ignore[arg-type]
credentials=TEST_CREDENTIALS,
):
outputs.append((key, value))
assert any(k == "cost_dollars" for k, _ in outputs)
assert len(merged) == 1
assert merged[0].provider_cost == pytest.approx(0.005)
@pytest.mark.asyncio
async def test_invalid_cost_string_does_not_raise(self):
"""A non-numeric cost_dollars value is swallowed silently."""
from backend.blocks.exa.code_context import ExaCodeContextBlock
block = ExaCodeContextBlock()
merged: list[NodeExecutionStats] = []
block.merge_stats = lambda s: merged.append(s) # type: ignore[assignment]
api_response = {
"requestId": "req-2",
"query": "test",
"response": "code",
"resultsCount": 0,
"costDollars": "N/A",
"searchTime": 0.5,
"outputTokens": 0,
}
with patch("backend.blocks.exa.code_context.Requests") as mock_requests_cls:
mock_resp = MagicMock()
mock_resp.json.return_value = api_response
mock_requests_cls.return_value.post = AsyncMock(return_value=mock_resp)
outputs = []
async for key, value in block.run(
block.Input(query="test", credentials=TEST_CREDENTIALS_INPUT), # type: ignore[arg-type]
credentials=TEST_CREDENTIALS,
):
outputs.append((key, value))
# No merge_stats call because float() raised ValueError
assert len(merged) == 0
@pytest.mark.asyncio
async def test_zero_cost_string_is_merged(self):
"""'0.0' is a valid cost — should still be tracked."""
from backend.blocks.exa.code_context import ExaCodeContextBlock
block = ExaCodeContextBlock()
merged: list[NodeExecutionStats] = []
block.merge_stats = lambda s: merged.append(s) # type: ignore[assignment]
api_response = {
"requestId": "req-3",
"query": "free query",
"response": "result",
"resultsCount": 1,
"costDollars": "0.0",
"searchTime": 0.1,
"outputTokens": 10,
}
with patch("backend.blocks.exa.code_context.Requests") as mock_requests_cls:
mock_resp = MagicMock()
mock_resp.json.return_value = api_response
mock_requests_cls.return_value.post = AsyncMock(return_value=mock_resp)
async for _ in block.run(
block.Input(query="free query", credentials=TEST_CREDENTIALS_INPUT), # type: ignore[arg-type]
credentials=TEST_CREDENTIALS,
):
pass
assert len(merged) == 1
assert merged[0].provider_cost == pytest.approx(0.0)
class TestExaContentsCostTracking:
"""ExaContentsBlock merges cost_dollars.total as provider_cost."""
@pytest.mark.asyncio
async def test_cost_dollars_total_is_merged(self):
"""When the SDK response includes cost_dollars, its total is merged."""
from backend.blocks.exa.contents import ExaContentsBlock
from backend.blocks.exa.helpers import CostDollars
block = ExaContentsBlock()
merged: list[NodeExecutionStats] = []
block.merge_stats = lambda s: merged.append(s) # type: ignore[assignment]
mock_sdk_response = MagicMock()
mock_sdk_response.results = []
mock_sdk_response.context = None
mock_sdk_response.statuses = None
mock_sdk_response.cost_dollars = CostDollars(total=0.012)
with patch("backend.blocks.exa.contents.AsyncExa") as mock_exa_cls:
mock_exa = MagicMock()
mock_exa.get_contents = AsyncMock(return_value=mock_sdk_response)
mock_exa_cls.return_value = mock_exa
async for _ in block.run(
block.Input(urls=["https://example.com"], credentials=TEST_CREDENTIALS_INPUT), # type: ignore[arg-type]
credentials=TEST_CREDENTIALS,
):
pass
assert len(merged) == 1
assert merged[0].provider_cost == pytest.approx(0.012)
@pytest.mark.asyncio
async def test_no_cost_dollars_skips_merge(self):
"""When cost_dollars is absent, merge_stats is not called."""
from backend.blocks.exa.contents import ExaContentsBlock
block = ExaContentsBlock()
merged: list[NodeExecutionStats] = []
block.merge_stats = lambda s: merged.append(s) # type: ignore[assignment]
mock_sdk_response = MagicMock()
mock_sdk_response.results = []
mock_sdk_response.context = None
mock_sdk_response.statuses = None
mock_sdk_response.cost_dollars = None
with patch("backend.blocks.exa.contents.AsyncExa") as mock_exa_cls:
mock_exa = MagicMock()
mock_exa.get_contents = AsyncMock(return_value=mock_sdk_response)
mock_exa_cls.return_value = mock_exa
async for _ in block.run(
block.Input(urls=["https://example.com"], credentials=TEST_CREDENTIALS_INPUT), # type: ignore[arg-type]
credentials=TEST_CREDENTIALS,
):
pass
assert len(merged) == 0
@pytest.mark.asyncio
async def test_zero_cost_dollars_is_merged(self):
"""A total of 0.0 (free tier) should still be merged."""
from backend.blocks.exa.contents import ExaContentsBlock
from backend.blocks.exa.helpers import CostDollars
block = ExaContentsBlock()
merged: list[NodeExecutionStats] = []
block.merge_stats = lambda s: merged.append(s) # type: ignore[assignment]
mock_sdk_response = MagicMock()
mock_sdk_response.results = []
mock_sdk_response.context = None
mock_sdk_response.statuses = None
mock_sdk_response.cost_dollars = CostDollars(total=0.0)
with patch("backend.blocks.exa.contents.AsyncExa") as mock_exa_cls:
mock_exa = MagicMock()
mock_exa.get_contents = AsyncMock(return_value=mock_sdk_response)
mock_exa_cls.return_value = mock_exa
async for _ in block.run(
block.Input(urls=["https://example.com"], credentials=TEST_CREDENTIALS_INPUT), # type: ignore[arg-type]
credentials=TEST_CREDENTIALS,
):
pass
assert len(merged) == 1
assert merged[0].provider_cost == pytest.approx(0.0)
class TestExaSearchCostTracking:
"""ExaSearchBlock merges cost_dollars.total as provider_cost."""
@pytest.mark.asyncio
async def test_cost_dollars_total_is_merged(self):
"""When the SDK response includes cost_dollars, its total is merged."""
from backend.blocks.exa.helpers import CostDollars
from backend.blocks.exa.search import ExaSearchBlock
block = ExaSearchBlock()
merged: list[NodeExecutionStats] = []
block.merge_stats = lambda s: merged.append(s) # type: ignore[assignment]
mock_sdk_response = MagicMock()
mock_sdk_response.results = []
mock_sdk_response.context = None
mock_sdk_response.resolved_search_type = None
mock_sdk_response.cost_dollars = CostDollars(total=0.008)
with patch("backend.blocks.exa.search.AsyncExa") as mock_exa_cls:
mock_exa = MagicMock()
mock_exa.search = AsyncMock(return_value=mock_sdk_response)
mock_exa_cls.return_value = mock_exa
async for _ in block.run(
block.Input(query="test query", credentials=TEST_CREDENTIALS_INPUT), # type: ignore[arg-type]
credentials=TEST_CREDENTIALS,
):
pass
assert len(merged) == 1
assert merged[0].provider_cost == pytest.approx(0.008)
@pytest.mark.asyncio
async def test_no_cost_dollars_skips_merge(self):
"""When cost_dollars is absent, merge_stats is not called."""
from backend.blocks.exa.search import ExaSearchBlock
block = ExaSearchBlock()
merged: list[NodeExecutionStats] = []
block.merge_stats = lambda s: merged.append(s) # type: ignore[assignment]
mock_sdk_response = MagicMock()
mock_sdk_response.results = []
mock_sdk_response.context = None
mock_sdk_response.resolved_search_type = None
mock_sdk_response.cost_dollars = None
with patch("backend.blocks.exa.search.AsyncExa") as mock_exa_cls:
mock_exa = MagicMock()
mock_exa.search = AsyncMock(return_value=mock_sdk_response)
mock_exa_cls.return_value = mock_exa
async for _ in block.run(
block.Input(query="test query", credentials=TEST_CREDENTIALS_INPUT), # type: ignore[arg-type]
credentials=TEST_CREDENTIALS,
):
pass
assert len(merged) == 0
class TestExaSimilarCostTracking:
"""ExaFindSimilarBlock merges cost_dollars.total as provider_cost."""
@pytest.mark.asyncio
async def test_cost_dollars_total_is_merged(self):
"""When the SDK response includes cost_dollars, its total is merged."""
from backend.blocks.exa.helpers import CostDollars
from backend.blocks.exa.similar import ExaFindSimilarBlock
block = ExaFindSimilarBlock()
merged: list[NodeExecutionStats] = []
block.merge_stats = lambda s: merged.append(s) # type: ignore[assignment]
mock_sdk_response = MagicMock()
mock_sdk_response.results = []
mock_sdk_response.context = None
mock_sdk_response.request_id = "req-1"
mock_sdk_response.cost_dollars = CostDollars(total=0.015)
with patch("backend.blocks.exa.similar.AsyncExa") as mock_exa_cls:
mock_exa = MagicMock()
mock_exa.find_similar = AsyncMock(return_value=mock_sdk_response)
mock_exa_cls.return_value = mock_exa
async for _ in block.run(
block.Input(url="https://example.com", credentials=TEST_CREDENTIALS_INPUT), # type: ignore[arg-type]
credentials=TEST_CREDENTIALS,
):
pass
assert len(merged) == 1
assert merged[0].provider_cost == pytest.approx(0.015)
@pytest.mark.asyncio
async def test_no_cost_dollars_skips_merge(self):
"""When cost_dollars is absent, merge_stats is not called."""
from backend.blocks.exa.similar import ExaFindSimilarBlock
block = ExaFindSimilarBlock()
merged: list[NodeExecutionStats] = []
block.merge_stats = lambda s: merged.append(s) # type: ignore[assignment]
mock_sdk_response = MagicMock()
mock_sdk_response.results = []
mock_sdk_response.context = None
mock_sdk_response.request_id = "req-2"
mock_sdk_response.cost_dollars = None
with patch("backend.blocks.exa.similar.AsyncExa") as mock_exa_cls:
mock_exa = MagicMock()
mock_exa.find_similar = AsyncMock(return_value=mock_sdk_response)
mock_exa_cls.return_value = mock_exa
async for _ in block.run(
block.Input(url="https://example.com", credentials=TEST_CREDENTIALS_INPUT), # type: ignore[arg-type]
credentials=TEST_CREDENTIALS,
):
pass
assert len(merged) == 0
# ---------------------------------------------------------------------------
# ExaCreateResearchBlock — cost_dollars from completed poll response
# ---------------------------------------------------------------------------
COMPLETED_RESEARCH_RESPONSE = {
"researchId": "test-research-id",
"status": "completed",
"model": "exa-research",
"instructions": "test instructions",
"createdAt": 1700000000000,
"finishedAt": 1700000060000,
"costDollars": {
"total": 0.05,
"numSearches": 3,
"numPages": 10,
"reasoningTokens": 500,
},
"output": {"content": "Research findings...", "parsed": None},
}
PENDING_RESEARCH_RESPONSE = {
"researchId": "test-research-id",
"status": "pending",
"model": "exa-research",
"instructions": "test instructions",
"createdAt": 1700000000000,
}
class TestExaCreateResearchBlockCostTracking:
"""ExaCreateResearchBlock merges cost from completed poll response."""
@pytest.mark.asyncio
async def test_cost_merged_when_research_completes(self):
"""merge_stats called with provider_cost=total when poll returns completed."""
from backend.blocks.exa.research import ExaCreateResearchBlock
block = ExaCreateResearchBlock()
merged: list[NodeExecutionStats] = []
block.merge_stats = lambda s: merged.append(s) # type: ignore[assignment]
create_resp = MagicMock()
create_resp.json.return_value = PENDING_RESEARCH_RESPONSE
poll_resp = MagicMock()
poll_resp.json.return_value = COMPLETED_RESEARCH_RESPONSE
mock_instance = MagicMock()
mock_instance.post = AsyncMock(return_value=create_resp)
mock_instance.get = AsyncMock(return_value=poll_resp)
with (
patch("backend.blocks.exa.research.Requests", return_value=mock_instance),
patch("asyncio.sleep", new=AsyncMock()),
):
async for _ in block.run(
block.Input(
instructions="test instructions",
wait_for_completion=True,
credentials=TEST_CREDENTIALS_INPUT, # type: ignore[arg-type]
),
credentials=TEST_CREDENTIALS,
):
pass
assert len(merged) == 1
assert merged[0].provider_cost == pytest.approx(0.05)
@pytest.mark.asyncio
async def test_no_merge_when_no_cost_dollars(self):
"""When completed response has no costDollars, merge_stats is not called."""
from backend.blocks.exa.research import ExaCreateResearchBlock
block = ExaCreateResearchBlock()
merged: list[NodeExecutionStats] = []
block.merge_stats = lambda s: merged.append(s) # type: ignore[assignment]
no_cost_response = {**COMPLETED_RESEARCH_RESPONSE, "costDollars": None}
create_resp = MagicMock()
create_resp.json.return_value = PENDING_RESEARCH_RESPONSE
poll_resp = MagicMock()
poll_resp.json.return_value = no_cost_response
mock_instance = MagicMock()
mock_instance.post = AsyncMock(return_value=create_resp)
mock_instance.get = AsyncMock(return_value=poll_resp)
with (
patch("backend.blocks.exa.research.Requests", return_value=mock_instance),
patch("asyncio.sleep", new=AsyncMock()),
):
async for _ in block.run(
block.Input(
instructions="test instructions",
wait_for_completion=True,
credentials=TEST_CREDENTIALS_INPUT, # type: ignore[arg-type]
),
credentials=TEST_CREDENTIALS,
):
pass
assert merged == []
# ---------------------------------------------------------------------------
# ExaGetResearchBlock — cost_dollars from single GET response
# ---------------------------------------------------------------------------
class TestExaGetResearchBlockCostTracking:
"""ExaGetResearchBlock merges cost when the fetched research has cost_dollars."""
@pytest.mark.asyncio
async def test_cost_merged_from_completed_research(self):
"""merge_stats called with provider_cost=total when research has costDollars."""
from backend.blocks.exa.research import ExaGetResearchBlock
block = ExaGetResearchBlock()
merged: list[NodeExecutionStats] = []
block.merge_stats = lambda s: merged.append(s) # type: ignore[assignment]
get_resp = MagicMock()
get_resp.json.return_value = COMPLETED_RESEARCH_RESPONSE
mock_instance = MagicMock()
mock_instance.get = AsyncMock(return_value=get_resp)
with patch("backend.blocks.exa.research.Requests", return_value=mock_instance):
async for _ in block.run(
block.Input(
research_id="test-research-id",
credentials=TEST_CREDENTIALS_INPUT, # type: ignore[arg-type]
),
credentials=TEST_CREDENTIALS,
):
pass
assert len(merged) == 1
assert merged[0].provider_cost == pytest.approx(0.05)
@pytest.mark.asyncio
async def test_no_merge_when_no_cost_dollars(self):
"""When research has no costDollars, merge_stats is not called."""
from backend.blocks.exa.research import ExaGetResearchBlock
block = ExaGetResearchBlock()
merged: list[NodeExecutionStats] = []
block.merge_stats = lambda s: merged.append(s) # type: ignore[assignment]
no_cost_response = {**COMPLETED_RESEARCH_RESPONSE, "costDollars": None}
get_resp = MagicMock()
get_resp.json.return_value = no_cost_response
mock_instance = MagicMock()
mock_instance.get = AsyncMock(return_value=get_resp)
with patch("backend.blocks.exa.research.Requests", return_value=mock_instance):
async for _ in block.run(
block.Input(
research_id="test-research-id",
credentials=TEST_CREDENTIALS_INPUT, # type: ignore[arg-type]
),
credentials=TEST_CREDENTIALS,
):
pass
assert merged == []
# ---------------------------------------------------------------------------
# ExaWaitForResearchBlock — cost_dollars from polling response
# ---------------------------------------------------------------------------
class TestExaWaitForResearchBlockCostTracking:
"""ExaWaitForResearchBlock merges cost when the polled research has cost_dollars."""
@pytest.mark.asyncio
async def test_cost_merged_when_research_completes(self):
"""merge_stats called with provider_cost=total once polling returns completed."""
from backend.blocks.exa.research import ExaWaitForResearchBlock
block = ExaWaitForResearchBlock()
merged: list[NodeExecutionStats] = []
block.merge_stats = lambda s: merged.append(s) # type: ignore[assignment]
poll_resp = MagicMock()
poll_resp.json.return_value = COMPLETED_RESEARCH_RESPONSE
mock_instance = MagicMock()
mock_instance.get = AsyncMock(return_value=poll_resp)
with (
patch("backend.blocks.exa.research.Requests", return_value=mock_instance),
patch("asyncio.sleep", new=AsyncMock()),
):
async for _ in block.run(
block.Input(
research_id="test-research-id",
credentials=TEST_CREDENTIALS_INPUT, # type: ignore[arg-type]
),
credentials=TEST_CREDENTIALS,
):
pass
assert len(merged) == 1
assert merged[0].provider_cost == pytest.approx(0.05)
@pytest.mark.asyncio
async def test_no_merge_when_no_cost_dollars(self):
"""When completed research has no costDollars, merge_stats is not called."""
from backend.blocks.exa.research import ExaWaitForResearchBlock
block = ExaWaitForResearchBlock()
merged: list[NodeExecutionStats] = []
block.merge_stats = lambda s: merged.append(s) # type: ignore[assignment]
no_cost_response = {**COMPLETED_RESEARCH_RESPONSE, "costDollars": None}
poll_resp = MagicMock()
poll_resp.json.return_value = no_cost_response
mock_instance = MagicMock()
mock_instance.get = AsyncMock(return_value=poll_resp)
with (
patch("backend.blocks.exa.research.Requests", return_value=mock_instance),
patch("asyncio.sleep", new=AsyncMock()),
):
async for _ in block.run(
block.Input(
research_id="test-research-id",
credentials=TEST_CREDENTIALS_INPUT, # type: ignore[arg-type]
),
credentials=TEST_CREDENTIALS,
):
pass
assert merged == []

View File

@@ -12,6 +12,7 @@ from typing import Any, Dict, List, Optional
from pydantic import BaseModel
from backend.data.model import NodeExecutionStats
from backend.sdk import (
APIKeyCredentials,
Block,
@@ -232,6 +233,11 @@ class ExaCreateResearchBlock(Block):
if research.cost_dollars:
yield "cost_total", research.cost_dollars.total
self.merge_stats(
NodeExecutionStats(
provider_cost=research.cost_dollars.total
)
)
return
await asyncio.sleep(check_interval)
@@ -346,6 +352,9 @@ class ExaGetResearchBlock(Block):
yield "cost_searches", research.cost_dollars.num_searches
yield "cost_pages", research.cost_dollars.num_pages
yield "cost_reasoning_tokens", research.cost_dollars.reasoning_tokens
self.merge_stats(
NodeExecutionStats(provider_cost=research.cost_dollars.total)
)
yield "error_message", research.error
@@ -432,6 +441,9 @@ class ExaWaitForResearchBlock(Block):
if research.cost_dollars:
yield "cost_total", research.cost_dollars.total
self.merge_stats(
NodeExecutionStats(provider_cost=research.cost_dollars.total)
)
return

View File

@@ -4,6 +4,7 @@ from typing import Optional
from exa_py import AsyncExa
from backend.data.model import NodeExecutionStats
from backend.sdk import (
APIKeyCredentials,
Block,
@@ -206,3 +207,6 @@ class ExaSearchBlock(Block):
if response.cost_dollars:
yield "cost_dollars", response.cost_dollars
self.merge_stats(
NodeExecutionStats(provider_cost=response.cost_dollars.total)
)

View File

@@ -3,6 +3,7 @@ from typing import Optional
from exa_py import AsyncExa
from backend.data.model import NodeExecutionStats
from backend.sdk import (
APIKeyCredentials,
Block,
@@ -167,3 +168,6 @@ class ExaFindSimilarBlock(Block):
if response.cost_dollars:
yield "cost_dollars", response.cost_dollars
self.merge_stats(
NodeExecutionStats(provider_cost=response.cost_dollars.total)
)

View File

@@ -14,6 +14,7 @@ from backend.data.model import (
APIKeyCredentials,
CredentialsField,
CredentialsMetaInput,
NodeExecutionStats,
SchemaField,
)
from backend.integrations.providers import ProviderName
@@ -117,6 +118,11 @@ class GoogleMapsSearchBlock(Block):
input_data.radius,
input_data.max_results,
)
self.merge_stats(
NodeExecutionStats(
provider_cost=float(len(places)), provider_cost_type="items"
)
)
for place in places:
yield "place", place

View File

@@ -10,7 +10,7 @@ from backend.blocks.jina._auth import (
JinaCredentialsField,
JinaCredentialsInput,
)
from backend.data.model import SchemaField
from backend.data.model import NodeExecutionStats, SchemaField
from backend.util.request import Requests
@@ -45,5 +45,13 @@ class JinaEmbeddingBlock(Block):
}
data = {"input": input_data.texts, "model": input_data.model}
response = await Requests().post(url, headers=headers, json=data)
embeddings = [e["embedding"] for e in response.json()["data"]]
resp_json = response.json()
embeddings = [e["embedding"] for e in resp_json["data"]]
usage = resp_json.get("usage", {})
if usage.get("total_tokens"):
self.merge_stats(
NodeExecutionStats(
input_token_count=usage.get("total_tokens", 0),
)
)
yield "embeddings", embeddings

View File

@@ -1,6 +1,7 @@
# This file contains a lot of prompt block strings that would trigger "line too long"
# flake8: noqa: E501
import logging
import math
import re
import secrets
from abc import ABC
@@ -13,6 +14,7 @@ import ollama
import openai
from anthropic.types import ToolParam
from groq import AsyncGroq
from openai.types.chat import ChatCompletion as OpenAIChatCompletion
from pydantic import BaseModel, SecretStr
from backend.blocks._base import (
@@ -205,6 +207,19 @@ class LlmModel(str, Enum, metaclass=LlmModelMeta):
KIMI_K2 = "moonshotai/kimi-k2"
QWEN3_235B_A22B_THINKING = "qwen/qwen3-235b-a22b-thinking-2507"
QWEN3_CODER = "qwen/qwen3-coder"
# Z.ai (Zhipu) models
ZAI_GLM_4_32B = "z-ai/glm-4-32b"
ZAI_GLM_4_5 = "z-ai/glm-4.5"
ZAI_GLM_4_5_AIR = "z-ai/glm-4.5-air"
ZAI_GLM_4_5_AIR_FREE = "z-ai/glm-4.5-air:free"
ZAI_GLM_4_5V = "z-ai/glm-4.5v"
ZAI_GLM_4_6 = "z-ai/glm-4.6"
ZAI_GLM_4_6V = "z-ai/glm-4.6v"
ZAI_GLM_4_7 = "z-ai/glm-4.7"
ZAI_GLM_4_7_FLASH = "z-ai/glm-4.7-flash"
ZAI_GLM_5 = "z-ai/glm-5"
ZAI_GLM_5_TURBO = "z-ai/glm-5-turbo"
ZAI_GLM_5V_TURBO = "z-ai/glm-5v-turbo"
# Llama API models
LLAMA_API_LLAMA_4_SCOUT = "Llama-4-Scout-17B-16E-Instruct-FP8"
LLAMA_API_LLAMA4_MAVERICK = "Llama-4-Maverick-17B-128E-Instruct-FP8"
@@ -630,6 +645,43 @@ MODEL_METADATA = {
LlmModel.QWEN3_CODER: ModelMetadata(
"open_router", 262144, 262144, "Qwen 3 Coder", "OpenRouter", "Qwen", 3
),
# https://openrouter.ai/models?q=z-ai
LlmModel.ZAI_GLM_4_32B: ModelMetadata(
"open_router", 128000, 128000, "GLM 4 32B", "OpenRouter", "Z.ai", 1
),
LlmModel.ZAI_GLM_4_5: ModelMetadata(
"open_router", 131072, 98304, "GLM 4.5", "OpenRouter", "Z.ai", 2
),
LlmModel.ZAI_GLM_4_5_AIR: ModelMetadata(
"open_router", 131072, 98304, "GLM 4.5 Air", "OpenRouter", "Z.ai", 1
),
LlmModel.ZAI_GLM_4_5_AIR_FREE: ModelMetadata(
"open_router", 131072, 96000, "GLM 4.5 Air (Free)", "OpenRouter", "Z.ai", 1
),
LlmModel.ZAI_GLM_4_5V: ModelMetadata(
"open_router", 65536, 16384, "GLM 4.5V", "OpenRouter", "Z.ai", 2
),
LlmModel.ZAI_GLM_4_6: ModelMetadata(
"open_router", 204800, 204800, "GLM 4.6", "OpenRouter", "Z.ai", 1
),
LlmModel.ZAI_GLM_4_6V: ModelMetadata(
"open_router", 131072, 131072, "GLM 4.6V", "OpenRouter", "Z.ai", 1
),
LlmModel.ZAI_GLM_4_7: ModelMetadata(
"open_router", 202752, 65535, "GLM 4.7", "OpenRouter", "Z.ai", 1
),
LlmModel.ZAI_GLM_4_7_FLASH: ModelMetadata(
"open_router", 202752, 202752, "GLM 4.7 Flash", "OpenRouter", "Z.ai", 1
),
LlmModel.ZAI_GLM_5: ModelMetadata(
"open_router", 80000, 80000, "GLM 5", "OpenRouter", "Z.ai", 2
),
LlmModel.ZAI_GLM_5_TURBO: ModelMetadata(
"open_router", 202752, 131072, "GLM 5 Turbo", "OpenRouter", "Z.ai", 3
),
LlmModel.ZAI_GLM_5V_TURBO: ModelMetadata(
"open_router", 202752, 131072, "GLM 5V Turbo", "OpenRouter", "Z.ai", 3
),
# Llama API models
LlmModel.LLAMA_API_LLAMA_4_SCOUT: ModelMetadata(
"llama_api",
@@ -687,6 +739,7 @@ class LLMResponse(BaseModel):
prompt_tokens: int
completion_tokens: int
reasoning: Optional[str] = None
provider_cost: float | None = None
def convert_openai_tool_fmt_to_anthropic(
@@ -721,6 +774,35 @@ def convert_openai_tool_fmt_to_anthropic(
return anthropic_tools
def extract_openrouter_cost(response: OpenAIChatCompletion) -> float | None:
"""Extract OpenRouter's `x-total-cost` header from an OpenAI SDK response.
OpenRouter returns the per-request USD cost in a response header. The
OpenAI SDK exposes the raw httpx response via an undocumented `_response`
attribute. We use try/except AttributeError so that if the SDK ever drops
or renames that attribute, the warning is visible in logs rather than
silently degrading to no cost tracking.
"""
try:
raw_resp = response._response # type: ignore[attr-defined]
except AttributeError:
logger.warning(
"OpenAI SDK response missing _response attribute"
" — OpenRouter cost tracking unavailable"
)
return None
try:
cost_header = raw_resp.headers.get("x-total-cost")
if not cost_header:
return None
cost = float(cost_header)
if not math.isfinite(cost):
return None
return cost
except (ValueError, TypeError, AttributeError):
return None
def extract_openai_reasoning(response) -> str | None:
"""Extract reasoning from OpenAI-compatible response if available."""
"""Note: This will likely not working since the reasoning is not present in another Response API"""
@@ -1053,6 +1135,7 @@ async def llm_call(
prompt_tokens=response.usage.prompt_tokens if response.usage else 0,
completion_tokens=response.usage.completion_tokens if response.usage else 0,
reasoning=reasoning,
provider_cost=extract_openrouter_cost(response),
)
elif provider == "llama_api":
tools_param = tools if tools else openai.NOT_GIVEN
@@ -1360,6 +1443,7 @@ class AIStructuredResponseGeneratorBlock(AIBlockBase):
error_feedback_message = ""
llm_model = input_data.model
last_attempt_cost: float | None = None
for retry_count in range(input_data.retry):
logger.debug(f"LLM request: {prompt}")
@@ -1377,12 +1461,15 @@ class AIStructuredResponseGeneratorBlock(AIBlockBase):
max_tokens=input_data.max_tokens,
)
response_text = llm_response.response
self.merge_stats(
NodeExecutionStats(
input_token_count=llm_response.prompt_tokens,
output_token_count=llm_response.completion_tokens,
)
# Merge token counts for every attempt (each call costs tokens).
# provider_cost (actual USD) is tracked separately and only merged
# on success to avoid double-counting across retries.
token_stats = NodeExecutionStats(
input_token_count=llm_response.prompt_tokens,
output_token_count=llm_response.completion_tokens,
)
self.merge_stats(token_stats)
last_attempt_cost = llm_response.provider_cost
logger.debug(f"LLM attempt-{retry_count} response: {response_text}")
if input_data.expected_format:
@@ -1451,6 +1538,7 @@ class AIStructuredResponseGeneratorBlock(AIBlockBase):
NodeExecutionStats(
llm_call_count=retry_count + 1,
llm_retry_count=retry_count,
provider_cost=last_attempt_cost,
)
)
yield "response", response_obj
@@ -1471,6 +1559,7 @@ class AIStructuredResponseGeneratorBlock(AIBlockBase):
NodeExecutionStats(
llm_call_count=retry_count + 1,
llm_retry_count=retry_count,
provider_cost=last_attempt_cost,
)
)
yield "response", {"response": response_text}

View File

@@ -23,7 +23,7 @@ from backend.blocks.smartlead.models import (
SaveSequencesResponse,
Sequence,
)
from backend.data.model import CredentialsField, SchemaField
from backend.data.model import CredentialsField, NodeExecutionStats, SchemaField
class CreateCampaignBlock(Block):
@@ -226,6 +226,12 @@ class AddLeadToCampaignBlock(Block):
response = await self.add_leads_to_campaign(
input_data.campaign_id, input_data.lead_list, credentials
)
self.merge_stats(
NodeExecutionStats(
provider_cost=float(len(input_data.lead_list)),
provider_cost_type="items",
)
)
yield "campaign_id", input_data.campaign_id
yield "upload_count", response.upload_count

View File

@@ -199,6 +199,66 @@ class TestLLMStatsTracking:
assert block.execution_stats.llm_call_count == 2 # retry_count + 1 = 1 + 1 = 2
assert block.execution_stats.llm_retry_count == 1
@pytest.mark.asyncio
async def test_retry_cost_uses_last_attempt_only(self):
"""provider_cost is only merged from the final successful attempt.
Intermediate retry costs are intentionally dropped to avoid
double-counting: the cost of failed attempts is captured in
last_attempt_cost only when the loop eventually succeeds.
"""
import backend.blocks.llm as llm
block = llm.AIStructuredResponseGeneratorBlock()
call_count = 0
async def mock_llm_call(*args, **kwargs):
nonlocal call_count
call_count += 1
if call_count == 1:
# First attempt: fails validation, returns cost $0.01
return llm.LLMResponse(
raw_response="",
prompt=[],
response='<json_output id="test123456">{"wrong": "key"}</json_output>',
tool_calls=None,
prompt_tokens=10,
completion_tokens=5,
reasoning=None,
provider_cost=0.01,
)
# Second attempt: succeeds, returns cost $0.02
return llm.LLMResponse(
raw_response="",
prompt=[],
response='<json_output id="test123456">{"key1": "value1", "key2": "value2"}</json_output>',
tool_calls=None,
prompt_tokens=20,
completion_tokens=10,
reasoning=None,
provider_cost=0.02,
)
block.llm_call = mock_llm_call # type: ignore
input_data = llm.AIStructuredResponseGeneratorBlock.Input(
prompt="Test prompt",
expected_format={"key1": "desc1", "key2": "desc2"},
model=llm.DEFAULT_LLM_MODEL,
credentials=llm.TEST_CREDENTIALS_INPUT, # type: ignore
retry=2,
)
with patch("secrets.token_hex", return_value="test123456"):
async for _ in block.run(input_data, credentials=llm.TEST_CREDENTIALS):
pass
# Only the final successful attempt's cost is merged
assert block.execution_stats.provider_cost == pytest.approx(0.02)
# Tokens from both attempts accumulate
assert block.execution_stats.input_token_count == 30
assert block.execution_stats.output_token_count == 15
@pytest.mark.asyncio
async def test_ai_text_summarizer_multiple_chunks(self):
"""Test that AITextSummarizerBlock correctly accumulates stats across multiple chunks."""
@@ -987,3 +1047,63 @@ class TestLlmModelMissing:
assert (
llm.LlmModel("extra/google/gemini-2.5-pro") == llm.LlmModel.GEMINI_2_5_PRO
)
class TestExtractOpenRouterCost:
"""Tests for extract_openrouter_cost — the x-total-cost header parser."""
def _mk_response(self, headers: dict | None):
response = MagicMock()
if headers is None:
response._response = None
else:
raw = MagicMock()
raw.headers = headers
response._response = raw
return response
def test_extracts_numeric_cost(self):
response = self._mk_response({"x-total-cost": "0.0042"})
assert llm.extract_openrouter_cost(response) == 0.0042
def test_returns_none_when_header_missing(self):
response = self._mk_response({})
assert llm.extract_openrouter_cost(response) is None
def test_returns_none_when_header_empty_string(self):
response = self._mk_response({"x-total-cost": ""})
assert llm.extract_openrouter_cost(response) is None
def test_returns_none_when_header_non_numeric(self):
response = self._mk_response({"x-total-cost": "not-a-number"})
assert llm.extract_openrouter_cost(response) is None
def test_returns_none_when_no_response_attr(self):
response = MagicMock(spec=[]) # no _response attr
assert llm.extract_openrouter_cost(response) is None
def test_returns_none_when_raw_is_none(self):
response = self._mk_response(None)
assert llm.extract_openrouter_cost(response) is None
def test_returns_none_when_raw_has_no_headers(self):
response = MagicMock()
response._response = MagicMock(spec=[]) # no headers attr
assert llm.extract_openrouter_cost(response) is None
def test_returns_zero_for_zero_cost(self):
"""Zero-cost is a valid value (free tier) and must not become None."""
response = self._mk_response({"x-total-cost": "0"})
assert llm.extract_openrouter_cost(response) == 0.0
def test_returns_none_for_inf(self):
response = self._mk_response({"x-total-cost": "inf"})
assert llm.extract_openrouter_cost(response) is None
def test_returns_none_for_negative_inf(self):
response = self._mk_response({"x-total-cost": "-inf"})
assert llm.extract_openrouter_cost(response) is None
def test_returns_none_for_nan(self):
response = self._mk_response({"x-total-cost": "nan"})
assert llm.extract_openrouter_cost(response) is None

View File

@@ -13,6 +13,7 @@ from backend.data.model import (
APIKeyCredentials,
CredentialsField,
CredentialsMetaInput,
NodeExecutionStats,
SchemaField,
)
from backend.integrations.providers import ProviderName
@@ -104,4 +105,10 @@ class UnrealTextToSpeechBlock(Block):
input_data.text,
input_data.voice_id,
)
self.merge_stats(
NodeExecutionStats(
provider_cost=float(len(input_data.text)),
provider_cost_type="characters",
)
)
yield "mp3_url", api_response["OutputUri"]

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,799 @@
"""Unit tests for baseline service pure-logic helpers.
These tests cover ``_baseline_conversation_updater`` and ``_BaselineStreamState``
without requiring API keys, database connections, or network access.
"""
from unittest.mock import AsyncMock, MagicMock, patch
import pytest
from openai.types.chat import ChatCompletionToolParam
from backend.copilot.baseline.service import (
_baseline_conversation_updater,
_BaselineStreamState,
_compress_session_messages,
_ThinkingStripper,
)
from backend.copilot.model import ChatMessage
from backend.copilot.transcript_builder import TranscriptBuilder
from backend.util.prompt import CompressResult
from backend.util.tool_call_loop import LLMLoopResponse, LLMToolCall, ToolCallResult
class TestBaselineStreamState:
def test_defaults(self):
state = _BaselineStreamState()
assert state.pending_events == []
assert state.assistant_text == ""
assert state.text_started is False
assert state.turn_prompt_tokens == 0
assert state.turn_completion_tokens == 0
assert state.text_block_id # Should be a UUID string
def test_mutable_fields(self):
state = _BaselineStreamState()
state.assistant_text = "hello"
state.turn_prompt_tokens = 100
state.turn_completion_tokens = 50
assert state.assistant_text == "hello"
assert state.turn_prompt_tokens == 100
assert state.turn_completion_tokens == 50
class TestBaselineConversationUpdater:
"""Tests for _baseline_conversation_updater which updates the OpenAI
message list and transcript builder after each LLM call."""
def _make_transcript_builder(self) -> TranscriptBuilder:
builder = TranscriptBuilder()
builder.append_user("test question")
return builder
def test_text_only_response(self):
"""When the LLM returns text without tool calls, the updater appends
a single assistant message and records it in the transcript."""
messages: list = []
builder = self._make_transcript_builder()
response = LLMLoopResponse(
response_text="Hello, world!",
tool_calls=[],
raw_response=None,
prompt_tokens=0,
completion_tokens=0,
)
_baseline_conversation_updater(
messages,
response,
tool_results=None,
transcript_builder=builder,
model="test-model",
)
assert len(messages) == 1
assert messages[0]["role"] == "assistant"
assert messages[0]["content"] == "Hello, world!"
# Transcript should have user + assistant
assert builder.entry_count == 2
assert builder.last_entry_type == "assistant"
def test_tool_calls_response(self):
"""When the LLM returns tool calls, the updater appends the assistant
message with tool_calls and tool result messages."""
messages: list = []
builder = self._make_transcript_builder()
response = LLMLoopResponse(
response_text="Let me search...",
tool_calls=[
LLMToolCall(
id="tc_1",
name="search",
arguments='{"query": "test"}',
),
],
raw_response=None,
prompt_tokens=0,
completion_tokens=0,
)
tool_results = [
ToolCallResult(
tool_call_id="tc_1",
tool_name="search",
content="Found result",
),
]
_baseline_conversation_updater(
messages,
response,
tool_results=tool_results,
transcript_builder=builder,
model="test-model",
)
# Messages: assistant (with tool_calls) + tool result
assert len(messages) == 2
assert messages[0]["role"] == "assistant"
assert messages[0]["content"] == "Let me search..."
assert len(messages[0]["tool_calls"]) == 1
assert messages[0]["tool_calls"][0]["id"] == "tc_1"
assert messages[1]["role"] == "tool"
assert messages[1]["tool_call_id"] == "tc_1"
assert messages[1]["content"] == "Found result"
# Transcript: user + assistant(tool_use) + user(tool_result)
assert builder.entry_count == 3
def test_tool_calls_without_text(self):
"""Tool calls without accompanying text should still work."""
messages: list = []
builder = self._make_transcript_builder()
response = LLMLoopResponse(
response_text=None,
tool_calls=[
LLMToolCall(id="tc_1", name="run", arguments="{}"),
],
raw_response=None,
prompt_tokens=0,
completion_tokens=0,
)
tool_results = [
ToolCallResult(tool_call_id="tc_1", tool_name="run", content="done"),
]
_baseline_conversation_updater(
messages,
response,
tool_results=tool_results,
transcript_builder=builder,
model="test-model",
)
assert len(messages) == 2
assert "content" not in messages[0] # No text content
assert messages[0]["tool_calls"][0]["function"]["name"] == "run"
def test_no_text_no_tools(self):
"""When the response has no text and no tool calls, nothing is appended."""
messages: list = []
builder = self._make_transcript_builder()
response = LLMLoopResponse(
response_text=None,
tool_calls=[],
raw_response=None,
prompt_tokens=0,
completion_tokens=0,
)
_baseline_conversation_updater(
messages,
response,
tool_results=None,
transcript_builder=builder,
model="test-model",
)
assert len(messages) == 0
# Only the user entry from setup
assert builder.entry_count == 1
def test_multiple_tool_calls(self):
"""Multiple tool calls in a single response are all recorded."""
messages: list = []
builder = self._make_transcript_builder()
response = LLMLoopResponse(
response_text=None,
tool_calls=[
LLMToolCall(id="tc_1", name="tool_a", arguments="{}"),
LLMToolCall(id="tc_2", name="tool_b", arguments='{"x": 1}'),
],
raw_response=None,
prompt_tokens=0,
completion_tokens=0,
)
tool_results = [
ToolCallResult(tool_call_id="tc_1", tool_name="tool_a", content="result_a"),
ToolCallResult(tool_call_id="tc_2", tool_name="tool_b", content="result_b"),
]
_baseline_conversation_updater(
messages,
response,
tool_results=tool_results,
transcript_builder=builder,
model="test-model",
)
# 1 assistant + 2 tool results
assert len(messages) == 3
assert len(messages[0]["tool_calls"]) == 2
assert messages[1]["tool_call_id"] == "tc_1"
assert messages[2]["tool_call_id"] == "tc_2"
def test_invalid_tool_arguments_handled(self):
"""Tool call with invalid JSON arguments: the arguments field is
stored as-is in the message, and orjson failure falls back to {}
in the transcript content_blocks."""
messages: list = []
builder = self._make_transcript_builder()
response = LLMLoopResponse(
response_text=None,
tool_calls=[
LLMToolCall(id="tc_1", name="tool_x", arguments="not-json"),
],
raw_response=None,
prompt_tokens=0,
completion_tokens=0,
)
tool_results = [
ToolCallResult(tool_call_id="tc_1", tool_name="tool_x", content="ok"),
]
_baseline_conversation_updater(
messages,
response,
tool_results=tool_results,
transcript_builder=builder,
model="test-model",
)
# Should not raise — invalid JSON falls back to {} in transcript
assert len(messages) == 2
assert messages[0]["tool_calls"][0]["function"]["arguments"] == "not-json"
class TestCompressSessionMessagesPreservesToolCalls:
"""``_compress_session_messages`` must round-trip tool_calls + tool_call_id.
Compression serialises ChatMessage to dict for ``compress_context`` and
reifies the result back to ChatMessage. A regression that drops
``tool_calls`` or ``tool_call_id`` would corrupt the OpenAI message
list and break downstream tool-execution rounds.
"""
@pytest.mark.asyncio
async def test_compressed_output_keeps_tool_calls_and_ids(self):
# Simulate compression that returns a summary + the most recent
# assistant(tool_call) + tool(tool_result) intact.
summary = {"role": "system", "content": "prior turns: user asked X"}
assistant_with_tc = {
"role": "assistant",
"content": "calling tool",
"tool_calls": [
{
"id": "tc_abc",
"type": "function",
"function": {"name": "search", "arguments": '{"q":"y"}'},
}
],
}
tool_result = {
"role": "tool",
"tool_call_id": "tc_abc",
"content": "search result",
}
compress_result = CompressResult(
messages=[summary, assistant_with_tc, tool_result],
token_count=100,
was_compacted=True,
original_token_count=5000,
messages_summarized=10,
messages_dropped=0,
)
# Input: messages that should be compressed.
input_messages = [
ChatMessage(role="user", content="q1"),
ChatMessage(
role="assistant",
content="calling tool",
tool_calls=[
{
"id": "tc_abc",
"type": "function",
"function": {
"name": "search",
"arguments": '{"q":"y"}',
},
}
],
),
ChatMessage(
role="tool",
tool_call_id="tc_abc",
content="search result",
),
]
with patch(
"backend.copilot.baseline.service.compress_context",
new=AsyncMock(return_value=compress_result),
):
compressed = await _compress_session_messages(
input_messages, model="openrouter/anthropic/claude-opus-4"
)
# Summary, assistant(tool_calls), tool(tool_call_id).
assert len(compressed) == 3
# Assistant message must keep its tool_calls intact.
assistant_msg = compressed[1]
assert assistant_msg.role == "assistant"
assert assistant_msg.tool_calls is not None
assert len(assistant_msg.tool_calls) == 1
assert assistant_msg.tool_calls[0]["id"] == "tc_abc"
assert assistant_msg.tool_calls[0]["function"]["name"] == "search"
# Tool-role message must keep tool_call_id for OpenAI linkage.
tool_msg = compressed[2]
assert tool_msg.role == "tool"
assert tool_msg.tool_call_id == "tc_abc"
assert tool_msg.content == "search result"
@pytest.mark.asyncio
async def test_uncompressed_passthrough_keeps_fields(self):
"""When compression is a no-op (was_compacted=False), the original
messages must be returned unchanged — including tool_calls."""
input_messages = [
ChatMessage(
role="assistant",
content="c",
tool_calls=[
{
"id": "t1",
"type": "function",
"function": {"name": "f", "arguments": "{}"},
}
],
),
ChatMessage(role="tool", tool_call_id="t1", content="ok"),
]
noop_result = CompressResult(
messages=[], # ignored when was_compacted=False
token_count=10,
was_compacted=False,
)
with patch(
"backend.copilot.baseline.service.compress_context",
new=AsyncMock(return_value=noop_result),
):
out = await _compress_session_messages(
input_messages, model="openrouter/anthropic/claude-opus-4"
)
assert out is input_messages # same list returned
assert out[0].tool_calls is not None
assert out[0].tool_calls[0]["id"] == "t1"
assert out[1].tool_call_id == "t1"
# ---- _ThinkingStripper tests ---- #
def test_thinking_stripper_basic_thinking_tag() -> None:
"""<thinking>...</thinking> blocks are fully stripped."""
s = _ThinkingStripper()
assert s.process("<thinking>internal reasoning here</thinking>Hello!") == "Hello!"
def test_thinking_stripper_internal_reasoning_tag() -> None:
"""<internal_reasoning>...</internal_reasoning> blocks (Gemini) are stripped."""
s = _ThinkingStripper()
assert (
s.process("<internal_reasoning>step by step</internal_reasoning>Answer")
== "Answer"
)
def test_thinking_stripper_split_across_chunks() -> None:
"""Tags split across multiple chunks are handled correctly."""
s = _ThinkingStripper()
out = s.process("Hello <thin")
out += s.process("king>secret</thinking> world")
assert out == "Hello world"
def test_thinking_stripper_plain_text_preserved() -> None:
"""Plain text with the word 'thinking' is not stripped."""
s = _ThinkingStripper()
assert (
s.process("I am thinking about this problem")
== "I am thinking about this problem"
)
def test_thinking_stripper_multiple_blocks() -> None:
"""Multiple reasoning blocks in one stream are all stripped."""
s = _ThinkingStripper()
result = s.process(
"A<thinking>x</thinking>B<internal_reasoning>y</internal_reasoning>C"
)
assert result == "ABC"
def test_thinking_stripper_flush_discards_unclosed() -> None:
"""Unclosed reasoning block is discarded on flush."""
s = _ThinkingStripper()
s.process("Start<thinking>never closed")
flushed = s.flush()
assert "never closed" not in flushed
def test_thinking_stripper_empty_block() -> None:
"""Empty reasoning blocks are handled gracefully."""
s = _ThinkingStripper()
assert s.process("Before<thinking></thinking>After") == "BeforeAfter"
# ---- _filter_tools_by_permissions tests ---- #
def _make_tool(name: str) -> ChatCompletionToolParam:
"""Build a minimal OpenAI ChatCompletionToolParam."""
return ChatCompletionToolParam(
type="function",
function={"name": name, "parameters": {}},
)
class TestFilterToolsByPermissions:
"""Tests for _filter_tools_by_permissions."""
@patch(
"backend.copilot.permissions.all_known_tool_names",
return_value=frozenset({"run_block", "web_fetch", "bash_exec"}),
)
def test_empty_permissions_returns_all(self, _mock_names):
"""Empty permissions (no filtering) returns every tool unchanged."""
from backend.copilot.baseline.service import _filter_tools_by_permissions
from backend.copilot.permissions import CopilotPermissions
tools = [_make_tool("run_block"), _make_tool("web_fetch")]
perms = CopilotPermissions()
result = _filter_tools_by_permissions(tools, perms)
assert result == tools
@patch(
"backend.copilot.permissions.all_known_tool_names",
return_value=frozenset({"run_block", "web_fetch", "bash_exec"}),
)
def test_allowlist_keeps_only_matching(self, _mock_names):
"""Explicit allowlist (tools_exclude=False) keeps only listed tools."""
from backend.copilot.baseline.service import _filter_tools_by_permissions
from backend.copilot.permissions import CopilotPermissions
tools = [
_make_tool("run_block"),
_make_tool("web_fetch"),
_make_tool("bash_exec"),
]
perms = CopilotPermissions(tools=["web_fetch"], tools_exclude=False)
result = _filter_tools_by_permissions(tools, perms)
assert len(result) == 1
assert result[0]["function"]["name"] == "web_fetch"
@patch(
"backend.copilot.permissions.all_known_tool_names",
return_value=frozenset({"run_block", "web_fetch", "bash_exec"}),
)
def test_blacklist_excludes_listed(self, _mock_names):
"""Blacklist (tools_exclude=True) removes only the listed tools."""
from backend.copilot.baseline.service import _filter_tools_by_permissions
from backend.copilot.permissions import CopilotPermissions
tools = [
_make_tool("run_block"),
_make_tool("web_fetch"),
_make_tool("bash_exec"),
]
perms = CopilotPermissions(tools=["bash_exec"], tools_exclude=True)
result = _filter_tools_by_permissions(tools, perms)
names = [t["function"]["name"] for t in result]
assert "bash_exec" not in names
assert "run_block" in names
assert "web_fetch" in names
assert len(result) == 2
@patch(
"backend.copilot.permissions.all_known_tool_names",
return_value=frozenset({"run_block", "web_fetch", "bash_exec"}),
)
def test_unknown_tool_name_filtered_out(self, _mock_names):
"""A tool whose name is not in all_known_tool_names is dropped."""
from backend.copilot.baseline.service import _filter_tools_by_permissions
from backend.copilot.permissions import CopilotPermissions
tools = [_make_tool("run_block"), _make_tool("unknown_tool")]
perms = CopilotPermissions(tools=["run_block"], tools_exclude=False)
result = _filter_tools_by_permissions(tools, perms)
names = [t["function"]["name"] for t in result]
assert "unknown_tool" not in names
assert names == ["run_block"]
# ---- _prepare_baseline_attachments tests ---- #
class TestPrepareBaselineAttachments:
"""Tests for _prepare_baseline_attachments."""
@pytest.mark.asyncio
async def test_empty_file_ids(self):
"""Empty file_ids returns empty hint and blocks."""
from backend.copilot.baseline.service import _prepare_baseline_attachments
hint, blocks = await _prepare_baseline_attachments([], "user1", "sess1", "/tmp")
assert hint == ""
assert blocks == []
@pytest.mark.asyncio
async def test_empty_user_id(self):
"""Empty user_id returns empty hint and blocks."""
from backend.copilot.baseline.service import _prepare_baseline_attachments
hint, blocks = await _prepare_baseline_attachments(
["file1"], "", "sess1", "/tmp"
)
assert hint == ""
assert blocks == []
@pytest.mark.asyncio
async def test_image_file_returns_vision_blocks(self):
"""A PNG image within size limits is returned as a base64 vision block."""
from backend.copilot.baseline.service import _prepare_baseline_attachments
fake_info = AsyncMock()
fake_info.name = "photo.png"
fake_info.mime_type = "image/png"
fake_info.size_bytes = 1024
fake_manager = AsyncMock()
fake_manager.get_file_info = AsyncMock(return_value=fake_info)
fake_manager.read_file_by_id = AsyncMock(return_value=b"\x89PNG_FAKE_DATA")
with patch(
"backend.copilot.baseline.service.get_workspace_manager",
new=AsyncMock(return_value=fake_manager),
):
hint, blocks = await _prepare_baseline_attachments(
["fid1"], "user1", "sess1", "/tmp/workdir"
)
assert len(blocks) == 1
assert blocks[0]["type"] == "image"
assert blocks[0]["source"]["media_type"] == "image/png"
assert blocks[0]["source"]["type"] == "base64"
assert "photo.png" in hint
assert "embedded as image" in hint
@pytest.mark.asyncio
async def test_non_image_file_saved_to_working_dir(self, tmp_path):
"""A non-image file is written to working_dir."""
from backend.copilot.baseline.service import _prepare_baseline_attachments
fake_info = AsyncMock()
fake_info.name = "data.csv"
fake_info.mime_type = "text/csv"
fake_info.size_bytes = 42
fake_manager = AsyncMock()
fake_manager.get_file_info = AsyncMock(return_value=fake_info)
fake_manager.read_file_by_id = AsyncMock(return_value=b"col1,col2\na,b")
with patch(
"backend.copilot.baseline.service.get_workspace_manager",
new=AsyncMock(return_value=fake_manager),
):
hint, blocks = await _prepare_baseline_attachments(
["fid1"], "user1", "sess1", str(tmp_path)
)
assert blocks == []
assert "data.csv" in hint
assert "saved to" in hint
saved = tmp_path / "data.csv"
assert saved.exists()
assert saved.read_bytes() == b"col1,col2\na,b"
@pytest.mark.asyncio
async def test_file_not_found_skipped(self):
"""When get_file_info returns None the file is silently skipped."""
from backend.copilot.baseline.service import _prepare_baseline_attachments
fake_manager = AsyncMock()
fake_manager.get_file_info = AsyncMock(return_value=None)
with patch(
"backend.copilot.baseline.service.get_workspace_manager",
new=AsyncMock(return_value=fake_manager),
):
hint, blocks = await _prepare_baseline_attachments(
["missing_id"], "user1", "sess1", "/tmp"
)
assert hint == ""
assert blocks == []
@pytest.mark.asyncio
async def test_workspace_manager_error(self):
"""When get_workspace_manager raises, returns empty results."""
from backend.copilot.baseline.service import _prepare_baseline_attachments
with patch(
"backend.copilot.baseline.service.get_workspace_manager",
new=AsyncMock(side_effect=RuntimeError("connection failed")),
):
hint, blocks = await _prepare_baseline_attachments(
["fid1"], "user1", "sess1", "/tmp"
)
assert hint == ""
assert blocks == []
class TestBaselineCostExtraction:
"""Tests for x-total-cost header extraction in _baseline_llm_caller."""
@pytest.mark.asyncio
async def test_cost_usd_extracted_from_response_header(self):
"""state.cost_usd is set from x-total-cost header when present."""
from backend.copilot.baseline.service import (
_baseline_llm_caller,
_BaselineStreamState,
)
state = _BaselineStreamState(model="gpt-4o-mini")
# Build a mock raw httpx response with the cost header
mock_raw_response = MagicMock()
mock_raw_response.headers = {"x-total-cost": "0.0123"}
# Build a mock async streaming response that yields no chunks but has
# a _response attribute pointing to the mock httpx response
mock_stream_response = MagicMock()
mock_stream_response._response = mock_raw_response
async def empty_aiter():
return
yield # make it an async generator
mock_stream_response.__aiter__ = lambda self: empty_aiter()
mock_client = MagicMock()
mock_client.chat.completions.create = AsyncMock(
return_value=mock_stream_response
)
with patch(
"backend.copilot.baseline.service._get_openai_client",
return_value=mock_client,
):
await _baseline_llm_caller(
messages=[{"role": "user", "content": "hi"}],
tools=[],
state=state,
)
assert state.cost_usd == pytest.approx(0.0123)
@pytest.mark.asyncio
async def test_cost_usd_accumulates_across_calls(self):
"""cost_usd accumulates when _baseline_llm_caller is called multiple times."""
from backend.copilot.baseline.service import (
_baseline_llm_caller,
_BaselineStreamState,
)
state = _BaselineStreamState(model="gpt-4o-mini")
def make_stream_mock(cost: str) -> MagicMock:
mock_raw = MagicMock()
mock_raw.headers = {"x-total-cost": cost}
mock_stream = MagicMock()
mock_stream._response = mock_raw
async def empty_aiter():
return
yield
mock_stream.__aiter__ = lambda self: empty_aiter()
return mock_stream
mock_client = MagicMock()
mock_client.chat.completions.create = AsyncMock(
side_effect=[make_stream_mock("0.01"), make_stream_mock("0.02")]
)
with patch(
"backend.copilot.baseline.service._get_openai_client",
return_value=mock_client,
):
await _baseline_llm_caller(
messages=[{"role": "user", "content": "first"}],
tools=[],
state=state,
)
await _baseline_llm_caller(
messages=[{"role": "user", "content": "second"}],
tools=[],
state=state,
)
assert state.cost_usd == pytest.approx(0.03)
@pytest.mark.asyncio
async def test_no_cost_when_header_absent(self):
"""state.cost_usd remains None when response has no x-total-cost header."""
from backend.copilot.baseline.service import (
_baseline_llm_caller,
_BaselineStreamState,
)
state = _BaselineStreamState(model="gpt-4o-mini")
mock_raw = MagicMock()
mock_raw.headers = {}
mock_stream = MagicMock()
mock_stream._response = mock_raw
async def empty_aiter():
return
yield
mock_stream.__aiter__ = lambda self: empty_aiter()
mock_client = MagicMock()
mock_client.chat.completions.create = AsyncMock(return_value=mock_stream)
with patch(
"backend.copilot.baseline.service._get_openai_client",
return_value=mock_client,
):
await _baseline_llm_caller(
messages=[{"role": "user", "content": "hi"}],
tools=[],
state=state,
)
assert state.cost_usd is None
@pytest.mark.asyncio
async def test_cost_extracted_even_when_stream_raises(self):
"""cost_usd is captured in the finally block even when streaming fails."""
from backend.copilot.baseline.service import (
_baseline_llm_caller,
_BaselineStreamState,
)
state = _BaselineStreamState(model="gpt-4o-mini")
mock_raw = MagicMock()
mock_raw.headers = {"x-total-cost": "0.005"}
mock_stream = MagicMock()
mock_stream._response = mock_raw
async def failing_aiter():
raise RuntimeError("stream error")
yield # make it an async generator
mock_stream.__aiter__ = lambda self: failing_aiter()
mock_client = MagicMock()
mock_client.chat.completions.create = AsyncMock(return_value=mock_stream)
with (
patch(
"backend.copilot.baseline.service._get_openai_client",
return_value=mock_client,
),
pytest.raises(RuntimeError, match="stream error"),
):
await _baseline_llm_caller(
messages=[{"role": "user", "content": "hi"}],
tools=[],
state=state,
)
assert state.cost_usd == pytest.approx(0.005)

View File

@@ -0,0 +1,667 @@
"""Integration tests for baseline transcript flow.
Exercises the real helpers in ``baseline/service.py`` that download,
validate, load, append to, backfill, and upload the transcript.
Storage is mocked via ``download_transcript`` / ``upload_transcript``
patches; no network access is required.
"""
import json as stdlib_json
from unittest.mock import AsyncMock, patch
import pytest
from backend.copilot.baseline.service import (
_load_prior_transcript,
_record_turn_to_transcript,
_resolve_baseline_model,
_upload_final_transcript,
is_transcript_stale,
should_upload_transcript,
)
from backend.copilot.service import config
from backend.copilot.transcript import (
STOP_REASON_END_TURN,
STOP_REASON_TOOL_USE,
TranscriptDownload,
)
from backend.copilot.transcript_builder import TranscriptBuilder
from backend.util.tool_call_loop import LLMLoopResponse, LLMToolCall, ToolCallResult
def _make_transcript_content(*roles: str) -> str:
"""Build a minimal valid JSONL transcript from role names."""
lines = []
parent = ""
for i, role in enumerate(roles):
uid = f"uuid-{i}"
entry: dict = {
"type": role,
"uuid": uid,
"parentUuid": parent,
"message": {
"role": role,
"content": [{"type": "text", "text": f"{role} message {i}"}],
},
}
if role == "assistant":
entry["message"]["id"] = f"msg_{i}"
entry["message"]["model"] = "test-model"
entry["message"]["type"] = "message"
entry["message"]["stop_reason"] = STOP_REASON_END_TURN
lines.append(stdlib_json.dumps(entry))
parent = uid
return "\n".join(lines) + "\n"
class TestResolveBaselineModel:
"""Model selection honours the per-request mode."""
def test_fast_mode_selects_fast_model(self):
assert _resolve_baseline_model("fast") == config.fast_model
def test_extended_thinking_selects_default_model(self):
assert _resolve_baseline_model("extended_thinking") == config.model
def test_none_mode_selects_default_model(self):
"""Critical: baseline users without a mode MUST keep the default (opus)."""
assert _resolve_baseline_model(None) == config.model
def test_default_and_fast_models_differ(self):
"""Sanity: the two tiers are actually distinct in production config."""
assert config.model != config.fast_model
class TestLoadPriorTranscript:
"""``_load_prior_transcript`` wraps the download + validate + load flow."""
@pytest.mark.asyncio
async def test_loads_fresh_transcript(self):
builder = TranscriptBuilder()
content = _make_transcript_content("user", "assistant")
download = TranscriptDownload(content=content, message_count=2)
with patch(
"backend.copilot.baseline.service.download_transcript",
new=AsyncMock(return_value=download),
):
covers = await _load_prior_transcript(
user_id="user-1",
session_id="session-1",
session_msg_count=3,
transcript_builder=builder,
)
assert covers is True
assert builder.entry_count == 2
assert builder.last_entry_type == "assistant"
@pytest.mark.asyncio
async def test_rejects_stale_transcript(self):
"""msg_count strictly less than session-1 is treated as stale."""
builder = TranscriptBuilder()
content = _make_transcript_content("user", "assistant")
# session has 6 messages, transcript only covers 2 → stale.
download = TranscriptDownload(content=content, message_count=2)
with patch(
"backend.copilot.baseline.service.download_transcript",
new=AsyncMock(return_value=download),
):
covers = await _load_prior_transcript(
user_id="user-1",
session_id="session-1",
session_msg_count=6,
transcript_builder=builder,
)
assert covers is False
assert builder.is_empty
@pytest.mark.asyncio
async def test_missing_transcript_returns_false(self):
builder = TranscriptBuilder()
with patch(
"backend.copilot.baseline.service.download_transcript",
new=AsyncMock(return_value=None),
):
covers = await _load_prior_transcript(
user_id="user-1",
session_id="session-1",
session_msg_count=2,
transcript_builder=builder,
)
assert covers is False
assert builder.is_empty
@pytest.mark.asyncio
async def test_invalid_transcript_returns_false(self):
builder = TranscriptBuilder()
download = TranscriptDownload(
content='{"type":"progress","uuid":"a"}\n',
message_count=1,
)
with patch(
"backend.copilot.baseline.service.download_transcript",
new=AsyncMock(return_value=download),
):
covers = await _load_prior_transcript(
user_id="user-1",
session_id="session-1",
session_msg_count=2,
transcript_builder=builder,
)
assert covers is False
assert builder.is_empty
@pytest.mark.asyncio
async def test_download_exception_returns_false(self):
builder = TranscriptBuilder()
with patch(
"backend.copilot.baseline.service.download_transcript",
new=AsyncMock(side_effect=RuntimeError("boom")),
):
covers = await _load_prior_transcript(
user_id="user-1",
session_id="session-1",
session_msg_count=2,
transcript_builder=builder,
)
assert covers is False
assert builder.is_empty
@pytest.mark.asyncio
async def test_zero_message_count_not_stale(self):
"""When msg_count is 0 (unknown), staleness check is skipped."""
builder = TranscriptBuilder()
download = TranscriptDownload(
content=_make_transcript_content("user", "assistant"),
message_count=0,
)
with patch(
"backend.copilot.baseline.service.download_transcript",
new=AsyncMock(return_value=download),
):
covers = await _load_prior_transcript(
user_id="user-1",
session_id="session-1",
session_msg_count=20,
transcript_builder=builder,
)
assert covers is True
assert builder.entry_count == 2
class TestUploadFinalTranscript:
"""``_upload_final_transcript`` serialises and calls storage."""
@pytest.mark.asyncio
async def test_uploads_valid_transcript(self):
builder = TranscriptBuilder()
builder.append_user(content="hi")
builder.append_assistant(
content_blocks=[{"type": "text", "text": "hello"}],
model="test-model",
stop_reason=STOP_REASON_END_TURN,
)
upload_mock = AsyncMock(return_value=None)
with patch(
"backend.copilot.baseline.service.upload_transcript",
new=upload_mock,
):
await _upload_final_transcript(
user_id="user-1",
session_id="session-1",
transcript_builder=builder,
session_msg_count=2,
)
upload_mock.assert_awaited_once()
assert upload_mock.await_args is not None
call_kwargs = upload_mock.await_args.kwargs
assert call_kwargs["user_id"] == "user-1"
assert call_kwargs["session_id"] == "session-1"
assert call_kwargs["message_count"] == 2
assert "hello" in call_kwargs["content"]
@pytest.mark.asyncio
async def test_skips_upload_when_builder_empty(self):
builder = TranscriptBuilder()
upload_mock = AsyncMock(return_value=None)
with patch(
"backend.copilot.baseline.service.upload_transcript",
new=upload_mock,
):
await _upload_final_transcript(
user_id="user-1",
session_id="session-1",
transcript_builder=builder,
session_msg_count=0,
)
upload_mock.assert_not_awaited()
@pytest.mark.asyncio
async def test_swallows_upload_exceptions(self):
"""Upload failures should not propagate (flow continues for the user)."""
builder = TranscriptBuilder()
builder.append_user(content="hi")
builder.append_assistant(
content_blocks=[{"type": "text", "text": "hello"}],
model="test-model",
stop_reason=STOP_REASON_END_TURN,
)
with patch(
"backend.copilot.baseline.service.upload_transcript",
new=AsyncMock(side_effect=RuntimeError("storage unavailable")),
):
# Should not raise.
await _upload_final_transcript(
user_id="user-1",
session_id="session-1",
transcript_builder=builder,
session_msg_count=2,
)
class TestRecordTurnToTranscript:
"""``_record_turn_to_transcript`` translates LLMLoopResponse → transcript."""
def test_records_final_assistant_text(self):
builder = TranscriptBuilder()
builder.append_user(content="hi")
response = LLMLoopResponse(
response_text="hello there",
tool_calls=[],
raw_response=None,
)
_record_turn_to_transcript(
response,
tool_results=None,
transcript_builder=builder,
model="test-model",
)
assert builder.entry_count == 2
assert builder.last_entry_type == "assistant"
jsonl = builder.to_jsonl()
assert "hello there" in jsonl
assert STOP_REASON_END_TURN in jsonl
def test_records_tool_use_then_tool_result(self):
"""Anthropic ordering: assistant(tool_use) → user(tool_result)."""
builder = TranscriptBuilder()
builder.append_user(content="use a tool")
response = LLMLoopResponse(
response_text=None,
tool_calls=[
LLMToolCall(id="call-1", name="echo", arguments='{"text":"hi"}')
],
raw_response=None,
)
tool_results = [
ToolCallResult(tool_call_id="call-1", tool_name="echo", content="hi")
]
_record_turn_to_transcript(
response,
tool_results,
transcript_builder=builder,
model="test-model",
)
# user, assistant(tool_use), user(tool_result) = 3 entries
assert builder.entry_count == 3
jsonl = builder.to_jsonl()
assert STOP_REASON_TOOL_USE in jsonl
assert "tool_use" in jsonl
assert "tool_result" in jsonl
assert "call-1" in jsonl
def test_records_nothing_on_empty_response(self):
builder = TranscriptBuilder()
builder.append_user(content="hi")
response = LLMLoopResponse(
response_text=None,
tool_calls=[],
raw_response=None,
)
_record_turn_to_transcript(
response,
tool_results=None,
transcript_builder=builder,
model="test-model",
)
assert builder.entry_count == 1
def test_malformed_tool_args_dont_crash(self):
"""Bad JSON in tool arguments falls back to {} without raising."""
builder = TranscriptBuilder()
builder.append_user(content="hi")
response = LLMLoopResponse(
response_text=None,
tool_calls=[LLMToolCall(id="call-1", name="echo", arguments="{not-json")],
raw_response=None,
)
tool_results = [
ToolCallResult(tool_call_id="call-1", tool_name="echo", content="ok")
]
_record_turn_to_transcript(
response,
tool_results,
transcript_builder=builder,
model="test-model",
)
assert builder.entry_count == 3
jsonl = builder.to_jsonl()
assert '"input":{}' in jsonl
class TestRoundTrip:
"""End-to-end: load prior → append new turn → upload."""
@pytest.mark.asyncio
async def test_full_round_trip(self):
prior = _make_transcript_content("user", "assistant")
download = TranscriptDownload(content=prior, message_count=2)
builder = TranscriptBuilder()
with patch(
"backend.copilot.baseline.service.download_transcript",
new=AsyncMock(return_value=download),
):
covers = await _load_prior_transcript(
user_id="user-1",
session_id="session-1",
session_msg_count=3,
transcript_builder=builder,
)
assert covers is True
assert builder.entry_count == 2
# New user turn.
builder.append_user(content="new question")
assert builder.entry_count == 3
# New assistant turn.
response = LLMLoopResponse(
response_text="new answer",
tool_calls=[],
raw_response=None,
)
_record_turn_to_transcript(
response,
tool_results=None,
transcript_builder=builder,
model="test-model",
)
assert builder.entry_count == 4
# Upload.
upload_mock = AsyncMock(return_value=None)
with patch(
"backend.copilot.baseline.service.upload_transcript",
new=upload_mock,
):
await _upload_final_transcript(
user_id="user-1",
session_id="session-1",
transcript_builder=builder,
session_msg_count=4,
)
upload_mock.assert_awaited_once()
assert upload_mock.await_args is not None
uploaded = upload_mock.await_args.kwargs["content"]
assert "new question" in uploaded
assert "new answer" in uploaded
# Original content preserved in the round trip.
assert "user message 0" in uploaded
assert "assistant message 1" in uploaded
@pytest.mark.asyncio
async def test_backfill_append_guard(self):
"""Backfill only runs when the last entry is not already assistant."""
builder = TranscriptBuilder()
builder.append_user(content="hi")
# Simulate the backfill guard from stream_chat_completion_baseline.
assistant_text = "partial text before error"
if builder.last_entry_type != "assistant":
builder.append_assistant(
content_blocks=[{"type": "text", "text": assistant_text}],
model="test-model",
stop_reason=STOP_REASON_END_TURN,
)
assert builder.last_entry_type == "assistant"
assert "partial text before error" in builder.to_jsonl()
# Second invocation: the guard must prevent double-append.
initial_count = builder.entry_count
if builder.last_entry_type != "assistant":
builder.append_assistant(
content_blocks=[{"type": "text", "text": "duplicate"}],
model="test-model",
stop_reason=STOP_REASON_END_TURN,
)
assert builder.entry_count == initial_count
class TestIsTranscriptStale:
"""``is_transcript_stale`` gates prior-transcript loading."""
def test_none_download_is_not_stale(self):
assert is_transcript_stale(None, session_msg_count=5) is False
def test_zero_message_count_is_not_stale(self):
"""Legacy transcripts without msg_count tracking must remain usable."""
dl = TranscriptDownload(content="", message_count=0)
assert is_transcript_stale(dl, session_msg_count=20) is False
def test_stale_when_covers_less_than_prefix(self):
dl = TranscriptDownload(content="", message_count=2)
# session has 6 messages; transcript must cover at least 5 (6-1).
assert is_transcript_stale(dl, session_msg_count=6) is True
def test_fresh_when_covers_full_prefix(self):
dl = TranscriptDownload(content="", message_count=5)
assert is_transcript_stale(dl, session_msg_count=6) is False
def test_fresh_when_exceeds_prefix(self):
"""Race: transcript ahead of session count is still acceptable."""
dl = TranscriptDownload(content="", message_count=10)
assert is_transcript_stale(dl, session_msg_count=6) is False
def test_boundary_equal_to_prefix_minus_one(self):
dl = TranscriptDownload(content="", message_count=5)
assert is_transcript_stale(dl, session_msg_count=6) is False
class TestShouldUploadTranscript:
"""``should_upload_transcript`` gates the final upload."""
def test_upload_allowed_for_user_with_coverage(self):
assert should_upload_transcript("user-1", True) is True
def test_upload_skipped_when_no_user(self):
assert should_upload_transcript(None, True) is False
def test_upload_skipped_when_empty_user(self):
assert should_upload_transcript("", True) is False
def test_upload_skipped_without_coverage(self):
"""Partial transcript must never clobber a more complete stored one."""
assert should_upload_transcript("user-1", False) is False
def test_upload_skipped_when_no_user_and_no_coverage(self):
assert should_upload_transcript(None, False) is False
class TestTranscriptLifecycle:
"""End-to-end: download → validate → build → upload.
Simulates the full transcript lifecycle inside
``stream_chat_completion_baseline`` by mocking the storage layer and
driving each step through the real helpers.
"""
@pytest.mark.asyncio
async def test_full_lifecycle_happy_path(self):
"""Fresh download, append a turn, upload covers the session."""
builder = TranscriptBuilder()
prior = _make_transcript_content("user", "assistant")
download = TranscriptDownload(content=prior, message_count=2)
upload_mock = AsyncMock(return_value=None)
with (
patch(
"backend.copilot.baseline.service.download_transcript",
new=AsyncMock(return_value=download),
),
patch(
"backend.copilot.baseline.service.upload_transcript",
new=upload_mock,
),
):
# --- 1. Download & load prior transcript ---
covers = await _load_prior_transcript(
user_id="user-1",
session_id="session-1",
session_msg_count=3,
transcript_builder=builder,
)
assert covers is True
# --- 2. Append a new user turn + a new assistant response ---
builder.append_user(content="follow-up question")
_record_turn_to_transcript(
LLMLoopResponse(
response_text="follow-up answer",
tool_calls=[],
raw_response=None,
),
tool_results=None,
transcript_builder=builder,
model="test-model",
)
# --- 3. Gate + upload ---
assert (
should_upload_transcript(
user_id="user-1", transcript_covers_prefix=covers
)
is True
)
await _upload_final_transcript(
user_id="user-1",
session_id="session-1",
transcript_builder=builder,
session_msg_count=4,
)
upload_mock.assert_awaited_once()
assert upload_mock.await_args is not None
uploaded = upload_mock.await_args.kwargs["content"]
assert "follow-up question" in uploaded
assert "follow-up answer" in uploaded
# Original prior-turn content preserved.
assert "user message 0" in uploaded
assert "assistant message 1" in uploaded
@pytest.mark.asyncio
async def test_lifecycle_stale_download_suppresses_upload(self):
"""Stale download → covers=False → upload must be skipped."""
builder = TranscriptBuilder()
# session has 10 msgs but stored transcript only covers 2 → stale.
stale = TranscriptDownload(
content=_make_transcript_content("user", "assistant"),
message_count=2,
)
upload_mock = AsyncMock(return_value=None)
with (
patch(
"backend.copilot.baseline.service.download_transcript",
new=AsyncMock(return_value=stale),
),
patch(
"backend.copilot.baseline.service.upload_transcript",
new=upload_mock,
),
):
covers = await _load_prior_transcript(
user_id="user-1",
session_id="session-1",
session_msg_count=10,
transcript_builder=builder,
)
assert covers is False
# The caller's gate mirrors the production path.
assert (
should_upload_transcript(user_id="user-1", transcript_covers_prefix=covers)
is False
)
upload_mock.assert_not_awaited()
@pytest.mark.asyncio
async def test_lifecycle_anonymous_user_skips_upload(self):
"""Anonymous (user_id=None) → upload gate must return False."""
builder = TranscriptBuilder()
builder.append_user(content="hi")
builder.append_assistant(
content_blocks=[{"type": "text", "text": "hello"}],
model="test-model",
stop_reason=STOP_REASON_END_TURN,
)
assert (
should_upload_transcript(user_id=None, transcript_covers_prefix=True)
is False
)
@pytest.mark.asyncio
async def test_lifecycle_missing_download_still_uploads_new_content(self):
"""No prior transcript → covers defaults to True in the service,
new turn should upload cleanly."""
builder = TranscriptBuilder()
upload_mock = AsyncMock(return_value=None)
with (
patch(
"backend.copilot.baseline.service.download_transcript",
new=AsyncMock(return_value=None),
),
patch(
"backend.copilot.baseline.service.upload_transcript",
new=upload_mock,
),
):
covers = await _load_prior_transcript(
user_id="user-1",
session_id="session-1",
session_msg_count=1,
transcript_builder=builder,
)
# No download: covers is False, so the production path would
# skip upload. This protects against overwriting a future
# more-complete transcript with a single-turn snapshot.
assert covers is False
assert (
should_upload_transcript(
user_id="user-1", transcript_covers_prefix=covers
)
is False
)
upload_mock.assert_not_awaited()

View File

@@ -8,13 +8,26 @@ from pydantic_settings import BaseSettings
from backend.util.clients import OPENROUTER_BASE_URL
# Per-request routing mode for a single chat turn.
# - 'fast': route to the baseline OpenAI-compatible path with the cheaper model.
# - 'extended_thinking': route to the Claude Agent SDK path with the default
# (opus) model.
# ``None`` means "no override"; the server falls back to the Claude Code
# subscription flag → LaunchDarkly COPILOT_SDK → config.use_claude_agent_sdk.
CopilotMode = Literal["fast", "extended_thinking"]
class ChatConfig(BaseSettings):
"""Configuration for the chat system."""
# OpenAI API Configuration
model: str = Field(
default="anthropic/claude-opus-4.6", description="Default model to use"
default="anthropic/claude-opus-4.6",
description="Default model for extended thinking mode",
)
fast_model: str = Field(
default="anthropic/claude-sonnet-4",
description="Model for fast mode (baseline path). Should be faster/cheaper than the default model.",
)
title_model: str = Field(
default="openai/gpt-4o-mini",
@@ -81,11 +94,11 @@ class ChatConfig(BaseSettings):
# allows ~70-100 turns/day.
# Checked at the HTTP layer (routes.py) before each turn.
#
# TODO: These are deploy-time constants applied identically to every user.
# If per-user or per-plan limits are needed (e.g., free tier vs paid), these
# must move to the database (e.g., a UserPlan table) and get_usage_status /
# check_rate_limit would look up each user's specific limits instead of
# reading config.daily_token_limit / config.weekly_token_limit.
# These are base limits for the FREE tier. Higher tiers (PRO, BUSINESS,
# ENTERPRISE) multiply these by their tier multiplier (see
# rate_limit.TIER_MULTIPLIERS). User tier is stored in the
# User.subscriptionTier DB column and resolved inside
# get_global_rate_limits().
daily_token_limit: int = Field(
default=2_500_000,
description="Max tokens per day, resets at midnight UTC (0 = unlimited)",

View File

@@ -14,6 +14,7 @@ from prisma.types import (
ChatSessionUpdateInput,
ChatSessionWhereInput,
)
from pydantic import BaseModel
from backend.data import db
from backend.util.json import SafeJson, sanitize_string
@@ -23,12 +24,22 @@ from .model import (
ChatSession,
ChatSessionInfo,
ChatSessionMetadata,
invalidate_session_cache,
cache_chat_session,
)
from .model import get_chat_session as get_chat_session_cached
logger = logging.getLogger(__name__)
class PaginatedMessages(BaseModel):
"""Result of a paginated message query."""
messages: list[ChatMessage]
has_more: bool
oldest_sequence: int | None
session: ChatSessionInfo
async def get_chat_session(session_id: str) -> ChatSession | None:
"""Get a chat session by ID from the database."""
session = await PrismaChatSession.prisma().find_unique(
@@ -38,6 +49,116 @@ async def get_chat_session(session_id: str) -> ChatSession | None:
return ChatSession.from_db(session) if session else None
async def get_chat_session_metadata(session_id: str) -> ChatSessionInfo | None:
"""Get chat session metadata (without messages) for ownership validation."""
session = await PrismaChatSession.prisma().find_unique(
where={"id": session_id},
)
return ChatSessionInfo.from_db(session) if session else None
async def get_chat_messages_paginated(
session_id: str,
limit: int = 50,
before_sequence: int | None = None,
user_id: str | None = None,
) -> PaginatedMessages | None:
"""Get paginated messages for a session, newest first.
Verifies session existence (and ownership when ``user_id`` is provided)
in parallel with the message query. Returns ``None`` when the session
is not found or does not belong to the user.
Args:
session_id: The chat session ID.
limit: Max messages to return.
before_sequence: Cursor — return messages with sequence < this value.
user_id: If provided, filters via ``Session.userId`` so only the
session owner's messages are returned (acts as an ownership guard).
"""
# Build session-existence / ownership check
session_where: ChatSessionWhereInput = {"id": session_id}
if user_id is not None:
session_where["userId"] = user_id
# Build message include — fetch paginated messages in the same query
msg_include: dict[str, Any] = {
"order_by": {"sequence": "desc"},
"take": limit + 1,
}
if before_sequence is not None:
msg_include["where"] = {"sequence": {"lt": before_sequence}}
# Single query: session existence/ownership + paginated messages
session = await PrismaChatSession.prisma().find_first(
where=session_where,
include={"Messages": msg_include},
)
if session is None:
return None
session_info = ChatSessionInfo.from_db(session)
results = list(session.Messages) if session.Messages else []
has_more = len(results) > limit
results = results[:limit]
# Reverse to ascending order
results.reverse()
# Tool-call boundary fix: if the oldest message is a tool message,
# expand backward to include the preceding assistant message that
# owns the tool_calls, so convertChatSessionMessagesToUiMessages
# can pair them correctly.
_BOUNDARY_SCAN_LIMIT = 10
if results and results[0].role == "tool":
boundary_where: dict[str, Any] = {
"sessionId": session_id,
"sequence": {"lt": results[0].sequence},
}
if user_id is not None:
boundary_where["Session"] = {"is": {"userId": user_id}}
extra = await PrismaChatMessage.prisma().find_many(
where=boundary_where,
order={"sequence": "desc"},
take=_BOUNDARY_SCAN_LIMIT,
)
# Find the first non-tool message (should be the assistant)
boundary_msgs = []
found_owner = False
for msg in extra:
boundary_msgs.append(msg)
if msg.role != "tool":
found_owner = True
break
boundary_msgs.reverse()
if not found_owner:
logger.warning(
"Boundary expansion did not find owning assistant message "
"for session=%s before sequence=%s (%d msgs scanned)",
session_id,
results[0].sequence,
len(extra),
)
if boundary_msgs:
results = boundary_msgs + results
# Only mark has_more if the expanded boundary isn't the
# very start of the conversation (sequence 0).
if boundary_msgs[0].sequence > 0:
has_more = True
messages = [ChatMessage.from_db(m) for m in results]
oldest_sequence = messages[0].sequence if messages else None
return PaginatedMessages(
messages=messages,
has_more=has_more,
oldest_sequence=oldest_sequence,
session=session_info,
)
async def create_chat_session(
session_id: str,
user_id: str,
@@ -380,8 +501,11 @@ async def update_tool_message_content(
async def set_turn_duration(session_id: str, duration_ms: int) -> None:
"""Set durationMs on the last assistant message in a session.
Also invalidates the Redis session cache so the next GET returns
the updated duration.
Updates the Redis cache in-place instead of invalidating it.
Invalidation would delete the key, creating a window where concurrent
``get_chat_session`` calls re-populate the cache from DB — potentially
with stale data if the DB write from the previous turn hasn't propagated.
This race caused duplicate user messages on the next turn.
"""
last_msg = await PrismaChatMessage.prisma().find_first(
where={"sessionId": session_id, "role": "assistant"},
@@ -392,5 +516,13 @@ async def set_turn_duration(session_id: str, duration_ms: int) -> None:
where={"id": last_msg.id},
data={"durationMs": duration_ms},
)
# Invalidate cache so the session is re-fetched from DB with durationMs
await invalidate_session_cache(session_id)
# Update cache in-place rather than invalidating to avoid a
# race window where the empty cache gets re-populated with
# stale data by a concurrent get_chat_session call.
session = await get_chat_session_cached(session_id)
if session and session.messages:
for msg in reversed(session.messages):
if msg.role == "assistant":
msg.duration_ms = duration_ms
break
await cache_chat_session(session)

View File

@@ -0,0 +1,388 @@
"""Unit tests for copilot.db — paginated message queries."""
from __future__ import annotations
from datetime import UTC, datetime
from typing import Any
from unittest.mock import AsyncMock, patch
import pytest
from prisma.models import ChatMessage as PrismaChatMessage
from prisma.models import ChatSession as PrismaChatSession
from backend.copilot.db import (
PaginatedMessages,
get_chat_messages_paginated,
set_turn_duration,
)
from backend.copilot.model import ChatMessage as CopilotChatMessage
from backend.copilot.model import ChatSession, get_chat_session, upsert_chat_session
def _make_msg(
sequence: int,
role: str = "assistant",
content: str | None = "hello",
tool_calls: Any = None,
) -> PrismaChatMessage:
"""Build a minimal PrismaChatMessage for testing."""
return PrismaChatMessage(
id=f"msg-{sequence}",
createdAt=datetime.now(UTC),
sessionId="sess-1",
role=role,
content=content,
sequence=sequence,
toolCalls=tool_calls,
name=None,
toolCallId=None,
refusal=None,
functionCall=None,
)
def _make_session(
session_id: str = "sess-1",
user_id: str = "user-1",
messages: list[PrismaChatMessage] | None = None,
) -> PrismaChatSession:
"""Build a minimal PrismaChatSession for testing."""
now = datetime.now(UTC)
session = PrismaChatSession.model_construct(
id=session_id,
createdAt=now,
updatedAt=now,
userId=user_id,
credentials={},
successfulAgentRuns={},
successfulAgentSchedules={},
totalPromptTokens=0,
totalCompletionTokens=0,
title=None,
metadata={},
Messages=messages or [],
)
return session
SESSION_ID = "sess-1"
@pytest.fixture()
def mock_db():
"""Patch ChatSession.prisma().find_first and ChatMessage.prisma().find_many.
find_first is used for the main query (session + included messages).
find_many is used only for boundary expansion queries.
"""
with (
patch.object(PrismaChatSession, "prisma") as mock_session_prisma,
patch.object(PrismaChatMessage, "prisma") as mock_msg_prisma,
):
find_first = AsyncMock()
mock_session_prisma.return_value.find_first = find_first
find_many = AsyncMock(return_value=[])
mock_msg_prisma.return_value.find_many = find_many
yield find_first, find_many
# ---------- Basic pagination ----------
@pytest.mark.asyncio
async def test_basic_page_returns_messages_ascending(
mock_db: tuple[AsyncMock, AsyncMock],
):
"""Messages are returned in ascending sequence order."""
find_first, _ = mock_db
find_first.return_value = _make_session(
messages=[_make_msg(3), _make_msg(2), _make_msg(1)],
)
page = await get_chat_messages_paginated(SESSION_ID, limit=5)
assert isinstance(page, PaginatedMessages)
assert [m.sequence for m in page.messages] == [1, 2, 3]
assert page.has_more is False
assert page.oldest_sequence == 1
@pytest.mark.asyncio
async def test_has_more_when_results_exceed_limit(
mock_db: tuple[AsyncMock, AsyncMock],
):
"""has_more is True when DB returns more than limit items."""
find_first, _ = mock_db
find_first.return_value = _make_session(
messages=[_make_msg(3), _make_msg(2), _make_msg(1)],
)
page = await get_chat_messages_paginated(SESSION_ID, limit=2)
assert page is not None
assert page.has_more is True
assert len(page.messages) == 2
assert [m.sequence for m in page.messages] == [2, 3]
@pytest.mark.asyncio
async def test_empty_session_returns_no_messages(
mock_db: tuple[AsyncMock, AsyncMock],
):
find_first, _ = mock_db
find_first.return_value = _make_session(messages=[])
page = await get_chat_messages_paginated(SESSION_ID, limit=50)
assert page is not None
assert page.messages == []
assert page.has_more is False
assert page.oldest_sequence is None
@pytest.mark.asyncio
async def test_before_sequence_filters_correctly(
mock_db: tuple[AsyncMock, AsyncMock],
):
"""before_sequence is passed as a where filter inside the Messages include."""
find_first, _ = mock_db
find_first.return_value = _make_session(
messages=[_make_msg(2), _make_msg(1)],
)
await get_chat_messages_paginated(SESSION_ID, limit=50, before_sequence=5)
call_kwargs = find_first.call_args
include = call_kwargs.kwargs.get("include") or call_kwargs[1].get("include")
assert include["Messages"]["where"] == {"sequence": {"lt": 5}}
@pytest.mark.asyncio
async def test_no_where_on_messages_without_before_sequence(
mock_db: tuple[AsyncMock, AsyncMock],
):
"""Without before_sequence, the Messages include has no where clause."""
find_first, _ = mock_db
find_first.return_value = _make_session(messages=[_make_msg(1)])
await get_chat_messages_paginated(SESSION_ID, limit=50)
call_kwargs = find_first.call_args
include = call_kwargs.kwargs.get("include") or call_kwargs[1].get("include")
assert "where" not in include["Messages"]
@pytest.mark.asyncio
async def test_user_id_filter_applied_to_session_where(
mock_db: tuple[AsyncMock, AsyncMock],
):
"""user_id adds a userId filter to the session-level where clause."""
find_first, _ = mock_db
find_first.return_value = _make_session(messages=[_make_msg(1)])
await get_chat_messages_paginated(SESSION_ID, limit=50, user_id="user-abc")
call_kwargs = find_first.call_args
where = call_kwargs.kwargs.get("where") or call_kwargs[1].get("where")
assert where["userId"] == "user-abc"
@pytest.mark.asyncio
async def test_session_not_found_returns_none(
mock_db: tuple[AsyncMock, AsyncMock],
):
"""Returns None when session doesn't exist or user doesn't own it."""
find_first, _ = mock_db
find_first.return_value = None
page = await get_chat_messages_paginated(SESSION_ID, limit=50)
assert page is None
@pytest.mark.asyncio
async def test_session_info_included_in_result(
mock_db: tuple[AsyncMock, AsyncMock],
):
"""PaginatedMessages includes session metadata."""
find_first, _ = mock_db
find_first.return_value = _make_session(messages=[_make_msg(1)])
page = await get_chat_messages_paginated(SESSION_ID, limit=50)
assert page is not None
assert page.session.session_id == SESSION_ID
# ---------- Backward boundary expansion ----------
@pytest.mark.asyncio
async def test_boundary_expansion_includes_assistant(
mock_db: tuple[AsyncMock, AsyncMock],
):
"""When page starts with a tool message, expand backward to include
the owning assistant message."""
find_first, find_many = mock_db
find_first.return_value = _make_session(
messages=[_make_msg(5, role="tool"), _make_msg(4, role="tool")],
)
find_many.return_value = [_make_msg(3, role="assistant")]
page = await get_chat_messages_paginated(SESSION_ID, limit=5)
assert page is not None
assert [m.sequence for m in page.messages] == [3, 4, 5]
assert page.messages[0].role == "assistant"
assert page.oldest_sequence == 3
@pytest.mark.asyncio
async def test_boundary_expansion_includes_multiple_tool_msgs(
mock_db: tuple[AsyncMock, AsyncMock],
):
"""Boundary expansion scans past consecutive tool messages to find
the owning assistant."""
find_first, find_many = mock_db
find_first.return_value = _make_session(
messages=[_make_msg(7, role="tool")],
)
find_many.return_value = [
_make_msg(6, role="tool"),
_make_msg(5, role="tool"),
_make_msg(4, role="assistant"),
]
page = await get_chat_messages_paginated(SESSION_ID, limit=5)
assert page is not None
assert [m.sequence for m in page.messages] == [4, 5, 6, 7]
assert page.messages[0].role == "assistant"
@pytest.mark.asyncio
async def test_boundary_expansion_sets_has_more_when_not_at_start(
mock_db: tuple[AsyncMock, AsyncMock],
):
"""After boundary expansion, has_more=True if expanded msgs aren't at seq 0."""
find_first, find_many = mock_db
find_first.return_value = _make_session(
messages=[_make_msg(3, role="tool")],
)
find_many.return_value = [_make_msg(2, role="assistant")]
page = await get_chat_messages_paginated(SESSION_ID, limit=5)
assert page is not None
assert page.has_more is True
@pytest.mark.asyncio
async def test_boundary_expansion_no_has_more_at_conversation_start(
mock_db: tuple[AsyncMock, AsyncMock],
):
"""has_more stays False when boundary expansion reaches seq 0."""
find_first, find_many = mock_db
find_first.return_value = _make_session(
messages=[_make_msg(1, role="tool")],
)
find_many.return_value = [_make_msg(0, role="assistant")]
page = await get_chat_messages_paginated(SESSION_ID, limit=5)
assert page is not None
assert page.has_more is False
assert page.oldest_sequence == 0
@pytest.mark.asyncio
async def test_no_boundary_expansion_when_first_msg_not_tool(
mock_db: tuple[AsyncMock, AsyncMock],
):
"""No boundary expansion when the first message is not a tool message."""
find_first, find_many = mock_db
find_first.return_value = _make_session(
messages=[_make_msg(3, role="user"), _make_msg(2, role="assistant")],
)
page = await get_chat_messages_paginated(SESSION_ID, limit=5)
assert page is not None
assert find_many.call_count == 0
assert [m.sequence for m in page.messages] == [2, 3]
@pytest.mark.asyncio
async def test_boundary_expansion_warns_when_no_owner_found(
mock_db: tuple[AsyncMock, AsyncMock],
):
"""When boundary scan doesn't find a non-tool message, a warning is logged
and the boundary messages are still included."""
find_first, find_many = mock_db
find_first.return_value = _make_session(
messages=[_make_msg(10, role="tool")],
)
find_many.return_value = [_make_msg(i, role="tool") for i in range(9, -1, -1)]
with patch("backend.copilot.db.logger") as mock_logger:
page = await get_chat_messages_paginated(SESSION_ID, limit=5)
mock_logger.warning.assert_called_once()
assert page is not None
assert page.messages[0].role == "tool"
assert len(page.messages) > 1
# ---------- Turn duration (integration tests) ----------
@pytest.mark.asyncio(loop_scope="session")
async def test_set_turn_duration_updates_cache_in_place(setup_test_user, test_user_id):
"""set_turn_duration patches the cached session without invalidation.
Verifies that after calling set_turn_duration the Redis-cached session
reflects the updated durationMs on the last assistant message, without
the cache having been deleted and re-populated (which could race with
concurrent get_chat_session calls).
"""
session = ChatSession.new(user_id=test_user_id, dry_run=False)
session.messages = [
CopilotChatMessage(role="user", content="hello"),
CopilotChatMessage(role="assistant", content="hi there"),
]
session = await upsert_chat_session(session)
# Ensure the session is in cache
cached = await get_chat_session(session.session_id, test_user_id)
assert cached is not None
assert cached.messages[-1].duration_ms is None
# Update turn duration — should patch cache in-place
await set_turn_duration(session.session_id, 1234)
# Read from cache (not DB) — the cache should already have the update
updated = await get_chat_session(session.session_id, test_user_id)
assert updated is not None
assistant_msgs = [m for m in updated.messages if m.role == "assistant"]
assert len(assistant_msgs) == 1
assert assistant_msgs[0].duration_ms == 1234
@pytest.mark.asyncio(loop_scope="session")
async def test_set_turn_duration_no_assistant_message(setup_test_user, test_user_id):
"""set_turn_duration is a no-op when there are no assistant messages."""
session = ChatSession.new(user_id=test_user_id, dry_run=False)
session.messages = [
CopilotChatMessage(role="user", content="hello"),
]
session = await upsert_chat_session(session)
# Should not raise
await set_turn_duration(session.session_id, 5678)
cached = await get_chat_session(session.session_id, test_user_id)
assert cached is not None
# User message should not have durationMs
assert cached.messages[0].duration_ms is None

View File

@@ -13,7 +13,7 @@ import time
from backend.copilot import stream_registry
from backend.copilot.baseline import stream_chat_completion_baseline
from backend.copilot.config import ChatConfig
from backend.copilot.config import ChatConfig, CopilotMode
from backend.copilot.response_model import StreamError
from backend.copilot.sdk import service as sdk_service
from backend.copilot.sdk.dummy import stream_chat_completion_dummy
@@ -30,6 +30,57 @@ from .utils import CoPilotExecutionEntry, CoPilotLogMetadata
logger = TruncatedLogger(logging.getLogger(__name__), prefix="[CoPilotExecutor]")
# ============ Mode Routing ============ #
async def resolve_effective_mode(
mode: CopilotMode | None,
user_id: str | None,
) -> CopilotMode | None:
"""Strip ``mode`` when the user is not entitled to the toggle.
The UI gates the mode toggle behind ``CHAT_MODE_OPTION``; the
processor enforces the same gate server-side so an authenticated
user cannot bypass the flag by crafting a request directly.
"""
if mode is None:
return None
allowed = await is_feature_enabled(
Flag.CHAT_MODE_OPTION,
user_id or "anonymous",
default=False,
)
if not allowed:
logger.info(f"Ignoring mode={mode} — CHAT_MODE_OPTION is disabled for user")
return None
return mode
async def resolve_use_sdk_for_mode(
mode: CopilotMode | None,
user_id: str | None,
*,
use_claude_code_subscription: bool,
config_default: bool,
) -> bool:
"""Pick the SDK vs baseline path for a single turn.
Per-request ``mode`` wins whenever it is set (after the
``CHAT_MODE_OPTION`` gate has been applied upstream). Otherwise
falls back to the Claude Code subscription override, then the
``COPILOT_SDK`` LaunchDarkly flag, then the config default.
"""
if mode == "fast":
return False
if mode == "extended_thinking":
return True
return use_claude_code_subscription or await is_feature_enabled(
Flag.COPILOT_SDK,
user_id or "anonymous",
default=config_default,
)
# ============ Module Entry Points ============ #
# Thread-local storage for processor instances
@@ -100,8 +151,8 @@ class CoPilotProcessor:
This method is called once per worker thread to set up the async event
loop and initialize any required resources.
Database is accessed only through DatabaseManager, so we don't need to connect
to Prisma directly.
DB operations route through DatabaseManagerAsyncClient (RPC) via the
db_accessors pattern — no direct Prisma connection is needed here.
"""
configure_logging()
set_service_name("CoPilotExecutor")
@@ -250,21 +301,26 @@ class CoPilotProcessor:
if config.test_mode:
stream_fn = stream_chat_completion_dummy
log.warning("Using DUMMY service (CHAT_TEST_MODE=true)")
effective_mode = None
else:
use_sdk = (
config.use_claude_code_subscription
or await is_feature_enabled(
Flag.COPILOT_SDK,
entry.user_id or "anonymous",
default=config.use_claude_agent_sdk,
)
# Enforce server-side feature-flag gate so unauthorised
# users cannot force a mode by crafting the request.
effective_mode = await resolve_effective_mode(entry.mode, entry.user_id)
use_sdk = await resolve_use_sdk_for_mode(
effective_mode,
entry.user_id,
use_claude_code_subscription=config.use_claude_code_subscription,
config_default=config.use_claude_agent_sdk,
)
stream_fn = (
sdk_service.stream_chat_completion_sdk
if use_sdk
else stream_chat_completion_baseline
)
log.info(f"Using {'SDK' if use_sdk else 'baseline'} service")
log.info(
f"Using {'SDK' if use_sdk else 'baseline'} service "
f"(mode={effective_mode or 'default'})"
)
# Stream chat completion and publish chunks to Redis.
# stream_and_publish wraps the raw stream with registry
@@ -276,6 +332,7 @@ class CoPilotProcessor:
user_id=entry.user_id,
context=entry.context,
file_ids=entry.file_ids,
mode=effective_mode,
)
async for chunk in stream_registry.stream_and_publish(
session_id=entry.session_id,

View File

@@ -0,0 +1,175 @@
"""Unit tests for CoPilot mode routing logic in the processor.
Tests cover the mode→service mapping:
- 'fast' → baseline service
- 'extended_thinking' → SDK service
- None → feature flag / config fallback
as well as the ``CHAT_MODE_OPTION`` server-side gate. The tests import
the real production helpers from ``processor.py`` so the routing logic
has meaningful coverage.
"""
from unittest.mock import AsyncMock, patch
import pytest
from backend.copilot.executor.processor import (
resolve_effective_mode,
resolve_use_sdk_for_mode,
)
class TestResolveUseSdkForMode:
"""Tests for the per-request mode routing logic."""
@pytest.mark.asyncio
async def test_fast_mode_uses_baseline(self):
"""mode='fast' always routes to baseline, regardless of flags."""
with patch(
"backend.copilot.executor.processor.is_feature_enabled",
new=AsyncMock(return_value=True),
):
assert (
await resolve_use_sdk_for_mode(
"fast",
"user-1",
use_claude_code_subscription=True,
config_default=True,
)
is False
)
@pytest.mark.asyncio
async def test_extended_thinking_uses_sdk(self):
"""mode='extended_thinking' always routes to SDK, regardless of flags."""
with patch(
"backend.copilot.executor.processor.is_feature_enabled",
new=AsyncMock(return_value=False),
):
assert (
await resolve_use_sdk_for_mode(
"extended_thinking",
"user-1",
use_claude_code_subscription=False,
config_default=False,
)
is True
)
@pytest.mark.asyncio
async def test_none_mode_uses_subscription_override(self):
"""mode=None with claude_code_subscription=True routes to SDK."""
with patch(
"backend.copilot.executor.processor.is_feature_enabled",
new=AsyncMock(return_value=False),
):
assert (
await resolve_use_sdk_for_mode(
None,
"user-1",
use_claude_code_subscription=True,
config_default=False,
)
is True
)
@pytest.mark.asyncio
async def test_none_mode_uses_feature_flag(self):
"""mode=None with feature flag enabled routes to SDK."""
with patch(
"backend.copilot.executor.processor.is_feature_enabled",
new=AsyncMock(return_value=True),
) as flag_mock:
assert (
await resolve_use_sdk_for_mode(
None,
"user-1",
use_claude_code_subscription=False,
config_default=False,
)
is True
)
flag_mock.assert_awaited_once()
@pytest.mark.asyncio
async def test_none_mode_uses_config_default(self):
"""mode=None falls back to config.use_claude_agent_sdk."""
# When LaunchDarkly returns the default (True), we expect SDK routing.
with patch(
"backend.copilot.executor.processor.is_feature_enabled",
new=AsyncMock(return_value=True),
):
assert (
await resolve_use_sdk_for_mode(
None,
"user-1",
use_claude_code_subscription=False,
config_default=True,
)
is True
)
@pytest.mark.asyncio
async def test_none_mode_all_disabled(self):
"""mode=None with all flags off routes to baseline."""
with patch(
"backend.copilot.executor.processor.is_feature_enabled",
new=AsyncMock(return_value=False),
):
assert (
await resolve_use_sdk_for_mode(
None,
"user-1",
use_claude_code_subscription=False,
config_default=False,
)
is False
)
class TestResolveEffectiveMode:
"""Tests for the CHAT_MODE_OPTION server-side gate."""
@pytest.mark.asyncio
async def test_none_mode_passes_through(self):
"""mode=None is returned as-is without a flag check."""
with patch(
"backend.copilot.executor.processor.is_feature_enabled",
new=AsyncMock(return_value=False),
) as flag_mock:
assert await resolve_effective_mode(None, "user-1") is None
flag_mock.assert_not_awaited()
@pytest.mark.asyncio
async def test_mode_stripped_when_flag_disabled(self):
"""When CHAT_MODE_OPTION is off, mode is dropped to None."""
with patch(
"backend.copilot.executor.processor.is_feature_enabled",
new=AsyncMock(return_value=False),
):
assert await resolve_effective_mode("fast", "user-1") is None
assert await resolve_effective_mode("extended_thinking", "user-1") is None
@pytest.mark.asyncio
async def test_mode_preserved_when_flag_enabled(self):
"""When CHAT_MODE_OPTION is on, the user-selected mode is preserved."""
with patch(
"backend.copilot.executor.processor.is_feature_enabled",
new=AsyncMock(return_value=True),
):
assert await resolve_effective_mode("fast", "user-1") == "fast"
assert (
await resolve_effective_mode("extended_thinking", "user-1")
== "extended_thinking"
)
@pytest.mark.asyncio
async def test_anonymous_user_with_mode(self):
"""Anonymous users (user_id=None) still pass through the gate."""
with patch(
"backend.copilot.executor.processor.is_feature_enabled",
new=AsyncMock(return_value=False),
) as flag_mock:
assert await resolve_effective_mode("fast", None) is None
flag_mock.assert_awaited_once()

View File

@@ -9,6 +9,7 @@ import logging
from pydantic import BaseModel
from backend.copilot.config import CopilotMode
from backend.data.rabbitmq import Exchange, ExchangeType, Queue, RabbitMQConfig
from backend.util.logging import TruncatedLogger, is_structured_logging_enabled
@@ -156,6 +157,9 @@ class CoPilotExecutionEntry(BaseModel):
file_ids: list[str] | None = None
"""Workspace file IDs attached to the user's message"""
mode: CopilotMode | None = None
"""Autopilot mode override: 'fast' or 'extended_thinking'. None = server default."""
class CancelCoPilotEvent(BaseModel):
"""Event to cancel a CoPilot operation."""
@@ -175,6 +179,7 @@ async def enqueue_copilot_turn(
is_user_message: bool = True,
context: dict[str, str] | None = None,
file_ids: list[str] | None = None,
mode: CopilotMode | None = None,
) -> None:
"""Enqueue a CoPilot task for processing by the executor service.
@@ -186,6 +191,7 @@ async def enqueue_copilot_turn(
is_user_message: Whether the message is from the user (vs system/assistant)
context: Optional context for the message (e.g., {url: str, content: str})
file_ids: Optional workspace file IDs attached to the user's message
mode: Autopilot mode override ('fast' or 'extended_thinking'). None = server default.
"""
from backend.util.clients import get_async_copilot_queue
@@ -197,6 +203,7 @@ async def enqueue_copilot_turn(
is_user_message=is_user_message,
context=context,
file_ids=file_ids,
mode=mode,
)
queue_client = await get_async_copilot_queue()

View File

@@ -0,0 +1,123 @@
"""Tests for CoPilot executor utils (queue config, message models, logging)."""
from backend.copilot.executor.utils import (
COPILOT_EXECUTION_EXCHANGE,
COPILOT_EXECUTION_QUEUE_NAME,
COPILOT_EXECUTION_ROUTING_KEY,
CancelCoPilotEvent,
CoPilotExecutionEntry,
CoPilotLogMetadata,
create_copilot_queue_config,
)
class TestCoPilotExecutionEntry:
def test_basic_fields(self):
entry = CoPilotExecutionEntry(
session_id="s1",
user_id="u1",
message="hello",
)
assert entry.session_id == "s1"
assert entry.user_id == "u1"
assert entry.message == "hello"
assert entry.is_user_message is True
assert entry.mode is None
assert entry.context is None
assert entry.file_ids is None
def test_mode_field(self):
entry = CoPilotExecutionEntry(
session_id="s1",
user_id="u1",
message="test",
mode="fast",
)
assert entry.mode == "fast"
entry2 = CoPilotExecutionEntry(
session_id="s1",
user_id="u1",
message="test",
mode="extended_thinking",
)
assert entry2.mode == "extended_thinking"
def test_optional_fields(self):
entry = CoPilotExecutionEntry(
session_id="s1",
user_id="u1",
message="test",
turn_id="t1",
context={"url": "https://example.com"},
file_ids=["f1", "f2"],
is_user_message=False,
)
assert entry.turn_id == "t1"
assert entry.context == {"url": "https://example.com"}
assert entry.file_ids == ["f1", "f2"]
assert entry.is_user_message is False
def test_serialization_roundtrip(self):
entry = CoPilotExecutionEntry(
session_id="s1",
user_id="u1",
message="hello",
mode="fast",
)
json_str = entry.model_dump_json()
restored = CoPilotExecutionEntry.model_validate_json(json_str)
assert restored == entry
class TestCancelCoPilotEvent:
def test_basic(self):
event = CancelCoPilotEvent(session_id="s1")
assert event.session_id == "s1"
def test_serialization(self):
event = CancelCoPilotEvent(session_id="s1")
restored = CancelCoPilotEvent.model_validate_json(event.model_dump_json())
assert restored.session_id == "s1"
class TestCreateCopilotQueueConfig:
def test_returns_valid_config(self):
config = create_copilot_queue_config()
assert len(config.exchanges) == 2
assert len(config.queues) == 2
def test_execution_queue_properties(self):
config = create_copilot_queue_config()
exec_queue = next(
q for q in config.queues if q.name == COPILOT_EXECUTION_QUEUE_NAME
)
assert exec_queue.durable is True
assert exec_queue.exchange == COPILOT_EXECUTION_EXCHANGE
assert exec_queue.routing_key == COPILOT_EXECUTION_ROUTING_KEY
def test_cancel_queue_uses_fanout(self):
config = create_copilot_queue_config()
cancel_queue = next(
q for q in config.queues if q.name != COPILOT_EXECUTION_QUEUE_NAME
)
assert cancel_queue.exchange is not None
assert cancel_queue.exchange.type.value == "fanout"
class TestCoPilotLogMetadata:
def test_creates_logger_with_metadata(self):
import logging
base_logger = logging.getLogger("test")
log = CoPilotLogMetadata(base_logger, session_id="s1", user_id="u1")
assert log is not None
def test_filters_none_values(self):
import logging
base_logger = logging.getLogger("test")
log = CoPilotLogMetadata(
base_logger, session_id="s1", user_id=None, turn_id="t1"
)
assert log is not None

View File

@@ -64,6 +64,7 @@ class ChatMessage(BaseModel):
refusal: str | None = None
tool_calls: list[dict] | None = None
function_call: dict | None = None
sequence: int | None = None
duration_ms: int | None = None
@staticmethod
@@ -77,10 +78,54 @@ class ChatMessage(BaseModel):
refusal=prisma_message.refusal,
tool_calls=_parse_json_field(prisma_message.toolCalls),
function_call=_parse_json_field(prisma_message.functionCall),
sequence=prisma_message.sequence,
duration_ms=prisma_message.durationMs,
)
def is_message_duplicate(
messages: list[ChatMessage],
role: str,
content: str,
) -> bool:
"""Check whether *content* is already present in the current pending turn.
Only inspects trailing messages that share the given *role* (i.e. the
current turn). This ensures legitimately repeated messages across different
turns are not suppressed, while same-turn duplicates from stale cache are
still caught.
"""
for m in reversed(messages):
if m.role == role:
if m.content == content:
return True
else:
break
return False
def maybe_append_user_message(
session: "ChatSession",
message: str | None,
is_user_message: bool,
) -> bool:
"""Append a user/assistant message to the session if not already present.
The route handler already persists the user message before enqueueing,
so we check trailing same-role messages to avoid re-appending when the
session cache is slightly stale.
Returns True if the message was appended, False if skipped.
"""
if not message:
return False
role = "user" if is_user_message else "assistant"
if is_message_duplicate(session.messages, role, message):
return False
session.messages.append(ChatMessage(role=role, content=message))
return True
class Usage(BaseModel):
prompt_tokens: int
completion_tokens: int

View File

@@ -17,6 +17,8 @@ from .model import (
ChatSession,
Usage,
get_chat_session,
is_message_duplicate,
maybe_append_user_message,
upsert_chat_session,
)
@@ -424,3 +426,151 @@ async def test_concurrent_saves_collision_detection(setup_test_user, test_user_i
assert "Streaming message 1" in contents
assert "Streaming message 2" in contents
assert "Callback result" in contents
# --------------------------------------------------------------------------- #
# is_message_duplicate #
# --------------------------------------------------------------------------- #
def test_duplicate_detected_in_trailing_same_role():
"""Duplicate user message at the tail is detected."""
msgs = [
ChatMessage(role="user", content="hello"),
ChatMessage(role="assistant", content="hi there"),
ChatMessage(role="user", content="yes"),
]
assert is_message_duplicate(msgs, "user", "yes") is True
def test_duplicate_not_detected_across_turns():
"""Same text in a previous turn (separated by assistant) is NOT a duplicate."""
msgs = [
ChatMessage(role="user", content="yes"),
ChatMessage(role="assistant", content="ok"),
]
assert is_message_duplicate(msgs, "user", "yes") is False
def test_no_duplicate_on_empty_messages():
"""Empty message list never reports a duplicate."""
assert is_message_duplicate([], "user", "hello") is False
def test_no_duplicate_when_content_differs():
"""Different content in the trailing same-role block is not a duplicate."""
msgs = [
ChatMessage(role="assistant", content="response"),
ChatMessage(role="user", content="first message"),
]
assert is_message_duplicate(msgs, "user", "second message") is False
def test_duplicate_with_multiple_trailing_same_role():
"""Detects duplicate among multiple consecutive same-role messages."""
msgs = [
ChatMessage(role="assistant", content="response"),
ChatMessage(role="user", content="msg1"),
ChatMessage(role="user", content="msg2"),
]
assert is_message_duplicate(msgs, "user", "msg1") is True
assert is_message_duplicate(msgs, "user", "msg2") is True
assert is_message_duplicate(msgs, "user", "msg3") is False
def test_duplicate_check_for_assistant_role():
"""Works correctly when checking assistant role too."""
msgs = [
ChatMessage(role="user", content="hi"),
ChatMessage(role="assistant", content="hello"),
ChatMessage(role="assistant", content="how can I help?"),
]
assert is_message_duplicate(msgs, "assistant", "hello") is True
assert is_message_duplicate(msgs, "assistant", "new response") is False
def test_no_false_positive_when_content_is_none():
"""Messages with content=None in the trailing block do not match."""
msgs = [
ChatMessage(role="user", content=None),
ChatMessage(role="user", content="hello"),
]
assert is_message_duplicate(msgs, "user", "hello") is True
# None-content message should not match any string
msgs2 = [
ChatMessage(role="user", content=None),
]
assert is_message_duplicate(msgs2, "user", "hello") is False
def test_all_same_role_messages():
"""When all messages share the same role, the entire list is scanned."""
msgs = [
ChatMessage(role="user", content="first"),
ChatMessage(role="user", content="second"),
ChatMessage(role="user", content="third"),
]
assert is_message_duplicate(msgs, "user", "first") is True
assert is_message_duplicate(msgs, "user", "new") is False
# --------------------------------------------------------------------------- #
# maybe_append_user_message #
# --------------------------------------------------------------------------- #
def test_maybe_append_user_message_appends_new():
"""A new user message is appended and returns True."""
session = ChatSession.new(user_id="u", dry_run=False)
session.messages = [
ChatMessage(role="assistant", content="hello"),
]
result = maybe_append_user_message(session, "new msg", is_user_message=True)
assert result is True
assert len(session.messages) == 2
assert session.messages[-1].role == "user"
assert session.messages[-1].content == "new msg"
def test_maybe_append_user_message_skips_duplicate():
"""A duplicate user message is skipped and returns False."""
session = ChatSession.new(user_id="u", dry_run=False)
session.messages = [
ChatMessage(role="assistant", content="hello"),
ChatMessage(role="user", content="dup"),
]
result = maybe_append_user_message(session, "dup", is_user_message=True)
assert result is False
assert len(session.messages) == 2
def test_maybe_append_user_message_none_message():
"""None/empty message returns False without appending."""
session = ChatSession.new(user_id="u", dry_run=False)
assert maybe_append_user_message(session, None, is_user_message=True) is False
assert maybe_append_user_message(session, "", is_user_message=True) is False
assert len(session.messages) == 0
def test_maybe_append_assistant_message():
"""Works for assistant role when is_user_message=False."""
session = ChatSession.new(user_id="u", dry_run=False)
session.messages = [
ChatMessage(role="user", content="hi"),
]
result = maybe_append_user_message(session, "response", is_user_message=False)
assert result is True
assert session.messages[-1].role == "assistant"
assert session.messages[-1].content == "response"
def test_maybe_append_assistant_skips_duplicate():
"""Duplicate assistant message is skipped."""
session = ChatSession.new(user_id="u", dry_run=False)
session.messages = [
ChatMessage(role="user", content="hi"),
ChatMessage(role="assistant", content="dup"),
]
result = maybe_append_user_message(session, "dup", is_user_message=False)
assert result is False
assert len(session.messages) == 2

View File

@@ -126,6 +126,21 @@ After building the file, reference it with `@@agptfile:` in other tools:
- When spawning sub-agents for research, ensure each has a distinct
non-overlapping scope to avoid redundant searches.
### Tool Discovery Priority
When the user asks to interact with a service or API, follow this order:
1. **find_block first** — Search platform blocks with `find_block`. The platform has hundreds of built-in blocks (Google Sheets, Docs, Calendar, Gmail, Slack, GitHub, etc.) that work without extra setup.
2. **run_mcp_tool** — If no matching block exists, check if a hosted MCP server is available for the service. Only use known MCP server URLs from the registry.
3. **SendAuthenticatedWebRequestBlock** — If no block or MCP server exists, use `SendAuthenticatedWebRequestBlock` with existing host-scoped credentials. Check available credentials via `connect_integration`.
4. **Manual API call** — As a last resort, guide the user to set up credentials and use `SendAuthenticatedWebRequestBlock` with direct API calls.
**Never skip step 1.** Built-in blocks are more reliable, tested, and user-friendly than MCP or raw API calls.
### Sub-agent tasks
- When using the Task tool, NEVER set `run_in_background` to true.
All tasks must run in the foreground.

View File

@@ -9,11 +9,15 @@ UTC). Fails open when Redis is unavailable to avoid blocking users.
import asyncio
import logging
from datetime import UTC, datetime, timedelta
from enum import Enum
from prisma.models import User as PrismaUser
from pydantic import BaseModel, Field
from redis.exceptions import RedisError
from backend.data.db_accessors import user_db
from backend.data.redis_client import get_redis_async
from backend.util.cache import cached
logger = logging.getLogger(__name__)
@@ -21,6 +25,40 @@ logger = logging.getLogger(__name__)
_USAGE_KEY_PREFIX = "copilot:usage"
# ---------------------------------------------------------------------------
# Subscription tier definitions
# ---------------------------------------------------------------------------
class SubscriptionTier(str, Enum):
"""Subscription tiers with increasing token allowances.
Mirrors the ``SubscriptionTier`` enum in ``schema.prisma``.
Once ``prisma generate`` is run, this can be replaced with::
from prisma.enums import SubscriptionTier
"""
FREE = "FREE"
PRO = "PRO"
BUSINESS = "BUSINESS"
ENTERPRISE = "ENTERPRISE"
# Multiplier applied to the base limits (from LD / config) for each tier.
# Intentionally int (not float): keeps limits as whole token counts and avoids
# floating-point rounding. If fractional multipliers are ever needed, change
# the type and round the result in get_global_rate_limits().
TIER_MULTIPLIERS: dict[SubscriptionTier, int] = {
SubscriptionTier.FREE: 1,
SubscriptionTier.PRO: 5,
SubscriptionTier.BUSINESS: 20,
SubscriptionTier.ENTERPRISE: 60,
}
DEFAULT_TIER = SubscriptionTier.FREE
class UsageWindow(BaseModel):
"""Usage within a single time window."""
@@ -36,6 +74,7 @@ class CoPilotUsageStatus(BaseModel):
daily: UsageWindow
weekly: UsageWindow
tier: SubscriptionTier = DEFAULT_TIER
reset_cost: int = Field(
default=0,
description="Credit cost (in cents) to reset the daily limit. 0 = feature disabled.",
@@ -66,6 +105,7 @@ async def get_usage_status(
daily_token_limit: int,
weekly_token_limit: int,
rate_limit_reset_cost: int = 0,
tier: SubscriptionTier = DEFAULT_TIER,
) -> CoPilotUsageStatus:
"""Get current usage status for a user.
@@ -74,6 +114,7 @@ async def get_usage_status(
daily_token_limit: Max tokens per day (0 = unlimited).
weekly_token_limit: Max tokens per week (0 = unlimited).
rate_limit_reset_cost: Credit cost (cents) to reset daily limit (0 = disabled).
tier: The user's rate-limit tier (included in the response).
Returns:
CoPilotUsageStatus with current usage and limits.
@@ -103,6 +144,7 @@ async def get_usage_status(
limit=weekly_token_limit,
resets_at=_weekly_reset_time(now=now),
),
tier=tier,
reset_cost=rate_limit_reset_cost,
)
@@ -343,20 +385,103 @@ async def record_token_usage(
)
class _UserNotFoundError(Exception):
"""Raised when a user record is missing or has no subscription tier.
Used internally by ``_fetch_user_tier`` to signal a cache-miss condition:
by raising instead of returning ``DEFAULT_TIER``, we prevent the ``@cached``
decorator from storing the fallback value. This avoids a race condition
where a non-existent user's DEFAULT_TIER is cached, then the user is
created with a higher tier but receives the stale cached FREE tier for
up to 5 minutes.
"""
@cached(maxsize=1000, ttl_seconds=300, shared_cache=True)
async def _fetch_user_tier(user_id: str) -> SubscriptionTier:
"""Fetch the user's rate-limit tier from the database (cached via Redis).
Uses ``shared_cache=True`` so that tier changes propagate across all pods
immediately when the cache entry is invalidated (via ``cache_delete``).
Only successful DB lookups of existing users with a valid tier are cached.
Raises ``_UserNotFoundError`` when the user is missing or has no tier, so
the ``@cached`` decorator does **not** store a fallback value. This
prevents a race condition where a non-existent user's ``DEFAULT_TIER`` is
cached and then persists after the user is created with a higher tier.
"""
try:
user = await user_db().get_user_by_id(user_id)
except Exception:
raise _UserNotFoundError(user_id)
if user.subscription_tier:
return SubscriptionTier(user.subscription_tier)
raise _UserNotFoundError(user_id)
async def get_user_tier(user_id: str) -> SubscriptionTier:
"""Look up the user's rate-limit tier from the database.
Successful results are cached for 5 minutes (via ``_fetch_user_tier``)
to avoid a DB round-trip on every rate-limit check.
Falls back to ``DEFAULT_TIER`` **without caching** when the DB is
unreachable or returns an unrecognised value, so the next call retries
the query instead of serving a stale fallback for up to 5 minutes.
"""
try:
return await _fetch_user_tier(user_id)
except Exception as exc:
logger.warning(
"Failed to resolve rate-limit tier for user %s, defaulting to %s: %s",
user_id[:8],
DEFAULT_TIER.value,
exc,
)
return DEFAULT_TIER
# Expose cache management on the public function so callers (including tests)
# never need to reach into the private ``_fetch_user_tier``.
get_user_tier.cache_clear = _fetch_user_tier.cache_clear # type: ignore[attr-defined]
get_user_tier.cache_delete = _fetch_user_tier.cache_delete # type: ignore[attr-defined]
async def set_user_tier(user_id: str, tier: SubscriptionTier) -> None:
"""Persist the user's rate-limit tier to the database.
Also invalidates the ``get_user_tier`` cache for this user so that
subsequent rate-limit checks immediately see the new tier.
Raises:
prisma.errors.RecordNotFoundError: If the user does not exist.
"""
await PrismaUser.prisma().update(
where={"id": user_id},
data={"subscriptionTier": tier.value},
)
# Invalidate cached tier so rate-limit checks pick up the change immediately.
get_user_tier.cache_delete(user_id) # type: ignore[attr-defined]
async def get_global_rate_limits(
user_id: str,
config_daily: int,
config_weekly: int,
) -> tuple[int, int]:
) -> tuple[int, int, SubscriptionTier]:
"""Resolve global rate limits from LaunchDarkly, falling back to config.
The base limits (from LD or config) are multiplied by the user's
tier multiplier so that higher tiers receive proportionally larger
allowances.
Args:
user_id: User ID for LD flag evaluation context.
config_daily: Fallback daily limit from ChatConfig.
config_weekly: Fallback weekly limit from ChatConfig.
Returns:
(daily_token_limit, weekly_token_limit) tuple.
(daily_token_limit, weekly_token_limit, tier) 3-tuple.
"""
# Lazy import to avoid circular dependency:
# rate_limit -> feature_flag -> settings -> ... -> rate_limit
@@ -378,7 +503,15 @@ async def get_global_rate_limits(
except (TypeError, ValueError):
logger.warning("Invalid LD value for weekly token limit: %r", weekly_raw)
weekly = config_weekly
return daily, weekly
# Apply tier multiplier
tier = await get_user_tier(user_id)
multiplier = TIER_MULTIPLIERS.get(tier, 1)
if multiplier != 1:
daily = daily * multiplier
weekly = weekly * multiplier
return daily, weekly, tier
async def reset_user_usage(user_id: str, *, reset_weekly: bool = False) -> None:

File diff suppressed because it is too large Load Diff

View File

@@ -9,7 +9,7 @@ import pytest
from fastapi import HTTPException
from backend.api.features.chat.routes import reset_copilot_usage
from backend.copilot.rate_limit import CoPilotUsageStatus, UsageWindow
from backend.copilot.rate_limit import CoPilotUsageStatus, SubscriptionTier, UsageWindow
from backend.util.exceptions import InsufficientBalanceError
@@ -53,6 +53,18 @@ def _mock_settings(enable_credit: bool = True):
return mock
def _mock_rate_limits(
daily: int = 2_500_000,
weekly: int = 12_500_000,
tier: SubscriptionTier = SubscriptionTier.PRO,
):
"""Mock get_global_rate_limits to return fixed limits (no tier multiplier)."""
return patch(
f"{_MODULE}.get_global_rate_limits",
AsyncMock(return_value=(daily, weekly, tier)),
)
@pytest.mark.asyncio
class TestResetCopilotUsage:
async def test_feature_disabled_returns_400(self):
@@ -70,10 +82,7 @@ class TestResetCopilotUsage:
with (
patch(f"{_MODULE}.config", _make_config(daily_token_limit=0)),
patch(f"{_MODULE}.settings", _mock_settings()),
patch(
f"{_MODULE}.get_global_rate_limits",
AsyncMock(return_value=(0, 12_500_000)),
),
_mock_rate_limits(daily=0),
):
with pytest.raises(HTTPException) as exc_info:
await reset_copilot_usage(user_id="user-1")
@@ -87,10 +96,7 @@ class TestResetCopilotUsage:
with (
patch(f"{_MODULE}.config", cfg),
patch(f"{_MODULE}.settings", _mock_settings()),
patch(
f"{_MODULE}.get_global_rate_limits",
AsyncMock(return_value=(2_500_000, 12_500_000)),
),
_mock_rate_limits(),
patch(f"{_MODULE}.get_daily_reset_count", AsyncMock(return_value=0)),
patch(f"{_MODULE}.acquire_reset_lock", AsyncMock(return_value=True)),
patch(f"{_MODULE}.release_reset_lock", AsyncMock()) as mock_release,
@@ -120,10 +126,7 @@ class TestResetCopilotUsage:
with (
patch(f"{_MODULE}.config", cfg),
patch(f"{_MODULE}.settings", _mock_settings()),
patch(
f"{_MODULE}.get_global_rate_limits",
AsyncMock(return_value=(2_500_000, 12_500_000)),
),
_mock_rate_limits(),
patch(f"{_MODULE}.get_daily_reset_count", AsyncMock(return_value=0)),
patch(f"{_MODULE}.acquire_reset_lock", AsyncMock(return_value=True)),
patch(f"{_MODULE}.release_reset_lock", AsyncMock()) as mock_release,
@@ -153,10 +156,7 @@ class TestResetCopilotUsage:
with (
patch(f"{_MODULE}.config", cfg),
patch(f"{_MODULE}.settings", _mock_settings()),
patch(
f"{_MODULE}.get_global_rate_limits",
AsyncMock(return_value=(2_500_000, 12_500_000)),
),
_mock_rate_limits(),
patch(f"{_MODULE}.get_daily_reset_count", AsyncMock(return_value=0)),
patch(f"{_MODULE}.acquire_reset_lock", AsyncMock(return_value=True)),
patch(f"{_MODULE}.release_reset_lock", AsyncMock()),
@@ -187,10 +187,7 @@ class TestResetCopilotUsage:
with (
patch(f"{_MODULE}.config", cfg),
patch(f"{_MODULE}.settings", _mock_settings()),
patch(
f"{_MODULE}.get_global_rate_limits",
AsyncMock(return_value=(2_500_000, 12_500_000)),
),
_mock_rate_limits(),
patch(f"{_MODULE}.get_daily_reset_count", AsyncMock(return_value=3)),
):
with pytest.raises(HTTPException) as exc_info:
@@ -228,10 +225,7 @@ class TestResetCopilotUsage:
with (
patch(f"{_MODULE}.config", cfg),
patch(f"{_MODULE}.settings", _mock_settings()),
patch(
f"{_MODULE}.get_global_rate_limits",
AsyncMock(return_value=(2_500_000, 12_500_000)),
),
_mock_rate_limits(),
patch(f"{_MODULE}.get_daily_reset_count", AsyncMock(return_value=0)),
patch(f"{_MODULE}.acquire_reset_lock", AsyncMock(return_value=True)),
patch(f"{_MODULE}.release_reset_lock", AsyncMock()) as mock_release,
@@ -252,10 +246,7 @@ class TestResetCopilotUsage:
with (
patch(f"{_MODULE}.config", _make_config()),
patch(f"{_MODULE}.settings", _mock_settings()),
patch(
f"{_MODULE}.get_global_rate_limits",
AsyncMock(return_value=(2_500_000, 12_500_000)),
),
_mock_rate_limits(),
patch(f"{_MODULE}.get_daily_reset_count", AsyncMock(return_value=None)),
):
with pytest.raises(HTTPException) as exc_info:
@@ -273,10 +264,7 @@ class TestResetCopilotUsage:
with (
patch(f"{_MODULE}.config", cfg),
patch(f"{_MODULE}.settings", _mock_settings()),
patch(
f"{_MODULE}.get_global_rate_limits",
AsyncMock(return_value=(2_500_000, 12_500_000)),
),
_mock_rate_limits(),
patch(f"{_MODULE}.get_daily_reset_count", AsyncMock(return_value=0)),
patch(f"{_MODULE}.acquire_reset_lock", AsyncMock(return_value=True)),
patch(f"{_MODULE}.release_reset_lock", AsyncMock()),
@@ -307,10 +295,7 @@ class TestResetCopilotUsage:
with (
patch(f"{_MODULE}.config", cfg),
patch(f"{_MODULE}.settings", _mock_settings()),
patch(
f"{_MODULE}.get_global_rate_limits",
AsyncMock(return_value=(2_500_000, 12_500_000)),
),
_mock_rate_limits(),
patch(f"{_MODULE}.get_daily_reset_count", AsyncMock(return_value=0)),
patch(f"{_MODULE}.acquire_reset_lock", AsyncMock(return_value=True)),
patch(f"{_MODULE}.release_reset_lock", AsyncMock()),

View File

@@ -53,6 +53,12 @@ Steps:
or fix manually based on the error descriptions. Iterate until valid.
8. **Save**: Call `create_agent` (new) or `edit_agent` (existing) with
the final `agent_json`
8. **Dry-run**: ALWAYS call `run_agent` with `dry_run=True` and
`wait_for_result=120` to verify the agent works end-to-end.
9. **Inspect & fix**: Check the dry-run output for errors. If issues are
found, call `edit_agent` to fix and dry-run again. Repeat until the
simulation passes or the problems are clearly unfixable.
See "REQUIRED: Dry-Run Verification Loop" section below for details.
### Agent JSON Structure
@@ -246,19 +252,51 @@ call in a loop until the task is complete:
Regular blocks work exactly like sub-agents as tools — wire each input
field from `source_name: "tools"` on the Orchestrator side.
### Testing with Dry Run
### REQUIRED: Dry-Run Verification Loop (create -> dry-run -> fix)
After saving an agent, suggest a dry run to validate wiring without consuming
real API calls, credentials, or credits:
After creating or editing an agent, you MUST dry-run it before telling the
user the agent is ready. NEVER skip this step.
1. **Run**: Call `run_agent` or `run_block` with `dry_run=True` and provide
sample inputs. This executes the graph with mock outputs, verifying that
links resolve correctly and required inputs are satisfied.
2. **Check results**: Call `view_agent_output` with `show_execution_details=True`
to inspect the full node-by-node execution trace. This shows what each node
received as input and produced as output, making it easy to spot wiring issues.
3. **Iterate**: If the dry run reveals wiring issues or missing inputs, fix
the agent JSON and re-save before suggesting a real execution.
#### Step-by-step workflow
1. **Create/Edit**: Call `create_agent` or `edit_agent` to save the agent.
2. **Dry-run**: Call `run_agent` with `dry_run=True`, `wait_for_result=120`,
and realistic sample inputs that exercise every path in the agent. This
simulates execution using an LLM for each block — no real API calls,
credentials, or credits are consumed.
3. **Inspect output**: Examine the dry-run result for problems. If
`wait_for_result` returns only a summary, call
`view_agent_output(execution_id=..., show_execution_details=True)` to
see the full node-by-node execution trace. Look for:
- **Errors / failed nodes** — a node raised an exception or returned an
error status. Common causes: wrong `source_name`/`sink_name` in links,
missing `input_default` values, or referencing a nonexistent block output.
- **Null / empty outputs** — data did not flow through a link. Verify that
`source_name` and `sink_name` match the block schemas exactly (case-
sensitive, including nested `_#_` notation).
- **Nodes that never executed** — the node was not reached. Likely a
missing or broken link from an upstream node.
- **Unexpected values** — data arrived but in the wrong type or
structure. Check type compatibility between linked ports.
4. **Fix**: If any issues are found, call `edit_agent` with the corrected
agent JSON, then go back to step 2.
5. **Repeat**: Continue the dry-run -> fix cycle until the simulation passes
or the problems are clearly unfixable. If you stop making progress,
report the remaining issues to the user and ask for guidance.
#### Good vs bad dry-run output
**Good output** (agent is ready):
- All nodes executed successfully (no errors in the execution trace)
- Data flows through every link with non-null, correctly-typed values
- The final `AgentOutputBlock` contains a meaningful result
- Status is `COMPLETED`
**Bad output** (needs fixing):
- Status is `FAILED` — check the error message for the failing node
- An output node received `null` — trace back to find the broken link
- A node received data in the wrong format (e.g. string where list expected)
- Nodes downstream of a failing node were skipped entirely
**Special block behaviour in dry-run mode:**
- **OrchestratorBlock** and **AgentExecutorBlock** execute for real so the

View File

@@ -28,13 +28,12 @@ Each result includes a `remotes` array with the exact server URL to use.
### Important: Check blocks first
Before using `run_mcp_tool`, always check if the platform already has blocks for the service
using `find_block`. The platform has hundreds of built-in blocks (Google Sheets, Google Docs,
Google Calendar, Gmail, etc.) that work without MCP setup.
Always follow the **Tool Discovery Priority** described in the tool notes:
call `find_block` before resorting to `run_mcp_tool`.
Only use `run_mcp_tool` when:
- The service is in the known hosted MCP servers list above, OR
- You searched `find_block` first and found no matching blocks
- You searched `find_block` first and found no matching blocks, AND
- The service is in the known hosted MCP servers list above or found via the registry API
**Never guess or construct MCP server URLs.** Only use URLs from the known servers list above
or from the `remotes[].url` field in MCP registry search results.

View File

@@ -8,20 +8,19 @@ from uuid import uuid4
import pytest
from backend.util import json
from backend.util.prompt import CompressResult
from .conftest import build_test_transcript as _build_transcript
from .service import _friendly_error_text, _is_prompt_too_long
from .transcript import (
from backend.copilot.transcript import (
_flatten_assistant_content,
_flatten_tool_result_content,
_messages_to_transcript,
_run_compression,
_transcript_to_messages,
compact_transcript,
validate_transcript,
)
from backend.util import json
from backend.util.prompt import CompressResult
from .conftest import build_test_transcript as _build_transcript
from .service import _friendly_error_text, _is_prompt_too_long
from .transcript import compact_transcript, validate_transcript
# ---------------------------------------------------------------------------
# _flatten_assistant_content
@@ -403,7 +402,7 @@ class TestCompactTranscript:
},
)()
with patch(
"backend.copilot.sdk.transcript._run_compression",
"backend.copilot.transcript._run_compression",
new_callable=AsyncMock,
return_value=mock_result,
):
@@ -438,7 +437,7 @@ class TestCompactTranscript:
},
)()
with patch(
"backend.copilot.sdk.transcript._run_compression",
"backend.copilot.transcript._run_compression",
new_callable=AsyncMock,
return_value=mock_result,
):
@@ -462,7 +461,7 @@ class TestCompactTranscript:
]
)
with patch(
"backend.copilot.sdk.transcript._run_compression",
"backend.copilot.transcript._run_compression",
new_callable=AsyncMock,
side_effect=RuntimeError("LLM unavailable"),
):
@@ -568,11 +567,11 @@ class TestRunCompressionTimeout:
with (
patch(
"backend.copilot.sdk.transcript.get_openai_client",
"backend.copilot.transcript.get_openai_client",
return_value="fake-client",
),
patch(
"backend.copilot.sdk.transcript.compress_context",
"backend.copilot.transcript.compress_context",
side_effect=_mock_compress,
),
):
@@ -602,11 +601,11 @@ class TestRunCompressionTimeout:
with (
patch(
"backend.copilot.sdk.transcript.get_openai_client",
"backend.copilot.transcript.get_openai_client",
return_value=None,
),
patch(
"backend.copilot.sdk.transcript.compress_context",
"backend.copilot.transcript.compress_context",
new_callable=AsyncMock,
return_value=truncation_result,
) as mock_compress,

View File

@@ -26,18 +26,17 @@ from unittest.mock import AsyncMock, MagicMock, patch
import pytest
from backend.util import json
from .conftest import build_test_transcript as _build_transcript
from .service import _MAX_STREAM_ATTEMPTS, _reduce_context
from .transcript import (
from backend.copilot.transcript import (
_flatten_assistant_content,
_flatten_tool_result_content,
_messages_to_transcript,
_transcript_to_messages,
compact_transcript,
validate_transcript,
)
from backend.util import json
from .conftest import build_test_transcript as _build_transcript
from .service import _MAX_STREAM_ATTEMPTS, _reduce_context
from .transcript import compact_transcript, validate_transcript
from .transcript_builder import TranscriptBuilder
# ---------------------------------------------------------------------------
@@ -113,7 +112,7 @@ class TestScenarioCompactAndRetry:
)(),
),
patch(
"backend.copilot.sdk.transcript._run_compression",
"backend.copilot.transcript._run_compression",
new_callable=AsyncMock,
return_value=mock_result,
),
@@ -170,7 +169,7 @@ class TestScenarioCompactFailsFallback:
)(),
),
patch(
"backend.copilot.sdk.transcript._run_compression",
"backend.copilot.transcript._run_compression",
new_callable=AsyncMock,
side_effect=RuntimeError("LLM unavailable"),
),
@@ -261,7 +260,7 @@ class TestScenarioDoubleFailDBFallback:
)(),
),
patch(
"backend.copilot.sdk.transcript._run_compression",
"backend.copilot.transcript._run_compression",
new_callable=AsyncMock,
return_value=mock_result,
),
@@ -337,7 +336,7 @@ class TestScenarioCompactionIdentical:
)(),
),
patch(
"backend.copilot.sdk.transcript._run_compression",
"backend.copilot.transcript._run_compression",
new_callable=AsyncMock,
return_value=mock_result,
),
@@ -730,7 +729,7 @@ class TestRetryEdgeCases:
)(),
),
patch(
"backend.copilot.sdk.transcript._run_compression",
"backend.copilot.transcript._run_compression",
new_callable=AsyncMock,
return_value=mock_result,
),
@@ -841,7 +840,7 @@ class TestRetryStateReset:
)(),
),
patch(
"backend.copilot.sdk.transcript._run_compression",
"backend.copilot.transcript._run_compression",
new_callable=AsyncMock,
side_effect=RuntimeError("boom"),
),
@@ -1405,9 +1404,9 @@ class TestStreamChatCompletionRetryIntegration:
events.append(event)
# Should NOT retry — only 1 attempt for auth errors
assert attempt_count[0] == 1, (
f"Expected 1 attempt (no retry for auth error), " f"got {attempt_count[0]}"
)
assert (
attempt_count[0] == 1
), f"Expected 1 attempt (no retry for auth error), got {attempt_count[0]}"
errors = [e for e in events if isinstance(e, StreamError)]
assert errors, "Expected StreamError"
assert errors[0].code == "sdk_stream_error"

View File

@@ -29,16 +29,29 @@ from claude_agent_sdk import (
)
from langfuse import propagate_attributes
from langsmith.integrations.claude_agent_sdk import configure_claude_agent_sdk
from opentelemetry import trace as otel_trace
from pydantic import BaseModel
from backend.copilot.context import get_workspace_manager
from backend.copilot.permissions import apply_tool_permissions
from backend.copilot.rate_limit import get_user_tier
from backend.copilot.transcript import (
_run_compression,
cleanup_stale_project_dirs,
compact_transcript,
download_transcript,
read_compacted_entries,
upload_transcript,
validate_transcript,
write_transcript_to_tempfile,
)
from backend.copilot.transcript_builder import TranscriptBuilder
from backend.data.redis_client import get_redis_async
from backend.executor.cluster_lock import AsyncClusterLock
from backend.util.exceptions import NotFoundError
from backend.util.settings import Settings
from ..config import ChatConfig
from ..config import ChatConfig, CopilotMode
from ..constants import (
COPILOT_ERROR_PREFIX,
COPILOT_RETRYABLE_ERROR_PREFIX,
@@ -51,7 +64,7 @@ from ..model import (
ChatMessage,
ChatSession,
get_chat_session,
update_session_title,
maybe_append_user_message,
upsert_chat_session,
)
from ..prompting import get_sdk_supplement
@@ -70,11 +83,7 @@ from ..response_model import (
StreamToolOutputAvailable,
StreamUsage,
)
from ..service import (
_build_system_prompt,
_generate_session_title,
_is_langfuse_configured,
)
from ..service import _build_system_prompt, _is_langfuse_configured, _update_title_async
from ..token_tracking import persist_and_record_usage
from ..tools.e2b_sandbox import get_or_create_sandbox, pause_sandbox_direct
from ..tools.sandbox import WORKSPACE_PREFIX, make_session_path
@@ -92,17 +101,6 @@ from .tool_adapter import (
set_execution_context,
wait_for_stash,
)
from .transcript import (
_run_compression,
cleanup_stale_project_dirs,
compact_transcript,
download_transcript,
read_compacted_entries,
upload_transcript,
validate_transcript,
write_transcript_to_tempfile,
)
from .transcript_builder import TranscriptBuilder
logger = logging.getLogger(__name__)
config = ChatConfig()
@@ -129,6 +127,11 @@ _CIRCUIT_BREAKER_ERROR_MSG = (
"Try breaking your request into smaller parts."
)
# Idle timeout: abort the stream if no meaningful SDK message (only heartbeats)
# arrives for this many seconds. This catches hung tool calls (e.g. WebSearch
# hanging on a search provider that never responds).
_IDLE_TIMEOUT_SECONDS = 10 * 60 # 10 minutes
# Patterns that indicate the prompt/request exceeds the model's context limit.
# Matched case-insensitively against the full exception chain.
_PROMPT_TOO_LONG_PATTERNS: tuple[str, ...] = (
@@ -1271,6 +1274,8 @@ async def _run_stream_attempt(
await client.query(state.query_message, session_id=ctx.session_id)
state.transcript_builder.append_user(content=ctx.current_message)
_last_real_msg_time = time.monotonic()
async for sdk_msg in _iter_sdk_messages(client):
# Heartbeat sentinel — refresh lock and keep SSE alive
if sdk_msg is None:
@@ -1278,8 +1283,34 @@ async def _run_stream_attempt(
for ev in ctx.compaction.emit_start_if_ready():
yield ev
yield StreamHeartbeat()
# Idle timeout: if no real SDK message for too long, a tool
# call is likely hung (e.g. WebSearch provider not responding).
idle_seconds = time.monotonic() - _last_real_msg_time
if idle_seconds >= _IDLE_TIMEOUT_SECONDS:
logger.error(
"%s Idle timeout after %.0fs with no SDK message — "
"aborting stream (likely hung tool call)",
ctx.log_prefix,
idle_seconds,
)
stream_error_msg = (
"A tool call appears to be stuck "
"(no response for 10 minutes). "
"Please try again."
)
stream_error_code = "idle_timeout"
_append_error_marker(ctx.session, stream_error_msg, retryable=True)
yield StreamError(
errorText=stream_error_msg,
code=stream_error_code,
)
ended_with_stream_error = True
break
continue
_last_real_msg_time = time.monotonic()
logger.info(
"%s Received: %s %s (unresolved=%d, current=%d, resolved=%d)",
ctx.log_prefix,
@@ -1528,9 +1559,21 @@ async def _run_stream_attempt(
# --- Intermediate persistence ---
# Flush session messages to DB periodically so page reloads
# show progress during long-running turns.
#
# IMPORTANT: Skip the flush while tool calls are pending
# (tool_calls set on assistant but results not yet received).
# The DB save is append-only (uses start_sequence), so if we
# flush the assistant message before tool_calls are set on it
# (text and tool_use arrive as separate SDK events), the
# tool_calls update is lost — the next flush starts past it.
_msgs_since_flush += 1
now = time.monotonic()
if (
has_pending_tools = (
acc.has_appended_assistant
and acc.accumulated_tool_calls
and not acc.has_tool_results
)
if not has_pending_tools and (
_msgs_since_flush >= _FLUSH_MESSAGE_THRESHOLD
or (now - _last_flush_time) >= _FLUSH_INTERVAL_SECONDS
):
@@ -1630,6 +1673,7 @@ async def stream_chat_completion_sdk(
session: ChatSession | None = None,
file_ids: list[str] | None = None,
permissions: "CopilotPermissions | None" = None,
mode: CopilotMode | None = None,
**_kwargs: Any,
) -> AsyncIterator[StreamBaseResponse]:
"""Stream chat completion using Claude Agent SDK.
@@ -1638,7 +1682,10 @@ async def stream_chat_completion_sdk(
file_ids: Optional workspace file IDs attached to the user's message.
Images are embedded as vision content blocks; other files are
saved to the SDK working directory for the Read tool.
mode: Accepted for signature compatibility with the baseline path.
The SDK path does not currently branch on this value.
"""
_ = mode # SDK path ignores the requested mode.
if session is None:
session = await get_chat_session(session_id, user_id)
@@ -1669,19 +1716,12 @@ async def stream_chat_completion_sdk(
)
session.messages.pop()
# Append the new message to the session if it's not already there
new_message_role = "user" if is_user_message else "assistant"
if message and (
len(session.messages) == 0
or not (
session.messages[-1].role == new_message_role
and session.messages[-1].content == message
)
):
session.messages.append(ChatMessage(role=new_message_role, content=message))
if maybe_append_user_message(session, message, is_user_message):
if is_user_message:
track_user_message(
user_id=user_id, session_id=session_id, message_length=len(message)
user_id=user_id,
session_id=session_id,
message_length=len(message or ""),
)
# Structured log prefix: [SDK][<session>][T<turn>]
@@ -1946,15 +1986,20 @@ async def stream_chat_completion_sdk(
# langsmith tracing integration attaches them to every span. This
# is what Langfuse (or any OTEL backend) maps to its native
# user/session fields.
_user_tier = await get_user_tier(user_id) if user_id else None
_otel_metadata: dict[str, str] = {
"resume": str(use_resume),
"conversation_turn": str(turn),
}
if _user_tier:
_otel_metadata["subscription_tier"] = _user_tier.value
_otel_ctx = propagate_attributes(
user_id=user_id,
session_id=session_id,
trace_name="copilot-sdk",
tags=["sdk"],
metadata={
"resume": str(use_resume),
"conversation_turn": str(turn),
},
metadata=_otel_metadata,
)
_otel_ctx.__enter__()
@@ -2323,8 +2368,26 @@ async def stream_chat_completion_sdk(
raise
finally:
# --- Close OTEL context ---
# --- Close OTEL context (with cost attributes) ---
if _otel_ctx is not None:
try:
span = otel_trace.get_current_span()
if span and span.is_recording():
span.set_attribute("gen_ai.usage.prompt_tokens", turn_prompt_tokens)
span.set_attribute(
"gen_ai.usage.completion_tokens", turn_completion_tokens
)
span.set_attribute(
"gen_ai.usage.cache_read_tokens", turn_cache_read_tokens
)
span.set_attribute(
"gen_ai.usage.cache_creation_tokens",
turn_cache_creation_tokens,
)
if turn_cost_usd is not None:
span.set_attribute("gen_ai.usage.cost_usd", turn_cost_usd)
except Exception:
logger.debug("Failed to set OTEL cost attributes", exc_info=True)
try:
_otel_ctx.__exit__(*sys.exc_info())
except Exception:
@@ -2342,6 +2405,8 @@ async def stream_chat_completion_sdk(
cache_creation_tokens=turn_cache_creation_tokens,
log_prefix=log_prefix,
cost_usd=turn_cost_usd,
model=config.model,
provider="anthropic",
)
# --- Persist session messages ---
@@ -2446,18 +2511,3 @@ async def stream_chat_completion_sdk(
finally:
# Release stream lock to allow new streams for this session
await lock.release()
async def _update_title_async(
session_id: str, message: str, user_id: str | None = None
) -> None:
"""Background task to update session title."""
try:
title = await _generate_session_title(
message, user_id=user_id, session_id=session_id
)
if title and user_id:
await update_session_title(session_id, user_id, title, only_if_empty=True)
logger.debug("[SDK] Generated title for %s: %s", session_id, title)
except Exception as e:
logger.warning("[SDK] Failed to update session title: %s", e)

View File

@@ -27,20 +27,19 @@ from backend.copilot.response_model import (
StreamTextDelta,
StreamTextStart,
)
from backend.util import json
from .conftest import build_structured_transcript
from .response_adapter import SDKResponseAdapter
from .service import _format_sdk_content_blocks
from .transcript import (
from backend.copilot.transcript import (
_find_last_assistant_entry,
_flatten_assistant_content,
_messages_to_transcript,
_rechain_tail,
_transcript_to_messages,
compact_transcript,
validate_transcript,
)
from backend.util import json
from .conftest import build_structured_transcript
from .response_adapter import SDKResponseAdapter
from .service import _format_sdk_content_blocks
from .transcript import compact_transcript, validate_transcript
# ---------------------------------------------------------------------------
# Fixtures: realistic thinking block content
@@ -439,7 +438,7 @@ class TestCompactTranscriptThinkingBlocks:
},
)()
with patch(
"backend.copilot.sdk.transcript._run_compression",
"backend.copilot.transcript._run_compression",
new_callable=AsyncMock,
return_value=mock_result,
):
@@ -498,7 +497,7 @@ class TestCompactTranscriptThinkingBlocks:
)()
with patch(
"backend.copilot.sdk.transcript._run_compression",
"backend.copilot.transcript._run_compression",
side_effect=mock_compression,
):
await compact_transcript(transcript, model="test-model")
@@ -551,7 +550,7 @@ class TestCompactTranscriptThinkingBlocks:
},
)()
with patch(
"backend.copilot.sdk.transcript._run_compression",
"backend.copilot.transcript._run_compression",
new_callable=AsyncMock,
return_value=mock_result,
):
@@ -601,7 +600,7 @@ class TestCompactTranscriptThinkingBlocks:
},
)()
with patch(
"backend.copilot.sdk.transcript._run_compression",
"backend.copilot.transcript._run_compression",
new_callable=AsyncMock,
return_value=mock_result,
):
@@ -638,7 +637,7 @@ class TestCompactTranscriptThinkingBlocks:
},
)()
with patch(
"backend.copilot.sdk.transcript._run_compression",
"backend.copilot.transcript._run_compression",
new_callable=AsyncMock,
return_value=mock_result,
):
@@ -699,7 +698,7 @@ class TestCompactTranscriptThinkingBlocks:
},
)()
with patch(
"backend.copilot.sdk.transcript._run_compression",
"backend.copilot.transcript._run_compression",
new_callable=AsyncMock,
return_value=mock_result,
):

File diff suppressed because it is too large Load Diff

View File

@@ -1,235 +1,10 @@
"""Build complete JSONL transcript from SDK messages.
"""Re-export from shared ``backend.copilot.transcript_builder`` for backward compat.
The transcript represents the FULL active context at any point in time.
Each upload REPLACES the previous transcript atomically.
Flow:
Turn 1: Upload [msg1, msg2]
Turn 2: Download [msg1, msg2] → Upload [msg1, msg2, msg3, msg4] (REPLACE)
Turn 3: Download [msg1, msg2, msg3, msg4] → Upload [all messages] (REPLACE)
The transcript is never incremental - always the complete atomic state.
The canonical implementation now lives at ``backend.copilot.transcript_builder``
so both the SDK and baseline paths can import without cross-package
dependencies.
"""
import logging
from typing import Any
from uuid import uuid4
from backend.copilot.transcript_builder import TranscriptBuilder, TranscriptEntry
from pydantic import BaseModel
from backend.util import json
from .transcript import STRIPPABLE_TYPES
logger = logging.getLogger(__name__)
class TranscriptEntry(BaseModel):
"""Single transcript entry (user or assistant turn)."""
type: str
uuid: str
parentUuid: str | None
isCompactSummary: bool | None = None
message: dict[str, Any]
class TranscriptBuilder:
"""Build complete JSONL transcript from SDK messages.
This builder maintains the FULL conversation state, not incremental changes.
The output is always the complete active context.
"""
def __init__(self) -> None:
self._entries: list[TranscriptEntry] = []
self._last_uuid: str | None = None
def _last_is_assistant(self) -> bool:
return bool(self._entries) and self._entries[-1].type == "assistant"
def _last_message_id(self) -> str:
"""Return the message.id of the last entry, or '' if none."""
if self._entries:
return self._entries[-1].message.get("id", "")
return ""
@staticmethod
def _parse_entry(data: dict) -> TranscriptEntry | None:
"""Parse a single transcript entry, filtering strippable types.
Returns ``None`` for entries that should be skipped (strippable types
that are not compaction summaries).
"""
entry_type = data.get("type", "")
if entry_type in STRIPPABLE_TYPES and not data.get("isCompactSummary"):
return None
return TranscriptEntry(
type=entry_type,
uuid=data.get("uuid") or str(uuid4()),
parentUuid=data.get("parentUuid"),
isCompactSummary=data.get("isCompactSummary"),
message=data.get("message", {}),
)
def load_previous(self, content: str, log_prefix: str = "[Transcript]") -> None:
"""Load complete previous transcript.
This loads the FULL previous context. As new messages come in,
we append to this state. The final output is the complete context
(previous + new), not just the delta.
"""
if not content or not content.strip():
return
lines = content.strip().split("\n")
for line_num, line in enumerate(lines, 1):
if not line.strip():
continue
data = json.loads(line, fallback=None)
if data is None:
logger.warning(
"%s Failed to parse transcript line %d/%d",
log_prefix,
line_num,
len(lines),
)
continue
entry = self._parse_entry(data)
if entry is None:
continue
self._entries.append(entry)
self._last_uuid = entry.uuid
logger.info(
"%s Loaded %d entries from previous transcript (last_uuid=%s)",
log_prefix,
len(self._entries),
self._last_uuid[:12] if self._last_uuid else None,
)
def append_user(self, content: str | list[dict], uuid: str | None = None) -> None:
"""Append a user entry."""
msg_uuid = uuid or str(uuid4())
self._entries.append(
TranscriptEntry(
type="user",
uuid=msg_uuid,
parentUuid=self._last_uuid,
message={"role": "user", "content": content},
)
)
self._last_uuid = msg_uuid
def append_tool_result(self, tool_use_id: str, content: str) -> None:
"""Append a tool result as a user entry (one per tool call)."""
self.append_user(
content=[
{"type": "tool_result", "tool_use_id": tool_use_id, "content": content}
]
)
def append_assistant(
self,
content_blocks: list[dict],
model: str = "",
stop_reason: str | None = None,
) -> None:
"""Append an assistant entry.
Consecutive assistant entries automatically share the same message ID
so the CLI can merge them (thinking → text → tool_use) into a single
API message on ``--resume``. A new ID is assigned whenever an
assistant entry follows a non-assistant entry (user message or tool
result), because that marks the start of a new API response.
"""
message_id = (
self._last_message_id()
if self._last_is_assistant()
else f"msg_sdk_{uuid4().hex[:24]}"
)
msg_uuid = str(uuid4())
self._entries.append(
TranscriptEntry(
type="assistant",
uuid=msg_uuid,
parentUuid=self._last_uuid,
message={
"role": "assistant",
"model": model,
"id": message_id,
"type": "message",
"content": content_blocks,
"stop_reason": stop_reason,
"stop_sequence": None,
},
)
)
self._last_uuid = msg_uuid
def replace_entries(
self, compacted_entries: list[dict], log_prefix: str = "[Transcript]"
) -> None:
"""Replace all entries with compacted entries from the CLI session file.
Called after mid-stream compaction so TranscriptBuilder mirrors the
CLI's active context (compaction summary + post-compaction entries).
Builds the new list first and validates it's non-empty before swapping,
so corrupt input cannot wipe the conversation history.
"""
new_entries: list[TranscriptEntry] = []
for data in compacted_entries:
entry = self._parse_entry(data)
if entry is not None:
new_entries.append(entry)
if not new_entries:
logger.warning(
"%s replace_entries produced 0 entries from %d inputs, keeping old (%d entries)",
log_prefix,
len(compacted_entries),
len(self._entries),
)
return
old_count = len(self._entries)
self._entries = new_entries
self._last_uuid = new_entries[-1].uuid
logger.info(
"%s TranscriptBuilder compacted: %d entries -> %d entries",
log_prefix,
old_count,
len(self._entries),
)
def to_jsonl(self) -> str:
"""Export complete context as JSONL.
Consecutive assistant entries are kept separate to match the
native CLI format — the SDK merges them internally on resume.
Returns the FULL conversation state (all entries), not incremental.
This output REPLACES any previous transcript.
"""
if not self._entries:
return ""
lines = [entry.model_dump_json(exclude_none=True) for entry in self._entries]
return "\n".join(lines) + "\n"
@property
def entry_count(self) -> int:
"""Total number of entries in the complete context."""
return len(self._entries)
@property
def is_empty(self) -> bool:
"""Whether this builder has any entries."""
return len(self._entries) == 0
__all__ = ["TranscriptBuilder", "TranscriptEntry"]

View File

@@ -303,7 +303,7 @@ class TestDeleteTranscript:
mock_storage.delete = AsyncMock()
with patch(
"backend.copilot.sdk.transcript.get_workspace_storage",
"backend.copilot.transcript.get_workspace_storage",
new_callable=AsyncMock,
return_value=mock_storage,
):
@@ -323,7 +323,7 @@ class TestDeleteTranscript:
)
with patch(
"backend.copilot.sdk.transcript.get_workspace_storage",
"backend.copilot.transcript.get_workspace_storage",
new_callable=AsyncMock,
return_value=mock_storage,
):
@@ -341,7 +341,7 @@ class TestDeleteTranscript:
)
with patch(
"backend.copilot.sdk.transcript.get_workspace_storage",
"backend.copilot.transcript.get_workspace_storage",
new_callable=AsyncMock,
return_value=mock_storage,
):
@@ -850,7 +850,7 @@ class TestRunCompression:
@pytest.mark.asyncio
async def test_no_client_uses_truncation(self):
"""Path (a): ``get_openai_client()`` returns None → truncation only."""
from .transcript import _run_compression
from backend.copilot.transcript import _run_compression
truncation_result = self._make_compress_result(
True, [{"role": "user", "content": "truncated"}]
@@ -858,11 +858,11 @@ class TestRunCompression:
with (
patch(
"backend.copilot.sdk.transcript.get_openai_client",
"backend.copilot.transcript.get_openai_client",
return_value=None,
),
patch(
"backend.copilot.sdk.transcript.compress_context",
"backend.copilot.transcript.compress_context",
new_callable=AsyncMock,
return_value=truncation_result,
) as mock_compress,
@@ -885,7 +885,7 @@ class TestRunCompression:
@pytest.mark.asyncio
async def test_llm_success_returns_llm_result(self):
"""Path (b): ``get_openai_client()`` returns a client → LLM compresses."""
from .transcript import _run_compression
from backend.copilot.transcript import _run_compression
llm_result = self._make_compress_result(
True, [{"role": "user", "content": "LLM summary"}]
@@ -894,11 +894,11 @@ class TestRunCompression:
with (
patch(
"backend.copilot.sdk.transcript.get_openai_client",
"backend.copilot.transcript.get_openai_client",
return_value=mock_client,
),
patch(
"backend.copilot.sdk.transcript.compress_context",
"backend.copilot.transcript.compress_context",
new_callable=AsyncMock,
return_value=llm_result,
) as mock_compress,
@@ -916,7 +916,7 @@ class TestRunCompression:
@pytest.mark.asyncio
async def test_llm_failure_falls_back_to_truncation(self):
"""Path (c): LLM call raises → truncation fallback used instead."""
from .transcript import _run_compression
from backend.copilot.transcript import _run_compression
truncation_result = self._make_compress_result(
True, [{"role": "user", "content": "truncated fallback"}]
@@ -932,11 +932,11 @@ class TestRunCompression:
with (
patch(
"backend.copilot.sdk.transcript.get_openai_client",
"backend.copilot.transcript.get_openai_client",
return_value=mock_client,
),
patch(
"backend.copilot.sdk.transcript.compress_context",
"backend.copilot.transcript.compress_context",
side_effect=_compress_side_effect,
),
):
@@ -953,7 +953,7 @@ class TestRunCompression:
@pytest.mark.asyncio
async def test_llm_timeout_falls_back_to_truncation(self):
"""Path (d): LLM call exceeds timeout → truncation fallback used."""
from .transcript import _run_compression
from backend.copilot.transcript import _run_compression
truncation_result = self._make_compress_result(
True, [{"role": "user", "content": "truncated after timeout"}]
@@ -970,19 +970,19 @@ class TestRunCompression:
fake_client = MagicMock()
with (
patch(
"backend.copilot.sdk.transcript.get_openai_client",
"backend.copilot.transcript.get_openai_client",
return_value=fake_client,
),
patch(
"backend.copilot.sdk.transcript.compress_context",
"backend.copilot.transcript.compress_context",
side_effect=_compress_side_effect,
),
patch(
"backend.copilot.sdk.transcript._COMPACTION_TIMEOUT_SECONDS",
"backend.copilot.transcript._COMPACTION_TIMEOUT_SECONDS",
0.05,
),
patch(
"backend.copilot.sdk.transcript._TRUNCATION_TIMEOUT_SECONDS",
"backend.copilot.transcript._TRUNCATION_TIMEOUT_SECONDS",
5,
),
):
@@ -1007,7 +1007,7 @@ class TestCleanupStaleProjectDirs:
def test_removes_old_copilot_dirs(self, tmp_path, monkeypatch):
"""Directories matching copilot pattern older than threshold are removed."""
from backend.copilot.sdk.transcript import (
from backend.copilot.transcript import (
_STALE_PROJECT_DIR_SECONDS,
cleanup_stale_project_dirs,
)
@@ -1015,7 +1015,7 @@ class TestCleanupStaleProjectDirs:
projects_dir = tmp_path / "projects"
projects_dir.mkdir()
monkeypatch.setattr(
"backend.copilot.sdk.transcript._projects_base",
"backend.copilot.transcript._projects_base",
lambda: str(projects_dir),
)
@@ -1039,12 +1039,12 @@ class TestCleanupStaleProjectDirs:
def test_ignores_non_copilot_dirs(self, tmp_path, monkeypatch):
"""Directories not matching copilot pattern are left alone."""
from backend.copilot.sdk.transcript import cleanup_stale_project_dirs
from backend.copilot.transcript import cleanup_stale_project_dirs
projects_dir = tmp_path / "projects"
projects_dir.mkdir()
monkeypatch.setattr(
"backend.copilot.sdk.transcript._projects_base",
"backend.copilot.transcript._projects_base",
lambda: str(projects_dir),
)
@@ -1062,7 +1062,7 @@ class TestCleanupStaleProjectDirs:
def test_ttl_boundary_not_removed(self, tmp_path, monkeypatch):
"""A directory exactly at the TTL boundary should NOT be removed."""
from backend.copilot.sdk.transcript import (
from backend.copilot.transcript import (
_STALE_PROJECT_DIR_SECONDS,
cleanup_stale_project_dirs,
)
@@ -1070,7 +1070,7 @@ class TestCleanupStaleProjectDirs:
projects_dir = tmp_path / "projects"
projects_dir.mkdir()
monkeypatch.setattr(
"backend.copilot.sdk.transcript._projects_base",
"backend.copilot.transcript._projects_base",
lambda: str(projects_dir),
)
@@ -1088,7 +1088,7 @@ class TestCleanupStaleProjectDirs:
def test_skips_non_directory_entries(self, tmp_path, monkeypatch):
"""Regular files matching the copilot pattern are not removed."""
from backend.copilot.sdk.transcript import (
from backend.copilot.transcript import (
_STALE_PROJECT_DIR_SECONDS,
cleanup_stale_project_dirs,
)
@@ -1096,7 +1096,7 @@ class TestCleanupStaleProjectDirs:
projects_dir = tmp_path / "projects"
projects_dir.mkdir()
monkeypatch.setattr(
"backend.copilot.sdk.transcript._projects_base",
"backend.copilot.transcript._projects_base",
lambda: str(projects_dir),
)
@@ -1114,11 +1114,11 @@ class TestCleanupStaleProjectDirs:
def test_missing_base_dir_returns_zero(self, tmp_path, monkeypatch):
"""If the projects base directory doesn't exist, return 0 gracefully."""
from backend.copilot.sdk.transcript import cleanup_stale_project_dirs
from backend.copilot.transcript import cleanup_stale_project_dirs
nonexistent = str(tmp_path / "does-not-exist" / "projects")
monkeypatch.setattr(
"backend.copilot.sdk.transcript._projects_base",
"backend.copilot.transcript._projects_base",
lambda: nonexistent,
)
@@ -1129,7 +1129,7 @@ class TestCleanupStaleProjectDirs:
"""When encoded_cwd is supplied only that directory is swept."""
import time
from backend.copilot.sdk.transcript import (
from backend.copilot.transcript import (
_STALE_PROJECT_DIR_SECONDS,
cleanup_stale_project_dirs,
)
@@ -1137,7 +1137,7 @@ class TestCleanupStaleProjectDirs:
projects_dir = tmp_path / "projects"
projects_dir.mkdir()
monkeypatch.setattr(
"backend.copilot.sdk.transcript._projects_base",
"backend.copilot.transcript._projects_base",
lambda: str(projects_dir),
)
@@ -1160,12 +1160,12 @@ class TestCleanupStaleProjectDirs:
def test_scoped_fresh_dir_not_removed(self, tmp_path, monkeypatch):
"""Scoped sweep leaves a fresh directory alone."""
from backend.copilot.sdk.transcript import cleanup_stale_project_dirs
from backend.copilot.transcript import cleanup_stale_project_dirs
projects_dir = tmp_path / "projects"
projects_dir.mkdir()
monkeypatch.setattr(
"backend.copilot.sdk.transcript._projects_base",
"backend.copilot.transcript._projects_base",
lambda: str(projects_dir),
)
@@ -1181,7 +1181,7 @@ class TestCleanupStaleProjectDirs:
"""Scoped sweep refuses to remove a non-copilot directory."""
import time
from backend.copilot.sdk.transcript import (
from backend.copilot.transcript import (
_STALE_PROJECT_DIR_SECONDS,
cleanup_stale_project_dirs,
)
@@ -1189,7 +1189,7 @@ class TestCleanupStaleProjectDirs:
projects_dir = tmp_path / "projects"
projects_dir.mkdir()
monkeypatch.setattr(
"backend.copilot.sdk.transcript._projects_base",
"backend.copilot.transcript._projects_base",
lambda: str(projects_dir),
)

View File

@@ -22,7 +22,12 @@ from backend.util.exceptions import NotAuthorizedError, NotFoundError
from backend.util.settings import AppEnvironment, Settings
from .config import ChatConfig
from .model import ChatSessionInfo, get_chat_session, upsert_chat_session
from .model import (
ChatSessionInfo,
get_chat_session,
update_session_title,
upsert_chat_session,
)
logger = logging.getLogger(__name__)
@@ -202,6 +207,22 @@ async def _generate_session_title(
return None
async def _update_title_async(
session_id: str, message: str, user_id: str | None = None
) -> None:
"""Generate and persist a session title in the background.
Shared by both the SDK and baseline execution paths.
"""
try:
title = await _generate_session_title(message, user_id, session_id)
if title and user_id:
await update_session_title(session_id, user_id, title, only_if_empty=True)
logger.debug("Generated title for session %s", session_id)
except Exception as e:
logger.warning("Failed to update session title for %s: %s", session_id, e)
async def assign_user_to_session(
session_id: str,
user_id: str,

View File

@@ -7,7 +7,7 @@ import pytest
from .model import create_chat_session, get_chat_session, upsert_chat_session
from .response_model import StreamError, StreamTextDelta
from .sdk import service as sdk_service
from .sdk.transcript import download_transcript
from .transcript import download_transcript
logger = logging.getLogger(__name__)

View File

@@ -4,17 +4,85 @@ Both the baseline (OpenRouter) and SDK (Anthropic) service layers need to:
1. Append a ``Usage`` record to the session.
2. Log the turn's token counts.
3. Record weighted usage in Redis for rate-limiting.
4. Write a PlatformCostLog entry for admin cost tracking.
This module extracts that common logic so both paths stay in sync.
"""
import asyncio
import logging
import math
import re
import threading
from backend.data.db_accessors import platform_cost_db
from backend.data.platform_cost import PlatformCostEntry, usd_to_microdollars
from .model import ChatSession, Usage
from .rate_limit import record_token_usage
logger = logging.getLogger(__name__)
# Hold strong references to in-flight cost log tasks to prevent GC.
_pending_log_tasks: set[asyncio.Task[None]] = set()
# Guards all reads and writes to _pending_log_tasks. Done callbacks (discard)
# fire from the event loop thread; drain_pending_cost_logs iterates the set
# from any caller — the lock prevents RuntimeError from concurrent modification.
_pending_log_tasks_lock = threading.Lock()
# Per-loop semaphores: asyncio.Semaphore is not thread-safe and must not be
# shared across event loops running in different threads.
_log_semaphores: dict[asyncio.AbstractEventLoop, asyncio.Semaphore] = {}
def _get_log_semaphore() -> asyncio.Semaphore:
loop = asyncio.get_running_loop()
sem = _log_semaphores.get(loop)
if sem is None:
sem = asyncio.Semaphore(50)
_log_semaphores[loop] = sem
return sem
def _schedule_cost_log(entry: PlatformCostEntry) -> None:
"""Schedule a fire-and-forget cost log via DatabaseManagerAsyncClient RPC."""
async def _safe_log() -> None:
async with _get_log_semaphore():
try:
await platform_cost_db().log_platform_cost(entry)
except Exception:
logger.exception(
"Failed to log platform cost for user=%s provider=%s block=%s",
entry.user_id,
entry.provider,
entry.block_name,
)
task = asyncio.create_task(_safe_log())
with _pending_log_tasks_lock:
_pending_log_tasks.add(task)
def _remove(t: asyncio.Task[None]) -> None:
with _pending_log_tasks_lock:
_pending_log_tasks.discard(t)
task.add_done_callback(_remove)
# Identifiers used by PlatformCostLog for copilot turns (not tied to a real
# block/credential in the block_cost_config or credentials_store tables).
COPILOT_BLOCK_ID = "copilot"
COPILOT_CREDENTIAL_ID = "copilot_system"
def _copilot_block_name(log_prefix: str) -> str:
"""Extract stable block_name from ``"[SDK][session][T1]"`` -> ``"copilot:SDK"``."""
match = re.search(r"\[([A-Za-z][A-Za-z0-9_]*)\]", log_prefix)
if match:
return f"{COPILOT_BLOCK_ID}:{match.group(1)}"
tag = log_prefix.strip(" []")
return f"{COPILOT_BLOCK_ID}:{tag}" if tag else COPILOT_BLOCK_ID
async def persist_and_record_usage(
*,
@@ -26,6 +94,8 @@ async def persist_and_record_usage(
cache_creation_tokens: int = 0,
log_prefix: str = "",
cost_usd: float | str | None = None,
model: str | None = None,
provider: str = "open_router",
) -> int:
"""Persist token usage to session and record for rate limiting.
@@ -38,6 +108,7 @@ async def persist_and_record_usage(
cache_creation_tokens: Tokens written to prompt cache (Anthropic only).
log_prefix: Prefix for log messages (e.g. "[SDK]", "[Baseline]").
cost_usd: Optional cost for logging (float from SDK, str otherwise).
provider: Cost provider name (e.g. "anthropic", "open_router").
Returns:
The computed total_tokens (prompt + completion; cache excluded).
@@ -47,12 +118,13 @@ async def persist_and_record_usage(
cache_read_tokens = max(0, cache_read_tokens)
cache_creation_tokens = max(0, cache_creation_tokens)
if (
no_tokens = (
prompt_tokens <= 0
and completion_tokens <= 0
and cache_read_tokens <= 0
and cache_creation_tokens <= 0
):
)
if no_tokens and cost_usd is None:
return 0
# total_tokens = prompt + completion. Cache tokens are tracked
@@ -73,14 +145,14 @@ async def persist_and_record_usage(
if cache_read_tokens or cache_creation_tokens:
logger.info(
f"{log_prefix} Turn usage: uncached={prompt_tokens}, "
f"cache_read={cache_read_tokens}, cache_create={cache_creation_tokens}, "
f"output={completion_tokens}, total={total_tokens}, cost_usd={cost_usd}"
f"{log_prefix} Turn usage: uncached={prompt_tokens}, cache_read={cache_read_tokens},"
f" cache_create={cache_creation_tokens}, output={completion_tokens},"
f" total={total_tokens}, cost_usd={cost_usd}"
)
else:
logger.info(
f"{log_prefix} Turn usage: prompt={prompt_tokens}, "
f"completion={completion_tokens}, total={total_tokens}"
f"{log_prefix} Turn usage: prompt={prompt_tokens}, completion={completion_tokens},"
f" total={total_tokens}"
)
if user_id:
@@ -93,6 +165,54 @@ async def persist_and_record_usage(
cache_creation_tokens=cache_creation_tokens,
)
except Exception as usage_err:
logger.warning(f"{log_prefix} Failed to record token usage: {usage_err}")
logger.warning("%s Failed to record token usage: %s", log_prefix, usage_err)
# Log to PlatformCostLog for admin cost dashboard.
# Include entries where cost_usd is set even if token count is 0
# (e.g. fully-cached Anthropic responses where only cache tokens
# accumulate a charge without incrementing total_tokens).
if user_id and (total_tokens > 0 or cost_usd is not None):
cost_float = None
if cost_usd is not None:
try:
val = float(cost_usd)
if math.isfinite(val) and val >= 0:
cost_float = val
except (ValueError, TypeError):
pass
cost_microdollars = usd_to_microdollars(cost_float)
session_id = session.session_id if session else None
if cost_float is not None:
tracking_type = "cost_usd"
tracking_amount = cost_float
else:
tracking_type = "tokens"
tracking_amount = total_tokens
_schedule_cost_log(
PlatformCostEntry(
user_id=user_id,
graph_exec_id=session_id,
block_id=COPILOT_BLOCK_ID,
block_name=_copilot_block_name(log_prefix),
provider=provider,
credential_id=COPILOT_CREDENTIAL_ID,
cost_microdollars=cost_microdollars,
input_tokens=prompt_tokens,
output_tokens=completion_tokens,
model=model,
tracking_type=tracking_type,
tracking_amount=tracking_amount,
metadata={
"tracking_type": tracking_type,
"tracking_amount": tracking_amount,
"cache_read_tokens": cache_read_tokens,
"cache_creation_tokens": cache_creation_tokens,
"source": "copilot",
},
)
)
return total_tokens

View File

@@ -4,6 +4,7 @@ Covers both the baseline (prompt+completion only) and SDK (with cache breakdown)
calling conventions, session persistence, and rate-limit recording.
"""
import asyncio
from datetime import UTC, datetime
from unittest.mock import AsyncMock, patch
@@ -279,3 +280,260 @@ class TestRateLimitRecording:
completion_tokens=0,
)
mock_record.assert_not_awaited()
# ---------------------------------------------------------------------------
# PlatformCostLog integration
# ---------------------------------------------------------------------------
class TestPlatformCostLogging:
@pytest.mark.asyncio
async def test_logs_cost_entry_with_cost_usd(self):
"""When cost_usd is provided, tracking_type should be 'cost_usd'."""
mock_log = AsyncMock()
with (
patch(
"backend.copilot.token_tracking.record_token_usage",
new_callable=AsyncMock,
),
patch(
"backend.copilot.token_tracking.platform_cost_db",
return_value=type(
"FakePlatformCostDb", (), {"log_platform_cost": mock_log}
)(),
),
):
await persist_and_record_usage(
session=_make_session(),
user_id="user-cost",
prompt_tokens=200,
completion_tokens=100,
cost_usd=0.005,
model="gpt-4",
provider="anthropic",
log_prefix="[SDK]",
)
await asyncio.sleep(0)
mock_log.assert_awaited_once()
entry = mock_log.call_args[0][0]
assert entry.user_id == "user-cost"
assert entry.provider == "anthropic"
assert entry.model == "gpt-4"
assert entry.cost_microdollars == 5000
assert entry.input_tokens == 200
assert entry.output_tokens == 100
assert entry.tracking_type == "cost_usd"
assert entry.metadata["tracking_type"] == "cost_usd"
assert entry.metadata["tracking_amount"] == 0.005
assert entry.block_name == "copilot:SDK"
assert entry.graph_exec_id == "sess-test"
@pytest.mark.asyncio
async def test_logs_cost_entry_without_cost_usd(self):
"""When cost_usd is None, tracking_type should be 'tokens'."""
mock_log = AsyncMock()
with (
patch(
"backend.copilot.token_tracking.record_token_usage",
new_callable=AsyncMock,
),
patch(
"backend.copilot.token_tracking.platform_cost_db",
return_value=type(
"FakePlatformCostDb", (), {"log_platform_cost": mock_log}
)(),
),
):
await persist_and_record_usage(
session=None,
user_id="user-tokens",
prompt_tokens=100,
completion_tokens=50,
log_prefix="[Baseline]",
)
await asyncio.sleep(0)
mock_log.assert_awaited_once()
entry = mock_log.call_args[0][0]
assert entry.cost_microdollars is None
assert entry.tracking_type == "tokens"
assert entry.metadata["tracking_type"] == "tokens"
assert entry.metadata["tracking_amount"] == 150
assert entry.graph_exec_id is None
assert entry.block_name == "copilot:Baseline"
@pytest.mark.asyncio
async def test_skips_cost_log_when_no_user_id(self):
"""No PlatformCostLog entry when user_id is None."""
mock_log = AsyncMock()
with (
patch(
"backend.copilot.token_tracking.record_token_usage",
new_callable=AsyncMock,
),
patch(
"backend.copilot.token_tracking.platform_cost_db",
return_value=type(
"FakePlatformCostDb", (), {"log_platform_cost": mock_log}
)(),
),
):
await persist_and_record_usage(
session=None,
user_id=None,
prompt_tokens=100,
completion_tokens=50,
)
await asyncio.sleep(0)
mock_log.assert_not_awaited()
@pytest.mark.asyncio
async def test_cost_usd_invalid_string_falls_back_to_tokens(self):
"""Invalid cost_usd string should fall back to tokens tracking."""
mock_log = AsyncMock()
with (
patch(
"backend.copilot.token_tracking.record_token_usage",
new_callable=AsyncMock,
),
patch(
"backend.copilot.token_tracking.platform_cost_db",
return_value=type(
"FakePlatformCostDb", (), {"log_platform_cost": mock_log}
)(),
),
):
await persist_and_record_usage(
session=None,
user_id="user-invalid",
prompt_tokens=100,
completion_tokens=50,
cost_usd="not-a-number",
)
await asyncio.sleep(0)
mock_log.assert_awaited_once()
entry = mock_log.call_args[0][0]
assert entry.cost_microdollars is None
assert entry.metadata["tracking_type"] == "tokens"
@pytest.mark.asyncio
async def test_cost_usd_string_number_is_parsed(self):
"""String-encoded cost_usd (e.g. from OpenRouter) should be parsed."""
mock_log = AsyncMock()
with (
patch(
"backend.copilot.token_tracking.record_token_usage",
new_callable=AsyncMock,
),
patch(
"backend.copilot.token_tracking.platform_cost_db",
return_value=type(
"FakePlatformCostDb", (), {"log_platform_cost": mock_log}
)(),
),
):
await persist_and_record_usage(
session=None,
user_id="user-str",
prompt_tokens=100,
completion_tokens=50,
cost_usd="0.01",
)
await asyncio.sleep(0)
mock_log.assert_awaited_once()
entry = mock_log.call_args[0][0]
assert entry.cost_microdollars == 10_000
assert entry.metadata["tracking_type"] == "cost_usd"
@pytest.mark.asyncio
async def test_empty_log_prefix_produces_copilot_block_name(self):
"""Empty log_prefix results in block_name='copilot'."""
mock_log = AsyncMock()
with (
patch(
"backend.copilot.token_tracking.record_token_usage",
new_callable=AsyncMock,
),
patch(
"backend.copilot.token_tracking.platform_cost_db",
return_value=type(
"FakePlatformCostDb", (), {"log_platform_cost": mock_log}
)(),
),
):
await persist_and_record_usage(
session=None,
user_id="user-empty",
prompt_tokens=10,
completion_tokens=5,
log_prefix="",
)
await asyncio.sleep(0)
entry = mock_log.call_args[0][0]
assert entry.block_name == "copilot"
@pytest.mark.asyncio
async def test_cache_tokens_included_in_metadata(self):
"""Cache token counts should be present in the metadata."""
mock_log = AsyncMock()
with (
patch(
"backend.copilot.token_tracking.record_token_usage",
new_callable=AsyncMock,
),
patch(
"backend.copilot.token_tracking.platform_cost_db",
return_value=type(
"FakePlatformCostDb", (), {"log_platform_cost": mock_log}
)(),
),
):
await persist_and_record_usage(
session=None,
user_id="user-cache",
prompt_tokens=100,
completion_tokens=50,
cache_read_tokens=5000,
cache_creation_tokens=300,
)
await asyncio.sleep(0)
entry = mock_log.call_args[0][0]
assert entry.metadata["cache_read_tokens"] == 5000
assert entry.metadata["cache_creation_tokens"] == 300
assert entry.metadata["source"] == "copilot"
@pytest.mark.asyncio
async def test_logs_cost_only_when_tokens_zero(self):
"""Zero prompt+completion tokens with cost_usd set still logs the entry."""
mock_log = AsyncMock()
with (
patch(
"backend.copilot.token_tracking.record_token_usage",
new_callable=AsyncMock,
),
patch(
"backend.copilot.token_tracking.platform_cost_db",
return_value=type(
"FakePlatformCostDb", (), {"log_platform_cost": mock_log}
)(),
),
):
await persist_and_record_usage(
session=None,
user_id="user-cached",
prompt_tokens=0,
completion_tokens=0,
cost_usd=0.005,
model="claude-3-5-sonnet",
provider="anthropic",
log_prefix="[SDK]",
)
await asyncio.sleep(0)
# Guard: total_tokens == 0 but cost_usd is set — must still log
mock_log.assert_awaited_once()
entry = mock_log.call_args[0][0]
assert entry.user_id == "user-cached"
assert entry.tracking_type == "cost_usd"
assert entry.cost_microdollars == 5000
assert entry.input_tokens == 0
assert entry.output_tokens == 0

View File

@@ -33,12 +33,23 @@ _GET_CURRENT_DATE_BLOCK_ID = "b29c1b50-5d0e-4d9f-8f9d-1b0e6fcbf0b1"
_GMAIL_SEND_BLOCK_ID = "6c27abc2-e51d-499e-a85f-5a0041ba94f0"
_TEXT_REPLACE_BLOCK_ID = "7e7c87ab-3469-4bcc-9abe-67705091b713"
# Default OrchestratorBlock model/mode — kept in sync with ChatConfig.model.
# ChatConfig uses the OpenRouter format ("anthropic/claude-opus-4.6");
# OrchestratorBlock uses the native Anthropic model name.
ORCHESTRATOR_DEFAULT_MODEL = "claude-opus-4-6"
ORCHESTRATOR_DEFAULT_EXECUTION_MODE = "extended_thinking"
# Defaults applied to OrchestratorBlock nodes by the fixer.
_SDM_DEFAULTS: dict[str, int | bool] = {
# execution_mode and model match the copilot's default (extended thinking
# with Opus) so generated agents inherit the same reasoning capabilities.
# If the user explicitly sets these fields, the fixer won't override them.
_SDM_DEFAULTS: dict[str, int | bool | str] = {
"agent_mode_max_iterations": 10,
"conversation_compaction": True,
"retry": 3,
"multiple_tool_calls": False,
"execution_mode": ORCHESTRATOR_DEFAULT_EXECUTION_MODE,
"model": ORCHESTRATOR_DEFAULT_MODEL,
}
@@ -879,6 +890,12 @@ class AgentFixer:
)
if is_ai_block:
# Skip AI blocks that don't expose a "model" input property
# (some AI-category blocks have no model selector at all).
input_properties = block.get("inputSchema", {}).get("properties", {})
if "model" not in input_properties:
continue
node_id = node.get("id")
input_default = node.get("input_default", {})
current_model = input_default.get("model")
@@ -887,9 +904,7 @@ class AgentFixer:
# Blocks with a block-specific enum on the model field (e.g.
# PerplexityBlock) use their own enum values; others use the
# generic set.
model_schema = (
block.get("inputSchema", {}).get("properties", {}).get("model", {})
)
model_schema = input_properties.get("model", {})
block_model_enum = model_schema.get("enum")
if block_model_enum:
@@ -1649,6 +1664,8 @@ class AgentFixer:
2. ``conversation_compaction`` defaults to ``True``
3. ``retry`` defaults to ``3``
4. ``multiple_tool_calls`` defaults to ``False``
5. ``execution_mode`` defaults to ``"extended_thinking"``
6. ``model`` defaults to ``"claude-opus-4-6"``
Args:
agent: The agent dictionary to fix
@@ -1748,6 +1765,12 @@ class AgentFixer:
agent = self.fix_node_x_coordinates(agent, node_lookup=node_lookup)
agent = self.fix_getcurrentdate_offset(agent)
# Apply OrchestratorBlock defaults BEFORE fix_ai_model_parameter so that
# the orchestrator-specific model (claude-opus-4-6) is set first and
# fix_ai_model_parameter sees it as a valid allowed model instead of
# overwriting it with the generic default (gpt-4o).
agent = self.fix_orchestrator_blocks(agent)
# Apply fixes that require blocks information
if blocks:
agent = self.fix_invalid_nested_sink_links(
@@ -1765,9 +1788,6 @@ class AgentFixer:
# Apply fixes for MCPToolBlock nodes
agent = self.fix_mcp_tool_blocks(agent)
# Apply fixes for OrchestratorBlock nodes (agent-mode defaults)
agent = self.fix_orchestrator_blocks(agent)
# Apply fixes for AgentExecutorBlock nodes (sub-agents)
if library_agents:
agent = self.fix_agent_executor_blocks(agent, library_agents)

View File

@@ -580,6 +580,29 @@ class TestFixAiModelParameter:
assert result["nodes"][0]["input_default"]["model"] == "perplexity/sonar"
def test_ai_block_without_model_property_is_skipped(self):
"""AI-category blocks that have no 'model' input property should not
have a model injected — they simply don't expose a model selector."""
fixer = AgentFixer()
block_id = generate_uuid()
node = _make_node(node_id="n1", block_id=block_id, input_default={})
agent = _make_agent(nodes=[node])
blocks = [
{
"id": block_id,
"name": "SomeAIBlock",
"categories": [{"category": "AI"}],
"inputSchema": {
"properties": {"prompt": {"type": "string"}},
},
}
]
result = fixer.fix_ai_model_parameter(agent, blocks)
assert "model" not in result["nodes"][0]["input_default"]
class TestFixAgentExecutorBlocks:
"""Tests for fix_agent_executor_blocks."""

View File

@@ -42,7 +42,10 @@ class GetAgentBuildingGuideTool(BaseTool):
@property
def description(self) -> str:
return "Get the agent JSON building guide (nodes, links, AgentExecutorBlock, MCPToolBlock usage). Call before generating agent JSON."
return (
"Get the agent JSON building guide (nodes, links, AgentExecutorBlock, MCPToolBlock usage, "
"and the create->dry-run->fix iterative workflow). Call before generating agent JSON."
)
@property
def parameters(self) -> dict[str, Any]:

View File

@@ -0,0 +1,15 @@
"""Tests for GetAgentBuildingGuideTool."""
from backend.copilot.tools.get_agent_building_guide import _load_guide
def test_load_guide_returns_string():
guide = _load_guide()
assert isinstance(guide, str)
assert len(guide) > 100
def test_load_guide_caches():
guide1 = _load_guide()
guide2 = _load_guide()
assert guide1 is guide2

View File

@@ -48,27 +48,41 @@ logger = logging.getLogger(__name__)
def get_inputs_from_schema(
input_schema: dict[str, Any],
exclude_fields: set[str] | None = None,
input_data: dict[str, Any] | None = None,
) -> list[dict[str, Any]]:
"""Extract input field info from JSON schema."""
"""Extract input field info from JSON schema.
When *input_data* is provided, each field's ``value`` key is populated
with the value the CoPilot already supplied — so the frontend can
prefill the form instead of showing empty inputs. Fields marked
``advanced`` in the schema are flagged so the frontend can hide them
by default (matching the builder behaviour).
"""
if not isinstance(input_schema, dict):
return []
exclude = exclude_fields or set()
properties = input_schema.get("properties", {})
required = set(input_schema.get("required", []))
provided = input_data or {}
return [
{
results: list[dict[str, Any]] = []
for name, schema in properties.items():
if name in exclude:
continue
entry: dict[str, Any] = {
"name": name,
"title": schema.get("title", name),
"type": schema.get("type", "string"),
"description": schema.get("description", ""),
"required": name in required,
"default": schema.get("default"),
"advanced": schema.get("advanced", False),
}
for name, schema in properties.items()
if name not in exclude
]
if name in provided:
entry["value"] = provided[name]
results.append(entry)
return results
async def execute_block(
@@ -446,7 +460,9 @@ async def prepare_block_for_execution(
requirements={
"credentials": missing_creds_list,
"inputs": get_inputs_from_schema(
input_schema, exclude_fields=credentials_fields
input_schema,
exclude_fields=credentials_fields,
input_data=input_data,
),
"execution_modes": ["immediate"],
},

View File

@@ -153,7 +153,11 @@ class RunAgentTool(BaseTool):
},
"dry_run": {
"type": "boolean",
"description": "Execute in preview mode.",
"description": (
"When true, simulates execution using an LLM for each block "
"— no real API calls, credentials, or credits. "
"See agent_generation_guide for the full workflow."
),
},
},
"required": ["dry_run"],

View File

@@ -845,6 +845,7 @@ class WriteWorkspaceFileTool(BaseTool):
path=path,
mime_type=mime_type,
overwrite=overwrite,
metadata={"origin": "agent-created"},
)
# Build informative source label and message.

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,240 @@
"""Build complete JSONL transcript from SDK messages.
The transcript represents the FULL active context at any point in time.
Each upload REPLACES the previous transcript atomically.
Flow:
Turn 1: Upload [msg1, msg2]
Turn 2: Download [msg1, msg2] → Upload [msg1, msg2, msg3, msg4] (REPLACE)
Turn 3: Download [msg1, msg2, msg3, msg4] → Upload [all messages] (REPLACE)
The transcript is never incremental - always the complete atomic state.
"""
import logging
from typing import Any
from uuid import uuid4
from pydantic import BaseModel
from backend.util import json
from .transcript import STRIPPABLE_TYPES
logger = logging.getLogger(__name__)
class TranscriptEntry(BaseModel):
"""Single transcript entry (user or assistant turn)."""
type: str
uuid: str
parentUuid: str = ""
isCompactSummary: bool | None = None
message: dict[str, Any]
class TranscriptBuilder:
"""Build complete JSONL transcript from SDK messages.
This builder maintains the FULL conversation state, not incremental changes.
The output is always the complete active context.
"""
def __init__(self) -> None:
self._entries: list[TranscriptEntry] = []
self._last_uuid: str | None = None
def _last_is_assistant(self) -> bool:
return bool(self._entries) and self._entries[-1].type == "assistant"
def _last_message_id(self) -> str:
"""Return the message.id of the last entry, or '' if none."""
if self._entries:
return self._entries[-1].message.get("id", "")
return ""
@staticmethod
def _parse_entry(data: dict) -> TranscriptEntry | None:
"""Parse a single transcript entry, filtering strippable types.
Returns ``None`` for entries that should be skipped (strippable types
that are not compaction summaries).
"""
entry_type = data.get("type", "")
if entry_type in STRIPPABLE_TYPES and not data.get("isCompactSummary"):
return None
return TranscriptEntry(
type=entry_type,
uuid=data.get("uuid") or str(uuid4()),
parentUuid=data.get("parentUuid") or "",
isCompactSummary=data.get("isCompactSummary"),
message=data.get("message", {}),
)
def load_previous(self, content: str, log_prefix: str = "[Transcript]") -> None:
"""Load complete previous transcript.
This loads the FULL previous context. As new messages come in,
we append to this state. The final output is the complete context
(previous + new), not just the delta.
"""
if not content or not content.strip():
return
lines = content.strip().split("\n")
for line_num, line in enumerate(lines, 1):
if not line.strip():
continue
data = json.loads(line, fallback=None)
if data is None:
logger.warning(
"%s Failed to parse transcript line %d/%d",
log_prefix,
line_num,
len(lines),
)
continue
entry = self._parse_entry(data)
if entry is None:
continue
self._entries.append(entry)
self._last_uuid = entry.uuid
logger.info(
"%s Loaded %d entries from previous transcript (last_uuid=%s)",
log_prefix,
len(self._entries),
self._last_uuid[:12] if self._last_uuid else None,
)
def append_user(self, content: str | list[dict], uuid: str | None = None) -> None:
"""Append a user entry."""
msg_uuid = uuid or str(uuid4())
self._entries.append(
TranscriptEntry(
type="user",
uuid=msg_uuid,
parentUuid=self._last_uuid or "",
message={"role": "user", "content": content},
)
)
self._last_uuid = msg_uuid
def append_tool_result(self, tool_use_id: str, content: str) -> None:
"""Append a tool result as a user entry (one per tool call)."""
self.append_user(
content=[
{"type": "tool_result", "tool_use_id": tool_use_id, "content": content}
]
)
def append_assistant(
self,
content_blocks: list[dict],
model: str = "",
stop_reason: str | None = None,
) -> None:
"""Append an assistant entry.
Consecutive assistant entries automatically share the same message ID
so the CLI can merge them (thinking → text → tool_use) into a single
API message on ``--resume``. A new ID is assigned whenever an
assistant entry follows a non-assistant entry (user message or tool
result), because that marks the start of a new API response.
"""
message_id = (
self._last_message_id()
if self._last_is_assistant()
else f"msg_sdk_{uuid4().hex[:24]}"
)
msg_uuid = str(uuid4())
self._entries.append(
TranscriptEntry(
type="assistant",
uuid=msg_uuid,
parentUuid=self._last_uuid or "",
message={
"role": "assistant",
"model": model,
"id": message_id,
"type": "message",
"content": content_blocks,
"stop_reason": stop_reason,
"stop_sequence": None,
},
)
)
self._last_uuid = msg_uuid
def replace_entries(
self, compacted_entries: list[dict], log_prefix: str = "[Transcript]"
) -> None:
"""Replace all entries with compacted entries from the CLI session file.
Called after mid-stream compaction so TranscriptBuilder mirrors the
CLI's active context (compaction summary + post-compaction entries).
Builds the new list first and validates it's non-empty before swapping,
so corrupt input cannot wipe the conversation history.
"""
new_entries: list[TranscriptEntry] = []
for data in compacted_entries:
entry = self._parse_entry(data)
if entry is not None:
new_entries.append(entry)
if not new_entries:
logger.warning(
"%s replace_entries produced 0 entries from %d inputs, keeping old (%d entries)",
log_prefix,
len(compacted_entries),
len(self._entries),
)
return
old_count = len(self._entries)
self._entries = new_entries
self._last_uuid = new_entries[-1].uuid
logger.info(
"%s TranscriptBuilder compacted: %d entries -> %d entries",
log_prefix,
old_count,
len(self._entries),
)
def to_jsonl(self) -> str:
"""Export complete context as JSONL.
Consecutive assistant entries are kept separate to match the
native CLI format — the SDK merges them internally on resume.
Returns the FULL conversation state (all entries), not incremental.
This output REPLACES any previous transcript.
"""
if not self._entries:
return ""
lines = [entry.model_dump_json(exclude_none=True) for entry in self._entries]
return "\n".join(lines) + "\n"
@property
def entry_count(self) -> int:
"""Total number of entries in the complete context."""
return len(self._entries)
@property
def is_empty(self) -> bool:
"""Whether this builder has any entries."""
return len(self._entries) == 0
@property
def last_entry_type(self) -> str | None:
"""Type of the last entry, or None if empty."""
return self._entries[-1].type if self._entries else None

View File

@@ -0,0 +1,260 @@
"""Tests for canonical TranscriptBuilder (backend.copilot.transcript_builder).
These tests directly import from the canonical module to ensure codecov
patch coverage for the new file.
"""
from backend.copilot.transcript_builder import TranscriptBuilder, TranscriptEntry
from backend.util import json
def _make_jsonl(*entries: dict) -> str:
return "\n".join(json.dumps(e) for e in entries) + "\n"
USER_MSG = {
"type": "user",
"uuid": "u1",
"message": {"role": "user", "content": "hello"},
}
ASST_MSG = {
"type": "assistant",
"uuid": "a1",
"parentUuid": "u1",
"message": {
"role": "assistant",
"id": "msg_1",
"type": "message",
"content": [{"type": "text", "text": "hi"}],
"stop_reason": "end_turn",
"stop_sequence": None,
},
}
class TestTranscriptEntry:
def test_basic_construction(self):
entry = TranscriptEntry(
type="user", uuid="u1", message={"role": "user", "content": "hi"}
)
assert entry.type == "user"
assert entry.uuid == "u1"
assert entry.parentUuid == ""
assert entry.isCompactSummary is None
def test_optional_fields(self):
entry = TranscriptEntry(
type="summary",
uuid="s1",
parentUuid="p1",
isCompactSummary=True,
message={"role": "user", "content": "summary"},
)
assert entry.isCompactSummary is True
assert entry.parentUuid == "p1"
class TestTranscriptBuilderInit:
def test_starts_empty(self):
builder = TranscriptBuilder()
assert builder.is_empty
assert builder.entry_count == 0
assert builder.last_entry_type is None
assert builder.to_jsonl() == ""
class TestAppendUser:
def test_appends_user_entry(self):
builder = TranscriptBuilder()
builder.append_user("hello")
assert builder.entry_count == 1
assert builder.last_entry_type == "user"
def test_chains_parent_uuid(self):
builder = TranscriptBuilder()
builder.append_user("first", uuid="u1")
builder.append_user("second", uuid="u2")
output = builder.to_jsonl()
entries = [json.loads(line) for line in output.strip().split("\n")]
assert entries[0]["parentUuid"] == ""
assert entries[1]["parentUuid"] == "u1"
def test_custom_uuid(self):
builder = TranscriptBuilder()
builder.append_user("hello", uuid="custom-id")
output = builder.to_jsonl()
entry = json.loads(output.strip())
assert entry["uuid"] == "custom-id"
class TestAppendToolResult:
def test_appends_as_user_entry(self):
builder = TranscriptBuilder()
builder.append_tool_result(tool_use_id="tc_1", content="result text")
assert builder.entry_count == 1
assert builder.last_entry_type == "user"
output = builder.to_jsonl()
entry = json.loads(output.strip())
content = entry["message"]["content"]
assert len(content) == 1
assert content[0]["type"] == "tool_result"
assert content[0]["tool_use_id"] == "tc_1"
assert content[0]["content"] == "result text"
class TestAppendAssistant:
def test_appends_assistant_entry(self):
builder = TranscriptBuilder()
builder.append_user("hi")
builder.append_assistant(
content_blocks=[{"type": "text", "text": "hello"}],
model="test-model",
stop_reason="end_turn",
)
assert builder.entry_count == 2
assert builder.last_entry_type == "assistant"
def test_consecutive_assistants_share_message_id(self):
builder = TranscriptBuilder()
builder.append_user("hi")
builder.append_assistant(
content_blocks=[{"type": "text", "text": "part 1"}],
model="m",
)
builder.append_assistant(
content_blocks=[{"type": "text", "text": "part 2"}],
model="m",
)
output = builder.to_jsonl()
entries = [json.loads(line) for line in output.strip().split("\n")]
# The two assistant entries share the same message ID
assert entries[1]["message"]["id"] == entries[2]["message"]["id"]
def test_non_consecutive_assistants_get_different_ids(self):
builder = TranscriptBuilder()
builder.append_user("q1")
builder.append_assistant(
content_blocks=[{"type": "text", "text": "a1"}],
model="m",
)
builder.append_user("q2")
builder.append_assistant(
content_blocks=[{"type": "text", "text": "a2"}],
model="m",
)
output = builder.to_jsonl()
entries = [json.loads(line) for line in output.strip().split("\n")]
assert entries[1]["message"]["id"] != entries[3]["message"]["id"]
class TestLoadPrevious:
def test_loads_valid_entries(self):
content = _make_jsonl(USER_MSG, ASST_MSG)
builder = TranscriptBuilder()
builder.load_previous(content)
assert builder.entry_count == 2
def test_skips_empty_content(self):
builder = TranscriptBuilder()
builder.load_previous("")
assert builder.is_empty
builder.load_previous(" ")
assert builder.is_empty
def test_skips_strippable_types(self):
progress = {"type": "progress", "uuid": "p1", "message": {}}
content = _make_jsonl(USER_MSG, progress, ASST_MSG)
builder = TranscriptBuilder()
builder.load_previous(content)
assert builder.entry_count == 2 # progress was skipped
def test_preserves_compact_summary(self):
compact = {
"type": "summary",
"uuid": "cs1",
"isCompactSummary": True,
"message": {"role": "user", "content": "summary"},
}
content = _make_jsonl(compact, ASST_MSG)
builder = TranscriptBuilder()
builder.load_previous(content)
assert builder.entry_count == 2
def test_skips_invalid_json_lines(self):
content = '{"type":"user","uuid":"u1","message":{}}\nnot-valid-json\n'
builder = TranscriptBuilder()
builder.load_previous(content)
assert builder.entry_count == 1
class TestToJsonl:
def test_roundtrip(self):
builder = TranscriptBuilder()
builder.append_user("hello", uuid="u1")
builder.append_assistant(
content_blocks=[{"type": "text", "text": "world"}],
model="m",
)
output = builder.to_jsonl()
assert output.endswith("\n")
lines = output.strip().split("\n")
assert len(lines) == 2
for line in lines:
parsed = json.loads(line)
assert "type" in parsed
assert "uuid" in parsed
assert "message" in parsed
class TestReplaceEntries:
def test_replaces_all_entries(self):
builder = TranscriptBuilder()
builder.append_user("old")
builder.append_assistant(
content_blocks=[{"type": "text", "text": "old answer"}], model="m"
)
assert builder.entry_count == 2
compacted = [
{
"type": "summary",
"uuid": "cs1",
"isCompactSummary": True,
"message": {"role": "user", "content": "compacted"},
}
]
builder.replace_entries(compacted)
assert builder.entry_count == 1
def test_empty_replacement_keeps_existing(self):
builder = TranscriptBuilder()
builder.append_user("keep me")
builder.replace_entries([])
assert builder.entry_count == 1
class TestParseEntry:
def test_filters_strippable_non_compact(self):
result = TranscriptBuilder._parse_entry(
{"type": "progress", "uuid": "p1", "message": {}}
)
assert result is None
def test_keeps_compact_summary(self):
result = TranscriptBuilder._parse_entry(
{
"type": "summary",
"uuid": "cs1",
"isCompactSummary": True,
"message": {},
}
)
assert result is not None
assert result.isCompactSummary is True
def test_generates_uuid_if_missing(self):
result = TranscriptBuilder._parse_entry(
{"type": "user", "message": {"role": "user", "content": "hi"}}
)
assert result is not None
assert result.uuid # Should be a generated UUID

View File

@@ -0,0 +1,726 @@
"""Tests for canonical transcript module (backend.copilot.transcript).
Covers pure helper functions that are not exercised by the SDK re-export tests.
"""
from __future__ import annotations
from unittest.mock import MagicMock
from backend.util import json
from .transcript import (
TranscriptDownload,
_build_path_from_parts,
_find_last_assistant_entry,
_flatten_assistant_content,
_flatten_tool_result_content,
_messages_to_transcript,
_meta_storage_path_parts,
_rechain_tail,
_sanitize_id,
_storage_path_parts,
_transcript_to_messages,
strip_for_upload,
validate_transcript,
)
def _make_jsonl(*entries: dict) -> str:
return "\n".join(json.dumps(e) for e in entries) + "\n"
# ---------------------------------------------------------------------------
# _sanitize_id
# ---------------------------------------------------------------------------
class TestSanitizeId:
def test_uuid_passes_through(self):
assert _sanitize_id("abcdef12-3456-7890-abcd-ef1234567890") == (
"abcdef12-3456-7890-abcd-ef1234567890"
)
def test_strips_non_hex_characters(self):
# Only hex chars (0-9, a-f, A-F) and hyphens are kept
result = _sanitize_id("abc/../../etc/passwd")
assert "/" not in result
assert "." not in result
# 'p', 's', 'w' are not hex chars, so they are stripped
assert all(c in "0123456789abcdefABCDEF-" for c in result)
def test_truncates_to_max_len(self):
long_id = "a" * 100
result = _sanitize_id(long_id, max_len=10)
assert len(result) == 10
def test_empty_returns_unknown(self):
assert _sanitize_id("") == "unknown"
def test_none_returns_unknown(self):
assert _sanitize_id(None) == "unknown" # type: ignore[arg-type]
def test_special_chars_only_returns_unknown(self):
assert _sanitize_id("!@#$%^&*()") == "unknown"
# ---------------------------------------------------------------------------
# _storage_path_parts / _meta_storage_path_parts
# ---------------------------------------------------------------------------
class TestStoragePathParts:
def test_returns_triple(self):
prefix, uid, fname = _storage_path_parts("user-1", "sess-2")
assert prefix == "chat-transcripts"
assert "e" in uid # hex chars from "user-1" sanitized
assert fname.endswith(".jsonl")
def test_meta_returns_meta_json(self):
prefix, uid, fname = _meta_storage_path_parts("user-1", "sess-2")
assert prefix == "chat-transcripts"
assert fname.endswith(".meta.json")
# ---------------------------------------------------------------------------
# _build_path_from_parts
# ---------------------------------------------------------------------------
class TestBuildPathFromParts:
def test_gcs_backend(self):
from backend.util.workspace_storage import GCSWorkspaceStorage
mock_gcs = MagicMock(spec=GCSWorkspaceStorage)
mock_gcs.bucket_name = "my-bucket"
path = _build_path_from_parts(("wid", "fid", "file.jsonl"), mock_gcs)
assert path == "gcs://my-bucket/workspaces/wid/fid/file.jsonl"
def test_local_backend(self):
# Use a plain object (not MagicMock) so isinstance(GCSWorkspaceStorage) is False
local_backend = type("LocalBackend", (), {})()
path = _build_path_from_parts(("wid", "fid", "file.jsonl"), local_backend)
assert path == "local://wid/fid/file.jsonl"
# ---------------------------------------------------------------------------
# TranscriptDownload dataclass
# ---------------------------------------------------------------------------
class TestTranscriptDownload:
def test_defaults(self):
td = TranscriptDownload(content="hello")
assert td.content == "hello"
assert td.message_count == 0
assert td.uploaded_at == 0.0
def test_custom_values(self):
td = TranscriptDownload(content="data", message_count=5, uploaded_at=123.45)
assert td.message_count == 5
assert td.uploaded_at == 123.45
# ---------------------------------------------------------------------------
# _flatten_assistant_content
# ---------------------------------------------------------------------------
class TestFlattenAssistantContent:
def test_text_blocks(self):
blocks = [
{"type": "text", "text": "Hello"},
{"type": "text", "text": "World"},
]
assert _flatten_assistant_content(blocks) == "Hello\nWorld"
def test_thinking_blocks_stripped(self):
blocks = [
{"type": "thinking", "thinking": "hmm..."},
{"type": "text", "text": "answer"},
{"type": "redacted_thinking", "data": "secret"},
]
assert _flatten_assistant_content(blocks) == "answer"
def test_tool_use_blocks_stripped(self):
blocks = [
{"type": "text", "text": "I'll run a tool"},
{"type": "tool_use", "name": "bash", "id": "tc1", "input": {}},
]
assert _flatten_assistant_content(blocks) == "I'll run a tool"
def test_string_blocks(self):
blocks = ["hello", "world"]
assert _flatten_assistant_content(blocks) == "hello\nworld"
def test_empty_blocks(self):
assert _flatten_assistant_content([]) == ""
def test_unknown_dict_blocks_skipped(self):
blocks = [{"type": "image", "data": "base64..."}]
assert _flatten_assistant_content(blocks) == ""
# ---------------------------------------------------------------------------
# _flatten_tool_result_content
# ---------------------------------------------------------------------------
class TestFlattenToolResultContent:
def test_tool_result_with_text_content(self):
blocks = [
{
"type": "tool_result",
"tool_use_id": "tc1",
"content": [{"type": "text", "text": "output data"}],
}
]
assert _flatten_tool_result_content(blocks) == "output data"
def test_tool_result_with_string_content(self):
blocks = [
{"type": "tool_result", "tool_use_id": "tc1", "content": "simple string"}
]
assert _flatten_tool_result_content(blocks) == "simple string"
def test_tool_result_with_image_placeholder(self):
blocks = [
{
"type": "tool_result",
"tool_use_id": "tc1",
"content": [{"type": "image", "data": "base64..."}],
}
]
assert _flatten_tool_result_content(blocks) == "[__image__]"
def test_tool_result_with_document_placeholder(self):
blocks = [
{
"type": "tool_result",
"tool_use_id": "tc1",
"content": [{"type": "document", "data": "base64..."}],
}
]
assert _flatten_tool_result_content(blocks) == "[__document__]"
def test_tool_result_with_none_content(self):
blocks = [{"type": "tool_result", "tool_use_id": "tc1", "content": None}]
assert _flatten_tool_result_content(blocks) == ""
def test_text_block_outside_tool_result(self):
blocks = [{"type": "text", "text": "standalone"}]
assert _flatten_tool_result_content(blocks) == "standalone"
def test_unknown_dict_block_placeholder(self):
blocks = [{"type": "custom_widget", "data": "x"}]
assert _flatten_tool_result_content(blocks) == "[__custom_widget__]"
def test_string_blocks(self):
blocks = ["raw text"]
assert _flatten_tool_result_content(blocks) == "raw text"
def test_empty_blocks(self):
assert _flatten_tool_result_content([]) == ""
def test_mixed_content_in_tool_result(self):
blocks = [
{
"type": "tool_result",
"tool_use_id": "tc1",
"content": [
{"type": "text", "text": "line1"},
{"type": "image", "data": "..."},
"raw string",
],
}
]
result = _flatten_tool_result_content(blocks)
assert "line1" in result
assert "[__image__]" in result
assert "raw string" in result
def test_tool_result_with_dict_without_text_key(self):
blocks = [
{
"type": "tool_result",
"tool_use_id": "tc1",
"content": [{"count": 42}],
}
]
result = _flatten_tool_result_content(blocks)
assert "42" in result
def test_tool_result_content_list_with_list_content(self):
blocks = [
{
"type": "tool_result",
"tool_use_id": "tc1",
"content": [{"type": "text", "text": None}],
}
]
result = _flatten_tool_result_content(blocks)
assert result == "None"
# ---------------------------------------------------------------------------
# _transcript_to_messages
# ---------------------------------------------------------------------------
USER_ENTRY = {
"type": "user",
"uuid": "u1",
"parentUuid": "",
"message": {"role": "user", "content": "hello"},
}
ASST_ENTRY = {
"type": "assistant",
"uuid": "a1",
"parentUuid": "u1",
"message": {
"role": "assistant",
"id": "msg_1",
"content": [{"type": "text", "text": "hi there"}],
},
}
PROGRESS_ENTRY = {
"type": "progress",
"uuid": "p1",
"parentUuid": "u1",
"data": {},
}
class TestTranscriptToMessages:
def test_basic_conversion(self):
content = _make_jsonl(USER_ENTRY, ASST_ENTRY)
messages = _transcript_to_messages(content)
assert len(messages) == 2
assert messages[0] == {"role": "user", "content": "hello"}
assert messages[1]["role"] == "assistant"
assert messages[1]["content"] == "hi there"
def test_skips_strippable_types(self):
content = _make_jsonl(USER_ENTRY, PROGRESS_ENTRY, ASST_ENTRY)
messages = _transcript_to_messages(content)
assert len(messages) == 2
def test_skips_entries_without_role(self):
no_role = {"type": "user", "uuid": "x", "message": {"content": "no role"}}
content = _make_jsonl(no_role)
messages = _transcript_to_messages(content)
assert len(messages) == 0
def test_handles_string_content(self):
entry = {
"type": "user",
"uuid": "u1",
"message": {"role": "user", "content": "plain string"},
}
content = _make_jsonl(entry)
messages = _transcript_to_messages(content)
assert messages[0]["content"] == "plain string"
def test_handles_tool_result_content(self):
entry = {
"type": "user",
"uuid": "u1",
"message": {
"role": "user",
"content": [
{"type": "tool_result", "tool_use_id": "tc1", "content": "output"}
],
},
}
content = _make_jsonl(entry)
messages = _transcript_to_messages(content)
assert messages[0]["content"] == "output"
def test_handles_none_content(self):
entry = {
"type": "assistant",
"uuid": "a1",
"message": {"role": "assistant", "content": None},
}
content = _make_jsonl(entry)
messages = _transcript_to_messages(content)
assert messages[0]["content"] == ""
def test_skips_invalid_json(self):
content = "not valid json\n"
messages = _transcript_to_messages(content)
assert len(messages) == 0
def test_preserves_compact_summary(self):
compact = {
"type": "summary",
"uuid": "cs1",
"isCompactSummary": True,
"message": {"role": "user", "content": "summary of conversation"},
}
content = _make_jsonl(compact)
messages = _transcript_to_messages(content)
assert len(messages) == 1
def test_strips_summary_without_compact_flag(self):
summary = {
"type": "summary",
"uuid": "s1",
"message": {"role": "user", "content": "summary"},
}
content = _make_jsonl(summary)
messages = _transcript_to_messages(content)
assert len(messages) == 0
# ---------------------------------------------------------------------------
# _messages_to_transcript
# ---------------------------------------------------------------------------
class TestMessagesToTranscript:
def test_basic_roundtrip(self):
messages = [
{"role": "user", "content": "hello"},
{"role": "assistant", "content": "world"},
]
result = _messages_to_transcript(messages)
assert result.endswith("\n")
lines = result.strip().split("\n")
assert len(lines) == 2
user_entry = json.loads(lines[0])
assert user_entry["type"] == "user"
assert user_entry["message"]["role"] == "user"
assert user_entry["message"]["content"] == "hello"
assert user_entry["parentUuid"] == ""
asst_entry = json.loads(lines[1])
assert asst_entry["type"] == "assistant"
assert asst_entry["message"]["role"] == "assistant"
assert asst_entry["message"]["content"] == [{"type": "text", "text": "world"}]
assert asst_entry["parentUuid"] == user_entry["uuid"]
def test_empty_messages(self):
assert _messages_to_transcript([]) == ""
def test_assistant_has_message_envelope(self):
messages = [{"role": "assistant", "content": "test"}]
result = _messages_to_transcript(messages)
entry = json.loads(result.strip())
msg = entry["message"]
assert "id" in msg
assert msg["id"].startswith("msg_compact_")
assert msg["type"] == "message"
assert msg["stop_reason"] == "end_turn"
assert msg["stop_sequence"] is None
def test_uuid_chain(self):
messages = [
{"role": "user", "content": "a"},
{"role": "assistant", "content": "b"},
{"role": "user", "content": "c"},
]
result = _messages_to_transcript(messages)
lines = result.strip().split("\n")
entries = [json.loads(line) for line in lines]
assert entries[0]["parentUuid"] == ""
assert entries[1]["parentUuid"] == entries[0]["uuid"]
assert entries[2]["parentUuid"] == entries[1]["uuid"]
def test_assistant_with_empty_content(self):
messages = [{"role": "assistant", "content": ""}]
result = _messages_to_transcript(messages)
entry = json.loads(result.strip())
assert entry["message"]["content"] == []
# ---------------------------------------------------------------------------
# _find_last_assistant_entry
# ---------------------------------------------------------------------------
class TestFindLastAssistantEntry:
def test_splits_at_last_assistant(self):
user = {
"type": "user",
"uuid": "u1",
"message": {"role": "user", "content": "hi"},
}
asst = {
"type": "assistant",
"uuid": "a1",
"message": {"role": "assistant", "id": "msg1", "content": "answer"},
}
content = _make_jsonl(user, asst)
prefix, tail = _find_last_assistant_entry(content)
assert len(prefix) == 1
assert len(tail) == 1
def test_no_assistant_returns_all_in_prefix(self):
user1 = {
"type": "user",
"uuid": "u1",
"message": {"role": "user", "content": "hi"},
}
user2 = {
"type": "user",
"uuid": "u2",
"message": {"role": "user", "content": "hey"},
}
content = _make_jsonl(user1, user2)
prefix, tail = _find_last_assistant_entry(content)
assert len(prefix) == 2
assert len(tail) == 0
def test_multi_entry_turn_preserved(self):
user = {
"type": "user",
"uuid": "u1",
"message": {"role": "user", "content": "q"},
}
asst1 = {
"type": "assistant",
"uuid": "a1",
"message": {
"role": "assistant",
"id": "msg_turn",
"content": [{"type": "thinking", "thinking": "hmm"}],
},
}
asst2 = {
"type": "assistant",
"uuid": "a2",
"message": {
"role": "assistant",
"id": "msg_turn",
"content": [{"type": "text", "text": "answer"}],
},
}
content = _make_jsonl(user, asst1, asst2)
prefix, tail = _find_last_assistant_entry(content)
assert len(prefix) == 1 # just the user
assert len(tail) == 2 # both assistant entries
def test_assistant_without_id(self):
user = {
"type": "user",
"uuid": "u1",
"message": {"role": "user", "content": "q"},
}
asst = {
"type": "assistant",
"uuid": "a1",
"message": {"role": "assistant", "content": "no id"},
}
content = _make_jsonl(user, asst)
prefix, tail = _find_last_assistant_entry(content)
assert len(prefix) == 1
assert len(tail) == 1
def test_trailing_user_after_assistant(self):
user1 = {
"type": "user",
"uuid": "u1",
"message": {"role": "user", "content": "q"},
}
asst = {
"type": "assistant",
"uuid": "a1",
"message": {"role": "assistant", "id": "msg1", "content": "a"},
}
user2 = {
"type": "user",
"uuid": "u2",
"message": {"role": "user", "content": "follow"},
}
content = _make_jsonl(user1, asst, user2)
prefix, tail = _find_last_assistant_entry(content)
assert len(prefix) == 1 # user1
assert len(tail) == 2 # asst + user2
# ---------------------------------------------------------------------------
# _rechain_tail
# ---------------------------------------------------------------------------
class TestRechainTail:
def test_empty_tail(self):
assert _rechain_tail("some prefix\n", []) == ""
def test_patches_first_entry_parent(self):
prefix_entry = {"uuid": "last-prefix-uuid", "type": "user", "message": {}}
prefix = json.dumps(prefix_entry) + "\n"
tail_entry = {
"uuid": "t1",
"parentUuid": "old-parent",
"type": "assistant",
"message": {},
}
tail_lines = [json.dumps(tail_entry)]
result = _rechain_tail(prefix, tail_lines)
parsed = json.loads(result.strip())
assert parsed["parentUuid"] == "last-prefix-uuid"
def test_chains_consecutive_tail_entries(self):
prefix_entry = {"uuid": "p1", "type": "user", "message": {}}
prefix = json.dumps(prefix_entry) + "\n"
t1 = {"uuid": "t1", "parentUuid": "old1", "type": "assistant", "message": {}}
t2 = {"uuid": "t2", "parentUuid": "old2", "type": "user", "message": {}}
tail_lines = [json.dumps(t1), json.dumps(t2)]
result = _rechain_tail(prefix, tail_lines)
entries = [json.loads(line) for line in result.strip().split("\n")]
assert entries[0]["parentUuid"] == "p1"
assert entries[1]["parentUuid"] == "t1"
def test_non_dict_lines_passed_through(self):
prefix_entry = {"uuid": "p1", "type": "user", "message": {}}
prefix = json.dumps(prefix_entry) + "\n"
tail_lines = ["not-a-json-dict"]
result = _rechain_tail(prefix, tail_lines)
assert "not-a-json-dict" in result
# ---------------------------------------------------------------------------
# strip_for_upload (combined single-parse)
# ---------------------------------------------------------------------------
class TestStripForUpload:
def test_strips_progress_and_thinking(self):
user = {
"type": "user",
"uuid": "u1",
"parentUuid": "",
"message": {"role": "user", "content": "hi"},
}
progress = {"type": "progress", "uuid": "p1", "parentUuid": "u1", "data": {}}
asst_old = {
"type": "assistant",
"uuid": "a1",
"parentUuid": "p1",
"message": {
"role": "assistant",
"id": "msg_old",
"content": [
{"type": "thinking", "thinking": "stale thinking"},
{"type": "text", "text": "old answer"},
],
},
}
user2 = {
"type": "user",
"uuid": "u2",
"parentUuid": "a1",
"message": {"role": "user", "content": "next"},
}
asst_new = {
"type": "assistant",
"uuid": "a2",
"parentUuid": "u2",
"message": {
"role": "assistant",
"id": "msg_new",
"content": [
{"type": "thinking", "thinking": "fresh thinking"},
{"type": "text", "text": "new answer"},
],
},
}
content = _make_jsonl(user, progress, asst_old, user2, asst_new)
result = strip_for_upload(content)
lines = result.strip().split("\n")
# Progress should be stripped -> 4 entries remain
assert len(lines) == 4
# First entry (user) should be reparented since its child (progress) was stripped
entries = [json.loads(line) for line in lines]
types = [e.get("type") for e in entries]
assert "progress" not in types
# Old assistant thinking stripped, new assistant thinking preserved
old_asst = next(
e for e in entries if e.get("message", {}).get("id") == "msg_old"
)
old_content = old_asst["message"]["content"]
old_types = [b["type"] for b in old_content if isinstance(b, dict)]
assert "thinking" not in old_types
assert "text" in old_types
new_asst = next(
e for e in entries if e.get("message", {}).get("id") == "msg_new"
)
new_content = new_asst["message"]["content"]
new_types = [b["type"] for b in new_content if isinstance(b, dict)]
assert "thinking" in new_types # last assistant preserved
def test_empty_content(self):
result = strip_for_upload("")
# Empty string produces a single empty line after split, resulting in "\n"
assert result.strip() == ""
def test_preserves_compact_summary(self):
compact = {
"type": "summary",
"uuid": "cs1",
"isCompactSummary": True,
"message": {"role": "user", "content": "summary"},
}
asst = {
"type": "assistant",
"uuid": "a1",
"parentUuid": "cs1",
"message": {"role": "assistant", "id": "msg1", "content": "answer"},
}
content = _make_jsonl(compact, asst)
result = strip_for_upload(content)
lines = result.strip().split("\n")
assert len(lines) == 2
def test_no_assistant_entries(self):
user = {
"type": "user",
"uuid": "u1",
"message": {"role": "user", "content": "hi"},
}
content = _make_jsonl(user)
result = strip_for_upload(content)
lines = result.strip().split("\n")
assert len(lines) == 1
# ---------------------------------------------------------------------------
# validate_transcript (additional edge cases)
# ---------------------------------------------------------------------------
class TestValidateTranscript:
def test_valid_with_assistant(self):
content = _make_jsonl(
USER_ENTRY,
ASST_ENTRY,
)
assert validate_transcript(content) is True
def test_none_returns_false(self):
assert validate_transcript(None) is False
def test_whitespace_only_returns_false(self):
assert validate_transcript(" \n ") is False
def test_no_assistant_returns_false(self):
content = _make_jsonl(USER_ENTRY)
assert validate_transcript(content) is False
def test_invalid_json_returns_false(self):
assert validate_transcript("not json\n") is False
def test_assistant_only_is_valid(self):
content = _make_jsonl(ASST_ENTRY)
assert validate_transcript(content) is True

View File

@@ -147,6 +147,19 @@ MODEL_COST: dict[LlmModel, int] = {
LlmModel.KIMI_K2: 1,
LlmModel.QWEN3_235B_A22B_THINKING: 1,
LlmModel.QWEN3_CODER: 9,
# Z.ai (Zhipu) models
LlmModel.ZAI_GLM_4_32B: 1,
LlmModel.ZAI_GLM_4_5: 2,
LlmModel.ZAI_GLM_4_5_AIR: 1,
LlmModel.ZAI_GLM_4_5_AIR_FREE: 1,
LlmModel.ZAI_GLM_4_5V: 2,
LlmModel.ZAI_GLM_4_6: 1,
LlmModel.ZAI_GLM_4_6V: 1,
LlmModel.ZAI_GLM_4_7: 1,
LlmModel.ZAI_GLM_4_7_FLASH: 1,
LlmModel.ZAI_GLM_5: 2,
LlmModel.ZAI_GLM_5_TURBO: 4,
LlmModel.ZAI_GLM_5V_TURBO: 4,
# v0 by Vercel models
LlmModel.V0_1_5_MD: 1,
LlmModel.V0_1_5_LG: 2,

View File

@@ -142,3 +142,9 @@ def credit_db():
credit_db = get_database_manager_async_client()
return credit_db
def platform_cost_db():
from backend.util.clients import get_database_manager_async_client
return get_database_manager_async_client()

View File

@@ -96,6 +96,7 @@ from backend.data.notifications import (
remove_notifications_from_batch,
)
from backend.data.onboarding import increment_onboarding_runs
from backend.data.platform_cost import log_platform_cost
from backend.data.understanding import (
get_business_understanding,
upsert_business_understanding,
@@ -332,6 +333,9 @@ class DatabaseManager(AppService):
get_blocks_needing_optimization = _(get_blocks_needing_optimization)
update_block_optimized_description = _(update_block_optimized_description)
# ============ Platform Cost Tracking ============ #
log_platform_cost = _(log_platform_cost)
# ============ CoPilot Chat Sessions ============ #
get_chat_session = _(chat_db.get_chat_session)
create_chat_session = _(chat_db.create_chat_session)
@@ -529,6 +533,9 @@ class DatabaseManagerAsyncClient(AppServiceClient):
# ============ Block Descriptions ============ #
get_blocks_needing_optimization = d.get_blocks_needing_optimization
# ============ Platform Cost Tracking ============ #
log_platform_cost = d.log_platform_cost
# ============ CoPilot Chat Sessions ============ #
get_chat_session = d.get_chat_session
create_chat_session = d.create_chat_session

View File

@@ -104,6 +104,11 @@ class User(BaseModel):
description="User timezone (IANA timezone identifier or 'not-set')",
)
# Subscription / rate-limit tier
subscription_tier: str | None = Field(
default=None, description="Subscription tier (FREE, PRO, BUSINESS, ENTERPRISE)"
)
@classmethod
def from_db(cls, prisma_user: "PrismaUser") -> "User":
"""Convert a database User object to application User model."""
@@ -158,6 +163,7 @@ class User(BaseModel):
notify_on_weekly_summary=prisma_user.notifyOnWeeklySummary or True,
notify_on_monthly_summary=prisma_user.notifyOnMonthlySummary or True,
timezone=prisma_user.timezone or USER_TIMEZONE_NOT_SET,
subscription_tier=prisma_user.subscriptionTier,
)
@@ -819,6 +825,17 @@ class RefundRequest(BaseModel):
updated_at: datetime
ProviderCostType = Literal[
"cost_usd", # Actual USD cost reported by the provider
"tokens", # LLM token counts (sum of input + output)
"characters", # Per-character billing (TTS providers)
"sandbox_seconds", # Per-second compute billing (e.g. E2B)
"walltime_seconds", # Per-second billing incl. queue/polling
"per_run", # Per-API-call billing with fixed cost
"items", # Per-item billing (lead/organization/result count)
]
class NodeExecutionStats(BaseModel):
"""Execution statistics for a node execution."""
@@ -838,32 +855,39 @@ class NodeExecutionStats(BaseModel):
output_token_count: int = 0
extra_cost: int = 0
extra_steps: int = 0
provider_cost: float | None = None
# Type of the provider-reported cost/usage captured above. When set
# by a block, resolve_tracking honors this directly instead of
# guessing from provider name.
provider_cost_type: Optional[ProviderCostType] = None
# Moderation fields
cleared_inputs: Optional[dict[str, list[str]]] = None
cleared_outputs: Optional[dict[str, list[str]]] = None
def __iadd__(self, other: "NodeExecutionStats") -> "NodeExecutionStats":
"""Mutate this instance by adding another NodeExecutionStats."""
"""Mutate this instance by adding another NodeExecutionStats.
Avoids calling model_dump() twice per merge (called on every
merge_stats() from ~20+ blocks); reads via getattr/vars instead.
"""
if not isinstance(other, NodeExecutionStats):
return NotImplemented
stats_dict = other.model_dump()
current_stats = self.model_dump()
for key, value in stats_dict.items():
if key not in current_stats:
# Field doesn't exist yet, just set it
for key in type(other).model_fields:
value = getattr(other, key)
if value is None:
# Never overwrite an existing value with None
continue
current = getattr(self, key, None)
if current is None:
# Field doesn't exist yet or is None, just set it
setattr(self, key, value)
elif isinstance(value, dict) and isinstance(current_stats[key], dict):
current_stats[key].update(value)
setattr(self, key, current_stats[key])
elif isinstance(value, (int, float)) and isinstance(
current_stats[key], (int, float)
):
setattr(self, key, current_stats[key] + value)
elif isinstance(value, list) and isinstance(current_stats[key], list):
current_stats[key].extend(value)
setattr(self, key, current_stats[key])
elif isinstance(value, dict) and isinstance(current, dict):
current.update(value)
elif isinstance(value, (int, float)) and isinstance(current, (int, float)):
setattr(self, key, current + value)
elif isinstance(value, list) and isinstance(current, list):
current.extend(value)
else:
setattr(self, key, value)

View File

@@ -1,7 +1,7 @@
import pytest
from pydantic import SecretStr
from backend.data.model import HostScopedCredentials
from backend.data.model import HostScopedCredentials, NodeExecutionStats
class TestHostScopedCredentials:
@@ -166,3 +166,84 @@ class TestHostScopedCredentials:
)
assert creds.matches_url(test_url) == expected
class TestNodeExecutionStatsIadd:
def test_adds_numeric_fields(self):
a = NodeExecutionStats(input_token_count=100, output_token_count=50)
b = NodeExecutionStats(input_token_count=200, output_token_count=30)
a += b
assert a.input_token_count == 300
assert a.output_token_count == 80
def test_none_does_not_overwrite(self):
a = NodeExecutionStats(provider_cost=0.5, error="some error")
b = NodeExecutionStats(provider_cost=None, error=None)
a += b
assert a.provider_cost == 0.5
assert a.error == "some error"
def test_none_is_skipped_preserving_existing_value(self):
a = NodeExecutionStats(input_token_count=100)
b = NodeExecutionStats()
a += b
assert a.input_token_count == 100
def test_dict_fields_are_merged(self):
a = NodeExecutionStats(
cleared_inputs={"field1": ["val1"]},
)
b = NodeExecutionStats(
cleared_inputs={"field2": ["val2"]},
)
a += b
assert a.cleared_inputs == {"field1": ["val1"], "field2": ["val2"]}
def test_returns_self(self):
a = NodeExecutionStats()
b = NodeExecutionStats(input_token_count=10)
result = a.__iadd__(b)
assert result is a
def test_not_implemented_for_non_stats(self):
a = NodeExecutionStats()
result = a.__iadd__("not a stats") # type: ignore[arg-type]
assert result is NotImplemented
def test_error_none_does_not_clear_existing_error(self):
a = NodeExecutionStats(error="existing error")
b = NodeExecutionStats(error=None)
a += b
assert a.error == "existing error"
def test_provider_cost_none_does_not_clear_existing_cost(self):
a = NodeExecutionStats(provider_cost=0.05)
b = NodeExecutionStats(provider_cost=None)
a += b
assert a.provider_cost == 0.05
def test_provider_cost_accumulates_when_both_set(self):
a = NodeExecutionStats(provider_cost=0.01)
b = NodeExecutionStats(provider_cost=0.02)
a += b
assert abs((a.provider_cost or 0) - 0.03) < 1e-9
def test_provider_cost_first_write_from_none(self):
a = NodeExecutionStats()
b = NodeExecutionStats(provider_cost=0.05)
a += b
assert a.provider_cost == 0.05
def test_provider_cost_type_first_write_from_none(self):
"""Writing provider_cost_type into a stats with None sets it."""
a = NodeExecutionStats()
b = NodeExecutionStats(provider_cost_type="characters")
a += b
assert a.provider_cost_type == "characters"
def test_provider_cost_type_none_does_not_overwrite(self):
"""A None provider_cost_type from other must not clear an existing value."""
a = NodeExecutionStats(provider_cost_type="tokens")
b = NodeExecutionStats()
a += b
assert a.provider_cost_type == "tokens"

View File

@@ -0,0 +1,390 @@
import asyncio
import json
import logging
from datetime import datetime, timedelta, timezone
from typing import Any
from pydantic import BaseModel
from backend.data.db import execute_raw_with_schema, query_raw_with_schema
logger = logging.getLogger(__name__)
MICRODOLLARS_PER_USD = 1_000_000
# Dashboard query limits — keep in sync with the SQL queries below
MAX_PROVIDER_ROWS = 500
MAX_USER_ROWS = 100
# Default date range for dashboard queries when no start date is provided.
# Prevents full-table scans on large deployments.
DEFAULT_DASHBOARD_DAYS = 30
def usd_to_microdollars(cost_usd: float | None) -> int | None:
"""Convert a USD amount (float) to microdollars (int). None-safe."""
if cost_usd is None:
return None
return round(cost_usd * MICRODOLLARS_PER_USD)
class PlatformCostEntry(BaseModel):
user_id: str
graph_exec_id: str | None = None
node_exec_id: str | None = None
graph_id: str | None = None
node_id: str | None = None
block_id: str
block_name: str
provider: str
credential_id: str
cost_microdollars: int | None = None
input_tokens: int | None = None
output_tokens: int | None = None
data_size: int | None = None
duration: float | None = None
model: str | None = None
tracking_type: str | None = None
tracking_amount: float | None = None
metadata: dict[str, Any] | None = None
async def log_platform_cost(entry: PlatformCostEntry) -> None:
await execute_raw_with_schema(
"""
INSERT INTO {schema_prefix}"PlatformCostLog"
("id", "createdAt", "userId", "graphExecId", "nodeExecId",
"graphId", "nodeId", "blockId", "blockName", "provider",
"credentialId", "costMicrodollars", "inputTokens", "outputTokens",
"dataSize", "duration", "model", "trackingType", "trackingAmount",
"metadata")
VALUES (
gen_random_uuid(), NOW(), $1, $2, $3, $4, $5, $6, $7, $8, $9,
$10, $11, $12, $13, $14, $15, $16, $17, $18::jsonb
)
""",
entry.user_id,
entry.graph_exec_id,
entry.node_exec_id,
entry.graph_id,
entry.node_id,
entry.block_id,
entry.block_name,
# Normalize to lowercase so the (provider, createdAt) index is always
# used without LOWER() on the read side.
entry.provider.lower(),
entry.credential_id,
entry.cost_microdollars,
entry.input_tokens,
entry.output_tokens,
entry.data_size,
entry.duration,
entry.model,
entry.tracking_type,
entry.tracking_amount,
_json_or_none(entry.metadata),
)
# Bound the number of concurrent cost-log DB inserts to prevent unbounded
# task/connection growth under sustained load or DB slowness.
_log_semaphore = asyncio.Semaphore(50)
async def log_platform_cost_safe(entry: PlatformCostEntry) -> None:
"""Fire-and-forget wrapper that never raises."""
try:
async with _log_semaphore:
await log_platform_cost(entry)
except Exception:
logger.exception(
"Failed to log platform cost for user=%s provider=%s block=%s",
entry.user_id,
entry.provider,
entry.block_name,
)
def _json_or_none(data: dict[str, Any] | None) -> str | None:
if data is None:
return None
return json.dumps(data)
def _mask_email(email: str | None) -> str | None:
"""Mask an email address to reduce PII exposure in admin API responses.
Turns 'user@example.com' into 'us***@example.com'.
Handles short local parts gracefully (e.g. 'a@b.com''a***@b.com').
"""
if not email:
return email
at = email.find("@")
if at < 0:
return "***"
local = email[:at]
domain = email[at:]
visible = local[:2] if len(local) >= 2 else local[:1]
return f"{visible}***{domain}"
class ProviderCostSummary(BaseModel):
provider: str
tracking_type: str | None = None
total_cost_microdollars: int
total_input_tokens: int
total_output_tokens: int
total_duration_seconds: float = 0.0
total_tracking_amount: float = 0.0
request_count: int
class UserCostSummary(BaseModel):
user_id: str | None = None
email: str | None = None
total_cost_microdollars: int
total_input_tokens: int
total_output_tokens: int
request_count: int
class CostLogRow(BaseModel):
id: str
created_at: datetime
user_id: str | None = None
email: str | None = None
graph_exec_id: str | None = None
node_exec_id: str | None = None
block_name: str
provider: str
tracking_type: str | None = None
cost_microdollars: int | None = None
input_tokens: int | None = None
output_tokens: int | None = None
duration: float | None = None
model: str | None = None
class PlatformCostDashboard(BaseModel):
by_provider: list[ProviderCostSummary]
by_user: list[UserCostSummary]
total_cost_microdollars: int
total_requests: int
total_users: int
def _build_where(
start: datetime | None,
end: datetime | None,
provider: str | None,
user_id: str | None,
table_alias: str = "",
) -> tuple[str, list[Any]]:
prefix = f"{table_alias}." if table_alias else ""
clauses: list[str] = []
params: list[Any] = []
idx = 1
if start:
clauses.append(f'{prefix}"createdAt" >= ${idx}::timestamptz')
params.append(start)
idx += 1
if end:
clauses.append(f'{prefix}"createdAt" <= ${idx}::timestamptz')
params.append(end)
idx += 1
if provider:
# Provider names are normalized to lowercase at write time so a plain
# equality check is sufficient and the (provider, createdAt) index is used.
clauses.append(f'{prefix}"provider" = ${idx}')
params.append(provider.lower())
idx += 1
if user_id:
clauses.append(f'{prefix}"userId" = ${idx}')
params.append(user_id)
idx += 1
return (" AND ".join(clauses) if clauses else "TRUE", params)
async def get_platform_cost_dashboard(
start: datetime | None = None,
end: datetime | None = None,
provider: str | None = None,
user_id: str | None = None,
) -> PlatformCostDashboard:
"""Aggregate platform cost logs for the admin dashboard.
Note: by_provider rows are keyed on (provider, tracking_type). A single
provider can therefore appear in multiple rows if it has entries with
different billing models (e.g. "openai" with both "tokens" and "cost_usd"
if pricing is later added for some entries). Frontend treats each row
independently rather than as a provider primary key.
Defaults to the last DEFAULT_DASHBOARD_DAYS days when no start date is
provided to avoid full-table scans on large deployments.
"""
if start is None:
start = datetime.now(timezone.utc) - timedelta(days=DEFAULT_DASHBOARD_DAYS)
where_p, params_p = _build_where(start, end, provider, user_id, "p")
by_provider_rows, by_user_rows, total_user_rows = await asyncio.gather(
query_raw_with_schema(
f"""
SELECT
p."provider",
p."trackingType" AS tracking_type,
COALESCE(SUM(p."costMicrodollars"), 0)::bigint AS total_cost,
COALESCE(SUM(p."inputTokens"), 0)::bigint AS total_input_tokens,
COALESCE(SUM(p."outputTokens"), 0)::bigint AS total_output_tokens,
COALESCE(SUM(p."duration"), 0)::float AS total_duration,
COALESCE(SUM(p."trackingAmount"), 0)::float AS total_tracking_amount,
COUNT(*)::bigint AS request_count
FROM {{schema_prefix}}"PlatformCostLog" p
WHERE {where_p}
GROUP BY p."provider", p."trackingType"
ORDER BY total_cost DESC
LIMIT {MAX_PROVIDER_ROWS}
""",
*params_p,
),
query_raw_with_schema(
f"""
SELECT
p."userId" AS user_id,
u."email",
COALESCE(SUM(p."costMicrodollars"), 0)::bigint AS total_cost,
COALESCE(SUM(p."inputTokens"), 0)::bigint AS total_input_tokens,
COALESCE(SUM(p."outputTokens"), 0)::bigint AS total_output_tokens,
COUNT(*)::bigint AS request_count
FROM {{schema_prefix}}"PlatformCostLog" p
LEFT JOIN {{schema_prefix}}"User" u ON u."id" = p."userId"
WHERE {where_p}
GROUP BY p."userId", u."email"
ORDER BY total_cost DESC
LIMIT {MAX_USER_ROWS}
""",
*params_p,
),
query_raw_with_schema(
f"""
SELECT COUNT(DISTINCT p."userId")::bigint AS cnt
FROM {{schema_prefix}}"PlatformCostLog" p
WHERE {where_p}
""",
*params_p,
),
)
# Use the exact COUNT(DISTINCT userId) so total_users is not capped at
# MAX_USER_ROWS (which would silently report 100 for >100 active users).
total_users = int(total_user_rows[0]["cnt"]) if total_user_rows else 0
total_cost = sum(r["total_cost"] for r in by_provider_rows)
total_requests = sum(r["request_count"] for r in by_provider_rows)
return PlatformCostDashboard(
by_provider=[
ProviderCostSummary(
provider=r["provider"],
tracking_type=r.get("tracking_type"),
total_cost_microdollars=r["total_cost"],
total_input_tokens=r["total_input_tokens"],
total_output_tokens=r["total_output_tokens"],
total_duration_seconds=r.get("total_duration", 0.0),
total_tracking_amount=r.get("total_tracking_amount", 0.0),
request_count=r["request_count"],
)
for r in by_provider_rows
],
by_user=[
UserCostSummary(
user_id=r.get("user_id"),
email=_mask_email(r.get("email")),
total_cost_microdollars=r["total_cost"],
total_input_tokens=r["total_input_tokens"],
total_output_tokens=r["total_output_tokens"],
request_count=r["request_count"],
)
for r in by_user_rows
],
total_cost_microdollars=total_cost,
total_requests=total_requests,
total_users=total_users,
)
async def get_platform_cost_logs(
start: datetime | None = None,
end: datetime | None = None,
provider: str | None = None,
user_id: str | None = None,
page: int = 1,
page_size: int = 50,
) -> tuple[list[CostLogRow], int]:
if start is None:
start = datetime.now(tz=timezone.utc) - timedelta(days=DEFAULT_DASHBOARD_DAYS)
where_sql, params = _build_where(start, end, provider, user_id, "p")
offset = (page - 1) * page_size
limit_idx = len(params) + 1
offset_idx = len(params) + 2
count_rows, rows = await asyncio.gather(
query_raw_with_schema(
f"""
SELECT COUNT(*)::bigint AS cnt
FROM {{schema_prefix}}"PlatformCostLog" p
WHERE {where_sql}
""",
*params,
),
query_raw_with_schema(
f"""
SELECT
p."id",
p."createdAt" AS created_at,
p."userId" AS user_id,
u."email",
p."graphExecId" AS graph_exec_id,
p."nodeExecId" AS node_exec_id,
p."blockName" AS block_name,
p."provider",
p."trackingType" AS tracking_type,
p."costMicrodollars" AS cost_microdollars,
p."inputTokens" AS input_tokens,
p."outputTokens" AS output_tokens,
p."duration",
p."model"
FROM {{schema_prefix}}"PlatformCostLog" p
LEFT JOIN {{schema_prefix}}"User" u ON u."id" = p."userId"
WHERE {where_sql}
ORDER BY p."createdAt" DESC, p."id" DESC
LIMIT ${limit_idx} OFFSET ${offset_idx}
""",
*params,
page_size,
offset,
),
)
total = count_rows[0]["cnt"] if count_rows else 0
logs = [
CostLogRow(
id=r["id"],
created_at=r["created_at"],
user_id=r.get("user_id"),
email=_mask_email(r.get("email")),
graph_exec_id=r.get("graph_exec_id"),
node_exec_id=r.get("node_exec_id"),
block_name=r["block_name"],
provider=r["provider"],
tracking_type=r.get("tracking_type"),
cost_microdollars=r.get("cost_microdollars"),
input_tokens=r.get("input_tokens"),
output_tokens=r.get("output_tokens"),
duration=r.get("duration"),
model=r.get("model"),
)
for r in rows
]
return logs, total

View File

@@ -0,0 +1,266 @@
"""Unit tests for helpers and async functions in platform_cost module."""
from datetime import datetime, timezone
from unittest.mock import AsyncMock, patch
import pytest
from .platform_cost import (
PlatformCostEntry,
_build_where,
_json_or_none,
get_platform_cost_dashboard,
get_platform_cost_logs,
log_platform_cost,
log_platform_cost_safe,
)
class TestJsonOrNone:
def test_returns_none_for_none(self):
assert _json_or_none(None) is None
def test_returns_json_string_for_dict(self):
result = _json_or_none({"key": "value", "num": 42})
assert result is not None
assert '"key"' in result
assert '"value"' in result
def test_returns_json_for_empty_dict(self):
assert _json_or_none({}) == "{}"
class TestBuildWhere:
def test_no_filters_returns_true(self):
sql, params = _build_where(None, None, None, None)
assert sql == "TRUE"
assert params == []
def test_start_only(self):
dt = datetime(2026, 1, 1, tzinfo=timezone.utc)
sql, params = _build_where(dt, None, None, None)
assert '"createdAt" >= $1::timestamptz' in sql
assert params == [dt]
def test_end_only(self):
dt = datetime(2026, 6, 1, tzinfo=timezone.utc)
sql, params = _build_where(None, dt, None, None)
assert '"createdAt" <= $1::timestamptz' in sql
assert params == [dt]
def test_provider_only(self):
# Provider names are normalized to lowercase at write time, so the
# filter uses a plain equality check. The input is also lowercased so
# "OpenAI" and "openai" both match stored rows.
sql, params = _build_where(None, None, "OpenAI", None)
assert '"provider" = $1' in sql
assert params == ["openai"]
def test_user_id_only(self):
sql, params = _build_where(None, None, None, "user-123")
assert '"userId" = $1' in sql
assert params == ["user-123"]
def test_all_filters(self):
start = datetime(2026, 1, 1, tzinfo=timezone.utc)
end = datetime(2026, 6, 1, tzinfo=timezone.utc)
sql, params = _build_where(start, end, "Anthropic", "u1")
assert "$1" in sql
assert "$2" in sql
assert "$3" in sql
assert "$4" in sql
assert len(params) == 4
# Provider is lowercased at filter time to match stored lowercase values.
assert params == [start, end, "anthropic", "u1"]
def test_table_alias(self):
dt = datetime(2026, 1, 1, tzinfo=timezone.utc)
sql, params = _build_where(dt, None, None, None, table_alias="p")
assert 'p."createdAt"' in sql
assert params == [dt]
def test_clauses_joined_with_and(self):
start = datetime(2026, 1, 1, tzinfo=timezone.utc)
end = datetime(2026, 6, 1, tzinfo=timezone.utc)
sql, _ = _build_where(start, end, None, None)
assert " AND " in sql
def _make_entry(**overrides: object) -> PlatformCostEntry:
return PlatformCostEntry.model_validate(
{
"user_id": "user-1",
"block_id": "block-1",
"block_name": "TestBlock",
"provider": "openai",
"credential_id": "cred-1",
**overrides,
}
)
class TestLogPlatformCost:
@pytest.mark.asyncio
async def test_calls_execute_raw_with_schema(self):
mock_exec = AsyncMock()
with patch("backend.data.platform_cost.execute_raw_with_schema", new=mock_exec):
entry = _make_entry(
input_tokens=100,
output_tokens=50,
cost_microdollars=5000,
model="gpt-4",
metadata={"key": "val"},
)
await log_platform_cost(entry)
mock_exec.assert_awaited_once()
args = mock_exec.call_args
assert args[0][1] == "user-1" # user_id is first param
assert args[0][6] == "block-1" # block_id
assert args[0][7] == "TestBlock" # block_name
@pytest.mark.asyncio
async def test_metadata_none_passes_none(self):
mock_exec = AsyncMock()
with patch("backend.data.platform_cost.execute_raw_with_schema", new=mock_exec):
entry = _make_entry(metadata=None)
await log_platform_cost(entry)
args = mock_exec.call_args
assert args[0][-1] is None # last arg is metadata json
class TestLogPlatformCostSafe:
@pytest.mark.asyncio
async def test_does_not_raise_on_error(self):
with patch(
"backend.data.platform_cost.execute_raw_with_schema",
new=AsyncMock(side_effect=RuntimeError("DB down")),
):
entry = _make_entry()
await log_platform_cost_safe(entry)
@pytest.mark.asyncio
async def test_succeeds_when_no_error(self):
mock_exec = AsyncMock()
with patch("backend.data.platform_cost.execute_raw_with_schema", new=mock_exec):
entry = _make_entry()
await log_platform_cost_safe(entry)
mock_exec.assert_awaited_once()
class TestGetPlatformCostDashboard:
@pytest.mark.asyncio
async def test_returns_dashboard_with_data(self):
provider_rows = [
{
"provider": "openai",
"tracking_type": "tokens",
"total_cost": 5000,
"total_input_tokens": 1000,
"total_output_tokens": 500,
"total_duration": 10.5,
"request_count": 3,
}
]
user_rows = [
{
"user_id": "u1",
"email": "a@b.com",
"total_cost": 5000,
"total_input_tokens": 1000,
"total_output_tokens": 500,
"request_count": 3,
}
]
# Dashboard runs 3 queries: by_provider, by_user, COUNT(DISTINCT userId).
mock_query = AsyncMock(side_effect=[provider_rows, user_rows, [{"cnt": 1}]])
with patch("backend.data.platform_cost.query_raw_with_schema", new=mock_query):
dashboard = await get_platform_cost_dashboard()
assert dashboard.total_cost_microdollars == 5000
assert dashboard.total_requests == 3
assert dashboard.total_users == 1
assert len(dashboard.by_provider) == 1
assert dashboard.by_provider[0].provider == "openai"
assert dashboard.by_provider[0].tracking_type == "tokens"
assert dashboard.by_provider[0].total_duration_seconds == 10.5
assert len(dashboard.by_user) == 1
assert dashboard.by_user[0].email == "a***@b.com"
@pytest.mark.asyncio
async def test_returns_empty_dashboard(self):
mock_query = AsyncMock(side_effect=[[], [], []])
with patch("backend.data.platform_cost.query_raw_with_schema", new=mock_query):
dashboard = await get_platform_cost_dashboard()
assert dashboard.total_cost_microdollars == 0
assert dashboard.total_requests == 0
assert dashboard.total_users == 0
assert dashboard.by_provider == []
assert dashboard.by_user == []
@pytest.mark.asyncio
async def test_passes_filters_to_queries(self):
start = datetime(2026, 1, 1, tzinfo=timezone.utc)
mock_query = AsyncMock(side_effect=[[], [], []])
with patch("backend.data.platform_cost.query_raw_with_schema", new=mock_query):
await get_platform_cost_dashboard(
start=start, provider="openai", user_id="u1"
)
assert mock_query.await_count == 3
first_call_sql = mock_query.call_args_list[0][0][0]
assert "createdAt" in first_call_sql
class TestGetPlatformCostLogs:
@pytest.mark.asyncio
async def test_returns_logs_and_total(self):
count_rows = [{"cnt": 1}]
log_rows = [
{
"id": "log-1",
"created_at": datetime(2026, 3, 1, tzinfo=timezone.utc),
"user_id": "u1",
"email": "a@b.com",
"graph_exec_id": "g1",
"node_exec_id": "n1",
"block_name": "TestBlock",
"provider": "openai",
"tracking_type": "tokens",
"cost_microdollars": 5000,
"input_tokens": 100,
"output_tokens": 50,
"duration": 1.5,
"model": "gpt-4",
}
]
mock_query = AsyncMock(side_effect=[count_rows, log_rows])
with patch("backend.data.platform_cost.query_raw_with_schema", new=mock_query):
logs, total = await get_platform_cost_logs(page=1, page_size=10)
assert total == 1
assert len(logs) == 1
assert logs[0].id == "log-1"
assert logs[0].provider == "openai"
assert logs[0].model == "gpt-4"
@pytest.mark.asyncio
async def test_returns_empty_when_no_data(self):
mock_query = AsyncMock(side_effect=[[{"cnt": 0}], []])
with patch("backend.data.platform_cost.query_raw_with_schema", new=mock_query):
logs, total = await get_platform_cost_logs()
assert total == 0
assert logs == []
@pytest.mark.asyncio
async def test_pagination_offset(self):
mock_query = AsyncMock(side_effect=[[{"cnt": 100}], []])
with patch("backend.data.platform_cost.query_raw_with_schema", new=mock_query):
logs, total = await get_platform_cost_logs(page=3, page_size=25)
assert total == 100
second_call_args = mock_query.call_args_list[1][0]
assert 25 in second_call_args # page_size
assert 50 in second_call_args # offset = (3-1) * 25
@pytest.mark.asyncio
async def test_empty_count_returns_zero(self):
mock_query = AsyncMock(side_effect=[[], []])
with patch("backend.data.platform_cost.query_raw_with_schema", new=mock_query):
logs, total = await get_platform_cost_logs()
assert total == 0

View File

@@ -82,6 +82,28 @@ async def get_user_by_email(email: str) -> Optional[User]:
raise DatabaseError(f"Failed to get user by email {email}: {e}") from e
async def search_users(query: str, limit: int = 20) -> list[tuple[str, str | None]]:
"""Search users by partial email or name.
Returns a list of ``(user_id, email)`` tuples, up to *limit* results.
Searches the User table directly — no dependency on credit history.
"""
query = query.strip()
if not query or len(query) < 3:
return []
users = await prisma.user.find_many(
where={
"OR": [
{"email": {"contains": query, "mode": "insensitive"}},
{"name": {"contains": query, "mode": "insensitive"}},
],
},
take=limit,
order={"email": "asc"},
)
return [(u.id, u.email) for u in users]
async def update_user_email(user_id: str, email: str):
try:
# Get old email first for cache invalidation

View File

@@ -0,0 +1,291 @@
"""Helpers for platform cost tracking on system-credential block executions."""
import asyncio
import logging
import threading
from typing import TYPE_CHECKING, Any, cast
from backend.blocks._base import Block, BlockSchema
from backend.copilot.token_tracking import _pending_log_tasks as _copilot_tasks
from backend.copilot.token_tracking import (
_pending_log_tasks_lock as _copilot_tasks_lock,
)
from backend.data.execution import NodeExecutionEntry
from backend.data.model import NodeExecutionStats
from backend.data.platform_cost import PlatformCostEntry, usd_to_microdollars
from backend.executor.utils import block_usage_cost
from backend.integrations.credentials_store import is_system_credential
from backend.integrations.providers import ProviderName
if TYPE_CHECKING:
from backend.data.db_manager import DatabaseManagerAsyncClient
logger = logging.getLogger(__name__)
# Provider groupings by billing model — used when the block didn't explicitly
# declare stats.provider_cost_type and we fall back to provider-name
# heuristics. Values match ProviderName enum values.
_CHARACTER_BILLED_PROVIDERS = frozenset(
{ProviderName.D_ID.value, ProviderName.ELEVENLABS.value}
)
_WALLTIME_BILLED_PROVIDERS = frozenset(
{
ProviderName.FAL.value,
ProviderName.REVID.value,
ProviderName.REPLICATE.value,
}
)
# Hold strong references to in-flight log tasks so the event loop doesn't
# garbage-collect them mid-execution. Tasks remove themselves on completion.
# _pending_log_tasks_lock guards all reads and writes: worker threads call
# discard() via done callbacks while drain_pending_cost_logs() iterates.
_pending_log_tasks: set[asyncio.Task] = set()
_pending_log_tasks_lock = threading.Lock()
# Per-loop semaphores: asyncio.Semaphore is not thread-safe and must not be
# shared across event loops running in different threads. Key by loop instance
# so each executor worker thread gets its own semaphore.
_log_semaphores: dict[asyncio.AbstractEventLoop, asyncio.Semaphore] = {}
def _get_log_semaphore() -> asyncio.Semaphore:
loop = asyncio.get_running_loop()
sem = _log_semaphores.get(loop)
if sem is None:
sem = asyncio.Semaphore(50)
_log_semaphores[loop] = sem
return sem
async def drain_pending_cost_logs(timeout: float = 5.0) -> None:
"""Await all in-flight cost log tasks with a timeout.
Drains both the executor cost log tasks (_pending_log_tasks in this module,
used for block execution cost tracking via DatabaseManagerAsyncClient) and
the copilot cost log tasks (token_tracking._pending_log_tasks, used for
copilot LLM turns via platform_cost_db()).
Call this during graceful shutdown to flush pending INSERT tasks before
the process exits. Tasks that don't complete within `timeout` seconds are
abandoned and their failures are already logged by _safe_log.
"""
# asyncio.wait() requires all tasks to belong to the running event loop.
# _pending_log_tasks is shared across executor worker threads (each with
# its own loop), so filter to only tasks owned by the current loop.
# Acquire the lock to take a consistent snapshot (worker threads call
# discard() via done callbacks concurrently with this iteration).
current_loop = asyncio.get_running_loop()
with _pending_log_tasks_lock:
all_pending = [t for t in _pending_log_tasks if t.get_loop() is current_loop]
if all_pending:
logger.info("Draining %d executor cost log task(s)", len(all_pending))
_, still_pending = await asyncio.wait(all_pending, timeout=timeout)
if still_pending:
logger.warning(
"%d executor cost log task(s) did not complete within %.1fs",
len(still_pending),
timeout,
)
# Also drain copilot cost log tasks (token_tracking._pending_log_tasks)
with _copilot_tasks_lock:
copilot_pending = [t for t in _copilot_tasks if t.get_loop() is current_loop]
if copilot_pending:
logger.info("Draining %d copilot cost log task(s)", len(copilot_pending))
_, still_pending = await asyncio.wait(copilot_pending, timeout=timeout)
if still_pending:
logger.warning(
"%d copilot cost log task(s) did not complete within %.1fs",
len(still_pending),
timeout,
)
def _schedule_log(
db_client: "DatabaseManagerAsyncClient", entry: PlatformCostEntry
) -> None:
async def _safe_log() -> None:
async with _get_log_semaphore():
try:
await db_client.log_platform_cost(entry)
except Exception:
logger.exception(
"Failed to log platform cost for user=%s provider=%s block=%s",
entry.user_id,
entry.provider,
entry.block_name,
)
task = asyncio.create_task(_safe_log())
with _pending_log_tasks_lock:
_pending_log_tasks.add(task)
def _remove(t: asyncio.Task) -> None:
with _pending_log_tasks_lock:
_pending_log_tasks.discard(t)
task.add_done_callback(_remove)
def _extract_model_name(raw: str | dict | None) -> str | None:
"""Return a string model name from a block input field, or None.
Handles str (returned as-is), dict (e.g. an enum wrapper, skipped), and
None (no model field). Unexpected types are coerced to str as a fallback.
"""
if raw is None:
return None
if isinstance(raw, str):
return raw
if isinstance(raw, dict):
return None
return str(raw)
def resolve_tracking(
provider: str,
stats: NodeExecutionStats,
input_data: dict[str, Any],
) -> tuple[str, float]:
"""Return (tracking_type, tracking_amount) based on provider billing model.
Preference order:
1. Block-declared: if the block set `provider_cost_type` on its stats,
honor it directly (paired with `provider_cost` as the amount).
2. Heuristic fallback: infer from `provider_cost`/token counts, then
from provider name for per-character / per-second billing.
"""
# 1. Block explicitly declared its cost type (only when an amount is present)
if stats.provider_cost_type and stats.provider_cost is not None:
return stats.provider_cost_type, stats.provider_cost
# 2. Provider returned actual USD cost (OpenRouter, Exa)
if stats.provider_cost is not None:
return "cost_usd", stats.provider_cost
# 3. LLM providers: track by tokens
if stats.input_token_count or stats.output_token_count:
return "tokens", float(
(stats.input_token_count or 0) + (stats.output_token_count or 0)
)
# 4. Provider-specific billing heuristics
# TTS: billed per character of input text
if provider == ProviderName.UNREAL_SPEECH.value:
text = input_data.get("text", "")
return "characters", float(len(text)) if isinstance(text, str) else 0.0
# D-ID + ElevenLabs voice: billed per character of script
if provider in _CHARACTER_BILLED_PROVIDERS:
text = (
input_data.get("script_input", "")
or input_data.get("text", "")
or input_data.get("script", "") # VideoNarrationBlock uses `script`
)
return "characters", float(len(text)) if isinstance(text, str) else 0.0
# E2B: billed per second of sandbox time
if provider == ProviderName.E2B.value:
return "sandbox_seconds", round(stats.walltime, 3) if stats.walltime else 0.0
# Video/image gen: walltime includes queue + generation + polling
if provider in _WALLTIME_BILLED_PROVIDERS:
return "walltime_seconds", round(stats.walltime, 3) if stats.walltime else 0.0
# Per-request: Google Maps, Ideogram, Nvidia, Apollo, etc.
# All billed per API call - count 1 per block execution.
return "per_run", 1.0
async def log_system_credential_cost(
node_exec: NodeExecutionEntry,
block: Block,
stats: NodeExecutionStats,
db_client: "DatabaseManagerAsyncClient",
) -> None:
"""Check if a system credential was used and log the platform cost.
Routes through DatabaseManagerAsyncClient so the write goes via the
message-passing DB service rather than calling Prisma directly (which
is not connected in the executor process).
Logs only the first matching system credential field (one log per
execution). Any unexpected error is caught and logged — cost logging
is strictly best-effort and must never disrupt block execution.
Note: costMicrodollars is left null for providers that don't return
a USD cost. The credit_cost in metadata captures our internal credit
charge as a proxy.
"""
try:
if node_exec.execution_context.dry_run:
return
input_data = node_exec.inputs
input_model = cast(type[BlockSchema], block.input_schema)
for field_name in input_model.get_credentials_fields():
cred_data = input_data.get(field_name)
if not cred_data or not isinstance(cred_data, dict):
continue
cred_id = cred_data.get("id", "")
if not cred_id or not is_system_credential(cred_id):
continue
model_name = _extract_model_name(input_data.get("model"))
credit_cost, _ = block_usage_cost(block=block, input_data=input_data)
provider_name = cred_data.get("provider", "unknown")
tracking_type, tracking_amount = resolve_tracking(
provider=provider_name,
stats=stats,
input_data=input_data,
)
# Only treat provider_cost as USD when the tracking type says so.
# For other types (items, characters, per_run, ...) the
# provider_cost field holds the raw amount, not a dollar value.
# Use tracking_amount (the normalized value from resolve_tracking)
# rather than raw stats.provider_cost to avoid unit mismatches.
cost_microdollars = None
if tracking_type == "cost_usd":
cost_microdollars = usd_to_microdollars(tracking_amount)
meta: dict[str, Any] = {
"tracking_type": tracking_type,
"tracking_amount": tracking_amount,
}
if credit_cost is not None:
meta["credit_cost"] = credit_cost
if stats.provider_cost is not None:
# Use 'provider_cost_raw' — the value's unit varies by tracking
# type (USD for cost_usd, count for items/characters/per_run, etc.)
meta["provider_cost_raw"] = stats.provider_cost
_schedule_log(
db_client,
PlatformCostEntry(
user_id=node_exec.user_id,
graph_exec_id=node_exec.graph_exec_id,
node_exec_id=node_exec.node_exec_id,
graph_id=node_exec.graph_id,
node_id=node_exec.node_id,
block_id=node_exec.block_id,
block_name=block.name,
provider=provider_name,
credential_id=cred_id,
cost_microdollars=cost_microdollars,
input_tokens=stats.input_token_count,
output_tokens=stats.output_token_count,
data_size=stats.output_size if stats.output_size > 0 else None,
duration=stats.walltime if stats.walltime > 0 else None,
model=model_name,
tracking_type=tracking_type,
tracking_amount=tracking_amount,
metadata=meta,
),
)
return # One log per execution is enough
except Exception:
logger.exception("log_system_credential_cost failed unexpectedly")

View File

@@ -45,6 +45,10 @@ from backend.data.notifications import (
ZeroBalanceData,
)
from backend.data.rabbitmq import SyncRabbitMQ
from backend.executor.cost_tracking import (
drain_pending_cost_logs,
log_system_credential_cost,
)
from backend.integrations.creds_manager import IntegrationCredentialsManager
from backend.notifications.notifications import queue_notification
from backend.util import json
@@ -692,6 +696,15 @@ class ExecutionProcessor:
stats=graph_stats,
)
# Log platform cost if system credentials were used (only on success)
if status == ExecutionStatus.COMPLETED:
await log_system_credential_cost(
node_exec=node_exec,
block=node.block,
stats=execution_stats,
db_client=db_client,
)
return execution_stats
@async_time_measured
@@ -2044,6 +2057,18 @@ class ExecutionManager(AppProcess):
prefix + " [cancel-consumer]",
)
# Drain any in-flight cost log tasks before exit so we don't silently
# drop INSERT operations during deployments.
loop = getattr(self, "node_execution_loop", None)
if loop is not None and loop.is_running():
try:
asyncio.run_coroutine_threadsafe(
drain_pending_cost_logs(), loop
).result(timeout=10)
logger.info(f"{prefix} ✅ Cost log tasks drained")
except Exception as e:
logger.warning(f"{prefix} ⚠️ Failed to drain cost log tasks: {e}")
logger.info(f"{prefix} ✅ Finished GraphExec cleanup")
super().cleanup()

View File

@@ -0,0 +1,567 @@
"""Unit tests for resolve_tracking and log_system_credential_cost."""
import asyncio
from typing import Any
from unittest.mock import AsyncMock, MagicMock, patch
import pytest
from backend.data.execution import ExecutionContext, NodeExecutionEntry
from backend.data.model import NodeExecutionStats
from backend.executor.cost_tracking import log_system_credential_cost, resolve_tracking
# ---------------------------------------------------------------------------
# resolve_tracking
# ---------------------------------------------------------------------------
class TestResolveTracking:
def _stats(self, **overrides: Any) -> NodeExecutionStats:
return NodeExecutionStats(**overrides)
def test_provider_cost_returns_cost_usd(self):
stats = self._stats(provider_cost=0.0042)
tt, amt = resolve_tracking("openai", stats, {})
assert tt == "cost_usd"
assert amt == 0.0042
def test_token_counts_return_tokens(self):
stats = self._stats(input_token_count=300, output_token_count=100)
tt, amt = resolve_tracking("anthropic", stats, {})
assert tt == "tokens"
assert amt == 400.0
def test_token_counts_only_input(self):
stats = self._stats(input_token_count=500)
tt, amt = resolve_tracking("groq", stats, {})
assert tt == "tokens"
assert amt == 500.0
def test_unreal_speech_returns_characters(self):
stats = self._stats()
tt, amt = resolve_tracking("unreal_speech", stats, {"text": "Hello world"})
assert tt == "characters"
assert amt == 11.0
def test_unreal_speech_empty_text(self):
stats = self._stats()
tt, amt = resolve_tracking("unreal_speech", stats, {"text": ""})
assert tt == "characters"
assert amt == 0.0
def test_unreal_speech_non_string_text(self):
stats = self._stats()
tt, amt = resolve_tracking("unreal_speech", stats, {"text": 123})
assert tt == "characters"
assert amt == 0.0
def test_d_id_uses_script_input(self):
stats = self._stats()
tt, amt = resolve_tracking("d_id", stats, {"script_input": "Hello"})
assert tt == "characters"
assert amt == 5.0
def test_elevenlabs_uses_text(self):
stats = self._stats()
tt, amt = resolve_tracking("elevenlabs", stats, {"text": "Say this"})
assert tt == "characters"
assert amt == 8.0
def test_elevenlabs_fallback_to_text_when_no_script_input(self):
stats = self._stats()
tt, amt = resolve_tracking("elevenlabs", stats, {"text": "Fallback text"})
assert tt == "characters"
assert amt == 13.0
def test_elevenlabs_uses_script_field(self):
"""VideoNarrationBlock (elevenlabs) uses `script` field, not script_input/text."""
stats = self._stats()
tt, amt = resolve_tracking("elevenlabs", stats, {"script": "Narration"})
assert tt == "characters"
assert amt == 9.0
def test_block_declared_cost_type_items(self):
"""Block explicitly setting provider_cost_type='items' short-circuits heuristics."""
stats = self._stats(provider_cost=5.0, provider_cost_type="items")
tt, amt = resolve_tracking("google_maps", stats, {})
assert tt == "items"
assert amt == 5.0
def test_block_declared_cost_type_characters(self):
"""TTS block can declare characters directly, bypassing input_data lookup."""
stats = self._stats(provider_cost=42.0, provider_cost_type="characters")
tt, amt = resolve_tracking("unreal_speech", stats, {})
assert tt == "characters"
assert amt == 42.0
def test_block_declared_cost_type_wins_over_tokens(self):
"""provider_cost_type takes precedence over token-based heuristic."""
stats = self._stats(
provider_cost=1.0,
provider_cost_type="per_run",
input_token_count=500,
)
tt, amt = resolve_tracking("openai", stats, {})
assert tt == "per_run"
assert amt == 1.0
def test_e2b_returns_sandbox_seconds(self):
stats = self._stats(walltime=45.123)
tt, amt = resolve_tracking("e2b", stats, {})
assert tt == "sandbox_seconds"
assert amt == 45.123
def test_e2b_no_walltime(self):
stats = self._stats(walltime=0)
tt, amt = resolve_tracking("e2b", stats, {})
assert tt == "sandbox_seconds"
assert amt == 0.0
def test_fal_returns_walltime(self):
stats = self._stats(walltime=12.5)
tt, amt = resolve_tracking("fal", stats, {})
assert tt == "walltime_seconds"
assert amt == 12.5
def test_revid_returns_walltime(self):
stats = self._stats(walltime=60.0)
tt, amt = resolve_tracking("revid", stats, {})
assert tt == "walltime_seconds"
assert amt == 60.0
def test_replicate_returns_walltime(self):
stats = self._stats(walltime=30.0)
tt, amt = resolve_tracking("replicate", stats, {})
assert tt == "walltime_seconds"
assert amt == 30.0
def test_unknown_provider_returns_per_run(self):
stats = self._stats()
tt, amt = resolve_tracking("google_maps", stats, {})
assert tt == "per_run"
assert amt == 1.0
def test_provider_cost_takes_precedence_over_tokens(self):
stats = self._stats(
provider_cost=0.01, input_token_count=500, output_token_count=200
)
tt, amt = resolve_tracking("openai", stats, {})
assert tt == "cost_usd"
assert amt == 0.01
def test_provider_cost_zero_is_not_none(self):
"""provider_cost=0.0 is falsy but should still be tracked as cost_usd
(e.g. free-tier or fully-cached responses from OpenRouter)."""
stats = self._stats(provider_cost=0.0)
tt, amt = resolve_tracking("open_router", stats, {})
assert tt == "cost_usd"
assert amt == 0.0
def test_tokens_take_precedence_over_provider_specific(self):
stats = self._stats(input_token_count=100, walltime=10.0)
tt, amt = resolve_tracking("fal", stats, {})
assert tt == "tokens"
assert amt == 100.0
# ---------------------------------------------------------------------------
# log_system_credential_cost
# ---------------------------------------------------------------------------
def _make_db_client() -> MagicMock:
db_client = MagicMock()
db_client.log_platform_cost = AsyncMock()
return db_client
def _make_block(has_credentials: bool = True) -> MagicMock:
block = MagicMock()
block.name = "TestBlock"
input_schema = MagicMock()
if has_credentials:
input_schema.get_credentials_fields.return_value = {"credentials": MagicMock()}
else:
input_schema.get_credentials_fields.return_value = {}
block.input_schema = input_schema
return block
def _make_node_exec(
inputs: dict | None = None,
dry_run: bool = False,
) -> NodeExecutionEntry:
return NodeExecutionEntry(
user_id="user-1",
graph_exec_id="gx-1",
graph_id="g-1",
graph_version=1,
node_exec_id="nx-1",
node_id="n-1",
block_id="b-1",
inputs=inputs or {},
execution_context=ExecutionContext(dry_run=dry_run),
)
class TestLogSystemCredentialCost:
@pytest.mark.asyncio
async def test_skips_dry_run(self):
db_client = _make_db_client()
node_exec = _make_node_exec(dry_run=True)
block = _make_block()
stats = NodeExecutionStats()
await log_system_credential_cost(node_exec, block, stats, db_client)
db_client.log_platform_cost.assert_not_awaited()
@pytest.mark.asyncio
async def test_skips_when_no_credential_fields(self):
db_client = _make_db_client()
node_exec = _make_node_exec(inputs={})
block = _make_block(has_credentials=False)
stats = NodeExecutionStats()
await log_system_credential_cost(node_exec, block, stats, db_client)
db_client.log_platform_cost.assert_not_awaited()
@pytest.mark.asyncio
async def test_skips_when_cred_data_missing(self):
db_client = _make_db_client()
node_exec = _make_node_exec(inputs={})
block = _make_block()
stats = NodeExecutionStats()
await log_system_credential_cost(node_exec, block, stats, db_client)
db_client.log_platform_cost.assert_not_awaited()
@pytest.mark.asyncio
async def test_skips_when_not_system_credential(self):
db_client = _make_db_client()
with patch(
"backend.executor.cost_tracking.is_system_credential",
return_value=False,
):
node_exec = _make_node_exec(
inputs={
"credentials": {"id": "user-cred-123", "provider": "openai"},
}
)
block = _make_block()
stats = NodeExecutionStats()
await log_system_credential_cost(node_exec, block, stats, db_client)
db_client.log_platform_cost.assert_not_awaited()
@pytest.mark.asyncio
async def test_logs_with_system_credential(self):
db_client = _make_db_client()
with (
patch(
"backend.executor.cost_tracking.is_system_credential", return_value=True
),
patch(
"backend.executor.cost_tracking.block_usage_cost",
return_value=(10, None),
),
):
node_exec = _make_node_exec(
inputs={
"credentials": {"id": "sys-cred-1", "provider": "openai"},
"model": "gpt-4",
}
)
block = _make_block()
stats = NodeExecutionStats(input_token_count=500, output_token_count=200)
await log_system_credential_cost(node_exec, block, stats, db_client)
await asyncio.sleep(0)
db_client.log_platform_cost.assert_awaited_once()
entry = db_client.log_platform_cost.call_args[0][0]
assert entry.user_id == "user-1"
assert entry.provider == "openai"
assert entry.block_name == "TestBlock"
assert entry.model == "gpt-4"
assert entry.input_tokens == 500
assert entry.output_tokens == 200
assert entry.tracking_type == "tokens"
assert entry.metadata["tracking_type"] == "tokens"
assert entry.metadata["tracking_amount"] == 700.0
assert entry.metadata["credit_cost"] == 10
@pytest.mark.asyncio
async def test_logs_with_provider_cost(self):
db_client = _make_db_client()
with (
patch(
"backend.executor.cost_tracking.is_system_credential", return_value=True
),
patch(
"backend.executor.cost_tracking.block_usage_cost",
return_value=(5, None),
),
):
node_exec = _make_node_exec(
inputs={
"credentials": {"id": "sys-cred-2", "provider": "open_router"},
}
)
block = _make_block()
stats = NodeExecutionStats(provider_cost=0.0015)
await log_system_credential_cost(node_exec, block, stats, db_client)
await asyncio.sleep(0)
entry = db_client.log_platform_cost.call_args[0][0]
assert entry.cost_microdollars == 1500
assert entry.tracking_type == "cost_usd"
assert entry.metadata["tracking_type"] == "cost_usd"
assert entry.metadata["provider_cost_raw"] == 0.0015
@pytest.mark.asyncio
async def test_model_name_enum_converted_to_str(self):
db_client = _make_db_client()
with (
patch(
"backend.executor.cost_tracking.is_system_credential", return_value=True
),
patch(
"backend.executor.cost_tracking.block_usage_cost",
return_value=(0, None),
),
):
from enum import Enum
class FakeModel(Enum):
GPT4 = "gpt-4"
node_exec = _make_node_exec(
inputs={
"credentials": {"id": "sys-cred", "provider": "openai"},
"model": FakeModel.GPT4,
}
)
block = _make_block()
stats = NodeExecutionStats()
await log_system_credential_cost(node_exec, block, stats, db_client)
await asyncio.sleep(0)
entry = db_client.log_platform_cost.call_args[0][0]
assert entry.model == "FakeModel.GPT4"
@pytest.mark.asyncio
async def test_model_name_dict_becomes_none(self):
db_client = _make_db_client()
with (
patch(
"backend.executor.cost_tracking.is_system_credential", return_value=True
),
patch(
"backend.executor.cost_tracking.block_usage_cost",
return_value=(0, None),
),
):
node_exec = _make_node_exec(
inputs={
"credentials": {"id": "sys-cred", "provider": "openai"},
"model": {"nested": "value"},
}
)
block = _make_block()
stats = NodeExecutionStats()
await log_system_credential_cost(node_exec, block, stats, db_client)
await asyncio.sleep(0)
entry = db_client.log_platform_cost.call_args[0][0]
assert entry.model is None
@pytest.mark.asyncio
async def test_does_not_raise_when_block_usage_cost_raises(self):
"""log_system_credential_cost must swallow exceptions from block_usage_cost."""
db_client = _make_db_client()
with (
patch(
"backend.executor.cost_tracking.is_system_credential", return_value=True
),
patch(
"backend.executor.cost_tracking.block_usage_cost",
side_effect=RuntimeError("pricing lookup failed"),
),
):
node_exec = _make_node_exec(
inputs={
"credentials": {"id": "sys-cred", "provider": "openai"},
}
)
block = _make_block()
stats = NodeExecutionStats()
# Should not raise — outer except must catch block_usage_cost error
await log_system_credential_cost(node_exec, block, stats, db_client)
@pytest.mark.asyncio
async def test_round_instead_of_int_for_microdollars(self):
db_client = _make_db_client()
with (
patch(
"backend.executor.cost_tracking.is_system_credential", return_value=True
),
patch(
"backend.executor.cost_tracking.block_usage_cost",
return_value=(0, None),
),
):
node_exec = _make_node_exec(
inputs={
"credentials": {"id": "sys-cred", "provider": "openai"},
}
)
block = _make_block()
# 0.0015 * 1_000_000 = 1499.9999999... with float math
# round() should give 1500, int() would give 1499
stats = NodeExecutionStats(provider_cost=0.0015)
await log_system_credential_cost(node_exec, block, stats, db_client)
await asyncio.sleep(0)
entry = db_client.log_platform_cost.call_args[0][0]
assert entry.cost_microdollars == 1500
@pytest.mark.asyncio
async def test_per_run_metadata_has_no_provider_cost_raw(self):
"""For per-run providers (google_maps etc), provider_cost_raw is absent
from metadata since stats.provider_cost is None."""
db_client = _make_db_client()
with (
patch(
"backend.executor.cost_tracking.is_system_credential", return_value=True
),
patch(
"backend.executor.cost_tracking.block_usage_cost",
return_value=(0, None),
),
):
node_exec = _make_node_exec(
inputs={
"credentials": {"id": "sys-cred", "provider": "google_maps"},
}
)
block = _make_block()
stats = NodeExecutionStats() # no provider_cost
await log_system_credential_cost(node_exec, block, stats, db_client)
await asyncio.sleep(0)
entry = db_client.log_platform_cost.call_args[0][0]
assert entry.tracking_type == "per_run"
assert "provider_cost_raw" not in (entry.metadata or {})
# ---------------------------------------------------------------------------
# merge_stats accumulation
# ---------------------------------------------------------------------------
class TestMergeStats:
"""Tests for NodeExecutionStats accumulation via += (used by Block.merge_stats)."""
def test_accumulates_output_size(self):
stats = NodeExecutionStats()
stats += NodeExecutionStats(output_size=10)
stats += NodeExecutionStats(output_size=25)
assert stats.output_size == 35
def test_accumulates_tokens(self):
stats = NodeExecutionStats()
stats += NodeExecutionStats(input_token_count=100, output_token_count=50)
stats += NodeExecutionStats(input_token_count=200, output_token_count=150)
assert stats.input_token_count == 300
assert stats.output_token_count == 200
def test_preserves_provider_cost(self):
stats = NodeExecutionStats()
stats += NodeExecutionStats(provider_cost=0.005)
stats += NodeExecutionStats(output_size=10)
assert stats.provider_cost == 0.005
assert stats.output_size == 10
def test_provider_cost_accumulates(self):
"""Multiple merge_stats with provider_cost should sum (multi-round
tool-calling in copilot / retries can report cost separately)."""
stats = NodeExecutionStats()
stats += NodeExecutionStats(provider_cost=0.001)
stats += NodeExecutionStats(provider_cost=0.002)
stats += NodeExecutionStats(provider_cost=0.003)
assert stats.provider_cost == pytest.approx(0.006)
def test_provider_cost_none_does_not_overwrite(self):
"""A None provider_cost must not wipe a previously-set value."""
stats = NodeExecutionStats(provider_cost=0.01)
stats += NodeExecutionStats() # provider_cost=None by default
assert stats.provider_cost == 0.01
def test_provider_cost_type_last_write_wins(self):
"""provider_cost_type is a Literal — last set value wins on merge."""
stats = NodeExecutionStats(provider_cost_type="tokens")
stats += NodeExecutionStats(provider_cost_type="items")
assert stats.provider_cost_type == "items"
# ---------------------------------------------------------------------------
# on_node_execution -> log_system_credential_cost integration
# ---------------------------------------------------------------------------
class TestManagerCostTrackingIntegration:
@pytest.mark.asyncio
async def test_log_called_with_accumulated_stats(self):
"""Verify that log_system_credential_cost receives stats that could
have been accumulated by merge_stats across multiple yield steps."""
db_client = _make_db_client()
with (
patch(
"backend.executor.cost_tracking.is_system_credential", return_value=True
),
patch(
"backend.executor.cost_tracking.block_usage_cost",
return_value=(5, None),
),
):
stats = NodeExecutionStats()
stats += NodeExecutionStats(output_size=10, input_token_count=100)
stats += NodeExecutionStats(output_size=25, input_token_count=200)
assert stats.output_size == 35
assert stats.input_token_count == 300
node_exec = _make_node_exec(
inputs={
"credentials": {"id": "sys-cred-acc", "provider": "openai"},
"model": "gpt-4",
}
)
block = _make_block()
await log_system_credential_cost(node_exec, block, stats, db_client)
await asyncio.sleep(0)
db_client.log_platform_cost.assert_awaited_once()
entry = db_client.log_platform_cost.call_args[0][0]
assert entry.input_tokens == 300
assert entry.tracking_type == "tokens"
assert entry.metadata["tracking_amount"] == 300.0
@pytest.mark.asyncio
async def test_skips_cost_log_when_status_is_failed(self):
"""Manager only calls log_system_credential_cost on COMPLETED status.
This test verifies the guard condition `if status == COMPLETED` directly:
calling log_system_credential_cost only happens on success, never on
FAILED or ERROR executions.
"""
from backend.data.execution import ExecutionStatus
db_client = _make_db_client()
node_exec = _make_node_exec(
inputs={"credentials": {"id": "sys-cred", "provider": "openai"}}
)
block = _make_block()
stats = NodeExecutionStats(input_token_count=100)
# Simulate the manager guard: only call on COMPLETED
status = ExecutionStatus.FAILED
if status == ExecutionStatus.COMPLETED:
await log_system_credential_cost(node_exec, block, stats, db_client)
db_client.log_platform_cost.assert_not_awaited()

View File

@@ -121,10 +121,16 @@ def _make_hashable_key(
def _make_redis_key(key: tuple[Any, ...], func_name: str) -> str:
"""Convert a hashable key tuple to a Redis key string."""
# Ensure key is already hashable
hashable_key = key if isinstance(key, tuple) else (key,)
return f"cache:{func_name}:{hash(hashable_key)}"
"""Convert a hashable key tuple to a Redis key string.
Uses SHA-256 instead of Python's built-in ``hash()`` because ``hash()``
is randomised per-process (``PYTHONHASHSEED``). In a multi-pod
deployment every pod must derive the **same** Redis key for the same
arguments, otherwise cache lookups and invalidations silently miss.
"""
key_bytes = repr(key).encode()
digest = hashlib.sha256(key_bytes).hexdigest()
return f"cache:{func_name}:{digest}"
@runtime_checkable

View File

@@ -1,5 +1,6 @@
import contextlib
import logging
import os
from enum import Enum
from functools import wraps
from typing import Any, Awaitable, Callable, TypeVar
@@ -38,6 +39,7 @@ class Flag(str, Enum):
AGENT_ACTIVITY = "agent-activity"
ENABLE_PLATFORM_PAYMENT = "enable-platform-payment"
CHAT = "chat"
CHAT_MODE_OPTION = "chat-mode-option"
COPILOT_SDK = "copilot-sdk"
COPILOT_DAILY_TOKEN_LIMIT = "copilot-daily-token-limit"
COPILOT_WEEKLY_TOKEN_LIMIT = "copilot-weekly-token-limit"
@@ -165,6 +167,30 @@ async def get_feature_flag_value(
return default
def _env_flag_override(flag_key: Flag) -> bool | None:
"""Return a local override for ``flag_key`` from the environment.
Set ``FORCE_FLAG_<NAME>=true|false`` (``NAME`` = flag value with
``-`` → ``_``, upper-cased) to bypass LaunchDarkly for a single
flag in local dev or tests. Returns ``None`` when no override
is configured so the caller falls through to LaunchDarkly.
The ``NEXT_PUBLIC_FORCE_FLAG_<NAME>`` prefix is also accepted so a
single shared env var can toggle a flag across backend and
frontend (the frontend requires the ``NEXT_PUBLIC_`` prefix to
expose the value to the browser bundle).
Example: ``FORCE_FLAG_CHAT_MODE_OPTION=true`` forces
``Flag.CHAT_MODE_OPTION`` on regardless of LaunchDarkly.
"""
suffix = flag_key.value.upper().replace("-", "_")
for prefix in ("FORCE_FLAG_", "NEXT_PUBLIC_FORCE_FLAG_"):
raw = os.environ.get(prefix + suffix)
if raw is not None:
return raw.strip().lower() in ("1", "true", "yes", "on")
return None
async def is_feature_enabled(
flag_key: Flag,
user_id: str,
@@ -181,6 +207,11 @@ async def is_feature_enabled(
Returns:
True if feature is enabled, False otherwise
"""
override = _env_flag_override(flag_key)
if override is not None:
logger.debug(f"Feature flag {flag_key} overridden by env: {override}")
return override
result = await get_feature_flag_value(flag_key.value, user_id, default)
# If the result is already a boolean, return it

View File

@@ -4,6 +4,7 @@ from ldclient import LDClient
from backend.util.feature_flag import (
Flag,
_env_flag_override,
feature_flag,
is_feature_enabled,
mock_flag_variation,
@@ -111,3 +112,59 @@ async def test_is_feature_enabled_with_flag_enum(mocker):
assert result is True
# Should call with the flag's string value
mock_get_feature_flag_value.assert_called_once()
class TestEnvFlagOverride:
def test_force_flag_true(self, monkeypatch: pytest.MonkeyPatch):
monkeypatch.setenv("FORCE_FLAG_CHAT", "true")
assert _env_flag_override(Flag.CHAT) is True
def test_force_flag_false(self, monkeypatch: pytest.MonkeyPatch):
monkeypatch.setenv("FORCE_FLAG_CHAT", "false")
assert _env_flag_override(Flag.CHAT) is False
def test_next_public_prefix_true(self, monkeypatch: pytest.MonkeyPatch):
monkeypatch.setenv("NEXT_PUBLIC_FORCE_FLAG_CHAT", "true")
assert _env_flag_override(Flag.CHAT) is True
def test_unset_returns_none(self, monkeypatch: pytest.MonkeyPatch):
monkeypatch.delenv("FORCE_FLAG_CHAT", raising=False)
monkeypatch.delenv("NEXT_PUBLIC_FORCE_FLAG_CHAT", raising=False)
assert _env_flag_override(Flag.CHAT) is None
def test_invalid_value_returns_false(self, monkeypatch: pytest.MonkeyPatch):
monkeypatch.setenv("FORCE_FLAG_CHAT", "notaboolean")
assert _env_flag_override(Flag.CHAT) is False
def test_numeric_one_returns_true(self, monkeypatch: pytest.MonkeyPatch):
monkeypatch.setenv("FORCE_FLAG_CHAT", "1")
assert _env_flag_override(Flag.CHAT) is True
def test_yes_returns_true(self, monkeypatch: pytest.MonkeyPatch):
monkeypatch.setenv("FORCE_FLAG_CHAT", "yes")
assert _env_flag_override(Flag.CHAT) is True
def test_on_returns_true(self, monkeypatch: pytest.MonkeyPatch):
monkeypatch.setenv("FORCE_FLAG_CHAT", "on")
assert _env_flag_override(Flag.CHAT) is True
def test_hyphenated_flag_converts_to_underscore(
self, monkeypatch: pytest.MonkeyPatch
):
monkeypatch.setenv("FORCE_FLAG_CHAT_MODE_OPTION", "true")
assert _env_flag_override(Flag.CHAT_MODE_OPTION) is True
def test_force_flag_takes_precedence_over_next_public(
self, monkeypatch: pytest.MonkeyPatch
):
monkeypatch.setenv("FORCE_FLAG_CHAT", "false")
monkeypatch.setenv("NEXT_PUBLIC_FORCE_FLAG_CHAT", "true")
assert _env_flag_override(Flag.CHAT) is False
def test_whitespace_is_stripped(self, monkeypatch: pytest.MonkeyPatch):
monkeypatch.setenv("FORCE_FLAG_CHAT", " true ")
assert _env_flag_override(Flag.CHAT) is True
def test_case_insensitive_value(self, monkeypatch: pytest.MonkeyPatch):
monkeypatch.setenv("FORCE_FLAG_CHAT", "TRUE")
assert _env_flag_override(Flag.CHAT) is True

View File

@@ -155,6 +155,7 @@ class WorkspaceManager:
path: Optional[str] = None,
mime_type: Optional[str] = None,
overwrite: bool = False,
metadata: Optional[dict] = None,
) -> WorkspaceFile:
"""
Write file to workspace.
@@ -168,6 +169,7 @@ class WorkspaceManager:
path: Virtual path (defaults to "/{filename}", session-scoped if session_id set)
mime_type: MIME type (auto-detected if not provided)
overwrite: Whether to overwrite existing file at path
metadata: Optional metadata dict (e.g., origin tracking)
Returns:
Created WorkspaceFile instance
@@ -246,6 +248,7 @@ class WorkspaceManager:
mime_type=mime_type,
size_bytes=len(content),
checksum=checksum,
metadata=metadata,
)
except UniqueViolationError:
if retries > 0:

View File

@@ -0,0 +1,5 @@
-- CreateEnum
CREATE TYPE "SubscriptionTier" AS ENUM ('FREE', 'PRO', 'BUSINESS', 'ENTERPRISE');
-- AlterTable: add subscriptionTier column with default PRO (beta testing)
ALTER TABLE "User" ADD COLUMN "subscriptionTier" "SubscriptionTier" NOT NULL DEFAULT 'PRO';

View File

@@ -0,0 +1,42 @@
-- CreateTable
CREATE TABLE "PlatformCostLog" (
"id" TEXT NOT NULL,
"createdAt" TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
"userId" TEXT,
"graphExecId" TEXT,
"nodeExecId" TEXT,
"graphId" TEXT,
"nodeId" TEXT,
"blockId" TEXT NOT NULL,
"blockName" TEXT NOT NULL,
"provider" TEXT NOT NULL,
"credentialId" TEXT NOT NULL,
"costMicrodollars" BIGINT,
"inputTokens" INTEGER,
"outputTokens" INTEGER,
"dataSize" INTEGER,
"duration" DOUBLE PRECISION,
"model" TEXT,
"trackingType" TEXT,
"metadata" JSONB,
CONSTRAINT "PlatformCostLog_pkey" PRIMARY KEY ("id")
);
-- CreateIndex
CREATE INDEX "PlatformCostLog_userId_createdAt_idx" ON "PlatformCostLog"("userId", "createdAt");
-- CreateIndex
CREATE INDEX "PlatformCostLog_provider_createdAt_idx" ON "PlatformCostLog"("provider", "createdAt");
-- CreateIndex
CREATE INDEX "PlatformCostLog_createdAt_idx" ON "PlatformCostLog"("createdAt");
-- CreateIndex
CREATE INDEX "PlatformCostLog_graphExecId_idx" ON "PlatformCostLog"("graphExecId");
-- CreateIndex
CREATE INDEX "PlatformCostLog_provider_trackingType_idx" ON "PlatformCostLog"("provider", "trackingType");
-- AddForeignKey
ALTER TABLE "PlatformCostLog" ADD CONSTRAINT "PlatformCostLog_userId_fkey" FOREIGN KEY ("userId") REFERENCES "User"("id") ON DELETE SET NULL ON UPDATE CASCADE;

View File

@@ -0,0 +1,2 @@
-- AlterTable
ALTER TABLE "PlatformCostLog" ADD COLUMN "trackingAmount" DOUBLE PRECISION;

View File

@@ -40,6 +40,15 @@ model User {
timezone String @default("not-set")
// CoPilot subscription tier — controls rate-limit multipliers.
// Multipliers applied in get_global_rate_limits(): FREE=1x, PRO=5x, BUSINESS=20x, ENTERPRISE=60x.
// NOTE: @default(PRO) is intentional for the beta period — all existing and new
// users receive PRO-level (5x) rate limits by default. The Python-level constant
// DEFAULT_TIER=FREE (in copilot/rate_limit.py) acts as a code-level fallback when
// the DB value is NULL or unrecognised. At GA, a migration will flip the column
// default to FREE and batch-update users to their billing-derived tiers.
subscriptionTier SubscriptionTier @default(PRO)
// Relations
AgentGraphs AgentGraph[]
@@ -66,6 +75,8 @@ model User {
PendingHumanReviews PendingHumanReview[]
Workspace UserWorkspace?
PlatformCostLogs PlatformCostLog[]
// OAuth Provider relations
OAuthApplications OAuthApplication[]
OAuthAuthorizationCodes OAuthAuthorizationCode[]
@@ -73,6 +84,13 @@ model User {
OAuthRefreshTokens OAuthRefreshToken[]
}
enum SubscriptionTier {
FREE
PRO
BUSINESS
ENTERPRISE
}
enum OnboardingStep {
// Introductory onboarding (Library)
WELCOME
@@ -799,6 +817,45 @@ model CreditRefundRequest {
@@index([userId, transactionKey])
}
////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////
////////// Platform Cost Tracking TABLES //////////////
////////////////////////////////////////////////////////////
model PlatformCostLog {
id String @id @default(uuid())
createdAt DateTime @default(now())
userId String?
User User? @relation(fields: [userId], references: [id], onDelete: SetNull)
graphExecId String?
nodeExecId String?
graphId String?
nodeId String?
blockId String
blockName String
provider String
credentialId String
// Cost in microdollars (1 USD = 1,000,000). Null if unknown.
costMicrodollars BigInt?
inputTokens Int?
outputTokens Int?
dataSize Int? // bytes
duration Float? // seconds
model String?
trackingType String? // e.g. "cost_usd", "tokens", "characters", "items", "per_run", "sandbox_seconds", "walltime_seconds"
trackingAmount Float? // Amount in the unit implied by trackingType
metadata Json?
@@index([userId, createdAt])
@@index([provider, createdAt])
@@index([createdAt])
@@index([graphExecId])
@@index([provider, trackingType])
}
////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////
////////////// Store TABLES ///////////////////////////

View File

@@ -1,6 +1,7 @@
{
"daily_token_limit": 2500000,
"daily_tokens_used": 500000,
"tier": "FREE",
"user_email": "target@example.com",
"user_id": "5e53486c-cf57-477e-ba2a-cb02dc828e1c",
"weekly_token_limit": 12500000,

View File

@@ -1,6 +1,7 @@
{
"daily_token_limit": 2500000,
"daily_tokens_used": 0,
"tier": "FREE",
"user_email": "target@example.com",
"user_id": "5e53486c-cf57-477e-ba2a-cb02dc828e1c",
"weekly_token_limit": 12500000,

View File

@@ -1,6 +1,7 @@
{
"daily_token_limit": 2500000,
"daily_tokens_used": 0,
"tier": "FREE",
"user_email": "target@example.com",
"user_id": "5e53486c-cf57-477e-ba2a-cb02dc828e1c",
"weekly_token_limit": 12500000,

View File

@@ -140,7 +140,9 @@ class TestFixOrchestratorBlocks:
assert defaults["conversation_compaction"] is True
assert defaults["retry"] == 3
assert defaults["multiple_tool_calls"] is False
assert len(fixer.fixes_applied) == 4
assert defaults["execution_mode"] == "extended_thinking"
assert defaults["model"] == "claude-opus-4-6"
assert len(fixer.fixes_applied) == 6
def test_preserves_existing_values(self):
"""Existing user-set values are never overwritten."""
@@ -153,6 +155,8 @@ class TestFixOrchestratorBlocks:
"conversation_compaction": False,
"retry": 1,
"multiple_tool_calls": True,
"execution_mode": "built_in",
"model": "gpt-4o",
}
)
],
@@ -166,6 +170,8 @@ class TestFixOrchestratorBlocks:
assert defaults["conversation_compaction"] is False
assert defaults["retry"] == 1
assert defaults["multiple_tool_calls"] is True
assert defaults["execution_mode"] == "built_in"
assert defaults["model"] == "gpt-4o"
assert len(fixer.fixes_applied) == 0
def test_partial_defaults(self):
@@ -189,7 +195,9 @@ class TestFixOrchestratorBlocks:
assert defaults["conversation_compaction"] is True # filled
assert defaults["retry"] == 3 # filled
assert defaults["multiple_tool_calls"] is False # filled
assert len(fixer.fixes_applied) == 3
assert defaults["execution_mode"] == "extended_thinking" # filled
assert defaults["model"] == "claude-opus-4-6" # filled
assert len(fixer.fixes_applied) == 5
def test_skips_non_sdm_nodes(self):
"""Non-Orchestrator nodes are untouched."""
@@ -258,11 +266,13 @@ class TestFixOrchestratorBlocks:
result = fixer.fix_orchestrator_blocks(agent)
defaults = result["nodes"][0]["input_default"]
assert defaults["agent_mode_max_iterations"] == 10 # None default
assert defaults["conversation_compaction"] is True # None default
assert defaults["agent_mode_max_iterations"] == 10 # None -> default
assert defaults["conversation_compaction"] is True # None -> default
assert defaults["retry"] == 3 # kept
assert defaults["multiple_tool_calls"] is False # kept
assert len(fixer.fixes_applied) == 2
assert defaults["execution_mode"] == "extended_thinking" # filled
assert defaults["model"] == "claude-opus-4-6" # filled
assert len(fixer.fixes_applied) == 4
def test_multiple_sdm_nodes(self):
"""Multiple SDM nodes are all fixed independently."""
@@ -277,11 +287,11 @@ class TestFixOrchestratorBlocks:
result = fixer.fix_orchestrator_blocks(agent)
# First node: 3 defaults filled (agent_mode was already set)
# First node: 5 defaults filled (agent_mode was already set)
assert result["nodes"][0]["input_default"]["agent_mode_max_iterations"] == 3
# Second node: all 4 defaults filled
# Second node: all 6 defaults filled
assert result["nodes"][1]["input_default"]["agent_mode_max_iterations"] == 10
assert len(fixer.fixes_applied) == 7 # 3 + 4
assert len(fixer.fixes_applied) == 11 # 5 + 6
def test_registered_in_apply_all_fixes(self):
"""fix_orchestrator_blocks runs as part of apply_all_fixes."""
@@ -655,6 +665,7 @@ class TestOrchestratorE2EPipeline:
"conversation_compaction": {"type": "boolean"},
"retry": {"type": "integer"},
"multiple_tool_calls": {"type": "boolean"},
"execution_mode": {"type": "string"},
},
"required": ["prompt"],
},

View File

@@ -0,0 +1,394 @@
"""Prompt regression tests AND functional tests for the dry-run verification loop.
NOTE: This file lives in test/copilot/ rather than being colocated with a
single source module because it is a cross-cutting test spanning multiple
modules: prompting.py, service.py, agent_generation_guide.md, and run_agent.py.
These tests verify that the create -> dry-run -> fix iterative workflow is
properly communicated through tool descriptions, the prompting supplement,
and the agent building guide.
After deduplication, the full dry-run workflow lives in the
agent_generation_guide.md only. The system prompt and individual tool
descriptions no longer repeat it — they keep a minimal footprint.
**Intentionally brittle**: the assertions check for specific substrings so
that accidental removal or rewording of key instructions is caught. If you
deliberately reword a prompt, update the corresponding assertion here.
--- Functional tests (added separately) ---
The dry-run loop is primarily a *prompt/guide* feature — the copilot reads
the guide and follows its instructions. There are no standalone Python
functions that implement "loop until passing" logic; the loop is driven by
the LLM. However, several pieces of real Python infrastructure make the
loop possible:
1. The ``run_agent`` and ``run_block`` OpenAI tool schemas expose a
``dry_run`` boolean parameter that the LLM must be able to set.
2. The ``RunAgentInput`` Pydantic model validates ``dry_run`` as a required
bool, so the executor can branch on it.
3. The ``_check_prerequisites`` method in ``RunAgentTool`` bypasses
credential and missing-input gates when ``dry_run=True``.
4. The guide documents the workflow steps in a specific order that the LLM
must follow: create/edit -> dry-run -> inspect -> fix -> repeat.
The functional test classes below exercise items 1-4 directly.
"""
import re
from pathlib import Path
from typing import Any, cast
import pytest
from openai.types.chat import ChatCompletionToolParam
from pydantic import ValidationError
from backend.copilot.prompting import get_sdk_supplement
from backend.copilot.service import DEFAULT_SYSTEM_PROMPT
from backend.copilot.tools import TOOL_REGISTRY
from backend.copilot.tools.run_agent import RunAgentInput
# Resolved once for the whole module so individual tests stay fast.
_SDK_SUPPLEMENT = get_sdk_supplement(use_e2b=False, cwd="/tmp/test")
# ---------------------------------------------------------------------------
# Prompt regression tests (original)
# ---------------------------------------------------------------------------
class TestSystemPromptBasics:
"""Verify the system prompt includes essential baseline content.
After deduplication, the dry-run workflow lives only in the guide.
The system prompt carries tone and personality only.
"""
def test_mentions_automations(self):
assert "automations" in DEFAULT_SYSTEM_PROMPT.lower()
def test_mentions_action_oriented(self):
assert "action-oriented" in DEFAULT_SYSTEM_PROMPT.lower()
class TestToolDescriptionsDryRunLoop:
"""Verify tool descriptions and parameters related to the dry-run loop."""
def test_get_agent_building_guide_mentions_workflow(self):
desc = TOOL_REGISTRY["get_agent_building_guide"].description
assert "dry-run" in desc.lower()
def test_run_agent_dry_run_param_exists_and_is_boolean(self):
schema = TOOL_REGISTRY["run_agent"].as_openai_tool()
params = cast(dict[str, Any], schema["function"].get("parameters", {}))
assert "dry_run" in params["properties"]
assert params["properties"]["dry_run"]["type"] == "boolean"
def test_run_agent_dry_run_param_mentions_simulation(self):
"""After deduplication the dry_run param description mentions simulation."""
schema = TOOL_REGISTRY["run_agent"].as_openai_tool()
params = cast(dict[str, Any], schema["function"].get("parameters", {}))
dry_run_desc = params["properties"]["dry_run"]["description"]
assert "simulat" in dry_run_desc.lower()
class TestPromptingSupplementContent:
"""Verify the prompting supplement (via get_sdk_supplement) includes
essential shared tool notes. After deduplication, the dry-run workflow
lives only in the guide; the supplement carries storage, file-handling,
and tool-discovery notes.
"""
def test_includes_tool_discovery_priority(self):
assert "Tool Discovery Priority" in _SDK_SUPPLEMENT
def test_includes_find_block_first(self):
assert "find_block first" in _SDK_SUPPLEMENT or "find_block" in _SDK_SUPPLEMENT
def test_includes_send_authenticated_web_request(self):
assert "SendAuthenticatedWebRequestBlock" in _SDK_SUPPLEMENT
class TestAgentBuildingGuideDryRunLoop:
"""Verify the agent building guide includes the dry-run loop."""
@pytest.fixture
def guide_content(self):
guide_path = (
Path(__file__).resolve().parent.parent.parent
/ "backend"
/ "copilot"
/ "sdk"
/ "agent_generation_guide.md"
)
return guide_path.read_text(encoding="utf-8")
def test_has_dry_run_verification_section(self, guide_content):
assert "REQUIRED: Dry-Run Verification Loop" in guide_content
def test_workflow_includes_dry_run_step(self, guide_content):
assert "dry_run=True" in guide_content
def test_mentions_good_vs_bad_output(self, guide_content):
assert "**Good output**" in guide_content
assert "**Bad output**" in guide_content
def test_mentions_repeat_until_pass(self, guide_content):
lower = guide_content.lower()
assert "repeat" in lower
assert "clearly unfixable" in lower
def test_mentions_wait_for_result(self, guide_content):
assert "wait_for_result=120" in guide_content
def test_mentions_view_agent_output(self, guide_content):
assert "view_agent_output" in guide_content
def test_workflow_has_dry_run_and_inspect_steps(self, guide_content):
assert "**Dry-run**" in guide_content
assert "**Inspect & fix**" in guide_content
# ---------------------------------------------------------------------------
# Functional tests: tool schema validation
# ---------------------------------------------------------------------------
class TestRunAgentToolSchema:
"""Validate the run_agent OpenAI tool schema exposes dry_run correctly.
These go beyond substring checks — they verify the full schema structure
that the LLM receives, ensuring the parameter is well-formed and will be
parsed correctly by OpenAI function-calling.
"""
@pytest.fixture
def schema(self) -> ChatCompletionToolParam:
return TOOL_REGISTRY["run_agent"].as_openai_tool()
def test_schema_is_valid_openai_tool(self, schema: ChatCompletionToolParam):
"""The schema has the required top-level OpenAI structure."""
assert schema["type"] == "function"
assert "function" in schema
func = schema["function"]
assert "name" in func
assert "description" in func
assert "parameters" in func
assert func["name"] == "run_agent"
def test_dry_run_is_required(self, schema: ChatCompletionToolParam):
"""dry_run must be in 'required' so the LLM always provides it explicitly."""
params = cast(dict[str, Any], schema["function"].get("parameters", {}))
required = params.get("required", [])
assert "dry_run" in required
def test_dry_run_is_boolean_type(self, schema: ChatCompletionToolParam):
"""dry_run must be typed as boolean so the LLM generates true/false."""
params = cast(dict[str, Any], schema["function"].get("parameters", {}))
assert params["properties"]["dry_run"]["type"] == "boolean"
def test_dry_run_description_is_nonempty(self, schema: ChatCompletionToolParam):
"""The description must be present and substantive for LLM guidance."""
params = cast(dict[str, Any], schema["function"].get("parameters", {}))
desc = params["properties"]["dry_run"]["description"]
assert isinstance(desc, str)
assert len(desc) > 10, "Description too short to guide the LLM"
def test_wait_for_result_coexists_with_dry_run(
self, schema: ChatCompletionToolParam
):
"""wait_for_result must also be present — the guide instructs the LLM
to pass both dry_run=True and wait_for_result=120 together."""
params = cast(dict[str, Any], schema["function"].get("parameters", {}))
assert "wait_for_result" in params["properties"]
assert params["properties"]["wait_for_result"]["type"] == "integer"
class TestRunBlockToolSchema:
"""Validate the run_block OpenAI tool schema exposes dry_run correctly."""
@pytest.fixture
def schema(self) -> ChatCompletionToolParam:
return TOOL_REGISTRY["run_block"].as_openai_tool()
def test_schema_is_valid_openai_tool(self, schema: ChatCompletionToolParam):
assert schema["type"] == "function"
func = schema["function"]
assert func["name"] == "run_block"
assert "parameters" in func
def test_dry_run_exists_and_is_boolean(self, schema: ChatCompletionToolParam):
params = cast(dict[str, Any], schema["function"].get("parameters", {}))
props = params["properties"]
assert "dry_run" in props
assert props["dry_run"]["type"] == "boolean"
def test_dry_run_is_required(self, schema: ChatCompletionToolParam):
"""dry_run must be required — along with block_id and input_data."""
params = cast(dict[str, Any], schema["function"].get("parameters", {}))
required = params.get("required", [])
assert "dry_run" in required
assert "block_id" in required
assert "input_data" in required
def test_dry_run_description_mentions_preview(
self, schema: ChatCompletionToolParam
):
params = cast(dict[str, Any], schema["function"].get("parameters", {}))
desc = params["properties"]["dry_run"]["description"]
assert isinstance(desc, str)
assert (
"preview mode" in desc.lower()
), "run_block dry_run description should mention preview mode"
# ---------------------------------------------------------------------------
# Functional tests: RunAgentInput Pydantic model
# ---------------------------------------------------------------------------
class TestRunAgentInputModel:
"""Validate RunAgentInput Pydantic model handles dry_run correctly.
The executor reads dry_run from this model, so it must parse, default,
and validate properly.
"""
def test_dry_run_accepts_true(self):
model = RunAgentInput(username_agent_slug="user/agent", dry_run=True)
assert model.dry_run is True
def test_dry_run_accepts_false(self):
"""dry_run=False must be accepted when provided explicitly."""
model = RunAgentInput(username_agent_slug="user/agent", dry_run=False)
assert model.dry_run is False
def test_dry_run_coerces_truthy_int(self):
"""Pydantic bool fields coerce int 1 to True."""
model = RunAgentInput(username_agent_slug="user/agent", dry_run=1) # type: ignore[arg-type]
assert model.dry_run is True
def test_dry_run_coerces_falsy_int(self):
"""Pydantic bool fields coerce int 0 to False."""
model = RunAgentInput(username_agent_slug="user/agent", dry_run=0) # type: ignore[arg-type]
assert model.dry_run is False
def test_dry_run_with_wait_for_result(self):
"""The guide instructs passing both dry_run=True and wait_for_result=120.
The model must accept this combination."""
model = RunAgentInput(
username_agent_slug="user/agent",
dry_run=True,
wait_for_result=120,
)
assert model.dry_run is True
assert model.wait_for_result == 120
def test_wait_for_result_upper_bound(self):
"""wait_for_result is bounded at 300 seconds (ge=0, le=300)."""
with pytest.raises(ValidationError):
RunAgentInput(
username_agent_slug="user/agent",
dry_run=True,
wait_for_result=301,
)
def test_string_fields_are_stripped(self):
"""The strip_strings validator should strip whitespace from string fields."""
model = RunAgentInput(username_agent_slug=" user/agent ", dry_run=True)
assert model.username_agent_slug == "user/agent"
# ---------------------------------------------------------------------------
# Functional tests: guide documents the correct workflow ordering
# ---------------------------------------------------------------------------
class TestGuideWorkflowOrdering:
"""Verify the guide documents workflow steps in the correct order.
The LLM must see: create/edit -> dry-run -> inspect -> fix -> repeat.
If these steps are reordered, the copilot would follow the wrong sequence.
These tests verify *ordering*, not just presence.
"""
@pytest.fixture
def guide_content(self) -> str:
guide_path = (
Path(__file__).resolve().parent.parent.parent
/ "backend"
/ "copilot"
/ "sdk"
/ "agent_generation_guide.md"
)
return guide_path.read_text(encoding="utf-8")
def test_create_before_dry_run_in_workflow(self, guide_content: str):
"""Step 7 (Save/create_agent) must appear before step 8 (Dry-run)."""
create_pos = guide_content.index("create_agent")
dry_run_pos = guide_content.index("dry_run=True")
assert (
create_pos < dry_run_pos
), "create_agent must appear before dry_run=True in the workflow"
def test_dry_run_before_inspect_in_verification_section(self, guide_content: str):
"""In the verification loop section, Dry-run step must come before
Inspect & fix step."""
section_start = guide_content.index("REQUIRED: Dry-Run Verification Loop")
section = guide_content[section_start:]
dry_run_pos = section.index("**Dry-run**")
inspect_pos = section.index("**Inspect")
assert (
dry_run_pos < inspect_pos
), "Dry-run step must come before Inspect & fix in the verification loop"
def test_fix_before_repeat_in_verification_section(self, guide_content: str):
"""The Fix step must come before the Repeat step."""
section_start = guide_content.index("REQUIRED: Dry-Run Verification Loop")
section = guide_content[section_start:]
fix_pos = section.index("**Fix**")
repeat_pos = section.index("**Repeat**")
assert fix_pos < repeat_pos
def test_good_output_before_bad_output(self, guide_content: str):
"""Good output examples should be listed before bad output examples,
so the LLM sees the success pattern first."""
good_pos = guide_content.index("**Good output**")
bad_pos = guide_content.index("**Bad output**")
assert good_pos < bad_pos
def test_numbered_steps_in_verification_section(self, guide_content: str):
"""The step-by-step workflow should have numbered steps 1-5."""
section_start = guide_content.index("Step-by-step workflow")
section = guide_content[section_start:]
# The section should contain numbered items 1 through 5
for step_num in range(1, 6):
assert (
f"{step_num}. " in section
), f"Missing numbered step {step_num} in verification workflow"
def test_workflow_steps_are_in_numbered_order(self, guide_content: str):
"""The main workflow steps (1-9) must appear in ascending order."""
# Extract the numbered workflow items from the top-level workflow section
workflow_start = guide_content.index("### Workflow for Creating/Editing Agents")
# End at the next ### section
next_section = guide_content.index("### Agent JSON Structure")
workflow_section = guide_content[workflow_start:next_section]
step_positions = []
for step_num in range(1, 10):
pattern = rf"^{step_num}\.\s"
match = re.search(pattern, workflow_section, re.MULTILINE)
if match:
step_positions.append((step_num, match.start()))
# Verify at least steps 1-9 are present and in order
assert (
len(step_positions) >= 9
), f"Expected 9 workflow steps, found {len(step_positions)}"
for i in range(1, len(step_positions)):
prev_num, prev_pos = step_positions[i - 1]
curr_num, curr_pos = step_positions[i]
assert prev_pos < curr_pos, (
f"Step {prev_num} (pos {prev_pos}) should appear before "
f"step {curr_num} (pos {curr_pos})"
)

View File

@@ -98,6 +98,7 @@ services:
- CLAMD_CONF_MaxScanSize=100M
- CLAMD_CONF_MaxThreads=12
- CLAMD_CONF_ReadTimeout=300
- CLAMD_CONF_TCPAddr=0.0.0.0
healthcheck:
test: ["CMD-SHELL", "clamdscan --version || exit 1"]
interval: 30s

View File

@@ -40,6 +40,8 @@ After making **any** code changes in the frontend, you MUST run the following co
Do NOT skip these steps. If any command reports errors, fix them and re-run until clean. Only then may you consider the task complete. If typing keeps failing, stop and ask the user.
4. `pnpm test:unit` — run integration tests; fix any failures
### Code Style
- Fully capitalize acronyms in symbols, e.g. `graphID`, `useBackendAPI`
@@ -62,7 +64,7 @@ Do NOT skip these steps. If any command reports errors, fix them and re-run unti
- **Icons**: Phosphor Icons only
- **Feature Flags**: LaunchDarkly integration
- **Error Handling**: ErrorCard for render errors, toast for mutations, Sentry for exceptions
- **Testing**: Playwright for E2E, Storybook for component development
- **Testing**: Vitest + React Testing Library + MSW for integration tests (primary), Playwright for E2E, Storybook for visual
## Environment Configuration
@@ -84,7 +86,12 @@ See @CONTRIBUTING.md for complete patterns. Quick reference:
- Regenerate with `pnpm generate:api`
- Pattern: `use{Method}{Version}{OperationName}`
4. **Styling**: Tailwind CSS only, use design tokens, Phosphor Icons only
5. **Testing**: Add Storybook stories for new components, Playwright for E2E. When fixing a bug, write a failing Playwright test first (use `.fixme` annotation), implement the fix, then remove the annotation.
5. **Testing**: Integration tests are the default (~90%). See `TESTING.md` for full details.
- **New pages/features**: Write integration tests in `__tests__/` next to `page.tsx` using Vitest + RTL + MSW
- **API mocking**: Use Orval-generated MSW handlers from `@/app/api/__generated__/endpoints/{tag}/{tag}.msw.ts`
- **Run**: `pnpm test:unit` (integration/unit), `pnpm test` (Playwright E2E)
- **Storybook**: For design system components in `src/components/`
- **TDD**: Write a failing test first, implement, then verify
6. **Code conventions**:
- Use function declarations (not arrow functions) for components/handlers
- Do not use `useCallback` or `useMemo` unless asked to optimise a given function

Some files were not shown because too many files have changed in this diff Show More