Compare commits

55 Commits

Author SHA1 Message Date
Zamil Majdy
fff9faf13c fix(copilot): use transient_api_error code for exhausted transient retries
When the except-Exception transient-retry budget was exhausted the post-loop
StreamError yielded code='sdk_stream_error' instead of 'transient_api_error'
and called _friendly_error_text(raw) instead of FRIENDLY_TRANSIENT_MSG.
This made the client unable to show the same "Try again" affordance as the
_HandledStreamError path.

Add transient_exhausted flag; check it in the post-loop alongside
attempts_exhausted to emit the correct code/text.  Also collapse the
unnecessary split f-string in the retry StreamStatus message, and add a
version comment on the CLAUDE_CODE_DISABLE_* env var block.
2026-04-08 10:19:57 +07:00
Zamil Majdy
f95772f0af fix(copilot): fix StreamError ordering and cap exponential backoff
- _HandledStreamError: add already_yielded=True attribute (default True for
  backward compat). For transient ResultMessage errors, already_yielded=False
  so the outer loop — not _run_stream_attempt — yields StreamError only when
  all retries are exhausted. This prevents a premature error flash on the
  client before the retry status message.
- _next_transient_backoff: cap backoff at min(30s, 2^(n-1)) so operators
  who raise max_transient_retries to 10 don't silently stall for 8+ minutes.
- Tests: add TestHandledStreamErrorAlreadyYielded covering the contract
  and the backoff cap formula.
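
A minimal sketch of the capped backoff described above. The real `_next_transient_backoff` signature is not shown in this log, so the function name, parameters, and 1-based attempt numbering here are assumptions for illustration only:

```python
from typing import Optional

MAX_BACKOFF_SECONDS = 30  # cap stated in the commit message


def next_transient_backoff(attempt: int, max_retries: int) -> Optional[int]:
    """Hypothetical helper: delay in seconds before retry `attempt` (1-based),
    or None once the retry budget is exhausted."""
    if attempt > max_retries:
        return None
    # Exponential growth 1, 2, 4, ... capped at 30 seconds.
    return min(MAX_BACKOFF_SECONDS, 2 ** (attempt - 1))


delays = [next_transient_backoff(n, max_retries=10) for n in range(1, 12)]
print(delays)  # [1, 2, 4, 8, 16, 30, 30, 30, 30, 30, None]
```

With ten retries the capped delays add up to 181 seconds, whereas the uncapped series 1 + 2 + ... + 512 would add up to 1023 seconds.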
2026-04-08 10:04:17 +07:00
Zamil Majdy
79b8ad80fe fix(copilot): tighten fallback detection pattern and harden header sanitization
- _on_stderr: match "fallback model" instead of bare "fallback" to avoid
  false-positive StreamStatus notifications from unrelated stderr lines
  (tool-level retries, cached result fallbacks, etc.)
- build_sdk_env: upgrade _safe() to strip all non-printable-ASCII chars
  via re.sub(r'[^\x20-\x7e]', '') instead of only \r/\n, producing valid
  RFC 7230 HTTP header values (defence-in-depth; values are system UUIDs
  in practice)
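
A rough sketch of both changes, assuming simplified stand-ins for `_safe()` and the `_on_stderr` check. The helper names and the case-insensitive match are assumptions; only the regular expression and the "fallback model" phrase come from the message above:

```python
import re

# Everything outside printable ASCII (space through tilde) is stripped.
_NON_PRINTABLE_ASCII = re.compile(r"[^\x20-\x7e]")


def safe_header_value(value: str) -> str:
    """Hypothetical stand-in for _safe(): drop CR/LF, control bytes, non-ASCII."""
    return _NON_PRINTABLE_ASCII.sub("", value)


def mentions_fallback_model(stderr_line: str) -> bool:
    """Hypothetical stand-in for the _on_stderr check: match the full phrase,
    not the bare word "fallback". Lowercasing is an assumption."""
    return "fallback model" in stderr_line.lower()


print(safe_header_value("user-123\r\nX-Injected: 1"))
print(mentions_fallback_model("using cached result fallback"))
```

Stripping rather than rejecting keeps the call site simple, which fits the defence-in-depth framing: the values are expected to be clean UUIDs already.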
2026-04-08 09:48:56 +07:00
Zamil Majdy
8a4bc0b1e4 fix(copilot): persist retryable marker when transient retries exhausted
When _next_transient_backoff() returns None in the generic except-Exception
handler (retries exhausted), the code fell through to the non-context-error
break without calling _append_error_marker. This meant the frontend lost the
"Try again" affordance after page refresh for 429/5xx/ECONNRESET exhausted-
retry flows.

Fix: handle the exhausted-retry case inside the if-is_transient block,
mirroring the _HandledStreamError path at line ~2310 which already calls
_append_error_marker(..., retryable=True) before breaking.

Addresses coderabbitai review comment #3022088580.
2026-04-07 23:19:06 +07:00
Zamil Majdy
644d39d6be Merge branch 'fix/copilot-p0-cli-internals' of github.com:Significant-Gravitas/AutoGPT into fix/copilot-p0-cli-internals
2026-04-07 21:16:20 +07:00
Zamil Majdy
e2add1ba5b Merge remote-tracking branch 'origin/dev' into fix/copilot-p0-cli-internals
2026-04-07 21:16:00 +07:00
Krzysztof Czerwinski
67bdef13e7 feat(platform): load copilot messages from newest first with cursor-based pagination (#12328)
Copilot chat sessions with long histories loaded all messages at once,
causing slow initial loads. This PR adds cursor-based pagination so only
the most recent messages load initially, with older messages fetched on
demand as the user scrolls up.

### Changes 🏗️

**Backend:**
- Cursor-based pagination on `GET /sessions/{session_id}` (`limit`,
`before_sequence` params)
- `user_id` relation filter on the paginated query — ownership check and
message fetch now run in parallel
- Backward boundary expansion to keep tool-call / assistant message
pairs intact at page edges
- Unit tests for paginated queries
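
The backend behaviour above can be sketched in memory as follows. This is not the actual query code: the `Message` shape, the `load_page` name, and the rule "expand backward while the page starts on a tool result" are assumptions made to illustrate `limit`, `before_sequence`, and boundary expansion:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Message:
    sequence: int
    role: str  # "user", "assistant", or "tool" (hypothetical role names)


def load_page(
    messages: List[Message], limit: int, before_sequence: Optional[int] = None
) -> Tuple[List[Message], bool]:
    """Return (page, has_more) from a list sorted by ascending sequence."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    # Exclusive upper bound: everything strictly older than the cursor.
    end = len(messages)
    if before_sequence is not None:
        end = next(
            (i for i, m in enumerate(messages) if m.sequence >= before_sequence),
            len(messages),
        )
    start = max(0, end - limit)
    # Backward boundary expansion: never start a page on a tool result,
    # so the assistant message that issued the call stays on the same page.
    while start > 0 and messages[start].role == "tool":
        start -= 1
    return messages[start:end], start > 0


roles = ["user", "assistant", "tool", "tool", "assistant", "user", "assistant"]
history = [Message(i + 1, role) for i, role in enumerate(roles)]

newest, more = load_page(history, limit=3)
older, more_again = load_page(history, limit=2, before_sequence=newest[0].sequence)
print([m.sequence for m in newest], [m.sequence for m in older])
```

In the second call the page grows from two messages to three because the tool results at sequences 3 and 4 pull in the assistant message at sequence 2.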

**Frontend:**
- `useLoadMoreMessages` hook + `LoadMoreSentinel` (IntersectionObserver)
for infinite scroll upward
- `ScrollPreserver` to maintain scroll position when older messages are
prepended
- Session-keyed `Conversation` remount with one-frame opacity hide to
eliminate scroll flash on switch
- Scrollbar moved to the correct scroll container; loading spinner no
longer causes overflow

### Checklist 📋

- [x] Pagination: only recent messages load initially; older pages load
on scroll-up
- [x] Scroll position preserved on prepend; no flash on session switch
- [x] Tool-call boundary pairs stay intact across page edges
- [x] Stream reconnection still works on initial load

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
2026-04-07 12:43:47 +00:00
Zamil Majdy
1a52b0d02c fix(copilot): address review comments — security env vars for all auth modes, narrow transient patterns
- Apply security env vars (DISABLE_CLAUDE_MDS, SKIP_PROMPT_HISTORY, DISABLE_AUTO_MEMORY,
  DISABLE_NONESSENTIAL_TRAFFIC) in all three auth modes (subscription, direct Anthropic,
  OpenRouter), not just OpenRouter mode. Refactor env.py to use if/elif/else so common
  hardening runs unconditionally at the end.
- Remove overly broad natural-language transient patterns ("overloaded", "internal server
  error", "bad gateway", "service unavailable", "gateway timeout") — these could match
  application-level error messages and trigger spurious retries. Keep status-code-specific
  patterns (status code 5xx) which cover the same cases without false-positive risk.
- Replace TestSecurityEnvVars source-grep tests with real build_sdk_env() behavior tests
  that assert security vars are present in the returned dict for all three auth modes.
- Update stale test_direct_anthropic_returns_empty_dict to test the actual contract
  (no ANTHROPIC_* overrides) rather than requiring an empty dict.
- Remove dead code: is_transient_api_error(str(exc)) in _HandledStreamError handler —
  str(exc) is always the static error message and never matches any transient pattern.
- Update existing env_test.py exact-dict assertions that broke after security vars
  are now returned by all modes.
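
To illustrate why the status-code patterns are safer than the natural-language ones, here is a hedged sketch. The pattern list below is an assumption assembled from the cases named in this log (429, 5xx, ECONNRESET); it is not the project's actual list:

```python
import re

# Hypothetical pattern set: specific tokens only, no natural-language phrases.
_TRANSIENT_PATTERNS = (
    re.compile(r"status code 429"),
    re.compile(r"status code 5\d\d"),
    re.compile(r"ECONNRESET"),
)


def is_transient_api_error(text: str) -> bool:
    """Return True when the error text matches a status-code-specific pattern."""
    return any(pattern.search(text) for pattern in _TRANSIENT_PATTERNS)


print(is_transient_api_error("API error: status code 503"))
print(is_transient_api_error("The agent reported: service unavailable for user"))
```

An application-level message that merely contains "service unavailable" no longer triggers a retry, while a real 503 from the API still does.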
2026-04-07 19:38:36 +07:00
Ubbe
e67dd93ee8 refactor(frontend): remove stale feature flags and stabilize share execution (#12697)
## Why

Stale feature flags add noise to the codebase and make it harder to
understand which flags are actually gating live features. Four flags
were defined but never referenced anywhere in the frontend, and the
"Share Execution Results" flag has been stable long enough to remove its
gate.

## What

- Remove 4 unused flags from the `Flag` enum and `defaultFlags`:
`NEW_BLOCK_MENU`, `GRAPH_SEARCH`, `ENABLE_ENHANCED_OUTPUT_HANDLING`,
`AGENT_FAVORITING`
- Remove the `SHARE_EXECUTION_RESULTS` flag and its conditional — the
`ShareRunButton` now always renders

## How

- Deleted enum entries and default values in `use-get-flag.ts`
- Removed the `useGetFlag` call and conditional wrapper around
`<ShareRunButton />` in `SelectedRunActions.tsx`

## Changes

- `src/services/feature-flags/use-get-flag.ts` — removed 5 flags from
enum + defaults
- `src/app/(platform)/library/.../SelectedRunActions.tsx` — removed flag
import, condition; share button always renders

### Checklist

- [x] My PR is small and focused on one change
- [x] I've tested my changes locally
- [x] `pnpm format && pnpm lint` pass

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 19:28:40 +07:00
Zamil Majdy
b101069eaf Merge remote-tracking branch 'origin/dev' into work/pr-12636
2026-04-07 19:21:30 +07:00
Otto
3140a60816 fix(frontend/builder): allow horizontal scroll for JSON output data (#12638)
Requested by @Abhi1992002 

## Why

JSON output data in the "Complete Output Data" dialog and node output
panel gets clipped — text overflows and is hidden with no way to scroll
right. Reported by Zamil in #frontend.

## What

The `ContentRenderer` wrapper divs used `overflow-hidden` which
prevented the `JSONRenderer`'s `overflow-x-auto` from working. Changed
both wrapper divs from `overflow-hidden` to `overflow-x-auto`.

```diff
- overflow-hidden [&>*]:rounded-xlarge [&>*]:!text-xs [&_pre]:whitespace-pre-wrap [&_pre]:break-words
+ overflow-x-auto [&>*]:rounded-xlarge [&>*]:!text-xs [&_pre]:whitespace-pre-wrap [&_pre]:break-words

- overflow-hidden [&>*]:rounded-xlarge [&>*]:!text-xs
+ overflow-x-auto [&>*]:rounded-xlarge [&>*]:!text-xs
```

## Scope
- 1 file changed (`ContentRenderer.tsx`)
- 2 lines: `overflow-hidden` → `overflow-x-auto`
- CSS only, no logic changes

Resolves SECRT-2206

Co-authored-by: Abhimanyu Yadav <122007096+Abhi1992002@users.noreply.github.com>
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
2026-04-07 19:11:09 +07:00
Nicholas Tindle
41c2ee9f83 feat(platform): add copilot artifact preview panel (#12629)
### Why / What / How

Copilot artifacts were not previewing reliably: PDFs downloaded instead
of rendering, Python code could still render like markdown, JSX/TSX
artifacts were brittle, HTML dashboards/charts could fail to execute,
and users had to manually open artifact panes after generation. The pane
also got stuck at maximized width when trying to drag it smaller.

This PR adds a dedicated copilot artifact panel and preview pipeline
across the backend/frontend boundary. It preserves artifact metadata
needed for classification, adds extension-first preview routing,
introduces dedicated preview/rendering paths for HTML/CSV/code/PDF/React
artifacts, auto-opens new or edited assistant artifacts, and fixes the
maximized-pane resize path so dragging exits maximized mode immediately.

### Changes 🏗️

- add artifact card and artifact panel UI in copilot, including
persisted panel state and resize/maximize/minimize behavior
- add shared artifact extraction/classification helpers and auto-open
behavior for new or edited assistant messages with artifacts
- add preview/rendering support for HTML, CSV, PDF, code, and React
artifact files
- fix code artifacts such as Python to render through the code renderer
with a dark code surface instead of markdown-style output
- improve JSX/TSX preview behavior with provider wrapping, fallback
export selection, and explicit runtime error surfaces
- allow script execution inside HTML previews so embedded chart
dashboards can render
- update workspace artifact/backend API handling and regenerate the
frontend OpenAPI client
- add regression coverage for artifact helpers, React preview runtime,
auto-open behavior, code rendering, and panel store behavior

- post-review hardening: correct download path for cross-origin URLs,
defer scroll restore until content mounts, gate auto-open behind the
ARTIFACTS flag, parse CSVs with RFC 4180-compliant quoted newlines + BOM
handling, distinguish 413 vs 409 on upload, normalize empty session_id,
and keep AnimatePresence mounted so the panel exit animation plays

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] `pnpm format`
  - [x] `pnpm lint`
  - [x] `pnpm types`
  - [x] `pnpm test:unit`

#### For configuration changes:

- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Adds a new Copilot artifact preview surface that executes
user/AI-generated HTML/React in sandboxed iframes and changes workspace
file upload/listing behavior, so regressions could affect file handling
and client security assumptions despite sandboxing safeguards.
> 
> **Overview**
> Adds an **Artifacts** feature (flagged by `Flag.ARTIFACTS`) to
Copilot: workspace file links/attachments now render as `ArtifactCard`s
and can open a new resizable/minimizable `ArtifactPanel` with history,
auto-open behavior, copy/download actions, and persisted panel width.
> 
> Introduces a richer artifact preview pipeline with type classification
and dedicated renderers for **HTML**, **CSV**, **PDF**, **code
(Shiki-highlighted)**, and **React/TSX** (transpiled and executed in a
sandboxed iframe), plus safer download filename handling and content
caching/scroll restore.
> 
> Extends the workspace backend API by adding `GET /workspace/files`
pagination, standardizing operation IDs in OpenAPI, attaching
`metadata.origin` on uploads/agent-created files, normalizing empty
`session_id`, improving upload error mapping (409 vs 413), and hardening
post-quota soft-delete error handling; updates and expands test coverage
accordingly.
> 
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
b732d10eca. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 11:24:22 +00:00
Ubbe
ca748ee12a feat(frontend): refine AutoPilot onboarding — branding, auto-advance, soft cap, polish (#12686)
### Why / What / How

**Why:** The onboarding flow had inconsistent branding ("Autopilot" vs
"AutoPilot"), a heavy progress bar that dominated the header, an extra
click on the role screen, and no guidance on how many pain points to
select — leading to users selecting everything or nothing useful.

**What:** Copy & brand fixes, UX improvements (auto-advance, soft cap),
and visual polish (progress bar, checkmark badges, purple focus inputs).

**How:**
- Replaced all "Autopilot" with "AutoPilot" (capital P) across screens
1-3
- Removed the `?` tooltip on screen 1 (users will learn about AutoPilot
from the access email)
- Changed name label to conversational "What should I call you?"
- Screen 2: auto-advances 350ms after role selection (except "Other"
which still shows input + button)
- Screen 3: soft cap of 3 selections with green confirmation text and
shake animation on overflow attempt
- Thinned progress bar from ~10px to 3px (Linear/Notion style)
- Added purple checkmark badges on selected cards
- Updated Input atom focus state to purple ring

### Changes 🏗️

- **WelcomeStep**: "AutoPilot" branding, removed tooltip, conversational
label
- **RoleStep**: Updated subtitle, auto-advance on non-"Other" role
select, Continue button only for "Other"
- **PainPointsStep**: Soft cap of 3 with dynamic helper text and shake
animation
- **usePainPointsStep**: Added `atLimit`/`shaking` state, wrapped
`togglePainPoint` with cap logic
- **store.ts**: `togglePainPoint` returns early when at 3 and adding
- **ProgressBar**: 3px height, removed glow shadow
- **SelectableCard**: Added purple checkmark badge on selected state
- **Input atom**: Focus ring changed from zinc to purple
- **tailwind.config.ts**: Added `shake` keyframe and `animate-shake`
utility

### Checklist 📋

#### For code changes:
- [ ] I have clearly listed my changes in the PR description
- [ ] I have made a test plan
- [ ] I have tested my changes according to the test plan:
  - [ ] Navigate through full onboarding flow (screens 1→2→3→4)
  - [ ] Verify "AutoPilot" branding on all screens (no "Autopilot")
  - [ ] Verify screen 2 auto-advances after tapping a role (non-"Other")
  - [ ] Verify "Other" role still shows text input and Continue button
  - [ ] Verify Back button works correctly from screen 2 and 3
  - [ ] Select 3 pain points and verify green "3 selected" text
  - [ ] Attempt 4th selection and verify shake animation + swap message
  - [ ] Deselect one and verify can select a different one
  - [ ] Verify checkmark badges appear on selected cards
  - [ ] Verify progress bar is thin (3px) and subtle
  - [ ] Verify input focus state is purple across onboarding inputs
  - [ ] Verify "Something else" + other text input still works on screen 3

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 17:58:36 +07:00
Zamil Majdy
243b12778f dx: improve pr-test skill — inline screenshots, flow captions, and test evaluation (#12692)
## Changes

### 1. Inline image enforcement (Step 7)
- Added `CRITICAL` warning: never post a bare directory tree link
- Added post-comment verification block that greps for `![` tags and
exits 1 if none found — agents can't silently skip inline embedding

### 2. Structured screenshot captions (Step 6)
- `SCREENSHOT_EXPLANATIONS` now requires **Flow** (which scenario),
**Steps** (exact actions taken), **Evidence** (what this proves)
- Good/bad example included so agents know what format is expected
- A bare "shows the page" caption is explicitly rejected

### 3. Test completeness evaluation (Step 8) — new step
After posting screenshots, the agent must evaluate coverage against the
test plan and post a formal GitHub review:
- **`APPROVE`** — every scenario tested with screenshot + DB/API
evidence, no blockers
- **`REQUEST_CHANGES`** — lists exact gaps: untested scenarios, missing
evidence, confirmed bugs
- Per-scenario checklist (/) required in the review body
- Cannot auto-approve without ticking every item in the test plan

## Why

- Agents were posting `https://github.com/.../tree/test-screenshots/...`
instead of `![name](url)` inline
- Screenshot captions were too vague to be useful ("shows the page")
- No mechanism to catch incomplete test runs — agent could skip
scenarios and still post a passing report

## Checklist

- [x] `.claude/skills/pr-test/SKILL.md` updated
- [x] No production code changes — skill/dx only
- [x] Pre-commit hooks pass
2026-04-07 16:04:08 +07:00
An Vy Le
43c81910ae fix(backend/copilot): skip AI blocks without model property in fix_ai_model_parameter (#12688)
### Why / What / How

**Why:** Some AI-category blocks do not expose a `"model"` input
property in their `inputSchema`. The `fix_ai_model_parameter` fixer was
unconditionally injecting a default model value (e.g. `"gpt-4o"`) into
any node whose block has category `"AI"`, regardless of whether that
block actually accepts a `model` input. This causes the agent JSON to
include an invalid field for those blocks.

**What:** Guard the model-injection logic with a check that `"model"`
exists in the block's `inputSchema.properties` before attempting to set
or validate the field. AI blocks that have no model selector are now
skipped entirely.

**How:** In `fix_ai_model_parameter`, after confirming `is_ai_block`,
extract `input_properties` from the block's `inputSchema.properties` and
`continue` if `"model"` is absent. The subsequent `model_schema` lookup
is also simplified to reuse the already-fetched `input_properties` dict.
A regression test is added to cover this case.
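
The guard can be sketched roughly as below. The node and block dictionary shapes (`block_id`, `category`, `input_default`) are assumptions for illustration, and the real fixer also validates existing values against `model_schema`, which this sketch omits:

```python
from typing import Any, Dict, List

DEFAULT_MODEL = "gpt-4o"  # example default mentioned in the PR description


def fix_ai_model_parameter(
    nodes: List[Dict[str, Any]], blocks_by_id: Dict[str, Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Inject a default model only into AI blocks that accept a `model` input."""
    for node in nodes:
        block = blocks_by_id.get(node["block_id"])
        if block is None or block.get("category") != "AI":
            continue
        input_properties = block.get("inputSchema", {}).get("properties", {})
        if "model" not in input_properties:
            continue  # the new guard: this AI block has no model selector
        node.setdefault("input_default", {}).setdefault("model", DEFAULT_MODEL)
    return nodes


blocks = {
    "llm": {"category": "AI", "inputSchema": {"properties": {"model": {}, "prompt": {}}}},
    "no-model": {"category": "AI", "inputSchema": {"properties": {"prompt": {}}}},
}
agent_nodes = [
    {"block_id": "llm", "input_default": {}},
    {"block_id": "no-model", "input_default": {}},
]
fixed = fix_ai_model_parameter(agent_nodes, blocks)
print(fixed)
```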

### Changes 🏗️

- `backend/copilot/tools/agent_generator/fixer.py`: In
`fix_ai_model_parameter`, skip AI-category nodes whose block
`inputSchema.properties` does not contain a `"model"` key; reuse
`input_properties` for the subsequent `model_schema` lookup.
- `backend/copilot/tools/agent_generator/fixer_test.py`: Add
`test_ai_block_without_model_property_is_skipped` to
`TestFixAiModelParameter`.

### Checklist 📋

#### For code changes:
- [ ] I have clearly listed my changes in the PR description
- [ ] I have made a test plan
- [ ] I have tested my changes according to the test plan:
  - [ ] Run `poetry run pytest
backend/copilot/tools/agent_generator/fixer_test.py` — all 50 tests pass
(49 pre-existing + 1 new)

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 17:14:11 +00:00
Ubbe
a11199aa67 dx(frontend): set up React integration testing with Vitest + RTL + MSW (#12667)
## Summary
- Establish React integration tests (Vitest + RTL + MSW) as the primary
frontend testing strategy (~90% of tests)
- Update all contributor documentation (TESTING.md, CONTRIBUTING.md,
AGENTS.md) to reflect the integration-first convention
- Add `NuqsTestingAdapter` and `TooltipProvider` to the shared test
wrapper so page-level tests work out of the box
- Write 8 integration tests for the library page as a reference example
for the pattern

## Why
We had the testing infrastructure (Vitest, RTL, MSW, Orval-generated
handlers) but no established convention for page-level integration
tests. Most existing tests were for stores or small components. Since
our frontend is client-first, we need a documented, repeatable pattern
for testing full pages with mocked APIs.

## What
- **Docs**: Rewrote `TESTING.md` as a comprehensive guide. Updated
testing sections in `CONTRIBUTING.md`, `frontend/AGENTS.md`,
`platform/AGENTS.md`, and `autogpt_platform/AGENTS.md`
- **Test infra**: Added `NuqsTestingAdapter` (for `nuqs` query state
hooks) and `TooltipProvider` (for Radix tooltips) to `test-utils.tsx`
- **Reference tests**: `library/__tests__/main.test.tsx` with 8 tests
covering agent rendering, tabs, folders, search bar, and Jump Back In

## How
- Convention: tests live in `__tests__/` next to `page.tsx`, named
descriptively (`main.test.tsx`, `search.test.tsx`)
- Pattern: `setupHandlers()` → `render(<Page />)` → `findBy*` assertions
- MSW handlers from
`@/app/api/__generated__/endpoints/{tag}/{tag}.msw.ts` for API mocking
- Custom `render()` from `@/tests/integrations/test-utils` wraps all
required providers

## Test plan
- [x] All 422 unit/integration tests pass (`pnpm test:unit`)
- [x] `pnpm format` clean
- [x] `pnpm lint` clean (no new errors)
- [x] `pnpm types` — pre-existing onboarding type errors only, no new
errors

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
Co-authored-by: Reinier van der Leer <pwuts@agpt.co>
2026-04-06 13:17:08 +00:00
Zamil Majdy
5f82a71d5f feat(copilot): add Fast/Thinking mode toggle with full tool parity (#12623)
### Why / What / How

Users need a way to choose between fast, cheap responses (Sonnet) and
deep reasoning (Opus) in the copilot. Previously only the SDK/Opus path
existed, and the baseline path was a degraded fallback with no tool
calling, no file attachments, no E2B sandbox, and no permission
enforcement.

This PR adds a copilot mode toggle and brings the baseline (fast) path
to full feature parity with the SDK (extended thinking) path.

### Changes 🏗️

#### 1. Mode toggle (UI → full stack)
- Add Fast / Thinking mode toggle to ChatInput footer (Phosphor
`Brain`/`Zap` icons via lucide-react)
- Thread `mode: "fast" | "extended_thinking" | null` from
`StreamChatRequest` → RabbitMQ queue → executor → service selection
- Fast → baseline service (Sonnet 4 via OpenRouter), Thinking → SDK
service (Opus 4.6)
- Toggle gated behind `CHAT_MODE_OPTION` feature flag with server-side
enforcement
- Mode persists in localStorage with SSR-safe init

#### 2. Baseline service full tool parity
- **Tool call persistence**: Store structured `ChatMessage` entries
(assistant + tool results) instead of flat concatenated text — enables
frontend to render tool call details and maintain context across turns
- **E2B sandbox**: Wire up `get_or_create_sandbox()` so `bash_exec`
routes to E2B (image download, Python/PIL compression, filesystem
access)
- **File attachments**: Accept `file_ids`, download workspace files,
embed images as OpenAI vision blocks, save non-images to working dir
- **Permissions**: Filter tool list via `CopilotPermissions`
(whitelist/blacklist)
- **URL context**: Pass `context` dict to user message for URL-shared
content
- **Execution context**: Pass `sandbox`, `sdk_cwd`, `permissions` to
`set_execution_context()`
- **Model**: Changed `fast_model` from `google/gemini-2.5-flash` to
`anthropic/claude-sonnet-4` for reliable function calling
- **Temp dir cleanup**: Lazy `mkdtemp` (only when files attached) +
`shutil.rmtree` in finally
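
As an illustration of the permissions item above, a whitelist/blacklist filter might look like the following. The real `CopilotPermissions` fields are not shown in this log, so the function signature and the rule that the blacklist wins over the whitelist are assumptions:

```python
from typing import Iterable, List, Optional, Set


def filter_tools_by_permissions(
    tool_names: Iterable[str],
    allowed: Optional[Set[str]] = None,
    blocked: Optional[Set[str]] = None,
) -> List[str]:
    """Keep tools that are not blocked and, if a whitelist is set, are on it."""
    blocked = blocked or set()
    return [
        name
        for name in tool_names
        if name not in blocked and (allowed is None or name in allowed)
    ]


tools = ["bash_exec", "run_block", "find_block"]
print(filter_tools_by_permissions(tools, blocked={"bash_exec"}))
```

Treating a missing whitelist as "allow everything" keeps the default behaviour unchanged for sessions without explicit permissions.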

#### 3. Transcript support for Fast mode
- Baseline service now downloads / validates / loads / appends / uploads
transcripts (parity with SDK)
- Enables seamless mode switching mid-conversation via shared transcript
- Upload shielded from cancellation, bounded at 5s timeout

#### 4. Feature-flag infrastructure fixes
- `FORCE_FLAG_*` env-var overrides on both backend and frontend for
local dev / E2E
- LaunchDarkly context parity (frontend mirrors backend user context)
- `CHAT_MODE_OPTION` default flipped to `false` to match backend
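
A possible shape for the backend override, sketched under assumptions: the accepted spellings, the upper-casing of the flag name, and the `env_flag_override` name itself are illustrative and may differ from the real `_env_flag_override`:

```python
import os
from typing import Optional

_TRUE_VALUES = {"1", "true", "yes", "on"}
_FALSE_VALUES = {"0", "false", "no", "off"}


def env_flag_override(flag_name: str) -> Optional[bool]:
    """Return a forced flag value from FORCE_FLAG_<NAME>, or None if unset
    or unparseable (the caller then falls back to the flag provider)."""
    raw = os.environ.get(f"FORCE_FLAG_{flag_name.upper()}")
    if raw is None:
        return None
    value = raw.strip().lower()
    if value in _TRUE_VALUES:
        return True
    if value in _FALSE_VALUES:
        return False
    return None


os.environ["FORCE_FLAG_CHAT_MODE_OPTION"] = "true"
print(env_flag_override("chat_mode_option"))  # True
```

Returning `None` for "no override" lets the caller distinguish a forced `False` from an unset variable.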

#### 5. Other hardening
- Double-submit ref guard in `useChatInput` + reconnect dedup in
`useCopilotStream`
- `copilotModeRef` pattern to read latest mode without recreating
transport
- Shared `CopilotMode` type across frontend files
- File name collision handling with numeric suffix
- Path sanitization in file description hints (`os.path.basename`)

### Test plan
- [x] 30 new unit tests: `_env_flag_override` (12), `envFlagOverride`
(8), `_filter_tools_by_permissions` (4), `_prepare_baseline_attachments`
(6)
- [x] E2E tested on dev: fast mode creates E2B sandbox, calls 7-10
tools, generates and renders images
- [x] Mode switching mid-session works (shared transcript + session
messages)
- [x] Server-side flag gate enforced (crafted `mode=fast` stripped when
flag off)
- [x] All 37 CI checks green
- [x] Verified via agent-browser: workspace images render correctly in
all message positions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Zamil Majdy <majdy.zamil@gmail.com>
2026-04-06 19:54:36 +07:00
Nicholas Tindle
1a305db162 ci(frontend): add Playwright E2E coverage reporting to Codecov (#12665)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 00:55:09 -05:00
Zamil Majdy
48a653dc63 fix(copilot): prevent duplicate side effects from double-submit and stale-cache race (#12660)
## Why

#12604 (intermediate persistence) introduced three bugs on dev:

1. **Duplicate user messages** — `set_turn_duration` calls
`invalidate_session_cache()` which deletes the Redis key. Concurrent
`get_chat_session()` calls re-populate it from DB with stale data. The
executor loads this stale cache, misses the user message, and re-appends
it.

2. **Tool outputs lost on hydration** — Intermediate flushes save
assistant messages to DB before `StreamToolInputAvailable` sets
`tool_calls` on them. Since `_save_session_to_db` is append-only (uses
`start_sequence`), the `tool_calls` update is lost — subsequent flushes
start past that index. On page refresh / SSE reconnect, tool UIs
(SetupRequirementsCard, run_block output, etc.) are invisible.

3. **Sessions stuck running** — If a tool call hangs (e.g. WebSearch
provider not responding), the stream never completes,
`mark_session_completed` never runs, and the `active_stream` flag stays
stale in Redis.

## What

- **In-place cache update** in `set_turn_duration` — replaces
`invalidate_session_cache()` with a read-modify-write that patches the
duration on the cached session, eliminating the stale-cache repopulation
window
- **tool_calls backfill** — tracks the flush watermark and assistant
message index; when `StreamToolInputAvailable` sets `tool_calls` on an
already-flushed assistant, updates the DB record directly via
`update_message_tool_calls()`
- **Improved message dedup** — `is_message_duplicate()` /
`maybe_append_user_message()` scans trailing same-role messages (current
turn) instead of only checking `messages[-1]`
- **Idle timeout** — aborts the stream with a retryable error if no
meaningful SDK message arrives for 10 minutes, preventing hung tool
calls from leaving sessions stuck
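
The improved dedup can be sketched as a backward scan over the trailing run of same-role messages. Plain dicts stand in for `ChatMessage` here, and the exact matching rule (equal `content`) is an assumption:

```python
from typing import Dict, List

Message = Dict[str, str]


def is_message_duplicate(messages: List[Message], role: str, content: str) -> bool:
    """Scan the trailing same-role messages (the current turn) for `content`."""
    for message in reversed(messages):
        if message["role"] != role:
            break  # reached the previous turn; stop scanning
        if message["content"] == content:
            return True
    return False


def maybe_append_user_message(messages: List[Message], content: str) -> bool:
    """Append the user message unless the current turn already contains it."""
    if is_message_duplicate(messages, "user", content):
        return False
    messages.append({"role": "user", "content": content})
    return True


session = [
    {"role": "assistant", "content": "How can I help?"},
    {"role": "user", "content": "build an agent"},
    {"role": "user", "content": "use the weather block"},
]
print(maybe_append_user_message(session, "build an agent"))
```

A check against `messages[-1]` alone would miss "build an agent" in this session, because a second user message follows it in the same turn.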

## Changes

- `copilot/db.py` — `update_message_tool_calls()`, in-place cache update
in `set_turn_duration`
- `copilot/model.py` — `is_message_duplicate()`,
`maybe_append_user_message()`
- `copilot/sdk/service.py` — flush watermark tracking, tool_calls
backfill, idle timeout
- `copilot/baseline/service.py` — use `maybe_append_user_message()`
- `copilot/model_test.py` — unit tests for dedup
- `copilot/db_test.py` — unit tests for set_turn_duration cache update

## Checklist

- [x] My PR title follows [conventional
commit](https://www.conventionalcommits.org/) format
- [x] Out-of-scope changes are less than 20% of the PR
- [x] Changes to `data/*.py` validated for user ID checks (N/A)
- [x] Protected routes updated in middleware (N/A)
2026-04-04 01:09:42 +07:00
Toran Bruce Richards
f6ddcbc6cb feat(platform): Add all 12 Z.ai GLM models via OpenRouter (#12672)
## Summary

Add Z.ai (Zhipu AI) GLM model family to the platform LLM blocks, routed
through OpenRouter. This enables users to select any of the 12 Z.ai
models across all LLM-powered blocks (AI Text Generator, AI
Conversation, AI Structured Response, AI Text Summarizer, AI List
Generator).

## Gap Analysis

All 12 Z.ai models currently available on OpenRouter's API were missing
from the AutoGPT platform:

| Model | Context Window | Max Output | Price Tier | Cost |
|-------|---------------|------------|------------|------|
| GLM 4 32B | 128K | N/A | Tier 1 | 1 |
| GLM 4.5 | 131K | 98K | Tier 2 | 2 |
| GLM 4.5 Air | 131K | 98K | Tier 1 | 1 |
| GLM 4.5 Air (Free) | 131K | 96K | Tier 1 | 1 |
| GLM 4.5V (vision) | 65K | 16K | Tier 2 | 2 |
| GLM 4.6 | 204K | 204K | Tier 1 | 1 |
| GLM 4.6V (vision) | 131K | 131K | Tier 1 | 1 |
| GLM 4.7 | 202K | 65K | Tier 1 | 1 |
| GLM 4.7 Flash | 202K | N/A | Tier 1 | 1 |
| GLM 5 | 80K | 131K | Tier 2 | 2 |
| GLM 5 Turbo | 202K | 131K | Tier 3 | 4 |
| GLM 5V Turbo (vision) | 202K | 131K | Tier 3 | 4 |

## Changes

- **`autogpt_platform/backend/backend/blocks/llm.py`**: Added 12
`LlmModel` enum entries and corresponding `MODEL_METADATA` with context
windows, max output tokens, display names, and price tiers sourced from
OpenRouter API
- **`autogpt_platform/backend/backend/data/block_cost_config.py`**:
Added `MODEL_COST` entries for all 12 models, with costs scaled to match
pricing (1 for budget, 2 for mid-range, 4 for premium)

## How it works

All Z.ai models route through the existing OpenRouter provider
(`open_router`) — no new provider or API client code needed. Users with
an OpenRouter API key can immediately select any Z.ai model from the
model dropdown in any LLM block.

## Related

- Linear: REQ-83

---------

Co-authored-by: AutoGPT CoPilot <copilot@agpt.co>
2026-04-03 15:48:33 +00:00
Zamil Majdy
98f13a6e5d feat(copilot): add create -> dry-run -> fix loop to agent generation (#12578)
## Summary
- Instructs the copilot LLM to automatically dry-run agents after
creating or editing them, inspect the output for wiring/data-flow
issues, and fix iteratively before presenting the agent as ready to the
user
- Updates tool descriptions (run_agent, get_agent_building_guide),
prompting supplement, and agent generation guide with clear workflow
instructions and error pattern guidance
- Adds Tool Discovery Priority to shared tool notes (find_block ->
run_mcp_tool -> SendAuthenticatedWebRequestBlock -> manual API)
- Adds 37 tests: prompt regression tests + functional tests (tool schema
validation, Pydantic model, guide workflow ordering)
- **Frontend**: Fixes host-scoped credential UX — replaces duplicate
credentials for the same host instead of stacking them, wires up delete
functionality with confirmation modal, updates button text contextually
("Update headers" vs "Add headers")

## Test plan
- [x] All 37 `dry_run_loop_test.py` tests pass (prompt content, tool
schemas, Pydantic model, guide ordering)
- [x] Existing `tool_schema_test.py` passes (110 tests including
character budget gate)
- [x] Ruff lint and format pass
- [x] Pyright type checking passes
- [x] Frontend: `pnpm lint`, `pnpm types` pass
- [x] Manual verification: confirm copilot follows the create -> dry-run
-> fix workflow when asked to build an agent
- [x] Manual verification: confirm host-scoped credentials replace
instead of duplicate
2026-04-03 14:48:57 +00:00
Zamil Majdy
613978a611 ci: add gitleaks secret scanning to pre-commit hooks (#12649)
### Why / What / How

**Why:** We had no local pre-commit protection against accidentally
committing secrets. The existing `detect-secrets` hook only ran on
`pre-push`, which is too late — secrets are already in git history by
that point. GitHub's push protection only covers known provider patterns
and runs server-side.

**What:** Adds a three-layer defense against secret leaks: two local pre-commit
hooks (gitleaks and detect-secrets) plus a CI workflow as a safety net.

**How:** 
- Moved `detect-secrets` from `pre-push` to `pre-commit` stage
- Added `gitleaks` as a second pre-commit hook (Go binary, faster and
more comprehensive rule set)
- Added `.gitleaks.toml` config with allowlists for known false
positives (test fixtures, dev docker JWTs, Firebase public keys, lock
files, docs examples)
- Added `repo-secret-scan.yml` CI workflow using `gitleaks-action` on
PRs/pushes to master/dev

### Changes 🏗️

- `.pre-commit-config.yaml`: Moved `detect-secrets` to pre-commit stage,
added baseline arg, added `gitleaks` hook
- `.gitleaks.toml`: New config with tuned allowlists for this repo's
false positives
- `.secrets.baseline`: Empty baseline for detect-secrets to track known
findings
- `.github/workflows/repo-secret-scan.yml`: New CI workflow running
gitleaks on every PR and push

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Ran `gitleaks detect --no-git` against the full repo; only `.env`
    files (gitignored) remain as findings
  - [x] Verified gitleaks catches a test secret file correctly
  - [x] Pre-commit hooks pass on commit (both detect-secrets and gitleaks
    passed)

#### For configuration changes:

- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)
2026-04-03 14:01:26 +00:00
Zamil Majdy
2b0e8a5a9f feat(platform): add rate-limit tiering system for CoPilot (#12581)
## Summary
- Adds a four-tier subscription system (FREE/PRO/BUSINESS/ENTERPRISE)
for CoPilot with configurable multipliers (1x/5x/20x/60x) applied on top
of the base LaunchDarkly/config limits
- Stores user tier in the database (`User.subscriptionTier` column as a
Prisma enum, defaults to PRO for beta testing) with admin API endpoints
for tier management
- Includes tier info in usage status responses and OTEL/Langfuse trace
metadata for observability

## Tier Structure
| Tier | Multiplier | Daily Tokens | Weekly Tokens | Notes |
|------|-----------|-------------|--------------|-------|
| FREE | 1x | 2.5M | 12.5M | Base tier (unused during beta) |
| PRO | 5x | 12.5M | 62.5M | Default on sign-up (beta) |
| BUSINESS | 20x | 50M | 250M | Manual upgrade for select users |
| ENTERPRISE | 60x | 150M | 750M | Highest tier, custom |

## Changes
- **`rate_limit.py`**: `SubscriptionTier` enum
(FREE/PRO/BUSINESS/ENTERPRISE), `TIER_MULTIPLIERS`, `get_user_tier()`,
`set_user_tier()`, update `get_global_rate_limits()` to apply tier
multiplier and return 3-tuple, add `tier` field to `CoPilotUsageStatus`
- **`rate_limit_admin_routes.py`**: Add `GET/POST
/admin/rate_limit/tier` endpoints, include `tier` in
`UserRateLimitResponse`
- **`routes.py`** (chat): Include tier in `/usage` endpoint response
- **`sdk/service.py`**: Send `subscription_tier` in OTEL/Langfuse trace
metadata
- **`schema.prisma`**: Add `SubscriptionTier` enum and
`subscriptionTier` column to `User` model (default: PRO)
- **`config.py`**: Update docs to reflect tier system
- **Migration**: `20260326200000_add_rate_limit_tier` — creates enum,
migrates STANDARD→PRO, adds BUSINESS, sets default to PRO
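
A minimal sketch of how the tier multiplier scales the base limits, assuming the base values from the table above (2.5M daily, 12.5M weekly). The `SubscriptionTier` and `TIER_MULTIPLIERS` names come from the change list; `scaled_limits` and the `BASE_*` constants are hypothetical stand-ins for illustration, not the actual `get_global_rate_limits()` code, which takes its base limits from LaunchDarkly/config.

```python
from enum import Enum


class SubscriptionTier(str, Enum):
    FREE = "FREE"
    PRO = "PRO"
    BUSINESS = "BUSINESS"
    ENTERPRISE = "ENTERPRISE"


# Multipliers as listed in the tier table.
TIER_MULTIPLIERS = {
    SubscriptionTier.FREE: 1,
    SubscriptionTier.PRO: 5,
    SubscriptionTier.BUSINESS: 20,
    SubscriptionTier.ENTERPRISE: 60,
}

# Hypothetical base limits, copied from the FREE row of the table.
BASE_DAILY_TOKENS = 2_500_000
BASE_WEEKLY_TOKENS = 12_500_000


def scaled_limits(tier, base_daily=BASE_DAILY_TOKENS, base_weekly=BASE_WEEKLY_TOKENS):
    """Return (daily, weekly, tier) after applying the tier multiplier."""
    multiplier = TIER_MULTIPLIERS[tier]
    return base_daily * multiplier, base_weekly * multiplier, tier
```

Returning the tier alongside the two limits mirrors the 3-tuple mentioned above, so callers can report the tier without a second lookup.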

## Test plan
- [x] 72 unit tests all passing (43 rate_limit + 11 admin routes + 18
chat routes)
- [ ] Verify FREE tier users get base limits (2.5M daily, 12.5M weekly)
- [ ] Verify PRO tier users get 5x limits (12.5M daily, 62.5M weekly)
- [ ] Verify BUSINESS tier users get 20x limits (50M daily, 250M weekly)
- [ ] Verify ENTERPRISE tier users get 60x limits (150M daily, 750M
weekly)
- [ ] Verify admin can read and set user tiers via API
- [ ] Verify tier info appears in Langfuse traces
- [ ] Verify migration applies cleanly (creates enum, migrates STANDARD
users to PRO, adds BUSINESS, default PRO)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
2026-04-03 13:36:01 +00:00
Zamil Majdy
08bb05141c dx: enhance pr-address skill with detailed codecov coverage guidance (#12662)
Enhanced pr-address skill codecov section with local coverage commands,
priority guide, and troubleshooting steps.
2026-04-03 13:15:46 +00:00
Nicholas Tindle
3ccaa5e103 ci(frontend): make frontend coverage checks informational (non-blocking) (#12663)
### Why / What / How

**Why:** Frontend test coverage is still ramping up. The default
component status checks (project + patch at 80%) would block merges for
insufficient coverage on frontend changes, which isn't practical yet.

**What:** Override the platform-frontend component's coverage statuses
to be `informational: true`, so they report but don't block merges.

**How:** Added explicit `statuses` to the `platform-frontend` component
in `codecov.yml` with `informational: true` on both project and patch
checks, overriding the `default_rules`.

### Changes 🏗️

- **`codecov.yml`**: Added `informational: true` to platform-frontend
component's project and patch status checks

### Checklist 📋

#### For code changes:
- [ ] I have clearly listed my changes in the PR description
- [ ] I have made a test plan
- [ ] I have tested my changes according to the test plan:
  - [ ] Verify Codecov frontend status checks show as informational
    (non-blocking) on PRs touching frontend code

#### For configuration changes:

- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Low Risk**
> Low risk: Codecov configuration-only change that affects merge gating
for frontend coverage statuses but does not alter runtime code.
> 
> **Overview**
> Updates `codecov.yml` to override the `platform-frontend` component’s
coverage `statuses` so both **project** and **patch** checks are marked
`informational: true` (non-blocking), while leaving the default
component coverage rules unchanged for other components.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
f8e8426a31. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 12:22:05 +00:00
Zamil Majdy
de094eee36 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into fix/copilot-p0-cli-internals 2026-04-03 13:07:01 +02:00
Zamil Majdy
bddc633a11 fix(copilot): increase guardrail defaults — max_turns=1000, max_budget_usd=100 2026-04-03 10:04:11 +02:00
Zamil Majdy
2411cc386d fix(backend/copilot): update p0 guardrail tests to check env.py after #12635 move
The security env vars (CLAUDE_CODE_TMPDIR, CLAUDE_CODE_DISABLE_CLAUDE_MDS,
etc.) were moved from service.py to build_sdk_env() in env.py by PR #12635.
Update the p0_guardrails_test.py source-grep assertions to point at env.py,
and add the four security env vars to build_sdk_env() which were dropped
during the extraction.
2026-04-02 19:32:24 +02:00
Zamil Majdy
49bef40ef0 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into fix/copilot-p0-cli-internals 2026-04-02 19:28:39 +02:00
Zamil Majdy
eeb2f08d6d merge: resolve conflict with dev (use build_sdk_env(sdk_cwd=) from #12635) 2026-04-02 19:16:39 +02:00
Zamil Majdy
eda02f9ce6 fix(backend/copilot): remove duplicate StreamError in _HandledStreamError handler
The _HandledStreamError exception is only raised by _run_stream_attempt
*after* it has already yielded a StreamError to the client. The handler
in the retry loop was yielding a second StreamError for non-transient
errors (e.g. circuit breaker trips) and when transient retries were
exhausted, causing the client to receive duplicate error events.

Remove the redundant yield since the StreamError was already sent.
2026-04-02 17:03:40 +02:00
Zamil Majdy
2a969e5018 fix(backend/copilot): yield final StreamError after transient retry exhaustion for _HandledStreamError
When _run_stream_attempt raises a _HandledStreamError and all transient
retries are exhausted, the outer retry loop sets ended_with_stream_error
but stream_err remains None.  The post-loop code only emits a StreamError
when stream_err is not None, so the SSE stream closes silently and the
frontend never learns the request failed.

Yield a StreamError with the attempt's error message and code just before
breaking out of the retry loop, ensuring clients always receive an error
notification.
2026-04-02 16:49:18 +02:00
Zamil Majdy
a68f48e6b7 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into fix/copilot-p0-cli-internals 2026-04-02 15:55:59 +02:00
Zamil Majdy
2bf5a37646 fix(backend): add ge/le bounds to claude_agent_max_transient_retries config field
The field lacked validation bounds unlike max_turns and max_budget_usd,
allowing negative or excessively large values to be configured.
2026-04-02 14:35:09 +02:00
Zamil Majdy
289a19d402 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into fix/copilot-p0-cli-internals 2026-04-02 14:34:33 +02:00
Zamil Majdy
e57e48272a security: remove test artifacts containing leaked API keys and OAuth tokens 2026-04-02 10:23:21 +02:00
Zamil Majdy
c2f421cb42 dx(backend/copilot): add live execution guardrail verification for PR #12636
Programmatic verification from running container proving all P0 guardrails
are deployed and active: max_turns=50, max_budget_usd=5.0,
fallback_model=claude-sonnet-4-20250514, max_transient_retries=3,
security env vars, and _last_reset_attempt infinite-loop fix.
2026-04-02 10:01:46 +02:00
Zamil Majdy
e3d589b180 fix(backend/copilot): exclude StreamError/StreamStatus from events_yielded counter
StreamError and StreamStatus are ephemeral notifications, not content
events. When _run_stream_attempt yields a StreamError for a transient
API error before raising _HandledStreamError, the events_yielded counter
was incremented, causing _next_transient_backoff() to return None and
bypassing the retry logic entirely. Exclude these event types from the
counter so transient errors are properly retried with exponential backoff.
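
A rough sketch of the counting rule described here, assuming events are plain Python objects. `StreamError` and `StreamStatus` are named in the commit, but their fields below are invented for illustration, and `StreamText` and `count_content_events` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StreamText:
    """Hypothetical content event."""
    text: str


@dataclass
class StreamError:
    """Ephemeral notification; fields are assumptions."""
    message: str
    code: str


@dataclass
class StreamStatus:
    """Ephemeral notification; fields are assumptions."""
    message: str


# Event types that must not count as "content already sent to the client".
EPHEMERAL_EVENTS = (StreamError, StreamStatus)


def count_content_events(events):
    """Count only events that carry content, skipping ephemeral notifications."""
    return sum(1 for event in events if not isinstance(event, EPHEMERAL_EVENTS))
```

With this rule, an attempt that only emitted an error or a status line still reports zero content events, so it remains eligible for a transient retry.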
2026-04-02 09:56:34 +02:00
Zamil Majdy
8de935c84b dx(backend/copilot): add round 3 E2E test screenshots for PR #12636 2026-04-02 09:20:32 +02:00
Zamil Majdy
a55653f8c1 fix(backend): tighten fallback model detection and reset flag on retry
- Remove "overloaded" from the fallback detection pattern in _on_stderr;
  only "fallback" reliably indicates the SDK switched models. An
  "overloaded" stderr line may just be a transient 529 error that gets
  retried without activating the fallback.

- Reset fallback_model_activated = False at the start of each retry
  iteration (alongside fallback_notified) so a flag set during a failed
  attempt does not leak into the next attempt as a spurious notification.
2026-04-02 07:50:34 +02:00
Zamil Majdy
3e6faf2de7 fix(copilot): address remaining should-fix items from reviewer
- Extract _normalize_model_name() to deduplicate provider-prefix
  stripping and dot-to-hyphen normalization shared by _resolve_sdk_model
  and _resolve_fallback_model.
- Emit a StreamStatus notification when the SDK activates the fallback
  model (detected via CLI stderr lines containing "fallback" or
  "overloaded").
- Item 5 (transcript rollback) was already addressed — both
  _HandledStreamError and generic Exception handlers snapshot and
  restore transcript_builder._entries on retry.
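
The normalization in the first bullet could look roughly like the sketch below. It assumes the provider prefix is everything up to the last `/` and that every dot becomes a hyphen; the real `_normalize_model_name()` may differ, and `normalize_model_name` here is a hypothetical stand-in.

```python
def normalize_model_name(name: str) -> str:
    """Strip a provider prefix and replace dots with hyphens.

    Assumption: the prefix is everything before the last "/".
    """
    bare = name.rsplit("/", 1)[-1]
    return bare.replace(".", "-")
```

Sharing one helper keeps the primary and fallback model names normalized the same way, which is the deduplication the bullet describes.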
2026-04-02 06:53:55 +02:00
Zamil Majdy
22e8c5c353 fix(copilot): update response_adapter test for expanded transient patterns
"API rate limited" is now correctly caught by is_transient_api_error
after adding 429/rate-limit patterns. Use a non-transient error
("Invalid API key provided") to test the raw error pass-through path.
2026-04-02 06:31:24 +02:00
Zamil Majdy
b3d9e9e856 fix(backend): add 429/5xx patterns to is_transient_api_error and add config validators
- Add rate-limit (429) and server error (5xx) string patterns to
  is_transient_api_error() so the fallback retry path catches these
  in addition to connection-level errors (ECONNRESET).
- Add ge/le validators on max_turns (1-500) and max_budget_usd
  (0.01-100.0) to prevent misconfiguration.
- Rename max_transient -> max_transient_retries and
  _can_retry_transient() -> _next_transient_backoff() for clarity.
- Add comprehensive tests for all new transient patterns and config
  boundary validation.
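
As an illustration only, string-based transient detection of this kind might be sketched as follows. The pattern list is an assumption assembled from the error classes named above (429, 5xx, rate limit, ECONNRESET) plus "overloaded"; it is not the actual pattern set in `is_transient_api_error()`, and `looks_transient` is a hypothetical name.

```python
import re

# Assumed patterns: HTTP 429 / 5xx status codes, rate-limit wording,
# overload wording, and connection resets.
_TRANSIENT_PATTERN = re.compile(
    r"\b(?:429|5\d\d)\b|rate[ _-]?limit|overloaded|econnreset",
    re.IGNORECASE,
)


def looks_transient(message: str) -> bool:
    """Return True when the error text matches a known transient pattern."""
    return bool(_TRANSIENT_PATTERN.search(message))
```

Matching on message text is coarse (a bare three-digit number starting with 5 would match), so a sketch like this trades precision for catching errors that arrive only as strings.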
2026-04-02 06:21:51 +02:00
Zamil Majdy
32bfe1b209 Merge branch 'dev' of github.com:Significant-Gravitas/AutoGPT into fix/copilot-p0-cli-internals 2026-04-01 20:52:00 +02:00
Zamil Majdy
b220fe4347 test(copilot): add build_sdk_env tests for all 3 auth modes
Cover subscription, direct Anthropic, and OpenRouter auth modes in
build_sdk_env(). Also verifies that all modes return a mutable dict
that can accept security env vars like CLAUDE_CODE_TMPDIR.
2026-04-01 20:31:32 +02:00
Zamil Majdy
61513b9dad fix(copilot): mock build_sdk_env to return {} instead of None in retry tests
The tests were mocking build_sdk_env to return None, but the service
code now assigns security env vars (CLAUDE_CODE_TMPDIR, etc.) to the
returned dict. This caused TypeError: 'NoneType' object does not
support item assignment in all 6 retry scenario tests.
2026-04-01 20:27:51 +02:00
Zamil Majdy
e753aee7a0 fix(copilot): prevent infinite transient retry loop
The transient_retries counter was reset to 0 at the top of the while
loop on every iteration, including after transient retry `continue`
statements.  Since transient retries don't increment `attempt`, the
counter reset every time, creating an infinite retry loop that could
never exhaust the max_transient budget.

Fix: only reset transient_retries when the context-level `attempt`
actually changes, using a _last_reset_attempt sentinel.
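
The loop shape after this fix can be sketched as below, under stated assumptions: `run_with_retries` and the string outcomes are hypothetical, and the real loop streams SDK events rather than reading a list. Only the `_last_reset_attempt` idea (reset the transient counter when `attempt` changes, not on every iteration) is taken from the commit message.

```python
def run_with_retries(outcomes, max_attempts=3, max_transient_retries=3):
    """Simulate the retry loop; outcomes are 'ok', 'transient', or 'context'."""
    attempt = 0
    transient_retries = 0
    last_reset_attempt = -1  # sentinel: no attempt has reset the counter yet
    calls = 0
    stream = iter(outcomes)
    while attempt < max_attempts:
        # Reset the transient budget only when the context-level attempt changed.
        if attempt != last_reset_attempt:
            transient_retries = 0
            last_reset_attempt = attempt
        calls += 1
        outcome = next(stream, "transient")  # keep failing once the list runs out
        if outcome == "ok":
            return "ok", calls
        if outcome == "transient":
            if transient_retries < max_transient_retries:
                transient_retries += 1
                continue  # replay the same attempt; attempt is not incremented
            return "transient_exhausted", calls
        attempt += 1  # context error: advance to the next context-level attempt
    return "attempts_exhausted", calls
```

With an unconditional reset at the top of the loop, the first case in the tests would never terminate; the sentinel is what lets the budget run out.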
2026-04-01 18:21:50 +02:00
Zamil Majdy
3f24a003ad fix(copilot): add None guard to fix pyright reportOperatorIssue
_resolve_fallback_model returns str | None, so pyright flags the
`"." not in result` assertion.  Add an explicit `is not None` check
before the containment test to narrow the type.
2026-04-01 18:15:16 +02:00
Zamil Majdy
a369fbe169 fix(copilot): replace tautological env-var tests with source assertions
The TestSecurityEnvVars tests were testing Python dict assignment rather
than verifying the actual production code. Replace with source-level
assertions that grep service.py for the required env var names, catching
accidental removals without duplicating production logic.
2026-04-01 18:05:50 +02:00
Zamil Majdy
d3173605eb test(copilot): add unit tests for P0 guardrails
Tests for _resolve_fallback_model (5 tests), security env vars (4 tests),
and ChatConfig defaults (4 tests). All 13 tests pass.
2026-04-01 17:59:09 +02:00
Zamil Majdy
98c27653f2 fix(copilot): snapshot/restore TranscriptBuilder on transient retry
TranscriptBuilder._entries is independent from session.messages.
Rolling back session.messages alone left duplicate entries in the
uploaded --resume transcript. Now snapshot _entries and _last_uuid
before each attempt and restore them at both rollback locations on failure.
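
A minimal sketch of the snapshot/restore idea, assuming `_entries` is a list and `_last_uuid` a string. The `TranscriptBuilder` below is a hypothetical stand-in that models only those two fields; `snapshot`, `restore`, and `append` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TranscriptBuilder:
    """Hypothetical stand-in; only the two fields named in the commit are modeled."""
    _entries: list = field(default_factory=list)
    _last_uuid: Optional[str] = None

    def append(self, uuid, entry):
        self._entries.append(entry)
        self._last_uuid = uuid


def snapshot(builder):
    """Capture a shallow copy of the entries plus the last UUID."""
    return list(builder._entries), builder._last_uuid


def restore(builder, snap):
    """Roll the builder back to a previously captured snapshot."""
    entries, last_uuid = snap
    builder._entries = list(entries)
    builder._last_uuid = last_uuid
```

Copying the list on both capture and restore keeps the snapshot reusable if the same attempt is retried more than once.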
2026-04-01 17:59:09 +02:00
Zamil Majdy
dced534df3 fix(copilot): review round 3 — fix transient error code check, add SDK compat fields
- Fix exc.code check: "transient" -> "transient_api_error" to match
  the actual code set in _run_stream_attempt (line 1343)
- Add fallback_model, max_turns, max_budget_usd, stderr to SDK compat
  test so field renames in the SDK are caught early
2026-04-01 17:59:09 +02:00
Zamil Majdy
4ebe294707 fix(copilot): review round 2 — fix transient retry consuming context-level attempt
Convert for-loop to while-loop so transient retries (continue) replay
the same context-level attempt instead of advancing to the next one.
Previously, `continue` in a `for attempt in range(...)` loop would
increment `attempt`, causing transient retries to wastefully trigger
context reduction and reset the transient retry counter.

Now: transient retries stay at the same attempt (no attempt++), while
context-error retries explicitly increment attempt before continue.
2026-04-01 17:59:09 +02:00
Zamil Majdy
2e8e115cd1 fix(copilot): review round 1 — fix transient retry count, strip fallback model prefix
- Fix _can_retry_transient off-by-one: >= should be > so max_retries=3
  actually performs 3 retries instead of 2
- Move events_yielded check before counter increment to avoid wasting
  a retry slot when events were already sent
- Strip OpenRouter provider prefix from fallback model name (mirrors
  _resolve_sdk_model logic) to prevent model-not-found errors
2026-04-01 17:59:09 +02:00
Zamil Majdy
5ca49a8ec9 fix(copilot): P0 guardrails — SDK limits, security env vars, transient retry
Based on analysis of the Claude Code CLI internals, adds critical
guardrails rebased on the current dev architecture (env.py extraction):

1. SDK guardrails: fallback_model (auto-retry on 529), max_turns=50
   (runaway prevention), max_budget_usd=5.0 (per-query cost cap)

2. TMPDIR redirect: sets CLAUDE_CODE_TMPDIR to sdk_cwd so CLI output
   is routed into the per-session workspace for isolation/cleanup

3. Security env vars: DISABLE_CLAUDE_MDS, SKIP_PROMPT_HISTORY,
   DISABLE_AUTO_MEMORY, DISABLE_NONESSENTIAL_TRAFFIC

4. Transient error retry: 429/5xx/ECONNRESET errors now retry with
   exponential backoff (1s, 2s, 4s) in both _HandledStreamError and
   generic Exception handlers. Skips retry if events already yielded
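
The backoff schedule in item 4 can be sketched as follows. The function name, signature, and `None`-means-stop convention are assumptions for illustration; the 30-second cap is borrowed from a later commit in this comparison (f95772f0af) and is not part of this commit.

```python
from typing import Optional


def next_transient_backoff(
    retries_used: int,
    max_retries: int = 3,
    events_yielded: int = 0,
    cap_seconds: float = 30.0,
) -> Optional[float]:
    """Return the delay before the next retry, or None if no retry is allowed."""
    if events_yielded > 0:
        # Content already reached the client; replaying would duplicate it.
        return None
    if retries_used >= max_retries:
        return None
    # 1s, 2s, 4s, ... capped so a large retry budget cannot stall for minutes.
    return min(cap_seconds, float(2 ** retries_used))
```

Here `retries_used` counts retries already performed, so a budget of 3 yields delays of 1s, 2s, and 4s before giving up.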
2026-04-01 17:59:09 +02:00
204 changed files with 21986 additions and 2365 deletions

View File

@@ -95,6 +95,28 @@ Address comments **one at a time**: fix → commit → push → inline reply →
| Inline review (`pulls/{N}/comments`) | `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments/{ID}/replies -f body="🤖 Fixed in <commit-sha>: <description>"` |
| Conversation (`issues/{N}/comments`) | `gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments -f body="🤖 Fixed in <commit-sha>: <description>"` |
## Codecov coverage
Codecov patch target is **80%** on changed lines. Checks are **informational** (not blocking) but should be green.
### Running coverage locally
**Backend** (from `autogpt_platform/backend/`):
```bash
poetry run pytest -s -vv --cov=backend --cov-branch --cov-report term-missing
```
**Frontend** (from `autogpt_platform/frontend/`):
```bash
pnpm vitest run --coverage
```
### When codecov/patch fails
1. Find uncovered files: `git diff --name-only $(gh pr view --json baseRefName --jq '.baseRefName')...HEAD`
2. For each uncovered file — extract inline logic to `helpers.ts`/`helpers.py` and test those (highest ROI). Colocate tests as `*_test.py` (backend) or `__tests__/*.test.ts` (frontend).
3. Run coverage locally to verify, commit, push.
## Format and commit
After fixing, format the changed code:

View File

@@ -530,9 +530,19 @@ After showing all screenshots, output a **detailed** summary table:
# but Homebrew bash is 5.x; Linux typically has bash 5.x). If running on Bash <4, use a
# plain variable with a lookup function instead.
declare -A SCREENSHOT_EXPLANATIONS=(
["01-login-page.png"]="Shows the login page loaded successfully with SSO options visible."
["02-builder-with-block.png"]="The builder canvas displays the newly added block connected to the trigger."
# ... one entry per screenshot, using the same explanations you showed the user above
# Each explanation MUST answer three things:
# 1. FLOW: Which test scenario / user journey is this part of?
# 2. STEPS: What exact actions were taken to reach this state?
# 3. EVIDENCE: What does this screenshot prove (pass/fail/data)?
#
# Good example:
# ["03-cost-log-after-run.png"]="Flow: LLM block cost tracking. Steps: Logged in as tester@gmail.com → ran 'Cost Test Agent' → waited for COMPLETED status. Evidence: PlatformCostLog table shows 1 new row with cost_microdollars=1234 and correct user_id."
#
# Bad example (too vague — never do this):
# ["03-cost-log.png"]="Shows the cost log table."
["01-login-page.png"]="Flow: Login flow. Steps: Opened /login. Evidence: Login page renders with email/password fields and SSO options visible."
["02-builder-with-block.png"]="Flow: Block execution. Steps: Logged in → /build → added LLM block. Evidence: Builder canvas shows block connected to trigger, ready to run."
# ... one entry per screenshot using the flow/steps/evidence format above
)
TEST_RESULTS_TABLE="| 1 | Login flow | PASS | N/A | 01-login-before.png, 02-login-after.png |
@@ -547,6 +557,9 @@ Upload screenshots to the PR using the GitHub Git API (no local git operations
**This step is MANDATORY. Every test run MUST post a PR comment with screenshots. No exceptions.**
> **CRITICAL — NEVER post a bare directory link like `https://github.com/.../tree/...`.**
> Every screenshot MUST appear as `![name](raw_url)` inline in the PR comment so reviewers can see them without clicking any links. After posting, the verification step below greps the comment for `![` tags and exits 1 if none are found — the test run is considered incomplete until this passes.
```bash
# Upload screenshots via GitHub Git API (creates blobs, tree, commit, and ref remotely)
REPO="Significant-Gravitas/AutoGPT"
@@ -582,12 +595,25 @@ for img in "${SCREENSHOT_FILES[@]}"; do
done
TREE_JSON+=']'
# Step 2: Create tree, commit, and branch ref
# Step 2: Create tree, commit (with parent), and branch ref
TREE_SHA=$(echo "$TREE_JSON" | jq -c '{tree: .}' | gh api "repos/${REPO}/git/trees" --input - --jq '.sha')
COMMIT_SHA=$(gh api "repos/${REPO}/git/commits" \
-f message="test: add E2E test screenshots for PR #${PR_NUMBER}" \
-f tree="$TREE_SHA" \
--jq '.sha')
# Resolve existing branch tip as parent (avoids orphan commits on repeat runs)
PARENT_SHA=$(gh api "repos/${REPO}/git/refs/heads/${SCREENSHOTS_BRANCH}" --jq '.object.sha' 2>/dev/null || true)
if [ -n "$PARENT_SHA" ]; then
COMMIT_SHA=$(gh api "repos/${REPO}/git/commits" \
-f message="test: add E2E test screenshots for PR #${PR_NUMBER}" \
-f tree="$TREE_SHA" \
-f "parents[]=$PARENT_SHA" \
--jq '.sha')
else
# First commit on this branch — no parent
COMMIT_SHA=$(gh api "repos/${REPO}/git/commits" \
-f message="test: add E2E test screenshots for PR #${PR_NUMBER}" \
-f tree="$TREE_SHA" \
--jq '.sha')
fi
gh api "repos/${REPO}/git/refs" \
-f ref="refs/heads/${SCREENSHOTS_BRANCH}" \
-f sha="$COMMIT_SHA" 2>/dev/null \
@@ -656,17 +682,123 @@ ${IMAGE_MARKDOWN}
${FAILED_SECTION}
INNEREOF
gh api "repos/${REPO}/issues/$PR_NUMBER/comments" -F body=@"$COMMENT_FILE"
POSTED_BODY=$(gh api "repos/${REPO}/issues/$PR_NUMBER/comments" -F body=@"$COMMENT_FILE" --jq '.body')
rm -f "$COMMENT_FILE"
```
**The PR comment MUST include:**
1. A summary table of all scenarios with PASS/FAIL and before/after API evidence
2. Every successfully uploaded screenshot rendered inline; any failed uploads listed with manual attachment instructions
3. A 1-2 sentence explanation below each screenshot describing what it proves
3. A structured explanation below each screenshot covering: **Flow** (which scenario), **Steps** (exact actions taken to reach this state), **Evidence** (what this proves — pass/fail/data values). A bare "shows the page" caption is not acceptable.
This approach uses the GitHub Git API to create blobs, trees, commits, and refs entirely server-side. No local `git checkout` or `git push` — safe for worktrees and won't interfere with the PR branch.
**Verify inline rendering after posting — this is required, not optional:**
```bash
# 1. Confirm the posted comment body contains inline image markdown syntax
if ! echo "$POSTED_BODY" | grep -q '!\['; then
echo "❌ FAIL: No inline image tags in posted comment body. Re-check IMAGE_MARKDOWN and re-post."
exit 1
fi
# 2. Verify at least one raw URL actually resolves (catches wrong branch name, wrong path, etc.)
FIRST_IMG_URL=$(echo "$POSTED_BODY" | grep -o 'https://raw.githubusercontent.com[^)]*' | head -1)
if [ -n "$FIRST_IMG_URL" ]; then
HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" --max-time 10 "$FIRST_IMG_URL")
if [ "$HTTP_STATUS" = "200" ]; then
echo "✅ Inline images confirmed and raw URL resolves (HTTP 200)"
else
echo "❌ FAIL: Raw image URL returned HTTP $HTTP_STATUS — images will not render inline."
echo " URL: $FIRST_IMG_URL"
echo " Check branch name, path, and that the push succeeded."
exit 1
fi
else
echo "⚠️ Could not extract a raw URL from the comment — verify manually."
fi
```
## Step 8: Evaluate test completeness and post a GitHub review
After posting the PR comment, evaluate whether the test run actually covered everything it needed to. This is NOT a rubber-stamp — be critical. Then post a formal GitHub review so the PR author and reviewers can see the verdict.
### 8a. Evaluate against the test plan
Re-read `$RESULTS_DIR/test-plan.md` (written in Step 2) and `$RESULTS_DIR/test-report.md` (written in Step 5). For each scenario in the plan, answer:
> **Note:** `test-report.md` is written in Step 5. If it doesn't exist, write it before proceeding here — see the Step 5 template. Do not skip evaluation because the file is missing; create it from your notes instead.
| Question | Pass criteria |
|----------|--------------|
| Was it tested? | Explicit steps were executed, not just described |
| Is there screenshot evidence? | At least one before/after screenshot per scenario |
| Did the core feature work correctly? | Expected state matches actual state |
| Were negative cases tested? | At least one failure/rejection case per feature |
| Was DB/API state verified (not just UI)? | Raw API response or DB query confirms state change |
Build a verdict:
- **APPROVE** — every scenario tested, evidence present, no bugs found or all bugs are minor/known
- **REQUEST_CHANGES** — one or more: untested scenarios, missing evidence, bugs found, data not verified
### 8b. Post the GitHub review
```bash
EVAL_FILE=$(mktemp)
# === STEP A: Write header ===
cat > "$EVAL_FILE" << 'ENDEVAL'
## 🧪 Test Evaluation
### Coverage checklist
ENDEVAL
# === STEP B: Append ONE line per scenario — do this BEFORE calculating verdict ===
# Format: "- ✅ **Scenario N name**: <what was done and verified>"
# or "- ❌ **Scenario N name**: <what is missing or broken>"
# Examples:
# echo "- ✅ **Scenario 1 Login flow**: tested, screenshot evidence present, auth token verified via API" >> "$EVAL_FILE"
# echo "- ❌ **Scenario 3 Cost logging**: NOT verified in DB — UI showed entry but raw SQL query was skipped" >> "$EVAL_FILE"
#
# !!! IMPORTANT: append ALL scenario lines here before proceeding to STEP C !!!
# === STEP C: Derive verdict from the checklist — runs AFTER all lines are appended ===
FAIL_COUNT=$(grep -c "^- ❌" "$EVAL_FILE" || true)
if [ "$FAIL_COUNT" -eq 0 ]; then
VERDICT="APPROVE"
else
VERDICT="REQUEST_CHANGES"
fi
# === STEP D: Append verdict section ===
cat >> "$EVAL_FILE" << ENDVERDICT
### Verdict
ENDVERDICT
if [ "$VERDICT" = "APPROVE" ]; then
echo "✅ All scenarios covered with evidence. No blocking issues found." >> "$EVAL_FILE"
else
echo "$FAIL_COUNT scenario(s) incomplete or have confirmed bugs. See ❌ items above." >> "$EVAL_FILE"
echo "" >> "$EVAL_FILE"
echo "**Required before merge:** address each ❌ item above." >> "$EVAL_FILE"
fi
# === STEP E: Post the review ===
gh api "repos/${REPO}/pulls/$PR_NUMBER/reviews" \
--method POST \
-f body="$(cat "$EVAL_FILE")" \
-f event="$VERDICT"
rm -f "$EVAL_FILE"
```
**Rules:**
- Never auto-approve without checking every scenario in the test plan
- `REQUEST_CHANGES` if ANY scenario is untested, lacks DB/API evidence, or has a confirmed bug
- The evaluation body must list every scenario explicitly (✅ or ❌) — not just the failures
- If you find new bugs during evaluation, add them to the request-changes body and (if `--fix` flag is set) fix them before posting
## Fix mode (--fix flag)
When `--fix` is present, the standard is HIGHER. Do not just note issues — FIX them immediately.

View File

@@ -0,0 +1,224 @@
---
name: write-frontend-tests
description: "Analyze the current branch diff against dev, plan integration tests for changed frontend pages/components, and write them. TRIGGER when user asks to write frontend tests, add test coverage, or 'write tests for my changes'."
user-invocable: true
args: "[base branch] — defaults to dev. Optionally pass a specific base branch to diff against."
metadata:
author: autogpt-team
version: "1.0.0"
---
# Write Frontend Tests
Analyze the current branch's frontend changes, plan integration tests, and write them.
## References
Before writing any tests, read the testing rules and conventions:
- `autogpt_platform/frontend/TESTING.md` — testing strategy, file locations, examples
- `autogpt_platform/frontend/src/tests/AGENTS.md` — detailed testing rules, MSW patterns, decision flowchart
- `autogpt_platform/frontend/src/tests/integrations/test-utils.tsx` — custom render with providers
- `autogpt_platform/frontend/src/tests/integrations/vitest.setup.tsx` — MSW server setup
## Step 1: Identify changed frontend files
```bash
BASE_BRANCH="${ARGUMENTS:-dev}"
cd autogpt_platform/frontend
# Get changed frontend files (excluding generated, config, and test files)
git diff "$BASE_BRANCH"...HEAD --name-only -- src/ \
| grep -v '__generated__' \
| grep -v '__tests__' \
| grep -v '\.test\.' \
| grep -v '\.stories\.' \
| grep -v '\.spec\.'
```
Also read the diff to understand what changed:
```bash
git diff "$BASE_BRANCH"...HEAD --stat -- src/
git diff "$BASE_BRANCH"...HEAD -- src/ | head -500
```
## Step 2: Categorize changes and find test targets
For each changed file, determine:
1. **Is it a page?** (`page.tsx`) — these are the primary test targets
2. **Is it a hook?** (`use*.ts`) — test via the page that uses it
3. **Is it a component?** (`.tsx` in `components/`) — test via the parent page unless it's complex enough to warrant isolation
4. **Is it a helper?** (`helpers.ts`, `utils.ts`) — unit test directly if pure logic
**Priority order:**
1. Pages with new/changed data fetching or user interactions
2. Components with complex internal logic (modals, forms, wizards)
3. Hooks with non-trivial business logic
4. Pure helper functions
Skip: styling-only changes, type-only changes, config changes.
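The categorization above can be sketched as a small classifier. This is an illustrative helper only, not part of the skill: `categorize` is a hypothetical name, and treating a hook as a `.ts` file whose name starts with `use` followed by an uppercase letter is an assumption that narrows the `use*.ts` glob.
```python
import re
from pathlib import PurePosixPath

# Assumption: hook files look like useSomething.ts
_HOOK_NAME = re.compile(r"^use[A-Z]\w*\.ts$")
_HELPER_NAMES = {"helpers.ts", "utils.ts"}


def categorize(path: str) -> str:
    """Classify a changed frontend file as page, hook, helper, component, or other."""
    p = PurePosixPath(path)
    if p.name == "page.tsx":
        return "page"
    if _HOOK_NAME.match(p.name):
        return "hook"
    if p.name in _HELPER_NAMES:
        return "helper"
    if p.suffix == ".tsx" and "components" in p.parts:
        return "component"
    return "other"
```
Files that land in `other` still need a manual look, since styling-only and type-only changes cannot be detected from the path alone.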
## Step 3: Check for existing tests
For each test target, check if tests already exist:
```bash
# For a page at src/app/(platform)/library/page.tsx
ls src/app/\(platform\)/library/__tests__/ 2>/dev/null
# For a component at src/app/(platform)/library/components/AgentCard/AgentCard.tsx
ls src/app/\(platform\)/library/components/AgentCard/__tests__/ 2>/dev/null
```
Note which targets have no tests (need new files) vs which have tests that need updating.
## Step 4: Identify API endpoints used
For each test target, find which API hooks are used:
```bash
# Find generated API hook imports in the changed files
grep -rn 'from.*__generated__/endpoints' src/app/\(platform\)/library/
grep -rn 'use[A-Z].*V[12]' src/app/\(platform\)/library/
```
For each API hook found, locate the corresponding MSW handler:
```bash
# If the page uses useGetV2ListLibraryAgents, find its MSW handlers
grep -rn 'getGetV2ListLibraryAgents.*Handler' src/app/api/__generated__/endpoints/library/library.msw.ts
```
List every MSW handler you will need (200 for happy path, 4xx for error paths).
## Step 5: Write the test plan
Before writing code, output a plan as a numbered list:
```
Test plan for [branch name]:

1. src/app/(platform)/library/__tests__/main.test.tsx (NEW)
   - Renders page with agent list (MSW 200)
   - Shows loading state
   - Shows error state (MSW 422)
   - Handles empty agent list

2. src/app/(platform)/library/__tests__/search.test.tsx (NEW)
   - Filters agents by search query
   - Shows no results message
   - Clears search

3. src/app/(platform)/library/components/AgentCard/__tests__/AgentCard.test.tsx (UPDATE)
   - Add test for new "duplicate" action
```
Present this plan to the user. Wait for confirmation before proceeding. If the user has feedback, adjust the plan.
## Step 6: Write the tests
For each test file in the plan, follow these conventions:
### File structure
```tsx
import { render, screen, waitFor } from "@/tests/integrations/test-utils";
import { server } from "@/mocks/mock-server";
// Import MSW handlers for endpoints the page uses
import {
  getGetV2ListLibraryAgentsMockHandler200,
  getGetV2ListLibraryAgentsMockHandler422,
} from "@/app/api/__generated__/endpoints/library/library.msw";
// Import the component under test
import LibraryPage from "../page";

describe("LibraryPage", () => {
  test("renders agent list from API", async () => {
    server.use(getGetV2ListLibraryAgentsMockHandler200());
    render(<LibraryPage />);
    expect(await screen.findByText(/my agents/i)).toBeDefined();
  });

  test("shows error state on API failure", async () => {
    server.use(getGetV2ListLibraryAgentsMockHandler422());
    render(<LibraryPage />);
    expect(await screen.findByText(/error/i)).toBeDefined();
  });
});
```
### Rules
- Use `render()` from `@/tests/integrations/test-utils` (NOT from `@testing-library/react` directly)
- Use `server.use()` to set up MSW handlers BEFORE rendering
- Use `findBy*` (async) for elements that appear after data fetching — NOT `getBy*`
- Use `getBy*` only for elements that are immediately present in the DOM
- Use `screen` queries — do NOT destructure from `render()`
- Use `waitFor` when asserting side effects or state changes after interactions
- Import `fireEvent` or `userEvent` from the test-utils for interactions
- Do NOT mock internal hooks or functions — mock at the API boundary via MSW
- Do NOT use `act()` manually — `render` and `fireEvent` handle it
- Keep tests focused: one behavior per test
- Use descriptive test names that read like sentences
### Test location
```
# For pages: __tests__/ next to page.tsx
src/app/(platform)/library/__tests__/main.test.tsx
# For complex standalone components: __tests__/ inside component folder
src/app/(platform)/library/components/AgentCard/__tests__/AgentCard.test.tsx
# For pure helpers: co-located .test.ts
src/app/(platform)/library/helpers.test.ts
```
### Custom MSW overrides
When the auto-generated faker data is not enough, override with specific data:
```tsx
import { http, HttpResponse } from "msw";

server.use(
  http.get("http://localhost:3000/api/proxy/api/v2/library/agents", () => {
    return HttpResponse.json({
      agents: [
        { id: "1", name: "Test Agent", description: "A test agent" },
      ],
      pagination: { total_items: 1, total_pages: 1, page: 1, page_size: 10 },
    });
  }),
);
```
Use the proxy URL pattern: `http://localhost:3000/api/proxy/api/v{version}/{path}` — this matches the MSW base URL configured in `orval.config.ts`.
## Step 7: Run and verify
After writing all tests:
```bash
cd autogpt_platform/frontend
pnpm test:unit --reporter=verbose
```
If tests fail:
1. Read the error output carefully
2. Fix the test (not the source code, unless there is a genuine bug)
3. Re-run until all pass

Then run the full checks:
```bash
pnpm format
pnpm lint
pnpm types
```

View File

@@ -179,21 +179,30 @@ jobs:
pip install pyyaml
# Resolve extends and generate a flat compose file that bake can understand
export NEXT_PUBLIC_SOURCEMAPS NEXT_PUBLIC_PW_TEST
docker compose -f docker-compose.yml config > docker-compose.resolved.yml
# Ensure NEXT_PUBLIC_SOURCEMAPS is in resolved compose
# (docker compose config on some versions drops this arg)
if ! grep -q "NEXT_PUBLIC_SOURCEMAPS" docker-compose.resolved.yml; then
echo "Injecting NEXT_PUBLIC_SOURCEMAPS into resolved compose (docker compose config dropped it)"
sed -i '/NEXT_PUBLIC_PW_TEST/a\ NEXT_PUBLIC_SOURCEMAPS: "true"' docker-compose.resolved.yml
fi
# Add cache configuration to the resolved compose file
python ../.github/workflows/scripts/docker-ci-fix-compose-build-cache.py \
--source docker-compose.resolved.yml \
--cache-from "type=gha" \
--cache-to "type=gha,mode=max" \
--backend-hash "${{ hashFiles('autogpt_platform/backend/Dockerfile', 'autogpt_platform/backend/poetry.lock', 'autogpt_platform/backend/backend/**') }}" \
--frontend-hash "${{ hashFiles('autogpt_platform/frontend/Dockerfile', 'autogpt_platform/frontend/pnpm-lock.yaml', 'autogpt_platform/frontend/src/**') }}" \
--frontend-hash "${{ hashFiles('autogpt_platform/frontend/Dockerfile', 'autogpt_platform/frontend/pnpm-lock.yaml', 'autogpt_platform/frontend/src/**') }}-sourcemaps" \
--git-ref "${{ github.ref }}"
# Build with bake using the resolved compose file (now includes cache config)
docker buildx bake --allow=fs.read=.. -f docker-compose.resolved.yml --load
env:
NEXT_PUBLIC_PW_TEST: true
NEXT_PUBLIC_SOURCEMAPS: true
- name: Set up tests - Cache E2E test data
id: e2e-data-cache
@@ -279,6 +288,11 @@ jobs:
cache: "pnpm"
cache-dependency-path: autogpt_platform/frontend/pnpm-lock.yaml
- name: Copy source maps from Docker for E2E coverage
run: |
FRONTEND_CONTAINER=$(docker compose -f ../docker-compose.resolved.yml ps -q frontend)
docker cp "$FRONTEND_CONTAINER":/app/.next/static .next-static-coverage
- name: Set up tests - Install dependencies
run: pnpm install --frozen-lockfile
@@ -289,6 +303,15 @@ jobs:
run: pnpm test:no-build
continue-on-error: false
- name: Upload E2E coverage to Codecov
if: ${{ !cancelled() }}
uses: codecov/codecov-action@v5
with:
token: ${{ secrets.CODECOV_TOKEN }}
flags: platform-frontend-e2e
files: ./autogpt_platform/frontend/coverage/e2e/cobertura-coverage.xml
disable_search: true
- name: Upload Playwright report
if: always()
uses: actions/upload-artifact@v4

36
.gitleaks.toml Normal file

@@ -0,0 +1,36 @@
title = "AutoGPT Gitleaks Config"

[extend]
useDefault = true

[allowlist]
description = "Global allowlist"
paths = [
    # Template/example env files (no real secrets)
    '''\.env\.(default|example|template)$''',
    # Lock files
    '''pnpm-lock\.yaml$''',
    '''poetry\.lock$''',
    # Secrets baseline
    '''\.secrets\.baseline$''',
    # Build artifacts and caches (should not be committed)
    '''__pycache__/''',
    '''classic/frontend/build/''',
    # Docker dev setup (local dev JWTs/keys only)
    '''autogpt_platform/db/docker/''',
    # Load test configs (dev JWTs)
    '''load-tests/configs/''',
    # Test files with fake/fixture keys (_test.py, test_*.py, conftest.py)
    '''(_test|test_.*|conftest)\.py$''',
    # Documentation (only contains placeholder keys in curl/API examples)
    '''docs/.*\.md$''',
    # Firebase config (public API keys by design)
    '''google-services\.json$''',
    '''classic/frontend/(lib|web)/''',
]
# CI test-only encryption key (marked DO NOT USE IN PRODUCTION)
regexes = [
    '''dvziYgz0KSK8FENhju0ZYi8''',
    # LLM model name enum values falsely flagged as API keys
    '''Llama-\d.*Instruct''',
]
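
To see which files the `paths` allowlist above would exempt, the patterns can be tried against sample paths. The sketch below is a rough approximation in Python: it assumes gitleaks applies each pattern as an unanchored search over the file path, and it uses Python's `re` module in place of Go's regexp engine (the patterns here use only syntax both support). The helper name `is_allowlisted` is hypothetical.

```python
import re

# Path patterns copied from the [allowlist] table above.
ALLOWLIST_PATHS = [
    r"\.env\.(default|example|template)$",
    r"pnpm-lock\.yaml$",
    r"poetry\.lock$",
    r"\.secrets\.baseline$",
    r"__pycache__/",
    r"classic/frontend/build/",
    r"autogpt_platform/db/docker/",
    r"load-tests/configs/",
    r"(_test|test_.*|conftest)\.py$",
    r"docs/.*\.md$",
    r"google-services\.json$",
    r"classic/frontend/(lib|web)/",
]
_COMPILED = [re.compile(p) for p in ALLOWLIST_PATHS]


def is_allowlisted(path: str) -> bool:
    """True if any allowlist pattern matches somewhere in the path."""
    return any(rx.search(path) for rx in _COMPILED)
```

Because the test-file pattern is not anchored to a path separator, under this assumption it would also match a name such as `attest_keys.py`. Prefixing the pattern with `(^|/)` would be one way to tighten it, if that turns out to matter.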

View File

@@ -23,9 +23,15 @@ repos:
- id: detect-secrets
name: Detect secrets
description: Detects high entropy strings that are likely to be passwords.
args: ["--baseline", ".secrets.baseline"]
files: ^autogpt_platform/
exclude: pnpm-lock\.yaml$
stages: [pre-push]
exclude: (pnpm-lock\.yaml|\.env\.(default|example|template))$
- repo: https://github.com/gitleaks/gitleaks
rev: v8.24.3
hooks:
- id: gitleaks
name: Detect secrets (gitleaks)
- repo: local
# For proper type checking, all dependencies need to be up-to-date.

467
.secrets.baseline Normal file

@@ -0,0 +1,467 @@
{
"version": "1.5.0",
"plugins_used": [
{
"name": "ArtifactoryDetector"
},
{
"name": "AWSKeyDetector"
},
{
"name": "AzureStorageKeyDetector"
},
{
"name": "Base64HighEntropyString",
"limit": 4.5
},
{
"name": "BasicAuthDetector"
},
{
"name": "CloudantDetector"
},
{
"name": "DiscordBotTokenDetector"
},
{
"name": "GitHubTokenDetector"
},
{
"name": "GitLabTokenDetector"
},
{
"name": "HexHighEntropyString",
"limit": 3.0
},
{
"name": "IbmCloudIamDetector"
},
{
"name": "IbmCosHmacDetector"
},
{
"name": "IPPublicDetector"
},
{
"name": "JwtTokenDetector"
},
{
"name": "KeywordDetector",
"keyword_exclude": ""
},
{
"name": "MailchimpDetector"
},
{
"name": "NpmDetector"
},
{
"name": "OpenAIDetector"
},
{
"name": "PrivateKeyDetector"
},
{
"name": "PypiTokenDetector"
},
{
"name": "SendGridDetector"
},
{
"name": "SlackDetector"
},
{
"name": "SoftlayerDetector"
},
{
"name": "SquareOAuthDetector"
},
{
"name": "StripeDetector"
},
{
"name": "TelegramBotTokenDetector"
},
{
"name": "TwilioKeyDetector"
}
],
"filters_used": [
{
"path": "detect_secrets.filters.allowlist.is_line_allowlisted"
},
{
"path": "detect_secrets.filters.common.is_ignored_due_to_verification_policies",
"min_level": 2
},
{
"path": "detect_secrets.filters.heuristic.is_indirect_reference"
},
{
"path": "detect_secrets.filters.heuristic.is_likely_id_string"
},
{
"path": "detect_secrets.filters.heuristic.is_lock_file"
},
{
"path": "detect_secrets.filters.heuristic.is_not_alphanumeric_string"
},
{
"path": "detect_secrets.filters.heuristic.is_potential_uuid"
},
{
"path": "detect_secrets.filters.heuristic.is_prefixed_with_dollar_sign"
},
{
"path": "detect_secrets.filters.heuristic.is_sequential_string"
},
{
"path": "detect_secrets.filters.heuristic.is_swagger_file"
},
{
"path": "detect_secrets.filters.heuristic.is_templated_secret"
},
{
"path": "detect_secrets.filters.regex.should_exclude_file",
"pattern": [
"\\.env$",
"pnpm-lock\\.yaml$",
"\\.env\\.(default|example|template)$",
"__pycache__",
"_test\\.py$",
"test_.*\\.py$",
"conftest\\.py$",
"poetry\\.lock$",
"node_modules"
]
}
],
"results": {
"autogpt_platform/backend/backend/api/external/v1/integrations.py": [
{
"type": "Secret Keyword",
"filename": "autogpt_platform/backend/backend/api/external/v1/integrations.py",
"hashed_secret": "665b1e3851eefefa3fb878654292f16597d25155",
"is_verified": false,
"line_number": 289
}
],
"autogpt_platform/backend/backend/blocks/airtable/_config.py": [
{
"type": "Secret Keyword",
"filename": "autogpt_platform/backend/backend/blocks/airtable/_config.py",
"hashed_secret": "57e168b03afb7c1ee3cdc4ee3db2fe1cc6e0df26",
"is_verified": false,
"line_number": 29
}
],
"autogpt_platform/backend/backend/blocks/dataforseo/_config.py": [
{
"type": "Secret Keyword",
"filename": "autogpt_platform/backend/backend/blocks/dataforseo/_config.py",
"hashed_secret": "32ce93887331fa5d192f2876ea15ec000c7d58b8",
"is_verified": false,
"line_number": 12
}
],
"autogpt_platform/backend/backend/blocks/github/checks.py": [
{
"type": "Hex High Entropy String",
"filename": "autogpt_platform/backend/backend/blocks/github/checks.py",
"hashed_secret": "8ac6f92737d8586790519c5d7bfb4d2eb172c238",
"is_verified": false,
"line_number": 108
}
],
"autogpt_platform/backend/backend/blocks/github/ci.py": [
{
"type": "Hex High Entropy String",
"filename": "autogpt_platform/backend/backend/blocks/github/ci.py",
"hashed_secret": "90bd1b48e958257948487b90bee080ba5ed00caa",
"is_verified": false,
"line_number": 123
}
],
"autogpt_platform/backend/backend/blocks/github/example_payloads/pull_request.synchronize.json": [
{
"type": "Hex High Entropy String",
"filename": "autogpt_platform/backend/backend/blocks/github/example_payloads/pull_request.synchronize.json",
"hashed_secret": "f96896dafced7387dcd22343b8ea29d3d2c65663",
"is_verified": false,
"line_number": 42
},
{
"type": "Hex High Entropy String",
"filename": "autogpt_platform/backend/backend/blocks/github/example_payloads/pull_request.synchronize.json",
"hashed_secret": "b80a94d5e70bedf4f5f89d2f5a5255cc9492d12e",
"is_verified": false,
"line_number": 193
},
{
"type": "Hex High Entropy String",
"filename": "autogpt_platform/backend/backend/blocks/github/example_payloads/pull_request.synchronize.json",
"hashed_secret": "75b17e517fe1b3136394f6bec80c4f892da75e42",
"is_verified": false,
"line_number": 344
},
{
"type": "Hex High Entropy String",
"filename": "autogpt_platform/backend/backend/blocks/github/example_payloads/pull_request.synchronize.json",
"hashed_secret": "b0bfb5e4e2394e7f8906e5ed1dffd88b2bc89dd5",
"is_verified": false,
"line_number": 534
}
],
"autogpt_platform/backend/backend/blocks/github/statuses.py": [
{
"type": "Hex High Entropy String",
"filename": "autogpt_platform/backend/backend/blocks/github/statuses.py",
"hashed_secret": "8ac6f92737d8586790519c5d7bfb4d2eb172c238",
"is_verified": false,
"line_number": 85
}
],
"autogpt_platform/backend/backend/blocks/google/docs.py": [
{
"type": "Hex High Entropy String",
"filename": "autogpt_platform/backend/backend/blocks/google/docs.py",
"hashed_secret": "c95da0c6696342c867ef0c8258d2f74d20fd94d4",
"is_verified": false,
"line_number": 203
}
],
"autogpt_platform/backend/backend/blocks/google/sheets.py": [
{
"type": "Base64 High Entropy String",
"filename": "autogpt_platform/backend/backend/blocks/google/sheets.py",
"hashed_secret": "bd5a04fa3667e693edc13239b6d310c5c7a8564b",
"is_verified": false,
"line_number": 57
}
],
"autogpt_platform/backend/backend/blocks/linear/_config.py": [
{
"type": "Secret Keyword",
"filename": "autogpt_platform/backend/backend/blocks/linear/_config.py",
"hashed_secret": "b37f020f42d6d613b6ce30103e4d408c4499b3bb",
"is_verified": false,
"line_number": 53
}
],
"autogpt_platform/backend/backend/blocks/medium.py": [
{
"type": "Hex High Entropy String",
"filename": "autogpt_platform/backend/backend/blocks/medium.py",
"hashed_secret": "ff998abc1ce6d8f01a675fa197368e44c8916e9c",
"is_verified": false,
"line_number": 131
}
],
"autogpt_platform/backend/backend/blocks/replicate/replicate_block.py": [
{
"type": "Hex High Entropy String",
"filename": "autogpt_platform/backend/backend/blocks/replicate/replicate_block.py",
"hashed_secret": "8bbdd6f26368f58ea4011d13d7f763cb662e66f0",
"is_verified": false,
"line_number": 55
}
],
"autogpt_platform/backend/backend/blocks/slant3d/webhook.py": [
{
"type": "Hex High Entropy String",
"filename": "autogpt_platform/backend/backend/blocks/slant3d/webhook.py",
"hashed_secret": "36263c76947443b2f6e6b78153967ac4a7da99f9",
"is_verified": false,
"line_number": 100
}
],
"autogpt_platform/backend/backend/blocks/talking_head.py": [
{
"type": "Base64 High Entropy String",
"filename": "autogpt_platform/backend/backend/blocks/talking_head.py",
"hashed_secret": "44ce2d66222529eea4a32932823466fc0601c799",
"is_verified": false,
"line_number": 113
}
],
"autogpt_platform/backend/backend/blocks/wordpress/_config.py": [
{
"type": "Secret Keyword",
"filename": "autogpt_platform/backend/backend/blocks/wordpress/_config.py",
"hashed_secret": "e62679512436161b78e8a8d68c8829c2a1031ccb",
"is_verified": false,
"line_number": 17
}
],
"autogpt_platform/backend/backend/util/cache.py": [
{
"type": "Secret Keyword",
"filename": "autogpt_platform/backend/backend/util/cache.py",
"hashed_secret": "37f0c918c3fa47ca4a70e42037f9f123fdfbc75b",
"is_verified": false,
"line_number": 449
}
],
"autogpt_platform/frontend/src/app/(platform)/build/components/FlowEditor/nodes/helpers.ts": [
{
"type": "Secret Keyword",
"filename": "autogpt_platform/frontend/src/app/(platform)/build/components/FlowEditor/nodes/helpers.ts",
"hashed_secret": "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8",
"is_verified": false,
"line_number": 6
}
],
"autogpt_platform/frontend/src/app/(platform)/dictionaries/en.json": [
{
"type": "Secret Keyword",
"filename": "autogpt_platform/frontend/src/app/(platform)/dictionaries/en.json",
"hashed_secret": "8be3c943b1609fffbfc51aad666d0a04adf83c9d",
"is_verified": false,
"line_number": 5
}
],
"autogpt_platform/frontend/src/app/(platform)/dictionaries/es.json": [
{
"type": "Secret Keyword",
"filename": "autogpt_platform/frontend/src/app/(platform)/dictionaries/es.json",
"hashed_secret": "5a6d1c612954979ea99ee33dbb2d231b00f6ac0a",
"is_verified": false,
"line_number": 5
}
],
"autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/AgentInputsReadOnly/helpers.ts": [
{
"type": "Secret Keyword",
"filename": "autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/AgentInputsReadOnly/helpers.ts",
"hashed_secret": "cf678cab87dc1f7d1b95b964f15375e088461679",
"is_verified": false,
"line_number": 6
},
{
"type": "Secret Keyword",
"filename": "autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/AgentInputsReadOnly/helpers.ts",
"hashed_secret": "f72cbb45464d487064610c5411c576ca4019d380",
"is_verified": false,
"line_number": 8
}
],
"autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/RunAgentModal/components/ModalRunSection/helpers.ts": [
{
"type": "Secret Keyword",
"filename": "autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/RunAgentModal/components/ModalRunSection/helpers.ts",
"hashed_secret": "cf678cab87dc1f7d1b95b964f15375e088461679",
"is_verified": false,
"line_number": 5
},
{
"type": "Secret Keyword",
"filename": "autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/RunAgentModal/components/ModalRunSection/helpers.ts",
"hashed_secret": "f72cbb45464d487064610c5411c576ca4019d380",
"is_verified": false,
"line_number": 7
}
],
"autogpt_platform/frontend/src/app/(platform)/profile/(user)/integrations/page.tsx": [
{
"type": "Secret Keyword",
"filename": "autogpt_platform/frontend/src/app/(platform)/profile/(user)/integrations/page.tsx",
"hashed_secret": "cf678cab87dc1f7d1b95b964f15375e088461679",
"is_verified": false,
"line_number": 192
},
{
"type": "Secret Keyword",
"filename": "autogpt_platform/frontend/src/app/(platform)/profile/(user)/integrations/page.tsx",
"hashed_secret": "86275db852204937bbdbdebe5fabe8536e030ab6",
"is_verified": false,
"line_number": 193
}
],
"autogpt_platform/frontend/src/components/contextual/CredentialsInput/helpers.ts": [
{
"type": "Secret Keyword",
"filename": "autogpt_platform/frontend/src/components/contextual/CredentialsInput/helpers.ts",
"hashed_secret": "47acd2028cf81b5da88ddeedb2aea4eca4b71fbd",
"is_verified": false,
"line_number": 102
},
{
"type": "Secret Keyword",
"filename": "autogpt_platform/frontend/src/components/contextual/CredentialsInput/helpers.ts",
"hashed_secret": "8be3c943b1609fffbfc51aad666d0a04adf83c9d",
"is_verified": false,
"line_number": 103
}
],
"autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts": [
{
"type": "Base64 High Entropy String",
"filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
"hashed_secret": "9c486c92f1a7420e1045c7ad963fbb7ba3621025",
"is_verified": false,
"line_number": 73
},
{
"type": "Base64 High Entropy String",
"filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
"hashed_secret": "9277508c7a6effc8fb59163efbfada189e35425c",
"is_verified": false,
"line_number": 75
},
{
"type": "Base64 High Entropy String",
"filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
"hashed_secret": "8dc7e2cb1d0935897d541bf5facab389b8a50340",
"is_verified": false,
"line_number": 77
},
{
"type": "Base64 High Entropy String",
"filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
"hashed_secret": "79a26ad48775944299be6aaf9fb1d5302c1ed75b",
"is_verified": false,
"line_number": 79
},
{
"type": "Base64 High Entropy String",
"filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
"hashed_secret": "a3b62b44500a1612e48d4cab8294df81561b3b1a",
"is_verified": false,
"line_number": 81
},
{
"type": "Base64 High Entropy String",
"filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
"hashed_secret": "a58979bd0b21ef4f50417d001008e60dd7a85c64",
"is_verified": false,
"line_number": 83
},
{
"type": "Base64 High Entropy String",
"filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
"hashed_secret": "6cb6e075f8e8c7c850f9d128d6608e5dbe209a79",
"is_verified": false,
"line_number": 85
}
],
"autogpt_platform/frontend/src/lib/constants.ts": [
{
"type": "Secret Keyword",
"filename": "autogpt_platform/frontend/src/lib/constants.ts",
"hashed_secret": "27b924db06a28cc755fb07c54f0fddc30659fe4d",
"is_verified": false,
"line_number": 10
}
],
"autogpt_platform/frontend/src/tests/credentials/index.ts": [
{
"type": "Secret Keyword",
"filename": "autogpt_platform/frontend/src/tests/credentials/index.ts",
"hashed_secret": "c18006fc138809314751cd1991f1e0b820fabd37",
"is_verified": false,
"line_number": 4
}
]
},
"generated_at": "2026-04-02T13:10:54Z"
}
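
When reviewing a baseline like the one above, it can help to know what `hashed_secret` holds. The sketch below assumes, as detect-secrets does at the time of writing, that it is the SHA-1 hex digest of the flagged string. The helper names `hash_secret` and `shared_hashes` are hypothetical and are not part of detect-secrets.

```python
import hashlib
from collections import defaultdict


def hash_secret(secret: str) -> str:
    """SHA-1 hex digest of the string, the form assumed for `hashed_secret`."""
    return hashlib.sha1(secret.encode("utf-8")).hexdigest()


def shared_hashes(baseline: dict) -> dict[str, list[str]]:
    """Map each hashed_secret that appears in more than one file to those files."""
    files_by_hash: dict[str, set[str]] = defaultdict(set)
    for filename, findings in baseline.get("results", {}).items():
        for finding in findings:
            files_by_hash[finding["hashed_secret"]].add(filename)
    return {h: sorted(fs) for h, fs in files_by_hash.items() if len(fs) > 1}
```

For example, `hash_secret("password")` gives `5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8`, the hash recorded for `build/components/FlowEditor/nodes/helpers.ts`. That suggests the finding is the literal word `password` (for instance a field name) rather than a credential, though only the source line can confirm it. Running `shared_hashes` over the loaded JSON would surface values such as `cf678cab...`, which recurs across several frontend files.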

View File

@@ -30,7 +30,7 @@ See `/frontend/CONTRIBUTING.md` for complete patterns. Quick reference:
- Regenerate with `pnpm generate:api`
- Pattern: `use{Method}{Version}{OperationName}`
4. **Styling**: Tailwind CSS only, use design tokens, Phosphor Icons only
5. **Testing**: Add Storybook stories for new components, Playwright for E2E
5. **Testing**: Integration tests (Vitest + RTL + MSW) are the default (~90%, page-level). Playwright for E2E critical flows. Storybook for design system components. See `autogpt_platform/frontend/TESTING.md`
6. **Code conventions**: Function declarations (not arrow functions) for components/handlers
- Component props should be `interface Props { ... }` (not exported) unless the interface needs to be used outside the component
@@ -47,7 +47,9 @@ See `/frontend/CONTRIBUTING.md` for complete patterns. Quick reference:
## Testing
- Backend: `poetry run test` (runs pytest with a docker based postgres + prisma).
- Frontend: `pnpm test` or `pnpm test-ui` for Playwright tests. See `docs/content/platform/contributing/tests.md` for tips.
- Frontend integration tests: `pnpm test:unit` (Vitest + RTL + MSW, primary testing approach).
- Frontend E2E tests: `pnpm test` or `pnpm test-ui` for Playwright tests.
- See `autogpt_platform/frontend/TESTING.md` for the full testing strategy.
Always run the relevant linters and tests before committing.
Use conventional commit messages for all commits (e.g. `feat(backend): add API`).

View File

@@ -9,11 +9,14 @@ from pydantic import BaseModel
from backend.copilot.config import ChatConfig
from backend.copilot.rate_limit import (
SubscriptionTier,
get_global_rate_limits,
get_usage_status,
get_user_tier,
reset_user_usage,
set_user_tier,
)
from backend.data.user import get_user_by_email, get_user_email_by_id
from backend.data.user import get_user_by_email, get_user_email_by_id, search_users
logger = logging.getLogger(__name__)
@@ -33,6 +36,17 @@ class UserRateLimitResponse(BaseModel):
    weekly_token_limit: int
    daily_tokens_used: int
    weekly_tokens_used: int
    tier: SubscriptionTier


class UserTierResponse(BaseModel):
    user_id: str
    tier: SubscriptionTier


class SetUserTierRequest(BaseModel):
    user_id: str
    tier: SubscriptionTier
async def _resolve_user_id(
@@ -86,10 +100,10 @@ async def get_user_rate_limit(
logger.info("Admin %s checking rate limit for user %s", admin_user_id, resolved_id)
daily_limit, weekly_limit = await get_global_rate_limits(
daily_limit, weekly_limit, tier = await get_global_rate_limits(
resolved_id, config.daily_token_limit, config.weekly_token_limit
)
usage = await get_usage_status(resolved_id, daily_limit, weekly_limit)
usage = await get_usage_status(resolved_id, daily_limit, weekly_limit, tier=tier)
return UserRateLimitResponse(
user_id=resolved_id,
@@ -98,6 +112,7 @@ async def get_user_rate_limit(
weekly_token_limit=weekly_limit,
daily_tokens_used=usage.daily.used,
weekly_tokens_used=usage.weekly.used,
tier=tier,
)
@@ -125,10 +140,10 @@ async def reset_user_rate_limit(
logger.exception("Failed to reset user usage")
raise HTTPException(status_code=500, detail="Failed to reset usage") from e
daily_limit, weekly_limit = await get_global_rate_limits(
daily_limit, weekly_limit, tier = await get_global_rate_limits(
user_id, config.daily_token_limit, config.weekly_token_limit
)
usage = await get_usage_status(user_id, daily_limit, weekly_limit)
usage = await get_usage_status(user_id, daily_limit, weekly_limit, tier=tier)
try:
resolved_email = await get_user_email_by_id(user_id)
@@ -143,4 +158,102 @@ async def reset_user_rate_limit(
weekly_token_limit=weekly_limit,
daily_tokens_used=usage.daily.used,
weekly_tokens_used=usage.weekly.used,
tier=tier,
)
@router.get(
    "/rate_limit/tier",
    response_model=UserTierResponse,
    summary="Get User Rate Limit Tier",
)
async def get_user_rate_limit_tier(
    user_id: str,
    admin_user_id: str = Security(get_user_id),
) -> UserTierResponse:
    """Get a user's current rate-limit tier. Admin-only.

    Returns 404 if the user does not exist in the database.
    """
    logger.info("Admin %s checking tier for user %s", admin_user_id, user_id)

    resolved_email = await get_user_email_by_id(user_id)
    if resolved_email is None:
        raise HTTPException(status_code=404, detail=f"User {user_id} not found")

    tier = await get_user_tier(user_id)
    return UserTierResponse(user_id=user_id, tier=tier)


@router.post(
    "/rate_limit/tier",
    response_model=UserTierResponse,
    summary="Set User Rate Limit Tier",
)
async def set_user_rate_limit_tier(
    request: SetUserTierRequest,
    admin_user_id: str = Security(get_user_id),
) -> UserTierResponse:
    """Set a user's rate-limit tier. Admin-only.

    Returns 404 if the user does not exist in the database.
    """
    try:
        resolved_email = await get_user_email_by_id(request.user_id)
    except Exception:
        logger.warning(
            "Failed to resolve email for user %s",
            request.user_id,
            exc_info=True,
        )
        resolved_email = None

    if resolved_email is None:
        raise HTTPException(status_code=404, detail=f"User {request.user_id} not found")

    old_tier = await get_user_tier(request.user_id)
    logger.info(
        "Admin %s changing tier for user %s (%s): %s -> %s",
        admin_user_id,
        request.user_id,
        resolved_email,
        old_tier.value,
        request.tier.value,
    )
    try:
        await set_user_tier(request.user_id, request.tier)
    except Exception as e:
        logger.exception("Failed to set user tier")
        raise HTTPException(status_code=500, detail="Failed to set tier") from e

    return UserTierResponse(user_id=request.user_id, tier=request.tier)


class UserSearchResult(BaseModel):
    user_id: str
    user_email: Optional[str] = None


@router.get(
    "/rate_limit/search_users",
    response_model=list[UserSearchResult],
    summary="Search Users by Name or Email",
)
async def admin_search_users(
    query: str,
    limit: int = 20,
    admin_user_id: str = Security(get_user_id),
) -> list[UserSearchResult]:
    """Search users by partial email or name. Admin-only.

    Queries the User table directly — returns results even for users
    without credit transaction history.
    """
    if len(query.strip()) < 3:
        raise HTTPException(
            status_code=400,
            detail="Search query must be at least 3 characters.",
        )
    logger.info("Admin %s searching users with query=%r", admin_user_id, query)
    results = await search_users(query, limit=max(1, min(limit, 50)))
    return [UserSearchResult(user_id=uid, user_email=email) for uid, email in results]
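
The input handling in `admin_search_users` can be looked at in isolation. The sketch below pulls the two checks into standalone functions so their edge cases are easy to exercise. `clamp_limit`, `is_valid_query` and the two constants are hypothetical names introduced for illustration; the endpoint itself inlines `max(1, min(limit, 50))` and `len(query.strip()) < 3`.

```python
MIN_QUERY_LENGTH = 3  # mirrors the "< 3" check in the endpoint
MAX_LIMIT = 50        # mirrors the upper bound passed to search_users


def clamp_limit(limit: int, lo: int = 1, hi: int = MAX_LIMIT) -> int:
    """Clamp a caller-supplied limit into [lo, hi], as max(1, min(limit, 50)) does."""
    return max(lo, min(limit, hi))


def is_valid_query(query: str) -> bool:
    """True if the query has at least MIN_QUERY_LENGTH characters after stripping."""
    return len(query.strip()) >= MIN_QUERY_LENGTH
```

Note that the endpoint validates the stripped query but passes the original `query` to `search_users`, so surrounding whitespace reaches the search unless `search_users` strips it itself (its body is not shown in this hunk).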

View File

@@ -9,7 +9,7 @@ import pytest_mock
from autogpt_libs.auth.jwt_utils import get_jwt_payload
from pytest_snapshot.plugin import Snapshot
from backend.copilot.rate_limit import CoPilotUsageStatus, UsageWindow
from backend.copilot.rate_limit import CoPilotUsageStatus, SubscriptionTier, UsageWindow
from .rate_limit_admin_routes import router as rate_limit_admin_router
@@ -57,7 +57,7 @@ def _patch_rate_limit_deps(
mocker.patch(
f"{_MOCK_MODULE}.get_global_rate_limits",
new_callable=AsyncMock,
return_value=(2_500_000, 12_500_000),
return_value=(2_500_000, 12_500_000, SubscriptionTier.FREE),
)
mocker.patch(
f"{_MOCK_MODULE}.get_usage_status",
@@ -89,6 +89,7 @@ def test_get_rate_limit(
assert data["weekly_token_limit"] == 12_500_000
assert data["daily_tokens_used"] == 500_000
assert data["weekly_tokens_used"] == 3_000_000
assert data["tier"] == "FREE"
configured_snapshot.assert_match(
json.dumps(data, indent=2, sort_keys=True) + "\n",
@@ -162,6 +163,7 @@ def test_reset_user_usage_daily_only(
assert data["daily_tokens_used"] == 0
# Weekly is untouched
assert data["weekly_tokens_used"] == 3_000_000
assert data["tier"] == "FREE"
mock_reset.assert_awaited_once_with(target_user_id, reset_weekly=False)
@@ -192,6 +194,7 @@ def test_reset_user_usage_daily_and_weekly(
data = response.json()
assert data["daily_tokens_used"] == 0
assert data["weekly_tokens_used"] == 0
assert data["tier"] == "FREE"
mock_reset.assert_awaited_once_with(target_user_id, reset_weekly=True)
@@ -228,7 +231,7 @@ def test_get_rate_limit_email_lookup_failure(
mocker.patch(
f"{_MOCK_MODULE}.get_global_rate_limits",
new_callable=AsyncMock,
return_value=(2_500_000, 12_500_000),
return_value=(2_500_000, 12_500_000, SubscriptionTier.FREE),
)
mocker.patch(
f"{_MOCK_MODULE}.get_usage_status",
@@ -261,3 +264,303 @@ def test_admin_endpoints_require_admin_role(mock_jwt_user) -> None:
json={"user_id": "test"},
)
assert response.status_code == 403
# ---------------------------------------------------------------------------
# Tier management endpoints
# ---------------------------------------------------------------------------


def test_get_user_tier(
    mocker: pytest_mock.MockerFixture,
    target_user_id: str,
) -> None:
    """Test getting a user's rate-limit tier."""
    mocker.patch(
        f"{_MOCK_MODULE}.get_user_email_by_id",
        new_callable=AsyncMock,
        return_value=_TARGET_EMAIL,
    )
    mocker.patch(
        f"{_MOCK_MODULE}.get_user_tier",
        new_callable=AsyncMock,
        return_value=SubscriptionTier.PRO,
    )

    response = client.get("/admin/rate_limit/tier", params={"user_id": target_user_id})

    assert response.status_code == 200
    data = response.json()
    assert data["user_id"] == target_user_id
    assert data["tier"] == "PRO"


def test_get_user_tier_user_not_found(
    mocker: pytest_mock.MockerFixture,
    target_user_id: str,
) -> None:
    """Test that getting tier for a non-existent user returns 404."""
    mocker.patch(
        f"{_MOCK_MODULE}.get_user_email_by_id",
        new_callable=AsyncMock,
        return_value=None,
    )

    response = client.get("/admin/rate_limit/tier", params={"user_id": target_user_id})

    assert response.status_code == 404


def test_set_user_tier(
    mocker: pytest_mock.MockerFixture,
    target_user_id: str,
) -> None:
    """Test setting a user's rate-limit tier (upgrade)."""
    mocker.patch(
        f"{_MOCK_MODULE}.get_user_email_by_id",
        new_callable=AsyncMock,
        return_value=_TARGET_EMAIL,
    )
    mocker.patch(
        f"{_MOCK_MODULE}.get_user_tier",
        new_callable=AsyncMock,
        return_value=SubscriptionTier.FREE,
    )
    mock_set = mocker.patch(
        f"{_MOCK_MODULE}.set_user_tier",
        new_callable=AsyncMock,
    )

    response = client.post(
        "/admin/rate_limit/tier",
        json={"user_id": target_user_id, "tier": "ENTERPRISE"},
    )

    assert response.status_code == 200
    data = response.json()
    assert data["user_id"] == target_user_id
    assert data["tier"] == "ENTERPRISE"
    mock_set.assert_awaited_once_with(target_user_id, SubscriptionTier.ENTERPRISE)


def test_set_user_tier_downgrade(
    mocker: pytest_mock.MockerFixture,
    target_user_id: str,
) -> None:
    """Test downgrading a user's tier from PRO to FREE."""
    mocker.patch(
        f"{_MOCK_MODULE}.get_user_email_by_id",
        new_callable=AsyncMock,
        return_value=_TARGET_EMAIL,
    )
    mocker.patch(
        f"{_MOCK_MODULE}.get_user_tier",
        new_callable=AsyncMock,
        return_value=SubscriptionTier.PRO,
    )
    mock_set = mocker.patch(
        f"{_MOCK_MODULE}.set_user_tier",
        new_callable=AsyncMock,
    )

    response = client.post(
        "/admin/rate_limit/tier",
        json={"user_id": target_user_id, "tier": "FREE"},
    )

    assert response.status_code == 200
    data = response.json()
    assert data["user_id"] == target_user_id
    assert data["tier"] == "FREE"
    mock_set.assert_awaited_once_with(target_user_id, SubscriptionTier.FREE)


def test_set_user_tier_invalid_tier(
    target_user_id: str,
) -> None:
    """Test that setting an invalid tier returns 422."""
    response = client.post(
        "/admin/rate_limit/tier",
        json={"user_id": target_user_id, "tier": "invalid"},
    )

    assert response.status_code == 422


def test_set_user_tier_invalid_tier_uppercase(
    target_user_id: str,
) -> None:
    """Test that setting an unrecognised uppercase tier (e.g. 'INVALID') returns 422.

    Regression: ensures Pydantic enum validation rejects values that are not
    members of SubscriptionTier, even when they look like valid enum names.
    """
    response = client.post(
        "/admin/rate_limit/tier",
        json={"user_id": target_user_id, "tier": "INVALID"},
    )

    assert response.status_code == 422
    body = response.json()
    assert "detail" in body


def test_set_user_tier_email_lookup_failure_returns_404(
    mocker: pytest_mock.MockerFixture,
    target_user_id: str,
) -> None:
    """Test that email lookup failure returns 404 (user unverifiable)."""
    mocker.patch(
        f"{_MOCK_MODULE}.get_user_email_by_id",
        new_callable=AsyncMock,
        side_effect=Exception("DB connection failed"),
    )

    response = client.post(
        "/admin/rate_limit/tier",
        json={"user_id": target_user_id, "tier": "PRO"},
    )

    assert response.status_code == 404


def test_set_user_tier_user_not_found(
    mocker: pytest_mock.MockerFixture,
    target_user_id: str,
) -> None:
    """Test that setting tier for a non-existent user returns 404."""
    mocker.patch(
        f"{_MOCK_MODULE}.get_user_email_by_id",
        new_callable=AsyncMock,
        return_value=None,
    )

    response = client.post(
        "/admin/rate_limit/tier",
        json={"user_id": target_user_id, "tier": "PRO"},
    )

    assert response.status_code == 404


def test_set_user_tier_db_failure(
    mocker: pytest_mock.MockerFixture,
    target_user_id: str,
) -> None:
    """Test that DB failure on set tier returns 500."""
    mocker.patch(
        f"{_MOCK_MODULE}.get_user_email_by_id",
        new_callable=AsyncMock,
        return_value=_TARGET_EMAIL,
    )
    mocker.patch(
        f"{_MOCK_MODULE}.get_user_tier",
        new_callable=AsyncMock,
        return_value=SubscriptionTier.FREE,
    )
    mocker.patch(
        f"{_MOCK_MODULE}.set_user_tier",
        new_callable=AsyncMock,
        side_effect=Exception("DB connection refused"),
    )

    response = client.post(
        "/admin/rate_limit/tier",
        json={"user_id": target_user_id, "tier": "PRO"},
    )

    assert response.status_code == 500


def test_tier_endpoints_require_admin_role(mock_jwt_user) -> None:
    """Test that tier admin endpoints require admin role."""
    app.dependency_overrides[get_jwt_payload] = mock_jwt_user["get_jwt_payload"]

    response = client.get("/admin/rate_limit/tier", params={"user_id": "test"})
    assert response.status_code == 403

    response = client.post(
        "/admin/rate_limit/tier",
        json={"user_id": "test", "tier": "PRO"},
    )
    assert response.status_code == 403
# ─── search_users endpoint ──────────────────────────────────────────
def test_search_users_returns_matching_users(
mocker: pytest_mock.MockerFixture,
admin_user_id: str,
) -> None:
"""Partial search should return all matching users from the User table."""
mocker.patch(
_MOCK_MODULE + ".search_users",
new_callable=AsyncMock,
return_value=[
("user-1", "zamil.majdy@gmail.com"),
("user-2", "zamil.majdy@agpt.co"),
],
)
response = client.get("/admin/rate_limit/search_users", params={"query": "zamil"})
assert response.status_code == 200
results = response.json()
assert len(results) == 2
assert results[0]["user_email"] == "zamil.majdy@gmail.com"
assert results[1]["user_email"] == "zamil.majdy@agpt.co"
def test_search_users_empty_results(
mocker: pytest_mock.MockerFixture,
admin_user_id: str,
) -> None:
"""Search with no matches returns empty list."""
mocker.patch(
_MOCK_MODULE + ".search_users",
new_callable=AsyncMock,
return_value=[],
)
response = client.get(
"/admin/rate_limit/search_users", params={"query": "nonexistent"}
)
assert response.status_code == 200
assert response.json() == []
def test_search_users_short_query_rejected(
admin_user_id: str,
) -> None:
"""Query shorter than 3 characters should return 400."""
response = client.get("/admin/rate_limit/search_users", params={"query": "ab"})
assert response.status_code == 400
def test_search_users_negative_limit_clamped(
mocker: pytest_mock.MockerFixture,
admin_user_id: str,
) -> None:
"""Negative limit should be clamped to 1, not passed through."""
mock_search = mocker.patch(
_MOCK_MODULE + ".search_users",
new_callable=AsyncMock,
return_value=[],
)
response = client.get(
"/admin/rate_limit/search_users", params={"query": "test", "limit": -1}
)
assert response.status_code == 200
mock_search.assert_awaited_once_with("test", limit=1)
def test_search_users_requires_admin_role(mock_jwt_user) -> None:
"""Test that the search_users endpoint requires admin role."""
app.dependency_overrides[get_jwt_payload] = mock_jwt_user["get_jwt_payload"]
response = client.get("/admin/rate_limit/search_users", params={"query": "test"})
assert response.status_code == 403
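
The search_users tests above pin down two input rules: a query shorter than 3 characters is rejected, and a negative limit is clamped to 1 rather than passed through. The following is a minimal sketch of that normalization, not the route's actual code. The names `normalize_search_params`, `MIN_QUERY_LENGTH` and `MAX_LIMIT` are hypothetical, and the upper bound of 50 is an assumption that the tests do not state.

```python
MIN_QUERY_LENGTH = 3  # taken from the "shorter than 3 characters" test
MAX_LIMIT = 50  # assumed upper bound; the tests only pin the lower clamp


def normalize_search_params(query: str, limit: int) -> tuple[str, int]:
    """Reject too-short queries and clamp ``limit`` into [1, MAX_LIMIT]."""
    if len(query) < MIN_QUERY_LENGTH:
        # A real route would translate this into an HTTP 400 response.
        raise ValueError(
            f"Query must be at least {MIN_QUERY_LENGTH} characters long"
        )
    # max(1, ...) is what turns limit=-1 into limit=1.
    return query, max(1, min(limit, MAX_LIMIT))
```

Clamping instead of rejecting keeps a sloppy client working, which matches the test expecting a 200 response and `limit=1` for a request with `limit=-1`.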

@@ -15,7 +15,8 @@ from pydantic import BaseModel, ConfigDict, Field, field_validator
from backend.copilot import service as chat_service
from backend.copilot import stream_registry
from backend.copilot.config import ChatConfig
from backend.copilot.config import ChatConfig, CopilotMode
from backend.copilot.db import get_chat_messages_paginated
from backend.copilot.executor.utils import enqueue_cancel_task, enqueue_copilot_turn
from backend.copilot.model import (
ChatMessage,
@@ -111,6 +112,11 @@ class StreamChatRequest(BaseModel):
file_ids: list[str] | None = Field(
default=None, max_length=20
) # Workspace file IDs attached to this message
mode: CopilotMode | None = Field(
default=None,
description="Autopilot mode: 'fast' for baseline LLM, 'extended_thinking' for Claude Agent SDK. "
"If None, uses the server default (extended_thinking).",
)
class CreateSessionRequest(BaseModel):
@@ -150,6 +156,8 @@ class SessionDetailResponse(BaseModel):
user_id: str | None
messages: list[dict]
active_stream: ActiveStreamInfo | None = None # Present if stream is still active
has_more_messages: bool = False
oldest_sequence: int | None = None
total_prompt_tokens: int = 0
total_completion_tokens: int = 0
metadata: ChatSessionMetadata = ChatSessionMetadata()
@@ -389,60 +397,78 @@ async def update_session_title_route(
async def get_session(
session_id: str,
user_id: Annotated[str, Security(auth.get_user_id)],
limit: int = Query(default=50, ge=1, le=200),
before_sequence: int | None = Query(default=None, ge=0),
) -> SessionDetailResponse:
"""
Retrieve the details of a specific chat session.
Looks up a chat session by ID for the given user (if authenticated) and returns all session data including messages.
If there's an active stream for this session, returns active_stream info for reconnection.
Supports cursor-based pagination via ``limit`` and ``before_sequence``.
When no pagination params are provided, returns the most recent messages.
Args:
session_id: The unique identifier for the desired chat session.
user_id: The optional authenticated user ID, or None for anonymous access.
user_id: The authenticated user's ID.
limit: Maximum number of messages to return (1-200, default 50).
before_sequence: Return messages with sequence < this value (cursor).
Returns:
SessionDetailResponse: Details for the requested session, including active_stream info if applicable.
SessionDetailResponse: Details for the requested session, including
active_stream info and pagination metadata.
"""
session = await get_chat_session(session_id, user_id)
if not session:
page = await get_chat_messages_paginated(
session_id, limit, before_sequence, user_id=user_id
)
if page is None:
raise NotFoundError(f"Session {session_id} not found.")
messages = [message.model_dump() for message in page.messages]
messages = [message.model_dump() for message in session.messages]
# Check if there's an active stream for this session
# Only check active stream on initial load (not on "load more" requests)
active_stream_info = None
active_session, last_message_id = await stream_registry.get_active_session(
session_id, user_id
)
logger.info(
f"[GET_SESSION] session={session_id}, active_session={active_session is not None}, "
f"msg_count={len(messages)}, last_role={messages[-1].get('role') if messages else 'none'}"
)
if active_session:
# Keep the assistant message (including tool_calls) so the frontend can
# render the correct tool UI (e.g. CreateAgent with mini game).
# convertChatSessionToUiMessages handles isComplete=false by setting
# tool parts without output to state "input-available".
active_stream_info = ActiveStreamInfo(
turn_id=active_session.turn_id,
last_message_id=last_message_id,
if before_sequence is None:
active_session, last_message_id = await stream_registry.get_active_session(
session_id, user_id
)
logger.info(
f"[GET_SESSION] session={session_id}, active_session={active_session is not None}, "
f"msg_count={len(messages)}, last_role={messages[-1].get('role') if messages else 'none'}"
)
if active_session:
active_stream_info = ActiveStreamInfo(
turn_id=active_session.turn_id,
last_message_id=last_message_id,
)
# Skip session metadata on "load more" — frontend only needs messages
if before_sequence is not None:
return SessionDetailResponse(
id=page.session.session_id,
created_at=page.session.started_at.isoformat(),
updated_at=page.session.updated_at.isoformat(),
user_id=page.session.user_id or None,
messages=messages,
active_stream=None,
has_more_messages=page.has_more,
oldest_sequence=page.oldest_sequence,
total_prompt_tokens=0,
total_completion_tokens=0,
)
# Sum token usage from session
total_prompt = sum(u.prompt_tokens for u in session.usage)
total_completion = sum(u.completion_tokens for u in session.usage)
total_prompt = sum(u.prompt_tokens for u in page.session.usage)
total_completion = sum(u.completion_tokens for u in page.session.usage)
return SessionDetailResponse(
id=session.session_id,
created_at=session.started_at.isoformat(),
updated_at=session.updated_at.isoformat(),
user_id=session.user_id or None,
id=page.session.session_id,
created_at=page.session.started_at.isoformat(),
updated_at=page.session.updated_at.isoformat(),
user_id=page.session.user_id or None,
messages=messages,
active_stream=active_stream_info,
has_more_messages=page.has_more,
oldest_sequence=page.oldest_sequence,
total_prompt_tokens=total_prompt,
total_completion_tokens=total_completion,
metadata=session.metadata,
metadata=page.session.metadata,
)
@@ -456,8 +482,9 @@ async def get_copilot_usage(
Returns current token usage vs limits for daily and weekly windows.
Global defaults sourced from LaunchDarkly (falling back to config).
Includes the user's rate-limit tier.
"""
daily_limit, weekly_limit = await get_global_rate_limits(
daily_limit, weekly_limit, tier = await get_global_rate_limits(
user_id, config.daily_token_limit, config.weekly_token_limit
)
return await get_usage_status(
@@ -465,6 +492,7 @@ async def get_copilot_usage(
daily_token_limit=daily_limit,
weekly_token_limit=weekly_limit,
rate_limit_reset_cost=config.rate_limit_reset_cost,
tier=tier,
)
@@ -516,7 +544,7 @@ async def reset_copilot_usage(
detail="Rate limit reset is not available (credit system is disabled).",
)
daily_limit, weekly_limit = await get_global_rate_limits(
daily_limit, weekly_limit, tier = await get_global_rate_limits(
user_id, config.daily_token_limit, config.weekly_token_limit
)
@@ -556,6 +584,7 @@ async def reset_copilot_usage(
user_id=user_id,
daily_token_limit=daily_limit,
weekly_token_limit=weekly_limit,
tier=tier,
)
if daily_limit > 0 and usage_status.daily.used < daily_limit:
raise HTTPException(
@@ -631,6 +660,7 @@ async def reset_copilot_usage(
daily_token_limit=daily_limit,
weekly_token_limit=weekly_limit,
rate_limit_reset_cost=config.rate_limit_reset_cost,
tier=tier,
)
return RateLimitResetResponse(
@@ -741,7 +771,7 @@ async def stream_chat_post(
# Global defaults sourced from LaunchDarkly, falling back to config.
if user_id:
try:
daily_limit, weekly_limit = await get_global_rate_limits(
daily_limit, weekly_limit, _ = await get_global_rate_limits(
user_id, config.daily_token_limit, config.weekly_token_limit
)
await check_rate_limit(
@@ -836,6 +866,7 @@ async def stream_chat_post(
is_user_message=request.is_user_message,
context=request.context,
file_ids=sanitized_file_ids,
mode=request.mode,
)
setup_time = (time.perf_counter() - stream_start_time) * 1000
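
The get_session changes above describe cursor-based pagination: return the newest `limit` messages, and on "load more" return the messages whose sequence is below `before_sequence`. A minimal in-memory sketch of that contract follows. It is an illustration only: `paginate_messages` and `Page` are hypothetical names, and the real `get_chat_messages_paginated` queries the database and may differ in detail.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Page:
    messages: list[dict]
    has_more: bool
    oldest_sequence: int | None


def paginate_messages(
    all_messages: list[dict],
    limit: int = 50,
    before_sequence: int | None = None,
) -> Page:
    """Return up to ``limit`` newest messages older than the cursor.

    ``limit`` is assumed to be >= 1, as the route enforces via ``ge=1``.
    """
    candidates = sorted(
        (
            m
            for m in all_messages
            if before_sequence is None or m["sequence"] < before_sequence
        ),
        key=lambda m: m["sequence"],
    )
    page = candidates[-limit:]  # newest `limit` items, in ascending order
    return Page(
        messages=page,
        has_more=len(candidates) > len(page),
        oldest_sequence=page[0]["sequence"] if page else None,
    )
```

A client would pass the returned `oldest_sequence` back as `before_sequence` to fetch the next older page, and stop once `has_more` is false.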

@@ -9,6 +9,7 @@ import pytest
import pytest_mock
from backend.api.features.chat import routes as chat_routes
from backend.copilot.rate_limit import SubscriptionTier
app = fastapi.FastAPI()
app.include_router(chat_routes.router)
@@ -331,14 +332,28 @@ def _mock_usage(
*,
daily_used: int = 500,
weekly_used: int = 2000,
daily_limit: int = 10000,
weekly_limit: int = 50000,
tier: "SubscriptionTier" = SubscriptionTier.FREE,
) -> AsyncMock:
"""Mock get_usage_status to return a predictable CoPilotUsageStatus."""
"""Mock get_usage_status and get_global_rate_limits for usage endpoint tests.
Mocks both ``get_global_rate_limits`` (returns the given limits + tier) and
``get_usage_status`` so that tests exercise the endpoint without hitting
LaunchDarkly or Prisma.
"""
from backend.copilot.rate_limit import CoPilotUsageStatus, UsageWindow
mocker.patch(
"backend.api.features.chat.routes.get_global_rate_limits",
new_callable=AsyncMock,
return_value=(daily_limit, weekly_limit, tier),
)
resets_at = datetime.now(UTC) + timedelta(days=1)
status = CoPilotUsageStatus(
daily=UsageWindow(used=daily_used, limit=10000, resets_at=resets_at),
weekly=UsageWindow(used=weekly_used, limit=50000, resets_at=resets_at),
daily=UsageWindow(used=daily_used, limit=daily_limit, resets_at=resets_at),
weekly=UsageWindow(used=weekly_used, limit=weekly_limit, resets_at=resets_at),
)
return mocker.patch(
"backend.api.features.chat.routes.get_usage_status",
@@ -369,6 +384,7 @@ def test_usage_returns_daily_and_weekly(
daily_token_limit=10000,
weekly_token_limit=50000,
rate_limit_reset_cost=chat_routes.config.rate_limit_reset_cost,
tier=SubscriptionTier.FREE,
)
@@ -376,11 +392,9 @@ def test_usage_uses_config_limits(
mocker: pytest_mock.MockerFixture,
test_user_id: str,
) -> None:
"""The endpoint forwards daily_token_limit and weekly_token_limit from config."""
mock_get = _mock_usage(mocker)
"""The endpoint forwards resolved limits from get_global_rate_limits to get_usage_status."""
mock_get = _mock_usage(mocker, daily_limit=99999, weekly_limit=77777)
mocker.patch.object(chat_routes.config, "daily_token_limit", 99999)
mocker.patch.object(chat_routes.config, "weekly_token_limit", 77777)
mocker.patch.object(chat_routes.config, "rate_limit_reset_cost", 500)
response = client.get("/usage")
@@ -391,6 +405,7 @@ def test_usage_uses_config_limits(
daily_token_limit=99999,
weekly_token_limit=77777,
rate_limit_reset_cost=500,
tier=SubscriptionTier.FREE,
)
@@ -526,3 +541,41 @@ def test_create_session_rejects_nested_metadata(
)
assert response.status_code == 422
class TestStreamChatRequestModeValidation:
"""Pydantic-level validation of the ``mode`` field on StreamChatRequest."""
def test_rejects_invalid_mode_value(self) -> None:
"""Any string outside the Literal set must raise ValidationError."""
from pydantic import ValidationError
from backend.api.features.chat.routes import StreamChatRequest
with pytest.raises(ValidationError):
StreamChatRequest(message="hi", mode="turbo") # type: ignore[arg-type]
def test_accepts_fast_mode(self) -> None:
from backend.api.features.chat.routes import StreamChatRequest
req = StreamChatRequest(message="hi", mode="fast")
assert req.mode == "fast"
def test_accepts_extended_thinking_mode(self) -> None:
from backend.api.features.chat.routes import StreamChatRequest
req = StreamChatRequest(message="hi", mode="extended_thinking")
assert req.mode == "extended_thinking"
def test_accepts_none_mode(self) -> None:
"""``mode=None`` is valid (server decides via feature flags)."""
from backend.api.features.chat.routes import StreamChatRequest
req = StreamChatRequest(message="hi", mode=None)
assert req.mode is None
def test_mode_defaults_to_none_when_omitted(self) -> None:
from backend.api.features.chat.routes import StreamChatRequest
req = StreamChatRequest(message="hi")
assert req.mode is None
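
The tests above treat `mode` as a closed set of two string values plus `None`, with `None` deferring the choice to the server. The sketch below shows the same rule with only the standard library. The two literals come from the field description in the route; `resolve_mode` and `DEFAULT_MODE` are hypothetical names, and the real server may pick its default through feature flags rather than a constant.

```python
from typing import Literal, Optional, get_args

CopilotMode = Literal["fast", "extended_thinking"]
DEFAULT_MODE = "extended_thinking"  # assumed from the field description


def resolve_mode(mode: Optional[str]) -> str:
    """Return the effective mode, falling back to the default for None."""
    if mode is None:
        return DEFAULT_MODE
    if mode not in get_args(CopilotMode):
        # Pydantic raises ValidationError here; ValueError stands in for it.
        raise ValueError(f"Unsupported mode: {mode!r}")
    return mode
```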

@@ -189,6 +189,7 @@ async def test_create_store_submission(mocker):
notifyOnAgentApproved=True,
notifyOnAgentRejected=True,
timezone="Europe/Delft",
subscriptionTier=prisma.enums.SubscriptionTier.FREE, # type: ignore[reportCallIssue,reportAttributeAccessIssue]
)
mock_agent = prisma.models.AgentGraph(
id="agent-id",

@@ -12,7 +12,7 @@ import fastapi
from autogpt_libs.auth.dependencies import get_user_id, requires_user
from fastapi import Query, UploadFile
from fastapi.responses import Response
from pydantic import BaseModel
from pydantic import BaseModel, Field
from backend.data.workspace import (
WorkspaceFile,
@@ -131,9 +131,26 @@ class StorageUsageResponse(BaseModel):
file_count: int
class WorkspaceFileItem(BaseModel):
id: str
name: str
path: str
mime_type: str
size_bytes: int
metadata: dict = Field(default_factory=dict)
created_at: str
class ListFilesResponse(BaseModel):
files: list[WorkspaceFileItem]
offset: int = 0
has_more: bool = False
@router.get(
"/files/{file_id}/download",
summary="Download file by ID",
operation_id="getWorkspaceDownloadFileById",
)
async def download_file(
user_id: Annotated[str, fastapi.Security(get_user_id)],
@@ -158,6 +175,7 @@ async def download_file(
@router.delete(
"/files/{file_id}",
summary="Delete a workspace file",
operation_id="deleteWorkspaceFile",
)
async def delete_workspace_file(
user_id: Annotated[str, fastapi.Security(get_user_id)],
@@ -183,6 +201,7 @@ async def delete_workspace_file(
@router.post(
"/files/upload",
summary="Upload file to workspace",
operation_id="uploadWorkspaceFile",
)
async def upload_file(
user_id: Annotated[str, fastapi.Security(get_user_id)],
@@ -196,6 +215,9 @@ async def upload_file(
Files are stored in session-scoped paths when session_id is provided,
so the agent's session-scoped tools can discover them automatically.
"""
# Empty-string session_id drops session scoping; normalize to None.
session_id = session_id or None
config = Config()
# Sanitize filename — strip any directory components
@@ -250,16 +272,27 @@ async def upload_file(
manager = WorkspaceManager(user_id, workspace.id, session_id)
try:
workspace_file = await manager.write_file(
content, filename, overwrite=overwrite
content, filename, overwrite=overwrite, metadata={"origin": "user-upload"}
)
except ValueError as e:
raise fastapi.HTTPException(status_code=409, detail=str(e)) from e
# write_file raises ValueError for both path-conflict and size-limit
# cases; map each to its correct HTTP status.
message = str(e)
if message.startswith("File too large"):
raise fastapi.HTTPException(status_code=413, detail=message) from e
raise fastapi.HTTPException(status_code=409, detail=message) from e
# Post-write storage check — eliminates TOCTOU race on the quota.
# If a concurrent upload pushed us over the limit, undo this write.
new_total = await get_workspace_total_size(workspace.id)
if storage_limit_bytes and new_total > storage_limit_bytes:
await soft_delete_workspace_file(workspace_file.id, workspace.id)
try:
await soft_delete_workspace_file(workspace_file.id, workspace.id)
except Exception as e:
logger.warning(
f"Failed to soft-delete over-quota file {workspace_file.id} "
f"in workspace {workspace.id}: {e}"
)
raise fastapi.HTTPException(
status_code=413,
detail={
@@ -281,6 +314,7 @@ async def upload_file(
@router.get(
"/storage/usage",
summary="Get workspace storage usage",
operation_id="getWorkspaceStorageUsage",
)
async def get_storage_usage(
user_id: Annotated[str, fastapi.Security(get_user_id)],
@@ -301,3 +335,57 @@ async def get_storage_usage(
used_percent=round((used_bytes / limit_bytes) * 100, 1) if limit_bytes else 0,
file_count=file_count,
)
@router.get(
"/files",
summary="List workspace files",
operation_id="listWorkspaceFiles",
)
async def list_workspace_files(
user_id: Annotated[str, fastapi.Security(get_user_id)],
session_id: str | None = Query(default=None),
limit: int = Query(default=200, ge=1, le=1000),
offset: int = Query(default=0, ge=0),
) -> ListFilesResponse:
"""
List files in the user's workspace.
When session_id is provided, only files for that session are returned.
Otherwise, all files across sessions are listed. Results are paginated
via `limit`/`offset`; `has_more` indicates whether additional pages exist.
"""
workspace = await get_or_create_workspace(user_id)
# Treat empty-string session_id the same as omitted — an empty value
# would otherwise silently list files across every session instead of
# scoping to one.
session_id = session_id or None
manager = WorkspaceManager(user_id, workspace.id, session_id)
include_all = session_id is None
# Fetch one extra to compute has_more without a separate count query.
files = await manager.list_files(
limit=limit + 1,
offset=offset,
include_all_sessions=include_all,
)
has_more = len(files) > limit
page = files[:limit]
return ListFilesResponse(
files=[
WorkspaceFileItem(
id=f.id,
name=f.name,
path=f.path,
mime_type=f.mime_type,
size_bytes=f.size_bytes,
metadata=f.metadata or {},
created_at=f.created_at.isoformat(),
)
for f in page
],
offset=offset,
has_more=has_more,
)
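
list_workspace_files above computes `has_more` by asking for one row more than the page size instead of running a separate count query. The sketch below isolates that technique. `fetch_page` and `make_fake_store` are hypothetical helpers written for illustration; the fake store stands in for `WorkspaceManager.list_files`.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def fetch_page(
    list_rows: Callable[..., Sequence[T]],
    limit: int,
    offset: int = 0,
) -> tuple[list[T], bool]:
    """Fetch ``limit + 1`` rows so the extra row reveals whether more exist."""
    rows = list_rows(limit=limit + 1, offset=offset)
    has_more = len(rows) > limit
    return list(rows[:limit]), has_more  # the extra row is never returned


def make_fake_store(items: list[T]) -> Callable[..., Sequence[T]]:
    """Build an in-memory stand-in for a paginated data source."""

    def list_rows(*, limit: int, offset: int) -> Sequence[T]:
        return items[offset : offset + limit]

    return list_rows
```

This is why the tests expect `list_files` to be called with `limit=201` when the route's default limit is 200.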

@@ -1,48 +1,28 @@
"""Tests for workspace file upload and download routes."""
import io
from datetime import datetime, timezone
from unittest.mock import AsyncMock, MagicMock, patch
import fastapi
import fastapi.testclient
import pytest
import pytest_mock
from backend.api.features.workspace import routes as workspace_routes
from backend.data.workspace import WorkspaceFile
from backend.api.features.workspace.routes import router
from backend.data.workspace import Workspace, WorkspaceFile
app = fastapi.FastAPI()
app.include_router(workspace_routes.router)
app.include_router(router)
@app.exception_handler(ValueError)
async def _value_error_handler(
request: fastapi.Request, exc: ValueError
) -> fastapi.responses.JSONResponse:
"""Mirror the production ValueError → 400 mapping from rest_api.py."""
"""Mirror the production ValueError → 400 mapping from the REST app."""
return fastapi.responses.JSONResponse(status_code=400, content={"detail": str(exc)})
client = fastapi.testclient.TestClient(app)
TEST_USER_ID = "3e53486c-cf57-477e-ba2a-cb02dc828e1a"
MOCK_WORKSPACE = type("W", (), {"id": "ws-1"})()
_NOW = datetime(2023, 1, 1, tzinfo=timezone.utc)
MOCK_FILE = WorkspaceFile(
id="file-aaa-bbb",
workspace_id="ws-1",
created_at=_NOW,
updated_at=_NOW,
name="hello.txt",
path="/session/hello.txt",
mime_type="text/plain",
size_bytes=13,
storage_path="local://hello.txt",
)
@pytest.fixture(autouse=True)
def setup_app_auth(mock_jwt_user):
@@ -53,25 +33,201 @@ def setup_app_auth(mock_jwt_user):
app.dependency_overrides.clear()
def _make_workspace(user_id: str = "test-user-id") -> Workspace:
return Workspace(
id="ws-001",
user_id=user_id,
created_at=datetime(2026, 1, 1, tzinfo=timezone.utc),
updated_at=datetime(2026, 1, 1, tzinfo=timezone.utc),
)
def _make_file(**overrides) -> WorkspaceFile:
defaults = {
"id": "file-001",
"workspace_id": "ws-001",
"created_at": datetime(2026, 1, 1, tzinfo=timezone.utc),
"updated_at": datetime(2026, 1, 1, tzinfo=timezone.utc),
"name": "test.txt",
"path": "/test.txt",
"storage_path": "local://test.txt",
"mime_type": "text/plain",
"size_bytes": 100,
"checksum": None,
"is_deleted": False,
"deleted_at": None,
"metadata": {},
}
defaults.update(overrides)
return WorkspaceFile(**defaults)
def _make_file_mock(**overrides) -> MagicMock:
"""Create a mock WorkspaceFile to simulate DB records with null fields."""
defaults = {
"id": "file-001",
"name": "test.txt",
"path": "/test.txt",
"mime_type": "text/plain",
"size_bytes": 100,
"metadata": {},
"created_at": datetime(2026, 1, 1, tzinfo=timezone.utc),
}
defaults.update(overrides)
mock = MagicMock(spec=WorkspaceFile)
for k, v in defaults.items():
setattr(mock, k, v)
return mock
# -- list_workspace_files tests --
@patch("backend.api.features.workspace.routes.get_or_create_workspace")
@patch("backend.api.features.workspace.routes.WorkspaceManager")
def test_list_files_returns_all_when_no_session(mock_manager_cls, mock_get_workspace):
mock_get_workspace.return_value = _make_workspace()
files = [
_make_file(id="f1", name="a.txt", metadata={"origin": "user-upload"}),
_make_file(id="f2", name="b.csv", metadata={"origin": "agent-created"}),
]
mock_instance = AsyncMock()
mock_instance.list_files.return_value = files
mock_manager_cls.return_value = mock_instance
response = client.get("/files")
assert response.status_code == 200
data = response.json()
assert len(data["files"]) == 2
assert data["has_more"] is False
assert data["offset"] == 0
assert data["files"][0]["id"] == "f1"
assert data["files"][0]["metadata"] == {"origin": "user-upload"}
assert data["files"][1]["id"] == "f2"
mock_instance.list_files.assert_called_once_with(
limit=201, offset=0, include_all_sessions=True
)
@patch("backend.api.features.workspace.routes.get_or_create_workspace")
@patch("backend.api.features.workspace.routes.WorkspaceManager")
def test_list_files_scopes_to_session_when_provided(
mock_manager_cls, mock_get_workspace, test_user_id
):
mock_get_workspace.return_value = _make_workspace(user_id=test_user_id)
mock_instance = AsyncMock()
mock_instance.list_files.return_value = []
mock_manager_cls.return_value = mock_instance
response = client.get("/files?session_id=sess-123")
assert response.status_code == 200
data = response.json()
assert data["files"] == []
assert data["has_more"] is False
mock_manager_cls.assert_called_once_with(test_user_id, "ws-001", "sess-123")
mock_instance.list_files.assert_called_once_with(
limit=201, offset=0, include_all_sessions=False
)
@patch("backend.api.features.workspace.routes.get_or_create_workspace")
@patch("backend.api.features.workspace.routes.WorkspaceManager")
def test_list_files_null_metadata_coerced_to_empty_dict(
mock_manager_cls, mock_get_workspace
):
"""Route uses `f.metadata or {}` for pre-existing files with null metadata."""
mock_get_workspace.return_value = _make_workspace()
mock_instance = AsyncMock()
mock_instance.list_files.return_value = [_make_file_mock(metadata=None)]
mock_manager_cls.return_value = mock_instance
response = client.get("/files")
assert response.status_code == 200
assert response.json()["files"][0]["metadata"] == {}
# -- upload_file metadata tests --
@patch("backend.api.features.workspace.routes.get_or_create_workspace")
@patch("backend.api.features.workspace.routes.get_workspace_total_size")
@patch("backend.api.features.workspace.routes.scan_content_safe")
@patch("backend.api.features.workspace.routes.WorkspaceManager")
def test_upload_passes_user_upload_origin_metadata(
mock_manager_cls, mock_scan, mock_total_size, mock_get_workspace
):
mock_get_workspace.return_value = _make_workspace()
mock_total_size.return_value = 100
written = _make_file(id="new-file", name="doc.pdf")
mock_instance = AsyncMock()
mock_instance.write_file.return_value = written
mock_manager_cls.return_value = mock_instance
response = client.post(
"/files/upload",
files={"file": ("doc.pdf", b"fake-pdf-content", "application/pdf")},
)
assert response.status_code == 200
mock_instance.write_file.assert_called_once()
call_kwargs = mock_instance.write_file.call_args
assert call_kwargs.kwargs.get("metadata") == {"origin": "user-upload"}
@patch("backend.api.features.workspace.routes.get_or_create_workspace")
@patch("backend.api.features.workspace.routes.get_workspace_total_size")
@patch("backend.api.features.workspace.routes.scan_content_safe")
@patch("backend.api.features.workspace.routes.WorkspaceManager")
def test_upload_returns_409_on_file_conflict(
mock_manager_cls, mock_scan, mock_total_size, mock_get_workspace
):
mock_get_workspace.return_value = _make_workspace()
mock_total_size.return_value = 100
mock_instance = AsyncMock()
mock_instance.write_file.side_effect = ValueError("File already exists at path")
mock_manager_cls.return_value = mock_instance
response = client.post(
"/files/upload",
files={"file": ("dup.txt", b"content", "text/plain")},
)
assert response.status_code == 409
assert "already exists" in response.json()["detail"]
# -- Restored upload/download/delete security + invariant tests --
def _upload(
filename: str = "hello.txt",
content: bytes = b"Hello, world!",
content_type: str = "text/plain",
):
"""Helper to POST a file upload."""
return client.post(
"/files/upload?session_id=sess-1",
files={"file": (filename, io.BytesIO(content), content_type)},
)
# ---- Happy path ----
_MOCK_FILE = WorkspaceFile(
id="file-aaa-bbb",
workspace_id="ws-001",
created_at=datetime(2026, 1, 1, tzinfo=timezone.utc),
updated_at=datetime(2026, 1, 1, tzinfo=timezone.utc),
name="hello.txt",
path="/sessions/sess-1/hello.txt",
mime_type="text/plain",
size_bytes=13,
storage_path="local://hello.txt",
)
def test_upload_happy_path(mocker: pytest_mock.MockFixture):
def test_upload_happy_path(mocker):
mocker.patch(
"backend.api.features.workspace.routes.get_or_create_workspace",
return_value=MOCK_WORKSPACE,
return_value=_make_workspace(),
)
mocker.patch(
"backend.api.features.workspace.routes.get_workspace_total_size",
@@ -82,7 +238,7 @@ def test_upload_happy_path(mocker: pytest_mock.MockFixture):
return_value=None,
)
mock_manager = mocker.MagicMock()
mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE)
mock_manager.write_file = mocker.AsyncMock(return_value=_MOCK_FILE)
mocker.patch(
"backend.api.features.workspace.routes.WorkspaceManager",
return_value=mock_manager,
@@ -96,10 +252,7 @@ def test_upload_happy_path(mocker: pytest_mock.MockFixture):
assert data["size_bytes"] == 13
# ---- Per-file size limit ----
def test_upload_exceeds_max_file_size(mocker: pytest_mock.MockFixture):
def test_upload_exceeds_max_file_size(mocker):
"""Files larger than max_file_size_mb should be rejected with 413."""
cfg = mocker.patch("backend.api.features.workspace.routes.Config")
cfg.return_value.max_file_size_mb = 0 # 0 MB → any content is too big
@@ -109,15 +262,11 @@ def test_upload_exceeds_max_file_size(mocker: pytest_mock.MockFixture):
assert response.status_code == 413
# ---- Storage quota exceeded ----
def test_upload_storage_quota_exceeded(mocker: pytest_mock.MockFixture):
def test_upload_storage_quota_exceeded(mocker):
mocker.patch(
"backend.api.features.workspace.routes.get_or_create_workspace",
return_value=MOCK_WORKSPACE,
return_value=_make_workspace(),
)
# Current usage already at limit
mocker.patch(
"backend.api.features.workspace.routes.get_workspace_total_size",
return_value=500 * 1024 * 1024,
@@ -128,27 +277,22 @@ def test_upload_storage_quota_exceeded(mocker: pytest_mock.MockFixture):
assert "Storage limit exceeded" in response.text
# ---- Post-write quota race (B2) ----
def test_upload_post_write_quota_race(mocker: pytest_mock.MockFixture):
"""If a concurrent upload tips the total over the limit after write,
the file should be soft-deleted and 413 returned."""
def test_upload_post_write_quota_race(mocker):
"""Concurrent upload tipping over limit after write should soft-delete + 413."""
mocker.patch(
"backend.api.features.workspace.routes.get_or_create_workspace",
return_value=MOCK_WORKSPACE,
return_value=_make_workspace(),
)
# Pre-write check passes (under limit), but post-write check fails
mocker.patch(
"backend.api.features.workspace.routes.get_workspace_total_size",
side_effect=[0, 600 * 1024 * 1024], # first call OK, second over limit
side_effect=[0, 600 * 1024 * 1024],
)
mocker.patch(
"backend.api.features.workspace.routes.scan_content_safe",
return_value=None,
)
mock_manager = mocker.MagicMock()
mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE)
mock_manager.write_file = mocker.AsyncMock(return_value=_MOCK_FILE)
mocker.patch(
"backend.api.features.workspace.routes.WorkspaceManager",
return_value=mock_manager,
@@ -160,17 +304,14 @@ def test_upload_post_write_quota_race(mocker: pytest_mock.MockFixture):
response = _upload()
assert response.status_code == 413
mock_delete.assert_called_once_with("file-aaa-bbb", "ws-1")
mock_delete.assert_called_once_with("file-aaa-bbb", "ws-001")
# ---- Any extension accepted (no allowlist) ----
def test_upload_any_extension(mocker: pytest_mock.MockFixture):
def test_upload_any_extension(mocker):
"""Any file extension should be accepted — ClamAV is the security layer."""
mocker.patch(
"backend.api.features.workspace.routes.get_or_create_workspace",
return_value=MOCK_WORKSPACE,
return_value=_make_workspace(),
)
mocker.patch(
"backend.api.features.workspace.routes.get_workspace_total_size",
@@ -181,7 +322,7 @@ def test_upload_any_extension(mocker: pytest_mock.MockFixture):
return_value=None,
)
mock_manager = mocker.MagicMock()
mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE)
mock_manager.write_file = mocker.AsyncMock(return_value=_MOCK_FILE)
mocker.patch(
"backend.api.features.workspace.routes.WorkspaceManager",
return_value=mock_manager,
@@ -191,16 +332,13 @@ def test_upload_any_extension(mocker: pytest_mock.MockFixture):
assert response.status_code == 200
# ---- Virus scan rejection ----
def test_upload_blocked_by_virus_scan(mocker: pytest_mock.MockFixture):
def test_upload_blocked_by_virus_scan(mocker):
"""Files flagged by ClamAV should be rejected and never written to storage."""
from backend.api.features.store.exceptions import VirusDetectedError
mocker.patch(
"backend.api.features.workspace.routes.get_or_create_workspace",
return_value=MOCK_WORKSPACE,
return_value=_make_workspace(),
)
mocker.patch(
"backend.api.features.workspace.routes.get_workspace_total_size",
@@ -211,7 +349,7 @@ def test_upload_blocked_by_virus_scan(mocker: pytest_mock.MockFixture):
side_effect=VirusDetectedError("Eicar-Test-Signature"),
)
mock_manager = mocker.MagicMock()
mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE)
mock_manager.write_file = mocker.AsyncMock(return_value=_MOCK_FILE)
mocker.patch(
"backend.api.features.workspace.routes.WorkspaceManager",
return_value=mock_manager,
@@ -219,18 +357,14 @@ def test_upload_blocked_by_virus_scan(mocker: pytest_mock.MockFixture):
response = _upload(filename="evil.exe", content=b"X5O!P%@AP...")
assert response.status_code == 400
assert "Virus detected" in response.text
mock_manager.write_file.assert_not_called()
# ---- No file extension ----
def test_upload_file_without_extension(mocker: pytest_mock.MockFixture):
def test_upload_file_without_extension(mocker):
"""Files without an extension should be accepted and stored as-is."""
mocker.patch(
"backend.api.features.workspace.routes.get_or_create_workspace",
return_value=MOCK_WORKSPACE,
return_value=_make_workspace(),
)
mocker.patch(
"backend.api.features.workspace.routes.get_workspace_total_size",
@@ -241,7 +375,7 @@ def test_upload_file_without_extension(mocker: pytest_mock.MockFixture):
return_value=None,
)
mock_manager = mocker.MagicMock()
mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE)
mock_manager.write_file = mocker.AsyncMock(return_value=_MOCK_FILE)
mocker.patch(
"backend.api.features.workspace.routes.WorkspaceManager",
return_value=mock_manager,
@@ -257,14 +391,11 @@ def test_upload_file_without_extension(mocker: pytest_mock.MockFixture):
assert mock_manager.write_file.call_args[0][1] == "Makefile"
# ---- Filename sanitization (SF5) ----
def test_upload_strips_path_components(mocker: pytest_mock.MockFixture):
def test_upload_strips_path_components(mocker):
"""Path-traversal filenames should be reduced to their basename."""
mocker.patch(
"backend.api.features.workspace.routes.get_or_create_workspace",
return_value=MOCK_WORKSPACE,
return_value=_make_workspace(),
)
mocker.patch(
"backend.api.features.workspace.routes.get_workspace_total_size",
@@ -275,28 +406,23 @@ def test_upload_strips_path_components(mocker: pytest_mock.MockFixture):
return_value=None,
)
mock_manager = mocker.MagicMock()
mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE)
mock_manager.write_file = mocker.AsyncMock(return_value=_MOCK_FILE)
mocker.patch(
"backend.api.features.workspace.routes.WorkspaceManager",
return_value=mock_manager,
)
# Filename with traversal
_upload(filename="../../etc/passwd.txt")
# write_file should have been called with just the basename
mock_manager.write_file.assert_called_once()
call_args = mock_manager.write_file.call_args
assert call_args[0][1] == "passwd.txt"
# ---- Download ----
def test_download_file_not_found(mocker: pytest_mock.MockFixture):
def test_download_file_not_found(mocker):
mocker.patch(
"backend.api.features.workspace.routes.get_workspace",
return_value=MOCK_WORKSPACE,
return_value=_make_workspace(),
)
mocker.patch(
"backend.api.features.workspace.routes.get_workspace_file",
@@ -307,14 +433,11 @@ def test_download_file_not_found(mocker: pytest_mock.MockFixture):
assert response.status_code == 404
# ---- Delete ----
def test_delete_file_success(mocker: pytest_mock.MockFixture):
def test_delete_file_success(mocker):
"""Deleting an existing file should return {"deleted": true}."""
mocker.patch(
"backend.api.features.workspace.routes.get_workspace",
return_value=MOCK_WORKSPACE,
return_value=_make_workspace(),
)
mock_manager = mocker.MagicMock()
mock_manager.delete_file = mocker.AsyncMock(return_value=True)
@@ -329,11 +452,11 @@ def test_delete_file_success(mocker: pytest_mock.MockFixture):
mock_manager.delete_file.assert_called_once_with("file-aaa-bbb")
def test_delete_file_not_found(mocker: pytest_mock.MockFixture):
def test_delete_file_not_found(mocker):
"""Deleting a non-existent file should return 404."""
mocker.patch(
"backend.api.features.workspace.routes.get_workspace",
return_value=MOCK_WORKSPACE,
return_value=_make_workspace(),
)
mock_manager = mocker.MagicMock()
mock_manager.delete_file = mocker.AsyncMock(return_value=False)
@@ -347,7 +470,7 @@ def test_delete_file_not_found(mocker: pytest_mock.MockFixture):
assert "File not found" in response.text
def test_delete_file_no_workspace(mocker: pytest_mock.MockFixture):
def test_delete_file_no_workspace(mocker):
"""Deleting when user has no workspace should return 404."""
mocker.patch(
"backend.api.features.workspace.routes.get_workspace",
@@ -357,3 +480,123 @@ def test_delete_file_no_workspace(mocker: pytest_mock.MockFixture):
response = client.delete("/files/file-aaa-bbb")
assert response.status_code == 404
assert "Workspace not found" in response.text
def test_upload_write_file_too_large_returns_413(mocker):
"""write_file raises ValueError("File too large: …") → must map to 413."""
mocker.patch(
"backend.api.features.workspace.routes.get_or_create_workspace",
return_value=_make_workspace(),
)
mocker.patch(
"backend.api.features.workspace.routes.get_workspace_total_size",
return_value=0,
)
mocker.patch(
"backend.api.features.workspace.routes.scan_content_safe",
return_value=None,
)
mock_manager = mocker.MagicMock()
mock_manager.write_file = mocker.AsyncMock(
side_effect=ValueError("File too large: 900 bytes exceeds 1MB limit")
)
mocker.patch(
"backend.api.features.workspace.routes.WorkspaceManager",
return_value=mock_manager,
)
response = _upload()
assert response.status_code == 413
assert "File too large" in response.text
def test_upload_write_file_conflict_returns_409(mocker):
"""Non-'File too large' ValueErrors from write_file stay as 409."""
mocker.patch(
"backend.api.features.workspace.routes.get_or_create_workspace",
return_value=_make_workspace(),
)
mocker.patch(
"backend.api.features.workspace.routes.get_workspace_total_size",
return_value=0,
)
mocker.patch(
"backend.api.features.workspace.routes.scan_content_safe",
return_value=None,
)
mock_manager = mocker.MagicMock()
mock_manager.write_file = mocker.AsyncMock(
side_effect=ValueError("File already exists at path: /sessions/x/a.txt")
)
mocker.patch(
"backend.api.features.workspace.routes.WorkspaceManager",
return_value=mock_manager,
)
response = _upload()
assert response.status_code == 409
assert "already exists" in response.text
@patch("backend.api.features.workspace.routes.get_or_create_workspace")
@patch("backend.api.features.workspace.routes.WorkspaceManager")
def test_list_files_has_more_true_when_limit_exceeded(
mock_manager_cls, mock_get_workspace
):
"""The limit+1 fetch trick must flip has_more=True and trim the page."""
mock_get_workspace.return_value = _make_workspace()
# Backend was asked for limit+1=3, and returned exactly 3 items.
files = [
_make_file(id="f1", name="a.txt"),
_make_file(id="f2", name="b.txt"),
_make_file(id="f3", name="c.txt"),
]
mock_instance = AsyncMock()
mock_instance.list_files.return_value = files
mock_manager_cls.return_value = mock_instance
response = client.get("/files?limit=2")
assert response.status_code == 200
data = response.json()
assert data["has_more"] is True
assert len(data["files"]) == 2
assert data["files"][0]["id"] == "f1"
assert data["files"][1]["id"] == "f2"
mock_instance.list_files.assert_called_once_with(
limit=3, offset=0, include_all_sessions=True
)
@patch("backend.api.features.workspace.routes.get_or_create_workspace")
@patch("backend.api.features.workspace.routes.WorkspaceManager")
def test_list_files_has_more_false_when_exactly_page_size(
mock_manager_cls, mock_get_workspace
):
"""Exactly `limit` rows means we're on the last page — has_more=False."""
mock_get_workspace.return_value = _make_workspace()
files = [_make_file(id="f1", name="a.txt"), _make_file(id="f2", name="b.txt")]
mock_instance = AsyncMock()
mock_instance.list_files.return_value = files
mock_manager_cls.return_value = mock_instance
response = client.get("/files?limit=2")
assert response.status_code == 200
data = response.json()
assert data["has_more"] is False
assert len(data["files"]) == 2
@patch("backend.api.features.workspace.routes.get_or_create_workspace")
@patch("backend.api.features.workspace.routes.WorkspaceManager")
def test_list_files_offset_is_echoed_back(mock_manager_cls, mock_get_workspace):
mock_get_workspace.return_value = _make_workspace()
mock_instance = AsyncMock()
mock_instance.list_files.return_value = []
mock_manager_cls.return_value = mock_instance
response = client.get("/files?offset=50&limit=10")
assert response.status_code == 200
assert response.json()["offset"] == 50
mock_instance.list_files.assert_called_once_with(
limit=11, offset=50, include_all_sessions=True
)

View File

@@ -205,6 +205,19 @@ class LlmModel(str, Enum, metaclass=LlmModelMeta):
KIMI_K2 = "moonshotai/kimi-k2"
QWEN3_235B_A22B_THINKING = "qwen/qwen3-235b-a22b-thinking-2507"
QWEN3_CODER = "qwen/qwen3-coder"
# Z.ai (Zhipu) models
ZAI_GLM_4_32B = "z-ai/glm-4-32b"
ZAI_GLM_4_5 = "z-ai/glm-4.5"
ZAI_GLM_4_5_AIR = "z-ai/glm-4.5-air"
ZAI_GLM_4_5_AIR_FREE = "z-ai/glm-4.5-air:free"
ZAI_GLM_4_5V = "z-ai/glm-4.5v"
ZAI_GLM_4_6 = "z-ai/glm-4.6"
ZAI_GLM_4_6V = "z-ai/glm-4.6v"
ZAI_GLM_4_7 = "z-ai/glm-4.7"
ZAI_GLM_4_7_FLASH = "z-ai/glm-4.7-flash"
ZAI_GLM_5 = "z-ai/glm-5"
ZAI_GLM_5_TURBO = "z-ai/glm-5-turbo"
ZAI_GLM_5V_TURBO = "z-ai/glm-5v-turbo"
# Llama API models
LLAMA_API_LLAMA_4_SCOUT = "Llama-4-Scout-17B-16E-Instruct-FP8"
LLAMA_API_LLAMA4_MAVERICK = "Llama-4-Maverick-17B-128E-Instruct-FP8"
@@ -630,6 +643,43 @@ MODEL_METADATA = {
LlmModel.QWEN3_CODER: ModelMetadata(
"open_router", 262144, 262144, "Qwen 3 Coder", "OpenRouter", "Qwen", 3
),
# https://openrouter.ai/models?q=z-ai
LlmModel.ZAI_GLM_4_32B: ModelMetadata(
"open_router", 128000, 128000, "GLM 4 32B", "OpenRouter", "Z.ai", 1
),
LlmModel.ZAI_GLM_4_5: ModelMetadata(
"open_router", 131072, 98304, "GLM 4.5", "OpenRouter", "Z.ai", 2
),
LlmModel.ZAI_GLM_4_5_AIR: ModelMetadata(
"open_router", 131072, 98304, "GLM 4.5 Air", "OpenRouter", "Z.ai", 1
),
LlmModel.ZAI_GLM_4_5_AIR_FREE: ModelMetadata(
"open_router", 131072, 96000, "GLM 4.5 Air (Free)", "OpenRouter", "Z.ai", 1
),
LlmModel.ZAI_GLM_4_5V: ModelMetadata(
"open_router", 65536, 16384, "GLM 4.5V", "OpenRouter", "Z.ai", 2
),
LlmModel.ZAI_GLM_4_6: ModelMetadata(
"open_router", 204800, 204800, "GLM 4.6", "OpenRouter", "Z.ai", 1
),
LlmModel.ZAI_GLM_4_6V: ModelMetadata(
"open_router", 131072, 131072, "GLM 4.6V", "OpenRouter", "Z.ai", 1
),
LlmModel.ZAI_GLM_4_7: ModelMetadata(
"open_router", 202752, 65535, "GLM 4.7", "OpenRouter", "Z.ai", 1
),
LlmModel.ZAI_GLM_4_7_FLASH: ModelMetadata(
"open_router", 202752, 202752, "GLM 4.7 Flash", "OpenRouter", "Z.ai", 1
),
LlmModel.ZAI_GLM_5: ModelMetadata(
"open_router", 80000, 80000, "GLM 5", "OpenRouter", "Z.ai", 2
),
LlmModel.ZAI_GLM_5_TURBO: ModelMetadata(
"open_router", 202752, 131072, "GLM 5 Turbo", "OpenRouter", "Z.ai", 3
),
LlmModel.ZAI_GLM_5V_TURBO: ModelMetadata(
"open_router", 202752, 131072, "GLM 5V Turbo", "OpenRouter", "Z.ai", 3
),
# Llama API models
LlmModel.LLAMA_API_LLAMA_4_SCOUT: ModelMetadata(
"llama_api",

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,633 @@
"""Unit tests for baseline service pure-logic helpers.
These tests cover ``_baseline_conversation_updater`` and ``_BaselineStreamState``
without requiring API keys, database connections, or network access.
"""
from unittest.mock import AsyncMock, patch
import pytest
from openai.types.chat import ChatCompletionToolParam
from backend.copilot.baseline.service import (
_baseline_conversation_updater,
_BaselineStreamState,
_compress_session_messages,
_ThinkingStripper,
)
from backend.copilot.model import ChatMessage
from backend.copilot.transcript_builder import TranscriptBuilder
from backend.util.prompt import CompressResult
from backend.util.tool_call_loop import LLMLoopResponse, LLMToolCall, ToolCallResult
class TestBaselineStreamState:
def test_defaults(self):
state = _BaselineStreamState()
assert state.pending_events == []
assert state.assistant_text == ""
assert state.text_started is False
assert state.turn_prompt_tokens == 0
assert state.turn_completion_tokens == 0
assert state.text_block_id # Should be a UUID string
def test_mutable_fields(self):
state = _BaselineStreamState()
state.assistant_text = "hello"
state.turn_prompt_tokens = 100
state.turn_completion_tokens = 50
assert state.assistant_text == "hello"
assert state.turn_prompt_tokens == 100
assert state.turn_completion_tokens == 50
class TestBaselineConversationUpdater:
"""Tests for _baseline_conversation_updater which updates the OpenAI
message list and transcript builder after each LLM call."""
def _make_transcript_builder(self) -> TranscriptBuilder:
builder = TranscriptBuilder()
builder.append_user("test question")
return builder
def test_text_only_response(self):
"""When the LLM returns text without tool calls, the updater appends
a single assistant message and records it in the transcript."""
messages: list = []
builder = self._make_transcript_builder()
response = LLMLoopResponse(
response_text="Hello, world!",
tool_calls=[],
raw_response=None,
prompt_tokens=0,
completion_tokens=0,
)
_baseline_conversation_updater(
messages,
response,
tool_results=None,
transcript_builder=builder,
model="test-model",
)
assert len(messages) == 1
assert messages[0]["role"] == "assistant"
assert messages[0]["content"] == "Hello, world!"
# Transcript should have user + assistant
assert builder.entry_count == 2
assert builder.last_entry_type == "assistant"
def test_tool_calls_response(self):
"""When the LLM returns tool calls, the updater appends the assistant
message with tool_calls and tool result messages."""
messages: list = []
builder = self._make_transcript_builder()
response = LLMLoopResponse(
response_text="Let me search...",
tool_calls=[
LLMToolCall(
id="tc_1",
name="search",
arguments='{"query": "test"}',
),
],
raw_response=None,
prompt_tokens=0,
completion_tokens=0,
)
tool_results = [
ToolCallResult(
tool_call_id="tc_1",
tool_name="search",
content="Found result",
),
]
_baseline_conversation_updater(
messages,
response,
tool_results=tool_results,
transcript_builder=builder,
model="test-model",
)
# Messages: assistant (with tool_calls) + tool result
assert len(messages) == 2
assert messages[0]["role"] == "assistant"
assert messages[0]["content"] == "Let me search..."
assert len(messages[0]["tool_calls"]) == 1
assert messages[0]["tool_calls"][0]["id"] == "tc_1"
assert messages[1]["role"] == "tool"
assert messages[1]["tool_call_id"] == "tc_1"
assert messages[1]["content"] == "Found result"
# Transcript: user + assistant(tool_use) + user(tool_result)
assert builder.entry_count == 3
def test_tool_calls_without_text(self):
"""Tool calls without accompanying text should still work."""
messages: list = []
builder = self._make_transcript_builder()
response = LLMLoopResponse(
response_text=None,
tool_calls=[
LLMToolCall(id="tc_1", name="run", arguments="{}"),
],
raw_response=None,
prompt_tokens=0,
completion_tokens=0,
)
tool_results = [
ToolCallResult(tool_call_id="tc_1", tool_name="run", content="done"),
]
_baseline_conversation_updater(
messages,
response,
tool_results=tool_results,
transcript_builder=builder,
model="test-model",
)
assert len(messages) == 2
assert "content" not in messages[0] # No text content
assert messages[0]["tool_calls"][0]["function"]["name"] == "run"
def test_no_text_no_tools(self):
"""When the response has no text and no tool calls, nothing is appended."""
messages: list = []
builder = self._make_transcript_builder()
response = LLMLoopResponse(
response_text=None,
tool_calls=[],
raw_response=None,
prompt_tokens=0,
completion_tokens=0,
)
_baseline_conversation_updater(
messages,
response,
tool_results=None,
transcript_builder=builder,
model="test-model",
)
assert len(messages) == 0
# Only the user entry from setup
assert builder.entry_count == 1
def test_multiple_tool_calls(self):
"""Multiple tool calls in a single response are all recorded."""
messages: list = []
builder = self._make_transcript_builder()
response = LLMLoopResponse(
response_text=None,
tool_calls=[
LLMToolCall(id="tc_1", name="tool_a", arguments="{}"),
LLMToolCall(id="tc_2", name="tool_b", arguments='{"x": 1}'),
],
raw_response=None,
prompt_tokens=0,
completion_tokens=0,
)
tool_results = [
ToolCallResult(tool_call_id="tc_1", tool_name="tool_a", content="result_a"),
ToolCallResult(tool_call_id="tc_2", tool_name="tool_b", content="result_b"),
]
_baseline_conversation_updater(
messages,
response,
tool_results=tool_results,
transcript_builder=builder,
model="test-model",
)
# 1 assistant + 2 tool results
assert len(messages) == 3
assert len(messages[0]["tool_calls"]) == 2
assert messages[1]["tool_call_id"] == "tc_1"
assert messages[2]["tool_call_id"] == "tc_2"
def test_invalid_tool_arguments_handled(self):
"""Tool call with invalid JSON arguments: the arguments field is
stored as-is in the message, and orjson failure falls back to {}
in the transcript content_blocks."""
messages: list = []
builder = self._make_transcript_builder()
response = LLMLoopResponse(
response_text=None,
tool_calls=[
LLMToolCall(id="tc_1", name="tool_x", arguments="not-json"),
],
raw_response=None,
prompt_tokens=0,
completion_tokens=0,
)
tool_results = [
ToolCallResult(tool_call_id="tc_1", tool_name="tool_x", content="ok"),
]
_baseline_conversation_updater(
messages,
response,
tool_results=tool_results,
transcript_builder=builder,
model="test-model",
)
# Should not raise — invalid JSON falls back to {} in transcript
assert len(messages) == 2
assert messages[0]["tool_calls"][0]["function"]["arguments"] == "not-json"
class TestCompressSessionMessagesPreservesToolCalls:
"""``_compress_session_messages`` must round-trip tool_calls + tool_call_id.
Compression serialises ChatMessage to dict for ``compress_context`` and
reifies the result back to ChatMessage. A regression that drops
``tool_calls`` or ``tool_call_id`` would corrupt the OpenAI message
list and break downstream tool-execution rounds.
"""
@pytest.mark.asyncio
async def test_compressed_output_keeps_tool_calls_and_ids(self):
# Simulate compression that returns a summary + the most recent
# assistant(tool_call) + tool(tool_result) intact.
summary = {"role": "system", "content": "prior turns: user asked X"}
assistant_with_tc = {
"role": "assistant",
"content": "calling tool",
"tool_calls": [
{
"id": "tc_abc",
"type": "function",
"function": {"name": "search", "arguments": '{"q":"y"}'},
}
],
}
tool_result = {
"role": "tool",
"tool_call_id": "tc_abc",
"content": "search result",
}
compress_result = CompressResult(
messages=[summary, assistant_with_tc, tool_result],
token_count=100,
was_compacted=True,
original_token_count=5000,
messages_summarized=10,
messages_dropped=0,
)
# Input: messages that should be compressed.
input_messages = [
ChatMessage(role="user", content="q1"),
ChatMessage(
role="assistant",
content="calling tool",
tool_calls=[
{
"id": "tc_abc",
"type": "function",
"function": {
"name": "search",
"arguments": '{"q":"y"}',
},
}
],
),
ChatMessage(
role="tool",
tool_call_id="tc_abc",
content="search result",
),
]
with patch(
"backend.copilot.baseline.service.compress_context",
new=AsyncMock(return_value=compress_result),
):
compressed = await _compress_session_messages(
input_messages, model="openrouter/anthropic/claude-opus-4"
)
# Summary, assistant(tool_calls), tool(tool_call_id).
assert len(compressed) == 3
# Assistant message must keep its tool_calls intact.
assistant_msg = compressed[1]
assert assistant_msg.role == "assistant"
assert assistant_msg.tool_calls is not None
assert len(assistant_msg.tool_calls) == 1
assert assistant_msg.tool_calls[0]["id"] == "tc_abc"
assert assistant_msg.tool_calls[0]["function"]["name"] == "search"
# Tool-role message must keep tool_call_id for OpenAI linkage.
tool_msg = compressed[2]
assert tool_msg.role == "tool"
assert tool_msg.tool_call_id == "tc_abc"
assert tool_msg.content == "search result"
@pytest.mark.asyncio
async def test_uncompressed_passthrough_keeps_fields(self):
"""When compression is a no-op (was_compacted=False), the original
messages must be returned unchanged — including tool_calls."""
input_messages = [
ChatMessage(
role="assistant",
content="c",
tool_calls=[
{
"id": "t1",
"type": "function",
"function": {"name": "f", "arguments": "{}"},
}
],
),
ChatMessage(role="tool", tool_call_id="t1", content="ok"),
]
noop_result = CompressResult(
messages=[], # ignored when was_compacted=False
token_count=10,
was_compacted=False,
)
with patch(
"backend.copilot.baseline.service.compress_context",
new=AsyncMock(return_value=noop_result),
):
out = await _compress_session_messages(
input_messages, model="openrouter/anthropic/claude-opus-4"
)
assert out is input_messages # same list returned
assert out[0].tool_calls is not None
assert out[0].tool_calls[0]["id"] == "t1"
assert out[1].tool_call_id == "t1"
# ---- _ThinkingStripper tests ---- #
def test_thinking_stripper_basic_thinking_tag() -> None:
"""<thinking>...</thinking> blocks are fully stripped."""
s = _ThinkingStripper()
assert s.process("<thinking>internal reasoning here</thinking>Hello!") == "Hello!"
def test_thinking_stripper_internal_reasoning_tag() -> None:
"""<internal_reasoning>...</internal_reasoning> blocks (Gemini) are stripped."""
s = _ThinkingStripper()
assert (
s.process("<internal_reasoning>step by step</internal_reasoning>Answer")
== "Answer"
)
def test_thinking_stripper_split_across_chunks() -> None:
"""Tags split across multiple chunks are handled correctly."""
s = _ThinkingStripper()
out = s.process("Hello <thin")
out += s.process("king>secret</thinking> world")
assert out == "Hello world"
def test_thinking_stripper_plain_text_preserved() -> None:
"""Plain text with the word 'thinking' is not stripped."""
s = _ThinkingStripper()
assert (
s.process("I am thinking about this problem")
== "I am thinking about this problem"
)
def test_thinking_stripper_multiple_blocks() -> None:
"""Multiple reasoning blocks in one stream are all stripped."""
s = _ThinkingStripper()
result = s.process(
"A<thinking>x</thinking>B<internal_reasoning>y</internal_reasoning>C"
)
assert result == "ABC"
def test_thinking_stripper_flush_discards_unclosed() -> None:
"""Unclosed reasoning block is discarded on flush."""
s = _ThinkingStripper()
s.process("Start<thinking>never closed")
flushed = s.flush()
assert "never closed" not in flushed
def test_thinking_stripper_empty_block() -> None:
"""Empty reasoning blocks are handled gracefully."""
s = _ThinkingStripper()
assert s.process("Before<thinking></thinking>After") == "BeforeAfter"
# ---- _filter_tools_by_permissions tests ---- #
def _make_tool(name: str) -> ChatCompletionToolParam:
"""Build a minimal OpenAI ChatCompletionToolParam."""
return ChatCompletionToolParam(
type="function",
function={"name": name, "parameters": {}},
)
class TestFilterToolsByPermissions:
"""Tests for _filter_tools_by_permissions."""
@patch(
"backend.copilot.permissions.all_known_tool_names",
return_value=frozenset({"run_block", "web_fetch", "bash_exec"}),
)
def test_empty_permissions_returns_all(self, _mock_names):
"""Empty permissions (no filtering) returns every tool unchanged."""
from backend.copilot.baseline.service import _filter_tools_by_permissions
from backend.copilot.permissions import CopilotPermissions
tools = [_make_tool("run_block"), _make_tool("web_fetch")]
perms = CopilotPermissions()
result = _filter_tools_by_permissions(tools, perms)
assert result == tools
@patch(
"backend.copilot.permissions.all_known_tool_names",
return_value=frozenset({"run_block", "web_fetch", "bash_exec"}),
)
def test_allowlist_keeps_only_matching(self, _mock_names):
"""Explicit allowlist (tools_exclude=False) keeps only listed tools."""
from backend.copilot.baseline.service import _filter_tools_by_permissions
from backend.copilot.permissions import CopilotPermissions
tools = [
_make_tool("run_block"),
_make_tool("web_fetch"),
_make_tool("bash_exec"),
]
perms = CopilotPermissions(tools=["web_fetch"], tools_exclude=False)
result = _filter_tools_by_permissions(tools, perms)
assert len(result) == 1
assert result[0]["function"]["name"] == "web_fetch"
@patch(
"backend.copilot.permissions.all_known_tool_names",
return_value=frozenset({"run_block", "web_fetch", "bash_exec"}),
)
def test_blacklist_excludes_listed(self, _mock_names):
"""Blacklist (tools_exclude=True) removes only the listed tools."""
from backend.copilot.baseline.service import _filter_tools_by_permissions
from backend.copilot.permissions import CopilotPermissions
tools = [
_make_tool("run_block"),
_make_tool("web_fetch"),
_make_tool("bash_exec"),
]
perms = CopilotPermissions(tools=["bash_exec"], tools_exclude=True)
result = _filter_tools_by_permissions(tools, perms)
names = [t["function"]["name"] for t in result]
assert "bash_exec" not in names
assert "run_block" in names
assert "web_fetch" in names
assert len(result) == 2
@patch(
"backend.copilot.permissions.all_known_tool_names",
return_value=frozenset({"run_block", "web_fetch", "bash_exec"}),
)
def test_unknown_tool_name_filtered_out(self, _mock_names):
"""A tool whose name is not in all_known_tool_names is dropped."""
from backend.copilot.baseline.service import _filter_tools_by_permissions
from backend.copilot.permissions import CopilotPermissions
tools = [_make_tool("run_block"), _make_tool("unknown_tool")]
perms = CopilotPermissions(tools=["run_block"], tools_exclude=False)
result = _filter_tools_by_permissions(tools, perms)
names = [t["function"]["name"] for t in result]
assert "unknown_tool" not in names
assert names == ["run_block"]
# ---- _prepare_baseline_attachments tests ---- #
class TestPrepareBaselineAttachments:
"""Tests for _prepare_baseline_attachments."""
@pytest.mark.asyncio
async def test_empty_file_ids(self):
"""Empty file_ids returns empty hint and blocks."""
from backend.copilot.baseline.service import _prepare_baseline_attachments
hint, blocks = await _prepare_baseline_attachments([], "user1", "sess1", "/tmp")
assert hint == ""
assert blocks == []
@pytest.mark.asyncio
async def test_empty_user_id(self):
"""Empty user_id returns empty hint and blocks."""
from backend.copilot.baseline.service import _prepare_baseline_attachments
hint, blocks = await _prepare_baseline_attachments(
["file1"], "", "sess1", "/tmp"
)
assert hint == ""
assert blocks == []
@pytest.mark.asyncio
async def test_image_file_returns_vision_blocks(self):
"""A PNG image within size limits is returned as a base64 vision block."""
from backend.copilot.baseline.service import _prepare_baseline_attachments
fake_info = AsyncMock()
fake_info.name = "photo.png"
fake_info.mime_type = "image/png"
fake_info.size_bytes = 1024
fake_manager = AsyncMock()
fake_manager.get_file_info = AsyncMock(return_value=fake_info)
fake_manager.read_file_by_id = AsyncMock(return_value=b"\x89PNG_FAKE_DATA")
with patch(
"backend.copilot.baseline.service.get_workspace_manager",
new=AsyncMock(return_value=fake_manager),
):
hint, blocks = await _prepare_baseline_attachments(
["fid1"], "user1", "sess1", "/tmp/workdir"
)
assert len(blocks) == 1
assert blocks[0]["type"] == "image"
assert blocks[0]["source"]["media_type"] == "image/png"
assert blocks[0]["source"]["type"] == "base64"
assert "photo.png" in hint
assert "embedded as image" in hint
@pytest.mark.asyncio
async def test_non_image_file_saved_to_working_dir(self, tmp_path):
"""A non-image file is written to working_dir."""
from backend.copilot.baseline.service import _prepare_baseline_attachments
fake_info = AsyncMock()
fake_info.name = "data.csv"
fake_info.mime_type = "text/csv"
fake_info.size_bytes = 42
fake_manager = AsyncMock()
fake_manager.get_file_info = AsyncMock(return_value=fake_info)
fake_manager.read_file_by_id = AsyncMock(return_value=b"col1,col2\na,b")
with patch(
"backend.copilot.baseline.service.get_workspace_manager",
new=AsyncMock(return_value=fake_manager),
):
hint, blocks = await _prepare_baseline_attachments(
["fid1"], "user1", "sess1", str(tmp_path)
)
assert blocks == []
assert "data.csv" in hint
assert "saved to" in hint
saved = tmp_path / "data.csv"
assert saved.exists()
assert saved.read_bytes() == b"col1,col2\na,b"
@pytest.mark.asyncio
async def test_file_not_found_skipped(self):
"""When get_file_info returns None the file is silently skipped."""
from backend.copilot.baseline.service import _prepare_baseline_attachments
fake_manager = AsyncMock()
fake_manager.get_file_info = AsyncMock(return_value=None)
with patch(
"backend.copilot.baseline.service.get_workspace_manager",
new=AsyncMock(return_value=fake_manager),
):
hint, blocks = await _prepare_baseline_attachments(
["missing_id"], "user1", "sess1", "/tmp"
)
assert hint == ""
assert blocks == []
@pytest.mark.asyncio
async def test_workspace_manager_error(self):
"""When get_workspace_manager raises, returns empty results."""
from backend.copilot.baseline.service import _prepare_baseline_attachments
with patch(
"backend.copilot.baseline.service.get_workspace_manager",
new=AsyncMock(side_effect=RuntimeError("connection failed")),
):
hint, blocks = await _prepare_baseline_attachments(
["fid1"], "user1", "sess1", "/tmp"
)
assert hint == ""
assert blocks == []

View File

@@ -0,0 +1,667 @@
"""Integration tests for baseline transcript flow.
Exercises the real helpers in ``baseline/service.py`` that download,
validate, load, append to, backfill, and upload the transcript.
Storage is mocked via ``download_transcript`` / ``upload_transcript``
patches; no network access is required.
"""
import json as stdlib_json
from unittest.mock import AsyncMock, patch
import pytest
from backend.copilot.baseline.service import (
_load_prior_transcript,
_record_turn_to_transcript,
_resolve_baseline_model,
_upload_final_transcript,
is_transcript_stale,
should_upload_transcript,
)
from backend.copilot.service import config
from backend.copilot.transcript import (
STOP_REASON_END_TURN,
STOP_REASON_TOOL_USE,
TranscriptDownload,
)
from backend.copilot.transcript_builder import TranscriptBuilder
from backend.util.tool_call_loop import LLMLoopResponse, LLMToolCall, ToolCallResult
def _make_transcript_content(*roles: str) -> str:
"""Build a minimal valid JSONL transcript from role names."""
lines = []
parent = ""
for i, role in enumerate(roles):
uid = f"uuid-{i}"
entry: dict = {
"type": role,
"uuid": uid,
"parentUuid": parent,
"message": {
"role": role,
"content": [{"type": "text", "text": f"{role} message {i}"}],
},
}
if role == "assistant":
entry["message"]["id"] = f"msg_{i}"
entry["message"]["model"] = "test-model"
entry["message"]["type"] = "message"
entry["message"]["stop_reason"] = STOP_REASON_END_TURN
lines.append(stdlib_json.dumps(entry))
parent = uid
return "\n".join(lines) + "\n"
class TestResolveBaselineModel:
"""Model selection honours the per-request mode."""
def test_fast_mode_selects_fast_model(self):
assert _resolve_baseline_model("fast") == config.fast_model
def test_extended_thinking_selects_default_model(self):
assert _resolve_baseline_model("extended_thinking") == config.model
def test_none_mode_selects_default_model(self):
"""Critical: baseline users without a mode MUST keep the default (opus)."""
assert _resolve_baseline_model(None) == config.model
def test_default_and_fast_models_differ(self):
"""Sanity: the two tiers are actually distinct in production config."""
assert config.model != config.fast_model
class TestLoadPriorTranscript:
"""``_load_prior_transcript`` wraps the download + validate + load flow."""
@pytest.mark.asyncio
async def test_loads_fresh_transcript(self):
builder = TranscriptBuilder()
content = _make_transcript_content("user", "assistant")
download = TranscriptDownload(content=content, message_count=2)
with patch(
"backend.copilot.baseline.service.download_transcript",
new=AsyncMock(return_value=download),
):
covers = await _load_prior_transcript(
user_id="user-1",
session_id="session-1",
session_msg_count=3,
transcript_builder=builder,
)
assert covers is True
assert builder.entry_count == 2
assert builder.last_entry_type == "assistant"
@pytest.mark.asyncio
async def test_rejects_stale_transcript(self):
"""message_count strictly less than session_msg_count - 1 is treated as stale."""
builder = TranscriptBuilder()
content = _make_transcript_content("user", "assistant")
# session has 6 messages, transcript only covers 2 → stale.
download = TranscriptDownload(content=content, message_count=2)
with patch(
"backend.copilot.baseline.service.download_transcript",
new=AsyncMock(return_value=download),
):
covers = await _load_prior_transcript(
user_id="user-1",
session_id="session-1",
session_msg_count=6,
transcript_builder=builder,
)
assert covers is False
assert builder.is_empty
@pytest.mark.asyncio
async def test_missing_transcript_returns_false(self):
builder = TranscriptBuilder()
with patch(
"backend.copilot.baseline.service.download_transcript",
new=AsyncMock(return_value=None),
):
covers = await _load_prior_transcript(
user_id="user-1",
session_id="session-1",
session_msg_count=2,
transcript_builder=builder,
)
assert covers is False
assert builder.is_empty
@pytest.mark.asyncio
async def test_invalid_transcript_returns_false(self):
builder = TranscriptBuilder()
download = TranscriptDownload(
content='{"type":"progress","uuid":"a"}\n',
message_count=1,
)
with patch(
"backend.copilot.baseline.service.download_transcript",
new=AsyncMock(return_value=download),
):
covers = await _load_prior_transcript(
user_id="user-1",
session_id="session-1",
session_msg_count=2,
transcript_builder=builder,
)
assert covers is False
assert builder.is_empty
@pytest.mark.asyncio
async def test_download_exception_returns_false(self):
builder = TranscriptBuilder()
with patch(
"backend.copilot.baseline.service.download_transcript",
new=AsyncMock(side_effect=RuntimeError("boom")),
):
covers = await _load_prior_transcript(
user_id="user-1",
session_id="session-1",
session_msg_count=2,
transcript_builder=builder,
)
assert covers is False
assert builder.is_empty
@pytest.mark.asyncio
async def test_zero_message_count_not_stale(self):
"""When msg_count is 0 (unknown), staleness check is skipped."""
builder = TranscriptBuilder()
download = TranscriptDownload(
content=_make_transcript_content("user", "assistant"),
message_count=0,
)
with patch(
"backend.copilot.baseline.service.download_transcript",
new=AsyncMock(return_value=download),
):
covers = await _load_prior_transcript(
user_id="user-1",
session_id="session-1",
session_msg_count=20,
transcript_builder=builder,
)
assert covers is True
assert builder.entry_count == 2
class TestUploadFinalTranscript:
"""``_upload_final_transcript`` serialises and calls storage."""
@pytest.mark.asyncio
async def test_uploads_valid_transcript(self):
builder = TranscriptBuilder()
builder.append_user(content="hi")
builder.append_assistant(
content_blocks=[{"type": "text", "text": "hello"}],
model="test-model",
stop_reason=STOP_REASON_END_TURN,
)
upload_mock = AsyncMock(return_value=None)
with patch(
"backend.copilot.baseline.service.upload_transcript",
new=upload_mock,
):
await _upload_final_transcript(
user_id="user-1",
session_id="session-1",
transcript_builder=builder,
session_msg_count=2,
)
upload_mock.assert_awaited_once()
assert upload_mock.await_args is not None
call_kwargs = upload_mock.await_args.kwargs
assert call_kwargs["user_id"] == "user-1"
assert call_kwargs["session_id"] == "session-1"
assert call_kwargs["message_count"] == 2
assert "hello" in call_kwargs["content"]
@pytest.mark.asyncio
async def test_skips_upload_when_builder_empty(self):
builder = TranscriptBuilder()
upload_mock = AsyncMock(return_value=None)
with patch(
"backend.copilot.baseline.service.upload_transcript",
new=upload_mock,
):
await _upload_final_transcript(
user_id="user-1",
session_id="session-1",
transcript_builder=builder,
session_msg_count=0,
)
upload_mock.assert_not_awaited()
@pytest.mark.asyncio
async def test_swallows_upload_exceptions(self):
"""Upload failures should not propagate (flow continues for the user)."""
builder = TranscriptBuilder()
builder.append_user(content="hi")
builder.append_assistant(
content_blocks=[{"type": "text", "text": "hello"}],
model="test-model",
stop_reason=STOP_REASON_END_TURN,
)
with patch(
"backend.copilot.baseline.service.upload_transcript",
new=AsyncMock(side_effect=RuntimeError("storage unavailable")),
):
# Should not raise.
await _upload_final_transcript(
user_id="user-1",
session_id="session-1",
transcript_builder=builder,
session_msg_count=2,
)
class TestRecordTurnToTranscript:
"""``_record_turn_to_transcript`` translates LLMLoopResponse → transcript."""
def test_records_final_assistant_text(self):
builder = TranscriptBuilder()
builder.append_user(content="hi")
response = LLMLoopResponse(
response_text="hello there",
tool_calls=[],
raw_response=None,
)
_record_turn_to_transcript(
response,
tool_results=None,
transcript_builder=builder,
model="test-model",
)
assert builder.entry_count == 2
assert builder.last_entry_type == "assistant"
jsonl = builder.to_jsonl()
assert "hello there" in jsonl
assert STOP_REASON_END_TURN in jsonl
def test_records_tool_use_then_tool_result(self):
"""Anthropic ordering: assistant(tool_use) → user(tool_result)."""
builder = TranscriptBuilder()
builder.append_user(content="use a tool")
response = LLMLoopResponse(
response_text=None,
tool_calls=[
LLMToolCall(id="call-1", name="echo", arguments='{"text":"hi"}')
],
raw_response=None,
)
tool_results = [
ToolCallResult(tool_call_id="call-1", tool_name="echo", content="hi")
]
_record_turn_to_transcript(
response,
tool_results,
transcript_builder=builder,
model="test-model",
)
# user, assistant(tool_use), user(tool_result) = 3 entries
assert builder.entry_count == 3
jsonl = builder.to_jsonl()
assert STOP_REASON_TOOL_USE in jsonl
assert "tool_use" in jsonl
assert "tool_result" in jsonl
assert "call-1" in jsonl
def test_records_nothing_on_empty_response(self):
builder = TranscriptBuilder()
builder.append_user(content="hi")
response = LLMLoopResponse(
response_text=None,
tool_calls=[],
raw_response=None,
)
_record_turn_to_transcript(
response,
tool_results=None,
transcript_builder=builder,
model="test-model",
)
assert builder.entry_count == 1
def test_malformed_tool_args_dont_crash(self):
"""Bad JSON in tool arguments falls back to {} without raising."""
builder = TranscriptBuilder()
builder.append_user(content="hi")
response = LLMLoopResponse(
response_text=None,
tool_calls=[LLMToolCall(id="call-1", name="echo", arguments="{not-json")],
raw_response=None,
)
tool_results = [
ToolCallResult(tool_call_id="call-1", tool_name="echo", content="ok")
]
_record_turn_to_transcript(
response,
tool_results,
transcript_builder=builder,
model="test-model",
)
assert builder.entry_count == 3
jsonl = builder.to_jsonl()
assert '"input":{}' in jsonl
class TestRoundTrip:
"""End-to-end: load prior → append new turn → upload."""
@pytest.mark.asyncio
async def test_full_round_trip(self):
prior = _make_transcript_content("user", "assistant")
download = TranscriptDownload(content=prior, message_count=2)
builder = TranscriptBuilder()
with patch(
"backend.copilot.baseline.service.download_transcript",
new=AsyncMock(return_value=download),
):
covers = await _load_prior_transcript(
user_id="user-1",
session_id="session-1",
session_msg_count=3,
transcript_builder=builder,
)
assert covers is True
assert builder.entry_count == 2
# New user turn.
builder.append_user(content="new question")
assert builder.entry_count == 3
# New assistant turn.
response = LLMLoopResponse(
response_text="new answer",
tool_calls=[],
raw_response=None,
)
_record_turn_to_transcript(
response,
tool_results=None,
transcript_builder=builder,
model="test-model",
)
assert builder.entry_count == 4
# Upload.
upload_mock = AsyncMock(return_value=None)
with patch(
"backend.copilot.baseline.service.upload_transcript",
new=upload_mock,
):
await _upload_final_transcript(
user_id="user-1",
session_id="session-1",
transcript_builder=builder,
session_msg_count=4,
)
upload_mock.assert_awaited_once()
assert upload_mock.await_args is not None
uploaded = upload_mock.await_args.kwargs["content"]
assert "new question" in uploaded
assert "new answer" in uploaded
# Original content preserved in the round trip.
assert "user message 0" in uploaded
assert "assistant message 1" in uploaded
@pytest.mark.asyncio
async def test_backfill_append_guard(self):
"""Backfill only runs when the last entry is not already assistant."""
builder = TranscriptBuilder()
builder.append_user(content="hi")
# Simulate the backfill guard from stream_chat_completion_baseline.
assistant_text = "partial text before error"
if builder.last_entry_type != "assistant":
builder.append_assistant(
content_blocks=[{"type": "text", "text": assistant_text}],
model="test-model",
stop_reason=STOP_REASON_END_TURN,
)
assert builder.last_entry_type == "assistant"
assert "partial text before error" in builder.to_jsonl()
# Second invocation: the guard must prevent double-append.
initial_count = builder.entry_count
if builder.last_entry_type != "assistant":
builder.append_assistant(
content_blocks=[{"type": "text", "text": "duplicate"}],
model="test-model",
stop_reason=STOP_REASON_END_TURN,
)
assert builder.entry_count == initial_count
class TestIsTranscriptStale:
"""``is_transcript_stale`` gates prior-transcript loading."""
def test_none_download_is_not_stale(self):
assert is_transcript_stale(None, session_msg_count=5) is False
def test_zero_message_count_is_not_stale(self):
"""Legacy transcripts without msg_count tracking must remain usable."""
dl = TranscriptDownload(content="", message_count=0)
assert is_transcript_stale(dl, session_msg_count=20) is False
def test_stale_when_covers_less_than_prefix(self):
dl = TranscriptDownload(content="", message_count=2)
# session has 6 messages; transcript must cover at least 5 (6-1).
assert is_transcript_stale(dl, session_msg_count=6) is True
def test_fresh_when_covers_full_prefix(self):
dl = TranscriptDownload(content="", message_count=5)
assert is_transcript_stale(dl, session_msg_count=6) is False
def test_fresh_when_exceeds_prefix(self):
"""Race: transcript ahead of session count is still acceptable."""
dl = TranscriptDownload(content="", message_count=10)
assert is_transcript_stale(dl, session_msg_count=6) is False
def test_boundary_equal_to_prefix_minus_one(self):
dl = TranscriptDownload(content="", message_count=5)
assert is_transcript_stale(dl, session_msg_count=6) is False
class TestShouldUploadTranscript:
"""``should_upload_transcript`` gates the final upload."""
def test_upload_allowed_for_user_with_coverage(self):
assert should_upload_transcript("user-1", True) is True
def test_upload_skipped_when_no_user(self):
assert should_upload_transcript(None, True) is False
def test_upload_skipped_when_empty_user(self):
assert should_upload_transcript("", True) is False
def test_upload_skipped_without_coverage(self):
"""Partial transcript must never clobber a more complete stored one."""
assert should_upload_transcript("user-1", False) is False
def test_upload_skipped_when_no_user_and_no_coverage(self):
assert should_upload_transcript(None, False) is False
class TestTranscriptLifecycle:
"""End-to-end: download → validate → build → upload.
Simulates the full transcript lifecycle inside
``stream_chat_completion_baseline`` by mocking the storage layer and
driving each step through the real helpers.
"""
@pytest.mark.asyncio
async def test_full_lifecycle_happy_path(self):
"""Fresh download, append a turn, upload covers the session."""
builder = TranscriptBuilder()
prior = _make_transcript_content("user", "assistant")
download = TranscriptDownload(content=prior, message_count=2)
upload_mock = AsyncMock(return_value=None)
with (
patch(
"backend.copilot.baseline.service.download_transcript",
new=AsyncMock(return_value=download),
),
patch(
"backend.copilot.baseline.service.upload_transcript",
new=upload_mock,
),
):
# --- 1. Download & load prior transcript ---
covers = await _load_prior_transcript(
user_id="user-1",
session_id="session-1",
session_msg_count=3,
transcript_builder=builder,
)
assert covers is True
# --- 2. Append a new user turn + a new assistant response ---
builder.append_user(content="follow-up question")
_record_turn_to_transcript(
LLMLoopResponse(
response_text="follow-up answer",
tool_calls=[],
raw_response=None,
),
tool_results=None,
transcript_builder=builder,
model="test-model",
)
# --- 3. Gate + upload ---
assert (
should_upload_transcript(
user_id="user-1", transcript_covers_prefix=covers
)
is True
)
await _upload_final_transcript(
user_id="user-1",
session_id="session-1",
transcript_builder=builder,
session_msg_count=4,
)
upload_mock.assert_awaited_once()
assert upload_mock.await_args is not None
uploaded = upload_mock.await_args.kwargs["content"]
assert "follow-up question" in uploaded
assert "follow-up answer" in uploaded
# Original prior-turn content preserved.
assert "user message 0" in uploaded
assert "assistant message 1" in uploaded
@pytest.mark.asyncio
async def test_lifecycle_stale_download_suppresses_upload(self):
"""Stale download → covers=False → upload must be skipped."""
builder = TranscriptBuilder()
# session has 10 msgs but stored transcript only covers 2 → stale.
stale = TranscriptDownload(
content=_make_transcript_content("user", "assistant"),
message_count=2,
)
upload_mock = AsyncMock(return_value=None)
with (
patch(
"backend.copilot.baseline.service.download_transcript",
new=AsyncMock(return_value=stale),
),
patch(
"backend.copilot.baseline.service.upload_transcript",
new=upload_mock,
),
):
covers = await _load_prior_transcript(
user_id="user-1",
session_id="session-1",
session_msg_count=10,
transcript_builder=builder,
)
assert covers is False
# The caller's gate mirrors the production path.
assert (
should_upload_transcript(user_id="user-1", transcript_covers_prefix=covers)
is False
)
upload_mock.assert_not_awaited()
@pytest.mark.asyncio
async def test_lifecycle_anonymous_user_skips_upload(self):
"""Anonymous (user_id=None) → upload gate must return False."""
builder = TranscriptBuilder()
builder.append_user(content="hi")
builder.append_assistant(
content_blocks=[{"type": "text", "text": "hello"}],
model="test-model",
stop_reason=STOP_REASON_END_TURN,
)
assert (
should_upload_transcript(user_id=None, transcript_covers_prefix=True)
is False
)
@pytest.mark.asyncio
async def test_lifecycle_missing_download_still_uploads_new_content(self):
"""No prior transcript → covers=False → upload must be skipped."""
builder = TranscriptBuilder()
upload_mock = AsyncMock(return_value=None)
with (
patch(
"backend.copilot.baseline.service.download_transcript",
new=AsyncMock(return_value=None),
),
patch(
"backend.copilot.baseline.service.upload_transcript",
new=upload_mock,
),
):
covers = await _load_prior_transcript(
user_id="user-1",
session_id="session-1",
session_msg_count=1,
transcript_builder=builder,
)
# No download: covers is False, so the production path would
# skip upload. This protects against overwriting a future
# more-complete transcript with a single-turn snapshot.
assert covers is False
assert (
should_upload_transcript(
user_id="user-1", transcript_covers_prefix=covers
)
is False
)
upload_mock.assert_not_awaited()
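
The two gates exercised by the tests above can be sketched from the asserted behaviour alone. This is a minimal reconstruction, not the code in ``baseline/service.py``: the ``TranscriptDownload`` dataclass here is a stand-in with an assumed shape, and the function bodies are inferred only from what ``TestIsTranscriptStale`` and ``TestShouldUploadTranscript`` check.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TranscriptDownload:
    # Stand-in for backend.copilot.transcript.TranscriptDownload (assumed shape).
    content: str
    message_count: int = 0


def is_transcript_stale(
    dl: Optional[TranscriptDownload], session_msg_count: int
) -> bool:
    """Return True when the stored transcript covers too little of the session."""
    if dl is None:
        # Nothing was downloaded, so there is nothing to call stale.
        return False
    if dl.message_count == 0:
        # 0 means "unknown" (legacy transcripts); the check is skipped.
        return False
    # The transcript must cover everything except the newest message.
    return dl.message_count < session_msg_count - 1


def should_upload_transcript(
    user_id: Optional[str], transcript_covers_prefix: bool
) -> bool:
    """Upload only for a known user whose transcript covers the session prefix."""
    return bool(user_id) and transcript_covers_prefix
```

With these definitions a missing download is not stale, yet the lifecycle tests still expect ``_load_prior_transcript`` to return ``False`` for it, so the upload gate stays closed in that case.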


@@ -8,13 +8,26 @@ from pydantic_settings import BaseSettings
from backend.util.clients import OPENROUTER_BASE_URL
# Per-request routing mode for a single chat turn.
# - 'fast': route to the baseline OpenAI-compatible path with the cheaper model.
# - 'extended_thinking': route to the Claude Agent SDK path with the default
# (opus) model.
# ``None`` means "no override"; the server falls back to the Claude Code
# subscription flag → LaunchDarkly COPILOT_SDK → config.use_claude_agent_sdk.
CopilotMode = Literal["fast", "extended_thinking"]
class ChatConfig(BaseSettings):
"""Configuration for the chat system."""
# OpenAI API Configuration
model: str = Field(
default="anthropic/claude-opus-4.6", description="Default model to use"
default="anthropic/claude-opus-4.6",
description="Default model for extended thinking mode",
)
fast_model: str = Field(
default="anthropic/claude-sonnet-4",
description="Model for fast mode (baseline path). Should be faster/cheaper than the default model.",
)
title_model: str = Field(
default="openai/gpt-4o-mini",
@@ -81,11 +94,11 @@ class ChatConfig(BaseSettings):
# allows ~70-100 turns/day.
# Checked at the HTTP layer (routes.py) before each turn.
#
# TODO: These are deploy-time constants applied identically to every user.
# If per-user or per-plan limits are needed (e.g., free tier vs paid), these
# must move to the database (e.g., a UserPlan table) and get_usage_status /
# check_rate_limit would look up each user's specific limits instead of
# reading config.daily_token_limit / config.weekly_token_limit.
# These are base limits for the FREE tier. Higher tiers (PRO, BUSINESS,
# ENTERPRISE) multiply these by their tier multiplier (see
# rate_limit.TIER_MULTIPLIERS). User tier is stored in the
# User.subscriptionTier DB column and resolved inside
# get_global_rate_limits().
daily_token_limit: int = Field(
default=2_500_000,
description="Max tokens per day, resets at midnight UTC (0 = unlimited)",
@@ -133,6 +146,32 @@ class ChatConfig(BaseSettings):
description="Use --resume for multi-turn conversations instead of "
"history compression. Falls back to compression when unavailable.",
)
claude_agent_fallback_model: str = Field(
default="claude-sonnet-4-20250514",
description="Fallback model when the primary model is unavailable (e.g. 529 "
"overloaded). The SDK automatically retries with this cheaper model.",
)
claude_agent_max_turns: int = Field(
default=1000,
ge=1,
le=10000,
description="Maximum number of agentic turns (tool-use loops) per query. "
"Prevents runaway tool loops from burning budget.",
)
claude_agent_max_budget_usd: float = Field(
default=100.0,
ge=0.01,
le=1000.0,
description="Maximum spend in USD per SDK query. The CLI aborts the "
"request if this budget is exceeded.",
)
claude_agent_max_transient_retries: int = Field(
default=3,
ge=0,
le=10,
description="Maximum number of retries for transient API errors "
"(429, 5xx, ECONNRESET) before surfacing the error to the user.",
)
use_openrouter: bool = Field(
default=True,
description="Enable routing API calls through the OpenRouter proxy. "
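
As an illustration of how the per-request mode maps onto the two model fields above, here is a small sketch. ``ModelConfig`` and ``resolve_baseline_model`` are hypothetical names used only for this example; the default strings are copied from the diff, and the fallback-to-default behaviour for ``None`` follows what ``TestResolveBaselineModel`` asserts.

```python
from dataclasses import dataclass
from typing import Literal, Optional

CopilotMode = Literal["fast", "extended_thinking"]


@dataclass(frozen=True)
class ModelConfig:
    # Hypothetical stand-in for the two ChatConfig fields involved.
    model: str = "anthropic/claude-opus-4.6"
    fast_model: str = "anthropic/claude-sonnet-4"


def resolve_baseline_model(mode: Optional[CopilotMode], config: ModelConfig) -> str:
    """Pick the model for one turn; only an explicit 'fast' leaves the default."""
    if mode == "fast":
        return config.fast_model
    # 'extended_thinking' and None (no override) both keep the default model.
    return config.model
```

Treating ``None`` the same as ``extended_thinking`` means a client that never sends a mode is not silently moved to the cheaper tier.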


@@ -44,12 +44,31 @@ def parse_node_id_from_exec_id(node_exec_id: str) -> str:
# Transient Anthropic API error detection
# ---------------------------------------------------------------------------
# Patterns in error text that indicate a transient Anthropic API error
# (ECONNRESET / dropped TCP connection) which is retryable.
# which is retryable. Covers:
# - Connection-level: ECONNRESET, dropped TCP connections
# - HTTP 429: rate-limit / too-many-requests
# - HTTP 5xx: server errors
#
# Prefer specific status-code patterns over natural-language phrases
# (e.g. "overloaded", "bad gateway") — those phrases can appear in
# application-level SDK messages and would trigger spurious retries.
_TRANSIENT_ERROR_PATTERNS = (
# Connection-level
"socket connection was closed unexpectedly",
"ECONNRESET",
"connection was forcibly closed",
"network socket disconnected",
# 429 rate-limit patterns
"rate limit",
"rate_limit",
"too many requests",
"status code 429",
# 5xx server error patterns (status-code-specific to avoid false positives)
"status code 529",
"status code 500",
"status code 502",
"status code 503",
"status code 504",
)
FRIENDLY_TRANSIENT_MSG = "Anthropic connection interrupted — please retry"
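
A rough sketch of how such a pattern tuple might be used together with the capped backoff described in the commit messages (``min(30s, 2^(n-1))``). The helper names ``is_transient_api_error`` and ``next_transient_backoff`` are illustrative, the pattern list is a subset of the one above, and case-insensitive matching is an assumption rather than something the diff shows.

```python
# Subset of the patterns from the diff, kept short for the example.
TRANSIENT_PATTERNS = (
    "ECONNRESET",
    "rate limit",
    "too many requests",
    "status code 429",
    "status code 529",
    "status code 503",
)


def is_transient_api_error(error_text: str) -> bool:
    """Substring match against the pattern list (case-insensitive by assumption)."""
    lowered = error_text.lower()
    return any(pattern.lower() in lowered for pattern in TRANSIENT_PATTERNS)


def next_transient_backoff(attempt: int, cap_seconds: int = 30) -> int:
    """Exponential backoff for the 1-based retry attempt, capped at cap_seconds."""
    return min(cap_seconds, 2 ** (attempt - 1))
```

Because the 5xx patterns are tied to status codes, a message that only contains the word "overloaded" does not match, which is the false-positive case the comment above warns about.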


@@ -14,6 +14,7 @@ from prisma.types import (
ChatSessionUpdateInput,
ChatSessionWhereInput,
)
from pydantic import BaseModel
from backend.data import db
from backend.util.json import SafeJson, sanitize_string
@@ -23,12 +24,22 @@ from .model import (
ChatSession,
ChatSessionInfo,
ChatSessionMetadata,
invalidate_session_cache,
cache_chat_session,
)
from .model import get_chat_session as get_chat_session_cached
logger = logging.getLogger(__name__)
class PaginatedMessages(BaseModel):
"""Result of a paginated message query."""
messages: list[ChatMessage]
has_more: bool
oldest_sequence: int | None
session: ChatSessionInfo
async def get_chat_session(session_id: str) -> ChatSession | None:
"""Get a chat session by ID from the database."""
session = await PrismaChatSession.prisma().find_unique(
@@ -38,6 +49,116 @@ async def get_chat_session(session_id: str) -> ChatSession | None:
return ChatSession.from_db(session) if session else None
async def get_chat_session_metadata(session_id: str) -> ChatSessionInfo | None:
"""Get chat session metadata (without messages) for ownership validation."""
session = await PrismaChatSession.prisma().find_unique(
where={"id": session_id},
)
return ChatSessionInfo.from_db(session) if session else None
async def get_chat_messages_paginated(
session_id: str,
limit: int = 50,
before_sequence: int | None = None,
user_id: str | None = None,
) -> PaginatedMessages | None:
"""Get a page of messages for a session, in ascending sequence order.
Messages are fetched newest first and reversed before returning.
Verifies session existence (and ownership when ``user_id`` is provided)
in the same query that fetches the messages. Returns ``None`` when the
session is not found or does not belong to the user.
Args:
session_id: The chat session ID.
limit: Max messages to return.
before_sequence: Cursor — return messages with sequence < this value.
user_id: If provided, filters via ``Session.userId`` so only the
session owner's messages are returned (acts as an ownership guard).
"""
# Build session-existence / ownership check
session_where: ChatSessionWhereInput = {"id": session_id}
if user_id is not None:
session_where["userId"] = user_id
# Build message include — fetch paginated messages in the same query
msg_include: dict[str, Any] = {
"order_by": {"sequence": "desc"},
"take": limit + 1,
}
if before_sequence is not None:
msg_include["where"] = {"sequence": {"lt": before_sequence}}
# Single query: session existence/ownership + paginated messages
session = await PrismaChatSession.prisma().find_first(
where=session_where,
include={"Messages": msg_include},
)
if session is None:
return None
session_info = ChatSessionInfo.from_db(session)
results = list(session.Messages) if session.Messages else []
has_more = len(results) > limit
results = results[:limit]
# Reverse to ascending order
results.reverse()
# Tool-call boundary fix: if the oldest message is a tool message,
# expand backward to include the preceding assistant message that
# owns the tool_calls, so convertChatSessionMessagesToUiMessages
# can pair them correctly.
_BOUNDARY_SCAN_LIMIT = 10
if results and results[0].role == "tool":
boundary_where: dict[str, Any] = {
"sessionId": session_id,
"sequence": {"lt": results[0].sequence},
}
if user_id is not None:
boundary_where["Session"] = {"is": {"userId": user_id}}
extra = await PrismaChatMessage.prisma().find_many(
where=boundary_where,
order={"sequence": "desc"},
take=_BOUNDARY_SCAN_LIMIT,
)
# Find the first non-tool message (should be the assistant)
boundary_msgs = []
found_owner = False
for msg in extra:
boundary_msgs.append(msg)
if msg.role != "tool":
found_owner = True
break
boundary_msgs.reverse()
if not found_owner:
logger.warning(
"Boundary expansion did not find owning assistant message "
"for session=%s before sequence=%s (%d msgs scanned)",
session_id,
results[0].sequence,
len(extra),
)
if boundary_msgs:
results = boundary_msgs + results
# Only mark has_more if the expanded boundary isn't the
# very start of the conversation (sequence 0).
if boundary_msgs[0].sequence > 0:
has_more = True
messages = [ChatMessage.from_db(m) for m in results]
oldest_sequence = messages[0].sequence if messages else None
return PaginatedMessages(
messages=messages,
has_more=has_more,
oldest_sequence=oldest_sequence,
session=session_info,
)
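
The pagination logic above (fetch ``limit + 1`` rows to detect ``has_more``, reverse to ascending order, then expand backward when the page starts on a tool message) can be tried out without a database. The sketch below works on an in-memory list; ``Msg`` and ``paginate`` are hypothetical names for this example, and ownership checks and logging are left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

BOUNDARY_SCAN_LIMIT = 10  # same bound as _BOUNDARY_SCAN_LIMIT in the diff


@dataclass
class Msg:
    # Hypothetical minimal message: only the fields the algorithm reads.
    sequence: int
    role: str


def paginate(
    all_msgs: List[Msg], limit: int, before_sequence: Optional[int] = None
) -> Tuple[List[Msg], bool, Optional[int]]:
    """Return (page in ascending order, has_more, oldest_sequence)."""
    newest_first = sorted(all_msgs, key=lambda m: m.sequence, reverse=True)
    candidates = [
        m for m in newest_first
        if before_sequence is None or m.sequence < before_sequence
    ]
    # Fetch one extra row: its presence means an older page exists.
    rows = candidates[: limit + 1]
    has_more = len(rows) > limit
    page = rows[:limit]
    page.reverse()

    # Boundary fix: a page must not start on an orphaned tool message.
    if page and page[0].role == "tool":
        older = [m for m in newest_first if m.sequence < page[0].sequence]
        boundary: List[Msg] = []
        for m in older[:BOUNDARY_SCAN_LIMIT]:
            boundary.append(m)
            if m.role != "tool":
                break  # found the message that owns the tool calls
        boundary.reverse()
        if boundary:
            page = boundary + page
            if boundary[0].sequence > 0:
                has_more = True

    oldest = page[0].sequence if page else None
    return page, has_more, oldest
```

Note that the expanded page can be longer than ``limit``; callers that use ``oldest_sequence`` as the next cursor still make progress because the cursor moves back with the expansion.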
async def create_chat_session(
session_id: str,
user_id: str,
@@ -380,8 +501,11 @@ async def update_tool_message_content(
async def set_turn_duration(session_id: str, duration_ms: int) -> None:
"""Set durationMs on the last assistant message in a session.
Also invalidates the Redis session cache so the next GET returns
the updated duration.
Updates the Redis cache in-place instead of invalidating it.
Invalidation would delete the key, creating a window where concurrent
``get_chat_session`` calls re-populate the cache from DB — potentially
with stale data if the DB write from the previous turn hasn't propagated.
This race caused duplicate user messages on the next turn.
"""
last_msg = await PrismaChatMessage.prisma().find_first(
where={"sessionId": session_id, "role": "assistant"},
@@ -392,5 +516,13 @@ async def set_turn_duration(session_id: str, duration_ms: int) -> None:
where={"id": last_msg.id},
data={"durationMs": duration_ms},
)
# Invalidate cache so the session is re-fetched from DB with durationMs
await invalidate_session_cache(session_id)
# Update cache in-place rather than invalidating to avoid a
# race window where the empty cache gets re-populated with
# stale data by a concurrent get_chat_session call.
session = await get_chat_session_cached(session_id)
if session and session.messages:
for msg in reversed(session.messages):
if msg.role == "assistant":
msg.duration_ms = duration_ms
break
await cache_chat_session(session)
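
The difference between invalidating and updating in place can be shown with a plain dict standing in for the Redis cache. This is a simplified, synchronous sketch; ``CachedSession``, ``CachedMessage`` and ``set_duration_in_place`` are hypothetical names, and the real code goes through ``get_chat_session_cached`` and ``cache_chat_session`` instead.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CachedMessage:
    role: str
    duration_ms: Optional[int] = None


@dataclass
class CachedSession:
    messages: List[CachedMessage] = field(default_factory=list)


def set_duration_in_place(
    cache: Dict[str, CachedSession], session_id: str, duration_ms: int
) -> bool:
    """Patch the last assistant message in the cached session.

    The key is never deleted, so a concurrent reader cannot observe an
    empty cache and refill it from a possibly stale source.
    """
    session = cache.get(session_id)
    if session is None or not session.messages:
        return False
    for msg in reversed(session.messages):
        if msg.role == "assistant":
            msg.duration_ms = duration_ms
            cache[session_id] = session  # write back the updated entry
            return True
    return False
```

Only the newest assistant message is touched, which matches the ``reversed(...)`` loop with ``break`` in the diff.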


@@ -0,0 +1,388 @@
"""Unit tests for copilot.db — paginated message queries."""
from __future__ import annotations
from datetime import UTC, datetime
from typing import Any
from unittest.mock import AsyncMock, patch
import pytest
from prisma.models import ChatMessage as PrismaChatMessage
from prisma.models import ChatSession as PrismaChatSession
from backend.copilot.db import (
PaginatedMessages,
get_chat_messages_paginated,
set_turn_duration,
)
from backend.copilot.model import ChatMessage as CopilotChatMessage
from backend.copilot.model import ChatSession, get_chat_session, upsert_chat_session
def _make_msg(
sequence: int,
role: str = "assistant",
content: str | None = "hello",
tool_calls: Any = None,
) -> PrismaChatMessage:
"""Build a minimal PrismaChatMessage for testing."""
return PrismaChatMessage(
id=f"msg-{sequence}",
createdAt=datetime.now(UTC),
sessionId="sess-1",
role=role,
content=content,
sequence=sequence,
toolCalls=tool_calls,
name=None,
toolCallId=None,
refusal=None,
functionCall=None,
)
def _make_session(
session_id: str = "sess-1",
user_id: str = "user-1",
messages: list[PrismaChatMessage] | None = None,
) -> PrismaChatSession:
"""Build a minimal PrismaChatSession for testing."""
now = datetime.now(UTC)
session = PrismaChatSession.model_construct(
id=session_id,
createdAt=now,
updatedAt=now,
userId=user_id,
credentials={},
successfulAgentRuns={},
successfulAgentSchedules={},
totalPromptTokens=0,
totalCompletionTokens=0,
title=None,
metadata={},
Messages=messages or [],
)
return session
SESSION_ID = "sess-1"
@pytest.fixture()
def mock_db():
"""Patch ChatSession.prisma().find_first and ChatMessage.prisma().find_many.
find_first is used for the main query (session + included messages).
find_many is used only for boundary expansion queries.
"""
with (
patch.object(PrismaChatSession, "prisma") as mock_session_prisma,
patch.object(PrismaChatMessage, "prisma") as mock_msg_prisma,
):
find_first = AsyncMock()
mock_session_prisma.return_value.find_first = find_first
find_many = AsyncMock(return_value=[])
mock_msg_prisma.return_value.find_many = find_many
yield find_first, find_many
# ---------- Basic pagination ----------
@pytest.mark.asyncio
async def test_basic_page_returns_messages_ascending(
mock_db: tuple[AsyncMock, AsyncMock],
):
"""Messages are returned in ascending sequence order."""
find_first, _ = mock_db
find_first.return_value = _make_session(
messages=[_make_msg(3), _make_msg(2), _make_msg(1)],
)
page = await get_chat_messages_paginated(SESSION_ID, limit=5)
assert isinstance(page, PaginatedMessages)
assert [m.sequence for m in page.messages] == [1, 2, 3]
assert page.has_more is False
assert page.oldest_sequence == 1
@pytest.mark.asyncio
async def test_has_more_when_results_exceed_limit(
mock_db: tuple[AsyncMock, AsyncMock],
):
"""has_more is True when DB returns more than limit items."""
find_first, _ = mock_db
find_first.return_value = _make_session(
messages=[_make_msg(3), _make_msg(2), _make_msg(1)],
)
page = await get_chat_messages_paginated(SESSION_ID, limit=2)
assert page is not None
assert page.has_more is True
assert len(page.messages) == 2
assert [m.sequence for m in page.messages] == [2, 3]
@pytest.mark.asyncio
async def test_empty_session_returns_no_messages(
mock_db: tuple[AsyncMock, AsyncMock],
):
find_first, _ = mock_db
find_first.return_value = _make_session(messages=[])
page = await get_chat_messages_paginated(SESSION_ID, limit=50)
assert page is not None
assert page.messages == []
assert page.has_more is False
assert page.oldest_sequence is None
@pytest.mark.asyncio
async def test_before_sequence_filters_correctly(
mock_db: tuple[AsyncMock, AsyncMock],
):
"""before_sequence is passed as a where filter inside the Messages include."""
find_first, _ = mock_db
find_first.return_value = _make_session(
messages=[_make_msg(2), _make_msg(1)],
)
await get_chat_messages_paginated(SESSION_ID, limit=50, before_sequence=5)
call_kwargs = find_first.call_args
include = call_kwargs.kwargs.get("include") or call_kwargs[1].get("include")
assert include["Messages"]["where"] == {"sequence": {"lt": 5}}
@pytest.mark.asyncio
async def test_no_where_on_messages_without_before_sequence(
mock_db: tuple[AsyncMock, AsyncMock],
):
"""Without before_sequence, the Messages include has no where clause."""
find_first, _ = mock_db
find_first.return_value = _make_session(messages=[_make_msg(1)])
await get_chat_messages_paginated(SESSION_ID, limit=50)
call_kwargs = find_first.call_args
include = call_kwargs.kwargs.get("include") or call_kwargs[1].get("include")
assert "where" not in include["Messages"]
@pytest.mark.asyncio
async def test_user_id_filter_applied_to_session_where(
mock_db: tuple[AsyncMock, AsyncMock],
):
"""user_id adds a userId filter to the session-level where clause."""
find_first, _ = mock_db
find_first.return_value = _make_session(messages=[_make_msg(1)])
await get_chat_messages_paginated(SESSION_ID, limit=50, user_id="user-abc")
call_kwargs = find_first.call_args
where = call_kwargs.kwargs.get("where") or call_kwargs[1].get("where")
assert where["userId"] == "user-abc"
@pytest.mark.asyncio
async def test_session_not_found_returns_none(
mock_db: tuple[AsyncMock, AsyncMock],
):
"""Returns None when session doesn't exist or user doesn't own it."""
find_first, _ = mock_db
find_first.return_value = None
page = await get_chat_messages_paginated(SESSION_ID, limit=50)
assert page is None
@pytest.mark.asyncio
async def test_session_info_included_in_result(
mock_db: tuple[AsyncMock, AsyncMock],
):
"""PaginatedMessages includes session metadata."""
find_first, _ = mock_db
find_first.return_value = _make_session(messages=[_make_msg(1)])
page = await get_chat_messages_paginated(SESSION_ID, limit=50)
assert page is not None
assert page.session.session_id == SESSION_ID
# ---------- Backward boundary expansion ----------
@pytest.mark.asyncio
async def test_boundary_expansion_includes_assistant(
mock_db: tuple[AsyncMock, AsyncMock],
):
"""When page starts with a tool message, expand backward to include
the owning assistant message."""
find_first, find_many = mock_db
find_first.return_value = _make_session(
messages=[_make_msg(5, role="tool"), _make_msg(4, role="tool")],
)
find_many.return_value = [_make_msg(3, role="assistant")]
page = await get_chat_messages_paginated(SESSION_ID, limit=5)
assert page is not None
assert [m.sequence for m in page.messages] == [3, 4, 5]
assert page.messages[0].role == "assistant"
assert page.oldest_sequence == 3
@pytest.mark.asyncio
async def test_boundary_expansion_includes_multiple_tool_msgs(
mock_db: tuple[AsyncMock, AsyncMock],
):
"""Boundary expansion scans past consecutive tool messages to find
the owning assistant."""
find_first, find_many = mock_db
find_first.return_value = _make_session(
messages=[_make_msg(7, role="tool")],
)
find_many.return_value = [
_make_msg(6, role="tool"),
_make_msg(5, role="tool"),
_make_msg(4, role="assistant"),
]
page = await get_chat_messages_paginated(SESSION_ID, limit=5)
assert page is not None
assert [m.sequence for m in page.messages] == [4, 5, 6, 7]
assert page.messages[0].role == "assistant"
@pytest.mark.asyncio
async def test_boundary_expansion_sets_has_more_when_not_at_start(
mock_db: tuple[AsyncMock, AsyncMock],
):
"""After boundary expansion, has_more=True if expanded msgs aren't at seq 0."""
find_first, find_many = mock_db
find_first.return_value = _make_session(
messages=[_make_msg(3, role="tool")],
)
find_many.return_value = [_make_msg(2, role="assistant")]
page = await get_chat_messages_paginated(SESSION_ID, limit=5)
assert page is not None
assert page.has_more is True
@pytest.mark.asyncio
async def test_boundary_expansion_no_has_more_at_conversation_start(
mock_db: tuple[AsyncMock, AsyncMock],
):
"""has_more stays False when boundary expansion reaches seq 0."""
find_first, find_many = mock_db
find_first.return_value = _make_session(
messages=[_make_msg(1, role="tool")],
)
find_many.return_value = [_make_msg(0, role="assistant")]
page = await get_chat_messages_paginated(SESSION_ID, limit=5)
assert page is not None
assert page.has_more is False
assert page.oldest_sequence == 0
@pytest.mark.asyncio
async def test_no_boundary_expansion_when_first_msg_not_tool(
mock_db: tuple[AsyncMock, AsyncMock],
):
"""No boundary expansion when the first message is not a tool message."""
find_first, find_many = mock_db
find_first.return_value = _make_session(
messages=[_make_msg(3, role="user"), _make_msg(2, role="assistant")],
)
page = await get_chat_messages_paginated(SESSION_ID, limit=5)
assert page is not None
assert find_many.call_count == 0
assert [m.sequence for m in page.messages] == [2, 3]
@pytest.mark.asyncio
async def test_boundary_expansion_warns_when_no_owner_found(
mock_db: tuple[AsyncMock, AsyncMock],
):
"""When boundary scan doesn't find a non-tool message, a warning is logged
and the boundary messages are still included."""
find_first, find_many = mock_db
find_first.return_value = _make_session(
messages=[_make_msg(10, role="tool")],
)
find_many.return_value = [_make_msg(i, role="tool") for i in range(9, -1, -1)]
with patch("backend.copilot.db.logger") as mock_logger:
page = await get_chat_messages_paginated(SESSION_ID, limit=5)
mock_logger.warning.assert_called_once()
assert page is not None
assert page.messages[0].role == "tool"
assert len(page.messages) > 1
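The boundary-expansion tests above pin down the expected behaviour without showing the scan itself. The following is a minimal in-memory sketch of that behaviour, not the code in `backend.copilot.db`: `Msg` and `expand_boundary` are hypothetical names, the page is assumed to be in ascending sequence order already, and `older` stands in for the `find_many` result (newest first).

```python
from dataclasses import dataclass


@dataclass
class Msg:
    sequence: int
    role: str


def expand_boundary(page: list[Msg], older: list[Msg]) -> tuple[list[Msg], bool]:
    """Return (messages in ascending order, owner_found).

    `page` is the fetched page in ascending sequence order; `older` holds
    earlier messages in descending sequence order (newest first).
    """
    if not page or page[0].role != "tool":
        return page, True  # page does not start mid-tool-call; nothing to do
    extra: list[Msg] = []
    owner_found = False
    for msg in older:
        extra.append(msg)
        if msg.role != "tool":
            owner_found = True  # reached the message that owns the tool calls
            break
    # `extra` was collected newest-first, so reverse it before prepending.
    return list(reversed(extra)) + page, owner_found


page = [Msg(7, "tool")]
older = [Msg(6, "tool"), Msg(5, "tool"), Msg(4, "assistant")]
messages, found = expand_boundary(page, older)
print([m.sequence for m in messages], found)
```

When `owner_found` comes back `False`, the tests expect a warning to be logged while the scanned tool messages are still returned; the sketch leaves the logging to the caller.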
# ---------- Turn duration (integration tests) ----------
@pytest.mark.asyncio(loop_scope="session")
async def test_set_turn_duration_updates_cache_in_place(setup_test_user, test_user_id):
"""set_turn_duration patches the cached session without invalidation.
Verifies that after calling set_turn_duration the Redis-cached session
reflects the updated durationMs on the last assistant message, without
the cache having been deleted and re-populated (which could race with
concurrent get_chat_session calls).
"""
session = ChatSession.new(user_id=test_user_id, dry_run=False)
session.messages = [
CopilotChatMessage(role="user", content="hello"),
CopilotChatMessage(role="assistant", content="hi there"),
]
session = await upsert_chat_session(session)
# Ensure the session is in cache
cached = await get_chat_session(session.session_id, test_user_id)
assert cached is not None
assert cached.messages[-1].duration_ms is None
# Update turn duration — should patch cache in-place
await set_turn_duration(session.session_id, 1234)
# Read from cache (not DB) — the cache should already have the update
updated = await get_chat_session(session.session_id, test_user_id)
assert updated is not None
assistant_msgs = [m for m in updated.messages if m.role == "assistant"]
assert len(assistant_msgs) == 1
assert assistant_msgs[0].duration_ms == 1234
@pytest.mark.asyncio(loop_scope="session")
async def test_set_turn_duration_no_assistant_message(setup_test_user, test_user_id):
"""set_turn_duration is a no-op when there are no assistant messages."""
session = ChatSession.new(user_id=test_user_id, dry_run=False)
session.messages = [
CopilotChatMessage(role="user", content="hello"),
]
session = await upsert_chat_session(session)
# Should not raise
await set_turn_duration(session.session_id, 5678)
cached = await get_chat_session(session.session_id, test_user_id)
assert cached is not None
# User message should not have durationMs
assert cached.messages[0].duration_ms is None

View File

@@ -13,7 +13,7 @@ import time
from backend.copilot import stream_registry
from backend.copilot.baseline import stream_chat_completion_baseline
-from backend.copilot.config import ChatConfig
+from backend.copilot.config import ChatConfig, CopilotMode
from backend.copilot.response_model import StreamError
from backend.copilot.sdk import service as sdk_service
from backend.copilot.sdk.dummy import stream_chat_completion_dummy
@@ -30,6 +30,57 @@ from .utils import CoPilotExecutionEntry, CoPilotLogMetadata
logger = TruncatedLogger(logging.getLogger(__name__), prefix="[CoPilotExecutor]")
# ============ Mode Routing ============ #
async def resolve_effective_mode(
    mode: CopilotMode | None,
    user_id: str | None,
) -> CopilotMode | None:
    """Strip ``mode`` when the user is not entitled to the toggle.

    The UI gates the mode toggle behind ``CHAT_MODE_OPTION``; the
    processor enforces the same gate server-side so an authenticated
    user cannot bypass the flag by crafting a request directly.
    """
    if mode is None:
        return None
    allowed = await is_feature_enabled(
        Flag.CHAT_MODE_OPTION,
        user_id or "anonymous",
        default=False,
    )
    if not allowed:
        logger.info(f"Ignoring mode={mode} — CHAT_MODE_OPTION is disabled for user")
        return None
    return mode


async def resolve_use_sdk_for_mode(
    mode: CopilotMode | None,
    user_id: str | None,
    *,
    use_claude_code_subscription: bool,
    config_default: bool,
) -> bool:
    """Pick the SDK vs baseline path for a single turn.

    Per-request ``mode`` wins whenever it is set (after the
    ``CHAT_MODE_OPTION`` gate has been applied upstream). Otherwise
    falls back to the Claude Code subscription override, then the
    ``COPILOT_SDK`` LaunchDarkly flag, then the config default.
    """
    if mode == "fast":
        return False
    if mode == "extended_thinking":
        return True
    return use_claude_code_subscription or await is_feature_enabled(
        Flag.COPILOT_SDK,
        user_id or "anonymous",
        default=config_default,
    )
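The precedence described in the two docstrings above can be exercised in isolation. The sketch below is an assumption-laden stand-in, not the processor code: `FLAGS`, `flag_enabled`, and `route` are hypothetical, and the flag lookup is a plain dictionary rather than LaunchDarkly.

```python
import asyncio
from typing import Literal, Optional

Mode = Literal["fast", "extended_thinking"]

# Hypothetical flag store. COPILOT_SDK is deliberately absent so that the
# lookup falls through to the caller-supplied default.
FLAGS = {"CHAT_MODE_OPTION": True}


async def flag_enabled(name: str, default: bool) -> bool:
    return FLAGS.get(name, default)


async def route(mode: Optional[Mode], subscription: bool, config_default: bool) -> str:
    # Gate: drop the requested mode when the toggle flag is off.
    if mode is not None and not await flag_enabled("CHAT_MODE_OPTION", False):
        mode = None
    # An explicit mode wins over every other signal.
    if mode == "fast":
        return "baseline"
    if mode == "extended_thinking":
        return "sdk"
    # No mode: subscription override first, then the flag (or its default).
    use_sdk = subscription or await flag_enabled("COPILOT_SDK", config_default)
    return "sdk" if use_sdk else "baseline"


results = {
    "fast": asyncio.run(route("fast", subscription=True, config_default=True)),
    "extended": asyncio.run(route("extended_thinking", subscription=False, config_default=False)),
    "subscription": asyncio.run(route(None, subscription=True, config_default=False)),
    "default_on": asyncio.run(route(None, subscription=False, config_default=True)),
    "all_off": asyncio.run(route(None, subscription=False, config_default=False)),
}
print(results)
```

Keeping the gate and the routing as separate steps mirrors the split between `resolve_effective_mode` and `resolve_use_sdk_for_mode`, which lets each be tested on its own.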
# ============ Module Entry Points ============ #
# Thread-local storage for processor instances
@@ -250,21 +301,26 @@ class CoPilotProcessor:
 if config.test_mode:
     stream_fn = stream_chat_completion_dummy
     log.warning("Using DUMMY service (CHAT_TEST_MODE=true)")
+    effective_mode = None
 else:
-    use_sdk = (
-        config.use_claude_code_subscription
-        or await is_feature_enabled(
-            Flag.COPILOT_SDK,
-            entry.user_id or "anonymous",
-            default=config.use_claude_agent_sdk,
-        )
+    # Enforce server-side feature-flag gate so unauthorised
+    # users cannot force a mode by crafting the request.
+    effective_mode = await resolve_effective_mode(entry.mode, entry.user_id)
+    use_sdk = await resolve_use_sdk_for_mode(
+        effective_mode,
+        entry.user_id,
+        use_claude_code_subscription=config.use_claude_code_subscription,
+        config_default=config.use_claude_agent_sdk,
     )
     stream_fn = (
         sdk_service.stream_chat_completion_sdk
         if use_sdk
         else stream_chat_completion_baseline
     )
-    log.info(f"Using {'SDK' if use_sdk else 'baseline'} service")
+    log.info(
+        f"Using {'SDK' if use_sdk else 'baseline'} service "
+        f"(mode={effective_mode or 'default'})"
+    )
# Stream chat completion and publish chunks to Redis.
# stream_and_publish wraps the raw stream with registry
@@ -276,6 +332,7 @@ class CoPilotProcessor:
user_id=entry.user_id,
context=entry.context,
file_ids=entry.file_ids,
mode=effective_mode,
)
async for chunk in stream_registry.stream_and_publish(
session_id=entry.session_id,

View File

@@ -0,0 +1,175 @@
"""Unit tests for CoPilot mode routing logic in the processor.
Tests cover the mode→service mapping:
- 'fast' → baseline service
- 'extended_thinking' → SDK service
- None → feature flag / config fallback
as well as the ``CHAT_MODE_OPTION`` server-side gate. The tests import
the real production helpers from ``processor.py`` so the routing logic
has meaningful coverage.
"""
from unittest.mock import AsyncMock, patch
import pytest
from backend.copilot.executor.processor import (
resolve_effective_mode,
resolve_use_sdk_for_mode,
)
class TestResolveUseSdkForMode:
"""Tests for the per-request mode routing logic."""
@pytest.mark.asyncio
async def test_fast_mode_uses_baseline(self):
"""mode='fast' always routes to baseline, regardless of flags."""
with patch(
"backend.copilot.executor.processor.is_feature_enabled",
new=AsyncMock(return_value=True),
):
assert (
await resolve_use_sdk_for_mode(
"fast",
"user-1",
use_claude_code_subscription=True,
config_default=True,
)
is False
)
@pytest.mark.asyncio
async def test_extended_thinking_uses_sdk(self):
"""mode='extended_thinking' always routes to SDK, regardless of flags."""
with patch(
"backend.copilot.executor.processor.is_feature_enabled",
new=AsyncMock(return_value=False),
):
assert (
await resolve_use_sdk_for_mode(
"extended_thinking",
"user-1",
use_claude_code_subscription=False,
config_default=False,
)
is True
)
@pytest.mark.asyncio
async def test_none_mode_uses_subscription_override(self):
"""mode=None with claude_code_subscription=True routes to SDK."""
with patch(
"backend.copilot.executor.processor.is_feature_enabled",
new=AsyncMock(return_value=False),
):
assert (
await resolve_use_sdk_for_mode(
None,
"user-1",
use_claude_code_subscription=True,
config_default=False,
)
is True
)
@pytest.mark.asyncio
async def test_none_mode_uses_feature_flag(self):
"""mode=None with feature flag enabled routes to SDK."""
with patch(
"backend.copilot.executor.processor.is_feature_enabled",
new=AsyncMock(return_value=True),
) as flag_mock:
assert (
await resolve_use_sdk_for_mode(
None,
"user-1",
use_claude_code_subscription=False,
config_default=False,
)
is True
)
flag_mock.assert_awaited_once()
@pytest.mark.asyncio
async def test_none_mode_uses_config_default(self):
"""mode=None falls back to config.use_claude_agent_sdk."""
# When LaunchDarkly returns the default (True), we expect SDK routing.
with patch(
"backend.copilot.executor.processor.is_feature_enabled",
new=AsyncMock(return_value=True),
):
assert (
await resolve_use_sdk_for_mode(
None,
"user-1",
use_claude_code_subscription=False,
config_default=True,
)
is True
)
@pytest.mark.asyncio
async def test_none_mode_all_disabled(self):
"""mode=None with all flags off routes to baseline."""
with patch(
"backend.copilot.executor.processor.is_feature_enabled",
new=AsyncMock(return_value=False),
):
assert (
await resolve_use_sdk_for_mode(
None,
"user-1",
use_claude_code_subscription=False,
config_default=False,
)
is False
)
class TestResolveEffectiveMode:
"""Tests for the CHAT_MODE_OPTION server-side gate."""
@pytest.mark.asyncio
async def test_none_mode_passes_through(self):
"""mode=None is returned as-is without a flag check."""
with patch(
"backend.copilot.executor.processor.is_feature_enabled",
new=AsyncMock(return_value=False),
) as flag_mock:
assert await resolve_effective_mode(None, "user-1") is None
flag_mock.assert_not_awaited()
@pytest.mark.asyncio
async def test_mode_stripped_when_flag_disabled(self):
"""When CHAT_MODE_OPTION is off, mode is dropped to None."""
with patch(
"backend.copilot.executor.processor.is_feature_enabled",
new=AsyncMock(return_value=False),
):
assert await resolve_effective_mode("fast", "user-1") is None
assert await resolve_effective_mode("extended_thinking", "user-1") is None
@pytest.mark.asyncio
async def test_mode_preserved_when_flag_enabled(self):
"""When CHAT_MODE_OPTION is on, the user-selected mode is preserved."""
with patch(
"backend.copilot.executor.processor.is_feature_enabled",
new=AsyncMock(return_value=True),
):
assert await resolve_effective_mode("fast", "user-1") == "fast"
assert (
await resolve_effective_mode("extended_thinking", "user-1")
== "extended_thinking"
)
@pytest.mark.asyncio
async def test_anonymous_user_with_mode(self):
"""Anonymous users (user_id=None) still pass through the gate."""
with patch(
"backend.copilot.executor.processor.is_feature_enabled",
new=AsyncMock(return_value=False),
) as flag_mock:
assert await resolve_effective_mode("fast", None) is None
flag_mock.assert_awaited_once()

View File

@@ -9,6 +9,7 @@ import logging
from pydantic import BaseModel
from backend.copilot.config import CopilotMode
from backend.data.rabbitmq import Exchange, ExchangeType, Queue, RabbitMQConfig
from backend.util.logging import TruncatedLogger, is_structured_logging_enabled
@@ -156,6 +157,9 @@ class CoPilotExecutionEntry(BaseModel):
file_ids: list[str] | None = None
"""Workspace file IDs attached to the user's message"""
mode: CopilotMode | None = None
"""Autopilot mode override: 'fast' or 'extended_thinking'. None = server default."""
class CancelCoPilotEvent(BaseModel):
"""Event to cancel a CoPilot operation."""
@@ -175,6 +179,7 @@ async def enqueue_copilot_turn(
is_user_message: bool = True,
context: dict[str, str] | None = None,
file_ids: list[str] | None = None,
mode: CopilotMode | None = None,
) -> None:
"""Enqueue a CoPilot task for processing by the executor service.
@@ -186,6 +191,7 @@ async def enqueue_copilot_turn(
is_user_message: Whether the message is from the user (vs system/assistant)
context: Optional context for the message (e.g., {url: str, content: str})
file_ids: Optional workspace file IDs attached to the user's message
mode: Autopilot mode override ('fast' or 'extended_thinking'). None = server default.
"""
from backend.util.clients import get_async_copilot_queue
@@ -197,6 +203,7 @@ async def enqueue_copilot_turn(
is_user_message=is_user_message,
context=context,
file_ids=file_ids,
mode=mode,
)
queue_client = await get_async_copilot_queue()

View File

@@ -0,0 +1,123 @@
"""Tests for CoPilot executor utils (queue config, message models, logging)."""
from backend.copilot.executor.utils import (
COPILOT_EXECUTION_EXCHANGE,
COPILOT_EXECUTION_QUEUE_NAME,
COPILOT_EXECUTION_ROUTING_KEY,
CancelCoPilotEvent,
CoPilotExecutionEntry,
CoPilotLogMetadata,
create_copilot_queue_config,
)
class TestCoPilotExecutionEntry:
def test_basic_fields(self):
entry = CoPilotExecutionEntry(
session_id="s1",
user_id="u1",
message="hello",
)
assert entry.session_id == "s1"
assert entry.user_id == "u1"
assert entry.message == "hello"
assert entry.is_user_message is True
assert entry.mode is None
assert entry.context is None
assert entry.file_ids is None
def test_mode_field(self):
entry = CoPilotExecutionEntry(
session_id="s1",
user_id="u1",
message="test",
mode="fast",
)
assert entry.mode == "fast"
entry2 = CoPilotExecutionEntry(
session_id="s1",
user_id="u1",
message="test",
mode="extended_thinking",
)
assert entry2.mode == "extended_thinking"
def test_optional_fields(self):
entry = CoPilotExecutionEntry(
session_id="s1",
user_id="u1",
message="test",
turn_id="t1",
context={"url": "https://example.com"},
file_ids=["f1", "f2"],
is_user_message=False,
)
assert entry.turn_id == "t1"
assert entry.context == {"url": "https://example.com"}
assert entry.file_ids == ["f1", "f2"]
assert entry.is_user_message is False
def test_serialization_roundtrip(self):
entry = CoPilotExecutionEntry(
session_id="s1",
user_id="u1",
message="hello",
mode="fast",
)
json_str = entry.model_dump_json()
restored = CoPilotExecutionEntry.model_validate_json(json_str)
assert restored == entry
class TestCancelCoPilotEvent:
def test_basic(self):
event = CancelCoPilotEvent(session_id="s1")
assert event.session_id == "s1"
def test_serialization(self):
event = CancelCoPilotEvent(session_id="s1")
restored = CancelCoPilotEvent.model_validate_json(event.model_dump_json())
assert restored.session_id == "s1"
class TestCreateCopilotQueueConfig:
def test_returns_valid_config(self):
config = create_copilot_queue_config()
assert len(config.exchanges) == 2
assert len(config.queues) == 2
def test_execution_queue_properties(self):
config = create_copilot_queue_config()
exec_queue = next(
q for q in config.queues if q.name == COPILOT_EXECUTION_QUEUE_NAME
)
assert exec_queue.durable is True
assert exec_queue.exchange == COPILOT_EXECUTION_EXCHANGE
assert exec_queue.routing_key == COPILOT_EXECUTION_ROUTING_KEY
def test_cancel_queue_uses_fanout(self):
config = create_copilot_queue_config()
cancel_queue = next(
q for q in config.queues if q.name != COPILOT_EXECUTION_QUEUE_NAME
)
assert cancel_queue.exchange is not None
assert cancel_queue.exchange.type.value == "fanout"
class TestCoPilotLogMetadata:
def test_creates_logger_with_metadata(self):
import logging
base_logger = logging.getLogger("test")
log = CoPilotLogMetadata(base_logger, session_id="s1", user_id="u1")
assert log is not None
def test_filters_none_values(self):
import logging
base_logger = logging.getLogger("test")
log = CoPilotLogMetadata(
base_logger, session_id="s1", user_id=None, turn_id="t1"
)
assert log is not None

View File

@@ -64,6 +64,7 @@ class ChatMessage(BaseModel):
refusal: str | None = None
tool_calls: list[dict] | None = None
function_call: dict | None = None
sequence: int | None = None
duration_ms: int | None = None
@staticmethod
@@ -77,10 +78,54 @@ class ChatMessage(BaseModel):
refusal=prisma_message.refusal,
tool_calls=_parse_json_field(prisma_message.toolCalls),
function_call=_parse_json_field(prisma_message.functionCall),
sequence=prisma_message.sequence,
duration_ms=prisma_message.durationMs,
)
def is_message_duplicate(
    messages: list[ChatMessage],
    role: str,
    content: str,
) -> bool:
    """Check whether *content* is already present in the current pending turn.

    Only inspects trailing messages that share the given *role* (i.e. the
    current turn). This ensures legitimately repeated messages across different
    turns are not suppressed, while same-turn duplicates from stale cache are
    still caught.
    """
    for m in reversed(messages):
        if m.role == role:
            if m.content == content:
                return True
        else:
            break
    return False


def maybe_append_user_message(
    session: "ChatSession",
    message: str | None,
    is_user_message: bool,
) -> bool:
    """Append a user/assistant message to the session if not already present.

    The route handler already persists the user message before enqueueing,
    so we check trailing same-role messages to avoid re-appending when the
    session cache is slightly stale.

    Returns True if the message was appended, False if skipped.
    """
    if not message:
        return False
    role = "user" if is_user_message else "assistant"
    if is_message_duplicate(session.messages, role, message):
        return False
    session.messages.append(ChatMessage(role=role, content=message))
    return True
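To see why the scan stops at the first message with a different role, here is a small self-contained sketch. `Message` and `is_duplicate_in_trailing_block` are hypothetical stand-ins for `ChatMessage` and `is_message_duplicate`, written against a plain dataclass so it runs without the backend.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    role: str
    content: Optional[str]


def is_duplicate_in_trailing_block(messages: list[Message], role: str, content: str) -> bool:
    # Walk backwards over the trailing run of same-role messages only.
    for m in reversed(messages):
        if m.role != role:
            break  # an earlier turn starts here; stop looking
        if m.content == content:
            return True
    return False


history = [Message("user", "yes"), Message("assistant", "ok")]

# "yes" was said in a previous turn, so it is not treated as a duplicate.
across_turns = is_duplicate_in_trailing_block(history, "user", "yes")

# Once the route handler has stored the new "yes", a second append is caught.
history.append(Message("user", "yes"))
same_turn = is_duplicate_in_trailing_block(history, "user", "yes")

print(across_turns, same_turn)
```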
class Usage(BaseModel):
prompt_tokens: int
completion_tokens: int

View File

@@ -17,6 +17,8 @@ from .model import (
ChatSession,
Usage,
get_chat_session,
is_message_duplicate,
maybe_append_user_message,
upsert_chat_session,
)
@@ -424,3 +426,151 @@ async def test_concurrent_saves_collision_detection(setup_test_user, test_user_i
assert "Streaming message 1" in contents
assert "Streaming message 2" in contents
assert "Callback result" in contents
# --------------------------------------------------------------------------- #
# is_message_duplicate #
# --------------------------------------------------------------------------- #
def test_duplicate_detected_in_trailing_same_role():
"""Duplicate user message at the tail is detected."""
msgs = [
ChatMessage(role="user", content="hello"),
ChatMessage(role="assistant", content="hi there"),
ChatMessage(role="user", content="yes"),
]
assert is_message_duplicate(msgs, "user", "yes") is True
def test_duplicate_not_detected_across_turns():
"""Same text in a previous turn (separated by assistant) is NOT a duplicate."""
msgs = [
ChatMessage(role="user", content="yes"),
ChatMessage(role="assistant", content="ok"),
]
assert is_message_duplicate(msgs, "user", "yes") is False
def test_no_duplicate_on_empty_messages():
"""Empty message list never reports a duplicate."""
assert is_message_duplicate([], "user", "hello") is False
def test_no_duplicate_when_content_differs():
"""Different content in the trailing same-role block is not a duplicate."""
msgs = [
ChatMessage(role="assistant", content="response"),
ChatMessage(role="user", content="first message"),
]
assert is_message_duplicate(msgs, "user", "second message") is False
def test_duplicate_with_multiple_trailing_same_role():
"""Detects duplicate among multiple consecutive same-role messages."""
msgs = [
ChatMessage(role="assistant", content="response"),
ChatMessage(role="user", content="msg1"),
ChatMessage(role="user", content="msg2"),
]
assert is_message_duplicate(msgs, "user", "msg1") is True
assert is_message_duplicate(msgs, "user", "msg2") is True
assert is_message_duplicate(msgs, "user", "msg3") is False
def test_duplicate_check_for_assistant_role():
"""Works correctly when checking assistant role too."""
msgs = [
ChatMessage(role="user", content="hi"),
ChatMessage(role="assistant", content="hello"),
ChatMessage(role="assistant", content="how can I help?"),
]
assert is_message_duplicate(msgs, "assistant", "hello") is True
assert is_message_duplicate(msgs, "assistant", "new response") is False
def test_no_false_positive_when_content_is_none():
"""Messages with content=None in the trailing block do not match."""
msgs = [
ChatMessage(role="user", content=None),
ChatMessage(role="user", content="hello"),
]
assert is_message_duplicate(msgs, "user", "hello") is True
# None-content message should not match any string
msgs2 = [
ChatMessage(role="user", content=None),
]
assert is_message_duplicate(msgs2, "user", "hello") is False
def test_all_same_role_messages():
"""When all messages share the same role, the entire list is scanned."""
msgs = [
ChatMessage(role="user", content="first"),
ChatMessage(role="user", content="second"),
ChatMessage(role="user", content="third"),
]
assert is_message_duplicate(msgs, "user", "first") is True
assert is_message_duplicate(msgs, "user", "new") is False
# --------------------------------------------------------------------------- #
# maybe_append_user_message #
# --------------------------------------------------------------------------- #
def test_maybe_append_user_message_appends_new():
"""A new user message is appended and returns True."""
session = ChatSession.new(user_id="u", dry_run=False)
session.messages = [
ChatMessage(role="assistant", content="hello"),
]
result = maybe_append_user_message(session, "new msg", is_user_message=True)
assert result is True
assert len(session.messages) == 2
assert session.messages[-1].role == "user"
assert session.messages[-1].content == "new msg"
def test_maybe_append_user_message_skips_duplicate():
"""A duplicate user message is skipped and returns False."""
session = ChatSession.new(user_id="u", dry_run=False)
session.messages = [
ChatMessage(role="assistant", content="hello"),
ChatMessage(role="user", content="dup"),
]
result = maybe_append_user_message(session, "dup", is_user_message=True)
assert result is False
assert len(session.messages) == 2
def test_maybe_append_user_message_none_message():
"""None/empty message returns False without appending."""
session = ChatSession.new(user_id="u", dry_run=False)
assert maybe_append_user_message(session, None, is_user_message=True) is False
assert maybe_append_user_message(session, "", is_user_message=True) is False
assert len(session.messages) == 0
def test_maybe_append_assistant_message():
"""Works for assistant role when is_user_message=False."""
session = ChatSession.new(user_id="u", dry_run=False)
session.messages = [
ChatMessage(role="user", content="hi"),
]
result = maybe_append_user_message(session, "response", is_user_message=False)
assert result is True
assert session.messages[-1].role == "assistant"
assert session.messages[-1].content == "response"
def test_maybe_append_assistant_skips_duplicate():
"""Duplicate assistant message is skipped."""
session = ChatSession.new(user_id="u", dry_run=False)
session.messages = [
ChatMessage(role="user", content="hi"),
ChatMessage(role="assistant", content="dup"),
]
result = maybe_append_user_message(session, "dup", is_user_message=False)
assert result is False
assert len(session.messages) == 2

View File

@@ -126,6 +126,21 @@ After building the file, reference it with `@@agptfile:` in other tools:
- When spawning sub-agents for research, ensure each has a distinct
non-overlapping scope to avoid redundant searches.
### Tool Discovery Priority
When the user asks to interact with a service or API, follow this order:
1. **find_block first** — Search platform blocks with `find_block`. The platform has hundreds of built-in blocks (Google Sheets, Docs, Calendar, Gmail, Slack, GitHub, etc.) that work without extra setup.
2. **run_mcp_tool** — If no matching block exists, check if a hosted MCP server is available for the service. Only use known MCP server URLs from the registry.
3. **SendAuthenticatedWebRequestBlock** — If no block or MCP server exists, use `SendAuthenticatedWebRequestBlock` with existing host-scoped credentials. Check available credentials via `connect_integration`.
4. **Manual API call** — As a last resort, guide the user to set up credentials and use `SendAuthenticatedWebRequestBlock` with direct API calls.
**Never skip step 1.** Built-in blocks are more reliable, tested, and user-friendly than MCP or raw API calls.
### Sub-agent tasks
- When using the Task tool, NEVER set `run_in_background` to true.
All tasks must run in the foreground.

View File

@@ -9,11 +9,14 @@ UTC). Fails open when Redis is unavailable to avoid blocking users.
import asyncio
import logging
from datetime import UTC, datetime, timedelta
from enum import Enum
from prisma.models import User as PrismaUser
from pydantic import BaseModel, Field
from redis.exceptions import RedisError
from backend.data.redis_client import get_redis_async
from backend.util.cache import cached
logger = logging.getLogger(__name__)
@@ -21,6 +24,40 @@ logger = logging.getLogger(__name__)
_USAGE_KEY_PREFIX = "copilot:usage"
# ---------------------------------------------------------------------------
# Subscription tier definitions
# ---------------------------------------------------------------------------
class SubscriptionTier(str, Enum):
    """Subscription tiers with increasing token allowances.

    Mirrors the ``SubscriptionTier`` enum in ``schema.prisma``.
    Once ``prisma generate`` is run, this can be replaced with::

        from prisma.enums import SubscriptionTier
    """

    FREE = "FREE"
    PRO = "PRO"
    BUSINESS = "BUSINESS"
    ENTERPRISE = "ENTERPRISE"


# Multiplier applied to the base limits (from LD / config) for each tier.
# Intentionally int (not float): keeps limits as whole token counts and avoids
# floating-point rounding. If fractional multipliers are ever needed, change
# the type and round the result in get_global_rate_limits().
TIER_MULTIPLIERS: dict[SubscriptionTier, int] = {
    SubscriptionTier.FREE: 1,
    SubscriptionTier.PRO: 5,
    SubscriptionTier.BUSINESS: 20,
    SubscriptionTier.ENTERPRISE: 60,
}

DEFAULT_TIER = SubscriptionTier.FREE
class UsageWindow(BaseModel):
"""Usage within a single time window."""
@@ -36,6 +73,7 @@ class CoPilotUsageStatus(BaseModel):
daily: UsageWindow
weekly: UsageWindow
tier: SubscriptionTier = DEFAULT_TIER
reset_cost: int = Field(
default=0,
description="Credit cost (in cents) to reset the daily limit. 0 = feature disabled.",
@@ -66,6 +104,7 @@ async def get_usage_status(
daily_token_limit: int,
weekly_token_limit: int,
rate_limit_reset_cost: int = 0,
tier: SubscriptionTier = DEFAULT_TIER,
) -> CoPilotUsageStatus:
"""Get current usage status for a user.
@@ -74,6 +113,7 @@ async def get_usage_status(
daily_token_limit: Max tokens per day (0 = unlimited).
weekly_token_limit: Max tokens per week (0 = unlimited).
rate_limit_reset_cost: Credit cost (cents) to reset daily limit (0 = disabled).
tier: The user's rate-limit tier (included in the response).
Returns:
CoPilotUsageStatus with current usage and limits.
@@ -103,6 +143,7 @@ async def get_usage_status(
limit=weekly_token_limit,
resets_at=_weekly_reset_time(now=now),
),
tier=tier,
reset_cost=rate_limit_reset_cost,
)
@@ -343,20 +384,100 @@ async def record_token_usage(
)
class _UserNotFoundError(Exception):
    """Raised when a user record is missing or has no subscription tier.

    Used internally by ``_fetch_user_tier`` to signal a cache-miss condition:
    by raising instead of returning ``DEFAULT_TIER``, we prevent the ``@cached``
    decorator from storing the fallback value. This avoids a race condition
    where a non-existent user's DEFAULT_TIER is cached, then the user is
    created with a higher tier but receives the stale cached FREE tier for
    up to 5 minutes.
    """


@cached(maxsize=1000, ttl_seconds=300, shared_cache=True)
async def _fetch_user_tier(user_id: str) -> SubscriptionTier:
    """Fetch the user's rate-limit tier from the database (cached via Redis).

    Uses ``shared_cache=True`` so that tier changes propagate across all pods
    immediately when the cache entry is invalidated (via ``cache_delete``).

    Only successful DB lookups of existing users with a valid tier are cached.
    Raises ``_UserNotFoundError`` when the user is missing or has no tier, so
    the ``@cached`` decorator does **not** store a fallback value. This
    prevents a race condition where a non-existent user's ``DEFAULT_TIER`` is
    cached and then persists after the user is created with a higher tier.
    """
    user = await PrismaUser.prisma().find_unique(where={"id": user_id})
    if user and user.subscriptionTier:  # type: ignore[reportAttributeAccessIssue]
        return SubscriptionTier(user.subscriptionTier)  # type: ignore[reportAttributeAccessIssue]
    raise _UserNotFoundError(user_id)


async def get_user_tier(user_id: str) -> SubscriptionTier:
    """Look up the user's rate-limit tier from the database.

    Successful results are cached for 5 minutes (via ``_fetch_user_tier``)
    to avoid a DB round-trip on every rate-limit check.

    Falls back to ``DEFAULT_TIER`` **without caching** when the DB is
    unreachable or returns an unrecognised value, so the next call retries
    the query instead of serving a stale fallback for up to 5 minutes.
    """
    try:
        return await _fetch_user_tier(user_id)
    except Exception as exc:
        logger.warning(
            "Failed to resolve rate-limit tier for user %s, defaulting to %s: %s",
            user_id[:8],
            DEFAULT_TIER.value,
            exc,
        )
        return DEFAULT_TIER
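The "raise instead of returning a fallback" pattern used by `_fetch_user_tier` and `get_user_tier` can be shown with a toy cache. This is a simplified synchronous sketch: `ttl_cached`, `fetch_tier`, `get_tier`, and the in-memory `DB` are all hypothetical, whereas the real code uses the project's async `@cached` decorator backed by Redis.

```python
import time


class NotFound(Exception):
    """Signals 'no value worth caching'."""


def ttl_cached(ttl_seconds: float):
    def decorator(fn):
        store: dict[str, tuple[float, str]] = {}

        def wrapper(key: str) -> str:
            now = time.monotonic()
            hit = store.get(key)
            if hit is not None and now - hit[0] < ttl_seconds:
                return hit[1]
            value = fn(key)  # if fn raises, nothing is written to the cache
            store[key] = (now, value)
            return value

        return wrapper

    return decorator


DB: dict[str, str] = {}
stats = {"db_calls": 0}


@ttl_cached(300)
def fetch_tier(user_id: str) -> str:
    stats["db_calls"] += 1
    if user_id not in DB:
        raise NotFound(user_id)
    return DB[user_id]


def get_tier(user_id: str) -> str:
    try:
        return fetch_tier(user_id)
    except NotFound:
        return "FREE"  # fallback is returned but never cached


first = get_tier("u1")   # user does not exist yet
DB["u1"] = "PRO"         # user is created with a higher tier
second = get_tier("u1")  # cache is empty for u1, so the DB is queried again
third = get_tier("u1")   # now served from the cache
print(first, second, third, stats["db_calls"])
```

Had `fetch_tier` returned `"FREE"` for the missing user, the second call would have been answered from the cache and the new tier would have stayed invisible until the entry expired.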
# Expose cache management on the public function so callers (including tests)
# never need to reach into the private ``_fetch_user_tier``.
get_user_tier.cache_clear = _fetch_user_tier.cache_clear  # type: ignore[attr-defined]
get_user_tier.cache_delete = _fetch_user_tier.cache_delete  # type: ignore[attr-defined]


async def set_user_tier(user_id: str, tier: SubscriptionTier) -> None:
    """Persist the user's rate-limit tier to the database.

    Also invalidates the ``get_user_tier`` cache for this user so that
    subsequent rate-limit checks immediately see the new tier.

    Raises:
        prisma.errors.RecordNotFoundError: If the user does not exist.
    """
    await PrismaUser.prisma().update(
        where={"id": user_id},
        data={"subscriptionTier": tier.value},
    )
    # Invalidate cached tier so rate-limit checks pick up the change immediately.
    get_user_tier.cache_delete(user_id)  # type: ignore[attr-defined]
 async def get_global_rate_limits(
     user_id: str,
     config_daily: int,
     config_weekly: int,
-) -> tuple[int, int]:
+) -> tuple[int, int, SubscriptionTier]:
     """Resolve global rate limits from LaunchDarkly, falling back to config.

+    The base limits (from LD or config) are multiplied by the user's
+    tier multiplier so that higher tiers receive proportionally larger
+    allowances.
+
     Args:
         user_id: User ID for LD flag evaluation context.
         config_daily: Fallback daily limit from ChatConfig.
         config_weekly: Fallback weekly limit from ChatConfig.

     Returns:
-        (daily_token_limit, weekly_token_limit) tuple.
+        (daily_token_limit, weekly_token_limit, tier) 3-tuple.
     """
     # Lazy import to avoid circular dependency:
     # rate_limit -> feature_flag -> settings -> ... -> rate_limit
@@ -378,7 +499,15 @@ async def get_global_rate_limits(
except (TypeError, ValueError):
logger.warning("Invalid LD value for weekly token limit: %r", weekly_raw)
weekly = config_weekly
return daily, weekly
# Apply tier multiplier
tier = await get_user_tier(user_id)
multiplier = TIER_MULTIPLIERS.get(tier, 1)
if multiplier != 1:
daily = daily * multiplier
weekly = weekly * multiplier
return daily, weekly, tier
async def reset_user_usage(user_id: str, *, reset_weekly: bool = False) -> None:
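The tier-multiplier step added to `get_global_rate_limits` can be illustrated in isolation. The sketch below is a minimal, synchronous stand-in and not the code from this change: the `FREE` member and both multiplier values are hypothetical (the diff only shows `SubscriptionTier.PRO` and does not show the contents of `TIER_MULTIPLIERS`), and `apply_tier_multiplier` is an illustrative helper name.

```python
from enum import Enum


class SubscriptionTier(Enum):
    # Hypothetical members; only PRO appears in the diff.
    FREE = "FREE"
    PRO = "PRO"


# Hypothetical multipliers, chosen for illustration only.
TIER_MULTIPLIERS: dict[SubscriptionTier, int] = {
    SubscriptionTier.FREE: 1,
    SubscriptionTier.PRO: 5,
}


def apply_tier_multiplier(
    daily: int, weekly: int, tier: SubscriptionTier
) -> tuple[int, int, SubscriptionTier]:
    """Scale the base limits by the tier multiplier; unknown tiers fall back to 1."""
    multiplier = TIER_MULTIPLIERS.get(tier, 1)
    return daily * multiplier, weekly * multiplier, tier
```

Because the return value grows from a 2-tuple to a 3-tuple, every caller and every mock has to unpack or return three values, which is what the `_mock_rate_limits` helper in the route tests takes care of.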

File diff suppressed because it is too large

View File

@@ -9,7 +9,7 @@ import pytest
from fastapi import HTTPException
from backend.api.features.chat.routes import reset_copilot_usage
from backend.copilot.rate_limit import CoPilotUsageStatus, UsageWindow
from backend.copilot.rate_limit import CoPilotUsageStatus, SubscriptionTier, UsageWindow
from backend.util.exceptions import InsufficientBalanceError
@@ -53,6 +53,18 @@ def _mock_settings(enable_credit: bool = True):
return mock
def _mock_rate_limits(
daily: int = 2_500_000,
weekly: int = 12_500_000,
tier: SubscriptionTier = SubscriptionTier.PRO,
):
"""Mock get_global_rate_limits to return fixed limits (no tier multiplier)."""
return patch(
f"{_MODULE}.get_global_rate_limits",
AsyncMock(return_value=(daily, weekly, tier)),
)
@pytest.mark.asyncio
class TestResetCopilotUsage:
async def test_feature_disabled_returns_400(self):
@@ -70,10 +82,7 @@ class TestResetCopilotUsage:
with (
patch(f"{_MODULE}.config", _make_config(daily_token_limit=0)),
patch(f"{_MODULE}.settings", _mock_settings()),
patch(
f"{_MODULE}.get_global_rate_limits",
AsyncMock(return_value=(0, 12_500_000)),
),
_mock_rate_limits(daily=0),
):
with pytest.raises(HTTPException) as exc_info:
await reset_copilot_usage(user_id="user-1")
@@ -87,10 +96,7 @@ class TestResetCopilotUsage:
with (
patch(f"{_MODULE}.config", cfg),
patch(f"{_MODULE}.settings", _mock_settings()),
patch(
f"{_MODULE}.get_global_rate_limits",
AsyncMock(return_value=(2_500_000, 12_500_000)),
),
_mock_rate_limits(),
patch(f"{_MODULE}.get_daily_reset_count", AsyncMock(return_value=0)),
patch(f"{_MODULE}.acquire_reset_lock", AsyncMock(return_value=True)),
patch(f"{_MODULE}.release_reset_lock", AsyncMock()) as mock_release,
@@ -120,10 +126,7 @@ class TestResetCopilotUsage:
with (
patch(f"{_MODULE}.config", cfg),
patch(f"{_MODULE}.settings", _mock_settings()),
patch(
f"{_MODULE}.get_global_rate_limits",
AsyncMock(return_value=(2_500_000, 12_500_000)),
),
_mock_rate_limits(),
patch(f"{_MODULE}.get_daily_reset_count", AsyncMock(return_value=0)),
patch(f"{_MODULE}.acquire_reset_lock", AsyncMock(return_value=True)),
patch(f"{_MODULE}.release_reset_lock", AsyncMock()) as mock_release,
@@ -153,10 +156,7 @@ class TestResetCopilotUsage:
with (
patch(f"{_MODULE}.config", cfg),
patch(f"{_MODULE}.settings", _mock_settings()),
patch(
f"{_MODULE}.get_global_rate_limits",
AsyncMock(return_value=(2_500_000, 12_500_000)),
),
_mock_rate_limits(),
patch(f"{_MODULE}.get_daily_reset_count", AsyncMock(return_value=0)),
patch(f"{_MODULE}.acquire_reset_lock", AsyncMock(return_value=True)),
patch(f"{_MODULE}.release_reset_lock", AsyncMock()),
@@ -187,10 +187,7 @@ class TestResetCopilotUsage:
with (
patch(f"{_MODULE}.config", cfg),
patch(f"{_MODULE}.settings", _mock_settings()),
patch(
f"{_MODULE}.get_global_rate_limits",
AsyncMock(return_value=(2_500_000, 12_500_000)),
),
_mock_rate_limits(),
patch(f"{_MODULE}.get_daily_reset_count", AsyncMock(return_value=3)),
):
with pytest.raises(HTTPException) as exc_info:
@@ -228,10 +225,7 @@ class TestResetCopilotUsage:
with (
patch(f"{_MODULE}.config", cfg),
patch(f"{_MODULE}.settings", _mock_settings()),
patch(
f"{_MODULE}.get_global_rate_limits",
AsyncMock(return_value=(2_500_000, 12_500_000)),
),
_mock_rate_limits(),
patch(f"{_MODULE}.get_daily_reset_count", AsyncMock(return_value=0)),
patch(f"{_MODULE}.acquire_reset_lock", AsyncMock(return_value=True)),
patch(f"{_MODULE}.release_reset_lock", AsyncMock()) as mock_release,
@@ -252,10 +246,7 @@ class TestResetCopilotUsage:
with (
patch(f"{_MODULE}.config", _make_config()),
patch(f"{_MODULE}.settings", _mock_settings()),
patch(
f"{_MODULE}.get_global_rate_limits",
AsyncMock(return_value=(2_500_000, 12_500_000)),
),
_mock_rate_limits(),
patch(f"{_MODULE}.get_daily_reset_count", AsyncMock(return_value=None)),
):
with pytest.raises(HTTPException) as exc_info:
@@ -273,10 +264,7 @@ class TestResetCopilotUsage:
with (
patch(f"{_MODULE}.config", cfg),
patch(f"{_MODULE}.settings", _mock_settings()),
patch(
f"{_MODULE}.get_global_rate_limits",
AsyncMock(return_value=(2_500_000, 12_500_000)),
),
_mock_rate_limits(),
patch(f"{_MODULE}.get_daily_reset_count", AsyncMock(return_value=0)),
patch(f"{_MODULE}.acquire_reset_lock", AsyncMock(return_value=True)),
patch(f"{_MODULE}.release_reset_lock", AsyncMock()),
@@ -307,10 +295,7 @@ class TestResetCopilotUsage:
with (
patch(f"{_MODULE}.config", cfg),
patch(f"{_MODULE}.settings", _mock_settings()),
patch(
f"{_MODULE}.get_global_rate_limits",
AsyncMock(return_value=(2_500_000, 12_500_000)),
),
_mock_rate_limits(),
patch(f"{_MODULE}.get_daily_reset_count", AsyncMock(return_value=0)),
patch(f"{_MODULE}.acquire_reset_lock", AsyncMock(return_value=True)),
patch(f"{_MODULE}.release_reset_lock", AsyncMock()),

View File

@@ -53,6 +53,12 @@ Steps:
or fix manually based on the error descriptions. Iterate until valid.
8. **Save**: Call `create_agent` (new) or `edit_agent` (existing) with
the final `agent_json`
9. **Dry-run**: ALWAYS call `run_agent` with `dry_run=True` and
`wait_for_result=120` to verify the agent works end-to-end.
10. **Inspect & fix**: Check the dry-run output for errors. If issues are
found, call `edit_agent` to fix and dry-run again. Repeat until the
simulation passes or the problems are clearly unfixable.
See "REQUIRED: Dry-Run Verification Loop" section below for details.
### Agent JSON Structure
@@ -246,19 +252,51 @@ call in a loop until the task is complete:
Regular blocks work exactly like sub-agents as tools — wire each input
field from `source_name: "tools"` on the Orchestrator side.
### Testing with Dry Run
### REQUIRED: Dry-Run Verification Loop (create -> dry-run -> fix)
After saving an agent, suggest a dry run to validate wiring without consuming
real API calls, credentials, or credits:
After creating or editing an agent, you MUST dry-run it before telling the
user the agent is ready. NEVER skip this step.
1. **Run**: Call `run_agent` or `run_block` with `dry_run=True` and provide
sample inputs. This executes the graph with mock outputs, verifying that
links resolve correctly and required inputs are satisfied.
2. **Check results**: Call `view_agent_output` with `show_execution_details=True`
to inspect the full node-by-node execution trace. This shows what each node
received as input and produced as output, making it easy to spot wiring issues.
3. **Iterate**: If the dry run reveals wiring issues or missing inputs, fix
the agent JSON and re-save before suggesting a real execution.
#### Step-by-step workflow
1. **Create/Edit**: Call `create_agent` or `edit_agent` to save the agent.
2. **Dry-run**: Call `run_agent` with `dry_run=True`, `wait_for_result=120`,
and realistic sample inputs that exercise every path in the agent. This
simulates execution using an LLM for each block — no real API calls,
credentials, or credits are consumed.
3. **Inspect output**: Examine the dry-run result for problems. If
`wait_for_result` returns only a summary, call
`view_agent_output(execution_id=..., show_execution_details=True)` to
see the full node-by-node execution trace. Look for:
- **Errors / failed nodes** — a node raised an exception or returned an
error status. Common causes: wrong `source_name`/`sink_name` in links,
missing `input_default` values, or referencing a nonexistent block output.
- **Null / empty outputs** — data did not flow through a link. Verify that
`source_name` and `sink_name` match the block schemas exactly (case-
sensitive, including nested `_#_` notation).
- **Nodes that never executed** — the node was not reached. Likely a
missing or broken link from an upstream node.
- **Unexpected values** — data arrived but in the wrong type or
structure. Check type compatibility between linked ports.
4. **Fix**: If any issues are found, call `edit_agent` with the corrected
agent JSON, then go back to step 2.
5. **Repeat**: Continue the dry-run -> fix cycle until the simulation passes
or the problems are clearly unfixable. If you stop making progress,
report the remaining issues to the user and ask for guidance.
#### Good vs bad dry-run output
**Good output** (agent is ready):
- All nodes executed successfully (no errors in the execution trace)
- Data flows through every link with non-null, correctly-typed values
- The final `AgentOutputBlock` contains a meaningful result
- Status is `COMPLETED`
**Bad output** (needs fixing):
- Status is `FAILED` — check the error message for the failing node
- An output node received `null` — trace back to find the broken link
- A node received data in the wrong format (e.g. string where list expected)
- Nodes downstream of a failing node were skipped entirely
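The control flow of the create -> dry-run -> fix loop can be sketched in Python. This is only an illustration of the loop shape: `run_dry` and `fix` are hypothetical callables standing in for the `run_agent(dry_run=True)` and `edit_agent` tool calls, the `"status"`/`"errors"` result shape is an assumption, and `max_rounds` is an arbitrary budget standing in for "the problems are clearly unfixable".

```python
from typing import Any, Callable

AgentJson = dict[str, Any]
RunResult = dict[str, Any]


def dry_run_until_clean(
    agent_json: AgentJson,
    run_dry: Callable[[AgentJson], RunResult],
    fix: Callable[[AgentJson, RunResult], AgentJson],
    max_rounds: int = 3,
) -> tuple[bool, AgentJson, RunResult]:
    """Repeat dry-run -> fix until the run is clean or the round budget is spent.

    Returns (passed, latest_agent_json, last_result).
    """
    result: RunResult = {}
    for round_no in range(1, max_rounds + 1):
        result = run_dry(agent_json)
        # Assumed result shape: a "status" string plus an optional "errors" list.
        if result.get("status") == "COMPLETED" and not result.get("errors"):
            return True, agent_json, result
        if round_no < max_rounds:
            # Only fix when another dry run will follow to verify the fix.
            agent_json = fix(agent_json, result)
    # Budget spent: hand the remaining issues back instead of looping forever.
    return False, agent_json, result
```

When the function returns `False`, the last result holds the issues that should be reported to the user.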
**Special block behaviour in dry-run mode:**
- **OrchestratorBlock** and **AgentExecutorBlock** execute for real so the

View File

@@ -8,6 +8,8 @@ circular import through ``executor`` → ``credit`` → ``block_cost_config``).
from __future__ import annotations
import re
from backend.copilot.config import ChatConfig
from backend.copilot.sdk.subscription import validate_subscription
@@ -26,14 +28,14 @@ def build_sdk_env(
Three modes (checked in order):
1. **Subscription** — clears all keys; CLI uses ``claude login`` auth.
2. **Direct Anthropic** — returns ``{}``; subprocess inherits
``ANTHROPIC_API_KEY`` from the parent environment.
2. **Direct Anthropic** — subprocess inherits ``ANTHROPIC_API_KEY``
from the parent environment (no overrides needed).
3. **OpenRouter** (default) — overrides base URL and auth token to
route through the proxy, with Langfuse trace headers.
When *sdk_cwd* is provided, ``CLAUDE_CODE_TMPDIR`` is set so that
the CLI writes temp/sub-agent output inside the per-session workspace
directory rather than an inaccessible system temp path.
All modes receive workspace isolation (``CLAUDE_CODE_TMPDIR``) and
security hardening env vars to prevent .claude.md loading, prompt
history persistence, auto-memory writes, and non-essential traffic.
"""
# --- Mode 1: Claude Code subscription auth ---
if config.use_claude_code_subscription:
@@ -43,40 +45,51 @@ def build_sdk_env(
"ANTHROPIC_AUTH_TOKEN": "",
"ANTHROPIC_BASE_URL": "",
}
if sdk_cwd:
env["CLAUDE_CODE_TMPDIR"] = sdk_cwd
return env
# --- Mode 2: Direct Anthropic (no proxy hop) ---
if not config.openrouter_active:
elif not config.openrouter_active:
env = {}
if sdk_cwd:
env["CLAUDE_CODE_TMPDIR"] = sdk_cwd
return env
# --- Mode 3: OpenRouter proxy ---
base = (config.base_url or "").rstrip("/")
if base.endswith("/v1"):
base = base[:-3]
env = {
"ANTHROPIC_BASE_URL": base,
"ANTHROPIC_AUTH_TOKEN": config.api_key or "",
"ANTHROPIC_API_KEY": "", # force CLI to use AUTH_TOKEN
}
else:
base = (config.base_url or "").rstrip("/")
if base.endswith("/v1"):
base = base[:-3]
env = {
"ANTHROPIC_BASE_URL": base,
"ANTHROPIC_AUTH_TOKEN": config.api_key or "",
"ANTHROPIC_API_KEY": "", # force CLI to use AUTH_TOKEN
}
# Inject broadcast headers so OpenRouter forwards traces to Langfuse.
def _safe(v: str) -> str:
return v.replace("\r", "").replace("\n", "").strip()[:128]
# Inject broadcast headers so OpenRouter forwards traces to Langfuse.
def _safe(v: str) -> str:
# Keep only printable ASCII (0x20-0x7e); strip control chars,
# null bytes, and non-ASCII to produce a valid HTTP header value
# (RFC 7230 §3.2.6).
return re.sub(r"[^\x20-\x7e]", "", v).strip()[:128]
parts = []
if session_id:
parts.append(f"x-session-id: {_safe(session_id)}")
if user_id:
parts.append(f"x-user-id: {_safe(user_id)}")
if parts:
env["ANTHROPIC_CUSTOM_HEADERS"] = "\n".join(parts)
parts = []
if session_id:
parts.append(f"x-session-id: {_safe(session_id)}")
if user_id:
parts.append(f"x-user-id: {_safe(user_id)}")
if parts:
env["ANTHROPIC_CUSTOM_HEADERS"] = "\n".join(parts)
# --- Common: workspace isolation + security hardening (all modes) ---
# Route subagent temp files into the per-session workspace so output
# files are accessible (fixes /tmp/claude-0/ permission errors in E2B).
if sdk_cwd:
env["CLAUDE_CODE_TMPDIR"] = sdk_cwd
# Harden multi-tenant deployment: prevent loading untrusted workspace
# .claude.md files, persisting prompt history, writing auto-memory,
# and sending non-essential telemetry traffic.
# These are undocumented CLI internals validated against
# claude-agent-sdk 0.1.45 — re-verify when upgrading the SDK.
env["CLAUDE_CODE_DISABLE_CLAUDE_MDS"] = "1"
env["CLAUDE_CODE_SKIP_PROMPT_HISTORY"] = "1"
env["CLAUDE_CODE_DISABLE_AUTO_MEMORY"] = "1"
env["CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC"] = "1"
return env
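The header sanitization in the OpenRouter branch can be exercised on its own. The sketch below lifts the same regular expression and 128-character cap into standalone functions; `safe_header_value` and `build_custom_headers` are illustrative names, not functions from `env.py`, and returning `None` for "no headers" is an assumption made to keep the example small.

```python
import re
from typing import Optional


def safe_header_value(value: str, max_len: int = 128) -> str:
    """Keep only printable ASCII (0x20-0x7e), trim whitespace, cap the length."""
    return re.sub(r"[^\x20-\x7e]", "", value).strip()[:max_len]


def build_custom_headers(
    session_id: Optional[str], user_id: Optional[str]
) -> Optional[str]:
    """Join the sanitized headers with newlines, one header per line."""
    parts = []
    if session_id:
        parts.append(f"x-session-id: {safe_header_value(session_id)}")
    if user_id:
        parts.append(f"x-user-id: {safe_header_value(user_id)}")
    return "\n".join(parts) if parts else None
```

Since the headers are newline-separated, removing `\r` and `\n` from each value is what stops a crafted ID from smuggling in an extra header line: `"abc\r\nx-evil: 1"` collapses to the single value `"abcx-evil: 1"`.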

View File

@@ -41,11 +41,9 @@ class TestBuildSdkEnvSubscription:
result = build_sdk_env()
assert result == {
"ANTHROPIC_API_KEY": "",
"ANTHROPIC_AUTH_TOKEN": "",
"ANTHROPIC_BASE_URL": "",
}
assert result["ANTHROPIC_API_KEY"] == ""
assert result["ANTHROPIC_AUTH_TOKEN"] == ""
assert result["ANTHROPIC_BASE_URL"] == ""
mock_validate.assert_called_once()
@patch(
@@ -68,18 +66,20 @@ class TestBuildSdkEnvSubscription:
class TestBuildSdkEnvDirectAnthropic:
"""When OpenRouter is inactive, return empty dict (inherit parent env)."""
"""When OpenRouter is inactive, no ANTHROPIC_* overrides (inherit parent env)."""
def test_returns_empty_dict_when_openrouter_inactive(self):
def test_no_anthropic_key_overrides_when_openrouter_inactive(self):
cfg = _make_config(use_openrouter=False)
with patch("backend.copilot.sdk.env.config", cfg):
from backend.copilot.sdk.env import build_sdk_env
result = build_sdk_env()
assert result == {}
assert "ANTHROPIC_API_KEY" not in result
assert "ANTHROPIC_AUTH_TOKEN" not in result
assert "ANTHROPIC_BASE_URL" not in result
def test_returns_empty_dict_when_openrouter_flag_true_but_no_key(self):
def test_no_anthropic_key_overrides_when_openrouter_flag_true_but_no_key(self):
"""OpenRouter flag is True but no api_key => openrouter_active is False."""
cfg = _make_config(use_openrouter=True, base_url="https://openrouter.ai/api/v1")
# Force api_key to None after construction (field_validator may pick up env vars)
@@ -90,7 +90,9 @@ class TestBuildSdkEnvDirectAnthropic:
result = build_sdk_env()
assert result == {}
assert "ANTHROPIC_API_KEY" not in result
assert "ANTHROPIC_AUTH_TOKEN" not in result
assert "ANTHROPIC_BASE_URL" not in result
# ---------------------------------------------------------------------------
@@ -234,12 +236,12 @@ class TestBuildSdkEnvModePriority:
result = build_sdk_env()
# Should get subscription result, not OpenRouter
assert result == {
"ANTHROPIC_API_KEY": "",
"ANTHROPIC_AUTH_TOKEN": "",
"ANTHROPIC_BASE_URL": "",
}
# Should get subscription result (blanked keys), not OpenRouter proxy
assert result["ANTHROPIC_API_KEY"] == ""
assert result["ANTHROPIC_AUTH_TOKEN"] == ""
assert result["ANTHROPIC_BASE_URL"] == ""
# OpenRouter-specific key must NOT be present
assert "ANTHROPIC_CUSTOM_HEADERS" not in result
# ---------------------------------------------------------------------------

View File

@@ -28,13 +28,12 @@ Each result includes a `remotes` array with the exact server URL to use.
### Important: Check blocks first
Before using `run_mcp_tool`, always check if the platform already has blocks for the service
using `find_block`. The platform has hundreds of built-in blocks (Google Sheets, Google Docs,
Google Calendar, Gmail, etc.) that work without MCP setup.
Always follow the **Tool Discovery Priority** described in the tool notes:
call `find_block` before resorting to `run_mcp_tool`.
Only use `run_mcp_tool` when:
- The service is in the known hosted MCP servers list above, OR
- You searched `find_block` first and found no matching blocks
- You searched `find_block` first and found no matching blocks, AND
- The service is in the known hosted MCP servers list above or found via the registry API
**Never guess or construct MCP server URLs.** Only use URLs from the known servers list above
or from the `remotes[].url` field in MCP registry search results.

View File

@@ -0,0 +1,535 @@
"""Tests for P0 guardrails: _resolve_fallback_model, security env vars, TMPDIR."""
from unittest.mock import patch
import pytest
from pydantic import ValidationError
from backend.copilot.config import ChatConfig
from backend.copilot.constants import is_transient_api_error
def _make_config(**overrides) -> ChatConfig:
"""Create a ChatConfig with safe defaults, applying *overrides*."""
defaults = {
"use_claude_code_subscription": False,
"use_openrouter": False,
"api_key": None,
"base_url": None,
}
defaults.update(overrides)
return ChatConfig(**defaults)
# ---------------------------------------------------------------------------
# _resolve_fallback_model
# ---------------------------------------------------------------------------
_SVC = "backend.copilot.sdk.service"
_ENV = "backend.copilot.sdk.env"
class TestResolveFallbackModel:
"""Provider-aware fallback model resolution."""
def test_returns_none_when_empty(self):
cfg = _make_config(claude_agent_fallback_model="")
with patch(f"{_SVC}.config", cfg):
from backend.copilot.sdk.service import _resolve_fallback_model
assert _resolve_fallback_model() is None
def test_strips_provider_prefix(self):
"""OpenRouter-style 'anthropic/claude-sonnet-4-...' is stripped."""
cfg = _make_config(
claude_agent_fallback_model="anthropic/claude-sonnet-4-20250514",
use_openrouter=True,
api_key="sk-test",
base_url="https://openrouter.ai/api/v1",
)
with patch(f"{_SVC}.config", cfg):
from backend.copilot.sdk.service import _resolve_fallback_model
result = _resolve_fallback_model()
assert result == "claude-sonnet-4-20250514"
assert "/" not in result
def test_dots_replaced_for_direct_anthropic(self):
"""Direct Anthropic requires hyphen-separated versions."""
cfg = _make_config(
claude_agent_fallback_model="claude-sonnet-4.5-20250514",
use_openrouter=False,
)
with patch(f"{_SVC}.config", cfg):
from backend.copilot.sdk.service import _resolve_fallback_model
result = _resolve_fallback_model()
assert result is not None
assert "." not in result
assert result == "claude-sonnet-4-5-20250514"
def test_dots_preserved_for_openrouter(self):
"""OpenRouter uses dot-separated versions — don't normalise."""
cfg = _make_config(
claude_agent_fallback_model="claude-sonnet-4.5-20250514",
use_openrouter=True,
api_key="sk-test",
base_url="https://openrouter.ai/api/v1",
)
with patch(f"{_SVC}.config", cfg):
from backend.copilot.sdk.service import _resolve_fallback_model
result = _resolve_fallback_model()
assert result == "claude-sonnet-4.5-20250514"
def test_default_value(self):
"""Default fallback model resolves to a valid string."""
cfg = _make_config()
with patch(f"{_SVC}.config", cfg):
from backend.copilot.sdk.service import _resolve_fallback_model
result = _resolve_fallback_model()
assert result is not None
assert "sonnet" in result.lower() or "claude" in result.lower()
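The behaviour these tests pin down can be summarised as a small pure function. This is a sketch inferred from the assertions above, not the body of `_resolve_fallback_model` (which is not shown in this diff and reads its inputs from `config` rather than taking arguments); the function name and both parameters are hypothetical.

```python
from typing import Optional


def resolve_fallback_model(raw: str, openrouter_active: bool) -> Optional[str]:
    """Normalise a configured fallback model name for the active provider."""
    model = raw.strip()
    if not model:
        # An empty setting disables the fallback entirely.
        return None
    # Drop a provider prefix such as "anthropic/".
    model = model.split("/", 1)[-1]
    if not openrouter_active:
        # Direct Anthropic expects hyphen-separated versions (4.5 -> 4-5).
        model = model.replace(".", "-")
    return model
```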
# ---------------------------------------------------------------------------
# Security & isolation env vars
# ---------------------------------------------------------------------------
_SECURITY_VARS = (
"CLAUDE_CODE_DISABLE_CLAUDE_MDS",
"CLAUDE_CODE_SKIP_PROMPT_HISTORY",
"CLAUDE_CODE_DISABLE_AUTO_MEMORY",
"CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC",
)
class TestSecurityEnvVars:
"""Verify security env vars are set in the returned dict for every auth mode.
Tests call ``build_sdk_env()`` directly and assert the vars are present
in the returned dict — not just present somewhere in the source file.
"""
def test_security_vars_set_in_openrouter_mode(self):
"""Mode 3 (OpenRouter): security vars must be in the returned env."""
cfg = _make_config(
use_claude_code_subscription=False,
use_openrouter=True,
api_key="sk-or-test",
base_url="https://openrouter.ai/api/v1",
)
with patch(f"{_ENV}.config", cfg):
from backend.copilot.sdk.env import build_sdk_env
env = build_sdk_env(session_id="s1", user_id="u1")
for var in _SECURITY_VARS:
assert env.get(var) == "1", f"{var} not set in OpenRouter mode"
def test_security_vars_set_in_direct_anthropic_mode(self):
"""Mode 2 (direct Anthropic): security vars must be in the returned env."""
cfg = _make_config(use_claude_code_subscription=False, use_openrouter=False)
with patch(f"{_ENV}.config", cfg):
from backend.copilot.sdk.env import build_sdk_env
env = build_sdk_env()
for var in _SECURITY_VARS:
assert env.get(var) == "1", f"{var} not set in direct Anthropic mode"
def test_security_vars_set_in_subscription_mode(self):
"""Mode 1 (subscription): security vars must be in the returned env."""
cfg = _make_config(use_claude_code_subscription=True)
with (
patch(f"{_ENV}.config", cfg),
patch(f"{_ENV}.validate_subscription"),
):
from backend.copilot.sdk.env import build_sdk_env
env = build_sdk_env(session_id="s1", user_id="u1")
for var in _SECURITY_VARS:
assert env.get(var) == "1", f"{var} not set in subscription mode"
def test_tmpdir_set_when_sdk_cwd_provided(self):
"""CLAUDE_CODE_TMPDIR must be set when sdk_cwd is provided."""
cfg = _make_config(use_openrouter=False)
with patch(f"{_ENV}.config", cfg):
from backend.copilot.sdk.env import build_sdk_env
env = build_sdk_env(sdk_cwd="/workspace/session-1")
assert env.get("CLAUDE_CODE_TMPDIR") == "/workspace/session-1"
def test_tmpdir_absent_when_sdk_cwd_not_provided(self):
"""CLAUDE_CODE_TMPDIR must NOT be set when sdk_cwd is None."""
cfg = _make_config(use_openrouter=False)
with patch(f"{_ENV}.config", cfg):
from backend.copilot.sdk.env import build_sdk_env
env = build_sdk_env()
assert "CLAUDE_CODE_TMPDIR" not in env
def test_home_not_overridden(self):
"""HOME must NOT be overridden — would break git/ssh/npm in subprocesses."""
cfg = _make_config(use_openrouter=False)
with patch(f"{_ENV}.config", cfg):
from backend.copilot.sdk.env import build_sdk_env
env = build_sdk_env()
assert "HOME" not in env
# ---------------------------------------------------------------------------
# Config defaults
# ---------------------------------------------------------------------------
class TestConfigDefaults:
"""Verify ChatConfig P0 fields have correct defaults."""
def test_fallback_model_default(self):
cfg = _make_config()
assert cfg.claude_agent_fallback_model
assert "sonnet" in cfg.claude_agent_fallback_model.lower()
def test_max_turns_default(self):
cfg = _make_config()
assert cfg.claude_agent_max_turns == 1000
def test_max_budget_usd_default(self):
cfg = _make_config()
assert cfg.claude_agent_max_budget_usd == 100.0
def test_max_transient_retries_default(self):
cfg = _make_config()
assert cfg.claude_agent_max_transient_retries == 3
# ---------------------------------------------------------------------------
# build_sdk_env — all 3 auth modes
# ---------------------------------------------------------------------------
class TestBuildSdkEnv:
"""Verify build_sdk_env returns correct dicts for each auth mode."""
def test_subscription_mode_clears_keys(self):
"""Mode 1: subscription clears API key / auth token / base URL."""
cfg = _make_config(use_claude_code_subscription=True)
with (
patch(f"{_ENV}.config", cfg),
patch(f"{_ENV}.validate_subscription"),
):
from backend.copilot.sdk.env import build_sdk_env
env = build_sdk_env(session_id="s1", user_id="u1")
assert env["ANTHROPIC_API_KEY"] == ""
assert env["ANTHROPIC_AUTH_TOKEN"] == ""
assert env["ANTHROPIC_BASE_URL"] == ""
def test_direct_anthropic_inherits_api_key(self):
"""Mode 2: direct Anthropic doesn't set ANTHROPIC_* keys (inherits from parent)."""
cfg = _make_config(
use_claude_code_subscription=False,
use_openrouter=False,
)
with patch(f"{_ENV}.config", cfg):
from backend.copilot.sdk.env import build_sdk_env
env = build_sdk_env()
assert "ANTHROPIC_API_KEY" not in env
assert "ANTHROPIC_AUTH_TOKEN" not in env
assert "ANTHROPIC_BASE_URL" not in env
def test_openrouter_sets_base_url_and_auth(self):
"""Mode 3: OpenRouter sets base URL, auth token, and clears API key."""
cfg = _make_config(
use_claude_code_subscription=False,
use_openrouter=True,
api_key="sk-or-test",
base_url="https://openrouter.ai/api/v1",
)
with patch(f"{_ENV}.config", cfg):
from backend.copilot.sdk.env import build_sdk_env
env = build_sdk_env(session_id="sess-1", user_id="user-1")
assert env["ANTHROPIC_BASE_URL"] == "https://openrouter.ai/api"
assert env["ANTHROPIC_AUTH_TOKEN"] == "sk-or-test"
assert env["ANTHROPIC_API_KEY"] == ""
assert "x-session-id: sess-1" in env["ANTHROPIC_CUSTOM_HEADERS"]
assert "x-user-id: user-1" in env["ANTHROPIC_CUSTOM_HEADERS"]
def test_openrouter_no_headers_when_ids_empty(self):
"""Mode 3: No custom headers when session_id/user_id are not given."""
cfg = _make_config(
use_claude_code_subscription=False,
use_openrouter=True,
api_key="sk-or-test",
base_url="https://openrouter.ai/api/v1",
)
with patch(f"{_ENV}.config", cfg):
from backend.copilot.sdk.env import build_sdk_env
env = build_sdk_env()
assert "ANTHROPIC_CUSTOM_HEADERS" not in env
def test_all_modes_return_mutable_dict(self):
"""build_sdk_env must return a mutable dict (not None) in every mode."""
for cfg in (
_make_config(use_claude_code_subscription=True),
_make_config(use_openrouter=False),
_make_config(
use_openrouter=True,
api_key="k",
base_url="https://openrouter.ai/api/v1",
),
):
with (
patch(f"{_ENV}.config", cfg),
patch(f"{_ENV}.validate_subscription"),
):
from backend.copilot.sdk.env import build_sdk_env
env = build_sdk_env()
assert isinstance(env, dict)
env["CLAUDE_CODE_TMPDIR"] = "/tmp/test"
assert env["CLAUDE_CODE_TMPDIR"] == "/tmp/test"
# ---------------------------------------------------------------------------
# is_transient_api_error
# ---------------------------------------------------------------------------
class TestIsTransientApiError:
"""Verify that is_transient_api_error detects all transient patterns."""
@pytest.mark.parametrize(
"error_text",
[
"socket connection was closed unexpectedly",
"ECONNRESET",
"connection was forcibly closed",
"network socket disconnected",
],
)
def test_connection_level_errors(self, error_text: str):
assert is_transient_api_error(error_text)
@pytest.mark.parametrize(
"error_text",
[
"rate limit exceeded",
"rate_limit_error",
"Too Many Requests",
"status code 429",
],
)
def test_429_rate_limit_errors(self, error_text: str):
assert is_transient_api_error(error_text)
@pytest.mark.parametrize(
"error_text",
[
# Status-code-specific patterns (preferred — no false-positive risk)
"status code 529",
"status code 500",
"status code 502",
"status code 503",
"status code 504",
],
)
def test_5xx_server_errors(self, error_text: str):
assert is_transient_api_error(error_text)
@pytest.mark.parametrize(
"error_text",
[
"invalid_api_key",
"Authentication failed",
"prompt is too long",
"model not found",
"",
# Natural-language phrases intentionally NOT matched — they are too
# broad and could appear in application-level SDK messages unrelated
# to Anthropic API transient conditions.
"API is overloaded",
"Internal Server Error",
"Bad Gateway",
"Service Unavailable",
"Gateway Timeout",
],
)
def test_non_transient_errors(self, error_text: str):
assert not is_transient_api_error(error_text)
def test_case_insensitive(self):
assert is_transient_api_error("SOCKET CONNECTION WAS CLOSED UNEXPECTEDLY")
assert is_transient_api_error("econnreset")
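One way to satisfy the parametrized cases above is a case-insensitive substring match against a fixed pattern list. The sketch below is an assumption about the approach, not the implementation in `backend.copilot.constants`: the pattern tuple is inferred from the test inputs, and `looks_transient` is a hypothetical name.

```python
# Patterns inferred from the test cases; the real list may differ.
_TRANSIENT_PATTERNS = (
    "socket connection was closed unexpectedly",
    "econnreset",
    "connection was forcibly closed",
    "network socket disconnected",
    "rate limit",
    "rate_limit",
    "too many requests",
    "status code 429",
    "status code 500",
    "status code 502",
    "status code 503",
    "status code 504",
    "status code 529",
)


def looks_transient(error_text: str) -> bool:
    """Return True when the error text contains a known transient marker."""
    lowered = error_text.lower()
    return any(pattern in lowered for pattern in _TRANSIENT_PATTERNS)
```

Matching on `status code 5xx` rather than on phrases like "Internal Server Error" keeps the check narrow, which is the trade-off the non-transient test cases document.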
# ---------------------------------------------------------------------------
# _HandledStreamError.already_yielded contract
# ---------------------------------------------------------------------------
class TestHandledStreamErrorAlreadyYielded:
"""Verify the already_yielded semantics on _HandledStreamError."""
def test_default_already_yielded_is_true(self):
"""Non-transient callers (circuit-breaker, idle timeout) don't pass the flag —
the default True means the outer loop won't yield a duplicate StreamError."""
from backend.copilot.sdk.service import _HandledStreamError
exc = _HandledStreamError("some error", code="circuit_breaker_empty_tool_calls")
assert exc.already_yielded is True
def test_transient_error_sets_already_yielded_false(self):
"""Transient errors pass already_yielded=False so the outer loop
yields StreamError only once (when retries are exhausted)."""
from backend.copilot.sdk.service import _HandledStreamError
exc = _HandledStreamError(
"transient",
code="transient_api_error",
already_yielded=False,
)
assert exc.already_yielded is False
def test_backoff_capped_at_30s(self):
"""Exponential backoff must be capped at 30 seconds.
With max_transient_retries=10, uncapped 2^9=512s would stall users
for 8+ minutes. min(30, 2**(n-1)) keeps the ceiling at 30s.
"""
# Check that 2^(10-1)=512 would exceed 30 but min() caps it.
assert min(30, 2 ** (10 - 1)) == 30
# Verify the formula is monotonically non-decreasing and capped.
backoffs = [min(30, 2 ** (n - 1)) for n in range(1, 11)]
assert all(b <= 30 for b in backoffs)
assert backoffs[-1] == 30 # last retry is capped
assert backoffs[0] == 1 # first retry starts at 1s
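The capped formula checked above, `min(30, 2 ** (n - 1))`, can be wrapped in a helper so that a test exercises a function rather than re-deriving the arithmetic. The commit message names `_next_transient_backoff`, but its signature is not shown here, so the name, parameters, and the `ValueError` guard below are assumptions.

```python
def next_transient_backoff(attempt: int, cap_seconds: int = 30) -> int:
    """Delay in seconds before retry number `attempt` (1-based), capped."""
    if attempt < 1:
        raise ValueError("attempt is 1-based")
    return min(cap_seconds, 2 ** (attempt - 1))
```

With the default cap the delays run 1, 2, 4, 8, 16 and then stay at 30 seconds from the sixth retry onward.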
# ---------------------------------------------------------------------------
# Config validators for max_turns / max_budget_usd
# ---------------------------------------------------------------------------
class TestConfigValidators:
"""Verify ge/le bounds on max_turns and max_budget_usd."""
def test_max_turns_rejects_zero(self):
with pytest.raises(ValidationError):
_make_config(claude_agent_max_turns=0)
def test_max_turns_rejects_negative(self):
with pytest.raises(ValidationError):
_make_config(claude_agent_max_turns=-1)
def test_max_turns_rejects_above_10000(self):
with pytest.raises(ValidationError):
_make_config(claude_agent_max_turns=10001)
def test_max_turns_accepts_boundary_values(self):
cfg_low = _make_config(claude_agent_max_turns=1)
assert cfg_low.claude_agent_max_turns == 1
cfg_high = _make_config(claude_agent_max_turns=10000)
assert cfg_high.claude_agent_max_turns == 10000
def test_max_budget_rejects_zero(self):
with pytest.raises(ValidationError):
_make_config(claude_agent_max_budget_usd=0.0)
def test_max_budget_rejects_negative(self):
with pytest.raises(ValidationError):
_make_config(claude_agent_max_budget_usd=-1.0)
def test_max_budget_rejects_above_1000(self):
with pytest.raises(ValidationError):
_make_config(claude_agent_max_budget_usd=1000.01)
def test_max_budget_accepts_boundary_values(self):
cfg_low = _make_config(claude_agent_max_budget_usd=0.01)
assert cfg_low.claude_agent_max_budget_usd == 0.01
cfg_high = _make_config(claude_agent_max_budget_usd=1000.0)
assert cfg_high.claude_agent_max_budget_usd == 1000.0
def test_max_transient_retries_rejects_negative(self):
with pytest.raises(ValidationError):
_make_config(claude_agent_max_transient_retries=-1)
def test_max_transient_retries_rejects_above_10(self):
with pytest.raises(ValidationError):
_make_config(claude_agent_max_transient_retries=11)
def test_max_transient_retries_accepts_boundary_values(self):
cfg_low = _make_config(claude_agent_max_transient_retries=0)
assert cfg_low.claude_agent_max_transient_retries == 0
cfg_high = _make_config(claude_agent_max_transient_retries=10)
assert cfg_high.claude_agent_max_transient_retries == 10
# ---------------------------------------------------------------------------
# transient_exhausted SSE code contract
# ---------------------------------------------------------------------------
class TestTransientExhaustedErrorCode:
"""Verify transient-exhausted path emits the correct SSE error code."""
def test_transient_exhausted_uses_transient_api_error_code(self):
"""When except-Exception transient retries are exhausted, the SSE
StreamError must use code='transient_api_error', not 'sdk_stream_error'.
This ensures the frontend shows the same 'Try again' affordance as
the _HandledStreamError path.
"""
from backend.copilot.constants import FRIENDLY_TRANSIENT_MSG
# Simulate the post-loop branching logic extracted from service.py
attempts_exhausted = False
transient_exhausted = True
stream_err: Exception | None = ConnectionResetError("ECONNRESET")
if attempts_exhausted:
error_code = "all_attempts_exhausted"
error_text = "conversation too long"
elif transient_exhausted:
error_code = "transient_api_error"
error_text = FRIENDLY_TRANSIENT_MSG
else:
error_code = "sdk_stream_error"
error_text = f"SDK stream error: {stream_err}"
assert error_code == "transient_api_error"
assert error_text == FRIENDLY_TRANSIENT_MSG
def test_non_transient_exhausted_uses_sdk_stream_error_code(self):
"""Non-transient fatal errors (auth, network) keep 'sdk_stream_error'."""
attempts_exhausted = False
transient_exhausted = False
if attempts_exhausted:
error_code = "all_attempts_exhausted"
elif transient_exhausted:
error_code = "transient_api_error"
else:
error_code = "sdk_stream_error"
assert error_code == "sdk_stream_error"
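Review note: the two tests above re-implement the post-loop branching inline, so they would keep passing even if the code in service.py drifted. One way to make the contract directly testable is to extract the branching into a small pure helper. The following is a minimal sketch under that assumption; the name `select_stream_error` and the placeholder message value are hypothetical and not part of this change set, and the final branch mirrors the text used in the test rather than `_friendly_error_text`.

```python
from __future__ import annotations

# Stand-in for backend.copilot.constants.FRIENDLY_TRANSIENT_MSG (value is an assumption).
FRIENDLY_TRANSIENT_MSG = "placeholder transient message"


def select_stream_error(
    attempts_exhausted: bool,
    transient_exhausted: bool,
    stream_err: Exception | None,
) -> tuple[str, str]:
    """Return (error_code, error_text) for the post-loop StreamError.

    Hypothetical helper: same precedence as the branching in the tests above.
    """
    if attempts_exhausted:
        # Context compaction ran out of room.
        return (
            "all_attempts_exhausted",
            "Your conversation is too long. "
            "Please start a new chat or clear some history.",
        )
    if transient_exhausted:
        # Transient retry budget used up: same code as the handled path.
        return ("transient_api_error", FRIENDLY_TRANSIENT_MSG)
    # Non-context, non-transient fatal error.
    return ("sdk_stream_error", f"SDK stream error: {stream_err}")
```

With a helper of this shape, both the service and the tests could call the same function instead of duplicating the if/elif chain.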

View File

@@ -8,20 +8,19 @@ from uuid import uuid4
import pytest
from backend.util import json
from backend.util.prompt import CompressResult
from .conftest import build_test_transcript as _build_transcript
from .service import _friendly_error_text, _is_prompt_too_long
from .transcript import (
from backend.copilot.transcript import (
_flatten_assistant_content,
_flatten_tool_result_content,
_messages_to_transcript,
_run_compression,
_transcript_to_messages,
compact_transcript,
validate_transcript,
)
from backend.util import json
from backend.util.prompt import CompressResult
from .conftest import build_test_transcript as _build_transcript
from .service import _friendly_error_text, _is_prompt_too_long
from .transcript import compact_transcript, validate_transcript
# ---------------------------------------------------------------------------
# _flatten_assistant_content
@@ -403,7 +402,7 @@ class TestCompactTranscript:
},
)()
with patch(
"backend.copilot.sdk.transcript._run_compression",
"backend.copilot.transcript._run_compression",
new_callable=AsyncMock,
return_value=mock_result,
):
@@ -438,7 +437,7 @@ class TestCompactTranscript:
},
)()
with patch(
"backend.copilot.sdk.transcript._run_compression",
"backend.copilot.transcript._run_compression",
new_callable=AsyncMock,
return_value=mock_result,
):
@@ -462,7 +461,7 @@ class TestCompactTranscript:
]
)
with patch(
"backend.copilot.sdk.transcript._run_compression",
"backend.copilot.transcript._run_compression",
new_callable=AsyncMock,
side_effect=RuntimeError("LLM unavailable"),
):
@@ -568,11 +567,11 @@ class TestRunCompressionTimeout:
with (
patch(
"backend.copilot.sdk.transcript.get_openai_client",
"backend.copilot.transcript.get_openai_client",
return_value="fake-client",
),
patch(
"backend.copilot.sdk.transcript.compress_context",
"backend.copilot.transcript.compress_context",
side_effect=_mock_compress,
),
):
@@ -602,11 +601,11 @@ class TestRunCompressionTimeout:
with (
patch(
"backend.copilot.sdk.transcript.get_openai_client",
"backend.copilot.transcript.get_openai_client",
return_value=None,
),
patch(
"backend.copilot.sdk.transcript.compress_context",
"backend.copilot.transcript.compress_context",
new_callable=AsyncMock,
return_value=truncation_result,
) as mock_compress,

View File

@@ -260,13 +260,13 @@ def test_result_error_emits_error_and_finish():
is_error=True,
num_turns=0,
session_id="s1",
result="API rate limited",
result="Invalid API key provided",
)
results = adapter.convert_message(msg)
# No step was open, so no FinishStep — just Error + Finish
assert len(results) == 2
assert isinstance(results[0], StreamError)
assert "API rate limited" in results[0].errorText
assert "Invalid API key provided" in results[0].errorText
assert isinstance(results[1], StreamFinish)

View File

@@ -26,18 +26,17 @@ from unittest.mock import AsyncMock, MagicMock, patch
import pytest
from backend.util import json
from .conftest import build_test_transcript as _build_transcript
from .service import _MAX_STREAM_ATTEMPTS, _reduce_context
from .transcript import (
from backend.copilot.transcript import (
_flatten_assistant_content,
_flatten_tool_result_content,
_messages_to_transcript,
_transcript_to_messages,
compact_transcript,
validate_transcript,
)
from backend.util import json
from .conftest import build_test_transcript as _build_transcript
from .service import _MAX_STREAM_ATTEMPTS, _reduce_context
from .transcript import compact_transcript, validate_transcript
from .transcript_builder import TranscriptBuilder
# ---------------------------------------------------------------------------
@@ -113,7 +112,7 @@ class TestScenarioCompactAndRetry:
)(),
),
patch(
"backend.copilot.sdk.transcript._run_compression",
"backend.copilot.transcript._run_compression",
new_callable=AsyncMock,
return_value=mock_result,
),
@@ -170,7 +169,7 @@ class TestScenarioCompactFailsFallback:
)(),
),
patch(
"backend.copilot.sdk.transcript._run_compression",
"backend.copilot.transcript._run_compression",
new_callable=AsyncMock,
side_effect=RuntimeError("LLM unavailable"),
),
@@ -261,7 +260,7 @@ class TestScenarioDoubleFailDBFallback:
)(),
),
patch(
"backend.copilot.sdk.transcript._run_compression",
"backend.copilot.transcript._run_compression",
new_callable=AsyncMock,
return_value=mock_result,
),
@@ -337,7 +336,7 @@ class TestScenarioCompactionIdentical:
)(),
),
patch(
"backend.copilot.sdk.transcript._run_compression",
"backend.copilot.transcript._run_compression",
new_callable=AsyncMock,
return_value=mock_result,
),
@@ -730,7 +729,7 @@ class TestRetryEdgeCases:
)(),
),
patch(
"backend.copilot.sdk.transcript._run_compression",
"backend.copilot.transcript._run_compression",
new_callable=AsyncMock,
return_value=mock_result,
),
@@ -841,7 +840,7 @@ class TestRetryStateReset:
)(),
),
patch(
"backend.copilot.sdk.transcript._run_compression",
"backend.copilot.transcript._run_compression",
new_callable=AsyncMock,
side_effect=RuntimeError("boom"),
),
@@ -1405,9 +1404,9 @@ class TestStreamChatCompletionRetryIntegration:
events.append(event)
# Should NOT retry — only 1 attempt for auth errors
assert attempt_count[0] == 1, (
f"Expected 1 attempt (no retry for auth error), " f"got {attempt_count[0]}"
)
assert (
attempt_count[0] == 1
), f"Expected 1 attempt (no retry for auth error), got {attempt_count[0]}"
errors = [e for e in events if isinstance(e, StreamError)]
assert errors, "Expected StreamError"
assert errors[0].code == "sdk_stream_error"

View File

@@ -105,6 +105,10 @@ def test_agent_options_accepts_all_our_fields():
"env",
"resume",
"max_buffer_size",
"stderr",
"fallback_model",
"max_turns",
"max_budget_usd",
]
sig = inspect.signature(ClaudeAgentOptions)
for field in fields_we_use:

View File

@@ -33,12 +33,24 @@ from pydantic import BaseModel
from backend.copilot.context import get_workspace_manager
from backend.copilot.permissions import apply_tool_permissions
from backend.copilot.rate_limit import get_user_tier
from backend.copilot.transcript import (
_run_compression,
cleanup_stale_project_dirs,
compact_transcript,
download_transcript,
read_compacted_entries,
upload_transcript,
validate_transcript,
write_transcript_to_tempfile,
)
from backend.copilot.transcript_builder import TranscriptBuilder
from backend.data.redis_client import get_redis_async
from backend.executor.cluster_lock import AsyncClusterLock
from backend.util.exceptions import NotFoundError
from backend.util.settings import Settings
from ..config import ChatConfig
from ..config import ChatConfig, CopilotMode
from ..constants import (
COPILOT_ERROR_PREFIX,
COPILOT_RETRYABLE_ERROR_PREFIX,
@@ -51,6 +63,7 @@ from ..model import (
ChatMessage,
ChatSession,
get_chat_session,
maybe_append_user_message,
update_session_title,
upsert_chat_session,
)
@@ -92,17 +105,6 @@ from .tool_adapter import (
set_execution_context,
wait_for_stash,
)
from .transcript import (
_run_compression,
cleanup_stale_project_dirs,
compact_transcript,
download_transcript,
read_compacted_entries,
upload_transcript,
validate_transcript,
write_transcript_to_tempfile,
)
from .transcript_builder import TranscriptBuilder
logger = logging.getLogger(__name__)
config = ChatConfig()
@@ -129,6 +131,11 @@ _CIRCUIT_BREAKER_ERROR_MSG = (
"Try breaking your request into smaller parts."
)
# Idle timeout: abort the stream if no meaningful SDK message (only heartbeats)
# arrives for this many seconds. This catches hung tool calls (e.g. WebSearch
# hanging on a search provider that never responds).
_IDLE_TIMEOUT_SECONDS = 10 * 60 # 10 minutes
# Patterns that indicate the prompt/request exceeds the model's context limit.
# Matched case-insensitively against the full exception chain.
_PROMPT_TOO_LONG_PATTERNS: tuple[str, ...] = (
@@ -538,17 +545,34 @@ async def _iter_sdk_messages(
pass
def _normalize_model_name(raw_model: str) -> str:
"""Normalize a model name for the current routing configuration.
Applies two transformations shared by both the primary and fallback
model resolution paths:
1. **Strip provider prefix** — OpenRouter-style names like
``"anthropic/claude-opus-4.6"`` are reduced to ``"claude-opus-4.6"``.
2. **Dot-to-hyphen conversion** — when *not* routing through OpenRouter
the direct Anthropic API requires hyphen-separated versions
(``"claude-opus-4-6"``), so dots are replaced with hyphens.
"""
model = raw_model
if "/" in model:
model = model.split("/", 1)[1]
# OpenRouter uses dots in versions (claude-opus-4.6) but the direct
# Anthropic API requires hyphens (claude-opus-4-6). Only normalise
# when NOT routing through OpenRouter.
if not config.openrouter_active:
model = model.replace(".", "-")
return model
def _resolve_sdk_model() -> str | None:
"""Resolve the model name for the Claude Agent SDK CLI.
Uses `config.claude_agent_model` if set, otherwise derives from
`config.model` by stripping the OpenRouter provider prefix (e.g.,
`"anthropic/claude-opus-4.6"` → `"claude-opus-4-6"`).
OpenRouter uses dot-separated versions (`claude-opus-4.6`) while the
direct Anthropic API uses hyphen-separated versions (`claude-opus-4-6`).
Normalisation is only applied when the SDK will actually talk to
Anthropic directly (not through OpenRouter).
`config.model` via :func:`_normalize_model_name`.
When `use_claude_code_subscription` is enabled and no explicit
`claude_agent_model` is set, returns `None` so the CLI uses the
@@ -558,15 +582,18 @@ def _resolve_sdk_model() -> str | None:
return config.claude_agent_model
if config.use_claude_code_subscription:
return None
model = config.model
if "/" in model:
model = model.split("/", 1)[1]
# OpenRouter uses dots in versions (claude-opus-4.6) but the direct
# Anthropic API requires hyphens (claude-opus-4-6). Only normalise
# when NOT routing through OpenRouter.
if not config.openrouter_active:
model = model.replace(".", "-")
return model
return _normalize_model_name(config.model)
def _resolve_fallback_model() -> str | None:
"""Resolve the fallback model name via :func:`_normalize_model_name`.
Returns ``None`` when no fallback is configured (empty string).
"""
raw = config.claude_agent_fallback_model
if not raw:
return None
return _normalize_model_name(raw)
def _make_sdk_cwd(session_id: str) -> str:
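Review note: the normalization rules in the hunk above can be exercised in isolation. This is a minimal sketch, assuming the routing flag is passed in explicitly instead of being read from the module-level `config`; the function name and the `openrouter_active` parameter are illustrative only.

```python
def normalize_model_name(raw_model: str, openrouter_active: bool) -> str:
    """Illustrative copy of the two transformations described above."""
    model = raw_model
    # 1. Strip an OpenRouter-style provider prefix ("anthropic/...").
    if "/" in model:
        model = model.split("/", 1)[1]
    # 2. Direct Anthropic routing expects hyphen-separated versions.
    if not openrouter_active:
        model = model.replace(".", "-")
    return model


# Same inputs as the TestNormalizeModelName cases in this change set.
direct = normalize_model_name("anthropic/claude-opus-4.6", openrouter_active=False)
routed = normalize_model_name("anthropic/claude-opus-4.6", openrouter_active=True)
plain = normalize_model_name("claude-sonnet-4-20250514", openrouter_active=False)
```

Passing the flag as a parameter keeps the sketch free of global state; the real function reads `config.openrouter_active`, which is why the tests monkeypatch `config`.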
@@ -1056,17 +1083,25 @@ def _dispatch_response(
class _HandledStreamError(Exception):
"""Raised by `_run_stream_attempt` after it has already yielded a
`StreamError` to the client (e.g. transient API error, circuit breaker).
"""Raised by `_run_stream_attempt` when an attempt fails and the outer
retry loop must roll back session state.
This signals the outer retry loop that the attempt failed so it can
perform session-message rollback and set the `ended_with_stream_error`
flag, **without** yielding a duplicate `StreamError` to the client.
Two sub-cases:
* ``already_yielded=True`` (default) — a ``StreamError`` was already sent
to the client inside ``_run_stream_attempt`` (circuit-breaker, idle
timeout, etc.). The outer loop must **not** yield another one.
* ``already_yielded=False`` — the error is transient and the outer loop
will decide whether to retry or surface the error. If retrying it
yields a ``StreamStatus("retrying…")``; if exhausted it yields the
``StreamError`` itself so the client sees it only once.
Attributes:
error_msg: The user-facing error message to persist.
code: Machine-readable error code (e.g. ``circuit_breaker_empty_tool_calls``).
retryable: Whether the frontend should offer a retry button.
already_yielded: ``True`` when ``StreamError`` was already sent to the
client before this exception was raised.
"""
def __init__(
@@ -1075,11 +1110,13 @@ class _HandledStreamError(Exception):
error_msg: str | None = None,
code: str | None = None,
retryable: bool = True,
already_yielded: bool = True,
):
super().__init__(message)
self.error_msg = error_msg
self.code = code
self.retryable = retryable
self.already_yielded = already_yielded
@dataclass
@@ -1271,6 +1308,8 @@ async def _run_stream_attempt(
await client.query(state.query_message, session_id=ctx.session_id)
state.transcript_builder.append_user(content=ctx.current_message)
_last_real_msg_time = time.monotonic()
async for sdk_msg in _iter_sdk_messages(client):
# Heartbeat sentinel — refresh lock and keep SSE alive
if sdk_msg is None:
@@ -1278,8 +1317,34 @@ async def _run_stream_attempt(
for ev in ctx.compaction.emit_start_if_ready():
yield ev
yield StreamHeartbeat()
# Idle timeout: if no real SDK message for too long, a tool
# call is likely hung (e.g. WebSearch provider not responding).
idle_seconds = time.monotonic() - _last_real_msg_time
if idle_seconds >= _IDLE_TIMEOUT_SECONDS:
logger.error(
"%s Idle timeout after %.0fs with no SDK message — "
"aborting stream (likely hung tool call)",
ctx.log_prefix,
idle_seconds,
)
stream_error_msg = (
"A tool call appears to be stuck "
"(no response for 10 minutes). "
"Please try again."
)
stream_error_code = "idle_timeout"
_append_error_marker(ctx.session, stream_error_msg, retryable=True)
yield StreamError(
errorText=stream_error_msg,
code=stream_error_code,
)
ended_with_stream_error = True
break
continue
_last_real_msg_time = time.monotonic()
logger.info(
"%s Received: %s %s (unresolved=%d, current=%d, resolved=%d)",
ctx.log_prefix,
@@ -1342,15 +1407,12 @@ async def _run_stream_attempt(
)
stream_error_msg = FRIENDLY_TRANSIENT_MSG
stream_error_code = "transient_api_error"
_append_error_marker(
ctx.session,
stream_error_msg,
retryable=True,
)
yield StreamError(
errorText=stream_error_msg,
code=stream_error_code,
)
# Do NOT yield StreamError or append error marker here.
# The outer retry loop decides: if a retry is available it
# yields StreamStatus("retrying…"); if retries are exhausted
# it appends the marker and yields StreamError exactly once.
# Yielding StreamError before the retry decision causes the
# client to display an error that is immediately superseded.
ended_with_stream_error = True
break
@@ -1528,9 +1590,21 @@ async def _run_stream_attempt(
# --- Intermediate persistence ---
# Flush session messages to DB periodically so page reloads
# show progress during long-running turns.
#
# IMPORTANT: Skip the flush while tool calls are pending
# (tool_calls set on assistant but results not yet received).
# The DB save is append-only (uses start_sequence), so if we
# flush the assistant message before tool_calls are set on it
# (text and tool_use arrive as separate SDK events), the
# tool_calls update is lost — the next flush starts past it.
_msgs_since_flush += 1
now = time.monotonic()
if (
has_pending_tools = (
acc.has_appended_assistant
and acc.accumulated_tool_calls
and not acc.has_tool_results
)
if not has_pending_tools and (
_msgs_since_flush >= _FLUSH_MESSAGE_THRESHOLD
or (now - _last_flush_time) >= _FLUSH_INTERVAL_SECONDS
):
@@ -1611,14 +1685,16 @@ async def _run_stream_attempt(
) and not acc.has_appended_assistant:
ctx.session.messages.append(acc.assistant_response)
# If the attempt ended with a transient error that was already surfaced
# to the client (StreamError yielded above), raise so the outer retry
# loop can rollback session messages and set its error flags properly.
# Raise so the outer retry loop can rollback session messages.
# already_yielded=False for transient_api_error: StreamError was NOT
# sent to the client yet (the outer loop does it when retries are
# exhausted, avoiding a premature error flash before the retry).
if ended_with_stream_error:
raise _HandledStreamError(
"Stream error handled — StreamError already yielded",
"Stream error handled",
error_msg=stream_error_msg,
code=stream_error_code,
already_yielded=(stream_error_code != "transient_api_error"),
)
@@ -1630,6 +1706,7 @@ async def stream_chat_completion_sdk(
session: ChatSession | None = None,
file_ids: list[str] | None = None,
permissions: "CopilotPermissions | None" = None,
mode: CopilotMode | None = None,
**_kwargs: Any,
) -> AsyncIterator[StreamBaseResponse]:
"""Stream chat completion using Claude Agent SDK.
@@ -1638,7 +1715,10 @@ async def stream_chat_completion_sdk(
file_ids: Optional workspace file IDs attached to the user's message.
Images are embedded as vision content blocks; other files are
saved to the SDK working directory for the Read tool.
mode: Accepted for signature compatibility with the baseline path.
The SDK path does not currently branch on this value.
"""
_ = mode # SDK path ignores the requested mode.
if session is None:
session = await get_chat_session(session_id, user_id)
@@ -1669,19 +1749,12 @@ async def stream_chat_completion_sdk(
)
session.messages.pop()
# Append the new message to the session if it's not already there
new_message_role = "user" if is_user_message else "assistant"
if message and (
len(session.messages) == 0
or not (
session.messages[-1].role == new_message_role
and session.messages[-1].content == message
)
):
session.messages.append(ChatMessage(role=new_message_role, content=message))
if maybe_append_user_message(session, message, is_user_message):
if is_user_message:
track_user_message(
user_id=user_id, session_id=session_id, message_length=len(message)
user_id=user_id,
session_id=session_id,
message_length=len(message or ""),
)
# Structured log prefix: [SDK][<session>][T<turn>]
@@ -1916,10 +1989,29 @@ async def stream_chat_completion_sdk(
allowed = get_copilot_tool_names(use_e2b=use_e2b)
disallowed = get_sdk_disallowed_tools(use_e2b=use_e2b)
# Flag set by _on_stderr when the SDK logs that it switched to the
# fallback model (e.g. on a 529 overloaded error). Checked once per
# heartbeat cycle and emitted as a StreamStatus notification.
fallback_model_activated = False
def _on_stderr(line: str) -> None:
"""Log a stderr line emitted by the Claude CLI subprocess."""
nonlocal fallback_model_activated
sid = session_id[:12] if session_id else "?"
logger.info("[SDK] [%s] CLI stderr: %s", sid, line.rstrip())
# Detect SDK fallback-model activation. The CLI logs a
# message containing "fallback model" when it switches models
# after a 529/overloaded error. Match "fallback model" rather
# than just "fallback" to avoid false positives from unrelated
# stderr lines (e.g. tool-level retries, cached result fallbacks).
lower = line.lower()
if not fallback_model_activated and "fallback model" in lower:
fallback_model_activated = True
logger.warning(
"[SDK] [%s] Fallback model activated — primary model "
"overloaded, switching to fallback",
sid,
)
sdk_options_kwargs: dict[str, Any] = {
"system_prompt": system_prompt,
@@ -1930,6 +2022,15 @@ async def stream_chat_completion_sdk(
"cwd": sdk_cwd,
"max_buffer_size": config.claude_agent_max_buffer_size,
"stderr": _on_stderr,
# --- P0 guardrails ---
# fallback_model: SDK auto-retries with this cheaper model on
# 529 (overloaded) errors, avoiding user-visible failures.
"fallback_model": _resolve_fallback_model(),
# max_turns: hard cap on agentic tool-use loops per query to
# prevent runaway execution from burning budget.
"max_turns": config.claude_agent_max_turns,
# max_budget_usd: per-query spend ceiling enforced by the CLI.
"max_budget_usd": config.claude_agent_max_budget_usd,
}
if sdk_model:
sdk_options_kwargs["model"] = sdk_model
@@ -1946,15 +2047,20 @@ async def stream_chat_completion_sdk(
# langsmith tracing integration attaches them to every span. This
# is what Langfuse (or any OTEL backend) maps to its native
# user/session fields.
_user_tier = await get_user_tier(user_id) if user_id else None
_otel_metadata: dict[str, str] = {
"resume": str(use_resume),
"conversation_turn": str(turn),
}
if _user_tier:
_otel_metadata["subscription_tier"] = _user_tier.value
_otel_ctx = propagate_attributes(
user_id=user_id,
session_id=session_id,
trace_name="copilot-sdk",
tags=["sdk"],
metadata={
"resume": str(use_resume),
"conversation_turn": str(turn),
},
metadata=_otel_metadata,
)
_otel_ctx.__enter__()
@@ -2009,8 +2115,29 @@ async def stream_chat_completion_sdk(
# ---------------------------------------------------------------
ended_with_stream_error = False
attempts_exhausted = False
transient_exhausted = False
stream_err: Exception | None = None
# Transient retry helper — deduplicates the logic shared between
# _HandledStreamError and the generic except-Exception handler.
transient_retries = 0
max_transient_retries = config.claude_agent_max_transient_retries
def _next_transient_backoff() -> int | None:
"""Return the next backoff delay in seconds, or ``None`` to surface the error.
Returns the backoff seconds if a retry should be attempted,
or ``None`` if retries are exhausted or events were already
yielded. Mutates outer ``transient_retries`` via nonlocal.
"""
nonlocal transient_retries
if events_yielded > 0:
return None
transient_retries += 1
if transient_retries > max_transient_retries:
return None
return min(30, 2 ** (transient_retries - 1)) # 1s, 2s, 4s, …, cap 30s
state = _RetryState(
options=options,
query_message=query_message,
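Review note: the effect of the 30-second cap is easiest to see by listing the whole schedule. The sketch below reproduces only the delay formula from `_next_transient_backoff`; the helper name `backoff_schedule` is hypothetical and the `events_yielded` guard is deliberately left out.

```python
from __future__ import annotations


def backoff_schedule(max_retries: int, cap_seconds: int = 30) -> list[int]:
    """Delays for retries 1..max_retries using min(cap, 2 ** (n - 1))."""
    return [min(cap_seconds, 2 ** (n - 1)) for n in range(1, max_retries + 1)]


# With the maximum allowed setting of 10 retries the delays are
# 1, 2, 4, 8, 16 and then 30 for every remaining retry.
schedule = backoff_schedule(10)
total_wait = sum(schedule)  # 181 seconds in total
uncapped_total = sum(2 ** (n - 1) for n in range(1, 11))  # 1023 seconds
```

The cap changes the worst case per context-level attempt from roughly 17 minutes of sleeping to about 3 minutes.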
@@ -2023,7 +2150,19 @@ async def stream_chat_completion_sdk(
usage=_TokenUsage(),
)
for attempt in range(_MAX_STREAM_ATTEMPTS):
attempt = 0
_last_reset_attempt = -1
while attempt < _MAX_STREAM_ATTEMPTS:
# Reset transient retry counter per context-level attempt so
# each attempt (original, compacted, no-transcript) gets the
# full retry budget for transient errors.
# Only reset when the attempt number actually changes —
# transient retries `continue` back to the loop top without
# incrementing `attempt`, so resetting unconditionally would
# create an infinite retry loop.
if attempt != _last_reset_attempt:
transient_retries = 0
_last_reset_attempt = attempt
# Clear any stale stash signal from the previous attempt so
# wait_for_stash() doesn't fire prematurely on a leftover event.
reset_stash_event()
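Review note: the interaction between the context-level `attempt` counter and the per-attempt transient budget is subtle, so here is a simplified, synchronous sketch of the control flow. It replays a scripted list of outcomes instead of calling the SDK; the constants, the outcome strings, and the function name `replay` are all assumptions made for illustration.

```python
from __future__ import annotations

MAX_STREAM_ATTEMPTS = 3    # assumed value for the sketch
MAX_TRANSIENT_RETRIES = 2  # assumed value for the sketch


def replay(outcomes: list[str]) -> list[tuple[int, int]]:
    """Replay 'transient' / 'context' / 'ok' outcomes through the loop.

    Returns (attempt, transient_retries) as seen at the start of each try.
    """
    log: list[tuple[int, int]] = []
    script = iter(outcomes)
    attempt = 0
    last_reset_attempt = -1
    transient_retries = 0
    while attempt < MAX_STREAM_ATTEMPTS:
        # Reset the budget only when the context-level attempt changed.
        if attempt != last_reset_attempt:
            transient_retries = 0
            last_reset_attempt = attempt
        log.append((attempt, transient_retries))
        outcome = next(script, "ok")
        if outcome == "ok":
            break
        if outcome == "transient":
            transient_retries += 1
            if transient_retries > MAX_TRANSIENT_RETRIES:
                break  # budget exhausted: surface the error
            continue  # retry the same context-level attempt
        attempt += 1  # context error: move to the next attempt
    return log
```

Because the reset is guarded by `last_reset_attempt`, a stream that fails transiently every time still terminates after the budget is spent, while a context error grants the next attempt a fresh budget.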
@@ -2078,7 +2217,15 @@ async def stream_chat_completion_sdk(
state.usage.reset()
pre_attempt_msg_count = len(session.messages)
# Snapshot transcript builder state — it maintains an
# independent _entries list from session.messages, so rolling
# back session.messages alone would leave duplicate entries
# from the failed attempt in the uploaded transcript.
pre_transcript_entries = list(state.transcript_builder._entries)
pre_transcript_uuid = state.transcript_builder._last_uuid
events_yielded = 0
fallback_model_activated = False
fallback_notified = False
try:
async for event in _run_stream_attempt(stream_ctx, state):
@@ -2094,9 +2241,24 @@ async def stream_chat_completion_sdk(
StreamToolInputStart,
StreamToolInputAvailable,
StreamToolOutputAvailable,
# Transient StreamError and StreamStatus are
# ephemeral notifications, not content. Counting
# them would prevent the backoff retry from firing
# because _next_transient_backoff() returns None
# when events_yielded > 0.
StreamError,
StreamStatus,
),
):
events_yielded += 1
# Emit a one-time StreamStatus when the SDK switches
# to the fallback model (detected via stderr).
if fallback_model_activated and not fallback_notified:
fallback_notified = True
yield StreamStatus(
message="Primary model overloaded — "
"using fallback model for this request"
)
yield event
break # Stream completed — exit retry loop
except asyncio.CancelledError:
@@ -2113,6 +2275,31 @@ async def stream_chat_completion_sdk(
# session messages and set the error flag — do NOT set
# stream_err so the post-loop code won't emit a
# duplicate StreamError.
session.messages = session.messages[:pre_attempt_msg_count]
state.transcript_builder._entries = pre_transcript_entries
state.transcript_builder._last_uuid = pre_transcript_uuid
# Check if this is a transient error we can retry with backoff.
# exc.code is the only reliable signal: str(exc) is always the
# static "Stream error handled" message.
if exc.code == "transient_api_error":
backoff = _next_transient_backoff()
if backoff is not None:
logger.warning(
"%s Transient error — retrying in %ds (%d/%d)",
log_prefix,
backoff,
transient_retries,
max_transient_retries,
)
yield StreamStatus(
message=f"Connection interrupted, retrying in {backoff}s…"
)
await asyncio.sleep(backoff)
state.adapter = SDKResponseAdapter(
message_id=message_id, session_id=session_id
)
state.usage.reset()
continue # retry the same context-level attempt
logger.warning(
"%s Stream error handled in attempt "
"(attempt %d/%d, code=%s, events_yielded=%d)",
@@ -2122,7 +2309,6 @@ async def stream_chat_completion_sdk(
exc.code or "transient",
events_yielded,
)
session.messages = session.messages[:pre_attempt_msg_count]
# transcript_builder still contains entries from the aborted
# attempt that no longer match session.messages. Skip upload
# so a future --resume doesn't replay rolled-back content.
@@ -2137,22 +2323,37 @@ async def stream_chat_completion_sdk(
retryable=True,
)
ended_with_stream_error = True
# For transient errors the StreamError was deliberately NOT
# yielded inside _run_stream_attempt (already_yielded=False)
# so the client didn't see a premature error flash. Yield it
# now that we know retries are exhausted.
# For non-transient errors (circuit breaker, idle timeout)
# already_yielded=True — do NOT yield again.
if not exc.already_yielded:
yield StreamError(
errorText=exc.error_msg or FRIENDLY_TRANSIENT_MSG,
code=exc.code or "transient_api_error",
)
break
except Exception as e:
stream_err = e
is_context_error = _is_prompt_too_long(e)
is_transient = is_transient_api_error(str(e))
logger.warning(
"%s Stream error (attempt %d/%d, context_error=%s, "
"events_yielded=%d): %s",
"transient=%s, events_yielded=%d): %s",
log_prefix,
attempt + 1,
_MAX_STREAM_ATTEMPTS,
is_context_error,
is_transient,
events_yielded,
stream_err,
exc_info=True,
)
session.messages = session.messages[:pre_attempt_msg_count]
state.transcript_builder._entries = pre_transcript_entries
state.transcript_builder._last_uuid = pre_transcript_uuid
if events_yielded > 0:
# Events were already sent to the frontend and cannot be
# unsent. Retrying would produce duplicate/inconsistent
@@ -2165,16 +2366,50 @@ async def stream_chat_completion_sdk(
skip_transcript_upload = True
ended_with_stream_error = True
break
# Transient API errors (ECONNRESET, 429, 5xx) — retry
# with exponential backoff via the shared helper.
if is_transient:
backoff = _next_transient_backoff()
if backoff is not None:
logger.warning(
"%s Transient exception — retrying in %ds (%d/%d)",
log_prefix,
backoff,
transient_retries,
max_transient_retries,
)
yield StreamStatus(
message=f"Connection interrupted, retrying in {backoff}s…"
)
await asyncio.sleep(backoff)
state.adapter = SDKResponseAdapter(
message_id=message_id, session_id=session_id
)
state.usage.reset()
continue # retry same context-level attempt
# Retries exhausted — persist retryable marker so the
# frontend shows "Try again" after refresh.
# Mirrors the _HandledStreamError exhausted-retry path
# at line ~2310.
transient_exhausted = True
skip_transcript_upload = True
_append_error_marker(
session, FRIENDLY_TRANSIENT_MSG, retryable=True
)
ended_with_stream_error = True
break
if not is_context_error:
# Non-context errors (network, auth, rate-limit) should
# not trigger compaction — surface the error immediately.
# Non-context, non-transient errors (auth, fatal)
# should not trigger compaction — surface immediately.
skip_transcript_upload = True
ended_with_stream_error = True
break
attempt += 1 # advance to next context-level attempt
continue
else:
# All retry attempts exhausted (loop ended without break)
# skip_transcript_upload is already set by _reduce_context
# while condition became False — all attempts exhausted without
# break. skip_transcript_upload is already set by _reduce_context
# when the transcript was dropped (transcript_lost=True).
ended_with_stream_error = True
attempts_exhausted = True
@@ -2203,25 +2438,24 @@ async def stream_chat_completion_sdk(
yield response
if ended_with_stream_error and stream_err is not None:
# Use distinct error codes: "all_attempts_exhausted" when all
# retries were consumed vs "sdk_stream_error" for non-context
# errors that broke the loop immediately (network, auth, etc.).
# Use distinct error codes depending on how the loop ended:
# • "all_attempts_exhausted" — context compaction ran out of room
# • "transient_api_error" — 429/5xx/ECONNRESET retries exhausted
# • "sdk_stream_error" — non-context, non-transient fatal error
safe_err = str(stream_err).replace("\n", " ").replace("\r", "")[:500]
if attempts_exhausted:
error_text = (
"Your conversation is too long. "
"Please start a new chat or clear some history."
)
error_code = "all_attempts_exhausted"
elif transient_exhausted:
error_text = FRIENDLY_TRANSIENT_MSG
error_code = "transient_api_error"
else:
error_text = _friendly_error_text(safe_err)
yield StreamError(
errorText=error_text,
code=(
"all_attempts_exhausted"
if attempts_exhausted
else "sdk_stream_error"
),
)
error_code = "sdk_stream_error"
yield StreamError(errorText=error_text, code=error_code)
# Copy token usage from retry state to outer-scope accumulators
# so the finally block can persist them.

View File

@@ -10,6 +10,7 @@ import pytest
from .service import (
_is_sdk_disconnect_error,
_normalize_model_name,
_prepare_file_attachments,
_resolve_sdk_model,
_safe_close_sdk_client,
@@ -405,6 +406,49 @@ def _clean_config_env(monkeypatch: pytest.MonkeyPatch) -> None:
monkeypatch.delenv(var, raising=False)
class TestNormalizeModelName:
"""Tests for _normalize_model_name — shared provider-aware normalization."""
def test_strips_provider_prefix(self, monkeypatch, _clean_config_env):
from backend.copilot import config as cfg_mod
cfg = cfg_mod.ChatConfig(
use_openrouter=False,
api_key=None,
base_url=None,
use_claude_code_subscription=False,
)
monkeypatch.setattr("backend.copilot.sdk.service.config", cfg)
assert _normalize_model_name("anthropic/claude-opus-4.6") == "claude-opus-4-6"
def test_dots_preserved_for_openrouter(self, monkeypatch, _clean_config_env):
from backend.copilot import config as cfg_mod
cfg = cfg_mod.ChatConfig(
use_openrouter=True,
api_key="or-key",
base_url="https://openrouter.ai/api/v1",
use_claude_code_subscription=False,
)
monkeypatch.setattr("backend.copilot.sdk.service.config", cfg)
assert _normalize_model_name("anthropic/claude-opus-4.6") == "claude-opus-4.6"
def test_no_prefix_no_dots(self, monkeypatch, _clean_config_env):
from backend.copilot import config as cfg_mod
cfg = cfg_mod.ChatConfig(
use_openrouter=False,
api_key=None,
base_url=None,
use_claude_code_subscription=False,
)
monkeypatch.setattr("backend.copilot.sdk.service.config", cfg)
assert (
_normalize_model_name("claude-sonnet-4-20250514")
== "claude-sonnet-4-20250514"
)
class TestResolveSdkModel:
"""Tests for _resolve_sdk_model — model ID resolution for the SDK CLI."""

View File

@@ -27,20 +27,19 @@ from backend.copilot.response_model import (
StreamTextDelta,
StreamTextStart,
)
from backend.util import json
from .conftest import build_structured_transcript
from .response_adapter import SDKResponseAdapter
from .service import _format_sdk_content_blocks
from .transcript import (
from backend.copilot.transcript import (
_find_last_assistant_entry,
_flatten_assistant_content,
_messages_to_transcript,
_rechain_tail,
_transcript_to_messages,
compact_transcript,
validate_transcript,
)
from backend.util import json
from .conftest import build_structured_transcript
from .response_adapter import SDKResponseAdapter
from .service import _format_sdk_content_blocks
from .transcript import compact_transcript, validate_transcript
# ---------------------------------------------------------------------------
# Fixtures: realistic thinking block content
@@ -439,7 +438,7 @@ class TestCompactTranscriptThinkingBlocks:
},
)()
with patch(
"backend.copilot.sdk.transcript._run_compression",
"backend.copilot.transcript._run_compression",
new_callable=AsyncMock,
return_value=mock_result,
):
@@ -498,7 +497,7 @@ class TestCompactTranscriptThinkingBlocks:
)()
with patch(
"backend.copilot.sdk.transcript._run_compression",
"backend.copilot.transcript._run_compression",
side_effect=mock_compression,
):
await compact_transcript(transcript, model="test-model")
@@ -551,7 +550,7 @@ class TestCompactTranscriptThinkingBlocks:
},
)()
with patch(
"backend.copilot.sdk.transcript._run_compression",
"backend.copilot.transcript._run_compression",
new_callable=AsyncMock,
return_value=mock_result,
):
@@ -601,7 +600,7 @@ class TestCompactTranscriptThinkingBlocks:
},
)()
with patch(
"backend.copilot.sdk.transcript._run_compression",
"backend.copilot.transcript._run_compression",
new_callable=AsyncMock,
return_value=mock_result,
):
@@ -638,7 +637,7 @@ class TestCompactTranscriptThinkingBlocks:
},
)()
with patch(
"backend.copilot.sdk.transcript._run_compression",
"backend.copilot.transcript._run_compression",
new_callable=AsyncMock,
return_value=mock_result,
):
@@ -699,7 +698,7 @@ class TestCompactTranscriptThinkingBlocks:
},
)()
with patch(
"backend.copilot.sdk.transcript._run_compression",
"backend.copilot.transcript._run_compression",
new_callable=AsyncMock,
return_value=mock_result,
):

File diff suppressed because it is too large


@@ -1,235 +1,10 @@
"""Build complete JSONL transcript from SDK messages.
"""Re-export from shared ``backend.copilot.transcript_builder`` for backward compat.
The transcript represents the FULL active context at any point in time.
Each upload REPLACES the previous transcript atomically.
Flow:
Turn 1: Upload [msg1, msg2]
Turn 2: Download [msg1, msg2] → Upload [msg1, msg2, msg3, msg4] (REPLACE)
Turn 3: Download [msg1, msg2, msg3, msg4] → Upload [all messages] (REPLACE)
The transcript is never incremental - always the complete atomic state.
The canonical implementation now lives at ``backend.copilot.transcript_builder``
so both the SDK and baseline paths can import without cross-package
dependencies.
"""
import logging
from typing import Any
from uuid import uuid4
from backend.copilot.transcript_builder import TranscriptBuilder, TranscriptEntry
from pydantic import BaseModel
from backend.util import json
from .transcript import STRIPPABLE_TYPES
logger = logging.getLogger(__name__)
class TranscriptEntry(BaseModel):
"""Single transcript entry (user or assistant turn)."""
type: str
uuid: str
parentUuid: str | None
isCompactSummary: bool | None = None
message: dict[str, Any]
class TranscriptBuilder:
"""Build complete JSONL transcript from SDK messages.
This builder maintains the FULL conversation state, not incremental changes.
The output is always the complete active context.
"""
def __init__(self) -> None:
self._entries: list[TranscriptEntry] = []
self._last_uuid: str | None = None
def _last_is_assistant(self) -> bool:
return bool(self._entries) and self._entries[-1].type == "assistant"
def _last_message_id(self) -> str:
"""Return the message.id of the last entry, or '' if none."""
if self._entries:
return self._entries[-1].message.get("id", "")
return ""
@staticmethod
def _parse_entry(data: dict) -> TranscriptEntry | None:
"""Parse a single transcript entry, filtering strippable types.
Returns ``None`` for entries that should be skipped (strippable types
that are not compaction summaries).
"""
entry_type = data.get("type", "")
if entry_type in STRIPPABLE_TYPES and not data.get("isCompactSummary"):
return None
return TranscriptEntry(
type=entry_type,
uuid=data.get("uuid") or str(uuid4()),
parentUuid=data.get("parentUuid"),
isCompactSummary=data.get("isCompactSummary"),
message=data.get("message", {}),
)
def load_previous(self, content: str, log_prefix: str = "[Transcript]") -> None:
"""Load complete previous transcript.
This loads the FULL previous context. As new messages come in,
we append to this state. The final output is the complete context
(previous + new), not just the delta.
"""
if not content or not content.strip():
return
lines = content.strip().split("\n")
for line_num, line in enumerate(lines, 1):
if not line.strip():
continue
data = json.loads(line, fallback=None)
if data is None:
logger.warning(
"%s Failed to parse transcript line %d/%d",
log_prefix,
line_num,
len(lines),
)
continue
entry = self._parse_entry(data)
if entry is None:
continue
self._entries.append(entry)
self._last_uuid = entry.uuid
logger.info(
"%s Loaded %d entries from previous transcript (last_uuid=%s)",
log_prefix,
len(self._entries),
self._last_uuid[:12] if self._last_uuid else None,
)
def append_user(self, content: str | list[dict], uuid: str | None = None) -> None:
"""Append a user entry."""
msg_uuid = uuid or str(uuid4())
self._entries.append(
TranscriptEntry(
type="user",
uuid=msg_uuid,
parentUuid=self._last_uuid,
message={"role": "user", "content": content},
)
)
self._last_uuid = msg_uuid
def append_tool_result(self, tool_use_id: str, content: str) -> None:
"""Append a tool result as a user entry (one per tool call)."""
self.append_user(
content=[
{"type": "tool_result", "tool_use_id": tool_use_id, "content": content}
]
)
def append_assistant(
self,
content_blocks: list[dict],
model: str = "",
stop_reason: str | None = None,
) -> None:
"""Append an assistant entry.
Consecutive assistant entries automatically share the same message ID
so the CLI can merge them (thinking → text → tool_use) into a single
API message on ``--resume``. A new ID is assigned whenever an
assistant entry follows a non-assistant entry (user message or tool
result), because that marks the start of a new API response.
"""
message_id = (
self._last_message_id()
if self._last_is_assistant()
else f"msg_sdk_{uuid4().hex[:24]}"
)
msg_uuid = str(uuid4())
self._entries.append(
TranscriptEntry(
type="assistant",
uuid=msg_uuid,
parentUuid=self._last_uuid,
message={
"role": "assistant",
"model": model,
"id": message_id,
"type": "message",
"content": content_blocks,
"stop_reason": stop_reason,
"stop_sequence": None,
},
)
)
self._last_uuid = msg_uuid
def replace_entries(
self, compacted_entries: list[dict], log_prefix: str = "[Transcript]"
) -> None:
"""Replace all entries with compacted entries from the CLI session file.
Called after mid-stream compaction so TranscriptBuilder mirrors the
CLI's active context (compaction summary + post-compaction entries).
Builds the new list first and validates it's non-empty before swapping,
so corrupt input cannot wipe the conversation history.
"""
new_entries: list[TranscriptEntry] = []
for data in compacted_entries:
entry = self._parse_entry(data)
if entry is not None:
new_entries.append(entry)
if not new_entries:
logger.warning(
"%s replace_entries produced 0 entries from %d inputs, keeping old (%d entries)",
log_prefix,
len(compacted_entries),
len(self._entries),
)
return
old_count = len(self._entries)
self._entries = new_entries
self._last_uuid = new_entries[-1].uuid
logger.info(
"%s TranscriptBuilder compacted: %d entries -> %d entries",
log_prefix,
old_count,
len(self._entries),
)
def to_jsonl(self) -> str:
"""Export complete context as JSONL.
Consecutive assistant entries are kept separate to match the
native CLI format — the SDK merges them internally on resume.
Returns the FULL conversation state (all entries), not incremental.
This output REPLACES any previous transcript.
"""
if not self._entries:
return ""
lines = [entry.model_dump_json(exclude_none=True) for entry in self._entries]
return "\n".join(lines) + "\n"
@property
def entry_count(self) -> int:
"""Total number of entries in the complete context."""
return len(self._entries)
@property
def is_empty(self) -> bool:
"""Whether this builder has any entries."""
return len(self._entries) == 0
__all__ = ["TranscriptBuilder", "TranscriptEntry"]


@@ -303,7 +303,7 @@ class TestDeleteTranscript:
mock_storage.delete = AsyncMock()
with patch(
"backend.copilot.sdk.transcript.get_workspace_storage",
"backend.copilot.transcript.get_workspace_storage",
new_callable=AsyncMock,
return_value=mock_storage,
):
@@ -323,7 +323,7 @@ class TestDeleteTranscript:
)
with patch(
"backend.copilot.sdk.transcript.get_workspace_storage",
"backend.copilot.transcript.get_workspace_storage",
new_callable=AsyncMock,
return_value=mock_storage,
):
@@ -341,7 +341,7 @@ class TestDeleteTranscript:
)
with patch(
"backend.copilot.sdk.transcript.get_workspace_storage",
"backend.copilot.transcript.get_workspace_storage",
new_callable=AsyncMock,
return_value=mock_storage,
):
@@ -850,7 +850,7 @@ class TestRunCompression:
@pytest.mark.asyncio
async def test_no_client_uses_truncation(self):
"""Path (a): ``get_openai_client()`` returns None → truncation only."""
from .transcript import _run_compression
from backend.copilot.transcript import _run_compression
truncation_result = self._make_compress_result(
True, [{"role": "user", "content": "truncated"}]
@@ -858,11 +858,11 @@ class TestRunCompression:
with (
patch(
"backend.copilot.sdk.transcript.get_openai_client",
"backend.copilot.transcript.get_openai_client",
return_value=None,
),
patch(
"backend.copilot.sdk.transcript.compress_context",
"backend.copilot.transcript.compress_context",
new_callable=AsyncMock,
return_value=truncation_result,
) as mock_compress,
@@ -885,7 +885,7 @@ class TestRunCompression:
@pytest.mark.asyncio
async def test_llm_success_returns_llm_result(self):
"""Path (b): ``get_openai_client()`` returns a client → LLM compresses."""
from .transcript import _run_compression
from backend.copilot.transcript import _run_compression
llm_result = self._make_compress_result(
True, [{"role": "user", "content": "LLM summary"}]
@@ -894,11 +894,11 @@ class TestRunCompression:
with (
patch(
"backend.copilot.sdk.transcript.get_openai_client",
"backend.copilot.transcript.get_openai_client",
return_value=mock_client,
),
patch(
"backend.copilot.sdk.transcript.compress_context",
"backend.copilot.transcript.compress_context",
new_callable=AsyncMock,
return_value=llm_result,
) as mock_compress,
@@ -916,7 +916,7 @@ class TestRunCompression:
@pytest.mark.asyncio
async def test_llm_failure_falls_back_to_truncation(self):
"""Path (c): LLM call raises → truncation fallback used instead."""
from .transcript import _run_compression
from backend.copilot.transcript import _run_compression
truncation_result = self._make_compress_result(
True, [{"role": "user", "content": "truncated fallback"}]
@@ -932,11 +932,11 @@ class TestRunCompression:
with (
patch(
"backend.copilot.sdk.transcript.get_openai_client",
"backend.copilot.transcript.get_openai_client",
return_value=mock_client,
),
patch(
"backend.copilot.sdk.transcript.compress_context",
"backend.copilot.transcript.compress_context",
side_effect=_compress_side_effect,
),
):
@@ -953,7 +953,7 @@ class TestRunCompression:
@pytest.mark.asyncio
async def test_llm_timeout_falls_back_to_truncation(self):
"""Path (d): LLM call exceeds timeout → truncation fallback used."""
from .transcript import _run_compression
from backend.copilot.transcript import _run_compression
truncation_result = self._make_compress_result(
True, [{"role": "user", "content": "truncated after timeout"}]
@@ -970,19 +970,19 @@ class TestRunCompression:
fake_client = MagicMock()
with (
patch(
"backend.copilot.sdk.transcript.get_openai_client",
"backend.copilot.transcript.get_openai_client",
return_value=fake_client,
),
patch(
"backend.copilot.sdk.transcript.compress_context",
"backend.copilot.transcript.compress_context",
side_effect=_compress_side_effect,
),
patch(
"backend.copilot.sdk.transcript._COMPACTION_TIMEOUT_SECONDS",
"backend.copilot.transcript._COMPACTION_TIMEOUT_SECONDS",
0.05,
),
patch(
"backend.copilot.sdk.transcript._TRUNCATION_TIMEOUT_SECONDS",
"backend.copilot.transcript._TRUNCATION_TIMEOUT_SECONDS",
5,
),
):
@@ -1007,7 +1007,7 @@ class TestCleanupStaleProjectDirs:
def test_removes_old_copilot_dirs(self, tmp_path, monkeypatch):
"""Directories matching copilot pattern older than threshold are removed."""
from backend.copilot.sdk.transcript import (
from backend.copilot.transcript import (
_STALE_PROJECT_DIR_SECONDS,
cleanup_stale_project_dirs,
)
@@ -1015,7 +1015,7 @@ class TestCleanupStaleProjectDirs:
projects_dir = tmp_path / "projects"
projects_dir.mkdir()
monkeypatch.setattr(
"backend.copilot.sdk.transcript._projects_base",
"backend.copilot.transcript._projects_base",
lambda: str(projects_dir),
)
@@ -1039,12 +1039,12 @@ class TestCleanupStaleProjectDirs:
def test_ignores_non_copilot_dirs(self, tmp_path, monkeypatch):
"""Directories not matching copilot pattern are left alone."""
from backend.copilot.sdk.transcript import cleanup_stale_project_dirs
from backend.copilot.transcript import cleanup_stale_project_dirs
projects_dir = tmp_path / "projects"
projects_dir.mkdir()
monkeypatch.setattr(
"backend.copilot.sdk.transcript._projects_base",
"backend.copilot.transcript._projects_base",
lambda: str(projects_dir),
)
@@ -1062,7 +1062,7 @@ class TestCleanupStaleProjectDirs:
def test_ttl_boundary_not_removed(self, tmp_path, monkeypatch):
"""A directory exactly at the TTL boundary should NOT be removed."""
from backend.copilot.sdk.transcript import (
from backend.copilot.transcript import (
_STALE_PROJECT_DIR_SECONDS,
cleanup_stale_project_dirs,
)
@@ -1070,7 +1070,7 @@ class TestCleanupStaleProjectDirs:
projects_dir = tmp_path / "projects"
projects_dir.mkdir()
monkeypatch.setattr(
"backend.copilot.sdk.transcript._projects_base",
"backend.copilot.transcript._projects_base",
lambda: str(projects_dir),
)
@@ -1088,7 +1088,7 @@ class TestCleanupStaleProjectDirs:
def test_skips_non_directory_entries(self, tmp_path, monkeypatch):
"""Regular files matching the copilot pattern are not removed."""
from backend.copilot.sdk.transcript import (
from backend.copilot.transcript import (
_STALE_PROJECT_DIR_SECONDS,
cleanup_stale_project_dirs,
)
@@ -1096,7 +1096,7 @@ class TestCleanupStaleProjectDirs:
projects_dir = tmp_path / "projects"
projects_dir.mkdir()
monkeypatch.setattr(
"backend.copilot.sdk.transcript._projects_base",
"backend.copilot.transcript._projects_base",
lambda: str(projects_dir),
)
@@ -1114,11 +1114,11 @@ class TestCleanupStaleProjectDirs:
def test_missing_base_dir_returns_zero(self, tmp_path, monkeypatch):
"""If the projects base directory doesn't exist, return 0 gracefully."""
from backend.copilot.sdk.transcript import cleanup_stale_project_dirs
from backend.copilot.transcript import cleanup_stale_project_dirs
nonexistent = str(tmp_path / "does-not-exist" / "projects")
monkeypatch.setattr(
"backend.copilot.sdk.transcript._projects_base",
"backend.copilot.transcript._projects_base",
lambda: nonexistent,
)
@@ -1129,7 +1129,7 @@ class TestCleanupStaleProjectDirs:
"""When encoded_cwd is supplied only that directory is swept."""
import time
from backend.copilot.sdk.transcript import (
from backend.copilot.transcript import (
_STALE_PROJECT_DIR_SECONDS,
cleanup_stale_project_dirs,
)
@@ -1137,7 +1137,7 @@ class TestCleanupStaleProjectDirs:
projects_dir = tmp_path / "projects"
projects_dir.mkdir()
monkeypatch.setattr(
"backend.copilot.sdk.transcript._projects_base",
"backend.copilot.transcript._projects_base",
lambda: str(projects_dir),
)
@@ -1160,12 +1160,12 @@ class TestCleanupStaleProjectDirs:
def test_scoped_fresh_dir_not_removed(self, tmp_path, monkeypatch):
"""Scoped sweep leaves a fresh directory alone."""
from backend.copilot.sdk.transcript import cleanup_stale_project_dirs
from backend.copilot.transcript import cleanup_stale_project_dirs
projects_dir = tmp_path / "projects"
projects_dir.mkdir()
monkeypatch.setattr(
"backend.copilot.sdk.transcript._projects_base",
"backend.copilot.transcript._projects_base",
lambda: str(projects_dir),
)
@@ -1181,7 +1181,7 @@ class TestCleanupStaleProjectDirs:
"""Scoped sweep refuses to remove a non-copilot directory."""
import time
from backend.copilot.sdk.transcript import (
from backend.copilot.transcript import (
_STALE_PROJECT_DIR_SECONDS,
cleanup_stale_project_dirs,
)
@@ -1189,7 +1189,7 @@ class TestCleanupStaleProjectDirs:
projects_dir = tmp_path / "projects"
projects_dir.mkdir()
monkeypatch.setattr(
"backend.copilot.sdk.transcript._projects_base",
"backend.copilot.transcript._projects_base",
lambda: str(projects_dir),
)
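Review note: every `patch(...)` target in this file moves from `backend.copilot.sdk.transcript` to `backend.copilot.transcript`. A likely reason, assuming the old module became a re-export shim like `sdk/transcript_builder.py`, is that `unittest.mock.patch` must replace a name in the module where it is looked up at call time. The sketch below demonstrates that rule with two throwaway modules; `demo_canonical` and `demo_shim` are invented names, not part of this codebase.

```python
import sys
import types
from unittest.mock import patch

# Canonical module: run() looks up get_client in its own globals.
canonical = types.ModuleType("demo_canonical")
exec(
    "def get_client():\n"
    "    return 'real'\n"
    "\n"
    "def run():\n"
    "    return get_client()\n",
    canonical.__dict__,
)
sys.modules["demo_canonical"] = canonical

# Shim module: only re-exports the canonical names.
shim = types.ModuleType("demo_shim")
shim.get_client = canonical.get_client
shim.run = canonical.run
sys.modules["demo_shim"] = shim


def run_with_patch(target: str) -> str:
    """Call run() through the shim while `target` is patched."""
    with patch(target, return_value="mock"):
        return shim.run()
```

Patching `demo_shim.get_client` leaves `run()` untouched, because `run()` still resolves `get_client` in `demo_canonical`; only patching `demo_canonical.get_client` takes effect.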


@@ -7,7 +7,7 @@ import pytest
from .model import create_chat_session, get_chat_session, upsert_chat_session
from .response_model import StreamError, StreamTextDelta
from .sdk import service as sdk_service
from .sdk.transcript import download_transcript
from .transcript import download_transcript
logger = logging.getLogger(__name__)


@@ -33,12 +33,23 @@ _GET_CURRENT_DATE_BLOCK_ID = "b29c1b50-5d0e-4d9f-8f9d-1b0e6fcbf0b1"
_GMAIL_SEND_BLOCK_ID = "6c27abc2-e51d-499e-a85f-5a0041ba94f0"
_TEXT_REPLACE_BLOCK_ID = "7e7c87ab-3469-4bcc-9abe-67705091b713"
# Default OrchestratorBlock model/mode — kept in sync with ChatConfig.model.
# ChatConfig uses the OpenRouter format ("anthropic/claude-opus-4.6");
# OrchestratorBlock uses the native Anthropic model name.
ORCHESTRATOR_DEFAULT_MODEL = "claude-opus-4-6"
ORCHESTRATOR_DEFAULT_EXECUTION_MODE = "extended_thinking"
# Defaults applied to OrchestratorBlock nodes by the fixer.
_SDM_DEFAULTS: dict[str, int | bool] = {
# execution_mode and model match the copilot's default (extended thinking
# with Opus) so generated agents inherit the same reasoning capabilities.
# If the user explicitly sets these fields, the fixer won't override them.
_SDM_DEFAULTS: dict[str, int | bool | str] = {
"agent_mode_max_iterations": 10,
"conversation_compaction": True,
"retry": 3,
"multiple_tool_calls": False,
"execution_mode": ORCHESTRATOR_DEFAULT_EXECUTION_MODE,
"model": ORCHESTRATOR_DEFAULT_MODEL,
}
@@ -879,6 +890,12 @@ class AgentFixer:
)
if is_ai_block:
# Skip AI blocks that don't expose a "model" input property
# (some AI-category blocks have no model selector at all).
input_properties = block.get("inputSchema", {}).get("properties", {})
if "model" not in input_properties:
continue
node_id = node.get("id")
input_default = node.get("input_default", {})
current_model = input_default.get("model")
@@ -887,9 +904,7 @@ class AgentFixer:
# Blocks with a block-specific enum on the model field (e.g.
# PerplexityBlock) use their own enum values; others use the
# generic set.
model_schema = (
block.get("inputSchema", {}).get("properties", {}).get("model", {})
)
model_schema = input_properties.get("model", {})
block_model_enum = model_schema.get("enum")
if block_model_enum:
@@ -1649,6 +1664,8 @@ class AgentFixer:
2. ``conversation_compaction`` defaults to ``True``
3. ``retry`` defaults to ``3``
4. ``multiple_tool_calls`` defaults to ``False``
5. ``execution_mode`` defaults to ``"extended_thinking"``
6. ``model`` defaults to ``"claude-opus-4-6"``
Args:
agent: The agent dictionary to fix
@@ -1748,6 +1765,12 @@ class AgentFixer:
agent = self.fix_node_x_coordinates(agent, node_lookup=node_lookup)
agent = self.fix_getcurrentdate_offset(agent)
# Apply OrchestratorBlock defaults BEFORE fix_ai_model_parameter so that
# the orchestrator-specific model (claude-opus-4-6) is set first and
# fix_ai_model_parameter sees it as a valid allowed model instead of
# overwriting it with the generic default (gpt-4o).
agent = self.fix_orchestrator_blocks(agent)
# Apply fixes that require blocks information
if blocks:
agent = self.fix_invalid_nested_sink_links(
@@ -1765,9 +1788,6 @@ class AgentFixer:
# Apply fixes for MCPToolBlock nodes
agent = self.fix_mcp_tool_blocks(agent)
# Apply fixes for OrchestratorBlock nodes (agent-mode defaults)
agent = self.fix_orchestrator_blocks(agent)
# Apply fixes for AgentExecutorBlock nodes (sub-agents)
if library_agents:
agent = self.fix_agent_executor_blocks(agent, library_agents)
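Review note: the comment in this hunk explains why `fix_orchestrator_blocks` now runs before `fix_ai_model_parameter`. The sketch below illustrates the ordering effect under simplifying assumptions: defaults are applied only to missing keys, and the generic model fixer falls back to `gpt-4o` when the model is absent or not allowed. Both helper functions and the `ALLOWED_MODELS` set are hypothetical; only the default values are copied from the diff.

```python
from typing import Any

# Values copied from _SDM_DEFAULTS in the diff above.
ORCHESTRATOR_DEFAULTS: dict[str, Any] = {
    "agent_mode_max_iterations": 10,
    "conversation_compaction": True,
    "retry": 3,
    "multiple_tool_calls": False,
    "execution_mode": "extended_thinking",
    "model": "claude-opus-4-6",
}
GENERIC_DEFAULT_MODEL = "gpt-4o"
ALLOWED_MODELS = {"gpt-4o", "claude-opus-4-6"}  # assumption for the sketch


def apply_orchestrator_defaults(input_default: dict[str, Any]) -> dict[str, Any]:
    """Fill in missing keys only; explicit user values are kept."""
    merged = dict(input_default)
    for key, value in ORCHESTRATOR_DEFAULTS.items():
        merged.setdefault(key, value)
    return merged


def fix_ai_model(input_default: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical generic fixer: replace a missing or unknown model."""
    fixed = dict(input_default)
    if fixed.get("model") not in ALLOWED_MODELS:
        fixed["model"] = GENERIC_DEFAULT_MODEL
    return fixed


new_order = fix_ai_model(apply_orchestrator_defaults({}))
old_order = apply_orchestrator_defaults(fix_ai_model({}))
```

With the new order the node keeps `claude-opus-4-6`; with the old order the generic fixer has already written `gpt-4o`, so the orchestrator default never applies.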


@@ -580,6 +580,29 @@ class TestFixAiModelParameter:
assert result["nodes"][0]["input_default"]["model"] == "perplexity/sonar"
def test_ai_block_without_model_property_is_skipped(self):
"""AI-category blocks that have no 'model' input property should not
have a model injected — they simply don't expose a model selector."""
fixer = AgentFixer()
block_id = generate_uuid()
node = _make_node(node_id="n1", block_id=block_id, input_default={})
agent = _make_agent(nodes=[node])
blocks = [
{
"id": block_id,
"name": "SomeAIBlock",
"categories": [{"category": "AI"}],
"inputSchema": {
"properties": {"prompt": {"type": "string"}},
},
}
]
result = fixer.fix_ai_model_parameter(agent, blocks)
assert "model" not in result["nodes"][0]["input_default"]
class TestFixAgentExecutorBlocks:
"""Tests for fix_agent_executor_blocks."""


@@ -42,7 +42,10 @@ class GetAgentBuildingGuideTool(BaseTool):
@property
def description(self) -> str:
return "Get the agent JSON building guide (nodes, links, AgentExecutorBlock, MCPToolBlock usage). Call before generating agent JSON."
return (
"Get the agent JSON building guide (nodes, links, AgentExecutorBlock, MCPToolBlock usage, "
"and the create->dry-run->fix iterative workflow). Call before generating agent JSON."
)
@property
def parameters(self) -> dict[str, Any]:


@@ -0,0 +1,15 @@
"""Tests for GetAgentBuildingGuideTool."""

from backend.copilot.tools.get_agent_building_guide import _load_guide


def test_load_guide_returns_string():
    guide = _load_guide()
    assert isinstance(guide, str)
    assert len(guide) > 100


def test_load_guide_caches():
    guide1 = _load_guide()
    guide2 = _load_guide()
    assert guide1 is guide2
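Review note: `test_load_guide_caches` asserts object identity (`is`), which only holds if `_load_guide` returns a cached object instead of rebuilding the string. How the real function caches is not visible in this diff; the sketch below shows one common way to get that behaviour with `functools.lru_cache`. The name `load_guide`, the `calls` list, and the guide text are placeholders.

```python
from functools import lru_cache

calls: list[int] = []  # records how often the loader body actually runs


@lru_cache(maxsize=1)
def load_guide() -> str:
    """Hypothetical loader: builds the guide once, then serves the cache."""
    calls.append(1)
    # Stand-in for reading the guide file from disk.
    return "".join(["# Agent building guide\n", "nodes, links, ...\n"])


first = load_guide()
second = load_guide()
```

The second call returns the very same string object without running the body again, which is what an identity assertion checks.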


@@ -48,27 +48,41 @@ logger = logging.getLogger(__name__)
def get_inputs_from_schema(
input_schema: dict[str, Any],
exclude_fields: set[str] | None = None,
input_data: dict[str, Any] | None = None,
) -> list[dict[str, Any]]:
"""Extract input field info from JSON schema."""
"""Extract input field info from JSON schema.
When *input_data* is provided, each field's ``value`` key is populated
with the value the CoPilot already supplied — so the frontend can
prefill the form instead of showing empty inputs. Fields marked
``advanced`` in the schema are flagged so the frontend can hide them
by default (matching the builder behaviour).
"""
if not isinstance(input_schema, dict):
return []
exclude = exclude_fields or set()
properties = input_schema.get("properties", {})
required = set(input_schema.get("required", []))
provided = input_data or {}
return [
{
results: list[dict[str, Any]] = []
for name, schema in properties.items():
if name in exclude:
continue
entry: dict[str, Any] = {
"name": name,
"title": schema.get("title", name),
"type": schema.get("type", "string"),
"description": schema.get("description", ""),
"required": name in required,
"default": schema.get("default"),
"advanced": schema.get("advanced", False),
}
for name, schema in properties.items()
if name not in exclude
]
if name in provided:
entry["value"] = provided[name]
results.append(entry)
return results
async def execute_block(
@@ -446,7 +460,9 @@ async def prepare_block_for_execution(
requirements={
"credentials": missing_creds_list,
"inputs": get_inputs_from_schema(
input_schema, exclude_fields=credentials_fields
input_schema,
exclude_fields=credentials_fields,
input_data=input_data,
),
"execution_modes": ["immediate"],
},


@@ -153,7 +153,11 @@ class RunAgentTool(BaseTool):
},
"dry_run": {
"type": "boolean",
"description": "Execute in preview mode.",
"description": (
"When true, simulates execution using an LLM for each block "
"— no real API calls, credentials, or credits. "
"See agent_generation_guide for the full workflow."
),
},
},
"required": ["dry_run"],


@@ -845,6 +845,7 @@ class WriteWorkspaceFileTool(BaseTool):
path=path,
mime_type=mime_type,
overwrite=overwrite,
metadata={"origin": "agent-created"},
)
# Build informative source label and message.

File diff suppressed because it is too large


@@ -0,0 +1,240 @@
"""Build complete JSONL transcript from SDK messages.

The transcript represents the FULL active context at any point in time.
Each upload REPLACES the previous transcript atomically.

Flow:
    Turn 1: Upload [msg1, msg2]
    Turn 2: Download [msg1, msg2] → Upload [msg1, msg2, msg3, msg4] (REPLACE)
    Turn 3: Download [msg1, msg2, msg3, msg4] → Upload [all messages] (REPLACE)

The transcript is never incremental - always the complete atomic state.
"""

import logging
from typing import Any
from uuid import uuid4

from pydantic import BaseModel

from backend.util import json

from .transcript import STRIPPABLE_TYPES

logger = logging.getLogger(__name__)


class TranscriptEntry(BaseModel):
    """Single transcript entry (user or assistant turn)."""

    type: str
    uuid: str
    parentUuid: str = ""
    isCompactSummary: bool | None = None
    message: dict[str, Any]


class TranscriptBuilder:
    """Build complete JSONL transcript from SDK messages.

    This builder maintains the FULL conversation state, not incremental changes.
    The output is always the complete active context.
    """

    def __init__(self) -> None:
        self._entries: list[TranscriptEntry] = []
        self._last_uuid: str | None = None

    def _last_is_assistant(self) -> bool:
        return bool(self._entries) and self._entries[-1].type == "assistant"

    def _last_message_id(self) -> str:
        """Return the message.id of the last entry, or '' if none."""
        if self._entries:
            return self._entries[-1].message.get("id", "")
        return ""

    @staticmethod
    def _parse_entry(data: dict) -> TranscriptEntry | None:
        """Parse a single transcript entry, filtering strippable types.

        Returns ``None`` for entries that should be skipped (strippable types
        that are not compaction summaries).
        """
        entry_type = data.get("type", "")
        if entry_type in STRIPPABLE_TYPES and not data.get("isCompactSummary"):
            return None
        return TranscriptEntry(
            type=entry_type,
            uuid=data.get("uuid") or str(uuid4()),
            parentUuid=data.get("parentUuid") or "",
            isCompactSummary=data.get("isCompactSummary"),
            message=data.get("message", {}),
        )

    def load_previous(self, content: str, log_prefix: str = "[Transcript]") -> None:
        """Load complete previous transcript.

        This loads the FULL previous context. As new messages come in,
        we append to this state. The final output is the complete context
        (previous + new), not just the delta.
        """
        if not content or not content.strip():
            return

        lines = content.strip().split("\n")
        for line_num, line in enumerate(lines, 1):
            if not line.strip():
                continue

            data = json.loads(line, fallback=None)
            if data is None:
                logger.warning(
                    "%s Failed to parse transcript line %d/%d",
                    log_prefix,
                    line_num,
                    len(lines),
                )
                continue

            entry = self._parse_entry(data)
            if entry is None:
                continue
            self._entries.append(entry)
            self._last_uuid = entry.uuid

        logger.info(
            "%s Loaded %d entries from previous transcript (last_uuid=%s)",
            log_prefix,
            len(self._entries),
            self._last_uuid[:12] if self._last_uuid else None,
        )

    def append_user(self, content: str | list[dict], uuid: str | None = None) -> None:
        """Append a user entry."""
        msg_uuid = uuid or str(uuid4())

        self._entries.append(
            TranscriptEntry(
                type="user",
                uuid=msg_uuid,
                parentUuid=self._last_uuid or "",
                message={"role": "user", "content": content},
            )
        )
        self._last_uuid = msg_uuid

    def append_tool_result(self, tool_use_id: str, content: str) -> None:
        """Append a tool result as a user entry (one per tool call)."""
        self.append_user(
            content=[
                {"type": "tool_result", "tool_use_id": tool_use_id, "content": content}
            ]
        )

    def append_assistant(
        self,
        content_blocks: list[dict],
        model: str = "",
        stop_reason: str | None = None,
    ) -> None:
        """Append an assistant entry.

        Consecutive assistant entries automatically share the same message ID
        so the CLI can merge them (thinking → text → tool_use) into a single
        API message on ``--resume``. A new ID is assigned whenever an
        assistant entry follows a non-assistant entry (user message or tool
        result), because that marks the start of a new API response.
        """
        message_id = (
            self._last_message_id()
            if self._last_is_assistant()
            else f"msg_sdk_{uuid4().hex[:24]}"
        )

        msg_uuid = str(uuid4())

        self._entries.append(
            TranscriptEntry(
                type="assistant",
                uuid=msg_uuid,
                parentUuid=self._last_uuid or "",
                message={
                    "role": "assistant",
                    "model": model,
                    "id": message_id,
                    "type": "message",
                    "content": content_blocks,
                    "stop_reason": stop_reason,
                    "stop_sequence": None,
                },
            )
        )
        self._last_uuid = msg_uuid

    def replace_entries(
        self, compacted_entries: list[dict], log_prefix: str = "[Transcript]"
    ) -> None:
        """Replace all entries with compacted entries from the CLI session file.

        Called after mid-stream compaction so TranscriptBuilder mirrors the
        CLI's active context (compaction summary + post-compaction entries).

        Builds the new list first and validates it's non-empty before swapping,
        so corrupt input cannot wipe the conversation history.
        """
        new_entries: list[TranscriptEntry] = []
        for data in compacted_entries:
            entry = self._parse_entry(data)
            if entry is not None:
                new_entries.append(entry)

        if not new_entries:
            logger.warning(
                "%s replace_entries produced 0 entries from %d inputs, keeping old (%d entries)",
                log_prefix,
                len(compacted_entries),
                len(self._entries),
            )
            return

        old_count = len(self._entries)
        self._entries = new_entries
        self._last_uuid = new_entries[-1].uuid

        logger.info(
            "%s TranscriptBuilder compacted: %d entries -> %d entries",
            log_prefix,
            old_count,
            len(self._entries),
        )

    def to_jsonl(self) -> str:
        """Export complete context as JSONL.

        Consecutive assistant entries are kept separate to match the
        native CLI format — the SDK merges them internally on resume.

        Returns the FULL conversation state (all entries), not incremental.
        This output REPLACES any previous transcript.
        """
        if not self._entries:
            return ""

        lines = [entry.model_dump_json(exclude_none=True) for entry in self._entries]
        return "\n".join(lines) + "\n"

    @property
    def entry_count(self) -> int:
        """Total number of entries in the complete context."""
        return len(self._entries)

    @property
    def is_empty(self) -> bool:
        """Whether this builder has any entries."""
        return len(self._entries) == 0

    @property
    def last_entry_type(self) -> str | None:
        """Type of the last entry, or None if empty."""
        return self._entries[-1].type if self._entries else None
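Review note: the `append_assistant` docstring describes the message-ID rule (consecutive assistant entries share one ID; a new ID starts after any user or tool-result entry). The sketch below isolates just that rule in a dependency-free stand-in so it can be run without the backend package. `MiniTranscript` is a hypothetical class written for this note; it omits UUID chaining, pydantic models, and JSONL export.

```python
from dataclasses import dataclass, field
from typing import Any
from uuid import uuid4


@dataclass
class MiniTranscript:
    """Hypothetical, stripped-down stand-in for TranscriptBuilder."""

    entries: list[dict[str, Any]] = field(default_factory=list)

    def append_user(self, content: Any) -> None:
        self.entries.append(
            {"type": "user", "message": {"role": "user", "content": content}}
        )

    def append_assistant(self, content_blocks: list[dict[str, Any]]) -> None:
        last = self.entries[-1] if self.entries else None
        if last is not None and last["type"] == "assistant":
            # Same API response: reuse the previous assistant message ID.
            message_id = last["message"]["id"]
        else:
            # First assistant entry after a user/tool entry: new response.
            message_id = f"msg_sdk_{uuid4().hex[:24]}"
        self.entries.append(
            {
                "type": "assistant",
                "message": {
                    "role": "assistant",
                    "id": message_id,
                    "content": content_blocks,
                },
            }
        )


t = MiniTranscript()
t.append_user("hi")
t.append_assistant([{"type": "thinking", "thinking": "..."}])
t.append_assistant([{"type": "tool_use", "id": "tc_1", "name": "demo", "input": {}}])
t.append_user([{"type": "tool_result", "tool_use_id": "tc_1", "content": "ok"}])
t.append_assistant([{"type": "text", "text": "done"}])
ids = [e["message"]["id"] for e in t.entries if e["type"] == "assistant"]
```

The first two assistant entries share an ID because nothing separates them; the third gets a fresh ID because a tool result came in between.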


@@ -0,0 +1,260 @@
"""Tests for canonical TranscriptBuilder (backend.copilot.transcript_builder).
These tests directly import from the canonical module to ensure codecov
patch coverage for the new file.
"""
from backend.copilot.transcript_builder import TranscriptBuilder, TranscriptEntry
from backend.util import json
def _make_jsonl(*entries: dict) -> str:
return "\n".join(json.dumps(e) for e in entries) + "\n"
USER_MSG = {
"type": "user",
"uuid": "u1",
"message": {"role": "user", "content": "hello"},
}
ASST_MSG = {
"type": "assistant",
"uuid": "a1",
"parentUuid": "u1",
"message": {
"role": "assistant",
"id": "msg_1",
"type": "message",
"content": [{"type": "text", "text": "hi"}],
"stop_reason": "end_turn",
"stop_sequence": None,
},
}
class TestTranscriptEntry:
def test_basic_construction(self):
entry = TranscriptEntry(
type="user", uuid="u1", message={"role": "user", "content": "hi"}
)
assert entry.type == "user"
assert entry.uuid == "u1"
assert entry.parentUuid == ""
assert entry.isCompactSummary is None
def test_optional_fields(self):
entry = TranscriptEntry(
type="summary",
uuid="s1",
parentUuid="p1",
isCompactSummary=True,
message={"role": "user", "content": "summary"},
)
assert entry.isCompactSummary is True
assert entry.parentUuid == "p1"
class TestTranscriptBuilderInit:
def test_starts_empty(self):
builder = TranscriptBuilder()
assert builder.is_empty
assert builder.entry_count == 0
assert builder.last_entry_type is None
assert builder.to_jsonl() == ""
class TestAppendUser:
def test_appends_user_entry(self):
builder = TranscriptBuilder()
builder.append_user("hello")
assert builder.entry_count == 1
assert builder.last_entry_type == "user"
def test_chains_parent_uuid(self):
builder = TranscriptBuilder()
builder.append_user("first", uuid="u1")
builder.append_user("second", uuid="u2")
output = builder.to_jsonl()
entries = [json.loads(line) for line in output.strip().split("\n")]
assert entries[0]["parentUuid"] == ""
assert entries[1]["parentUuid"] == "u1"
def test_custom_uuid(self):
builder = TranscriptBuilder()
builder.append_user("hello", uuid="custom-id")
output = builder.to_jsonl()
entry = json.loads(output.strip())
assert entry["uuid"] == "custom-id"
class TestAppendToolResult:
def test_appends_as_user_entry(self):
builder = TranscriptBuilder()
builder.append_tool_result(tool_use_id="tc_1", content="result text")
assert builder.entry_count == 1
assert builder.last_entry_type == "user"
output = builder.to_jsonl()
entry = json.loads(output.strip())
content = entry["message"]["content"]
assert len(content) == 1
assert content[0]["type"] == "tool_result"
assert content[0]["tool_use_id"] == "tc_1"
assert content[0]["content"] == "result text"
class TestAppendAssistant:
def test_appends_assistant_entry(self):
builder = TranscriptBuilder()
builder.append_user("hi")
builder.append_assistant(
content_blocks=[{"type": "text", "text": "hello"}],
model="test-model",
stop_reason="end_turn",
)
assert builder.entry_count == 2
assert builder.last_entry_type == "assistant"
def test_consecutive_assistants_share_message_id(self):
builder = TranscriptBuilder()
builder.append_user("hi")
builder.append_assistant(
content_blocks=[{"type": "text", "text": "part 1"}],
model="m",
)
builder.append_assistant(
content_blocks=[{"type": "text", "text": "part 2"}],
model="m",
)
output = builder.to_jsonl()
entries = [json.loads(line) for line in output.strip().split("\n")]
# The two assistant entries share the same message ID
assert entries[1]["message"]["id"] == entries[2]["message"]["id"]
def test_non_consecutive_assistants_get_different_ids(self):
builder = TranscriptBuilder()
builder.append_user("q1")
builder.append_assistant(
content_blocks=[{"type": "text", "text": "a1"}],
model="m",
)
builder.append_user("q2")
builder.append_assistant(
content_blocks=[{"type": "text", "text": "a2"}],
model="m",
)
output = builder.to_jsonl()
entries = [json.loads(line) for line in output.strip().split("\n")]
assert entries[1]["message"]["id"] != entries[3]["message"]["id"]
class TestLoadPrevious:
def test_loads_valid_entries(self):
content = _make_jsonl(USER_MSG, ASST_MSG)
builder = TranscriptBuilder()
builder.load_previous(content)
assert builder.entry_count == 2
def test_skips_empty_content(self):
builder = TranscriptBuilder()
builder.load_previous("")
assert builder.is_empty
builder.load_previous(" ")
assert builder.is_empty
def test_skips_strippable_types(self):
progress = {"type": "progress", "uuid": "p1", "message": {}}
content = _make_jsonl(USER_MSG, progress, ASST_MSG)
builder = TranscriptBuilder()
builder.load_previous(content)
assert builder.entry_count == 2 # progress was skipped
def test_preserves_compact_summary(self):
compact = {
"type": "summary",
"uuid": "cs1",
"isCompactSummary": True,
"message": {"role": "user", "content": "summary"},
}
content = _make_jsonl(compact, ASST_MSG)
builder = TranscriptBuilder()
builder.load_previous(content)
assert builder.entry_count == 2
def test_skips_invalid_json_lines(self):
content = '{"type":"user","uuid":"u1","message":{}}\nnot-valid-json\n'
builder = TranscriptBuilder()
builder.load_previous(content)
assert builder.entry_count == 1
class TestToJsonl:
def test_roundtrip(self):
builder = TranscriptBuilder()
builder.append_user("hello", uuid="u1")
builder.append_assistant(
content_blocks=[{"type": "text", "text": "world"}],
model="m",
)
output = builder.to_jsonl()
assert output.endswith("\n")
lines = output.strip().split("\n")
assert len(lines) == 2
for line in lines:
parsed = json.loads(line)
assert "type" in parsed
assert "uuid" in parsed
assert "message" in parsed
class TestReplaceEntries:
def test_replaces_all_entries(self):
builder = TranscriptBuilder()
builder.append_user("old")
builder.append_assistant(
content_blocks=[{"type": "text", "text": "old answer"}], model="m"
)
assert builder.entry_count == 2
compacted = [
{
"type": "summary",
"uuid": "cs1",
"isCompactSummary": True,
"message": {"role": "user", "content": "compacted"},
}
]
builder.replace_entries(compacted)
assert builder.entry_count == 1
def test_empty_replacement_keeps_existing(self):
builder = TranscriptBuilder()
builder.append_user("keep me")
builder.replace_entries([])
assert builder.entry_count == 1
class TestParseEntry:
def test_filters_strippable_non_compact(self):
result = TranscriptBuilder._parse_entry(
{"type": "progress", "uuid": "p1", "message": {}}
)
assert result is None
def test_keeps_compact_summary(self):
result = TranscriptBuilder._parse_entry(
{
"type": "summary",
"uuid": "cs1",
"isCompactSummary": True,
"message": {},
}
)
assert result is not None
assert result.isCompactSummary is True
def test_generates_uuid_if_missing(self):
result = TranscriptBuilder._parse_entry(
{"type": "user", "message": {"role": "user", "content": "hi"}}
)
assert result is not None
assert result.uuid # Should be a generated UUID


@@ -0,0 +1,726 @@
"""Tests for canonical transcript module (backend.copilot.transcript).
Covers pure helper functions that are not exercised by the SDK re-export tests.
"""
from __future__ import annotations
from unittest.mock import MagicMock
from backend.util import json
from .transcript import (
TranscriptDownload,
_build_path_from_parts,
_find_last_assistant_entry,
_flatten_assistant_content,
_flatten_tool_result_content,
_messages_to_transcript,
_meta_storage_path_parts,
_rechain_tail,
_sanitize_id,
_storage_path_parts,
_transcript_to_messages,
strip_for_upload,
validate_transcript,
)
def _make_jsonl(*entries: dict) -> str:
return "\n".join(json.dumps(e) for e in entries) + "\n"
# ---------------------------------------------------------------------------
# _sanitize_id
# ---------------------------------------------------------------------------
class TestSanitizeId:
def test_uuid_passes_through(self):
assert _sanitize_id("abcdef12-3456-7890-abcd-ef1234567890") == (
"abcdef12-3456-7890-abcd-ef1234567890"
)
def test_strips_non_hex_characters(self):
# Only hex chars (0-9, a-f, A-F) and hyphens are kept
result = _sanitize_id("abc/../../etc/passwd")
assert "/" not in result
assert "." not in result
# 'p', 's', 'w' are not hex chars, so they are stripped
assert all(c in "0123456789abcdefABCDEF-" for c in result)
def test_truncates_to_max_len(self):
long_id = "a" * 100
result = _sanitize_id(long_id, max_len=10)
assert len(result) == 10
def test_empty_returns_unknown(self):
assert _sanitize_id("") == "unknown"
def test_none_returns_unknown(self):
assert _sanitize_id(None) == "unknown" # type: ignore[arg-type]
def test_special_chars_only_returns_unknown(self):
assert _sanitize_id("!@#$%^&*()") == "unknown"
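The behaviour these tests pin down can be sketched as follows. This is an assumption-laden illustration, not the real `_sanitize_id`: the name `sanitize_id_sketch`, the regex, and the default `max_len=36` are all hypothetical, chosen only so that a full UUID passes through unchanged.

```python
import re
from typing import Optional

# Anything that is not a hex digit or a hyphen is dropped.
_UNSAFE = re.compile(r"[^0-9a-fA-F-]")


def sanitize_id_sketch(raw: Optional[str], max_len: int = 36) -> str:
    """Keep hex digits and hyphens, truncate, and fall back to 'unknown'."""
    cleaned = _UNSAFE.sub("", raw or "")[:max_len]
    return cleaned or "unknown"


# Path-traversal input loses its slashes, dots and non-hex letters.
traversal = sanitize_id_sketch("abc/../../etc/passwd")  # -> "abcecad"
```

Restricting the alphabet this tightly means the result can never contain a path separator, which is presumably why the tests probe it with a traversal string.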
# ---------------------------------------------------------------------------
# _storage_path_parts / _meta_storage_path_parts
# ---------------------------------------------------------------------------
class TestStoragePathParts:
def test_returns_triple(self):
prefix, uid, fname = _storage_path_parts("user-1", "sess-2")
assert prefix == "chat-transcripts"
assert "e" in uid # hex chars from "user-1" sanitized
assert fname.endswith(".jsonl")
def test_meta_returns_meta_json(self):
prefix, uid, fname = _meta_storage_path_parts("user-1", "sess-2")
assert prefix == "chat-transcripts"
assert fname.endswith(".meta.json")
# ---------------------------------------------------------------------------
# _build_path_from_parts
# ---------------------------------------------------------------------------
class TestBuildPathFromParts:
def test_gcs_backend(self):
from backend.util.workspace_storage import GCSWorkspaceStorage
mock_gcs = MagicMock(spec=GCSWorkspaceStorage)
mock_gcs.bucket_name = "my-bucket"
path = _build_path_from_parts(("wid", "fid", "file.jsonl"), mock_gcs)
assert path == "gcs://my-bucket/workspaces/wid/fid/file.jsonl"
def test_local_backend(self):
# Use a plain object (not MagicMock) so isinstance(backend, GCSWorkspaceStorage) is False
local_backend = type("LocalBackend", (), {})()
path = _build_path_from_parts(("wid", "fid", "file.jsonl"), local_backend)
assert path == "local://wid/fid/file.jsonl"
# ---------------------------------------------------------------------------
# TranscriptDownload dataclass
# ---------------------------------------------------------------------------
class TestTranscriptDownload:
def test_defaults(self):
td = TranscriptDownload(content="hello")
assert td.content == "hello"
assert td.message_count == 0
assert td.uploaded_at == 0.0
def test_custom_values(self):
td = TranscriptDownload(content="data", message_count=5, uploaded_at=123.45)
assert td.message_count == 5
assert td.uploaded_at == 123.45
# ---------------------------------------------------------------------------
# _flatten_assistant_content
# ---------------------------------------------------------------------------
class TestFlattenAssistantContent:
def test_text_blocks(self):
blocks = [
{"type": "text", "text": "Hello"},
{"type": "text", "text": "World"},
]
assert _flatten_assistant_content(blocks) == "Hello\nWorld"
def test_thinking_blocks_stripped(self):
blocks = [
{"type": "thinking", "thinking": "hmm..."},
{"type": "text", "text": "answer"},
{"type": "redacted_thinking", "data": "secret"},
]
assert _flatten_assistant_content(blocks) == "answer"
def test_tool_use_blocks_stripped(self):
blocks = [
{"type": "text", "text": "I'll run a tool"},
{"type": "tool_use", "name": "bash", "id": "tc1", "input": {}},
]
assert _flatten_assistant_content(blocks) == "I'll run a tool"
def test_string_blocks(self):
blocks = ["hello", "world"]
assert _flatten_assistant_content(blocks) == "hello\nworld"
def test_empty_blocks(self):
assert _flatten_assistant_content([]) == ""
def test_unknown_dict_blocks_skipped(self):
blocks = [{"type": "image", "data": "base64..."}]
assert _flatten_assistant_content(blocks) == ""
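A minimal sketch of the flattening rules exercised above, assuming only what the tests show: plain strings and `text` blocks are kept and joined with newlines, while every other dict block (thinking, tool use, images) is dropped. The function name is hypothetical and the real helper may handle more cases.

```python
def flatten_assistant_sketch(blocks: list) -> str:
    """Join the human-readable parts of an assistant content list."""
    parts = []
    for block in blocks:
        if isinstance(block, str):
            # Raw strings are passed through unchanged.
            parts.append(block)
        elif isinstance(block, dict) and block.get("type") == "text":
            parts.append(str(block.get("text", "")))
        # Any other block type is skipped (assumption based on the tests).
    return "\n".join(parts)


flattened = flatten_assistant_sketch(
    [
        {"type": "thinking", "thinking": "hmm..."},
        {"type": "text", "text": "answer"},
        {"type": "tool_use", "name": "bash", "id": "tc1", "input": {}},
    ]
)
```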
# ---------------------------------------------------------------------------
# _flatten_tool_result_content
# ---------------------------------------------------------------------------
class TestFlattenToolResultContent:
def test_tool_result_with_text_content(self):
blocks = [
{
"type": "tool_result",
"tool_use_id": "tc1",
"content": [{"type": "text", "text": "output data"}],
}
]
assert _flatten_tool_result_content(blocks) == "output data"
def test_tool_result_with_string_content(self):
blocks = [
{"type": "tool_result", "tool_use_id": "tc1", "content": "simple string"}
]
assert _flatten_tool_result_content(blocks) == "simple string"
def test_tool_result_with_image_placeholder(self):
blocks = [
{
"type": "tool_result",
"tool_use_id": "tc1",
"content": [{"type": "image", "data": "base64..."}],
}
]
assert _flatten_tool_result_content(blocks) == "[__image__]"
def test_tool_result_with_document_placeholder(self):
blocks = [
{
"type": "tool_result",
"tool_use_id": "tc1",
"content": [{"type": "document", "data": "base64..."}],
}
]
assert _flatten_tool_result_content(blocks) == "[__document__]"
def test_tool_result_with_none_content(self):
blocks = [{"type": "tool_result", "tool_use_id": "tc1", "content": None}]
assert _flatten_tool_result_content(blocks) == ""
def test_text_block_outside_tool_result(self):
blocks = [{"type": "text", "text": "standalone"}]
assert _flatten_tool_result_content(blocks) == "standalone"
def test_unknown_dict_block_placeholder(self):
blocks = [{"type": "custom_widget", "data": "x"}]
assert _flatten_tool_result_content(blocks) == "[__custom_widget__]"
def test_string_blocks(self):
blocks = ["raw text"]
assert _flatten_tool_result_content(blocks) == "raw text"
def test_empty_blocks(self):
assert _flatten_tool_result_content([]) == ""
def test_mixed_content_in_tool_result(self):
blocks = [
{
"type": "tool_result",
"tool_use_id": "tc1",
"content": [
{"type": "text", "text": "line1"},
{"type": "image", "data": "..."},
"raw string",
],
}
]
result = _flatten_tool_result_content(blocks)
assert "line1" in result
assert "[__image__]" in result
assert "raw string" in result
def test_tool_result_with_dict_without_text_key(self):
blocks = [
{
"type": "tool_result",
"tool_use_id": "tc1",
"content": [{"count": 42}],
}
]
result = _flatten_tool_result_content(blocks)
assert "42" in result
def test_tool_result_content_list_with_list_content(self):
blocks = [
{
"type": "tool_result",
"tool_use_id": "tc1",
"content": [{"type": "text", "text": None}],
}
]
result = _flatten_tool_result_content(blocks)
assert result == "None"
# ---------------------------------------------------------------------------
# _transcript_to_messages
# ---------------------------------------------------------------------------
USER_ENTRY = {
"type": "user",
"uuid": "u1",
"parentUuid": "",
"message": {"role": "user", "content": "hello"},
}
ASST_ENTRY = {
"type": "assistant",
"uuid": "a1",
"parentUuid": "u1",
"message": {
"role": "assistant",
"id": "msg_1",
"content": [{"type": "text", "text": "hi there"}],
},
}
PROGRESS_ENTRY = {
"type": "progress",
"uuid": "p1",
"parentUuid": "u1",
"data": {},
}
class TestTranscriptToMessages:
def test_basic_conversion(self):
content = _make_jsonl(USER_ENTRY, ASST_ENTRY)
messages = _transcript_to_messages(content)
assert len(messages) == 2
assert messages[0] == {"role": "user", "content": "hello"}
assert messages[1]["role"] == "assistant"
assert messages[1]["content"] == "hi there"
def test_skips_strippable_types(self):
content = _make_jsonl(USER_ENTRY, PROGRESS_ENTRY, ASST_ENTRY)
messages = _transcript_to_messages(content)
assert len(messages) == 2
def test_skips_entries_without_role(self):
no_role = {"type": "user", "uuid": "x", "message": {"content": "no role"}}
content = _make_jsonl(no_role)
messages = _transcript_to_messages(content)
assert len(messages) == 0
def test_handles_string_content(self):
entry = {
"type": "user",
"uuid": "u1",
"message": {"role": "user", "content": "plain string"},
}
content = _make_jsonl(entry)
messages = _transcript_to_messages(content)
assert messages[0]["content"] == "plain string"
def test_handles_tool_result_content(self):
entry = {
"type": "user",
"uuid": "u1",
"message": {
"role": "user",
"content": [
{"type": "tool_result", "tool_use_id": "tc1", "content": "output"}
],
},
}
content = _make_jsonl(entry)
messages = _transcript_to_messages(content)
assert messages[0]["content"] == "output"
def test_handles_none_content(self):
entry = {
"type": "assistant",
"uuid": "a1",
"message": {"role": "assistant", "content": None},
}
content = _make_jsonl(entry)
messages = _transcript_to_messages(content)
assert messages[0]["content"] == ""
def test_skips_invalid_json(self):
content = "not valid json\n"
messages = _transcript_to_messages(content)
assert len(messages) == 0
def test_preserves_compact_summary(self):
compact = {
"type": "summary",
"uuid": "cs1",
"isCompactSummary": True,
"message": {"role": "user", "content": "summary of conversation"},
}
content = _make_jsonl(compact)
messages = _transcript_to_messages(content)
assert len(messages) == 1
def test_strips_summary_without_compact_flag(self):
summary = {
"type": "summary",
"uuid": "s1",
"message": {"role": "user", "content": "summary"},
}
content = _make_jsonl(summary)
messages = _transcript_to_messages(content)
assert len(messages) == 0
# ---------------------------------------------------------------------------
# _messages_to_transcript
# ---------------------------------------------------------------------------
class TestMessagesToTranscript:
def test_basic_roundtrip(self):
messages = [
{"role": "user", "content": "hello"},
{"role": "assistant", "content": "world"},
]
result = _messages_to_transcript(messages)
assert result.endswith("\n")
lines = result.strip().split("\n")
assert len(lines) == 2
user_entry = json.loads(lines[0])
assert user_entry["type"] == "user"
assert user_entry["message"]["role"] == "user"
assert user_entry["message"]["content"] == "hello"
assert user_entry["parentUuid"] == ""
asst_entry = json.loads(lines[1])
assert asst_entry["type"] == "assistant"
assert asst_entry["message"]["role"] == "assistant"
assert asst_entry["message"]["content"] == [{"type": "text", "text": "world"}]
assert asst_entry["parentUuid"] == user_entry["uuid"]
def test_empty_messages(self):
assert _messages_to_transcript([]) == ""
def test_assistant_has_message_envelope(self):
messages = [{"role": "assistant", "content": "test"}]
result = _messages_to_transcript(messages)
entry = json.loads(result.strip())
msg = entry["message"]
assert "id" in msg
assert msg["id"].startswith("msg_compact_")
assert msg["type"] == "message"
assert msg["stop_reason"] == "end_turn"
assert msg["stop_sequence"] is None
def test_uuid_chain(self):
messages = [
{"role": "user", "content": "a"},
{"role": "assistant", "content": "b"},
{"role": "user", "content": "c"},
]
result = _messages_to_transcript(messages)
lines = result.strip().split("\n")
entries = [json.loads(line) for line in lines]
assert entries[0]["parentUuid"] == ""
assert entries[1]["parentUuid"] == entries[0]["uuid"]
assert entries[2]["parentUuid"] == entries[1]["uuid"]
def test_assistant_with_empty_content(self):
messages = [{"role": "assistant", "content": ""}]
result = _messages_to_transcript(messages)
entry = json.loads(result.strip())
assert entry["message"]["content"] == []
# ---------------------------------------------------------------------------
# _find_last_assistant_entry
# ---------------------------------------------------------------------------
class TestFindLastAssistantEntry:
def test_splits_at_last_assistant(self):
user = {
"type": "user",
"uuid": "u1",
"message": {"role": "user", "content": "hi"},
}
asst = {
"type": "assistant",
"uuid": "a1",
"message": {"role": "assistant", "id": "msg1", "content": "answer"},
}
content = _make_jsonl(user, asst)
prefix, tail = _find_last_assistant_entry(content)
assert len(prefix) == 1
assert len(tail) == 1
def test_no_assistant_returns_all_in_prefix(self):
user1 = {
"type": "user",
"uuid": "u1",
"message": {"role": "user", "content": "hi"},
}
user2 = {
"type": "user",
"uuid": "u2",
"message": {"role": "user", "content": "hey"},
}
content = _make_jsonl(user1, user2)
prefix, tail = _find_last_assistant_entry(content)
assert len(prefix) == 2
assert len(tail) == 0
def test_multi_entry_turn_preserved(self):
user = {
"type": "user",
"uuid": "u1",
"message": {"role": "user", "content": "q"},
}
asst1 = {
"type": "assistant",
"uuid": "a1",
"message": {
"role": "assistant",
"id": "msg_turn",
"content": [{"type": "thinking", "thinking": "hmm"}],
},
}
asst2 = {
"type": "assistant",
"uuid": "a2",
"message": {
"role": "assistant",
"id": "msg_turn",
"content": [{"type": "text", "text": "answer"}],
},
}
content = _make_jsonl(user, asst1, asst2)
prefix, tail = _find_last_assistant_entry(content)
assert len(prefix) == 1 # just the user
assert len(tail) == 2 # both assistant entries
def test_assistant_without_id(self):
user = {
"type": "user",
"uuid": "u1",
"message": {"role": "user", "content": "q"},
}
asst = {
"type": "assistant",
"uuid": "a1",
"message": {"role": "assistant", "content": "no id"},
}
content = _make_jsonl(user, asst)
prefix, tail = _find_last_assistant_entry(content)
assert len(prefix) == 1
assert len(tail) == 1
def test_trailing_user_after_assistant(self):
user1 = {
"type": "user",
"uuid": "u1",
"message": {"role": "user", "content": "q"},
}
asst = {
"type": "assistant",
"uuid": "a1",
"message": {"role": "assistant", "id": "msg1", "content": "a"},
}
user2 = {
"type": "user",
"uuid": "u2",
"message": {"role": "user", "content": "follow"},
}
content = _make_jsonl(user1, asst, user2)
prefix, tail = _find_last_assistant_entry(content)
assert len(prefix) == 1 # user1
assert len(tail) == 2 # asst + user2
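The split these tests describe can be sketched as below. It is a guess at the shape of the logic, not the real `_find_last_assistant_entry`: the function name is hypothetical, and it assumes that entries belonging to one assistant turn are adjacent and share `message.id`.

```python
import json


def split_at_last_assistant_sketch(content: str) -> tuple:
    """Return (prefix_lines, tail_lines); tail starts at the last assistant turn."""
    lines = [ln for ln in content.split("\n") if ln.strip()]

    def parse(line: str) -> dict:
        try:
            value = json.loads(line)
        except ValueError:
            return {}
        return value if isinstance(value, dict) else {}

    entries = [parse(ln) for ln in lines]
    assistant_idxs = [i for i, e in enumerate(entries) if e.get("type") == "assistant"]
    if not assistant_idxs:
        # No assistant entry: everything stays in the prefix.
        return lines, []

    start = assistant_idxs[-1]
    turn_id = (entries[start].get("message") or {}).get("id")
    # Walk backwards over earlier entries of the same turn (same message id).
    while (
        turn_id
        and start > 0
        and entries[start - 1].get("type") == "assistant"
        and (entries[start - 1].get("message") or {}).get("id") == turn_id
    ):
        start -= 1
    return lines[:start], lines[start:]
```

Entries after the last assistant turn, such as a trailing user message, remain in the tail, matching `test_trailing_user_after_assistant`.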
# ---------------------------------------------------------------------------
# _rechain_tail
# ---------------------------------------------------------------------------
class TestRechainTail:
def test_empty_tail(self):
assert _rechain_tail("some prefix\n", []) == ""
def test_patches_first_entry_parent(self):
prefix_entry = {"uuid": "last-prefix-uuid", "type": "user", "message": {}}
prefix = json.dumps(prefix_entry) + "\n"
tail_entry = {
"uuid": "t1",
"parentUuid": "old-parent",
"type": "assistant",
"message": {},
}
tail_lines = [json.dumps(tail_entry)]
result = _rechain_tail(prefix, tail_lines)
parsed = json.loads(result.strip())
assert parsed["parentUuid"] == "last-prefix-uuid"
def test_chains_consecutive_tail_entries(self):
prefix_entry = {"uuid": "p1", "type": "user", "message": {}}
prefix = json.dumps(prefix_entry) + "\n"
t1 = {"uuid": "t1", "parentUuid": "old1", "type": "assistant", "message": {}}
t2 = {"uuid": "t2", "parentUuid": "old2", "type": "user", "message": {}}
tail_lines = [json.dumps(t1), json.dumps(t2)]
result = _rechain_tail(prefix, tail_lines)
entries = [json.loads(line) for line in result.strip().split("\n")]
assert entries[0]["parentUuid"] == "p1"
assert entries[1]["parentUuid"] == "t1"
def test_non_dict_lines_passed_through(self):
prefix_entry = {"uuid": "p1", "type": "user", "message": {}}
prefix = json.dumps(prefix_entry) + "\n"
tail_lines = ["not-a-json-dict"]
result = _rechain_tail(prefix, tail_lines)
assert "not-a-json-dict" in result
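The re-chaining checked by these tests can be sketched as follows. This is an illustration under assumptions, not the real `_rechain_tail`: the function name is hypothetical, and it assumes the new parent is the `uuid` of the last parseable prefix entry and that lines which are not JSON objects are emitted untouched.

```python
import json


def rechain_tail_sketch(prefix: str, tail_lines: list) -> str:
    """Rewrite parentUuid in tail entries so they chain onto the prefix."""
    if not tail_lines:
        return ""

    # Find the uuid of the last prefix entry that parses as a JSON object.
    last_uuid = ""
    for line in reversed(prefix.strip().split("\n")):
        try:
            entry = json.loads(line)
        except ValueError:
            continue
        if isinstance(entry, dict) and "uuid" in entry:
            last_uuid = entry["uuid"]
            break

    out = []
    for line in tail_lines:
        try:
            entry = json.loads(line)
        except ValueError:
            out.append(line)  # not JSON: pass through unchanged
            continue
        if not isinstance(entry, dict):
            out.append(line)
            continue
        entry["parentUuid"] = last_uuid
        last_uuid = entry.get("uuid", last_uuid)
        out.append(json.dumps(entry))
    return "\n".join(out) + "\n"
```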
# ---------------------------------------------------------------------------
# strip_for_upload (combined single-parse)
# ---------------------------------------------------------------------------
class TestStripForUpload:
def test_strips_progress_and_thinking(self):
user = {
"type": "user",
"uuid": "u1",
"parentUuid": "",
"message": {"role": "user", "content": "hi"},
}
progress = {"type": "progress", "uuid": "p1", "parentUuid": "u1", "data": {}}
asst_old = {
"type": "assistant",
"uuid": "a1",
"parentUuid": "p1",
"message": {
"role": "assistant",
"id": "msg_old",
"content": [
{"type": "thinking", "thinking": "stale thinking"},
{"type": "text", "text": "old answer"},
],
},
}
user2 = {
"type": "user",
"uuid": "u2",
"parentUuid": "a1",
"message": {"role": "user", "content": "next"},
}
asst_new = {
"type": "assistant",
"uuid": "a2",
"parentUuid": "u2",
"message": {
"role": "assistant",
"id": "msg_new",
"content": [
{"type": "thinking", "thinking": "fresh thinking"},
{"type": "text", "text": "new answer"},
],
},
}
content = _make_jsonl(user, progress, asst_old, user2, asst_new)
result = strip_for_upload(content)
lines = result.strip().split("\n")
# Progress should be stripped -> 4 entries remain
assert len(lines) == 4
# asst_old should be reparented since its parent (progress) was stripped
entries = [json.loads(line) for line in lines]
types = [e.get("type") for e in entries]
assert "progress" not in types
# Old assistant thinking stripped, new assistant thinking preserved
old_asst = next(
e for e in entries if e.get("message", {}).get("id") == "msg_old"
)
old_content = old_asst["message"]["content"]
old_types = [b["type"] for b in old_content if isinstance(b, dict)]
assert "thinking" not in old_types
assert "text" in old_types
new_asst = next(
e for e in entries if e.get("message", {}).get("id") == "msg_new"
)
new_content = new_asst["message"]["content"]
new_types = [b["type"] for b in new_content if isinstance(b, dict)]
assert "thinking" in new_types # last assistant preserved
def test_empty_content(self):
result = strip_for_upload("")
# Empty string produces a single empty line after split, resulting in "\n"
assert result.strip() == ""
def test_preserves_compact_summary(self):
compact = {
"type": "summary",
"uuid": "cs1",
"isCompactSummary": True,
"message": {"role": "user", "content": "summary"},
}
asst = {
"type": "assistant",
"uuid": "a1",
"parentUuid": "cs1",
"message": {"role": "assistant", "id": "msg1", "content": "answer"},
}
content = _make_jsonl(compact, asst)
result = strip_for_upload(content)
lines = result.strip().split("\n")
assert len(lines) == 2
def test_no_assistant_entries(self):
user = {
"type": "user",
"uuid": "u1",
"message": {"role": "user", "content": "hi"},
}
content = _make_jsonl(user)
result = strip_for_upload(content)
lines = result.strip().split("\n")
assert len(lines) == 1
# ---------------------------------------------------------------------------
# validate_transcript (additional edge cases)
# ---------------------------------------------------------------------------
class TestValidateTranscript:
def test_valid_with_assistant(self):
content = _make_jsonl(
USER_ENTRY,
ASST_ENTRY,
)
assert validate_transcript(content) is True
def test_none_returns_false(self):
assert validate_transcript(None) is False
def test_whitespace_only_returns_false(self):
assert validate_transcript(" \n ") is False
def test_no_assistant_returns_false(self):
content = _make_jsonl(USER_ENTRY)
assert validate_transcript(content) is False
def test_invalid_json_returns_false(self):
assert validate_transcript("not json\n") is False
def test_assistant_only_is_valid(self):
content = _make_jsonl(ASST_ENTRY)
assert validate_transcript(content) is True


@@ -147,6 +147,19 @@ MODEL_COST: dict[LlmModel, int] = {
LlmModel.KIMI_K2: 1,
LlmModel.QWEN3_235B_A22B_THINKING: 1,
LlmModel.QWEN3_CODER: 9,
# Z.ai (Zhipu) models
LlmModel.ZAI_GLM_4_32B: 1,
LlmModel.ZAI_GLM_4_5: 2,
LlmModel.ZAI_GLM_4_5_AIR: 1,
LlmModel.ZAI_GLM_4_5_AIR_FREE: 1,
LlmModel.ZAI_GLM_4_5V: 2,
LlmModel.ZAI_GLM_4_6: 1,
LlmModel.ZAI_GLM_4_6V: 1,
LlmModel.ZAI_GLM_4_7: 1,
LlmModel.ZAI_GLM_4_7_FLASH: 1,
LlmModel.ZAI_GLM_5: 2,
LlmModel.ZAI_GLM_5_TURBO: 4,
LlmModel.ZAI_GLM_5V_TURBO: 4,
# v0 by Vercel models
LlmModel.V0_1_5_MD: 1,
LlmModel.V0_1_5_LG: 2,


@@ -82,6 +82,28 @@ async def get_user_by_email(email: str) -> Optional[User]:
raise DatabaseError(f"Failed to get user by email {email}: {e}") from e
async def search_users(query: str, limit: int = 20) -> list[tuple[str, str | None]]:
"""Search users by partial email or name.
Returns a list of ``(user_id, email)`` tuples, up to *limit* results.
Searches the User table directly — no dependency on credit history.
"""
query = query.strip()
if not query or len(query) < 3:
return []
users = await prisma.user.find_many(
where={
"OR": [
{"email": {"contains": query, "mode": "insensitive"}},
{"name": {"contains": query, "mode": "insensitive"}},
],
},
take=limit,
order={"email": "asc"},
)
return [(u.id, u.email) for u in users]
async def update_user_email(user_id: str, email: str):
try:
# Get old email first for cache invalidation


@@ -121,10 +121,16 @@ def _make_hashable_key(
def _make_redis_key(key: tuple[Any, ...], func_name: str) -> str:
-"""Convert a hashable key tuple to a Redis key string."""
-# Ensure key is already hashable
-hashable_key = key if isinstance(key, tuple) else (key,)
-return f"cache:{func_name}:{hash(hashable_key)}"
+"""Convert a hashable key tuple to a Redis key string.
+Uses SHA-256 instead of Python's built-in ``hash()`` because ``hash()``
+is randomised per-process (``PYTHONHASHSEED``). In a multi-pod
+deployment every pod must derive the **same** Redis key for the same
+arguments, otherwise cache lookups and invalidations silently miss.
+"""
+key_bytes = repr(key).encode()
+digest = hashlib.sha256(key_bytes).hexdigest()
+return f"cache:{func_name}:{digest}"
@runtime_checkable
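The SHA-256 key derivation in this hunk can be tried in isolation. The sketch below mirrors the three lines from the diff under a hypothetical name (`make_redis_key_sketch`); the example arguments are made up.

```python
import hashlib
from typing import Any


def make_redis_key_sketch(key: tuple, func_name: str) -> str:
    """Derive a Redis key that is stable across processes."""
    # repr() of built-in values such as str, int and tuple does not depend
    # on PYTHONHASHSEED, so every process computes the same digest.
    digest = hashlib.sha256(repr(key).encode()).hexdigest()
    return f"cache:{func_name}:{digest}"


key_a = make_redis_key_sketch(("user-1", 42), "get_user")
key_b = make_redis_key_sketch(("user-1", 42), "get_user")
key_c = make_redis_key_sketch(("user-2", 42), "get_user")
```

One caveat worth keeping in mind: the digest is only as stable as `repr(key)`. Arguments whose `repr` embeds a memory address (objects without a custom `__repr__`) would still produce different keys per process.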


@@ -1,5 +1,6 @@
import contextlib
import logging
import os
from enum import Enum
from functools import wraps
from typing import Any, Awaitable, Callable, TypeVar
@@ -38,6 +39,7 @@ class Flag(str, Enum):
AGENT_ACTIVITY = "agent-activity"
ENABLE_PLATFORM_PAYMENT = "enable-platform-payment"
CHAT = "chat"
CHAT_MODE_OPTION = "chat-mode-option"
COPILOT_SDK = "copilot-sdk"
COPILOT_DAILY_TOKEN_LIMIT = "copilot-daily-token-limit"
COPILOT_WEEKLY_TOKEN_LIMIT = "copilot-weekly-token-limit"
@@ -165,6 +167,30 @@ async def get_feature_flag_value(
return default
def _env_flag_override(flag_key: Flag) -> bool | None:
"""Return a local override for ``flag_key`` from the environment.
Set ``FORCE_FLAG_<NAME>=true|false`` (``NAME`` = flag value with
``-`` → ``_``, upper-cased) to bypass LaunchDarkly for a single
flag in local dev or tests. Returns ``None`` when no override
is configured so the caller falls through to LaunchDarkly.
The ``NEXT_PUBLIC_FORCE_FLAG_<NAME>`` prefix is also accepted so a
single shared env var can toggle a flag across backend and
frontend (the frontend requires the ``NEXT_PUBLIC_`` prefix to
expose the value to the browser bundle).
Example: ``FORCE_FLAG_CHAT_MODE_OPTION=true`` forces
``Flag.CHAT_MODE_OPTION`` on regardless of LaunchDarkly.
"""
suffix = flag_key.value.upper().replace("-", "_")
for prefix in ("FORCE_FLAG_", "NEXT_PUBLIC_FORCE_FLAG_"):
raw = os.environ.get(prefix + suffix)
if raw is not None:
return raw.strip().lower() in ("1", "true", "yes", "on")
return None
async def is_feature_enabled(
flag_key: Flag,
user_id: str,
@@ -181,6 +207,11 @@ async def is_feature_enabled(
Returns:
True if feature is enabled, False otherwise
"""
override = _env_flag_override(flag_key)
if override is not None:
logger.debug(f"Feature flag {flag_key} overridden by env: {override}")
return override
result = await get_feature_flag_value(flag_key.value, user_id, default)
# If the result is already a boolean, return it
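The override lookup added in this file can be exercised without LaunchDarkly. The sketch below restates the same logic under a hypothetical name and takes the environment as an injectable mapping (an assumption made here so the example does not mutate `os.environ`); the real `_env_flag_override` accepts a `Flag` and reads `os.environ` directly.

```python
import os
from typing import Mapping, Optional

_TRUTHY = ("1", "true", "yes", "on")


def env_flag_override_sketch(
    flag_value: str, environ: Mapping[str, str] = os.environ
) -> Optional[bool]:
    """Return True/False if an override env var is set, else None."""
    suffix = flag_value.upper().replace("-", "_")
    # FORCE_FLAG_ is checked first, so it wins over NEXT_PUBLIC_FORCE_FLAG_.
    for prefix in ("FORCE_FLAG_", "NEXT_PUBLIC_FORCE_FLAG_"):
        raw = environ.get(prefix + suffix)
        if raw is not None:
            # Any set value that is not truthy counts as False, not "unset".
            return raw.strip().lower() in _TRUTHY
    return None


forced_on = env_flag_override_sketch(
    "chat-mode-option", {"FORCE_FLAG_CHAT_MODE_OPTION": " TRUE "}
)
```

Note the three-valued result: `None` means "no override, ask LaunchDarkly", whereas an unrecognised value such as `notaboolean` forces the flag off.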


@@ -4,6 +4,7 @@ from ldclient import LDClient
from backend.util.feature_flag import (
Flag,
_env_flag_override,
feature_flag,
is_feature_enabled,
mock_flag_variation,
@@ -111,3 +112,59 @@ async def test_is_feature_enabled_with_flag_enum(mocker):
assert result is True
# Should call with the flag's string value
mock_get_feature_flag_value.assert_called_once()
class TestEnvFlagOverride:
def test_force_flag_true(self, monkeypatch: pytest.MonkeyPatch):
monkeypatch.setenv("FORCE_FLAG_CHAT", "true")
assert _env_flag_override(Flag.CHAT) is True
def test_force_flag_false(self, monkeypatch: pytest.MonkeyPatch):
monkeypatch.setenv("FORCE_FLAG_CHAT", "false")
assert _env_flag_override(Flag.CHAT) is False
def test_next_public_prefix_true(self, monkeypatch: pytest.MonkeyPatch):
monkeypatch.setenv("NEXT_PUBLIC_FORCE_FLAG_CHAT", "true")
assert _env_flag_override(Flag.CHAT) is True
def test_unset_returns_none(self, monkeypatch: pytest.MonkeyPatch):
monkeypatch.delenv("FORCE_FLAG_CHAT", raising=False)
monkeypatch.delenv("NEXT_PUBLIC_FORCE_FLAG_CHAT", raising=False)
assert _env_flag_override(Flag.CHAT) is None
def test_invalid_value_returns_false(self, monkeypatch: pytest.MonkeyPatch):
monkeypatch.setenv("FORCE_FLAG_CHAT", "notaboolean")
assert _env_flag_override(Flag.CHAT) is False
def test_numeric_one_returns_true(self, monkeypatch: pytest.MonkeyPatch):
monkeypatch.setenv("FORCE_FLAG_CHAT", "1")
assert _env_flag_override(Flag.CHAT) is True
def test_yes_returns_true(self, monkeypatch: pytest.MonkeyPatch):
monkeypatch.setenv("FORCE_FLAG_CHAT", "yes")
assert _env_flag_override(Flag.CHAT) is True
def test_on_returns_true(self, monkeypatch: pytest.MonkeyPatch):
monkeypatch.setenv("FORCE_FLAG_CHAT", "on")
assert _env_flag_override(Flag.CHAT) is True
def test_hyphenated_flag_converts_to_underscore(
self, monkeypatch: pytest.MonkeyPatch
):
monkeypatch.setenv("FORCE_FLAG_CHAT_MODE_OPTION", "true")
assert _env_flag_override(Flag.CHAT_MODE_OPTION) is True
def test_force_flag_takes_precedence_over_next_public(
self, monkeypatch: pytest.MonkeyPatch
):
monkeypatch.setenv("FORCE_FLAG_CHAT", "false")
monkeypatch.setenv("NEXT_PUBLIC_FORCE_FLAG_CHAT", "true")
assert _env_flag_override(Flag.CHAT) is False
def test_whitespace_is_stripped(self, monkeypatch: pytest.MonkeyPatch):
monkeypatch.setenv("FORCE_FLAG_CHAT", " true ")
assert _env_flag_override(Flag.CHAT) is True
def test_case_insensitive_value(self, monkeypatch: pytest.MonkeyPatch):
monkeypatch.setenv("FORCE_FLAG_CHAT", "TRUE")
assert _env_flag_override(Flag.CHAT) is True
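The behaviour these tests pin down can be sketched as follows. This is a minimal illustration, not the project's actual `_env_flag_override`: the `Flag` members and their string values are hypothetical stand-ins, and treating any unrecognised value (including an empty string) as `False` is an assumption inferred from `test_invalid_value_returns_false`.

```python
import os
from enum import Enum
from typing import Optional


class Flag(str, Enum):
    # Hypothetical stand-ins for the real flag enum; only the values matter here.
    CHAT = "chat"
    CHAT_MODE_OPTION = "chat-mode-option"


_TRUTHY = {"1", "true", "yes", "on"}


def env_flag_override(flag: Flag) -> Optional[bool]:
    """Return a forced flag value from the environment, or None if unset."""
    # Hyphens in the flag key become underscores in the variable name.
    suffix = flag.value.upper().replace("-", "_")
    # FORCE_FLAG_* is checked first, so it wins over NEXT_PUBLIC_FORCE_FLAG_*.
    for name in (f"FORCE_FLAG_{suffix}", f"NEXT_PUBLIC_FORCE_FLAG_{suffix}"):
        raw = os.environ.get(name)
        if raw is not None:
            # Whitespace and case are ignored; anything not truthy is False.
            return raw.strip().lower() in _TRUTHY
    return None
```

Returning `None` for an unset variable lets the caller distinguish "no override" from "forced off" and fall through to the normal flag provider.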

View File

@@ -155,6 +155,7 @@ class WorkspaceManager:
path: Optional[str] = None,
mime_type: Optional[str] = None,
overwrite: bool = False,
metadata: Optional[dict] = None,
) -> WorkspaceFile:
"""
Write file to workspace.
@@ -168,6 +169,7 @@ class WorkspaceManager:
path: Virtual path (defaults to "/{filename}", session-scoped if session_id set)
mime_type: MIME type (auto-detected if not provided)
overwrite: Whether to overwrite existing file at path
metadata: Optional metadata dict (e.g., origin tracking)
Returns:
Created WorkspaceFile instance
@@ -246,6 +248,7 @@ class WorkspaceManager:
mime_type=mime_type,
size_bytes=len(content),
checksum=checksum,
metadata=metadata,
)
except UniqueViolationError:
if retries > 0:

View File

@@ -0,0 +1,5 @@
-- CreateEnum
CREATE TYPE "SubscriptionTier" AS ENUM ('FREE', 'PRO', 'BUSINESS', 'ENTERPRISE');
-- AlterTable: add subscriptionTier column with default PRO (beta testing)
ALTER TABLE "User" ADD COLUMN "subscriptionTier" "SubscriptionTier" NOT NULL DEFAULT 'PRO';

View File

@@ -40,6 +40,15 @@ model User {
timezone String @default("not-set")
// CoPilot subscription tier — controls rate-limit multipliers.
// Multipliers applied in get_global_rate_limits(): FREE=1x, PRO=5x, BUSINESS=20x, ENTERPRISE=60x.
// NOTE: @default(PRO) is intentional for the beta period — all existing and new
// users receive PRO-level (5x) rate limits by default. The Python-level constant
// DEFAULT_TIER=FREE (in copilot/rate_limit.py) acts as a code-level fallback when
// the DB value is NULL or unrecognised. At GA, a migration will flip the column
// default to FREE and batch-update users to their billing-derived tiers.
subscriptionTier SubscriptionTier @default(PRO)
// Relations
AgentGraphs AgentGraph[]
@@ -73,6 +82,13 @@ model User {
OAuthRefreshTokens OAuthRefreshToken[]
}
enum SubscriptionTier {
FREE
PRO
BUSINESS
ENTERPRISE
}
enum OnboardingStep {
// Introductory onboarding (Library)
WELCOME
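The tier comment in the schema above describes a multiplier lookup with a code-level fallback. A rough sketch of that logic, assuming hypothetical helper names (`resolve_tier`, `scaled_limits`); the real `get_global_rate_limits()` in `copilot/rate_limit.py` may be structured differently:

```python
from enum import Enum
from typing import Optional


class SubscriptionTier(str, Enum):
    FREE = "FREE"
    PRO = "PRO"
    BUSINESS = "BUSINESS"
    ENTERPRISE = "ENTERPRISE"


# Multipliers as stated in the schema comment.
TIER_MULTIPLIERS = {
    SubscriptionTier.FREE: 1,
    SubscriptionTier.PRO: 5,
    SubscriptionTier.BUSINESS: 20,
    SubscriptionTier.ENTERPRISE: 60,
}
DEFAULT_TIER = SubscriptionTier.FREE


def resolve_tier(raw: Optional[str]) -> SubscriptionTier:
    """Map a DB value to a tier; NULL or unrecognised values fall back to DEFAULT_TIER."""
    try:
        return SubscriptionTier(raw)
    except ValueError:
        return DEFAULT_TIER


def scaled_limits(
    base_daily: int, base_weekly: int, raw_tier: Optional[str]
) -> tuple[int, int]:
    """Scale hypothetical base limits by the tier multiplier."""
    multiplier = TIER_MULTIPLIERS[resolve_tier(raw_tier)]
    return base_daily * multiplier, base_weekly * multiplier
```

With illustrative base limits of 2,500,000 daily and 12,500,000 weekly tokens, `scaled_limits(2_500_000, 12_500_000, "PRO")` gives `(12_500_000, 62_500_000)`, while a `None` tier leaves the base values unchanged.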

View File

@@ -1,6 +1,7 @@
{
"daily_token_limit": 2500000,
"daily_tokens_used": 500000,
"tier": "FREE",
"user_email": "target@example.com",
"user_id": "5e53486c-cf57-477e-ba2a-cb02dc828e1c",
"weekly_token_limit": 12500000,

View File

@@ -1,6 +1,7 @@
{
"daily_token_limit": 2500000,
"daily_tokens_used": 0,
"tier": "FREE",
"user_email": "target@example.com",
"user_id": "5e53486c-cf57-477e-ba2a-cb02dc828e1c",
"weekly_token_limit": 12500000,

View File

@@ -1,6 +1,7 @@
{
"daily_token_limit": 2500000,
"daily_tokens_used": 0,
"tier": "FREE",
"user_email": "target@example.com",
"user_id": "5e53486c-cf57-477e-ba2a-cb02dc828e1c",
"weekly_token_limit": 12500000,

View File

@@ -140,7 +140,9 @@ class TestFixOrchestratorBlocks:
assert defaults["conversation_compaction"] is True
assert defaults["retry"] == 3
assert defaults["multiple_tool_calls"] is False
assert len(fixer.fixes_applied) == 4
assert defaults["execution_mode"] == "extended_thinking"
assert defaults["model"] == "claude-opus-4-6"
assert len(fixer.fixes_applied) == 6
def test_preserves_existing_values(self):
"""Existing user-set values are never overwritten."""
@@ -153,6 +155,8 @@ class TestFixOrchestratorBlocks:
"conversation_compaction": False,
"retry": 1,
"multiple_tool_calls": True,
"execution_mode": "built_in",
"model": "gpt-4o",
}
)
],
@@ -166,6 +170,8 @@ class TestFixOrchestratorBlocks:
assert defaults["conversation_compaction"] is False
assert defaults["retry"] == 1
assert defaults["multiple_tool_calls"] is True
assert defaults["execution_mode"] == "built_in"
assert defaults["model"] == "gpt-4o"
assert len(fixer.fixes_applied) == 0
def test_partial_defaults(self):
@@ -189,7 +195,9 @@ class TestFixOrchestratorBlocks:
assert defaults["conversation_compaction"] is True # filled
assert defaults["retry"] == 3 # filled
assert defaults["multiple_tool_calls"] is False # filled
assert len(fixer.fixes_applied) == 3
assert defaults["execution_mode"] == "extended_thinking" # filled
assert defaults["model"] == "claude-opus-4-6" # filled
assert len(fixer.fixes_applied) == 5
def test_skips_non_sdm_nodes(self):
"""Non-Orchestrator nodes are untouched."""
@@ -258,11 +266,13 @@ class TestFixOrchestratorBlocks:
result = fixer.fix_orchestrator_blocks(agent)
defaults = result["nodes"][0]["input_default"]
assert defaults["agent_mode_max_iterations"] == 10 # None default
assert defaults["conversation_compaction"] is True # None default
assert defaults["agent_mode_max_iterations"] == 10 # None -> default
assert defaults["conversation_compaction"] is True # None -> default
assert defaults["retry"] == 3 # kept
assert defaults["multiple_tool_calls"] is False # kept
assert len(fixer.fixes_applied) == 2
assert defaults["execution_mode"] == "extended_thinking" # filled
assert defaults["model"] == "claude-opus-4-6" # filled
assert len(fixer.fixes_applied) == 4
def test_multiple_sdm_nodes(self):
"""Multiple SDM nodes are all fixed independently."""
@@ -277,11 +287,11 @@ class TestFixOrchestratorBlocks:
result = fixer.fix_orchestrator_blocks(agent)
# First node: 3 defaults filled (agent_mode was already set)
# First node: 5 defaults filled (agent_mode was already set)
assert result["nodes"][0]["input_default"]["agent_mode_max_iterations"] == 3
# Second node: all 4 defaults filled
# Second node: all 6 defaults filled
assert result["nodes"][1]["input_default"]["agent_mode_max_iterations"] == 10
assert len(fixer.fixes_applied) == 7 # 3 + 4
assert len(fixer.fixes_applied) == 11 # 5 + 6
def test_registered_in_apply_all_fixes(self):
"""fix_orchestrator_blocks runs as part of apply_all_fixes."""
@@ -655,6 +665,7 @@ class TestOrchestratorE2EPipeline:
"conversation_compaction": {"type": "boolean"},
"retry": {"type": "integer"},
"multiple_tool_calls": {"type": "boolean"},
"execution_mode": {"type": "string"},
},
"required": ["prompt"],
},
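The counts asserted in these tests follow from a simple fill-if-missing rule over six defaults. A minimal sketch of that rule, assuming a hypothetical helper `fill_orchestrator_defaults` (the real `fix_orchestrator_blocks` works on whole agents and records its fixes on the fixer instance):

```python
from typing import Any

# Default values taken from the assertions in the tests above.
ORCHESTRATOR_DEFAULTS: dict[str, Any] = {
    "agent_mode_max_iterations": 10,
    "conversation_compaction": True,
    "retry": 3,
    "multiple_tool_calls": False,
    "execution_mode": "extended_thinking",
    "model": "claude-opus-4-6",
}


def fill_orchestrator_defaults(input_default: dict[str, Any]) -> list[str]:
    """Fill missing or None fields in place; return one message per fix applied."""
    fixes: list[str] = []
    for key, value in ORCHESTRATOR_DEFAULTS.items():
        # Compare against None rather than truthiness so that a user-set
        # False or 0 is preserved.
        if input_default.get(key) is None:
            input_default[key] = value
            fixes.append(f"set {key}={value!r}")
    return fixes
```

A node with only `agent_mode_max_iterations` set yields five fixes and an empty node yields six, which matches the `11  # 5 + 6` total in `test_multiple_sdm_nodes`.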

View File

@@ -0,0 +1,394 @@
"""Prompt regression tests AND functional tests for the dry-run verification loop.
NOTE: This file lives in test/copilot/ rather than being colocated with a
single source module because it is a cross-cutting test spanning multiple
modules: prompting.py, service.py, agent_generation_guide.md, and run_agent.py.
These tests verify that the create -> dry-run -> fix iterative workflow is
properly communicated through tool descriptions, the prompting supplement,
and the agent building guide.
After deduplication, the full dry-run workflow lives in the
agent_generation_guide.md only. The system prompt and individual tool
descriptions no longer repeat it — they keep a minimal footprint.
**Intentionally brittle**: the assertions check for specific substrings so
that accidental removal or rewording of key instructions is caught. If you
deliberately reword a prompt, update the corresponding assertion here.
--- Functional tests (added separately) ---
The dry-run loop is primarily a *prompt/guide* feature — the copilot reads
the guide and follows its instructions. There are no standalone Python
functions that implement "loop until passing" logic; the loop is driven by
the LLM. However, several pieces of real Python infrastructure make the
loop possible:
1. The ``run_agent`` and ``run_block`` OpenAI tool schemas expose a
``dry_run`` boolean parameter that the LLM must be able to set.
2. The ``RunAgentInput`` Pydantic model validates ``dry_run`` as a required
bool, so the executor can branch on it.
3. The ``_check_prerequisites`` method in ``RunAgentTool`` bypasses
credential and missing-input gates when ``dry_run=True``.
4. The guide documents the workflow steps in a specific order that the LLM
must follow: create/edit -> dry-run -> inspect -> fix -> repeat.
The functional test classes below exercise items 1-4 directly.
"""
import re
from pathlib import Path
from typing import Any, cast
import pytest
from openai.types.chat import ChatCompletionToolParam
from pydantic import ValidationError
from backend.copilot.prompting import get_sdk_supplement
from backend.copilot.service import DEFAULT_SYSTEM_PROMPT
from backend.copilot.tools import TOOL_REGISTRY
from backend.copilot.tools.run_agent import RunAgentInput
# Resolved once for the whole module so individual tests stay fast.
_SDK_SUPPLEMENT = get_sdk_supplement(use_e2b=False, cwd="/tmp/test")
# ---------------------------------------------------------------------------
# Prompt regression tests (original)
# ---------------------------------------------------------------------------
class TestSystemPromptBasics:
"""Verify the system prompt includes essential baseline content.
After deduplication, the dry-run workflow lives only in the guide.
The system prompt carries tone and personality only.
"""
def test_mentions_automations(self):
assert "automations" in DEFAULT_SYSTEM_PROMPT.lower()
def test_mentions_action_oriented(self):
assert "action-oriented" in DEFAULT_SYSTEM_PROMPT.lower()
class TestToolDescriptionsDryRunLoop:
"""Verify tool descriptions and parameters related to the dry-run loop."""
def test_get_agent_building_guide_mentions_workflow(self):
desc = TOOL_REGISTRY["get_agent_building_guide"].description
assert "dry-run" in desc.lower()
def test_run_agent_dry_run_param_exists_and_is_boolean(self):
schema = TOOL_REGISTRY["run_agent"].as_openai_tool()
params = cast(dict[str, Any], schema["function"].get("parameters", {}))
assert "dry_run" in params["properties"]
assert params["properties"]["dry_run"]["type"] == "boolean"
def test_run_agent_dry_run_param_mentions_simulation(self):
"""After deduplication the dry_run param description mentions simulation."""
schema = TOOL_REGISTRY["run_agent"].as_openai_tool()
params = cast(dict[str, Any], schema["function"].get("parameters", {}))
dry_run_desc = params["properties"]["dry_run"]["description"]
assert "simulat" in dry_run_desc.lower()
class TestPromptingSupplementContent:
"""Verify the prompting supplement (via get_sdk_supplement) includes
essential shared tool notes. After deduplication, the dry-run workflow
lives only in the guide; the supplement carries storage, file-handling,
and tool-discovery notes.
"""
def test_includes_tool_discovery_priority(self):
assert "Tool Discovery Priority" in _SDK_SUPPLEMENT
def test_includes_find_block_first(self):
assert "find_block" in _SDK_SUPPLEMENT
def test_includes_send_authenticated_web_request(self):
assert "SendAuthenticatedWebRequestBlock" in _SDK_SUPPLEMENT
class TestAgentBuildingGuideDryRunLoop:
"""Verify the agent building guide includes the dry-run loop."""
@pytest.fixture
def guide_content(self):
guide_path = (
Path(__file__).resolve().parent.parent.parent
/ "backend"
/ "copilot"
/ "sdk"
/ "agent_generation_guide.md"
)
return guide_path.read_text(encoding="utf-8")
def test_has_dry_run_verification_section(self, guide_content):
assert "REQUIRED: Dry-Run Verification Loop" in guide_content
def test_workflow_includes_dry_run_step(self, guide_content):
assert "dry_run=True" in guide_content
def test_mentions_good_vs_bad_output(self, guide_content):
assert "**Good output**" in guide_content
assert "**Bad output**" in guide_content
def test_mentions_repeat_until_pass(self, guide_content):
lower = guide_content.lower()
assert "repeat" in lower
assert "clearly unfixable" in lower
def test_mentions_wait_for_result(self, guide_content):
assert "wait_for_result=120" in guide_content
def test_mentions_view_agent_output(self, guide_content):
assert "view_agent_output" in guide_content
def test_workflow_has_dry_run_and_inspect_steps(self, guide_content):
assert "**Dry-run**" in guide_content
assert "**Inspect & fix**" in guide_content
# ---------------------------------------------------------------------------
# Functional tests: tool schema validation
# ---------------------------------------------------------------------------
class TestRunAgentToolSchema:
"""Validate the run_agent OpenAI tool schema exposes dry_run correctly.
These go beyond substring checks — they verify the full schema structure
that the LLM receives, ensuring the parameter is well-formed and will be
parsed correctly by OpenAI function-calling.
"""
@pytest.fixture
def schema(self) -> ChatCompletionToolParam:
return TOOL_REGISTRY["run_agent"].as_openai_tool()
def test_schema_is_valid_openai_tool(self, schema: ChatCompletionToolParam):
"""The schema has the required top-level OpenAI structure."""
assert schema["type"] == "function"
assert "function" in schema
func = schema["function"]
assert "name" in func
assert "description" in func
assert "parameters" in func
assert func["name"] == "run_agent"
def test_dry_run_is_required(self, schema: ChatCompletionToolParam):
"""dry_run must be in 'required' so the LLM always provides it explicitly."""
params = cast(dict[str, Any], schema["function"].get("parameters", {}))
required = params.get("required", [])
assert "dry_run" in required
def test_dry_run_is_boolean_type(self, schema: ChatCompletionToolParam):
"""dry_run must be typed as boolean so the LLM generates true/false."""
params = cast(dict[str, Any], schema["function"].get("parameters", {}))
assert params["properties"]["dry_run"]["type"] == "boolean"
def test_dry_run_description_is_nonempty(self, schema: ChatCompletionToolParam):
"""The description must be present and substantive for LLM guidance."""
params = cast(dict[str, Any], schema["function"].get("parameters", {}))
desc = params["properties"]["dry_run"]["description"]
assert isinstance(desc, str)
assert len(desc) > 10, "Description too short to guide the LLM"
def test_wait_for_result_coexists_with_dry_run(
self, schema: ChatCompletionToolParam
):
"""wait_for_result must also be present — the guide instructs the LLM
to pass both dry_run=True and wait_for_result=120 together."""
params = cast(dict[str, Any], schema["function"].get("parameters", {}))
assert "wait_for_result" in params["properties"]
assert params["properties"]["wait_for_result"]["type"] == "integer"
class TestRunBlockToolSchema:
"""Validate the run_block OpenAI tool schema exposes dry_run correctly."""
@pytest.fixture
def schema(self) -> ChatCompletionToolParam:
return TOOL_REGISTRY["run_block"].as_openai_tool()
def test_schema_is_valid_openai_tool(self, schema: ChatCompletionToolParam):
assert schema["type"] == "function"
func = schema["function"]
assert func["name"] == "run_block"
assert "parameters" in func
def test_dry_run_exists_and_is_boolean(self, schema: ChatCompletionToolParam):
params = cast(dict[str, Any], schema["function"].get("parameters", {}))
props = params["properties"]
assert "dry_run" in props
assert props["dry_run"]["type"] == "boolean"
def test_dry_run_is_required(self, schema: ChatCompletionToolParam):
"""dry_run must be required — along with block_id and input_data."""
params = cast(dict[str, Any], schema["function"].get("parameters", {}))
required = params.get("required", [])
assert "dry_run" in required
assert "block_id" in required
assert "input_data" in required
def test_dry_run_description_mentions_preview(
self, schema: ChatCompletionToolParam
):
params = cast(dict[str, Any], schema["function"].get("parameters", {}))
desc = params["properties"]["dry_run"]["description"]
assert isinstance(desc, str)
assert (
"preview mode" in desc.lower()
), "run_block dry_run description should mention preview mode"
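For orientation, a tool schema that would satisfy the `run_block` checks above might look like the sketch below. The dict is hypothetical: the property names and the "preview mode" wording come from the tests, but the descriptions are invented here and the real output of `as_openai_tool()` is not shown in this diff.

```python
from typing import Any

# Hypothetical schema: names match the tests above, descriptions are invented.
RUN_BLOCK_TOOL: dict[str, Any] = {
    "type": "function",
    "function": {
        "name": "run_block",
        "description": "Run a single block with the given inputs.",
        "parameters": {
            "type": "object",
            "properties": {
                "block_id": {
                    "type": "string",
                    "description": "ID of the block to run.",
                },
                "input_data": {
                    "type": "object",
                    "description": "Inputs for the block.",
                },
                "dry_run": {
                    "type": "boolean",
                    "description": "Preview mode: simulate the block without side effects.",
                },
            },
            "required": ["block_id", "input_data", "dry_run"],
        },
    },
}


def missing_required(tool: dict[str, Any], names: list[str]) -> list[str]:
    """Return the names that the tool schema does not list as required."""
    params = tool["function"].get("parameters", {})
    required = params.get("required", [])
    return [name for name in names if name not in required]
```

Listing `dry_run` under `required` means the model has to state its intent on every call instead of relying on an implicit default.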
# ---------------------------------------------------------------------------
# Functional tests: RunAgentInput Pydantic model
# ---------------------------------------------------------------------------
class TestRunAgentInputModel:
"""Validate RunAgentInput Pydantic model handles dry_run correctly.
The executor reads dry_run from this model, so it must parse and validate
properly.
"""
def test_dry_run_accepts_true(self):
model = RunAgentInput(username_agent_slug="user/agent", dry_run=True)
assert model.dry_run is True
def test_dry_run_accepts_false(self):
"""dry_run=False must be accepted when provided explicitly."""
model = RunAgentInput(username_agent_slug="user/agent", dry_run=False)
assert model.dry_run is False
def test_dry_run_coerces_truthy_int(self):
"""Pydantic bool fields coerce int 1 to True."""
model = RunAgentInput(username_agent_slug="user/agent", dry_run=1) # type: ignore[arg-type]
assert model.dry_run is True
def test_dry_run_coerces_falsy_int(self):
"""Pydantic bool fields coerce int 0 to False."""
model = RunAgentInput(username_agent_slug="user/agent", dry_run=0) # type: ignore[arg-type]
assert model.dry_run is False
def test_dry_run_with_wait_for_result(self):
"""The guide instructs passing both dry_run=True and wait_for_result=120.
The model must accept this combination."""
model = RunAgentInput(
username_agent_slug="user/agent",
dry_run=True,
wait_for_result=120,
)
assert model.dry_run is True
assert model.wait_for_result == 120
def test_wait_for_result_upper_bound(self):
"""wait_for_result is bounded at 300 seconds (ge=0, le=300)."""
with pytest.raises(ValidationError):
RunAgentInput(
username_agent_slug="user/agent",
dry_run=True,
wait_for_result=301,
)
def test_string_fields_are_stripped(self):
"""The strip_strings validator should strip whitespace from string fields."""
model = RunAgentInput(username_agent_slug=" user/agent ", dry_run=True)
assert model.username_agent_slug == "user/agent"
# ---------------------------------------------------------------------------
# Functional tests: guide documents the correct workflow ordering
# ---------------------------------------------------------------------------
class TestGuideWorkflowOrdering:
"""Verify the guide documents workflow steps in the correct order.
The LLM must see: create/edit -> dry-run -> inspect -> fix -> repeat.
If these steps are reordered, the copilot would follow the wrong sequence.
These tests verify *ordering*, not just presence.
"""
@pytest.fixture
def guide_content(self) -> str:
guide_path = (
Path(__file__).resolve().parent.parent.parent
/ "backend"
/ "copilot"
/ "sdk"
/ "agent_generation_guide.md"
)
return guide_path.read_text(encoding="utf-8")
def test_create_before_dry_run_in_workflow(self, guide_content: str):
"""Step 7 (Save/create_agent) must appear before step 8 (Dry-run)."""
create_pos = guide_content.index("create_agent")
dry_run_pos = guide_content.index("dry_run=True")
assert (
create_pos < dry_run_pos
), "create_agent must appear before dry_run=True in the workflow"
def test_dry_run_before_inspect_in_verification_section(self, guide_content: str):
"""In the verification loop section, Dry-run step must come before
Inspect & fix step."""
section_start = guide_content.index("REQUIRED: Dry-Run Verification Loop")
section = guide_content[section_start:]
dry_run_pos = section.index("**Dry-run**")
inspect_pos = section.index("**Inspect")
assert (
dry_run_pos < inspect_pos
), "Dry-run step must come before Inspect & fix in the verification loop"
def test_fix_before_repeat_in_verification_section(self, guide_content: str):
"""The Fix step must come before the Repeat step."""
section_start = guide_content.index("REQUIRED: Dry-Run Verification Loop")
section = guide_content[section_start:]
fix_pos = section.index("**Fix**")
repeat_pos = section.index("**Repeat**")
assert fix_pos < repeat_pos
def test_good_output_before_bad_output(self, guide_content: str):
"""Good output examples should be listed before bad output examples,
so the LLM sees the success pattern first."""
good_pos = guide_content.index("**Good output**")
bad_pos = guide_content.index("**Bad output**")
assert good_pos < bad_pos
def test_numbered_steps_in_verification_section(self, guide_content: str):
"""The step-by-step workflow should have numbered steps 1-5."""
section_start = guide_content.index("Step-by-step workflow")
section = guide_content[section_start:]
# The section should contain numbered items 1 through 5
for step_num in range(1, 6):
assert (
f"{step_num}. " in section
), f"Missing numbered step {step_num} in verification workflow"
def test_workflow_steps_are_in_numbered_order(self, guide_content: str):
"""The main workflow steps (1-9) must appear in ascending order."""
# Extract the numbered workflow items from the top-level workflow section
workflow_start = guide_content.index("### Workflow for Creating/Editing Agents")
# End at the next ### section
next_section = guide_content.index("### Agent JSON Structure")
workflow_section = guide_content[workflow_start:next_section]
step_positions = []
for step_num in range(1, 10):
pattern = rf"^{step_num}\.\s"
match = re.search(pattern, workflow_section, re.MULTILINE)
if match:
step_positions.append((step_num, match.start()))
# Verify at least steps 1-9 are present and in order
assert (
len(step_positions) >= 9
), f"Expected 9 workflow steps, found {len(step_positions)}"
for i in range(1, len(step_positions)):
prev_num, prev_pos = step_positions[i - 1]
curr_num, curr_pos = step_positions[i]
assert prev_pos < curr_pos, (
f"Step {prev_num} (pos {prev_pos}) should appear before "
f"step {curr_num} (pos {curr_pos})"
)

View File

@@ -98,6 +98,7 @@ services:
- CLAMD_CONF_MaxScanSize=100M
- CLAMD_CONF_MaxThreads=12
- CLAMD_CONF_ReadTimeout=300
- CLAMD_CONF_TCPAddr=0.0.0.0
healthcheck:
test: ["CMD-SHELL", "clamdscan --version || exit 1"]
interval: 30s

View File

@@ -40,6 +40,8 @@ After making **any** code changes in the frontend, you MUST run the following co
Do NOT skip these steps. If any command reports errors, fix them and re-run until clean. Only then may you consider the task complete. If typing keeps failing, stop and ask the user.
4. `pnpm test:unit` — run integration tests; fix any failures
### Code Style
- Fully capitalize acronyms in symbols, e.g. `graphID`, `useBackendAPI`
@@ -62,7 +64,7 @@ Do NOT skip these steps. If any command reports errors, fix them and re-run unti
- **Icons**: Phosphor Icons only
- **Feature Flags**: LaunchDarkly integration
- **Error Handling**: ErrorCard for render errors, toast for mutations, Sentry for exceptions
- **Testing**: Playwright for E2E, Storybook for component development
- **Testing**: Vitest + React Testing Library + MSW for integration tests (primary), Playwright for E2E, Storybook for visual
## Environment Configuration
@@ -84,7 +86,12 @@ See @CONTRIBUTING.md for complete patterns. Quick reference:
- Regenerate with `pnpm generate:api`
- Pattern: `use{Method}{Version}{OperationName}`
4. **Styling**: Tailwind CSS only, use design tokens, Phosphor Icons only
5. **Testing**: Add Storybook stories for new components, Playwright for E2E. When fixing a bug, write a failing Playwright test first (use `.fixme` annotation), implement the fix, then remove the annotation.
5. **Testing**: Integration tests are the default (~90%). See `TESTING.md` for full details.
- **New pages/features**: Write integration tests in `__tests__/` next to `page.tsx` using Vitest + RTL + MSW
- **API mocking**: Use Orval-generated MSW handlers from `@/app/api/__generated__/endpoints/{tag}/{tag}.msw.ts`
- **Run**: `pnpm test:unit` (integration/unit), `pnpm test` (Playwright E2E)
- **Storybook**: For design system components in `src/components/`
- **TDD**: Write a failing test first, implement, then verify
6. **Code conventions**:
- Use function declarations (not arrow functions) for components/handlers
- Do not use `useCallback` or `useMemo` unless asked to optimise a given function

View File

@@ -747,9 +747,65 @@ export function CreateButton() {
---
## 🧪 Testing & Storybook
## 🧪 Testing
- See `TESTING.md` for Playwright setup, E2E data seeding, and Storybook usage.
See `TESTING.md` for full details. Key principles:
### Integration tests are the default (~90% of tests)
We test at the **page level**: render the page with React Testing Library, mock API requests with MSW (auto-generated by Orval), and assert with testing-library queries.
```bash
pnpm test:unit # run integration/unit tests
pnpm test:unit:watch # watch mode
```
### Test file location
Tests live in `__tests__/` next to the page or component:
```
app/(platform)/library/
__tests__/
main.test.tsx # main page rendering & interactions
search.test.tsx # search-specific behavior
components/
page.tsx
useLibraryPage.ts
```
### Writing a test
1. Render the page using `render()` from `@/tests/integrations/test-utils`
2. Mock API responses using Orval-generated MSW handlers from `@/app/api/__generated__/endpoints/{tag}/{tag}.msw.ts`
3. Assert with `screen.findByText`, `screen.getByRole`, etc.
```tsx
import { render, screen } from "@/tests/integrations/test-utils";
import { server } from "@/mocks/mock-server";
import { getGetV2ListLibraryAgentsMockHandler200 } from "@/app/api/__generated__/endpoints/library/library.msw";
import LibraryPage from "../page";
test("renders agent list", async () => {
server.use(getGetV2ListLibraryAgentsMockHandler200());
render(<LibraryPage />);
expect(await screen.findByText("My Agents")).toBeDefined();
});
```
### When to use each test type
| Type | When |
| ------------------------------------ | --------------------------------------------- |
| **Integration (Vitest + RTL + MSW)** | Default for all new pages and features |
| **E2E (Playwright)** | Auth flows, payments, cross-page navigation |
| **Storybook** | Design system components in `src/components/` |
### TDD workflow
1. Write a failing test (integration test or Playwright with `.fixme`)
2. Implement the fix/feature
3. Remove annotations and run the full suite
---
@@ -763,8 +819,10 @@ Common scripts (see `package.json` for full list):
- `pnpm lint` — ESLint + Prettier check
- `pnpm format` — Format code
- `pnpm types` — Type-check
- `pnpm test:unit` — Run integration/unit tests (Vitest + RTL + MSW)
- `pnpm test:unit:watch` — Watch mode for integration tests
- `pnpm test` — Run Playwright E2E tests
- `pnpm storybook` — Run Storybook
- `pnpm test` — Run Playwright tests
Generated API client:
@@ -780,6 +838,7 @@ Generated API client:
- Logic is separated into `use*.ts` and `helpers.ts` when non-trivial
- Reusable logic extracted to `src/services/` or `src/lib/utils.ts` when appropriate
- Navigation uses the Next.js router
- Integration tests added/updated for new pages and features (`pnpm test:unit`)
- Lint, format, type-check, and tests pass locally
- Stories updated/added if UI changed; verified in Storybook

View File

@@ -12,6 +12,10 @@ COPY autogpt_platform/frontend/ .
# Allow CI to opt-in to Playwright test build-time flags
ARG NEXT_PUBLIC_PW_TEST="false"
ENV NEXT_PUBLIC_PW_TEST=$NEXT_PUBLIC_PW_TEST
# Allow CI to opt-in to browser sourcemaps for coverage path resolution.
# Keep Docker builds defaulting to false to avoid the memory hit.
ARG NEXT_PUBLIC_SOURCEMAPS="false"
ENV NEXT_PUBLIC_SOURCEMAPS=$NEXT_PUBLIC_SOURCEMAPS
ENV NODE_ENV="production"
# Merge env files appropriately based on environment
RUN if [ -f .env.production ]; then \
@@ -25,10 +29,6 @@ RUN if [ -f .env.production ]; then \
cp .env.default .env; \
fi
RUN pnpm run generate:api
# Disable source-map generation in Docker builds to halve webpack memory usage.
# Source maps are only useful when SENTRY_AUTH_TOKEN is set (Vercel deploys);
# the Docker image never uploads them, so generating them just wastes RAM.
ENV NEXT_PUBLIC_SOURCEMAPS="false"
# In CI, we want NEXT_PUBLIC_PW_TEST=true during build so Next.js inlines it
RUN if [ "$NEXT_PUBLIC_PW_TEST" = "true" ]; then NEXT_PUBLIC_PW_TEST=true NODE_OPTIONS="--max-old-space-size=8192" pnpm build; else NODE_OPTIONS="--max-old-space-size=8192" pnpm build; fi

View File

@@ -1,57 +1,168 @@
# Frontend Testing 🧪
# Frontend Testing
## Quick Start (local) 🚀
## Testing Strategy
| Type | Tool | Speed | When to use |
| ------------------------- | ------------------------------------ | ------------- | ----------------------------------------------------- |
| **Integration (primary)** | Vitest + React Testing Library + MSW | Fast (~100ms) | ~90% of tests — page-level rendering with mocked API |
| **E2E** | Playwright | Slow (~5s) | Critical flows: auth, payments, cross-page navigation |
| **Visual** | Storybook + Chromatic | N/A | Design system components |
**Integration tests are the default.** Since most of our code is client-only, we test at the page level: render the page with React Testing Library, mock API requests with MSW (handlers auto-generated by Orval), and assert with testing-library queries.
## Integration Tests (Vitest + RTL + MSW)
### Running
```bash
pnpm test:unit # run all integration/unit tests with coverage
pnpm test:unit:watch # watch mode for development
```
### File location
Tests live in a `__tests__/` folder next to the page or component they test:
```
app/(platform)/library/
__tests__/
main.test.tsx # tests the main page rendering & interactions
search.test.tsx # tests search-specific behavior
components/
AgentCard/
AgentCard.tsx
__tests__/
AgentCard.test.tsx # only when testing the component in isolation
page.tsx
useLibraryPage.ts
```
**Naming**: use descriptive names like `main.test.tsx`, `search.test.tsx`, `filters.test.tsx` — not `page.test.tsx` or `index.test.tsx`.
### Writing an integration test
1. **Render the page** using the custom `render()` from `@/tests/integrations/test-utils` (wraps providers)
2. **Mock API responses** using Orval-generated MSW handlers from `@/app/api/__generated__/endpoints/{tag}/{tag}.msw.ts`
3. **Assert** with React Testing Library queries (`screen.findByText`, `screen.getByRole`, etc.)
```tsx
import { render, screen } from "@/tests/integrations/test-utils";
import { server } from "@/mocks/mock-server";
import {
getGetV2ListLibraryAgentsMockHandler200,
getGetV2ListLibraryAgentsMockHandler422,
} from "@/app/api/__generated__/endpoints/library/library.msw";
import LibraryPage from "../page";
describe("LibraryPage", () => {
test("renders agent list from API", async () => {
server.use(getGetV2ListLibraryAgentsMockHandler200());
render(<LibraryPage />);
expect(await screen.findByText("My Agents")).toBeDefined();
});
test("shows error state on API failure", async () => {
server.use(getGetV2ListLibraryAgentsMockHandler422());
render(<LibraryPage />);
expect(await screen.findByText(/error/i)).toBeDefined();
});
});
```
### MSW handlers
Orval generates typed MSW handlers for every endpoint and HTTP status code:
- `getGetV2ListLibraryAgentsMockHandler200()` — success response with faker data
- `getGetV2ListLibraryAgentsMockHandler422()` — validation error response
- `getGetV2ListLibraryAgentsMockHandler401()` — unauthorized response
To override with custom data, pass a resolver:
```tsx
import { http, HttpResponse } from "msw";
server.use(
http.get("http://localhost:3000/api/proxy/api/library/agents", () => {
return HttpResponse.json({
agents: [{ id: "1", name: "My Agent" }],
pagination: { total: 1 },
});
}),
);
```
All handlers are aggregated in `src/mocks/mock-handlers.ts` and the MSW server is set up in `src/mocks/mock-server.ts`.
### Test utilities
- **`@/tests/integrations/test-utils`** — custom `render()` that wraps components with `QueryClientProvider`, `BackendAPIProvider`, `OnboardingProvider`, `NuqsTestingAdapter`, and `TooltipProvider`, so query-state hooks and tooltips work out of the box in page-level tests
- **`@/tests/integrations/setup-nextjs-mocks`** — mocks for `next/navigation`, `next/image`, `next/headers`, `next/link`
- **`@/tests/integrations/mock-supabase-request`** — mocks Supabase auth (returns null user by default)
### What to test at page level
- Page renders with API data (happy path)
- Loading and error states
- User interactions that trigger mutations (clicks, form submissions)
- Conditional rendering based on API responses
- Search, filtering, pagination behavior
### When to test a component in isolation
Only when the component has complex internal logic that is hard to exercise through the page test. Prefer page-level tests as the default.
## E2E Tests (Playwright)
### Running
```bash
pnpm test # build + run all Playwright tests
pnpm test-ui # run with Playwright UI
pnpm test:no-build # run against a running dev server
```
### Setup
1. Start the backend + Supabase stack:
- From `autogpt_platform`: `docker compose --profile local up deps_backend -d`
- Or run the full stack: `docker compose up -d`
2. Seed rich E2E data (creates `test123@gmail.com` with library agents):
- From `autogpt_platform/backend`: `poetry run python test/e2e_test_data.py`
3. Run Playwright:
- From `autogpt_platform/frontend`: `pnpm test` or `pnpm test-ui`
## How Playwright setup works 🎭
### How Playwright setup works
- Playwright runs from `frontend/playwright.config.ts` with a global setup step.
- The global setup creates a user pool via the real signup UI and stores it in `frontend/.auth/user-pool.json`.
- Most tests call `getTestUser()` (from `src/tests/utils/auth.ts`) which pulls a random user from that pool.
- These users have no library agents; they are users that have just "signed up" on the platform, so some tests instead use users created via script (see below) that come with more data.
- Playwright runs from `frontend/playwright.config.ts` with a global setup step
- Global setup creates a user pool via the real signup UI, stored in `frontend/.auth/user-pool.json`
- `getTestUser()` (from `src/tests/utils/auth.ts`) pulls a random user from the pool
- `getTestUserWithLibraryAgents()` uses the rich user created by the data script
## Test users 👤
### Test users
- **User pool (basic users)**
Created automatically by the Playwright global setup through `/signup`.
Used by `getTestUser()` in `src/tests/utils/auth.ts`.
- **User pool (basic users)** — created automatically by Playwright global setup. Used by `getTestUser()`
- **Rich user with library agents** — created by `backend/test/e2e_test_data.py`. Used by `getTestUserWithLibraryAgents()`
- **Rich user with library agents**
Created by `backend/test/e2e_test_data.py`.
Accessed via `getTestUserWithLibraryAgents()` in `src/tests/credentials/index.ts`.
Use the rich user when a test needs existing library agents (e.g. `library.spec.ts`).
## Resetting or wiping the DB 🔁
### Resetting the DB
If you reset the Docker DB and logins start failing:
1. Delete `frontend/.auth/user-pool.json` so the pool is regenerated.
2. Re-run the E2E data script to recreate the rich user + library agents:
- `poetry run python test/e2e_test_data.py`
1. Delete `frontend/.auth/user-pool.json`
2. Re-run `poetry run python test/e2e_test_data.py`
## Storybook 📚
## Storybook
## Flow diagram 🗺️
- `pnpm storybook` — run locally
- `pnpm build-storybook` — build static
- `pnpm test-storybook` — CI runner
- When changing components in `src/components`, update or add stories and verify in Storybook/Chromatic
```mermaid
flowchart TD
A[Start Docker stack] --> B[Run e2e_test_data.py]
B --> C[Run Playwright tests]
C --> D[Global setup creates user pool]
D --> E{Test needs rich data?}
E -->|No| F[getTestUser from user pool]
E -->|Yes| G[getTestUserWithLibraryAgents]
```
## TDD Workflow
- `pnpm storybook` Run Storybook locally
- `pnpm build-storybook` Build a static Storybook
- CI runner: `pnpm test-storybook`
- When changing components in `src/components`, update or add stories and verify in Storybook/Chromatic.
When fixing a bug or adding a feature:
1. **Write a failing test first** — for integration tests, write the test and confirm it fails. For Playwright, use `.fixme` annotation
2. **Implement the fix/feature** — write the minimal code to make the test pass
3. **Remove annotations** — once passing, remove `.fixme` and run the full suite

@@ -161,6 +161,7 @@
"eslint-plugin-storybook": "9.1.5",
"happy-dom": "20.3.4",
"import-in-the-middle": "2.0.2",
"monocart-reporter": "2.10.0",
"msw": "2.11.6",
"msw-storybook-addon": "2.0.6",
"orval": "7.13.0",

@@ -5,10 +5,57 @@ import { defineConfig, devices } from "@playwright/test";
* https://github.com/motdotla/dotenv
*/
import dotenv from "dotenv";
import fs from "fs";
import path from "path";
dotenv.config({ path: path.resolve(__dirname, ".env") });
dotenv.config({ path: path.resolve(__dirname, "../backend/.env") });
const frontendRoot = __dirname.replaceAll("\\", "/");
// Directory where CI copies .next/static from the Docker container
const staticCoverageDir = path.resolve(__dirname, ".next-static-coverage");
function normalizeCoverageSourcePath(filePath: string) {
const normalizedFilePath = filePath.replaceAll("\\", "/");
const withoutWebpackPrefix = normalizedFilePath.replace(
/^webpack:\/\/_N_E\//,
"",
);
if (withoutWebpackPrefix.startsWith("./")) {
return withoutWebpackPrefix.slice(2);
}
if (withoutWebpackPrefix.startsWith(frontendRoot)) {
return path.posix.relative(frontendRoot, withoutWebpackPrefix);
}
return withoutWebpackPrefix;
}
// Resolve source maps from the copied .next/static directory.
// Cache parsed results to avoid repeated disk reads during report generation.
const sourceMapCache = new Map<string, object | undefined>();
function resolveSourceMap(sourcePath: string) {
// sourcePath is the sourceMappingURL, e.g.:
// "http://localhost:3000/_next/static/chunks/abc123.js.map"
const match = sourcePath.match(/_next\/static\/(.+)$/);
if (!match) return undefined;
const mapFile = path.join(staticCoverageDir, match[1]);
if (sourceMapCache.has(mapFile)) return sourceMapCache.get(mapFile);
try {
const result = JSON.parse(fs.readFileSync(mapFile, "utf8")) as object;
sourceMapCache.set(mapFile, result);
return result;
} catch {
sourceMapCache.set(mapFile, undefined);
return undefined;
}
}
export default defineConfig({
testDir: "./src/tests",
/* Global setup file that runs before all tests */
@@ -22,7 +69,30 @@ export default defineConfig({
/* use more workers on CI. */
workers: process.env.CI ? 4 : undefined,
/* Reporter to use. See https://playwright.dev/docs/test-reporters */
reporter: [["list"], ["html", { open: "never" }]],
reporter: [
["list"],
["html", { open: "never" }],
[
"monocart-reporter",
{
name: "E2E Coverage Report",
outputFile: "./coverage/e2e/report.html",
coverage: {
reports: ["cobertura"],
outputDir: "./coverage/e2e",
entryFilter: (entry: { url: string }) =>
entry.url.includes("/_next/static/") &&
!entry.url.includes("node_modules"),
sourceFilter: (sourcePath: string) =>
sourcePath.includes("src/") && !sourcePath.includes("node_modules"),
sourcePath: (filePath: string) =>
normalizeCoverageSourcePath(filePath),
sourceMapResolver: (sourcePath: string) =>
resolveSourceMap(sourcePath),
},
},
],
],
/* Shared settings for all the projects below. See https://playwright.dev/docs/api/class-testoptions. */
use: {
/* Base URL to use in actions like `await page.goto('/')`. */
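
The two helpers added in this hunk can be tried out in isolation. The following is a minimal sketch, not the config's actual code: `normalizeSourcePath` takes the root as a parameter and strips it with plain string operations instead of `path.posix.relative`, `staticRelativePath` is a hypothetical name for the regex step inside `resolveSourceMap`, and all sample paths and URLs are made up for illustration.

```typescript
// Hypothetical, dependency-free approximation of the coverage path helpers.
// `root` stands in for the config's `frontendRoot`.
function normalizeSourcePath(filePath: string, root: string): string {
  // Use forward slashes so Windows and POSIX paths are handled alike.
  const normalized = filePath.replace(/\\/g, "/");
  // Drop the webpack virtual-module prefix if present.
  const stripped = normalized.replace(/^webpack:\/\/_N_E\//, "");
  if (stripped.startsWith("./")) {
    return stripped.slice(2);
  }
  // Simplification: the real config calls path.posix.relative() here.
  if (stripped.startsWith(root + "/")) {
    return stripped.slice(root.length + 1);
  }
  return stripped;
}

// Extract the part of a sourceMappingURL that follows "_next/static/".
function staticRelativePath(url: string): string | undefined {
  const match = url.match(/_next\/static\/(.+)$/);
  return match ? match[1] : undefined;
}

const root = "/repo/frontend"; // made-up project root
const a = normalizeSourcePath("webpack://_N_E/./src/app/page.tsx", root);
const b = normalizeSourcePath("/repo/frontend/src/lib/utils.ts", root);
const c = normalizeSourcePath("C:\\repo\\src\\index.ts", root);
const d = staticRelativePath(
  "http://localhost:3000/_next/static/chunks/abc123.js.map",
);
console.log(a, b, c, d);
```

Under these assumptions, webpack-prefixed and absolute paths both collapse to a project-relative `src/...` path, which is the form the `sourceFilter` check on `"src/"` appears to rely on.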

@@ -400,6 +400,9 @@ importers:
import-in-the-middle:
specifier: 2.0.2
version: 2.0.2
monocart-reporter:
specifier: 2.10.0
version: 2.10.0
msw:
specifier: 2.11.6
version: 2.11.6(@types/node@24.10.0)(typescript@5.9.3)
@@ -4064,6 +4067,10 @@ packages:
resolution: {integrity: sha512-h8lQ8tacZYnR3vNQTgibj+tODHI5/+l06Au2Pcriv/Gmet0eaj4TwWH41sO9wnHDiQsEj19q0drzdWdeAHtweg==}
engines: {node: '>=6.5'}
accepts@1.3.8:
resolution: {integrity: sha512-PYAthTa2m2VKxuvSD3DPC/Gy+U+sOA1LAuT8mkmRuvw+NACSaeXEQ+NHcVF7rONl6qcaxV3Uuemwawk+7+SJLw==}
engines: {node: '>= 0.6'}
acorn-import-attributes@1.9.5:
resolution: {integrity: sha512-n02Vykv5uA3eHGM/Z2dQrcD56kL8TyDb2p1+0P83PClMnC/nc+anbQRhIOWnSq4Ke/KvDPrY3C9hDtC/A3eHnQ==}
peerDependencies:
@@ -4080,6 +4087,14 @@ packages:
peerDependencies:
acorn: ^6.0.0 || ^7.0.0 || ^8.0.0
acorn-loose@8.5.2:
resolution: {integrity: sha512-PPvV6g8UGMGgjrMu+n/f9E/tCSkNQ2Y97eFvuVdJfG11+xdIeDcLyNdC8SHcrHbRqkfwLASdplyR6B6sKM1U4A==}
engines: {node: '>=0.4.0'}
acorn-walk@8.3.5:
resolution: {integrity: sha512-HEHNfbars9v4pgpW6SO1KSPkfoS0xVOM/9UzkJltjlsHZmJasxg8aXkuZa7SMf8vKGIBhpUsPluQSqhJFCqebw==}
engines: {node: '>=0.4.0'}
acorn@8.15.0:
resolution: {integrity: sha512-NZyJarBfL7nWwIq+FDL6Zp/yHEhePMNnnJ0y3qfieCrmNvYct8uvtiV41UvlSe6apAfk0fY1FbWx+NwfmpvtTg==}
engines: {node: '>=0.4.0'}
@@ -4610,9 +4625,20 @@ packages:
console-browserify@1.2.0:
resolution: {integrity: sha512-ZMkYO/LkF17QvCPqM0gxw8yUzigAOZOSWSHg91FH6orS7vcEj5dVZTidN2fQ14yBSdg97RqhSNwLUXInd52OTA==}
console-grid@2.2.3:
resolution: {integrity: sha512-+mecFacaFxGl+1G31IsCx41taUXuW2FxX+4xIE0TIPhgML+Jb9JFcBWGhhWerd1/vhScubdmHqTwOhB0KCUUAg==}
constants-browserify@1.0.0:
resolution: {integrity: sha512-xFxOwqIzR/e1k1gLiWEophSCMqXcwVHIH7akf7b/vxcUeGunlj3hvZaaqxwHsTgn+IndtkQJgSztIDWeumWJDQ==}
content-disposition@1.0.1:
resolution: {integrity: sha512-oIXISMynqSqm241k6kcQ5UwttDILMK4BiurCfGEREw6+X9jkkpEe5T9FZaApyLGGOnFuyMWZpdolTXMtvEJ08Q==}
engines: {node: '>=18'}
content-type@1.0.5:
resolution: {integrity: sha512-nTjqfcBFEipKdXCv4YDQWCfmcLZKm81ldF0pAopTvyrFGVbcR6P/VAAd5G7N+0tTr8QqiU0tFadD6FK4NtJwOA==}
engines: {node: '>= 0.6'}
convert-source-map@1.9.0:
resolution: {integrity: sha512-ASFBup0Mz1uyiIjANan1jzLQami9z1PoYSZCiiYW2FczPbenXc45FZdBZLzOT+r6+iciuEModtmCti+hjaAk0A==}
@@ -4623,6 +4649,10 @@ packages:
resolution: {integrity: sha512-9Kr/j4O16ISv8zBBhJoi4bXOYNTkFLOqSL3UDB0njXxCXNezjeyVrJyGOWtgfs/q2km1gwBcfH8q1yEGoMYunA==}
engines: {node: '>=18'}
cookies@0.9.1:
resolution: {integrity: sha512-TG2hpqe4ELx54QER/S3HQ9SRVnQnGBtKUz5bLQWtYAQ+o6GpgMs6sYUvaiJjVxb+UXwhRhAEP3m7LbsIZ77Hmw==}
engines: {node: '>= 0.8'}
core-js-compat@3.47.0:
resolution: {integrity: sha512-IGfuznZ/n7Kp9+nypamBhvwdwLsW6KC8IOaURw2doAK5e98AG3acVLdh0woOnEqCfUtS+Vu882JE4k/DAm3ItQ==}
@@ -4931,6 +4961,9 @@ packages:
resolution: {integrity: sha512-h5k/5U50IJJFpzfL6nO9jaaumfjO/f2NjK/oYB2Djzm4p9L+3T9qWpZqZ2hAbLPuuYq9wrU08WQyBTL5GbPk5Q==}
engines: {node: '>=6'}
deep-equal@1.0.1:
resolution: {integrity: sha512-bHtC0iYvWhyaTzvV3CZgPeZQqCOBGyGsVV7v4eevpdkLHfiSrXUdBG+qAuSz4RI70sszvjQ1QSZ98An1yNwpSw==}
deep-is@0.1.4:
resolution: {integrity: sha512-oIPzksmTg4/MriiaYGO+okXDT7ztn/w3Eptv/+gSIdMdKsJo0u4CfYNFJPy+4SKMuCqGw2wxnA+URMg3t8a/bQ==}
@@ -4957,6 +4990,17 @@ packages:
delaunator@5.0.1:
resolution: {integrity: sha512-8nvh+XBe96aCESrGOqMp/84b13H9cdKbG5P2ejQCh4d4sK9RL4371qou9drQjMhvnPmhWl5hnmqbEE0fXr9Xnw==}
delegates@1.0.0:
resolution: {integrity: sha512-bd2L678uiWATM6m5Z1VzNCErI3jiGzt6HGY8OVICs40JQq/HALfbyNJmp0UDakEY4pMMaN0Ly5om/B1VI/+xfQ==}
depd@1.1.2:
resolution: {integrity: sha512-7emPTl6Dpo6JRXOXjLRxck+FlLRX5847cLKEn00PLAgc3g2hTZZgr+e4c2v6QpSmLeFP3n5yUo7ft6avBK/5jQ==}
engines: {node: '>= 0.6'}
depd@2.0.0:
resolution: {integrity: sha512-g7nH6P6dyDioJogAAGprGpCtVImJhpPk/roCzdb3fIh61/s/nPsfR6onyMwkCAR/OlC3yBC0lESvUoQEAssIrw==}
engines: {node: '>= 0.8'}
dependency-graph@0.11.0:
resolution: {integrity: sha512-JeMq7fEshyepOWDfcfHK06N3MhyPhz++vtqWhMT5O9A3K42rdsEDpfdVqjaqaAhsw6a+ZqeDvQVtD0hFHQWrzg==}
engines: {node: '>= 0.6.0'}
@@ -4968,6 +5012,10 @@ packages:
des.js@1.1.0:
resolution: {integrity: sha512-r17GxjhUCjSRy8aiJpr8/UadFIzMzJGexI3Nmz4ADi9LYSFx4gTBp80+NaX/YsXWWLhpZ7v/v/ubEc/bCNfKwg==}
destroy@1.2.0:
resolution: {integrity: sha512-2sJGJTaXIIaR1w4iJSNoN0hnMY7Gpc/n8D4qSCJw8QqFWXf7cuAgnEHxBpweaVcPevC2l3KpjYCx3NypQQgaJg==}
engines: {node: '>= 0.8', npm: 1.2.8000 || >= 1.4.16}
detect-libc@2.1.2:
resolution: {integrity: sha512-Btj2BOOO83o3WyH59e8MgXsxEQVcarkUOpEYrubB0urwnN10yQ364rsiByU11nZlqWYZm05i/of7io4mzihBtQ==}
engines: {node: '>=8'}
@@ -5049,6 +5097,12 @@ packages:
eastasianwidth@0.2.0:
resolution: {integrity: sha512-I88TYZWc9XiYHRQ4/3c5rjjfgkjhLyW2luGIheGERbNQ6OY7yTybanSpDXZa8y7VUP9YmDcYa+eyq4ca7iLqWA==}
ee-first@1.1.1:
resolution: {integrity: sha512-WMwm9LhRUo+WUaRN+vRuETqG89IgZphVSNkdFgeb6sS/E4OrDIN7t48CAewSHXc6C8lefD8KKfr5vY61brQlow==}
eight-colors@1.3.2:
resolution: {integrity: sha512-qo7BAEbNnadiWn3EgZFD8tk2DWpifEHJE7CVyp09I0FiUJZ6z0YSyCGFmmtopVMi32iaL4hEK6m+/pPkx1iMFA==}
electron-to-chromium@1.5.267:
resolution: {integrity: sha512-0Drusm6MVRXSOJpGbaSVgcQsuB4hEkMpHXaVstcPmhu5LIedxs1xNK/nIxmQIU/RPC0+1/o0AVZfBTkTNJOdUw==}
@@ -5081,6 +5135,10 @@ packages:
resolution: {integrity: sha512-/kyM18EfinwXZbno9FyUGeFh87KC8HRQBQGildHZbEuRyWFOmv1U10o9BBp8XVZDVNNuQKyIGIu5ZYAAXJ0V2Q==}
engines: {node: '>= 4'}
encodeurl@2.0.0:
resolution: {integrity: sha512-Q0n9HRi4m6JuGIV1eFlmvJB7ZEVxu93IrMyiMsGC0lrMJMWzRgx6WGquyfQgZVb31vhGgXnfmPNNXmxnOkRBrg==}
engines: {node: '>= 0.8'}
endent@2.1.0:
resolution: {integrity: sha512-r8VyPX7XL8U01Xgnb1CjZ3XV+z90cXIJ9JPE/R9SEC9vpw2P6CfsRPJmp20DppC5N7ZAMCmjYkJIa744Iyg96w==}
@@ -5180,6 +5238,9 @@ packages:
resolution: {integrity: sha512-WUj2qlxaQtO4g6Pq5c29GTcWGDyd8itL8zTlipgECz3JesAiiOKotd8JU6otB3PACgG6xkJUyVhboMS+bje/jA==}
engines: {node: '>=6'}
escape-html@1.0.3:
resolution: {integrity: sha512-NiSupZ4OeuGwr68lGIeym/ksIZMJodUGOSCZ/FSnTxcrekbvqrgdUxlJOMpijaKZVjAJrWrGs/6Jy8OMuyj9ow==}
escape-string-regexp@4.0.0:
resolution: {integrity: sha512-TtpcNJ3XAzx3Gq8sWRzJaVajRs0uVxA2YAkdb1jm2YkPz4G6egUFAyA3n5vtEIZefPk5Wa4UXbKuS5fKkJWdgA==}
engines: {node: '>=10'}
@@ -5493,6 +5554,10 @@ packages:
react-dom:
optional: true
fresh@0.5.2:
resolution: {integrity: sha512-zJ2mQYM18rEFOudeV4GShTGIQ7RbzA7ozbU9I/XBpm7kqgMywgmylMwXHxZJmkVoYkna9d2pVXVXPdYTP9ej8Q==}
engines: {node: '>= 0.6'}
fs-extra@10.1.0:
resolution: {integrity: sha512-oRXApq54ETRj4eMiFzGnHWGy+zo5raudjuxN0b8H7s/RU2oW0Wvsx9O0ACRN/kRq9E8Vu/ReskGB5o3ji+FzHQ==}
engines: {node: '>=12'}
@@ -5773,6 +5838,18 @@ packages:
htmlparser2@6.1.0:
resolution: {integrity: sha512-gyyPk6rgonLFEDGoeRgQNaEUvdJ4ktTmmUh/h2t7s+M8oPpIPxgNACWa+6ESR57kXstwqPiCut0V8NRpcwgU7A==}
http-assert@1.5.0:
resolution: {integrity: sha512-uPpH7OKX4H25hBmU6G1jWNaqJGpTXxey+YOUizJUAgu0AjLUeC8D73hTrhvDS5D+GJN1DN1+hhc/eF/wpxtp0w==}
engines: {node: '>= 0.8'}
http-errors@1.8.1:
resolution: {integrity: sha512-Kpk9Sm7NmI+RHhnj6OIWDI1d6fIoFAtFt9RLaTMRlg/8w49juAStsrBgp0Dp4OdxdVbRIeKhtCUvoi/RuAhO4g==}
engines: {node: '>= 0.6'}
http-errors@2.0.1:
resolution: {integrity: sha512-4FbRdAX+bSdmo4AUFuS0WNiPz8NgFt+r8ThgNWmlrjQjt1Q7ZR9+zTlce2859x4KSXrwIsaeTqDoKQmtP8pLmQ==}
engines: {node: '>= 0.8'}
http-proxy-agent@7.0.2:
resolution: {integrity: sha512-T1gkAiYYDWYx3V5Bmyu7HcfcvL7mUrTWiM6yOfa3PIphViJ/gFPbvidQ+veqSOHci/PxBcDabeUNCzpOODJZig==}
engines: {node: '>= 14'}
@@ -6193,12 +6270,26 @@ packages:
resolution: {integrity: sha512-YHzO7721WbmAL6Ov1uzN/l5mY5WWWhJBSW+jq4tkfZfsxmo1hu6frS0EOswvjBUnWE6NtjEs48SFn5CQESRLZg==}
hasBin: true
keygrip@1.1.0:
resolution: {integrity: sha512-iYSchDJ+liQ8iwbSI2QqsQOvqv58eJCEanyJPJi+Khyu8smkcKSFUCbPwzFcL7YVtZ6eONjqRX/38caJ7QjRAQ==}
engines: {node: '>= 0.6'}
keyv@4.5.4:
resolution: {integrity: sha512-oxVHkHR/EJf2CNXnWxRLW6mg7JyCCUcG0DtEGmL2ctUo1PNTin1PUil+r/+4r5MpVgC/fn1kjsx7mjSujKqIpw==}
khroma@2.1.0:
resolution: {integrity: sha512-Ls993zuzfayK269Svk9hzpeGUKob/sIgZzyHYdjQoAdQetRKpOLj+k/QQQ/6Qi0Yz65mlROrfd+Ev+1+7dz9Kw==}
koa-compose@4.1.0:
resolution: {integrity: sha512-8ODW8TrDuMYvXRwra/Kh7/rJo9BtOfPc6qO8eAfC80CnCvSjSl0bkRM24X6/XBBEyj0v1nRUQ1LyOy3dbqOWXw==}
koa-static-resolver@1.0.6:
resolution: {integrity: sha512-ZX5RshSzH8nFn05/vUNQzqw32nEigsPa67AVUr6ZuQxuGdnCcTLcdgr4C81+YbJjpgqKHfacMBd7NmJIbj7fXw==}
koa@3.2.0:
resolution: {integrity: sha512-TrM4/tnNY7uJ1aW55sIIa+dqBvc4V14WRIAlGcWat9wV5pRS9Wr5Zk2ZTjQP1jtfIHDoHiSbPuV08P0fUZo2pg==}
engines: {node: '>= 18'}
langium@3.3.1:
resolution: {integrity: sha512-QJv/h939gDpvT+9SiLVlY7tZC3xB2qK57v0J04Sh9wpMb6MP1q8gB21L3WIo8T5P1MSMg3Ep14L7KkDCFG3y4w==}
engines: {node: '>=16.0.0'}
@@ -6351,6 +6442,9 @@ packages:
resolution: {integrity: sha512-h5bgJWpxJNswbU7qCrV0tIKQCaS3blPDrqKWx+QxzuzL1zGUzij9XCWLrSLsJPu5t+eWA/ycetzYAO5IOMcWAQ==}
hasBin: true
lz-utils@2.1.0:
resolution: {integrity: sha512-CMkfimAypidTtWjNDxY8a1bc1mJdyEh04V2FfEQ5Zh8Nx4v7k850EYa+dOWGn9hKG5xOyHP5MkuduAZCTHRvJw==}
magic-string@0.30.21:
resolution: {integrity: sha512-vd2F4YUyEXKGcLHoq+TEyCjxueSeHnFxyyjNp80yg0XV4vUhnDer/lvvlqM/arB5bXQN5K2/3oinyCRyx8T2CQ==}
@@ -6456,6 +6550,10 @@ packages:
mdurl@2.0.0:
resolution: {integrity: sha512-Lf+9+2r+Tdp5wXDXC4PcIBjTDtq4UKjCPMQhKIuzpJNW0b96kVqSwW0bT7FhRSfmAiFYgP+SCRvdrDozfh0U5w==}
media-typer@1.1.0:
resolution: {integrity: sha512-aisnrDP4GNe06UcKFnV5bfMNPBUw4jsLGaWwWfnH3v02GnBuXX2MCVn5RbrWo0j3pczUilYblq7fQ7Nw2t5XKw==}
engines: {node: '>= 0.8'}
memfs@3.5.3:
resolution: {integrity: sha512-UERzLsxzllchadvbPs5aolHh65ISpKpM+ccLbOJ8/vvpBKmAWf+la7dXFy7Mr0ySHbdHrFv5kGFCUHHe6GFEmw==}
engines: {node: '>= 4.0.0'}
@@ -6598,10 +6696,18 @@ packages:
resolution: {integrity: sha512-sPU4uV7dYlvtWJxwwxHD0PuihVNiE7TyAbQ5SWxDCB9mUYvOgroQOwYQQOKPJ8CIbE+1ETVlOoK1UC2nU3gYvg==}
engines: {node: '>= 0.6'}
mime-db@1.54.0:
resolution: {integrity: sha512-aU5EJuIN2WDemCcAp2vFBfp/m4EAhWJnUNSSw0ixs7/kXbd6Pg64EmwJkNdFhB8aWt1sH2CTXrLxo/iAGV3oPQ==}
engines: {node: '>= 0.6'}
mime-types@2.1.35:
resolution: {integrity: sha512-ZDY+bPm5zTTF+YpCrAU9nK0UgICYPT0QtT1NZWFv4s++TNkcgVaT0g6+4R2uI4MjQjzysHB1zxuWL50hzaeXiw==}
engines: {node: '>= 0.6'}
mime-types@3.0.2:
resolution: {integrity: sha512-Lbgzdk0h4juoQ9fCKXW4by0UJqj+nOOrI9MJ1sSj4nI8aI2eo1qmvQEie4VD1glsS250n15LsWsYtCugiStS5A==}
engines: {node: '>=18'}
mimic-fn@2.1.0:
resolution: {integrity: sha512-OqbOk5oEQeAZ8WXWydlu9HJjz9WVdEIvamMCcXmuqUYjTknH/sqsWvhQ3vgwKFRR1HpjvNBKQ37nbJgYzGqGcg==}
engines: {node: '>=6'}
@@ -6640,6 +6746,17 @@ packages:
module-details-from-path@1.0.4:
resolution: {integrity: sha512-EGWKgxALGMgzvxYF1UyGTy0HXX/2vHLkw6+NvDKW2jypWbHpjQuj4UMcqQWXHERJhVGKikolT06G3bcKe4fi7w==}
monocart-coverage-reports@2.12.9:
resolution: {integrity: sha512-vtFqbC3Egl4nVa1FSIrQvMPO6HZtb9lo+3IW7/crdvrLNW2IH8lUsxaK0TsKNmMO2mhFWwqQywLV2CZelqPgwA==}
hasBin: true
monocart-locator@1.0.2:
resolution: {integrity: sha512-v8W5hJLcWMIxLCcSi/MHh+VeefI+ycFmGz23Froer9QzWjrbg4J3gFJBuI/T1VLNoYxF47bVPPxq8ZlNX4gVCw==}
monocart-reporter@2.10.0:
resolution: {integrity: sha512-Q421HL8hCr024HMjQcQylEpOLy69FE6Zli2s/A0zptfFEPW/kaz6B1Ll3CYs8L1j67+egt1HeNC1LTHUsp6W+A==}
hasBin: true
motion-dom@12.24.8:
resolution: {integrity: sha512-wX64WITk6gKOhaTqhsFqmIkayLAAx45SVFiMnJIxIrH5uqyrwrxjrfo8WX9Kh8CaUAixjeMn82iH0W0QT9wD5w==}
@@ -6688,6 +6805,10 @@ packages:
natural-compare@1.4.0:
resolution: {integrity: sha512-OWND8ei3VtNC9h7V60qff3SVobHr996CTwgxubgyQYEpg290h9J0buyECNNJexkFm5sOajh5G116RYA1c8ZMSw==}
negotiator@0.6.3:
resolution: {integrity: sha512-+EUsqGPLsM+j/zdChZjsnX51g4XrHFOIXwfnCVPGlQk/k5giakcKsuxCObBRu6DSm9opw/O6slWbJdghQM4bBg==}
engines: {node: '>= 0.6'}
neo-async@2.6.2:
resolution: {integrity: sha512-Yd3UES5mWCSqR+qNT93S3UoYUkqAZ9lLg8a7g9rimsWmYGK8cVToA4/sF3RrshdyV3sAGMXVUmpMYOw+dLpOuw==}
@@ -6757,6 +6878,10 @@ packages:
node-releases@2.0.27:
resolution: {integrity: sha512-nmh3lCkYZ3grZvqcCH+fjmQ7X+H0OeZgP40OierEaAptX4XofMh5kwNbWh7lBduUzCcV/8kZ+NDLCwm2iorIlA==}
nodemailer@7.0.13:
resolution: {integrity: sha512-PNDFSJdP+KFgdsG3ZzMXCgquO7I6McjY2vlqILjtJd0hy8wEvtugS9xKRF2NWlPNGxvLCXlTNIae4serI7dinw==}
engines: {node: '>=6.0.0'}
normalize-path@3.0.0:
resolution: {integrity: sha512-6eZs5Ls3WtCisHWp9S2GUy8dqkpGi4BVSz3GaqiE6ezub0512ESztXUwUB6C6IKbQkY2Pnb/mD4WYojCRwcwLA==}
engines: {node: '>=0.10.0'}
@@ -6851,6 +6976,10 @@ packages:
obug@2.1.1:
resolution: {integrity: sha512-uTqF9MuPraAQ+IsnPf366RG4cP9RtUi7MLO1N3KEc+wb0a6yKpeL0lmk2IB1jY5KHPAlTc6T/JRdC/YqxHNwkQ==}
on-finished@2.4.1:
resolution: {integrity: sha512-oVlzkg3ENAhCk2zdv7IJwd/QUD4z2RxRwpkcGY8psCVcCYZNq4wYnVWALHM+brtuJjePWiYF/ClmuDr8Ch5+kg==}
engines: {node: '>= 0.8'}
once@1.4.0:
resolution: {integrity: sha512-lNaJgI+2Q5URQBkccEKHTQOPaXdUxnZZElQTZY0MFUAuaEqe1E+Nyvgdz/aIyNi6Z9MzO5dv1H8n58/GELp3+w==}
@@ -6953,6 +7082,10 @@ packages:
parse5@8.0.0:
resolution: {integrity: sha512-9m4m5GSgXjL4AjumKzq1Fgfp3Z8rsvjRNbnkVwfu2ImRqE5D0LnY2QfDen18FSY9C573YU5XxSapdHZTZ2WolA==}
parseurl@1.3.3:
resolution: {integrity: sha512-CiyeOxFT/JZyN5m0z9PfXw4SCBJ6Sygz1Dpl0wqjlhDEGGBP1GnsUVEL0p63hoG1fcj3fHynXi9NYO4nWOL+qQ==}
engines: {node: '>= 0.8'}
pascal-case@3.1.2:
resolution: {integrity: sha512-uWlGT3YSnK9x3BQJaOdcZwrnV6hPpd8jFH1/ucpiLRPh/2zCVJKS19E4GvYHvaCcACn3foXZ0cLB9Wrx1KGe5g==}
@@ -7751,6 +7884,9 @@ packages:
setimmediate@1.0.5:
resolution: {integrity: sha512-MATJdZp8sLqDl/68LfQmbP8zKPLQNV6BIZoIgrscFDQ+RsvK/BxeDQOgyxKKoh0y/8h3BqVFnCqQ/gd+reiIXA==}
setprototypeof@1.2.0:
resolution: {integrity: sha512-E5LDX7Wrp85Kil5bhZv46j8jOeboKq5JMmYM3gVGdGH8xFpPWXUMsNrlODCrkoxMEeNi/XZIwuRvY4XNwYMJpw==}
sha.js@2.4.12:
resolution: {integrity: sha512-8LzC5+bvI45BjpfXU8V5fdU2mfeKiQe1D1gIMn7XUlF3OTUrpdJpPPH4EMAnF0DsHHdSZqCdSss5qCmJKuiO3w==}
engines: {node: '>= 0.10'}
@@ -7872,6 +8008,10 @@ packages:
resolution: {integrity: sha512-WjlahMgHmCJpqzU8bIBy4qtsZdU9lRlcZE3Lvyej6t4tuOuv1vk57OW3MBrj6hXBFx/nNoC9MPMTcr5YA7NQbg==}
engines: {node: '>=6'}
statuses@1.5.0:
resolution: {integrity: sha512-OpZ3zP+jT1PI7I8nemJX4AKmAX070ZkYPVWV/AaKTJl+tXCTGyVdC1a4SL8RUQYEwk/f34ZX8UTykN68FwrqAA==}
engines: {node: '>= 0.6'}
statuses@2.0.2:
resolution: {integrity: sha512-DvEy55V3DB7uknRo+4iOGT5fP1slR8wQohVdknigZPMpMstaKJQWhwiYBACJE3Ul2pTnATihhBYnRhZQHGBiRw==}
engines: {node: '>= 0.8'}
@@ -8157,6 +8297,10 @@ packages:
resolution: {integrity: sha512-65P7iz6X5yEr1cwcgvQxbbIw7Uk3gOy5dIdtZ4rDveLqhrdJP+Li/Hx6tyK0NEb+2GCyneCMJiGqrADCSNk8sQ==}
engines: {node: '>=8.0'}
toidentifier@1.0.1:
resolution: {integrity: sha512-o5sSPKEkg/DIQNmH43V0/uerLrpzVedkUh8tGNvaeXpfpuwjKenlSox/2O/BTlZUtEe+JG7s5YhEz608PlAHRA==}
engines: {node: '>=0.6'}
tough-cookie@6.0.0:
resolution: {integrity: sha512-kXuRi1mtaKMrsLUxz3sQYvVl37B0Ns6MzfrtV5DvJceE9bPyspOqk9xxv7XbZWcfLWbFmm997vl83qUWVJA64w==}
engines: {node: '>=16'}
@@ -8228,6 +8372,10 @@ packages:
tslib@2.8.1:
resolution: {integrity: sha512-oJFu94HQb+KVduSUQL7wnpmqnfmLsOA/nAh6b6EH0wCEoK0/mPeXU6c3wKDV83MkOuHPRHtSXKKU99IBazS/2w==}
tsscmp@1.0.6:
resolution: {integrity: sha512-LxhtAkPDTkVCMQjt2h6eBVY28KCjikZqZfMcC15YBeNjkgUpdCfBu5HoiOTDu86v6smE8yOjyEktJ8hlbANHQA==}
engines: {node: '>=0.6.x'}
tty-browserify@0.0.1:
resolution: {integrity: sha512-C3TaO7K81YvjCgQH9Q1S3R3P3BtN3RIM8n+OvX4il1K1zgE8ZhI0op7kClgkxtutIE8hQrcrHBXvIheqKUUCxw==}
@@ -8257,6 +8405,10 @@ packages:
resolution: {integrity: sha512-TeTSQ6H5YHvpqVwBRcnLDCBnDOHWYu7IvGbHT6N8AOymcr9PJGjc1GTtiWZTYg0NCgYwvnYWEkVChQAr9bjfwA==}
engines: {node: '>=16'}
type-is@2.0.1:
resolution: {integrity: sha512-OZs6gsjF4vMp32qrCbiVSkrFmXtG/AZhY3t0iAMrMBiAZyV9oALtXO8hsrHbMXF9x6L3grlFuwW2oAz7cav+Gw==}
engines: {node: '>= 0.6'}
typed-array-buffer@1.0.3:
resolution: {integrity: sha512-nAYYwfY3qnzX30IkA6AQZjVbtK6duGontcQm1WSG1MD94YLqK0515GNApXkoxKOWMusVssAHWLh9SeaoefYFGw==}
engines: {node: '>= 0.4'}
@@ -8457,6 +8609,10 @@ packages:
resolution: {integrity: sha512-spH26xU080ydGggxRyR1Yhcbgx+j3y5jbNXk/8L+iRvdIEQ4uTRH2Sgf2dokud6Q4oAtsbNvJ1Ft+9xmm6IZcA==}
engines: {node: '>= 0.10'}
vary@1.1.2:
resolution: {integrity: sha512-BNGbWLfd0eUPabhkXUVm0j8uuvREyTh5ovRa/dyow/BqAbZJyC+5fU+IzQOzmAKzYqYRAISoRhdQr3eIZ/PXqg==}
engines: {node: '>= 0.8'}
vaul@1.1.2:
resolution: {integrity: sha512-ZFkClGpWyI2WUQjdLJ/BaGuV6AVQiJ3uELGk3OYtP+B6yCO7Cmn9vPFXVJkRaGkOJu3m8bQMgtyzNHixULceQA==}
peerDependencies:
@@ -12911,6 +13067,11 @@ snapshots:
dependencies:
event-target-shim: 5.0.1
accepts@1.3.8:
dependencies:
mime-types: 2.1.35
negotiator: 0.6.3
acorn-import-attributes@1.9.5(acorn@8.15.0):
dependencies:
acorn: 8.15.0
@@ -12923,6 +13084,14 @@ snapshots:
dependencies:
acorn: 8.15.0
acorn-loose@8.5.2:
dependencies:
acorn: 8.15.0
acorn-walk@8.3.5:
dependencies:
acorn: 8.15.0
acorn@8.15.0: {}
adjust-sourcemap-loader@4.0.0:
@@ -13472,14 +13641,25 @@ snapshots:
console-browserify@1.2.0: {}
console-grid@2.2.3: {}
constants-browserify@1.0.0: {}
content-disposition@1.0.1: {}
content-type@1.0.5: {}
convert-source-map@1.9.0: {}
convert-source-map@2.0.0: {}
cookie@1.0.2: {}
cookies@0.9.1:
dependencies:
depd: 2.0.0
keygrip: 1.1.0
core-js-compat@3.47.0:
dependencies:
browserslist: 4.28.1
@@ -13843,6 +14023,8 @@ snapshots:
deep-eql@5.0.2: {}
deep-equal@1.0.1: {}
deep-is@0.1.4: {}
deepmerge-ts@7.1.5: {}
@@ -13867,6 +14049,12 @@ snapshots:
dependencies:
robust-predicates: 3.0.2
delegates@1.0.0: {}
depd@1.1.2: {}
depd@2.0.0: {}
dependency-graph@0.11.0: {}
dequal@2.0.3: {}
@@ -13876,6 +14064,8 @@ snapshots:
inherits: 2.0.4
minimalistic-assert: 1.0.1
destroy@1.2.0: {}
detect-libc@2.1.2:
optional: true
@@ -13958,6 +14148,10 @@ snapshots:
eastasianwidth@0.2.0: {}
ee-first@1.1.1: {}
eight-colors@1.3.2: {}
electron-to-chromium@1.5.267: {}
elliptic@6.6.1:
@@ -13990,6 +14184,8 @@ snapshots:
emojis-list@3.0.0: {}
encodeurl@2.0.0: {}
endent@2.1.0:
dependencies:
dedent: 0.7.0
@@ -14209,6 +14405,8 @@ snapshots:
escalade@3.2.0: {}
escape-html@1.0.3: {}
escape-string-regexp@4.0.0: {}
escape-string-regexp@5.0.0: {}
@@ -14606,6 +14804,8 @@ snapshots:
react: 18.3.1
react-dom: 18.3.1(react@18.3.1)
fresh@0.5.2: {}
fs-extra@10.1.0:
dependencies:
graceful-fs: 4.2.11
@@ -14994,6 +15194,27 @@ snapshots:
domutils: 2.8.0
entities: 2.2.0
http-assert@1.5.0:
dependencies:
deep-equal: 1.0.1
http-errors: 1.8.1
http-errors@1.8.1:
dependencies:
depd: 1.1.2
inherits: 2.0.4
setprototypeof: 1.2.0
statuses: 1.5.0
toidentifier: 1.0.1
http-errors@2.0.1:
dependencies:
depd: 2.0.0
inherits: 2.0.4
setprototypeof: 1.2.0
statuses: 2.0.2
toidentifier: 1.0.1
http-proxy-agent@7.0.2:
dependencies:
agent-base: 7.1.4
@@ -15409,12 +15630,41 @@ snapshots:
dependencies:
commander: 8.3.0
keygrip@1.1.0:
dependencies:
tsscmp: 1.0.6
keyv@4.5.4:
dependencies:
json-buffer: 3.0.1
khroma@2.1.0: {}
koa-compose@4.1.0: {}
koa-static-resolver@1.0.6: {}
koa@3.2.0:
dependencies:
accepts: 1.3.8
content-disposition: 1.0.1
content-type: 1.0.5
cookies: 0.9.1
delegates: 1.0.0
destroy: 1.2.0
encodeurl: 2.0.0
escape-html: 1.0.3
fresh: 0.5.2
http-assert: 1.5.0
http-errors: 2.0.1
koa-compose: 4.1.0
mime-types: 3.0.2
on-finished: 2.4.1
parseurl: 1.3.3
statuses: 2.0.2
type-is: 2.0.1
vary: 1.1.2
langium@3.3.1:
dependencies:
chevrotain: 11.0.3
@@ -15552,6 +15802,8 @@ snapshots:
lz-string@1.5.0: {}
lz-utils@2.1.0: {}
magic-string@0.30.21:
dependencies:
'@jridgewell/sourcemap-codec': 1.5.5
@@ -15771,6 +16023,8 @@ snapshots:
mdurl@2.0.0: {}
media-typer@1.1.0: {}
memfs@3.5.3:
dependencies:
fs-monkey: 1.1.0
@@ -16047,10 +16301,16 @@ snapshots:
mime-db@1.52.0: {}
mime-db@1.54.0: {}
mime-types@2.1.35:
dependencies:
mime-db: 1.52.0
mime-types@3.0.2:
dependencies:
mime-db: 1.54.0
mimic-fn@2.1.0: {}
min-indent@1.0.1: {}
@@ -16084,6 +16344,34 @@ snapshots:
module-details-from-path@1.0.4: {}
monocart-coverage-reports@2.12.9:
dependencies:
acorn: 8.15.0
acorn-loose: 8.5.2
acorn-walk: 8.3.5
commander: 14.0.2
console-grid: 2.2.3
eight-colors: 1.3.2
foreground-child: 3.3.1
istanbul-lib-coverage: 3.2.2
istanbul-lib-report: 3.0.1
istanbul-reports: 3.2.0
lz-utils: 2.1.0
monocart-locator: 1.0.2
monocart-locator@1.0.2: {}
monocart-reporter@2.10.0:
dependencies:
console-grid: 2.2.3
eight-colors: 1.3.2
koa: 3.2.0
koa-static-resolver: 1.0.6
lz-utils: 2.1.0
monocart-coverage-reports: 2.12.9
monocart-locator: 1.0.2
nodemailer: 7.0.13
motion-dom@12.24.8:
dependencies:
motion-utils: 12.23.28
@@ -16138,6 +16426,8 @@ snapshots:
natural-compare@1.4.0: {}
negotiator@0.6.3: {}
neo-async@2.6.2: {}
next-themes@0.4.6(react-dom@18.3.1(react@18.3.1))(react@18.3.1):
@@ -16237,6 +16527,8 @@ snapshots:
node-releases@2.0.27: {}
nodemailer@7.0.13: {}
normalize-path@3.0.0: {}
npm-run-path@4.0.1:
@@ -16338,6 +16630,10 @@ snapshots:
obug@2.1.1: {}
on-finished@2.4.1:
dependencies:
ee-first: 1.1.1
once@1.4.0:
dependencies:
wrappy: 1.0.2
@@ -16495,6 +16791,8 @@ snapshots:
entities: 6.0.1
optional: true
parseurl@1.3.3: {}
pascal-case@3.1.2:
dependencies:
no-case: 3.0.4
@@ -17365,6 +17663,8 @@ snapshots:
setimmediate@1.0.5: {}
setprototypeof@1.2.0: {}
sha.js@2.4.12:
dependencies:
inherits: 2.0.4
@@ -17526,6 +17826,8 @@ snapshots:
dependencies:
type-fest: 0.7.1
statuses@1.5.0: {}
statuses@2.0.2: {}
std-env@3.10.0: {}
@@ -17873,6 +18175,8 @@ snapshots:
dependencies:
is-number: 7.0.0
toidentifier@1.0.1: {}
tough-cookie@6.0.0:
dependencies:
tldts: 7.0.19
@@ -17930,6 +18234,8 @@ snapshots:
tslib@2.8.1: {}
tsscmp@1.0.6: {}
tty-browserify@0.0.1: {}
twemoji-parser@14.0.0: {}
@@ -17953,6 +18259,12 @@ snapshots:
type-fest@4.41.0: {}
type-is@2.0.1:
dependencies:
content-type: 1.0.5
media-typer: 1.1.0
mime-types: 3.0.2
typed-array-buffer@1.0.3:
dependencies:
call-bound: 1.0.4
@@ -18182,6 +18494,8 @@ snapshots:
validator@13.15.26: {}
vary@1.1.2: {}
vaul@1.1.2(@types/react-dom@18.3.5(@types/react@18.3.17))(@types/react@18.3.17)(react-dom@18.3.1(react@18.3.1))(react@18.3.1):
dependencies:
'@radix-ui/react-dialog': 1.1.15(@types/react-dom@18.3.5(@types/react@18.3.17))(@types/react@18.3.17)(react-dom@18.3.1(react@18.3.1))(react@18.3.1)

@@ -0,0 +1,156 @@
import { describe, it, expect, beforeEach } from "vitest";
import { useOnboardingWizardStore } from "../store";
beforeEach(() => {
useOnboardingWizardStore.getState().reset();
});
describe("useOnboardingWizardStore", () => {
describe("initial state", () => {
it("starts at step 1 with empty fields", () => {
const state = useOnboardingWizardStore.getState();
expect(state.currentStep).toBe(1);
expect(state.name).toBe("");
expect(state.role).toBe("");
expect(state.otherRole).toBe("");
expect(state.painPoints).toEqual([]);
expect(state.otherPainPoint).toBe("");
});
});
describe("setName", () => {
it("updates the name", () => {
useOnboardingWizardStore.getState().setName("Alice");
expect(useOnboardingWizardStore.getState().name).toBe("Alice");
});
});
describe("setRole", () => {
it("updates the role", () => {
useOnboardingWizardStore.getState().setRole("Engineer");
expect(useOnboardingWizardStore.getState().role).toBe("Engineer");
});
});
describe("setOtherRole", () => {
it("updates the other role text", () => {
useOnboardingWizardStore.getState().setOtherRole("Designer");
expect(useOnboardingWizardStore.getState().otherRole).toBe("Designer");
});
});
describe("togglePainPoint", () => {
it("adds a pain point", () => {
useOnboardingWizardStore.getState().togglePainPoint("slow builds");
expect(useOnboardingWizardStore.getState().painPoints).toEqual([
"slow builds",
]);
});
it("removes a pain point when toggled again", () => {
useOnboardingWizardStore.getState().togglePainPoint("slow builds");
useOnboardingWizardStore.getState().togglePainPoint("slow builds");
expect(useOnboardingWizardStore.getState().painPoints).toEqual([]);
});
it("handles multiple pain points", () => {
useOnboardingWizardStore.getState().togglePainPoint("slow builds");
useOnboardingWizardStore.getState().togglePainPoint("no tests");
expect(useOnboardingWizardStore.getState().painPoints).toEqual([
"slow builds",
"no tests",
]);
useOnboardingWizardStore.getState().togglePainPoint("slow builds");
expect(useOnboardingWizardStore.getState().painPoints).toEqual([
"no tests",
]);
});
it("ignores new selections when at the max limit", () => {
useOnboardingWizardStore.getState().togglePainPoint("a");
useOnboardingWizardStore.getState().togglePainPoint("b");
useOnboardingWizardStore.getState().togglePainPoint("c");
useOnboardingWizardStore.getState().togglePainPoint("d");
expect(useOnboardingWizardStore.getState().painPoints).toEqual([
"a",
"b",
"c",
]);
});
it("still allows deselecting when at the max limit", () => {
useOnboardingWizardStore.getState().togglePainPoint("a");
useOnboardingWizardStore.getState().togglePainPoint("b");
useOnboardingWizardStore.getState().togglePainPoint("c");
useOnboardingWizardStore.getState().togglePainPoint("b");
expect(useOnboardingWizardStore.getState().painPoints).toEqual([
"a",
"c",
]);
});
});
describe("setOtherPainPoint", () => {
it("updates the other pain point text", () => {
useOnboardingWizardStore.getState().setOtherPainPoint("flaky CI");
expect(useOnboardingWizardStore.getState().otherPainPoint).toBe(
"flaky CI",
);
});
});
describe("nextStep", () => {
it("increments the step", () => {
useOnboardingWizardStore.getState().nextStep();
expect(useOnboardingWizardStore.getState().currentStep).toBe(2);
});
it("clamps at step 4", () => {
useOnboardingWizardStore.getState().goToStep(4);
useOnboardingWizardStore.getState().nextStep();
expect(useOnboardingWizardStore.getState().currentStep).toBe(4);
});
});
describe("prevStep", () => {
it("decrements the step", () => {
useOnboardingWizardStore.getState().goToStep(3);
useOnboardingWizardStore.getState().prevStep();
expect(useOnboardingWizardStore.getState().currentStep).toBe(2);
});
it("clamps at step 1", () => {
useOnboardingWizardStore.getState().prevStep();
expect(useOnboardingWizardStore.getState().currentStep).toBe(1);
});
});
describe("goToStep", () => {
it("jumps to an arbitrary step", () => {
useOnboardingWizardStore.getState().goToStep(3);
expect(useOnboardingWizardStore.getState().currentStep).toBe(3);
});
});
describe("reset", () => {
it("resets all fields to defaults", () => {
useOnboardingWizardStore.getState().setName("Alice");
useOnboardingWizardStore.getState().setRole("Engineer");
useOnboardingWizardStore.getState().setOtherRole("Other");
useOnboardingWizardStore.getState().togglePainPoint("slow builds");
useOnboardingWizardStore.getState().setOtherPainPoint("flaky CI");
useOnboardingWizardStore.getState().goToStep(3);
useOnboardingWizardStore.getState().reset();
const state = useOnboardingWizardStore.getState();
expect(state.currentStep).toBe(1);
expect(state.name).toBe("");
expect(state.role).toBe("");
expect(state.otherRole).toBe("");
expect(state.painPoints).toEqual([]);
expect(state.otherPainPoint).toBe("");
});
});
});
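
Review note: the `togglePainPoint` tests above pin down two rules: a fourth selection is ignored, and deselecting still works at the cap. A minimal sketch of that rule as a pure function follows. `toggleCapped` is a hypothetical helper for illustration only; the store in this PR keeps the logic inline in `togglePainPoint`.

```typescript
// Hypothetical standalone helper (not part of the PR); it mirrors the behaviour
// the store tests describe, assuming new selections are appended in click order.
const MAX_PAIN_POINT_SELECTIONS = 3;

function toggleCapped(
  selected: readonly string[],
  id: string,
  max: number = MAX_PAIN_POINT_SELECTIONS,
): string[] {
  // Deselecting is always allowed, even when the cap has been reached.
  if (selected.includes(id)) {
    return selected.filter((p) => p !== id);
  }
  // Adding is a no-op once the cap is reached.
  if (selected.length >= max) {
    return [...selected];
  }
  return [...selected, id];
}
```

Keeping the rule pure makes it possible to check the cap without rendering a component or resetting the store between cases.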

@@ -7,9 +7,9 @@ export function ProgressBar({ currentStep, totalSteps }: Props) {
const percent = (currentStep / totalSteps) * 100;
return (
<div className="absolute left-0 top-0 h-[0.625rem] w-full bg-neutral-300">
<div className="absolute left-0 top-0 h-[3px] w-full bg-neutral-200">
<div
className="h-full bg-purple-400 shadow-[0_0_4px_2px_rgba(168,85,247,0.5)] transition-all duration-500 ease-out"
className="h-full bg-purple-400 transition-all duration-500 ease-out"
style={{ width: `${percent}%` }}
/>
</div>
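
Review note: `percent` is computed as `(currentStep / totalSteps) * 100` with no bounds check. The sketch below shows one way to guard it. `progressPercent` is a hypothetical helper, and the clamping is an assumption about desired behaviour, not something this PR does.

```typescript
// Hypothetical helper (not part of the PR): same formula as ProgressBar,
// clamped to [0, 100] and guarded against a non-positive step count.
function progressPercent(currentStep: number, totalSteps: number): number {
  if (totalSteps <= 0) {
    return 0;
  }
  const raw = (currentStep / totalSteps) * 100;
  return Math.min(100, Math.max(0, raw));
}
```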

@@ -2,6 +2,7 @@
import { Text } from "@/components/atoms/Text/Text";
import { cn } from "@/lib/utils";
import { Check } from "@phosphor-icons/react";
interface Props {
icon: React.ReactNode;
@@ -24,13 +25,18 @@ export function SelectableCard({
onClick={onClick}
aria-pressed={selected}
className={cn(
"flex h-[9rem] w-[10.375rem] shrink-0 flex-col items-center justify-center gap-3 rounded-xl border-2 bg-white px-6 py-5 transition-all hover:shadow-sm md:shrink lg:gap-2 lg:px-10 lg:py-8",
"relative flex h-[9rem] w-[10.375rem] shrink-0 flex-col items-center justify-center gap-3 rounded-xl border-2 bg-white px-6 py-5 transition-all hover:shadow-sm md:shrink lg:gap-2 lg:px-10 lg:py-8",
className,
selected
? "border-purple-500 bg-purple-50 shadow-sm"
: "border-transparent",
)}
>
{selected && (
<span className="absolute right-2 top-2 flex h-5 w-5 items-center justify-center rounded-full bg-purple-500">
<Check size={12} weight="bold" className="text-white" />
</span>
)}
<Text
variant="lead"
as="span"

@@ -3,6 +3,7 @@
import { Button } from "@/components/atoms/Button/Button";
import { Input } from "@/components/atoms/Input/Input";
import { Text } from "@/components/atoms/Text/Text";
import { cn } from "@/lib/utils";
import { ReactNode } from "react";
import { FadeIn } from "@/components/atoms/FadeIn/FadeIn";
@@ -73,6 +74,8 @@ export function PainPointsStep() {
togglePainPoint,
setOtherPainPoint,
hasSomethingElse,
atLimit,
shaking,
canContinue,
handleLaunch,
} = usePainPointsStep();
@@ -90,7 +93,7 @@ export function PainPointsStep() {
What&apos;s eating your time?
</Text>
<Text variant="lead" className="!text-zinc-500">
Pick the tasks you&apos;d love to hand off to Autopilot
Pick the tasks you&apos;d love to hand off to AutoPilot
</Text>
</div>
@@ -107,11 +110,22 @@ export function PainPointsStep() {
/>
))}
</div>
{!hasSomethingElse ? (
<Text variant="small" className="!text-zinc-500">
Pick as many as you want you can always change later
</Text>
) : null}
<Text
variant="small"
className={cn(
"transition-colors",
atLimit && canContinue ? "!text-green-600" : "!text-zinc-500",
shaking && "animate-shake",
)}
>
{shaking
? "You've picked 3 — tap one to swap it out"
: atLimit && canContinue
? "3 selected — you're all set!"
: atLimit && hasSomethingElse
? "Tell us what else takes up your time"
: "Pick up to 3 to start — AutoPilot can help with anything else later"}
</Text>
</div>
{hasSomethingElse && (
@@ -133,7 +147,7 @@ export function PainPointsStep() {
disabled={!canContinue}
className="w-full max-w-xs"
>
Launch Autopilot
Launch AutoPilot
</Button>
</div>
</FadeIn>
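
Review note: the helper text in this file is chosen by a four-way nested ternary over `shaking`, `atLimit`, `canContinue` and `hasSomethingElse`. A sketch of the same precedence as a pure function is shown below. The function name and the returned keys are hypothetical; the component in this PR returns the copy strings directly.

```typescript
// Hypothetical refactor sketch (not part of the PR). The branch order matches
// the nested ternary in PainPointsStep; the keys stand in for the copy strings.
interface HelperTextFlags {
  shaking: boolean;
  atLimit: boolean;
  canContinue: boolean;
  hasSomethingElse: boolean;
}

type HelperTextKey = "swap-hint" | "all-set" | "describe-other" | "default";

function helperTextKey(flags: HelperTextFlags): HelperTextKey {
  // The shake hint wins over everything else while the animation runs.
  if (flags.shaking) return "swap-hint";
  // At the cap with a valid selection: success message.
  if (flags.atLimit && flags.canContinue) return "all-set";
  // At the cap, but "Something else" still needs its free-text answer.
  if (flags.atLimit && flags.hasSomethingElse) return "describe-other";
  return "default";
}
```

Early returns keep the precedence readable, and each branch can be covered by a unit test without rendering the step.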

@@ -8,6 +8,7 @@ import { FadeIn } from "@/components/atoms/FadeIn/FadeIn";
import { SelectableCard } from "../components/SelectableCard";
import { useOnboardingWizardStore } from "../store";
import { Emoji } from "@/components/atoms/Emoji/Emoji";
import { useEffect, useRef } from "react";
const IMG_SIZE = 42;
@@ -57,12 +58,26 @@ export function RoleStep() {
const setRole = useOnboardingWizardStore((s) => s.setRole);
const setOtherRole = useOnboardingWizardStore((s) => s.setOtherRole);
const nextStep = useOnboardingWizardStore((s) => s.nextStep);
const autoAdvanceTimer = useRef<ReturnType<typeof setTimeout> | null>(null);
const isOther = role === "Other";
const canContinue = role && (!isOther || otherRole.trim());
function handleContinue() {
if (canContinue) {
useEffect(() => {
return () => {
if (autoAdvanceTimer.current) clearTimeout(autoAdvanceTimer.current);
};
}, []);
function handleRoleSelect(id: string) {
if (autoAdvanceTimer.current) clearTimeout(autoAdvanceTimer.current);
setRole(id);
if (id !== "Other") {
autoAdvanceTimer.current = setTimeout(nextStep, 350);
}
}
function handleOtherContinue() {
if (otherRole.trim()) {
nextStep();
}
}
@@ -78,7 +93,7 @@ export function RoleStep() {
What best describes you, {name}?
</Text>
<Text variant="lead" className="!text-zinc-500">
Autopilot will tailor automations to your world
So AutoPilot knows how to help you best
</Text>
</div>
@@ -89,33 +104,35 @@ export function RoleStep() {
icon={r.icon}
label={r.label}
selected={role === r.id}
onClick={() => setRole(r.id)}
onClick={() => handleRoleSelect(r.id)}
className="p-8"
/>
))}
</div>
{isOther && (
<div className="-mb-5 w-full px-8 md:px-0">
<Input
id="other-role"
label="Other role"
hideLabel
placeholder="Describe your role..."
value={otherRole}
onChange={(e) => setOtherRole(e.target.value)}
autoFocus
/>
</div>
)}
<>
<div className="-mb-5 w-full px-8 md:px-0">
<Input
id="other-role"
label="Other role"
hideLabel
placeholder="Describe your role..."
value={otherRole}
onChange={(e) => setOtherRole(e.target.value)}
autoFocus
/>
</div>
<Button
onClick={handleContinue}
disabled={!canContinue}
className="w-full max-w-xs"
>
Continue
</Button>
<Button
onClick={handleOtherContinue}
disabled={!otherRole.trim()}
className="w-full max-w-xs"
>
Continue
</Button>
</>
)}
</div>
</FadeIn>
);
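
Review note: `handleRoleSelect` clears any pending timer, stores the role, and schedules `nextStep` after 350 ms unless the role is "Other". The sketch below isolates that scheduling logic outside React. `createAutoAdvance`, `Scheduler` and `realTimers` are hypothetical names, and the injectable scheduler is an assumption added so the behaviour can be exercised without real timers.

```typescript
// Hypothetical sketch (not part of the PR): cancel-and-reschedule logic with
// an injectable scheduler instead of calling setTimeout directly.
interface Scheduler<T> {
  set(fn: () => void, ms: number): T;
  clear(handle: T): void;
}

// Scheduler backed by the real timer functions.
const realTimers: Scheduler<ReturnType<typeof setTimeout>> = {
  set: (fn, ms) => setTimeout(fn, ms),
  clear: (handle) => clearTimeout(handle),
};

function createAutoAdvance<T>(
  onAdvance: () => void,
  scheduler: Scheduler<T>,
  delayMs: number = 350,
  manualId: string = "Other",
) {
  let pending: T | null = null;

  function cancel(): void {
    if (pending !== null) {
      scheduler.clear(pending);
      pending = null;
    }
  }

  return {
    // Every selection cancels the previous timer; only non-manual roles
    // schedule a new advance.
    select(id: string): void {
      cancel();
      if (id !== manualId) {
        pending = scheduler.set(onAdvance, delayMs);
      }
    },
    // Call on unmount so a late timer cannot fire.
    dispose: cancel,
  };
}
```

In a component this would be created with `realTimers`, and `dispose` would be called from the effect cleanup, as the `useEffect` in this diff does for `autoAdvanceTimer`.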

@@ -4,13 +4,6 @@ import { AutoGPTLogo } from "@/components/atoms/AutoGPTLogo/AutoGPTLogo";
import { Button } from "@/components/atoms/Button/Button";
import { Input } from "@/components/atoms/Input/Input";
import { Text } from "@/components/atoms/Text/Text";
import {
Tooltip,
TooltipContent,
TooltipProvider,
TooltipTrigger,
} from "@/components/atoms/Tooltip/BaseTooltip";
import { Question } from "@phosphor-icons/react";
import { FadeIn } from "@/components/atoms/FadeIn/FadeIn";
import { useOnboardingWizardStore } from "../store";
@@ -40,36 +33,16 @@ export function WelcomeStep() {
<Text variant="h3">Welcome to AutoGPT</Text>
<Text variant="lead" as="span" className="!text-zinc-500">
Let&apos;s personalize your experience so{" "}
<span className="relative mr-3 inline-block bg-gradient-to-r from-purple-500 to-indigo-500 bg-clip-text text-transparent">
Autopilot
<span className="absolute -right-4 top-0">
<TooltipProvider delayDuration={400}>
<Tooltip>
<TooltipTrigger asChild>
<button
type="button"
aria-label="What is Autopilot?"
className="inline-flex text-purple-500"
>
<Question size={14} />
</button>
</TooltipTrigger>
<TooltipContent>
Autopilot is AutoGPT&apos;s AI assistant that watches your
connected apps, spots repetitive tasks you do every day
and runs them for you automatically.
</TooltipContent>
</Tooltip>
</TooltipProvider>
</span>
<span className="bg-gradient-to-r from-purple-500 to-indigo-500 bg-clip-text text-transparent">
AutoPilot
</span>{" "}
can start saving you time right away
can start saving you time
</Text>
</div>
<Input
id="first-name"
label="Your first name"
label="What should I call you?"
placeholder="e.g. John"
value={name}
onChange={(e) => setName(e.target.value)}

@@ -0,0 +1,154 @@
import {
render,
screen,
fireEvent,
cleanup,
} from "@/tests/integrations/test-utils";
import { afterEach, beforeEach, describe, expect, test, vi } from "vitest";
import { useOnboardingWizardStore } from "../../store";
import { PainPointsStep } from "../PainPointsStep";
vi.mock("@/components/atoms/Emoji/Emoji", () => ({
Emoji: ({ text }: { text: string }) => <span>{text}</span>,
}));
vi.mock("@/components/atoms/FadeIn/FadeIn", () => ({
FadeIn: ({ children }: { children: React.ReactNode }) => (
<div>{children}</div>
),
}));
function getCard(name: RegExp) {
return screen.getByRole("button", { name });
}
function clickCard(name: RegExp) {
fireEvent.click(getCard(name));
}
function getLaunchButton() {
return screen.getByRole("button", { name: /launch autopilot/i });
}
afterEach(cleanup);
beforeEach(() => {
useOnboardingWizardStore.getState().reset();
useOnboardingWizardStore.getState().setName("Alice");
useOnboardingWizardStore.getState().setRole("Founder/CEO");
useOnboardingWizardStore.getState().goToStep(3);
});
describe("PainPointsStep", () => {
test("renders all pain point cards", () => {
render(<PainPointsStep />);
expect(getCard(/finding leads/i)).toBeDefined();
expect(getCard(/email & outreach/i)).toBeDefined();
expect(getCard(/reports & data/i)).toBeDefined();
expect(getCard(/customer support/i)).toBeDefined();
expect(getCard(/social media/i)).toBeDefined();
expect(getCard(/something else/i)).toBeDefined();
});
test("shows default helper text", () => {
render(<PainPointsStep />);
expect(
screen.getAllByText(/pick up to 3 to start/i).length,
).toBeGreaterThan(0);
});
test("selecting a card marks it as pressed", () => {
render(<PainPointsStep />);
clickCard(/finding leads/i);
expect(getCard(/finding leads/i).getAttribute("aria-pressed")).toBe("true");
});
test("launch button is disabled when nothing is selected", () => {
render(<PainPointsStep />);
expect(getLaunchButton().hasAttribute("disabled")).toBe(true);
});
test("launch button is enabled after selecting a pain point", () => {
render(<PainPointsStep />);
clickCard(/finding leads/i);
expect(getLaunchButton().hasAttribute("disabled")).toBe(false);
});
test("shows success text when 3 items are selected", () => {
render(<PainPointsStep />);
clickCard(/finding leads/i);
clickCard(/email & outreach/i);
clickCard(/reports & data/i);
expect(screen.getAllByText(/3 selected/i).length).toBeGreaterThan(0);
});
test("does not select a 4th item when at the limit", () => {
render(<PainPointsStep />);
clickCard(/finding leads/i);
clickCard(/email & outreach/i);
clickCard(/reports & data/i);
clickCard(/customer support/i);
expect(getCard(/customer support/i).getAttribute("aria-pressed")).toBe(
"false",
);
});
test("can deselect when at the limit and select a different one", () => {
render(<PainPointsStep />);
clickCard(/finding leads/i);
clickCard(/email & outreach/i);
clickCard(/reports & data/i);
clickCard(/finding leads/i);
expect(getCard(/finding leads/i).getAttribute("aria-pressed")).toBe(
"false",
);
clickCard(/customer support/i);
expect(getCard(/customer support/i).getAttribute("aria-pressed")).toBe(
"true",
);
});
test("shows input when 'Something else' is selected", () => {
render(<PainPointsStep />);
clickCard(/something else/i);
expect(
screen.getByPlaceholderText(/what else takes up your time/i),
).toBeDefined();
});
test("launch button is disabled when 'Something else' selected but input empty", () => {
render(<PainPointsStep />);
clickCard(/something else/i);
expect(getLaunchButton().hasAttribute("disabled")).toBe(true);
});
test("launch button is enabled when 'Something else' selected and input filled", () => {
render(<PainPointsStep />);
clickCard(/something else/i);
fireEvent.change(
screen.getByPlaceholderText(/what else takes up your time/i),
{ target: { value: "Manual invoicing" } },
);
expect(getLaunchButton().hasAttribute("disabled")).toBe(false);
});
});

@@ -0,0 +1,123 @@
import {
render,
screen,
fireEvent,
cleanup,
} from "@/tests/integrations/test-utils";
import { afterEach, beforeEach, describe, expect, test, vi } from "vitest";
import { useOnboardingWizardStore } from "../../store";
import { RoleStep } from "../RoleStep";
vi.mock("@/components/atoms/Emoji/Emoji", () => ({
Emoji: ({ text }: { text: string }) => <span>{text}</span>,
}));
vi.mock("@/components/atoms/FadeIn/FadeIn", () => ({
FadeIn: ({ children }: { children: React.ReactNode }) => (
<div>{children}</div>
),
}));
afterEach(() => {
cleanup();
vi.useRealTimers();
});
beforeEach(() => {
vi.useFakeTimers();
useOnboardingWizardStore.getState().reset();
useOnboardingWizardStore.getState().setName("Alice");
useOnboardingWizardStore.getState().goToStep(2);
});
describe("RoleStep", () => {
test("renders all role cards", () => {
render(<RoleStep />);
expect(screen.getByText("Founder / CEO")).toBeDefined();
expect(screen.getByText("Operations")).toBeDefined();
expect(screen.getByText("Sales / BD")).toBeDefined();
expect(screen.getByText("Marketing")).toBeDefined();
expect(screen.getByText("Product / PM")).toBeDefined();
expect(screen.getByText("Engineering")).toBeDefined();
expect(screen.getByText("HR / People")).toBeDefined();
expect(screen.getByText("Other")).toBeDefined();
});
test("displays the user name in the heading", () => {
render(<RoleStep />);
expect(
screen.getAllByText(/what best describes you, alice/i).length,
).toBeGreaterThan(0);
});
test("selecting a non-Other role auto-advances after delay", () => {
render(<RoleStep />);
fireEvent.click(screen.getByRole("button", { name: /engineering/i }));
expect(useOnboardingWizardStore.getState().role).toBe("Engineering");
expect(useOnboardingWizardStore.getState().currentStep).toBe(2);
vi.advanceTimersByTime(350);
expect(useOnboardingWizardStore.getState().currentStep).toBe(3);
});
test("selecting 'Other' does not auto-advance", () => {
render(<RoleStep />);
fireEvent.click(screen.getByRole("button", { name: /\bother\b/i }));
vi.advanceTimersByTime(500);
expect(useOnboardingWizardStore.getState().currentStep).toBe(2);
});
test("selecting 'Other' shows text input and Continue button", () => {
render(<RoleStep />);
fireEvent.click(screen.getByRole("button", { name: /\bother\b/i }));
expect(screen.getByPlaceholderText(/describe your role/i)).toBeDefined();
expect(screen.getByRole("button", { name: /continue/i })).toBeDefined();
});
test("Continue button is disabled when Other input is empty", () => {
render(<RoleStep />);
fireEvent.click(screen.getByRole("button", { name: /\bother\b/i }));
const continueBtn = screen.getByRole("button", { name: /continue/i });
expect(continueBtn.hasAttribute("disabled")).toBe(true);
});
test("Continue button advances when Other role text is filled", () => {
render(<RoleStep />);
fireEvent.click(screen.getByRole("button", { name: /\bother\b/i }));
fireEvent.change(screen.getByPlaceholderText(/describe your role/i), {
target: { value: "Designer" },
});
const continueBtn = screen.getByRole("button", { name: /continue/i });
expect(continueBtn.hasAttribute("disabled")).toBe(false);
fireEvent.click(continueBtn);
expect(useOnboardingWizardStore.getState().currentStep).toBe(3);
});
test("switching from Other to a regular role cancels Other and auto-advances", () => {
render(<RoleStep />);
fireEvent.click(screen.getByRole("button", { name: /\bother\b/i }));
expect(screen.getByPlaceholderText(/describe your role/i)).toBeDefined();
fireEvent.click(screen.getByRole("button", { name: /marketing/i }));
expect(useOnboardingWizardStore.getState().role).toBe("Marketing");
vi.advanceTimersByTime(350);
expect(useOnboardingWizardStore.getState().currentStep).toBe(3);
});
});

@@ -1,4 +1,5 @@
import { useOnboardingWizardStore } from "../store";
import { useEffect, useRef, useState } from "react";
import { MAX_PAIN_POINT_SELECTIONS, useOnboardingWizardStore } from "../store";
const ROLE_TOP_PICKS: Record<string, string[]> = {
"Founder/CEO": [
@@ -23,18 +24,38 @@ export function usePainPointsStep() {
const role = useOnboardingWizardStore((s) => s.role);
const painPoints = useOnboardingWizardStore((s) => s.painPoints);
const otherPainPoint = useOnboardingWizardStore((s) => s.otherPainPoint);
const togglePainPoint = useOnboardingWizardStore((s) => s.togglePainPoint);
const storeToggle = useOnboardingWizardStore((s) => s.togglePainPoint);
const setOtherPainPoint = useOnboardingWizardStore(
(s) => s.setOtherPainPoint,
);
const nextStep = useOnboardingWizardStore((s) => s.nextStep);
const [shaking, setShaking] = useState(false);
const shakeTimer = useRef<ReturnType<typeof setTimeout> | null>(null);
useEffect(() => {
return () => {
if (shakeTimer.current) clearTimeout(shakeTimer.current);
};
}, []);
const topIDs = getTopPickIDs(role);
const hasSomethingElse = painPoints.includes("Something else");
const atLimit = painPoints.length >= MAX_PAIN_POINT_SELECTIONS;
const canContinue =
painPoints.length > 0 &&
(!hasSomethingElse || Boolean(otherPainPoint.trim()));
function togglePainPoint(id: string) {
const alreadySelected = painPoints.includes(id);
if (!alreadySelected && atLimit) {
if (shakeTimer.current) clearTimeout(shakeTimer.current);
setShaking(true);
shakeTimer.current = setTimeout(() => setShaking(false), 600);
return;
}
storeToggle(id);
}
function handleLaunch() {
if (canContinue) {
nextStep();
@@ -48,6 +69,8 @@ export function usePainPointsStep() {
togglePainPoint,
setOtherPainPoint,
hasSomethingElse,
atLimit,
shaking,
canContinue,
handleLaunch,
};

@@ -1,5 +1,6 @@
import { create } from "zustand";
export const MAX_PAIN_POINT_SELECTIONS = 3;
export type Step = 1 | 2 | 3 | 4;
interface OnboardingWizardState {
@@ -40,6 +41,8 @@ export const useOnboardingWizardStore = create<OnboardingWizardState>(
togglePainPoint(painPoint) {
set((state) => {
const exists = state.painPoints.includes(painPoint);
if (!exists && state.painPoints.length >= MAX_PAIN_POINT_SELECTIONS)
return state;
return {
painPoints: exists
? state.painPoints.filter((p) => p !== painPoint)

@@ -3,18 +3,48 @@
import { useState } from "react";
import { Button } from "@/components/atoms/Button/Button";
import type { UserRateLimitResponse } from "@/app/api/__generated__/models/userRateLimitResponse";
import { useToast } from "@/components/molecules/Toast/use-toast";
import { UsageBar } from "../../components/UsageBar";
const TIERS = ["FREE", "PRO", "BUSINESS", "ENTERPRISE"] as const;
type Tier = (typeof TIERS)[number];
const TIER_MULTIPLIERS: Record<Tier, string> = {
FREE: "1x base limits",
PRO: "5x base limits",
BUSINESS: "20x base limits",
ENTERPRISE: "60x base limits",
};
const TIER_COLORS: Record<Tier, string> = {
FREE: "bg-gray-100 text-gray-700",
PRO: "bg-blue-100 text-blue-700",
BUSINESS: "bg-purple-100 text-purple-700",
ENTERPRISE: "bg-amber-100 text-amber-700",
};
interface Props {
data: UserRateLimitResponse;
onReset: (resetWeekly: boolean) => Promise<void>;
onTierChange?: (newTier: string) => Promise<void>;
/** Override the outer container classes (default: bordered card). */
className?: string;
}
export function RateLimitDisplay({ data, onReset, className }: Props) {
export function RateLimitDisplay({
data,
onReset,
onTierChange,
className,
}: Props) {
const [isResetting, setIsResetting] = useState(false);
const [resetWeekly, setResetWeekly] = useState(false);
const [isChangingTier, setIsChangingTier] = useState(false);
const { toast } = useToast();
const currentTier = TIERS.includes(data.tier as Tier)
? (data.tier as Tier)
: "FREE";
async function handleReset() {
const msg = resetWeekly
@@ -30,19 +60,76 @@ export function RateLimitDisplay({ data, onReset, className }: Props) {
}
}
async function handleTierChange(newTier: string) {
if (newTier === currentTier || !onTierChange) return;
if (
!window.confirm(
`Change tier from ${currentTier} to ${newTier}? This will change the user's rate limits.`,
)
)
return;
setIsChangingTier(true);
try {
await onTierChange(newTier);
toast({
title: "Tier updated",
description: `Changed to ${newTier} (${TIER_MULTIPLIERS[newTier as Tier]}).`,
});
} catch {
toast({
title: "Error",
description: "Failed to update tier.",
variant: "destructive",
});
} finally {
setIsChangingTier(false);
}
}
const nothingToReset = resetWeekly
? data.daily_tokens_used === 0 && data.weekly_tokens_used === 0
: data.daily_tokens_used === 0;
return (
<div className={className ?? "rounded-md border bg-white p-6"}>
<h2 className="mb-1 text-lg font-semibold">
Rate Limits for {data.user_email ?? data.user_id}
</h2>
{data.user_email && (
<p className="mb-4 text-xs text-gray-500">User ID: {data.user_id}</p>
)}
{!data.user_email && <div className="mb-4" />}
<div className="mb-4 flex items-start justify-between">
<div>
<h2 className="mb-1 text-lg font-semibold">
Rate Limits for {data.user_email ?? data.user_id}
</h2>
{data.user_email && (
<p className="text-xs text-gray-500">User ID: {data.user_id}</p>
)}
</div>
<span
className={`rounded-full px-3 py-1 text-xs font-medium ${TIER_COLORS[currentTier] ?? "bg-gray-100 text-gray-700"}`}
>
{currentTier}
</span>
</div>
<div className="mb-4 flex items-center gap-3">
<label className="text-sm font-medium text-gray-700">
Subscription Tier
</label>
<select
aria-label="Subscription tier"
value={currentTier}
onChange={(e) => handleTierChange(e.target.value)}
className="rounded-md border bg-white px-3 py-1.5 text-sm"
disabled={isChangingTier || !onTierChange}
>
{TIERS.map((tier) => (
<option key={tier} value={tier}>
{tier} {TIER_MULTIPLIERS[tier]}
</option>
))}
</select>
{isChangingTier && (
<span className="text-xs text-gray-500">Updating...</span>
)}
</div>
<div className="grid grid-cols-2 gap-6">
<div className="space-y-2">

@@ -14,6 +14,7 @@ export function RateLimitManager() {
handleSearch,
handleSelectUser,
handleReset,
handleTierChange,
} = useRateLimitManager();
return (
@@ -74,7 +75,11 @@ export function RateLimitManager() {
)}
{rateLimitData && (
<RateLimitDisplay data={rateLimitData} onReset={handleReset} />
<RateLimitDisplay
data={rateLimitData}
onReset={handleReset}
onTierChange={handleTierChange}
/>
)}
</div>
);

@@ -0,0 +1,281 @@
import {
render,
screen,
fireEvent,
waitFor,
cleanup,
} from "@/tests/integrations/test-utils";
import { afterEach, beforeEach, describe, expect, it, vi } from "vitest";
import { RateLimitDisplay } from "../RateLimitDisplay";
import type { UserRateLimitResponse } from "@/app/api/__generated__/models/userRateLimitResponse";
vi.mock("@/components/molecules/Toast/use-toast", () => ({
useToast: () => ({ toast: vi.fn() }),
}));
const mockConfirm = vi.fn();
beforeEach(() => {
mockConfirm.mockReset();
window.confirm = mockConfirm;
});
afterEach(() => {
cleanup();
});
function makeData(
overrides: Partial<UserRateLimitResponse> = {},
): UserRateLimitResponse {
return {
user_id: "user-abc-123",
user_email: "alice@example.com",
daily_token_limit: 10000,
weekly_token_limit: 50000,
daily_tokens_used: 2500,
weekly_tokens_used: 10000,
tier: "FREE",
...overrides,
};
}
describe("RateLimitDisplay", () => {
it("renders the user email heading", () => {
render(<RateLimitDisplay data={makeData()} onReset={vi.fn()} />);
expect(
screen.getByText(/Rate Limits for alice@example\.com/),
).toBeDefined();
});
it("renders user ID when email is present", () => {
render(<RateLimitDisplay data={makeData()} onReset={vi.fn()} />);
expect(screen.getByText(/user-abc-123/)).toBeDefined();
});
it("falls back to user_id in heading when email is absent", () => {
render(
<RateLimitDisplay
data={makeData({ user_email: undefined })}
onReset={vi.fn()}
/>,
);
expect(screen.getByText(/Rate Limits for user-abc-123/)).toBeDefined();
});
it("displays the current tier badge", () => {
render(
<RateLimitDisplay data={makeData({ tier: "PRO" })} onReset={vi.fn()} />,
);
const badge = screen.getByText("PRO");
expect(badge).toBeDefined();
expect(badge.className).toContain("bg-blue-100");
});
it("defaults unknown tier to FREE", () => {
render(
<RateLimitDisplay
data={makeData({ tier: "UNKNOWN" as UserRateLimitResponse["tier"] })}
onReset={vi.fn()}
/>,
);
const badge = screen.getByText("FREE");
expect(badge).toBeDefined();
});
it("renders tier dropdown with all tiers", () => {
render(<RateLimitDisplay data={makeData()} onReset={vi.fn()} />);
const select = screen.getByLabelText("Subscription tier");
expect(select).toBeDefined();
expect(select.querySelectorAll("option").length).toBe(4);
});
it("disables tier dropdown when onTierChange is not provided", () => {
render(<RateLimitDisplay data={makeData()} onReset={vi.fn()} />);
const select = screen.getByLabelText(
"Subscription tier",
) as HTMLSelectElement;
expect(select.disabled).toBe(true);
});
it("enables tier dropdown when onTierChange is provided", () => {
render(
<RateLimitDisplay
data={makeData()}
onReset={vi.fn()}
onTierChange={vi.fn()}
/>,
);
const select = screen.getByLabelText(
"Subscription tier",
) as HTMLSelectElement;
expect(select.disabled).toBe(false);
});
it("renders daily and weekly usage sections", () => {
render(<RateLimitDisplay data={makeData()} onReset={vi.fn()} />);
expect(screen.getByText("Daily Usage")).toBeDefined();
expect(screen.getByText("Weekly Usage")).toBeDefined();
});
it("renders reset scope dropdown and reset button", () => {
render(<RateLimitDisplay data={makeData()} onReset={vi.fn()} />);
expect(screen.getByLabelText("Reset scope")).toBeDefined();
expect(screen.getByText("Reset Usage")).toBeDefined();
});
it("disables reset button when nothing to reset", () => {
render(
<RateLimitDisplay
data={makeData({ daily_tokens_used: 0 })}
onReset={vi.fn()}
/>,
);
const button = screen.getByText("Reset Usage").closest("button")!;
expect(button.disabled).toBe(true);
});
it("enables reset button when there is usage to reset", () => {
render(
<RateLimitDisplay
data={makeData({ daily_tokens_used: 100 })}
onReset={vi.fn()}
/>,
);
const button = screen.getByText("Reset Usage").closest("button")!;
expect(button.disabled).toBe(false);
});
it("calls onReset when reset button is clicked and confirmed", async () => {
const onReset = vi.fn().mockResolvedValue(undefined);
mockConfirm.mockReturnValue(true);
render(<RateLimitDisplay data={makeData()} onReset={onReset} />);
fireEvent.click(screen.getByText("Reset Usage"));
await waitFor(() => {
expect(onReset).toHaveBeenCalledWith(false);
});
});
it("does not call onReset when confirm is cancelled", () => {
const onReset = vi.fn();
mockConfirm.mockReturnValue(false);
render(<RateLimitDisplay data={makeData()} onReset={onReset} />);
fireEvent.click(screen.getByText("Reset Usage"));
expect(onReset).not.toHaveBeenCalled();
});
it("passes resetWeekly=true when 'both' is selected", async () => {
const onReset = vi.fn().mockResolvedValue(undefined);
mockConfirm.mockReturnValue(true);
render(
<RateLimitDisplay
data={makeData({ weekly_tokens_used: 100 })}
onReset={onReset}
/>,
);
fireEvent.change(screen.getByLabelText("Reset scope"), {
target: { value: "both" },
});
fireEvent.click(screen.getByText("Reset Usage"));
await waitFor(() => {
expect(onReset).toHaveBeenCalledWith(true);
});
});
it("calls onTierChange when tier is changed and confirmed", async () => {
const onTierChange = vi.fn().mockResolvedValue(undefined);
mockConfirm.mockReturnValue(true);
render(
<RateLimitDisplay
data={makeData({ tier: "FREE" })}
onReset={vi.fn()}
onTierChange={onTierChange}
/>,
);
fireEvent.change(screen.getByLabelText("Subscription tier"), {
target: { value: "PRO" },
});
await waitFor(() => {
expect(onTierChange).toHaveBeenCalledWith("PRO");
});
});
it("does not call onTierChange when selecting the same tier", () => {
const onTierChange = vi.fn();
render(
<RateLimitDisplay
data={makeData({ tier: "FREE" })}
onReset={vi.fn()}
onTierChange={onTierChange}
/>,
);
fireEvent.change(screen.getByLabelText("Subscription tier"), {
target: { value: "FREE" },
});
expect(onTierChange).not.toHaveBeenCalled();
});
it("does not call onTierChange when confirm is cancelled", () => {
const onTierChange = vi.fn();
mockConfirm.mockReturnValue(false);
render(
<RateLimitDisplay
data={makeData({ tier: "FREE" })}
onReset={vi.fn()}
onTierChange={onTierChange}
/>,
);
fireEvent.change(screen.getByLabelText("Subscription tier"), {
target: { value: "PRO" },
});
expect(onTierChange).not.toHaveBeenCalled();
});
it("catches error when onTierChange rejects", async () => {
const onTierChange = vi.fn().mockRejectedValue(new Error("fail"));
mockConfirm.mockReturnValue(true);
render(
<RateLimitDisplay
data={makeData({ tier: "FREE" })}
onReset={vi.fn()}
onTierChange={onTierChange}
/>,
);
fireEvent.change(screen.getByLabelText("Subscription tier"), {
target: { value: "PRO" },
});
await waitFor(() => {
expect(onTierChange).toHaveBeenCalledWith("PRO");
});
});
it("applies custom className when provided", () => {
const { container } = render(
<RateLimitDisplay
data={makeData()}
onReset={vi.fn()}
className="custom-class"
/>,
);
expect(container.firstElementChild?.className).toBe("custom-class");
});
});
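
Review note: the "defaults unknown tier to FREE" test covers the fallback that `RateLimitDisplay` performs with `TIERS.includes(data.tier as Tier)`. A sketch of the same fallback written as a type guard, which avoids the cast, is shown below. `isTier` and `normalizeTier` are hypothetical helpers, not functions from this PR.

```typescript
// Hypothetical helpers (not part of the PR). The tier list is copied from
// RateLimitDisplay; the guard narrows an unknown value to Tier without `as Tier`.
const TIERS = ["FREE", "PRO", "BUSINESS", "ENTERPRISE"] as const;
type Tier = (typeof TIERS)[number];

function isTier(value: unknown): value is Tier {
  return (
    typeof value === "string" &&
    (TIERS as readonly string[]).includes(value)
  );
}

// Unknown or missing tiers fall back to FREE, matching the component's default.
function normalizeTier(value: unknown): Tier {
  return isTier(value) ? value : "FREE";
}
```

The same guard could also narrow `newTier` before indexing `TIER_MULTIPLIERS`, in place of the `newTier as Tier` cast in `handleTierChange`.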

@@ -0,0 +1,216 @@
import {
render,
screen,
fireEvent,
cleanup,
} from "@/tests/integrations/test-utils";
import { afterEach, describe, expect, it, vi } from "vitest";
import { RateLimitManager } from "../RateLimitManager";
import type { UserRateLimitResponse } from "@/app/api/__generated__/models/userRateLimitResponse";
const mockHandleSearch = vi.fn();
const mockHandleSelectUser = vi.fn();
const mockHandleReset = vi.fn();
const mockHandleTierChange = vi.fn();
vi.mock("../useRateLimitManager", () => ({
useRateLimitManager: () => mockHookReturn,
}));
vi.mock("../../../components/AdminUserSearch", () => ({
AdminUserSearch: ({
onSearch,
placeholder,
isLoading,
}: {
onSearch: (q: string) => void;
placeholder: string;
isLoading: boolean;
}) => (
<div data-testid="admin-user-search">
<input
data-testid="search-input"
placeholder={placeholder}
disabled={isLoading}
onKeyDown={(e) => {
if (e.key === "Enter") onSearch((e.target as HTMLInputElement).value);
}}
/>
</div>
),
}));
vi.mock("../RateLimitDisplay", () => ({
RateLimitDisplay: ({
data,
onReset,
onTierChange,
}: {
data: UserRateLimitResponse;
onReset: (rw: boolean) => void;
onTierChange: (t: string) => void;
}) => (
<div data-testid="rate-limit-display">
<span>{data.user_email ?? data.user_id}</span>
<button onClick={() => onReset(false)}>mock-reset</button>
<button onClick={() => onTierChange("PRO")}>mock-tier</button>
</div>
),
}));
let mockHookReturn = buildHookReturn();
function buildHookReturn(overrides: Record<string, unknown> = {}) {
return {
isSearching: false,
isLoadingRateLimit: false,
searchResults: [] as Array<{ user_id: string; user_email: string }>,
selectedUser: null as { user_id: string; user_email: string } | null,
rateLimitData: null as UserRateLimitResponse | null,
handleSearch: mockHandleSearch,
handleSelectUser: mockHandleSelectUser,
handleReset: mockHandleReset,
handleTierChange: mockHandleTierChange,
...overrides,
};
}
afterEach(() => {
cleanup();
mockHandleSearch.mockClear();
mockHandleSelectUser.mockClear();
mockHandleReset.mockClear();
mockHandleTierChange.mockClear();
mockHookReturn = buildHookReturn();
});
describe("RateLimitManager", () => {
it("renders the search section", () => {
render(<RateLimitManager />);
expect(screen.getByText("Search User")).toBeDefined();
expect(screen.getByTestId("admin-user-search")).toBeDefined();
});
it("renders description text for search", () => {
render(<RateLimitManager />);
expect(
screen.getByText(/Exact email or user ID does a direct lookup/),
).toBeDefined();
});
it("does not show user list when searchResults is empty", () => {
render(<RateLimitManager />);
expect(screen.queryByText(/Select a user/)).toBeNull();
});
it("shows user selection list when results exist and no user selected", () => {
mockHookReturn = buildHookReturn({
searchResults: [
{ user_id: "u1", user_email: "alice@example.com" },
{ user_id: "u2", user_email: "bob@example.com" },
],
});
render(<RateLimitManager />);
expect(screen.getByText("Select a user (2 results)")).toBeDefined();
expect(screen.getByText("alice@example.com")).toBeDefined();
expect(screen.getByText("bob@example.com")).toBeDefined();
});
it("shows singular 'result' text for single result", () => {
mockHookReturn = buildHookReturn({
searchResults: [{ user_id: "u1", user_email: "alice@example.com" }],
});
render(<RateLimitManager />);
expect(screen.getByText("Select a user (1 result)")).toBeDefined();
});
it("calls handleSelectUser when a user in the list is clicked", () => {
const users = [
{ user_id: "u1", user_email: "alice@example.com" },
{ user_id: "u2", user_email: "bob@example.com" },
];
mockHookReturn = buildHookReturn({ searchResults: users });
render(<RateLimitManager />);
fireEvent.click(screen.getByText("bob@example.com"));
expect(mockHandleSelectUser).toHaveBeenCalledWith(users[1]);
});
it("hides selection list when a user is selected", () => {
const users = [{ user_id: "u1", user_email: "alice@example.com" }];
mockHookReturn = buildHookReturn({
searchResults: users,
selectedUser: users[0],
});
render(<RateLimitManager />);
expect(screen.queryByText(/Select a user/)).toBeNull();
});
it("shows selected user indicator", () => {
const users = [{ user_id: "u1", user_email: "alice@example.com" }];
mockHookReturn = buildHookReturn({
searchResults: users,
selectedUser: users[0],
});
render(<RateLimitManager />);
expect(screen.getByText("Selected:")).toBeDefined();
});
it("shows loading message when isLoadingRateLimit is true", () => {
mockHookReturn = buildHookReturn({ isLoadingRateLimit: true });
render(<RateLimitManager />);
expect(screen.getByText("Loading rate limits...")).toBeDefined();
});
it("renders RateLimitDisplay when rateLimitData is present", () => {
mockHookReturn = buildHookReturn({
rateLimitData: {
user_id: "user-123",
user_email: "alice@example.com",
daily_token_limit: 10000,
weekly_token_limit: 50000,
daily_tokens_used: 2500,
weekly_tokens_used: 10000,
tier: "FREE",
},
});
render(<RateLimitManager />);
expect(screen.getByTestId("rate-limit-display")).toBeDefined();
expect(screen.getByText("alice@example.com")).toBeDefined();
});
it("does not render RateLimitDisplay when rateLimitData is null", () => {
render(<RateLimitManager />);
expect(screen.queryByTestId("rate-limit-display")).toBeNull();
});
it("passes handleReset and handleTierChange to RateLimitDisplay", () => {
mockHookReturn = buildHookReturn({
rateLimitData: {
user_id: "user-123",
user_email: "alice@example.com",
daily_token_limit: 10000,
weekly_token_limit: 50000,
daily_tokens_used: 2500,
weekly_tokens_used: 10000,
tier: "FREE",
},
});
render(<RateLimitManager />);
fireEvent.click(screen.getByText("mock-reset"));
expect(mockHandleReset).toHaveBeenCalledWith(false);
fireEvent.click(screen.getByText("mock-tier"));
expect(mockHandleTierChange).toHaveBeenCalledWith("PRO");
});
});

@@ -0,0 +1,387 @@
import { describe, expect, it, vi, beforeEach, afterEach } from "vitest";
import { renderHook, act, cleanup } from "@testing-library/react";
const mockToast = vi.fn();
vi.mock("@/components/molecules/Toast/use-toast", () => ({
useToast: () => ({ toast: mockToast }),
}));
const mockGetV2GetUserRateLimit = vi.fn();
const mockGetV2SearchUsersByNameOrEmail = vi.fn();
const mockPostV2ResetUserRateLimitUsage = vi.fn();
const mockPostV2SetUserRateLimitTier = vi.fn();
vi.mock("@/app/api/__generated__/endpoints/admin/admin", () => ({
getV2GetUserRateLimit: (...args: unknown[]) =>
mockGetV2GetUserRateLimit(...args),
getV2SearchUsersByNameOrEmail: (...args: unknown[]) =>
mockGetV2SearchUsersByNameOrEmail(...args),
postV2ResetUserRateLimitUsage: (...args: unknown[]) =>
mockPostV2ResetUserRateLimitUsage(...args),
postV2SetUserRateLimitTier: (...args: unknown[]) =>
mockPostV2SetUserRateLimitTier(...args),
}));
import { useRateLimitManager } from "../useRateLimitManager";
function makeRateLimitResponse(overrides = {}) {
return {
user_id: "user-123",
user_email: "alice@example.com",
daily_token_limit: 10000,
weekly_token_limit: 50000,
daily_tokens_used: 2500,
weekly_tokens_used: 10000,
tier: "FREE",
...overrides,
};
}
beforeEach(() => {
mockToast.mockClear();
mockGetV2GetUserRateLimit.mockReset();
mockGetV2SearchUsersByNameOrEmail.mockReset();
mockPostV2ResetUserRateLimitUsage.mockReset();
mockPostV2SetUserRateLimitTier.mockReset();
});
afterEach(() => {
cleanup();
});
describe("useRateLimitManager", () => {
it("returns initial state", () => {
const { result } = renderHook(() => useRateLimitManager());
expect(result.current.isSearching).toBe(false);
expect(result.current.isLoadingRateLimit).toBe(false);
expect(result.current.searchResults).toEqual([]);
expect(result.current.selectedUser).toBeNull();
expect(result.current.rateLimitData).toBeNull();
});
it("handleSearch does nothing for a whitespace-only query", async () => {
const { result } = renderHook(() => useRateLimitManager());
await act(async () => {
await result.current.handleSearch(" ");
});
expect(mockGetV2GetUserRateLimit).not.toHaveBeenCalled();
expect(mockGetV2SearchUsersByNameOrEmail).not.toHaveBeenCalled();
});
it("handleSearch does direct lookup for email input", async () => {
const data = makeRateLimitResponse();
mockGetV2GetUserRateLimit.mockResolvedValue({ status: 200, data });
const { result } = renderHook(() => useRateLimitManager());
await act(async () => {
await result.current.handleSearch("alice@example.com");
});
expect(mockGetV2GetUserRateLimit).toHaveBeenCalledWith({
email: "alice@example.com",
});
expect(result.current.rateLimitData).toEqual(data);
expect(result.current.selectedUser).toEqual({
user_id: "user-123",
user_email: "alice@example.com",
});
});
it("handleSearch does direct lookup for UUID input", async () => {
const uuid = "550e8400-e29b-41d4-a716-446655440000";
const data = makeRateLimitResponse({ user_id: uuid });
mockGetV2GetUserRateLimit.mockResolvedValue({ status: 200, data });
const { result } = renderHook(() => useRateLimitManager());
await act(async () => {
await result.current.handleSearch(uuid);
});
expect(mockGetV2GetUserRateLimit).toHaveBeenCalledWith({
user_id: uuid,
});
expect(result.current.rateLimitData).toEqual(data);
});
it("handleSearch shows error toast on direct lookup failure", async () => {
mockGetV2GetUserRateLimit.mockResolvedValue({ status: 404 });
const { result } = renderHook(() => useRateLimitManager());
await act(async () => {
await result.current.handleSearch("alice@example.com");
});
expect(mockToast).toHaveBeenCalledWith(
expect.objectContaining({
title: "Error",
variant: "destructive",
}),
);
expect(result.current.rateLimitData).toBeNull();
});
it("handleSearch does fuzzy search for partial text", async () => {
const users = [
{ user_id: "u1", user_email: "alice@example.com" },
{ user_id: "u2", user_email: "bob@example.com" },
];
mockGetV2SearchUsersByNameOrEmail.mockResolvedValue({
status: 200,
data: users,
});
const { result } = renderHook(() => useRateLimitManager());
await act(async () => {
await result.current.handleSearch("alice");
});
expect(mockGetV2SearchUsersByNameOrEmail).toHaveBeenCalledWith({
query: "alice",
limit: 20,
});
expect(result.current.searchResults).toEqual(users);
});
it("handleSearch shows toast when fuzzy search returns no results", async () => {
mockGetV2SearchUsersByNameOrEmail.mockResolvedValue({
status: 200,
data: [],
});
const { result } = renderHook(() => useRateLimitManager());
await act(async () => {
await result.current.handleSearch("nonexistent");
});
expect(mockToast).toHaveBeenCalledWith(
expect.objectContaining({ title: "No results" }),
);
expect(result.current.searchResults).toEqual([]);
});
it("handleSearch shows error toast on fuzzy search failure", async () => {
mockGetV2SearchUsersByNameOrEmail.mockResolvedValue({ status: 500 });
const { result } = renderHook(() => useRateLimitManager());
await act(async () => {
await result.current.handleSearch("alice");
});
expect(mockToast).toHaveBeenCalledWith(
expect.objectContaining({
title: "Error",
variant: "destructive",
}),
);
});
it("handleSelectUser fetches rate limit for selected user", async () => {
const data = makeRateLimitResponse();
mockGetV2GetUserRateLimit.mockResolvedValue({ status: 200, data });
const { result } = renderHook(() => useRateLimitManager());
await act(async () => {
await result.current.handleSelectUser({
user_id: "user-123",
user_email: "alice@example.com",
});
});
expect(mockGetV2GetUserRateLimit).toHaveBeenCalledWith({
user_id: "user-123",
});
expect(result.current.selectedUser).toEqual({
user_id: "user-123",
user_email: "alice@example.com",
});
expect(result.current.rateLimitData).toEqual(data);
});
it("handleSelectUser shows error toast on fetch failure", async () => {
mockGetV2GetUserRateLimit.mockResolvedValue({ status: 500 });
const { result } = renderHook(() => useRateLimitManager());
await act(async () => {
await result.current.handleSelectUser({
user_id: "user-123",
user_email: "alice@example.com",
});
});
expect(mockToast).toHaveBeenCalledWith(
expect.objectContaining({
title: "Error",
variant: "destructive",
}),
);
expect(result.current.rateLimitData).toBeNull();
});
it("handleReset calls reset endpoint and updates data", async () => {
const initial = makeRateLimitResponse({ daily_tokens_used: 5000 });
const after = makeRateLimitResponse({ daily_tokens_used: 0 });
mockGetV2GetUserRateLimit.mockResolvedValue({ status: 200, data: initial });
mockPostV2ResetUserRateLimitUsage.mockResolvedValue({
status: 200,
data: after,
});
const { result } = renderHook(() => useRateLimitManager());
await act(async () => {
await result.current.handleSelectUser({
user_id: "user-123",
user_email: "alice@example.com",
});
});
await act(async () => {
await result.current.handleReset(false);
});
expect(mockPostV2ResetUserRateLimitUsage).toHaveBeenCalledWith({
user_id: "user-123",
reset_weekly: false,
});
expect(result.current.rateLimitData).toEqual(after);
expect(mockToast).toHaveBeenCalledWith(
expect.objectContaining({ title: "Success" }),
);
});
it("handleReset does nothing when no rate limit data", async () => {
const { result } = renderHook(() => useRateLimitManager());
await act(async () => {
await result.current.handleReset(false);
});
expect(mockPostV2ResetUserRateLimitUsage).not.toHaveBeenCalled();
});
it("handleReset shows error toast on failure", async () => {
const initial = makeRateLimitResponse();
mockGetV2GetUserRateLimit.mockResolvedValue({ status: 200, data: initial });
mockPostV2ResetUserRateLimitUsage.mockRejectedValue(
new Error("network error"),
);
const { result } = renderHook(() => useRateLimitManager());
await act(async () => {
await result.current.handleSelectUser({
user_id: "user-123",
user_email: "alice@example.com",
});
});
await act(async () => {
await result.current.handleReset(true);
});
expect(mockToast).toHaveBeenCalledWith(
expect.objectContaining({
title: "Error",
description: "Failed to reset rate limit usage.",
variant: "destructive",
}),
);
});
it("handleTierChange calls set tier and re-fetches", async () => {
const initial = makeRateLimitResponse({ tier: "FREE" });
const updated = makeRateLimitResponse({ tier: "PRO" });
mockGetV2GetUserRateLimit
.mockResolvedValueOnce({ status: 200, data: initial })
.mockResolvedValueOnce({ status: 200, data: updated });
mockPostV2SetUserRateLimitTier.mockResolvedValue({ status: 200 });
const { result } = renderHook(() => useRateLimitManager());
await act(async () => {
await result.current.handleSelectUser({
user_id: "user-123",
user_email: "alice@example.com",
});
});
await act(async () => {
await result.current.handleTierChange("PRO");
});
expect(mockPostV2SetUserRateLimitTier).toHaveBeenCalledWith({
user_id: "user-123",
tier: "PRO",
});
expect(result.current.rateLimitData).toEqual(updated);
});
it("handleTierChange does nothing when no rate limit data", async () => {
const { result } = renderHook(() => useRateLimitManager());
await act(async () => {
await result.current.handleTierChange("PRO");
});
expect(mockPostV2SetUserRateLimitTier).not.toHaveBeenCalled();
});
it("handleReset shows error toast when endpoint returns non-200 status", async () => {
const initial = makeRateLimitResponse({ daily_tokens_used: 5000 });
mockGetV2GetUserRateLimit.mockResolvedValue({ status: 200, data: initial });
mockPostV2ResetUserRateLimitUsage.mockResolvedValue({ status: 500 });
const { result } = renderHook(() => useRateLimitManager());
await act(async () => {
await result.current.handleSelectUser({
user_id: "user-123",
user_email: "alice@example.com",
});
});
await act(async () => {
await result.current.handleReset(false);
});
expect(mockToast).toHaveBeenCalledWith(
expect.objectContaining({
title: "Error",
description: "Failed to reset rate limit usage.",
variant: "destructive",
}),
);
});
it("handleTierChange throws when set-tier endpoint returns non-200", async () => {
const initial = makeRateLimitResponse({ tier: "FREE" });
mockGetV2GetUserRateLimit.mockResolvedValue({ status: 200, data: initial });
mockPostV2SetUserRateLimitTier.mockResolvedValue({ status: 500 });
const { result } = renderHook(() => useRateLimitManager());
await act(async () => {
await result.current.handleSelectUser({
user_id: "user-123",
user_email: "alice@example.com",
});
});
await expect(
act(async () => {
await result.current.handleTierChange("PRO");
}),
).rejects.toThrow("Failed to update tier");
});
});

@@ -2,11 +2,13 @@
  import { useState } from "react";
  import { useToast } from "@/components/molecules/Toast/use-toast";
+ import type { SetUserTierRequest } from "@/app/api/__generated__/models/setUserTierRequest";
  import type { UserRateLimitResponse } from "@/app/api/__generated__/models/userRateLimitResponse";
  import {
  getV2GetUserRateLimit,
- getV2GetAllUsersHistory,
+ getV2SearchUsersByNameOrEmail,
  postV2ResetUserRateLimitUsage,
+ postV2SetUserRateLimitTier,
  } from "@/app/api/__generated__/endpoints/admin/admin";
  export interface UserOption {
@@ -14,18 +16,10 @@ export interface UserOption {
  user_email: string;
  }
- /**
- * Returns true when the input looks like a complete email address.
- * Used to decide whether to call the direct email lookup endpoint
- * vs. the broader user-history search.
- */
  function looksLikeEmail(input: string): boolean {
  return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(input);
  }
- /**
- * Returns true when the input looks like a UUID (user ID).
- */
  function looksLikeUuid(input: string): boolean {
  return /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i.test(
  input,
@@ -41,7 +35,6 @@ export function useRateLimitManager() {
  const [rateLimitData, setRateLimitData] =
  useState<UserRateLimitResponse | null>(null);
- /** Direct lookup by email or user ID via the rate-limit endpoint. */
  async function handleDirectLookup(trimmed: string) {
  setIsSearching(true);
  setSearchResults([]);
@@ -77,7 +70,6 @@ export function useRateLimitManager() {
  }
  }
- /** Fuzzy name/email search via the spending-history endpoint. */
  async function handleFuzzySearch(trimmed: string) {
  setIsSearching(true);
  setSearchResults([]);
@@ -85,38 +77,21 @@ export function useRateLimitManager() {
  setRateLimitData(null);
  try {
- const response = await getV2GetAllUsersHistory({
- search: trimmed,
- page: 1,
- page_size: 50,
+ const response = await getV2SearchUsersByNameOrEmail({
+ query: trimmed,
+ limit: 20,
  });
  if (response.status !== 200) {
  throw new Error("Failed to search users");
  }
- // Deduplicate by user_id to get unique users
- const seen = new Set<string>();
- const users: UserOption[] = [];
- for (const tx of response.data.history) {
- if (!seen.has(tx.user_id)) {
- seen.add(tx.user_id);
- users.push({
- user_id: tx.user_id,
- user_email: String(tx.user_email ?? tx.user_id),
- });
- }
- }
+ const users = (response.data ?? []).map((u) => ({
+ user_id: u.user_id,
+ user_email: u.user_email ?? u.user_id,
+ }));
  if (users.length === 0) {
- toast({
- title: "No results",
- description: "No users found matching your search.",
- });
+ toast({ title: "No results", description: "No users found." });
  }
- // Always show the result list so the user explicitly picks a match.
- // The history endpoint paginates transactions, not users, so a single
- // page may not be authoritative -- avoid auto-selecting.
  setSearchResults(users);
  } catch (error) {
  console.error("Error searching users:", error);
@@ -199,6 +174,32 @@ export function useRateLimitManager() {
  }
  }
+ async function handleTierChange(newTier: string) {
+ if (!rateLimitData) return;
+ const response = await postV2SetUserRateLimitTier({
+ user_id: rateLimitData.user_id,
+ tier: newTier as SetUserTierRequest["tier"],
+ });
+ if (response.status !== 200) {
+ throw new Error("Failed to update tier");
+ }
+ // Re-fetch rate limit data to reflect new tier-adjusted limits.
+ try {
+ const refreshResponse = await getV2GetUserRateLimit({
+ user_id: rateLimitData.user_id,
+ });
+ if (refreshResponse.status === 200) {
+ setRateLimitData(refreshResponse.data);
+ }
+ } catch {
+ // Tier was changed server-side; UI will be stale but not incorrect.
+ // The caller's success toast is still valid: the tier change worked.
+ }
+ }
  return {
  isSearching,
  isLoadingRateLimit,
@@ -208,5 +209,6 @@ export function useRateLimitManager() {
  handleSearch,
  handleSelectUser,
  handleReset,
+ handleTierChange,
  };
  }
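
The tests for `handleSearch` suggest a three-way routing: blank input does nothing, a full email or a UUID goes to the direct rate-limit lookup, and anything else goes to the fuzzy user search with `limit: 20`. The body of `handleSearch` is not part of this diff, so the following is only a minimal sketch of that routing under those assumptions. `routeSearch` and `SearchRoute` are hypothetical names introduced for illustration; the two regular expressions are copied from `looksLikeEmail` and `looksLikeUuid` above.

```typescript
// Hypothetical helper, not part of the diff: it only mirrors the routing that
// the useRateLimitManager tests exercise through handleSearch.
type SearchRoute =
  | { kind: "none" }
  | { kind: "direct"; params: { email: string } | { user_id: string } }
  | { kind: "fuzzy"; params: { query: string; limit: number } };

// Same patterns as looksLikeEmail / looksLikeUuid in the hook.
const EMAIL_RE = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
const UUID_RE =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function routeSearch(input: string, limit: number = 20): SearchRoute {
  // Assumption: the hook trims the query before classifying it.
  const trimmed = input.trim();
  if (trimmed.length === 0) {
    return { kind: "none" };
  }
  if (EMAIL_RE.test(trimmed)) {
    // Parameters as expected by the test for getV2GetUserRateLimit({ email }).
    return { kind: "direct", params: { email: trimmed } };
  }
  if (UUID_RE.test(trimmed)) {
    // Parameters as expected by the test for getV2GetUserRateLimit({ user_id }).
    return { kind: "direct", params: { user_id: trimmed } };
  }
  // Everything else falls through to the name/email search endpoint.
  return { kind: "fuzzy", params: { query: trimmed, limit } };
}
```

Keeping the classification in a pure function like this would let the routing be unit-tested without mocking the generated API client, though whether the hook is structured this way is not visible here.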

@@ -40,14 +40,14 @@ export const ContentRenderer: React.FC<{
  !shortContent
  ) {
  return (
- <div className="overflow-hidden [&>*]:rounded-xlarge [&>*]:!text-xs [&_pre]:whitespace-pre-wrap [&_pre]:break-words">
+ <div className="overflow-x-auto [&>*]:rounded-xlarge [&>*]:!text-xs [&_pre]:whitespace-pre-wrap [&_pre]:break-words">
  {renderer?.render(value, metadata)}
  </div>
  );
  }
  return (
- <div className="overflow-hidden [&>*]:rounded-xlarge [&>*]:!text-xs">
+ <div className="overflow-x-auto [&>*]:rounded-xlarge [&>*]:!text-xs">
  <TextRenderer value={value} truncateLengthLimit={200} />
  </div>
  );

@@ -8,6 +8,7 @@ import { Flag, useGetFlag } from "@/services/feature-flags/use-get-flag";
  import { SidebarProvider } from "@/components/ui/sidebar";
  import { cn } from "@/lib/utils";
  import { UploadSimple } from "@phosphor-icons/react";
+ import dynamic from "next/dynamic";
  import { useCallback, useEffect, useRef, useState } from "react";
  import { ChatContainer } from "./components/ChatContainer/ChatContainer";
  import { ChatSidebar } from "./components/ChatSidebar/ChatSidebar";
@@ -20,6 +21,14 @@ import { RateLimitResetDialog } from "./components/RateLimitResetDialog/RateLimi
  import { ScaleLoader } from "./components/ScaleLoader/ScaleLoader";
  import { useCopilotPage } from "./useCopilotPage";
+ const ArtifactPanel = dynamic(
+ () =>
+ import("./components/ArtifactPanel/ArtifactPanel").then(
+ (m) => m.ArtifactPanel,
+ ),
+ { ssr: false },
+ );
  export function CopilotPage() {
  const [isDragging, setIsDragging] = useState(false);
  const [droppedFiles, setDroppedFiles] = useState<File[]>([]);
@@ -80,6 +89,10 @@ export function CopilotPage() {
  isUploadingFiles,
  isUserLoading,
  isLoggedIn,
+ // Pagination
+ hasMoreMessages,
+ isLoadingMore,
+ loadMore,
  // Mobile drawer
  isMobile,
  isDrawerOpen,
@@ -116,6 +129,7 @@ export function CopilotPage() {
  const resetCost = usage?.reset_cost;
  const isBillingEnabled = useGetFlag(Flag.ENABLE_PLATFORM_PAYMENT);
+ const isArtifactsEnabled = useGetFlag(Flag.ARTIFACTS);
  const { credits, fetchCredits } = useCredits({ fetchInitialCredits: true });
  const hasInsufficientCredits =
  credits !== null && resetCost != null && credits < resetCost;
@@ -150,48 +164,55 @@ export function CopilotPage() {
  className="h-[calc(100vh-72px)] min-h-0"
  >
  {!isMobile && <ChatSidebar />}
- <div
- className="relative flex h-full w-full flex-col overflow-hidden bg-[#f8f8f9] px-0"
- onDragEnter={handleDragEnter}
- onDragOver={handleDragOver}
- onDragLeave={handleDragLeave}
- onDrop={handleDrop}
- >
- {isMobile && <MobileHeader onOpenDrawer={handleOpenDrawer} />}
- <NotificationBanner />
- {/* Drop overlay */}
+ <div className="flex h-full w-full flex-row overflow-hidden">
  <div
- className={cn(
- "pointer-events-none absolute inset-0 z-50 flex flex-col items-center justify-center gap-3 rounded-lg border-2 border-dashed border-violet-400 bg-violet-500/10 transition-opacity duration-150",
- isDragging ? "opacity-100" : "opacity-0",
- )}
+ className="relative flex min-w-0 flex-1 flex-col overflow-hidden bg-[#f8f8f9] px-0"
+ onDragEnter={handleDragEnter}
+ onDragOver={handleDragOver}
+ onDragLeave={handleDragLeave}
+ onDrop={handleDrop}
  >
- <UploadSimple className="h-10 w-10 text-violet-500" weight="bold" />
- <span className="text-lg font-medium text-violet-600">
- Drop files here
- </span>
- </div>
- <div className="flex-1 overflow-hidden">
- <ChatContainer
- messages={messages}
- status={status}
- error={error}
- sessionId={sessionId}
- isLoadingSession={isLoadingSession}
- isSessionError={isSessionError}
- isCreatingSession={isCreatingSession}
- isReconnecting={isReconnecting}
- isSyncing={isSyncing}
- onCreateSession={createSession}
- onSend={onSend}
- onStop={stop}
- isUploadingFiles={isUploadingFiles}
- droppedFiles={droppedFiles}
- onDroppedFilesConsumed={handleDroppedFilesConsumed}
- historicalDurations={historicalDurations}
- />
+ {isMobile && <MobileHeader onOpenDrawer={handleOpenDrawer} />}
+ <NotificationBanner />
+ {/* Drop overlay */}
+ <div
+ className={cn(
+ "pointer-events-none absolute inset-0 z-50 flex flex-col items-center justify-center gap-3 rounded-lg border-2 border-dashed border-violet-400 bg-violet-500/10 transition-opacity duration-150",
+ isDragging ? "opacity-100" : "opacity-0",
+ )}
+ >
+ <UploadSimple className="h-10 w-10 text-violet-500" weight="bold" />
+ <span className="text-lg font-medium text-violet-600">
+ Drop files here
+ </span>
+ </div>
+ <div className="flex-1 overflow-hidden">
+ <ChatContainer
+ messages={messages}
+ status={status}
+ error={error}
+ sessionId={sessionId}
+ isLoadingSession={isLoadingSession}
+ isSessionError={isSessionError}
+ isCreatingSession={isCreatingSession}
+ isReconnecting={isReconnecting}
+ isSyncing={isSyncing}
+ onCreateSession={createSession}
+ onSend={onSend}
+ onStop={stop}
+ isUploadingFiles={isUploadingFiles}
+ hasMoreMessages={hasMoreMessages}
+ isLoadingMore={isLoadingMore}
+ onLoadMore={loadMore}
+ droppedFiles={droppedFiles}
+ onDroppedFilesConsumed={handleDroppedFilesConsumed}
+ historicalDurations={historicalDurations}
+ />
+ </div>
  </div>
+ {!isMobile && isArtifactsEnabled && <ArtifactPanel />}
  </div>
+ {isMobile && isArtifactsEnabled && <ArtifactPanel mobile />}
  {isMobile && (
  <MobileDrawer
  isOpen={isDrawerOpen}

Some files were not shown because too many files have changed in this diff.