Compare commits

..

33 Commits

Author SHA1 Message Date
Zamil Majdy
da2d3418bd Merge remote-tracking branch 'origin/dev' into feat/copilot-token-optimization
# Conflicts:
#	autogpt_platform/backend/backend/copilot/prompting.py
2026-03-17 06:18:51 +07:00
Zamil Majdy
9a41312769 feat(backend/copilot): parse @@agptfile bare refs by file extension (#12392)
The `@@agptfile:` expansion system previously used content-sniffing
(trying
`json.loads` then `csv.Sniffer`) to decide whether to parse file content
as
structured data. This was fragile — a file containing just `"42"` would
be
parsed as an integer, and the heuristics could misfire on ambiguous
content.

This PR replaces content-sniffing with **extension/MIME-based format
detection**.
When the file has a well-known extension (`.json`, `.csv`, etc.) or MIME
type
fragment (`workspace://id#application/json`), the content is parsed
accordingly.
Unknown formats or parse failures always fall back to plain string — no
surprises.

> [!NOTE]
> This PR builds on the `@@agptfile:` file reference protocol introduced
in #12332 and the structured data auto-parsing added in #12390.
>
> **What is `@@agptfile:`?**
> It is a special URI prefix (e.g. `@@agptfile:workspace:///report.csv`)
that the CoPilot SDK expands inline before sending tool arguments to
blocks. This lets the AI reference workspace files by name, and the SDK
automatically reads and injects the file content. See #12332 for the
full design.

### Changes 🏗️

**New utility: `backend/util/file_content_parser.py`**
- `infer_format(uri)` — determines format from file extension or MIME
fragment
- `parse_file_content(content, fmt)` — parses content, never raises
- Supported text formats: JSON, JSONL/NDJSON, CSV, TSV, YAML, TOML
- Supported binary formats: Parquet (via pyarrow), Excel/XLSX (via
openpyxl)
- JSON scalars (strings, numbers, booleans, null) stay as strings — only
  containers (arrays, objects) are promoted
- CSV/TSV require ≥1 row and ≥2 columns to qualify as tabular data
- Added `openpyxl` dependency for Excel reading via pandas
- Case-insensitive MIME fragment matching per RFC 2045
- Shared `PARSE_EXCEPTIONS` constant to avoid duplication between
modules

**Updated `expand_file_refs_in_args` in `file_ref.py`**
- Bare refs now use `infer_format` + `parse_file_content` instead of the
  old `_try_parse_structured` content-sniffing function
- Binary formats (parquet, xlsx) read raw bytes via `read_file_bytes`
- Embedded refs (text around `@@agptfile:`) still produce plain strings
- **Size guards**: Workspace and sandbox file reads now enforce a 10 MB
limit
  (matching the existing local file limit) to prevent OOM on large files

**Updated `blocks/github/commits.py`**
- Consolidated `_create_blob` and `_create_binary_blob` into a single
function
  with an `encoding` parameter

**Updated copilot system prompt**
- Documents the extension-based structured data parsing and supported
formats

**66 new tests** in `file_content_parser_test.py` covering:
- Format inference (extension, MIME, case-insensitive, precedence)
- All 8 format parsers (happy path + edge cases + fallbacks)
- Binary format handling (string input fallback, invalid bytes fallback)
- Unknown format passthrough

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] All 66 file_content_parser_test.py tests pass
  - [x] All 31 file_ref_test.py tests pass
  - [x] All 13 file_ref_integration_test.py tests pass
  - [x] `poetry run format` passes clean (including pyright)
2026-03-16 22:31:21 +00:00
Ubbe
048fb06b0a feat(frontend): add "Jump Back In" button to Library page (#12387)
Adds a "Jump Back In" CTA at the top of the Library page to encourage
users to quickly rerun their most recently successful agent.

Closes SECRT-1536

### Changes 🏗️

- New `JumpBackIn` component with `useJumpBackIn` hook at
`library/components/JumpBackIn/`
- Fetches first page of library agents sorted by `updatedAt`
- Finds the first agent with a `COMPLETED` execution in
`recent_executions`
- Shows banner with agent name + "Jump Back In" button linking to
`/library/agents/{id}`
- Returns `null` (hidden) when loading or when no agent with a
successful run exists

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] `pnpm format`, `pnpm lint`, `pnpm types` all pass
- [x] Verified banner is hidden when no successful runs exist (edge
case)
- [x] Verified library page renders correctly with no visual regressions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-16 21:35:03 +08:00
Zamil Majdy
3f653e6614 dx(.claude): refactor and consolidate Claude Code skills (#12424)
Refactors the Claude Code skills for a cleaner, more intuitive dev loop.

### Changes 🏗️

- **`/pr-review` (new)**: Actual code review skill — reads the PR diff,
fetches existing comments to avoid duplicates, and posts inline GitHub
comments with structured feedback (Blockers / Should Fix / Nice to Have
/ Nit) covering correctness, security, code quality, architecture, and
testing.

- **`/pr-address` (was `/babysit-pr`)**: Addresses review comments and
monitors CI until green. Renamed from `/babysit-pr` to `/pr-address` to
better reflect its purpose. Handles bot-specific feedback
(autogpt-reviewer, sentry, coderabbitai) and loops until all comments
are addressed and CI is green.

- **`/backend-check` + `/frontend-check` → `/check`**: Unified into a
single `/check` skill that auto-detects whether backend (Python) or
frontend (TypeScript) code changed and runs the appropriate formatting,
linting, type checking, and tests. Shared code quality rules applied to
both.

- **`/code-style` enhanced**: Now covers both Python and
TypeScript/React. Added learnings from real PR work: lazy `%s` logging,
TOCTOU awareness, SSE protocol rules (`data:` vs `: comment`), FastAPI
`Security()` vs `Depends()`, Redis pipeline atomicity, error path
sanitization, mock target rules after refactoring.

- **`/worktree` fixed**: Normal `git worktree` is now the default (was
branchlet-first). Branchlet moved to optional section. All paths derived
from `git rev-parse --show-toplevel`.

- **`/pr-create`, `/openapi-regen`, `/new-block` cleaned up**: Reference
`/check` and `/code-style` instead of duplicating instructions.

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Verified all skill files parse correctly (valid YAML frontmatter)
  - [x] Verified skill auto-detection triggers updated in descriptions
- [x] Verified old backend-check and frontend-check directories removed
- [x] Verified pr-review and pr-address directories created with correct
content
2026-03-16 10:35:05 +00:00
Zamil Majdy
c9c3d54b2b fix(platform): reduce Sentry noise by filtering expected errors and downgrading log levels (#12430)
## Summary

Reduces Sentry error noise by ~90% by filtering out expected/transient
errors and downgrading inappropriate error-level logs to warnings. Most
of the top Sentry issues are not actual bugs but expected conditions
(user errors, transient infra, business logic) that were incorrectly
logged at ERROR level, causing them to be captured as Sentry events.

## Changes

### 1. Sentry `before_send` filter (`metrics.py`)
Added a `before_send` hook to filter known expected errors before they
reach Sentry:
- **AMQP/RabbitMQ connection errors** — transient during
deploys/restarts
- **User credential errors** — invalid API keys, missing auth headers
(user error, not platform bug)
- **Insufficient balance** — expected business logic
- **Blocked IP access** — security check working as intended
- **Discord bot token errors** — misconfiguration, not runtime error
- **Google metadata DNS errors** — expected in non-GCP environments
- **Inactive email recipients** — expected for bounced addresses
- **Unclosed client sessions/connectors** — resource cleanup noise

### 2. Connection retry log levels (`retry.py`)
- `conn_retry` final failure: `error` → `warning` (these are infra
retries, not bugs)
- `conn_retry` wrapper final failure: `error` → `warning`
- Discord alert send failure: `error` → `warning`

### 3. Block execution Sentry capture (`manager.py`)
- Skip `sentry_sdk.capture_exception()` for `ValueError` subclasses
(BlockExecutionError, BlockInputError, InsufficientBalanceError, etc.) —
these are user-caused errors, not platform bugs
- Downgrade executor shutdown/disconnect errors to warning

### 4. Scheduler log levels (`scheduler.py`)
- Graph validation failure: `error` → `warning` (expected for
old/invalid graphs)
- Unable to unschedule graph: `error` → `warning`
- Job listener failure: `error` → `warning`
- Async operation failure: `error` → `warning`

### 5. Discord system alert (`notifications.py`)
- Wrapped `discord_system_alert` endpoint with try/catch to prevent
unhandled exceptions (fixes AUTOGPT-SERVER-743, AUTOGPT-SERVER-7MW)

### 6. Notification system log levels (`notifications.py`)
- All batch processing errors: `error` → `warning`
- User email not found: `error` → `warning`
- Notification parsing errors: `error` → `warning`
- Email sending failures: `error` → `warning`
- Summary data gathering failure: `error` → `warning`
- Cleaned up unprofessional error messages

### 7. Cloud storage cleanup (`cloud_storage.py`)
- Cleanup error: `error` → `warning`

## Sentry Issues Addressed

### AMQP/RabbitMQ (~3.4M events total)
-
[AUTOGPT-SERVER-3H2](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-3H2)
— AMQPConnector ConnectionRefusedError (1.2M events)
-
[AUTOGPT-SERVER-3H3](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-3H3)
— AMQPConnectionWorkflowFailed (770K events)
-
[AUTOGPT-SERVER-3H4](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-3H4)
— AMQP connection workflow failed (770K events)
-
[AUTOGPT-SERVER-3H5](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-3H5)
— AMQPConnectionWorkflow reporting failure (770K events)
-
[AUTOGPT-SERVER-3H7](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-3H7)
— Socket failed to connect (514K events)
-
[AUTOGPT-SERVER-3H8](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-3H8)
— TCP Connection attempt failed (514K events)
-
[AUTOGPT-SERVER-3H6](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-3H6)
— AMQPConnectionError (93K events)
-
[AUTOGPT-SERVER-7SX](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-7SX)
— Error creating transport (69K events)
-
[AUTOGPT-SERVER-1TN](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-1TN)
— ChannelInvalidStateError (39K events)
-
[AUTOGPT-SERVER-6JC](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-6JC)
— ConnectionClosedByBroker (2K events)
-
[AUTOGPT-SERVER-6RJ/6RK/6RN/6RQ/6RP/6RR](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-6RJ)
— Various connection failures (~15K events)
-
[AUTOGPT-SERVER-4A5/6RM/7XN](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-4A5)
— Connection close/transport errors (~540 events)

### User Credential Errors (~15K events)
-
[AUTOGPT-SERVER-6S5](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-6S5)
— Incorrect OpenAI API key (9.2K events)
-
[AUTOGPT-SERVER-7W4](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-7W4)
— Incorrect API key in AIConditionBlock (3.4K events)
-
[AUTOGPT-SERVER-83Y](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-83Y)
— AI condition invalid key (2.3K events)
-
[AUTOGPT-SERVER-7ZP](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-7ZP)
— Perplexity missing auth header (451 events)
-
[AUTOGPT-SERVER-7XK/7XM](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-7XK)
— Anthropic invalid key (125 events)
-
[AUTOGPT-SERVER-82C](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-82C)
— Missing auth header (27 events)
-
[AUTOGPT-SERVER-721](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-721)
— Ideogram invalid token (165 events)

### Business Logic / Validation (~120K events)
-
[AUTOGPT-SERVER-7YQ](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-7YQ)
— Disabled block used in graph (56K events)
-
[AUTOGPT-SERVER-6W3](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-6W3)
— Graph failed validation (46K events)
-
[AUTOGPT-SERVER-6W2](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-6W2)
— Unable to unschedule graph (46K events)
-
[AUTOGPT-SERVER-83X](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-83X)
— Blocked IP access (15K events)
-
[AUTOGPT-SERVER-6K9](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-6K9)
— Insufficient balance (4K events)

### Discord Alert Failures (~24K events)
-
[AUTOGPT-SERVER-743](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-743)
— Discord improper token (22K events)
-
[AUTOGPT-SERVER-7MW](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-7MW)
— Discord 403 Missing Access (1.5K events)

### Notification System (~16K events)
-
[AUTOGPT-SERVER-550](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-550)
— Notification batch create error (8.3K events)
-
[AUTOGPT-SERVER-58H](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-58H)
— ValidationError for NotificationEventModel (3K events)
-
[AUTOGPT-SERVER-5C6](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-5C6)
— Get notification batch error (2.1K events)
-
[AUTOGPT-SERVER-4BT](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-4BT)
— Notification batch create error (1.8K events)
-
[AUTOGPT-SERVER-5E4](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-5E4)
— NotificationPreference validation (1.4K events)
-
[AUTOGPT-SERVER-508](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-508)
— Inactive email recipients (702 events)

### Infrastructure / Transient (~20K events)
-
[AUTOGPT-SERVER-6WJ](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-6WJ)
— Unclosed client session (13K events)
-
[AUTOGPT-SERVER-745](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-745)
— Unclosed connector (5.8K events)
-
[AUTOGPT-SERVER-4V1](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-4V1)
— Google metadata DNS error (2.2K events)
-
[AUTOGPT-SERVER-80J](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-80J)
— CloudStorage DNS error (35 events)

### Executor Shutdown
-
[AUTOGPT-SERVER-55J](https://significant-gravitas.sentry.io/issues/AUTOGPT-SERVER-55J)
— Error disconnecting run client (118 events)

## Test plan
- [x] All pre-commit hooks pass (Ruff, isort, Black, Pyright typecheck)
- [x] All changed modules import successfully
- [ ] Deploy to staging and verify Sentry event volume drops
significantly
- [ ] Verify legitimate errors still appear in Sentry
2026-03-16 10:29:01 +00:00
Ubbe
53d58e21d3 feat(frontend): replace technical block terminology with user-friendly labels (#12389)
## Summary
- Replaces all user-facing "block" terminology in the CoPilot activity
stream with plain-English labels ("Step failed", "action",
"Credentials", etc.)
- Adds `humanizeFileName()` utility to display file names without
extensions, with title-case and spaces (e.g. `executive_memo.md` →
`"Executive Memo"`)
- Updates error messages across RunBlock, RunAgent, and FindBlocks tools
to use friendly language

## Test plan
- [ ] Open CoPilot and trigger a block execution — verify animation text
says "Running" / "Step failed" instead of "Running the block" / "Error
running block"
- [ ] Trigger a file read/write action — verify the activity shows
humanized file names (e.g. `Reading "Executive Memo"` not `Reading
executive_memo.md`)
- [ ] Trigger FindBlocks — verify labels say "Searching for actions" and
"Results" instead of "Searching for blocks" and "Block results"
- [ ] Check the work-done stats bar — verify it shows "action" /
"actions" instead of "block run" / "block runs"
- [ ] Trigger a setup requirements card — verify labels say
"Credentials" and "Inputs" instead of "Block credentials" and "Block
inputs"
- [ ] Visit `/copilot/styleguide` — verify error test data no longer
contains "Block execution" text

Resolves: [SECRT-2025](https://linear.app/autogpt/issue/SECRT-2025)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-16 09:00:25 +00:00
Ubbe
fa04fb41d8 feat(frontend): add "Run now" button to schedule view (#12388)
Adds a "Run now" action to the schedule detail view and sidebar
dropdown, allowing users to immediately trigger a scheduled agent run
without waiting for the next cron execution.

### Changes 🏗️

- **`useSelectedScheduleActions.ts`**: Added
`usePostV1ExecuteGraphAgent` hook and `handleRunNow` function that
executes the agent using the schedule's stored `input_data` and
`input_credentials`. On success, invalidates runs query and navigates to
the new run
- **`SelectedScheduleActions.tsx`**: Added Play icon button as first
action button, with loading spinner while running
- **`SelectedScheduleView.tsx`**: Threads `onSelectRun` prop and
`schedule` object to action components (both mobile and desktop layouts)
- **`NewAgentLibraryView.tsx`**: Passes `onSelectRun` handler to enable
navigation to the new run after execution
- **`ScheduleActionsDropdown.tsx`**: Added "Run now" dropdown menu item
with same execution logic
- **`ScheduleListItem.tsx`**: Added `onRunCreated` prop passed to
dropdown
- **`SidebarRunsList.tsx`**: Connects sidebar dropdown to run
selection/navigation

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] `pnpm format`, `pnpm lint`, `pnpm types` all pass
- [x] Code review: follows existing patterns (mirrors "Run Again" in
SelectedRunActions)
  - [x] No visual regressions on agent detail page

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-16 17:00:41 +08:00
Zamil Majdy
0faee668ab fix(copilot): address review comments — should-fix + nice-to-have items
Should Fix:
- run_block.py: remove block_name from properties (never consumed via kwargs)
- tool_schema_test.py: cache _ALL_SCHEMAS at module level (avoid double invocation)
- workspace_files.py: restore routing-hint description for list_workspace_files
- feature_requests.py: restore "phone numbers" in PII prohibition list

Nice to Have:
- prompting.py: restore attention signal on Storage heading (### Storage — important)
- tool_schema_test.py: clarify token baseline comment (tool JSON only, ~6470 tokens)
- bash_exec.py: restore use-case hints in description
- get_agent_building_guide.py: restore AgentExecutorBlock/MCPToolBlock signal in description
2026-03-16 06:14:12 +07:00
Zamil Majdy
64790e769a fix(copilot): include all run_time values in agent_output schema description 2026-03-15 22:55:48 +07:00
Zamil Majdy
3bc8db491f fix(copilot): address autogpt-reviewer should-fix items from PR #12398 review
- run_block.py: mark block_name as optional in description
- web_fetch.py: restore SSRF hint "Public URLs only — internal addresses blocked"
- agent_output.py: restore "Returns current state on timeout" to wait_if_running;
  trim run_time description from ~140 chars to ~75 chars
- workspace_files.py: add trailing period to filename description
- tool_schema_test.py: add token budget regression test (assert < 8000 tokens,
  current baseline ~5200 tokens) to lock in the 34% token reduction
2026-03-15 22:18:43 +07:00
Zamil Majdy
f0c3eb87d1 fix(copilot): address reviewer should-fix items for tool schemas
- Add wait_if_running hint to view_agent_output tool description
- Make block_name optional in run_block schema (required but unused)
2026-03-15 14:05:02 +07:00
Zamil Majdy
1db18f2f0a fix(backend/copilot): add tool schema regression test 2026-03-14 23:00:22 +07:00
Zamil Majdy
e542880660 fix: clarify list_workspace_files description re cross-session persistence
The description "Current session only by default" was confusing for the LLM,
implying the workspace itself is session-scoped. Reworded to clarify the
workspace is persistent and only the default view is filtered to the current
session.
2026-03-14 22:52:18 +07:00
Zamil Majdy
f89a8e5de0 fix(copilot): improve tool schema descriptions for clarity and completeness
- Expand run_time description to list all supported parser inputs (today, last 7 days, last month, last 30 days, YYYY-MM-DD, ISO datetime)
- Add cross-tool reference hints (find_block before run_block, get_mcp_guide for URLs)
- Add mutual exclusivity notes for write_workspace_file content params
- Add auth guidance to browser_navigate, return note to browser_act
- Fix duplicate line range in agptfile example
2026-03-14 22:46:38 +07:00
Zamil Majdy
84c58b6624 fix: restore critical cross-tool references in trimmed tool descriptions
- Add web_fetch preference hint to browser_navigate
- Restore continue_run_block reference in run_block
- Restore fix_agent_graph reference in validate_agent
- Add back ephemeral temp directory example in @@agptfile docs
- Restore search quality hint in find_agent query description
- Restore get_doc_page cross-reference in search_docs
2026-03-14 22:27:44 +07:00
Zamil Majdy
05182b8368 Merge remote-tracking branch 'origin/dev' into feat/copilot-token-optimization 2026-03-14 22:11:13 +07:00
Otto
d9c16ded65 fix(copilot): prioritize block discovery over MCP and sanitize HTML errors (#12394)
Requested by @majdyz

When a user asks for Google Sheets integration, the CoPilot agent skips
block discovery entirely (despite 55+ Google Sheets blocks being
available), jumps straight to MCP, guesses a fake URL
(`https://sheets.googleapis.com/mcp`), and gets a raw HTML 404 error
page dumped into the conversation.

**Changes:**

1. **MCP guide** (`mcp_tool_guide.md`): Added "Check blocks first"
section directing the agent to use `find_block` before attempting MCP
for any service not in the known servers list. Explicitly prohibits
guessing/constructing MCP server URLs.

2. **Error handling** (`run_mcp_tool.py`): Detects HTML error pages in
HTTP responses (e.g. raw 404 pages from non-MCP endpoints) and returns a
clean one-liner like "This URL does not appear to host an MCP server"
instead of dumping the full HTML body.

**Note:** The main CoPilot system prompt (managed externally, not in
repo) should also be updated to reinforce block-first behavior in the
Capability Check section. This PR covers the in-repo changes.

Session reference: `9216df83-5f4a-48eb-9457-3ba2057638ae` (turn 3)
Ticket: [SECRT-2116](https://linear.app/autogpt/issue/SECRT-2116)

---
Co-authored-by: Zamil Majdy (@majdyz) <majdyz@gmail.com>

---------

Co-authored-by: Zamil Majdy (@majdyz) <majdyz@gmail.com>
Co-authored-by: Zamil Majdy <zamil.majdy@agpt.co>
2026-03-14 12:49:03 +00:00
Otto
6dc8429ae7 fix(copilot): downgrade agent validation failure log from error to warning (#12409)
Agent validation failures are expected when the LLM generates invalid
agent graphs (wrong block IDs, missing required inputs, bad output field
names). The validator catches these and returns proper error responses.

However, `validator.py:938` used `logger.error()`, which Sentry captures
as error events — flooding #platform-alerts with non-errors.

This changes it to `logger.warning()`, keeping the log visible for
debugging without triggering Sentry alerts.

Fixes SECRT-2120

---
Co-authored-by: Zamil Majdy (@majdyz) <zamil.majdy@agpt.co>
2026-03-14 12:48:36 +00:00
Zamil Majdy
f710bde7a9 Merge remote-tracking branch 'origin/dev' into feat/copilot-token-optimization 2026-03-14 10:21:04 +07:00
Zamil Majdy
cfe22e5a8f fix(backend/copilot): sync TranscriptBuilder with CLI on mid-stream compaction (#12401)
## Summary
- **Root cause**: `TranscriptBuilder` accumulates all raw SDK stream
messages including pre-compaction content. When the CLI compacts
mid-stream, the uploaded transcript was still uncompacted, causing
"Prompt is too long" errors on the next `--resume` turn.
- **Fix**: Detect mid-stream compaction via the `PreCompact` hook, read
the CLI's session file to get the compacted entries (summary +
post-compaction messages), and call
`TranscriptBuilder.replace_entries()` to sync it with the CLI's active
context. This ensures the uploaded transcript always matches what the
CLI sees.
- **Key changes**:
- `CompactionTracker`: stores `transcript_path` from `PreCompact` hook,
one-shot `compaction_just_ended` flag that correctly resets for multiple
compactions
- `read_compacted_entries()`: reads CLI session JSONL, finds
`isCompactSummary: true` entry, returns it + all entries after. Includes
path validation against the CLI projects directory.
- `TranscriptBuilder.replace_entries()`: clears and replaces all entries
with compacted ones, preserving `isCompactSummary` entries (which have
`type: "summary"` that would normally be stripped)
- `load_previous()`: also preserves `isCompactSummary` entries when
loading a previously compacted transcript
- Service stream loop: after compaction ends, reads compacted entries
and syncs TranscriptBuilder

## Test plan
- [x] 69 tests pass across `compaction_test.py` and `transcript_test.py`
- [x] Tests cover: one-shot flag behavior, multiple compactions within a
query, transcript path storage, path traversal rejection,
`read_compacted_entries` (7 tests), `replace_entries` (4 tests),
`load_previous` with compacted content (2 tests)
- [x] Pre-commit hooks pass (lint, format, typecheck)
- [ ] Manual test: trigger compaction in a multi-turn session and verify
the uploaded transcript reflects compaction
2026-03-13 22:17:46 +00:00
Zamil Majdy
1256e8a938 fix(copilot): fix indentation of persistent workspace sub-items in prompt template
Align bullet indentation under item 2 (Persistent workspace) to use
3-space indent, matching item 1's sub-items formatted via the
characteristics/persistence variables.
2026-03-13 22:39:37 +07:00
Zamil Majdy
777b71d409 fix(copilot): remove measure_copilot_tokens.py script from PR
Remove development-only token measurement script that should not be
included in the pull request.
2026-03-13 22:04:31 +07:00
Zamil Majdy
a6ecfe6e5b fix(copilot): address remaining PR review comments
- Add workspace:/// path and E2B sandbox path examples to @@agptfile docs
- Add multi-ref and expansion-rules note to file references section
- Add path guidance ("Use relative or absolute paths under this dir")
- Fix indentation for persistent workspace sub-items in storage template
- Add SPA handling guidance to browser_navigate description
- Strengthen "IMPORTANT" emphasis on browser_screenshot read_workspace_file step
2026-03-13 19:46:21 +07:00
Zamil Majdy
9cc93d84d4 Merge remote-tracking branch 'origin/dev' into feat/copilot-token-optimization 2026-03-13 19:27:00 +07:00
Zamil Majdy
a8259ca935 feat(analytics): read-only SQL views layer with analytics schema (#12367)
### Changes 🏗️

Adds `autogpt_platform/analytics/` — 14 SQL view definitions that expose
production data safely through a locked-down `analytics` schema.

**Security model:**
- Views use `security_invoker = false` (PostgreSQL 15+), so they execute
as their owner (`postgres`), not the caller
- `analytics_readonly` role only has access to `analytics.*` — cannot
touch `platform` or `auth` tables directly

**Files:**
- `backend/generate_views.py` — does everything; auto-reads credentials
from `backend/.env`
- `analytics/queries/*.sql` — 14 documented view definitions (auth, user
activity, executions, onboarding funnel, cohort retention)

---

### Running locally (dev)

```bash
cd autogpt_platform/backend

# First time only — creates analytics schema, role, grants
poetry run analytics-setup

# Create / refresh views (auto-reads backend/.env)
poetry run analytics-views
```

### Running in production (Supabase)

```bash
cd autogpt_platform/backend

# Step 1 — first time only (run in Supabase SQL Editor as postgres superuser)
poetry run analytics-setup --dry-run
# Paste the output into Supabase SQL Editor and run

# Step 2 — apply views (use direct connection host, not pooler)
poetry run analytics-views --db-url "postgresql://postgres:PASSWORD@db.<ref>.supabase.co:5432/postgres"

# Step 3 — set password for analytics_readonly so external tools can connect
# Run in Supabase SQL Editor:
# ALTER ROLE analytics_readonly WITH PASSWORD 'your-password';
```

---

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Setup + views applied cleanly on local Postgres 15
- [x] `analytics_readonly` can `SELECT` from all 14 `analytics.*` views
- [x] `analytics_readonly` gets `permission denied` on `platform.*` and
`auth.*` directly

---------

Co-authored-by: Otto (AGPT) <otto@agpt.co>
2026-03-13 12:04:42 +00:00
Zamil Majdy
32f1e51869 fix(copilot): address PR review — restore default values, strengthen guidance
- Restore JSON Schema "default" fields on: bash_exec timeout, find_block
  include_schemas, web_fetch extract_text, create_agent/edit_agent save
- Strengthen run_block description with IMPORTANT prefix for block ID warning
- Add find_library_agent search guidance to create_agent and edit_agent descriptions
2026-03-13 18:18:39 +07:00
Zamil Majdy
67a121bbe7 fix(copilot): restore functional schema constraints and critical behavioral guidance
Address review comments:
- Restore minimum/maximum JSON Schema constraints on limit parameter
- Restore default fields in JSON Schema for wait_for, annotate, filename, direction
- Restore read_workspace_file instruction after browser_screenshot
- Restore @@agptfile: syntax examples in system prompt
- Restore stronger 'do NOT guess' warning for block IDs
- Restore browser actions list in browser_act description
2026-03-13 18:14:23 +07:00
Swifty
1f1288d623 feat(copilot): generate personalized quick-action prompts from Tally business understanding (#12374)
During Tally data extraction, the system now also generates personalized
quick-action prompts as part of the existing LLM extraction call
(configurable model, defaults to GPT-4o-mini, `temperature=0.0`). The
prompt asks the LLM for 5 candidates, then the code validates (filters
prompts >20 words) and keeps the top 3. These prompts are stored in the
existing `CoPilotUnderstanding.data` JSON field (at the top level, not
under `business`) and served to the frontend via a new API endpoint. The
copilot chat page uses them instead of hardcoded defaults when
available.

### Changes 🏗️

**Backend – Data models** (`understanding.py`):
- Added `suggested_prompts` field to `BusinessUnderstandingInput`
(optional) and `BusinessUnderstanding` (default empty list)
- Updated `from_db()` to deserialize `suggested_prompts` from top-level
of the data JSON
- Updated `merge_business_understanding_data()` with overwrite strategy
for prompts (full replace, not append)
- `format_understanding_for_prompt()` intentionally does **not** include
`suggested_prompts` — they are UI-only

**Backend – Prompt generation** (`tally.py`):
- Extended `_EXTRACTION_PROMPT` to request 5 suggested prompts alongside
the existing business understanding fields — all extracted in a single
LLM call (`temperature=0.0`)
- Post-extraction validation filters out prompts exceeding 20 words and
slices to the top 3
- Model is now configurable via `tally_extraction_llm_model` setting
(defaults to `openai/gpt-4o-mini`)

**Backend – API endpoint** (`routes.py`):
- Added `GET /api/chat/suggested-prompts` (auth required)
- Returns `{prompts: string[]}` from the user's cached business
understanding (48h Redis TTL)
- Returns empty array if no understanding or no prompts exist

**Frontend** (`EmptySession/`):
- `helpers.ts`: Extracted defaults to `DEFAULT_QUICK_ACTIONS`,
`getQuickActions()` now accepts optional custom prompts and falls back
to defaults
- `EmptySession.tsx`: Calls `useGetV2GetSuggestedPrompts` hook
(`staleTime: Infinity`) and passes results to `getQuickActions()` with
hardcoded fallback
- Fixed `useEffect` resize handler that previously used
`window.innerWidth` as a dependency (re-ran every render); now uses a
proper resize event listener
- Added skeleton loading state while prompts are being fetched

**Generated** (`__generated__/`):
- Regenerated Orval API client with new endpoint types and hooks

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] Backend format + lint + pyright pass
  - [x] Frontend format + lint pass
  - [x] All existing tally tests pass (28/28)
  - [x] All chat route tests pass (9/9)
  - [x] All invited_user tests pass (7/7)
- [x] E2E: New user with tally data sees custom prompts on copilot page
  - [x] E2E: User without tally data sees hardcoded default prompts
  - [x] E2E: Clicking a custom prompt sends it as a chat message
2026-03-13 12:11:31 +01:00
Otto
02645732b8 feat(backend/copilot): enable E2B auto_resume and reduce safety-net timeout (#12397)
Enable E2B `auto_resume` lifecycle option and reduce the safety-net
timeout from 3 hours to 5 minutes.

Currently, if the explicit per-turn `pause_sandbox_direct()` call fails
(process crash, network issue, fire-and-forget task cancellation), the
sandbox keeps running for up to **3 hours** before the safety-net
timeout fires. With this change, worst-case billing drops to **5
minutes**.

### Changes
- Add `auto_resume: True` to sandbox lifecycle config — paused sandboxes
wake transparently on SDK activity
- Reduce `e2b_sandbox_timeout` default from 10800s (3h) → 300s (5min)
- Add `e2b_sandbox_auto_resume` config field (default: `True`)
- Guard: `auto_resume` only added when `on_timeout == "pause"`

### What doesn't change
- Explicit per-turn `pause_sandbox_direct()` remains the primary
mechanism
- `connect()` / `_try_reconnect()` flow unchanged
- Redis key management unchanged
- No latency impact (resume is ~1-2s regardless of trigger)

### Risk
Very low — `auto_resume` is additive. If it doesn't work as advertised,
`connect()` still resumes paused sandboxes exactly as before.

Ref: https://e2b.dev/docs/sandbox/auto-resume
Linear: SECRT-2118

---
Co-authored-by: Zamil Majdy (@majdyz) <zamil.majdy@agpt.co>
2026-03-13 10:29:28 +00:00
Swifty
ba301a3912 feat(platform): add whitelisting-backed beta user provisioning (#12347)
### Changes 🏗️

- add invite-backed beta provisioning with a new `InvitedUser` platform
model, Prisma migration, and first-login activation path that
materializes `User`, `Profile`, `UserOnboarding`, and
`CoPilotUnderstanding`
- replace the legacy beta allowlist check with invite-backed gating for
email/password signup and Tally pre-seeding during activation
- add admin backend APIs and frontend `/admin/users` management UI for
listing, creating, revoking, retrying, and bulk-uploading invited users
- add the design doc for the beta invite system and extend backend
coverage for invite activation, bulk uploads, and auth-route behavior
- configuration changes: introduce the new invite/tally schema objects
and migration; no new env vars or docker service changes are required

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] `cd autogpt_platform/backend && poetry run format`
- [x] `cd autogpt_platform/backend && poetry run pytest -q` (run against
an isolated local Postgres database with non-conflicting service port
overrides)

#### For configuration changes:

- [x] `.env.default` is updated or already compatible with my changes
- [x] `docker-compose.yml` is updated or already compatible with my
changes
- [x] I have included a list of my configuration changes in the PR
description (under **Changes**)
2026-03-13 10:25:49 +01:00
Zamil Majdy
3c754825aa perf(copilot): reduce tool schema token cost by 34%
Trim verbose tool descriptions and parameter schemas across all 35
CoPilot tools. Remove redundant cross-tool workflow references.
Condense system prompt shared notes and storage supplement.
No functional changes — only description text trimmed.

Before: ~10,609 tokens (865 system + 9,744 tools)
After:  ~6,967 tokens (497 system + 6,470 tools)
Saving: ~3,642 tokens per conversation turn (34% reduction)
2026-03-13 15:28:24 +07:00
Abhimanyu Yadav
0cd9c0d87a fix(frontend): show sub-folders when navigating inside a folder (#12316)
## Summary

When opening a folder in the library, sub-folders were not displayed —
only agents were shown. This was caused by two issues:

1. The folder list query always fetched root-level folders (no
`parent_id` filter), so sub-folders were never requested
2. `showFolders` was set to `false` whenever a folder was selected,
hiding all folders from the view

### Changes 🏗️

- Pass `parent_id` to the `useGetV2ListLibraryFolders` hook so it
fetches child folders of the currently selected folder
- Remove the `!selectedFolderId` condition from `showFolders` so folders
render inside other folders
- Fetch the current folder via `useGetV2GetFolder` instead of searching
the (now differently-scoped) folder list
- Clean up breadcrumb: remove emoji icon, match folder name text size to
"My Library", replace `Button` with plain `<button>` to remove extra
padding/gap

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
- [x] Open a folder in the library and verify sub-folders are displayed
  - [x] Verify agents inside the folder still display correctly
- [x] Verify breadcrumb shows folder name without emoji, matching "My
Library" text size
- [x] Verify clicking "My Library" in breadcrumb navigates back to root
  - [x] Verify root-level view still shows all top-level folders
  - [x] Verify favorites tab does not show folders
2026-03-13 04:40:09 +00:00
Zamil Majdy
a083493aa2 fix(backend/copilot): auto-parse structured data and robust type coercion (#12390)
The copilot's `@@agptfile:` reference system always produces strings
when expanding
file references. This breaks blocks that expect structured types — e.g.
`GoogleSheetsWriteBlock` expects `values: list[list[str]]`, but receives
a raw CSV
string instead. Additionally, the copilot's input coercion was
duplicating logic from
the executor instead of reusing the shared `convert()` utility, and the
coercion had
no type-aware gating — it would always call `convert()`, which could
incorrectly
transform values that already matched the expected type (e.g.
stringifying a valid
`list[str]` in a `str | list[str]` union).

### Changes 🏗️

**Structured data parsing for `@@agptfile:` bare references:**
- When an entire tool argument value is a bare `@@agptfile:` reference,
the resolved
content is now auto-parsed: JSON → native types, CSV/TSV →
`list[list[str]]`
- Embedded references within larger strings still do plain text
substitution
- Updated copilot system prompt to document the structured data
capability

**Shared type coercion utility (`coerce_inputs_to_schema`):**
- Extracted `coerce_inputs_to_schema()` into `backend/util/type.py` —
shared by both
  the executor's `validate_exec()` and the copilot's `execute_block()`
- Uses Pydantic `model_fields` (not `__annotations__`) to include
inherited fields
- Added `_value_satisfies_type()` gate: only calls `convert()` when the
value doesn't
already match the target type, including recursive inner-element
checking for generics

**`_value_satisfies_type` — recursive type checking:**
- Handles `Any`, `Optional`, `Union`, `list[T]`, `dict[K,V]`, `set[T]`,
`tuple[T, ...]`,
  heterogeneous `tuple[str, int, bool]`, bare generics, nested generics
- Guards against non-runtime origins (`Literal`, etc.) to prevent
`isinstance()` crashes
- Returns `False` (not `True`) for unhandled generic origins as a safe
fallback

**Test coverage:**
- 51 new tests for `_value_satisfies_type` and `coerce_inputs_to_schema`
in `type_test.py`
- 8 new tests for `execute_block` type coercion in `helpers_test.py`

### Checklist 📋

#### For code changes:
- [x] I have clearly listed my changes in the PR description
- [x] I have made a test plan
- [x] I have tested my changes according to the test plan:
  - [x] All existing file_ref tests pass
- [x] All new type_test.py tests pass (51 tests covering
_value_satisfies_type and coerce_inputs_to_schema)
- [x] All new helpers_test.py tests pass (8 tests covering execute_block
coercion)
  - [x] `poetry run format` passes clean
  - [x] `poetry run lint` passes clean
  - [x] Pyright type checking passes
2026-03-12 19:27:41 +00:00
150 changed files with 11828 additions and 1360 deletions

View File

@@ -1,17 +0,0 @@
---
name: backend-check
description: Run the full backend formatting, linting, and test suite. Ensures code quality before commits and PRs. TRIGGER when backend Python code has been modified and needs validation.
user-invocable: true
metadata:
author: autogpt-team
version: "1.0.0"
---
# Backend Check
## Steps
1. **Format**: `poetry run format` — runs formatting AND linting. NEVER run ruff/black/isort individually
2. **Fix** any remaining errors manually, re-run until clean
3. **Test**: `poetry run test` (runs DB setup + pytest). For specific files: `poetry run pytest -s -vvv <test_files>`
4. **Snapshots** (if needed): `poetry run pytest path/to/test.py --snapshot-update` — review with `git diff`

View File

@@ -1,35 +0,0 @@
---
name: code-style
description: Python code style preferences for the AutoGPT backend. Apply when writing or reviewing Python code. TRIGGER when writing new Python code, reviewing PRs, or refactoring backend code.
user-invocable: false
metadata:
author: autogpt-team
version: "1.0.0"
---
# Code Style
## Imports
- **Top-level only** — no local/inner imports. Move all imports to the top of the file.
## Typing
- **No duck typing** — avoid `hasattr`, `getattr`, `isinstance` for type dispatch. Use proper typed interfaces, unions, or protocols.
- **Pydantic models** over dataclass, namedtuple, or raw dict for structured data.
- **No linter suppressors** — avoid `# type: ignore`, `# noqa`, `# pyright: ignore` etc. 99% of the time the right fix is fixing the type/code, not silencing the tool.
## Code Structure
- **List comprehensions** over manual loop-and-append.
- **Early return** — guard clauses first, avoid deep nesting.
- **Flatten inline** — prefer short, concise expressions. Reduce `if/else` chains with direct returns or ternaries when readable.
- **Modular functions** — break complex logic into small, focused functions rather than long blocks with nested conditionals.
## Review Checklist
Before finishing, always ask:
- Can any function be split into smaller pieces?
- Is there unnecessary nesting that an early return would eliminate?
- Can any loop be a comprehension?
- Is there a simpler way to express this logic?

View File

@@ -1,16 +0,0 @@
---
name: frontend-check
description: Run the full frontend formatting, linting, and type checking suite. Ensures code quality before commits and PRs. TRIGGER when frontend TypeScript/React code has been modified and needs validation.
user-invocable: true
metadata:
author: autogpt-team
version: "1.0.0"
---
# Frontend Check
## Steps (in order)
1. **Format**: `pnpm format` — NEVER run individual formatters
2. **Lint**: `pnpm lint` — fix errors, re-run until clean
3. **Types**: `pnpm types` — if it keeps failing after multiple attempts, stop and ask the user

View File

@@ -1,29 +0,0 @@
---
name: new-block
description: Create a new backend block following the Block SDK Guide. Guides through provider configuration, schema definition, authentication, and testing. TRIGGER when user asks to create a new block, add a new integration, or build a new node for the graph editor.
user-invocable: true
metadata:
author: autogpt-team
version: "1.0.0"
---
# New Block Creation
Read `docs/platform/block-sdk-guide.md` first for the full guide.
## Steps
1. **Provider config** (if external service): create `_config.py` with `ProviderBuilder`
2. **Block file** in `backend/blocks/` (from `autogpt_platform/backend/`):
- Generate a UUID once with `uuid.uuid4()`, then **hard-code that string** as `id` (IDs must be stable across imports)
- `Input(BlockSchema)` and `Output(BlockSchema)` classes
- `async def run` that `yield`s output fields
3. **Files**: use `store_media_file()` with `"for_block_output"` for outputs
4. **Test**: `poetry run pytest 'backend/blocks/test/test_block.py::test_available_blocks[MyBlock]' -xvs`
5. **Format**: `poetry run format`
## Rules
- Analyze interfaces: do inputs/outputs connect well with other blocks in a graph?
- Use top-level imports, avoid duck typing
- Always use `for_block_output` for block outputs

View File

@@ -1,28 +0,0 @@
---
name: openapi-regen
description: Regenerate the OpenAPI spec and frontend API client. Starts the backend REST server, fetches the spec, and regenerates the typed frontend hooks. TRIGGER when API routes change, new endpoints are added, or frontend API types are stale.
user-invocable: true
metadata:
author: autogpt-team
version: "1.0.0"
---
# OpenAPI Spec Regeneration
## Steps
1. **Run end-to-end** in a single shell block (so `REST_PID` persists):
```bash
cd autogpt_platform/backend && poetry run rest &
REST_PID=$!
WAIT=0; until curl -sf http://localhost:8006/health > /dev/null 2>&1; do sleep 1; WAIT=$((WAIT+1)); [ $WAIT -ge 60 ] && echo "Timed out" && kill $REST_PID && exit 1; done
cd ../frontend && pnpm generate:api:force
kill $REST_PID
pnpm types && pnpm lint && pnpm format
```
## Rules
- Always use `pnpm generate:api:force` (not `pnpm generate:api`)
- Don't manually edit files in `src/app/api/__generated__/`
- Generated hooks follow: `use{Method}{Version}{OperationName}`

View File

@@ -0,0 +1,79 @@
---
name: pr-address
description: Address PR review comments and loop until CI green and all comments resolved. TRIGGER when user asks to address comments, fix PR feedback, respond to reviewers, or babysit/monitor a PR.
user-invocable: true
args: "[PR number or URL] — if omitted, finds PR for current branch."
metadata:
author: autogpt-team
version: "1.0.0"
---
# PR Address
## Find the PR
```bash
gh pr list --head $(git branch --show-current) --repo Significant-Gravitas/AutoGPT
gh pr view {N}
```
## Fetch comments (all sources)
```bash
gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews # top-level reviews
gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments # inline review comments
gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments # PR conversation comments
```
**Bots to watch for:**
- `autogpt-reviewer` — posts "Blockers", "Should Fix", "Nice to Have". Address ALL of them.
- `sentry[bot]` — bug predictions. Fix real bugs, explain false positives.
- `coderabbitai[bot]` — automated review. Address actionable items.
## For each unaddressed comment
Address comments **one at a time**: fix → commit → push → inline reply → next.
1. Read the referenced code, make the fix (or reply explaining why it's not needed)
2. Commit and push the fix
3. Reply **inline** (not as a new top-level comment) referencing the fixing commit — this is what resolves the conversation for bot reviewers (coderabbitai, sentry):
| Comment type | How to reply |
|---|---|
| Inline review (`pulls/{N}/comments`) | `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments/{ID}/replies -f body="Fixed in <commit-sha>: <description>"` |
| Conversation (`issues/{N}/comments`) | `gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments -f body="Fixed in <commit-sha>: <description>"` |
## Format and commit
After fixing, format the changed code:
- **Backend** (from `autogpt_platform/backend/`): `poetry run format`
- **Frontend** (from `autogpt_platform/frontend/`): `pnpm format && pnpm lint && pnpm types`
If API routes changed, regenerate the frontend client:
```bash
cd autogpt_platform/backend && poetry run rest &
REST_PID=$!
trap "kill $REST_PID 2>/dev/null" EXIT
WAIT=0; until curl -sf http://localhost:8006/health > /dev/null 2>&1; do sleep 1; WAIT=$((WAIT+1)); [ $WAIT -ge 60 ] && echo "Timed out" && exit 1; done
cd ../frontend && pnpm generate:api:force
kill $REST_PID 2>/dev/null; trap - EXIT
```
Never manually edit files in `src/app/api/__generated__/`.
Then commit and **push immediately** — never batch commits without pushing.
For backend commits in worktrees: `poetry run git commit` (pre-commit hooks).
## The loop
```text
address comments → format → commit → push
→ re-check comments → fix new ones → push
→ wait for CI → re-check comments after CI settles
→ repeat until: all comments addressed AND CI green AND no new comments arriving
```
While CI runs, stay productive: run local tests, address remaining comments.
**The loop ends when:** CI fully green + all comments addressed + no new comments since CI settled.

View File

@@ -1,31 +0,0 @@
---
name: pr-create
description: Create a pull request for the current branch. TRIGGER when user asks to create a PR, open a pull request, push changes for review, or submit work for merging.
user-invocable: true
metadata:
author: autogpt-team
version: "1.0.0"
---
# Create Pull Request
## Steps
1. **Check for existing PR**: `gh pr view --json url -q .url 2>/dev/null` — if a PR already exists, output its URL and stop
2. **Understand changes**: `git status`, `git diff dev...HEAD`, `git log dev..HEAD --oneline`
3. **Read PR template**: `.github/PULL_REQUEST_TEMPLATE.md`
4. **Draft PR title**: Use conventional commits format (see CLAUDE.md for types and scopes)
5. **Fill out PR template** as the body — be thorough in the Changes section
6. **Format first** (if relevant changes exist):
- Backend: `cd autogpt_platform/backend && poetry run format`
- Frontend: `cd autogpt_platform/frontend && pnpm format`
- Fix any lint errors, then commit formatting changes before pushing
7. **Push**: `git push -u origin HEAD`
8. **Create PR**: `gh pr create --base dev`
9. **Output** the PR URL
## Rules
- Always target `dev` branch
- Do NOT run tests — CI will handle that
- Use the PR template from `.github/PULL_REQUEST_TEMPLATE.md`

View File

@@ -1,51 +1,74 @@
---
name: pr-review
description: Address all open PR review comments systematically. Fetches comments, addresses each one, reacts +1/-1, and replies when clarification is needed. Keeps iterating until all comments are addressed and CI is green. TRIGGER when user shares a PR URL, asks to address review comments, fix PR feedback, or respond to reviewer comments.
description: Review a PR for correctness, security, code quality, and testing issues. TRIGGER when user asks to review a PR, check PR quality, or give feedback on a PR.
user-invocable: true
args: "[PR number or URL] — if omitted, finds PR for current branch."
metadata:
author: autogpt-team
version: "1.0.0"
---
# PR Review Comment Workflow
# PR Review
## Steps
## Find the PR
1. **Find PR**: `gh pr list --head $(git branch --show-current) --repo Significant-Gravitas/AutoGPT`
2. **Fetch comments** (all three sources):
- `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews` (top-level reviews)
- `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments` (inline review comments)
- `gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments` (PR conversation comments)
3. **Skip** comments already reacted to by PR author
4. **For each unreacted comment**:
- Read referenced code, make the fix (or reply if you disagree/need info)
- **Inline review comments** (`pulls/{N}/comments`):
- React: `gh api repos/.../pulls/comments/{ID}/reactions -f content="+1"` (or `-1`)
- Reply: `gh api repos/.../pulls/{N}/comments/{ID}/replies -f body="..."`
- **PR conversation comments** (`issues/{N}/comments`):
- React: `gh api repos/.../issues/comments/{ID}/reactions -f content="+1"` (or `-1`)
- No threaded replies — post a new issue comment if needed
- **Top-level reviews**: no reaction API — address in code, reply via issue comment if needed
5. **Include autogpt-reviewer bot fixes** too
6. **Format**: `cd autogpt_platform/backend && poetry run format`, `cd autogpt_platform/frontend && pnpm format`
7. **Commit & push**
8. **Re-fetch comments** immediately — address any new unreacted ones before waiting on CI
9. **Stay productive while CI runs** — don't idle. In priority order:
- Run any pending local tests (`poetry run pytest`, e2e, etc.) and fix failures
- Address any remaining comments
- Only poll `gh pr checks {N}` as the last resort when there's truly nothing left to do
10. **If CI fails** — fix, go back to step 6
11. **Re-fetch comments again** after CI is green — address anything that appeared while CI was running
12. **Done** only when: all comments reacted AND CI is green.
```bash
gh pr list --head $(git branch --show-current) --repo Significant-Gravitas/AutoGPT
gh pr view {N}
```
## CRITICAL: Do Not Stop
## Read the diff
**Loop is: address → format → commit → push → re-check comments → run local tests → wait CI → re-check comments → repeat.**
```bash
gh pr diff {N}
```
Never idle. If CI is running and you have nothing to address, run local tests. Waiting on CI is the last resort.
## Fetch existing review comments
## Rules
Before posting anything, fetch existing inline comments to avoid duplicates:
- One todo per comment
- For inline review comments: reply on existing threads. For PR conversation comments: post a new issue comment (API doesn't support threaded replies)
- React to every comment: +1 addressed, -1 disagreed (with explanation)
```bash
gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments
gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews
```
## What to check
**Correctness:** logic errors, off-by-one, missing edge cases, race conditions (TOCTOU in file access, credit charging), error handling gaps, async correctness (missing `await`, unclosed resources).
**Security:** input validation at boundaries, no injection (command, XSS, SQL), secrets not logged, file paths sanitized (`os.path.basename()` in error messages).
**Code quality:** apply rules from backend/frontend CLAUDE.md files.
**Architecture:** DRY, single responsibility, modular functions. `Security()` vs `Depends()` for FastAPI auth. `data:` for SSE events, `: comment` for heartbeats. `transaction=True` for Redis pipelines.
**Testing:** edge cases covered, colocated `*_test.py` (backend) / `__tests__/` (frontend), mocks target where symbol is **used** not defined, `AsyncMock` for async.
## Output format
Every comment **must** be prefixed with `🤖` and a criticality badge:
| Tier | Badge | Meaning |
|---|---|---|
| Blocker | `🔴 **Blocker**` | Must fix before merge |
| Should Fix | `🟠 **Should Fix**` | Important improvement |
| Nice to Have | `🟡 **Nice to Have**` | Minor suggestion |
| Nit | `🔵 **Nit**` | Style / wording |
Example: `🤖 🔴 **Blocker**: Missing error handling for X — suggest wrapping in try/except.`
## Post inline comments
For each finding, post an inline comment on the PR (do not just write a local report):
```bash
# Get the latest commit SHA for the PR
COMMIT_SHA=$(gh api repos/Significant-Gravitas/AutoGPT/pulls/{N} --jq '.head.sha')
# Post an inline comment on a specific file/line
gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments \
-f body="🤖 🔴 **Blocker**: <description>" \
-f commit_id="$COMMIT_SHA" \
-f path="<file path>" \
-F line=<line number>
```

View File

@@ -1,45 +0,0 @@
---
name: worktree-setup
description: Set up a new git worktree for parallel development. Creates the worktree, copies .env files, installs dependencies, generates Prisma client, and optionally starts the app (with port conflict resolution) or runs tests. TRIGGER when user asks to set up a worktree, work on a branch in isolation, or needs a separate environment for a branch or PR.
user-invocable: true
metadata:
author: autogpt-team
version: "1.0.0"
---
# Worktree Setup
## Preferred: Use Branchlet
The repo has a `.branchlet.json` config — it handles env file copying, dependency installation, and Prisma generation automatically.
```bash
npm install -g branchlet # install once
branchlet create -n <name> -s <source-branch> -b <new-branch>
branchlet list --json # list all worktrees
```
## Manual Fallback
If branchlet isn't available:
1. `git worktree add ../<RepoName><N> <branch-name>`
2. Copy `.env` files: `backend/.env`, `frontend/.env`, `autogpt_platform/.env`, `db/docker/.env`
3. Install deps:
- `cd autogpt_platform/backend && poetry install && poetry run prisma generate`
- `cd autogpt_platform/frontend && pnpm install`
## Running the App
Free ports first — backend uses: 8001, 8002, 8003, 8005, 8006, 8007, 8008.
```bash
for port in 8001 8002 8003 8005 8006 8007 8008; do
lsof -ti :$port | xargs kill -9 2>/dev/null || true
done
cd <worktree>/autogpt_platform/backend && poetry run app
```
## CoPilot Testing Gotcha
SDK mode spawns a Claude subprocess — **won't work inside Claude Code**. Set `CHAT_USE_CLAUDE_AGENT_SDK=false` in `backend/.env` to use baseline mode.

View File

@@ -0,0 +1,85 @@
---
name: worktree
description: Set up a new git worktree for parallel development. Creates the worktree, copies .env files, installs dependencies, and generates Prisma client. TRIGGER when user asks to set up a worktree, work on a branch in isolation, or needs a separate environment for a branch or PR.
user-invocable: true
args: "[name] — optional worktree name (e.g., 'AutoGPT7'). If omitted, uses next available AutoGPT<N>."
metadata:
author: autogpt-team
version: "3.0.0"
---
# Worktree Setup
## Create the worktree
Derive paths from the git toplevel. If a name is provided as argument, use it. Otherwise, check `git worktree list` and pick the next `AutoGPT<N>`.
```bash
ROOT=$(git rev-parse --show-toplevel)
PARENT=$(dirname "$ROOT")
# From an existing branch
git worktree add "$PARENT/<NAME>" <branch-name>
# From a new branch off dev
git worktree add -b <new-branch> "$PARENT/<NAME>" dev
```
## Copy environment files
Copy `.env` from the root worktree. Falls back to `.env.default` if `.env` doesn't exist.
```bash
ROOT=$(git rev-parse --show-toplevel)
TARGET="$(dirname "$ROOT")/<NAME>"
for envpath in autogpt_platform/backend autogpt_platform/frontend autogpt_platform; do
if [ -f "$ROOT/$envpath/.env" ]; then
cp "$ROOT/$envpath/.env" "$TARGET/$envpath/.env"
elif [ -f "$ROOT/$envpath/.env.default" ]; then
cp "$ROOT/$envpath/.env.default" "$TARGET/$envpath/.env"
fi
done
```
## Install dependencies
```bash
TARGET="$(dirname "$(git rev-parse --show-toplevel)")/<NAME>"
cd "$TARGET/autogpt_platform/autogpt_libs" && poetry install
cd "$TARGET/autogpt_platform/backend" && poetry install && poetry run prisma generate
cd "$TARGET/autogpt_platform/frontend" && pnpm install
```
Replace `<NAME>` with the actual worktree name (e.g., `AutoGPT7`).
## Running the app (optional)
Backend uses ports: 8001, 8002, 8003, 8005, 8006, 8007, 8008. Free them first if needed:
```bash
TARGET="$(dirname "$(git rev-parse --show-toplevel)")/<NAME>"
for port in 8001 8002 8003 8005 8006 8007 8008; do
lsof -ti :$port | xargs kill -9 2>/dev/null || true
done
cd "$TARGET/autogpt_platform/backend" && poetry run app
```
## CoPilot testing
SDK mode spawns a Claude subprocess — won't work inside Claude Code. Set `CHAT_USE_CLAUDE_AGENT_SDK=false` in `backend/.env` to use baseline mode.
## Cleanup
```bash
# Replace <NAME> with the actual worktree name (e.g., AutoGPT7)
git worktree remove "$(dirname "$(git rev-parse --show-toplevel)")/<NAME>"
```
## Alternative: Branchlet (optional)
If [branchlet](https://www.npmjs.com/package/branchlet) is installed:
```bash
branchlet create -n <name> -s <source-branch> -b <new-branch>
```

View File

@@ -60,9 +60,12 @@ AutoGPT Platform is a monorepo containing:
### Reviewing/Revising Pull Requests
- When the user runs /pr-comments or tries to fetch them, also run gh api /repos/Significant-Gravitas/AutoGPT/pulls/[issuenum]/reviews to get the reviews
- Use gh api /repos/Significant-Gravitas/AutoGPT/pulls/[issuenum]/reviews/[review_id]/comments to get the review contents
- Use gh api /repos/Significant-Gravitas/AutoGPT/issues/9924/comments to get the pr specific comments
Use `/pr-review` to review a PR or `/pr-address` to address comments.
When fetching comments manually:
- `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews` — top-level reviews
- `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments` — inline review comments
- `gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments` — PR conversation comments
### Conventional Commits

View File

@@ -0,0 +1,40 @@
-- =============================================================
-- View: analytics.auth_activities
-- Looker source alias: ds49 | Charts: 1
-- =============================================================
-- DESCRIPTION
-- Tracks authentication events (login, logout, SSO, password
-- reset, etc.) from Supabase's internal audit log.
-- Useful for monitoring sign-in patterns and detecting anomalies.
--
-- SOURCE TABLES
-- auth.audit_log_entries — Supabase internal auth event log
--
-- OUTPUT COLUMNS
-- created_at TIMESTAMPTZ When the auth event occurred
-- actor_id TEXT User ID who triggered the event
-- actor_via_sso TEXT Whether the action was via SSO ('true'/'false')
-- action TEXT Event type (e.g. 'login', 'logout', 'token_refreshed')
--
-- WINDOW
-- Rolling 90 days from current date
--
-- EXAMPLE QUERIES
-- -- Daily login counts
-- SELECT DATE_TRUNC('day', created_at) AS day, COUNT(*) AS logins
-- FROM analytics.auth_activities
-- WHERE action = 'login'
-- GROUP BY 1 ORDER BY 1;
--
-- -- SSO vs password login breakdown
-- SELECT actor_via_sso, COUNT(*) FROM analytics.auth_activities
-- WHERE action = 'login' GROUP BY 1;
-- =============================================================
SELECT
created_at,
payload->>'actor_id' AS actor_id,
payload->>'actor_via_sso' AS actor_via_sso,
payload->>'action' AS action
FROM auth.audit_log_entries
WHERE created_at >= NOW() - INTERVAL '90 days'

View File

@@ -0,0 +1,105 @@
-- =============================================================
-- View: analytics.graph_execution
-- Looker source alias: ds16 | Charts: 21
-- =============================================================
-- DESCRIPTION
-- One row per agent graph execution (last 90 days).
-- Unpacks the JSONB stats column into individual numeric columns
-- and normalises the executionStatus — runs that failed due to
-- insufficient credits are reclassified as 'NO_CREDITS' for
-- easier filtering. Error messages are scrubbed of IDs and URLs
-- to allow safe grouping.
--
-- SOURCE TABLES
-- platform.AgentGraphExecution — Execution records
-- platform.AgentGraph — Agent graph metadata (for name)
-- platform.LibraryAgent — To flag possibly-AI (safe-mode) agents
--
-- OUTPUT COLUMNS
-- id TEXT Execution UUID
-- agentGraphId TEXT Agent graph UUID
-- agentGraphVersion INT Graph version number
-- executionStatus TEXT COMPLETED | FAILED | NO_CREDITS | RUNNING | QUEUED | TERMINATED
-- createdAt TIMESTAMPTZ When the execution was queued
-- updatedAt TIMESTAMPTZ Last status update time
-- userId TEXT Owner user UUID
-- agentGraphName TEXT Human-readable agent name
-- cputime DECIMAL Total CPU seconds consumed
-- walltime DECIMAL Total wall-clock seconds
-- node_count DECIMAL Number of nodes in the graph
-- nodes_cputime DECIMAL CPU time across all nodes
-- nodes_walltime DECIMAL Wall time across all nodes
-- execution_cost DECIMAL Credit cost of this execution
-- correctness_score FLOAT AI correctness score (if available)
-- possibly_ai BOOLEAN True if agent has sensitive_action_safe_mode enabled
-- groupedErrorMessage TEXT Scrubbed error string (IDs/URLs replaced with wildcards)
--
-- WINDOW
-- Rolling 90 days (createdAt > CURRENT_DATE - 90 days)
--
-- EXAMPLE QUERIES
-- -- Daily execution counts by status
-- SELECT DATE_TRUNC('day', "createdAt") AS day, "executionStatus", COUNT(*)
-- FROM analytics.graph_execution
-- GROUP BY 1, 2 ORDER BY 1;
--
-- -- Average cost per execution by agent
-- SELECT "agentGraphName", AVG("execution_cost") AS avg_cost, COUNT(*) AS runs
-- FROM analytics.graph_execution
-- WHERE "executionStatus" = 'COMPLETED'
-- GROUP BY 1 ORDER BY avg_cost DESC;
--
-- -- Top error messages
-- SELECT "groupedErrorMessage", COUNT(*) AS occurrences
-- FROM analytics.graph_execution
-- WHERE "executionStatus" = 'FAILED'
-- GROUP BY 1 ORDER BY 2 DESC LIMIT 20;
-- =============================================================
SELECT
ge."id" AS id,
ge."agentGraphId" AS agentGraphId,
ge."agentGraphVersion" AS agentGraphVersion,
CASE
WHEN jsonb_exists(ge."stats"::jsonb, 'error')
AND (
(ge."stats"::jsonb->>'error') ILIKE '%insufficient balance%'
OR (ge."stats"::jsonb->>'error') ILIKE '%you have no credits left%'
)
THEN 'NO_CREDITS'
ELSE CAST(ge."executionStatus" AS TEXT)
END AS executionStatus,
ge."createdAt" AS createdAt,
ge."updatedAt" AS updatedAt,
ge."userId" AS userId,
g."name" AS agentGraphName,
(ge."stats"::jsonb->>'cputime')::decimal AS cputime,
(ge."stats"::jsonb->>'walltime')::decimal AS walltime,
(ge."stats"::jsonb->>'node_count')::decimal AS node_count,
(ge."stats"::jsonb->>'nodes_cputime')::decimal AS nodes_cputime,
(ge."stats"::jsonb->>'nodes_walltime')::decimal AS nodes_walltime,
(ge."stats"::jsonb->>'cost')::decimal AS execution_cost,
(ge."stats"::jsonb->>'correctness_score')::float AS correctness_score,
COALESCE(la.possibly_ai, FALSE) AS possibly_ai,
REGEXP_REPLACE(
REGEXP_REPLACE(
TRIM(BOTH '"' FROM ge."stats"::jsonb->>'error'),
'(https?://)([A-Za-z0-9.-]+)(:[0-9]+)?(/[^\s]*)?',
'\1\2/...', 'gi'
),
'[a-zA-Z0-9_:-]*\d[a-zA-Z0-9_:-]*', '*', 'g'
) AS groupedErrorMessage
FROM platform."AgentGraphExecution" ge
LEFT JOIN platform."AgentGraph" g
ON ge."agentGraphId" = g."id"
AND ge."agentGraphVersion" = g."version"
LEFT JOIN (
SELECT DISTINCT ON ("userId", "agentGraphId")
"userId", "agentGraphId",
("settings"::jsonb->>'sensitive_action_safe_mode')::boolean AS possibly_ai
FROM platform."LibraryAgent"
WHERE "isDeleted" = FALSE
AND "isArchived" = FALSE
ORDER BY "userId", "agentGraphId", "agentGraphVersion" DESC
) la ON la."userId" = ge."userId" AND la."agentGraphId" = ge."agentGraphId"
WHERE ge."createdAt" > CURRENT_DATE - INTERVAL '90 days'

View File

@@ -0,0 +1,101 @@
-- =============================================================
-- View: analytics.node_block_execution
-- Looker source alias: ds14 | Charts: 11
-- =============================================================
-- DESCRIPTION
-- One row per node (block) execution (last 90 days).
-- Unpacks stats JSONB and joins to identify which block type
-- was run. For failed nodes, joins the error output and
-- scrubs it for safe grouping.
--
-- SOURCE TABLES
-- platform.AgentNodeExecution — Node execution records
-- platform.AgentNode — Node → block mapping
-- platform.AgentBlock — Block name/ID
-- platform.AgentNodeExecutionInputOutput — Error output values
--
-- OUTPUT COLUMNS
-- id TEXT Node execution UUID
-- agentGraphExecutionId TEXT Parent graph execution UUID
-- agentNodeId TEXT Node UUID within the graph
-- executionStatus TEXT COMPLETED | FAILED | QUEUED | RUNNING | TERMINATED
-- addedTime TIMESTAMPTZ When the node was queued
-- queuedTime TIMESTAMPTZ When it entered the queue
-- startedTime TIMESTAMPTZ When execution started
-- endedTime TIMESTAMPTZ When execution finished
-- inputSize BIGINT Input payload size in bytes
-- outputSize BIGINT Output payload size in bytes
-- walltime NUMERIC Wall-clock seconds for this node
-- cputime NUMERIC CPU seconds for this node
-- llmRetryCount INT Number of LLM retries
-- llmCallCount INT Number of LLM API calls made
-- inputTokenCount BIGINT LLM input tokens consumed
-- outputTokenCount BIGINT LLM output tokens produced
-- blockName TEXT Human-readable block name (e.g. 'OpenAIBlock')
-- blockId TEXT Block UUID
-- groupedErrorMessage TEXT Scrubbed error (IDs/URLs wildcarded)
-- errorMessage TEXT Raw error output (only set when FAILED)
--
-- WINDOW
-- Rolling 90 days (addedTime > CURRENT_DATE - 90 days)
--
-- EXAMPLE QUERIES
-- -- Most-used blocks by execution count
-- SELECT "blockName", COUNT(*) AS executions,
-- COUNT(*) FILTER (WHERE "executionStatus"='FAILED') AS failures
-- FROM analytics.node_block_execution
-- GROUP BY 1 ORDER BY executions DESC LIMIT 20;
--
-- -- Average LLM token usage per block
-- SELECT "blockName",
-- AVG("inputTokenCount") AS avg_input_tokens,
-- AVG("outputTokenCount") AS avg_output_tokens
-- FROM analytics.node_block_execution
-- WHERE "llmCallCount" > 0
-- GROUP BY 1 ORDER BY avg_input_tokens DESC;
--
-- -- Top failure reasons
-- SELECT "blockName", "groupedErrorMessage", COUNT(*) AS count
-- FROM analytics.node_block_execution
-- WHERE "executionStatus" = 'FAILED'
-- GROUP BY 1, 2 ORDER BY count DESC LIMIT 20;
-- =============================================================
SELECT
ne."id" AS id,
ne."agentGraphExecutionId" AS agentGraphExecutionId,
ne."agentNodeId" AS agentNodeId,
CAST(ne."executionStatus" AS TEXT) AS executionStatus,
ne."addedTime" AS addedTime,
ne."queuedTime" AS queuedTime,
ne."startedTime" AS startedTime,
ne."endedTime" AS endedTime,
(ne."stats"::jsonb->>'input_size')::bigint AS inputSize,
(ne."stats"::jsonb->>'output_size')::bigint AS outputSize,
(ne."stats"::jsonb->>'walltime')::numeric AS walltime,
(ne."stats"::jsonb->>'cputime')::numeric AS cputime,
(ne."stats"::jsonb->>'llm_retry_count')::int AS llmRetryCount,
(ne."stats"::jsonb->>'llm_call_count')::int AS llmCallCount,
(ne."stats"::jsonb->>'input_token_count')::bigint AS inputTokenCount,
(ne."stats"::jsonb->>'output_token_count')::bigint AS outputTokenCount,
b."name" AS blockName,
b."id" AS blockId,
REGEXP_REPLACE(
REGEXP_REPLACE(
TRIM(BOTH '"' FROM eio."data"::text),
'(https?://)([A-Za-z0-9.-]+)(:[0-9]+)?(/[^\s]*)?',
'\1\2/...', 'gi'
),
'[a-zA-Z0-9_:-]*\d[a-zA-Z0-9_:-]*', '*', 'g'
) AS groupedErrorMessage,
eio."data" AS errorMessage
FROM platform."AgentNodeExecution" ne
LEFT JOIN platform."AgentNode" nd
ON ne."agentNodeId" = nd."id"
LEFT JOIN platform."AgentBlock" b
ON nd."agentBlockId" = b."id"
LEFT JOIN platform."AgentNodeExecutionInputOutput" eio
ON eio."referencedByOutputExecId" = ne."id"
AND eio."name" = 'error'
AND ne."executionStatus" = 'FAILED'
WHERE ne."addedTime" > CURRENT_DATE - INTERVAL '90 days'

View File

@@ -0,0 +1,97 @@
-- =============================================================
-- View: analytics.retention_agent
-- Looker source alias: ds35 | Charts: 2
-- =============================================================
-- DESCRIPTION
-- Weekly cohort retention broken down per individual agent.
-- Cohort = week of a user's first use of THAT specific agent.
-- Tells you which agents keep users coming back vs. one-shot
-- use. Only includes cohorts from the last 180 days.
--
-- SOURCE TABLES
-- platform.AgentGraphExecution — Execution records (user × agent × time)
-- platform.AgentGraph — Agent names
--
-- OUTPUT COLUMNS
-- agent_id TEXT Agent graph UUID
-- agent_label TEXT 'AgentName [first8chars]'
-- agent_label_n TEXT 'AgentName [first8chars] (n=total_users)'
-- cohort_week_start DATE Week users first ran this agent
-- cohort_label TEXT ISO week label
-- cohort_label_n TEXT ISO week label with cohort size
-- user_lifetime_week INT Weeks since first use of this agent
-- cohort_users BIGINT Users in this cohort for this agent
-- active_users BIGINT Users who ran the agent again in week k
-- retention_rate FLOAT active_users / cohort_users
-- cohort_users_w0 BIGINT cohort_users only at week 0 (safe to SUM)
-- agent_total_users BIGINT Total users across all cohorts for this agent
--
-- EXAMPLE QUERIES
-- -- Best-retained agents at week 2
-- SELECT agent_label, AVG(retention_rate) AS w2_retention
-- FROM analytics.retention_agent
-- WHERE user_lifetime_week = 2 AND cohort_users >= 10
-- GROUP BY 1 ORDER BY w2_retention DESC LIMIT 10;
--
-- -- Agents with most unique users
-- SELECT DISTINCT agent_label, agent_total_users
-- FROM analytics.retention_agent
-- ORDER BY agent_total_users DESC LIMIT 20;
-- =============================================================
WITH params AS (SELECT 12::int AS max_weeks, (CURRENT_DATE - INTERVAL '180 days') AS cohort_start),
events AS (
SELECT e."userId"::text AS user_id, e."agentGraphId" AS agent_id,
e."createdAt"::timestamptz AS created_at,
DATE_TRUNC('week', e."createdAt")::date AS week_start
FROM platform."AgentGraphExecution" e
),
first_use AS (
SELECT user_id, agent_id, MIN(created_at) AS first_use_at,
DATE_TRUNC('week', MIN(created_at))::date AS cohort_week_start
FROM events GROUP BY 1,2
HAVING MIN(created_at) >= (SELECT cohort_start FROM params)
),
activity_weeks AS (SELECT DISTINCT user_id, agent_id, week_start FROM events),
user_week_age AS (
SELECT aw.user_id, aw.agent_id, fu.cohort_week_start,
((aw.week_start - DATE_TRUNC('week',fu.first_use_at)::date)/7)::int AS user_lifetime_week
FROM activity_weeks aw JOIN first_use fu USING (user_id, agent_id)
WHERE aw.week_start >= DATE_TRUNC('week',fu.first_use_at)::date
),
active_counts AS (
SELECT agent_id, cohort_week_start, user_lifetime_week, COUNT(DISTINCT user_id) AS active_users
FROM user_week_age WHERE user_lifetime_week >= 0 GROUP BY 1,2,3
),
cohort_sizes AS (
SELECT agent_id, cohort_week_start, COUNT(DISTINCT user_id) AS cohort_users FROM first_use GROUP BY 1,2
),
cohort_caps AS (
SELECT cs.agent_id, cs.cohort_week_start, cs.cohort_users,
LEAST((SELECT max_weeks FROM params),
GREATEST(0,((DATE_TRUNC('week',CURRENT_DATE)::date-cs.cohort_week_start)/7)::int)) AS cap_weeks
FROM cohort_sizes cs
),
grid AS (
SELECT cc.agent_id, cc.cohort_week_start, gs AS user_lifetime_week, cc.cohort_users
FROM cohort_caps cc CROSS JOIN LATERAL generate_series(0, cc.cap_weeks) gs
),
agent_names AS (SELECT DISTINCT ON (g."id") g."id" AS agent_id, g."name" AS agent_name FROM platform."AgentGraph" g ORDER BY g."id", g."version" DESC),
agent_total_users AS (SELECT agent_id, SUM(cohort_users) AS agent_total_users FROM cohort_sizes GROUP BY 1)
SELECT
g.agent_id,
COALESCE(an.agent_name,'(unnamed)')||' ['||LEFT(g.agent_id::text,8)||']' AS agent_label,
COALESCE(an.agent_name,'(unnamed)')||' ['||LEFT(g.agent_id::text,8)||'] (n='||COALESCE(atu.agent_total_users,0)||')' AS agent_label_n,
g.cohort_week_start,
TO_CHAR(g.cohort_week_start,'IYYY-"W"IW') AS cohort_label,
TO_CHAR(g.cohort_week_start,'IYYY-"W"IW')||' (n='||g.cohort_users||')' AS cohort_label_n,
g.user_lifetime_week, g.cohort_users,
COALESCE(ac.active_users,0) AS active_users,
COALESCE(ac.active_users,0)::float / NULLIF(g.cohort_users,0) AS retention_rate,
CASE WHEN g.user_lifetime_week=0 THEN g.cohort_users ELSE 0 END AS cohort_users_w0,
COALESCE(atu.agent_total_users,0) AS agent_total_users
FROM grid g
LEFT JOIN active_counts ac ON ac.agent_id=g.agent_id AND ac.cohort_week_start=g.cohort_week_start AND ac.user_lifetime_week=g.user_lifetime_week
LEFT JOIN agent_names an ON an.agent_id=g.agent_id
LEFT JOIN agent_total_users atu ON atu.agent_id=g.agent_id
ORDER BY agent_label, g.cohort_week_start, g.user_lifetime_week;

View File

@@ -0,0 +1,81 @@
-- =============================================================
-- View: analytics.retention_execution_daily
-- Looker source alias: ds111 | Charts: 1
-- =============================================================
-- DESCRIPTION
-- Daily cohort retention based on agent executions.
-- Cohort anchor = day of user's FIRST ever execution.
-- Only includes cohorts from the last 90 days, up to day 30.
-- Great for early engagement analysis (did users run another
-- agent the next day?).
--
-- SOURCE TABLES
-- platform.AgentGraphExecution — Execution records
--
-- OUTPUT COLUMNS
-- Same pattern as retention_login_daily.
-- cohort_day_start = day of first execution (not first login)
--
-- EXAMPLE QUERIES
-- -- Day-3 execution retention
-- SELECT cohort_label, retention_rate_bounded AS d3_retention
-- FROM analytics.retention_execution_daily
-- WHERE user_lifetime_day = 3 ORDER BY cohort_day_start;
-- =============================================================
WITH params AS (SELECT 30::int AS max_days, (CURRENT_DATE - INTERVAL '90 days') AS cohort_start),
events AS (
SELECT e."userId"::text AS user_id, e."createdAt"::timestamptz AS created_at,
DATE_TRUNC('day', e."createdAt")::date AS day_start
FROM platform."AgentGraphExecution" e WHERE e."userId" IS NOT NULL
),
first_exec AS (
SELECT user_id, MIN(created_at) AS first_exec_at,
DATE_TRUNC('day', MIN(created_at))::date AS cohort_day_start
FROM events GROUP BY 1
HAVING MIN(created_at) >= (SELECT cohort_start FROM params)
),
activity_days AS (SELECT DISTINCT user_id, day_start FROM events),
user_day_age AS (
SELECT ad.user_id, fe.cohort_day_start,
(ad.day_start - DATE_TRUNC('day',fe.first_exec_at)::date)::int AS user_lifetime_day
FROM activity_days ad JOIN first_exec fe USING (user_id)
WHERE ad.day_start >= DATE_TRUNC('day',fe.first_exec_at)::date
),
bounded_counts AS (
SELECT cohort_day_start, user_lifetime_day, COUNT(DISTINCT user_id) AS active_users_bounded
FROM user_day_age WHERE user_lifetime_day >= 0 GROUP BY 1,2
),
last_active AS (
SELECT cohort_day_start, user_id, MAX(user_lifetime_day) AS last_active_day FROM user_day_age GROUP BY 1,2
),
unbounded_counts AS (
SELECT la.cohort_day_start, gs AS user_lifetime_day, COUNT(*) AS retained_users_unbounded
FROM last_active la
CROSS JOIN LATERAL generate_series(0, LEAST(la.last_active_day,(SELECT max_days FROM params))) gs
GROUP BY 1,2
),
cohort_sizes AS (SELECT cohort_day_start, COUNT(DISTINCT user_id) AS cohort_users FROM first_exec GROUP BY 1),
cohort_caps AS (
SELECT cs.cohort_day_start, cs.cohort_users,
LEAST((SELECT max_days FROM params), GREATEST(0,(CURRENT_DATE-cs.cohort_day_start)::int)) AS cap_days
FROM cohort_sizes cs
),
grid AS (
SELECT cc.cohort_day_start, gs AS user_lifetime_day, cc.cohort_users
FROM cohort_caps cc CROSS JOIN LATERAL generate_series(0, cc.cap_days) gs
)
SELECT
g.cohort_day_start,
TO_CHAR(g.cohort_day_start,'YYYY-MM-DD') AS cohort_label,
TO_CHAR(g.cohort_day_start,'YYYY-MM-DD')||' (n='||g.cohort_users||')' AS cohort_label_n,
g.user_lifetime_day, g.cohort_users,
COALESCE(b.active_users_bounded,0) AS active_users_bounded,
COALESCE(u.retained_users_unbounded,0) AS retained_users_unbounded,
CASE WHEN g.cohort_users>0 THEN COALESCE(b.active_users_bounded,0)::float/g.cohort_users END AS retention_rate_bounded,
CASE WHEN g.cohort_users>0 THEN COALESCE(u.retained_users_unbounded,0)::float/g.cohort_users END AS retention_rate_unbounded,
CASE WHEN g.user_lifetime_day=0 THEN g.cohort_users ELSE 0 END AS cohort_users_d0
FROM grid g
LEFT JOIN bounded_counts b ON b.cohort_day_start=g.cohort_day_start AND b.user_lifetime_day=g.user_lifetime_day
LEFT JOIN unbounded_counts u ON u.cohort_day_start=g.cohort_day_start AND u.user_lifetime_day=g.user_lifetime_day
ORDER BY g.cohort_day_start, g.user_lifetime_day;

View File

@@ -0,0 +1,81 @@
-- =============================================================
-- View: analytics.retention_execution_weekly
-- Looker source alias: ds92 | Charts: 2
-- =============================================================
-- DESCRIPTION
-- Weekly cohort retention based on agent executions.
-- Cohort anchor = week of user's FIRST ever agent execution
-- (not first login). Only includes cohorts from the last 180 days.
-- Useful when you care about product engagement, not just visits.
--
-- SOURCE TABLES
-- platform.AgentGraphExecution — Execution records
--
-- OUTPUT COLUMNS
-- Same pattern as retention_login_weekly.
-- cohort_week_start = week of first execution (not first login)
--
-- EXAMPLE QUERIES
-- -- Week-2 execution retention
-- SELECT cohort_label, retention_rate_bounded
-- FROM analytics.retention_execution_weekly
-- WHERE user_lifetime_week = 2 ORDER BY cohort_week_start;
-- =============================================================
WITH params AS (SELECT 12::int AS max_weeks, (CURRENT_DATE - INTERVAL '180 days') AS cohort_start),
events AS (
SELECT e."userId"::text AS user_id, e."createdAt"::timestamptz AS created_at,
DATE_TRUNC('week', e."createdAt")::date AS week_start
FROM platform."AgentGraphExecution" e WHERE e."userId" IS NOT NULL
),
first_exec AS (
SELECT user_id, MIN(created_at) AS first_exec_at,
DATE_TRUNC('week', MIN(created_at))::date AS cohort_week_start
FROM events GROUP BY 1
HAVING MIN(created_at) >= (SELECT cohort_start FROM params)
),
activity_weeks AS (SELECT DISTINCT user_id, week_start FROM events),
user_week_age AS (
SELECT aw.user_id, fe.cohort_week_start,
((aw.week_start - DATE_TRUNC('week',fe.first_exec_at)::date)/7)::int AS user_lifetime_week
FROM activity_weeks aw JOIN first_exec fe USING (user_id)
WHERE aw.week_start >= DATE_TRUNC('week',fe.first_exec_at)::date
),
bounded_counts AS (
SELECT cohort_week_start, user_lifetime_week, COUNT(DISTINCT user_id) AS active_users_bounded
FROM user_week_age WHERE user_lifetime_week >= 0 GROUP BY 1,2
),
last_active AS (
SELECT cohort_week_start, user_id, MAX(user_lifetime_week) AS last_active_week FROM user_week_age GROUP BY 1,2
),
unbounded_counts AS (
SELECT la.cohort_week_start, gs AS user_lifetime_week, COUNT(*) AS retained_users_unbounded
FROM last_active la
CROSS JOIN LATERAL generate_series(0, LEAST(la.last_active_week,(SELECT max_weeks FROM params))) gs
GROUP BY 1,2
),
cohort_sizes AS (SELECT cohort_week_start, COUNT(DISTINCT user_id) AS cohort_users FROM first_exec GROUP BY 1),
cohort_caps AS (
SELECT cs.cohort_week_start, cs.cohort_users,
LEAST((SELECT max_weeks FROM params),
GREATEST(0,((DATE_TRUNC('week',CURRENT_DATE)::date-cs.cohort_week_start)/7)::int)) AS cap_weeks
FROM cohort_sizes cs
),
grid AS (
SELECT cc.cohort_week_start, gs AS user_lifetime_week, cc.cohort_users
FROM cohort_caps cc CROSS JOIN LATERAL generate_series(0, cc.cap_weeks) gs
)
SELECT
g.cohort_week_start,
TO_CHAR(g.cohort_week_start,'IYYY-"W"IW') AS cohort_label,
TO_CHAR(g.cohort_week_start,'IYYY-"W"IW')||' (n='||g.cohort_users||')' AS cohort_label_n,
g.user_lifetime_week, g.cohort_users,
COALESCE(b.active_users_bounded,0) AS active_users_bounded,
COALESCE(u.retained_users_unbounded,0) AS retained_users_unbounded,
CASE WHEN g.cohort_users>0 THEN COALESCE(b.active_users_bounded,0)::float/g.cohort_users END AS retention_rate_bounded,
CASE WHEN g.cohort_users>0 THEN COALESCE(u.retained_users_unbounded,0)::float/g.cohort_users END AS retention_rate_unbounded,
CASE WHEN g.user_lifetime_week=0 THEN g.cohort_users ELSE 0 END AS cohort_users_w0
FROM grid g
LEFT JOIN bounded_counts b ON b.cohort_week_start=g.cohort_week_start AND b.user_lifetime_week=g.user_lifetime_week
LEFT JOIN unbounded_counts u ON u.cohort_week_start=g.cohort_week_start AND u.user_lifetime_week=g.user_lifetime_week
ORDER BY g.cohort_week_start, g.user_lifetime_week;

View File

@@ -0,0 +1,94 @@
-- =============================================================
-- View: analytics.retention_login_daily
-- Looker source alias: ds112 | Charts: 1
-- =============================================================
-- DESCRIPTION
-- Daily cohort retention based on login sessions.
-- Same logic as retention_login_weekly but at day granularity,
-- showing up to day 30 for cohorts from the last 90 days.
-- Useful for analysing early activation (days 1-7) in detail.
--
-- SOURCE TABLES
-- auth.sessions — Login session records
--
-- OUTPUT COLUMNS (same pattern as retention_login_weekly)
-- cohort_day_start DATE First day the cohort logged in
-- cohort_label TEXT Date string (e.g. '2025-03-01')
-- cohort_label_n TEXT Date + cohort size (e.g. '2025-03-01 (n=12)')
-- user_lifetime_day INT Days since first login (0 = signup day)
-- cohort_users BIGINT Total users in cohort
-- active_users_bounded BIGINT Users active on exactly day k
-- retained_users_unbounded BIGINT Users active any time on/after day k
-- retention_rate_bounded FLOAT bounded / cohort_users
-- retention_rate_unbounded FLOAT unbounded / cohort_users
-- cohort_users_d0 BIGINT cohort_users only at day 0, else 0 (safe to SUM)
--
-- EXAMPLE QUERIES
-- -- Day-1 retention rate (came back next day)
-- SELECT cohort_label, retention_rate_bounded AS d1_retention
-- FROM analytics.retention_login_daily
-- WHERE user_lifetime_day = 1 ORDER BY cohort_day_start;
--
-- -- Average retention curve across all cohorts
-- SELECT user_lifetime_day,
-- SUM(active_users_bounded)::float / NULLIF(SUM(cohort_users_d0), 0) AS avg_retention
-- FROM analytics.retention_login_daily
-- GROUP BY 1 ORDER BY 1;
-- =============================================================
WITH params AS (SELECT 30::int AS max_days, (CURRENT_DATE - INTERVAL '90 days')::date AS cohort_start),
events AS (
SELECT s.user_id::text AS user_id, s.created_at::timestamptz AS created_at,
DATE_TRUNC('day', s.created_at)::date AS day_start
FROM auth.sessions s WHERE s.user_id IS NOT NULL
),
first_login AS (
SELECT user_id, MIN(created_at) AS first_login_time,
DATE_TRUNC('day', MIN(created_at))::date AS cohort_day_start
FROM events GROUP BY 1
HAVING MIN(created_at) >= (SELECT cohort_start FROM params)
),
activity_days AS (SELECT DISTINCT user_id, day_start FROM events),
user_day_age AS (
SELECT ad.user_id, fl.cohort_day_start,
(ad.day_start - DATE_TRUNC('day', fl.first_login_time)::date)::int AS user_lifetime_day
FROM activity_days ad JOIN first_login fl USING (user_id)
WHERE ad.day_start >= DATE_TRUNC('day', fl.first_login_time)::date
),
bounded_counts AS (
SELECT cohort_day_start, user_lifetime_day, COUNT(DISTINCT user_id) AS active_users_bounded
FROM user_day_age WHERE user_lifetime_day >= 0 GROUP BY 1,2
),
last_active AS (
SELECT cohort_day_start, user_id, MAX(user_lifetime_day) AS last_active_day FROM user_day_age GROUP BY 1,2
),
unbounded_counts AS (
SELECT la.cohort_day_start, gs AS user_lifetime_day, COUNT(*) AS retained_users_unbounded
FROM last_active la
CROSS JOIN LATERAL generate_series(0, LEAST(la.last_active_day,(SELECT max_days FROM params))) gs
GROUP BY 1,2
),
cohort_sizes AS (SELECT cohort_day_start, COUNT(DISTINCT user_id) AS cohort_users FROM first_login GROUP BY 1),
cohort_caps AS (
SELECT cs.cohort_day_start, cs.cohort_users,
LEAST((SELECT max_days FROM params), GREATEST(0,(CURRENT_DATE-cs.cohort_day_start)::int)) AS cap_days
FROM cohort_sizes cs
),
grid AS (
SELECT cc.cohort_day_start, gs AS user_lifetime_day, cc.cohort_users
FROM cohort_caps cc CROSS JOIN LATERAL generate_series(0, cc.cap_days) gs
)
SELECT
g.cohort_day_start,
TO_CHAR(g.cohort_day_start,'YYYY-MM-DD') AS cohort_label,
TO_CHAR(g.cohort_day_start,'YYYY-MM-DD')||' (n='||g.cohort_users||')' AS cohort_label_n,
g.user_lifetime_day, g.cohort_users,
COALESCE(b.active_users_bounded,0) AS active_users_bounded,
COALESCE(u.retained_users_unbounded,0) AS retained_users_unbounded,
CASE WHEN g.cohort_users>0 THEN COALESCE(b.active_users_bounded,0)::float/g.cohort_users END AS retention_rate_bounded,
CASE WHEN g.cohort_users>0 THEN COALESCE(u.retained_users_unbounded,0)::float/g.cohort_users END AS retention_rate_unbounded,
CASE WHEN g.user_lifetime_day=0 THEN g.cohort_users ELSE 0 END AS cohort_users_d0
FROM grid g
LEFT JOIN bounded_counts b ON b.cohort_day_start=g.cohort_day_start AND b.user_lifetime_day=g.user_lifetime_day
LEFT JOIN unbounded_counts u ON u.cohort_day_start=g.cohort_day_start AND u.user_lifetime_day=g.user_lifetime_day
ORDER BY g.cohort_day_start, g.user_lifetime_day;

View File

@@ -0,0 +1,96 @@
-- =============================================================
-- View: analytics.retention_login_onboarded_weekly
-- Looker source alias: ds101 | Charts: 2
-- =============================================================
-- DESCRIPTION
-- Weekly cohort retention from login sessions, restricted to
-- users who "onboarded" — defined as running at least one
-- agent within 365 days of their first login.
-- Filters out users who signed up but never activated,
-- giving a cleaner view of engaged-user retention.
--
-- SOURCE TABLES
-- auth.sessions — Login session records
-- platform.AgentGraphExecution — Used to identify onboarders
--
-- OUTPUT COLUMNS
-- Same as retention_login_weekly (cohort_week_start, user_lifetime_week,
-- retention_rate_bounded, retention_rate_unbounded, etc.)
-- Only difference: cohort is filtered to onboarded users only.
--
-- EXAMPLE QUERIES
-- -- Compare week-4 retention: all users vs onboarded only
-- SELECT 'all_users' AS segment, AVG(retention_rate_bounded) AS w4_retention
-- FROM analytics.retention_login_weekly WHERE user_lifetime_week = 4
-- UNION ALL
-- SELECT 'onboarded', AVG(retention_rate_bounded)
-- FROM analytics.retention_login_onboarded_weekly WHERE user_lifetime_week = 4;
-- =============================================================
WITH params AS (SELECT 12::int AS max_weeks, 365::int AS onboarding_window_days),
events AS (
SELECT s.user_id::text AS user_id, s.created_at::timestamptz AS created_at,
DATE_TRUNC('week', s.created_at)::date AS week_start
FROM auth.sessions s WHERE s.user_id IS NOT NULL
),
first_login_all AS (
SELECT user_id, MIN(created_at) AS first_login_time,
DATE_TRUNC('week', MIN(created_at))::date AS cohort_week_start
FROM events GROUP BY 1
),
onboarders AS (
SELECT fl.user_id FROM first_login_all fl
WHERE EXISTS (
SELECT 1 FROM platform."AgentGraphExecution" e
WHERE e."userId"::text = fl.user_id
AND e."createdAt" >= fl.first_login_time
AND e."createdAt" < fl.first_login_time
+ make_interval(days => (SELECT onboarding_window_days FROM params))
)
),
first_login AS (SELECT * FROM first_login_all WHERE user_id IN (SELECT user_id FROM onboarders)),
activity_weeks AS (SELECT DISTINCT user_id, week_start FROM events),
user_week_age AS (
SELECT aw.user_id, fl.cohort_week_start,
((aw.week_start - DATE_TRUNC('week',fl.first_login_time)::date)/7)::int AS user_lifetime_week
FROM activity_weeks aw JOIN first_login fl USING (user_id)
WHERE aw.week_start >= DATE_TRUNC('week',fl.first_login_time)::date
),
bounded_counts AS (
SELECT cohort_week_start, user_lifetime_week, COUNT(DISTINCT user_id) AS active_users_bounded
FROM user_week_age WHERE user_lifetime_week >= 0 GROUP BY 1,2
),
last_active AS (
SELECT cohort_week_start, user_id, MAX(user_lifetime_week) AS last_active_week FROM user_week_age GROUP BY 1,2
),
unbounded_counts AS (
SELECT la.cohort_week_start, gs AS user_lifetime_week, COUNT(*) AS retained_users_unbounded
FROM last_active la
CROSS JOIN LATERAL generate_series(0, LEAST(la.last_active_week,(SELECT max_weeks FROM params))) gs
GROUP BY 1,2
),
cohort_sizes AS (SELECT cohort_week_start, COUNT(DISTINCT user_id) AS cohort_users FROM first_login GROUP BY 1),
cohort_caps AS (
SELECT cs.cohort_week_start, cs.cohort_users,
LEAST((SELECT max_weeks FROM params),
GREATEST(0,((DATE_TRUNC('week',CURRENT_DATE)::date-cs.cohort_week_start)/7)::int)) AS cap_weeks
FROM cohort_sizes cs
),
grid AS (
SELECT cc.cohort_week_start, gs AS user_lifetime_week, cc.cohort_users
FROM cohort_caps cc CROSS JOIN LATERAL generate_series(0, cc.cap_weeks) gs
)
SELECT
g.cohort_week_start,
TO_CHAR(g.cohort_week_start,'IYYY-"W"IW') AS cohort_label,
TO_CHAR(g.cohort_week_start,'IYYY-"W"IW')||' (n='||g.cohort_users||')' AS cohort_label_n,
g.user_lifetime_week, g.cohort_users,
COALESCE(b.active_users_bounded,0) AS active_users_bounded,
COALESCE(u.retained_users_unbounded,0) AS retained_users_unbounded,
CASE WHEN g.cohort_users>0 THEN COALESCE(b.active_users_bounded,0)::float/g.cohort_users END AS retention_rate_bounded,
CASE WHEN g.cohort_users>0 THEN COALESCE(u.retained_users_unbounded,0)::float/g.cohort_users END AS retention_rate_unbounded,
CASE WHEN g.user_lifetime_week=0 THEN g.cohort_users ELSE 0 END AS cohort_users_w0
FROM grid g
LEFT JOIN bounded_counts b ON b.cohort_week_start=g.cohort_week_start AND b.user_lifetime_week=g.user_lifetime_week
LEFT JOIN unbounded_counts u ON u.cohort_week_start=g.cohort_week_start AND u.user_lifetime_week=g.user_lifetime_week
ORDER BY g.cohort_week_start, g.user_lifetime_week;

View File

@@ -0,0 +1,103 @@
-- =============================================================
-- View: analytics.retention_login_weekly
-- Looker source alias: ds83 | Charts: 2
-- =============================================================
-- DESCRIPTION
-- Weekly cohort retention based on login sessions.
-- Users are grouped by the ISO week of their first ever login.
-- For each cohort × lifetime-week combination, outputs both:
-- - bounded rate: % active in exactly that week
-- - unbounded rate: % who were ever active on or after that week
-- Weeks are capped to the cohort's actual age (no future data points).
--
-- SOURCE TABLES
-- auth.sessions — Login session records
--
-- HOW TO READ THE OUTPUT
-- cohort_week_start The Monday of the week users first logged in
-- user_lifetime_week 0 = signup week, 1 = one week later, etc.
-- retention_rate_bounded = active_users_bounded / cohort_users
-- retention_rate_unbounded = retained_users_unbounded / cohort_users
--
-- OUTPUT COLUMNS
-- cohort_week_start DATE First day of the cohort's signup week
-- cohort_label TEXT ISO week label (e.g. '2025-W01')
-- cohort_label_n TEXT ISO week label with cohort size (e.g. '2025-W01 (n=42)')
-- user_lifetime_week INT Weeks since first login (0 = signup week)
-- cohort_users BIGINT Total users in this cohort (denominator)
-- active_users_bounded BIGINT Users active in exactly week k
-- retained_users_unbounded BIGINT Users active any time on/after week k
-- retention_rate_bounded FLOAT bounded active / cohort_users
-- retention_rate_unbounded FLOAT unbounded retained / cohort_users
-- cohort_users_w0 BIGINT cohort_users only at week 0, else 0 (safe to SUM in pivot tables)
--
-- EXAMPLE QUERIES
-- -- Week-1 retention rate per cohort
-- SELECT cohort_label, retention_rate_bounded AS w1_retention
-- FROM analytics.retention_login_weekly
-- WHERE user_lifetime_week = 1
-- ORDER BY cohort_week_start;
--
-- -- Overall average retention curve (all cohorts combined)
-- SELECT user_lifetime_week,
-- SUM(active_users_bounded)::float / NULLIF(SUM(cohort_users_w0), 0) AS avg_retention
-- FROM analytics.retention_login_weekly
-- GROUP BY 1 ORDER BY 1;
-- =============================================================
WITH params AS (SELECT 12::int AS max_weeks),
events AS (
SELECT s.user_id::text AS user_id, s.created_at::timestamptz AS created_at,
DATE_TRUNC('week', s.created_at)::date AS week_start
FROM auth.sessions s WHERE s.user_id IS NOT NULL
),
first_login AS (
SELECT user_id, MIN(created_at) AS first_login_time,
DATE_TRUNC('week', MIN(created_at))::date AS cohort_week_start
FROM events GROUP BY 1
),
activity_weeks AS (SELECT DISTINCT user_id, week_start FROM events),
user_week_age AS (
SELECT aw.user_id, fl.cohort_week_start,
((aw.week_start - DATE_TRUNC('week', fl.first_login_time)::date) / 7)::int AS user_lifetime_week
FROM activity_weeks aw JOIN first_login fl USING (user_id)
WHERE aw.week_start >= DATE_TRUNC('week', fl.first_login_time)::date
),
bounded_counts AS (
SELECT cohort_week_start, user_lifetime_week, COUNT(DISTINCT user_id) AS active_users_bounded
FROM user_week_age WHERE user_lifetime_week >= 0 GROUP BY 1,2
),
last_active AS (
SELECT cohort_week_start, user_id, MAX(user_lifetime_week) AS last_active_week FROM user_week_age GROUP BY 1,2
),
unbounded_counts AS (
SELECT la.cohort_week_start, gs AS user_lifetime_week, COUNT(*) AS retained_users_unbounded
FROM last_active la
CROSS JOIN LATERAL generate_series(0, LEAST(la.last_active_week,(SELECT max_weeks FROM params))) gs
GROUP BY 1,2
),
cohort_sizes AS (SELECT cohort_week_start, COUNT(DISTINCT user_id) AS cohort_users FROM first_login GROUP BY 1),
cohort_caps AS (
SELECT cs.cohort_week_start, cs.cohort_users,
LEAST((SELECT max_weeks FROM params),
GREATEST(0,((DATE_TRUNC('week',CURRENT_DATE)::date - cs.cohort_week_start)/7)::int)) AS cap_weeks
FROM cohort_sizes cs
),
grid AS (
SELECT cc.cohort_week_start, gs AS user_lifetime_week, cc.cohort_users
FROM cohort_caps cc CROSS JOIN LATERAL generate_series(0, cc.cap_weeks) gs
)
SELECT
g.cohort_week_start,
TO_CHAR(g.cohort_week_start,'IYYY-"W"IW') AS cohort_label,
TO_CHAR(g.cohort_week_start,'IYYY-"W"IW')||' (n='||g.cohort_users||')' AS cohort_label_n,
g.user_lifetime_week, g.cohort_users,
COALESCE(b.active_users_bounded,0) AS active_users_bounded,
COALESCE(u.retained_users_unbounded,0) AS retained_users_unbounded,
CASE WHEN g.cohort_users>0 THEN COALESCE(b.active_users_bounded,0)::float/g.cohort_users END AS retention_rate_bounded,
CASE WHEN g.cohort_users>0 THEN COALESCE(u.retained_users_unbounded,0)::float/g.cohort_users END AS retention_rate_unbounded,
CASE WHEN g.user_lifetime_week=0 THEN g.cohort_users ELSE 0 END AS cohort_users_w0
FROM grid g
LEFT JOIN bounded_counts b ON b.cohort_week_start=g.cohort_week_start AND b.user_lifetime_week=g.user_lifetime_week
LEFT JOIN unbounded_counts u ON u.cohort_week_start=g.cohort_week_start AND u.user_lifetime_week=g.user_lifetime_week
ORDER BY g.cohort_week_start, g.user_lifetime_week

View File

@@ -0,0 +1,71 @@
-- =============================================================
-- View: analytics.user_block_spending
-- Looker source alias: ds6 | Charts: 5
-- =============================================================
-- DESCRIPTION
-- One row per credit transaction (last 90 days).
-- Shows how users spend credits broken down by block type,
-- LLM provider and model. Joins node execution stats for
-- token-level detail.
--
-- SOURCE TABLES
-- platform.CreditTransaction — Credit debit/credit records
-- platform.AgentNodeExecution — Node execution stats (for token counts)
--
-- OUTPUT COLUMNS
-- transactionKey TEXT Unique transaction identifier
-- userId TEXT User who was charged
-- amount DECIMAL Credit amount (positive = credit, negative = debit)
-- negativeAmount DECIMAL amount * -1 (convenience for spend charts)
-- transactionType TEXT Transaction type (e.g. 'USAGE', 'REFUND', 'TOP_UP')
-- transactionTime TIMESTAMPTZ When the transaction was recorded
-- blockId TEXT Block UUID that triggered the spend
-- blockName TEXT Human-readable block name
-- llm_provider TEXT LLM provider (e.g. 'openai', 'anthropic')
-- llm_model TEXT Model name (e.g. 'gpt-4o', 'claude-3-5-sonnet')
-- node_exec_id TEXT Linked node execution UUID
-- llm_call_count INT LLM API calls made in that execution
-- llm_retry_count INT LLM retries in that execution
-- llm_input_token_count INT Input tokens consumed
-- llm_output_token_count INT Output tokens produced
--
-- WINDOW
-- Rolling 90 days (createdAt > CURRENT_DATE - 90 days)
--
-- EXAMPLE QUERIES
-- -- Total spend per user (last 90 days)
-- SELECT "userId", SUM("negativeAmount") AS total_spent
-- FROM analytics.user_block_spending
-- WHERE "transactionType" = 'USAGE'
-- GROUP BY 1 ORDER BY total_spent DESC;
--
-- -- Spend by LLM provider + model
-- SELECT "llm_provider", "llm_model",
-- SUM("negativeAmount") AS total_cost,
-- SUM("llm_input_token_count") AS input_tokens,
-- SUM("llm_output_token_count") AS output_tokens
-- FROM analytics.user_block_spending
-- WHERE "llm_provider" IS NOT NULL
-- GROUP BY 1, 2 ORDER BY total_cost DESC;
-- =============================================================
SELECT
c."transactionKey" AS transactionKey,
c."userId" AS userId,
c."amount" AS amount,
c."amount" * -1 AS negativeAmount,
c."type" AS transactionType,
c."createdAt" AS transactionTime,
c.metadata->>'block_id' AS blockId,
c.metadata->>'block' AS blockName,
c.metadata->'input'->'credentials'->>'provider' AS llm_provider,
c.metadata->'input'->>'model' AS llm_model,
c.metadata->>'node_exec_id' AS node_exec_id,
(ne."stats"->>'llm_call_count')::int AS llm_call_count,
(ne."stats"->>'llm_retry_count')::int AS llm_retry_count,
(ne."stats"->>'input_token_count')::int AS llm_input_token_count,
(ne."stats"->>'output_token_count')::int AS llm_output_token_count
FROM platform."CreditTransaction" c
LEFT JOIN platform."AgentNodeExecution" ne
ON (c.metadata->>'node_exec_id') = ne."id"::text
WHERE c."createdAt" > CURRENT_DATE - INTERVAL '90 days'

View File

@@ -0,0 +1,45 @@
-- =============================================================
-- View: analytics.user_onboarding
-- Looker source alias: ds68 | Charts: 3
-- =============================================================
-- DESCRIPTION
-- One row per user onboarding record. Contains the user's
-- stated usage reason, selected integrations, completed
-- onboarding steps and optional first agent selection.
-- Full history (no date filter) since onboarding happens
-- once per user.
--
-- SOURCE TABLES
-- platform.UserOnboarding — Onboarding state per user
--
-- OUTPUT COLUMNS
-- id TEXT Onboarding record UUID
-- createdAt TIMESTAMPTZ When onboarding started
-- updatedAt TIMESTAMPTZ Last update to onboarding state
-- usageReason TEXT Why user signed up (e.g. 'work', 'personal')
-- integrations TEXT[] Array of integration names the user selected
-- userId TEXT User UUID
-- completedSteps TEXT[] Array of onboarding step enums completed
-- selectedStoreListingVersionId TEXT First marketplace agent the user chose (if any)
--
-- EXAMPLE QUERIES
-- -- Usage reason breakdown
-- SELECT "usageReason", COUNT(*) FROM analytics.user_onboarding GROUP BY 1;
--
-- -- Completion rate per step
-- SELECT step, COUNT(*) AS users_completed
-- FROM analytics.user_onboarding
-- CROSS JOIN LATERAL UNNEST("completedSteps") AS step
-- GROUP BY 1 ORDER BY users_completed DESC;
-- =============================================================
SELECT
id,
"createdAt",
"updatedAt",
"usageReason",
integrations,
"userId",
"completedSteps",
"selectedStoreListingVersionId"
FROM platform."UserOnboarding"

View File

@@ -0,0 +1,100 @@
-- =============================================================
-- View: analytics.user_onboarding_funnel
-- Looker source alias: ds74 | Charts: 1
-- =============================================================
-- DESCRIPTION
-- Pre-aggregated onboarding funnel showing how many users
-- completed each step and the drop-off percentage from the
-- previous step. One row per onboarding step (all 22 steps
-- always present, even with 0 completions — prevents sparse
-- gaps from making LAG compare the wrong predecessors).
--
-- SOURCE TABLES
-- platform.UserOnboarding — Onboarding records with completedSteps array
--
-- OUTPUT COLUMNS
-- step TEXT Onboarding step enum name (e.g. 'WELCOME', 'CONGRATS')
-- step_order INT Numeric position in the funnel (1=first, 22=last)
-- users_completed BIGINT Distinct users who completed this step
-- pct_from_prev NUMERIC % of users from the previous step who reached this one
--
-- STEP ORDER
-- 1 WELCOME 9 MARKETPLACE_VISIT 17 SCHEDULE_AGENT
-- 2 USAGE_REASON 10 MARKETPLACE_ADD_AGENT 18 RUN_AGENTS
-- 3 INTEGRATIONS 11 MARKETPLACE_RUN_AGENT 19 RUN_3_DAYS
-- 4 AGENT_CHOICE 12 BUILDER_OPEN 20 TRIGGER_WEBHOOK
-- 5 AGENT_NEW_RUN 13 BUILDER_SAVE_AGENT 21 RUN_14_DAYS
-- 6 AGENT_INPUT 14 BUILDER_RUN_AGENT 22 RUN_AGENTS_100
-- 7 CONGRATS 15 VISIT_COPILOT
-- 8 GET_RESULTS 16 RE_RUN_AGENT
--
-- WINDOW
-- Users who started onboarding in the last 90 days
--
-- EXAMPLE QUERIES
-- -- Full funnel
-- SELECT * FROM analytics.user_onboarding_funnel ORDER BY step_order;
--
-- -- Biggest drop-off point
-- SELECT step, pct_from_prev FROM analytics.user_onboarding_funnel
-- ORDER BY pct_from_prev ASC LIMIT 3;
-- =============================================================
WITH all_steps AS (
-- Complete ordered grid of all 22 steps so zero-completion steps
-- are always present, keeping LAG comparisons correct.
SELECT step_name, step_order
FROM (VALUES
('WELCOME', 1),
('USAGE_REASON', 2),
('INTEGRATIONS', 3),
('AGENT_CHOICE', 4),
('AGENT_NEW_RUN', 5),
('AGENT_INPUT', 6),
('CONGRATS', 7),
('GET_RESULTS', 8),
('MARKETPLACE_VISIT', 9),
('MARKETPLACE_ADD_AGENT', 10),
('MARKETPLACE_RUN_AGENT', 11),
('BUILDER_OPEN', 12),
('BUILDER_SAVE_AGENT', 13),
('BUILDER_RUN_AGENT', 14),
('VISIT_COPILOT', 15),
('RE_RUN_AGENT', 16),
('SCHEDULE_AGENT', 17),
('RUN_AGENTS', 18),
('RUN_3_DAYS', 19),
('TRIGGER_WEBHOOK', 20),
('RUN_14_DAYS', 21),
('RUN_AGENTS_100', 22)
) AS t(step_name, step_order)
),
raw AS (
SELECT
u."userId",
step_txt::text AS step
FROM platform."UserOnboarding" u
CROSS JOIN LATERAL UNNEST(u."completedSteps") AS step_txt
WHERE u."createdAt" >= CURRENT_DATE - INTERVAL '90 days'
),
step_counts AS (
SELECT step, COUNT(DISTINCT "userId") AS users_completed
FROM raw GROUP BY step
),
funnel AS (
SELECT
a.step_name AS step,
a.step_order,
COALESCE(sc.users_completed, 0) AS users_completed,
ROUND(
100.0 * COALESCE(sc.users_completed, 0)
/ NULLIF(
LAG(COALESCE(sc.users_completed, 0)) OVER (ORDER BY a.step_order),
0
),
2
) AS pct_from_prev
FROM all_steps a
LEFT JOIN step_counts sc ON sc.step = a.step_name
)
SELECT * FROM funnel ORDER BY step_order

View File

@@ -0,0 +1,41 @@
-- =============================================================
-- View: analytics.user_onboarding_integration
-- Looker source alias: ds75 | Charts: 1
-- =============================================================
-- DESCRIPTION
-- Pre-aggregated count of users who selected each integration
-- during onboarding. One row per integration type, sorted
-- by popularity.
--
-- SOURCE TABLES
-- platform.UserOnboarding — integrations array column
--
-- OUTPUT COLUMNS
-- integration TEXT Integration name (e.g. 'github', 'slack', 'notion')
-- users_with_integration BIGINT Distinct users who selected this integration
--
-- WINDOW
-- Users who started onboarding in the last 90 days
--
-- EXAMPLE QUERIES
-- -- Full integration popularity ranking
-- SELECT * FROM analytics.user_onboarding_integration;
--
-- -- Top 5 integrations
-- SELECT * FROM analytics.user_onboarding_integration LIMIT 5;
-- =============================================================
WITH exploded AS (
SELECT
u."userId" AS user_id,
UNNEST(u."integrations") AS integration
FROM platform."UserOnboarding" u
WHERE u."createdAt" >= CURRENT_DATE - INTERVAL '90 days'
)
SELECT
integration,
COUNT(DISTINCT user_id) AS users_with_integration
FROM exploded
WHERE integration IS NOT NULL AND integration <> ''
GROUP BY integration
ORDER BY users_with_integration DESC

View File

@@ -0,0 +1,145 @@
-- =============================================================
-- View: analytics.users_activities
-- Looker source alias: ds56 | Charts: 5
-- =============================================================
-- DESCRIPTION
-- One row per user with lifetime activity summary.
-- Joins login sessions with agent graphs, executions and
-- node-level runs to give a full picture of how engaged
-- each user is. Includes a convenience flag for 7-day
-- activation (did the user return at least 7 days after
-- their first login?).
--
-- SOURCE TABLES
-- auth.sessions — Login/session records
-- platform.AgentGraph — Graphs (agents) built by the user
-- platform.AgentGraphExecution — Agent run history
-- platform.AgentNodeExecution — Individual block execution history
--
-- PERFORMANCE NOTE
-- Each CTE aggregates its own table independently by userId.
-- This avoids the fan-out that occurs when driving every join
-- from user_logins across the two largest tables
-- (AgentGraphExecution and AgentNodeExecution).
--
-- OUTPUT COLUMNS
-- user_id TEXT Supabase user UUID
-- first_login_time TIMESTAMPTZ First ever session created_at
-- last_login_time TIMESTAMPTZ Most recent session created_at
-- last_visit_time TIMESTAMPTZ Max of last refresh or login
-- last_agent_save_time TIMESTAMPTZ Last time user saved an agent graph
-- agent_count BIGINT Number of distinct active graphs built (0 if none)
-- first_agent_run_time TIMESTAMPTZ First ever graph execution
-- last_agent_run_time TIMESTAMPTZ Most recent graph execution
-- unique_agent_runs BIGINT Distinct agent graphs ever run (0 if none)
-- agent_runs BIGINT Total graph execution count (0 if none)
-- node_execution_count BIGINT Total node executions across all runs
-- node_execution_failed BIGINT Node executions with FAILED status
-- node_execution_completed BIGINT Node executions with COMPLETED status
-- node_execution_terminated BIGINT Node executions with TERMINATED status
-- node_execution_queued BIGINT Node executions with QUEUED status
-- node_execution_running BIGINT Node executions with RUNNING status
-- is_active_after_7d INT 1=returned after day 7, 0=did not, NULL=too early to tell
-- node_execution_incomplete BIGINT Node executions with INCOMPLETE status
-- node_execution_review BIGINT Node executions with REVIEW status
--
-- EXAMPLE QUERIES
-- -- Users who ran at least one agent and returned after 7 days
-- SELECT COUNT(*) FROM analytics.users_activities
-- WHERE agent_runs > 0 AND is_active_after_7d = 1;
--
-- -- Top 10 most active users by agent runs
-- SELECT user_id, agent_runs, node_execution_count
-- FROM analytics.users_activities
-- ORDER BY agent_runs DESC LIMIT 10;
--
-- -- 7-day activation rate
-- SELECT
-- SUM(CASE WHEN is_active_after_7d = 1 THEN 1 ELSE 0 END)::float
-- / NULLIF(COUNT(CASE WHEN is_active_after_7d IS NOT NULL THEN 1 END), 0)
-- AS activation_rate
-- FROM analytics.users_activities;
-- =============================================================
WITH user_logins AS (
SELECT
user_id::text AS user_id,
MIN(created_at) AS first_login_time,
MAX(created_at) AS last_login_time,
GREATEST(
MAX(refreshed_at)::timestamptz,
MAX(created_at)::timestamptz
) AS last_visit_time
FROM auth.sessions
GROUP BY user_id
),
user_agents AS (
-- Aggregate AgentGraph directly by userId (no fan-out from user_logins)
SELECT
"userId"::text AS user_id,
MAX("updatedAt") AS last_agent_save_time,
COUNT(DISTINCT "id") AS agent_count
FROM platform."AgentGraph"
WHERE "isActive"
GROUP BY "userId"
),
user_graph_runs AS (
-- Aggregate AgentGraphExecution directly by userId
SELECT
"userId"::text AS user_id,
MIN("createdAt") AS first_agent_run_time,
MAX("createdAt") AS last_agent_run_time,
COUNT(DISTINCT "agentGraphId") AS unique_agent_runs,
COUNT("id") AS agent_runs
FROM platform."AgentGraphExecution"
GROUP BY "userId"
),
user_node_runs AS (
-- Aggregate AgentNodeExecution directly; resolve userId via a
-- single join to AgentGraphExecution instead of fanning out from
-- user_logins through both large tables.
SELECT
g."userId"::text AS user_id,
COUNT(*) AS node_execution_count,
COUNT(*) FILTER (WHERE n."executionStatus" = 'FAILED') AS node_execution_failed,
COUNT(*) FILTER (WHERE n."executionStatus" = 'COMPLETED') AS node_execution_completed,
COUNT(*) FILTER (WHERE n."executionStatus" = 'TERMINATED') AS node_execution_terminated,
COUNT(*) FILTER (WHERE n."executionStatus" = 'QUEUED') AS node_execution_queued,
COUNT(*) FILTER (WHERE n."executionStatus" = 'RUNNING') AS node_execution_running,
COUNT(*) FILTER (WHERE n."executionStatus" = 'INCOMPLETE') AS node_execution_incomplete,
COUNT(*) FILTER (WHERE n."executionStatus" = 'REVIEW') AS node_execution_review
FROM platform."AgentNodeExecution" n
JOIN platform."AgentGraphExecution" g
ON g."id" = n."agentGraphExecutionId"
GROUP BY g."userId"
)
SELECT
ul.user_id,
ul.first_login_time,
ul.last_login_time,
ul.last_visit_time,
ua.last_agent_save_time,
COALESCE(ua.agent_count, 0) AS agent_count,
gr.first_agent_run_time,
gr.last_agent_run_time,
COALESCE(gr.unique_agent_runs, 0) AS unique_agent_runs,
COALESCE(gr.agent_runs, 0) AS agent_runs,
COALESCE(nr.node_execution_count, 0) AS node_execution_count,
COALESCE(nr.node_execution_failed, 0) AS node_execution_failed,
COALESCE(nr.node_execution_completed, 0) AS node_execution_completed,
COALESCE(nr.node_execution_terminated, 0) AS node_execution_terminated,
COALESCE(nr.node_execution_queued, 0) AS node_execution_queued,
COALESCE(nr.node_execution_running, 0) AS node_execution_running,
CASE
WHEN ul.first_login_time < NOW() - INTERVAL '7 days'
AND ul.last_visit_time >= ul.first_login_time + INTERVAL '7 days' THEN 1
WHEN ul.first_login_time < NOW() - INTERVAL '7 days'
AND ul.last_visit_time < ul.first_login_time + INTERVAL '7 days' THEN 0
ELSE NULL
END AS is_active_after_7d,
COALESCE(nr.node_execution_incomplete, 0) AS node_execution_incomplete,
COALESCE(nr.node_execution_review, 0) AS node_execution_review
FROM user_logins ul
LEFT JOIN user_agents ua ON ul.user_id = ua.user_id
LEFT JOIN user_graph_runs gr ON ul.user_id = gr.user_id
LEFT JOIN user_node_runs nr ON ul.user_id = nr.user_id

View File

@@ -37,6 +37,10 @@ JWT_VERIFY_KEY=your-super-secret-jwt-token-with-at-least-32-characters-long
ENCRYPTION_KEY=dvziYgz0KSK8FENhju0ZYi8-fRTfAdlz6YLhdB_jhNw=
UNSUBSCRIBE_SECRET_KEY=HlP8ivStJjmbf6NKi78m_3FnOogut0t5ckzjsIqeaio=
## ===== SIGNUP / INVITE GATE ===== ##
# Set to true to require an invite before users can sign up
ENABLE_INVITE_GATE=false
## ===== IMPORTANT OPTIONAL CONFIGURATION ===== ##
# Platform URLs (set these for webhooks and OAuth to work)
PLATFORM_BASE_URL=http://localhost:8000

View File

@@ -58,10 +58,31 @@ poetry run pytest path/to/test.py --snapshot-update
- **Authentication**: JWT-based with Supabase integration
- **Security**: Cache protection middleware prevents sensitive data caching in browsers/proxies
## Code Style
- **Top-level imports only** — no local/inner imports (lazy imports only for heavy optional deps like `openpyxl`)
- **No duck typing** — no `hasattr`/`getattr`/`isinstance` for type dispatch; use typed interfaces/unions/protocols
- **Pydantic models** over dataclass/namedtuple/dict for structured data
- **No linter suppressors** — no `# type: ignore`, `# noqa`, `# pyright: ignore`; fix the type/code
- **List comprehensions** over manual loop-and-append
- **Early return** — guard clauses first, avoid deep nesting
- **Lazy `%s` logging** — `logger.info("Processing %s items", count)` not `logger.info(f"Processing {count} items")`
- **Sanitize error paths** — `os.path.basename()` in error messages to avoid leaking directory structure
- **TOCTOU awareness** — avoid check-then-act patterns for file access and credit charging
- **`Security()` vs `Depends()`** — use `Security()` for auth deps to get proper OpenAPI security spec
- **Redis pipelines** — `transaction=True` for atomicity on multi-step operations
- **`max(0, value)` guards** — for computed values that should never be negative
- **SSE protocol** — `data:` lines for frontend-parsed events (must match Zod schema), `: comment` lines for heartbeats/status
- **File length** — keep files under ~300 lines; if a file grows beyond this, split by responsibility (e.g. extract helpers, models, or a sub-module into a new file). Never keep appending to a long file.
- **Function length** — keep functions under ~40 lines; extract named helpers when a function grows longer. Long functions are a sign of mixed concerns, not complexity.
## Testing Approach
- Uses pytest with snapshot testing for API responses
- Test files are colocated with source files (`*_test.py`)
- Mock at boundaries — mock where the symbol is **used**, not where it's **defined**
- After refactoring, update mock targets to match new module paths
- Use `AsyncMock` for async functions (`from unittest.mock import AsyncMock`)
## Database Schema

View File

@@ -1,8 +1,17 @@
from pydantic import BaseModel
from __future__ import annotations
from datetime import datetime
from typing import TYPE_CHECKING, Any, Literal, Optional
import prisma.enums
from pydantic import BaseModel, EmailStr
from backend.data.model import UserTransaction
from backend.util.models import Pagination
if TYPE_CHECKING:
from backend.data.invited_user import BulkInvitedUsersResult, InvitedUserRecord
class UserHistoryResponse(BaseModel):
"""Response model for listings with version history"""
@@ -14,3 +23,70 @@ class UserHistoryResponse(BaseModel):
class AddUserCreditsResponse(BaseModel):
new_balance: int
transaction_key: str
class CreateInvitedUserRequest(BaseModel):
email: EmailStr
name: Optional[str] = None
class InvitedUserResponse(BaseModel):
id: str
email: str
status: prisma.enums.InvitedUserStatus
auth_user_id: Optional[str] = None
name: Optional[str] = None
tally_understanding: Optional[dict[str, Any]] = None
tally_status: prisma.enums.TallyComputationStatus
tally_computed_at: Optional[datetime] = None
tally_error: Optional[str] = None
created_at: datetime
updated_at: datetime
@classmethod
def from_record(cls, record: InvitedUserRecord) -> InvitedUserResponse:
return cls.model_validate(record.model_dump())
class InvitedUsersResponse(BaseModel):
invited_users: list[InvitedUserResponse]
pagination: Pagination
class BulkInvitedUserRowResponse(BaseModel):
row_number: int
email: Optional[str] = None
name: Optional[str] = None
status: Literal["CREATED", "SKIPPED", "ERROR"]
message: str
invited_user: Optional[InvitedUserResponse] = None
class BulkInvitedUsersResponse(BaseModel):
created_count: int
skipped_count: int
error_count: int
results: list[BulkInvitedUserRowResponse]
@classmethod
def from_result(cls, result: BulkInvitedUsersResult) -> BulkInvitedUsersResponse:
return cls(
created_count=result.created_count,
skipped_count=result.skipped_count,
error_count=result.error_count,
results=[
BulkInvitedUserRowResponse(
row_number=row.row_number,
email=row.email,
name=row.name,
status=row.status,
message=row.message,
invited_user=(
InvitedUserResponse.from_record(row.invited_user)
if row.invited_user is not None
else None
),
)
for row in result.results
],
)

View File

@@ -0,0 +1,137 @@
import logging
import math
from autogpt_libs.auth import get_user_id, requires_admin_user
from fastapi import APIRouter, File, Query, Security, UploadFile
from backend.data.invited_user import (
bulk_create_invited_users_from_file,
create_invited_user,
list_invited_users,
retry_invited_user_tally,
revoke_invited_user,
)
from backend.data.tally import mask_email
from backend.util.models import Pagination
from .model import (
BulkInvitedUsersResponse,
CreateInvitedUserRequest,
InvitedUserResponse,
InvitedUsersResponse,
)
logger = logging.getLogger(__name__)
router = APIRouter(
prefix="/admin",
tags=["users", "admin"],
dependencies=[Security(requires_admin_user)],
)
@router.get(
"/invited-users",
response_model=InvitedUsersResponse,
summary="List Invited Users",
)
async def get_invited_users(
admin_user_id: str = Security(get_user_id),
page: int = Query(1, ge=1),
page_size: int = Query(50, ge=1, le=200),
) -> InvitedUsersResponse:
logger.info("Admin user %s requested invited users", admin_user_id)
invited_users, total = await list_invited_users(page=page, page_size=page_size)
return InvitedUsersResponse(
invited_users=[InvitedUserResponse.from_record(iu) for iu in invited_users],
pagination=Pagination(
total_items=total,
total_pages=max(1, math.ceil(total / page_size)),
current_page=page,
page_size=page_size,
),
)
@router.post(
"/invited-users",
response_model=InvitedUserResponse,
summary="Create Invited User",
)
async def create_invited_user_route(
request: CreateInvitedUserRequest,
admin_user_id: str = Security(get_user_id),
) -> InvitedUserResponse:
logger.info(
"Admin user %s creating invited user for %s",
admin_user_id,
mask_email(request.email),
)
invited_user = await create_invited_user(request.email, request.name)
logger.info(
"Admin user %s created invited user %s",
admin_user_id,
invited_user.id,
)
return InvitedUserResponse.from_record(invited_user)
@router.post(
"/invited-users/bulk",
response_model=BulkInvitedUsersResponse,
summary="Bulk Create Invited Users",
operation_id="postV2BulkCreateInvitedUsers",
)
async def bulk_create_invited_users_route(
file: UploadFile = File(...),
admin_user_id: str = Security(get_user_id),
) -> BulkInvitedUsersResponse:
logger.info(
"Admin user %s bulk invited users from %s",
admin_user_id,
file.filename or "<unnamed>",
)
content = await file.read()
result = await bulk_create_invited_users_from_file(file.filename, content)
return BulkInvitedUsersResponse.from_result(result)
@router.post(
"/invited-users/{invited_user_id}/revoke",
response_model=InvitedUserResponse,
summary="Revoke Invited User",
)
async def revoke_invited_user_route(
invited_user_id: str,
admin_user_id: str = Security(get_user_id),
) -> InvitedUserResponse:
logger.info(
"Admin user %s revoking invited user %s", admin_user_id, invited_user_id
)
invited_user = await revoke_invited_user(invited_user_id)
logger.info("Admin user %s revoked invited user %s", admin_user_id, invited_user_id)
return InvitedUserResponse.from_record(invited_user)
@router.post(
"/invited-users/{invited_user_id}/retry-tally",
response_model=InvitedUserResponse,
summary="Retry Invited User Tally",
)
async def retry_invited_user_tally_route(
invited_user_id: str,
admin_user_id: str = Security(get_user_id),
) -> InvitedUserResponse:
logger.info(
"Admin user %s retrying Tally seed for invited user %s",
admin_user_id,
invited_user_id,
)
invited_user = await retry_invited_user_tally(invited_user_id)
logger.info(
"Admin user %s retried Tally seed for invited user %s",
admin_user_id,
invited_user_id,
)
return InvitedUserResponse.from_record(invited_user)

View File

@@ -0,0 +1,168 @@
from datetime import datetime, timezone
from unittest.mock import AsyncMock
import fastapi
import fastapi.testclient
import prisma.enums
import pytest
import pytest_mock
from autogpt_libs.auth.jwt_utils import get_jwt_payload
from backend.data.invited_user import (
BulkInvitedUserRowResult,
BulkInvitedUsersResult,
InvitedUserRecord,
)
from .user_admin_routes import router as user_admin_router
app = fastapi.FastAPI()
app.include_router(user_admin_router)
client = fastapi.testclient.TestClient(app)
@pytest.fixture(autouse=True)
def setup_app_admin_auth(mock_jwt_admin):
app.dependency_overrides[get_jwt_payload] = mock_jwt_admin["get_jwt_payload"]
yield
app.dependency_overrides.clear()
def _sample_invited_user() -> InvitedUserRecord:
now = datetime.now(timezone.utc)
return InvitedUserRecord(
id="invite-1",
email="invited@example.com",
status=prisma.enums.InvitedUserStatus.INVITED,
auth_user_id=None,
name="Invited User",
tally_understanding=None,
tally_status=prisma.enums.TallyComputationStatus.PENDING,
tally_computed_at=None,
tally_error=None,
created_at=now,
updated_at=now,
)
def _sample_bulk_invited_users_result() -> BulkInvitedUsersResult:
return BulkInvitedUsersResult(
created_count=1,
skipped_count=1,
error_count=0,
results=[
BulkInvitedUserRowResult(
row_number=1,
email="invited@example.com",
name=None,
status="CREATED",
message="Invite created",
invited_user=_sample_invited_user(),
),
BulkInvitedUserRowResult(
row_number=2,
email="duplicate@example.com",
name=None,
status="SKIPPED",
message="An invited user with this email already exists",
invited_user=None,
),
],
)
def test_get_invited_users(
mocker: pytest_mock.MockerFixture,
) -> None:
mocker.patch(
"backend.api.features.admin.user_admin_routes.list_invited_users",
AsyncMock(return_value=([_sample_invited_user()], 1)),
)
response = client.get("/admin/invited-users")
assert response.status_code == 200
data = response.json()
assert len(data["invited_users"]) == 1
assert data["invited_users"][0]["email"] == "invited@example.com"
assert data["invited_users"][0]["status"] == "INVITED"
assert data["pagination"]["total_items"] == 1
assert data["pagination"]["current_page"] == 1
assert data["pagination"]["page_size"] == 50
def test_create_invited_user(
mocker: pytest_mock.MockerFixture,
) -> None:
mocker.patch(
"backend.api.features.admin.user_admin_routes.create_invited_user",
AsyncMock(return_value=_sample_invited_user()),
)
response = client.post(
"/admin/invited-users",
json={"email": "invited@example.com", "name": "Invited User"},
)
assert response.status_code == 200
data = response.json()
assert data["email"] == "invited@example.com"
assert data["name"] == "Invited User"
def test_bulk_create_invited_users(
mocker: pytest_mock.MockerFixture,
) -> None:
mocker.patch(
"backend.api.features.admin.user_admin_routes.bulk_create_invited_users_from_file",
AsyncMock(return_value=_sample_bulk_invited_users_result()),
)
response = client.post(
"/admin/invited-users/bulk",
files={
"file": ("invites.txt", b"invited@example.com\nduplicate@example.com\n")
},
)
assert response.status_code == 200
data = response.json()
assert data["created_count"] == 1
assert data["skipped_count"] == 1
assert data["results"][0]["status"] == "CREATED"
assert data["results"][1]["status"] == "SKIPPED"
def test_revoke_invited_user(
mocker: pytest_mock.MockerFixture,
) -> None:
revoked = _sample_invited_user().model_copy(
update={"status": prisma.enums.InvitedUserStatus.REVOKED}
)
mocker.patch(
"backend.api.features.admin.user_admin_routes.revoke_invited_user",
AsyncMock(return_value=revoked),
)
response = client.post("/admin/invited-users/invite-1/revoke")
assert response.status_code == 200
assert response.json()["status"] == "REVOKED"
def test_retry_invited_user_tally(
mocker: pytest_mock.MockerFixture,
) -> None:
retried = _sample_invited_user().model_copy(
update={"tally_status": prisma.enums.TallyComputationStatus.RUNNING}
)
mocker.patch(
"backend.api.features.admin.user_admin_routes.retry_invited_user_tally",
AsyncMock(return_value=retried),
)
response = client.post("/admin/invited-users/invite-1/retry-tally")
assert response.status_code == 200
assert response.json()["tally_status"] == "RUNNING"

View File

@@ -54,6 +54,7 @@ from backend.copilot.tools.models import (
)
from backend.copilot.tracking import track_user_message
from backend.data.redis_client import get_redis_async
from backend.data.understanding import get_business_understanding
from backend.data.workspace import get_or_create_workspace
from backend.util.exceptions import NotFoundError
@@ -853,6 +854,36 @@ async def session_assign_user(
return {"status": "ok"}
# ========== Suggested Prompts ==========
class SuggestedPromptsResponse(BaseModel):
"""Response model for user-specific suggested prompts."""
prompts: list[str]
@router.get(
"/suggested-prompts",
dependencies=[Security(auth.requires_user)],
)
async def get_suggested_prompts(
user_id: Annotated[str, Security(auth.get_user_id)],
) -> SuggestedPromptsResponse:
"""
Get LLM-generated suggested prompts for the authenticated user.
Returns personalized quick-action prompts based on the user's
business understanding. Returns an empty list if no custom prompts
are available.
"""
understanding = await get_business_understanding(user_id)
if understanding is None:
return SuggestedPromptsResponse(prompts=[])
return SuggestedPromptsResponse(prompts=understanding.suggested_prompts)
# ========== Configuration ==========

View File

@@ -1,6 +1,6 @@
"""Tests for chat API routes: session title update and file attachment validation."""
"""Tests for chat API routes: session title update, file attachment validation, and suggested prompts."""
from unittest.mock import AsyncMock
from unittest.mock import AsyncMock, MagicMock
import fastapi
import fastapi.testclient
@@ -249,3 +249,62 @@ def test_file_ids_scoped_to_workspace(mocker: pytest_mock.MockFixture):
call_kwargs = mock_prisma.find_many.call_args[1]
assert call_kwargs["where"]["workspaceId"] == "my-workspace-id"
assert call_kwargs["where"]["isDeleted"] is False
# ─── Suggested prompts endpoint ──────────────────────────────────────
def _mock_get_business_understanding(
mocker: pytest_mock.MockerFixture,
*,
return_value=None,
):
"""Mock get_business_understanding."""
return mocker.patch(
"backend.api.features.chat.routes.get_business_understanding",
new_callable=AsyncMock,
return_value=return_value,
)
def test_suggested_prompts_returns_prompts(
mocker: pytest_mock.MockerFixture,
test_user_id: str,
) -> None:
"""User with understanding and prompts gets them back."""
mock_understanding = MagicMock()
mock_understanding.suggested_prompts = ["Do X", "Do Y", "Do Z"]
_mock_get_business_understanding(mocker, return_value=mock_understanding)
response = client.get("/suggested-prompts")
assert response.status_code == 200
assert response.json() == {"prompts": ["Do X", "Do Y", "Do Z"]}
def test_suggested_prompts_no_understanding(
mocker: pytest_mock.MockerFixture,
test_user_id: str,
) -> None:
"""User with no understanding gets empty list."""
_mock_get_business_understanding(mocker, return_value=None)
response = client.get("/suggested-prompts")
assert response.status_code == 200
assert response.json() == {"prompts": []}
def test_suggested_prompts_empty_prompts(
mocker: pytest_mock.MockerFixture,
test_user_id: str,
) -> None:
"""User with understanding but no prompts gets empty list."""
mock_understanding = MagicMock()
mock_understanding.suggested_prompts = []
_mock_get_business_understanding(mocker, return_value=mock_understanding)
response = client.get("/suggested-prompts")
assert response.status_code == 200
assert response.json() == {"prompts": []}

View File

@@ -55,6 +55,7 @@ from backend.data.credit import (
set_auto_top_up,
)
from backend.data.graph import GraphSettings
from backend.data.invited_user import get_or_activate_user
from backend.data.model import CredentialsMetaInput, UserOnboarding
from backend.data.notifications import NotificationPreference, NotificationPreferenceDTO
from backend.data.onboarding import (
@@ -70,7 +71,6 @@ from backend.data.onboarding import (
update_user_onboarding,
)
from backend.data.user import (
get_or_create_user,
get_user_by_id,
get_user_notification_preference,
update_user_email,
@@ -136,12 +136,10 @@ _tally_background_tasks: set[asyncio.Task] = set()
dependencies=[Security(requires_user)],
)
async def get_or_create_user_route(user_data: dict = Security(get_jwt_payload)):
user = await get_or_create_user(user_data)
user = await get_or_activate_user(user_data)
# Fire-and-forget: populate business understanding from Tally form.
# We use created_at proximity instead of an is_new flag because
# get_or_create_user is cached — a separate is_new return value would be
# unreliable on repeated calls within the cache TTL.
# Fire-and-forget: backfill Tally understanding when invite pre-seeding did
# not produce a stored result before first activation.
age_seconds = (datetime.now(timezone.utc) - user.created_at).total_seconds()
if age_seconds < 30:
try:
@@ -165,7 +163,8 @@ async def get_or_create_user_route(user_data: dict = Security(get_jwt_payload)):
dependencies=[Security(requires_user)],
)
async def update_user_email_route(
user_id: Annotated[str, Security(get_user_id)], email: str = Body(...)
user_id: Annotated[str, Security(get_user_id)],
email: str = Body(...),
) -> dict[str, str]:
await update_user_email(user_id, email)
@@ -179,10 +178,16 @@ async def update_user_email_route(
dependencies=[Security(requires_user)],
)
async def get_user_timezone_route(
user_data: dict = Security(get_jwt_payload),
user_id: Annotated[str, Security(get_user_id)],
) -> TimezoneResponse:
"""Get user timezone setting."""
user = await get_or_create_user(user_data)
try:
user = await get_user_by_id(user_id)
except ValueError:
raise HTTPException(
status_code=HTTP_404_NOT_FOUND,
detail="User not found. Please complete activation via /auth/user first.",
)
return TimezoneResponse(timezone=user.timezone)
@@ -193,7 +198,8 @@ async def get_user_timezone_route(
dependencies=[Security(requires_user)],
)
async def update_user_timezone_route(
user_id: Annotated[str, Security(get_user_id)], request: UpdateTimezoneRequest
user_id: Annotated[str, Security(get_user_id)],
request: UpdateTimezoneRequest,
) -> TimezoneResponse:
"""Update user timezone. The timezone should be a valid IANA timezone identifier."""
user = await update_user_timezone(user_id, str(request.timezone))

View File

@@ -51,7 +51,7 @@ def test_get_or_create_user_route(
}
mocker.patch(
"backend.api.features.v1.get_or_create_user",
"backend.api.features.v1.get_or_activate_user",
return_value=mock_user,
)

View File

@@ -19,6 +19,7 @@ from prisma.errors import PrismaError
import backend.api.features.admin.credit_admin_routes
import backend.api.features.admin.execution_analytics_routes
import backend.api.features.admin.store_admin_routes
import backend.api.features.admin.user_admin_routes
import backend.api.features.builder
import backend.api.features.builder.routes
import backend.api.features.chat.routes as chat_routes
@@ -311,6 +312,11 @@ app.include_router(
tags=["v2", "admin"],
prefix="/api/executions",
)
app.include_router(
backend.api.features.admin.user_admin_routes.router,
tags=["v2", "admin"],
prefix="/api/users",
)
app.include_router(
backend.api.features.executions.review.routes.router,
tags=["v2", "executions", "review"],

View File

@@ -11,7 +11,10 @@ from backend.blocks._base import (
BlockSchemaInput,
BlockSchemaOutput,
)
from backend.data.execution import ExecutionContext
from backend.data.model import SchemaField
from backend.util.file import parse_data_uri, resolve_media_content
from backend.util.type import MediaFileType
from ._api import get_api
from ._auth import (
@@ -178,7 +181,8 @@ class FileOperation(StrEnum):
class FileOperationInput(TypedDict):
path: str
content: str
# MediaFileType is a str NewType — no runtime breakage for existing callers.
content: MediaFileType
operation: FileOperation
@@ -275,11 +279,11 @@ class GithubMultiFileCommitBlock(Block):
base_tree_sha = commit_data["tree"]["sha"]
# 3. Build tree entries for each file operation (blobs created concurrently)
async def _create_blob(content: str) -> str:
async def _create_blob(content: str, encoding: str = "utf-8") -> str:
blob_url = repo_url + "/git/blobs"
blob_response = await api.post(
blob_url,
json={"content": content, "encoding": "utf-8"},
json={"content": content, "encoding": encoding},
)
return blob_response.json()["sha"]
@@ -301,10 +305,19 @@ class GithubMultiFileCommitBlock(Block):
else:
upsert_files.append((path, file_op.get("content", "")))
# Create all blobs concurrently
# Create all blobs concurrently. Data URIs (from store_media_file)
# are sent as base64 blobs to preserve binary content.
if upsert_files:
async def _make_blob(content: str) -> str:
parsed = parse_data_uri(content)
if parsed is not None:
_, b64_payload = parsed
return await _create_blob(b64_payload, encoding="base64")
return await _create_blob(content)
blob_shas = await asyncio.gather(
*[_create_blob(content) for _, content in upsert_files]
*[_make_blob(content) for _, content in upsert_files]
)
for (path, _), blob_sha in zip(upsert_files, blob_shas):
tree_entries.append(
@@ -358,15 +371,36 @@ class GithubMultiFileCommitBlock(Block):
input_data: Input,
*,
credentials: GithubCredentials,
execution_context: ExecutionContext,
**kwargs,
) -> BlockOutput:
try:
# Resolve media references (workspace://, data:, URLs) to data
# URIs so _make_blob can send binary content correctly.
resolved_files: list[FileOperationInput] = []
for file_op in input_data.files:
content = file_op.get("content", "")
operation = FileOperation(file_op.get("operation", "upsert"))
if operation != FileOperation.DELETE:
content = await resolve_media_content(
MediaFileType(content),
execution_context,
return_format="for_external_api",
)
resolved_files.append(
FileOperationInput(
path=file_op["path"],
content=MediaFileType(content),
operation=operation,
)
)
sha, url = await self.multi_file_commit(
credentials,
input_data.repo_url,
input_data.branch,
input_data.commit_message,
input_data.files,
resolved_files,
)
yield "sha", sha
yield "url", url

View File

@@ -8,6 +8,7 @@ from backend.blocks.github.pull_requests import (
GithubMergePullRequestBlock,
prepare_pr_api_url,
)
from backend.data.execution import ExecutionContext
from backend.util.exceptions import BlockExecutionError
# ── prepare_pr_api_url tests ──
@@ -97,7 +98,11 @@ async def test_multi_file_commit_error_path():
"credentials": TEST_CREDENTIALS_INPUT,
}
with pytest.raises(BlockExecutionError, match="ref update failed"):
async for _ in block.execute(input_data, credentials=TEST_CREDENTIALS):
async for _ in block.execute(
input_data,
credentials=TEST_CREDENTIALS,
execution_context=ExecutionContext(),
):
pass

View File

@@ -115,7 +115,7 @@ class ChatConfig(BaseSettings):
description="E2B sandbox template to use for copilot sessions.",
)
e2b_sandbox_timeout: int = Field(
default=10800, # 3 hours — wall-clock timeout, not idle; explicit pause is primary
default=300, # 5 min safety net explicit per-turn pause is the primary mechanism
description="E2B sandbox running-time timeout (seconds). "
"E2B timeout is wall-clock (not idle). Explicit per-turn pause is the primary "
"mechanism; this is the safety net.",

View File

@@ -11,6 +11,8 @@ from contextvars import ContextVar
from typing import TYPE_CHECKING
from backend.copilot.model import ChatSession
from backend.data.db_accessors import workspace_db
from backend.util.workspace import WorkspaceManager
if TYPE_CHECKING:
from e2b import AsyncSandbox
@@ -82,6 +84,17 @@ def resolve_sandbox_path(path: str) -> str:
return normalized
async def get_workspace_manager(user_id: str, session_id: str) -> WorkspaceManager:
"""Create a session-scoped :class:`WorkspaceManager`.
Placed here (rather than in ``tools/workspace_files``) so that modules
like ``sdk/file_ref`` can import it without triggering the heavy
``tools/__init__`` import chain.
"""
workspace = await workspace_db().get_or_create_workspace(user_id)
return WorkspaceManager(user_id, workspace.id, session_id)
def is_allowed_local_path(path: str, sdk_cwd: str | None = None) -> bool:
"""Return True if *path* is within an allowed host-filesystem location.

View File

@@ -11,34 +11,18 @@ from backend.copilot.tools import TOOL_REGISTRY
# Shared technical notes that apply to both SDK and baseline modes
_SHARED_TOOL_NOTES = """\
### Sharing files with the user
After saving a file to the persistent workspace with `write_workspace_file`,
share it with the user by embedding the `download_url` from the response in
your message as a Markdown link or image:
### Sharing files
After `write_workspace_file`, embed the `download_url` in Markdown:
- File: `[report.csv](workspace://file_id#text/csv)`
- Image: `![chart](workspace://file_id#image/png)`
- Video: `![recording](workspace://file_id#video/mp4)`
- **Any file** — shows as a clickable download link:
`[report.csv](workspace://file_id#text/csv)`
- **Image** — renders inline in chat:
`![chart](workspace://file_id#image/png)`
- **Video** — renders inline in chat with player controls:
`![recording](workspace://file_id#video/mp4)`
The `download_url` field in the `write_workspace_file` response is already
in the correct format — paste it directly after the `(` in the Markdown.
### Passing file content to tools — @@agptfile: references
Instead of copying large file contents into a tool argument, pass a file
reference and the platform will load the content for you.
Syntax: `@@agptfile:<uri>[<start>-<end>]`
- `<uri>` **must** start with `workspace://` or `/` (absolute path):
- `workspace://<file_id>` — workspace file by ID
- `workspace:///<path>` — workspace file by virtual path
- `/absolute/local/path` — ephemeral or sdk_cwd file
- E2B sandbox absolute path (e.g. `/home/user/script.py`)
- `[<start>-<end>]` is an optional 1-indexed inclusive line range.
- URIs that do not start with `workspace://` or `/` are **not** expanded.
### File references — @@agptfile:
Pass large file content to tools by reference: `@@agptfile:<uri>[<start>-<end>]`
- `workspace://<file_id>` or `workspace:///<path>` — workspace files
- `/absolute/path` — local/sandbox files
- `[start-end]` — optional 1-indexed line range
- Multiple refs per argument supported. Only `workspace://` and absolute paths are expanded.
Examples:
```
@@ -49,13 +33,16 @@ Examples:
@@agptfile:/home/user/script.py
```
You can embed a reference inside any string argument, or use it as the entire
value. Multiple references in one argument are all expanded.
**Structured data**: When the entire argument is a single file reference, the platform auto-parses by extension/MIME. Supported: JSON, JSONL, CSV, TSV, YAML, TOML, Parquet, Excel (.xlsx only). Unrecognised formats return plain string.
**Type coercion**: The platform auto-coerces expanded string values to match block input types (e.g. JSON string → `list[list[str]]`).
### Media file inputs (format: "file")
Inputs with `"format": "file"` accept `workspace://<file_id>` or `data:<mime>;base64,<payload>`.
Pass the `workspace://` URI directly (do NOT wrap in `@@agptfile:`). This avoids large payloads and preserves binary content.
### Sub-agent tasks
- When using the Task tool, NEVER set `run_in_background` to true.
All tasks must run in the foreground.
- Task tool: NEVER set `run_in_background` to true.
"""
@@ -91,30 +78,18 @@ def _build_storage_supplement(
## Tool notes
### Shell commands
- The SDK built-in Bash tool is NOT available. Use the `bash_exec` MCP tool
for shell commands — it runs {sandbox_type}.
### Working directory
- Your working directory is: `{working_dir}`
- All SDK file tools AND `bash_exec` operate on the same filesystem
- Use relative paths or absolute paths under `{working_dir}` for all file operations
### Two storage systems — CRITICAL to understand
### Shell & filesystem
- Use `bash_exec` for shell commands ({sandbox_type}). Working dir: `{working_dir}`
- All file tools share the same filesystem. Use relative or absolute paths under this dir.
### Storage — important
1. **{storage_system_1_name}** (`{working_dir}`):
{characteristics}
{persistence}
2. **Persistent workspace** (cloud storage):
- Files here **survive across sessions indefinitely**
### Moving files between storages
- **{file_move_name_1_to_2}**: Copy to persistent workspace
- **{file_move_name_2_to_1}**: Download for processing
### File persistence
Important files (code, configs, outputs) should be saved to workspace to ensure they persist.
2. **Persistent workspace** (cloud) — survives across sessions.
- {file_move_name_1_to_2}: use `write_workspace_file`
- {file_move_name_2_to_1}: use `read_workspace_file` with save_to_path
- Save important files to workspace for persistence.
{_SHARED_TOOL_NOTES}"""

View File

@@ -3,12 +3,45 @@
This module provides the integration layer between the Claude Agent SDK
and the existing CoPilot tool system, enabling drop-in replacement of
the current LLM orchestration with the battle-tested Claude Agent SDK.
Submodule imports are deferred via PEP 562 ``__getattr__`` to break a
circular import cycle::
sdk/__init__ → tool_adapter → copilot.tools (TOOL_REGISTRY)
copilot.tools → run_block → sdk.file_ref (no cycle here, but…)
sdk/__init__ → service → copilot.prompting → copilot.tools (cycle!)
``tool_adapter`` uses ``TOOL_REGISTRY`` at **module level** to build the
static ``COPILOT_TOOL_NAMES`` list, so the import cannot be deferred to
function scope without a larger refactor (moving tool-name registration
to a separate lightweight module). The lazy-import pattern here is the
least invasive way to break the cycle while keeping module-level constants
intact.
"""
from .service import stream_chat_completion_sdk
from .tool_adapter import create_copilot_mcp_server
from typing import Any
__all__ = [
"stream_chat_completion_sdk",
"create_copilot_mcp_server",
]
# Dispatch table for PEP 562 lazy imports. Each entry is a (module, attr)
# pair so new exports can be added without touching __getattr__ itself.
_LAZY_IMPORTS: dict[str, tuple[str, str]] = {
"stream_chat_completion_sdk": (".service", "stream_chat_completion_sdk"),
"create_copilot_mcp_server": (".tool_adapter", "create_copilot_mcp_server"),
}
def __getattr__(name: str) -> Any:
entry = _LAZY_IMPORTS.get(name)
if entry is not None:
module_path, attr = entry
import importlib
module = importlib.import_module(module_path, package=__name__)
value = getattr(module, attr)
globals()[name] = value
return value
raise AttributeError(f"module {__name__!r} has no attribute {name!r}")

View File

@@ -11,7 +11,7 @@ persistence, and the ``CompactionTracker`` state machine.
import asyncio
import logging
import uuid
from collections.abc import Callable
from dataclasses import dataclass, field
from ..constants import COMPACTION_DONE_MSG, COMPACTION_TOOL_NAME
from ..model import ChatMessage, ChatSession
@@ -27,6 +27,19 @@ from ..response_model import (
logger = logging.getLogger(__name__)
@dataclass
class CompactionResult:
"""Result of emit_end_if_ready — bundles events with compaction metadata.
Eliminates the need for separate ``compaction_just_ended`` checks,
preventing TOCTOU races between the emit call and the flag read.
"""
events: list[StreamBaseResponse] = field(default_factory=list)
just_ended: bool = False
transcript_path: str = ""
# ---------------------------------------------------------------------------
# Event builders (private — use CompactionTracker or compaction_events)
# ---------------------------------------------------------------------------
@@ -177,11 +190,22 @@ class CompactionTracker:
self._start_emitted = False
self._done = False
self._tool_call_id = ""
self._transcript_path: str = ""
@property
def on_compact(self) -> Callable[[], None]:
"""Callback for the PreCompact hook."""
return self._compact_start.set
def on_compact(self, transcript_path: str = "") -> None:
"""Callback for the PreCompact hook. Stores transcript_path."""
if (
self._transcript_path
and transcript_path
and self._transcript_path != transcript_path
):
logger.warning(
"[Compaction] Overwriting transcript_path %s -> %s",
self._transcript_path,
transcript_path,
)
self._transcript_path = transcript_path
self._compact_start.set()
# ------------------------------------------------------------------
# Pre-query compaction
@@ -201,6 +225,7 @@ class CompactionTracker:
self._done = False
self._start_emitted = False
self._tool_call_id = ""
self._transcript_path = ""
def emit_start_if_ready(self) -> list[StreamBaseResponse]:
"""If the PreCompact hook fired, emit start events (spinning tool)."""
@@ -211,15 +236,20 @@ class CompactionTracker:
return _start_events(self._tool_call_id)
return []
async def emit_end_if_ready(self, session: ChatSession) -> list[StreamBaseResponse]:
"""If compaction is in progress, emit end events and persist."""
async def emit_end_if_ready(self, session: ChatSession) -> CompactionResult:
"""If compaction is in progress, emit end events and persist.
Returns a ``CompactionResult`` with ``just_ended=True`` and the
captured ``transcript_path`` when a compaction cycle completes.
This avoids a separate flag check (TOCTOU-safe).
"""
# Yield so pending hook tasks can set compact_start
await asyncio.sleep(0)
if self._done:
return []
return CompactionResult()
if not self._start_emitted and not self._compact_start.is_set():
return []
return CompactionResult()
if self._start_emitted:
# Close the open spinner
@@ -232,8 +262,12 @@ class CompactionTracker:
COMPACTION_DONE_MSG, tool_call_id=persist_id
)
transcript_path = self._transcript_path
self._compact_start.clear()
self._start_emitted = False
self._done = True
self._transcript_path = ""
_persist(session, persist_id, COMPACTION_DONE_MSG)
return done_events
return CompactionResult(
events=done_events, just_ended=True, transcript_path=transcript_path
)

View File

@@ -195,10 +195,11 @@ class TestCompactionTracker:
session = _make_session()
tracker.on_compact()
tracker.emit_start_if_ready()
evts = await tracker.emit_end_if_ready(session)
assert len(evts) == 2
assert isinstance(evts[0], StreamToolOutputAvailable)
assert isinstance(evts[1], StreamFinishStep)
result = await tracker.emit_end_if_ready(session)
assert result.just_ended is True
assert len(result.events) == 2
assert isinstance(result.events[0], StreamToolOutputAvailable)
assert isinstance(result.events[1], StreamFinishStep)
# Should persist
assert len(session.messages) == 2
@@ -210,28 +211,32 @@ class TestCompactionTracker:
session = _make_session()
tracker.on_compact()
# Don't call emit_start_if_ready
evts = await tracker.emit_end_if_ready(session)
assert len(evts) == 5 # Full self-contained event
assert isinstance(evts[0], StreamStartStep)
result = await tracker.emit_end_if_ready(session)
assert result.just_ended is True
assert len(result.events) == 5 # Full self-contained event
assert isinstance(result.events[0], StreamStartStep)
assert len(session.messages) == 2
@pytest.mark.asyncio
async def test_emit_end_no_op_when_done(self):
async def test_emit_end_no_op_when_no_new_compaction(self):
tracker = CompactionTracker()
session = _make_session()
tracker.on_compact()
tracker.emit_start_if_ready()
await tracker.emit_end_if_ready(session)
# Second call should be no-op
evts = await tracker.emit_end_if_ready(session)
assert evts == []
result1 = await tracker.emit_end_if_ready(session)
assert result1.just_ended is True
# Second call should be no-op (no new on_compact)
result2 = await tracker.emit_end_if_ready(session)
assert result2.just_ended is False
assert result2.events == []
@pytest.mark.asyncio
async def test_emit_end_no_op_when_nothing_happened(self):
tracker = CompactionTracker()
session = _make_session()
evts = await tracker.emit_end_if_ready(session)
assert evts == []
result = await tracker.emit_end_if_ready(session)
assert result.just_ended is False
assert result.events == []
def test_emit_pre_query(self):
tracker = CompactionTracker()
@@ -246,20 +251,29 @@ class TestCompactionTracker:
tracker._done = True
tracker._start_emitted = True
tracker._tool_call_id = "old"
tracker._transcript_path = "/some/path"
tracker.reset_for_query()
assert tracker._done is False
assert tracker._start_emitted is False
assert tracker._tool_call_id == ""
assert tracker._transcript_path == ""
@pytest.mark.asyncio
async def test_pre_query_blocks_sdk_compaction(self):
"""After pre-query compaction, SDK compaction events are suppressed."""
async def test_pre_query_blocks_sdk_compaction_until_reset(self):
"""After pre-query compaction, SDK compaction is blocked until
reset_for_query is called."""
tracker = CompactionTracker()
session = _make_session()
tracker.emit_pre_query(session)
tracker.on_compact()
# _done is True so emit_start_if_ready is blocked
evts = tracker.emit_start_if_ready()
assert evts == [] # _done blocks it
assert evts == []
# Reset clears _done, allowing subsequent compaction
tracker.reset_for_query()
tracker.on_compact()
evts = tracker.emit_start_if_ready()
assert len(evts) == 3
@pytest.mark.asyncio
async def test_reset_allows_new_compaction(self):
@@ -279,9 +293,9 @@ class TestCompactionTracker:
session = _make_session()
tracker.on_compact()
start_evts = tracker.emit_start_if_ready()
end_evts = await tracker.emit_end_if_ready(session)
result = await tracker.emit_end_if_ready(session)
start_evt = start_evts[1]
end_evt = end_evts[0]
end_evt = result.events[0]
assert isinstance(start_evt, StreamToolInputStart)
assert isinstance(end_evt, StreamToolOutputAvailable)
assert start_evt.toolCallId == end_evt.toolCallId
@@ -289,3 +303,105 @@ class TestCompactionTracker:
tool_calls = session.messages[0].tool_calls
assert tool_calls is not None
assert tool_calls[0]["id"] == start_evt.toolCallId
@pytest.mark.asyncio
async def test_multiple_compactions_within_query(self):
"""Two mid-stream compactions within a single query both trigger."""
tracker = CompactionTracker()
session = _make_session()
# First compaction cycle
tracker.on_compact("/path/1")
tracker.emit_start_if_ready()
result1 = await tracker.emit_end_if_ready(session)
assert result1.just_ended is True
assert len(result1.events) == 2
assert result1.transcript_path == "/path/1"
# Second compaction cycle (should NOT be blocked — _done resets
# because emit_end_if_ready sets it True, but the next on_compact
# + emit_start_if_ready checks !_done which IS True now.
# So we need reset_for_query between queries, but within a single
# query multiple compactions work because _done blocks emit_start
# until the next message arrives, at which point emit_end detects it)
#
# Actually: _done=True blocks emit_start_if_ready, so we need
# the stream loop to reset. In practice service.py doesn't call
# reset between compactions within the same query — let's verify
# the actual behavior.
tracker.on_compact("/path/2")
# _done is True from first compaction, so start is blocked
start_evts = tracker.emit_start_if_ready()
assert start_evts == []
# But emit_end returns no-op because _done is True
result2 = await tracker.emit_end_if_ready(session)
assert result2.just_ended is False
@pytest.mark.asyncio
async def test_multiple_compactions_with_intervening_message(self):
"""Multiple compactions work when the stream loop processes messages between them.
In the real service.py flow:
1. PreCompact fires → on_compact()
2. emit_start shows spinner
3. Next message arrives → emit_end completes compaction (_done=True)
4. Stream continues processing messages...
5. If a second PreCompact fires, _done=True blocks emit_start
6. But the next message triggers emit_end, which sees _done=True → no-op
7. The stream loop needs to detect this and handle accordingly
The actual flow for multiple compactions within a query requires
_done to be cleared between them. The service.py code uses
CompactionResult.just_ended to trigger replace_entries, and _done
stays True until reset_for_query.
"""
tracker = CompactionTracker()
session = _make_session()
# First compaction
tracker.on_compact("/path/1")
tracker.emit_start_if_ready()
result1 = await tracker.emit_end_if_ready(session)
assert result1.just_ended is True
assert result1.transcript_path == "/path/1"
# Simulate reset between queries
tracker.reset_for_query()
# Second compaction in new query
tracker.on_compact("/path/2")
start_evts = tracker.emit_start_if_ready()
assert len(start_evts) == 3
result2 = await tracker.emit_end_if_ready(session)
assert result2.just_ended is True
assert result2.transcript_path == "/path/2"
def test_on_compact_stores_transcript_path(self):
tracker = CompactionTracker()
tracker.on_compact("/some/path.jsonl")
assert tracker._transcript_path == "/some/path.jsonl"
@pytest.mark.asyncio
async def test_emit_end_returns_transcript_path(self):
"""CompactionResult includes the transcript_path from on_compact."""
tracker = CompactionTracker()
session = _make_session()
tracker.on_compact("/my/session.jsonl")
tracker.emit_start_if_ready()
result = await tracker.emit_end_if_ready(session)
assert result.just_ended is True
assert result.transcript_path == "/my/session.jsonl"
# transcript_path is cleared after emit_end
assert tracker._transcript_path == ""
@pytest.mark.asyncio
async def test_emit_end_clears_transcript_path(self):
"""After emit_end, _transcript_path is reset so it doesn't leak to
subsequent non-compaction emit_end calls."""
tracker = CompactionTracker()
session = _make_session()
tracker.on_compact("/first/path.jsonl")
tracker.emit_start_if_ready()
await tracker.emit_end_if_ready(session)
# After compaction, _transcript_path is cleared
assert tracker._transcript_path == ""

View File

@@ -0,0 +1,531 @@
"""End-to-end compaction flow test.
Simulates the full service.py compaction lifecycle using real-format
JSONL session files — no SDK subprocess needed. Exercises:
1. TranscriptBuilder loads a "downloaded" transcript
2. User query appended, assistant response streamed
3. PreCompact hook fires → CompactionTracker.on_compact()
4. Next message → emit_start_if_ready() yields spinner events
5. Message after that → emit_end_if_ready() returns CompactionResult
6. read_compacted_entries() reads the CLI session file
7. TranscriptBuilder.replace_entries() syncs state
8. More messages appended post-compaction
9. to_jsonl() exports full state for upload
10. Fresh builder loads the export — roundtrip verified
"""
import asyncio
from backend.copilot.model import ChatSession
from backend.copilot.response_model import (
StreamFinishStep,
StreamStartStep,
StreamToolInputAvailable,
StreamToolInputStart,
StreamToolOutputAvailable,
)
from backend.copilot.sdk.compaction import CompactionTracker
from backend.copilot.sdk.transcript import (
read_compacted_entries,
strip_progress_entries,
)
from backend.copilot.sdk.transcript_builder import TranscriptBuilder
from backend.util import json
def _make_jsonl(*entries: dict) -> str:
return "\n".join(json.dumps(e) for e in entries) + "\n"
def _run(coro):
"""Run an async coroutine synchronously."""
return asyncio.run(coro)
# ---------------------------------------------------------------------------
# Fixtures: realistic CLI session file content
# ---------------------------------------------------------------------------
# Pre-compaction conversation
USER_1 = {
"type": "user",
"uuid": "u1",
"message": {"role": "user", "content": "What files are in this project?"},
}
ASST_1_THINKING = {
"type": "assistant",
"uuid": "a1-think",
"parentUuid": "u1",
"message": {
"role": "assistant",
"id": "msg_sdk_aaa",
"type": "message",
"content": [{"type": "thinking", "thinking": "Let me look at the files..."}],
"stop_reason": None,
"stop_sequence": None,
},
}
ASST_1_TOOL = {
"type": "assistant",
"uuid": "a1-tool",
"parentUuid": "u1",
"message": {
"role": "assistant",
"id": "msg_sdk_aaa",
"type": "message",
"content": [
{
"type": "tool_use",
"id": "tu1",
"name": "Bash",
"input": {"command": "ls"},
}
],
"stop_reason": "tool_use",
"stop_sequence": None,
},
}
TOOL_RESULT_1 = {
"type": "user",
"uuid": "tr1",
"parentUuid": "a1-tool",
"message": {
"role": "user",
"content": [
{
"type": "tool_result",
"tool_use_id": "tu1",
"content": "file1.py\nfile2.py",
}
],
},
}
ASST_1_TEXT = {
"type": "assistant",
"uuid": "a1-text",
"parentUuid": "tr1",
"message": {
"role": "assistant",
"id": "msg_sdk_bbb",
"type": "message",
"content": [{"type": "text", "text": "I found file1.py and file2.py."}],
"stop_reason": "end_turn",
"stop_sequence": None,
},
}
# Progress entries (should be stripped during upload)
PROGRESS_1 = {
"type": "progress",
"uuid": "prog1",
"parentUuid": "a1-tool",
"data": {"type": "bash_progress", "stdout": "running ls..."},
}
# Second user message
USER_2 = {
"type": "user",
"uuid": "u2",
"parentUuid": "a1-text",
"message": {"role": "user", "content": "Show me file1.py"},
}
ASST_2 = {
"type": "assistant",
"uuid": "a2",
"parentUuid": "u2",
"message": {
"role": "assistant",
"id": "msg_sdk_ccc",
"type": "message",
"content": [{"type": "text", "text": "Here is file1.py content..."}],
"stop_reason": "end_turn",
"stop_sequence": None,
},
}
# --- Compaction summary (written by CLI after context compaction) ---
COMPACT_SUMMARY = {
"type": "summary",
"uuid": "cs1",
"isCompactSummary": True,
"message": {
"role": "user",
"content": (
"Summary: User asked about project files. Found file1.py and file2.py. "
"User then asked to see file1.py."
),
},
}
# Post-compaction assistant response
POST_COMPACT_ASST = {
"type": "assistant",
"uuid": "a3",
"parentUuid": "cs1",
"message": {
"role": "assistant",
"id": "msg_sdk_ddd",
"type": "message",
"content": [{"type": "text", "text": "Here is the content of file1.py..."}],
"stop_reason": "end_turn",
"stop_sequence": None,
},
}
# Post-compaction user follow-up
USER_3 = {
"type": "user",
"uuid": "u3",
"parentUuid": "a3",
"message": {"role": "user", "content": "Now show file2.py"},
}
ASST_3 = {
"type": "assistant",
"uuid": "a4",
"parentUuid": "u3",
"message": {
"role": "assistant",
"id": "msg_sdk_eee",
"type": "message",
"content": [{"type": "text", "text": "Here is file2.py..."}],
"stop_reason": "end_turn",
"stop_sequence": None,
},
}
# ---------------------------------------------------------------------------
# E2E test
# ---------------------------------------------------------------------------
class TestCompactionE2E:
def _write_session_file(self, session_dir, entries):
"""Write a CLI session JSONL file."""
path = session_dir / "session.jsonl"
path.write_text(_make_jsonl(*entries))
return path
def test_full_compaction_lifecycle(self, tmp_path, monkeypatch):
"""Simulate the complete service.py compaction flow.
Timeline:
1. Previous turn uploaded transcript with [USER_1, ASST_1, USER_2, ASST_2]
2. Current turn: download → load_previous
3. User sends "Now show file2.py" → append_user
4. SDK starts streaming response
5. Mid-stream: PreCompact hook fires (context too large)
6. CLI writes compaction summary to session file
7. Next SDK message → emit_start (spinner)
8. Following message → emit_end (CompactionResult)
9. read_compacted_entries reads the session file
10. replace_entries syncs TranscriptBuilder
11. More assistant messages appended
12. Export → upload → next turn downloads it
"""
# --- Setup CLI projects directory ---
config_dir = tmp_path / "config"
projects_dir = config_dir / "projects"
session_dir = projects_dir / "proj"
session_dir.mkdir(parents=True)
monkeypatch.setenv("CLAUDE_CONFIG_DIR", str(config_dir))
# --- Step 1-2: Load "downloaded" transcript from previous turn ---
previous_transcript = _make_jsonl(
USER_1,
ASST_1_THINKING,
ASST_1_TOOL,
TOOL_RESULT_1,
ASST_1_TEXT,
USER_2,
ASST_2,
)
builder = TranscriptBuilder()
builder.load_previous(previous_transcript)
assert builder.entry_count == 7
# --- Step 3: User sends new query ---
builder.append_user("Now show file2.py")
assert builder.entry_count == 8
# --- Step 4: SDK starts streaming ---
builder.append_assistant(
[{"type": "thinking", "thinking": "Let me read file2.py..."}],
model="claude-sonnet-4-20250514",
)
assert builder.entry_count == 9
# --- Step 5-6: PreCompact fires, CLI writes session file ---
session_file = self._write_session_file(
session_dir,
[
USER_1,
ASST_1_THINKING,
ASST_1_TOOL,
PROGRESS_1,
TOOL_RESULT_1,
ASST_1_TEXT,
USER_2,
ASST_2,
COMPACT_SUMMARY,
POST_COMPACT_ASST,
USER_3,
ASST_3,
],
)
# --- Step 7: CompactionTracker receives PreCompact hook ---
tracker = CompactionTracker()
session = ChatSession.new(user_id="test-user")
tracker.on_compact(str(session_file))
# --- Step 8: Next SDK message arrives → emit_start ---
start_events = tracker.emit_start_if_ready()
assert len(start_events) == 3
assert isinstance(start_events[0], StreamStartStep)
assert isinstance(start_events[1], StreamToolInputStart)
assert isinstance(start_events[2], StreamToolInputAvailable)
# Verify tool_call_id is set
tool_call_id = start_events[1].toolCallId
assert tool_call_id.startswith("compaction-")
# --- Step 9: Following message → emit_end ---
result = _run(tracker.emit_end_if_ready(session))
assert result.just_ended is True
assert result.transcript_path == str(session_file)
assert len(result.events) == 2
assert isinstance(result.events[0], StreamToolOutputAvailable)
assert isinstance(result.events[1], StreamFinishStep)
# Verify same tool_call_id
assert result.events[0].toolCallId == tool_call_id
# Session should have compaction messages persisted
assert len(session.messages) == 2
assert session.messages[0].role == "assistant"
assert session.messages[1].role == "tool"
# --- Step 10: read_compacted_entries + replace_entries ---
compacted = read_compacted_entries(str(session_file))
assert compacted is not None
# Should have: COMPACT_SUMMARY + POST_COMPACT_ASST + USER_3 + ASST_3
assert len(compacted) == 4
assert compacted[0]["uuid"] == "cs1"
assert compacted[0]["isCompactSummary"] is True
# Replace builder state with compacted entries
old_count = builder.entry_count
builder.replace_entries(compacted)
assert builder.entry_count == 4 # Only compacted entries
assert builder.entry_count < old_count # Compaction reduced entries
# --- Step 11: More assistant messages after compaction ---
builder.append_assistant(
[{"type": "text", "text": "Here is file2.py:\n\ndef hello():\n pass"}],
model="claude-sonnet-4-20250514",
stop_reason="end_turn",
)
assert builder.entry_count == 5
# --- Step 12: Export for upload ---
output = builder.to_jsonl()
assert output # Not empty
output_entries = [json.loads(line) for line in output.strip().split("\n")]
assert len(output_entries) == 5
# Verify structure:
# [COMPACT_SUMMARY, POST_COMPACT_ASST, USER_3, ASST_3, new_assistant]
assert output_entries[0]["type"] == "summary"
assert output_entries[0].get("isCompactSummary") is True
assert output_entries[0]["uuid"] == "cs1"
assert output_entries[1]["uuid"] == "a3"
assert output_entries[2]["uuid"] == "u3"
assert output_entries[3]["uuid"] == "a4"
assert output_entries[4]["type"] == "assistant"
# Verify parent chain is intact
assert output_entries[1]["parentUuid"] == "cs1" # a3 → cs1
assert output_entries[2]["parentUuid"] == "a3" # u3 → a3
assert output_entries[3]["parentUuid"] == "u3" # a4 → u3
assert output_entries[4]["parentUuid"] == "a4" # new → a4
# --- Step 13: Roundtrip — next turn loads this export ---
builder2 = TranscriptBuilder()
builder2.load_previous(output)
assert builder2.entry_count == 5
# isCompactSummary survives roundtrip
output2 = builder2.to_jsonl()
first_entry = json.loads(output2.strip().split("\n")[0])
assert first_entry.get("isCompactSummary") is True
# Can append more messages
builder2.append_user("What about file3.py?")
assert builder2.entry_count == 6
final_output = builder2.to_jsonl()
last_entry = json.loads(final_output.strip().split("\n")[-1])
assert last_entry["type"] == "user"
# Parented to the last entry from previous turn
assert last_entry["parentUuid"] == output_entries[-1]["uuid"]
def test_double_compaction_within_session(self, tmp_path, monkeypatch):
"""Two compactions in the same session (across reset_for_query)."""
config_dir = tmp_path / "config"
projects_dir = config_dir / "projects"
session_dir = projects_dir / "proj"
session_dir.mkdir(parents=True)
monkeypatch.setenv("CLAUDE_CONFIG_DIR", str(config_dir))
tracker = CompactionTracker()
session = ChatSession.new(user_id="test")
builder = TranscriptBuilder()
# --- First query with compaction ---
builder.append_user("first question")
builder.append_assistant([{"type": "text", "text": "first answer"}])
# Write session file for first compaction
first_summary = {
"type": "summary",
"uuid": "cs-first",
"isCompactSummary": True,
"message": {"role": "user", "content": "First compaction summary"},
}
first_post = {
"type": "assistant",
"uuid": "a-first",
"parentUuid": "cs-first",
"message": {"role": "assistant", "content": "first post-compact"},
}
file1 = session_dir / "session1.jsonl"
file1.write_text(_make_jsonl(first_summary, first_post))
tracker.on_compact(str(file1))
tracker.emit_start_if_ready()
result1 = _run(tracker.emit_end_if_ready(session))
assert result1.just_ended is True
compacted1 = read_compacted_entries(str(file1))
assert compacted1 is not None
builder.replace_entries(compacted1)
assert builder.entry_count == 2
# --- Reset for second query ---
tracker.reset_for_query()
# --- Second query with compaction ---
builder.append_user("second question")
builder.append_assistant([{"type": "text", "text": "second answer"}])
second_summary = {
"type": "summary",
"uuid": "cs-second",
"isCompactSummary": True,
"message": {"role": "user", "content": "Second compaction summary"},
}
second_post = {
"type": "assistant",
"uuid": "a-second",
"parentUuid": "cs-second",
"message": {"role": "assistant", "content": "second post-compact"},
}
file2 = session_dir / "session2.jsonl"
file2.write_text(_make_jsonl(second_summary, second_post))
tracker.on_compact(str(file2))
tracker.emit_start_if_ready()
result2 = _run(tracker.emit_end_if_ready(session))
assert result2.just_ended is True
compacted2 = read_compacted_entries(str(file2))
assert compacted2 is not None
builder.replace_entries(compacted2)
assert builder.entry_count == 2 # Only second compaction entries
# Export and verify
output = builder.to_jsonl()
entries = [json.loads(line) for line in output.strip().split("\n")]
assert entries[0]["uuid"] == "cs-second"
assert entries[0].get("isCompactSummary") is True
def test_strip_progress_then_load_then_compact_roundtrip(
self, tmp_path, monkeypatch
):
"""Full pipeline: strip → load → compact → replace → export → reload.
This tests the exact sequence that happens across two turns:
Turn 1: SDK produces transcript with progress entries
Upload: strip_progress_entries removes progress, upload to cloud
Turn 2: Download → load_previous → compaction fires → replace → export
Turn 3: Download the Turn 2 export → load_previous (roundtrip)
"""
config_dir = tmp_path / "config"
projects_dir = config_dir / "projects"
session_dir = projects_dir / "proj"
session_dir.mkdir(parents=True)
monkeypatch.setenv("CLAUDE_CONFIG_DIR", str(config_dir))
# --- Turn 1: SDK produces raw transcript ---
raw_content = _make_jsonl(
USER_1,
ASST_1_THINKING,
ASST_1_TOOL,
PROGRESS_1,
TOOL_RESULT_1,
ASST_1_TEXT,
USER_2,
ASST_2,
)
# Strip progress for upload
stripped = strip_progress_entries(raw_content)
stripped_entries = [
json.loads(line) for line in stripped.strip().split("\n") if line.strip()
]
# Progress should be gone
assert not any(e.get("type") == "progress" for e in stripped_entries)
assert len(stripped_entries) == 7 # 8 - 1 progress
# --- Turn 2: Download stripped, load, compaction happens ---
builder = TranscriptBuilder()
builder.load_previous(stripped)
assert builder.entry_count == 7
builder.append_user("Now show file2.py")
builder.append_assistant(
[{"type": "text", "text": "Reading file2.py..."}],
model="claude-sonnet-4-20250514",
)
# CLI writes session file with compaction
session_file = self._write_session_file(
session_dir,
[
USER_1,
ASST_1_TOOL,
TOOL_RESULT_1,
ASST_1_TEXT,
USER_2,
ASST_2,
COMPACT_SUMMARY,
POST_COMPACT_ASST,
],
)
compacted = read_compacted_entries(str(session_file))
assert compacted is not None
builder.replace_entries(compacted)
# Append post-compaction message
builder.append_user("Thanks!")
output = builder.to_jsonl()
# --- Turn 3: Fresh load of Turn 2 export ---
builder3 = TranscriptBuilder()
builder3.load_previous(output)
# Should have: compact_summary + post_compact_asst + "Thanks!"
assert builder3.entry_count == 3
# Compact summary survived the full pipeline
first = json.loads(builder3.to_jsonl().strip().split("\n")[0])
assert first.get("isCompactSummary") is True
assert first["type"] == "summary"

View File

@@ -41,12 +41,20 @@ from typing import Any
from backend.copilot.context import (
get_current_sandbox,
get_sdk_cwd,
get_workspace_manager,
is_allowed_local_path,
resolve_sandbox_path,
)
from backend.copilot.model import ChatSession
from backend.copilot.tools.workspace_files import get_manager
from backend.util.file import parse_workspace_uri
from backend.util.file_content_parser import (
BINARY_FORMATS,
MIME_TO_FORMAT,
PARSE_EXCEPTIONS,
infer_format_from_uri,
parse_file_content,
)
from backend.util.type import MediaFileType
class FileRefExpansionError(Exception):
@@ -74,6 +82,8 @@ _FILE_REF_RE = re.compile(
_MAX_EXPAND_CHARS = 200_000
# Maximum total characters across all @@agptfile: expansions in one string.
_MAX_TOTAL_EXPAND_CHARS = 1_000_000
# Maximum raw byte size for bare ref structured parsing (10 MB).
_MAX_BARE_REF_BYTES = 10_000_000
@dataclass
@@ -83,6 +93,11 @@ class FileRef:
end_line: int | None # 1-indexed, inclusive
# ---------------------------------------------------------------------------
# Public API (top-down: main functions first, helpers below)
# ---------------------------------------------------------------------------
def parse_file_ref(text: str) -> FileRef | None:
"""Return a :class:`FileRef` if *text* is a bare file reference token.
@@ -104,17 +119,6 @@ def parse_file_ref(text: str) -> FileRef | None:
return FileRef(uri=m.group(1), start_line=start, end_line=end)
def _apply_line_range(text: str, start: int | None, end: int | None) -> str:
"""Slice *text* to the requested 1-indexed line range (inclusive)."""
if start is None and end is None:
return text
lines = text.splitlines(keepends=True)
s = (start - 1) if start is not None else 0
e = end if end is not None else len(lines)
selected = list(itertools.islice(lines, s, e))
return "".join(selected)
async def read_file_bytes(
uri: str,
user_id: str | None,
@@ -130,27 +134,47 @@ async def read_file_bytes(
if plain.startswith("workspace://"):
if not user_id:
raise ValueError("workspace:// file references require authentication")
manager = await get_manager(user_id, session.session_id)
manager = await get_workspace_manager(user_id, session.session_id)
ws = parse_workspace_uri(plain)
try:
return await (
data = await (
manager.read_file(ws.file_ref)
if ws.is_path
else manager.read_file_by_id(ws.file_ref)
)
except FileNotFoundError:
raise ValueError(f"File not found: {plain}")
except Exception as exc:
except (PermissionError, OSError) as exc:
raise ValueError(f"Failed to read {plain}: {exc}") from exc
except (AttributeError, TypeError, RuntimeError) as exc:
# AttributeError/TypeError: workspace manager returned an
# unexpected type or interface; RuntimeError: async runtime issues.
logger.warning("Unexpected error reading %s: %s", plain, exc)
raise ValueError(f"Failed to read {plain}: {exc}") from exc
# NOTE: Workspace API does not support pre-read size checks;
# the full file is loaded before the size guard below.
if len(data) > _MAX_BARE_REF_BYTES:
raise ValueError(
f"File too large ({len(data)} bytes, limit {_MAX_BARE_REF_BYTES})"
)
return data
if is_allowed_local_path(plain, get_sdk_cwd()):
resolved = os.path.realpath(os.path.expanduser(plain))
try:
# Read with a one-byte overshoot to detect files that exceed the limit
# without a separate os.path.getsize call (avoids TOCTOU race).
with open(resolved, "rb") as fh:
return fh.read()
data = fh.read(_MAX_BARE_REF_BYTES + 1)
if len(data) > _MAX_BARE_REF_BYTES:
raise ValueError(
f"File too large (>{_MAX_BARE_REF_BYTES} bytes, "
f"limit {_MAX_BARE_REF_BYTES})"
)
return data
except FileNotFoundError:
raise ValueError(f"File not found: {plain}")
except Exception as exc:
except OSError as exc:
raise ValueError(f"Failed to read {plain}: {exc}") from exc
sandbox = get_current_sandbox()
@@ -162,9 +186,33 @@ async def read_file_bytes(
f"Path is not allowed (not in workspace, sdk_cwd, or sandbox): {plain}"
) from exc
try:
return bytes(await sandbox.files.read(remote, format="bytes"))
except Exception as exc:
data = bytes(await sandbox.files.read(remote, format="bytes"))
except (FileNotFoundError, OSError, UnicodeDecodeError) as exc:
raise ValueError(f"Failed to read from sandbox: {plain}: {exc}") from exc
except Exception as exc:
# E2B SDK raises SandboxException subclasses (NotFoundException,
# TimeoutException, NotEnoughSpaceException, etc.) which don't
# inherit from standard exceptions. Import lazily to avoid a
# hard dependency on e2b at module level.
try:
from e2b.exceptions import SandboxException # noqa: PLC0415
if isinstance(exc, SandboxException):
raise ValueError(
f"Failed to read from sandbox: {plain}: {exc}"
) from exc
except ImportError:
pass
# Re-raise unexpected exceptions (TypeError, AttributeError, etc.)
# so they surface as real bugs rather than being silently masked.
raise
# NOTE: E2B sandbox API does not support pre-read size checks;
# the full file is loaded before the size guard below.
if len(data) > _MAX_BARE_REF_BYTES:
raise ValueError(
f"File too large ({len(data)} bytes, limit {_MAX_BARE_REF_BYTES})"
)
return data
raise ValueError(
f"Path is not allowed (not in workspace, sdk_cwd, or sandbox): {plain}"
@@ -178,15 +226,13 @@ async def resolve_file_ref(
) -> str:
"""Resolve a :class:`FileRef` to its text content."""
raw = await read_file_bytes(ref.uri, user_id, session)
return _apply_line_range(
raw.decode("utf-8", errors="replace"), ref.start_line, ref.end_line
)
return _apply_line_range(_to_str(raw), ref.start_line, ref.end_line)
async def expand_file_refs_in_string(
text: str,
user_id: str | None,
session: "ChatSession",
session: ChatSession,
*,
raise_on_error: bool = False,
) -> str:
@@ -232,6 +278,9 @@ async def expand_file_refs_in_string(
if len(content) > _MAX_EXPAND_CHARS:
content = content[:_MAX_EXPAND_CHARS] + "\n... [truncated]"
remaining = _MAX_TOTAL_EXPAND_CHARS - total_chars
# remaining == 0 means the budget was exactly exhausted by the
# previous ref. The elif below (len > remaining) won't catch
# this since 0 > 0 is false, so we need the <= 0 check.
if remaining <= 0:
content = "[file-ref budget exhausted: total expansion limit reached]"
elif len(content) > remaining:
@@ -252,13 +301,31 @@ async def expand_file_refs_in_string(
async def expand_file_refs_in_args(
args: dict[str, Any],
user_id: str | None,
session: "ChatSession",
session: ChatSession,
*,
input_schema: dict[str, Any] | None = None,
) -> dict[str, Any]:
"""Recursively expand ``@@agptfile:...`` references in tool call arguments.
String values are expanded in-place. Nested dicts and lists are
traversed. Non-string scalars are returned unchanged.
**Bare references** (the entire argument value is a single
``@@agptfile:...`` token with no surrounding text) are resolved and then
parsed according to the file's extension or MIME type. See
:mod:`backend.util.file_content_parser` for the full list of supported
formats (JSON, JSONL, CSV, TSV, YAML, TOML, Parquet, Excel).
When *input_schema* is provided and the target property has
``"type": "string"``, structured parsing is skipped — the raw file content
is returned as a plain string so blocks receive the original text.
If the format is unrecognised or parsing fails, the content is returned as
a plain string (the fallback).
**Embedded references** (``@@agptfile:`` mixed with other text) always
produce a plain string — structured parsing only applies to bare refs.
Raises :class:`FileRefExpansionError` if any reference fails to resolve,
so the tool is *not* executed with an error string as its input. The
caller (the MCP tool wrapper) should convert this into an MCP error
@@ -267,15 +334,382 @@ async def expand_file_refs_in_args(
if not args:
return args
async def _expand(value: Any) -> Any:
properties = (input_schema or {}).get("properties", {})
async def _expand(
value: Any,
*,
prop_schema: dict[str, Any] | None = None,
) -> Any:
"""Recursively expand a single argument value.
Strings are checked for ``@@agptfile:`` references and expanded
(bare refs get structured parsing; embedded refs get inline
substitution). Dicts and lists are traversed recursively,
threading the corresponding sub-schema from *prop_schema* so
that nested fields also receive correct type-aware expansion.
Non-string scalars pass through unchanged.
"""
if isinstance(value, str):
ref = parse_file_ref(value)
if ref is not None:
# MediaFileType fields: return the raw URI immediately —
# no file reading, no format inference, no content parsing.
if _is_media_file_field(prop_schema):
return ref.uri
fmt = infer_format_from_uri(ref.uri)
# Workspace URIs by ID (workspace://abc123) have no extension.
# When the MIME fragment is also missing, fall back to the
# workspace file manager's metadata for format detection.
if fmt is None and ref.uri.startswith("workspace://"):
fmt = await _infer_format_from_workspace(ref.uri, user_id, session)
return await _expand_bare_ref(ref, fmt, user_id, session, prop_schema)
# Not a bare ref — do normal inline expansion.
return await expand_file_refs_in_string(
value, user_id, session, raise_on_error=True
)
if isinstance(value, dict):
return {k: await _expand(v) for k, v in value.items()}
# When the schema says this is an object but doesn't define
# inner properties, skip expansion — the caller (e.g.
# RunBlockTool) will expand with the actual nested schema.
if (
prop_schema is not None
and prop_schema.get("type") == "object"
and "properties" not in prop_schema
):
return value
nested_props = (prop_schema or {}).get("properties", {})
return {
k: await _expand(v, prop_schema=nested_props.get(k))
for k, v in value.items()
}
if isinstance(value, list):
return [await _expand(item) for item in value]
items_schema = (prop_schema or {}).get("items")
return [await _expand(item, prop_schema=items_schema) for item in value]
return value
return {k: await _expand(v) for k, v in args.items()}
return {k: await _expand(v, prop_schema=properties.get(k)) for k, v in args.items()}
# ---------------------------------------------------------------------------
# Private helpers (used by the public functions above)
# ---------------------------------------------------------------------------
def _apply_line_range(text: str, start: int | None, end: int | None) -> str:
"""Slice *text* to the requested 1-indexed line range (inclusive).
When the requested range extends beyond the file, a note is appended
so the LLM knows it received the entire remaining content.
"""
if start is None and end is None:
return text
lines = text.splitlines(keepends=True)
total = len(lines)
s = (start - 1) if start is not None else 0
e = end if end is not None else total
selected = list(itertools.islice(lines, s, e))
result = "".join(selected)
if end is not None and end > total:
result += f"\n[Note: file has only {total} lines]\n"
return result
def _to_str(content: str | bytes) -> str:
"""Decode *content* to a string if it is bytes, otherwise return as-is."""
if isinstance(content, str):
return content
return content.decode("utf-8", errors="replace")
def _check_content_size(content: str | bytes) -> None:
"""Raise :class:`ValueError` if *content* exceeds the byte limit.
Raises ``ValueError`` (not ``FileRefExpansionError``) so that the caller
(``_expand_bare_ref``) can unify all resolution errors into a single
``except ValueError`` → ``FileRefExpansionError`` handler, keeping the
error-flow consistent with ``read_file_bytes`` and ``resolve_file_ref``.
For ``bytes``, the length is the byte count directly. For ``str``,
we encode to UTF-8 first because multi-byte characters (e.g. emoji)
mean the byte size can be up to 4x the character count.
"""
if isinstance(content, bytes):
size = len(content)
else:
char_len = len(content)
# Fast lower bound: UTF-8 byte count >= char count.
# If char count already exceeds the limit, reject immediately
# without allocating an encoded copy.
if char_len > _MAX_BARE_REF_BYTES:
size = char_len # real byte size is even larger
# Fast upper bound: each char is at most 4 UTF-8 bytes.
# If worst-case is still under the limit, skip encoding entirely.
elif char_len * 4 <= _MAX_BARE_REF_BYTES:
return
else:
# Edge case: char count is under limit but multibyte chars
# might push byte count over. Encode to get exact size.
size = len(content.encode("utf-8"))
if size > _MAX_BARE_REF_BYTES:
raise ValueError(
f"File too large for structured parsing "
f"({size} bytes, limit {_MAX_BARE_REF_BYTES})"
)
async def _infer_format_from_workspace(
uri: str,
user_id: str | None,
session: ChatSession,
) -> str | None:
"""Look up workspace file metadata to infer the format.
Workspace URIs by ID (``workspace://abc123``) have no file extension.
When the MIME fragment is also absent, we query the workspace file
manager for the file's stored MIME type and original filename.
"""
if not user_id:
return None
try:
ws = parse_workspace_uri(uri)
manager = await get_workspace_manager(user_id, session.session_id)
info = await (
manager.get_file_info(ws.file_ref)
if not ws.is_path
else manager.get_file_info_by_path(ws.file_ref)
)
if info is None:
return None
# Try MIME type first, then filename extension.
mime = (info.mime_type or "").split(";", 1)[0].strip().lower()
return MIME_TO_FORMAT.get(mime) or infer_format_from_uri(info.name)
except (
ValueError,
FileNotFoundError,
OSError,
PermissionError,
AttributeError,
TypeError,
):
# Expected failures: bad URI, missing file, permission denied, or
# workspace manager returning unexpected types. Propagate anything
# else (e.g. programming errors) so they don't get silently swallowed.
logger.debug("workspace metadata lookup failed for %s", uri, exc_info=True)
return None
def _is_media_file_field(prop_schema: dict[str, Any] | None) -> bool:
"""Return True if *prop_schema* describes a MediaFileType field (format: file)."""
if prop_schema is None:
return False
return (
prop_schema.get("type") == "string"
and prop_schema.get("format") == MediaFileType.string_format
)
async def _expand_bare_ref(
ref: FileRef,
fmt: str | None,
user_id: str | None,
session: ChatSession,
prop_schema: dict[str, Any] | None,
) -> Any:
"""Resolve and parse a bare ``@@agptfile:`` reference.
This is the structured-parsing path: the file is read, optionally parsed
according to *fmt*, and adapted to the target *prop_schema*.
Raises :class:`FileRefExpansionError` on resolution or parsing failure.
Note: MediaFileType fields (format: "file") are handled earlier in
``_expand`` to avoid unnecessary format inference and file I/O.
"""
try:
if fmt is not None and fmt in BINARY_FORMATS:
# Binary formats need raw bytes, not UTF-8 text.
# Line ranges are meaningless for binary formats (parquet/xlsx)
# — ignore them and parse full bytes. Warn so the caller/model
# knows the range was silently dropped.
if ref.start_line is not None or ref.end_line is not None:
logger.warning(
"Line range [%s-%s] ignored for binary format %s (%s); "
"binary formats are always parsed in full.",
ref.start_line,
ref.end_line,
fmt,
ref.uri,
)
content: str | bytes = await read_file_bytes(ref.uri, user_id, session)
else:
content = await resolve_file_ref(ref, user_id, session)
except ValueError as exc:
raise FileRefExpansionError(str(exc)) from exc
# For known formats this rejects files >10 MB before parsing.
# For unknown formats _MAX_EXPAND_CHARS (200K chars) below is stricter,
# but this check still guards the parsing path which has no char limit.
# _check_content_size raises ValueError, which we unify here just like
# resolution errors above.
try:
_check_content_size(content)
except ValueError as exc:
raise FileRefExpansionError(str(exc)) from exc
# When the schema declares this parameter as "string",
# return raw file content — don't parse into a structured
# type that would need json.dumps() serialisation.
expect_string = (prop_schema or {}).get("type") == "string"
if expect_string:
if isinstance(content, bytes):
raise FileRefExpansionError(
f"Cannot use {fmt} file as text input: "
f"binary formats (parquet, xlsx) must be passed "
f"to a block that accepts structured data (list/object), "
f"not a string-typed parameter."
)
return content
if fmt is not None:
# Use strict mode for binary formats so we surface the
# actual error (e.g. missing pyarrow/openpyxl, corrupt
# file) instead of silently returning garbled bytes.
strict = fmt in BINARY_FORMATS
try:
parsed = parse_file_content(content, fmt, strict=strict)
except PARSE_EXCEPTIONS as exc:
raise FileRefExpansionError(f"Failed to parse {fmt} file: {exc}") from exc
# Normalize bytes fallback to str so tools never
# receive raw bytes when parsing fails.
if isinstance(parsed, bytes):
parsed = _to_str(parsed)
return _adapt_to_schema(parsed, prop_schema)
# Unknown format — return as plain string, but apply
# the same per-ref character limit used by inline refs
# to prevent injecting unexpectedly large content.
text = _to_str(content)
if len(text) > _MAX_EXPAND_CHARS:
text = text[:_MAX_EXPAND_CHARS] + "\n... [truncated]"
return text
def _adapt_to_schema(parsed: Any, prop_schema: dict[str, Any] | None) -> Any:
"""Adapt a parsed file value to better fit the target schema type.
When the parser returns a natural type (e.g. dict from YAML, list from CSV)
that doesn't match the block's expected type, this function converts it to
a more useful representation instead of relying on pydantic's generic
coercion (which can produce awkward results like flattened dicts → lists).
Returns *parsed* unchanged when no adaptation is needed.
"""
if prop_schema is None:
return parsed
target_type = prop_schema.get("type")
# Dict → array: delegate to helper.
if isinstance(parsed, dict) and target_type == "array":
return _adapt_dict_to_array(parsed, prop_schema)
# List → object: delegate to helper (raises for non-tabular lists).
if isinstance(parsed, list) and target_type == "object":
return _adapt_list_to_object(parsed)
# Tabular list → Any (no type): convert to list of dicts.
# Blocks like FindInDictionaryBlock have `input: Any` which produces
# a schema with no "type" key. Tabular [[header],[rows]] is unusable
# for key lookup, but [{col: val}, ...] works with FindInDict's
# list-of-dicts branch (line 195-199 in data_manipulation.py).
if isinstance(parsed, list) and target_type is None and _is_tabular(parsed):
return _tabular_to_list_of_dicts(parsed)
return parsed
def _adapt_dict_to_array(parsed: dict, prop_schema: dict[str, Any]) -> Any:
"""Adapt a parsed dict to an array-typed field.
Extracts list-valued entries when the target item type is ``array``,
passes through unchanged when item type is ``string`` (lets pydantic error),
or wraps in ``[parsed]`` as a fallback.
"""
items_type = (prop_schema.get("items") or {}).get("type")
if items_type == "array":
# Target is List[List[Any]] — extract list-typed values from the
# dict as inner lists. E.g. YAML {"fruits": [{...},...]}} with
# ConcatenateLists (List[List[Any]]) → [[{...},...]].
list_values = [v for v in parsed.values() if isinstance(v, list)]
if list_values:
return list_values
if items_type == "string":
# Target is List[str] — wrapping a dict would give [dict]
# which can't coerce to strings. Return unchanged and let
# pydantic surface a clear validation error.
return parsed
# Fallback: wrap in a single-element list so the block gets [dict]
# instead of pydantic flattening keys/values into a flat list.
return [parsed]
def _adapt_list_to_object(parsed: list) -> Any:
"""Adapt a parsed list to an object-typed field.
Converts tabular lists to column-dicts; raises for non-tabular lists.
"""
if _is_tabular(parsed):
return _tabular_to_column_dict(parsed)
# Non-tabular list (e.g. a plain Python list from a YAML file) cannot
# be meaningfully coerced to an object. Raise explicitly so callers
# get a clear error rather than pydantic silently wrapping the list.
raise FileRefExpansionError(
"Cannot adapt a non-tabular list to an object-typed field. "
"Expected a tabular structure ([[header], [row1], ...]) or a dict."
)
def _is_tabular(parsed: Any) -> bool:
"""Check if parsed data is in tabular format: [[header], [row1], ...].
Uses isinstance checks because this is a structural type guard on
opaque parser output (Any), not duck typing. A Protocol wouldn't
help here — we need to verify exact list-of-lists shape.
"""
if not isinstance(parsed, list) or len(parsed) < 2:
return False
header = parsed[0]
if not isinstance(header, list) or not header:
return False
if not all(isinstance(h, str) for h in header):
return False
return all(isinstance(row, list) for row in parsed[1:])
def _tabular_to_list_of_dicts(parsed: list) -> list[dict[str, Any]]:
"""Convert [[header], [row1], ...] → [{header[0]: row[0], ...}, ...].
Ragged rows (fewer columns than the header) get None for missing values.
Extra values beyond the header length are silently dropped.
"""
header = parsed[0]
return [
dict(itertools.zip_longest(header, row[: len(header)], fillvalue=None))
for row in parsed[1:]
]
def _tabular_to_column_dict(parsed: list) -> dict[str, list]:
"""Convert [[header], [row1], ...] → {"col1": [val1, ...], ...}.
Ragged rows (fewer columns than the header) get None for missing values,
ensuring all columns have equal length.
"""
header = parsed[0]
return {
col: [row[i] if i < len(row) else None for row in parsed[1:]]
for i, col in enumerate(header)
}

View File

@@ -175,6 +175,199 @@ async def test_expand_args_replaces_file_ref_in_nested_dict():
assert result["count"] == 42
# ---------------------------------------------------------------------------
# expand_file_refs_in_args — bare ref structured parsing
# ---------------------------------------------------------------------------
@pytest.mark.asyncio
async def test_bare_ref_json_returns_parsed_dict():
"""Bare ref to a .json file returns parsed dict, not raw string."""
with tempfile.TemporaryDirectory() as sdk_cwd:
json_file = os.path.join(sdk_cwd, "data.json")
with open(json_file, "w") as f:
f.write('{"key": "value", "count": 42}')
with patch("backend.copilot.context._current_sdk_cwd") as mock_cwd_var:
mock_cwd_var.get.return_value = sdk_cwd
result = await expand_file_refs_in_args(
{"data": f"@@agptfile:{json_file}"},
user_id="u1",
session=_make_session(),
)
assert result["data"] == {"key": "value", "count": 42}
@pytest.mark.asyncio
async def test_bare_ref_csv_returns_parsed_table():
"""Bare ref to a .csv file returns list[list[str]] table."""
with tempfile.TemporaryDirectory() as sdk_cwd:
csv_file = os.path.join(sdk_cwd, "data.csv")
with open(csv_file, "w") as f:
f.write("Name,Score\nAlice,90\nBob,85")
with patch("backend.copilot.context._current_sdk_cwd") as mock_cwd_var:
mock_cwd_var.get.return_value = sdk_cwd
result = await expand_file_refs_in_args(
{"input": f"@@agptfile:{csv_file}"},
user_id="u1",
session=_make_session(),
)
assert result["input"] == [
["Name", "Score"],
["Alice", "90"],
["Bob", "85"],
]
@pytest.mark.asyncio
async def test_bare_ref_unknown_extension_returns_string():
"""Bare ref to a file with unknown extension returns plain string."""
with tempfile.TemporaryDirectory() as sdk_cwd:
txt_file = os.path.join(sdk_cwd, "readme.txt")
with open(txt_file, "w") as f:
f.write("plain text content")
with patch("backend.copilot.context._current_sdk_cwd") as mock_cwd_var:
mock_cwd_var.get.return_value = sdk_cwd
result = await expand_file_refs_in_args(
{"data": f"@@agptfile:{txt_file}"},
user_id="u1",
session=_make_session(),
)
assert result["data"] == "plain text content"
assert isinstance(result["data"], str)
@pytest.mark.asyncio
async def test_bare_ref_invalid_json_falls_back_to_string():
"""Bare ref to a .json file with invalid JSON falls back to string."""
with tempfile.TemporaryDirectory() as sdk_cwd:
json_file = os.path.join(sdk_cwd, "bad.json")
with open(json_file, "w") as f:
f.write("not valid json {{{")
with patch("backend.copilot.context._current_sdk_cwd") as mock_cwd_var:
mock_cwd_var.get.return_value = sdk_cwd
result = await expand_file_refs_in_args(
{"data": f"@@agptfile:{json_file}"},
user_id="u1",
session=_make_session(),
)
assert result["data"] == "not valid json {{{"
assert isinstance(result["data"], str)
@pytest.mark.asyncio
async def test_embedded_ref_always_returns_string_even_for_json():
"""Embedded ref (text around it) returns plain string, not parsed JSON."""
with tempfile.TemporaryDirectory() as sdk_cwd:
json_file = os.path.join(sdk_cwd, "data.json")
with open(json_file, "w") as f:
f.write('{"key": "value"}')
with patch("backend.copilot.context._current_sdk_cwd") as mock_cwd_var:
mock_cwd_var.get.return_value = sdk_cwd
result = await expand_file_refs_in_args(
{"data": f"prefix @@agptfile:{json_file} suffix"},
user_id="u1",
session=_make_session(),
)
assert isinstance(result["data"], str)
assert result["data"].startswith("prefix ")
assert result["data"].endswith(" suffix")
@pytest.mark.asyncio
async def test_bare_ref_yaml_returns_parsed_dict():
"""Bare ref to a .yaml file returns parsed dict."""
with tempfile.TemporaryDirectory() as sdk_cwd:
yaml_file = os.path.join(sdk_cwd, "config.yaml")
with open(yaml_file, "w") as f:
f.write("name: test\ncount: 42\n")
with patch("backend.copilot.context._current_sdk_cwd") as mock_cwd_var:
mock_cwd_var.get.return_value = sdk_cwd
result = await expand_file_refs_in_args(
{"config": f"@@agptfile:{yaml_file}"},
user_id="u1",
session=_make_session(),
)
assert result["config"] == {"name": "test", "count": 42}
@pytest.mark.asyncio
async def test_bare_ref_binary_with_line_range_ignores_range():
"""Bare ref to a binary file (.parquet) with line range parses the full file.
Binary formats (parquet, xlsx) ignore line ranges — the full content is
parsed and the range is silently dropped with a log warning.
"""
try:
import pandas as pd
except ImportError:
pytest.skip("pandas not installed")
try:
import pyarrow # noqa: F401 # pyright: ignore[reportMissingImports]
except ImportError:
pytest.skip("pyarrow not installed")
with tempfile.TemporaryDirectory() as sdk_cwd:
parquet_file = os.path.join(sdk_cwd, "data.parquet")
import io as _io
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
buf = _io.BytesIO()
df.to_parquet(buf, index=False)
with open(parquet_file, "wb") as f:
f.write(buf.getvalue())
with patch("backend.copilot.context._current_sdk_cwd") as mock_cwd_var:
mock_cwd_var.get.return_value = sdk_cwd
# Line range [1-2] should be silently ignored for binary formats.
result = await expand_file_refs_in_args(
{"data": f"@@agptfile:{parquet_file}[1-2]"},
user_id="u1",
session=_make_session(),
)
# Full file is returned despite the line range.
assert result["data"] == [["A", "B"], [1, 4], [2, 5], [3, 6]]
@pytest.mark.asyncio
async def test_bare_ref_toml_returns_parsed_dict():
"""Bare ref to a .toml file returns parsed dict."""
with tempfile.TemporaryDirectory() as sdk_cwd:
toml_file = os.path.join(sdk_cwd, "config.toml")
with open(toml_file, "w") as f:
f.write('name = "test"\ncount = 42\n')
with patch("backend.copilot.context._current_sdk_cwd") as mock_cwd_var:
mock_cwd_var.get.return_value = sdk_cwd
result = await expand_file_refs_in_args(
{"config": f"@@agptfile:{toml_file}"},
user_id="u1",
session=_make_session(),
)
assert result["config"] == {"name": "test", "count": 42}
# ---------------------------------------------------------------------------
# _read_file_handler — extended to accept workspace:// and local paths
# ---------------------------------------------------------------------------
@@ -219,7 +412,7 @@ async def test_read_file_handler_workspace_uri():
"backend.copilot.sdk.tool_adapter.get_execution_context",
return_value=("user-1", mock_session),
), patch(
"backend.copilot.sdk.file_ref.get_manager",
"backend.copilot.sdk.file_ref.get_workspace_manager",
new=AsyncMock(return_value=mock_manager),
):
result = await _read_file_handler(
@@ -276,7 +469,7 @@ async def test_read_file_bytes_workspace_virtual_path():
mock_manager.read_file.return_value = b"virtual path content"
with patch(
"backend.copilot.sdk.file_ref.get_manager",
"backend.copilot.sdk.file_ref.get_workspace_manager",
new=AsyncMock(return_value=mock_manager),
):
result = await read_file_bytes("workspace:///reports/q1.md", "user-1", session)

File diff suppressed because it is too large Load Diff

View File

@@ -20,7 +20,24 @@ Use these URLs directly without asking the user:
| Cloudflare | `https://mcp.cloudflare.com/mcp` |
| Atlassian / Jira | `https://mcp.atlassian.com/mcp` |
For other services, search the MCP registry at https://registry.modelcontextprotocol.io/.
For other services, search the MCP registry API:
```http
GET https://registry.modelcontextprotocol.io/v0/servers?q=<search_term>
```
Each result includes a `remotes` array with the exact server URL to use.
### Important: Check blocks first
Before using `run_mcp_tool`, always check if the platform already has blocks for the service
using `find_block`. The platform has hundreds of built-in blocks (Google Sheets, Google Docs,
Google Calendar, Gmail, etc.) that work without MCP setup.
Only use `run_mcp_tool` when:
- The service is in the known hosted MCP servers list above, OR
- You searched `find_block` first and found no matching blocks
**Never guess or construct MCP server URLs.** Only use URLs from the known servers list above
or from the `remotes[].url` field in MCP registry search results.
### Authentication

View File

@@ -127,7 +127,7 @@ def create_security_hooks(
user_id: str | None,
sdk_cwd: str | None = None,
max_subtasks: int = 3,
on_compact: Callable[[], None] | None = None,
on_compact: Callable[[str], None] | None = None,
) -> dict[str, Any]:
"""Create the security hooks configuration for Claude Agent SDK.
@@ -142,6 +142,7 @@ def create_security_hooks(
sdk_cwd: SDK working directory for workspace-scoped tool validation
max_subtasks: Maximum concurrent Task (sub-agent) spawns allowed per session
on_compact: Callback invoked when SDK starts compacting context.
Receives the transcript_path from the hook input.
Returns:
Hooks configuration dict for ClaudeAgentOptions
@@ -301,11 +302,21 @@ def create_security_hooks(
"""
_ = context, tool_use_id
trigger = input_data.get("trigger", "auto")
# Sanitize untrusted input before logging to prevent log injection
transcript_path = (
str(input_data.get("transcript_path", ""))
.replace("\n", "")
.replace("\r", "")
)
logger.info(
f"[SDK] Context compaction triggered: {trigger}, user={user_id}"
"[SDK] Context compaction triggered: %s, user=%s, "
"transcript_path=%s",
trigger,
user_id,
transcript_path,
)
if on_compact is not None:
on_compact()
on_compact(transcript_path)
return cast(SyncHookJSONOutput, {})
hooks: dict[str, Any] = {

View File

@@ -29,6 +29,7 @@ from langfuse import propagate_attributes
from langsmith.integrations.claude_agent_sdk import configure_claude_agent_sdk
from pydantic import BaseModel
from backend.copilot.context import get_workspace_manager
from backend.data.redis_client import get_redis_async
from backend.executor.cluster_lock import AsyncClusterLock
from backend.util.exceptions import NotFoundError
@@ -62,7 +63,6 @@ from ..service import (
)
from ..tools.e2b_sandbox import get_or_create_sandbox, pause_sandbox_direct
from ..tools.sandbox import WORKSPACE_PREFIX, make_session_path
from ..tools.workspace_files import get_manager
from ..tracking import track_user_message
from .compaction import CompactionTracker, filter_compaction_messages
from .response_adapter import SDKResponseAdapter
@@ -77,6 +77,7 @@ from .tool_adapter import (
from .transcript import (
cleanup_cli_project_dir,
download_transcript,
read_compacted_entries,
upload_transcript,
validate_transcript,
write_transcript_to_tempfile,
@@ -564,7 +565,7 @@ async def _prepare_file_attachments(
return empty
try:
manager = await get_manager(user_id, session_id)
manager = await get_workspace_manager(user_id, session_id)
except Exception:
logger.warning(
"Failed to create workspace manager for file attachments",
@@ -1045,6 +1046,7 @@ async def stream_chat_completion_sdk(
exc_info=True,
)
ended_with_stream_error = True
yield StreamError(
errorText=f"SDK stream error: {stream_err}",
code="sdk_stream_error",
@@ -1129,9 +1131,26 @@ async def stream_chat_completion_sdk(
sdk_msg.result or "(no error message provided)",
)
# Emit compaction end if SDK finished compacting
for ev in await compaction.emit_end_if_ready(session):
# Emit compaction end if SDK finished compacting.
# When compaction ends, sync TranscriptBuilder with the
# CLI's active context so they stay identical.
compact_result = await compaction.emit_end_if_ready(session)
for ev in compact_result.events:
yield ev
# After replace_entries, skip append_assistant for this
# sdk_msg — the CLI session file already contains it,
# so appending again would create a duplicate.
entries_replaced = False
if compact_result.just_ended:
compacted = await asyncio.to_thread(
read_compacted_entries,
compact_result.transcript_path,
)
if compacted is not None:
transcript_builder.replace_entries(
compacted, log_prefix=log_prefix
)
entries_replaced = True
for response in adapter.convert_message(sdk_msg):
if isinstance(response, StreamStart):
@@ -1218,10 +1237,11 @@ async def stream_chat_completion_sdk(
tool_call_id=response.toolCallId,
)
)
transcript_builder.append_tool_result(
tool_use_id=response.toolCallId,
content=content,
)
if not entries_replaced:
transcript_builder.append_tool_result(
tool_use_id=response.toolCallId,
content=content,
)
has_tool_results = True
elif isinstance(response, StreamFinish):
@@ -1231,7 +1251,9 @@ async def stream_chat_completion_sdk(
# any stashed tool results from the previous turn are
# recorded first, preserving the required API order:
# assistant(tool_use) → tool_result → assistant(text).
if isinstance(sdk_msg, AssistantMessage):
# Skip if replace_entries just ran — the CLI session
# file already contains this message.
if isinstance(sdk_msg, AssistantMessage) and not entries_replaced:
transcript_builder.append_assistant(
content_blocks=_format_sdk_content_blocks(sdk_msg.content),
model=sdk_msg.model,
@@ -1422,13 +1444,13 @@ async def stream_chat_completion_sdk(
task.add_done_callback(_background_tasks.discard)
# --- Upload transcript for next-turn --resume ---
# This MUST run in finally so the transcript is uploaded even when
# the streaming loop raises an exception.
# The transcript represents the COMPLETE active context (atomic).
# TranscriptBuilder is the single source of truth. It mirrors the
# CLI's active context: on compaction, replace_entries() syncs it
# with the compacted session file. No CLI file read needed here.
if config.claude_agent_use_resume and user_id and session is not None:
try:
# Build complete transcript from captured SDK messages
transcript_content = transcript_builder.to_jsonl()
entry_count = transcript_builder.entry_count
if not transcript_content:
logger.warning(
@@ -1438,18 +1460,15 @@ async def stream_chat_completion_sdk(
logger.warning(
"%s Transcript invalid, skipping upload (entries=%d)",
log_prefix,
transcript_builder.entry_count,
entry_count,
)
else:
logger.info(
"%s Uploading complete transcript (entries=%d, bytes=%d)",
"%s Uploading transcript (entries=%d, bytes=%d)",
log_prefix,
transcript_builder.entry_count,
entry_count,
len(transcript_content),
)
# Shield upload from cancellation - let it complete even if
# the finally block is interrupted. No timeout to avoid race
# conditions where backgrounded uploads overwrite newer transcripts.
await asyncio.shield(
upload_transcript(
user_id=user_id,

View File

@@ -20,7 +20,7 @@ class _FakeFileInfo:
size_bytes: int
_PATCH_TARGET = "backend.copilot.sdk.service.get_manager"
_PATCH_TARGET = "backend.copilot.sdk.service.get_workspace_manager"
class TestPrepareFileAttachments:

View File

@@ -347,7 +347,7 @@ def create_copilot_mcp_server(*, use_e2b: bool = False):
:func:`get_sdk_disallowed_tools`.
"""
def _truncating(fn, tool_name: str):
def _truncating(fn, tool_name: str, input_schema: dict[str, Any] | None = None):
"""Wrap a tool handler so its response is truncated to stay under the
SDK's 10 MB JSON buffer, and stash the (truncated) output for the
response adapter before the SDK can apply its own head-truncation.
@@ -361,7 +361,9 @@ def create_copilot_mcp_server(*, use_e2b: bool = False):
user_id, session = get_execution_context()
if session is not None:
try:
args = await expand_file_refs_in_args(args, user_id, session)
args = await expand_file_refs_in_args(
args, user_id, session, input_schema=input_schema
)
except FileRefExpansionError as exc:
return _mcp_error(
f"@@agptfile: reference could not be resolved: {exc}. "
@@ -389,11 +391,12 @@ def create_copilot_mcp_server(*, use_e2b: bool = False):
for tool_name, base_tool in TOOL_REGISTRY.items():
handler = create_tool_handler(base_tool)
schema = _build_input_schema(base_tool)
decorated = tool(
tool_name,
base_tool.description,
_build_input_schema(base_tool),
)(_truncating(handler, tool_name))
schema,
)(_truncating(handler, tool_name, input_schema=schema))
sdk_tools.append(decorated)
# E2B file tools replace SDK built-in Read/Write/Edit/Glob/Grep.

View File

@@ -13,8 +13,10 @@ filesystem for self-hosted) — no DB column needed.
import logging
import os
import re
import shutil
import time
from dataclasses import dataclass
from pathlib import Path
from backend.util import json
@@ -82,7 +84,11 @@ def strip_progress_entries(content: str) -> str:
parent = entry.get("parentUuid", "")
if uid:
uuid_to_parent[uid] = parent
if entry.get("type", "") in STRIPPABLE_TYPES and uid:
if (
entry.get("type", "") in STRIPPABLE_TYPES
and uid
and not entry.get("isCompactSummary")
):
stripped_uuids.add(uid)
# Second pass: keep non-stripped entries, reparenting where needed.
@@ -106,7 +112,9 @@ def strip_progress_entries(content: str) -> str:
if not isinstance(entry, dict):
result_lines.append(line)
continue
if entry.get("type", "") in STRIPPABLE_TYPES:
if entry.get("type", "") in STRIPPABLE_TYPES and not entry.get(
"isCompactSummary"
):
continue
uid = entry.get("uuid", "")
if uid in reparented:
@@ -137,6 +145,155 @@ def _sanitize_id(raw_id: str, max_len: int = 36) -> str:
_SAFE_CWD_PREFIX = os.path.realpath("/tmp/copilot-")
def _projects_base() -> str:
"""Return the resolved path to the CLI's projects directory."""
config_dir = os.environ.get("CLAUDE_CONFIG_DIR") or os.path.expanduser("~/.claude")
return os.path.realpath(os.path.join(config_dir, "projects"))
def _cli_project_dir(sdk_cwd: str) -> str | None:
"""Return the CLI's project directory for a given working directory.
Returns ``None`` if the path would escape the projects base.
"""
cwd_encoded = re.sub(r"[^a-zA-Z0-9]", "-", os.path.realpath(sdk_cwd))
projects_base = _projects_base()
project_dir = os.path.realpath(os.path.join(projects_base, cwd_encoded))
if not project_dir.startswith(projects_base + os.sep):
logger.warning(
"[Transcript] Project dir escaped projects base: %s", project_dir
)
return None
return project_dir
def _safe_glob_jsonl(project_dir: str) -> list[Path]:
"""Glob ``*.jsonl`` files, filtering out symlinks that escape the directory."""
try:
resolved_base = Path(project_dir).resolve()
except OSError as e:
logger.warning("[Transcript] Failed to resolve project dir: %s", e)
return []
result: list[Path] = []
for candidate in Path(project_dir).glob("*.jsonl"):
try:
resolved = candidate.resolve()
if resolved.is_relative_to(resolved_base):
result.append(resolved)
except (OSError, RuntimeError) as e:
logger.debug(
"[Transcript] Skipping invalid CLI session candidate %s: %s",
candidate,
e,
)
return result
def read_compacted_entries(transcript_path: str) -> list[dict] | None:
"""Read compacted entries from the CLI session file after compaction.
Parses the JSONL file line-by-line, finds the ``isCompactSummary: true``
entry, and returns it plus all entries after it.
The CLI writes the compaction summary BEFORE sending the next message,
so the file is guaranteed to be flushed by the time we read it.
Returns a list of parsed dicts, or ``None`` if the file cannot be read
or no compaction summary is found.
"""
if not transcript_path:
return None
projects_base = _projects_base()
real_path = os.path.realpath(transcript_path)
if not real_path.startswith(projects_base + os.sep):
logger.warning(
"[Transcript] transcript_path outside projects base: %s", transcript_path
)
return None
try:
content = Path(real_path).read_text()
except OSError as e:
logger.warning(
"[Transcript] Failed to read session file %s: %s", transcript_path, e
)
return None
lines = content.strip().split("\n")
compact_idx: int | None = None
for idx, line in enumerate(lines):
if not line.strip():
continue
entry = json.loads(line, fallback=None)
if not isinstance(entry, dict):
continue
if entry.get("isCompactSummary"):
compact_idx = idx # don't break — find the LAST summary
if compact_idx is None:
logger.debug("[Transcript] No compaction summary found in %s", transcript_path)
return None
entries: list[dict] = []
for line in lines[compact_idx:]:
if not line.strip():
continue
entry = json.loads(line, fallback=None)
if isinstance(entry, dict):
entries.append(entry)
logger.info(
"[Transcript] Read %d compacted entries from %s (summary at line %d)",
len(entries),
transcript_path,
compact_idx + 1,
)
return entries
def read_cli_session_file(sdk_cwd: str) -> str | None:
"""Read the CLI's own session file, which reflects any compaction.
The CLI writes its session transcript to
``~/.claude/projects/<encoded_cwd>/<session_id>.jsonl``.
Since each SDK turn uses a unique ``sdk_cwd``, there should be
exactly one ``.jsonl`` file in that directory.
Returns the file content, or ``None`` if not found.
"""
project_dir = _cli_project_dir(sdk_cwd)
if not project_dir or not os.path.isdir(project_dir):
return None
jsonl_files = _safe_glob_jsonl(project_dir)
if not jsonl_files:
logger.debug("[Transcript] No CLI session file found in %s", project_dir)
return None
# Pick the most recently modified file (should be only one per turn).
try:
session_file = max(jsonl_files, key=lambda p: p.stat().st_mtime)
except OSError as e:
logger.warning("[Transcript] Failed to inspect CLI session files: %s", e)
return None
try:
content = session_file.read_text()
logger.info(
"[Transcript] Read CLI session file: %s (%d bytes)",
session_file,
len(content),
)
return content
except OSError as e:
logger.warning("[Transcript] Failed to read CLI session file: %s", e)
return None
def cleanup_cli_project_dir(sdk_cwd: str) -> None:
"""Remove the CLI's project directory for a specific working directory.
@@ -144,25 +301,15 @@ def cleanup_cli_project_dir(sdk_cwd: str) -> None:
Each SDK turn uses a unique ``sdk_cwd``, so the project directory is
safe to remove entirely after the transcript has been uploaded.
"""
import shutil
# Encode cwd the same way CLI does (replaces non-alphanumeric with -)
cwd_encoded = re.sub(r"[^a-zA-Z0-9]", "-", os.path.realpath(sdk_cwd))
config_dir = os.environ.get("CLAUDE_CONFIG_DIR") or os.path.expanduser("~/.claude")
projects_base = os.path.realpath(os.path.join(config_dir, "projects"))
project_dir = os.path.realpath(os.path.join(projects_base, cwd_encoded))
if not project_dir.startswith(projects_base + os.sep):
logger.warning(
f"[Transcript] Cleanup path escaped projects base: {project_dir}"
)
project_dir = _cli_project_dir(sdk_cwd)
if not project_dir:
return
if os.path.isdir(project_dir):
shutil.rmtree(project_dir, ignore_errors=True)
logger.debug(f"[Transcript] Cleaned up CLI project dir: {project_dir}")
logger.debug("[Transcript] Cleaned up CLI project dir: %s", project_dir)
else:
logger.debug(f"[Transcript] Project dir not found: {project_dir}")
logger.debug("[Transcript] Project dir not found: %s", project_dir)
def write_transcript_to_tempfile(
@@ -259,24 +406,27 @@ def _meta_storage_path_parts(user_id: str, session_id: str) -> tuple[str, str, s
)
def _build_storage_path(user_id: str, session_id: str, backend: object) -> str:
"""Build the full storage path string that ``retrieve()`` expects.
``store()`` returns a path like ``gcs://bucket/workspaces/...`` or
``local://workspace_id/file_id/filename``. Since we use deterministic
arguments we can reconstruct the same path for download/delete without
having stored the return value.
"""
def _build_path_from_parts(parts: tuple[str, str, str], backend: object) -> str:
"""Build a full storage path from (workspace_id, file_id, filename) parts."""
from backend.util.workspace_storage import GCSWorkspaceStorage
wid, fid, fname = _storage_path_parts(user_id, session_id)
wid, fid, fname = parts
if isinstance(backend, GCSWorkspaceStorage):
blob = f"workspaces/{wid}/{fid}/{fname}"
return f"gcs://{backend.bucket_name}/{blob}"
else:
# LocalWorkspaceStorage returns local://{relative_path}
return f"local://{wid}/{fid}/{fname}"
return f"local://{wid}/{fid}/{fname}"
def _build_storage_path(user_id: str, session_id: str, backend: object) -> str:
"""Build the full storage path string that ``retrieve()`` expects."""
return _build_path_from_parts(_storage_path_parts(user_id, session_id), backend)
def _build_meta_storage_path(user_id: str, session_id: str, backend: object) -> str:
"""Build the full storage path for the companion .meta.json file."""
return _build_path_from_parts(
_meta_storage_path_parts(user_id, session_id), backend
)
async def upload_transcript(
@@ -381,15 +531,7 @@ async def download_transcript(
message_count = 0
uploaded_at = 0.0
try:
from backend.util.workspace_storage import GCSWorkspaceStorage
mwid, mfid, mfname = _meta_storage_path_parts(user_id, session_id)
if isinstance(storage, GCSWorkspaceStorage):
blob = f"workspaces/{mwid}/{mfid}/{mfname}"
meta_path = f"gcs://{storage.bucket_name}/{blob}"
else:
meta_path = f"local://{mwid}/{mfid}/{mfname}"
meta_path = _build_meta_storage_path(user_id, session_id, storage)
meta_data = await storage.retrieve(meta_path)
meta = json.loads(meta_data.decode("utf-8"), fallback={})
message_count = meta.get("message_count", 0)
@@ -406,7 +548,11 @@ async def download_transcript(
async def delete_transcript(user_id: str, session_id: str) -> None:
"""Delete transcript from bucket storage (e.g. after resume failure)."""
"""Delete transcript and its metadata from bucket storage.
Removes both the ``.jsonl`` transcript and the companion ``.meta.json``
so stale ``message_count`` watermarks cannot corrupt gap-fill logic.
"""
from backend.util.workspace_storage import get_workspace_storage
storage = await get_workspace_storage()
@@ -414,6 +560,14 @@ async def delete_transcript(user_id: str, session_id: str) -> None:
try:
await storage.delete(path)
logger.info(f"[Transcript] Deleted transcript for session {session_id}")
logger.info("[Transcript] Deleted transcript for session %s", session_id)
except Exception as e:
logger.warning(f"[Transcript] Failed to delete transcript: {e}")
logger.warning("[Transcript] Failed to delete transcript: %s", e)
# Also delete the companion .meta.json to avoid orphaned metadata.
try:
meta_path = _build_meta_storage_path(user_id, session_id, storage)
await storage.delete(meta_path)
logger.info("[Transcript] Deleted metadata for session %s", session_id)
except Exception as e:
logger.warning("[Transcript] Failed to delete metadata: %s", e)

View File

@@ -30,6 +30,7 @@ class TranscriptEntry(BaseModel):
type: str
uuid: str
parentUuid: str | None
isCompactSummary: bool | None = None
message: dict[str, Any]
@@ -53,6 +54,24 @@ class TranscriptBuilder:
return self._entries[-1].message.get("id", "")
return ""
@staticmethod
def _parse_entry(data: dict) -> TranscriptEntry | None:
"""Parse a single transcript entry, filtering strippable types.
Returns ``None`` for entries that should be skipped (strippable types
that are not compaction summaries).
"""
entry_type = data.get("type", "")
if entry_type in STRIPPABLE_TYPES and not data.get("isCompactSummary"):
return None
return TranscriptEntry(
type=entry_type,
uuid=data.get("uuid") or str(uuid4()),
parentUuid=data.get("parentUuid"),
isCompactSummary=data.get("isCompactSummary") or None,
message=data.get("message", {}),
)
def load_previous(self, content: str, log_prefix: str = "[Transcript]") -> None:
"""Load complete previous transcript.
@@ -78,18 +97,9 @@ class TranscriptBuilder:
)
continue
# Load all non-strippable entries (user/assistant/system/etc.)
# Skip only STRIPPABLE_TYPES to match strip_progress_entries() behavior
entry_type = data.get("type", "")
if entry_type in STRIPPABLE_TYPES:
entry = self._parse_entry(data)
if entry is None:
continue
entry = TranscriptEntry(
type=data["type"],
uuid=data.get("uuid") or str(uuid4()),
parentUuid=data.get("parentUuid"),
message=data.get("message", {}),
)
self._entries.append(entry)
self._last_uuid = entry.uuid
@@ -162,6 +172,43 @@ class TranscriptBuilder:
)
self._last_uuid = msg_uuid
def replace_entries(
self, compacted_entries: list[dict], log_prefix: str = "[Transcript]"
) -> None:
"""Replace all entries with compacted entries from the CLI session file.
Called after mid-stream compaction so TranscriptBuilder mirrors the
CLI's active context (compaction summary + post-compaction entries).
Builds the new list first and validates it's non-empty before swapping,
so corrupt input cannot wipe the conversation history.
"""
new_entries: list[TranscriptEntry] = []
for data in compacted_entries:
entry = self._parse_entry(data)
if entry is not None:
new_entries.append(entry)
if not new_entries:
logger.warning(
"%s replace_entries produced 0 entries from %d inputs, keeping old (%d entries)",
log_prefix,
len(compacted_entries),
len(self._entries),
)
return
old_count = len(self._entries)
self._entries = new_entries
self._last_uuid = new_entries[-1].uuid
logger.info(
"%s TranscriptBuilder compacted: %d entries -> %d entries",
log_prefix,
old_count,
len(self._entries),
)
def to_jsonl(self) -> str:
"""Export complete context as JSONL.

View File

@@ -1,15 +1,23 @@
"""Unit tests for JSONL transcript management utilities."""
import os
from unittest.mock import AsyncMock, patch
import pytest
from backend.util import json
from .transcript import (
STRIPPABLE_TYPES,
_cli_project_dir,
delete_transcript,
read_cli_session_file,
read_compacted_entries,
strip_progress_entries,
validate_transcript,
write_transcript_to_tempfile,
)
from .transcript_builder import TranscriptBuilder
def _make_jsonl(*entries: dict) -> str:
@@ -282,3 +290,610 @@ class TestStripProgressEntries:
lines = result.strip().split("\n")
asst_entry = json.loads(lines[-1])
assert asst_entry["parentUuid"] == "u1" # reparented
# --- read_cli_session_file ---
class TestReadCliSessionFile:
def test_no_matching_files_returns_none(self, tmp_path, monkeypatch):
"""read_cli_session_file returns None when no .jsonl files exist."""
# Create a project dir with no jsonl files
project_dir = tmp_path / "projects" / "encoded-cwd"
project_dir.mkdir(parents=True)
monkeypatch.setattr(
"backend.copilot.sdk.transcript._cli_project_dir",
lambda sdk_cwd: str(project_dir),
)
assert read_cli_session_file("/fake/cwd") is None
def test_one_jsonl_file_returns_content(self, tmp_path, monkeypatch):
"""read_cli_session_file returns the content of a single .jsonl file."""
project_dir = tmp_path / "projects" / "encoded-cwd"
project_dir.mkdir(parents=True)
jsonl_file = project_dir / "session.jsonl"
jsonl_file.write_text("line1\nline2\n")
monkeypatch.setattr(
"backend.copilot.sdk.transcript._cli_project_dir",
lambda sdk_cwd: str(project_dir),
)
result = read_cli_session_file("/fake/cwd")
assert result == "line1\nline2\n"
def test_symlink_escaping_project_dir_is_skipped(self, tmp_path, monkeypatch):
"""read_cli_session_file skips symlinks that escape the project dir."""
project_dir = tmp_path / "projects" / "encoded-cwd"
project_dir.mkdir(parents=True)
# Create a file outside the project dir
outside = tmp_path / "outside"
outside.mkdir()
outside_file = outside / "evil.jsonl"
outside_file.write_text("should not be read\n")
# Symlink from inside project_dir to outside file
symlink = project_dir / "evil.jsonl"
symlink.symlink_to(outside_file)
monkeypatch.setattr(
"backend.copilot.sdk.transcript._cli_project_dir",
lambda sdk_cwd: str(project_dir),
)
# The symlink target resolves outside project_dir, so it should be skipped
result = read_cli_session_file("/fake/cwd")
assert result is None
# --- _cli_project_dir ---
class TestCliProjectDir:
def test_returns_none_for_path_traversal(self, tmp_path, monkeypatch):
"""_cli_project_dir returns None when the project dir symlink escapes projects base."""
config_dir = tmp_path / "config"
config_dir.mkdir()
projects_dir = config_dir / "projects"
projects_dir.mkdir()
monkeypatch.setenv("CLAUDE_CONFIG_DIR", str(config_dir))
# Create a symlink inside projects/ that points outside of it.
# _cli_project_dir encodes the cwd as all-alnum-hyphens, so use a
# cwd whose encoded form matches the symlink name we create.
evil_target = tmp_path / "escaped"
evil_target.mkdir()
# The encoded form of "/evil/cwd" is "-evil-cwd"
symlink_path = projects_dir / "-evil-cwd"
symlink_path.symlink_to(evil_target)
result = _cli_project_dir("/evil/cwd")
assert result is None
# --- delete_transcript ---
class TestDeleteTranscript:
@pytest.mark.asyncio
async def test_deletes_both_jsonl_and_meta(self):
"""delete_transcript removes both the .jsonl and .meta.json files."""
mock_storage = AsyncMock()
mock_storage.delete = AsyncMock()
with patch(
"backend.util.workspace_storage.get_workspace_storage",
new_callable=AsyncMock,
return_value=mock_storage,
):
await delete_transcript("user-123", "session-456")
assert mock_storage.delete.call_count == 2
paths = [call.args[0] for call in mock_storage.delete.call_args_list]
assert any(p.endswith(".jsonl") for p in paths)
assert any(p.endswith(".meta.json") for p in paths)
@pytest.mark.asyncio
async def test_continues_on_jsonl_delete_failure(self):
"""If .jsonl delete fails, .meta.json delete is still attempted."""
mock_storage = AsyncMock()
mock_storage.delete = AsyncMock(
side_effect=[Exception("jsonl delete failed"), None]
)
with patch(
"backend.util.workspace_storage.get_workspace_storage",
new_callable=AsyncMock,
return_value=mock_storage,
):
# Should not raise
await delete_transcript("user-123", "session-456")
assert mock_storage.delete.call_count == 2
@pytest.mark.asyncio
async def test_handles_meta_delete_failure(self):
"""If .meta.json delete fails, no exception propagates."""
mock_storage = AsyncMock()
mock_storage.delete = AsyncMock(
side_effect=[None, Exception("meta delete failed")]
)
with patch(
"backend.util.workspace_storage.get_workspace_storage",
new_callable=AsyncMock,
return_value=mock_storage,
):
# Should not raise
await delete_transcript("user-123", "session-456")
# --- read_compacted_entries ---
COMPACT_SUMMARY = {
"type": "summary",
"uuid": "cs1",
"isCompactSummary": True,
"message": {"role": "assistant", "content": "compacted context"},
}
POST_COMPACT_ASST = {
"type": "assistant",
"uuid": "a2",
"parentUuid": "cs1",
"message": {"role": "assistant", "content": "response after compaction"},
}
class TestReadCompactedEntries:
def test_returns_summary_and_entries_after(self, tmp_path, monkeypatch):
"""File with isCompactSummary entry returns summary + entries after."""
config_dir = tmp_path / "config"
projects_dir = config_dir / "projects"
session_dir = projects_dir / "proj"
session_dir.mkdir(parents=True)
monkeypatch.setenv("CLAUDE_CONFIG_DIR", str(config_dir))
pre_compact = {"type": "user", "uuid": "u1", "message": {"role": "user"}}
path = session_dir / "session.jsonl"
path.write_text(_make_jsonl(pre_compact, COMPACT_SUMMARY, POST_COMPACT_ASST))
result = read_compacted_entries(str(path))
assert result is not None
assert len(result) == 2
assert result[0]["isCompactSummary"] is True
assert result[1]["uuid"] == "a2"
def test_no_compact_summary_returns_none(self, tmp_path, monkeypatch):
"""File without isCompactSummary returns None."""
config_dir = tmp_path / "config"
projects_dir = config_dir / "projects"
session_dir = projects_dir / "proj"
session_dir.mkdir(parents=True)
monkeypatch.setenv("CLAUDE_CONFIG_DIR", str(config_dir))
path = session_dir / "session.jsonl"
path.write_text(_make_jsonl(USER_MSG, ASST_MSG))
result = read_compacted_entries(str(path))
assert result is None
def test_file_not_found_returns_none(self, tmp_path, monkeypatch):
"""Non-existent file returns None."""
config_dir = tmp_path / "config"
projects_dir = config_dir / "projects"
projects_dir.mkdir(parents=True)
monkeypatch.setenv("CLAUDE_CONFIG_DIR", str(config_dir))
result = read_compacted_entries(str(projects_dir / "missing.jsonl"))
assert result is None
def test_empty_path_returns_none(self):
"""Empty string path returns None."""
result = read_compacted_entries("")
assert result is None
def test_malformed_json_lines_skipped(self, tmp_path, monkeypatch):
"""Malformed JSON lines are skipped gracefully."""
config_dir = tmp_path / "config"
projects_dir = config_dir / "projects"
session_dir = projects_dir / "proj"
session_dir.mkdir(parents=True)
monkeypatch.setenv("CLAUDE_CONFIG_DIR", str(config_dir))
path = session_dir / "session.jsonl"
content = "not valid json\n" + json.dumps(COMPACT_SUMMARY) + "\n"
content += "also bad\n" + json.dumps(POST_COMPACT_ASST) + "\n"
path.write_text(content)
result = read_compacted_entries(str(path))
assert result is not None
assert len(result) == 2 # summary + post-compact assistant
def test_multiple_compact_summaries_uses_last(self, tmp_path, monkeypatch):
"""When multiple isCompactSummary entries exist, uses the last one
(most recent compaction)."""
config_dir = tmp_path / "config"
projects_dir = config_dir / "projects"
session_dir = projects_dir / "proj"
session_dir.mkdir(parents=True)
monkeypatch.setenv("CLAUDE_CONFIG_DIR", str(config_dir))
second_summary = {
"type": "summary",
"uuid": "cs2",
"isCompactSummary": True,
"message": {"role": "assistant", "content": "second summary"},
}
path = session_dir / "session.jsonl"
path.write_text(_make_jsonl(COMPACT_SUMMARY, POST_COMPACT_ASST, second_summary))
result = read_compacted_entries(str(path))
assert result is not None
# Last summary found, so only cs2 returned
assert len(result) == 1
assert result[0]["uuid"] == "cs2"
def test_path_outside_projects_base_returns_none(self, tmp_path, monkeypatch):
"""Transcript path outside the projects directory is rejected."""
config_dir = tmp_path / "config"
(config_dir / "projects").mkdir(parents=True)
monkeypatch.setenv("CLAUDE_CONFIG_DIR", str(config_dir))
evil_file = tmp_path / "evil.jsonl"
evil_file.write_text(_make_jsonl(COMPACT_SUMMARY))
result = read_compacted_entries(str(evil_file))
assert result is None
# --- TranscriptBuilder.replace_entries ---
class TestTranscriptBuilderReplaceEntries:
def test_replaces_existing_entries(self):
"""replace_entries replaces all entries with compacted ones."""
builder = TranscriptBuilder()
builder.append_user("hello")
builder.append_assistant([{"type": "text", "text": "world"}])
assert builder.entry_count == 2
compacted = [
{
"type": "user",
"uuid": "cs1",
"isCompactSummary": True,
"message": {"role": "user", "content": "compacted summary"},
},
{
"type": "assistant",
"uuid": "a1",
"parentUuid": "cs1",
"message": {"role": "assistant", "content": "response"},
},
]
builder.replace_entries(compacted)
assert builder.entry_count == 2
output = builder.to_jsonl()
entries = [json.loads(line) for line in output.strip().split("\n")]
assert entries[0]["uuid"] == "cs1"
assert entries[1]["uuid"] == "a1"
def test_filters_strippable_types(self):
"""Strippable types are filtered out during replace."""
builder = TranscriptBuilder()
compacted = [
{
"type": "user",
"uuid": "cs1",
"message": {"role": "user", "content": "compacted summary"},
},
{"type": "progress", "uuid": "p1", "message": {}},
{"type": "summary", "uuid": "s1", "message": {}},
{
"type": "assistant",
"uuid": "a1",
"parentUuid": "cs1",
"message": {"role": "assistant", "content": "hi"},
},
]
builder.replace_entries(compacted)
assert builder.entry_count == 2 # progress and summary were filtered
def test_maintains_last_uuid_chain(self):
"""After replace, _last_uuid is the last entry's uuid."""
builder = TranscriptBuilder()
compacted = [
{
"type": "user",
"uuid": "cs1",
"message": {"role": "user", "content": "compacted summary"},
},
{
"type": "assistant",
"uuid": "a1",
"parentUuid": "cs1",
"message": {"role": "assistant", "content": "hi"},
},
]
builder.replace_entries(compacted)
# Appending a new user message should chain to a1
builder.append_user("next question")
output = builder.to_jsonl()
entries = [json.loads(line) for line in output.strip().split("\n")]
assert entries[-1]["parentUuid"] == "a1"
def test_empty_entries_list_keeps_existing(self):
"""Replacing with empty list keeps existing entries (safety check)."""
builder = TranscriptBuilder()
builder.append_user("hello")
builder.replace_entries([])
# Empty input is treated as corrupt — existing entries preserved
assert builder.entry_count == 1
assert not builder.is_empty
# --- TranscriptBuilder.load_previous with compacted content ---
class TestTranscriptBuilderLoadPreviousCompacted:
def test_preserves_compact_summary_entry(self):
"""load_previous preserves isCompactSummary entries even though
their type is 'summary' (which is in STRIPPABLE_TYPES)."""
compacted_content = _make_jsonl(COMPACT_SUMMARY, POST_COMPACT_ASST)
builder = TranscriptBuilder()
builder.load_previous(compacted_content)
assert builder.entry_count == 2
output = builder.to_jsonl()
entries = [json.loads(line) for line in output.strip().split("\n")]
assert entries[0]["type"] == "summary"
assert entries[0]["uuid"] == "cs1"
assert entries[1]["uuid"] == "a2"
def test_strips_regular_summary_entries(self):
"""Regular summary entries (without isCompactSummary) are still stripped."""
regular_summary = {"type": "summary", "uuid": "s1", "message": {"content": "x"}}
content = _make_jsonl(regular_summary, POST_COMPACT_ASST)
builder = TranscriptBuilder()
builder.load_previous(content)
assert builder.entry_count == 1 # Only the assistant entry
# --- End-to-end compaction flow (simulates service.py) ---
class TestCompactionFlowIntegration:
"""Simulate the full compaction flow as it happens in service.py:
1. TranscriptBuilder loads a previous transcript (download)
2. New messages are appended (user query + assistant response)
3. CompactionTracker fires (PreCompact hook → emit_start → emit_end)
4. read_compacted_entries reads the CLI session file
5. TranscriptBuilder.replace_entries syncs with CLI state
6. Final to_jsonl() produces the correct output (upload)
"""
def test_full_compaction_roundtrip(self, tmp_path, monkeypatch):
"""Full roundtrip: load → append → compact → replace → export."""
# Setup: create a CLI session file with pre-compact + compaction entries
config_dir = tmp_path / "config"
projects_dir = config_dir / "projects"
session_dir = projects_dir / "proj"
session_dir.mkdir(parents=True)
monkeypatch.setenv("CLAUDE_CONFIG_DIR", str(config_dir))
# Simulate a transcript with old messages, then a compaction summary
old_user = {
"type": "user",
"uuid": "u1",
"message": {"role": "user", "content": "old question"},
}
old_asst = {
"type": "assistant",
"uuid": "a1",
"parentUuid": "u1",
"message": {"role": "assistant", "content": "old answer"},
}
compact_summary = {
"type": "summary",
"uuid": "cs1",
"isCompactSummary": True,
"message": {"role": "user", "content": "compacted summary of conversation"},
}
post_compact_asst = {
"type": "assistant",
"uuid": "a2",
"parentUuid": "cs1",
"message": {"role": "assistant", "content": "response after compaction"},
}
session_file = session_dir / "session.jsonl"
session_file.write_text(
_make_jsonl(old_user, old_asst, compact_summary, post_compact_asst)
)
# Step 1: TranscriptBuilder loads previous transcript (simulates download)
# The previous transcript would have the OLD entries (pre-compaction)
previous_transcript = _make_jsonl(old_user, old_asst)
builder = TranscriptBuilder()
builder.load_previous(previous_transcript)
assert builder.entry_count == 2
# Step 2: New messages appended during the current query
builder.append_user("new question")
builder.append_assistant([{"type": "text", "text": "new answer"}])
assert builder.entry_count == 4
# Step 3: read_compacted_entries reads the CLI session file
compacted = read_compacted_entries(str(session_file))
assert compacted is not None
assert len(compacted) == 2 # compact_summary + post_compact_asst
assert compacted[0]["isCompactSummary"] is True
# Step 4: replace_entries syncs builder with CLI state
builder.replace_entries(compacted)
assert builder.entry_count == 2 # Only compacted entries now
# Step 5: Append post-compaction messages (continuing the stream)
builder.append_user("follow-up question")
assert builder.entry_count == 3
# Step 6: Export and verify
output = builder.to_jsonl()
entries = [json.loads(line) for line in output.strip().split("\n")]
assert len(entries) == 3
# First entry is the compaction summary
assert entries[0]["type"] == "summary"
assert entries[0]["uuid"] == "cs1"
# Second is the post-compact assistant
assert entries[1]["uuid"] == "a2"
# Third is our follow-up, parented to the last compacted entry
assert entries[2]["type"] == "user"
assert entries[2]["parentUuid"] == "a2"
def test_compaction_preserves_chain_across_multiple_compactions(
self, tmp_path, monkeypatch
):
"""Two compactions: first compacts old history, second compacts the first."""
config_dir = tmp_path / "config"
projects_dir = config_dir / "projects"
session_dir = projects_dir / "proj"
session_dir.mkdir(parents=True)
monkeypatch.setenv("CLAUDE_CONFIG_DIR", str(config_dir))
# First compaction
first_summary = {
"type": "summary",
"uuid": "cs1",
"isCompactSummary": True,
"message": {"role": "user", "content": "first summary"},
}
mid_asst = {
"type": "assistant",
"uuid": "a1",
"parentUuid": "cs1",
"message": {"role": "assistant", "content": "mid response"},
}
# Second compaction (compacts the first summary + mid_asst)
second_summary = {
"type": "summary",
"uuid": "cs2",
"isCompactSummary": True,
"message": {"role": "user", "content": "second summary"},
}
final_asst = {
"type": "assistant",
"uuid": "a2",
"parentUuid": "cs2",
"message": {"role": "assistant", "content": "final response"},
}
session_file = session_dir / "session.jsonl"
session_file.write_text(
_make_jsonl(first_summary, mid_asst, second_summary, final_asst)
)
# read_compacted_entries should find the LAST summary
compacted = read_compacted_entries(str(session_file))
assert compacted is not None
assert len(compacted) == 2 # second_summary + final_asst
assert compacted[0]["uuid"] == "cs2"
# Apply to builder
builder = TranscriptBuilder()
builder.append_user("old stuff")
builder.append_assistant([{"type": "text", "text": "old response"}])
builder.replace_entries(compacted)
assert builder.entry_count == 2
# New message chains correctly
builder.append_user("after second compaction")
output = builder.to_jsonl()
entries = [json.loads(line) for line in output.strip().split("\n")]
assert entries[-1]["parentUuid"] == "a2"
def test_strip_progress_preserves_compact_summaries(self):
"""strip_progress_entries doesn't strip isCompactSummary entries
even though their type is 'summary' (in STRIPPABLE_TYPES)."""
compact_summary = {
"type": "summary",
"uuid": "cs1",
"isCompactSummary": True,
"message": {"role": "user", "content": "compacted"},
}
regular_summary = {"type": "summary", "uuid": "s1", "message": {"content": "x"}}
progress = {"type": "progress", "uuid": "p1", "data": {"stdout": "..."}}
user = {
"type": "user",
"uuid": "u1",
"message": {"role": "user", "content": "hi"},
}
content = _make_jsonl(compact_summary, regular_summary, progress, user)
stripped = strip_progress_entries(content)
stripped_entries = [
json.loads(line) for line in stripped.strip().split("\n") if line.strip()
]
uuids = [e.get("uuid") for e in stripped_entries]
# compact_summary kept, regular_summary stripped, progress stripped, user kept
assert "cs1" in uuids # compact summary preserved
assert "s1" not in uuids # regular summary stripped
assert "p1" not in uuids # progress stripped
assert "u1" in uuids # user kept
def test_builder_load_then_replace_then_export_roundtrip(self):
"""Load a compacted transcript, replace with new compaction, export.
Simulates two consecutive turns with compaction each time."""
# Turn 1: load compacted transcript
compact1 = {
"type": "summary",
"uuid": "cs1",
"isCompactSummary": True,
"message": {"role": "user", "content": "summary v1"},
}
asst1 = {
"type": "assistant",
"uuid": "a1",
"parentUuid": "cs1",
"message": {"role": "assistant", "content": "response 1"},
}
builder = TranscriptBuilder()
builder.load_previous(_make_jsonl(compact1, asst1))
assert builder.entry_count == 2
# Turn 1: append new messages
builder.append_user("question")
builder.append_assistant([{"type": "text", "text": "answer"}])
assert builder.entry_count == 4
# Turn 1: compaction fires — replace with new compacted state
compact2 = {
"type": "summary",
"uuid": "cs2",
"isCompactSummary": True,
"message": {"role": "user", "content": "summary v2"},
}
asst2 = {
"type": "assistant",
"uuid": "a2",
"parentUuid": "cs2",
"message": {"role": "assistant", "content": "continuing"},
}
builder.replace_entries([compact2, asst2])
assert builder.entry_count == 2
# Export (this goes to cloud storage for next turn's download)
output = builder.to_jsonl()
lines = [json.loads(line) for line in output.strip().split("\n")]
assert lines[0]["uuid"] == "cs2"
assert lines[0]["type"] == "summary"
assert lines[1]["uuid"] == "a2"
# Turn 2: fresh builder loads the exported transcript
builder2 = TranscriptBuilder()
builder2.load_previous(output)
assert builder2.entry_count == 2
builder2.append_user("turn 2 question")
output2 = builder2.to_jsonl()
lines2 = [json.loads(line) for line in output2.strip().split("\n")]
assert lines2[-1]["parentUuid"] == "a2"

View File

@@ -22,13 +22,11 @@ class AddUnderstandingTool(BaseTool):
@property
def description(self) -> str:
return """Capture and store information about the user's business context,
workflows, pain points, and automation goals. Call this tool whenever the user
shares information about their business. Each call incrementally adds to the
existing understanding - you don't need to provide all fields at once.
Use this to build a comprehensive profile that helps recommend better agents
and automations for the user's specific needs."""
return (
"Store user's business context, workflows, pain points, and automation goals. "
"Call whenever the user shares business info. Each call incrementally merges "
"with existing data — provide only the fields you have."
)
@property
def parameters(self) -> dict[str, Any]:

View File

@@ -32,6 +32,7 @@ import shutil
import tempfile
from typing import Any
from backend.copilot.context import get_workspace_manager
from backend.copilot.model import ChatSession
from backend.util.request import validate_url_host
@@ -43,7 +44,6 @@ from .models import (
ErrorResponse,
ToolResponseBase,
)
from .workspace_files import get_manager
logger = logging.getLogger(__name__)
@@ -194,7 +194,7 @@ async def _save_browser_state(
),
}
manager = await get_manager(user_id, session.session_id)
manager = await get_workspace_manager(user_id, session.session_id)
await manager.write_file(
content=json.dumps(state).encode("utf-8"),
filename=_STATE_FILENAME,
@@ -218,7 +218,7 @@ async def _restore_browser_state(
Returns True on success (or no state to restore), False on failure.
"""
try:
manager = await get_manager(user_id, session.session_id)
manager = await get_workspace_manager(user_id, session.session_id)
file_info = await manager.get_file_info_by_path(_STATE_FILENAME)
if file_info is None:
@@ -360,7 +360,7 @@ async def close_browser_session(session_name: str, user_id: str | None = None) -
# Delete persisted browser state (cookies, localStorage) from workspace.
if user_id:
try:
manager = await get_manager(user_id, session_name)
manager = await get_workspace_manager(user_id, session_name)
file_info = await manager.get_file_info_by_path(_STATE_FILENAME)
if file_info is not None:
await manager.delete_file(file_info.id)
@@ -408,18 +408,11 @@ class BrowserNavigateTool(BaseTool):
@property
def description(self) -> str:
return (
"Navigate to a URL using a real browser. Returns an accessibility "
"tree snapshot listing the page's interactive elements with @ref IDs "
"(e.g. @e3) that can be used with browser_act. "
"Session persists — cookies and login state carry over between calls. "
"Use this (with browser_act) for multi-step interaction: login flows, "
"form filling, button clicks, or anything requiring page interaction. "
"For plain static pages, prefer web_fetch — no browser overhead. "
"For authenticated pages: navigate to the login page first, use browser_act "
"to fill credentials and submit, then navigate to the target page. "
"Note: for slow SPAs, the returned snapshot may reflect a partially-loaded "
"state. If elements seem missing, use browser_act with action='wait' and a "
"CSS selector or millisecond delay, then take a browser_screenshot to verify."
"Navigate to a URL in a real browser. Returns accessibility tree with @ref IDs "
"for browser_act. Session persists (cookies/auth carry over). "
"For static pages, prefer web_fetch. "
"For SPAs, elements may load late — use browser_act with wait + browser_screenshot to verify. "
"For auth: navigate to login, fill creds with browser_act, then navigate to target."
)
@property
@@ -429,13 +422,13 @@ class BrowserNavigateTool(BaseTool):
"properties": {
"url": {
"type": "string",
"description": "The HTTP/HTTPS URL to navigate to.",
"description": "HTTP/HTTPS URL to navigate to.",
},
"wait_for": {
"type": "string",
"enum": ["networkidle", "load", "domcontentloaded"],
"default": "networkidle",
"description": "When to consider navigation complete. Use 'networkidle' for SPAs (default).",
"description": "Navigation completion strategy (default: networkidle).",
},
},
"required": ["url"],
@@ -554,14 +547,12 @@ class BrowserActTool(BaseTool):
@property
def description(self) -> str:
return (
"Interact with the current browser page. Use @ref IDs from the "
"snapshot (e.g. '@e3') to target elements. Returns an updated snapshot. "
"Supported actions: click, dblclick, fill, type, scroll, hover, press, "
"Interact with the current browser page using @ref IDs from the snapshot. "
"Actions: click, dblclick, fill, type, scroll, hover, press, "
"check, uncheck, select, wait, back, forward, reload. "
"fill clears the field before typing; type appends without clearing. "
"wait accepts a CSS selector (waits for element) or milliseconds string (e.g. '1000'). "
"Example login flow: fill @e1 with email → fill @e2 with password → "
"click @e3 (submit) → browser_navigate to the target page."
"fill clears field first; type appends. "
"wait accepts CSS selector or milliseconds (e.g. '1000'). "
"Returns updated snapshot."
)
@property
@@ -587,30 +578,21 @@ class BrowserActTool(BaseTool):
"forward",
"reload",
],
"description": "The action to perform.",
"description": "Action to perform.",
},
"target": {
"type": "string",
"description": (
"Element to target. Use @ref from snapshot (e.g. '@e3'), "
"a CSS selector, or a text description. "
"Required for: click, dblclick, fill, type, hover, check, uncheck, select. "
"For wait: a CSS selector to wait for, or milliseconds as a string (e.g. '1000')."
),
"description": "@ref ID (e.g. '@e3'), CSS selector, or text description.",
},
"value": {
"type": "string",
"description": (
"For fill/type: the text to enter. "
"For press: key name (e.g. 'Enter', 'Tab', 'Control+a'). "
"For select: the option value to select."
),
"description": "Text for fill/type, key for press (e.g. 'Enter'), option for select.",
},
"direction": {
"type": "string",
"enum": ["up", "down", "left", "right"],
"default": "down",
"description": "For scroll: direction to scroll.",
"description": "Scroll direction (default: down).",
},
},
"required": ["action"],
@@ -757,12 +739,10 @@ class BrowserScreenshotTool(BaseTool):
@property
def description(self) -> str:
return (
"Take a screenshot of the current browser page and save it to the workspace. "
"IMPORTANT: After calling this tool, immediately call read_workspace_file "
"with the returned file_id to display the image inline to the user — "
"the screenshot is not visible until you do this. "
"With annotate=true (default), @ref labels are overlaid on interactive "
"elements, making it easy to see which @ref ID maps to which element on screen."
"Screenshot the current browser page and save to workspace. "
"annotate=true overlays @ref labels on elements. "
"IMPORTANT: After calling, you MUST immediately call read_workspace_file with the "
"returned file_id to display the image inline."
)
@property
@@ -773,12 +753,12 @@ class BrowserScreenshotTool(BaseTool):
"annotate": {
"type": "boolean",
"default": True,
"description": "Overlay @ref labels on interactive elements (default: true).",
"description": "Overlay @ref labels (default: true).",
},
"filename": {
"type": "string",
"default": "screenshot.png",
"description": "Filename to save in the workspace.",
"description": "Workspace filename (default: screenshot.png).",
},
},
}

View File

@@ -897,7 +897,7 @@ class TestHasLocalSession:
# _save_browser_state
# ---------------------------------------------------------------------------
_GET_MANAGER = "backend.copilot.tools.agent_browser.get_manager"
_GET_MANAGER = "backend.copilot.tools.agent_browser.get_workspace_manager"
def _make_mock_manager():

View File

@@ -935,5 +935,5 @@ class AgentValidator:
for i, error in enumerate(self.errors, 1):
error_message += f"{i}. {error}\n"
logger.error(f"Agent validation failed: {error_message}")
logger.warning(f"Agent validation failed: {error_message}")
return False, error_message

View File

@@ -108,22 +108,12 @@ class AgentOutputTool(BaseTool):
@property
def description(self) -> str:
return """Retrieve execution outputs from agents in the user's library.
Identify the agent using one of:
- agent_name: Fuzzy search in user's library
- library_agent_id: Exact library agent ID
- store_slug: Marketplace format 'username/agent-name'
Select which run to retrieve using:
- execution_id: Specific execution ID
- run_time: 'latest' (default), 'yesterday', 'last week', or ISO date 'YYYY-MM-DD'
Wait for completion (optional):
- wait_if_running: Max seconds to wait if execution is still running (0-300).
If the execution is running/queued, waits up to this many seconds for completion.
Returns current status on timeout. If already finished, returns immediately.
"""
return (
"Retrieve execution outputs from a library agent. "
"Identify by agent_name, library_agent_id, or store_slug. "
"Filter by execution_id or run_time. "
"Optionally wait for running executions."
)
@property
def parameters(self) -> dict[str, Any]:
@@ -132,32 +122,27 @@ class AgentOutputTool(BaseTool):
"properties": {
"agent_name": {
"type": "string",
"description": "Agent name to search for in user's library (fuzzy match)",
"description": "Agent name (fuzzy match).",
},
"library_agent_id": {
"type": "string",
"description": "Exact library agent ID",
"description": "Library agent ID.",
},
"store_slug": {
"type": "string",
"description": "Marketplace identifier: 'username/agent-slug'",
"description": "Marketplace 'username/agent-slug'.",
},
"execution_id": {
"type": "string",
"description": "Specific execution ID to retrieve",
"description": "Specific execution ID.",
},
"run_time": {
"type": "string",
"description": (
"Time filter: 'latest', 'yesterday', 'last week', or 'YYYY-MM-DD'"
),
"description": "Time filter: 'latest', today/yesterday/last week/last 7 days/last month/last 30 days, 'YYYY-MM-DD', or ISO datetime.",
},
"wait_if_running": {
"type": "integer",
"description": (
"Max seconds to wait if execution is still running (0-300). "
"If running, waits for completion. Returns current state on timeout."
),
"description": "Max seconds to wait if still running (0-300). Returns current state on timeout.",
},
},
"required": [],

View File

@@ -41,15 +41,9 @@ class BashExecTool(BaseTool):
@property
def description(self) -> str:
return (
"Execute a Bash command or script. "
"Full Bash scripting is supported (loops, conditionals, pipes, "
"functions, etc.). "
"The working directory is shared with the SDK Read/Write/Edit/Glob/Grep "
"tools — files created by either are immediately visible to both. "
"Execution is killed after the timeout (default 30s, max 120s). "
"Returns stdout and stderr. "
"Useful for file manipulation, data processing, running scripts, "
"and installing packages."
"Execute a Bash command or script. Shares filesystem with SDK file tools. "
"Useful for scripts, data processing, and package installation. "
"Killed after timeout (default 30s, max 120s)."
)
@property
@@ -59,13 +53,11 @@ class BashExecTool(BaseTool):
"properties": {
"command": {
"type": "string",
"description": "Bash command or script to execute.",
"description": "Bash command or script.",
},
"timeout": {
"type": "integer",
"description": (
"Max execution time in seconds (default 30, max 120)."
),
"description": "Max seconds (default 30, max 120).",
"default": 30,
},
},

View File

@@ -30,12 +30,7 @@ class ContinueRunBlockTool(BaseTool):
@property
def description(self) -> str:
return (
"Continue executing a block after human review approval. "
"Use this after a run_block call returned review_required. "
"Pass the review_id from the review_required response. "
"The block will execute with the original pre-approved input data."
)
return "Resume block execution after human review approval. Pass the review_id."
@property
def parameters(self) -> dict[str, Any]:
@@ -44,10 +39,7 @@ class ContinueRunBlockTool(BaseTool):
"properties": {
"review_id": {
"type": "string",
"description": (
"The review_id from a previous review_required response. "
"This resumes execution with the pre-approved input data."
),
"description": "review_id from the review_required response.",
},
},
"required": ["review_id"],

View File

@@ -23,12 +23,8 @@ class CreateAgentTool(BaseTool):
@property
def description(self) -> str:
return (
"Create a new agent workflow. Pass `agent_json` with the complete "
"agent graph JSON you generated using block schemas from find_block. "
"The tool validates, auto-fixes, and saves.\n\n"
"IMPORTANT: Before calling this tool, search for relevant existing agents "
"using find_library_agent that could be used as building blocks. "
"Pass their IDs in the library_agent_ids parameter."
"Create a new agent from JSON (nodes + links). Validates, auto-fixes, and saves. "
"Before calling, search for existing agents with find_library_agent."
)
@property
@@ -42,34 +38,21 @@ class CreateAgentTool(BaseTool):
"properties": {
"agent_json": {
"type": "object",
"description": (
"The agent JSON to validate and save. "
"Must contain 'nodes' and 'links' arrays, and optionally "
"'name' and 'description'."
),
"description": "Agent graph with 'nodes' and 'links' arrays.",
},
"library_agent_ids": {
"type": "array",
"items": {"type": "string"},
"description": (
"List of library agent IDs to use as building blocks."
),
"description": "Library agent IDs as building blocks.",
},
"save": {
"type": "boolean",
"description": (
"Whether to save the agent. Default is true. "
"Set to false for preview only."
),
"description": "Save the agent (default: true). False for preview.",
"default": True,
},
"folder_id": {
"type": "string",
"description": (
"Optional folder ID to save the agent into. "
"If not provided, the agent is saved at root level. "
"Use list_folders to find available folders."
),
"description": "Folder ID to save into (default: root).",
},
},
"required": ["agent_json"],

View File

@@ -23,9 +23,7 @@ class CustomizeAgentTool(BaseTool):
@property
def description(self) -> str:
return (
"Customize a marketplace or template agent. Pass `agent_json` "
"with the complete customized agent JSON. The tool validates, "
"auto-fixes, and saves."
"Customize a marketplace/template agent. Validates, auto-fixes, and saves."
)
@property
@@ -39,32 +37,21 @@ class CustomizeAgentTool(BaseTool):
"properties": {
"agent_json": {
"type": "object",
"description": (
"Complete customized agent JSON to validate and save. "
"Optionally include 'name' and 'description'."
),
"description": "Customized agent JSON with nodes and links.",
},
"library_agent_ids": {
"type": "array",
"items": {"type": "string"},
"description": (
"List of library agent IDs to use as building blocks."
),
"description": "Library agent IDs as building blocks.",
},
"save": {
"type": "boolean",
"description": (
"Whether to save the customized agent. Default is true."
),
"description": "Save the agent (default: true). False for preview.",
"default": True,
},
"folder_id": {
"type": "string",
"description": (
"Optional folder ID to save the agent into. "
"If not provided, the agent is saved at root level. "
"Use list_folders to find available folders."
),
"description": "Folder ID to save into (default: root).",
},
},
"required": ["agent_json"],

View File

@@ -21,9 +21,11 @@ Lifecycle
Cost control
------------
Sandboxes are created with a configurable ``on_timeout`` lifecycle action
(default: ``"pause"``). The explicit per-turn ``pause_sandbox()`` call is the
primary mechanism; the lifecycle setting is a safety net. Paused sandboxes are
free.
(default: ``"pause"``) and ``auto_resume`` (default: ``True``). The explicit
per-turn ``pause_sandbox()`` call is the primary mechanism; the lifecycle
timeout is a safety net (default: 5 min). ``auto_resume`` ensures that paused
sandboxes wake transparently on SDK activity, making the aggressive safety-net
timeout safe. Paused sandboxes are free.
The sandbox_id is stored in Redis. The same key doubles as a creation lock:
a ``"creating"`` sentinel value is written with a short TTL while a new sandbox
@@ -40,6 +42,7 @@ import logging
from typing import Any, Awaitable, Callable, Literal
from e2b import AsyncSandbox
from e2b.sandbox.sandbox_api import SandboxLifecycle
from backend.data.redis_client import get_redis_async
@@ -116,9 +119,10 @@ async def get_or_create_sandbox(
removes the need for a separate lock key.
*timeout* controls how long the e2b sandbox may run continuously before
the ``on_timeout`` lifecycle rule fires (default: 3 h).
the ``on_timeout`` lifecycle rule fires (default: 5 min).
*on_timeout* controls what happens on timeout: ``"pause"`` (default, free)
or ``"kill"``.
or ``"kill"``. When ``"pause"``, ``auto_resume`` is enabled so paused
sandboxes wake transparently on SDK activity.
"""
redis = await get_redis_async()
key = _sandbox_key(session_id)
@@ -156,11 +160,15 @@ async def get_or_create_sandbox(
# We hold the slot — create the sandbox.
try:
lifecycle = SandboxLifecycle(
on_timeout=on_timeout,
auto_resume=on_timeout == "pause",
)
sandbox = await AsyncSandbox.create(
template=template,
api_key=api_key,
timeout=timeout,
lifecycle={"on_timeout": on_timeout},
lifecycle=lifecycle,
)
try:
await _set_stored_sandbox_id(session_id, sandbox.sandbox_id)

View File

@@ -157,14 +157,17 @@ class TestGetOrCreateSandbox:
assert result is new_sb
mock_cls.create.assert_awaited_once()
# Verify lifecycle param is set
# Verify lifecycle: pause + auto_resume enabled
_, kwargs = mock_cls.create.call_args
assert kwargs.get("lifecycle") == {"on_timeout": "pause"}
assert kwargs.get("lifecycle") == {
"on_timeout": "pause",
"auto_resume": True,
}
# sandbox_id should be saved to Redis
redis.set.assert_awaited()
def test_create_with_on_timeout_kill(self):
"""on_timeout='kill' is passed through to AsyncSandbox.create."""
"""on_timeout='kill' disables auto_resume automatically."""
new_sb = _mock_sandbox("sb-new")
redis = _mock_redis(set_nx_result=True, stored_sandbox_id=None)
with (
@@ -179,7 +182,10 @@ class TestGetOrCreateSandbox:
)
_, kwargs = mock_cls.create.call_args
assert kwargs.get("lifecycle") == {"on_timeout": "kill"}
assert kwargs.get("lifecycle") == {
"on_timeout": "kill",
"auto_resume": False,
}
def test_create_failure_releases_slot(self):
"""If sandbox creation fails, the Redis creation slot is deleted."""

View File

@@ -23,12 +23,8 @@ class EditAgentTool(BaseTool):
@property
def description(self) -> str:
return (
"Edit an existing agent. Pass `agent_json` with the complete "
"updated agent JSON you generated. The tool validates, auto-fixes, "
"and saves.\n\n"
"IMPORTANT: Before calling this tool, if the changes involve adding new "
"functionality, search for relevant existing agents using find_library_agent "
"that could be used as building blocks."
"Edit an existing agent. Validates, auto-fixes, and saves. "
"Before calling, search for existing agents with find_library_agent."
)
@property
@@ -42,33 +38,20 @@ class EditAgentTool(BaseTool):
"properties": {
"agent_id": {
"type": "string",
"description": (
"The ID of the agent to edit. "
"Can be a graph ID or library agent ID."
),
"description": "Graph ID or library agent ID to edit.",
},
"agent_json": {
"type": "object",
"description": (
"Complete updated agent JSON to validate and save. "
"Must contain 'nodes' and 'links'. "
"Include 'name' and/or 'description' if they need "
"to be updated."
),
"description": "Updated agent JSON with nodes and links.",
},
"library_agent_ids": {
"type": "array",
"items": {"type": "string"},
"description": (
"List of library agent IDs to use as building blocks for the changes."
),
"description": "Library agent IDs as building blocks.",
},
"save": {
"type": "boolean",
"description": (
"Whether to save the changes. "
"Default is true. Set to false for preview only."
),
"description": "Save changes (default: true). False for preview.",
"default": True,
},
},

View File

@@ -134,11 +134,7 @@ class SearchFeatureRequestsTool(BaseTool):
@property
def description(self) -> str:
return (
"Search existing feature requests to check if a similar request "
"already exists before creating a new one. Returns matching feature "
"requests with their ID, title, and description."
)
return "Search existing feature requests. Check before creating a new one."
@property
def parameters(self) -> dict[str, Any]:
@@ -234,14 +230,9 @@ class CreateFeatureRequestTool(BaseTool):
@property
def description(self) -> str:
return (
"Create a new feature request or add a customer need to an existing one. "
"Always search first with search_feature_requests to avoid duplicates. "
"If a matching request exists, pass its ID as existing_issue_id to add "
"the user's need to it instead of creating a duplicate. "
"IMPORTANT: Never include personally identifiable information (PII) in "
"the title or description — no names, emails, phone numbers, company "
"names, or other identifying details. Write titles and descriptions in "
"generic, feature-focused language."
"Create a feature request or add need to existing one. "
"Search first to avoid duplicates. Pass existing_issue_id to add to existing. "
"Never include PII (names, emails, phone numbers, company names) in title/description."
)
@property
@@ -251,28 +242,15 @@ class CreateFeatureRequestTool(BaseTool):
"properties": {
"title": {
"type": "string",
"description": (
"Title for the feature request. Must be generic and "
"feature-focused — do not include any user names, emails, "
"company names, or other PII."
),
"description": "Feature request title. No PII.",
},
"description": {
"type": "string",
"description": (
"Detailed description of what the user wants and why. "
"Must not contain any personally identifiable information "
"(PII) — describe the feature need generically without "
"referencing specific users, companies, or contact details."
),
"description": "What the user wants and why. No PII.",
},
"existing_issue_id": {
"type": "string",
"description": (
"If adding a need to an existing feature request, "
"provide its Linear issue ID (from search results). "
"Omit to create a new feature request."
),
"description": "Linear issue ID to add need to (from search results).",
},
},
"required": ["title", "description"],

View File

@@ -18,9 +18,7 @@ class FindAgentTool(BaseTool):
@property
def description(self) -> str:
return (
"Discover agents from the marketplace based on capabilities and user needs."
)
return "Search marketplace agents by capability."
@property
def parameters(self) -> dict[str, Any]:
@@ -29,7 +27,7 @@ class FindAgentTool(BaseTool):
"properties": {
"query": {
"type": "string",
"description": "Search query describing what the user wants to accomplish. Use single keywords for best results.",
"description": "Search keywords (single keywords work best).",
},
},
"required": ["query"],

View File

@@ -51,14 +51,7 @@ class FindBlockTool(BaseTool):
@property
def description(self) -> str:
return (
"Search for available blocks by name or description. "
"Blocks are reusable components that perform specific tasks like "
"sending emails, making API calls, processing text, etc. "
"IMPORTANT: Use this tool FIRST to get the block's 'id' before calling run_block. "
"The response includes each block's id, name, and description. "
"Call run_block with the block's id **with no inputs** to see detailed inputs/outputs and execute it."
)
return "Search blocks by name or description. Returns block IDs for run_block. Always call this FIRST to get block IDs before using run_block."
@property
def parameters(self) -> dict[str, Any]:
@@ -67,18 +60,11 @@ class FindBlockTool(BaseTool):
"properties": {
"query": {
"type": "string",
"description": (
"Search query to find blocks by name or description. "
"Use keywords like 'email', 'http', 'text', 'ai', etc."
),
"description": "Search keywords (e.g. 'email', 'http', 'ai').",
},
"include_schemas": {
"type": "boolean",
"description": (
"If true, include full input_schema and output_schema "
"for each block. Use when generating agent JSON that "
"needs block schemas. Default is false."
),
"description": "Include full input/output schemas (for agent JSON generation).",
"default": False,
},
},

View File

@@ -19,13 +19,8 @@ class FindLibraryAgentTool(BaseTool):
@property
def description(self) -> str:
return (
"Search for or list agents in the user's library. Use this to find "
"agents the user has already added to their library, including agents "
"they created or added from the marketplace. "
"When creating agents with sub-agent composition, use this to get "
"the agent's graph_id, graph_version, input_schema, and output_schema "
"needed for AgentExecutorBlock nodes. "
"Omit the query to list all agents."
"Search user's library agents. Returns graph_id, schemas for sub-agent composition. "
"Omit query to list all."
)
@property
@@ -35,10 +30,7 @@ class FindLibraryAgentTool(BaseTool):
"properties": {
"query": {
"type": "string",
"description": (
"Search query to find agents by name or description. "
"Omit to list all agents in the library."
),
"description": "Search by name/description. Omit to list all.",
},
},
"required": [],

View File

@@ -22,20 +22,8 @@ class FixAgentGraphTool(BaseTool):
@property
def description(self) -> str:
return (
"Auto-fix common issues in an agent JSON graph. Applies fixes for:\n"
"- Missing or invalid UUIDs on nodes and links\n"
"- StoreValueBlock prerequisites for ConditionBlock\n"
"- Double curly brace escaping in prompt templates\n"
"- AddToList/AddToDictionary prerequisite blocks\n"
"- CodeExecutionBlock output field naming\n"
"- Missing credentials configuration\n"
"- Node X coordinate spacing (800+ units apart)\n"
"- AI model default parameters\n"
"- Link static properties based on input schema\n"
"- Type mismatches (inserts conversion blocks)\n\n"
"Returns the fixed agent JSON plus a list of fixes applied. "
"After fixing, the agent is re-validated. If still invalid, "
"the remaining errors are included in the response."
"Auto-fix common agent JSON issues (UUIDs, types, credentials, spacing, etc.). "
"Returns fixed JSON and list of fixes applied."
)
@property

View File

@@ -42,12 +42,7 @@ class GetAgentBuildingGuideTool(BaseTool):
@property
def description(self) -> str:
return (
"Returns the complete guide for building agent JSON graphs, including "
"block IDs, link structure, AgentInputBlock, AgentOutputBlock, "
"AgentExecutorBlock (for sub-agent composition), and MCPToolBlock usage. "
"Call this before generating agent JSON to ensure correct structure."
)
return "Get the agent JSON building guide (nodes, links, AgentExecutorBlock, MCPToolBlock usage). Call before generating agent JSON."
@property
def parameters(self) -> dict[str, Any]:

View File

@@ -25,8 +25,7 @@ class GetDocPageTool(BaseTool):
@property
def description(self) -> str:
return (
"Get the full content of a documentation page by its path. "
"Use this after search_docs to read the complete content of a relevant page."
"Read full documentation page content by path (from search_docs results)."
)
@property
@@ -36,10 +35,7 @@ class GetDocPageTool(BaseTool):
"properties": {
"path": {
"type": "string",
"description": (
"The path to the documentation file, as returned by search_docs. "
"Example: 'platform/block-sdk-guide.md'"
),
"description": "Doc file path (e.g. 'platform/block-sdk-guide.md').",
},
},
"required": ["path"],

View File

@@ -38,11 +38,7 @@ class GetMCPGuideTool(BaseTool):
@property
def description(self) -> str:
return (
"Returns the MCP tool guide: known hosted server URLs (Notion, Linear, "
"Stripe, Intercom, Cloudflare, Atlassian) and authentication workflow. "
"Call before using run_mcp_tool if you need a server URL or auth info."
)
return "Get MCP server URLs and auth guide."
@property
def parameters(self) -> dict[str, Any]:

View File

@@ -13,6 +13,7 @@ from backend.data.execution import ExecutionContext
from backend.data.model import CredentialsFieldInfo, CredentialsMetaInput
from backend.integrations.creds_manager import IntegrationCredentialsManager
from backend.util.exceptions import BlockError
from backend.util.type import coerce_inputs_to_schema
from .models import BlockOutputResponse, ErrorResponse, ToolResponseBase
from .utils import match_credentials_to_requirements
@@ -111,6 +112,9 @@ async def execute_block(
session_id=session_id,
)
# Coerce non-matching data types to the expected input schema.
coerce_inputs_to_schema(input_data, block.input_schema)
# Execute the block and collect outputs
outputs: dict[str, list[Any]] = defaultdict(list)
async for output_name, output_data in block.execute(

View File

@@ -0,0 +1,333 @@
"""Tests for execute_block type coercion in helpers.py.
Verifies that execute_block() coerces string input values to match the block's
expected input types, mirroring the executor's validate_exec() logic.
This is critical for @@agptfile: expansion, where file content is always a string
but the block may expect structured types (e.g. list[list[str]]).
"""
from typing import Any
from unittest.mock import AsyncMock, MagicMock, patch
import pytest
from backend.copilot.tools.helpers import execute_block
from backend.copilot.tools.models import BlockOutputResponse
def _make_block_schema(annotations: dict[str, Any]) -> MagicMock:
"""Create a mock input_schema with model_fields matching the given annotations."""
schema = MagicMock()
# coerce_inputs_to_schema uses model_fields (Pydantic v2 API)
model_fields = {}
for name, ann in annotations.items():
field = MagicMock()
field.annotation = ann
model_fields[name] = field
schema.model_fields = model_fields
return schema
def _make_block(
block_id: str,
name: str,
annotations: dict[str, Any],
outputs: dict[str, list[Any]] | None = None,
) -> MagicMock:
"""Create a mock block with typed annotations and a simple execute method."""
block = MagicMock()
block.id = block_id
block.name = name
block.input_schema = _make_block_schema(annotations)
captured_inputs: dict[str, Any] = {}
async def mock_execute(input_data: dict, **_kwargs: Any):
captured_inputs.update(input_data)
for output_name, values in (outputs or {"result": ["ok"]}).items():
for v in values:
yield output_name, v
block.execute = mock_execute
block._captured_inputs = captured_inputs
return block
_TEST_SESSION_ID = "test-session-coerce"
_TEST_USER_ID = "test-user-coerce"
@pytest.mark.asyncio(loop_scope="session")
async def test_coerce_json_string_to_nested_list():
"""JSON string → list[list[str]] (Google Sheets CSV import case)."""
block = _make_block(
"sheets-write",
"Google Sheets Write",
{"values": list[list[str]], "spreadsheet_id": str},
)
mock_workspace_db = MagicMock()
mock_workspace_db.get_or_create_workspace = AsyncMock(
return_value=MagicMock(id="ws-1")
)
with patch(
"backend.copilot.tools.helpers.workspace_db",
return_value=mock_workspace_db,
):
response = await execute_block(
block=block,
block_id="sheets-write",
input_data={
"values": '[["Name","Score"],["Alice","90"],["Bob","85"]]',
"spreadsheet_id": "abc123",
},
user_id=_TEST_USER_ID,
session_id=_TEST_SESSION_ID,
node_exec_id="exec-1",
matched_credentials={},
)
assert isinstance(response, BlockOutputResponse)
assert response.success is True
# Verify the input was coerced from string to list[list[str]]
assert block._captured_inputs["values"] == [
["Name", "Score"],
["Alice", "90"],
["Bob", "85"],
]
assert isinstance(block._captured_inputs["values"], list)
assert isinstance(block._captured_inputs["values"][0], list)
@pytest.mark.asyncio(loop_scope="session")
async def test_coerce_json_string_to_list():
"""JSON string → list[str]."""
block = _make_block(
"list-block",
"List Block",
{"items": list[str]},
)
mock_workspace_db = MagicMock()
mock_workspace_db.get_or_create_workspace = AsyncMock(
return_value=MagicMock(id="ws-1")
)
with patch(
"backend.copilot.tools.helpers.workspace_db",
return_value=mock_workspace_db,
):
response = await execute_block(
block=block,
block_id="list-block",
input_data={"items": '["a","b","c"]'},
user_id=_TEST_USER_ID,
session_id=_TEST_SESSION_ID,
node_exec_id="exec-2",
matched_credentials={},
)
assert isinstance(response, BlockOutputResponse)
assert block._captured_inputs["items"] == ["a", "b", "c"]
@pytest.mark.asyncio(loop_scope="session")
async def test_coerce_json_string_to_dict():
"""JSON string → dict[str, str]."""
block = _make_block(
"dict-block",
"Dict Block",
{"config": dict[str, str]},
)
mock_workspace_db = MagicMock()
mock_workspace_db.get_or_create_workspace = AsyncMock(
return_value=MagicMock(id="ws-1")
)
with patch(
"backend.copilot.tools.helpers.workspace_db",
return_value=mock_workspace_db,
):
response = await execute_block(
block=block,
block_id="dict-block",
input_data={"config": '{"key": "value", "foo": "bar"}'},
user_id=_TEST_USER_ID,
session_id=_TEST_SESSION_ID,
node_exec_id="exec-3",
matched_credentials={},
)
assert isinstance(response, BlockOutputResponse)
assert block._captured_inputs["config"] == {"key": "value", "foo": "bar"}
@pytest.mark.asyncio(loop_scope="session")
async def test_no_coercion_when_type_matches():
"""Already-correct types pass through without coercion."""
block = _make_block(
"pass-through",
"Pass Through",
{"values": list[list[str]], "name": str},
)
original_values = [["a", "b"], ["c", "d"]]
mock_workspace_db = MagicMock()
mock_workspace_db.get_or_create_workspace = AsyncMock(
return_value=MagicMock(id="ws-1")
)
with patch(
"backend.copilot.tools.helpers.workspace_db",
return_value=mock_workspace_db,
):
response = await execute_block(
block=block,
block_id="pass-through",
input_data={"values": original_values, "name": "test"},
user_id=_TEST_USER_ID,
session_id=_TEST_SESSION_ID,
node_exec_id="exec-4",
matched_credentials={},
)
assert isinstance(response, BlockOutputResponse)
assert block._captured_inputs["values"] == original_values
assert block._captured_inputs["name"] == "test"
@pytest.mark.asyncio(loop_scope="session")
async def test_coerce_string_to_int():
"""String number → int."""
block = _make_block(
"int-block",
"Int Block",
{"count": int},
)
mock_workspace_db = MagicMock()
mock_workspace_db.get_or_create_workspace = AsyncMock(
return_value=MagicMock(id="ws-1")
)
with patch(
"backend.copilot.tools.helpers.workspace_db",
return_value=mock_workspace_db,
):
response = await execute_block(
block=block,
block_id="int-block",
input_data={"count": "42"},
user_id=_TEST_USER_ID,
session_id=_TEST_SESSION_ID,
node_exec_id="exec-5",
matched_credentials={},
)
assert isinstance(response, BlockOutputResponse)
assert block._captured_inputs["count"] == 42
assert isinstance(block._captured_inputs["count"], int)
@pytest.mark.asyncio(loop_scope="session")
async def test_coerce_skips_none_values():
"""None values are not coerced (they may be optional fields)."""
block = _make_block(
"optional-block",
"Optional Block",
{"data": list[str], "label": str},
)
mock_workspace_db = MagicMock()
mock_workspace_db.get_or_create_workspace = AsyncMock(
return_value=MagicMock(id="ws-1")
)
with patch(
"backend.copilot.tools.helpers.workspace_db",
return_value=mock_workspace_db,
):
response = await execute_block(
block=block,
block_id="optional-block",
input_data={"label": "test"},
user_id=_TEST_USER_ID,
session_id=_TEST_SESSION_ID,
node_exec_id="exec-6",
matched_credentials={},
)
assert isinstance(response, BlockOutputResponse)
# 'data' was not provided, so it should not appear in captured inputs
assert "data" not in block._captured_inputs
@pytest.mark.asyncio(loop_scope="session")
async def test_coerce_union_type_preserves_valid_member():
"""Union-typed fields should not be coerced when the value matches a member."""
block = _make_block(
"union-block",
"Union Block",
{"content": str | list[str]},
)
mock_workspace_db = MagicMock()
mock_workspace_db.get_or_create_workspace = AsyncMock(
return_value=MagicMock(id="ws-1")
)
with patch(
"backend.copilot.tools.helpers.workspace_db",
return_value=mock_workspace_db,
):
response = await execute_block(
block=block,
block_id="union-block",
input_data={"content": ["a", "b"]},
user_id=_TEST_USER_ID,
session_id=_TEST_SESSION_ID,
node_exec_id="exec-7",
matched_credentials={},
)
assert isinstance(response, BlockOutputResponse)
# list[str] should NOT be stringified to '["a", "b"]'
assert block._captured_inputs["content"] == ["a", "b"]
assert isinstance(block._captured_inputs["content"], list)
@pytest.mark.asyncio(loop_scope="session")
async def test_coerce_inner_elements_of_generic():
"""Inner elements of generic containers are recursively coerced."""
block = _make_block(
"inner-coerce",
"Inner Coerce",
{"values": list[str]},
)
mock_workspace_db = MagicMock()
mock_workspace_db.get_or_create_workspace = AsyncMock(
return_value=MagicMock(id="ws-1")
)
with patch(
"backend.copilot.tools.helpers.workspace_db",
return_value=mock_workspace_db,
):
response = await execute_block(
block=block,
block_id="inner-coerce",
# Inner elements are ints, but target is list[str]
input_data={"values": [1, 2, 3]},
user_id=_TEST_USER_ID,
session_id=_TEST_SESSION_ID,
node_exec_id="exec-8",
matched_credentials={},
)
assert isinstance(response, BlockOutputResponse)
# Inner elements should be coerced from int to str
assert block._captured_inputs["values"] == ["1", "2", "3"]
assert all(isinstance(v, str) for v in block._captured_inputs["values"])

View File

@@ -88,10 +88,7 @@ class CreateFolderTool(BaseTool):
@property
def description(self) -> str:
return (
"Create a new folder in the user's library to organize agents. "
"Optionally nest it inside an existing folder using parent_id."
)
return "Create a library folder. Use parent_id to nest inside another folder."
@property
def requires_auth(self) -> bool:
@@ -104,22 +101,19 @@ class CreateFolderTool(BaseTool):
"properties": {
"name": {
"type": "string",
"description": "Name for the new folder (max 100 chars).",
"description": "Folder name (max 100 chars).",
},
"parent_id": {
"type": "string",
"description": (
"ID of the parent folder to nest inside. "
"Omit to create at root level."
),
"description": "Parent folder ID (omit for root).",
},
"icon": {
"type": "string",
"description": "Optional icon identifier for the folder.",
"description": "Icon identifier.",
},
"color": {
"type": "string",
"description": "Optional hex color code (#RRGGBB).",
"description": "Hex color (#RRGGBB).",
},
},
"required": ["name"],
@@ -175,13 +169,8 @@ class ListFoldersTool(BaseTool):
@property
def description(self) -> str:
return (
"List the user's library folders. "
"Omit parent_id to get the full folder tree. "
"Provide parent_id to list only direct children of that folder. "
"Set include_agents=true to also return the agents inside each folder "
"and root-level agents not in any folder. Always set include_agents=true "
"when the user asks about agents, wants to see what's in their folders, "
"or mentions agents alongside folders."
"List library folders. Omit parent_id for full tree. "
"Set include_agents=true when user asks about agents in folders."
)
@property
@@ -195,17 +184,11 @@ class ListFoldersTool(BaseTool):
"properties": {
"parent_id": {
"type": "string",
"description": (
"List children of this folder. "
"Omit to get the full folder tree."
),
"description": "List children of this folder (omit for full tree).",
},
"include_agents": {
"type": "boolean",
"description": (
"Whether to include the list of agents inside each folder. "
"Defaults to false."
),
"description": "Include agents in each folder (default: false).",
},
},
"required": [],
@@ -357,10 +340,7 @@ class MoveFolderTool(BaseTool):
@property
def description(self) -> str:
return (
"Move a folder to a different parent folder. "
"Set target_parent_id to null to move to root level."
)
return "Move a folder. Set target_parent_id to null for root."
@property
def requires_auth(self) -> bool:
@@ -373,14 +353,11 @@ class MoveFolderTool(BaseTool):
"properties": {
"folder_id": {
"type": "string",
"description": "ID of the folder to move.",
"description": "Folder ID.",
},
"target_parent_id": {
"type": ["string", "null"],
"description": (
"ID of the new parent folder. "
"Use null to move to root level."
),
"description": "New parent folder ID (null for root).",
},
},
"required": ["folder_id"],
@@ -433,10 +410,7 @@ class DeleteFolderTool(BaseTool):
@property
def description(self) -> str:
return (
"Delete a folder from the user's library. "
"Agents inside the folder are moved to root level (not deleted)."
)
return "Delete a folder. Agents inside move to root (not deleted)."
@property
def requires_auth(self) -> bool:
@@ -499,10 +473,7 @@ class MoveAgentsToFolderTool(BaseTool):
@property
def description(self) -> str:
return (
"Move one or more agents to a folder. "
"Set folder_id to null to move agents to root level."
)
return "Move agents to a folder. Set folder_id to null for root."
@property
def requires_auth(self) -> bool:
@@ -516,13 +487,11 @@ class MoveAgentsToFolderTool(BaseTool):
"agent_ids": {
"type": "array",
"items": {"type": "string"},
"description": "List of library agent IDs to move.",
"description": "Library agent IDs to move.",
},
"folder_id": {
"type": ["string", "null"],
"description": (
"Target folder ID. Use null to move to root level."
),
"description": "Target folder ID (null for root).",
},
},
"required": ["agent_ids"],

View File

@@ -104,19 +104,11 @@ class RunAgentTool(BaseTool):
@property
def description(self) -> str:
return """Run or schedule an agent from the marketplace or user's library.
The tool automatically handles the setup flow:
- Returns missing inputs if required fields are not provided
- Returns missing credentials if user needs to configure them
- Executes immediately if all requirements are met
- Schedules execution if cron expression is provided
Identify the agent using either:
- username_agent_slug: Marketplace format 'username/agent-name'
- library_agent_id: ID of an agent in the user's library
For scheduled execution, provide: schedule_name, cron, and optionally timezone."""
return (
"Run or schedule an agent. Automatically checks inputs and credentials. "
"Identify by username_agent_slug ('user/agent') or library_agent_id. "
"For scheduling, provide schedule_name + cron."
)
@property
def parameters(self) -> dict[str, Any]:
@@ -125,40 +117,36 @@ class RunAgentTool(BaseTool):
"properties": {
"username_agent_slug": {
"type": "string",
"description": "Agent identifier in format 'username/agent-name'",
"description": "Marketplace format 'username/agent-name'.",
},
"library_agent_id": {
"type": "string",
"description": "Library agent ID from user's library",
"description": "Library agent ID.",
},
"inputs": {
"type": "object",
"description": "Input values for the agent",
"description": "Input values for the agent.",
"additionalProperties": True,
},
"use_defaults": {
"type": "boolean",
"description": "Set to true to run with default values (user must confirm)",
"description": "Run with default values (confirm with user first).",
},
"schedule_name": {
"type": "string",
"description": "Name for scheduled execution (triggers scheduling mode)",
"description": "Name for scheduled execution.",
},
"cron": {
"type": "string",
"description": "Cron expression (5 fields: min hour day month weekday)",
"description": "Cron expression (min hour day month weekday).",
},
"timezone": {
"type": "string",
"description": "IANA timezone for schedule (default: UTC)",
"description": "IANA timezone (default: UTC).",
},
"wait_for_result": {
"type": "integer",
"description": (
"Max seconds to wait for execution to complete (0-300). "
"If >0, blocks until the execution finishes or times out. "
"Returns execution outputs when complete."
),
"description": "Max seconds to wait for completion (0-300).",
},
},
"required": [],

View File

@@ -12,6 +12,7 @@ from backend.copilot.constants import (
COPILOT_SESSION_PREFIX,
)
from backend.copilot.model import ChatSession
from backend.copilot.sdk.file_ref import FileRefExpansionError, expand_file_refs_in_args
from backend.data.db_accessors import review_db
from backend.data.execution import ExecutionContext
@@ -44,13 +45,10 @@ class RunBlockTool(BaseTool):
@property
def description(self) -> str:
return (
"Execute a specific block with the provided input data. "
"IMPORTANT: You MUST call find_block first to get the block's 'id' - "
"do NOT guess or make up block IDs. "
"On first attempt (without input_data), returns detailed schema showing "
"required inputs and outputs. Then call again with proper input_data to execute. "
"If a block requires human review, use continue_run_block with the "
"review_id after the user approves."
"Execute a block. IMPORTANT: Always get block_id from find_block first "
"— do NOT guess or fabricate IDs. "
"Call with empty input_data to see schema, then with data to execute. "
"If review_required, use continue_run_block."
)
@property
@@ -60,28 +58,14 @@ class RunBlockTool(BaseTool):
"properties": {
"block_id": {
"type": "string",
"description": (
"The block's 'id' field from find_block results. "
"NEVER guess this - always get it from find_block first."
),
},
"block_name": {
"type": "string",
"description": (
"The block's human-readable name from find_block results. "
"Used for display purposes in the UI."
),
"description": "Block ID from find_block results.",
},
"input_data": {
"type": "object",
"description": (
"Input values for the block. "
"First call with empty {} to see the block's schema, "
"then call again with proper values to execute."
),
"description": "Input values. Use {} first to see schema.",
},
},
"required": ["block_id", "block_name", "input_data"],
"required": ["block_id", "input_data"],
}
@property
@@ -197,6 +181,29 @@ class RunBlockTool(BaseTool):
session_id=session_id,
)
# Expand @@agptfile: refs in input_data with the block's input
# schema. The generic _truncating wrapper skips opaque object
# properties (input_data has no declared inner properties in the
# tool schema), so file ref tokens are still intact here.
# Using the block's schema lets us return raw text for string-typed
# fields and parsed structures for list/dict-typed fields.
if input_data:
try:
input_data = await expand_file_refs_in_args(
input_data,
user_id,
session,
input_schema=input_schema,
)
except FileRefExpansionError as exc:
return ErrorResponse(
message=(
f"Failed to resolve file reference: {exc}. "
"Ensure the file exists before referencing it."
),
session_id=session_id,
)
if missing_credentials:
# Return setup requirements response with missing credentials
credentials_fields_info = block.input_schema.get_credentials_fields_info()

View File

@@ -57,10 +57,9 @@ class RunMCPToolTool(BaseTool):
@property
def description(self) -> str:
return (
"Connect to an MCP (Model Context Protocol) server to discover and execute its tools. "
"Two-step: (1) call with server_url to list available tools, "
"(2) call again with server_url + tool_name + tool_arguments to execute. "
"Call get_mcp_guide for known server URLs and auth details."
"Discover and execute MCP server tools. "
"Call with server_url only to list tools, then with tool_name + tool_arguments to execute. "
"Call get_mcp_guide first for server URLs and auth."
)
@property
@@ -70,24 +69,15 @@ class RunMCPToolTool(BaseTool):
"properties": {
"server_url": {
"type": "string",
"description": (
"URL of the MCP server (Streamable HTTP endpoint), "
"e.g. https://mcp.example.com/mcp"
),
"description": "MCP server URL (Streamable HTTP endpoint).",
},
"tool_name": {
"type": "string",
"description": (
"Name of the MCP tool to execute. "
"Omit on first call to discover available tools."
),
"description": "Tool to execute. Omit to discover available tools.",
},
"tool_arguments": {
"type": "object",
"description": (
"Arguments to pass to the selected tool. "
"Must match the tool's input schema returned during discovery."
),
"description": "Arguments matching the tool's input schema.",
},
},
"required": ["server_url"],
@@ -184,10 +174,12 @@ class RunMCPToolTool(BaseTool):
if e.status_code in _AUTH_STATUS_CODES and not creds:
# Server requires auth and user has no stored credentials
return self._build_setup_requirements(server_url, session_id)
logger.warning("MCP HTTP error for %s: %s", server_host(server_url), e)
host = server_host(server_url)
logger.warning("MCP HTTP error for %s: status=%s", host, e.status_code)
return ErrorResponse(
message=f"MCP server returned HTTP {e.status_code}: {e}",
message=(f"MCP request to {host} failed with HTTP {e.status_code}."),
session_id=session_id,
error=f"HTTP {e.status_code}: {str(e)[:300]}",
)
except MCPClientError as e:

View File

@@ -38,11 +38,7 @@ class SearchDocsTool(BaseTool):
@property
def description(self) -> str:
return (
"Search the AutoGPT platform documentation for information about "
"how to use the platform, build agents, configure blocks, and more. "
"Returns relevant documentation sections. Use get_doc_page to read full content."
)
return "Search platform documentation by keyword. Use get_doc_page to read full results."
@property
def parameters(self) -> dict[str, Any]:
@@ -51,10 +47,7 @@ class SearchDocsTool(BaseTool):
"properties": {
"query": {
"type": "string",
"description": (
"Search query to find relevant documentation. "
"Use natural language to describe what you're looking for."
),
"description": "Documentation search query.",
},
},
"required": ["query"],

View File

@@ -580,6 +580,49 @@ async def test_auth_error_with_existing_creds_returns_error():
assert "403" in response.message
@pytest.mark.asyncio(loop_scope="session")
async def test_http_error_returns_clean_message_with_collapsible_detail():
"""Non-auth HTTP errors return a clean message with raw detail in the `error` field."""
from backend.util.request import HTTPClientError
tool = RunMCPToolTool()
session = make_session(_USER_ID)
with patch(
"backend.copilot.tools.run_mcp_tool.validate_url_host", new_callable=AsyncMock
):
with patch(
"backend.copilot.tools.run_mcp_tool.auto_lookup_mcp_credential",
new_callable=AsyncMock,
return_value=None,
):
mock_client = AsyncMock()
mock_client.initialize = AsyncMock(
side_effect=HTTPClientError(
"<!doctype html><html><body>Not Found</body></html>",
status_code=404,
)
)
with patch(
"backend.copilot.tools.run_mcp_tool.MCPClient",
return_value=mock_client,
):
response = await tool._execute(
user_id=_USER_ID,
session=session,
server_url=_SERVER_URL,
)
assert isinstance(response, ErrorResponse)
assert "404" in response.message
# Raw HTML body must NOT leak into the user-facing message
assert "<!doctype" not in response.message
# Raw detail (including original body) goes in the collapsible `error` field
assert response.error is not None
assert "404" in response.error
assert "<!doctype" in response.error.lower()
@pytest.mark.asyncio(loop_scope="session")
async def test_mcp_client_error_returns_error_response():
"""MCPClientError (protocol-level) maps to a clean ErrorResponse."""

View File

@@ -0,0 +1,81 @@
"""Schema regression tests for all registered CoPilot tools.
Validates that every tool in TOOL_REGISTRY produces a well-formed schema:
- description is non-empty
- all `required` fields exist in `properties`
- every property has a `type` and `description`
- total token budget does not regress past 8000 tokens
"""
import json
import pytest
import tiktoken
from backend.copilot.tools import TOOL_REGISTRY
_TOKEN_BUDGET = 8_000
def _get_all_tool_schemas() -> list[tuple[str, object]]:
"""Return (tool_name, openai_schema) pairs for every registered tool."""
return [(name, tool.as_openai_tool()) for name, tool in TOOL_REGISTRY.items()]
_ALL_SCHEMAS = _get_all_tool_schemas()
@pytest.mark.parametrize(
"tool_name,schema",
_ALL_SCHEMAS,
ids=[name for name, _ in _ALL_SCHEMAS],
)
class TestToolSchema:
"""Validate schema invariants for every registered tool."""
def test_description_non_empty(self, tool_name: str, schema: dict) -> None:
desc = schema["function"].get("description", "")
assert desc, f"Tool '{tool_name}' has an empty description"
def test_required_fields_exist_in_properties(
self, tool_name: str, schema: dict
) -> None:
params = schema["function"].get("parameters", {})
properties = params.get("properties", {})
required = params.get("required", [])
for field in required:
assert field in properties, (
f"Tool '{tool_name}': required field '{field}' "
f"not found in properties {list(properties.keys())}"
)
def test_every_property_has_type_and_description(
self, tool_name: str, schema: dict
) -> None:
params = schema["function"].get("parameters", {})
properties = params.get("properties", {})
for prop_name, prop_def in properties.items():
assert (
"type" in prop_def
), f"Tool '{tool_name}', property '{prop_name}' is missing 'type'"
assert (
"description" in prop_def
), f"Tool '{tool_name}', property '{prop_name}' is missing 'description'"
def test_total_schema_token_budget() -> None:
"""Assert total tool schema size stays under the token budget.
This locks in the 34% token reduction from #12398 and prevents future
description bloat from eroding the gains. Budget is set to 8000 tokens.
Note: this measures tool JSON only (not the full system prompt); the actual
baseline for tool schemas alone is ~6470 tokens, giving ~19% headroom.
"""
schemas = [tool.as_openai_tool() for tool in TOOL_REGISTRY.values()]
serialized = json.dumps(schemas)
enc = tiktoken.get_encoding("cl100k_base")
total_tokens = len(enc.encode(serialized))
assert total_tokens < _TOKEN_BUDGET, (
f"Tool schemas use {total_tokens} tokens, exceeding budget of {_TOKEN_BUDGET}. "
f"Description bloat detected — trim descriptions or raise the budget intentionally."
)

View File

@@ -21,19 +21,7 @@ class ValidateAgentGraphTool(BaseTool):
@property
def description(self) -> str:
return (
"Validate an agent JSON graph for correctness. Checks:\n"
"- All block_ids reference real blocks\n"
"- All links reference valid source/sink nodes and fields\n"
"- Required input fields are wired or have defaults\n"
"- Data types are compatible across links\n"
"- Nested sink links use correct notation\n"
"- Prompt templates use proper curly brace escaping\n"
"- AgentExecutorBlock configurations are valid\n\n"
"Call this after generating agent JSON to verify correctness. "
"If validation fails, either fix issues manually based on the error "
"descriptions, or call fix_agent_graph to auto-fix common problems."
)
return "Validate agent JSON for correctness (block_ids, links, types, schemas). On failure, use fix_agent_graph to auto-fix."
@property
def requires_auth(self) -> bool:
@@ -46,11 +34,7 @@ class ValidateAgentGraphTool(BaseTool):
"properties": {
"agent_json": {
"type": "object",
"description": (
"The agent JSON to validate. Must contain 'nodes' and 'links' arrays. "
"Each node needs: id (UUID), block_id, input_default, metadata. "
"Each link needs: id (UUID), source_id, source_name, sink_id, sink_name."
),
"description": "Agent JSON with 'nodes' and 'links' arrays.",
},
},
"required": ["agent_json"],

View File

@@ -59,13 +59,7 @@ class WebFetchTool(BaseTool):
@property
def description(self) -> str:
return (
"Fetch the content of a public web page by URL. "
"Returns readable text extracted from HTML by default. "
"Useful for reading documentation, articles, and API responses. "
"Only supports HTTP/HTTPS GET requests to public URLs "
"(private/internal network addresses are blocked)."
)
return "Fetch a public web page. Public URLs only — internal addresses blocked. Returns readable text from HTML by default."
@property
def parameters(self) -> dict[str, Any]:
@@ -74,14 +68,11 @@ class WebFetchTool(BaseTool):
"properties": {
"url": {
"type": "string",
"description": "The public HTTP/HTTPS URL to fetch.",
"description": "Public HTTP/HTTPS URL.",
},
"extract_text": {
"type": "boolean",
"description": (
"If true (default), extract readable text from HTML. "
"If false, return raw content."
),
"description": "Extract text from HTML (default: true).",
"default": True,
},
},

View File

@@ -10,11 +10,11 @@ from pydantic import BaseModel
from backend.copilot.context import (
E2B_WORKDIR,
get_current_sandbox,
get_workspace_manager,
resolve_sandbox_path,
)
from backend.copilot.model import ChatSession
from backend.copilot.tools.sandbox import make_session_path
from backend.data.db_accessors import workspace_db
from backend.util.settings import Config
from backend.util.virus_scanner import scan_content_safe
from backend.util.workspace import WorkspaceManager
@@ -218,12 +218,6 @@ def _is_text_mime(mime_type: str) -> bool:
return any(mime_type.startswith(t) for t in _TEXT_MIME_PREFIXES)
async def get_manager(user_id: str, session_id: str) -> WorkspaceManager:
"""Create a session-scoped WorkspaceManager."""
workspace = await workspace_db().get_or_create_workspace(user_id)
return WorkspaceManager(user_id, workspace.id, session_id)
async def _resolve_file(
manager: WorkspaceManager,
file_id: str | None,
@@ -327,13 +321,7 @@ class ListWorkspaceFilesTool(BaseTool):
@property
def description(self) -> str:
return (
"List files in the user's persistent workspace (cloud storage). "
"These files survive across sessions. "
"For ephemeral session files, use the SDK Read/Glob tools instead. "
"Returns file names, paths, sizes, and metadata. "
"Optionally filter by path prefix."
)
return "List persistent workspace files. For ephemeral session files, use SDK Glob/Read instead. Optionally filter by path prefix."
@property
def parameters(self) -> dict[str, Any]:
@@ -342,24 +330,17 @@ class ListWorkspaceFilesTool(BaseTool):
"properties": {
"path_prefix": {
"type": "string",
"description": (
"Optional path prefix to filter files "
"(e.g., '/documents/' to list only files in documents folder). "
"By default, only files from the current session are listed."
),
"description": "Filter by path prefix (e.g. '/documents/').",
},
"limit": {
"type": "integer",
"description": "Maximum number of files to return (default 50, max 100)",
"description": "Max files to return (default 50, max 100).",
"minimum": 1,
"maximum": 100,
},
"include_all_sessions": {
"type": "boolean",
"description": (
"If true, list files from all sessions. "
"Default is false (only current session's files)."
),
"description": "Include files from all sessions (default: false).",
},
},
"required": [],
@@ -386,7 +367,7 @@ class ListWorkspaceFilesTool(BaseTool):
include_all_sessions: bool = kwargs.get("include_all_sessions", False)
try:
manager = await get_manager(user_id, session_id)
manager = await get_workspace_manager(user_id, session_id)
files = await manager.list_files(
path=path_prefix, limit=limit, include_all_sessions=include_all_sessions
)
@@ -442,18 +423,10 @@ class ReadWorkspaceFileTool(BaseTool):
@property
def description(self) -> str:
return (
"Read a file from the user's persistent workspace (cloud storage). "
"These files survive across sessions. "
"For ephemeral session files, use the SDK Read tool instead. "
"Specify either file_id or path to identify the file. "
"For small text files, returns content directly. "
"For large or binary files, returns metadata and a download URL. "
"Use 'save_to_path' to copy the file to the working directory "
"(sandbox or ephemeral) for processing with bash_exec or file tools. "
"Use 'offset' and 'length' for paginated reads of large files "
"(e.g., persisted tool outputs). "
"Paths are scoped to the current session by default. "
"Use /sessions/<session_id>/... for cross-session access."
"Read a file from persistent workspace. Specify file_id or path. "
"Small text/image files return inline; large/binary return metadata+URL. "
"Use save_to_path to copy to working dir for processing. "
"Use offset/length for paginated reads."
)
@property
@@ -463,48 +436,30 @@ class ReadWorkspaceFileTool(BaseTool):
"properties": {
"file_id": {
"type": "string",
"description": "The file's unique ID (from list_workspace_files)",
"description": "File ID from list_workspace_files.",
},
"path": {
"type": "string",
"description": (
"The virtual file path (e.g., '/documents/report.pdf'). "
"Scoped to current session by default."
),
"description": "Virtual file path (e.g. '/documents/report.pdf').",
},
"save_to_path": {
"type": "string",
"description": (
"If provided, save the file to this path in the working "
"directory (cloud sandbox when E2B is active, or "
"ephemeral dir otherwise) so it can be processed with "
"bash_exec or file tools. "
"The file content is still returned in the response."
),
"description": "Copy file to this working directory path for processing.",
},
"force_download_url": {
"type": "boolean",
"description": (
"If true, always return metadata+URL instead of inline content. "
"Default is false (auto-selects based on file size/type)."
),
"description": "Always return metadata+URL instead of inline content.",
},
"offset": {
"type": "integer",
"description": (
"Character offset to start reading from (0-based). "
"Use with 'length' for paginated reads of large files."
),
"description": "Character offset for paginated reads (0-based).",
},
"length": {
"type": "integer",
"description": (
"Maximum number of characters to return. "
"Defaults to full file. Use with 'offset' for paginated reads."
),
"description": "Max characters to return for paginated reads.",
},
},
"required": [], # At least one must be provided
"required": [], # At least one of file_id or path must be provided
}
@property
@@ -536,7 +491,7 @@ class ReadWorkspaceFileTool(BaseTool):
)
try:
manager = await get_manager(user_id, session_id)
manager = await get_workspace_manager(user_id, session_id)
resolved = await _resolve_file(manager, file_id, path, session_id)
if isinstance(resolved, ErrorResponse):
return resolved
@@ -659,15 +614,9 @@ class WriteWorkspaceFileTool(BaseTool):
@property
def description(self) -> str:
return (
"Write or create a file in the user's persistent workspace (cloud storage). "
"These files survive across sessions. "
"For ephemeral session files, use the SDK Write tool instead. "
"Provide content as plain text via 'content', OR base64-encoded via "
"'content_base64', OR copy a file from the ephemeral working directory "
"via 'source_path'. Exactly one of these three is required. "
f"Maximum file size is {Config().max_file_size_mb}MB. "
"Files are saved to the current session's folder by default. "
"Use /sessions/<session_id>/... for cross-session access."
"Write a file to persistent workspace (survives across sessions). "
"Provide exactly one of: content (text), content_base64 (binary), "
f"or source_path (copy from working dir). Max {Config().max_file_size_mb}MB."
)
@property
@@ -677,51 +626,31 @@ class WriteWorkspaceFileTool(BaseTool):
"properties": {
"filename": {
"type": "string",
"description": "Name for the file (e.g., 'report.pdf')",
"description": "Filename (e.g. 'report.pdf').",
},
"content": {
"type": "string",
"description": (
"Plain text content to write. Use this for text files "
"(code, configs, documents, etc.). "
"Mutually exclusive with content_base64 and source_path."
),
"description": "Plain text content. Mutually exclusive with content_base64/source_path.",
},
"content_base64": {
"type": "string",
"description": (
"Base64-encoded file content. Use this for binary files "
"(images, PDFs, etc.). "
"Mutually exclusive with content and source_path."
),
"description": "Base64-encoded binary content. Mutually exclusive with content/source_path.",
},
"source_path": {
"type": "string",
"description": (
"Path to a file in the ephemeral working directory to "
"copy to workspace (e.g., '/tmp/copilot-.../output.csv'). "
"Use this to persist files created by bash_exec or SDK Write. "
"Mutually exclusive with content and content_base64."
),
"description": "Working directory path to copy to workspace. Mutually exclusive with content/content_base64.",
},
"path": {
"type": "string",
"description": (
"Optional virtual path where to save the file "
"(e.g., '/documents/report.pdf'). "
"Defaults to '/{filename}'. Scoped to current session."
),
"description": "Virtual path (e.g. '/documents/report.pdf'). Defaults to '/{filename}'.",
},
"mime_type": {
"type": "string",
"description": (
"Optional MIME type of the file. "
"Auto-detected from filename if not provided."
),
"description": "MIME type. Auto-detected from filename if omitted.",
},
"overwrite": {
"type": "boolean",
"description": "Whether to overwrite if file exists at path (default: false)",
"description": "Overwrite if file exists (default: false).",
},
},
"required": ["filename"],
@@ -772,7 +701,7 @@ class WriteWorkspaceFileTool(BaseTool):
try:
await scan_content_safe(content, filename=filename)
manager = await get_manager(user_id, session_id)
manager = await get_workspace_manager(user_id, session_id)
rec = await manager.write_file(
content=content,
filename=filename,
@@ -848,12 +777,7 @@ class DeleteWorkspaceFileTool(BaseTool):
@property
def description(self) -> str:
return (
"Delete a file from the user's persistent workspace (cloud storage). "
"Specify either file_id or path to identify the file. "
"Paths are scoped to the current session by default. "
"Use /sessions/<session_id>/... for cross-session access."
)
return "Delete a file from persistent workspace. Specify file_id or path."
@property
def parameters(self) -> dict[str, Any]:
@@ -862,17 +786,14 @@ class DeleteWorkspaceFileTool(BaseTool):
"properties": {
"file_id": {
"type": "string",
"description": "The file's unique ID (from list_workspace_files)",
"description": "File ID from list_workspace_files.",
},
"path": {
"type": "string",
"description": (
"The virtual file path (e.g., '/documents/report.pdf'). "
"Scoped to current session by default."
),
"description": "Virtual file path.",
},
},
"required": [], # At least one must be provided
"required": [], # At least one of file_id or path must be provided
}
@property
@@ -899,7 +820,7 @@ class DeleteWorkspaceFileTool(BaseTool):
)
try:
manager = await get_manager(user_id, session_id)
manager = await get_workspace_manager(user_id, session_id)
resolved = await _resolve_file(manager, file_id, path, session_id)
if isinstance(resolved, ErrorResponse):
return resolved

View File

@@ -0,0 +1,750 @@
import asyncio
import csv
import io
import logging
import os
import re
import socket
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Literal, Optional
from uuid import uuid4
import prisma.enums
import prisma.models
import prisma.types
from prisma.errors import UniqueViolationError
from pydantic import BaseModel, EmailStr, TypeAdapter, ValidationError
from backend.data.db import transaction
from backend.data.model import User
from backend.data.redis_client import get_redis_async
from backend.data.tally import get_business_understanding_input_from_tally, mask_email
from backend.data.understanding import (
BusinessUnderstandingInput,
merge_business_understanding_data,
)
from backend.data.user import get_user_by_email, get_user_by_id
from backend.executor.cluster_lock import AsyncClusterLock
from backend.util.exceptions import (
NotAuthorizedError,
NotFoundError,
PreconditionFailed,
)
from backend.util.json import SafeJson
from backend.util.settings import Settings
logger = logging.getLogger(__name__)
_settings = Settings()
_WORKER_ID = f"{socket.gethostname()}:{os.getpid()}"
_tally_seed_tasks: dict[str, asyncio.Task] = {}
_TALLY_STALE_SECONDS = 300
_MAX_TALLY_ERROR_LENGTH = 200
_email_adapter = TypeAdapter(EmailStr)
MAX_BULK_INVITE_FILE_BYTES = 1024 * 1024
MAX_BULK_INVITE_ROWS = 500
class InvitedUserRecord(BaseModel):
id: str
email: str
status: prisma.enums.InvitedUserStatus
auth_user_id: Optional[str] = None
name: Optional[str] = None
tally_understanding: Optional[dict[str, Any]] = None
tally_status: prisma.enums.TallyComputationStatus
tally_computed_at: Optional[datetime] = None
tally_error: Optional[str] = None
created_at: datetime
updated_at: datetime
@classmethod
def from_db(cls, invited_user: "prisma.models.InvitedUser") -> "InvitedUserRecord":
payload = (
invited_user.tallyUnderstanding
if isinstance(invited_user.tallyUnderstanding, dict)
else None
)
return cls(
id=invited_user.id,
email=invited_user.email,
status=invited_user.status,
auth_user_id=invited_user.authUserId,
name=invited_user.name,
tally_understanding=payload,
tally_status=invited_user.tallyStatus,
tally_computed_at=invited_user.tallyComputedAt,
tally_error=invited_user.tallyError,
created_at=invited_user.createdAt,
updated_at=invited_user.updatedAt,
)
class BulkInvitedUserRowResult(BaseModel):
row_number: int
email: Optional[str] = None
name: Optional[str] = None
status: Literal["CREATED", "SKIPPED", "ERROR"]
message: str
invited_user: Optional[InvitedUserRecord] = None
class BulkInvitedUsersResult(BaseModel):
created_count: int
skipped_count: int
error_count: int
results: list[BulkInvitedUserRowResult]
@dataclass(frozen=True)
class _ParsedInviteRow:
row_number: int
email: str
name: Optional[str]
def normalize_email(email: str) -> str:
return email.strip().lower()
def _normalize_name(name: Optional[str]) -> Optional[str]:
if name is None:
return None
normalized = name.strip()
return normalized or None
def _default_profile_name(email: str, preferred_name: Optional[str]) -> str:
if preferred_name:
return preferred_name
local_part = email.split("@", 1)[0].strip()
return local_part or "user"
def _sanitize_username_base(email: str) -> str:
local_part = email.split("@", 1)[0].lower()
sanitized = re.sub(r"[^a-z0-9-]", "", local_part)
sanitized = sanitized.strip("-")
return sanitized[:40] or "user"
async def _generate_unique_profile_username(email: str, tx) -> str:
base = _sanitize_username_base(email)
for _ in range(2):
candidate = f"{base}-{uuid4().hex[:6]}"
existing = await prisma.models.Profile.prisma(tx).find_unique(
where={"username": candidate}
)
if existing is None:
return candidate
raise RuntimeError(f"Unable to generate unique username for {email}")
async def _ensure_default_profile(
user_id: str,
email: str,
preferred_name: Optional[str],
tx,
) -> None:
existing_profile = await prisma.models.Profile.prisma(tx).find_unique(
where={"userId": user_id}
)
if existing_profile is not None:
return
username = await _generate_unique_profile_username(email, tx)
await prisma.models.Profile.prisma(tx).create(
data=prisma.types.ProfileCreateInput(
userId=user_id,
name=_default_profile_name(email, preferred_name),
username=username,
description="I'm new here",
links=[],
avatarUrl="",
)
)
async def _ensure_default_onboarding(user_id: str, tx) -> None:
await prisma.models.UserOnboarding.prisma(tx).upsert(
where={"userId": user_id},
data={
"create": prisma.types.UserOnboardingCreateInput(userId=user_id),
"update": {},
},
)
async def _apply_tally_understanding(
user_id: str,
invited_user: "prisma.models.InvitedUser",
tx,
) -> None:
if not isinstance(invited_user.tallyUnderstanding, dict):
return
try:
input_data = BusinessUnderstandingInput.model_validate(
invited_user.tallyUnderstanding
)
except Exception:
logger.warning(
"Malformed tallyUnderstanding for invited user %s; skipping",
invited_user.id,
exc_info=True,
)
return
payload = merge_business_understanding_data({}, input_data)
await prisma.models.CoPilotUnderstanding.prisma(tx).upsert(
where={"userId": user_id},
data={
"create": {"userId": user_id, "data": SafeJson(payload)},
"update": {"data": SafeJson(payload)},
},
)
async def list_invited_users(
page: int = 1,
page_size: int = 50,
) -> tuple[list[InvitedUserRecord], int]:
total = await prisma.models.InvitedUser.prisma().count()
invited_users = await prisma.models.InvitedUser.prisma().find_many(
order={"createdAt": "desc"},
skip=(page - 1) * page_size,
take=page_size,
)
return [InvitedUserRecord.from_db(iu) for iu in invited_users], total
async def create_invited_user(
email: str, name: Optional[str] = None
) -> InvitedUserRecord:
normalized_email = normalize_email(email)
normalized_name = _normalize_name(name)
existing_user = await prisma.models.User.prisma().find_unique(
where={"email": normalized_email}
)
if existing_user is not None:
raise PreconditionFailed("An active user with this email already exists")
existing_invited_user = await prisma.models.InvitedUser.prisma().find_unique(
where={"email": normalized_email}
)
if existing_invited_user is not None:
raise PreconditionFailed("An invited user with this email already exists")
try:
invited_user = await prisma.models.InvitedUser.prisma().create(
data={
"email": normalized_email,
"name": normalized_name,
"status": prisma.enums.InvitedUserStatus.INVITED,
"tallyStatus": prisma.enums.TallyComputationStatus.PENDING,
}
)
except UniqueViolationError:
raise PreconditionFailed("An invited user with this email already exists")
schedule_invited_user_tally_precompute(invited_user.id)
return InvitedUserRecord.from_db(invited_user)
async def revoke_invited_user(invited_user_id: str) -> InvitedUserRecord:
invited_user = await prisma.models.InvitedUser.prisma().find_unique(
where={"id": invited_user_id}
)
if invited_user is None:
raise NotFoundError(f"Invited user {invited_user_id} not found")
if invited_user.status == prisma.enums.InvitedUserStatus.CLAIMED:
raise PreconditionFailed("Claimed invited users cannot be revoked")
if invited_user.status == prisma.enums.InvitedUserStatus.REVOKED:
return InvitedUserRecord.from_db(invited_user)
revoked_user = await prisma.models.InvitedUser.prisma().update(
where={"id": invited_user_id},
data={"status": prisma.enums.InvitedUserStatus.REVOKED},
)
if revoked_user is None:
raise NotFoundError(f"Invited user {invited_user_id} not found")
return InvitedUserRecord.from_db(revoked_user)
async def retry_invited_user_tally(invited_user_id: str) -> InvitedUserRecord:
invited_user = await prisma.models.InvitedUser.prisma().find_unique(
where={"id": invited_user_id}
)
if invited_user is None:
raise NotFoundError(f"Invited user {invited_user_id} not found")
if invited_user.status == prisma.enums.InvitedUserStatus.REVOKED:
raise PreconditionFailed("Revoked invited users cannot retry Tally seeding")
refreshed_user = await prisma.models.InvitedUser.prisma().update(
where={"id": invited_user_id},
data={
"tallyUnderstanding": None,
"tallyStatus": prisma.enums.TallyComputationStatus.PENDING,
"tallyComputedAt": None,
"tallyError": None,
},
)
if refreshed_user is None:
raise NotFoundError(f"Invited user {invited_user_id} not found")
schedule_invited_user_tally_precompute(invited_user_id)
return InvitedUserRecord.from_db(refreshed_user)
def _decode_bulk_invite_file(content: bytes) -> str:
if len(content) > MAX_BULK_INVITE_FILE_BYTES:
raise ValueError("Invite file exceeds the maximum size of 1 MB")
try:
return content.decode("utf-8-sig")
except UnicodeDecodeError as exc:
raise ValueError("Invite file must be UTF-8 encoded") from exc
def _parse_bulk_invite_csv(text: str) -> list[_ParsedInviteRow]:
indexed_rows: list[tuple[int, list[str]]] = []
for row_number, row in enumerate(csv.reader(io.StringIO(text)), start=1):
normalized_row = [cell.strip() for cell in row]
if any(normalized_row):
indexed_rows.append((row_number, normalized_row))
if not indexed_rows:
return []
header = [cell.lower() for cell in indexed_rows[0][1]]
has_header = "email" in header
email_index = header.index("email") if has_header else 0
name_index: Optional[int] = (
header.index("name")
if has_header and "name" in header
else (1 if not has_header else None)
)
data_rows = indexed_rows[1:] if has_header else indexed_rows
parsed_rows: list[_ParsedInviteRow] = []
for row_number, row in data_rows:
if len(parsed_rows) >= MAX_BULK_INVITE_ROWS:
break
email = row[email_index].strip() if len(row) > email_index else ""
name = (
row[name_index].strip()
if name_index is not None and len(row) > name_index
else ""
)
parsed_rows.append(
_ParsedInviteRow(
row_number=row_number,
email=email,
name=name or None,
)
)
return parsed_rows
def _parse_bulk_invite_text(text: str) -> list[_ParsedInviteRow]:
parsed_rows: list[_ParsedInviteRow] = []
for row_number, raw_line in enumerate(text.splitlines(), start=1):
if len(parsed_rows) >= MAX_BULK_INVITE_ROWS:
break
line = raw_line.strip()
if not line or line.startswith("#"):
continue
parsed_rows.append(
_ParsedInviteRow(
row_number=row_number,
email=line,
name=None,
)
)
return parsed_rows
def _parse_bulk_invite_file(
filename: Optional[str],
content: bytes,
) -> list[_ParsedInviteRow]:
text = _decode_bulk_invite_file(content)
file_name = filename.lower() if filename else ""
parsed_rows = (
_parse_bulk_invite_csv(text)
if file_name.endswith(".csv")
else _parse_bulk_invite_text(text)
)
if not parsed_rows:
raise ValueError("Invite file did not contain any emails")
return parsed_rows
async def bulk_create_invited_users_from_file(
filename: Optional[str],
content: bytes,
) -> BulkInvitedUsersResult:
parsed_rows = _parse_bulk_invite_file(filename, content)
created_count = 0
skipped_count = 0
error_count = 0
results: list[BulkInvitedUserRowResult] = []
seen_emails: set[str] = set()
for row in parsed_rows:
row_name = _normalize_name(row.name)
try:
validated_email = _email_adapter.validate_python(row.email)
except ValidationError:
error_count += 1
results.append(
BulkInvitedUserRowResult(
row_number=row.row_number,
email=row.email or None,
name=row_name,
status="ERROR",
message="Invalid email address",
)
)
continue
normalized_email = normalize_email(str(validated_email))
if normalized_email in seen_emails:
skipped_count += 1
results.append(
BulkInvitedUserRowResult(
row_number=row.row_number,
email=normalized_email,
name=row_name,
status="SKIPPED",
message="Duplicate email in upload file",
)
)
continue
seen_emails.add(normalized_email)
try:
invited_user = await create_invited_user(normalized_email, row_name)
except PreconditionFailed as exc:
skipped_count += 1
results.append(
BulkInvitedUserRowResult(
row_number=row.row_number,
email=normalized_email,
name=row_name,
status="SKIPPED",
message=str(exc),
)
)
except Exception:
masked = mask_email(normalized_email)
logger.exception(
"Failed to create bulk invite for row %s (%s)",
row.row_number,
masked,
)
error_count += 1
results.append(
BulkInvitedUserRowResult(
row_number=row.row_number,
email=normalized_email,
name=row_name,
status="ERROR",
message="Unexpected error creating invite",
)
)
else:
created_count += 1
results.append(
BulkInvitedUserRowResult(
row_number=row.row_number,
email=normalized_email,
name=row_name,
status="CREATED",
message="Invite created",
invited_user=invited_user,
)
)
return BulkInvitedUsersResult(
created_count=created_count,
skipped_count=skipped_count,
error_count=error_count,
results=results,
)
async def _compute_invited_user_tally_seed(invited_user_id: str) -> None:
invited_user = await prisma.models.InvitedUser.prisma().find_unique(
where={"id": invited_user_id}
)
if invited_user is None:
return
if invited_user.status == prisma.enums.InvitedUserStatus.REVOKED:
return
try:
r = await get_redis_async()
except Exception:
r = None
lock: AsyncClusterLock | None = None
if r is not None:
lock = AsyncClusterLock(
redis=r,
key=f"tally_seed:{invited_user_id}",
owner_id=_WORKER_ID,
timeout=_TALLY_STALE_SECONDS,
)
current_owner = await lock.try_acquire()
if current_owner is None:
logger.warn("Redis unvailable for tally lock - skipping tally enrichement")
return
elif current_owner != _WORKER_ID:
logger.debug(
"Tally seed for %s already locked by %s, skipping",
invited_user_id,
current_owner,
)
return
if (
invited_user.tallyStatus == prisma.enums.TallyComputationStatus.RUNNING
and invited_user.updatedAt is not None
):
age = (datetime.now(timezone.utc) - invited_user.updatedAt).total_seconds()
if age < _TALLY_STALE_SECONDS:
logger.debug(
"Tally task for %s still RUNNING (age=%ds), skipping",
invited_user_id,
int(age),
)
return
logger.info(
"Tally task for %s is stale (age=%ds), re-running",
invited_user_id,
int(age),
)
await prisma.models.InvitedUser.prisma().update(
where={"id": invited_user_id},
data={
"tallyStatus": prisma.enums.TallyComputationStatus.RUNNING,
"tallyError": None,
},
)
try:
input_data = await get_business_understanding_input_from_tally(
invited_user.email,
require_api_key=True,
)
payload = (
SafeJson(input_data.model_dump(exclude_none=True))
if input_data is not None
else None
)
await prisma.models.InvitedUser.prisma().update(
where={"id": invited_user_id},
data={
"tallyUnderstanding": payload,
"tallyStatus": prisma.enums.TallyComputationStatus.READY,
"tallyComputedAt": datetime.now(timezone.utc),
"tallyError": None,
},
)
except Exception as exc:
logger.exception(
"Failed to compute Tally understanding for invited user %s",
invited_user_id,
)
sanitized_error = re.sub(
r"https?://\S+", "<url>", f"{type(exc).__name__}: {exc}"
)[:_MAX_TALLY_ERROR_LENGTH]
await prisma.models.InvitedUser.prisma().update(
where={"id": invited_user_id},
data={
"tallyStatus": prisma.enums.TallyComputationStatus.FAILED,
"tallyError": sanitized_error,
},
)
def schedule_invited_user_tally_precompute(invited_user_id: str) -> None:
existing = _tally_seed_tasks.get(invited_user_id)
if existing is not None and not existing.done():
logger.debug("Tally task already running for %s, skipping", invited_user_id)
return
task = asyncio.create_task(_compute_invited_user_tally_seed(invited_user_id))
_tally_seed_tasks[invited_user_id] = task
def _on_done(t: asyncio.Task, _id: str = invited_user_id) -> None:
if _tally_seed_tasks.get(_id) is t:
del _tally_seed_tasks[_id]
task.add_done_callback(_on_done)
async def _open_signup_create_user(
auth_user_id: str,
normalized_email: str,
metadata_name: Optional[str],
) -> User:
"""Create a user without requiring an invite (open signup mode)."""
preferred_name = _normalize_name(metadata_name)
try:
async with transaction() as tx:
user = await prisma.models.User.prisma(tx).create(
data=prisma.types.UserCreateInput(
id=auth_user_id,
email=normalized_email,
name=preferred_name,
)
)
await _ensure_default_profile(
auth_user_id, normalized_email, preferred_name, tx
)
await _ensure_default_onboarding(auth_user_id, tx)
except UniqueViolationError:
existing = await prisma.models.User.prisma().find_unique(
where={"id": auth_user_id}
)
if existing is not None:
return User.from_db(existing)
raise
return User.from_db(user)
# TODO: We need to change this functions logic before going live
async def get_or_activate_user(user_data: dict) -> User:
auth_user_id = user_data.get("sub")
if not auth_user_id:
raise NotAuthorizedError("User ID not found in token")
auth_email = user_data.get("email")
if not auth_email:
raise NotAuthorizedError("Email not found in token")
normalized_email = normalize_email(auth_email)
user_metadata = user_data.get("user_metadata")
metadata_name = (
user_metadata.get("name") if isinstance(user_metadata, dict) else None
)
existing_user = None
try:
existing_user = await get_user_by_id(auth_user_id)
except ValueError:
existing_user = None
except Exception:
logger.exception("Error on get user by id during tally enrichment process")
raise
if existing_user is not None:
return existing_user
if not _settings.config.enable_invite_gate or normalized_email.endswith("@agpt.co"):
return await _open_signup_create_user(
auth_user_id, normalized_email, metadata_name
)
invited_user = await prisma.models.InvitedUser.prisma().find_unique(
where={"email": normalized_email}
)
if invited_user is None:
raise NotAuthorizedError("Your email is not allowed to access the platform")
if invited_user.status != prisma.enums.InvitedUserStatus.INVITED:
raise NotAuthorizedError("Your invitation is no longer active")
try:
async with transaction() as tx:
current_user = await prisma.models.User.prisma(tx).find_unique(
where={"id": auth_user_id}
)
if current_user is not None:
return User.from_db(current_user)
current_invited_user = await prisma.models.InvitedUser.prisma(
tx
).find_unique(where={"email": normalized_email})
if current_invited_user is None:
raise NotAuthorizedError(
"Your email is not allowed to access the platform"
)
if current_invited_user.status != prisma.enums.InvitedUserStatus.INVITED:
raise NotAuthorizedError("Your invitation is no longer active")
if current_invited_user.authUserId not in (None, auth_user_id):
raise NotAuthorizedError("Your invitation has already been claimed")
preferred_name = current_invited_user.name or _normalize_name(metadata_name)
await prisma.models.User.prisma(tx).create(
data=prisma.types.UserCreateInput(
id=auth_user_id,
email=normalized_email,
name=preferred_name,
)
)
await prisma.models.InvitedUser.prisma(tx).update(
where={"id": current_invited_user.id},
data={
"status": prisma.enums.InvitedUserStatus.CLAIMED,
"authUserId": auth_user_id,
},
)
await _ensure_default_profile(
auth_user_id,
normalized_email,
preferred_name,
tx,
)
await _ensure_default_onboarding(auth_user_id, tx)
await _apply_tally_understanding(auth_user_id, current_invited_user, tx)
except UniqueViolationError:
logger.info("Concurrent activation for user %s; re-fetching", auth_user_id)
already_created = await prisma.models.User.prisma().find_unique(
where={"id": auth_user_id}
)
if already_created is not None:
return User.from_db(already_created)
raise RuntimeError(
f"UniqueViolationError during activation but user {auth_user_id} not found"
)
get_user_by_id.cache_delete(auth_user_id)
get_user_by_email.cache_delete(normalized_email)
activated_user = await prisma.models.User.prisma().find_unique(
where={"id": auth_user_id}
)
if activated_user is None:
raise RuntimeError(
f"Activated user {auth_user_id} was not found after creation"
)
return User.from_db(activated_user)

View File

@@ -0,0 +1,335 @@
from contextlib import asynccontextmanager
from datetime import datetime, timezone
from types import SimpleNamespace
from typing import cast
from unittest.mock import AsyncMock, Mock
import prisma.enums
import prisma.models
import pytest
import pytest_mock
from backend.util.exceptions import NotAuthorizedError, PreconditionFailed
from .invited_user import (
InvitedUserRecord,
bulk_create_invited_users_from_file,
create_invited_user,
get_or_activate_user,
retry_invited_user_tally,
)
def _invited_user_db_record(
*,
status: prisma.enums.InvitedUserStatus = prisma.enums.InvitedUserStatus.INVITED,
tally_understanding: dict | None = None,
):
now = datetime.now(timezone.utc)
return SimpleNamespace(
id="invite-1",
email="invited@example.com",
status=status,
authUserId=None,
name="Invited User",
tallyUnderstanding=tally_understanding,
tallyStatus=prisma.enums.TallyComputationStatus.PENDING,
tallyComputedAt=None,
tallyError=None,
createdAt=now,
updatedAt=now,
)
def _invited_user_record(
*,
status: prisma.enums.InvitedUserStatus = prisma.enums.InvitedUserStatus.INVITED,
tally_understanding: dict | None = None,
):
return InvitedUserRecord.from_db(
cast(
prisma.models.InvitedUser,
_invited_user_db_record(
status=status,
tally_understanding=tally_understanding,
),
)
)
def _user_db_record():
now = datetime.now(timezone.utc)
return SimpleNamespace(
id="auth-user-1",
email="invited@example.com",
emailVerified=True,
name="Invited User",
createdAt=now,
updatedAt=now,
metadata={},
integrations="",
stripeCustomerId=None,
topUpConfig=None,
maxEmailsPerDay=3,
notifyOnAgentRun=True,
notifyOnZeroBalance=True,
notifyOnLowBalance=True,
notifyOnBlockExecutionFailed=True,
notifyOnContinuousAgentError=True,
notifyOnDailySummary=True,
notifyOnWeeklySummary=True,
notifyOnMonthlySummary=True,
notifyOnAgentApproved=True,
notifyOnAgentRejected=True,
timezone="not-set",
)
@pytest.mark.asyncio
async def test_create_invited_user_rejects_existing_active_user(
mocker: pytest_mock.MockerFixture,
) -> None:
user_repo = Mock()
user_repo.find_unique = AsyncMock(return_value=_user_db_record())
invited_user_repo = Mock()
invited_user_repo.find_unique = AsyncMock()
mocker.patch(
"backend.data.invited_user.prisma.models.User.prisma", return_value=user_repo
)
mocker.patch(
"backend.data.invited_user.prisma.models.InvitedUser.prisma",
return_value=invited_user_repo,
)
with pytest.raises(PreconditionFailed):
await create_invited_user("Invited@example.com")
@pytest.mark.asyncio
async def test_create_invited_user_schedules_tally_seed(
mocker: pytest_mock.MockerFixture,
) -> None:
user_repo = Mock()
user_repo.find_unique = AsyncMock(return_value=None)
invited_user_repo = Mock()
invited_user_repo.find_unique = AsyncMock(return_value=None)
invited_user_repo.create = AsyncMock(return_value=_invited_user_db_record())
schedule = mocker.patch(
"backend.data.invited_user.schedule_invited_user_tally_precompute"
)
mocker.patch(
"backend.data.invited_user.prisma.models.User.prisma", return_value=user_repo
)
mocker.patch(
"backend.data.invited_user.prisma.models.InvitedUser.prisma",
return_value=invited_user_repo,
)
invited_user = await create_invited_user("Invited@example.com", "Invited User")
assert invited_user.email == "invited@example.com"
invited_user_repo.create.assert_awaited_once()
schedule.assert_called_once_with("invite-1")
@pytest.mark.asyncio
async def test_retry_invited_user_tally_resets_state_and_schedules(
mocker: pytest_mock.MockerFixture,
) -> None:
invited_user_repo = Mock()
invited_user_repo.find_unique = AsyncMock(return_value=_invited_user_db_record())
invited_user_repo.update = AsyncMock(return_value=_invited_user_db_record())
schedule = mocker.patch(
"backend.data.invited_user.schedule_invited_user_tally_precompute"
)
mocker.patch(
"backend.data.invited_user.prisma.models.InvitedUser.prisma",
return_value=invited_user_repo,
)
invited_user = await retry_invited_user_tally("invite-1")
assert invited_user.id == "invite-1"
invited_user_repo.update.assert_awaited_once()
schedule.assert_called_once_with("invite-1")
@pytest.mark.asyncio
async def test_get_or_activate_user_requires_invite(
mocker: pytest_mock.MockerFixture,
) -> None:
invited_user_repo = Mock()
invited_user_repo.find_unique = AsyncMock(return_value=None)
mock_get_user_by_id = AsyncMock(side_effect=ValueError("User not found"))
mock_get_user_by_id.cache_delete = Mock()
mocker.patch(
"backend.data.invited_user.get_user_by_id",
mock_get_user_by_id,
)
mocker.patch(
"backend.data.invited_user._settings.config.enable_invite_gate",
True,
)
mocker.patch(
"backend.data.invited_user.prisma.models.InvitedUser.prisma",
return_value=invited_user_repo,
)
with pytest.raises(NotAuthorizedError):
await get_or_activate_user(
{"sub": "auth-user-1", "email": "invited@example.com"}
)
@pytest.mark.asyncio
async def test_get_or_activate_user_creates_user_from_invite(
mocker: pytest_mock.MockerFixture,
) -> None:
tx = object()
invited_user = _invited_user_db_record(
tally_understanding={"user_name": "Invited User", "industry": "Automation"}
)
created_user = _user_db_record()
outside_user_repo = Mock()
# Only called once at post-transaction verification (line 741);
# get_user_by_id (line 657) uses prisma.user.find_unique, not this mock.
outside_user_repo.find_unique = AsyncMock(return_value=created_user)
inside_user_repo = Mock()
inside_user_repo.find_unique = AsyncMock(return_value=None)
inside_user_repo.create = AsyncMock(return_value=created_user)
outside_invited_repo = Mock()
outside_invited_repo.find_unique = AsyncMock(return_value=invited_user)
inside_invited_repo = Mock()
inside_invited_repo.find_unique = AsyncMock(return_value=invited_user)
inside_invited_repo.update = AsyncMock(return_value=invited_user)
def user_prisma(client=None):
return inside_user_repo if client is tx else outside_user_repo
def invited_user_prisma(client=None):
return inside_invited_repo if client is tx else outside_invited_repo
@asynccontextmanager
async def fake_transaction():
yield tx
# Mock get_user_by_id since it uses prisma.user.find_unique (global client),
# not prisma.models.User.prisma().find_unique which we mock above.
mock_get_user_by_id = AsyncMock(side_effect=ValueError("User not found"))
mock_get_user_by_id.cache_delete = Mock()
mocker.patch(
"backend.data.invited_user.get_user_by_id",
mock_get_user_by_id,
)
mock_get_user_by_email = AsyncMock()
mock_get_user_by_email.cache_delete = Mock()
mocker.patch(
"backend.data.invited_user.get_user_by_email",
mock_get_user_by_email,
)
ensure_profile = mocker.patch(
"backend.data.invited_user._ensure_default_profile",
AsyncMock(),
)
ensure_onboarding = mocker.patch(
"backend.data.invited_user._ensure_default_onboarding",
AsyncMock(),
)
apply_tally = mocker.patch(
"backend.data.invited_user._apply_tally_understanding",
AsyncMock(),
)
mocker.patch("backend.data.invited_user.transaction", fake_transaction)
mocker.patch(
"backend.data.invited_user.prisma.models.User.prisma", side_effect=user_prisma
)
mocker.patch(
"backend.data.invited_user.prisma.models.InvitedUser.prisma",
side_effect=invited_user_prisma,
)
user = await get_or_activate_user(
{
"sub": "auth-user-1",
"email": "Invited@example.com",
"user_metadata": {"name": "Invited User"},
}
)
assert user.id == "auth-user-1"
inside_user_repo.create.assert_awaited_once()
inside_invited_repo.update.assert_awaited_once()
ensure_profile.assert_awaited_once()
ensure_onboarding.assert_awaited_once_with("auth-user-1", tx)
apply_tally.assert_awaited_once_with("auth-user-1", invited_user, tx)
@pytest.mark.asyncio
async def test_bulk_create_invited_users_from_text_file(
mocker: pytest_mock.MockerFixture,
) -> None:
create_invited = mocker.patch(
"backend.data.invited_user.create_invited_user",
AsyncMock(
side_effect=[
_invited_user_record(),
_invited_user_record(),
]
),
)
result = await bulk_create_invited_users_from_file(
"invites.txt",
b"Invited@example.com\nsecond@example.com\n",
)
assert result.created_count == 2
assert result.skipped_count == 0
assert result.error_count == 0
assert [row.status for row in result.results] == ["CREATED", "CREATED"]
assert create_invited.await_count == 2
@pytest.mark.asyncio
async def test_bulk_create_invited_users_handles_csv_duplicates_and_invalid_rows(
mocker: pytest_mock.MockerFixture,
) -> None:
create_invited = mocker.patch(
"backend.data.invited_user.create_invited_user",
AsyncMock(
side_effect=[
_invited_user_record(),
PreconditionFailed("An invited user with this email already exists"),
]
),
)
result = await bulk_create_invited_users_from_file(
"invites.csv",
(
"email,name\n"
"valid@example.com,Valid User\n"
"not-an-email,Bad Row\n"
"valid@example.com,Duplicate In File\n"
"existing@example.com,Existing User\n"
).encode("utf-8"),
)
assert result.created_count == 1
assert result.skipped_count == 2
assert result.error_count == 1
assert [row.status for row in result.results] == [
"CREATED",
"ERROR",
"SKIPPED",
"SKIPPED",
]
assert create_invited.await_count == 2

View File

@@ -41,7 +41,7 @@ _MAX_PAGES = 100
_LLM_TIMEOUT = 30
def _mask_email(email: str) -> str:
def mask_email(email: str) -> str:
"""Mask an email for safe logging: 'alice@example.com' -> 'a***e@example.com'."""
try:
local, domain = email.rsplit("@", 1)
@@ -196,8 +196,7 @@ async def _refresh_cache(form_id: str) -> tuple[dict, list]:
Returns (email_index, questions).
"""
settings = Settings()
client = _make_tally_client(settings.secrets.tally_api_key)
client = _make_tally_client(_settings.secrets.tally_api_key)
redis = await get_redis_async()
last_fetch_key = _LAST_FETCH_KEY.format(form_id=form_id)
@@ -332,6 +331,9 @@ Fields:
- current_software (list of strings): software/tools currently used
- existing_automation (list of strings): existing automations
- additional_notes (string): any additional context
- suggested_prompts (list of 5 strings): short action prompts (each under 20 words) that would help \
this person get started with automating their work. Should be specific to their industry, role, and \
pain points; actionable and conversational in tone; focused on automation opportunities.
Form data:
"""
@@ -339,21 +341,21 @@ Form data:
_EXTRACTION_SUFFIX = "\n\nReturn ONLY valid JSON."
async def extract_business_understanding(
async def extract_business_understanding_from_tally(
formatted_text: str,
) -> BusinessUnderstandingInput:
"""Use an LLM to extract structured business understanding from form text.
"""
Use an LLM to extract structured business understanding from form text.
Raises on timeout or unparseable response so the caller can handle it.
"""
settings = Settings()
api_key = settings.secrets.open_router_api_key
api_key = _settings.secrets.open_router_api_key
client = AsyncOpenAI(api_key=api_key, base_url=OPENROUTER_BASE_URL)
try:
response = await asyncio.wait_for(
client.chat.completions.create(
model="openai/gpt-4o-mini",
model=_settings.config.tally_extraction_llm_model,
messages=[
{
"role": "user",
@@ -378,9 +380,57 @@ async def extract_business_understanding(
# Filter out null values before constructing
cleaned = {k: v for k, v in data.items() if v is not None}
# Validate suggested_prompts: filter >20 words, keep top 3
raw_prompts = cleaned.get("suggested_prompts", [])
if isinstance(raw_prompts, list):
valid = [
p.strip()
for p in raw_prompts
if isinstance(p, str) and len(p.strip().split()) <= 20
]
# This will keep up to 3 suggestions
short_prompts = valid[:3] if valid else None
if short_prompts:
cleaned["suggested_prompts"] = short_prompts
else:
# We dont want to add a None value suggested_prompts field
cleaned.pop("suggested_prompts", None)
else:
# suggested_prompts must be a list - removing it as its not here
cleaned.pop("suggested_prompts", None)
return BusinessUnderstandingInput(**cleaned)
async def get_business_understanding_input_from_tally(
email: str,
*,
require_api_key: bool = False,
) -> Optional[BusinessUnderstandingInput]:
if not _settings.secrets.tally_api_key:
if require_api_key:
raise RuntimeError("Tally API key is not configured")
logger.debug("Tally: no API key configured, skipping")
return None
masked = mask_email(email)
result = await find_submission_by_email(TALLY_FORM_ID, email)
if result is None:
logger.debug(f"Tally: no submission found for {masked}")
return None
submission, questions = result
logger.info(f"Tally: found submission for {masked}, extracting understanding")
formatted = format_submission_for_llm(submission, questions)
if not formatted.strip():
logger.warning("Tally: formatted submission was empty, skipping")
return None
return await extract_business_understanding_from_tally(formatted)
async def populate_understanding_from_tally(user_id: str, email: str) -> None:
"""Main orchestrator: check Tally for a matching submission and populate understanding.
@@ -395,30 +445,10 @@ async def populate_understanding_from_tally(user_id: str, email: str) -> None:
)
return
# Check API key is configured
settings = Settings()
if not settings.secrets.tally_api_key:
logger.debug("Tally: no API key configured, skipping")
understanding_input = await get_business_understanding_input_from_tally(email)
if understanding_input is None:
return
# Look up submission by email
masked = _mask_email(email)
result = await find_submission_by_email(TALLY_FORM_ID, email)
if result is None:
logger.debug(f"Tally: no submission found for {masked}")
return
submission, questions = result
logger.info(f"Tally: found submission for {masked}, extracting understanding")
# Format and extract
formatted = format_submission_for_llm(submission, questions)
if not formatted.strip():
logger.warning("Tally: formatted submission was empty, skipping")
return
understanding_input = await extract_business_understanding(formatted)
# Upsert into database
await upsert_business_understanding(user_id, understanding_input)
logger.info(f"Tally: successfully populated understanding for user {user_id}")

View File

@@ -12,11 +12,11 @@ from backend.data.tally import (
_build_email_index,
_format_answer,
_make_tally_client,
_mask_email,
_refresh_cache,
extract_business_understanding,
extract_business_understanding_from_tally,
find_submission_by_email,
format_submission_for_llm,
mask_email,
populate_understanding_from_tally,
)
@@ -248,7 +248,7 @@ async def test_populate_understanding_skips_no_api_key():
new_callable=AsyncMock,
return_value=None,
),
patch("backend.data.tally.Settings", return_value=mock_settings),
patch("backend.data.tally._settings", mock_settings),
patch(
"backend.data.tally.find_submission_by_email",
new_callable=AsyncMock,
@@ -284,6 +284,7 @@ async def test_populate_understanding_full_flow():
],
}
mock_input = MagicMock()
mock_input.suggested_prompts = ["Prompt 1", "Prompt 2", "Prompt 3"]
with (
patch(
@@ -291,14 +292,14 @@ async def test_populate_understanding_full_flow():
new_callable=AsyncMock,
return_value=None,
),
patch("backend.data.tally.Settings", return_value=mock_settings),
patch("backend.data.tally._settings", mock_settings),
patch(
"backend.data.tally.find_submission_by_email",
new_callable=AsyncMock,
return_value=(submission, SAMPLE_QUESTIONS),
),
patch(
"backend.data.tally.extract_business_understanding",
"backend.data.tally.extract_business_understanding_from_tally",
new_callable=AsyncMock,
return_value=mock_input,
) as mock_extract,
@@ -331,14 +332,14 @@ async def test_populate_understanding_handles_llm_timeout():
new_callable=AsyncMock,
return_value=None,
),
patch("backend.data.tally.Settings", return_value=mock_settings),
patch("backend.data.tally._settings", mock_settings),
patch(
"backend.data.tally.find_submission_by_email",
new_callable=AsyncMock,
return_value=(submission, SAMPLE_QUESTIONS),
),
patch(
"backend.data.tally.extract_business_understanding",
"backend.data.tally.extract_business_understanding_from_tally",
new_callable=AsyncMock,
side_effect=asyncio.TimeoutError(),
),
@@ -356,13 +357,13 @@ async def test_populate_understanding_handles_llm_timeout():
def test_mask_email():
assert _mask_email("alice@example.com") == "a***e@example.com"
assert _mask_email("ab@example.com") == "a***@example.com"
assert _mask_email("a@example.com") == "a***@example.com"
assert mask_email("alice@example.com") == "a***e@example.com"
assert mask_email("ab@example.com") == "a***@example.com"
assert mask_email("a@example.com") == "a***@example.com"
def test_mask_email_invalid():
assert _mask_email("no-at-sign") == "***"
assert mask_email("no-at-sign") == "***"
# ── Prompt construction (curly-brace safety) ─────────────────────────────────
@@ -393,11 +394,11 @@ def test_extraction_prompt_no_format_placeholders():
assert single_braces == [], f"Found format placeholders: {single_braces}"
# ── extract_business_understanding ────────────────────────────────────────────
# ── extract_business_understanding_from_tally ────────────────────────────────────────────
@pytest.mark.asyncio
async def test_extract_business_understanding_success():
async def test_extract_business_understanding_from_tally_success():
"""Happy path: LLM returns valid JSON that maps to BusinessUnderstandingInput."""
mock_choice = MagicMock()
mock_choice.message.content = json.dumps(
@@ -406,6 +407,13 @@ async def test_extract_business_understanding_success():
"business_name": "Acme Corp",
"industry": "Technology",
"pain_points": ["manual reporting"],
"suggested_prompts": [
"Automate weekly reports",
"Set up invoice processing",
"Create a customer onboarding flow",
"Track project deadlines automatically",
"Send follow-up emails after meetings",
],
}
)
mock_response = MagicMock()
@@ -415,16 +423,56 @@ async def test_extract_business_understanding_success():
mock_client.chat.completions.create.return_value = mock_response
with patch("backend.data.tally.AsyncOpenAI", return_value=mock_client):
result = await extract_business_understanding("Q: Name?\nA: Alice")
result = await extract_business_understanding_from_tally("Q: Name?\nA: Alice")
assert result.user_name == "Alice"
assert result.business_name == "Acme Corp"
assert result.industry == "Technology"
assert result.pain_points == ["manual reporting"]
# suggested_prompts validated and sliced to top 3
assert result.suggested_prompts == [
"Automate weekly reports",
"Set up invoice processing",
"Create a customer onboarding flow",
]
@pytest.mark.asyncio
async def test_extract_business_understanding_filters_nulls():
async def test_extract_business_understanding_from_tally_filters_long_prompts():
"""Prompts exceeding 20 words are excluded and only top 3 are kept."""
long_prompt = " ".join(["word"] * 21)
mock_choice = MagicMock()
mock_choice.message.content = json.dumps(
{
"user_name": "Alice",
"suggested_prompts": [
long_prompt,
"Short prompt one",
long_prompt,
"Short prompt two",
"Short prompt three",
"Short prompt four",
],
}
)
mock_response = MagicMock()
mock_response.choices = [mock_choice]
mock_client = AsyncMock()
mock_client.chat.completions.create.return_value = mock_response
with patch("backend.data.tally.AsyncOpenAI", return_value=mock_client):
result = await extract_business_understanding_from_tally("Q: Name?\nA: Alice")
assert result.suggested_prompts == [
"Short prompt one",
"Short prompt two",
"Short prompt three",
]
@pytest.mark.asyncio
async def test_extract_business_understanding_from_tally_filters_nulls():
"""Null values from LLM should be excluded from the result."""
mock_choice = MagicMock()
mock_choice.message.content = json.dumps(
@@ -437,7 +485,7 @@ async def test_extract_business_understanding_filters_nulls():
mock_client.chat.completions.create.return_value = mock_response
with patch("backend.data.tally.AsyncOpenAI", return_value=mock_client):
result = await extract_business_understanding("Q: Name?\nA: Alice")
result = await extract_business_understanding_from_tally("Q: Name?\nA: Alice")
assert result.user_name == "Alice"
assert result.business_name is None
@@ -445,7 +493,7 @@ async def test_extract_business_understanding_filters_nulls():
@pytest.mark.asyncio
async def test_extract_business_understanding_invalid_json():
async def test_extract_business_understanding_from_tally_invalid_json():
"""Invalid JSON from LLM should raise JSONDecodeError."""
mock_choice = MagicMock()
mock_choice.message.content = "not valid json {"
@@ -459,11 +507,11 @@ async def test_extract_business_understanding_invalid_json():
patch("backend.data.tally.AsyncOpenAI", return_value=mock_client),
pytest.raises(json.JSONDecodeError),
):
await extract_business_understanding("Q: Name?\nA: Alice")
await extract_business_understanding_from_tally("Q: Name?\nA: Alice")
@pytest.mark.asyncio
async def test_extract_business_understanding_timeout():
async def test_extract_business_understanding_from_tally_timeout():
"""LLM timeout should propagate as asyncio.TimeoutError."""
mock_client = AsyncMock()
mock_client.chat.completions.create.side_effect = asyncio.TimeoutError()
@@ -473,7 +521,7 @@ async def test_extract_business_understanding_timeout():
patch("backend.data.tally._LLM_TIMEOUT", 0.001),
pytest.raises(asyncio.TimeoutError),
):
await extract_business_understanding("Q: Name?\nA: Alice")
await extract_business_understanding_from_tally("Q: Name?\nA: Alice")
# ── _refresh_cache ───────────────────────────────────────────────────────────
@@ -492,7 +540,7 @@ async def test_refresh_cache_full_fetch():
submissions = SAMPLE_SUBMISSIONS
with (
patch("backend.data.tally.Settings", return_value=mock_settings),
patch("backend.data.tally._settings", mock_settings),
patch(
"backend.data.tally.get_redis_async",
new_callable=AsyncMock,
@@ -540,7 +588,7 @@ async def test_refresh_cache_incremental_fetch():
new_submissions = [SAMPLE_SUBMISSIONS[0]] # Just Alice
with (
patch("backend.data.tally.Settings", return_value=mock_settings),
patch("backend.data.tally._settings", mock_settings),
patch(
"backend.data.tally.get_redis_async",
new_callable=AsyncMock,

View File

@@ -86,6 +86,11 @@ class BusinessUnderstandingInput(pydantic.BaseModel):
None, description="Any additional context"
)
# Suggested prompts (UI-only, not included in system prompt)
suggested_prompts: Optional[list[str]] = pydantic.Field(
None, description="LLM-generated suggested prompts based on business context"
)
class BusinessUnderstanding(pydantic.BaseModel):
"""Full business understanding model returned from database."""
@@ -122,6 +127,9 @@ class BusinessUnderstanding(pydantic.BaseModel):
# Additional context
additional_notes: Optional[str] = None
# Suggested prompts (UI-only, not included in system prompt)
suggested_prompts: list[str] = pydantic.Field(default_factory=list)
@classmethod
def from_db(cls, db_record: CoPilotUnderstanding) -> "BusinessUnderstanding":
"""Convert database record to Pydantic model."""
@@ -149,6 +157,7 @@ class BusinessUnderstanding(pydantic.BaseModel):
current_software=_json_to_list(business.get("current_software")),
existing_automation=_json_to_list(business.get("existing_automation")),
additional_notes=business.get("additional_notes"),
suggested_prompts=_json_to_list(data.get("suggested_prompts")),
)
@@ -166,6 +175,62 @@ def _merge_lists(existing: list | None, new: list | None) -> list | None:
return merged
def merge_business_understanding_data(
existing_data: dict[str, Any],
input_data: BusinessUnderstandingInput,
) -> dict[str, Any]:
merged_data = dict(existing_data)
merged_business: dict[str, Any] = {}
if isinstance(merged_data.get("business"), dict):
merged_business = dict(merged_data["business"])
business_string_fields = [
"job_title",
"business_name",
"industry",
"business_size",
"user_role",
"additional_notes",
]
business_list_fields = [
"key_workflows",
"daily_activities",
"pain_points",
"bottlenecks",
"manual_tasks",
"automation_goals",
"current_software",
"existing_automation",
]
if input_data.user_name is not None:
merged_data["name"] = input_data.user_name
for field in business_string_fields:
value = getattr(input_data, field)
if value is not None:
merged_business[field] = value
for field in business_list_fields:
value = getattr(input_data, field)
if value is not None:
existing_list = _json_to_list(merged_business.get(field))
merged_list = _merge_lists(existing_list, value)
merged_business[field] = merged_list
merged_business["version"] = 1
merged_data["business"] = merged_business
# suggested_prompts lives at the top level (not under `business`) because
# it's a UI-only artifact consumed by the frontend, not business understanding
# data. The `business` sub-dict feeds the system prompt.
if input_data.suggested_prompts is not None:
merged_data["suggested_prompts"] = input_data.suggested_prompts
return merged_data
async def _get_from_cache(user_id: str) -> Optional[BusinessUnderstanding]:
"""Get business understanding from Redis cache."""
try:
@@ -245,63 +310,18 @@ async def upsert_business_understanding(
where={"userId": user_id}
)
# Get existing data structure or start fresh
existing_data: dict[str, Any] = {}
if existing and isinstance(existing.data, dict):
existing_data = dict(existing.data)
existing_business: dict[str, Any] = {}
if isinstance(existing_data.get("business"), dict):
existing_business = dict(existing_data["business"])
# Business fields (stored inside business object)
business_string_fields = [
"job_title",
"business_name",
"industry",
"business_size",
"user_role",
"additional_notes",
]
business_list_fields = [
"key_workflows",
"daily_activities",
"pain_points",
"bottlenecks",
"manual_tasks",
"automation_goals",
"current_software",
"existing_automation",
]
# Handle top-level name field
if input_data.user_name is not None:
existing_data["name"] = input_data.user_name
# Business string fields - overwrite if provided
for field in business_string_fields:
value = getattr(input_data, field)
if value is not None:
existing_business[field] = value
# Business list fields - merge with existing
for field in business_list_fields:
value = getattr(input_data, field)
if value is not None:
existing_list = _json_to_list(existing_business.get(field))
merged = _merge_lists(existing_list, value)
existing_business[field] = merged
# Set version and nest business data
existing_business["version"] = 1
existing_data["business"] = existing_business
merged_data = merge_business_understanding_data(existing_data, input_data)
# Upsert with the merged data
record = await CoPilotUnderstanding.prisma().upsert(
where={"userId": user_id},
data={
"create": {"userId": user_id, "data": SafeJson(existing_data)},
"update": {"data": SafeJson(existing_data)},
"create": {"userId": user_id, "data": SafeJson(merged_data)},
"update": {"data": SafeJson(merged_data)},
},
)

View File

@@ -0,0 +1,102 @@
"""Tests for business understanding merge and format logic."""
from datetime import datetime, timezone
from typing import Any
from backend.data.understanding import (
BusinessUnderstanding,
BusinessUnderstandingInput,
format_understanding_for_prompt,
merge_business_understanding_data,
)
def _make_input(**kwargs: Any) -> BusinessUnderstandingInput:
"""Create a BusinessUnderstandingInput with only the specified fields."""
return BusinessUnderstandingInput.model_validate(kwargs)
# ─── merge_business_understanding_data: suggested_prompts ─────────────
def test_merge_suggested_prompts_overwrites_existing():
"""New suggested_prompts should fully replace existing ones (not append)."""
existing = {
"name": "Alice",
"business": {"industry": "Tech", "version": 1},
"suggested_prompts": ["Old prompt 1", "Old prompt 2"],
}
input_data = _make_input(
suggested_prompts=["New prompt A", "New prompt B", "New prompt C"],
)
result = merge_business_understanding_data(existing, input_data)
assert result["suggested_prompts"] == [
"New prompt A",
"New prompt B",
"New prompt C",
]
def test_merge_suggested_prompts_none_preserves_existing():
"""When input has suggested_prompts=None, existing prompts are preserved."""
existing = {
"name": "Alice",
"business": {"industry": "Tech", "version": 1},
"suggested_prompts": ["Keep me"],
}
input_data = _make_input(industry="Finance")
result = merge_business_understanding_data(existing, input_data)
assert result["suggested_prompts"] == ["Keep me"]
assert result["business"]["industry"] == "Finance"
def test_merge_suggested_prompts_added_to_empty_data():
"""Suggested prompts are set at top level even when starting from empty data."""
existing: dict[str, Any] = {}
input_data = _make_input(suggested_prompts=["Prompt 1"])
result = merge_business_understanding_data(existing, input_data)
assert result["suggested_prompts"] == ["Prompt 1"]
def test_merge_suggested_prompts_empty_list_overwrites():
"""An explicit empty list should overwrite existing prompts."""
existing: dict[str, Any] = {
"suggested_prompts": ["Old prompt"],
"business": {"version": 1},
}
input_data = _make_input(suggested_prompts=[])
result = merge_business_understanding_data(existing, input_data)
assert result["suggested_prompts"] == []
# ─── format_understanding_for_prompt: excludes suggested_prompts ──────
def test_format_understanding_excludes_suggested_prompts():
"""suggested_prompts is UI-only and must NOT appear in the system prompt."""
understanding = BusinessUnderstanding(
id="test-id",
user_id="user-1",
created_at=datetime.now(tz=timezone.utc),
updated_at=datetime.now(tz=timezone.utc),
user_name="Alice",
industry="Technology",
suggested_prompts=["Automate reports", "Set up alerts", "Track KPIs"],
)
formatted = format_understanding_for_prompt(understanding)
assert "Alice" in formatted
assert "Technology" in formatted
assert "suggested_prompts" not in formatted
assert "Automate reports" not in formatted
assert "Set up alerts" not in formatted
assert "Track KPIs" not in formatted

View File

@@ -61,7 +61,12 @@ from backend.util.decorator import (
error_logged,
time_measured,
)
from backend.util.exceptions import InsufficientBalanceError, ModerationError
from backend.util.exceptions import (
GraphNotFoundError,
InsufficientBalanceError,
ModerationError,
NotFoundError,
)
from backend.util.file import clean_exec_files
from backend.util.logging import TruncatedLogger, configure_logging
from backend.util.metrics import DiscordChannel
@@ -375,9 +380,16 @@ async def execute_node(
log_metadata.debug("Node produced output", **{output_name: output_data})
yield output_name, output_data
except Exception as ex:
# Capture exception WITH context still set before restoring scope
sentry_sdk.capture_exception(error=ex, scope=scope)
sentry_sdk.flush() # Ensure it's sent before we restore scope
# Only capture unexpected errors to Sentry, not user-caused ones.
# Most ValueError subclasses here are expected (BlockExecutionError,
# InsufficientBalanceError, plain ValueError for auth/disabled blocks, etc.)
# but NotFoundError/GraphNotFoundError could indicate real platform issues.
is_expected = isinstance(ex, ValueError) and not isinstance(
ex, (NotFoundError, GraphNotFoundError)
)
if not is_expected:
sentry_sdk.capture_exception(error=ex, scope=scope)
sentry_sdk.flush()
# Re-raise to maintain normal error flow
raise
finally:
@@ -1478,7 +1490,7 @@ class ExecutionProcessor:
alert_message, DiscordChannel.PRODUCT
)
except Exception as e:
logger.error(f"Failed to send low balance Discord alert: {e}")
logger.warning(f"Failed to send low balance Discord alert: {e}")
class ExecutionManager(AppProcess):
@@ -1900,17 +1912,16 @@ class ExecutionManager(AppProcess):
channel = client.get_channel()
channel.connection.add_callback_threadsafe(lambda: channel.stop_consuming())
try:
thread.join(timeout=300)
except TimeoutError:
logger.error(
thread.join(timeout=300)
if thread.is_alive():
logger.warning(
f"{prefix} ⚠️ Run thread did not finish in time, forcing disconnect"
)
client.disconnect()
logger.info(f"{prefix} ✅ Run client disconnected")
except Exception as e:
logger.error(f"{prefix} ⚠️ Error disconnecting run client: {type(e)} {e}")
logger.warning(f"{prefix} ⚠️ Error disconnecting run client: {type(e)} {e}")
def cleanup(self):
"""Override cleanup to implement graceful shutdown with active execution waiting."""
@@ -1926,7 +1937,9 @@ class ExecutionManager(AppProcess):
)
logger.info(f"{prefix} ✅ Exec consumer has been signaled to stop")
except Exception as e:
logger.error(f"{prefix} ⚠️ Error signaling consumer to stop: {type(e)} {e}")
logger.warning(
f"{prefix} ⚠️ Error signaling consumer to stop: {type(e)} {e}"
)
# Wait for active executions to complete
if self.active_graph_runs:
@@ -1957,7 +1970,7 @@ class ExecutionManager(AppProcess):
waited += wait_interval
if self.active_graph_runs:
logger.error(
logger.warning(
f"{prefix} ⚠️ {len(self.active_graph_runs)} executions still running after {max_wait}s"
)
else:
@@ -1968,7 +1981,7 @@ class ExecutionManager(AppProcess):
self.executor.shutdown(cancel_futures=True, wait=False)
logger.info(f"{prefix} ✅ Executor shutdown completed")
except Exception as e:
logger.error(f"{prefix} ⚠️ Error during executor shutdown: {type(e)} {e}")
logger.warning(f"{prefix} ⚠️ Error during executor shutdown: {type(e)} {e}")
# Release remaining execution locks
try:

View File

@@ -94,7 +94,7 @@ SCHEDULER_OPERATION_TIMEOUT_SECONDS = 300 # 5 minutes for scheduler operations
def job_listener(event):
"""Logs job execution outcomes for better monitoring."""
if event.exception:
logger.error(
logger.warning(
f"Job {event.job_id} failed: {type(event.exception).__name__}: {event.exception}"
)
else:
@@ -137,7 +137,7 @@ def run_async(coro, timeout: float = SCHEDULER_OPERATION_TIMEOUT_SECONDS):
try:
return future.result(timeout=timeout)
except Exception as e:
logger.error(f"Async operation failed: {type(e).__name__}: {e}")
logger.warning(f"Async operation failed: {type(e).__name__}: {e}")
raise
@@ -186,7 +186,7 @@ async def _execute_graph(**kwargs):
async def _handle_graph_validation_error(args: "GraphExecutionJobArgs") -> None:
logger.error(
logger.warning(
f"Scheduled Graph {args.graph_id} failed validation. Unscheduling graph"
)
if args.schedule_id:
@@ -196,8 +196,9 @@ async def _handle_graph_validation_error(args: "GraphExecutionJobArgs") -> None:
user_id=args.user_id,
)
else:
logger.error(
f"Unable to unschedule graph: {args.graph_id} as this is an old job with no associated schedule_id please remove manually"
logger.warning(
f"Unable to unschedule graph: {args.graph_id} as this is an old job "
f"with no associated schedule_id please remove manually"
)

View File

@@ -46,7 +46,7 @@ from backend.util.exceptions import (
)
from backend.util.logging import TruncatedLogger, is_structured_logging_enabled
from backend.util.settings import Config
from backend.util.type import convert
from backend.util.type import coerce_inputs_to_schema
config = Config()
logger = TruncatedLogger(logging.getLogger(__name__), prefix="[GraphExecutorUtil]")
@@ -213,11 +213,8 @@ def validate_exec(
if resolve_input:
data = merge_execution_input(data)
# Convert non-matching data types to the expected input schema.
for name, data_type in schema.__annotations__.items():
value = data.get(name)
if (value is not None) and (type(value) is not data_type):
data[name] = convert(value, data_type)
# Coerce non-matching data types to the expected input schema.
coerce_inputs_to_schema(data, schema)
# Input data post-merge should contain all required fields from the schema.
if missing_input := schema.get_missing_input(data):

View File

@@ -303,9 +303,9 @@ class NotificationManager(AppService):
)
if not oldest_message:
# this should never happen
logger.error(
f"Batch for user {batch.user_id} and type {notification_type} has no oldest message whichshould never happen!!!!!!!!!!!!!!!!"
logger.warning(
f"Batch for user {batch.user_id} and type {notification_type} "
f"has no oldest message — batch may have been cleared concurrently"
)
continue
@@ -318,7 +318,7 @@ class NotificationManager(AppService):
).get_user_email_by_id(batch.user_id)
if not recipient_email:
logger.error(
logger.warning(
f"User email not found for user {batch.user_id}"
)
continue
@@ -344,7 +344,7 @@ class NotificationManager(AppService):
).get_user_notification_batch(batch.user_id, notification_type)
if not batch_data or not batch_data.notifications:
logger.error(
logger.warning(
f"Batch data not found for user {batch.user_id}"
)
# Clear the batch
@@ -372,7 +372,7 @@ class NotificationManager(AppService):
)
)
except Exception as e:
logger.error(
logger.warning(
f"Error parsing notification event: {e=}, {db_event=}"
)
continue
@@ -415,7 +415,10 @@ class NotificationManager(AppService):
async def discord_system_alert(
self, content: str, channel: DiscordChannel = DiscordChannel.PLATFORM
):
await discord_send_alert(content, channel)
try:
await discord_send_alert(content, channel)
except Exception as e:
logger.warning(f"Failed to send Discord system alert: {e}")
async def _queue_scheduled_notification(self, event: SummaryParamsEventModel):
"""Queue a scheduled notification - exposed method for other services to call"""
@@ -516,7 +519,7 @@ class NotificationManager(AppService):
raise ValueError("Invalid event type or params")
except Exception as e:
logger.error(f"Failed to gather summary data: {e}")
logger.warning(f"Failed to gather summary data: {e}")
# Return sensible defaults in case of error
if event_type == NotificationType.DAILY_SUMMARY and isinstance(
params, DailySummaryParams
@@ -562,8 +565,9 @@ class NotificationManager(AppService):
should_retry=False
).get_user_notification_oldest_message_in_batch(user_id, event_type)
if not oldest_message:
logger.error(
f"Batch for user {user_id} and type {event_type} has no oldest message whichshould never happen!!!!!!!!!!!!!!!!"
logger.warning(
f"Batch for user {user_id} and type {event_type} "
f"has no oldest message — batch may have been cleared concurrently"
)
return False
oldest_age = oldest_message.created_at
@@ -585,7 +589,7 @@ class NotificationManager(AppService):
get_notif_data_type(event.type)
].model_validate_json(message)
except Exception as e:
logger.error(f"Error parsing message due to non matching schema {e}")
logger.warning(f"Error parsing message due to non matching schema {e}")
return None
async def _process_admin_message(self, message: str) -> bool:
@@ -614,7 +618,7 @@ class NotificationManager(AppService):
should_retry=False
).get_user_email_by_id(event.user_id)
if not recipient_email:
logger.error(f"User email not found for user {event.user_id}")
logger.warning(f"User email not found for user {event.user_id}")
return False
should_send = await self._should_email_user_based_on_preference(
@@ -651,7 +655,7 @@ class NotificationManager(AppService):
should_retry=False
).get_user_email_by_id(event.user_id)
if not recipient_email:
logger.error(f"User email not found for user {event.user_id}")
logger.warning(f"User email not found for user {event.user_id}")
return False
should_send = await self._should_email_user_based_on_preference(
@@ -672,7 +676,7 @@ class NotificationManager(AppService):
should_retry=False
).get_user_notification_batch(event.user_id, event.type)
if not batch or not batch.notifications:
logger.error(f"Batch not found for user {event.user_id}")
logger.warning(f"Batch not found for user {event.user_id}")
return False
unsub_link = generate_unsubscribe_link(event.user_id)
@@ -745,7 +749,7 @@ class NotificationManager(AppService):
f"Removed {len(chunk_ids)} sent notifications from batch"
)
except Exception as e:
logger.error(
logger.warning(
f"Failed to remove sent notifications: {e}"
)
# Continue anyway - better to risk duplicates than lose emails
@@ -770,7 +774,7 @@ class NotificationManager(AppService):
else:
# Message is too large even after size reduction
if attempt_size == 1:
logger.error(
logger.warning(
f"Failed to send notification at index {i}: "
f"Single notification exceeds email size limit "
f"({len(test_message):,} chars > {MAX_EMAIL_SIZE:,} chars). "
@@ -789,7 +793,7 @@ class NotificationManager(AppService):
f"Removed oversized notification {chunk_ids[0]} from batch permanently"
)
except Exception as e:
logger.error(
logger.warning(
f"Failed to remove oversized notification: {e}"
)
@@ -823,7 +827,7 @@ class NotificationManager(AppService):
f"Set email verification to false for user {event.user_id}"
)
except Exception as deactivation_error:
logger.error(
logger.warning(
f"Failed to deactivate email for user {event.user_id}: "
f"{deactivation_error}"
)
@@ -835,7 +839,7 @@ class NotificationManager(AppService):
f"Disabled all notification preferences for user {event.user_id}"
)
except Exception as disable_error:
logger.error(
logger.warning(
f"Failed to disable notification preferences: {disable_error}"
)
@@ -848,7 +852,7 @@ class NotificationManager(AppService):
f"Cleared ALL notification batches for user {event.user_id}"
)
except Exception as remove_error:
logger.error(
logger.warning(
f"Failed to clear batches for inactive recipient: {remove_error}"
)
@@ -859,7 +863,7 @@ class NotificationManager(AppService):
"422" in error_message
or "unprocessable" in error_message
):
logger.error(
logger.warning(
f"Failed to send notification at index {i}: "
f"Malformed notification data rejected by Postmark. "
f"Error: {e}. Removing from batch permanently."
@@ -877,7 +881,7 @@ class NotificationManager(AppService):
"Removed malformed notification from batch permanently"
)
except Exception as remove_error:
logger.error(
logger.warning(
f"Failed to remove malformed notification: {remove_error}"
)
# Check if it's a ValueError for size limit
@@ -885,14 +889,14 @@ class NotificationManager(AppService):
isinstance(e, ValueError)
and "too large" in error_message
):
logger.error(
logger.warning(
f"Failed to send notification at index {i}: "
f"Notification size exceeds email limit. "
f"Error: {e}. Skipping this notification."
)
# Other API errors
else:
logger.error(
logger.warning(
f"Failed to send notification at index {i}: "
f"Email API error ({error_type}): {e}. "
f"Skipping this notification."
@@ -907,7 +911,9 @@ class NotificationManager(AppService):
if not chunk_sent:
# Should not reach here due to single notification handling
logger.error(f"Failed to send notifications starting at index {i}")
logger.warning(
f"Failed to send notifications starting at index {i}"
)
failed_indices.append(i)
i += 1
@@ -946,7 +952,7 @@ class NotificationManager(AppService):
should_retry=False
).get_user_email_by_id(event.user_id)
if not recipient_email:
logger.error(f"User email not found for user {event.user_id}")
logger.warning(f"User email not found for user {event.user_id}")
return False
should_send = await self._should_email_user_based_on_preference(
event.user_id, event.type
@@ -1007,7 +1013,10 @@ class NotificationManager(AppService):
# Let message.process() handle the rejection
pass
except Exception as e:
logger.error(f"Error processing message in {queue_name}: {e}")
logger.warning(
f"Error processing message in {queue_name}: {e}",
exc_info=True,
)
# Let message.process() handle the rejection
raise
except asyncio.CancelledError:

View File

@@ -256,9 +256,9 @@ class TestNotificationErrorHandling:
assert 2 not in successful_indices # Index 2 failed
# Verify 422 error was logged
error_calls = [call[0][0] for call in mock_logger.error.call_args_list]
warning_calls = [call[0][0] for call in mock_logger.warning.call_args_list]
assert any(
"422" in call or "malformed" in call.lower() for call in error_calls
"422" in call or "malformed" in call.lower() for call in warning_calls
)
# Verify all notifications were removed (4 successful + 1 malformed)
@@ -371,10 +371,10 @@ class TestNotificationErrorHandling:
assert 3 not in successful_indices # Index 3 was not sent
# Verify oversized error was logged
error_calls = [call[0][0] for call in mock_logger.error.call_args_list]
warning_calls = [call[0][0] for call in mock_logger.warning.call_args_list]
assert any(
"exceeds email size limit" in call or "oversized" in call.lower()
for call in error_calls
for call in warning_calls
)
@pytest.mark.asyncio
@@ -478,10 +478,10 @@ class TestNotificationErrorHandling:
assert 1 in failed_indices # Index 1 failed
# Verify generic error was logged
error_calls = [call[0][0] for call in mock_logger.error.call_args_list]
warning_calls = [call[0][0] for call in mock_logger.warning.call_args_list]
assert any(
"api error" in call.lower() or "skipping" in call.lower()
for call in error_calls
for call in warning_calls
)
# Only successful ones should be removed from batch (failed one stays for retry)

View File

@@ -613,5 +613,5 @@ async def cleanup_expired_files_async() -> int:
)
return deleted_count
except Exception as e:
logger.error(f"[CloudStorage] Error during cloud storage cleanup: {e}")
logger.warning(f"[CloudStorage] Error during cloud storage cleanup: {e}")
return 0

View File

@@ -275,13 +275,12 @@ async def store_media_file(
# Process file
elif file.startswith("data:"):
# Data URI
match = re.match(r"^data:([^;]+);base64,(.*)$", file, re.DOTALL)
if not match:
parsed_uri = parse_data_uri(file)
if parsed_uri is None:
raise ValueError(
"Invalid data URI format. Expected data:<mime>;base64,<data>"
)
mime_type = match.group(1).strip().lower()
b64_content = match.group(2).strip()
mime_type, b64_content = parsed_uri
# Generate filename and decode
extension = _extension_from_mime(mime_type)
@@ -415,13 +414,70 @@ def get_dir_size(path: Path) -> int:
return total
async def resolve_media_content(
content: MediaFileType,
execution_context: "ExecutionContext",
*,
return_format: MediaReturnFormat,
) -> MediaFileType:
"""Resolve a ``MediaFileType`` value if it is a media reference, pass through otherwise.
Convenience wrapper around :func:`is_media_file_ref` + :func:`store_media_file`.
Plain text content (source code, filenames) is returned unchanged. Media
references (``data:``, ``workspace://``, ``http(s)://``) are resolved via
:func:`store_media_file` using *return_format*.
Use this when a block field is typed as ``MediaFileType`` but may contain
either literal text or a media reference.
"""
if not content or not is_media_file_ref(content):
return content
return await store_media_file(
content, execution_context, return_format=return_format
)
def is_media_file_ref(value: str) -> bool:
"""Return True if *value* looks like a ``MediaFileType`` reference.
Detects data URIs, workspace:// references, and HTTP(S) URLs — the
formats accepted by :func:`store_media_file`. Plain text content
(e.g. source code, filenames) returns False.
Known limitation: HTTP(S) URL detection is heuristic. Any string that
starts with ``http://`` or ``https://`` is treated as a media URL, even
if it appears as a URL inside source-code comments or documentation.
Blocks that produce source code or Markdown as output may therefore
trigger false positives. Callers that need higher precision should
inspect the string further (e.g. verify the URL is reachable or has a
media-friendly extension).
Note: this does *not* match local file paths, which are ambiguous
(could be filenames or actual paths). Blocks that need to resolve
local paths should check for them separately.
"""
return value.startswith(("data:", "workspace://", "http://", "https://"))
def parse_data_uri(value: str) -> tuple[str, str] | None:
"""Parse a ``data:<mime>;base64,<payload>`` URI.
Returns ``(mime_type, base64_payload)`` if *value* is a valid data URI,
or ``None`` if it is not.
"""
match = re.match(r"^data:([^;]+);base64,(.*)$", value, re.DOTALL)
if not match:
return None
return match.group(1).strip().lower(), match.group(2).strip()
def get_mime_type(file: str) -> str:
"""
Get the MIME type of a file, whether it's a data URI, URL, or local path.
"""
if file.startswith("data:"):
match = re.match(r"^data:([^;]+);base64,", file)
return match.group(1) if match else "application/octet-stream"
parsed_uri = parse_data_uri(file)
return parsed_uri[0] if parsed_uri else "application/octet-stream"
elif file.startswith(("http://", "https://")):
parsed_url = urlparse(file)

Some files were not shown because too many files have changed in this diff Show More