Compare commits

..

1 Commits

Author SHA1 Message Date
dependabot[bot]
2276aa1f67 chore(deps): bump actions/checkout from 4 to 6
Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 6.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v4...v6)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-03-18 09:58:15 +00:00
213 changed files with 5679 additions and 13195 deletions

View File

@@ -2,7 +2,7 @@
name: pr-address
description: Address PR review comments and loop until CI green and all comments resolved. TRIGGER when user asks to address comments, fix PR feedback, respond to reviewers, or babysit/monitor a PR.
user-invocable: true
argument-hint: "[PR number or URL] — if omitted, finds PR for current branch."
args: "[PR number or URL] — if omitted, finds PR for current branch."
metadata:
author: autogpt-team
version: "1.0.0"
@@ -17,70 +17,18 @@ gh pr list --head $(git branch --show-current) --repo Significant-Gravitas/AutoG
gh pr view {N}
```
## Read the PR description
Understand the **Why / What / How** before addressing comments — you need context to make good fixes:
```bash
gh pr view {N} --json body --jq '.body'
```
## Fetch comments (all sources)
### 1. Inline review threads — GraphQL (primary source of actionable items)
Use GraphQL to fetch inline threads. It natively exposes `isResolved`, returns threads already grouped with all replies, and paginates via cursor — no manual thread reconstruction needed.
```bash
gh api graphql -f query='
{
repository(owner: "Significant-Gravitas", name: "AutoGPT") {
pullRequest(number: {N}) {
reviewThreads(first: 100) {
pageInfo { hasNextPage endCursor }
nodes {
id
isResolved
path
comments(last: 1) {
nodes { databaseId body author { login } createdAt }
}
}
}
}
}
}'
gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews # top-level reviews
gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments # inline review comments
gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments # PR conversation comments
```
If `pageInfo.hasNextPage` is true, fetch subsequent pages by adding `after: "<endCursor>"` to `reviewThreads(first: 100, after: "...")` and repeat until `hasNextPage` is false.
**Filter to unresolved threads only** — skip any thread where `isResolved: true`. `comments(last: 1)` returns the most recent comment in the thread — act on that; it reflects the reviewer's final ask. Use the thread `id` (Relay global ID) to track threads across polls.
### 2. Top-level reviews — REST (MUST paginate)
```bash
gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews --paginate
```
**CRITICAL — always `--paginate`.** Reviews default to 30 per page. PRs can have 80170+ reviews (mostly empty resolution events). Without pagination you miss reviews past position 30 — including `autogpt-reviewer`'s structured review which is typically posted after several CI runs and sits well beyond the first page.
Two things to extract:
- **Overall state**: look for `CHANGES_REQUESTED` or `APPROVED` reviews.
- **Actionable feedback**: non-empty bodies only. Empty-body reviews are thread-resolution events — they indicate progress but have no feedback to act on.
**Where each reviewer posts:**
- `autogpt-reviewer` — posts detailed structured reviews ("Blockers", "Should Fix", "Nice to Have") as **top-level reviews**. Not present on every PR. Address ALL items.
- `sentry[bot]` — posts bug predictions as **inline threads**. Fix real bugs, explain false positives.
- `coderabbitai[bot]` — posts summaries as **top-level reviews** AND actionable items as **inline threads**. Address actionable items.
- Human reviewers — can post in any source. Address ALL non-empty feedback.
### 3. PR conversation comments — REST
```bash
gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments --paginate
```
Mostly contains: bot summaries (`coderabbitai[bot]`), CI/conflict detection (`github-actions[bot]`), and author status updates. Scan for non-empty messages from non-bot human reviewers that aren't the PR author — those are the ones that need a response.
**Bots to watch for:**
- `autogpt-reviewer` — posts "Blockers", "Should Fix", "Nice to Have". Address ALL of them.
- `sentry[bot]` — bug predictions. Fix real bugs, explain false positives.
- `coderabbitai[bot]` — automated review. Address actionable items.
## For each unaddressed comment
@@ -140,69 +88,21 @@ gh pr checks {N} --repo Significant-Gravitas/AutoGPT --json bucket,name,link
```
Parse the results: if every check has `bucket` of `"pass"` or `"skipping"`, CI is green. If any has `"fail"`, CI has failed. Otherwise CI is still pending.
2. Check for merge conflicts:
2. Check for new comments (all three sources):
```bash
gh pr view {N} --repo Significant-Gravitas/AutoGPT --json mergeable --jq '.mergeable'
gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments # inline review comments
gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments # PR conversation comments
gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews # top-level reviews
```
If the result is `"CONFLICTING"`, the PR has a merge conflict — see "Resolving merge conflicts" below. If `"UNKNOWN"`, GitHub is still computing mergeability — wait and re-check next poll.
Compare against previously seen comments to detect new ones.
3. Check for new/changed comments (all three sources):
**Inline threads** — re-run the GraphQL query from "Fetch comments". For each unresolved thread, record `{thread_id, last_comment_databaseId}` as your baseline. On each poll, action is needed if:
- A new thread `id` appears that wasn't in the baseline (new thread), OR
- An existing thread's `last_comment_databaseId` has changed (new reply on existing thread)
**Conversation comments:**
```bash
gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments --paginate
```
Compare total count and newest `id` against baseline. Filter to non-empty, non-bot, non-author-update messages.
**Top-level reviews:**
```bash
gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews --paginate
```
Watch for new non-empty reviews (`CHANGES_REQUESTED` or `COMMENTED` with body). Compare total count and newest `id` against baseline.
4. **React in this precedence order (first match wins):**
3. **React to whichever changed first:**
| What happened | Action |
|---|---|
| Merge conflict detected | See "Resolving merge conflicts" below. |
| Mergeability is `UNKNOWN` | GitHub is still computing mergeability. Sleep 30 seconds, then restart polling from the top. |
| New comments detected | Address them (fix → commit → push → reply). After pushing, re-fetch all comments to update your baseline, then restart this polling loop from the top (new commits invalidate CI status). |
| CI failed (bucket == "fail") | Get failed check links: `gh pr checks {N} --repo Significant-Gravitas/AutoGPT --json bucket,link --jq '.[] \| select(.bucket == "fail") \| .link'`. Extract run ID from link (format: `.../actions/runs/<run-id>/job/...`), read logs with `gh run view <run-id> --repo Significant-Gravitas/AutoGPT --log-failed`. Fix → commit → push → restart polling. |
| CI green + no new comments | **Do not exit immediately.** Bots (coderabbitai, sentry) often post reviews shortly after CI settles. Continue polling for **2 more cycles (60s)** after CI goes green. Only exit after 2 consecutive green+quiet polls. |
| CI pending + no new comments | Sleep 30 seconds, then poll again. |
**The loop ends when:** CI fully green + all comments addressed + **2 consecutive polls with no new comments after CI settled.**
### Resolving merge conflicts
1. Identify the PR's target branch and remote:
```bash
gh pr view {N} --repo Significant-Gravitas/AutoGPT --json baseRefName --jq '.baseRefName'
git remote -v # find the remote pointing to Significant-Gravitas/AutoGPT (typically 'upstream' in forks, 'origin' for direct contributors)
```
2. Pull the latest base branch with a 3-way merge:
```bash
git pull {base-remote} {base-branch} --no-rebase
```
3. Resolve conflicting files, then verify no conflict markers remain:
```bash
if grep -R -n -E '^(<<<<<<<|=======|>>>>>>>)' <conflicted-files>; then
echo "Unresolved conflict markers found — resolve before proceeding."
exit 1
fi
```
4. Stage and push:
```bash
git add <conflicted-files>
git commit -m "Resolve merge conflicts with {base-branch}"
git push
```
5. Restart the polling loop from the top — new commits reset CI status.

View File

@@ -17,16 +17,6 @@ gh pr list --head $(git branch --show-current) --repo Significant-Gravitas/AutoG
gh pr view {N}
```
## Read the PR description
Before reading code, understand the **why**, **what**, and **how** from the PR description:
```bash
gh pr view {N} --json body --jq '.body'
```
Every PR should have a Why / What / How structure. If any of these are missing, note it as feedback.
## Read the diff
```bash
@@ -38,14 +28,12 @@ gh pr diff {N}
Before posting anything, fetch existing inline comments to avoid duplicates:
```bash
gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments --paginate
gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments
gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews
```
## What to check
**Description quality:** Does the PR description cover Why (motivation/problem), What (summary of changes), and How (approach/implementation details)? If any are missing, request them — you can't judge the approach without understanding the problem and intent.
**Correctness:** logic errors, off-by-one, missing edge cases, race conditions (TOCTOU in file access, credit charging), error handling gaps, async correctness (missing `await`, unclosed resources).
**Security:** input validation at boundaries, no injection (command, XSS, SQL), secrets not logged, file paths sanitized (`os.path.basename()` in error messages).

View File

@@ -1,558 +0,0 @@
---
name: pr-test
description: "E2E manual testing of PRs/branches using docker compose, agent-browser, and API calls. TRIGGER when user asks to manually test a PR, test a feature end-to-end, or run integration tests against a running system."
user-invocable: true
argument-hint: "[worktree path or PR number] — tests the PR in the given worktree. Optional flags: --fix (auto-fix issues found)"
metadata:
author: autogpt-team
version: "1.0.0"
---
# Manual E2E Test
Test a PR/branch end-to-end by building the full platform, interacting via browser and API, capturing screenshots, and reporting results.
## Arguments
- `$ARGUMENTS` — worktree path (e.g. `$REPO_ROOT`) or PR number
- If `--fix` flag is present, auto-fix bugs found and push fixes (like pr-address loop)
## Step 0: Resolve the target
```bash
# If argument is a PR number, find its worktree
gh pr view {N} --json headRefName --jq '.headRefName'
# If argument is a path, use it directly
```
Determine:
- `REPO_ROOT` — the root repo directory: `git -C "$WORKTREE_PATH" worktree list | head -1 | awk '{print $1}'` (or `git rev-parse --show-toplevel` if not a worktree)
- `WORKTREE_PATH` — the worktree directory
- `PLATFORM_DIR``$WORKTREE_PATH/autogpt_platform`
- `BACKEND_DIR``$PLATFORM_DIR/backend`
- `FRONTEND_DIR``$PLATFORM_DIR/frontend`
- `PR_NUMBER` — the PR number (from `gh pr list --head $(git branch --show-current)`)
- `PR_TITLE` — the PR title, slugified (e.g. "Add copilot permissions" → "add-copilot-permissions")
- `RESULTS_DIR``$REPO_ROOT/test-results/PR-{PR_NUMBER}-{slugified-title}`
Create the results directory:
```bash
PR_NUMBER=$(cd $WORKTREE_PATH && gh pr list --head $(git branch --show-current) --repo Significant-Gravitas/AutoGPT --json number --jq '.[0].number')
PR_TITLE=$(cd $WORKTREE_PATH && gh pr list --head $(git branch --show-current) --repo Significant-Gravitas/AutoGPT --json title --jq '.[0].title' | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9]/-/g' | sed 's/--*/-/g' | sed 's/^-//;s/-$//' | head -c 50)
RESULTS_DIR="$REPO_ROOT/test-results/PR-${PR_NUMBER}-${PR_TITLE}"
mkdir -p $RESULTS_DIR
```
**Test user credentials** (for logging into the UI or verifying results manually):
- Email: `test@test.com`
- Password: `testtest123`
## Step 1: Understand the PR
Before testing, understand what changed:
```bash
cd $WORKTREE_PATH
# Read PR description to understand the WHY
gh pr view {N} --json body --jq '.body'
git log --oneline dev..HEAD | head -20
git diff dev --stat
```
Read the PR description (Why / What / How) and changed files to understand:
0. **Why** does this PR exist? What problem does it solve?
1. **What** feature/fix does this PR implement?
2. **How** does it work? What's the approach?
3. What components are affected? (backend, frontend, copilot, executor, etc.)
4. What are the key user-facing behaviors to test?
## Step 2: Write test scenarios
Based on the PR analysis, write a test plan to `$RESULTS_DIR/test-plan.md`:
```markdown
# Test Plan: PR #{N} — {title}
## Scenarios
1. [Scenario name] — [what to verify]
2. ...
## API Tests (if applicable)
1. [Endpoint] — [expected behavior]
## UI Tests (if applicable)
1. [Page/component] — [interaction to test]
## Negative Tests
1. [What should NOT happen]
```
**Be critical** — include edge cases, error paths, and security checks.
## Step 3: Environment setup
### 3a. Copy .env files from the root worktree
The root worktree (`$REPO_ROOT`) has the canonical `.env` files with all API keys. Copy them to the target worktree:
```bash
# CRITICAL: .env files are NOT checked into git. They must be copied manually.
cp $REPO_ROOT/autogpt_platform/.env $PLATFORM_DIR/.env
cp $REPO_ROOT/autogpt_platform/backend/.env $BACKEND_DIR/.env
cp $REPO_ROOT/autogpt_platform/frontend/.env $FRONTEND_DIR/.env
```
### 3b. Configure copilot authentication
The copilot needs an LLM API to function. Two approaches (try subscription first):
#### Option 1: Subscription mode (preferred — uses your Claude Max/Pro subscription)
The `claude_agent_sdk` Python package **bundles its own Claude CLI binary** — no need to install `@anthropic-ai/claude-code` via npm. The backend auto-provisions credentials from environment variables on startup.
Run the helper script to extract tokens from your host and auto-update `backend/.env` (works on macOS, Linux, and Windows/WSL):
```bash
# Extracts OAuth tokens and writes CLAUDE_CODE_OAUTH_TOKEN + CLAUDE_CODE_REFRESH_TOKEN into .env
bash $BACKEND_DIR/scripts/refresh_claude_token.sh --env-file $BACKEND_DIR/.env
```
**How it works:** The script reads the OAuth token from:
- **macOS**: system keychain (`"Claude Code-credentials"`)
- **Linux/WSL**: `~/.claude/.credentials.json`
- **Windows**: `%APPDATA%/claude/.credentials.json`
It sets `CLAUDE_CODE_OAUTH_TOKEN`, `CLAUDE_CODE_REFRESH_TOKEN`, and `CHAT_USE_CLAUDE_CODE_SUBSCRIPTION=true` in the `.env` file. On container startup, the backend auto-provisions `~/.claude/.credentials.json` inside the container from these env vars. The SDK's bundled CLI then authenticates using that file. No `claude login`, no npm install needed.
**Note:** The OAuth token expires (~24h). If copilot returns auth errors, re-run the script and restart: `$BACKEND_DIR/scripts/refresh_claude_token.sh --env-file $BACKEND_DIR/.env && docker compose up -d copilot_executor`
#### Option 2: OpenRouter API key mode (fallback)
If subscription mode doesn't work, switch to API key mode using OpenRouter:
```bash
# In $BACKEND_DIR/.env, ensure these are set:
CHAT_USE_CLAUDE_CODE_SUBSCRIPTION=false
CHAT_API_KEY=<value of OPEN_ROUTER_API_KEY from the same .env>
CHAT_BASE_URL=https://openrouter.ai/api/v1
CHAT_USE_CLAUDE_AGENT_SDK=true
```
Use `sed` to update these values:
```bash
ORKEY=$(grep "^OPEN_ROUTER_API_KEY=" $BACKEND_DIR/.env | cut -d= -f2)
[ -n "$ORKEY" ] || { echo "ERROR: OPEN_ROUTER_API_KEY is missing in $BACKEND_DIR/.env"; exit 1; }
perl -i -pe 's/CHAT_USE_CLAUDE_CODE_SUBSCRIPTION=true/CHAT_USE_CLAUDE_CODE_SUBSCRIPTION=false/' $BACKEND_DIR/.env
# Add or update CHAT_API_KEY and CHAT_BASE_URL
grep -q "^CHAT_API_KEY=" $BACKEND_DIR/.env && perl -i -pe "s|^CHAT_API_KEY=.*|CHAT_API_KEY=$ORKEY|" $BACKEND_DIR/.env || echo "CHAT_API_KEY=$ORKEY" >> $BACKEND_DIR/.env
grep -q "^CHAT_BASE_URL=" $BACKEND_DIR/.env && perl -i -pe 's|^CHAT_BASE_URL=.*|CHAT_BASE_URL=https://openrouter.ai/api/v1|' $BACKEND_DIR/.env || echo "CHAT_BASE_URL=https://openrouter.ai/api/v1" >> $BACKEND_DIR/.env
```
### 3c. Stop conflicting containers
```bash
# Stop any running app containers (keep infra: supabase, redis, rabbitmq, clamav)
docker ps --format "{{.Names}}" | grep -E "rest_server|executor|copilot|websocket|database_manager|scheduler|notification|frontend|migrate" | while read name; do
docker stop "$name" 2>/dev/null
done
```
### 3e. Build and start
```bash
cd $PLATFORM_DIR && docker compose build --no-cache 2>&1 | tail -20
if [ ${PIPESTATUS[0]} -ne 0 ]; then echo "ERROR: Docker build failed"; exit 1; fi
cd $PLATFORM_DIR && docker compose up -d 2>&1 | tail -20
if [ ${PIPESTATUS[0]} -ne 0 ]; then echo "ERROR: Docker compose up failed"; exit 1; fi
```
**Note:** If the container appears to be running old code (e.g. missing PR changes), use `docker compose build --no-cache` to force a full rebuild. Docker BuildKit may sometimes reuse cached `COPY` layers from a previous build on a different branch.
**Expected time: 3-8 minutes** for build, 5-10 minutes with `--no-cache`.
### 3f. Wait for services to be ready
```bash
# Poll until backend and frontend respond
for i in $(seq 1 60); do
BACKEND=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:8006/docs 2>/dev/null)
FRONTEND=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:3000 2>/dev/null)
if [ "$BACKEND" = "200" ] && [ "$FRONTEND" = "200" ]; then
echo "Services ready"
break
fi
sleep 5
done
```
### 3h. Create test user and get auth token
```bash
ANON_KEY=$(grep "NEXT_PUBLIC_SUPABASE_ANON_KEY=" $FRONTEND_DIR/.env | sed 's/.*NEXT_PUBLIC_SUPABASE_ANON_KEY=//' | tr -d '[:space:]')
# Signup (idempotent — returns "User already registered" if exists)
RESULT=$(curl -s -X POST 'http://localhost:8000/auth/v1/signup' \
-H "apikey: $ANON_KEY" \
-H 'Content-Type: application/json' \
-d '{"email":"test@test.com","password":"testtest123"}')
# If "Database error finding user", restart supabase-auth and retry
if echo "$RESULT" | grep -q "Database error"; then
docker restart supabase-auth && sleep 5
curl -s -X POST 'http://localhost:8000/auth/v1/signup' \
-H "apikey: $ANON_KEY" \
-H 'Content-Type: application/json' \
-d '{"email":"test@test.com","password":"testtest123"}'
fi
# Get auth token
TOKEN=$(curl -s -X POST 'http://localhost:8000/auth/v1/token?grant_type=password' \
-H "apikey: $ANON_KEY" \
-H 'Content-Type: application/json' \
-d '{"email":"test@test.com","password":"testtest123"}' | jq -r '.access_token // ""')
```
**Use this token for ALL API calls:**
```bash
curl -H "Authorization: Bearer $TOKEN" http://localhost:8006/api/...
```
## Step 4: Run tests
### Service ports reference
| Service | Port | URL |
|---------|------|-----|
| Frontend | 3000 | http://localhost:3000 |
| Backend REST | 8006 | http://localhost:8006 |
| Supabase Auth (via Kong) | 8000 | http://localhost:8000 |
| Executor | 8002 | http://localhost:8002 |
| Copilot Executor | 8008 | http://localhost:8008 |
| WebSocket | 8001 | http://localhost:8001 |
| Database Manager | 8005 | http://localhost:8005 |
| Redis | 6379 | localhost:6379 |
| RabbitMQ | 5672 | localhost:5672 |
### API testing
Use `curl` with the auth token for backend API tests:
```bash
# Example: List agents
curl -s -H "Authorization: Bearer $TOKEN" http://localhost:8006/api/graphs | jq . | head -20
# Example: Create an agent
curl -s -X POST http://localhost:8006/api/graphs \
-H "Authorization: Bearer $TOKEN" \
-H 'Content-Type: application/json' \
-d '{...}' | jq .
# Example: Run an agent
curl -s -X POST "http://localhost:8006/api/graphs/{graph_id}/execute" \
-H "Authorization: Bearer $TOKEN" \
-H 'Content-Type: application/json' \
-d '{"data": {...}}'
# Example: Get execution results
curl -s -H "Authorization: Bearer $TOKEN" \
"http://localhost:8006/api/graphs/{graph_id}/executions/{exec_id}" | jq .
```
### Browser testing with agent-browser
```bash
# Close any existing session
agent-browser close 2>/dev/null || true
# Use --session-name to persist cookies across navigations
# This means login only needs to happen once per test session
agent-browser --session-name pr-test open 'http://localhost:3000/login' --timeout 15000
# Get interactive elements
agent-browser --session-name pr-test snapshot | grep "textbox\|button"
# Login
agent-browser --session-name pr-test fill {email_ref} "test@test.com"
agent-browser --session-name pr-test fill {password_ref} "testtest123"
agent-browser --session-name pr-test click {login_button_ref}
sleep 5
# Dismiss cookie banner if present
agent-browser --session-name pr-test click 'text=Accept All' 2>/dev/null || true
# Navigate — cookies are preserved so login persists
agent-browser --session-name pr-test open 'http://localhost:3000/copilot' --timeout 10000
# Take screenshot
agent-browser --session-name pr-test screenshot $RESULTS_DIR/01-page.png
# Interact with elements
agent-browser --session-name pr-test fill {ref} "text"
agent-browser --session-name pr-test press "Enter"
agent-browser --session-name pr-test click {ref}
agent-browser --session-name pr-test click 'text=Button Text'
# Read page content
agent-browser --session-name pr-test snapshot | grep "text:"
```
**Key pages:**
- `/copilot` — CoPilot chat (for testing copilot features)
- `/build` — Agent builder (for testing block/node features)
- `/build?flowID={id}` — Specific agent in builder
- `/library` — Agent library (for testing listing/import features)
- `/library/agents/{id}` — Agent detail with run history
- `/marketplace` — Marketplace
### Checking logs
```bash
# Backend REST server
docker logs autogpt_platform-rest_server-1 2>&1 | tail -30
# Executor (runs agent graphs)
docker logs autogpt_platform-executor-1 2>&1 | tail -30
# Copilot executor (runs copilot chat sessions)
docker logs autogpt_platform-copilot_executor-1 2>&1 | tail -30
# Frontend
docker logs autogpt_platform-frontend-1 2>&1 | tail -30
# Filter for errors
docker logs autogpt_platform-executor-1 2>&1 | grep -i "error\|exception\|traceback" | tail -20
```
### Copilot chat testing
The copilot uses SSE streaming. To test via API:
```bash
# Create a session
SESSION_ID=$(curl -s -X POST 'http://localhost:8006/api/chat/sessions' \
-H "Authorization: Bearer $TOKEN" \
-H 'Content-Type: application/json' \
-d '{}' | jq -r '.id // .session_id // ""')
# Stream a message (SSE - will stream chunks)
curl -N -X POST "http://localhost:8006/api/chat/sessions/$SESSION_ID/stream" \
-H "Authorization: Bearer $TOKEN" \
-H 'Content-Type: application/json' \
-d '{"message": "Hello, what can you help me with?"}' \
--max-time 60 2>/dev/null | head -50
```
Or test via browser (preferred for UI verification):
```bash
agent-browser --session-name pr-test open 'http://localhost:3000/copilot' --timeout 10000
# ... fill chat input and press Enter, wait 20-30s for response
```
## Step 5: Record results and take screenshots
**Take a screenshot at every significant test step** — before and after interactions, on success, and on failure. Name them sequentially with descriptive names:
```bash
agent-browser --session-name pr-test screenshot $RESULTS_DIR/{NN}-{description}.png
# Examples:
# $RESULTS_DIR/01-login-page.png
# $RESULTS_DIR/02-builder-with-block.png
# $RESULTS_DIR/03-copilot-response.png
# $RESULTS_DIR/04-agent-execution-result.png
# $RESULTS_DIR/05-error-state.png
```
**Aim for at least one screenshot per test scenario.** More is better — screenshots are the primary evidence that tests were actually run.
## Step 6: Show results to user with screenshots
**CRITICAL: After all tests complete, you MUST show every screenshot to the user using the Read tool, with an explanation of what each screenshot shows.** This is the most important part of the test report — the user needs to visually verify the results.
For each screenshot:
1. Use the `Read` tool to display the PNG file (Claude can read images)
2. Write a 1-2 sentence explanation below it describing:
- What page/state is being shown
- What the screenshot proves (which test scenario it validates)
- Any notable details visible in the UI
Format the output like this:
```markdown
### Screenshot 1: {descriptive title}
[Read the PNG file here]
**What it shows:** {1-2 sentence explanation of what this screenshot proves}
---
```
After showing all screenshots, output a summary table:
| # | Scenario | Result |
|---|----------|--------|
| 1 | {name} | PASS/FAIL |
| 2 | ... | ... |
**IMPORTANT:** As you show each screenshot and record test results, persist them in shell variables for Step 7:
```bash
# Build these variables during Step 6 — they are required by Step 7's script
declare -A SCREENSHOT_EXPLANATIONS=(
["01-login-page.png"]="Shows the login page loaded successfully with SSO options visible."
["02-builder-with-block.png"]="The builder canvas displays the newly added block connected to the trigger."
# ... one entry per screenshot, using the same explanations you showed the user above
)
TEST_RESULTS_TABLE="| 1 | Login flow | PASS |
| 2 | Builder block addition | PASS |
| 3 | Copilot chat | FAIL |"
# ... one row per test scenario with actual results
```
## Step 7: Post test report as PR comment with screenshots
Upload screenshots to the PR using the GitHub Git API (no local git operations — safe for worktrees), then post a comment with inline images and per-screenshot explanations.
```bash
# Upload screenshots via GitHub Git API (creates blobs, tree, commit, and ref remotely)
REPO="Significant-Gravitas/AutoGPT"
SCREENSHOTS_BRANCH="test-screenshots/pr-${PR_NUMBER}"
SCREENSHOTS_DIR="test-screenshots/PR-${PR_NUMBER}"
# Step 1: Create blobs for each screenshot and build tree JSON
TREE_JSON='['
FIRST=true
for img in $RESULTS_DIR/*.png; do
BASENAME=$(basename "$img")
B64=$(base64 < "$img")
BLOB_SHA=$(gh api "repos/${REPO}/git/blobs" -f content="$B64" -f encoding="base64" --jq '.sha')
if [ "$FIRST" = true ]; then FIRST=false; else TREE_JSON+=','; fi
TREE_JSON+="{\"path\":\"${SCREENSHOTS_DIR}/${BASENAME}\",\"mode\":\"100644\",\"type\":\"blob\",\"sha\":\"${BLOB_SHA}\"}"
done
TREE_JSON+=']'
# Step 2: Create tree, commit, and branch ref
TREE_SHA=$(echo "$TREE_JSON" | jq -c '{tree: .}' | gh api "repos/${REPO}/git/trees" --input - --jq '.sha')
COMMIT_SHA=$(gh api "repos/${REPO}/git/commits" \
-f message="test: add E2E test screenshots for PR #${PR_NUMBER}" \
-f tree="$TREE_SHA" \
--jq '.sha')
gh api "repos/${REPO}/git/refs" \
-f ref="refs/heads/${SCREENSHOTS_BRANCH}" \
-f sha="$COMMIT_SHA" 2>/dev/null \
|| gh api "repos/${REPO}/git/refs/heads/${SCREENSHOTS_BRANCH}" \
-X PATCH -f sha="$COMMIT_SHA" -f force=true
```
Then post the comment with **inline images AND explanations for each screenshot**:
```bash
REPO_URL="https://raw.githubusercontent.com/${REPO}/${SCREENSHOTS_BRANCH}"
# Build image markdown using SCREENSHOT_EXPLANATIONS and TEST_RESULTS_TABLE from Step 6
IMAGE_MARKDOWN=""
for img in $RESULTS_DIR/*.png; do
BASENAME=$(basename "$img")
TITLE=$(echo "${BASENAME%.png}" | sed 's/^[0-9]*-//' | sed 's/-/ /g' | awk '{for(i=1;i<=NF;i++) $i=toupper(substr($i,1,1)) tolower(substr($i,2))}1')
EXPLANATION="${SCREENSHOT_EXPLANATIONS[$BASENAME]}"
IMAGE_MARKDOWN="${IMAGE_MARKDOWN}
### ${TITLE}
![${BASENAME}](${REPO_URL}/${SCREENSHOTS_DIR}/${BASENAME})
${EXPLANATION}
"
done
# Write comment body to file to avoid shell interpretation issues with special characters
COMMENT_FILE=$(mktemp)
cat > "$COMMENT_FILE" <<INNEREOF
## 🧪 E2E Test Report
| # | Scenario | Result |
|---|----------|--------|
${TEST_RESULTS_TABLE}
${IMAGE_MARKDOWN}
INNEREOF
gh api "repos/${REPO}/issues/$PR_NUMBER/comments" -F body=@"$COMMENT_FILE"
rm -f "$COMMENT_FILE"
```
**The PR comment MUST include:**
1. A summary table of all scenarios with PASS/FAIL
2. Every screenshot rendered inline (not just linked)
3. A 1-2 sentence explanation below each screenshot describing what it proves
This approach uses the GitHub Git API to create blobs, trees, commits, and refs entirely server-side. No local `git checkout` or `git push` — safe for worktrees and won't interfere with the PR branch.
## Fix mode (--fix flag)
When `--fix` is present, after finding a bug:
1. Identify the root cause in the code
2. Fix it in the worktree
3. Rebuild the affected service: `cd $PLATFORM_DIR && docker compose up --build -d {service_name}`
4. Re-test the scenario
5. If fix works, commit and push:
```bash
cd $WORKTREE_PATH
git add -A
git commit -m "fix: {description of fix}"
git push
```
6. Continue testing remaining scenarios
7. After all fixes, run the full test suite again to ensure no regressions
### Fix loop (like pr-address)
```text
test scenario → find bug → fix code → rebuild service → re-test
→ repeat until all scenarios pass
→ commit + push all fixes
→ run full re-test to verify
```
## Known issues and workarounds
### Problem: "Database error finding user" on signup
**Cause:** Supabase auth service schema cache is stale after migration.
**Fix:** `docker restart supabase-auth && sleep 5` then retry signup.
### Problem: Copilot returns auth errors in subscription mode
**Cause:** `CHAT_USE_CLAUDE_CODE_SUBSCRIPTION=true` but `CLAUDE_CODE_OAUTH_TOKEN` is not set or expired.
**Fix:** Re-extract the OAuth token from macOS keychain (see step 3b, Option 1) and recreate the container (`docker compose up -d copilot_executor`). The backend auto-provisions `~/.claude/.credentials.json` from the env var on startup. No `npm install` or `claude login` needed — the SDK bundles its own CLI binary.
### Problem: agent-browser can't find chromium
**Cause:** The Dockerfile auto-provisions system chromium on all architectures (including ARM64). If your branch is behind `dev`, this may not be present yet.
**Fix:** Check if chromium exists: `which chromium || which chromium-browser`. If missing, install it: `apt-get install -y chromium` and set `AGENT_BROWSER_EXECUTABLE_PATH=/usr/bin/chromium` in the container environment.
### Problem: agent-browser selector matches multiple elements
**Cause:** `text=X` matches all elements containing that text.
**Fix:** Use `agent-browser snapshot` to get specific `ref=eNN` references, then use those: `agent-browser click eNN`.
### Problem: Frontend shows cookie banner blocking interaction
**Fix:** `agent-browser click 'text=Accept All'` before other interactions.
### Problem: Container loses npm packages after rebuild
**Cause:** `docker compose up --build` rebuilds the image, losing runtime installs.
**Fix:** Add packages to the Dockerfile instead of installing at runtime.
### Problem: Services not starting after `docker compose up`
**Fix:** Wait and check health: `docker compose ps`. Common cause: migration hasn't finished. Check: `docker logs autogpt_platform-migrate-1 2>&1 | tail -5`. If supabase-db isn't healthy: `docker restart supabase-db && sleep 10`.
### Problem: Docker uses cached layers with old code (PR changes not visible)
**Cause:** `docker compose up --build` reuses cached `COPY` layers from previous builds. If the PR branch changes Python files but the previous build already cached that layer from `dev`, the container runs `dev` code.
**Fix:** Always use `docker compose build --no-cache` for the first build of a PR branch. Subsequent rebuilds within the same branch can use `--build`.
### Problem: `agent-browser open` loses login session
**Cause:** Without session persistence, `agent-browser open` starts fresh.
**Fix:** Use `--session-name pr-test` on ALL agent-browser commands. This auto-saves/restores cookies and localStorage across navigations. Alternatively, use `agent-browser eval "window.location.href = '...'"` to navigate within the same context.
### Problem: Supabase auth returns "Database error querying schema"
**Cause:** The database schema changed (migration ran) but supabase-auth has a stale schema cache.
**Fix:** `docker restart supabase-db && sleep 10 && docker restart supabase-auth && sleep 8`. If user data was lost, re-signup.

View File

@@ -1,12 +1,8 @@
### Why / What / How
<!-- Why: Why does this PR exist? What problem does it solve, or what's broken/missing without it? -->
<!-- What: What does this PR change? Summarize the changes at a high level. -->
<!-- How: How does it work? Describe the approach, key implementation details, or architecture decisions. -->
<!-- Clearly explain the need for these changes: -->
### Changes 🏗️
<!-- List the key changes. Keep it higher level than the diff but specific enough to highlight what's new/modified. -->
<!-- Concisely describe all of the changes made in this pull request: -->
### Checklist 📋

View File

@@ -61,7 +61,7 @@ jobs:
# See: https://github.com/actions/runner/issues/598#issuecomment-2011890429
- name: Checkout repository
uses: actions/checkout@v4
uses: actions/checkout@v6
with:
fetch-depth: 0
submodules: true

View File

@@ -16,7 +16,7 @@ jobs:
build-type: [release, dev]
steps:
- name: Checkout repository
uses: actions/checkout@v4
uses: actions/checkout@v6
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3

View File

@@ -35,7 +35,7 @@ jobs:
build-type: [release, dev]
steps:
- name: Checkout repository
uses: actions/checkout@v4
uses: actions/checkout@v6
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
@@ -101,7 +101,7 @@ jobs:
steps:
- name: Check out repository
uses: actions/checkout@v4
uses: actions/checkout@v6
with:
submodules: true

View File

@@ -20,7 +20,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@v4
uses: actions/checkout@v6
- name: Log in to Docker hub
uses: docker/login-action@v3

View File

@@ -44,7 +44,7 @@ jobs:
min-python-version: '3.10'
steps:
- name: Checkout repository
uses: actions/checkout@v4
uses: actions/checkout@v6
with:
fetch-depth: 0
submodules: true

View File

@@ -42,7 +42,7 @@ jobs:
working-directory: classic/benchmark
steps:
- name: Checkout repository
uses: actions/checkout@v4
uses: actions/checkout@v6
with:
fetch-depth: 0
submodules: true
@@ -114,7 +114,7 @@ jobs:
timeout-minutes: 20
steps:
- name: Checkout repository
uses: actions/checkout@v4
uses: actions/checkout@v6
with:
fetch-depth: 0
submodules: true

View File

@@ -10,7 +10,7 @@ jobs:
contents: write
steps:
- name: Checkout repository
uses: actions/checkout@v4
uses: actions/checkout@v6
with:
submodules: true
fetch-depth: 0

View File

@@ -63,7 +63,7 @@ jobs:
# See: https://github.com/actions/runner/issues/598#issuecomment-2011890429
- name: Checkout repository
uses: actions/checkout@v4
uses: actions/checkout@v6
with:
fetch-depth: 0
submodules: true

View File

@@ -25,7 +25,7 @@ jobs:
steps:
- name: Checkout Repo
uses: actions/checkout@v4
uses: actions/checkout@v6
- name: Setup Flutter
uses: subosito/flutter-action@v2

View File

@@ -33,7 +33,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@v4
uses: actions/checkout@v6
- id: changes-in
name: Determine affected subprojects
@@ -68,7 +68,7 @@ jobs:
steps:
- name: Checkout repository
uses: actions/checkout@v4
uses: actions/checkout@v6
with:
fetch-depth: 0
@@ -120,7 +120,7 @@ jobs:
steps:
- name: Checkout repository
uses: actions/checkout@v4
uses: actions/checkout@v6
with:
fetch-depth: 0

View File

@@ -294,7 +294,7 @@ jobs:
uses: actions/upload-artifact@v4
with:
name: playwright-report
path: autogpt_platform/frontend/playwright-report
path: playwright-report
if-no-files-found: ignore
retention-days: 3
@@ -303,7 +303,7 @@ jobs:
uses: actions/upload-artifact@v4
with:
name: playwright-test-results
path: autogpt_platform/frontend/test-results
path: test-results
if-no-files-found: ignore
retention-days: 3

View File

@@ -16,7 +16,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@v4
uses: actions/checkout@v6
with:
fetch-depth: 0 # Need full history for merge testing

View File

@@ -55,37 +55,16 @@ AutoGPT Platform is a monorepo containing:
- Create the PR against the `dev` branch of the repository.
- Ensure the branch name is descriptive (e.g., `feature/add-new-block`)
- Use conventional commit messages (see below)
- **Structure the PR description with Why / What / How** — Why: the motivation (what problem it solves, what's broken/missing without it); What: high-level summary of changes; How: approach, key implementation details, or architecture decisions. Reviewers need all three to judge whether the approach fits the problem.
- Fill out the .github/PULL_REQUEST_TEMPLATE.md template as the PR description
- Always use `--body-file` to pass PR body — avoids shell interpretation of backticks and special characters:
```bash
PR_BODY=$(mktemp)
cat > "$PR_BODY" << 'PREOF'
## Summary
- use `backticks` freely here
PREOF
gh pr create --title "..." --body-file "$PR_BODY" --base dev
rm "$PR_BODY"
```
- Run the github pre-commit hooks to ensure code quality.
### Test-Driven Development (TDD)
When fixing a bug or adding a feature, follow a test-first approach:
1. **Write a failing test first** — create a test that reproduces the bug or validates the new behavior, marked with `@pytest.mark.xfail` (backend) or `.fixme` (Playwright). Run it to confirm it fails for the right reason.
2. **Implement the fix/feature** — write the minimal code to make the test pass.
3. **Remove the xfail marker** — once the test passes, remove the `xfail`/`.fixme` annotation and run the full test suite to confirm nothing else broke.
This ensures every change is covered by a test and that the test actually validates the intended behavior.
### Reviewing/Revising Pull Requests
Use `/pr-review` to review a PR or `/pr-address` to address comments.
When fetching comments manually:
- `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews --paginate` — top-level reviews
- `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments --paginate` — inline review comments (always paginate to avoid missing comments beyond page 1)
- `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews` — top-level reviews
- `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments` — inline review comments
- `gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments` — PR conversation comments
### Conventional Commits

View File

@@ -1,4 +1,4 @@
# This file is automatically @generated by Poetry 2.2.1 and should not be changed by hand.
# This file is automatically @generated by Poetry 2.1.1 and should not be changed by hand.
[[package]]
name = "annotated-doc"
@@ -67,7 +67,7 @@ description = "Backport of asyncio.Runner, a context manager that controls event
optional = false
python-versions = "<3.11,>=3.8"
groups = ["dev"]
markers = "python_version == \"3.10\""
markers = "python_version < \"3.11\""
files = [
{file = "backports_asyncio_runner-1.2.0-py3-none-any.whl", hash = "sha256:0da0a936a8aeb554eccb426dc55af3ba63bcdc69fa1a600b5bb305413a4477b5"},
{file = "backports_asyncio_runner-1.2.0.tar.gz", hash = "sha256:a5aa7b2b7d8f8bfcaa2b57313f70792df84e32a2a746f585213373f900b42162"},
@@ -541,7 +541,7 @@ description = "Backport of PEP 654 (exception groups)"
optional = false
python-versions = ">=3.7"
groups = ["main", "dev"]
markers = "python_version == \"3.10\""
markers = "python_version < \"3.11\""
files = [
{file = "exceptiongroup-1.3.0-py3-none-any.whl", hash = "sha256:4d111e6e0c13d0644cad6ddaa7ed0261a0b36971f6d23e7ec9b4b9097da78a10"},
{file = "exceptiongroup-1.3.0.tar.gz", hash = "sha256:b241f5885f560bc56a59ee63ca4c6a8bfa46ae4ad651af316d4e81817bb9fd88"},
@@ -2181,14 +2181,14 @@ testing = ["coverage (>=6.2)", "hypothesis (>=5.7.1)"]
[[package]]
name = "pytest-cov"
version = "7.1.0"
version = "7.0.0"
description = "Pytest plugin for measuring coverage."
optional = false
python-versions = ">=3.9"
groups = ["dev"]
files = [
{file = "pytest_cov-7.1.0-py3-none-any.whl", hash = "sha256:a0461110b7865f9a271aa1b51e516c9a95de9d696734a2f71e3e78f46e1d4678"},
{file = "pytest_cov-7.1.0.tar.gz", hash = "sha256:30674f2b5f6351aa09702a9c8c364f6a01c27aae0c1366ae8016160d1efc56b2"},
{file = "pytest_cov-7.0.0-py3-none-any.whl", hash = "sha256:3b8e9558b16cc1479da72058bdecf8073661c7f57f7d3c5f22a1c23507f2d861"},
{file = "pytest_cov-7.0.0.tar.gz", hash = "sha256:33c97eda2e049a0c5298e91f519302a1334c26ac65c1a483d6206fd458361af1"},
]
[package.dependencies]
@@ -2342,30 +2342,30 @@ pyasn1 = ">=0.1.3"
[[package]]
name = "ruff"
version = "0.15.7"
version = "0.15.0"
description = "An extremely fast Python linter and code formatter, written in Rust."
optional = false
python-versions = ">=3.7"
groups = ["dev"]
files = [
{file = "ruff-0.15.7-py3-none-linux_armv6l.whl", hash = "sha256:a81cc5b6910fb7dfc7c32d20652e50fa05963f6e13ead3c5915c41ac5d16668e"},
{file = "ruff-0.15.7-py3-none-macosx_10_12_x86_64.whl", hash = "sha256:722d165bd52403f3bdabc0ce9e41fc47070ac56d7a91b4e0d097b516a53a3477"},
{file = "ruff-0.15.7-py3-none-macosx_11_0_arm64.whl", hash = "sha256:7fbc2448094262552146cbe1b9643a92f66559d3761f1ad0656d4991491af49e"},
{file = "ruff-0.15.7-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:6b39329b60eba44156d138275323cc726bbfbddcec3063da57caa8a8b1d50adf"},
{file = "ruff-0.15.7-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:87768c151808505f2bfc93ae44e5f9e7c8518943e5074f76ac21558ef5627c85"},
{file = "ruff-0.15.7-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:fb0511670002c6c529ec66c0e30641c976c8963de26a113f3a30456b702468b0"},
{file = "ruff-0.15.7-py3-none-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:e0d19644f801849229db8345180a71bee5407b429dd217f853ec515e968a6912"},
{file = "ruff-0.15.7-py3-none-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:4806d8e09ef5e84eb19ba833d0442f7e300b23fe3f0981cae159a248a10f0036"},
{file = "ruff-0.15.7-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:dce0896488562f09a27b9c91b1f58a097457143931f3c4d519690dea54e624c5"},
{file = "ruff-0.15.7-py3-none-manylinux_2_31_riscv64.whl", hash = "sha256:1852ce241d2bc89e5dc823e03cff4ce73d816b5c6cdadd27dbfe7b03217d2a12"},
{file = "ruff-0.15.7-py3-none-musllinux_1_2_aarch64.whl", hash = "sha256:5f3e4b221fb4bd293f79912fc5e93a9063ebd6d0dcbd528f91b89172a9b8436c"},
{file = "ruff-0.15.7-py3-none-musllinux_1_2_armv7l.whl", hash = "sha256:b15e48602c9c1d9bdc504b472e90b90c97dc7d46c7028011ae67f3861ceba7b4"},
{file = "ruff-0.15.7-py3-none-musllinux_1_2_i686.whl", hash = "sha256:1b4705e0e85cedc74b0a23cf6a179dbb3df184cb227761979cc76c0440b5ab0d"},
{file = "ruff-0.15.7-py3-none-musllinux_1_2_x86_64.whl", hash = "sha256:112c1fa316a558bb34319282c1200a8bf0495f1b735aeb78bfcb2991e6087580"},
{file = "ruff-0.15.7-py3-none-win32.whl", hash = "sha256:6d39e2d3505b082323352f733599f28169d12e891f7dd407f2d4f54b4c2886de"},
{file = "ruff-0.15.7-py3-none-win_amd64.whl", hash = "sha256:4d53d712ddebcd7dace1bc395367aec12c057aacfe9adbb6d832302575f4d3a1"},
{file = "ruff-0.15.7-py3-none-win_arm64.whl", hash = "sha256:18e8d73f1c3fdf27931497972250340f92e8c861722161a9caeb89a58ead6ed2"},
{file = "ruff-0.15.7.tar.gz", hash = "sha256:04f1ae61fc20fe0b148617c324d9d009b5f63412c0b16474f3d5f1a1a665f7ac"},
{file = "ruff-0.15.0-py3-none-linux_armv6l.whl", hash = "sha256:aac4ebaa612a82b23d45964586f24ae9bc23ca101919f5590bdb368d74ad5455"},
{file = "ruff-0.15.0-py3-none-macosx_10_12_x86_64.whl", hash = "sha256:dcd4be7cc75cfbbca24a98d04d0b9b36a270d0833241f776b788d59f4142b14d"},
{file = "ruff-0.15.0-py3-none-macosx_11_0_arm64.whl", hash = "sha256:d747e3319b2bce179c7c1eaad3d884dc0a199b5f4d5187620530adf9105268ce"},
{file = "ruff-0.15.0-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:650bd9c56ae03102c51a5e4b554d74d825ff3abe4db22b90fd32d816c2e90621"},
{file = "ruff-0.15.0-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:a6664b7eac559e3048223a2da77769c2f92b43a6dfd4720cef42654299a599c9"},
{file = "ruff-0.15.0-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:6f811f97b0f092b35320d1556f3353bf238763420ade5d9e62ebd2b73f2ff179"},
{file = "ruff-0.15.0-py3-none-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:761ec0a66680fab6454236635a39abaf14198818c8cdf691e036f4bc0f406b2d"},
{file = "ruff-0.15.0-py3-none-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:940f11c2604d317e797b289f4f9f3fa5555ffe4fb574b55ed006c3d9b6f0eb78"},
{file = "ruff-0.15.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:bcbca3d40558789126da91d7ef9a7c87772ee107033db7191edefa34e2c7f1b4"},
{file = "ruff-0.15.0-py3-none-manylinux_2_31_riscv64.whl", hash = "sha256:9a121a96db1d75fa3eb39c4539e607f628920dd72ff1f7c5ee4f1b768ac62d6e"},
{file = "ruff-0.15.0-py3-none-musllinux_1_2_aarch64.whl", hash = "sha256:5298d518e493061f2eabd4abd067c7e4fb89e2f63291c94332e35631c07c3662"},
{file = "ruff-0.15.0-py3-none-musllinux_1_2_armv7l.whl", hash = "sha256:afb6e603d6375ff0d6b0cee563fa21ab570fd15e65c852cb24922cef25050cf1"},
{file = "ruff-0.15.0-py3-none-musllinux_1_2_i686.whl", hash = "sha256:77e515f6b15f828b94dc17d2b4ace334c9ddb7d9468c54b2f9ed2b9c1593ef16"},
{file = "ruff-0.15.0-py3-none-musllinux_1_2_x86_64.whl", hash = "sha256:6f6e80850a01eb13b3e42ee0ebdf6e4497151b48c35051aab51c101266d187a3"},
{file = "ruff-0.15.0-py3-none-win32.whl", hash = "sha256:238a717ef803e501b6d51e0bdd0d2c6e8513fe9eec14002445134d3907cd46c3"},
{file = "ruff-0.15.0-py3-none-win_amd64.whl", hash = "sha256:dd5e4d3301dc01de614da3cdffc33d4b1b96fb89e45721f1598e5532ccf78b18"},
{file = "ruff-0.15.0-py3-none-win_arm64.whl", hash = "sha256:c480d632cc0ca3f0727acac8b7d053542d9e114a462a145d0b00e7cd658c515a"},
{file = "ruff-0.15.0.tar.gz", hash = "sha256:6bdea47cdbea30d40f8f8d7d69c0854ba7c15420ec75a26f463290949d7f7e9a"},
]
[[package]]
@@ -2564,7 +2564,7 @@ description = "A lil' TOML parser"
optional = false
python-versions = ">=3.8"
groups = ["dev"]
markers = "python_version == \"3.10\""
markers = "python_version < \"3.11\""
files = [
{file = "tomli-2.2.1-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:678e4fa69e4575eb77d103de3df8a895e1591b48e740211bd1067378c69e8249"},
{file = "tomli-2.2.1-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:023aa114dd824ade0100497eb2318602af309e5a55595f76b626d6d9f3b7b0a6"},
@@ -2912,4 +2912,4 @@ type = ["pytest-mypy"]
[metadata]
lock-version = "2.1"
python-versions = ">=3.10,<4.0"
content-hash = "e0936a065565550afed18f6298b7e04e814b44100def7049f1a0d68662624a39"
content-hash = "9619cae908ad38fa2c48016a58bcf4241f6f5793aa0e6cc140276e91c433cbbb"

View File

@@ -26,8 +26,8 @@ pyright = "^1.1.408"
pytest = "^8.4.1"
pytest-asyncio = "^1.3.0"
pytest-mock = "^3.15.1"
pytest-cov = "^7.1.0"
ruff = "^0.15.7"
pytest-cov = "^7.0.0"
ruff = "^0.15.0"
[build-system]
requires = ["poetry-core"]

View File

@@ -37,6 +37,10 @@ JWT_VERIFY_KEY=your-super-secret-jwt-token-with-at-least-32-characters-long
ENCRYPTION_KEY=dvziYgz0KSK8FENhju0ZYi8-fRTfAdlz6YLhdB_jhNw=
UNSUBSCRIBE_SECRET_KEY=HlP8ivStJjmbf6NKi78m_3FnOogut0t5ckzjsIqeaio=
## ===== SIGNUP / INVITE GATE ===== ##
# Set to true to require an invite before users can sign up
ENABLE_INVITE_GATE=false
## ===== IMPORTANT OPTIONAL CONFIGURATION ===== ##
# Platform URLs (set these for webhooks and OAuth to work)
PLATFORM_BASE_URL=http://localhost:8000

View File

@@ -85,30 +85,6 @@ poetry run pytest path/to/test.py --snapshot-update
- After refactoring, update mock targets to match new module paths
- Use `AsyncMock` for async functions (`from unittest.mock import AsyncMock`)
### Test-Driven Development (TDD)
When fixing a bug or adding a feature, write the test **before** the implementation:
```python
# 1. Write a failing test marked xfail
@pytest.mark.xfail(reason="Bug #1234: widget crashes on empty input")
def test_widget_handles_empty_input():
result = widget.process("")
assert result == Widget.EMPTY_RESULT
# 2. Run it — confirm it fails (XFAIL)
# poetry run pytest path/to/test.py::test_widget_handles_empty_input -xvs
# 3. Implement the fix
# 4. Remove xfail, run again — confirm it passes
def test_widget_handles_empty_input():
result = widget.process("")
assert result == Widget.EMPTY_RESULT
```
This catches regressions and proves the fix actually works. **Every bug fix should include a test that would have caught it.**
## Database Schema
Key models (defined in `schema.prisma`):

View File

@@ -121,21 +121,19 @@ RUN ln -s ../lib/node_modules/npm/bin/npm-cli.js /usr/bin/npm \
&& ln -s ../lib/node_modules/npm/bin/npx-cli.js /usr/bin/npx
COPY --from=builder /root/.cache/prisma-python/binaries /root/.cache/prisma-python/binaries
# Install agent-browser (Copilot browser tool) using the system chromium package.
# Chrome for Testing (the binary agent-browser downloads via `agent-browser install`)
# has no ARM64 builds, so we use the distro-packaged chromium instead — verified to
# work with agent-browser via Docker tests on arm64; amd64 is validated in CI.
# Note: system chromium tracks the Debian package schedule rather than a pinned
# Chrome for Testing release. If agent-browser requires a specific Chrome version,
# verify compatibility against the chromium package version in the base image.
RUN apt-get update \
&& apt-get install -y --no-install-recommends chromium fonts-liberation \
# Install agent-browser (Copilot browser tool) + Chromium runtime dependencies.
# These are the runtime libraries Chromium/Playwright needs on Debian 13 (trixie).
RUN apt-get update && apt-get install -y --no-install-recommends \
libnss3 libnspr4 libatk1.0-0 libatk-bridge2.0-0 libcups2 libdrm2 \
libdbus-1-3 libxkbcommon0 libatspi2.0-0t64 libxcomposite1 libxdamage1 \
libxfixes3 libxrandr2 libgbm1 libasound2t64 libpango-1.0-0 libcairo2 \
libx11-6 libx11-xcb1 libxcb1 libxext6 libglib2.0-0t64 \
fonts-liberation libfontconfig1 \
&& rm -rf /var/lib/apt/lists/* \
&& npm install -g agent-browser \
&& agent-browser install \
&& rm -rf /tmp/* /root/.npm
ENV AGENT_BROWSER_EXECUTABLE_PATH=/usr/bin/chromium
WORKDIR /app/autogpt_platform/backend
# Copy only the .venv from builder (not the entire /app directory)

View File

@@ -1,8 +1,17 @@
from pydantic import BaseModel
from __future__ import annotations
from datetime import datetime
from typing import TYPE_CHECKING, Any, Literal, Optional
import prisma.enums
from pydantic import BaseModel, EmailStr
from backend.data.model import UserTransaction
from backend.util.models import Pagination
if TYPE_CHECKING:
from backend.data.invited_user import BulkInvitedUsersResult, InvitedUserRecord
class UserHistoryResponse(BaseModel):
"""Response model for listings with version history"""
@@ -14,3 +23,70 @@ class UserHistoryResponse(BaseModel):
class AddUserCreditsResponse(BaseModel):
new_balance: int
transaction_key: str
class CreateInvitedUserRequest(BaseModel):
email: EmailStr
name: Optional[str] = None
class InvitedUserResponse(BaseModel):
id: str
email: str
status: prisma.enums.InvitedUserStatus
auth_user_id: Optional[str] = None
name: Optional[str] = None
tally_understanding: Optional[dict[str, Any]] = None
tally_status: prisma.enums.TallyComputationStatus
tally_computed_at: Optional[datetime] = None
tally_error: Optional[str] = None
created_at: datetime
updated_at: datetime
@classmethod
def from_record(cls, record: InvitedUserRecord) -> InvitedUserResponse:
return cls.model_validate(record.model_dump())
class InvitedUsersResponse(BaseModel):
invited_users: list[InvitedUserResponse]
pagination: Pagination
class BulkInvitedUserRowResponse(BaseModel):
row_number: int
email: Optional[str] = None
name: Optional[str] = None
status: Literal["CREATED", "SKIPPED", "ERROR"]
message: str
invited_user: Optional[InvitedUserResponse] = None
class BulkInvitedUsersResponse(BaseModel):
created_count: int
skipped_count: int
error_count: int
results: list[BulkInvitedUserRowResponse]
@classmethod
def from_result(cls, result: BulkInvitedUsersResult) -> BulkInvitedUsersResponse:
return cls(
created_count=result.created_count,
skipped_count=result.skipped_count,
error_count=result.error_count,
results=[
BulkInvitedUserRowResponse(
row_number=row.row_number,
email=row.email,
name=row.name,
status=row.status,
message=row.message,
invited_user=(
InvitedUserResponse.from_record(row.invited_user)
if row.invited_user is not None
else None
),
)
for row in result.results
],
)

View File

@@ -0,0 +1,137 @@
import logging
import math
from autogpt_libs.auth import get_user_id, requires_admin_user
from fastapi import APIRouter, File, Query, Security, UploadFile
from backend.data.invited_user import (
bulk_create_invited_users_from_file,
create_invited_user,
list_invited_users,
retry_invited_user_tally,
revoke_invited_user,
)
from backend.data.tally import mask_email
from backend.util.models import Pagination
from .model import (
BulkInvitedUsersResponse,
CreateInvitedUserRequest,
InvitedUserResponse,
InvitedUsersResponse,
)
logger = logging.getLogger(__name__)
router = APIRouter(
prefix="/admin",
tags=["users", "admin"],
dependencies=[Security(requires_admin_user)],
)
@router.get(
"/invited-users",
response_model=InvitedUsersResponse,
summary="List Invited Users",
)
async def get_invited_users(
admin_user_id: str = Security(get_user_id),
page: int = Query(1, ge=1),
page_size: int = Query(50, ge=1, le=200),
) -> InvitedUsersResponse:
logger.info("Admin user %s requested invited users", admin_user_id)
invited_users, total = await list_invited_users(page=page, page_size=page_size)
return InvitedUsersResponse(
invited_users=[InvitedUserResponse.from_record(iu) for iu in invited_users],
pagination=Pagination(
total_items=total,
total_pages=max(1, math.ceil(total / page_size)),
current_page=page,
page_size=page_size,
),
)
@router.post(
"/invited-users",
response_model=InvitedUserResponse,
summary="Create Invited User",
)
async def create_invited_user_route(
request: CreateInvitedUserRequest,
admin_user_id: str = Security(get_user_id),
) -> InvitedUserResponse:
logger.info(
"Admin user %s creating invited user for %s",
admin_user_id,
mask_email(request.email),
)
invited_user = await create_invited_user(request.email, request.name)
logger.info(
"Admin user %s created invited user %s",
admin_user_id,
invited_user.id,
)
return InvitedUserResponse.from_record(invited_user)
@router.post(
"/invited-users/bulk",
response_model=BulkInvitedUsersResponse,
summary="Bulk Create Invited Users",
operation_id="postV2BulkCreateInvitedUsers",
)
async def bulk_create_invited_users_route(
file: UploadFile = File(...),
admin_user_id: str = Security(get_user_id),
) -> BulkInvitedUsersResponse:
logger.info(
"Admin user %s bulk invited users from %s",
admin_user_id,
file.filename or "<unnamed>",
)
content = await file.read()
result = await bulk_create_invited_users_from_file(file.filename, content)
return BulkInvitedUsersResponse.from_result(result)
@router.post(
"/invited-users/{invited_user_id}/revoke",
response_model=InvitedUserResponse,
summary="Revoke Invited User",
)
async def revoke_invited_user_route(
invited_user_id: str,
admin_user_id: str = Security(get_user_id),
) -> InvitedUserResponse:
logger.info(
"Admin user %s revoking invited user %s", admin_user_id, invited_user_id
)
invited_user = await revoke_invited_user(invited_user_id)
logger.info("Admin user %s revoked invited user %s", admin_user_id, invited_user_id)
return InvitedUserResponse.from_record(invited_user)
@router.post(
"/invited-users/{invited_user_id}/retry-tally",
response_model=InvitedUserResponse,
summary="Retry Invited User Tally",
)
async def retry_invited_user_tally_route(
invited_user_id: str,
admin_user_id: str = Security(get_user_id),
) -> InvitedUserResponse:
logger.info(
"Admin user %s retrying Tally seed for invited user %s",
admin_user_id,
invited_user_id,
)
invited_user = await retry_invited_user_tally(invited_user_id)
logger.info(
"Admin user %s retried Tally seed for invited user %s",
admin_user_id,
invited_user_id,
)
return InvitedUserResponse.from_record(invited_user)

View File

@@ -0,0 +1,168 @@
from datetime import datetime, timezone
from unittest.mock import AsyncMock
import fastapi
import fastapi.testclient
import prisma.enums
import pytest
import pytest_mock
from autogpt_libs.auth.jwt_utils import get_jwt_payload
from backend.data.invited_user import (
BulkInvitedUserRowResult,
BulkInvitedUsersResult,
InvitedUserRecord,
)
from .user_admin_routes import router as user_admin_router
app = fastapi.FastAPI()
app.include_router(user_admin_router)
client = fastapi.testclient.TestClient(app)
@pytest.fixture(autouse=True)
def setup_app_admin_auth(mock_jwt_admin):
app.dependency_overrides[get_jwt_payload] = mock_jwt_admin["get_jwt_payload"]
yield
app.dependency_overrides.clear()
def _sample_invited_user() -> InvitedUserRecord:
now = datetime.now(timezone.utc)
return InvitedUserRecord(
id="invite-1",
email="invited@example.com",
status=prisma.enums.InvitedUserStatus.INVITED,
auth_user_id=None,
name="Invited User",
tally_understanding=None,
tally_status=prisma.enums.TallyComputationStatus.PENDING,
tally_computed_at=None,
tally_error=None,
created_at=now,
updated_at=now,
)
def _sample_bulk_invited_users_result() -> BulkInvitedUsersResult:
return BulkInvitedUsersResult(
created_count=1,
skipped_count=1,
error_count=0,
results=[
BulkInvitedUserRowResult(
row_number=1,
email="invited@example.com",
name=None,
status="CREATED",
message="Invite created",
invited_user=_sample_invited_user(),
),
BulkInvitedUserRowResult(
row_number=2,
email="duplicate@example.com",
name=None,
status="SKIPPED",
message="An invited user with this email already exists",
invited_user=None,
),
],
)
def test_get_invited_users(
mocker: pytest_mock.MockerFixture,
) -> None:
mocker.patch(
"backend.api.features.admin.user_admin_routes.list_invited_users",
AsyncMock(return_value=([_sample_invited_user()], 1)),
)
response = client.get("/admin/invited-users")
assert response.status_code == 200
data = response.json()
assert len(data["invited_users"]) == 1
assert data["invited_users"][0]["email"] == "invited@example.com"
assert data["invited_users"][0]["status"] == "INVITED"
assert data["pagination"]["total_items"] == 1
assert data["pagination"]["current_page"] == 1
assert data["pagination"]["page_size"] == 50
def test_create_invited_user(
mocker: pytest_mock.MockerFixture,
) -> None:
mocker.patch(
"backend.api.features.admin.user_admin_routes.create_invited_user",
AsyncMock(return_value=_sample_invited_user()),
)
response = client.post(
"/admin/invited-users",
json={"email": "invited@example.com", "name": "Invited User"},
)
assert response.status_code == 200
data = response.json()
assert data["email"] == "invited@example.com"
assert data["name"] == "Invited User"
def test_bulk_create_invited_users(
mocker: pytest_mock.MockerFixture,
) -> None:
mocker.patch(
"backend.api.features.admin.user_admin_routes.bulk_create_invited_users_from_file",
AsyncMock(return_value=_sample_bulk_invited_users_result()),
)
response = client.post(
"/admin/invited-users/bulk",
files={
"file": ("invites.txt", b"invited@example.com\nduplicate@example.com\n")
},
)
assert response.status_code == 200
data = response.json()
assert data["created_count"] == 1
assert data["skipped_count"] == 1
assert data["results"][0]["status"] == "CREATED"
assert data["results"][1]["status"] == "SKIPPED"
def test_revoke_invited_user(
mocker: pytest_mock.MockerFixture,
) -> None:
revoked = _sample_invited_user().model_copy(
update={"status": prisma.enums.InvitedUserStatus.REVOKED}
)
mocker.patch(
"backend.api.features.admin.user_admin_routes.revoke_invited_user",
AsyncMock(return_value=revoked),
)
response = client.post("/admin/invited-users/invite-1/revoke")
assert response.status_code == 200
assert response.json()["status"] == "REVOKED"
def test_retry_invited_user_tally(
mocker: pytest_mock.MockerFixture,
) -> None:
retried = _sample_invited_user().model_copy(
update={"tally_status": prisma.enums.TallyComputationStatus.RUNNING}
)
mocker.patch(
"backend.api.features.admin.user_admin_routes.retry_invited_user_tally",
AsyncMock(return_value=retried),
)
response = client.post("/admin/invited-users/invite-1/retry-tally")
assert response.status_code == 200
assert response.json()["tally_status"] == "RUNNING"

View File

@@ -4,12 +4,14 @@ from difflib import SequenceMatcher
from typing import Any, Sequence, get_args, get_origin
import prisma
from prisma.enums import ContentType
from prisma.models import mv_suggested_blocks
import backend.api.features.library.db as library_db
import backend.api.features.library.model as library_model
import backend.api.features.store.db as store_db
import backend.api.features.store.model as store_model
from backend.api.features.store.hybrid_search import unified_hybrid_search
from backend.blocks import load_all_blocks
from backend.blocks._base import (
AnyBlockSchema,
@@ -22,7 +24,6 @@ from backend.blocks.llm import LlmModel
from backend.integrations.providers import ProviderName
from backend.util.cache import cached
from backend.util.models import Pagination
from backend.util.text import split_camelcase
from .model import (
BlockCategoryResponse,
@@ -270,7 +271,7 @@ async def _build_cached_search_results(
# Use hybrid search when query is present, otherwise list all blocks
if (include_blocks or include_integrations) and normalized_query:
block_results, block_total, integration_total = await _text_search_blocks(
block_results, block_total, integration_total = await _hybrid_search_blocks(
query=search_query,
include_blocks=include_blocks,
include_integrations=include_integrations,
@@ -382,75 +383,117 @@ def _collect_block_results(
return results, block_count, integration_count
async def _text_search_blocks(
async def _hybrid_search_blocks(
*,
query: str,
include_blocks: bool,
include_integrations: bool,
) -> tuple[list[_ScoredItem], int, int]:
"""
Search blocks using in-memory text matching over the block registry.
Search blocks using hybrid search with builder-specific filtering.
All blocks are already loaded in memory, so this is fast and reliable
regardless of whether OpenAI embeddings are available.
Uses unified_hybrid_search for semantic + lexical search, then applies
post-filtering for block/integration types and scoring adjustments.
Scoring:
- Base: text relevance via _score_primary_fields, plus BLOCK_SCORE_BOOST
- Base: hybrid relevance score (0-1) scaled to 0-100, plus BLOCK_SCORE_BOOST
to prioritize blocks over marketplace agents in combined results
- +30 for exact name match, +15 for prefix name match
- +20 if the block has an LlmModel field and the query matches an LLM model name
Args:
query: The search query string
include_blocks: Whether to include regular blocks
include_integrations: Whether to include integration blocks
Returns:
Tuple of (scored_items, block_count, integration_count)
"""
results: list[_ScoredItem] = []
block_count = 0
integration_count = 0
if not include_blocks and not include_integrations:
return results, 0, 0
return results, block_count, integration_count
normalized_query = query.strip().lower()
all_results, _, _ = _collect_block_results(
include_blocks=include_blocks,
include_integrations=include_integrations,
# Fetch more results to account for post-filtering
search_results, _ = await unified_hybrid_search(
query=query,
content_types=[ContentType.BLOCK],
page=1,
page_size=150,
min_score=0.10,
)
# Load all blocks for getting BlockInfo
all_blocks = load_all_blocks()
for item in all_results:
block_info = item.item
assert isinstance(block_info, BlockInfo)
name = split_camelcase(block_info.name).lower()
for result in search_results:
block_id = result["content_id"]
# Build rich description including input field descriptions,
# matching the searchable text that the embedding pipeline uses
desc_parts = [block_info.description or ""]
block_cls = all_blocks.get(block_info.id)
if block_cls is not None:
block: AnyBlockSchema = block_cls()
desc_parts += [
f"{f}: {info.description}"
for f, info in block.input_schema.model_fields.items()
if info.description
]
description = " ".join(desc_parts).lower()
# Skip excluded blocks
if block_id in EXCLUDED_BLOCK_IDS:
continue
score = _score_primary_fields(name, description, normalized_query)
metadata = result.get("metadata", {})
hybrid_score = result.get("relevance", 0.0)
# Get the actual block class
if block_id not in all_blocks:
continue
block_cls = all_blocks[block_id]
block: AnyBlockSchema = block_cls()
if block.disabled:
continue
# Check block/integration filter using metadata
is_integration = metadata.get("is_integration", False)
if is_integration and not include_integrations:
continue
if not is_integration and not include_blocks:
continue
# Get block info
block_info = block.get_info()
# Calculate final score: scale hybrid score and add builder-specific bonuses
# Hybrid scores are 0-1, builder scores were 0-200+
# Add BLOCK_SCORE_BOOST to prioritize blocks over marketplace agents
final_score = hybrid_score * 100 + BLOCK_SCORE_BOOST
# Add LLM model match bonus
if block_cls is not None and _matches_llm_model(
block_cls().input_schema, normalized_query
):
score += 20
has_llm_field = metadata.get("has_llm_model_field", False)
if has_llm_field and _matches_llm_model(block.input_schema, normalized_query):
final_score += 20
if score >= MIN_SCORE_FOR_FILTERED_RESULTS:
results.append(
_ScoredItem(
item=block_info,
filter_type=item.filter_type,
score=score + BLOCK_SCORE_BOOST,
sort_key=name,
)
# Add exact/prefix match bonus for deterministic tie-breaking
name = block_info.name.lower()
if name == normalized_query:
final_score += 30
elif name.startswith(normalized_query):
final_score += 15
# Track counts
filter_type: FilterType = "integrations" if is_integration else "blocks"
if is_integration:
integration_count += 1
else:
block_count += 1
results.append(
_ScoredItem(
item=block_info,
filter_type=filter_type,
score=final_score,
sort_key=name,
)
)
block_count = sum(1 for r in results if r.filter_type == "blocks")
integration_count = sum(1 for r in results if r.filter_type == "integrations")
return results, block_count, integration_count

View File

@@ -60,6 +60,7 @@ from backend.copilot.tools.models import (
)
from backend.copilot.tracking import track_user_message
from backend.data.redis_client import get_redis_async
from backend.data.understanding import get_business_understanding
from backend.data.workspace import get_or_create_workspace
from backend.util.exceptions import NotFoundError
@@ -894,6 +895,36 @@ async def session_assign_user(
return {"status": "ok"}
# ========== Suggested Prompts ==========
class SuggestedPromptsResponse(BaseModel):
"""Response model for user-specific suggested prompts."""
prompts: list[str]
@router.get(
"/suggested-prompts",
dependencies=[Security(auth.requires_user)],
)
async def get_suggested_prompts(
user_id: Annotated[str, Security(auth.get_user_id)],
) -> SuggestedPromptsResponse:
"""
Get LLM-generated suggested prompts for the authenticated user.
Returns personalized quick-action prompts based on the user's
business understanding. Returns an empty list if no custom prompts
are available.
"""
understanding = await get_business_understanding(user_id)
if understanding is None:
return SuggestedPromptsResponse(prompts=[])
return SuggestedPromptsResponse(prompts=understanding.suggested_prompts)
# ========== Configuration ==========

View File

@@ -1,7 +1,7 @@
"""Tests for chat API routes: session title update, file attachment validation, usage, and rate limiting."""
"""Tests for chat API routes: session title update, file attachment validation, usage, rate limiting, and suggested prompts."""
from datetime import UTC, datetime, timedelta
from unittest.mock import AsyncMock
from unittest.mock import AsyncMock, MagicMock
import fastapi
import fastapi.testclient
@@ -400,3 +400,62 @@ def test_usage_rejects_unauthenticated_request() -> None:
response = unauthenticated_client.get("/usage")
assert response.status_code == 401
# ─── Suggested prompts endpoint ──────────────────────────────────────
def _mock_get_business_understanding(
mocker: pytest_mock.MockerFixture,
*,
return_value=None,
):
"""Mock get_business_understanding."""
return mocker.patch(
"backend.api.features.chat.routes.get_business_understanding",
new_callable=AsyncMock,
return_value=return_value,
)
def test_suggested_prompts_returns_prompts(
mocker: pytest_mock.MockerFixture,
test_user_id: str,
) -> None:
"""User with understanding and prompts gets them back."""
mock_understanding = MagicMock()
mock_understanding.suggested_prompts = ["Do X", "Do Y", "Do Z"]
_mock_get_business_understanding(mocker, return_value=mock_understanding)
response = client.get("/suggested-prompts")
assert response.status_code == 200
assert response.json() == {"prompts": ["Do X", "Do Y", "Do Z"]}
def test_suggested_prompts_no_understanding(
mocker: pytest_mock.MockerFixture,
test_user_id: str,
) -> None:
"""User with no understanding gets empty list."""
_mock_get_business_understanding(mocker, return_value=None)
response = client.get("/suggested-prompts")
assert response.status_code == 200
assert response.json() == {"prompts": []}
def test_suggested_prompts_empty_prompts(
mocker: pytest_mock.MockerFixture,
test_user_id: str,
) -> None:
"""User with understanding but no prompts gets empty list."""
mock_understanding = MagicMock()
mock_understanding.suggested_prompts = []
_mock_get_business_understanding(mocker, return_value=mock_understanding)
response = client.get("/suggested-prompts")
assert response.status_code == 200
assert response.json() == {"prompts": []}

View File

@@ -9,7 +9,7 @@ import prisma.errors
import prisma.models
import prisma.types
from backend.data.db import query_raw_with_schema, transaction
from backend.data.db import transaction
from backend.data.graph import (
GraphModel,
GraphModelWithoutNodes,
@@ -104,8 +104,7 @@ async def get_store_agents(
# search_used_hybrid remains False, will use fallback path below
# Convert hybrid search results (dict format) if hybrid succeeded
# Fall through to direct DB search if hybrid returned nothing
if search_used_hybrid and agents:
if search_used_hybrid:
total_pages = (total + page_size - 1) // page_size
store_agents: list[store_model.StoreAgent] = []
for agent in agents:
@@ -131,20 +130,52 @@ async def get_store_agents(
)
continue
if not search_used_hybrid or not agents:
# Fallback path: direct DB query with optional tsvector search.
# This mirrors the original pre-hybrid-search implementation.
store_agents, total = await _fallback_store_agent_search(
search_query=search_query,
featured=featured,
creators=creators,
category=category,
sorted_by=sorted_by,
page=page,
page_size=page_size,
if not search_used_hybrid:
# Fallback path - use basic search or no search
where_clause: prisma.types.StoreAgentWhereInput = {"is_available": True}
if featured:
where_clause["featured"] = featured
if creators:
where_clause["creator_username"] = {"in": creators}
if category:
where_clause["categories"] = {"has": category}
# Add basic text search if search_query provided but hybrid failed
if search_query:
where_clause["OR"] = [
{"agent_name": {"contains": search_query, "mode": "insensitive"}},
{"sub_heading": {"contains": search_query, "mode": "insensitive"}},
{"description": {"contains": search_query, "mode": "insensitive"}},
]
order_by = []
if sorted_by == StoreAgentsSortOptions.RATING:
order_by.append({"rating": "desc"})
elif sorted_by == StoreAgentsSortOptions.RUNS:
order_by.append({"runs": "desc"})
elif sorted_by == StoreAgentsSortOptions.NAME:
order_by.append({"agent_name": "asc"})
elif sorted_by == StoreAgentsSortOptions.UPDATED_AT:
order_by.append({"updated_at": "desc"})
db_agents = await prisma.models.StoreAgent.prisma().find_many(
where=where_clause,
order=order_by,
skip=(page - 1) * page_size,
take=page_size,
)
total = await prisma.models.StoreAgent.prisma().count(where=where_clause)
total_pages = (total + page_size - 1) // page_size
store_agents: list[store_model.StoreAgent] = []
for agent in db_agents:
try:
store_agents.append(store_model.StoreAgent.from_db(agent))
except Exception as e:
logger.error(f"Error parsing StoreAgent from db: {e}")
continue
logger.debug(f"Found {len(store_agents)} agents")
return store_model.StoreAgentsResponse(
agents=store_agents,
@@ -164,126 +195,6 @@ async def get_store_agents(
# await log_search_term(search_query=search_term)
async def _fallback_store_agent_search(
*,
search_query: str | None,
featured: bool,
creators: list[str] | None,
category: str | None,
sorted_by: StoreAgentsSortOptions | None,
page: int,
page_size: int,
) -> tuple[list[store_model.StoreAgent], int]:
"""Direct DB search fallback when hybrid search is unavailable or empty.
Uses ad-hoc to_tsvector/plainto_tsquery with ts_rank_cd for text search,
matching the quality of the original pre-hybrid-search implementation.
Falls back to simple listing when no search query is provided.
"""
if not search_query:
# No search query — use Prisma for simple filtered listing
where_clause: prisma.types.StoreAgentWhereInput = {"is_available": True}
if featured:
where_clause["featured"] = featured
if creators:
where_clause["creator_username"] = {"in": creators}
if category:
where_clause["categories"] = {"has": category}
order_by = []
if sorted_by == StoreAgentsSortOptions.RATING:
order_by.append({"rating": "desc"})
elif sorted_by == StoreAgentsSortOptions.RUNS:
order_by.append({"runs": "desc"})
elif sorted_by == StoreAgentsSortOptions.NAME:
order_by.append({"agent_name": "asc"})
elif sorted_by == StoreAgentsSortOptions.UPDATED_AT:
order_by.append({"updated_at": "desc"})
db_agents = await prisma.models.StoreAgent.prisma().find_many(
where=where_clause,
order=order_by,
skip=(page - 1) * page_size,
take=page_size,
)
total = await prisma.models.StoreAgent.prisma().count(where=where_clause)
return [store_model.StoreAgent.from_db(a) for a in db_agents], total
# Text search using ad-hoc tsvector on StoreAgent view fields
params: list[Any] = [search_query]
filters = ["sa.is_available = true"]
param_idx = 2
if featured:
filters.append("sa.featured = true")
if creators:
params.append(creators)
filters.append(f"sa.creator_username = ANY(${param_idx})")
param_idx += 1
if category:
params.append(category)
filters.append(f"${param_idx} = ANY(sa.categories)")
param_idx += 1
where_sql = " AND ".join(filters)
params.extend([page_size, (page - 1) * page_size])
limit_param = f"${param_idx}"
param_idx += 1
offset_param = f"${param_idx}"
sql = f"""
WITH ranked AS (
SELECT sa.*,
ts_rank_cd(
to_tsvector('english',
COALESCE(sa.agent_name, '') || ' ' ||
COALESCE(sa.sub_heading, '') || ' ' ||
COALESCE(sa.description, '')
),
plainto_tsquery('english', $1)
) AS rank,
COUNT(*) OVER () AS total_count
FROM {{schema_prefix}}"StoreAgent" sa
WHERE {where_sql}
AND to_tsvector('english',
COALESCE(sa.agent_name, '') || ' ' ||
COALESCE(sa.sub_heading, '') || ' ' ||
COALESCE(sa.description, '')
) @@ plainto_tsquery('english', $1)
)
SELECT * FROM ranked
ORDER BY rank DESC
LIMIT {limit_param} OFFSET {offset_param}
"""
results = await query_raw_with_schema(sql, *params)
total = results[0]["total_count"] if results else 0
store_agents = []
for row in results:
try:
store_agents.append(
store_model.StoreAgent(
slug=row["slug"],
agent_name=row["agent_name"],
agent_image=row["agent_image"][0] if row["agent_image"] else "",
creator=row["creator_username"] or "Needs Profile",
creator_avatar=row["creator_avatar"] or "",
sub_heading=row["sub_heading"],
description=row["description"],
runs=row["runs"],
rating=row["rating"],
agent_graph_id=row.get("graph_id", ""),
)
)
except Exception as e:
logger.error(f"Error parsing StoreAgent from fallback search: {e}")
continue
return store_agents, total
async def log_search_term(search_query: str):
"""Log a search term to the database"""
@@ -1228,21 +1139,16 @@ async def review_store_submission(
},
)
# Generate embedding for approved listing (best-effort)
try:
await ensure_embedding(
version_id=store_listing_version_id,
name=submission.name,
description=submission.description,
sub_heading=submission.subHeading,
categories=submission.categories,
tx=tx,
)
except Exception as emb_err:
logger.warning(
f"Could not generate embedding for listing "
f"{store_listing_version_id}: {emb_err}"
)
# Generate embedding for approved listing (blocking - admin operation)
# Inside transaction: if embedding fails, entire transaction rolls back
await ensure_embedding(
version_id=store_listing_version_id,
name=submission.name,
description=submission.description,
sub_heading=submission.subHeading,
categories=submission.categories,
tx=tx,
)
await prisma.models.StoreListing.prisma(tx).update(
where={"id": submission.storeListingId},

View File

@@ -55,6 +55,7 @@ from backend.data.credit import (
set_auto_top_up,
)
from backend.data.graph import GraphSettings
from backend.data.invited_user import get_or_activate_user
from backend.data.model import CredentialsMetaInput, UserOnboarding
from backend.data.notifications import NotificationPreference, NotificationPreferenceDTO
from backend.data.onboarding import (
@@ -70,7 +71,6 @@ from backend.data.onboarding import (
update_user_onboarding,
)
from backend.data.user import (
get_or_create_user,
get_user_by_id,
get_user_notification_preference,
update_user_email,
@@ -136,12 +136,10 @@ _tally_background_tasks: set[asyncio.Task] = set()
dependencies=[Security(requires_user)],
)
async def get_or_create_user_route(user_data: dict = Security(get_jwt_payload)):
user = await get_or_create_user(user_data)
user = await get_or_activate_user(user_data)
# Fire-and-forget: populate business understanding from Tally form.
# We use created_at proximity instead of an is_new flag because
# get_or_create_user is cached — a separate is_new return value would be
# unreliable on repeated calls within the cache TTL.
# Fire-and-forget: backfill Tally understanding when invite pre-seeding did
# not produce a stored result before first activation.
age_seconds = (datetime.now(timezone.utc) - user.created_at).total_seconds()
if age_seconds < 30:
try:
@@ -165,7 +163,8 @@ async def get_or_create_user_route(user_data: dict = Security(get_jwt_payload)):
dependencies=[Security(requires_user)],
)
async def update_user_email_route(
user_id: Annotated[str, Security(get_user_id)], email: str = Body(...)
user_id: Annotated[str, Security(get_user_id)],
email: str = Body(...),
) -> dict[str, str]:
await update_user_email(user_id, email)
@@ -179,10 +178,16 @@ async def update_user_email_route(
dependencies=[Security(requires_user)],
)
async def get_user_timezone_route(
user_data: dict = Security(get_jwt_payload),
user_id: Annotated[str, Security(get_user_id)],
) -> TimezoneResponse:
"""Get user timezone setting."""
user = await get_or_create_user(user_data)
try:
user = await get_user_by_id(user_id)
except ValueError:
raise HTTPException(
status_code=HTTP_404_NOT_FOUND,
detail="User not found. Please complete activation via /auth/user first.",
)
return TimezoneResponse(timezone=user.timezone)
@@ -193,7 +198,8 @@ async def get_user_timezone_route(
dependencies=[Security(requires_user)],
)
async def update_user_timezone_route(
user_id: Annotated[str, Security(get_user_id)], request: UpdateTimezoneRequest
user_id: Annotated[str, Security(get_user_id)],
request: UpdateTimezoneRequest,
) -> TimezoneResponse:
"""Update user timezone. The timezone should be a valid IANA timezone identifier."""
user = await update_user_timezone(user_id, str(request.timezone))
@@ -592,11 +598,6 @@ async def fulfill_checkout(user_id: Annotated[str, Security(get_user_id)]):
async def configure_user_auto_top_up(
request: AutoTopUpConfig, user_id: Annotated[str, Security(get_user_id)]
) -> str:
"""Configure auto top-up settings and perform an immediate top-up if needed.
Raises HTTPException(422) if the request parameters are invalid or if
the credit top-up fails.
"""
if request.threshold < 0:
raise HTTPException(status_code=422, detail="Threshold must be greater than 0")
if request.amount < 500 and request.amount != 0:
@@ -611,20 +612,10 @@ async def configure_user_auto_top_up(
user_credit_model = await get_user_credit_model(user_id)
current_balance = await user_credit_model.get_credits(user_id)
try:
if current_balance < request.threshold:
await user_credit_model.top_up_credits(user_id, request.amount)
else:
await user_credit_model.top_up_credits(user_id, 0)
except ValueError as e:
known_messages = (
"must not be negative",
"already exists for user",
"No payment method found",
)
if any(msg in str(e) for msg in known_messages):
raise HTTPException(status_code=422, detail=str(e))
raise
if current_balance < request.threshold:
await user_credit_model.top_up_credits(user_id, request.amount)
else:
await user_credit_model.top_up_credits(user_id, 0)
await set_auto_top_up(
user_id, AutoTopUpConfig(threshold=request.threshold, amount=request.amount)

View File

@@ -51,7 +51,7 @@ def test_get_or_create_user_route(
}
mocker.patch(
"backend.api.features.v1.get_or_create_user",
"backend.api.features.v1.get_or_activate_user",
return_value=mock_user,
)

View File

@@ -188,7 +188,6 @@ async def upload_file(
user_id: Annotated[str, fastapi.Security(get_user_id)],
file: UploadFile,
session_id: str | None = Query(default=None),
overwrite: bool = Query(default=False),
) -> UploadFileResponse:
"""
Upload a file to the user's workspace.
@@ -249,9 +248,7 @@ async def upload_file(
# Write file via WorkspaceManager
manager = WorkspaceManager(user_id, workspace.id, session_id)
try:
workspace_file = await manager.write_file(
content, filename, overwrite=overwrite
)
workspace_file = await manager.write_file(content, filename)
except ValueError as e:
raise fastapi.HTTPException(status_code=409, detail=str(e)) from e

View File

@@ -19,6 +19,7 @@ from prisma.errors import PrismaError
import backend.api.features.admin.credit_admin_routes
import backend.api.features.admin.execution_analytics_routes
import backend.api.features.admin.store_admin_routes
import backend.api.features.admin.user_admin_routes
import backend.api.features.builder
import backend.api.features.builder.routes
import backend.api.features.chat.routes as chat_routes
@@ -210,22 +211,13 @@ instrument_fastapi(
def handle_internal_http_error(status_code: int = 500, log_error: bool = True):
def handler(request: fastapi.Request, exc: Exception):
if log_error:
if status_code >= 500:
logger.exception(
"%s %s failed. Investigate and resolve the underlying issue: %s",
request.method,
request.url.path,
exc,
exc_info=exc,
)
else:
logger.warning(
"%s %s failed with %d: %s",
request.method,
request.url.path,
status_code,
exc,
)
logger.exception(
"%s %s failed. Investigate and resolve the underlying issue: %s",
request.method,
request.url.path,
exc,
exc_info=exc,
)
hint = (
"Adjust the request and retry."
@@ -275,10 +267,12 @@ async def validation_error_handler(
app.add_exception_handler(PrismaError, handle_internal_http_error(500))
app.add_exception_handler(FolderAlreadyExistsError, handle_internal_http_error(409))
app.add_exception_handler(FolderValidationError, handle_internal_http_error(400))
app.add_exception_handler(NotFoundError, handle_internal_http_error(404))
app.add_exception_handler(NotAuthorizedError, handle_internal_http_error(403))
app.add_exception_handler(
FolderAlreadyExistsError, handle_internal_http_error(409, False)
)
app.add_exception_handler(FolderValidationError, handle_internal_http_error(400, False))
app.add_exception_handler(NotFoundError, handle_internal_http_error(404, False))
app.add_exception_handler(NotAuthorizedError, handle_internal_http_error(403, False))
app.add_exception_handler(RequestValidationError, validation_error_handler)
app.add_exception_handler(pydantic.ValidationError, validation_error_handler)
app.add_exception_handler(MissingConfigError, handle_internal_http_error(503))
@@ -318,6 +312,11 @@ app.include_router(
tags=["v2", "admin"],
prefix="/api/executions",
)
app.include_router(
backend.api.features.admin.user_admin_routes.router,
tags=["v2", "admin"],
prefix="/api/users",
)
app.include_router(
backend.api.features.executions.review.routes.router,
tags=["v2", "executions", "review"],

View File

@@ -1,376 +0,0 @@
from __future__ import annotations
import asyncio
import contextvars
import json
import logging
from typing import TYPE_CHECKING, Any
from typing_extensions import TypedDict # Needed for Python <3.12 compatibility
from backend.blocks._base import (
Block,
BlockCategory,
BlockOutput,
BlockSchemaInput,
BlockSchemaOutput,
)
from backend.data.model import SchemaField
if TYPE_CHECKING:
from backend.data.execution import ExecutionContext
logger = logging.getLogger(__name__)
# Block ID shared between autopilot.py and copilot prompting.py.
AUTOPILOT_BLOCK_ID = "c069dc6b-c3ed-4c12-b6e5-d47361e64ce6"
class ToolCallEntry(TypedDict):
"""A single tool invocation record from an autopilot execution."""
tool_call_id: str
tool_name: str
input: Any
output: Any | None
success: bool | None
class TokenUsage(TypedDict):
"""Aggregated token counts from the autopilot stream."""
prompt_tokens: int
completion_tokens: int
total_tokens: int
class AutoPilotBlock(Block):
"""Execute tasks using AutoGPT AutoPilot with full access to platform tools.
The autopilot can manage agents, access workspace files, fetch web content,
run blocks, and more. This block enables sub-agent patterns (autopilot calling
autopilot) and scheduled autopilot execution via the agent executor.
"""
class Input(BlockSchemaInput):
"""Input schema for the AutoPilot block."""
prompt: str = SchemaField(
description=(
"The task or instruction for the autopilot to execute. "
"The autopilot has access to platform tools like agent management, "
"workspace files, web fetch, block execution, and more."
),
placeholder="Find my agents and list them",
advanced=False,
)
system_context: str = SchemaField(
description=(
"Optional additional context prepended to the prompt. "
"Use this to constrain autopilot behavior, provide domain "
"context, or set output format requirements."
),
default="",
advanced=True,
)
session_id: str = SchemaField(
description=(
"Session ID to continue an existing autopilot conversation. "
"Leave empty to start a new session. "
"Use the session_id output from a previous run to continue."
),
default="",
advanced=True,
)
max_recursion_depth: int = SchemaField(
description=(
"Maximum nesting depth when the autopilot calls this block "
"recursively (sub-agent pattern). Prevents infinite loops."
),
default=3,
ge=1,
le=10,
advanced=True,
)
# timeout_seconds removed: the SDK manages its own heartbeat-based
# timeouts internally; wrapping with asyncio.timeout corrupts the
# SDK's internal stream (see service.py CRITICAL comment).
class Output(BlockSchemaOutput):
"""Output schema for the AutoPilot block."""
response: str = SchemaField(
description="The final text response from the autopilot."
)
tool_calls: list[ToolCallEntry] = SchemaField(
description=(
"List of tools called during execution. Each entry has "
"tool_call_id, tool_name, input, output, and success fields."
),
)
conversation_history: str = SchemaField(
description=(
"Current turn messages (user prompt + assistant reply) as JSON. "
"It can be used for logging or analysis."
),
)
session_id: str = SchemaField(
description=(
"Session ID for this conversation. "
"Pass this back to continue the conversation in a future run."
),
)
token_usage: TokenUsage = SchemaField(
description=(
"Token usage statistics: prompt_tokens, "
"completion_tokens, total_tokens."
),
)
def __init__(self):
super().__init__(
id=AUTOPILOT_BLOCK_ID,
description=(
"Execute tasks using AutoGPT AutoPilot with full access to "
"platform tools (agent management, workspace files, web fetch, "
"block execution, and more). Enables sub-agent patterns and "
"scheduled autopilot execution."
),
categories={BlockCategory.AI, BlockCategory.AGENT},
input_schema=AutoPilotBlock.Input,
output_schema=AutoPilotBlock.Output,
test_input={
"prompt": "List my agents",
"system_context": "",
"session_id": "",
"max_recursion_depth": 3,
},
test_output=[
("response", "You have 2 agents: Agent A and Agent B."),
("tool_calls", []),
(
"conversation_history",
'[{"role": "user", "content": "List my agents"}]',
),
("session_id", "test-session-id"),
(
"token_usage",
{
"prompt_tokens": 100,
"completion_tokens": 50,
"total_tokens": 150,
},
),
],
test_mock={
"create_session": lambda *args, **kwargs: "test-session-id",
"execute_copilot": lambda *args, **kwargs: (
"You have 2 agents: Agent A and Agent B.",
[],
'[{"role": "user", "content": "List my agents"}]',
"test-session-id",
{
"prompt_tokens": 100,
"completion_tokens": 50,
"total_tokens": 150,
},
),
},
)
async def create_session(self, user_id: str) -> str:
"""Create a new chat session and return its ID (mockable for tests)."""
from backend.copilot.model import create_chat_session
session = await create_chat_session(user_id)
return session.session_id
async def execute_copilot(
self,
prompt: str,
system_context: str,
session_id: str,
max_recursion_depth: int,
user_id: str,
) -> tuple[str, list[ToolCallEntry], str, str, TokenUsage]:
"""Invoke the copilot and collect all stream results.
Delegates to :func:`collect_copilot_response` — the shared helper that
consumes ``stream_chat_completion_sdk`` without wrapping it in an
``asyncio.timeout`` (the SDK manages its own heartbeat-based timeouts).
Args:
prompt: The user task/instruction.
system_context: Optional context prepended to the prompt.
session_id: Chat session to use.
max_recursion_depth: Maximum allowed recursion nesting.
user_id: Authenticated user ID.
Returns:
A tuple of (response_text, tool_calls, history_json, session_id, usage).
"""
from backend.copilot.sdk.collect import collect_copilot_response
tokens = _check_recursion(max_recursion_depth)
try:
effective_prompt = prompt
if system_context:
effective_prompt = f"[System Context: {system_context}]\n\n{prompt}"
result = await collect_copilot_response(
session_id=session_id,
message=effective_prompt,
user_id=user_id,
)
# Build a lightweight conversation summary from streamed data.
turn_messages: list[dict[str, Any]] = [
{"role": "user", "content": effective_prompt},
]
if result.tool_calls:
turn_messages.append(
{
"role": "assistant",
"content": result.response_text,
"tool_calls": result.tool_calls,
}
)
else:
turn_messages.append(
{"role": "assistant", "content": result.response_text}
)
history_json = json.dumps(turn_messages, default=str)
tool_calls: list[ToolCallEntry] = [
{
"tool_call_id": tc["tool_call_id"],
"tool_name": tc["tool_name"],
"input": tc["input"],
"output": tc["output"],
"success": tc["success"],
}
for tc in result.tool_calls
]
usage: TokenUsage = {
"prompt_tokens": result.prompt_tokens,
"completion_tokens": result.completion_tokens,
"total_tokens": result.total_tokens,
}
return (
result.response_text,
tool_calls,
history_json,
session_id,
usage,
)
finally:
_reset_recursion(tokens)
async def run(
self,
input_data: Input,
*,
execution_context: ExecutionContext,
**kwargs,
) -> BlockOutput:
"""Validate inputs, invoke the autopilot, and yield structured outputs.
Yields session_id even on failure so callers can inspect/resume the session.
"""
if not input_data.prompt.strip():
yield "error", "Prompt cannot be empty."
return
if not execution_context.user_id:
yield "error", "Cannot run autopilot without an authenticated user."
return
if input_data.max_recursion_depth < 1:
yield "error", "max_recursion_depth must be at least 1."
return
# Create session eagerly so the user always gets the session_id,
# even if the downstream stream fails (avoids orphaned sessions).
sid = input_data.session_id
if not sid:
sid = await self.create_session(execution_context.user_id)
# NOTE: No asyncio.timeout() here — the SDK manages its own
# heartbeat-based timeouts internally. Wrapping with asyncio.timeout
# would cancel the task mid-flight, corrupting the SDK's internal
# anyio memory stream (see service.py CRITICAL comment).
try:
response, tool_calls, history, _, usage = await self.execute_copilot(
prompt=input_data.prompt,
system_context=input_data.system_context,
session_id=sid,
max_recursion_depth=input_data.max_recursion_depth,
user_id=execution_context.user_id,
)
yield "response", response
yield "tool_calls", tool_calls
yield "conversation_history", history
yield "session_id", sid
yield "token_usage", usage
except asyncio.CancelledError:
yield "session_id", sid
yield "error", "AutoPilot execution was cancelled."
raise
except Exception as exc:
yield "session_id", sid
yield "error", str(exc)
# ---------------------------------------------------------------------------
# Helpers placed after the block class for top-down readability.
# ---------------------------------------------------------------------------
# Task-scoped recursion depth counter & chain-wide limit.
# contextvars are scoped to the current asyncio task, so concurrent
# graph executions each get independent counters.
_autopilot_recursion_depth: contextvars.ContextVar[int] = contextvars.ContextVar(
"_autopilot_recursion_depth", default=0
)
_autopilot_recursion_limit: contextvars.ContextVar[int | None] = contextvars.ContextVar(
"_autopilot_recursion_limit", default=None
)
def _check_recursion(
max_depth: int,
) -> tuple[contextvars.Token[int], contextvars.Token[int | None]]:
"""Check and increment recursion depth.
Returns ContextVar tokens that must be passed to ``_reset_recursion``
when the caller exits to restore the previous depth.
Raises:
RuntimeError: If the current depth already meets or exceeds the limit.
"""
current = _autopilot_recursion_depth.get()
inherited = _autopilot_recursion_limit.get()
limit = max_depth if inherited is None else min(inherited, max_depth)
if current >= limit:
raise RuntimeError(
f"AutoPilot recursion depth limit reached ({limit}). "
"The autopilot has called itself too many times."
)
return (
_autopilot_recursion_depth.set(current + 1),
_autopilot_recursion_limit.set(limit),
)
def _reset_recursion(
tokens: tuple[contextvars.Token[int], contextvars.Token[int | None]],
) -> None:
"""Restore recursion depth and limit to their previous values."""
_autopilot_recursion_depth.reset(tokens[0])
_autopilot_recursion_limit.reset(tokens[1])

View File

@@ -33,13 +33,6 @@ from backend.integrations.providers import ProviderName
from backend.util import json
from backend.util.clients import OPENROUTER_BASE_URL
from backend.util.logging import TruncatedLogger
from backend.util.openai_responses import (
convert_tools_to_responses_format,
extract_responses_content,
extract_responses_reasoning,
extract_responses_tool_calls,
extract_responses_usage,
)
from backend.util.prompt import compress_context, estimate_token_count
from backend.util.request import validate_url_host
from backend.util.settings import Settings
@@ -118,6 +111,7 @@ class LlmModel(str, Enum, metaclass=LlmModelMeta):
GPT4O_MINI = "gpt-4o-mini"
GPT4O = "gpt-4o"
GPT4_TURBO = "gpt-4-turbo"
GPT3_5_TURBO = "gpt-3.5-turbo"
# Anthropic models
CLAUDE_4_1_OPUS = "claude-opus-4-1-20250805"
CLAUDE_4_OPUS = "claude-opus-4-20250514"
@@ -283,6 +277,9 @@ MODEL_METADATA = {
LlmModel.GPT4_TURBO: ModelMetadata(
"openai", 128000, 4096, "GPT-4 Turbo", "OpenAI", "OpenAI", 3
), # gpt-4-turbo-2024-04-09
LlmModel.GPT3_5_TURBO: ModelMetadata(
"openai", 16385, 4096, "GPT-3.5 Turbo", "OpenAI", "OpenAI", 1
), # gpt-3.5-turbo-0125
# https://docs.anthropic.com/en/docs/about-claude/models
LlmModel.CLAUDE_4_1_OPUS: ModelMetadata(
"anthropic", 200000, 32000, "Claude Opus 4.1", "Anthropic", "Anthropic", 3
@@ -796,19 +793,6 @@ async def llm_call(
)
prompt = result.messages
# Sanitize unpaired surrogates in message content to prevent
# UnicodeEncodeError when httpx encodes the JSON request body.
for msg in prompt:
content = msg.get("content")
if isinstance(content, str):
try:
content.encode("utf-8")
except UnicodeEncodeError:
logger.warning("Sanitized unpaired surrogates in LLM prompt content")
msg["content"] = content.encode("utf-8", errors="surrogatepass").decode(
"utf-8", errors="replace"
)
# Calculate available tokens based on context window and input length
estimated_input_tokens = estimate_token_count(prompt)
model_max_output = llm_model.max_output_tokens or int(2**15)
@@ -817,53 +801,36 @@ async def llm_call(
max_tokens = max(min(available_tokens, model_max_output, user_max), 1)
if provider == "openai":
tools_param = tools if tools else openai.NOT_GIVEN
oai_client = openai.AsyncOpenAI(api_key=credentials.api_key.get_secret_value())
response_format = None
tools_param = convert_tools_to_responses_format(tools) if tools else openai.omit
parallel_tool_calls = get_parallel_tool_calls_param(
llm_model, parallel_tool_calls
)
text_config = openai.omit
if force_json_output:
text_config = {"format": {"type": "json_object"}} # type: ignore
response_format = {"type": "json_object"}
response = await oai_client.responses.create(
response = await oai_client.chat.completions.create(
model=llm_model.value,
input=prompt, # type: ignore[arg-type]
tools=tools_param, # type: ignore[arg-type]
max_output_tokens=max_tokens,
parallel_tool_calls=get_parallel_tool_calls_param(
llm_model, parallel_tool_calls
),
text=text_config, # type: ignore[arg-type]
store=False,
messages=prompt, # type: ignore
response_format=response_format, # type: ignore
max_completion_tokens=max_tokens,
tools=tools_param, # type: ignore
parallel_tool_calls=parallel_tool_calls,
)
raw_tool_calls = extract_responses_tool_calls(response)
tool_calls = (
[
ToolContentBlock(
id=tc["id"],
type=tc["type"],
function=ToolCall(
name=tc["function"]["name"],
arguments=tc["function"]["arguments"],
),
)
for tc in raw_tool_calls
]
if raw_tool_calls
else None
)
reasoning = extract_responses_reasoning(response)
content = extract_responses_content(response)
prompt_tokens, completion_tokens = extract_responses_usage(response)
tool_calls = extract_openai_tool_calls(response)
reasoning = extract_openai_reasoning(response)
return LLMResponse(
raw_response=response,
raw_response=response.choices[0].message,
prompt=prompt,
response=content,
response=response.choices[0].message.content or "",
tool_calls=tool_calls,
prompt_tokens=prompt_tokens,
completion_tokens=completion_tokens,
prompt_tokens=response.usage.prompt_tokens if response.usage else 0,
completion_tokens=response.usage.completion_tokens if response.usage else 0,
reasoning=reasoning,
)
elif provider == "anthropic":

View File

@@ -61,27 +61,20 @@ class ExecutionParams(BaseModel):
def _get_tool_requests(entry: dict[str, Any]) -> list[str]:
"""
Return a list of tool_call_ids if the entry is a tool request.
Supports OpenAI Chat Completions, Responses API, and Anthropic formats.
Supports both OpenAI and Anthropics formats.
"""
tool_call_ids = []
# OpenAI Responses API: function_call items have type="function_call"
if entry.get("type") == "function_call":
if call_id := entry.get("call_id"):
tool_call_ids.append(call_id)
return tool_call_ids
if entry.get("role") != "assistant":
return tool_call_ids
# OpenAI Chat Completions: check for tool_calls in the entry.
# OpenAI: check for tool_calls in the entry.
calls = entry.get("tool_calls")
if isinstance(calls, list):
for call in calls:
if tool_id := call.get("id"):
tool_call_ids.append(tool_id)
# Anthropic: check content items for tool_use type.
# Anthropics: check content items for tool_use type.
content = entry.get("content")
if isinstance(content, list):
for item in content:
@@ -96,22 +89,16 @@ def _get_tool_requests(entry: dict[str, Any]) -> list[str]:
def _get_tool_responses(entry: dict[str, Any]) -> list[str]:
"""
Return a list of tool_call_ids if the entry is a tool response.
Supports OpenAI Chat Completions, Responses API, and Anthropic formats.
Supports both OpenAI and Anthropics formats.
"""
tool_call_ids: list[str] = []
# OpenAI Responses API: function_call_output items
if entry.get("type") == "function_call_output":
if call_id := entry.get("call_id"):
tool_call_ids.append(str(call_id))
return tool_call_ids
# OpenAI Chat Completions: a tool response message with role "tool".
# OpenAI: a tool response message with role "tool" and key "tool_call_id".
if entry.get("role") == "tool":
if tool_call_id := entry.get("tool_call_id"):
tool_call_ids.append(str(tool_call_id))
# Anthropic: check content items for tool_result type.
# Anthropics: check content items for tool_result type.
if entry.get("role") == "user":
content = entry.get("content")
if isinstance(content, list):
@@ -124,16 +111,14 @@ def _get_tool_responses(entry: dict[str, Any]) -> list[str]:
return tool_call_ids
def _create_tool_response(
call_id: str, output: Any, *, responses_api: bool = False
) -> dict[str, Any]:
def _create_tool_response(call_id: str, output: Any) -> dict[str, Any]:
"""
Create a tool response message for OpenAI, Anthropic, or OpenAI Responses API,
based on the tool_id format and the responses_api flag.
Create a tool response message for either OpenAI or Anthropics,
based on the tool_id format.
"""
content = output if isinstance(output, str) else json.dumps(output)
# Anthropic format: tool IDs typically start with "toolu_"
# Anthropics format: tool IDs typically start with "toolu_"
if call_id.startswith("toolu_"):
return {
"role": "user",
@@ -143,11 +128,8 @@ def _create_tool_response(
],
}
# OpenAI Responses API format
if responses_api:
return {"type": "function_call_output", "call_id": call_id, "output": content}
# OpenAI Chat Completions format (default fallback)
# OpenAI format: tool IDs typically start with "call_".
# Or default fallback (if the tool_id doesn't match any known prefix)
return {"role": "tool", "tool_call_id": call_id, "content": content}
@@ -195,19 +177,10 @@ def _combine_tool_responses(tool_outputs: list[dict[str, Any]]) -> list[dict[str
return tool_outputs
def _convert_raw_response_to_dict(
raw_response: Any,
) -> dict[str, Any] | list[dict[str, Any]]:
def _convert_raw_response_to_dict(raw_response: Any) -> dict[str, Any]:
"""
Safely convert raw_response to dictionary format for conversation history.
Handles different response types from different LLM providers.
For the OpenAI Responses API, the raw_response is the entire Response
object. Its ``output`` items (messages, function_calls) are extracted
individually so they can be used as valid input items on the next call.
Returns a **list** of dicts in that case.
For Chat Completions / Anthropic / Ollama, returns a single dict.
"""
if isinstance(raw_response, str):
# Ollama returns a string, convert to dict format
@@ -215,28 +188,11 @@ def _convert_raw_response_to_dict(
elif isinstance(raw_response, dict):
# Already a dict (from tests or some providers)
return raw_response
elif _is_responses_api_object(raw_response):
# OpenAI Responses API: extract individual output items
items = [json.to_dict(item) for item in raw_response.output]
return items if items else [{"role": "assistant", "content": ""}]
else:
# Chat Completions / Anthropic return message objects
# OpenAI/Anthropic return objects, convert with json.to_dict
return json.to_dict(raw_response)
def _is_responses_api_object(obj: Any) -> bool:
"""Detect an OpenAI Responses API Response object.
These have ``object == "response"`` and an ``output`` list, but no
``role`` attribute (unlike ChatCompletionMessage).
"""
return (
getattr(obj, "object", None) == "response"
and hasattr(obj, "output")
and not hasattr(obj, "role")
)
def get_pending_tool_calls(conversation_history: list[Any] | None) -> dict[str, int]:
"""
All the tool calls entry in the conversation history requires a response.
@@ -798,34 +754,19 @@ class SmartDecisionMakerBlock(Block):
self, prompt: list[dict], response, tool_outputs: list | None = None
):
"""Update conversation history with response and tool outputs."""
converted = _convert_raw_response_to_dict(response.raw_response)
# Don't add separate reasoning message with tool calls (breaks Anthropic's tool_use->tool_result pairing)
assistant_message = _convert_raw_response_to_dict(response.raw_response)
has_tool_calls = isinstance(assistant_message.get("content"), list) and any(
item.get("type") == "tool_use"
for item in assistant_message.get("content", [])
)
if isinstance(converted, list):
# Responses API: output items are already individual dicts
has_tool_calls = any(
item.get("type") == "function_call" for item in converted
if response.reasoning and not has_tool_calls:
prompt.append(
{"role": "assistant", "content": f"[Reasoning]: {response.reasoning}"}
)
if response.reasoning and not has_tool_calls:
prompt.append(
{
"role": "assistant",
"content": f"[Reasoning]: {response.reasoning}",
}
)
prompt.extend(converted)
else:
# Chat Completions / Anthropic: single assistant message dict
has_tool_calls = isinstance(converted.get("content"), list) and any(
item.get("type") == "tool_use" for item in converted.get("content", [])
)
if response.reasoning and not has_tool_calls:
prompt.append(
{
"role": "assistant",
"content": f"[Reasoning]: {response.reasoning}",
}
)
prompt.append(converted)
prompt.append(assistant_message)
if tool_outputs:
prompt.extend(tool_outputs)
@@ -835,8 +776,6 @@ class SmartDecisionMakerBlock(Block):
tool_info: ToolInfo,
execution_params: ExecutionParams,
execution_processor: "ExecutionProcessor",
*,
responses_api: bool = False,
) -> dict:
"""Execute a single tool using the execution manager for proper integration."""
# Lazy imports to avoid circular dependencies
@@ -929,17 +868,13 @@ class SmartDecisionMakerBlock(Block):
if node_outputs
else "Tool executed successfully"
)
return _create_tool_response(
tool_call.id, tool_response_content, responses_api=responses_api
)
return _create_tool_response(tool_call.id, tool_response_content)
except Exception as e:
logger.warning(f"Tool execution with manager failed: {e}")
logger.error(f"Tool execution with manager failed: {e}")
# Return error response
return _create_tool_response(
tool_call.id,
f"Tool execution failed: {str(e)}",
responses_api=responses_api,
tool_call.id, f"Tool execution failed: {str(e)}"
)
async def _execute_tools_agent_mode(
@@ -960,7 +895,6 @@ class SmartDecisionMakerBlock(Block):
"""Execute tools in agent mode with a loop until finished."""
max_iterations = input_data.agent_mode_max_iterations
iteration = 0
use_responses_api = input_data.model.metadata.provider == "openai"
# Execution parameters for tool execution
execution_params = ExecutionParams(
@@ -1017,19 +951,14 @@ class SmartDecisionMakerBlock(Block):
for tool_info in processed_tools:
try:
tool_response = await self._execute_single_tool_with_manager(
tool_info,
execution_params,
execution_processor,
responses_api=use_responses_api,
tool_info, execution_params, execution_processor
)
tool_outputs.append(tool_response)
except Exception as e:
logger.error(f"Tool execution failed: {e}")
# Create error response for the tool
error_response = _create_tool_response(
tool_info.tool_call.id,
f"Error: {str(e)}",
responses_api=use_responses_api,
tool_info.tool_call.id, f"Error: {str(e)}"
)
tool_outputs.append(error_response)
@@ -1091,17 +1020,11 @@ class SmartDecisionMakerBlock(Block):
if pending_tool_calls and input_data.last_tool_output is None:
raise ValueError(f"Tool call requires an output for {pending_tool_calls}")
use_responses_api = input_data.model.metadata.provider == "openai"
tool_output = []
if pending_tool_calls and input_data.last_tool_output is not None:
first_call_id = next(iter(pending_tool_calls.keys()))
tool_output.append(
_create_tool_response(
first_call_id,
input_data.last_tool_output,
responses_api=use_responses_api,
)
_create_tool_response(first_call_id, input_data.last_tool_output)
)
prompt.extend(tool_output)
@@ -1133,9 +1056,7 @@ class SmartDecisionMakerBlock(Block):
)
if input_data.sys_prompt and not any(
p.get("role") == "system"
and isinstance(p.get("content"), str)
and p["content"].startswith(MAIN_OBJECTIVE_PREFIX)
p["role"] == "system" and p["content"].startswith(MAIN_OBJECTIVE_PREFIX)
for p in prompt
):
prompt.append(
@@ -1146,9 +1067,7 @@ class SmartDecisionMakerBlock(Block):
)
if input_data.prompt and not any(
p.get("role") == "user"
and isinstance(p.get("content"), str)
and p["content"].startswith(MAIN_OBJECTIVE_PREFIX)
p["role"] == "user" and p["content"].startswith(MAIN_OBJECTIVE_PREFIX)
for p in prompt
):
prompt.append(
@@ -1256,26 +1175,11 @@ class SmartDecisionMakerBlock(Block):
)
yield emit_key, arg_value
converted = _convert_raw_response_to_dict(response.raw_response)
# Check for tool calls to avoid inserting reasoning between tool pairs
if isinstance(converted, list):
has_tool_calls = any(
item.get("type") == "function_call" for item in converted
)
else:
has_tool_calls = isinstance(converted.get("content"), list) and any(
item.get("type") == "tool_use" for item in converted.get("content", [])
)
if response.reasoning and not has_tool_calls:
if response.reasoning:
prompt.append(
{"role": "assistant", "content": f"[Reasoning]: {response.reasoning}"}
)
if isinstance(converted, list):
prompt.extend(converted)
else:
prompt.append(converted)
prompt.append(_convert_raw_response_to_dict(response.raw_response))
yield "conversations", prompt

View File

@@ -1,223 +0,0 @@
"""Tests for AutoPilotBlock: recursion guard, streaming, validation, and error paths."""
import asyncio
from unittest.mock import AsyncMock
import pytest
from backend.blocks.autopilot import (
AUTOPILOT_BLOCK_ID,
AutoPilotBlock,
_autopilot_recursion_depth,
_autopilot_recursion_limit,
_check_recursion,
_reset_recursion,
)
from backend.data.execution import ExecutionContext
def _make_context(user_id: str = "test-user-123") -> ExecutionContext:
"""Helper to build an ExecutionContext for tests."""
return ExecutionContext(
user_id=user_id,
graph_id="graph-1",
graph_exec_id="gexec-1",
graph_version=1,
node_id="node-1",
node_exec_id="nexec-1",
)
# ---------------------------------------------------------------------------
# Recursion guard unit tests
# ---------------------------------------------------------------------------
class TestCheckRecursion:
"""Unit tests for _check_recursion / _reset_recursion."""
def test_first_call_increments_depth(self):
tokens = _check_recursion(3)
try:
assert _autopilot_recursion_depth.get() == 1
assert _autopilot_recursion_limit.get() == 3
finally:
_reset_recursion(tokens)
def test_reset_restores_previous_values(self):
assert _autopilot_recursion_depth.get() == 0
assert _autopilot_recursion_limit.get() is None
tokens = _check_recursion(5)
_reset_recursion(tokens)
assert _autopilot_recursion_depth.get() == 0
assert _autopilot_recursion_limit.get() is None
def test_exceeding_limit_raises(self):
t1 = _check_recursion(2)
try:
t2 = _check_recursion(2)
try:
with pytest.raises(RuntimeError, match="recursion depth limit"):
_check_recursion(2)
finally:
_reset_recursion(t2)
finally:
_reset_recursion(t1)
def test_nested_calls_respect_inherited_limit(self):
"""Inner call with higher max_depth still respects outer limit."""
t1 = _check_recursion(2) # sets limit=2
try:
t2 = _check_recursion(10) # inner wants 10, but inherited is 2
try:
# depth is now 2, limit is min(10, 2) = 2 → should raise
with pytest.raises(RuntimeError, match="recursion depth limit"):
_check_recursion(10)
finally:
_reset_recursion(t2)
finally:
_reset_recursion(t1)
def test_limit_of_one_blocks_immediately_on_second_call(self):
t1 = _check_recursion(1)
try:
with pytest.raises(RuntimeError):
_check_recursion(1)
finally:
_reset_recursion(t1)
# ---------------------------------------------------------------------------
# AutoPilotBlock.run() validation tests
# ---------------------------------------------------------------------------
class TestRunValidation:
"""Tests for input validation in AutoPilotBlock.run()."""
@pytest.fixture
def block(self):
return AutoPilotBlock()
@pytest.mark.asyncio
async def test_empty_prompt_yields_error(self, block):
block.Input # ensure schema is accessible
input_data = block.Input(prompt=" ", max_recursion_depth=3)
ctx = _make_context()
outputs = {}
async for name, value in block.run(input_data, execution_context=ctx):
outputs[name] = value
assert outputs.get("error") == "Prompt cannot be empty."
assert "response" not in outputs
@pytest.mark.asyncio
async def test_missing_user_id_yields_error(self, block):
input_data = block.Input(prompt="hello", max_recursion_depth=3)
ctx = _make_context(user_id="")
outputs = {}
async for name, value in block.run(input_data, execution_context=ctx):
outputs[name] = value
assert "authenticated user" in outputs.get("error", "")
@pytest.mark.asyncio
async def test_successful_run_yields_all_outputs(self, block):
"""With execute_copilot mocked, run() should yield all 5 success outputs."""
mock_result = (
"Hello world",
[],
'[{"role":"user","content":"hi"}]',
"sess-abc",
{"prompt_tokens": 10, "completion_tokens": 5, "total_tokens": 15},
)
block.execute_copilot = AsyncMock(return_value=mock_result)
block.create_session = AsyncMock(return_value="sess-abc")
input_data = block.Input(prompt="hi", max_recursion_depth=3)
ctx = _make_context()
outputs = {}
async for name, value in block.run(input_data, execution_context=ctx):
outputs[name] = value
assert outputs["response"] == "Hello world"
assert outputs["tool_calls"] == []
assert outputs["session_id"] == "sess-abc"
assert outputs["token_usage"]["total_tokens"] == 15
assert "error" not in outputs
@pytest.mark.asyncio
async def test_exception_yields_error(self, block):
"""On unexpected failure, run() should yield an error output."""
block.execute_copilot = AsyncMock(side_effect=RuntimeError("boom"))
block.create_session = AsyncMock(return_value="sess-fail")
input_data = block.Input(prompt="do something", max_recursion_depth=3)
ctx = _make_context()
outputs = {}
async for name, value in block.run(input_data, execution_context=ctx):
outputs[name] = value
assert outputs["session_id"] == "sess-fail"
assert "boom" in outputs.get("error", "")
@pytest.mark.asyncio
async def test_cancelled_error_yields_error_and_reraises(self, block):
"""CancelledError should yield error, then re-raise."""
block.execute_copilot = AsyncMock(side_effect=asyncio.CancelledError())
block.create_session = AsyncMock(return_value="sess-cancel")
input_data = block.Input(prompt="do something", max_recursion_depth=3)
ctx = _make_context()
outputs = {}
with pytest.raises(asyncio.CancelledError):
async for name, value in block.run(input_data, execution_context=ctx):
outputs[name] = value
assert outputs["session_id"] == "sess-cancel"
assert "cancelled" in outputs.get("error", "").lower()
@pytest.mark.asyncio
async def test_existing_session_id_skips_create(self, block):
"""When session_id is provided, create_session should not be called."""
mock_result = (
"ok",
[],
"[]",
"existing-sid",
{"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
)
block.execute_copilot = AsyncMock(return_value=mock_result)
block.create_session = AsyncMock()
input_data = block.Input(
prompt="test", session_id="existing-sid", max_recursion_depth=3
)
ctx = _make_context()
async for _ in block.run(input_data, execution_context=ctx):
pass
block.create_session.assert_not_called()
# ---------------------------------------------------------------------------
# Block registration / ID tests
# ---------------------------------------------------------------------------
class TestBlockRegistration:
def test_block_id_matches_constant(self):
block = AutoPilotBlock()
assert block.id == AUTOPILOT_BLOCK_ID
def test_max_recursion_depth_has_upper_bound(self):
"""Schema should enforce le=10."""
schema = AutoPilotBlock.Input.model_json_schema()
max_rec = schema["properties"]["max_recursion_depth"]
assert (
max_rec.get("maximum") == 10 or max_rec.get("exclusiveMaximum", 999) <= 11
)
def test_output_schema_has_no_duplicate_error_field(self):
"""Output should inherit error from BlockSchemaOutput, not redefine it."""
# The field should exist (inherited) but there should be no explicit
# redefinition. We verify by checking the class __annotations__ directly.
assert "error" not in AutoPilotBlock.Output.__annotations__

View File

@@ -13,17 +13,18 @@ class TestLLMStatsTracking:
"""Test that llm_call returns proper token counts in LLMResponse."""
import backend.blocks.llm as llm
# Mock the OpenAI Responses API response
# Mock the OpenAI client
mock_response = MagicMock()
mock_response.output_text = "Test response"
mock_response.output = []
mock_response.usage = MagicMock(input_tokens=10, output_tokens=20)
mock_response.choices = [
MagicMock(message=MagicMock(content="Test response", tool_calls=None))
]
mock_response.usage = MagicMock(prompt_tokens=10, completion_tokens=20)
# Test with mocked OpenAI response
with patch("openai.AsyncOpenAI") as mock_openai:
mock_client = AsyncMock()
mock_openai.return_value = mock_client
mock_client.responses.create = AsyncMock(return_value=mock_response)
mock_client.chat.completions.create = AsyncMock(return_value=mock_response)
response = await llm.llm_call(
credentials=llm.TEST_CREDENTIALS,
@@ -270,17 +271,30 @@ class TestLLMStatsTracking:
mock_response = MagicMock()
# Return different responses for chunk summary vs final summary
if call_count == 1:
mock_response.output_text = '<json_output id="test123456">{"summary": "Test chunk summary"}</json_output>'
mock_response.choices = [
MagicMock(
message=MagicMock(
content='<json_output id="test123456">{"summary": "Test chunk summary"}</json_output>',
tool_calls=None,
)
)
]
else:
mock_response.output_text = '<json_output id="test123456">{"final_summary": "Test final summary"}</json_output>'
mock_response.output = []
mock_response.usage = MagicMock(input_tokens=50, output_tokens=30)
mock_response.choices = [
MagicMock(
message=MagicMock(
content='<json_output id="test123456">{"final_summary": "Test final summary"}</json_output>',
tool_calls=None,
)
)
]
mock_response.usage = MagicMock(prompt_tokens=50, completion_tokens=30)
return mock_response
with patch("openai.AsyncOpenAI") as mock_openai:
mock_client = AsyncMock()
mock_openai.return_value = mock_client
mock_client.responses.create = mock_create
mock_client.chat.completions.create = mock_create
# Test with very short text (should only need 1 chunk + 1 final summary)
input_data = llm.AITextSummarizerBlock.Input(

View File

@@ -148,7 +148,7 @@ class ChatConfig(BaseSettings):
description="E2B sandbox template to use for copilot sessions.",
)
e2b_sandbox_timeout: int = Field(
default=420, # 7 min safety net — allows headroom for compaction retries
default=300, # 5 min safety net — explicit per-turn pause is the primary mechanism
description="E2B sandbox running-time timeout (seconds). "
"E2B timeout is wall-clock (not idle). Explicit per-turn pause is the primary "
"mechanism; this is the safety net.",

View File

@@ -1,173 +0,0 @@
"""Integration credential lookup with per-process TTL cache.
Provides token retrieval for connected integrations so that copilot tools
(e.g. bash_exec) can inject auth tokens into the execution environment without
hitting the database on every command.
Cache semantics (handled automatically by TTLCache):
- Token found → cached for _TOKEN_CACHE_TTL (5 min). Avoids repeated DB hits
for users who have credentials and are running many bash commands.
- No credentials found → cached for _NULL_CACHE_TTL (60 s). Avoids a DB hit
on every E2B command for users who haven't connected an account yet, while
still picking up a newly-connected account within one minute.
Both caches are bounded to _CACHE_MAX_SIZE entries; cachetools evicts the
least-recently-used entry when the limit is reached.
Multi-worker note: both caches are in-process only. Each worker/replica
maintains its own independent cache, so a credential fetch may be duplicated
across processes. This is acceptable for the current goal (reduce DB hits per
session per-process), but if cache efficiency across replicas becomes important
a shared cache (e.g. Redis) should be used instead.
"""
import logging
from typing import cast
from cachetools import TTLCache
from backend.copilot.providers import SUPPORTED_PROVIDERS
from backend.data.model import APIKeyCredentials, OAuth2Credentials
from backend.integrations.creds_manager import (
IntegrationCredentialsManager,
register_creds_changed_hook,
)
logger = logging.getLogger(__name__)
# Derived from the single SUPPORTED_PROVIDERS registry for backward compat.
PROVIDER_ENV_VARS: dict[str, list[str]] = {
slug: entry["env_vars"] for slug, entry in SUPPORTED_PROVIDERS.items()
}
_TOKEN_CACHE_TTL = 300.0 # seconds — for found tokens
_NULL_CACHE_TTL = 60.0 # seconds — for "not connected" results
_CACHE_MAX_SIZE = 10_000
# (user_id, provider) → token string. TTLCache handles expiry + eviction.
# Thread-safety note: TTLCache is NOT thread-safe, but that is acceptable here
# because all callers (get_provider_token, invalidate_user_provider_cache) run
# exclusively on the asyncio event loop. There are no await points between a
# cache read and its corresponding write within any function, so no concurrent
# coroutine can interleave. If ThreadPoolExecutor workers are ever added to
# this path, a threading.RLock should be wrapped around these caches.
_token_cache: TTLCache[tuple[str, str], str] = TTLCache(
maxsize=_CACHE_MAX_SIZE, ttl=_TOKEN_CACHE_TTL
)
# Separate cache for "no credentials" results with a shorter TTL.
_null_cache: TTLCache[tuple[str, str], bool] = TTLCache(
maxsize=_CACHE_MAX_SIZE, ttl=_NULL_CACHE_TTL
)
def invalidate_user_provider_cache(user_id: str, provider: str) -> None:
"""Remove the cached entry for *user_id*/*provider* from both caches.
Call this after storing new credentials so that the next
``get_provider_token()`` call performs a fresh DB lookup instead of
serving a stale TTL-cached result.
"""
key = (user_id, provider)
_token_cache.pop(key, None)
_null_cache.pop(key, None)
# Register this module's cache-bust function with the credentials manager so
# that any create/update/delete operation immediately evicts stale cache
# entries. This avoids a lazy import inside creds_manager and eliminates the
# circular-import risk.
try:
register_creds_changed_hook(invalidate_user_provider_cache)
except RuntimeError:
# Hook already registered (e.g. module re-import in tests).
pass
# Module-level singleton to avoid re-instantiating IntegrationCredentialsManager
# on every cache-miss call to get_provider_token().
_manager = IntegrationCredentialsManager()
async def get_provider_token(user_id: str, provider: str) -> str | None:
"""Return the user's access token for *provider*, or ``None`` if not connected.
OAuth2 tokens are preferred (refreshed if needed); API keys are the fallback.
Found tokens are cached for _TOKEN_CACHE_TTL (5 min). "Not connected" results
are cached for _NULL_CACHE_TTL (60 s) to avoid a DB hit on every bash_exec
command for users who haven't connected yet, while still picking up a
newly-connected account within one minute.
"""
cache_key = (user_id, provider)
if cache_key in _null_cache:
return None
if cached := _token_cache.get(cache_key):
return cached
manager = _manager
try:
creds_list = await manager.store.get_creds_by_provider(user_id, provider)
except Exception:
logger.warning(
"Failed to fetch %s credentials for user %s",
provider,
user_id,
exc_info=True,
)
return None
# Pass 1: prefer OAuth2 (carry scope info, refreshable via token endpoint).
# Sort so broader-scoped tokens come first: a token with "repo" scope covers
# full git access, while a public-data-only token lacks push/pull permission.
# lock=False — background injection; not worth a distributed lock acquisition.
oauth2_creds = sorted(
[c for c in creds_list if c.type == "oauth2"],
key=lambda c: 0 if "repo" in (cast(OAuth2Credentials, c).scopes or []) else 1,
)
for creds in oauth2_creds:
if creds.type == "oauth2":
try:
fresh = await manager.refresh_if_needed(
user_id, cast(OAuth2Credentials, creds), lock=False
)
token = fresh.access_token.get_secret_value()
except Exception:
logger.warning(
"Failed to refresh %s OAuth token for user %s; "
"discarding stale token to force re-auth",
provider,
user_id,
exc_info=True,
)
# Do NOT fall back to the stale token — it is likely expired
# or revoked. Returning None forces the caller to re-auth,
# preventing the LLM from receiving a non-functional token.
continue
_token_cache[cache_key] = token
return token
# Pass 2: fall back to API key (no expiry, no refresh needed).
for creds in creds_list:
if creds.type == "api_key":
token = cast(APIKeyCredentials, creds).api_key.get_secret_value()
_token_cache[cache_key] = token
return token
# No credentials found — cache to avoid repeated DB hits.
_null_cache[cache_key] = True
return None
async def get_integration_env_vars(user_id: str) -> dict[str, str]:
"""Return env vars for all providers the user has connected.
Iterates :data:`PROVIDER_ENV_VARS`, fetches each token, and builds a flat
``{env_var: token}`` dict ready to pass to a subprocess or E2B sandbox.
Only providers with a stored credential contribute entries.
"""
env: dict[str, str] = {}
for provider, var_names in PROVIDER_ENV_VARS.items():
token = await get_provider_token(user_id, provider)
if token:
for var in var_names:
env[var] = token
return env

View File

@@ -1,195 +0,0 @@
"""Tests for integration_creds — TTL cache and token lookup paths."""
from unittest.mock import AsyncMock, MagicMock, patch
import pytest
from pydantic import SecretStr
from backend.copilot.integration_creds import (
_NULL_CACHE_TTL,
_TOKEN_CACHE_TTL,
PROVIDER_ENV_VARS,
_null_cache,
_token_cache,
get_integration_env_vars,
get_provider_token,
invalidate_user_provider_cache,
)
from backend.data.model import APIKeyCredentials, OAuth2Credentials
_USER = "user-integration-creds-test"
_PROVIDER = "github"
def _make_api_key_creds(key: str = "test-api-key") -> APIKeyCredentials:
return APIKeyCredentials(
id="creds-api-key",
provider=_PROVIDER,
api_key=SecretStr(key),
title="Test API Key",
expires_at=None,
)
def _make_oauth2_creds(token: str = "test-oauth-token") -> OAuth2Credentials:
return OAuth2Credentials(
id="creds-oauth2",
provider=_PROVIDER,
title="Test OAuth",
access_token=SecretStr(token),
refresh_token=SecretStr("test-refresh"),
access_token_expires_at=None,
refresh_token_expires_at=None,
scopes=[],
)
@pytest.fixture(autouse=True)
def clear_caches():
"""Ensure clean caches before and after every test."""
_token_cache.clear()
_null_cache.clear()
yield
_token_cache.clear()
_null_cache.clear()
class TestInvalidateUserProviderCache:
def test_removes_token_entry(self):
key = (_USER, _PROVIDER)
_token_cache[key] = "tok"
invalidate_user_provider_cache(_USER, _PROVIDER)
assert key not in _token_cache
def test_removes_null_entry(self):
key = (_USER, _PROVIDER)
_null_cache[key] = True
invalidate_user_provider_cache(_USER, _PROVIDER)
assert key not in _null_cache
def test_noop_when_key_not_cached(self):
# Should not raise even when there is no cache entry.
invalidate_user_provider_cache("no-such-user", _PROVIDER)
def test_only_removes_targeted_key(self):
other_key = ("other-user", _PROVIDER)
_token_cache[other_key] = "other-tok"
invalidate_user_provider_cache(_USER, _PROVIDER)
assert other_key in _token_cache
class TestGetProviderToken:
@pytest.mark.asyncio(loop_scope="session")
async def test_returns_cached_token_without_db_hit(self):
_token_cache[(_USER, _PROVIDER)] = "cached-tok"
mock_manager = MagicMock()
with patch("backend.copilot.integration_creds._manager", mock_manager):
result = await get_provider_token(_USER, _PROVIDER)
assert result == "cached-tok"
mock_manager.store.get_creds_by_provider.assert_not_called()
@pytest.mark.asyncio(loop_scope="session")
async def test_returns_none_for_null_cached_provider(self):
_null_cache[(_USER, _PROVIDER)] = True
mock_manager = MagicMock()
with patch("backend.copilot.integration_creds._manager", mock_manager):
result = await get_provider_token(_USER, _PROVIDER)
assert result is None
mock_manager.store.get_creds_by_provider.assert_not_called()
@pytest.mark.asyncio(loop_scope="session")
async def test_api_key_creds_returned_and_cached(self):
api_creds = _make_api_key_creds("my-api-key")
mock_manager = MagicMock()
mock_manager.store.get_creds_by_provider = AsyncMock(return_value=[api_creds])
with patch("backend.copilot.integration_creds._manager", mock_manager):
result = await get_provider_token(_USER, _PROVIDER)
assert result == "my-api-key"
assert _token_cache.get((_USER, _PROVIDER)) == "my-api-key"
@pytest.mark.asyncio(loop_scope="session")
async def test_oauth2_preferred_over_api_key(self):
oauth_creds = _make_oauth2_creds("oauth-tok")
api_creds = _make_api_key_creds("api-tok")
mock_manager = MagicMock()
mock_manager.store.get_creds_by_provider = AsyncMock(
return_value=[api_creds, oauth_creds]
)
mock_manager.refresh_if_needed = AsyncMock(return_value=oauth_creds)
with patch("backend.copilot.integration_creds._manager", mock_manager):
result = await get_provider_token(_USER, _PROVIDER)
assert result == "oauth-tok"
@pytest.mark.asyncio(loop_scope="session")
async def test_oauth2_refresh_failure_returns_none(self):
"""On refresh failure, return None instead of caching a stale token."""
oauth_creds = _make_oauth2_creds("stale-oauth-tok")
mock_manager = MagicMock()
mock_manager.store.get_creds_by_provider = AsyncMock(return_value=[oauth_creds])
mock_manager.refresh_if_needed = AsyncMock(side_effect=RuntimeError("network"))
with patch("backend.copilot.integration_creds._manager", mock_manager):
result = await get_provider_token(_USER, _PROVIDER)
# Stale tokens must NOT be returned — forces re-auth.
assert result is None
@pytest.mark.asyncio(loop_scope="session")
async def test_no_credentials_caches_null_entry(self):
mock_manager = MagicMock()
mock_manager.store.get_creds_by_provider = AsyncMock(return_value=[])
with patch("backend.copilot.integration_creds._manager", mock_manager):
result = await get_provider_token(_USER, _PROVIDER)
assert result is None
assert _null_cache.get((_USER, _PROVIDER)) is True
@pytest.mark.asyncio(loop_scope="session")
async def test_db_exception_returns_none_without_caching(self):
mock_manager = MagicMock()
mock_manager.store.get_creds_by_provider = AsyncMock(
side_effect=RuntimeError("db down")
)
with patch("backend.copilot.integration_creds._manager", mock_manager):
result = await get_provider_token(_USER, _PROVIDER)
assert result is None
# DB errors are not cached — next call will retry
assert (_USER, _PROVIDER) not in _token_cache
assert (_USER, _PROVIDER) not in _null_cache
@pytest.mark.asyncio(loop_scope="session")
async def test_null_cache_has_shorter_ttl_than_token_cache(self):
"""Verify the TTL constants are set correctly for each cache."""
assert _null_cache.ttl == _NULL_CACHE_TTL
assert _token_cache.ttl == _TOKEN_CACHE_TTL
assert _NULL_CACHE_TTL < _TOKEN_CACHE_TTL
class TestGetIntegrationEnvVars:
@pytest.mark.asyncio(loop_scope="session")
async def test_injects_all_env_vars_for_provider(self):
_token_cache[(_USER, "github")] = "gh-tok"
result = await get_integration_env_vars(_USER)
for var in PROVIDER_ENV_VARS["github"]:
assert result[var] == "gh-tok"
@pytest.mark.asyncio(loop_scope="session")
async def test_empty_dict_when_no_credentials(self):
_null_cache[(_USER, "github")] = True
result = await get_integration_env_vars(_USER)
assert result == {}

View File

@@ -6,24 +6,39 @@ handling the distinction between:
- Local mode vs E2B mode (storage/filesystem differences)
"""
from backend.blocks.autopilot import AUTOPILOT_BLOCK_ID
from backend.copilot.tools import TOOL_REGISTRY
# Shared technical notes that apply to both SDK and baseline modes
_SHARED_TOOL_NOTES = f"""\
_SHARED_TOOL_NOTES = """\
### Sharing files
After `write_workspace_file`, embed the `download_url` in Markdown:
- File: `[report.csv](workspace://file_id#text/csv)`
- Image: `![chart](workspace://file_id#image/png)`
- Video: `![recording](workspace://file_id#video/mp4)`
### Sharing files with the user
After saving a file to the persistent workspace with `write_workspace_file`,
share it with the user by embedding the `download_url` from the response in
your message as a Markdown link or image:
### File references — @@agptfile:
Pass large file content to tools by reference: `@@agptfile:<uri>[<start>-<end>]`
- `workspace://<file_id>` or `workspace:///<path>` — workspace files
- `/absolute/path` — local/sandbox files
- `[start-end]` — optional 1-indexed line range
- Multiple refs per argument supported. Only `workspace://` and absolute paths are expanded.
- **Any file** — shows as a clickable download link:
`[report.csv](workspace://file_id#text/csv)`
- **Image** — renders inline in chat:
`![chart](workspace://file_id#image/png)`
- **Video** — renders inline in chat with player controls:
`![recording](workspace://file_id#video/mp4)`
The `download_url` field in the `write_workspace_file` response is already
in the correct format — paste it directly after the `(` in the Markdown.
### Passing file content to tools — @@agptfile: references
Instead of copying large file contents into a tool argument, pass a file
reference and the platform will load the content for you.
Syntax: `@@agptfile:<uri>[<start>-<end>]`
- `<uri>` **must** start with `workspace://` or `/` (absolute path):
- `workspace://<file_id>` — workspace file by ID
- `workspace:///<path>` — workspace file by virtual path
- `/absolute/local/path` — ephemeral or sdk_cwd file
- E2B sandbox absolute path (e.g. `/home/user/script.py`)
- `[<start>-<end>]` is an optional 1-indexed inclusive line range.
- URIs that do not start with `workspace://` or `/` are **not** expanded.
Examples:
```
@@ -34,9 +49,21 @@ Examples:
@@agptfile:/home/user/script.py
```
**Structured data**: When the entire argument is a single file reference, the platform auto-parses by extension/MIME. Supported: JSON, JSONL, CSV, TSV, YAML, TOML, Parquet, Excel (.xlsx only; legacy `.xls` is NOT supported). Unrecognised formats return plain string.
You can embed a reference inside any string argument, or use it as the entire
value. Multiple references in one argument are all expanded.
**Type coercion**: The platform auto-coerces expanded string values to match block input types (e.g. JSON string → `list[list[str]]`).
**Structured data**: When the **entire** argument value is a single file
reference (no surrounding text), the platform automatically parses the file
content based on its extension or MIME type. Supported formats: JSON, JSONL,
CSV, TSV, YAML, TOML, Parquet, and Excel (.xlsx — first sheet only).
For example, pass `@@agptfile:workspace://<id>` where the file is a `.csv` and
the rows will be parsed into `list[list[str]]` automatically. If the format is
unrecognised or parsing fails, the content is returned as a plain string.
Legacy `.xls` files are **not** supported — only the modern `.xlsx` format.
**Type coercion**: The platform also coerces expanded values to match the
block's expected input types. For example, if a block expects `list[list[str]]`
and the expanded value is a JSON string, it will be parsed into the correct type.
### Media file inputs (format: "file")
Some block inputs accept media files — their schema shows `"format": "file"`.
@@ -54,53 +81,18 @@ that would be corrupted by text encoding.
Example — committing an image file to GitHub:
```json
{{
"files": [{{
{
"files": [{
"path": "docs/hero.png",
"content": "workspace://abc123#image/png",
"operation": "upsert"
}}]
}}
}]
}
```
### Sub-agent tasks
- When using the Task tool, NEVER set `run_in_background` to true.
All tasks must run in the foreground.
### Delegating to another autopilot (sub-autopilot pattern)
Use the **AutoPilotBlock** (`run_block` with block_id
`{AUTOPILOT_BLOCK_ID}`) to delegate a task to a fresh
autopilot instance. The sub-autopilot has its own full tool set and can
perform multi-step work autonomously.
- **Input**: `prompt` (required) — the task description.
Optional: `system_context` to constrain behavior, `session_id` to
continue a previous conversation, `max_recursion_depth` (default 3).
- **Output**: `response` (text), `tool_calls` (list), `session_id`
(for continuation), `conversation_history`, `token_usage`.
Use this when a task is complex enough to benefit from a separate
autopilot context, e.g. "research X and write a report" while the
parent autopilot handles orchestration.
"""
# E2B-only notes — E2B has full internet access so gh CLI works there.
# Not shown in local (bubblewrap) mode: --unshare-net blocks all network.
_E2B_TOOL_NOTES = """
### GitHub CLI (`gh`) and git
- If the user has connected their GitHub account, both `gh` and `git` are
pre-authenticated — use them directly without any manual login step.
`git` HTTPS operations (clone, push, pull) work automatically.
- If the token changes mid-session (e.g. user reconnects with a new token),
run `gh auth setup-git` to re-register the credential helper.
- If `gh` or `git` fails with an authentication error (e.g. "authentication
required", "could not read Username", or exit code 128), call
`connect_integration(provider="github")` to surface the GitHub credentials
setup card so the user can connect their account. Once connected, retry
the operation.
- For operations that need broader access (e.g. private org repos, GitHub
Actions), pass the required scopes: e.g.
`connect_integration(provider="github", scopes=["repo", "read:org"])`.
"""
@@ -113,7 +105,6 @@ def _build_storage_supplement(
storage_system_1_persistence: list[str],
file_move_name_1_to_2: str,
file_move_name_2_to_1: str,
extra_notes: str = "",
) -> str:
"""Build storage/filesystem supplement for a specific environment.
@@ -128,7 +119,6 @@ def _build_storage_supplement(
storage_system_1_persistence: List of persistence behavior descriptions
file_move_name_1_to_2: Direction label for primary→persistent
file_move_name_2_to_1: Direction label for persistent→primary
extra_notes: Environment-specific notes appended after shared notes
"""
# Format lists as bullet points with proper indentation
characteristics = "\n".join(f" - {c}" for c in storage_system_1_characteristics)
@@ -138,12 +128,17 @@ def _build_storage_supplement(
## Tool notes
### Shell & filesystem
- The SDK built-in Bash tool is NOT available. Use `bash_exec` for shell commands ({sandbox_type}). Working dir: `{working_dir}`
- SDK file tools (Read/Write/Edit/Glob/Grep) and `bash_exec` share one filesystem — use relative or absolute paths under this dir.
- `read_workspace_file`/`write_workspace_file` operate on **persistent cloud workspace storage** (separate from the working dir).
### Shell commands
- The SDK built-in Bash tool is NOT available. Use the `bash_exec` MCP tool
for shell commands — it runs {sandbox_type}.
### Working directory
- Your working directory is: `{working_dir}`
- All SDK file tools AND `bash_exec` operate on the same filesystem
- Use relative paths or absolute paths under `{working_dir}` for all file operations
### Two storage systems — CRITICAL to understand
1. **{storage_system_1_name}** (`{working_dir}`):
{characteristics}
{persistence}
@@ -164,16 +159,12 @@ a local file under `~/.claude/projects/.../tool-results/`. To read these files,
always use `read_file` or `Read` (NOT `read_workspace_file`).
`read_workspace_file` reads from cloud workspace storage, where SDK
tool-results are NOT stored.
{_SHARED_TOOL_NOTES}{extra_notes}"""
{_SHARED_TOOL_NOTES}"""
# Pre-built supplements for common environments
def _get_local_storage_supplement(cwd: str) -> str:
"""Local ephemeral storage (files lost between turns).
Network is isolated (bubblewrap --unshare-net), so internet-dependent CLIs
like gh will not work — no integration env-var notes are included.
"""
"""Local ephemeral storage (files lost between turns)."""
return _build_storage_supplement(
working_dir=cwd,
sandbox_type="in a network-isolated sandbox",
@@ -191,11 +182,7 @@ def _get_local_storage_supplement(cwd: str) -> str:
def _get_cloud_sandbox_supplement() -> str:
"""Cloud persistent sandbox (files survive across turns in session).
E2B has full internet access, so integration tokens (GH_TOKEN etc.) are
injected per command in bash_exec — include the CLI guidance notes.
"""
"""Cloud persistent sandbox (files survive across turns in session)."""
return _build_storage_supplement(
working_dir="/home/user",
sandbox_type="in a cloud sandbox with full internet access",
@@ -210,7 +197,6 @@ def _get_cloud_sandbox_supplement() -> str:
],
file_move_name_1_to_2="Sandbox → Persistent",
file_move_name_2_to_1="Persistent → Sandbox",
extra_notes=_E2B_TOOL_NOTES,
)

View File

@@ -1,63 +0,0 @@
"""Single source of truth for copilot-supported integration providers.
Both :mod:`~backend.copilot.integration_creds` (env-var injection) and
:mod:`~backend.copilot.tools.connect_integration` (UI setup card) import from
here, eliminating the risk of the two registries drifting out of sync.
"""
from typing import TypedDict
class ProviderEntry(TypedDict):
"""Metadata for a supported integration provider.
Attributes:
name: Human-readable display name (e.g. "GitHub").
env_vars: Environment variable names injected when the provider is
connected (e.g. ``["GH_TOKEN", "GITHUB_TOKEN"]``).
default_scopes: Default OAuth scopes requested when the agent does not
specify any.
"""
name: str
env_vars: list[str]
default_scopes: list[str]
def _is_github_oauth_configured() -> bool:
"""Return True if GitHub OAuth env vars are set.
Uses a lazy import to avoid triggering ``Secrets()`` during module import,
which can fail in environments where secrets are not yet loaded (e.g. tests,
CLI tooling).
"""
from backend.blocks.github._auth import GITHUB_OAUTH_IS_CONFIGURED
return GITHUB_OAUTH_IS_CONFIGURED
# -- Registry ----------------------------------------------------------------
# Add new providers here. Both env-var injection and the setup-card tool read
# from this single registry.
SUPPORTED_PROVIDERS: dict[str, ProviderEntry] = {
"github": {
"name": "GitHub",
"env_vars": ["GH_TOKEN", "GITHUB_TOKEN"],
"default_scopes": ["repo"],
},
}
def get_provider_auth_types(provider: str) -> list[str]:
"""Return the supported credential types for *provider* at runtime.
OAuth types are only offered when the corresponding OAuth client env vars
are configured.
"""
if provider == "github":
if _is_github_oauth_configured():
return ["api_key", "oauth2"]
return ["api_key"]
# Default for unknown/future providers — API key only.
return ["api_key"]

View File

@@ -43,7 +43,6 @@ class ResponseType(str, Enum):
ERROR = "error"
USAGE = "usage"
HEARTBEAT = "heartbeat"
STATUS = "status"
class StreamBaseResponse(BaseModel):
@@ -264,19 +263,3 @@ class StreamHeartbeat(StreamBaseResponse):
def to_sse(self) -> str:
"""Convert to SSE comment format to keep connection alive."""
return ": heartbeat\n\n"
class StreamStatus(StreamBaseResponse):
"""Transient status notification shown to the user during long operations.
Used to provide feedback when the backend performs behind-the-scenes work
(e.g., compacting conversation context on a retry) that would otherwise
leave the user staring at an unexplained pause.
Sent as a proper ``data:`` event so the frontend can display it to the
user. The AI SDK stream parser gracefully skips unknown chunk types
(logs a console warning), so this does not break the stream.
"""
type: ResponseType = ResponseType.STATUS
message: str = Field(..., description="Human-readable status message")

View File

@@ -24,14 +24,10 @@ from typing import TYPE_CHECKING, Any
# Static imports for type checkers so they can resolve __all__ entries
# without executing the lazy-import machinery at runtime.
if TYPE_CHECKING:
from .collect import CopilotResult as CopilotResult
from .collect import collect_copilot_response as collect_copilot_response
from .service import stream_chat_completion_sdk as stream_chat_completion_sdk
from .tool_adapter import create_copilot_mcp_server as create_copilot_mcp_server
__all__ = [
"CopilotResult",
"collect_copilot_response",
"stream_chat_completion_sdk",
"create_copilot_mcp_server",
]
@@ -39,8 +35,6 @@ __all__ = [
# Dispatch table for PEP 562 lazy imports. Each entry is a (module, attr)
# pair so new exports can be added without touching __getattr__ itself.
_LAZY_IMPORTS: dict[str, tuple[str, str]] = {
"CopilotResult": (".collect", "CopilotResult"),
"collect_copilot_response": (".collect", "collect_copilot_response"),
"stream_chat_completion_sdk": (".service", "stream_chat_completion_sdk"),
"create_copilot_mcp_server": (".tool_adapter", "create_copilot_mcp_server"),
}

View File

@@ -1,108 +0,0 @@
"""Public helpers for consuming a copilot stream as a simple request-response.
This module exposes :class:`CopilotResult` and :func:`collect_copilot_response`
so that callers (e.g. the AutoPilot block) can consume the copilot stream
without implementing their own event loop.
"""
from __future__ import annotations
from typing import Any
class CopilotResult:
"""Aggregated result from consuming a copilot stream.
Returned by :func:`collect_copilot_response` so callers don't need to
implement their own event-loop over the raw stream events.
"""
__slots__ = (
"response_text",
"tool_calls",
"prompt_tokens",
"completion_tokens",
"total_tokens",
)
def __init__(self) -> None:
self.response_text: str = ""
self.tool_calls: list[dict[str, Any]] = []
self.prompt_tokens: int = 0
self.completion_tokens: int = 0
self.total_tokens: int = 0
async def collect_copilot_response(
*,
session_id: str,
message: str,
user_id: str,
is_user_message: bool = True,
) -> CopilotResult:
"""Consume :func:`stream_chat_completion_sdk` and return aggregated results.
This is the recommended entry-point for callers that need a simple
request-response interface (e.g. the AutoPilot block) rather than
streaming individual events. It avoids duplicating the event-collection
logic and does NOT wrap the stream in ``asyncio.timeout`` — the SDK
manages its own heartbeat-based timeouts internally.
Args:
session_id: Chat session to use.
message: The user message / prompt.
user_id: Authenticated user ID.
is_user_message: Whether this is a user-initiated message.
Returns:
A :class:`CopilotResult` with the aggregated response text,
tool calls, and token usage.
Raises:
RuntimeError: If the stream yields a ``StreamError`` event.
"""
from backend.copilot.response_model import (
StreamError,
StreamTextDelta,
StreamToolInputAvailable,
StreamToolOutputAvailable,
StreamUsage,
)
from .service import stream_chat_completion_sdk
result = CopilotResult()
response_parts: list[str] = []
tool_calls_by_id: dict[str, dict[str, Any]] = {}
async for event in stream_chat_completion_sdk(
session_id=session_id,
message=message,
is_user_message=is_user_message,
user_id=user_id,
):
if isinstance(event, StreamTextDelta):
response_parts.append(event.delta)
elif isinstance(event, StreamToolInputAvailable):
entry: dict[str, Any] = {
"tool_call_id": event.toolCallId,
"tool_name": event.toolName,
"input": event.input,
"output": None,
"success": None,
}
result.tool_calls.append(entry)
tool_calls_by_id[event.toolCallId] = entry
elif isinstance(event, StreamToolOutputAvailable):
if tc := tool_calls_by_id.get(event.toolCallId):
tc["output"] = event.output
tc["success"] = event.success
elif isinstance(event, StreamUsage):
result.prompt_tokens += event.prompt_tokens
result.completion_tokens += event.completion_tokens
result.total_tokens += event.total_tokens
elif isinstance(event, StreamError):
raise RuntimeError(f"Copilot error: {event.errorText}")
result.response_text = "".join(response_parts)
return result

View File

@@ -12,7 +12,6 @@ import asyncio
import logging
import uuid
from dataclasses import dataclass, field
from typing import Any
from ..constants import COMPACTION_DONE_MSG, COMPACTION_TOOL_NAME
from ..model import ChatMessage, ChatSession
@@ -120,12 +119,14 @@ def filter_compaction_messages(
filtered: list[ChatMessage] = []
for msg in messages:
if msg.role == "assistant" and msg.tool_calls:
real_calls: list[dict[str, Any]] = []
for tc in msg.tool_calls:
if tc.get("function", {}).get("name") == COMPACTION_TOOL_NAME:
compaction_ids.add(tc.get("id", ""))
else:
real_calls.append(tc)
real_calls = [
tc
for tc in msg.tool_calls
if tc.get("function", {}).get("name") != COMPACTION_TOOL_NAME
]
if not real_calls and not msg.content:
continue
if msg.role == "tool" and msg.tool_call_id in compaction_ids:
@@ -221,7 +222,6 @@ class CompactionTracker:
def reset_for_query(self) -> None:
"""Reset per-query state before a new SDK query."""
self._compact_start.clear()
self._done = False
self._start_emitted = False
self._tool_call_id = ""

View File

@@ -1,54 +0,0 @@
"""Shared test fixtures for copilot SDK tests."""
from __future__ import annotations
from unittest.mock import patch
from uuid import uuid4
import pytest
from backend.util import json
@pytest.fixture()
def mock_chat_config():
"""Mock ChatConfig so compact_transcript tests skip real config lookup."""
with patch(
"backend.copilot.config.ChatConfig",
return_value=type("Cfg", (), {"model": "m", "api_key": "k", "base_url": "u"})(),
):
yield
def build_test_transcript(pairs: list[tuple[str, str]]) -> str:
"""Build a minimal valid JSONL transcript from (role, content) pairs.
Use this helper in any copilot SDK test that needs a well-formed
transcript without hitting the real storage layer.
"""
lines: list[str] = []
last_uuid: str | None = None
for role, content in pairs:
uid = str(uuid4())
entry_type = "assistant" if role == "assistant" else "user"
msg: dict = {"role": role, "content": content}
if role == "assistant":
msg.update(
{
"model": "",
"id": f"msg_{uid[:8]}",
"type": "message",
"content": [{"type": "text", "text": content}],
"stop_reason": "end_turn",
"stop_sequence": None,
}
)
entry = {
"type": entry_type,
"uuid": uid,
"parentUuid": last_uuid,
"message": msg,
}
lines.append(json.dumps(entry, separators=(",", ":")))
last_uuid = uid
return "\n".join(lines) + "\n"

View File

@@ -1,651 +0,0 @@
"""Tests for retry logic and transcript compaction helpers."""
from __future__ import annotations
import asyncio
from unittest.mock import AsyncMock, patch
from uuid import uuid4
import pytest
from backend.util import json
from backend.util.prompt import CompressResult
from .conftest import build_test_transcript as _build_transcript
from .service import _friendly_error_text, _is_prompt_too_long
from .transcript import (
_flatten_assistant_content,
_flatten_tool_result_content,
_messages_to_transcript,
_run_compression,
_transcript_to_messages,
compact_transcript,
validate_transcript,
)
# ---------------------------------------------------------------------------
# _flatten_assistant_content
# ---------------------------------------------------------------------------
class TestFlattenAssistantContent:
def test_text_blocks(self):
blocks = [
{"type": "text", "text": "Hello"},
{"type": "text", "text": "World"},
]
assert _flatten_assistant_content(blocks) == "Hello\nWorld"
def test_tool_use_blocks(self):
blocks = [{"type": "tool_use", "name": "read_file", "input": {}}]
assert _flatten_assistant_content(blocks) == "[tool_use: read_file]"
def test_mixed_blocks(self):
blocks = [
{"type": "text", "text": "Let me read that."},
{"type": "tool_use", "name": "Read", "input": {"path": "/foo"}},
]
result = _flatten_assistant_content(blocks)
assert "Let me read that." in result
assert "[tool_use: Read]" in result
def test_raw_strings(self):
assert _flatten_assistant_content(["hello", "world"]) == "hello\nworld"
def test_unknown_block_type_preserved_as_placeholder(self):
blocks = [
{"type": "text", "text": "See this image:"},
{"type": "image", "source": {"type": "base64", "data": "..."}},
]
result = _flatten_assistant_content(blocks)
assert "See this image:" in result
assert "[__image__]" in result
def test_empty(self):
assert _flatten_assistant_content([]) == ""
# ---------------------------------------------------------------------------
# _flatten_tool_result_content
# ---------------------------------------------------------------------------
class TestFlattenToolResultContent:
def test_tool_result_with_text(self):
blocks = [
{
"type": "tool_result",
"tool_use_id": "123",
"content": [{"type": "text", "text": "file contents here"}],
}
]
assert _flatten_tool_result_content(blocks) == "file contents here"
def test_tool_result_with_string_content(self):
blocks = [{"type": "tool_result", "tool_use_id": "123", "content": "ok"}]
assert _flatten_tool_result_content(blocks) == "ok"
def test_text_block(self):
blocks = [{"type": "text", "text": "plain text"}]
assert _flatten_tool_result_content(blocks) == "plain text"
def test_raw_string(self):
assert _flatten_tool_result_content(["raw"]) == "raw"
def test_tool_result_with_none_content(self):
"""tool_result with content=None should produce empty string."""
blocks = [{"type": "tool_result", "tool_use_id": "x", "content": None}]
assert _flatten_tool_result_content(blocks) == ""
def test_tool_result_with_empty_list_content(self):
"""tool_result with content=[] should produce empty string."""
blocks = [{"type": "tool_result", "tool_use_id": "x", "content": []}]
assert _flatten_tool_result_content(blocks) == ""
def test_empty(self):
assert _flatten_tool_result_content([]) == ""
def test_nested_dict_without_text(self):
"""Dict blocks without text key use json.dumps fallback."""
blocks = [
{
"type": "tool_result",
"tool_use_id": "x",
"content": [{"type": "image", "source": "data:..."}],
}
]
result = _flatten_tool_result_content(blocks)
assert "image" in result # json.dumps fallback
def test_unknown_block_type_preserved_as_placeholder(self):
blocks = [{"type": "image", "source": {"type": "base64", "data": "..."}}]
result = _flatten_tool_result_content(blocks)
assert "[__image__]" in result
# ---------------------------------------------------------------------------
# _transcript_to_messages
# ---------------------------------------------------------------------------
def _make_entry(entry_type: str, role: str, content: str | list, **kwargs) -> str:
"""Build a JSONL line for testing."""
uid = str(uuid4())
msg: dict = {"role": role, "content": content}
msg.update(kwargs)
entry = {
"type": entry_type,
"uuid": uid,
"parentUuid": None,
"message": msg,
}
return json.dumps(entry, separators=(",", ":"))
class TestTranscriptToMessages:
def test_basic_roundtrip(self):
lines = [
_make_entry("user", "user", "Hello"),
_make_entry("assistant", "assistant", [{"type": "text", "text": "Hi"}]),
]
content = "\n".join(lines) + "\n"
messages = _transcript_to_messages(content)
assert len(messages) == 2
assert messages[0] == {"role": "user", "content": "Hello"}
assert messages[1] == {"role": "assistant", "content": "Hi"}
def test_skips_strippable_types(self):
"""Progress and metadata entries are excluded."""
lines = [
_make_entry("user", "user", "Hello"),
json.dumps(
{
"type": "progress",
"uuid": str(uuid4()),
"parentUuid": None,
"message": {"role": "assistant", "content": "..."},
}
),
_make_entry("assistant", "assistant", [{"type": "text", "text": "Hi"}]),
]
content = "\n".join(lines) + "\n"
messages = _transcript_to_messages(content)
assert len(messages) == 2
def test_empty_content(self):
assert _transcript_to_messages("") == []
def test_tool_result_content(self):
"""User entries with tool_result content blocks are flattened."""
lines = [
_make_entry(
"user",
"user",
[
{
"type": "tool_result",
"tool_use_id": "123",
"content": "tool output",
}
],
),
]
content = "\n".join(lines) + "\n"
messages = _transcript_to_messages(content)
assert len(messages) == 1
assert messages[0]["content"] == "tool output"
def test_malformed_json_lines_skipped(self):
"""Malformed JSON lines in transcript are silently skipped."""
lines = [
_make_entry("user", "user", "Hello"),
"this is not valid json",
_make_entry("assistant", "assistant", [{"type": "text", "text": "Hi"}]),
]
content = "\n".join(lines) + "\n"
messages = _transcript_to_messages(content)
assert len(messages) == 2
def test_empty_lines_skipped(self):
"""Empty lines and whitespace-only lines are skipped."""
lines = [
_make_entry("user", "user", "Hello"),
"",
" ",
_make_entry("assistant", "assistant", [{"type": "text", "text": "Hi"}]),
]
content = "\n".join(lines) + "\n"
messages = _transcript_to_messages(content)
assert len(messages) == 2
def test_unicode_content_preserved(self):
"""Unicode characters survive transcript roundtrip."""
lines = [
_make_entry("user", "user", "Hello 你好 🌍"),
_make_entry(
"assistant",
"assistant",
[{"type": "text", "text": "Bonjour 日本語 émojis 🎉"}],
),
]
content = "\n".join(lines) + "\n"
messages = _transcript_to_messages(content)
assert messages[0]["content"] == "Hello 你好 🌍"
assert messages[1]["content"] == "Bonjour 日本語 émojis 🎉"
def test_entry_without_role_skipped(self):
"""Entries with missing role in message are skipped."""
entry_no_role = json.dumps(
{
"type": "user",
"uuid": str(uuid4()),
"parentUuid": None,
"message": {"content": "no role here"},
}
)
lines = [
entry_no_role,
_make_entry("user", "user", "Hello"),
]
content = "\n".join(lines) + "\n"
messages = _transcript_to_messages(content)
assert len(messages) == 1
assert messages[0]["content"] == "Hello"
def test_tool_use_and_result_pairs(self):
"""Tool use + tool result pairs are properly flattened."""
lines = [
_make_entry(
"assistant",
"assistant",
[
{"type": "text", "text": "Let me check."},
{"type": "tool_use", "name": "read_file", "input": {"path": "/x"}},
],
),
_make_entry(
"user",
"user",
[
{
"type": "tool_result",
"tool_use_id": "abc",
"content": [{"type": "text", "text": "file contents"}],
}
],
),
]
content = "\n".join(lines) + "\n"
messages = _transcript_to_messages(content)
assert len(messages) == 2
assert "Let me check." in messages[0]["content"]
assert "[tool_use: read_file]" in messages[0]["content"]
assert messages[1]["content"] == "file contents"
# ---------------------------------------------------------------------------
# _messages_to_transcript
# ---------------------------------------------------------------------------
class TestMessagesToTranscript:
def test_produces_valid_jsonl(self):
messages = [
{"role": "user", "content": "Hello"},
{"role": "assistant", "content": "Hi there"},
]
result = _messages_to_transcript(messages)
lines = result.strip().split("\n")
assert len(lines) == 2
for line in lines:
parsed = json.loads(line)
assert "type" in parsed
assert "uuid" in parsed
assert "message" in parsed
def test_assistant_has_proper_structure(self):
messages = [{"role": "assistant", "content": "Hello"}]
result = _messages_to_transcript(messages)
entry = json.loads(result.strip())
assert entry["type"] == "assistant"
msg = entry["message"]
assert msg["role"] == "assistant"
assert msg["type"] == "message"
assert msg["stop_reason"] == "end_turn"
assert isinstance(msg["content"], list)
assert msg["content"][0]["type"] == "text"
def test_user_has_plain_content(self):
messages = [{"role": "user", "content": "Hi"}]
result = _messages_to_transcript(messages)
entry = json.loads(result.strip())
assert entry["type"] == "user"
assert entry["message"]["content"] == "Hi"
def test_parent_uuid_chain(self):
messages = [
{"role": "user", "content": "A"},
{"role": "assistant", "content": "B"},
{"role": "user", "content": "C"},
]
result = _messages_to_transcript(messages)
lines = result.strip().split("\n")
entries = [json.loads(line) for line in lines]
assert entries[0]["parentUuid"] == ""
assert entries[1]["parentUuid"] == entries[0]["uuid"]
assert entries[2]["parentUuid"] == entries[1]["uuid"]
def test_empty_messages(self):
assert _messages_to_transcript([]) == ""
def test_output_is_valid_transcript(self):
"""Output should pass validate_transcript if it has assistant entries."""
messages = [
{"role": "user", "content": "Hello"},
{"role": "assistant", "content": "Hi"},
]
result = _messages_to_transcript(messages)
assert validate_transcript(result)
def test_roundtrip_to_messages(self):
"""Messages → transcript → messages preserves structure."""
original = [
{"role": "user", "content": "Hello"},
{"role": "assistant", "content": "Hi there"},
{"role": "user", "content": "How are you?"},
]
transcript = _messages_to_transcript(original)
restored = _transcript_to_messages(transcript)
assert len(restored) == len(original)
for orig, rest in zip(original, restored):
assert orig["role"] == rest["role"]
assert orig["content"] == rest["content"]
# ---------------------------------------------------------------------------
# compact_transcript
# ---------------------------------------------------------------------------
class TestCompactTranscript:
@pytest.mark.asyncio
async def test_too_few_messages_returns_none(self, mock_chat_config):
"""compact_transcript returns None when transcript has < 2 messages."""
transcript = _build_transcript([("user", "Hello")])
result = await compact_transcript(transcript, model="test-model")
assert result is None
@pytest.mark.asyncio
async def test_returns_none_when_not_compacted(self, mock_chat_config):
"""When compress_context says no compaction needed, returns None.
The compressor couldn't reduce it, so retrying with the same
content would fail identically."""
transcript = _build_transcript(
[
("user", "Hello"),
("assistant", "Hi there"),
]
)
mock_result = type(
"CompressResult",
(),
{
"was_compacted": False,
"messages": [],
"original_token_count": 100,
"token_count": 100,
"messages_summarized": 0,
"messages_dropped": 0,
},
)()
with patch(
"backend.copilot.sdk.transcript._run_compression",
new_callable=AsyncMock,
return_value=mock_result,
):
result = await compact_transcript(transcript, model="test-model")
assert result is None
@pytest.mark.asyncio
async def test_returns_compacted_transcript(self, mock_chat_config):
"""When compaction succeeds, returns a valid compacted transcript."""
transcript = _build_transcript(
[
("user", "Hello"),
("assistant", "Hi"),
("user", "More"),
("assistant", "Details"),
]
)
compacted_msgs = [
{"role": "user", "content": "[summary]"},
{"role": "assistant", "content": "Summarized response"},
]
mock_result = type(
"CompressResult",
(),
{
"was_compacted": True,
"messages": compacted_msgs,
"original_token_count": 500,
"token_count": 100,
"messages_summarized": 2,
"messages_dropped": 0,
},
)()
with patch(
"backend.copilot.sdk.transcript._run_compression",
new_callable=AsyncMock,
return_value=mock_result,
):
result = await compact_transcript(transcript, model="test-model")
assert result is not None
assert validate_transcript(result)
msgs = _transcript_to_messages(result)
assert len(msgs) == 2
assert msgs[1]["content"] == "Summarized response"
@pytest.mark.asyncio
async def test_returns_none_on_compression_failure(self, mock_chat_config):
"""When _run_compression raises, returns None."""
transcript = _build_transcript(
[
("user", "Hello"),
("assistant", "Hi"),
]
)
with patch(
"backend.copilot.sdk.transcript._run_compression",
new_callable=AsyncMock,
side_effect=RuntimeError("LLM unavailable"),
):
result = await compact_transcript(transcript, model="test-model")
assert result is None
# ---------------------------------------------------------------------------
# _is_prompt_too_long
# ---------------------------------------------------------------------------
class TestIsPromptTooLong:
"""Unit tests for _is_prompt_too_long pattern matching."""
def test_prompt_is_too_long(self):
err = RuntimeError("prompt is too long for model context")
assert _is_prompt_too_long(err) is True
def test_request_too_large(self):
err = Exception("request too large: 250000 tokens")
assert _is_prompt_too_long(err) is True
def test_maximum_context_length(self):
err = ValueError("maximum context length exceeded")
assert _is_prompt_too_long(err) is True
def test_context_length_exceeded(self):
err = Exception("context_length_exceeded")
assert _is_prompt_too_long(err) is True
def test_input_tokens_exceed(self):
err = Exception("input tokens exceed the max_tokens limit")
assert _is_prompt_too_long(err) is True
def test_input_is_too_long(self):
err = Exception("input is too long for the model")
assert _is_prompt_too_long(err) is True
def test_content_length_exceeds(self):
err = Exception("content length exceeds maximum")
assert _is_prompt_too_long(err) is True
def test_unrelated_error_returns_false(self):
err = RuntimeError("network timeout")
assert _is_prompt_too_long(err) is False
def test_auth_error_returns_false(self):
err = Exception("authentication failed: invalid API key")
assert _is_prompt_too_long(err) is False
def test_chained_exception_detected(self):
"""Prompt-too-long error wrapped in another exception is detected."""
inner = RuntimeError("prompt is too long")
outer = Exception("SDK error")
outer.__cause__ = inner
assert _is_prompt_too_long(outer) is True
def test_case_insensitive(self):
err = Exception("PROMPT IS TOO LONG")
assert _is_prompt_too_long(err) is True
def test_old_max_tokens_exceeded_not_matched(self):
"""The old broad 'max_tokens_exceeded' pattern was removed.
Only 'input tokens exceed' should match now."""
err = Exception("max_tokens_exceeded")
assert _is_prompt_too_long(err) is False
# ---------------------------------------------------------------------------
# _run_compression timeout fallback
# ---------------------------------------------------------------------------
class TestRunCompressionTimeout:
"""Verify _run_compression falls back to truncation when LLM times out."""
@pytest.mark.asyncio
async def test_timeout_falls_back_to_truncation(self):
"""When compress_context with LLM client times out,
_run_compression falls back to truncation (client=None)."""
messages = [
{"role": "user", "content": "Hello"},
{"role": "assistant", "content": "Hi there"},
]
truncation_result = CompressResult(
messages=messages,
was_compacted=False,
original_token_count=50,
token_count=50,
messages_summarized=0,
messages_dropped=0,
)
call_args: list[dict] = []
async def _mock_compress(**kwargs):
call_args.append(kwargs)
if kwargs.get("client") is not None:
# Simulate timeout by raising asyncio.TimeoutError
raise asyncio.TimeoutError("LLM compaction timed out")
return truncation_result
with (
patch(
"backend.copilot.sdk.transcript.get_openai_client",
return_value="fake-client",
),
patch(
"backend.copilot.sdk.transcript.compress_context",
side_effect=_mock_compress,
),
):
result = await _run_compression(messages, "test-model", "[test]")
assert result == truncation_result
# Should have been called twice: once with client, once without
assert len(call_args) == 2
assert call_args[0]["client"] is not None # LLM attempt
assert call_args[1]["client"] is None # truncation fallback
@pytest.mark.asyncio
async def test_no_client_uses_truncation_directly(self):
"""When no OpenAI client is configured, goes straight to truncation."""
messages = [
{"role": "user", "content": "Hello"},
{"role": "assistant", "content": "Hi there"},
]
truncation_result = CompressResult(
messages=messages,
was_compacted=False,
original_token_count=50,
token_count=50,
messages_summarized=0,
messages_dropped=0,
)
with (
patch(
"backend.copilot.sdk.transcript.get_openai_client",
return_value=None,
),
patch(
"backend.copilot.sdk.transcript.compress_context",
new_callable=AsyncMock,
return_value=truncation_result,
) as mock_compress,
):
result = await _run_compression(messages, "test-model", "[test]")
assert result == truncation_result
mock_compress.assert_called_once()
# When no client, compress_context is called with client=None
assert mock_compress.call_args.kwargs.get("client") is None
# ---------------------------------------------------------------------------
# _friendly_error_text
# ---------------------------------------------------------------------------
class TestFriendlyErrorText:
"""Verify user-friendly error message mapping."""
def test_authentication_error(self):
result = _friendly_error_text("authentication failed: invalid API key")
assert "Authentication" in result
assert "API key" in result
def test_rate_limit_error(self):
result = _friendly_error_text("rate limit exceeded")
assert "Rate limit" in result
def test_overloaded_error(self):
result = _friendly_error_text("API is overloaded")
assert "overloaded" in result
def test_timeout_error(self):
result = _friendly_error_text("Request timeout after 30s")
assert "timed out" in result
def test_connection_error(self):
result = _friendly_error_text("Connection refused")
assert "Connection" in result or "connection" in result
def test_unknown_error_passthrough(self):
result = _friendly_error_text("some unknown error XYZ")
assert "SDK stream error:" in result
assert "XYZ" in result
def test_unauthorized_error(self):
result = _friendly_error_text("401 Unauthorized")
assert "Authentication" in result

File diff suppressed because it is too large Load Diff

View File

@@ -1,283 +0,0 @@
"""Unit tests for extracted service helpers.
Covers ``_is_prompt_too_long``, ``_reduce_context``, ``_iter_sdk_messages``,
and the ``ReducedContext`` named tuple.
"""
from __future__ import annotations
import asyncio
from collections.abc import AsyncGenerator
from unittest.mock import AsyncMock, patch
import pytest
from .conftest import build_test_transcript as _build_transcript
from .service import (
ReducedContext,
_is_prompt_too_long,
_iter_sdk_messages,
_reduce_context,
)
# ---------------------------------------------------------------------------
# _is_prompt_too_long
# ---------------------------------------------------------------------------
class TestIsPromptTooLong:
def test_direct_match(self) -> None:
assert _is_prompt_too_long(Exception("prompt is too long")) is True
def test_case_insensitive(self) -> None:
assert _is_prompt_too_long(Exception("PROMPT IS TOO LONG")) is True
def test_no_match(self) -> None:
assert _is_prompt_too_long(Exception("network timeout")) is False
def test_request_too_large(self) -> None:
assert _is_prompt_too_long(Exception("request too large for model")) is True
def test_context_length_exceeded(self) -> None:
assert _is_prompt_too_long(Exception("context_length_exceeded")) is True
def test_max_tokens_exceeded_not_matched(self) -> None:
"""'max_tokens_exceeded' is intentionally excluded (too broad)."""
assert _is_prompt_too_long(Exception("max_tokens_exceeded")) is False
def test_max_tokens_config_error_no_match(self) -> None:
"""'max_tokens must be at least 1' should NOT match."""
assert _is_prompt_too_long(Exception("max_tokens must be at least 1")) is False
def test_chained_cause(self) -> None:
inner = Exception("prompt is too long")
outer = RuntimeError("SDK error")
outer.__cause__ = inner
assert _is_prompt_too_long(outer) is True
def test_chained_context(self) -> None:
inner = Exception("request too large")
outer = RuntimeError("wrapped")
outer.__context__ = inner
assert _is_prompt_too_long(outer) is True
def test_deep_chain(self) -> None:
bottom = Exception("maximum context length")
middle = RuntimeError("middle")
middle.__cause__ = bottom
top = ValueError("top")
top.__cause__ = middle
assert _is_prompt_too_long(top) is True
def test_chain_no_match(self) -> None:
inner = Exception("rate limit exceeded")
outer = RuntimeError("wrapped")
outer.__cause__ = inner
assert _is_prompt_too_long(outer) is False
def test_cycle_detection(self) -> None:
"""Exception chain with a cycle should not infinite-loop."""
a = Exception("error a")
b = Exception("error b")
a.__cause__ = b
b.__cause__ = a # cycle
assert _is_prompt_too_long(a) is False
def test_all_patterns(self) -> None:
patterns = [
"prompt is too long",
"request too large",
"maximum context length",
"context_length_exceeded",
"input tokens exceed",
"input is too long",
"content length exceeds",
]
for pattern in patterns:
assert _is_prompt_too_long(Exception(pattern)) is True, pattern
# ---------------------------------------------------------------------------
# _reduce_context
# ---------------------------------------------------------------------------
class TestReduceContext:
@pytest.mark.asyncio
async def test_first_retry_compaction_success(self) -> None:
transcript = _build_transcript([("user", "hi"), ("assistant", "hello")])
compacted = _build_transcript([("user", "hi"), ("assistant", "[summary]")])
with (
patch(
"backend.copilot.sdk.service.compact_transcript",
new_callable=AsyncMock,
return_value=compacted,
),
patch(
"backend.copilot.sdk.service.validate_transcript",
return_value=True,
),
patch(
"backend.copilot.sdk.service.write_transcript_to_tempfile",
return_value="/tmp/resume.jsonl",
),
):
ctx = await _reduce_context(
transcript, False, "sess-123", "/tmp/cwd", "[test]"
)
assert isinstance(ctx, ReducedContext)
assert ctx.use_resume is True
assert ctx.resume_file == "/tmp/resume.jsonl"
assert ctx.transcript_lost is False
assert ctx.tried_compaction is True
@pytest.mark.asyncio
async def test_compaction_fails_drops_transcript(self) -> None:
transcript = _build_transcript([("user", "hi"), ("assistant", "hello")])
with patch(
"backend.copilot.sdk.service.compact_transcript",
new_callable=AsyncMock,
return_value=None,
):
ctx = await _reduce_context(
transcript, False, "sess-123", "/tmp/cwd", "[test]"
)
assert ctx.use_resume is False
assert ctx.resume_file is None
assert ctx.transcript_lost is True
assert ctx.tried_compaction is True
@pytest.mark.asyncio
async def test_already_tried_compaction_skips(self) -> None:
transcript = _build_transcript([("user", "hi"), ("assistant", "hello")])
ctx = await _reduce_context(transcript, True, "sess-123", "/tmp/cwd", "[test]")
assert ctx.use_resume is False
assert ctx.transcript_lost is True
assert ctx.tried_compaction is True
@pytest.mark.asyncio
async def test_empty_transcript_drops(self) -> None:
ctx = await _reduce_context("", False, "sess-123", "/tmp/cwd", "[test]")
assert ctx.use_resume is False
assert ctx.transcript_lost is True
@pytest.mark.asyncio
async def test_compaction_returns_same_content_drops(self) -> None:
transcript = _build_transcript([("user", "hi"), ("assistant", "hello")])
with patch(
"backend.copilot.sdk.service.compact_transcript",
new_callable=AsyncMock,
return_value=transcript, # same content
):
ctx = await _reduce_context(
transcript, False, "sess-123", "/tmp/cwd", "[test]"
)
assert ctx.transcript_lost is True
@pytest.mark.asyncio
async def test_write_tempfile_fails_drops(self) -> None:
transcript = _build_transcript([("user", "hi"), ("assistant", "hello")])
compacted = _build_transcript([("user", "hi"), ("assistant", "[summary]")])
with (
patch(
"backend.copilot.sdk.service.compact_transcript",
new_callable=AsyncMock,
return_value=compacted,
),
patch(
"backend.copilot.sdk.service.validate_transcript",
return_value=True,
),
patch(
"backend.copilot.sdk.service.write_transcript_to_tempfile",
return_value=None,
),
):
ctx = await _reduce_context(
transcript, False, "sess-123", "/tmp/cwd", "[test]"
)
assert ctx.transcript_lost is True
# ---------------------------------------------------------------------------
# _iter_sdk_messages
# ---------------------------------------------------------------------------
class TestIterSdkMessages:
@pytest.mark.asyncio
async def test_yields_messages(self) -> None:
messages = ["msg1", "msg2", "msg3"]
client = AsyncMock()
async def _fake_receive() -> AsyncGenerator[str]:
for m in messages:
yield m
client.receive_response = _fake_receive
result = [msg async for msg in _iter_sdk_messages(client)]
assert result == messages
@pytest.mark.asyncio
async def test_heartbeat_on_timeout(self) -> None:
"""Yields None when asyncio.wait times out."""
client = AsyncMock()
received: list = []
async def _slow_receive() -> AsyncGenerator[str]:
await asyncio.sleep(100) # never completes
yield "never" # pragma: no cover — unreachable, yield makes this an async generator
client.receive_response = _slow_receive
with patch("backend.copilot.sdk.service._HEARTBEAT_INTERVAL", 0.01):
count = 0
async for msg in _iter_sdk_messages(client):
received.append(msg)
count += 1
if count >= 3:
break
assert all(m is None for m in received)
@pytest.mark.asyncio
async def test_exception_propagates(self) -> None:
client = AsyncMock()
async def _error_receive() -> AsyncGenerator[str]:
raise RuntimeError("SDK crash")
yield # pragma: no cover — unreachable, yield makes this an async generator
client.receive_response = _error_receive
with pytest.raises(RuntimeError, match="SDK crash"):
async for _ in _iter_sdk_messages(client):
pass
@pytest.mark.asyncio
async def test_task_cleanup_on_break(self) -> None:
"""Pending task is cancelled when generator is closed."""
client = AsyncMock()
async def _slow_receive() -> AsyncGenerator[str]:
yield "first"
await asyncio.sleep(100)
yield "second"
client.receive_response = _slow_receive
gen = _iter_sdk_messages(client)
first = await gen.__anext__()
assert first == "first"
await gen.aclose() # should cancel pending task cleanly

View File

@@ -1,144 +0,0 @@
"""Claude Code subscription auth helpers.
Handles locating the SDK-bundled CLI binary, provisioning credentials from
environment variables, and validating that subscription auth is functional.
"""
import functools
import json
import logging
import os
import shutil
import subprocess
logger = logging.getLogger(__name__)
def find_bundled_cli() -> str:
"""Locate the Claude CLI binary bundled inside ``claude_agent_sdk``.
Falls back to ``shutil.which("claude")`` if the SDK bundle is absent.
"""
try:
from claude_agent_sdk._internal.transport.subprocess_cli import (
SubprocessCLITransport,
)
path = SubprocessCLITransport._find_bundled_cli(None) # type: ignore[arg-type]
if path:
return str(path)
except Exception:
pass
system_path = shutil.which("claude")
if system_path:
return system_path
raise RuntimeError(
"Claude CLI not found — neither the SDK-bundled binary nor a "
"system-installed `claude` could be located."
)
def provision_credentials_file() -> None:
"""Write ``~/.claude/.credentials.json`` from env when running headless.
If ``CLAUDE_CODE_OAUTH_TOKEN`` is set (an OAuth *access* token obtained
from ``claude auth status`` or extracted from the macOS keychain), this
helper writes a minimal credentials file so the bundled CLI can
authenticate without an interactive ``claude login``.
A ``CLAUDE_CODE_REFRESH_TOKEN`` env var is optional but recommended —
it lets the CLI silently refresh an expired access token.
"""
access_token = os.environ.get("CLAUDE_CODE_OAUTH_TOKEN", "").strip()
if not access_token:
return
creds_dir = os.path.expanduser("~/.claude")
creds_path = os.path.join(creds_dir, ".credentials.json")
# Don't overwrite an existing credentials file (e.g. from a volume mount).
if os.path.exists(creds_path):
logger.debug("Credentials file already exists at %s — skipping", creds_path)
return
os.makedirs(creds_dir, exist_ok=True)
creds = {
"claudeAiOauth": {
"accessToken": access_token,
"refreshToken": os.environ.get("CLAUDE_CODE_REFRESH_TOKEN", "").strip(),
"expiresAt": 0,
"scopes": [
"user:inference",
"user:profile",
"user:sessions:claude_code",
],
}
}
with open(creds_path, "w") as f:
json.dump(creds, f)
logger.info("Provisioned Claude credentials file at %s", creds_path)
@functools.cache
def validate_subscription() -> None:
"""Validate the bundled Claude CLI is reachable and authenticated.
Cached so the blocking subprocess check runs at most once per process
lifetime. On first call, also provisions ``~/.claude/.credentials.json``
from the ``CLAUDE_CODE_OAUTH_TOKEN`` env var when available.
"""
provision_credentials_file()
cli = find_bundled_cli()
result = subprocess.run(
[cli, "--version"],
capture_output=True,
text=True,
timeout=10,
)
if result.returncode != 0:
raise RuntimeError(
f"Claude CLI check failed (exit {result.returncode}): "
f"{result.stderr.strip()}"
)
logger.info(
"Claude Code subscription mode: CLI version %s",
result.stdout.strip(),
)
# Verify the CLI is actually authenticated.
auth_result = subprocess.run(
[cli, "auth", "status"],
capture_output=True,
text=True,
timeout=10,
env={
**os.environ,
"ANTHROPIC_API_KEY": "",
"ANTHROPIC_AUTH_TOKEN": "",
"ANTHROPIC_BASE_URL": "",
},
)
if auth_result.returncode != 0:
raise RuntimeError(
"Claude CLI is not authenticated. Either:\n"
" • Set CLAUDE_CODE_OAUTH_TOKEN env var (from `claude auth status` "
"or macOS keychain), or\n"
" • Mount ~/.claude/.credentials.json into the container, or\n"
" • Run `claude login` inside the container."
)
try:
status = json.loads(auth_result.stdout)
if not status.get("loggedIn"):
raise RuntimeError(
"Claude CLI reports loggedIn=false. Set CLAUDE_CODE_OAUTH_TOKEN "
"or run `claude login`."
)
logger.info(
"Claude subscription auth: method=%s, email=%s",
status.get("authMethod"),
status.get("email"),
)
except json.JSONDecodeError:
logger.warning("Could not parse `claude auth status` output")

View File

@@ -10,9 +10,6 @@ Storage is handled via ``WorkspaceStorageBackend`` (GCS in prod, local
filesystem for self-hosted) — no DB column needed.
"""
from __future__ import annotations
import asyncio
import logging
import os
import re
@@ -20,12 +17,8 @@ import shutil
import time
from dataclasses import dataclass
from pathlib import Path
from uuid import uuid4
from backend.util import json
from backend.util.clients import get_openai_client
from backend.util.prompt import CompressResult, compress_context
from backend.util.workspace_storage import GCSWorkspaceStorage, get_workspace_storage
logger = logging.getLogger(__name__)
@@ -106,14 +99,7 @@ def strip_progress_entries(content: str) -> str:
continue
parent = entry.get("parentUuid", "")
original_parent = parent
# seen_parents is local per-entry (not shared across iterations) so
# it can only detect cycles within a single ancestry walk, not across
# entries. This is intentional: each entry's parent chain is
# independent, and reusing a global set would incorrectly short-circuit
# valid re-use of the same UUID as a parent in different subtrees.
seen_parents: set[str] = set()
while parent in stripped_uuids and parent not in seen_parents:
seen_parents.add(parent)
while parent in stripped_uuids:
parent = uuid_to_parent.get(parent, "")
if parent != original_parent:
entry["parentUuid"] = parent
@@ -350,7 +336,7 @@ def write_transcript_to_tempfile(
# Validate cwd is under the expected sandbox prefix (CodeQL sanitizer).
real_cwd = os.path.realpath(cwd)
if not real_cwd.startswith(_SAFE_CWD_PREFIX):
logger.warning("[Transcript] cwd outside sandbox: %s", cwd)
logger.warning(f"[Transcript] cwd outside sandbox: {cwd}")
return None
try:
@@ -360,17 +346,17 @@ def write_transcript_to_tempfile(
os.path.join(real_cwd, f"transcript-{safe_id}.jsonl")
)
if not jsonl_path.startswith(real_cwd):
logger.warning("[Transcript] Path escaped cwd: %s", jsonl_path)
logger.warning(f"[Transcript] Path escaped cwd: {jsonl_path}")
return None
with open(jsonl_path, "w") as f:
f.write(transcript_content)
logger.info("[Transcript] Wrote resume file: %s", jsonl_path)
logger.info(f"[Transcript] Wrote resume file: {jsonl_path}")
return jsonl_path
except OSError as e:
logger.warning("[Transcript] Failed to write resume file: %s", e)
logger.warning(f"[Transcript] Failed to write resume file: {e}")
return None
@@ -431,6 +417,8 @@ def _meta_storage_path_parts(user_id: str, session_id: str) -> tuple[str, str, s
def _build_path_from_parts(parts: tuple[str, str, str], backend: object) -> str:
"""Build a full storage path from (workspace_id, file_id, filename) parts."""
from backend.util.workspace_storage import GCSWorkspaceStorage
wid, fid, fname = parts
if isinstance(backend, GCSWorkspaceStorage):
blob = f"workspaces/{wid}/{fid}/{fname}"
@@ -469,15 +457,17 @@ async def upload_transcript(
content: Complete JSONL transcript (from TranscriptBuilder).
message_count: ``len(session.messages)`` at upload time.
"""
from backend.util.workspace_storage import get_workspace_storage
# Strip metadata entries (progress, file-history-snapshot, etc.)
# Note: SDK-built transcripts shouldn't have these, but strip for safety
stripped = strip_progress_entries(content)
if not validate_transcript(stripped):
# Log entry types for debugging — helps identify why validation failed
entry_types = [
json.loads(line, fallback={"type": "INVALID_JSON"}).get("type", "?")
for line in stripped.strip().split("\n")
]
entry_types: list[str] = []
for line in stripped.strip().split("\n"):
entry = json.loads(line, fallback={"type": "INVALID_JSON"})
entry_types.append(entry.get("type", "?"))
logger.warning(
"%s Skipping upload — stripped content not valid "
"(types=%s, stripped_len=%d, raw_len=%d)",
@@ -513,14 +503,11 @@ async def upload_transcript(
content=json.dumps(meta).encode("utf-8"),
)
except Exception as e:
logger.warning("%s Failed to write metadata: %s", log_prefix, e)
logger.warning(f"{log_prefix} Failed to write metadata: {e}")
logger.info(
"%s Uploaded %dB (stripped from %dB, msg_count=%d)",
log_prefix,
len(encoded),
len(content),
message_count,
f"{log_prefix} Uploaded {len(encoded)}B "
f"(stripped from {len(content)}B, msg_count={message_count})"
)
@@ -534,6 +521,8 @@ async def download_transcript(
Returns a ``TranscriptDownload`` with the JSONL content and the
``message_count`` watermark from the upload, or ``None`` if not found.
"""
from backend.util.workspace_storage import get_workspace_storage
storage = await get_workspace_storage()
path = _build_storage_path(user_id, session_id, storage)
@@ -541,10 +530,10 @@ async def download_transcript(
data = await storage.retrieve(path)
content = data.decode("utf-8")
except FileNotFoundError:
logger.debug("%s No transcript in storage", log_prefix)
logger.debug(f"{log_prefix} No transcript in storage")
return None
except Exception as e:
logger.warning("%s Failed to download transcript: %s", log_prefix, e)
logger.warning(f"{log_prefix} Failed to download transcript: {e}")
return None
# Try to load metadata (best-effort — old transcripts won't have it)
@@ -556,14 +545,10 @@ async def download_transcript(
meta = json.loads(meta_data.decode("utf-8"), fallback={})
message_count = meta.get("message_count", 0)
uploaded_at = meta.get("uploaded_at", 0.0)
except FileNotFoundError:
except (FileNotFoundError, Exception):
pass # No metadata — treat as unknown (msg_count=0 → always fill gap)
except Exception as e:
logger.debug("%s Failed to load transcript metadata: %s", log_prefix, e)
logger.info(
"%s Downloaded %dB (msg_count=%d)", log_prefix, len(content), message_count
)
logger.info(f"{log_prefix} Downloaded {len(content)}B (msg_count={message_count})")
return TranscriptDownload(
content=content,
message_count=message_count,
@@ -577,6 +562,8 @@ async def delete_transcript(user_id: str, session_id: str) -> None:
Removes both the ``.jsonl`` transcript and the companion ``.meta.json``
so stale ``message_count`` watermarks cannot corrupt gap-fill logic.
"""
from backend.util.workspace_storage import get_workspace_storage
storage = await get_workspace_storage()
path = _build_storage_path(user_id, session_id, storage)
@@ -593,280 +580,3 @@ async def delete_transcript(user_id: str, session_id: str) -> None:
logger.info("[Transcript] Deleted metadata for session %s", session_id)
except Exception as e:
logger.warning("[Transcript] Failed to delete metadata: %s", e)
# ---------------------------------------------------------------------------
# Transcript compaction — LLM summarization for prompt-too-long recovery
# ---------------------------------------------------------------------------
# JSONL protocol values used in transcript serialization.
STOP_REASON_END_TURN = "end_turn"
COMPACT_MSG_ID_PREFIX = "msg_compact_"
ENTRY_TYPE_MESSAGE = "message"
def _flatten_assistant_content(blocks: list) -> str:
"""Flatten assistant content blocks into a single plain-text string.
Structured ``tool_use`` blocks are converted to ``[tool_use: name]``
placeholders. This is intentional: ``compress_context`` requires plain
text for token counting and LLM summarization. The structural loss is
acceptable because compaction only runs when the original transcript was
already too large for the model — a summarized plain-text version is
better than no context at all.
"""
parts: list[str] = []
for block in blocks:
if isinstance(block, dict):
btype = block.get("type", "")
if btype == "text":
parts.append(block.get("text", ""))
elif btype == "tool_use":
parts.append(f"[tool_use: {block.get('name', '?')}]")
else:
# Preserve non-text blocks (e.g. image) as placeholders.
# Use __prefix__ to distinguish from literal user text.
parts.append(f"[__{btype}__]")
elif isinstance(block, str):
parts.append(block)
return "\n".join(parts) if parts else ""
def _flatten_tool_result_content(blocks: list) -> str:
"""Flatten tool_result and other content blocks into plain text.
Handles nested tool_result structures, text blocks, and raw strings.
Uses ``json.dumps`` as fallback for dict blocks without a ``text`` key
or where ``text`` is ``None``.
Like ``_flatten_assistant_content``, structured blocks (images, nested
tool results) are reduced to text representations for compression.
"""
str_parts: list[str] = []
for block in blocks:
if isinstance(block, dict) and block.get("type") == "tool_result":
inner = block.get("content") or ""
if isinstance(inner, list):
for sub in inner:
if isinstance(sub, dict):
sub_type = sub.get("type")
if sub_type in ("image", "document"):
# Avoid serializing base64 binary data into
# the compaction input — use a placeholder.
str_parts.append(f"[__{sub_type}__]")
elif sub_type == "text" or sub.get("text") is not None:
str_parts.append(str(sub.get("text", "")))
else:
str_parts.append(json.dumps(sub))
else:
str_parts.append(str(sub))
else:
str_parts.append(str(inner))
elif isinstance(block, dict) and block.get("type") == "text":
str_parts.append(str(block.get("text", "")))
elif isinstance(block, dict):
# Preserve non-text/non-tool_result blocks (e.g. image) as placeholders.
# Use __prefix__ to distinguish from literal user text.
btype = block.get("type", "unknown")
str_parts.append(f"[__{btype}__]")
elif isinstance(block, str):
str_parts.append(block)
return "\n".join(str_parts) if str_parts else ""
def _transcript_to_messages(content: str) -> list[dict]:
"""Convert JSONL transcript entries to plain message dicts for compression.
Parses each line of the JSONL *content*, skips strippable metadata entries
(progress, file-history-snapshot, etc.), and extracts the ``role`` and
flattened ``content`` from the ``message`` field of each remaining entry.
Structured content blocks (``tool_use``, ``tool_result``, images) are
flattened to plain text via ``_flatten_assistant_content`` and
``_flatten_tool_result_content`` so that ``compress_context`` can
perform token counting and LLM summarization on uniform strings.
Returns:
A list of ``{"role": str, "content": str}`` dicts suitable for
``compress_context``.
"""
messages: list[dict] = []
for line in content.strip().split("\n"):
if not line.strip():
continue
entry = json.loads(line, fallback=None)
if not isinstance(entry, dict):
continue
if entry.get("type", "") in STRIPPABLE_TYPES and not entry.get(
"isCompactSummary"
):
continue
msg = entry.get("message", {})
role = msg.get("role", "")
if not role:
continue
msg_dict: dict = {"role": role}
raw_content = msg.get("content")
if role == "assistant" and isinstance(raw_content, list):
msg_dict["content"] = _flatten_assistant_content(raw_content)
elif isinstance(raw_content, list):
msg_dict["content"] = _flatten_tool_result_content(raw_content)
else:
msg_dict["content"] = raw_content or ""
messages.append(msg_dict)
return messages
def _messages_to_transcript(messages: list[dict]) -> str:
"""Convert compressed message dicts back to JSONL transcript format.
Rebuilds a minimal JSONL transcript from the ``{"role", "content"}``
dicts returned by ``compress_context``. Each message becomes one JSONL
line with a fresh ``uuid`` / ``parentUuid`` chain so the CLI's
``--resume`` flag can reconstruct a valid conversation tree.
Assistant messages are wrapped in the full ``message`` envelope
(``id``, ``model``, ``stop_reason``, structured ``content`` blocks)
that the CLI expects. User messages use the simpler ``{role, content}``
form.
Returns:
A newline-terminated JSONL string, or an empty string if *messages*
is empty.
"""
lines: list[str] = []
last_uuid: str = "" # root entry uses empty string, not null
for msg in messages:
role = msg.get("role", "user")
entry_type = "assistant" if role == "assistant" else "user"
uid = str(uuid4())
content = msg.get("content", "")
if role == "assistant":
message: dict = {
"role": "assistant",
"model": "",
"id": f"{COMPACT_MSG_ID_PREFIX}{uuid4().hex[:24]}",
"type": ENTRY_TYPE_MESSAGE,
"content": [{"type": "text", "text": content}] if content else [],
"stop_reason": STOP_REASON_END_TURN,
"stop_sequence": None,
}
else:
message = {"role": role, "content": content}
entry = {
"type": entry_type,
"uuid": uid,
"parentUuid": last_uuid,
"message": message,
}
lines.append(json.dumps(entry, separators=(",", ":")))
last_uuid = uid
return "\n".join(lines) + "\n" if lines else ""
_COMPACTION_TIMEOUT_SECONDS = 60
_TRUNCATION_TIMEOUT_SECONDS = 30
async def _run_compression(
messages: list[dict],
model: str,
log_prefix: str,
) -> CompressResult:
"""Run LLM-based compression with truncation fallback.
Uses the shared OpenAI client from ``get_openai_client()``.
If no client is configured or the LLM call fails, falls back to
truncation-based compression which drops older messages without
summarization.
A 60-second timeout prevents a hung LLM call from blocking the
retry path indefinitely. The truncation fallback also has a
30-second timeout to guard against slow tokenization on very large
transcripts.
"""
client = get_openai_client()
if client is None:
logger.warning("%s No OpenAI client configured, using truncation", log_prefix)
return await asyncio.wait_for(
compress_context(messages=messages, model=model, client=None),
timeout=_TRUNCATION_TIMEOUT_SECONDS,
)
try:
return await asyncio.wait_for(
compress_context(messages=messages, model=model, client=client),
timeout=_COMPACTION_TIMEOUT_SECONDS,
)
except Exception as e:
logger.warning("%s LLM compaction failed, using truncation: %s", log_prefix, e)
return await asyncio.wait_for(
compress_context(messages=messages, model=model, client=None),
timeout=_TRUNCATION_TIMEOUT_SECONDS,
)
async def compact_transcript(
content: str,
*,
model: str,
log_prefix: str = "[Transcript]",
) -> str | None:
"""Compact an oversized JSONL transcript using LLM summarization.
Converts transcript entries to plain messages, runs ``compress_context``
(the same compressor used for pre-query history), and rebuilds JSONL.
Structured content (``tool_use`` blocks, ``tool_result`` nesting, images)
is flattened to plain text for compression. This matches the fidelity of
the Plan C (DB compression) fallback path, where
``_format_conversation_context`` similarly renders tool calls as
``You called tool: name(args)`` and results as ``Tool result: ...``.
Neither path preserves structured API content blocks — the compacted
context serves as text history for the LLM, which creates proper
structured tool calls going forward.
Images are per-turn attachments loaded from workspace storage by file ID
(via ``_prepare_file_attachments``), not part of the conversation history.
They are re-attached each turn and are unaffected by compaction.
Returns the compacted JSONL string, or ``None`` on failure.
See also:
``_compress_messages`` in ``service.py`` — compresses ``ChatMessage``
lists for pre-query DB history. Both share ``compress_context()``
but operate on different input formats (JSONL transcript entries
here vs. ChatMessage dicts there).
"""
messages = _transcript_to_messages(content)
if len(messages) < 2:
logger.warning("%s Too few messages to compact (%d)", log_prefix, len(messages))
return None
try:
result = await _run_compression(messages, model, log_prefix)
if not result.was_compacted:
# Compressor says it's within budget, but the SDK rejected it.
# Return None so the caller falls through to DB fallback.
logger.warning(
"%s Compressor reports within budget but SDK rejected — "
"signalling failure",
log_prefix,
)
return None
logger.info(
"%s Compacted transcript: %d->%d tokens (%d summarized, %d dropped)",
log_prefix,
result.original_token_count,
result.token_count,
result.messages_summarized,
result.messages_dropped,
)
compacted = _messages_to_transcript(result.messages)
if not validate_transcript(compacted):
logger.warning("%s Compacted transcript failed validation", log_prefix)
return None
return compacted
except Exception as e:
logger.error(
"%s Transcript compaction failed: %s", log_prefix, e, exc_info=True
)
return None

View File

@@ -68,7 +68,7 @@ class TranscriptBuilder:
type=entry_type,
uuid=data.get("uuid") or str(uuid4()),
parentUuid=data.get("parentUuid"),
isCompactSummary=data.get("isCompactSummary"),
isCompactSummary=data.get("isCompactSummary") or None,
message=data.get("message", {}),
)

View File

@@ -1,8 +1,7 @@
"""Unit tests for JSONL transcript management utilities."""
import asyncio
import os
from unittest.mock import AsyncMock, MagicMock, patch
from unittest.mock import AsyncMock, patch
import pytest
@@ -302,7 +301,7 @@ class TestDeleteTranscript:
mock_storage.delete = AsyncMock()
with patch(
"backend.copilot.sdk.transcript.get_workspace_storage",
"backend.util.workspace_storage.get_workspace_storage",
new_callable=AsyncMock,
return_value=mock_storage,
):
@@ -322,7 +321,7 @@ class TestDeleteTranscript:
)
with patch(
"backend.copilot.sdk.transcript.get_workspace_storage",
"backend.util.workspace_storage.get_workspace_storage",
new_callable=AsyncMock,
return_value=mock_storage,
):
@@ -340,7 +339,7 @@ class TestDeleteTranscript:
)
with patch(
"backend.copilot.sdk.transcript.get_workspace_storage",
"backend.util.workspace_storage.get_workspace_storage",
new_callable=AsyncMock,
return_value=mock_storage,
):
@@ -819,183 +818,6 @@ class TestCompactionFlowIntegration:
assert lines2[-1]["parentUuid"] == "a2"
# ---------------------------------------------------------------------------
# _run_compression (direct tests for the 3 code paths)
# ---------------------------------------------------------------------------
class TestRunCompression:
"""Direct tests for ``_run_compression`` covering all 3 code paths.
Paths:
(a) No OpenAI client configured → truncation fallback immediately.
(b) LLM success → returns LLM-compressed result.
(c) LLM call raises → truncation fallback.
"""
def _make_compress_result(self, was_compacted: bool, msgs=None):
"""Build a minimal CompressResult-like object."""
from types import SimpleNamespace
return SimpleNamespace(
was_compacted=was_compacted,
messages=msgs or [{"role": "user", "content": "summary"}],
original_token_count=500,
token_count=100 if was_compacted else 500,
messages_summarized=2 if was_compacted else 0,
messages_dropped=0,
)
@pytest.mark.asyncio
async def test_no_client_uses_truncation(self):
"""Path (a): ``get_openai_client()`` returns None → truncation only."""
from .transcript import _run_compression
truncation_result = self._make_compress_result(
True, [{"role": "user", "content": "truncated"}]
)
with (
patch(
"backend.copilot.sdk.transcript.get_openai_client",
return_value=None,
),
patch(
"backend.copilot.sdk.transcript.compress_context",
new_callable=AsyncMock,
return_value=truncation_result,
) as mock_compress,
):
result = await _run_compression(
[{"role": "user", "content": "hello"}],
model="test-model",
log_prefix="[test]",
)
# compress_context called with client=None (truncation mode)
call_kwargs = mock_compress.call_args
assert (
call_kwargs.kwargs.get("client") is None
or (call_kwargs.args and call_kwargs.args[2] is None)
or mock_compress.call_args[1].get("client") is None
)
assert result is truncation_result
@pytest.mark.asyncio
async def test_llm_success_returns_llm_result(self):
"""Path (b): ``get_openai_client()`` returns a client → LLM compresses."""
from .transcript import _run_compression
llm_result = self._make_compress_result(
True, [{"role": "user", "content": "LLM summary"}]
)
mock_client = MagicMock()
with (
patch(
"backend.copilot.sdk.transcript.get_openai_client",
return_value=mock_client,
),
patch(
"backend.copilot.sdk.transcript.compress_context",
new_callable=AsyncMock,
return_value=llm_result,
) as mock_compress,
):
result = await _run_compression(
[{"role": "user", "content": "long conversation"}],
model="test-model",
log_prefix="[test]",
)
# compress_context called with the real client
assert mock_compress.called
assert result is llm_result
@pytest.mark.asyncio
async def test_llm_failure_falls_back_to_truncation(self):
"""Path (c): LLM call raises → truncation fallback used instead."""
from .transcript import _run_compression
truncation_result = self._make_compress_result(
True, [{"role": "user", "content": "truncated fallback"}]
)
mock_client = MagicMock()
call_count = [0]
async def _compress_side_effect(**kwargs):
call_count[0] += 1
if kwargs.get("client") is not None:
raise RuntimeError("LLM timeout")
return truncation_result
with (
patch(
"backend.copilot.sdk.transcript.get_openai_client",
return_value=mock_client,
),
patch(
"backend.copilot.sdk.transcript.compress_context",
side_effect=_compress_side_effect,
),
):
result = await _run_compression(
[{"role": "user", "content": "long conversation"}],
model="test-model",
log_prefix="[test]",
)
# compress_context called twice: once for LLM (raises), once for truncation
assert call_count[0] == 2
assert result is truncation_result
@pytest.mark.asyncio
async def test_llm_timeout_falls_back_to_truncation(self):
"""Path (d): LLM call exceeds timeout → truncation fallback used."""
from .transcript import _run_compression
truncation_result = self._make_compress_result(
True, [{"role": "user", "content": "truncated after timeout"}]
)
call_count = [0]
async def _compress_side_effect(*, messages, model, client):
call_count[0] += 1
if client is not None:
# Simulate a hang that exceeds the timeout
await asyncio.sleep(9999)
return truncation_result
fake_client = MagicMock()
with (
patch(
"backend.copilot.sdk.transcript.get_openai_client",
return_value=fake_client,
),
patch(
"backend.copilot.sdk.transcript.compress_context",
side_effect=_compress_side_effect,
),
patch(
"backend.copilot.sdk.transcript._COMPACTION_TIMEOUT_SECONDS",
0.05,
),
patch(
"backend.copilot.sdk.transcript._TRUNCATION_TIMEOUT_SECONDS",
5,
),
):
result = await _run_compression(
[{"role": "user", "content": "long conversation"}],
model="test-model",
log_prefix="[test]",
)
# compress_context called twice: once for LLM (times out), once truncation
assert call_count[0] == 2
assert result is truncation_result
# ---------------------------------------------------------------------------
# cleanup_stale_project_dirs
# ---------------------------------------------------------------------------

View File

@@ -12,7 +12,6 @@ from .agent_browser import BrowserActTool, BrowserNavigateTool, BrowserScreensho
from .agent_output import AgentOutputTool
from .base import BaseTool
from .bash_exec import BashExecTool
from .connect_integration import ConnectIntegrationTool
from .continue_run_block import ContinueRunBlockTool
from .create_agent import CreateAgentTool
from .customize_agent import CustomizeAgentTool
@@ -85,7 +84,6 @@ TOOL_REGISTRY: dict[str, BaseTool] = {
"browser_screenshot": BrowserScreenshotTool(),
# Sandboxed code execution (bubblewrap)
"bash_exec": BashExecTool(),
"connect_integration": ConnectIntegrationTool(),
# Persistent workspace tools (cloud storage, survives across sessions)
# Feature request tools
"search_feature_requests": SearchFeatureRequestsTool(),

View File

@@ -22,12 +22,13 @@ class AddUnderstandingTool(BaseTool):
@property
def description(self) -> str:
return (
"Store user's business context, workflows, pain points, and automation goals. "
"Call whenever the user shares business info. Each call incrementally merges "
"with existing data — provide only the fields you have. "
"Builds a profile that helps recommend better agents for the user's needs."
)
return """Capture and store information about the user's business context,
workflows, pain points, and automation goals. Call this tool whenever the user
shares information about their business. Each call incrementally adds to the
existing understanding - you don't need to provide all fields at once.
Use this to build a comprehensive profile that helps recommend better agents
and automations for the user's specific needs."""
@property
def parameters(self) -> dict[str, Any]:

View File

@@ -20,9 +20,7 @@ SSRF protection:
Requires:
npm install -g agent-browser
In Docker: system chromium package with AGENT_BROWSER_EXECUTABLE_PATH=/usr/bin/chromium
(set automatically — no `agent-browser install` needed).
Locally: run `agent-browser install` to download Chromium.
agent-browser install (downloads Chromium, one-time per machine)
"""
import asyncio
@@ -410,11 +408,18 @@ class BrowserNavigateTool(BaseTool):
@property
def description(self) -> str:
return (
"Navigate to a URL in a real browser. Returns accessibility tree with @ref IDs "
"for browser_act. Session persists (cookies/auth carry over). "
"For static pages, prefer web_fetch. "
"For SPAs, elements may load late — use browser_act with wait + browser_screenshot to verify. "
"For auth: navigate to login, fill creds and submit with browser_act, then navigate to target."
"Navigate to a URL using a real browser. Returns an accessibility "
"tree snapshot listing the page's interactive elements with @ref IDs "
"(e.g. @e3) that can be used with browser_act. "
"Session persists — cookies and login state carry over between calls. "
"Use this (with browser_act) for multi-step interaction: login flows, "
"form filling, button clicks, or anything requiring page interaction. "
"For plain static pages, prefer web_fetch — no browser overhead. "
"For authenticated pages: navigate to the login page first, use browser_act "
"to fill credentials and submit, then navigate to the target page. "
"Note: for slow SPAs, the returned snapshot may reflect a partially-loaded "
"state. If elements seem missing, use browser_act with action='wait' and a "
"CSS selector or millisecond delay, then take a browser_screenshot to verify."
)
@property
@@ -424,13 +429,13 @@ class BrowserNavigateTool(BaseTool):
"properties": {
"url": {
"type": "string",
"description": "HTTP/HTTPS URL to navigate to.",
"description": "The HTTP/HTTPS URL to navigate to.",
},
"wait_for": {
"type": "string",
"enum": ["networkidle", "load", "domcontentloaded"],
"default": "networkidle",
"description": "Navigation completion strategy (default: networkidle).",
"description": "When to consider navigation complete. Use 'networkidle' for SPAs (default).",
},
},
"required": ["url"],
@@ -549,12 +554,14 @@ class BrowserActTool(BaseTool):
@property
def description(self) -> str:
return (
"Interact with the current browser page using @ref IDs from the snapshot. "
"Actions: click, dblclick, fill, type, scroll, hover, press, "
"Interact with the current browser page. Use @ref IDs from the "
"snapshot (e.g. '@e3') to target elements. Returns an updated snapshot. "
"Supported actions: click, dblclick, fill, type, scroll, hover, press, "
"check, uncheck, select, wait, back, forward, reload. "
"fill clears field first; type appends. "
"wait accepts CSS selector or milliseconds (e.g. '1000'). "
"Returns updated snapshot."
"fill clears the field before typing; type appends without clearing. "
"wait accepts a CSS selector (waits for element) or milliseconds string (e.g. '1000'). "
"Example login flow: fill @e1 with email → fill @e2 with password → "
"click @e3 (submit) → browser_navigate to the target page."
)
@property
@@ -580,21 +587,30 @@ class BrowserActTool(BaseTool):
"forward",
"reload",
],
"description": "Action to perform.",
"description": "The action to perform.",
},
"target": {
"type": "string",
"description": "@ref ID (e.g. '@e3'), CSS selector, or text. Required for: click, dblclick, fill, type, hover, check, uncheck, select. For wait: CSS selector or milliseconds string (e.g. '1000').",
"description": (
"Element to target. Use @ref from snapshot (e.g. '@e3'), "
"a CSS selector, or a text description. "
"Required for: click, dblclick, fill, type, hover, check, uncheck, select. "
"For wait: a CSS selector to wait for, or milliseconds as a string (e.g. '1000')."
),
},
"value": {
"type": "string",
"description": "Text for fill/type, key for press (e.g. 'Enter'), option for select.",
"description": (
"For fill/type: the text to enter. "
"For press: key name (e.g. 'Enter', 'Tab', 'Control+a'). "
"For select: the option value to select."
),
},
"direction": {
"type": "string",
"enum": ["up", "down", "left", "right"],
"default": "down",
"description": "Scroll direction (default: down).",
"description": "For scroll: direction to scroll.",
},
},
"required": ["action"],
@@ -741,10 +757,12 @@ class BrowserScreenshotTool(BaseTool):
@property
def description(self) -> str:
return (
"Screenshot the current browser page and save to workspace. "
"annotate=true overlays @ref labels on elements. "
"IMPORTANT: After calling, you MUST immediately call read_workspace_file with the "
"returned file_id to display the image inline."
"Take a screenshot of the current browser page and save it to the workspace. "
"IMPORTANT: After calling this tool, immediately call read_workspace_file "
"with the returned file_id to display the image inline to the user — "
"the screenshot is not visible until you do this. "
"With annotate=true (default), @ref labels are overlaid on interactive "
"elements, making it easy to see which @ref ID maps to which element on screen."
)
@property
@@ -755,12 +773,12 @@ class BrowserScreenshotTool(BaseTool):
"annotate": {
"type": "boolean",
"default": True,
"description": "Overlay @ref labels (default: true).",
"description": "Overlay @ref labels on interactive elements (default: true).",
},
"filename": {
"type": "string",
"default": "screenshot.png",
"description": "Workspace filename (default: screenshot.png).",
"description": "Filename to save in the workspace.",
},
},
}

View File

@@ -1,351 +0,0 @@
"""Integration tests for agent-browser + system chromium.
These tests actually invoke the agent-browser binary via subprocess and require:
- agent-browser installed (npm install -g agent-browser)
- AGENT_BROWSER_EXECUTABLE_PATH=/usr/bin/chromium (set in Docker)
Run with:
poetry run test
Or to run only this file:
poetry run pytest backend/copilot/tools/agent_browser_integration_test.py -v -p no:autogpt_platform
Skipped automatically when agent-browser binary is not found.
Tests that hit external sites are marked ``integration`` and skipped by default
in CI (use ``-m integration`` to include them).
Two test tiers:
- CLI tests: call agent-browser subprocess directly (no backend imports needed)
- Tool class tests: call BrowserNavigateTool/BrowserActTool._execute() directly
with user_id=None (skips workspace/DB interactions — no Postgres/RabbitMQ needed)
"""
import concurrent.futures
import os
import shutil
import subprocess
import tempfile
from datetime import datetime, timezone
from urllib.parse import urlparse
import pytest
from backend.copilot.model import ChatSession
from backend.copilot.tools.agent_browser import BrowserActTool, BrowserNavigateTool
from backend.copilot.tools.models import (
BrowserActResponse,
BrowserNavigateResponse,
ErrorResponse,
)
pytestmark = pytest.mark.skipif(
shutil.which("agent-browser") is None,
reason="agent-browser binary not found",
)
_SESSION = "integration-test-session"
def _agent_browser(
*args: str, session: str = _SESSION, timeout: int = 30
) -> tuple[int, str, str]:
"""Run agent-browser for the given session, return (rc, stdout, stderr)."""
result = subprocess.run(
["agent-browser", "--session", session, "--session-name", session, *args],
capture_output=True,
text=True,
timeout=timeout,
)
return result.returncode, result.stdout, result.stderr
def _close_session(session: str, timeout: int = 5) -> None:
"""Best-effort close for a browser session; never raises on failure."""
try:
subprocess.run(
["agent-browser", "--session", session, "--session-name", session, "close"],
capture_output=True,
timeout=timeout,
)
except (subprocess.TimeoutExpired, OSError):
pass
@pytest.fixture(autouse=True)
def _teardown():
"""Close the shared test session after each test (best-effort)."""
yield
_close_session(_SESSION)
# ---------------------------------------------------------------------------
# Tests
# ---------------------------------------------------------------------------
def test_chromium_executable_env_is_set():
"""AGENT_BROWSER_EXECUTABLE_PATH must be set and point to an executable binary."""
exe = os.environ.get("AGENT_BROWSER_EXECUTABLE_PATH", "")
assert exe, "AGENT_BROWSER_EXECUTABLE_PATH is not set"
assert os.path.isfile(exe), f"Chromium binary not found at {exe}"
assert os.access(exe, os.X_OK), f"Chromium binary at {exe} is not executable"
@pytest.mark.integration
def test_navigate_returns_success():
"""agent-browser can open a public URL using system chromium."""
rc, _, stderr = _agent_browser("open", "https://example.com")
assert rc == 0, f"open failed (rc={rc}): {stderr}"
@pytest.mark.integration
def test_get_title_after_navigate():
"""get title returns the page title after navigation."""
rc, _, _ = _agent_browser("open", "https://example.com")
assert rc == 0
rc, stdout, stderr = _agent_browser("get", "title", timeout=10)
assert rc == 0, f"get title failed: {stderr}"
assert "example" in stdout.lower()
@pytest.mark.integration
def test_get_url_after_navigate():
"""get url returns the navigated URL."""
rc, _, _ = _agent_browser("open", "https://example.com")
assert rc == 0
rc, stdout, stderr = _agent_browser("get", "url", timeout=10)
assert rc == 0, f"get url failed: {stderr}"
assert urlparse(stdout.strip()).netloc == "example.com"
@pytest.mark.integration
def test_snapshot_returns_interactive_elements():
"""snapshot -i -c lists interactive elements on the page."""
rc, _, _ = _agent_browser("open", "https://example.com")
assert rc == 0
rc, stdout, stderr = _agent_browser("snapshot", "-i", "-c", timeout=15)
assert rc == 0, f"snapshot failed: {stderr}"
assert len(stdout.strip()) > 0, "snapshot returned empty output"
@pytest.mark.integration
def test_screenshot_produces_valid_png():
"""screenshot saves a non-empty, valid PNG file."""
rc, _, _ = _agent_browser("open", "https://example.com")
assert rc == 0
with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as f:
tmp = f.name
try:
rc, _, stderr = _agent_browser("screenshot", tmp, timeout=15)
assert rc == 0, f"screenshot failed: {stderr}"
size = os.path.getsize(tmp)
assert size > 1000, f"PNG too small ({size} bytes) — likely blank or corrupt"
with open(tmp, "rb") as f:
assert f.read(4) == b"\x89PNG", "Output is not a valid PNG"
finally:
os.unlink(tmp)
@pytest.mark.integration
def test_scroll_down():
"""scroll down succeeds without error."""
rc, _, _ = _agent_browser("open", "https://example.com")
assert rc == 0
rc, _, stderr = _agent_browser("scroll", "down", timeout=10)
assert rc == 0, f"scroll failed: {stderr}"
@pytest.mark.integration
def test_fill_form_field():
"""fill writes text into an input field."""
rc, _, _ = _agent_browser("open", "https://httpbin.org/forms/post")
assert rc == 0
rc, _, stderr = _agent_browser(
"fill", "input[name=custname]", "IntegrationTestUser", timeout=10
)
assert rc == 0, f"fill failed: {stderr}"
@pytest.mark.integration
def test_concurrent_independent_sessions():
"""Two independent sessions can navigate in parallel without interference."""
session_a = "integration-concurrent-a"
session_b = "integration-concurrent-b"
try:
with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
fut_a = pool.submit(
_agent_browser, "open", "https://example.com", session=session_a
)
fut_b = pool.submit(
_agent_browser, "open", "https://httpbin.org/html", session=session_b
)
rc_a, _, err_a = fut_a.result(timeout=40)
rc_b, _, err_b = fut_b.result(timeout=40)
assert rc_a == 0, f"session_a open failed: {err_a}"
assert rc_b == 0, f"session_b open failed: {err_b}"
rc_ua, url_a, err_ua = _agent_browser(
"get", "url", session=session_a, timeout=10
)
rc_ub, url_b, err_ub = _agent_browser(
"get", "url", session=session_b, timeout=10
)
assert rc_ua == 0, f"session_a get url failed: {err_ua}"
assert rc_ub == 0, f"session_b get url failed: {err_ub}"
assert urlparse(url_a.strip()).netloc == "example.com"
assert urlparse(url_b.strip()).netloc == "httpbin.org"
finally:
_close_session(session_a)
_close_session(session_b)
@pytest.mark.integration
def test_close_session():
"""close shuts down the browser daemon cleanly."""
rc, _, _ = _agent_browser("open", "https://example.com")
assert rc == 0
rc, _, stderr = _agent_browser("close", timeout=10)
assert rc == 0, f"close failed: {stderr}"
# ---------------------------------------------------------------------------
# Python tool class integration tests
#
# These tests exercise the actual BrowserNavigateTool / BrowserActTool Python
# classes (not just the CLI binary) to verify the full call path — URL
# validation, subprocess dispatch, response parsing — works with system
# chromium. user_id=None skips workspace/DB interactions so no Postgres or
# RabbitMQ is needed.
# ---------------------------------------------------------------------------
_TOOL_SESSION_ID = "integration-tool-test-session"
_TEST_SESSION = ChatSession(
session_id=_TOOL_SESSION_ID,
user_id="test-user",
messages=[],
usage=[],
started_at=datetime.now(timezone.utc),
updated_at=datetime.now(timezone.utc),
)
@pytest.fixture(autouse=False)
def _close_tool_session():
"""Tear down the tool-test browser session after each tool test."""
yield
_close_session(_TOOL_SESSION_ID)
@pytest.mark.integration
@pytest.mark.asyncio
async def test_tool_navigate_returns_response(_close_tool_session):
"""BrowserNavigateTool._execute returns a BrowserNavigateResponse with real content."""
tool = BrowserNavigateTool()
resp = await tool._execute(
user_id=None, session=_TEST_SESSION, url="https://example.com"
)
assert isinstance(
resp, BrowserNavigateResponse
), f"Expected BrowserNavigateResponse, got: {resp}"
assert urlparse(resp.url).netloc == "example.com"
assert resp.title, "Expected non-empty page title"
assert resp.snapshot, "Expected non-empty accessibility snapshot"
@pytest.mark.asyncio
@pytest.mark.parametrize(
"ssrf_url",
[
"http://169.254.169.254/", # AWS/GCP/Azure metadata endpoint
"http://127.0.0.1/", # IPv4 loopback
"http://10.0.0.1/", # RFC-1918 private range
"http://[::1]/", # IPv6 loopback
"http://0.0.0.0/", # Wildcard / INADDR_ANY
],
)
async def test_tool_navigate_blocked_url(ssrf_url: str, _close_tool_session):
"""BrowserNavigateTool._execute rejects internal/private URLs (SSRF guard)."""
tool = BrowserNavigateTool()
resp = await tool._execute(user_id=None, session=_TEST_SESSION, url=ssrf_url)
assert isinstance(
resp, ErrorResponse
), f"Expected ErrorResponse for SSRF URL {ssrf_url!r}, got: {resp}"
assert resp.error == "blocked_url"
@pytest.mark.asyncio
async def test_tool_navigate_missing_url(_close_tool_session):
"""BrowserNavigateTool._execute returns an error when url is empty."""
tool = BrowserNavigateTool()
resp = await tool._execute(user_id=None, session=_TEST_SESSION, url="")
assert isinstance(resp, ErrorResponse)
assert resp.error == "missing_url"
@pytest.mark.integration
@pytest.mark.asyncio
async def test_tool_act_scroll(_close_tool_session):
"""BrowserActTool._execute can scroll after a navigate."""
nav = BrowserNavigateTool()
nav_resp = await nav._execute(
user_id=None, session=_TEST_SESSION, url="https://example.com"
)
assert isinstance(nav_resp, BrowserNavigateResponse)
act = BrowserActTool()
resp = await act._execute(
user_id=None, session=_TEST_SESSION, action="scroll", direction="down"
)
assert isinstance(
resp, BrowserActResponse
), f"Expected BrowserActResponse, got: {resp}"
assert resp.action == "scroll"
@pytest.mark.integration
@pytest.mark.asyncio
async def test_tool_act_fill_and_click(_close_tool_session):
"""BrowserActTool._execute can fill a form field."""
nav = BrowserNavigateTool()
nav_resp = await nav._execute(
user_id=None, session=_TEST_SESSION, url="https://httpbin.org/forms/post"
)
assert isinstance(nav_resp, BrowserNavigateResponse)
act = BrowserActTool()
resp = await act._execute(
user_id=None,
session=_TEST_SESSION,
action="fill",
target="input[name=custname]",
value="ToolIntegrationTest",
)
assert isinstance(resp, BrowserActResponse), f"fill failed: {resp}"
@pytest.mark.asyncio
async def test_tool_act_missing_action(_close_tool_session):
"""BrowserActTool._execute returns an error when action is missing."""
act = BrowserActTool()
resp = await act._execute(user_id=None, session=_TEST_SESSION, action="")
assert isinstance(resp, ErrorResponse)
assert resp.error == "missing_action"
@pytest.mark.asyncio
async def test_tool_act_missing_target(_close_tool_session):
"""BrowserActTool._execute returns an error when click target is missing."""
act = BrowserActTool()
resp = await act._execute(
user_id=None, session=_TEST_SESSION, action="click", target=""
)
assert isinstance(resp, ErrorResponse)
assert resp.error == "missing_target"

View File

@@ -108,12 +108,22 @@ class AgentOutputTool(BaseTool):
@property
def description(self) -> str:
return (
"Retrieve execution outputs from a library agent. "
"Identify by agent_name, library_agent_id, or store_slug. "
"Filter by execution_id or run_time. "
"Optionally wait for running executions."
)
return """Retrieve execution outputs from agents in the user's library.
Identify the agent using one of:
- agent_name: Fuzzy search in user's library
- library_agent_id: Exact library agent ID
- store_slug: Marketplace format 'username/agent-name'
Select which run to retrieve using:
- execution_id: Specific execution ID
- run_time: 'latest' (default), 'yesterday', 'last week', or ISO date 'YYYY-MM-DD'
Wait for completion (optional):
- wait_if_running: Max seconds to wait if execution is still running (0-300).
If the execution is running/queued, waits up to this many seconds for completion.
Returns current status on timeout. If already finished, returns immediately.
"""
@property
def parameters(self) -> dict[str, Any]:
@@ -122,29 +132,32 @@ class AgentOutputTool(BaseTool):
"properties": {
"agent_name": {
"type": "string",
"description": "Agent name (fuzzy match).",
"description": "Agent name to search for in user's library (fuzzy match)",
},
"library_agent_id": {
"type": "string",
"description": "Library agent ID.",
"description": "Exact library agent ID",
},
"store_slug": {
"type": "string",
"description": "Marketplace 'username/agent-name'.",
"description": "Marketplace identifier: 'username/agent-slug'",
},
"execution_id": {
"type": "string",
"description": "Specific execution ID.",
"description": "Specific execution ID to retrieve",
},
"run_time": {
"type": "string",
"description": "Time filter: 'latest', 'today', 'yesterday', 'last week', 'last 7 days', 'last month', 'last 30 days', 'YYYY-MM-DD', or ISO datetime.",
"description": (
"Time filter: 'latest', 'yesterday', 'last week', or 'YYYY-MM-DD'"
),
},
"wait_if_running": {
"type": "integer",
"description": "Max seconds to wait if still running (0-300). Returns current state on timeout.",
"minimum": 0,
"maximum": 300,
"description": (
"Max seconds to wait if execution is still running (0-300). "
"If running, waits for completion. Returns current state on timeout."
),
},
},
"required": [],

View File

@@ -164,9 +164,8 @@ class BaseTool:
"""
if self.requires_auth and not user_id:
logger.warning(
"Attempted tool call for %s but user not authenticated",
self.name,
logger.error(
f"Attempted tool call for {self.name} but user not authenticated"
)
return StreamToolOutputAvailable(
toolCallId=tool_call_id,
@@ -197,7 +196,7 @@ class BaseTool:
output=raw_output,
)
except Exception as e:
logger.warning("Error in %s", self.name, exc_info=True)
logger.error(f"Error in {self.name}: {e}", exc_info=True)
return StreamToolOutputAvailable(
toolCallId=tool_call_id,
toolName=self.name,

View File

@@ -22,7 +22,6 @@ from e2b import AsyncSandbox
from e2b.exceptions import TimeoutException
from backend.copilot.context import E2B_WORKDIR, get_current_sandbox
from backend.copilot.integration_creds import get_integration_env_vars
from backend.copilot.model import ChatSession
from .base import BaseTool
@@ -42,9 +41,15 @@ class BashExecTool(BaseTool):
@property
def description(self) -> str:
return (
"Execute a Bash command or script. Shares filesystem with SDK file tools. "
"Useful for scripts, data processing, and package installation. "
"Killed after timeout (default 30s, max 120s)."
"Execute a Bash command or script. "
"Full Bash scripting is supported (loops, conditionals, pipes, "
"functions, etc.). "
"The working directory is shared with the SDK Read/Write/Edit/Glob/Grep "
"tools — files created by either are immediately visible to both. "
"Execution is killed after the timeout (default 30s, max 120s). "
"Returns stdout and stderr. "
"Useful for file manipulation, data processing, running scripts, "
"and installing packages."
)
@property
@@ -54,11 +59,13 @@ class BashExecTool(BaseTool):
"properties": {
"command": {
"type": "string",
"description": "Bash command or script.",
"description": "Bash command or script to execute.",
},
"timeout": {
"type": "integer",
"description": "Max seconds (default 30, max 120).",
"description": (
"Max execution time in seconds (default 30, max 120)."
),
"default": 30,
},
},
@@ -67,10 +74,7 @@ class BashExecTool(BaseTool):
@property
def requires_auth(self) -> bool:
# True because _execute_on_e2b injects user tokens (GH_TOKEN etc.)
# when user_id is present. Defense-in-depth: ensures only authenticated
# users reach the token injection path.
return True
return False
async def _execute(
self,
@@ -78,14 +82,6 @@ class BashExecTool(BaseTool):
session: ChatSession,
**kwargs: Any,
) -> ToolResponseBase:
"""Run a bash command on E2B (if available) or in a bubblewrap sandbox.
Dispatches to :meth:`_execute_on_e2b` when a sandbox is present in the
current execution context, otherwise falls back to the local bubblewrap
sandbox. Returns a :class:`BashExecResponse` on success or an
:class:`ErrorResponse` when the sandbox is unavailable or the command
is empty.
"""
session_id = session.session_id if session else None
command: str = (kwargs.get("command") or "").strip()
@@ -100,9 +96,7 @@ class BashExecTool(BaseTool):
sandbox = get_current_sandbox()
if sandbox is not None:
return await self._execute_on_e2b(
sandbox, command, timeout, session_id, user_id
)
return await self._execute_on_e2b(sandbox, command, timeout, session_id)
# Bubblewrap fallback: local isolated execution.
if not has_full_sandbox():
@@ -139,42 +133,19 @@ class BashExecTool(BaseTool):
command: str,
timeout: int,
session_id: str | None,
user_id: str | None = None,
) -> ToolResponseBase:
"""Execute *command* on the E2B sandbox via commands.run().
Integration tokens (e.g. GH_TOKEN) are injected into the sandbox env
for any user with connected accounts. E2B has full internet access, so
CLI tools like ``gh`` work without manual authentication.
"""
envs: dict[str, str] = {
"PATH": "/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin",
}
# Collect injected secret values so we can scrub them from output.
secret_values: list[str] = []
if user_id is not None:
integration_env = await get_integration_env_vars(user_id)
secret_values = [v for v in integration_env.values() if v]
envs.update(integration_env)
"""Execute *command* on the E2B sandbox via commands.run()."""
try:
result = await sandbox.commands.run(
f"bash -c {shlex.quote(command)}",
cwd=E2B_WORKDIR,
timeout=timeout,
envs=envs,
envs={"PATH": "/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin"},
)
stdout = result.stdout or ""
stderr = result.stderr or ""
# Scrub injected tokens from command output to prevent exfiltration
# via `echo $GH_TOKEN`, `env`, `printenv`, etc.
for secret in secret_values:
stdout = stdout.replace(secret, "[REDACTED]")
stderr = stderr.replace(secret, "[REDACTED]")
return BashExecResponse(
message=f"Command executed on E2B (exit {result.exit_code})",
stdout=stdout,
stderr=stderr,
stdout=result.stdout or "",
stderr=result.stderr or "",
exit_code=result.exit_code,
timed_out=False,
session_id=session_id,

View File

@@ -1,78 +0,0 @@
"""Tests for BashExecTool — E2B path with token injection."""
from unittest.mock import AsyncMock, MagicMock, patch
import pytest
from ._test_data import make_session
from .bash_exec import BashExecTool
from .models import BashExecResponse
_USER = "user-bash-exec-test"
def _make_tool() -> BashExecTool:
return BashExecTool()
def _make_sandbox(exit_code: int = 0, stdout: str = "", stderr: str = "") -> MagicMock:
result = MagicMock()
result.exit_code = exit_code
result.stdout = stdout
result.stderr = stderr
sandbox = MagicMock()
sandbox.commands.run = AsyncMock(return_value=result)
return sandbox
class TestBashExecE2BTokenInjection:
@pytest.mark.asyncio(loop_scope="session")
async def test_token_injected_when_user_id_set(self):
"""When user_id is provided, integration env vars are merged into sandbox envs."""
tool = _make_tool()
session = make_session(user_id=_USER)
sandbox = _make_sandbox(stdout="ok")
env_vars = {"GH_TOKEN": "gh-secret", "GITHUB_TOKEN": "gh-secret"}
with patch(
"backend.copilot.tools.bash_exec.get_integration_env_vars",
new=AsyncMock(return_value=env_vars),
) as mock_get_env:
result = await tool._execute_on_e2b(
sandbox=sandbox,
command="echo hi",
timeout=10,
session_id=session.session_id,
user_id=_USER,
)
mock_get_env.assert_awaited_once_with(_USER)
call_kwargs = sandbox.commands.run.call_args[1]
assert call_kwargs["envs"]["GH_TOKEN"] == "gh-secret"
assert call_kwargs["envs"]["GITHUB_TOKEN"] == "gh-secret"
assert isinstance(result, BashExecResponse)
@pytest.mark.asyncio(loop_scope="session")
async def test_no_token_injection_when_user_id_is_none(self):
"""When user_id is None, get_integration_env_vars must NOT be called."""
tool = _make_tool()
session = make_session(user_id=_USER)
sandbox = _make_sandbox(stdout="ok")
with patch(
"backend.copilot.tools.bash_exec.get_integration_env_vars",
new=AsyncMock(return_value={"GH_TOKEN": "should-not-appear"}),
) as mock_get_env:
result = await tool._execute_on_e2b(
sandbox=sandbox,
command="echo hi",
timeout=10,
session_id=session.session_id,
user_id=None,
)
mock_get_env.assert_not_called()
call_kwargs = sandbox.commands.run.call_args[1]
assert "GH_TOKEN" not in call_kwargs["envs"]
assert isinstance(result, BashExecResponse)

View File

@@ -1,20 +0,0 @@
"""Local conftest for copilot/tools tests.
Overrides the session-scoped `server` and `graph_cleanup` autouse fixtures from
backend/conftest.py so that integration tests in this directory do not trigger
the full SpinTestServer startup (which requires Postgres + RabbitMQ).
"""
import pytest_asyncio
@pytest_asyncio.fixture(scope="session", loop_scope="session")
async def server(): # type: ignore[override]
"""No-op server stub — tools tests don't need the full backend."""
return None
@pytest_asyncio.fixture(scope="session", loop_scope="session", autouse=True)
async def graph_cleanup(): # type: ignore[override]
"""No-op graph cleanup stub."""
yield

View File

@@ -1,196 +0,0 @@
"""Tool for prompting the user to connect a required integration.
When the copilot encounters an authentication failure (e.g. `gh` CLI returns
"authentication required"), it calls this tool to surface the credentials
setup card in the chat — the same UI that appears when a GitHub block runs
without configured credentials.
"""
from typing import Any, TypedDict
from backend.copilot.model import ChatSession
from backend.copilot.providers import SUPPORTED_PROVIDERS, get_provider_auth_types
from backend.copilot.tools.models import (
ErrorResponse,
ResponseType,
SetupInfo,
SetupRequirementsResponse,
ToolResponseBase,
UserReadiness,
)
from .base import BaseTool
class _CredentialEntry(TypedDict):
"""Shape of each entry inside SetupRequirementsResponse.user_readiness.missing_credentials.
Partially overlaps with :class:`~backend.data.model.CredentialsMetaInput`
(``id``, ``title``, ``provider``) but carries extra UI-facing fields
(``types``, ``scopes``) that the frontend ``SetupRequirementsCard`` needs
to render the inline credential setup card.
Display name is derived from :data:`SUPPORTED_PROVIDERS` at build time
rather than stored here — eliminates the old ``provider_name`` field.
``types`` replaces the old singular ``type`` field; the frontend already
prefers ``types`` and only fell back to ``type`` for compatibility.
"""
id: str
title: str
# Slug used as the credential key (e.g. "github").
provider: str
# All supported credential types the user can choose from (e.g. ["api_key", "oauth2"]).
# The first element is the default/primary type.
types: list[str]
scopes: list[str]
class ConnectIntegrationTool(BaseTool):
"""Surface the credentials setup UI when an integration is not connected."""
@property
def name(self) -> str:
return "connect_integration"
@property
def description(self) -> str:
return (
"Prompt the user to connect a required integration (e.g. GitHub). "
"Call this when an external CLI or API call fails because the user "
"has not connected the relevant account. "
"The tool surfaces a credentials setup card in the chat so the user "
"can authenticate without leaving the page. "
"After the user connects the account, retry the operation. "
"In E2B/cloud sandbox mode the token (GH_TOKEN/GITHUB_TOKEN) is "
"automatically injected per-command in bash_exec — no manual export needed. "
"In local bubblewrap mode network is isolated so GitHub CLI commands "
"will still fail after connecting; inform the user of this limitation."
)
@property
def parameters(self) -> dict[str, Any]:
return {
"type": "object",
"properties": {
"provider": {
"type": "string",
"description": (
"Integration provider slug, e.g. 'github'. "
"Must be one of the supported providers."
),
"enum": list(SUPPORTED_PROVIDERS.keys()),
},
"reason": {
"type": "string",
"description": (
"Brief explanation of why the integration is needed, "
"shown to the user in the setup card."
),
"maxLength": 500,
},
"scopes": {
"type": "array",
"items": {"type": "string"},
"description": (
"OAuth scopes to request. Omit to use the provider default. "
"Add extra scopes when you need more access — e.g. for GitHub: "
"'repo' (clone/push/pull), 'read:org' (org membership), "
"'workflow' (GitHub Actions). "
"Requesting only the scopes you actually need is best practice."
),
},
},
"required": ["provider"],
}
@property
def requires_auth(self) -> bool:
# Require auth so only authenticated users can trigger the setup card.
# The card itself is user-agnostic (no per-user data needed), so
# user_id is intentionally unused in _execute.
return True
async def _execute(
self,
user_id: str | None,
session: ChatSession,
**kwargs: Any,
) -> ToolResponseBase:
"""Build and return a :class:`SetupRequirementsResponse` for the requested provider.
Validates the *provider* slug against the known registry, merges any
agent-requested OAuth *scopes* with the provider defaults, and constructs
the credential setup card payload that the frontend renders as an inline
authentication prompt.
Returns an :class:`ErrorResponse` if *provider* is unknown.
"""
_ = user_id # setup card is user-agnostic; auth is enforced via requires_auth
session_id = session.session_id if session else None
provider: str = (kwargs.get("provider") or "").strip().lower()
reason: str = (kwargs.get("reason") or "").strip()[
:500
] # cap LLM-controlled text
extra_scopes: list[str] = [
str(s).strip() for s in (kwargs.get("scopes") or []) if str(s).strip()
]
entry = SUPPORTED_PROVIDERS.get(provider)
if not entry:
supported = ", ".join(f"'{p}'" for p in SUPPORTED_PROVIDERS)
return ErrorResponse(
message=(
f"Unknown provider '{provider}'. "
f"Supported providers: {supported}."
),
error="unknown_provider",
session_id=session_id,
)
display_name: str = entry["name"]
supported_types: list[str] = get_provider_auth_types(provider)
# Merge agent-requested scopes with provider defaults (deduplicated, order preserved).
default_scopes: list[str] = entry["default_scopes"]
seen: set[str] = set()
scopes: list[str] = []
for s in default_scopes + extra_scopes:
if s not in seen:
seen.add(s)
scopes.append(s)
field_key = f"{provider}_credentials"
message_parts = [
f"To continue, please connect your {display_name} account.",
]
if reason:
message_parts.append(reason)
credential_entry: _CredentialEntry = {
"id": field_key,
"title": f"{display_name} Credentials",
"provider": provider,
"types": supported_types,
"scopes": scopes,
}
missing_credentials: dict[str, _CredentialEntry] = {field_key: credential_entry}
return SetupRequirementsResponse(
type=ResponseType.SETUP_REQUIREMENTS,
message=" ".join(message_parts),
session_id=session_id,
setup_info=SetupInfo(
agent_id=f"connect_{provider}",
agent_name=display_name,
user_readiness=UserReadiness(
has_all_credentials=False,
missing_credentials=missing_credentials,
ready_to_run=False,
),
requirements={
"credentials": [missing_credentials[field_key]],
"inputs": [],
"execution_modes": [],
},
),
)

View File

@@ -1,135 +0,0 @@
"""Tests for ConnectIntegrationTool."""
import pytest
from ._test_data import make_session
from .connect_integration import ConnectIntegrationTool
from .models import ErrorResponse, SetupRequirementsResponse
_TEST_USER_ID = "test-user-connect-integration"
class TestConnectIntegrationTool:
def _make_tool(self) -> ConnectIntegrationTool:
return ConnectIntegrationTool()
@pytest.mark.asyncio(loop_scope="session")
async def test_unknown_provider_returns_error(self):
tool = self._make_tool()
session = make_session(user_id=_TEST_USER_ID)
result = await tool._execute(
user_id=_TEST_USER_ID, session=session, provider="nonexistent"
)
assert isinstance(result, ErrorResponse)
assert result.error == "unknown_provider"
assert "nonexistent" in result.message
assert "github" in result.message
@pytest.mark.asyncio(loop_scope="session")
async def test_empty_provider_returns_error(self):
tool = self._make_tool()
session = make_session(user_id=_TEST_USER_ID)
result = await tool._execute(
user_id=_TEST_USER_ID, session=session, provider=""
)
assert isinstance(result, ErrorResponse)
assert result.error == "unknown_provider"
@pytest.mark.asyncio(loop_scope="session")
async def test_github_provider_returns_setup_response(self):
tool = self._make_tool()
session = make_session(user_id=_TEST_USER_ID)
result = await tool._execute(
user_id=_TEST_USER_ID, session=session, provider="github"
)
assert isinstance(result, SetupRequirementsResponse)
assert result.setup_info.agent_name == "GitHub"
assert result.setup_info.agent_id == "connect_github"
@pytest.mark.asyncio(loop_scope="session")
async def test_github_has_missing_credentials_in_readiness(self):
tool = self._make_tool()
session = make_session(user_id=_TEST_USER_ID)
result = await tool._execute(
user_id=_TEST_USER_ID, session=session, provider="github"
)
assert isinstance(result, SetupRequirementsResponse)
readiness = result.setup_info.user_readiness
assert readiness.has_all_credentials is False
assert readiness.ready_to_run is False
assert "github_credentials" in readiness.missing_credentials
@pytest.mark.asyncio(loop_scope="session")
async def test_github_requirements_include_credential_entry(self):
tool = self._make_tool()
session = make_session(user_id=_TEST_USER_ID)
result = await tool._execute(
user_id=_TEST_USER_ID, session=session, provider="github"
)
assert isinstance(result, SetupRequirementsResponse)
creds = result.setup_info.requirements["credentials"]
assert len(creds) == 1
assert creds[0]["provider"] == "github"
assert creds[0]["id"] == "github_credentials"
@pytest.mark.asyncio(loop_scope="session")
async def test_reason_appears_in_message(self):
tool = self._make_tool()
session = make_session(user_id=_TEST_USER_ID)
reason = "Needed to create a pull request."
result = await tool._execute(
user_id=_TEST_USER_ID, session=session, provider="github", reason=reason
)
assert isinstance(result, SetupRequirementsResponse)
assert reason in result.message
@pytest.mark.asyncio(loop_scope="session")
async def test_session_id_propagated(self):
tool = self._make_tool()
session = make_session(user_id=_TEST_USER_ID)
result = await tool._execute(
user_id=_TEST_USER_ID, session=session, provider="github"
)
assert isinstance(result, SetupRequirementsResponse)
assert result.session_id == session.session_id
@pytest.mark.asyncio(loop_scope="session")
async def test_provider_case_insensitive(self):
"""Provider slug is normalised to lowercase before lookup."""
tool = self._make_tool()
session = make_session(user_id=_TEST_USER_ID)
result = await tool._execute(
user_id=_TEST_USER_ID, session=session, provider="GitHub"
)
assert isinstance(result, SetupRequirementsResponse)
def test_tool_name(self):
assert ConnectIntegrationTool().name == "connect_integration"
def test_requires_auth(self):
assert ConnectIntegrationTool().requires_auth is True
@pytest.mark.asyncio(loop_scope="session")
async def test_unauthenticated_user_gets_need_login_response(self):
"""execute() with user_id=None must return NeedLoginResponse, not the setup card.
This verifies that the requires_auth guard in BaseTool.execute() fires
before _execute() is called, so unauthenticated callers cannot probe
which integrations are configured.
"""
import json
tool = self._make_tool()
# Session still needs a user_id string; the None is passed to execute()
# to simulate an unauthenticated call.
session = make_session(user_id=_TEST_USER_ID)
result = await tool.execute(
user_id=None,
session=session,
tool_call_id="test-call-id",
provider="github",
)
raw = result.output
output = json.loads(raw) if isinstance(raw, str) else raw
assert output.get("type") == "need_login"
assert result.success is False

View File

@@ -30,7 +30,12 @@ class ContinueRunBlockTool(BaseTool):
@property
def description(self) -> str:
return "Resume block execution after a run_block call returned review_required. Pass the review_id."
return (
"Continue executing a block after human review approval. "
"Use this after a run_block call returned review_required. "
"Pass the review_id from the review_required response. "
"The block will execute with the original pre-approved input data."
)
@property
def parameters(self) -> dict[str, Any]:
@@ -39,7 +44,10 @@ class ContinueRunBlockTool(BaseTool):
"properties": {
"review_id": {
"type": "string",
"description": "review_id from the review_required response.",
"description": (
"The review_id from a previous review_required response. "
"This resumes execution with the pre-approved input data."
),
},
},
"required": ["review_id"],

View File

@@ -23,8 +23,12 @@ class CreateAgentTool(BaseTool):
@property
def description(self) -> str:
return (
"Create a new agent from JSON (nodes + links). Validates, auto-fixes, and saves. "
"Before calling, search for existing agents with find_library_agent."
"Create a new agent workflow. Pass `agent_json` with the complete "
"agent graph JSON you generated using block schemas from find_block. "
"The tool validates, auto-fixes, and saves.\n\n"
"IMPORTANT: Before calling this tool, search for relevant existing agents "
"using find_library_agent that could be used as building blocks. "
"Pass their IDs in the library_agent_ids parameter."
)
@property
@@ -38,21 +42,34 @@ class CreateAgentTool(BaseTool):
"properties": {
"agent_json": {
"type": "object",
"description": "Agent graph with 'nodes' and 'links' arrays.",
"description": (
"The agent JSON to validate and save. "
"Must contain 'nodes' and 'links' arrays, and optionally "
"'name' and 'description'."
),
},
"library_agent_ids": {
"type": "array",
"items": {"type": "string"},
"description": "Library agent IDs as building blocks.",
"description": (
"List of library agent IDs to use as building blocks."
),
},
"save": {
"type": "boolean",
"description": "Save the agent (default: true). False for preview.",
"description": (
"Whether to save the agent. Default is true. "
"Set to false for preview only."
),
"default": True,
},
"folder_id": {
"type": "string",
"description": "Folder ID to save into (default: root).",
"description": (
"Optional folder ID to save the agent into. "
"If not provided, the agent is saved at root level. "
"Use list_folders to find available folders."
),
},
},
"required": ["agent_json"],

View File

@@ -23,7 +23,9 @@ class CustomizeAgentTool(BaseTool):
@property
def description(self) -> str:
return (
"Customize a marketplace/template agent. Validates, auto-fixes, and saves."
"Customize a marketplace or template agent. Pass `agent_json` "
"with the complete customized agent JSON. The tool validates, "
"auto-fixes, and saves."
)
@property
@@ -37,21 +39,32 @@ class CustomizeAgentTool(BaseTool):
"properties": {
"agent_json": {
"type": "object",
"description": "Customized agent JSON with nodes and links.",
"description": (
"Complete customized agent JSON to validate and save. "
"Optionally include 'name' and 'description'."
),
},
"library_agent_ids": {
"type": "array",
"items": {"type": "string"},
"description": "Library agent IDs as building blocks.",
"description": (
"List of library agent IDs to use as building blocks."
),
},
"save": {
"type": "boolean",
"description": "Save the agent (default: true). False for preview.",
"description": (
"Whether to save the customized agent. Default is true."
),
"default": True,
},
"folder_id": {
"type": "string",
"description": "Folder ID to save into (default: root).",
"description": (
"Optional folder ID to save the agent into. "
"If not provided, the agent is saved at root level. "
"Use list_folders to find available folders."
),
},
},
"required": ["agent_json"],

View File

@@ -41,7 +41,8 @@ import contextlib
import logging
from typing import Any, Awaitable, Callable, Literal
from e2b import AsyncSandbox, SandboxLifecycle
from e2b import AsyncSandbox
from e2b.sandbox.sandbox_api import SandboxLifecycle
from backend.data.redis_client import get_redis_async

View File

@@ -23,8 +23,12 @@ class EditAgentTool(BaseTool):
@property
def description(self) -> str:
return (
"Edit an existing agent. Validates, auto-fixes, and saves. "
"Before calling, search for existing agents with find_library_agent."
"Edit an existing agent. Pass `agent_json` with the complete "
"updated agent JSON you generated. The tool validates, auto-fixes, "
"and saves.\n\n"
"IMPORTANT: Before calling this tool, if the changes involve adding new "
"functionality, search for relevant existing agents using find_library_agent "
"that could be used as building blocks."
)
@property
@@ -38,20 +42,33 @@ class EditAgentTool(BaseTool):
"properties": {
"agent_id": {
"type": "string",
"description": "Graph ID or library agent ID to edit.",
"description": (
"The ID of the agent to edit. "
"Can be a graph ID or library agent ID."
),
},
"agent_json": {
"type": "object",
"description": "Updated agent JSON with nodes and links.",
"description": (
"Complete updated agent JSON to validate and save. "
"Must contain 'nodes' and 'links'. "
"Include 'name' and/or 'description' if they need "
"to be updated."
),
},
"library_agent_ids": {
"type": "array",
"items": {"type": "string"},
"description": "Library agent IDs as building blocks.",
"description": (
"List of library agent IDs to use as building blocks for the changes."
),
},
"save": {
"type": "boolean",
"description": "Save changes (default: true). False for preview.",
"description": (
"Whether to save the changes. "
"Default is true. Set to false for preview only."
),
"default": True,
},
},

View File

@@ -134,7 +134,11 @@ class SearchFeatureRequestsTool(BaseTool):
@property
def description(self) -> str:
return "Search existing feature requests. Check before creating a new one."
return (
"Search existing feature requests to check if a similar request "
"already exists before creating a new one. Returns matching feature "
"requests with their ID, title, and description."
)
@property
def parameters(self) -> dict[str, Any]:
@@ -230,9 +234,14 @@ class CreateFeatureRequestTool(BaseTool):
@property
def description(self) -> str:
return (
"Create a feature request or add need to existing one. "
"Search first to avoid duplicates. Pass existing_issue_id to add to existing. "
"Never include PII (names, emails, phone numbers, company names) in title/description."
"Create a new feature request or add a customer need to an existing one. "
"Always search first with search_feature_requests to avoid duplicates. "
"If a matching request exists, pass its ID as existing_issue_id to add "
"the user's need to it instead of creating a duplicate. "
"IMPORTANT: Never include personally identifiable information (PII) in "
"the title or description — no names, emails, phone numbers, company "
"names, or other identifying details. Write titles and descriptions in "
"generic, feature-focused language."
)
@property
@@ -242,15 +251,28 @@ class CreateFeatureRequestTool(BaseTool):
"properties": {
"title": {
"type": "string",
"description": "Feature request title. No names, emails, or company info.",
"description": (
"Title for the feature request. Must be generic and "
"feature-focused — do not include any user names, emails, "
"company names, or other PII."
),
},
"description": {
"type": "string",
"description": "What the user wants and why. No names, emails, or company info.",
"description": (
"Detailed description of what the user wants and why. "
"Must not contain any personally identifiable information "
"(PII) — describe the feature need generically without "
"referencing specific users, companies, or contact details."
),
},
"existing_issue_id": {
"type": "string",
"description": "Linear issue ID to add need to (from search results).",
"description": (
"If adding a need to an existing feature request, "
"provide its Linear issue ID (from search results). "
"Omit to create a new feature request."
),
},
},
"required": ["title", "description"],

View File

@@ -18,7 +18,10 @@ class FindAgentTool(BaseTool):
@property
def description(self) -> str:
return "Search marketplace agents by capability, or look up by slug ('username/agent-name')."
return (
"Discover agents from the marketplace based on capabilities and "
"user needs, or look up a specific agent by its creator/slug ID."
)
@property
def parameters(self) -> dict[str, Any]:
@@ -27,7 +30,7 @@ class FindAgentTool(BaseTool):
"properties": {
"query": {
"type": "string",
"description": "Search keywords, or 'username/agent-name' for direct slug lookup.",
"description": "Search query describing what the user wants to accomplish, or a creator/slug ID (e.g. 'username/agent-name') for direct lookup. Use single keywords for best results.",
},
},
"required": ["query"],

View File

@@ -54,9 +54,13 @@ class FindBlockTool(BaseTool):
@property
def description(self) -> str:
return (
"Search blocks by name or description. Returns block IDs for run_block. "
"Always call this FIRST to get block IDs before using run_block. "
"Then call run_block with the block's id and empty input_data to see its detailed schema."
"Search for available blocks by name or description, or look up a "
"specific block by its ID. "
"Blocks are reusable components that perform specific tasks like "
"sending emails, making API calls, processing text, etc. "
"IMPORTANT: Use this tool FIRST to get the block's 'id' before calling run_block. "
"The response includes each block's id, name, and description. "
"Call run_block with the block's id **with no inputs** to see detailed inputs/outputs and execute it."
)
@property
@@ -66,11 +70,19 @@ class FindBlockTool(BaseTool):
"properties": {
"query": {
"type": "string",
"description": "Search keywords (e.g. 'email', 'http', 'ai').",
"description": (
"Search query to find blocks by name or description, "
"or a block ID (UUID) for direct lookup. "
"Use keywords like 'email', 'http', 'text', 'ai', etc."
),
},
"include_schemas": {
"type": "boolean",
"description": "Include full input/output schemas (for agent JSON generation).",
"description": (
"If true, include full input_schema and output_schema "
"for each block. Use when generating agent JSON that "
"needs block schemas. Default is false."
),
"default": False,
},
},

View File

@@ -19,8 +19,13 @@ class FindLibraryAgentTool(BaseTool):
@property
def description(self) -> str:
return (
"Search user's library agents. Returns graph_id, schemas for sub-agent composition. "
"Omit query to list all."
"Search for or list agents in the user's library. Use this to find "
"agents the user has already added to their library, including agents "
"they created or added from the marketplace. "
"When creating agents with sub-agent composition, use this to get "
"the agent's graph_id, graph_version, input_schema, and output_schema "
"needed for AgentExecutorBlock nodes. "
"Omit the query to list all agents."
)
@property
@@ -30,7 +35,10 @@ class FindLibraryAgentTool(BaseTool):
"properties": {
"query": {
"type": "string",
"description": "Search by name/description. Omit to list all.",
"description": (
"Search query to find agents by name or description. "
"Omit to list all agents in the library."
),
},
},
"required": [],

View File

@@ -22,10 +22,20 @@ class FixAgentGraphTool(BaseTool):
@property
def description(self) -> str:
return (
"Auto-fix common agent JSON issues: missing/invalid UUIDs, StoreValueBlock prerequisites, "
"double curly brace escaping, AddToList/AddToDictionary prerequisites, credentials, "
"node spacing, AI model defaults, link static properties, and type mismatches. "
"Returns fixed JSON and list of fixes applied."
"Auto-fix common issues in an agent JSON graph. Applies fixes for:\n"
"- Missing or invalid UUIDs on nodes and links\n"
"- StoreValueBlock prerequisites for ConditionBlock\n"
"- Double curly brace escaping in prompt templates\n"
"- AddToList/AddToDictionary prerequisite blocks\n"
"- CodeExecutionBlock output field naming\n"
"- Missing credentials configuration\n"
"- Node X coordinate spacing (800+ units apart)\n"
"- AI model default parameters\n"
"- Link static properties based on input schema\n"
"- Type mismatches (inserts conversion blocks)\n\n"
"Returns the fixed agent JSON plus a list of fixes applied. "
"After fixing, the agent is re-validated. If still invalid, "
"the remaining errors are included in the response."
)
@property

View File

@@ -42,7 +42,12 @@ class GetAgentBuildingGuideTool(BaseTool):
@property
def description(self) -> str:
return "Get the agent JSON building guide (nodes, links, AgentExecutorBlock, MCPToolBlock usage). Call before generating agent JSON."
return (
"Returns the complete guide for building agent JSON graphs, including "
"block IDs, link structure, AgentInputBlock, AgentOutputBlock, "
"AgentExecutorBlock (for sub-agent composition), and MCPToolBlock usage. "
"Call this before generating agent JSON to ensure correct structure."
)
@property
def parameters(self) -> dict[str, Any]:

View File

@@ -25,7 +25,8 @@ class GetDocPageTool(BaseTool):
@property
def description(self) -> str:
return (
"Read full documentation page content by path (from search_docs results)."
"Get the full content of a documentation page by its path. "
"Use this after search_docs to read the complete content of a relevant page."
)
@property
@@ -35,7 +36,10 @@ class GetDocPageTool(BaseTool):
"properties": {
"path": {
"type": "string",
"description": "Doc file path (e.g. 'platform/block-sdk-guide.md').",
"description": (
"The path to the documentation file, as returned by search_docs. "
"Example: 'platform/block-sdk-guide.md'"
),
},
},
"required": ["path"],

View File

@@ -38,7 +38,11 @@ class GetMCPGuideTool(BaseTool):
@property
def description(self) -> str:
return "Get MCP server URLs and auth guide. Call before run_mcp_tool if you need a server URL or auth info."
return (
"Returns the MCP tool guide: known hosted server URLs (Notion, Linear, "
"Stripe, Intercom, Cloudflare, Atlassian) and authentication workflow. "
"Call before using run_mcp_tool if you need a server URL or auth info."
)
@property
def parameters(self) -> dict[str, Any]:

View File

@@ -88,7 +88,10 @@ class CreateFolderTool(BaseTool):
@property
def description(self) -> str:
return "Create a library folder. Use parent_id to nest inside another folder."
return (
"Create a new folder in the user's library to organize agents. "
"Optionally nest it inside an existing folder using parent_id."
)
@property
def requires_auth(self) -> bool:
@@ -101,19 +104,22 @@ class CreateFolderTool(BaseTool):
"properties": {
"name": {
"type": "string",
"description": "Folder name (max 100 chars).",
"description": "Name for the new folder (max 100 chars).",
},
"parent_id": {
"type": "string",
"description": "Parent folder ID (omit for root).",
"description": (
"ID of the parent folder to nest inside. "
"Omit to create at root level."
),
},
"icon": {
"type": "string",
"description": "Icon identifier.",
"description": "Optional icon identifier for the folder.",
},
"color": {
"type": "string",
"description": "Hex color (#RRGGBB).",
"description": "Optional hex color code (#RRGGBB).",
},
},
"required": ["name"],
@@ -169,9 +175,13 @@ class ListFoldersTool(BaseTool):
@property
def description(self) -> str:
return (
"List library folders. Omit parent_id for full tree. "
"Set include_agents=true when user asks about agents, wants to see "
"what's in their folders, or mentions agents alongside folders."
"List the user's library folders. "
"Omit parent_id to get the full folder tree. "
"Provide parent_id to list only direct children of that folder. "
"Set include_agents=true to also return the agents inside each folder "
"and root-level agents not in any folder. Always set include_agents=true "
"when the user asks about agents, wants to see what's in their folders, "
"or mentions agents alongside folders."
)
@property
@@ -185,11 +195,17 @@ class ListFoldersTool(BaseTool):
"properties": {
"parent_id": {
"type": "string",
"description": "List children of this folder (omit for full tree).",
"description": (
"List children of this folder. "
"Omit to get the full folder tree."
),
},
"include_agents": {
"type": "boolean",
"description": "Include agents in each folder (default: false).",
"description": (
"Whether to include the list of agents inside each folder. "
"Defaults to false."
),
},
},
"required": [],
@@ -341,7 +357,10 @@ class MoveFolderTool(BaseTool):
@property
def description(self) -> str:
return "Move a folder. Set target_parent_id to null for root."
return (
"Move a folder to a different parent folder. "
"Set target_parent_id to null to move to root level."
)
@property
def requires_auth(self) -> bool:
@@ -354,11 +373,14 @@ class MoveFolderTool(BaseTool):
"properties": {
"folder_id": {
"type": "string",
"description": "Folder ID.",
"description": "ID of the folder to move.",
},
"target_parent_id": {
"type": ["string", "null"],
"description": "New parent folder ID (null for root).",
"description": (
"ID of the new parent folder. "
"Use null to move to root level."
),
},
},
"required": ["folder_id"],
@@ -411,7 +433,10 @@ class DeleteFolderTool(BaseTool):
@property
def description(self) -> str:
return "Delete a folder. Agents inside move to root (not deleted)."
return (
"Delete a folder from the user's library. "
"Agents inside the folder are moved to root level (not deleted)."
)
@property
def requires_auth(self) -> bool:
@@ -474,7 +499,10 @@ class MoveAgentsToFolderTool(BaseTool):
@property
def description(self) -> str:
return "Move agents to a folder. Set folder_id to null for root."
return (
"Move one or more agents to a folder. "
"Set folder_id to null to move agents to root level."
)
@property
def requires_auth(self) -> bool:
@@ -488,11 +516,13 @@ class MoveAgentsToFolderTool(BaseTool):
"agent_ids": {
"type": "array",
"items": {"type": "string"},
"description": "Library agent IDs to move.",
"description": "List of library agent IDs to move.",
},
"folder_id": {
"type": ["string", "null"],
"description": "Target folder ID (null for root).",
"description": (
"Target folder ID. Use null to move to root level."
),
},
},
"required": ["agent_ids"],

View File

@@ -104,11 +104,19 @@ class RunAgentTool(BaseTool):
@property
def description(self) -> str:
return (
"Run or schedule an agent. Automatically checks inputs and credentials. "
"Identify by username_agent_slug ('user/agent') or library_agent_id. "
"For scheduling, provide schedule_name + cron."
)
return """Run or schedule an agent from the marketplace or user's library.
The tool automatically handles the setup flow:
- Returns missing inputs if required fields are not provided
- Returns missing credentials if user needs to configure them
- Executes immediately if all requirements are met
- Schedules execution if cron expression is provided
Identify the agent using either:
- username_agent_slug: Marketplace format 'username/agent-name'
- library_agent_id: ID of an agent in the user's library
For scheduled execution, provide: schedule_name, cron, and optionally timezone."""
@property
def parameters(self) -> dict[str, Any]:
@@ -117,38 +125,40 @@ class RunAgentTool(BaseTool):
"properties": {
"username_agent_slug": {
"type": "string",
"description": "Marketplace format 'username/agent-name'.",
"description": "Agent identifier in format 'username/agent-name'",
},
"library_agent_id": {
"type": "string",
"description": "Library agent ID.",
"description": "Library agent ID from user's library",
},
"inputs": {
"type": "object",
"description": "Input values for the agent.",
"description": "Input values for the agent",
"additionalProperties": True,
},
"use_defaults": {
"type": "boolean",
"description": "Run with default values (confirm with user first).",
"description": "Set to true to run with default values (user must confirm)",
},
"schedule_name": {
"type": "string",
"description": "Name for scheduled execution. Providing this triggers scheduling mode (also requires cron).",
"description": "Name for scheduled execution (triggers scheduling mode)",
},
"cron": {
"type": "string",
"description": "Cron expression (min hour day month weekday).",
"description": "Cron expression (5 fields: min hour day month weekday)",
},
"timezone": {
"type": "string",
"description": "IANA timezone (default: UTC).",
"description": "IANA timezone for schedule (default: UTC)",
},
"wait_for_result": {
"type": "integer",
"description": "Max seconds to wait for completion (0-300).",
"minimum": 0,
"maximum": 300,
"description": (
"Max seconds to wait for execution to complete (0-300). "
"If >0, blocks until the execution finishes or times out. "
"Returns execution outputs when complete."
),
},
},
"required": [],

View File

@@ -45,10 +45,13 @@ class RunBlockTool(BaseTool):
@property
def description(self) -> str:
return (
"Execute a block. IMPORTANT: Always get block_id from find_block first "
"— do NOT guess or fabricate IDs. "
"Call with empty input_data to see schema, then with data to execute. "
"If review_required, use continue_run_block."
"Execute a specific block with the provided input data. "
"IMPORTANT: You MUST call find_block first to get the block's 'id' - "
"do NOT guess or make up block IDs. "
"On first attempt (without input_data), returns detailed schema showing "
"required inputs and outputs. Then call again with proper input_data to execute. "
"If a block requires human review, use continue_run_block with the "
"review_id after the user approves."
)
@property
@@ -58,14 +61,28 @@ class RunBlockTool(BaseTool):
"properties": {
"block_id": {
"type": "string",
"description": "Block ID from find_block results.",
"description": (
"The block's 'id' field from find_block results. "
"NEVER guess this - always get it from find_block first."
),
},
"block_name": {
"type": "string",
"description": (
"The block's human-readable name from find_block results. "
"Used for display purposes in the UI."
),
},
"input_data": {
"type": "object",
"description": "Input values. Use {} first to see schema.",
"description": (
"Input values for the block. "
"First call with empty {} to see the block's schema, "
"then call again with proper values to execute."
),
},
},
"required": ["block_id", "input_data"],
"required": ["block_id", "block_name", "input_data"],
}
@property

View File

@@ -57,9 +57,10 @@ class RunMCPToolTool(BaseTool):
@property
def description(self) -> str:
return (
"Discover and execute MCP server tools. "
"Call with server_url only to list tools, then with tool_name + tool_arguments to execute. "
"Call get_mcp_guide first for server URLs and auth."
"Connect to an MCP (Model Context Protocol) server to discover and execute its tools. "
"Two-step: (1) call with server_url to list available tools, "
"(2) call again with server_url + tool_name + tool_arguments to execute. "
"Call get_mcp_guide for known server URLs and auth details."
)
@property
@@ -69,15 +70,24 @@ class RunMCPToolTool(BaseTool):
"properties": {
"server_url": {
"type": "string",
"description": "MCP server URL (Streamable HTTP endpoint).",
"description": (
"URL of the MCP server (Streamable HTTP endpoint), "
"e.g. https://mcp.example.com/mcp"
),
},
"tool_name": {
"type": "string",
"description": "Tool to execute. Omit to discover available tools.",
"description": (
"Name of the MCP tool to execute. "
"Omit on first call to discover available tools."
),
},
"tool_arguments": {
"type": "object",
"description": "Arguments matching the tool's input schema.",
"description": (
"Arguments to pass to the selected tool. "
"Must match the tool's input schema returned during discovery."
),
},
},
"required": ["server_url"],

View File

@@ -38,7 +38,11 @@ class SearchDocsTool(BaseTool):
@property
def description(self) -> str:
return "Search platform documentation by keyword. Use get_doc_page to read full results."
return (
"Search the AutoGPT platform documentation for information about "
"how to use the platform, build agents, configure blocks, and more. "
"Returns relevant documentation sections. Use get_doc_page to read full content."
)
@property
def parameters(self) -> dict[str, Any]:
@@ -47,7 +51,10 @@ class SearchDocsTool(BaseTool):
"properties": {
"query": {
"type": "string",
"description": "Documentation search query.",
"description": (
"Search query to find relevant documentation. "
"Use natural language to describe what you're looking for."
),
},
},
"required": ["query"],

View File

@@ -1,119 +0,0 @@
"""Schema regression tests for all registered CoPilot tools.
Validates that every tool in TOOL_REGISTRY produces a well-formed schema:
- description is non-empty
- all `required` fields exist in `properties`
- every property has a `type` and `description`
- total schema character budget does not regress past threshold
"""
import json
from typing import Any, cast
import pytest
from backend.copilot.tools import TOOL_REGISTRY
# Character budget (~4 chars/token heuristic, targeting ~8000 tokens)
_CHAR_BUDGET = 32_000
@pytest.fixture(scope="module")
def all_tool_schemas() -> list[tuple[str, Any]]:
"""Return (tool_name, openai_schema) pairs for every registered tool."""
return [(name, tool.as_openai_tool()) for name, tool in TOOL_REGISTRY.items()]
def _get_parametrize_data() -> list[tuple[str, object]]:
"""Build parametrize data at collection time."""
return [(name, tool.as_openai_tool()) for name, tool in TOOL_REGISTRY.items()]
@pytest.mark.parametrize(
"tool_name,schema",
_get_parametrize_data(),
ids=[name for name, _ in _get_parametrize_data()],
)
class TestToolSchema:
"""Validate schema invariants for every registered tool."""
def test_description_non_empty(self, tool_name: str, schema: dict) -> None:
desc = schema["function"].get("description", "")
assert desc, f"Tool '{tool_name}' has an empty description"
def test_required_fields_exist_in_properties(
self, tool_name: str, schema: dict
) -> None:
params = schema["function"].get("parameters", {})
properties = params.get("properties", {})
required = params.get("required", [])
for field in required:
assert field in properties, (
f"Tool '{tool_name}': required field '{field}' "
f"not found in properties {list(properties.keys())}"
)
def test_every_property_has_type_and_description(
self, tool_name: str, schema: dict
) -> None:
params = schema["function"].get("parameters", {})
properties = params.get("properties", {})
for prop_name, prop_def in properties.items():
assert (
"type" in prop_def
), f"Tool '{tool_name}', property '{prop_name}' is missing 'type'"
assert (
"description" in prop_def
), f"Tool '{tool_name}', property '{prop_name}' is missing 'description'"
def test_browser_act_action_enum_complete() -> None:
"""Assert browser_act action enum still contains all 14 supported actions.
This prevents future PRs from accidentally dropping actions during description
trimming. The enum is the authoritative list — this locks it at 14 values.
"""
tool = TOOL_REGISTRY["browser_act"]
schema = tool.as_openai_tool()
fn_def = schema["function"]
params = cast(dict[str, Any], fn_def.get("parameters", {}))
actions = params["properties"]["action"]["enum"]
expected = {
"click",
"dblclick",
"fill",
"type",
"scroll",
"hover",
"press",
"check",
"uncheck",
"select",
"wait",
"back",
"forward",
"reload",
}
assert set(actions) == expected, (
f"browser_act action enum changed. Got {set(actions)}, expected {expected}. "
"If you added/removed an action, update this test intentionally."
)
def test_total_schema_char_budget() -> None:
"""Assert total tool schema size stays under the character budget.
This locks in the 34% token reduction from #12398 and prevents future
description bloat from eroding the gains. Uses character count with a
~4 chars/token heuristic (budget of 32000 chars ≈ 8000 tokens).
Character count is tokenizer-agnostic — no dependency on GPT or Claude
tokenizers — while still providing a stable regression gate.
"""
schemas = [tool.as_openai_tool() for tool in TOOL_REGISTRY.values()]
serialized = json.dumps(schemas)
total_chars = len(serialized)
assert total_chars < _CHAR_BUDGET, (
f"Tool schemas use {total_chars} chars (~{total_chars // 4} tokens), "
f"exceeding budget of {_CHAR_BUDGET} chars (~{_CHAR_BUDGET // 4} tokens). "
f"Description bloat detected — trim descriptions or raise the budget intentionally."
)

View File

@@ -22,9 +22,17 @@ class ValidateAgentGraphTool(BaseTool):
@property
def description(self) -> str:
return (
"Validate agent JSON for correctness: block_ids, links, required fields, "
"type compatibility, nested sink notation, prompt brace escaping, "
"and AgentExecutorBlock configs. On failure, use fix_agent_graph to auto-fix."
"Validate an agent JSON graph for correctness. Checks:\n"
"- All block_ids reference real blocks\n"
"- All links reference valid source/sink nodes and fields\n"
"- Required input fields are wired or have defaults\n"
"- Data types are compatible across links\n"
"- Nested sink links use correct notation\n"
"- Prompt templates use proper curly brace escaping\n"
"- AgentExecutorBlock configurations are valid\n\n"
"Call this after generating agent JSON to verify correctness. "
"If validation fails, either fix issues manually based on the error "
"descriptions, or call fix_agent_graph to auto-fix common problems."
)
@property
@@ -38,7 +46,11 @@ class ValidateAgentGraphTool(BaseTool):
"properties": {
"agent_json": {
"type": "object",
"description": "Agent JSON with 'nodes' and 'links' arrays.",
"description": (
"The agent JSON to validate. Must contain 'nodes' and 'links' arrays. "
"Each node needs: id (UUID), block_id, input_default, metadata. "
"Each link needs: id (UUID), source_id, source_name, sink_id, sink_name."
),
},
},
"required": ["agent_json"],

View File

@@ -59,7 +59,13 @@ class WebFetchTool(BaseTool):
@property
def description(self) -> str:
return "Fetch a public web page. Public URLs only — internal addresses blocked. Returns readable text from HTML by default."
return (
"Fetch the content of a public web page by URL. "
"Returns readable text extracted from HTML by default. "
"Useful for reading documentation, articles, and API responses. "
"Only supports HTTP/HTTPS GET requests to public URLs "
"(private/internal network addresses are blocked)."
)
@property
def parameters(self) -> dict[str, Any]:
@@ -68,11 +74,14 @@ class WebFetchTool(BaseTool):
"properties": {
"url": {
"type": "string",
"description": "Public HTTP/HTTPS URL.",
"description": "The public HTTP/HTTPS URL to fetch.",
},
"extract_text": {
"type": "boolean",
"description": "Extract text from HTML (default: true).",
"description": (
"If true (default), extract readable text from HTML. "
"If false, return raw content."
),
"default": True,
},
},

View File

@@ -27,8 +27,6 @@ from .models import ErrorResponse, ResponseType, ToolResponseBase
logger = logging.getLogger(__name__)
_MAX_FILE_SIZE_MB = Config().max_file_size_mb
# Sentinel file_id used when a tool-result file is read directly from the local
# host filesystem (rather than from workspace storage).
_LOCAL_TOOL_RESULT_FILE_ID = "local"
@@ -417,7 +415,13 @@ class ListWorkspaceFilesTool(BaseTool):
@property
def description(self) -> str:
return "List persistent workspace files. For ephemeral session files, use SDK Glob/Read instead. Optionally filter by path prefix."
return (
"List files in the user's persistent workspace (cloud storage). "
"These files survive across sessions. "
"For ephemeral session files, use the SDK Read/Glob tools instead. "
"Returns file names, paths, sizes, and metadata. "
"Optionally filter by path prefix."
)
@property
def parameters(self) -> dict[str, Any]:
@@ -426,17 +430,24 @@ class ListWorkspaceFilesTool(BaseTool):
"properties": {
"path_prefix": {
"type": "string",
"description": "Filter by path prefix (e.g. '/documents/').",
"description": (
"Optional path prefix to filter files "
"(e.g., '/documents/' to list only files in documents folder). "
"By default, only files from the current session are listed."
),
},
"limit": {
"type": "integer",
"description": "Max files to return (default 50, max 100).",
"description": "Maximum number of files to return (default 50, max 100)",
"minimum": 1,
"maximum": 100,
},
"include_all_sessions": {
"type": "boolean",
"description": "Include files from all sessions (default: false).",
"description": (
"If true, list files from all sessions. "
"Default is false (only current session's files)."
),
},
},
"required": [],
@@ -519,11 +530,18 @@ class ReadWorkspaceFileTool(BaseTool):
@property
def description(self) -> str:
return (
"Read a file from persistent workspace. Specify file_id or path. "
"Small text/image files return inline; large/binary return metadata+URL. "
"Use save_to_path to copy to working dir for processing. "
"Use offset/length for paginated reads. "
"Paths scoped to current session; use /sessions/<id>/... for cross-session access."
"Read a file from the user's persistent workspace (cloud storage). "
"These files survive across sessions. "
"For ephemeral session files, use the SDK Read tool instead. "
"Specify either file_id or path to identify the file. "
"For small text files, returns content directly. "
"For large or binary files, returns metadata and a download URL. "
"Use 'save_to_path' to copy the file to the working directory "
"(sandbox or ephemeral) for processing with bash_exec or file tools. "
"Use 'offset' and 'length' for paginated reads of large files "
"(e.g., persisted tool outputs). "
"Paths are scoped to the current session by default. "
"Use /sessions/<session_id>/... for cross-session access."
)
@property
@@ -533,30 +551,48 @@ class ReadWorkspaceFileTool(BaseTool):
"properties": {
"file_id": {
"type": "string",
"description": "File ID from list_workspace_files.",
"description": "The file's unique ID (from list_workspace_files)",
},
"path": {
"type": "string",
"description": "Virtual file path (e.g. '/documents/report.pdf').",
"description": (
"The virtual file path (e.g., '/documents/report.pdf'). "
"Scoped to current session by default."
),
},
"save_to_path": {
"type": "string",
"description": "Copy file to this working directory path for processing.",
"description": (
"If provided, save the file to this path in the working "
"directory (cloud sandbox when E2B is active, or "
"ephemeral dir otherwise) so it can be processed with "
"bash_exec or file tools. "
"The file content is still returned in the response."
),
},
"force_download_url": {
"type": "boolean",
"description": "Always return metadata+URL instead of inline content.",
"description": (
"If true, always return metadata+URL instead of inline content. "
"Default is false (auto-selects based on file size/type)."
),
},
"offset": {
"type": "integer",
"description": "Character offset for paginated reads (0-based).",
"description": (
"Character offset to start reading from (0-based). "
"Use with 'length' for paginated reads of large files."
),
},
"length": {
"type": "integer",
"description": "Max characters to return for paginated reads.",
"description": (
"Maximum number of characters to return. "
"Defaults to full file. Use with 'offset' for paginated reads."
),
},
},
"required": [], # At least one of file_id or path must be provided
"required": [], # At least one must be provided
}
@property
@@ -719,10 +755,15 @@ class WriteWorkspaceFileTool(BaseTool):
@property
def description(self) -> str:
return (
"Write a file to persistent workspace (survives across sessions). "
"Provide exactly one of: content (text), content_base64 (binary), "
f"or source_path (copy from working dir). Max {_MAX_FILE_SIZE_MB}MB. "
"Paths scoped to current session; use /sessions/<id>/... for cross-session access."
"Write or create a file in the user's persistent workspace (cloud storage). "
"These files survive across sessions. "
"For ephemeral session files, use the SDK Write tool instead. "
"Provide content as plain text via 'content', OR base64-encoded via "
"'content_base64', OR copy a file from the ephemeral working directory "
"via 'source_path'. Exactly one of these three is required. "
f"Maximum file size is {Config().max_file_size_mb}MB. "
"Files are saved to the current session's folder by default. "
"Use /sessions/<session_id>/... for cross-session access."
)
@property
@@ -732,31 +773,51 @@ class WriteWorkspaceFileTool(BaseTool):
"properties": {
"filename": {
"type": "string",
"description": "Filename (e.g. 'report.pdf').",
"description": "Name for the file (e.g., 'report.pdf')",
},
"content": {
"type": "string",
"description": "Plain text content. Mutually exclusive with content_base64/source_path.",
"description": (
"Plain text content to write. Use this for text files "
"(code, configs, documents, etc.). "
"Mutually exclusive with content_base64 and source_path."
),
},
"content_base64": {
"type": "string",
"description": "Base64-encoded binary content. Mutually exclusive with content/source_path.",
"description": (
"Base64-encoded file content. Use this for binary files "
"(images, PDFs, etc.). "
"Mutually exclusive with content and source_path."
),
},
"source_path": {
"type": "string",
"description": "Working directory path to copy to workspace. Mutually exclusive with content/content_base64.",
"description": (
"Path to a file in the ephemeral working directory to "
"copy to workspace (e.g., '/tmp/copilot-.../output.csv'). "
"Use this to persist files created by bash_exec or SDK Write. "
"Mutually exclusive with content and content_base64."
),
},
"path": {
"type": "string",
"description": "Virtual path (e.g. '/documents/report.pdf'). Defaults to '/{filename}'.",
"description": (
"Optional virtual path where to save the file "
"(e.g., '/documents/report.pdf'). "
"Defaults to '/{filename}'. Scoped to current session."
),
},
"mime_type": {
"type": "string",
"description": "MIME type. Auto-detected from filename if omitted.",
"description": (
"Optional MIME type of the file. "
"Auto-detected from filename if not provided."
),
},
"overwrite": {
"type": "boolean",
"description": "Overwrite if file exists (default: false).",
"description": "Whether to overwrite if file exists at path (default: false)",
},
},
"required": ["filename"],
@@ -798,10 +859,10 @@ class WriteWorkspaceFileTool(BaseTool):
return resolved
content: bytes = resolved
max_size = _MAX_FILE_SIZE_MB * 1024 * 1024
max_size = Config().max_file_size_mb * 1024 * 1024
if len(content) > max_size:
return ErrorResponse(
message=f"File too large. Maximum size is {_MAX_FILE_SIZE_MB}MB",
message=f"File too large. Maximum size is {Config().max_file_size_mb}MB",
session_id=session_id,
)
@@ -883,7 +944,12 @@ class DeleteWorkspaceFileTool(BaseTool):
@property
def description(self) -> str:
return "Delete a file from persistent workspace. Specify file_id or path. Paths scoped to current session; use /sessions/<id>/... for cross-session access."
return (
"Delete a file from the user's persistent workspace (cloud storage). "
"Specify either file_id or path to identify the file. "
"Paths are scoped to the current session by default. "
"Use /sessions/<session_id>/... for cross-session access."
)
@property
def parameters(self) -> dict[str, Any]:
@@ -892,14 +958,17 @@ class DeleteWorkspaceFileTool(BaseTool):
"properties": {
"file_id": {
"type": "string",
"description": "File ID from list_workspace_files.",
"description": "The file's unique ID (from list_workspace_files)",
},
"path": {
"type": "string",
"description": "Virtual file path.",
"description": (
"The virtual file path (e.g., '/documents/report.pdf'). "
"Scoped to current session by default."
),
},
},
"required": [], # At least one of file_id or path must be provided
"required": [], # At least one must be provided
}
@property

View File

@@ -76,6 +76,7 @@ MODEL_COST: dict[LlmModel, int] = {
LlmModel.GPT4O_MINI: 1,
LlmModel.GPT4O: 3,
LlmModel.GPT4_TURBO: 10,
LlmModel.GPT3_5_TURBO: 1,
LlmModel.CLAUDE_4_1_OPUS: 21,
LlmModel.CLAUDE_4_OPUS: 21,
LlmModel.CLAUDE_4_SONNET: 5,

View File

@@ -38,10 +38,6 @@ POOL_TIMEOUT = os.getenv("DB_POOL_TIMEOUT")
if POOL_TIMEOUT:
DATABASE_URL = add_param(DATABASE_URL, "pool_timeout", POOL_TIMEOUT)
STMT_CACHE_SIZE = os.getenv("DB_STATEMENT_CACHE_SIZE")
if STMT_CACHE_SIZE:
DATABASE_URL = add_param(DATABASE_URL, "statement_cache_size", STMT_CACHE_SIZE)
HTTP_TIMEOUT = int(POOL_TIMEOUT) if POOL_TIMEOUT else None
prisma = Prisma(

View File

@@ -877,12 +877,12 @@ async def get_execution_outputs_by_node_exec_id(
where={"referencedByOutputExecId": node_exec_id}
)
result: CompletedBlockOutput = defaultdict(list)
result = {}
for output in outputs:
if output.data is not None:
result[output.name].append(type_utils.convert(output.data, JsonValue))
result[output.name] = type_utils.convert(output.data, JsonValue)
return dict(result)
return result
async def update_graph_execution_start_time(

View File

@@ -1,102 +0,0 @@
"""Test that get_execution_outputs_by_node_exec_id returns CompletedBlockOutput.
CompletedBlockOutput is dict[str, list[Any]] — values must be lists.
The RPC service layer validates return types via TypeAdapter, so if
the function returns plain values instead of lists, it causes:
1 validation error for dict[str,list[any]] response
Input should be a valid list [type=list_type, input_value='', input_type=str]
This breaks SmartDecisionMakerBlock agent mode tool execution.
"""
from unittest.mock import AsyncMock, MagicMock, patch
import pytest
from pydantic import TypeAdapter
from backend.data.block import CompletedBlockOutput
@pytest.mark.asyncio
async def test_outputs_are_lists():
"""Each value in the returned dict must be a list, matching CompletedBlockOutput."""
from backend.data.execution import get_execution_outputs_by_node_exec_id
mock_output = MagicMock()
mock_output.name = "response"
mock_output.data = "some text output"
with patch(
"backend.data.execution.AgentNodeExecutionInputOutput.prisma"
) as mock_prisma:
mock_prisma.return_value.find_many = AsyncMock(return_value=[mock_output])
result = await get_execution_outputs_by_node_exec_id("test-exec-id")
# The result must conform to CompletedBlockOutput = dict[str, list[Any]]
assert "response" in result
assert isinstance(
result["response"], list
), f"Expected list, got {type(result['response']).__name__}: {result['response']!r}"
# Must also pass TypeAdapter validation (this is what the RPC layer does)
adapter = TypeAdapter(CompletedBlockOutput)
validated = adapter.validate_python(result) # This is the line that fails in prod
assert validated == {"response": ["some text output"]}
@pytest.mark.asyncio
async def test_multiple_outputs_same_name_are_collected():
"""Multiple outputs with the same name should all appear in the list."""
from backend.data.execution import get_execution_outputs_by_node_exec_id
mock_out1 = MagicMock()
mock_out1.name = "result"
mock_out1.data = "first"
mock_out2 = MagicMock()
mock_out2.name = "result"
mock_out2.data = "second"
with patch(
"backend.data.execution.AgentNodeExecutionInputOutput.prisma"
) as mock_prisma:
mock_prisma.return_value.find_many = AsyncMock(
return_value=[mock_out1, mock_out2]
)
result = await get_execution_outputs_by_node_exec_id("test-exec-id")
assert isinstance(result["result"], list)
assert len(result["result"]) == 2
@pytest.mark.asyncio
async def test_empty_outputs_returns_empty_dict():
"""No outputs → empty dict."""
from backend.data.execution import get_execution_outputs_by_node_exec_id
with patch(
"backend.data.execution.AgentNodeExecutionInputOutput.prisma"
) as mock_prisma:
mock_prisma.return_value.find_many = AsyncMock(return_value=[])
result = await get_execution_outputs_by_node_exec_id("test-exec-id")
assert result == {}
@pytest.mark.asyncio
async def test_none_data_skipped():
"""Outputs with data=None should be skipped."""
from backend.data.execution import get_execution_outputs_by_node_exec_id
mock_output = MagicMock()
mock_output.name = "response"
mock_output.data = None
with patch(
"backend.data.execution.AgentNodeExecutionInputOutput.prisma"
) as mock_prisma:
mock_prisma.return_value.find_many = AsyncMock(return_value=[mock_output])
result = await get_execution_outputs_by_node_exec_id("test-exec-id")
assert result == {}

View File

@@ -0,0 +1,750 @@
import asyncio
import csv
import io
import logging
import os
import re
import socket
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Literal, Optional
from uuid import uuid4
import prisma.enums
import prisma.models
import prisma.types
from prisma.errors import UniqueViolationError
from pydantic import BaseModel, EmailStr, TypeAdapter, ValidationError
from backend.data.db import transaction
from backend.data.model import User
from backend.data.redis_client import get_redis_async
from backend.data.tally import get_business_understanding_input_from_tally, mask_email
from backend.data.understanding import (
BusinessUnderstandingInput,
merge_business_understanding_data,
)
from backend.data.user import get_user_by_email, get_user_by_id
from backend.executor.cluster_lock import AsyncClusterLock
from backend.util.exceptions import (
NotAuthorizedError,
NotFoundError,
PreconditionFailed,
)
from backend.util.json import SafeJson
from backend.util.settings import Settings
logger = logging.getLogger(__name__)
_settings = Settings()
_WORKER_ID = f"{socket.gethostname()}:{os.getpid()}"
_tally_seed_tasks: dict[str, asyncio.Task] = {}
_TALLY_STALE_SECONDS = 300
_MAX_TALLY_ERROR_LENGTH = 200
_email_adapter = TypeAdapter(EmailStr)
MAX_BULK_INVITE_FILE_BYTES = 1024 * 1024
MAX_BULK_INVITE_ROWS = 500
class InvitedUserRecord(BaseModel):
id: str
email: str
status: prisma.enums.InvitedUserStatus
auth_user_id: Optional[str] = None
name: Optional[str] = None
tally_understanding: Optional[dict[str, Any]] = None
tally_status: prisma.enums.TallyComputationStatus
tally_computed_at: Optional[datetime] = None
tally_error: Optional[str] = None
created_at: datetime
updated_at: datetime
@classmethod
def from_db(cls, invited_user: "prisma.models.InvitedUser") -> "InvitedUserRecord":
payload = (
invited_user.tallyUnderstanding
if isinstance(invited_user.tallyUnderstanding, dict)
else None
)
return cls(
id=invited_user.id,
email=invited_user.email,
status=invited_user.status,
auth_user_id=invited_user.authUserId,
name=invited_user.name,
tally_understanding=payload,
tally_status=invited_user.tallyStatus,
tally_computed_at=invited_user.tallyComputedAt,
tally_error=invited_user.tallyError,
created_at=invited_user.createdAt,
updated_at=invited_user.updatedAt,
)
class BulkInvitedUserRowResult(BaseModel):
row_number: int
email: Optional[str] = None
name: Optional[str] = None
status: Literal["CREATED", "SKIPPED", "ERROR"]
message: str
invited_user: Optional[InvitedUserRecord] = None
class BulkInvitedUsersResult(BaseModel):
created_count: int
skipped_count: int
error_count: int
results: list[BulkInvitedUserRowResult]
@dataclass(frozen=True)
class _ParsedInviteRow:
row_number: int
email: str
name: Optional[str]
def normalize_email(email: str) -> str:
return email.strip().lower()
def _normalize_name(name: Optional[str]) -> Optional[str]:
if name is None:
return None
normalized = name.strip()
return normalized or None
def _default_profile_name(email: str, preferred_name: Optional[str]) -> str:
if preferred_name:
return preferred_name
local_part = email.split("@", 1)[0].strip()
return local_part or "user"
def _sanitize_username_base(email: str) -> str:
local_part = email.split("@", 1)[0].lower()
sanitized = re.sub(r"[^a-z0-9-]", "", local_part)
sanitized = sanitized.strip("-")
return sanitized[:40] or "user"
async def _generate_unique_profile_username(email: str, tx) -> str:
base = _sanitize_username_base(email)
for _ in range(2):
candidate = f"{base}-{uuid4().hex[:6]}"
existing = await prisma.models.Profile.prisma(tx).find_unique(
where={"username": candidate}
)
if existing is None:
return candidate
raise RuntimeError(f"Unable to generate unique username for {email}")
async def _ensure_default_profile(
user_id: str,
email: str,
preferred_name: Optional[str],
tx,
) -> None:
existing_profile = await prisma.models.Profile.prisma(tx).find_unique(
where={"userId": user_id}
)
if existing_profile is not None:
return
username = await _generate_unique_profile_username(email, tx)
await prisma.models.Profile.prisma(tx).create(
data=prisma.types.ProfileCreateInput(
userId=user_id,
name=_default_profile_name(email, preferred_name),
username=username,
description="I'm new here",
links=[],
avatarUrl="",
)
)
async def _ensure_default_onboarding(user_id: str, tx) -> None:
await prisma.models.UserOnboarding.prisma(tx).upsert(
where={"userId": user_id},
data={
"create": prisma.types.UserOnboardingCreateInput(userId=user_id),
"update": {},
},
)
async def _apply_tally_understanding(
user_id: str,
invited_user: "prisma.models.InvitedUser",
tx,
) -> None:
if not isinstance(invited_user.tallyUnderstanding, dict):
return
try:
input_data = BusinessUnderstandingInput.model_validate(
invited_user.tallyUnderstanding
)
except Exception:
logger.warning(
"Malformed tallyUnderstanding for invited user %s; skipping",
invited_user.id,
exc_info=True,
)
return
payload = merge_business_understanding_data({}, input_data)
await prisma.models.CoPilotUnderstanding.prisma(tx).upsert(
where={"userId": user_id},
data={
"create": {"userId": user_id, "data": SafeJson(payload)},
"update": {"data": SafeJson(payload)},
},
)
async def list_invited_users(
page: int = 1,
page_size: int = 50,
) -> tuple[list[InvitedUserRecord], int]:
total = await prisma.models.InvitedUser.prisma().count()
invited_users = await prisma.models.InvitedUser.prisma().find_many(
order={"createdAt": "desc"},
skip=(page - 1) * page_size,
take=page_size,
)
return [InvitedUserRecord.from_db(iu) for iu in invited_users], total
async def create_invited_user(
email: str, name: Optional[str] = None
) -> InvitedUserRecord:
normalized_email = normalize_email(email)
normalized_name = _normalize_name(name)
existing_user = await prisma.models.User.prisma().find_unique(
where={"email": normalized_email}
)
if existing_user is not None:
raise PreconditionFailed("An active user with this email already exists")
existing_invited_user = await prisma.models.InvitedUser.prisma().find_unique(
where={"email": normalized_email}
)
if existing_invited_user is not None:
raise PreconditionFailed("An invited user with this email already exists")
try:
invited_user = await prisma.models.InvitedUser.prisma().create(
data={
"email": normalized_email,
"name": normalized_name,
"status": prisma.enums.InvitedUserStatus.INVITED,
"tallyStatus": prisma.enums.TallyComputationStatus.PENDING,
}
)
except UniqueViolationError:
raise PreconditionFailed("An invited user with this email already exists")
schedule_invited_user_tally_precompute(invited_user.id)
return InvitedUserRecord.from_db(invited_user)
async def revoke_invited_user(invited_user_id: str) -> InvitedUserRecord:
invited_user = await prisma.models.InvitedUser.prisma().find_unique(
where={"id": invited_user_id}
)
if invited_user is None:
raise NotFoundError(f"Invited user {invited_user_id} not found")
if invited_user.status == prisma.enums.InvitedUserStatus.CLAIMED:
raise PreconditionFailed("Claimed invited users cannot be revoked")
if invited_user.status == prisma.enums.InvitedUserStatus.REVOKED:
return InvitedUserRecord.from_db(invited_user)
revoked_user = await prisma.models.InvitedUser.prisma().update(
where={"id": invited_user_id},
data={"status": prisma.enums.InvitedUserStatus.REVOKED},
)
if revoked_user is None:
raise NotFoundError(f"Invited user {invited_user_id} not found")
return InvitedUserRecord.from_db(revoked_user)
async def retry_invited_user_tally(invited_user_id: str) -> InvitedUserRecord:
invited_user = await prisma.models.InvitedUser.prisma().find_unique(
where={"id": invited_user_id}
)
if invited_user is None:
raise NotFoundError(f"Invited user {invited_user_id} not found")
if invited_user.status == prisma.enums.InvitedUserStatus.REVOKED:
raise PreconditionFailed("Revoked invited users cannot retry Tally seeding")
refreshed_user = await prisma.models.InvitedUser.prisma().update(
where={"id": invited_user_id},
data={
"tallyUnderstanding": None,
"tallyStatus": prisma.enums.TallyComputationStatus.PENDING,
"tallyComputedAt": None,
"tallyError": None,
},
)
if refreshed_user is None:
raise NotFoundError(f"Invited user {invited_user_id} not found")
schedule_invited_user_tally_precompute(invited_user_id)
return InvitedUserRecord.from_db(refreshed_user)
def _decode_bulk_invite_file(content: bytes) -> str:
if len(content) > MAX_BULK_INVITE_FILE_BYTES:
raise ValueError("Invite file exceeds the maximum size of 1 MB")
try:
return content.decode("utf-8-sig")
except UnicodeDecodeError as exc:
raise ValueError("Invite file must be UTF-8 encoded") from exc
def _parse_bulk_invite_csv(text: str) -> list[_ParsedInviteRow]:
indexed_rows: list[tuple[int, list[str]]] = []
for row_number, row in enumerate(csv.reader(io.StringIO(text)), start=1):
normalized_row = [cell.strip() for cell in row]
if any(normalized_row):
indexed_rows.append((row_number, normalized_row))
if not indexed_rows:
return []
header = [cell.lower() for cell in indexed_rows[0][1]]
has_header = "email" in header
email_index = header.index("email") if has_header else 0
name_index: Optional[int] = (
header.index("name")
if has_header and "name" in header
else (1 if not has_header else None)
)
data_rows = indexed_rows[1:] if has_header else indexed_rows
parsed_rows: list[_ParsedInviteRow] = []
for row_number, row in data_rows:
if len(parsed_rows) >= MAX_BULK_INVITE_ROWS:
break
email = row[email_index].strip() if len(row) > email_index else ""
name = (
row[name_index].strip()
if name_index is not None and len(row) > name_index
else ""
)
parsed_rows.append(
_ParsedInviteRow(
row_number=row_number,
email=email,
name=name or None,
)
)
return parsed_rows
def _parse_bulk_invite_text(text: str) -> list[_ParsedInviteRow]:
parsed_rows: list[_ParsedInviteRow] = []
for row_number, raw_line in enumerate(text.splitlines(), start=1):
if len(parsed_rows) >= MAX_BULK_INVITE_ROWS:
break
line = raw_line.strip()
if not line or line.startswith("#"):
continue
parsed_rows.append(
_ParsedInviteRow(
row_number=row_number,
email=line,
name=None,
)
)
return parsed_rows
def _parse_bulk_invite_file(
filename: Optional[str],
content: bytes,
) -> list[_ParsedInviteRow]:
text = _decode_bulk_invite_file(content)
file_name = filename.lower() if filename else ""
parsed_rows = (
_parse_bulk_invite_csv(text)
if file_name.endswith(".csv")
else _parse_bulk_invite_text(text)
)
if not parsed_rows:
raise ValueError("Invite file did not contain any emails")
return parsed_rows
async def bulk_create_invited_users_from_file(
filename: Optional[str],
content: bytes,
) -> BulkInvitedUsersResult:
parsed_rows = _parse_bulk_invite_file(filename, content)
created_count = 0
skipped_count = 0
error_count = 0
results: list[BulkInvitedUserRowResult] = []
seen_emails: set[str] = set()
for row in parsed_rows:
row_name = _normalize_name(row.name)
try:
validated_email = _email_adapter.validate_python(row.email)
except ValidationError:
error_count += 1
results.append(
BulkInvitedUserRowResult(
row_number=row.row_number,
email=row.email or None,
name=row_name,
status="ERROR",
message="Invalid email address",
)
)
continue
normalized_email = normalize_email(str(validated_email))
if normalized_email in seen_emails:
skipped_count += 1
results.append(
BulkInvitedUserRowResult(
row_number=row.row_number,
email=normalized_email,
name=row_name,
status="SKIPPED",
message="Duplicate email in upload file",
)
)
continue
seen_emails.add(normalized_email)
try:
invited_user = await create_invited_user(normalized_email, row_name)
except PreconditionFailed as exc:
skipped_count += 1
results.append(
BulkInvitedUserRowResult(
row_number=row.row_number,
email=normalized_email,
name=row_name,
status="SKIPPED",
message=str(exc),
)
)
except Exception:
masked = mask_email(normalized_email)
logger.exception(
"Failed to create bulk invite for row %s (%s)",
row.row_number,
masked,
)
error_count += 1
results.append(
BulkInvitedUserRowResult(
row_number=row.row_number,
email=normalized_email,
name=row_name,
status="ERROR",
message="Unexpected error creating invite",
)
)
else:
created_count += 1
results.append(
BulkInvitedUserRowResult(
row_number=row.row_number,
email=normalized_email,
name=row_name,
status="CREATED",
message="Invite created",
invited_user=invited_user,
)
)
return BulkInvitedUsersResult(
created_count=created_count,
skipped_count=skipped_count,
error_count=error_count,
results=results,
)
async def _compute_invited_user_tally_seed(invited_user_id: str) -> None:
invited_user = await prisma.models.InvitedUser.prisma().find_unique(
where={"id": invited_user_id}
)
if invited_user is None:
return
if invited_user.status == prisma.enums.InvitedUserStatus.REVOKED:
return
try:
r = await get_redis_async()
except Exception:
r = None
lock: AsyncClusterLock | None = None
if r is not None:
lock = AsyncClusterLock(
redis=r,
key=f"tally_seed:{invited_user_id}",
owner_id=_WORKER_ID,
timeout=_TALLY_STALE_SECONDS,
)
current_owner = await lock.try_acquire()
if current_owner is None:
logger.warn("Redis unvailable for tally lock - skipping tally enrichement")
return
elif current_owner != _WORKER_ID:
logger.debug(
"Tally seed for %s already locked by %s, skipping",
invited_user_id,
current_owner,
)
return
if (
invited_user.tallyStatus == prisma.enums.TallyComputationStatus.RUNNING
and invited_user.updatedAt is not None
):
age = (datetime.now(timezone.utc) - invited_user.updatedAt).total_seconds()
if age < _TALLY_STALE_SECONDS:
logger.debug(
"Tally task for %s still RUNNING (age=%ds), skipping",
invited_user_id,
int(age),
)
return
logger.info(
"Tally task for %s is stale (age=%ds), re-running",
invited_user_id,
int(age),
)
await prisma.models.InvitedUser.prisma().update(
where={"id": invited_user_id},
data={
"tallyStatus": prisma.enums.TallyComputationStatus.RUNNING,
"tallyError": None,
},
)
try:
input_data = await get_business_understanding_input_from_tally(
invited_user.email,
require_api_key=True,
)
payload = (
SafeJson(input_data.model_dump(exclude_none=True))
if input_data is not None
else None
)
await prisma.models.InvitedUser.prisma().update(
where={"id": invited_user_id},
data={
"tallyUnderstanding": payload,
"tallyStatus": prisma.enums.TallyComputationStatus.READY,
"tallyComputedAt": datetime.now(timezone.utc),
"tallyError": None,
},
)
except Exception as exc:
logger.exception(
"Failed to compute Tally understanding for invited user %s",
invited_user_id,
)
sanitized_error = re.sub(
r"https?://\S+", "<url>", f"{type(exc).__name__}: {exc}"
)[:_MAX_TALLY_ERROR_LENGTH]
await prisma.models.InvitedUser.prisma().update(
where={"id": invited_user_id},
data={
"tallyStatus": prisma.enums.TallyComputationStatus.FAILED,
"tallyError": sanitized_error,
},
)
def schedule_invited_user_tally_precompute(invited_user_id: str) -> None:
existing = _tally_seed_tasks.get(invited_user_id)
if existing is not None and not existing.done():
logger.debug("Tally task already running for %s, skipping", invited_user_id)
return
task = asyncio.create_task(_compute_invited_user_tally_seed(invited_user_id))
_tally_seed_tasks[invited_user_id] = task
def _on_done(t: asyncio.Task, _id: str = invited_user_id) -> None:
if _tally_seed_tasks.get(_id) is t:
del _tally_seed_tasks[_id]
task.add_done_callback(_on_done)
async def _open_signup_create_user(
auth_user_id: str,
normalized_email: str,
metadata_name: Optional[str],
) -> User:
"""Create a user without requiring an invite (open signup mode)."""
preferred_name = _normalize_name(metadata_name)
try:
async with transaction() as tx:
user = await prisma.models.User.prisma(tx).create(
data=prisma.types.UserCreateInput(
id=auth_user_id,
email=normalized_email,
name=preferred_name,
)
)
await _ensure_default_profile(
auth_user_id, normalized_email, preferred_name, tx
)
await _ensure_default_onboarding(auth_user_id, tx)
except UniqueViolationError:
existing = await prisma.models.User.prisma().find_unique(
where={"id": auth_user_id}
)
if existing is not None:
return User.from_db(existing)
raise
return User.from_db(user)
# TODO: We need to change this functions logic before going live
async def get_or_activate_user(user_data: dict) -> User:
auth_user_id = user_data.get("sub")
if not auth_user_id:
raise NotAuthorizedError("User ID not found in token")
auth_email = user_data.get("email")
if not auth_email:
raise NotAuthorizedError("Email not found in token")
normalized_email = normalize_email(auth_email)
user_metadata = user_data.get("user_metadata")
metadata_name = (
user_metadata.get("name") if isinstance(user_metadata, dict) else None
)
existing_user = None
try:
existing_user = await get_user_by_id(auth_user_id)
except ValueError:
existing_user = None
except Exception:
logger.exception("Error on get user by id during tally enrichment process")
raise
if existing_user is not None:
return existing_user
if not _settings.config.enable_invite_gate or normalized_email.endswith("@agpt.co"):
return await _open_signup_create_user(
auth_user_id, normalized_email, metadata_name
)
invited_user = await prisma.models.InvitedUser.prisma().find_unique(
where={"email": normalized_email}
)
if invited_user is None:
raise NotAuthorizedError("Your email is not allowed to access the platform")
if invited_user.status != prisma.enums.InvitedUserStatus.INVITED:
raise NotAuthorizedError("Your invitation is no longer active")
try:
async with transaction() as tx:
current_user = await prisma.models.User.prisma(tx).find_unique(
where={"id": auth_user_id}
)
if current_user is not None:
return User.from_db(current_user)
current_invited_user = await prisma.models.InvitedUser.prisma(
tx
).find_unique(where={"email": normalized_email})
if current_invited_user is None:
raise NotAuthorizedError(
"Your email is not allowed to access the platform"
)
if current_invited_user.status != prisma.enums.InvitedUserStatus.INVITED:
raise NotAuthorizedError("Your invitation is no longer active")
if current_invited_user.authUserId not in (None, auth_user_id):
raise NotAuthorizedError("Your invitation has already been claimed")
preferred_name = current_invited_user.name or _normalize_name(metadata_name)
await prisma.models.User.prisma(tx).create(
data=prisma.types.UserCreateInput(
id=auth_user_id,
email=normalized_email,
name=preferred_name,
)
)
await prisma.models.InvitedUser.prisma(tx).update(
where={"id": current_invited_user.id},
data={
"status": prisma.enums.InvitedUserStatus.CLAIMED,
"authUserId": auth_user_id,
},
)
await _ensure_default_profile(
auth_user_id,
normalized_email,
preferred_name,
tx,
)
await _ensure_default_onboarding(auth_user_id, tx)
await _apply_tally_understanding(auth_user_id, current_invited_user, tx)
except UniqueViolationError:
logger.info("Concurrent activation for user %s; re-fetching", auth_user_id)
already_created = await prisma.models.User.prisma().find_unique(
where={"id": auth_user_id}
)
if already_created is not None:
return User.from_db(already_created)
raise RuntimeError(
f"UniqueViolationError during activation but user {auth_user_id} not found"
)
get_user_by_id.cache_delete(auth_user_id)
get_user_by_email.cache_delete(normalized_email)
activated_user = await prisma.models.User.prisma().find_unique(
where={"id": auth_user_id}
)
if activated_user is None:
raise RuntimeError(
f"Activated user {auth_user_id} was not found after creation"
)
return User.from_db(activated_user)

View File

@@ -0,0 +1,335 @@
from contextlib import asynccontextmanager
from datetime import datetime, timezone
from types import SimpleNamespace
from typing import cast
from unittest.mock import AsyncMock, Mock
import prisma.enums
import prisma.models
import pytest
import pytest_mock
from backend.util.exceptions import NotAuthorizedError, PreconditionFailed
from .invited_user import (
InvitedUserRecord,
bulk_create_invited_users_from_file,
create_invited_user,
get_or_activate_user,
retry_invited_user_tally,
)
def _invited_user_db_record(
*,
status: prisma.enums.InvitedUserStatus = prisma.enums.InvitedUserStatus.INVITED,
tally_understanding: dict | None = None,
):
now = datetime.now(timezone.utc)
return SimpleNamespace(
id="invite-1",
email="invited@example.com",
status=status,
authUserId=None,
name="Invited User",
tallyUnderstanding=tally_understanding,
tallyStatus=prisma.enums.TallyComputationStatus.PENDING,
tallyComputedAt=None,
tallyError=None,
createdAt=now,
updatedAt=now,
)
def _invited_user_record(
*,
status: prisma.enums.InvitedUserStatus = prisma.enums.InvitedUserStatus.INVITED,
tally_understanding: dict | None = None,
):
return InvitedUserRecord.from_db(
cast(
prisma.models.InvitedUser,
_invited_user_db_record(
status=status,
tally_understanding=tally_understanding,
),
)
)
def _user_db_record():
now = datetime.now(timezone.utc)
return SimpleNamespace(
id="auth-user-1",
email="invited@example.com",
emailVerified=True,
name="Invited User",
createdAt=now,
updatedAt=now,
metadata={},
integrations="",
stripeCustomerId=None,
topUpConfig=None,
maxEmailsPerDay=3,
notifyOnAgentRun=True,
notifyOnZeroBalance=True,
notifyOnLowBalance=True,
notifyOnBlockExecutionFailed=True,
notifyOnContinuousAgentError=True,
notifyOnDailySummary=True,
notifyOnWeeklySummary=True,
notifyOnMonthlySummary=True,
notifyOnAgentApproved=True,
notifyOnAgentRejected=True,
timezone="not-set",
)
@pytest.mark.asyncio
async def test_create_invited_user_rejects_existing_active_user(
mocker: pytest_mock.MockerFixture,
) -> None:
user_repo = Mock()
user_repo.find_unique = AsyncMock(return_value=_user_db_record())
invited_user_repo = Mock()
invited_user_repo.find_unique = AsyncMock()
mocker.patch(
"backend.data.invited_user.prisma.models.User.prisma", return_value=user_repo
)
mocker.patch(
"backend.data.invited_user.prisma.models.InvitedUser.prisma",
return_value=invited_user_repo,
)
with pytest.raises(PreconditionFailed):
await create_invited_user("Invited@example.com")
@pytest.mark.asyncio
async def test_create_invited_user_schedules_tally_seed(
mocker: pytest_mock.MockerFixture,
) -> None:
user_repo = Mock()
user_repo.find_unique = AsyncMock(return_value=None)
invited_user_repo = Mock()
invited_user_repo.find_unique = AsyncMock(return_value=None)
invited_user_repo.create = AsyncMock(return_value=_invited_user_db_record())
schedule = mocker.patch(
"backend.data.invited_user.schedule_invited_user_tally_precompute"
)
mocker.patch(
"backend.data.invited_user.prisma.models.User.prisma", return_value=user_repo
)
mocker.patch(
"backend.data.invited_user.prisma.models.InvitedUser.prisma",
return_value=invited_user_repo,
)
invited_user = await create_invited_user("Invited@example.com", "Invited User")
assert invited_user.email == "invited@example.com"
invited_user_repo.create.assert_awaited_once()
schedule.assert_called_once_with("invite-1")
@pytest.mark.asyncio
async def test_retry_invited_user_tally_resets_state_and_schedules(
mocker: pytest_mock.MockerFixture,
) -> None:
invited_user_repo = Mock()
invited_user_repo.find_unique = AsyncMock(return_value=_invited_user_db_record())
invited_user_repo.update = AsyncMock(return_value=_invited_user_db_record())
schedule = mocker.patch(
"backend.data.invited_user.schedule_invited_user_tally_precompute"
)
mocker.patch(
"backend.data.invited_user.prisma.models.InvitedUser.prisma",
return_value=invited_user_repo,
)
invited_user = await retry_invited_user_tally("invite-1")
assert invited_user.id == "invite-1"
invited_user_repo.update.assert_awaited_once()
schedule.assert_called_once_with("invite-1")
@pytest.mark.asyncio
async def test_get_or_activate_user_requires_invite(
mocker: pytest_mock.MockerFixture,
) -> None:
invited_user_repo = Mock()
invited_user_repo.find_unique = AsyncMock(return_value=None)
mock_get_user_by_id = AsyncMock(side_effect=ValueError("User not found"))
mock_get_user_by_id.cache_delete = Mock()
mocker.patch(
"backend.data.invited_user.get_user_by_id",
mock_get_user_by_id,
)
mocker.patch(
"backend.data.invited_user._settings.config.enable_invite_gate",
True,
)
mocker.patch(
"backend.data.invited_user.prisma.models.InvitedUser.prisma",
return_value=invited_user_repo,
)
with pytest.raises(NotAuthorizedError):
await get_or_activate_user(
{"sub": "auth-user-1", "email": "invited@example.com"}
)
@pytest.mark.asyncio
async def test_get_or_activate_user_creates_user_from_invite(
mocker: pytest_mock.MockerFixture,
) -> None:
tx = object()
invited_user = _invited_user_db_record(
tally_understanding={"user_name": "Invited User", "industry": "Automation"}
)
created_user = _user_db_record()
outside_user_repo = Mock()
# Only called once at post-transaction verification (line 741);
# get_user_by_id (line 657) uses prisma.user.find_unique, not this mock.
outside_user_repo.find_unique = AsyncMock(return_value=created_user)
inside_user_repo = Mock()
inside_user_repo.find_unique = AsyncMock(return_value=None)
inside_user_repo.create = AsyncMock(return_value=created_user)
outside_invited_repo = Mock()
outside_invited_repo.find_unique = AsyncMock(return_value=invited_user)
inside_invited_repo = Mock()
inside_invited_repo.find_unique = AsyncMock(return_value=invited_user)
inside_invited_repo.update = AsyncMock(return_value=invited_user)
def user_prisma(client=None):
return inside_user_repo if client is tx else outside_user_repo
def invited_user_prisma(client=None):
return inside_invited_repo if client is tx else outside_invited_repo
@asynccontextmanager
async def fake_transaction():
yield tx
# Mock get_user_by_id since it uses prisma.user.find_unique (global client),
# not prisma.models.User.prisma().find_unique which we mock above.
mock_get_user_by_id = AsyncMock(side_effect=ValueError("User not found"))
mock_get_user_by_id.cache_delete = Mock()
mocker.patch(
"backend.data.invited_user.get_user_by_id",
mock_get_user_by_id,
)
mock_get_user_by_email = AsyncMock()
mock_get_user_by_email.cache_delete = Mock()
mocker.patch(
"backend.data.invited_user.get_user_by_email",
mock_get_user_by_email,
)
ensure_profile = mocker.patch(
"backend.data.invited_user._ensure_default_profile",
AsyncMock(),
)
ensure_onboarding = mocker.patch(
"backend.data.invited_user._ensure_default_onboarding",
AsyncMock(),
)
apply_tally = mocker.patch(
"backend.data.invited_user._apply_tally_understanding",
AsyncMock(),
)
mocker.patch("backend.data.invited_user.transaction", fake_transaction)
mocker.patch(
"backend.data.invited_user.prisma.models.User.prisma", side_effect=user_prisma
)
mocker.patch(
"backend.data.invited_user.prisma.models.InvitedUser.prisma",
side_effect=invited_user_prisma,
)
user = await get_or_activate_user(
{
"sub": "auth-user-1",
"email": "Invited@example.com",
"user_metadata": {"name": "Invited User"},
}
)
assert user.id == "auth-user-1"
inside_user_repo.create.assert_awaited_once()
inside_invited_repo.update.assert_awaited_once()
ensure_profile.assert_awaited_once()
ensure_onboarding.assert_awaited_once_with("auth-user-1", tx)
apply_tally.assert_awaited_once_with("auth-user-1", invited_user, tx)
@pytest.mark.asyncio
async def test_bulk_create_invited_users_from_text_file(
mocker: pytest_mock.MockerFixture,
) -> None:
create_invited = mocker.patch(
"backend.data.invited_user.create_invited_user",
AsyncMock(
side_effect=[
_invited_user_record(),
_invited_user_record(),
]
),
)
result = await bulk_create_invited_users_from_file(
"invites.txt",
b"Invited@example.com\nsecond@example.com\n",
)
assert result.created_count == 2
assert result.skipped_count == 0
assert result.error_count == 0
assert [row.status for row in result.results] == ["CREATED", "CREATED"]
assert create_invited.await_count == 2
@pytest.mark.asyncio
async def test_bulk_create_invited_users_handles_csv_duplicates_and_invalid_rows(
mocker: pytest_mock.MockerFixture,
) -> None:
create_invited = mocker.patch(
"backend.data.invited_user.create_invited_user",
AsyncMock(
side_effect=[
_invited_user_record(),
PreconditionFailed("An invited user with this email already exists"),
]
),
)
result = await bulk_create_invited_users_from_file(
"invites.csv",
(
"email,name\n"
"valid@example.com,Valid User\n"
"not-an-email,Bad Row\n"
"valid@example.com,Duplicate In File\n"
"existing@example.com,Existing User\n"
).encode("utf-8"),
)
assert result.created_count == 1
assert result.skipped_count == 2
assert result.error_count == 1
assert [row.status for row in result.results] == [
"CREATED",
"ERROR",
"SKIPPED",
"SKIPPED",
]
assert create_invited.await_count == 2

View File

@@ -41,7 +41,7 @@ _MAX_PAGES = 100
_LLM_TIMEOUT = 30
def _mask_email(email: str) -> str:
def mask_email(email: str) -> str:
"""Mask an email for safe logging: 'alice@example.com' -> 'a***e@example.com'."""
try:
local, domain = email.rsplit("@", 1)
@@ -196,8 +196,7 @@ async def _refresh_cache(form_id: str) -> tuple[dict, list]:
Returns (email_index, questions).
"""
settings = Settings()
client = _make_tally_client(settings.secrets.tally_api_key)
client = _make_tally_client(_settings.secrets.tally_api_key)
redis = await get_redis_async()
last_fetch_key = _LAST_FETCH_KEY.format(form_id=form_id)
@@ -332,6 +331,9 @@ Fields:
- current_software (list of strings): software/tools currently used
- existing_automation (list of strings): existing automations
- additional_notes (string): any additional context
- suggested_prompts (list of 5 strings): short action prompts (each under 20 words) that would help \
this person get started with automating their work. Should be specific to their industry, role, and \
pain points; actionable and conversational in tone; focused on automation opportunities.
Form data:
"""
@@ -339,21 +341,21 @@ Form data:
_EXTRACTION_SUFFIX = "\n\nReturn ONLY valid JSON."
async def extract_business_understanding(
async def extract_business_understanding_from_tally(
formatted_text: str,
) -> BusinessUnderstandingInput:
"""Use an LLM to extract structured business understanding from form text.
"""
Use an LLM to extract structured business understanding from form text.
Raises on timeout or unparseable response so the caller can handle it.
"""
settings = Settings()
api_key = settings.secrets.open_router_api_key
api_key = _settings.secrets.open_router_api_key
client = AsyncOpenAI(api_key=api_key, base_url=OPENROUTER_BASE_URL)
try:
response = await asyncio.wait_for(
client.chat.completions.create(
model="openai/gpt-4o-mini",
model=_settings.config.tally_extraction_llm_model,
messages=[
{
"role": "user",
@@ -378,9 +380,57 @@ async def extract_business_understanding(
# Filter out null values before constructing
cleaned = {k: v for k, v in data.items() if v is not None}
# Validate suggested_prompts: filter >20 words, keep top 3
raw_prompts = cleaned.get("suggested_prompts", [])
if isinstance(raw_prompts, list):
valid = [
p.strip()
for p in raw_prompts
if isinstance(p, str) and len(p.strip().split()) <= 20
]
# This will keep up to 3 suggestions
short_prompts = valid[:3] if valid else None
if short_prompts:
cleaned["suggested_prompts"] = short_prompts
else:
# We dont want to add a None value suggested_prompts field
cleaned.pop("suggested_prompts", None)
else:
# suggested_prompts must be a list - removing it as its not here
cleaned.pop("suggested_prompts", None)
return BusinessUnderstandingInput(**cleaned)
async def get_business_understanding_input_from_tally(
email: str,
*,
require_api_key: bool = False,
) -> Optional[BusinessUnderstandingInput]:
if not _settings.secrets.tally_api_key:
if require_api_key:
raise RuntimeError("Tally API key is not configured")
logger.debug("Tally: no API key configured, skipping")
return None
masked = mask_email(email)
result = await find_submission_by_email(TALLY_FORM_ID, email)
if result is None:
logger.debug(f"Tally: no submission found for {masked}")
return None
submission, questions = result
logger.info(f"Tally: found submission for {masked}, extracting understanding")
formatted = format_submission_for_llm(submission, questions)
if not formatted.strip():
logger.warning("Tally: formatted submission was empty, skipping")
return None
return await extract_business_understanding_from_tally(formatted)
async def populate_understanding_from_tally(user_id: str, email: str) -> None:
"""Main orchestrator: check Tally for a matching submission and populate understanding.
@@ -395,32 +445,9 @@ async def populate_understanding_from_tally(user_id: str, email: str) -> None:
)
return
# Check required config is present
settings = Settings()
if not settings.secrets.tally_api_key or not settings.secrets.tally_form_id:
logger.debug("Tally: Tally config incomplete, skipping")
understanding_input = await get_business_understanding_input_from_tally(email)
if understanding_input is None:
return
if not settings.secrets.open_router_api_key:
logger.debug("Tally: no OpenRouter API key configured, skipping")
return
# Look up submission by email
masked = _mask_email(email)
result = await find_submission_by_email(settings.secrets.tally_form_id, email)
if result is None:
logger.debug(f"Tally: no submission found for {masked}")
return
submission, questions = result
logger.info(f"Tally: found submission for {masked}, extracting understanding")
# Format and extract
formatted = format_submission_for_llm(submission, questions)
if not formatted.strip():
logger.warning("Tally: formatted submission was empty, skipping")
return
understanding_input = await extract_business_understanding(formatted)
# Upsert into database
await upsert_business_understanding(user_id, understanding_input)

View File

@@ -12,11 +12,11 @@ from backend.data.tally import (
_build_email_index,
_format_answer,
_make_tally_client,
_mask_email,
_refresh_cache,
extract_business_understanding,
extract_business_understanding_from_tally,
find_submission_by_email,
format_submission_for_llm,
mask_email,
populate_understanding_from_tally,
)
@@ -248,7 +248,7 @@ async def test_populate_understanding_skips_no_api_key():
new_callable=AsyncMock,
return_value=None,
),
patch("backend.data.tally.Settings", return_value=mock_settings),
patch("backend.data.tally._settings", mock_settings),
patch(
"backend.data.tally.find_submission_by_email",
new_callable=AsyncMock,
@@ -284,6 +284,7 @@ async def test_populate_understanding_full_flow():
],
}
mock_input = MagicMock()
mock_input.suggested_prompts = ["Prompt 1", "Prompt 2", "Prompt 3"]
with (
patch(
@@ -291,14 +292,14 @@ async def test_populate_understanding_full_flow():
new_callable=AsyncMock,
return_value=None,
),
patch("backend.data.tally.Settings", return_value=mock_settings),
patch("backend.data.tally._settings", mock_settings),
patch(
"backend.data.tally.find_submission_by_email",
new_callable=AsyncMock,
return_value=(submission, SAMPLE_QUESTIONS),
),
patch(
"backend.data.tally.extract_business_understanding",
"backend.data.tally.extract_business_understanding_from_tally",
new_callable=AsyncMock,
return_value=mock_input,
) as mock_extract,
@@ -331,14 +332,14 @@ async def test_populate_understanding_handles_llm_timeout():
new_callable=AsyncMock,
return_value=None,
),
patch("backend.data.tally.Settings", return_value=mock_settings),
patch("backend.data.tally._settings", mock_settings),
patch(
"backend.data.tally.find_submission_by_email",
new_callable=AsyncMock,
return_value=(submission, SAMPLE_QUESTIONS),
),
patch(
"backend.data.tally.extract_business_understanding",
"backend.data.tally.extract_business_understanding_from_tally",
new_callable=AsyncMock,
side_effect=asyncio.TimeoutError(),
),
@@ -356,13 +357,13 @@ async def test_populate_understanding_handles_llm_timeout():
def test_mask_email():
assert _mask_email("alice@example.com") == "a***e@example.com"
assert _mask_email("ab@example.com") == "a***@example.com"
assert _mask_email("a@example.com") == "a***@example.com"
assert mask_email("alice@example.com") == "a***e@example.com"
assert mask_email("ab@example.com") == "a***@example.com"
assert mask_email("a@example.com") == "a***@example.com"
def test_mask_email_invalid():
assert _mask_email("no-at-sign") == "***"
assert mask_email("no-at-sign") == "***"
# ── Prompt construction (curly-brace safety) ─────────────────────────────────
@@ -393,11 +394,11 @@ def test_extraction_prompt_no_format_placeholders():
assert single_braces == [], f"Found format placeholders: {single_braces}"
# ── extract_business_understanding ────────────────────────────────────────────
# ── extract_business_understanding_from_tally ────────────────────────────────────────────
@pytest.mark.asyncio
async def test_extract_business_understanding_success():
async def test_extract_business_understanding_from_tally_success():
"""Happy path: LLM returns valid JSON that maps to BusinessUnderstandingInput."""
mock_choice = MagicMock()
mock_choice.message.content = json.dumps(
@@ -406,6 +407,13 @@ async def test_extract_business_understanding_success():
"business_name": "Acme Corp",
"industry": "Technology",
"pain_points": ["manual reporting"],
"suggested_prompts": [
"Automate weekly reports",
"Set up invoice processing",
"Create a customer onboarding flow",
"Track project deadlines automatically",
"Send follow-up emails after meetings",
],
}
)
mock_response = MagicMock()
@@ -415,16 +423,56 @@ async def test_extract_business_understanding_success():
mock_client.chat.completions.create.return_value = mock_response
with patch("backend.data.tally.AsyncOpenAI", return_value=mock_client):
result = await extract_business_understanding("Q: Name?\nA: Alice")
result = await extract_business_understanding_from_tally("Q: Name?\nA: Alice")
assert result.user_name == "Alice"
assert result.business_name == "Acme Corp"
assert result.industry == "Technology"
assert result.pain_points == ["manual reporting"]
# suggested_prompts validated and sliced to top 3
assert result.suggested_prompts == [
"Automate weekly reports",
"Set up invoice processing",
"Create a customer onboarding flow",
]
@pytest.mark.asyncio
async def test_extract_business_understanding_filters_nulls():
async def test_extract_business_understanding_from_tally_filters_long_prompts():
"""Prompts exceeding 20 words are excluded and only top 3 are kept."""
long_prompt = " ".join(["word"] * 21)
mock_choice = MagicMock()
mock_choice.message.content = json.dumps(
{
"user_name": "Alice",
"suggested_prompts": [
long_prompt,
"Short prompt one",
long_prompt,
"Short prompt two",
"Short prompt three",
"Short prompt four",
],
}
)
mock_response = MagicMock()
mock_response.choices = [mock_choice]
mock_client = AsyncMock()
mock_client.chat.completions.create.return_value = mock_response
with patch("backend.data.tally.AsyncOpenAI", return_value=mock_client):
result = await extract_business_understanding_from_tally("Q: Name?\nA: Alice")
assert result.suggested_prompts == [
"Short prompt one",
"Short prompt two",
"Short prompt three",
]
@pytest.mark.asyncio
async def test_extract_business_understanding_from_tally_filters_nulls():
"""Null values from LLM should be excluded from the result."""
mock_choice = MagicMock()
mock_choice.message.content = json.dumps(
@@ -437,7 +485,7 @@ async def test_extract_business_understanding_filters_nulls():
mock_client.chat.completions.create.return_value = mock_response
with patch("backend.data.tally.AsyncOpenAI", return_value=mock_client):
result = await extract_business_understanding("Q: Name?\nA: Alice")
result = await extract_business_understanding_from_tally("Q: Name?\nA: Alice")
assert result.user_name == "Alice"
assert result.business_name is None
@@ -445,7 +493,7 @@ async def test_extract_business_understanding_filters_nulls():
@pytest.mark.asyncio
async def test_extract_business_understanding_invalid_json():
async def test_extract_business_understanding_from_tally_invalid_json():
"""Invalid JSON from LLM should raise JSONDecodeError."""
mock_choice = MagicMock()
mock_choice.message.content = "not valid json {"
@@ -459,11 +507,11 @@ async def test_extract_business_understanding_invalid_json():
patch("backend.data.tally.AsyncOpenAI", return_value=mock_client),
pytest.raises(json.JSONDecodeError),
):
await extract_business_understanding("Q: Name?\nA: Alice")
await extract_business_understanding_from_tally("Q: Name?\nA: Alice")
@pytest.mark.asyncio
async def test_extract_business_understanding_timeout():
async def test_extract_business_understanding_from_tally_timeout():
"""LLM timeout should propagate as asyncio.TimeoutError."""
mock_client = AsyncMock()
mock_client.chat.completions.create.side_effect = asyncio.TimeoutError()
@@ -473,7 +521,7 @@ async def test_extract_business_understanding_timeout():
patch("backend.data.tally._LLM_TIMEOUT", 0.001),
pytest.raises(asyncio.TimeoutError),
):
await extract_business_understanding("Q: Name?\nA: Alice")
await extract_business_understanding_from_tally("Q: Name?\nA: Alice")
# ── _refresh_cache ───────────────────────────────────────────────────────────
@@ -492,7 +540,7 @@ async def test_refresh_cache_full_fetch():
submissions = SAMPLE_SUBMISSIONS
with (
patch("backend.data.tally.Settings", return_value=mock_settings),
patch("backend.data.tally._settings", mock_settings),
patch(
"backend.data.tally.get_redis_async",
new_callable=AsyncMock,
@@ -540,7 +588,7 @@ async def test_refresh_cache_incremental_fetch():
new_submissions = [SAMPLE_SUBMISSIONS[0]] # Just Alice
with (
patch("backend.data.tally.Settings", return_value=mock_settings),
patch("backend.data.tally._settings", mock_settings),
patch(
"backend.data.tally.get_redis_async",
new_callable=AsyncMock,

View File

@@ -86,6 +86,11 @@ class BusinessUnderstandingInput(pydantic.BaseModel):
None, description="Any additional context"
)
# Suggested prompts (UI-only, not included in system prompt)
suggested_prompts: Optional[list[str]] = pydantic.Field(
None, description="LLM-generated suggested prompts based on business context"
)
class BusinessUnderstanding(pydantic.BaseModel):
"""Full business understanding model returned from database."""
@@ -122,6 +127,9 @@ class BusinessUnderstanding(pydantic.BaseModel):
# Additional context
additional_notes: Optional[str] = None
# Suggested prompts (UI-only, not included in system prompt)
suggested_prompts: list[str] = pydantic.Field(default_factory=list)
@classmethod
def from_db(cls, db_record: CoPilotUnderstanding) -> "BusinessUnderstanding":
"""Convert database record to Pydantic model."""
@@ -149,6 +157,7 @@ class BusinessUnderstanding(pydantic.BaseModel):
current_software=_json_to_list(business.get("current_software")),
existing_automation=_json_to_list(business.get("existing_automation")),
additional_notes=business.get("additional_notes"),
suggested_prompts=_json_to_list(data.get("suggested_prompts")),
)
@@ -166,6 +175,62 @@ def _merge_lists(existing: list | None, new: list | None) -> list | None:
return merged
def merge_business_understanding_data(
existing_data: dict[str, Any],
input_data: BusinessUnderstandingInput,
) -> dict[str, Any]:
merged_data = dict(existing_data)
merged_business: dict[str, Any] = {}
if isinstance(merged_data.get("business"), dict):
merged_business = dict(merged_data["business"])
business_string_fields = [
"job_title",
"business_name",
"industry",
"business_size",
"user_role",
"additional_notes",
]
business_list_fields = [
"key_workflows",
"daily_activities",
"pain_points",
"bottlenecks",
"manual_tasks",
"automation_goals",
"current_software",
"existing_automation",
]
if input_data.user_name is not None:
merged_data["name"] = input_data.user_name
for field in business_string_fields:
value = getattr(input_data, field)
if value is not None:
merged_business[field] = value
for field in business_list_fields:
value = getattr(input_data, field)
if value is not None:
existing_list = _json_to_list(merged_business.get(field))
merged_list = _merge_lists(existing_list, value)
merged_business[field] = merged_list
merged_business["version"] = 1
merged_data["business"] = merged_business
# suggested_prompts lives at the top level (not under `business`) because
# it's a UI-only artifact consumed by the frontend, not business understanding
# data. The `business` sub-dict feeds the system prompt.
if input_data.suggested_prompts is not None:
merged_data["suggested_prompts"] = input_data.suggested_prompts
return merged_data
async def _get_from_cache(user_id: str) -> Optional[BusinessUnderstanding]:
"""Get business understanding from Redis cache."""
try:
@@ -245,63 +310,18 @@ async def upsert_business_understanding(
where={"userId": user_id}
)
# Get existing data structure or start fresh
existing_data: dict[str, Any] = {}
if existing and isinstance(existing.data, dict):
existing_data = dict(existing.data)
existing_business: dict[str, Any] = {}
if isinstance(existing_data.get("business"), dict):
existing_business = dict(existing_data["business"])
# Business fields (stored inside business object)
business_string_fields = [
"job_title",
"business_name",
"industry",
"business_size",
"user_role",
"additional_notes",
]
business_list_fields = [
"key_workflows",
"daily_activities",
"pain_points",
"bottlenecks",
"manual_tasks",
"automation_goals",
"current_software",
"existing_automation",
]
# Handle top-level name field
if input_data.user_name is not None:
existing_data["name"] = input_data.user_name
# Business string fields - overwrite if provided
for field in business_string_fields:
value = getattr(input_data, field)
if value is not None:
existing_business[field] = value
# Business list fields - merge with existing
for field in business_list_fields:
value = getattr(input_data, field)
if value is not None:
existing_list = _json_to_list(existing_business.get(field))
merged = _merge_lists(existing_list, value)
existing_business[field] = merged
# Set version and nest business data
existing_business["version"] = 1
existing_data["business"] = existing_business
merged_data = merge_business_understanding_data(existing_data, input_data)
# Upsert with the merged data
record = await CoPilotUnderstanding.prisma().upsert(
where={"userId": user_id},
data={
"create": {"userId": user_id, "data": SafeJson(existing_data)},
"update": {"data": SafeJson(existing_data)},
"create": {"userId": user_id, "data": SafeJson(merged_data)},
"update": {"data": SafeJson(merged_data)},
},
)

View File

@@ -0,0 +1,102 @@
"""Tests for business understanding merge and format logic."""
from datetime import datetime, timezone
from typing import Any
from backend.data.understanding import (
BusinessUnderstanding,
BusinessUnderstandingInput,
format_understanding_for_prompt,
merge_business_understanding_data,
)
def _make_input(**kwargs: Any) -> BusinessUnderstandingInput:
"""Create a BusinessUnderstandingInput with only the specified fields."""
return BusinessUnderstandingInput.model_validate(kwargs)
# ─── merge_business_understanding_data: suggested_prompts ─────────────
def test_merge_suggested_prompts_overwrites_existing():
"""New suggested_prompts should fully replace existing ones (not append)."""
existing = {
"name": "Alice",
"business": {"industry": "Tech", "version": 1},
"suggested_prompts": ["Old prompt 1", "Old prompt 2"],
}
input_data = _make_input(
suggested_prompts=["New prompt A", "New prompt B", "New prompt C"],
)
result = merge_business_understanding_data(existing, input_data)
assert result["suggested_prompts"] == [
"New prompt A",
"New prompt B",
"New prompt C",
]
def test_merge_suggested_prompts_none_preserves_existing():
"""When input has suggested_prompts=None, existing prompts are preserved."""
existing = {
"name": "Alice",
"business": {"industry": "Tech", "version": 1},
"suggested_prompts": ["Keep me"],
}
input_data = _make_input(industry="Finance")
result = merge_business_understanding_data(existing, input_data)
assert result["suggested_prompts"] == ["Keep me"]
assert result["business"]["industry"] == "Finance"
def test_merge_suggested_prompts_added_to_empty_data():
"""Suggested prompts are set at top level even when starting from empty data."""
existing: dict[str, Any] = {}
input_data = _make_input(suggested_prompts=["Prompt 1"])
result = merge_business_understanding_data(existing, input_data)
assert result["suggested_prompts"] == ["Prompt 1"]
def test_merge_suggested_prompts_empty_list_overwrites():
"""An explicit empty list should overwrite existing prompts."""
existing: dict[str, Any] = {
"suggested_prompts": ["Old prompt"],
"business": {"version": 1},
}
input_data = _make_input(suggested_prompts=[])
result = merge_business_understanding_data(existing, input_data)
assert result["suggested_prompts"] == []
# ─── format_understanding_for_prompt: excludes suggested_prompts ──────
def test_format_understanding_excludes_suggested_prompts():
"""suggested_prompts is UI-only and must NOT appear in the system prompt."""
understanding = BusinessUnderstanding(
id="test-id",
user_id="user-1",
created_at=datetime.now(tz=timezone.utc),
updated_at=datetime.now(tz=timezone.utc),
user_name="Alice",
industry="Technology",
suggested_prompts=["Automate reports", "Set up alerts", "Track KPIs"],
)
formatted = format_understanding_for_prompt(understanding)
assert "Alice" in formatted
assert "Technology" in formatted
assert "suggested_prompts" not in formatted
assert "Automate reports" not in formatted
assert "Set up alerts" not in formatted
assert "Track KPIs" not in formatted

Some files were not shown because too many files have changed in this diff Show More