Compare commits

..

2 Commits

Author SHA1 Message Date
Dream
cfe3a222ac Update autogpt_platform/frontend/src/app/(platform)/build/components/FlowEditor/Flow/Flow.tsx
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2026-03-17 16:30:59 +05:30
eureka928
b9eaff7763 feat(frontend/builder): enable rectangle drag-to-select multiple blocks
Enable multi-block selection by dragging a rectangle on the canvas.
Left-click drag draws a selection box, middle/right-click drag and
scroll wheel pan the canvas. Nodes partially inside the rectangle
are included in the selection.

Shift+click to add to selection, multi-node drag, and copy/paste
of selected blocks already work via React Flow defaults and the
existing useCopyPaste hook.

Closes #11045
2026-03-17 16:30:59 +05:30
289 changed files with 7487 additions and 24858 deletions

View File

@@ -2,7 +2,7 @@
name: pr-address
description: Address PR review comments and loop until CI green and all comments resolved. TRIGGER when user asks to address comments, fix PR feedback, respond to reviewers, or babysit/monitor a PR.
user-invocable: true
argument-hint: "[PR number or URL] — if omitted, finds PR for current branch."
args: "[PR number or URL] — if omitted, finds PR for current branch."
metadata:
author: autogpt-team
version: "1.0.0"
@@ -19,60 +19,16 @@ gh pr view {N}
## Fetch comments (all sources)
### 1. Inline review threads — GraphQL (primary source of actionable items)
Use GraphQL to fetch inline threads. It natively exposes `isResolved`, returns threads already grouped with all replies, and paginates via cursor — no manual thread reconstruction needed.
```bash
gh api graphql -f query='
{
repository(owner: "Significant-Gravitas", name: "AutoGPT") {
pullRequest(number: {N}) {
reviewThreads(first: 100) {
pageInfo { hasNextPage endCursor }
nodes {
id
isResolved
path
comments(last: 1) {
nodes { databaseId body author { login } createdAt }
}
}
}
}
}
}'
gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews # top-level reviews
gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments # inline review comments
gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments # PR conversation comments
```
If `pageInfo.hasNextPage` is true, fetch subsequent pages by adding `after: "<endCursor>"` to `reviewThreads(first: 100, after: "...")` and repeat until `hasNextPage` is false.
**Filter to unresolved threads only** — skip any thread where `isResolved: true`. `comments(last: 1)` returns the most recent comment in the thread — act on that; it reflects the reviewer's final ask. Use the thread `id` (Relay global ID) to track threads across polls.
### 2. Top-level reviews — REST (MUST paginate)
```bash
gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews --paginate
```
**CRITICAL — always `--paginate`.** Reviews default to 30 per page. PRs can have 80170+ reviews (mostly empty resolution events). Without pagination you miss reviews past position 30 — including `autogpt-reviewer`'s structured review which is typically posted after several CI runs and sits well beyond the first page.
Two things to extract:
- **Overall state**: look for `CHANGES_REQUESTED` or `APPROVED` reviews.
- **Actionable feedback**: non-empty bodies only. Empty-body reviews are thread-resolution events — they indicate progress but have no feedback to act on.
**Where each reviewer posts:**
- `autogpt-reviewer` — posts detailed structured reviews ("Blockers", "Should Fix", "Nice to Have") as **top-level reviews**. Not present on every PR. Address ALL items.
- `sentry[bot]` — posts bug predictions as **inline threads**. Fix real bugs, explain false positives.
- `coderabbitai[bot]` — posts summaries as **top-level reviews** AND actionable items as **inline threads**. Address actionable items.
- Human reviewers — can post in any source. Address ALL non-empty feedback.
### 3. PR conversation comments — REST
```bash
gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments --paginate
```
Mostly contains: bot summaries (`coderabbitai[bot]`), CI/conflict detection (`github-actions[bot]`), and author status updates. Scan for non-empty messages from non-bot human reviewers that aren't the PR author — those are the ones that need a response.
**Bots to watch for:**
- `autogpt-reviewer` — posts "Blockers", "Should Fix", "Nice to Have". Address ALL of them.
- `sentry[bot]` — bug predictions. Fix real bugs, explain false positives.
- `coderabbitai[bot]` — automated review. Address actionable items.
## For each unaddressed comment
@@ -84,8 +40,8 @@ Address comments **one at a time**: fix → commit → push → inline reply →
| Comment type | How to reply |
|---|---|
| Inline review (`pulls/{N}/comments`) | `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments/{ID}/replies -f body="🤖 Fixed in <commit-sha>: <description>"` |
| Conversation (`issues/{N}/comments`) | `gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments -f body="🤖 Fixed in <commit-sha>: <description>"` |
| Inline review (`pulls/{N}/comments`) | `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments/{ID}/replies -f body="Fixed in <commit-sha>: <description>"` |
| Conversation (`issues/{N}/comments`) | `gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments -f body="Fixed in <commit-sha>: <description>"` |
## Format and commit
@@ -113,88 +69,11 @@ For backend commits in worktrees: `poetry run git commit` (pre-commit hooks).
```text
address comments → format → commit → push
wait for CI (while addressing new comments) → fix failures → push
→ re-check comments after CI settles
re-check comments → fix new ones → push
→ wait for CI → re-check comments after CI settles
→ repeat until: all comments addressed AND CI green AND no new comments arriving
```
### Polling for CI + new comments
While CI runs, stay productive: run local tests, address remaining comments.
After pushing, poll for **both** CI status and new comments in a single loop. Do not use `gh pr checks --watch` — it blocks the tool and prevents reacting to new comments while CI is running.
> **Note:** `gh pr checks --watch --fail-fast` is tempting but it blocks the entire Bash tool call, meaning the agent cannot check for or address new comments until CI fully completes. Always poll manually instead.
**Polling loop — repeat every 30 seconds:**
1. Check CI status:
```bash
gh pr checks {N} --repo Significant-Gravitas/AutoGPT --json bucket,name,link
```
Parse the results: if every check has `bucket` of `"pass"` or `"skipping"`, CI is green. If any has `"fail"`, CI has failed. Otherwise CI is still pending.
2. Check for merge conflicts:
```bash
gh pr view {N} --repo Significant-Gravitas/AutoGPT --json mergeable --jq '.mergeable'
```
If the result is `"CONFLICTING"`, the PR has a merge conflict — see "Resolving merge conflicts" below. If `"UNKNOWN"`, GitHub is still computing mergeability — wait and re-check next poll.
3. Check for new/changed comments (all three sources):
**Inline threads** — re-run the GraphQL query from "Fetch comments". For each unresolved thread, record `{thread_id, last_comment_databaseId}` as your baseline. On each poll, action is needed if:
- A new thread `id` appears that wasn't in the baseline (new thread), OR
- An existing thread's `last_comment_databaseId` has changed (new reply on existing thread)
**Conversation comments:**
```bash
gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments --paginate
```
Compare total count and newest `id` against baseline. Filter to non-empty, non-bot, non-author-update messages.
**Top-level reviews:**
```bash
gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews --paginate
```
Watch for new non-empty reviews (`CHANGES_REQUESTED` or `COMMENTED` with body). Compare total count and newest `id` against baseline.
4. **React in this precedence order (first match wins):**
| What happened | Action |
|---|---|
| Merge conflict detected | See "Resolving merge conflicts" below. |
| Mergeability is `UNKNOWN` | GitHub is still computing mergeability. Sleep 30 seconds, then restart polling from the top. |
| New comments detected | Address them (fix → commit → push → reply). After pushing, re-fetch all comments to update your baseline, then restart this polling loop from the top (new commits invalidate CI status). |
| CI failed (bucket == "fail") | Get failed check links: `gh pr checks {N} --repo Significant-Gravitas/AutoGPT --json bucket,link --jq '.[] \| select(.bucket == "fail") \| .link'`. Extract run ID from link (format: `.../actions/runs/<run-id>/job/...`), read logs with `gh run view <run-id> --repo Significant-Gravitas/AutoGPT --log-failed`. Fix → commit → push → restart polling. |
| CI green + no new comments | **Do not exit immediately.** Bots (coderabbitai, sentry) often post reviews shortly after CI settles. Continue polling for **2 more cycles (60s)** after CI goes green. Only exit after 2 consecutive green+quiet polls. |
| CI pending + no new comments | Sleep 30 seconds, then poll again. |
**The loop ends when:** CI fully green + all comments addressed + **2 consecutive polls with no new comments after CI settled.**
### Resolving merge conflicts
1. Identify the PR's target branch and remote:
```bash
gh pr view {N} --repo Significant-Gravitas/AutoGPT --json baseRefName --jq '.baseRefName'
git remote -v # find the remote pointing to Significant-Gravitas/AutoGPT (typically 'upstream' in forks, 'origin' for direct contributors)
```
2. Pull the latest base branch with a 3-way merge:
```bash
git pull {base-remote} {base-branch} --no-rebase
```
3. Resolve conflicting files, then verify no conflict markers remain:
```bash
if grep -R -n -E '^(<<<<<<<|=======|>>>>>>>)' <conflicted-files>; then
echo "Unresolved conflict markers found — resolve before proceeding."
exit 1
fi
```
4. Stage and push:
```bash
git add <conflicted-files>
git commit -m "Resolve merge conflicts with {base-branch}"
git push
```
5. Restart the polling loop from the top — new commits reset CI status.
**The loop ends when:** CI fully green + all comments addressed + no new comments since CI settled.

View File

@@ -28,7 +28,7 @@ gh pr diff {N}
Before posting anything, fetch existing inline comments to avoid duplicates:
```bash
gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments --paginate
gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments
gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews
```

View File

@@ -1,534 +0,0 @@
---
name: pr-test
description: "E2E manual testing of PRs/branches using docker compose, agent-browser, and API calls. TRIGGER when user asks to manually test a PR, test a feature end-to-end, or run integration tests against a running system."
user-invocable: true
argument-hint: "[worktree path or PR number] — tests the PR in the given worktree. Optional flags: --fix (auto-fix issues found)"
metadata:
author: autogpt-team
version: "1.0.0"
---
# Manual E2E Test
Test a PR/branch end-to-end by building the full platform, interacting via browser and API, capturing screenshots, and reporting results.
## Arguments
- `$ARGUMENTS` — worktree path (e.g. `$REPO_ROOT`) or PR number
- If `--fix` flag is present, auto-fix bugs found and push fixes (like pr-address loop)
## Step 0: Resolve the target
```bash
# If argument is a PR number, find its worktree
gh pr view {N} --json headRefName --jq '.headRefName'
# If argument is a path, use it directly
```
Determine:
- `REPO_ROOT` — the root repo directory: `git -C "$WORKTREE_PATH" worktree list | head -1 | awk '{print $1}'` (or `git rev-parse --show-toplevel` if not a worktree)
- `WORKTREE_PATH` — the worktree directory
- `PLATFORM_DIR``$WORKTREE_PATH/autogpt_platform`
- `BACKEND_DIR``$PLATFORM_DIR/backend`
- `FRONTEND_DIR``$PLATFORM_DIR/frontend`
- `PR_NUMBER` — the PR number (from `gh pr list --head $(git branch --show-current)`)
- `PR_TITLE` — the PR title, slugified (e.g. "Add copilot permissions" → "add-copilot-permissions")
- `RESULTS_DIR``$REPO_ROOT/test-results/PR-{PR_NUMBER}-{slugified-title}`
Create the results directory:
```bash
PR_NUMBER=$(cd $WORKTREE_PATH && gh pr list --head $(git branch --show-current) --repo Significant-Gravitas/AutoGPT --json number --jq '.[0].number')
PR_TITLE=$(cd $WORKTREE_PATH && gh pr list --head $(git branch --show-current) --repo Significant-Gravitas/AutoGPT --json title --jq '.[0].title' | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9]/-/g' | sed 's/--*/-/g' | sed 's/^-//;s/-$//' | head -c 50)
RESULTS_DIR="$REPO_ROOT/test-results/PR-${PR_NUMBER}-${PR_TITLE}"
mkdir -p $RESULTS_DIR
```
**Test user credentials** (for logging into the UI or verifying results manually):
- Email: `test@test.com`
- Password: `testtest123`
## Step 1: Understand the PR
Before testing, understand what changed:
```bash
cd $WORKTREE_PATH
git log --oneline dev..HEAD | head -20
git diff dev --stat
```
Read the changed files to understand:
1. What feature/fix does this PR implement?
2. What components are affected? (backend, frontend, copilot, executor, etc.)
3. What are the key user-facing behaviors to test?
## Step 2: Write test scenarios
Based on the PR analysis, write a test plan to `$RESULTS_DIR/test-plan.md`:
```markdown
# Test Plan: PR #{N} — {title}
## Scenarios
1. [Scenario name] — [what to verify]
2. ...
## API Tests (if applicable)
1. [Endpoint] — [expected behavior]
## UI Tests (if applicable)
1. [Page/component] — [interaction to test]
## Negative Tests
1. [What should NOT happen]
```
**Be critical** — include edge cases, error paths, and security checks.
## Step 3: Environment setup
### 3a. Copy .env files from the root worktree
The root worktree (`$REPO_ROOT`) has the canonical `.env` files with all API keys. Copy them to the target worktree:
```bash
# CRITICAL: .env files are NOT checked into git. They must be copied manually.
cp $REPO_ROOT/autogpt_platform/.env $PLATFORM_DIR/.env
cp $REPO_ROOT/autogpt_platform/backend/.env $BACKEND_DIR/.env
cp $REPO_ROOT/autogpt_platform/frontend/.env $FRONTEND_DIR/.env
```
### 3b. Configure copilot authentication
The copilot needs an LLM API to function. Two approaches (try subscription first):
#### Option 1: Subscription mode (preferred — uses your Claude Max/Pro subscription)
The `claude_agent_sdk` Python package **bundles its own Claude CLI binary** — no need to install `@anthropic-ai/claude-code` via npm. The backend auto-provisions credentials from environment variables on startup.
Run the helper script to extract tokens from your host and auto-update `backend/.env` (works on macOS, Linux, and Windows/WSL):
```bash
# Extracts OAuth tokens and writes CLAUDE_CODE_OAUTH_TOKEN + CLAUDE_CODE_REFRESH_TOKEN into .env
bash $BACKEND_DIR/scripts/refresh_claude_token.sh --env-file $BACKEND_DIR/.env
```
**How it works:** The script reads the OAuth token from:
- **macOS**: system keychain (`"Claude Code-credentials"`)
- **Linux/WSL**: `~/.claude/.credentials.json`
- **Windows**: `%APPDATA%/claude/.credentials.json`
It sets `CLAUDE_CODE_OAUTH_TOKEN`, `CLAUDE_CODE_REFRESH_TOKEN`, and `CHAT_USE_CLAUDE_CODE_SUBSCRIPTION=true` in the `.env` file. On container startup, the backend auto-provisions `~/.claude/.credentials.json` inside the container from these env vars. The SDK's bundled CLI then authenticates using that file. No `claude login`, no npm install needed.
**Note:** The OAuth token expires (~24h). If copilot returns auth errors, re-run the script and restart: `$BACKEND_DIR/scripts/refresh_claude_token.sh --env-file $BACKEND_DIR/.env && docker compose up -d copilot_executor`
#### Option 2: OpenRouter API key mode (fallback)
If subscription mode doesn't work, switch to API key mode using OpenRouter:
```bash
# In $BACKEND_DIR/.env, ensure these are set:
CHAT_USE_CLAUDE_CODE_SUBSCRIPTION=false
CHAT_API_KEY=<value of OPEN_ROUTER_API_KEY from the same .env>
CHAT_BASE_URL=https://openrouter.ai/api/v1
CHAT_USE_CLAUDE_AGENT_SDK=true
```
Use `sed` to update these values:
```bash
ORKEY=$(grep "^OPEN_ROUTER_API_KEY=" $BACKEND_DIR/.env | cut -d= -f2)
[ -n "$ORKEY" ] || { echo "ERROR: OPEN_ROUTER_API_KEY is missing in $BACKEND_DIR/.env"; exit 1; }
perl -i -pe 's/CHAT_USE_CLAUDE_CODE_SUBSCRIPTION=true/CHAT_USE_CLAUDE_CODE_SUBSCRIPTION=false/' $BACKEND_DIR/.env
# Add or update CHAT_API_KEY and CHAT_BASE_URL
grep -q "^CHAT_API_KEY=" $BACKEND_DIR/.env && perl -i -pe "s|^CHAT_API_KEY=.*|CHAT_API_KEY=$ORKEY|" $BACKEND_DIR/.env || echo "CHAT_API_KEY=$ORKEY" >> $BACKEND_DIR/.env
grep -q "^CHAT_BASE_URL=" $BACKEND_DIR/.env && perl -i -pe 's|^CHAT_BASE_URL=.*|CHAT_BASE_URL=https://openrouter.ai/api/v1|' $BACKEND_DIR/.env || echo "CHAT_BASE_URL=https://openrouter.ai/api/v1" >> $BACKEND_DIR/.env
```
### 3c. Stop conflicting containers
```bash
# Stop any running app containers (keep infra: supabase, redis, rabbitmq, clamav)
docker ps --format "{{.Names}}" | grep -E "rest_server|executor|copilot|websocket|database_manager|scheduler|notification|frontend|migrate" | while read name; do
docker stop "$name" 2>/dev/null
done
```
### 3e. Build and start
```bash
cd $PLATFORM_DIR && docker compose build --no-cache 2>&1 | tail -20
if [ ${PIPESTATUS[0]} -ne 0 ]; then echo "ERROR: Docker build failed"; exit 1; fi
cd $PLATFORM_DIR && docker compose up -d 2>&1 | tail -20
if [ ${PIPESTATUS[0]} -ne 0 ]; then echo "ERROR: Docker compose up failed"; exit 1; fi
```
**Note:** If the container appears to be running old code (e.g. missing PR changes), use `docker compose build --no-cache` to force a full rebuild. Docker BuildKit may sometimes reuse cached `COPY` layers from a previous build on a different branch.
**Expected time: 3-8 minutes** for build, 5-10 minutes with `--no-cache`.
### 3f. Wait for services to be ready
```bash
# Poll until backend and frontend respond
for i in $(seq 1 60); do
BACKEND=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:8006/docs 2>/dev/null)
FRONTEND=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:3000 2>/dev/null)
if [ "$BACKEND" = "200" ] && [ "$FRONTEND" = "200" ]; then
echo "Services ready"
break
fi
sleep 5
done
```
### 3h. Create test user and get auth token
```bash
ANON_KEY=$(grep "NEXT_PUBLIC_SUPABASE_ANON_KEY=" $FRONTEND_DIR/.env | sed 's/.*NEXT_PUBLIC_SUPABASE_ANON_KEY=//' | tr -d '[:space:]')
# Signup (idempotent — returns "User already registered" if exists)
RESULT=$(curl -s -X POST 'http://localhost:8000/auth/v1/signup' \
-H "apikey: $ANON_KEY" \
-H 'Content-Type: application/json' \
-d '{"email":"test@test.com","password":"testtest123"}')
# If "Database error finding user", restart supabase-auth and retry
if echo "$RESULT" | grep -q "Database error"; then
docker restart supabase-auth && sleep 5
curl -s -X POST 'http://localhost:8000/auth/v1/signup' \
-H "apikey: $ANON_KEY" \
-H 'Content-Type: application/json' \
-d '{"email":"test@test.com","password":"testtest123"}'
fi
# Get auth token
TOKEN=$(curl -s -X POST 'http://localhost:8000/auth/v1/token?grant_type=password' \
-H "apikey: $ANON_KEY" \
-H 'Content-Type: application/json' \
-d '{"email":"test@test.com","password":"testtest123"}' | jq -r '.access_token // ""')
```
**Use this token for ALL API calls:**
```bash
curl -H "Authorization: Bearer $TOKEN" http://localhost:8006/api/...
```
## Step 4: Run tests
### Service ports reference
| Service | Port | URL |
|---------|------|-----|
| Frontend | 3000 | http://localhost:3000 |
| Backend REST | 8006 | http://localhost:8006 |
| Supabase Auth (via Kong) | 8000 | http://localhost:8000 |
| Executor | 8002 | http://localhost:8002 |
| Copilot Executor | 8008 | http://localhost:8008 |
| WebSocket | 8001 | http://localhost:8001 |
| Database Manager | 8005 | http://localhost:8005 |
| Redis | 6379 | localhost:6379 |
| RabbitMQ | 5672 | localhost:5672 |
### API testing
Use `curl` with the auth token for backend API tests:
```bash
# Example: List agents
curl -s -H "Authorization: Bearer $TOKEN" http://localhost:8006/api/graphs | jq . | head -20
# Example: Create an agent
curl -s -X POST http://localhost:8006/api/graphs \
-H "Authorization: Bearer $TOKEN" \
-H 'Content-Type: application/json' \
-d '{...}' | jq .
# Example: Run an agent
curl -s -X POST "http://localhost:8006/api/graphs/{graph_id}/execute" \
-H "Authorization: Bearer $TOKEN" \
-H 'Content-Type: application/json' \
-d '{"data": {...}}'
# Example: Get execution results
curl -s -H "Authorization: Bearer $TOKEN" \
"http://localhost:8006/api/graphs/{graph_id}/executions/{exec_id}" | jq .
```
### Browser testing with agent-browser
```bash
# Close any existing session
agent-browser close 2>/dev/null || true
# Use --session-name to persist cookies across navigations
# This means login only needs to happen once per test session
agent-browser --session-name pr-test open 'http://localhost:3000/login' --timeout 15000
# Get interactive elements
agent-browser --session-name pr-test snapshot | grep "textbox\|button"
# Login
agent-browser --session-name pr-test fill {email_ref} "test@test.com"
agent-browser --session-name pr-test fill {password_ref} "testtest123"
agent-browser --session-name pr-test click {login_button_ref}
sleep 5
# Dismiss cookie banner if present
agent-browser --session-name pr-test click 'text=Accept All' 2>/dev/null || true
# Navigate — cookies are preserved so login persists
agent-browser --session-name pr-test open 'http://localhost:3000/copilot' --timeout 10000
# Take screenshot
agent-browser --session-name pr-test screenshot $RESULTS_DIR/01-page.png
# Interact with elements
agent-browser --session-name pr-test fill {ref} "text"
agent-browser --session-name pr-test press "Enter"
agent-browser --session-name pr-test click {ref}
agent-browser --session-name pr-test click 'text=Button Text'
# Read page content
agent-browser --session-name pr-test snapshot | grep "text:"
```
**Key pages:**
- `/copilot` — CoPilot chat (for testing copilot features)
- `/build` — Agent builder (for testing block/node features)
- `/build?flowID={id}` — Specific agent in builder
- `/library` — Agent library (for testing listing/import features)
- `/library/agents/{id}` — Agent detail with run history
- `/marketplace` — Marketplace
### Checking logs
```bash
# Backend REST server
docker logs autogpt_platform-rest_server-1 2>&1 | tail -30
# Executor (runs agent graphs)
docker logs autogpt_platform-executor-1 2>&1 | tail -30
# Copilot executor (runs copilot chat sessions)
docker logs autogpt_platform-copilot_executor-1 2>&1 | tail -30
# Frontend
docker logs autogpt_platform-frontend-1 2>&1 | tail -30
# Filter for errors
docker logs autogpt_platform-executor-1 2>&1 | grep -i "error\|exception\|traceback" | tail -20
```
### Copilot chat testing
The copilot uses SSE streaming. To test via API:
```bash
# Create a session
SESSION_ID=$(curl -s -X POST 'http://localhost:8006/api/chat/sessions' \
-H "Authorization: Bearer $TOKEN" \
-H 'Content-Type: application/json' \
-d '{}' | jq -r '.id // .session_id // ""')
# Stream a message (SSE - will stream chunks)
curl -N -X POST "http://localhost:8006/api/chat/sessions/$SESSION_ID/stream" \
-H "Authorization: Bearer $TOKEN" \
-H 'Content-Type: application/json' \
-d '{"message": "Hello, what can you help me with?"}' \
--max-time 60 2>/dev/null | head -50
```
Or test via browser (preferred for UI verification):
```bash
agent-browser --session-name pr-test open 'http://localhost:3000/copilot' --timeout 10000
# ... fill chat input and press Enter, wait 20-30s for response
```
## Step 5: Record results
For each test scenario, record in `$RESULTS_DIR/test-report.md`:
```markdown
# E2E Test Report: PR #{N} — {title}
Date: {date}
Branch: {branch}
Worktree: {path}
## Environment
- Docker services: [list running containers]
- API keys: OpenRouter={present/missing}, E2B={present/missing}
## Test Results
### Scenario 1: {name}
**Steps:**
1. ...
2. ...
**Expected:** ...
**Actual:** ...
**Result:** PASS / FAIL
**Screenshot:** {filename}.png
**Logs:** (if relevant)
### Scenario 2: {name}
...
## Summary
- Total: X scenarios
- Passed: Y
- Failed: Z
- Bugs found: [list]
```
Take screenshots at each significant step:
```bash
agent-browser --session-name pr-test screenshot $RESULTS_DIR/{NN}-{description}.png
```
## Step 6: Report results
After all tests complete, output a summary to the user:
1. Table of all scenarios with PASS/FAIL
2. Screenshots of failures (read the PNG files to show them)
3. Any bugs found with details
4. Recommendations
### Post test results as PR comment with screenshots
Upload screenshots to the PR using the GitHub Git API (no local git operations — safe for worktrees).
```bash
# Upload screenshots via GitHub Git API (creates blobs, tree, commit, and ref remotely)
REPO="Significant-Gravitas/AutoGPT"
SCREENSHOTS_BRANCH="test-screenshots/pr-${PR_NUMBER}"
SCREENSHOTS_DIR="test-screenshots/PR-${PR_NUMBER}"
# Step 1: Create blobs for each screenshot
declare -a TREE_ENTRIES
for img in $RESULTS_DIR/*.png; do
BASENAME=$(basename "$img")
B64=$(base64 < "$img")
BLOB_SHA=$(gh api "repos/${REPO}/git/blobs" -f content="$B64" -f encoding="base64" --jq '.sha')
TREE_ENTRIES+=("-f" "tree[][path]=${SCREENSHOTS_DIR}/${BASENAME}" "-f" "tree[][mode]=100644" "-f" "tree[][type]=blob" "-f" "tree[][sha]=${BLOB_SHA}")
done
# Step 2: Create a tree with all screenshot blobs
# Build the tree JSON manually since gh api doesn't handle arrays well
TREE_JSON='['
FIRST=true
for img in $RESULTS_DIR/*.png; do
BASENAME=$(basename "$img")
B64=$(base64 < "$img")
BLOB_SHA=$(gh api "repos/${REPO}/git/blobs" -f content="$B64" -f encoding="base64" --jq '.sha')
if [ "$FIRST" = true ]; then FIRST=false; else TREE_JSON+=','; fi
TREE_JSON+="{\"path\":\"${SCREENSHOTS_DIR}/${BASENAME}\",\"mode\":\"100644\",\"type\":\"blob\",\"sha\":\"${BLOB_SHA}\"}"
done
TREE_JSON+=']'
TREE_SHA=$(echo "$TREE_JSON" | gh api "repos/${REPO}/git/trees" --input - -f base_tree="" --jq '.sha' 2>/dev/null \
|| echo "$TREE_JSON" | jq -c '{tree: .}' | gh api "repos/${REPO}/git/trees" --input - --jq '.sha')
# Step 3: Create a commit pointing to that tree
COMMIT_SHA=$(gh api "repos/${REPO}/git/commits" \
-f message="test: add E2E test screenshots for PR #${PR_NUMBER}" \
-f tree="$TREE_SHA" \
--jq '.sha')
# Step 4: Create or update the ref (branch) — no local checkout needed
gh api "repos/${REPO}/git/refs" \
-f ref="refs/heads/${SCREENSHOTS_BRANCH}" \
-f sha="$COMMIT_SHA" 2>/dev/null \
|| gh api "repos/${REPO}/git/refs/heads/${SCREENSHOTS_BRANCH}" \
-X PATCH -f sha="$COMMIT_SHA" -f force=true
# Step 5: Build image markdown and post the comment
REPO_URL="https://raw.githubusercontent.com/${REPO}/${SCREENSHOTS_BRANCH}"
IMAGE_MARKDOWN=""
for img in $RESULTS_DIR/*.png; do
BASENAME=$(basename "$img")
IMAGE_MARKDOWN="$IMAGE_MARKDOWN
![${BASENAME}](${REPO_URL}/${SCREENSHOTS_DIR}/${BASENAME})"
done
gh api "repos/${REPO}/issues/$PR_NUMBER/comments" -f body="$(cat <<EOF
## 🧪 E2E Test Report
$(cat $RESULTS_DIR/test-report.md)
### Screenshots
${IMAGE_MARKDOWN}
EOF
)"
```
This approach uses the GitHub Git API to create blobs, trees, commits, and refs entirely server-side. No local `git checkout` or `git push` — safe for worktrees and won't interfere with the PR branch.
## Fix mode (--fix flag)
When `--fix` is present, after finding a bug:
1. Identify the root cause in the code
2. Fix it in the worktree
3. Rebuild the affected service: `cd $PLATFORM_DIR && docker compose up --build -d {service_name}`
4. Re-test the scenario
5. If fix works, commit and push:
```bash
cd $WORKTREE_PATH
git add -A
git commit -m "fix: {description of fix}"
git push
```
6. Continue testing remaining scenarios
7. After all fixes, run the full test suite again to ensure no regressions
### Fix loop (like pr-address)
```text
test scenario → find bug → fix code → rebuild service → re-test
→ repeat until all scenarios pass
→ commit + push all fixes
→ run full re-test to verify
```
## Known issues and workarounds
### Problem: "Database error finding user" on signup
**Cause:** Supabase auth service schema cache is stale after migration.
**Fix:** `docker restart supabase-auth && sleep 5` then retry signup.
### Problem: Copilot returns auth errors in subscription mode
**Cause:** `CHAT_USE_CLAUDE_CODE_SUBSCRIPTION=true` but `CLAUDE_CODE_OAUTH_TOKEN` is not set or expired.
**Fix:** Re-extract the OAuth token from macOS keychain (see step 3b, Option 1) and recreate the container (`docker compose up -d copilot_executor`). The backend auto-provisions `~/.claude/.credentials.json` from the env var on startup. No `npm install` or `claude login` needed — the SDK bundles its own CLI binary.
### Problem: agent-browser can't find chromium
**Cause:** The Dockerfile auto-provisions system chromium on all architectures (including ARM64). If your branch is behind `dev`, this may not be present yet.
**Fix:** Check if chromium exists: `which chromium || which chromium-browser`. If missing, install it: `apt-get install -y chromium` and set `AGENT_BROWSER_EXECUTABLE_PATH=/usr/bin/chromium` in the container environment.
### Problem: agent-browser selector matches multiple elements
**Cause:** `text=X` matches all elements containing that text.
**Fix:** Use `agent-browser snapshot` to get specific `ref=eNN` references, then use those: `agent-browser click eNN`.
### Problem: Frontend shows cookie banner blocking interaction
**Fix:** `agent-browser click 'text=Accept All'` before other interactions.
### Problem: Container loses npm packages after rebuild
**Cause:** `docker compose up --build` rebuilds the image, losing runtime installs.
**Fix:** Add packages to the Dockerfile instead of installing at runtime.
### Problem: Services not starting after `docker compose up`
**Fix:** Wait and check health: `docker compose ps`. Common cause: migration hasn't finished. Check: `docker logs autogpt_platform-migrate-1 2>&1 | tail -5`. If supabase-db isn't healthy: `docker restart supabase-db && sleep 10`.
### Problem: Docker uses cached layers with old code (PR changes not visible)
**Cause:** `docker compose up --build` reuses cached `COPY` layers from previous builds. If the PR branch changes Python files but the previous build already cached that layer from `dev`, the container runs `dev` code.
**Fix:** Always use `docker compose build --no-cache` for the first build of a PR branch. Subsequent rebuilds within the same branch can use `--build`.
### Problem: `agent-browser open` loses login session
**Cause:** Without session persistence, `agent-browser open` starts fresh.
**Fix:** Use `--session-name pr-test` on ALL agent-browser commands. This auto-saves/restores cookies and localStorage across navigations. Alternatively, use `agent-browser eval "window.location.href = '...'"` to navigate within the same context.
### Problem: Supabase auth returns "Database error querying schema"
**Cause:** The database schema changed (migration ran) but supabase-auth has a stale schema cache.
**Fix:** `docker restart supabase-db && sleep 10 && docker restart supabase-auth && sleep 8`. If user data was lost, re-signup.

View File

@@ -27,91 +27,10 @@ defaults:
working-directory: autogpt_platform/backend
jobs:
lint:
permissions:
contents: read
timeout-minutes: 10
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@v6
- name: Set up Python 3.12
uses: actions/setup-python@v5
with:
python-version: "3.12"
- name: Set up Python dependency cache
uses: actions/cache@v5
with:
path: ~/.cache/pypoetry
key: poetry-${{ runner.os }}-py3.12-${{ hashFiles('autogpt_platform/backend/poetry.lock') }}
- name: Install Poetry
run: |
HEAD_POETRY_VERSION=$(python ../../.github/workflows/scripts/get_package_version_from_lockfile.py poetry)
echo "Using Poetry version ${HEAD_POETRY_VERSION}"
curl -sSL https://install.python-poetry.org | POETRY_VERSION=$HEAD_POETRY_VERSION python3 -
- name: Install Python dependencies
run: poetry install
- name: Run Linters
run: poetry run lint --skip-pyright
env:
CI: true
PLAIN_OUTPUT: True
type-check:
permissions:
contents: read
timeout-minutes: 10
strategy:
fail-fast: false
matrix:
python-version: ["3.11", "3.12", "3.13"]
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@v6
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
- name: Set up Python dependency cache
uses: actions/cache@v5
with:
path: ~/.cache/pypoetry
key: poetry-${{ runner.os }}-py${{ matrix.python-version }}-${{ hashFiles('autogpt_platform/backend/poetry.lock') }}
- name: Install Poetry
run: |
HEAD_POETRY_VERSION=$(python ../../.github/workflows/scripts/get_package_version_from_lockfile.py poetry)
echo "Using Poetry version ${HEAD_POETRY_VERSION}"
curl -sSL https://install.python-poetry.org | POETRY_VERSION=$HEAD_POETRY_VERSION python3 -
- name: Install Python dependencies
run: poetry install
- name: Generate Prisma Client
run: poetry run prisma generate && poetry run gen-prisma-stub
- name: Run Pyright
run: poetry run pyright --pythonversion ${{ matrix.python-version }}
env:
CI: true
PLAIN_OUTPUT: True
test:
permissions:
contents: read
timeout-minutes: 15
timeout-minutes: 30
strategy:
fail-fast: false
matrix:
@@ -179,9 +98,9 @@ jobs:
uses: actions/cache@v5
with:
path: ~/.cache/pypoetry
key: poetry-${{ runner.os }}-py${{ matrix.python-version }}-${{ hashFiles('autogpt_platform/backend/poetry.lock') }}
key: poetry-${{ runner.os }}-${{ hashFiles('autogpt_platform/backend/poetry.lock') }}
- name: Install Poetry
- name: Install Poetry (Unix)
run: |
# Extract Poetry version from backend/poetry.lock
HEAD_POETRY_VERSION=$(python ../../.github/workflows/scripts/get_package_version_from_lockfile.py poetry)
@@ -239,22 +158,22 @@ jobs:
echo "Waiting for ClamAV daemon to start..."
max_attempts=60
attempt=0
until nc -z localhost 3310 || [ $attempt -eq $max_attempts ]; do
echo "ClamAV is unavailable - sleeping (attempt $((attempt+1))/$max_attempts)"
sleep 5
attempt=$((attempt+1))
done
if [ $attempt -eq $max_attempts ]; then
echo "ClamAV failed to start after $((max_attempts*5)) seconds"
echo "Checking ClamAV service logs..."
docker logs $(docker ps -q --filter "ancestor=clamav/clamav-debian:latest") 2>&1 | tail -50 || echo "No ClamAV container found"
exit 1
fi
echo "ClamAV is ready!"
# Verify ClamAV is responsive
echo "Testing ClamAV connection..."
timeout 10 bash -c 'echo "PING" | nc localhost 3310' || {
@@ -269,13 +188,18 @@ jobs:
DATABASE_URL: ${{ steps.supabase.outputs.DB_URL }}
DIRECT_URL: ${{ steps.supabase.outputs.DB_URL }}
- name: Run pytest
- id: lint
name: Run Linter
run: poetry run lint
- name: Run pytest with coverage
run: |
if [[ "${{ runner.debug }}" == "1" ]]; then
poetry run pytest -s -vv -o log_cli=true -o log_cli_level=DEBUG
else
poetry run pytest -s -vv
fi
if: success() || (failure() && steps.lint.outcome == 'failure')
env:
LOG_LEVEL: ${{ runner.debug && 'DEBUG' || 'INFO' }}
DATABASE_URL: ${{ steps.supabase.outputs.DB_URL }}
@@ -287,12 +211,6 @@ jobs:
REDIS_PORT: "6379"
ENCRYPTION_KEY: "dvziYgz0KSK8FENhju0ZYi8-fRTfAdlz6YLhdB_jhNw=" # DO NOT USE IN PRODUCTION!!
# - name: Upload coverage reports to Codecov
# uses: codecov/codecov-action@v4
# with:
# token: ${{ secrets.CODECOV_TOKEN }}
# flags: backend,${{ runner.os }}
env:
CI: true
PLAIN_OUTPUT: True
@@ -306,3 +224,9 @@ jobs:
# the backend service, docker composes, and examples
RABBITMQ_DEFAULT_USER: "rabbitmq_user_default"
RABBITMQ_DEFAULT_PASS: "k0VMxyIJF9S35f3x2uaw5IWAl6Y536O7"
# - name: Upload coverage reports to Codecov
# uses: codecov/codecov-action@v4
# with:
# token: ${{ secrets.CODECOV_TOKEN }}
# flags: backend,${{ runner.os }}

View File

@@ -294,7 +294,7 @@ jobs:
uses: actions/upload-artifact@v4
with:
name: playwright-report
path: autogpt_platform/frontend/playwright-report
path: playwright-report
if-no-files-found: ignore
retention-days: 3
@@ -303,7 +303,7 @@ jobs:
uses: actions/upload-artifact@v4
with:
name: playwright-test-results
path: autogpt_platform/frontend/test-results
path: test-results
if-no-files-found: ignore
retention-days: 3

View File

@@ -56,35 +56,15 @@ AutoGPT Platform is a monorepo containing:
- Ensure the branch name is descriptive (e.g., `feature/add-new-block`)
- Use conventional commit messages (see below)
- Fill out the .github/PULL_REQUEST_TEMPLATE.md template as the PR description
- Always use `--body-file` to pass PR body — avoids shell interpretation of backticks and special characters:
```bash
PR_BODY=$(mktemp)
cat > "$PR_BODY" << 'PREOF'
## Summary
- use `backticks` freely here
PREOF
gh pr create --title "..." --body-file "$PR_BODY" --base dev
rm "$PR_BODY"
```
- Run the github pre-commit hooks to ensure code quality.
### Test-Driven Development (TDD)
When fixing a bug or adding a feature, follow a test-first approach:
1. **Write a failing test first** — create a test that reproduces the bug or validates the new behavior, marked with `@pytest.mark.xfail` (backend) or `.fixme` (Playwright). Run it to confirm it fails for the right reason.
2. **Implement the fix/feature** — write the minimal code to make the test pass.
3. **Remove the xfail marker** — once the test passes, remove the `xfail`/`.fixme` annotation and run the full test suite to confirm nothing else broke.
This ensures every change is covered by a test and that the test actually validates the intended behavior.
### Reviewing/Revising Pull Requests
Use `/pr-review` to review a PR or `/pr-address` to address comments.
When fetching comments manually:
- `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews --paginate` — top-level reviews
- `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments --paginate` — inline review comments (always paginate to avoid missing comments beyond page 1)
- `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews` — top-level reviews
- `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments` — inline review comments
- `gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments` — PR conversation comments
### Conventional Commits

View File

@@ -37,6 +37,10 @@ JWT_VERIFY_KEY=your-super-secret-jwt-token-with-at-least-32-characters-long
ENCRYPTION_KEY=dvziYgz0KSK8FENhju0ZYi8-fRTfAdlz6YLhdB_jhNw=
UNSUBSCRIBE_SECRET_KEY=HlP8ivStJjmbf6NKi78m_3FnOogut0t5ckzjsIqeaio=
## ===== SIGNUP / INVITE GATE ===== ##
# Set to true to require an invite before users can sign up
ENABLE_INVITE_GATE=false
## ===== IMPORTANT OPTIONAL CONFIGURATION ===== ##
# Platform URLs (set these for webhooks and OAuth to work)
PLATFORM_BASE_URL=http://localhost:8000

View File

@@ -66,7 +66,7 @@ poetry run pytest path/to/test.py --snapshot-update
- **No linter suppressors** — no `# type: ignore`, `# noqa`, `# pyright: ignore`; fix the type/code
- **List comprehensions** over manual loop-and-append
- **Early return** — guard clauses first, avoid deep nesting
- **f-strings vs printf syntax in log statements** — Use `%s` for deferred interpolation in `debug` statements, f-strings elsewhere for readability: `logger.debug("Processing %s items", count)`, `logger.info(f"Processing {count} items")`
- **Lazy `%s` logging** — `logger.info("Processing %s items", count)` not `logger.info(f"Processing {count} items")`
- **Sanitize error paths** — `os.path.basename()` in error messages to avoid leaking directory structure
- **TOCTOU awareness** — avoid check-then-act patterns for file access and credit charging
- **`Security()` vs `Depends()`** — use `Security()` for auth deps to get proper OpenAPI security spec
@@ -75,7 +75,6 @@ poetry run pytest path/to/test.py --snapshot-update
- **SSE protocol** — `data:` lines for frontend-parsed events (must match Zod schema), `: comment` lines for heartbeats/status
- **File length** — keep files under ~300 lines; if a file grows beyond this, split by responsibility (e.g. extract helpers, models, or a sub-module into a new file). Never keep appending to a long file.
- **Function length** — keep functions under ~40 lines; extract named helpers when a function grows longer. Long functions are a sign of mixed concerns, not complexity.
- **Top-down ordering** — define the main/public function or class first, then the helpers it uses below. A reader should encounter high-level logic before implementation details.
## Testing Approach
@@ -85,30 +84,6 @@ poetry run pytest path/to/test.py --snapshot-update
- After refactoring, update mock targets to match new module paths
- Use `AsyncMock` for async functions (`from unittest.mock import AsyncMock`)
### Test-Driven Development (TDD)
When fixing a bug or adding a feature, write the test **before** the implementation:
```python
# 1. Write a failing test marked xfail
@pytest.mark.xfail(reason="Bug #1234: widget crashes on empty input")
def test_widget_handles_empty_input():
result = widget.process("")
assert result == Widget.EMPTY_RESULT
# 2. Run it — confirm it fails (XFAIL)
# poetry run pytest path/to/test.py::test_widget_handles_empty_input -xvs
# 3. Implement the fix
# 4. Remove xfail, run again — confirm it passes
def test_widget_handles_empty_input():
result = widget.process("")
assert result == Widget.EMPTY_RESULT
```
This catches regressions and proves the fix actually works. **Every bug fix should include a test that would have caught it.**
## Database Schema
Key models (defined in `schema.prisma`):

View File

@@ -50,7 +50,7 @@ RUN poetry install --no-ansi --no-root
# Generate Prisma client
COPY autogpt_platform/backend/schema.prisma ./
COPY autogpt_platform/backend/backend/data/partial_types.py ./backend/data/partial_types.py
COPY autogpt_platform/backend/scripts/gen_prisma_types_stub.py ./scripts/
COPY autogpt_platform/backend/gen_prisma_types_stub.py ./
RUN poetry run prisma generate && poetry run gen-prisma-stub
# =============================== DB MIGRATOR =============================== #
@@ -82,7 +82,7 @@ RUN pip3 install prisma>=0.15.0 --break-system-packages
COPY autogpt_platform/backend/schema.prisma ./
COPY autogpt_platform/backend/backend/data/partial_types.py ./backend/data/partial_types.py
COPY autogpt_platform/backend/scripts/gen_prisma_types_stub.py ./scripts/
COPY autogpt_platform/backend/gen_prisma_types_stub.py ./
COPY autogpt_platform/backend/migrations ./migrations
# ============================== BACKEND SERVER ============================== #
@@ -121,37 +121,19 @@ RUN ln -s ../lib/node_modules/npm/bin/npm-cli.js /usr/bin/npm \
&& ln -s ../lib/node_modules/npm/bin/npx-cli.js /usr/bin/npx
COPY --from=builder /root/.cache/prisma-python/binaries /root/.cache/prisma-python/binaries
# Install agent-browser (Copilot browser tool) + Chromium.
# On amd64: install runtime libs + run `agent-browser install` to download
# Chrome for Testing (pinned version, tested with Playwright).
# On arm64: install system chromium package — Chrome for Testing has no ARM64
# binary. AGENT_BROWSER_EXECUTABLE_PATH is set at runtime by the entrypoint
# script (below) to redirect agent-browser to the system binary.
ARG TARGETARCH
RUN apt-get update \
&& if [ "$TARGETARCH" = "arm64" ]; then \
apt-get install -y --no-install-recommends chromium fonts-liberation; \
else \
apt-get install -y --no-install-recommends \
libnss3 libnspr4 libatk1.0-0 libatk-bridge2.0-0 libcups2 libdrm2 \
libdbus-1-3 libxkbcommon0 libatspi2.0-0t64 libxcomposite1 libxdamage1 \
libxfixes3 libxrandr2 libgbm1 libasound2t64 libpango-1.0-0 libcairo2 \
libx11-6 libx11-xcb1 libxcb1 libxext6 libglib2.0-0t64 \
fonts-liberation libfontconfig1; \
fi \
# Install agent-browser (Copilot browser tool) + Chromium runtime dependencies.
# These are the runtime libraries Chromium/Playwright needs on Debian 13 (trixie).
RUN apt-get update && apt-get install -y --no-install-recommends \
libnss3 libnspr4 libatk1.0-0 libatk-bridge2.0-0 libcups2 libdrm2 \
libdbus-1-3 libxkbcommon0 libatspi2.0-0t64 libxcomposite1 libxdamage1 \
libxfixes3 libxrandr2 libgbm1 libasound2t64 libpango-1.0-0 libcairo2 \
libx11-6 libx11-xcb1 libxcb1 libxext6 libglib2.0-0t64 \
fonts-liberation libfontconfig1 \
&& rm -rf /var/lib/apt/lists/* \
&& npm install -g agent-browser \
&& ([ "$TARGETARCH" = "arm64" ] || agent-browser install) \
&& agent-browser install \
&& rm -rf /tmp/* /root/.npm
# On arm64 the system chromium is at /usr/bin/chromium; set
# AGENT_BROWSER_EXECUTABLE_PATH so agent-browser's daemon uses it instead of
# Chrome for Testing (which has no ARM64 binary). On amd64 the variable is left
# unset so agent-browser uses the Chrome for Testing binary it downloaded above.
RUN printf '#!/bin/sh\n[ -x /usr/bin/chromium ] && export AGENT_BROWSER_EXECUTABLE_PATH=/usr/bin/chromium\nexec "$@"\n' \
> /usr/local/bin/entrypoint.sh \
&& chmod +x /usr/local/bin/entrypoint.sh
WORKDIR /app/autogpt_platform/backend
# Copy only the .venv from builder (not the entire /app directory)
@@ -173,5 +155,4 @@ RUN POETRY_VIRTUALENVS_CREATE=true POETRY_VIRTUALENVS_IN_PROJECT=true \
ENV PORT=8000
ENTRYPOINT ["/usr/local/bin/entrypoint.sh"]
CMD ["rest"]

View File

@@ -1,8 +1,17 @@
from pydantic import BaseModel
from __future__ import annotations
from datetime import datetime
from typing import TYPE_CHECKING, Any, Literal, Optional
import prisma.enums
from pydantic import BaseModel, EmailStr
from backend.data.model import UserTransaction
from backend.util.models import Pagination
if TYPE_CHECKING:
from backend.data.invited_user import BulkInvitedUsersResult, InvitedUserRecord
class UserHistoryResponse(BaseModel):
"""Response model for listings with version history"""
@@ -14,3 +23,70 @@ class UserHistoryResponse(BaseModel):
class AddUserCreditsResponse(BaseModel):
new_balance: int
transaction_key: str
class CreateInvitedUserRequest(BaseModel):
email: EmailStr
name: Optional[str] = None
class InvitedUserResponse(BaseModel):
id: str
email: str
status: prisma.enums.InvitedUserStatus
auth_user_id: Optional[str] = None
name: Optional[str] = None
tally_understanding: Optional[dict[str, Any]] = None
tally_status: prisma.enums.TallyComputationStatus
tally_computed_at: Optional[datetime] = None
tally_error: Optional[str] = None
created_at: datetime
updated_at: datetime
@classmethod
def from_record(cls, record: InvitedUserRecord) -> InvitedUserResponse:
return cls.model_validate(record.model_dump())
class InvitedUsersResponse(BaseModel):
invited_users: list[InvitedUserResponse]
pagination: Pagination
class BulkInvitedUserRowResponse(BaseModel):
row_number: int
email: Optional[str] = None
name: Optional[str] = None
status: Literal["CREATED", "SKIPPED", "ERROR"]
message: str
invited_user: Optional[InvitedUserResponse] = None
class BulkInvitedUsersResponse(BaseModel):
created_count: int
skipped_count: int
error_count: int
results: list[BulkInvitedUserRowResponse]
@classmethod
def from_result(cls, result: BulkInvitedUsersResult) -> BulkInvitedUsersResponse:
return cls(
created_count=result.created_count,
skipped_count=result.skipped_count,
error_count=result.error_count,
results=[
BulkInvitedUserRowResponse(
row_number=row.row_number,
email=row.email,
name=row.name,
status=row.status,
message=row.message,
invited_user=(
InvitedUserResponse.from_record(row.invited_user)
if row.invited_user is not None
else None
),
)
for row in result.results
],
)

View File

@@ -0,0 +1,137 @@
import logging
import math
from autogpt_libs.auth import get_user_id, requires_admin_user
from fastapi import APIRouter, File, Query, Security, UploadFile
from backend.data.invited_user import (
bulk_create_invited_users_from_file,
create_invited_user,
list_invited_users,
retry_invited_user_tally,
revoke_invited_user,
)
from backend.data.tally import mask_email
from backend.util.models import Pagination
from .model import (
BulkInvitedUsersResponse,
CreateInvitedUserRequest,
InvitedUserResponse,
InvitedUsersResponse,
)
logger = logging.getLogger(__name__)
router = APIRouter(
prefix="/admin",
tags=["users", "admin"],
dependencies=[Security(requires_admin_user)],
)
@router.get(
"/invited-users",
response_model=InvitedUsersResponse,
summary="List Invited Users",
)
async def get_invited_users(
admin_user_id: str = Security(get_user_id),
page: int = Query(1, ge=1),
page_size: int = Query(50, ge=1, le=200),
) -> InvitedUsersResponse:
logger.info("Admin user %s requested invited users", admin_user_id)
invited_users, total = await list_invited_users(page=page, page_size=page_size)
return InvitedUsersResponse(
invited_users=[InvitedUserResponse.from_record(iu) for iu in invited_users],
pagination=Pagination(
total_items=total,
total_pages=max(1, math.ceil(total / page_size)),
current_page=page,
page_size=page_size,
),
)
@router.post(
"/invited-users",
response_model=InvitedUserResponse,
summary="Create Invited User",
)
async def create_invited_user_route(
request: CreateInvitedUserRequest,
admin_user_id: str = Security(get_user_id),
) -> InvitedUserResponse:
logger.info(
"Admin user %s creating invited user for %s",
admin_user_id,
mask_email(request.email),
)
invited_user = await create_invited_user(request.email, request.name)
logger.info(
"Admin user %s created invited user %s",
admin_user_id,
invited_user.id,
)
return InvitedUserResponse.from_record(invited_user)
@router.post(
"/invited-users/bulk",
response_model=BulkInvitedUsersResponse,
summary="Bulk Create Invited Users",
operation_id="postV2BulkCreateInvitedUsers",
)
async def bulk_create_invited_users_route(
file: UploadFile = File(...),
admin_user_id: str = Security(get_user_id),
) -> BulkInvitedUsersResponse:
logger.info(
"Admin user %s bulk invited users from %s",
admin_user_id,
file.filename or "<unnamed>",
)
content = await file.read()
result = await bulk_create_invited_users_from_file(file.filename, content)
return BulkInvitedUsersResponse.from_result(result)
@router.post(
"/invited-users/{invited_user_id}/revoke",
response_model=InvitedUserResponse,
summary="Revoke Invited User",
)
async def revoke_invited_user_route(
invited_user_id: str,
admin_user_id: str = Security(get_user_id),
) -> InvitedUserResponse:
logger.info(
"Admin user %s revoking invited user %s", admin_user_id, invited_user_id
)
invited_user = await revoke_invited_user(invited_user_id)
logger.info("Admin user %s revoked invited user %s", admin_user_id, invited_user_id)
return InvitedUserResponse.from_record(invited_user)
@router.post(
"/invited-users/{invited_user_id}/retry-tally",
response_model=InvitedUserResponse,
summary="Retry Invited User Tally",
)
async def retry_invited_user_tally_route(
invited_user_id: str,
admin_user_id: str = Security(get_user_id),
) -> InvitedUserResponse:
logger.info(
"Admin user %s retrying Tally seed for invited user %s",
admin_user_id,
invited_user_id,
)
invited_user = await retry_invited_user_tally(invited_user_id)
logger.info(
"Admin user %s retried Tally seed for invited user %s",
admin_user_id,
invited_user_id,
)
return InvitedUserResponse.from_record(invited_user)

View File

@@ -0,0 +1,168 @@
from datetime import datetime, timezone
from unittest.mock import AsyncMock
import fastapi
import fastapi.testclient
import prisma.enums
import pytest
import pytest_mock
from autogpt_libs.auth.jwt_utils import get_jwt_payload
from backend.data.invited_user import (
BulkInvitedUserRowResult,
BulkInvitedUsersResult,
InvitedUserRecord,
)
from .user_admin_routes import router as user_admin_router
app = fastapi.FastAPI()
app.include_router(user_admin_router)
client = fastapi.testclient.TestClient(app)
@pytest.fixture(autouse=True)
def setup_app_admin_auth(mock_jwt_admin):
app.dependency_overrides[get_jwt_payload] = mock_jwt_admin["get_jwt_payload"]
yield
app.dependency_overrides.clear()
def _sample_invited_user() -> InvitedUserRecord:
now = datetime.now(timezone.utc)
return InvitedUserRecord(
id="invite-1",
email="invited@example.com",
status=prisma.enums.InvitedUserStatus.INVITED,
auth_user_id=None,
name="Invited User",
tally_understanding=None,
tally_status=prisma.enums.TallyComputationStatus.PENDING,
tally_computed_at=None,
tally_error=None,
created_at=now,
updated_at=now,
)
def _sample_bulk_invited_users_result() -> BulkInvitedUsersResult:
return BulkInvitedUsersResult(
created_count=1,
skipped_count=1,
error_count=0,
results=[
BulkInvitedUserRowResult(
row_number=1,
email="invited@example.com",
name=None,
status="CREATED",
message="Invite created",
invited_user=_sample_invited_user(),
),
BulkInvitedUserRowResult(
row_number=2,
email="duplicate@example.com",
name=None,
status="SKIPPED",
message="An invited user with this email already exists",
invited_user=None,
),
],
)
def test_get_invited_users(
mocker: pytest_mock.MockerFixture,
) -> None:
mocker.patch(
"backend.api.features.admin.user_admin_routes.list_invited_users",
AsyncMock(return_value=([_sample_invited_user()], 1)),
)
response = client.get("/admin/invited-users")
assert response.status_code == 200
data = response.json()
assert len(data["invited_users"]) == 1
assert data["invited_users"][0]["email"] == "invited@example.com"
assert data["invited_users"][0]["status"] == "INVITED"
assert data["pagination"]["total_items"] == 1
assert data["pagination"]["current_page"] == 1
assert data["pagination"]["page_size"] == 50
def test_create_invited_user(
mocker: pytest_mock.MockerFixture,
) -> None:
mocker.patch(
"backend.api.features.admin.user_admin_routes.create_invited_user",
AsyncMock(return_value=_sample_invited_user()),
)
response = client.post(
"/admin/invited-users",
json={"email": "invited@example.com", "name": "Invited User"},
)
assert response.status_code == 200
data = response.json()
assert data["email"] == "invited@example.com"
assert data["name"] == "Invited User"
def test_bulk_create_invited_users(
mocker: pytest_mock.MockerFixture,
) -> None:
mocker.patch(
"backend.api.features.admin.user_admin_routes.bulk_create_invited_users_from_file",
AsyncMock(return_value=_sample_bulk_invited_users_result()),
)
response = client.post(
"/admin/invited-users/bulk",
files={
"file": ("invites.txt", b"invited@example.com\nduplicate@example.com\n")
},
)
assert response.status_code == 200
data = response.json()
assert data["created_count"] == 1
assert data["skipped_count"] == 1
assert data["results"][0]["status"] == "CREATED"
assert data["results"][1]["status"] == "SKIPPED"
def test_revoke_invited_user(
mocker: pytest_mock.MockerFixture,
) -> None:
revoked = _sample_invited_user().model_copy(
update={"status": prisma.enums.InvitedUserStatus.REVOKED}
)
mocker.patch(
"backend.api.features.admin.user_admin_routes.revoke_invited_user",
AsyncMock(return_value=revoked),
)
response = client.post("/admin/invited-users/invite-1/revoke")
assert response.status_code == 200
assert response.json()["status"] == "REVOKED"
def test_retry_invited_user_tally(
mocker: pytest_mock.MockerFixture,
) -> None:
retried = _sample_invited_user().model_copy(
update={"tally_status": prisma.enums.TallyComputationStatus.RUNNING}
)
mocker.patch(
"backend.api.features.admin.user_admin_routes.retry_invited_user_tally",
AsyncMock(return_value=retried),
)
response = client.post("/admin/invited-users/invite-1/retry-tally")
assert response.status_code == 200
assert response.json()["tally_status"] == "RUNNING"

View File

@@ -4,12 +4,14 @@ from difflib import SequenceMatcher
from typing import Any, Sequence, get_args, get_origin
import prisma
from prisma.enums import ContentType
from prisma.models import mv_suggested_blocks
import backend.api.features.library.db as library_db
import backend.api.features.library.model as library_model
import backend.api.features.store.db as store_db
import backend.api.features.store.model as store_model
from backend.api.features.store.hybrid_search import unified_hybrid_search
from backend.blocks import load_all_blocks
from backend.blocks._base import (
AnyBlockSchema,
@@ -22,7 +24,6 @@ from backend.blocks.llm import LlmModel
from backend.integrations.providers import ProviderName
from backend.util.cache import cached
from backend.util.models import Pagination
from backend.util.text import split_camelcase
from .model import (
BlockCategoryResponse,
@@ -270,7 +271,7 @@ async def _build_cached_search_results(
# Use hybrid search when query is present, otherwise list all blocks
if (include_blocks or include_integrations) and normalized_query:
block_results, block_total, integration_total = await _text_search_blocks(
block_results, block_total, integration_total = await _hybrid_search_blocks(
query=search_query,
include_blocks=include_blocks,
include_integrations=include_integrations,
@@ -382,75 +383,117 @@ def _collect_block_results(
return results, block_count, integration_count
async def _text_search_blocks(
async def _hybrid_search_blocks(
*,
query: str,
include_blocks: bool,
include_integrations: bool,
) -> tuple[list[_ScoredItem], int, int]:
"""
Search blocks using in-memory text matching over the block registry.
Search blocks using hybrid search with builder-specific filtering.
All blocks are already loaded in memory, so this is fast and reliable
regardless of whether OpenAI embeddings are available.
Uses unified_hybrid_search for semantic + lexical search, then applies
post-filtering for block/integration types and scoring adjustments.
Scoring:
- Base: text relevance via _score_primary_fields, plus BLOCK_SCORE_BOOST
- Base: hybrid relevance score (0-1) scaled to 0-100, plus BLOCK_SCORE_BOOST
to prioritize blocks over marketplace agents in combined results
- +30 for exact name match, +15 for prefix name match
- +20 if the block has an LlmModel field and the query matches an LLM model name
Args:
query: The search query string
include_blocks: Whether to include regular blocks
include_integrations: Whether to include integration blocks
Returns:
Tuple of (scored_items, block_count, integration_count)
"""
results: list[_ScoredItem] = []
block_count = 0
integration_count = 0
if not include_blocks and not include_integrations:
return results, 0, 0
return results, block_count, integration_count
normalized_query = query.strip().lower()
all_results, _, _ = _collect_block_results(
include_blocks=include_blocks,
include_integrations=include_integrations,
# Fetch more results to account for post-filtering
search_results, _ = await unified_hybrid_search(
query=query,
content_types=[ContentType.BLOCK],
page=1,
page_size=150,
min_score=0.10,
)
# Load all blocks for getting BlockInfo
all_blocks = load_all_blocks()
for item in all_results:
block_info = item.item
assert isinstance(block_info, BlockInfo)
name = split_camelcase(block_info.name).lower()
for result in search_results:
block_id = result["content_id"]
# Build rich description including input field descriptions,
# matching the searchable text that the embedding pipeline uses
desc_parts = [block_info.description or ""]
block_cls = all_blocks.get(block_info.id)
if block_cls is not None:
block: AnyBlockSchema = block_cls()
desc_parts += [
f"{f}: {info.description}"
for f, info in block.input_schema.model_fields.items()
if info.description
]
description = " ".join(desc_parts).lower()
# Skip excluded blocks
if block_id in EXCLUDED_BLOCK_IDS:
continue
score = _score_primary_fields(name, description, normalized_query)
metadata = result.get("metadata", {})
hybrid_score = result.get("relevance", 0.0)
# Get the actual block class
if block_id not in all_blocks:
continue
block_cls = all_blocks[block_id]
block: AnyBlockSchema = block_cls()
if block.disabled:
continue
# Check block/integration filter using metadata
is_integration = metadata.get("is_integration", False)
if is_integration and not include_integrations:
continue
if not is_integration and not include_blocks:
continue
# Get block info
block_info = block.get_info()
# Calculate final score: scale hybrid score and add builder-specific bonuses
# Hybrid scores are 0-1, builder scores were 0-200+
# Add BLOCK_SCORE_BOOST to prioritize blocks over marketplace agents
final_score = hybrid_score * 100 + BLOCK_SCORE_BOOST
# Add LLM model match bonus
if block_cls is not None and _matches_llm_model(
block_cls().input_schema, normalized_query
):
score += 20
has_llm_field = metadata.get("has_llm_model_field", False)
if has_llm_field and _matches_llm_model(block.input_schema, normalized_query):
final_score += 20
if score >= MIN_SCORE_FOR_FILTERED_RESULTS:
results.append(
_ScoredItem(
item=block_info,
filter_type=item.filter_type,
score=score + BLOCK_SCORE_BOOST,
sort_key=name,
)
# Add exact/prefix match bonus for deterministic tie-breaking
name = block_info.name.lower()
if name == normalized_query:
final_score += 30
elif name.startswith(normalized_query):
final_score += 15
# Track counts
filter_type: FilterType = "integrations" if is_integration else "blocks"
if is_integration:
integration_count += 1
else:
block_count += 1
results.append(
_ScoredItem(
item=block_info,
filter_type=filter_type,
score=final_score,
sort_key=name,
)
)
block_count = sum(1 for r in results if r.filter_type == "blocks")
integration_count = sum(1 for r in results if r.filter_type == "integrations")
return results, block_count, integration_count

View File

@@ -60,6 +60,7 @@ from backend.copilot.tools.models import (
)
from backend.copilot.tracking import track_user_message
from backend.data.redis_client import get_redis_async
from backend.data.understanding import get_business_understanding
from backend.data.workspace import get_or_create_workspace
from backend.util.exceptions import NotFoundError
@@ -894,6 +895,36 @@ async def session_assign_user(
return {"status": "ok"}
# ========== Suggested Prompts ==========
class SuggestedPromptsResponse(BaseModel):
"""Response model for user-specific suggested prompts."""
prompts: list[str]
@router.get(
"/suggested-prompts",
dependencies=[Security(auth.requires_user)],
)
async def get_suggested_prompts(
user_id: Annotated[str, Security(auth.get_user_id)],
) -> SuggestedPromptsResponse:
"""
Get LLM-generated suggested prompts for the authenticated user.
Returns personalized quick-action prompts based on the user's
business understanding. Returns an empty list if no custom prompts
are available.
"""
understanding = await get_business_understanding(user_id)
if understanding is None:
return SuggestedPromptsResponse(prompts=[])
return SuggestedPromptsResponse(prompts=understanding.suggested_prompts)
# ========== Configuration ==========

View File

@@ -1,7 +1,7 @@
"""Tests for chat API routes: session title update, file attachment validation, usage, and rate limiting."""
"""Tests for chat API routes: session title update, file attachment validation, usage, rate limiting, and suggested prompts."""
from datetime import UTC, datetime, timedelta
from unittest.mock import AsyncMock
from unittest.mock import AsyncMock, MagicMock
import fastapi
import fastapi.testclient
@@ -400,3 +400,62 @@ def test_usage_rejects_unauthenticated_request() -> None:
response = unauthenticated_client.get("/usage")
assert response.status_code == 401
# ─── Suggested prompts endpoint ──────────────────────────────────────
def _mock_get_business_understanding(
mocker: pytest_mock.MockerFixture,
*,
return_value=None,
):
"""Mock get_business_understanding."""
return mocker.patch(
"backend.api.features.chat.routes.get_business_understanding",
new_callable=AsyncMock,
return_value=return_value,
)
def test_suggested_prompts_returns_prompts(
mocker: pytest_mock.MockerFixture,
test_user_id: str,
) -> None:
"""User with understanding and prompts gets them back."""
mock_understanding = MagicMock()
mock_understanding.suggested_prompts = ["Do X", "Do Y", "Do Z"]
_mock_get_business_understanding(mocker, return_value=mock_understanding)
response = client.get("/suggested-prompts")
assert response.status_code == 200
assert response.json() == {"prompts": ["Do X", "Do Y", "Do Z"]}
def test_suggested_prompts_no_understanding(
mocker: pytest_mock.MockerFixture,
test_user_id: str,
) -> None:
"""User with no understanding gets empty list."""
_mock_get_business_understanding(mocker, return_value=None)
response = client.get("/suggested-prompts")
assert response.status_code == 200
assert response.json() == {"prompts": []}
def test_suggested_prompts_empty_prompts(
mocker: pytest_mock.MockerFixture,
test_user_id: str,
) -> None:
"""User with understanding but no prompts gets empty list."""
mock_understanding = MagicMock()
mock_understanding.suggested_prompts = []
_mock_get_business_understanding(mocker, return_value=mock_understanding)
response = client.get("/suggested-prompts")
assert response.status_code == 200
assert response.json() == {"prompts": []}

View File

@@ -9,7 +9,7 @@ import prisma.errors
import prisma.models
import prisma.types
from backend.data.db import query_raw_with_schema, transaction
from backend.data.db import transaction
from backend.data.graph import (
GraphModel,
GraphModelWithoutNodes,
@@ -104,8 +104,7 @@ async def get_store_agents(
# search_used_hybrid remains False, will use fallback path below
# Convert hybrid search results (dict format) if hybrid succeeded
# Fall through to direct DB search if hybrid returned nothing
if search_used_hybrid and agents:
if search_used_hybrid:
total_pages = (total + page_size - 1) // page_size
store_agents: list[store_model.StoreAgent] = []
for agent in agents:
@@ -131,20 +130,52 @@ async def get_store_agents(
)
continue
if not search_used_hybrid or not agents:
# Fallback path: direct DB query with optional tsvector search.
# This mirrors the original pre-hybrid-search implementation.
store_agents, total = await _fallback_store_agent_search(
search_query=search_query,
featured=featured,
creators=creators,
category=category,
sorted_by=sorted_by,
page=page,
page_size=page_size,
if not search_used_hybrid:
# Fallback path - use basic search or no search
where_clause: prisma.types.StoreAgentWhereInput = {"is_available": True}
if featured:
where_clause["featured"] = featured
if creators:
where_clause["creator_username"] = {"in": creators}
if category:
where_clause["categories"] = {"has": category}
# Add basic text search if search_query provided but hybrid failed
if search_query:
where_clause["OR"] = [
{"agent_name": {"contains": search_query, "mode": "insensitive"}},
{"sub_heading": {"contains": search_query, "mode": "insensitive"}},
{"description": {"contains": search_query, "mode": "insensitive"}},
]
order_by = []
if sorted_by == StoreAgentsSortOptions.RATING:
order_by.append({"rating": "desc"})
elif sorted_by == StoreAgentsSortOptions.RUNS:
order_by.append({"runs": "desc"})
elif sorted_by == StoreAgentsSortOptions.NAME:
order_by.append({"agent_name": "asc"})
elif sorted_by == StoreAgentsSortOptions.UPDATED_AT:
order_by.append({"updated_at": "desc"})
db_agents = await prisma.models.StoreAgent.prisma().find_many(
where=where_clause,
order=order_by,
skip=(page - 1) * page_size,
take=page_size,
)
total = await prisma.models.StoreAgent.prisma().count(where=where_clause)
total_pages = (total + page_size - 1) // page_size
store_agents: list[store_model.StoreAgent] = []
for agent in db_agents:
try:
store_agents.append(store_model.StoreAgent.from_db(agent))
except Exception as e:
logger.error(f"Error parsing StoreAgent from db: {e}")
continue
logger.debug(f"Found {len(store_agents)} agents")
return store_model.StoreAgentsResponse(
agents=store_agents,
@@ -164,126 +195,6 @@ async def get_store_agents(
# await log_search_term(search_query=search_term)
async def _fallback_store_agent_search(
*,
search_query: str | None,
featured: bool,
creators: list[str] | None,
category: str | None,
sorted_by: StoreAgentsSortOptions | None,
page: int,
page_size: int,
) -> tuple[list[store_model.StoreAgent], int]:
"""Direct DB search fallback when hybrid search is unavailable or empty.
Uses ad-hoc to_tsvector/plainto_tsquery with ts_rank_cd for text search,
matching the quality of the original pre-hybrid-search implementation.
Falls back to simple listing when no search query is provided.
"""
if not search_query:
# No search query — use Prisma for simple filtered listing
where_clause: prisma.types.StoreAgentWhereInput = {"is_available": True}
if featured:
where_clause["featured"] = featured
if creators:
where_clause["creator_username"] = {"in": creators}
if category:
where_clause["categories"] = {"has": category}
order_by = []
if sorted_by == StoreAgentsSortOptions.RATING:
order_by.append({"rating": "desc"})
elif sorted_by == StoreAgentsSortOptions.RUNS:
order_by.append({"runs": "desc"})
elif sorted_by == StoreAgentsSortOptions.NAME:
order_by.append({"agent_name": "asc"})
elif sorted_by == StoreAgentsSortOptions.UPDATED_AT:
order_by.append({"updated_at": "desc"})
db_agents = await prisma.models.StoreAgent.prisma().find_many(
where=where_clause,
order=order_by,
skip=(page - 1) * page_size,
take=page_size,
)
total = await prisma.models.StoreAgent.prisma().count(where=where_clause)
return [store_model.StoreAgent.from_db(a) for a in db_agents], total
# Text search using ad-hoc tsvector on StoreAgent view fields
params: list[Any] = [search_query]
filters = ["sa.is_available = true"]
param_idx = 2
if featured:
filters.append("sa.featured = true")
if creators:
params.append(creators)
filters.append(f"sa.creator_username = ANY(${param_idx})")
param_idx += 1
if category:
params.append(category)
filters.append(f"${param_idx} = ANY(sa.categories)")
param_idx += 1
where_sql = " AND ".join(filters)
params.extend([page_size, (page - 1) * page_size])
limit_param = f"${param_idx}"
param_idx += 1
offset_param = f"${param_idx}"
sql = f"""
WITH ranked AS (
SELECT sa.*,
ts_rank_cd(
to_tsvector('english',
COALESCE(sa.agent_name, '') || ' ' ||
COALESCE(sa.sub_heading, '') || ' ' ||
COALESCE(sa.description, '')
),
plainto_tsquery('english', $1)
) AS rank,
COUNT(*) OVER () AS total_count
FROM {{schema_prefix}}"StoreAgent" sa
WHERE {where_sql}
AND to_tsvector('english',
COALESCE(sa.agent_name, '') || ' ' ||
COALESCE(sa.sub_heading, '') || ' ' ||
COALESCE(sa.description, '')
) @@ plainto_tsquery('english', $1)
)
SELECT * FROM ranked
ORDER BY rank DESC
LIMIT {limit_param} OFFSET {offset_param}
"""
results = await query_raw_with_schema(sql, *params)
total = results[0]["total_count"] if results else 0
store_agents = []
for row in results:
try:
store_agents.append(
store_model.StoreAgent(
slug=row["slug"],
agent_name=row["agent_name"],
agent_image=row["agent_image"][0] if row["agent_image"] else "",
creator=row["creator_username"] or "Needs Profile",
creator_avatar=row["creator_avatar"] or "",
sub_heading=row["sub_heading"],
description=row["description"],
runs=row["runs"],
rating=row["rating"],
agent_graph_id=row.get("graph_id", ""),
)
)
except Exception as e:
logger.error(f"Error parsing StoreAgent from fallback search: {e}")
continue
return store_agents, total
async def log_search_term(search_query: str):
"""Log a search term to the database"""
@@ -1228,21 +1139,16 @@ async def review_store_submission(
},
)
# Generate embedding for approved listing (best-effort)
try:
await ensure_embedding(
version_id=store_listing_version_id,
name=submission.name,
description=submission.description,
sub_heading=submission.subHeading,
categories=submission.categories,
tx=tx,
)
except Exception as emb_err:
logger.warning(
f"Could not generate embedding for listing "
f"{store_listing_version_id}: {emb_err}"
)
# Generate embedding for approved listing (blocking - admin operation)
# Inside transaction: if embedding fails, entire transaction rolls back
await ensure_embedding(
version_id=store_listing_version_id,
name=submission.name,
description=submission.description,
sub_heading=submission.subHeading,
categories=submission.categories,
tx=tx,
)
await prisma.models.StoreListing.prisma(tx).update(
where={"id": submission.storeListingId},

View File

@@ -1,4 +1,5 @@
import logging
import tempfile
import urllib.parse
import autogpt_libs.auth
@@ -258,18 +259,21 @@ async def get_graph_meta_by_store_listing_version_id(
)
async def download_agent_file(
store_listing_version_id: str,
) -> fastapi.responses.Response:
) -> fastapi.responses.FileResponse:
"""Download agent graph file for a specific marketplace listing version"""
graph_data = await store_db.get_agent(store_listing_version_id)
file_name = f"agent_{graph_data.id}_v{graph_data.version or 'latest'}.json"
return fastapi.responses.Response(
content=backend.util.json.dumps(graph_data),
media_type="application/json",
headers={
"Content-Disposition": f'attachment; filename="{file_name}"',
},
)
# Sending graph as a stream (similar to marketplace v1)
with tempfile.NamedTemporaryFile(
mode="w", suffix=".json", delete=False
) as tmp_file:
tmp_file.write(backend.util.json.dumps(graph_data))
tmp_file.flush()
return fastapi.responses.FileResponse(
tmp_file.name, filename=file_name, media_type="application/json"
)
##############################################

View File

@@ -55,6 +55,7 @@ from backend.data.credit import (
set_auto_top_up,
)
from backend.data.graph import GraphSettings
from backend.data.invited_user import get_or_activate_user
from backend.data.model import CredentialsMetaInput, UserOnboarding
from backend.data.notifications import NotificationPreference, NotificationPreferenceDTO
from backend.data.onboarding import (
@@ -70,7 +71,6 @@ from backend.data.onboarding import (
update_user_onboarding,
)
from backend.data.user import (
get_or_create_user,
get_user_by_id,
get_user_notification_preference,
update_user_email,
@@ -136,12 +136,10 @@ _tally_background_tasks: set[asyncio.Task] = set()
dependencies=[Security(requires_user)],
)
async def get_or_create_user_route(user_data: dict = Security(get_jwt_payload)):
user = await get_or_create_user(user_data)
user = await get_or_activate_user(user_data)
# Fire-and-forget: populate business understanding from Tally form.
# We use created_at proximity instead of an is_new flag because
# get_or_create_user is cached — a separate is_new return value would be
# unreliable on repeated calls within the cache TTL.
# Fire-and-forget: backfill Tally understanding when invite pre-seeding did
# not produce a stored result before first activation.
age_seconds = (datetime.now(timezone.utc) - user.created_at).total_seconds()
if age_seconds < 30:
try:
@@ -165,7 +163,8 @@ async def get_or_create_user_route(user_data: dict = Security(get_jwt_payload)):
dependencies=[Security(requires_user)],
)
async def update_user_email_route(
user_id: Annotated[str, Security(get_user_id)], email: str = Body(...)
user_id: Annotated[str, Security(get_user_id)],
email: str = Body(...),
) -> dict[str, str]:
await update_user_email(user_id, email)
@@ -179,10 +178,16 @@ async def update_user_email_route(
dependencies=[Security(requires_user)],
)
async def get_user_timezone_route(
user_data: dict = Security(get_jwt_payload),
user_id: Annotated[str, Security(get_user_id)],
) -> TimezoneResponse:
"""Get user timezone setting."""
user = await get_or_create_user(user_data)
try:
user = await get_user_by_id(user_id)
except ValueError:
raise HTTPException(
status_code=HTTP_404_NOT_FOUND,
detail="User not found. Please complete activation via /auth/user first.",
)
return TimezoneResponse(timezone=user.timezone)
@@ -193,7 +198,8 @@ async def get_user_timezone_route(
dependencies=[Security(requires_user)],
)
async def update_user_timezone_route(
user_id: Annotated[str, Security(get_user_id)], request: UpdateTimezoneRequest
user_id: Annotated[str, Security(get_user_id)],
request: UpdateTimezoneRequest,
) -> TimezoneResponse:
"""Update user timezone. The timezone should be a valid IANA timezone identifier."""
user = await update_user_timezone(user_id, str(request.timezone))
@@ -592,11 +598,6 @@ async def fulfill_checkout(user_id: Annotated[str, Security(get_user_id)]):
async def configure_user_auto_top_up(
request: AutoTopUpConfig, user_id: Annotated[str, Security(get_user_id)]
) -> str:
"""Configure auto top-up settings and perform an immediate top-up if needed.
Raises HTTPException(422) if the request parameters are invalid or if
the credit top-up fails.
"""
if request.threshold < 0:
raise HTTPException(status_code=422, detail="Threshold must be greater than 0")
if request.amount < 500 and request.amount != 0:
@@ -611,20 +612,10 @@ async def configure_user_auto_top_up(
user_credit_model = await get_user_credit_model(user_id)
current_balance = await user_credit_model.get_credits(user_id)
try:
if current_balance < request.threshold:
await user_credit_model.top_up_credits(user_id, request.amount)
else:
await user_credit_model.top_up_credits(user_id, 0)
except ValueError as e:
known_messages = (
"must not be negative",
"already exists for user",
"No payment method found",
)
if any(msg in str(e) for msg in known_messages):
raise HTTPException(status_code=422, detail=str(e))
raise
if current_balance < request.threshold:
await user_credit_model.top_up_credits(user_id, request.amount)
else:
await user_credit_model.top_up_credits(user_id, 0)
await set_auto_top_up(
user_id, AutoTopUpConfig(threshold=request.threshold, amount=request.amount)

View File

@@ -51,7 +51,7 @@ def test_get_or_create_user_route(
}
mocker.patch(
"backend.api.features.v1.get_or_create_user",
"backend.api.features.v1.get_or_activate_user",
return_value=mock_user,
)

View File

@@ -188,7 +188,6 @@ async def upload_file(
user_id: Annotated[str, fastapi.Security(get_user_id)],
file: UploadFile,
session_id: str | None = Query(default=None),
overwrite: bool = Query(default=False),
) -> UploadFileResponse:
"""
Upload a file to the user's workspace.
@@ -249,9 +248,7 @@ async def upload_file(
# Write file via WorkspaceManager
manager = WorkspaceManager(user_id, workspace.id, session_id)
try:
workspace_file = await manager.write_file(
content, filename, overwrite=overwrite
)
workspace_file = await manager.write_file(content, filename)
except ValueError as e:
raise fastapi.HTTPException(status_code=409, detail=str(e)) from e

View File

@@ -1,4 +1,3 @@
import asyncio
import contextlib
import logging
import platform
@@ -20,6 +19,7 @@ from prisma.errors import PrismaError
import backend.api.features.admin.credit_admin_routes
import backend.api.features.admin.execution_analytics_routes
import backend.api.features.admin.store_admin_routes
import backend.api.features.admin.user_admin_routes
import backend.api.features.builder
import backend.api.features.builder.routes
import backend.api.features.chat.routes as chat_routes
@@ -38,10 +38,8 @@ import backend.api.features.workspace.routes as workspace_routes
import backend.data.block
import backend.data.db
import backend.data.graph
import backend.data.llm_registry
import backend.data.user
import backend.integrations.webhooks.utils
import backend.server.v2.llm
import backend.util.service
import backend.util.settings
from backend.api.features.library.exceptions import (
@@ -120,56 +118,16 @@ async def lifespan_context(app: fastapi.FastAPI):
AutoRegistry.patch_integrations()
# Load LLM registry before initializing blocks so blocks can use registry data.
# Tries Redis first (fast path on warm restart), falls back to DB.
# Note: Graceful fallback for now since no blocks consume registry yet (comes in PR #5)
try:
await backend.data.llm_registry.refresh_llm_registry()
logger.info("LLM registry loaded successfully at startup")
except Exception as e:
logger.warning(
f"Failed to load LLM registry at startup: {e}. "
"Blocks will initialize with empty registry."
)
# Start background task so this worker reloads its in-process cache whenever
# another worker (e.g. the admin API) refreshes the registry.
_registry_subscription_task = asyncio.create_task(
backend.data.llm_registry.subscribe_to_registry_refresh(
backend.data.llm_registry.refresh_llm_registry
)
)
await backend.data.block.initialize_blocks()
await backend.data.user.migrate_and_encrypt_user_integrations()
await backend.data.graph.fix_llm_provider_credentials()
try:
await backend.data.graph.migrate_llm_models(DEFAULT_LLM_MODEL)
except Exception as e:
err_str = str(e)
if "AgentNode" in err_str or "does not exist" in err_str:
logger.warning(
f"migrate_llm_models skipped: AgentNode table not found ({e}). "
"This is expected in test environments."
)
else:
logger.error(
f"migrate_llm_models failed unexpectedly: {e}",
exc_info=True,
)
await backend.data.graph.migrate_llm_models(DEFAULT_LLM_MODEL)
await backend.integrations.webhooks.utils.migrate_legacy_triggered_graphs()
with launch_darkly_context():
yield
_registry_subscription_task.cancel()
try:
await _registry_subscription_task
except asyncio.CancelledError:
pass
try:
await shutdown_cloud_storage_handler()
except Exception as e:
@@ -253,22 +211,13 @@ instrument_fastapi(
def handle_internal_http_error(status_code: int = 500, log_error: bool = True):
def handler(request: fastapi.Request, exc: Exception):
if log_error:
if status_code >= 500:
logger.exception(
"%s %s failed. Investigate and resolve the underlying issue: %s",
request.method,
request.url.path,
exc,
exc_info=exc,
)
else:
logger.warning(
"%s %s failed with %d: %s",
request.method,
request.url.path,
status_code,
exc,
)
logger.exception(
"%s %s failed. Investigate and resolve the underlying issue: %s",
request.method,
request.url.path,
exc,
exc_info=exc,
)
hint = (
"Adjust the request and retry."
@@ -318,10 +267,12 @@ async def validation_error_handler(
app.add_exception_handler(PrismaError, handle_internal_http_error(500))
app.add_exception_handler(FolderAlreadyExistsError, handle_internal_http_error(409))
app.add_exception_handler(FolderValidationError, handle_internal_http_error(400))
app.add_exception_handler(NotFoundError, handle_internal_http_error(404))
app.add_exception_handler(NotAuthorizedError, handle_internal_http_error(403))
app.add_exception_handler(
FolderAlreadyExistsError, handle_internal_http_error(409, False)
)
app.add_exception_handler(FolderValidationError, handle_internal_http_error(400, False))
app.add_exception_handler(NotFoundError, handle_internal_http_error(404, False))
app.add_exception_handler(NotAuthorizedError, handle_internal_http_error(403, False))
app.add_exception_handler(RequestValidationError, validation_error_handler)
app.add_exception_handler(pydantic.ValidationError, validation_error_handler)
app.add_exception_handler(MissingConfigError, handle_internal_http_error(503))
@@ -361,6 +312,11 @@ app.include_router(
tags=["v2", "admin"],
prefix="/api/executions",
)
app.include_router(
backend.api.features.admin.user_admin_routes.router,
tags=["v2", "admin"],
prefix="/api/users",
)
app.include_router(
backend.api.features.executions.review.routes.router,
tags=["v2", "executions", "review"],
@@ -398,11 +354,6 @@ app.include_router(
tags=["oauth"],
prefix="/api/oauth",
)
app.include_router(
backend.server.v2.llm.router,
tags=["v2", "llm"],
prefix="/api",
)
app.mount("/external-api", external_api)

View File

@@ -1,33 +0,0 @@
"""
Shared configuration for all AgentMail blocks.
"""
from agentmail import AsyncAgentMail
from backend.sdk import APIKeyCredentials, ProviderBuilder, SecretStr
agent_mail = (
ProviderBuilder("agent_mail")
.with_api_key("AGENTMAIL_API_KEY", "AgentMail API Key")
.build()
)
TEST_CREDENTIALS = APIKeyCredentials(
id="01234567-89ab-cdef-0123-456789abcdef",
provider="agent_mail",
title="Mock AgentMail API Key",
api_key=SecretStr("mock-agentmail-api-key"),
expires_at=None,
)
TEST_CREDENTIALS_INPUT = {
"id": TEST_CREDENTIALS.id,
"provider": TEST_CREDENTIALS.provider,
"type": TEST_CREDENTIALS.type,
"title": TEST_CREDENTIALS.title,
}
def _client(credentials: APIKeyCredentials) -> AsyncAgentMail:
"""Create an AsyncAgentMail client from credentials."""
return AsyncAgentMail(api_key=credentials.api_key.get_secret_value())

View File

@@ -1,211 +0,0 @@
"""
AgentMail Attachment blocks — download file attachments from messages and threads.
Attachments are files associated with messages (PDFs, CSVs, images, etc.).
To send attachments, include them in the attachments parameter when using
AgentMailSendMessageBlock or AgentMailReplyToMessageBlock.
To download, first get the attachment_id from a message's attachments array,
then use these blocks to retrieve the file content as base64.
"""
import base64
from backend.sdk import (
APIKeyCredentials,
Block,
BlockCategory,
BlockOutput,
BlockSchemaInput,
BlockSchemaOutput,
CredentialsMetaInput,
SchemaField,
)
from ._config import TEST_CREDENTIALS, TEST_CREDENTIALS_INPUT, _client, agent_mail
class AgentMailGetMessageAttachmentBlock(Block):
"""
Download a file attachment from a specific email message.
Retrieves the raw file content and returns it as base64-encoded data.
First get the attachment_id from a message object's attachments array,
then use this block to download the file.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
inbox_id: str = SchemaField(
description="Inbox ID or email address the message belongs to"
)
message_id: str = SchemaField(
description="Message ID containing the attachment"
)
attachment_id: str = SchemaField(
description="Attachment ID to download (from the message's attachments array)"
)
class Output(BlockSchemaOutput):
content_base64: str = SchemaField(
description="File content encoded as a base64 string. Decode with base64.b64decode() to get raw bytes."
)
attachment_id: str = SchemaField(
description="The attachment ID that was downloaded"
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="a283ffc4-8087-4c3d-9135-8f26b86742ec",
description="Download a file attachment from an email message. Returns base64-encoded file content.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"inbox_id": "test-inbox",
"message_id": "test-msg",
"attachment_id": "test-attach",
},
test_output=[
("content_base64", "dGVzdA=="),
("attachment_id", "test-attach"),
],
test_mock={
"get_attachment": lambda *a, **kw: b"test",
},
)
@staticmethod
async def get_attachment(
credentials: APIKeyCredentials,
inbox_id: str,
message_id: str,
attachment_id: str,
):
client = _client(credentials)
return await client.inboxes.messages.get_attachment(
inbox_id=inbox_id,
message_id=message_id,
attachment_id=attachment_id,
)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
data = await self.get_attachment(
credentials=credentials,
inbox_id=input_data.inbox_id,
message_id=input_data.message_id,
attachment_id=input_data.attachment_id,
)
if isinstance(data, bytes):
encoded = base64.b64encode(data).decode()
elif isinstance(data, str):
encoded = base64.b64encode(data.encode("utf-8")).decode()
else:
raise TypeError(
f"Unexpected attachment data type: {type(data).__name__}"
)
yield "content_base64", encoded
yield "attachment_id", input_data.attachment_id
except Exception as e:
yield "error", str(e)
class AgentMailGetThreadAttachmentBlock(Block):
"""
Download a file attachment from a conversation thread.
Same as GetMessageAttachment but looks up by thread ID instead of
message ID. Useful when you know the thread but not the specific
message containing the attachment.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
inbox_id: str = SchemaField(
description="Inbox ID or email address the thread belongs to"
)
thread_id: str = SchemaField(description="Thread ID containing the attachment")
attachment_id: str = SchemaField(
description="Attachment ID to download (from a message's attachments array within the thread)"
)
class Output(BlockSchemaOutput):
content_base64: str = SchemaField(
description="File content encoded as a base64 string. Decode with base64.b64decode() to get raw bytes."
)
attachment_id: str = SchemaField(
description="The attachment ID that was downloaded"
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="06b6a4c4-9d71-4992-9e9c-cf3b352763b5",
description="Download a file attachment from a conversation thread. Returns base64-encoded file content.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"inbox_id": "test-inbox",
"thread_id": "test-thread",
"attachment_id": "test-attach",
},
test_output=[
("content_base64", "dGVzdA=="),
("attachment_id", "test-attach"),
],
test_mock={
"get_attachment": lambda *a, **kw: b"test",
},
)
@staticmethod
async def get_attachment(
credentials: APIKeyCredentials,
inbox_id: str,
thread_id: str,
attachment_id: str,
):
client = _client(credentials)
return await client.inboxes.threads.get_attachment(
inbox_id=inbox_id,
thread_id=thread_id,
attachment_id=attachment_id,
)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
data = await self.get_attachment(
credentials=credentials,
inbox_id=input_data.inbox_id,
thread_id=input_data.thread_id,
attachment_id=input_data.attachment_id,
)
if isinstance(data, bytes):
encoded = base64.b64encode(data).decode()
elif isinstance(data, str):
encoded = base64.b64encode(data.encode("utf-8")).decode()
else:
raise TypeError(
f"Unexpected attachment data type: {type(data).__name__}"
)
yield "content_base64", encoded
yield "attachment_id", input_data.attachment_id
except Exception as e:
yield "error", str(e)

View File

@@ -1,678 +0,0 @@
"""
AgentMail Draft blocks — create, get, list, update, send, and delete drafts.
A Draft is an unsent message that can be reviewed, edited, and sent later.
Drafts enable human-in-the-loop review, scheduled sending (via send_at),
and complex multi-step email composition workflows.
"""
from typing import Optional
from backend.sdk import (
APIKeyCredentials,
Block,
BlockCategory,
BlockOutput,
BlockSchemaInput,
BlockSchemaOutput,
CredentialsMetaInput,
SchemaField,
)
from ._config import TEST_CREDENTIALS, TEST_CREDENTIALS_INPUT, _client, agent_mail
class AgentMailCreateDraftBlock(Block):
"""
Create a draft email in an AgentMail inbox for review or scheduled sending.
Drafts let agents prepare emails without sending immediately. Use send_at
to schedule automatic sending at a future time (ISO 8601 format).
Scheduled drafts are auto-labeled 'scheduled' and can be cancelled by
deleting the draft.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
inbox_id: str = SchemaField(
description="Inbox ID or email address to create the draft in"
)
to: list[str] = SchemaField(
description="Recipient email addresses (e.g. ['user@example.com'])"
)
subject: str = SchemaField(description="Email subject line", default="")
text: str = SchemaField(description="Plain text body of the draft", default="")
html: str = SchemaField(
description="Rich HTML body of the draft", default="", advanced=True
)
cc: list[str] = SchemaField(
description="CC recipient email addresses",
default_factory=list,
advanced=True,
)
bcc: list[str] = SchemaField(
description="BCC recipient email addresses",
default_factory=list,
advanced=True,
)
in_reply_to: str = SchemaField(
description="Message ID this draft replies to, for threading follow-up drafts",
default="",
advanced=True,
)
send_at: str = SchemaField(
description="Schedule automatic sending at this ISO 8601 datetime (e.g. '2025-01-15T09:00:00Z'). Leave empty for manual send.",
default="",
advanced=True,
)
class Output(BlockSchemaOutput):
draft_id: str = SchemaField(
description="Unique identifier of the created draft"
)
send_status: str = SchemaField(
description="'scheduled' if send_at was set, empty otherwise. Values: scheduled, sending, failed.",
default="",
)
result: dict = SchemaField(
description="Complete draft object with all metadata"
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="25ac9086-69fd-48b8-b910-9dbe04b8f3bd",
description="Create a draft email for review or scheduled sending. Use send_at for automatic future delivery.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"inbox_id": "test-inbox",
"to": ["user@example.com"],
},
test_output=[
("draft_id", "mock-draft-id"),
("send_status", ""),
("result", dict),
],
test_mock={
"create_draft": lambda *a, **kw: type(
"Draft",
(),
{
"draft_id": "mock-draft-id",
"send_status": "",
"model_dump": lambda self: {"draft_id": "mock-draft-id"},
},
)(),
},
)
@staticmethod
async def create_draft(credentials: APIKeyCredentials, inbox_id: str, **params):
client = _client(credentials)
return await client.inboxes.drafts.create(inbox_id, **params)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
params: dict = {"to": input_data.to}
if input_data.subject:
params["subject"] = input_data.subject
if input_data.text:
params["text"] = input_data.text
if input_data.html:
params["html"] = input_data.html
if input_data.cc:
params["cc"] = input_data.cc
if input_data.bcc:
params["bcc"] = input_data.bcc
if input_data.in_reply_to:
params["in_reply_to"] = input_data.in_reply_to
if input_data.send_at:
params["send_at"] = input_data.send_at
draft = await self.create_draft(credentials, input_data.inbox_id, **params)
result = draft.model_dump()
yield "draft_id", draft.draft_id
yield "send_status", draft.send_status or ""
yield "result", result
except Exception as e:
yield "error", str(e)
class AgentMailGetDraftBlock(Block):
"""
Retrieve a specific draft from an AgentMail inbox.
Returns the draft contents including recipients, subject, body, and
scheduled send status. Use this to review a draft before approving it.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
inbox_id: str = SchemaField(
description="Inbox ID or email address the draft belongs to"
)
draft_id: str = SchemaField(description="Draft ID to retrieve")
class Output(BlockSchemaOutput):
draft_id: str = SchemaField(description="Unique identifier of the draft")
subject: str = SchemaField(description="Draft subject line", default="")
send_status: str = SchemaField(
description="Scheduled send status: 'scheduled', 'sending', 'failed', or empty",
default="",
)
send_at: str = SchemaField(
description="Scheduled send time (ISO 8601) if set", default=""
)
result: dict = SchemaField(description="Complete draft object with all fields")
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="8e57780d-dc25-43d4-a0f4-1f02877b09fb",
description="Retrieve a draft email to review its contents, recipients, and scheduled send status.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"inbox_id": "test-inbox",
"draft_id": "test-draft",
},
test_output=[
("draft_id", "test-draft"),
("subject", ""),
("send_status", ""),
("send_at", ""),
("result", dict),
],
test_mock={
"get_draft": lambda *a, **kw: type(
"Draft",
(),
{
"draft_id": "test-draft",
"subject": "",
"send_status": "",
"send_at": "",
"model_dump": lambda self: {"draft_id": "test-draft"},
},
)(),
},
)
@staticmethod
async def get_draft(credentials: APIKeyCredentials, inbox_id: str, draft_id: str):
client = _client(credentials)
return await client.inboxes.drafts.get(inbox_id=inbox_id, draft_id=draft_id)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
draft = await self.get_draft(
credentials, input_data.inbox_id, input_data.draft_id
)
result = draft.model_dump()
yield "draft_id", draft.draft_id
yield "subject", draft.subject or ""
yield "send_status", draft.send_status or ""
yield "send_at", draft.send_at or ""
yield "result", result
except Exception as e:
yield "error", str(e)
class AgentMailListDraftsBlock(Block):
"""
List all drafts in an AgentMail inbox with optional label filtering.
Use labels=['scheduled'] to find all drafts queued for future sending.
Useful for building approval dashboards or monitoring pending outreach.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
inbox_id: str = SchemaField(
description="Inbox ID or email address to list drafts from"
)
limit: int = SchemaField(
description="Maximum number of drafts to return per page (1-100)",
default=20,
advanced=True,
)
page_token: str = SchemaField(
description="Token from a previous response to fetch the next page",
default="",
advanced=True,
)
labels: list[str] = SchemaField(
description="Filter drafts by labels (e.g. ['scheduled'] for pending sends)",
default_factory=list,
advanced=True,
)
class Output(BlockSchemaOutput):
drafts: list[dict] = SchemaField(
description="List of draft objects with subject, recipients, send_status, etc."
)
count: int = SchemaField(description="Number of drafts returned")
next_page_token: str = SchemaField(
description="Token for the next page. Empty if no more results.",
default="",
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="e84883b7-7c39-4c5c-88e8-0a72b078ea63",
description="List drafts in an AgentMail inbox. Filter by labels=['scheduled'] to find pending sends.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"inbox_id": "test-inbox",
},
test_output=[
("drafts", []),
("count", 0),
("next_page_token", ""),
],
test_mock={
"list_drafts": lambda *a, **kw: type(
"Resp",
(),
{
"drafts": [],
"count": 0,
"next_page_token": "",
},
)(),
},
)
@staticmethod
async def list_drafts(credentials: APIKeyCredentials, inbox_id: str, **params):
client = _client(credentials)
return await client.inboxes.drafts.list(inbox_id, **params)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
params: dict = {"limit": input_data.limit}
if input_data.page_token:
params["page_token"] = input_data.page_token
if input_data.labels:
params["labels"] = input_data.labels
response = await self.list_drafts(
credentials, input_data.inbox_id, **params
)
drafts = [d.model_dump() for d in response.drafts]
yield "drafts", drafts
yield "count", response.count
yield "next_page_token", response.next_page_token or ""
except Exception as e:
yield "error", str(e)
class AgentMailUpdateDraftBlock(Block):
"""
Update an existing draft's content, recipients, or scheduled send time.
Use this to reschedule a draft (change send_at), modify recipients,
or edit the subject/body before sending. To cancel a scheduled send,
delete the draft instead.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
inbox_id: str = SchemaField(
description="Inbox ID or email address the draft belongs to"
)
draft_id: str = SchemaField(description="Draft ID to update")
to: Optional[list[str]] = SchemaField(
description="Updated recipient email addresses (replaces existing list). Omit to keep current value.",
default=None,
)
subject: Optional[str] = SchemaField(
description="Updated subject line. Omit to keep current value.",
default=None,
)
text: Optional[str] = SchemaField(
description="Updated plain text body. Omit to keep current value.",
default=None,
)
html: Optional[str] = SchemaField(
description="Updated HTML body. Omit to keep current value.",
default=None,
advanced=True,
)
send_at: Optional[str] = SchemaField(
description="Reschedule: new ISO 8601 send time (e.g. '2025-01-20T14:00:00Z'). Omit to keep current value.",
default=None,
advanced=True,
)
class Output(BlockSchemaOutput):
draft_id: str = SchemaField(description="The updated draft ID")
send_status: str = SchemaField(description="Updated send status", default="")
result: dict = SchemaField(description="Complete updated draft object")
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="351f6e51-695a-421a-9032-46a587b10336",
description="Update a draft's content, recipients, or scheduled send time. Use to reschedule or edit before sending.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"inbox_id": "test-inbox",
"draft_id": "test-draft",
},
test_output=[
("draft_id", "test-draft"),
("send_status", ""),
("result", dict),
],
test_mock={
"update_draft": lambda *a, **kw: type(
"Draft",
(),
{
"draft_id": "test-draft",
"send_status": "",
"model_dump": lambda self: {"draft_id": "test-draft"},
},
)(),
},
)
@staticmethod
async def update_draft(
credentials: APIKeyCredentials, inbox_id: str, draft_id: str, **params
):
client = _client(credentials)
return await client.inboxes.drafts.update(
inbox_id=inbox_id, draft_id=draft_id, **params
)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
params: dict = {}
if input_data.to is not None:
params["to"] = input_data.to
if input_data.subject is not None:
params["subject"] = input_data.subject
if input_data.text is not None:
params["text"] = input_data.text
if input_data.html is not None:
params["html"] = input_data.html
if input_data.send_at is not None:
params["send_at"] = input_data.send_at
draft = await self.update_draft(
credentials, input_data.inbox_id, input_data.draft_id, **params
)
result = draft.model_dump()
yield "draft_id", draft.draft_id
yield "send_status", draft.send_status or ""
yield "result", result
except Exception as e:
yield "error", str(e)
class AgentMailSendDraftBlock(Block):
"""
Send a draft immediately, converting it into a delivered message.
The draft is deleted after successful sending and becomes a regular
message with a message_id. Use this for human-in-the-loop approval
workflows: agent creates draft, human reviews, then this block sends it.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
inbox_id: str = SchemaField(
description="Inbox ID or email address the draft belongs to"
)
draft_id: str = SchemaField(description="Draft ID to send now")
class Output(BlockSchemaOutput):
message_id: str = SchemaField(
description="Message ID of the now-sent email (draft is deleted)"
)
thread_id: str = SchemaField(
description="Thread ID the sent message belongs to"
)
result: dict = SchemaField(description="Complete sent message object")
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="37c39e83-475d-4b3d-843a-d923d001b85a",
description="Send a draft immediately, converting it into a delivered message. The draft is deleted after sending.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
is_sensitive_action=True,
test_credentials=TEST_CREDENTIALS,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"inbox_id": "test-inbox",
"draft_id": "test-draft",
},
test_output=[
("message_id", "mock-msg-id"),
("thread_id", "mock-thread-id"),
("result", dict),
],
test_mock={
"send_draft": lambda *a, **kw: type(
"Msg",
(),
{
"message_id": "mock-msg-id",
"thread_id": "mock-thread-id",
"model_dump": lambda self: {"message_id": "mock-msg-id"},
},
)(),
},
)
@staticmethod
async def send_draft(credentials: APIKeyCredentials, inbox_id: str, draft_id: str):
client = _client(credentials)
return await client.inboxes.drafts.send(inbox_id=inbox_id, draft_id=draft_id)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
msg = await self.send_draft(
credentials, input_data.inbox_id, input_data.draft_id
)
result = msg.model_dump()
yield "message_id", msg.message_id
yield "thread_id", msg.thread_id or ""
yield "result", result
except Exception as e:
yield "error", str(e)
class AgentMailDeleteDraftBlock(Block):
"""
Delete a draft from an AgentMail inbox. Also cancels any scheduled send.
If the draft was scheduled with send_at, deleting it cancels the
scheduled delivery. This is the way to cancel a scheduled email.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
inbox_id: str = SchemaField(
description="Inbox ID or email address the draft belongs to"
)
draft_id: str = SchemaField(
description="Draft ID to delete (also cancels scheduled sends)"
)
class Output(BlockSchemaOutput):
success: bool = SchemaField(
description="True if the draft was successfully deleted/cancelled"
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="9023eb99-3e2f-4def-808b-d9c584b3d9e7",
description="Delete a draft or cancel a scheduled email. Removes the draft permanently.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
is_sensitive_action=True,
test_credentials=TEST_CREDENTIALS,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"inbox_id": "test-inbox",
"draft_id": "test-draft",
},
test_output=[("success", True)],
test_mock={
"delete_draft": lambda *a, **kw: None,
},
)
@staticmethod
async def delete_draft(
credentials: APIKeyCredentials, inbox_id: str, draft_id: str
):
client = _client(credentials)
await client.inboxes.drafts.delete(inbox_id=inbox_id, draft_id=draft_id)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
await self.delete_draft(
credentials, input_data.inbox_id, input_data.draft_id
)
yield "success", True
except Exception as e:
yield "error", str(e)
class AgentMailListOrgDraftsBlock(Block):
"""
List all drafts across every inbox in your organization.
Returns drafts from all inboxes in one query. Perfect for building
a central approval dashboard where a human supervisor can review
and approve any draft created by any agent.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
limit: int = SchemaField(
description="Maximum number of drafts to return per page (1-100)",
default=20,
advanced=True,
)
page_token: str = SchemaField(
description="Token from a previous response to fetch the next page",
default="",
advanced=True,
)
class Output(BlockSchemaOutput):
drafts: list[dict] = SchemaField(
description="List of draft objects from all inboxes in the organization"
)
count: int = SchemaField(description="Number of drafts returned")
next_page_token: str = SchemaField(
description="Token for the next page. Empty if no more results.",
default="",
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="ed7558ae-3a07-45f5-af55-a25fe88c9971",
description="List all drafts across every inbox in your organization. Use for central approval dashboards.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={"credentials": TEST_CREDENTIALS_INPUT},
test_output=[
("drafts", []),
("count", 0),
("next_page_token", ""),
],
test_mock={
"list_org_drafts": lambda *a, **kw: type(
"Resp",
(),
{
"drafts": [],
"count": 0,
"next_page_token": "",
},
)(),
},
)
@staticmethod
async def list_org_drafts(credentials: APIKeyCredentials, **params):
client = _client(credentials)
return await client.drafts.list(**params)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
params: dict = {"limit": input_data.limit}
if input_data.page_token:
params["page_token"] = input_data.page_token
response = await self.list_org_drafts(credentials, **params)
drafts = [d.model_dump() for d in response.drafts]
yield "drafts", drafts
yield "count", response.count
yield "next_page_token", response.next_page_token or ""
except Exception as e:
yield "error", str(e)

View File

@@ -1,414 +0,0 @@
"""
AgentMail Inbox blocks — create, get, list, update, and delete inboxes.
An Inbox is a fully programmable email account for AI agents. Each inbox gets
a unique email address and can send, receive, and manage emails via the
AgentMail API. You can create thousands of inboxes on demand.
"""
from agentmail.inboxes.types import CreateInboxRequest
from backend.sdk import (
APIKeyCredentials,
Block,
BlockCategory,
BlockOutput,
BlockSchemaInput,
BlockSchemaOutput,
CredentialsMetaInput,
SchemaField,
)
from ._config import TEST_CREDENTIALS, TEST_CREDENTIALS_INPUT, _client, agent_mail
class AgentMailCreateInboxBlock(Block):
"""
Create a new email inbox for an AI agent via AgentMail.
Each inbox gets a unique email address (e.g. username@agentmail.to).
If username and domain are not provided, AgentMail auto-generates them.
Use custom domains by specifying the domain field.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
username: str = SchemaField(
description="Local part of the email address (e.g. 'support' for support@domain.com). Leave empty to auto-generate.",
default="",
advanced=False,
)
domain: str = SchemaField(
description="Email domain (e.g. 'mydomain.com'). Defaults to agentmail.to if empty.",
default="",
advanced=False,
)
display_name: str = SchemaField(
description="Friendly name shown in the 'From' field of sent emails (e.g. 'Support Agent')",
default="",
advanced=False,
)
class Output(BlockSchemaOutput):
inbox_id: str = SchemaField(
description="Unique identifier for the created inbox (also the email address)"
)
email_address: str = SchemaField(
description="Full email address of the inbox (e.g. support@agentmail.to)"
)
result: dict = SchemaField(
description="Complete inbox object with all metadata"
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="7a8ac219-c6ec-4eec-a828-81af283ce04c",
description="Create a new email inbox for an AI agent via AgentMail. Each inbox gets a unique address and can send/receive emails.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={"credentials": TEST_CREDENTIALS_INPUT},
test_output=[
("inbox_id", "mock-inbox-id"),
("email_address", "mock-inbox-id"),
("result", dict),
],
test_mock={
"create_inbox": lambda *a, **kw: type(
"Inbox",
(),
{
"inbox_id": "mock-inbox-id",
"model_dump": lambda self: {"inbox_id": "mock-inbox-id"},
},
)(),
},
)
@staticmethod
async def create_inbox(credentials: APIKeyCredentials, **params):
client = _client(credentials)
return await client.inboxes.create(request=CreateInboxRequest(**params))
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
params: dict = {}
if input_data.username:
params["username"] = input_data.username
if input_data.domain:
params["domain"] = input_data.domain
if input_data.display_name:
params["display_name"] = input_data.display_name
inbox = await self.create_inbox(credentials, **params)
result = inbox.model_dump()
yield "inbox_id", inbox.inbox_id
yield "email_address", inbox.inbox_id
yield "result", result
except Exception as e:
yield "error", str(e)
class AgentMailGetInboxBlock(Block):
"""
Retrieve details of an existing AgentMail inbox by its ID or email address.
Returns the inbox metadata including email address, display name, and
configuration. Use this to check if an inbox exists or get its properties.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
inbox_id: str = SchemaField(
description="Inbox ID or email address to look up (e.g. 'support@agentmail.to')"
)
class Output(BlockSchemaOutput):
inbox_id: str = SchemaField(description="Unique identifier of the inbox")
email_address: str = SchemaField(description="Full email address of the inbox")
display_name: str = SchemaField(
description="Friendly name shown in the 'From' field", default=""
)
result: dict = SchemaField(
description="Complete inbox object with all metadata"
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="b858f62b-6c12-4736-aaf2-dbc5a9281320",
description="Retrieve details of an existing AgentMail inbox including its email address, display name, and configuration.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"inbox_id": "test-inbox",
},
test_output=[
("inbox_id", "test-inbox"),
("email_address", "test-inbox"),
("display_name", ""),
("result", dict),
],
test_mock={
"get_inbox": lambda *a, **kw: type(
"Inbox",
(),
{
"inbox_id": "test-inbox",
"display_name": "",
"model_dump": lambda self: {"inbox_id": "test-inbox"},
},
)(),
},
)
@staticmethod
async def get_inbox(credentials: APIKeyCredentials, inbox_id: str):
client = _client(credentials)
return await client.inboxes.get(inbox_id=inbox_id)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
inbox = await self.get_inbox(credentials, input_data.inbox_id)
result = inbox.model_dump()
yield "inbox_id", inbox.inbox_id
yield "email_address", inbox.inbox_id
yield "display_name", inbox.display_name or ""
yield "result", result
except Exception as e:
yield "error", str(e)
class AgentMailListInboxesBlock(Block):
"""
List all email inboxes in your AgentMail organization.
Returns a paginated list of all inboxes with their metadata.
Use page_token for pagination when you have many inboxes.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
limit: int = SchemaField(
description="Maximum number of inboxes to return per page (1-100)",
default=20,
advanced=True,
)
page_token: str = SchemaField(
description="Token from a previous response to fetch the next page of results",
default="",
advanced=True,
)
class Output(BlockSchemaOutput):
inboxes: list[dict] = SchemaField(
description="List of inbox objects, each containing inbox_id, email_address, display_name, etc."
)
count: int = SchemaField(
description="Total number of inboxes in your organization"
)
next_page_token: str = SchemaField(
description="Token to pass as page_token to get the next page. Empty if no more results.",
default="",
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="cfd84a06-2121-4cef-8d14-8badf52d22f0",
description="List all email inboxes in your AgentMail organization with pagination support.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={"credentials": TEST_CREDENTIALS_INPUT},
test_output=[
("inboxes", []),
("count", 0),
("next_page_token", ""),
],
test_mock={
"list_inboxes": lambda *a, **kw: type(
"Resp",
(),
{
"inboxes": [],
"count": 0,
"next_page_token": "",
},
)(),
},
)
@staticmethod
async def list_inboxes(credentials: APIKeyCredentials, **params):
client = _client(credentials)
return await client.inboxes.list(**params)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
params: dict = {"limit": input_data.limit}
if input_data.page_token:
params["page_token"] = input_data.page_token
response = await self.list_inboxes(credentials, **params)
inboxes = [i.model_dump() for i in response.inboxes]
yield "inboxes", inboxes
yield "count", (c if (c := response.count) is not None else len(inboxes))
yield "next_page_token", response.next_page_token or ""
except Exception as e:
yield "error", str(e)
class AgentMailUpdateInboxBlock(Block):
"""
Update the display name of an existing AgentMail inbox.
Changes the friendly name shown in the 'From' field when emails are sent
from this inbox. The email address itself cannot be changed.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
inbox_id: str = SchemaField(
description="Inbox ID or email address to update (e.g. 'support@agentmail.to')"
)
display_name: str = SchemaField(
description="New display name for the inbox (e.g. 'Customer Support Bot')"
)
class Output(BlockSchemaOutput):
inbox_id: str = SchemaField(description="The updated inbox ID")
result: dict = SchemaField(
description="Complete updated inbox object with all metadata"
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="59b49f59-a6d1-4203-94c0-3908adac50b6",
description="Update the display name of an AgentMail inbox. Changes the 'From' name shown when emails are sent.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"inbox_id": "test-inbox",
"display_name": "Updated",
},
test_output=[
("inbox_id", "test-inbox"),
("result", dict),
],
test_mock={
"update_inbox": lambda *a, **kw: type(
"Inbox",
(),
{
"inbox_id": "test-inbox",
"model_dump": lambda self: {"inbox_id": "test-inbox"},
},
)(),
},
)
@staticmethod
async def update_inbox(credentials: APIKeyCredentials, inbox_id: str, **params):
client = _client(credentials)
return await client.inboxes.update(inbox_id=inbox_id, **params)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
inbox = await self.update_inbox(
credentials,
input_data.inbox_id,
display_name=input_data.display_name,
)
result = inbox.model_dump()
yield "inbox_id", inbox.inbox_id
yield "result", result
except Exception as e:
yield "error", str(e)
class AgentMailDeleteInboxBlock(Block):
"""
Permanently delete an AgentMail inbox and all its data.
This removes the inbox, all its messages, threads, and drafts.
This action cannot be undone. The email address will no longer
receive or send emails.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
inbox_id: str = SchemaField(
description="Inbox ID or email address to permanently delete"
)
class Output(BlockSchemaOutput):
success: bool = SchemaField(
description="True if the inbox was successfully deleted"
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="ade970ae-8428-4a7b-9278-b52054dbf535",
description="Permanently delete an AgentMail inbox and all its messages, threads, and drafts. This action cannot be undone.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
is_sensitive_action=True,
test_credentials=TEST_CREDENTIALS,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"inbox_id": "test-inbox",
},
test_output=[("success", True)],
test_mock={
"delete_inbox": lambda *a, **kw: None,
},
)
@staticmethod
async def delete_inbox(credentials: APIKeyCredentials, inbox_id: str):
client = _client(credentials)
await client.inboxes.delete(inbox_id=inbox_id)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
await self.delete_inbox(credentials, input_data.inbox_id)
yield "success", True
except Exception as e:
yield "error", str(e)

View File

@@ -1,384 +0,0 @@
"""
AgentMail List blocks — manage allow/block lists for email filtering.
Lists let you control which email addresses and domains your agents can
send to or receive from. There are four list types based on two dimensions:
direction (send/receive) and type (allow/block).
- receive + allow: Only accept emails from these addresses/domains
- receive + block: Reject emails from these addresses/domains
- send + allow: Only send emails to these addresses/domains
- send + block: Prevent sending emails to these addresses/domains
"""
from enum import Enum
from backend.sdk import (
APIKeyCredentials,
Block,
BlockCategory,
BlockOutput,
BlockSchemaInput,
BlockSchemaOutput,
CredentialsMetaInput,
SchemaField,
)
from ._config import TEST_CREDENTIALS, TEST_CREDENTIALS_INPUT, _client, agent_mail
class ListDirection(str, Enum):
SEND = "send"
RECEIVE = "receive"
class ListType(str, Enum):
ALLOW = "allow"
BLOCK = "block"
class AgentMailListEntriesBlock(Block):
"""
List all entries in an AgentMail allow/block list.
Retrieves email addresses and domains that are currently allowed
or blocked for sending or receiving. Use direction and list_type
to select which of the four lists to query.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
direction: ListDirection = SchemaField(
description="'send' to filter outgoing emails, 'receive' to filter incoming emails"
)
list_type: ListType = SchemaField(
description="'allow' for whitelist (only permit these), 'block' for blacklist (reject these)"
)
limit: int = SchemaField(
description="Maximum number of entries to return per page",
default=20,
advanced=True,
)
page_token: str = SchemaField(
description="Token from a previous response to fetch the next page",
default="",
advanced=True,
)
class Output(BlockSchemaOutput):
entries: list[dict] = SchemaField(
description="List of entries, each with an email address or domain"
)
count: int = SchemaField(description="Number of entries returned")
next_page_token: str = SchemaField(
description="Token for the next page. Empty if no more results.",
default="",
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="01489100-35da-45aa-8a01-9540ba0e9a21",
description="List all entries in an AgentMail allow/block list. Choose send/receive direction and allow/block type.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"direction": "receive",
"list_type": "block",
},
test_output=[
("entries", []),
("count", 0),
("next_page_token", ""),
],
test_mock={
"list_entries": lambda *a, **kw: type(
"Resp",
(),
{
"entries": [],
"count": 0,
"next_page_token": "",
},
)(),
},
)
@staticmethod
async def list_entries(
credentials: APIKeyCredentials, direction: str, list_type: str, **params
):
client = _client(credentials)
return await client.lists.list(direction, list_type, **params)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
params: dict = {"limit": input_data.limit}
if input_data.page_token:
params["page_token"] = input_data.page_token
response = await self.list_entries(
credentials,
input_data.direction.value,
input_data.list_type.value,
**params,
)
entries = [e.model_dump() for e in response.entries]
yield "entries", entries
yield "count", (c if (c := response.count) is not None else len(entries))
yield "next_page_token", response.next_page_token or ""
except Exception as e:
yield "error", str(e)
class AgentMailCreateListEntryBlock(Block):
"""
Add an email address or domain to an AgentMail allow/block list.
Entries can be full email addresses (e.g. 'partner@example.com') or
entire domains (e.g. 'example.com'). For block lists, you can optionally
provide a reason (e.g. 'spam', 'competitor').
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
direction: ListDirection = SchemaField(
description="'send' for outgoing email rules, 'receive' for incoming email rules"
)
list_type: ListType = SchemaField(
description="'allow' to whitelist, 'block' to blacklist"
)
entry: str = SchemaField(
description="Email address (user@example.com) or domain (example.com) to add"
)
reason: str = SchemaField(
description="Reason for blocking (only used with block lists, e.g. 'spam', 'competitor')",
default="",
advanced=True,
)
class Output(BlockSchemaOutput):
entry: str = SchemaField(
description="The email address or domain that was added"
)
result: dict = SchemaField(description="Complete entry object")
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="b6650a0a-b113-40cf-8243-ff20f684f9b8",
description="Add an email address or domain to an allow/block list. Block spam senders or whitelist trusted domains.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
is_sensitive_action=True,
test_credentials=TEST_CREDENTIALS,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"direction": "receive",
"list_type": "block",
"entry": "spam@example.com",
},
test_output=[
("entry", "spam@example.com"),
("result", dict),
],
test_mock={
"create_entry": lambda *a, **kw: type(
"Entry",
(),
{
"model_dump": lambda self: {"entry": "spam@example.com"},
},
)(),
},
)
@staticmethod
async def create_entry(
credentials: APIKeyCredentials, direction: str, list_type: str, **params
):
client = _client(credentials)
return await client.lists.create(direction, list_type, **params)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
params: dict = {"entry": input_data.entry}
if input_data.reason and input_data.list_type == ListType.BLOCK:
params["reason"] = input_data.reason
result = await self.create_entry(
credentials,
input_data.direction.value,
input_data.list_type.value,
**params,
)
result_dict = result.model_dump()
yield "entry", input_data.entry
yield "result", result_dict
except Exception as e:
yield "error", str(e)
class AgentMailGetListEntryBlock(Block):
"""
Check if an email address or domain exists in an AgentMail allow/block list.
Returns the entry details if found. Use this to verify whether a specific
address or domain is currently allowed or blocked.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
direction: ListDirection = SchemaField(
description="'send' for outgoing rules, 'receive' for incoming rules"
)
list_type: ListType = SchemaField(
description="'allow' for whitelist, 'block' for blacklist"
)
entry: str = SchemaField(description="Email address or domain to look up")
class Output(BlockSchemaOutput):
entry: str = SchemaField(
description="The email address or domain that was found"
)
result: dict = SchemaField(description="Complete entry object with metadata")
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="fb117058-ab27-40d1-9231-eb1dd526fc7a",
description="Check if an email address or domain is in an allow/block list. Verify filtering rules.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"direction": "receive",
"list_type": "block",
"entry": "spam@example.com",
},
test_output=[
("entry", "spam@example.com"),
("result", dict),
],
test_mock={
"get_entry": lambda *a, **kw: type(
"Entry",
(),
{
"model_dump": lambda self: {"entry": "spam@example.com"},
},
)(),
},
)
@staticmethod
async def get_entry(
credentials: APIKeyCredentials, direction: str, list_type: str, entry: str
):
client = _client(credentials)
return await client.lists.get(direction, list_type, entry=entry)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
result = await self.get_entry(
credentials,
input_data.direction.value,
input_data.list_type.value,
input_data.entry,
)
result_dict = result.model_dump()
yield "entry", input_data.entry
yield "result", result_dict
except Exception as e:
yield "error", str(e)
class AgentMailDeleteListEntryBlock(Block):
"""
Remove an email address or domain from an AgentMail allow/block list.
After removal, the address/domain will no longer be filtered by this list.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
direction: ListDirection = SchemaField(
description="'send' for outgoing rules, 'receive' for incoming rules"
)
list_type: ListType = SchemaField(
description="'allow' for whitelist, 'block' for blacklist"
)
entry: str = SchemaField(
description="Email address or domain to remove from the list"
)
class Output(BlockSchemaOutput):
success: bool = SchemaField(
description="True if the entry was successfully removed"
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="2b8d57f1-1c9e-470f-a70b-5991c80fad5f",
description="Remove an email address or domain from an allow/block list to stop filtering it.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
is_sensitive_action=True,
test_credentials=TEST_CREDENTIALS,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"direction": "receive",
"list_type": "block",
"entry": "spam@example.com",
},
test_output=[("success", True)],
test_mock={
"delete_entry": lambda *a, **kw: None,
},
)
@staticmethod
async def delete_entry(
credentials: APIKeyCredentials, direction: str, list_type: str, entry: str
):
client = _client(credentials)
await client.lists.delete(direction, list_type, entry=entry)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
await self.delete_entry(
credentials,
input_data.direction.value,
input_data.list_type.value,
input_data.entry,
)
yield "success", True
except Exception as e:
yield "error", str(e)

View File

@@ -1,695 +0,0 @@
"""
AgentMail Message blocks — send, list, get, reply, forward, and update messages.
A Message is an individual email within a Thread. Agents can send new messages
(which create threads), reply to existing messages, forward them, and manage
labels for state tracking (e.g. read/unread, campaign tags).
"""
from backend.sdk import (
APIKeyCredentials,
Block,
BlockCategory,
BlockOutput,
BlockSchemaInput,
BlockSchemaOutput,
CredentialsMetaInput,
SchemaField,
)
from ._config import TEST_CREDENTIALS, TEST_CREDENTIALS_INPUT, _client, agent_mail
class AgentMailSendMessageBlock(Block):
"""
Send a new email from an AgentMail inbox, automatically creating a new thread.
Supports plain text and HTML bodies, CC/BCC recipients, and labels for
organizing messages (e.g. campaign tracking, state management).
Max 50 combined recipients across to, cc, and bcc.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
inbox_id: str = SchemaField(
description="Inbox ID or email address to send from (e.g. 'agent@agentmail.to')"
)
to: list[str] = SchemaField(
description="Recipient email addresses (e.g. ['user@example.com'])"
)
subject: str = SchemaField(description="Email subject line")
text: str = SchemaField(
description="Plain text body of the email. Always provide this as a fallback for email clients that don't render HTML."
)
html: str = SchemaField(
description="Rich HTML body of the email. Embed CSS in a <style> tag for best compatibility across email clients.",
default="",
advanced=True,
)
cc: list[str] = SchemaField(
description="CC recipient email addresses for human-in-the-loop oversight",
default_factory=list,
advanced=True,
)
bcc: list[str] = SchemaField(
description="BCC recipient email addresses (hidden from other recipients)",
default_factory=list,
advanced=True,
)
labels: list[str] = SchemaField(
description="Labels to tag the message for filtering and state management (e.g. ['outreach', 'q4-campaign'])",
default_factory=list,
advanced=True,
)
class Output(BlockSchemaOutput):
message_id: str = SchemaField(
description="Unique identifier of the sent message"
)
thread_id: str = SchemaField(
description="Thread ID grouping this message and any future replies"
)
result: dict = SchemaField(
description="Complete sent message object with all metadata"
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="b67469b2-7748-4d81-a223-4ebd332cca89",
description="Send a new email from an AgentMail inbox. Creates a new conversation thread. Supports HTML, CC/BCC, and labels.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
is_sensitive_action=True,
test_credentials=TEST_CREDENTIALS,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"inbox_id": "test-inbox",
"to": ["user@example.com"],
"subject": "Test",
"text": "Hello",
},
test_output=[
("message_id", "mock-msg-id"),
("thread_id", "mock-thread-id"),
("result", dict),
],
test_mock={
"send_message": lambda *a, **kw: type(
"Msg",
(),
{
"message_id": "mock-msg-id",
"thread_id": "mock-thread-id",
"model_dump": lambda self: {
"message_id": "mock-msg-id",
"thread_id": "mock-thread-id",
},
},
)(),
},
)
@staticmethod
async def send_message(credentials: APIKeyCredentials, inbox_id: str, **params):
client = _client(credentials)
return await client.inboxes.messages.send(inbox_id, **params)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
total = len(input_data.to) + len(input_data.cc) + len(input_data.bcc)
if total > 50:
raise ValueError(
f"Max 50 combined recipients across to, cc, and bcc (got {total})"
)
params: dict = {
"to": input_data.to,
"subject": input_data.subject,
"text": input_data.text,
}
if input_data.html:
params["html"] = input_data.html
if input_data.cc:
params["cc"] = input_data.cc
if input_data.bcc:
params["bcc"] = input_data.bcc
if input_data.labels:
params["labels"] = input_data.labels
msg = await self.send_message(credentials, input_data.inbox_id, **params)
result = msg.model_dump()
yield "message_id", msg.message_id
yield "thread_id", msg.thread_id or ""
yield "result", result
except Exception as e:
yield "error", str(e)
class AgentMailListMessagesBlock(Block):
"""
List all messages in an AgentMail inbox with optional label filtering.
Returns a paginated list of messages. Use labels to filter (e.g.
labels=['unread'] to only get unprocessed messages). Useful for
polling workflows or building inbox views.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
inbox_id: str = SchemaField(
description="Inbox ID or email address to list messages from"
)
limit: int = SchemaField(
description="Maximum number of messages to return per page (1-100)",
default=20,
advanced=True,
)
page_token: str = SchemaField(
description="Token from a previous response to fetch the next page",
default="",
advanced=True,
)
labels: list[str] = SchemaField(
description="Only return messages with ALL of these labels (e.g. ['unread'] or ['q4-campaign', 'follow-up'])",
default_factory=list,
advanced=True,
)
class Output(BlockSchemaOutput):
messages: list[dict] = SchemaField(
description="List of message objects with subject, sender, text, html, labels, etc."
)
count: int = SchemaField(description="Number of messages returned")
next_page_token: str = SchemaField(
description="Token for the next page. Empty if no more results.",
default="",
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="721234df-c7a2-4927-b205-744badbd5844",
description="List messages in an AgentMail inbox. Filter by labels to find unread, campaign-tagged, or categorized messages.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"inbox_id": "test-inbox",
},
test_output=[
("messages", []),
("count", 0),
("next_page_token", ""),
],
test_mock={
"list_messages": lambda *a, **kw: type(
"Resp",
(),
{
"messages": [],
"count": 0,
"next_page_token": "",
},
)(),
},
)
@staticmethod
async def list_messages(credentials: APIKeyCredentials, inbox_id: str, **params):
client = _client(credentials)
return await client.inboxes.messages.list(inbox_id, **params)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
params: dict = {"limit": input_data.limit}
if input_data.page_token:
params["page_token"] = input_data.page_token
if input_data.labels:
params["labels"] = input_data.labels
response = await self.list_messages(
credentials, input_data.inbox_id, **params
)
messages = [m.model_dump() for m in response.messages]
yield "messages", messages
yield "count", (c if (c := response.count) is not None else len(messages))
yield "next_page_token", response.next_page_token or ""
except Exception as e:
yield "error", str(e)
class AgentMailGetMessageBlock(Block):
"""
Retrieve a specific email message by ID from an AgentMail inbox.
Returns the full message including subject, body (text and HTML),
sender, recipients, and attachments. Use extracted_text to get
only the new reply content without quoted history.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
inbox_id: str = SchemaField(
description="Inbox ID or email address the message belongs to"
)
message_id: str = SchemaField(
description="Message ID to retrieve (e.g. '<abc123@agentmail.to>')"
)
class Output(BlockSchemaOutput):
message_id: str = SchemaField(description="Unique identifier of the message")
thread_id: str = SchemaField(description="Thread this message belongs to")
subject: str = SchemaField(description="Email subject line")
text: str = SchemaField(
description="Full plain text body (may include quoted reply history)"
)
extracted_text: str = SchemaField(
description="Just the new reply content with quoted history stripped. Best for AI processing.",
default="",
)
html: str = SchemaField(description="HTML body of the email", default="")
result: dict = SchemaField(
description="Complete message object with all fields including sender, recipients, attachments, labels"
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="2788bdfa-1527-4603-a5e4-a455c05c032f",
description="Retrieve a specific email message by ID. Includes extracted_text for clean reply content without quoted history.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"inbox_id": "test-inbox",
"message_id": "test-msg",
},
test_output=[
("message_id", "test-msg"),
("thread_id", "t1"),
("subject", "Hi"),
("text", "Hello"),
("extracted_text", "Hello"),
("html", ""),
("result", dict),
],
test_mock={
"get_message": lambda *a, **kw: type(
"Msg",
(),
{
"message_id": "test-msg",
"thread_id": "t1",
"subject": "Hi",
"text": "Hello",
"extracted_text": "Hello",
"html": "",
"model_dump": lambda self: {"message_id": "test-msg"},
},
)(),
},
)
@staticmethod
async def get_message(
credentials: APIKeyCredentials,
inbox_id: str,
message_id: str,
):
client = _client(credentials)
return await client.inboxes.messages.get(
inbox_id=inbox_id, message_id=message_id
)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
msg = await self.get_message(
credentials, input_data.inbox_id, input_data.message_id
)
result = msg.model_dump()
yield "message_id", msg.message_id
yield "thread_id", msg.thread_id or ""
yield "subject", msg.subject or ""
yield "text", msg.text or ""
yield "extracted_text", msg.extracted_text or ""
yield "html", msg.html or ""
yield "result", result
except Exception as e:
yield "error", str(e)
class AgentMailReplyToMessageBlock(Block):
"""
Reply to an existing email message, keeping the reply in the same thread.
The reply is automatically added to the same conversation thread as the
original message. Use this for multi-turn agent conversations.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
inbox_id: str = SchemaField(
description="Inbox ID or email address to send the reply from"
)
message_id: str = SchemaField(
description="Message ID to reply to (e.g. '<abc123@agentmail.to>')"
)
text: str = SchemaField(description="Plain text body of the reply")
html: str = SchemaField(
description="Rich HTML body of the reply",
default="",
advanced=True,
)
class Output(BlockSchemaOutput):
message_id: str = SchemaField(
description="Unique identifier of the reply message"
)
thread_id: str = SchemaField(description="Thread ID the reply was added to")
result: dict = SchemaField(
description="Complete reply message object with all metadata"
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="b9fe53fa-5026-4547-9570-b54ccb487229",
description="Reply to an existing email in the same conversation thread. Use for multi-turn agent conversations.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
is_sensitive_action=True,
test_credentials=TEST_CREDENTIALS,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"inbox_id": "test-inbox",
"message_id": "test-msg",
"text": "Reply",
},
test_output=[
("message_id", "mock-reply-id"),
("thread_id", "mock-thread-id"),
("result", dict),
],
test_mock={
"reply_to_message": lambda *a, **kw: type(
"Msg",
(),
{
"message_id": "mock-reply-id",
"thread_id": "mock-thread-id",
"model_dump": lambda self: {"message_id": "mock-reply-id"},
},
)(),
},
)
@staticmethod
async def reply_to_message(
credentials: APIKeyCredentials, inbox_id: str, message_id: str, **params
):
client = _client(credentials)
return await client.inboxes.messages.reply(
inbox_id=inbox_id, message_id=message_id, **params
)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
params: dict = {"text": input_data.text}
if input_data.html:
params["html"] = input_data.html
reply = await self.reply_to_message(
credentials,
input_data.inbox_id,
input_data.message_id,
**params,
)
result = reply.model_dump()
yield "message_id", reply.message_id
yield "thread_id", reply.thread_id or ""
yield "result", result
except Exception as e:
yield "error", str(e)
class AgentMailForwardMessageBlock(Block):
"""
Forward an existing email message to one or more recipients.
Sends the original message content to different email addresses.
Optionally prepend additional text or override the subject line.
Max 50 combined recipients across to, cc, and bcc.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
inbox_id: str = SchemaField(
description="Inbox ID or email address to forward from"
)
message_id: str = SchemaField(description="Message ID to forward")
to: list[str] = SchemaField(
description="Recipient email addresses to forward the message to (e.g. ['user@example.com'])"
)
cc: list[str] = SchemaField(
description="CC recipient email addresses",
default_factory=list,
advanced=True,
)
bcc: list[str] = SchemaField(
description="BCC recipient email addresses (hidden from other recipients)",
default_factory=list,
advanced=True,
)
subject: str = SchemaField(
description="Override the subject line (defaults to 'Fwd: <original subject>')",
default="",
advanced=True,
)
text: str = SchemaField(
description="Additional plain text to prepend before the forwarded content",
default="",
advanced=True,
)
html: str = SchemaField(
description="Additional HTML to prepend before the forwarded content",
default="",
advanced=True,
)
class Output(BlockSchemaOutput):
message_id: str = SchemaField(
description="Unique identifier of the forwarded message"
)
thread_id: str = SchemaField(description="Thread ID of the forward")
result: dict = SchemaField(
description="Complete forwarded message object with all metadata"
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="b70c7e33-5d66-4f8e-897f-ac73a7bfce82",
description="Forward an email message to one or more recipients. Supports CC/BCC and optional extra text or subject override.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
is_sensitive_action=True,
test_credentials=TEST_CREDENTIALS,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"inbox_id": "test-inbox",
"message_id": "test-msg",
"to": ["user@example.com"],
},
test_output=[
("message_id", "mock-fwd-id"),
("thread_id", "mock-thread-id"),
("result", dict),
],
test_mock={
"forward_message": lambda *a, **kw: type(
"Msg",
(),
{
"message_id": "mock-fwd-id",
"thread_id": "mock-thread-id",
"model_dump": lambda self: {"message_id": "mock-fwd-id"},
},
)(),
},
)
@staticmethod
async def forward_message(
credentials: APIKeyCredentials, inbox_id: str, message_id: str, **params
):
client = _client(credentials)
return await client.inboxes.messages.forward(
inbox_id=inbox_id, message_id=message_id, **params
)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
total = len(input_data.to) + len(input_data.cc) + len(input_data.bcc)
if total > 50:
raise ValueError(
f"Max 50 combined recipients across to, cc, and bcc (got {total})"
)
params: dict = {"to": input_data.to}
if input_data.cc:
params["cc"] = input_data.cc
if input_data.bcc:
params["bcc"] = input_data.bcc
if input_data.subject:
params["subject"] = input_data.subject
if input_data.text:
params["text"] = input_data.text
if input_data.html:
params["html"] = input_data.html
fwd = await self.forward_message(
credentials,
input_data.inbox_id,
input_data.message_id,
**params,
)
result = fwd.model_dump()
yield "message_id", fwd.message_id
yield "thread_id", fwd.thread_id or ""
yield "result", result
except Exception as e:
yield "error", str(e)
class AgentMailUpdateMessageBlock(Block):
"""
Add or remove labels on an email message for state management.
Labels are string tags used to track message state (read/unread),
categorize messages (billing, support), or tag campaigns (q4-outreach).
Common pattern: add 'read' and remove 'unread' after processing a message.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
inbox_id: str = SchemaField(
description="Inbox ID or email address the message belongs to"
)
message_id: str = SchemaField(description="Message ID to update labels on")
add_labels: list[str] = SchemaField(
description="Labels to add (e.g. ['read', 'processed', 'high-priority'])",
default_factory=list,
)
remove_labels: list[str] = SchemaField(
description="Labels to remove (e.g. ['unread', 'pending'])",
default_factory=list,
)
class Output(BlockSchemaOutput):
message_id: str = SchemaField(description="The updated message ID")
result: dict = SchemaField(
description="Complete updated message object with current labels"
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="694ff816-4c89-4a5e-a552-8c31be187735",
description="Add or remove labels on an email message. Use for read/unread tracking, campaign tagging, or state management.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"inbox_id": "test-inbox",
"message_id": "test-msg",
"add_labels": ["read"],
},
test_output=[
("message_id", "test-msg"),
("result", dict),
],
test_mock={
"update_message": lambda *a, **kw: type(
"Msg",
(),
{
"message_id": "test-msg",
"model_dump": lambda self: {"message_id": "test-msg"},
},
)(),
},
)
@staticmethod
async def update_message(
credentials: APIKeyCredentials, inbox_id: str, message_id: str, **params
):
client = _client(credentials)
return await client.inboxes.messages.update(
inbox_id=inbox_id, message_id=message_id, **params
)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
if not input_data.add_labels and not input_data.remove_labels:
raise ValueError(
"Must specify at least one label operation: add_labels or remove_labels"
)
params: dict = {}
if input_data.add_labels:
params["add_labels"] = input_data.add_labels
if input_data.remove_labels:
params["remove_labels"] = input_data.remove_labels
msg = await self.update_message(
credentials,
input_data.inbox_id,
input_data.message_id,
**params,
)
result = msg.model_dump()
yield "message_id", msg.message_id
yield "result", result
except Exception as e:
yield "error", str(e)

View File

@@ -1,651 +0,0 @@
"""
AgentMail Pod blocks — create, get, list, delete pods and list pod-scoped resources.
Pods provide multi-tenant isolation between your customers. Each pod acts as
an isolated workspace containing its own inboxes, domains, threads, and drafts.
Use pods when building SaaS platforms, agency tools, or AI agent fleets that
serve multiple customers.
"""
from backend.sdk import (
APIKeyCredentials,
Block,
BlockCategory,
BlockOutput,
BlockSchemaInput,
BlockSchemaOutput,
CredentialsMetaInput,
SchemaField,
)
from ._config import TEST_CREDENTIALS, TEST_CREDENTIALS_INPUT, _client, agent_mail
class AgentMailCreatePodBlock(Block):
"""
Create a new pod for multi-tenant customer isolation.
Each pod acts as an isolated workspace for one customer or tenant.
Use client_id to map pods to your internal tenant IDs for idempotent
creation (safe to retry without creating duplicates).
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
client_id: str = SchemaField(
description="Your internal tenant/customer ID for idempotent mapping. Lets you access the pod by your own ID instead of AgentMail's pod_id.",
default="",
)
class Output(BlockSchemaOutput):
pod_id: str = SchemaField(description="Unique identifier of the created pod")
result: dict = SchemaField(description="Complete pod object with all metadata")
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="a2db9784-2d17-4f8f-9d6b-0214e6f22101",
description="Create a new pod for multi-tenant customer isolation. Use client_id to map to your internal tenant IDs.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={"credentials": TEST_CREDENTIALS_INPUT},
test_output=[
("pod_id", "mock-pod-id"),
("result", dict),
],
test_mock={
"create_pod": lambda *a, **kw: type(
"Pod",
(),
{
"pod_id": "mock-pod-id",
"model_dump": lambda self: {"pod_id": "mock-pod-id"},
},
)(),
},
)
@staticmethod
async def create_pod(credentials: APIKeyCredentials, **params):
client = _client(credentials)
return await client.pods.create(**params)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
params: dict = {}
if input_data.client_id:
params["client_id"] = input_data.client_id
pod = await self.create_pod(credentials, **params)
result = pod.model_dump()
yield "pod_id", pod.pod_id
yield "result", result
except Exception as e:
yield "error", str(e)
class AgentMailGetPodBlock(Block):
"""
Retrieve details of an existing pod by its ID.
Returns the pod metadata including its client_id mapping and
creation timestamp.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
pod_id: str = SchemaField(description="Pod ID to retrieve")
class Output(BlockSchemaOutput):
pod_id: str = SchemaField(description="Unique identifier of the pod")
result: dict = SchemaField(description="Complete pod object with all metadata")
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="553361bc-bb1b-4322-9ad4-0c226200217e",
description="Retrieve details of an existing pod including its client_id mapping and metadata.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={"credentials": TEST_CREDENTIALS_INPUT, "pod_id": "test-pod"},
test_output=[
("pod_id", "test-pod"),
("result", dict),
],
test_mock={
"get_pod": lambda *a, **kw: type(
"Pod",
(),
{
"pod_id": "test-pod",
"model_dump": lambda self: {"pod_id": "test-pod"},
},
)(),
},
)
@staticmethod
async def get_pod(credentials: APIKeyCredentials, pod_id: str):
client = _client(credentials)
return await client.pods.get(pod_id=pod_id)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
pod = await self.get_pod(credentials, pod_id=input_data.pod_id)
result = pod.model_dump()
yield "pod_id", pod.pod_id
yield "result", result
except Exception as e:
yield "error", str(e)
class AgentMailListPodsBlock(Block):
"""
List all pods in your AgentMail organization.
Returns a paginated list of all tenant pods with their metadata.
Use this to see all customer workspaces at a glance.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
limit: int = SchemaField(
description="Maximum number of pods to return per page (1-100)",
default=20,
advanced=True,
)
page_token: str = SchemaField(
description="Token from a previous response to fetch the next page",
default="",
advanced=True,
)
class Output(BlockSchemaOutput):
pods: list[dict] = SchemaField(
description="List of pod objects with pod_id, client_id, creation time, etc."
)
count: int = SchemaField(description="Number of pods returned")
next_page_token: str = SchemaField(
description="Token for the next page. Empty if no more results.",
default="",
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="9d3725ee-2968-431a-a816-857ab41e1420",
description="List all tenant pods in your organization. See all customer workspaces at a glance.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={"credentials": TEST_CREDENTIALS_INPUT},
test_output=[
("pods", []),
("count", 0),
("next_page_token", ""),
],
test_mock={
"list_pods": lambda *a, **kw: type(
"Resp",
(),
{
"pods": [],
"count": 0,
"next_page_token": "",
},
)(),
},
)
@staticmethod
async def list_pods(credentials: APIKeyCredentials, **params):
client = _client(credentials)
return await client.pods.list(**params)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
params: dict = {"limit": input_data.limit}
if input_data.page_token:
params["page_token"] = input_data.page_token
response = await self.list_pods(credentials, **params)
pods = [p.model_dump() for p in response.pods]
yield "pods", pods
yield "count", response.count
yield "next_page_token", response.next_page_token or ""
except Exception as e:
yield "error", str(e)
class AgentMailDeletePodBlock(Block):
"""
Permanently delete a pod. All inboxes and domains must be removed first.
You cannot delete a pod that still contains inboxes or domains.
Delete all child resources first, then delete the pod.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
pod_id: str = SchemaField(
description="Pod ID to permanently delete (must have no inboxes or domains)"
)
class Output(BlockSchemaOutput):
success: bool = SchemaField(
description="True if the pod was successfully deleted"
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="f371f8cd-682d-4f5f-905c-529c74a8fb35",
description="Permanently delete a pod. All inboxes and domains must be removed first.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
is_sensitive_action=True,
test_credentials=TEST_CREDENTIALS,
test_input={"credentials": TEST_CREDENTIALS_INPUT, "pod_id": "test-pod"},
test_output=[("success", True)],
test_mock={
"delete_pod": lambda *a, **kw: None,
},
)
@staticmethod
async def delete_pod(credentials: APIKeyCredentials, pod_id: str):
client = _client(credentials)
await client.pods.delete(pod_id=pod_id)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
await self.delete_pod(credentials, pod_id=input_data.pod_id)
yield "success", True
except Exception as e:
yield "error", str(e)
class AgentMailListPodInboxesBlock(Block):
"""
List all inboxes within a specific pod (customer workspace).
Returns only the inboxes belonging to this pod, providing
tenant-scoped visibility.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
pod_id: str = SchemaField(description="Pod ID to list inboxes from")
limit: int = SchemaField(
description="Maximum number of inboxes to return per page (1-100)",
default=20,
advanced=True,
)
page_token: str = SchemaField(
description="Token from a previous response to fetch the next page",
default="",
advanced=True,
)
class Output(BlockSchemaOutput):
inboxes: list[dict] = SchemaField(
description="List of inbox objects within this pod"
)
count: int = SchemaField(description="Number of inboxes returned")
next_page_token: str = SchemaField(
description="Token for the next page. Empty if no more results.",
default="",
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="a8c17ce0-b7c1-4bc3-ae39-680e1952e5d0",
description="List all inboxes within a pod. View email accounts scoped to a specific customer.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={"credentials": TEST_CREDENTIALS_INPUT, "pod_id": "test-pod"},
test_output=[
("inboxes", []),
("count", 0),
("next_page_token", ""),
],
test_mock={
"list_pod_inboxes": lambda *a, **kw: type(
"Resp",
(),
{
"inboxes": [],
"count": 0,
"next_page_token": "",
},
)(),
},
)
@staticmethod
async def list_pod_inboxes(credentials: APIKeyCredentials, pod_id: str, **params):
client = _client(credentials)
return await client.pods.inboxes.list(pod_id=pod_id, **params)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
params: dict = {"limit": input_data.limit}
if input_data.page_token:
params["page_token"] = input_data.page_token
response = await self.list_pod_inboxes(
credentials, pod_id=input_data.pod_id, **params
)
inboxes = [i.model_dump() for i in response.inboxes]
yield "inboxes", inboxes
yield "count", response.count
yield "next_page_token", response.next_page_token or ""
except Exception as e:
yield "error", str(e)
class AgentMailListPodThreadsBlock(Block):
"""
List all conversation threads across all inboxes within a pod.
Returns threads from every inbox in the pod. Use for building
per-customer dashboards showing all email activity, or for
supervisor agents monitoring a customer's conversations.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
pod_id: str = SchemaField(description="Pod ID to list threads from")
limit: int = SchemaField(
description="Maximum number of threads to return per page (1-100)",
default=20,
advanced=True,
)
page_token: str = SchemaField(
description="Token from a previous response to fetch the next page",
default="",
advanced=True,
)
labels: list[str] = SchemaField(
description="Only return threads matching ALL of these labels",
default_factory=list,
advanced=True,
)
class Output(BlockSchemaOutput):
threads: list[dict] = SchemaField(
description="List of thread objects from all inboxes in this pod"
)
count: int = SchemaField(description="Number of threads returned")
next_page_token: str = SchemaField(
description="Token for the next page. Empty if no more results.",
default="",
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="80214f08-8b85-4533-a6b8-f8123bfcb410",
description="List all conversation threads across all inboxes within a pod. View all email activity for a customer.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={"credentials": TEST_CREDENTIALS_INPUT, "pod_id": "test-pod"},
test_output=[
("threads", []),
("count", 0),
("next_page_token", ""),
],
test_mock={
"list_pod_threads": lambda *a, **kw: type(
"Resp",
(),
{
"threads": [],
"count": 0,
"next_page_token": "",
},
)(),
},
)
@staticmethod
async def list_pod_threads(credentials: APIKeyCredentials, pod_id: str, **params):
client = _client(credentials)
return await client.pods.threads.list(pod_id=pod_id, **params)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
params: dict = {"limit": input_data.limit}
if input_data.page_token:
params["page_token"] = input_data.page_token
if input_data.labels:
params["labels"] = input_data.labels
response = await self.list_pod_threads(
credentials, pod_id=input_data.pod_id, **params
)
threads = [t.model_dump() for t in response.threads]
yield "threads", threads
yield "count", response.count
yield "next_page_token", response.next_page_token or ""
except Exception as e:
yield "error", str(e)
class AgentMailListPodDraftsBlock(Block):
"""
List all drafts across all inboxes within a pod.
Returns pending drafts from every inbox in the pod. Use for
per-customer approval dashboards or monitoring scheduled sends.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
pod_id: str = SchemaField(description="Pod ID to list drafts from")
limit: int = SchemaField(
description="Maximum number of drafts to return per page (1-100)",
default=20,
advanced=True,
)
page_token: str = SchemaField(
description="Token from a previous response to fetch the next page",
default="",
advanced=True,
)
class Output(BlockSchemaOutput):
drafts: list[dict] = SchemaField(
description="List of draft objects from all inboxes in this pod"
)
count: int = SchemaField(description="Number of drafts returned")
next_page_token: str = SchemaField(
description="Token for the next page. Empty if no more results.",
default="",
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="12fd7a3e-51ad-4b20-97c1-0391f207f517",
description="List all drafts across all inboxes within a pod. View pending emails for a customer.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={"credentials": TEST_CREDENTIALS_INPUT, "pod_id": "test-pod"},
test_output=[
("drafts", []),
("count", 0),
("next_page_token", ""),
],
test_mock={
"list_pod_drafts": lambda *a, **kw: type(
"Resp",
(),
{
"drafts": [],
"count": 0,
"next_page_token": "",
},
)(),
},
)
@staticmethod
async def list_pod_drafts(credentials: APIKeyCredentials, pod_id: str, **params):
client = _client(credentials)
return await client.pods.drafts.list(pod_id=pod_id, **params)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
params: dict = {"limit": input_data.limit}
if input_data.page_token:
params["page_token"] = input_data.page_token
response = await self.list_pod_drafts(
credentials, pod_id=input_data.pod_id, **params
)
drafts = [d.model_dump() for d in response.drafts]
yield "drafts", drafts
yield "count", response.count
yield "next_page_token", response.next_page_token or ""
except Exception as e:
yield "error", str(e)
class AgentMailCreatePodInboxBlock(Block):
"""
Create a new email inbox within a specific pod (customer workspace).
The inbox is automatically scoped to the pod and inherits its
isolation guarantees. If username/domain are not provided,
AgentMail auto-generates a unique address.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
pod_id: str = SchemaField(description="Pod ID to create the inbox in")
username: str = SchemaField(
description="Local part of the email address (e.g. 'support'). Leave empty to auto-generate.",
default="",
)
domain: str = SchemaField(
description="Email domain (e.g. 'mydomain.com'). Defaults to agentmail.to if empty.",
default="",
)
display_name: str = SchemaField(
description="Friendly name shown in the 'From' field (e.g. 'Customer Support')",
default="",
)
class Output(BlockSchemaOutput):
inbox_id: str = SchemaField(
description="Unique identifier of the created inbox"
)
email_address: str = SchemaField(description="Full email address of the inbox")
result: dict = SchemaField(
description="Complete inbox object with all metadata"
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="c6862373-1ac6-402e-89e6-7db1fea882af",
description="Create a new email inbox within a pod. The inbox is scoped to the customer workspace.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={"credentials": TEST_CREDENTIALS_INPUT, "pod_id": "test-pod"},
test_output=[
("inbox_id", "mock-inbox-id"),
("email_address", "mock-inbox-id"),
("result", dict),
],
test_mock={
"create_pod_inbox": lambda *a, **kw: type(
"Inbox",
(),
{
"inbox_id": "mock-inbox-id",
"model_dump": lambda self: {"inbox_id": "mock-inbox-id"},
},
)(),
},
)
@staticmethod
async def create_pod_inbox(credentials: APIKeyCredentials, pod_id: str, **params):
client = _client(credentials)
return await client.pods.inboxes.create(pod_id=pod_id, **params)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
params: dict = {}
if input_data.username:
params["username"] = input_data.username
if input_data.domain:
params["domain"] = input_data.domain
if input_data.display_name:
params["display_name"] = input_data.display_name
inbox = await self.create_pod_inbox(
credentials, pod_id=input_data.pod_id, **params
)
result = inbox.model_dump()
yield "inbox_id", inbox.inbox_id
yield "email_address", inbox.inbox_id
yield "result", result
except Exception as e:
yield "error", str(e)

View File

@@ -1,438 +0,0 @@
"""
AgentMail Thread blocks — list, get, and delete conversation threads.
A Thread groups related messages into a single conversation. Threads are
created automatically when a new message is sent and grow as replies are added.
Threads can be queried per-inbox or across the entire organization.
"""
from backend.sdk import (
APIKeyCredentials,
Block,
BlockCategory,
BlockOutput,
BlockSchemaInput,
BlockSchemaOutput,
CredentialsMetaInput,
SchemaField,
)
from ._config import TEST_CREDENTIALS, TEST_CREDENTIALS_INPUT, _client, agent_mail
class AgentMailListInboxThreadsBlock(Block):
"""
List all conversation threads within a specific AgentMail inbox.
Returns a paginated list of threads with optional label filtering.
Use labels to find threads by campaign, status, or custom tags.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
inbox_id: str = SchemaField(
description="Inbox ID or email address to list threads from"
)
limit: int = SchemaField(
description="Maximum number of threads to return per page (1-100)",
default=20,
advanced=True,
)
page_token: str = SchemaField(
description="Token from a previous response to fetch the next page",
default="",
advanced=True,
)
labels: list[str] = SchemaField(
description="Only return threads matching ALL of these labels (e.g. ['q4-campaign', 'follow-up'])",
default_factory=list,
advanced=True,
)
class Output(BlockSchemaOutput):
threads: list[dict] = SchemaField(
description="List of thread objects with thread_id, subject, message count, labels, etc."
)
count: int = SchemaField(description="Number of threads returned")
next_page_token: str = SchemaField(
description="Token for the next page. Empty if no more results.",
default="",
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="63dd9e2d-ef81-405c-b034-c031f0437334",
description="List all conversation threads in an AgentMail inbox. Filter by labels for campaign tracking or status management.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"inbox_id": "test-inbox",
},
test_output=[
("threads", []),
("count", 0),
("next_page_token", ""),
],
test_mock={
"list_threads": lambda *a, **kw: type(
"Resp",
(),
{
"threads": [],
"count": 0,
"next_page_token": "",
},
)(),
},
)
@staticmethod
async def list_threads(credentials: APIKeyCredentials, inbox_id: str, **params):
client = _client(credentials)
return await client.inboxes.threads.list(inbox_id=inbox_id, **params)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
params: dict = {"limit": input_data.limit}
if input_data.page_token:
params["page_token"] = input_data.page_token
if input_data.labels:
params["labels"] = input_data.labels
response = await self.list_threads(
credentials, input_data.inbox_id, **params
)
threads = [t.model_dump() for t in response.threads]
yield "threads", threads
yield "count", (c if (c := response.count) is not None else len(threads))
yield "next_page_token", response.next_page_token or ""
except Exception as e:
yield "error", str(e)
class AgentMailGetInboxThreadBlock(Block):
"""
Retrieve a single conversation thread from an AgentMail inbox.
Returns the thread with all its messages in chronological order.
Use this to get the full conversation history for context when
composing replies.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
inbox_id: str = SchemaField(
description="Inbox ID or email address the thread belongs to"
)
thread_id: str = SchemaField(description="Thread ID to retrieve")
class Output(BlockSchemaOutput):
thread_id: str = SchemaField(description="Unique identifier of the thread")
messages: list[dict] = SchemaField(
description="All messages in the thread, in chronological order"
)
result: dict = SchemaField(
description="Complete thread object with all metadata"
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="42866290-1479-4153-83e7-550b703e9da2",
description="Retrieve a conversation thread with all its messages. Use for getting full conversation context before replying.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"inbox_id": "test-inbox",
"thread_id": "test-thread",
},
test_output=[
("thread_id", "test-thread"),
("messages", []),
("result", dict),
],
test_mock={
"get_thread": lambda *a, **kw: type(
"Thread",
(),
{
"thread_id": "test-thread",
"messages": [],
"model_dump": lambda self: {
"thread_id": "test-thread",
"messages": [],
},
},
)(),
},
)
@staticmethod
async def get_thread(credentials: APIKeyCredentials, inbox_id: str, thread_id: str):
client = _client(credentials)
return await client.inboxes.threads.get(inbox_id=inbox_id, thread_id=thread_id)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
thread = await self.get_thread(
credentials, input_data.inbox_id, input_data.thread_id
)
messages = [m.model_dump() for m in thread.messages]
result = thread.model_dump()
result["messages"] = messages
yield "thread_id", thread.thread_id
yield "messages", messages
yield "result", result
except Exception as e:
yield "error", str(e)
class AgentMailDeleteInboxThreadBlock(Block):
"""
Permanently delete a conversation thread and all its messages from an inbox.
This removes the thread and every message within it. This action
cannot be undone.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
inbox_id: str = SchemaField(
description="Inbox ID or email address the thread belongs to"
)
thread_id: str = SchemaField(description="Thread ID to permanently delete")
class Output(BlockSchemaOutput):
success: bool = SchemaField(
description="True if the thread was successfully deleted"
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="18cd5f6f-4ff6-45da-8300-25a50ea7fb75",
description="Permanently delete a conversation thread and all its messages. This action cannot be undone.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
is_sensitive_action=True,
test_credentials=TEST_CREDENTIALS,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"inbox_id": "test-inbox",
"thread_id": "test-thread",
},
test_output=[("success", True)],
test_mock={
"delete_thread": lambda *a, **kw: None,
},
)
@staticmethod
async def delete_thread(
credentials: APIKeyCredentials, inbox_id: str, thread_id: str
):
client = _client(credentials)
await client.inboxes.threads.delete(inbox_id=inbox_id, thread_id=thread_id)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
await self.delete_thread(
credentials, input_data.inbox_id, input_data.thread_id
)
yield "success", True
except Exception as e:
yield "error", str(e)
class AgentMailListOrgThreadsBlock(Block):
"""
List conversation threads across ALL inboxes in your organization.
Unlike per-inbox listing, this returns threads from every inbox.
Ideal for building supervisor agents that monitor all conversations,
analytics dashboards, or cross-agent routing workflows.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
limit: int = SchemaField(
description="Maximum number of threads to return per page (1-100)",
default=20,
advanced=True,
)
page_token: str = SchemaField(
description="Token from a previous response to fetch the next page",
default="",
advanced=True,
)
labels: list[str] = SchemaField(
description="Only return threads matching ALL of these labels",
default_factory=list,
advanced=True,
)
class Output(BlockSchemaOutput):
threads: list[dict] = SchemaField(
description="List of thread objects from all inboxes in the organization"
)
count: int = SchemaField(description="Number of threads returned")
next_page_token: str = SchemaField(
description="Token for the next page. Empty if no more results.",
default="",
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="d7a0657b-58ab-48b2-898b-7bd94f44a708",
description="List threads across ALL inboxes in your organization. Use for supervisor agents, dashboards, or cross-agent monitoring.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={"credentials": TEST_CREDENTIALS_INPUT},
test_output=[
("threads", []),
("count", 0),
("next_page_token", ""),
],
test_mock={
"list_org_threads": lambda *a, **kw: type(
"Resp",
(),
{
"threads": [],
"count": 0,
"next_page_token": "",
},
)(),
},
)
@staticmethod
async def list_org_threads(credentials: APIKeyCredentials, **params):
client = _client(credentials)
return await client.threads.list(**params)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
params: dict = {"limit": input_data.limit}
if input_data.page_token:
params["page_token"] = input_data.page_token
if input_data.labels:
params["labels"] = input_data.labels
response = await self.list_org_threads(credentials, **params)
threads = [t.model_dump() for t in response.threads]
yield "threads", threads
yield "count", (c if (c := response.count) is not None else len(threads))
yield "next_page_token", response.next_page_token or ""
except Exception as e:
yield "error", str(e)
class AgentMailGetOrgThreadBlock(Block):
"""
Retrieve a single conversation thread by ID from anywhere in the organization.
Works without needing to know which inbox the thread belongs to.
Returns the thread with all its messages in chronological order.
"""
class Input(BlockSchemaInput):
credentials: CredentialsMetaInput = agent_mail.credentials_field(
description="AgentMail API key from https://console.agentmail.to"
)
thread_id: str = SchemaField(
description="Thread ID to retrieve (works across all inboxes)"
)
class Output(BlockSchemaOutput):
thread_id: str = SchemaField(description="Unique identifier of the thread")
messages: list[dict] = SchemaField(
description="All messages in the thread, in chronological order"
)
result: dict = SchemaField(
description="Complete thread object with all metadata"
)
error: str = SchemaField(description="Error message if the operation failed")
def __init__(self):
super().__init__(
id="39aaae31-3eb1-44c6-9e37-5a44a4529649",
description="Retrieve a conversation thread by ID from anywhere in the organization, without needing the inbox ID.",
categories={BlockCategory.COMMUNICATION},
input_schema=self.Input,
output_schema=self.Output,
test_credentials=TEST_CREDENTIALS,
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"thread_id": "test-thread",
},
test_output=[
("thread_id", "test-thread"),
("messages", []),
("result", dict),
],
test_mock={
"get_org_thread": lambda *a, **kw: type(
"Thread",
(),
{
"thread_id": "test-thread",
"messages": [],
"model_dump": lambda self: {
"thread_id": "test-thread",
"messages": [],
},
},
)(),
},
)
@staticmethod
async def get_org_thread(credentials: APIKeyCredentials, thread_id: str):
client = _client(credentials)
return await client.threads.get(thread_id=thread_id)
async def run(
self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
) -> BlockOutput:
try:
thread = await self.get_org_thread(credentials, input_data.thread_id)
messages = [m.model_dump() for m in thread.messages]
result = thread.model_dump()
result["messages"] = messages
yield "thread_id", thread.thread_id
yield "messages", messages
yield "result", result
except Exception as e:
yield "error", str(e)

View File

@@ -27,7 +27,6 @@ from backend.util.file import MediaFileType, store_media_file
class GeminiImageModel(str, Enum):
NANO_BANANA = "google/nano-banana"
NANO_BANANA_PRO = "google/nano-banana-pro"
NANO_BANANA_2 = "google/nano-banana-2"
class AspectRatio(str, Enum):
@@ -78,7 +77,7 @@ class AIImageCustomizerBlock(Block):
)
model: GeminiImageModel = SchemaField(
description="The AI model to use for image generation and editing",
default=GeminiImageModel.NANO_BANANA_2,
default=GeminiImageModel.NANO_BANANA,
title="Model",
)
images: list[MediaFileType] = SchemaField(
@@ -104,7 +103,7 @@ class AIImageCustomizerBlock(Block):
super().__init__(
id="d76bbe4c-930e-4894-8469-b66775511f71",
description=(
"Generate and edit custom images using Google's Nano-Banana models from Gemini. "
"Generate and edit custom images using Google's Nano-Banana model from Gemini 2.5. "
"Provide a prompt and optional reference images to create or modify images."
),
categories={BlockCategory.AI, BlockCategory.MULTIMEDIA},
@@ -112,7 +111,7 @@ class AIImageCustomizerBlock(Block):
output_schema=AIImageCustomizerBlock.Output,
test_input={
"prompt": "Make the scene more vibrant and colorful",
"model": GeminiImageModel.NANO_BANANA_2,
"model": GeminiImageModel.NANO_BANANA,
"images": [],
"aspect_ratio": AspectRatio.MATCH_INPUT_IMAGE,
"output_format": OutputFormat.JPG,

View File

@@ -115,7 +115,6 @@ class ImageGenModel(str, Enum):
RECRAFT = "Recraft v3"
SD3_5 = "Stable Diffusion 3.5 Medium"
NANO_BANANA_PRO = "Nano Banana Pro"
NANO_BANANA_2 = "Nano Banana 2"
class AIImageGeneratorBlock(Block):
@@ -132,7 +131,7 @@ class AIImageGeneratorBlock(Block):
)
model: ImageGenModel = SchemaField(
description="The AI model to use for image generation",
default=ImageGenModel.NANO_BANANA_2,
default=ImageGenModel.SD3_5,
title="Model",
)
size: ImageSize = SchemaField(
@@ -166,7 +165,7 @@ class AIImageGeneratorBlock(Block):
test_input={
"credentials": TEST_CREDENTIALS_INPUT,
"prompt": "An octopus using a laptop in a snowy forest with 'AutoGPT' clearly visible on the screen",
"model": ImageGenModel.NANO_BANANA_2,
"model": ImageGenModel.RECRAFT,
"size": ImageSize.SQUARE,
"style": ImageStyle.REALISTIC,
},
@@ -180,9 +179,7 @@ class AIImageGeneratorBlock(Block):
],
test_mock={
# Return a data URI directly so store_media_file doesn't need to download
"_run_client": lambda *args, **kwargs: (
"data:image/webp;base64,UklGRiQAAABXRUJQVlA4IBgAAAAwAQCdASoBAAEAAQAcJYgCdAEO"
)
"_run_client": lambda *args, **kwargs: "data:image/webp;base64,UklGRiQAAABXRUJQVlA4IBgAAAAwAQCdASoBAAEAAQAcJYgCdAEO"
},
)
@@ -283,24 +280,17 @@ class AIImageGeneratorBlock(Block):
)
return output
elif input_data.model in (
ImageGenModel.NANO_BANANA_PRO,
ImageGenModel.NANO_BANANA_2,
):
# Use Nano Banana models (Google Gemini image variants)
model_map = {
ImageGenModel.NANO_BANANA_PRO: "google/nano-banana-pro",
ImageGenModel.NANO_BANANA_2: "google/nano-banana-2",
}
elif input_data.model == ImageGenModel.NANO_BANANA_PRO:
# Use Nano Banana Pro (Google Gemini 3 Pro Image)
input_params = {
"prompt": modified_prompt,
"aspect_ratio": SIZE_TO_NANO_BANANA_RATIO[input_data.size],
"resolution": "2K",
"resolution": "2K", # Default to 2K for good quality/cost balance
"output_format": "jpg",
"safety_filter_level": "block_only_high",
"safety_filter_level": "block_only_high", # Most permissive
}
output = await self._run_client(
credentials, model_map[input_data.model], input_params
credentials, "google/nano-banana-pro", input_params
)
return output

View File

@@ -1,376 +0,0 @@
from __future__ import annotations
import asyncio
import contextvars
import json
import logging
from typing import TYPE_CHECKING, Any
from typing_extensions import TypedDict # Needed for Python <3.12 compatibility
from backend.blocks._base import (
Block,
BlockCategory,
BlockOutput,
BlockSchemaInput,
BlockSchemaOutput,
)
from backend.data.model import SchemaField
if TYPE_CHECKING:
from backend.data.execution import ExecutionContext
logger = logging.getLogger(__name__)
# Block ID shared between autopilot.py and copilot prompting.py.
AUTOPILOT_BLOCK_ID = "c069dc6b-c3ed-4c12-b6e5-d47361e64ce6"
class ToolCallEntry(TypedDict):
"""A single tool invocation record from an autopilot execution."""
tool_call_id: str
tool_name: str
input: Any
output: Any | None
success: bool | None
class TokenUsage(TypedDict):
"""Aggregated token counts from the autopilot stream."""
prompt_tokens: int
completion_tokens: int
total_tokens: int
class AutoPilotBlock(Block):
"""Execute tasks using AutoGPT AutoPilot with full access to platform tools.
The autopilot can manage agents, access workspace files, fetch web content,
run blocks, and more. This block enables sub-agent patterns (autopilot calling
autopilot) and scheduled autopilot execution via the agent executor.
"""
class Input(BlockSchemaInput):
"""Input schema for the AutoPilot block."""
prompt: str = SchemaField(
description=(
"The task or instruction for the autopilot to execute. "
"The autopilot has access to platform tools like agent management, "
"workspace files, web fetch, block execution, and more."
),
placeholder="Find my agents and list them",
advanced=False,
)
system_context: str = SchemaField(
description=(
"Optional additional context prepended to the prompt. "
"Use this to constrain autopilot behavior, provide domain "
"context, or set output format requirements."
),
default="",
advanced=True,
)
session_id: str = SchemaField(
description=(
"Session ID to continue an existing autopilot conversation. "
"Leave empty to start a new session. "
"Use the session_id output from a previous run to continue."
),
default="",
advanced=True,
)
max_recursion_depth: int = SchemaField(
description=(
"Maximum nesting depth when the autopilot calls this block "
"recursively (sub-agent pattern). Prevents infinite loops."
),
default=3,
ge=1,
le=10,
advanced=True,
)
# timeout_seconds removed: the SDK manages its own heartbeat-based
# timeouts internally; wrapping with asyncio.timeout corrupts the
# SDK's internal stream (see service.py CRITICAL comment).
class Output(BlockSchemaOutput):
"""Output schema for the AutoPilot block."""
response: str = SchemaField(
description="The final text response from the autopilot."
)
tool_calls: list[ToolCallEntry] = SchemaField(
description=(
"List of tools called during execution. Each entry has "
"tool_call_id, tool_name, input, output, and success fields."
),
)
conversation_history: str = SchemaField(
description=(
"Current turn messages (user prompt + assistant reply) as JSON. "
"It can be used for logging or analysis."
),
)
session_id: str = SchemaField(
description=(
"Session ID for this conversation. "
"Pass this back to continue the conversation in a future run."
),
)
token_usage: TokenUsage = SchemaField(
description=(
"Token usage statistics: prompt_tokens, "
"completion_tokens, total_tokens."
),
)
def __init__(self):
super().__init__(
id=AUTOPILOT_BLOCK_ID,
description=(
"Execute tasks using AutoGPT AutoPilot with full access to "
"platform tools (agent management, workspace files, web fetch, "
"block execution, and more). Enables sub-agent patterns and "
"scheduled autopilot execution."
),
categories={BlockCategory.AI, BlockCategory.AGENT},
input_schema=AutoPilotBlock.Input,
output_schema=AutoPilotBlock.Output,
test_input={
"prompt": "List my agents",
"system_context": "",
"session_id": "",
"max_recursion_depth": 3,
},
test_output=[
("response", "You have 2 agents: Agent A and Agent B."),
("tool_calls", []),
(
"conversation_history",
'[{"role": "user", "content": "List my agents"}]',
),
("session_id", "test-session-id"),
(
"token_usage",
{
"prompt_tokens": 100,
"completion_tokens": 50,
"total_tokens": 150,
},
),
],
test_mock={
"create_session": lambda *args, **kwargs: "test-session-id",
"execute_copilot": lambda *args, **kwargs: (
"You have 2 agents: Agent A and Agent B.",
[],
'[{"role": "user", "content": "List my agents"}]',
"test-session-id",
{
"prompt_tokens": 100,
"completion_tokens": 50,
"total_tokens": 150,
},
),
},
)
async def create_session(self, user_id: str) -> str:
"""Create a new chat session and return its ID (mockable for tests)."""
from backend.copilot.model import create_chat_session
session = await create_chat_session(user_id)
return session.session_id
async def execute_copilot(
self,
prompt: str,
system_context: str,
session_id: str,
max_recursion_depth: int,
user_id: str,
) -> tuple[str, list[ToolCallEntry], str, str, TokenUsage]:
"""Invoke the copilot and collect all stream results.
Delegates to :func:`collect_copilot_response` — the shared helper that
consumes ``stream_chat_completion_sdk`` without wrapping it in an
``asyncio.timeout`` (the SDK manages its own heartbeat-based timeouts).
Args:
prompt: The user task/instruction.
system_context: Optional context prepended to the prompt.
session_id: Chat session to use.
max_recursion_depth: Maximum allowed recursion nesting.
user_id: Authenticated user ID.
Returns:
A tuple of (response_text, tool_calls, history_json, session_id, usage).
"""
from backend.copilot.sdk.collect import collect_copilot_response
tokens = _check_recursion(max_recursion_depth)
try:
effective_prompt = prompt
if system_context:
effective_prompt = f"[System Context: {system_context}]\n\n{prompt}"
result = await collect_copilot_response(
session_id=session_id,
message=effective_prompt,
user_id=user_id,
)
# Build a lightweight conversation summary from streamed data.
turn_messages: list[dict[str, Any]] = [
{"role": "user", "content": effective_prompt},
]
if result.tool_calls:
turn_messages.append(
{
"role": "assistant",
"content": result.response_text,
"tool_calls": result.tool_calls,
}
)
else:
turn_messages.append(
{"role": "assistant", "content": result.response_text}
)
history_json = json.dumps(turn_messages, default=str)
tool_calls: list[ToolCallEntry] = [
{
"tool_call_id": tc["tool_call_id"],
"tool_name": tc["tool_name"],
"input": tc["input"],
"output": tc["output"],
"success": tc["success"],
}
for tc in result.tool_calls
]
usage: TokenUsage = {
"prompt_tokens": result.prompt_tokens,
"completion_tokens": result.completion_tokens,
"total_tokens": result.total_tokens,
}
return (
result.response_text,
tool_calls,
history_json,
session_id,
usage,
)
finally:
_reset_recursion(tokens)
async def run(
self,
input_data: Input,
*,
execution_context: ExecutionContext,
**kwargs,
) -> BlockOutput:
"""Validate inputs, invoke the autopilot, and yield structured outputs.
Yields session_id even on failure so callers can inspect/resume the session.
"""
if not input_data.prompt.strip():
yield "error", "Prompt cannot be empty."
return
if not execution_context.user_id:
yield "error", "Cannot run autopilot without an authenticated user."
return
if input_data.max_recursion_depth < 1:
yield "error", "max_recursion_depth must be at least 1."
return
# Create session eagerly so the user always gets the session_id,
# even if the downstream stream fails (avoids orphaned sessions).
sid = input_data.session_id
if not sid:
sid = await self.create_session(execution_context.user_id)
# NOTE: No asyncio.timeout() here — the SDK manages its own
# heartbeat-based timeouts internally. Wrapping with asyncio.timeout
# would cancel the task mid-flight, corrupting the SDK's internal
# anyio memory stream (see service.py CRITICAL comment).
try:
response, tool_calls, history, _, usage = await self.execute_copilot(
prompt=input_data.prompt,
system_context=input_data.system_context,
session_id=sid,
max_recursion_depth=input_data.max_recursion_depth,
user_id=execution_context.user_id,
)
yield "response", response
yield "tool_calls", tool_calls
yield "conversation_history", history
yield "session_id", sid
yield "token_usage", usage
except asyncio.CancelledError:
yield "session_id", sid
yield "error", "AutoPilot execution was cancelled."
raise
except Exception as exc:
yield "session_id", sid
yield "error", str(exc)
# ---------------------------------------------------------------------------
# Helpers placed after the block class for top-down readability.
# ---------------------------------------------------------------------------
# Task-scoped recursion depth counter & chain-wide limit.
# contextvars are scoped to the current asyncio task, so concurrent
# graph executions each get independent counters.
_autopilot_recursion_depth: contextvars.ContextVar[int] = contextvars.ContextVar(
"_autopilot_recursion_depth", default=0
)
_autopilot_recursion_limit: contextvars.ContextVar[int | None] = contextvars.ContextVar(
"_autopilot_recursion_limit", default=None
)
def _check_recursion(
max_depth: int,
) -> tuple[contextvars.Token[int], contextvars.Token[int | None]]:
"""Check and increment recursion depth.
Returns ContextVar tokens that must be passed to ``_reset_recursion``
when the caller exits to restore the previous depth.
Raises:
RuntimeError: If the current depth already meets or exceeds the limit.
"""
current = _autopilot_recursion_depth.get()
inherited = _autopilot_recursion_limit.get()
limit = max_depth if inherited is None else min(inherited, max_depth)
if current >= limit:
raise RuntimeError(
f"AutoPilot recursion depth limit reached ({limit}). "
"The autopilot has called itself too many times."
)
return (
_autopilot_recursion_depth.set(current + 1),
_autopilot_recursion_limit.set(limit),
)
def _reset_recursion(
tokens: tuple[contextvars.Token[int], contextvars.Token[int | None]],
) -> None:
"""Restore recursion depth and limit to their previous values."""
_autopilot_recursion_depth.reset(tokens[0])
_autopilot_recursion_limit.reset(tokens[1])

View File

@@ -472,7 +472,7 @@ class AddToListBlock(Block):
async def run(self, input_data: Input, **kwargs) -> BlockOutput:
entries_added = input_data.entries.copy()
if input_data.entry is not None:
if input_data.entry:
entries_added.append(input_data.entry)
updated_list = input_data.list.copy()

View File

@@ -21,7 +21,6 @@ from backend.data.model import (
UserPasswordCredentials,
)
from backend.integrations.providers import ProviderName
from backend.util.request import resolve_and_check_blocked
TEST_CREDENTIALS = UserPasswordCredentials(
id="01234567-89ab-cdef-0123-456789abcdef",
@@ -100,8 +99,6 @@ class SendEmailBlock(Block):
is_sensitive_action=True,
)
ALLOWED_SMTP_PORTS = {25, 465, 587, 2525}
@staticmethod
def send_email(
config: SMTPConfig,
@@ -132,17 +129,6 @@ class SendEmailBlock(Block):
self, input_data: Input, *, credentials: SMTPCredentials, **kwargs
) -> BlockOutput:
try:
# --- SSRF Protection ---
smtp_port = input_data.config.smtp_port
if smtp_port not in self.ALLOWED_SMTP_PORTS:
yield "error", (
f"SMTP port {smtp_port} is not allowed. "
f"Allowed ports: {sorted(self.ALLOWED_SMTP_PORTS)}"
)
return
await resolve_and_check_blocked(input_data.config.smtp_server)
status = self.send_email(
config=input_data.config,
to_email=input_data.to_email,
@@ -194,19 +180,7 @@ class SendEmailBlock(Block):
"was rejected by the server. "
"Please verify your account is authorized to send emails."
)
except smtplib.SMTPConnectError:
yield "error", (
f"Cannot connect to SMTP server '{input_data.config.smtp_server}' "
f"on port {input_data.config.smtp_port}."
)
except smtplib.SMTPServerDisconnected:
yield "error", (
f"SMTP server '{input_data.config.smtp_server}' "
"disconnected unexpectedly."
)
except smtplib.SMTPDataError as e:
yield "error", f"Email data rejected by server: {str(e)}"
except ValueError as e:
yield "error", str(e)
except Exception as e:
raise e

View File

@@ -34,29 +34,17 @@ TEST_CREDENTIALS_INPUT = {
"provider": TEST_CREDENTIALS.provider,
"id": TEST_CREDENTIALS.id,
"type": TEST_CREDENTIALS.type,
"title": TEST_CREDENTIALS.title,
"title": TEST_CREDENTIALS.type,
}
class ImageEditorModel(str, Enum):
FLUX_KONTEXT_PRO = "Flux Kontext Pro"
FLUX_KONTEXT_MAX = "Flux Kontext Max"
NANO_BANANA_PRO = "Nano Banana Pro"
NANO_BANANA_2 = "Nano Banana 2"
class FluxKontextModelName(str, Enum):
PRO = "Flux Kontext Pro"
MAX = "Flux Kontext Max"
@property
def api_name(self) -> str:
_map = {
"FLUX_KONTEXT_PRO": "black-forest-labs/flux-kontext-pro",
"FLUX_KONTEXT_MAX": "black-forest-labs/flux-kontext-max",
"NANO_BANANA_PRO": "google/nano-banana-pro",
"NANO_BANANA_2": "google/nano-banana-2",
}
return _map[self.name]
# Keep old name as alias for backwards compatibility
FluxKontextModelName = ImageEditorModel
return f"black-forest-labs/flux-kontext-{self.name.lower()}"
class AspectRatio(str, Enum):
@@ -81,7 +69,7 @@ class AIImageEditorBlock(Block):
credentials: CredentialsMetaInput[
Literal[ProviderName.REPLICATE], Literal["api_key"]
] = CredentialsField(
description="Replicate API key with permissions for Flux Kontext and Nano Banana models",
description="Replicate API key with permissions for Flux Kontext models",
)
prompt: str = SchemaField(
description="Text instruction describing the desired edit",
@@ -99,14 +87,14 @@ class AIImageEditorBlock(Block):
advanced=False,
)
seed: Optional[int] = SchemaField(
description="Random seed. Set for reproducible generation (Flux Kontext only; ignored by Nano Banana models)",
description="Random seed. Set for reproducible generation",
default=None,
title="Seed",
advanced=True,
)
model: ImageEditorModel = SchemaField(
model: FluxKontextModelName = SchemaField(
description="Model variant to use",
default=ImageEditorModel.NANO_BANANA_2,
default=FluxKontextModelName.PRO,
title="Model",
)
@@ -119,7 +107,7 @@ class AIImageEditorBlock(Block):
super().__init__(
id="3fd9c73d-4370-4925-a1ff-1b86b99fabfa",
description=(
"Edit images using Flux Kontext or Google Nano Banana models. Provide a prompt "
"Edit images using BlackForest Labs' Flux Kontext models. Provide a prompt "
"and optional reference image to generate a modified image."
),
categories={BlockCategory.AI, BlockCategory.MULTIMEDIA},
@@ -130,7 +118,7 @@ class AIImageEditorBlock(Block):
"input_image": "data:image/png;base64,MQ==",
"aspect_ratio": AspectRatio.MATCH_INPUT_IMAGE,
"seed": None,
"model": ImageEditorModel.NANO_BANANA_2,
"model": FluxKontextModelName.PRO,
"credentials": TEST_CREDENTIALS_INPUT,
},
test_output=[
@@ -139,9 +127,7 @@ class AIImageEditorBlock(Block):
],
test_mock={
# Use data URI to avoid HTTP requests during tests
"run_model": lambda *args, **kwargs: (
"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg=="
),
"run_model": lambda *args, **kwargs: "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg==",
},
test_credentials=TEST_CREDENTIALS,
)
@@ -156,7 +142,7 @@ class AIImageEditorBlock(Block):
) -> BlockOutput:
result = await self.run_model(
api_key=credentials.api_key,
model=input_data.model,
model_name=input_data.model.api_name,
prompt=input_data.prompt,
input_image_b64=(
await store_media_file(
@@ -183,7 +169,7 @@ class AIImageEditorBlock(Block):
async def run_model(
self,
api_key: SecretStr,
model: ImageEditorModel,
model_name: str,
prompt: str,
input_image_b64: Optional[str],
aspect_ratio: str,
@@ -192,29 +178,12 @@ class AIImageEditorBlock(Block):
graph_exec_id: str,
) -> MediaFileType:
client = ReplicateClient(api_token=api_key.get_secret_value())
model_name = model.api_name
is_nano_banana = model in (
ImageEditorModel.NANO_BANANA_PRO,
ImageEditorModel.NANO_BANANA_2,
)
if is_nano_banana:
input_params: dict = {
"prompt": prompt,
"aspect_ratio": aspect_ratio,
"output_format": "jpg",
"safety_filter_level": "block_only_high",
}
# NB API expects "image_input" as a list, unlike Flux's single "input_image"
if input_image_b64:
input_params["image_input"] = [input_image_b64]
else:
input_params = {
"prompt": prompt,
"input_image": input_image_b64,
"aspect_ratio": aspect_ratio,
**({"seed": seed} if seed is not None else {}),
}
input_params = {
"prompt": prompt,
"input_image": input_image_b64,
"aspect_ratio": aspect_ratio,
**({"seed": seed} if seed is not None else {}),
}
try:
output: FileOutput | list[FileOutput] = await client.async_run( # type: ignore

View File

@@ -211,7 +211,7 @@ class AgentOutputBlock(Block):
if input_data.format:
try:
formatter = TextFormatter(autoescape=input_data.escape_html)
yield "output", await formatter.format_string(
yield "output", formatter.format_string(
input_data.format, {input_data.name: input_data.value}
)
except Exception as e:

View File

@@ -33,13 +33,6 @@ from backend.integrations.providers import ProviderName
from backend.util import json
from backend.util.clients import OPENROUTER_BASE_URL
from backend.util.logging import TruncatedLogger
from backend.util.openai_responses import (
convert_tools_to_responses_format,
extract_responses_content,
extract_responses_reasoning,
extract_responses_tool_calls,
extract_responses_usage,
)
from backend.util.prompt import compress_context, estimate_token_count
from backend.util.request import validate_url_host
from backend.util.settings import Settings
@@ -118,6 +111,7 @@ class LlmModel(str, Enum, metaclass=LlmModelMeta):
GPT4O_MINI = "gpt-4o-mini"
GPT4O = "gpt-4o"
GPT4_TURBO = "gpt-4-turbo"
GPT3_5_TURBO = "gpt-3.5-turbo"
# Anthropic models
CLAUDE_4_1_OPUS = "claude-opus-4-1-20250805"
CLAUDE_4_OPUS = "claude-opus-4-20250514"
@@ -283,6 +277,9 @@ MODEL_METADATA = {
LlmModel.GPT4_TURBO: ModelMetadata(
"openai", 128000, 4096, "GPT-4 Turbo", "OpenAI", "OpenAI", 3
), # gpt-4-turbo-2024-04-09
LlmModel.GPT3_5_TURBO: ModelMetadata(
"openai", 16385, 4096, "GPT-3.5 Turbo", "OpenAI", "OpenAI", 1
), # gpt-3.5-turbo-0125
# https://docs.anthropic.com/en/docs/about-claude/models
LlmModel.CLAUDE_4_1_OPUS: ModelMetadata(
"anthropic", 200000, 32000, "Claude Opus 4.1", "Anthropic", "Anthropic", 3
@@ -796,19 +793,6 @@ async def llm_call(
)
prompt = result.messages
# Sanitize unpaired surrogates in message content to prevent
# UnicodeEncodeError when httpx encodes the JSON request body.
for msg in prompt:
content = msg.get("content")
if isinstance(content, str):
try:
content.encode("utf-8")
except UnicodeEncodeError:
logger.warning("Sanitized unpaired surrogates in LLM prompt content")
msg["content"] = content.encode("utf-8", errors="surrogatepass").decode(
"utf-8", errors="replace"
)
# Calculate available tokens based on context window and input length
estimated_input_tokens = estimate_token_count(prompt)
model_max_output = llm_model.max_output_tokens or int(2**15)
@@ -817,53 +801,36 @@ async def llm_call(
max_tokens = max(min(available_tokens, model_max_output, user_max), 1)
if provider == "openai":
tools_param = tools if tools else openai.NOT_GIVEN
oai_client = openai.AsyncOpenAI(api_key=credentials.api_key.get_secret_value())
response_format = None
tools_param = convert_tools_to_responses_format(tools) if tools else openai.omit
parallel_tool_calls = get_parallel_tool_calls_param(
llm_model, parallel_tool_calls
)
text_config = openai.omit
if force_json_output:
text_config = {"format": {"type": "json_object"}} # type: ignore
response_format = {"type": "json_object"}
response = await oai_client.responses.create(
response = await oai_client.chat.completions.create(
model=llm_model.value,
input=prompt, # type: ignore[arg-type]
tools=tools_param, # type: ignore[arg-type]
max_output_tokens=max_tokens,
parallel_tool_calls=get_parallel_tool_calls_param(
llm_model, parallel_tool_calls
),
text=text_config, # type: ignore[arg-type]
store=False,
messages=prompt, # type: ignore
response_format=response_format, # type: ignore
max_completion_tokens=max_tokens,
tools=tools_param, # type: ignore
parallel_tool_calls=parallel_tool_calls,
)
raw_tool_calls = extract_responses_tool_calls(response)
tool_calls = (
[
ToolContentBlock(
id=tc["id"],
type=tc["type"],
function=ToolCall(
name=tc["function"]["name"],
arguments=tc["function"]["arguments"],
),
)
for tc in raw_tool_calls
]
if raw_tool_calls
else None
)
reasoning = extract_responses_reasoning(response)
content = extract_responses_content(response)
prompt_tokens, completion_tokens = extract_responses_usage(response)
tool_calls = extract_openai_tool_calls(response)
reasoning = extract_openai_reasoning(response)
return LLMResponse(
raw_response=response,
raw_response=response.choices[0].message,
prompt=prompt,
response=content,
response=response.choices[0].message.content or "",
tool_calls=tool_calls,
prompt_tokens=prompt_tokens,
completion_tokens=completion_tokens,
prompt_tokens=response.usage.prompt_tokens if response.usage else 0,
completion_tokens=response.usage.completion_tokens if response.usage else 0,
reasoning=reasoning,
)
elif provider == "anthropic":
@@ -1309,10 +1276,8 @@ class AIStructuredResponseGeneratorBlock(AIBlockBase):
values = input_data.prompt_values
if values:
input_data.prompt = await fmt.format_string(input_data.prompt, values)
input_data.sys_prompt = await fmt.format_string(
input_data.sys_prompt, values
)
input_data.prompt = fmt.format_string(input_data.prompt, values)
input_data.sys_prompt = fmt.format_string(input_data.sys_prompt, values)
if input_data.sys_prompt:
prompt.append({"role": "system", "content": input_data.sys_prompt})

View File

@@ -61,27 +61,20 @@ class ExecutionParams(BaseModel):
def _get_tool_requests(entry: dict[str, Any]) -> list[str]:
"""
Return a list of tool_call_ids if the entry is a tool request.
Supports OpenAI Chat Completions, Responses API, and Anthropic formats.
Supports both OpenAI and Anthropics formats.
"""
tool_call_ids = []
# OpenAI Responses API: function_call items have type="function_call"
if entry.get("type") == "function_call":
if call_id := entry.get("call_id"):
tool_call_ids.append(call_id)
return tool_call_ids
if entry.get("role") != "assistant":
return tool_call_ids
# OpenAI Chat Completions: check for tool_calls in the entry.
# OpenAI: check for tool_calls in the entry.
calls = entry.get("tool_calls")
if isinstance(calls, list):
for call in calls:
if tool_id := call.get("id"):
tool_call_ids.append(tool_id)
# Anthropic: check content items for tool_use type.
# Anthropics: check content items for tool_use type.
content = entry.get("content")
if isinstance(content, list):
for item in content:
@@ -96,22 +89,16 @@ def _get_tool_requests(entry: dict[str, Any]) -> list[str]:
def _get_tool_responses(entry: dict[str, Any]) -> list[str]:
"""
Return a list of tool_call_ids if the entry is a tool response.
Supports OpenAI Chat Completions, Responses API, and Anthropic formats.
Supports both OpenAI and Anthropics formats.
"""
tool_call_ids: list[str] = []
# OpenAI Responses API: function_call_output items
if entry.get("type") == "function_call_output":
if call_id := entry.get("call_id"):
tool_call_ids.append(str(call_id))
return tool_call_ids
# OpenAI Chat Completions: a tool response message with role "tool".
# OpenAI: a tool response message with role "tool" and key "tool_call_id".
if entry.get("role") == "tool":
if tool_call_id := entry.get("tool_call_id"):
tool_call_ids.append(str(tool_call_id))
# Anthropic: check content items for tool_result type.
# Anthropics: check content items for tool_result type.
if entry.get("role") == "user":
content = entry.get("content")
if isinstance(content, list):
@@ -124,16 +111,14 @@ def _get_tool_responses(entry: dict[str, Any]) -> list[str]:
return tool_call_ids
def _create_tool_response(
call_id: str, output: Any, *, responses_api: bool = False
) -> dict[str, Any]:
def _create_tool_response(call_id: str, output: Any) -> dict[str, Any]:
"""
Create a tool response message for OpenAI, Anthropic, or OpenAI Responses API,
based on the tool_id format and the responses_api flag.
Create a tool response message for either OpenAI or Anthropics,
based on the tool_id format.
"""
content = output if isinstance(output, str) else json.dumps(output)
# Anthropic format: tool IDs typically start with "toolu_"
# Anthropics format: tool IDs typically start with "toolu_"
if call_id.startswith("toolu_"):
return {
"role": "user",
@@ -143,11 +128,8 @@ def _create_tool_response(
],
}
# OpenAI Responses API format
if responses_api:
return {"type": "function_call_output", "call_id": call_id, "output": content}
# OpenAI Chat Completions format (default fallback)
# OpenAI format: tool IDs typically start with "call_".
# Or default fallback (if the tool_id doesn't match any known prefix)
return {"role": "tool", "tool_call_id": call_id, "content": content}
@@ -195,19 +177,10 @@ def _combine_tool_responses(tool_outputs: list[dict[str, Any]]) -> list[dict[str
return tool_outputs
def _convert_raw_response_to_dict(
raw_response: Any,
) -> dict[str, Any] | list[dict[str, Any]]:
def _convert_raw_response_to_dict(raw_response: Any) -> dict[str, Any]:
"""
Safely convert raw_response to dictionary format for conversation history.
Handles different response types from different LLM providers.
For the OpenAI Responses API, the raw_response is the entire Response
object. Its ``output`` items (messages, function_calls) are extracted
individually so they can be used as valid input items on the next call.
Returns a **list** of dicts in that case.
For Chat Completions / Anthropic / Ollama, returns a single dict.
"""
if isinstance(raw_response, str):
# Ollama returns a string, convert to dict format
@@ -215,28 +188,11 @@ def _convert_raw_response_to_dict(
elif isinstance(raw_response, dict):
# Already a dict (from tests or some providers)
return raw_response
elif _is_responses_api_object(raw_response):
# OpenAI Responses API: extract individual output items
items = [json.to_dict(item) for item in raw_response.output]
return items if items else [{"role": "assistant", "content": ""}]
else:
# Chat Completions / Anthropic return message objects
# OpenAI/Anthropic return objects, convert with json.to_dict
return json.to_dict(raw_response)
def _is_responses_api_object(obj: Any) -> bool:
"""Detect an OpenAI Responses API Response object.
These have ``object == "response"`` and an ``output`` list, but no
``role`` attribute (unlike ChatCompletionMessage).
"""
return (
getattr(obj, "object", None) == "response"
and hasattr(obj, "output")
and not hasattr(obj, "role")
)
def get_pending_tool_calls(conversation_history: list[Any] | None) -> dict[str, int]:
"""
All the tool calls entry in the conversation history requires a response.
@@ -798,34 +754,19 @@ class SmartDecisionMakerBlock(Block):
self, prompt: list[dict], response, tool_outputs: list | None = None
):
"""Update conversation history with response and tool outputs."""
converted = _convert_raw_response_to_dict(response.raw_response)
# Don't add separate reasoning message with tool calls (breaks Anthropic's tool_use->tool_result pairing)
assistant_message = _convert_raw_response_to_dict(response.raw_response)
has_tool_calls = isinstance(assistant_message.get("content"), list) and any(
item.get("type") == "tool_use"
for item in assistant_message.get("content", [])
)
if isinstance(converted, list):
# Responses API: output items are already individual dicts
has_tool_calls = any(
item.get("type") == "function_call" for item in converted
if response.reasoning and not has_tool_calls:
prompt.append(
{"role": "assistant", "content": f"[Reasoning]: {response.reasoning}"}
)
if response.reasoning and not has_tool_calls:
prompt.append(
{
"role": "assistant",
"content": f"[Reasoning]: {response.reasoning}",
}
)
prompt.extend(converted)
else:
# Chat Completions / Anthropic: single assistant message dict
has_tool_calls = isinstance(converted.get("content"), list) and any(
item.get("type") == "tool_use" for item in converted.get("content", [])
)
if response.reasoning and not has_tool_calls:
prompt.append(
{
"role": "assistant",
"content": f"[Reasoning]: {response.reasoning}",
}
)
prompt.append(converted)
prompt.append(assistant_message)
if tool_outputs:
prompt.extend(tool_outputs)
@@ -835,8 +776,6 @@ class SmartDecisionMakerBlock(Block):
tool_info: ToolInfo,
execution_params: ExecutionParams,
execution_processor: "ExecutionProcessor",
*,
responses_api: bool = False,
) -> dict:
"""Execute a single tool using the execution manager for proper integration."""
# Lazy imports to avoid circular dependencies
@@ -929,17 +868,13 @@ class SmartDecisionMakerBlock(Block):
if node_outputs
else "Tool executed successfully"
)
return _create_tool_response(
tool_call.id, tool_response_content, responses_api=responses_api
)
return _create_tool_response(tool_call.id, tool_response_content)
except Exception as e:
logger.warning(f"Tool execution with manager failed: {e}")
logger.error(f"Tool execution with manager failed: {e}")
# Return error response
return _create_tool_response(
tool_call.id,
f"Tool execution failed: {str(e)}",
responses_api=responses_api,
tool_call.id, f"Tool execution failed: {str(e)}"
)
async def _execute_tools_agent_mode(
@@ -960,7 +895,6 @@ class SmartDecisionMakerBlock(Block):
"""Execute tools in agent mode with a loop until finished."""
max_iterations = input_data.agent_mode_max_iterations
iteration = 0
use_responses_api = input_data.model.metadata.provider == "openai"
# Execution parameters for tool execution
execution_params = ExecutionParams(
@@ -1017,19 +951,14 @@ class SmartDecisionMakerBlock(Block):
for tool_info in processed_tools:
try:
tool_response = await self._execute_single_tool_with_manager(
tool_info,
execution_params,
execution_processor,
responses_api=use_responses_api,
tool_info, execution_params, execution_processor
)
tool_outputs.append(tool_response)
except Exception as e:
logger.error(f"Tool execution failed: {e}")
# Create error response for the tool
error_response = _create_tool_response(
tool_info.tool_call.id,
f"Error: {str(e)}",
responses_api=use_responses_api,
tool_info.tool_call.id, f"Error: {str(e)}"
)
tool_outputs.append(error_response)
@@ -1091,17 +1020,11 @@ class SmartDecisionMakerBlock(Block):
if pending_tool_calls and input_data.last_tool_output is None:
raise ValueError(f"Tool call requires an output for {pending_tool_calls}")
use_responses_api = input_data.model.metadata.provider == "openai"
tool_output = []
if pending_tool_calls and input_data.last_tool_output is not None:
first_call_id = next(iter(pending_tool_calls.keys()))
tool_output.append(
_create_tool_response(
first_call_id,
input_data.last_tool_output,
responses_api=use_responses_api,
)
_create_tool_response(first_call_id, input_data.last_tool_output)
)
prompt.extend(tool_output)
@@ -1127,15 +1050,11 @@ class SmartDecisionMakerBlock(Block):
values = input_data.prompt_values
if values:
input_data.prompt = await llm.fmt.format_string(input_data.prompt, values)
input_data.sys_prompt = await llm.fmt.format_string(
input_data.sys_prompt, values
)
input_data.prompt = llm.fmt.format_string(input_data.prompt, values)
input_data.sys_prompt = llm.fmt.format_string(input_data.sys_prompt, values)
if input_data.sys_prompt and not any(
p.get("role") == "system"
and isinstance(p.get("content"), str)
and p["content"].startswith(MAIN_OBJECTIVE_PREFIX)
p["role"] == "system" and p["content"].startswith(MAIN_OBJECTIVE_PREFIX)
for p in prompt
):
prompt.append(
@@ -1146,9 +1065,7 @@ class SmartDecisionMakerBlock(Block):
)
if input_data.prompt and not any(
p.get("role") == "user"
and isinstance(p.get("content"), str)
and p["content"].startswith(MAIN_OBJECTIVE_PREFIX)
p["role"] == "user" and p["content"].startswith(MAIN_OBJECTIVE_PREFIX)
for p in prompt
):
prompt.append(
@@ -1256,26 +1173,11 @@ class SmartDecisionMakerBlock(Block):
)
yield emit_key, arg_value
converted = _convert_raw_response_to_dict(response.raw_response)
# Check for tool calls to avoid inserting reasoning between tool pairs
if isinstance(converted, list):
has_tool_calls = any(
item.get("type") == "function_call" for item in converted
)
else:
has_tool_calls = isinstance(converted.get("content"), list) and any(
item.get("type") == "tool_use" for item in converted.get("content", [])
)
if response.reasoning and not has_tool_calls:
if response.reasoning:
prompt.append(
{"role": "assistant", "content": f"[Reasoning]: {response.reasoning}"}
)
if isinstance(converted, list):
prompt.extend(converted)
else:
prompt.append(converted)
prompt.append(_convert_raw_response_to_dict(response.raw_response))
yield "conversations", prompt

View File

@@ -1,223 +0,0 @@
"""Tests for AutoPilotBlock: recursion guard, streaming, validation, and error paths."""
import asyncio
from unittest.mock import AsyncMock
import pytest
from backend.blocks.autopilot import (
AUTOPILOT_BLOCK_ID,
AutoPilotBlock,
_autopilot_recursion_depth,
_autopilot_recursion_limit,
_check_recursion,
_reset_recursion,
)
from backend.data.execution import ExecutionContext
def _make_context(user_id: str = "test-user-123") -> ExecutionContext:
"""Helper to build an ExecutionContext for tests."""
return ExecutionContext(
user_id=user_id,
graph_id="graph-1",
graph_exec_id="gexec-1",
graph_version=1,
node_id="node-1",
node_exec_id="nexec-1",
)
# ---------------------------------------------------------------------------
# Recursion guard unit tests
# ---------------------------------------------------------------------------
class TestCheckRecursion:
"""Unit tests for _check_recursion / _reset_recursion."""
def test_first_call_increments_depth(self):
tokens = _check_recursion(3)
try:
assert _autopilot_recursion_depth.get() == 1
assert _autopilot_recursion_limit.get() == 3
finally:
_reset_recursion(tokens)
def test_reset_restores_previous_values(self):
assert _autopilot_recursion_depth.get() == 0
assert _autopilot_recursion_limit.get() is None
tokens = _check_recursion(5)
_reset_recursion(tokens)
assert _autopilot_recursion_depth.get() == 0
assert _autopilot_recursion_limit.get() is None
def test_exceeding_limit_raises(self):
t1 = _check_recursion(2)
try:
t2 = _check_recursion(2)
try:
with pytest.raises(RuntimeError, match="recursion depth limit"):
_check_recursion(2)
finally:
_reset_recursion(t2)
finally:
_reset_recursion(t1)
def test_nested_calls_respect_inherited_limit(self):
"""Inner call with higher max_depth still respects outer limit."""
t1 = _check_recursion(2) # sets limit=2
try:
t2 = _check_recursion(10) # inner wants 10, but inherited is 2
try:
# depth is now 2, limit is min(10, 2) = 2 → should raise
with pytest.raises(RuntimeError, match="recursion depth limit"):
_check_recursion(10)
finally:
_reset_recursion(t2)
finally:
_reset_recursion(t1)
def test_limit_of_one_blocks_immediately_on_second_call(self):
t1 = _check_recursion(1)
try:
with pytest.raises(RuntimeError):
_check_recursion(1)
finally:
_reset_recursion(t1)
# ---------------------------------------------------------------------------
# AutoPilotBlock.run() validation tests
# ---------------------------------------------------------------------------
class TestRunValidation:
"""Tests for input validation in AutoPilotBlock.run()."""
@pytest.fixture
def block(self):
return AutoPilotBlock()
@pytest.mark.asyncio
async def test_empty_prompt_yields_error(self, block):
block.Input # ensure schema is accessible
input_data = block.Input(prompt=" ", max_recursion_depth=3)
ctx = _make_context()
outputs = {}
async for name, value in block.run(input_data, execution_context=ctx):
outputs[name] = value
assert outputs.get("error") == "Prompt cannot be empty."
assert "response" not in outputs
@pytest.mark.asyncio
async def test_missing_user_id_yields_error(self, block):
input_data = block.Input(prompt="hello", max_recursion_depth=3)
ctx = _make_context(user_id="")
outputs = {}
async for name, value in block.run(input_data, execution_context=ctx):
outputs[name] = value
assert "authenticated user" in outputs.get("error", "")
@pytest.mark.asyncio
async def test_successful_run_yields_all_outputs(self, block):
"""With execute_copilot mocked, run() should yield all 5 success outputs."""
mock_result = (
"Hello world",
[],
'[{"role":"user","content":"hi"}]',
"sess-abc",
{"prompt_tokens": 10, "completion_tokens": 5, "total_tokens": 15},
)
block.execute_copilot = AsyncMock(return_value=mock_result)
block.create_session = AsyncMock(return_value="sess-abc")
input_data = block.Input(prompt="hi", max_recursion_depth=3)
ctx = _make_context()
outputs = {}
async for name, value in block.run(input_data, execution_context=ctx):
outputs[name] = value
assert outputs["response"] == "Hello world"
assert outputs["tool_calls"] == []
assert outputs["session_id"] == "sess-abc"
assert outputs["token_usage"]["total_tokens"] == 15
assert "error" not in outputs
@pytest.mark.asyncio
async def test_exception_yields_error(self, block):
"""On unexpected failure, run() should yield an error output."""
block.execute_copilot = AsyncMock(side_effect=RuntimeError("boom"))
block.create_session = AsyncMock(return_value="sess-fail")
input_data = block.Input(prompt="do something", max_recursion_depth=3)
ctx = _make_context()
outputs = {}
async for name, value in block.run(input_data, execution_context=ctx):
outputs[name] = value
assert outputs["session_id"] == "sess-fail"
assert "boom" in outputs.get("error", "")
@pytest.mark.asyncio
async def test_cancelled_error_yields_error_and_reraises(self, block):
"""CancelledError should yield error, then re-raise."""
block.execute_copilot = AsyncMock(side_effect=asyncio.CancelledError())
block.create_session = AsyncMock(return_value="sess-cancel")
input_data = block.Input(prompt="do something", max_recursion_depth=3)
ctx = _make_context()
outputs = {}
with pytest.raises(asyncio.CancelledError):
async for name, value in block.run(input_data, execution_context=ctx):
outputs[name] = value
assert outputs["session_id"] == "sess-cancel"
assert "cancelled" in outputs.get("error", "").lower()
@pytest.mark.asyncio
async def test_existing_session_id_skips_create(self, block):
"""When session_id is provided, create_session should not be called."""
mock_result = (
"ok",
[],
"[]",
"existing-sid",
{"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
)
block.execute_copilot = AsyncMock(return_value=mock_result)
block.create_session = AsyncMock()
input_data = block.Input(
prompt="test", session_id="existing-sid", max_recursion_depth=3
)
ctx = _make_context()
async for _ in block.run(input_data, execution_context=ctx):
pass
block.create_session.assert_not_called()
# ---------------------------------------------------------------------------
# Block registration / ID tests
# ---------------------------------------------------------------------------
class TestBlockRegistration:
def test_block_id_matches_constant(self):
block = AutoPilotBlock()
assert block.id == AUTOPILOT_BLOCK_ID
def test_max_recursion_depth_has_upper_bound(self):
"""Schema should enforce le=10."""
schema = AutoPilotBlock.Input.model_json_schema()
max_rec = schema["properties"]["max_recursion_depth"]
assert (
max_rec.get("maximum") == 10 or max_rec.get("exclusiveMaximum", 999) <= 11
)
def test_output_schema_has_no_duplicate_error_field(self):
"""Output should inherit error from BlockSchemaOutput, not redefine it."""
# The field should exist (inherited) but there should be no explicit
# redefinition. We verify by checking the class __annotations__ directly.
assert "error" not in AutoPilotBlock.Output.__annotations__

View File

@@ -13,17 +13,18 @@ class TestLLMStatsTracking:
"""Test that llm_call returns proper token counts in LLMResponse."""
import backend.blocks.llm as llm
# Mock the OpenAI Responses API response
# Mock the OpenAI client
mock_response = MagicMock()
mock_response.output_text = "Test response"
mock_response.output = []
mock_response.usage = MagicMock(input_tokens=10, output_tokens=20)
mock_response.choices = [
MagicMock(message=MagicMock(content="Test response", tool_calls=None))
]
mock_response.usage = MagicMock(prompt_tokens=10, completion_tokens=20)
# Test with mocked OpenAI response
with patch("openai.AsyncOpenAI") as mock_openai:
mock_client = AsyncMock()
mock_openai.return_value = mock_client
mock_client.responses.create = AsyncMock(return_value=mock_response)
mock_client.chat.completions.create = AsyncMock(return_value=mock_response)
response = await llm.llm_call(
credentials=llm.TEST_CREDENTIALS,
@@ -270,17 +271,30 @@ class TestLLMStatsTracking:
mock_response = MagicMock()
# Return different responses for chunk summary vs final summary
if call_count == 1:
mock_response.output_text = '<json_output id="test123456">{"summary": "Test chunk summary"}</json_output>'
mock_response.choices = [
MagicMock(
message=MagicMock(
content='<json_output id="test123456">{"summary": "Test chunk summary"}</json_output>',
tool_calls=None,
)
)
]
else:
mock_response.output_text = '<json_output id="test123456">{"final_summary": "Test final summary"}</json_output>'
mock_response.output = []
mock_response.usage = MagicMock(input_tokens=50, output_tokens=30)
mock_response.choices = [
MagicMock(
message=MagicMock(
content='<json_output id="test123456">{"final_summary": "Test final summary"}</json_output>',
tool_calls=None,
)
)
]
mock_response.usage = MagicMock(prompt_tokens=50, completion_tokens=30)
return mock_response
with patch("openai.AsyncOpenAI") as mock_openai:
mock_client = AsyncMock()
mock_openai.return_value = mock_client
mock_client.responses.create = mock_create
mock_client.chat.completions.create = mock_create
# Test with very short text (should only need 1 chunk + 1 final summary)
input_data = llm.AITextSummarizerBlock.Input(

View File

@@ -290,9 +290,7 @@ class FillTextTemplateBlock(Block):
async def run(self, input_data: Input, **kwargs) -> BlockOutput:
formatter = text.TextFormatter(autoescape=input_data.escape_html)
yield "output", await formatter.format_string(
input_data.format, input_data.values
)
yield "output", formatter.format_string(input_data.format, input_data.values)
class CombineTextsBlock(Block):

View File

@@ -115,22 +115,10 @@ class ChatConfig(BaseSettings):
description="Use --resume for multi-turn conversations instead of "
"history compression. Falls back to compression when unavailable.",
)
use_openrouter: bool = Field(
default=True,
description="Enable routing API calls through the OpenRouter proxy. "
"The actual decision also requires ``api_key`` and ``base_url`` — "
"use the ``openrouter_active`` property for the final answer.",
)
use_claude_code_subscription: bool = Field(
default=False,
description="For personal/dev use: use Claude Code CLI subscription auth instead of API keys. Requires `claude login` on the host. Only works with SDK mode.",
)
test_mode: bool = Field(
default=False,
description="Use dummy service instead of real LLM calls. "
"Send __test_transient_error__, __test_fatal_error__, or "
"__test_slow_response__ to trigger specific scenarios.",
)
# E2B Sandbox Configuration
use_e2b_sandbox: bool = Field(
@@ -148,7 +136,7 @@ class ChatConfig(BaseSettings):
description="E2B sandbox template to use for copilot sessions.",
)
e2b_sandbox_timeout: int = Field(
default=420, # 7 min safety net — allows headroom for compaction retries
default=300, # 5 min safety net — explicit per-turn pause is the primary mechanism
description="E2B sandbox running-time timeout (seconds). "
"E2B timeout is wall-clock (not idle). Explicit per-turn pause is the primary "
"mechanism; this is the safety net.",
@@ -158,21 +146,6 @@ class ChatConfig(BaseSettings):
description="E2B lifecycle action on timeout: 'pause' (default, free) or 'kill'.",
)
@property
def openrouter_active(self) -> bool:
"""True when OpenRouter is enabled AND credentials are usable.
Single source of truth for "will the SDK route through OpenRouter?".
Checks the flag *and* that ``api_key`` + a valid ``base_url`` are
present — mirrors the fallback logic in ``_build_sdk_env``.
"""
if not self.use_openrouter:
return False
base = (self.base_url or "").rstrip("/")
if base.endswith("/v1"):
base = base[:-3]
return bool(self.api_key and base and base.startswith("http"))
@property
def e2b_active(self) -> bool:
"""True when E2B is enabled and the API key is present.
@@ -195,6 +168,15 @@ class ChatConfig(BaseSettings):
"""
return self.e2b_api_key if self.e2b_active else None
@field_validator("use_e2b_sandbox", mode="before")
@classmethod
def get_use_e2b_sandbox(cls, v):
"""Get use_e2b_sandbox from environment if not provided."""
env_val = os.getenv("CHAT_USE_E2B_SANDBOX", "").lower()
if env_val:
return env_val in ("true", "1", "yes", "on")
return True if v is None else v
@field_validator("e2b_api_key", mode="before")
@classmethod
def get_e2b_api_key(cls, v):
@@ -237,6 +219,26 @@ class ChatConfig(BaseSettings):
v = OPENROUTER_BASE_URL
return v
@field_validator("use_claude_agent_sdk", mode="before")
@classmethod
def get_use_claude_agent_sdk(cls, v):
"""Get use_claude_agent_sdk from environment if not provided."""
# Check environment variable - default to True if not set
env_val = os.getenv("CHAT_USE_CLAUDE_AGENT_SDK", "").lower()
if env_val:
return env_val in ("true", "1", "yes", "on")
# Default to True (SDK enabled by default)
return True if v is None else v
@field_validator("use_claude_code_subscription", mode="before")
@classmethod
def get_use_claude_code_subscription(cls, v):
"""Get use_claude_code_subscription from environment if not provided."""
env_val = os.getenv("CHAT_USE_CLAUDE_CODE_SUBSCRIPTION", "").lower()
if env_val:
return env_val in ("true", "1", "yes", "on")
return False if v is None else v
# Prompt paths for different contexts
PROMPT_PATHS: dict[str, str] = {
"default": "prompts/chat_system.md",
@@ -246,7 +248,6 @@ class ChatConfig(BaseSettings):
class Config:
"""Pydantic config."""
env_prefix = "CHAT_"
env_file = ".env"
env_file_encoding = "utf-8"
extra = "ignore" # Ignore extra environment variables

View File

@@ -6,70 +6,19 @@ from .config import ChatConfig
# Env vars that the ChatConfig validators read — must be cleared so they don't
# override the explicit constructor values we pass in each test.
_ENV_VARS_TO_CLEAR = (
_E2B_ENV_VARS = (
"CHAT_USE_E2B_SANDBOX",
"CHAT_E2B_API_KEY",
"E2B_API_KEY",
"CHAT_USE_OPENROUTER",
"CHAT_API_KEY",
"OPEN_ROUTER_API_KEY",
"OPENAI_API_KEY",
"CHAT_BASE_URL",
"OPENROUTER_BASE_URL",
"OPENAI_BASE_URL",
)
@pytest.fixture(autouse=True)
def _clean_env(monkeypatch: pytest.MonkeyPatch) -> None:
for var in _ENV_VARS_TO_CLEAR:
def _clean_e2b_env(monkeypatch: pytest.MonkeyPatch) -> None:
for var in _E2B_ENV_VARS:
monkeypatch.delenv(var, raising=False)
class TestOpenrouterActive:
"""Tests for the openrouter_active property."""
def test_enabled_with_credentials_returns_true(self):
cfg = ChatConfig(
use_openrouter=True,
api_key="or-key",
base_url="https://openrouter.ai/api/v1",
)
assert cfg.openrouter_active is True
def test_enabled_but_missing_api_key_returns_false(self):
cfg = ChatConfig(
use_openrouter=True,
api_key=None,
base_url="https://openrouter.ai/api/v1",
)
assert cfg.openrouter_active is False
def test_disabled_returns_false_despite_credentials(self):
cfg = ChatConfig(
use_openrouter=False,
api_key="or-key",
base_url="https://openrouter.ai/api/v1",
)
assert cfg.openrouter_active is False
def test_strips_v1_suffix_and_still_valid(self):
cfg = ChatConfig(
use_openrouter=True,
api_key="or-key",
base_url="https://openrouter.ai/api/v1",
)
assert cfg.openrouter_active is True
def test_invalid_base_url_returns_false(self):
cfg = ChatConfig(
use_openrouter=True,
api_key="or-key",
base_url="not-a-url",
)
assert cfg.openrouter_active is False
class TestE2BActive:
"""Tests for the e2b_active property — single source of truth for E2B usage."""

View File

@@ -4,9 +4,6 @@
# The hex suffix makes accidental LLM generation of these strings virtually
# impossible, avoiding false-positive marker detection in normal conversation.
COPILOT_ERROR_PREFIX = "[__COPILOT_ERROR_f7a1__]" # Renders as ErrorCard
COPILOT_RETRYABLE_ERROR_PREFIX = (
"[__COPILOT_RETRYABLE_ERROR_a9c2__]" # ErrorCard + retry
)
COPILOT_SYSTEM_PREFIX = "[__COPILOT_SYSTEM_e3b0__]" # Renders as system info message
# Prefix for all synthetic IDs generated by CoPilot block execution.
@@ -38,24 +35,3 @@ def parse_node_id_from_exec_id(node_exec_id: str) -> str:
Format: "{node_id}:{random_hex}" → returns "{node_id}".
"""
return node_exec_id.rsplit(COPILOT_NODE_EXEC_ID_SEPARATOR, 1)[0]
# ---------------------------------------------------------------------------
# Transient Anthropic API error detection
# ---------------------------------------------------------------------------
# Patterns in error text that indicate a transient Anthropic API error
# (ECONNRESET / dropped TCP connection) which is retryable.
_TRANSIENT_ERROR_PATTERNS = (
"socket connection was closed unexpectedly",
"ECONNRESET",
"connection was forcibly closed",
"network socket disconnected",
)
FRIENDLY_TRANSIENT_MSG = "Anthropic connection interrupted — please retry"
def is_transient_api_error(error_text: str) -> bool:
"""Return True if *error_text* matches a known transient Anthropic API error."""
lower = error_text.lower()
return any(pat.lower() in lower for pat in _TRANSIENT_ERROR_PATTERNS)

View File

@@ -17,17 +17,8 @@ from backend.util.workspace import WorkspaceManager
if TYPE_CHECKING:
from e2b import AsyncSandbox
# Allowed base directory for the Read tool. Public so service.py can use it
# for sweep operations without depending on a private implementation detail.
# Respects CLAUDE_CONFIG_DIR env var, consistent with transcript.py's
# _projects_base() function.
_config_dir = os.environ.get("CLAUDE_CONFIG_DIR") or os.path.expanduser("~/.claude")
SDK_PROJECTS_DIR = os.path.realpath(os.path.join(_config_dir, "projects"))
# Compiled UUID pattern for validating conversation directory names.
# Kept as a module-level constant so the security-relevant pattern is easy
# to audit in one place and avoids recompilation on every call.
_UUID_RE = re.compile(r"^[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}$", re.IGNORECASE)
# Allowed base directory for the Read tool.
_SDK_PROJECTS_DIR = os.path.realpath(os.path.expanduser("~/.claude/projects"))
# Encoded project-directory name for the current session (e.g.
# "-private-tmp-copilot-<uuid>"). Set by set_execution_context() so path
@@ -44,20 +35,11 @@ _current_sandbox: ContextVar["AsyncSandbox | None"] = ContextVar(
_current_sdk_cwd: ContextVar[str] = ContextVar("_current_sdk_cwd", default="")
def encode_cwd_for_cli(cwd: str) -> str:
"""Encode a working directory path the same way the Claude CLI does.
The Claude CLI encodes the absolute cwd as a directory name by replacing
every non-alphanumeric character with ``-``. For example
``/tmp/copilot-abc`` becomes ``-tmp-copilot-abc``.
"""
def _encode_cwd_for_cli(cwd: str) -> str:
"""Encode a working directory path the same way the Claude CLI does."""
return re.sub(r"[^a-zA-Z0-9]", "-", os.path.realpath(cwd))
# Keep the private alias for internal callers (backwards compat).
_encode_cwd_for_cli = encode_cwd_for_cli
def set_execution_context(
user_id: str | None,
session: ChatSession,
@@ -118,9 +100,7 @@ def is_allowed_local_path(path: str, sdk_cwd: str | None = None) -> bool:
Allowed:
- Files under *sdk_cwd* (``/tmp/copilot-<session>/``)
- Files under ``~/.claude/projects/<encoded-cwd>/<uuid>/tool-results/...``.
The SDK nests tool-results under a conversation UUID directory;
the UUID segment is validated with ``_UUID_RE``.
- Files under ``~/.claude/projects/<encoded-cwd>/tool-results/`` (SDK tool-results)
"""
if not path:
return False
@@ -139,22 +119,10 @@ def is_allowed_local_path(path: str, sdk_cwd: str | None = None) -> bool:
encoded = _current_project_dir.get("")
if encoded:
project_dir = os.path.realpath(os.path.join(SDK_PROJECTS_DIR, encoded))
# Defence-in-depth: ensure project_dir didn't escape the base.
if not project_dir.startswith(SDK_PROJECTS_DIR + os.sep):
return False
# Only allow: <encoded-cwd>/<uuid>/tool-results/<file>
# The SDK always creates a conversation UUID directory between
# the project dir and tool-results/.
if resolved.startswith(project_dir + os.sep):
relative = resolved[len(project_dir) + 1 :]
parts = relative.split(os.sep)
# Require exactly: [<uuid>, "tool-results", <file>, ...]
if (
len(parts) >= 3
and _UUID_RE.match(parts[0])
and parts[1] == "tool-results"
):
return True
tool_results_dir = os.path.join(_SDK_PROJECTS_DIR, encoded, "tool-results")
if resolved == tool_results_dir or resolved.startswith(
tool_results_dir + os.sep
):
return True
return False

View File

@@ -9,7 +9,7 @@ from unittest.mock import MagicMock
import pytest
from backend.copilot.context import (
SDK_PROJECTS_DIR,
_SDK_PROJECTS_DIR,
_current_project_dir,
get_current_sandbox,
get_execution_context,
@@ -104,13 +104,11 @@ def test_is_allowed_local_path_no_sdk_cwd_no_project_dir():
assert not is_allowed_local_path("/tmp/some-file.txt", sdk_cwd=None)
def test_is_allowed_local_path_tool_results_with_uuid():
"""Files under <encoded-cwd>/<uuid>/tool-results/ are allowed."""
def test_is_allowed_local_path_tool_results_dir():
"""Files under the tool-results directory for the current project are allowed."""
encoded = "test-encoded-dir"
conv_uuid = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
path = os.path.join(
SDK_PROJECTS_DIR, encoded, conv_uuid, "tool-results", "output.txt"
)
tool_results_dir = os.path.join(_SDK_PROJECTS_DIR, encoded, "tool-results")
path = os.path.join(tool_results_dir, "output.txt")
_current_project_dir.set(encoded)
try:
@@ -119,22 +117,10 @@ def test_is_allowed_local_path_tool_results_with_uuid():
_current_project_dir.set("")
def test_is_allowed_local_path_tool_results_without_uuid_rejected():
"""Direct <encoded-cwd>/tool-results/ (no UUID) is rejected."""
encoded = "test-encoded-dir"
path = os.path.join(SDK_PROJECTS_DIR, encoded, "tool-results", "output.txt")
_current_project_dir.set(encoded)
try:
assert not is_allowed_local_path(path, sdk_cwd=None)
finally:
_current_project_dir.set("")
def test_is_allowed_local_path_sibling_of_tool_results_is_rejected():
"""A path adjacent to tool-results/ but not inside it is rejected."""
encoded = "test-encoded-dir"
sibling_path = os.path.join(SDK_PROJECTS_DIR, encoded, "other-dir", "file.txt")
sibling_path = os.path.join(_SDK_PROJECTS_DIR, encoded, "other-dir", "file.txt")
_current_project_dir.set(encoded)
try:
@@ -143,21 +129,6 @@ def test_is_allowed_local_path_sibling_of_tool_results_is_rejected():
_current_project_dir.set("")
def test_is_allowed_local_path_valid_uuid_wrong_segment_name_rejected():
"""A valid UUID dir but non-'tool-results' second segment is rejected."""
encoded = "test-encoded-dir"
uuid_str = "12345678-1234-5678-9abc-def012345678"
path = os.path.join(
SDK_PROJECTS_DIR, encoded, uuid_str, "not-tool-results", "output.txt"
)
_current_project_dir.set(encoded)
try:
assert not is_allowed_local_path(path, sdk_cwd=None)
finally:
_current_project_dir.set("")
# ---------------------------------------------------------------------------
# resolve_sandbox_path
# ---------------------------------------------------------------------------

View File

@@ -16,7 +16,6 @@ from backend.copilot.baseline import stream_chat_completion_baseline
from backend.copilot.config import ChatConfig
from backend.copilot.response_model import StreamFinish
from backend.copilot.sdk import service as sdk_service
from backend.copilot.sdk.dummy import stream_chat_completion_dummy
from backend.executor.cluster_lock import ClusterLock
from backend.util.decorator import error_logged
from backend.util.feature_flag import Flag, is_feature_enabled
@@ -247,25 +246,17 @@ class CoPilotProcessor:
# Choose service based on LaunchDarkly flag.
# Claude Code subscription forces SDK mode (CLI subprocess auth).
config = ChatConfig()
if config.test_mode:
stream_fn = stream_chat_completion_dummy
log.warning("Using DUMMY service (CHAT_TEST_MODE=true)")
else:
use_sdk = (
config.use_claude_code_subscription
or await is_feature_enabled(
Flag.COPILOT_SDK,
entry.user_id or "anonymous",
default=config.use_claude_agent_sdk,
)
)
stream_fn = (
sdk_service.stream_chat_completion_sdk
if use_sdk
else stream_chat_completion_baseline
)
log.info(f"Using {'SDK' if use_sdk else 'baseline'} service")
use_sdk = config.use_claude_code_subscription or await is_feature_enabled(
Flag.COPILOT_SDK,
entry.user_id or "anonymous",
default=config.use_claude_agent_sdk,
)
stream_fn = (
sdk_service.stream_chat_completion_sdk
if use_sdk
else stream_chat_completion_baseline
)
log.info(f"Using {'SDK' if use_sdk else 'baseline'} service")
# Stream chat completion and publish chunks to Redis.
async for chunk in stream_fn(

View File

@@ -1,173 +0,0 @@
"""Integration credential lookup with per-process TTL cache.
Provides token retrieval for connected integrations so that copilot tools
(e.g. bash_exec) can inject auth tokens into the execution environment without
hitting the database on every command.
Cache semantics (handled automatically by TTLCache):
- Token found → cached for _TOKEN_CACHE_TTL (5 min). Avoids repeated DB hits
for users who have credentials and are running many bash commands.
- No credentials found → cached for _NULL_CACHE_TTL (60 s). Avoids a DB hit
on every E2B command for users who haven't connected an account yet, while
still picking up a newly-connected account within one minute.
Both caches are bounded to _CACHE_MAX_SIZE entries; cachetools evicts the
least-recently-used entry when the limit is reached.
Multi-worker note: both caches are in-process only. Each worker/replica
maintains its own independent cache, so a credential fetch may be duplicated
across processes. This is acceptable for the current goal (reduce DB hits per
session per-process), but if cache efficiency across replicas becomes important
a shared cache (e.g. Redis) should be used instead.
"""
import logging
from typing import cast
from cachetools import TTLCache
from backend.copilot.providers import SUPPORTED_PROVIDERS
from backend.data.model import APIKeyCredentials, OAuth2Credentials
from backend.integrations.creds_manager import (
IntegrationCredentialsManager,
register_creds_changed_hook,
)
logger = logging.getLogger(__name__)
# Derived from the single SUPPORTED_PROVIDERS registry for backward compat.
PROVIDER_ENV_VARS: dict[str, list[str]] = {
slug: entry["env_vars"] for slug, entry in SUPPORTED_PROVIDERS.items()
}
_TOKEN_CACHE_TTL = 300.0 # seconds — for found tokens
_NULL_CACHE_TTL = 60.0 # seconds — for "not connected" results
_CACHE_MAX_SIZE = 10_000
# (user_id, provider) → token string. TTLCache handles expiry + eviction.
# Thread-safety note: TTLCache is NOT thread-safe, but that is acceptable here
# because all callers (get_provider_token, invalidate_user_provider_cache) run
# exclusively on the asyncio event loop. There are no await points between a
# cache read and its corresponding write within any function, so no concurrent
# coroutine can interleave. If ThreadPoolExecutor workers are ever added to
# this path, a threading.RLock should be wrapped around these caches.
_token_cache: TTLCache[tuple[str, str], str] = TTLCache(
maxsize=_CACHE_MAX_SIZE, ttl=_TOKEN_CACHE_TTL
)
# Separate cache for "no credentials" results with a shorter TTL.
_null_cache: TTLCache[tuple[str, str], bool] = TTLCache(
maxsize=_CACHE_MAX_SIZE, ttl=_NULL_CACHE_TTL
)
def invalidate_user_provider_cache(user_id: str, provider: str) -> None:
"""Remove the cached entry for *user_id*/*provider* from both caches.
Call this after storing new credentials so that the next
``get_provider_token()`` call performs a fresh DB lookup instead of
serving a stale TTL-cached result.
"""
key = (user_id, provider)
_token_cache.pop(key, None)
_null_cache.pop(key, None)
# Register this module's cache-bust function with the credentials manager so
# that any create/update/delete operation immediately evicts stale cache
# entries. This avoids a lazy import inside creds_manager and eliminates the
# circular-import risk.
try:
register_creds_changed_hook(invalidate_user_provider_cache)
except RuntimeError:
# Hook already registered (e.g. module re-import in tests).
pass
# Module-level singleton to avoid re-instantiating IntegrationCredentialsManager
# on every cache-miss call to get_provider_token().
_manager = IntegrationCredentialsManager()
async def get_provider_token(user_id: str, provider: str) -> str | None:
"""Return the user's access token for *provider*, or ``None`` if not connected.
OAuth2 tokens are preferred (refreshed if needed); API keys are the fallback.
Found tokens are cached for _TOKEN_CACHE_TTL (5 min). "Not connected" results
are cached for _NULL_CACHE_TTL (60 s) to avoid a DB hit on every bash_exec
command for users who haven't connected yet, while still picking up a
newly-connected account within one minute.
"""
cache_key = (user_id, provider)
if cache_key in _null_cache:
return None
if cached := _token_cache.get(cache_key):
return cached
manager = _manager
try:
creds_list = await manager.store.get_creds_by_provider(user_id, provider)
except Exception:
logger.warning(
"Failed to fetch %s credentials for user %s",
provider,
user_id,
exc_info=True,
)
return None
# Pass 1: prefer OAuth2 (carry scope info, refreshable via token endpoint).
# Sort so broader-scoped tokens come first: a token with "repo" scope covers
# full git access, while a public-data-only token lacks push/pull permission.
# lock=False — background injection; not worth a distributed lock acquisition.
oauth2_creds = sorted(
[c for c in creds_list if c.type == "oauth2"],
key=lambda c: 0 if "repo" in (cast(OAuth2Credentials, c).scopes or []) else 1,
)
for creds in oauth2_creds:
if creds.type == "oauth2":
try:
fresh = await manager.refresh_if_needed(
user_id, cast(OAuth2Credentials, creds), lock=False
)
token = fresh.access_token.get_secret_value()
except Exception:
logger.warning(
"Failed to refresh %s OAuth token for user %s; "
"discarding stale token to force re-auth",
provider,
user_id,
exc_info=True,
)
# Do NOT fall back to the stale token — it is likely expired
# or revoked. Returning None forces the caller to re-auth,
# preventing the LLM from receiving a non-functional token.
continue
_token_cache[cache_key] = token
return token
# Pass 2: fall back to API key (no expiry, no refresh needed).
for creds in creds_list:
if creds.type == "api_key":
token = cast(APIKeyCredentials, creds).api_key.get_secret_value()
_token_cache[cache_key] = token
return token
# No credentials found — cache to avoid repeated DB hits.
_null_cache[cache_key] = True
return None
async def get_integration_env_vars(user_id: str) -> dict[str, str]:
"""Return env vars for all providers the user has connected.
Iterates :data:`PROVIDER_ENV_VARS`, fetches each token, and builds a flat
``{env_var: token}`` dict ready to pass to a subprocess or E2B sandbox.
Only providers with a stored credential contribute entries.
"""
env: dict[str, str] = {}
for provider, var_names in PROVIDER_ENV_VARS.items():
token = await get_provider_token(user_id, provider)
if token:
for var in var_names:
env[var] = token
return env

View File

@@ -1,195 +0,0 @@
"""Tests for integration_creds — TTL cache and token lookup paths."""
from unittest.mock import AsyncMock, MagicMock, patch
import pytest
from pydantic import SecretStr
from backend.copilot.integration_creds import (
_NULL_CACHE_TTL,
_TOKEN_CACHE_TTL,
PROVIDER_ENV_VARS,
_null_cache,
_token_cache,
get_integration_env_vars,
get_provider_token,
invalidate_user_provider_cache,
)
from backend.data.model import APIKeyCredentials, OAuth2Credentials
_USER = "user-integration-creds-test"
_PROVIDER = "github"
def _make_api_key_creds(key: str = "test-api-key") -> APIKeyCredentials:
return APIKeyCredentials(
id="creds-api-key",
provider=_PROVIDER,
api_key=SecretStr(key),
title="Test API Key",
expires_at=None,
)
def _make_oauth2_creds(token: str = "test-oauth-token") -> OAuth2Credentials:
return OAuth2Credentials(
id="creds-oauth2",
provider=_PROVIDER,
title="Test OAuth",
access_token=SecretStr(token),
refresh_token=SecretStr("test-refresh"),
access_token_expires_at=None,
refresh_token_expires_at=None,
scopes=[],
)
@pytest.fixture(autouse=True)
def clear_caches():
"""Ensure clean caches before and after every test."""
_token_cache.clear()
_null_cache.clear()
yield
_token_cache.clear()
_null_cache.clear()
class TestInvalidateUserProviderCache:
def test_removes_token_entry(self):
key = (_USER, _PROVIDER)
_token_cache[key] = "tok"
invalidate_user_provider_cache(_USER, _PROVIDER)
assert key not in _token_cache
def test_removes_null_entry(self):
key = (_USER, _PROVIDER)
_null_cache[key] = True
invalidate_user_provider_cache(_USER, _PROVIDER)
assert key not in _null_cache
def test_noop_when_key_not_cached(self):
# Should not raise even when there is no cache entry.
invalidate_user_provider_cache("no-such-user", _PROVIDER)
def test_only_removes_targeted_key(self):
other_key = ("other-user", _PROVIDER)
_token_cache[other_key] = "other-tok"
invalidate_user_provider_cache(_USER, _PROVIDER)
assert other_key in _token_cache
class TestGetProviderToken:
@pytest.mark.asyncio(loop_scope="session")
async def test_returns_cached_token_without_db_hit(self):
_token_cache[(_USER, _PROVIDER)] = "cached-tok"
mock_manager = MagicMock()
with patch("backend.copilot.integration_creds._manager", mock_manager):
result = await get_provider_token(_USER, _PROVIDER)
assert result == "cached-tok"
mock_manager.store.get_creds_by_provider.assert_not_called()
@pytest.mark.asyncio(loop_scope="session")
async def test_returns_none_for_null_cached_provider(self):
_null_cache[(_USER, _PROVIDER)] = True
mock_manager = MagicMock()
with patch("backend.copilot.integration_creds._manager", mock_manager):
result = await get_provider_token(_USER, _PROVIDER)
assert result is None
mock_manager.store.get_creds_by_provider.assert_not_called()
@pytest.mark.asyncio(loop_scope="session")
async def test_api_key_creds_returned_and_cached(self):
api_creds = _make_api_key_creds("my-api-key")
mock_manager = MagicMock()
mock_manager.store.get_creds_by_provider = AsyncMock(return_value=[api_creds])
with patch("backend.copilot.integration_creds._manager", mock_manager):
result = await get_provider_token(_USER, _PROVIDER)
assert result == "my-api-key"
assert _token_cache.get((_USER, _PROVIDER)) == "my-api-key"
@pytest.mark.asyncio(loop_scope="session")
async def test_oauth2_preferred_over_api_key(self):
oauth_creds = _make_oauth2_creds("oauth-tok")
api_creds = _make_api_key_creds("api-tok")
mock_manager = MagicMock()
mock_manager.store.get_creds_by_provider = AsyncMock(
return_value=[api_creds, oauth_creds]
)
mock_manager.refresh_if_needed = AsyncMock(return_value=oauth_creds)
with patch("backend.copilot.integration_creds._manager", mock_manager):
result = await get_provider_token(_USER, _PROVIDER)
assert result == "oauth-tok"
@pytest.mark.asyncio(loop_scope="session")
async def test_oauth2_refresh_failure_returns_none(self):
"""On refresh failure, return None instead of caching a stale token."""
oauth_creds = _make_oauth2_creds("stale-oauth-tok")
mock_manager = MagicMock()
mock_manager.store.get_creds_by_provider = AsyncMock(return_value=[oauth_creds])
mock_manager.refresh_if_needed = AsyncMock(side_effect=RuntimeError("network"))
with patch("backend.copilot.integration_creds._manager", mock_manager):
result = await get_provider_token(_USER, _PROVIDER)
# Stale tokens must NOT be returned — forces re-auth.
assert result is None
@pytest.mark.asyncio(loop_scope="session")
async def test_no_credentials_caches_null_entry(self):
mock_manager = MagicMock()
mock_manager.store.get_creds_by_provider = AsyncMock(return_value=[])
with patch("backend.copilot.integration_creds._manager", mock_manager):
result = await get_provider_token(_USER, _PROVIDER)
assert result is None
assert _null_cache.get((_USER, _PROVIDER)) is True
@pytest.mark.asyncio(loop_scope="session")
async def test_db_exception_returns_none_without_caching(self):
mock_manager = MagicMock()
mock_manager.store.get_creds_by_provider = AsyncMock(
side_effect=RuntimeError("db down")
)
with patch("backend.copilot.integration_creds._manager", mock_manager):
result = await get_provider_token(_USER, _PROVIDER)
assert result is None
# DB errors are not cached — next call will retry
assert (_USER, _PROVIDER) not in _token_cache
assert (_USER, _PROVIDER) not in _null_cache
@pytest.mark.asyncio(loop_scope="session")
async def test_null_cache_has_shorter_ttl_than_token_cache(self):
"""Verify the TTL constants are set correctly for each cache."""
assert _null_cache.ttl == _NULL_CACHE_TTL
assert _token_cache.ttl == _TOKEN_CACHE_TTL
assert _NULL_CACHE_TTL < _TOKEN_CACHE_TTL
class TestGetIntegrationEnvVars:
@pytest.mark.asyncio(loop_scope="session")
async def test_injects_all_env_vars_for_provider(self):
_token_cache[(_USER, "github")] = "gh-tok"
result = await get_integration_env_vars(_USER)
for var in PROVIDER_ENV_VARS["github"]:
assert result[var] == "gh-tok"
@pytest.mark.asyncio(loop_scope="session")
async def test_empty_dict_when_no_credentials(self):
_null_cache[(_USER, "github")] = True
result = await get_integration_env_vars(_USER)
assert result == {}

View File

@@ -6,24 +6,39 @@ handling the distinction between:
- Local mode vs E2B mode (storage/filesystem differences)
"""
from backend.blocks.autopilot import AUTOPILOT_BLOCK_ID
from backend.copilot.tools import TOOL_REGISTRY
# Shared technical notes that apply to both SDK and baseline modes
_SHARED_TOOL_NOTES = f"""\
_SHARED_TOOL_NOTES = """\
### Sharing files
After `write_workspace_file`, embed the `download_url` in Markdown:
- File: `[report.csv](workspace://file_id#text/csv)`
- Image: `![chart](workspace://file_id#image/png)`
- Video: `![recording](workspace://file_id#video/mp4)`
### Sharing files with the user
After saving a file to the persistent workspace with `write_workspace_file`,
share it with the user by embedding the `download_url` from the response in
your message as a Markdown link or image:
### File references — @@agptfile:
Pass large file content to tools by reference: `@@agptfile:<uri>[<start>-<end>]`
- `workspace://<file_id>` or `workspace:///<path>` — workspace files
- `/absolute/path` — local/sandbox files
- `[start-end]` — optional 1-indexed line range
- Multiple refs per argument supported. Only `workspace://` and absolute paths are expanded.
- **Any file** — shows as a clickable download link:
`[report.csv](workspace://file_id#text/csv)`
- **Image** — renders inline in chat:
`![chart](workspace://file_id#image/png)`
- **Video** — renders inline in chat with player controls:
`![recording](workspace://file_id#video/mp4)`
The `download_url` field in the `write_workspace_file` response is already
in the correct format — paste it directly after the `(` in the Markdown.
### Passing file content to tools — @@agptfile: references
Instead of copying large file contents into a tool argument, pass a file
reference and the platform will load the content for you.
Syntax: `@@agptfile:<uri>[<start>-<end>]`
- `<uri>` **must** start with `workspace://` or `/` (absolute path):
- `workspace://<file_id>` — workspace file by ID
- `workspace:///<path>` — workspace file by virtual path
- `/absolute/local/path` — ephemeral or sdk_cwd file
- E2B sandbox absolute path (e.g. `/home/user/script.py`)
- `[<start>-<end>]` is an optional 1-indexed inclusive line range.
- URIs that do not start with `workspace://` or `/` are **not** expanded.
Examples:
```
@@ -34,9 +49,21 @@ Examples:
@@agptfile:/home/user/script.py
```
**Structured data**: When the entire argument is a single file reference, the platform auto-parses by extension/MIME. Supported: JSON, JSONL, CSV, TSV, YAML, TOML, Parquet, Excel (.xlsx only; legacy `.xls` is NOT supported). Unrecognised formats return plain string.
You can embed a reference inside any string argument, or use it as the entire
value. Multiple references in one argument are all expanded.
**Type coercion**: The platform auto-coerces expanded string values to match block input types (e.g. JSON string → `list[list[str]]`).
**Structured data**: When the **entire** argument value is a single file
reference (no surrounding text), the platform automatically parses the file
content based on its extension or MIME type. Supported formats: JSON, JSONL,
CSV, TSV, YAML, TOML, Parquet, and Excel (.xlsx — first sheet only).
For example, pass `@@agptfile:workspace://<id>` where the file is a `.csv` and
the rows will be parsed into `list[list[str]]` automatically. If the format is
unrecognised or parsing fails, the content is returned as a plain string.
Legacy `.xls` files are **not** supported — only the modern `.xlsx` format.
**Type coercion**: The platform also coerces expanded values to match the
block's expected input types. For example, if a block expects `list[list[str]]`
and the expanded value is a JSON string, it will be parsed into the correct type.
### Media file inputs (format: "file")
Some block inputs accept media files — their schema shows `"format": "file"`.
@@ -54,53 +81,18 @@ that would be corrupted by text encoding.
Example — committing an image file to GitHub:
```json
{{
"files": [{{
{
"files": [{
"path": "docs/hero.png",
"content": "workspace://abc123#image/png",
"operation": "upsert"
}}]
}}
}]
}
```
### Sub-agent tasks
- When using the Task tool, NEVER set `run_in_background` to true.
All tasks must run in the foreground.
### Delegating to another autopilot (sub-autopilot pattern)
Use the **AutoPilotBlock** (`run_block` with block_id
`{AUTOPILOT_BLOCK_ID}`) to delegate a task to a fresh
autopilot instance. The sub-autopilot has its own full tool set and can
perform multi-step work autonomously.
- **Input**: `prompt` (required) — the task description.
Optional: `system_context` to constrain behavior, `session_id` to
continue a previous conversation, `max_recursion_depth` (default 3).
- **Output**: `response` (text), `tool_calls` (list), `session_id`
(for continuation), `conversation_history`, `token_usage`.
Use this when a task is complex enough to benefit from a separate
autopilot context, e.g. "research X and write a report" while the
parent autopilot handles orchestration.
"""
# E2B-only notes — E2B has full internet access so gh CLI works there.
# Not shown in local (bubblewrap) mode: --unshare-net blocks all network.
_E2B_TOOL_NOTES = """
### GitHub CLI (`gh`) and git
- If the user has connected their GitHub account, both `gh` and `git` are
pre-authenticated — use them directly without any manual login step.
`git` HTTPS operations (clone, push, pull) work automatically.
- If the token changes mid-session (e.g. user reconnects with a new token),
run `gh auth setup-git` to re-register the credential helper.
- If `gh` or `git` fails with an authentication error (e.g. "authentication
required", "could not read Username", or exit code 128), call
`connect_integration(provider="github")` to surface the GitHub credentials
setup card so the user can connect their account. Once connected, retry
the operation.
- For operations that need broader access (e.g. private org repos, GitHub
Actions), pass the required scopes: e.g.
`connect_integration(provider="github", scopes=["repo", "read:org"])`.
"""
@@ -113,7 +105,6 @@ def _build_storage_supplement(
storage_system_1_persistence: list[str],
file_move_name_1_to_2: str,
file_move_name_2_to_1: str,
extra_notes: str = "",
) -> str:
"""Build storage/filesystem supplement for a specific environment.
@@ -128,7 +119,6 @@ def _build_storage_supplement(
storage_system_1_persistence: List of persistence behavior descriptions
file_move_name_1_to_2: Direction label for primary→persistent
file_move_name_2_to_1: Direction label for persistent→primary
extra_notes: Environment-specific notes appended after shared notes
"""
# Format lists as bullet points with proper indentation
characteristics = "\n".join(f" - {c}" for c in storage_system_1_characteristics)
@@ -138,12 +128,17 @@ def _build_storage_supplement(
## Tool notes
### Shell & filesystem
- The SDK built-in Bash tool is NOT available. Use `bash_exec` for shell commands ({sandbox_type}). Working dir: `{working_dir}`
- SDK file tools (Read/Write/Edit/Glob/Grep) and `bash_exec` share one filesystem — use relative or absolute paths under this dir.
- `read_workspace_file`/`write_workspace_file` operate on **persistent cloud workspace storage** (separate from the working dir).
### Shell commands
- The SDK built-in Bash tool is NOT available. Use the `bash_exec` MCP tool
for shell commands — it runs {sandbox_type}.
### Working directory
- Your working directory is: `{working_dir}`
- All SDK file tools AND `bash_exec` operate on the same filesystem
- Use relative paths or absolute paths under `{working_dir}` for all file operations
### Two storage systems — CRITICAL to understand
1. **{storage_system_1_name}** (`{working_dir}`):
{characteristics}
{persistence}
@@ -157,23 +152,12 @@ def _build_storage_supplement(
### File persistence
Important files (code, configs, outputs) should be saved to workspace to ensure they persist.
### SDK tool-result files
When tool outputs are large, the SDK truncates them and saves the full output to
a local file under `~/.claude/projects/.../tool-results/`. To read these files,
always use `read_file` or `Read` (NOT `read_workspace_file`).
`read_workspace_file` reads from cloud workspace storage, where SDK
tool-results are NOT stored.
{_SHARED_TOOL_NOTES}{extra_notes}"""
{_SHARED_TOOL_NOTES}"""
# Pre-built supplements for common environments
def _get_local_storage_supplement(cwd: str) -> str:
"""Local ephemeral storage (files lost between turns).
Network is isolated (bubblewrap --unshare-net), so internet-dependent CLIs
like gh will not work — no integration env-var notes are included.
"""
"""Local ephemeral storage (files lost between turns)."""
return _build_storage_supplement(
working_dir=cwd,
sandbox_type="in a network-isolated sandbox",
@@ -191,11 +175,7 @@ def _get_local_storage_supplement(cwd: str) -> str:
def _get_cloud_sandbox_supplement() -> str:
"""Cloud persistent sandbox (files survive across turns in session).
E2B has full internet access, so integration tokens (GH_TOKEN etc.) are
injected per command in bash_exec — include the CLI guidance notes.
"""
"""Cloud persistent sandbox (files survive across turns in session)."""
return _build_storage_supplement(
working_dir="/home/user",
sandbox_type="in a cloud sandbox with full internet access",
@@ -210,7 +190,6 @@ def _get_cloud_sandbox_supplement() -> str:
],
file_move_name_1_to_2="Sandbox → Persistent",
file_move_name_2_to_1="Persistent → Sandbox",
extra_notes=_E2B_TOOL_NOTES,
)

View File

@@ -1,63 +0,0 @@
"""Single source of truth for copilot-supported integration providers.
Both :mod:`~backend.copilot.integration_creds` (env-var injection) and
:mod:`~backend.copilot.tools.connect_integration` (UI setup card) import from
here, eliminating the risk of the two registries drifting out of sync.
"""
from typing import TypedDict
class ProviderEntry(TypedDict):
"""Metadata for a supported integration provider.
Attributes:
name: Human-readable display name (e.g. "GitHub").
env_vars: Environment variable names injected when the provider is
connected (e.g. ``["GH_TOKEN", "GITHUB_TOKEN"]``).
default_scopes: Default OAuth scopes requested when the agent does not
specify any.
"""
name: str
env_vars: list[str]
default_scopes: list[str]
def _is_github_oauth_configured() -> bool:
"""Return True if GitHub OAuth env vars are set.
Uses a lazy import to avoid triggering ``Secrets()`` during module import,
which can fail in environments where secrets are not yet loaded (e.g. tests,
CLI tooling).
"""
from backend.blocks.github._auth import GITHUB_OAUTH_IS_CONFIGURED
return GITHUB_OAUTH_IS_CONFIGURED
# -- Registry ----------------------------------------------------------------
# Add new providers here. Both env-var injection and the setup-card tool read
# from this single registry.
SUPPORTED_PROVIDERS: dict[str, ProviderEntry] = {
"github": {
"name": "GitHub",
"env_vars": ["GH_TOKEN", "GITHUB_TOKEN"],
"default_scopes": ["repo"],
},
}
def get_provider_auth_types(provider: str) -> list[str]:
"""Return the supported credential types for *provider* at runtime.
OAuth types are only offered when the corresponding OAuth client env vars
are configured.
"""
if provider == "github":
if _is_github_oauth_configured():
return ["api_key", "oauth2"]
return ["api_key"]
# Default for unknown/future providers — API key only.
return ["api_key"]

View File

@@ -43,7 +43,6 @@ class ResponseType(str, Enum):
ERROR = "error"
USAGE = "usage"
HEARTBEAT = "heartbeat"
STATUS = "status"
class StreamBaseResponse(BaseModel):
@@ -264,19 +263,3 @@ class StreamHeartbeat(StreamBaseResponse):
def to_sse(self) -> str:
"""Convert to SSE comment format to keep connection alive."""
return ": heartbeat\n\n"
class StreamStatus(StreamBaseResponse):
"""Transient status notification shown to the user during long operations.
Used to provide feedback when the backend performs behind-the-scenes work
(e.g., compacting conversation context on a retry) that would otherwise
leave the user staring at an unexplained pause.
Sent as a proper ``data:`` event so the frontend can display it to the
user. The AI SDK stream parser gracefully skips unknown chunk types
(logs a console warning), so this does not break the stream.
"""
type: ResponseType = ResponseType.STATUS
message: str = Field(..., description="Human-readable status message")

View File

@@ -19,19 +19,9 @@ least invasive way to break the cycle while keeping module-level constants
intact.
"""
from typing import TYPE_CHECKING, Any
# Static imports for type checkers so they can resolve __all__ entries
# without executing the lazy-import machinery at runtime.
if TYPE_CHECKING:
from .collect import CopilotResult as CopilotResult
from .collect import collect_copilot_response as collect_copilot_response
from .service import stream_chat_completion_sdk as stream_chat_completion_sdk
from .tool_adapter import create_copilot_mcp_server as create_copilot_mcp_server
from typing import Any
__all__ = [
"CopilotResult",
"collect_copilot_response",
"stream_chat_completion_sdk",
"create_copilot_mcp_server",
]
@@ -39,8 +29,6 @@ __all__ = [
# Dispatch table for PEP 562 lazy imports. Each entry is a (module, attr)
# pair so new exports can be added without touching __getattr__ itself.
_LAZY_IMPORTS: dict[str, tuple[str, str]] = {
"CopilotResult": (".collect", "CopilotResult"),
"collect_copilot_response": (".collect", "collect_copilot_response"),
"stream_chat_completion_sdk": (".service", "stream_chat_completion_sdk"),
"create_copilot_mcp_server": (".tool_adapter", "create_copilot_mcp_server"),
}

View File

@@ -143,71 +143,6 @@ To use an MCP (Model Context Protocol) tool as a node in the agent:
tool_arguments.
6. Output: `result` (the tool's return value) and `error` (error message)
### Using SmartDecisionMakerBlock (AI Orchestrator with Agent Mode)
To create an agent where AI autonomously decides which tools or sub-agents to
call in a loop until the task is complete:
1. Create a `SmartDecisionMakerBlock` node
(ID: `3b191d9f-356f-482d-8238-ba04b6d18381`)
2. Set `input_default`:
- `agent_mode_max_iterations`: Choose based on task complexity:
- `1` for single-step tool calls (AI picks one tool, calls it, done)
- `3``10` for multi-step tasks (AI calls tools iteratively)
- `-1` for open-ended orchestration (AI loops until it decides it's done).
**Use with caution** — prefer bounded iterations (310) unless
genuinely needed, as unbounded loops risk runaway cost and execution.
Do NOT use `0` (traditional mode) — it requires complex external
conversation-history loop wiring that the agent generator does not
produce.
- `conversation_compaction`: `true` (recommended to avoid context overflow)
- `retry`: Number of retries on tool-call failure (default `3`).
Set to `0` to disable retries.
- `multiple_tool_calls`: Whether the AI can invoke multiple tools in a
single turn (default `false`). Enable when tools are independent and
can run concurrently.
- Optional: `sys_prompt` for extra LLM context about how to orchestrate
3. Wire the `prompt` input from an `AgentInputBlock` (the user's task)
4. Create downstream tool blocks — regular blocks **or** `AgentExecutorBlock`
nodes that call sub-agents
5. Link each tool to the SmartDecisionMaker: set `source_name: "tools"` on
the SmartDecisionMaker side and `sink_name: <input_field>` on each tool
block's input. Create one link per input field the tool needs.
6. Wire the `finished` output to an `AgentOutputBlock` for the final result
7. Credentials (LLM API key) are configured by the user in the platform UI
after saving — do NOT require them upfront
**Example — Orchestrator calling two sub-agents:**
- Node 1: `AgentInputBlock` (input_default: `{"name": "task"}`)
- Node 2: `SmartDecisionMakerBlock` (input_default:
`{"agent_mode_max_iterations": 10, "conversation_compaction": true}`)
- Node 3: `AgentExecutorBlock` (sub-agent A — set `graph_id`, `graph_version`,
`input_schema`, `output_schema` from library agent)
- Node 4: `AgentExecutorBlock` (sub-agent B — same pattern)
- Node 5: `AgentOutputBlock` (input_default: `{"name": "result"}`)
- Links:
- Input→SDM: `source_name: "result"`, `sink_name: "prompt"`
- SDM→Agent A (per input field): `source_name: "tools"`,
`sink_name: "<agent_a_input_field>"`
- SDM→Agent B (per input field): `source_name: "tools"`,
`sink_name: "<agent_b_input_field>"`
- SDM→Output: `source_name: "finished"`, `sink_name: "value"`
**Example — Orchestrator calling regular blocks as tools:**
- Node 1: `AgentInputBlock` (input_default: `{"name": "task"}`)
- Node 2: `SmartDecisionMakerBlock` (input_default:
`{"agent_mode_max_iterations": 5, "conversation_compaction": true}`)
- Node 3: `GetWebpageBlock` (regular block — the AI calls it as a tool)
- Node 4: `AITextGeneratorBlock` (another regular block as a tool)
- Node 5: `AgentOutputBlock` (input_default: `{"name": "result"}`)
- Links:
- Input→SDM: `source_name: "result"`, `sink_name: "prompt"`
- SDM→GetWebpage: `source_name: "tools"`, `sink_name: "url"`
- SDM→AITextGenerator: `source_name: "tools"`, `sink_name: "prompt"`
- SDM→Output: `source_name: "finished"`, `sink_name: "value"`
Regular blocks work exactly like sub-agents as tools — wire each input
field from `source_name: "tools"` on the SmartDecisionMaker side.
### Example: Simple AI Text Processor
A minimal agent with input, processing, and output:

View File

@@ -1,108 +0,0 @@
"""Public helpers for consuming a copilot stream as a simple request-response.
This module exposes :class:`CopilotResult` and :func:`collect_copilot_response`
so that callers (e.g. the AutoPilot block) can consume the copilot stream
without implementing their own event loop.
"""
from __future__ import annotations
from typing import Any
class CopilotResult:
"""Aggregated result from consuming a copilot stream.
Returned by :func:`collect_copilot_response` so callers don't need to
implement their own event-loop over the raw stream events.
"""
__slots__ = (
"response_text",
"tool_calls",
"prompt_tokens",
"completion_tokens",
"total_tokens",
)
def __init__(self) -> None:
self.response_text: str = ""
self.tool_calls: list[dict[str, Any]] = []
self.prompt_tokens: int = 0
self.completion_tokens: int = 0
self.total_tokens: int = 0
async def collect_copilot_response(
*,
session_id: str,
message: str,
user_id: str,
is_user_message: bool = True,
) -> CopilotResult:
"""Consume :func:`stream_chat_completion_sdk` and return aggregated results.
This is the recommended entry-point for callers that need a simple
request-response interface (e.g. the AutoPilot block) rather than
streaming individual events. It avoids duplicating the event-collection
logic and does NOT wrap the stream in ``asyncio.timeout`` — the SDK
manages its own heartbeat-based timeouts internally.
Args:
session_id: Chat session to use.
message: The user message / prompt.
user_id: Authenticated user ID.
is_user_message: Whether this is a user-initiated message.
Returns:
A :class:`CopilotResult` with the aggregated response text,
tool calls, and token usage.
Raises:
RuntimeError: If the stream yields a ``StreamError`` event.
"""
from backend.copilot.response_model import (
StreamError,
StreamTextDelta,
StreamToolInputAvailable,
StreamToolOutputAvailable,
StreamUsage,
)
from .service import stream_chat_completion_sdk
result = CopilotResult()
response_parts: list[str] = []
tool_calls_by_id: dict[str, dict[str, Any]] = {}
async for event in stream_chat_completion_sdk(
session_id=session_id,
message=message,
is_user_message=is_user_message,
user_id=user_id,
):
if isinstance(event, StreamTextDelta):
response_parts.append(event.delta)
elif isinstance(event, StreamToolInputAvailable):
entry: dict[str, Any] = {
"tool_call_id": event.toolCallId,
"tool_name": event.toolName,
"input": event.input,
"output": None,
"success": None,
}
result.tool_calls.append(entry)
tool_calls_by_id[event.toolCallId] = entry
elif isinstance(event, StreamToolOutputAvailable):
if tc := tool_calls_by_id.get(event.toolCallId):
tc["output"] = event.output
tc["success"] = event.success
elif isinstance(event, StreamUsage):
result.prompt_tokens += event.prompt_tokens
result.completion_tokens += event.completion_tokens
result.total_tokens += event.total_tokens
elif isinstance(event, StreamError):
raise RuntimeError(f"Copilot error: {event.errorText}")
result.response_text = "".join(response_parts)
return result

View File

@@ -12,7 +12,6 @@ import asyncio
import logging
import uuid
from dataclasses import dataclass, field
from typing import Any
from ..constants import COMPACTION_DONE_MSG, COMPACTION_TOOL_NAME
from ..model import ChatMessage, ChatSession
@@ -120,12 +119,14 @@ def filter_compaction_messages(
filtered: list[ChatMessage] = []
for msg in messages:
if msg.role == "assistant" and msg.tool_calls:
real_calls: list[dict[str, Any]] = []
for tc in msg.tool_calls:
if tc.get("function", {}).get("name") == COMPACTION_TOOL_NAME:
compaction_ids.add(tc.get("id", ""))
else:
real_calls.append(tc)
real_calls = [
tc
for tc in msg.tool_calls
if tc.get("function", {}).get("name") != COMPACTION_TOOL_NAME
]
if not real_calls and not msg.content:
continue
if msg.role == "tool" and msg.tool_call_id in compaction_ids:
@@ -221,7 +222,6 @@ class CompactionTracker:
def reset_for_query(self) -> None:
"""Reset per-query state before a new SDK query."""
self._compact_start.clear()
self._done = False
self._start_emitted = False
self._tool_call_id = ""

View File

@@ -1,54 +0,0 @@
"""Shared test fixtures for copilot SDK tests."""
from __future__ import annotations
from unittest.mock import patch
from uuid import uuid4
import pytest
from backend.util import json
@pytest.fixture()
def mock_chat_config():
"""Mock ChatConfig so compact_transcript tests skip real config lookup."""
with patch(
"backend.copilot.config.ChatConfig",
return_value=type("Cfg", (), {"model": "m", "api_key": "k", "base_url": "u"})(),
):
yield
def build_test_transcript(pairs: list[tuple[str, str]]) -> str:
"""Build a minimal valid JSONL transcript from (role, content) pairs.
Use this helper in any copilot SDK test that needs a well-formed
transcript without hitting the real storage layer.
"""
lines: list[str] = []
last_uuid: str | None = None
for role, content in pairs:
uid = str(uuid4())
entry_type = "assistant" if role == "assistant" else "user"
msg: dict = {"role": role, "content": content}
if role == "assistant":
msg.update(
{
"model": "",
"id": f"msg_{uid[:8]}",
"type": "message",
"content": [{"type": "text", "text": content}],
"stop_reason": "end_turn",
"stop_sequence": None,
}
)
entry = {
"type": entry_type,
"uuid": uid,
"parentUuid": last_uuid,
"message": msg,
}
lines.append(json.dumps(entry, separators=(",", ":")))
last_uuid = uid
return "\n".join(lines) + "\n"

View File

@@ -1,17 +1,9 @@
"""Dummy SDK service for testing copilot streaming.
Returns mock streaming responses without calling Claude Agent SDK.
Enable via CHAT_TEST_MODE=true in .env (ChatConfig.test_mode).
Enable via COPILOT_TEST_MODE=true environment variable.
WARNING: This is for testing only. Do not use in production.
Magic keywords (case-insensitive, anywhere in message):
__test_transient_error__ — Simulate a transient Anthropic API error
(ECONNRESET). Streams partial text, then
yields StreamError with retryable prefix.
__test_fatal_error__ — Simulate a non-retryable SDK error.
__test_slow_response__ — Simulate a slow response (2s per word).
(no keyword) — Normal dummy response.
"""
import asyncio
@@ -20,39 +12,12 @@ import uuid
from collections.abc import AsyncGenerator
from typing import Any
from ..constants import (
COPILOT_ERROR_PREFIX,
COPILOT_RETRYABLE_ERROR_PREFIX,
FRIENDLY_TRANSIENT_MSG,
)
from ..model import ChatMessage, ChatSession, get_chat_session, upsert_chat_session
from ..response_model import (
StreamBaseResponse,
StreamError,
StreamFinish,
StreamFinishStep,
StreamStart,
StreamStartStep,
StreamTextDelta,
StreamTextEnd,
StreamTextStart,
)
from ..model import ChatSession
from ..response_model import StreamBaseResponse, StreamStart, StreamTextDelta
logger = logging.getLogger(__name__)
async def _safe_upsert(session: ChatSession) -> None:
"""Best-effort session persist — skip silently if DB is unavailable."""
try:
await upsert_chat_session(session)
except Exception:
logger.debug("[TEST MODE] Could not persist session (DB unavailable)")
def _has_keyword(message: str | None, keyword: str) -> bool:
return keyword in (message or "").lower()
async def stream_chat_completion_dummy(
session_id: str,
message: str | None = None,
@@ -71,89 +36,24 @@ async def stream_chat_completion_dummy(
- No timeout occurs
- Text arrives in chunks
- StreamFinish is sent by mark_session_completed
See module docstring for magic keywords that trigger error scenarios.
"""
logger.warning(
f"[TEST MODE] Using dummy copilot streaming for session {session_id}"
)
# Load session from DB (matches SDK service behaviour) so error markers
# and the assistant reply are persisted and survive page refresh.
# Best-effort: skip if DB is unavailable (e.g. unit tests).
if session is None:
try:
session = await get_chat_session(session_id, user_id)
except Exception:
logger.debug("[TEST MODE] Could not load session (DB unavailable)")
session = None
message_id = str(uuid.uuid4())
text_block_id = str(uuid.uuid4())
# Start the stream (matches baseline: StreamStart → StreamStartStep)
# Start the stream
yield StreamStart(messageId=message_id, sessionId=session_id)
yield StreamStartStep()
# --- Magic keyword: transient error (retryable) -------------------------
if _has_keyword(message, "__test_transient_error__"):
# Stream some partial text first (simulates mid-stream failure)
yield StreamTextStart(id=text_block_id)
for word in ["Working", "on", "it..."]:
yield StreamTextDelta(id=text_block_id, delta=f"{word} ")
await asyncio.sleep(0.1)
yield StreamTextEnd(id=text_block_id)
yield StreamFinishStep()
# Persist retryable marker so "Try Again" button shows after refresh
if session:
session.messages.append(
ChatMessage(
role="assistant",
content=f"{COPILOT_RETRYABLE_ERROR_PREFIX} {FRIENDLY_TRANSIENT_MSG}",
)
)
await _safe_upsert(session)
yield StreamError(
errorText=FRIENDLY_TRANSIENT_MSG,
code="transient_api_error",
)
return
# --- Magic keyword: fatal error (non-retryable) -------------------------
if _has_keyword(message, "__test_fatal_error__"):
yield StreamFinishStep()
error_msg = "Internal SDK error: model refused to respond"
# Persist non-retryable error marker
if session:
session.messages.append(
ChatMessage(
role="assistant",
content=f"{COPILOT_ERROR_PREFIX} {error_msg}",
)
)
await _safe_upsert(session)
yield StreamError(errorText=error_msg, code="sdk_error")
return
# --- Magic keyword: slow response ---------------------------------------
delay = 2.0 if _has_keyword(message, "__test_slow_response__") else 0.1
# --- Normal dummy response ----------------------------------------------
# Simulate streaming text response with delays
dummy_response = "I counted: 1... 2... 3. All done!"
words = dummy_response.split()
yield StreamTextStart(id=text_block_id)
for i, word in enumerate(words):
# Add space except for last word
text = word if i == len(words) - 1 else f"{word} "
yield StreamTextDelta(id=text_block_id, delta=text)
await asyncio.sleep(delay)
yield StreamTextEnd(id=text_block_id)
# Persist the assistant reply so it survives page refresh
if session:
session.messages.append(ChatMessage(role="assistant", content=dummy_response))
await _safe_upsert(session)
yield StreamFinishStep()
yield StreamFinish()
# Small delay to simulate real streaming
await asyncio.sleep(0.1)

View File

@@ -26,41 +26,6 @@ from backend.copilot.context import (
logger = logging.getLogger(__name__)
async def _check_sandbox_symlink_escape(
sandbox: Any,
parent: str,
) -> str | None:
"""Resolve the canonical parent path inside the sandbox to detect symlink escapes.
``normpath`` (used by ``resolve_sandbox_path``) only normalises the string;
``readlink -f`` follows actual symlinks on the sandbox filesystem.
Returns the canonical parent path, or ``None`` if the path escapes
``E2B_WORKDIR``.
Note: There is an inherent TOCTOU window between this check and the
subsequent ``sandbox.files.write()``. A symlink could theoretically be
replaced between the two operations. This is acceptable in the E2B
sandbox model since the sandbox is single-user and ephemeral.
"""
canonical_res = await sandbox.commands.run(
f"readlink -f {shlex.quote(parent or E2B_WORKDIR)}",
cwd=E2B_WORKDIR,
timeout=5,
)
canonical_parent = (canonical_res.stdout or "").strip()
if (
canonical_res.exit_code != 0
or not canonical_parent
or (
canonical_parent != E2B_WORKDIR
and not canonical_parent.startswith(E2B_WORKDIR + "/")
)
):
return None
return canonical_parent
def _get_sandbox():
return get_current_sandbox()
@@ -141,10 +106,6 @@ async def _handle_write_file(args: dict[str, Any]) -> dict[str, Any]:
parent = os.path.dirname(remote)
if parent and parent != E2B_WORKDIR:
await sandbox.files.make_dir(parent)
canonical_parent = await _check_sandbox_symlink_escape(sandbox, parent)
if canonical_parent is None:
return _mcp(f"Path must be within {E2B_WORKDIR}: {parent}", error=True)
remote = os.path.join(canonical_parent, os.path.basename(remote))
await sandbox.files.write(remote, content)
except Exception as exc:
return _mcp(f"Failed to write {remote}: {exc}", error=True)
@@ -169,12 +130,6 @@ async def _handle_edit_file(args: dict[str, Any]) -> dict[str, Any]:
return result
sandbox, remote = result
parent = os.path.dirname(remote)
canonical_parent = await _check_sandbox_symlink_escape(sandbox, parent)
if canonical_parent is None:
return _mcp(f"Path must be within {E2B_WORKDIR}: {parent}", error=True)
remote = os.path.join(canonical_parent, os.path.basename(remote))
try:
raw: bytes = await sandbox.files.read(remote, format="bytes")
content = raw.decode("utf-8", errors="replace")

View File

@@ -4,19 +4,15 @@ Pure unit tests with no external dependencies (no E2B, no sandbox).
"""
import os
import shutil
from types import SimpleNamespace
from unittest.mock import AsyncMock
import pytest
from backend.copilot.context import E2B_WORKDIR, SDK_PROJECTS_DIR, _current_project_dir
from backend.copilot.context import _current_project_dir
from .e2b_file_tools import _read_local, resolve_sandbox_path
_SDK_PROJECTS_DIR = os.path.realpath(os.path.expanduser("~/.claude/projects"))
from .e2b_file_tools import (
_check_sandbox_symlink_escape,
_read_local,
resolve_sandbox_path,
)
# ---------------------------------------------------------------------------
# resolve_sandbox_path — sandbox path normalisation & boundary enforcement
@@ -25,48 +21,46 @@ from .e2b_file_tools import (
class TestResolveSandboxPath:
def test_relative_path_resolved(self):
assert resolve_sandbox_path("src/main.py") == f"{E2B_WORKDIR}/src/main.py"
assert resolve_sandbox_path("src/main.py") == "/home/user/src/main.py"
def test_absolute_within_sandbox(self):
assert (
resolve_sandbox_path(f"{E2B_WORKDIR}/file.txt") == f"{E2B_WORKDIR}/file.txt"
)
assert resolve_sandbox_path("/home/user/file.txt") == "/home/user/file.txt"
def test_workdir_itself(self):
assert resolve_sandbox_path(E2B_WORKDIR) == E2B_WORKDIR
assert resolve_sandbox_path("/home/user") == "/home/user"
def test_relative_dotslash(self):
assert resolve_sandbox_path("./README.md") == f"{E2B_WORKDIR}/README.md"
assert resolve_sandbox_path("./README.md") == "/home/user/README.md"
def test_traversal_blocked(self):
with pytest.raises(ValueError, match=f"must be within {E2B_WORKDIR}"):
with pytest.raises(ValueError, match="must be within /home/user"):
resolve_sandbox_path("../../etc/passwd")
def test_absolute_traversal_blocked(self):
with pytest.raises(ValueError, match=f"must be within {E2B_WORKDIR}"):
resolve_sandbox_path(f"{E2B_WORKDIR}/../../etc/passwd")
with pytest.raises(ValueError, match="must be within /home/user"):
resolve_sandbox_path("/home/user/../../etc/passwd")
def test_absolute_outside_sandbox_blocked(self):
with pytest.raises(ValueError, match=f"must be within {E2B_WORKDIR}"):
with pytest.raises(ValueError, match="must be within /home/user"):
resolve_sandbox_path("/etc/passwd")
def test_root_blocked(self):
with pytest.raises(ValueError, match=f"must be within {E2B_WORKDIR}"):
with pytest.raises(ValueError, match="must be within /home/user"):
resolve_sandbox_path("/")
def test_home_other_user_blocked(self):
with pytest.raises(ValueError, match=f"must be within {E2B_WORKDIR}"):
with pytest.raises(ValueError, match="must be within /home/user"):
resolve_sandbox_path("/home/other/file.txt")
def test_deep_nested_allowed(self):
assert resolve_sandbox_path("a/b/c/d/e.txt") == f"{E2B_WORKDIR}/a/b/c/d/e.txt"
assert resolve_sandbox_path("a/b/c/d/e.txt") == "/home/user/a/b/c/d/e.txt"
def test_trailing_slash_normalised(self):
assert resolve_sandbox_path("src/") == f"{E2B_WORKDIR}/src"
assert resolve_sandbox_path("src/") == "/home/user/src"
def test_double_dots_within_sandbox_ok(self):
"""Path that resolves back within E2B_WORKDIR is allowed."""
assert resolve_sandbox_path("a/b/../c.txt") == f"{E2B_WORKDIR}/a/c.txt"
"""Path that resolves back within /home/user is allowed."""
assert resolve_sandbox_path("a/b/../c.txt") == "/home/user/a/c.txt"
# ---------------------------------------------------------------------------
@@ -79,13 +73,9 @@ class TestResolveSandboxPath:
class TestReadLocal:
_CONV_UUID = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
def _make_tool_results_file(self, encoded: str, filename: str, content: str) -> str:
"""Create a tool-results file under <encoded>/<uuid>/tool-results/."""
tool_results_dir = os.path.join(
SDK_PROJECTS_DIR, encoded, self._CONV_UUID, "tool-results"
)
"""Create a tool-results file and return its path."""
tool_results_dir = os.path.join(_SDK_PROJECTS_DIR, encoded, "tool-results")
os.makedirs(tool_results_dir, exist_ok=True)
filepath = os.path.join(tool_results_dir, filename)
with open(filepath, "w") as f:
@@ -117,9 +107,7 @@ class TestReadLocal:
def test_read_nonexistent_tool_results(self):
"""A tool-results path that doesn't exist returns FileNotFoundError."""
encoded = "-tmp-copilot-e2b-test-nofile"
tool_results_dir = os.path.join(
SDK_PROJECTS_DIR, encoded, self._CONV_UUID, "tool-results"
)
tool_results_dir = os.path.join(_SDK_PROJECTS_DIR, encoded, "tool-results")
os.makedirs(tool_results_dir, exist_ok=True)
filepath = os.path.join(tool_results_dir, "nonexistent.txt")
token = _current_project_dir.set(encoded)
@@ -129,7 +117,7 @@ class TestReadLocal:
assert "not found" in result["content"][0]["text"].lower()
finally:
_current_project_dir.reset(token)
shutil.rmtree(os.path.join(SDK_PROJECTS_DIR, encoded), ignore_errors=True)
os.rmdir(tool_results_dir)
def test_read_traversal_path_blocked(self):
"""A traversal attempt that escapes allowed directories is blocked."""
@@ -164,66 +152,3 @@ class TestReadLocal:
"""Without _current_project_dir set, all paths are blocked."""
result = _read_local("/tmp/anything.txt", offset=0, limit=10)
assert result["isError"] is True
# ---------------------------------------------------------------------------
# _check_sandbox_symlink_escape — symlink escape detection
# ---------------------------------------------------------------------------
def _make_sandbox(stdout: str, exit_code: int = 0) -> SimpleNamespace:
"""Build a minimal sandbox mock whose commands.run returns a fixed result."""
run_result = SimpleNamespace(stdout=stdout, exit_code=exit_code)
commands = SimpleNamespace(run=AsyncMock(return_value=run_result))
return SimpleNamespace(commands=commands)
class TestCheckSandboxSymlinkEscape:
@pytest.mark.asyncio
async def test_canonical_path_within_workdir_returns_path(self):
"""When readlink -f resolves to a path inside E2B_WORKDIR, returns it."""
sandbox = _make_sandbox(stdout=f"{E2B_WORKDIR}/src\n", exit_code=0)
result = await _check_sandbox_symlink_escape(sandbox, f"{E2B_WORKDIR}/src")
assert result == f"{E2B_WORKDIR}/src"
@pytest.mark.asyncio
async def test_workdir_itself_returns_workdir(self):
"""When readlink -f resolves to E2B_WORKDIR exactly, returns E2B_WORKDIR."""
sandbox = _make_sandbox(stdout=f"{E2B_WORKDIR}\n", exit_code=0)
result = await _check_sandbox_symlink_escape(sandbox, E2B_WORKDIR)
assert result == E2B_WORKDIR
@pytest.mark.asyncio
async def test_symlink_escape_returns_none(self):
"""When readlink -f resolves outside E2B_WORKDIR (symlink escape), returns None."""
sandbox = _make_sandbox(stdout="/etc\n", exit_code=0)
result = await _check_sandbox_symlink_escape(sandbox, f"{E2B_WORKDIR}/evil")
assert result is None
@pytest.mark.asyncio
async def test_nonzero_exit_code_returns_none(self):
"""A non-zero exit code from readlink -f returns None."""
sandbox = _make_sandbox(stdout="", exit_code=1)
result = await _check_sandbox_symlink_escape(sandbox, f"{E2B_WORKDIR}/src")
assert result is None
@pytest.mark.asyncio
async def test_empty_stdout_returns_none(self):
"""Empty stdout from readlink (e.g. path doesn't exist yet) returns None."""
sandbox = _make_sandbox(stdout="", exit_code=0)
result = await _check_sandbox_symlink_escape(sandbox, f"{E2B_WORKDIR}/src")
assert result is None
@pytest.mark.asyncio
async def test_prefix_collision_returns_none(self):
"""A path prefixed with E2B_WORKDIR but not within it is rejected."""
sandbox = _make_sandbox(stdout=f"{E2B_WORKDIR}-evil\n", exit_code=0)
result = await _check_sandbox_symlink_escape(sandbox, f"{E2B_WORKDIR}-evil")
assert result is None
@pytest.mark.asyncio
async def test_deeply_nested_path_within_workdir(self):
"""Deep nested paths inside E2B_WORKDIR are allowed."""
sandbox = _make_sandbox(stdout=f"{E2B_WORKDIR}/a/b/c/d\n", exit_code=0)
result = await _check_sandbox_symlink_escape(sandbox, f"{E2B_WORKDIR}/a/b/c/d")
assert result == f"{E2B_WORKDIR}/a/b/c/d"

View File

@@ -1,651 +0,0 @@
"""Tests for retry logic and transcript compaction helpers."""
from __future__ import annotations
import asyncio
from unittest.mock import AsyncMock, patch
from uuid import uuid4
import pytest
from backend.util import json
from backend.util.prompt import CompressResult
from .conftest import build_test_transcript as _build_transcript
from .service import _friendly_error_text, _is_prompt_too_long
from .transcript import (
_flatten_assistant_content,
_flatten_tool_result_content,
_messages_to_transcript,
_run_compression,
_transcript_to_messages,
compact_transcript,
validate_transcript,
)
# ---------------------------------------------------------------------------
# _flatten_assistant_content
# ---------------------------------------------------------------------------
class TestFlattenAssistantContent:
def test_text_blocks(self):
blocks = [
{"type": "text", "text": "Hello"},
{"type": "text", "text": "World"},
]
assert _flatten_assistant_content(blocks) == "Hello\nWorld"
def test_tool_use_blocks(self):
blocks = [{"type": "tool_use", "name": "read_file", "input": {}}]
assert _flatten_assistant_content(blocks) == "[tool_use: read_file]"
def test_mixed_blocks(self):
blocks = [
{"type": "text", "text": "Let me read that."},
{"type": "tool_use", "name": "Read", "input": {"path": "/foo"}},
]
result = _flatten_assistant_content(blocks)
assert "Let me read that." in result
assert "[tool_use: Read]" in result
def test_raw_strings(self):
assert _flatten_assistant_content(["hello", "world"]) == "hello\nworld"
def test_unknown_block_type_preserved_as_placeholder(self):
blocks = [
{"type": "text", "text": "See this image:"},
{"type": "image", "source": {"type": "base64", "data": "..."}},
]
result = _flatten_assistant_content(blocks)
assert "See this image:" in result
assert "[__image__]" in result
def test_empty(self):
assert _flatten_assistant_content([]) == ""
# ---------------------------------------------------------------------------
# _flatten_tool_result_content
# ---------------------------------------------------------------------------
class TestFlattenToolResultContent:
def test_tool_result_with_text(self):
blocks = [
{
"type": "tool_result",
"tool_use_id": "123",
"content": [{"type": "text", "text": "file contents here"}],
}
]
assert _flatten_tool_result_content(blocks) == "file contents here"
def test_tool_result_with_string_content(self):
blocks = [{"type": "tool_result", "tool_use_id": "123", "content": "ok"}]
assert _flatten_tool_result_content(blocks) == "ok"
def test_text_block(self):
blocks = [{"type": "text", "text": "plain text"}]
assert _flatten_tool_result_content(blocks) == "plain text"
def test_raw_string(self):
assert _flatten_tool_result_content(["raw"]) == "raw"
def test_tool_result_with_none_content(self):
"""tool_result with content=None should produce empty string."""
blocks = [{"type": "tool_result", "tool_use_id": "x", "content": None}]
assert _flatten_tool_result_content(blocks) == ""
def test_tool_result_with_empty_list_content(self):
"""tool_result with content=[] should produce empty string."""
blocks = [{"type": "tool_result", "tool_use_id": "x", "content": []}]
assert _flatten_tool_result_content(blocks) == ""
def test_empty(self):
assert _flatten_tool_result_content([]) == ""
def test_nested_dict_without_text(self):
"""Dict blocks without text key use json.dumps fallback."""
blocks = [
{
"type": "tool_result",
"tool_use_id": "x",
"content": [{"type": "image", "source": "data:..."}],
}
]
result = _flatten_tool_result_content(blocks)
assert "image" in result # json.dumps fallback
def test_unknown_block_type_preserved_as_placeholder(self):
blocks = [{"type": "image", "source": {"type": "base64", "data": "..."}}]
result = _flatten_tool_result_content(blocks)
assert "[__image__]" in result
# ---------------------------------------------------------------------------
# _transcript_to_messages
# ---------------------------------------------------------------------------
def _make_entry(entry_type: str, role: str, content: str | list, **kwargs) -> str:
"""Build a JSONL line for testing."""
uid = str(uuid4())
msg: dict = {"role": role, "content": content}
msg.update(kwargs)
entry = {
"type": entry_type,
"uuid": uid,
"parentUuid": None,
"message": msg,
}
return json.dumps(entry, separators=(",", ":"))
class TestTranscriptToMessages:
def test_basic_roundtrip(self):
lines = [
_make_entry("user", "user", "Hello"),
_make_entry("assistant", "assistant", [{"type": "text", "text": "Hi"}]),
]
content = "\n".join(lines) + "\n"
messages = _transcript_to_messages(content)
assert len(messages) == 2
assert messages[0] == {"role": "user", "content": "Hello"}
assert messages[1] == {"role": "assistant", "content": "Hi"}
def test_skips_strippable_types(self):
"""Progress and metadata entries are excluded."""
lines = [
_make_entry("user", "user", "Hello"),
json.dumps(
{
"type": "progress",
"uuid": str(uuid4()),
"parentUuid": None,
"message": {"role": "assistant", "content": "..."},
}
),
_make_entry("assistant", "assistant", [{"type": "text", "text": "Hi"}]),
]
content = "\n".join(lines) + "\n"
messages = _transcript_to_messages(content)
assert len(messages) == 2
def test_empty_content(self):
assert _transcript_to_messages("") == []
def test_tool_result_content(self):
"""User entries with tool_result content blocks are flattened."""
lines = [
_make_entry(
"user",
"user",
[
{
"type": "tool_result",
"tool_use_id": "123",
"content": "tool output",
}
],
),
]
content = "\n".join(lines) + "\n"
messages = _transcript_to_messages(content)
assert len(messages) == 1
assert messages[0]["content"] == "tool output"
def test_malformed_json_lines_skipped(self):
"""Malformed JSON lines in transcript are silently skipped."""
lines = [
_make_entry("user", "user", "Hello"),
"this is not valid json",
_make_entry("assistant", "assistant", [{"type": "text", "text": "Hi"}]),
]
content = "\n".join(lines) + "\n"
messages = _transcript_to_messages(content)
assert len(messages) == 2
def test_empty_lines_skipped(self):
"""Empty lines and whitespace-only lines are skipped."""
lines = [
_make_entry("user", "user", "Hello"),
"",
" ",
_make_entry("assistant", "assistant", [{"type": "text", "text": "Hi"}]),
]
content = "\n".join(lines) + "\n"
messages = _transcript_to_messages(content)
assert len(messages) == 2
def test_unicode_content_preserved(self):
"""Unicode characters survive transcript roundtrip."""
lines = [
_make_entry("user", "user", "Hello 你好 🌍"),
_make_entry(
"assistant",
"assistant",
[{"type": "text", "text": "Bonjour 日本語 émojis 🎉"}],
),
]
content = "\n".join(lines) + "\n"
messages = _transcript_to_messages(content)
assert messages[0]["content"] == "Hello 你好 🌍"
assert messages[1]["content"] == "Bonjour 日本語 émojis 🎉"
def test_entry_without_role_skipped(self):
"""Entries with missing role in message are skipped."""
entry_no_role = json.dumps(
{
"type": "user",
"uuid": str(uuid4()),
"parentUuid": None,
"message": {"content": "no role here"},
}
)
lines = [
entry_no_role,
_make_entry("user", "user", "Hello"),
]
content = "\n".join(lines) + "\n"
messages = _transcript_to_messages(content)
assert len(messages) == 1
assert messages[0]["content"] == "Hello"
def test_tool_use_and_result_pairs(self):
"""Tool use + tool result pairs are properly flattened."""
lines = [
_make_entry(
"assistant",
"assistant",
[
{"type": "text", "text": "Let me check."},
{"type": "tool_use", "name": "read_file", "input": {"path": "/x"}},
],
),
_make_entry(
"user",
"user",
[
{
"type": "tool_result",
"tool_use_id": "abc",
"content": [{"type": "text", "text": "file contents"}],
}
],
),
]
content = "\n".join(lines) + "\n"
messages = _transcript_to_messages(content)
assert len(messages) == 2
assert "Let me check." in messages[0]["content"]
assert "[tool_use: read_file]" in messages[0]["content"]
assert messages[1]["content"] == "file contents"
# ---------------------------------------------------------------------------
# _messages_to_transcript
# ---------------------------------------------------------------------------
class TestMessagesToTranscript:
def test_produces_valid_jsonl(self):
messages = [
{"role": "user", "content": "Hello"},
{"role": "assistant", "content": "Hi there"},
]
result = _messages_to_transcript(messages)
lines = result.strip().split("\n")
assert len(lines) == 2
for line in lines:
parsed = json.loads(line)
assert "type" in parsed
assert "uuid" in parsed
assert "message" in parsed
def test_assistant_has_proper_structure(self):
messages = [{"role": "assistant", "content": "Hello"}]
result = _messages_to_transcript(messages)
entry = json.loads(result.strip())
assert entry["type"] == "assistant"
msg = entry["message"]
assert msg["role"] == "assistant"
assert msg["type"] == "message"
assert msg["stop_reason"] == "end_turn"
assert isinstance(msg["content"], list)
assert msg["content"][0]["type"] == "text"
def test_user_has_plain_content(self):
messages = [{"role": "user", "content": "Hi"}]
result = _messages_to_transcript(messages)
entry = json.loads(result.strip())
assert entry["type"] == "user"
assert entry["message"]["content"] == "Hi"
def test_parent_uuid_chain(self):
messages = [
{"role": "user", "content": "A"},
{"role": "assistant", "content": "B"},
{"role": "user", "content": "C"},
]
result = _messages_to_transcript(messages)
lines = result.strip().split("\n")
entries = [json.loads(line) for line in lines]
assert entries[0]["parentUuid"] == ""
assert entries[1]["parentUuid"] == entries[0]["uuid"]
assert entries[2]["parentUuid"] == entries[1]["uuid"]
def test_empty_messages(self):
assert _messages_to_transcript([]) == ""
def test_output_is_valid_transcript(self):
"""Output should pass validate_transcript if it has assistant entries."""
messages = [
{"role": "user", "content": "Hello"},
{"role": "assistant", "content": "Hi"},
]
result = _messages_to_transcript(messages)
assert validate_transcript(result)
def test_roundtrip_to_messages(self):
"""Messages → transcript → messages preserves structure."""
original = [
{"role": "user", "content": "Hello"},
{"role": "assistant", "content": "Hi there"},
{"role": "user", "content": "How are you?"},
]
transcript = _messages_to_transcript(original)
restored = _transcript_to_messages(transcript)
assert len(restored) == len(original)
for orig, rest in zip(original, restored):
assert orig["role"] == rest["role"]
assert orig["content"] == rest["content"]
# ---------------------------------------------------------------------------
# compact_transcript
# ---------------------------------------------------------------------------
class TestCompactTranscript:
@pytest.mark.asyncio
async def test_too_few_messages_returns_none(self, mock_chat_config):
"""compact_transcript returns None when transcript has < 2 messages."""
transcript = _build_transcript([("user", "Hello")])
result = await compact_transcript(transcript, model="test-model")
assert result is None
@pytest.mark.asyncio
async def test_returns_none_when_not_compacted(self, mock_chat_config):
"""When compress_context says no compaction needed, returns None.
The compressor couldn't reduce it, so retrying with the same
content would fail identically."""
transcript = _build_transcript(
[
("user", "Hello"),
("assistant", "Hi there"),
]
)
mock_result = type(
"CompressResult",
(),
{
"was_compacted": False,
"messages": [],
"original_token_count": 100,
"token_count": 100,
"messages_summarized": 0,
"messages_dropped": 0,
},
)()
with patch(
"backend.copilot.sdk.transcript._run_compression",
new_callable=AsyncMock,
return_value=mock_result,
):
result = await compact_transcript(transcript, model="test-model")
assert result is None
@pytest.mark.asyncio
async def test_returns_compacted_transcript(self, mock_chat_config):
"""When compaction succeeds, returns a valid compacted transcript."""
transcript = _build_transcript(
[
("user", "Hello"),
("assistant", "Hi"),
("user", "More"),
("assistant", "Details"),
]
)
compacted_msgs = [
{"role": "user", "content": "[summary]"},
{"role": "assistant", "content": "Summarized response"},
]
mock_result = type(
"CompressResult",
(),
{
"was_compacted": True,
"messages": compacted_msgs,
"original_token_count": 500,
"token_count": 100,
"messages_summarized": 2,
"messages_dropped": 0,
},
)()
with patch(
"backend.copilot.sdk.transcript._run_compression",
new_callable=AsyncMock,
return_value=mock_result,
):
result = await compact_transcript(transcript, model="test-model")
assert result is not None
assert validate_transcript(result)
msgs = _transcript_to_messages(result)
assert len(msgs) == 2
assert msgs[1]["content"] == "Summarized response"
@pytest.mark.asyncio
async def test_returns_none_on_compression_failure(self, mock_chat_config):
"""When _run_compression raises, returns None."""
transcript = _build_transcript(
[
("user", "Hello"),
("assistant", "Hi"),
]
)
with patch(
"backend.copilot.sdk.transcript._run_compression",
new_callable=AsyncMock,
side_effect=RuntimeError("LLM unavailable"),
):
result = await compact_transcript(transcript, model="test-model")
assert result is None
# ---------------------------------------------------------------------------
# _is_prompt_too_long
# ---------------------------------------------------------------------------
class TestIsPromptTooLong:
"""Unit tests for _is_prompt_too_long pattern matching."""
def test_prompt_is_too_long(self):
err = RuntimeError("prompt is too long for model context")
assert _is_prompt_too_long(err) is True
def test_request_too_large(self):
err = Exception("request too large: 250000 tokens")
assert _is_prompt_too_long(err) is True
def test_maximum_context_length(self):
err = ValueError("maximum context length exceeded")
assert _is_prompt_too_long(err) is True
def test_context_length_exceeded(self):
err = Exception("context_length_exceeded")
assert _is_prompt_too_long(err) is True
def test_input_tokens_exceed(self):
err = Exception("input tokens exceed the max_tokens limit")
assert _is_prompt_too_long(err) is True
def test_input_is_too_long(self):
err = Exception("input is too long for the model")
assert _is_prompt_too_long(err) is True
def test_content_length_exceeds(self):
err = Exception("content length exceeds maximum")
assert _is_prompt_too_long(err) is True
def test_unrelated_error_returns_false(self):
err = RuntimeError("network timeout")
assert _is_prompt_too_long(err) is False
def test_auth_error_returns_false(self):
err = Exception("authentication failed: invalid API key")
assert _is_prompt_too_long(err) is False
def test_chained_exception_detected(self):
"""Prompt-too-long error wrapped in another exception is detected."""
inner = RuntimeError("prompt is too long")
outer = Exception("SDK error")
outer.__cause__ = inner
assert _is_prompt_too_long(outer) is True
def test_case_insensitive(self):
err = Exception("PROMPT IS TOO LONG")
assert _is_prompt_too_long(err) is True
def test_old_max_tokens_exceeded_not_matched(self):
"""The old broad 'max_tokens_exceeded' pattern was removed.
Only 'input tokens exceed' should match now."""
err = Exception("max_tokens_exceeded")
assert _is_prompt_too_long(err) is False
# ---------------------------------------------------------------------------
# _run_compression timeout fallback
# ---------------------------------------------------------------------------
class TestRunCompressionTimeout:
"""Verify _run_compression falls back to truncation when LLM times out."""
@pytest.mark.asyncio
async def test_timeout_falls_back_to_truncation(self):
"""When compress_context with LLM client times out,
_run_compression falls back to truncation (client=None)."""
messages = [
{"role": "user", "content": "Hello"},
{"role": "assistant", "content": "Hi there"},
]
truncation_result = CompressResult(
messages=messages,
was_compacted=False,
original_token_count=50,
token_count=50,
messages_summarized=0,
messages_dropped=0,
)
call_args: list[dict] = []
async def _mock_compress(**kwargs):
call_args.append(kwargs)
if kwargs.get("client") is not None:
# Simulate timeout by raising asyncio.TimeoutError
raise asyncio.TimeoutError("LLM compaction timed out")
return truncation_result
with (
patch(
"backend.copilot.sdk.transcript.get_openai_client",
return_value="fake-client",
),
patch(
"backend.copilot.sdk.transcript.compress_context",
side_effect=_mock_compress,
),
):
result = await _run_compression(messages, "test-model", "[test]")
assert result == truncation_result
# Should have been called twice: once with client, once without
assert len(call_args) == 2
assert call_args[0]["client"] is not None # LLM attempt
assert call_args[1]["client"] is None # truncation fallback
@pytest.mark.asyncio
async def test_no_client_uses_truncation_directly(self):
"""When no OpenAI client is configured, goes straight to truncation."""
messages = [
{"role": "user", "content": "Hello"},
{"role": "assistant", "content": "Hi there"},
]
truncation_result = CompressResult(
messages=messages,
was_compacted=False,
original_token_count=50,
token_count=50,
messages_summarized=0,
messages_dropped=0,
)
with (
patch(
"backend.copilot.sdk.transcript.get_openai_client",
return_value=None,
),
patch(
"backend.copilot.sdk.transcript.compress_context",
new_callable=AsyncMock,
return_value=truncation_result,
) as mock_compress,
):
result = await _run_compression(messages, "test-model", "[test]")
assert result == truncation_result
mock_compress.assert_called_once()
# When no client, compress_context is called with client=None
assert mock_compress.call_args.kwargs.get("client") is None
# ---------------------------------------------------------------------------
# _friendly_error_text
# ---------------------------------------------------------------------------
class TestFriendlyErrorText:
"""Verify user-friendly error message mapping."""
def test_authentication_error(self):
result = _friendly_error_text("authentication failed: invalid API key")
assert "Authentication" in result
assert "API key" in result
def test_rate_limit_error(self):
result = _friendly_error_text("rate limit exceeded")
assert "Rate limit" in result
def test_overloaded_error(self):
result = _friendly_error_text("API is overloaded")
assert "overloaded" in result
def test_timeout_error(self):
result = _friendly_error_text("Request timeout after 30s")
assert "timed out" in result
def test_connection_error(self):
result = _friendly_error_text("Connection refused")
assert "Connection" in result or "connection" in result
def test_unknown_error_passthrough(self):
result = _friendly_error_text("some unknown error XYZ")
assert "SDK stream error:" in result
assert "XYZ" in result
def test_unauthorized_error(self):
result = _friendly_error_text("401 Unauthorized")
assert "Authentication" in result

View File

@@ -20,7 +20,6 @@ from claude_agent_sdk import (
UserMessage,
)
from backend.copilot.constants import FRIENDLY_TRANSIENT_MSG, is_transient_api_error
from backend.copilot.response_model import (
StreamBaseResponse,
StreamError,
@@ -215,12 +214,10 @@ class SDKResponseAdapter:
if sdk_message.subtype == "success":
responses.append(StreamFinish())
elif sdk_message.subtype in ("error", "error_during_execution"):
raw_error = str(sdk_message.result or "Unknown error")
if is_transient_api_error(raw_error):
error_text, code = FRIENDLY_TRANSIENT_MSG, "transient_api_error"
else:
error_text, code = raw_error, "sdk_error"
responses.append(StreamError(errorText=error_text, code=code))
error_msg = sdk_message.result or "Unknown error"
responses.append(
StreamError(errorText=str(error_msg), code="sdk_error")
)
responses.append(StreamFinish())
else:
logger.warning(

View File

@@ -42,7 +42,7 @@ def _validate_workspace_path(
Delegates to :func:`is_allowed_local_path` which permits:
- The SDK working directory (``/tmp/copilot-<session>/``)
- The current session's tool-results directory
(``~/.claude/projects/<encoded-cwd>/<uuid>/tool-results/``)
(``~/.claude/projects/<encoded-cwd>/tool-results/``)
"""
path = tool_input.get("file_path") or tool_input.get("path") or ""
if not path:
@@ -302,11 +302,7 @@ def create_security_hooks(
"""
_ = context, tool_use_id
trigger = input_data.get("trigger", "auto")
# Sanitize untrusted input: strip control chars for logging AND
# for the value passed downstream. read_compacted_entries()
# validates against _projects_base() as defence-in-depth, but
# sanitizing here prevents log injection and rejects obviously
# malformed paths early.
# Sanitize untrusted input before logging to prevent log injection
transcript_path = (
str(input_data.get("transcript_path", ""))
.replace("\n", "")

View File

@@ -122,7 +122,7 @@ def test_read_no_cwd_denies_absolute():
def test_read_tool_results_allowed():
home = os.path.expanduser("~")
path = f"{home}/.claude/projects/-tmp-copilot-abc123/a1b2c3d4-e5f6-7890-abcd-ef1234567890/tool-results/12345.txt"
path = f"{home}/.claude/projects/-tmp-copilot-abc123/tool-results/12345.txt"
# is_allowed_local_path requires the session's encoded cwd to be set
token = _current_project_dir.set("-tmp-copilot-abc123")
try:

File diff suppressed because it is too large Load Diff

View File

@@ -1,283 +0,0 @@
"""Unit tests for extracted service helpers.
Covers ``_is_prompt_too_long``, ``_reduce_context``, ``_iter_sdk_messages``,
and the ``ReducedContext`` named tuple.
"""
from __future__ import annotations
import asyncio
from collections.abc import AsyncGenerator
from unittest.mock import AsyncMock, patch
import pytest
from .conftest import build_test_transcript as _build_transcript
from .service import (
ReducedContext,
_is_prompt_too_long,
_iter_sdk_messages,
_reduce_context,
)
# ---------------------------------------------------------------------------
# _is_prompt_too_long
# ---------------------------------------------------------------------------
class TestIsPromptTooLong:
def test_direct_match(self) -> None:
assert _is_prompt_too_long(Exception("prompt is too long")) is True
def test_case_insensitive(self) -> None:
assert _is_prompt_too_long(Exception("PROMPT IS TOO LONG")) is True
def test_no_match(self) -> None:
assert _is_prompt_too_long(Exception("network timeout")) is False
def test_request_too_large(self) -> None:
assert _is_prompt_too_long(Exception("request too large for model")) is True
def test_context_length_exceeded(self) -> None:
assert _is_prompt_too_long(Exception("context_length_exceeded")) is True
def test_max_tokens_exceeded_not_matched(self) -> None:
"""'max_tokens_exceeded' is intentionally excluded (too broad)."""
assert _is_prompt_too_long(Exception("max_tokens_exceeded")) is False
def test_max_tokens_config_error_no_match(self) -> None:
"""'max_tokens must be at least 1' should NOT match."""
assert _is_prompt_too_long(Exception("max_tokens must be at least 1")) is False
def test_chained_cause(self) -> None:
inner = Exception("prompt is too long")
outer = RuntimeError("SDK error")
outer.__cause__ = inner
assert _is_prompt_too_long(outer) is True
def test_chained_context(self) -> None:
inner = Exception("request too large")
outer = RuntimeError("wrapped")
outer.__context__ = inner
assert _is_prompt_too_long(outer) is True
def test_deep_chain(self) -> None:
bottom = Exception("maximum context length")
middle = RuntimeError("middle")
middle.__cause__ = bottom
top = ValueError("top")
top.__cause__ = middle
assert _is_prompt_too_long(top) is True
def test_chain_no_match(self) -> None:
inner = Exception("rate limit exceeded")
outer = RuntimeError("wrapped")
outer.__cause__ = inner
assert _is_prompt_too_long(outer) is False
def test_cycle_detection(self) -> None:
"""Exception chain with a cycle should not infinite-loop."""
a = Exception("error a")
b = Exception("error b")
a.__cause__ = b
b.__cause__ = a # cycle
assert _is_prompt_too_long(a) is False
def test_all_patterns(self) -> None:
patterns = [
"prompt is too long",
"request too large",
"maximum context length",
"context_length_exceeded",
"input tokens exceed",
"input is too long",
"content length exceeds",
]
for pattern in patterns:
assert _is_prompt_too_long(Exception(pattern)) is True, pattern
# ---------------------------------------------------------------------------
# _reduce_context
# ---------------------------------------------------------------------------
class TestReduceContext:
@pytest.mark.asyncio
async def test_first_retry_compaction_success(self) -> None:
transcript = _build_transcript([("user", "hi"), ("assistant", "hello")])
compacted = _build_transcript([("user", "hi"), ("assistant", "[summary]")])
with (
patch(
"backend.copilot.sdk.service.compact_transcript",
new_callable=AsyncMock,
return_value=compacted,
),
patch(
"backend.copilot.sdk.service.validate_transcript",
return_value=True,
),
patch(
"backend.copilot.sdk.service.write_transcript_to_tempfile",
return_value="/tmp/resume.jsonl",
),
):
ctx = await _reduce_context(
transcript, False, "sess-123", "/tmp/cwd", "[test]"
)
assert isinstance(ctx, ReducedContext)
assert ctx.use_resume is True
assert ctx.resume_file == "/tmp/resume.jsonl"
assert ctx.transcript_lost is False
assert ctx.tried_compaction is True
@pytest.mark.asyncio
async def test_compaction_fails_drops_transcript(self) -> None:
transcript = _build_transcript([("user", "hi"), ("assistant", "hello")])
with patch(
"backend.copilot.sdk.service.compact_transcript",
new_callable=AsyncMock,
return_value=None,
):
ctx = await _reduce_context(
transcript, False, "sess-123", "/tmp/cwd", "[test]"
)
assert ctx.use_resume is False
assert ctx.resume_file is None
assert ctx.transcript_lost is True
assert ctx.tried_compaction is True
@pytest.mark.asyncio
async def test_already_tried_compaction_skips(self) -> None:
transcript = _build_transcript([("user", "hi"), ("assistant", "hello")])
ctx = await _reduce_context(transcript, True, "sess-123", "/tmp/cwd", "[test]")
assert ctx.use_resume is False
assert ctx.transcript_lost is True
assert ctx.tried_compaction is True
@pytest.mark.asyncio
async def test_empty_transcript_drops(self) -> None:
ctx = await _reduce_context("", False, "sess-123", "/tmp/cwd", "[test]")
assert ctx.use_resume is False
assert ctx.transcript_lost is True
@pytest.mark.asyncio
async def test_compaction_returns_same_content_drops(self) -> None:
transcript = _build_transcript([("user", "hi"), ("assistant", "hello")])
with patch(
"backend.copilot.sdk.service.compact_transcript",
new_callable=AsyncMock,
return_value=transcript, # same content
):
ctx = await _reduce_context(
transcript, False, "sess-123", "/tmp/cwd", "[test]"
)
assert ctx.transcript_lost is True
@pytest.mark.asyncio
async def test_write_tempfile_fails_drops(self) -> None:
transcript = _build_transcript([("user", "hi"), ("assistant", "hello")])
compacted = _build_transcript([("user", "hi"), ("assistant", "[summary]")])
with (
patch(
"backend.copilot.sdk.service.compact_transcript",
new_callable=AsyncMock,
return_value=compacted,
),
patch(
"backend.copilot.sdk.service.validate_transcript",
return_value=True,
),
patch(
"backend.copilot.sdk.service.write_transcript_to_tempfile",
return_value=None,
),
):
ctx = await _reduce_context(
transcript, False, "sess-123", "/tmp/cwd", "[test]"
)
assert ctx.transcript_lost is True
# ---------------------------------------------------------------------------
# _iter_sdk_messages
# ---------------------------------------------------------------------------
class TestIterSdkMessages:
@pytest.mark.asyncio
async def test_yields_messages(self) -> None:
messages = ["msg1", "msg2", "msg3"]
client = AsyncMock()
async def _fake_receive() -> AsyncGenerator[str]:
for m in messages:
yield m
client.receive_response = _fake_receive
result = [msg async for msg in _iter_sdk_messages(client)]
assert result == messages
@pytest.mark.asyncio
async def test_heartbeat_on_timeout(self) -> None:
"""Yields None when asyncio.wait times out."""
client = AsyncMock()
received: list = []
async def _slow_receive() -> AsyncGenerator[str]:
await asyncio.sleep(100) # never completes
yield "never" # pragma: no cover — unreachable, yield makes this an async generator
client.receive_response = _slow_receive
with patch("backend.copilot.sdk.service._HEARTBEAT_INTERVAL", 0.01):
count = 0
async for msg in _iter_sdk_messages(client):
received.append(msg)
count += 1
if count >= 3:
break
assert all(m is None for m in received)
@pytest.mark.asyncio
async def test_exception_propagates(self) -> None:
client = AsyncMock()
async def _error_receive() -> AsyncGenerator[str]:
raise RuntimeError("SDK crash")
yield # pragma: no cover — unreachable, yield makes this an async generator
client.receive_response = _error_receive
with pytest.raises(RuntimeError, match="SDK crash"):
async for _ in _iter_sdk_messages(client):
pass
@pytest.mark.asyncio
async def test_task_cleanup_on_break(self) -> None:
"""Pending task is cancelled when generator is closed."""
client = AsyncMock()
async def _slow_receive() -> AsyncGenerator[str]:
yield "first"
await asyncio.sleep(100)
yield "second"
client.receive_response = _slow_receive
gen = _iter_sdk_messages(client)
first = await gen.__anext__()
assert first == "first"
await gen.aclose() # should cancel pending task cleanly

View File

@@ -8,7 +8,7 @@ from unittest.mock import AsyncMock, MagicMock, patch
import pytest
from .service import _prepare_file_attachments, _resolve_sdk_model
from .service import _prepare_file_attachments
@dataclass
@@ -288,214 +288,3 @@ class TestPromptSupplement:
# Count how many times this tool appears as a bullet point
count = docs.count(f"- **`{tool_name}`**")
assert count == 1, f"Tool '{tool_name}' appears {count} times (should be 1)"
# ---------------------------------------------------------------------------
# _cleanup_sdk_tool_results — orchestration + rate-limiting
# ---------------------------------------------------------------------------
class TestCleanupSdkToolResults:
"""Tests for _cleanup_sdk_tool_results orchestration and sweep rate-limiting."""
# All valid cwds must start with /tmp/copilot- (the _SDK_CWD_PREFIX).
_CWD_PREFIX = "/tmp/copilot-"
@pytest.mark.asyncio
async def test_removes_cwd_directory(self):
"""Cleanup removes the session working directory."""
from .service import _cleanup_sdk_tool_results
cwd = "/tmp/copilot-test-cleanup-remove"
os.makedirs(cwd, exist_ok=True)
with patch("backend.copilot.sdk.service.cleanup_stale_project_dirs"):
import backend.copilot.sdk.service as svc_mod
svc_mod._last_sweep_time = 0.0
await _cleanup_sdk_tool_results(cwd)
assert not os.path.exists(cwd)
@pytest.mark.asyncio
async def test_sweep_runs_when_interval_elapsed(self):
"""cleanup_stale_project_dirs is called when 5-minute interval has elapsed."""
import backend.copilot.sdk.service as svc_mod
from .service import _cleanup_sdk_tool_results
cwd = "/tmp/copilot-test-sweep-elapsed"
os.makedirs(cwd, exist_ok=True)
with patch(
"backend.copilot.sdk.service.cleanup_stale_project_dirs"
) as mock_sweep:
# Set last sweep to a time far in the past
svc_mod._last_sweep_time = 0.0
await _cleanup_sdk_tool_results(cwd)
mock_sweep.assert_called_once()
@pytest.mark.asyncio
async def test_sweep_skipped_within_interval(self):
"""cleanup_stale_project_dirs is NOT called when within 5-minute interval."""
import time
import backend.copilot.sdk.service as svc_mod
from .service import _cleanup_sdk_tool_results
cwd = "/tmp/copilot-test-sweep-ratelimit"
os.makedirs(cwd, exist_ok=True)
with patch(
"backend.copilot.sdk.service.cleanup_stale_project_dirs"
) as mock_sweep:
# Set last sweep to now — interval not elapsed
svc_mod._last_sweep_time = time.time()
await _cleanup_sdk_tool_results(cwd)
mock_sweep.assert_not_called()
@pytest.mark.asyncio
async def test_rejects_path_outside_prefix(self, tmp_path):
"""Cleanup rejects a cwd that does not start with the expected prefix."""
from .service import _cleanup_sdk_tool_results
evil_cwd = str(tmp_path / "evil-path")
os.makedirs(evil_cwd, exist_ok=True)
with patch(
"backend.copilot.sdk.service.cleanup_stale_project_dirs"
) as mock_sweep:
await _cleanup_sdk_tool_results(evil_cwd)
# Directory should NOT have been removed (rejected early)
assert os.path.exists(evil_cwd)
mock_sweep.assert_not_called()
# ---------------------------------------------------------------------------
# Env vars that ChatConfig validators read — must be cleared so explicit
# constructor values are used.
# ---------------------------------------------------------------------------
_CONFIG_ENV_VARS = (
"CHAT_USE_OPENROUTER",
"CHAT_API_KEY",
"OPEN_ROUTER_API_KEY",
"OPENAI_API_KEY",
"CHAT_BASE_URL",
"OPENROUTER_BASE_URL",
"OPENAI_BASE_URL",
"CHAT_USE_CLAUDE_CODE_SUBSCRIPTION",
"CHAT_USE_CLAUDE_AGENT_SDK",
)
@pytest.fixture()
def _clean_config_env(monkeypatch: pytest.MonkeyPatch) -> None:
for var in _CONFIG_ENV_VARS:
monkeypatch.delenv(var, raising=False)
class TestResolveSdkModel:
"""Tests for _resolve_sdk_model — model ID resolution for the SDK CLI."""
def test_openrouter_active_keeps_dots(self, monkeypatch, _clean_config_env):
"""When OpenRouter is fully active, model keeps dot-separated version."""
from backend.copilot import config as cfg_mod
cfg = cfg_mod.ChatConfig(
model="anthropic/claude-opus-4.6",
claude_agent_model=None,
use_openrouter=True,
api_key="or-key",
base_url="https://openrouter.ai/api/v1",
use_claude_code_subscription=False,
)
monkeypatch.setattr("backend.copilot.sdk.service.config", cfg)
assert _resolve_sdk_model() == "claude-opus-4.6"
def test_openrouter_disabled_normalizes_to_hyphens(
self, monkeypatch, _clean_config_env
):
"""When OpenRouter is disabled, dots are replaced with hyphens."""
from backend.copilot import config as cfg_mod
cfg = cfg_mod.ChatConfig(
model="anthropic/claude-opus-4.6",
claude_agent_model=None,
use_openrouter=False,
api_key=None,
base_url=None,
use_claude_code_subscription=False,
)
monkeypatch.setattr("backend.copilot.sdk.service.config", cfg)
assert _resolve_sdk_model() == "claude-opus-4-6"
def test_openrouter_enabled_but_missing_key_normalizes(
self, monkeypatch, _clean_config_env
):
"""When OpenRouter is enabled but api_key is missing, falls back to
direct Anthropic and normalizes dots to hyphens."""
from backend.copilot import config as cfg_mod
cfg = cfg_mod.ChatConfig(
model="anthropic/claude-opus-4.6",
claude_agent_model=None,
use_openrouter=True,
api_key=None,
base_url="https://openrouter.ai/api/v1",
use_claude_code_subscription=False,
)
monkeypatch.setattr("backend.copilot.sdk.service.config", cfg)
assert _resolve_sdk_model() == "claude-opus-4-6"
def test_explicit_claude_agent_model_takes_precedence(
self, monkeypatch, _clean_config_env
):
"""When claude_agent_model is explicitly set, it is returned as-is."""
from backend.copilot import config as cfg_mod
cfg = cfg_mod.ChatConfig(
model="anthropic/claude-opus-4.6",
claude_agent_model="claude-sonnet-4-5-20250514",
use_openrouter=True,
api_key="or-key",
base_url="https://openrouter.ai/api/v1",
use_claude_code_subscription=False,
)
monkeypatch.setattr("backend.copilot.sdk.service.config", cfg)
assert _resolve_sdk_model() == "claude-sonnet-4-5-20250514"
def test_subscription_mode_returns_none(self, monkeypatch, _clean_config_env):
"""When using Claude Code subscription, returns None (CLI picks model)."""
from backend.copilot import config as cfg_mod
cfg = cfg_mod.ChatConfig(
model="anthropic/claude-opus-4.6",
claude_agent_model=None,
use_openrouter=False,
api_key=None,
base_url=None,
use_claude_code_subscription=True,
)
monkeypatch.setattr("backend.copilot.sdk.service.config", cfg)
assert _resolve_sdk_model() is None
def test_model_without_provider_prefix(self, monkeypatch, _clean_config_env):
"""When model has no provider prefix, it still normalizes correctly."""
from backend.copilot import config as cfg_mod
cfg = cfg_mod.ChatConfig(
model="claude-opus-4.6",
claude_agent_model=None,
use_openrouter=False,
api_key=None,
base_url=None,
use_claude_code_subscription=False,
)
monkeypatch.setattr("backend.copilot.sdk.service.config", cfg)
assert _resolve_sdk_model() == "claude-opus-4-6"

View File

@@ -1,144 +0,0 @@
"""Claude Code subscription auth helpers.
Handles locating the SDK-bundled CLI binary, provisioning credentials from
environment variables, and validating that subscription auth is functional.
"""
import functools
import json
import logging
import os
import shutil
import subprocess
logger = logging.getLogger(__name__)
def find_bundled_cli() -> str:
"""Locate the Claude CLI binary bundled inside ``claude_agent_sdk``.
Falls back to ``shutil.which("claude")`` if the SDK bundle is absent.
"""
try:
from claude_agent_sdk._internal.transport.subprocess_cli import (
SubprocessCLITransport,
)
path = SubprocessCLITransport._find_bundled_cli(None) # type: ignore[arg-type]
if path:
return str(path)
except Exception:
pass
system_path = shutil.which("claude")
if system_path:
return system_path
raise RuntimeError(
"Claude CLI not found — neither the SDK-bundled binary nor a "
"system-installed `claude` could be located."
)
def provision_credentials_file() -> None:
"""Write ``~/.claude/.credentials.json`` from env when running headless.
If ``CLAUDE_CODE_OAUTH_TOKEN`` is set (an OAuth *access* token obtained
from ``claude auth status`` or extracted from the macOS keychain), this
helper writes a minimal credentials file so the bundled CLI can
authenticate without an interactive ``claude login``.
A ``CLAUDE_CODE_REFRESH_TOKEN`` env var is optional but recommended —
it lets the CLI silently refresh an expired access token.
"""
access_token = os.environ.get("CLAUDE_CODE_OAUTH_TOKEN", "").strip()
if not access_token:
return
creds_dir = os.path.expanduser("~/.claude")
creds_path = os.path.join(creds_dir, ".credentials.json")
# Don't overwrite an existing credentials file (e.g. from a volume mount).
if os.path.exists(creds_path):
logger.debug("Credentials file already exists at %s — skipping", creds_path)
return
os.makedirs(creds_dir, exist_ok=True)
creds = {
"claudeAiOauth": {
"accessToken": access_token,
"refreshToken": os.environ.get("CLAUDE_CODE_REFRESH_TOKEN", "").strip(),
"expiresAt": 0,
"scopes": [
"user:inference",
"user:profile",
"user:sessions:claude_code",
],
}
}
with open(creds_path, "w") as f:
json.dump(creds, f)
logger.info("Provisioned Claude credentials file at %s", creds_path)
@functools.cache
def validate_subscription() -> None:
"""Validate the bundled Claude CLI is reachable and authenticated.
Cached so the blocking subprocess check runs at most once per process
lifetime. On first call, also provisions ``~/.claude/.credentials.json``
from the ``CLAUDE_CODE_OAUTH_TOKEN`` env var when available.
"""
provision_credentials_file()
cli = find_bundled_cli()
result = subprocess.run(
[cli, "--version"],
capture_output=True,
text=True,
timeout=10,
)
if result.returncode != 0:
raise RuntimeError(
f"Claude CLI check failed (exit {result.returncode}): "
f"{result.stderr.strip()}"
)
logger.info(
"Claude Code subscription mode: CLI version %s",
result.stdout.strip(),
)
# Verify the CLI is actually authenticated.
auth_result = subprocess.run(
[cli, "auth", "status"],
capture_output=True,
text=True,
timeout=10,
env={
**os.environ,
"ANTHROPIC_API_KEY": "",
"ANTHROPIC_AUTH_TOKEN": "",
"ANTHROPIC_BASE_URL": "",
},
)
if auth_result.returncode != 0:
raise RuntimeError(
"Claude CLI is not authenticated. Either:\n"
" • Set CLAUDE_CODE_OAUTH_TOKEN env var (from `claude auth status` "
"or macOS keychain), or\n"
" • Mount ~/.claude/.credentials.json into the container, or\n"
" • Run `claude login` inside the container."
)
try:
status = json.loads(auth_result.stdout)
if not status.get("loggedIn"):
raise RuntimeError(
"Claude CLI reports loggedIn=false. Set CLAUDE_CODE_OAUTH_TOKEN "
"or run `claude login`."
)
logger.info(
"Claude subscription auth: method=%s, email=%s",
status.get("authMethod"),
status.get("email"),
)
except json.JSONDecodeError:
logger.warning("Could not parse `claude auth status` output")

View File

@@ -146,7 +146,7 @@ def stash_pending_tool_output(tool_name: str, output: Any) -> None:
event.set()
async def wait_for_stash(timeout: float = 2.0) -> bool:
async def wait_for_stash(timeout: float = 0.5) -> bool:
"""Wait for a PostToolUse hook to stash tool output.
The SDK fires PostToolUse hooks asynchronously via ``start_soon()`` —
@@ -155,12 +155,12 @@ async def wait_for_stash(timeout: float = 2.0) -> bool:
by waiting on the ``_stash_event``, which is signaled by
:func:`stash_pending_tool_output`.
Returns ``True`` if a stash signal was received, ``False`` on timeout.
After the event fires, callers should ``await asyncio.sleep(0)`` to
give any remaining concurrent hooks a chance to complete.
The 2.0 s default was chosen based on production metrics: the original
0.5 s caused frequent timeouts under load (parallel tool calls, large
outputs). 2.0 s gives a comfortable margin while still failing fast
when the hook genuinely will not fire.
Returns ``True`` if a stash signal was received, ``False`` on timeout.
The timeout is a safety net — normally the stash happens within
microseconds of yielding to the event loop.
"""
event = _stash_event.get(None)
if event is None:
@@ -285,7 +285,7 @@ async def _read_file_handler(args: dict[str, Any]) -> dict[str, Any]:
resolved = os.path.realpath(os.path.expanduser(file_path))
try:
with open(resolved, encoding="utf-8", errors="replace") as f:
with open(resolved) as f:
selected = list(itertools.islice(f, offset, offset + limit))
# Cleanup happens in _cleanup_sdk_tool_results after session ends;
# don't delete here — the SDK may read in multiple chunks.

View File

@@ -10,9 +10,6 @@ Storage is handled via ``WorkspaceStorageBackend`` (GCS in prod, local
filesystem for self-hosted) — no DB column needed.
"""
from __future__ import annotations
import asyncio
import logging
import os
import re
@@ -20,12 +17,8 @@ import shutil
import time
from dataclasses import dataclass
from pathlib import Path
from uuid import uuid4
from backend.util import json
from backend.util.clients import get_openai_client
from backend.util.prompt import CompressResult, compress_context
from backend.util.workspace_storage import GCSWorkspaceStorage, get_workspace_storage
logger = logging.getLogger(__name__)
@@ -106,14 +99,7 @@ def strip_progress_entries(content: str) -> str:
continue
parent = entry.get("parentUuid", "")
original_parent = parent
# seen_parents is local per-entry (not shared across iterations) so
# it can only detect cycles within a single ancestry walk, not across
# entries. This is intentional: each entry's parent chain is
# independent, and reusing a global set would incorrectly short-circuit
# valid re-use of the same UUID as a parent in different subtrees.
seen_parents: set[str] = set()
while parent in stripped_uuids and parent not in seen_parents:
seen_parents.add(parent)
while parent in stripped_uuids:
parent = uuid_to_parent.get(parent, "")
if parent != original_parent:
entry["parentUuid"] = parent
@@ -165,110 +151,44 @@ def _projects_base() -> str:
return os.path.realpath(os.path.join(config_dir, "projects"))
_STALE_PROJECT_DIR_SECONDS = 12 * 3600 # 12 hours — matches max session lifetime
_MAX_PROJECT_DIRS_TO_SWEEP = 50 # limit per sweep to avoid long pauses
def _cli_project_dir(sdk_cwd: str) -> str | None:
"""Return the CLI's project directory for a given working directory.
def cleanup_stale_project_dirs(encoded_cwd: str | None = None) -> int:
"""Remove CLI project directories older than ``_STALE_PROJECT_DIR_SECONDS``.
Each CoPilot SDK turn creates a unique ``~/.claude/projects/<encoded-cwd>/``
directory. These are intentionally kept across turns so the model can read
tool-result files via ``--resume``. However, after a session ends they
become stale. This function sweeps old ones to prevent unbounded disk
growth.
When *encoded_cwd* is provided the sweep is scoped to that single
directory, making the operation safe in multi-tenant environments where
multiple copilot sessions share the same host. Without it the function
falls back to sweeping all directories matching the copilot naming pattern
(``-tmp-copilot-``), which is only safe for single-tenant deployments.
Returns the number of directories removed.
Returns ``None`` if the path would escape the projects base.
"""
cwd_encoded = re.sub(r"[^a-zA-Z0-9]", "-", os.path.realpath(sdk_cwd))
projects_base = _projects_base()
if not os.path.isdir(projects_base):
return 0
project_dir = os.path.realpath(os.path.join(projects_base, cwd_encoded))
now = time.time()
removed = 0
# Scoped mode: only clean up the one directory for the current session.
if encoded_cwd:
target = Path(projects_base) / encoded_cwd
if not target.is_dir():
return 0
# Guard: only sweep copilot-generated dirs.
if "-tmp-copilot-" not in target.name:
logger.warning(
"[Transcript] Refusing to sweep non-copilot dir: %s", target.name
)
return 0
try:
# st_mtime is used as a proxy for session activity. Claude CLI writes
# its JSONL transcript into this directory during each turn, so mtime
# advances on every turn. A directory whose mtime is older than
# _STALE_PROJECT_DIR_SECONDS has not had an active turn in that window
# and is safe to remove (the session cannot --resume after cleanup).
age = now - target.stat().st_mtime
except OSError:
return 0
if age < _STALE_PROJECT_DIR_SECONDS:
return 0
try:
shutil.rmtree(target, ignore_errors=True)
removed = 1
except OSError:
pass
if removed:
logger.info(
"[Transcript] Swept stale CLI project dir %s (age %ds > %ds)",
target.name,
int(age),
_STALE_PROJECT_DIR_SECONDS,
)
return removed
# Unscoped fallback: sweep all copilot dirs across the projects base.
# Only safe for single-tenant deployments; callers should prefer the
# scoped variant by passing encoded_cwd.
try:
entries = Path(projects_base).iterdir()
except OSError as e:
logger.warning("[Transcript] Failed to list projects dir: %s", e)
return 0
for entry in entries:
if removed >= _MAX_PROJECT_DIRS_TO_SWEEP:
break
# Only sweep copilot-generated dirs (pattern: -tmp-copilot- or
# -private-tmp-copilot-).
if "-tmp-copilot-" not in entry.name:
continue
if not entry.is_dir():
continue
try:
# See the scoped-mode comment above: st_mtime advances on every turn,
# so a stale mtime reliably indicates an inactive session.
age = now - entry.stat().st_mtime
except OSError:
continue
if age < _STALE_PROJECT_DIR_SECONDS:
continue
try:
shutil.rmtree(entry, ignore_errors=True)
removed += 1
except OSError:
pass
if removed:
logger.info(
"[Transcript] Swept %d stale CLI project dirs (older than %ds)",
removed,
_STALE_PROJECT_DIR_SECONDS,
if not project_dir.startswith(projects_base + os.sep):
logger.warning(
"[Transcript] Project dir escaped projects base: %s", project_dir
)
return removed
return None
return project_dir
def _safe_glob_jsonl(project_dir: str) -> list[Path]:
"""Glob ``*.jsonl`` files, filtering out symlinks that escape the directory."""
try:
resolved_base = Path(project_dir).resolve()
except OSError as e:
logger.warning("[Transcript] Failed to resolve project dir: %s", e)
return []
result: list[Path] = []
for candidate in Path(project_dir).glob("*.jsonl"):
try:
resolved = candidate.resolve()
if resolved.is_relative_to(resolved_base):
result.append(resolved)
except (OSError, RuntimeError) as e:
logger.debug(
"[Transcript] Skipping invalid CLI session candidate %s: %s",
candidate,
e,
)
return result
def read_compacted_entries(transcript_path: str) -> list[dict] | None:
@@ -335,6 +255,63 @@ def read_compacted_entries(transcript_path: str) -> list[dict] | None:
return entries
def read_cli_session_file(sdk_cwd: str) -> str | None:
"""Read the CLI's own session file, which reflects any compaction.
The CLI writes its session transcript to
``~/.claude/projects/<encoded_cwd>/<session_id>.jsonl``.
Since each SDK turn uses a unique ``sdk_cwd``, there should be
exactly one ``.jsonl`` file in that directory.
Returns the file content, or ``None`` if not found.
"""
project_dir = _cli_project_dir(sdk_cwd)
if not project_dir or not os.path.isdir(project_dir):
return None
jsonl_files = _safe_glob_jsonl(project_dir)
if not jsonl_files:
logger.debug("[Transcript] No CLI session file found in %s", project_dir)
return None
# Pick the most recently modified file (should be only one per turn).
try:
session_file = max(jsonl_files, key=lambda p: p.stat().st_mtime)
except OSError as e:
logger.warning("[Transcript] Failed to inspect CLI session files: %s", e)
return None
try:
content = session_file.read_text()
logger.info(
"[Transcript] Read CLI session file: %s (%d bytes)",
session_file,
len(content),
)
return content
except OSError as e:
logger.warning("[Transcript] Failed to read CLI session file: %s", e)
return None
def cleanup_cli_project_dir(sdk_cwd: str) -> None:
"""Remove the CLI's project directory for a specific working directory.
The CLI stores session data under ``~/.claude/projects/<encoded_cwd>/``.
Each SDK turn uses a unique ``sdk_cwd``, so the project directory is
safe to remove entirely after the transcript has been uploaded.
"""
project_dir = _cli_project_dir(sdk_cwd)
if not project_dir:
return
if os.path.isdir(project_dir):
shutil.rmtree(project_dir, ignore_errors=True)
logger.debug("[Transcript] Cleaned up CLI project dir: %s", project_dir)
else:
logger.debug("[Transcript] Project dir not found: %s", project_dir)
def write_transcript_to_tempfile(
transcript_content: str,
session_id: str,
@@ -350,7 +327,7 @@ def write_transcript_to_tempfile(
# Validate cwd is under the expected sandbox prefix (CodeQL sanitizer).
real_cwd = os.path.realpath(cwd)
if not real_cwd.startswith(_SAFE_CWD_PREFIX):
logger.warning("[Transcript] cwd outside sandbox: %s", cwd)
logger.warning(f"[Transcript] cwd outside sandbox: {cwd}")
return None
try:
@@ -360,17 +337,17 @@ def write_transcript_to_tempfile(
os.path.join(real_cwd, f"transcript-{safe_id}.jsonl")
)
if not jsonl_path.startswith(real_cwd):
logger.warning("[Transcript] Path escaped cwd: %s", jsonl_path)
logger.warning(f"[Transcript] Path escaped cwd: {jsonl_path}")
return None
with open(jsonl_path, "w") as f:
f.write(transcript_content)
logger.info("[Transcript] Wrote resume file: %s", jsonl_path)
logger.info(f"[Transcript] Wrote resume file: {jsonl_path}")
return jsonl_path
except OSError as e:
logger.warning("[Transcript] Failed to write resume file: %s", e)
logger.warning(f"[Transcript] Failed to write resume file: {e}")
return None
@@ -431,6 +408,8 @@ def _meta_storage_path_parts(user_id: str, session_id: str) -> tuple[str, str, s
def _build_path_from_parts(parts: tuple[str, str, str], backend: object) -> str:
"""Build a full storage path from (workspace_id, file_id, filename) parts."""
from backend.util.workspace_storage import GCSWorkspaceStorage
wid, fid, fname = parts
if isinstance(backend, GCSWorkspaceStorage):
blob = f"workspaces/{wid}/{fid}/{fname}"
@@ -469,15 +448,17 @@ async def upload_transcript(
content: Complete JSONL transcript (from TranscriptBuilder).
message_count: ``len(session.messages)`` at upload time.
"""
from backend.util.workspace_storage import get_workspace_storage
# Strip metadata entries (progress, file-history-snapshot, etc.)
# Note: SDK-built transcripts shouldn't have these, but strip for safety
stripped = strip_progress_entries(content)
if not validate_transcript(stripped):
# Log entry types for debugging — helps identify why validation failed
entry_types = [
json.loads(line, fallback={"type": "INVALID_JSON"}).get("type", "?")
for line in stripped.strip().split("\n")
]
entry_types: list[str] = []
for line in stripped.strip().split("\n"):
entry = json.loads(line, fallback={"type": "INVALID_JSON"})
entry_types.append(entry.get("type", "?"))
logger.warning(
"%s Skipping upload — stripped content not valid "
"(types=%s, stripped_len=%d, raw_len=%d)",
@@ -513,14 +494,11 @@ async def upload_transcript(
content=json.dumps(meta).encode("utf-8"),
)
except Exception as e:
logger.warning("%s Failed to write metadata: %s", log_prefix, e)
logger.warning(f"{log_prefix} Failed to write metadata: {e}")
logger.info(
"%s Uploaded %dB (stripped from %dB, msg_count=%d)",
log_prefix,
len(encoded),
len(content),
message_count,
f"{log_prefix} Uploaded {len(encoded)}B "
f"(stripped from {len(content)}B, msg_count={message_count})"
)
@@ -534,6 +512,8 @@ async def download_transcript(
Returns a ``TranscriptDownload`` with the JSONL content and the
``message_count`` watermark from the upload, or ``None`` if not found.
"""
from backend.util.workspace_storage import get_workspace_storage
storage = await get_workspace_storage()
path = _build_storage_path(user_id, session_id, storage)
@@ -541,10 +521,10 @@ async def download_transcript(
data = await storage.retrieve(path)
content = data.decode("utf-8")
except FileNotFoundError:
logger.debug("%s No transcript in storage", log_prefix)
logger.debug(f"{log_prefix} No transcript in storage")
return None
except Exception as e:
logger.warning("%s Failed to download transcript: %s", log_prefix, e)
logger.warning(f"{log_prefix} Failed to download transcript: {e}")
return None
# Try to load metadata (best-effort — old transcripts won't have it)
@@ -556,14 +536,10 @@ async def download_transcript(
meta = json.loads(meta_data.decode("utf-8"), fallback={})
message_count = meta.get("message_count", 0)
uploaded_at = meta.get("uploaded_at", 0.0)
except FileNotFoundError:
except (FileNotFoundError, Exception):
pass # No metadata — treat as unknown (msg_count=0 → always fill gap)
except Exception as e:
logger.debug("%s Failed to load transcript metadata: %s", log_prefix, e)
logger.info(
"%s Downloaded %dB (msg_count=%d)", log_prefix, len(content), message_count
)
logger.info(f"{log_prefix} Downloaded {len(content)}B (msg_count={message_count})")
return TranscriptDownload(
content=content,
message_count=message_count,
@@ -577,6 +553,8 @@ async def delete_transcript(user_id: str, session_id: str) -> None:
Removes both the ``.jsonl`` transcript and the companion ``.meta.json``
so stale ``message_count`` watermarks cannot corrupt gap-fill logic.
"""
from backend.util.workspace_storage import get_workspace_storage
storage = await get_workspace_storage()
path = _build_storage_path(user_id, session_id, storage)
@@ -593,280 +571,3 @@ async def delete_transcript(user_id: str, session_id: str) -> None:
logger.info("[Transcript] Deleted metadata for session %s", session_id)
except Exception as e:
logger.warning("[Transcript] Failed to delete metadata: %s", e)
# ---------------------------------------------------------------------------
# Transcript compaction — LLM summarization for prompt-too-long recovery
# ---------------------------------------------------------------------------
# JSONL protocol values used in transcript serialization.
STOP_REASON_END_TURN = "end_turn"
COMPACT_MSG_ID_PREFIX = "msg_compact_"
ENTRY_TYPE_MESSAGE = "message"
def _flatten_assistant_content(blocks: list) -> str:
"""Flatten assistant content blocks into a single plain-text string.
Structured ``tool_use`` blocks are converted to ``[tool_use: name]``
placeholders. This is intentional: ``compress_context`` requires plain
text for token counting and LLM summarization. The structural loss is
acceptable because compaction only runs when the original transcript was
already too large for the model — a summarized plain-text version is
better than no context at all.
"""
parts: list[str] = []
for block in blocks:
if isinstance(block, dict):
btype = block.get("type", "")
if btype == "text":
parts.append(block.get("text", ""))
elif btype == "tool_use":
parts.append(f"[tool_use: {block.get('name', '?')}]")
else:
# Preserve non-text blocks (e.g. image) as placeholders.
# Use __prefix__ to distinguish from literal user text.
parts.append(f"[__{btype}__]")
elif isinstance(block, str):
parts.append(block)
return "\n".join(parts) if parts else ""
def _flatten_tool_result_content(blocks: list) -> str:
"""Flatten tool_result and other content blocks into plain text.
Handles nested tool_result structures, text blocks, and raw strings.
Uses ``json.dumps`` as fallback for dict blocks without a ``text`` key
or where ``text`` is ``None``.
Like ``_flatten_assistant_content``, structured blocks (images, nested
tool results) are reduced to text representations for compression.
"""
str_parts: list[str] = []
for block in blocks:
if isinstance(block, dict) and block.get("type") == "tool_result":
inner = block.get("content") or ""
if isinstance(inner, list):
for sub in inner:
if isinstance(sub, dict):
sub_type = sub.get("type")
if sub_type in ("image", "document"):
# Avoid serializing base64 binary data into
# the compaction input — use a placeholder.
str_parts.append(f"[__{sub_type}__]")
elif sub_type == "text" or sub.get("text") is not None:
str_parts.append(str(sub.get("text", "")))
else:
str_parts.append(json.dumps(sub))
else:
str_parts.append(str(sub))
else:
str_parts.append(str(inner))
elif isinstance(block, dict) and block.get("type") == "text":
str_parts.append(str(block.get("text", "")))
elif isinstance(block, dict):
# Preserve non-text/non-tool_result blocks (e.g. image) as placeholders.
# Use __prefix__ to distinguish from literal user text.
btype = block.get("type", "unknown")
str_parts.append(f"[__{btype}__]")
elif isinstance(block, str):
str_parts.append(block)
return "\n".join(str_parts) if str_parts else ""
def _transcript_to_messages(content: str) -> list[dict]:
"""Convert JSONL transcript entries to plain message dicts for compression.
Parses each line of the JSONL *content*, skips strippable metadata entries
(progress, file-history-snapshot, etc.), and extracts the ``role`` and
flattened ``content`` from the ``message`` field of each remaining entry.
Structured content blocks (``tool_use``, ``tool_result``, images) are
flattened to plain text via ``_flatten_assistant_content`` and
``_flatten_tool_result_content`` so that ``compress_context`` can
perform token counting and LLM summarization on uniform strings.
Returns:
A list of ``{"role": str, "content": str}`` dicts suitable for
``compress_context``.
"""
messages: list[dict] = []
for line in content.strip().split("\n"):
if not line.strip():
continue
entry = json.loads(line, fallback=None)
if not isinstance(entry, dict):
continue
if entry.get("type", "") in STRIPPABLE_TYPES and not entry.get(
"isCompactSummary"
):
continue
msg = entry.get("message", {})
role = msg.get("role", "")
if not role:
continue
msg_dict: dict = {"role": role}
raw_content = msg.get("content")
if role == "assistant" and isinstance(raw_content, list):
msg_dict["content"] = _flatten_assistant_content(raw_content)
elif isinstance(raw_content, list):
msg_dict["content"] = _flatten_tool_result_content(raw_content)
else:
msg_dict["content"] = raw_content or ""
messages.append(msg_dict)
return messages
def _messages_to_transcript(messages: list[dict]) -> str:
"""Convert compressed message dicts back to JSONL transcript format.
Rebuilds a minimal JSONL transcript from the ``{"role", "content"}``
dicts returned by ``compress_context``. Each message becomes one JSONL
line with a fresh ``uuid`` / ``parentUuid`` chain so the CLI's
``--resume`` flag can reconstruct a valid conversation tree.
Assistant messages are wrapped in the full ``message`` envelope
(``id``, ``model``, ``stop_reason``, structured ``content`` blocks)
that the CLI expects. User messages use the simpler ``{role, content}``
form.
Returns:
A newline-terminated JSONL string, or an empty string if *messages*
is empty.
"""
lines: list[str] = []
last_uuid: str = "" # root entry uses empty string, not null
for msg in messages:
role = msg.get("role", "user")
entry_type = "assistant" if role == "assistant" else "user"
uid = str(uuid4())
content = msg.get("content", "")
if role == "assistant":
message: dict = {
"role": "assistant",
"model": "",
"id": f"{COMPACT_MSG_ID_PREFIX}{uuid4().hex[:24]}",
"type": ENTRY_TYPE_MESSAGE,
"content": [{"type": "text", "text": content}] if content else [],
"stop_reason": STOP_REASON_END_TURN,
"stop_sequence": None,
}
else:
message = {"role": role, "content": content}
entry = {
"type": entry_type,
"uuid": uid,
"parentUuid": last_uuid,
"message": message,
}
lines.append(json.dumps(entry, separators=(",", ":")))
last_uuid = uid
return "\n".join(lines) + "\n" if lines else ""
_COMPACTION_TIMEOUT_SECONDS = 60
_TRUNCATION_TIMEOUT_SECONDS = 30
async def _run_compression(
messages: list[dict],
model: str,
log_prefix: str,
) -> CompressResult:
"""Run LLM-based compression with truncation fallback.
Uses the shared OpenAI client from ``get_openai_client()``.
If no client is configured or the LLM call fails, falls back to
truncation-based compression which drops older messages without
summarization.
A 60-second timeout prevents a hung LLM call from blocking the
retry path indefinitely. The truncation fallback also has a
30-second timeout to guard against slow tokenization on very large
transcripts.
"""
client = get_openai_client()
if client is None:
logger.warning("%s No OpenAI client configured, using truncation", log_prefix)
return await asyncio.wait_for(
compress_context(messages=messages, model=model, client=None),
timeout=_TRUNCATION_TIMEOUT_SECONDS,
)
try:
return await asyncio.wait_for(
compress_context(messages=messages, model=model, client=client),
timeout=_COMPACTION_TIMEOUT_SECONDS,
)
except Exception as e:
logger.warning("%s LLM compaction failed, using truncation: %s", log_prefix, e)
return await asyncio.wait_for(
compress_context(messages=messages, model=model, client=None),
timeout=_TRUNCATION_TIMEOUT_SECONDS,
)
async def compact_transcript(
content: str,
*,
model: str,
log_prefix: str = "[Transcript]",
) -> str | None:
"""Compact an oversized JSONL transcript using LLM summarization.
Converts transcript entries to plain messages, runs ``compress_context``
(the same compressor used for pre-query history), and rebuilds JSONL.
Structured content (``tool_use`` blocks, ``tool_result`` nesting, images)
is flattened to plain text for compression. This matches the fidelity of
the Plan C (DB compression) fallback path, where
``_format_conversation_context`` similarly renders tool calls as
``You called tool: name(args)`` and results as ``Tool result: ...``.
Neither path preserves structured API content blocks — the compacted
context serves as text history for the LLM, which creates proper
structured tool calls going forward.
Images are per-turn attachments loaded from workspace storage by file ID
(via ``_prepare_file_attachments``), not part of the conversation history.
They are re-attached each turn and are unaffected by compaction.
Returns the compacted JSONL string, or ``None`` on failure.
See also:
``_compress_messages`` in ``service.py`` — compresses ``ChatMessage``
lists for pre-query DB history. Both share ``compress_context()``
but operate on different input formats (JSONL transcript entries
here vs. ChatMessage dicts there).
"""
messages = _transcript_to_messages(content)
if len(messages) < 2:
logger.warning("%s Too few messages to compact (%d)", log_prefix, len(messages))
return None
try:
result = await _run_compression(messages, model, log_prefix)
if not result.was_compacted:
# Compressor says it's within budget, but the SDK rejected it.
# Return None so the caller falls through to DB fallback.
logger.warning(
"%s Compressor reports within budget but SDK rejected — "
"signalling failure",
log_prefix,
)
return None
logger.info(
"%s Compacted transcript: %d->%d tokens (%d summarized, %d dropped)",
log_prefix,
result.original_token_count,
result.token_count,
result.messages_summarized,
result.messages_dropped,
)
compacted = _messages_to_transcript(result.messages)
if not validate_transcript(compacted):
logger.warning("%s Compacted transcript failed validation", log_prefix)
return None
return compacted
except Exception as e:
logger.error(
"%s Transcript compaction failed: %s", log_prefix, e, exc_info=True
)
return None

View File

@@ -68,7 +68,7 @@ class TranscriptBuilder:
type=entry_type,
uuid=data.get("uuid") or str(uuid4()),
parentUuid=data.get("parentUuid"),
isCompactSummary=data.get("isCompactSummary"),
isCompactSummary=data.get("isCompactSummary") or None,
message=data.get("message", {}),
)

View File

@@ -1,8 +1,7 @@
"""Unit tests for JSONL transcript management utilities."""
import asyncio
import os
from unittest.mock import AsyncMock, MagicMock, patch
from unittest.mock import AsyncMock, patch
import pytest
@@ -10,7 +9,9 @@ from backend.util import json
from .transcript import (
STRIPPABLE_TYPES,
_cli_project_dir,
delete_transcript,
read_cli_session_file,
read_compacted_entries,
strip_progress_entries,
validate_transcript,
@@ -291,6 +292,85 @@ class TestStripProgressEntries:
assert asst_entry["parentUuid"] == "u1" # reparented
# --- read_cli_session_file ---
class TestReadCliSessionFile:
def test_no_matching_files_returns_none(self, tmp_path, monkeypatch):
"""read_cli_session_file returns None when no .jsonl files exist."""
# Create a project dir with no jsonl files
project_dir = tmp_path / "projects" / "encoded-cwd"
project_dir.mkdir(parents=True)
monkeypatch.setattr(
"backend.copilot.sdk.transcript._cli_project_dir",
lambda sdk_cwd: str(project_dir),
)
assert read_cli_session_file("/fake/cwd") is None
def test_one_jsonl_file_returns_content(self, tmp_path, monkeypatch):
"""read_cli_session_file returns the content of a single .jsonl file."""
project_dir = tmp_path / "projects" / "encoded-cwd"
project_dir.mkdir(parents=True)
jsonl_file = project_dir / "session.jsonl"
jsonl_file.write_text("line1\nline2\n")
monkeypatch.setattr(
"backend.copilot.sdk.transcript._cli_project_dir",
lambda sdk_cwd: str(project_dir),
)
result = read_cli_session_file("/fake/cwd")
assert result == "line1\nline2\n"
def test_symlink_escaping_project_dir_is_skipped(self, tmp_path, monkeypatch):
"""read_cli_session_file skips symlinks that escape the project dir."""
project_dir = tmp_path / "projects" / "encoded-cwd"
project_dir.mkdir(parents=True)
# Create a file outside the project dir
outside = tmp_path / "outside"
outside.mkdir()
outside_file = outside / "evil.jsonl"
outside_file.write_text("should not be read\n")
# Symlink from inside project_dir to outside file
symlink = project_dir / "evil.jsonl"
symlink.symlink_to(outside_file)
monkeypatch.setattr(
"backend.copilot.sdk.transcript._cli_project_dir",
lambda sdk_cwd: str(project_dir),
)
# The symlink target resolves outside project_dir, so it should be skipped
result = read_cli_session_file("/fake/cwd")
assert result is None
# --- _cli_project_dir ---
class TestCliProjectDir:
def test_returns_none_for_path_traversal(self, tmp_path, monkeypatch):
"""_cli_project_dir returns None when the project dir symlink escapes projects base."""
config_dir = tmp_path / "config"
config_dir.mkdir()
projects_dir = config_dir / "projects"
projects_dir.mkdir()
monkeypatch.setenv("CLAUDE_CONFIG_DIR", str(config_dir))
# Create a symlink inside projects/ that points outside of it.
# _cli_project_dir encodes the cwd as all-alnum-hyphens, so use a
# cwd whose encoded form matches the symlink name we create.
evil_target = tmp_path / "escaped"
evil_target.mkdir()
# The encoded form of "/evil/cwd" is "-evil-cwd"
symlink_path = projects_dir / "-evil-cwd"
symlink_path.symlink_to(evil_target)
result = _cli_project_dir("/evil/cwd")
assert result is None
# --- delete_transcript ---
@@ -302,7 +382,7 @@ class TestDeleteTranscript:
mock_storage.delete = AsyncMock()
with patch(
"backend.copilot.sdk.transcript.get_workspace_storage",
"backend.util.workspace_storage.get_workspace_storage",
new_callable=AsyncMock,
return_value=mock_storage,
):
@@ -322,7 +402,7 @@ class TestDeleteTranscript:
)
with patch(
"backend.copilot.sdk.transcript.get_workspace_storage",
"backend.util.workspace_storage.get_workspace_storage",
new_callable=AsyncMock,
return_value=mock_storage,
):
@@ -340,7 +420,7 @@ class TestDeleteTranscript:
)
with patch(
"backend.copilot.sdk.transcript.get_workspace_storage",
"backend.util.workspace_storage.get_workspace_storage",
new_callable=AsyncMock,
return_value=mock_storage,
):
@@ -817,386 +897,3 @@ class TestCompactionFlowIntegration:
output2 = builder2.to_jsonl()
lines2 = [json.loads(line) for line in output2.strip().split("\n")]
assert lines2[-1]["parentUuid"] == "a2"
# ---------------------------------------------------------------------------
# _run_compression (direct tests for the 3 code paths)
# ---------------------------------------------------------------------------
class TestRunCompression:
"""Direct tests for ``_run_compression`` covering all 3 code paths.
Paths:
(a) No OpenAI client configured → truncation fallback immediately.
(b) LLM success → returns LLM-compressed result.
(c) LLM call raises → truncation fallback.
"""
def _make_compress_result(self, was_compacted: bool, msgs=None):
"""Build a minimal CompressResult-like object."""
from types import SimpleNamespace
return SimpleNamespace(
was_compacted=was_compacted,
messages=msgs or [{"role": "user", "content": "summary"}],
original_token_count=500,
token_count=100 if was_compacted else 500,
messages_summarized=2 if was_compacted else 0,
messages_dropped=0,
)
@pytest.mark.asyncio
async def test_no_client_uses_truncation(self):
"""Path (a): ``get_openai_client()`` returns None → truncation only."""
from .transcript import _run_compression
truncation_result = self._make_compress_result(
True, [{"role": "user", "content": "truncated"}]
)
with (
patch(
"backend.copilot.sdk.transcript.get_openai_client",
return_value=None,
),
patch(
"backend.copilot.sdk.transcript.compress_context",
new_callable=AsyncMock,
return_value=truncation_result,
) as mock_compress,
):
result = await _run_compression(
[{"role": "user", "content": "hello"}],
model="test-model",
log_prefix="[test]",
)
# compress_context called with client=None (truncation mode)
call_kwargs = mock_compress.call_args
assert (
call_kwargs.kwargs.get("client") is None
or (call_kwargs.args and call_kwargs.args[2] is None)
or mock_compress.call_args[1].get("client") is None
)
assert result is truncation_result
@pytest.mark.asyncio
async def test_llm_success_returns_llm_result(self):
"""Path (b): ``get_openai_client()`` returns a client → LLM compresses."""
from .transcript import _run_compression
llm_result = self._make_compress_result(
True, [{"role": "user", "content": "LLM summary"}]
)
mock_client = MagicMock()
with (
patch(
"backend.copilot.sdk.transcript.get_openai_client",
return_value=mock_client,
),
patch(
"backend.copilot.sdk.transcript.compress_context",
new_callable=AsyncMock,
return_value=llm_result,
) as mock_compress,
):
result = await _run_compression(
[{"role": "user", "content": "long conversation"}],
model="test-model",
log_prefix="[test]",
)
# compress_context called with the real client
assert mock_compress.called
assert result is llm_result
@pytest.mark.asyncio
async def test_llm_failure_falls_back_to_truncation(self):
"""Path (c): LLM call raises → truncation fallback used instead."""
from .transcript import _run_compression
truncation_result = self._make_compress_result(
True, [{"role": "user", "content": "truncated fallback"}]
)
mock_client = MagicMock()
call_count = [0]
async def _compress_side_effect(**kwargs):
call_count[0] += 1
if kwargs.get("client") is not None:
raise RuntimeError("LLM timeout")
return truncation_result
with (
patch(
"backend.copilot.sdk.transcript.get_openai_client",
return_value=mock_client,
),
patch(
"backend.copilot.sdk.transcript.compress_context",
side_effect=_compress_side_effect,
),
):
result = await _run_compression(
[{"role": "user", "content": "long conversation"}],
model="test-model",
log_prefix="[test]",
)
# compress_context called twice: once for LLM (raises), once for truncation
assert call_count[0] == 2
assert result is truncation_result
@pytest.mark.asyncio
async def test_llm_timeout_falls_back_to_truncation(self):
"""Path (d): LLM call exceeds timeout → truncation fallback used."""
from .transcript import _run_compression
truncation_result = self._make_compress_result(
True, [{"role": "user", "content": "truncated after timeout"}]
)
call_count = [0]
async def _compress_side_effect(*, messages, model, client):
call_count[0] += 1
if client is not None:
# Simulate a hang that exceeds the timeout
await asyncio.sleep(9999)
return truncation_result
fake_client = MagicMock()
with (
patch(
"backend.copilot.sdk.transcript.get_openai_client",
return_value=fake_client,
),
patch(
"backend.copilot.sdk.transcript.compress_context",
side_effect=_compress_side_effect,
),
patch(
"backend.copilot.sdk.transcript._COMPACTION_TIMEOUT_SECONDS",
0.05,
),
patch(
"backend.copilot.sdk.transcript._TRUNCATION_TIMEOUT_SECONDS",
5,
),
):
result = await _run_compression(
[{"role": "user", "content": "long conversation"}],
model="test-model",
log_prefix="[test]",
)
# compress_context called twice: once for LLM (times out), once truncation
assert call_count[0] == 2
assert result is truncation_result
# ---------------------------------------------------------------------------
# cleanup_stale_project_dirs
# ---------------------------------------------------------------------------
class TestCleanupStaleProjectDirs:
"""Tests for cleanup_stale_project_dirs (disk leak prevention)."""
def test_removes_old_copilot_dirs(self, tmp_path, monkeypatch):
"""Directories matching copilot pattern older than threshold are removed."""
from backend.copilot.sdk.transcript import (
_STALE_PROJECT_DIR_SECONDS,
cleanup_stale_project_dirs,
)
projects_dir = tmp_path / "projects"
projects_dir.mkdir()
monkeypatch.setattr(
"backend.copilot.sdk.transcript._projects_base",
lambda: str(projects_dir),
)
# Create a stale dir
stale = projects_dir / "-tmp-copilot-old-session"
stale.mkdir()
# Set mtime to past the threshold
import time
old_time = time.time() - _STALE_PROJECT_DIR_SECONDS - 100
os.utime(stale, (old_time, old_time))
# Create a fresh dir
fresh = projects_dir / "-tmp-copilot-new-session"
fresh.mkdir()
removed = cleanup_stale_project_dirs()
assert removed == 1
assert not stale.exists()
assert fresh.exists()
def test_ignores_non_copilot_dirs(self, tmp_path, monkeypatch):
"""Directories not matching copilot pattern are left alone."""
from backend.copilot.sdk.transcript import cleanup_stale_project_dirs
projects_dir = tmp_path / "projects"
projects_dir.mkdir()
monkeypatch.setattr(
"backend.copilot.sdk.transcript._projects_base",
lambda: str(projects_dir),
)
# Non-copilot dir that's old
import time
other = projects_dir / "some-other-project"
other.mkdir()
old_time = time.time() - 999999
os.utime(other, (old_time, old_time))
removed = cleanup_stale_project_dirs()
assert removed == 0
assert other.exists()
def test_ttl_boundary_not_removed(self, tmp_path, monkeypatch):
"""A directory exactly at the TTL boundary should NOT be removed."""
from backend.copilot.sdk.transcript import (
_STALE_PROJECT_DIR_SECONDS,
cleanup_stale_project_dirs,
)
projects_dir = tmp_path / "projects"
projects_dir.mkdir()
monkeypatch.setattr(
"backend.copilot.sdk.transcript._projects_base",
lambda: str(projects_dir),
)
import time
# Dir that's exactly at the TTL (age == threshold, not >) — should survive
boundary = projects_dir / "-tmp-copilot-boundary"
boundary.mkdir()
boundary_time = time.time() - _STALE_PROJECT_DIR_SECONDS + 1
os.utime(boundary, (boundary_time, boundary_time))
removed = cleanup_stale_project_dirs()
assert removed == 0
assert boundary.exists()
def test_skips_non_directory_entries(self, tmp_path, monkeypatch):
"""Regular files matching the copilot pattern are not removed."""
from backend.copilot.sdk.transcript import (
_STALE_PROJECT_DIR_SECONDS,
cleanup_stale_project_dirs,
)
projects_dir = tmp_path / "projects"
projects_dir.mkdir()
monkeypatch.setattr(
"backend.copilot.sdk.transcript._projects_base",
lambda: str(projects_dir),
)
import time
# Create a regular FILE (not a dir) with the copilot pattern name
stale_file = projects_dir / "-tmp-copilot-stale-file"
stale_file.write_text("not a dir")
old_time = time.time() - _STALE_PROJECT_DIR_SECONDS - 100
os.utime(stale_file, (old_time, old_time))
removed = cleanup_stale_project_dirs()
assert removed == 0
assert stale_file.exists()
def test_missing_base_dir_returns_zero(self, tmp_path, monkeypatch):
"""If the projects base directory doesn't exist, return 0 gracefully."""
from backend.copilot.sdk.transcript import cleanup_stale_project_dirs
nonexistent = str(tmp_path / "does-not-exist" / "projects")
monkeypatch.setattr(
"backend.copilot.sdk.transcript._projects_base",
lambda: nonexistent,
)
removed = cleanup_stale_project_dirs()
assert removed == 0
def test_scoped_removes_only_target_dir(self, tmp_path, monkeypatch):
"""When encoded_cwd is supplied only that directory is swept."""
import time
from backend.copilot.sdk.transcript import (
_STALE_PROJECT_DIR_SECONDS,
cleanup_stale_project_dirs,
)
projects_dir = tmp_path / "projects"
projects_dir.mkdir()
monkeypatch.setattr(
"backend.copilot.sdk.transcript._projects_base",
lambda: str(projects_dir),
)
old_time = time.time() - _STALE_PROJECT_DIR_SECONDS - 100
# Two stale copilot dirs
target = projects_dir / "-tmp-copilot-session-abc"
target.mkdir()
os.utime(target, (old_time, old_time))
other = projects_dir / "-tmp-copilot-session-xyz"
other.mkdir()
os.utime(other, (old_time, old_time))
# Only the target dir should be removed
removed = cleanup_stale_project_dirs(encoded_cwd="-tmp-copilot-session-abc")
assert removed == 1
assert not target.exists()
assert other.exists() # untouched — not the current session
def test_scoped_fresh_dir_not_removed(self, tmp_path, monkeypatch):
"""Scoped sweep leaves a fresh directory alone."""
from backend.copilot.sdk.transcript import cleanup_stale_project_dirs
projects_dir = tmp_path / "projects"
projects_dir.mkdir()
monkeypatch.setattr(
"backend.copilot.sdk.transcript._projects_base",
lambda: str(projects_dir),
)
fresh = projects_dir / "-tmp-copilot-session-new"
fresh.mkdir()
# mtime is now — well within TTL
removed = cleanup_stale_project_dirs(encoded_cwd="-tmp-copilot-session-new")
assert removed == 0
assert fresh.exists()
def test_scoped_non_copilot_dir_not_removed(self, tmp_path, monkeypatch):
"""Scoped sweep refuses to remove a non-copilot directory."""
import time
from backend.copilot.sdk.transcript import (
_STALE_PROJECT_DIR_SECONDS,
cleanup_stale_project_dirs,
)
projects_dir = tmp_path / "projects"
projects_dir.mkdir()
monkeypatch.setattr(
"backend.copilot.sdk.transcript._projects_base",
lambda: str(projects_dir),
)
old_time = time.time() - _STALE_PROJECT_DIR_SECONDS - 100
non_copilot = projects_dir / "some-other-project"
non_copilot.mkdir()
os.utime(non_copilot, (old_time, old_time))
removed = cleanup_stale_project_dirs(encoded_cwd="some-other-project")
assert removed == 0
assert non_copilot.exists()

View File

@@ -4,12 +4,11 @@ These tests verify the complete copilot flow using dummy implementations
for agent generator and SDK service, allowing automated testing without
external LLM calls.
Enable test mode with CHAT_TEST_MODE=true environment variable (or in .env).
Enable test mode with COPILOT_TEST_MODE=true environment variable.
The dummy service emits the full AI SDK protocol event sequence:
StreamStart → StreamStartStep → StreamTextStart → StreamTextDelta(s) →
StreamTextEnd → StreamFinishStep → StreamFinish.
The processor skips StreamFinish and publishes its own via mark_session_completed.
Note: StreamFinish is NOT emitted by the dummy service — it is published
by mark_session_completed in the processor layer. These tests only cover
the service-level streaming output (StreamStart + StreamTextDelta).
"""
import asyncio
@@ -21,14 +20,9 @@ import pytest
from backend.copilot.model import ChatMessage, ChatSession, upsert_chat_session
from backend.copilot.response_model import (
StreamError,
StreamFinish,
StreamFinishStep,
StreamHeartbeat,
StreamStart,
StreamStartStep,
StreamTextDelta,
StreamTextEnd,
StreamTextStart,
)
from backend.copilot.sdk.dummy import stream_chat_completion_dummy
@@ -36,9 +30,9 @@ from backend.copilot.sdk.dummy import stream_chat_completion_dummy
@pytest.fixture(autouse=True)
def enable_test_mode():
"""Enable test mode for all tests in this module."""
os.environ["CHAT_TEST_MODE"] = "true"
os.environ["COPILOT_TEST_MODE"] = "true"
yield
os.environ.pop("CHAT_TEST_MODE", None)
os.environ.pop("COPILOT_TEST_MODE", None)
@pytest.mark.asyncio
@@ -116,14 +110,9 @@ async def test_streaming_event_types():
):
event_types.add(type(event).__name__)
# Required event types for full AI SDK protocol
# Required event types (StreamFinish is published by processor, not service)
assert "StreamStart" in event_types, "Missing StreamStart"
assert "StreamStartStep" in event_types, "Missing StreamStartStep"
assert "StreamTextStart" in event_types, "Missing StreamTextStart"
assert "StreamTextDelta" in event_types, "Missing StreamTextDelta"
assert "StreamTextEnd" in event_types, "Missing StreamTextEnd"
assert "StreamFinishStep" in event_types, "Missing StreamFinishStep"
assert "StreamFinish" in event_types, "Missing StreamFinish"
print(f"✅ Event types: {sorted(event_types)}")
@@ -186,17 +175,16 @@ async def test_streaming_heartbeat_timing():
@pytest.mark.asyncio
async def test_error_handling():
"""Test that error events have correct SSE structure."""
"""Test that errors are properly formatted and sent."""
# This would require a dummy that can trigger errors
# For now, just verify error event structure
error = StreamError(errorText="Test error", code="test_error")
assert error.errorText == "Test error"
assert error.code == "test_error"
assert str(error.type.value) in ["error", "error"]
# Verify to_sse() strips code (AI SDK protocol compliance)
sse = error.to_sse()
assert '"errorText"' in sse
assert '"code"' not in sse, "to_sse() must strip code field for AI SDK"
print("✅ Error structure verified (code stripped in SSE)")
print("✅ Error structure verified")
@pytest.mark.asyncio
@@ -338,85 +326,20 @@ async def test_stream_completeness():
):
events.append(event)
# Check for all required event types
assert any(isinstance(e, StreamStart) for e in events), "Missing StreamStart"
assert any(
isinstance(e, StreamStartStep) for e in events
), "Missing StreamStartStep"
assert any(
isinstance(e, StreamTextStart) for e in events
), "Missing StreamTextStart"
assert any(
isinstance(e, StreamTextDelta) for e in events
), "Missing StreamTextDelta"
assert any(isinstance(e, StreamTextEnd) for e in events), "Missing StreamTextEnd"
assert any(
isinstance(e, StreamFinishStep) for e in events
), "Missing StreamFinishStep"
assert any(isinstance(e, StreamFinish) for e in events), "Missing StreamFinish"
# Check for required events (StreamFinish is published by processor)
has_start = any(isinstance(e, StreamStart) for e in events)
has_text = any(isinstance(e, StreamTextDelta) for e in events)
assert has_start, "Stream must include StreamStart"
assert has_text, "Stream must include text deltas"
# Verify exactly one start
start_count = sum(1 for e in events if isinstance(e, StreamStart))
assert start_count == 1, f"Should have exactly 1 StreamStart, got {start_count}"
print(f"✅ Completeness: {len(events)} events, full protocol sequence")
@pytest.mark.asyncio
async def test_transient_error_shows_retryable():
"""Test __test_transient_error__ yields partial text then retryable StreamError."""
events = []
async for event in stream_chat_completion_dummy(
session_id="test-transient",
message="please fail __test_transient_error__",
is_user_message=True,
user_id="test-user",
):
events.append(event)
# Should start with StreamStart
assert isinstance(events[0], StreamStart)
# Should have some partial text before the error
text_events = [e for e in events if isinstance(e, StreamTextDelta)]
assert len(text_events) > 0, "Should stream partial text before error"
# Should end with StreamError
error_events = [e for e in events if isinstance(e, StreamError)]
assert len(error_events) == 1, "Should have exactly one StreamError"
assert error_events[0].code == "transient_api_error"
assert "connection interrupted" in error_events[0].errorText.lower()
print(f"✅ Transient error: {len(text_events)} partial deltas + retryable error")
@pytest.mark.asyncio
async def test_fatal_error_not_retryable():
"""Test __test_fatal_error__ yields StreamError without retryable code."""
events = []
async for event in stream_chat_completion_dummy(
session_id="test-fatal",
message="__test_fatal_error__",
is_user_message=True,
user_id="test-user",
):
events.append(event)
assert isinstance(events[0], StreamStart)
# Should have StreamError with sdk_error code (not transient)
error_events = [e for e in events if isinstance(e, StreamError)]
assert len(error_events) == 1
assert error_events[0].code == "sdk_error"
assert "transient" not in error_events[0].code
# Should NOT have any text deltas (fatal errors fail immediately)
text_events = [e for e in events if isinstance(e, StreamTextDelta)]
assert len(text_events) == 0, "Fatal error should not stream any text"
print("✅ Fatal error: immediate error, no partial text")
print(
f"✅ Completeness: 1 start, {sum(1 for e in events if isinstance(e, StreamTextDelta))} text deltas"
)
@pytest.mark.asyncio
@@ -472,8 +395,6 @@ if __name__ == "__main__":
asyncio.run(test_message_deduplication())
asyncio.run(test_event_ordering())
asyncio.run(test_stream_completeness())
asyncio.run(test_transient_error_shows_retryable())
asyncio.run(test_fatal_error_not_retryable())
asyncio.run(test_text_delta_consistency())
print("=" * 60)

View File

@@ -12,7 +12,6 @@ from .agent_browser import BrowserActTool, BrowserNavigateTool, BrowserScreensho
from .agent_output import AgentOutputTool
from .base import BaseTool
from .bash_exec import BashExecTool
from .connect_integration import ConnectIntegrationTool
from .continue_run_block import ContinueRunBlockTool
from .create_agent import CreateAgentTool
from .customize_agent import CustomizeAgentTool
@@ -85,7 +84,6 @@ TOOL_REGISTRY: dict[str, BaseTool] = {
"browser_screenshot": BrowserScreenshotTool(),
# Sandboxed code execution (bubblewrap)
"bash_exec": BashExecTool(),
"connect_integration": ConnectIntegrationTool(),
# Persistent workspace tools (cloud storage, survives across sessions)
# Feature request tools
"search_feature_requests": SearchFeatureRequestsTool(),

View File

@@ -22,12 +22,13 @@ class AddUnderstandingTool(BaseTool):
@property
def description(self) -> str:
return (
"Store user's business context, workflows, pain points, and automation goals. "
"Call whenever the user shares business info. Each call incrementally merges "
"with existing data — provide only the fields you have. "
"Builds a profile that helps recommend better agents for the user's needs."
)
return """Capture and store information about the user's business context,
workflows, pain points, and automation goals. Call this tool whenever the user
shares information about their business. Each call incrementally adds to the
existing understanding - you don't need to provide all fields at once.
Use this to build a comprehensive profile that helps recommend better agents
and automations for the user's specific needs."""
@property
def parameters(self) -> dict[str, Any]:

View File

@@ -20,9 +20,7 @@ SSRF protection:
Requires:
npm install -g agent-browser
agent-browser install (downloads Chromium, one-time — skipped in Docker
where system chromium is pre-installed and
AGENT_BROWSER_EXECUTABLE_PATH is set)
agent-browser install (downloads Chromium, one-time per machine)
"""
import asyncio
@@ -410,11 +408,18 @@ class BrowserNavigateTool(BaseTool):
@property
def description(self) -> str:
return (
"Navigate to a URL in a real browser. Returns accessibility tree with @ref IDs "
"for browser_act. Session persists (cookies/auth carry over). "
"For static pages, prefer web_fetch. "
"For SPAs, elements may load late — use browser_act with wait + browser_screenshot to verify. "
"For auth: navigate to login, fill creds and submit with browser_act, then navigate to target."
"Navigate to a URL using a real browser. Returns an accessibility "
"tree snapshot listing the page's interactive elements with @ref IDs "
"(e.g. @e3) that can be used with browser_act. "
"Session persists — cookies and login state carry over between calls. "
"Use this (with browser_act) for multi-step interaction: login flows, "
"form filling, button clicks, or anything requiring page interaction. "
"For plain static pages, prefer web_fetch — no browser overhead. "
"For authenticated pages: navigate to the login page first, use browser_act "
"to fill credentials and submit, then navigate to the target page. "
"Note: for slow SPAs, the returned snapshot may reflect a partially-loaded "
"state. If elements seem missing, use browser_act with action='wait' and a "
"CSS selector or millisecond delay, then take a browser_screenshot to verify."
)
@property
@@ -424,13 +429,13 @@ class BrowserNavigateTool(BaseTool):
"properties": {
"url": {
"type": "string",
"description": "HTTP/HTTPS URL to navigate to.",
"description": "The HTTP/HTTPS URL to navigate to.",
},
"wait_for": {
"type": "string",
"enum": ["networkidle", "load", "domcontentloaded"],
"default": "networkidle",
"description": "Navigation completion strategy (default: networkidle).",
"description": "When to consider navigation complete. Use 'networkidle' for SPAs (default).",
},
},
"required": ["url"],
@@ -549,12 +554,14 @@ class BrowserActTool(BaseTool):
@property
def description(self) -> str:
return (
"Interact with the current browser page using @ref IDs from the snapshot. "
"Actions: click, dblclick, fill, type, scroll, hover, press, "
"Interact with the current browser page. Use @ref IDs from the "
"snapshot (e.g. '@e3') to target elements. Returns an updated snapshot. "
"Supported actions: click, dblclick, fill, type, scroll, hover, press, "
"check, uncheck, select, wait, back, forward, reload. "
"fill clears field first; type appends. "
"wait accepts CSS selector or milliseconds (e.g. '1000'). "
"Returns updated snapshot."
"fill clears the field before typing; type appends without clearing. "
"wait accepts a CSS selector (waits for element) or milliseconds string (e.g. '1000'). "
"Example login flow: fill @e1 with email → fill @e2 with password → "
"click @e3 (submit) → browser_navigate to the target page."
)
@property
@@ -580,21 +587,30 @@ class BrowserActTool(BaseTool):
"forward",
"reload",
],
"description": "Action to perform.",
"description": "The action to perform.",
},
"target": {
"type": "string",
"description": "@ref ID (e.g. '@e3'), CSS selector, or text. Required for: click, dblclick, fill, type, hover, check, uncheck, select. For wait: CSS selector or milliseconds string (e.g. '1000').",
"description": (
"Element to target. Use @ref from snapshot (e.g. '@e3'), "
"a CSS selector, or a text description. "
"Required for: click, dblclick, fill, type, hover, check, uncheck, select. "
"For wait: a CSS selector to wait for, or milliseconds as a string (e.g. '1000')."
),
},
"value": {
"type": "string",
"description": "Text for fill/type, key for press (e.g. 'Enter'), option for select.",
"description": (
"For fill/type: the text to enter. "
"For press: key name (e.g. 'Enter', 'Tab', 'Control+a'). "
"For select: the option value to select."
),
},
"direction": {
"type": "string",
"enum": ["up", "down", "left", "right"],
"default": "down",
"description": "Scroll direction (default: down).",
"description": "For scroll: direction to scroll.",
},
},
"required": ["action"],
@@ -741,10 +757,12 @@ class BrowserScreenshotTool(BaseTool):
@property
def description(self) -> str:
return (
"Screenshot the current browser page and save to workspace. "
"annotate=true overlays @ref labels on elements. "
"IMPORTANT: After calling, you MUST immediately call read_workspace_file with the "
"returned file_id to display the image inline."
"Take a screenshot of the current browser page and save it to the workspace. "
"IMPORTANT: After calling this tool, immediately call read_workspace_file "
"with the returned file_id to display the image inline to the user — "
"the screenshot is not visible until you do this. "
"With annotate=true (default), @ref labels are overlaid on interactive "
"elements, making it easy to see which @ref ID maps to which element on screen."
)
@property
@@ -755,12 +773,12 @@ class BrowserScreenshotTool(BaseTool):
"annotate": {
"type": "boolean",
"default": True,
"description": "Overlay @ref labels (default: true).",
"description": "Overlay @ref labels on interactive elements (default: true).",
},
"filename": {
"type": "string",
"default": "screenshot.png",
"description": "Workspace filename (default: screenshot.png).",
"description": "Filename to save in the workspace.",
},
},
}

View File

@@ -7,7 +7,6 @@ from typing import Any
from .helpers import (
AGENT_EXECUTOR_BLOCK_ID,
MCP_TOOL_BLOCK_ID,
SMART_DECISION_MAKER_BLOCK_ID,
AgentDict,
are_types_compatible,
generate_uuid,
@@ -31,14 +30,6 @@ _GET_CURRENT_DATE_BLOCK_ID = "b29c1b50-5d0e-4d9f-8f9d-1b0e6fcbf0b1"
_GMAIL_SEND_BLOCK_ID = "6c27abc2-e51d-499e-a85f-5a0041ba94f0"
_TEXT_REPLACE_BLOCK_ID = "7e7c87ab-3469-4bcc-9abe-67705091b713"
# Defaults applied to SmartDecisionMakerBlock nodes by the fixer.
_SDM_DEFAULTS: dict[str, int | bool] = {
"agent_mode_max_iterations": 10,
"conversation_compaction": True,
"retry": 3,
"multiple_tool_calls": False,
}
class AgentFixer:
"""
@@ -1639,43 +1630,6 @@ class AgentFixer:
return agent
def fix_smart_decision_maker_blocks(self, agent: AgentDict) -> AgentDict:
"""Fix SmartDecisionMakerBlock nodes to ensure agent-mode defaults.
Ensures:
1. ``agent_mode_max_iterations`` defaults to ``10`` (bounded agent mode)
2. ``conversation_compaction`` defaults to ``True``
3. ``retry`` defaults to ``3``
4. ``multiple_tool_calls`` defaults to ``False``
Args:
agent: The agent dictionary to fix
Returns:
The fixed agent dictionary
"""
nodes = agent.get("nodes", [])
for node in nodes:
if node.get("block_id") != SMART_DECISION_MAKER_BLOCK_ID:
continue
node_id = node.get("id", "unknown")
input_default = node.get("input_default")
if not isinstance(input_default, dict):
input_default = {}
node["input_default"] = input_default
for field, default_value in _SDM_DEFAULTS.items():
if field not in input_default or input_default[field] is None:
input_default[field] = default_value
self.add_fix_log(
f"SmartDecisionMakerBlock {node_id}: "
f"Set {field}={default_value!r}"
)
return agent
def fix_dynamic_block_sink_names(self, agent: AgentDict) -> AgentDict:
"""Fix links that use _#_ notation for dynamic block sink names.
@@ -1763,9 +1717,6 @@ class AgentFixer:
# Apply fixes for MCPToolBlock nodes
agent = self.fix_mcp_tool_blocks(agent)
# Apply fixes for SmartDecisionMakerBlock nodes (agent-mode defaults)
agent = self.fix_smart_decision_maker_blocks(agent)
# Apply fixes for AgentExecutorBlock nodes (sub-agents)
if library_agents:
agent = self.fix_agent_executor_blocks(agent, library_agents)

View File

@@ -12,7 +12,6 @@ __all__ = [
"AGENT_OUTPUT_BLOCK_ID",
"AgentDict",
"MCP_TOOL_BLOCK_ID",
"SMART_DECISION_MAKER_BLOCK_ID",
"UUID_REGEX",
"are_types_compatible",
"generate_uuid",
@@ -34,7 +33,6 @@ UUID_REGEX = re.compile(r"^" + UUID_RE_STR + r"$")
AGENT_EXECUTOR_BLOCK_ID = "e189baac-8c20-45a1-94a7-55177ea42565"
MCP_TOOL_BLOCK_ID = "a0a4b1c2-d3e4-4f56-a7b8-c9d0e1f2a3b4"
SMART_DECISION_MAKER_BLOCK_ID = "3b191d9f-356f-482d-8238-ba04b6d18381"
AGENT_INPUT_BLOCK_ID = "c0a8e994-ebf1-4a9c-a4d8-89d09c86741b"
AGENT_OUTPUT_BLOCK_ID = "363ae599-353e-4804-937e-b2ee3cef3da4"

View File

@@ -10,7 +10,6 @@ from .helpers import (
AGENT_INPUT_BLOCK_ID,
AGENT_OUTPUT_BLOCK_ID,
MCP_TOOL_BLOCK_ID,
SMART_DECISION_MAKER_BLOCK_ID,
AgentDict,
are_types_compatible,
get_defined_property_type,
@@ -182,23 +181,15 @@ class AgentValidator:
return valid
def _build_node_lookup(self, agent: AgentDict) -> dict[str, dict[str, Any]]:
"""Build a node-id → node dict from the agent's nodes."""
return {node.get("id", ""): node for node in agent.get("nodes", [])}
def validate_data_type_compatibility(
self,
agent: AgentDict,
blocks: list[dict[str, Any]],
node_lookup: dict[str, dict[str, Any]] | None = None,
self, agent: AgentDict, blocks: list[dict[str, Any]]
) -> bool:
"""
Validate that linked data types are compatible between source and sink.
Returns True if all data types are compatible, False otherwise.
"""
valid = True
if node_lookup is None:
node_lookup = self._build_node_lookup(agent)
node_lookup = {node.get("id", ""): node for node in agent.get("nodes", [])}
block_lookup = {block.get("id", ""): block for block in blocks}
for link in agent.get("links", []):
@@ -218,8 +209,8 @@ class AgentValidator:
valid = False
continue
source_node = node_lookup.get(source_id)
sink_node = node_lookup.get(sink_id)
source_node = node_lookup.get(source_id, "")
sink_node = node_lookup.get(sink_id, "")
if not source_node or not sink_node:
continue
@@ -257,10 +248,7 @@ class AgentValidator:
return valid
def validate_nested_sink_links(
self,
agent: AgentDict,
blocks: list[dict[str, Any]],
node_lookup: dict[str, dict[str, Any]] | None = None,
self, agent: AgentDict, blocks: list[dict[str, Any]]
) -> bool:
"""
Validate nested sink links (links with _#_ notation).
@@ -274,8 +262,7 @@ class AgentValidator:
block_names = {
block.get("id", ""): block.get("name", "Unknown Block") for block in blocks
}
if node_lookup is None:
node_lookup = self._build_node_lookup(agent)
node_lookup = {node.get("id", ""): node for node in agent.get("nodes", [])}
for link in agent.get("links", []):
sink_name = link.get("sink_name", "")
@@ -401,10 +388,7 @@ class AgentValidator:
return valid
def validate_source_output_existence(
self,
agent: AgentDict,
blocks: list[dict[str, Any]],
node_lookup: dict[str, dict[str, Any]] | None = None,
self, agent: AgentDict, blocks: list[dict[str, Any]]
) -> bool:
"""
Validate that all source_names in links exist in the corresponding
@@ -417,7 +401,6 @@ class AgentValidator:
Args:
agent: The agent dictionary to validate
blocks: List of available blocks with their schemas
node_lookup: Optional pre-built node-id → node dict
Returns:
True if all source output fields exist, False otherwise
@@ -432,8 +415,7 @@ class AgentValidator:
block_names = {
block.get("id", ""): block.get("name", "Unknown Block") for block in blocks
}
if node_lookup is None:
node_lookup = self._build_node_lookup(agent)
node_lookup = {node.get("id", ""): node for node in agent.get("nodes", [])}
for link in agent.get("links", []):
source_id = link.get("source_id")
@@ -827,96 +809,6 @@ class AgentValidator:
return valid
def validate_smart_decision_maker_blocks(
self,
agent: AgentDict,
node_lookup: dict[str, dict[str, Any]] | None = None,
) -> bool:
"""Validate that SmartDecisionMakerBlock nodes have downstream tools.
Checks that each SmartDecisionMakerBlock node has at least one link
with ``source_name == "tools"`` connecting to a downstream block.
Without tools, the block has nothing to call and will error at runtime.
Returns True if all SmartDecisionMakerBlock nodes are valid.
"""
valid = True
nodes = agent.get("nodes", [])
links = agent.get("links", [])
if node_lookup is None:
node_lookup = self._build_node_lookup(agent)
non_tool_block_ids = {AGENT_INPUT_BLOCK_ID, AGENT_OUTPUT_BLOCK_ID}
for node in nodes:
if node.get("block_id") != SMART_DECISION_MAKER_BLOCK_ID:
continue
node_id = node.get("id", "unknown")
customized_name = (node.get("metadata") or {}).get(
"customized_name", node_id
)
# Warn if agent_mode_max_iterations is 0 (traditional mode) —
# requires complex external conversation-history loop wiring
# that the agent generator does not produce.
input_default = node.get("input_default", {})
max_iter = input_default.get("agent_mode_max_iterations")
if max_iter is not None and not isinstance(max_iter, int):
self.add_error(
f"SmartDecisionMakerBlock node '{customized_name}' "
f"({node_id}) has non-integer "
f"agent_mode_max_iterations={max_iter!r}. "
f"This field must be an integer."
)
valid = False
elif isinstance(max_iter, int) and max_iter < -1:
self.add_error(
f"SmartDecisionMakerBlock node '{customized_name}' "
f"({node_id}) has invalid "
f"agent_mode_max_iterations={max_iter}. "
f"Use -1 for infinite or a positive number for "
f"bounded iterations."
)
valid = False
elif isinstance(max_iter, int) and max_iter > 100:
self.add_error(
f"SmartDecisionMakerBlock node '{customized_name}' "
f"({node_id}) has agent_mode_max_iterations="
f"{max_iter} which is unusually high. Values above "
f"100 risk excessive cost and long execution times. "
f"Consider using a lower value (3-10) or -1 for "
f"genuinely open-ended tasks."
)
valid = False
elif max_iter == 0:
self.add_error(
f"SmartDecisionMakerBlock node '{customized_name}' "
f"({node_id}) has agent_mode_max_iterations=0 "
f"(traditional mode). The agent generator only supports "
f"agent mode (set to -1 for infinite or a positive "
f"number for bounded iterations)."
)
valid = False
has_tools = any(
link.get("source_id") == node_id
and link.get("source_name") == "tools"
and node_lookup.get(link.get("sink_id", ""), {}).get("block_id")
not in non_tool_block_ids
for link in links
)
if not has_tools:
self.add_error(
f"SmartDecisionMakerBlock node '{customized_name}' "
f"({node_id}) has no downstream tool blocks connected. "
f"Connect at least one block to its 'tools' output so "
f"the AI has tools to call."
)
valid = False
return valid
def validate_mcp_tool_blocks(self, agent: AgentDict) -> bool:
"""Validate that MCPToolBlock nodes have required fields.
@@ -978,9 +870,6 @@ class AgentValidator:
logger.info("Validating agent...")
self.errors = []
# Build node lookup once and share across validation methods
node_lookup = self._build_node_lookup(agent)
checks = [
(
"Block existence",
@@ -996,15 +885,15 @@ class AgentValidator:
),
(
"Data type compatibility",
self.validate_data_type_compatibility(agent, blocks, node_lookup),
self.validate_data_type_compatibility(agent, blocks),
),
(
"Nested sink links",
self.validate_nested_sink_links(agent, blocks, node_lookup),
self.validate_nested_sink_links(agent, blocks),
),
(
"Source output existence",
self.validate_source_output_existence(agent, blocks, node_lookup),
self.validate_source_output_existence(agent, blocks),
),
(
"Prompt double curly braces spaces",
@@ -1024,10 +913,6 @@ class AgentValidator:
"MCP tool blocks",
self.validate_mcp_tool_blocks(agent),
),
(
"SmartDecisionMaker blocks",
self.validate_smart_decision_maker_blocks(agent, node_lookup),
),
]
# Add AgentExecutorBlock detailed validation if library_agents

View File

@@ -108,12 +108,22 @@ class AgentOutputTool(BaseTool):
@property
def description(self) -> str:
return (
"Retrieve execution outputs from a library agent. "
"Identify by agent_name, library_agent_id, or store_slug. "
"Filter by execution_id or run_time. "
"Optionally wait for running executions."
)
return """Retrieve execution outputs from agents in the user's library.
Identify the agent using one of:
- agent_name: Fuzzy search in user's library
- library_agent_id: Exact library agent ID
- store_slug: Marketplace format 'username/agent-name'
Select which run to retrieve using:
- execution_id: Specific execution ID
- run_time: 'latest' (default), 'yesterday', 'last week', or ISO date 'YYYY-MM-DD'
Wait for completion (optional):
- wait_if_running: Max seconds to wait if execution is still running (0-300).
If the execution is running/queued, waits up to this many seconds for completion.
Returns current status on timeout. If already finished, returns immediately.
"""
@property
def parameters(self) -> dict[str, Any]:
@@ -122,29 +132,32 @@ class AgentOutputTool(BaseTool):
"properties": {
"agent_name": {
"type": "string",
"description": "Agent name (fuzzy match).",
"description": "Agent name to search for in user's library (fuzzy match)",
},
"library_agent_id": {
"type": "string",
"description": "Library agent ID.",
"description": "Exact library agent ID",
},
"store_slug": {
"type": "string",
"description": "Marketplace 'username/agent-name'.",
"description": "Marketplace identifier: 'username/agent-slug'",
},
"execution_id": {
"type": "string",
"description": "Specific execution ID.",
"description": "Specific execution ID to retrieve",
},
"run_time": {
"type": "string",
"description": "Time filter: 'latest', 'today', 'yesterday', 'last week', 'last 7 days', 'last month', 'last 30 days', 'YYYY-MM-DD', or ISO datetime.",
"description": (
"Time filter: 'latest', 'yesterday', 'last week', or 'YYYY-MM-DD'"
),
},
"wait_if_running": {
"type": "integer",
"description": "Max seconds to wait if still running (0-300). Returns current state on timeout.",
"minimum": 0,
"maximum": 300,
"description": (
"Max seconds to wait if execution is still running (0-300). "
"If running, waits for completion. Returns current state on timeout."
),
},
},
"required": [],

View File

@@ -3,11 +3,11 @@
from __future__ import annotations
import logging
import re
from typing import TYPE_CHECKING, Literal
if TYPE_CHECKING:
from backend.api.features.library.model import LibraryAgent
from backend.api.features.store.model import StoreAgent, StoreAgentDetails
from backend.data.db_accessors import library_db, store_db
from backend.util.exceptions import DatabaseError, NotFoundError
@@ -19,12 +19,16 @@ from .models import (
NoResultsResponse,
ToolResponseBase,
)
from .utils import is_creator_slug, is_uuid
logger = logging.getLogger(__name__)
SearchSource = Literal["marketplace", "library"]
_UUID_PATTERN = re.compile(
r"^[a-f0-9]{8}-[a-f0-9]{4}-4[a-f0-9]{3}-[89ab][a-f0-9]{3}-[a-f0-9]{12}$",
re.IGNORECASE,
)
# Keywords that should be treated as "list all" rather than a literal search
_LIST_ALL_KEYWORDS = frozenset({"all", "*", "everything", "any", ""})
@@ -35,160 +39,149 @@ async def search_agents(
session_id: str | None = None,
user_id: str | None = None,
) -> ToolResponseBase:
"""Search for agents in marketplace or user library."""
if source == "marketplace":
return await _search_marketplace(query, session_id)
else:
return await _search_library(query, session_id, user_id)
"""
Search for agents in marketplace or user library.
For library searches, keywords like "all", "*", "everything", or an empty
query will list all agents without filtering.
async def _search_marketplace(query: str, session_id: str | None) -> ToolResponseBase:
"""Search marketplace agents, with direct creator/slug lookup fallback."""
query = query.strip()
if not query:
Args:
query: Search query string. Special keywords list all library agents.
source: "marketplace" or "library"
session_id: Chat session ID
user_id: User ID (required for library search)
Returns:
AgentsFoundResponse, NoResultsResponse, or ErrorResponse
"""
# Normalize list-all keywords to empty string for library searches
if source == "library" and query.lower().strip() in _LIST_ALL_KEYWORDS:
query = ""
if source == "marketplace" and not query:
return ErrorResponse(
message="Please provide a search query", session_id=session_id
)
agents: list[AgentInfo] = []
try:
# Direct lookup if query matches "creator/slug" pattern
if is_creator_slug(query):
logger.info(f"Query looks like creator/slug, trying direct lookup: {query}")
creator, slug = query.split("/", 1)
agent_info = await _get_marketplace_agent_by_slug(creator, slug)
if agent_info:
agents.append(agent_info)
if not agents:
logger.info(f"Searching marketplace for: {query}")
results = await store_db().get_store_agents(search_query=query, page_size=5)
for agent in results.agents:
agents.append(_marketplace_agent_to_info(agent))
except NotFoundError:
pass
except DatabaseError as e:
logger.error(f"Error searching marketplace: {e}", exc_info=True)
return ErrorResponse(
message="Failed to search marketplace. Please try again.",
error=str(e),
session_id=session_id,
)
if not agents:
return NoResultsResponse(
message=(
f"No agents found matching '{query}'. Let the user know they can "
"try different keywords or browse the marketplace. Also let them "
"know you can create a custom agent for them based on their needs."
),
suggestions=[
"Try more general terms",
"Browse categories in the marketplace",
"Check spelling",
],
session_id=session_id,
)
return AgentsFoundResponse(
message=(
"Now you have found some options for the user to choose from. "
"You can add a link to a recommended agent at: /marketplace/agent/agent_id "
"Please ask the user if they would like to use any of these agents. "
"Let the user know we can create a custom agent for them based on their needs."
),
title=f"Found {len(agents)} agent{'s' if len(agents) != 1 else ''} for '{query}'",
agents=agents,
count=len(agents),
session_id=session_id,
)
async def _search_library(
query: str, session_id: str | None, user_id: str | None
) -> ToolResponseBase:
"""Search user's library agents, with direct UUID lookup fallback."""
if not user_id:
if source == "library" and not user_id:
return ErrorResponse(
message="User authentication required to search library",
session_id=session_id,
)
query = query.strip()
# Normalize list-all keywords to empty string
if query.lower() in _LIST_ALL_KEYWORDS:
query = ""
agents: list[AgentInfo] = []
try:
if is_uuid(query):
logger.info(f"Query looks like UUID, trying direct lookup: {query}")
agent = await _get_library_agent_by_id(user_id, query)
if agent:
agents.append(agent)
if not agents:
logger.info(
f"{'Listing all agents in' if not query else 'Searching'} "
f"user library{'' if not query else f' for: {query}'}"
)
results = await library_db().list_library_agents(
user_id=user_id,
search_term=query or None,
page_size=50 if not query else 10,
)
if source == "marketplace":
logger.info(f"Searching marketplace for: {query}")
results = await store_db().get_store_agents(search_query=query, page_size=5)
for agent in results.agents:
agents.append(_library_agent_to_info(agent))
agents.append(
AgentInfo(
id=f"{agent.creator}/{agent.slug}",
name=agent.agent_name,
description=agent.description or "",
source="marketplace",
in_library=False,
creator=agent.creator,
category="general",
rating=agent.rating,
runs=agent.runs,
is_featured=False,
)
)
else:
if _is_uuid(query):
logger.info(f"Query looks like UUID, trying direct lookup: {query}")
agent = await _get_library_agent_by_id(user_id, query) # type: ignore[arg-type]
if agent:
agents.append(agent)
logger.info(f"Found agent by direct ID lookup: {agent.name}")
if not agents:
search_term = query or None
logger.info(
f"{'Listing all agents in' if not query else 'Searching'} "
f"user library{'' if not query else f' for: {query}'}"
)
results = await library_db().list_library_agents(
user_id=user_id, # type: ignore[arg-type]
search_term=search_term,
page_size=50 if not query else 10,
)
for agent in results.agents:
agents.append(_library_agent_to_info(agent))
logger.info(f"Found {len(agents)} agents in {source}")
except NotFoundError:
pass
except DatabaseError as e:
logger.error(f"Error searching library: {e}", exc_info=True)
logger.error(f"Error searching {source}: {e}", exc_info=True)
return ErrorResponse(
message="Failed to search library. Please try again.",
message=f"Failed to search {source}. Please try again.",
error=str(e),
session_id=session_id,
)
if not agents:
if not query:
return NoResultsResponse(
message=(
"Your library is empty. Let the user know they can browse the "
"marketplace to find agents, or you can create a custom agent "
"for them based on their needs."
),
suggestions=[
"Browse the marketplace to find and add agents",
"Use find_agent to search the marketplace",
],
session_id=session_id,
if source == "marketplace":
suggestions = [
"Try more general terms",
"Browse categories in the marketplace",
"Check spelling",
]
no_results_msg = (
f"No agents found matching '{query}'. Let the user know they can "
"try different keywords or browse the marketplace. Also let them "
"know you can create a custom agent for them based on their needs."
)
return NoResultsResponse(
message=(
f"No agents matching '{query}' found in your library. Let the "
"user know you can create a custom agent for them based on "
"their needs."
),
suggestions=[
elif not query:
# User asked to list all but library is empty
suggestions = [
"Browse the marketplace to find and add agents",
"Use find_agent to search the marketplace",
]
no_results_msg = (
"Your library is empty. Let the user know they can browse the "
"marketplace to find agents, or you can create a custom agent "
"for them based on their needs."
)
else:
suggestions = [
"Try different keywords",
"Use find_agent to search the marketplace",
"Check your library at /library",
],
session_id=session_id,
]
no_results_msg = (
f"No agents matching '{query}' found in your library. Let the "
"user know you can create a custom agent for them based on "
"their needs."
)
return NoResultsResponse(
message=no_results_msg, session_id=session_id, suggestions=suggestions
)
if not query:
if source == "marketplace":
title = (
f"Found {len(agents)} agent{'s' if len(agents) != 1 else ''} for '{query}'"
)
elif not query:
title = f"Found {len(agents)} agent{'s' if len(agents) != 1 else ''} in your library"
else:
title = f"Found {len(agents)} agent{'s' if len(agents) != 1 else ''} in your library for '{query}'"
message = (
"Now you have found some options for the user to choose from. "
"You can add a link to a recommended agent at: /marketplace/agent/agent_id "
"Please ask the user if they would like to use any of these agents. "
"Let the user know we can create a custom agent for them based on their needs."
if source == "marketplace"
else "Found agents in the user's library. You can provide a link to view "
"an agent at: /library/agents/{agent_id}. Use agent_output to get "
"execution results, or run_agent to execute. Let the user know we can "
"create a custom agent for them based on their needs."
)
return AgentsFoundResponse(
message=(
"Found agents in the user's library. You can provide a link to view "
"an agent at: /library/agents/{agent_id}. Use agent_output to get "
"execution results, or run_agent to execute. Let the user know we can "
"create a custom agent for them based on their needs."
),
message=message,
title=title,
agents=agents,
count=len(agents),
@@ -196,20 +189,9 @@ async def _search_library(
)
def _marketplace_agent_to_info(agent: StoreAgent | StoreAgentDetails) -> AgentInfo:
"""Convert a marketplace agent (StoreAgent or StoreAgentDetails) to an AgentInfo."""
return AgentInfo(
id=f"{agent.creator}/{agent.slug}",
name=agent.agent_name,
description=agent.description or "",
source="marketplace",
in_library=False,
creator=agent.creator,
category="general",
rating=agent.rating,
runs=agent.runs,
is_featured=False,
)
def _is_uuid(text: str) -> bool:
"""Check if text is a valid UUID v4."""
return bool(_UUID_PATTERN.match(text.strip()))
def _library_agent_to_info(agent: LibraryAgent) -> AgentInfo:
@@ -232,23 +214,6 @@ def _library_agent_to_info(agent: LibraryAgent) -> AgentInfo:
)
async def _get_marketplace_agent_by_slug(creator: str, slug: str) -> AgentInfo | None:
"""Fetch a marketplace agent by creator/slug identifier."""
try:
details = await store_db().get_store_agent_details(creator, slug)
return _marketplace_agent_to_info(details)
except NotFoundError:
pass
except DatabaseError:
raise
except Exception as e:
logger.warning(
f"Could not fetch marketplace agent {creator}/{slug}: {e}",
exc_info=True,
)
return None
async def _get_library_agent_by_id(user_id: str, agent_id: str) -> AgentInfo | None:
"""Fetch a library agent by ID (library agent ID or graph_id).
@@ -261,9 +226,10 @@ async def _get_library_agent_by_id(user_id: str, agent_id: str) -> AgentInfo | N
try:
agent = await lib_db.get_library_agent_by_graph_id(user_id, agent_id)
if agent:
logger.debug(f"Found library agent by graph_id: {agent.name}")
return _library_agent_to_info(agent)
except NotFoundError:
pass
logger.debug(f"Library agent not found by graph_id: {agent_id}")
except DatabaseError:
raise
except Exception as e:
@@ -275,9 +241,10 @@ async def _get_library_agent_by_id(user_id: str, agent_id: str) -> AgentInfo | N
try:
agent = await lib_db.get_library_agent(agent_id, user_id)
if agent:
logger.debug(f"Found library agent by library_id: {agent.name}")
return _library_agent_to_info(agent)
except NotFoundError:
pass
logger.debug(f"Library agent not found by library_id: {agent_id}")
except DatabaseError:
raise
except Exception as e:

View File

@@ -1,170 +0,0 @@
"""Tests for agent search direct lookup functionality."""
from unittest.mock import AsyncMock, MagicMock, patch
import pytest
from .agent_search import search_agents
from .models import AgentsFoundResponse, NoResultsResponse
_TEST_USER_ID = "test-user-agent-search"
class TestMarketplaceSlugLookup:
"""Tests for creator/slug direct lookup in marketplace search."""
@pytest.mark.asyncio(loop_scope="session")
async def test_slug_lookup_found(self):
"""creator/slug query returns the agent directly."""
mock_details = MagicMock()
mock_details.creator = "testuser"
mock_details.slug = "my-agent"
mock_details.agent_name = "My Agent"
mock_details.description = "A test agent"
mock_details.rating = 4.5
mock_details.runs = 100
mock_store = MagicMock()
mock_store.get_store_agent_details = AsyncMock(return_value=mock_details)
with patch(
"backend.copilot.tools.agent_search.store_db",
return_value=mock_store,
):
response = await search_agents(
query="testuser/my-agent",
source="marketplace",
session_id="test-session",
)
assert isinstance(response, AgentsFoundResponse)
assert response.count == 1
assert response.agents[0].id == "testuser/my-agent"
assert response.agents[0].name == "My Agent"
@pytest.mark.asyncio(loop_scope="session")
async def test_slug_lookup_not_found_falls_back_to_search(self):
"""creator/slug not found falls back to general search."""
from backend.util.exceptions import NotFoundError
mock_store = MagicMock()
mock_store.get_store_agent_details = AsyncMock(side_effect=NotFoundError(""))
# Fallback search returns results
mock_search_results = MagicMock()
mock_agent = MagicMock()
mock_agent.creator = "other"
mock_agent.slug = "similar-agent"
mock_agent.agent_name = "Similar Agent"
mock_agent.description = "A similar agent"
mock_agent.rating = 3.0
mock_agent.runs = 50
mock_search_results.agents = [mock_agent]
mock_store.get_store_agents = AsyncMock(return_value=mock_search_results)
with patch(
"backend.copilot.tools.agent_search.store_db",
return_value=mock_store,
):
response = await search_agents(
query="testuser/my-agent",
source="marketplace",
session_id="test-session",
)
assert isinstance(response, AgentsFoundResponse)
assert response.count == 1
assert response.agents[0].id == "other/similar-agent"
@pytest.mark.asyncio(loop_scope="session")
async def test_slug_lookup_not_found_no_search_results(self):
"""creator/slug not found and search returns nothing."""
from backend.util.exceptions import NotFoundError
mock_store = MagicMock()
mock_store.get_store_agent_details = AsyncMock(side_effect=NotFoundError(""))
mock_search_results = MagicMock()
mock_search_results.agents = []
mock_store.get_store_agents = AsyncMock(return_value=mock_search_results)
with patch(
"backend.copilot.tools.agent_search.store_db",
return_value=mock_store,
):
response = await search_agents(
query="testuser/nonexistent",
source="marketplace",
session_id="test-session",
)
assert isinstance(response, NoResultsResponse)
@pytest.mark.asyncio(loop_scope="session")
async def test_non_slug_query_goes_to_search(self):
"""Regular keyword query skips slug lookup and goes to search."""
mock_store = MagicMock()
mock_search_results = MagicMock()
mock_agent = MagicMock()
mock_agent.creator = "creator1"
mock_agent.slug = "email-agent"
mock_agent.agent_name = "Email Agent"
mock_agent.description = "Sends emails"
mock_agent.rating = 4.0
mock_agent.runs = 200
mock_search_results.agents = [mock_agent]
mock_store.get_store_agents = AsyncMock(return_value=mock_search_results)
with patch(
"backend.copilot.tools.agent_search.store_db",
return_value=mock_store,
):
response = await search_agents(
query="email",
source="marketplace",
session_id="test-session",
)
assert isinstance(response, AgentsFoundResponse)
# get_store_agent_details should NOT have been called
mock_store.get_store_agent_details.assert_not_called()
class TestLibraryUUIDLookup:
"""Tests for UUID direct lookup in library search."""
@pytest.mark.asyncio(loop_scope="session")
async def test_uuid_lookup_found_by_graph_id(self):
"""UUID query matching a graph_id returns the agent directly."""
agent_id = "a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d"
mock_agent = MagicMock()
mock_agent.id = "lib-agent-id"
mock_agent.name = "My Library Agent"
mock_agent.description = "A library agent"
mock_agent.creator_name = "testuser"
mock_agent.status.value = "HEALTHY"
mock_agent.can_access_graph = True
mock_agent.has_external_trigger = False
mock_agent.new_output = False
mock_agent.graph_id = agent_id
mock_agent.graph_version = 1
mock_agent.input_schema = {}
mock_agent.output_schema = {}
mock_lib_db = MagicMock()
mock_lib_db.get_library_agent_by_graph_id = AsyncMock(return_value=mock_agent)
with patch(
"backend.copilot.tools.agent_search.library_db",
return_value=mock_lib_db,
):
response = await search_agents(
query=agent_id,
source="library",
session_id="test-session",
user_id=_TEST_USER_ID,
)
assert isinstance(response, AgentsFoundResponse)
assert response.count == 1
assert response.agents[0].name == "My Library Agent"

View File

@@ -164,9 +164,8 @@ class BaseTool:
"""
if self.requires_auth and not user_id:
logger.warning(
"Attempted tool call for %s but user not authenticated",
self.name,
logger.error(
f"Attempted tool call for {self.name} but user not authenticated"
)
return StreamToolOutputAvailable(
toolCallId=tool_call_id,
@@ -197,7 +196,7 @@ class BaseTool:
output=raw_output,
)
except Exception as e:
logger.warning("Error in %s", self.name, exc_info=True)
logger.error(f"Error in {self.name}: {e}", exc_info=True)
return StreamToolOutputAvailable(
toolCallId=tool_call_id,
toolName=self.name,

View File

@@ -22,7 +22,6 @@ from e2b import AsyncSandbox
from e2b.exceptions import TimeoutException
from backend.copilot.context import E2B_WORKDIR, get_current_sandbox
from backend.copilot.integration_creds import get_integration_env_vars
from backend.copilot.model import ChatSession
from .base import BaseTool
@@ -42,9 +41,15 @@ class BashExecTool(BaseTool):
@property
def description(self) -> str:
return (
"Execute a Bash command or script. Shares filesystem with SDK file tools. "
"Useful for scripts, data processing, and package installation. "
"Killed after timeout (default 30s, max 120s)."
"Execute a Bash command or script. "
"Full Bash scripting is supported (loops, conditionals, pipes, "
"functions, etc.). "
"The working directory is shared with the SDK Read/Write/Edit/Glob/Grep "
"tools — files created by either are immediately visible to both. "
"Execution is killed after the timeout (default 30s, max 120s). "
"Returns stdout and stderr. "
"Useful for file manipulation, data processing, running scripts, "
"and installing packages."
)
@property
@@ -54,11 +59,13 @@ class BashExecTool(BaseTool):
"properties": {
"command": {
"type": "string",
"description": "Bash command or script.",
"description": "Bash command or script to execute.",
},
"timeout": {
"type": "integer",
"description": "Max seconds (default 30, max 120).",
"description": (
"Max execution time in seconds (default 30, max 120)."
),
"default": 30,
},
},
@@ -67,10 +74,7 @@ class BashExecTool(BaseTool):
@property
def requires_auth(self) -> bool:
# True because _execute_on_e2b injects user tokens (GH_TOKEN etc.)
# when user_id is present. Defense-in-depth: ensures only authenticated
# users reach the token injection path.
return True
return False
async def _execute(
self,
@@ -78,14 +82,6 @@ class BashExecTool(BaseTool):
session: ChatSession,
**kwargs: Any,
) -> ToolResponseBase:
"""Run a bash command on E2B (if available) or in a bubblewrap sandbox.
Dispatches to :meth:`_execute_on_e2b` when a sandbox is present in the
current execution context, otherwise falls back to the local bubblewrap
sandbox. Returns a :class:`BashExecResponse` on success or an
:class:`ErrorResponse` when the sandbox is unavailable or the command
is empty.
"""
session_id = session.session_id if session else None
command: str = (kwargs.get("command") or "").strip()
@@ -100,9 +96,7 @@ class BashExecTool(BaseTool):
sandbox = get_current_sandbox()
if sandbox is not None:
return await self._execute_on_e2b(
sandbox, command, timeout, session_id, user_id
)
return await self._execute_on_e2b(sandbox, command, timeout, session_id)
# Bubblewrap fallback: local isolated execution.
if not has_full_sandbox():
@@ -139,42 +133,19 @@ class BashExecTool(BaseTool):
command: str,
timeout: int,
session_id: str | None,
user_id: str | None = None,
) -> ToolResponseBase:
"""Execute *command* on the E2B sandbox via commands.run().
Integration tokens (e.g. GH_TOKEN) are injected into the sandbox env
for any user with connected accounts. E2B has full internet access, so
CLI tools like ``gh`` work without manual authentication.
"""
envs: dict[str, str] = {
"PATH": "/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin",
}
# Collect injected secret values so we can scrub them from output.
secret_values: list[str] = []
if user_id is not None:
integration_env = await get_integration_env_vars(user_id)
secret_values = [v for v in integration_env.values() if v]
envs.update(integration_env)
"""Execute *command* on the E2B sandbox via commands.run()."""
try:
result = await sandbox.commands.run(
f"bash -c {shlex.quote(command)}",
cwd=E2B_WORKDIR,
timeout=timeout,
envs=envs,
envs={"PATH": "/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin"},
)
stdout = result.stdout or ""
stderr = result.stderr or ""
# Scrub injected tokens from command output to prevent exfiltration
# via `echo $GH_TOKEN`, `env`, `printenv`, etc.
for secret in secret_values:
stdout = stdout.replace(secret, "[REDACTED]")
stderr = stderr.replace(secret, "[REDACTED]")
return BashExecResponse(
message=f"Command executed on E2B (exit {result.exit_code})",
stdout=stdout,
stderr=stderr,
stdout=result.stdout or "",
stderr=result.stderr or "",
exit_code=result.exit_code,
timed_out=False,
session_id=session_id,

View File

@@ -1,78 +0,0 @@
"""Tests for BashExecTool — E2B path with token injection."""
from unittest.mock import AsyncMock, MagicMock, patch
import pytest
from ._test_data import make_session
from .bash_exec import BashExecTool
from .models import BashExecResponse
_USER = "user-bash-exec-test"
def _make_tool() -> BashExecTool:
return BashExecTool()
def _make_sandbox(exit_code: int = 0, stdout: str = "", stderr: str = "") -> MagicMock:
result = MagicMock()
result.exit_code = exit_code
result.stdout = stdout
result.stderr = stderr
sandbox = MagicMock()
sandbox.commands.run = AsyncMock(return_value=result)
return sandbox
class TestBashExecE2BTokenInjection:
@pytest.mark.asyncio(loop_scope="session")
async def test_token_injected_when_user_id_set(self):
"""When user_id is provided, integration env vars are merged into sandbox envs."""
tool = _make_tool()
session = make_session(user_id=_USER)
sandbox = _make_sandbox(stdout="ok")
env_vars = {"GH_TOKEN": "gh-secret", "GITHUB_TOKEN": "gh-secret"}
with patch(
"backend.copilot.tools.bash_exec.get_integration_env_vars",
new=AsyncMock(return_value=env_vars),
) as mock_get_env:
result = await tool._execute_on_e2b(
sandbox=sandbox,
command="echo hi",
timeout=10,
session_id=session.session_id,
user_id=_USER,
)
mock_get_env.assert_awaited_once_with(_USER)
call_kwargs = sandbox.commands.run.call_args[1]
assert call_kwargs["envs"]["GH_TOKEN"] == "gh-secret"
assert call_kwargs["envs"]["GITHUB_TOKEN"] == "gh-secret"
assert isinstance(result, BashExecResponse)
@pytest.mark.asyncio(loop_scope="session")
async def test_no_token_injection_when_user_id_is_none(self):
"""When user_id is None, get_integration_env_vars must NOT be called."""
tool = _make_tool()
session = make_session(user_id=_USER)
sandbox = _make_sandbox(stdout="ok")
with patch(
"backend.copilot.tools.bash_exec.get_integration_env_vars",
new=AsyncMock(return_value={"GH_TOKEN": "should-not-appear"}),
) as mock_get_env:
result = await tool._execute_on_e2b(
sandbox=sandbox,
command="echo hi",
timeout=10,
session_id=session.session_id,
user_id=None,
)
mock_get_env.assert_not_called()
call_kwargs = sandbox.commands.run.call_args[1]
assert "GH_TOKEN" not in call_kwargs["envs"]
assert isinstance(result, BashExecResponse)

View File

@@ -1,196 +0,0 @@
"""Tool for prompting the user to connect a required integration.
When the copilot encounters an authentication failure (e.g. `gh` CLI returns
"authentication required"), it calls this tool to surface the credentials
setup card in the chat — the same UI that appears when a GitHub block runs
without configured credentials.
"""
from typing import Any, TypedDict
from backend.copilot.model import ChatSession
from backend.copilot.providers import SUPPORTED_PROVIDERS, get_provider_auth_types
from backend.copilot.tools.models import (
ErrorResponse,
ResponseType,
SetupInfo,
SetupRequirementsResponse,
ToolResponseBase,
UserReadiness,
)
from .base import BaseTool
class _CredentialEntry(TypedDict):
"""Shape of each entry inside SetupRequirementsResponse.user_readiness.missing_credentials.
Partially overlaps with :class:`~backend.data.model.CredentialsMetaInput`
(``id``, ``title``, ``provider``) but carries extra UI-facing fields
(``types``, ``scopes``) that the frontend ``SetupRequirementsCard`` needs
to render the inline credential setup card.
Display name is derived from :data:`SUPPORTED_PROVIDERS` at build time
rather than stored here — eliminates the old ``provider_name`` field.
``types`` replaces the old singular ``type`` field; the frontend already
prefers ``types`` and only fell back to ``type`` for compatibility.
"""
id: str
title: str
# Slug used as the credential key (e.g. "github").
provider: str
# All supported credential types the user can choose from (e.g. ["api_key", "oauth2"]).
# The first element is the default/primary type.
types: list[str]
scopes: list[str]
class ConnectIntegrationTool(BaseTool):
"""Surface the credentials setup UI when an integration is not connected."""
@property
def name(self) -> str:
return "connect_integration"
@property
def description(self) -> str:
return (
"Prompt the user to connect a required integration (e.g. GitHub). "
"Call this when an external CLI or API call fails because the user "
"has not connected the relevant account. "
"The tool surfaces a credentials setup card in the chat so the user "
"can authenticate without leaving the page. "
"After the user connects the account, retry the operation. "
"In E2B/cloud sandbox mode the token (GH_TOKEN/GITHUB_TOKEN) is "
"automatically injected per-command in bash_exec — no manual export needed. "
"In local bubblewrap mode network is isolated so GitHub CLI commands "
"will still fail after connecting; inform the user of this limitation."
)
@property
def parameters(self) -> dict[str, Any]:
return {
"type": "object",
"properties": {
"provider": {
"type": "string",
"description": (
"Integration provider slug, e.g. 'github'. "
"Must be one of the supported providers."
),
"enum": list(SUPPORTED_PROVIDERS.keys()),
},
"reason": {
"type": "string",
"description": (
"Brief explanation of why the integration is needed, "
"shown to the user in the setup card."
),
"maxLength": 500,
},
"scopes": {
"type": "array",
"items": {"type": "string"},
"description": (
"OAuth scopes to request. Omit to use the provider default. "
"Add extra scopes when you need more access — e.g. for GitHub: "
"'repo' (clone/push/pull), 'read:org' (org membership), "
"'workflow' (GitHub Actions). "
"Requesting only the scopes you actually need is best practice."
),
},
},
"required": ["provider"],
}
@property
def requires_auth(self) -> bool:
# Require auth so only authenticated users can trigger the setup card.
# The card itself is user-agnostic (no per-user data needed), so
# user_id is intentionally unused in _execute.
return True
async def _execute(
self,
user_id: str | None,
session: ChatSession,
**kwargs: Any,
) -> ToolResponseBase:
"""Build and return a :class:`SetupRequirementsResponse` for the requested provider.
Validates the *provider* slug against the known registry, merges any
agent-requested OAuth *scopes* with the provider defaults, and constructs
the credential setup card payload that the frontend renders as an inline
authentication prompt.
Returns an :class:`ErrorResponse` if *provider* is unknown.
"""
_ = user_id # setup card is user-agnostic; auth is enforced via requires_auth
session_id = session.session_id if session else None
provider: str = (kwargs.get("provider") or "").strip().lower()
reason: str = (kwargs.get("reason") or "").strip()[
:500
] # cap LLM-controlled text
extra_scopes: list[str] = [
str(s).strip() for s in (kwargs.get("scopes") or []) if str(s).strip()
]
entry = SUPPORTED_PROVIDERS.get(provider)
if not entry:
supported = ", ".join(f"'{p}'" for p in SUPPORTED_PROVIDERS)
return ErrorResponse(
message=(
f"Unknown provider '{provider}'. "
f"Supported providers: {supported}."
),
error="unknown_provider",
session_id=session_id,
)
display_name: str = entry["name"]
supported_types: list[str] = get_provider_auth_types(provider)
# Merge agent-requested scopes with provider defaults (deduplicated, order preserved).
default_scopes: list[str] = entry["default_scopes"]
seen: set[str] = set()
scopes: list[str] = []
for s in default_scopes + extra_scopes:
if s not in seen:
seen.add(s)
scopes.append(s)
field_key = f"{provider}_credentials"
message_parts = [
f"To continue, please connect your {display_name} account.",
]
if reason:
message_parts.append(reason)
credential_entry: _CredentialEntry = {
"id": field_key,
"title": f"{display_name} Credentials",
"provider": provider,
"types": supported_types,
"scopes": scopes,
}
missing_credentials: dict[str, _CredentialEntry] = {field_key: credential_entry}
return SetupRequirementsResponse(
type=ResponseType.SETUP_REQUIREMENTS,
message=" ".join(message_parts),
session_id=session_id,
setup_info=SetupInfo(
agent_id=f"connect_{provider}",
agent_name=display_name,
user_readiness=UserReadiness(
has_all_credentials=False,
missing_credentials=missing_credentials,
ready_to_run=False,
),
requirements={
"credentials": [missing_credentials[field_key]],
"inputs": [],
"execution_modes": [],
},
),
)

View File

@@ -1,135 +0,0 @@
"""Tests for ConnectIntegrationTool."""
import pytest
from ._test_data import make_session
from .connect_integration import ConnectIntegrationTool
from .models import ErrorResponse, SetupRequirementsResponse
_TEST_USER_ID = "test-user-connect-integration"
class TestConnectIntegrationTool:
def _make_tool(self) -> ConnectIntegrationTool:
return ConnectIntegrationTool()
@pytest.mark.asyncio(loop_scope="session")
async def test_unknown_provider_returns_error(self):
tool = self._make_tool()
session = make_session(user_id=_TEST_USER_ID)
result = await tool._execute(
user_id=_TEST_USER_ID, session=session, provider="nonexistent"
)
assert isinstance(result, ErrorResponse)
assert result.error == "unknown_provider"
assert "nonexistent" in result.message
assert "github" in result.message
@pytest.mark.asyncio(loop_scope="session")
async def test_empty_provider_returns_error(self):
tool = self._make_tool()
session = make_session(user_id=_TEST_USER_ID)
result = await tool._execute(
user_id=_TEST_USER_ID, session=session, provider=""
)
assert isinstance(result, ErrorResponse)
assert result.error == "unknown_provider"
@pytest.mark.asyncio(loop_scope="session")
async def test_github_provider_returns_setup_response(self):
tool = self._make_tool()
session = make_session(user_id=_TEST_USER_ID)
result = await tool._execute(
user_id=_TEST_USER_ID, session=session, provider="github"
)
assert isinstance(result, SetupRequirementsResponse)
assert result.setup_info.agent_name == "GitHub"
assert result.setup_info.agent_id == "connect_github"
@pytest.mark.asyncio(loop_scope="session")
async def test_github_has_missing_credentials_in_readiness(self):
tool = self._make_tool()
session = make_session(user_id=_TEST_USER_ID)
result = await tool._execute(
user_id=_TEST_USER_ID, session=session, provider="github"
)
assert isinstance(result, SetupRequirementsResponse)
readiness = result.setup_info.user_readiness
assert readiness.has_all_credentials is False
assert readiness.ready_to_run is False
assert "github_credentials" in readiness.missing_credentials
@pytest.mark.asyncio(loop_scope="session")
async def test_github_requirements_include_credential_entry(self):
tool = self._make_tool()
session = make_session(user_id=_TEST_USER_ID)
result = await tool._execute(
user_id=_TEST_USER_ID, session=session, provider="github"
)
assert isinstance(result, SetupRequirementsResponse)
creds = result.setup_info.requirements["credentials"]
assert len(creds) == 1
assert creds[0]["provider"] == "github"
assert creds[0]["id"] == "github_credentials"
@pytest.mark.asyncio(loop_scope="session")
async def test_reason_appears_in_message(self):
tool = self._make_tool()
session = make_session(user_id=_TEST_USER_ID)
reason = "Needed to create a pull request."
result = await tool._execute(
user_id=_TEST_USER_ID, session=session, provider="github", reason=reason
)
assert isinstance(result, SetupRequirementsResponse)
assert reason in result.message
@pytest.mark.asyncio(loop_scope="session")
async def test_session_id_propagated(self):
tool = self._make_tool()
session = make_session(user_id=_TEST_USER_ID)
result = await tool._execute(
user_id=_TEST_USER_ID, session=session, provider="github"
)
assert isinstance(result, SetupRequirementsResponse)
assert result.session_id == session.session_id
@pytest.mark.asyncio(loop_scope="session")
async def test_provider_case_insensitive(self):
"""Provider slug is normalised to lowercase before lookup."""
tool = self._make_tool()
session = make_session(user_id=_TEST_USER_ID)
result = await tool._execute(
user_id=_TEST_USER_ID, session=session, provider="GitHub"
)
assert isinstance(result, SetupRequirementsResponse)
def test_tool_name(self):
assert ConnectIntegrationTool().name == "connect_integration"
def test_requires_auth(self):
assert ConnectIntegrationTool().requires_auth is True
@pytest.mark.asyncio(loop_scope="session")
async def test_unauthenticated_user_gets_need_login_response(self):
"""execute() with user_id=None must return NeedLoginResponse, not the setup card.
This verifies that the requires_auth guard in BaseTool.execute() fires
before _execute() is called, so unauthenticated callers cannot probe
which integrations are configured.
"""
import json
tool = self._make_tool()
# Session still needs a user_id string; the None is passed to execute()
# to simulate an unauthenticated call.
session = make_session(user_id=_TEST_USER_ID)
result = await tool.execute(
user_id=None,
session=session,
tool_call_id="test-call-id",
provider="github",
)
raw = result.output
output = json.loads(raw) if isinstance(raw, str) else raw
assert output.get("type") == "need_login"
assert result.success is False

View File

@@ -30,7 +30,12 @@ class ContinueRunBlockTool(BaseTool):
@property
def description(self) -> str:
return "Resume block execution after a run_block call returned review_required. Pass the review_id."
return (
"Continue executing a block after human review approval. "
"Use this after a run_block call returned review_required. "
"Pass the review_id from the review_required response. "
"The block will execute with the original pre-approved input data."
)
@property
def parameters(self) -> dict[str, Any]:
@@ -39,7 +44,10 @@ class ContinueRunBlockTool(BaseTool):
"properties": {
"review_id": {
"type": "string",
"description": "review_id from the review_required response.",
"description": (
"The review_id from a previous review_required response. "
"This resumes execution with the pre-approved input data."
),
},
},
"required": ["review_id"],

View File

@@ -23,8 +23,12 @@ class CreateAgentTool(BaseTool):
@property
def description(self) -> str:
return (
"Create a new agent from JSON (nodes + links). Validates, auto-fixes, and saves. "
"Before calling, search for existing agents with find_library_agent."
"Create a new agent workflow. Pass `agent_json` with the complete "
"agent graph JSON you generated using block schemas from find_block. "
"The tool validates, auto-fixes, and saves.\n\n"
"IMPORTANT: Before calling this tool, search for relevant existing agents "
"using find_library_agent that could be used as building blocks. "
"Pass their IDs in the library_agent_ids parameter."
)
@property
@@ -38,21 +42,34 @@ class CreateAgentTool(BaseTool):
"properties": {
"agent_json": {
"type": "object",
"description": "Agent graph with 'nodes' and 'links' arrays.",
"description": (
"The agent JSON to validate and save. "
"Must contain 'nodes' and 'links' arrays, and optionally "
"'name' and 'description'."
),
},
"library_agent_ids": {
"type": "array",
"items": {"type": "string"},
"description": "Library agent IDs as building blocks.",
"description": (
"List of library agent IDs to use as building blocks."
),
},
"save": {
"type": "boolean",
"description": "Save the agent (default: true). False for preview.",
"description": (
"Whether to save the agent. Default is true. "
"Set to false for preview only."
),
"default": True,
},
"folder_id": {
"type": "string",
"description": "Folder ID to save into (default: root).",
"description": (
"Optional folder ID to save the agent into. "
"If not provided, the agent is saved at root level. "
"Use list_folders to find available folders."
),
},
},
"required": ["agent_json"],

View File

@@ -23,7 +23,9 @@ class CustomizeAgentTool(BaseTool):
@property
def description(self) -> str:
return (
"Customize a marketplace/template agent. Validates, auto-fixes, and saves."
"Customize a marketplace or template agent. Pass `agent_json` "
"with the complete customized agent JSON. The tool validates, "
"auto-fixes, and saves."
)
@property
@@ -37,21 +39,32 @@ class CustomizeAgentTool(BaseTool):
"properties": {
"agent_json": {
"type": "object",
"description": "Customized agent JSON with nodes and links.",
"description": (
"Complete customized agent JSON to validate and save. "
"Optionally include 'name' and 'description'."
),
},
"library_agent_ids": {
"type": "array",
"items": {"type": "string"},
"description": "Library agent IDs as building blocks.",
"description": (
"List of library agent IDs to use as building blocks."
),
},
"save": {
"type": "boolean",
"description": "Save the agent (default: true). False for preview.",
"description": (
"Whether to save the customized agent. Default is true."
),
"default": True,
},
"folder_id": {
"type": "string",
"description": "Folder ID to save into (default: root).",
"description": (
"Optional folder ID to save the agent into. "
"If not provided, the agent is saved at root level. "
"Use list_folders to find available folders."
),
},
},
"required": ["agent_json"],

View File

@@ -41,7 +41,8 @@ import contextlib
import logging
from typing import Any, Awaitable, Callable, Literal
from e2b import AsyncSandbox, SandboxLifecycle
from e2b import AsyncSandbox
from e2b.sandbox.sandbox_api import SandboxLifecycle
from backend.data.redis_client import get_redis_async

View File

@@ -23,8 +23,12 @@ class EditAgentTool(BaseTool):
@property
def description(self) -> str:
return (
"Edit an existing agent. Validates, auto-fixes, and saves. "
"Before calling, search for existing agents with find_library_agent."
"Edit an existing agent. Pass `agent_json` with the complete "
"updated agent JSON you generated. The tool validates, auto-fixes, "
"and saves.\n\n"
"IMPORTANT: Before calling this tool, if the changes involve adding new "
"functionality, search for relevant existing agents using find_library_agent "
"that could be used as building blocks."
)
@property
@@ -38,20 +42,33 @@ class EditAgentTool(BaseTool):
"properties": {
"agent_id": {
"type": "string",
"description": "Graph ID or library agent ID to edit.",
"description": (
"The ID of the agent to edit. "
"Can be a graph ID or library agent ID."
),
},
"agent_json": {
"type": "object",
"description": "Updated agent JSON with nodes and links.",
"description": (
"Complete updated agent JSON to validate and save. "
"Must contain 'nodes' and 'links'. "
"Include 'name' and/or 'description' if they need "
"to be updated."
),
},
"library_agent_ids": {
"type": "array",
"items": {"type": "string"},
"description": "Library agent IDs as building blocks.",
"description": (
"List of library agent IDs to use as building blocks for the changes."
),
},
"save": {
"type": "boolean",
"description": "Save changes (default: true). False for preview.",
"description": (
"Whether to save the changes. "
"Default is true. Set to false for preview only."
),
"default": True,
},
},

View File

@@ -134,7 +134,11 @@ class SearchFeatureRequestsTool(BaseTool):
@property
def description(self) -> str:
return "Search existing feature requests. Check before creating a new one."
return (
"Search existing feature requests to check if a similar request "
"already exists before creating a new one. Returns matching feature "
"requests with their ID, title, and description."
)
@property
def parameters(self) -> dict[str, Any]:
@@ -230,9 +234,14 @@ class CreateFeatureRequestTool(BaseTool):
@property
def description(self) -> str:
return (
"Create a feature request or add need to existing one. "
"Search first to avoid duplicates. Pass existing_issue_id to add to existing. "
"Never include PII (names, emails, phone numbers, company names) in title/description."
"Create a new feature request or add a customer need to an existing one. "
"Always search first with search_feature_requests to avoid duplicates. "
"If a matching request exists, pass its ID as existing_issue_id to add "
"the user's need to it instead of creating a duplicate. "
"IMPORTANT: Never include personally identifiable information (PII) in "
"the title or description — no names, emails, phone numbers, company "
"names, or other identifying details. Write titles and descriptions in "
"generic, feature-focused language."
)
@property
@@ -242,15 +251,28 @@ class CreateFeatureRequestTool(BaseTool):
"properties": {
"title": {
"type": "string",
"description": "Feature request title. No names, emails, or company info.",
"description": (
"Title for the feature request. Must be generic and "
"feature-focused — do not include any user names, emails, "
"company names, or other PII."
),
},
"description": {
"type": "string",
"description": "What the user wants and why. No names, emails, or company info.",
"description": (
"Detailed description of what the user wants and why. "
"Must not contain any personally identifiable information "
"(PII) — describe the feature need generically without "
"referencing specific users, companies, or contact details."
),
},
"existing_issue_id": {
"type": "string",
"description": "Linear issue ID to add need to (from search results).",
"description": (
"If adding a need to an existing feature request, "
"provide its Linear issue ID (from search results). "
"Omit to create a new feature request."
),
},
},
"required": ["title", "description"],

View File

@@ -18,7 +18,9 @@ class FindAgentTool(BaseTool):
@property
def description(self) -> str:
return "Search marketplace agents by capability, or look up by slug ('username/agent-name')."
return (
"Discover agents from the marketplace based on capabilities and user needs."
)
@property
def parameters(self) -> dict[str, Any]:
@@ -27,7 +29,7 @@ class FindAgentTool(BaseTool):
"properties": {
"query": {
"type": "string",
"description": "Search keywords, or 'username/agent-name' for direct slug lookup.",
"description": "Search query describing what the user wants to accomplish. Use single keywords for best results.",
},
},
"required": ["query"],
@@ -36,7 +38,6 @@ class FindAgentTool(BaseTool):
async def _execute(
self, user_id: str | None, session: ChatSession, **kwargs
) -> ToolResponseBase:
"""Search marketplace for agents matching the query."""
return await search_agents(
query=kwargs.get("query", "").strip(),
source="marketplace",

View File

@@ -15,7 +15,6 @@ from .models import (
ErrorResponse,
NoResultsResponse,
)
from .utils import is_uuid
logger = logging.getLogger(__name__)
@@ -38,8 +37,7 @@ COPILOT_EXCLUDED_BLOCK_TYPES = {
# Specific block IDs excluded from CoPilot (STANDARD type but still require graph context)
COPILOT_EXCLUDED_BLOCK_IDS = {
# SmartDecisionMakerBlock - dynamically discovers downstream blocks via graph topology;
# usable in agent graphs (guide hardcodes its ID) but cannot run standalone.
# SmartDecisionMakerBlock - dynamically discovers downstream blocks via graph topology
"3b191d9f-356f-482d-8238-ba04b6d18381",
}
@@ -54,9 +52,12 @@ class FindBlockTool(BaseTool):
@property
def description(self) -> str:
return (
"Search blocks by name or description. Returns block IDs for run_block. "
"Always call this FIRST to get block IDs before using run_block. "
"Then call run_block with the block's id and empty input_data to see its detailed schema."
"Search for available blocks by name or description. "
"Blocks are reusable components that perform specific tasks like "
"sending emails, making API calls, processing text, etc. "
"IMPORTANT: Use this tool FIRST to get the block's 'id' before calling run_block. "
"The response includes each block's id, name, and description. "
"Call run_block with the block's id **with no inputs** to see detailed inputs/outputs and execute it."
)
@property
@@ -66,11 +67,18 @@ class FindBlockTool(BaseTool):
"properties": {
"query": {
"type": "string",
"description": "Search keywords (e.g. 'email', 'http', 'ai').",
"description": (
"Search query to find blocks by name or description. "
"Use keywords like 'email', 'http', 'text', 'ai', etc."
),
},
"include_schemas": {
"type": "boolean",
"description": "Include full input/output schemas (for agent JSON generation).",
"description": (
"If true, include full input_schema and output_schema "
"for each block. Use when generating agent JSON that "
"needs block schemas. Default is false."
),
"default": False,
},
},
@@ -105,77 +113,11 @@ class FindBlockTool(BaseTool):
if not query:
return ErrorResponse(
message="Please provide a search query or block ID",
message="Please provide a search query",
session_id=session_id,
)
try:
# Direct ID lookup if query looks like a UUID
if is_uuid(query):
block = get_block(query.lower())
if block:
if block.disabled:
return NoResultsResponse(
message=f"Block '{block.name}' (ID: {block.id}) is disabled and cannot be used.",
suggestions=["Search for an alternative block by name"],
session_id=session_id,
)
if (
block.block_type in COPILOT_EXCLUDED_BLOCK_TYPES
or block.id in COPILOT_EXCLUDED_BLOCK_IDS
):
if block.block_type == BlockType.MCP_TOOL:
return NoResultsResponse(
message=(
f"Block '{block.name}' (ID: {block.id}) is not "
"runnable through find_block/run_block. Use "
"run_mcp_tool instead."
),
suggestions=[
"Use run_mcp_tool to discover and run this MCP tool",
"Search for an alternative block by name",
],
session_id=session_id,
)
return NoResultsResponse(
message=(
f"Block '{block.name}' (ID: {block.id}) is not available "
"in CoPilot. It can only be used within agent graphs."
),
suggestions=[
"Search for an alternative block by name",
"Use this block in an agent graph instead",
],
session_id=session_id,
)
summary = BlockInfoSummary(
id=block.id,
name=block.name,
description=(
block.optimized_description or block.description or ""
),
categories=[c.value for c in block.categories],
)
if include_schemas:
info = block.get_info()
summary.input_schema = info.inputSchema
summary.output_schema = info.outputSchema
summary.static_output = info.staticOutput
return BlockListResponse(
message=(
f"Found block '{block.name}' by ID. "
"To see inputs/outputs and execute it, use "
"run_block with the block's 'id' - providing "
"no inputs."
),
blocks=[summary],
count=1,
query=query,
session_id=session_id,
)
# Search for blocks using hybrid search
results, total = await search().unified_hybrid_search(
query=query,

View File

@@ -499,123 +499,3 @@ class TestFindBlockFiltering:
assert response.blocks[0].input_schema == input_schema
assert response.blocks[0].output_schema == output_schema
assert response.blocks[0].static_output is True
class TestFindBlockDirectLookup:
"""Tests for direct UUID lookup in FindBlockTool."""
@pytest.mark.asyncio(loop_scope="session")
async def test_uuid_lookup_found(self):
"""UUID query returns the block directly without search."""
session = make_session(user_id=_TEST_USER_ID)
block_id = "a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d"
block = make_mock_block(block_id, "Test Block", BlockType.STANDARD)
with patch(
"backend.copilot.tools.find_block.get_block",
return_value=block,
):
tool = FindBlockTool()
response = await tool._execute(
user_id=_TEST_USER_ID, session=session, query=block_id
)
assert isinstance(response, BlockListResponse)
assert response.count == 1
assert response.blocks[0].id == block_id
assert response.blocks[0].name == "Test Block"
@pytest.mark.asyncio(loop_scope="session")
async def test_uuid_lookup_not_found_falls_through(self):
"""UUID that doesn't match any block falls through to search."""
session = make_session(user_id=_TEST_USER_ID)
block_id = "a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d"
mock_search_db = MagicMock()
mock_search_db.unified_hybrid_search = AsyncMock(return_value=([], 0))
with (
patch(
"backend.copilot.tools.find_block.get_block",
return_value=None,
),
patch(
"backend.copilot.tools.find_block.search",
return_value=mock_search_db,
),
):
tool = FindBlockTool()
response = await tool._execute(
user_id=_TEST_USER_ID, session=session, query=block_id
)
from .models import NoResultsResponse
assert isinstance(response, NoResultsResponse)
@pytest.mark.asyncio(loop_scope="session")
async def test_uuid_lookup_disabled_block(self):
"""UUID matching a disabled block returns NoResultsResponse."""
session = make_session(user_id=_TEST_USER_ID)
block_id = "a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d"
block = make_mock_block(
block_id, "Disabled Block", BlockType.STANDARD, disabled=True
)
with patch(
"backend.copilot.tools.find_block.get_block",
return_value=block,
):
tool = FindBlockTool()
response = await tool._execute(
user_id=_TEST_USER_ID, session=session, query=block_id
)
from .models import NoResultsResponse
assert isinstance(response, NoResultsResponse)
assert "disabled" in response.message.lower()
@pytest.mark.asyncio(loop_scope="session")
async def test_uuid_lookup_excluded_block_type(self):
"""UUID matching an excluded block type returns NoResultsResponse."""
session = make_session(user_id=_TEST_USER_ID)
block_id = "a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d"
block = make_mock_block(block_id, "Input Block", BlockType.INPUT)
with patch(
"backend.copilot.tools.find_block.get_block",
return_value=block,
):
tool = FindBlockTool()
response = await tool._execute(
user_id=_TEST_USER_ID, session=session, query=block_id
)
from .models import NoResultsResponse
assert isinstance(response, NoResultsResponse)
assert "not available" in response.message.lower()
@pytest.mark.asyncio(loop_scope="session")
async def test_uuid_lookup_excluded_block_id(self):
"""UUID matching an excluded block ID returns NoResultsResponse."""
session = make_session(user_id=_TEST_USER_ID)
smart_decision_id = "3b191d9f-356f-482d-8238-ba04b6d18381"
block = make_mock_block(
smart_decision_id, "Smart Decision Maker", BlockType.STANDARD
)
with patch(
"backend.copilot.tools.find_block.get_block",
return_value=block,
):
tool = FindBlockTool()
response = await tool._execute(
user_id=_TEST_USER_ID, session=session, query=smart_decision_id
)
from .models import NoResultsResponse
assert isinstance(response, NoResultsResponse)
assert "not available" in response.message.lower()

View File

@@ -19,8 +19,13 @@ class FindLibraryAgentTool(BaseTool):
@property
def description(self) -> str:
return (
"Search user's library agents. Returns graph_id, schemas for sub-agent composition. "
"Omit query to list all."
"Search for or list agents in the user's library. Use this to find "
"agents the user has already added to their library, including agents "
"they created or added from the marketplace. "
"When creating agents with sub-agent composition, use this to get "
"the agent's graph_id, graph_version, input_schema, and output_schema "
"needed for AgentExecutorBlock nodes. "
"Omit the query to list all agents."
)
@property
@@ -30,7 +35,10 @@ class FindLibraryAgentTool(BaseTool):
"properties": {
"query": {
"type": "string",
"description": "Search by name/description. Omit to list all.",
"description": (
"Search query to find agents by name or description. "
"Omit to list all agents in the library."
),
},
},
"required": [],

View File

@@ -22,10 +22,20 @@ class FixAgentGraphTool(BaseTool):
@property
def description(self) -> str:
return (
"Auto-fix common agent JSON issues: missing/invalid UUIDs, StoreValueBlock prerequisites, "
"double curly brace escaping, AddToList/AddToDictionary prerequisites, credentials, "
"node spacing, AI model defaults, link static properties, and type mismatches. "
"Returns fixed JSON and list of fixes applied."
"Auto-fix common issues in an agent JSON graph. Applies fixes for:\n"
"- Missing or invalid UUIDs on nodes and links\n"
"- StoreValueBlock prerequisites for ConditionBlock\n"
"- Double curly brace escaping in prompt templates\n"
"- AddToList/AddToDictionary prerequisite blocks\n"
"- CodeExecutionBlock output field naming\n"
"- Missing credentials configuration\n"
"- Node X coordinate spacing (800+ units apart)\n"
"- AI model default parameters\n"
"- Link static properties based on input schema\n"
"- Type mismatches (inserts conversion blocks)\n\n"
"Returns the fixed agent JSON plus a list of fixes applied. "
"After fixing, the agent is re-validated. If still invalid, "
"the remaining errors are included in the response."
)
@property

Some files were not shown because too many files have changed in this diff Show More