mirror of
https://github.com/Significant-Gravitas/AutoGPT.git
synced 2026-04-08 03:00:28 -04:00
Compare commits
107 Commits
feat/keep-
...
codex/plat
| Author | SHA1 | Date | |
|---|---|---|---|
| | 5cc72e7608 | | |
| | fa0650214d | | |
| | 3e016508d4 | | |
| | b80d7abda9 | | |
| | 0e310c788a | | |
| | 91af007c18 | | |
| | e7ca81ed89 | | |
| | 5164fa878f | | |
| | cf605ef5a3 | | |
| | e7bd05c6f1 | | |
| | 22fb3549e3 | | |
| | 1c3fe1444e | | |
| | b89321a688 | | |
| | 67bdef13e7 | | |
| | e67dd93ee8 | | |
| | 3140a60816 | | |
| | 41c2ee9f83 | | |
| | 630d6d4705 | | |
| | 7c685c6677 | | |
| | ca748ee12a | | |
| | bbdf13c7a8 | | |
| | e1ea4cf326 | | |
| | db6b4444e0 | | |
| | 9b1175473b | | |
| | 752a238166 | | |
| | 2a73d1baa9 | | |
| | 254e6057f4 | | |
| | a616e5a060 | | |
| | c9461836c6 | | |
| | 50a8df3d67 | | |
| | 243b12778f | | |
| | 3f7a8dc44d | | |
| | 1c15d6a6cc | | |
| | a31be77408 | | |
| | 1d45f2f18c | | |
| | 27e34e9514 | | |
| | 16d696edcc | | |
| | f87bbd5966 | | |
| | b64d1ed9fa | | |
| | 43c81910ae | | |
| | 3895d95826 | | |
| | 181208528f | | |
| | 0365a26c85 | | |
| | fb63ae54f0 | | |
| | 6de79fb73f | | |
| | a11199aa67 | | |
| | 5f82a71d5f | | |
| | d57da6c078 | | |
| | 689cd67a13 | | |
| | dca89d1586 | | |
| | 2f63fcd383 | | |
| | f04cd08e40 | | |
| | 44714f1b25 | | |
| | 78b95f8a76 | | |
| | 6f0c1dfa11 | | |
| | 5e595231da | | |
| | 7b36bed8a5 | | |
| | 372900c141 | | |
| | 1a305db162 | | |
| | 7afd2b249d | | |
| | 8d22653810 | | |
| | 48a653dc63 | | |
| | f6ddcbc6cb | | |
| | b00e16b438 | | |
| | b5acfb7855 | | |
| | 1ee0bd6619 | | |
| | 98f13a6e5d | | |
| | 613978a611 | | |
| | 2b0e8a5a9f | | |
| | 08bb05141c | | |
| | 4190f75b0b | | |
| | 71315aa982 | | |
| | 3ccaa5e103 | | |
| | 960f893295 | | |
| | 759effab60 | | |
| | 45b6ada739 | | |
| | da544d3411 | | |
| | 54e5059d7c | | |
| | 1d7d2f77f3 | | |
| | 567bc73ec4 | | |
| | 61ef54af05 | | |
| | 405403e6b7 | | |
| | ab16e63b0a | | |
| | 45d3193727 | | |
| | 9a08011d7d | | |
| | 6fa66ac7da | | |
| | 4bad08394c | | |
| | 993c43b623 | | |
| | a8a62eeefc | | |
| | 173614bcc5 | | |
| | fbe634fb19 | | |
| | a338c72c42 | | |
| | 7f4398efa3 | | |
| | c2a054c511 | | |
| | 83b00f4789 | | |
| | 95524e94b3 | | |
| | 2c517ff9a1 | | |
| | 7020ae2189 | | |
| | b9336984be | | |
| | 9924dedddc | | |
| | c054799b4f | | |
| | f3b5d584a3 | | |
| | 476d9dcf80 | | |
| | 072b623f8b | | |
| | 26b0c95936 | | |
| | 308357de84 | | |
| | 1a6c50c6cc | | |
@@ -95,6 +95,28 @@ Address comments **one at a time**: fix → commit → push → inline reply →
|
||||
| Inline review (`pulls/{N}/comments`) | `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments/{ID}/replies -f body="🤖 Fixed in <commit-sha>: <description>"` |
| Conversation (`issues/{N}/comments`) | `gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments -f body="🤖 Fixed in <commit-sha>: <description>"` |
|
||||
|
||||
## Codecov coverage
|
||||
|
||||
Codecov patch target is **80%** on changed lines. Checks are **informational** (not blocking) but should be green.
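
As a rough illustration of what an 80% patch target measures, the sketch below computes patch coverage as the share of changed lines that tests execute. The names `patch_coverage` and `meets_patch_target` and the set-of-line-numbers input are assumptions made for this example; this is not Codecov's actual implementation.

```python
def patch_coverage(changed_lines, covered_lines):
    """Return the percentage of changed lines that are covered by tests.

    Both arguments are sets of line numbers (a hypothetical input shape).
    """
    if not changed_lines:
        return 100.0  # nothing changed, so nothing can be uncovered
    hit = changed_lines & covered_lines
    return 100.0 * len(hit) / len(changed_lines)


def meets_patch_target(changed_lines, covered_lines, target=80.0):
    """True when patch coverage reaches the target percentage."""
    return patch_coverage(changed_lines, covered_lines) >= target
```

With five changed lines, four of them must be covered to reach the target.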
|
||||
|
||||
### Running coverage locally
|
||||
|
||||
**Backend** (from `autogpt_platform/backend/`):
|
||||
```bash
|
||||
poetry run pytest -s -vv --cov=backend --cov-branch --cov-report term-missing
|
||||
```
|
||||
|
||||
**Frontend** (from `autogpt_platform/frontend/`):
|
||||
```bash
|
||||
pnpm vitest run --coverage
|
||||
```
|
||||
|
||||
### When codecov/patch fails
|
||||
|
||||
1. Find uncovered files: `git diff --name-only $(gh pr view --json baseRefName --jq '.baseRefName')...HEAD`
|
||||
2. For each uncovered file — extract inline logic to `helpers.ts`/`helpers.py` and test those (highest ROI). Colocate tests as `*_test.py` (backend) or `__tests__/*.test.ts` (frontend).
|
||||
3. Run coverage locally to verify, commit, push.
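
A minimal sketch of step 2 for the backend, assuming a hypothetical formatting rule that previously lived inline in a handler. The function `microdollars_to_usd` and both file names are invented for illustration and do not refer to existing code in the repository.

```python
# helpers.py (hypothetical): pure logic pulled out of a request handler
def microdollars_to_usd(cost_microdollars: int) -> str:
    """Format an integer microdollar amount as a USD string."""
    if cost_microdollars < 0:
        raise ValueError("cost must be non-negative")
    return f"${cost_microdollars / 1_000_000:.6f}"


# helpers_test.py (hypothetical): colocated test, runnable with pytest
def test_microdollars_to_usd():
    assert microdollars_to_usd(1234) == "$0.001234"
    assert microdollars_to_usd(0) == "$0.000000"
```

Because the helper is a pure function, the test needs no fixtures or mocks, which is why this kind of extraction tends to move patch coverage quickly.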
|
||||
|
||||
## Format and commit
|
||||
|
||||
After fixing, format the changed code:
|
||||
|
||||
@@ -530,9 +530,19 @@ After showing all screenshots, output a **detailed** summary table:
|
||||
# but Homebrew bash is 5.x; Linux typically has bash 5.x). If running on Bash <4, use a
|
||||
# plain variable with a lookup function instead.
|
||||
declare -A SCREENSHOT_EXPLANATIONS=(
|
||||
["01-login-page.png"]="Shows the login page loaded successfully with SSO options visible."
|
||||
["02-builder-with-block.png"]="The builder canvas displays the newly added block connected to the trigger."
|
||||
# ... one entry per screenshot, using the same explanations you showed the user above
|
||||
# Each explanation MUST answer three things:
|
||||
# 1. FLOW: Which test scenario / user journey is this part of?
|
||||
# 2. STEPS: What exact actions were taken to reach this state?
|
||||
# 3. EVIDENCE: What does this screenshot prove (pass/fail/data)?
|
||||
#
|
||||
# Good example:
|
||||
# ["03-cost-log-after-run.png"]="Flow: LLM block cost tracking. Steps: Logged in as tester@gmail.com → ran 'Cost Test Agent' → waited for COMPLETED status. Evidence: PlatformCostLog table shows 1 new row with cost_microdollars=1234 and correct user_id."
|
||||
#
|
||||
# Bad example (too vague — never do this):
|
||||
# ["03-cost-log.png"]="Shows the cost log table."
|
||||
["01-login-page.png"]="Flow: Login flow. Steps: Opened /login. Evidence: Login page renders with email/password fields and SSO options visible."
|
||||
["02-builder-with-block.png"]="Flow: Block execution. Steps: Logged in → /build → added LLM block. Evidence: Builder canvas shows block connected to trigger, ready to run."
|
||||
# ... one entry per screenshot using the flow/steps/evidence format above
|
||||
)
|
||||
|
||||
TEST_RESULTS_TABLE="| 1 | Login flow | PASS | N/A | 01-login-before.png, 02-login-after.png |
|
||||
@@ -547,6 +557,9 @@ Upload screenshots to the PR using the GitHub Git API (no local git operations
|
||||
|
||||
**This step is MANDATORY. Every test run MUST post a PR comment with screenshots. No exceptions.**
|
||||
|
||||
> **CRITICAL — NEVER post a bare directory link like `https://github.com/.../tree/...`.**
|
||||
> Every screenshot MUST appear as inline image markdown (`![alt](url)`) in the PR comment so reviewers can see it without clicking any links. After posting, the verification step below greps the comment for `![` tags and exits 1 if none are found. The test run is considered incomplete until this passes.
|
||||
|
||||
```bash
|
||||
# Upload screenshots via GitHub Git API (creates blobs, tree, commit, and ref remotely)
|
||||
REPO="Significant-Gravitas/AutoGPT"
|
||||
@@ -582,12 +595,25 @@ for img in "${SCREENSHOT_FILES[@]}"; do
|
||||
done
|
||||
TREE_JSON+=']'
|
||||
|
||||
# Step 2: Create tree, commit, and branch ref
|
||||
# Step 2: Create tree, commit (with parent), and branch ref
|
||||
TREE_SHA=$(echo "$TREE_JSON" | jq -c '{tree: .}' | gh api "repos/${REPO}/git/trees" --input - --jq '.sha')
|
||||
COMMIT_SHA=$(gh api "repos/${REPO}/git/commits" \
|
||||
-f message="test: add E2E test screenshots for PR #${PR_NUMBER}" \
|
||||
-f tree="$TREE_SHA" \
|
||||
--jq '.sha')
|
||||
|
||||
# Resolve existing branch tip as parent (avoids orphan commits on repeat runs)
|
||||
PARENT_SHA=$(gh api "repos/${REPO}/git/refs/heads/${SCREENSHOTS_BRANCH}" --jq '.object.sha' 2>/dev/null || true)
|
||||
if [ -n "$PARENT_SHA" ]; then
|
||||
COMMIT_SHA=$(gh api "repos/${REPO}/git/commits" \
|
||||
-f message="test: add E2E test screenshots for PR #${PR_NUMBER}" \
|
||||
-f tree="$TREE_SHA" \
|
||||
-f "parents[]=$PARENT_SHA" \
|
||||
--jq '.sha')
|
||||
else
|
||||
# First commit on this branch — no parent
|
||||
COMMIT_SHA=$(gh api "repos/${REPO}/git/commits" \
|
||||
-f message="test: add E2E test screenshots for PR #${PR_NUMBER}" \
|
||||
-f tree="$TREE_SHA" \
|
||||
--jq '.sha')
|
||||
fi
|
||||
|
||||
gh api "repos/${REPO}/git/refs" \
|
||||
-f ref="refs/heads/${SCREENSHOTS_BRANCH}" \
|
||||
-f sha="$COMMIT_SHA" 2>/dev/null \
|
||||
@@ -656,17 +682,123 @@ ${IMAGE_MARKDOWN}
|
||||
${FAILED_SECTION}
|
||||
INNEREOF
|
||||
|
||||
gh api "repos/${REPO}/issues/$PR_NUMBER/comments" -F body=@"$COMMENT_FILE"
|
||||
POSTED_BODY=$(gh api "repos/${REPO}/issues/$PR_NUMBER/comments" -F body=@"$COMMENT_FILE" --jq '.body')
|
||||
rm -f "$COMMENT_FILE"
|
||||
```
|
||||
|
||||
**The PR comment MUST include:**
|
||||
1. A summary table of all scenarios with PASS/FAIL and before/after API evidence
|
||||
2. Every successfully uploaded screenshot rendered inline; any failed uploads listed with manual attachment instructions
|
||||
3. A 1-2 sentence explanation below each screenshot describing what it proves
|
||||
3. A structured explanation below each screenshot covering: **Flow** (which scenario), **Steps** (exact actions taken to reach this state), **Evidence** (what this proves — pass/fail/data values). A bare "shows the page" caption is not acceptable.
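
The Flow/Steps/Evidence requirement can be checked mechanically before posting. The sketch below is one possible check, assuming each caption writes the labels as plain `Flow:`, `Steps:` and `Evidence:` prefixes; `missing_parts` is a hypothetical helper, not part of the skill.

```python
import re

REQUIRED_PARTS = ("Flow", "Steps", "Evidence")


def missing_parts(caption: str) -> list:
    """Return the required labels that are absent or empty in the caption."""
    return [
        part
        for part in REQUIRED_PARTS
        # A label counts only if some non-space text follows the colon.
        if not re.search(rf"\b{part}:\s*\S", caption)
    ]
```

An empty result means the caption has all three parts; a bare caption such as "Shows the cost log table." reports all three as missing.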
|
||||
|
||||
This approach uses the GitHub Git API to create blobs, trees, commits, and refs entirely server-side. No local `git checkout` or `git push` — safe for worktrees and won't interfere with the PR branch.
|
||||
|
||||
**Verify inline rendering after posting — this is required, not optional:**
|
||||
|
||||
```bash
|
||||
# 1. Confirm the posted comment body contains inline image markdown syntax
|
||||
if ! echo "$POSTED_BODY" | grep -q '!\['; then
|
||||
echo "❌ FAIL: No inline image tags in posted comment body. Re-check IMAGE_MARKDOWN and re-post."
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# 2. Verify at least one raw URL actually resolves (catches wrong branch name, wrong path, etc.)
|
||||
FIRST_IMG_URL=$(echo "$POSTED_BODY" | grep -o 'https://raw.githubusercontent.com[^)]*' | head -1)
|
||||
if [ -n "$FIRST_IMG_URL" ]; then
|
||||
HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" --max-time 10 "$FIRST_IMG_URL")
|
||||
if [ "$HTTP_STATUS" = "200" ]; then
|
||||
echo "✅ Inline images confirmed and raw URL resolves (HTTP 200)"
|
||||
else
|
||||
echo "❌ FAIL: Raw image URL returned HTTP $HTTP_STATUS — images will not render inline."
|
||||
echo " URL: $FIRST_IMG_URL"
|
||||
echo " Check branch name, path, and that the push succeeded."
|
||||
exit 1
|
||||
fi
|
||||
else
|
||||
echo "⚠️ Could not extract a raw URL from the comment — verify manually."
|
||||
fi
|
||||
```
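
For contexts where the check runs outside a shell, the two text-level checks above could be expressed as follows. This is a sketch only: `has_inline_image` and `first_raw_url` are hypothetical names, and the HTTP status check is deliberately left out so the example has no network dependency.

```python
import re

# Same idea as the grep in the shell version: match up to the closing ")".
RAW_URL = re.compile(r"https://raw\.githubusercontent\.com[^)\s]*")


def has_inline_image(body: str) -> bool:
    """True when the comment body contains markdown image syntax."""
    return "![" in body


def first_raw_url(body: str):
    """Return the first raw.githubusercontent.com URL, or None if absent."""
    match = RAW_URL.search(body)
    return match.group(0) if match else None
```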
|
||||
|
||||
## Step 8: Evaluate test completeness and post a GitHub review
|
||||
|
||||
After posting the PR comment, evaluate whether the test run actually covered everything it needed to. This is NOT a rubber-stamp — be critical. Then post a formal GitHub review so the PR author and reviewers can see the verdict.
|
||||
|
||||
### 8a. Evaluate against the test plan
|
||||
|
||||
Re-read `$RESULTS_DIR/test-plan.md` (written in Step 2) and `$RESULTS_DIR/test-report.md` (written in Step 5). For each scenario in the plan, answer:
|
||||
|
||||
> **Note:** `test-report.md` is written in Step 5. If it doesn't exist, write it before proceeding here — see the Step 5 template. Do not skip evaluation because the file is missing; create it from your notes instead.
|
||||
|
||||
| Question | Pass criteria |
|----------|--------------|
| Was it tested? | Explicit steps were executed, not just described |
| Is there screenshot evidence? | At least one before/after screenshot per scenario |
| Did the core feature work correctly? | Expected state matches actual state |
| Were negative cases tested? | At least one failure/rejection case per feature |
| Was DB/API state verified (not just UI)? | Raw API response or DB query confirms state change |
|
||||
|
||||
Build a verdict:
|
||||
- **APPROVE** — every scenario tested, evidence present, no bugs found or all bugs are minor/known
|
||||
- **REQUEST_CHANGES** — one or more: untested scenarios, missing evidence, bugs found, data not verified
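
One way to picture the verdict rule is as a record of the five questions per scenario. The sketch below assumes a strict reading in which any unmet criterion forces `REQUEST_CHANGES`; it does not model the "minor/known bugs" exception, and `ScenarioCheck` and `build_verdict` are hypothetical names.

```python
from dataclasses import dataclass, fields


@dataclass
class ScenarioCheck:
    name: str
    tested: bool
    has_screenshot: bool
    works_correctly: bool
    negative_case_tested: bool
    state_verified: bool

    def failures(self):
        """Names of the criteria this scenario does not meet."""
        return [
            f.name
            for f in fields(self)
            if f.name != "name" and not getattr(self, f.name)
        ]


def build_verdict(checks):
    """APPROVE only if at least one scenario exists and all criteria hold."""
    if not checks:
        return "REQUEST_CHANGES"  # nothing evaluated means nothing verified
    if all(not check.failures() for check in checks):
        return "APPROVE"
    return "REQUEST_CHANGES"
```

Returning the failing criteria by name makes it straightforward to write the per-scenario checklist lines that the review body needs.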
|
||||
|
||||
### 8b. Post the GitHub review
|
||||
|
||||
```bash
|
||||
EVAL_FILE=$(mktemp)
|
||||
|
||||
# === STEP A: Write header ===
|
||||
cat > "$EVAL_FILE" << 'ENDEVAL'
|
||||
## 🧪 Test Evaluation
|
||||
|
||||
### Coverage checklist
|
||||
ENDEVAL
|
||||
|
||||
# === STEP B: Append ONE line per scenario — do this BEFORE calculating verdict ===
|
||||
# Format: "- ✅ **Scenario N – name**: <what was done and verified>"
|
||||
# or "- ❌ **Scenario N – name**: <what is missing or broken>"
|
||||
# Examples:
|
||||
# echo "- ✅ **Scenario 1 – Login flow**: tested, screenshot evidence present, auth token verified via API" >> "$EVAL_FILE"
|
||||
# echo "- ❌ **Scenario 3 – Cost logging**: NOT verified in DB — UI showed entry but raw SQL query was skipped" >> "$EVAL_FILE"
|
||||
#
|
||||
# !!! IMPORTANT: append ALL scenario lines here before proceeding to STEP C !!!
|
||||
|
||||
# === STEP C: Derive verdict from the checklist — runs AFTER all lines are appended ===
|
||||
FAIL_COUNT=$(grep -c "^- ❌" "$EVAL_FILE" || true)
|
||||
if [ "$FAIL_COUNT" -eq 0 ]; then
|
||||
VERDICT="APPROVE"
|
||||
else
|
||||
VERDICT="REQUEST_CHANGES"
|
||||
fi
|
||||
|
||||
# === STEP D: Append verdict section ===
|
||||
cat >> "$EVAL_FILE" << ENDVERDICT
|
||||
|
||||
### Verdict
|
||||
ENDVERDICT
|
||||
|
||||
if [ "$VERDICT" = "APPROVE" ]; then
|
||||
echo "✅ All scenarios covered with evidence. No blocking issues found." >> "$EVAL_FILE"
|
||||
else
|
||||
echo "❌ $FAIL_COUNT scenario(s) incomplete or have confirmed bugs. See ❌ items above." >> "$EVAL_FILE"
|
||||
echo "" >> "$EVAL_FILE"
|
||||
echo "**Required before merge:** address each ❌ item above." >> "$EVAL_FILE"
|
||||
fi
|
||||
|
||||
# === STEP E: Post the review ===
|
||||
gh api "repos/${REPO}/pulls/$PR_NUMBER/reviews" \
|
||||
--method POST \
|
||||
-f body="$(cat "$EVAL_FILE")" \
|
||||
-f event="$VERDICT"
|
||||
|
||||
rm -f "$EVAL_FILE"
|
||||
```
|
||||
|
||||
**Rules:**
|
||||
- Never auto-approve without checking every scenario in the test plan
|
||||
- `REQUEST_CHANGES` if ANY scenario is untested, lacks DB/API evidence, or has a confirmed bug
|
||||
- The evaluation body must list every scenario explicitly (✅ or ❌) — not just the failures
|
||||
- If you find new bugs during evaluation, add them to the request-changes body and (if `--fix` flag is set) fix them before posting
|
||||
|
||||
## Fix mode (--fix flag)
|
||||
|
||||
When `--fix` is present, the standard is HIGHER. Do not just note issues — FIX them immediately.
|
||||
|
||||
224
.claude/skills/write-frontend-tests/SKILL.md
Normal file
@@ -0,0 +1,224 @@
|
||||
---
|
||||
name: write-frontend-tests
|
||||
description: "Analyze the current branch diff against dev, plan integration tests for changed frontend pages/components, and write them. TRIGGER when user asks to write frontend tests, add test coverage, or 'write tests for my changes'."
|
||||
user-invocable: true
|
||||
args: "[base branch] — defaults to dev. Optionally pass a specific base branch to diff against."
|
||||
metadata:
|
||||
author: autogpt-team
|
||||
version: "1.0.0"
|
||||
---
|
||||
|
||||
# Write Frontend Tests
|
||||
|
||||
Analyze the current branch's frontend changes, plan integration tests, and write them.
|
||||
|
||||
## References
|
||||
|
||||
Before writing any tests, read the testing rules and conventions:
|
||||
|
||||
- `autogpt_platform/frontend/TESTING.md` — testing strategy, file locations, examples
|
||||
- `autogpt_platform/frontend/src/tests/AGENTS.md` — detailed testing rules, MSW patterns, decision flowchart
|
||||
- `autogpt_platform/frontend/src/tests/integrations/test-utils.tsx` — custom render with providers
|
||||
- `autogpt_platform/frontend/src/tests/integrations/vitest.setup.tsx` — MSW server setup
|
||||
|
||||
## Step 1: Identify changed frontend files
|
||||
|
||||
```bash
|
||||
BASE_BRANCH="${ARGUMENTS:-dev}"
|
||||
cd autogpt_platform/frontend
|
||||
|
||||
# Get changed frontend files (excluding generated, config, and test files)
|
||||
git diff "$BASE_BRANCH"...HEAD --name-only -- src/ \
|
||||
| grep -v '__generated__' \
|
||||
| grep -v '__tests__' \
|
||||
| grep -v '\.test\.' \
|
||||
| grep -v '\.stories\.' \
|
||||
| grep -v '\.spec\.'
|
||||
```
|
||||
|
||||
Also read the diff to understand what changed:
|
||||
|
||||
```bash
|
||||
git diff "$BASE_BRANCH"...HEAD --stat -- src/
|
||||
git diff "$BASE_BRANCH"...HEAD -- src/ | head -500
|
||||
```
|
||||
|
||||
## Step 2: Categorize changes and find test targets
|
||||
|
||||
For each changed file, determine:
|
||||
|
||||
1. **Is it a page?** (`page.tsx`) — these are the primary test targets
|
||||
2. **Is it a hook?** (`use*.ts`) — test via the page that uses it
|
||||
3. **Is it a component?** (`.tsx` in `components/`) — test via the parent page unless it's complex enough to warrant isolation
|
||||
4. **Is it a helper?** (`helpers.ts`, `utils.ts`) — unit test directly if pure logic
|
||||
|
||||
**Priority order:**
|
||||
1. Pages with new/changed data fetching or user interactions
|
||||
2. Components with complex internal logic (modals, forms, wizards)
|
||||
3. Hooks with non-trivial business logic
|
||||
4. Pure helper functions
|
||||
|
||||
Skip: styling-only changes, type-only changes, config changes.
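
The categorization rules above can be sketched as a small classifier over changed paths. This is an illustration under assumptions: `classify` is a hypothetical helper, and the extra check that a capital letter follows `use` is an added heuristic (so that a file like `user.ts` is not treated as a hook), not a rule stated in the skill.

```python
from pathlib import PurePosixPath


def classify(path: str) -> str:
    """Map a changed frontend file to page, hook, helper, component or other."""
    p = PurePosixPath(path)
    name = p.name
    if name == "page.tsx":
        return "page"
    # Hooks: use*.ts(x); the capital-letter check is an assumption of this sketch.
    if name.startswith("use") and name[3:4].isupper() and name.endswith((".ts", ".tsx")):
        return "hook"
    if name in ("helpers.ts", "utils.ts"):
        return "helper"
    if p.suffix == ".tsx" and "components" in p.parts:
        return "component"
    return "other"
```

Files classified as `other` still need a manual look, since the skip list (styling-only, type-only, config changes) depends on the content of the change rather than the file name.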
|
||||
|
||||
## Step 3: Check for existing tests
|
||||
|
||||
For each test target, check if tests already exist:
|
||||
|
||||
```bash
|
||||
# For a page at src/app/(platform)/library/page.tsx
|
||||
ls src/app/\(platform\)/library/__tests__/ 2>/dev/null
|
||||
|
||||
# For a component at src/app/(platform)/library/components/AgentCard/AgentCard.tsx
|
||||
ls src/app/\(platform\)/library/components/AgentCard/__tests__/ 2>/dev/null
|
||||
```
|
||||
|
||||
Note which targets have no tests (need new files) vs which have tests that need updating.
|
||||
|
||||
## Step 4: Identify API endpoints used
|
||||
|
||||
For each test target, find which API hooks are used:
|
||||
|
||||
```bash
|
||||
# Find generated API hook imports in the changed files
|
||||
grep -rn 'from.*__generated__/endpoints' src/app/\(platform\)/library/
|
||||
grep -rn 'use[A-Z].*V[12]' src/app/\(platform\)/library/
|
||||
```
|
||||
|
||||
For each API hook found, locate the corresponding MSW handler:
|
||||
|
||||
```bash
|
||||
# If the page uses useGetV2ListLibraryAgents, find its MSW handlers
|
||||
grep -rn 'getGetV2ListLibraryAgents.*Handler' src/app/api/__generated__/endpoints/library/library.msw.ts
|
||||
```
|
||||
|
||||
List every MSW handler you will need (200 for happy path, 4xx for error paths).
|
||||
|
||||
## Step 5: Write the test plan
|
||||
|
||||
Before writing code, output a plan as a numbered list:
|
||||
|
||||
```
|
||||
Test plan for [branch name]:
|
||||
|
||||
1. src/app/(platform)/library/__tests__/main.test.tsx (NEW)
|
||||
- Renders page with agent list (MSW 200)
|
||||
- Shows loading state
|
||||
- Shows error state (MSW 422)
|
||||
- Handles empty agent list
|
||||
|
||||
2. src/app/(platform)/library/__tests__/search.test.tsx (NEW)
|
||||
- Filters agents by search query
|
||||
- Shows no results message
|
||||
- Clears search
|
||||
|
||||
3. src/app/(platform)/library/components/AgentCard/__tests__/AgentCard.test.tsx (UPDATE)
|
||||
- Add test for new "duplicate" action
|
||||
```
|
||||
|
||||
Present this plan to the user. Wait for confirmation before proceeding. If the user has feedback, adjust the plan.
|
||||
|
||||
## Step 6: Write the tests
|
||||
|
||||
For each test file in the plan, follow these conventions:
|
||||
|
||||
### File structure
|
||||
|
||||
```tsx
|
||||
import { render, screen, waitFor } from "@/tests/integrations/test-utils";
|
||||
import { server } from "@/mocks/mock-server";
|
||||
// Import MSW handlers for endpoints the page uses
|
||||
import {
|
||||
getGetV2ListLibraryAgentsMockHandler200,
|
||||
getGetV2ListLibraryAgentsMockHandler422,
|
||||
} from "@/app/api/__generated__/endpoints/library/library.msw";
|
||||
// Import the component under test
|
||||
import LibraryPage from "../page";
|
||||
|
||||
describe("LibraryPage", () => {
|
||||
test("renders agent list from API", async () => {
|
||||
server.use(getGetV2ListLibraryAgentsMockHandler200());
|
||||
|
||||
render(<LibraryPage />);
|
||||
|
||||
expect(await screen.findByText(/my agents/i)).toBeDefined();
|
||||
});
|
||||
|
||||
test("shows error state on API failure", async () => {
|
||||
server.use(getGetV2ListLibraryAgentsMockHandler422());
|
||||
|
||||
render(<LibraryPage />);
|
||||
|
||||
expect(await screen.findByText(/error/i)).toBeDefined();
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
### Rules
|
||||
|
||||
- Use `render()` from `@/tests/integrations/test-utils` (NOT from `@testing-library/react` directly)
|
||||
- Use `server.use()` to set up MSW handlers BEFORE rendering
|
||||
- Use `findBy*` (async) for elements that appear after data fetching — NOT `getBy*`
|
||||
- Use `getBy*` only for elements that are immediately present in the DOM
|
||||
- Use `screen` queries — do NOT destructure from `render()`
|
||||
- Use `waitFor` when asserting side effects or state changes after interactions
|
||||
- Import `fireEvent` or `userEvent` from the test-utils for interactions
|
||||
- Do NOT mock internal hooks or functions — mock at the API boundary via MSW
|
||||
- Do NOT use `act()` manually — `render` and `fireEvent` handle it
|
||||
- Keep tests focused: one behavior per test
|
||||
- Use descriptive test names that read like sentences
|
||||
|
||||
### Test location
|
||||
|
||||
```
|
||||
# For pages: __tests__/ next to page.tsx
|
||||
src/app/(platform)/library/__tests__/main.test.tsx
|
||||
|
||||
# For complex standalone components: __tests__/ inside component folder
|
||||
src/app/(platform)/library/components/AgentCard/__tests__/AgentCard.test.tsx
|
||||
|
||||
# For pure helpers: co-located .test.ts
|
||||
src/app/(platform)/library/helpers.test.ts
|
||||
```
|
||||
|
||||
### Custom MSW overrides
|
||||
|
||||
When the auto-generated faker data is not enough, override with specific data:
|
||||
|
||||
```tsx
|
||||
import { http, HttpResponse } from "msw";
|
||||
|
||||
server.use(
|
||||
http.get("http://localhost:3000/api/proxy/api/v2/library/agents", () => {
|
||||
return HttpResponse.json({
|
||||
agents: [
|
||||
{ id: "1", name: "Test Agent", description: "A test agent" },
|
||||
],
|
||||
pagination: { total_items: 1, total_pages: 1, page: 1, page_size: 10 },
|
||||
});
|
||||
}),
|
||||
);
|
||||
```
|
||||
|
||||
Use the proxy URL pattern: `http://localhost:3000/api/proxy/api/v{version}/{path}` — this matches the MSW base URL configured in `orval.config.ts`.
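
To make the pattern concrete, the sketch below assembles a proxy URL from its parts. `proxy_url` is a hypothetical helper written only to show how the version and path slot into the pattern; the tests themselves pass the resulting string literal to `http.get`.

```python
def proxy_url(version: int, path: str, base: str = "http://localhost:3000") -> str:
    """Build a URL following the /api/proxy/api/v{version}/{path} pattern."""
    # Strip a leading slash so callers can pass either "x/y" or "/x/y".
    return f"{base}/api/proxy/api/v{version}/{path.lstrip('/')}"
```

For example, version `2` with path `library/agents` yields the URL used in the override above.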
|
||||
|
||||
## Step 7: Run and verify
|
||||
|
||||
After writing all tests:
|
||||
|
||||
```bash
|
||||
cd autogpt_platform/frontend
|
||||
pnpm test:unit --reporter=verbose
|
||||
```
|
||||
|
||||
If tests fail:
|
||||
1. Read the error output carefully
|
||||
2. Fix the test (not the source code, unless there is a genuine bug)
|
||||
3. Re-run until all pass
|
||||
|
||||
Then run the full checks:
|
||||
|
||||
```bash
|
||||
pnpm format
|
||||
pnpm lint
|
||||
pnpm types
|
||||
```
|
||||
25
.github/workflows/platform-fullstack-ci.yml
vendored
@@ -179,21 +179,30 @@ jobs:
|
||||
pip install pyyaml
|
||||
|
||||
# Resolve extends and generate a flat compose file that bake can understand
|
||||
export NEXT_PUBLIC_SOURCEMAPS NEXT_PUBLIC_PW_TEST
|
||||
docker compose -f docker-compose.yml config > docker-compose.resolved.yml
|
||||
|
||||
# Ensure NEXT_PUBLIC_SOURCEMAPS is in resolved compose
|
||||
# (docker compose config on some versions drops this arg)
|
||||
if ! grep -q "NEXT_PUBLIC_SOURCEMAPS" docker-compose.resolved.yml; then
|
||||
echo "Injecting NEXT_PUBLIC_SOURCEMAPS into resolved compose (docker compose config dropped it)"
|
||||
sed -i '/NEXT_PUBLIC_PW_TEST/a\ NEXT_PUBLIC_SOURCEMAPS: "true"' docker-compose.resolved.yml
|
||||
fi
|
||||
|
||||
# Add cache configuration to the resolved compose file
|
||||
python ../.github/workflows/scripts/docker-ci-fix-compose-build-cache.py \
|
||||
--source docker-compose.resolved.yml \
|
||||
--cache-from "type=gha" \
|
||||
--cache-to "type=gha,mode=max" \
|
||||
--backend-hash "${{ hashFiles('autogpt_platform/backend/Dockerfile', 'autogpt_platform/backend/poetry.lock', 'autogpt_platform/backend/backend/**') }}" \
|
||||
--frontend-hash "${{ hashFiles('autogpt_platform/frontend/Dockerfile', 'autogpt_platform/frontend/pnpm-lock.yaml', 'autogpt_platform/frontend/src/**') }}" \
|
||||
--frontend-hash "${{ hashFiles('autogpt_platform/frontend/Dockerfile', 'autogpt_platform/frontend/pnpm-lock.yaml', 'autogpt_platform/frontend/src/**') }}-sourcemaps" \
|
||||
--git-ref "${{ github.ref }}"
|
||||
|
||||
# Build with bake using the resolved compose file (now includes cache config)
|
||||
docker buildx bake --allow=fs.read=.. -f docker-compose.resolved.yml --load
|
||||
env:
|
||||
NEXT_PUBLIC_PW_TEST: true
|
||||
NEXT_PUBLIC_SOURCEMAPS: true
|
||||
|
||||
- name: Set up tests - Cache E2E test data
|
||||
id: e2e-data-cache
|
||||
@@ -279,6 +288,11 @@ jobs:
|
||||
cache: "pnpm"
|
||||
cache-dependency-path: autogpt_platform/frontend/pnpm-lock.yaml
|
||||
|
||||
- name: Copy source maps from Docker for E2E coverage
|
||||
run: |
|
||||
FRONTEND_CONTAINER=$(docker compose -f ../docker-compose.resolved.yml ps -q frontend)
|
||||
docker cp "$FRONTEND_CONTAINER":/app/.next/static .next-static-coverage
|
||||
|
||||
- name: Set up tests - Install dependencies
|
||||
run: pnpm install --frozen-lockfile
|
||||
|
||||
@@ -289,6 +303,15 @@ jobs:
|
||||
run: pnpm test:no-build
|
||||
continue-on-error: false
|
||||
|
||||
- name: Upload E2E coverage to Codecov
|
||||
if: ${{ !cancelled() }}
|
||||
uses: codecov/codecov-action@v5
|
||||
with:
|
||||
token: ${{ secrets.CODECOV_TOKEN }}
|
||||
flags: platform-frontend-e2e
|
||||
files: ./autogpt_platform/frontend/coverage/e2e/cobertura-coverage.xml
|
||||
disable_search: true
|
||||
|
||||
- name: Upload Playwright report
|
||||
if: always()
|
||||
uses: actions/upload-artifact@v4
|
||||
|
||||
36
.gitleaks.toml
Normal file
@@ -0,0 +1,36 @@
|
||||
title = "AutoGPT Gitleaks Config"
|
||||
|
||||
[extend]
|
||||
useDefault = true
|
||||
|
||||
[allowlist]
|
||||
description = "Global allowlist"
|
||||
paths = [
|
||||
# Template/example env files (no real secrets)
|
||||
'''\.env\.(default|example|template)$''',
|
||||
# Lock files
|
||||
'''pnpm-lock\.yaml$''',
|
||||
'''poetry\.lock$''',
|
||||
# Secrets baseline
|
||||
'''\.secrets\.baseline$''',
|
||||
# Build artifacts and caches (should not be committed)
|
||||
'''__pycache__/''',
|
||||
'''classic/frontend/build/''',
|
||||
# Docker dev setup (local dev JWTs/keys only)
|
||||
'''autogpt_platform/db/docker/''',
|
||||
# Load test configs (dev JWTs)
|
||||
'''load-tests/configs/''',
|
||||
# Test files with fake/fixture keys (_test.py, test_*.py, conftest.py)
|
||||
'''(_test|test_.*|conftest)\.py$''',
|
||||
# Documentation (only contains placeholder keys in curl/API examples)
|
||||
'''docs/.*\.md$''',
|
||||
# Firebase config (public API keys by design)
|
||||
'''google-services\.json$''',
|
||||
'''classic/frontend/(lib|web)/''',
|
||||
]
|
||||
# CI test-only encryption key (marked DO NOT USE IN PRODUCTION)
|
||||
regexes = [
|
||||
'''dvziYgz0KSK8FENhju0ZYi8''',
|
||||
# LLM model name enum values falsely flagged as API keys
|
||||
'''Llama-\d.*Instruct''',
|
||||
]
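
To see which files the path allowlist would exempt, a few of the patterns can be tried against sample paths. The sketch below uses Python's `re.search` as a stand-in; gitleaks itself uses Go regular expressions, so treat the results as an approximation. `is_allowlisted` and the sample paths are hypothetical.

```python
import re

# A subset of the path patterns from the allowlist above.
ALLOWLIST_PATHS = [
    r"\.env\.(default|example|template)$",
    r"pnpm-lock\.yaml$",
    r"poetry\.lock$",
    r"(_test|test_.*|conftest)\.py$",
    r"docs/.*\.md$",
]


def is_allowlisted(path: str) -> bool:
    """True when any allowlist pattern matches somewhere in the path."""
    return any(re.search(pattern, path) for pattern in ALLOWLIST_PATHS)
```

Under these unanchored search semantics, the `test_.*` alternative also matches a name such as `latest_app.py`, because it contains `test_app.py`. If that is unintended, anchoring the alternative to a path separator would narrow it.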
|
||||
@@ -23,9 +23,15 @@ repos:
|
||||
- id: detect-secrets
|
||||
name: Detect secrets
|
||||
description: Detects high entropy strings that are likely to be passwords.
|
||||
args: ["--baseline", ".secrets.baseline"]
|
||||
files: ^autogpt_platform/
|
||||
exclude: pnpm-lock\.yaml$
|
||||
stages: [pre-push]
|
||||
exclude: (pnpm-lock\.yaml|\.env\.(default|example|template))$
|
||||
|
||||
- repo: https://github.com/gitleaks/gitleaks
|
||||
rev: v8.24.3
|
||||
hooks:
|
||||
- id: gitleaks
|
||||
name: Detect secrets (gitleaks)
|
||||
|
||||
- repo: local
|
||||
# For proper type checking, all dependencies need to be up-to-date.
|
||||
|
||||
467
.secrets.baseline
Normal file
@@ -0,0 +1,467 @@
|
||||
{
|
||||
"version": "1.5.0",
|
||||
"plugins_used": [
|
||||
{
|
||||
"name": "ArtifactoryDetector"
|
||||
},
|
||||
{
|
||||
"name": "AWSKeyDetector"
|
||||
},
|
||||
{
|
||||
"name": "AzureStorageKeyDetector"
|
||||
},
|
||||
{
|
||||
"name": "Base64HighEntropyString",
|
||||
"limit": 4.5
|
||||
},
|
||||
{
|
||||
"name": "BasicAuthDetector"
|
||||
},
|
||||
{
|
||||
"name": "CloudantDetector"
|
||||
},
|
||||
{
|
||||
"name": "DiscordBotTokenDetector"
|
||||
},
|
||||
{
|
||||
"name": "GitHubTokenDetector"
|
||||
},
|
||||
{
|
||||
"name": "GitLabTokenDetector"
|
||||
},
|
||||
{
|
||||
"name": "HexHighEntropyString",
|
||||
"limit": 3.0
|
||||
},
|
||||
{
|
||||
"name": "IbmCloudIamDetector"
|
||||
},
|
||||
{
|
||||
"name": "IbmCosHmacDetector"
|
||||
},
|
||||
{
|
||||
"name": "IPPublicDetector"
|
||||
},
|
||||
{
|
||||
"name": "JwtTokenDetector"
|
||||
},
|
||||
{
|
||||
"name": "KeywordDetector",
|
||||
"keyword_exclude": ""
|
||||
},
|
||||
{
|
||||
"name": "MailchimpDetector"
|
||||
},
|
||||
{
|
||||
"name": "NpmDetector"
|
||||
},
|
||||
{
|
||||
"name": "OpenAIDetector"
|
||||
},
|
||||
{
|
||||
"name": "PrivateKeyDetector"
|
||||
},
|
||||
{
|
||||
"name": "PypiTokenDetector"
|
||||
},
|
||||
{
|
||||
"name": "SendGridDetector"
|
||||
},
|
||||
{
|
||||
"name": "SlackDetector"
|
||||
},
|
||||
{
|
||||
"name": "SoftlayerDetector"
|
||||
},
|
||||
{
|
||||
"name": "SquareOAuthDetector"
|
||||
},
|
||||
{
|
||||
"name": "StripeDetector"
|
||||
},
|
||||
{
|
||||
"name": "TelegramBotTokenDetector"
|
||||
},
|
||||
{
|
||||
"name": "TwilioKeyDetector"
|
||||
}
|
||||
],
|
||||
"filters_used": [
|
||||
{
|
||||
"path": "detect_secrets.filters.allowlist.is_line_allowlisted"
|
||||
},
|
||||
{
|
||||
"path": "detect_secrets.filters.common.is_ignored_due_to_verification_policies",
|
||||
"min_level": 2
|
||||
},
|
||||
{
|
||||
"path": "detect_secrets.filters.heuristic.is_indirect_reference"
|
||||
},
|
||||
{
|
||||
"path": "detect_secrets.filters.heuristic.is_likely_id_string"
|
||||
},
|
||||
{
|
||||
"path": "detect_secrets.filters.heuristic.is_lock_file"
|
||||
},
|
||||
{
|
||||
"path": "detect_secrets.filters.heuristic.is_not_alphanumeric_string"
|
||||
},
|
||||
{
|
||||
"path": "detect_secrets.filters.heuristic.is_potential_uuid"
|
||||
},
|
||||
{
|
||||
"path": "detect_secrets.filters.heuristic.is_prefixed_with_dollar_sign"
|
||||
},
|
||||
{
|
||||
"path": "detect_secrets.filters.heuristic.is_sequential_string"
|
||||
},
|
||||
{
|
||||
"path": "detect_secrets.filters.heuristic.is_swagger_file"
|
||||
},
|
||||
{
|
||||
"path": "detect_secrets.filters.heuristic.is_templated_secret"
|
||||
},
|
||||
{
|
||||
"path": "detect_secrets.filters.regex.should_exclude_file",
|
||||
"pattern": [
|
||||
"\\.env$",
|
||||
"pnpm-lock\\.yaml$",
|
||||
"\\.env\\.(default|example|template)$",
|
||||
"__pycache__",
|
||||
"_test\\.py$",
|
||||
"test_.*\\.py$",
|
||||
"conftest\\.py$",
|
||||
"poetry\\.lock$",
|
||||
"node_modules"
|
||||
]
|
||||
}
|
||||
],
|
||||
"results": {
|
||||
"autogpt_platform/backend/backend/api/external/v1/integrations.py": [
|
||||
{
|
||||
"type": "Secret Keyword",
|
||||
"filename": "autogpt_platform/backend/backend/api/external/v1/integrations.py",
|
||||
"hashed_secret": "665b1e3851eefefa3fb878654292f16597d25155",
|
||||
"is_verified": false,
|
||||
"line_number": 289
|
||||
}
|
||||
],
|
||||
"autogpt_platform/backend/backend/blocks/airtable/_config.py": [
|
||||
{
|
||||
"type": "Secret Keyword",
|
||||
"filename": "autogpt_platform/backend/backend/blocks/airtable/_config.py",
|
||||
"hashed_secret": "57e168b03afb7c1ee3cdc4ee3db2fe1cc6e0df26",
|
||||
"is_verified": false,
|
||||
"line_number": 29
|
||||
}
|
||||
],
|
||||
"autogpt_platform/backend/backend/blocks/dataforseo/_config.py": [
|
||||
{
|
||||
"type": "Secret Keyword",
|
||||
"filename": "autogpt_platform/backend/backend/blocks/dataforseo/_config.py",
|
||||
"hashed_secret": "32ce93887331fa5d192f2876ea15ec000c7d58b8",
|
||||
"is_verified": false,
|
||||
"line_number": 12
|
||||
}
|
||||
],
|
||||
"autogpt_platform/backend/backend/blocks/github/checks.py": [
|
||||
{
|
||||
"type": "Hex High Entropy String",
|
||||
"filename": "autogpt_platform/backend/backend/blocks/github/checks.py",
|
||||
"hashed_secret": "8ac6f92737d8586790519c5d7bfb4d2eb172c238",
|
||||
"is_verified": false,
|
||||
"line_number": 108
|
||||
}
|
||||
],
|
||||
"autogpt_platform/backend/backend/blocks/github/ci.py": [
|
||||
{
|
||||
"type": "Hex High Entropy String",
|
||||
"filename": "autogpt_platform/backend/backend/blocks/github/ci.py",
|
||||
"hashed_secret": "90bd1b48e958257948487b90bee080ba5ed00caa",
|
||||
"is_verified": false,
|
||||
"line_number": 123
|
||||
}
|
||||
],
|
||||
"autogpt_platform/backend/backend/blocks/github/example_payloads/pull_request.synchronize.json": [
|
||||
{
|
||||
"type": "Hex High Entropy String",
|
||||
"filename": "autogpt_platform/backend/backend/blocks/github/example_payloads/pull_request.synchronize.json",
|
||||
"hashed_secret": "f96896dafced7387dcd22343b8ea29d3d2c65663",
|
||||
"is_verified": false,
|
||||
"line_number": 42
|
||||
},
|
||||
{
|
||||
"type": "Hex High Entropy String",
|
||||
"filename": "autogpt_platform/backend/backend/blocks/github/example_payloads/pull_request.synchronize.json",
|
||||
"hashed_secret": "b80a94d5e70bedf4f5f89d2f5a5255cc9492d12e",
|
||||
"is_verified": false,
|
||||
"line_number": 193
|
||||
},
|
||||
{
|
||||
"type": "Hex High Entropy String",
|
||||
"filename": "autogpt_platform/backend/backend/blocks/github/example_payloads/pull_request.synchronize.json",
|
||||
"hashed_secret": "75b17e517fe1b3136394f6bec80c4f892da75e42",
|
||||
"is_verified": false,
|
||||
"line_number": 344
|
||||
},
|
||||
{
|
||||
"type": "Hex High Entropy String",
|
||||
"filename": "autogpt_platform/backend/backend/blocks/github/example_payloads/pull_request.synchronize.json",
|
||||
"hashed_secret": "b0bfb5e4e2394e7f8906e5ed1dffd88b2bc89dd5",
|
||||
"is_verified": false,
|
||||
"line_number": 534
|
||||
}
|
||||
],
|
||||
"autogpt_platform/backend/backend/blocks/github/statuses.py": [
|
||||
{
|
||||
"type": "Hex High Entropy String",
|
||||
"filename": "autogpt_platform/backend/backend/blocks/github/statuses.py",
|
||||
"hashed_secret": "8ac6f92737d8586790519c5d7bfb4d2eb172c238",
|
||||
"is_verified": false,
|
||||
"line_number": 85
|
||||
}
|
||||
],
|
||||
"autogpt_platform/backend/backend/blocks/google/docs.py": [
|
||||
{
|
||||
"type": "Hex High Entropy String",
|
||||
"filename": "autogpt_platform/backend/backend/blocks/google/docs.py",
|
||||
"hashed_secret": "c95da0c6696342c867ef0c8258d2f74d20fd94d4",
|
||||
"is_verified": false,
|
||||
"line_number": 203
|
||||
}
|
||||
],
|
||||
"autogpt_platform/backend/backend/blocks/google/sheets.py": [
|
||||
{
|
||||
"type": "Base64 High Entropy String",
|
||||
"filename": "autogpt_platform/backend/backend/blocks/google/sheets.py",
|
||||
"hashed_secret": "bd5a04fa3667e693edc13239b6d310c5c7a8564b",
|
||||
"is_verified": false,
|
||||
"line_number": 57
|
||||
}
|
||||
],
|
||||
"autogpt_platform/backend/backend/blocks/linear/_config.py": [
|
||||
{
|
||||
"type": "Secret Keyword",
|
||||
"filename": "autogpt_platform/backend/backend/blocks/linear/_config.py",
|
||||
"hashed_secret": "b37f020f42d6d613b6ce30103e4d408c4499b3bb",
|
||||
"is_verified": false,
|
||||
"line_number": 53
|
||||
}
|
||||
],
|
||||
"autogpt_platform/backend/backend/blocks/medium.py": [
|
||||
{
|
||||
"type": "Hex High Entropy String",
|
||||
"filename": "autogpt_platform/backend/backend/blocks/medium.py",
|
||||
"hashed_secret": "ff998abc1ce6d8f01a675fa197368e44c8916e9c",
|
||||
"is_verified": false,
|
||||
"line_number": 131
|
||||
}
|
||||
],
|
||||
"autogpt_platform/backend/backend/blocks/replicate/replicate_block.py": [
|
||||
{
|
||||
"type": "Hex High Entropy String",
|
||||
"filename": "autogpt_platform/backend/backend/blocks/replicate/replicate_block.py",
|
||||
"hashed_secret": "8bbdd6f26368f58ea4011d13d7f763cb662e66f0",
|
||||
"is_verified": false,
|
||||
"line_number": 55
|
||||
}
|
||||
],
|
||||
"autogpt_platform/backend/backend/blocks/slant3d/webhook.py": [
|
||||
{
|
||||
"type": "Hex High Entropy String",
|
||||
"filename": "autogpt_platform/backend/backend/blocks/slant3d/webhook.py",
|
||||
"hashed_secret": "36263c76947443b2f6e6b78153967ac4a7da99f9",
|
||||
"is_verified": false,
|
||||
"line_number": 100
|
||||
}
|
||||
],
|
||||
"autogpt_platform/backend/backend/blocks/talking_head.py": [
|
||||
{
|
||||
"type": "Base64 High Entropy String",
|
||||
"filename": "autogpt_platform/backend/backend/blocks/talking_head.py",
|
||||
"hashed_secret": "44ce2d66222529eea4a32932823466fc0601c799",
|
||||
"is_verified": false,
|
||||
"line_number": 113
|
||||
}
|
||||
],
|
||||
"autogpt_platform/backend/backend/blocks/wordpress/_config.py": [
|
||||
{
|
||||
"type": "Secret Keyword",
|
||||
"filename": "autogpt_platform/backend/backend/blocks/wordpress/_config.py",
|
||||
"hashed_secret": "e62679512436161b78e8a8d68c8829c2a1031ccb",
|
||||
"is_verified": false,
|
||||
"line_number": 17
|
||||
}
|
||||
],
|
||||
"autogpt_platform/backend/backend/util/cache.py": [
|
||||
{
|
||||
"type": "Secret Keyword",
|
||||
"filename": "autogpt_platform/backend/backend/util/cache.py",
|
||||
"hashed_secret": "37f0c918c3fa47ca4a70e42037f9f123fdfbc75b",
|
||||
"is_verified": false,
|
||||
"line_number": 449
|
||||
}
|
||||
],
|
||||
"autogpt_platform/frontend/src/app/(platform)/build/components/FlowEditor/nodes/helpers.ts": [
|
||||
{
|
||||
"type": "Secret Keyword",
|
||||
"filename": "autogpt_platform/frontend/src/app/(platform)/build/components/FlowEditor/nodes/helpers.ts",
|
||||
"hashed_secret": "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8",
|
||||
"is_verified": false,
|
||||
"line_number": 6
|
||||
}
|
||||
],
|
||||
"autogpt_platform/frontend/src/app/(platform)/dictionaries/en.json": [
|
||||
{
|
||||
"type": "Secret Keyword",
|
||||
"filename": "autogpt_platform/frontend/src/app/(platform)/dictionaries/en.json",
|
||||
"hashed_secret": "8be3c943b1609fffbfc51aad666d0a04adf83c9d",
|
||||
"is_verified": false,
|
||||
"line_number": 5
|
||||
}
|
||||
],
|
||||
"autogpt_platform/frontend/src/app/(platform)/dictionaries/es.json": [
|
||||
{
|
||||
"type": "Secret Keyword",
|
||||
"filename": "autogpt_platform/frontend/src/app/(platform)/dictionaries/es.json",
|
||||
"hashed_secret": "5a6d1c612954979ea99ee33dbb2d231b00f6ac0a",
|
||||
"is_verified": false,
|
||||
"line_number": 5
|
||||
}
|
||||
],
|
||||
"autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/AgentInputsReadOnly/helpers.ts": [
|
||||
{
|
||||
"type": "Secret Keyword",
|
||||
"filename": "autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/AgentInputsReadOnly/helpers.ts",
|
||||
"hashed_secret": "cf678cab87dc1f7d1b95b964f15375e088461679",
|
||||
"is_verified": false,
|
||||
"line_number": 6
|
||||
},
|
||||
{
|
||||
"type": "Secret Keyword",
|
||||
"filename": "autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/AgentInputsReadOnly/helpers.ts",
|
||||
"hashed_secret": "f72cbb45464d487064610c5411c576ca4019d380",
|
||||
"is_verified": false,
|
||||
"line_number": 8
|
||||
}
|
||||
],
|
||||
"autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/RunAgentModal/components/ModalRunSection/helpers.ts": [
|
||||
{
|
||||
"type": "Secret Keyword",
|
||||
"filename": "autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/RunAgentModal/components/ModalRunSection/helpers.ts",
|
||||
"hashed_secret": "cf678cab87dc1f7d1b95b964f15375e088461679",
|
||||
"is_verified": false,
|
||||
"line_number": 5
|
||||
},
|
||||
{
|
||||
"type": "Secret Keyword",
|
||||
"filename": "autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/RunAgentModal/components/ModalRunSection/helpers.ts",
|
||||
"hashed_secret": "f72cbb45464d487064610c5411c576ca4019d380",
|
||||
"is_verified": false,
|
||||
"line_number": 7
|
||||
}
|
||||
],
|
||||
"autogpt_platform/frontend/src/app/(platform)/profile/(user)/integrations/page.tsx": [
|
||||
{
|
||||
"type": "Secret Keyword",
|
||||
"filename": "autogpt_platform/frontend/src/app/(platform)/profile/(user)/integrations/page.tsx",
|
||||
"hashed_secret": "cf678cab87dc1f7d1b95b964f15375e088461679",
|
||||
"is_verified": false,
|
||||
"line_number": 192
|
||||
},
|
||||
{
|
||||
"type": "Secret Keyword",
|
||||
"filename": "autogpt_platform/frontend/src/app/(platform)/profile/(user)/integrations/page.tsx",
|
||||
"hashed_secret": "86275db852204937bbdbdebe5fabe8536e030ab6",
|
||||
"is_verified": false,
|
||||
"line_number": 193
|
||||
}
|
||||
],
|
||||
"autogpt_platform/frontend/src/components/contextual/CredentialsInput/helpers.ts": [
|
||||
{
|
||||
"type": "Secret Keyword",
|
||||
"filename": "autogpt_platform/frontend/src/components/contextual/CredentialsInput/helpers.ts",
|
||||
"hashed_secret": "47acd2028cf81b5da88ddeedb2aea4eca4b71fbd",
|
||||
"is_verified": false,
|
||||
"line_number": 102
|
||||
},
|
||||
{
|
||||
"type": "Secret Keyword",
|
||||
"filename": "autogpt_platform/frontend/src/components/contextual/CredentialsInput/helpers.ts",
|
||||
"hashed_secret": "8be3c943b1609fffbfc51aad666d0a04adf83c9d",
|
||||
"is_verified": false,
|
||||
"line_number": 103
|
||||
}
|
||||
],
|
||||
"autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts": [
|
||||
{
|
||||
"type": "Base64 High Entropy String",
|
||||
"filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
|
||||
"hashed_secret": "9c486c92f1a7420e1045c7ad963fbb7ba3621025",
|
||||
"is_verified": false,
|
||||
"line_number": 73
|
||||
},
|
||||
{
|
||||
"type": "Base64 High Entropy String",
|
||||
"filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
|
||||
"hashed_secret": "9277508c7a6effc8fb59163efbfada189e35425c",
|
||||
"is_verified": false,
|
||||
"line_number": 75
|
||||
},
|
||||
{
|
||||
"type": "Base64 High Entropy String",
|
||||
"filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
|
||||
"hashed_secret": "8dc7e2cb1d0935897d541bf5facab389b8a50340",
|
||||
"is_verified": false,
|
||||
"line_number": 77
|
||||
},
|
||||
{
|
||||
"type": "Base64 High Entropy String",
|
||||
"filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
|
||||
"hashed_secret": "79a26ad48775944299be6aaf9fb1d5302c1ed75b",
|
||||
"is_verified": false,
|
||||
"line_number": 79
|
||||
},
|
||||
{
|
||||
"type": "Base64 High Entropy String",
|
||||
"filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
|
||||
"hashed_secret": "a3b62b44500a1612e48d4cab8294df81561b3b1a",
|
||||
"is_verified": false,
|
||||
"line_number": 81
|
||||
},
|
||||
{
|
||||
"type": "Base64 High Entropy String",
|
||||
"filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
|
||||
"hashed_secret": "a58979bd0b21ef4f50417d001008e60dd7a85c64",
|
||||
"is_verified": false,
|
||||
"line_number": 83
|
||||
},
|
||||
{
|
||||
"type": "Base64 High Entropy String",
|
||||
"filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
|
||||
"hashed_secret": "6cb6e075f8e8c7c850f9d128d6608e5dbe209a79",
|
||||
"is_verified": false,
|
||||
"line_number": 85
|
||||
}
|
||||
],
|
||||
"autogpt_platform/frontend/src/lib/constants.ts": [
|
||||
{
|
||||
"type": "Secret Keyword",
|
||||
"filename": "autogpt_platform/frontend/src/lib/constants.ts",
|
||||
"hashed_secret": "27b924db06a28cc755fb07c54f0fddc30659fe4d",
|
||||
"is_verified": false,
|
||||
"line_number": 10
|
||||
}
|
||||
],
|
||||
"autogpt_platform/frontend/src/tests/credentials/index.ts": [
|
||||
{
|
||||
"type": "Secret Keyword",
|
||||
"filename": "autogpt_platform/frontend/src/tests/credentials/index.ts",
|
||||
"hashed_secret": "c18006fc138809314751cd1991f1e0b820fabd37",
|
||||
"is_verified": false,
|
||||
"line_number": 4
|
||||
}
|
||||
]
|
||||
},
|
||||
"generated_at": "2026-04-02T13:10:54Z"
|
||||
}
|
||||
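The `should_exclude_file` filter in the baseline above lists nine filename patterns. The sketch below illustrates how such patterns behave, assuming they are applied as an unanchored regex search against each path. The helper `is_excluded` is hypothetical and is not detect-secrets' own code.

```python
import re

# Patterns copied from the baseline's should_exclude_file filter.
EXCLUDE_PATTERNS = [
    r"\.env$",
    r"pnpm-lock\.yaml$",
    r"\.env\.(default|example|template)$",
    r"__pycache__",
    r"_test\.py$",
    r"test_.*\.py$",
    r"conftest\.py$",
    r"poetry\.lock$",
    r"node_modules",
]
_COMPILED = [re.compile(pattern) for pattern in EXCLUDE_PATTERNS]


def is_excluded(path: str) -> bool:
    """Hypothetical helper: True if any pattern is found anywhere in the path."""
    return any(regex.search(path) for regex in _COMPILED)
```

Under that assumption, `test_.*\.py$` is not tied to the start of a filename, so a path such as `src/latest_report.py` would also be skipped. Anchoring the pattern to a path separator would narrow it if that matters.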
@@ -30,7 +30,7 @@ See `/frontend/CONTRIBUTING.md` for complete patterns. Quick reference:
|
||||
- Regenerate with `pnpm generate:api`
|
||||
- Pattern: `use{Method}{Version}{OperationName}`
|
||||
4. **Styling**: Tailwind CSS only, use design tokens, Phosphor Icons only
|
||||
5. **Testing**: Add Storybook stories for new components, Playwright for E2E
|
||||
5. **Testing**: Integration tests (Vitest + RTL + MSW) are the default (~90%, page-level). Playwright for E2E critical flows. Storybook for design system components. See `autogpt_platform/frontend/TESTING.md`
|
||||
6. **Code conventions**: Function declarations (not arrow functions) for components/handlers
|
||||
|
||||
- Component props should be `interface Props { ... }` (not exported) unless the interface needs to be used outside the component
|
||||
@@ -47,7 +47,9 @@ See `/frontend/CONTRIBUTING.md` for complete patterns. Quick reference:
|
||||
## Testing
|
||||
|
||||
- Backend: `poetry run test` (runs pytest with a docker based postgres + prisma).
|
||||
- Frontend: `pnpm test` or `pnpm test-ui` for Playwright tests. See `docs/content/platform/contributing/tests.md` for tips.
|
||||
- Frontend integration tests: `pnpm test:unit` (Vitest + RTL + MSW, primary testing approach).
|
||||
- Frontend E2E tests: `pnpm test` or `pnpm test-ui` for Playwright tests.
|
||||
- See `autogpt_platform/frontend/TESTING.md` for the full testing strategy.
|
||||
|
||||
Always run the relevant linters and tests before committing.
|
||||
Use conventional commit messages for all commits (e.g. `feat(backend): add API`).
|
||||
|
||||
@@ -0,0 +1,98 @@
import logging
from datetime import datetime

from autogpt_libs.auth import get_user_id, requires_admin_user
from cachetools import TTLCache
from fastapi import APIRouter, Query, Security
from pydantic import BaseModel

from backend.data.platform_cost import (
    CostLogRow,
    PlatformCostDashboard,
    get_platform_cost_dashboard,
    get_platform_cost_logs,
)
from backend.util.models import Pagination

logger = logging.getLogger(__name__)

# Cache dashboard results for 30 seconds per unique filter combination.
# The table is append-only so stale reads are acceptable for analytics.
_DASHBOARD_CACHE_TTL = 30
_dashboard_cache: TTLCache[tuple, PlatformCostDashboard] = TTLCache(
    maxsize=256, ttl=_DASHBOARD_CACHE_TTL
)


router = APIRouter(
    prefix="/platform-costs",
    tags=["platform-cost", "admin"],
    dependencies=[Security(requires_admin_user)],
)


class PlatformCostLogsResponse(BaseModel):
    logs: list[CostLogRow]
    pagination: Pagination


@router.get(
    "/dashboard",
    response_model=PlatformCostDashboard,
    summary="Get Platform Cost Dashboard",
)
async def get_cost_dashboard(
    admin_user_id: str = Security(get_user_id),
    start: datetime | None = Query(None),
    end: datetime | None = Query(None),
    provider: str | None = Query(None),
    user_id: str | None = Query(None),
):
    logger.info("Admin %s fetching platform cost dashboard", admin_user_id)
    cache_key = (start, end, provider, user_id)
    cached = _dashboard_cache.get(cache_key)
    if cached is not None:
        return cached
    result = await get_platform_cost_dashboard(
        start=start,
        end=end,
        provider=provider,
        user_id=user_id,
    )
    _dashboard_cache[cache_key] = result
    return result


@router.get(
    "/logs",
    response_model=PlatformCostLogsResponse,
    summary="Get Platform Cost Logs",
)
async def get_cost_logs(
    admin_user_id: str = Security(get_user_id),
    start: datetime | None = Query(None),
    end: datetime | None = Query(None),
    provider: str | None = Query(None),
    user_id: str | None = Query(None),
    page: int = Query(1, ge=1),
    page_size: int = Query(50, ge=1, le=200),
):
    logger.info("Admin %s fetching platform cost logs", admin_user_id)
    logs, total = await get_platform_cost_logs(
        start=start,
        end=end,
        provider=provider,
        user_id=user_id,
        page=page,
        page_size=page_size,
    )
    total_pages = (total + page_size - 1) // page_size
    return PlatformCostLogsResponse(
        logs=logs,
        pagination=Pagination(
            total_items=total,
            total_pages=total_pages,
            current_page=page,
            page_size=page_size,
        ),
    )
||||
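The `/logs` route derives `total_pages` with integer ceiling division. A small standalone sketch of that arithmetic follows; the function name `total_pages` and the explicit guard are illustrative assumptions (the route itself relies on `Query(..., ge=1)` to keep `page_size` positive).

```python
def total_pages(total_items: int, page_size: int) -> int:
    """Ceiling division using integers only: (total + size - 1) // size."""
    if page_size < 1:
        # The route enforces this through query validation; the guard here
        # only protects the standalone sketch.
        raise ValueError("page_size must be >= 1")
    return (total_items + page_size - 1) // page_size
```

With this formula an empty result set yields zero pages, and a single extra row past a page boundary adds one more page, without any floating-point rounding.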
@@ -0,0 +1,192 @@
from unittest.mock import AsyncMock

import fastapi
import fastapi.testclient
import pytest
import pytest_mock
from autogpt_libs.auth.jwt_utils import get_jwt_payload

from backend.data.platform_cost import PlatformCostDashboard

from . import platform_cost_routes
from .platform_cost_routes import router as platform_cost_router

app = fastapi.FastAPI()
app.include_router(platform_cost_router)

client = fastapi.testclient.TestClient(app)


@pytest.fixture(autouse=True)
def setup_app_admin_auth(mock_jwt_admin):
    """Setup admin auth overrides for all tests in this module"""
    app.dependency_overrides[get_jwt_payload] = mock_jwt_admin["get_jwt_payload"]
    # Clear TTL cache so each test starts cold.
    platform_cost_routes._dashboard_cache.clear()
    yield
    app.dependency_overrides.clear()


def test_get_dashboard_success(
    mocker: pytest_mock.MockerFixture,
) -> None:
    real_dashboard = PlatformCostDashboard(
        by_provider=[],
        by_user=[],
        total_cost_microdollars=0,
        total_requests=0,
        total_users=0,
    )
    mocker.patch(
        "backend.api.features.admin.platform_cost_routes.get_platform_cost_dashboard",
        AsyncMock(return_value=real_dashboard),
    )

    response = client.get("/platform-costs/dashboard")
    assert response.status_code == 200
    data = response.json()
    assert "by_provider" in data
    assert "by_user" in data
    assert data["total_cost_microdollars"] == 0


def test_get_logs_success(
    mocker: pytest_mock.MockerFixture,
) -> None:
    mocker.patch(
        "backend.api.features.admin.platform_cost_routes.get_platform_cost_logs",
        AsyncMock(return_value=([], 0)),
    )

    response = client.get("/platform-costs/logs")
    assert response.status_code == 200
    data = response.json()
    assert data["logs"] == []
    assert data["pagination"]["total_items"] == 0


def test_get_dashboard_with_filters(
    mocker: pytest_mock.MockerFixture,
) -> None:
    real_dashboard = PlatformCostDashboard(
        by_provider=[],
        by_user=[],
        total_cost_microdollars=0,
        total_requests=0,
        total_users=0,
    )
    mock_dashboard = AsyncMock(return_value=real_dashboard)
    mocker.patch(
        "backend.api.features.admin.platform_cost_routes.get_platform_cost_dashboard",
        mock_dashboard,
    )

    response = client.get(
        "/platform-costs/dashboard",
        params={
            "start": "2026-01-01T00:00:00",
            "end": "2026-04-01T00:00:00",
            "provider": "openai",
            "user_id": "test-user-123",
        },
    )
    assert response.status_code == 200
    mock_dashboard.assert_called_once()
    call_kwargs = mock_dashboard.call_args.kwargs
    assert call_kwargs["provider"] == "openai"
    assert call_kwargs["user_id"] == "test-user-123"
    assert call_kwargs["start"] is not None
    assert call_kwargs["end"] is not None


def test_get_logs_with_pagination(
    mocker: pytest_mock.MockerFixture,
) -> None:
    mocker.patch(
        "backend.api.features.admin.platform_cost_routes.get_platform_cost_logs",
        AsyncMock(return_value=([], 0)),
    )

    response = client.get(
        "/platform-costs/logs",
        params={"page": 2, "page_size": 25, "provider": "anthropic"},
    )
    assert response.status_code == 200
    data = response.json()
    assert data["pagination"]["current_page"] == 2
    assert data["pagination"]["page_size"] == 25


def test_get_dashboard_requires_admin() -> None:
    import fastapi
    from fastapi import HTTPException

    def reject_jwt(request: fastapi.Request):
        raise HTTPException(status_code=401, detail="Not authenticated")

    app.dependency_overrides[get_jwt_payload] = reject_jwt
    try:
        response = client.get("/platform-costs/dashboard")
        assert response.status_code == 401
        response = client.get("/platform-costs/logs")
        assert response.status_code == 401
    finally:
        app.dependency_overrides.clear()


def test_get_dashboard_rejects_non_admin(mock_jwt_user, mock_jwt_admin) -> None:
    """Non-admin JWT must be rejected with 403 by requires_admin_user."""
    app.dependency_overrides[get_jwt_payload] = mock_jwt_user["get_jwt_payload"]
    try:
        response = client.get("/platform-costs/dashboard")
        assert response.status_code == 403
        response = client.get("/platform-costs/logs")
        assert response.status_code == 403
    finally:
        app.dependency_overrides[get_jwt_payload] = mock_jwt_admin["get_jwt_payload"]


def test_get_logs_invalid_page_size_too_large() -> None:
    """page_size > 200 must be rejected with 422."""
    response = client.get("/platform-costs/logs", params={"page_size": 201})
    assert response.status_code == 422


def test_get_logs_invalid_page_size_zero() -> None:
    """page_size = 0 (below ge=1) must be rejected with 422."""
    response = client.get("/platform-costs/logs", params={"page_size": 0})
    assert response.status_code == 422


def test_get_logs_invalid_page_negative() -> None:
    """page < 1 must be rejected with 422."""
    response = client.get("/platform-costs/logs", params={"page": 0})
    assert response.status_code == 422


def test_get_dashboard_invalid_date_format() -> None:
    """Malformed start date must be rejected with 422."""
    response = client.get("/platform-costs/dashboard", params={"start": "not-a-date"})
    assert response.status_code == 422


def test_get_dashboard_cache_hit(
    mocker: pytest_mock.MockerFixture,
) -> None:
    """Second identical request returns cached result without calling the DB again."""
    real_dashboard = PlatformCostDashboard(
        by_provider=[],
        by_user=[],
        total_cost_microdollars=42,
        total_requests=1,
        total_users=1,
    )
    mock_fn = mocker.patch(
        "backend.api.features.admin.platform_cost_routes.get_platform_cost_dashboard",
        AsyncMock(return_value=real_dashboard),
    )

    client.get("/platform-costs/dashboard")
    client.get("/platform-costs/dashboard")

    mock_fn.assert_awaited_once()  # second request hit the cache
||||
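The cache-hit test above depends on the dashboard route storing results per filter tuple for a fixed time window. The following is a rough, stdlib-only sketch of that read-through pattern. `SimpleTTLCache`, `get_dashboard` and the injectable `clock` are hypothetical names for illustration. The route itself uses `cachetools.TTLCache`, which also bounds the cache size, something this sketch omits.

```python
import time
from typing import Any, Callable, Dict, Hashable, Optional, Tuple


class SimpleTTLCache:
    """Hypothetical stand-in for a TTL cache; not the cachetools implementation."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._clock = clock
        self._store: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key: Hashable) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Entry is stale: drop it so the caller reloads.
            del self._store[key]
            return None
        return value

    def set(self, key: Hashable, value: Any) -> None:
        self._store[key] = (self._clock() + self._ttl, value)


def get_dashboard(
    cache: SimpleTTLCache,
    filters: Tuple[Any, ...],
    loader: Callable[[Tuple[Any, ...]], Any],
) -> Any:
    """Read-through lookup: return the cached value or load and store it."""
    cached = cache.get(filters)
    if cached is not None:
        return cached
    result = loader(filters)
    cache.set(filters, result)
    return result
```

Passing the clock in as a parameter lets a test advance time without sleeping. That is one way to cover expiry, which the test module above does not exercise.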
@@ -9,11 +9,14 @@ from pydantic import BaseModel
|
||||
|
||||
from backend.copilot.config import ChatConfig
|
||||
from backend.copilot.rate_limit import (
|
||||
SubscriptionTier,
|
||||
get_global_rate_limits,
|
||||
get_usage_status,
|
||||
get_user_tier,
|
||||
reset_user_usage,
|
||||
set_user_tier,
|
||||
)
|
||||
from backend.data.user import get_user_by_email, get_user_email_by_id
|
||||
from backend.data.user import get_user_by_email, get_user_email_by_id, search_users
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
@@ -33,6 +36,17 @@ class UserRateLimitResponse(BaseModel):
|
||||
weekly_token_limit: int
|
||||
daily_tokens_used: int
|
||||
weekly_tokens_used: int
|
||||
tier: SubscriptionTier
|
||||
|
||||
|
||||
class UserTierResponse(BaseModel):
|
||||
user_id: str
|
||||
tier: SubscriptionTier
|
||||
|
||||
|
||||
class SetUserTierRequest(BaseModel):
|
||||
user_id: str
|
||||
tier: SubscriptionTier
|
||||
|
||||
|
||||
async def _resolve_user_id(
|
||||
@@ -86,10 +100,10 @@ async def get_user_rate_limit(
|
||||
|
||||
logger.info("Admin %s checking rate limit for user %s", admin_user_id, resolved_id)
|
||||
|
||||
daily_limit, weekly_limit = await get_global_rate_limits(
|
||||
daily_limit, weekly_limit, tier = await get_global_rate_limits(
|
||||
resolved_id, config.daily_token_limit, config.weekly_token_limit
|
||||
)
|
||||
usage = await get_usage_status(resolved_id, daily_limit, weekly_limit)
|
||||
usage = await get_usage_status(resolved_id, daily_limit, weekly_limit, tier=tier)
|
||||
|
||||
return UserRateLimitResponse(
|
||||
user_id=resolved_id,
|
||||
@@ -98,6 +112,7 @@ async def get_user_rate_limit(
|
||||
weekly_token_limit=weekly_limit,
|
||||
daily_tokens_used=usage.daily.used,
|
||||
weekly_tokens_used=usage.weekly.used,
|
||||
tier=tier,
|
||||
)
|
||||
|
||||
|
||||
@@ -125,10 +140,10 @@ async def reset_user_rate_limit(
|
||||
logger.exception("Failed to reset user usage")
|
||||
raise HTTPException(status_code=500, detail="Failed to reset usage") from e
|
||||
|
||||
daily_limit, weekly_limit = await get_global_rate_limits(
|
||||
daily_limit, weekly_limit, tier = await get_global_rate_limits(
|
||||
user_id, config.daily_token_limit, config.weekly_token_limit
|
||||
)
|
||||
usage = await get_usage_status(user_id, daily_limit, weekly_limit)
|
||||
usage = await get_usage_status(user_id, daily_limit, weekly_limit, tier=tier)
|
||||
|
||||
try:
|
||||
resolved_email = await get_user_email_by_id(user_id)
|
||||
@@ -143,4 +158,102 @@ async def reset_user_rate_limit(
        weekly_token_limit=weekly_limit,
        daily_tokens_used=usage.daily.used,
        weekly_tokens_used=usage.weekly.used,
        tier=tier,
    )


@router.get(
    "/rate_limit/tier",
    response_model=UserTierResponse,
    summary="Get User Rate Limit Tier",
)
async def get_user_rate_limit_tier(
    user_id: str,
    admin_user_id: str = Security(get_user_id),
) -> UserTierResponse:
    """Get a user's current rate-limit tier. Admin-only.

    Returns 404 if the user does not exist in the database.
    """
    logger.info("Admin %s checking tier for user %s", admin_user_id, user_id)

    resolved_email = await get_user_email_by_id(user_id)
    if resolved_email is None:
        raise HTTPException(status_code=404, detail=f"User {user_id} not found")

    tier = await get_user_tier(user_id)
    return UserTierResponse(user_id=user_id, tier=tier)


@router.post(
    "/rate_limit/tier",
    response_model=UserTierResponse,
    summary="Set User Rate Limit Tier",
)
async def set_user_rate_limit_tier(
    request: SetUserTierRequest,
    admin_user_id: str = Security(get_user_id),
) -> UserTierResponse:
    """Set a user's rate-limit tier. Admin-only.

    Returns 404 if the user does not exist in the database.
    """
    try:
        resolved_email = await get_user_email_by_id(request.user_id)
    except Exception:
        logger.warning(
            "Failed to resolve email for user %s",
            request.user_id,
            exc_info=True,
        )
        resolved_email = None

    if resolved_email is None:
        raise HTTPException(status_code=404, detail=f"User {request.user_id} not found")

    old_tier = await get_user_tier(request.user_id)
    logger.info(
        "Admin %s changing tier for user %s (%s): %s -> %s",
        admin_user_id,
        request.user_id,
        resolved_email,
        old_tier.value,
        request.tier.value,
    )
    try:
        await set_user_tier(request.user_id, request.tier)
    except Exception as e:
        logger.exception("Failed to set user tier")
        raise HTTPException(status_code=500, detail="Failed to set tier") from e

    return UserTierResponse(user_id=request.user_id, tier=request.tier)


class UserSearchResult(BaseModel):
    user_id: str
    user_email: Optional[str] = None


@router.get(
    "/rate_limit/search_users",
    response_model=list[UserSearchResult],
    summary="Search Users by Name or Email",
)
async def admin_search_users(
    query: str,
    limit: int = 20,
    admin_user_id: str = Security(get_user_id),
) -> list[UserSearchResult]:
    """Search users by partial email or name. Admin-only.

    Queries the User table directly, so it returns results even for users
    without credit transaction history.
    """
    if len(query.strip()) < 3:
        raise HTTPException(
            status_code=400,
            detail="Search query must be at least 3 characters.",
        )
    logger.info("Admin %s searching users with query=%r", admin_user_id, query)
    results = await search_users(query, limit=max(1, min(limit, 50)))
    return [UserSearchResult(user_id=uid, user_email=email) for uid, email in results]
||||
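The search endpoint above guards its inputs in two ways: it rejects queries shorter than three characters after trimming, and it clamps `limit` into the range 1 to 50 instead of rejecting out-of-range values. A minimal sketch of those two checks in isolation follows; the helper names `is_valid_query` and `clamp_limit` are hypothetical and do not appear in the route module.

```python
def is_valid_query(query: str, min_length: int = 3) -> bool:
    """True when the query has at least min_length characters after trimming."""
    return len(query.strip()) >= min_length


def clamp_limit(limit: int, lower: int = 1, upper: int = 50) -> int:
    """Force limit into [lower, upper], mirroring max(1, min(limit, 50))."""
    return max(lower, min(limit, upper))
```

Clamping means a caller asking for 500 rows silently receives at most 50. Declaring the bounds on the query parameter would return a validation error instead, if an explicit failure is preferred.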
@@ -9,7 +9,7 @@ import pytest_mock
|
||||
from autogpt_libs.auth.jwt_utils import get_jwt_payload
|
||||
from pytest_snapshot.plugin import Snapshot
|
||||
|
||||
from backend.copilot.rate_limit import CoPilotUsageStatus, UsageWindow
|
||||
from backend.copilot.rate_limit import CoPilotUsageStatus, SubscriptionTier, UsageWindow
|
||||
|
||||
from .rate_limit_admin_routes import router as rate_limit_admin_router
|
||||
|
||||
@@ -57,7 +57,7 @@ def _patch_rate_limit_deps(
|
||||
mocker.patch(
|
||||
f"{_MOCK_MODULE}.get_global_rate_limits",
|
||||
new_callable=AsyncMock,
|
||||
return_value=(2_500_000, 12_500_000),
|
||||
return_value=(2_500_000, 12_500_000, SubscriptionTier.FREE),
|
||||
)
|
||||
mocker.patch(
|
||||
f"{_MOCK_MODULE}.get_usage_status",
|
||||
@@ -89,6 +89,7 @@ def test_get_rate_limit(
|
||||
assert data["weekly_token_limit"] == 12_500_000
|
||||
assert data["daily_tokens_used"] == 500_000
|
||||
assert data["weekly_tokens_used"] == 3_000_000
|
||||
assert data["tier"] == "FREE"
|
||||
|
||||
configured_snapshot.assert_match(
|
||||
json.dumps(data, indent=2, sort_keys=True) + "\n",
|
||||
@@ -162,6 +163,7 @@ def test_reset_user_usage_daily_only(
|
||||
assert data["daily_tokens_used"] == 0
|
||||
# Weekly is untouched
|
||||
assert data["weekly_tokens_used"] == 3_000_000
|
||||
assert data["tier"] == "FREE"
|
||||
|
||||
mock_reset.assert_awaited_once_with(target_user_id, reset_weekly=False)
|
||||
|
||||
@@ -192,6 +194,7 @@ def test_reset_user_usage_daily_and_weekly(
|
||||
data = response.json()
|
||||
assert data["daily_tokens_used"] == 0
|
||||
assert data["weekly_tokens_used"] == 0
|
||||
assert data["tier"] == "FREE"
|
||||
|
||||
mock_reset.assert_awaited_once_with(target_user_id, reset_weekly=True)
|
||||
|
||||
@@ -228,7 +231,7 @@ def test_get_rate_limit_email_lookup_failure(
|
||||
mocker.patch(
|
||||
f"{_MOCK_MODULE}.get_global_rate_limits",
|
||||
new_callable=AsyncMock,
|
||||
return_value=(2_500_000, 12_500_000),
|
||||
return_value=(2_500_000, 12_500_000, SubscriptionTier.FREE),
|
||||
)
|
||||
mocker.patch(
|
||||
f"{_MOCK_MODULE}.get_usage_status",
|
||||
@@ -261,3 +264,303 @@ def test_admin_endpoints_require_admin_role(mock_jwt_user) -> None:
|
||||
json={"user_id": "test"},
|
||||
)
|
||||
assert response.status_code == 403
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Tier management endpoints
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def test_get_user_tier(
|
||||
mocker: pytest_mock.MockerFixture,
|
||||
target_user_id: str,
|
||||
) -> None:
|
||||
"""Test getting a user's rate-limit tier."""
|
||||
mocker.patch(
|
||||
f"{_MOCK_MODULE}.get_user_email_by_id",
|
||||
new_callable=AsyncMock,
|
||||
return_value=_TARGET_EMAIL,
|
||||
)
|
||||
mocker.patch(
|
||||
f"{_MOCK_MODULE}.get_user_tier",
|
||||
new_callable=AsyncMock,
|
||||
return_value=SubscriptionTier.PRO,
|
||||
)
|
||||
|
||||
response = client.get("/admin/rate_limit/tier", params={"user_id": target_user_id})
|
||||
|
||||
assert response.status_code == 200
|
||||
data = response.json()
|
||||
assert data["user_id"] == target_user_id
|
||||
assert data["tier"] == "PRO"
|
||||
|
||||
|
||||
def test_get_user_tier_user_not_found(
|
||||
mocker: pytest_mock.MockerFixture,
|
||||
target_user_id: str,
|
||||
) -> None:
|
||||
"""Test that getting tier for a non-existent user returns 404."""
|
||||
mocker.patch(
|
||||
f"{_MOCK_MODULE}.get_user_email_by_id",
|
||||
new_callable=AsyncMock,
|
||||
return_value=None,
|
||||
)
|
||||
|
||||
response = client.get("/admin/rate_limit/tier", params={"user_id": target_user_id})
|
||||
|
||||
assert response.status_code == 404
|
||||
|
||||
|
||||
def test_set_user_tier(
|
||||
mocker: pytest_mock.MockerFixture,
|
||||
target_user_id: str,
|
||||
) -> None:
|
||||
"""Test setting a user's rate-limit tier (upgrade)."""
|
||||
mocker.patch(
|
||||
f"{_MOCK_MODULE}.get_user_email_by_id",
|
||||
new_callable=AsyncMock,
|
||||
return_value=_TARGET_EMAIL,
|
||||
)
|
||||
mocker.patch(
|
||||
f"{_MOCK_MODULE}.get_user_tier",
|
||||
new_callable=AsyncMock,
|
||||
return_value=SubscriptionTier.FREE,
|
||||
)
|
||||
mock_set = mocker.patch(
|
||||
f"{_MOCK_MODULE}.set_user_tier",
|
||||
new_callable=AsyncMock,
|
||||
)
|
||||
|
||||
response = client.post(
|
||||
"/admin/rate_limit/tier",
|
||||
json={"user_id": target_user_id, "tier": "ENTERPRISE"},
|
||||
)
|
||||
|
||||
assert response.status_code == 200
|
||||
data = response.json()
|
||||
assert data["user_id"] == target_user_id
|
||||
assert data["tier"] == "ENTERPRISE"
|
||||
mock_set.assert_awaited_once_with(target_user_id, SubscriptionTier.ENTERPRISE)
|
||||
|
||||
|
||||
def test_set_user_tier_downgrade(
|
||||
mocker: pytest_mock.MockerFixture,
|
||||
target_user_id: str,
|
||||
) -> None:
|
||||
"""Test downgrading a user's tier from PRO to FREE."""
|
||||
mocker.patch(
|
||||
f"{_MOCK_MODULE}.get_user_email_by_id",
|
||||
new_callable=AsyncMock,
|
||||
return_value=_TARGET_EMAIL,
|
||||
)
|
||||
mocker.patch(
|
||||
f"{_MOCK_MODULE}.get_user_tier",
|
||||
new_callable=AsyncMock,
|
||||
return_value=SubscriptionTier.PRO,
|
||||
)
|
||||
mock_set = mocker.patch(
|
||||
f"{_MOCK_MODULE}.set_user_tier",
|
||||
new_callable=AsyncMock,
|
||||
)
|
||||
|
||||
response = client.post(
|
||||
"/admin/rate_limit/tier",
|
||||
json={"user_id": target_user_id, "tier": "FREE"},
|
||||
)
|
||||
|
||||
assert response.status_code == 200
|
||||
data = response.json()
|
||||
assert data["user_id"] == target_user_id
|
||||
assert data["tier"] == "FREE"
|
||||
mock_set.assert_awaited_once_with(target_user_id, SubscriptionTier.FREE)
|
||||
|
||||
|
||||
def test_set_user_tier_invalid_tier(
|
||||
target_user_id: str,
|
||||
) -> None:
|
||||
"""Test that setting an invalid tier returns 422."""
|
||||
response = client.post(
|
||||
"/admin/rate_limit/tier",
|
||||
json={"user_id": target_user_id, "tier": "invalid"},
|
||||
)
|
||||
|
||||
assert response.status_code == 422
|
||||
|
||||
|
||||
def test_set_user_tier_invalid_tier_uppercase(
|
||||
target_user_id: str,
|
||||
) -> None:
|
||||
"""Test that setting an unrecognised uppercase tier (e.g. 'INVALID') returns 422.
|
||||
|
||||
Regression: ensures Pydantic enum validation rejects values that are not
|
||||
members of SubscriptionTier, even when they look like valid enum names.
|
||||
"""
|
||||
response = client.post(
|
||||
"/admin/rate_limit/tier",
|
||||
json={"user_id": target_user_id, "tier": "INVALID"},
|
||||
)
|
||||
|
||||
assert response.status_code == 422
|
||||
body = response.json()
|
||||
assert "detail" in body
|
||||
|
||||
|
||||
def test_set_user_tier_email_lookup_failure_returns_404(
|
||||
mocker: pytest_mock.MockerFixture,
|
||||
target_user_id: str,
|
||||
) -> None:
|
||||
"""Test that email lookup failure returns 404 (user unverifiable)."""
|
||||
mocker.patch(
|
||||
f"{_MOCK_MODULE}.get_user_email_by_id",
|
||||
new_callable=AsyncMock,
|
||||
side_effect=Exception("DB connection failed"),
|
||||
)
|
||||
|
||||
response = client.post(
|
||||
"/admin/rate_limit/tier",
|
||||
json={"user_id": target_user_id, "tier": "PRO"},
|
||||
)
|
||||
|
||||
assert response.status_code == 404
|
||||
|
||||
|
||||
def test_set_user_tier_user_not_found(
|
||||
mocker: pytest_mock.MockerFixture,
|
||||
target_user_id: str,
|
||||
) -> None:
|
||||
"""Test that setting tier for a non-existent user returns 404."""
|
||||
mocker.patch(
|
||||
f"{_MOCK_MODULE}.get_user_email_by_id",
|
||||
new_callable=AsyncMock,
|
||||
return_value=None,
|
||||
)
|
||||
|
||||
response = client.post(
|
||||
"/admin/rate_limit/tier",
|
||||
json={"user_id": target_user_id, "tier": "PRO"},
|
||||
)
|
||||
|
||||
assert response.status_code == 404
|
||||
|
||||
|
||||
def test_set_user_tier_db_failure(
|
||||
mocker: pytest_mock.MockerFixture,
|
||||
target_user_id: str,
|
||||
) -> None:
|
||||
"""Test that DB failure on set tier returns 500."""
|
||||
mocker.patch(
|
||||
f"{_MOCK_MODULE}.get_user_email_by_id",
|
||||
new_callable=AsyncMock,
|
||||
return_value=_TARGET_EMAIL,
|
||||
)
|
||||
mocker.patch(
|
||||
f"{_MOCK_MODULE}.get_user_tier",
|
||||
new_callable=AsyncMock,
|
||||
return_value=SubscriptionTier.FREE,
|
||||
)
|
||||
mocker.patch(
|
||||
f"{_MOCK_MODULE}.set_user_tier",
|
||||
new_callable=AsyncMock,
|
||||
side_effect=Exception("DB connection refused"),
|
||||
)
|
||||
|
||||
response = client.post(
|
||||
"/admin/rate_limit/tier",
|
||||
json={"user_id": target_user_id, "tier": "PRO"},
|
||||
)
|
||||
|
||||
assert response.status_code == 500
|
||||
|
||||
|
||||
def test_tier_endpoints_require_admin_role(mock_jwt_user) -> None:
|
||||
"""Test that tier admin endpoints require admin role."""
|
||||
app.dependency_overrides[get_jwt_payload] = mock_jwt_user["get_jwt_payload"]
|
||||
|
||||
response = client.get("/admin/rate_limit/tier", params={"user_id": "test"})
|
||||
assert response.status_code == 403
|
||||
|
||||
response = client.post(
|
||||
"/admin/rate_limit/tier",
|
||||
json={"user_id": "test", "tier": "PRO"},
|
||||
)
|
||||
assert response.status_code == 403
|
||||
|
||||
|
||||
# ─── search_users endpoint ──────────────────────────────────────────
|
||||
|
||||
|
||||
def test_search_users_returns_matching_users(
|
||||
mocker: pytest_mock.MockerFixture,
|
||||
admin_user_id: str,
|
||||
) -> None:
|
||||
"""Partial search should return all matching users from the User table."""
|
||||
mocker.patch(
|
||||
_MOCK_MODULE + ".search_users",
|
||||
new_callable=AsyncMock,
|
||||
return_value=[
|
||||
("user-1", "zamil.majdy@gmail.com"),
|
||||
("user-2", "zamil.majdy@agpt.co"),
|
||||
],
|
||||
)
|
||||
|
||||
response = client.get("/admin/rate_limit/search_users", params={"query": "zamil"})
|
||||
|
||||
assert response.status_code == 200
|
||||
results = response.json()
|
||||
assert len(results) == 2
|
||||
assert results[0]["user_email"] == "zamil.majdy@gmail.com"
|
||||
assert results[1]["user_email"] == "zamil.majdy@agpt.co"
|
||||
|
||||
|
||||
def test_search_users_empty_results(
|
||||
mocker: pytest_mock.MockerFixture,
|
||||
admin_user_id: str,
|
||||
) -> None:
|
||||
"""Search with no matches returns empty list."""
|
||||
mocker.patch(
|
||||
_MOCK_MODULE + ".search_users",
|
||||
new_callable=AsyncMock,
|
||||
return_value=[],
|
||||
)
|
||||
|
||||
response = client.get(
|
||||
"/admin/rate_limit/search_users", params={"query": "nonexistent"}
|
||||
)
|
||||
|
||||
assert response.status_code == 200
|
||||
assert response.json() == []
|
||||
|
||||
|
||||
def test_search_users_short_query_rejected(
|
||||
admin_user_id: str,
|
||||
) -> None:
|
||||
"""Query shorter than 3 characters should return 400."""
|
||||
response = client.get("/admin/rate_limit/search_users", params={"query": "ab"})
|
||||
assert response.status_code == 400
|
||||
|
||||
|
||||
def test_search_users_negative_limit_clamped(
|
||||
mocker: pytest_mock.MockerFixture,
|
||||
admin_user_id: str,
|
||||
) -> None:
|
||||
"""Negative limit should be clamped to 1, not passed through."""
|
||||
mock_search = mocker.patch(
|
||||
_MOCK_MODULE + ".search_users",
|
||||
new_callable=AsyncMock,
|
||||
return_value=[],
|
||||
)
|
||||
|
||||
response = client.get(
|
||||
"/admin/rate_limit/search_users", params={"query": "test", "limit": -1}
|
||||
)
|
||||
|
||||
assert response.status_code == 200
|
||||
mock_search.assert_awaited_once_with("test", limit=1)
|
||||
|
||||
|
||||
def test_search_users_requires_admin_role(mock_jwt_user) -> None:
|
||||
"""Test that the search_users endpoint requires admin role."""
|
||||
app.dependency_overrides[get_jwt_payload] = mock_jwt_user["get_jwt_payload"]
|
||||
|
||||
response = client.get("/admin/rate_limit/search_users", params={"query": "test"})
|
||||
assert response.status_code == 403
|
||||
|
||||
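The search_users tests above pin down two input rules: a query shorter than 3 characters is rejected with a 400, and a negative `limit` is clamped to 1 instead of being passed through. A minimal sketch of that normalization follows. The helper name `normalize_search_params`, the `BadRequest` stand-in, and the upper bound `MAX_LIMIT` are assumptions for illustration only; the diff does not show how the real route implements these rules.

```python
MIN_QUERY_LEN = 3  # from the tests: queries shorter than 3 characters are rejected
MAX_LIMIT = 50     # assumption: the diff does not show the real upper bound


class BadRequest(ValueError):
    """Stand-in for an HTTP 400 response in this sketch."""


def normalize_search_params(query: str, limit: int) -> tuple[str, int]:
    """Validate the query length and clamp limit into [1, MAX_LIMIT]."""
    if len(query) < MIN_QUERY_LEN:
        raise BadRequest(f"query must be at least {MIN_QUERY_LEN} characters")
    # max(1, ...) turns zero or negative limits into 1; min(...) caps large ones.
    return query, max(1, min(limit, MAX_LIMIT))
```

Clamping rather than rejecting keeps a sloppy client working, which matches the behaviour asserted by `test_search_users_negative_limit_clamped`.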
@@ -15,7 +15,8 @@ from pydantic import BaseModel, ConfigDict, Field, field_validator
|
||||
|
||||
from backend.copilot import service as chat_service
|
||||
from backend.copilot import stream_registry
|
||||
-from backend.copilot.config import ChatConfig
+from backend.copilot.config import ChatConfig, CopilotMode
+from backend.copilot.db import get_chat_messages_paginated
|
||||
from backend.copilot.executor.utils import enqueue_cancel_task, enqueue_copilot_turn
|
||||
from backend.copilot.model import (
|
||||
ChatMessage,
|
||||
@@ -111,6 +112,11 @@ class StreamChatRequest(BaseModel):
|
||||
file_ids: list[str] | None = Field(
|
||||
default=None, max_length=20
|
||||
) # Workspace file IDs attached to this message
|
||||
mode: CopilotMode | None = Field(
|
||||
default=None,
|
||||
description="Autopilot mode: 'fast' for baseline LLM, 'extended_thinking' for Claude Agent SDK. "
|
||||
"If None, uses the server default (extended_thinking).",
|
||||
)
|
||||
|
||||
|
||||
class CreateSessionRequest(BaseModel):
|
||||
@@ -150,6 +156,8 @@ class SessionDetailResponse(BaseModel):
|
||||
user_id: str | None
|
||||
messages: list[dict]
|
||||
active_stream: ActiveStreamInfo | None = None # Present if stream is still active
|
||||
has_more_messages: bool = False
|
||||
oldest_sequence: int | None = None
|
||||
total_prompt_tokens: int = 0
|
||||
total_completion_tokens: int = 0
|
||||
metadata: ChatSessionMetadata = ChatSessionMetadata()
|
||||
@@ -389,60 +397,78 @@ async def update_session_title_route(
|
||||
async def get_session(
|
||||
session_id: str,
|
||||
user_id: Annotated[str, Security(auth.get_user_id)],
|
||||
limit: int = Query(default=50, ge=1, le=200),
|
||||
before_sequence: int | None = Query(default=None, ge=0),
|
||||
) -> SessionDetailResponse:
|
||||
"""
|
||||
Retrieve the details of a specific chat session.
|
||||
|
||||
Looks up a chat session by ID for the given user (if authenticated) and returns all session data including messages.
|
||||
If there's an active stream for this session, returns active_stream info for reconnection.
|
||||
Supports cursor-based pagination via ``limit`` and ``before_sequence``.
|
||||
When no pagination params are provided, returns the most recent messages.
|
||||
|
||||
     Args:
         session_id: The unique identifier for the desired chat session.
-        user_id: The optional authenticated user ID, or None for anonymous access.
+        user_id: The authenticated user's ID.
+        limit: Maximum number of messages to return (1-200, default 50).
+        before_sequence: Return messages with sequence < this value (cursor).

     Returns:
-        SessionDetailResponse: Details for the requested session, including active_stream info if applicable.
-
+        SessionDetailResponse: Details for the requested session, including
+            active_stream info and pagination metadata.
|
||||
"""
|
||||
-    session = await get_chat_session(session_id, user_id)
-    if not session:
+    page = await get_chat_messages_paginated(
+        session_id, limit, before_sequence, user_id=user_id
+    )
+    if page is None:
         raise NotFoundError(f"Session {session_id} not found.")
+    messages = [message.model_dump() for message in page.messages]

-    messages = [message.model_dump() for message in session.messages]
-
-    # Check if there's an active stream for this session
+    # Only check active stream on initial load (not on "load more" requests)
     active_stream_info = None
-    active_session, last_message_id = await stream_registry.get_active_session(
-        session_id, user_id
-    )
-    logger.info(
-        f"[GET_SESSION] session={session_id}, active_session={active_session is not None}, "
-        f"msg_count={len(messages)}, last_role={messages[-1].get('role') if messages else 'none'}"
-    )
-    if active_session:
-        # Keep the assistant message (including tool_calls) so the frontend can
-        # render the correct tool UI (e.g. CreateAgent with mini game).
-        # convertChatSessionToUiMessages handles isComplete=false by setting
-        # tool parts without output to state "input-available".
-        active_stream_info = ActiveStreamInfo(
-            turn_id=active_session.turn_id,
-            last_message_id=last_message_id,
+    if before_sequence is None:
+        active_session, last_message_id = await stream_registry.get_active_session(
+            session_id, user_id
+        )
+        logger.info(
+            f"[GET_SESSION] session={session_id}, active_session={active_session is not None}, "
+            f"msg_count={len(messages)}, last_role={messages[-1].get('role') if messages else 'none'}"
+        )
+        if active_session:
+            active_stream_info = ActiveStreamInfo(
+                turn_id=active_session.turn_id,
+                last_message_id=last_message_id,
|
||||
)
|
||||
|
||||
# Skip session metadata on "load more" — frontend only needs messages
|
||||
if before_sequence is not None:
|
||||
return SessionDetailResponse(
|
||||
id=page.session.session_id,
|
||||
created_at=page.session.started_at.isoformat(),
|
||||
updated_at=page.session.updated_at.isoformat(),
|
||||
user_id=page.session.user_id or None,
|
||||
messages=messages,
|
||||
active_stream=None,
|
||||
has_more_messages=page.has_more,
|
||||
oldest_sequence=page.oldest_sequence,
|
||||
total_prompt_tokens=0,
|
||||
total_completion_tokens=0,
|
||||
)
|
||||
|
||||
     # Sum token usage from session
-    total_prompt = sum(u.prompt_tokens for u in session.usage)
-    total_completion = sum(u.completion_tokens for u in session.usage)
+    total_prompt = sum(u.prompt_tokens for u in page.session.usage)
+    total_completion = sum(u.completion_tokens for u in page.session.usage)
|
||||
|
||||
     return SessionDetailResponse(
-        id=session.session_id,
-        created_at=session.started_at.isoformat(),
-        updated_at=session.updated_at.isoformat(),
-        user_id=session.user_id or None,
+        id=page.session.session_id,
+        created_at=page.session.started_at.isoformat(),
+        updated_at=page.session.updated_at.isoformat(),
+        user_id=page.session.user_id or None,
         messages=messages,
         active_stream=active_stream_info,
+        has_more_messages=page.has_more,
+        oldest_sequence=page.oldest_sequence,
         total_prompt_tokens=total_prompt,
         total_completion_tokens=total_completion,
-        metadata=session.metadata,
+        metadata=page.session.metadata,
     )
|
||||
|
||||
|
||||
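The `get_session` change above switches to cursor-based pagination: the first request returns the most recent messages, and each "load more" request passes the previous page's `oldest_sequence` as `before_sequence`. The sketch below illustrates that contract with an in-memory list. `Page`, `get_page`, and `load_all` are hypothetical names, and the real `get_chat_messages_paginated` may order or filter differently; only the `limit`, `before_sequence`, `has_more`, and `oldest_sequence` concepts come from the diff.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    messages: list                  # ascending by sequence
    has_more: bool                  # older messages exist beyond this page
    oldest_sequence: Optional[int]  # cursor for the next "load more" request


def get_page(all_messages: list, limit: int, before_sequence: Optional[int] = None) -> Page:
    """Return the newest `limit` messages whose sequence is < before_sequence."""
    candidates = sorted(
        (m for m in all_messages
         if before_sequence is None or m["sequence"] < before_sequence),
        key=lambda m: m["sequence"],
    )
    page = candidates[-limit:]
    return Page(
        messages=page,
        has_more=len(candidates) > len(page),
        oldest_sequence=page[0]["sequence"] if page else None,
    )


def load_all(all_messages: list, limit: int) -> list:
    """Client-side loop: initial load, then 'load more' until has_more is False."""
    page = get_page(all_messages, limit)
    loaded = list(page.messages)
    while page.has_more:
        page = get_page(all_messages, limit, before_sequence=page.oldest_sequence)
        loaded = page.messages + loaded  # older messages are prepended
    return loaded
```

Unlike offset pagination, a sequence cursor stays stable when new messages arrive while the user scrolls back through older ones.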
@@ -456,8 +482,9 @@ async def get_copilot_usage(
|
||||
|
||||
Returns current token usage vs limits for daily and weekly windows.
|
||||
Global defaults sourced from LaunchDarkly (falling back to config).
|
||||
Includes the user's rate-limit tier.
|
||||
"""
|
||||
-    daily_limit, weekly_limit = await get_global_rate_limits(
+    daily_limit, weekly_limit, tier = await get_global_rate_limits(
         user_id, config.daily_token_limit, config.weekly_token_limit
     )
|
||||
return await get_usage_status(
|
||||
@@ -465,6 +492,7 @@ async def get_copilot_usage(
|
||||
daily_token_limit=daily_limit,
|
||||
weekly_token_limit=weekly_limit,
|
||||
rate_limit_reset_cost=config.rate_limit_reset_cost,
|
||||
tier=tier,
|
||||
)
|
||||
|
||||
|
||||
@@ -516,7 +544,7 @@ async def reset_copilot_usage(
|
||||
detail="Rate limit reset is not available (credit system is disabled).",
|
||||
)
|
||||
|
||||
-    daily_limit, weekly_limit = await get_global_rate_limits(
+    daily_limit, weekly_limit, tier = await get_global_rate_limits(
         user_id, config.daily_token_limit, config.weekly_token_limit
     )
|
||||
|
||||
@@ -556,6 +584,7 @@ async def reset_copilot_usage(
|
||||
user_id=user_id,
|
||||
daily_token_limit=daily_limit,
|
||||
weekly_token_limit=weekly_limit,
|
||||
tier=tier,
|
||||
)
|
||||
if daily_limit > 0 and usage_status.daily.used < daily_limit:
|
||||
raise HTTPException(
|
||||
@@ -631,6 +660,7 @@ async def reset_copilot_usage(
|
||||
daily_token_limit=daily_limit,
|
||||
weekly_token_limit=weekly_limit,
|
||||
rate_limit_reset_cost=config.rate_limit_reset_cost,
|
||||
tier=tier,
|
||||
)
|
||||
|
||||
return RateLimitResetResponse(
|
||||
@@ -741,7 +771,7 @@ async def stream_chat_post(
|
||||
     # Global defaults sourced from LaunchDarkly, falling back to config.
     if user_id:
         try:
-            daily_limit, weekly_limit = await get_global_rate_limits(
+            daily_limit, weekly_limit, _ = await get_global_rate_limits(
                 user_id, config.daily_token_limit, config.weekly_token_limit
             )
|
||||
await check_rate_limit(
|
||||
@@ -836,6 +866,7 @@ async def stream_chat_post(
|
||||
is_user_message=request.is_user_message,
|
||||
context=request.context,
|
||||
file_ids=sanitized_file_ids,
|
||||
mode=request.mode,
|
||||
)
|
||||
|
||||
setup_time = (time.perf_counter() - stream_start_time) * 1000
|
||||
|
||||
@@ -9,6 +9,7 @@ import pytest
|
||||
import pytest_mock
|
||||
|
||||
from backend.api.features.chat import routes as chat_routes
|
||||
from backend.copilot.rate_limit import SubscriptionTier
|
||||
|
||||
app = fastapi.FastAPI()
|
||||
app.include_router(chat_routes.router)
|
||||
@@ -331,14 +332,28 @@ def _mock_usage(
|
||||
*,
|
||||
daily_used: int = 500,
|
||||
weekly_used: int = 2000,
|
||||
daily_limit: int = 10000,
|
||||
weekly_limit: int = 50000,
|
||||
tier: "SubscriptionTier" = SubscriptionTier.FREE,
|
||||
 ) -> AsyncMock:
-    """Mock get_usage_status to return a predictable CoPilotUsageStatus."""
+    """Mock get_usage_status and get_global_rate_limits for usage endpoint tests.
+
+    Mocks both ``get_global_rate_limits`` (returns the given limits + tier) and
+    ``get_usage_status`` so that tests exercise the endpoint without hitting
+    LaunchDarkly or Prisma.
+    """
|
||||
from backend.copilot.rate_limit import CoPilotUsageStatus, UsageWindow
|
||||
|
||||
mocker.patch(
|
||||
"backend.api.features.chat.routes.get_global_rate_limits",
|
||||
new_callable=AsyncMock,
|
||||
return_value=(daily_limit, weekly_limit, tier),
|
||||
)
|
||||
|
||||
resets_at = datetime.now(UTC) + timedelta(days=1)
|
||||
     status = CoPilotUsageStatus(
-        daily=UsageWindow(used=daily_used, limit=10000, resets_at=resets_at),
-        weekly=UsageWindow(used=weekly_used, limit=50000, resets_at=resets_at),
+        daily=UsageWindow(used=daily_used, limit=daily_limit, resets_at=resets_at),
+        weekly=UsageWindow(used=weekly_used, limit=weekly_limit, resets_at=resets_at),
     )
|
||||
return mocker.patch(
|
||||
"backend.api.features.chat.routes.get_usage_status",
|
||||
@@ -369,6 +384,7 @@ def test_usage_returns_daily_and_weekly(
|
||||
daily_token_limit=10000,
|
||||
weekly_token_limit=50000,
|
||||
rate_limit_reset_cost=chat_routes.config.rate_limit_reset_cost,
|
||||
tier=SubscriptionTier.FREE,
|
||||
)
|
||||
|
||||
|
||||
@@ -376,11 +392,9 @@ def test_usage_uses_config_limits(
|
||||
mocker: pytest_mock.MockerFixture,
|
||||
test_user_id: str,
|
||||
 ) -> None:
-    """The endpoint forwards daily_token_limit and weekly_token_limit from config."""
-    mock_get = _mock_usage(mocker)
+    """The endpoint forwards resolved limits from get_global_rate_limits to get_usage_status."""
+    mock_get = _mock_usage(mocker, daily_limit=99999, weekly_limit=77777)

-    mocker.patch.object(chat_routes.config, "daily_token_limit", 99999)
-    mocker.patch.object(chat_routes.config, "weekly_token_limit", 77777)
     mocker.patch.object(chat_routes.config, "rate_limit_reset_cost", 500)
|
||||
|
||||
response = client.get("/usage")
|
||||
@@ -391,6 +405,7 @@ def test_usage_uses_config_limits(
|
||||
daily_token_limit=99999,
|
||||
weekly_token_limit=77777,
|
||||
rate_limit_reset_cost=500,
|
||||
tier=SubscriptionTier.FREE,
|
||||
)
|
||||
|
||||
|
||||
@@ -526,3 +541,41 @@ def test_create_session_rejects_nested_metadata(
|
||||
)
|
||||
|
||||
assert response.status_code == 422
|
||||
|
||||
|
||||
class TestStreamChatRequestModeValidation:
|
||||
"""Pydantic-level validation of the ``mode`` field on StreamChatRequest."""
|
||||
|
||||
def test_rejects_invalid_mode_value(self) -> None:
|
||||
"""Any string outside the Literal set must raise ValidationError."""
|
||||
from pydantic import ValidationError
|
||||
|
||||
from backend.api.features.chat.routes import StreamChatRequest
|
||||
|
||||
with pytest.raises(ValidationError):
|
||||
StreamChatRequest(message="hi", mode="turbo") # type: ignore[arg-type]
|
||||
|
||||
def test_accepts_fast_mode(self) -> None:
|
||||
from backend.api.features.chat.routes import StreamChatRequest
|
||||
|
||||
req = StreamChatRequest(message="hi", mode="fast")
|
||||
assert req.mode == "fast"
|
||||
|
||||
def test_accepts_extended_thinking_mode(self) -> None:
|
||||
from backend.api.features.chat.routes import StreamChatRequest
|
||||
|
||||
req = StreamChatRequest(message="hi", mode="extended_thinking")
|
||||
assert req.mode == "extended_thinking"
|
||||
|
||||
def test_accepts_none_mode(self) -> None:
|
||||
"""``mode=None`` is valid (server decides via feature flags)."""
|
||||
from backend.api.features.chat.routes import StreamChatRequest
|
||||
|
||||
req = StreamChatRequest(message="hi", mode=None)
|
||||
assert req.mode is None
|
||||
|
||||
def test_mode_defaults_to_none_when_omitted(self) -> None:
|
||||
from backend.api.features.chat.routes import StreamChatRequest
|
||||
|
||||
req = StreamChatRequest(message="hi")
|
||||
assert req.mode is None
|
||||
|
||||
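The `TestStreamChatRequestModeValidation` cases above describe `mode` as a value drawn from a `Literal` set, with `None` meaning the server decides. The following stdlib-only sketch shows the same acceptance rule without Pydantic. The definition of `CopilotMode` here is an assumption inferred from the tests (the diff imports it from `backend.copilot.config` but does not show it), and `parse_mode` is a hypothetical helper.

```python
from typing import Literal, Optional, get_args

# Assumption: CopilotMode is a Literal of these two strings, as the tests suggest.
CopilotMode = Literal["fast", "extended_thinking"]


def parse_mode(value: Optional[str]) -> Optional[str]:
    """Accept None or a member of CopilotMode; reject everything else."""
    if value is None:
        return None  # no explicit choice: the server picks the mode
    allowed = get_args(CopilotMode)
    if value not in allowed:
        raise ValueError(f"invalid mode {value!r}; expected one of {allowed}")
    return value
```

In the actual request model, Pydantic performs this check from the type annotation and raises `ValidationError`, which FastAPI reports as a 422.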
@@ -189,6 +189,7 @@ async def test_create_store_submission(mocker):
|
||||
notifyOnAgentApproved=True,
|
||||
notifyOnAgentRejected=True,
|
||||
timezone="Europe/Delft",
|
||||
subscriptionTier=prisma.enums.SubscriptionTier.FREE, # type: ignore[reportCallIssue,reportAttributeAccessIssue]
|
||||
)
|
||||
mock_agent = prisma.models.AgentGraph(
|
||||
id="agent-id",
|
||||
|
||||
@@ -12,7 +12,7 @@ import fastapi
|
||||
from autogpt_libs.auth.dependencies import get_user_id, requires_user
|
||||
from fastapi import Query, UploadFile
|
||||
from fastapi.responses import Response
|
||||
-from pydantic import BaseModel
+from pydantic import BaseModel, Field
|
||||
|
||||
from backend.data.workspace import (
|
||||
WorkspaceFile,
|
||||
@@ -131,9 +131,26 @@ class StorageUsageResponse(BaseModel):
|
||||
file_count: int
|
||||
|
||||
|
||||
class WorkspaceFileItem(BaseModel):
|
||||
id: str
|
||||
name: str
|
||||
path: str
|
||||
mime_type: str
|
||||
size_bytes: int
|
||||
metadata: dict = Field(default_factory=dict)
|
||||
created_at: str
|
||||
|
||||
|
||||
class ListFilesResponse(BaseModel):
|
||||
files: list[WorkspaceFileItem]
|
||||
offset: int = 0
|
||||
has_more: bool = False
|
||||
|
||||
|
||||
@router.get(
|
||||
"/files/{file_id}/download",
|
||||
summary="Download file by ID",
|
||||
operation_id="getWorkspaceDownloadFileById",
|
||||
)
|
||||
async def download_file(
|
||||
user_id: Annotated[str, fastapi.Security(get_user_id)],
|
||||
@@ -158,6 +175,7 @@ async def download_file(
|
||||
@router.delete(
|
||||
"/files/{file_id}",
|
||||
summary="Delete a workspace file",
|
||||
operation_id="deleteWorkspaceFile",
|
||||
)
|
||||
async def delete_workspace_file(
|
||||
user_id: Annotated[str, fastapi.Security(get_user_id)],
|
||||
@@ -183,6 +201,7 @@ async def delete_workspace_file(
|
||||
@router.post(
|
||||
"/files/upload",
|
||||
summary="Upload file to workspace",
|
||||
operation_id="uploadWorkspaceFile",
|
||||
)
|
||||
async def upload_file(
|
||||
user_id: Annotated[str, fastapi.Security(get_user_id)],
|
||||
@@ -196,6 +215,9 @@ async def upload_file(
|
||||
Files are stored in session-scoped paths when session_id is provided,
|
||||
so the agent's session-scoped tools can discover them automatically.
|
||||
"""
|
||||
# Empty-string session_id drops session scoping; normalize to None.
|
||||
session_id = session_id or None
|
||||
|
||||
config = Config()
|
||||
|
||||
# Sanitize filename — strip any directory components
|
||||
@@ -250,16 +272,27 @@ async def upload_file(
|
||||
     manager = WorkspaceManager(user_id, workspace.id, session_id)
     try:
         workspace_file = await manager.write_file(
-            content, filename, overwrite=overwrite
+            content, filename, overwrite=overwrite, metadata={"origin": "user-upload"}
         )
     except ValueError as e:
-        raise fastapi.HTTPException(status_code=409, detail=str(e)) from e
+        # write_file raises ValueError for both path-conflict and size-limit
+        # cases; map each to its correct HTTP status.
+        message = str(e)
+        if message.startswith("File too large"):
+            raise fastapi.HTTPException(status_code=413, detail=message) from e
+        raise fastapi.HTTPException(status_code=409, detail=message) from e
|
||||
|
||||
# Post-write storage check — eliminates TOCTOU race on the quota.
|
||||
# If a concurrent upload pushed us over the limit, undo this write.
|
||||
new_total = await get_workspace_total_size(workspace.id)
|
||||
     if storage_limit_bytes and new_total > storage_limit_bytes:
-        await soft_delete_workspace_file(workspace_file.id, workspace.id)
+        try:
+            await soft_delete_workspace_file(workspace_file.id, workspace.id)
+        except Exception as e:
+            logger.warning(
+                f"Failed to soft-delete over-quota file {workspace_file.id} "
+                f"in workspace {workspace.id}: {e}"
+            )
|
||||
raise fastapi.HTTPException(
|
||||
status_code=413,
|
||||
detail={
|
||||
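The upload handler above distinguishes two `ValueError` cases from `write_file` by message prefix: "File too large" becomes 413, anything else stays 409. That mapping can be isolated as a small pure function, sketched here. `status_for_write_error` is a hypothetical name, and the example messages in the usage are invented; only the "File too large" prefix and the two status codes come from the diff.

```python
def status_for_write_error(exc: ValueError) -> int:
    """Map a write_file ValueError to an HTTP status, keyed on its message prefix."""
    if str(exc).startswith("File too large"):
        return 413  # Payload Too Large: the file exceeds the size limit
    return 409      # Conflict: treated as a path conflict


# Example messages are illustrative, not taken from the real write_file.
print(status_for_write_error(ValueError("File too large: limit exceeded")))
print(status_for_write_error(ValueError("File already exists at path")))
```

Prefix matching couples the route to the exact wording of the error text; a dedicated exception subclass for the size-limit case would be a sturdier alternative if `write_file` can be changed.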
@@ -281,6 +314,7 @@ async def upload_file(
|
||||
@router.get(
|
||||
"/storage/usage",
|
||||
summary="Get workspace storage usage",
|
||||
operation_id="getWorkspaceStorageUsage",
|
||||
)
|
||||
async def get_storage_usage(
|
||||
user_id: Annotated[str, fastapi.Security(get_user_id)],
|
||||
@@ -301,3 +335,57 @@ async def get_storage_usage(
|
||||
used_percent=round((used_bytes / limit_bytes) * 100, 1) if limit_bytes else 0,
|
||||
file_count=file_count,
|
||||
)
|
||||
|
||||
|
||||
@router.get(
|
||||
"/files",
|
||||
summary="List workspace files",
|
||||
operation_id="listWorkspaceFiles",
|
||||
)
|
||||
async def list_workspace_files(
|
||||
user_id: Annotated[str, fastapi.Security(get_user_id)],
|
||||
session_id: str | None = Query(default=None),
|
||||
limit: int = Query(default=200, ge=1, le=1000),
|
||||
offset: int = Query(default=0, ge=0),
|
||||
) -> ListFilesResponse:
|
||||
"""
|
||||
List files in the user's workspace.
|
||||
|
||||
When session_id is provided, only files for that session are returned.
|
||||
Otherwise, all files across sessions are listed. Results are paginated
|
||||
via `limit`/`offset`; `has_more` indicates whether additional pages exist.
|
||||
"""
|
||||
workspace = await get_or_create_workspace(user_id)
|
||||
|
||||
# Treat empty-string session_id the same as omitted — an empty value
|
||||
# would otherwise silently list files across every session instead of
|
||||
# scoping to one.
|
||||
session_id = session_id or None
|
||||
|
||||
manager = WorkspaceManager(user_id, workspace.id, session_id)
|
||||
include_all = session_id is None
|
||||
# Fetch one extra to compute has_more without a separate count query.
|
||||
files = await manager.list_files(
|
||||
limit=limit + 1,
|
||||
offset=offset,
|
||||
include_all_sessions=include_all,
|
||||
)
|
||||
has_more = len(files) > limit
|
||||
page = files[:limit]
|
||||
|
||||
return ListFilesResponse(
|
||||
files=[
|
||||
WorkspaceFileItem(
|
||||
id=f.id,
|
||||
name=f.name,
|
||||
path=f.path,
|
||||
mime_type=f.mime_type,
|
||||
size_bytes=f.size_bytes,
|
||||
metadata=f.metadata or {},
|
||||
created_at=f.created_at.isoformat(),
|
||||
)
|
||||
for f in page
|
||||
],
|
||||
offset=offset,
|
||||
has_more=has_more,
|
||||
)
|
||||
|
||||
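`list_workspace_files` above computes `has_more` by asking for `limit + 1` rows and trimming the extra one, which avoids a separate count query. A minimal sketch of that technique over an in-memory sequence follows. The `paginate` helper is hypothetical, and the slice stands in for the `manager.list_files(limit=limit + 1, offset=offset, ...)` call.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def paginate(rows: Sequence[T], limit: int, offset: int = 0) -> tuple[list[T], bool]:
    """Return (page, has_more) by fetching limit + 1 rows starting at offset."""
    # Fetch one row more than requested; the slice stands in for the DB query.
    fetched = list(rows[offset : offset + limit + 1])
    # If the extra row came back, at least one more page exists.
    return fetched[:limit], len(fetched) > limit
```

This is why the test for the default request expects `list_files` to be called with `limit=201` when the route default is 200.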
@@ -1,48 +1,28 @@
|
||||
"""Tests for workspace file upload and download routes."""
|
||||
|
||||
import io
|
||||
from datetime import datetime, timezone
|
||||
from unittest.mock import AsyncMock, MagicMock, patch
|
||||
|
||||
import fastapi
|
||||
import fastapi.testclient
|
||||
import pytest
|
||||
import pytest_mock
|
||||
|
||||
-from backend.api.features.workspace import routes as workspace_routes
-from backend.data.workspace import WorkspaceFile
+from backend.api.features.workspace.routes import router
+from backend.data.workspace import Workspace, WorkspaceFile

 app = fastapi.FastAPI()
-app.include_router(workspace_routes.router)
+app.include_router(router)
|
||||
|
||||
|
||||
@app.exception_handler(ValueError)
|
||||
async def _value_error_handler(
|
||||
request: fastapi.Request, exc: ValueError
|
||||
 ) -> fastapi.responses.JSONResponse:
-    """Mirror the production ValueError → 400 mapping from rest_api.py."""
+    """Mirror the production ValueError → 400 mapping from the REST app."""
|
||||
return fastapi.responses.JSONResponse(status_code=400, content={"detail": str(exc)})
|
||||
|
||||
|
||||
client = fastapi.testclient.TestClient(app)
|
||||
|
||||
TEST_USER_ID = "3e53486c-cf57-477e-ba2a-cb02dc828e1a"
|
||||
|
||||
MOCK_WORKSPACE = type("W", (), {"id": "ws-1"})()
|
||||
|
||||
_NOW = datetime(2023, 1, 1, tzinfo=timezone.utc)
|
||||
|
||||
MOCK_FILE = WorkspaceFile(
|
||||
id="file-aaa-bbb",
|
||||
workspace_id="ws-1",
|
||||
created_at=_NOW,
|
||||
updated_at=_NOW,
|
||||
name="hello.txt",
|
||||
path="/session/hello.txt",
|
||||
mime_type="text/plain",
|
||||
size_bytes=13,
|
||||
storage_path="local://hello.txt",
|
||||
)
|
||||
|
||||
|
||||
@pytest.fixture(autouse=True)
|
||||
def setup_app_auth(mock_jwt_user):
|
||||
@@ -53,25 +33,201 @@ def setup_app_auth(mock_jwt_user):
|
||||
app.dependency_overrides.clear()
|
||||
|
||||
|
||||
def _make_workspace(user_id: str = "test-user-id") -> Workspace:
|
||||
return Workspace(
|
||||
id="ws-001",
|
||||
user_id=user_id,
|
||||
created_at=datetime(2026, 1, 1, tzinfo=timezone.utc),
|
||||
updated_at=datetime(2026, 1, 1, tzinfo=timezone.utc),
|
||||
)
|
||||
|
||||
|
||||
def _make_file(**overrides) -> WorkspaceFile:
    defaults = {
        "id": "file-001",
        "workspace_id": "ws-001",
        "created_at": datetime(2026, 1, 1, tzinfo=timezone.utc),
        "updated_at": datetime(2026, 1, 1, tzinfo=timezone.utc),
        "name": "test.txt",
        "path": "/test.txt",
        "storage_path": "local://test.txt",
        "mime_type": "text/plain",
        "size_bytes": 100,
        "checksum": None,
        "is_deleted": False,
        "deleted_at": None,
        "metadata": {},
    }
    defaults.update(overrides)
    return WorkspaceFile(**defaults)
|
||||
|
||||
def _make_file_mock(**overrides) -> MagicMock:
    """Create a mock WorkspaceFile to simulate DB records with null fields."""
    defaults = {
        "id": "file-001",
        "name": "test.txt",
        "path": "/test.txt",
        "mime_type": "text/plain",
        "size_bytes": 100,
        "metadata": {},
        "created_at": datetime(2026, 1, 1, tzinfo=timezone.utc),
    }
    defaults.update(overrides)
    mock = MagicMock(spec=WorkspaceFile)
    for k, v in defaults.items():
        setattr(mock, k, v)
    return mock
|
||||
|
||||
# -- list_workspace_files tests --
|
||||
|
||||
|
||||
@patch("backend.api.features.workspace.routes.get_or_create_workspace")
@patch("backend.api.features.workspace.routes.WorkspaceManager")
def test_list_files_returns_all_when_no_session(mock_manager_cls, mock_get_workspace):
    mock_get_workspace.return_value = _make_workspace()
    files = [
        _make_file(id="f1", name="a.txt", metadata={"origin": "user-upload"}),
        _make_file(id="f2", name="b.csv", metadata={"origin": "agent-created"}),
    ]
    mock_instance = AsyncMock()
    mock_instance.list_files.return_value = files
    mock_manager_cls.return_value = mock_instance

    response = client.get("/files")
    assert response.status_code == 200

    data = response.json()
    assert len(data["files"]) == 2
    assert data["has_more"] is False
    assert data["offset"] == 0
    assert data["files"][0]["id"] == "f1"
    assert data["files"][0]["metadata"] == {"origin": "user-upload"}
    assert data["files"][1]["id"] == "f2"
    mock_instance.list_files.assert_called_once_with(
        limit=201, offset=0, include_all_sessions=True
    )
|
||||
|
||||
@patch("backend.api.features.workspace.routes.get_or_create_workspace")
@patch("backend.api.features.workspace.routes.WorkspaceManager")
def test_list_files_scopes_to_session_when_provided(
    mock_manager_cls, mock_get_workspace, test_user_id
):
    mock_get_workspace.return_value = _make_workspace(user_id=test_user_id)
    mock_instance = AsyncMock()
    mock_instance.list_files.return_value = []
    mock_manager_cls.return_value = mock_instance

    response = client.get("/files?session_id=sess-123")
    assert response.status_code == 200

    data = response.json()
    assert data["files"] == []
    assert data["has_more"] is False
    mock_manager_cls.assert_called_once_with(test_user_id, "ws-001", "sess-123")
    mock_instance.list_files.assert_called_once_with(
        limit=201, offset=0, include_all_sessions=False
    )
|
||||
|
||||
@patch("backend.api.features.workspace.routes.get_or_create_workspace")
@patch("backend.api.features.workspace.routes.WorkspaceManager")
def test_list_files_null_metadata_coerced_to_empty_dict(
    mock_manager_cls, mock_get_workspace
):
    """Route uses `f.metadata or {}` for pre-existing files with null metadata."""
    mock_get_workspace.return_value = _make_workspace()
    mock_instance = AsyncMock()
    mock_instance.list_files.return_value = [_make_file_mock(metadata=None)]
    mock_manager_cls.return_value = mock_instance

    response = client.get("/files")
    assert response.status_code == 200
    assert response.json()["files"][0]["metadata"] == {}
|
||||
|
||||
# -- upload_file metadata tests --
|
||||
|
||||
|
||||
@patch("backend.api.features.workspace.routes.get_or_create_workspace")
@patch("backend.api.features.workspace.routes.get_workspace_total_size")
@patch("backend.api.features.workspace.routes.scan_content_safe")
@patch("backend.api.features.workspace.routes.WorkspaceManager")
def test_upload_passes_user_upload_origin_metadata(
    mock_manager_cls, mock_scan, mock_total_size, mock_get_workspace
):
    mock_get_workspace.return_value = _make_workspace()
    mock_total_size.return_value = 100
    written = _make_file(id="new-file", name="doc.pdf")
    mock_instance = AsyncMock()
    mock_instance.write_file.return_value = written
    mock_manager_cls.return_value = mock_instance

    response = client.post(
        "/files/upload",
        files={"file": ("doc.pdf", b"fake-pdf-content", "application/pdf")},
    )
    assert response.status_code == 200

    mock_instance.write_file.assert_called_once()
    call_kwargs = mock_instance.write_file.call_args
    assert call_kwargs.kwargs.get("metadata") == {"origin": "user-upload"}
|
||||
|
||||
@patch("backend.api.features.workspace.routes.get_or_create_workspace")
@patch("backend.api.features.workspace.routes.get_workspace_total_size")
@patch("backend.api.features.workspace.routes.scan_content_safe")
@patch("backend.api.features.workspace.routes.WorkspaceManager")
def test_upload_returns_409_on_file_conflict(
    mock_manager_cls, mock_scan, mock_total_size, mock_get_workspace
):
    mock_get_workspace.return_value = _make_workspace()
    mock_total_size.return_value = 100
    mock_instance = AsyncMock()
    mock_instance.write_file.side_effect = ValueError("File already exists at path")
    mock_manager_cls.return_value = mock_instance

    response = client.post(
        "/files/upload",
        files={"file": ("dup.txt", b"content", "text/plain")},
    )
    assert response.status_code == 409
    assert "already exists" in response.json()["detail"]
|
||||
|
||||
# -- Restored upload/download/delete security + invariant tests --
|
||||
|
||||
|
||||
def _upload(
    filename: str = "hello.txt",
    content: bytes = b"Hello, world!",
    content_type: str = "text/plain",
):
    """Helper to POST a file upload."""
    return client.post(
        "/files/upload?session_id=sess-1",
        files={"file": (filename, io.BytesIO(content), content_type)},
    )
|
||||
|
||||
# ---- Happy path ----
|
||||
_MOCK_FILE = WorkspaceFile(
|
||||
id="file-aaa-bbb",
|
||||
workspace_id="ws-001",
|
||||
created_at=datetime(2026, 1, 1, tzinfo=timezone.utc),
|
||||
updated_at=datetime(2026, 1, 1, tzinfo=timezone.utc),
|
||||
name="hello.txt",
|
||||
path="/sessions/sess-1/hello.txt",
|
||||
mime_type="text/plain",
|
||||
size_bytes=13,
|
||||
storage_path="local://hello.txt",
|
||||
)
|
||||
|
||||
|
||||
def test_upload_happy_path(mocker: pytest_mock.MockFixture):
|
||||
def test_upload_happy_path(mocker):
|
||||
mocker.patch(
|
||||
"backend.api.features.workspace.routes.get_or_create_workspace",
|
||||
return_value=MOCK_WORKSPACE,
|
||||
return_value=_make_workspace(),
|
||||
)
|
||||
mocker.patch(
|
||||
"backend.api.features.workspace.routes.get_workspace_total_size",
|
||||
@@ -82,7 +238,7 @@ def test_upload_happy_path(mocker: pytest_mock.MockFixture):
|
||||
return_value=None,
|
||||
)
|
||||
mock_manager = mocker.MagicMock()
|
||||
mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE)
|
||||
mock_manager.write_file = mocker.AsyncMock(return_value=_MOCK_FILE)
|
||||
mocker.patch(
|
||||
"backend.api.features.workspace.routes.WorkspaceManager",
|
||||
return_value=mock_manager,
|
||||
@@ -96,10 +252,7 @@ def test_upload_happy_path(mocker: pytest_mock.MockFixture):
|
||||
assert data["size_bytes"] == 13
|
||||
|
||||
|
||||
# ---- Per-file size limit ----
|
||||
|
||||
|
||||
def test_upload_exceeds_max_file_size(mocker: pytest_mock.MockFixture):
|
||||
def test_upload_exceeds_max_file_size(mocker):
|
||||
"""Files larger than max_file_size_mb should be rejected with 413."""
|
||||
cfg = mocker.patch("backend.api.features.workspace.routes.Config")
|
||||
cfg.return_value.max_file_size_mb = 0 # 0 MB → any content is too big
|
||||
@@ -109,15 +262,11 @@ def test_upload_exceeds_max_file_size(mocker: pytest_mock.MockFixture):
|
||||
assert response.status_code == 413
|
||||
|
||||
|
||||
# ---- Storage quota exceeded ----
|
||||
|
||||
|
||||
def test_upload_storage_quota_exceeded(mocker: pytest_mock.MockFixture):
|
||||
def test_upload_storage_quota_exceeded(mocker):
|
||||
mocker.patch(
|
||||
"backend.api.features.workspace.routes.get_or_create_workspace",
|
||||
return_value=MOCK_WORKSPACE,
|
||||
return_value=_make_workspace(),
|
||||
)
|
||||
# Current usage already at limit
|
||||
mocker.patch(
|
||||
"backend.api.features.workspace.routes.get_workspace_total_size",
|
||||
return_value=500 * 1024 * 1024,
|
||||
@@ -128,27 +277,22 @@ def test_upload_storage_quota_exceeded(mocker: pytest_mock.MockFixture):
|
||||
assert "Storage limit exceeded" in response.text
|
||||
|
||||
|
||||
# ---- Post-write quota race (B2) ----
|
||||
|
||||
|
||||
def test_upload_post_write_quota_race(mocker: pytest_mock.MockFixture):
|
||||
"""If a concurrent upload tips the total over the limit after write,
|
||||
the file should be soft-deleted and 413 returned."""
|
||||
def test_upload_post_write_quota_race(mocker):
|
||||
"""Concurrent upload tipping over limit after write should soft-delete + 413."""
|
||||
mocker.patch(
|
||||
"backend.api.features.workspace.routes.get_or_create_workspace",
|
||||
return_value=MOCK_WORKSPACE,
|
||||
return_value=_make_workspace(),
|
||||
)
|
||||
# Pre-write check passes (under limit), but post-write check fails
|
||||
mocker.patch(
|
||||
"backend.api.features.workspace.routes.get_workspace_total_size",
|
||||
side_effect=[0, 600 * 1024 * 1024], # first call OK, second over limit
|
||||
side_effect=[0, 600 * 1024 * 1024],
|
||||
)
|
||||
mocker.patch(
|
||||
"backend.api.features.workspace.routes.scan_content_safe",
|
||||
return_value=None,
|
||||
)
|
||||
mock_manager = mocker.MagicMock()
|
||||
mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE)
|
||||
mock_manager.write_file = mocker.AsyncMock(return_value=_MOCK_FILE)
|
||||
mocker.patch(
|
||||
"backend.api.features.workspace.routes.WorkspaceManager",
|
||||
return_value=mock_manager,
|
||||
@@ -160,17 +304,14 @@ def test_upload_post_write_quota_race(mocker: pytest_mock.MockFixture):
|
||||
|
||||
response = _upload()
|
||||
assert response.status_code == 413
|
||||
mock_delete.assert_called_once_with("file-aaa-bbb", "ws-1")
|
||||
mock_delete.assert_called_once_with("file-aaa-bbb", "ws-001")
|
||||
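The post-write quota race test above checks a two-step guard: total usage is read once before the write and once after it, and a file that pushes the total over the limit is soft-deleted before a 413 is returned. The following is only a minimal sketch of that flow under stated assumptions. `upload_with_quota`, `QuotaExceeded`, the injected callables, and the 500 MB `LIMIT_BYTES` value are all hypothetical; the real route may be structured differently.

```python
import asyncio

# Assumed quota; the tests treat 500 MB as "already at limit".
LIMIT_BYTES = 500 * 1024 * 1024


class QuotaExceeded(Exception):
    """Hypothetical stand-in for the HTTP 413 response."""


async def upload_with_quota(content, get_total_size, write_file, soft_delete):
    # Pre-write check: reject early if the upload would not fit.
    if await get_total_size() + len(content) > LIMIT_BYTES:
        raise QuotaExceeded("Storage limit exceeded")
    file_id = await write_file(content)
    # Post-write check: a concurrent upload may have pushed the total over.
    if await get_total_size() > LIMIT_BYTES:
        await soft_delete(file_id)
        raise QuotaExceeded("Storage limit exceeded")
    return file_id


async def _demo():
    sizes = iter([0, 600 * 1024 * 1024])  # first read OK, second over limit
    deleted = []

    async def get_total_size():
        return next(sizes)

    async def write_file(content):
        return "file-aaa-bbb"

    async def soft_delete(file_id):
        deleted.append(file_id)

    try:
        await upload_with_quota(b"Hello, world!", get_total_size, write_file, soft_delete)
    except QuotaExceeded:
        return deleted
    return None


deleted_ids = asyncio.run(_demo())
```

The second size read mirrors the `side_effect=[0, 600 * 1024 * 1024]` mock in the test: the write itself succeeds, and only the re-check triggers the rollback.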
|
||||
|
||||
# ---- Any extension accepted (no allowlist) ----
|
||||
|
||||
|
||||
def test_upload_any_extension(mocker: pytest_mock.MockFixture):
|
||||
def test_upload_any_extension(mocker):
|
||||
"""Any file extension should be accepted — ClamAV is the security layer."""
|
||||
mocker.patch(
|
||||
"backend.api.features.workspace.routes.get_or_create_workspace",
|
||||
return_value=MOCK_WORKSPACE,
|
||||
return_value=_make_workspace(),
|
||||
)
|
||||
mocker.patch(
|
||||
"backend.api.features.workspace.routes.get_workspace_total_size",
|
||||
@@ -181,7 +322,7 @@ def test_upload_any_extension(mocker: pytest_mock.MockFixture):
|
||||
return_value=None,
|
||||
)
|
||||
mock_manager = mocker.MagicMock()
|
||||
mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE)
|
||||
mock_manager.write_file = mocker.AsyncMock(return_value=_MOCK_FILE)
|
||||
mocker.patch(
|
||||
"backend.api.features.workspace.routes.WorkspaceManager",
|
||||
return_value=mock_manager,
|
||||
@@ -191,16 +332,13 @@ def test_upload_any_extension(mocker: pytest_mock.MockFixture):
|
||||
assert response.status_code == 200
|
||||
|
||||
|
||||
# ---- Virus scan rejection ----
|
||||
|
||||
|
||||
def test_upload_blocked_by_virus_scan(mocker: pytest_mock.MockFixture):
|
||||
def test_upload_blocked_by_virus_scan(mocker):
|
||||
"""Files flagged by ClamAV should be rejected and never written to storage."""
|
||||
from backend.api.features.store.exceptions import VirusDetectedError
|
||||
|
||||
mocker.patch(
|
||||
"backend.api.features.workspace.routes.get_or_create_workspace",
|
||||
return_value=MOCK_WORKSPACE,
|
||||
return_value=_make_workspace(),
|
||||
)
|
||||
mocker.patch(
|
||||
"backend.api.features.workspace.routes.get_workspace_total_size",
|
||||
@@ -211,7 +349,7 @@ def test_upload_blocked_by_virus_scan(mocker: pytest_mock.MockFixture):
|
||||
side_effect=VirusDetectedError("Eicar-Test-Signature"),
|
||||
)
|
||||
mock_manager = mocker.MagicMock()
|
||||
mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE)
|
||||
mock_manager.write_file = mocker.AsyncMock(return_value=_MOCK_FILE)
|
||||
mocker.patch(
|
||||
"backend.api.features.workspace.routes.WorkspaceManager",
|
||||
return_value=mock_manager,
|
||||
@@ -219,18 +357,14 @@ def test_upload_blocked_by_virus_scan(mocker: pytest_mock.MockFixture):
|
||||
|
||||
response = _upload(filename="evil.exe", content=b"X5O!P%@AP...")
|
||||
assert response.status_code == 400
|
||||
assert "Virus detected" in response.text
|
||||
mock_manager.write_file.assert_not_called()
|
||||
|
||||
|
||||
# ---- No file extension ----
|
||||
|
||||
|
||||
def test_upload_file_without_extension(mocker: pytest_mock.MockFixture):
|
||||
def test_upload_file_without_extension(mocker):
|
||||
"""Files without an extension should be accepted and stored as-is."""
|
||||
mocker.patch(
|
||||
"backend.api.features.workspace.routes.get_or_create_workspace",
|
||||
return_value=MOCK_WORKSPACE,
|
||||
return_value=_make_workspace(),
|
||||
)
|
||||
mocker.patch(
|
||||
"backend.api.features.workspace.routes.get_workspace_total_size",
|
||||
@@ -241,7 +375,7 @@ def test_upload_file_without_extension(mocker: pytest_mock.MockFixture):
|
||||
return_value=None,
|
||||
)
|
||||
mock_manager = mocker.MagicMock()
|
||||
mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE)
|
||||
mock_manager.write_file = mocker.AsyncMock(return_value=_MOCK_FILE)
|
||||
mocker.patch(
|
||||
"backend.api.features.workspace.routes.WorkspaceManager",
|
||||
return_value=mock_manager,
|
||||
@@ -257,14 +391,11 @@ def test_upload_file_without_extension(mocker: pytest_mock.MockFixture):
|
||||
assert mock_manager.write_file.call_args[0][1] == "Makefile"
|
||||
|
||||
|
||||
# ---- Filename sanitization (SF5) ----
|
||||
|
||||
|
||||
def test_upload_strips_path_components(mocker: pytest_mock.MockFixture):
|
||||
def test_upload_strips_path_components(mocker):
|
||||
"""Path-traversal filenames should be reduced to their basename."""
|
||||
mocker.patch(
|
||||
"backend.api.features.workspace.routes.get_or_create_workspace",
|
||||
return_value=MOCK_WORKSPACE,
|
||||
return_value=_make_workspace(),
|
||||
)
|
||||
mocker.patch(
|
||||
"backend.api.features.workspace.routes.get_workspace_total_size",
|
||||
@@ -275,28 +406,23 @@ def test_upload_strips_path_components(mocker: pytest_mock.MockFixture):
|
||||
return_value=None,
|
||||
)
|
||||
mock_manager = mocker.MagicMock()
|
||||
mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE)
|
||||
mock_manager.write_file = mocker.AsyncMock(return_value=_MOCK_FILE)
|
||||
mocker.patch(
|
||||
"backend.api.features.workspace.routes.WorkspaceManager",
|
||||
return_value=mock_manager,
|
||||
)
|
||||
|
||||
# Filename with traversal
|
||||
_upload(filename="../../etc/passwd.txt")
|
||||
|
||||
# write_file should have been called with just the basename
|
||||
mock_manager.write_file.assert_called_once()
|
||||
call_args = mock_manager.write_file.call_args
|
||||
assert call_args[0][1] == "passwd.txt"
|
||||
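The sanitization test above expects a traversal-style filename such as `../../etc/passwd.txt` to reach `write_file` as just `passwd.txt`. A minimal sketch of one way to get that behaviour is shown here, assuming a basename-only policy. `sanitize_filename` and its `"file"` fallback are hypothetical names, not the route's actual helper.

```python
from pathlib import PurePosixPath, PureWindowsPath


def sanitize_filename(filename: str, fallback: str = "file") -> str:
    """Hypothetical helper: keep only the final path component."""
    # Strip POSIX-style directories first, then Windows-style ones.
    name = PurePosixPath(filename).name
    name = PureWindowsPath(name).name
    # "", "." and ".." are not usable file names, so fall back.
    if name in ("", ".", ".."):
        return fallback
    return name
```

Names without an extension, such as `Makefile`, pass through unchanged, which matches the extension-less upload test.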
|
||||
|
||||
# ---- Download ----
|
||||
|
||||
|
||||
def test_download_file_not_found(mocker: pytest_mock.MockFixture):
|
||||
def test_download_file_not_found(mocker):
|
||||
mocker.patch(
|
||||
"backend.api.features.workspace.routes.get_workspace",
|
||||
return_value=MOCK_WORKSPACE,
|
||||
return_value=_make_workspace(),
|
||||
)
|
||||
mocker.patch(
|
||||
"backend.api.features.workspace.routes.get_workspace_file",
|
||||
@@ -307,14 +433,11 @@ def test_download_file_not_found(mocker: pytest_mock.MockFixture):
|
||||
assert response.status_code == 404
|
||||
|
||||
|
||||
# ---- Delete ----
|
||||
|
||||
|
||||
def test_delete_file_success(mocker: pytest_mock.MockFixture):
|
||||
def test_delete_file_success(mocker):
|
||||
"""Deleting an existing file should return {"deleted": true}."""
|
||||
mocker.patch(
|
||||
"backend.api.features.workspace.routes.get_workspace",
|
||||
return_value=MOCK_WORKSPACE,
|
||||
return_value=_make_workspace(),
|
||||
)
|
||||
mock_manager = mocker.MagicMock()
|
||||
mock_manager.delete_file = mocker.AsyncMock(return_value=True)
|
||||
@@ -329,11 +452,11 @@ def test_delete_file_success(mocker: pytest_mock.MockFixture):
|
||||
mock_manager.delete_file.assert_called_once_with("file-aaa-bbb")
|
||||
|
||||
|
||||
def test_delete_file_not_found(mocker: pytest_mock.MockFixture):
|
||||
def test_delete_file_not_found(mocker):
|
||||
"""Deleting a non-existent file should return 404."""
|
||||
mocker.patch(
|
||||
"backend.api.features.workspace.routes.get_workspace",
|
||||
return_value=MOCK_WORKSPACE,
|
||||
return_value=_make_workspace(),
|
||||
)
|
||||
mock_manager = mocker.MagicMock()
|
||||
mock_manager.delete_file = mocker.AsyncMock(return_value=False)
|
||||
@@ -347,7 +470,7 @@ def test_delete_file_not_found(mocker: pytest_mock.MockFixture):
|
||||
assert "File not found" in response.text
|
||||
|
||||
|
||||
def test_delete_file_no_workspace(mocker: pytest_mock.MockFixture):
|
||||
def test_delete_file_no_workspace(mocker):
|
||||
"""Deleting when user has no workspace should return 404."""
|
||||
mocker.patch(
|
||||
"backend.api.features.workspace.routes.get_workspace",
|
||||
@@ -357,3 +480,123 @@ def test_delete_file_no_workspace(mocker: pytest_mock.MockFixture):
|
||||
response = client.delete("/files/file-aaa-bbb")
|
||||
assert response.status_code == 404
|
||||
assert "Workspace not found" in response.text
|
||||
|
||||
|
||||
def test_upload_write_file_too_large_returns_413(mocker):
    """write_file raises ValueError("File too large: …") → must map to 413."""
    mocker.patch(
        "backend.api.features.workspace.routes.get_or_create_workspace",
        return_value=_make_workspace(),
    )
    mocker.patch(
        "backend.api.features.workspace.routes.get_workspace_total_size",
        return_value=0,
    )
    mocker.patch(
        "backend.api.features.workspace.routes.scan_content_safe",
        return_value=None,
    )
    mock_manager = mocker.MagicMock()
    mock_manager.write_file = mocker.AsyncMock(
        side_effect=ValueError("File too large: 900 bytes exceeds 1MB limit")
    )
    mocker.patch(
        "backend.api.features.workspace.routes.WorkspaceManager",
        return_value=mock_manager,
    )

    response = _upload()
    assert response.status_code == 413
    assert "File too large" in response.text
|
||||
|
||||
def test_upload_write_file_conflict_returns_409(mocker):
    """Non-'File too large' ValueErrors from write_file stay as 409."""
    mocker.patch(
        "backend.api.features.workspace.routes.get_or_create_workspace",
        return_value=_make_workspace(),
    )
    mocker.patch(
        "backend.api.features.workspace.routes.get_workspace_total_size",
        return_value=0,
    )
    mocker.patch(
        "backend.api.features.workspace.routes.scan_content_safe",
        return_value=None,
    )
    mock_manager = mocker.MagicMock()
    mock_manager.write_file = mocker.AsyncMock(
        side_effect=ValueError("File already exists at path: /sessions/x/a.txt")
    )
    mocker.patch(
        "backend.api.features.workspace.routes.WorkspaceManager",
        return_value=mock_manager,
    )

    response = _upload()
    assert response.status_code == 409
    assert "already exists" in response.text
|
||||
|
||||
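Taken together, the two tests above imply that the upload route distinguishes `ValueError`s from `write_file` by their message: a "File too large" error becomes 413 and any other stays 409. The sketch below shows one possible mapping, assuming a simple prefix check. The helper name `status_for_write_error` is hypothetical, and the real route may match the message differently.

```python
def status_for_write_error(exc: ValueError) -> int:
    """Hypothetical mapping from a write_file ValueError to an HTTP status."""
    # Assumption: size violations are identified by their message prefix.
    if str(exc).startswith("File too large"):
        return 413  # payload too large
    return 409  # conflict, e.g. a file already exists at the path


too_large = status_for_write_error(
    ValueError("File too large: 900 bytes exceeds 1MB limit")
)
conflict = status_for_write_error(
    ValueError("File already exists at path: /sessions/x/a.txt")
)
```

Matching on message text is brittle; dedicated exception subclasses would be a sturdier design, but the tests only constrain the observable status codes.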
@patch("backend.api.features.workspace.routes.get_or_create_workspace")
@patch("backend.api.features.workspace.routes.WorkspaceManager")
def test_list_files_has_more_true_when_limit_exceeded(
    mock_manager_cls, mock_get_workspace
):
    """The limit+1 fetch trick must flip has_more=True and trim the page."""
    mock_get_workspace.return_value = _make_workspace()
    # Backend was asked for limit+1=3, and returned exactly 3 items.
    files = [
        _make_file(id="f1", name="a.txt"),
        _make_file(id="f2", name="b.txt"),
        _make_file(id="f3", name="c.txt"),
    ]
    mock_instance = AsyncMock()
    mock_instance.list_files.return_value = files
    mock_manager_cls.return_value = mock_instance

    response = client.get("/files?limit=2")
    assert response.status_code == 200
    data = response.json()
    assert data["has_more"] is True
    assert len(data["files"]) == 2
    assert data["files"][0]["id"] == "f1"
    assert data["files"][1]["id"] == "f2"
    mock_instance.list_files.assert_called_once_with(
        limit=3, offset=0, include_all_sessions=True
    )
|
||||
|
||||
@patch("backend.api.features.workspace.routes.get_or_create_workspace")
@patch("backend.api.features.workspace.routes.WorkspaceManager")
def test_list_files_has_more_false_when_exactly_page_size(
    mock_manager_cls, mock_get_workspace
):
    """Exactly `limit` rows means we're on the last page — has_more=False."""
    mock_get_workspace.return_value = _make_workspace()
    files = [_make_file(id="f1", name="a.txt"), _make_file(id="f2", name="b.txt")]
    mock_instance = AsyncMock()
    mock_instance.list_files.return_value = files
    mock_manager_cls.return_value = mock_instance

    response = client.get("/files?limit=2")
    assert response.status_code == 200
    data = response.json()
    assert data["has_more"] is False
    assert len(data["files"]) == 2
|
||||
|
||||
@patch("backend.api.features.workspace.routes.get_or_create_workspace")
@patch("backend.api.features.workspace.routes.WorkspaceManager")
def test_list_files_offset_is_echoed_back(mock_manager_cls, mock_get_workspace):
    mock_get_workspace.return_value = _make_workspace()
    mock_instance = AsyncMock()
    mock_instance.list_files.return_value = []
    mock_manager_cls.return_value = mock_instance

    response = client.get("/files?offset=50&limit=10")
    assert response.status_code == 200
    assert response.json()["offset"] == 50
    mock_instance.list_files.assert_called_once_with(
        limit=11, offset=50, include_all_sessions=True
    )
|
||||
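The pagination tests above pin down a limit+1 fetch: the route asks the manager for one row more than the requested page size and uses the extra row only to decide `has_more`. The sketch below illustrates that logic over an in-memory sequence. `paginate` is a hypothetical helper, not the route's actual code, and the slice stands in for the `list_files(limit=..., offset=...)` call.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def paginate(rows: Sequence[T], limit: int, offset: int = 0) -> dict:
    """Hypothetical helper: return one page plus a has_more flag."""
    # Fetch limit + 1 rows; the extra row only signals that another page exists.
    fetched = list(rows[offset : offset + limit + 1])
    has_more = len(fetched) > limit
    # Trim the sentinel row so the caller sees at most `limit` items.
    return {"files": fetched[:limit], "offset": offset, "has_more": has_more}


page = paginate(["f1", "f2", "f3"], limit=2)
last = paginate(["f1", "f2"], limit=2)
```

This avoids a separate count query. The `limit=201` assertions in the earlier tests suggest a default page size of 200, though that default is inferred from the tests rather than shown in this diff.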
@@ -18,6 +18,7 @@ from prisma.errors import PrismaError
|
||||
|
||||
import backend.api.features.admin.credit_admin_routes
|
||||
import backend.api.features.admin.execution_analytics_routes
|
||||
import backend.api.features.admin.platform_cost_routes
|
||||
import backend.api.features.admin.rate_limit_admin_routes
|
||||
import backend.api.features.admin.store_admin_routes
|
||||
import backend.api.features.builder
|
||||
@@ -329,6 +330,11 @@ app.include_router(
|
||||
tags=["v2", "admin"],
|
||||
prefix="/api/copilot",
|
||||
)
|
||||
app.include_router(
|
||||
backend.api.features.admin.platform_cost_routes.router,
|
||||
tags=["v2", "admin"],
|
||||
prefix="/api/admin",
|
||||
)
|
||||
app.include_router(
|
||||
backend.api.features.executions.review.routes.router,
|
||||
tags=["v2", "executions", "review"],
|
||||
|
||||
@@ -17,7 +17,7 @@ from backend.blocks.apollo.models import (
|
||||
PrimaryPhone,
|
||||
SearchOrganizationsRequest,
|
||||
)
|
||||
from backend.data.model import CredentialsField, SchemaField
|
||||
from backend.data.model import CredentialsField, NodeExecutionStats, SchemaField
|
||||
|
||||
|
||||
class SearchOrganizationsBlock(Block):
|
||||
@@ -218,6 +218,11 @@ To find IDs, identify the values for organization_id when you call this endpoint
|
||||
    ) -> BlockOutput:
        query = SearchOrganizationsRequest(**input_data.model_dump())
        organizations = await self.search_organizations(query, credentials)
        self.merge_stats(
            NodeExecutionStats(
                provider_cost=float(len(organizations)), provider_cost_type="items"
            )
        )
        for organization in organizations:
            yield "organization", organization
        yield "organizations", organizations
|
||||
@@ -21,7 +21,7 @@ from backend.blocks.apollo.models import (
|
||||
SearchPeopleRequest,
|
||||
SenorityLevels,
|
||||
)
|
||||
from backend.data.model import CredentialsField, SchemaField
|
||||
from backend.data.model import CredentialsField, NodeExecutionStats, SchemaField
|
||||
|
||||
|
||||
class SearchPeopleBlock(Block):
|
||||
@@ -366,4 +366,9 @@ class SearchPeopleBlock(Block):
|
||||
*(enrich_or_fallback(person) for person in people)
|
||||
)
|
||||
|
||||
self.merge_stats(
|
||||
NodeExecutionStats(
|
||||
provider_cost=float(len(people)), provider_cost_type="items"
|
||||
)
|
||||
)
|
||||
yield "people", people
|
||||
|
||||
@@ -0,0 +1,712 @@
|
||||
"""Unit tests for merge_stats cost tracking in individual blocks.

Covers the exa code_context, exa contents, and apollo organization blocks
to verify provider cost is correctly extracted and reported.
"""

from unittest.mock import AsyncMock, MagicMock, patch

import pytest
from pydantic import SecretStr

from backend.data.model import APIKeyCredentials, NodeExecutionStats

# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------

TEST_EXA_CREDENTIALS = APIKeyCredentials(
    id="01234567-89ab-cdef-0123-456789abcdef",
    provider="exa",
    api_key=SecretStr("mock-exa-api-key"),
    title="Mock Exa API key",
    expires_at=None,
)

TEST_EXA_CREDENTIALS_INPUT = {
    "provider": TEST_EXA_CREDENTIALS.provider,
    "id": TEST_EXA_CREDENTIALS.id,
    "type": TEST_EXA_CREDENTIALS.type,
    "title": TEST_EXA_CREDENTIALS.title,
}
|
||||
|
||||
# ---------------------------------------------------------------------------
# ExaCodeContextBlock — cost_dollars is a string like "0.005"
# ---------------------------------------------------------------------------


class TestExaCodeContextBlockCostTracking:
    @pytest.mark.asyncio
    async def test_merge_stats_called_with_float_cost(self):
        """float(cost_dollars) parsed from API string and passed to merge_stats."""
        from backend.blocks.exa.code_context import ExaCodeContextBlock

        block = ExaCodeContextBlock()

        api_response = {
            "requestId": "req-1",
            "query": "how to use hooks",
            "response": "Here are some examples...",
            "resultsCount": 3,
            "costDollars": "0.005",
            "searchTime": 1.2,
            "outputTokens": 100,
        }

        mock_resp = MagicMock()
        mock_resp.json.return_value = api_response

        accumulated: list[NodeExecutionStats] = []

        with (
            patch(
                "backend.blocks.exa.code_context.Requests.post",
                new_callable=AsyncMock,
                return_value=mock_resp,
            ),
            patch.object(
                block, "merge_stats", side_effect=lambda s: accumulated.append(s)
            ),
        ):
            input_data = ExaCodeContextBlock.Input(
                query="how to use hooks",
                credentials=TEST_EXA_CREDENTIALS_INPUT,  # type: ignore[arg-type]
            )
            results = []
            async for output in block.run(
                input_data,
                credentials=TEST_EXA_CREDENTIALS,
            ):
                results.append(output)

        assert len(accumulated) == 1
        assert accumulated[0].provider_cost == pytest.approx(0.005)

    @pytest.mark.asyncio
    async def test_invalid_cost_dollars_does_not_raise(self):
        """When cost_dollars cannot be parsed as float, merge_stats is not called."""
        from backend.blocks.exa.code_context import ExaCodeContextBlock

        block = ExaCodeContextBlock()

        api_response = {
            "requestId": "req-2",
            "query": "query",
            "response": "response",
            "resultsCount": 0,
            "costDollars": "N/A",
            "searchTime": 0.5,
            "outputTokens": 0,
        }

        mock_resp = MagicMock()
        mock_resp.json.return_value = api_response

        merge_calls: list[NodeExecutionStats] = []

        with (
            patch(
                "backend.blocks.exa.code_context.Requests.post",
                new_callable=AsyncMock,
                return_value=mock_resp,
            ),
            patch.object(
                block, "merge_stats", side_effect=lambda s: merge_calls.append(s)
            ),
        ):
            input_data = ExaCodeContextBlock.Input(
                query="query",
                credentials=TEST_EXA_CREDENTIALS_INPUT,  # type: ignore[arg-type]
            )
            async for _ in block.run(
                input_data,
                credentials=TEST_EXA_CREDENTIALS,
            ):
                pass

        assert merge_calls == []

    @pytest.mark.asyncio
    async def test_zero_cost_is_tracked(self):
        """A zero cost_dollars string '0.0' should still be recorded."""
        from backend.blocks.exa.code_context import ExaCodeContextBlock

        block = ExaCodeContextBlock()

        api_response = {
            "requestId": "req-3",
            "query": "query",
            "response": "...",
            "resultsCount": 1,
            "costDollars": "0.0",
            "searchTime": 0.1,
            "outputTokens": 10,
        }

        mock_resp = MagicMock()
        mock_resp.json.return_value = api_response

        accumulated: list[NodeExecutionStats] = []

        with (
            patch(
                "backend.blocks.exa.code_context.Requests.post",
                new_callable=AsyncMock,
                return_value=mock_resp,
            ),
            patch.object(
                block, "merge_stats", side_effect=lambda s: accumulated.append(s)
            ),
        ):
            input_data = ExaCodeContextBlock.Input(
                query="query",
                credentials=TEST_EXA_CREDENTIALS_INPUT,  # type: ignore[arg-type]
            )
            async for _ in block.run(
                input_data,
                credentials=TEST_EXA_CREDENTIALS,
            ):
                pass

        assert len(accumulated) == 1
        assert accumulated[0].provider_cost == 0.0
|
||||
|
||||
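The three ExaCodeContextBlock tests above describe one parsing rule: a numeric cost string is converted to `float` and reported, an unparseable string such as "N/A" is skipped without raising, and a zero cost is still reported. The following is a minimal sketch of that rule, assuming the block guards the `float()` call. `parse_cost_dollars` is a hypothetical helper, not a function from the block under test.

```python
from typing import Optional


def parse_cost_dollars(raw: object) -> Optional[float]:
    """Hypothetical helper: parse a provider cost value, or None if unparseable."""
    try:
        return float(raw)  # type: ignore[arg-type]
    except (TypeError, ValueError):
        # Non-numeric strings ("N/A") and None land here and are skipped.
        return None


recorded = []
for raw in ("0.005", "N/A", "0.0"):
    cost = parse_cost_dollars(raw)
    if cost is not None:  # not `if cost:`, so a zero cost is still recorded
        recorded.append(cost)
```

The explicit `is not None` comparison is what the zero-cost test protects: a truthiness check would silently drop `0.0`.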
# ---------------------------------------------------------------------------
|
||||
# ExaContentsBlock — response.cost_dollars.total (CostDollars model)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestExaContentsBlockCostTracking:
|
||||
@pytest.mark.asyncio
|
||||
async def test_merge_stats_called_with_cost_dollars_total(self):
|
||||
"""provider_cost equals response.cost_dollars.total when present."""
|
||||
from backend.blocks.exa.contents import ExaContentsBlock
|
||||
from backend.blocks.exa.helpers import CostDollars
|
||||
|
||||
block = ExaContentsBlock()
|
||||
|
||||
cost_dollars = CostDollars(total=0.012)
|
||||
|
||||
mock_response = MagicMock()
|
||||
mock_response.results = []
|
||||
mock_response.context = None
|
||||
mock_response.statuses = None
|
||||
mock_response.cost_dollars = cost_dollars
|
||||
|
||||
accumulated: list[NodeExecutionStats] = []
|
||||
|
||||
with (
|
||||
patch(
|
||||
"backend.blocks.exa.contents.AsyncExa",
|
||||
return_value=MagicMock(
|
||||
get_contents=AsyncMock(return_value=mock_response)
|
||||
),
|
||||
),
|
||||
patch.object(
|
||||
block, "merge_stats", side_effect=lambda s: accumulated.append(s)
|
||||
),
|
||||
):
|
||||
input_data = ExaContentsBlock.Input(
|
||||
urls=["https://example.com"],
|
||||
credentials=TEST_EXA_CREDENTIALS_INPUT, # type: ignore[arg-type]
|
||||
)
|
||||
async for _ in block.run(
|
||||
input_data,
|
||||
credentials=TEST_EXA_CREDENTIALS,
|
||||
):
|
||||
pass
|
||||
|
||||
assert len(accumulated) == 1
|
||||
assert accumulated[0].provider_cost == pytest.approx(0.012)
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_no_merge_stats_when_cost_dollars_absent(self):
|
||||
"""When response.cost_dollars is None, merge_stats is not called."""
|
||||
from backend.blocks.exa.contents import ExaContentsBlock
|
||||
|
||||
block = ExaContentsBlock()
|
||||
|
||||
mock_response = MagicMock()
|
||||
mock_response.results = []
|
||||
mock_response.context = None
|
||||
mock_response.statuses = None
|
||||
mock_response.cost_dollars = None
|
||||
|
||||
accumulated: list[NodeExecutionStats] = []
|
||||
|
||||
with (
|
||||
patch(
|
||||
"backend.blocks.exa.contents.AsyncExa",
|
||||
return_value=MagicMock(
|
||||
get_contents=AsyncMock(return_value=mock_response)
|
||||
),
|
||||
),
|
||||
patch.object(
|
||||
block, "merge_stats", side_effect=lambda s: accumulated.append(s)
|
||||
),
|
||||
):
|
||||
input_data = ExaContentsBlock.Input(
|
||||
urls=["https://example.com"],
|
||||
credentials=TEST_EXA_CREDENTIALS_INPUT, # type: ignore[arg-type]
|
||||
)
|
||||
async for _ in block.run(
|
||||
input_data,
|
||||
credentials=TEST_EXA_CREDENTIALS,
|
||||
):
|
||||
pass
|
||||
|
||||
assert accumulated == []


# ---------------------------------------------------------------------------
# SearchOrganizationsBlock: provider_cost = float(len(organizations))
# ---------------------------------------------------------------------------


class TestSearchOrganizationsBlockCostTracking:
    @pytest.mark.asyncio
    async def test_merge_stats_called_with_org_count(self):
        """provider_cost == number of returned organizations, type == 'items'."""
        from backend.blocks.apollo._auth import TEST_CREDENTIALS as APOLLO_CREDS
        from backend.blocks.apollo._auth import (
            TEST_CREDENTIALS_INPUT as APOLLO_CREDS_INPUT,
        )
        from backend.blocks.apollo.models import Organization
        from backend.blocks.apollo.organization import SearchOrganizationsBlock

        block = SearchOrganizationsBlock()

        fake_orgs = [Organization(id=str(i), name=f"Org{i}") for i in range(3)]

        accumulated: list[NodeExecutionStats] = []

        with (
            patch.object(
                SearchOrganizationsBlock,
                "search_organizations",
                new_callable=AsyncMock,
                return_value=fake_orgs,
            ),
            patch.object(
                block, "merge_stats", side_effect=lambda s: accumulated.append(s)
            ),
        ):
            input_data = SearchOrganizationsBlock.Input(
                credentials=APOLLO_CREDS_INPUT,  # type: ignore[arg-type]
            )
            results = []
            async for output in block.run(
                input_data,
                credentials=APOLLO_CREDS,
            ):
                results.append(output)

        assert len(accumulated) == 1
        assert accumulated[0].provider_cost == pytest.approx(3.0)
        assert accumulated[0].provider_cost_type == "items"

    @pytest.mark.asyncio
    async def test_empty_org_list_tracks_zero(self):
        """An empty organization list results in provider_cost=0.0."""
        from backend.blocks.apollo._auth import TEST_CREDENTIALS as APOLLO_CREDS
        from backend.blocks.apollo._auth import (
            TEST_CREDENTIALS_INPUT as APOLLO_CREDS_INPUT,
        )
        from backend.blocks.apollo.organization import SearchOrganizationsBlock

        block = SearchOrganizationsBlock()
        accumulated: list[NodeExecutionStats] = []

        with (
            patch.object(
                SearchOrganizationsBlock,
                "search_organizations",
                new_callable=AsyncMock,
                return_value=[],
            ),
            patch.object(
                block, "merge_stats", side_effect=lambda s: accumulated.append(s)
            ),
        ):
            input_data = SearchOrganizationsBlock.Input(
                credentials=APOLLO_CREDS_INPUT,  # type: ignore[arg-type]
            )
            async for _ in block.run(
                input_data,
                credentials=APOLLO_CREDS,
            ):
                pass

        assert len(accumulated) == 1
        assert accumulated[0].provider_cost == 0.0
        assert accumulated[0].provider_cost_type == "items"
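
The two organization tests pin down a simple convention: one cost unit per returned item, reported with the type "items", and an empty result still reported as 0.0. A minimal sketch of that convention follows. `Stats` is a hypothetical stand-in for `NodeExecutionStats` and `items_cost` is a hypothetical helper; neither name comes from the codebase.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Stats:
    """Hypothetical stand-in for NodeExecutionStats (assumption)."""

    provider_cost: Optional[float] = None
    provider_cost_type: Optional[str] = None


def items_cost(items: Sequence[object]) -> Stats:
    """Report one cost unit per returned item, including the empty case."""
    return Stats(provider_cost=float(len(items)), provider_cost_type="items")


three = items_cost(["org0", "org1", "org2"])
empty = items_cost([])
```

Reporting the empty case explicitly, rather than skipping the call, is what lets the tests assert `len(accumulated) == 1` even when nothing was returned.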


# ---------------------------------------------------------------------------
# JinaEmbeddingBlock: token count from usage.total_tokens
# ---------------------------------------------------------------------------


class TestJinaEmbeddingBlockCostTracking:
    @pytest.mark.asyncio
    async def test_merge_stats_called_with_token_count(self):
        """provider token count is recorded when API returns usage.total_tokens."""
        from backend.blocks.jina._auth import TEST_CREDENTIALS as JINA_CREDS
        from backend.blocks.jina._auth import TEST_CREDENTIALS_INPUT as JINA_CREDS_INPUT
        from backend.blocks.jina.embeddings import JinaEmbeddingBlock

        block = JinaEmbeddingBlock()

        api_response = {
            "data": [{"embedding": [0.1, 0.2, 0.3]}],
            "usage": {"total_tokens": 42},
        }
        mock_resp = MagicMock()
        mock_resp.json.return_value = api_response

        accumulated: list[NodeExecutionStats] = []

        with (
            patch(
                "backend.blocks.jina.embeddings.Requests.post",
                new_callable=AsyncMock,
                return_value=mock_resp,
            ),
            patch.object(
                block, "merge_stats", side_effect=lambda s: accumulated.append(s)
            ),
        ):
            input_data = JinaEmbeddingBlock.Input(
                texts=["hello world"],
                credentials=JINA_CREDS_INPUT,  # type: ignore[arg-type]
            )
            async for _ in block.run(input_data, credentials=JINA_CREDS):
                pass

        assert len(accumulated) == 1
        assert accumulated[0].input_token_count == 42

    @pytest.mark.asyncio
    async def test_no_merge_stats_when_usage_absent(self):
        """When API response omits usage field, merge_stats is not called."""
        from backend.blocks.jina._auth import TEST_CREDENTIALS as JINA_CREDS
        from backend.blocks.jina._auth import TEST_CREDENTIALS_INPUT as JINA_CREDS_INPUT
        from backend.blocks.jina.embeddings import JinaEmbeddingBlock

        block = JinaEmbeddingBlock()

        api_response = {
            "data": [{"embedding": [0.1, 0.2, 0.3]}],
        }
        mock_resp = MagicMock()
        mock_resp.json.return_value = api_response

        accumulated: list[NodeExecutionStats] = []

        with (
            patch(
                "backend.blocks.jina.embeddings.Requests.post",
                new_callable=AsyncMock,
                return_value=mock_resp,
            ),
            patch.object(
                block, "merge_stats", side_effect=lambda s: accumulated.append(s)
            ),
        ):
            input_data = JinaEmbeddingBlock.Input(
                texts=["hello"],
                credentials=JINA_CREDS_INPUT,  # type: ignore[arg-type]
            )
            async for _ in block.run(input_data, credentials=JINA_CREDS):
                pass

        assert accumulated == []
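
These tests all share one capture pattern: `merge_stats` is patched on the block instance with a `side_effect` that appends to a list, and the async generator returned by `run` is drained before anything is asserted. A self-contained sketch of that pattern follows. `Stats` and `EmbeddingBlock` are hypothetical stand-ins, and the `usage` check is only an assumption about how a block might decide whether to report; it is not the actual `JinaEmbeddingBlock` code.

```python
import asyncio
from dataclasses import dataclass
from unittest.mock import patch


@dataclass
class Stats:
    """Hypothetical stand-in for NodeExecutionStats (assumption)."""

    input_token_count: int = 0


class EmbeddingBlock:
    """Hypothetical block; not the real JinaEmbeddingBlock."""

    def merge_stats(self, stats: Stats) -> None:
        raise AssertionError("merge_stats should be patched in this sketch")

    async def run(self, api_response: dict):
        yield "embeddings", [item["embedding"] for item in api_response["data"]]
        usage = api_response.get("usage")
        # Assumed rule: report tokens only when the API supplied a count.
        if usage and usage.get("total_tokens") is not None:
            self.merge_stats(Stats(input_token_count=usage["total_tokens"]))


async def collect_stats(api_response: dict) -> list:
    """Drain block.run() and return every Stats passed to merge_stats."""
    block = EmbeddingBlock()
    captured: list = []
    with patch.object(block, "merge_stats", side_effect=captured.append):
        async for _ in block.run(api_response):
            pass
    return captured


with_usage = asyncio.run(
    collect_stats({"data": [{"embedding": [0.1]}], "usage": {"total_tokens": 42}})
)
without_usage = asyncio.run(collect_stats({"data": [{"embedding": [0.1]}]}))
```

Draining the generator matters: statements after the last `yield` only run once the consumer asks for the next item, so stopping early would skip the `merge_stats` call.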


# ---------------------------------------------------------------------------
# UnrealTextToSpeechBlock: character count from input text length
# ---------------------------------------------------------------------------


class TestUnrealTextToSpeechBlockCostTracking:
    @pytest.mark.asyncio
    async def test_merge_stats_called_with_character_count(self):
        """provider_cost equals len(text) with type='characters'."""
        from backend.blocks.text_to_speech_block import TEST_CREDENTIALS as TTS_CREDS
        from backend.blocks.text_to_speech_block import (
            TEST_CREDENTIALS_INPUT as TTS_CREDS_INPUT,
        )
        from backend.blocks.text_to_speech_block import UnrealTextToSpeechBlock

        block = UnrealTextToSpeechBlock()
        test_text = "Hello, world!"

        with (
            patch.object(
                UnrealTextToSpeechBlock,
                "call_unreal_speech_api",
                new_callable=AsyncMock,
                return_value={"OutputUri": "https://example.com/audio.mp3"},
            ),
            patch.object(block, "merge_stats") as mock_merge,
        ):
            input_data = UnrealTextToSpeechBlock.Input(
                text=test_text,
                credentials=TTS_CREDS_INPUT,  # type: ignore[arg-type]
            )
            async for _ in block.run(input_data, credentials=TTS_CREDS):
                pass

        mock_merge.assert_called_once()
        stats = mock_merge.call_args[0][0]
        assert stats.provider_cost == float(len(test_text))
        assert stats.provider_cost_type == "characters"

    @pytest.mark.asyncio
    async def test_empty_text_gives_zero_characters(self):
        """An empty text string results in provider_cost=0.0."""
        from backend.blocks.text_to_speech_block import TEST_CREDENTIALS as TTS_CREDS
        from backend.blocks.text_to_speech_block import (
            TEST_CREDENTIALS_INPUT as TTS_CREDS_INPUT,
        )
        from backend.blocks.text_to_speech_block import UnrealTextToSpeechBlock

        block = UnrealTextToSpeechBlock()

        with (
            patch.object(
                UnrealTextToSpeechBlock,
                "call_unreal_speech_api",
                new_callable=AsyncMock,
                return_value={"OutputUri": "https://example.com/audio.mp3"},
            ),
            patch.object(block, "merge_stats") as mock_merge,
        ):
            input_data = UnrealTextToSpeechBlock.Input(
                text="",
                credentials=TTS_CREDS_INPUT,  # type: ignore[arg-type]
            )
            async for _ in block.run(input_data, credentials=TTS_CREDS):
                pass

        mock_merge.assert_called_once()
        stats = mock_merge.call_args[0][0]
        assert stats.provider_cost == 0.0
        assert stats.provider_cost_type == "characters"


# ---------------------------------------------------------------------------
# GoogleMapsSearchBlock: item count from search_places results
# ---------------------------------------------------------------------------


class TestGoogleMapsSearchBlockCostTracking:
    @pytest.mark.asyncio
    async def test_merge_stats_called_with_place_count(self):
        """provider_cost equals number of returned places, type == 'items'."""
        from backend.blocks.google_maps import TEST_CREDENTIALS as MAPS_CREDS
        from backend.blocks.google_maps import (
            TEST_CREDENTIALS_INPUT as MAPS_CREDS_INPUT,
        )
        from backend.blocks.google_maps import GoogleMapsSearchBlock

        block = GoogleMapsSearchBlock()

        fake_places = [{"name": f"Place{i}", "address": f"Addr{i}"} for i in range(4)]
        accumulated: list[NodeExecutionStats] = []

        with (
            patch.object(
                GoogleMapsSearchBlock,
                "search_places",
                return_value=fake_places,
            ),
            patch.object(
                block, "merge_stats", side_effect=lambda s: accumulated.append(s)
            ),
        ):
            input_data = GoogleMapsSearchBlock.Input(
                query="coffee shops",
                credentials=MAPS_CREDS_INPUT,  # type: ignore[arg-type]
            )
            async for _ in block.run(input_data, credentials=MAPS_CREDS):
                pass

        assert len(accumulated) == 1
        assert accumulated[0].provider_cost == 4.0
        assert accumulated[0].provider_cost_type == "items"

    @pytest.mark.asyncio
    async def test_empty_results_tracks_zero(self):
        """Zero places returned results in provider_cost=0.0."""
        from backend.blocks.google_maps import TEST_CREDENTIALS as MAPS_CREDS
        from backend.blocks.google_maps import (
            TEST_CREDENTIALS_INPUT as MAPS_CREDS_INPUT,
        )
        from backend.blocks.google_maps import GoogleMapsSearchBlock

        block = GoogleMapsSearchBlock()
        accumulated: list[NodeExecutionStats] = []

        with (
            patch.object(
                GoogleMapsSearchBlock,
                "search_places",
                return_value=[],
            ),
            patch.object(
                block, "merge_stats", side_effect=lambda s: accumulated.append(s)
            ),
        ):
            input_data = GoogleMapsSearchBlock.Input(
                query="nothing here",
                credentials=MAPS_CREDS_INPUT,  # type: ignore[arg-type]
            )
            async for _ in block.run(input_data, credentials=MAPS_CREDS):
                pass

        assert len(accumulated) == 1
        assert accumulated[0].provider_cost == 0.0
        assert accumulated[0].provider_cost_type == "items"


# ---------------------------------------------------------------------------
# AddLeadToCampaignBlock (SmartLead): item count from lead_list length
# ---------------------------------------------------------------------------


class TestSmartLeadAddLeadsBlockCostTracking:
    @pytest.mark.asyncio
    async def test_merge_stats_called_with_lead_count(self):
        """provider_cost equals number of leads uploaded, type == 'items'."""
        from backend.blocks.smartlead._auth import TEST_CREDENTIALS as SL_CREDS
        from backend.blocks.smartlead._auth import (
            TEST_CREDENTIALS_INPUT as SL_CREDS_INPUT,
        )
        from backend.blocks.smartlead.campaign import AddLeadToCampaignBlock
        from backend.blocks.smartlead.models import (
            AddLeadsToCampaignResponse,
            LeadInput,
        )

        block = AddLeadToCampaignBlock()

        fake_leads = [
            LeadInput(first_name="Alice", last_name="A", email="alice@example.com"),
            LeadInput(first_name="Bob", last_name="B", email="bob@example.com"),
        ]
        fake_response = AddLeadsToCampaignResponse(
            ok=True,
            upload_count=2,
            total_leads=2,
            block_count=0,
            duplicate_count=0,
            invalid_email_count=0,
            invalid_emails=[],
            already_added_to_campaign=0,
            unsubscribed_leads=[],
            is_lead_limit_exhausted=False,
            lead_import_stopped_count=0,
            bounce_count=0,
        )
        accumulated: list[NodeExecutionStats] = []

        with (
            patch.object(
                AddLeadToCampaignBlock,
                "add_leads_to_campaign",
                new_callable=AsyncMock,
                return_value=fake_response,
            ),
            patch.object(
                block, "merge_stats", side_effect=lambda s: accumulated.append(s)
            ),
        ):
            input_data = AddLeadToCampaignBlock.Input(
                campaign_id=123,
                lead_list=fake_leads,
                credentials=SL_CREDS_INPUT,  # type: ignore[arg-type]
            )
            async for _ in block.run(input_data, credentials=SL_CREDS):
                pass

        assert len(accumulated) == 1
        assert accumulated[0].provider_cost == 2.0
        assert accumulated[0].provider_cost_type == "items"


# ---------------------------------------------------------------------------
# SearchPeopleBlock: item count from people list length
# ---------------------------------------------------------------------------


class TestSearchPeopleBlockCostTracking:
    @pytest.mark.asyncio
    async def test_merge_stats_called_with_people_count(self):
        """provider_cost equals number of returned people, type == 'items'."""
        from backend.blocks.apollo._auth import TEST_CREDENTIALS as APOLLO_CREDS
        from backend.blocks.apollo._auth import (
            TEST_CREDENTIALS_INPUT as APOLLO_CREDS_INPUT,
        )
        from backend.blocks.apollo.models import Contact
        from backend.blocks.apollo.people import SearchPeopleBlock

        block = SearchPeopleBlock()
        fake_people = [Contact(id=str(i), first_name=f"Person{i}") for i in range(5)]
        accumulated: list[NodeExecutionStats] = []

        with (
            patch.object(
                SearchPeopleBlock,
                "search_people",
                new_callable=AsyncMock,
                return_value=fake_people,
            ),
            patch.object(
                block, "merge_stats", side_effect=lambda s: accumulated.append(s)
            ),
        ):
            input_data = SearchPeopleBlock.Input(
                credentials=APOLLO_CREDS_INPUT,  # type: ignore[arg-type]
            )
            async for _ in block.run(input_data, credentials=APOLLO_CREDS):
                pass

        assert len(accumulated) == 1
        assert accumulated[0].provider_cost == pytest.approx(5.0)
        assert accumulated[0].provider_cost_type == "items"

    @pytest.mark.asyncio
    async def test_empty_people_list_tracks_zero(self):
        """An empty people list results in provider_cost=0.0."""
        from backend.blocks.apollo._auth import TEST_CREDENTIALS as APOLLO_CREDS
        from backend.blocks.apollo._auth import (
            TEST_CREDENTIALS_INPUT as APOLLO_CREDS_INPUT,
        )
        from backend.blocks.apollo.people import SearchPeopleBlock

        block = SearchPeopleBlock()
        accumulated: list[NodeExecutionStats] = []

        with (
            patch.object(
                SearchPeopleBlock,
                "search_people",
                new_callable=AsyncMock,
                return_value=[],
            ),
            patch.object(
                block, "merge_stats", side_effect=lambda s: accumulated.append(s)
            ),
        ):
            input_data = SearchPeopleBlock.Input(
                credentials=APOLLO_CREDS_INPUT,  # type: ignore[arg-type]
            )
            async for _ in block.run(input_data, credentials=APOLLO_CREDS):
                pass

        assert len(accumulated) == 1
        assert accumulated[0].provider_cost == 0.0
        assert accumulated[0].provider_cost_type == "items"
@@ -9,6 +9,7 @@ from typing import Union

from pydantic import BaseModel

from backend.data.model import NodeExecutionStats
from backend.sdk import (
    APIKeyCredentials,
    Block,
@@ -116,3 +117,10 @@ class ExaCodeContextBlock(Block):
        yield "cost_dollars", context.cost_dollars
        yield "search_time", context.search_time
        yield "output_tokens", context.output_tokens

        # Parse cost_dollars (API returns as string, e.g. "0.005")
        try:
            cost_usd = float(context.cost_dollars)
            self.merge_stats(NodeExecutionStats(provider_cost=cost_usd))
        except (ValueError, TypeError):
            pass
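
The hunk above converts a string cost to a float and silently skips tracking when the value cannot be parsed. A minimal standalone sketch of that behaviour follows. `Stats`, `parse_cost`, and `track_cost` are hypothetical names used for illustration only; the real code calls `self.merge_stats(NodeExecutionStats(...))` inline.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Stats:
    """Hypothetical stand-in for NodeExecutionStats (assumption)."""

    provider_cost: Optional[float] = None


def parse_cost(raw: Any) -> Optional[float]:
    """Return the cost as a float, or None when it cannot be parsed."""
    try:
        return float(raw)
    except (ValueError, TypeError):
        # A string such as "N/A" raises ValueError; None raises TypeError.
        return None


def track_cost(raw: Any, merged: List[Stats]) -> None:
    """Record a Stats entry only when the cost parses cleanly."""
    cost = parse_cost(raw)
    if cost is not None:
        merged.append(Stats(provider_cost=cost))


merged: List[Stats] = []
for raw_value in ("0.005", "N/A", None, "0.0"):
    track_cost(raw_value, merged)
```

Comparing against `None` rather than relying on truthiness keeps a legitimate cost of "0.0" from being dropped, which matches the zero-cost test in the Exa test file.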
@@ -4,6 +4,7 @@ from typing import Optional
from exa_py import AsyncExa
from pydantic import BaseModel

from backend.data.model import NodeExecutionStats
from backend.sdk import (
    APIKeyCredentials,
    Block,
@@ -223,3 +224,6 @@ class ExaContentsBlock(Block):

        if response.cost_dollars:
            yield "cost_dollars", response.cost_dollars
            self.merge_stats(
                NodeExecutionStats(provider_cost=response.cost_dollars.total)
            )
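
One subtlety in the hunk above: `if response.cost_dollars:` tests the truthiness of the object, not of its `total`, so a total of 0.0 is still merged while `None` is skipped. The sketch below illustrates this with a hypothetical dataclass stand-in for `CostDollars`. It assumes, as with an ordinary dataclass, that the real model defines neither `__bool__` nor `__len__`; `cost_to_merge` is an illustrative helper, not part of the codebase.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CostDollars:
    """Hypothetical stand-in for the CostDollars model (assumption)."""

    total: float


def cost_to_merge(cost_dollars: Optional[CostDollars]) -> Optional[float]:
    """Return the total to record, or None when no cost object is present."""
    # Instances without __bool__ or __len__ are always truthy, so this
    # branch is taken even when total is 0.0.
    if cost_dollars:
        return cost_dollars.total
    return None


paid = cost_to_merge(CostDollars(total=0.012))
free = cost_to_merge(CostDollars(total=0.0))
missing = cost_to_merge(None)
```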
@@ -0,0 +1,575 @@
"""Tests for cost tracking in Exa blocks.

Covers the cost_dollars → provider_cost → merge_stats path for the Exa blocks,
including ExaCodeContextBlock, ExaContentsBlock, ExaSearchBlock,
ExaFindSimilarBlock, and ExaCreateResearchBlock.
"""

from unittest.mock import AsyncMock, MagicMock, patch

import pytest

from backend.blocks.exa._test import TEST_CREDENTIALS, TEST_CREDENTIALS_INPUT
from backend.data.model import NodeExecutionStats


class TestExaCodeContextCostTracking:
    """ExaCodeContextBlock parses cost_dollars (string) and calls merge_stats."""

    @pytest.mark.asyncio
    async def test_valid_cost_string_is_parsed_and_merged(self):
        """A numeric cost string like '0.005' is merged as provider_cost."""
        from backend.blocks.exa.code_context import ExaCodeContextBlock

        block = ExaCodeContextBlock()
        merged: list[NodeExecutionStats] = []
        block.merge_stats = lambda s: merged.append(s)  # type: ignore[assignment]

        api_response = {
            "requestId": "req-1",
            "query": "test query",
            "response": "some code",
            "resultsCount": 3,
            "costDollars": "0.005",
            "searchTime": 1.2,
            "outputTokens": 100,
        }

        with patch("backend.blocks.exa.code_context.Requests") as mock_requests_cls:
            mock_resp = MagicMock()
            mock_resp.json.return_value = api_response
            mock_requests_cls.return_value.post = AsyncMock(return_value=mock_resp)

            outputs = []
            async for key, value in block.run(
                block.Input(query="test query", credentials=TEST_CREDENTIALS_INPUT),  # type: ignore[arg-type]
                credentials=TEST_CREDENTIALS,
            ):
                outputs.append((key, value))

        assert any(k == "cost_dollars" for k, _ in outputs)
        assert len(merged) == 1
        assert merged[0].provider_cost == pytest.approx(0.005)

    @pytest.mark.asyncio
    async def test_invalid_cost_string_does_not_raise(self):
        """A non-numeric cost_dollars value is swallowed silently."""
        from backend.blocks.exa.code_context import ExaCodeContextBlock

        block = ExaCodeContextBlock()
        merged: list[NodeExecutionStats] = []
        block.merge_stats = lambda s: merged.append(s)  # type: ignore[assignment]

        api_response = {
            "requestId": "req-2",
            "query": "test",
            "response": "code",
            "resultsCount": 0,
            "costDollars": "N/A",
            "searchTime": 0.5,
            "outputTokens": 0,
        }

        with patch("backend.blocks.exa.code_context.Requests") as mock_requests_cls:
            mock_resp = MagicMock()
            mock_resp.json.return_value = api_response
            mock_requests_cls.return_value.post = AsyncMock(return_value=mock_resp)

            outputs = []
            async for key, value in block.run(
                block.Input(query="test", credentials=TEST_CREDENTIALS_INPUT),  # type: ignore[arg-type]
                credentials=TEST_CREDENTIALS,
            ):
                outputs.append((key, value))

        # No merge_stats call because float() raised ValueError
        assert len(merged) == 0

    @pytest.mark.asyncio
    async def test_zero_cost_string_is_merged(self):
        """'0.0' is a valid cost and should still be tracked."""
        from backend.blocks.exa.code_context import ExaCodeContextBlock

        block = ExaCodeContextBlock()
        merged: list[NodeExecutionStats] = []
        block.merge_stats = lambda s: merged.append(s)  # type: ignore[assignment]

        api_response = {
            "requestId": "req-3",
            "query": "free query",
            "response": "result",
            "resultsCount": 1,
            "costDollars": "0.0",
            "searchTime": 0.1,
            "outputTokens": 10,
        }

        with patch("backend.blocks.exa.code_context.Requests") as mock_requests_cls:
            mock_resp = MagicMock()
            mock_resp.json.return_value = api_response
            mock_requests_cls.return_value.post = AsyncMock(return_value=mock_resp)

            async for _ in block.run(
                block.Input(query="free query", credentials=TEST_CREDENTIALS_INPUT),  # type: ignore[arg-type]
                credentials=TEST_CREDENTIALS,
            ):
                pass

        assert len(merged) == 1
        assert merged[0].provider_cost == pytest.approx(0.0)


class TestExaContentsCostTracking:
    """ExaContentsBlock merges cost_dollars.total as provider_cost."""

    @pytest.mark.asyncio
    async def test_cost_dollars_total_is_merged(self):
        """When the SDK response includes cost_dollars, its total is merged."""
        from backend.blocks.exa.contents import ExaContentsBlock
        from backend.blocks.exa.helpers import CostDollars

        block = ExaContentsBlock()
        merged: list[NodeExecutionStats] = []
        block.merge_stats = lambda s: merged.append(s)  # type: ignore[assignment]

        mock_sdk_response = MagicMock()
        mock_sdk_response.results = []
        mock_sdk_response.context = None
        mock_sdk_response.statuses = None
        mock_sdk_response.cost_dollars = CostDollars(total=0.012)

        with patch("backend.blocks.exa.contents.AsyncExa") as mock_exa_cls:
            mock_exa = MagicMock()
            mock_exa.get_contents = AsyncMock(return_value=mock_sdk_response)
            mock_exa_cls.return_value = mock_exa

            async for _ in block.run(
                block.Input(urls=["https://example.com"], credentials=TEST_CREDENTIALS_INPUT),  # type: ignore[arg-type]
                credentials=TEST_CREDENTIALS,
            ):
                pass

        assert len(merged) == 1
        assert merged[0].provider_cost == pytest.approx(0.012)

    @pytest.mark.asyncio
    async def test_no_cost_dollars_skips_merge(self):
        """When cost_dollars is absent, merge_stats is not called."""
        from backend.blocks.exa.contents import ExaContentsBlock

        block = ExaContentsBlock()
        merged: list[NodeExecutionStats] = []
        block.merge_stats = lambda s: merged.append(s)  # type: ignore[assignment]

        mock_sdk_response = MagicMock()
        mock_sdk_response.results = []
        mock_sdk_response.context = None
        mock_sdk_response.statuses = None
        mock_sdk_response.cost_dollars = None

        with patch("backend.blocks.exa.contents.AsyncExa") as mock_exa_cls:
            mock_exa = MagicMock()
            mock_exa.get_contents = AsyncMock(return_value=mock_sdk_response)
            mock_exa_cls.return_value = mock_exa

            async for _ in block.run(
                block.Input(urls=["https://example.com"], credentials=TEST_CREDENTIALS_INPUT),  # type: ignore[arg-type]
                credentials=TEST_CREDENTIALS,
            ):
                pass

        assert len(merged) == 0

    @pytest.mark.asyncio
    async def test_zero_cost_dollars_is_merged(self):
        """A total of 0.0 (free tier) should still be merged."""
        from backend.blocks.exa.contents import ExaContentsBlock
        from backend.blocks.exa.helpers import CostDollars

        block = ExaContentsBlock()
        merged: list[NodeExecutionStats] = []
        block.merge_stats = lambda s: merged.append(s)  # type: ignore[assignment]

        mock_sdk_response = MagicMock()
        mock_sdk_response.results = []
        mock_sdk_response.context = None
        mock_sdk_response.statuses = None
        mock_sdk_response.cost_dollars = CostDollars(total=0.0)

        with patch("backend.blocks.exa.contents.AsyncExa") as mock_exa_cls:
            mock_exa = MagicMock()
            mock_exa.get_contents = AsyncMock(return_value=mock_sdk_response)
            mock_exa_cls.return_value = mock_exa

            async for _ in block.run(
                block.Input(urls=["https://example.com"], credentials=TEST_CREDENTIALS_INPUT),  # type: ignore[arg-type]
                credentials=TEST_CREDENTIALS,
            ):
                pass

        assert len(merged) == 1
        assert merged[0].provider_cost == pytest.approx(0.0)


class TestExaSearchCostTracking:
    """ExaSearchBlock merges cost_dollars.total as provider_cost."""

    @pytest.mark.asyncio
    async def test_cost_dollars_total_is_merged(self):
        """When the SDK response includes cost_dollars, its total is merged."""
        from backend.blocks.exa.helpers import CostDollars
        from backend.blocks.exa.search import ExaSearchBlock

        block = ExaSearchBlock()
        merged: list[NodeExecutionStats] = []
        block.merge_stats = lambda s: merged.append(s)  # type: ignore[assignment]

        mock_sdk_response = MagicMock()
        mock_sdk_response.results = []
        mock_sdk_response.context = None
        mock_sdk_response.resolved_search_type = None
        mock_sdk_response.cost_dollars = CostDollars(total=0.008)

        with patch("backend.blocks.exa.search.AsyncExa") as mock_exa_cls:
            mock_exa = MagicMock()
            mock_exa.search = AsyncMock(return_value=mock_sdk_response)
            mock_exa_cls.return_value = mock_exa

            async for _ in block.run(
                block.Input(query="test query", credentials=TEST_CREDENTIALS_INPUT),  # type: ignore[arg-type]
                credentials=TEST_CREDENTIALS,
            ):
                pass

        assert len(merged) == 1
        assert merged[0].provider_cost == pytest.approx(0.008)

    @pytest.mark.asyncio
    async def test_no_cost_dollars_skips_merge(self):
        """When cost_dollars is absent, merge_stats is not called."""
        from backend.blocks.exa.search import ExaSearchBlock

        block = ExaSearchBlock()
        merged: list[NodeExecutionStats] = []
        block.merge_stats = lambda s: merged.append(s)  # type: ignore[assignment]

        mock_sdk_response = MagicMock()
        mock_sdk_response.results = []
        mock_sdk_response.context = None
        mock_sdk_response.resolved_search_type = None
        mock_sdk_response.cost_dollars = None

        with patch("backend.blocks.exa.search.AsyncExa") as mock_exa_cls:
            mock_exa = MagicMock()
            mock_exa.search = AsyncMock(return_value=mock_sdk_response)
            mock_exa_cls.return_value = mock_exa

            async for _ in block.run(
                block.Input(query="test query", credentials=TEST_CREDENTIALS_INPUT),  # type: ignore[arg-type]
                credentials=TEST_CREDENTIALS,
            ):
                pass

        assert len(merged) == 0


class TestExaSimilarCostTracking:
    """ExaFindSimilarBlock merges cost_dollars.total as provider_cost."""

    @pytest.mark.asyncio
    async def test_cost_dollars_total_is_merged(self):
        """When the SDK response includes cost_dollars, its total is merged."""
        from backend.blocks.exa.helpers import CostDollars
        from backend.blocks.exa.similar import ExaFindSimilarBlock

        block = ExaFindSimilarBlock()
        merged: list[NodeExecutionStats] = []
        block.merge_stats = lambda s: merged.append(s)  # type: ignore[assignment]

        mock_sdk_response = MagicMock()
        mock_sdk_response.results = []
        mock_sdk_response.context = None
        mock_sdk_response.request_id = "req-1"
        mock_sdk_response.cost_dollars = CostDollars(total=0.015)

        with patch("backend.blocks.exa.similar.AsyncExa") as mock_exa_cls:
            mock_exa = MagicMock()
            mock_exa.find_similar = AsyncMock(return_value=mock_sdk_response)
            mock_exa_cls.return_value = mock_exa

            async for _ in block.run(
                block.Input(url="https://example.com", credentials=TEST_CREDENTIALS_INPUT),  # type: ignore[arg-type]
                credentials=TEST_CREDENTIALS,
            ):
                pass

        assert len(merged) == 1
        assert merged[0].provider_cost == pytest.approx(0.015)

    @pytest.mark.asyncio
    async def test_no_cost_dollars_skips_merge(self):
        """When cost_dollars is absent, merge_stats is not called."""
        from backend.blocks.exa.similar import ExaFindSimilarBlock

        block = ExaFindSimilarBlock()
        merged: list[NodeExecutionStats] = []
        block.merge_stats = lambda s: merged.append(s)  # type: ignore[assignment]

        mock_sdk_response = MagicMock()
        mock_sdk_response.results = []
        mock_sdk_response.context = None
        mock_sdk_response.request_id = "req-2"
        mock_sdk_response.cost_dollars = None

        with patch("backend.blocks.exa.similar.AsyncExa") as mock_exa_cls:
            mock_exa = MagicMock()
            mock_exa.find_similar = AsyncMock(return_value=mock_sdk_response)
            mock_exa_cls.return_value = mock_exa

            async for _ in block.run(
                block.Input(url="https://example.com", credentials=TEST_CREDENTIALS_INPUT),  # type: ignore[arg-type]
                credentials=TEST_CREDENTIALS,
            ):
                pass

        assert len(merged) == 0


# ---------------------------------------------------------------------------
# ExaCreateResearchBlock: cost_dollars from completed poll response
# ---------------------------------------------------------------------------


COMPLETED_RESEARCH_RESPONSE = {
    "researchId": "test-research-id",
    "status": "completed",
    "model": "exa-research",
    "instructions": "test instructions",
    "createdAt": 1700000000000,
    "finishedAt": 1700000060000,
    "costDollars": {
        "total": 0.05,
        "numSearches": 3,
        "numPages": 10,
        "reasoningTokens": 500,
    },
    "output": {"content": "Research findings...", "parsed": None},
}

PENDING_RESEARCH_RESPONSE = {
    "researchId": "test-research-id",
    "status": "pending",
    "model": "exa-research",
    "instructions": "test instructions",
    "createdAt": 1700000000000,
}


class TestExaCreateResearchBlockCostTracking:
    """ExaCreateResearchBlock merges cost from completed poll response."""

    @pytest.mark.asyncio
    async def test_cost_merged_when_research_completes(self):
        """merge_stats called with provider_cost=total when poll returns completed."""
        from backend.blocks.exa.research import ExaCreateResearchBlock

        block = ExaCreateResearchBlock()
        merged: list[NodeExecutionStats] = []
        block.merge_stats = lambda s: merged.append(s)  # type: ignore[assignment]

        create_resp = MagicMock()
        create_resp.json.return_value = PENDING_RESEARCH_RESPONSE

        poll_resp = MagicMock()
        poll_resp.json.return_value = COMPLETED_RESEARCH_RESPONSE

        mock_instance = MagicMock()
        mock_instance.post = AsyncMock(return_value=create_resp)
        mock_instance.get = AsyncMock(return_value=poll_resp)

        with (
            patch("backend.blocks.exa.research.Requests", return_value=mock_instance),
            patch("asyncio.sleep", new=AsyncMock()),
        ):
            async for _ in block.run(
                block.Input(
                    instructions="test instructions",
                    wait_for_completion=True,
                    credentials=TEST_CREDENTIALS_INPUT,  # type: ignore[arg-type]
                ),
                credentials=TEST_CREDENTIALS,
            ):
                pass

        assert len(merged) == 1
        assert merged[0].provider_cost == pytest.approx(0.05)
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_no_merge_when_no_cost_dollars(self):
|
||||
"""When completed response has no costDollars, merge_stats is not called."""
|
||||
from backend.blocks.exa.research import ExaCreateResearchBlock
|
||||
|
||||
block = ExaCreateResearchBlock()
|
||||
merged: list[NodeExecutionStats] = []
|
||||
block.merge_stats = lambda s: merged.append(s) # type: ignore[assignment]
|
||||
|
||||
no_cost_response = {**COMPLETED_RESEARCH_RESPONSE, "costDollars": None}
|
||||
create_resp = MagicMock()
|
||||
create_resp.json.return_value = PENDING_RESEARCH_RESPONSE
|
||||
poll_resp = MagicMock()
|
||||
poll_resp.json.return_value = no_cost_response
|
||||
|
||||
mock_instance = MagicMock()
|
||||
mock_instance.post = AsyncMock(return_value=create_resp)
|
||||
mock_instance.get = AsyncMock(return_value=poll_resp)
|
||||
|
||||
with (
|
||||
patch("backend.blocks.exa.research.Requests", return_value=mock_instance),
|
||||
patch("asyncio.sleep", new=AsyncMock()),
|
||||
):
|
||||
async for _ in block.run(
|
||||
block.Input(
|
||||
instructions="test instructions",
|
||||
wait_for_completion=True,
|
||||
credentials=TEST_CREDENTIALS_INPUT, # type: ignore[arg-type]
|
||||
),
|
||||
credentials=TEST_CREDENTIALS,
|
||||
):
|
||||
pass
|
||||
|
||||
assert merged == []
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# ExaGetResearchBlock — cost_dollars from single GET response
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestExaGetResearchBlockCostTracking:
|
||||
"""ExaGetResearchBlock merges cost when the fetched research has cost_dollars."""
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_cost_merged_from_completed_research(self):
|
||||
"""merge_stats called with provider_cost=total when research has costDollars."""
|
||||
from backend.blocks.exa.research import ExaGetResearchBlock
|
||||
|
||||
block = ExaGetResearchBlock()
|
||||
merged: list[NodeExecutionStats] = []
|
||||
block.merge_stats = lambda s: merged.append(s) # type: ignore[assignment]
|
||||
|
||||
get_resp = MagicMock()
|
||||
get_resp.json.return_value = COMPLETED_RESEARCH_RESPONSE
|
||||
|
||||
mock_instance = MagicMock()
|
||||
mock_instance.get = AsyncMock(return_value=get_resp)
|
||||
|
||||
with patch("backend.blocks.exa.research.Requests", return_value=mock_instance):
|
||||
async for _ in block.run(
|
||||
block.Input(
|
||||
research_id="test-research-id",
|
||||
credentials=TEST_CREDENTIALS_INPUT, # type: ignore[arg-type]
|
||||
),
|
||||
credentials=TEST_CREDENTIALS,
|
||||
):
|
||||
pass
|
||||
|
||||
assert len(merged) == 1
|
||||
assert merged[0].provider_cost == pytest.approx(0.05)
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_no_merge_when_no_cost_dollars(self):
|
||||
"""When research has no costDollars, merge_stats is not called."""
|
||||
from backend.blocks.exa.research import ExaGetResearchBlock
|
||||
|
||||
block = ExaGetResearchBlock()
|
||||
merged: list[NodeExecutionStats] = []
|
||||
block.merge_stats = lambda s: merged.append(s) # type: ignore[assignment]
|
||||
|
||||
no_cost_response = {**COMPLETED_RESEARCH_RESPONSE, "costDollars": None}
|
||||
get_resp = MagicMock()
|
||||
get_resp.json.return_value = no_cost_response
|
||||
|
||||
mock_instance = MagicMock()
|
||||
mock_instance.get = AsyncMock(return_value=get_resp)
|
||||
|
||||
with patch("backend.blocks.exa.research.Requests", return_value=mock_instance):
|
||||
async for _ in block.run(
|
||||
block.Input(
|
||||
research_id="test-research-id",
|
||||
credentials=TEST_CREDENTIALS_INPUT, # type: ignore[arg-type]
|
||||
),
|
||||
credentials=TEST_CREDENTIALS,
|
||||
):
|
||||
pass
|
||||
|
||||
assert merged == []
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# ExaWaitForResearchBlock — cost_dollars from polling response
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestExaWaitForResearchBlockCostTracking:
|
||||
"""ExaWaitForResearchBlock merges cost when the polled research has cost_dollars."""
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_cost_merged_when_research_completes(self):
|
||||
"""merge_stats called with provider_cost=total once polling returns completed."""
|
||||
from backend.blocks.exa.research import ExaWaitForResearchBlock
|
||||
|
||||
block = ExaWaitForResearchBlock()
|
||||
merged: list[NodeExecutionStats] = []
|
||||
block.merge_stats = lambda s: merged.append(s) # type: ignore[assignment]
|
||||
|
||||
poll_resp = MagicMock()
|
||||
poll_resp.json.return_value = COMPLETED_RESEARCH_RESPONSE
|
||||
|
||||
mock_instance = MagicMock()
|
||||
mock_instance.get = AsyncMock(return_value=poll_resp)
|
||||
|
||||
with (
|
||||
patch("backend.blocks.exa.research.Requests", return_value=mock_instance),
|
||||
patch("asyncio.sleep", new=AsyncMock()),
|
||||
):
|
||||
async for _ in block.run(
|
||||
block.Input(
|
||||
research_id="test-research-id",
|
||||
credentials=TEST_CREDENTIALS_INPUT, # type: ignore[arg-type]
|
||||
),
|
||||
credentials=TEST_CREDENTIALS,
|
||||
):
|
||||
pass
|
||||
|
||||
assert len(merged) == 1
|
||||
assert merged[0].provider_cost == pytest.approx(0.05)
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_no_merge_when_no_cost_dollars(self):
|
||||
"""When completed research has no costDollars, merge_stats is not called."""
|
||||
from backend.blocks.exa.research import ExaWaitForResearchBlock
|
||||
|
||||
block = ExaWaitForResearchBlock()
|
||||
merged: list[NodeExecutionStats] = []
|
||||
block.merge_stats = lambda s: merged.append(s) # type: ignore[assignment]
|
||||
|
||||
no_cost_response = {**COMPLETED_RESEARCH_RESPONSE, "costDollars": None}
|
||||
poll_resp = MagicMock()
|
||||
poll_resp.json.return_value = no_cost_response
|
||||
|
||||
mock_instance = MagicMock()
|
||||
mock_instance.get = AsyncMock(return_value=poll_resp)
|
||||
|
||||
with (
|
||||
patch("backend.blocks.exa.research.Requests", return_value=mock_instance),
|
||||
patch("asyncio.sleep", new=AsyncMock()),
|
||||
):
|
||||
async for _ in block.run(
|
||||
block.Input(
|
||||
research_id="test-research-id",
|
||||
credentials=TEST_CREDENTIALS_INPUT, # type: ignore[arg-type]
|
||||
),
|
||||
credentials=TEST_CREDENTIALS,
|
||||
):
|
||||
pass
|
||||
|
||||
assert merged == []
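The Exa tests above share one pattern: `merge_stats` is replaced on the block instance with a lambda that appends to a list, the async generator is driven to exhaustion, and the captured list is asserted on. Below is a minimal, self-contained sketch of that pattern. `FakeBlock`, `Stats`, and `collect_merged` are hypothetical stand-ins for illustration only; they are not the real `Block` or `NodeExecutionStats` classes.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Optional


@dataclass
class Stats:
    # Hypothetical stand-in for NodeExecutionStats; only one field is modelled.
    provider_cost: Optional[float] = None


class FakeBlock:
    # Hypothetical block: yields outputs and reports a cost when one is present.
    def merge_stats(self, stats: Stats) -> None:
        raise NotImplementedError("replaced by the test")

    async def run(self, cost: Optional[float]) -> AsyncIterator[tuple[str, object]]:
        yield "results", []
        if cost is not None:
            yield "cost_total", cost
            self.merge_stats(Stats(provider_cost=cost))


async def collect_merged(cost: Optional[float]) -> list[Stats]:
    block = FakeBlock()
    merged: list[Stats] = []
    # Capture every merge_stats call instead of merging into real stats.
    block.merge_stats = lambda s: merged.append(s)
    async for _ in block.run(cost):
        pass
    return merged


with_cost = asyncio.run(collect_merged(0.05))
without_cost = asyncio.run(collect_merged(None))
```

Capturing calls in a plain list keeps the assertions independent of how the real stats object combines values.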
|
||||
@@ -12,6 +12,7 @@ from typing import Any, Dict, List, Optional
|
||||
|
||||
from pydantic import BaseModel
|
||||
|
||||
from backend.data.model import NodeExecutionStats
|
||||
from backend.sdk import (
|
||||
APIKeyCredentials,
|
||||
Block,
|
||||
@@ -232,6 +233,11 @@ class ExaCreateResearchBlock(Block):
|
||||
|
||||
if research.cost_dollars:
|
||||
yield "cost_total", research.cost_dollars.total
|
||||
self.merge_stats(
|
||||
NodeExecutionStats(
|
||||
provider_cost=research.cost_dollars.total
|
||||
)
|
||||
)
|
||||
return
|
||||
|
||||
await asyncio.sleep(check_interval)
|
||||
@@ -346,6 +352,9 @@ class ExaGetResearchBlock(Block):
|
||||
yield "cost_searches", research.cost_dollars.num_searches
|
||||
yield "cost_pages", research.cost_dollars.num_pages
|
||||
yield "cost_reasoning_tokens", research.cost_dollars.reasoning_tokens
|
||||
self.merge_stats(
|
||||
NodeExecutionStats(provider_cost=research.cost_dollars.total)
|
||||
)
|
||||
|
||||
yield "error_message", research.error
|
||||
|
||||
@@ -432,6 +441,9 @@ class ExaWaitForResearchBlock(Block):
|
||||
|
||||
if research.cost_dollars:
|
||||
yield "cost_total", research.cost_dollars.total
|
||||
self.merge_stats(
|
||||
NodeExecutionStats(provider_cost=research.cost_dollars.total)
|
||||
)
|
||||
|
||||
return
|
||||
|
||||
|
||||
@@ -4,6 +4,7 @@ from typing import Optional
|
||||
|
||||
from exa_py import AsyncExa
|
||||
|
||||
from backend.data.model import NodeExecutionStats
|
||||
from backend.sdk import (
|
||||
APIKeyCredentials,
|
||||
Block,
|
||||
@@ -206,3 +207,6 @@ class ExaSearchBlock(Block):
|
||||
|
||||
if response.cost_dollars:
|
||||
yield "cost_dollars", response.cost_dollars
|
||||
self.merge_stats(
|
||||
NodeExecutionStats(provider_cost=response.cost_dollars.total)
|
||||
)
|
||||
|
||||
@@ -3,6 +3,7 @@ from typing import Optional
|
||||
|
||||
from exa_py import AsyncExa
|
||||
|
||||
from backend.data.model import NodeExecutionStats
|
||||
from backend.sdk import (
|
||||
APIKeyCredentials,
|
||||
Block,
|
||||
@@ -167,3 +168,6 @@ class ExaFindSimilarBlock(Block):
|
||||
|
||||
if response.cost_dollars:
|
||||
yield "cost_dollars", response.cost_dollars
|
||||
self.merge_stats(
|
||||
NodeExecutionStats(provider_cost=response.cost_dollars.total)
|
||||
)
|
||||
|
||||
@@ -14,6 +14,7 @@ from backend.data.model import (
|
||||
APIKeyCredentials,
|
||||
CredentialsField,
|
||||
CredentialsMetaInput,
|
||||
NodeExecutionStats,
|
||||
SchemaField,
|
||||
)
|
||||
from backend.integrations.providers import ProviderName
|
||||
@@ -117,6 +118,11 @@ class GoogleMapsSearchBlock(Block):
|
||||
input_data.radius,
|
||||
input_data.max_results,
|
||||
)
|
||||
self.merge_stats(
|
||||
NodeExecutionStats(
|
||||
provider_cost=float(len(places)), provider_cost_type="items"
|
||||
)
|
||||
)
|
||||
for place in places:
|
||||
yield "place", place
|
||||
|
||||
|
||||
@@ -10,7 +10,7 @@ from backend.blocks.jina._auth import (
|
||||
JinaCredentialsField,
|
||||
JinaCredentialsInput,
|
||||
)
|
||||
-from backend.data.model import SchemaField
+from backend.data.model import NodeExecutionStats, SchemaField
|
||||
from backend.util.request import Requests
|
||||
|
||||
|
||||
@@ -45,5 +45,13 @@ class JinaEmbeddingBlock(Block):
|
||||
}
|
||||
data = {"input": input_data.texts, "model": input_data.model}
|
||||
response = await Requests().post(url, headers=headers, json=data)
|
||||
-embeddings = [e["embedding"] for e in response.json()["data"]]
+resp_json = response.json()
+embeddings = [e["embedding"] for e in resp_json["data"]]
+usage = resp_json.get("usage", {})
+if usage.get("total_tokens"):
+    self.merge_stats(
+        NodeExecutionStats(
+            input_token_count=usage.get("total_tokens", 0),
+        )
+    )
|
||||
yield "embeddings", embeddings
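The Jina hunk above reads the token total from an optional `usage` object and only records stats when the total is non-zero. A small sketch of that guard follows, with a hypothetical helper name `total_tokens_from`. As an assumption beyond the diff, this version also tolerates an explicit `"usage": null`, which `resp_json.get("usage", {})` alone would not.

```python
from typing import Any, Optional


def total_tokens_from(resp_json: dict[str, Any]) -> Optional[int]:
    """Return the reported token total, or None when absent or zero."""
    # `or {}` also covers a present-but-null "usage" value (an assumption,
    # not something the diff handles).
    usage = resp_json.get("usage") or {}
    total = usage.get("total_tokens")
    return total if total else None


reported = total_tokens_from({"data": [], "usage": {"total_tokens": 12}})
missing = total_tokens_from({"data": []})
null_usage = total_tokens_from({"data": [], "usage": None})
```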
|
||||
|
||||
@@ -1,6 +1,7 @@
|
||||
# This file contains a lot of prompt block strings that would trigger "line too long"
|
||||
# flake8: noqa: E501
|
||||
import logging
|
||||
import math
|
||||
import re
|
||||
import secrets
|
||||
from abc import ABC
|
||||
@@ -13,6 +14,7 @@ import ollama
|
||||
import openai
|
||||
from anthropic.types import ToolParam
|
||||
from groq import AsyncGroq
|
||||
from openai.types.chat import ChatCompletion as OpenAIChatCompletion
|
||||
from pydantic import BaseModel, SecretStr
|
||||
|
||||
from backend.blocks._base import (
|
||||
@@ -205,6 +207,19 @@ class LlmModel(str, Enum, metaclass=LlmModelMeta):
|
||||
KIMI_K2 = "moonshotai/kimi-k2"
|
||||
QWEN3_235B_A22B_THINKING = "qwen/qwen3-235b-a22b-thinking-2507"
|
||||
QWEN3_CODER = "qwen/qwen3-coder"
|
||||
# Z.ai (Zhipu) models
|
||||
ZAI_GLM_4_32B = "z-ai/glm-4-32b"
|
||||
ZAI_GLM_4_5 = "z-ai/glm-4.5"
|
||||
ZAI_GLM_4_5_AIR = "z-ai/glm-4.5-air"
|
||||
ZAI_GLM_4_5_AIR_FREE = "z-ai/glm-4.5-air:free"
|
||||
ZAI_GLM_4_5V = "z-ai/glm-4.5v"
|
||||
ZAI_GLM_4_6 = "z-ai/glm-4.6"
|
||||
ZAI_GLM_4_6V = "z-ai/glm-4.6v"
|
||||
ZAI_GLM_4_7 = "z-ai/glm-4.7"
|
||||
ZAI_GLM_4_7_FLASH = "z-ai/glm-4.7-flash"
|
||||
ZAI_GLM_5 = "z-ai/glm-5"
|
||||
ZAI_GLM_5_TURBO = "z-ai/glm-5-turbo"
|
||||
ZAI_GLM_5V_TURBO = "z-ai/glm-5v-turbo"
|
||||
# Llama API models
|
||||
LLAMA_API_LLAMA_4_SCOUT = "Llama-4-Scout-17B-16E-Instruct-FP8"
|
||||
LLAMA_API_LLAMA4_MAVERICK = "Llama-4-Maverick-17B-128E-Instruct-FP8"
|
||||
@@ -630,6 +645,43 @@ MODEL_METADATA = {
|
||||
LlmModel.QWEN3_CODER: ModelMetadata(
|
||||
"open_router", 262144, 262144, "Qwen 3 Coder", "OpenRouter", "Qwen", 3
|
||||
),
|
||||
# https://openrouter.ai/models?q=z-ai
|
||||
LlmModel.ZAI_GLM_4_32B: ModelMetadata(
|
||||
"open_router", 128000, 128000, "GLM 4 32B", "OpenRouter", "Z.ai", 1
|
||||
),
|
||||
LlmModel.ZAI_GLM_4_5: ModelMetadata(
|
||||
"open_router", 131072, 98304, "GLM 4.5", "OpenRouter", "Z.ai", 2
|
||||
),
|
||||
LlmModel.ZAI_GLM_4_5_AIR: ModelMetadata(
|
||||
"open_router", 131072, 98304, "GLM 4.5 Air", "OpenRouter", "Z.ai", 1
|
||||
),
|
||||
LlmModel.ZAI_GLM_4_5_AIR_FREE: ModelMetadata(
|
||||
"open_router", 131072, 96000, "GLM 4.5 Air (Free)", "OpenRouter", "Z.ai", 1
|
||||
),
|
||||
LlmModel.ZAI_GLM_4_5V: ModelMetadata(
|
||||
"open_router", 65536, 16384, "GLM 4.5V", "OpenRouter", "Z.ai", 2
|
||||
),
|
||||
LlmModel.ZAI_GLM_4_6: ModelMetadata(
|
||||
"open_router", 204800, 204800, "GLM 4.6", "OpenRouter", "Z.ai", 1
|
||||
),
|
||||
LlmModel.ZAI_GLM_4_6V: ModelMetadata(
|
||||
"open_router", 131072, 131072, "GLM 4.6V", "OpenRouter", "Z.ai", 1
|
||||
),
|
||||
LlmModel.ZAI_GLM_4_7: ModelMetadata(
|
||||
"open_router", 202752, 65535, "GLM 4.7", "OpenRouter", "Z.ai", 1
|
||||
),
|
||||
LlmModel.ZAI_GLM_4_7_FLASH: ModelMetadata(
|
||||
"open_router", 202752, 202752, "GLM 4.7 Flash", "OpenRouter", "Z.ai", 1
|
||||
),
|
||||
LlmModel.ZAI_GLM_5: ModelMetadata(
|
||||
"open_router", 80000, 80000, "GLM 5", "OpenRouter", "Z.ai", 2
|
||||
),
|
||||
LlmModel.ZAI_GLM_5_TURBO: ModelMetadata(
|
||||
"open_router", 202752, 131072, "GLM 5 Turbo", "OpenRouter", "Z.ai", 3
|
||||
),
|
||||
LlmModel.ZAI_GLM_5V_TURBO: ModelMetadata(
|
||||
"open_router", 202752, 131072, "GLM 5V Turbo", "OpenRouter", "Z.ai", 3
|
||||
),
|
||||
# Llama API models
|
||||
LlmModel.LLAMA_API_LLAMA_4_SCOUT: ModelMetadata(
|
||||
"llama_api",
|
||||
@@ -687,6 +739,7 @@ class LLMResponse(BaseModel):
|
||||
prompt_tokens: int
|
||||
completion_tokens: int
|
||||
reasoning: Optional[str] = None
|
||||
provider_cost: float | None = None
|
||||
|
||||
|
||||
def convert_openai_tool_fmt_to_anthropic(
|
||||
@@ -721,6 +774,35 @@ def convert_openai_tool_fmt_to_anthropic(
|
||||
return anthropic_tools
|
||||
|
||||
|
||||
def extract_openrouter_cost(response: OpenAIChatCompletion) -> float | None:
    """Extract OpenRouter's `x-total-cost` header from an OpenAI SDK response.

    OpenRouter returns the per-request USD cost in a response header. The
    OpenAI SDK exposes the raw httpx response via an undocumented `_response`
    attribute. We use try/except AttributeError so that if the SDK ever drops
    or renames that attribute, the warning is visible in logs rather than
    silently degrading to no cost tracking.
    """
    try:
        raw_resp = response._response  # type: ignore[attr-defined]
    except AttributeError:
        logger.warning(
            "OpenAI SDK response missing _response attribute"
            " — OpenRouter cost tracking unavailable"
        )
        return None
    try:
        cost_header = raw_resp.headers.get("x-total-cost")
        if not cost_header:
            return None
        cost = float(cost_header)
        if not math.isfinite(cost):
            return None
        return cost
    except (ValueError, TypeError, AttributeError):
        return None
|
||||
|
||||
|
||||
def extract_openai_reasoning(response) -> str | None:
    """Extract reasoning from OpenAI-compatible response if available."""
    """Note: This will likely not work, since the reasoning is not present in another Response API"""
|
||||
@@ -1053,6 +1135,7 @@ async def llm_call(
|
||||
prompt_tokens=response.usage.prompt_tokens if response.usage else 0,
|
||||
completion_tokens=response.usage.completion_tokens if response.usage else 0,
|
||||
reasoning=reasoning,
|
||||
provider_cost=extract_openrouter_cost(response),
|
||||
)
|
||||
elif provider == "llama_api":
|
||||
tools_param = tools if tools else openai.NOT_GIVEN
|
||||
@@ -1360,6 +1443,7 @@ class AIStructuredResponseGeneratorBlock(AIBlockBase):
|
||||
|
||||
error_feedback_message = ""
|
||||
llm_model = input_data.model
|
||||
last_attempt_cost: float | None = None
|
||||
|
||||
for retry_count in range(input_data.retry):
|
||||
logger.debug(f"LLM request: {prompt}")
|
||||
@@ -1377,12 +1461,15 @@ class AIStructuredResponseGeneratorBlock(AIBlockBase):
|
||||
max_tokens=input_data.max_tokens,
|
||||
)
|
||||
response_text = llm_response.response
|
||||
-self.merge_stats(
-    NodeExecutionStats(
-        input_token_count=llm_response.prompt_tokens,
-        output_token_count=llm_response.completion_tokens,
-    )
+# Merge token counts for every attempt (each call costs tokens).
+# provider_cost (actual USD) is tracked separately and only merged
+# on success to avoid double-counting across retries.
+token_stats = NodeExecutionStats(
+    input_token_count=llm_response.prompt_tokens,
+    output_token_count=llm_response.completion_tokens,
)
+self.merge_stats(token_stats)
+last_attempt_cost = llm_response.provider_cost
|
||||
logger.debug(f"LLM attempt-{retry_count} response: {response_text}")
|
||||
|
||||
if input_data.expected_format:
|
||||
@@ -1451,6 +1538,7 @@ class AIStructuredResponseGeneratorBlock(AIBlockBase):
|
||||
NodeExecutionStats(
|
||||
llm_call_count=retry_count + 1,
|
||||
llm_retry_count=retry_count,
|
||||
provider_cost=last_attempt_cost,
|
||||
)
|
||||
)
|
||||
yield "response", response_obj
|
||||
@@ -1471,6 +1559,7 @@ class AIStructuredResponseGeneratorBlock(AIBlockBase):
|
||||
NodeExecutionStats(
|
||||
llm_call_count=retry_count + 1,
|
||||
llm_retry_count=retry_count,
|
||||
provider_cost=last_attempt_cost,
|
||||
)
|
||||
)
|
||||
yield "response", {"response": response_text}
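The retry changes above accumulate token counts on every attempt but record `provider_cost` once, from the attempt that finally succeeds. A simplified sketch of that accounting is shown below. `Attempt`, `Totals`, and `run_with_retries` are hypothetical names, and the real block merges into `NodeExecutionStats` rather than a local dataclass.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Attempt:
    # Hypothetical per-call result; mirrors the fields used in the diff.
    ok: bool
    prompt_tokens: int
    completion_tokens: int
    provider_cost: Optional[float] = None


@dataclass
class Totals:
    input_tokens: int = 0
    output_tokens: int = 0
    provider_cost: Optional[float] = None
    calls: int = 0


def run_with_retries(call: Callable[[], Attempt], retries: int) -> Totals:
    totals = Totals()
    last_cost: Optional[float] = None
    for attempt_index in range(retries):
        result = call()
        # Tokens accumulate on every attempt, successful or not.
        totals.input_tokens += result.prompt_tokens
        totals.output_tokens += result.completion_tokens
        # The cost is overwritten each time, so earlier values are dropped.
        last_cost = result.provider_cost
        if result.ok:
            totals.provider_cost = last_cost
            totals.calls = attempt_index + 1
            return totals
    raise RuntimeError("all attempts failed")


attempts = iter([Attempt(False, 10, 5, 0.01), Attempt(True, 20, 10, 0.02)])
totals = run_with_retries(lambda: next(attempts), retries=2)
```

With these inputs the sketch mirrors the values asserted in `test_retry_cost_uses_last_attempt_only`: tokens sum across both attempts while only the second attempt's cost is kept.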
|
||||
|
||||
@@ -23,7 +23,7 @@ from backend.blocks.smartlead.models import (
|
||||
SaveSequencesResponse,
|
||||
Sequence,
|
||||
)
|
||||
-from backend.data.model import CredentialsField, SchemaField
+from backend.data.model import CredentialsField, NodeExecutionStats, SchemaField
|
||||
|
||||
|
||||
class CreateCampaignBlock(Block):
|
||||
@@ -226,6 +226,12 @@ class AddLeadToCampaignBlock(Block):
|
||||
response = await self.add_leads_to_campaign(
|
||||
input_data.campaign_id, input_data.lead_list, credentials
|
||||
)
|
||||
self.merge_stats(
|
||||
NodeExecutionStats(
|
||||
provider_cost=float(len(input_data.lead_list)),
|
||||
provider_cost_type="items",
|
||||
)
|
||||
)
|
||||
|
||||
yield "campaign_id", input_data.campaign_id
|
||||
yield "upload_count", response.upload_count
|
||||
|
||||
@@ -199,6 +199,66 @@ class TestLLMStatsTracking:
|
||||
assert block.execution_stats.llm_call_count == 2 # retry_count + 1 = 1 + 1 = 2
|
||||
assert block.execution_stats.llm_retry_count == 1
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_retry_cost_uses_last_attempt_only(self):
|
||||
"""provider_cost is only merged from the final successful attempt.
|
||||
|
||||
Costs from earlier failed attempts are intentionally dropped to avoid
double-counting: last_attempt_cost is overwritten on every attempt, so
only the cost of the attempt that finally succeeds is merged.
|
||||
"""
|
||||
import backend.blocks.llm as llm
|
||||
|
||||
block = llm.AIStructuredResponseGeneratorBlock()
|
||||
call_count = 0
|
||||
|
||||
async def mock_llm_call(*args, **kwargs):
|
||||
nonlocal call_count
|
||||
call_count += 1
|
||||
if call_count == 1:
|
||||
# First attempt: fails validation, returns cost $0.01
|
||||
return llm.LLMResponse(
|
||||
raw_response="",
|
||||
prompt=[],
|
||||
response='<json_output id="test123456">{"wrong": "key"}</json_output>',
|
||||
tool_calls=None,
|
||||
prompt_tokens=10,
|
||||
completion_tokens=5,
|
||||
reasoning=None,
|
||||
provider_cost=0.01,
|
||||
)
|
||||
# Second attempt: succeeds, returns cost $0.02
|
||||
return llm.LLMResponse(
|
||||
raw_response="",
|
||||
prompt=[],
|
||||
response='<json_output id="test123456">{"key1": "value1", "key2": "value2"}</json_output>',
|
||||
tool_calls=None,
|
||||
prompt_tokens=20,
|
||||
completion_tokens=10,
|
||||
reasoning=None,
|
||||
provider_cost=0.02,
|
||||
)
|
||||
|
||||
block.llm_call = mock_llm_call # type: ignore
|
||||
|
||||
input_data = llm.AIStructuredResponseGeneratorBlock.Input(
|
||||
prompt="Test prompt",
|
||||
expected_format={"key1": "desc1", "key2": "desc2"},
|
||||
model=llm.DEFAULT_LLM_MODEL,
|
||||
credentials=llm.TEST_CREDENTIALS_INPUT, # type: ignore
|
||||
retry=2,
|
||||
)
|
||||
|
||||
with patch("secrets.token_hex", return_value="test123456"):
|
||||
async for _ in block.run(input_data, credentials=llm.TEST_CREDENTIALS):
|
||||
pass
|
||||
|
||||
# Only the final successful attempt's cost is merged
|
||||
assert block.execution_stats.provider_cost == pytest.approx(0.02)
|
||||
# Tokens from both attempts accumulate
|
||||
assert block.execution_stats.input_token_count == 30
|
||||
assert block.execution_stats.output_token_count == 15
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_ai_text_summarizer_multiple_chunks(self):
|
||||
"""Test that AITextSummarizerBlock correctly accumulates stats across multiple chunks."""
|
||||
@@ -987,3 +1047,63 @@ class TestLlmModelMissing:
|
||||
assert (
|
||||
llm.LlmModel("extra/google/gemini-2.5-pro") == llm.LlmModel.GEMINI_2_5_PRO
|
||||
)
|
||||
|
||||
|
||||
class TestExtractOpenRouterCost:
    """Tests for extract_openrouter_cost — the x-total-cost header parser."""

    def _mk_response(self, headers: dict | None):
        response = MagicMock()
        if headers is None:
            response._response = None
        else:
            raw = MagicMock()
            raw.headers = headers
            response._response = raw
        return response

    def test_extracts_numeric_cost(self):
        response = self._mk_response({"x-total-cost": "0.0042"})
        assert llm.extract_openrouter_cost(response) == 0.0042

    def test_returns_none_when_header_missing(self):
        response = self._mk_response({})
        assert llm.extract_openrouter_cost(response) is None

    def test_returns_none_when_header_empty_string(self):
        response = self._mk_response({"x-total-cost": ""})
        assert llm.extract_openrouter_cost(response) is None

    def test_returns_none_when_header_non_numeric(self):
        response = self._mk_response({"x-total-cost": "not-a-number"})
        assert llm.extract_openrouter_cost(response) is None

    def test_returns_none_when_no_response_attr(self):
        response = MagicMock(spec=[])  # no _response attr
        assert llm.extract_openrouter_cost(response) is None

    def test_returns_none_when_raw_is_none(self):
        response = self._mk_response(None)
        assert llm.extract_openrouter_cost(response) is None

    def test_returns_none_when_raw_has_no_headers(self):
        response = MagicMock()
        response._response = MagicMock(spec=[])  # no headers attr
        assert llm.extract_openrouter_cost(response) is None

    def test_returns_zero_for_zero_cost(self):
        """Zero-cost is a valid value (free tier) and must not become None."""
        response = self._mk_response({"x-total-cost": "0"})
        assert llm.extract_openrouter_cost(response) == 0.0

    def test_returns_none_for_inf(self):
        response = self._mk_response({"x-total-cost": "inf"})
        assert llm.extract_openrouter_cost(response) is None

    def test_returns_none_for_negative_inf(self):
        response = self._mk_response({"x-total-cost": "-inf"})
        assert llm.extract_openrouter_cost(response) is None

    def test_returns_none_for_nan(self):
        response = self._mk_response({"x-total-cost": "nan"})
        assert llm.extract_openrouter_cost(response) is None
|
||||
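The edge cases exercised by `TestExtractOpenRouterCost` (missing header, empty string, non-numeric text, `inf`, `nan`, and a valid zero) can be reproduced with a header parser that does not depend on the OpenAI SDK. The sketch below is an illustration under assumptions: `parse_cost_header` is a hypothetical helper, and it takes a plain mapping, whereas real HTTP header objects are usually case-insensitive.

```python
import math
from typing import Mapping, Optional


def parse_cost_header(
    headers: Mapping[str, str], name: str = "x-total-cost"
) -> Optional[float]:
    """Parse a numeric cost header, rejecting absent and non-finite values."""
    raw = headers.get(name)
    if not raw:
        # Covers both a missing header and an empty string.
        return None
    try:
        cost = float(raw)
    except (TypeError, ValueError):
        return None
    # float() accepts "inf" and "nan", so they are filtered explicitly.
    return cost if math.isfinite(cost) else None


valid = parse_cost_header({"x-total-cost": "0.0042"})
zero = parse_cost_header({"x-total-cost": "0"})
infinite = parse_cost_header({"x-total-cost": "inf"})
not_a_number = parse_cost_header({"x-total-cost": "nan"})
garbage = parse_cost_header({"x-total-cost": "not-a-number"})
absent = parse_cost_header({})
```

The string `"0"` is truthy, so a zero cost passes the emptiness check and is returned as `0.0` rather than `None`.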
|
||||
@@ -13,6 +13,7 @@ from backend.data.model import (
|
||||
APIKeyCredentials,
|
||||
CredentialsField,
|
||||
CredentialsMetaInput,
|
||||
NodeExecutionStats,
|
||||
SchemaField,
|
||||
)
|
||||
from backend.integrations.providers import ProviderName
|
||||
@@ -104,4 +105,10 @@ class UnrealTextToSpeechBlock(Block):
|
||||
input_data.text,
|
||||
input_data.voice_id,
|
||||
)
|
||||
self.merge_stats(
|
||||
NodeExecutionStats(
|
||||
provider_cost=float(len(input_data.text)),
|
||||
provider_cost_type="characters",
|
||||
)
|
||||
)
|
||||
yield "mp3_url", api_response["OutputUri"]
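Several hunks above report usage in a unit other than dollars by pairing `provider_cost` with a `provider_cost_type` such as `"items"` or `"characters"`. A minimal sketch of that idea follows. `UsageRecord` and both helper functions are hypothetical, and the meaning of an unset `provider_cost_type` is an assumption here, not something the diff states.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class UsageRecord:
    # Hypothetical stand-in for NodeExecutionStats.
    provider_cost: Optional[float] = None
    # None means "unit not specified" in this sketch (an assumption).
    provider_cost_type: Optional[str] = None


def usage_for_items(items: Sequence[object]) -> UsageRecord:
    # One unit per returned item, as in the Google Maps and SmartLead hunks.
    return UsageRecord(provider_cost=float(len(items)), provider_cost_type="items")


def usage_for_text(text: str) -> UsageRecord:
    # One unit per input character, as in the text-to-speech hunk.
    return UsageRecord(provider_cost=float(len(text)), provider_cost_type="characters")


places = usage_for_items(["a", "b", "c"])
speech = usage_for_text("hello")
```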
|
||||
|
||||
File diff suppressed because it is too large
@@ -0,0 +1,799 @@
|
||||
"""Unit tests for baseline service pure-logic helpers.
|
||||
|
||||
These tests cover ``_baseline_conversation_updater`` and ``_BaselineStreamState``
|
||||
without requiring API keys, database connections, or network access.
|
||||
"""
|
||||
|
||||
from unittest.mock import AsyncMock, MagicMock, patch
|
||||
|
||||
import pytest
|
||||
from openai.types.chat import ChatCompletionToolParam
|
||||
|
||||
from backend.copilot.baseline.service import (
|
||||
_baseline_conversation_updater,
|
||||
_BaselineStreamState,
|
||||
_compress_session_messages,
|
||||
_ThinkingStripper,
|
||||
)
|
||||
from backend.copilot.model import ChatMessage
|
||||
from backend.copilot.transcript_builder import TranscriptBuilder
|
||||
from backend.util.prompt import CompressResult
|
||||
from backend.util.tool_call_loop import LLMLoopResponse, LLMToolCall, ToolCallResult
|
||||
|
||||
|
||||
class TestBaselineStreamState:
|
||||
def test_defaults(self):
|
||||
state = _BaselineStreamState()
|
||||
assert state.pending_events == []
|
||||
assert state.assistant_text == ""
|
||||
assert state.text_started is False
|
||||
assert state.turn_prompt_tokens == 0
|
||||
assert state.turn_completion_tokens == 0
|
||||
assert state.text_block_id # Should be a UUID string
|
||||
|
||||
def test_mutable_fields(self):
|
||||
state = _BaselineStreamState()
|
||||
state.assistant_text = "hello"
|
||||
state.turn_prompt_tokens = 100
|
||||
state.turn_completion_tokens = 50
|
||||
assert state.assistant_text == "hello"
|
||||
assert state.turn_prompt_tokens == 100
|
||||
assert state.turn_completion_tokens == 50
|
||||
|
||||
|
||||
class TestBaselineConversationUpdater:
|
||||
"""Tests for _baseline_conversation_updater which updates the OpenAI
|
||||
message list and transcript builder after each LLM call."""
|
||||
|
||||
def _make_transcript_builder(self) -> TranscriptBuilder:
|
||||
builder = TranscriptBuilder()
|
||||
builder.append_user("test question")
|
||||
return builder
|
||||
|
||||
def test_text_only_response(self):
|
||||
"""When the LLM returns text without tool calls, the updater appends
|
||||
a single assistant message and records it in the transcript."""
|
||||
messages: list = []
|
||||
builder = self._make_transcript_builder()
|
||||
response = LLMLoopResponse(
|
||||
response_text="Hello, world!",
|
||||
tool_calls=[],
|
||||
raw_response=None,
|
||||
prompt_tokens=0,
|
||||
completion_tokens=0,
|
||||
)
|
||||
|
||||
_baseline_conversation_updater(
|
||||
messages,
|
||||
response,
|
||||
tool_results=None,
|
||||
transcript_builder=builder,
|
||||
model="test-model",
|
||||
)
|
||||
|
||||
assert len(messages) == 1
|
||||
assert messages[0]["role"] == "assistant"
|
||||
assert messages[0]["content"] == "Hello, world!"
|
||||
# Transcript should have user + assistant
|
||||
assert builder.entry_count == 2
|
||||
assert builder.last_entry_type == "assistant"
|
||||
|
||||
def test_tool_calls_response(self):
|
||||
"""When the LLM returns tool calls, the updater appends the assistant
|
||||
message with tool_calls and tool result messages."""
|
||||
messages: list = []
|
||||
builder = self._make_transcript_builder()
|
||||
response = LLMLoopResponse(
|
||||
response_text="Let me search...",
|
||||
tool_calls=[
|
||||
LLMToolCall(
|
||||
id="tc_1",
|
||||
name="search",
|
||||
arguments='{"query": "test"}',
|
||||
),
|
||||
],
|
||||
raw_response=None,
|
||||
prompt_tokens=0,
|
||||
completion_tokens=0,
|
||||
)
|
||||
tool_results = [
|
||||
ToolCallResult(
|
||||
tool_call_id="tc_1",
|
||||
tool_name="search",
|
||||
content="Found result",
|
||||
),
|
||||
]
|
||||
|
||||
_baseline_conversation_updater(
|
||||
messages,
|
||||
response,
|
||||
tool_results=tool_results,
|
||||
transcript_builder=builder,
|
||||
model="test-model",
|
||||
)
|
||||
|
||||
# Messages: assistant (with tool_calls) + tool result
|
||||
assert len(messages) == 2
|
||||
assert messages[0]["role"] == "assistant"
|
||||
assert messages[0]["content"] == "Let me search..."
|
||||
assert len(messages[0]["tool_calls"]) == 1
|
||||
assert messages[0]["tool_calls"][0]["id"] == "tc_1"
|
||||
assert messages[1]["role"] == "tool"
|
||||
assert messages[1]["tool_call_id"] == "tc_1"
|
||||
assert messages[1]["content"] == "Found result"
|
||||
|
||||
# Transcript: user + assistant(tool_use) + user(tool_result)
|
||||
assert builder.entry_count == 3
|
||||
|
||||
def test_tool_calls_without_text(self):
|
||||
"""Tool calls without accompanying text should still work."""
|
||||
messages: list = []
|
||||
builder = self._make_transcript_builder()
|
||||
response = LLMLoopResponse(
|
||||
response_text=None,
|
||||
tool_calls=[
|
||||
LLMToolCall(id="tc_1", name="run", arguments="{}"),
|
||||
],
|
||||
raw_response=None,
|
||||
prompt_tokens=0,
|
||||
completion_tokens=0,
|
||||
)
|
||||
tool_results = [
|
||||
ToolCallResult(tool_call_id="tc_1", tool_name="run", content="done"),
|
||||
]
|
||||
|
||||
_baseline_conversation_updater(
|
||||
messages,
|
||||
response,
|
||||
tool_results=tool_results,
|
||||
transcript_builder=builder,
|
||||
model="test-model",
|
||||
)
|
||||
|
||||
assert len(messages) == 2
|
||||
assert "content" not in messages[0] # No text content
|
||||
assert messages[0]["tool_calls"][0]["function"]["name"] == "run"
|
||||
|
||||
def test_no_text_no_tools(self):
|
||||
"""When the response has no text and no tool calls, nothing is appended."""
|
||||
messages: list = []
|
||||
builder = self._make_transcript_builder()
|
||||
response = LLMLoopResponse(
|
||||
response_text=None,
|
||||
tool_calls=[],
|
||||
raw_response=None,
|
||||
prompt_tokens=0,
|
||||
completion_tokens=0,
|
||||
)
|
||||
|
||||
_baseline_conversation_updater(
|
||||
messages,
|
||||
response,
|
||||
tool_results=None,
|
||||
transcript_builder=builder,
|
||||
model="test-model",
|
||||
)
|
||||
|
||||
assert len(messages) == 0
|
||||
# Only the user entry from setup
|
||||
assert builder.entry_count == 1
|
||||
|
||||
def test_multiple_tool_calls(self):
|
||||
"""Multiple tool calls in a single response are all recorded."""
|
||||
messages: list = []
|
||||
builder = self._make_transcript_builder()
|
||||
response = LLMLoopResponse(
|
||||
response_text=None,
|
||||
tool_calls=[
|
||||
LLMToolCall(id="tc_1", name="tool_a", arguments="{}"),
|
||||
LLMToolCall(id="tc_2", name="tool_b", arguments='{"x": 1}'),
|
||||
],
|
||||
raw_response=None,
|
||||
prompt_tokens=0,
|
||||
completion_tokens=0,
|
||||
)
|
||||
tool_results = [
|
||||
ToolCallResult(tool_call_id="tc_1", tool_name="tool_a", content="result_a"),
|
||||
ToolCallResult(tool_call_id="tc_2", tool_name="tool_b", content="result_b"),
|
||||
]
|
||||
|
||||
_baseline_conversation_updater(
|
||||
messages,
|
||||
response,
|
||||
tool_results=tool_results,
|
||||
transcript_builder=builder,
|
||||
model="test-model",
|
||||
)
|
||||
|
||||
# 1 assistant + 2 tool results
|
||||
assert len(messages) == 3
|
||||
assert len(messages[0]["tool_calls"]) == 2
|
||||
assert messages[1]["tool_call_id"] == "tc_1"
|
||||
assert messages[2]["tool_call_id"] == "tc_2"
|
||||
|
||||
def test_invalid_tool_arguments_handled(self):
|
||||
"""Tool call with invalid JSON arguments: the arguments field is
|
||||
stored as-is in the message, and orjson failure falls back to {}
|
||||
in the transcript content_blocks."""
|
||||
messages: list = []
|
||||
builder = self._make_transcript_builder()
|
||||
response = LLMLoopResponse(
|
||||
response_text=None,
|
||||
tool_calls=[
|
||||
LLMToolCall(id="tc_1", name="tool_x", arguments="not-json"),
|
||||
],
|
||||
raw_response=None,
|
||||
prompt_tokens=0,
|
||||
completion_tokens=0,
|
||||
)
|
||||
tool_results = [
|
||||
ToolCallResult(tool_call_id="tc_1", tool_name="tool_x", content="ok"),
|
||||
]
|
||||
|
||||
_baseline_conversation_updater(
|
||||
messages,
|
||||
response,
|
||||
tool_results=tool_results,
|
||||
transcript_builder=builder,
|
||||
model="test-model",
|
||||
)
|
||||
|
||||
# Should not raise — invalid JSON falls back to {} in transcript
|
||||
assert len(messages) == 2
|
||||
assert messages[0]["tool_calls"][0]["function"]["arguments"] == "not-json"
|
||||
|
||||
|
||||
class TestCompressSessionMessagesPreservesToolCalls:
|
||||
"""``_compress_session_messages`` must round-trip tool_calls + tool_call_id.
|
||||
|
||||
Compression serialises ChatMessage to dict for ``compress_context`` and
|
||||
reifies the result back to ChatMessage. A regression that drops
|
||||
``tool_calls`` or ``tool_call_id`` would corrupt the OpenAI message
|
||||
list and break downstream tool-execution rounds.
|
||||
"""
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_compressed_output_keeps_tool_calls_and_ids(self):
|
||||
# Simulate compression that returns a summary + the most recent
|
||||
# assistant(tool_call) + tool(tool_result) intact.
|
||||
summary = {"role": "system", "content": "prior turns: user asked X"}
|
||||
assistant_with_tc = {
|
||||
"role": "assistant",
|
||||
"content": "calling tool",
|
||||
"tool_calls": [
|
||||
{
|
||||
"id": "tc_abc",
|
||||
"type": "function",
|
||||
"function": {"name": "search", "arguments": '{"q":"y"}'},
|
||||
}
|
||||
],
|
||||
}
|
||||
tool_result = {
|
||||
"role": "tool",
|
||||
"tool_call_id": "tc_abc",
|
||||
"content": "search result",
|
||||
}
|
||||
|
||||
compress_result = CompressResult(
|
||||
messages=[summary, assistant_with_tc, tool_result],
|
||||
token_count=100,
|
||||
was_compacted=True,
|
||||
original_token_count=5000,
|
||||
messages_summarized=10,
|
||||
messages_dropped=0,
|
||||
)
|
||||
|
||||
# Input: messages that should be compressed.
|
||||
input_messages = [
|
||||
ChatMessage(role="user", content="q1"),
|
||||
ChatMessage(
|
||||
role="assistant",
|
||||
content="calling tool",
|
||||
tool_calls=[
|
||||
{
|
||||
"id": "tc_abc",
|
||||
"type": "function",
|
||||
"function": {
|
||||
"name": "search",
|
||||
"arguments": '{"q":"y"}',
|
||||
},
|
||||
}
|
||||
],
|
||||
),
|
||||
ChatMessage(
|
||||
role="tool",
|
||||
tool_call_id="tc_abc",
|
||||
content="search result",
|
||||
),
|
||||
]
|
||||
|
||||
with patch(
|
||||
"backend.copilot.baseline.service.compress_context",
|
||||
new=AsyncMock(return_value=compress_result),
|
||||
):
|
||||
compressed = await _compress_session_messages(
|
||||
input_messages, model="openrouter/anthropic/claude-opus-4"
|
||||
)
|
||||
|
||||
# Summary, assistant(tool_calls), tool(tool_call_id).
|
||||
assert len(compressed) == 3
|
||||
# Assistant message must keep its tool_calls intact.
|
||||
assistant_msg = compressed[1]
|
||||
assert assistant_msg.role == "assistant"
|
||||
assert assistant_msg.tool_calls is not None
|
||||
assert len(assistant_msg.tool_calls) == 1
|
||||
assert assistant_msg.tool_calls[0]["id"] == "tc_abc"
|
||||
assert assistant_msg.tool_calls[0]["function"]["name"] == "search"
|
||||
# Tool-role message must keep tool_call_id for OpenAI linkage.
|
||||
tool_msg = compressed[2]
|
||||
assert tool_msg.role == "tool"
|
||||
assert tool_msg.tool_call_id == "tc_abc"
|
||||
assert tool_msg.content == "search result"
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_uncompressed_passthrough_keeps_fields(self):
|
||||
"""When compression is a no-op (was_compacted=False), the original
|
||||
messages must be returned unchanged — including tool_calls."""
|
||||
input_messages = [
|
||||
ChatMessage(
|
||||
role="assistant",
|
||||
content="c",
|
||||
tool_calls=[
|
||||
{
|
||||
"id": "t1",
|
||||
"type": "function",
|
||||
"function": {"name": "f", "arguments": "{}"},
|
||||
}
|
||||
],
|
||||
),
|
||||
ChatMessage(role="tool", tool_call_id="t1", content="ok"),
|
||||
]
|
||||
|
||||
noop_result = CompressResult(
|
||||
messages=[], # ignored when was_compacted=False
|
||||
token_count=10,
|
||||
was_compacted=False,
|
||||
)
|
||||
|
||||
with patch(
|
||||
"backend.copilot.baseline.service.compress_context",
|
||||
new=AsyncMock(return_value=noop_result),
|
||||
):
|
||||
out = await _compress_session_messages(
|
||||
input_messages, model="openrouter/anthropic/claude-opus-4"
|
||||
)
|
||||
|
||||
assert out is input_messages # same list returned
|
||||
assert out[0].tool_calls is not None
|
||||
assert out[0].tool_calls[0]["id"] == "t1"
|
||||
assert out[1].tool_call_id == "t1"
|
||||
|
||||
|
||||
# ---- _ThinkingStripper tests ---- #
|
||||
|
||||
|
||||
def test_thinking_stripper_basic_thinking_tag() -> None:
|
||||
"""<thinking>...</thinking> blocks are fully stripped."""
|
||||
s = _ThinkingStripper()
|
||||
assert s.process("<thinking>internal reasoning here</thinking>Hello!") == "Hello!"
|
||||
|
||||
|
||||
def test_thinking_stripper_internal_reasoning_tag() -> None:
|
||||
"""<internal_reasoning>...</internal_reasoning> blocks (Gemini) are stripped."""
|
||||
s = _ThinkingStripper()
|
||||
assert (
|
||||
s.process("<internal_reasoning>step by step</internal_reasoning>Answer")
|
||||
== "Answer"
|
||||
)
|
||||
|
||||
|
||||
def test_thinking_stripper_split_across_chunks() -> None:
|
||||
"""Tags split across multiple chunks are handled correctly."""
|
||||
s = _ThinkingStripper()
|
||||
out = s.process("Hello <thin")
|
||||
out += s.process("king>secret</thinking> world")
|
||||
assert out == "Hello world"
|
||||
|
||||
|
||||
def test_thinking_stripper_plain_text_preserved() -> None:
|
||||
"""Plain text with the word 'thinking' is not stripped."""
|
||||
s = _ThinkingStripper()
|
||||
assert (
|
||||
s.process("I am thinking about this problem")
|
||||
== "I am thinking about this problem"
|
||||
)
|
||||
|
||||
|
||||
def test_thinking_stripper_multiple_blocks() -> None:
|
||||
"""Multiple reasoning blocks in one stream are all stripped."""
|
||||
s = _ThinkingStripper()
|
||||
result = s.process(
|
||||
"A<thinking>x</thinking>B<internal_reasoning>y</internal_reasoning>C"
|
||||
)
|
||||
assert result == "ABC"
|
||||
|
||||
|
||||
def test_thinking_stripper_flush_discards_unclosed() -> None:
|
||||
"""Unclosed reasoning block is discarded on flush."""
|
||||
s = _ThinkingStripper()
|
||||
s.process("Start<thinking>never closed")
|
||||
flushed = s.flush()
|
||||
assert "never closed" not in flushed
|
||||
|
||||
|
||||
def test_thinking_stripper_empty_block() -> None:
|
||||
"""Empty reasoning blocks are handled gracefully."""
|
||||
s = _ThinkingStripper()
|
||||
assert s.process("Before<thinking></thinking>After") == "BeforeAfter"
|
||||
|
||||
|
||||
# ---- _filter_tools_by_permissions tests ---- #
|
||||
|
||||
|
||||
def _make_tool(name: str) -> ChatCompletionToolParam:
|
||||
"""Build a minimal OpenAI ChatCompletionToolParam."""
|
||||
return ChatCompletionToolParam(
|
||||
type="function",
|
||||
function={"name": name, "parameters": {}},
|
||||
)
|
||||
|
||||
|
||||
class TestFilterToolsByPermissions:
|
||||
"""Tests for _filter_tools_by_permissions."""
|
||||
|
||||
@patch(
|
||||
"backend.copilot.permissions.all_known_tool_names",
|
||||
return_value=frozenset({"run_block", "web_fetch", "bash_exec"}),
|
||||
)
|
||||
def test_empty_permissions_returns_all(self, _mock_names):
|
||||
"""Empty permissions (no filtering) returns every tool unchanged."""
|
||||
from backend.copilot.baseline.service import _filter_tools_by_permissions
|
||||
from backend.copilot.permissions import CopilotPermissions
|
||||
|
||||
tools = [_make_tool("run_block"), _make_tool("web_fetch")]
|
||||
perms = CopilotPermissions()
|
||||
result = _filter_tools_by_permissions(tools, perms)
|
||||
assert result == tools
|
||||
|
||||
@patch(
|
||||
"backend.copilot.permissions.all_known_tool_names",
|
||||
return_value=frozenset({"run_block", "web_fetch", "bash_exec"}),
|
||||
)
|
||||
def test_allowlist_keeps_only_matching(self, _mock_names):
|
||||
"""Explicit allowlist (tools_exclude=False) keeps only listed tools."""
|
||||
from backend.copilot.baseline.service import _filter_tools_by_permissions
|
||||
from backend.copilot.permissions import CopilotPermissions
|
||||
|
||||
tools = [
|
||||
_make_tool("run_block"),
|
||||
_make_tool("web_fetch"),
|
||||
_make_tool("bash_exec"),
|
||||
]
|
||||
perms = CopilotPermissions(tools=["web_fetch"], tools_exclude=False)
|
||||
result = _filter_tools_by_permissions(tools, perms)
|
||||
assert len(result) == 1
|
||||
assert result[0]["function"]["name"] == "web_fetch"
|
||||
|
||||
@patch(
|
||||
"backend.copilot.permissions.all_known_tool_names",
|
||||
return_value=frozenset({"run_block", "web_fetch", "bash_exec"}),
|
||||
)
|
||||
def test_blacklist_excludes_listed(self, _mock_names):
|
||||
"""Blacklist (tools_exclude=True) removes only the listed tools."""
|
||||
from backend.copilot.baseline.service import _filter_tools_by_permissions
|
||||
from backend.copilot.permissions import CopilotPermissions
|
||||
|
||||
tools = [
|
||||
_make_tool("run_block"),
|
||||
_make_tool("web_fetch"),
|
||||
_make_tool("bash_exec"),
|
||||
]
|
||||
perms = CopilotPermissions(tools=["bash_exec"], tools_exclude=True)
|
||||
result = _filter_tools_by_permissions(tools, perms)
|
||||
names = [t["function"]["name"] for t in result]
|
||||
assert "bash_exec" not in names
|
||||
assert "run_block" in names
|
||||
assert "web_fetch" in names
|
||||
assert len(result) == 2
|
||||
|
||||
@patch(
|
||||
"backend.copilot.permissions.all_known_tool_names",
|
||||
return_value=frozenset({"run_block", "web_fetch", "bash_exec"}),
|
||||
)
|
||||
def test_unknown_tool_name_filtered_out(self, _mock_names):
|
||||
"""A tool whose name is not in all_known_tool_names is dropped."""
|
||||
from backend.copilot.baseline.service import _filter_tools_by_permissions
|
||||
from backend.copilot.permissions import CopilotPermissions
|
||||
|
||||
tools = [_make_tool("run_block"), _make_tool("unknown_tool")]
|
||||
perms = CopilotPermissions(tools=["run_block"], tools_exclude=False)
|
||||
result = _filter_tools_by_permissions(tools, perms)
|
||||
names = [t["function"]["name"] for t in result]
|
||||
assert "unknown_tool" not in names
|
||||
assert names == ["run_block"]
|
||||
|
||||
|
||||
# ---- _prepare_baseline_attachments tests ---- #
|
||||
|
||||
|
||||
class TestPrepareBaselineAttachments:
|
||||
"""Tests for _prepare_baseline_attachments."""
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_empty_file_ids(self):
|
||||
"""Empty file_ids returns empty hint and blocks."""
|
||||
from backend.copilot.baseline.service import _prepare_baseline_attachments
|
||||
|
||||
hint, blocks = await _prepare_baseline_attachments([], "user1", "sess1", "/tmp")
|
||||
assert hint == ""
|
||||
assert blocks == []
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_empty_user_id(self):
|
||||
"""Empty user_id returns empty hint and blocks."""
|
||||
from backend.copilot.baseline.service import _prepare_baseline_attachments
|
||||
|
||||
hint, blocks = await _prepare_baseline_attachments(
|
||||
["file1"], "", "sess1", "/tmp"
|
||||
)
|
||||
assert hint == ""
|
||||
assert blocks == []
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_image_file_returns_vision_blocks(self):
|
||||
"""A PNG image within size limits is returned as a base64 vision block."""
|
||||
from backend.copilot.baseline.service import _prepare_baseline_attachments
|
||||
|
||||
fake_info = AsyncMock()
|
||||
fake_info.name = "photo.png"
|
||||
fake_info.mime_type = "image/png"
|
||||
fake_info.size_bytes = 1024
|
||||
|
||||
fake_manager = AsyncMock()
|
||||
fake_manager.get_file_info = AsyncMock(return_value=fake_info)
|
||||
fake_manager.read_file_by_id = AsyncMock(return_value=b"\x89PNG_FAKE_DATA")
|
||||
|
||||
with patch(
|
||||
"backend.copilot.baseline.service.get_workspace_manager",
|
||||
new=AsyncMock(return_value=fake_manager),
|
||||
):
|
||||
hint, blocks = await _prepare_baseline_attachments(
|
||||
["fid1"], "user1", "sess1", "/tmp/workdir"
|
||||
)
|
||||
|
||||
assert len(blocks) == 1
|
||||
assert blocks[0]["type"] == "image"
|
||||
assert blocks[0]["source"]["media_type"] == "image/png"
|
||||
assert blocks[0]["source"]["type"] == "base64"
|
||||
assert "photo.png" in hint
|
||||
assert "embedded as image" in hint
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_non_image_file_saved_to_working_dir(self, tmp_path):
|
||||
"""A non-image file is written to working_dir."""
|
||||
from backend.copilot.baseline.service import _prepare_baseline_attachments
|
||||
|
||||
fake_info = AsyncMock()
|
||||
fake_info.name = "data.csv"
|
||||
fake_info.mime_type = "text/csv"
|
||||
fake_info.size_bytes = 42
|
||||
|
||||
fake_manager = AsyncMock()
|
||||
fake_manager.get_file_info = AsyncMock(return_value=fake_info)
|
||||
fake_manager.read_file_by_id = AsyncMock(return_value=b"col1,col2\na,b")
|
||||
|
||||
with patch(
|
||||
"backend.copilot.baseline.service.get_workspace_manager",
|
||||
new=AsyncMock(return_value=fake_manager),
|
||||
):
|
||||
hint, blocks = await _prepare_baseline_attachments(
|
||||
["fid1"], "user1", "sess1", str(tmp_path)
|
||||
)
|
||||
|
||||
assert blocks == []
|
||||
assert "data.csv" in hint
|
||||
assert "saved to" in hint
|
||||
saved = tmp_path / "data.csv"
|
||||
assert saved.exists()
|
||||
assert saved.read_bytes() == b"col1,col2\na,b"
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_file_not_found_skipped(self):
|
||||
"""When get_file_info returns None the file is silently skipped."""
|
||||
from backend.copilot.baseline.service import _prepare_baseline_attachments
|
||||
|
||||
fake_manager = AsyncMock()
|
||||
fake_manager.get_file_info = AsyncMock(return_value=None)
|
||||
|
||||
with patch(
|
||||
"backend.copilot.baseline.service.get_workspace_manager",
|
||||
new=AsyncMock(return_value=fake_manager),
|
||||
):
|
||||
hint, blocks = await _prepare_baseline_attachments(
|
||||
["missing_id"], "user1", "sess1", "/tmp"
|
||||
)
|
||||
|
||||
assert hint == ""
|
||||
assert blocks == []
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_workspace_manager_error(self):
|
||||
"""When get_workspace_manager raises, returns empty results."""
|
||||
from backend.copilot.baseline.service import _prepare_baseline_attachments
|
||||
|
||||
with patch(
|
||||
"backend.copilot.baseline.service.get_workspace_manager",
|
||||
new=AsyncMock(side_effect=RuntimeError("connection failed")),
|
||||
):
|
||||
hint, blocks = await _prepare_baseline_attachments(
|
||||
["fid1"], "user1", "sess1", "/tmp"
|
||||
)
|
||||
|
||||
assert hint == ""
|
||||
assert blocks == []
|
||||
|
||||
|
||||
class TestBaselineCostExtraction:
|
||||
"""Tests for x-total-cost header extraction in _baseline_llm_caller."""
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_cost_usd_extracted_from_response_header(self):
|
||||
"""state.cost_usd is set from x-total-cost header when present."""
|
||||
from backend.copilot.baseline.service import (
|
||||
_baseline_llm_caller,
|
||||
_BaselineStreamState,
|
||||
)
|
||||
|
||||
state = _BaselineStreamState(model="gpt-4o-mini")
|
||||
|
||||
# Build a mock raw httpx response with the cost header
|
||||
mock_raw_response = MagicMock()
|
||||
mock_raw_response.headers = {"x-total-cost": "0.0123"}
|
||||
|
||||
# Build a mock async streaming response that yields no chunks but has
|
||||
# a _response attribute pointing to the mock httpx response
|
||||
mock_stream_response = MagicMock()
|
||||
mock_stream_response._response = mock_raw_response
|
||||
|
||||
async def empty_aiter():
|
||||
return
|
||||
yield # make it an async generator
|
||||
|
||||
mock_stream_response.__aiter__ = lambda self: empty_aiter()
|
||||
|
||||
mock_client = MagicMock()
|
||||
mock_client.chat.completions.create = AsyncMock(
|
||||
return_value=mock_stream_response
|
||||
)
|
||||
|
||||
with patch(
|
||||
"backend.copilot.baseline.service._get_openai_client",
|
||||
return_value=mock_client,
|
||||
):
|
||||
await _baseline_llm_caller(
|
||||
messages=[{"role": "user", "content": "hi"}],
|
||||
tools=[],
|
||||
state=state,
|
||||
)
|
||||
|
||||
assert state.cost_usd == pytest.approx(0.0123)
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_cost_usd_accumulates_across_calls(self):
|
||||
"""cost_usd accumulates when _baseline_llm_caller is called multiple times."""
|
||||
from backend.copilot.baseline.service import (
|
||||
_baseline_llm_caller,
|
||||
_BaselineStreamState,
|
||||
)
|
||||
|
||||
state = _BaselineStreamState(model="gpt-4o-mini")
|
||||
|
||||
def make_stream_mock(cost: str) -> MagicMock:
|
||||
mock_raw = MagicMock()
|
||||
mock_raw.headers = {"x-total-cost": cost}
|
||||
mock_stream = MagicMock()
|
||||
mock_stream._response = mock_raw
|
||||
|
||||
async def empty_aiter():
|
||||
return
|
||||
yield
|
||||
|
||||
mock_stream.__aiter__ = lambda self: empty_aiter()
|
||||
return mock_stream
|
||||
|
||||
mock_client = MagicMock()
|
||||
mock_client.chat.completions.create = AsyncMock(
|
||||
side_effect=[make_stream_mock("0.01"), make_stream_mock("0.02")]
|
||||
)
|
||||
|
||||
with patch(
|
||||
"backend.copilot.baseline.service._get_openai_client",
|
||||
return_value=mock_client,
|
||||
):
|
||||
await _baseline_llm_caller(
|
||||
messages=[{"role": "user", "content": "first"}],
|
||||
tools=[],
|
||||
state=state,
|
||||
)
|
||||
await _baseline_llm_caller(
|
||||
messages=[{"role": "user", "content": "second"}],
|
||||
tools=[],
|
||||
state=state,
|
||||
)
|
||||
|
||||
assert state.cost_usd == pytest.approx(0.03)
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_no_cost_when_header_absent(self):
|
||||
"""state.cost_usd remains None when response has no x-total-cost header."""
|
||||
from backend.copilot.baseline.service import (
|
||||
_baseline_llm_caller,
|
||||
_BaselineStreamState,
|
||||
)
|
||||
|
||||
state = _BaselineStreamState(model="gpt-4o-mini")
|
||||
|
||||
mock_raw = MagicMock()
|
||||
mock_raw.headers = {}
|
||||
mock_stream = MagicMock()
|
||||
mock_stream._response = mock_raw
|
||||
|
||||
async def empty_aiter():
|
||||
return
|
||||
yield
|
||||
|
||||
mock_stream.__aiter__ = lambda self: empty_aiter()
|
||||
|
||||
mock_client = MagicMock()
|
||||
mock_client.chat.completions.create = AsyncMock(return_value=mock_stream)
|
||||
|
||||
with patch(
|
||||
"backend.copilot.baseline.service._get_openai_client",
|
||||
return_value=mock_client,
|
||||
):
|
||||
await _baseline_llm_caller(
|
||||
messages=[{"role": "user", "content": "hi"}],
|
||||
tools=[],
|
||||
state=state,
|
||||
)
|
||||
|
||||
assert state.cost_usd is None
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_cost_extracted_even_when_stream_raises(self):
|
||||
"""cost_usd is captured in the finally block even when streaming fails."""
|
||||
from backend.copilot.baseline.service import (
|
||||
_baseline_llm_caller,
|
||||
_BaselineStreamState,
|
||||
)
|
||||
|
||||
state = _BaselineStreamState(model="gpt-4o-mini")
|
||||
|
||||
mock_raw = MagicMock()
|
||||
mock_raw.headers = {"x-total-cost": "0.005"}
|
||||
mock_stream = MagicMock()
|
||||
mock_stream._response = mock_raw
|
||||
|
||||
async def failing_aiter():
|
||||
raise RuntimeError("stream error")
|
||||
yield # make it an async generator
|
||||
|
||||
mock_stream.__aiter__ = lambda self: failing_aiter()
|
||||
|
||||
mock_client = MagicMock()
|
||||
mock_client.chat.completions.create = AsyncMock(return_value=mock_stream)
|
||||
|
||||
with (
|
||||
patch(
|
||||
"backend.copilot.baseline.service._get_openai_client",
|
||||
return_value=mock_client,
|
||||
),
|
||||
pytest.raises(RuntimeError, match="stream error"),
|
||||
):
|
||||
await _baseline_llm_caller(
|
||||
messages=[{"role": "user", "content": "hi"}],
|
||||
tools=[],
|
||||
state=state,
|
||||
)
|
||||
|
||||
assert state.cost_usd == pytest.approx(0.005)
|
||||
@@ -0,0 +1,667 @@
|
||||
"""Integration tests for baseline transcript flow.
|
||||
|
||||
Exercises the real helpers in ``baseline/service.py`` that download,
|
||||
validate, load, append to, backfill, and upload the transcript.
|
||||
Storage is mocked via ``download_transcript`` / ``upload_transcript``
|
||||
patches; no network access is required.
|
||||
"""
|
||||
|
||||
import json as stdlib_json
|
||||
from unittest.mock import AsyncMock, patch
|
||||
|
||||
import pytest
|
||||
|
||||
from backend.copilot.baseline.service import (
|
||||
_load_prior_transcript,
|
||||
_record_turn_to_transcript,
|
||||
_resolve_baseline_model,
|
||||
_upload_final_transcript,
|
||||
is_transcript_stale,
|
||||
should_upload_transcript,
|
||||
)
|
||||
from backend.copilot.service import config
|
||||
from backend.copilot.transcript import (
|
||||
STOP_REASON_END_TURN,
|
||||
STOP_REASON_TOOL_USE,
|
||||
TranscriptDownload,
|
||||
)
|
||||
from backend.copilot.transcript_builder import TranscriptBuilder
|
||||
from backend.util.tool_call_loop import LLMLoopResponse, LLMToolCall, ToolCallResult
|
||||
|
||||
|
||||
def _make_transcript_content(*roles: str) -> str:
|
||||
"""Build a minimal valid JSONL transcript from role names."""
|
||||
lines = []
|
||||
parent = ""
|
||||
for i, role in enumerate(roles):
|
||||
uid = f"uuid-{i}"
|
||||
entry: dict = {
|
||||
"type": role,
|
||||
"uuid": uid,
|
||||
"parentUuid": parent,
|
||||
"message": {
|
||||
"role": role,
|
||||
"content": [{"type": "text", "text": f"{role} message {i}"}],
|
||||
},
|
||||
}
|
||||
if role == "assistant":
|
||||
entry["message"]["id"] = f"msg_{i}"
|
||||
entry["message"]["model"] = "test-model"
|
||||
entry["message"]["type"] = "message"
|
||||
entry["message"]["stop_reason"] = STOP_REASON_END_TURN
|
||||
lines.append(stdlib_json.dumps(entry))
|
||||
parent = uid
|
||||
return "\n".join(lines) + "\n"
|
||||
|
||||
|
||||
class TestResolveBaselineModel:
|
||||
"""Model selection honours the per-request mode."""
|
||||
|
||||
def test_fast_mode_selects_fast_model(self):
|
||||
assert _resolve_baseline_model("fast") == config.fast_model
|
||||
|
||||
def test_extended_thinking_selects_default_model(self):
|
||||
assert _resolve_baseline_model("extended_thinking") == config.model
|
||||
|
||||
def test_none_mode_selects_default_model(self):
|
||||
"""Critical: baseline users without a mode MUST keep the default (opus)."""
|
||||
assert _resolve_baseline_model(None) == config.model
|
||||
|
||||
def test_default_and_fast_models_differ(self):
|
||||
"""Sanity: the two tiers are actually distinct in production config."""
|
||||
assert config.model != config.fast_model
|
||||
|
||||
|
||||
class TestLoadPriorTranscript:
|
||||
"""``_load_prior_transcript`` wraps the download + validate + load flow."""
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_loads_fresh_transcript(self):
|
||||
builder = TranscriptBuilder()
|
||||
content = _make_transcript_content("user", "assistant")
|
||||
download = TranscriptDownload(content=content, message_count=2)
|
||||
|
||||
with patch(
|
||||
"backend.copilot.baseline.service.download_transcript",
|
||||
new=AsyncMock(return_value=download),
|
||||
):
|
||||
covers = await _load_prior_transcript(
|
||||
user_id="user-1",
|
||||
session_id="session-1",
|
||||
session_msg_count=3,
|
||||
transcript_builder=builder,
|
||||
)
|
||||
|
||||
assert covers is True
|
||||
assert builder.entry_count == 2
|
||||
assert builder.last_entry_type == "assistant"
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_rejects_stale_transcript(self):
|
||||
"""msg_count strictly less than session-1 is treated as stale."""
|
||||
builder = TranscriptBuilder()
|
||||
content = _make_transcript_content("user", "assistant")
|
||||
# session has 6 messages, transcript only covers 2 → stale.
|
||||
download = TranscriptDownload(content=content, message_count=2)
|
||||
|
||||
with patch(
|
||||
"backend.copilot.baseline.service.download_transcript",
|
||||
new=AsyncMock(return_value=download),
|
||||
):
|
||||
covers = await _load_prior_transcript(
|
||||
user_id="user-1",
|
||||
session_id="session-1",
|
||||
session_msg_count=6,
|
||||
transcript_builder=builder,
|
||||
)
|
||||
|
||||
assert covers is False
|
||||
assert builder.is_empty
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_missing_transcript_returns_false(self):
|
||||
builder = TranscriptBuilder()
|
||||
with patch(
|
||||
"backend.copilot.baseline.service.download_transcript",
|
||||
new=AsyncMock(return_value=None),
|
||||
):
|
||||
covers = await _load_prior_transcript(
|
||||
user_id="user-1",
|
||||
session_id="session-1",
|
||||
session_msg_count=2,
|
||||
transcript_builder=builder,
|
||||
)
|
||||
|
||||
assert covers is False
|
||||
assert builder.is_empty
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_invalid_transcript_returns_false(self):
|
||||
builder = TranscriptBuilder()
|
||||
download = TranscriptDownload(
|
||||
content='{"type":"progress","uuid":"a"}\n',
|
||||
message_count=1,
|
||||
)
|
||||
with patch(
|
||||
"backend.copilot.baseline.service.download_transcript",
|
||||
new=AsyncMock(return_value=download),
|
||||
):
|
||||
covers = await _load_prior_transcript(
|
||||
user_id="user-1",
|
||||
session_id="session-1",
|
||||
session_msg_count=2,
|
||||
transcript_builder=builder,
|
||||
)
|
||||
|
||||
assert covers is False
|
||||
assert builder.is_empty
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_download_exception_returns_false(self):
|
||||
builder = TranscriptBuilder()
|
||||
with patch(
|
||||
"backend.copilot.baseline.service.download_transcript",
|
||||
new=AsyncMock(side_effect=RuntimeError("boom")),
|
||||
):
|
||||
covers = await _load_prior_transcript(
|
||||
user_id="user-1",
|
||||
session_id="session-1",
|
||||
session_msg_count=2,
|
||||
transcript_builder=builder,
|
||||
)
|
||||
|
||||
assert covers is False
|
||||
assert builder.is_empty
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_zero_message_count_not_stale(self):
|
||||
"""When msg_count is 0 (unknown), staleness check is skipped."""
|
||||
builder = TranscriptBuilder()
|
||||
download = TranscriptDownload(
|
||||
content=_make_transcript_content("user", "assistant"),
|
||||
message_count=0,
|
||||
)
|
||||
with patch(
|
||||
"backend.copilot.baseline.service.download_transcript",
|
||||
new=AsyncMock(return_value=download),
|
||||
):
|
||||
covers = await _load_prior_transcript(
|
||||
user_id="user-1",
|
||||
session_id="session-1",
|
||||
session_msg_count=20,
|
||||
transcript_builder=builder,
|
||||
)
|
||||
|
||||
assert covers is True
|
||||
assert builder.entry_count == 2
|
||||
|
||||
|
||||
class TestUploadFinalTranscript:
|
||||
"""``_upload_final_transcript`` serialises and calls storage."""
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_uploads_valid_transcript(self):
|
||||
builder = TranscriptBuilder()
|
||||
builder.append_user(content="hi")
|
||||
builder.append_assistant(
|
||||
content_blocks=[{"type": "text", "text": "hello"}],
|
||||
model="test-model",
|
||||
stop_reason=STOP_REASON_END_TURN,
|
||||
)
|
||||
|
||||
upload_mock = AsyncMock(return_value=None)
|
||||
with patch(
|
||||
"backend.copilot.baseline.service.upload_transcript",
|
||||
new=upload_mock,
|
||||
):
|
||||
await _upload_final_transcript(
|
||||
user_id="user-1",
|
||||
session_id="session-1",
|
||||
transcript_builder=builder,
|
||||
session_msg_count=2,
|
||||
)
|
||||
|
||||
upload_mock.assert_awaited_once()
|
||||
assert upload_mock.await_args is not None
|
||||
call_kwargs = upload_mock.await_args.kwargs
|
||||
assert call_kwargs["user_id"] == "user-1"
|
||||
assert call_kwargs["session_id"] == "session-1"
|
||||
assert call_kwargs["message_count"] == 2
|
||||
assert "hello" in call_kwargs["content"]
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_skips_upload_when_builder_empty(self):
|
||||
builder = TranscriptBuilder()
|
||||
upload_mock = AsyncMock(return_value=None)
|
||||
with patch(
|
||||
"backend.copilot.baseline.service.upload_transcript",
|
||||
new=upload_mock,
|
||||
):
|
||||
await _upload_final_transcript(
|
||||
user_id="user-1",
|
||||
session_id="session-1",
|
||||
transcript_builder=builder,
|
||||
session_msg_count=0,
|
||||
)
|
||||
|
||||
upload_mock.assert_not_awaited()
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_swallows_upload_exceptions(self):
|
||||
"""Upload failures should not propagate (flow continues for the user)."""
|
||||
builder = TranscriptBuilder()
|
||||
builder.append_user(content="hi")
|
||||
builder.append_assistant(
|
||||
content_blocks=[{"type": "text", "text": "hello"}],
|
||||
model="test-model",
|
||||
stop_reason=STOP_REASON_END_TURN,
|
||||
)
|
||||
|
||||
with patch(
|
||||
"backend.copilot.baseline.service.upload_transcript",
|
||||
new=AsyncMock(side_effect=RuntimeError("storage unavailable")),
|
||||
):
|
||||
# Should not raise.
|
||||
await _upload_final_transcript(
|
||||
user_id="user-1",
|
||||
session_id="session-1",
|
||||
transcript_builder=builder,
|
||||
session_msg_count=2,
|
||||
)
|
||||
|
||||
|
||||
class TestRecordTurnToTranscript:
|
||||
"""``_record_turn_to_transcript`` translates LLMLoopResponse → transcript."""
|
||||
|
||||
def test_records_final_assistant_text(self):
|
||||
builder = TranscriptBuilder()
|
||||
builder.append_user(content="hi")
|
||||
|
||||
response = LLMLoopResponse(
|
||||
response_text="hello there",
|
||||
tool_calls=[],
|
||||
raw_response=None,
|
||||
)
|
||||
_record_turn_to_transcript(
|
||||
response,
|
||||
tool_results=None,
|
||||
transcript_builder=builder,
|
||||
model="test-model",
|
||||
)
|
||||
|
||||
assert builder.entry_count == 2
|
||||
assert builder.last_entry_type == "assistant"
|
||||
jsonl = builder.to_jsonl()
|
||||
assert "hello there" in jsonl
|
||||
assert STOP_REASON_END_TURN in jsonl
|
||||
|
||||
def test_records_tool_use_then_tool_result(self):
|
||||
"""Anthropic ordering: assistant(tool_use) → user(tool_result)."""
|
||||
builder = TranscriptBuilder()
|
||||
builder.append_user(content="use a tool")
|
||||
|
||||
response = LLMLoopResponse(
|
||||
response_text=None,
|
||||
tool_calls=[
|
||||
LLMToolCall(id="call-1", name="echo", arguments='{"text":"hi"}')
|
||||
],
|
||||
raw_response=None,
|
||||
)
|
||||
tool_results = [
|
||||
ToolCallResult(tool_call_id="call-1", tool_name="echo", content="hi")
|
||||
]
|
||||
_record_turn_to_transcript(
|
||||
response,
|
||||
tool_results,
|
||||
transcript_builder=builder,
|
||||
model="test-model",
|
||||
)
|
||||
|
||||
# user, assistant(tool_use), user(tool_result) = 3 entries
|
||||
assert builder.entry_count == 3
|
||||
jsonl = builder.to_jsonl()
|
||||
assert STOP_REASON_TOOL_USE in jsonl
|
||||
assert "tool_use" in jsonl
|
||||
assert "tool_result" in jsonl
|
||||
assert "call-1" in jsonl
|
||||
|
||||
def test_records_nothing_on_empty_response(self):
|
||||
builder = TranscriptBuilder()
|
||||
builder.append_user(content="hi")
|
||||
|
||||
response = LLMLoopResponse(
|
||||
response_text=None,
|
||||
tool_calls=[],
|
||||
raw_response=None,
|
||||
)
|
||||
_record_turn_to_transcript(
|
||||
response,
|
||||
tool_results=None,
|
||||
transcript_builder=builder,
|
||||
model="test-model",
|
||||
)
|
||||
|
||||
assert builder.entry_count == 1
|
||||
|
||||
def test_malformed_tool_args_dont_crash(self):
|
||||
"""Bad JSON in tool arguments falls back to {} without raising."""
|
||||
builder = TranscriptBuilder()
|
||||
builder.append_user(content="hi")
|
||||
|
||||
response = LLMLoopResponse(
|
||||
response_text=None,
|
||||
tool_calls=[LLMToolCall(id="call-1", name="echo", arguments="{not-json")],
|
||||
raw_response=None,
|
||||
)
|
||||
tool_results = [
|
||||
ToolCallResult(tool_call_id="call-1", tool_name="echo", content="ok")
|
||||
]
|
||||
_record_turn_to_transcript(
|
||||
response,
|
||||
tool_results,
|
||||
transcript_builder=builder,
|
||||
model="test-model",
|
||||
)
|
||||
|
||||
assert builder.entry_count == 3
|
||||
jsonl = builder.to_jsonl()
|
||||
assert '"input":{}' in jsonl
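The malformed-arguments test above implies that tool arguments are parsed defensively and serialized compactly. A minimal sketch of that behaviour follows; `parse_tool_arguments` is a hypothetical helper name, not a function confirmed by this diff, and the real service may handle the fallback differently.

```python
import json
from typing import Any


def parse_tool_arguments(raw: str) -> dict[str, Any]:
    """Hypothetical helper: decode tool-call arguments, falling back to {}."""
    try:
        parsed = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        # Bad JSON must not abort transcript recording.
        return {}
    # Assumption: only JSON objects are meaningful as tool input.
    return parsed if isinstance(parsed, dict) else {}


def tool_use_block(call_id: str, name: str, raw_arguments: str) -> str:
    """Serialize a tool_use block compactly, as the '"input":{}' assertion suggests."""
    block = {
        "type": "tool_use",
        "id": call_id,
        "name": name,
        "input": parse_tool_arguments(raw_arguments),
    }
    return json.dumps(block, separators=(",", ":"))
```

With compact separators, a malformed payload such as `{not-json` yields a block whose serialized form contains `"input":{}`.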
|
||||
|
||||
|
||||
class TestRoundTrip:
|
||||
"""End-to-end: load prior → append new turn → upload."""
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_full_round_trip(self):
|
||||
prior = _make_transcript_content("user", "assistant")
|
||||
download = TranscriptDownload(content=prior, message_count=2)
|
||||
|
||||
builder = TranscriptBuilder()
|
||||
with patch(
|
||||
"backend.copilot.baseline.service.download_transcript",
|
||||
new=AsyncMock(return_value=download),
|
||||
):
|
||||
covers = await _load_prior_transcript(
|
||||
user_id="user-1",
|
||||
session_id="session-1",
|
||||
session_msg_count=3,
|
||||
transcript_builder=builder,
|
||||
)
|
||||
assert covers is True
|
||||
assert builder.entry_count == 2
|
||||
|
||||
# New user turn.
|
||||
builder.append_user(content="new question")
|
||||
assert builder.entry_count == 3
|
||||
|
||||
# New assistant turn.
|
||||
response = LLMLoopResponse(
|
||||
response_text="new answer",
|
||||
tool_calls=[],
|
||||
raw_response=None,
|
||||
)
|
||||
_record_turn_to_transcript(
|
||||
response,
|
||||
tool_results=None,
|
||||
transcript_builder=builder,
|
||||
model="test-model",
|
||||
)
|
||||
assert builder.entry_count == 4
|
||||
|
||||
# Upload.
|
||||
upload_mock = AsyncMock(return_value=None)
|
||||
with patch(
|
||||
"backend.copilot.baseline.service.upload_transcript",
|
||||
new=upload_mock,
|
||||
):
|
||||
await _upload_final_transcript(
|
||||
user_id="user-1",
|
||||
session_id="session-1",
|
||||
transcript_builder=builder,
|
||||
session_msg_count=4,
|
||||
)
|
||||
|
||||
upload_mock.assert_awaited_once()
|
||||
assert upload_mock.await_args is not None
|
||||
uploaded = upload_mock.await_args.kwargs["content"]
|
||||
assert "new question" in uploaded
|
||||
assert "new answer" in uploaded
|
||||
# Original content preserved in the round trip.
|
||||
assert "user message 0" in uploaded
|
||||
assert "assistant message 1" in uploaded
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_backfill_append_guard(self):
|
||||
"""Backfill only runs when the last entry is not already assistant."""
|
||||
builder = TranscriptBuilder()
|
||||
builder.append_user(content="hi")
|
||||
|
||||
# Simulate the backfill guard from stream_chat_completion_baseline.
|
||||
assistant_text = "partial text before error"
|
||||
if builder.last_entry_type != "assistant":
|
||||
builder.append_assistant(
|
||||
content_blocks=[{"type": "text", "text": assistant_text}],
|
||||
model="test-model",
|
||||
stop_reason=STOP_REASON_END_TURN,
|
||||
)
|
||||
|
||||
assert builder.last_entry_type == "assistant"
|
||||
assert "partial text before error" in builder.to_jsonl()
|
||||
|
||||
# Second invocation: the guard must prevent double-append.
|
||||
initial_count = builder.entry_count
|
||||
if builder.last_entry_type != "assistant":
|
||||
builder.append_assistant(
|
||||
content_blocks=[{"type": "text", "text": "duplicate"}],
|
||||
model="test-model",
|
||||
stop_reason=STOP_REASON_END_TURN,
|
||||
)
|
||||
assert builder.entry_count == initial_count
|
||||
|
||||
|
||||
class TestIsTranscriptStale:
|
||||
"""``is_transcript_stale`` gates prior-transcript loading."""
|
||||
|
||||
def test_none_download_is_not_stale(self):
|
||||
assert is_transcript_stale(None, session_msg_count=5) is False
|
||||
|
||||
def test_zero_message_count_is_not_stale(self):
|
||||
"""Legacy transcripts without msg_count tracking must remain usable."""
|
||||
dl = TranscriptDownload(content="", message_count=0)
|
||||
assert is_transcript_stale(dl, session_msg_count=20) is False
|
||||
|
||||
def test_stale_when_covers_less_than_prefix(self):
|
||||
dl = TranscriptDownload(content="", message_count=2)
|
||||
# session has 6 messages; transcript must cover at least 5 (6-1).
|
||||
assert is_transcript_stale(dl, session_msg_count=6) is True
|
||||
|
||||
def test_fresh_when_covers_full_prefix(self):
|
||||
dl = TranscriptDownload(content="", message_count=5)
|
||||
assert is_transcript_stale(dl, session_msg_count=6) is False
|
||||
|
||||
def test_fresh_when_exceeds_prefix(self):
|
||||
"""Race: transcript ahead of session count is still acceptable."""
|
||||
dl = TranscriptDownload(content="", message_count=10)
|
||||
assert is_transcript_stale(dl, session_msg_count=6) is False
|
||||
|
||||
def test_boundary_equal_to_prefix_minus_one(self):
|
||||
dl = TranscriptDownload(content="", message_count=5)
|
||||
assert is_transcript_stale(dl, session_msg_count=6) is False
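The staleness rules exercised by these tests can be summarized in a few lines. The sketch below is reconstructed from the assertions only; the `TranscriptDownload` dataclass is a stand-in for the real type, and the actual implementation of `is_transcript_stale` may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TranscriptDownload:
    """Stand-in for the real download record (only the fields used in the tests)."""

    content: str
    message_count: int


def is_transcript_stale(
    download: Optional[TranscriptDownload], session_msg_count: int
) -> bool:
    """Sketch: a transcript is stale when it covers fewer than N-1 session messages."""
    if download is None:
        # Nothing was downloaded, so there is nothing to call stale.
        return False
    if download.message_count == 0:
        # Legacy transcripts without message-count tracking stay usable.
        return False
    # The newest session message is the turn being processed, so the stored
    # transcript only needs to cover the preceding session_msg_count - 1.
    return download.message_count < session_msg_count - 1
```

A transcript that is ahead of the session count (for example after a race) is treated as fresh, which matches `test_fresh_when_exceeds_prefix`.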
|
||||
|
||||
|
||||
class TestShouldUploadTranscript:
|
||||
"""``should_upload_transcript`` gates the final upload."""
|
||||
|
||||
def test_upload_allowed_for_user_with_coverage(self):
|
||||
assert should_upload_transcript("user-1", True) is True
|
||||
|
||||
def test_upload_skipped_when_no_user(self):
|
||||
assert should_upload_transcript(None, True) is False
|
||||
|
||||
def test_upload_skipped_when_empty_user(self):
|
||||
assert should_upload_transcript("", True) is False
|
||||
|
||||
def test_upload_skipped_without_coverage(self):
|
||||
"""Partial transcript must never clobber a more complete stored one."""
|
||||
assert should_upload_transcript("user-1", False) is False
|
||||
|
||||
def test_upload_skipped_when_no_user_and_no_coverage(self):
|
||||
assert should_upload_transcript(None, False) is False
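The five cases above form a complete truth table for the upload gate. A minimal sketch that satisfies them, assuming the gate depends on nothing beyond its two arguments (the real function may consult more state):

```python
from typing import Optional


def should_upload_transcript(
    user_id: Optional[str], transcript_covers_prefix: bool
) -> bool:
    """Sketch: upload only for a known user whose transcript covers the prefix."""
    # bool(user_id) rejects both None and the empty string.
    return bool(user_id) and transcript_covers_prefix
```

Requiring coverage prevents a partial transcript from overwriting a more complete stored one.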
|
||||
|
||||
|
||||
class TestTranscriptLifecycle:
|
||||
"""End-to-end: download → validate → build → upload.
|
||||
|
||||
Simulates the full transcript lifecycle inside
|
||||
``stream_chat_completion_baseline`` by mocking the storage layer and
|
||||
driving each step through the real helpers.
|
||||
"""
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_full_lifecycle_happy_path(self):
|
||||
"""Fresh download, append a turn, upload covers the session."""
|
||||
builder = TranscriptBuilder()
|
||||
prior = _make_transcript_content("user", "assistant")
|
||||
download = TranscriptDownload(content=prior, message_count=2)
|
||||
|
||||
upload_mock = AsyncMock(return_value=None)
|
||||
with (
|
||||
patch(
|
||||
"backend.copilot.baseline.service.download_transcript",
|
||||
new=AsyncMock(return_value=download),
|
||||
),
|
||||
patch(
|
||||
"backend.copilot.baseline.service.upload_transcript",
|
||||
new=upload_mock,
|
||||
),
|
||||
):
|
||||
# --- 1. Download & load prior transcript ---
|
||||
covers = await _load_prior_transcript(
|
||||
user_id="user-1",
|
||||
session_id="session-1",
|
||||
session_msg_count=3,
|
||||
transcript_builder=builder,
|
||||
)
|
||||
assert covers is True
|
||||
|
||||
# --- 2. Append a new user turn + a new assistant response ---
|
||||
builder.append_user(content="follow-up question")
|
||||
_record_turn_to_transcript(
|
||||
LLMLoopResponse(
|
||||
response_text="follow-up answer",
|
||||
tool_calls=[],
|
||||
raw_response=None,
|
||||
),
|
||||
tool_results=None,
|
||||
transcript_builder=builder,
|
||||
model="test-model",
|
||||
)
|
||||
|
||||
# --- 3. Gate + upload ---
|
||||
assert (
|
||||
should_upload_transcript(
|
||||
user_id="user-1", transcript_covers_prefix=covers
|
||||
)
|
||||
is True
|
||||
)
|
||||
await _upload_final_transcript(
|
||||
user_id="user-1",
|
||||
session_id="session-1",
|
||||
transcript_builder=builder,
|
||||
session_msg_count=4,
|
||||
)
|
||||
|
||||
upload_mock.assert_awaited_once()
|
||||
assert upload_mock.await_args is not None
|
||||
uploaded = upload_mock.await_args.kwargs["content"]
|
||||
assert "follow-up question" in uploaded
|
||||
assert "follow-up answer" in uploaded
|
||||
# Original prior-turn content preserved.
|
||||
assert "user message 0" in uploaded
|
||||
assert "assistant message 1" in uploaded
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_lifecycle_stale_download_suppresses_upload(self):
|
||||
"""Stale download → covers=False → upload must be skipped."""
|
||||
builder = TranscriptBuilder()
|
||||
# session has 10 msgs but stored transcript only covers 2 → stale.
|
||||
stale = TranscriptDownload(
|
||||
content=_make_transcript_content("user", "assistant"),
|
||||
message_count=2,
|
||||
)
|
||||
|
||||
upload_mock = AsyncMock(return_value=None)
|
||||
with (
|
||||
patch(
|
||||
"backend.copilot.baseline.service.download_transcript",
|
||||
new=AsyncMock(return_value=stale),
|
||||
),
|
||||
patch(
|
||||
"backend.copilot.baseline.service.upload_transcript",
|
||||
new=upload_mock,
|
||||
),
|
||||
):
|
||||
covers = await _load_prior_transcript(
|
||||
user_id="user-1",
|
||||
session_id="session-1",
|
||||
session_msg_count=10,
|
||||
transcript_builder=builder,
|
||||
)
|
||||
|
||||
assert covers is False
|
||||
# The caller's gate mirrors the production path.
|
||||
assert (
|
||||
should_upload_transcript(user_id="user-1", transcript_covers_prefix=covers)
|
||||
is False
|
||||
)
|
||||
upload_mock.assert_not_awaited()
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_lifecycle_anonymous_user_skips_upload(self):
|
||||
"""Anonymous (user_id=None) → upload gate must return False."""
|
||||
builder = TranscriptBuilder()
|
||||
builder.append_user(content="hi")
|
||||
builder.append_assistant(
|
||||
content_blocks=[{"type": "text", "text": "hello"}],
|
||||
model="test-model",
|
||||
stop_reason=STOP_REASON_END_TURN,
|
||||
)
|
||||
|
||||
assert (
|
||||
should_upload_transcript(user_id=None, transcript_covers_prefix=True)
|
||||
is False
|
||||
)
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_lifecycle_missing_download_still_uploads_new_content(self):
|
||||
"""No prior transcript → covers is False, so the upload is skipped."""
|
||||
builder = TranscriptBuilder()
|
||||
upload_mock = AsyncMock(return_value=None)
|
||||
with (
|
||||
patch(
|
||||
"backend.copilot.baseline.service.download_transcript",
|
||||
new=AsyncMock(return_value=None),
|
||||
),
|
||||
patch(
|
||||
"backend.copilot.baseline.service.upload_transcript",
|
||||
new=upload_mock,
|
||||
),
|
||||
):
|
||||
covers = await _load_prior_transcript(
|
||||
user_id="user-1",
|
||||
session_id="session-1",
|
||||
session_msg_count=1,
|
||||
transcript_builder=builder,
|
||||
)
|
||||
# No download: covers is False, so the production path would
|
||||
# skip upload. This protects against overwriting a future
|
||||
# more-complete transcript with a single-turn snapshot.
|
||||
assert covers is False
|
||||
assert (
|
||||
should_upload_transcript(
|
||||
user_id="user-1", transcript_covers_prefix=covers
|
||||
)
|
||||
is False
|
||||
)
|
||||
upload_mock.assert_not_awaited()
|
||||
@@ -8,13 +8,26 @@ from pydantic_settings import BaseSettings
|
||||
|
||||
from backend.util.clients import OPENROUTER_BASE_URL
|
||||
|
||||
# Per-request routing mode for a single chat turn.
|
||||
# - 'fast': route to the baseline OpenAI-compatible path with the cheaper model.
|
||||
# - 'extended_thinking': route to the Claude Agent SDK path with the default
|
||||
# (opus) model.
|
||||
# ``None`` means "no override"; the server falls back to the Claude Code
|
||||
# subscription flag → LaunchDarkly COPILOT_SDK → config.use_claude_agent_sdk.
|
||||
CopilotMode = Literal["fast", "extended_thinking"]
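The routing rule described in the comment can be illustrated as follows. This is a sketch under stated assumptions: `resolve_path` and `fallback_use_sdk` are hypothetical names, and the fallback chain (subscription flag, LaunchDarkly flag, config value) is collapsed into one boolean rather than reproduced.

```python
from typing import Literal, Optional

CopilotMode = Literal["fast", "extended_thinking"]


def resolve_path(mode: Optional[CopilotMode], fallback_use_sdk: bool) -> str:
    """Hypothetical resolver: return "baseline" or "sdk" for one chat turn."""
    if mode == "fast":
        # Explicit override: baseline OpenAI-compatible path.
        return "baseline"
    if mode == "extended_thinking":
        # Explicit override: Claude Agent SDK path.
        return "sdk"
    # mode is None: no per-request override, defer to server-side flags.
    return "sdk" if fallback_use_sdk else "baseline"
```

An explicit mode always wins over the server-side flags; the flags matter only when the request carries no override.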
|
||||
|
||||
|
||||
class ChatConfig(BaseSettings):
|
||||
"""Configuration for the chat system."""
|
||||
|
||||
# OpenAI API Configuration
|
||||
model: str = Field(
|
||||
default="anthropic/claude-opus-4.6", description="Default model to use"
|
||||
default="anthropic/claude-opus-4.6",
|
||||
description="Default model for extended thinking mode",
|
||||
)
|
||||
fast_model: str = Field(
|
||||
default="anthropic/claude-sonnet-4",
|
||||
description="Model for fast mode (baseline path). Should be faster/cheaper than the default model.",
|
||||
)
|
||||
title_model: str = Field(
|
||||
default="openai/gpt-4o-mini",
|
||||
@@ -81,11 +94,11 @@ class ChatConfig(BaseSettings):
|
||||
# allows ~70-100 turns/day.
|
||||
# Checked at the HTTP layer (routes.py) before each turn.
|
||||
#
|
||||
# TODO: These are deploy-time constants applied identically to every user.
|
||||
# If per-user or per-plan limits are needed (e.g., free tier vs paid), these
|
||||
# must move to the database (e.g., a UserPlan table) and get_usage_status /
|
||||
# check_rate_limit would look up each user's specific limits instead of
|
||||
# reading config.daily_token_limit / config.weekly_token_limit.
|
||||
# These are base limits for the FREE tier. Higher tiers (PRO, BUSINESS,
|
||||
# ENTERPRISE) multiply these by their tier multiplier (see
|
||||
# rate_limit.TIER_MULTIPLIERS). User tier is stored in the
|
||||
# User.subscriptionTier DB column and resolved inside
|
||||
# get_global_rate_limits().
|
||||
daily_token_limit: int = Field(
|
||||
default=2_500_000,
|
||||
description="Max tokens per day, resets at midnight UTC (0 = unlimited)",
|
||||
|
||||
@@ -14,6 +14,7 @@ from prisma.types import (
|
||||
ChatSessionUpdateInput,
|
||||
ChatSessionWhereInput,
|
||||
)
|
||||
from pydantic import BaseModel
|
||||
|
||||
from backend.data import db
|
||||
from backend.util.json import SafeJson, sanitize_string
|
||||
@@ -23,12 +24,22 @@ from .model import (
|
||||
ChatSession,
|
||||
ChatSessionInfo,
|
||||
ChatSessionMetadata,
|
||||
invalidate_session_cache,
|
||||
cache_chat_session,
|
||||
)
|
||||
from .model import get_chat_session as get_chat_session_cached
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class PaginatedMessages(BaseModel):
|
||||
"""Result of a paginated message query."""
|
||||
|
||||
messages: list[ChatMessage]
|
||||
has_more: bool
|
||||
oldest_sequence: int | None
|
||||
session: ChatSessionInfo
|
||||
|
||||
|
||||
async def get_chat_session(session_id: str) -> ChatSession | None:
|
||||
"""Get a chat session by ID from the database."""
|
||||
session = await PrismaChatSession.prisma().find_unique(
|
||||
@@ -38,6 +49,116 @@ async def get_chat_session(session_id: str) -> ChatSession | None:
|
||||
return ChatSession.from_db(session) if session else None
|
||||
|
||||
|
||||
async def get_chat_session_metadata(session_id: str) -> ChatSessionInfo | None:
|
||||
"""Get chat session metadata (without messages) for ownership validation."""
|
||||
session = await PrismaChatSession.prisma().find_unique(
|
||||
where={"id": session_id},
|
||||
)
|
||||
return ChatSessionInfo.from_db(session) if session else None
|
||||
|
||||
|
||||
async def get_chat_messages_paginated(
|
||||
session_id: str,
|
||||
limit: int = 50,
|
||||
before_sequence: int | None = None,
|
||||
user_id: str | None = None,
|
||||
) -> PaginatedMessages | None:
|
||||
"""Get paginated messages for a session, newest first.
|
||||
|
||||
Verifies session existence (and ownership when ``user_id`` is provided)
in the same query that fetches the messages. Returns ``None`` when the
session is not found or does not belong to the user.
|
||||
|
||||
Args:
|
||||
session_id: The chat session ID.
|
||||
limit: Max messages to return.
|
||||
before_sequence: Cursor — return messages with sequence < this value.
|
||||
user_id: If provided, filters via ``Session.userId`` so only the
|
||||
session owner's messages are returned (acts as an ownership guard).
|
||||
"""
|
||||
# Build session-existence / ownership check
|
||||
session_where: ChatSessionWhereInput = {"id": session_id}
|
||||
if user_id is not None:
|
||||
session_where["userId"] = user_id
|
||||
|
||||
# Build message include — fetch paginated messages in the same query
|
||||
msg_include: dict[str, Any] = {
|
||||
"order_by": {"sequence": "desc"},
|
||||
"take": limit + 1,
|
||||
}
|
||||
if before_sequence is not None:
|
||||
msg_include["where"] = {"sequence": {"lt": before_sequence}}
|
||||
|
||||
# Single query: session existence/ownership + paginated messages
|
||||
session = await PrismaChatSession.prisma().find_first(
|
||||
where=session_where,
|
||||
include={"Messages": msg_include},
|
||||
)
|
||||
|
||||
if session is None:
|
||||
return None
|
||||
|
||||
session_info = ChatSessionInfo.from_db(session)
|
||||
results = list(session.Messages) if session.Messages else []
|
||||
|
||||
has_more = len(results) > limit
|
||||
results = results[:limit]
|
||||
|
||||
# Reverse to ascending order
|
||||
results.reverse()
|
||||
|
||||
# Tool-call boundary fix: if the oldest message is a tool message,
|
||||
# expand backward to include the preceding assistant message that
|
||||
# owns the tool_calls, so convertChatSessionMessagesToUiMessages
|
||||
# can pair them correctly.
|
||||
_BOUNDARY_SCAN_LIMIT = 10
|
||||
if results and results[0].role == "tool":
|
||||
boundary_where: dict[str, Any] = {
|
||||
"sessionId": session_id,
|
||||
"sequence": {"lt": results[0].sequence},
|
||||
}
|
||||
if user_id is not None:
|
||||
boundary_where["Session"] = {"is": {"userId": user_id}}
|
||||
extra = await PrismaChatMessage.prisma().find_many(
|
||||
where=boundary_where,
|
||||
order={"sequence": "desc"},
|
||||
take=_BOUNDARY_SCAN_LIMIT,
|
||||
)
|
||||
# Find the first non-tool message (should be the assistant)
|
||||
boundary_msgs = []
|
||||
found_owner = False
|
||||
for msg in extra:
|
||||
boundary_msgs.append(msg)
|
||||
if msg.role != "tool":
|
||||
found_owner = True
|
||||
break
|
||||
boundary_msgs.reverse()
|
||||
if not found_owner:
|
||||
logger.warning(
|
||||
"Boundary expansion did not find owning assistant message "
|
||||
"for session=%s before sequence=%s (%d msgs scanned)",
|
||||
session_id,
|
||||
results[0].sequence,
|
||||
len(extra),
|
||||
)
|
||||
if boundary_msgs:
|
||||
results = boundary_msgs + results
|
||||
# Only mark has_more if the expanded boundary isn't the
|
||||
# very start of the conversation (sequence 0).
|
||||
if boundary_msgs[0].sequence > 0:
|
||||
has_more = True
|
||||
|
||||
messages = [ChatMessage.from_db(m) for m in results]
|
||||
oldest_sequence = messages[0].sequence if messages else None
|
||||
|
||||
return PaginatedMessages(
|
||||
messages=messages,
|
||||
has_more=has_more,
|
||||
oldest_sequence=oldest_sequence,
|
||||
session=session_info,
|
||||
)
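The pagination logic above combines two techniques: fetching `limit + 1` rows to detect whether more pages exist, and expanding the page backward when it starts with a tool message. The in-memory sketch below mirrors that flow without a database; `Msg` and `paginate` are illustrative names, not part of the codebase.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Msg:
    """Illustrative message record: only sequence and role matter here."""

    sequence: int
    role: str


def paginate(
    all_msgs: list[Msg],
    limit: int,
    before_sequence: Optional[int] = None,
    scan_limit: int = 10,
) -> tuple[list[Msg], bool, Optional[int]]:
    """Return (page in ascending order, has_more, oldest_sequence)."""
    newest_first = sorted(all_msgs, key=lambda m: m.sequence, reverse=True)
    candidates = [
        m for m in newest_first
        if before_sequence is None or m.sequence < before_sequence
    ]
    # Fetch one extra row: its presence signals that an older page exists.
    fetched = candidates[: limit + 1]
    has_more = len(fetched) > limit
    page = fetched[:limit]
    page.reverse()

    # Boundary fix: a page must not start with an orphaned tool message.
    if page and page[0].role == "tool":
        older = [m for m in newest_first if m.sequence < page[0].sequence]
        boundary: list[Msg] = []
        for msg in older[:scan_limit]:
            boundary.append(msg)
            if msg.role != "tool":
                break  # found the message that owns the tool calls
        boundary.reverse()
        if boundary:
            page = boundary + page
            # Sequence 0 is the start of the conversation.
            if boundary[0].sequence > 0:
                has_more = True

    oldest = page[0].sequence if page else None
    return page, has_more, oldest
```

Because of the boundary expansion, a page may contain more than `limit` messages; callers should page with the returned oldest sequence rather than assume a fixed page size.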
|
||||
|
||||
|
||||
async def create_chat_session(
|
||||
session_id: str,
|
||||
user_id: str,
|
||||
@@ -380,8 +501,11 @@ async def update_tool_message_content(
|
||||
async def set_turn_duration(session_id: str, duration_ms: int) -> None:
|
||||
"""Set durationMs on the last assistant message in a session.
|
||||
|
||||
Also invalidates the Redis session cache so the next GET returns
|
||||
the updated duration.
|
||||
Updates the Redis cache in-place instead of invalidating it.
|
||||
Invalidation would delete the key, creating a window where concurrent
|
||||
``get_chat_session`` calls re-populate the cache from DB — potentially
|
||||
with stale data if the DB write from the previous turn hasn't propagated.
|
||||
This race caused duplicate user messages on the next turn.
|
||||
"""
|
||||
last_msg = await PrismaChatMessage.prisma().find_first(
|
||||
where={"sessionId": session_id, "role": "assistant"},
|
||||
@@ -392,5 +516,13 @@ async def set_turn_duration(session_id: str, duration_ms: int) -> None:
|
||||
where={"id": last_msg.id},
|
||||
data={"durationMs": duration_ms},
|
||||
)
|
||||
# Invalidate cache so the session is re-fetched from DB with durationMs
|
||||
await invalidate_session_cache(session_id)
|
||||
# Update cache in-place rather than invalidating to avoid a
|
||||
# race window where the empty cache gets re-populated with
|
||||
# stale data by a concurrent get_chat_session call.
|
||||
session = await get_chat_session_cached(session_id)
|
||||
if session and session.messages:
|
||||
for msg in reversed(session.messages):
|
||||
if msg.role == "assistant":
|
||||
msg.duration_ms = duration_ms
|
||||
break
|
||||
await cache_chat_session(session)
|
||||
|
||||
388
autogpt_platform/backend/backend/copilot/db_test.py
Normal file
@@ -0,0 +1,388 @@
|
||||
"""Unit tests for copilot.db — paginated message queries."""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from datetime import UTC, datetime
|
||||
from typing import Any
|
||||
from unittest.mock import AsyncMock, patch
|
||||
|
||||
import pytest
|
||||
from prisma.models import ChatMessage as PrismaChatMessage
|
||||
from prisma.models import ChatSession as PrismaChatSession
|
||||
|
||||
from backend.copilot.db import (
|
||||
PaginatedMessages,
|
||||
get_chat_messages_paginated,
|
||||
set_turn_duration,
|
||||
)
|
||||
from backend.copilot.model import ChatMessage as CopilotChatMessage
|
||||
from backend.copilot.model import ChatSession, get_chat_session, upsert_chat_session
|
||||
|
||||
|
||||
def _make_msg(
|
||||
sequence: int,
|
||||
role: str = "assistant",
|
||||
content: str | None = "hello",
|
||||
tool_calls: Any = None,
|
||||
) -> PrismaChatMessage:
|
||||
"""Build a minimal PrismaChatMessage for testing."""
|
||||
return PrismaChatMessage(
|
||||
id=f"msg-{sequence}",
|
||||
createdAt=datetime.now(UTC),
|
||||
sessionId="sess-1",
|
||||
role=role,
|
||||
content=content,
|
||||
sequence=sequence,
|
||||
toolCalls=tool_calls,
|
||||
name=None,
|
||||
toolCallId=None,
|
||||
refusal=None,
|
||||
functionCall=None,
|
||||
)
|
||||
|
||||
|
||||
def _make_session(
|
||||
session_id: str = "sess-1",
|
||||
user_id: str = "user-1",
|
||||
messages: list[PrismaChatMessage] | None = None,
|
||||
) -> PrismaChatSession:
|
||||
"""Build a minimal PrismaChatSession for testing."""
|
||||
now = datetime.now(UTC)
|
||||
session = PrismaChatSession.model_construct(
|
||||
id=session_id,
|
||||
createdAt=now,
|
||||
updatedAt=now,
|
||||
userId=user_id,
|
||||
credentials={},
|
||||
successfulAgentRuns={},
|
||||
successfulAgentSchedules={},
|
||||
totalPromptTokens=0,
|
||||
totalCompletionTokens=0,
|
||||
title=None,
|
||||
metadata={},
|
||||
Messages=messages or [],
|
||||
)
|
||||
return session
|
||||
|
||||
|
||||
SESSION_ID = "sess-1"
|
||||
|
||||
|
||||
@pytest.fixture()
|
||||
def mock_db():
|
||||
"""Patch ChatSession.prisma().find_first and ChatMessage.prisma().find_many.
|
||||
|
||||
find_first is used for the main query (session + included messages).
|
||||
find_many is used only for boundary expansion queries.
|
||||
"""
|
||||
with (
|
||||
patch.object(PrismaChatSession, "prisma") as mock_session_prisma,
|
||||
patch.object(PrismaChatMessage, "prisma") as mock_msg_prisma,
|
||||
):
|
||||
find_first = AsyncMock()
|
||||
mock_session_prisma.return_value.find_first = find_first
|
||||
|
||||
find_many = AsyncMock(return_value=[])
|
||||
mock_msg_prisma.return_value.find_many = find_many
|
||||
|
||||
yield find_first, find_many
|
||||
|
||||
|
||||
# ---------- Basic pagination ----------
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_basic_page_returns_messages_ascending(
|
||||
mock_db: tuple[AsyncMock, AsyncMock],
|
||||
):
|
||||
"""Messages are returned in ascending sequence order."""
|
||||
find_first, _ = mock_db
|
||||
find_first.return_value = _make_session(
|
||||
messages=[_make_msg(3), _make_msg(2), _make_msg(1)],
|
||||
)
|
||||
|
||||
page = await get_chat_messages_paginated(SESSION_ID, limit=5)
|
||||
|
||||
assert isinstance(page, PaginatedMessages)
|
||||
assert [m.sequence for m in page.messages] == [1, 2, 3]
|
||||
assert page.has_more is False
|
||||
assert page.oldest_sequence == 1
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_has_more_when_results_exceed_limit(
|
||||
mock_db: tuple[AsyncMock, AsyncMock],
|
||||
):
|
||||
"""has_more is True when DB returns more than limit items."""
|
||||
find_first, _ = mock_db
|
||||
find_first.return_value = _make_session(
|
||||
messages=[_make_msg(3), _make_msg(2), _make_msg(1)],
|
||||
)
|
||||
|
||||
page = await get_chat_messages_paginated(SESSION_ID, limit=2)
|
||||
|
||||
assert page is not None
|
||||
assert page.has_more is True
|
||||
assert len(page.messages) == 2
|
||||
assert [m.sequence for m in page.messages] == [2, 3]
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_empty_session_returns_no_messages(
|
||||
mock_db: tuple[AsyncMock, AsyncMock],
|
||||
):
|
||||
find_first, _ = mock_db
|
||||
find_first.return_value = _make_session(messages=[])
|
||||
|
||||
page = await get_chat_messages_paginated(SESSION_ID, limit=50)
|
||||
|
||||
assert page is not None
|
||||
assert page.messages == []
|
||||
assert page.has_more is False
|
||||
assert page.oldest_sequence is None
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_before_sequence_filters_correctly(
|
||||
mock_db: tuple[AsyncMock, AsyncMock],
|
||||
):
|
||||
"""before_sequence is passed as a where filter inside the Messages include."""
|
||||
find_first, _ = mock_db
|
||||
find_first.return_value = _make_session(
|
||||
messages=[_make_msg(2), _make_msg(1)],
|
||||
)
|
||||
|
||||
await get_chat_messages_paginated(SESSION_ID, limit=50, before_sequence=5)
|
||||
|
||||
call_kwargs = find_first.call_args
|
||||
include = call_kwargs.kwargs.get("include") or call_kwargs[1].get("include")
|
||||
assert include["Messages"]["where"] == {"sequence": {"lt": 5}}
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_no_where_on_messages_without_before_sequence(
|
||||
mock_db: tuple[AsyncMock, AsyncMock],
|
||||
):
|
||||
"""Without before_sequence, the Messages include has no where clause."""
|
||||
find_first, _ = mock_db
|
||||
find_first.return_value = _make_session(messages=[_make_msg(1)])
|
||||
|
||||
await get_chat_messages_paginated(SESSION_ID, limit=50)
|
||||
|
||||
call_kwargs = find_first.call_args
|
||||
include = call_kwargs.kwargs.get("include") or call_kwargs[1].get("include")
|
||||
assert "where" not in include["Messages"]
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_user_id_filter_applied_to_session_where(
|
||||
mock_db: tuple[AsyncMock, AsyncMock],
|
||||
):
|
||||
"""user_id adds a userId filter to the session-level where clause."""
|
||||
find_first, _ = mock_db
|
||||
find_first.return_value = _make_session(messages=[_make_msg(1)])
|
||||
|
||||
await get_chat_messages_paginated(SESSION_ID, limit=50, user_id="user-abc")
|
||||
|
||||
call_kwargs = find_first.call_args
|
||||
where = call_kwargs.kwargs.get("where") or call_kwargs[1].get("where")
|
||||
assert where["userId"] == "user-abc"
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_session_not_found_returns_none(
|
||||
mock_db: tuple[AsyncMock, AsyncMock],
|
||||
):
|
||||
"""Returns None when session doesn't exist or user doesn't own it."""
|
||||
find_first, _ = mock_db
|
||||
find_first.return_value = None
|
||||
|
||||
page = await get_chat_messages_paginated(SESSION_ID, limit=50)
|
||||
|
||||
assert page is None
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_session_info_included_in_result(
|
||||
mock_db: tuple[AsyncMock, AsyncMock],
|
||||
):
|
||||
"""PaginatedMessages includes session metadata."""
|
||||
find_first, _ = mock_db
|
||||
find_first.return_value = _make_session(messages=[_make_msg(1)])
|
||||
|
||||
page = await get_chat_messages_paginated(SESSION_ID, limit=50)
|
||||
|
||||
assert page is not None
|
||||
assert page.session.session_id == SESSION_ID
|
||||
|
||||
|
||||
# ---------- Backward boundary expansion ----------
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_boundary_expansion_includes_assistant(
|
||||
mock_db: tuple[AsyncMock, AsyncMock],
|
||||
):
|
||||
"""When page starts with a tool message, expand backward to include
|
||||
the owning assistant message."""
|
||||
find_first, find_many = mock_db
|
||||
find_first.return_value = _make_session(
|
||||
messages=[_make_msg(5, role="tool"), _make_msg(4, role="tool")],
|
||||
)
|
||||
find_many.return_value = [_make_msg(3, role="assistant")]
|
||||
|
||||
page = await get_chat_messages_paginated(SESSION_ID, limit=5)
|
||||
|
||||
assert page is not None
|
||||
assert [m.sequence for m in page.messages] == [3, 4, 5]
|
||||
assert page.messages[0].role == "assistant"
|
||||
assert page.oldest_sequence == 3
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_boundary_expansion_includes_multiple_tool_msgs(
|
||||
mock_db: tuple[AsyncMock, AsyncMock],
|
||||
):
|
||||
"""Boundary expansion scans past consecutive tool messages to find
|
||||
the owning assistant."""
|
||||
find_first, find_many = mock_db
|
||||
find_first.return_value = _make_session(
|
||||
messages=[_make_msg(7, role="tool")],
|
||||
)
|
||||
find_many.return_value = [
|
||||
_make_msg(6, role="tool"),
|
||||
_make_msg(5, role="tool"),
|
||||
_make_msg(4, role="assistant"),
|
||||
]
|
||||
|
||||
page = await get_chat_messages_paginated(SESSION_ID, limit=5)
|
||||
|
||||
assert page is not None
|
||||
assert [m.sequence for m in page.messages] == [4, 5, 6, 7]
|
||||
assert page.messages[0].role == "assistant"
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_boundary_expansion_sets_has_more_when_not_at_start(
|
||||
mock_db: tuple[AsyncMock, AsyncMock],
|
||||
):
|
||||
"""After boundary expansion, has_more=True if expanded msgs aren't at seq 0."""
|
||||
find_first, find_many = mock_db
|
||||
find_first.return_value = _make_session(
|
||||
messages=[_make_msg(3, role="tool")],
|
||||
)
|
||||
find_many.return_value = [_make_msg(2, role="assistant")]
|
||||
|
||||
page = await get_chat_messages_paginated(SESSION_ID, limit=5)
|
||||
|
||||
assert page is not None
|
||||
assert page.has_more is True
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_boundary_expansion_no_has_more_at_conversation_start(
|
||||
mock_db: tuple[AsyncMock, AsyncMock],
|
||||
):
|
||||
"""has_more stays False when boundary expansion reaches seq 0."""
|
||||
find_first, find_many = mock_db
|
||||
find_first.return_value = _make_session(
|
||||
messages=[_make_msg(1, role="tool")],
|
||||
)
|
||||
find_many.return_value = [_make_msg(0, role="assistant")]
|
||||
|
||||
page = await get_chat_messages_paginated(SESSION_ID, limit=5)
|
||||
|
||||
assert page is not None
|
||||
assert page.has_more is False
|
||||
assert page.oldest_sequence == 0
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_no_boundary_expansion_when_first_msg_not_tool(
|
||||
mock_db: tuple[AsyncMock, AsyncMock],
|
||||
):
|
||||
"""No boundary expansion when the first message is not a tool message."""
|
||||
find_first, find_many = mock_db
|
||||
find_first.return_value = _make_session(
|
||||
messages=[_make_msg(3, role="user"), _make_msg(2, role="assistant")],
|
||||
)
|
||||
|
||||
page = await get_chat_messages_paginated(SESSION_ID, limit=5)
|
||||
|
||||
assert page is not None
|
||||
assert find_many.call_count == 0
|
||||
assert [m.sequence for m in page.messages] == [2, 3]
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_boundary_expansion_warns_when_no_owner_found(
|
||||
mock_db: tuple[AsyncMock, AsyncMock],
|
||||
):
|
||||
"""When boundary scan doesn't find a non-tool message, a warning is logged
|
||||
and the boundary messages are still included."""
|
||||
find_first, find_many = mock_db
|
||||
find_first.return_value = _make_session(
|
||||
messages=[_make_msg(10, role="tool")],
|
||||
)
|
||||
find_many.return_value = [_make_msg(i, role="tool") for i in range(9, -1, -1)]
|
||||
|
||||
with patch("backend.copilot.db.logger") as mock_logger:
|
||||
page = await get_chat_messages_paginated(SESSION_ID, limit=5)
|
||||
mock_logger.warning.assert_called_once()
|
||||
|
||||
assert page is not None
|
||||
assert page.messages[0].role == "tool"
|
||||
assert len(page.messages) > 1
|
||||
|
||||
|
||||
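The boundary-expansion behaviour exercised by the tests above can be sketched in memory. This is a minimal sketch inferred only from the test expectations, not the actual `get_chat_messages_paginated` implementation: `Msg`, `expand_boundary`, and `max_scan` are hypothetical names, and a plain list stands in for the database query.

```python
from dataclasses import dataclass


@dataclass
class Msg:
    sequence: int
    role: str


def expand_boundary(
    page: list[Msg], history: list[Msg], max_scan: int = 10
) -> tuple[list[Msg], bool]:
    """Return (expanded_page, owner_found); the page is ascending by sequence.

    Assumption: if the oldest message on the page is a tool message, we scan
    backwards through older messages until a non-tool message is found.
    """
    if not page or page[0].role != "tool":
        return page, True  # nothing to expand

    oldest = page[0].sequence
    # Older messages, newest first, capped at max_scan (hypothetical limit).
    older = sorted(
        (m for m in history if m.sequence < oldest),
        key=lambda m: m.sequence,
        reverse=True,
    )[:max_scan]

    extra: list[Msg] = []
    owner_found = False
    for m in older:
        extra.append(m)
        if m.role != "tool":
            owner_found = True  # reached the message that owns the tool calls
            break

    # Scanned messages are kept even when no owner was found.
    return list(reversed(extra)) + page, owner_found
```

When `owner_found` is false, a caller would presumably log a warning, as `test_boundary_expansion_warns_when_no_owner_found` expects, while still returning the scanned messages.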
# ---------- Turn duration (integration tests) ----------
|
||||
|
||||
|
||||
@pytest.mark.asyncio(loop_scope="session")
|
||||
async def test_set_turn_duration_updates_cache_in_place(setup_test_user, test_user_id):
|
||||
"""set_turn_duration patches the cached session without invalidation.
|
||||
|
||||
Verifies that after calling set_turn_duration the Redis-cached session
|
||||
reflects the updated durationMs on the last assistant message, without
|
||||
the cache having been deleted and re-populated (which could race with
|
||||
concurrent get_chat_session calls).
|
||||
"""
|
||||
session = ChatSession.new(user_id=test_user_id, dry_run=False)
|
||||
session.messages = [
|
||||
CopilotChatMessage(role="user", content="hello"),
|
||||
CopilotChatMessage(role="assistant", content="hi there"),
|
||||
]
|
||||
session = await upsert_chat_session(session)
|
||||
|
||||
# Ensure the session is in cache
|
||||
cached = await get_chat_session(session.session_id, test_user_id)
|
||||
assert cached is not None
|
||||
assert cached.messages[-1].duration_ms is None
|
||||
|
||||
# Update turn duration — should patch cache in-place
|
||||
await set_turn_duration(session.session_id, 1234)
|
||||
|
||||
# Read from cache (not DB) — the cache should already have the update
|
||||
updated = await get_chat_session(session.session_id, test_user_id)
|
||||
assert updated is not None
|
||||
assistant_msgs = [m for m in updated.messages if m.role == "assistant"]
|
||||
assert len(assistant_msgs) == 1
|
||||
assert assistant_msgs[0].duration_ms == 1234
|
||||
|
||||
|
||||
@pytest.mark.asyncio(loop_scope="session")
|
||||
async def test_set_turn_duration_no_assistant_message(setup_test_user, test_user_id):
|
||||
"""set_turn_duration is a no-op when there are no assistant messages."""
|
||||
session = ChatSession.new(user_id=test_user_id, dry_run=False)
|
||||
session.messages = [
|
||||
CopilotChatMessage(role="user", content="hello"),
|
||||
]
|
||||
session = await upsert_chat_session(session)
|
||||
|
||||
# Should not raise
|
||||
await set_turn_duration(session.session_id, 5678)
|
||||
|
||||
cached = await get_chat_session(session.session_id, test_user_id)
|
||||
assert cached is not None
|
||||
# User message should not have durationMs
|
||||
assert cached.messages[0].duration_ms is None
|
||||
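The in-place cache patch described in the `set_turn_duration` tests can be illustrated without Redis. This is a sketch under stated assumptions: `patch_turn_duration` is a hypothetical helper and a list of dicts stands in for the cached session payload. The real function's storage format is not shown in this diff.

```python
def patch_turn_duration(messages: list[dict], duration_ms: int) -> bool:
    """Set duration_ms on the last assistant message, mutating in place.

    Returns True if a message was patched, False if there was no assistant
    message (the no-op case covered by the second test above).
    """
    for msg in reversed(messages):
        if msg.get("role") == "assistant":
            msg["duration_ms"] = duration_ms
            return True
    return False
```

Mutating the cached copy, rather than deleting it and letting the next read repopulate it, avoids the window in which a concurrent reader could re-cache stale data.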
@@ -13,7 +13,7 @@ import time
|
||||
|
||||
from backend.copilot import stream_registry
|
||||
from backend.copilot.baseline import stream_chat_completion_baseline
|
||||
-from backend.copilot.config import ChatConfig
+from backend.copilot.config import ChatConfig, CopilotMode
|
||||
from backend.copilot.response_model import StreamError
|
||||
from backend.copilot.sdk import service as sdk_service
|
||||
from backend.copilot.sdk.dummy import stream_chat_completion_dummy
|
||||
@@ -30,6 +30,57 @@ from .utils import CoPilotExecutionEntry, CoPilotLogMetadata
|
||||
logger = TruncatedLogger(logging.getLogger(__name__), prefix="[CoPilotExecutor]")
|
||||
|
||||
|
||||
# ============ Mode Routing ============ #
|
||||
|
||||
|
||||
async def resolve_effective_mode(
    mode: CopilotMode | None,
    user_id: str | None,
) -> CopilotMode | None:
    """Strip ``mode`` when the user is not entitled to the toggle.

    The UI gates the mode toggle behind ``CHAT_MODE_OPTION``; the
    processor enforces the same gate server-side so an authenticated
    user cannot bypass the flag by crafting a request directly.
    """
    if mode is None:
        return None
    allowed = await is_feature_enabled(
        Flag.CHAT_MODE_OPTION,
        user_id or "anonymous",
        default=False,
    )
    if not allowed:
        logger.info(f"Ignoring mode={mode} — CHAT_MODE_OPTION is disabled for user")
        return None
    return mode
|
||||
|
||||
|
||||
async def resolve_use_sdk_for_mode(
    mode: CopilotMode | None,
    user_id: str | None,
    *,
    use_claude_code_subscription: bool,
    config_default: bool,
) -> bool:
    """Pick the SDK vs baseline path for a single turn.

    Per-request ``mode`` wins whenever it is set (after the
    ``CHAT_MODE_OPTION`` gate has been applied upstream). Otherwise
    falls back to the Claude Code subscription override, then the
    ``COPILOT_SDK`` LaunchDarkly flag, then the config default.
    """
    if mode == "fast":
        return False
    if mode == "extended_thinking":
        return True
    return use_claude_code_subscription or await is_feature_enabled(
        Flag.COPILOT_SDK,
        user_id or "anonymous",
        default=config_default,
    )
|
||||
|
||||
|
||||
# ============ Module Entry Points ============ #
|
||||
|
||||
# Thread-local storage for processor instances
|
||||
@@ -100,8 +151,8 @@ class CoPilotProcessor:
|
||||
This method is called once per worker thread to set up the async event
|
||||
loop and initialize any required resources.
|
||||
|
||||
-Database is accessed only through DatabaseManager, so we don't need to connect
-to Prisma directly.
+DB operations route through DatabaseManagerAsyncClient (RPC) via the
+db_accessors pattern — no direct Prisma connection is needed here.
|
||||
"""
|
||||
configure_logging()
|
||||
set_service_name("CoPilotExecutor")
|
||||
@@ -250,21 +301,26 @@ class CoPilotProcessor:
|
||||
if config.test_mode:
|
||||
stream_fn = stream_chat_completion_dummy
|
||||
log.warning("Using DUMMY service (CHAT_TEST_MODE=true)")
|
||||
+effective_mode = None
|
||||
else:
|
||||
-use_sdk = (
-    config.use_claude_code_subscription
-    or await is_feature_enabled(
-        Flag.COPILOT_SDK,
-        entry.user_id or "anonymous",
-        default=config.use_claude_agent_sdk,
-    )
+# Enforce server-side feature-flag gate so unauthorised
+# users cannot force a mode by crafting the request.
+effective_mode = await resolve_effective_mode(entry.mode, entry.user_id)
+use_sdk = await resolve_use_sdk_for_mode(
+    effective_mode,
+    entry.user_id,
+    use_claude_code_subscription=config.use_claude_code_subscription,
+    config_default=config.use_claude_agent_sdk,
|
||||
)
|
||||
stream_fn = (
|
||||
sdk_service.stream_chat_completion_sdk
|
||||
if use_sdk
|
||||
else stream_chat_completion_baseline
|
||||
)
|
||||
-log.info(f"Using {'SDK' if use_sdk else 'baseline'} service")
+log.info(
+    f"Using {'SDK' if use_sdk else 'baseline'} service "
+    f"(mode={effective_mode or 'default'})"
+)
|
||||
|
||||
# Stream chat completion and publish chunks to Redis.
|
||||
# stream_and_publish wraps the raw stream with registry
|
||||
@@ -276,6 +332,7 @@ class CoPilotProcessor:
|
||||
user_id=entry.user_id,
|
||||
context=entry.context,
|
||||
file_ids=entry.file_ids,
|
||||
mode=effective_mode,
|
||||
)
|
||||
async for chunk in stream_registry.stream_and_publish(
|
||||
session_id=entry.session_id,
|
||||
|
||||
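The routing precedence introduced in the processor hunks above (gate first, then explicit mode, then subscription override, then flag with config default) can be sketched as one standalone function. This is an illustrative sketch, not the production code: `flag_enabled` and `route` are hypothetical names, and a plain dict stands in for the LaunchDarkly client behind `is_feature_enabled`.

```python
import asyncio
from typing import Literal, Optional

Mode = Literal["fast", "extended_thinking"]


async def flag_enabled(flags: dict[str, bool], name: str, default: bool) -> bool:
    """Hypothetical stand-in for a feature-flag lookup with a default."""
    return flags.get(name, default)


async def route(
    mode: Optional[Mode],
    flags: dict[str, bool],
    *,
    subscription: bool,
    config_default: bool,
) -> str:
    # 1. Server-side gate: drop the requested mode unless the toggle is enabled.
    if mode is not None and not await flag_enabled(flags, "CHAT_MODE_OPTION", False):
        mode = None
    # 2. An explicit (and permitted) mode wins over every other setting.
    if mode == "fast":
        return "baseline"
    if mode == "extended_thinking":
        return "sdk"
    # 3. Otherwise: subscription override, then the flag, then the config default.
    use_sdk = subscription or await flag_enabled(flags, "COPILOT_SDK", config_default)
    return "sdk" if use_sdk else "baseline"


if __name__ == "__main__":
    print(asyncio.run(route("fast", {"CHAT_MODE_OPTION": True}, subscription=True, config_default=True)))
```

Note that a stripped mode does not fall back to "baseline"; it falls through to the same defaults as a request that never specified a mode.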
@@ -0,0 +1,175 @@
|
||||
"""Unit tests for CoPilot mode routing logic in the processor.
|
||||
|
||||
Tests cover the mode→service mapping:
|
||||
- 'fast' → baseline service
|
||||
- 'extended_thinking' → SDK service
|
||||
- None → feature flag / config fallback
|
||||
|
||||
as well as the ``CHAT_MODE_OPTION`` server-side gate. The tests import
|
||||
the real production helpers from ``processor.py`` so the routing logic
|
||||
has meaningful coverage.
|
||||
"""
|
||||
|
||||
from unittest.mock import AsyncMock, patch
|
||||
|
||||
import pytest
|
||||
|
||||
from backend.copilot.executor.processor import (
|
||||
resolve_effective_mode,
|
||||
resolve_use_sdk_for_mode,
|
||||
)
|
||||
|
||||
|
||||
class TestResolveUseSdkForMode:
|
||||
"""Tests for the per-request mode routing logic."""
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_fast_mode_uses_baseline(self):
|
||||
"""mode='fast' always routes to baseline, regardless of flags."""
|
||||
with patch(
|
||||
"backend.copilot.executor.processor.is_feature_enabled",
|
||||
new=AsyncMock(return_value=True),
|
||||
):
|
||||
assert (
|
||||
await resolve_use_sdk_for_mode(
|
||||
"fast",
|
||||
"user-1",
|
||||
use_claude_code_subscription=True,
|
||||
config_default=True,
|
||||
)
|
||||
is False
|
||||
)
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_extended_thinking_uses_sdk(self):
|
||||
"""mode='extended_thinking' always routes to SDK, regardless of flags."""
|
||||
with patch(
|
||||
"backend.copilot.executor.processor.is_feature_enabled",
|
||||
new=AsyncMock(return_value=False),
|
||||
):
|
||||
assert (
|
||||
await resolve_use_sdk_for_mode(
|
||||
"extended_thinking",
|
||||
"user-1",
|
||||
use_claude_code_subscription=False,
|
||||
config_default=False,
|
||||
)
|
||||
is True
|
||||
)
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_none_mode_uses_subscription_override(self):
|
||||
"""mode=None with claude_code_subscription=True routes to SDK."""
|
||||
with patch(
|
||||
"backend.copilot.executor.processor.is_feature_enabled",
|
||||
new=AsyncMock(return_value=False),
|
||||
):
|
||||
assert (
|
||||
await resolve_use_sdk_for_mode(
|
||||
None,
|
||||
"user-1",
|
||||
use_claude_code_subscription=True,
|
||||
config_default=False,
|
||||
)
|
||||
is True
|
||||
)
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_none_mode_uses_feature_flag(self):
|
||||
"""mode=None with feature flag enabled routes to SDK."""
|
||||
with patch(
|
||||
"backend.copilot.executor.processor.is_feature_enabled",
|
||||
new=AsyncMock(return_value=True),
|
||||
) as flag_mock:
|
||||
assert (
|
||||
await resolve_use_sdk_for_mode(
|
||||
None,
|
||||
"user-1",
|
||||
use_claude_code_subscription=False,
|
||||
config_default=False,
|
||||
)
|
||||
is True
|
||||
)
|
||||
flag_mock.assert_awaited_once()
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_none_mode_uses_config_default(self):
|
||||
"""mode=None falls back to config.use_claude_agent_sdk."""
|
||||
# When LaunchDarkly returns the default (True), we expect SDK routing.
|
||||
with patch(
|
||||
"backend.copilot.executor.processor.is_feature_enabled",
|
||||
new=AsyncMock(return_value=True),
|
||||
):
|
||||
assert (
|
||||
await resolve_use_sdk_for_mode(
|
||||
None,
|
||||
"user-1",
|
||||
use_claude_code_subscription=False,
|
||||
config_default=True,
|
||||
)
|
||||
is True
|
||||
)
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_none_mode_all_disabled(self):
|
||||
"""mode=None with all flags off routes to baseline."""
|
||||
with patch(
|
||||
"backend.copilot.executor.processor.is_feature_enabled",
|
||||
new=AsyncMock(return_value=False),
|
||||
):
|
||||
assert (
|
||||
await resolve_use_sdk_for_mode(
|
||||
None,
|
||||
"user-1",
|
||||
use_claude_code_subscription=False,
|
||||
config_default=False,
|
||||
)
|
||||
is False
|
||||
)
|
||||
|
||||
|
||||
class TestResolveEffectiveMode:
|
||||
"""Tests for the CHAT_MODE_OPTION server-side gate."""
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_none_mode_passes_through(self):
|
||||
"""mode=None is returned as-is without a flag check."""
|
||||
with patch(
|
||||
"backend.copilot.executor.processor.is_feature_enabled",
|
||||
new=AsyncMock(return_value=False),
|
||||
) as flag_mock:
|
||||
assert await resolve_effective_mode(None, "user-1") is None
|
||||
flag_mock.assert_not_awaited()
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_mode_stripped_when_flag_disabled(self):
|
||||
"""When CHAT_MODE_OPTION is off, mode is dropped to None."""
|
||||
with patch(
|
||||
"backend.copilot.executor.processor.is_feature_enabled",
|
||||
new=AsyncMock(return_value=False),
|
||||
):
|
||||
assert await resolve_effective_mode("fast", "user-1") is None
|
||||
assert await resolve_effective_mode("extended_thinking", "user-1") is None
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_mode_preserved_when_flag_enabled(self):
|
||||
"""When CHAT_MODE_OPTION is on, the user-selected mode is preserved."""
|
||||
with patch(
|
||||
"backend.copilot.executor.processor.is_feature_enabled",
|
||||
new=AsyncMock(return_value=True),
|
||||
):
|
||||
assert await resolve_effective_mode("fast", "user-1") == "fast"
|
||||
assert (
|
||||
await resolve_effective_mode("extended_thinking", "user-1")
|
||||
== "extended_thinking"
|
||||
)
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_anonymous_user_with_mode(self):
|
||||
"""Anonymous users (user_id=None) still pass through the gate."""
|
||||
with patch(
|
||||
"backend.copilot.executor.processor.is_feature_enabled",
|
||||
new=AsyncMock(return_value=False),
|
||||
) as flag_mock:
|
||||
assert await resolve_effective_mode("fast", None) is None
|
||||
flag_mock.assert_awaited_once()
|
||||
@@ -9,6 +9,7 @@ import logging
|
||||
|
||||
from pydantic import BaseModel
|
||||
|
||||
from backend.copilot.config import CopilotMode
|
||||
from backend.data.rabbitmq import Exchange, ExchangeType, Queue, RabbitMQConfig
|
||||
from backend.util.logging import TruncatedLogger, is_structured_logging_enabled
|
||||
|
||||
@@ -156,6 +157,9 @@ class CoPilotExecutionEntry(BaseModel):
|
||||
file_ids: list[str] | None = None
|
||||
"""Workspace file IDs attached to the user's message"""
|
||||
|
||||
mode: CopilotMode | None = None
|
||||
"""Autopilot mode override: 'fast' or 'extended_thinking'. None = server default."""
|
||||
|
||||
|
||||
class CancelCoPilotEvent(BaseModel):
|
||||
"""Event to cancel a CoPilot operation."""
|
||||
@@ -175,6 +179,7 @@ async def enqueue_copilot_turn(
|
||||
is_user_message: bool = True,
|
||||
context: dict[str, str] | None = None,
|
||||
file_ids: list[str] | None = None,
|
||||
mode: CopilotMode | None = None,
|
||||
) -> None:
|
||||
"""Enqueue a CoPilot task for processing by the executor service.
|
||||
|
||||
@@ -186,6 +191,7 @@ async def enqueue_copilot_turn(
|
||||
is_user_message: Whether the message is from the user (vs system/assistant)
|
||||
context: Optional context for the message (e.g., {url: str, content: str})
|
||||
file_ids: Optional workspace file IDs attached to the user's message
|
||||
mode: Autopilot mode override ('fast' or 'extended_thinking'). None = server default.
|
||||
"""
|
||||
from backend.util.clients import get_async_copilot_queue
|
||||
|
||||
@@ -197,6 +203,7 @@ async def enqueue_copilot_turn(
|
||||
is_user_message=is_user_message,
|
||||
context=context,
|
||||
file_ids=file_ids,
|
||||
mode=mode,
|
||||
)
|
||||
|
||||
queue_client = await get_async_copilot_queue()
|
||||
|
||||
123
autogpt_platform/backend/backend/copilot/executor/utils_test.py
Normal file
@@ -0,0 +1,123 @@
|
||||
"""Tests for CoPilot executor utils (queue config, message models, logging)."""
|
||||
|
||||
from backend.copilot.executor.utils import (
|
||||
COPILOT_EXECUTION_EXCHANGE,
|
||||
COPILOT_EXECUTION_QUEUE_NAME,
|
||||
COPILOT_EXECUTION_ROUTING_KEY,
|
||||
CancelCoPilotEvent,
|
||||
CoPilotExecutionEntry,
|
||||
CoPilotLogMetadata,
|
||||
create_copilot_queue_config,
|
||||
)
|
||||
|
||||
|
||||
class TestCoPilotExecutionEntry:
|
||||
def test_basic_fields(self):
|
||||
entry = CoPilotExecutionEntry(
|
||||
session_id="s1",
|
||||
user_id="u1",
|
||||
message="hello",
|
||||
)
|
||||
assert entry.session_id == "s1"
|
||||
assert entry.user_id == "u1"
|
||||
assert entry.message == "hello"
|
||||
assert entry.is_user_message is True
|
||||
assert entry.mode is None
|
||||
assert entry.context is None
|
||||
assert entry.file_ids is None
|
||||
|
||||
def test_mode_field(self):
|
||||
entry = CoPilotExecutionEntry(
|
||||
session_id="s1",
|
||||
user_id="u1",
|
||||
message="test",
|
||||
mode="fast",
|
||||
)
|
||||
assert entry.mode == "fast"
|
||||
|
||||
entry2 = CoPilotExecutionEntry(
|
||||
session_id="s1",
|
||||
user_id="u1",
|
||||
message="test",
|
||||
mode="extended_thinking",
|
||||
)
|
||||
assert entry2.mode == "extended_thinking"
|
||||
|
||||
def test_optional_fields(self):
|
||||
entry = CoPilotExecutionEntry(
|
||||
session_id="s1",
|
||||
user_id="u1",
|
||||
message="test",
|
||||
turn_id="t1",
|
||||
context={"url": "https://example.com"},
|
||||
file_ids=["f1", "f2"],
|
||||
is_user_message=False,
|
||||
)
|
||||
assert entry.turn_id == "t1"
|
||||
assert entry.context == {"url": "https://example.com"}
|
||||
assert entry.file_ids == ["f1", "f2"]
|
||||
assert entry.is_user_message is False
|
||||
|
||||
def test_serialization_roundtrip(self):
|
||||
entry = CoPilotExecutionEntry(
|
||||
session_id="s1",
|
||||
user_id="u1",
|
||||
message="hello",
|
||||
mode="fast",
|
||||
)
|
||||
json_str = entry.model_dump_json()
|
||||
restored = CoPilotExecutionEntry.model_validate_json(json_str)
|
||||
assert restored == entry
|
||||
|
||||
|
||||
class TestCancelCoPilotEvent:
|
||||
def test_basic(self):
|
||||
event = CancelCoPilotEvent(session_id="s1")
|
||||
assert event.session_id == "s1"
|
||||
|
||||
def test_serialization(self):
|
||||
event = CancelCoPilotEvent(session_id="s1")
|
||||
restored = CancelCoPilotEvent.model_validate_json(event.model_dump_json())
|
||||
assert restored.session_id == "s1"
|
||||
|
||||
|
||||
class TestCreateCopilotQueueConfig:
|
||||
def test_returns_valid_config(self):
|
||||
config = create_copilot_queue_config()
|
||||
assert len(config.exchanges) == 2
|
||||
assert len(config.queues) == 2
|
||||
|
||||
def test_execution_queue_properties(self):
|
||||
config = create_copilot_queue_config()
|
||||
exec_queue = next(
|
||||
q for q in config.queues if q.name == COPILOT_EXECUTION_QUEUE_NAME
|
||||
)
|
||||
assert exec_queue.durable is True
|
||||
assert exec_queue.exchange == COPILOT_EXECUTION_EXCHANGE
|
||||
assert exec_queue.routing_key == COPILOT_EXECUTION_ROUTING_KEY
|
||||
|
||||
def test_cancel_queue_uses_fanout(self):
|
||||
config = create_copilot_queue_config()
|
||||
cancel_queue = next(
|
||||
q for q in config.queues if q.name != COPILOT_EXECUTION_QUEUE_NAME
|
||||
)
|
||||
assert cancel_queue.exchange is not None
|
||||
assert cancel_queue.exchange.type.value == "fanout"
|
||||
|
||||
|
||||
class TestCoPilotLogMetadata:
|
||||
def test_creates_logger_with_metadata(self):
|
||||
import logging
|
||||
|
||||
base_logger = logging.getLogger("test")
|
||||
log = CoPilotLogMetadata(base_logger, session_id="s1", user_id="u1")
|
||||
assert log is not None
|
||||
|
||||
def test_filters_none_values(self):
|
||||
import logging
|
||||
|
||||
base_logger = logging.getLogger("test")
|
||||
log = CoPilotLogMetadata(
|
||||
base_logger, session_id="s1", user_id=None, turn_id="t1"
|
||||
)
|
||||
assert log is not None
|
||||
@@ -64,6 +64,7 @@ class ChatMessage(BaseModel):
|
||||
refusal: str | None = None
|
||||
tool_calls: list[dict] | None = None
|
||||
function_call: dict | None = None
|
||||
sequence: int | None = None
|
||||
duration_ms: int | None = None
|
||||
|
||||
@staticmethod
|
||||
@@ -77,10 +78,54 @@ class ChatMessage(BaseModel):
|
||||
refusal=prisma_message.refusal,
|
||||
tool_calls=_parse_json_field(prisma_message.toolCalls),
|
||||
function_call=_parse_json_field(prisma_message.functionCall),
|
||||
sequence=prisma_message.sequence,
|
||||
duration_ms=prisma_message.durationMs,
|
||||
)
|
||||
|
||||
|
||||
def is_message_duplicate(
    messages: list[ChatMessage],
    role: str,
    content: str,
) -> bool:
    """Check whether *content* is already present in the current pending turn.

    Only inspects trailing messages that share the given *role* (i.e. the
    current turn). This ensures legitimately repeated messages across different
    turns are not suppressed, while same-turn duplicates from stale cache are
    still caught.
    """
    for m in reversed(messages):
        if m.role == role:
            if m.content == content:
                return True
        else:
            break
    return False
|
||||
|
||||
|
||||
def maybe_append_user_message(
    session: "ChatSession",
    message: str | None,
    is_user_message: bool,
) -> bool:
    """Append a user/assistant message to the session if not already present.

    The route handler already persists the user message before enqueueing,
    so we check trailing same-role messages to avoid re-appending when the
    session cache is slightly stale.

    Returns True if the message was appended, False if skipped.
    """
    if not message:
        return False
    role = "user" if is_user_message else "assistant"
    if is_message_duplicate(session.messages, role, message):
        return False
    session.messages.append(ChatMessage(role=role, content=message))
    return True
|
||||
|
||||
|
||||
class Usage(BaseModel):
|
||||
prompt_tokens: int
|
||||
completion_tokens: int
|
||||
|
||||
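The trailing-turn duplicate check added in this hunk can be tried in isolation. The sketch below restates the same idea with an early `break`, which should be behaviourally equivalent to the nested form in the diff; `Msg` and `is_duplicate_in_trailing_turn` are hypothetical stand-ins for `ChatMessage` and `is_message_duplicate`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Msg:
    role: str
    content: Optional[str]


def is_duplicate_in_trailing_turn(messages: list[Msg], role: str, content: str) -> bool:
    """Scan backwards over the trailing run of same-role messages only."""
    for m in reversed(messages):
        if m.role != role:
            break  # reached the previous turn; stop looking
        if m.content == content:
            return True
    return False


history = [Msg("user", "yes"), Msg("assistant", "ok")]
# "yes" was said in an earlier turn, so sending it again is not a duplicate.
print(is_duplicate_in_trailing_turn(history, "user", "yes"))
```

Stopping at the first message with a different role is what lets a user legitimately repeat a short reply such as "yes" in a later turn.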
@@ -17,6 +17,8 @@ from .model import (
|
||||
ChatSession,
|
||||
Usage,
|
||||
get_chat_session,
|
||||
is_message_duplicate,
|
||||
maybe_append_user_message,
|
||||
upsert_chat_session,
|
||||
)
|
||||
|
||||
@@ -424,3 +426,151 @@ async def test_concurrent_saves_collision_detection(setup_test_user, test_user_i
|
||||
assert "Streaming message 1" in contents
|
||||
assert "Streaming message 2" in contents
|
||||
assert "Callback result" in contents
|
||||
|
||||
|
||||
# --------------------------------------------------------------------------- #
|
||||
# is_message_duplicate #
|
||||
# --------------------------------------------------------------------------- #
|
||||
|
||||
|
||||
def test_duplicate_detected_in_trailing_same_role():
|
||||
"""Duplicate user message at the tail is detected."""
|
||||
msgs = [
|
||||
ChatMessage(role="user", content="hello"),
|
||||
ChatMessage(role="assistant", content="hi there"),
|
||||
ChatMessage(role="user", content="yes"),
|
||||
]
|
||||
assert is_message_duplicate(msgs, "user", "yes") is True
|
||||
|
||||
|
||||
def test_duplicate_not_detected_across_turns():
|
||||
"""Same text in a previous turn (separated by assistant) is NOT a duplicate."""
|
||||
msgs = [
|
||||
ChatMessage(role="user", content="yes"),
|
||||
ChatMessage(role="assistant", content="ok"),
|
||||
]
|
||||
assert is_message_duplicate(msgs, "user", "yes") is False
|
||||
|
||||
|
||||
def test_no_duplicate_on_empty_messages():
|
||||
"""Empty message list never reports a duplicate."""
|
||||
assert is_message_duplicate([], "user", "hello") is False
|
||||
|
||||
|
||||
def test_no_duplicate_when_content_differs():
|
||||
"""Different content in the trailing same-role block is not a duplicate."""
|
||||
msgs = [
|
||||
ChatMessage(role="assistant", content="response"),
|
||||
ChatMessage(role="user", content="first message"),
|
||||
]
|
||||
assert is_message_duplicate(msgs, "user", "second message") is False
|
||||
|
||||
|
||||
def test_duplicate_with_multiple_trailing_same_role():
|
||||
"""Detects duplicate among multiple consecutive same-role messages."""
|
||||
msgs = [
|
||||
ChatMessage(role="assistant", content="response"),
|
||||
ChatMessage(role="user", content="msg1"),
|
||||
ChatMessage(role="user", content="msg2"),
|
||||
]
|
||||
assert is_message_duplicate(msgs, "user", "msg1") is True
|
||||
assert is_message_duplicate(msgs, "user", "msg2") is True
|
||||
assert is_message_duplicate(msgs, "user", "msg3") is False
|
||||
|
||||
|
||||
def test_duplicate_check_for_assistant_role():
|
||||
"""Works correctly when checking assistant role too."""
|
||||
msgs = [
|
||||
ChatMessage(role="user", content="hi"),
|
||||
ChatMessage(role="assistant", content="hello"),
|
||||
ChatMessage(role="assistant", content="how can I help?"),
|
||||
]
|
||||
assert is_message_duplicate(msgs, "assistant", "hello") is True
|
||||
assert is_message_duplicate(msgs, "assistant", "new response") is False
|
||||
|
||||
|
||||
def test_no_false_positive_when_content_is_none():
|
||||
"""Messages with content=None in the trailing block do not match."""
|
||||
msgs = [
|
||||
ChatMessage(role="user", content=None),
|
||||
ChatMessage(role="user", content="hello"),
|
||||
]
|
||||
assert is_message_duplicate(msgs, "user", "hello") is True
|
||||
# None-content message should not match any string
|
||||
msgs2 = [
|
||||
ChatMessage(role="user", content=None),
|
||||
]
|
||||
assert is_message_duplicate(msgs2, "user", "hello") is False
|
||||
|
||||
|
||||
def test_all_same_role_messages():
|
||||
"""When all messages share the same role, the entire list is scanned."""
|
||||
msgs = [
|
||||
ChatMessage(role="user", content="first"),
|
||||
ChatMessage(role="user", content="second"),
|
||||
ChatMessage(role="user", content="third"),
|
||||
]
|
||||
assert is_message_duplicate(msgs, "user", "first") is True
|
||||
assert is_message_duplicate(msgs, "user", "new") is False
|
||||
|
||||
|
||||
# --------------------------------------------------------------------------- #
|
||||
# maybe_append_user_message #
|
||||
# --------------------------------------------------------------------------- #
|
||||
|
||||
|
||||
def test_maybe_append_user_message_appends_new():
|
||||
"""A new user message is appended and returns True."""
|
||||
session = ChatSession.new(user_id="u", dry_run=False)
|
||||
session.messages = [
|
||||
ChatMessage(role="assistant", content="hello"),
|
||||
]
|
||||
result = maybe_append_user_message(session, "new msg", is_user_message=True)
|
||||
assert result is True
|
||||
assert len(session.messages) == 2
|
||||
assert session.messages[-1].role == "user"
|
||||
assert session.messages[-1].content == "new msg"
|
||||
|
||||
|
||||
def test_maybe_append_user_message_skips_duplicate():
|
||||
"""A duplicate user message is skipped and returns False."""
|
||||
session = ChatSession.new(user_id="u", dry_run=False)
|
||||
session.messages = [
|
||||
ChatMessage(role="assistant", content="hello"),
|
||||
ChatMessage(role="user", content="dup"),
|
||||
]
|
||||
result = maybe_append_user_message(session, "dup", is_user_message=True)
|
||||
assert result is False
|
||||
assert len(session.messages) == 2
|
||||
|
||||
|
||||
def test_maybe_append_user_message_none_message():
|
||||
"""None/empty message returns False without appending."""
|
||||
session = ChatSession.new(user_id="u", dry_run=False)
|
||||
assert maybe_append_user_message(session, None, is_user_message=True) is False
|
||||
assert maybe_append_user_message(session, "", is_user_message=True) is False
|
||||
assert len(session.messages) == 0
|
||||
|
||||
|
||||
def test_maybe_append_assistant_message():
|
||||
"""Works for assistant role when is_user_message=False."""
|
||||
session = ChatSession.new(user_id="u", dry_run=False)
|
||||
session.messages = [
|
||||
ChatMessage(role="user", content="hi"),
|
||||
]
|
||||
result = maybe_append_user_message(session, "response", is_user_message=False)
|
||||
assert result is True
|
||||
assert session.messages[-1].role == "assistant"
|
||||
assert session.messages[-1].content == "response"
|
||||
|
||||
|
||||
def test_maybe_append_assistant_skips_duplicate():
|
||||
"""Duplicate assistant message is skipped."""
|
||||
session = ChatSession.new(user_id="u", dry_run=False)
|
||||
session.messages = [
|
||||
ChatMessage(role="user", content="hi"),
|
||||
ChatMessage(role="assistant", content="dup"),
|
||||
]
|
||||
result = maybe_append_user_message(session, "dup", is_user_message=False)
|
||||
assert result is False
|
||||
assert len(session.messages) == 2
|
||||
|
||||
@@ -126,6 +126,21 @@ After building the file, reference it with `@@agptfile:` in other tools:
|
||||
- When spawning sub-agents for research, ensure each has a distinct
|
||||
non-overlapping scope to avoid redundant searches.
|
||||
|
||||
|
||||
### Tool Discovery Priority
|
||||
|
||||
When the user asks to interact with a service or API, follow this order:
|
||||
|
||||
1. **find_block first** — Search platform blocks with `find_block`. The platform has hundreds of built-in blocks (Google Sheets, Docs, Calendar, Gmail, Slack, GitHub, etc.) that work without extra setup.
|
||||
|
||||
2. **run_mcp_tool** — If no matching block exists, check if a hosted MCP server is available for the service. Only use known MCP server URLs from the registry.
|
||||
|
||||
3. **SendAuthenticatedWebRequestBlock** — If no block or MCP server exists, use `SendAuthenticatedWebRequestBlock` with existing host-scoped credentials. Check available credentials via `connect_integration`.
|
||||
|
||||
4. **Manual API call** — As a last resort, guide the user to set up credentials and use `SendAuthenticatedWebRequestBlock` with direct API calls.
|
||||
|
||||
**Never skip step 1.** Built-in blocks are more reliable, tested, and user-friendly than MCP or raw API calls.
|
||||
|
||||
### Sub-agent tasks
|
||||
- When using the Task tool, NEVER set `run_in_background` to true.
|
||||
All tasks must run in the foreground.
|
||||
|
||||
@@ -9,11 +9,15 @@ UTC). Fails open when Redis is unavailable to avoid blocking users.
|
||||
import asyncio
|
||||
import logging
|
||||
from datetime import UTC, datetime, timedelta
|
||||
from enum import Enum
|
||||
|
||||
from prisma.models import User as PrismaUser
|
||||
from pydantic import BaseModel, Field
|
||||
from redis.exceptions import RedisError
|
||||
|
||||
from backend.data.db_accessors import user_db
|
||||
from backend.data.redis_client import get_redis_async
|
||||
from backend.util.cache import cached
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
@@ -21,6 +25,40 @@ logger = logging.getLogger(__name__)
|
||||
_USAGE_KEY_PREFIX = "copilot:usage"
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Subscription tier definitions
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class SubscriptionTier(str, Enum):
    """Subscription tiers with increasing token allowances.

    Mirrors the ``SubscriptionTier`` enum in ``schema.prisma``.
    Once ``prisma generate`` is run, this can be replaced with::

        from prisma.enums import SubscriptionTier
    """

    FREE = "FREE"
    PRO = "PRO"
    BUSINESS = "BUSINESS"
    ENTERPRISE = "ENTERPRISE"


# Multiplier applied to the base limits (from LD / config) for each tier.
# Intentionally int (not float): keeps limits as whole token counts and avoids
# floating-point rounding. If fractional multipliers are ever needed, change
# the type and round the result in get_global_rate_limits().
TIER_MULTIPLIERS: dict[SubscriptionTier, int] = {
    SubscriptionTier.FREE: 1,
    SubscriptionTier.PRO: 5,
    SubscriptionTier.BUSINESS: 20,
    SubscriptionTier.ENTERPRISE: 60,
}

DEFAULT_TIER = SubscriptionTier.FREE
|
||||
|
||||
|
||||
class UsageWindow(BaseModel):
|
||||
"""Usage within a single time window."""
|
||||
|
||||
@@ -36,6 +74,7 @@ class CoPilotUsageStatus(BaseModel):

     daily: UsageWindow
     weekly: UsageWindow
+    tier: SubscriptionTier = DEFAULT_TIER
     reset_cost: int = Field(
         default=0,
         description="Credit cost (in cents) to reset the daily limit. 0 = feature disabled.",
@@ -66,6 +105,7 @@ async def get_usage_status(
     daily_token_limit: int,
     weekly_token_limit: int,
     rate_limit_reset_cost: int = 0,
+    tier: SubscriptionTier = DEFAULT_TIER,
 ) -> CoPilotUsageStatus:
     """Get current usage status for a user.

@@ -74,6 +114,7 @@ async def get_usage_status(
         daily_token_limit: Max tokens per day (0 = unlimited).
         weekly_token_limit: Max tokens per week (0 = unlimited).
         rate_limit_reset_cost: Credit cost (cents) to reset daily limit (0 = disabled).
+        tier: The user's rate-limit tier (included in the response).

     Returns:
         CoPilotUsageStatus with current usage and limits.
@@ -103,6 +144,7 @@ async def get_usage_status(
             limit=weekly_token_limit,
             resets_at=_weekly_reset_time(now=now),
         ),
+        tier=tier,
         reset_cost=rate_limit_reset_cost,
     )

@@ -343,20 +385,103 @@ async def record_token_usage(
     )


+class _UserNotFoundError(Exception):
+    """Raised when a user record is missing or has no subscription tier.
+
+    Used internally by ``_fetch_user_tier`` to signal a cache-miss condition:
+    by raising instead of returning ``DEFAULT_TIER``, we prevent the ``@cached``
+    decorator from storing the fallback value. This avoids a race condition
+    where a non-existent user's DEFAULT_TIER is cached, then the user is
+    created with a higher tier but receives the stale cached FREE tier for
+    up to 5 minutes.
+    """
+
+
+@cached(maxsize=1000, ttl_seconds=300, shared_cache=True)
+async def _fetch_user_tier(user_id: str) -> SubscriptionTier:
+    """Fetch the user's rate-limit tier from the database (cached via Redis).
+
+    Uses ``shared_cache=True`` so that tier changes propagate across all pods
+    immediately when the cache entry is invalidated (via ``cache_delete``).
+
+    Only successful DB lookups of existing users with a valid tier are cached.
+    Raises ``_UserNotFoundError`` when the user is missing or has no tier, so
+    the ``@cached`` decorator does **not** store a fallback value. This
+    prevents a race condition where a non-existent user's ``DEFAULT_TIER`` is
+    cached and then persists after the user is created with a higher tier.
+    """
+    try:
+        user = await user_db().get_user_by_id(user_id)
+    except Exception:
+        raise _UserNotFoundError(user_id)
+    if user.subscription_tier:
+        return SubscriptionTier(user.subscription_tier)
+    raise _UserNotFoundError(user_id)
+
+
+async def get_user_tier(user_id: str) -> SubscriptionTier:
+    """Look up the user's rate-limit tier from the database.
+
+    Successful results are cached for 5 minutes (via ``_fetch_user_tier``)
+    to avoid a DB round-trip on every rate-limit check.
+
+    Falls back to ``DEFAULT_TIER`` **without caching** when the DB is
+    unreachable or returns an unrecognised value, so the next call retries
+    the query instead of serving a stale fallback for up to 5 minutes.
+    """
+    try:
+        return await _fetch_user_tier(user_id)
+    except Exception as exc:
+        logger.warning(
+            "Failed to resolve rate-limit tier for user %s, defaulting to %s: %s",
+            user_id[:8],
+            DEFAULT_TIER.value,
+            exc,
+        )
+        return DEFAULT_TIER
+
+
+# Expose cache management on the public function so callers (including tests)
+# never need to reach into the private ``_fetch_user_tier``.
+get_user_tier.cache_clear = _fetch_user_tier.cache_clear  # type: ignore[attr-defined]
+get_user_tier.cache_delete = _fetch_user_tier.cache_delete  # type: ignore[attr-defined]
+
+
+async def set_user_tier(user_id: str, tier: SubscriptionTier) -> None:
+    """Persist the user's rate-limit tier to the database.
+
+    Also invalidates the ``get_user_tier`` cache for this user so that
+    subsequent rate-limit checks immediately see the new tier.
+
+    Raises:
+        prisma.errors.RecordNotFoundError: If the user does not exist.
+    """
+    await PrismaUser.prisma().update(
+        where={"id": user_id},
+        data={"subscriptionTier": tier.value},
+    )
+    # Invalidate cached tier so rate-limit checks pick up the change immediately.
+    get_user_tier.cache_delete(user_id)  # type: ignore[attr-defined]
+
+
 async def get_global_rate_limits(
     user_id: str,
     config_daily: int,
     config_weekly: int,
-) -> tuple[int, int]:
+) -> tuple[int, int, SubscriptionTier]:
     """Resolve global rate limits from LaunchDarkly, falling back to config.

+    The base limits (from LD or config) are multiplied by the user's
+    tier multiplier so that higher tiers receive proportionally larger
+    allowances.
+
     Args:
         user_id: User ID for LD flag evaluation context.
         config_daily: Fallback daily limit from ChatConfig.
         config_weekly: Fallback weekly limit from ChatConfig.

     Returns:
-        (daily_token_limit, weekly_token_limit) tuple.
+        (daily_token_limit, weekly_token_limit, tier) 3-tuple.
     """
     # Lazy import to avoid circular dependency:
     # rate_limit -> feature_flag -> settings -> ... -> rate_limit
|
||||
@@ -378,7 +503,15 @@ async def get_global_rate_limits(
     except (TypeError, ValueError):
         logger.warning("Invalid LD value for weekly token limit: %r", weekly_raw)
         weekly = config_weekly
-    return daily, weekly
+
+    # Apply tier multiplier
+    tier = await get_user_tier(user_id)
+    multiplier = TIER_MULTIPLIERS.get(tier, 1)
+    if multiplier != 1:
+        daily = daily * multiplier
+        weekly = weekly * multiplier
+
+    return daily, weekly, tier


 async def reset_user_usage(user_id: str, *, reset_weekly: bool = False) -> None:
|
||||
|
||||
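Review note: the tier arithmetic added to `get_global_rate_limits` can be sketched in isolation. The enum and multiplier table below are copied from the diff; `apply_tier` is a hypothetical helper and the base limits used in the usage note are made-up values, not the project's configuration.

```python
from enum import Enum


class SubscriptionTier(str, Enum):
    FREE = "FREE"
    PRO = "PRO"
    BUSINESS = "BUSINESS"
    ENTERPRISE = "ENTERPRISE"


# Integer multipliers, as in the diff, so limits stay whole token counts.
TIER_MULTIPLIERS: dict[SubscriptionTier, int] = {
    SubscriptionTier.FREE: 1,
    SubscriptionTier.PRO: 5,
    SubscriptionTier.BUSINESS: 20,
    SubscriptionTier.ENTERPRISE: 60,
}


def apply_tier(
    daily: int, weekly: int, tier: SubscriptionTier
) -> tuple[int, int, SubscriptionTier]:
    """Scale base limits by the tier multiplier (hypothetical helper)."""
    multiplier = TIER_MULTIPLIERS.get(tier, 1)  # unknown tiers fall back to 1x
    return daily * multiplier, weekly * multiplier, tier
```

With hypothetical base limits of 1,000,000 daily and 5,000,000 weekly tokens, `apply_tier(1_000_000, 5_000_000, SubscriptionTier.PRO)` gives 5,000,000 and 25,000,000. A base limit of 0, which the docstrings define as unlimited, stays 0 under any multiplier.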
File diff suppressed because it is too large
@@ -9,7 +9,7 @@ import pytest
 from fastapi import HTTPException

 from backend.api.features.chat.routes import reset_copilot_usage
-from backend.copilot.rate_limit import CoPilotUsageStatus, UsageWindow
+from backend.copilot.rate_limit import CoPilotUsageStatus, SubscriptionTier, UsageWindow
 from backend.util.exceptions import InsufficientBalanceError


@@ -53,6 +53,18 @@ def _mock_settings(enable_credit: bool = True):
     return mock


+def _mock_rate_limits(
+    daily: int = 2_500_000,
+    weekly: int = 12_500_000,
+    tier: SubscriptionTier = SubscriptionTier.PRO,
+):
+    """Mock get_global_rate_limits to return fixed limits (no tier multiplier)."""
+    return patch(
+        f"{_MODULE}.get_global_rate_limits",
+        AsyncMock(return_value=(daily, weekly, tier)),
+    )
+
+
 @pytest.mark.asyncio
 class TestResetCopilotUsage:
     async def test_feature_disabled_returns_400(self):
@@ -70,10 +82,7 @@ class TestResetCopilotUsage:
         with (
             patch(f"{_MODULE}.config", _make_config(daily_token_limit=0)),
             patch(f"{_MODULE}.settings", _mock_settings()),
-            patch(
-                f"{_MODULE}.get_global_rate_limits",
-                AsyncMock(return_value=(0, 12_500_000)),
-            ),
+            _mock_rate_limits(daily=0),
         ):
             with pytest.raises(HTTPException) as exc_info:
                 await reset_copilot_usage(user_id="user-1")
@@ -87,10 +96,7 @@ class TestResetCopilotUsage:
         with (
             patch(f"{_MODULE}.config", cfg),
             patch(f"{_MODULE}.settings", _mock_settings()),
-            patch(
-                f"{_MODULE}.get_global_rate_limits",
-                AsyncMock(return_value=(2_500_000, 12_500_000)),
-            ),
+            _mock_rate_limits(),
             patch(f"{_MODULE}.get_daily_reset_count", AsyncMock(return_value=0)),
             patch(f"{_MODULE}.acquire_reset_lock", AsyncMock(return_value=True)),
             patch(f"{_MODULE}.release_reset_lock", AsyncMock()) as mock_release,
@@ -120,10 +126,7 @@ class TestResetCopilotUsage:
         with (
             patch(f"{_MODULE}.config", cfg),
             patch(f"{_MODULE}.settings", _mock_settings()),
-            patch(
-                f"{_MODULE}.get_global_rate_limits",
-                AsyncMock(return_value=(2_500_000, 12_500_000)),
-            ),
+            _mock_rate_limits(),
             patch(f"{_MODULE}.get_daily_reset_count", AsyncMock(return_value=0)),
             patch(f"{_MODULE}.acquire_reset_lock", AsyncMock(return_value=True)),
             patch(f"{_MODULE}.release_reset_lock", AsyncMock()) as mock_release,
@@ -153,10 +156,7 @@ class TestResetCopilotUsage:
         with (
             patch(f"{_MODULE}.config", cfg),
             patch(f"{_MODULE}.settings", _mock_settings()),
-            patch(
-                f"{_MODULE}.get_global_rate_limits",
-                AsyncMock(return_value=(2_500_000, 12_500_000)),
-            ),
+            _mock_rate_limits(),
             patch(f"{_MODULE}.get_daily_reset_count", AsyncMock(return_value=0)),
             patch(f"{_MODULE}.acquire_reset_lock", AsyncMock(return_value=True)),
             patch(f"{_MODULE}.release_reset_lock", AsyncMock()),
@@ -187,10 +187,7 @@ class TestResetCopilotUsage:
         with (
             patch(f"{_MODULE}.config", cfg),
             patch(f"{_MODULE}.settings", _mock_settings()),
-            patch(
-                f"{_MODULE}.get_global_rate_limits",
-                AsyncMock(return_value=(2_500_000, 12_500_000)),
-            ),
+            _mock_rate_limits(),
             patch(f"{_MODULE}.get_daily_reset_count", AsyncMock(return_value=3)),
         ):
             with pytest.raises(HTTPException) as exc_info:
@@ -228,10 +225,7 @@ class TestResetCopilotUsage:
         with (
             patch(f"{_MODULE}.config", cfg),
             patch(f"{_MODULE}.settings", _mock_settings()),
-            patch(
-                f"{_MODULE}.get_global_rate_limits",
-                AsyncMock(return_value=(2_500_000, 12_500_000)),
-            ),
+            _mock_rate_limits(),
             patch(f"{_MODULE}.get_daily_reset_count", AsyncMock(return_value=0)),
             patch(f"{_MODULE}.acquire_reset_lock", AsyncMock(return_value=True)),
             patch(f"{_MODULE}.release_reset_lock", AsyncMock()) as mock_release,
@@ -252,10 +246,7 @@ class TestResetCopilotUsage:
         with (
             patch(f"{_MODULE}.config", _make_config()),
             patch(f"{_MODULE}.settings", _mock_settings()),
-            patch(
-                f"{_MODULE}.get_global_rate_limits",
-                AsyncMock(return_value=(2_500_000, 12_500_000)),
-            ),
+            _mock_rate_limits(),
             patch(f"{_MODULE}.get_daily_reset_count", AsyncMock(return_value=None)),
         ):
             with pytest.raises(HTTPException) as exc_info:
@@ -273,10 +264,7 @@ class TestResetCopilotUsage:
         with (
             patch(f"{_MODULE}.config", cfg),
             patch(f"{_MODULE}.settings", _mock_settings()),
-            patch(
-                f"{_MODULE}.get_global_rate_limits",
-                AsyncMock(return_value=(2_500_000, 12_500_000)),
-            ),
+            _mock_rate_limits(),
             patch(f"{_MODULE}.get_daily_reset_count", AsyncMock(return_value=0)),
             patch(f"{_MODULE}.acquire_reset_lock", AsyncMock(return_value=True)),
             patch(f"{_MODULE}.release_reset_lock", AsyncMock()),
@@ -307,10 +295,7 @@ class TestResetCopilotUsage:
         with (
             patch(f"{_MODULE}.config", cfg),
             patch(f"{_MODULE}.settings", _mock_settings()),
-            patch(
-                f"{_MODULE}.get_global_rate_limits",
-                AsyncMock(return_value=(2_500_000, 12_500_000)),
-            ),
+            _mock_rate_limits(),
             patch(f"{_MODULE}.get_daily_reset_count", AsyncMock(return_value=0)),
             patch(f"{_MODULE}.acquire_reset_lock", AsyncMock(return_value=True)),
             patch(f"{_MODULE}.release_reset_lock", AsyncMock()),
|
||||
|
||||
@@ -53,6 +53,12 @@ Steps:
|
||||
or fix manually based on the error descriptions. Iterate until valid.
|
||||
8. **Save**: Call `create_agent` (new) or `edit_agent` (existing) with
|
||||
the final `agent_json`
|
||||
9. **Dry-run**: ALWAYS call `run_agent` with `dry_run=True` and
   `wait_for_result=120` to verify the agent works end-to-end.
10. **Inspect & fix**: Check the dry-run output for errors. If issues are
   found, call `edit_agent` to fix and dry-run again. Repeat until the
   simulation passes or the problems are clearly unfixable.
|
||||
See "REQUIRED: Dry-Run Verification Loop" section below for details.
|
||||
|
||||
### Agent JSON Structure
|
||||
|
||||
@@ -246,19 +252,51 @@ call in a loop until the task is complete:
|
||||
Regular blocks work exactly like sub-agents as tools — wire each input
|
||||
field from `source_name: "tools"` on the Orchestrator side.
|
||||
|
||||
### Testing with Dry Run
|
||||
### REQUIRED: Dry-Run Verification Loop (create -> dry-run -> fix)
|
||||
|
||||
After saving an agent, suggest a dry run to validate wiring without consuming
|
||||
real API calls, credentials, or credits:
|
||||
After creating or editing an agent, you MUST dry-run it before telling the
|
||||
user the agent is ready. NEVER skip this step.
|
||||
|
||||
1. **Run**: Call `run_agent` or `run_block` with `dry_run=True` and provide
|
||||
sample inputs. This executes the graph with mock outputs, verifying that
|
||||
links resolve correctly and required inputs are satisfied.
|
||||
2. **Check results**: Call `view_agent_output` with `show_execution_details=True`
|
||||
to inspect the full node-by-node execution trace. This shows what each node
|
||||
received as input and produced as output, making it easy to spot wiring issues.
|
||||
3. **Iterate**: If the dry run reveals wiring issues or missing inputs, fix
|
||||
the agent JSON and re-save before suggesting a real execution.
|
||||
#### Step-by-step workflow
|
||||
|
||||
1. **Create/Edit**: Call `create_agent` or `edit_agent` to save the agent.
|
||||
2. **Dry-run**: Call `run_agent` with `dry_run=True`, `wait_for_result=120`,
|
||||
and realistic sample inputs that exercise every path in the agent. This
|
||||
simulates execution using an LLM for each block — no real API calls,
|
||||
credentials, or credits are consumed.
|
||||
3. **Inspect output**: Examine the dry-run result for problems. If
|
||||
`wait_for_result` returns only a summary, call
|
||||
`view_agent_output(execution_id=..., show_execution_details=True)` to
|
||||
see the full node-by-node execution trace. Look for:
|
||||
- **Errors / failed nodes** — a node raised an exception or returned an
|
||||
error status. Common causes: wrong `source_name`/`sink_name` in links,
|
||||
missing `input_default` values, or referencing a nonexistent block output.
|
||||
- **Null / empty outputs** — data did not flow through a link. Verify that
|
||||
`source_name` and `sink_name` match the block schemas exactly (case-
|
||||
sensitive, including nested `_#_` notation).
|
||||
- **Nodes that never executed** — the node was not reached. Likely a
|
||||
missing or broken link from an upstream node.
|
||||
- **Unexpected values** — data arrived but in the wrong type or
|
||||
structure. Check type compatibility between linked ports.
|
||||
4. **Fix**: If any issues are found, call `edit_agent` with the corrected
|
||||
agent JSON, then go back to step 2.
|
||||
5. **Repeat**: Continue the dry-run -> fix cycle until the simulation passes
|
||||
or the problems are clearly unfixable. If you stop making progress,
|
||||
report the remaining issues to the user and ask for guidance.
|
||||
|
||||
#### Good vs bad dry-run output
|
||||
|
||||
**Good output** (agent is ready):
|
||||
- All nodes executed successfully (no errors in the execution trace)
|
||||
- Data flows through every link with non-null, correctly-typed values
|
||||
- The final `AgentOutputBlock` contains a meaningful result
|
||||
- Status is `COMPLETED`
|
||||
|
||||
**Bad output** (needs fixing):
|
||||
- Status is `FAILED` — check the error message for the failing node
|
||||
- An output node received `null` — trace back to find the broken link
|
||||
- A node received data in the wrong format (e.g. string where list expected)
|
||||
- Nodes downstream of a failing node were skipped entirely
|
||||
|
||||
**Special block behaviour in dry-run mode:**
|
||||
- **OrchestratorBlock** and **AgentExecutorBlock** execute for real so the
|
||||
|
||||
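Review note: the create -> dry-run -> fix loop described in the hunk above can be sketched as a bounded retry loop. This is only an illustration under stated assumptions: `run_dry` and `fix` are hypothetical callables standing in for `run_agent(dry_run=True)` and `edit_agent`, and `max_rounds` is an assumed cap that the prompt text itself does not specify.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class DryRunResult:
    status: str                                  # "COMPLETED" or "FAILED"
    errors: list[str] = field(default_factory=list)


def verify_agent(
    run_dry: Callable[[], DryRunResult],
    fix: Callable[[list[str]], None],
    max_rounds: int = 3,                         # assumed cap, not from the source
) -> tuple[bool, list[str]]:
    """Dry-run, inspect, fix, and repeat until the run passes or rounds run out."""
    errors: list[str] = []
    for attempt in range(max_rounds):
        result = run_dry()
        if result.status == "COMPLETED" and not result.errors:
            return True, []
        errors = result.errors
        if attempt < max_rounds - 1:
            fix(errors)                          # only fix when a re-run will follow
    return False, errors                         # report remaining issues to the user


def demo() -> tuple[bool, list[str], int]:
    state = {"link_fixed": False, "runs": 0}

    def run_dry() -> DryRunResult:
        state["runs"] += 1
        if state["link_fixed"]:
            return DryRunResult("COMPLETED")
        return DryRunResult("FAILED", ["output node received null"])

    def fix(errors: list[str]) -> None:
        state["link_fixed"] = True               # pretend the broken link was repaired

    ok, remaining = verify_agent(run_dry, fix)
    return ok, remaining, state["runs"]
```

In the demo the first dry run fails on a broken link, one fix is applied, and the second dry run passes.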
@@ -28,13 +28,12 @@ Each result includes a `remotes` array with the exact server URL to use.
|
||||
|
||||
### Important: Check blocks first
|
||||
|
||||
Before using `run_mcp_tool`, always check if the platform already has blocks for the service
|
||||
using `find_block`. The platform has hundreds of built-in blocks (Google Sheets, Google Docs,
|
||||
Google Calendar, Gmail, etc.) that work without MCP setup.
|
||||
Always follow the **Tool Discovery Priority** described in the tool notes:
|
||||
call `find_block` before resorting to `run_mcp_tool`.
|
||||
|
||||
Only use `run_mcp_tool` when:
|
||||
- The service is in the known hosted MCP servers list above, OR
|
||||
- You searched `find_block` first and found no matching blocks
|
||||
- You searched `find_block` first and found no matching blocks, AND
|
||||
- The service is in the known hosted MCP servers list above or found via the registry API
|
||||
|
||||
**Never guess or construct MCP server URLs.** Only use URLs from the known servers list above
|
||||
or from the `remotes[].url` field in MCP registry search results.
|
||||
|
||||
@@ -8,20 +8,19 @@ from uuid import uuid4

 import pytest

-from backend.util import json
-from backend.util.prompt import CompressResult
-
-from .conftest import build_test_transcript as _build_transcript
-from .service import _friendly_error_text, _is_prompt_too_long
-from .transcript import (
+from backend.copilot.transcript import (
     _flatten_assistant_content,
     _flatten_tool_result_content,
     _messages_to_transcript,
     _run_compression,
     _transcript_to_messages,
-    compact_transcript,
-    validate_transcript,
 )
+from backend.util import json
+from backend.util.prompt import CompressResult
+
+from .conftest import build_test_transcript as _build_transcript
+from .service import _friendly_error_text, _is_prompt_too_long
+from .transcript import compact_transcript, validate_transcript

 # ---------------------------------------------------------------------------
 # _flatten_assistant_content
|
||||
@@ -403,7 +402,7 @@ class TestCompactTranscript:
|
||||
},
|
||||
)()
|
||||
with patch(
|
||||
"backend.copilot.sdk.transcript._run_compression",
|
||||
"backend.copilot.transcript._run_compression",
|
||||
new_callable=AsyncMock,
|
||||
return_value=mock_result,
|
||||
):
|
||||
@@ -438,7 +437,7 @@ class TestCompactTranscript:
|
||||
},
|
||||
)()
|
||||
with patch(
|
||||
"backend.copilot.sdk.transcript._run_compression",
|
||||
"backend.copilot.transcript._run_compression",
|
||||
new_callable=AsyncMock,
|
||||
return_value=mock_result,
|
||||
):
|
||||
@@ -462,7 +461,7 @@ class TestCompactTranscript:
|
||||
]
|
||||
)
|
||||
with patch(
|
||||
"backend.copilot.sdk.transcript._run_compression",
|
||||
"backend.copilot.transcript._run_compression",
|
||||
new_callable=AsyncMock,
|
||||
side_effect=RuntimeError("LLM unavailable"),
|
||||
):
|
||||
@@ -568,11 +567,11 @@ class TestRunCompressionTimeout:
|
||||
|
||||
with (
|
||||
patch(
|
||||
"backend.copilot.sdk.transcript.get_openai_client",
|
||||
"backend.copilot.transcript.get_openai_client",
|
||||
return_value="fake-client",
|
||||
),
|
||||
patch(
|
||||
"backend.copilot.sdk.transcript.compress_context",
|
||||
"backend.copilot.transcript.compress_context",
|
||||
side_effect=_mock_compress,
|
||||
),
|
||||
):
|
||||
@@ -602,11 +601,11 @@ class TestRunCompressionTimeout:
|
||||
|
||||
with (
|
||||
patch(
|
||||
"backend.copilot.sdk.transcript.get_openai_client",
|
||||
"backend.copilot.transcript.get_openai_client",
|
||||
return_value=None,
|
||||
),
|
||||
patch(
|
||||
"backend.copilot.sdk.transcript.compress_context",
|
||||
"backend.copilot.transcript.compress_context",
|
||||
new_callable=AsyncMock,
|
||||
return_value=truncation_result,
|
||||
) as mock_compress,
|
||||
|
||||
@@ -26,18 +26,17 @@ from unittest.mock import AsyncMock, MagicMock, patch
|
||||
|
||||
import pytest
|
||||
|
||||
from backend.util import json
|
||||
|
||||
from .conftest import build_test_transcript as _build_transcript
|
||||
from .service import _MAX_STREAM_ATTEMPTS, _reduce_context
|
||||
from .transcript import (
|
||||
from backend.copilot.transcript import (
|
||||
_flatten_assistant_content,
|
||||
_flatten_tool_result_content,
|
||||
_messages_to_transcript,
|
||||
_transcript_to_messages,
|
||||
compact_transcript,
|
||||
validate_transcript,
|
||||
)
|
||||
from backend.util import json
|
||||
|
||||
from .conftest import build_test_transcript as _build_transcript
|
||||
from .service import _MAX_STREAM_ATTEMPTS, _reduce_context
|
||||
from .transcript import compact_transcript, validate_transcript
|
||||
from .transcript_builder import TranscriptBuilder
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
@@ -113,7 +112,7 @@ class TestScenarioCompactAndRetry:
|
||||
)(),
|
||||
),
|
||||
patch(
|
||||
"backend.copilot.sdk.transcript._run_compression",
|
||||
"backend.copilot.transcript._run_compression",
|
||||
new_callable=AsyncMock,
|
||||
return_value=mock_result,
|
||||
),
|
||||
@@ -170,7 +169,7 @@ class TestScenarioCompactFailsFallback:
|
||||
)(),
|
||||
),
|
||||
patch(
|
||||
"backend.copilot.sdk.transcript._run_compression",
|
||||
"backend.copilot.transcript._run_compression",
|
||||
new_callable=AsyncMock,
|
||||
side_effect=RuntimeError("LLM unavailable"),
|
||||
),
|
||||
@@ -261,7 +260,7 @@ class TestScenarioDoubleFailDBFallback:
|
||||
)(),
|
||||
),
|
||||
patch(
|
||||
"backend.copilot.sdk.transcript._run_compression",
|
||||
"backend.copilot.transcript._run_compression",
|
||||
new_callable=AsyncMock,
|
||||
return_value=mock_result,
|
||||
),
|
||||
@@ -337,7 +336,7 @@ class TestScenarioCompactionIdentical:
|
||||
)(),
|
||||
),
|
||||
patch(
|
||||
"backend.copilot.sdk.transcript._run_compression",
|
||||
"backend.copilot.transcript._run_compression",
|
||||
new_callable=AsyncMock,
|
||||
return_value=mock_result,
|
||||
),
|
||||
@@ -730,7 +729,7 @@ class TestRetryEdgeCases:
|
||||
)(),
|
||||
),
|
||||
patch(
|
||||
"backend.copilot.sdk.transcript._run_compression",
|
||||
"backend.copilot.transcript._run_compression",
|
||||
new_callable=AsyncMock,
|
||||
return_value=mock_result,
|
||||
),
|
||||
@@ -841,7 +840,7 @@ class TestRetryStateReset:
|
||||
)(),
|
||||
),
|
||||
patch(
|
||||
"backend.copilot.sdk.transcript._run_compression",
|
||||
"backend.copilot.transcript._run_compression",
|
||||
new_callable=AsyncMock,
|
||||
side_effect=RuntimeError("boom"),
|
||||
),
|
||||
@@ -1405,9 +1404,9 @@ class TestStreamChatCompletionRetryIntegration:
|
||||
events.append(event)
|
||||
|
||||
# Should NOT retry — only 1 attempt for auth errors
|
||||
assert attempt_count[0] == 1, (
|
||||
f"Expected 1 attempt (no retry for auth error), " f"got {attempt_count[0]}"
|
||||
)
|
||||
assert (
|
||||
attempt_count[0] == 1
|
||||
), f"Expected 1 attempt (no retry for auth error), got {attempt_count[0]}"
|
||||
errors = [e for e in events if isinstance(e, StreamError)]
|
||||
assert errors, "Expected StreamError"
|
||||
assert errors[0].code == "sdk_stream_error"
|
||||
|
||||
@@ -29,16 +29,29 @@ from claude_agent_sdk import (
|
||||
)
|
||||
from langfuse import propagate_attributes
|
||||
from langsmith.integrations.claude_agent_sdk import configure_claude_agent_sdk
|
||||
from opentelemetry import trace as otel_trace
|
||||
from pydantic import BaseModel
|
||||
|
||||
from backend.copilot.context import get_workspace_manager
|
||||
from backend.copilot.permissions import apply_tool_permissions
|
||||
from backend.copilot.rate_limit import get_user_tier
|
||||
from backend.copilot.transcript import (
|
||||
_run_compression,
|
||||
cleanup_stale_project_dirs,
|
||||
compact_transcript,
|
||||
download_transcript,
|
||||
read_compacted_entries,
|
||||
upload_transcript,
|
||||
validate_transcript,
|
||||
write_transcript_to_tempfile,
|
||||
)
|
||||
from backend.copilot.transcript_builder import TranscriptBuilder
|
||||
from backend.data.redis_client import get_redis_async
|
||||
from backend.executor.cluster_lock import AsyncClusterLock
|
||||
from backend.util.exceptions import NotFoundError
|
||||
from backend.util.settings import Settings
|
||||
|
||||
from ..config import ChatConfig
|
||||
from ..config import ChatConfig, CopilotMode
|
||||
from ..constants import (
|
||||
COPILOT_ERROR_PREFIX,
|
||||
COPILOT_RETRYABLE_ERROR_PREFIX,
|
||||
@@ -51,7 +64,7 @@ from ..model import (
|
||||
ChatMessage,
|
||||
ChatSession,
|
||||
get_chat_session,
|
||||
update_session_title,
|
||||
maybe_append_user_message,
|
||||
upsert_chat_session,
|
||||
)
|
||||
from ..prompting import get_sdk_supplement
|
||||
@@ -70,11 +83,7 @@ from ..response_model import (
|
||||
StreamToolOutputAvailable,
|
||||
StreamUsage,
|
||||
)
|
||||
from ..service import (
|
||||
_build_system_prompt,
|
||||
_generate_session_title,
|
||||
_is_langfuse_configured,
|
||||
)
|
||||
from ..service import _build_system_prompt, _is_langfuse_configured, _update_title_async
|
||||
from ..token_tracking import persist_and_record_usage
|
||||
from ..tools.e2b_sandbox import get_or_create_sandbox, pause_sandbox_direct
|
||||
from ..tools.sandbox import WORKSPACE_PREFIX, make_session_path
|
||||
@@ -92,17 +101,6 @@ from .tool_adapter import (
|
||||
set_execution_context,
|
||||
wait_for_stash,
|
||||
)
|
||||
from .transcript import (
|
||||
_run_compression,
|
||||
cleanup_stale_project_dirs,
|
||||
compact_transcript,
|
||||
download_transcript,
|
||||
read_compacted_entries,
|
||||
upload_transcript,
|
||||
validate_transcript,
|
||||
write_transcript_to_tempfile,
|
||||
)
|
||||
from .transcript_builder import TranscriptBuilder
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
config = ChatConfig()
|
||||
@@ -129,6 +127,11 @@ _CIRCUIT_BREAKER_ERROR_MSG = (
     "Try breaking your request into smaller parts."
 )

+# Idle timeout: abort the stream if no meaningful SDK message (only heartbeats)
+# arrives for this many seconds. This catches hung tool calls (e.g. WebSearch
+# hanging on a search provider that never responds).
+_IDLE_TIMEOUT_SECONDS = 10 * 60  # 10 minutes
+
 # Patterns that indicate the prompt/request exceeds the model's context limit.
 # Matched case-insensitively against the full exception chain.
 _PROMPT_TOO_LONG_PATTERNS: tuple[str, ...] = (
|
||||
@@ -1271,6 +1274,8 @@ async def _run_stream_attempt(
|
||||
await client.query(state.query_message, session_id=ctx.session_id)
|
||||
state.transcript_builder.append_user(content=ctx.current_message)
|
||||
|
||||
_last_real_msg_time = time.monotonic()
|
||||
|
||||
async for sdk_msg in _iter_sdk_messages(client):
|
||||
# Heartbeat sentinel — refresh lock and keep SSE alive
|
||||
if sdk_msg is None:
|
||||
@@ -1278,8 +1283,34 @@ async def _run_stream_attempt(
|
||||
for ev in ctx.compaction.emit_start_if_ready():
|
||||
yield ev
|
||||
yield StreamHeartbeat()
|
||||
|
||||
# Idle timeout: if no real SDK message for too long, a tool
|
||||
# call is likely hung (e.g. WebSearch provider not responding).
|
||||
idle_seconds = time.monotonic() - _last_real_msg_time
|
||||
if idle_seconds >= _IDLE_TIMEOUT_SECONDS:
|
||||
logger.error(
|
||||
"%s Idle timeout after %.0fs with no SDK message — "
|
||||
"aborting stream (likely hung tool call)",
|
||||
ctx.log_prefix,
|
||||
idle_seconds,
|
||||
)
|
||||
stream_error_msg = (
|
||||
"A tool call appears to be stuck "
|
||||
"(no response for 10 minutes). "
|
||||
"Please try again."
|
||||
)
|
||||
stream_error_code = "idle_timeout"
|
||||
_append_error_marker(ctx.session, stream_error_msg, retryable=True)
|
||||
yield StreamError(
|
||||
errorText=stream_error_msg,
|
||||
code=stream_error_code,
|
||||
)
|
||||
ended_with_stream_error = True
|
||||
break
|
||||
continue
|
||||
|
||||
_last_real_msg_time = time.monotonic()
|
||||
|
||||
logger.info(
|
||||
"%s Received: %s %s (unresolved=%d, current=%d, resolved=%d)",
|
||||
ctx.log_prefix,
|
||||
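Review note: the idle-timeout logic in the hunk above treats `None` as a heartbeat sentinel and only resets the timer on real messages. A minimal sketch of that control flow, assuming string events and an injectable clock so it can be run deterministically; `relay`, `FakeClock`, and the event strings are hypothetical and are not the SDK's types.

```python
import asyncio
from typing import AsyncIterator, Callable, Optional


class FakeClock:
    """Deterministic clock for the demo; production code would use time.monotonic."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


async def relay(
    source: AsyncIterator[Optional[str]],
    idle_timeout: float,
    clock: Callable[[], float],
) -> AsyncIterator[str]:
    last_real = clock()
    async for msg in source:
        if msg is None:                      # heartbeat sentinel
            yield "heartbeat"
            if clock() - last_real >= idle_timeout:
                yield "error:idle_timeout"   # surface a retryable error, then stop
                break
            continue
        last_real = clock()                  # only real messages reset the timer
        yield msg


async def demo() -> list[str]:
    clock = FakeClock()

    async def source() -> AsyncIterator[Optional[str]]:
        for item in ["a", None, None, None, "b"]:
            clock.now += 250.0               # each item arrives 250 s after the last
            yield item

    return [ev async for ev in relay(source(), idle_timeout=600.0, clock=clock)]
```

In the demo the third heartbeat arrives 750 seconds after the last real message, so the relay emits the error and never reaches `"b"`.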
@@ -1528,9 +1559,21 @@ async def _run_stream_attempt(
|
||||
# --- Intermediate persistence ---
|
||||
# Flush session messages to DB periodically so page reloads
|
||||
# show progress during long-running turns.
|
||||
#
|
||||
# IMPORTANT: Skip the flush while tool calls are pending
|
||||
# (tool_calls set on assistant but results not yet received).
|
||||
# The DB save is append-only (uses start_sequence), so if we
|
||||
# flush the assistant message before tool_calls are set on it
|
||||
# (text and tool_use arrive as separate SDK events), the
|
||||
# tool_calls update is lost — the next flush starts past it.
|
||||
_msgs_since_flush += 1
|
||||
now = time.monotonic()
|
||||
if (
|
||||
has_pending_tools = (
|
||||
acc.has_appended_assistant
|
||||
and acc.accumulated_tool_calls
|
||||
and not acc.has_tool_results
|
||||
)
|
||||
if not has_pending_tools and (
|
||||
_msgs_since_flush >= _FLUSH_MESSAGE_THRESHOLD
|
||||
or (now - _last_flush_time) >= _FLUSH_INTERVAL_SECONDS
|
||||
):
|
||||
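Review note: the flush guard in the hunk above skips intermediate persistence while tool calls are pending, because an append-only save would otherwise miss the later `tool_calls` update. A small sketch of the predicate, assuming a simplified accumulator; the field names follow the diff, while `should_flush` and the threshold defaults are hypothetical (the real `_FLUSH_MESSAGE_THRESHOLD` and `_FLUSH_INTERVAL_SECONDS` values are not shown in this diff).

```python
from dataclasses import dataclass, field


@dataclass
class Accumulator:
    has_appended_assistant: bool = False
    accumulated_tool_calls: list[str] = field(default_factory=list)
    has_tool_results: bool = False


def should_flush(
    acc: Accumulator,
    msgs_since_flush: int,
    seconds_since_flush: float,
    *,
    message_threshold: int = 10,        # assumed value for illustration
    interval_seconds: float = 30.0,     # assumed value for illustration
) -> bool:
    has_pending_tools = (
        acc.has_appended_assistant
        and bool(acc.accumulated_tool_calls)
        and not acc.has_tool_results
    )
    if has_pending_tools:
        return False  # wait until tool results arrive so tool_calls are saved too
    return (
        msgs_since_flush >= message_threshold
        or seconds_since_flush >= interval_seconds
    )
```

For example, `should_flush(Accumulator(True, ["call-1"], False), 100, 999.0)` is `False` even though both thresholds are exceeded, and becomes `True` once `has_tool_results` is set.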
@@ -1630,6 +1673,7 @@ async def stream_chat_completion_sdk(
|
||||
session: ChatSession | None = None,
|
||||
file_ids: list[str] | None = None,
|
||||
permissions: "CopilotPermissions | None" = None,
|
||||
mode: CopilotMode | None = None,
|
||||
**_kwargs: Any,
|
||||
) -> AsyncIterator[StreamBaseResponse]:
|
||||
"""Stream chat completion using Claude Agent SDK.
|
||||
@@ -1638,7 +1682,10 @@ async def stream_chat_completion_sdk(
|
||||
file_ids: Optional workspace file IDs attached to the user's message.
|
||||
Images are embedded as vision content blocks; other files are
|
||||
saved to the SDK working directory for the Read tool.
|
||||
mode: Accepted for signature compatibility with the baseline path.
|
||||
The SDK path does not currently branch on this value.
|
||||
"""
|
||||
_ = mode # SDK path ignores the requested mode.
|
||||
|
||||
if session is None:
|
||||
session = await get_chat_session(session_id, user_id)
|
||||
@@ -1669,19 +1716,12 @@ async def stream_chat_completion_sdk(
|
||||
)
|
||||
session.messages.pop()
|
||||
|
||||
# Append the new message to the session if it's not already there
|
||||
new_message_role = "user" if is_user_message else "assistant"
|
||||
if message and (
|
||||
len(session.messages) == 0
|
||||
or not (
|
||||
session.messages[-1].role == new_message_role
|
||||
and session.messages[-1].content == message
|
||||
)
|
||||
):
|
||||
session.messages.append(ChatMessage(role=new_message_role, content=message))
|
||||
if maybe_append_user_message(session, message, is_user_message):
|
||||
if is_user_message:
|
||||
track_user_message(
|
||||
user_id=user_id, session_id=session_id, message_length=len(message)
|
||||
user_id=user_id,
|
||||
session_id=session_id,
|
||||
message_length=len(message or ""),
|
||||
)
|
||||
|
||||
# Structured log prefix: [SDK][<session>][T<turn>]
|
||||
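The removed inline check in the hunk above suggests the rule that `maybe_append_user_message` now encapsulates: append only when the message is non-empty and does not duplicate the last session message. The helper's real implementation is not part of this diff, so the sketch below only mirrors the removed lines. The name `append_if_new`, its boolean return value, and the simplified `ChatMessage` and `ChatSession` classes are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ChatMessage:
    """Simplified stand-in for the real ChatMessage model."""

    role: str
    content: str


@dataclass
class ChatSession:
    """Simplified stand-in holding only the message list."""

    messages: list = field(default_factory=list)


def append_if_new(session, message, is_user_message):
    """Append `message` unless it is empty or duplicates the last entry.

    Returns True when a message was appended (assumed contract).
    """
    if not message:
        return False
    role = "user" if is_user_message else "assistant"
    last = session.messages[-1] if session.messages else None
    if last is not None and last.role == role and last.content == message:
        return False  # same role and content as the tail: skip the duplicate
    session.messages.append(ChatMessage(role=role, content=message))
    return True
```

Returning a flag lets the caller run follow-up work, such as usage tracking, only when a message was really added, which matches how the new call site nests `track_user_message` under the `if`.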
@@ -1946,15 +1986,20 @@ async def stream_chat_completion_sdk(
|
||||
# langsmith tracing integration attaches them to every span. This
|
||||
# is what Langfuse (or any OTEL backend) maps to its native
|
||||
# user/session fields.
|
||||
_user_tier = await get_user_tier(user_id) if user_id else None
|
||||
_otel_metadata: dict[str, str] = {
|
||||
"resume": str(use_resume),
|
||||
"conversation_turn": str(turn),
|
||||
}
|
||||
if _user_tier:
|
||||
_otel_metadata["subscription_tier"] = _user_tier.value
|
||||
|
||||
_otel_ctx = propagate_attributes(
|
||||
user_id=user_id,
|
||||
session_id=session_id,
|
||||
trace_name="copilot-sdk",
|
||||
tags=["sdk"],
|
||||
metadata={
|
||||
"resume": str(use_resume),
|
||||
"conversation_turn": str(turn),
|
||||
},
|
||||
metadata=_otel_metadata,
|
||||
)
|
||||
_otel_ctx.__enter__()
|
||||
|
||||
@@ -2323,8 +2368,26 @@ async def stream_chat_completion_sdk(
|
||||
|
||||
raise
|
||||
finally:
|
||||
# --- Close OTEL context ---
|
||||
# --- Close OTEL context (with cost attributes) ---
|
||||
if _otel_ctx is not None:
|
||||
try:
|
||||
span = otel_trace.get_current_span()
|
||||
if span and span.is_recording():
|
||||
span.set_attribute("gen_ai.usage.prompt_tokens", turn_prompt_tokens)
|
||||
span.set_attribute(
|
||||
"gen_ai.usage.completion_tokens", turn_completion_tokens
|
||||
)
|
||||
span.set_attribute(
|
||||
"gen_ai.usage.cache_read_tokens", turn_cache_read_tokens
|
||||
)
|
||||
span.set_attribute(
|
||||
"gen_ai.usage.cache_creation_tokens",
|
||||
turn_cache_creation_tokens,
|
||||
)
|
||||
if turn_cost_usd is not None:
|
||||
span.set_attribute("gen_ai.usage.cost_usd", turn_cost_usd)
|
||||
except Exception:
|
||||
logger.debug("Failed to set OTEL cost attributes", exc_info=True)
|
||||
try:
|
||||
_otel_ctx.__exit__(*sys.exc_info())
|
||||
except Exception:
|
||||
@@ -2342,6 +2405,8 @@ async def stream_chat_completion_sdk(
|
||||
cache_creation_tokens=turn_cache_creation_tokens,
|
||||
log_prefix=log_prefix,
|
||||
cost_usd=turn_cost_usd,
|
||||
model=config.model,
|
||||
provider="anthropic",
|
||||
)
|
||||
|
||||
# --- Persist session messages ---
|
||||
@@ -2446,18 +2511,3 @@ async def stream_chat_completion_sdk(
|
||||
finally:
|
||||
# Release stream lock to allow new streams for this session
|
||||
await lock.release()
|
||||
|
||||
|
||||
async def _update_title_async(
|
||||
session_id: str, message: str, user_id: str | None = None
|
||||
) -> None:
|
||||
"""Background task to update session title."""
|
||||
try:
|
||||
title = await _generate_session_title(
|
||||
message, user_id=user_id, session_id=session_id
|
||||
)
|
||||
if title and user_id:
|
||||
await update_session_title(session_id, user_id, title, only_if_empty=True)
|
||||
logger.debug("[SDK] Generated title for %s: %s", session_id, title)
|
||||
except Exception as e:
|
||||
logger.warning("[SDK] Failed to update session title: %s", e)
|
||||
|
||||
@@ -27,20 +27,19 @@ from backend.copilot.response_model import (
|
||||
StreamTextDelta,
|
||||
StreamTextStart,
|
||||
)
|
||||
from backend.util import json
|
||||
|
||||
from .conftest import build_structured_transcript
|
||||
from .response_adapter import SDKResponseAdapter
|
||||
from .service import _format_sdk_content_blocks
|
||||
from .transcript import (
|
||||
from backend.copilot.transcript import (
|
||||
_find_last_assistant_entry,
|
||||
_flatten_assistant_content,
|
||||
_messages_to_transcript,
|
||||
_rechain_tail,
|
||||
_transcript_to_messages,
|
||||
compact_transcript,
|
||||
validate_transcript,
|
||||
)
|
||||
from backend.util import json
|
||||
|
||||
from .conftest import build_structured_transcript
|
||||
from .response_adapter import SDKResponseAdapter
|
||||
from .service import _format_sdk_content_blocks
|
||||
from .transcript import compact_transcript, validate_transcript
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Fixtures: realistic thinking block content
|
||||
@@ -439,7 +438,7 @@ class TestCompactTranscriptThinkingBlocks:
|
||||
},
|
||||
)()
|
||||
with patch(
|
||||
"backend.copilot.sdk.transcript._run_compression",
|
||||
"backend.copilot.transcript._run_compression",
|
||||
new_callable=AsyncMock,
|
||||
return_value=mock_result,
|
||||
):
|
||||
@@ -498,7 +497,7 @@ class TestCompactTranscriptThinkingBlocks:
|
||||
)()
|
||||
|
||||
with patch(
|
||||
"backend.copilot.sdk.transcript._run_compression",
|
||||
"backend.copilot.transcript._run_compression",
|
||||
side_effect=mock_compression,
|
||||
):
|
||||
await compact_transcript(transcript, model="test-model")
|
||||
@@ -551,7 +550,7 @@ class TestCompactTranscriptThinkingBlocks:
|
||||
},
|
||||
)()
|
||||
with patch(
|
||||
"backend.copilot.sdk.transcript._run_compression",
|
||||
"backend.copilot.transcript._run_compression",
|
||||
new_callable=AsyncMock,
|
||||
return_value=mock_result,
|
||||
):
|
||||
@@ -601,7 +600,7 @@ class TestCompactTranscriptThinkingBlocks:
|
||||
},
|
||||
)()
|
||||
with patch(
|
||||
"backend.copilot.sdk.transcript._run_compression",
|
||||
"backend.copilot.transcript._run_compression",
|
||||
new_callable=AsyncMock,
|
||||
return_value=mock_result,
|
||||
):
|
||||
@@ -638,7 +637,7 @@ class TestCompactTranscriptThinkingBlocks:
|
||||
},
|
||||
)()
|
||||
with patch(
|
||||
"backend.copilot.sdk.transcript._run_compression",
|
||||
"backend.copilot.transcript._run_compression",
|
||||
new_callable=AsyncMock,
|
||||
return_value=mock_result,
|
||||
):
|
||||
@@ -699,7 +698,7 @@ class TestCompactTranscriptThinkingBlocks:
|
||||
},
|
||||
)()
|
||||
with patch(
|
||||
"backend.copilot.sdk.transcript._run_compression",
|
||||
"backend.copilot.transcript._run_compression",
|
||||
new_callable=AsyncMock,
|
||||
return_value=mock_result,
|
||||
):
|
||||
|
||||
File diff suppressed because it is too large
@@ -1,235 +1,10 @@
|
||||
"""Build complete JSONL transcript from SDK messages.
|
||||
"""Re-export from shared ``backend.copilot.transcript_builder`` for backward compat.
|
||||
|
||||
The transcript represents the FULL active context at any point in time.
|
||||
Each upload REPLACES the previous transcript atomically.
|
||||
|
||||
Flow:
|
||||
Turn 1: Upload [msg1, msg2]
|
||||
Turn 2: Download [msg1, msg2] → Upload [msg1, msg2, msg3, msg4] (REPLACE)
|
||||
Turn 3: Download [msg1, msg2, msg3, msg4] → Upload [all messages] (REPLACE)
|
||||
|
||||
The transcript is never incremental - always the complete atomic state.
|
||||
The canonical implementation now lives at ``backend.copilot.transcript_builder``
|
||||
so both the SDK and baseline paths can import without cross-package
|
||||
dependencies.
|
||||
"""
|
||||
|
||||
import logging
|
||||
from typing import Any
|
||||
from uuid import uuid4
|
||||
from backend.copilot.transcript_builder import TranscriptBuilder, TranscriptEntry
|
||||
|
||||
from pydantic import BaseModel
|
||||
|
||||
from backend.util import json
|
||||
|
||||
from .transcript import STRIPPABLE_TYPES
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class TranscriptEntry(BaseModel):
|
||||
"""Single transcript entry (user or assistant turn)."""
|
||||
|
||||
type: str
|
||||
uuid: str
|
||||
parentUuid: str | None
|
||||
isCompactSummary: bool | None = None
|
||||
message: dict[str, Any]
|
||||
|
||||
|
||||
class TranscriptBuilder:
|
||||
"""Build complete JSONL transcript from SDK messages.
|
||||
|
||||
This builder maintains the FULL conversation state, not incremental changes.
|
||||
The output is always the complete active context.
|
||||
"""
|
||||
|
||||
def __init__(self) -> None:
|
||||
self._entries: list[TranscriptEntry] = []
|
||||
self._last_uuid: str | None = None
|
||||
|
||||
def _last_is_assistant(self) -> bool:
|
||||
return bool(self._entries) and self._entries[-1].type == "assistant"
|
||||
|
||||
def _last_message_id(self) -> str:
|
||||
"""Return the message.id of the last entry, or '' if none."""
|
||||
if self._entries:
|
||||
return self._entries[-1].message.get("id", "")
|
||||
return ""
|
||||
|
||||
@staticmethod
|
||||
def _parse_entry(data: dict) -> TranscriptEntry | None:
|
||||
"""Parse a single transcript entry, filtering strippable types.
|
||||
|
||||
Returns ``None`` for entries that should be skipped (strippable types
|
||||
that are not compaction summaries).
|
||||
"""
|
||||
entry_type = data.get("type", "")
|
||||
if entry_type in STRIPPABLE_TYPES and not data.get("isCompactSummary"):
|
||||
return None
|
||||
return TranscriptEntry(
|
||||
type=entry_type,
|
||||
uuid=data.get("uuid") or str(uuid4()),
|
||||
parentUuid=data.get("parentUuid"),
|
||||
isCompactSummary=data.get("isCompactSummary"),
|
||||
message=data.get("message", {}),
|
||||
)
|
||||
|
||||
def load_previous(self, content: str, log_prefix: str = "[Transcript]") -> None:
|
||||
"""Load complete previous transcript.
|
||||
|
||||
This loads the FULL previous context. As new messages come in,
|
||||
we append to this state. The final output is the complete context
|
||||
(previous + new), not just the delta.
|
||||
"""
|
||||
if not content or not content.strip():
|
||||
return
|
||||
|
||||
lines = content.strip().split("\n")
|
||||
for line_num, line in enumerate(lines, 1):
|
||||
if not line.strip():
|
||||
continue
|
||||
|
||||
data = json.loads(line, fallback=None)
|
||||
if data is None:
|
||||
logger.warning(
|
||||
"%s Failed to parse transcript line %d/%d",
|
||||
log_prefix,
|
||||
line_num,
|
||||
len(lines),
|
||||
)
|
||||
continue
|
||||
|
||||
entry = self._parse_entry(data)
|
||||
if entry is None:
|
||||
continue
|
||||
self._entries.append(entry)
|
||||
self._last_uuid = entry.uuid
|
||||
|
||||
logger.info(
|
||||
"%s Loaded %d entries from previous transcript (last_uuid=%s)",
|
||||
log_prefix,
|
||||
len(self._entries),
|
||||
self._last_uuid[:12] if self._last_uuid else None,
|
||||
)
|
||||
|
||||
def append_user(self, content: str | list[dict], uuid: str | None = None) -> None:
|
||||
"""Append a user entry."""
|
||||
msg_uuid = uuid or str(uuid4())
|
||||
|
||||
self._entries.append(
|
||||
TranscriptEntry(
|
||||
type="user",
|
||||
uuid=msg_uuid,
|
||||
parentUuid=self._last_uuid,
|
||||
message={"role": "user", "content": content},
|
||||
)
|
||||
)
|
||||
self._last_uuid = msg_uuid
|
||||
|
||||
def append_tool_result(self, tool_use_id: str, content: str) -> None:
|
||||
"""Append a tool result as a user entry (one per tool call)."""
|
||||
self.append_user(
|
||||
content=[
|
||||
{"type": "tool_result", "tool_use_id": tool_use_id, "content": content}
|
||||
]
|
||||
)
|
||||
|
||||
def append_assistant(
|
||||
self,
|
||||
content_blocks: list[dict],
|
||||
model: str = "",
|
||||
stop_reason: str | None = None,
|
||||
) -> None:
|
||||
"""Append an assistant entry.
|
||||
|
||||
Consecutive assistant entries automatically share the same message ID
|
||||
so the CLI can merge them (thinking → text → tool_use) into a single
|
||||
API message on ``--resume``. A new ID is assigned whenever an
|
||||
assistant entry follows a non-assistant entry (user message or tool
|
||||
result), because that marks the start of a new API response.
|
||||
"""
|
||||
message_id = (
|
||||
self._last_message_id()
|
||||
if self._last_is_assistant()
|
||||
else f"msg_sdk_{uuid4().hex[:24]}"
|
||||
)
|
||||
|
||||
msg_uuid = str(uuid4())
|
||||
|
||||
self._entries.append(
|
||||
TranscriptEntry(
|
||||
type="assistant",
|
||||
uuid=msg_uuid,
|
||||
parentUuid=self._last_uuid,
|
||||
message={
|
||||
"role": "assistant",
|
||||
"model": model,
|
||||
"id": message_id,
|
||||
"type": "message",
|
||||
"content": content_blocks,
|
||||
"stop_reason": stop_reason,
|
||||
"stop_sequence": None,
|
||||
},
|
||||
)
|
||||
)
|
||||
self._last_uuid = msg_uuid
|
||||
|
||||
def replace_entries(
|
||||
self, compacted_entries: list[dict], log_prefix: str = "[Transcript]"
|
||||
) -> None:
|
||||
"""Replace all entries with compacted entries from the CLI session file.
|
||||
|
||||
Called after mid-stream compaction so TranscriptBuilder mirrors the
|
||||
CLI's active context (compaction summary + post-compaction entries).
|
||||
|
||||
Builds the new list first and validates it's non-empty before swapping,
|
||||
so corrupt input cannot wipe the conversation history.
|
||||
"""
|
||||
new_entries: list[TranscriptEntry] = []
|
||||
for data in compacted_entries:
|
||||
entry = self._parse_entry(data)
|
||||
if entry is not None:
|
||||
new_entries.append(entry)
|
||||
|
||||
if not new_entries:
|
||||
logger.warning(
|
||||
"%s replace_entries produced 0 entries from %d inputs, keeping old (%d entries)",
|
||||
log_prefix,
|
||||
len(compacted_entries),
|
||||
len(self._entries),
|
||||
)
|
||||
return
|
||||
|
||||
old_count = len(self._entries)
|
||||
self._entries = new_entries
|
||||
self._last_uuid = new_entries[-1].uuid
|
||||
|
||||
logger.info(
|
||||
"%s TranscriptBuilder compacted: %d entries -> %d entries",
|
||||
log_prefix,
|
||||
old_count,
|
||||
len(self._entries),
|
||||
)
|
||||
|
||||
def to_jsonl(self) -> str:
|
||||
"""Export complete context as JSONL.
|
||||
|
||||
Consecutive assistant entries are kept separate to match the
|
||||
native CLI format — the SDK merges them internally on resume.
|
||||
|
||||
Returns the FULL conversation state (all entries), not incremental.
|
||||
This output REPLACES any previous transcript.
|
||||
"""
|
||||
if not self._entries:
|
||||
return ""
|
||||
|
||||
lines = [entry.model_dump_json(exclude_none=True) for entry in self._entries]
|
||||
return "\n".join(lines) + "\n"
|
||||
|
||||
@property
|
||||
def entry_count(self) -> int:
|
||||
"""Total number of entries in the complete context."""
|
||||
return len(self._entries)
|
||||
|
||||
@property
|
||||
def is_empty(self) -> bool:
|
||||
"""Whether this builder has any entries."""
|
||||
return len(self._entries) == 0
|
||||
__all__ = ["TranscriptBuilder", "TranscriptEntry"]
|
||||
|
||||
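The builder removed from this file (now re-exported from `backend.copilot.transcript_builder`) relies on two invariants: each entry's `parentUuid` points at the previous entry, and consecutive assistant entries share one message ID. The sketch below illustrates both with the standard library only. `MiniTranscript` is a hypothetical, stripped-down stand-in, not the real `TranscriptBuilder`; it omits Pydantic, `STRIPPABLE_TYPES` filtering, loading, and compaction.

```python
import json
from uuid import uuid4


class MiniTranscript:
    """Illustrates the parentUuid chain and message-id sharing only."""

    def __init__(self):
        self.entries = []
        self.last_uuid = None

    def _append(self, entry_type, message):
        entry = {
            "type": entry_type,
            "uuid": str(uuid4()),
            "parentUuid": self.last_uuid,  # link to the previous entry
            "message": message,
        }
        self.entries.append(entry)
        self.last_uuid = entry["uuid"]

    def append_user(self, content):
        self._append("user", {"role": "user", "content": content})

    def append_assistant(self, content_blocks):
        # Consecutive assistant entries reuse the previous message id;
        # an assistant entry after a user entry starts a new one.
        prev = self.entries[-1] if self.entries else None
        if prev is not None and prev["type"] == "assistant":
            message_id = prev["message"]["id"]
        else:
            message_id = f"msg_sdk_{uuid4().hex[:24]}"
        self._append(
            "assistant",
            {"role": "assistant", "id": message_id, "content": content_blocks},
        )

    def to_jsonl(self):
        """Export every entry, one JSON object per line, dropping None fields."""
        if not self.entries:
            return ""
        lines = [
            json.dumps({k: v for k, v in e.items() if v is not None})
            for e in self.entries
        ]
        return "\n".join(lines) + "\n"
```

Dropping `None` fields imitates the `exclude_none=True` export in the original, so the first entry carries no `parentUuid` key at all.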
@@ -303,7 +303,7 @@ class TestDeleteTranscript:
|
||||
mock_storage.delete = AsyncMock()
|
||||
|
||||
with patch(
|
||||
"backend.copilot.sdk.transcript.get_workspace_storage",
|
||||
"backend.copilot.transcript.get_workspace_storage",
|
||||
new_callable=AsyncMock,
|
||||
return_value=mock_storage,
|
||||
):
|
||||
@@ -323,7 +323,7 @@ class TestDeleteTranscript:
|
||||
)
|
||||
|
||||
with patch(
|
||||
"backend.copilot.sdk.transcript.get_workspace_storage",
|
||||
"backend.copilot.transcript.get_workspace_storage",
|
||||
new_callable=AsyncMock,
|
||||
return_value=mock_storage,
|
||||
):
|
||||
@@ -341,7 +341,7 @@ class TestDeleteTranscript:
|
||||
)
|
||||
|
||||
with patch(
|
||||
"backend.copilot.sdk.transcript.get_workspace_storage",
|
||||
"backend.copilot.transcript.get_workspace_storage",
|
||||
new_callable=AsyncMock,
|
||||
return_value=mock_storage,
|
||||
):
|
||||
@@ -850,7 +850,7 @@ class TestRunCompression:
|
||||
@pytest.mark.asyncio
|
||||
async def test_no_client_uses_truncation(self):
|
||||
"""Path (a): ``get_openai_client()`` returns None → truncation only."""
|
||||
from .transcript import _run_compression
|
||||
from backend.copilot.transcript import _run_compression
|
||||
|
||||
truncation_result = self._make_compress_result(
|
||||
True, [{"role": "user", "content": "truncated"}]
|
||||
@@ -858,11 +858,11 @@ class TestRunCompression:
|
||||
|
||||
with (
|
||||
patch(
|
||||
"backend.copilot.sdk.transcript.get_openai_client",
|
||||
"backend.copilot.transcript.get_openai_client",
|
||||
return_value=None,
|
||||
),
|
||||
patch(
|
||||
"backend.copilot.sdk.transcript.compress_context",
|
||||
"backend.copilot.transcript.compress_context",
|
||||
new_callable=AsyncMock,
|
||||
return_value=truncation_result,
|
||||
) as mock_compress,
|
||||
@@ -885,7 +885,7 @@ class TestRunCompression:
|
||||
@pytest.mark.asyncio
|
||||
async def test_llm_success_returns_llm_result(self):
|
||||
"""Path (b): ``get_openai_client()`` returns a client → LLM compresses."""
|
||||
from .transcript import _run_compression
|
||||
from backend.copilot.transcript import _run_compression
|
||||
|
||||
llm_result = self._make_compress_result(
|
||||
True, [{"role": "user", "content": "LLM summary"}]
|
||||
@@ -894,11 +894,11 @@ class TestRunCompression:
|
||||
|
||||
with (
|
||||
patch(
|
||||
"backend.copilot.sdk.transcript.get_openai_client",
|
||||
"backend.copilot.transcript.get_openai_client",
|
||||
return_value=mock_client,
|
||||
),
|
||||
patch(
|
||||
"backend.copilot.sdk.transcript.compress_context",
|
||||
"backend.copilot.transcript.compress_context",
|
||||
new_callable=AsyncMock,
|
||||
return_value=llm_result,
|
||||
) as mock_compress,
|
||||
@@ -916,7 +916,7 @@ class TestRunCompression:
|
||||
@pytest.mark.asyncio
|
||||
async def test_llm_failure_falls_back_to_truncation(self):
|
||||
"""Path (c): LLM call raises → truncation fallback used instead."""
|
||||
from .transcript import _run_compression
|
||||
from backend.copilot.transcript import _run_compression
|
||||
|
||||
truncation_result = self._make_compress_result(
|
||||
True, [{"role": "user", "content": "truncated fallback"}]
|
||||
@@ -932,11 +932,11 @@ class TestRunCompression:
|
||||
|
||||
with (
|
||||
patch(
|
||||
"backend.copilot.sdk.transcript.get_openai_client",
|
||||
"backend.copilot.transcript.get_openai_client",
|
||||
return_value=mock_client,
|
||||
),
|
||||
patch(
|
||||
"backend.copilot.sdk.transcript.compress_context",
|
||||
"backend.copilot.transcript.compress_context",
|
||||
side_effect=_compress_side_effect,
|
||||
),
|
||||
):
|
||||
@@ -953,7 +953,7 @@ class TestRunCompression:
|
||||
@pytest.mark.asyncio
|
||||
async def test_llm_timeout_falls_back_to_truncation(self):
|
||||
"""Path (d): LLM call exceeds timeout → truncation fallback used."""
|
||||
from .transcript import _run_compression
|
||||
from backend.copilot.transcript import _run_compression
|
||||
|
||||
truncation_result = self._make_compress_result(
|
||||
True, [{"role": "user", "content": "truncated after timeout"}]
|
||||
@@ -970,19 +970,19 @@ class TestRunCompression:
|
||||
fake_client = MagicMock()
|
||||
with (
|
||||
patch(
|
||||
"backend.copilot.sdk.transcript.get_openai_client",
|
||||
"backend.copilot.transcript.get_openai_client",
|
||||
return_value=fake_client,
|
||||
),
|
||||
patch(
|
||||
"backend.copilot.sdk.transcript.compress_context",
|
||||
"backend.copilot.transcript.compress_context",
|
||||
side_effect=_compress_side_effect,
|
||||
),
|
||||
patch(
|
||||
"backend.copilot.sdk.transcript._COMPACTION_TIMEOUT_SECONDS",
|
||||
"backend.copilot.transcript._COMPACTION_TIMEOUT_SECONDS",
|
||||
0.05,
|
||||
),
|
||||
patch(
|
||||
"backend.copilot.sdk.transcript._TRUNCATION_TIMEOUT_SECONDS",
|
||||
"backend.copilot.transcript._TRUNCATION_TIMEOUT_SECONDS",
|
||||
5,
|
||||
),
|
||||
):
|
||||
@@ -1007,7 +1007,7 @@ class TestCleanupStaleProjectDirs:
|
||||
|
||||
def test_removes_old_copilot_dirs(self, tmp_path, monkeypatch):
|
||||
"""Directories matching copilot pattern older than threshold are removed."""
|
||||
from backend.copilot.sdk.transcript import (
|
||||
from backend.copilot.transcript import (
|
||||
_STALE_PROJECT_DIR_SECONDS,
|
||||
cleanup_stale_project_dirs,
|
||||
)
|
||||
@@ -1015,7 +1015,7 @@ class TestCleanupStaleProjectDirs:
|
||||
projects_dir = tmp_path / "projects"
|
||||
projects_dir.mkdir()
|
||||
monkeypatch.setattr(
|
||||
"backend.copilot.sdk.transcript._projects_base",
|
||||
"backend.copilot.transcript._projects_base",
|
||||
lambda: str(projects_dir),
|
||||
)
|
||||
|
||||
@@ -1039,12 +1039,12 @@ class TestCleanupStaleProjectDirs:
|
||||
|
||||
def test_ignores_non_copilot_dirs(self, tmp_path, monkeypatch):
|
||||
"""Directories not matching copilot pattern are left alone."""
|
||||
from backend.copilot.sdk.transcript import cleanup_stale_project_dirs
|
||||
from backend.copilot.transcript import cleanup_stale_project_dirs
|
||||
|
||||
projects_dir = tmp_path / "projects"
|
||||
projects_dir.mkdir()
|
||||
monkeypatch.setattr(
|
||||
"backend.copilot.sdk.transcript._projects_base",
|
||||
"backend.copilot.transcript._projects_base",
|
||||
lambda: str(projects_dir),
|
||||
)
|
||||
|
||||
@@ -1062,7 +1062,7 @@ class TestCleanupStaleProjectDirs:
|
||||
|
||||
def test_ttl_boundary_not_removed(self, tmp_path, monkeypatch):
|
||||
"""A directory exactly at the TTL boundary should NOT be removed."""
|
||||
from backend.copilot.sdk.transcript import (
|
||||
from backend.copilot.transcript import (
|
||||
_STALE_PROJECT_DIR_SECONDS,
|
||||
cleanup_stale_project_dirs,
|
||||
)
|
||||
@@ -1070,7 +1070,7 @@ class TestCleanupStaleProjectDirs:
|
||||
projects_dir = tmp_path / "projects"
|
||||
projects_dir.mkdir()
|
||||
monkeypatch.setattr(
|
||||
"backend.copilot.sdk.transcript._projects_base",
|
||||
"backend.copilot.transcript._projects_base",
|
||||
lambda: str(projects_dir),
|
||||
)
|
||||
|
||||
@@ -1088,7 +1088,7 @@ class TestCleanupStaleProjectDirs:
|
||||
|
||||
def test_skips_non_directory_entries(self, tmp_path, monkeypatch):
|
||||
"""Regular files matching the copilot pattern are not removed."""
|
||||
from backend.copilot.sdk.transcript import (
|
||||
from backend.copilot.transcript import (
|
||||
_STALE_PROJECT_DIR_SECONDS,
|
||||
cleanup_stale_project_dirs,
|
||||
)
|
||||
@@ -1096,7 +1096,7 @@ class TestCleanupStaleProjectDirs:
|
||||
projects_dir = tmp_path / "projects"
|
||||
projects_dir.mkdir()
|
||||
monkeypatch.setattr(
|
||||
"backend.copilot.sdk.transcript._projects_base",
|
||||
"backend.copilot.transcript._projects_base",
|
||||
lambda: str(projects_dir),
|
||||
)
|
||||
|
||||
@@ -1114,11 +1114,11 @@ class TestCleanupStaleProjectDirs:
|
||||
|
||||
def test_missing_base_dir_returns_zero(self, tmp_path, monkeypatch):
|
||||
"""If the projects base directory doesn't exist, return 0 gracefully."""
|
||||
from backend.copilot.sdk.transcript import cleanup_stale_project_dirs
|
||||
from backend.copilot.transcript import cleanup_stale_project_dirs
|
||||
|
||||
nonexistent = str(tmp_path / "does-not-exist" / "projects")
|
||||
monkeypatch.setattr(
|
||||
"backend.copilot.sdk.transcript._projects_base",
|
||||
"backend.copilot.transcript._projects_base",
|
||||
lambda: nonexistent,
|
||||
)
|
||||
|
||||
@@ -1129,7 +1129,7 @@ class TestCleanupStaleProjectDirs:
|
||||
"""When encoded_cwd is supplied only that directory is swept."""
|
||||
import time
|
||||
|
||||
from backend.copilot.sdk.transcript import (
|
||||
from backend.copilot.transcript import (
|
||||
_STALE_PROJECT_DIR_SECONDS,
|
||||
cleanup_stale_project_dirs,
|
||||
)
|
||||
@@ -1137,7 +1137,7 @@ class TestCleanupStaleProjectDirs:
|
||||
projects_dir = tmp_path / "projects"
|
||||
projects_dir.mkdir()
|
||||
monkeypatch.setattr(
|
||||
"backend.copilot.sdk.transcript._projects_base",
|
||||
"backend.copilot.transcript._projects_base",
|
||||
lambda: str(projects_dir),
|
||||
)
|
||||
|
||||
@@ -1160,12 +1160,12 @@ class TestCleanupStaleProjectDirs:
|
||||
|
||||
def test_scoped_fresh_dir_not_removed(self, tmp_path, monkeypatch):
|
||||
"""Scoped sweep leaves a fresh directory alone."""
|
||||
from backend.copilot.sdk.transcript import cleanup_stale_project_dirs
|
||||
from backend.copilot.transcript import cleanup_stale_project_dirs
|
||||
|
||||
projects_dir = tmp_path / "projects"
|
||||
projects_dir.mkdir()
|
||||
monkeypatch.setattr(
|
||||
"backend.copilot.sdk.transcript._projects_base",
|
||||
"backend.copilot.transcript._projects_base",
|
||||
lambda: str(projects_dir),
|
||||
)
|
||||
|
||||
@@ -1181,7 +1181,7 @@ class TestCleanupStaleProjectDirs:
|
||||
"""Scoped sweep refuses to remove a non-copilot directory."""
|
||||
import time
|
||||
|
||||
from backend.copilot.sdk.transcript import (
|
||||
from backend.copilot.transcript import (
|
||||
_STALE_PROJECT_DIR_SECONDS,
|
||||
cleanup_stale_project_dirs,
|
||||
)
|
||||
@@ -1189,7 +1189,7 @@ class TestCleanupStaleProjectDirs:
|
||||
projects_dir = tmp_path / "projects"
|
||||
projects_dir.mkdir()
|
||||
monkeypatch.setattr(
|
||||
"backend.copilot.sdk.transcript._projects_base",
|
||||
"backend.copilot.transcript._projects_base",
|
||||
lambda: str(projects_dir),
|
||||
)
|
||||
|
||||
|
||||
@@ -22,7 +22,12 @@ from backend.util.exceptions import NotAuthorizedError, NotFoundError
|
||||
from backend.util.settings import AppEnvironment, Settings
|
||||
|
||||
from .config import ChatConfig
|
||||
from .model import ChatSessionInfo, get_chat_session, upsert_chat_session
|
||||
from .model import (
|
||||
ChatSessionInfo,
|
||||
get_chat_session,
|
||||
update_session_title,
|
||||
upsert_chat_session,
|
||||
)
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
@@ -202,6 +207,22 @@ async def _generate_session_title(
|
||||
return None
|
||||
|
||||
|
||||
async def _update_title_async(
|
||||
session_id: str, message: str, user_id: str | None = None
|
||||
) -> None:
|
||||
"""Generate and persist a session title in the background.
|
||||
|
||||
Shared by both the SDK and baseline execution paths.
|
||||
"""
|
||||
try:
|
||||
title = await _generate_session_title(message, user_id, session_id)
|
||||
if title and user_id:
|
||||
await update_session_title(session_id, user_id, title, only_if_empty=True)
|
||||
logger.debug("Generated title for session %s", session_id)
|
||||
except Exception as e:
|
||||
logger.warning("Failed to update session title for %s: %s", session_id, e)
|
||||
|
||||
|
||||
async def assign_user_to_session(
|
||||
session_id: str,
|
||||
user_id: str,
|
||||
|
||||
@@ -7,7 +7,7 @@ import pytest
|
||||
from .model import create_chat_session, get_chat_session, upsert_chat_session
|
||||
from .response_model import StreamError, StreamTextDelta
|
||||
from .sdk import service as sdk_service
|
||||
from .sdk.transcript import download_transcript
|
||||
from .transcript import download_transcript
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
@@ -4,17 +4,85 @@ Both the baseline (OpenRouter) and SDK (Anthropic) service layers need to:
|
||||
1. Append a ``Usage`` record to the session.
|
||||
2. Log the turn's token counts.
|
||||
3. Record weighted usage in Redis for rate-limiting.
|
||||
4. Write a PlatformCostLog entry for admin cost tracking.
|
||||
|
||||
This module extracts that common logic so both paths stay in sync.
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import logging
|
||||
import math
|
||||
import re
|
||||
import threading
|
||||
|
||||
from backend.data.db_accessors import platform_cost_db
|
||||
from backend.data.platform_cost import PlatformCostEntry, usd_to_microdollars
|
||||
|
||||
from .model import ChatSession, Usage
|
||||
from .rate_limit import record_token_usage
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Hold strong references to in-flight cost log tasks to prevent GC.
|
||||
_pending_log_tasks: set[asyncio.Task[None]] = set()
|
||||
# Guards all reads and writes to _pending_log_tasks. Done callbacks (discard)
|
||||
# fire from the event loop thread; drain_pending_cost_logs iterates the set
|
||||
# from any caller — the lock prevents RuntimeError from concurrent modification.
|
||||
_pending_log_tasks_lock = threading.Lock()
|
||||
# Per-loop semaphores: asyncio.Semaphore is not thread-safe and must not be
|
||||
# shared across event loops running in different threads.
|
||||
_log_semaphores: dict[asyncio.AbstractEventLoop, asyncio.Semaphore] = {}
|
||||
|
||||
|
||||
def _get_log_semaphore() -> asyncio.Semaphore:
|
||||
loop = asyncio.get_running_loop()
|
||||
sem = _log_semaphores.get(loop)
|
||||
if sem is None:
|
||||
sem = asyncio.Semaphore(50)
|
||||
_log_semaphores[loop] = sem
|
||||
return sem
|
||||
|
||||
|
||||
def _schedule_cost_log(entry: PlatformCostEntry) -> None:
|
||||
"""Schedule a fire-and-forget cost log via DatabaseManagerAsyncClient RPC."""
|
||||
|
||||
async def _safe_log() -> None:
|
||||
async with _get_log_semaphore():
|
||||
try:
|
||||
await platform_cost_db().log_platform_cost(entry)
|
||||
except Exception:
|
||||
logger.exception(
|
||||
"Failed to log platform cost for user=%s provider=%s block=%s",
|
||||
entry.user_id,
|
||||
entry.provider,
|
||||
entry.block_name,
|
||||
)
|
||||
|
||||
task = asyncio.create_task(_safe_log())
|
||||
with _pending_log_tasks_lock:
|
||||
_pending_log_tasks.add(task)
|
||||
|
||||
def _remove(t: asyncio.Task[None]) -> None:
|
||||
with _pending_log_tasks_lock:
|
||||
_pending_log_tasks.discard(t)
|
||||
|
||||
task.add_done_callback(_remove)
|
||||
|
||||
|
||||
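`_schedule_cost_log` above follows the usual asyncio fire-and-forget pattern: keep a strong reference to each task until it finishes, because the event loop itself holds only weak references to tasks. The sketch below shows that core idea in isolation. It is a simplification under stated assumptions: `schedule`, `log_cost`, and `main` are illustrative names, and the threading lock and per-loop semaphore used in the diff are left out.

```python
import asyncio

# Strong references to in-flight tasks; without them a task may be
# garbage-collected before it finishes.
_pending: set = set()


def schedule(coro):
    """Run `coro` in the background and keep it alive until it completes."""
    task = asyncio.create_task(coro)
    _pending.add(task)
    # Drop the reference as soon as the task is done.
    task.add_done_callback(_pending.discard)
    return task


async def main():
    results = []

    async def log_cost(amount):
        await asyncio.sleep(0)  # stand-in for the real database RPC
        results.append(amount)

    for amount in (1, 2, 3):
        schedule(log_cost(amount))

    # Drain: wait on a snapshot of the set so callbacks can mutate it safely.
    await asyncio.gather(*list(_pending))
    await asyncio.sleep(0)  # let any remaining done callbacks run
    return sorted(results), len(_pending)
```

Running `asyncio.run(main())` completes all three background calls and leaves the pending set empty, which is the state a shutdown drain would want to reach.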
# Identifiers used by PlatformCostLog for copilot turns (not tied to a real
|
||||
# block/credential in the block_cost_config or credentials_store tables).
|
||||
COPILOT_BLOCK_ID = "copilot"
|
||||
COPILOT_CREDENTIAL_ID = "copilot_system"
|
||||
|
||||
|
||||
def _copilot_block_name(log_prefix: str) -> str:
|
||||
"""Extract stable block_name from ``"[SDK][session][T1]"`` -> ``"copilot:SDK"``."""
|
||||
match = re.search(r"\[([A-Za-z][A-Za-z0-9_]*)\]", log_prefix)
|
||||
if match:
|
||||
return f"{COPILOT_BLOCK_ID}:{match.group(1)}"
|
||||
tag = log_prefix.strip(" []")
|
||||
return f"{COPILOT_BLOCK_ID}:{tag}" if tag else COPILOT_BLOCK_ID
|
||||
|
||||
|
||||
async def persist_and_record_usage(
|
||||
*,
|
||||
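The behaviour of `_copilot_block_name` is easiest to see on concrete prefixes. The block below reproduces the function from the hunk above as a standalone copy (renamed `copilot_block_name` here so it can run outside the module) and exercises a few inputs. The sample prefixes are made up for illustration.

```python
import re

COPILOT_BLOCK_ID = "copilot"


def copilot_block_name(log_prefix: str) -> str:
    """Standalone copy of the helper from the diff, for experimentation."""
    # The first bracketed token that starts with a letter wins.
    match = re.search(r"\[([A-Za-z][A-Za-z0-9_]*)\]", log_prefix)
    if match:
        return f"{COPILOT_BLOCK_ID}:{match.group(1)}"
    # No bracketed identifier: fall back to the prefix with surrounding
    # brackets and spaces trimmed.
    tag = log_prefix.strip(" []")
    return f"{COPILOT_BLOCK_ID}:{tag}" if tag else COPILOT_BLOCK_ID


print(copilot_block_name("[SDK][abc123][T1]"))  # copilot:SDK
print(copilot_block_name("[Baseline]"))         # copilot:Baseline
print(copilot_block_name(""))                   # copilot
print(copilot_block_name("[9f3e][T1]"))         # copilot:T1
```

The last case shows an edge worth knowing: a leading token that starts with a digit is skipped by the regex, so the name is taken from the next bracketed identifier rather than from the first position.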
@@ -26,6 +94,8 @@ async def persist_and_record_usage(
|
||||
cache_creation_tokens: int = 0,
|
||||
log_prefix: str = "",
|
||||
cost_usd: float | str | None = None,
|
||||
model: str | None = None,
|
||||
provider: str = "open_router",
|
||||
) -> int:
|
||||
"""Persist token usage to session and record for rate limiting.
|
||||
|
||||
@@ -38,6 +108,7 @@ async def persist_and_record_usage(
|
||||
cache_creation_tokens: Tokens written to prompt cache (Anthropic only).
|
||||
log_prefix: Prefix for log messages (e.g. "[SDK]", "[Baseline]").
|
||||
cost_usd: Optional cost for logging (float from SDK, str otherwise).
|
||||
provider: Cost provider name (e.g. "anthropic", "open_router").
|
||||
|
||||
Returns:
|
||||
The computed total_tokens (prompt + completion; cache excluded).
|
||||
@@ -47,12 +118,13 @@ async def persist_and_record_usage(
|
||||
cache_read_tokens = max(0, cache_read_tokens)
|
||||
cache_creation_tokens = max(0, cache_creation_tokens)
|
||||
|
||||
if (
|
||||
no_tokens = (
|
||||
prompt_tokens <= 0
|
||||
and completion_tokens <= 0
|
||||
and cache_read_tokens <= 0
|
||||
and cache_creation_tokens <= 0
|
||||
):
|
||||
)
|
||||
if no_tokens and cost_usd is None:
|
||||
return 0
|
||||
|
||||
# total_tokens = prompt + completion. Cache tokens are tracked
|
||||
@@ -73,14 +145,14 @@ async def persist_and_record_usage(
|
||||
|
||||
if cache_read_tokens or cache_creation_tokens:
|
||||
logger.info(
|
||||
f"{log_prefix} Turn usage: uncached={prompt_tokens}, "
|
||||
f"cache_read={cache_read_tokens}, cache_create={cache_creation_tokens}, "
|
||||
f"output={completion_tokens}, total={total_tokens}, cost_usd={cost_usd}"
|
||||
f"{log_prefix} Turn usage: uncached={prompt_tokens}, cache_read={cache_read_tokens},"
|
||||
f" cache_create={cache_creation_tokens}, output={completion_tokens},"
|
||||
f" total={total_tokens}, cost_usd={cost_usd}"
|
||||
)
|
||||
else:
|
||||
logger.info(
|
||||
f"{log_prefix} Turn usage: prompt={prompt_tokens}, "
|
||||
f"completion={completion_tokens}, total={total_tokens}"
|
||||
f"{log_prefix} Turn usage: prompt={prompt_tokens}, completion={completion_tokens},"
|
||||
f" total={total_tokens}"
|
||||
)
|
||||
|
||||
if user_id:
|
||||
@@ -93,6 +165,54 @@ async def persist_and_record_usage(
|
||||
cache_creation_tokens=cache_creation_tokens,
|
||||
)
|
||||
except Exception as usage_err:
|
||||
logger.warning(f"{log_prefix} Failed to record token usage: {usage_err}")
|
||||
logger.warning("%s Failed to record token usage: %s", log_prefix, usage_err)
|
||||
|
||||
# Log to PlatformCostLog for admin cost dashboard.
|
||||
# Include entries where cost_usd is set even if token count is 0
|
||||
# (e.g. fully-cached Anthropic responses where only cache tokens
|
||||
# accumulate a charge without incrementing total_tokens).
|
||||
if user_id and (total_tokens > 0 or cost_usd is not None):
|
||||
cost_float = None
|
||||
if cost_usd is not None:
|
||||
try:
|
||||
val = float(cost_usd)
|
||||
if math.isfinite(val) and val >= 0:
|
||||
cost_float = val
|
||||
except (ValueError, TypeError):
|
||||
pass
|
||||
|
||||
cost_microdollars = usd_to_microdollars(cost_float)
|
||||
session_id = session.session_id if session else None
|
||||
|
||||
if cost_float is not None:
|
||||
tracking_type = "cost_usd"
|
||||
tracking_amount = cost_float
|
||||
else:
|
||||
tracking_type = "tokens"
|
||||
tracking_amount = total_tokens
|
||||
|
||||
_schedule_cost_log(
|
||||
PlatformCostEntry(
|
||||
user_id=user_id,
|
||||
graph_exec_id=session_id,
|
||||
block_id=COPILOT_BLOCK_ID,
|
||||
block_name=_copilot_block_name(log_prefix),
|
||||
provider=provider,
|
||||
credential_id=COPILOT_CREDENTIAL_ID,
|
||||
cost_microdollars=cost_microdollars,
|
||||
input_tokens=prompt_tokens,
|
||||
output_tokens=completion_tokens,
|
||||
model=model,
|
||||
tracking_type=tracking_type,
|
||||
tracking_amount=tracking_amount,
|
||||
metadata={
|
||||
"tracking_type": tracking_type,
|
||||
"tracking_amount": tracking_amount,
|
||||
"cache_read_tokens": cache_read_tokens,
|
||||
"cache_creation_tokens": cache_creation_tokens,
|
||||
"source": "copilot",
|
||||
},
|
||||
)
|
||||
)
|
||||
|
||||
return total_tokens
|
||||
|
||||
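The cost handling in the hunk above can be exercised in isolation. The following is a minimal sketch, not the module's code: `parse_cost` mirrors the float / finite / non-negative check shown in the diff, `pick_tracking` mirrors the `tracking_type` branch, and `to_microdollars` is a hypothetical stand-in for `usd_to_microdollars`, whose body is not part of this diff. It assumes 1 USD = 1,000,000 microdollars with rounding to the nearest integer, which is consistent with the values asserted in the tests further down (0.005 USD gives 5000).

```python
import math
from typing import Optional, Union


def parse_cost(cost_usd: Union[float, str, None]) -> Optional[float]:
    """Return a finite, non-negative float, or None if the input is unusable."""
    if cost_usd is None:
        return None
    try:
        val = float(cost_usd)
    except (ValueError, TypeError):
        return None
    return val if math.isfinite(val) and val >= 0 else None


def to_microdollars(cost: Optional[float]) -> Optional[int]:
    # Hypothetical stand-in for usd_to_microdollars (assumed: round to nearest).
    return None if cost is None else round(cost * 1_000_000)


def pick_tracking(cost: Optional[float], total_tokens: int) -> tuple:
    """Prefer provider-reported cost; otherwise fall back to token counts."""
    if cost is not None:
        return ("cost_usd", cost)
    return ("tokens", total_tokens)
```

With this shape, an unparseable string such as `"not-a-number"` degrades to token tracking instead of raising, which is the behaviour `test_cost_usd_invalid_string_falls_back_to_tokens` checks.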
@@ -4,6 +4,7 @@ Covers both the baseline (prompt+completion only) and SDK (with cache breakdown)
|
||||
calling conventions, session persistence, and rate-limit recording.
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
from datetime import UTC, datetime
|
||||
from unittest.mock import AsyncMock, patch
|
||||
|
||||
@@ -279,3 +280,260 @@ class TestRateLimitRecording:
|
||||
completion_tokens=0,
|
||||
)
|
||||
mock_record.assert_not_awaited()
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# PlatformCostLog integration
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestPlatformCostLogging:
|
||||
@pytest.mark.asyncio
|
||||
async def test_logs_cost_entry_with_cost_usd(self):
|
||||
"""When cost_usd is provided, tracking_type should be 'cost_usd'."""
|
||||
mock_log = AsyncMock()
|
||||
with (
|
||||
patch(
|
||||
"backend.copilot.token_tracking.record_token_usage",
|
||||
new_callable=AsyncMock,
|
||||
),
|
||||
patch(
|
||||
"backend.copilot.token_tracking.platform_cost_db",
|
||||
return_value=type(
|
||||
"FakePlatformCostDb", (), {"log_platform_cost": mock_log}
|
||||
)(),
|
||||
),
|
||||
):
|
||||
await persist_and_record_usage(
|
||||
session=_make_session(),
|
||||
user_id="user-cost",
|
||||
prompt_tokens=200,
|
||||
completion_tokens=100,
|
||||
cost_usd=0.005,
|
||||
model="gpt-4",
|
||||
provider="anthropic",
|
||||
log_prefix="[SDK]",
|
||||
)
|
||||
await asyncio.sleep(0)
|
||||
mock_log.assert_awaited_once()
|
||||
entry = mock_log.call_args[0][0]
|
||||
assert entry.user_id == "user-cost"
|
||||
assert entry.provider == "anthropic"
|
||||
assert entry.model == "gpt-4"
|
||||
assert entry.cost_microdollars == 5000
|
||||
assert entry.input_tokens == 200
|
||||
assert entry.output_tokens == 100
|
||||
assert entry.tracking_type == "cost_usd"
|
||||
assert entry.metadata["tracking_type"] == "cost_usd"
|
||||
assert entry.metadata["tracking_amount"] == 0.005
|
||||
assert entry.block_name == "copilot:SDK"
|
||||
assert entry.graph_exec_id == "sess-test"
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_logs_cost_entry_without_cost_usd(self):
|
||||
"""When cost_usd is None, tracking_type should be 'tokens'."""
|
||||
mock_log = AsyncMock()
|
||||
with (
|
||||
patch(
|
||||
"backend.copilot.token_tracking.record_token_usage",
|
||||
new_callable=AsyncMock,
|
||||
),
|
||||
patch(
|
||||
"backend.copilot.token_tracking.platform_cost_db",
|
||||
return_value=type(
|
||||
"FakePlatformCostDb", (), {"log_platform_cost": mock_log}
|
||||
)(),
|
||||
),
|
||||
):
|
||||
await persist_and_record_usage(
|
||||
session=None,
|
||||
user_id="user-tokens",
|
||||
prompt_tokens=100,
|
||||
completion_tokens=50,
|
||||
log_prefix="[Baseline]",
|
||||
)
|
||||
await asyncio.sleep(0)
|
||||
mock_log.assert_awaited_once()
|
||||
entry = mock_log.call_args[0][0]
|
||||
assert entry.cost_microdollars is None
|
||||
assert entry.tracking_type == "tokens"
|
||||
assert entry.metadata["tracking_type"] == "tokens"
|
||||
assert entry.metadata["tracking_amount"] == 150
|
||||
assert entry.graph_exec_id is None
|
||||
assert entry.block_name == "copilot:Baseline"
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_skips_cost_log_when_no_user_id(self):
|
||||
"""No PlatformCostLog entry when user_id is None."""
|
||||
mock_log = AsyncMock()
|
||||
with (
|
||||
patch(
|
||||
"backend.copilot.token_tracking.record_token_usage",
|
||||
new_callable=AsyncMock,
|
||||
),
|
||||
patch(
|
||||
"backend.copilot.token_tracking.platform_cost_db",
|
||||
return_value=type(
|
||||
"FakePlatformCostDb", (), {"log_platform_cost": mock_log}
|
||||
)(),
|
||||
),
|
||||
):
|
||||
await persist_and_record_usage(
|
||||
session=None,
|
||||
user_id=None,
|
||||
prompt_tokens=100,
|
||||
completion_tokens=50,
|
||||
)
|
||||
await asyncio.sleep(0)
|
||||
mock_log.assert_not_awaited()
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_cost_usd_invalid_string_falls_back_to_tokens(self):
|
||||
"""Invalid cost_usd string should fall back to tokens tracking."""
|
||||
mock_log = AsyncMock()
|
||||
with (
|
||||
patch(
|
||||
"backend.copilot.token_tracking.record_token_usage",
|
||||
new_callable=AsyncMock,
|
||||
),
|
||||
patch(
|
||||
"backend.copilot.token_tracking.platform_cost_db",
|
||||
return_value=type(
|
||||
"FakePlatformCostDb", (), {"log_platform_cost": mock_log}
|
||||
)(),
|
||||
),
|
||||
):
|
||||
await persist_and_record_usage(
|
||||
session=None,
|
||||
user_id="user-invalid",
|
||||
prompt_tokens=100,
|
||||
completion_tokens=50,
|
||||
cost_usd="not-a-number",
|
||||
)
|
||||
await asyncio.sleep(0)
|
||||
mock_log.assert_awaited_once()
|
||||
entry = mock_log.call_args[0][0]
|
||||
assert entry.cost_microdollars is None
|
||||
assert entry.metadata["tracking_type"] == "tokens"
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_cost_usd_string_number_is_parsed(self):
|
||||
"""String-encoded cost_usd (e.g. from OpenRouter) should be parsed."""
|
||||
mock_log = AsyncMock()
|
||||
with (
|
||||
patch(
|
||||
"backend.copilot.token_tracking.record_token_usage",
|
||||
new_callable=AsyncMock,
|
||||
),
|
||||
patch(
|
||||
"backend.copilot.token_tracking.platform_cost_db",
|
||||
return_value=type(
|
||||
"FakePlatformCostDb", (), {"log_platform_cost": mock_log}
|
||||
)(),
|
||||
),
|
||||
):
|
||||
await persist_and_record_usage(
|
||||
session=None,
|
||||
user_id="user-str",
|
||||
prompt_tokens=100,
|
||||
completion_tokens=50,
|
||||
cost_usd="0.01",
|
||||
)
|
||||
await asyncio.sleep(0)
|
||||
mock_log.assert_awaited_once()
|
||||
entry = mock_log.call_args[0][0]
|
||||
assert entry.cost_microdollars == 10_000
|
||||
assert entry.metadata["tracking_type"] == "cost_usd"
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_empty_log_prefix_produces_copilot_block_name(self):
|
||||
"""Empty log_prefix results in block_name='copilot'."""
|
||||
mock_log = AsyncMock()
|
||||
with (
|
||||
patch(
|
||||
"backend.copilot.token_tracking.record_token_usage",
|
||||
new_callable=AsyncMock,
|
||||
),
|
||||
patch(
|
||||
"backend.copilot.token_tracking.platform_cost_db",
|
||||
return_value=type(
|
||||
"FakePlatformCostDb", (), {"log_platform_cost": mock_log}
|
||||
)(),
|
||||
),
|
||||
):
|
||||
await persist_and_record_usage(
|
||||
session=None,
|
||||
user_id="user-empty",
|
||||
prompt_tokens=10,
|
||||
completion_tokens=5,
|
||||
log_prefix="",
|
||||
)
|
||||
await asyncio.sleep(0)
|
||||
entry = mock_log.call_args[0][0]
|
||||
assert entry.block_name == "copilot"
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_cache_tokens_included_in_metadata(self):
|
||||
"""Cache token counts should be present in the metadata."""
|
||||
mock_log = AsyncMock()
|
||||
with (
|
||||
patch(
|
||||
"backend.copilot.token_tracking.record_token_usage",
|
||||
new_callable=AsyncMock,
|
||||
),
|
||||
patch(
|
||||
"backend.copilot.token_tracking.platform_cost_db",
|
||||
return_value=type(
|
||||
"FakePlatformCostDb", (), {"log_platform_cost": mock_log}
|
||||
)(),
|
||||
),
|
||||
):
|
||||
await persist_and_record_usage(
|
||||
session=None,
|
||||
user_id="user-cache",
|
||||
prompt_tokens=100,
|
||||
completion_tokens=50,
|
||||
cache_read_tokens=5000,
|
||||
cache_creation_tokens=300,
|
||||
)
|
||||
await asyncio.sleep(0)
|
||||
entry = mock_log.call_args[0][0]
|
||||
assert entry.metadata["cache_read_tokens"] == 5000
|
||||
assert entry.metadata["cache_creation_tokens"] == 300
|
||||
assert entry.metadata["source"] == "copilot"
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_logs_cost_only_when_tokens_zero(self):
|
||||
"""Zero prompt+completion tokens with cost_usd set still logs the entry."""
|
||||
mock_log = AsyncMock()
|
||||
with (
|
||||
patch(
|
||||
"backend.copilot.token_tracking.record_token_usage",
|
||||
new_callable=AsyncMock,
|
||||
),
|
||||
patch(
|
||||
"backend.copilot.token_tracking.platform_cost_db",
|
||||
return_value=type(
|
||||
"FakePlatformCostDb", (), {"log_platform_cost": mock_log}
|
||||
)(),
|
||||
),
|
||||
):
|
||||
await persist_and_record_usage(
|
||||
session=None,
|
||||
user_id="user-cached",
|
||||
prompt_tokens=0,
|
||||
completion_tokens=0,
|
||||
cost_usd=0.005,
|
||||
model="claude-3-5-sonnet",
|
||||
provider="anthropic",
|
||||
log_prefix="[SDK]",
|
||||
)
|
||||
await asyncio.sleep(0)
|
||||
# Guard: total_tokens == 0 but cost_usd is set — must still log
|
||||
mock_log.assert_awaited_once()
|
||||
entry = mock_log.call_args[0][0]
|
||||
assert entry.user_id == "user-cached"
|
||||
assert entry.tracking_type == "cost_usd"
|
||||
assert entry.cost_microdollars == 5000
|
||||
assert entry.input_tokens == 0
|
||||
assert entry.output_tokens == 0
|
||||
|
||||
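Each test above awaits `asyncio.sleep(0)` before asserting on the mock. A plausible reason, assuming `_schedule_cost_log` hands the write to a background task (its body is not shown in this diff), is that a freshly created task does not start running until the caller yields to the event loop. A self-contained sketch with hypothetical names (`write_log`, `schedule_log`):

```python
import asyncio

calls: list = []
_pending: set = set()


async def write_log(entry: str) -> None:
    calls.append(entry)


def schedule_log(entry: str) -> None:
    # Hypothetical fire-and-forget helper: keep a reference so the task
    # is not garbage-collected before it finishes.
    task = asyncio.create_task(write_log(entry))
    _pending.add(task)
    task.add_done_callback(_pending.discard)


async def demo() -> tuple:
    schedule_log("entry-1")
    before = list(calls)      # task exists but has not run yet
    await asyncio.sleep(0)    # yield once so the event loop can run it
    after = list(calls)
    return before, after


before, after = asyncio.run(demo())
```

A single yield is enough here only because `write_log` never awaits; a logger that performs real I/O would need the test to await the task itself.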
@@ -33,12 +33,23 @@ _GET_CURRENT_DATE_BLOCK_ID = "b29c1b50-5d0e-4d9f-8f9d-1b0e6fcbf0b1"
|
||||
_GMAIL_SEND_BLOCK_ID = "6c27abc2-e51d-499e-a85f-5a0041ba94f0"
|
||||
_TEXT_REPLACE_BLOCK_ID = "7e7c87ab-3469-4bcc-9abe-67705091b713"
|
||||
|
||||
# Default OrchestratorBlock model/mode — kept in sync with ChatConfig.model.
|
||||
# ChatConfig uses the OpenRouter format ("anthropic/claude-opus-4.6");
|
||||
# OrchestratorBlock uses the native Anthropic model name.
|
||||
ORCHESTRATOR_DEFAULT_MODEL = "claude-opus-4-6"
|
||||
ORCHESTRATOR_DEFAULT_EXECUTION_MODE = "extended_thinking"
|
||||
|
||||
# Defaults applied to OrchestratorBlock nodes by the fixer.
|
||||
- _SDM_DEFAULTS: dict[str, int | bool] = {
+ # execution_mode and model match the copilot's default (extended thinking
+ # with Opus) so generated agents inherit the same reasoning capabilities.
+ # If the user explicitly sets these fields, the fixer won't override them.
+ _SDM_DEFAULTS: dict[str, int | bool | str] = {
|
||||
"agent_mode_max_iterations": 10,
|
||||
"conversation_compaction": True,
|
||||
"retry": 3,
|
||||
"multiple_tool_calls": False,
|
||||
"execution_mode": ORCHESTRATOR_DEFAULT_EXECUTION_MODE,
|
||||
"model": ORCHESTRATOR_DEFAULT_MODEL,
|
||||
}
|
||||
|
||||
|
||||
@@ -879,6 +890,12 @@ class AgentFixer:
|
||||
)
|
||||
|
||||
if is_ai_block:
|
||||
# Skip AI blocks that don't expose a "model" input property
|
||||
# (some AI-category blocks have no model selector at all).
|
||||
input_properties = block.get("inputSchema", {}).get("properties", {})
|
||||
if "model" not in input_properties:
|
||||
continue
|
||||
|
||||
node_id = node.get("id")
|
||||
input_default = node.get("input_default", {})
|
||||
current_model = input_default.get("model")
|
||||
@@ -887,9 +904,7 @@ class AgentFixer:
|
||||
# Blocks with a block-specific enum on the model field (e.g.
|
||||
# PerplexityBlock) use their own enum values; others use the
|
||||
# generic set.
|
||||
- model_schema = (
- block.get("inputSchema", {}).get("properties", {}).get("model", {})
- )
+ model_schema = input_properties.get("model", {})
|
||||
block_model_enum = model_schema.get("enum")
|
||||
|
||||
if block_model_enum:
|
||||
@@ -1649,6 +1664,8 @@ class AgentFixer:
|
||||
2. ``conversation_compaction`` defaults to ``True``
|
||||
3. ``retry`` defaults to ``3``
|
||||
4. ``multiple_tool_calls`` defaults to ``False``
|
||||
5. ``execution_mode`` defaults to ``"extended_thinking"``
|
||||
6. ``model`` defaults to ``"claude-opus-4-6"``
|
||||
|
||||
Args:
|
||||
agent: The agent dictionary to fix
|
||||
@@ -1748,6 +1765,12 @@ class AgentFixer:
|
||||
agent = self.fix_node_x_coordinates(agent, node_lookup=node_lookup)
|
||||
agent = self.fix_getcurrentdate_offset(agent)
|
||||
|
||||
+ # Apply OrchestratorBlock defaults BEFORE fix_ai_model_parameter so that
+ # the orchestrator-specific model (claude-opus-4-6) is set first and
+ # fix_ai_model_parameter sees it as a valid allowed model instead of
+ # overwriting it with the generic default (gpt-4o).
+ agent = self.fix_orchestrator_blocks(agent)
|
||||
|
||||
# Apply fixes that require blocks information
|
||||
if blocks:
|
||||
agent = self.fix_invalid_nested_sink_links(
|
||||
@@ -1765,9 +1788,6 @@ class AgentFixer:
|
||||
# Apply fixes for MCPToolBlock nodes
|
||||
agent = self.fix_mcp_tool_blocks(agent)
|
||||
|
||||
- # Apply fixes for OrchestratorBlock nodes (agent-mode defaults)
- agent = self.fix_orchestrator_blocks(agent)
|
||||
|
||||
# Apply fixes for AgentExecutorBlock nodes (sub-agents)
|
||||
if library_agents:
|
||||
agent = self.fix_agent_executor_blocks(agent, library_agents)
|
||||
|
||||
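The comments above say the fixer fills OrchestratorBlock defaults only where the user left a field unset. The body of `fix_orchestrator_blocks` is not part of these hunks, so the following is only a sketch of that rule using a hypothetical helper, `apply_defaults`; the default values are copied from `_SDM_DEFAULTS`, and treating `None` as "not set" is an assumption.

```python
from typing import Any, Dict

SDM_DEFAULTS: Dict[str, Any] = {
    "agent_mode_max_iterations": 10,
    "conversation_compaction": True,
    "retry": 3,
    "multiple_tool_calls": False,
    "execution_mode": "extended_thinking",
    "model": "claude-opus-4-6",
}


def apply_defaults(node: Dict[str, Any], defaults: Dict[str, Any]) -> Dict[str, Any]:
    """Fill missing keys in node["input_default"]; never overwrite user values."""
    input_default = node.setdefault("input_default", {})
    for key, value in defaults.items():
        # Assumption: a key that is absent or None counts as "not set".
        if input_default.get(key) is None:
            input_default[key] = value
    return node


node = {"id": "n1", "input_default": {"model": "gpt-4o", "retry": 5}}
fixed = apply_defaults(node, SDM_DEFAULTS)
```

The user's `model` and `retry` survive, while the four unset fields receive defaults. Checking `is None` rather than truthiness matters for `multiple_tool_calls`: an explicit `False` from the user must not be mistaken for "unset".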
@@ -580,6 +580,29 @@ class TestFixAiModelParameter:
|
||||
|
||||
assert result["nodes"][0]["input_default"]["model"] == "perplexity/sonar"
|
||||
|
||||
def test_ai_block_without_model_property_is_skipped(self):
|
||||
"""AI-category blocks that have no 'model' input property should not
|
||||
have a model injected — they simply don't expose a model selector."""
|
||||
fixer = AgentFixer()
|
||||
block_id = generate_uuid()
|
||||
node = _make_node(node_id="n1", block_id=block_id, input_default={})
|
||||
agent = _make_agent(nodes=[node])
|
||||
|
||||
blocks = [
|
||||
{
|
||||
"id": block_id,
|
||||
"name": "SomeAIBlock",
|
||||
"categories": [{"category": "AI"}],
|
||||
"inputSchema": {
|
||||
"properties": {"prompt": {"type": "string"}},
|
||||
},
|
||||
}
|
||||
]
|
||||
|
||||
result = fixer.fix_ai_model_parameter(agent, blocks)
|
||||
|
||||
assert "model" not in result["nodes"][0]["input_default"]
|
||||
|
||||
|
||||
class TestFixAgentExecutorBlocks:
|
||||
"""Tests for fix_agent_executor_blocks."""
|
||||
|
||||
@@ -42,7 +42,10 @@ class GetAgentBuildingGuideTool(BaseTool):
|
||||
|
||||
@property
|
||||
def description(self) -> str:
|
||||
- return "Get the agent JSON building guide (nodes, links, AgentExecutorBlock, MCPToolBlock usage). Call before generating agent JSON."
+ return (
+ "Get the agent JSON building guide (nodes, links, AgentExecutorBlock, MCPToolBlock usage, "
+ "and the create->dry-run->fix iterative workflow). Call before generating agent JSON."
+ )
|
||||
|
||||
@property
|
||||
def parameters(self) -> dict[str, Any]:
|
||||
|
||||
@@ -0,0 +1,15 @@
|
||||
"""Tests for GetAgentBuildingGuideTool."""
|
||||
|
||||
from backend.copilot.tools.get_agent_building_guide import _load_guide
|
||||
|
||||
|
||||
def test_load_guide_returns_string():
|
||||
guide = _load_guide()
|
||||
assert isinstance(guide, str)
|
||||
assert len(guide) > 100
|
||||
|
||||
|
||||
def test_load_guide_caches():
|
||||
guide1 = _load_guide()
|
||||
guide2 = _load_guide()
|
||||
assert guide1 is guide2
|
||||
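`test_load_guide_caches` asserts identity (`is`), not just equality, so `_load_guide` must return the same object on repeated calls. Its implementation is not part of this diff; one common way to get that behaviour is `functools.lru_cache`, sketched here with a hypothetical loader and a throwaway guide file created only so the example runs on its own.

```python
import functools
import tempfile
from pathlib import Path

# Hypothetical guide file, written to the temp directory for this sketch only.
_GUIDE_PATH = Path(tempfile.gettempdir()) / "example_agent_guide.md"
_GUIDE_PATH.write_text(
    "# Agent building guide\n" + "nodes, links, ...\n" * 10, encoding="utf-8"
)


@functools.lru_cache(maxsize=1)
def load_guide() -> str:
    """Read the guide once; later calls return the cached string object."""
    return _GUIDE_PATH.read_text(encoding="utf-8")


first = load_guide()
second = load_guide()
```

Without the cache, two reads would produce equal but distinct string objects, and an identity assertion like the one in the test would fail.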
@@ -48,27 +48,41 @@ logger = logging.getLogger(__name__)
|
||||
def get_inputs_from_schema(
|
||||
input_schema: dict[str, Any],
|
||||
exclude_fields: set[str] | None = None,
|
||||
+ input_data: dict[str, Any] | None = None,
  ) -> list[dict[str, Any]]:
- """Extract input field info from JSON schema."""
+ """Extract input field info from JSON schema.
+
+ When *input_data* is provided, each field's ``value`` key is populated
+ with the value the CoPilot already supplied — so the frontend can
+ prefill the form instead of showing empty inputs. Fields marked
+ ``advanced`` in the schema are flagged so the frontend can hide them
+ by default (matching the builder behaviour).
+ """
  if not isinstance(input_schema, dict):
  return []

  exclude = exclude_fields or set()
  properties = input_schema.get("properties", {})
  required = set(input_schema.get("required", []))
+ provided = input_data or {}

- return [
- {
+ results: list[dict[str, Any]] = []
+ for name, schema in properties.items():
+ if name in exclude:
+ continue
+ entry: dict[str, Any] = {
  "name": name,
  "title": schema.get("title", name),
  "type": schema.get("type", "string"),
  "description": schema.get("description", ""),
  "required": name in required,
  "default": schema.get("default"),
+ "advanced": schema.get("advanced", False),
  }
- for name, schema in properties.items()
- if name not in exclude
- ]
+ if name in provided:
+ entry["value"] = provided[name]
+ results.append(entry)
+ return results
|
||||
|
||||
|
||||
async def execute_block(
|
||||
@@ -446,7 +460,9 @@ async def prepare_block_for_execution(
|
||||
requirements={
|
||||
"credentials": missing_creds_list,
|
||||
"inputs": get_inputs_from_schema(
|
||||
- input_schema, exclude_fields=credentials_fields
+ input_schema,
+ exclude_fields=credentials_fields,
+ input_data=input_data,
|
||||
),
|
||||
"execution_modes": ["immediate"],
|
||||
},
|
||||
|
||||
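To see what the extra `input_data` argument changes, the new version of `get_inputs_from_schema` can be run on its own. The function body below is copied from the hunk above with indentation restored; the schema, field names, and values are made up for illustration.

```python
from __future__ import annotations

from typing import Any


def get_inputs_from_schema(
    input_schema: dict[str, Any],
    exclude_fields: set[str] | None = None,
    input_data: dict[str, Any] | None = None,
) -> list[dict[str, Any]]:
    if not isinstance(input_schema, dict):
        return []

    exclude = exclude_fields or set()
    properties = input_schema.get("properties", {})
    required = set(input_schema.get("required", []))
    provided = input_data or {}

    results: list[dict[str, Any]] = []
    for name, schema in properties.items():
        if name in exclude:
            continue
        entry: dict[str, Any] = {
            "name": name,
            "title": schema.get("title", name),
            "type": schema.get("type", "string"),
            "description": schema.get("description", ""),
            "required": name in required,
            "default": schema.get("default"),
            "advanced": schema.get("advanced", False),
        }
        # Only fields the caller actually supplied get a "value" key.
        if name in provided:
            entry["value"] = provided[name]
        results.append(entry)
    return results


# Illustrative schema: one required field, one advanced field, one credential.
schema = {
    "properties": {
        "query": {"type": "string", "title": "Query"},
        "limit": {"type": "integer", "default": 10, "advanced": True},
        "api_key": {"type": "string"},
    },
    "required": ["query"],
}
inputs = get_inputs_from_schema(
    schema, exclude_fields={"api_key"}, input_data={"query": "cats"}
)
```

Here `query` carries `"value": "cats"` for prefilling, `limit` has no `value` key at all (so a consumer can tell "not supplied" apart from "supplied as `None`"), and `api_key` is dropped.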
@@ -153,7 +153,11 @@ class RunAgentTool(BaseTool):
|
||||
  },
  "dry_run": {
  "type": "boolean",
- "description": "Execute in preview mode.",
+ "description": (
+ "When true, simulates execution using an LLM for each block "
+ "— no real API calls, credentials, or credits. "
+ "See agent_generation_guide for the full workflow."
+ ),
|
||||
},
|
||||
},
|
||||
"required": ["dry_run"],
|
||||
|
||||
@@ -845,6 +845,7 @@ class WriteWorkspaceFileTool(BaseTool):
|
||||
path=path,
|
||||
mime_type=mime_type,
|
||||
overwrite=overwrite,
|
||||
metadata={"origin": "agent-created"},
|
||||
)
|
||||
|
||||
# Build informative source label and message.
|
||||
|
||||
1247
autogpt_platform/backend/backend/copilot/transcript.py
Normal file
File diff suppressed because it is too large
240
autogpt_platform/backend/backend/copilot/transcript_builder.py
Normal file
@@ -0,0 +1,240 @@
|
||||
"""Build complete JSONL transcript from SDK messages.
|
||||
|
||||
The transcript represents the FULL active context at any point in time.
|
||||
Each upload REPLACES the previous transcript atomically.
|
||||
|
||||
Flow:
|
||||
Turn 1: Upload [msg1, msg2]
|
||||
Turn 2: Download [msg1, msg2] → Upload [msg1, msg2, msg3, msg4] (REPLACE)
|
||||
Turn 3: Download [msg1, msg2, msg3, msg4] → Upload [all messages] (REPLACE)
|
||||
|
||||
The transcript is never incremental - always the complete atomic state.
|
||||
"""
|
||||
|
||||
import logging
|
||||
from typing import Any
|
||||
from uuid import uuid4
|
||||
|
||||
from pydantic import BaseModel
|
||||
|
||||
from backend.util import json
|
||||
|
||||
from .transcript import STRIPPABLE_TYPES
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class TranscriptEntry(BaseModel):
|
||||
"""Single transcript entry (user or assistant turn)."""
|
||||
|
||||
type: str
|
||||
uuid: str
|
||||
parentUuid: str = ""
|
||||
isCompactSummary: bool | None = None
|
||||
message: dict[str, Any]
|
||||
|
||||
|
||||
class TranscriptBuilder:
|
||||
"""Build complete JSONL transcript from SDK messages.
|
||||
|
||||
This builder maintains the FULL conversation state, not incremental changes.
|
||||
The output is always the complete active context.
|
||||
"""
|
||||
|
||||
def __init__(self) -> None:
|
||||
self._entries: list[TranscriptEntry] = []
|
||||
self._last_uuid: str | None = None
|
||||
|
||||
def _last_is_assistant(self) -> bool:
|
||||
return bool(self._entries) and self._entries[-1].type == "assistant"
|
||||
|
||||
def _last_message_id(self) -> str:
|
||||
"""Return the message.id of the last entry, or '' if none."""
|
||||
if self._entries:
|
||||
return self._entries[-1].message.get("id", "")
|
||||
return ""
|
||||
|
||||
@staticmethod
|
||||
def _parse_entry(data: dict) -> TranscriptEntry | None:
|
||||
"""Parse a single transcript entry, filtering strippable types.
|
||||
|
||||
Returns ``None`` for entries that should be skipped (strippable types
|
||||
that are not compaction summaries).
|
||||
"""
|
||||
entry_type = data.get("type", "")
|
||||
if entry_type in STRIPPABLE_TYPES and not data.get("isCompactSummary"):
|
||||
return None
|
||||
return TranscriptEntry(
|
||||
type=entry_type,
|
||||
uuid=data.get("uuid") or str(uuid4()),
|
||||
parentUuid=data.get("parentUuid") or "",
|
||||
isCompactSummary=data.get("isCompactSummary"),
|
||||
message=data.get("message", {}),
|
||||
)
|
||||
|
||||
def load_previous(self, content: str, log_prefix: str = "[Transcript]") -> None:
|
||||
"""Load complete previous transcript.
|
||||
|
||||
This loads the FULL previous context. As new messages come in,
|
||||
we append to this state. The final output is the complete context
|
||||
(previous + new), not just the delta.
|
||||
"""
|
||||
if not content or not content.strip():
|
||||
return
|
||||
|
||||
lines = content.strip().split("\n")
|
||||
for line_num, line in enumerate(lines, 1):
|
||||
if not line.strip():
|
||||
continue
|
||||
|
||||
data = json.loads(line, fallback=None)
|
||||
if data is None:
|
||||
logger.warning(
|
||||
"%s Failed to parse transcript line %d/%d",
|
||||
log_prefix,
|
||||
line_num,
|
||||
len(lines),
|
||||
)
|
||||
continue
|
||||
|
||||
entry = self._parse_entry(data)
|
||||
if entry is None:
|
||||
continue
|
||||
self._entries.append(entry)
|
||||
self._last_uuid = entry.uuid
|
||||
|
||||
logger.info(
|
||||
"%s Loaded %d entries from previous transcript (last_uuid=%s)",
|
||||
log_prefix,
|
||||
len(self._entries),
|
||||
self._last_uuid[:12] if self._last_uuid else None,
|
||||
)
|
||||
|
||||
def append_user(self, content: str | list[dict], uuid: str | None = None) -> None:
|
||||
"""Append a user entry."""
|
||||
msg_uuid = uuid or str(uuid4())
|
||||
|
||||
self._entries.append(
|
||||
TranscriptEntry(
|
||||
type="user",
|
||||
uuid=msg_uuid,
|
||||
parentUuid=self._last_uuid or "",
|
||||
message={"role": "user", "content": content},
|
||||
)
|
||||
)
|
||||
self._last_uuid = msg_uuid
|
||||
|
||||
def append_tool_result(self, tool_use_id: str, content: str) -> None:
|
||||
"""Append a tool result as a user entry (one per tool call)."""
|
||||
self.append_user(
|
||||
content=[
|
||||
{"type": "tool_result", "tool_use_id": tool_use_id, "content": content}
|
||||
]
|
||||
)
|
||||
|
||||
def append_assistant(
|
||||
self,
|
||||
content_blocks: list[dict],
|
||||
model: str = "",
|
||||
stop_reason: str | None = None,
|
||||
) -> None:
|
||||
"""Append an assistant entry.
|
||||
|
||||
Consecutive assistant entries automatically share the same message ID
|
||||
so the CLI can merge them (thinking → text → tool_use) into a single
|
||||
API message on ``--resume``. A new ID is assigned whenever an
|
||||
assistant entry follows a non-assistant entry (user message or tool
|
||||
result), because that marks the start of a new API response.
|
||||
"""
|
||||
message_id = (
|
||||
self._last_message_id()
|
||||
if self._last_is_assistant()
|
||||
else f"msg_sdk_{uuid4().hex[:24]}"
|
||||
)
|
||||
|
||||
msg_uuid = str(uuid4())
|
||||
|
||||
self._entries.append(
|
||||
TranscriptEntry(
|
||||
type="assistant",
|
||||
uuid=msg_uuid,
|
||||
parentUuid=self._last_uuid or "",
|
||||
message={
|
||||
"role": "assistant",
|
||||
"model": model,
|
||||
"id": message_id,
|
||||
"type": "message",
|
||||
"content": content_blocks,
|
||||
"stop_reason": stop_reason,
|
||||
"stop_sequence": None,
|
||||
},
|
||||
)
|
||||
)
|
||||
self._last_uuid = msg_uuid
|
||||
|
||||
def replace_entries(
|
||||
self, compacted_entries: list[dict], log_prefix: str = "[Transcript]"
|
||||
) -> None:
|
||||
"""Replace all entries with compacted entries from the CLI session file.
|
||||
|
||||
Called after mid-stream compaction so TranscriptBuilder mirrors the
|
||||
CLI's active context (compaction summary + post-compaction entries).
|
||||
|
||||
Builds the new list first and validates it's non-empty before swapping,
|
||||
so corrupt input cannot wipe the conversation history.
|
||||
"""
|
||||
new_entries: list[TranscriptEntry] = []
|
||||
for data in compacted_entries:
|
||||
entry = self._parse_entry(data)
|
||||
if entry is not None:
|
||||
new_entries.append(entry)
|
||||
|
||||
if not new_entries:
|
||||
logger.warning(
|
||||
"%s replace_entries produced 0 entries from %d inputs, keeping old (%d entries)",
|
||||
log_prefix,
|
||||
len(compacted_entries),
|
||||
len(self._entries),
|
||||
)
|
||||
return
|
||||
|
||||
old_count = len(self._entries)
|
||||
self._entries = new_entries
|
||||
self._last_uuid = new_entries[-1].uuid
|
||||
|
||||
logger.info(
|
||||
"%s TranscriptBuilder compacted: %d entries -> %d entries",
|
||||
log_prefix,
|
||||
old_count,
|
||||
len(self._entries),
|
||||
)
|
||||
|
||||
def to_jsonl(self) -> str:
|
||||
"""Export complete context as JSONL.
|
||||
|
||||
Consecutive assistant entries are kept separate to match the
|
||||
native CLI format — the SDK merges them internally on resume.
|
||||
|
||||
Returns the FULL conversation state (all entries), not incremental.
|
||||
This output REPLACES any previous transcript.
|
||||
"""
|
||||
if not self._entries:
|
||||
return ""
|
||||
|
||||
lines = [entry.model_dump_json(exclude_none=True) for entry in self._entries]
|
||||
return "\n".join(lines) + "\n"
|
||||
|
||||
@property
|
||||
def entry_count(self) -> int:
|
||||
"""Total number of entries in the complete context."""
|
||||
return len(self._entries)
|
||||
|
||||
@property
|
||||
def is_empty(self) -> bool:
|
||||
"""Whether this builder has any entries."""
|
||||
return len(self._entries) == 0
|
||||
|
||||
@property
|
||||
def last_entry_type(self) -> str | None:
|
||||
"""Type of the last entry, or None if empty."""
|
||||
return self._entries[-1].type if self._entries else None
|
||||
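The two chaining rules in `TranscriptBuilder` (each entry's `parentUuid` points at the previous entry, and consecutive assistant entries reuse one message ID) can be shown without the backend package. This is a stripped-down sketch using plain dicts and the standard-library `json` module instead of the pydantic model and `backend.util.json`; `MiniTranscript` is a hypothetical name, and it omits `load_previous`, `replace_entries`, and the strippable-type filter.

```python
import json
from uuid import uuid4


class MiniTranscript:
    def __init__(self) -> None:
        self.entries: list = []

    def _append(self, entry_type: str, message: dict) -> None:
        # Each new entry is chained to the previous one via parentUuid.
        parent = self.entries[-1]["uuid"] if self.entries else ""
        self.entries.append(
            {
                "type": entry_type,
                "uuid": str(uuid4()),
                "parentUuid": parent,
                "message": message,
            }
        )

    def append_user(self, content: str) -> None:
        self._append("user", {"role": "user", "content": content})

    def append_assistant(self, content_blocks: list) -> None:
        last = self.entries[-1] if self.entries else None
        if last is not None and last["type"] == "assistant":
            message_id = last["message"]["id"]  # same API response: reuse ID
        else:
            message_id = f"msg_sdk_{uuid4().hex[:24]}"  # new API response
        self._append(
            "assistant",
            {"role": "assistant", "id": message_id, "content": content_blocks},
        )

    def to_jsonl(self) -> str:
        if not self.entries:
            return ""
        return "\n".join(json.dumps(e) for e in self.entries) + "\n"


t = MiniTranscript()
t.append_user("q1")
t.append_assistant([{"type": "thinking", "thinking": "..."}])
t.append_assistant([{"type": "text", "text": "a1"}])
t.append_user("q2")
t.append_assistant([{"type": "text", "text": "a2"}])
rows = [json.loads(line) for line in t.to_jsonl().splitlines()]
```

The thinking and text entries after `q1` share one message ID, while the reply to `q2` gets a fresh one because a user entry sits in between.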
@@ -0,0 +1,260 @@
|
||||
"""Tests for canonical TranscriptBuilder (backend.copilot.transcript_builder).
|
||||
|
||||
These tests directly import from the canonical module to ensure codecov
|
||||
patch coverage for the new file.
|
||||
"""
|
||||
|
||||
from backend.copilot.transcript_builder import TranscriptBuilder, TranscriptEntry
|
||||
from backend.util import json
|
||||
|
||||
|
||||
def _make_jsonl(*entries: dict) -> str:
|
||||
return "\n".join(json.dumps(e) for e in entries) + "\n"
|
||||
|
||||
|
||||
USER_MSG = {
|
||||
"type": "user",
|
||||
"uuid": "u1",
|
||||
"message": {"role": "user", "content": "hello"},
|
||||
}
|
||||
ASST_MSG = {
|
||||
"type": "assistant",
|
||||
"uuid": "a1",
|
||||
"parentUuid": "u1",
|
||||
"message": {
|
||||
"role": "assistant",
|
||||
"id": "msg_1",
|
||||
"type": "message",
|
||||
"content": [{"type": "text", "text": "hi"}],
|
||||
"stop_reason": "end_turn",
|
||||
"stop_sequence": None,
|
||||
},
|
||||
}
|
||||
|
||||
|
||||
class TestTranscriptEntry:
|
||||
def test_basic_construction(self):
|
||||
entry = TranscriptEntry(
|
||||
type="user", uuid="u1", message={"role": "user", "content": "hi"}
|
||||
)
|
||||
assert entry.type == "user"
|
||||
assert entry.uuid == "u1"
|
||||
assert entry.parentUuid == ""
|
||||
assert entry.isCompactSummary is None
|
||||
|
||||
def test_optional_fields(self):
|
||||
entry = TranscriptEntry(
|
||||
type="summary",
|
||||
uuid="s1",
|
||||
parentUuid="p1",
|
||||
isCompactSummary=True,
|
||||
message={"role": "user", "content": "summary"},
|
||||
)
|
||||
assert entry.isCompactSummary is True
|
||||
assert entry.parentUuid == "p1"
|
||||
|
||||
|
||||
class TestTranscriptBuilderInit:
|
||||
def test_starts_empty(self):
|
||||
builder = TranscriptBuilder()
|
||||
assert builder.is_empty
|
||||
assert builder.entry_count == 0
|
||||
assert builder.last_entry_type is None
|
||||
assert builder.to_jsonl() == ""
|
||||
|
||||
|
||||
class TestAppendUser:
|
||||
def test_appends_user_entry(self):
|
||||
builder = TranscriptBuilder()
|
||||
builder.append_user("hello")
|
||||
assert builder.entry_count == 1
|
||||
assert builder.last_entry_type == "user"
|
||||
|
||||
def test_chains_parent_uuid(self):
|
||||
builder = TranscriptBuilder()
|
||||
builder.append_user("first", uuid="u1")
|
||||
builder.append_user("second", uuid="u2")
|
||||
output = builder.to_jsonl()
|
||||
entries = [json.loads(line) for line in output.strip().split("\n")]
|
||||
assert entries[0]["parentUuid"] == ""
|
||||
assert entries[1]["parentUuid"] == "u1"
|
||||
|
||||
def test_custom_uuid(self):
|
||||
builder = TranscriptBuilder()
|
||||
builder.append_user("hello", uuid="custom-id")
|
||||
output = builder.to_jsonl()
|
||||
entry = json.loads(output.strip())
|
||||
assert entry["uuid"] == "custom-id"
|
||||
|
||||
|
||||
class TestAppendToolResult:
|
||||
def test_appends_as_user_entry(self):
|
||||
builder = TranscriptBuilder()
|
||||
builder.append_tool_result(tool_use_id="tc_1", content="result text")
|
||||
assert builder.entry_count == 1
|
||||
assert builder.last_entry_type == "user"
|
||||
output = builder.to_jsonl()
|
||||
entry = json.loads(output.strip())
|
||||
content = entry["message"]["content"]
|
||||
assert len(content) == 1
|
||||
assert content[0]["type"] == "tool_result"
|
||||
assert content[0]["tool_use_id"] == "tc_1"
|
||||
assert content[0]["content"] == "result text"
|
||||
|
||||
|
||||
class TestAppendAssistant:
    def test_appends_assistant_entry(self):
        builder = TranscriptBuilder()
        builder.append_user("hi")
        builder.append_assistant(
            content_blocks=[{"type": "text", "text": "hello"}],
            model="test-model",
            stop_reason="end_turn",
        )
        assert builder.entry_count == 2
        assert builder.last_entry_type == "assistant"

    def test_consecutive_assistants_share_message_id(self):
        builder = TranscriptBuilder()
        builder.append_user("hi")
        builder.append_assistant(
            content_blocks=[{"type": "text", "text": "part 1"}],
            model="m",
        )
        builder.append_assistant(
            content_blocks=[{"type": "text", "text": "part 2"}],
            model="m",
        )
        output = builder.to_jsonl()
        entries = [json.loads(line) for line in output.strip().split("\n")]
        # The two assistant entries share the same message ID
        assert entries[1]["message"]["id"] == entries[2]["message"]["id"]

    def test_non_consecutive_assistants_get_different_ids(self):
        builder = TranscriptBuilder()
        builder.append_user("q1")
        builder.append_assistant(
            content_blocks=[{"type": "text", "text": "a1"}],
            model="m",
        )
        builder.append_user("q2")
        builder.append_assistant(
            content_blocks=[{"type": "text", "text": "a2"}],
            model="m",
        )
        output = builder.to_jsonl()
        entries = [json.loads(line) for line in output.strip().split("\n")]
        assert entries[1]["message"]["id"] != entries[3]["message"]["id"]


class TestLoadPrevious:
    def test_loads_valid_entries(self):
        content = _make_jsonl(USER_MSG, ASST_MSG)
        builder = TranscriptBuilder()
        builder.load_previous(content)
        assert builder.entry_count == 2

    def test_skips_empty_content(self):
        builder = TranscriptBuilder()
        builder.load_previous("")
        assert builder.is_empty
        builder.load_previous(" ")
        assert builder.is_empty

    def test_skips_strippable_types(self):
        progress = {"type": "progress", "uuid": "p1", "message": {}}
        content = _make_jsonl(USER_MSG, progress, ASST_MSG)
        builder = TranscriptBuilder()
        builder.load_previous(content)
        assert builder.entry_count == 2  # progress was skipped

    def test_preserves_compact_summary(self):
        compact = {
            "type": "summary",
            "uuid": "cs1",
            "isCompactSummary": True,
            "message": {"role": "user", "content": "summary"},
        }
        content = _make_jsonl(compact, ASST_MSG)
        builder = TranscriptBuilder()
        builder.load_previous(content)
        assert builder.entry_count == 2

    def test_skips_invalid_json_lines(self):
        content = '{"type":"user","uuid":"u1","message":{}}\nnot-valid-json\n'
        builder = TranscriptBuilder()
        builder.load_previous(content)
        assert builder.entry_count == 1


class TestToJsonl:
    def test_roundtrip(self):
        builder = TranscriptBuilder()
        builder.append_user("hello", uuid="u1")
        builder.append_assistant(
            content_blocks=[{"type": "text", "text": "world"}],
            model="m",
        )
        output = builder.to_jsonl()
        assert output.endswith("\n")
        lines = output.strip().split("\n")
        assert len(lines) == 2
        for line in lines:
            parsed = json.loads(line)
            assert "type" in parsed
            assert "uuid" in parsed
            assert "message" in parsed


class TestReplaceEntries:
    def test_replaces_all_entries(self):
        builder = TranscriptBuilder()
        builder.append_user("old")
        builder.append_assistant(
            content_blocks=[{"type": "text", "text": "old answer"}], model="m"
        )
        assert builder.entry_count == 2

        compacted = [
            {
                "type": "summary",
                "uuid": "cs1",
                "isCompactSummary": True,
                "message": {"role": "user", "content": "compacted"},
            }
        ]
        builder.replace_entries(compacted)
        assert builder.entry_count == 1

    def test_empty_replacement_keeps_existing(self):
        builder = TranscriptBuilder()
        builder.append_user("keep me")
        builder.replace_entries([])
        assert builder.entry_count == 1


class TestParseEntry:
    def test_filters_strippable_non_compact(self):
        result = TranscriptBuilder._parse_entry(
            {"type": "progress", "uuid": "p1", "message": {}}
        )
        assert result is None

    def test_keeps_compact_summary(self):
        result = TranscriptBuilder._parse_entry(
            {
                "type": "summary",
                "uuid": "cs1",
                "isCompactSummary": True,
                "message": {},
            }
        )
        assert result is not None
        assert result.isCompactSummary is True

    def test_generates_uuid_if_missing(self):
        result = TranscriptBuilder._parse_entry(
            {"type": "user", "message": {"role": "user", "content": "hi"}}
        )
        assert result is not None
        assert result.uuid  # Should be a generated UUID
726
autogpt_platform/backend/backend/copilot/transcript_test.py
Normal file
@@ -0,0 +1,726 @@
"""Tests for canonical transcript module (backend.copilot.transcript).

Covers pure helper functions that are not exercised by the SDK re-export tests.
"""

from __future__ import annotations

from unittest.mock import MagicMock

from backend.util import json

from .transcript import (
    TranscriptDownload,
    _build_path_from_parts,
    _find_last_assistant_entry,
    _flatten_assistant_content,
    _flatten_tool_result_content,
    _messages_to_transcript,
    _meta_storage_path_parts,
    _rechain_tail,
    _sanitize_id,
    _storage_path_parts,
    _transcript_to_messages,
    strip_for_upload,
    validate_transcript,
)


def _make_jsonl(*entries: dict) -> str:
    return "\n".join(json.dumps(e) for e in entries) + "\n"


# ---------------------------------------------------------------------------
# _sanitize_id
# ---------------------------------------------------------------------------


class TestSanitizeId:
    def test_uuid_passes_through(self):
        assert _sanitize_id("abcdef12-3456-7890-abcd-ef1234567890") == (
            "abcdef12-3456-7890-abcd-ef1234567890"
        )

    def test_strips_non_hex_characters(self):
        # Only hex chars (0-9, a-f, A-F) and hyphens are kept
        result = _sanitize_id("abc/../../etc/passwd")
        assert "/" not in result
        assert "." not in result
        # 'p', 's', 'w' are not hex chars, so they are stripped
        assert all(c in "0123456789abcdefABCDEF-" for c in result)

    def test_truncates_to_max_len(self):
        long_id = "a" * 100
        result = _sanitize_id(long_id, max_len=10)
        assert len(result) == 10

    def test_empty_returns_unknown(self):
        assert _sanitize_id("") == "unknown"

    def test_none_returns_unknown(self):
        assert _sanitize_id(None) == "unknown"  # type: ignore[arg-type]

    def test_special_chars_only_returns_unknown(self):
        assert _sanitize_id("!@#$%^&*()") == "unknown"
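
Review note: the tests above pin down the observable behaviour of `_sanitize_id`: only hex digits and hyphens survive, the result is truncated to `max_len`, and empty or fully stripped input falls back to `"unknown"`. A minimal sketch that would satisfy them is shown below. The name `sanitize_id_sketch` and the default `max_len` of 36 are assumptions for illustration; the real implementation and its default are not part of this diff.

```python
import re

# Hypothetical stand-in for _sanitize_id; not the code under test.
# The default max_len of 36 (length of a canonical UUID) is an assumption.
_NON_HEX_OR_HYPHEN = re.compile(r"[^0-9a-fA-F-]")


def sanitize_id_sketch(raw_id, max_len=36):
    """Keep only hex digits and hyphens, truncate, fall back to 'unknown'."""
    if not raw_id:
        # Covers both "" and None.
        return "unknown"
    cleaned = _NON_HEX_OR_HYPHEN.sub("", raw_id)[:max_len]
    # Input made only of disallowed characters collapses to "".
    return cleaned or "unknown"
```

Because path separators and dots are dropped before the ID is used in a storage path, a traversal attempt such as `abc/../../etc/passwd` degrades to a harmless hex string.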
# ---------------------------------------------------------------------------
# _storage_path_parts / _meta_storage_path_parts
# ---------------------------------------------------------------------------


class TestStoragePathParts:
    def test_returns_triple(self):
        prefix, uid, fname = _storage_path_parts("user-1", "sess-2")
        assert prefix == "chat-transcripts"
        assert "e" in uid  # hex chars from "user-1" sanitized
        assert fname.endswith(".jsonl")

    def test_meta_returns_meta_json(self):
        prefix, uid, fname = _meta_storage_path_parts("user-1", "sess-2")
        assert prefix == "chat-transcripts"
        assert fname.endswith(".meta.json")


# ---------------------------------------------------------------------------
# _build_path_from_parts
# ---------------------------------------------------------------------------


class TestBuildPathFromParts:
    def test_gcs_backend(self):
        from backend.util.workspace_storage import GCSWorkspaceStorage

        mock_gcs = MagicMock(spec=GCSWorkspaceStorage)
        mock_gcs.bucket_name = "my-bucket"
        path = _build_path_from_parts(("wid", "fid", "file.jsonl"), mock_gcs)
        assert path == "gcs://my-bucket/workspaces/wid/fid/file.jsonl"

    def test_local_backend(self):
        # Use a plain object (not MagicMock) so isinstance(GCSWorkspaceStorage) is False
        local_backend = type("LocalBackend", (), {})()
        path = _build_path_from_parts(("wid", "fid", "file.jsonl"), local_backend)
        assert path == "local://wid/fid/file.jsonl"


# ---------------------------------------------------------------------------
# TranscriptDownload dataclass
# ---------------------------------------------------------------------------


class TestTranscriptDownload:
    def test_defaults(self):
        td = TranscriptDownload(content="hello")
        assert td.content == "hello"
        assert td.message_count == 0
        assert td.uploaded_at == 0.0

    def test_custom_values(self):
        td = TranscriptDownload(content="data", message_count=5, uploaded_at=123.45)
        assert td.message_count == 5
        assert td.uploaded_at == 123.45


# ---------------------------------------------------------------------------
# _flatten_assistant_content
# ---------------------------------------------------------------------------


class TestFlattenAssistantContent:
    def test_text_blocks(self):
        blocks = [
            {"type": "text", "text": "Hello"},
            {"type": "text", "text": "World"},
        ]
        assert _flatten_assistant_content(blocks) == "Hello\nWorld"

    def test_thinking_blocks_stripped(self):
        blocks = [
            {"type": "thinking", "thinking": "hmm..."},
            {"type": "text", "text": "answer"},
            {"type": "redacted_thinking", "data": "secret"},
        ]
        assert _flatten_assistant_content(blocks) == "answer"

    def test_tool_use_blocks_stripped(self):
        blocks = [
            {"type": "text", "text": "I'll run a tool"},
            {"type": "tool_use", "name": "bash", "id": "tc1", "input": {}},
        ]
        assert _flatten_assistant_content(blocks) == "I'll run a tool"

    def test_string_blocks(self):
        blocks = ["hello", "world"]
        assert _flatten_assistant_content(blocks) == "hello\nworld"

    def test_empty_blocks(self):
        assert _flatten_assistant_content([]) == ""

    def test_unknown_dict_blocks_skipped(self):
        blocks = [{"type": "image", "data": "base64..."}]
        assert _flatten_assistant_content(blocks) == ""
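
Review note: taken together, these cases describe a flattening rule in which plain strings and `text` blocks are kept and joined with newlines, while every other block type (`thinking`, `redacted_thinking`, `tool_use`, `image`, and so on) is dropped silently. The sketch below is one way to express that rule; `flatten_assistant_content_sketch` is a hypothetical name, and the real helper may differ in details the tests do not cover.

```python
def flatten_assistant_content_sketch(blocks):
    """Join plain strings and text blocks with newlines; drop everything else."""
    parts = []
    for block in blocks:
        if isinstance(block, str):
            parts.append(block)
        elif isinstance(block, dict) and block.get("type") == "text":
            parts.append(block.get("text", ""))
        # Any other block type (thinking, tool_use, image, ...) is skipped.
    return "\n".join(parts)
```

Note the contrast with the tool-result flattener tested next, which replaces unknown block types with a `[__type__]` placeholder instead of dropping them.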
# ---------------------------------------------------------------------------
# _flatten_tool_result_content
# ---------------------------------------------------------------------------


class TestFlattenToolResultContent:
    def test_tool_result_with_text_content(self):
        blocks = [
            {
                "type": "tool_result",
                "tool_use_id": "tc1",
                "content": [{"type": "text", "text": "output data"}],
            }
        ]
        assert _flatten_tool_result_content(blocks) == "output data"

    def test_tool_result_with_string_content(self):
        blocks = [
            {"type": "tool_result", "tool_use_id": "tc1", "content": "simple string"}
        ]
        assert _flatten_tool_result_content(blocks) == "simple string"

    def test_tool_result_with_image_placeholder(self):
        blocks = [
            {
                "type": "tool_result",
                "tool_use_id": "tc1",
                "content": [{"type": "image", "data": "base64..."}],
            }
        ]
        assert _flatten_tool_result_content(blocks) == "[__image__]"

    def test_tool_result_with_document_placeholder(self):
        blocks = [
            {
                "type": "tool_result",
                "tool_use_id": "tc1",
                "content": [{"type": "document", "data": "base64..."}],
            }
        ]
        assert _flatten_tool_result_content(blocks) == "[__document__]"

    def test_tool_result_with_none_content(self):
        blocks = [{"type": "tool_result", "tool_use_id": "tc1", "content": None}]
        assert _flatten_tool_result_content(blocks) == ""

    def test_text_block_outside_tool_result(self):
        blocks = [{"type": "text", "text": "standalone"}]
        assert _flatten_tool_result_content(blocks) == "standalone"

    def test_unknown_dict_block_placeholder(self):
        blocks = [{"type": "custom_widget", "data": "x"}]
        assert _flatten_tool_result_content(blocks) == "[__custom_widget__]"

    def test_string_blocks(self):
        blocks = ["raw text"]
        assert _flatten_tool_result_content(blocks) == "raw text"

    def test_empty_blocks(self):
        assert _flatten_tool_result_content([]) == ""

    def test_mixed_content_in_tool_result(self):
        blocks = [
            {
                "type": "tool_result",
                "tool_use_id": "tc1",
                "content": [
                    {"type": "text", "text": "line1"},
                    {"type": "image", "data": "..."},
                    "raw string",
                ],
            }
        ]
        result = _flatten_tool_result_content(blocks)
        assert "line1" in result
        assert "[__image__]" in result
        assert "raw string" in result

    def test_tool_result_with_dict_without_text_key(self):
        blocks = [
            {
                "type": "tool_result",
                "tool_use_id": "tc1",
                "content": [{"count": 42}],
            }
        ]
        result = _flatten_tool_result_content(blocks)
        assert "42" in result

    def test_tool_result_content_list_with_list_content(self):
        blocks = [
            {
                "type": "tool_result",
                "tool_use_id": "tc1",
                "content": [{"type": "text", "text": None}],
            }
        ]
        result = _flatten_tool_result_content(blocks)
        assert result == "None"


# ---------------------------------------------------------------------------
# _transcript_to_messages
# ---------------------------------------------------------------------------

USER_ENTRY = {
    "type": "user",
    "uuid": "u1",
    "parentUuid": "",
    "message": {"role": "user", "content": "hello"},
}
ASST_ENTRY = {
    "type": "assistant",
    "uuid": "a1",
    "parentUuid": "u1",
    "message": {
        "role": "assistant",
        "id": "msg_1",
        "content": [{"type": "text", "text": "hi there"}],
    },
}
PROGRESS_ENTRY = {
    "type": "progress",
    "uuid": "p1",
    "parentUuid": "u1",
    "data": {},
}


class TestTranscriptToMessages:
    def test_basic_conversion(self):
        content = _make_jsonl(USER_ENTRY, ASST_ENTRY)
        messages = _transcript_to_messages(content)
        assert len(messages) == 2
        assert messages[0] == {"role": "user", "content": "hello"}
        assert messages[1]["role"] == "assistant"
        assert messages[1]["content"] == "hi there"

    def test_skips_strippable_types(self):
        content = _make_jsonl(USER_ENTRY, PROGRESS_ENTRY, ASST_ENTRY)
        messages = _transcript_to_messages(content)
        assert len(messages) == 2

    def test_skips_entries_without_role(self):
        no_role = {"type": "user", "uuid": "x", "message": {"content": "no role"}}
        content = _make_jsonl(no_role)
        messages = _transcript_to_messages(content)
        assert len(messages) == 0

    def test_handles_string_content(self):
        entry = {
            "type": "user",
            "uuid": "u1",
            "message": {"role": "user", "content": "plain string"},
        }
        content = _make_jsonl(entry)
        messages = _transcript_to_messages(content)
        assert messages[0]["content"] == "plain string"

    def test_handles_tool_result_content(self):
        entry = {
            "type": "user",
            "uuid": "u1",
            "message": {
                "role": "user",
                "content": [
                    {"type": "tool_result", "tool_use_id": "tc1", "content": "output"}
                ],
            },
        }
        content = _make_jsonl(entry)
        messages = _transcript_to_messages(content)
        assert messages[0]["content"] == "output"

    def test_handles_none_content(self):
        entry = {
            "type": "assistant",
            "uuid": "a1",
            "message": {"role": "assistant", "content": None},
        }
        content = _make_jsonl(entry)
        messages = _transcript_to_messages(content)
        assert messages[0]["content"] == ""

    def test_skips_invalid_json(self):
        content = "not valid json\n"
        messages = _transcript_to_messages(content)
        assert len(messages) == 0

    def test_preserves_compact_summary(self):
        compact = {
            "type": "summary",
            "uuid": "cs1",
            "isCompactSummary": True,
            "message": {"role": "user", "content": "summary of conversation"},
        }
        content = _make_jsonl(compact)
        messages = _transcript_to_messages(content)
        assert len(messages) == 1

    def test_strips_summary_without_compact_flag(self):
        summary = {
            "type": "summary",
            "uuid": "s1",
            "message": {"role": "user", "content": "summary"},
        }
        content = _make_jsonl(summary)
        messages = _transcript_to_messages(content)
        assert len(messages) == 0


# ---------------------------------------------------------------------------
# _messages_to_transcript
# ---------------------------------------------------------------------------


class TestMessagesToTranscript:
    def test_basic_roundtrip(self):
        messages = [
            {"role": "user", "content": "hello"},
            {"role": "assistant", "content": "world"},
        ]
        result = _messages_to_transcript(messages)
        assert result.endswith("\n")
        lines = result.strip().split("\n")
        assert len(lines) == 2

        user_entry = json.loads(lines[0])
        assert user_entry["type"] == "user"
        assert user_entry["message"]["role"] == "user"
        assert user_entry["message"]["content"] == "hello"
        assert user_entry["parentUuid"] == ""

        asst_entry = json.loads(lines[1])
        assert asst_entry["type"] == "assistant"
        assert asst_entry["message"]["role"] == "assistant"
        assert asst_entry["message"]["content"] == [{"type": "text", "text": "world"}]
        assert asst_entry["parentUuid"] == user_entry["uuid"]

    def test_empty_messages(self):
        assert _messages_to_transcript([]) == ""

    def test_assistant_has_message_envelope(self):
        messages = [{"role": "assistant", "content": "test"}]
        result = _messages_to_transcript(messages)
        entry = json.loads(result.strip())
        msg = entry["message"]
        assert "id" in msg
        assert msg["id"].startswith("msg_compact_")
        assert msg["type"] == "message"
        assert msg["stop_reason"] == "end_turn"
        assert msg["stop_sequence"] is None

    def test_uuid_chain(self):
        messages = [
            {"role": "user", "content": "a"},
            {"role": "assistant", "content": "b"},
            {"role": "user", "content": "c"},
        ]
        result = _messages_to_transcript(messages)
        lines = result.strip().split("\n")
        entries = [json.loads(line) for line in lines]
        assert entries[0]["parentUuid"] == ""
        assert entries[1]["parentUuid"] == entries[0]["uuid"]
        assert entries[2]["parentUuid"] == entries[1]["uuid"]

    def test_assistant_with_empty_content(self):
        messages = [{"role": "assistant", "content": ""}]
        result = _messages_to_transcript(messages)
        entry = json.loads(result.strip())
        assert entry["message"]["content"] == []
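
Review note: the assertions above imply a simple serialization scheme. Each message becomes one JSONL entry with a fresh `uuid`, its `parentUuid` points at the previous entry (empty string for the first), and assistant messages are wrapped in an API-style envelope whose `id` starts with `msg_compact_`. The sketch below reproduces only what the tests check; the function name, the use of `uuid4`, and the length of the ID suffix are assumptions, not the module's actual code.

```python
import json
import uuid


def messages_to_transcript_sketch(messages):
    """Serialize role/content messages as JSONL entries linked by parentUuid."""
    lines = []
    parent_uuid = ""  # the first entry has no parent
    for msg in messages:
        entry_uuid = str(uuid.uuid4())
        role = msg["role"]
        text = msg.get("content") or ""
        if role == "assistant":
            # Assistant entries carry a message envelope; empty text
            # becomes an empty content list rather than an empty block.
            message = {
                "role": "assistant",
                "id": f"msg_compact_{uuid.uuid4().hex[:24]}",  # suffix length assumed
                "type": "message",
                "content": [{"type": "text", "text": text}] if text else [],
                "stop_reason": "end_turn",
                "stop_sequence": None,
            }
        else:
            message = {"role": role, "content": text}
        entry = {
            "type": role,
            "uuid": entry_uuid,
            "parentUuid": parent_uuid,
            "message": message,
        }
        lines.append(json.dumps(entry))
        parent_uuid = entry_uuid  # the next entry chains off this one
    return "\n".join(lines) + "\n" if lines else ""
```

Returning `""` for an empty list, rather than a lone newline, matches `test_empty_messages`.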
# ---------------------------------------------------------------------------
# _find_last_assistant_entry
# ---------------------------------------------------------------------------


class TestFindLastAssistantEntry:
    def test_splits_at_last_assistant(self):
        user = {
            "type": "user",
            "uuid": "u1",
            "message": {"role": "user", "content": "hi"},
        }
        asst = {
            "type": "assistant",
            "uuid": "a1",
            "message": {"role": "assistant", "id": "msg1", "content": "answer"},
        }
        content = _make_jsonl(user, asst)
        prefix, tail = _find_last_assistant_entry(content)
        assert len(prefix) == 1
        assert len(tail) == 1

    def test_no_assistant_returns_all_in_prefix(self):
        user1 = {
            "type": "user",
            "uuid": "u1",
            "message": {"role": "user", "content": "hi"},
        }
        user2 = {
            "type": "user",
            "uuid": "u2",
            "message": {"role": "user", "content": "hey"},
        }
        content = _make_jsonl(user1, user2)
        prefix, tail = _find_last_assistant_entry(content)
        assert len(prefix) == 2
        assert len(tail) == 0

    def test_multi_entry_turn_preserved(self):
        user = {
            "type": "user",
            "uuid": "u1",
            "message": {"role": "user", "content": "q"},
        }
        asst1 = {
            "type": "assistant",
            "uuid": "a1",
            "message": {
                "role": "assistant",
                "id": "msg_turn",
                "content": [{"type": "thinking", "thinking": "hmm"}],
            },
        }
        asst2 = {
            "type": "assistant",
            "uuid": "a2",
            "message": {
                "role": "assistant",
                "id": "msg_turn",
                "content": [{"type": "text", "text": "answer"}],
            },
        }
        content = _make_jsonl(user, asst1, asst2)
        prefix, tail = _find_last_assistant_entry(content)
        assert len(prefix) == 1  # just the user
        assert len(tail) == 2  # both assistant entries

    def test_assistant_without_id(self):
        user = {
            "type": "user",
            "uuid": "u1",
            "message": {"role": "user", "content": "q"},
        }
        asst = {
            "type": "assistant",
            "uuid": "a1",
            "message": {"role": "assistant", "content": "no id"},
        }
        content = _make_jsonl(user, asst)
        prefix, tail = _find_last_assistant_entry(content)
        assert len(prefix) == 1
        assert len(tail) == 1

    def test_trailing_user_after_assistant(self):
        user1 = {
            "type": "user",
            "uuid": "u1",
            "message": {"role": "user", "content": "q"},
        }
        asst = {
            "type": "assistant",
            "uuid": "a1",
            "message": {"role": "assistant", "id": "msg1", "content": "a"},
        }
        user2 = {
            "type": "user",
            "uuid": "u2",
            "message": {"role": "user", "content": "follow"},
        }
        content = _make_jsonl(user1, asst, user2)
        prefix, tail = _find_last_assistant_entry(content)
        assert len(prefix) == 1  # user1
        assert len(tail) == 2  # asst + user2


# ---------------------------------------------------------------------------
# _rechain_tail
# ---------------------------------------------------------------------------


class TestRechainTail:
    def test_empty_tail(self):
        assert _rechain_tail("some prefix\n", []) == ""

    def test_patches_first_entry_parent(self):
        prefix_entry = {"uuid": "last-prefix-uuid", "type": "user", "message": {}}
        prefix = json.dumps(prefix_entry) + "\n"

        tail_entry = {
            "uuid": "t1",
            "parentUuid": "old-parent",
            "type": "assistant",
            "message": {},
        }
        tail_lines = [json.dumps(tail_entry)]

        result = _rechain_tail(prefix, tail_lines)
        parsed = json.loads(result.strip())
        assert parsed["parentUuid"] == "last-prefix-uuid"

    def test_chains_consecutive_tail_entries(self):
        prefix_entry = {"uuid": "p1", "type": "user", "message": {}}
        prefix = json.dumps(prefix_entry) + "\n"

        t1 = {"uuid": "t1", "parentUuid": "old1", "type": "assistant", "message": {}}
        t2 = {"uuid": "t2", "parentUuid": "old2", "type": "user", "message": {}}
        tail_lines = [json.dumps(t1), json.dumps(t2)]

        result = _rechain_tail(prefix, tail_lines)
        entries = [json.loads(line) for line in result.strip().split("\n")]
        assert entries[0]["parentUuid"] == "p1"
        assert entries[1]["parentUuid"] == "t1"

    def test_non_dict_lines_passed_through(self):
        prefix_entry = {"uuid": "p1", "type": "user", "message": {}}
        prefix = json.dumps(prefix_entry) + "\n"

        tail_lines = ["not-a-json-dict"]
        result = _rechain_tail(prefix, tail_lines)
        assert "not-a-json-dict" in result
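
Review note: these tests describe re-linking a preserved tail onto a rewritten prefix. The first tail entry's `parentUuid` is pointed at the last `uuid` in the prefix, each later tail entry is pointed at the entry before it, and lines that are not JSON objects are passed through untouched. A minimal sketch of that behaviour, using only the standard `json` module, could look like the following. The names `rechain_tail_sketch` and `_last_uuid` are hypothetical, and the real helper may handle edge cases the tests do not exercise.

```python
import json


def _last_uuid(jsonl):
    """Return the uuid of the last parseable dict entry in a JSONL string."""
    for line in reversed(jsonl.strip().split("\n")):
        try:
            entry = json.loads(line)
        except ValueError:  # json.JSONDecodeError subclasses ValueError
            continue
        if isinstance(entry, dict) and "uuid" in entry:
            return entry["uuid"]
    return ""


def rechain_tail_sketch(prefix, tail_lines):
    """Re-link tail entries so the first one hangs off the end of prefix."""
    if not tail_lines:
        return ""
    parent = _last_uuid(prefix)
    out = []
    for line in tail_lines:
        try:
            entry = json.loads(line)
        except ValueError:
            entry = None
        if not isinstance(entry, dict):
            out.append(line)  # pass through anything that is not a JSON object
            continue
        entry["parentUuid"] = parent
        parent = entry.get("uuid", parent)
        out.append(json.dumps(entry))
    return "\n".join(out) + "\n"
```

Stale `parentUuid` values in the tail (for example, ones that referenced entries removed during compaction) are overwritten rather than trusted, which is the property `test_chains_consecutive_tail_entries` checks.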
# ---------------------------------------------------------------------------
# strip_for_upload (combined single-parse)
# ---------------------------------------------------------------------------


class TestStripForUpload:
    def test_strips_progress_and_thinking(self):
        user = {
            "type": "user",
            "uuid": "u1",
            "parentUuid": "",
            "message": {"role": "user", "content": "hi"},
        }
        progress = {"type": "progress", "uuid": "p1", "parentUuid": "u1", "data": {}}
        asst_old = {
            "type": "assistant",
            "uuid": "a1",
            "parentUuid": "p1",
            "message": {
                "role": "assistant",
                "id": "msg_old",
                "content": [
                    {"type": "thinking", "thinking": "stale thinking"},
                    {"type": "text", "text": "old answer"},
                ],
            },
        }
        user2 = {
            "type": "user",
            "uuid": "u2",
            "parentUuid": "a1",
            "message": {"role": "user", "content": "next"},
        }
        asst_new = {
            "type": "assistant",
            "uuid": "a2",
            "parentUuid": "u2",
            "message": {
                "role": "assistant",
                "id": "msg_new",
                "content": [
                    {"type": "thinking", "thinking": "fresh thinking"},
                    {"type": "text", "text": "new answer"},
                ],
            },
        }
        content = _make_jsonl(user, progress, asst_old, user2, asst_new)
        result = strip_for_upload(content)

        lines = result.strip().split("\n")
        # Progress should be stripped -> 4 entries remain
        assert len(lines) == 4

        # asst_old's parent (progress) was stripped, so asst_old should be reparented
        entries = [json.loads(line) for line in lines]
        types = [e.get("type") for e in entries]
        assert "progress" not in types

        # Old assistant thinking stripped, new assistant thinking preserved
        old_asst = next(
            e for e in entries if e.get("message", {}).get("id") == "msg_old"
        )
        old_content = old_asst["message"]["content"]
        old_types = [b["type"] for b in old_content if isinstance(b, dict)]
        assert "thinking" not in old_types
        assert "text" in old_types

        new_asst = next(
            e for e in entries if e.get("message", {}).get("id") == "msg_new"
        )
        new_content = new_asst["message"]["content"]
        new_types = [b["type"] for b in new_content if isinstance(b, dict)]
        assert "thinking" in new_types  # last assistant preserved

    def test_empty_content(self):
        result = strip_for_upload("")
        # Empty string produces a single empty line after split, resulting in "\n"
        assert result.strip() == ""

    def test_preserves_compact_summary(self):
        compact = {
            "type": "summary",
            "uuid": "cs1",
            "isCompactSummary": True,
            "message": {"role": "user", "content": "summary"},
        }
        asst = {
            "type": "assistant",
            "uuid": "a1",
            "parentUuid": "cs1",
            "message": {"role": "assistant", "id": "msg1", "content": "answer"},
        }
        content = _make_jsonl(compact, asst)
        result = strip_for_upload(content)
        lines = result.strip().split("\n")
        assert len(lines) == 2

    def test_no_assistant_entries(self):
        user = {
            "type": "user",
            "uuid": "u1",
            "message": {"role": "user", "content": "hi"},
        }
        content = _make_jsonl(user)
        result = strip_for_upload(content)
        lines = result.strip().split("\n")
        assert len(lines) == 1


# ---------------------------------------------------------------------------
# validate_transcript (additional edge cases)
# ---------------------------------------------------------------------------


class TestValidateTranscript:
    def test_valid_with_assistant(self):
        content = _make_jsonl(
            USER_ENTRY,
            ASST_ENTRY,
        )
        assert validate_transcript(content) is True

    def test_none_returns_false(self):
        assert validate_transcript(None) is False

    def test_whitespace_only_returns_false(self):
        assert validate_transcript(" \n ") is False

    def test_no_assistant_returns_false(self):
        content = _make_jsonl(USER_ENTRY)
        assert validate_transcript(content) is False

    def test_invalid_json_returns_false(self):
        assert validate_transcript("not json\n") is False

    def test_assistant_only_is_valid(self):
        content = _make_jsonl(ASST_ENTRY)
        assert validate_transcript(content) is True
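
Review note: the edge cases above suggest that a transcript counts as valid only when it is non-empty, every non-blank line parses as JSON, and at least one entry has type `assistant`. The sketch below encodes exactly those three conditions. It is an illustration inferred from the tests; `validate_transcript_sketch` is a hypothetical name, and the real function may apply further checks not visible in this diff.

```python
import json


def validate_transcript_sketch(content):
    """Return True if every non-blank line is JSON and one is an assistant entry."""
    if not content or not content.strip():
        return False  # None, "", and whitespace-only input are all invalid
    has_assistant = False
    for line in content.strip().split("\n"):
        if not line.strip():
            continue
        try:
            entry = json.loads(line)
        except ValueError:  # json.JSONDecodeError subclasses ValueError
            return False
        if isinstance(entry, dict) and entry.get("type") == "assistant":
            has_assistant = True
    return has_assistant
```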
@@ -147,6 +147,19 @@ MODEL_COST: dict[LlmModel, int] = {
    LlmModel.KIMI_K2: 1,
    LlmModel.QWEN3_235B_A22B_THINKING: 1,
    LlmModel.QWEN3_CODER: 9,
    # Z.ai (Zhipu) models
    LlmModel.ZAI_GLM_4_32B: 1,
    LlmModel.ZAI_GLM_4_5: 2,
    LlmModel.ZAI_GLM_4_5_AIR: 1,
    LlmModel.ZAI_GLM_4_5_AIR_FREE: 1,
    LlmModel.ZAI_GLM_4_5V: 2,
    LlmModel.ZAI_GLM_4_6: 1,
    LlmModel.ZAI_GLM_4_6V: 1,
    LlmModel.ZAI_GLM_4_7: 1,
    LlmModel.ZAI_GLM_4_7_FLASH: 1,
    LlmModel.ZAI_GLM_5: 2,
    LlmModel.ZAI_GLM_5_TURBO: 4,
    LlmModel.ZAI_GLM_5V_TURBO: 4,
    # v0 by Vercel models
    LlmModel.V0_1_5_MD: 1,
    LlmModel.V0_1_5_LG: 2,

@@ -142,3 +142,9 @@ def credit_db():
    credit_db = get_database_manager_async_client()

    return credit_db


def platform_cost_db():
    from backend.util.clients import get_database_manager_async_client

    return get_database_manager_async_client()

@@ -96,6 +96,7 @@ from backend.data.notifications import (
    remove_notifications_from_batch,
)
from backend.data.onboarding import increment_onboarding_runs
from backend.data.platform_cost import log_platform_cost
from backend.data.understanding import (
    get_business_understanding,
    upsert_business_understanding,
@@ -332,6 +333,9 @@ class DatabaseManager(AppService):
    get_blocks_needing_optimization = _(get_blocks_needing_optimization)
    update_block_optimized_description = _(update_block_optimized_description)

    # ============ Platform Cost Tracking ============ #
    log_platform_cost = _(log_platform_cost)

    # ============ CoPilot Chat Sessions ============ #
    get_chat_session = _(chat_db.get_chat_session)
    create_chat_session = _(chat_db.create_chat_session)
@@ -529,6 +533,9 @@ class DatabaseManagerAsyncClient(AppServiceClient):
    # ============ Block Descriptions ============ #
    get_blocks_needing_optimization = d.get_blocks_needing_optimization

    # ============ Platform Cost Tracking ============ #
    log_platform_cost = d.log_platform_cost

    # ============ CoPilot Chat Sessions ============ #
    get_chat_session = d.get_chat_session
    create_chat_session = d.create_chat_session

@@ -104,6 +104,11 @@ class User(BaseModel):
        description="User timezone (IANA timezone identifier or 'not-set')",
    )

    # Subscription / rate-limit tier
    subscription_tier: str | None = Field(
        default=None, description="Subscription tier (FREE, PRO, BUSINESS, ENTERPRISE)"
    )

    @classmethod
    def from_db(cls, prisma_user: "PrismaUser") -> "User":
        """Convert a database User object to application User model."""
@@ -158,6 +163,7 @@ class User(BaseModel):
            notify_on_weekly_summary=prisma_user.notifyOnWeeklySummary or True,
            notify_on_monthly_summary=prisma_user.notifyOnMonthlySummary or True,
            timezone=prisma_user.timezone or USER_TIMEZONE_NOT_SET,
            subscription_tier=prisma_user.subscriptionTier,
        )


@@ -819,6 +825,17 @@ class RefundRequest(BaseModel):
|
||||
updated_at: datetime
|
||||
|
||||
|
||||
ProviderCostType = Literal[
|
||||
"cost_usd", # Actual USD cost reported by the provider
|
||||
"tokens", # LLM token counts (sum of input + output)
|
||||
"characters", # Per-character billing (TTS providers)
|
||||
"sandbox_seconds", # Per-second compute billing (e.g. E2B)
|
||||
"walltime_seconds", # Per-second billing incl. queue/polling
|
||||
"per_run", # Per-API-call billing with fixed cost
|
||||
"items", # Per-item billing (lead/organization/result count)
|
||||
]
|
||||
|
||||
|
||||
class NodeExecutionStats(BaseModel):
|
||||
"""Execution statistics for a node execution."""
|
||||
|
||||
@@ -838,32 +855,39 @@ class NodeExecutionStats(BaseModel):
     output_token_count: int = 0
     extra_cost: int = 0
     extra_steps: int = 0
+    provider_cost: float | None = None
+    # Type of the provider-reported cost/usage captured above. When set
+    # by a block, resolve_tracking honors this directly instead of
+    # guessing from provider name.
+    provider_cost_type: Optional[ProviderCostType] = None
     # Moderation fields
     cleared_inputs: Optional[dict[str, list[str]]] = None
     cleared_outputs: Optional[dict[str, list[str]]] = None

     def __iadd__(self, other: "NodeExecutionStats") -> "NodeExecutionStats":
-        """Mutate this instance by adding another NodeExecutionStats."""
+        """Mutate this instance by adding another NodeExecutionStats.
+
+        Avoids calling model_dump() twice per merge (called on every
+        merge_stats() from ~20+ blocks); reads via getattr/vars instead.
+        """
         if not isinstance(other, NodeExecutionStats):
             return NotImplemented

-        stats_dict = other.model_dump()
-        current_stats = self.model_dump()
-
-        for key, value in stats_dict.items():
-            if key not in current_stats:
-                # Field doesn't exist yet, just set it
+        for key in type(other).model_fields:
+            value = getattr(other, key)
+            if value is None:
+                # Never overwrite an existing value with None
+                continue
+            current = getattr(self, key, None)
+            if current is None:
+                # Field doesn't exist yet or is None, just set it
                 setattr(self, key, value)
-            elif isinstance(value, dict) and isinstance(current_stats[key], dict):
-                current_stats[key].update(value)
-                setattr(self, key, current_stats[key])
-            elif isinstance(value, (int, float)) and isinstance(
-                current_stats[key], (int, float)
-            ):
-                setattr(self, key, current_stats[key] + value)
-            elif isinstance(value, list) and isinstance(current_stats[key], list):
-                current_stats[key].extend(value)
-                setattr(self, key, current_stats[key])
+            elif isinstance(value, dict) and isinstance(current, dict):
+                current.update(value)
+            elif isinstance(value, (int, float)) and isinstance(current, (int, float)):
+                setattr(self, key, current + value)
+            elif isinstance(value, list) and isinstance(current, list):
+                current.extend(value)
             else:
                 setattr(self, key, value)
|
||||
|
||||
|
||||
@@ -1,7 +1,7 @@
 import pytest
 from pydantic import SecretStr

-from backend.data.model import HostScopedCredentials
+from backend.data.model import HostScopedCredentials, NodeExecutionStats


 class TestHostScopedCredentials:
|
||||
@@ -166,3 +166,84 @@ class TestHostScopedCredentials:
|
||||
)
|
||||
|
||||
assert creds.matches_url(test_url) == expected
|
||||
|
||||
|
||||
class TestNodeExecutionStatsIadd:
|
||||
def test_adds_numeric_fields(self):
|
||||
a = NodeExecutionStats(input_token_count=100, output_token_count=50)
|
||||
b = NodeExecutionStats(input_token_count=200, output_token_count=30)
|
||||
a += b
|
||||
assert a.input_token_count == 300
|
||||
assert a.output_token_count == 80
|
||||
|
||||
def test_none_does_not_overwrite(self):
|
||||
a = NodeExecutionStats(provider_cost=0.5, error="some error")
|
||||
b = NodeExecutionStats(provider_cost=None, error=None)
|
||||
a += b
|
||||
assert a.provider_cost == 0.5
|
||||
assert a.error == "some error"
|
||||
|
||||
def test_none_is_skipped_preserving_existing_value(self):
|
||||
a = NodeExecutionStats(input_token_count=100)
|
||||
b = NodeExecutionStats()
|
||||
a += b
|
||||
assert a.input_token_count == 100
|
||||
|
||||
def test_dict_fields_are_merged(self):
|
||||
a = NodeExecutionStats(
|
||||
cleared_inputs={"field1": ["val1"]},
|
||||
)
|
||||
b = NodeExecutionStats(
|
||||
cleared_inputs={"field2": ["val2"]},
|
||||
)
|
||||
a += b
|
||||
assert a.cleared_inputs == {"field1": ["val1"], "field2": ["val2"]}
|
||||
|
||||
def test_returns_self(self):
|
||||
a = NodeExecutionStats()
|
||||
b = NodeExecutionStats(input_token_count=10)
|
||||
result = a.__iadd__(b)
|
||||
assert result is a
|
||||
|
||||
def test_not_implemented_for_non_stats(self):
|
||||
a = NodeExecutionStats()
|
||||
result = a.__iadd__("not a stats") # type: ignore[arg-type]
|
||||
assert result is NotImplemented
|
||||
|
||||
def test_error_none_does_not_clear_existing_error(self):
|
||||
a = NodeExecutionStats(error="existing error")
|
||||
b = NodeExecutionStats(error=None)
|
||||
a += b
|
||||
assert a.error == "existing error"
|
||||
|
||||
def test_provider_cost_none_does_not_clear_existing_cost(self):
|
||||
a = NodeExecutionStats(provider_cost=0.05)
|
||||
b = NodeExecutionStats(provider_cost=None)
|
||||
a += b
|
||||
assert a.provider_cost == 0.05
|
||||
|
||||
def test_provider_cost_accumulates_when_both_set(self):
|
||||
a = NodeExecutionStats(provider_cost=0.01)
|
||||
b = NodeExecutionStats(provider_cost=0.02)
|
||||
a += b
|
||||
assert abs((a.provider_cost or 0) - 0.03) < 1e-9
|
||||
|
||||
def test_provider_cost_first_write_from_none(self):
|
||||
a = NodeExecutionStats()
|
||||
b = NodeExecutionStats(provider_cost=0.05)
|
||||
a += b
|
||||
assert a.provider_cost == 0.05
|
||||
|
||||
def test_provider_cost_type_first_write_from_none(self):
|
||||
"""Writing provider_cost_type into a stats with None sets it."""
|
||||
a = NodeExecutionStats()
|
||||
b = NodeExecutionStats(provider_cost_type="characters")
|
||||
a += b
|
||||
assert a.provider_cost_type == "characters"
|
||||
|
||||
def test_provider_cost_type_none_does_not_overwrite(self):
|
||||
"""A None provider_cost_type from other must not clear an existing value."""
|
||||
a = NodeExecutionStats(provider_cost_type="tokens")
|
||||
b = NodeExecutionStats()
|
||||
a += b
|
||||
assert a.provider_cost_type == "tokens"
|
||||
|
||||
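The merge rule exercised by `TestNodeExecutionStatsIadd` (skip `None`, set on first write, add numbers, merge dicts, extend lists) can be sketched without Pydantic. This is a minimal illustration under stated assumptions, not the repository's implementation: `Stats` is a hypothetical dataclass stand-in for `NodeExecutionStats` with only a few representative fields.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Stats:
    # Hypothetical stand-in for NodeExecutionStats (a few representative fields only).
    input_token_count: int = 0
    provider_cost: Optional[float] = None
    provider_cost_type: Optional[str] = None
    cleared_inputs: Optional[dict] = None

    def __iadd__(self, other: "Stats") -> "Stats":
        if not isinstance(other, Stats):
            return NotImplemented
        for f in fields(other):
            value = getattr(other, f.name)
            if value is None:
                continue  # never overwrite an existing value with None
            current = getattr(self, f.name)
            if current is None:
                setattr(self, f.name, value)  # first write into an unset field
            elif isinstance(value, dict) and isinstance(current, dict):
                current.update(value)  # merge in place
            elif isinstance(value, (int, float)) and isinstance(current, (int, float)):
                setattr(self, f.name, current + value)  # accumulate numbers
            elif isinstance(value, list) and isinstance(current, list):
                current.extend(value)  # concatenate in place
            else:
                setattr(self, f.name, value)  # fall back to overwrite
        return self


a = Stats(input_token_count=100, provider_cost=0.5, cleared_inputs={"field1": ["val1"]})
b = Stats(input_token_count=200, cleared_inputs={"field2": ["val2"]})
a += b
```

After the merge, `a.provider_cost` is still `0.5` because `b.provider_cost` is `None`, while the token counts are summed and the two `cleared_inputs` dicts are combined.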
390
autogpt_platform/backend/backend/data/platform_cost.py
Normal file
@@ -0,0 +1,390 @@
|
||||
import asyncio
|
||||
import json
|
||||
import logging
|
||||
from datetime import datetime, timedelta, timezone
|
||||
from typing import Any
|
||||
|
||||
from pydantic import BaseModel
|
||||
|
||||
from backend.data.db import execute_raw_with_schema, query_raw_with_schema
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
MICRODOLLARS_PER_USD = 1_000_000
|
||||
|
||||
# Dashboard query limits — keep in sync with the SQL queries below
|
||||
MAX_PROVIDER_ROWS = 500
|
||||
MAX_USER_ROWS = 100
|
||||
|
||||
# Default date range for dashboard queries when no start date is provided.
|
||||
# Prevents full-table scans on large deployments.
|
||||
DEFAULT_DASHBOARD_DAYS = 30
|
||||
|
||||
|
||||
def usd_to_microdollars(cost_usd: float | None) -> int | None:
|
||||
"""Convert a USD amount (float) to microdollars (int). None-safe."""
|
||||
if cost_usd is None:
|
||||
return None
|
||||
return round(cost_usd * MICRODOLLARS_PER_USD)
|
||||
|
||||
|
||||
class PlatformCostEntry(BaseModel):
|
||||
user_id: str
|
||||
graph_exec_id: str | None = None
|
||||
node_exec_id: str | None = None
|
||||
graph_id: str | None = None
|
||||
node_id: str | None = None
|
||||
block_id: str
|
||||
block_name: str
|
||||
provider: str
|
||||
credential_id: str
|
||||
cost_microdollars: int | None = None
|
||||
input_tokens: int | None = None
|
||||
output_tokens: int | None = None
|
||||
data_size: int | None = None
|
||||
duration: float | None = None
|
||||
model: str | None = None
|
||||
tracking_type: str | None = None
|
||||
tracking_amount: float | None = None
|
||||
metadata: dict[str, Any] | None = None
|
||||
|
||||
|
||||
async def log_platform_cost(entry: PlatformCostEntry) -> None:
|
||||
await execute_raw_with_schema(
|
||||
"""
|
||||
INSERT INTO {schema_prefix}"PlatformCostLog"
|
||||
("id", "createdAt", "userId", "graphExecId", "nodeExecId",
|
||||
"graphId", "nodeId", "blockId", "blockName", "provider",
|
||||
"credentialId", "costMicrodollars", "inputTokens", "outputTokens",
|
||||
"dataSize", "duration", "model", "trackingType", "trackingAmount",
|
||||
"metadata")
|
||||
VALUES (
|
||||
gen_random_uuid(), NOW(), $1, $2, $3, $4, $5, $6, $7, $8, $9,
|
||||
$10, $11, $12, $13, $14, $15, $16, $17, $18::jsonb
|
||||
)
|
||||
""",
|
||||
entry.user_id,
|
||||
entry.graph_exec_id,
|
||||
entry.node_exec_id,
|
||||
entry.graph_id,
|
||||
entry.node_id,
|
||||
entry.block_id,
|
||||
entry.block_name,
|
||||
# Normalize to lowercase so the (provider, createdAt) index is always
|
||||
# used without LOWER() on the read side.
|
||||
entry.provider.lower(),
|
||||
entry.credential_id,
|
||||
entry.cost_microdollars,
|
||||
entry.input_tokens,
|
||||
entry.output_tokens,
|
||||
entry.data_size,
|
||||
entry.duration,
|
||||
entry.model,
|
||||
entry.tracking_type,
|
||||
entry.tracking_amount,
|
||||
_json_or_none(entry.metadata),
|
||||
)
|
||||
|
||||
|
||||
# Bound the number of concurrent cost-log DB inserts to prevent unbounded
|
||||
# task/connection growth under sustained load or DB slowness.
|
||||
_log_semaphore = asyncio.Semaphore(50)
|
||||
|
||||
|
||||
async def log_platform_cost_safe(entry: PlatformCostEntry) -> None:
|
||||
"""Fire-and-forget wrapper that never raises."""
|
||||
try:
|
||||
async with _log_semaphore:
|
||||
await log_platform_cost(entry)
|
||||
except Exception:
|
||||
logger.exception(
|
||||
"Failed to log platform cost for user=%s provider=%s block=%s",
|
||||
entry.user_id,
|
||||
entry.provider,
|
||||
entry.block_name,
|
||||
)
|
||||
|
||||
|
||||
def _json_or_none(data: dict[str, Any] | None) -> str | None:
|
||||
if data is None:
|
||||
return None
|
||||
return json.dumps(data)
|
||||
|
||||
|
||||
def _mask_email(email: str | None) -> str | None:
|
||||
"""Mask an email address to reduce PII exposure in admin API responses.
|
||||
|
||||
Turns 'user@example.com' into 'us***@example.com'.
|
||||
Handles short local parts gracefully (e.g. 'a@b.com' → 'a***@b.com').
|
||||
"""
|
||||
if not email:
|
||||
return email
|
||||
at = email.find("@")
|
||||
if at < 0:
|
||||
return "***"
|
||||
local = email[:at]
|
||||
domain = email[at:]
|
||||
visible = local[:2] if len(local) >= 2 else local[:1]
|
||||
return f"{visible}***{domain}"
|
||||
|
||||
|
||||
class ProviderCostSummary(BaseModel):
|
||||
provider: str
|
||||
tracking_type: str | None = None
|
||||
total_cost_microdollars: int
|
||||
total_input_tokens: int
|
||||
total_output_tokens: int
|
||||
total_duration_seconds: float = 0.0
|
||||
total_tracking_amount: float = 0.0
|
||||
request_count: int
|
||||
|
||||
|
||||
class UserCostSummary(BaseModel):
|
||||
user_id: str | None = None
|
||||
email: str | None = None
|
||||
total_cost_microdollars: int
|
||||
total_input_tokens: int
|
||||
total_output_tokens: int
|
||||
request_count: int
|
||||
|
||||
|
||||
class CostLogRow(BaseModel):
|
||||
id: str
|
||||
created_at: datetime
|
||||
user_id: str | None = None
|
||||
email: str | None = None
|
||||
graph_exec_id: str | None = None
|
||||
node_exec_id: str | None = None
|
||||
block_name: str
|
||||
provider: str
|
||||
tracking_type: str | None = None
|
||||
cost_microdollars: int | None = None
|
||||
input_tokens: int | None = None
|
||||
output_tokens: int | None = None
|
||||
duration: float | None = None
|
||||
model: str | None = None
|
||||
|
||||
|
||||
class PlatformCostDashboard(BaseModel):
|
||||
by_provider: list[ProviderCostSummary]
|
||||
by_user: list[UserCostSummary]
|
||||
total_cost_microdollars: int
|
||||
total_requests: int
|
||||
total_users: int
|
||||
|
||||
|
||||
def _build_where(
|
||||
start: datetime | None,
|
||||
end: datetime | None,
|
||||
provider: str | None,
|
||||
user_id: str | None,
|
||||
table_alias: str = "",
|
||||
) -> tuple[str, list[Any]]:
|
||||
prefix = f"{table_alias}." if table_alias else ""
|
||||
clauses: list[str] = []
|
||||
params: list[Any] = []
|
||||
idx = 1
|
||||
|
||||
if start:
|
||||
clauses.append(f'{prefix}"createdAt" >= ${idx}::timestamptz')
|
||||
params.append(start)
|
||||
idx += 1
|
||||
if end:
|
||||
clauses.append(f'{prefix}"createdAt" <= ${idx}::timestamptz')
|
||||
params.append(end)
|
||||
idx += 1
|
||||
if provider:
|
||||
# Provider names are normalized to lowercase at write time so a plain
|
||||
# equality check is sufficient and the (provider, createdAt) index is used.
|
||||
clauses.append(f'{prefix}"provider" = ${idx}')
|
||||
params.append(provider.lower())
|
||||
idx += 1
|
||||
if user_id:
|
||||
clauses.append(f'{prefix}"userId" = ${idx}')
|
||||
params.append(user_id)
|
||||
idx += 1
|
||||
|
||||
return (" AND ".join(clauses) if clauses else "TRUE", params)
|
||||
|
||||
|
||||
async def get_platform_cost_dashboard(
|
||||
start: datetime | None = None,
|
||||
end: datetime | None = None,
|
||||
provider: str | None = None,
|
||||
user_id: str | None = None,
|
||||
) -> PlatformCostDashboard:
|
||||
"""Aggregate platform cost logs for the admin dashboard.
|
||||
|
||||
Note: by_provider rows are keyed on (provider, tracking_type). A single
|
||||
provider can therefore appear in multiple rows if it has entries with
|
||||
different billing models (e.g. "openai" with both "tokens" and "cost_usd"
|
||||
if pricing is later added for some entries). The frontend treats each row
independently rather than keying rows on provider alone.
|
||||
|
||||
Defaults to the last DEFAULT_DASHBOARD_DAYS days when no start date is
|
||||
provided to avoid full-table scans on large deployments.
|
||||
"""
|
||||
if start is None:
|
||||
start = datetime.now(timezone.utc) - timedelta(days=DEFAULT_DASHBOARD_DAYS)
|
||||
where_p, params_p = _build_where(start, end, provider, user_id, "p")
|
||||
|
||||
by_provider_rows, by_user_rows, total_user_rows = await asyncio.gather(
|
||||
query_raw_with_schema(
|
||||
f"""
|
||||
SELECT
|
||||
p."provider",
|
||||
p."trackingType" AS tracking_type,
|
||||
COALESCE(SUM(p."costMicrodollars"), 0)::bigint AS total_cost,
|
||||
COALESCE(SUM(p."inputTokens"), 0)::bigint AS total_input_tokens,
|
||||
COALESCE(SUM(p."outputTokens"), 0)::bigint AS total_output_tokens,
|
||||
COALESCE(SUM(p."duration"), 0)::float AS total_duration,
|
||||
COALESCE(SUM(p."trackingAmount"), 0)::float AS total_tracking_amount,
|
||||
COUNT(*)::bigint AS request_count
|
||||
FROM {{schema_prefix}}"PlatformCostLog" p
|
||||
WHERE {where_p}
|
||||
GROUP BY p."provider", p."trackingType"
|
||||
ORDER BY total_cost DESC
|
||||
LIMIT {MAX_PROVIDER_ROWS}
|
||||
""",
|
||||
*params_p,
|
||||
),
|
||||
query_raw_with_schema(
|
||||
f"""
|
||||
SELECT
|
||||
p."userId" AS user_id,
|
||||
u."email",
|
||||
COALESCE(SUM(p."costMicrodollars"), 0)::bigint AS total_cost,
|
||||
COALESCE(SUM(p."inputTokens"), 0)::bigint AS total_input_tokens,
|
||||
COALESCE(SUM(p."outputTokens"), 0)::bigint AS total_output_tokens,
|
||||
COUNT(*)::bigint AS request_count
|
||||
FROM {{schema_prefix}}"PlatformCostLog" p
|
||||
LEFT JOIN {{schema_prefix}}"User" u ON u."id" = p."userId"
|
||||
WHERE {where_p}
|
||||
GROUP BY p."userId", u."email"
|
||||
ORDER BY total_cost DESC
|
||||
LIMIT {MAX_USER_ROWS}
|
||||
""",
|
||||
*params_p,
|
||||
),
|
||||
query_raw_with_schema(
|
||||
f"""
|
||||
SELECT COUNT(DISTINCT p."userId")::bigint AS cnt
|
||||
FROM {{schema_prefix}}"PlatformCostLog" p
|
||||
WHERE {where_p}
|
||||
""",
|
||||
*params_p,
|
||||
),
|
||||
)
|
||||
|
||||
# Use the exact COUNT(DISTINCT userId) so total_users is not capped at
|
||||
# MAX_USER_ROWS (which would silently report 100 for >100 active users).
|
||||
total_users = int(total_user_rows[0]["cnt"]) if total_user_rows else 0
|
||||
total_cost = sum(r["total_cost"] for r in by_provider_rows)
|
||||
total_requests = sum(r["request_count"] for r in by_provider_rows)
|
||||
|
||||
return PlatformCostDashboard(
|
||||
by_provider=[
|
||||
ProviderCostSummary(
|
||||
provider=r["provider"],
|
||||
tracking_type=r.get("tracking_type"),
|
||||
total_cost_microdollars=r["total_cost"],
|
||||
total_input_tokens=r["total_input_tokens"],
|
||||
total_output_tokens=r["total_output_tokens"],
|
||||
total_duration_seconds=r.get("total_duration", 0.0),
|
||||
total_tracking_amount=r.get("total_tracking_amount", 0.0),
|
||||
request_count=r["request_count"],
|
||||
)
|
||||
for r in by_provider_rows
|
||||
],
|
||||
by_user=[
|
||||
UserCostSummary(
|
||||
user_id=r.get("user_id"),
|
||||
email=_mask_email(r.get("email")),
|
||||
total_cost_microdollars=r["total_cost"],
|
||||
total_input_tokens=r["total_input_tokens"],
|
||||
total_output_tokens=r["total_output_tokens"],
|
||||
request_count=r["request_count"],
|
||||
)
|
||||
for r in by_user_rows
|
||||
],
|
||||
total_cost_microdollars=total_cost,
|
||||
total_requests=total_requests,
|
||||
total_users=total_users,
|
||||
)
|
||||
|
||||
|
||||
async def get_platform_cost_logs(
|
||||
start: datetime | None = None,
|
||||
end: datetime | None = None,
|
||||
provider: str | None = None,
|
||||
user_id: str | None = None,
|
||||
page: int = 1,
|
||||
page_size: int = 50,
|
||||
) -> tuple[list[CostLogRow], int]:
|
||||
if start is None:
|
||||
start = datetime.now(tz=timezone.utc) - timedelta(days=DEFAULT_DASHBOARD_DAYS)
|
||||
where_sql, params = _build_where(start, end, provider, user_id, "p")
|
||||
|
||||
offset = (page - 1) * page_size
|
||||
limit_idx = len(params) + 1
|
||||
offset_idx = len(params) + 2
|
||||
|
||||
count_rows, rows = await asyncio.gather(
|
||||
query_raw_with_schema(
|
||||
f"""
|
||||
SELECT COUNT(*)::bigint AS cnt
|
||||
FROM {{schema_prefix}}"PlatformCostLog" p
|
||||
WHERE {where_sql}
|
||||
""",
|
||||
*params,
|
||||
),
|
||||
query_raw_with_schema(
|
||||
f"""
|
||||
SELECT
|
||||
p."id",
|
||||
p."createdAt" AS created_at,
|
||||
p."userId" AS user_id,
|
||||
u."email",
|
||||
p."graphExecId" AS graph_exec_id,
|
||||
p."nodeExecId" AS node_exec_id,
|
||||
p."blockName" AS block_name,
|
||||
p."provider",
|
||||
p."trackingType" AS tracking_type,
|
||||
p."costMicrodollars" AS cost_microdollars,
|
||||
p."inputTokens" AS input_tokens,
|
||||
p."outputTokens" AS output_tokens,
|
||||
p."duration",
|
||||
p."model"
|
||||
FROM {{schema_prefix}}"PlatformCostLog" p
|
||||
LEFT JOIN {{schema_prefix}}"User" u ON u."id" = p."userId"
|
||||
WHERE {where_sql}
|
||||
ORDER BY p."createdAt" DESC, p."id" DESC
|
||||
LIMIT ${limit_idx} OFFSET ${offset_idx}
|
||||
""",
|
||||
*params,
|
||||
page_size,
|
||||
offset,
|
||||
),
|
||||
)
|
||||
total = count_rows[0]["cnt"] if count_rows else 0
|
||||
|
||||
logs = [
|
||||
CostLogRow(
|
||||
id=r["id"],
|
||||
created_at=r["created_at"],
|
||||
user_id=r.get("user_id"),
|
||||
email=_mask_email(r.get("email")),
|
||||
graph_exec_id=r.get("graph_exec_id"),
|
||||
node_exec_id=r.get("node_exec_id"),
|
||||
block_name=r["block_name"],
|
||||
provider=r["provider"],
|
||||
tracking_type=r.get("tracking_type"),
|
||||
cost_microdollars=r.get("cost_microdollars"),
|
||||
input_tokens=r.get("input_tokens"),
|
||||
output_tokens=r.get("output_tokens"),
|
||||
duration=r.get("duration"),
|
||||
model=r.get("model"),
|
||||
)
|
||||
for r in rows
|
||||
]
|
||||
return logs, total
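The two small helpers in this file, `usd_to_microdollars` and `_mask_email`, are easy to check in isolation. The sketch below restates them as standalone functions (same logic as the diff, with public names chosen here for illustration) and shows one plausible reason for storing integer microdollars: integer sums stay exact where float sums drift. That motivation is an assumption; the diff does not state it.

```python
from typing import Optional

MICRODOLLARS_PER_USD = 1_000_000


def usd_to_microdollars(cost_usd: Optional[float]) -> Optional[int]:
    """Convert USD (float) to integer microdollars; None passes through."""
    if cost_usd is None:
        return None
    return round(cost_usd * MICRODOLLARS_PER_USD)


def mask_email(email: Optional[str]) -> Optional[str]:
    """Keep at most two leading characters of the local part."""
    if not email:
        return email
    at = email.find("@")
    if at < 0:
        return "***"  # not an email address: reveal nothing
    local, domain = email[:at], email[at:]
    visible = local[:2] if len(local) >= 2 else local[:1]
    return f"{visible}***{domain}"


# Float sums drift; integer microdollar sums do not.
float_total = sum([0.1, 0.1, 0.1])
micro_total = sum(usd_to_microdollars(0.1) for _ in range(3))
```

Here `float_total` is not exactly `0.3`, while `micro_total` is exactly `300000`.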
|
||||
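The filter builder and the paginated log query share one idea: every optional filter appends a value to `params` and references it by its 1-based position, and `LIMIT`/`OFFSET` take the next two positions. A minimal sketch of that numbering follows; `build_where` mirrors `_build_where` from the diff, while `paginate` is a hypothetical helper introduced only to isolate the index arithmetic that `get_platform_cost_logs` performs inline.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def build_where(
    start: Optional[datetime],
    end: Optional[datetime],
    provider: Optional[str],
    user_id: Optional[str],
    alias: str = "",
) -> tuple[str, list[Any]]:
    prefix = f"{alias}." if alias else ""
    clauses: list[str] = []
    params: list[Any] = []

    def add(template: str, value: Any) -> None:
        # Append the value first so len(params) is its 1-based placeholder index.
        params.append(value)
        clauses.append(template.format(p=prefix, i=len(params)))

    if start:
        add('{p}"createdAt" >= ${i}::timestamptz', start)
    if end:
        add('{p}"createdAt" <= ${i}::timestamptz', end)
    if provider:
        add('{p}"provider" = ${i}', provider.lower())  # stored lowercase at write time
    if user_id:
        add('{p}"userId" = ${i}', user_id)
    return (" AND ".join(clauses) if clauses else "TRUE", params)


def paginate(params: list[Any], page: int, page_size: int) -> tuple[str, list[Any]]:
    # Hypothetical helper: LIMIT/OFFSET use the two positions after the filters.
    offset = (page - 1) * page_size
    n = len(params)
    return f"LIMIT ${n + 1} OFFSET ${n + 2}", [*params, page_size, offset]


start = datetime(2026, 1, 1, tzinfo=timezone.utc)
where_sql, where_params = build_where(start, None, "OpenAI", "u1", alias="p")
page_sql, page_params = paginate(where_params, page=3, page_size=25)
```

Only placeholder numbers and fixed column names are interpolated into the SQL text; user-supplied values travel separately in the parameter list.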
266
autogpt_platform/backend/backend/data/platform_cost_test.py
Normal file
@@ -0,0 +1,266 @@
|
||||
"""Unit tests for helpers and async functions in platform_cost module."""
|
||||
|
||||
from datetime import datetime, timezone
|
||||
from unittest.mock import AsyncMock, patch
|
||||
|
||||
import pytest
|
||||
|
||||
from .platform_cost import (
|
||||
PlatformCostEntry,
|
||||
_build_where,
|
||||
_json_or_none,
|
||||
get_platform_cost_dashboard,
|
||||
get_platform_cost_logs,
|
||||
log_platform_cost,
|
||||
log_platform_cost_safe,
|
||||
)
|
||||
|
||||
|
||||
class TestJsonOrNone:
|
||||
def test_returns_none_for_none(self):
|
||||
assert _json_or_none(None) is None
|
||||
|
||||
def test_returns_json_string_for_dict(self):
|
||||
result = _json_or_none({"key": "value", "num": 42})
|
||||
assert result is not None
|
||||
assert '"key"' in result
|
||||
assert '"value"' in result
|
||||
|
||||
def test_returns_json_for_empty_dict(self):
|
||||
assert _json_or_none({}) == "{}"
|
||||
|
||||
|
||||
class TestBuildWhere:
|
||||
def test_no_filters_returns_true(self):
|
||||
sql, params = _build_where(None, None, None, None)
|
||||
assert sql == "TRUE"
|
||||
assert params == []
|
||||
|
||||
def test_start_only(self):
|
||||
dt = datetime(2026, 1, 1, tzinfo=timezone.utc)
|
||||
sql, params = _build_where(dt, None, None, None)
|
||||
assert '"createdAt" >= $1::timestamptz' in sql
|
||||
assert params == [dt]
|
||||
|
||||
def test_end_only(self):
|
||||
dt = datetime(2026, 6, 1, tzinfo=timezone.utc)
|
||||
sql, params = _build_where(None, dt, None, None)
|
||||
assert '"createdAt" <= $1::timestamptz' in sql
|
||||
assert params == [dt]
|
||||
|
||||
def test_provider_only(self):
|
||||
# Provider names are normalized to lowercase at write time, so the
|
||||
# filter uses a plain equality check. The input is also lowercased so
|
||||
# "OpenAI" and "openai" both match stored rows.
|
||||
sql, params = _build_where(None, None, "OpenAI", None)
|
||||
assert '"provider" = $1' in sql
|
||||
assert params == ["openai"]
|
||||
|
||||
def test_user_id_only(self):
|
||||
sql, params = _build_where(None, None, None, "user-123")
|
||||
assert '"userId" = $1' in sql
|
||||
assert params == ["user-123"]
|
||||
|
||||
def test_all_filters(self):
|
||||
start = datetime(2026, 1, 1, tzinfo=timezone.utc)
|
||||
end = datetime(2026, 6, 1, tzinfo=timezone.utc)
|
||||
sql, params = _build_where(start, end, "Anthropic", "u1")
|
||||
assert "$1" in sql
|
||||
assert "$2" in sql
|
||||
assert "$3" in sql
|
||||
assert "$4" in sql
|
||||
assert len(params) == 4
|
||||
# Provider is lowercased at filter time to match stored lowercase values.
|
||||
assert params == [start, end, "anthropic", "u1"]
|
||||
|
||||
def test_table_alias(self):
|
||||
dt = datetime(2026, 1, 1, tzinfo=timezone.utc)
|
||||
sql, params = _build_where(dt, None, None, None, table_alias="p")
|
||||
assert 'p."createdAt"' in sql
|
||||
assert params == [dt]
|
||||
|
||||
def test_clauses_joined_with_and(self):
|
||||
start = datetime(2026, 1, 1, tzinfo=timezone.utc)
|
||||
end = datetime(2026, 6, 1, tzinfo=timezone.utc)
|
||||
sql, _ = _build_where(start, end, None, None)
|
||||
assert " AND " in sql
|
||||
|
||||
|
||||
def _make_entry(**overrides: object) -> PlatformCostEntry:
|
||||
return PlatformCostEntry.model_validate(
|
||||
{
|
||||
"user_id": "user-1",
|
||||
"block_id": "block-1",
|
||||
"block_name": "TestBlock",
|
||||
"provider": "openai",
|
||||
"credential_id": "cred-1",
|
||||
**overrides,
|
||||
}
|
||||
)
|
||||
|
||||
|
||||
class TestLogPlatformCost:
|
||||
@pytest.mark.asyncio
|
||||
async def test_calls_execute_raw_with_schema(self):
|
||||
mock_exec = AsyncMock()
|
||||
with patch("backend.data.platform_cost.execute_raw_with_schema", new=mock_exec):
|
||||
entry = _make_entry(
|
||||
input_tokens=100,
|
||||
output_tokens=50,
|
||||
cost_microdollars=5000,
|
||||
model="gpt-4",
|
||||
metadata={"key": "val"},
|
||||
)
|
||||
await log_platform_cost(entry)
|
||||
mock_exec.assert_awaited_once()
|
||||
args = mock_exec.call_args
|
||||
assert args[0][1] == "user-1" # user_id is first param
|
||||
assert args[0][6] == "block-1" # block_id
|
||||
assert args[0][7] == "TestBlock" # block_name
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_metadata_none_passes_none(self):
|
||||
mock_exec = AsyncMock()
|
||||
with patch("backend.data.platform_cost.execute_raw_with_schema", new=mock_exec):
|
||||
entry = _make_entry(metadata=None)
|
||||
await log_platform_cost(entry)
|
||||
args = mock_exec.call_args
|
||||
assert args[0][-1] is None # last arg is metadata json
|
||||
|
||||
|
||||
class TestLogPlatformCostSafe:
|
||||
@pytest.mark.asyncio
|
||||
async def test_does_not_raise_on_error(self):
|
||||
with patch(
|
||||
"backend.data.platform_cost.execute_raw_with_schema",
|
||||
new=AsyncMock(side_effect=RuntimeError("DB down")),
|
||||
):
|
||||
entry = _make_entry()
|
||||
await log_platform_cost_safe(entry)
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_succeeds_when_no_error(self):
|
||||
mock_exec = AsyncMock()
|
||||
with patch("backend.data.platform_cost.execute_raw_with_schema", new=mock_exec):
|
||||
entry = _make_entry()
|
||||
await log_platform_cost_safe(entry)
|
||||
mock_exec.assert_awaited_once()
|
||||
|
||||
|
||||
class TestGetPlatformCostDashboard:
|
||||
@pytest.mark.asyncio
|
||||
async def test_returns_dashboard_with_data(self):
|
||||
provider_rows = [
|
||||
{
|
||||
"provider": "openai",
|
||||
"tracking_type": "tokens",
|
||||
"total_cost": 5000,
|
||||
"total_input_tokens": 1000,
|
||||
"total_output_tokens": 500,
|
||||
"total_duration": 10.5,
|
||||
"request_count": 3,
|
||||
}
|
||||
]
|
||||
user_rows = [
|
||||
{
|
||||
"user_id": "u1",
|
||||
"email": "a@b.com",
|
||||
"total_cost": 5000,
|
||||
"total_input_tokens": 1000,
|
||||
"total_output_tokens": 500,
|
||||
"request_count": 3,
|
||||
}
|
||||
]
|
||||
# Dashboard runs 3 queries: by_provider, by_user, COUNT(DISTINCT userId).
|
||||
mock_query = AsyncMock(side_effect=[provider_rows, user_rows, [{"cnt": 1}]])
|
||||
with patch("backend.data.platform_cost.query_raw_with_schema", new=mock_query):
|
||||
dashboard = await get_platform_cost_dashboard()
|
||||
assert dashboard.total_cost_microdollars == 5000
|
||||
assert dashboard.total_requests == 3
|
||||
assert dashboard.total_users == 1
|
||||
assert len(dashboard.by_provider) == 1
|
||||
assert dashboard.by_provider[0].provider == "openai"
|
||||
assert dashboard.by_provider[0].tracking_type == "tokens"
|
||||
assert dashboard.by_provider[0].total_duration_seconds == 10.5
|
||||
assert len(dashboard.by_user) == 1
|
||||
assert dashboard.by_user[0].email == "a***@b.com"
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_returns_empty_dashboard(self):
|
||||
mock_query = AsyncMock(side_effect=[[], [], []])
|
||||
with patch("backend.data.platform_cost.query_raw_with_schema", new=mock_query):
|
||||
dashboard = await get_platform_cost_dashboard()
|
||||
assert dashboard.total_cost_microdollars == 0
|
||||
assert dashboard.total_requests == 0
|
||||
assert dashboard.total_users == 0
|
||||
assert dashboard.by_provider == []
|
||||
assert dashboard.by_user == []
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_passes_filters_to_queries(self):
|
||||
start = datetime(2026, 1, 1, tzinfo=timezone.utc)
|
||||
mock_query = AsyncMock(side_effect=[[], [], []])
|
||||
with patch("backend.data.platform_cost.query_raw_with_schema", new=mock_query):
|
||||
await get_platform_cost_dashboard(
|
||||
start=start, provider="openai", user_id="u1"
|
||||
)
|
||||
assert mock_query.await_count == 3
|
||||
first_call_sql = mock_query.call_args_list[0][0][0]
|
||||
assert "createdAt" in first_call_sql
|
||||
|
||||
|
||||
class TestGetPlatformCostLogs:
|
||||
@pytest.mark.asyncio
|
||||
async def test_returns_logs_and_total(self):
|
||||
count_rows = [{"cnt": 1}]
|
||||
log_rows = [
|
||||
{
|
||||
"id": "log-1",
|
||||
"created_at": datetime(2026, 3, 1, tzinfo=timezone.utc),
|
||||
"user_id": "u1",
|
||||
"email": "a@b.com",
|
||||
"graph_exec_id": "g1",
|
||||
"node_exec_id": "n1",
|
||||
"block_name": "TestBlock",
|
||||
"provider": "openai",
|
||||
"tracking_type": "tokens",
|
||||
"cost_microdollars": 5000,
|
||||
"input_tokens": 100,
|
||||
"output_tokens": 50,
|
||||
"duration": 1.5,
|
||||
"model": "gpt-4",
|
||||
}
|
||||
]
|
||||
mock_query = AsyncMock(side_effect=[count_rows, log_rows])
|
||||
with patch("backend.data.platform_cost.query_raw_with_schema", new=mock_query):
|
||||
logs, total = await get_platform_cost_logs(page=1, page_size=10)
|
||||
assert total == 1
|
||||
assert len(logs) == 1
|
||||
assert logs[0].id == "log-1"
|
||||
assert logs[0].provider == "openai"
|
||||
assert logs[0].model == "gpt-4"
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_returns_empty_when_no_data(self):
|
||||
mock_query = AsyncMock(side_effect=[[{"cnt": 0}], []])
|
||||
with patch("backend.data.platform_cost.query_raw_with_schema", new=mock_query):
|
||||
logs, total = await get_platform_cost_logs()
|
||||
assert total == 0
|
||||
assert logs == []
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_pagination_offset(self):
|
||||
mock_query = AsyncMock(side_effect=[[{"cnt": 100}], []])
|
||||
with patch("backend.data.platform_cost.query_raw_with_schema", new=mock_query):
|
||||
logs, total = await get_platform_cost_logs(page=3, page_size=25)
|
||||
assert total == 100
|
||||
second_call_args = mock_query.call_args_list[1][0]
|
||||
assert 25 in second_call_args # page_size
|
||||
assert 50 in second_call_args # offset = (3-1) * 25
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_empty_count_returns_zero(self):
|
||||
mock_query = AsyncMock(side_effect=[[], []])
|
||||
with patch("backend.data.platform_cost.query_raw_with_schema", new=mock_query):
|
||||
logs, total = await get_platform_cost_logs()
|
||||
assert total == 0
|
||||
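The behaviour checked by `TestLogPlatformCostSafe` (a failing insert is logged and swallowed) combines with the semaphore bound in `log_platform_cost_safe`. The sketch below shows that pattern in isolation, under stated assumptions: `log_safe`, `flaky_write`, and `demo` are hypothetical names, the bound is 2 rather than the diff's 50 so the cap is observable, and `log_safe` returns a bool purely for demonstration (the original returns `None`).

```python
import asyncio
import logging

logger = logging.getLogger(__name__)
MAX_CONCURRENT_WRITES = 2  # the diff uses 50; kept small here so the cap is visible


async def log_safe(write, entry, semaphore: asyncio.Semaphore) -> bool:
    """Run write(entry) under the semaphore; never raise."""
    try:
        async with semaphore:
            await write(entry)
        return True
    except Exception:
        logger.exception("Failed to log cost entry %r", entry)
        return False


async def demo() -> tuple[list[bool], int]:
    # Create the semaphore inside the running loop that will use it.
    semaphore = asyncio.Semaphore(MAX_CONCURRENT_WRITES)
    in_flight = 0
    peak = 0

    async def flaky_write(entry: int) -> None:
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        try:
            await asyncio.sleep(0.01)
            if entry % 2:
                raise RuntimeError("DB down")  # simulated insert failure
        finally:
            in_flight -= 1

    results = await asyncio.gather(
        *(log_safe(flaky_write, i, semaphore) for i in range(6))
    )
    return list(results), peak


results, peak = asyncio.run(demo())
```

Odd-numbered entries fail and are reported as `False` without propagating, and `peak` never exceeds `MAX_CONCURRENT_WRITES`.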
@@ -82,6 +82,28 @@ async def get_user_by_email(email: str) -> Optional[User]:
|
||||
raise DatabaseError(f"Failed to get user by email {email}: {e}") from e
|
||||
|
||||
|
||||
async def search_users(query: str, limit: int = 20) -> list[tuple[str, str | None]]:
|
||||
"""Search users by partial email or name.
|
||||
|
||||
Returns a list of ``(user_id, email)`` tuples, up to *limit* results.
|
||||
Searches the User table directly — no dependency on credit history.
|
||||
"""
|
||||
query = query.strip()
|
||||
if not query or len(query) < 3:
|
||||
return []
|
||||
users = await prisma.user.find_many(
|
||||
where={
|
||||
"OR": [
|
||||
{"email": {"contains": query, "mode": "insensitive"}},
|
||||
{"name": {"contains": query, "mode": "insensitive"}},
|
||||
],
|
||||
},
|
||||
take=limit,
|
||||
order={"email": "asc"},
|
||||
)
|
||||
return [(u.id, u.email) for u in users]
|
||||
|
||||
|
||||
async def update_user_email(user_id: str, email: str):
|
||||
try:
|
||||
# Get old email first for cache invalidation
|
||||
|
||||
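The guard and matching rules in `search_users` (trim the query, require at least three characters, case-insensitive substring match on email or name, order by email, cap at `limit`) can be sketched against an in-memory list. This is an illustrative stand-in, not the Prisma query: `search_users_in_memory` and the sample records are hypothetical, and database collation may order or compare strings differently from Python's `lower()`.

```python
from typing import Optional


def search_users_in_memory(
    users: list[dict], query: str, limit: int = 20
) -> list[tuple[str, Optional[str]]]:
    query = query.strip()
    if len(query) < 3:
        return []  # too short: avoid broad, expensive substring scans
    needle = query.lower()
    matches = [
        u
        for u in users
        if needle in (u.get("email") or "").lower()
        or needle in (u.get("name") or "").lower()
    ]
    matches.sort(key=lambda u: u.get("email") or "")
    return [(u["id"], u.get("email")) for u in matches[:limit]]


sample_users = [
    {"id": "u2", "email": "zoe@example.com", "name": "Zoe Alison"},
    {"id": "u1", "email": "alice@example.com", "name": "Alice"},
    {"id": "u3", "email": "bob@example.com", "name": "Bob"},
]
found = search_users_in_memory(sample_users, "  ALI ")
```

Here `found` lists `alice@example.com` (matched on email and name) before `zoe@example.com` (matched on name only), because results are ordered by email.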
291
autogpt_platform/backend/backend/executor/cost_tracking.py
Normal file
@@ -0,0 +1,291 @@
"""Helpers for platform cost tracking on system-credential block executions."""

import asyncio
import logging
import threading
from typing import TYPE_CHECKING, Any, cast

from backend.blocks._base import Block, BlockSchema
from backend.copilot.token_tracking import _pending_log_tasks as _copilot_tasks
from backend.copilot.token_tracking import (
    _pending_log_tasks_lock as _copilot_tasks_lock,
)
from backend.data.execution import NodeExecutionEntry
from backend.data.model import NodeExecutionStats
from backend.data.platform_cost import PlatformCostEntry, usd_to_microdollars
from backend.executor.utils import block_usage_cost
from backend.integrations.credentials_store import is_system_credential
from backend.integrations.providers import ProviderName

if TYPE_CHECKING:
    from backend.data.db_manager import DatabaseManagerAsyncClient

logger = logging.getLogger(__name__)

# Provider groupings by billing model, used when the block didn't explicitly
# declare stats.provider_cost_type and we fall back to provider-name
# heuristics. Values match ProviderName enum values.
_CHARACTER_BILLED_PROVIDERS = frozenset(
    {ProviderName.D_ID.value, ProviderName.ELEVENLABS.value}
)
_WALLTIME_BILLED_PROVIDERS = frozenset(
    {
        ProviderName.FAL.value,
        ProviderName.REVID.value,
        ProviderName.REPLICATE.value,
    }
)

# Hold strong references to in-flight log tasks so the event loop doesn't
# garbage-collect them mid-execution. Tasks remove themselves on completion.
# _pending_log_tasks_lock guards all reads and writes: worker threads call
# discard() via done callbacks while drain_pending_cost_logs() iterates.
_pending_log_tasks: set[asyncio.Task] = set()
_pending_log_tasks_lock = threading.Lock()
# Per-loop semaphores: asyncio.Semaphore is not thread-safe and must not be
# shared across event loops running in different threads. Key by loop instance
# so each executor worker thread gets its own semaphore.
_log_semaphores: dict[asyncio.AbstractEventLoop, asyncio.Semaphore] = {}


def _get_log_semaphore() -> asyncio.Semaphore:
    loop = asyncio.get_running_loop()
    sem = _log_semaphores.get(loop)
    if sem is None:
        sem = asyncio.Semaphore(50)
        _log_semaphores[loop] = sem
    return sem


async def drain_pending_cost_logs(timeout: float = 5.0) -> None:
    """Await all in-flight cost log tasks with a timeout.

    Drains both the executor cost log tasks (_pending_log_tasks in this module,
    used for block execution cost tracking via DatabaseManagerAsyncClient) and
    the copilot cost log tasks (token_tracking._pending_log_tasks, used for
    copilot LLM turns via platform_cost_db()).

    Call this during graceful shutdown to flush pending INSERT tasks before
    the process exits. `timeout` applies to each of the two groups separately,
    so the call can take up to twice that long. Tasks that don't complete in
    time are left running rather than cancelled; a warning is logged here, and
    failures inside a task are logged by _safe_log.
    """
    # asyncio.wait() requires all tasks to belong to the running event loop.
    # _pending_log_tasks is shared across executor worker threads (each with
    # its own loop), so filter to only tasks owned by the current loop.
    # Acquire the lock to take a consistent snapshot (worker threads call
    # discard() via done callbacks concurrently with this iteration).
    current_loop = asyncio.get_running_loop()
    with _pending_log_tasks_lock:
        all_pending = [t for t in _pending_log_tasks if t.get_loop() is current_loop]
    if all_pending:
        logger.info("Draining %d executor cost log task(s)", len(all_pending))
        _, still_pending = await asyncio.wait(all_pending, timeout=timeout)
        if still_pending:
            logger.warning(
                "%d executor cost log task(s) did not complete within %.1fs",
                len(still_pending),
                timeout,
            )
    # Also drain copilot cost log tasks (token_tracking._pending_log_tasks)
    with _copilot_tasks_lock:
        copilot_pending = [t for t in _copilot_tasks if t.get_loop() is current_loop]
    if copilot_pending:
        logger.info("Draining %d copilot cost log task(s)", len(copilot_pending))
        _, still_pending = await asyncio.wait(copilot_pending, timeout=timeout)
        if still_pending:
            logger.warning(
                "%d copilot cost log task(s) did not complete within %.1fs",
                len(still_pending),
                timeout,
            )


def _schedule_log(
    db_client: "DatabaseManagerAsyncClient", entry: PlatformCostEntry
) -> None:
    async def _safe_log() -> None:
        async with _get_log_semaphore():
            try:
                await db_client.log_platform_cost(entry)
            except Exception:
                logger.exception(
                    "Failed to log platform cost for user=%s provider=%s block=%s",
                    entry.user_id,
                    entry.provider,
                    entry.block_name,
                )

    task = asyncio.create_task(_safe_log())
    with _pending_log_tasks_lock:
        _pending_log_tasks.add(task)

    def _remove(t: asyncio.Task) -> None:
        with _pending_log_tasks_lock:
            _pending_log_tasks.discard(t)

    task.add_done_callback(_remove)


def _extract_model_name(raw: str | dict | None) -> str | None:
    """Return a string model name from a block input field, or None.

    Handles str (returned as-is), dict (e.g. an enum wrapper, skipped), and
    None (no model field). Unexpected types are coerced to str as a fallback.
    """
    if raw is None:
        return None
    if isinstance(raw, str):
        return raw
    if isinstance(raw, dict):
        return None
    return str(raw)


def resolve_tracking(
    provider: str,
    stats: NodeExecutionStats,
    input_data: dict[str, Any],
) -> tuple[str, float]:
    """Return (tracking_type, tracking_amount) based on provider billing model.

    Preference order:
    1. Block-declared: if the block set `provider_cost_type` on its stats,
       honor it directly (paired with `provider_cost` as the amount).
    2. Heuristic fallback: infer from `provider_cost`/token counts, then
       from provider name for per-character / per-second billing.
    """
    # 1. Block explicitly declared its cost type (only when an amount is present)
    if stats.provider_cost_type and stats.provider_cost is not None:
        return stats.provider_cost_type, stats.provider_cost

    # 2. Provider returned actual USD cost (OpenRouter, Exa)
    if stats.provider_cost is not None:
        return "cost_usd", stats.provider_cost

    # 3. LLM providers: track by tokens
    if stats.input_token_count or stats.output_token_count:
        return "tokens", float(
            (stats.input_token_count or 0) + (stats.output_token_count or 0)
        )

    # 4. Provider-specific billing heuristics

    # TTS: billed per character of input text
    if provider == ProviderName.UNREAL_SPEECH.value:
        text = input_data.get("text", "")
        return "characters", float(len(text)) if isinstance(text, str) else 0.0

    # D-ID + ElevenLabs voice: billed per character of script
    if provider in _CHARACTER_BILLED_PROVIDERS:
        text = (
            input_data.get("script_input", "")
            or input_data.get("text", "")
            or input_data.get("script", "")  # VideoNarrationBlock uses `script`
        )
        return "characters", float(len(text)) if isinstance(text, str) else 0.0

    # E2B: billed per second of sandbox time
    if provider == ProviderName.E2B.value:
        return "sandbox_seconds", round(stats.walltime, 3) if stats.walltime else 0.0

    # Video/image gen: walltime includes queue + generation + polling
    if provider in _WALLTIME_BILLED_PROVIDERS:
        return "walltime_seconds", round(stats.walltime, 3) if stats.walltime else 0.0

    # Per-request: Google Maps, Ideogram, Nvidia, Apollo, etc.
    # All billed per API call - count 1 per block execution.
    return "per_run", 1.0


async def log_system_credential_cost(
    node_exec: NodeExecutionEntry,
    block: Block,
    stats: NodeExecutionStats,
    db_client: "DatabaseManagerAsyncClient",
) -> None:
    """Check if a system credential was used and log the platform cost.

    Routes through DatabaseManagerAsyncClient so the write goes via the
    message-passing DB service rather than calling Prisma directly (which
    is not connected in the executor process).

    Logs only the first matching system credential field (one log per
    execution). Any unexpected error is caught and logged: cost logging
    is strictly best-effort and must never disrupt block execution.

    Note: costMicrodollars is left null for providers that don't return
    a USD cost. The credit_cost in metadata captures our internal credit
    charge as a proxy.
    """
    try:
        if node_exec.execution_context.dry_run:
            return

        input_data = node_exec.inputs
        input_model = cast(type[BlockSchema], block.input_schema)

        for field_name in input_model.get_credentials_fields():
            cred_data = input_data.get(field_name)
            if not cred_data or not isinstance(cred_data, dict):
                continue
            cred_id = cred_data.get("id", "")
            if not cred_id or not is_system_credential(cred_id):
                continue

            model_name = _extract_model_name(input_data.get("model"))

            credit_cost, _ = block_usage_cost(block=block, input_data=input_data)

            provider_name = cred_data.get("provider", "unknown")
            tracking_type, tracking_amount = resolve_tracking(
                provider=provider_name,
                stats=stats,
                input_data=input_data,
            )

            # Only treat provider_cost as USD when the tracking type says so.
            # For other types (items, characters, per_run, ...) the
            # provider_cost field holds the raw amount, not a dollar value.
            # Use tracking_amount (the normalized value from resolve_tracking)
            # rather than raw stats.provider_cost to avoid unit mismatches.
            cost_microdollars = None
            if tracking_type == "cost_usd":
                cost_microdollars = usd_to_microdollars(tracking_amount)

            meta: dict[str, Any] = {
                "tracking_type": tracking_type,
                "tracking_amount": tracking_amount,
            }
            if credit_cost is not None:
                meta["credit_cost"] = credit_cost
            if stats.provider_cost is not None:
                # Use 'provider_cost_raw': the value's unit varies by tracking
                # type (USD for cost_usd, count for items/characters/per_run, etc.)
                meta["provider_cost_raw"] = stats.provider_cost

            _schedule_log(
                db_client,
                PlatformCostEntry(
                    user_id=node_exec.user_id,
                    graph_exec_id=node_exec.graph_exec_id,
                    node_exec_id=node_exec.node_exec_id,
                    graph_id=node_exec.graph_id,
                    node_id=node_exec.node_id,
                    block_id=node_exec.block_id,
                    block_name=block.name,
                    provider=provider_name,
                    credential_id=cred_id,
                    cost_microdollars=cost_microdollars,
                    input_tokens=stats.input_token_count,
                    output_tokens=stats.output_token_count,
                    data_size=stats.output_size if stats.output_size > 0 else None,
                    duration=stats.walltime if stats.walltime > 0 else None,
                    model=model_name,
                    tracking_type=tracking_type,
                    tracking_amount=tracking_amount,
                    metadata=meta,
                ),
            )
            return  # One log per execution is enough
    except Exception:
        logger.exception("log_system_credential_cost failed unexpectedly")
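The preference order in `resolve_tracking` and the USD-only conversion to microdollars can be tried out in isolation. The sketch below is a simplified stand-in under stated assumptions: `StatsSketch`, `resolve_tracking_sketch`, and `usd_to_microdollars_sketch` are hypothetical names, the provider heuristics are reduced to a single `walltime_billed` flag, and the rounding behaviour is assumed from the accompanying tests rather than taken from `backend.data.platform_cost`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StatsSketch:  # hypothetical, trimmed-down stand-in for NodeExecutionStats
    provider_cost: Optional[float] = None
    provider_cost_type: Optional[str] = None
    input_token_count: int = 0
    output_token_count: int = 0
    walltime: float = 0.0


def resolve_tracking_sketch(
    stats: StatsSketch, walltime_billed: bool = False
) -> tuple[str, float]:
    # 1. A block-declared type wins, but only when an amount is present.
    if stats.provider_cost_type and stats.provider_cost is not None:
        return stats.provider_cost_type, stats.provider_cost
    # 2. A bare provider_cost is read as USD; `is not None` keeps 0.0.
    if stats.provider_cost is not None:
        return "cost_usd", stats.provider_cost
    # 3. Token counts come next.
    if stats.input_token_count or stats.output_token_count:
        return "tokens", float(stats.input_token_count + stats.output_token_count)
    # 4. Provider heuristics, reduced here to one flag for brevity.
    if walltime_billed:
        return "walltime_seconds", round(stats.walltime, 3)
    return "per_run", 1.0


def usd_to_microdollars_sketch(usd: float) -> int:
    # Assumed behaviour: round to the nearest microdollar instead of truncating.
    return round(usd * 1_000_000)


declared = resolve_tracking_sketch(
    StatsSketch(provider_cost=5.0, provider_cost_type="items")
)
free_tier = resolve_tracking_sketch(StatsSketch(provider_cost=0.0))
fallback = resolve_tracking_sketch(StatsSketch())
```

Only the `cost_usd` result would be passed through the microdollar conversion; for `items`, `tokens`, or `per_run` the amount is a count, so converting it would mix units.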
@@ -45,6 +45,10 @@ from backend.data.notifications import (
    ZeroBalanceData,
)
from backend.data.rabbitmq import SyncRabbitMQ
from backend.executor.cost_tracking import (
    drain_pending_cost_logs,
    log_system_credential_cost,
)
from backend.integrations.creds_manager import IntegrationCredentialsManager
from backend.notifications.notifications import queue_notification
from backend.util import json
@@ -692,6 +696,15 @@ class ExecutionProcessor:
            stats=graph_stats,
        )

        # Log platform cost if system credentials were used (only on success)
        if status == ExecutionStatus.COMPLETED:
            await log_system_credential_cost(
                node_exec=node_exec,
                block=node.block,
                stats=execution_stats,
                db_client=db_client,
            )

        return execution_stats

    @async_time_measured
@@ -2044,6 +2057,18 @@ class ExecutionManager(AppProcess):
            prefix + " [cancel-consumer]",
        )

        # Drain any in-flight cost log tasks before exit so we don't silently
        # drop INSERT operations during deployments.
        loop = getattr(self, "node_execution_loop", None)
        if loop is not None and loop.is_running():
            try:
                asyncio.run_coroutine_threadsafe(
                    drain_pending_cost_logs(), loop
                ).result(timeout=10)
                logger.info(f"{prefix} ✅ Cost log tasks drained")
            except Exception as e:
                logger.warning(f"{prefix} ⚠️ Failed to drain cost log tasks: {e}")

        logger.info(f"{prefix} ✅ Finished GraphExec cleanup")

        super().cleanup()

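The shutdown path above combines two patterns: a lock-guarded set that keeps strong references to fire-and-forget tasks, and a drain coroutine submitted from another thread with `asyncio.run_coroutine_threadsafe`. A minimal, self-contained sketch of that combination follows. All names (`schedule`, `drain`, `run_demo`, `_write`) are illustrative assumptions, and the `asyncio.sleep` call stands in for the real database insert.

```python
import asyncio
import threading

_pending: set[asyncio.Task] = set()
_pending_lock = threading.Lock()
completed: list[int] = []


async def _write(n: int) -> None:
    await asyncio.sleep(0.01)  # stands in for the DB insert
    completed.append(n)


def schedule(n: int) -> None:
    """Must be called from code running inside the event loop."""
    task = asyncio.create_task(_write(n))
    with _pending_lock:
        _pending.add(task)  # strong reference until the task finishes

    def _remove(t: asyncio.Task) -> None:
        with _pending_lock:
            _pending.discard(t)

    task.add_done_callback(_remove)


async def drain(timeout: float = 5.0) -> int:
    """Wait for this loop's pending tasks; return how many are still running."""
    loop = asyncio.get_running_loop()
    with _pending_lock:
        mine = [t for t in _pending if t.get_loop() is loop]
    if not mine:
        return 0
    _, still_pending = await asyncio.wait(mine, timeout=timeout)
    return len(still_pending)


async def _schedule_many() -> None:
    for n in range(3):
        schedule(n)


def run_demo() -> int:
    loop = asyncio.new_event_loop()
    worker = threading.Thread(target=loop.run_forever, daemon=True)
    worker.start()
    try:
        asyncio.run_coroutine_threadsafe(_schedule_many(), loop).result(timeout=5)
        # Submitted from the main thread, executed on the worker's loop.
        return asyncio.run_coroutine_threadsafe(drain(), loop).result(timeout=10)
    finally:
        loop.call_soon_threadsafe(loop.stop)
        worker.join(timeout=5)
        loop.close()


LEFTOVER = run_demo()
```

The outer `result(timeout=10)` is deliberately longer than the inner `drain` timeout, so the waiting thread does not give up before the drain coroutine itself has had a chance to time out and report.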
@@ -0,0 +1,567 @@
"""Unit tests for resolve_tracking and log_system_credential_cost."""

import asyncio
from typing import Any
from unittest.mock import AsyncMock, MagicMock, patch

import pytest

from backend.data.execution import ExecutionContext, NodeExecutionEntry
from backend.data.model import NodeExecutionStats
from backend.executor.cost_tracking import log_system_credential_cost, resolve_tracking

# ---------------------------------------------------------------------------
# resolve_tracking
# ---------------------------------------------------------------------------


class TestResolveTracking:
    def _stats(self, **overrides: Any) -> NodeExecutionStats:
        return NodeExecutionStats(**overrides)

    def test_provider_cost_returns_cost_usd(self):
        stats = self._stats(provider_cost=0.0042)
        tt, amt = resolve_tracking("openai", stats, {})
        assert tt == "cost_usd"
        assert amt == 0.0042

    def test_token_counts_return_tokens(self):
        stats = self._stats(input_token_count=300, output_token_count=100)
        tt, amt = resolve_tracking("anthropic", stats, {})
        assert tt == "tokens"
        assert amt == 400.0

    def test_token_counts_only_input(self):
        stats = self._stats(input_token_count=500)
        tt, amt = resolve_tracking("groq", stats, {})
        assert tt == "tokens"
        assert amt == 500.0

    def test_unreal_speech_returns_characters(self):
        stats = self._stats()
        tt, amt = resolve_tracking("unreal_speech", stats, {"text": "Hello world"})
        assert tt == "characters"
        assert amt == 11.0

    def test_unreal_speech_empty_text(self):
        stats = self._stats()
        tt, amt = resolve_tracking("unreal_speech", stats, {"text": ""})
        assert tt == "characters"
        assert amt == 0.0

    def test_unreal_speech_non_string_text(self):
        stats = self._stats()
        tt, amt = resolve_tracking("unreal_speech", stats, {"text": 123})
        assert tt == "characters"
        assert amt == 0.0

    def test_d_id_uses_script_input(self):
        stats = self._stats()
        tt, amt = resolve_tracking("d_id", stats, {"script_input": "Hello"})
        assert tt == "characters"
        assert amt == 5.0

    def test_elevenlabs_uses_text(self):
        stats = self._stats()
        tt, amt = resolve_tracking("elevenlabs", stats, {"text": "Say this"})
        assert tt == "characters"
        assert amt == 8.0

    def test_elevenlabs_fallback_to_text_when_no_script_input(self):
        stats = self._stats()
        tt, amt = resolve_tracking("elevenlabs", stats, {"text": "Fallback text"})
        assert tt == "characters"
        assert amt == 13.0

    def test_elevenlabs_uses_script_field(self):
        """VideoNarrationBlock (elevenlabs) uses `script` field, not script_input/text."""
        stats = self._stats()
        tt, amt = resolve_tracking("elevenlabs", stats, {"script": "Narration"})
        assert tt == "characters"
        assert amt == 9.0

    def test_block_declared_cost_type_items(self):
        """Block explicitly setting provider_cost_type='items' short-circuits heuristics."""
        stats = self._stats(provider_cost=5.0, provider_cost_type="items")
        tt, amt = resolve_tracking("google_maps", stats, {})
        assert tt == "items"
        assert amt == 5.0

    def test_block_declared_cost_type_characters(self):
        """TTS block can declare characters directly, bypassing input_data lookup."""
        stats = self._stats(provider_cost=42.0, provider_cost_type="characters")
        tt, amt = resolve_tracking("unreal_speech", stats, {})
        assert tt == "characters"
        assert amt == 42.0

    def test_block_declared_cost_type_wins_over_tokens(self):
        """provider_cost_type takes precedence over token-based heuristic."""
        stats = self._stats(
            provider_cost=1.0,
            provider_cost_type="per_run",
            input_token_count=500,
        )
        tt, amt = resolve_tracking("openai", stats, {})
        assert tt == "per_run"
        assert amt == 1.0

    def test_e2b_returns_sandbox_seconds(self):
        stats = self._stats(walltime=45.123)
        tt, amt = resolve_tracking("e2b", stats, {})
        assert tt == "sandbox_seconds"
        assert amt == 45.123

    def test_e2b_no_walltime(self):
        stats = self._stats(walltime=0)
        tt, amt = resolve_tracking("e2b", stats, {})
        assert tt == "sandbox_seconds"
        assert amt == 0.0

    def test_fal_returns_walltime(self):
        stats = self._stats(walltime=12.5)
        tt, amt = resolve_tracking("fal", stats, {})
        assert tt == "walltime_seconds"
        assert amt == 12.5

    def test_revid_returns_walltime(self):
        stats = self._stats(walltime=60.0)
        tt, amt = resolve_tracking("revid", stats, {})
        assert tt == "walltime_seconds"
        assert amt == 60.0

    def test_replicate_returns_walltime(self):
        stats = self._stats(walltime=30.0)
        tt, amt = resolve_tracking("replicate", stats, {})
        assert tt == "walltime_seconds"
        assert amt == 30.0

    def test_unknown_provider_returns_per_run(self):
        stats = self._stats()
        tt, amt = resolve_tracking("google_maps", stats, {})
        assert tt == "per_run"
        assert amt == 1.0

    def test_provider_cost_takes_precedence_over_tokens(self):
        stats = self._stats(
            provider_cost=0.01, input_token_count=500, output_token_count=200
        )
        tt, amt = resolve_tracking("openai", stats, {})
        assert tt == "cost_usd"
        assert amt == 0.01

    def test_provider_cost_zero_is_not_none(self):
        """provider_cost=0.0 is falsy but should still be tracked as cost_usd
        (e.g. free-tier or fully-cached responses from OpenRouter)."""
        stats = self._stats(provider_cost=0.0)
        tt, amt = resolve_tracking("open_router", stats, {})
        assert tt == "cost_usd"
        assert amt == 0.0

    def test_tokens_take_precedence_over_provider_specific(self):
        stats = self._stats(input_token_count=100, walltime=10.0)
        tt, amt = resolve_tracking("fal", stats, {})
        assert tt == "tokens"
        assert amt == 100.0


# ---------------------------------------------------------------------------
# log_system_credential_cost
# ---------------------------------------------------------------------------


def _make_db_client() -> MagicMock:
    db_client = MagicMock()
    db_client.log_platform_cost = AsyncMock()
    return db_client


def _make_block(has_credentials: bool = True) -> MagicMock:
    block = MagicMock()
    block.name = "TestBlock"
    input_schema = MagicMock()
    if has_credentials:
        input_schema.get_credentials_fields.return_value = {"credentials": MagicMock()}
    else:
        input_schema.get_credentials_fields.return_value = {}
    block.input_schema = input_schema
    return block


def _make_node_exec(
    inputs: dict | None = None,
    dry_run: bool = False,
) -> NodeExecutionEntry:
    return NodeExecutionEntry(
        user_id="user-1",
        graph_exec_id="gx-1",
        graph_id="g-1",
        graph_version=1,
        node_exec_id="nx-1",
        node_id="n-1",
        block_id="b-1",
        inputs=inputs or {},
        execution_context=ExecutionContext(dry_run=dry_run),
    )


class TestLogSystemCredentialCost:
    @pytest.mark.asyncio
    async def test_skips_dry_run(self):
        db_client = _make_db_client()
        node_exec = _make_node_exec(dry_run=True)
        block = _make_block()
        stats = NodeExecutionStats()
        await log_system_credential_cost(node_exec, block, stats, db_client)
        db_client.log_platform_cost.assert_not_awaited()

    @pytest.mark.asyncio
    async def test_skips_when_no_credential_fields(self):
        db_client = _make_db_client()
        node_exec = _make_node_exec(inputs={})
        block = _make_block(has_credentials=False)
        stats = NodeExecutionStats()
        await log_system_credential_cost(node_exec, block, stats, db_client)
        db_client.log_platform_cost.assert_not_awaited()

    @pytest.mark.asyncio
    async def test_skips_when_cred_data_missing(self):
        db_client = _make_db_client()
        node_exec = _make_node_exec(inputs={})
        block = _make_block()
        stats = NodeExecutionStats()
        await log_system_credential_cost(node_exec, block, stats, db_client)
        db_client.log_platform_cost.assert_not_awaited()

    @pytest.mark.asyncio
    async def test_skips_when_not_system_credential(self):
        db_client = _make_db_client()
        with patch(
            "backend.executor.cost_tracking.is_system_credential",
            return_value=False,
        ):
            node_exec = _make_node_exec(
                inputs={
                    "credentials": {"id": "user-cred-123", "provider": "openai"},
                }
            )
            block = _make_block()
            stats = NodeExecutionStats()
            await log_system_credential_cost(node_exec, block, stats, db_client)
            db_client.log_platform_cost.assert_not_awaited()

    @pytest.mark.asyncio
    async def test_logs_with_system_credential(self):
        db_client = _make_db_client()
        with (
            patch(
                "backend.executor.cost_tracking.is_system_credential", return_value=True
            ),
            patch(
                "backend.executor.cost_tracking.block_usage_cost",
                return_value=(10, None),
            ),
        ):
            node_exec = _make_node_exec(
                inputs={
                    "credentials": {"id": "sys-cred-1", "provider": "openai"},
                    "model": "gpt-4",
                }
            )
            block = _make_block()
            stats = NodeExecutionStats(input_token_count=500, output_token_count=200)
            await log_system_credential_cost(node_exec, block, stats, db_client)
            await asyncio.sleep(0)

        db_client.log_platform_cost.assert_awaited_once()
        entry = db_client.log_platform_cost.call_args[0][0]
        assert entry.user_id == "user-1"
        assert entry.provider == "openai"
        assert entry.block_name == "TestBlock"
        assert entry.model == "gpt-4"
        assert entry.input_tokens == 500
        assert entry.output_tokens == 200
        assert entry.tracking_type == "tokens"
        assert entry.metadata["tracking_type"] == "tokens"
        assert entry.metadata["tracking_amount"] == 700.0
        assert entry.metadata["credit_cost"] == 10

    @pytest.mark.asyncio
    async def test_logs_with_provider_cost(self):
        db_client = _make_db_client()
        with (
            patch(
                "backend.executor.cost_tracking.is_system_credential", return_value=True
            ),
            patch(
                "backend.executor.cost_tracking.block_usage_cost",
                return_value=(5, None),
            ),
        ):
            node_exec = _make_node_exec(
                inputs={
                    "credentials": {"id": "sys-cred-2", "provider": "open_router"},
                }
            )
            block = _make_block()
            stats = NodeExecutionStats(provider_cost=0.0015)
            await log_system_credential_cost(node_exec, block, stats, db_client)
            await asyncio.sleep(0)

        entry = db_client.log_platform_cost.call_args[0][0]
        assert entry.cost_microdollars == 1500
        assert entry.tracking_type == "cost_usd"
        assert entry.metadata["tracking_type"] == "cost_usd"
        assert entry.metadata["provider_cost_raw"] == 0.0015

    @pytest.mark.asyncio
    async def test_model_name_enum_converted_to_str(self):
        db_client = _make_db_client()
        with (
            patch(
                "backend.executor.cost_tracking.is_system_credential", return_value=True
            ),
            patch(
                "backend.executor.cost_tracking.block_usage_cost",
                return_value=(0, None),
            ),
        ):
            from enum import Enum

            class FakeModel(Enum):
                GPT4 = "gpt-4"

            node_exec = _make_node_exec(
                inputs={
                    "credentials": {"id": "sys-cred", "provider": "openai"},
                    "model": FakeModel.GPT4,
                }
            )
            block = _make_block()
            stats = NodeExecutionStats()
            await log_system_credential_cost(node_exec, block, stats, db_client)
            await asyncio.sleep(0)

        entry = db_client.log_platform_cost.call_args[0][0]
        assert entry.model == "FakeModel.GPT4"

    @pytest.mark.asyncio
    async def test_model_name_dict_becomes_none(self):
        db_client = _make_db_client()
        with (
            patch(
                "backend.executor.cost_tracking.is_system_credential", return_value=True
            ),
            patch(
                "backend.executor.cost_tracking.block_usage_cost",
                return_value=(0, None),
            ),
        ):
            node_exec = _make_node_exec(
                inputs={
                    "credentials": {"id": "sys-cred", "provider": "openai"},
                    "model": {"nested": "value"},
                }
            )
            block = _make_block()
            stats = NodeExecutionStats()
            await log_system_credential_cost(node_exec, block, stats, db_client)
            await asyncio.sleep(0)

        entry = db_client.log_platform_cost.call_args[0][0]
        assert entry.model is None

    @pytest.mark.asyncio
    async def test_does_not_raise_when_block_usage_cost_raises(self):
        """log_system_credential_cost must swallow exceptions from block_usage_cost."""
        db_client = _make_db_client()
        with (
            patch(
                "backend.executor.cost_tracking.is_system_credential", return_value=True
            ),
            patch(
                "backend.executor.cost_tracking.block_usage_cost",
                side_effect=RuntimeError("pricing lookup failed"),
            ),
        ):
            node_exec = _make_node_exec(
                inputs={
                    "credentials": {"id": "sys-cred", "provider": "openai"},
                }
            )
            block = _make_block()
            stats = NodeExecutionStats()
            # Should not raise: outer except must catch block_usage_cost error
            await log_system_credential_cost(node_exec, block, stats, db_client)

    @pytest.mark.asyncio
    async def test_round_instead_of_int_for_microdollars(self):
        db_client = _make_db_client()
        with (
            patch(
                "backend.executor.cost_tracking.is_system_credential", return_value=True
            ),
            patch(
                "backend.executor.cost_tracking.block_usage_cost",
                return_value=(0, None),
            ),
        ):
            node_exec = _make_node_exec(
                inputs={
                    "credentials": {"id": "sys-cred", "provider": "openai"},
                }
            )
            block = _make_block()
            # 0.0015 * 1_000_000 = 1499.9999999... with float math
            # round() should give 1500, int() would give 1499
            stats = NodeExecutionStats(provider_cost=0.0015)
            await log_system_credential_cost(node_exec, block, stats, db_client)
            await asyncio.sleep(0)

        entry = db_client.log_platform_cost.call_args[0][0]
        assert entry.cost_microdollars == 1500

    @pytest.mark.asyncio
    async def test_per_run_metadata_has_no_provider_cost_raw(self):
        """For per-run providers (google_maps etc), provider_cost_raw is absent
        from metadata since stats.provider_cost is None."""
        db_client = _make_db_client()
        with (
            patch(
                "backend.executor.cost_tracking.is_system_credential", return_value=True
            ),
            patch(
                "backend.executor.cost_tracking.block_usage_cost",
                return_value=(0, None),
            ),
        ):
            node_exec = _make_node_exec(
                inputs={
                    "credentials": {"id": "sys-cred", "provider": "google_maps"},
                }
            )
            block = _make_block()
            stats = NodeExecutionStats()  # no provider_cost
            await log_system_credential_cost(node_exec, block, stats, db_client)
            await asyncio.sleep(0)

        entry = db_client.log_platform_cost.call_args[0][0]
        assert entry.tracking_type == "per_run"
        assert "provider_cost_raw" not in (entry.metadata or {})


# ---------------------------------------------------------------------------
# merge_stats accumulation
# ---------------------------------------------------------------------------


class TestMergeStats:
    """Tests for NodeExecutionStats accumulation via += (used by Block.merge_stats)."""

    def test_accumulates_output_size(self):
        stats = NodeExecutionStats()
        stats += NodeExecutionStats(output_size=10)
        stats += NodeExecutionStats(output_size=25)
        assert stats.output_size == 35

    def test_accumulates_tokens(self):
        stats = NodeExecutionStats()
        stats += NodeExecutionStats(input_token_count=100, output_token_count=50)
        stats += NodeExecutionStats(input_token_count=200, output_token_count=150)
        assert stats.input_token_count == 300
        assert stats.output_token_count == 200

    def test_preserves_provider_cost(self):
        stats = NodeExecutionStats()
        stats += NodeExecutionStats(provider_cost=0.005)
        stats += NodeExecutionStats(output_size=10)
        assert stats.provider_cost == 0.005
        assert stats.output_size == 10

    def test_provider_cost_accumulates(self):
        """Multiple merge_stats with provider_cost should sum (multi-round
        tool-calling in copilot / retries can report cost separately)."""
        stats = NodeExecutionStats()
        stats += NodeExecutionStats(provider_cost=0.001)
        stats += NodeExecutionStats(provider_cost=0.002)
        stats += NodeExecutionStats(provider_cost=0.003)
        assert stats.provider_cost == pytest.approx(0.006)

    def test_provider_cost_none_does_not_overwrite(self):
        """A None provider_cost must not wipe a previously-set value."""
        stats = NodeExecutionStats(provider_cost=0.01)
        stats += NodeExecutionStats()  # provider_cost=None by default
        assert stats.provider_cost == 0.01

    def test_provider_cost_type_last_write_wins(self):
        """provider_cost_type is a Literal: the last set value wins on merge."""
        stats = NodeExecutionStats(provider_cost_type="tokens")
        stats += NodeExecutionStats(provider_cost_type="items")
        assert stats.provider_cost_type == "items"


# ---------------------------------------------------------------------------
# on_node_execution -> log_system_credential_cost integration
# ---------------------------------------------------------------------------


class TestManagerCostTrackingIntegration:
    @pytest.mark.asyncio
    async def test_log_called_with_accumulated_stats(self):
        """Verify that log_system_credential_cost receives stats that could
        have been accumulated by merge_stats across multiple yield steps."""
        db_client = _make_db_client()
        with (
            patch(
                "backend.executor.cost_tracking.is_system_credential", return_value=True
            ),
            patch(
                "backend.executor.cost_tracking.block_usage_cost",
                return_value=(5, None),
            ),
        ):
            stats = NodeExecutionStats()
            stats += NodeExecutionStats(output_size=10, input_token_count=100)
            stats += NodeExecutionStats(output_size=25, input_token_count=200)

            assert stats.output_size == 35
            assert stats.input_token_count == 300

            node_exec = _make_node_exec(
                inputs={
                    "credentials": {"id": "sys-cred-acc", "provider": "openai"},
                    "model": "gpt-4",
                }
            )
            block = _make_block()
            await log_system_credential_cost(node_exec, block, stats, db_client)
            await asyncio.sleep(0)

        db_client.log_platform_cost.assert_awaited_once()
        entry = db_client.log_platform_cost.call_args[0][0]
        assert entry.input_tokens == 300
        assert entry.tracking_type == "tokens"
        assert entry.metadata["tracking_amount"] == 300.0

    @pytest.mark.asyncio
    async def test_skips_cost_log_when_status_is_failed(self):
        """Manager only calls log_system_credential_cost on COMPLETED status.

        This test verifies the guard condition `if status == COMPLETED` directly:
        calling log_system_credential_cost only happens on success, never on
        FAILED or ERROR executions.
        """
        from backend.data.execution import ExecutionStatus

        db_client = _make_db_client()
        node_exec = _make_node_exec(
            inputs={"credentials": {"id": "sys-cred", "provider": "openai"}}
        )
        block = _make_block()
|
||||
stats = NodeExecutionStats(input_token_count=100)
|
||||
|
||||
# Simulate the manager guard: only call on COMPLETED
|
||||
status = ExecutionStatus.FAILED
|
||||
if status == ExecutionStatus.COMPLETED:
|
||||
await log_system_credential_cost(node_exec, block, stats, db_client)
|
||||
|
||||
db_client.log_platform_cost.assert_not_awaited()
|
||||
@@ -121,10 +121,16 @@ def _make_hashable_key(


 def _make_redis_key(key: tuple[Any, ...], func_name: str) -> str:
-    """Convert a hashable key tuple to a Redis key string."""
-    # Ensure key is already hashable
-    hashable_key = key if isinstance(key, tuple) else (key,)
-    return f"cache:{func_name}:{hash(hashable_key)}"
+    """Convert a hashable key tuple to a Redis key string.
+
+    Uses SHA-256 instead of Python's built-in ``hash()`` because ``hash()``
+    is randomised per-process (``PYTHONHASHSEED``). In a multi-pod
+    deployment every pod must derive the **same** Redis key for the same
+    arguments, otherwise cache lookups and invalidations silently miss.
+    """
+    key_bytes = repr(key).encode()
+    digest = hashlib.sha256(key_bytes).hexdigest()
+    return f"cache:{func_name}:{digest}"


 @runtime_checkable
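The reason for moving from `hash()` to SHA-256 can be seen in isolation. The following is a minimal sketch, assuming a standalone helper named `stable_cache_key` (a hypothetical name introduced here, mirroring the shape of `_make_redis_key`); it also assumes that `repr()` of the key is itself deterministic, which holds for tuples of strings and numbers but not necessarily for containers such as sets.

```python
import hashlib
from typing import Any


def stable_cache_key(key: tuple[Any, ...], func_name: str) -> str:
    """Derive a cache key that is identical in every process (hypothetical helper)."""
    # repr() gives a textual form of the arguments; SHA-256 of that text does
    # not depend on PYTHONHASHSEED, unlike the built-in hash() of a str.
    digest = hashlib.sha256(repr(key).encode()).hexdigest()
    return f"cache:{func_name}:{digest}"


key = stable_cache_key(("user-123", 42), "get_profile")
print(key)  # same string on every pod for the same arguments
```

Because the digest is a fixed 64-character hex string, key length also stays bounded regardless of how large the argument tuple is.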
@@ -1,5 +1,6 @@
|
||||
import contextlib
|
||||
import logging
|
||||
import os
|
||||
from enum import Enum
|
||||
from functools import wraps
|
||||
from typing import Any, Awaitable, Callable, TypeVar
|
||||
@@ -38,6 +39,7 @@ class Flag(str, Enum):
|
||||
AGENT_ACTIVITY = "agent-activity"
|
||||
ENABLE_PLATFORM_PAYMENT = "enable-platform-payment"
|
||||
CHAT = "chat"
|
||||
CHAT_MODE_OPTION = "chat-mode-option"
|
||||
COPILOT_SDK = "copilot-sdk"
|
||||
COPILOT_DAILY_TOKEN_LIMIT = "copilot-daily-token-limit"
|
||||
COPILOT_WEEKLY_TOKEN_LIMIT = "copilot-weekly-token-limit"
|
||||
@@ -165,6 +167,30 @@ async def get_feature_flag_value(
|
||||
return default
|
||||
|
||||
|
||||
def _env_flag_override(flag_key: Flag) -> bool | None:
|
||||
"""Return a local override for ``flag_key`` from the environment.
|
||||
|
||||
Set ``FORCE_FLAG_<NAME>=true|false`` (``NAME`` = flag value with
|
||||
``-`` → ``_``, upper-cased) to bypass LaunchDarkly for a single
|
||||
flag in local dev or tests. Returns ``None`` when no override
|
||||
is configured so the caller falls through to LaunchDarkly.
|
||||
|
||||
The ``NEXT_PUBLIC_FORCE_FLAG_<NAME>`` prefix is also accepted so a
|
||||
single shared env var can toggle a flag across backend and
|
||||
frontend (the frontend requires the ``NEXT_PUBLIC_`` prefix to
|
||||
expose the value to the browser bundle).
|
||||
|
||||
Example: ``FORCE_FLAG_CHAT_MODE_OPTION=true`` forces
|
||||
``Flag.CHAT_MODE_OPTION`` on regardless of LaunchDarkly.
|
||||
"""
|
||||
suffix = flag_key.value.upper().replace("-", "_")
|
||||
for prefix in ("FORCE_FLAG_", "NEXT_PUBLIC_FORCE_FLAG_"):
|
||||
raw = os.environ.get(prefix + suffix)
|
||||
if raw is not None:
|
||||
return raw.strip().lower() in ("1", "true", "yes", "on")
|
||||
return None
|
||||
|
||||
|
||||
async def is_feature_enabled(
|
||||
flag_key: Flag,
|
||||
user_id: str,
|
||||
@@ -181,6 +207,11 @@ async def is_feature_enabled(
|
||||
Returns:
|
||||
True if feature is enabled, False otherwise
|
||||
"""
|
||||
override = _env_flag_override(flag_key)
|
||||
if override is not None:
|
||||
logger.debug(f"Feature flag {flag_key} overridden by env: {override}")
|
||||
return override
|
||||
|
||||
result = await get_feature_flag_value(flag_key.value, user_id, default)
|
||||
|
||||
# If the result is already a boolean, return it
|
||||
|
||||
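The override-then-fallback flow added above can be sketched without LaunchDarkly. This is an illustrative sketch only: `env_override`, `is_enabled`, and the `remote_lookup` callable are hypothetical names, and the environment is passed in as a mapping so the behaviour is easy to exercise.

```python
import os
from typing import Callable, Mapping, Optional

TRUTHY = ("1", "true", "yes", "on")


def env_override(flag_value: str, environ: Mapping[str, str] = os.environ) -> Optional[bool]:
    """Return True/False when an override variable is set, else None."""
    suffix = flag_value.upper().replace("-", "_")
    # FORCE_FLAG_ is checked first, so it wins over NEXT_PUBLIC_FORCE_FLAG_.
    for prefix in ("FORCE_FLAG_", "NEXT_PUBLIC_FORCE_FLAG_"):
        raw = environ.get(prefix + suffix)
        if raw is not None:
            return raw.strip().lower() in TRUTHY
    return None


def is_enabled(
    flag_value: str,
    remote_lookup: Callable[[str], bool],
    environ: Mapping[str, str] = os.environ,
) -> bool:
    """Consult the environment first, then fall back to the remote lookup."""
    override = env_override(flag_value, environ)
    if override is not None:
        return override
    return remote_lookup(flag_value)


env = {"FORCE_FLAG_CHAT_MODE_OPTION": " TRUE "}
print(is_enabled("chat-mode-option", lambda _: False, env))
```

One consequence of this parsing is worth keeping in mind: any set value outside the truthy list, including a typo, forces the flag off rather than falling through to the remote lookup.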
@@ -4,6 +4,7 @@ from ldclient import LDClient
|
||||
|
||||
from backend.util.feature_flag import (
|
||||
Flag,
|
||||
_env_flag_override,
|
||||
feature_flag,
|
||||
is_feature_enabled,
|
||||
mock_flag_variation,
|
||||
@@ -111,3 +112,59 @@ async def test_is_feature_enabled_with_flag_enum(mocker):
|
||||
assert result is True
|
||||
# Should call with the flag's string value
|
||||
mock_get_feature_flag_value.assert_called_once()
|
||||
|
||||
|
||||
class TestEnvFlagOverride:
|
||||
def test_force_flag_true(self, monkeypatch: pytest.MonkeyPatch):
|
||||
monkeypatch.setenv("FORCE_FLAG_CHAT", "true")
|
||||
assert _env_flag_override(Flag.CHAT) is True
|
||||
|
||||
def test_force_flag_false(self, monkeypatch: pytest.MonkeyPatch):
|
||||
monkeypatch.setenv("FORCE_FLAG_CHAT", "false")
|
||||
assert _env_flag_override(Flag.CHAT) is False
|
||||
|
||||
def test_next_public_prefix_true(self, monkeypatch: pytest.MonkeyPatch):
|
||||
monkeypatch.setenv("NEXT_PUBLIC_FORCE_FLAG_CHAT", "true")
|
||||
assert _env_flag_override(Flag.CHAT) is True
|
||||
|
||||
def test_unset_returns_none(self, monkeypatch: pytest.MonkeyPatch):
|
||||
monkeypatch.delenv("FORCE_FLAG_CHAT", raising=False)
|
||||
monkeypatch.delenv("NEXT_PUBLIC_FORCE_FLAG_CHAT", raising=False)
|
||||
assert _env_flag_override(Flag.CHAT) is None
|
||||
|
||||
def test_invalid_value_returns_false(self, monkeypatch: pytest.MonkeyPatch):
|
||||
monkeypatch.setenv("FORCE_FLAG_CHAT", "notaboolean")
|
||||
assert _env_flag_override(Flag.CHAT) is False
|
||||
|
||||
def test_numeric_one_returns_true(self, monkeypatch: pytest.MonkeyPatch):
|
||||
monkeypatch.setenv("FORCE_FLAG_CHAT", "1")
|
||||
assert _env_flag_override(Flag.CHAT) is True
|
||||
|
||||
def test_yes_returns_true(self, monkeypatch: pytest.MonkeyPatch):
|
||||
monkeypatch.setenv("FORCE_FLAG_CHAT", "yes")
|
||||
assert _env_flag_override(Flag.CHAT) is True
|
||||
|
||||
def test_on_returns_true(self, monkeypatch: pytest.MonkeyPatch):
|
||||
monkeypatch.setenv("FORCE_FLAG_CHAT", "on")
|
||||
assert _env_flag_override(Flag.CHAT) is True
|
||||
|
||||
def test_hyphenated_flag_converts_to_underscore(
|
||||
self, monkeypatch: pytest.MonkeyPatch
|
||||
):
|
||||
monkeypatch.setenv("FORCE_FLAG_CHAT_MODE_OPTION", "true")
|
||||
assert _env_flag_override(Flag.CHAT_MODE_OPTION) is True
|
||||
|
||||
def test_force_flag_takes_precedence_over_next_public(
|
||||
self, monkeypatch: pytest.MonkeyPatch
|
||||
):
|
||||
monkeypatch.setenv("FORCE_FLAG_CHAT", "false")
|
||||
monkeypatch.setenv("NEXT_PUBLIC_FORCE_FLAG_CHAT", "true")
|
||||
assert _env_flag_override(Flag.CHAT) is False
|
||||
|
||||
def test_whitespace_is_stripped(self, monkeypatch: pytest.MonkeyPatch):
|
||||
monkeypatch.setenv("FORCE_FLAG_CHAT", " true ")
|
||||
assert _env_flag_override(Flag.CHAT) is True
|
||||
|
||||
def test_case_insensitive_value(self, monkeypatch: pytest.MonkeyPatch):
|
||||
monkeypatch.setenv("FORCE_FLAG_CHAT", "TRUE")
|
||||
assert _env_flag_override(Flag.CHAT) is True
|
||||
|
||||
@@ -155,6 +155,7 @@ class WorkspaceManager:
|
||||
path: Optional[str] = None,
|
||||
mime_type: Optional[str] = None,
|
||||
overwrite: bool = False,
|
||||
metadata: Optional[dict] = None,
|
||||
) -> WorkspaceFile:
|
||||
"""
|
||||
Write file to workspace.
|
||||
@@ -168,6 +169,7 @@ class WorkspaceManager:
|
||||
path: Virtual path (defaults to "/{filename}", session-scoped if session_id set)
|
||||
mime_type: MIME type (auto-detected if not provided)
|
||||
overwrite: Whether to overwrite existing file at path
|
||||
metadata: Optional metadata dict (e.g., origin tracking)
|
||||
|
||||
Returns:
|
||||
Created WorkspaceFile instance
|
||||
@@ -246,6 +248,7 @@ class WorkspaceManager:
|
||||
mime_type=mime_type,
|
||||
size_bytes=len(content),
|
||||
checksum=checksum,
|
||||
metadata=metadata,
|
||||
)
|
||||
except UniqueViolationError:
|
||||
if retries > 0:
|
||||
|
||||
@@ -0,0 +1,5 @@
|
||||
-- CreateEnum
|
||||
CREATE TYPE "SubscriptionTier" AS ENUM ('FREE', 'PRO', 'BUSINESS', 'ENTERPRISE');
|
||||
|
||||
-- AlterTable: add subscriptionTier column with default PRO (beta testing)
|
||||
ALTER TABLE "User" ADD COLUMN "subscriptionTier" "SubscriptionTier" NOT NULL DEFAULT 'PRO';
|
||||
@@ -0,0 +1,42 @@
|
||||
-- CreateTable
|
||||
CREATE TABLE "PlatformCostLog" (
|
||||
"id" TEXT NOT NULL,
|
||||
"createdAt" TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
|
||||
"userId" TEXT,
|
||||
"graphExecId" TEXT,
|
||||
"nodeExecId" TEXT,
|
||||
"graphId" TEXT,
|
||||
"nodeId" TEXT,
|
||||
"blockId" TEXT NOT NULL,
|
||||
"blockName" TEXT NOT NULL,
|
||||
"provider" TEXT NOT NULL,
|
||||
"credentialId" TEXT NOT NULL,
|
||||
"costMicrodollars" BIGINT,
|
||||
"inputTokens" INTEGER,
|
||||
"outputTokens" INTEGER,
|
||||
"dataSize" INTEGER,
|
||||
"duration" DOUBLE PRECISION,
|
||||
"model" TEXT,
|
||||
"trackingType" TEXT,
|
||||
"metadata" JSONB,
|
||||
|
||||
CONSTRAINT "PlatformCostLog_pkey" PRIMARY KEY ("id")
|
||||
);
|
||||
|
||||
-- CreateIndex
|
||||
CREATE INDEX "PlatformCostLog_userId_createdAt_idx" ON "PlatformCostLog"("userId", "createdAt");
|
||||
|
||||
-- CreateIndex
|
||||
CREATE INDEX "PlatformCostLog_provider_createdAt_idx" ON "PlatformCostLog"("provider", "createdAt");
|
||||
|
||||
-- CreateIndex
|
||||
CREATE INDEX "PlatformCostLog_createdAt_idx" ON "PlatformCostLog"("createdAt");
|
||||
|
||||
-- CreateIndex
|
||||
CREATE INDEX "PlatformCostLog_graphExecId_idx" ON "PlatformCostLog"("graphExecId");
|
||||
|
||||
-- CreateIndex
|
||||
CREATE INDEX "PlatformCostLog_provider_trackingType_idx" ON "PlatformCostLog"("provider", "trackingType");
|
||||
|
||||
-- AddForeignKey
|
||||
ALTER TABLE "PlatformCostLog" ADD CONSTRAINT "PlatformCostLog_userId_fkey" FOREIGN KEY ("userId") REFERENCES "User"("id") ON DELETE SET NULL ON UPDATE CASCADE;
|
||||
@@ -0,0 +1,2 @@
|
||||
-- AlterTable
|
||||
ALTER TABLE "PlatformCostLog" ADD COLUMN "trackingAmount" DOUBLE PRECISION;
|
||||
@@ -40,6 +40,15 @@ model User {
|
||||
|
||||
timezone String @default("not-set")
|
||||
|
||||
// CoPilot subscription tier — controls rate-limit multipliers.
|
||||
// Multipliers applied in get_global_rate_limits(): FREE=1x, PRO=5x, BUSINESS=20x, ENTERPRISE=60x.
|
||||
// NOTE: @default(PRO) is intentional for the beta period — all existing and new
|
||||
// users receive PRO-level (5x) rate limits by default. The Python-level constant
|
||||
// DEFAULT_TIER=FREE (in copilot/rate_limit.py) acts as a code-level fallback when
|
||||
// the DB value is NULL or unrecognised. At GA, a migration will flip the column
|
||||
// default to FREE and batch-update users to their billing-derived tiers.
|
||||
subscriptionTier SubscriptionTier @default(PRO)
|
||||
|
||||
// Relations
|
||||
|
||||
AgentGraphs AgentGraph[]
|
||||
@@ -66,6 +75,8 @@ model User {
|
||||
PendingHumanReviews PendingHumanReview[]
|
||||
Workspace UserWorkspace?
|
||||
|
||||
PlatformCostLogs PlatformCostLog[]
|
||||
|
||||
// OAuth Provider relations
|
||||
OAuthApplications OAuthApplication[]
|
||||
OAuthAuthorizationCodes OAuthAuthorizationCode[]
|
||||
@@ -73,6 +84,13 @@ model User {
|
||||
OAuthRefreshTokens OAuthRefreshToken[]
|
||||
}
|
||||
|
||||
enum SubscriptionTier {
|
||||
FREE
|
||||
PRO
|
||||
BUSINESS
|
||||
ENTERPRISE
|
||||
}
|
||||
|
||||
enum OnboardingStep {
|
||||
// Introductory onboarding (Library)
|
||||
WELCOME
|
||||
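The tier multipliers named in the schema comment (FREE=1x, PRO=5x, BUSINESS=20x, ENTERPRISE=60x, with FREE as the code-level fallback) can be sketched as a small lookup. This is a sketch under assumptions: `Tier`, `resolve_tier`, and `scaled_limit` are hypothetical names, and the actual `get_global_rate_limits()` in `copilot/rate_limit.py` is not shown in this diff and may differ.

```python
from enum import Enum
from typing import Optional


class Tier(str, Enum):
    FREE = "FREE"
    PRO = "PRO"
    BUSINESS = "BUSINESS"
    ENTERPRISE = "ENTERPRISE"


# Multipliers as stated in the schema comment.
MULTIPLIERS = {Tier.FREE: 1, Tier.PRO: 5, Tier.BUSINESS: 20, Tier.ENTERPRISE: 60}
DEFAULT_TIER = Tier.FREE  # fallback when the stored value is NULL or unrecognised


def resolve_tier(raw: Optional[str]) -> Tier:
    """Map a stored tier string to a Tier, falling back to DEFAULT_TIER."""
    if raw is None:
        return DEFAULT_TIER
    try:
        return Tier(raw)
    except ValueError:
        return DEFAULT_TIER


def scaled_limit(base_limit: int, raw_tier: Optional[str]) -> int:
    """Scale a base token limit by the multiplier for the user's tier."""
    return base_limit * MULTIPLIERS[resolve_tier(raw_tier)]


print(scaled_limit(2_500_000, "PRO"))
```

Keeping the fallback in code separate from the column default means an unexpected database value degrades to the most conservative limit instead of raising.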
@@ -799,6 +817,45 @@ model CreditRefundRequest {
|
||||
@@index([userId, transactionKey])
|
||||
}
|
||||
|
||||
////////////////////////////////////////////////////////////
|
||||
////////////////////////////////////////////////////////////
|
||||
////////// Platform Cost Tracking TABLES //////////////
|
||||
////////////////////////////////////////////////////////////
|
||||
|
||||
model PlatformCostLog {
|
||||
id String @id @default(uuid())
|
||||
createdAt DateTime @default(now())
|
||||
|
||||
userId String?
|
||||
User User? @relation(fields: [userId], references: [id], onDelete: SetNull)
|
||||
graphExecId String?
|
||||
nodeExecId String?
|
||||
graphId String?
|
||||
nodeId String?
|
||||
blockId String
|
||||
blockName String
|
||||
provider String
|
||||
credentialId String
|
||||
|
||||
// Cost in microdollars (1 USD = 1,000,000). Null if unknown.
|
||||
costMicrodollars BigInt?
|
||||
|
||||
inputTokens Int?
|
||||
outputTokens Int?
|
||||
dataSize Int? // bytes
|
||||
duration Float? // seconds
|
||||
model String?
|
||||
trackingType String? // e.g. "cost_usd", "tokens", "characters", "items", "per_run", "sandbox_seconds", "walltime_seconds"
|
||||
trackingAmount Float? // Amount in the unit implied by trackingType
|
||||
metadata Json?
|
||||
|
||||
@@index([userId, createdAt])
|
||||
@@index([provider, createdAt])
|
||||
@@index([createdAt])
|
||||
@@index([graphExecId])
|
||||
@@index([provider, trackingType])
|
||||
}
|
||||
|
||||
////////////////////////////////////////////////////////////
|
||||
////////////////////////////////////////////////////////////
|
||||
////////////// Store TABLES ///////////////////////////
|
||||
|
||||
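The `costMicrodollars` column stores cost as an integer number of microdollars (1 USD = 1,000,000). A conversion from a floating-point USD amount might look like the sketch below. The helper name `usd_to_microdollars` and the half-up rounding rule are assumptions made for illustration; the conversion actually used by the backend is not part of this diff.

```python
from decimal import ROUND_HALF_UP, Decimal
from typing import Optional

MICRODOLLARS_PER_USD = 1_000_000


def usd_to_microdollars(cost_usd: Optional[float]) -> Optional[int]:
    """Convert a USD amount to integer microdollars; None stays None (unknown cost)."""
    if cost_usd is None:
        return None
    # Going through str() avoids carrying binary floating-point noise into Decimal.
    micro = Decimal(str(cost_usd)) * MICRODOLLARS_PER_USD
    return int(micro.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


print(usd_to_microdollars(0.005))
```

Storing an integer avoids accumulating float error when many small per-call costs are summed in SQL.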
@@ -1,6 +1,7 @@
|
||||
{
|
||||
"daily_token_limit": 2500000,
|
||||
"daily_tokens_used": 500000,
|
||||
"tier": "FREE",
|
||||
"user_email": "target@example.com",
|
||||
"user_id": "5e53486c-cf57-477e-ba2a-cb02dc828e1c",
|
||||
"weekly_token_limit": 12500000,
|
||||
|
||||
@@ -1,6 +1,7 @@
|
||||
{
|
||||
"daily_token_limit": 2500000,
|
||||
"daily_tokens_used": 0,
|
||||
"tier": "FREE",
|
||||
"user_email": "target@example.com",
|
||||
"user_id": "5e53486c-cf57-477e-ba2a-cb02dc828e1c",
|
||||
"weekly_token_limit": 12500000,
|
||||
|
||||
@@ -1,6 +1,7 @@
|
||||
{
|
||||
"daily_token_limit": 2500000,
|
||||
"daily_tokens_used": 0,
|
||||
"tier": "FREE",
|
||||
"user_email": "target@example.com",
|
||||
"user_id": "5e53486c-cf57-477e-ba2a-cb02dc828e1c",
|
||||
"weekly_token_limit": 12500000,
|
||||
|
||||
@@ -140,7 +140,9 @@ class TestFixOrchestratorBlocks:
         assert defaults["conversation_compaction"] is True
         assert defaults["retry"] == 3
         assert defaults["multiple_tool_calls"] is False
-        assert len(fixer.fixes_applied) == 4
+        assert defaults["execution_mode"] == "extended_thinking"
+        assert defaults["model"] == "claude-opus-4-6"
+        assert len(fixer.fixes_applied) == 6

     def test_preserves_existing_values(self):
         """Existing user-set values are never overwritten."""
@@ -153,6 +155,8 @@ class TestFixOrchestratorBlocks:
                         "conversation_compaction": False,
                         "retry": 1,
                         "multiple_tool_calls": True,
+                        "execution_mode": "built_in",
+                        "model": "gpt-4o",
                     }
                 )
             ],
@@ -166,6 +170,8 @@ class TestFixOrchestratorBlocks:
         assert defaults["conversation_compaction"] is False
         assert defaults["retry"] == 1
         assert defaults["multiple_tool_calls"] is True
+        assert defaults["execution_mode"] == "built_in"
+        assert defaults["model"] == "gpt-4o"
         assert len(fixer.fixes_applied) == 0

     def test_partial_defaults(self):
@@ -189,7 +195,9 @@ class TestFixOrchestratorBlocks:
         assert defaults["conversation_compaction"] is True  # filled
         assert defaults["retry"] == 3  # filled
         assert defaults["multiple_tool_calls"] is False  # filled
-        assert len(fixer.fixes_applied) == 3
+        assert defaults["execution_mode"] == "extended_thinking"  # filled
+        assert defaults["model"] == "claude-opus-4-6"  # filled
+        assert len(fixer.fixes_applied) == 5

     def test_skips_non_sdm_nodes(self):
         """Non-Orchestrator nodes are untouched."""
@@ -258,11 +266,13 @@ class TestFixOrchestratorBlocks:
         result = fixer.fix_orchestrator_blocks(agent)

         defaults = result["nodes"][0]["input_default"]
-        assert defaults["agent_mode_max_iterations"] == 10  # None → default
-        assert defaults["conversation_compaction"] is True  # None → default
+        assert defaults["agent_mode_max_iterations"] == 10  # None -> default
+        assert defaults["conversation_compaction"] is True  # None -> default
         assert defaults["retry"] == 3  # kept
         assert defaults["multiple_tool_calls"] is False  # kept
-        assert len(fixer.fixes_applied) == 2
+        assert defaults["execution_mode"] == "extended_thinking"  # filled
+        assert defaults["model"] == "claude-opus-4-6"  # filled
+        assert len(fixer.fixes_applied) == 4

     def test_multiple_sdm_nodes(self):
         """Multiple SDM nodes are all fixed independently."""
@@ -277,11 +287,11 @@ class TestFixOrchestratorBlocks:

         result = fixer.fix_orchestrator_blocks(agent)

-        # First node: 3 defaults filled (agent_mode was already set)
+        # First node: 5 defaults filled (agent_mode was already set)
         assert result["nodes"][0]["input_default"]["agent_mode_max_iterations"] == 3
-        # Second node: all 4 defaults filled
+        # Second node: all 6 defaults filled
         assert result["nodes"][1]["input_default"]["agent_mode_max_iterations"] == 10
-        assert len(fixer.fixes_applied) == 7  # 3 + 4
+        assert len(fixer.fixes_applied) == 11  # 5 + 6

     def test_registered_in_apply_all_fixes(self):
         """fix_orchestrator_blocks runs as part of apply_all_fixes."""
@@ -655,6 +665,7 @@ class TestOrchestratorE2EPipeline:
                     "conversation_compaction": {"type": "boolean"},
                     "retry": {"type": "integer"},
                     "multiple_tool_calls": {"type": "boolean"},
+                    "execution_mode": {"type": "string"},
                 },
                 "required": ["prompt"],
             },
0
autogpt_platform/backend/test/copilot/__init__.py
Normal file
394
autogpt_platform/backend/test/copilot/dry_run_loop_test.py
Normal file
@@ -0,0 +1,394 @@
|
||||
"""Prompt regression tests AND functional tests for the dry-run verification loop.
|
||||
|
||||
NOTE: This file lives in test/copilot/ rather than being colocated with a
|
||||
single source module because it is a cross-cutting test spanning multiple
|
||||
modules: prompting.py, service.py, agent_generation_guide.md, and run_agent.py.
|
||||
|
||||
These tests verify that the create -> dry-run -> fix iterative workflow is
|
||||
properly communicated through tool descriptions, the prompting supplement,
|
||||
and the agent building guide.
|
||||
|
||||
After deduplication, the full dry-run workflow lives in the
|
||||
agent_generation_guide.md only. The system prompt and individual tool
|
||||
descriptions no longer repeat it — they keep a minimal footprint.
|
||||
|
||||
**Intentionally brittle**: the assertions check for specific substrings so
|
||||
that accidental removal or rewording of key instructions is caught. If you
|
||||
deliberately reword a prompt, update the corresponding assertion here.
|
||||
|
||||
--- Functional tests (added separately) ---
|
||||
|
||||
The dry-run loop is primarily a *prompt/guide* feature — the copilot reads
|
||||
the guide and follows its instructions. There are no standalone Python
|
||||
functions that implement "loop until passing" logic; the loop is driven by
|
||||
the LLM. However, several pieces of real Python infrastructure make the
|
||||
loop possible:
|
||||
|
||||
1. The ``run_agent`` and ``run_block`` OpenAI tool schemas expose a
|
||||
``dry_run`` boolean parameter that the LLM must be able to set.
|
||||
2. The ``RunAgentInput`` Pydantic model validates ``dry_run`` as a required
|
||||
bool, so the executor can branch on it.
|
||||
3. The ``_check_prerequisites`` method in ``RunAgentTool`` bypasses
|
||||
credential and missing-input gates when ``dry_run=True``.
|
||||
4. The guide documents the workflow steps in a specific order that the LLM
|
||||
must follow: create/edit -> dry-run -> inspect -> fix -> repeat.
|
||||
|
||||
The functional test classes below exercise items 1-4 directly.
|
||||
"""
|
||||
|
||||
import re
|
||||
from pathlib import Path
|
||||
from typing import Any, cast
|
||||
|
||||
import pytest
|
||||
from openai.types.chat import ChatCompletionToolParam
|
||||
from pydantic import ValidationError
|
||||
|
||||
from backend.copilot.prompting import get_sdk_supplement
|
||||
from backend.copilot.service import DEFAULT_SYSTEM_PROMPT
|
||||
from backend.copilot.tools import TOOL_REGISTRY
|
||||
from backend.copilot.tools.run_agent import RunAgentInput
|
||||
|
||||
# Resolved once for the whole module so individual tests stay fast.
|
||||
_SDK_SUPPLEMENT = get_sdk_supplement(use_e2b=False, cwd="/tmp/test")
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Prompt regression tests (original)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestSystemPromptBasics:
|
||||
"""Verify the system prompt includes essential baseline content.
|
||||
|
||||
After deduplication, the dry-run workflow lives only in the guide.
|
||||
The system prompt carries tone and personality only.
|
||||
"""
|
||||
|
||||
def test_mentions_automations(self):
|
||||
assert "automations" in DEFAULT_SYSTEM_PROMPT.lower()
|
||||
|
||||
def test_mentions_action_oriented(self):
|
||||
assert "action-oriented" in DEFAULT_SYSTEM_PROMPT.lower()
|
||||
|
||||
|
||||
class TestToolDescriptionsDryRunLoop:
|
||||
"""Verify tool descriptions and parameters related to the dry-run loop."""
|
||||
|
||||
def test_get_agent_building_guide_mentions_workflow(self):
|
||||
desc = TOOL_REGISTRY["get_agent_building_guide"].description
|
||||
assert "dry-run" in desc.lower()
|
||||
|
||||
def test_run_agent_dry_run_param_exists_and_is_boolean(self):
|
||||
schema = TOOL_REGISTRY["run_agent"].as_openai_tool()
|
||||
params = cast(dict[str, Any], schema["function"].get("parameters", {}))
|
||||
assert "dry_run" in params["properties"]
|
||||
assert params["properties"]["dry_run"]["type"] == "boolean"
|
||||
|
||||
def test_run_agent_dry_run_param_mentions_simulation(self):
|
||||
"""After deduplication the dry_run param description mentions simulation."""
|
||||
schema = TOOL_REGISTRY["run_agent"].as_openai_tool()
|
||||
params = cast(dict[str, Any], schema["function"].get("parameters", {}))
|
||||
dry_run_desc = params["properties"]["dry_run"]["description"]
|
||||
assert "simulat" in dry_run_desc.lower()
|
||||
|
||||
|
||||
class TestPromptingSupplementContent:
|
||||
"""Verify the prompting supplement (via get_sdk_supplement) includes
|
||||
essential shared tool notes. After deduplication, the dry-run workflow
|
||||
lives only in the guide; the supplement carries storage, file-handling,
|
||||
and tool-discovery notes.
|
||||
"""
|
||||
|
||||
def test_includes_tool_discovery_priority(self):
|
||||
assert "Tool Discovery Priority" in _SDK_SUPPLEMENT
|
||||
|
||||
def test_includes_find_block_first(self):
|
||||
assert "find_block first" in _SDK_SUPPLEMENT or "find_block" in _SDK_SUPPLEMENT
|
||||
|
||||
def test_includes_send_authenticated_web_request(self):
|
||||
assert "SendAuthenticatedWebRequestBlock" in _SDK_SUPPLEMENT
|
||||
|
||||
|
||||
class TestAgentBuildingGuideDryRunLoop:
|
||||
"""Verify the agent building guide includes the dry-run loop."""
|
||||
|
||||
@pytest.fixture
|
||||
def guide_content(self):
|
||||
guide_path = (
|
||||
Path(__file__).resolve().parent.parent.parent
|
||||
/ "backend"
|
||||
/ "copilot"
|
||||
/ "sdk"
|
||||
/ "agent_generation_guide.md"
|
||||
)
|
||||
return guide_path.read_text(encoding="utf-8")
|
||||
|
||||
def test_has_dry_run_verification_section(self, guide_content):
|
||||
assert "REQUIRED: Dry-Run Verification Loop" in guide_content
|
||||
|
||||
def test_workflow_includes_dry_run_step(self, guide_content):
|
||||
assert "dry_run=True" in guide_content
|
||||
|
||||
def test_mentions_good_vs_bad_output(self, guide_content):
|
||||
assert "**Good output**" in guide_content
|
||||
assert "**Bad output**" in guide_content
|
||||
|
||||
def test_mentions_repeat_until_pass(self, guide_content):
|
||||
lower = guide_content.lower()
|
||||
assert "repeat" in lower
|
||||
assert "clearly unfixable" in lower
|
||||
|
||||
def test_mentions_wait_for_result(self, guide_content):
|
||||
assert "wait_for_result=120" in guide_content
|
||||
|
||||
def test_mentions_view_agent_output(self, guide_content):
|
||||
assert "view_agent_output" in guide_content
|
||||
|
||||
def test_workflow_has_dry_run_and_inspect_steps(self, guide_content):
|
||||
assert "**Dry-run**" in guide_content
|
||||
assert "**Inspect & fix**" in guide_content
|
||||
|
||||
|
||||
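The checks above confirm that each workflow marker is present in the guide. Checking that the markers also appear in the intended order (create/edit, dry-run, inspect, fix) needs a position-aware comparison. The following is a minimal sketch, assuming a hypothetical helper `appears_in_order` and a made-up sample text; it does not read the real `agent_generation_guide.md`.

```python
def appears_in_order(text: str, markers: list[str]) -> bool:
    """Return True if every marker occurs in text, each after the previous one."""
    pos = 0
    for marker in markers:
        found = text.find(marker, pos)
        if found == -1:
            return False
        # Continue searching after this match so order is enforced.
        pos = found + len(marker)
    return True


# Hypothetical excerpt, for illustration only.
sample_guide = (
    "7. Save with create_agent\n"
    "8. **Dry-run** using dry_run=True\n"
    "9. **Inspect & fix** and repeat\n"
)

print(appears_in_order(sample_guide, ["create_agent", "**Dry-run**", "**Inspect & fix**"]))
```

Searching from the end of the previous match, rather than comparing the first occurrence of each marker, tolerates markers that are also mentioned earlier in the text.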
# ---------------------------------------------------------------------------
|
||||
# Functional tests: tool schema validation
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestRunAgentToolSchema:
|
||||
"""Validate the run_agent OpenAI tool schema exposes dry_run correctly.
|
||||
|
||||
These go beyond substring checks — they verify the full schema structure
|
||||
that the LLM receives, ensuring the parameter is well-formed and will be
|
||||
parsed correctly by OpenAI function-calling.
|
||||
"""
|
||||
|
||||
@pytest.fixture
|
||||
def schema(self) -> ChatCompletionToolParam:
|
||||
return TOOL_REGISTRY["run_agent"].as_openai_tool()
|
||||
|
||||
def test_schema_is_valid_openai_tool(self, schema: ChatCompletionToolParam):
|
||||
"""The schema has the required top-level OpenAI structure."""
|
||||
assert schema["type"] == "function"
|
||||
assert "function" in schema
|
||||
func = schema["function"]
|
||||
assert "name" in func
|
||||
assert "description" in func
|
||||
assert "parameters" in func
|
||||
assert func["name"] == "run_agent"
|
||||
|
||||
def test_dry_run_is_required(self, schema: ChatCompletionToolParam):
|
||||
"""dry_run must be in 'required' so the LLM always provides it explicitly."""
|
||||
params = cast(dict[str, Any], schema["function"].get("parameters", {}))
|
||||
required = params.get("required", [])
|
||||
assert "dry_run" in required
|
||||
|
||||
def test_dry_run_is_boolean_type(self, schema: ChatCompletionToolParam):
|
||||
"""dry_run must be typed as boolean so the LLM generates true/false."""
|
||||
params = cast(dict[str, Any], schema["function"].get("parameters", {}))
|
||||
assert params["properties"]["dry_run"]["type"] == "boolean"
|
||||
|
||||
def test_dry_run_description_is_nonempty(self, schema: ChatCompletionToolParam):
|
||||
"""The description must be present and substantive for LLM guidance."""
|
||||
params = cast(dict[str, Any], schema["function"].get("parameters", {}))
|
||||
desc = params["properties"]["dry_run"]["description"]
|
||||
assert isinstance(desc, str)
|
||||
assert len(desc) > 10, "Description too short to guide the LLM"
|
||||
|
||||
def test_wait_for_result_coexists_with_dry_run(
|
||||
self, schema: ChatCompletionToolParam
|
||||
):
|
||||
"""wait_for_result must also be present — the guide instructs the LLM
|
||||
to pass both dry_run=True and wait_for_result=120 together."""
|
||||
params = cast(dict[str, Any], schema["function"].get("parameters", {}))
|
||||
assert "wait_for_result" in params["properties"]
|
||||
assert params["properties"]["wait_for_result"]["type"] == "integer"
|
||||
|
||||
|
||||
class TestRunBlockToolSchema:
|
||||
"""Validate the run_block OpenAI tool schema exposes dry_run correctly."""
|
||||
|
||||
@pytest.fixture
|
||||
def schema(self) -> ChatCompletionToolParam:
|
||||
return TOOL_REGISTRY["run_block"].as_openai_tool()
|
||||
|
||||
def test_schema_is_valid_openai_tool(self, schema: ChatCompletionToolParam):
|
||||
assert schema["type"] == "function"
|
||||
func = schema["function"]
|
||||
assert func["name"] == "run_block"
|
||||
assert "parameters" in func
|
||||
|
||||
def test_dry_run_exists_and_is_boolean(self, schema: ChatCompletionToolParam):
|
||||
params = cast(dict[str, Any], schema["function"].get("parameters", {}))
|
||||
props = params["properties"]
|
||||
assert "dry_run" in props
|
||||
assert props["dry_run"]["type"] == "boolean"
|
||||
|
||||
def test_dry_run_is_required(self, schema: ChatCompletionToolParam):
|
||||
"""dry_run must be required — along with block_id and input_data."""
|
||||
params = cast(dict[str, Any], schema["function"].get("parameters", {}))
|
||||
required = params.get("required", [])
|
||||
assert "dry_run" in required
|
||||
assert "block_id" in required
|
||||
assert "input_data" in required
|
||||
|
||||
def test_dry_run_description_mentions_preview(
|
||||
self, schema: ChatCompletionToolParam
|
||||
):
|
||||
params = cast(dict[str, Any], schema["function"].get("parameters", {}))
|
||||
desc = params["properties"]["dry_run"]["description"]
|
||||
assert isinstance(desc, str)
|
||||
assert (
|
||||
"preview mode" in desc.lower()
|
||||
), "run_block dry_run description should mention preview mode"
|
||||
|
||||
|
||||
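The schema assertions in the two classes above share one shape: the tool is a `function`, `dry_run` is a boolean property, and `dry_run` is listed under `required`. A reusable check could be sketched as follows. The helper `dry_run_problems` and the sample schema are hypothetical; the real schemas come from `TOOL_REGISTRY[...].as_openai_tool()` and are not reproduced here.

```python
from typing import Any


def dry_run_problems(tool_schema: dict[str, Any]) -> list[str]:
    """Collect human-readable problems with a tool schema's dry_run parameter."""
    problems: list[str] = []
    if tool_schema.get("type") != "function":
        problems.append("top-level type is not 'function'")
    params = tool_schema.get("function", {}).get("parameters", {})
    prop = params.get("properties", {}).get("dry_run")
    if prop is None:
        problems.append("dry_run missing from properties")
    elif prop.get("type") != "boolean":
        problems.append("dry_run is not typed as boolean")
    if "dry_run" not in params.get("required", []):
        problems.append("dry_run is not required")
    return problems


# Hypothetical schema, shaped like an OpenAI function-calling tool definition.
sample_schema = {
    "type": "function",
    "function": {
        "name": "run_agent",
        "description": "Run an agent.",
        "parameters": {
            "type": "object",
            "properties": {"dry_run": {"type": "boolean", "description": "Simulate the run."}},
            "required": ["dry_run"],
        },
    },
}

print(dry_run_problems(sample_schema))
```

Returning a list of problems instead of asserting inline makes a failing test report every schema defect at once.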
# ---------------------------------------------------------------------------
# Functional tests: RunAgentInput Pydantic model
# ---------------------------------------------------------------------------


class TestRunAgentInputModel:
    """Validate RunAgentInput Pydantic model handles dry_run correctly.

    The executor reads dry_run from this model, so it must parse, default,
    and validate properly.
    """

    def test_dry_run_accepts_true(self):
        model = RunAgentInput(username_agent_slug="user/agent", dry_run=True)
        assert model.dry_run is True

    def test_dry_run_accepts_false(self):
        """dry_run=False must be accepted when provided explicitly."""
        model = RunAgentInput(username_agent_slug="user/agent", dry_run=False)
        assert model.dry_run is False

    def test_dry_run_coerces_truthy_int(self):
        """Pydantic bool fields coerce int 1 to True."""
        model = RunAgentInput(username_agent_slug="user/agent", dry_run=1)  # type: ignore[arg-type]
        assert model.dry_run is True

    def test_dry_run_coerces_falsy_int(self):
        """Pydantic bool fields coerce int 0 to False."""
        model = RunAgentInput(username_agent_slug="user/agent", dry_run=0)  # type: ignore[arg-type]
        assert model.dry_run is False

    def test_dry_run_with_wait_for_result(self):
        """The guide instructs passing both dry_run=True and wait_for_result=120.
        The model must accept this combination."""
        model = RunAgentInput(
            username_agent_slug="user/agent",
            dry_run=True,
            wait_for_result=120,
        )
        assert model.dry_run is True
        assert model.wait_for_result == 120

    def test_wait_for_result_upper_bound(self):
        """wait_for_result is bounded at 300 seconds (ge=0, le=300)."""
        with pytest.raises(ValidationError):
            RunAgentInput(
                username_agent_slug="user/agent",
                dry_run=True,
                wait_for_result=301,
            )

    def test_string_fields_are_stripped(self):
        """The strip_strings validator should strip whitespace from string fields."""
        model = RunAgentInput(username_agent_slug="  user/agent  ", dry_run=True)
        assert model.username_agent_slug == "user/agent"

# ---------------------------------------------------------------------------
# Functional tests: guide documents the correct workflow ordering
# ---------------------------------------------------------------------------


class TestGuideWorkflowOrdering:
    """Verify the guide documents workflow steps in the correct order.

    The LLM must see: create/edit -> dry-run -> inspect -> fix -> repeat.
    If these steps are reordered, the copilot would follow the wrong sequence.
    These tests verify *ordering*, not just presence.
    """

    @pytest.fixture
    def guide_content(self) -> str:
        guide_path = (
            Path(__file__).resolve().parent.parent.parent
            / "backend"
            / "copilot"
            / "sdk"
            / "agent_generation_guide.md"
        )
        return guide_path.read_text(encoding="utf-8")

    def test_create_before_dry_run_in_workflow(self, guide_content: str):
        """Step 7 (Save/create_agent) must appear before step 8 (Dry-run)."""
        create_pos = guide_content.index("create_agent")
        dry_run_pos = guide_content.index("dry_run=True")
        assert (
            create_pos < dry_run_pos
        ), "create_agent must appear before dry_run=True in the workflow"

    def test_dry_run_before_inspect_in_verification_section(self, guide_content: str):
        """In the verification loop section, Dry-run step must come before
        Inspect & fix step."""
        section_start = guide_content.index("REQUIRED: Dry-Run Verification Loop")
        section = guide_content[section_start:]
        dry_run_pos = section.index("**Dry-run**")
        inspect_pos = section.index("**Inspect")
        assert (
            dry_run_pos < inspect_pos
        ), "Dry-run step must come before Inspect & fix in the verification loop"

    def test_fix_before_repeat_in_verification_section(self, guide_content: str):
        """The Fix step must come before the Repeat step."""
        section_start = guide_content.index("REQUIRED: Dry-Run Verification Loop")
        section = guide_content[section_start:]
        fix_pos = section.index("**Fix**")
        repeat_pos = section.index("**Repeat**")
        assert fix_pos < repeat_pos

    def test_good_output_before_bad_output(self, guide_content: str):
        """Good output examples should be listed before bad output examples,
        so the LLM sees the success pattern first."""
        good_pos = guide_content.index("**Good output**")
        bad_pos = guide_content.index("**Bad output**")
        assert good_pos < bad_pos

    def test_numbered_steps_in_verification_section(self, guide_content: str):
        """The step-by-step workflow should have numbered steps 1-5."""
        section_start = guide_content.index("Step-by-step workflow")
        section = guide_content[section_start:]
        # The section should contain numbered items 1 through 5
        for step_num in range(1, 6):
            assert (
                f"{step_num}. " in section
            ), f"Missing numbered step {step_num} in verification workflow"

    def test_workflow_steps_are_in_numbered_order(self, guide_content: str):
        """The main workflow steps (1-9) must appear in ascending order."""
        # Extract the numbered workflow items from the top-level workflow section
        workflow_start = guide_content.index("### Workflow for Creating/Editing Agents")
        # End at the next ### section
        next_section = guide_content.index("### Agent JSON Structure")
        workflow_section = guide_content[workflow_start:next_section]
        step_positions = []
        for step_num in range(1, 10):
            pattern = rf"^{step_num}\.\s"
            match = re.search(pattern, workflow_section, re.MULTILINE)
            if match:
                step_positions.append((step_num, match.start()))
        # Verify at least steps 1-9 are present and in order
        assert (
            len(step_positions) >= 9
        ), f"Expected 9 workflow steps, found {len(step_positions)}"
        for i in range(1, len(step_positions)):
            prev_num, prev_pos = step_positions[i - 1]
            curr_num, curr_pos = step_positions[i]
            assert prev_pos < curr_pos, (
                f"Step {prev_num} (pos {prev_pos}) should appear before "
                f"step {curr_num} (pos {curr_pos})"
            )
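
The ordering tests above rest on two standard-library techniques: comparing `str.index` offsets of marker strings, and locating numbered list items with a multiline regex. The following is a minimal sketch of both, not code from the AutoGPT test suite. The `SAMPLE_GUIDE` text and the helper names `appears_before`, `numbered_step_positions`, and `is_ascending` are hypothetical and exist only for illustration.

```python
import re

# Hypothetical stand-in for the guide; the real tests read agent_generation_guide.md.
SAMPLE_GUIDE = """\
### Workflow
1. Draft the agent
2. Save with create_agent
3. Run with dry_run=True
"""


def appears_before(text: str, first: str, second: str) -> bool:
    """Return True if the first occurrence of `first` precedes that of `second`.

    str.index raises ValueError when a marker is missing, so an absent
    marker fails loudly instead of passing silently.
    """
    return text.index(first) < text.index(second)


def numbered_step_positions(text: str, last_step: int) -> list[tuple[int, int]]:
    """Return (step number, offset) for each 'N. ' item found at a line start."""
    positions = []
    for step_num in range(1, last_step + 1):
        match = re.search(rf"^{step_num}\.\s", text, re.MULTILINE)
        if match:
            positions.append((step_num, match.start()))
    return positions


def is_ascending(positions: list[tuple[int, int]]) -> bool:
    """Return True if the offsets strictly increase in list order."""
    offsets = [pos for _, pos in positions]
    return all(a < b for a, b in zip(offsets, offsets[1:]))
```

One design point worth noting: anchoring the pattern with `^` under `re.MULTILINE` only matches items at the start of a line, whereas a plain substring check such as `"1. " in section` would also match inside `"11. "` or a sentence ending in `"1. "`.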
@@ -98,6 +98,7 @@ services:
      - CLAMD_CONF_MaxScanSize=100M
      - CLAMD_CONF_MaxThreads=12
      - CLAMD_CONF_ReadTimeout=300
      - CLAMD_CONF_TCPAddr=0.0.0.0
    healthcheck:
      test: ["CMD-SHELL", "clamdscan --version || exit 1"]
      interval: 30s
@@ -40,6 +40,8 @@ After making **any** code changes in the frontend, you MUST run the following co

Do NOT skip these steps. If any command reports errors, fix them and re-run until clean. Only then may you consider the task complete. If typing keeps failing, stop and ask the user.

4. `pnpm test:unit` — run integration tests; fix any failures

### Code Style

- Fully capitalize acronyms in symbols, e.g. `graphID`, `useBackendAPI`
@@ -62,7 +64,7 @@ Do NOT skip these steps. If any command reports errors, fix them and re-run unti
- **Icons**: Phosphor Icons only
- **Feature Flags**: LaunchDarkly integration
- **Error Handling**: ErrorCard for render errors, toast for mutations, Sentry for exceptions
- **Testing**: Playwright for E2E, Storybook for component development
- **Testing**: Vitest + React Testing Library + MSW for integration tests (primary), Playwright for E2E, Storybook for visual

## Environment Configuration

@@ -84,7 +86,12 @@ See @CONTRIBUTING.md for complete patterns. Quick reference:
   - Regenerate with `pnpm generate:api`
   - Pattern: `use{Method}{Version}{OperationName}`
4. **Styling**: Tailwind CSS only, use design tokens, Phosphor Icons only
5. **Testing**: Add Storybook stories for new components, Playwright for E2E. When fixing a bug, write a failing Playwright test first (use `.fixme` annotation), implement the fix, then remove the annotation.
5. **Testing**: Integration tests are the default (~90%). See `TESTING.md` for full details.
   - **New pages/features**: Write integration tests in `__tests__/` next to `page.tsx` using Vitest + RTL + MSW
   - **API mocking**: Use Orval-generated MSW handlers from `@/app/api/__generated__/endpoints/{tag}/{tag}.msw.ts`
   - **Run**: `pnpm test:unit` (integration/unit), `pnpm test` (Playwright E2E)
   - **Storybook**: For design system components in `src/components/`
   - **TDD**: Write a failing test first, implement, then verify
6. **Code conventions**:
   - Use function declarations (not arrow functions) for components/handlers
   - Do not use `useCallback` or `useMemo` unless asked to optimise a given function

Some files were not shown because too many files have changed in this diff.