fix(backend): validate non-finite float in extract_openrouter_cost, restore UTILITIES comment

- Add math.isfinite() guard in extract_openrouter_cost so inf/nan header values are rejected instead of stored (Sentry finding)
- Add test cases for inf, -inf, and nan header values
- Restore the '# ------- UTILITIES ------- #' section separator in manager.py that was accidentally dropped during the drain-on-shutdown commit
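
A minimal sketch of the guard described in the first bullet, assuming the cost arrives as a string-valued response header. Only the function name `extract_openrouter_cost` and the use of `math.isfinite()` come from the commit message; the header name `x-total-cost`, the signature, and the `None` return convention are hypothetical.

```python
import math
from typing import Mapping, Optional

# Hypothetical header name; the real one is not shown in this commit message.
COST_HEADER = "x-total-cost"


def extract_openrouter_cost(headers: Mapping[str, str]) -> Optional[float]:
    """Return the cost from the response headers, or None if absent or invalid."""
    raw = headers.get(COST_HEADER)
    if raw is None:
        return None
    try:
        # float() accepts "inf", "-inf" and "nan" without raising,
        # so parsing alone is not enough to reject them.
        cost = float(raw)
    except ValueError:
        return None
    if not math.isfinite(cost):
        return None  # reject inf/-inf/nan instead of storing them
    return cost
```

The point of the explicit check is that `float("nan")` and `float("inf")` parse successfully, so a `try/except ValueError` by itself would let them through.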
refactor(backend): extract shared _update_title_async to copilot/service.py
2026-04-08 03:00:28 -04:00 · 2026-04-08 10:26:41 +07:00 · 2026-04-08 10:14:29 +07:00 · 2026-04-08 10:08:52 +07:00 · 2026-04-08 10:05:18 +07:00 · 2026-04-07 23:15:42 +07:00
242 changed files with 27506 additions and 2342 deletions
--- a/.claude/skills/pr-address/SKILL.md
+++ b/.claude/skills/pr-address/SKILL.md
@@ -95,6 +95,28 @@ Address comments **one at a time**: fix → commit → push → inline reply →
 | Inline review (`pulls/{N}/comments`) | `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments/{ID}/replies -f body="🤖 Fixed in <commit-sha>: <description>"` |
 | Conversation (`issues/{N}/comments`) | `gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments -f body="🤖 Fixed in <commit-sha>: <description>"` |

+## Codecov coverage
+
+Codecov patch target is **80%** on changed lines. Checks are **informational** (not blocking) but should be green.
+
+### Running coverage locally
+
+**Backend** (from `autogpt_platform/backend/`):
+```bash
+poetry run pytest -s -vv --cov=backend --cov-branch --cov-report term-missing
+```
+
+**Frontend** (from `autogpt_platform/frontend/`):
+```bash
+pnpm vitest run --coverage
+```
+
+### When codecov/patch fails
+
+1. List the files changed in the PR (the candidates for uncovered lines): `git diff --name-only $(gh pr view --json baseRefName --jq '.baseRefName')...HEAD`
+2. For each uncovered file — extract inline logic to `helpers.ts`/`helpers.py` and test those (highest ROI). Colocate tests as `*_test.py` (backend) or `__tests__/*.test.ts` (frontend).
+3. Run coverage locally to verify, commit, push.
+
 ## Format and commit

 After fixing, format the changed code:
--- a/.claude/skills/pr-test/SKILL.md
+++ b/.claude/skills/pr-test/SKILL.md
@@ -530,9 +530,19 @@ After showing all screenshots, output a **detailed** summary table:
 # but Homebrew bash is 5.x; Linux typically has bash 5.x). If running on Bash <4, use a
 # plain variable with a lookup function instead.
 declare -A SCREENSHOT_EXPLANATIONS=(
-  ["01-login-page.png"]="Shows the login page loaded successfully with SSO options visible."
-  ["02-builder-with-block.png"]="The builder canvas displays the newly added block connected to the trigger."
-  # ... one entry per screenshot, using the same explanations you showed the user above
+  # Each explanation MUST answer three things:
+  #   1. FLOW: Which test scenario / user journey is this part of?
+  #   2. STEPS: What exact actions were taken to reach this state?
+  #   3. EVIDENCE: What does this screenshot prove (pass/fail/data)?
+  #
+  # Good example:
+  #   ["03-cost-log-after-run.png"]="Flow: LLM block cost tracking. Steps: Logged in as tester@gmail.com → ran 'Cost Test Agent' → waited for COMPLETED status. Evidence: PlatformCostLog table shows 1 new row with cost_microdollars=1234 and correct user_id."
+  #
+  # Bad example (too vague — never do this):
+  #   ["03-cost-log.png"]="Shows the cost log table."
+  ["01-login-page.png"]="Flow: Login flow. Steps: Opened /login. Evidence: Login page renders with email/password fields and SSO options visible."
+  ["02-builder-with-block.png"]="Flow: Block execution. Steps: Logged in → /build → added LLM block. Evidence: Builder canvas shows block connected to trigger, ready to run."
+  # ... one entry per screenshot using the flow/steps/evidence format above
 )

 TEST_RESULTS_TABLE="| 1 | Login flow | PASS | N/A | 01-login-before.png, 02-login-after.png |
@@ -547,6 +557,9 @@ Upload screenshots to the PR using the GitHub Git API (no local git operations

 **This step is MANDATORY. Every test run MUST post a PR comment with screenshots. No exceptions.**

+> **CRITICAL — NEVER post a bare directory link like `https://github.com/.../tree/...`.**
+> Every screenshot MUST appear as `![name](raw_url)` inline in the PR comment so reviewers can see them without clicking any links. After posting, the verification step below greps the comment for `![` tags and exits 1 if none are found — the test run is considered incomplete until this passes.
+
 ```bash
 # Upload screenshots via GitHub Git API (creates blobs, tree, commit, and ref remotely)
 REPO="Significant-Gravitas/AutoGPT"
@@ -582,12 +595,25 @@ for img in "${SCREENSHOT_FILES[@]}"; do
 done
 TREE_JSON+=']'

-# Step 2: Create tree, commit, and branch ref
+# Step 2: Create tree, commit (with parent), and branch ref
 TREE_SHA=$(echo "$TREE_JSON" | jq -c '{tree: .}' | gh api "repos/${REPO}/git/trees" --input - --jq '.sha')
-COMMIT_SHA=$(gh api "repos/${REPO}/git/commits" \
-  -f message="test: add E2E test screenshots for PR #${PR_NUMBER}" \
-  -f tree="$TREE_SHA" \
-  --jq '.sha')
+
+# Resolve existing branch tip as parent (avoids orphan commits on repeat runs)
+PARENT_SHA=$(gh api "repos/${REPO}/git/refs/heads/${SCREENSHOTS_BRANCH}" --jq '.object.sha' 2>/dev/null || true)
+if [ -n "$PARENT_SHA" ]; then
+  COMMIT_SHA=$(gh api "repos/${REPO}/git/commits" \
+    -f message="test: add E2E test screenshots for PR #${PR_NUMBER}" \
+    -f tree="$TREE_SHA" \
+    -f "parents[]=$PARENT_SHA" \
+    --jq '.sha')
+else
+  # First commit on this branch — no parent
+  COMMIT_SHA=$(gh api "repos/${REPO}/git/commits" \
+    -f message="test: add E2E test screenshots for PR #${PR_NUMBER}" \
+    -f tree="$TREE_SHA" \
+    --jq '.sha')
+fi
+
 gh api "repos/${REPO}/git/refs" \
  -f ref="refs/heads/${SCREENSHOTS_BRANCH}" \
  -f sha="$COMMIT_SHA" 2>/dev/null \
@@ -656,17 +682,123 @@ ${IMAGE_MARKDOWN}
 ${FAILED_SECTION}
 INNEREOF

-gh api "repos/${REPO}/issues/$PR_NUMBER/comments" -F body=@"$COMMENT_FILE"
+POSTED_BODY=$(gh api "repos/${REPO}/issues/$PR_NUMBER/comments" -F body=@"$COMMENT_FILE" --jq '.body')
 rm -f "$COMMENT_FILE"
 ```

 **The PR comment MUST include:**
 1. A summary table of all scenarios with PASS/FAIL and before/after API evidence
 2. Every successfully uploaded screenshot rendered inline; any failed uploads listed with manual attachment instructions
-3. A 1-2 sentence explanation below each screenshot describing what it proves
+3. A structured explanation below each screenshot covering: **Flow** (which scenario), **Steps** (exact actions taken to reach this state), **Evidence** (what this proves — pass/fail/data values). A bare "shows the page" caption is not acceptable.

 This approach uses the GitHub Git API to create blobs, trees, commits, and refs entirely server-side. No local `git checkout` or `git push` — safe for worktrees and won't interfere with the PR branch.

+**Verify inline rendering after posting — this is required, not optional:**
+
+```bash
+# 1. Confirm the posted comment body contains inline image markdown syntax
+if ! echo "$POSTED_BODY" | grep -q '!\['; then
+  echo "❌ FAIL: No inline image tags in posted comment body. Re-check IMAGE_MARKDOWN and re-post."
+  exit 1
+fi
+
+# 2. Verify at least one raw URL actually resolves (catches wrong branch name, wrong path, etc.)
+FIRST_IMG_URL=$(echo "$POSTED_BODY" | grep -o 'https://raw.githubusercontent.com[^)]*' | head -1)
+if [ -n "$FIRST_IMG_URL" ]; then
+  HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" --max-time 10 "$FIRST_IMG_URL")
+  if [ "$HTTP_STATUS" = "200" ]; then
+    echo "✅ Inline images confirmed and raw URL resolves (HTTP 200)"
+  else
+    echo "❌ FAIL: Raw image URL returned HTTP $HTTP_STATUS — images will not render inline."
+    echo "   URL: $FIRST_IMG_URL"
+    echo "   Check branch name, path, and that the push succeeded."
+    exit 1
+  fi
+else
+  echo "⚠️  Could not extract a raw URL from the comment — verify manually."
+fi
+```
+
+## Step 8: Evaluate test completeness and post a GitHub review
+
+After posting the PR comment, evaluate whether the test run actually covered everything it needed to. This is NOT a rubber-stamp — be critical. Then post a formal GitHub review so the PR author and reviewers can see the verdict.
+
+### 8a. Evaluate against the test plan
+
+Re-read `$RESULTS_DIR/test-plan.md` (written in Step 2) and `$RESULTS_DIR/test-report.md` (written in Step 5). For each scenario in the plan, answer:
+
+> **Note:** `test-report.md` is written in Step 5. If it doesn't exist, write it before proceeding here — see the Step 5 template. Do not skip evaluation because the file is missing; create it from your notes instead.
+
+| Question | Pass criteria |
+|----------|--------------|
+| Was it tested? | Explicit steps were executed, not just described |
+| Is there screenshot evidence? | At least one before/after screenshot per scenario |
+| Did the core feature work correctly? | Expected state matches actual state |
+| Were negative cases tested? | At least one failure/rejection case per feature |
+| Was DB/API state verified (not just UI)? | Raw API response or DB query confirms state change |
+
+Build a verdict:
+- **APPROVE** — every scenario tested, evidence present, no bugs found or all bugs are minor/known
+- **REQUEST_CHANGES** — one or more: untested scenarios, missing evidence, bugs found, data not verified
+
+### 8b. Post the GitHub review
+
+```bash
+EVAL_FILE=$(mktemp)
+
+# === STEP A: Write header ===
+cat > "$EVAL_FILE" << 'ENDEVAL'
+## 🧪 Test Evaluation
+
+### Coverage checklist
+ENDEVAL
+
+# === STEP B: Append ONE line per scenario — do this BEFORE calculating verdict ===
+# Format: "- ✅ **Scenario N – name**: <what was done and verified>"
+#      or "- ❌ **Scenario N – name**: <what is missing or broken>"
+# Examples:
+#   echo "- ✅ **Scenario 1 – Login flow**: tested, screenshot evidence present, auth token verified via API" >> "$EVAL_FILE"
+#   echo "- ❌ **Scenario 3 – Cost logging**: NOT verified in DB — UI showed entry but raw SQL query was skipped" >> "$EVAL_FILE"
+#
+# !!! IMPORTANT: append ALL scenario lines here before proceeding to STEP C !!!
+
+# === STEP C: Derive verdict from the checklist — runs AFTER all lines are appended ===
+FAIL_COUNT=$(grep -c "^- ❌" "$EVAL_FILE" || true)
+if [ "$FAIL_COUNT" -eq 0 ]; then
+  VERDICT="APPROVE"
+else
+  VERDICT="REQUEST_CHANGES"
+fi
+
+# === STEP D: Append verdict section ===
+cat >> "$EVAL_FILE" << ENDVERDICT
+
+### Verdict
+ENDVERDICT
+
+if [ "$VERDICT" = "APPROVE" ]; then
+  echo "✅ All scenarios covered with evidence. No blocking issues found." >> "$EVAL_FILE"
+else
+  echo "❌ $FAIL_COUNT scenario(s) incomplete or have confirmed bugs. See ❌ items above." >> "$EVAL_FILE"
+  echo "" >> "$EVAL_FILE"
+  echo "**Required before merge:** address each ❌ item above." >> "$EVAL_FILE"
+fi
+
+# === STEP E: Post the review ===
+gh api "repos/${REPO}/pulls/$PR_NUMBER/reviews" \
+  --method POST \
+  -f body="$(cat "$EVAL_FILE")" \
+  -f event="$VERDICT"
+
+rm -f "$EVAL_FILE"
+```
+
+**Rules:**
+- Never auto-approve without checking every scenario in the test plan
+- `REQUEST_CHANGES` if ANY scenario is untested, lacks DB/API evidence, or has a confirmed bug
+- The evaluation body must list every scenario explicitly (✅ or ❌) — not just the failures
+- If you find new bugs during evaluation, add them to the request-changes body and (if `--fix` flag is set) fix them before posting
+
 ## Fix mode (--fix flag)

 When `--fix` is present, the standard is HIGHER. Do not just note issues — FIX them immediately.
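
The verdict rule in Step 8b of the hunk above (count the `- ❌` checklist lines, approve only when there are none) can be sketched in Python as follows. This is an illustration of the same logic, not part of the skill; the function name `derive_verdict` is hypothetical.

```python
def derive_verdict(checklist_lines: list[str]) -> tuple[str, int]:
    """Count failed checklist items and map the count to a review event."""
    # Mirrors grep -c "^- ❌": only lines that start with the failure marker count.
    fail_count = sum(1 for line in checklist_lines if line.startswith("- ❌"))
    verdict = "APPROVE" if fail_count == 0 else "REQUEST_CHANGES"
    return verdict, fail_count
```

As in the shell version, an empty checklist also yields `APPROVE`, which is why the skill insists that every scenario line is appended before the verdict is derived.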
--- a/.claude/skills/write-frontend-tests/SKILL.md
+++ b/.claude/skills/write-frontend-tests/SKILL.md
@@ -0,0 +1,224 @@
+---
+name: write-frontend-tests
+description: "Analyze the current branch diff against dev, plan integration tests for changed frontend pages/components, and write them. TRIGGER when user asks to write frontend tests, add test coverage, or 'write tests for my changes'."
+user-invocable: true
+args: "[base branch] — defaults to dev. Optionally pass a specific base branch to diff against."
+metadata:
+  author: autogpt-team
+  version: "1.0.0"
+---
+
+# Write Frontend Tests
+
+Analyze the current branch's frontend changes, plan integration tests, and write them.
+
+## References
+
+Before writing any tests, read the testing rules and conventions:
+
+- `autogpt_platform/frontend/TESTING.md` — testing strategy, file locations, examples
+- `autogpt_platform/frontend/src/tests/AGENTS.md` — detailed testing rules, MSW patterns, decision flowchart
+- `autogpt_platform/frontend/src/tests/integrations/test-utils.tsx` — custom render with providers
+- `autogpt_platform/frontend/src/tests/integrations/vitest.setup.tsx` — MSW server setup
+
+## Step 1: Identify changed frontend files
+
+```bash
+BASE_BRANCH="${ARGUMENTS:-dev}"
+cd autogpt_platform/frontend
+
+# Get changed frontend files (excluding generated, config, and test files)
+git diff "$BASE_BRANCH"...HEAD --name-only -- src/ \
+  | grep -v '__generated__' \
+  | grep -v '__tests__' \
+  | grep -v '\.test\.' \
+  | grep -v '\.stories\.' \
+  | grep -v '\.spec\.'
+```
+
+Also read the diff to understand what changed:
+
+```bash
+git diff "$BASE_BRANCH"...HEAD --stat -- src/
+git diff "$BASE_BRANCH"...HEAD -- src/ | head -500
+```
+
+## Step 2: Categorize changes and find test targets
+
+For each changed file, determine:
+
+1. **Is it a page?** (`page.tsx`) — these are the primary test targets
+2. **Is it a hook?** (`use*.ts`) — test via the page that uses it
+3. **Is it a component?** (`.tsx` in `components/`) — test via the parent page unless it's complex enough to warrant isolation
+4. **Is it a helper?** (`helpers.ts`, `utils.ts`) — unit test directly if pure logic
+
+**Priority order:**
+1. Pages with new/changed data fetching or user interactions
+2. Components with complex internal logic (modals, forms, wizards)
+3. Hooks with non-trivial business logic
+4. Pure helper functions
+
+Skip: styling-only changes, type-only changes, config changes.
+
+## Step 3: Check for existing tests
+
+For each test target, check if tests already exist:
+
+```bash
+# For a page at src/app/(platform)/library/page.tsx
+ls src/app/\(platform\)/library/__tests__/ 2>/dev/null
+
+# For a component at src/app/(platform)/library/components/AgentCard/AgentCard.tsx
+ls src/app/\(platform\)/library/components/AgentCard/__tests__/ 2>/dev/null
+```
+
+Note which targets have no tests (need new files) vs which have tests that need updating.
+
+## Step 4: Identify API endpoints used
+
+For each test target, find which API hooks are used:
+
+```bash
+# Find generated API hook imports in the changed files
+grep -rn 'from.*__generated__/endpoints' src/app/\(platform\)/library/
+grep -rn 'use[A-Z].*V[12]' src/app/\(platform\)/library/
+```
+
+For each API hook found, locate the corresponding MSW handler:
+
+```bash
+# If the page uses useGetV2ListLibraryAgents, find its MSW handlers
+grep -rn 'getGetV2ListLibraryAgents.*Handler' src/app/api/__generated__/endpoints/library/library.msw.ts
+```
+
+List every MSW handler you will need (200 for happy path, 4xx for error paths).
+
+## Step 5: Write the test plan
+
+Before writing code, output a plan as a numbered list:
+
+```
+Test plan for [branch name]:
+
+1. src/app/(platform)/library/__tests__/main.test.tsx (NEW)
+   - Renders page with agent list (MSW 200)
+   - Shows loading state
+   - Shows error state (MSW 422)
+   - Handles empty agent list
+
+2. src/app/(platform)/library/__tests__/search.test.tsx (NEW)
+   - Filters agents by search query
+   - Shows no results message
+   - Clears search
+
+3. src/app/(platform)/library/components/AgentCard/__tests__/AgentCard.test.tsx (UPDATE)
+   - Add test for new "duplicate" action
+```
+
+Present this plan to the user. Wait for confirmation before proceeding. If the user has feedback, adjust the plan.
+
+## Step 6: Write the tests
+
+For each test file in the plan, follow these conventions:
+
+### File structure
+
+```tsx
+import { render, screen, waitFor } from "@/tests/integrations/test-utils";
+import { server } from "@/mocks/mock-server";
+// Import MSW handlers for endpoints the page uses
+import {
+  getGetV2ListLibraryAgentsMockHandler200,
+  getGetV2ListLibraryAgentsMockHandler422,
+} from "@/app/api/__generated__/endpoints/library/library.msw";
+// Import the component under test
+import LibraryPage from "../page";
+
+describe("LibraryPage", () => {
+  test("renders agent list from API", async () => {
+    server.use(getGetV2ListLibraryAgentsMockHandler200());
+
+    render(<LibraryPage />);
+
+    expect(await screen.findByText(/my agents/i)).toBeDefined();
+  });
+
+  test("shows error state on API failure", async () => {
+    server.use(getGetV2ListLibraryAgentsMockHandler422());
+
+    render(<LibraryPage />);
+
+    expect(await screen.findByText(/error/i)).toBeDefined();
+  });
+});
+```
+
+### Rules
+
+- Use `render()` from `@/tests/integrations/test-utils` (NOT from `@testing-library/react` directly)
+- Use `server.use()` to set up MSW handlers BEFORE rendering
+- Use `findBy*` (async) for elements that appear after data fetching — NOT `getBy*`
+- Use `getBy*` only for elements that are immediately present in the DOM
+- Use `screen` queries — do NOT destructure from `render()`
+- Use `waitFor` when asserting side effects or state changes after interactions
+- Import `fireEvent` or `userEvent` from the test-utils for interactions
+- Do NOT mock internal hooks or functions — mock at the API boundary via MSW
+- Do NOT use `act()` manually — `render` and `fireEvent` handle it
+- Keep tests focused: one behavior per test
+- Use descriptive test names that read like sentences
+
+### Test location
+
+```
+# For pages: __tests__/ next to page.tsx
+src/app/(platform)/library/__tests__/main.test.tsx
+
+# For complex standalone components: __tests__/ inside component folder
+src/app/(platform)/library/components/AgentCard/__tests__/AgentCard.test.tsx
+
+# For pure helpers: co-located .test.ts
+src/app/(platform)/library/helpers.test.ts
+```
+
+### Custom MSW overrides
+
+When the auto-generated faker data is not enough, override with specific data:
+
+```tsx
+import { http, HttpResponse } from "msw";
+
+server.use(
+  http.get("http://localhost:3000/api/proxy/api/v2/library/agents", () => {
+    return HttpResponse.json({
+      agents: [
+        { id: "1", name: "Test Agent", description: "A test agent" },
+      ],
+      pagination: { total_items: 1, total_pages: 1, page: 1, page_size: 10 },
+    });
+  }),
+);
+```
+
+Use the proxy URL pattern: `http://localhost:3000/api/proxy/api/v{version}/{path}` — this matches the MSW base URL configured in `orval.config.ts`.
+
+## Step 7: Run and verify
+
+After writing all tests:
+
+```bash
+cd autogpt_platform/frontend
+pnpm test:unit --reporter=verbose
+```
+
+If tests fail:
+1. Read the error output carefully
+2. Fix the test (not the source code, unless there is a genuine bug)
+3. Re-run until all pass
+
+Then run the full checks:
+
+```bash
+pnpm format
+pnpm lint
+pnpm types
+```
--- a/.github/workflows/platform-fullstack-ci.yml
+++ b/.github/workflows/platform-fullstack-ci.yml
@@ -179,21 +179,30 @@ jobs:
          pip install pyyaml

          # Resolve extends and generate a flat compose file that bake can understand
+          export NEXT_PUBLIC_SOURCEMAPS NEXT_PUBLIC_PW_TEST
          docker compose -f docker-compose.yml config > docker-compose.resolved.yml

+          # Ensure NEXT_PUBLIC_SOURCEMAPS is in resolved compose
+          # (docker compose config on some versions drops this arg)
+          if ! grep -q "NEXT_PUBLIC_SOURCEMAPS" docker-compose.resolved.yml; then
+            echo "Injecting NEXT_PUBLIC_SOURCEMAPS into resolved compose (docker compose config dropped it)"
+            sed -i '/NEXT_PUBLIC_PW_TEST/a\        NEXT_PUBLIC_SOURCEMAPS: "true"' docker-compose.resolved.yml
+          fi
+
          # Add cache configuration to the resolved compose file
          python ../.github/workflows/scripts/docker-ci-fix-compose-build-cache.py \
            --source docker-compose.resolved.yml \
            --cache-from "type=gha" \
            --cache-to "type=gha,mode=max" \
            --backend-hash "${{ hashFiles('autogpt_platform/backend/Dockerfile', 'autogpt_platform/backend/poetry.lock', 'autogpt_platform/backend/backend/**') }}" \
-            --frontend-hash "${{ hashFiles('autogpt_platform/frontend/Dockerfile', 'autogpt_platform/frontend/pnpm-lock.yaml', 'autogpt_platform/frontend/src/**') }}" \
+            --frontend-hash "${{ hashFiles('autogpt_platform/frontend/Dockerfile', 'autogpt_platform/frontend/pnpm-lock.yaml', 'autogpt_platform/frontend/src/**') }}-sourcemaps" \
            --git-ref "${{ github.ref }}"

          # Build with bake using the resolved compose file (now includes cache config)
          docker buildx bake --allow=fs.read=.. -f docker-compose.resolved.yml --load
        env:
          NEXT_PUBLIC_PW_TEST: true
+          NEXT_PUBLIC_SOURCEMAPS: true

      - name: Set up tests - Cache E2E test data
        id: e2e-data-cache
@@ -279,6 +288,11 @@ jobs:
          cache: "pnpm"
          cache-dependency-path: autogpt_platform/frontend/pnpm-lock.yaml

+      - name: Copy source maps from Docker for E2E coverage
+        run: |
+          FRONTEND_CONTAINER=$(docker compose -f ../docker-compose.resolved.yml ps -q frontend)
+          docker cp "$FRONTEND_CONTAINER":/app/.next/static .next-static-coverage
+
      - name: Set up tests - Install dependencies
        run: pnpm install --frozen-lockfile

@@ -289,6 +303,15 @@ jobs:
        run: pnpm test:no-build
        continue-on-error: false

+      - name: Upload E2E coverage to Codecov
+        if: ${{ !cancelled() }}
+        uses: codecov/codecov-action@v5
+        with:
+          token: ${{ secrets.CODECOV_TOKEN }}
+          flags: platform-frontend-e2e
+          files: ./autogpt_platform/frontend/coverage/e2e/cobertura-coverage.xml
+          disable_search: true
+
      - name: Upload Playwright report
        if: always()
        uses: actions/upload-artifact@v4
--- a/.gitleaks.toml
+++ b/.gitleaks.toml
@@ -0,0 +1,36 @@
+title = "AutoGPT Gitleaks Config"
+
+[extend]
+useDefault = true
+
+[allowlist]
+description = "Global allowlist"
+paths = [
+    # Template/example env files (no real secrets)
+    '''\.env\.(default|example|template)$''',
+    # Lock files
+    '''pnpm-lock\.yaml$''',
+    '''poetry\.lock$''',
+    # Secrets baseline
+    '''\.secrets\.baseline$''',
+    # Build artifacts and caches (should not be committed)
+    '''__pycache__/''',
+    '''classic/frontend/build/''',
+    # Docker dev setup (local dev JWTs/keys only)
+    '''autogpt_platform/db/docker/''',
+    # Load test configs (dev JWTs)
+    '''load-tests/configs/''',
+    # Test files with fake/fixture keys (_test.py, test_*.py, conftest.py)
+    '''(_test|test_.*|conftest)\.py$''',
+    # Documentation (only contains placeholder keys in curl/API examples)
+    '''docs/.*\.md$''',
+    # Firebase config (public API keys by design)
+    '''google-services\.json$''',
+    '''classic/frontend/(lib|web)/''',
+]
+# CI test-only encryption key (marked DO NOT USE IN PRODUCTION)
+regexes = [
+    '''dvziYgz0KSK8FENhju0ZYi8''',
+    # LLM model name enum values falsely flagged as API keys
+    '''Llama-\d.*Instruct''',
+]
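
To get a feel for which paths the `[allowlist]` patterns above would skip, they can be tried outside gitleaks. The sketch below uses Python's `re` on a subset of the patterns. Gitleaks uses Go's regexp engine, so treat the result as an approximation, and the sketch assumes gitleaks applies path patterns as unanchored searches. The helper name `is_allowlisted` and the sample paths are illustrative only.

```python
import re

# Subset of the path patterns copied from the [allowlist] table above.
ALLOWLIST_PATHS = [
    r"\.env\.(default|example|template)$",
    r"pnpm-lock\.yaml$",
    r"poetry\.lock$",
    r"(_test|test_.*|conftest)\.py$",
    r"docs/.*\.md$",
]
COMPILED = [re.compile(pattern) for pattern in ALLOWLIST_PATHS]


def is_allowlisted(path: str) -> bool:
    """Return True if any allowlist pattern matches somewhere in the path."""
    return any(rx.search(path) for rx in COMPILED)
```

Under that assumption, a plain `.env` file is still scanned while `.env.default` is skipped. Because the test-file pattern is not anchored at the start of the file name, a file such as `latest_results.py` would also be skipped; whether that matters depends on the repository.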
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -23,9 +23,15 @@ repos:
      - id: detect-secrets
        name: Detect secrets
        description: Detects high entropy strings that are likely to be passwords.
+        args: ["--baseline", ".secrets.baseline"]
        files: ^autogpt_platform/
-        exclude: pnpm-lock\.yaml$
-        stages: [pre-push]
+        exclude: (pnpm-lock\.yaml|\.env\.(default|example|template))$
+
+  - repo: https://github.com/gitleaks/gitleaks
+    rev: v8.24.3
+    hooks:
+      - id: gitleaks
+        name: Detect secrets (gitleaks)

  - repo: local
    # For proper type checking, all dependencies need to be up-to-date.
--- a/.secrets.baseline
+++ b/.secrets.baseline
@@ -0,0 +1,467 @@
+{
+  "version": "1.5.0",
+  "plugins_used": [
+    {
+      "name": "ArtifactoryDetector"
+    },
+    {
+      "name": "AWSKeyDetector"
+    },
+    {
+      "name": "AzureStorageKeyDetector"
+    },
+    {
+      "name": "Base64HighEntropyString",
+      "limit": 4.5
+    },
+    {
+      "name": "BasicAuthDetector"
+    },
+    {
+      "name": "CloudantDetector"
+    },
+    {
+      "name": "DiscordBotTokenDetector"
+    },
+    {
+      "name": "GitHubTokenDetector"
+    },
+    {
+      "name": "GitLabTokenDetector"
+    },
+    {
+      "name": "HexHighEntropyString",
+      "limit": 3.0
+    },
+    {
+      "name": "IbmCloudIamDetector"
+    },
+    {
+      "name": "IbmCosHmacDetector"
+    },
+    {
+      "name": "IPPublicDetector"
+    },
+    {
+      "name": "JwtTokenDetector"
+    },
+    {
+      "name": "KeywordDetector",
+      "keyword_exclude": ""
+    },
+    {
+      "name": "MailchimpDetector"
+    },
+    {
+      "name": "NpmDetector"
+    },
+    {
+      "name": "OpenAIDetector"
+    },
+    {
+      "name": "PrivateKeyDetector"
+    },
+    {
+      "name": "PypiTokenDetector"
+    },
+    {
+      "name": "SendGridDetector"
+    },
+    {
+      "name": "SlackDetector"
+    },
+    {
+      "name": "SoftlayerDetector"
+    },
+    {
+      "name": "SquareOAuthDetector"
+    },
+    {
+      "name": "StripeDetector"
+    },
+    {
+      "name": "TelegramBotTokenDetector"
+    },
+    {
+      "name": "TwilioKeyDetector"
+    }
+  ],
+  "filters_used": [
+    {
+      "path": "detect_secrets.filters.allowlist.is_line_allowlisted"
+    },
+    {
+      "path": "detect_secrets.filters.common.is_ignored_due_to_verification_policies",
+      "min_level": 2
+    },
+    {
+      "path": "detect_secrets.filters.heuristic.is_indirect_reference"
+    },
+    {
+      "path": "detect_secrets.filters.heuristic.is_likely_id_string"
+    },
+    {
+      "path": "detect_secrets.filters.heuristic.is_lock_file"
+    },
+    {
+      "path": "detect_secrets.filters.heuristic.is_not_alphanumeric_string"
+    },
+    {
+      "path": "detect_secrets.filters.heuristic.is_potential_uuid"
+    },
+    {
+      "path": "detect_secrets.filters.heuristic.is_prefixed_with_dollar_sign"
+    },
+    {
+      "path": "detect_secrets.filters.heuristic.is_sequential_string"
+    },
+    {
+      "path": "detect_secrets.filters.heuristic.is_swagger_file"
+    },
+    {
+      "path": "detect_secrets.filters.heuristic.is_templated_secret"
+    },
+    {
+      "path": "detect_secrets.filters.regex.should_exclude_file",
+      "pattern": [
+        "\\.env$",
+        "pnpm-lock\\.yaml$",
+        "\\.env\\.(default|example|template)$",
+        "__pycache__",
+        "_test\\.py$",
+        "test_.*\\.py$",
+        "conftest\\.py$",
+        "poetry\\.lock$",
+        "node_modules"
+      ]
+    }
+  ],
+  "results": {
+    "autogpt_platform/backend/backend/api/external/v1/integrations.py": [
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/backend/backend/api/external/v1/integrations.py",
+        "hashed_secret": "665b1e3851eefefa3fb878654292f16597d25155",
+        "is_verified": false,
+        "line_number": 289
+      }
+    ],
+    "autogpt_platform/backend/backend/blocks/airtable/_config.py": [
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/backend/backend/blocks/airtable/_config.py",
+        "hashed_secret": "57e168b03afb7c1ee3cdc4ee3db2fe1cc6e0df26",
+        "is_verified": false,
+        "line_number": 29
+      }
+    ],
+    "autogpt_platform/backend/backend/blocks/dataforseo/_config.py": [
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/backend/backend/blocks/dataforseo/_config.py",
+        "hashed_secret": "32ce93887331fa5d192f2876ea15ec000c7d58b8",
+        "is_verified": false,
+        "line_number": 12
+      }
+    ],
+    "autogpt_platform/backend/backend/blocks/github/checks.py": [
+      {
+        "type": "Hex High Entropy String",
+        "filename": "autogpt_platform/backend/backend/blocks/github/checks.py",
+        "hashed_secret": "8ac6f92737d8586790519c5d7bfb4d2eb172c238",
+        "is_verified": false,
+        "line_number": 108
+      }
+    ],
+    "autogpt_platform/backend/backend/blocks/github/ci.py": [
+      {
+        "type": "Hex High Entropy String",
+        "filename": "autogpt_platform/backend/backend/blocks/github/ci.py",
+        "hashed_secret": "90bd1b48e958257948487b90bee080ba5ed00caa",
+        "is_verified": false,
+        "line_number": 123
+      }
+    ],
+    "autogpt_platform/backend/backend/blocks/github/example_payloads/pull_request.synchronize.json": [
+      {
+        "type": "Hex High Entropy String",
+        "filename": "autogpt_platform/backend/backend/blocks/github/example_payloads/pull_request.synchronize.json",
+        "hashed_secret": "f96896dafced7387dcd22343b8ea29d3d2c65663",
+        "is_verified": false,
+        "line_number": 42
+      },
+      {
+        "type": "Hex High Entropy String",
+        "filename": "autogpt_platform/backend/backend/blocks/github/example_payloads/pull_request.synchronize.json",
+        "hashed_secret": "b80a94d5e70bedf4f5f89d2f5a5255cc9492d12e",
+        "is_verified": false,
+        "line_number": 193
+      },
+      {
+        "type": "Hex High Entropy String",
+        "filename": "autogpt_platform/backend/backend/blocks/github/example_payloads/pull_request.synchronize.json",
+        "hashed_secret": "75b17e517fe1b3136394f6bec80c4f892da75e42",
+        "is_verified": false,
+        "line_number": 344
+      },
+      {
+        "type": "Hex High Entropy String",
+        "filename": "autogpt_platform/backend/backend/blocks/github/example_payloads/pull_request.synchronize.json",
+        "hashed_secret": "b0bfb5e4e2394e7f8906e5ed1dffd88b2bc89dd5",
+        "is_verified": false,
+        "line_number": 534
+      }
+    ],
+    "autogpt_platform/backend/backend/blocks/github/statuses.py": [
+      {
+        "type": "Hex High Entropy String",
+        "filename": "autogpt_platform/backend/backend/blocks/github/statuses.py",
+        "hashed_secret": "8ac6f92737d8586790519c5d7bfb4d2eb172c238",
+        "is_verified": false,
+        "line_number": 85
+      }
+    ],
+    "autogpt_platform/backend/backend/blocks/google/docs.py": [
+      {
+        "type": "Hex High Entropy String",
+        "filename": "autogpt_platform/backend/backend/blocks/google/docs.py",
+        "hashed_secret": "c95da0c6696342c867ef0c8258d2f74d20fd94d4",
+        "is_verified": false,
+        "line_number": 203
+      }
+    ],
+    "autogpt_platform/backend/backend/blocks/google/sheets.py": [
+      {
+        "type": "Base64 High Entropy String",
+        "filename": "autogpt_platform/backend/backend/blocks/google/sheets.py",
+        "hashed_secret": "bd5a04fa3667e693edc13239b6d310c5c7a8564b",
+        "is_verified": false,
+        "line_number": 57
+      }
+    ],
+    "autogpt_platform/backend/backend/blocks/linear/_config.py": [
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/backend/backend/blocks/linear/_config.py",
+        "hashed_secret": "b37f020f42d6d613b6ce30103e4d408c4499b3bb",
+        "is_verified": false,
+        "line_number": 53
+      }
+    ],
+    "autogpt_platform/backend/backend/blocks/medium.py": [
+      {
+        "type": "Hex High Entropy String",
+        "filename": "autogpt_platform/backend/backend/blocks/medium.py",
+        "hashed_secret": "ff998abc1ce6d8f01a675fa197368e44c8916e9c",
+        "is_verified": false,
+        "line_number": 131
+      }
+    ],
+    "autogpt_platform/backend/backend/blocks/replicate/replicate_block.py": [
+      {
+        "type": "Hex High Entropy String",
+        "filename": "autogpt_platform/backend/backend/blocks/replicate/replicate_block.py",
+        "hashed_secret": "8bbdd6f26368f58ea4011d13d7f763cb662e66f0",
+        "is_verified": false,
+        "line_number": 55
+      }
+    ],
+    "autogpt_platform/backend/backend/blocks/slant3d/webhook.py": [
+      {
+        "type": "Hex High Entropy String",
+        "filename": "autogpt_platform/backend/backend/blocks/slant3d/webhook.py",
+        "hashed_secret": "36263c76947443b2f6e6b78153967ac4a7da99f9",
+        "is_verified": false,
+        "line_number": 100
+      }
+    ],
+    "autogpt_platform/backend/backend/blocks/talking_head.py": [
+      {
+        "type": "Base64 High Entropy String",
+        "filename": "autogpt_platform/backend/backend/blocks/talking_head.py",
+        "hashed_secret": "44ce2d66222529eea4a32932823466fc0601c799",
+        "is_verified": false,
+        "line_number": 113
+      }
+    ],
+    "autogpt_platform/backend/backend/blocks/wordpress/_config.py": [
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/backend/backend/blocks/wordpress/_config.py",
+        "hashed_secret": "e62679512436161b78e8a8d68c8829c2a1031ccb",
+        "is_verified": false,
+        "line_number": 17
+      }
+    ],
+    "autogpt_platform/backend/backend/util/cache.py": [
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/backend/backend/util/cache.py",
+        "hashed_secret": "37f0c918c3fa47ca4a70e42037f9f123fdfbc75b",
+        "is_verified": false,
+        "line_number": 449
+      }
+    ],
+    "autogpt_platform/frontend/src/app/(platform)/build/components/FlowEditor/nodes/helpers.ts": [
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/frontend/src/app/(platform)/build/components/FlowEditor/nodes/helpers.ts",
+        "hashed_secret": "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8",
+        "is_verified": false,
+        "line_number": 6
+      }
+    ],
+    "autogpt_platform/frontend/src/app/(platform)/dictionaries/en.json": [
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/frontend/src/app/(platform)/dictionaries/en.json",
+        "hashed_secret": "8be3c943b1609fffbfc51aad666d0a04adf83c9d",
+        "is_verified": false,
+        "line_number": 5
+      }
+    ],
+    "autogpt_platform/frontend/src/app/(platform)/dictionaries/es.json": [
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/frontend/src/app/(platform)/dictionaries/es.json",
+        "hashed_secret": "5a6d1c612954979ea99ee33dbb2d231b00f6ac0a",
+        "is_verified": false,
+        "line_number": 5
+      }
+    ],
+    "autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/AgentInputsReadOnly/helpers.ts": [
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/AgentInputsReadOnly/helpers.ts",
+        "hashed_secret": "cf678cab87dc1f7d1b95b964f15375e088461679",
+        "is_verified": false,
+        "line_number": 6
+      },
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/AgentInputsReadOnly/helpers.ts",
+        "hashed_secret": "f72cbb45464d487064610c5411c576ca4019d380",
+        "is_verified": false,
+        "line_number": 8
+      }
+    ],
+    "autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/RunAgentModal/components/ModalRunSection/helpers.ts": [
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/RunAgentModal/components/ModalRunSection/helpers.ts",
+        "hashed_secret": "cf678cab87dc1f7d1b95b964f15375e088461679",
+        "is_verified": false,
+        "line_number": 5
+      },
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/RunAgentModal/components/ModalRunSection/helpers.ts",
+        "hashed_secret": "f72cbb45464d487064610c5411c576ca4019d380",
+        "is_verified": false,
+        "line_number": 7
+      }
+    ],
+    "autogpt_platform/frontend/src/app/(platform)/profile/(user)/integrations/page.tsx": [
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/frontend/src/app/(platform)/profile/(user)/integrations/page.tsx",
+        "hashed_secret": "cf678cab87dc1f7d1b95b964f15375e088461679",
+        "is_verified": false,
+        "line_number": 192
+      },
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/frontend/src/app/(platform)/profile/(user)/integrations/page.tsx",
+        "hashed_secret": "86275db852204937bbdbdebe5fabe8536e030ab6",
+        "is_verified": false,
+        "line_number": 193
+      }
+    ],
+    "autogpt_platform/frontend/src/components/contextual/CredentialsInput/helpers.ts": [
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/frontend/src/components/contextual/CredentialsInput/helpers.ts",
+        "hashed_secret": "47acd2028cf81b5da88ddeedb2aea4eca4b71fbd",
+        "is_verified": false,
+        "line_number": 102
+      },
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/frontend/src/components/contextual/CredentialsInput/helpers.ts",
+        "hashed_secret": "8be3c943b1609fffbfc51aad666d0a04adf83c9d",
+        "is_verified": false,
+        "line_number": 103
+      }
+    ],
+    "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts": [
+      {
+        "type": "Base64 High Entropy String",
+        "filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
+        "hashed_secret": "9c486c92f1a7420e1045c7ad963fbb7ba3621025",
+        "is_verified": false,
+        "line_number": 73
+      },
+      {
+        "type": "Base64 High Entropy String",
+        "filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
+        "hashed_secret": "9277508c7a6effc8fb59163efbfada189e35425c",
+        "is_verified": false,
+        "line_number": 75
+      },
+      {
+        "type": "Base64 High Entropy String",
+        "filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
+        "hashed_secret": "8dc7e2cb1d0935897d541bf5facab389b8a50340",
+        "is_verified": false,
+        "line_number": 77
+      },
+      {
+        "type": "Base64 High Entropy String",
+        "filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
+        "hashed_secret": "79a26ad48775944299be6aaf9fb1d5302c1ed75b",
+        "is_verified": false,
+        "line_number": 79
+      },
+      {
+        "type": "Base64 High Entropy String",
+        "filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
+        "hashed_secret": "a3b62b44500a1612e48d4cab8294df81561b3b1a",
+        "is_verified": false,
+        "line_number": 81
+      },
+      {
+        "type": "Base64 High Entropy String",
+        "filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
+        "hashed_secret": "a58979bd0b21ef4f50417d001008e60dd7a85c64",
+        "is_verified": false,
+        "line_number": 83
+      },
+      {
+        "type": "Base64 High Entropy String",
+        "filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
+        "hashed_secret": "6cb6e075f8e8c7c850f9d128d6608e5dbe209a79",
+        "is_verified": false,
+        "line_number": 85
+      }
+    ],
+    "autogpt_platform/frontend/src/lib/constants.ts": [
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/frontend/src/lib/constants.ts",
+        "hashed_secret": "27b924db06a28cc755fb07c54f0fddc30659fe4d",
+        "is_verified": false,
+        "line_number": 10
+      }
+    ],
+    "autogpt_platform/frontend/src/tests/credentials/index.ts": [
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/frontend/src/tests/credentials/index.ts",
+        "hashed_secret": "c18006fc138809314751cd1991f1e0b820fabd37",
+        "is_verified": false,
+        "line_number": 4
+      }
+    ]
+  },
+  "generated_at": "2026-04-02T13:10:54Z"
+}
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -30,7 +30,7 @@ See `/frontend/CONTRIBUTING.md` for complete patterns. Quick reference:
   - Regenerate with `pnpm generate:api`
   - Pattern: `use{Method}{Version}{OperationName}`
 4. **Styling**: Tailwind CSS only, use design tokens, Phosphor Icons only
-5. **Testing**: Add Storybook stories for new components, Playwright for E2E
+5. **Testing**: Integration tests (Vitest + RTL + MSW) are the default (~90%, page-level). Playwright for E2E critical flows. Storybook for design system components. See `autogpt_platform/frontend/TESTING.md`
 6. **Code conventions**: Function declarations (not arrow functions) for components/handlers

 - Component props should be `interface Props { ... }` (not exported) unless the interface needs to be used outside the component
@@ -47,7 +47,9 @@ See `/frontend/CONTRIBUTING.md` for complete patterns. Quick reference:
 ## Testing

 - Backend: `poetry run test` (runs pytest with a docker based postgres + prisma).
-- Frontend: `pnpm test` or `pnpm test-ui` for Playwright tests. See `docs/content/platform/contributing/tests.md` for tips.
+- Frontend integration tests: `pnpm test:unit` (Vitest + RTL + MSW, primary testing approach).
+- Frontend E2E tests: `pnpm test` or `pnpm test-ui` for Playwright tests.
+- See `autogpt_platform/frontend/TESTING.md` for the full testing strategy.

 Always run the relevant linters and tests before committing.
 Use conventional commit messages for all commits (e.g. `feat(backend): add API`).
--- a/autogpt_platform/backend/backend/api/features/admin/platform_cost_routes.py
+++ b/autogpt_platform/backend/backend/api/features/admin/platform_cost_routes.py
@@ -0,0 +1,98 @@
+import logging
+from datetime import datetime
+
+from autogpt_libs.auth import get_user_id, requires_admin_user
+from cachetools import TTLCache
+from fastapi import APIRouter, Query, Security
+from pydantic import BaseModel
+
+from backend.data.platform_cost import (
+    CostLogRow,
+    PlatformCostDashboard,
+    get_platform_cost_dashboard,
+    get_platform_cost_logs,
+)
+from backend.util.models import Pagination
+
+logger = logging.getLogger(__name__)
+
+# Cache dashboard results for 30 seconds per unique filter combination.
+# The table is append-only so stale reads are acceptable for analytics.
+_DASHBOARD_CACHE_TTL = 30
+_dashboard_cache: TTLCache[tuple, PlatformCostDashboard] = TTLCache(
+    maxsize=256, ttl=_DASHBOARD_CACHE_TTL
+)
+
+
+router = APIRouter(
+    prefix="/platform-costs",
+    tags=["platform-cost", "admin"],
+    dependencies=[Security(requires_admin_user)],
+)
+
+
+class PlatformCostLogsResponse(BaseModel):
+    logs: list[CostLogRow]
+    pagination: Pagination
+
+
+@router.get(
+    "/dashboard",
+    response_model=PlatformCostDashboard,
+    summary="Get Platform Cost Dashboard",
+)
+async def get_cost_dashboard(
+    admin_user_id: str = Security(get_user_id),
+    start: datetime | None = Query(None),
+    end: datetime | None = Query(None),
+    provider: str | None = Query(None),
+    user_id: str | None = Query(None),
+):
+    logger.info("Admin %s fetching platform cost dashboard", admin_user_id)
+    cache_key = (start, end, provider, user_id)
+    cached = _dashboard_cache.get(cache_key)
+    if cached is not None:
+        return cached
+    result = await get_platform_cost_dashboard(
+        start=start,
+        end=end,
+        provider=provider,
+        user_id=user_id,
+    )
+    _dashboard_cache[cache_key] = result
+    return result
+
+
+@router.get(
+    "/logs",
+    response_model=PlatformCostLogsResponse,
+    summary="Get Platform Cost Logs",
+)
+async def get_cost_logs(
+    admin_user_id: str = Security(get_user_id),
+    start: datetime | None = Query(None),
+    end: datetime | None = Query(None),
+    provider: str | None = Query(None),
+    user_id: str | None = Query(None),
+    page: int = Query(1, ge=1),
+    page_size: int = Query(50, ge=1, le=200),
+):
+    logger.info("Admin %s fetching platform cost logs", admin_user_id)
+    logs, total = await get_platform_cost_logs(
+        start=start,
+        end=end,
+        provider=provider,
+        user_id=user_id,
+        page=page,
+        page_size=page_size,
+    )
+    total_pages = (total + page_size - 1) // page_size
+    return PlatformCostLogsResponse(
+        logs=logs,
+        pagination=Pagination(
+            total_items=total,
+            total_pages=total_pages,
+            current_page=page,
+            page_size=page_size,
+        ),
+    )
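Review note: the `total_pages` expression in `get_cost_logs` is integer ceiling division. The sketch below isolates that arithmetic so the edge cases are easy to see. The helper name `total_pages` and the sample values are illustrative assumptions, not part of this change.

```python
def total_pages(total_items: int, page_size: int) -> int:
    """Number of pages needed to hold total_items (ceiling division, no floats).

    Hypothetical helper mirroring the expression used in get_cost_logs.
    """
    if page_size < 1:
        # The route enforces this through Query(ge=1); repeated here so the
        # helper is safe to call on its own.
        raise ValueError("page_size must be >= 1")
    return (total_items + page_size - 1) // page_size


if __name__ == "__main__":
    # (total_items, page_size) pairs covering empty, exact-fit and overflow cases.
    for total, size in [(0, 50), (1, 50), (50, 50), (51, 50), (401, 200)]:
        print(f"total={total} page_size={size} -> pages={total_pages(total, size)}")
```

With zero matching rows the formula gives `total_pages == 0` while `current_page` stays at the requested value (1 by default), which a client rendering pagination controls may need to handle.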
--- a/autogpt_platform/backend/backend/api/features/admin/platform_cost_routes_test.py
+++ b/autogpt_platform/backend/backend/api/features/admin/platform_cost_routes_test.py
@@ -0,0 +1,192 @@
+from unittest.mock import AsyncMock
+
+import fastapi
+import fastapi.testclient
+import pytest
+import pytest_mock
+from autogpt_libs.auth.jwt_utils import get_jwt_payload
+
+from backend.data.platform_cost import PlatformCostDashboard
+
+from . import platform_cost_routes
+from .platform_cost_routes import router as platform_cost_router
+
+app = fastapi.FastAPI()
+app.include_router(platform_cost_router)
+
+client = fastapi.testclient.TestClient(app)
+
+
+@pytest.fixture(autouse=True)
+def setup_app_admin_auth(mock_jwt_admin):
+    """Set up admin auth overrides for all tests in this module."""
+    app.dependency_overrides[get_jwt_payload] = mock_jwt_admin["get_jwt_payload"]
+    # Clear TTL cache so each test starts cold.
+    platform_cost_routes._dashboard_cache.clear()
+    yield
+    app.dependency_overrides.clear()
+
+
+def test_get_dashboard_success(
+    mocker: pytest_mock.MockerFixture,
+) -> None:
+    real_dashboard = PlatformCostDashboard(
+        by_provider=[],
+        by_user=[],
+        total_cost_microdollars=0,
+        total_requests=0,
+        total_users=0,
+    )
+    mocker.patch(
+        "backend.api.features.admin.platform_cost_routes.get_platform_cost_dashboard",
+        AsyncMock(return_value=real_dashboard),
+    )
+
+    response = client.get("/platform-costs/dashboard")
+    assert response.status_code == 200
+    data = response.json()
+    assert "by_provider" in data
+    assert "by_user" in data
+    assert data["total_cost_microdollars"] == 0
+
+
+def test_get_logs_success(
+    mocker: pytest_mock.MockerFixture,
+) -> None:
+    mocker.patch(
+        "backend.api.features.admin.platform_cost_routes.get_platform_cost_logs",
+        AsyncMock(return_value=([], 0)),
+    )
+
+    response = client.get("/platform-costs/logs")
+    assert response.status_code == 200
+    data = response.json()
+    assert data["logs"] == []
+    assert data["pagination"]["total_items"] == 0
+
+
+def test_get_dashboard_with_filters(
+    mocker: pytest_mock.MockerFixture,
+) -> None:
+    real_dashboard = PlatformCostDashboard(
+        by_provider=[],
+        by_user=[],
+        total_cost_microdollars=0,
+        total_requests=0,
+        total_users=0,
+    )
+    mock_dashboard = AsyncMock(return_value=real_dashboard)
+    mocker.patch(
+        "backend.api.features.admin.platform_cost_routes.get_platform_cost_dashboard",
+        mock_dashboard,
+    )
+
+    response = client.get(
+        "/platform-costs/dashboard",
+        params={
+            "start": "2026-01-01T00:00:00",
+            "end": "2026-04-01T00:00:00",
+            "provider": "openai",
+            "user_id": "test-user-123",
+        },
+    )
+    assert response.status_code == 200
+    mock_dashboard.assert_called_once()
+    call_kwargs = mock_dashboard.call_args.kwargs
+    assert call_kwargs["provider"] == "openai"
+    assert call_kwargs["user_id"] == "test-user-123"
+    assert call_kwargs["start"] is not None
+    assert call_kwargs["end"] is not None
+
+
+def test_get_logs_with_pagination(
+    mocker: pytest_mock.MockerFixture,
+) -> None:
+    mocker.patch(
+        "backend.api.features.admin.platform_cost_routes.get_platform_cost_logs",
+        AsyncMock(return_value=([], 0)),
+    )
+
+    response = client.get(
+        "/platform-costs/logs",
+        params={"page": 2, "page_size": 25, "provider": "anthropic"},
+    )
+    assert response.status_code == 200
+    data = response.json()
+    assert data["pagination"]["current_page"] == 2
+    assert data["pagination"]["page_size"] == 25
+
+
+def test_get_dashboard_requires_admin() -> None:
+    import fastapi
+    from fastapi import HTTPException
+
+    def reject_jwt(request: fastapi.Request):
+        raise HTTPException(status_code=401, detail="Not authenticated")
+
+    app.dependency_overrides[get_jwt_payload] = reject_jwt
+    try:
+        response = client.get("/platform-costs/dashboard")
+        assert response.status_code == 401
+        response = client.get("/platform-costs/logs")
+        assert response.status_code == 401
+    finally:
+        app.dependency_overrides.clear()
+
+
+def test_get_dashboard_rejects_non_admin(mock_jwt_user, mock_jwt_admin) -> None:
+    """Non-admin JWT must be rejected with 403 by requires_admin_user."""
+    app.dependency_overrides[get_jwt_payload] = mock_jwt_user["get_jwt_payload"]
+    try:
+        response = client.get("/platform-costs/dashboard")
+        assert response.status_code == 403
+        response = client.get("/platform-costs/logs")
+        assert response.status_code == 403
+    finally:
+        app.dependency_overrides[get_jwt_payload] = mock_jwt_admin["get_jwt_payload"]
+
+
+def test_get_logs_invalid_page_size_too_large() -> None:
+    """page_size > 200 must be rejected with 422."""
+    response = client.get("/platform-costs/logs", params={"page_size": 201})
+    assert response.status_code == 422
+
+
+def test_get_logs_invalid_page_size_zero() -> None:
+    """page_size = 0 (below ge=1) must be rejected with 422."""
+    response = client.get("/platform-costs/logs", params={"page_size": 0})
+    assert response.status_code == 422
+
+
+def test_get_logs_invalid_page_negative() -> None:
+    """page < 1 must be rejected with 422."""
+    response = client.get("/platform-costs/logs", params={"page": 0})
+    assert response.status_code == 422
+
+
+def test_get_dashboard_invalid_date_format() -> None:
+    """Malformed start date must be rejected with 422."""
+    response = client.get("/platform-costs/dashboard", params={"start": "not-a-date"})
+    assert response.status_code == 422
+
+
+def test_get_dashboard_cache_hit(
+    mocker: pytest_mock.MockerFixture,
+) -> None:
+    """Second identical request returns cached result without calling the DB again."""
+    real_dashboard = PlatformCostDashboard(
+        by_provider=[],
+        by_user=[],
+        total_cost_microdollars=42,
+        total_requests=1,
+        total_users=1,
+    )
+    mock_fn = mocker.patch(
+        "backend.api.features.admin.platform_cost_routes.get_platform_cost_dashboard",
+        AsyncMock(return_value=real_dashboard),
+    )
+
+    client.get("/platform-costs/dashboard")
+    client.get("/platform-costs/dashboard")
+
+    mock_fn.assert_awaited_once()  # second request hit the cache
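Review note: `test_get_dashboard_cache_hit` covers the hit path but not expiry. The sketch below shows the same lookup-then-store pattern with an injectable clock, so both paths can be exercised without sleeping. It uses only the standard library; `TinyTTLCache`, `FakeClock`, `CountingLoader` and `cached_dashboard` are hypothetical stand-ins, and this is not how `cachetools.TTLCache` is implemented.

```python
from __future__ import annotations

import time
from typing import Any, Callable, Hashable


class TinyTTLCache:
    """Hypothetical, minimal time-based cache (a stand-in, not cachetools)."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._clock = clock
        self._entries: dict[Hashable, tuple[float, Any]] = {}

    def get(self, key: Hashable) -> Any | None:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._entries[key]
            return None
        return value

    def set(self, key: Hashable, value: Any) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)


class FakeClock:
    """Manually advanced clock so expiry can be tested deterministically."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


class CountingLoader:
    """Stands in for the database query and counts how often it runs."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, key: Hashable) -> dict[str, Any]:
        self.calls += 1
        return {"filters": key, "total_cost_microdollars": 42}


def cached_dashboard(cache: TinyTTLCache, loader: CountingLoader, key: Hashable) -> Any:
    """Same shape as the route: return the cached value, else load and store."""
    cached = cache.get(key)
    if cached is not None:
        return cached
    result = loader(key)
    cache.set(key, result)
    return result


if __name__ == "__main__":
    clock = FakeClock()
    cache = TinyTTLCache(ttl=30, clock=clock)
    loader = CountingLoader()
    key = (None, None, "openai", None)  # (start, end, provider, user_id)

    cached_dashboard(cache, loader, key)
    cached_dashboard(cache, loader, key)
    print("loads after two calls:", loader.calls)

    clock.now = 31.0  # move past the 30 second TTL
    cached_dashboard(cache, loader, key)
    print("loads after expiry:", loader.calls)
```

Because the cache key is the full filter tuple, each distinct combination of `start`, `end`, `provider` and `user_id` gets its own entry; an expiry test for the real route would need a way to control the clock the cache uses.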
--- a/autogpt_platform/backend/backend/api/features/admin/rate_limit_admin_routes.py
+++ b/autogpt_platform/backend/backend/api/features/admin/rate_limit_admin_routes.py
@@ -9,11 +9,14 @@ from pydantic import BaseModel

 from backend.copilot.config import ChatConfig
 from backend.copilot.rate_limit import (
+    SubscriptionTier,
    get_global_rate_limits,
    get_usage_status,
+    get_user_tier,
    reset_user_usage,
+    set_user_tier,
 )
-from backend.data.user import get_user_by_email, get_user_email_by_id
+from backend.data.user import get_user_by_email, get_user_email_by_id, search_users

 logger = logging.getLogger(__name__)

@@ -33,6 +36,17 @@ class UserRateLimitResponse(BaseModel):
    weekly_token_limit: int
    daily_tokens_used: int
    weekly_tokens_used: int
+    tier: SubscriptionTier
+
+
+class UserTierResponse(BaseModel):
+    user_id: str
+    tier: SubscriptionTier
+
+
+class SetUserTierRequest(BaseModel):
+    user_id: str
+    tier: SubscriptionTier


 async def _resolve_user_id(
@@ -86,10 +100,10 @@ async def get_user_rate_limit(

    logger.info("Admin %s checking rate limit for user %s", admin_user_id, resolved_id)

-    daily_limit, weekly_limit = await get_global_rate_limits(
+    daily_limit, weekly_limit, tier = await get_global_rate_limits(
        resolved_id, config.daily_token_limit, config.weekly_token_limit
    )
-    usage = await get_usage_status(resolved_id, daily_limit, weekly_limit)
+    usage = await get_usage_status(resolved_id, daily_limit, weekly_limit, tier=tier)

    return UserRateLimitResponse(
        user_id=resolved_id,
@@ -98,6 +112,7 @@ async def get_user_rate_limit(
        weekly_token_limit=weekly_limit,
        daily_tokens_used=usage.daily.used,
        weekly_tokens_used=usage.weekly.used,
+        tier=tier,
    )


@@ -125,10 +140,10 @@ async def reset_user_rate_limit(
        logger.exception("Failed to reset user usage")
        raise HTTPException(status_code=500, detail="Failed to reset usage") from e

-    daily_limit, weekly_limit = await get_global_rate_limits(
+    daily_limit, weekly_limit, tier = await get_global_rate_limits(
        user_id, config.daily_token_limit, config.weekly_token_limit
    )
-    usage = await get_usage_status(user_id, daily_limit, weekly_limit)
+    usage = await get_usage_status(user_id, daily_limit, weekly_limit, tier=tier)

    try:
        resolved_email = await get_user_email_by_id(user_id)
@@ -143,4 +158,102 @@ async def reset_user_rate_limit(
        weekly_token_limit=weekly_limit,
        daily_tokens_used=usage.daily.used,
        weekly_tokens_used=usage.weekly.used,
+        tier=tier,
    )
+
+
+@router.get(
+    "/rate_limit/tier",
+    response_model=UserTierResponse,
+    summary="Get User Rate Limit Tier",
+)
+async def get_user_rate_limit_tier(
+    user_id: str,
+    admin_user_id: str = Security(get_user_id),
+) -> UserTierResponse:
+    """Get a user's current rate-limit tier. Admin-only.
+
+    Returns 404 if the user does not exist in the database.
+    """
+    logger.info("Admin %s checking tier for user %s", admin_user_id, user_id)
+
+    resolved_email = await get_user_email_by_id(user_id)
+    if resolved_email is None:
+        raise HTTPException(status_code=404, detail=f"User {user_id} not found")
+
+    tier = await get_user_tier(user_id)
+    return UserTierResponse(user_id=user_id, tier=tier)
+
+
+@router.post(
+    "/rate_limit/tier",
+    response_model=UserTierResponse,
+    summary="Set User Rate Limit Tier",
+)
+async def set_user_rate_limit_tier(
+    request: SetUserTierRequest,
+    admin_user_id: str = Security(get_user_id),
+) -> UserTierResponse:
+    """Set a user's rate-limit tier. Admin-only.
+
+    Returns 404 if the user does not exist or the user lookup fails.
+    """
+    try:
+        resolved_email = await get_user_email_by_id(request.user_id)
+    except Exception:
+        logger.warning(
+            "Failed to resolve email for user %s",
+            request.user_id,
+            exc_info=True,
+        )
+        resolved_email = None
+
+    if resolved_email is None:
+        raise HTTPException(status_code=404, detail=f"User {request.user_id} not found")
+
+    old_tier = await get_user_tier(request.user_id)
+    logger.info(
+        "Admin %s changing tier for user %s (%s): %s -> %s",
+        admin_user_id,
+        request.user_id,
+        resolved_email,
+        old_tier.value,
+        request.tier.value,
+    )
+    try:
+        await set_user_tier(request.user_id, request.tier)
+    except Exception as e:
+        logger.exception("Failed to set user tier")
+        raise HTTPException(status_code=500, detail="Failed to set tier") from e
+
+    return UserTierResponse(user_id=request.user_id, tier=request.tier)
+
+
+class UserSearchResult(BaseModel):
+    user_id: str
+    user_email: Optional[str] = None
+
+
+@router.get(
+    "/rate_limit/search_users",
+    response_model=list[UserSearchResult],
+    summary="Search Users by Name or Email",
+)
+async def admin_search_users(
+    query: str,
+    limit: int = 20,
+    admin_user_id: str = Security(get_user_id),
+) -> list[UserSearchResult]:
+    """Search users by partial email or name. Admin-only.
+
+    Queries the User table directly — returns results even for users
+    without credit transaction history.
+    """
+    if len(query.strip()) < 3:
+        raise HTTPException(
+            status_code=400,
+            detail="Search query must be at least 3 characters.",
+        )
+    logger.info("Admin %s searching users with query=%r", admin_user_id, query)
+    results = await search_users(query, limit=max(1, min(limit, 50)))
+    return [UserSearchResult(user_id=uid, user_email=email) for uid, email in results]
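Review note: `admin_search_users` guards its inputs in two ways: it rejects queries shorter than three characters after stripping whitespace, and it clamps `limit` into the range 1 to 50 instead of rejecting out-of-range values. The sketch below reproduces just that logic as a pure function. `normalize_search_args` and the two constants are hypothetical names for illustration, and `ValueError` stands in for the `HTTPException(status_code=400)` raised by the route.

```python
MIN_QUERY_LENGTH = 3  # assumption: named constant for the literal 3 in the route
MAX_LIMIT = 50        # assumption: named constant for the literal 50 in the route


def normalize_search_args(query: str, limit: int = 20) -> tuple[str, int]:
    """Validate the query and clamp the limit, mirroring admin_search_users.

    The length check runs on the stripped query, but the original query is
    returned unchanged, as the route passes `query` through as-is.
    """
    if len(query.strip()) < MIN_QUERY_LENGTH:
        raise ValueError("Search query must be at least 3 characters.")
    clamped_limit = max(1, min(limit, MAX_LIMIT))
    return query, clamped_limit


if __name__ == "__main__":
    print(normalize_search_args("alice@example.com"))  # default limit
    print(normalize_search_args("alice", limit=500))   # clamped down to MAX_LIMIT
    print(normalize_search_args("alice", limit=0))     # clamped up to 1
    try:
        normalize_search_args("  ab  ")
    except ValueError as exc:
        print("rejected:", exc)
```

Clamping means an oversized `limit` silently succeeds, whereas the `/platform-costs/logs` route in this same change rejects an out-of-range `page_size` with 422 via `Query(ge=1, le=200)`; whether the two endpoints should behave the same way is a design choice for the authors.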
--- a/autogpt_platform/backend/backend/api/features/admin/rate_limit_admin_routes_test.py
+++ b/autogpt_platform/backend/backend/api/features/admin/rate_limit_admin_routes_test.py
@@ -9,7 +9,7 @@ import pytest_mock
 from autogpt_libs.auth.jwt_utils import get_jwt_payload
 from pytest_snapshot.plugin import Snapshot

-from backend.copilot.rate_limit import CoPilotUsageStatus, UsageWindow
+from backend.copilot.rate_limit import CoPilotUsageStatus, SubscriptionTier, UsageWindow

 from .rate_limit_admin_routes import router as rate_limit_admin_router

@@ -57,7 +57,7 @@ def _patch_rate_limit_deps(
    mocker.patch(
        f"{_MOCK_MODULE}.get_global_rate_limits",
        new_callable=AsyncMock,
-        return_value=(2_500_000, 12_500_000),
+        return_value=(2_500_000, 12_500_000, SubscriptionTier.FREE),
    )
    mocker.patch(
        f"{_MOCK_MODULE}.get_usage_status",
@@ -89,6 +89,7 @@ def test_get_rate_limit(
    assert data["weekly_token_limit"] == 12_500_000
    assert data["daily_tokens_used"] == 500_000
    assert data["weekly_tokens_used"] == 3_000_000
+    assert data["tier"] == "FREE"

    configured_snapshot.assert_match(
        json.dumps(data, indent=2, sort_keys=True) + "\n",
@@ -162,6 +163,7 @@ def test_reset_user_usage_daily_only(
    assert data["daily_tokens_used"] == 0
    # Weekly is untouched
    assert data["weekly_tokens_used"] == 3_000_000
+    assert data["tier"] == "FREE"

    mock_reset.assert_awaited_once_with(target_user_id, reset_weekly=False)

@@ -192,6 +194,7 @@ def test_reset_user_usage_daily_and_weekly(
    data = response.json()
    assert data["daily_tokens_used"] == 0
    assert data["weekly_tokens_used"] == 0
+    assert data["tier"] == "FREE"

    mock_reset.assert_awaited_once_with(target_user_id, reset_weekly=True)

@@ -228,7 +231,7 @@ def test_get_rate_limit_email_lookup_failure(
    mocker.patch(
        f"{_MOCK_MODULE}.get_global_rate_limits",
        new_callable=AsyncMock,
-        return_value=(2_500_000, 12_500_000),
+        return_value=(2_500_000, 12_500_000, SubscriptionTier.FREE),
    )
    mocker.patch(
        f"{_MOCK_MODULE}.get_usage_status",
@@ -261,3 +264,303 @@ def test_admin_endpoints_require_admin_role(mock_jwt_user) -> None:
        json={"user_id": "test"},
    )
    assert response.status_code == 403
+
+
+# ---------------------------------------------------------------------------
+# Tier management endpoints
+# ---------------------------------------------------------------------------
+
+
+def test_get_user_tier(
+    mocker: pytest_mock.MockerFixture,
+    target_user_id: str,
+) -> None:
+    """Test getting a user's rate-limit tier."""
+    mocker.patch(
+        f"{_MOCK_MODULE}.get_user_email_by_id",
+        new_callable=AsyncMock,
+        return_value=_TARGET_EMAIL,
+    )
+    mocker.patch(
+        f"{_MOCK_MODULE}.get_user_tier",
+        new_callable=AsyncMock,
+        return_value=SubscriptionTier.PRO,
+    )
+
+    response = client.get("/admin/rate_limit/tier", params={"user_id": target_user_id})
+
+    assert response.status_code == 200
+    data = response.json()
+    assert data["user_id"] == target_user_id
+    assert data["tier"] == "PRO"
+
+
+def test_get_user_tier_user_not_found(
+    mocker: pytest_mock.MockerFixture,
+    target_user_id: str,
+) -> None:
+    """Test that getting tier for a non-existent user returns 404."""
+    mocker.patch(
+        f"{_MOCK_MODULE}.get_user_email_by_id",
+        new_callable=AsyncMock,
+        return_value=None,
+    )
+
+    response = client.get("/admin/rate_limit/tier", params={"user_id": target_user_id})
+
+    assert response.status_code == 404
+
+
+def test_set_user_tier(
+    mocker: pytest_mock.MockerFixture,
+    target_user_id: str,
+) -> None:
+    """Test setting a user's rate-limit tier (upgrade)."""
+    mocker.patch(
+        f"{_MOCK_MODULE}.get_user_email_by_id",
+        new_callable=AsyncMock,
+        return_value=_TARGET_EMAIL,
+    )
+    mocker.patch(
+        f"{_MOCK_MODULE}.get_user_tier",
+        new_callable=AsyncMock,
+        return_value=SubscriptionTier.FREE,
+    )
+    mock_set = mocker.patch(
+        f"{_MOCK_MODULE}.set_user_tier",
+        new_callable=AsyncMock,
+    )
+
+    response = client.post(
+        "/admin/rate_limit/tier",
+        json={"user_id": target_user_id, "tier": "ENTERPRISE"},
+    )
+
+    assert response.status_code == 200
+    data = response.json()
+    assert data["user_id"] == target_user_id
+    assert data["tier"] == "ENTERPRISE"
+    mock_set.assert_awaited_once_with(target_user_id, SubscriptionTier.ENTERPRISE)
+
+
+def test_set_user_tier_downgrade(
+    mocker: pytest_mock.MockerFixture,
+    target_user_id: str,
+) -> None:
+    """Test downgrading a user's tier from PRO to FREE."""
+    mocker.patch(
+        f"{_MOCK_MODULE}.get_user_email_by_id",
+        new_callable=AsyncMock,
+        return_value=_TARGET_EMAIL,
+    )
+    mocker.patch(
+        f"{_MOCK_MODULE}.get_user_tier",
+        new_callable=AsyncMock,
+        return_value=SubscriptionTier.PRO,
+    )
+    mock_set = mocker.patch(
+        f"{_MOCK_MODULE}.set_user_tier",
+        new_callable=AsyncMock,
+    )
+
+    response = client.post(
+        "/admin/rate_limit/tier",
+        json={"user_id": target_user_id, "tier": "FREE"},
+    )
+
+    assert response.status_code == 200
+    data = response.json()
+    assert data["user_id"] == target_user_id
+    assert data["tier"] == "FREE"
+    mock_set.assert_awaited_once_with(target_user_id, SubscriptionTier.FREE)
+
+
+def test_set_user_tier_invalid_tier(
+    target_user_id: str,
+) -> None:
+    """Test that setting an invalid tier returns 422."""
+    response = client.post(
+        "/admin/rate_limit/tier",
+        json={"user_id": target_user_id, "tier": "invalid"},
+    )
+
+    assert response.status_code == 422
+
+
+def test_set_user_tier_invalid_tier_uppercase(
+    target_user_id: str,
+) -> None:
+    """Test that setting an unrecognised uppercase tier (e.g. 'INVALID') returns 422.
+
+    Regression: ensures Pydantic enum validation rejects values that are not
+    members of SubscriptionTier, even when they look like valid enum names.
+    """
+    response = client.post(
+        "/admin/rate_limit/tier",
+        json={"user_id": target_user_id, "tier": "INVALID"},
+    )
+
+    assert response.status_code == 422
+    body = response.json()
+    assert "detail" in body
+
+
+def test_set_user_tier_email_lookup_failure_returns_404(
+    mocker: pytest_mock.MockerFixture,
+    target_user_id: str,
+) -> None:
+    """Test that email lookup failure returns 404 (user unverifiable)."""
+    mocker.patch(
+        f"{_MOCK_MODULE}.get_user_email_by_id",
+        new_callable=AsyncMock,
+        side_effect=Exception("DB connection failed"),
+    )
+
+    response = client.post(
+        "/admin/rate_limit/tier",
+        json={"user_id": target_user_id, "tier": "PRO"},
+    )
+
+    assert response.status_code == 404
+
+
+def test_set_user_tier_user_not_found(
+    mocker: pytest_mock.MockerFixture,
+    target_user_id: str,
+) -> None:
+    """Test that setting tier for a non-existent user returns 404."""
+    mocker.patch(
+        f"{_MOCK_MODULE}.get_user_email_by_id",
+        new_callable=AsyncMock,
+        return_value=None,
+    )
+
+    response = client.post(
+        "/admin/rate_limit/tier",
+        json={"user_id": target_user_id, "tier": "PRO"},
+    )
+
+    assert response.status_code == 404
+
+
+def test_set_user_tier_db_failure(
+    mocker: pytest_mock.MockerFixture,
+    target_user_id: str,
+) -> None:
+    """Test that DB failure on set tier returns 500."""
+    mocker.patch(
+        f"{_MOCK_MODULE}.get_user_email_by_id",
+        new_callable=AsyncMock,
+        return_value=_TARGET_EMAIL,
+    )
+    mocker.patch(
+        f"{_MOCK_MODULE}.get_user_tier",
+        new_callable=AsyncMock,
+        return_value=SubscriptionTier.FREE,
+    )
+    mocker.patch(
+        f"{_MOCK_MODULE}.set_user_tier",
+        new_callable=AsyncMock,
+        side_effect=Exception("DB connection refused"),
+    )
+
+    response = client.post(
+        "/admin/rate_limit/tier",
+        json={"user_id": target_user_id, "tier": "PRO"},
+    )
+
+    assert response.status_code == 500
+
+
+def test_tier_endpoints_require_admin_role(mock_jwt_user) -> None:
+    """Test that tier admin endpoints require admin role."""
+    app.dependency_overrides[get_jwt_payload] = mock_jwt_user["get_jwt_payload"]
+
+    response = client.get("/admin/rate_limit/tier", params={"user_id": "test"})
+    assert response.status_code == 403
+
+    response = client.post(
+        "/admin/rate_limit/tier",
+        json={"user_id": "test", "tier": "PRO"},
+    )
+    assert response.status_code == 403
+
+
+# ─── search_users endpoint ──────────────────────────────────────────
+
+
+def test_search_users_returns_matching_users(
+    mocker: pytest_mock.MockerFixture,
+    admin_user_id: str,
+) -> None:
+    """Partial search should return all matching users from the User table."""
+    mocker.patch(
+        _MOCK_MODULE + ".search_users",
+        new_callable=AsyncMock,
+        return_value=[
+            ("user-1", "zamil.majdy@gmail.com"),
+            ("user-2", "zamil.majdy@agpt.co"),
+        ],
+    )
+
+    response = client.get("/admin/rate_limit/search_users", params={"query": "zamil"})
+
+    assert response.status_code == 200
+    results = response.json()
+    assert len(results) == 2
+    assert results[0]["user_email"] == "zamil.majdy@gmail.com"
+    assert results[1]["user_email"] == "zamil.majdy@agpt.co"
+
+
+def test_search_users_empty_results(
+    mocker: pytest_mock.MockerFixture,
+    admin_user_id: str,
+) -> None:
+    """Search with no matches returns empty list."""
+    mocker.patch(
+        _MOCK_MODULE + ".search_users",
+        new_callable=AsyncMock,
+        return_value=[],
+    )
+
+    response = client.get(
+        "/admin/rate_limit/search_users", params={"query": "nonexistent"}
+    )
+
+    assert response.status_code == 200
+    assert response.json() == []
+
+
+def test_search_users_short_query_rejected(
+    admin_user_id: str,
+) -> None:
+    """Query shorter than 3 characters should return 400."""
+    response = client.get("/admin/rate_limit/search_users", params={"query": "ab"})
+    assert response.status_code == 400
+
+
+def test_search_users_negative_limit_clamped(
+    mocker: pytest_mock.MockerFixture,
+    admin_user_id: str,
+) -> None:
+    """Negative limit should be clamped to 1, not passed through."""
+    mock_search = mocker.patch(
+        _MOCK_MODULE + ".search_users",
+        new_callable=AsyncMock,
+        return_value=[],
+    )
+
+    response = client.get(
+        "/admin/rate_limit/search_users", params={"query": "test", "limit": -1}
+    )
+
+    assert response.status_code == 200
+    mock_search.assert_awaited_once_with("test", limit=1)
+
+
+def test_search_users_requires_admin_role(mock_jwt_user) -> None:
+    """Test that the search_users endpoint requires admin role."""
+    app.dependency_overrides[get_jwt_payload] = mock_jwt_user["get_jwt_payload"]
+
+    response = client.get("/admin/rate_limit/search_users", params={"query": "test"})
+    assert response.status_code == 403
--- a/autogpt_platform/backend/backend/api/features/chat/routes.py
+++ b/autogpt_platform/backend/backend/api/features/chat/routes.py
@@ -15,7 +15,8 @@ from pydantic import BaseModel, ConfigDict, Field, field_validator

 from backend.copilot import service as chat_service
 from backend.copilot import stream_registry
-from backend.copilot.config import ChatConfig
+from backend.copilot.config import ChatConfig, CopilotMode
+from backend.copilot.db import get_chat_messages_paginated
 from backend.copilot.executor.utils import enqueue_cancel_task, enqueue_copilot_turn
 from backend.copilot.model import (
    ChatMessage,
@@ -111,6 +112,11 @@ class StreamChatRequest(BaseModel):
    file_ids: list[str] | None = Field(
        default=None, max_length=20
    )  # Workspace file IDs attached to this message
+    mode: CopilotMode | None = Field(
+        default=None,
+        description="Autopilot mode: 'fast' for baseline LLM, 'extended_thinking' for Claude Agent SDK. "
+        "If None, uses the server default (extended_thinking).",
+    )


 class CreateSessionRequest(BaseModel):
@@ -150,6 +156,8 @@ class SessionDetailResponse(BaseModel):
    user_id: str | None
    messages: list[dict]
    active_stream: ActiveStreamInfo | None = None  # Present if stream is still active
+    has_more_messages: bool = False
+    oldest_sequence: int | None = None
    total_prompt_tokens: int = 0
    total_completion_tokens: int = 0
    metadata: ChatSessionMetadata = ChatSessionMetadata()
@@ -389,60 +397,78 @@ async def update_session_title_route(
 async def get_session(
    session_id: str,
    user_id: Annotated[str, Security(auth.get_user_id)],
+    limit: int = Query(default=50, ge=1, le=200),
+    before_sequence: int | None = Query(default=None, ge=0),
 ) -> SessionDetailResponse:
    """
    Retrieve the details of a specific chat session.

-    Looks up a chat session by ID for the given user (if authenticated) and returns all session data including messages.
-    If there's an active stream for this session, returns active_stream info for reconnection.
+    Supports cursor-based pagination via ``limit`` and ``before_sequence``.
+    When no pagination params are provided, returns the most recent messages.

    Args:
        session_id: The unique identifier for the desired chat session.
-        user_id: The optional authenticated user ID, or None for anonymous access.
+        user_id: The authenticated user's ID.
+        limit: Maximum number of messages to return (1-200, default 50).
+        before_sequence: Return messages with sequence < this value (cursor).

    Returns:
-        SessionDetailResponse: Details for the requested session, including active_stream info if applicable.
-
+        SessionDetailResponse: Details for the requested session, including
+            active_stream info and pagination metadata.
    """
-    session = await get_chat_session(session_id, user_id)
-    if not session:
+    page = await get_chat_messages_paginated(
+        session_id, limit, before_sequence, user_id=user_id
+    )
+    if page is None:
        raise NotFoundError(f"Session {session_id} not found.")
+    messages = [message.model_dump() for message in page.messages]

-    messages = [message.model_dump() for message in session.messages]
-
-    # Check if there's an active stream for this session
+    # Only check active stream on initial load (not on "load more" requests)
    active_stream_info = None
-    active_session, last_message_id = await stream_registry.get_active_session(
-        session_id, user_id
-    )
-    logger.info(
-        f"[GET_SESSION] session={session_id}, active_session={active_session is not None}, "
-        f"msg_count={len(messages)}, last_role={messages[-1].get('role') if messages else 'none'}"
-    )
-    if active_session:
-        # Keep the assistant message (including tool_calls) so the frontend can
-        # render the correct tool UI (e.g. CreateAgent with mini game).
-        # convertChatSessionToUiMessages handles isComplete=false by setting
-        # tool parts without output to state "input-available".
-        active_stream_info = ActiveStreamInfo(
-            turn_id=active_session.turn_id,
-            last_message_id=last_message_id,
+    if before_sequence is None:
+        active_session, last_message_id = await stream_registry.get_active_session(
+            session_id, user_id
+        )
+        logger.info(
+            f"[GET_SESSION] session={session_id}, active_session={active_session is not None}, "
+            f"msg_count={len(messages)}, last_role={messages[-1].get('role') if messages else 'none'}"
+        )
+        if active_session:
+            active_stream_info = ActiveStreamInfo(
+                turn_id=active_session.turn_id,
+                last_message_id=last_message_id,
+            )
+
+    # Skip session metadata on "load more" — frontend only needs messages
+    if before_sequence is not None:
+        return SessionDetailResponse(
+            id=page.session.session_id,
+            created_at=page.session.started_at.isoformat(),
+            updated_at=page.session.updated_at.isoformat(),
+            user_id=page.session.user_id or None,
+            messages=messages,
+            active_stream=None,
+            has_more_messages=page.has_more,
+            oldest_sequence=page.oldest_sequence,
+            total_prompt_tokens=0,
+            total_completion_tokens=0,
        )

-    # Sum token usage from session
-    total_prompt = sum(u.prompt_tokens for u in session.usage)
-    total_completion = sum(u.completion_tokens for u in session.usage)
+    total_prompt = sum(u.prompt_tokens for u in page.session.usage)
+    total_completion = sum(u.completion_tokens for u in page.session.usage)

    return SessionDetailResponse(
-        id=session.session_id,
-        created_at=session.started_at.isoformat(),
-        updated_at=session.updated_at.isoformat(),
-        user_id=session.user_id or None,
+        id=page.session.session_id,
+        created_at=page.session.started_at.isoformat(),
+        updated_at=page.session.updated_at.isoformat(),
+        user_id=page.session.user_id or None,
        messages=messages,
        active_stream=active_stream_info,
+        has_more_messages=page.has_more,
+        oldest_sequence=page.oldest_sequence,
        total_prompt_tokens=total_prompt,
        total_completion_tokens=total_completion,
-        metadata=session.metadata,
+        metadata=page.session.metadata,
    )


@@ -456,8 +482,9 @@ async def get_copilot_usage(

    Returns current token usage vs limits for daily and weekly windows.
    Global defaults sourced from LaunchDarkly (falling back to config).
+    Includes the user's rate-limit tier.
    """
-    daily_limit, weekly_limit = await get_global_rate_limits(
+    daily_limit, weekly_limit, tier = await get_global_rate_limits(
        user_id, config.daily_token_limit, config.weekly_token_limit
    )
    return await get_usage_status(
@@ -465,6 +492,7 @@ async def get_copilot_usage(
        daily_token_limit=daily_limit,
        weekly_token_limit=weekly_limit,
        rate_limit_reset_cost=config.rate_limit_reset_cost,
+        tier=tier,
    )


@@ -516,7 +544,7 @@ async def reset_copilot_usage(
            detail="Rate limit reset is not available (credit system is disabled).",
        )

-    daily_limit, weekly_limit = await get_global_rate_limits(
+    daily_limit, weekly_limit, tier = await get_global_rate_limits(
        user_id, config.daily_token_limit, config.weekly_token_limit
    )

@@ -556,6 +584,7 @@ async def reset_copilot_usage(
            user_id=user_id,
            daily_token_limit=daily_limit,
            weekly_token_limit=weekly_limit,
+            tier=tier,
        )
        if daily_limit > 0 and usage_status.daily.used < daily_limit:
            raise HTTPException(
@@ -631,6 +660,7 @@ async def reset_copilot_usage(
        daily_token_limit=daily_limit,
        weekly_token_limit=weekly_limit,
        rate_limit_reset_cost=config.rate_limit_reset_cost,
+        tier=tier,
    )

    return RateLimitResetResponse(
@@ -741,7 +771,7 @@ async def stream_chat_post(
    # Global defaults sourced from LaunchDarkly, falling back to config.
    if user_id:
        try:
-            daily_limit, weekly_limit = await get_global_rate_limits(
+            daily_limit, weekly_limit, _ = await get_global_rate_limits(
                user_id, config.daily_token_limit, config.weekly_token_limit
            )
            await check_rate_limit(
@@ -836,6 +866,7 @@ async def stream_chat_post(
        is_user_message=request.is_user_message,
        context=request.context,
        file_ids=sanitized_file_ids,
+        mode=request.mode,
    )

    setup_time = (time.perf_counter() - stream_start_time) * 1000
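
Reviewer note: the `get_session` change above pages backwards through a session using `limit` and `before_sequence`, and reports `has_more_messages` and `oldest_sequence` so the client can request the next (older) page. The body of `get_chat_messages_paginated` is not part of this diff, so the following is only a minimal in-memory sketch of that cursor contract. `Page`, `paginate`, and `load_all` are hypothetical names, and messages are assumed to be dicts carrying an integer `sequence` key.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    messages: list
    has_more: bool
    oldest_sequence: Optional[int]


def paginate(messages: list, limit: int = 50, before_sequence: Optional[int] = None) -> Page:
    """Return up to `limit` of the newest messages with sequence < before_sequence.

    Assumes limit >= 1, mirroring the Query(ge=1) constraint on the route.
    """
    ordered = sorted(messages, key=lambda m: m["sequence"])
    if before_sequence is not None:
        # The cursor is exclusive: only strictly older messages qualify.
        ordered = [m for m in ordered if m["sequence"] < before_sequence]
    window = ordered[-limit:]
    return Page(
        messages=window,
        has_more=len(ordered) > len(window),
        oldest_sequence=window[0]["sequence"] if window else None,
    )


def load_all(messages: list, limit: int) -> list:
    """Walk backwards page by page, the way a "load more" client might."""
    collected: list = []
    cursor: Optional[int] = None
    while True:
        page = paginate(messages, limit=limit, before_sequence=cursor)
        collected = page.messages + collected  # older pages go in front
        if not page.has_more:
            return collected
        cursor = page.oldest_sequence
```

A client would pass the previous response's `oldest_sequence` back as `before_sequence` until `has_more` is false.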
--- a/autogpt_platform/backend/backend/api/features/chat/routes_test.py
+++ b/autogpt_platform/backend/backend/api/features/chat/routes_test.py
@@ -9,6 +9,7 @@ import pytest
 import pytest_mock

 from backend.api.features.chat import routes as chat_routes
+from backend.copilot.rate_limit import SubscriptionTier

 app = fastapi.FastAPI()
 app.include_router(chat_routes.router)
@@ -331,14 +332,28 @@ def _mock_usage(
    *,
    daily_used: int = 500,
    weekly_used: int = 2000,
+    daily_limit: int = 10000,
+    weekly_limit: int = 50000,
+    tier: SubscriptionTier = SubscriptionTier.FREE,
 ) -> AsyncMock:
-    """Mock get_usage_status to return a predictable CoPilotUsageStatus."""
+    """Mock get_usage_status and get_global_rate_limits for usage endpoint tests.
+
+    Mocks both ``get_global_rate_limits`` (returns the given limits + tier) and
+    ``get_usage_status`` so that tests exercise the endpoint without hitting
+    LaunchDarkly or Prisma.
+    """
    from backend.copilot.rate_limit import CoPilotUsageStatus, UsageWindow

+    mocker.patch(
+        "backend.api.features.chat.routes.get_global_rate_limits",
+        new_callable=AsyncMock,
+        return_value=(daily_limit, weekly_limit, tier),
+    )
+
    resets_at = datetime.now(UTC) + timedelta(days=1)
    status = CoPilotUsageStatus(
-        daily=UsageWindow(used=daily_used, limit=10000, resets_at=resets_at),
-        weekly=UsageWindow(used=weekly_used, limit=50000, resets_at=resets_at),
+        daily=UsageWindow(used=daily_used, limit=daily_limit, resets_at=resets_at),
+        weekly=UsageWindow(used=weekly_used, limit=weekly_limit, resets_at=resets_at),
    )
    return mocker.patch(
        "backend.api.features.chat.routes.get_usage_status",
@@ -369,6 +384,7 @@ def test_usage_returns_daily_and_weekly(
        daily_token_limit=10000,
        weekly_token_limit=50000,
        rate_limit_reset_cost=chat_routes.config.rate_limit_reset_cost,
+        tier=SubscriptionTier.FREE,
    )


@@ -376,11 +392,9 @@ def test_usage_uses_config_limits(
    mocker: pytest_mock.MockerFixture,
    test_user_id: str,
 ) -> None:
-    """The endpoint forwards daily_token_limit and weekly_token_limit from config."""
-    mock_get = _mock_usage(mocker)
+    """The endpoint forwards resolved limits from get_global_rate_limits to get_usage_status."""
+    mock_get = _mock_usage(mocker, daily_limit=99999, weekly_limit=77777)

-    mocker.patch.object(chat_routes.config, "daily_token_limit", 99999)
-    mocker.patch.object(chat_routes.config, "weekly_token_limit", 77777)
    mocker.patch.object(chat_routes.config, "rate_limit_reset_cost", 500)

    response = client.get("/usage")
@@ -391,6 +405,7 @@ def test_usage_uses_config_limits(
        daily_token_limit=99999,
        weekly_token_limit=77777,
        rate_limit_reset_cost=500,
+        tier=SubscriptionTier.FREE,
    )


@@ -526,3 +541,41 @@ def test_create_session_rejects_nested_metadata(
    )

    assert response.status_code == 422
+
+
+class TestStreamChatRequestModeValidation:
+    """Pydantic-level validation of the ``mode`` field on StreamChatRequest."""
+
+    def test_rejects_invalid_mode_value(self) -> None:
+        """Any string outside the Literal set must raise ValidationError."""
+        from pydantic import ValidationError
+
+        from backend.api.features.chat.routes import StreamChatRequest
+
+        with pytest.raises(ValidationError):
+            StreamChatRequest(message="hi", mode="turbo")  # type: ignore[arg-type]
+
+    def test_accepts_fast_mode(self) -> None:
+        from backend.api.features.chat.routes import StreamChatRequest
+
+        req = StreamChatRequest(message="hi", mode="fast")
+        assert req.mode == "fast"
+
+    def test_accepts_extended_thinking_mode(self) -> None:
+        from backend.api.features.chat.routes import StreamChatRequest
+
+        req = StreamChatRequest(message="hi", mode="extended_thinking")
+        assert req.mode == "extended_thinking"
+
+    def test_accepts_none_mode(self) -> None:
+        """``mode=None`` is valid (server decides via feature flags)."""
+        from backend.api.features.chat.routes import StreamChatRequest
+
+        req = StreamChatRequest(message="hi", mode=None)
+        assert req.mode is None
+
+    def test_mode_defaults_to_none_when_omitted(self) -> None:
+        from backend.api.features.chat.routes import StreamChatRequest
+
+        req = StreamChatRequest(message="hi")
+        assert req.mode is None
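
Reviewer note: the tests above treat `mode` as a closed set of string values plus `None`. The definition of `CopilotMode` in `backend.copilot.config` is not shown in this diff, so the sketch below assumes it is `Literal["fast", "extended_thinking"]` and reproduces the same accept/reject behaviour using only the standard library. `validate_mode` is a hypothetical helper, not part of the PR.

```python
from typing import Literal, Optional, get_args

# Assumption: mirrors the two values exercised by the tests above.
CopilotMode = Literal["fast", "extended_thinking"]


def validate_mode(value: Optional[str]) -> Optional[str]:
    """Accept None or a member of CopilotMode; reject anything else."""
    if value is None:
        # None means "let the server pick", as in test_accepts_none_mode.
        return None
    allowed = get_args(CopilotMode)
    if value not in allowed:
        raise ValueError(f"mode must be one of {allowed}, got {value!r}")
    return value
```

In the route itself Pydantic performs this check when it parses `StreamChatRequest`, which is why the invalid-mode test expects a `ValidationError` rather than a handler-level error.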
--- a/autogpt_platform/backend/backend/api/features/store/db_test.py
+++ b/autogpt_platform/backend/backend/api/features/store/db_test.py
@@ -189,6 +189,7 @@ async def test_create_store_submission(mocker):
        notifyOnAgentApproved=True,
        notifyOnAgentRejected=True,
        timezone="Europe/Delft",
+        subscriptionTier=prisma.enums.SubscriptionTier.FREE,  # type: ignore[reportCallIssue,reportAttributeAccessIssue]
    )
    mock_agent = prisma.models.AgentGraph(
        id="agent-id",
--- a/autogpt_platform/backend/backend/api/features/workspace/routes.py
+++ b/autogpt_platform/backend/backend/api/features/workspace/routes.py
@@ -12,7 +12,7 @@ import fastapi
 from autogpt_libs.auth.dependencies import get_user_id, requires_user
 from fastapi import Query, UploadFile
 from fastapi.responses import Response
-from pydantic import BaseModel
+from pydantic import BaseModel, Field

 from backend.data.workspace import (
    WorkspaceFile,
@@ -131,9 +131,26 @@ class StorageUsageResponse(BaseModel):
    file_count: int


+class WorkspaceFileItem(BaseModel):
+    id: str
+    name: str
+    path: str
+    mime_type: str
+    size_bytes: int
+    metadata: dict = Field(default_factory=dict)
+    created_at: str
+
+
+class ListFilesResponse(BaseModel):
+    files: list[WorkspaceFileItem]
+    offset: int = 0
+    has_more: bool = False
+
+
 @router.get(
    "/files/{file_id}/download",
    summary="Download file by ID",
+    operation_id="getWorkspaceDownloadFileById",
 )
 async def download_file(
    user_id: Annotated[str, fastapi.Security(get_user_id)],
@@ -158,6 +175,7 @@ async def download_file(
 @router.delete(
    "/files/{file_id}",
    summary="Delete a workspace file",
+    operation_id="deleteWorkspaceFile",
 )
 async def delete_workspace_file(
    user_id: Annotated[str, fastapi.Security(get_user_id)],
@@ -183,6 +201,7 @@ async def delete_workspace_file(
 @router.post(
    "/files/upload",
    summary="Upload file to workspace",
+    operation_id="uploadWorkspaceFile",
 )
 async def upload_file(
    user_id: Annotated[str, fastapi.Security(get_user_id)],
@@ -196,6 +215,9 @@ async def upload_file(
    Files are stored in session-scoped paths when session_id is provided,
    so the agent's session-scoped tools can discover them automatically.
    """
+    # Empty-string session_id drops session scoping; normalize to None.
+    session_id = session_id or None
+
    config = Config()

    # Sanitize filename — strip any directory components
@@ -250,16 +272,27 @@ async def upload_file(
    manager = WorkspaceManager(user_id, workspace.id, session_id)
    try:
        workspace_file = await manager.write_file(
-            content, filename, overwrite=overwrite
+            content, filename, overwrite=overwrite, metadata={"origin": "user-upload"}
        )
    except ValueError as e:
-        raise fastapi.HTTPException(status_code=409, detail=str(e)) from e
+        # write_file raises ValueError for both path-conflict and size-limit
+        # cases; map each to its correct HTTP status.
+        message = str(e)
+        if message.startswith("File too large"):
+            raise fastapi.HTTPException(status_code=413, detail=message) from e
+        raise fastapi.HTTPException(status_code=409, detail=message) from e

    # Post-write storage check — eliminates TOCTOU race on the quota.
    # If a concurrent upload pushed us over the limit, undo this write.
    new_total = await get_workspace_total_size(workspace.id)
    if storage_limit_bytes and new_total > storage_limit_bytes:
-        await soft_delete_workspace_file(workspace_file.id, workspace.id)
+        try:
+            await soft_delete_workspace_file(workspace_file.id, workspace.id)
+        except Exception as e:
+            logger.warning(
+                f"Failed to soft-delete over-quota file {workspace_file.id} "
+                f"in workspace {workspace.id}: {e}"
+            )
        raise fastapi.HTTPException(
            status_code=413,
            detail={
@@ -281,6 +314,7 @@ async def upload_file(
 @router.get(
    "/storage/usage",
    summary="Get workspace storage usage",
+    operation_id="getWorkspaceStorageUsage",
 )
 async def get_storage_usage(
    user_id: Annotated[str, fastapi.Security(get_user_id)],
@@ -301,3 +335,57 @@ async def get_storage_usage(
        used_percent=round((used_bytes / limit_bytes) * 100, 1) if limit_bytes else 0,
        file_count=file_count,
    )
+
+
+@router.get(
+    "/files",
+    summary="List workspace files",
+    operation_id="listWorkspaceFiles",
+)
+async def list_workspace_files(
+    user_id: Annotated[str, fastapi.Security(get_user_id)],
+    session_id: str | None = Query(default=None),
+    limit: int = Query(default=200, ge=1, le=1000),
+    offset: int = Query(default=0, ge=0),
+) -> ListFilesResponse:
+    """
+    List files in the user's workspace.
+
+    When session_id is provided, only files for that session are returned.
+    Otherwise, all files across sessions are listed. Results are paginated
+    via `limit`/`offset`; `has_more` indicates whether additional pages exist.
+    """
+    workspace = await get_or_create_workspace(user_id)
+
+    # Treat empty-string session_id the same as omitted — an empty value
+    # would otherwise silently list files across every session instead of
+    # scoping to one.
+    session_id = session_id or None
+
+    manager = WorkspaceManager(user_id, workspace.id, session_id)
+    include_all = session_id is None
+    # Fetch one extra to compute has_more without a separate count query.
+    files = await manager.list_files(
+        limit=limit + 1,
+        offset=offset,
+        include_all_sessions=include_all,
+    )
+    has_more = len(files) > limit
+    page = files[:limit]
+
+    return ListFilesResponse(
+        files=[
+            WorkspaceFileItem(
+                id=f.id,
+                name=f.name,
+                path=f.path,
+                mime_type=f.mime_type,
+                size_bytes=f.size_bytes,
+                metadata=f.metadata or {},
+                created_at=f.created_at.isoformat(),
+            )
+            for f in page
+        ],
+        offset=offset,
+        has_more=has_more,
+    )
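
Reviewer note: `list_workspace_files` above asks for `limit + 1` rows and derives `has_more` from whether the extra row came back, which avoids a separate count query. A minimal sketch of that technique over an in-memory list follows. `fetch_page` is a hypothetical helper, and the slice stands in for the real `manager.list_files(limit=limit + 1, offset=offset, ...)` call.

```python
def fetch_page(rows: list, limit: int, offset: int) -> tuple:
    """Return (page, has_more) using the fetch-one-extra technique."""
    # Stand-in for the storage query: request one row more than the page size.
    fetched = rows[offset : offset + limit + 1]
    # If the extra row exists, at least one more page is available.
    has_more = len(fetched) > limit
    # Never expose the probe row to the caller.
    return fetched[:limit], has_more
```

The trade-off is one extra row read per request in exchange for not running a second query; the caller advances by adding `limit` to `offset` while `has_more` is true.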
--- a/autogpt_platform/backend/backend/api/features/workspace/routes_test.py
+++ b/autogpt_platform/backend/backend/api/features/workspace/routes_test.py
@@ -1,48 +1,28 @@
-"""Tests for workspace file upload and download routes."""
-
 import io
 from datetime import datetime, timezone
+from unittest.mock import AsyncMock, MagicMock, patch

 import fastapi
 import fastapi.testclient
 import pytest
-import pytest_mock

-from backend.api.features.workspace import routes as workspace_routes
-from backend.data.workspace import WorkspaceFile
+from backend.api.features.workspace.routes import router
+from backend.data.workspace import Workspace, WorkspaceFile

 app = fastapi.FastAPI()
-app.include_router(workspace_routes.router)
+app.include_router(router)


 @app.exception_handler(ValueError)
 async def _value_error_handler(
    request: fastapi.Request, exc: ValueError
 ) -> fastapi.responses.JSONResponse:
-    """Mirror the production ValueError → 400 mapping from rest_api.py."""
+    """Mirror the production ValueError → 400 mapping from the REST app."""
    return fastapi.responses.JSONResponse(status_code=400, content={"detail": str(exc)})


 client = fastapi.testclient.TestClient(app)

-TEST_USER_ID = "3e53486c-cf57-477e-ba2a-cb02dc828e1a"
-
-MOCK_WORKSPACE = type("W", (), {"id": "ws-1"})()
-
-_NOW = datetime(2023, 1, 1, tzinfo=timezone.utc)
-
-MOCK_FILE = WorkspaceFile(
-    id="file-aaa-bbb",
-    workspace_id="ws-1",
-    created_at=_NOW,
-    updated_at=_NOW,
-    name="hello.txt",
-    path="/session/hello.txt",
-    mime_type="text/plain",
-    size_bytes=13,
-    storage_path="local://hello.txt",
-)
-

 @pytest.fixture(autouse=True)
 def setup_app_auth(mock_jwt_user):
@@ -53,25 +33,201 @@ def setup_app_auth(mock_jwt_user):
    app.dependency_overrides.clear()


+def _make_workspace(user_id: str = "test-user-id") -> Workspace:
+    return Workspace(
+        id="ws-001",
+        user_id=user_id,
+        created_at=datetime(2026, 1, 1, tzinfo=timezone.utc),
+        updated_at=datetime(2026, 1, 1, tzinfo=timezone.utc),
+    )
+
+
+def _make_file(**overrides) -> WorkspaceFile:
+    defaults = {
+        "id": "file-001",
+        "workspace_id": "ws-001",
+        "created_at": datetime(2026, 1, 1, tzinfo=timezone.utc),
+        "updated_at": datetime(2026, 1, 1, tzinfo=timezone.utc),
+        "name": "test.txt",
+        "path": "/test.txt",
+        "storage_path": "local://test.txt",
+        "mime_type": "text/plain",
+        "size_bytes": 100,
+        "checksum": None,
+        "is_deleted": False,
+        "deleted_at": None,
+        "metadata": {},
+    }
+    defaults.update(overrides)
+    return WorkspaceFile(**defaults)
+
+
+def _make_file_mock(**overrides) -> MagicMock:
+    """Create a mock WorkspaceFile to simulate DB records with null fields."""
+    defaults = {
+        "id": "file-001",
+        "name": "test.txt",
+        "path": "/test.txt",
+        "mime_type": "text/plain",
+        "size_bytes": 100,
+        "metadata": {},
+        "created_at": datetime(2026, 1, 1, tzinfo=timezone.utc),
+    }
+    defaults.update(overrides)
+    mock = MagicMock(spec=WorkspaceFile)
+    for k, v in defaults.items():
+        setattr(mock, k, v)
+    return mock
+
+
+# -- list_workspace_files tests --
+
+
+@patch("backend.api.features.workspace.routes.get_or_create_workspace")
+@patch("backend.api.features.workspace.routes.WorkspaceManager")
+def test_list_files_returns_all_when_no_session(mock_manager_cls, mock_get_workspace):
+    mock_get_workspace.return_value = _make_workspace()
+    files = [
+        _make_file(id="f1", name="a.txt", metadata={"origin": "user-upload"}),
+        _make_file(id="f2", name="b.csv", metadata={"origin": "agent-created"}),
+    ]
+    mock_instance = AsyncMock()
+    mock_instance.list_files.return_value = files
+    mock_manager_cls.return_value = mock_instance
+
+    response = client.get("/files")
+    assert response.status_code == 200
+
+    data = response.json()
+    assert len(data["files"]) == 2
+    assert data["has_more"] is False
+    assert data["offset"] == 0
+    assert data["files"][0]["id"] == "f1"
+    assert data["files"][0]["metadata"] == {"origin": "user-upload"}
+    assert data["files"][1]["id"] == "f2"
+    mock_instance.list_files.assert_called_once_with(
+        limit=201, offset=0, include_all_sessions=True
+    )
+
+
+@patch("backend.api.features.workspace.routes.get_or_create_workspace")
+@patch("backend.api.features.workspace.routes.WorkspaceManager")
+def test_list_files_scopes_to_session_when_provided(
+    mock_manager_cls, mock_get_workspace, test_user_id
+):
+    mock_get_workspace.return_value = _make_workspace(user_id=test_user_id)
+    mock_instance = AsyncMock()
+    mock_instance.list_files.return_value = []
+    mock_manager_cls.return_value = mock_instance
+
+    response = client.get("/files?session_id=sess-123")
+    assert response.status_code == 200
+
+    data = response.json()
+    assert data["files"] == []
+    assert data["has_more"] is False
+    mock_manager_cls.assert_called_once_with(test_user_id, "ws-001", "sess-123")
+    mock_instance.list_files.assert_called_once_with(
+        limit=201, offset=0, include_all_sessions=False
+    )
+
+
+@patch("backend.api.features.workspace.routes.get_or_create_workspace")
+@patch("backend.api.features.workspace.routes.WorkspaceManager")
+def test_list_files_null_metadata_coerced_to_empty_dict(
+    mock_manager_cls, mock_get_workspace
+):
+    """Route uses `f.metadata or {}` for pre-existing files with null metadata."""
+    mock_get_workspace.return_value = _make_workspace()
+    mock_instance = AsyncMock()
+    mock_instance.list_files.return_value = [_make_file_mock(metadata=None)]
+    mock_manager_cls.return_value = mock_instance
+
+    response = client.get("/files")
+    assert response.status_code == 200
+    assert response.json()["files"][0]["metadata"] == {}
+
+
+# -- upload_file metadata tests --
+
+
+@patch("backend.api.features.workspace.routes.get_or_create_workspace")
+@patch("backend.api.features.workspace.routes.get_workspace_total_size")
+@patch("backend.api.features.workspace.routes.scan_content_safe")
+@patch("backend.api.features.workspace.routes.WorkspaceManager")
+def test_upload_passes_user_upload_origin_metadata(
+    mock_manager_cls, mock_scan, mock_total_size, mock_get_workspace
+):
+    mock_get_workspace.return_value = _make_workspace()
+    mock_total_size.return_value = 100
+    written = _make_file(id="new-file", name="doc.pdf")
+    mock_instance = AsyncMock()
+    mock_instance.write_file.return_value = written
+    mock_manager_cls.return_value = mock_instance
+
+    response = client.post(
+        "/files/upload",
+        files={"file": ("doc.pdf", b"fake-pdf-content", "application/pdf")},
+    )
+    assert response.status_code == 200
+
+    mock_instance.write_file.assert_called_once()
+    call_kwargs = mock_instance.write_file.call_args
+    assert call_kwargs.kwargs.get("metadata") == {"origin": "user-upload"}
+
+
+@patch("backend.api.features.workspace.routes.get_or_create_workspace")
+@patch("backend.api.features.workspace.routes.get_workspace_total_size")
+@patch("backend.api.features.workspace.routes.scan_content_safe")
+@patch("backend.api.features.workspace.routes.WorkspaceManager")
+def test_upload_returns_409_on_file_conflict(
+    mock_manager_cls, mock_scan, mock_total_size, mock_get_workspace
+):
+    mock_get_workspace.return_value = _make_workspace()
+    mock_total_size.return_value = 100
+    mock_instance = AsyncMock()
+    mock_instance.write_file.side_effect = ValueError("File already exists at path")
+    mock_manager_cls.return_value = mock_instance
+
+    response = client.post(
+        "/files/upload",
+        files={"file": ("dup.txt", b"content", "text/plain")},
+    )
+    assert response.status_code == 409
+    assert "already exists" in response.json()["detail"]
+
+
+# -- Restored upload/download/delete security + invariant tests --
+
+
 def _upload(
    filename: str = "hello.txt",
    content: bytes = b"Hello, world!",
    content_type: str = "text/plain",
 ):
-    """Helper to POST a file upload."""
    return client.post(
        "/files/upload?session_id=sess-1",
        files={"file": (filename, io.BytesIO(content), content_type)},
    )


-# ---- Happy path ----
+_MOCK_FILE = WorkspaceFile(
+    id="file-aaa-bbb",
+    workspace_id="ws-001",
+    created_at=datetime(2026, 1, 1, tzinfo=timezone.utc),
+    updated_at=datetime(2026, 1, 1, tzinfo=timezone.utc),
+    name="hello.txt",
+    path="/sessions/sess-1/hello.txt",
+    mime_type="text/plain",
+    size_bytes=13,
+    storage_path="local://hello.txt",
+)


-def test_upload_happy_path(mocker: pytest_mock.MockFixture):
+def test_upload_happy_path(mocker):
    mocker.patch(
        "backend.api.features.workspace.routes.get_or_create_workspace",
-        return_value=MOCK_WORKSPACE,
+        return_value=_make_workspace(),
    )
    mocker.patch(
        "backend.api.features.workspace.routes.get_workspace_total_size",
@@ -82,7 +238,7 @@ def test_upload_happy_path(mocker: pytest_mock.MockFixture):
        return_value=None,
    )
    mock_manager = mocker.MagicMock()
-    mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE)
+    mock_manager.write_file = mocker.AsyncMock(return_value=_MOCK_FILE)
    mocker.patch(
        "backend.api.features.workspace.routes.WorkspaceManager",
        return_value=mock_manager,
@@ -96,10 +252,7 @@ def test_upload_happy_path(mocker: pytest_mock.MockFixture):
    assert data["size_bytes"] == 13


-# ---- Per-file size limit ----
-
-
-def test_upload_exceeds_max_file_size(mocker: pytest_mock.MockFixture):
+def test_upload_exceeds_max_file_size(mocker):
    """Files larger than max_file_size_mb should be rejected with 413."""
    cfg = mocker.patch("backend.api.features.workspace.routes.Config")
    cfg.return_value.max_file_size_mb = 0  # 0 MB → any content is too big
@@ -109,15 +262,11 @@ def test_upload_exceeds_max_file_size(mocker: pytest_mock.MockFixture):
    assert response.status_code == 413


-# ---- Storage quota exceeded ----
-
-
-def test_upload_storage_quota_exceeded(mocker: pytest_mock.MockFixture):
+def test_upload_storage_quota_exceeded(mocker):
    mocker.patch(
        "backend.api.features.workspace.routes.get_or_create_workspace",
-        return_value=MOCK_WORKSPACE,
+        return_value=_make_workspace(),
    )
-    # Current usage already at limit
    mocker.patch(
        "backend.api.features.workspace.routes.get_workspace_total_size",
        return_value=500 * 1024 * 1024,
@@ -128,27 +277,22 @@ def test_upload_storage_quota_exceeded(mocker: pytest_mock.MockFixture):
    assert "Storage limit exceeded" in response.text


-# ---- Post-write quota race (B2) ----
-
-
-def test_upload_post_write_quota_race(mocker: pytest_mock.MockFixture):
-    """If a concurrent upload tips the total over the limit after write,
-    the file should be soft-deleted and 413 returned."""
+def test_upload_post_write_quota_race(mocker):
+    """Post-write quota overrun from a concurrent upload: soft-delete, return 413."""
    mocker.patch(
        "backend.api.features.workspace.routes.get_or_create_workspace",
-        return_value=MOCK_WORKSPACE,
+        return_value=_make_workspace(),
    )
-    # Pre-write check passes (under limit), but post-write check fails
    mocker.patch(
        "backend.api.features.workspace.routes.get_workspace_total_size",
-        side_effect=[0, 600 * 1024 * 1024],  # first call OK, second over limit
+        side_effect=[0, 600 * 1024 * 1024],
    )
    mocker.patch(
        "backend.api.features.workspace.routes.scan_content_safe",
        return_value=None,
    )
    mock_manager = mocker.MagicMock()
-    mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE)
+    mock_manager.write_file = mocker.AsyncMock(return_value=_MOCK_FILE)
    mocker.patch(
        "backend.api.features.workspace.routes.WorkspaceManager",
        return_value=mock_manager,
@@ -160,17 +304,14 @@ def test_upload_post_write_quota_race(mocker: pytest_mock.MockFixture):

    response = _upload()
    assert response.status_code == 413
-    mock_delete.assert_called_once_with("file-aaa-bbb", "ws-1")
+    mock_delete.assert_called_once_with("file-aaa-bbb", "ws-001")


-# ---- Any extension accepted (no allowlist) ----
-
-
-def test_upload_any_extension(mocker: pytest_mock.MockFixture):
+def test_upload_any_extension(mocker):
    """Any file extension should be accepted — ClamAV is the security layer."""
    mocker.patch(
        "backend.api.features.workspace.routes.get_or_create_workspace",
-        return_value=MOCK_WORKSPACE,
+        return_value=_make_workspace(),
    )
    mocker.patch(
        "backend.api.features.workspace.routes.get_workspace_total_size",
@@ -181,7 +322,7 @@ def test_upload_any_extension(mocker: pytest_mock.MockFixture):
        return_value=None,
    )
    mock_manager = mocker.MagicMock()
-    mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE)
+    mock_manager.write_file = mocker.AsyncMock(return_value=_MOCK_FILE)
    mocker.patch(
        "backend.api.features.workspace.routes.WorkspaceManager",
        return_value=mock_manager,
@@ -191,16 +332,13 @@ def test_upload_any_extension(mocker: pytest_mock.MockFixture):
    assert response.status_code == 200


-# ---- Virus scan rejection ----
-
-
-def test_upload_blocked_by_virus_scan(mocker: pytest_mock.MockFixture):
+def test_upload_blocked_by_virus_scan(mocker):
    """Files flagged by ClamAV should be rejected and never written to storage."""
    from backend.api.features.store.exceptions import VirusDetectedError

    mocker.patch(
        "backend.api.features.workspace.routes.get_or_create_workspace",
-        return_value=MOCK_WORKSPACE,
+        return_value=_make_workspace(),
    )
    mocker.patch(
        "backend.api.features.workspace.routes.get_workspace_total_size",
@@ -211,7 +349,7 @@ def test_upload_blocked_by_virus_scan(mocker: pytest_mock.MockFixture):
        side_effect=VirusDetectedError("Eicar-Test-Signature"),
    )
    mock_manager = mocker.MagicMock()
-    mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE)
+    mock_manager.write_file = mocker.AsyncMock(return_value=_MOCK_FILE)
    mocker.patch(
        "backend.api.features.workspace.routes.WorkspaceManager",
        return_value=mock_manager,
@@ -219,18 +357,14 @@ def test_upload_blocked_by_virus_scan(mocker: pytest_mock.MockFixture):

    response = _upload(filename="evil.exe", content=b"X5O!P%@AP...")
    assert response.status_code == 400
-    assert "Virus detected" in response.text
    mock_manager.write_file.assert_not_called()


-# ---- No file extension ----
-
-
-def test_upload_file_without_extension(mocker: pytest_mock.MockFixture):
+def test_upload_file_without_extension(mocker):
    """Files without an extension should be accepted and stored as-is."""
    mocker.patch(
        "backend.api.features.workspace.routes.get_or_create_workspace",
-        return_value=MOCK_WORKSPACE,
+        return_value=_make_workspace(),
    )
    mocker.patch(
        "backend.api.features.workspace.routes.get_workspace_total_size",
@@ -241,7 +375,7 @@ def test_upload_file_without_extension(mocker: pytest_mock.MockFixture):
        return_value=None,
    )
    mock_manager = mocker.MagicMock()
-    mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE)
+    mock_manager.write_file = mocker.AsyncMock(return_value=_MOCK_FILE)
    mocker.patch(
        "backend.api.features.workspace.routes.WorkspaceManager",
        return_value=mock_manager,
@@ -257,14 +391,11 @@ def test_upload_file_without_extension(mocker: pytest_mock.MockFixture):
    assert mock_manager.write_file.call_args[0][1] == "Makefile"


-# ---- Filename sanitization (SF5) ----
-
-
-def test_upload_strips_path_components(mocker: pytest_mock.MockFixture):
+def test_upload_strips_path_components(mocker):
    """Path-traversal filenames should be reduced to their basename."""
    mocker.patch(
        "backend.api.features.workspace.routes.get_or_create_workspace",
-        return_value=MOCK_WORKSPACE,
+        return_value=_make_workspace(),
    )
    mocker.patch(
        "backend.api.features.workspace.routes.get_workspace_total_size",
@@ -275,28 +406,23 @@ def test_upload_strips_path_components(mocker: pytest_mock.MockFixture):
        return_value=None,
    )
    mock_manager = mocker.MagicMock()
-    mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE)
+    mock_manager.write_file = mocker.AsyncMock(return_value=_MOCK_FILE)
    mocker.patch(
        "backend.api.features.workspace.routes.WorkspaceManager",
        return_value=mock_manager,
    )

-    # Filename with traversal
    _upload(filename="../../etc/passwd.txt")

-    # write_file should have been called with just the basename
    mock_manager.write_file.assert_called_once()
    call_args = mock_manager.write_file.call_args
    assert call_args[0][1] == "passwd.txt"


-# ---- Download ----
-
-
-def test_download_file_not_found(mocker: pytest_mock.MockFixture):
+def test_download_file_not_found(mocker):
    mocker.patch(
        "backend.api.features.workspace.routes.get_workspace",
-        return_value=MOCK_WORKSPACE,
+        return_value=_make_workspace(),
    )
    mocker.patch(
        "backend.api.features.workspace.routes.get_workspace_file",
@@ -307,14 +433,11 @@ def test_download_file_not_found(mocker: pytest_mock.MockFixture):
    assert response.status_code == 404


-# ---- Delete ----
-
-
-def test_delete_file_success(mocker: pytest_mock.MockFixture):
+def test_delete_file_success(mocker):
    """Deleting an existing file should return {"deleted": true}."""
    mocker.patch(
        "backend.api.features.workspace.routes.get_workspace",
-        return_value=MOCK_WORKSPACE,
+        return_value=_make_workspace(),
    )
    mock_manager = mocker.MagicMock()
    mock_manager.delete_file = mocker.AsyncMock(return_value=True)
@@ -329,11 +452,11 @@ def test_delete_file_success(mocker: pytest_mock.MockFixture):
    mock_manager.delete_file.assert_called_once_with("file-aaa-bbb")


-def test_delete_file_not_found(mocker: pytest_mock.MockFixture):
+def test_delete_file_not_found(mocker):
    """Deleting a non-existent file should return 404."""
    mocker.patch(
        "backend.api.features.workspace.routes.get_workspace",
-        return_value=MOCK_WORKSPACE,
+        return_value=_make_workspace(),
    )
    mock_manager = mocker.MagicMock()
    mock_manager.delete_file = mocker.AsyncMock(return_value=False)
@@ -347,7 +470,7 @@ def test_delete_file_not_found(mocker: pytest_mock.MockFixture):
    assert "File not found" in response.text


-def test_delete_file_no_workspace(mocker: pytest_mock.MockFixture):
+def test_delete_file_no_workspace(mocker):
    """Deleting when user has no workspace should return 404."""
    mocker.patch(
        "backend.api.features.workspace.routes.get_workspace",
@@ -357,3 +480,123 @@ def test_delete_file_no_workspace(mocker: pytest_mock.MockFixture):
    response = client.delete("/files/file-aaa-bbb")
    assert response.status_code == 404
    assert "Workspace not found" in response.text
+
+
+def test_upload_write_file_too_large_returns_413(mocker):
+    """write_file raises ValueError("File too large: …") → must map to 413."""
+    mocker.patch(
+        "backend.api.features.workspace.routes.get_or_create_workspace",
+        return_value=_make_workspace(),
+    )
+    mocker.patch(
+        "backend.api.features.workspace.routes.get_workspace_total_size",
+        return_value=0,
+    )
+    mocker.patch(
+        "backend.api.features.workspace.routes.scan_content_safe",
+        return_value=None,
+    )
+    mock_manager = mocker.MagicMock()
+    mock_manager.write_file = mocker.AsyncMock(
+        side_effect=ValueError("File too large: 900 bytes exceeds 1MB limit")
+    )
+    mocker.patch(
+        "backend.api.features.workspace.routes.WorkspaceManager",
+        return_value=mock_manager,
+    )
+
+    response = _upload()
+    assert response.status_code == 413
+    assert "File too large" in response.text
+
+
+def test_upload_write_file_conflict_returns_409(mocker):
+    """Non-'File too large' ValueErrors from write_file stay as 409."""
+    mocker.patch(
+        "backend.api.features.workspace.routes.get_or_create_workspace",
+        return_value=_make_workspace(),
+    )
+    mocker.patch(
+        "backend.api.features.workspace.routes.get_workspace_total_size",
+        return_value=0,
+    )
+    mocker.patch(
+        "backend.api.features.workspace.routes.scan_content_safe",
+        return_value=None,
+    )
+    mock_manager = mocker.MagicMock()
+    mock_manager.write_file = mocker.AsyncMock(
+        side_effect=ValueError("File already exists at path: /sessions/x/a.txt")
+    )
+    mocker.patch(
+        "backend.api.features.workspace.routes.WorkspaceManager",
+        return_value=mock_manager,
+    )
+
+    response = _upload()
+    assert response.status_code == 409
+    assert "already exists" in response.text
+
+
+@patch("backend.api.features.workspace.routes.get_or_create_workspace")
+@patch("backend.api.features.workspace.routes.WorkspaceManager")
+def test_list_files_has_more_true_when_limit_exceeded(
+    mock_manager_cls, mock_get_workspace
+):
+    """The limit+1 fetch trick must flip has_more=True and trim the page."""
+    mock_get_workspace.return_value = _make_workspace()
+    # Backend was asked for limit+1=3, and returned exactly 3 items.
+    files = [
+        _make_file(id="f1", name="a.txt"),
+        _make_file(id="f2", name="b.txt"),
+        _make_file(id="f3", name="c.txt"),
+    ]
+    mock_instance = AsyncMock()
+    mock_instance.list_files.return_value = files
+    mock_manager_cls.return_value = mock_instance
+
+    response = client.get("/files?limit=2")
+    assert response.status_code == 200
+    data = response.json()
+    assert data["has_more"] is True
+    assert len(data["files"]) == 2
+    assert data["files"][0]["id"] == "f1"
+    assert data["files"][1]["id"] == "f2"
+    mock_instance.list_files.assert_called_once_with(
+        limit=3, offset=0, include_all_sessions=True
+    )
+
+
+@patch("backend.api.features.workspace.routes.get_or_create_workspace")
+@patch("backend.api.features.workspace.routes.WorkspaceManager")
+def test_list_files_has_more_false_when_exactly_page_size(
+    mock_manager_cls, mock_get_workspace
+):
+    """Exactly `limit` rows means we're on the last page — has_more=False."""
+    mock_get_workspace.return_value = _make_workspace()
+    files = [_make_file(id="f1", name="a.txt"), _make_file(id="f2", name="b.txt")]
+    mock_instance = AsyncMock()
+    mock_instance.list_files.return_value = files
+    mock_manager_cls.return_value = mock_instance
+
+    response = client.get("/files?limit=2")
+    assert response.status_code == 200
+    data = response.json()
+    assert data["has_more"] is False
+    assert len(data["files"]) == 2
+
+
+@patch("backend.api.features.workspace.routes.get_or_create_workspace")
+@patch("backend.api.features.workspace.routes.WorkspaceManager")
+def test_list_files_offset_is_echoed_back(mock_manager_cls, mock_get_workspace):
+    mock_get_workspace.return_value = _make_workspace()
+    mock_instance = AsyncMock()
+    mock_instance.list_files.return_value = []
+    mock_manager_cls.return_value = mock_instance
+
+    response = client.get("/files?offset=50&limit=10")
+    assert response.status_code == 200
+    assert response.json()["offset"] == 50
+    mock_instance.list_files.assert_called_once_with(
+        limit=11, offset=50, include_all_sessions=True
+    )
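
The three `list_files` tests above pin down the "limit+1" pagination behaviour: the route asks the backend for one row more than the page size, reports `has_more=True` only when that extra row comes back, and trims the page to `limit`. A minimal standalone sketch of that idea follows. The names `Page`, `fetch_page` and `fake_fetch` are hypothetical illustrations, not the code in `routes.py`, and the fetch callable's `(limit, offset)` signature is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Sequence, TypeVar

T = TypeVar("T")


@dataclass
class Page(Generic[T]):
    """Hypothetical response shape: one page of items plus paging info."""

    items: list[T]
    offset: int
    has_more: bool


def fetch_page(
    fetch: Callable[[int, int], Sequence[T]], limit: int, offset: int = 0
) -> Page[T]:
    # Request one extra row; if it comes back, another page exists.
    rows = list(fetch(limit + 1, offset))
    has_more = len(rows) > limit
    # Trim the sentinel row so the caller never sees more than `limit` items.
    return Page(items=rows[:limit], offset=offset, has_more=has_more)


# Stand-in data source (assumption: slicing mimics limit/offset semantics).
data = ["f1", "f2", "f3"]


def fake_fetch(limit: int, offset: int) -> list[str]:
    return data[offset : offset + limit]


page = fetch_page(fake_fetch, limit=2)
```

With three stored rows and `limit=2`, the backend is asked for three rows, so `page.has_more` is true and `page.items` holds only the first two. With `limit=3` the backend returns exactly three of the four rows requested, so `has_more` is false, which matches the "exactly page size" test.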
--- a/autogpt_platform/backend/backend/api/rest_api.py
+++ b/autogpt_platform/backend/backend/api/rest_api.py
@@ -18,6 +18,7 @@ from prisma.errors import PrismaError

 import backend.api.features.admin.credit_admin_routes
 import backend.api.features.admin.execution_analytics_routes
+import backend.api.features.admin.platform_cost_routes
 import backend.api.features.admin.rate_limit_admin_routes
 import backend.api.features.admin.store_admin_routes
 import backend.api.features.builder
@@ -329,6 +330,11 @@ app.include_router(
    tags=["v2", "admin"],
    prefix="/api/copilot",
 )
+app.include_router(
+    backend.api.features.admin.platform_cost_routes.router,
+    tags=["v2", "admin"],
+    prefix="/api/admin",
+)
 app.include_router(
    backend.api.features.executions.review.routes.router,
    tags=["v2", "executions", "review"],
--- a/autogpt_platform/backend/backend/blocks/apollo/organization.py
+++ b/autogpt_platform/backend/backend/blocks/apollo/organization.py
@@ -17,7 +17,7 @@ from backend.blocks.apollo.models import (
    PrimaryPhone,
    SearchOrganizationsRequest,
 )
-from backend.data.model import CredentialsField, SchemaField
+from backend.data.model import CredentialsField, NodeExecutionStats, SchemaField


 class SearchOrganizationsBlock(Block):
@@ -218,6 +218,11 @@ To find IDs, identify the values for organization_id when you call this endpoint
    ) -> BlockOutput:
        query = SearchOrganizationsRequest(**input_data.model_dump())
        organizations = await self.search_organizations(query, credentials)
+        self.merge_stats(
+            NodeExecutionStats(
+                provider_cost=float(len(organizations)), provider_cost_type="items"
+            )
+        )
        for organization in organizations:
            yield "organization", organization
        yield "organizations", organizations
--- a/autogpt_platform/backend/backend/blocks/apollo/people.py
+++ b/autogpt_platform/backend/backend/blocks/apollo/people.py
@@ -21,7 +21,7 @@ from backend.blocks.apollo.models import (
    SearchPeopleRequest,
    SenorityLevels,
 )
-from backend.data.model import CredentialsField, SchemaField
+from backend.data.model import CredentialsField, NodeExecutionStats, SchemaField


 class SearchPeopleBlock(Block):
@@ -366,4 +366,9 @@ class SearchPeopleBlock(Block):
                *(enrich_or_fallback(person) for person in people)
            )

+        self.merge_stats(
+            NodeExecutionStats(
+                provider_cost=float(len(people)), provider_cost_type="items"
+            )
+        )
        yield "people", people
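
Both Apollo blocks above report one cost unit per returned record, tagged with `provider_cost_type="items"`. The sketch below shows how such per-item costs could accumulate across calls. `CostStats`, its `merge` method and `items_cost` are hypothetical stand-ins; the real `NodeExecutionStats` and `merge_stats` may merge fields differently, and summing costs is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class CostStats:
    """Hypothetical stand-in for the cost fields of NodeExecutionStats."""

    provider_cost: Optional[float] = None
    provider_cost_type: Optional[str] = None

    def merge(self, other: "CostStats") -> None:
        # Assumption: costs add up; None means "nothing reported yet".
        if other.provider_cost is not None:
            self.provider_cost = (self.provider_cost or 0.0) + other.provider_cost
        if other.provider_cost_type is not None:
            self.provider_cost_type = other.provider_cost_type


def items_cost(results: Sequence[object]) -> CostStats:
    # One unit per returned record, mirroring float(len(...)) in the diff.
    return CostStats(provider_cost=float(len(results)), provider_cost_type="items")


total = CostStats()
total.merge(items_cost(["org-a", "org-b", "org-c"]))
total.merge(items_cost([]))  # an empty result still reports a cost of 0.0
```

Reporting `0.0` for an empty result, rather than skipping the call, keeps "no results" distinguishable from "cost never reported", which is the distinction the empty-list test in `block_cost_tracking_test.py` checks.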
--- a/autogpt_platform/backend/backend/blocks/block_cost_tracking_test.py
+++ b/autogpt_platform/backend/backend/blocks/block_cost_tracking_test.py
@@ -0,0 +1,712 @@
+"""Unit tests for merge_stats cost tracking in individual blocks.
+
+Covers the exa code_context, exa contents, and apollo organization blocks
+to verify provider cost is correctly extracted and reported.
+"""
+
+from unittest.mock import AsyncMock, MagicMock, patch
+
+import pytest
+from pydantic import SecretStr
+
+from backend.data.model import APIKeyCredentials, NodeExecutionStats
+
+# ---------------------------------------------------------------------------
+# Helpers
+# ---------------------------------------------------------------------------
+
+TEST_EXA_CREDENTIALS = APIKeyCredentials(
+    id="01234567-89ab-cdef-0123-456789abcdef",
+    provider="exa",
+    api_key=SecretStr("mock-exa-api-key"),
+    title="Mock Exa API key",
+    expires_at=None,
+)
+
+TEST_EXA_CREDENTIALS_INPUT = {
+    "provider": TEST_EXA_CREDENTIALS.provider,
+    "id": TEST_EXA_CREDENTIALS.id,
+    "type": TEST_EXA_CREDENTIALS.type,
+    "title": TEST_EXA_CREDENTIALS.title,
+}
+
+
+# ---------------------------------------------------------------------------
+# ExaCodeContextBlock — cost_dollars is a string like "0.005"
+# ---------------------------------------------------------------------------
+
+
+class TestExaCodeContextBlockCostTracking:
+    @pytest.mark.asyncio
+    async def test_merge_stats_called_with_float_cost(self):
+        """float(cost_dollars) parsed from API string and passed to merge_stats."""
+        from backend.blocks.exa.code_context import ExaCodeContextBlock
+
+        block = ExaCodeContextBlock()
+
+        api_response = {
+            "requestId": "req-1",
+            "query": "how to use hooks",
+            "response": "Here are some examples...",
+            "resultsCount": 3,
+            "costDollars": "0.005",
+            "searchTime": 1.2,
+            "outputTokens": 100,
+        }
+
+        mock_resp = MagicMock()
+        mock_resp.json.return_value = api_response
+
+        accumulated: list[NodeExecutionStats] = []
+
+        with (
+            patch(
+                "backend.blocks.exa.code_context.Requests.post",
+                new_callable=AsyncMock,
+                return_value=mock_resp,
+            ),
+            patch.object(
+                block, "merge_stats", side_effect=lambda s: accumulated.append(s)
+            ),
+        ):
+            input_data = ExaCodeContextBlock.Input(
+                query="how to use hooks",
+                credentials=TEST_EXA_CREDENTIALS_INPUT,  # type: ignore[arg-type]
+            )
+            results = []
+            async for output in block.run(
+                input_data,
+                credentials=TEST_EXA_CREDENTIALS,
+            ):
+                results.append(output)
+
+        assert len(accumulated) == 1
+        assert accumulated[0].provider_cost == pytest.approx(0.005)
+
+    @pytest.mark.asyncio
+    async def test_invalid_cost_dollars_does_not_raise(self):
+        """When cost_dollars cannot be parsed as float, merge_stats is not called."""
+        from backend.blocks.exa.code_context import ExaCodeContextBlock
+
+        block = ExaCodeContextBlock()
+
+        api_response = {
+            "requestId": "req-2",
+            "query": "query",
+            "response": "response",
+            "resultsCount": 0,
+            "costDollars": "N/A",
+            "searchTime": 0.5,
+            "outputTokens": 0,
+        }
+
+        mock_resp = MagicMock()
+        mock_resp.json.return_value = api_response
+
+        merge_calls: list[NodeExecutionStats] = []
+
+        with (
+            patch(
+                "backend.blocks.exa.code_context.Requests.post",
+                new_callable=AsyncMock,
+                return_value=mock_resp,
+            ),
+            patch.object(
+                block, "merge_stats", side_effect=lambda s: merge_calls.append(s)
+            ),
+        ):
+            input_data = ExaCodeContextBlock.Input(
+                query="query",
+                credentials=TEST_EXA_CREDENTIALS_INPUT,  # type: ignore[arg-type]
+            )
+            async for _ in block.run(
+                input_data,
+                credentials=TEST_EXA_CREDENTIALS,
+            ):
+                pass
+
+        assert merge_calls == []
+
+    @pytest.mark.asyncio
+    async def test_zero_cost_is_tracked(self):
+        """A zero cost_dollars string '0.0' should still be recorded."""
+        from backend.blocks.exa.code_context import ExaCodeContextBlock
+
+        block = ExaCodeContextBlock()
+
+        api_response = {
+            "requestId": "req-3",
+            "query": "query",
+            "response": "...",
+            "resultsCount": 1,
+            "costDollars": "0.0",
+            "searchTime": 0.1,
+            "outputTokens": 10,
+        }
+
+        mock_resp = MagicMock()
+        mock_resp.json.return_value = api_response
+
+        accumulated: list[NodeExecutionStats] = []
+
+        with (
+            patch(
+                "backend.blocks.exa.code_context.Requests.post",
+                new_callable=AsyncMock,
+                return_value=mock_resp,
+            ),
+            patch.object(
+                block, "merge_stats", side_effect=lambda s: accumulated.append(s)
+            ),
+        ):
+            input_data = ExaCodeContextBlock.Input(
+                query="query",
+                credentials=TEST_EXA_CREDENTIALS_INPUT,  # type: ignore[arg-type]
+            )
+            async for _ in block.run(
+                input_data,
+                credentials=TEST_EXA_CREDENTIALS,
+            ):
+                pass
+
+        assert len(accumulated) == 1
+        assert accumulated[0].provider_cost == 0.0
+
+
+# ---------------------------------------------------------------------------
+# ExaContentsBlock — response.cost_dollars.total (CostDollars model)
+# ---------------------------------------------------------------------------
+
+
+class TestExaContentsBlockCostTracking:
+    @pytest.mark.asyncio
+    async def test_merge_stats_called_with_cost_dollars_total(self):
+        """provider_cost equals response.cost_dollars.total when present."""
+        from backend.blocks.exa.contents import ExaContentsBlock
+        from backend.blocks.exa.helpers import CostDollars
+
+        block = ExaContentsBlock()
+
+        cost_dollars = CostDollars(total=0.012)
+
+        mock_response = MagicMock()
+        mock_response.results = []
+        mock_response.context = None
+        mock_response.statuses = None
+        mock_response.cost_dollars = cost_dollars
+
+        accumulated: list[NodeExecutionStats] = []
+
+        with (
+            patch(
+                "backend.blocks.exa.contents.AsyncExa",
+                return_value=MagicMock(
+                    get_contents=AsyncMock(return_value=mock_response)
+                ),
+            ),
+            patch.object(
+                block, "merge_stats", side_effect=lambda s: accumulated.append(s)
+            ),
+        ):
+            input_data = ExaContentsBlock.Input(
+                urls=["https://example.com"],
+                credentials=TEST_EXA_CREDENTIALS_INPUT,  # type: ignore[arg-type]
+            )
+            async for _ in block.run(
+                input_data,
+                credentials=TEST_EXA_CREDENTIALS,
+            ):
+                pass
+
+        assert len(accumulated) == 1
+        assert accumulated[0].provider_cost == pytest.approx(0.012)
+
+    @pytest.mark.asyncio
+    async def test_no_merge_stats_when_cost_dollars_absent(self):
+        """When response.cost_dollars is None, merge_stats is not called."""
+        from backend.blocks.exa.contents import ExaContentsBlock
+
+        block = ExaContentsBlock()
+
+        mock_response = MagicMock()
+        mock_response.results = []
+        mock_response.context = None
+        mock_response.statuses = None
+        mock_response.cost_dollars = None
+
+        accumulated: list[NodeExecutionStats] = []
+
+        with (
+            patch(
+                "backend.blocks.exa.contents.AsyncExa",
+                return_value=MagicMock(
+                    get_contents=AsyncMock(return_value=mock_response)
+                ),
+            ),
+            patch.object(
+                block, "merge_stats", side_effect=lambda s: accumulated.append(s)
+            ),
+        ):
+            input_data = ExaContentsBlock.Input(
+                urls=["https://example.com"],
+                credentials=TEST_EXA_CREDENTIALS_INPUT,  # type: ignore[arg-type]
+            )
+            async for _ in block.run(
+                input_data,
+                credentials=TEST_EXA_CREDENTIALS,
+            ):
+                pass
+
+        assert accumulated == []
+
+
+# ---------------------------------------------------------------------------
+# SearchOrganizationsBlock — provider_cost = float(len(organizations))
+# ---------------------------------------------------------------------------
+
+
+class TestSearchOrganizationsBlockCostTracking:
+    @pytest.mark.asyncio
+    async def test_merge_stats_called_with_org_count(self):
+        """provider_cost == number of returned organizations, type == 'items'."""
+        from backend.blocks.apollo._auth import TEST_CREDENTIALS as APOLLO_CREDS
+        from backend.blocks.apollo._auth import (
+            TEST_CREDENTIALS_INPUT as APOLLO_CREDS_INPUT,
+        )
+        from backend.blocks.apollo.models import Organization
+        from backend.blocks.apollo.organization import SearchOrganizationsBlock
+
+        block = SearchOrganizationsBlock()
+
+        fake_orgs = [Organization(id=str(i), name=f"Org{i}") for i in range(3)]
+
+        accumulated: list[NodeExecutionStats] = []
+
+        with (
+            patch.object(
+                SearchOrganizationsBlock,
+                "search_organizations",
+                new_callable=AsyncMock,
+                return_value=fake_orgs,
+            ),
+            patch.object(
+                block, "merge_stats", side_effect=lambda s: accumulated.append(s)
+            ),
+        ):
+            input_data = SearchOrganizationsBlock.Input(
+                credentials=APOLLO_CREDS_INPUT,  # type: ignore[arg-type]
+            )
+            results = []
+            async for output in block.run(
+                input_data,
+                credentials=APOLLO_CREDS,
+            ):
+                results.append(output)
+
+        assert len(accumulated) == 1
+        assert accumulated[0].provider_cost == pytest.approx(3.0)
+        assert accumulated[0].provider_cost_type == "items"
+
+    @pytest.mark.asyncio
+    async def test_empty_org_list_tracks_zero(self):
+        """An empty organization list results in provider_cost=0.0."""
+        from backend.blocks.apollo._auth import TEST_CREDENTIALS as APOLLO_CREDS
+        from backend.blocks.apollo._auth import (
+            TEST_CREDENTIALS_INPUT as APOLLO_CREDS_INPUT,
+        )
+        from backend.blocks.apollo.organization import SearchOrganizationsBlock
+
+        block = SearchOrganizationsBlock()
+        accumulated: list[NodeExecutionStats] = []
+
+        with (
+            patch.object(
+                SearchOrganizationsBlock,
+                "search_organizations",
+                new_callable=AsyncMock,
+                return_value=[],
+            ),
+            patch.object(
+                block, "merge_stats", side_effect=lambda s: accumulated.append(s)
+            ),
+        ):
+            input_data = SearchOrganizationsBlock.Input(
+                credentials=APOLLO_CREDS_INPUT,  # type: ignore[arg-type]
+            )
+            async for _ in block.run(
+                input_data,
+                credentials=APOLLO_CREDS,
+            ):
+                pass
+
+        assert len(accumulated) == 1
+        assert accumulated[0].provider_cost == 0.0
+        assert accumulated[0].provider_cost_type == "items"
+
+
+# ---------------------------------------------------------------------------
+# JinaEmbeddingBlock — token count from usage.total_tokens
+# ---------------------------------------------------------------------------
+
+
+class TestJinaEmbeddingBlockCostTracking:
+    @pytest.mark.asyncio
+    async def test_merge_stats_called_with_token_count(self):
+        """provider token count is recorded when API returns usage.total_tokens."""
+        from backend.blocks.jina._auth import TEST_CREDENTIALS as JINA_CREDS
+        from backend.blocks.jina._auth import TEST_CREDENTIALS_INPUT as JINA_CREDS_INPUT
+        from backend.blocks.jina.embeddings import JinaEmbeddingBlock
+
+        block = JinaEmbeddingBlock()
+
+        api_response = {
+            "data": [{"embedding": [0.1, 0.2, 0.3]}],
+            "usage": {"total_tokens": 42},
+        }
+        mock_resp = MagicMock()
+        mock_resp.json.return_value = api_response
+
+        accumulated: list[NodeExecutionStats] = []
+
+        with (
+            patch(
+                "backend.blocks.jina.embeddings.Requests.post",
+                new_callable=AsyncMock,
+                return_value=mock_resp,
+            ),
+            patch.object(
+                block, "merge_stats", side_effect=lambda s: accumulated.append(s)
+            ),
+        ):
+            input_data = JinaEmbeddingBlock.Input(
+                texts=["hello world"],
+                credentials=JINA_CREDS_INPUT,  # type: ignore[arg-type]
+            )
+            async for _ in block.run(input_data, credentials=JINA_CREDS):
+                pass
+
+        assert len(accumulated) == 1
+        assert accumulated[0].input_token_count == 42
+
+    @pytest.mark.asyncio
+    async def test_no_merge_stats_when_usage_absent(self):
+        """When the API response omits the usage field, merge_stats is not called."""
+        from backend.blocks.jina._auth import TEST_CREDENTIALS as JINA_CREDS
+        from backend.blocks.jina._auth import TEST_CREDENTIALS_INPUT as JINA_CREDS_INPUT
+        from backend.blocks.jina.embeddings import JinaEmbeddingBlock
+
+        block = JinaEmbeddingBlock()
+
+        api_response = {
+            "data": [{"embedding": [0.1, 0.2, 0.3]}],
+        }
+        mock_resp = MagicMock()
+        mock_resp.json.return_value = api_response
+
+        accumulated: list[NodeExecutionStats] = []
+
+        with (
+            patch(
+                "backend.blocks.jina.embeddings.Requests.post",
+                new_callable=AsyncMock,
+                return_value=mock_resp,
+            ),
+            patch.object(
+                block, "merge_stats", side_effect=lambda s: accumulated.append(s)
+            ),
+        ):
+            input_data = JinaEmbeddingBlock.Input(
+                texts=["hello"],
+                credentials=JINA_CREDS_INPUT,  # type: ignore[arg-type]
+            )
+            async for _ in block.run(input_data, credentials=JINA_CREDS):
+                pass
+
+        assert accumulated == []
+
+
+# ---------------------------------------------------------------------------
+# UnrealTextToSpeechBlock — character count from input text length
+# ---------------------------------------------------------------------------
+
+
+class TestUnrealTextToSpeechBlockCostTracking:
+    @pytest.mark.asyncio
+    async def test_merge_stats_called_with_character_count(self):
+        """provider_cost equals len(text) with type='characters'."""
+        from backend.blocks.text_to_speech_block import TEST_CREDENTIALS as TTS_CREDS
+        from backend.blocks.text_to_speech_block import (
+            TEST_CREDENTIALS_INPUT as TTS_CREDS_INPUT,
+        )
+        from backend.blocks.text_to_speech_block import UnrealTextToSpeechBlock
+
+        block = UnrealTextToSpeechBlock()
+        test_text = "Hello, world!"
+
+        with (
+            patch.object(
+                UnrealTextToSpeechBlock,
+                "call_unreal_speech_api",
+                new_callable=AsyncMock,
+                return_value={"OutputUri": "https://example.com/audio.mp3"},
+            ),
+            patch.object(block, "merge_stats") as mock_merge,
+        ):
+            input_data = UnrealTextToSpeechBlock.Input(
+                text=test_text,
+                credentials=TTS_CREDS_INPUT,  # type: ignore[arg-type]
+            )
+            async for _ in block.run(input_data, credentials=TTS_CREDS):
+                pass
+
+        mock_merge.assert_called_once()
+        stats = mock_merge.call_args[0][0]
+        assert stats.provider_cost == float(len(test_text))
+        assert stats.provider_cost_type == "characters"
+
+    @pytest.mark.asyncio
+    async def test_empty_text_gives_zero_characters(self):
+        """An empty text string results in provider_cost=0.0."""
+        from backend.blocks.text_to_speech_block import TEST_CREDENTIALS as TTS_CREDS
+        from backend.blocks.text_to_speech_block import (
+            TEST_CREDENTIALS_INPUT as TTS_CREDS_INPUT,
+        )
+        from backend.blocks.text_to_speech_block import UnrealTextToSpeechBlock
+
+        block = UnrealTextToSpeechBlock()
+
+        with (
+            patch.object(
+                UnrealTextToSpeechBlock,
+                "call_unreal_speech_api",
+                new_callable=AsyncMock,
+                return_value={"OutputUri": "https://example.com/audio.mp3"},
+            ),
+            patch.object(block, "merge_stats") as mock_merge,
+        ):
+            input_data = UnrealTextToSpeechBlock.Input(
+                text="",
+                credentials=TTS_CREDS_INPUT,  # type: ignore[arg-type]
+            )
+            async for _ in block.run(input_data, credentials=TTS_CREDS):
+                pass
+
+        mock_merge.assert_called_once()
+        stats = mock_merge.call_args[0][0]
+        assert stats.provider_cost == 0.0
+        assert stats.provider_cost_type == "characters"
+
+
+# ---------------------------------------------------------------------------
+# GoogleMapsSearchBlock — item count from search_places results
+# ---------------------------------------------------------------------------
+
+
+class TestGoogleMapsSearchBlockCostTracking:
+    @pytest.mark.asyncio
+    async def test_merge_stats_called_with_place_count(self):
+        """provider_cost equals number of returned places, type == 'items'."""
+        from backend.blocks.google_maps import TEST_CREDENTIALS as MAPS_CREDS
+        from backend.blocks.google_maps import (
+            TEST_CREDENTIALS_INPUT as MAPS_CREDS_INPUT,
+        )
+        from backend.blocks.google_maps import GoogleMapsSearchBlock
+
+        block = GoogleMapsSearchBlock()
+
+        fake_places = [{"name": f"Place{i}", "address": f"Addr{i}"} for i in range(4)]
+        accumulated: list[NodeExecutionStats] = []
+
+        with (
+            patch.object(
+                GoogleMapsSearchBlock,
+                "search_places",
+                return_value=fake_places,
+            ),
+            patch.object(
+                block, "merge_stats", side_effect=lambda s: accumulated.append(s)
+            ),
+        ):
+            input_data = GoogleMapsSearchBlock.Input(
+                query="coffee shops",
+                credentials=MAPS_CREDS_INPUT,  # type: ignore[arg-type]
+            )
+            async for _ in block.run(input_data, credentials=MAPS_CREDS):
+                pass
+
+        assert len(accumulated) == 1
+        assert accumulated[0].provider_cost == 4.0
+        assert accumulated[0].provider_cost_type == "items"
+
+    @pytest.mark.asyncio
+    async def test_empty_results_tracks_zero(self):
+        """Zero places returned results in provider_cost=0.0."""
+        from backend.blocks.google_maps import TEST_CREDENTIALS as MAPS_CREDS
+        from backend.blocks.google_maps import (
+            TEST_CREDENTIALS_INPUT as MAPS_CREDS_INPUT,
+        )
+        from backend.blocks.google_maps import GoogleMapsSearchBlock
+
+        block = GoogleMapsSearchBlock()
+        accumulated: list[NodeExecutionStats] = []
+
+        with (
+            patch.object(
+                GoogleMapsSearchBlock,
+                "search_places",
+                return_value=[],
+            ),
+            patch.object(
+                block, "merge_stats", side_effect=lambda s: accumulated.append(s)
+            ),
+        ):
+            input_data = GoogleMapsSearchBlock.Input(
+                query="nothing here",
+                credentials=MAPS_CREDS_INPUT,  # type: ignore[arg-type]
+            )
+            async for _ in block.run(input_data, credentials=MAPS_CREDS):
+                pass
+
+        assert len(accumulated) == 1
+        assert accumulated[0].provider_cost == 0.0
+        assert accumulated[0].provider_cost_type == "items"
+
+
+# ---------------------------------------------------------------------------
+# AddLeadToCampaignBlock (SmartLead) — item count from lead_list length
+# ---------------------------------------------------------------------------
+
+
+class TestSmartLeadAddLeadsBlockCostTracking:
+    @pytest.mark.asyncio
+    async def test_merge_stats_called_with_lead_count(self):
+        """provider_cost equals number of leads uploaded, type == 'items'."""
+        from backend.blocks.smartlead._auth import TEST_CREDENTIALS as SL_CREDS
+        from backend.blocks.smartlead._auth import (
+            TEST_CREDENTIALS_INPUT as SL_CREDS_INPUT,
+        )
+        from backend.blocks.smartlead.campaign import AddLeadToCampaignBlock
+        from backend.blocks.smartlead.models import (
+            AddLeadsToCampaignResponse,
+            LeadInput,
+        )
+
+        block = AddLeadToCampaignBlock()
+
+        fake_leads = [
+            LeadInput(first_name="Alice", last_name="A", email="alice@example.com"),
+            LeadInput(first_name="Bob", last_name="B", email="bob@example.com"),
+        ]
+        fake_response = AddLeadsToCampaignResponse(
+            ok=True,
+            upload_count=2,
+            total_leads=2,
+            block_count=0,
+            duplicate_count=0,
+            invalid_email_count=0,
+            invalid_emails=[],
+            already_added_to_campaign=0,
+            unsubscribed_leads=[],
+            is_lead_limit_exhausted=False,
+            lead_import_stopped_count=0,
+            bounce_count=0,
+        )
+        accumulated: list[NodeExecutionStats] = []
+
+        with (
+            patch.object(
+                AddLeadToCampaignBlock,
+                "add_leads_to_campaign",
+                new_callable=AsyncMock,
+                return_value=fake_response,
+            ),
+            patch.object(
+                block, "merge_stats", side_effect=lambda s: accumulated.append(s)
+            ),
+        ):
+            input_data = AddLeadToCampaignBlock.Input(
+                campaign_id=123,
+                lead_list=fake_leads,
+                credentials=SL_CREDS_INPUT,  # type: ignore[arg-type]
+            )
+            async for _ in block.run(input_data, credentials=SL_CREDS):
+                pass
+
+        assert len(accumulated) == 1
+        assert accumulated[0].provider_cost == 2.0
+        assert accumulated[0].provider_cost_type == "items"
+
+
+# ---------------------------------------------------------------------------
+# SearchPeopleBlock — item count from people list length
+# ---------------------------------------------------------------------------
+
+
+class TestSearchPeopleBlockCostTracking:
+    @pytest.mark.asyncio
+    async def test_merge_stats_called_with_people_count(self):
+        """provider_cost equals number of returned people, type == 'items'."""
+        from backend.blocks.apollo._auth import TEST_CREDENTIALS as APOLLO_CREDS
+        from backend.blocks.apollo._auth import (
+            TEST_CREDENTIALS_INPUT as APOLLO_CREDS_INPUT,
+        )
+        from backend.blocks.apollo.models import Contact
+        from backend.blocks.apollo.people import SearchPeopleBlock
+
+        block = SearchPeopleBlock()
+        fake_people = [Contact(id=str(i), first_name=f"Person{i}") for i in range(5)]
+        accumulated: list[NodeExecutionStats] = []
+
+        with (
+            patch.object(
+                SearchPeopleBlock,
+                "search_people",
+                new_callable=AsyncMock,
+                return_value=fake_people,
+            ),
+            patch.object(
+                block, "merge_stats", side_effect=lambda s: accumulated.append(s)
+            ),
+        ):
+            input_data = SearchPeopleBlock.Input(
+                credentials=APOLLO_CREDS_INPUT,  # type: ignore[arg-type]
+            )
+            async for _ in block.run(input_data, credentials=APOLLO_CREDS):
+                pass
+
+        assert len(accumulated) == 1
+        assert accumulated[0].provider_cost == pytest.approx(5.0)
+        assert accumulated[0].provider_cost_type == "items"
+
+    @pytest.mark.asyncio
+    async def test_empty_people_list_tracks_zero(self):
+        """An empty people list results in provider_cost=0.0."""
+        from backend.blocks.apollo._auth import TEST_CREDENTIALS as APOLLO_CREDS
+        from backend.blocks.apollo._auth import (
+            TEST_CREDENTIALS_INPUT as APOLLO_CREDS_INPUT,
+        )
+        from backend.blocks.apollo.people import SearchPeopleBlock
+
+        block = SearchPeopleBlock()
+        accumulated: list[NodeExecutionStats] = []
+
+        with (
+            patch.object(
+                SearchPeopleBlock,
+                "search_people",
+                new_callable=AsyncMock,
+                return_value=[],
+            ),
+            patch.object(
+                block, "merge_stats", side_effect=lambda s: accumulated.append(s)
+            ),
+        ):
+            input_data = SearchPeopleBlock.Input(
+                credentials=APOLLO_CREDS_INPUT,  # type: ignore[arg-type]
+            )
+            async for _ in block.run(input_data, credentials=APOLLO_CREDS):
+                pass
+
+        assert len(accumulated) == 1
+        assert accumulated[0].provider_cost == 0.0
+        assert accumulated[0].provider_cost_type == "items"
--- a/autogpt_platform/backend/backend/blocks/exa/code_context.py
+++ b/autogpt_platform/backend/backend/blocks/exa/code_context.py
@@ -9,6 +9,7 @@ from typing import Union

 from pydantic import BaseModel

+from backend.data.model import NodeExecutionStats
 from backend.sdk import (
    APIKeyCredentials,
    Block,
@@ -116,3 +117,10 @@ class ExaCodeContextBlock(Block):
        yield "cost_dollars", context.cost_dollars
        yield "search_time", context.search_time
        yield "output_tokens", context.output_tokens
+
+        # Parse cost_dollars (the API returns it as a string, e.g. "0.005")
+        try:
+            cost_usd = float(context.cost_dollars)
+            self.merge_stats(NodeExecutionStats(provider_cost=cost_usd))
+        except (ValueError, TypeError):
+            pass
--- a/autogpt_platform/backend/backend/blocks/exa/contents.py
+++ b/autogpt_platform/backend/backend/blocks/exa/contents.py
@@ -4,6 +4,7 @@ from typing import Optional
 from exa_py import AsyncExa
 from pydantic import BaseModel

+from backend.data.model import NodeExecutionStats
 from backend.sdk import (
    APIKeyCredentials,
    Block,
@@ -223,3 +224,6 @@ class ExaContentsBlock(Block):

        if response.cost_dollars:
            yield "cost_dollars", response.cost_dollars
+            self.merge_stats(
+                NodeExecutionStats(provider_cost=response.cost_dollars.total)
+            )
--- a/autogpt_platform/backend/backend/blocks/exa/cost_tracking_test.py
+++ b/autogpt_platform/backend/backend/blocks/exa/cost_tracking_test.py
@@ -0,0 +1,575 @@
+"""Tests for cost tracking in Exa blocks.
+
+Covers the cost_dollars → provider_cost → merge_stats path for the Exa
+code-context, contents, search, find-similar, and research blocks.
+"""
+
+from unittest.mock import AsyncMock, MagicMock, patch
+
+import pytest
+
+from backend.blocks.exa._test import TEST_CREDENTIALS, TEST_CREDENTIALS_INPUT
+from backend.data.model import NodeExecutionStats
+
+
+class TestExaCodeContextCostTracking:
+    """ExaCodeContextBlock parses cost_dollars (string) and calls merge_stats."""
+
+    @pytest.mark.asyncio
+    async def test_valid_cost_string_is_parsed_and_merged(self):
+        """A numeric cost string like '0.005' is merged as provider_cost."""
+        from backend.blocks.exa.code_context import ExaCodeContextBlock
+
+        block = ExaCodeContextBlock()
+        merged: list[NodeExecutionStats] = []
+        block.merge_stats = lambda s: merged.append(s)  # type: ignore[assignment]
+
+        api_response = {
+            "requestId": "req-1",
+            "query": "test query",
+            "response": "some code",
+            "resultsCount": 3,
+            "costDollars": "0.005",
+            "searchTime": 1.2,
+            "outputTokens": 100,
+        }
+
+        with patch("backend.blocks.exa.code_context.Requests") as mock_requests_cls:
+            mock_resp = MagicMock()
+            mock_resp.json.return_value = api_response
+            mock_requests_cls.return_value.post = AsyncMock(return_value=mock_resp)
+
+            outputs = []
+            async for key, value in block.run(
+                block.Input(query="test query", credentials=TEST_CREDENTIALS_INPUT),  # type: ignore[arg-type]
+                credentials=TEST_CREDENTIALS,
+            ):
+                outputs.append((key, value))
+
+        assert any(k == "cost_dollars" for k, _ in outputs)
+        assert len(merged) == 1
+        assert merged[0].provider_cost == pytest.approx(0.005)
+
+    @pytest.mark.asyncio
+    async def test_invalid_cost_string_does_not_raise(self):
+        """A non-numeric cost_dollars value is swallowed silently."""
+        from backend.blocks.exa.code_context import ExaCodeContextBlock
+
+        block = ExaCodeContextBlock()
+        merged: list[NodeExecutionStats] = []
+        block.merge_stats = lambda s: merged.append(s)  # type: ignore[assignment]
+
+        api_response = {
+            "requestId": "req-2",
+            "query": "test",
+            "response": "code",
+            "resultsCount": 0,
+            "costDollars": "N/A",
+            "searchTime": 0.5,
+            "outputTokens": 0,
+        }
+
+        with patch("backend.blocks.exa.code_context.Requests") as mock_requests_cls:
+            mock_resp = MagicMock()
+            mock_resp.json.return_value = api_response
+            mock_requests_cls.return_value.post = AsyncMock(return_value=mock_resp)
+
+            outputs = []
+            async for key, value in block.run(
+                block.Input(query="test", credentials=TEST_CREDENTIALS_INPUT),  # type: ignore[arg-type]
+                credentials=TEST_CREDENTIALS,
+            ):
+                outputs.append((key, value))
+
+        # No merge_stats call because float() raised ValueError
+        assert len(merged) == 0
+
+    @pytest.mark.asyncio
+    async def test_zero_cost_string_is_merged(self):
+        """'0.0' is a valid cost — should still be tracked."""
+        from backend.blocks.exa.code_context import ExaCodeContextBlock
+
+        block = ExaCodeContextBlock()
+        merged: list[NodeExecutionStats] = []
+        block.merge_stats = lambda s: merged.append(s)  # type: ignore[assignment]
+
+        api_response = {
+            "requestId": "req-3",
+            "query": "free query",
+            "response": "result",
+            "resultsCount": 1,
+            "costDollars": "0.0",
+            "searchTime": 0.1,
+            "outputTokens": 10,
+        }
+
+        with patch("backend.blocks.exa.code_context.Requests") as mock_requests_cls:
+            mock_resp = MagicMock()
+            mock_resp.json.return_value = api_response
+            mock_requests_cls.return_value.post = AsyncMock(return_value=mock_resp)
+
+            async for _ in block.run(
+                block.Input(query="free query", credentials=TEST_CREDENTIALS_INPUT),  # type: ignore[arg-type]
+                credentials=TEST_CREDENTIALS,
+            ):
+                pass
+
+        assert len(merged) == 1
+        assert merged[0].provider_cost == pytest.approx(0.0)
+
+
+class TestExaContentsCostTracking:
+    """ExaContentsBlock merges cost_dollars.total as provider_cost."""
+
+    @pytest.mark.asyncio
+    async def test_cost_dollars_total_is_merged(self):
+        """When the SDK response includes cost_dollars, its total is merged."""
+        from backend.blocks.exa.contents import ExaContentsBlock
+        from backend.blocks.exa.helpers import CostDollars
+
+        block = ExaContentsBlock()
+        merged: list[NodeExecutionStats] = []
+        block.merge_stats = lambda s: merged.append(s)  # type: ignore[assignment]
+
+        mock_sdk_response = MagicMock()
+        mock_sdk_response.results = []
+        mock_sdk_response.context = None
+        mock_sdk_response.statuses = None
+        mock_sdk_response.cost_dollars = CostDollars(total=0.012)
+
+        with patch("backend.blocks.exa.contents.AsyncExa") as mock_exa_cls:
+            mock_exa = MagicMock()
+            mock_exa.get_contents = AsyncMock(return_value=mock_sdk_response)
+            mock_exa_cls.return_value = mock_exa
+
+            async for _ in block.run(
+                block.Input(urls=["https://example.com"], credentials=TEST_CREDENTIALS_INPUT),  # type: ignore[arg-type]
+                credentials=TEST_CREDENTIALS,
+            ):
+                pass
+
+        assert len(merged) == 1
+        assert merged[0].provider_cost == pytest.approx(0.012)
+
+    @pytest.mark.asyncio
+    async def test_no_cost_dollars_skips_merge(self):
+        """When cost_dollars is absent, merge_stats is not called."""
+        from backend.blocks.exa.contents import ExaContentsBlock
+
+        block = ExaContentsBlock()
+        merged: list[NodeExecutionStats] = []
+        block.merge_stats = lambda s: merged.append(s)  # type: ignore[assignment]
+
+        mock_sdk_response = MagicMock()
+        mock_sdk_response.results = []
+        mock_sdk_response.context = None
+        mock_sdk_response.statuses = None
+        mock_sdk_response.cost_dollars = None
+
+        with patch("backend.blocks.exa.contents.AsyncExa") as mock_exa_cls:
+            mock_exa = MagicMock()
+            mock_exa.get_contents = AsyncMock(return_value=mock_sdk_response)
+            mock_exa_cls.return_value = mock_exa
+
+            async for _ in block.run(
+                block.Input(urls=["https://example.com"], credentials=TEST_CREDENTIALS_INPUT),  # type: ignore[arg-type]
+                credentials=TEST_CREDENTIALS,
+            ):
+                pass
+
+        assert len(merged) == 0
+
+    @pytest.mark.asyncio
+    async def test_zero_cost_dollars_is_merged(self):
+        """A total of 0.0 (free tier) should still be merged."""
+        from backend.blocks.exa.contents import ExaContentsBlock
+        from backend.blocks.exa.helpers import CostDollars
+
+        block = ExaContentsBlock()
+        merged: list[NodeExecutionStats] = []
+        block.merge_stats = lambda s: merged.append(s)  # type: ignore[assignment]
+
+        mock_sdk_response = MagicMock()
+        mock_sdk_response.results = []
+        mock_sdk_response.context = None
+        mock_sdk_response.statuses = None
+        mock_sdk_response.cost_dollars = CostDollars(total=0.0)
+
+        with patch("backend.blocks.exa.contents.AsyncExa") as mock_exa_cls:
+            mock_exa = MagicMock()
+            mock_exa.get_contents = AsyncMock(return_value=mock_sdk_response)
+            mock_exa_cls.return_value = mock_exa
+
+            async for _ in block.run(
+                block.Input(urls=["https://example.com"], credentials=TEST_CREDENTIALS_INPUT),  # type: ignore[arg-type]
+                credentials=TEST_CREDENTIALS,
+            ):
+                pass
+
+        assert len(merged) == 1
+        assert merged[0].provider_cost == pytest.approx(0.0)
+
+
+class TestExaSearchCostTracking:
+    """ExaSearchBlock merges cost_dollars.total as provider_cost."""
+
+    @pytest.mark.asyncio
+    async def test_cost_dollars_total_is_merged(self):
+        """When the SDK response includes cost_dollars, its total is merged."""
+        from backend.blocks.exa.helpers import CostDollars
+        from backend.blocks.exa.search import ExaSearchBlock
+
+        block = ExaSearchBlock()
+        merged: list[NodeExecutionStats] = []
+        block.merge_stats = lambda s: merged.append(s)  # type: ignore[assignment]
+
+        mock_sdk_response = MagicMock()
+        mock_sdk_response.results = []
+        mock_sdk_response.context = None
+        mock_sdk_response.resolved_search_type = None
+        mock_sdk_response.cost_dollars = CostDollars(total=0.008)
+
+        with patch("backend.blocks.exa.search.AsyncExa") as mock_exa_cls:
+            mock_exa = MagicMock()
+            mock_exa.search = AsyncMock(return_value=mock_sdk_response)
+            mock_exa_cls.return_value = mock_exa
+
+            async for _ in block.run(
+                block.Input(query="test query", credentials=TEST_CREDENTIALS_INPUT),  # type: ignore[arg-type]
+                credentials=TEST_CREDENTIALS,
+            ):
+                pass
+
+        assert len(merged) == 1
+        assert merged[0].provider_cost == pytest.approx(0.008)
+
+    @pytest.mark.asyncio
+    async def test_no_cost_dollars_skips_merge(self):
+        """When cost_dollars is absent, merge_stats is not called."""
+        from backend.blocks.exa.search import ExaSearchBlock
+
+        block = ExaSearchBlock()
+        merged: list[NodeExecutionStats] = []
+        block.merge_stats = lambda s: merged.append(s)  # type: ignore[assignment]
+
+        mock_sdk_response = MagicMock()
+        mock_sdk_response.results = []
+        mock_sdk_response.context = None
+        mock_sdk_response.resolved_search_type = None
+        mock_sdk_response.cost_dollars = None
+
+        with patch("backend.blocks.exa.search.AsyncExa") as mock_exa_cls:
+            mock_exa = MagicMock()
+            mock_exa.search = AsyncMock(return_value=mock_sdk_response)
+            mock_exa_cls.return_value = mock_exa
+
+            async for _ in block.run(
+                block.Input(query="test query", credentials=TEST_CREDENTIALS_INPUT),  # type: ignore[arg-type]
+                credentials=TEST_CREDENTIALS,
+            ):
+                pass
+
+        assert len(merged) == 0
+
+
+class TestExaSimilarCostTracking:
+    """ExaFindSimilarBlock merges cost_dollars.total as provider_cost."""
+
+    @pytest.mark.asyncio
+    async def test_cost_dollars_total_is_merged(self):
+        """When the SDK response includes cost_dollars, its total is merged."""
+        from backend.blocks.exa.helpers import CostDollars
+        from backend.blocks.exa.similar import ExaFindSimilarBlock
+
+        block = ExaFindSimilarBlock()
+        merged: list[NodeExecutionStats] = []
+        block.merge_stats = lambda s: merged.append(s)  # type: ignore[assignment]
+
+        mock_sdk_response = MagicMock()
+        mock_sdk_response.results = []
+        mock_sdk_response.context = None
+        mock_sdk_response.request_id = "req-1"
+        mock_sdk_response.cost_dollars = CostDollars(total=0.015)
+
+        with patch("backend.blocks.exa.similar.AsyncExa") as mock_exa_cls:
+            mock_exa = MagicMock()
+            mock_exa.find_similar = AsyncMock(return_value=mock_sdk_response)
+            mock_exa_cls.return_value = mock_exa
+
+            async for _ in block.run(
+                block.Input(url="https://example.com", credentials=TEST_CREDENTIALS_INPUT),  # type: ignore[arg-type]
+                credentials=TEST_CREDENTIALS,
+            ):
+                pass
+
+        assert len(merged) == 1
+        assert merged[0].provider_cost == pytest.approx(0.015)
+
+    @pytest.mark.asyncio
+    async def test_no_cost_dollars_skips_merge(self):
+        """When cost_dollars is absent, merge_stats is not called."""
+        from backend.blocks.exa.similar import ExaFindSimilarBlock
+
+        block = ExaFindSimilarBlock()
+        merged: list[NodeExecutionStats] = []
+        block.merge_stats = lambda s: merged.append(s)  # type: ignore[assignment]
+
+        mock_sdk_response = MagicMock()
+        mock_sdk_response.results = []
+        mock_sdk_response.context = None
+        mock_sdk_response.request_id = "req-2"
+        mock_sdk_response.cost_dollars = None
+
+        with patch("backend.blocks.exa.similar.AsyncExa") as mock_exa_cls:
+            mock_exa = MagicMock()
+            mock_exa.find_similar = AsyncMock(return_value=mock_sdk_response)
+            mock_exa_cls.return_value = mock_exa
+
+            async for _ in block.run(
+                block.Input(url="https://example.com", credentials=TEST_CREDENTIALS_INPUT),  # type: ignore[arg-type]
+                credentials=TEST_CREDENTIALS,
+            ):
+                pass
+
+        assert len(merged) == 0
+
+
+# ---------------------------------------------------------------------------
+# ExaCreateResearchBlock — cost_dollars from completed poll response
+# ---------------------------------------------------------------------------
+
+
+COMPLETED_RESEARCH_RESPONSE = {
+    "researchId": "test-research-id",
+    "status": "completed",
+    "model": "exa-research",
+    "instructions": "test instructions",
+    "createdAt": 1700000000000,
+    "finishedAt": 1700000060000,
+    "costDollars": {
+        "total": 0.05,
+        "numSearches": 3,
+        "numPages": 10,
+        "reasoningTokens": 500,
+    },
+    "output": {"content": "Research findings...", "parsed": None},
+}
+
+PENDING_RESEARCH_RESPONSE = {
+    "researchId": "test-research-id",
+    "status": "pending",
+    "model": "exa-research",
+    "instructions": "test instructions",
+    "createdAt": 1700000000000,
+}
+
+
+class TestExaCreateResearchBlockCostTracking:
+    """ExaCreateResearchBlock merges cost from completed poll response."""
+
+    @pytest.mark.asyncio
+    async def test_cost_merged_when_research_completes(self):
+        """merge_stats called with provider_cost=total when poll returns completed."""
+        from backend.blocks.exa.research import ExaCreateResearchBlock
+
+        block = ExaCreateResearchBlock()
+        merged: list[NodeExecutionStats] = []
+        block.merge_stats = lambda s: merged.append(s)  # type: ignore[assignment]
+
+        create_resp = MagicMock()
+        create_resp.json.return_value = PENDING_RESEARCH_RESPONSE
+
+        poll_resp = MagicMock()
+        poll_resp.json.return_value = COMPLETED_RESEARCH_RESPONSE
+
+        mock_instance = MagicMock()
+        mock_instance.post = AsyncMock(return_value=create_resp)
+        mock_instance.get = AsyncMock(return_value=poll_resp)
+
+        with (
+            patch("backend.blocks.exa.research.Requests", return_value=mock_instance),
+            patch("asyncio.sleep", new=AsyncMock()),
+        ):
+            async for _ in block.run(
+                block.Input(
+                    instructions="test instructions",
+                    wait_for_completion=True,
+                    credentials=TEST_CREDENTIALS_INPUT,  # type: ignore[arg-type]
+                ),
+                credentials=TEST_CREDENTIALS,
+            ):
+                pass
+
+        assert len(merged) == 1
+        assert merged[0].provider_cost == pytest.approx(0.05)
+
+    @pytest.mark.asyncio
+    async def test_no_merge_when_no_cost_dollars(self):
+        """When completed response has no costDollars, merge_stats is not called."""
+        from backend.blocks.exa.research import ExaCreateResearchBlock
+
+        block = ExaCreateResearchBlock()
+        merged: list[NodeExecutionStats] = []
+        block.merge_stats = lambda s: merged.append(s)  # type: ignore[assignment]
+
+        no_cost_response = {**COMPLETED_RESEARCH_RESPONSE, "costDollars": None}
+        create_resp = MagicMock()
+        create_resp.json.return_value = PENDING_RESEARCH_RESPONSE
+        poll_resp = MagicMock()
+        poll_resp.json.return_value = no_cost_response
+
+        mock_instance = MagicMock()
+        mock_instance.post = AsyncMock(return_value=create_resp)
+        mock_instance.get = AsyncMock(return_value=poll_resp)
+
+        with (
+            patch("backend.blocks.exa.research.Requests", return_value=mock_instance),
+            patch("asyncio.sleep", new=AsyncMock()),
+        ):
+            async for _ in block.run(
+                block.Input(
+                    instructions="test instructions",
+                    wait_for_completion=True,
+                    credentials=TEST_CREDENTIALS_INPUT,  # type: ignore[arg-type]
+                ),
+                credentials=TEST_CREDENTIALS,
+            ):
+                pass
+
+        assert merged == []
+
+
+# ---------------------------------------------------------------------------
+# ExaGetResearchBlock — cost_dollars from single GET response
+# ---------------------------------------------------------------------------
+
+
+class TestExaGetResearchBlockCostTracking:
+    """ExaGetResearchBlock merges cost when the fetched research has cost_dollars."""
+
+    @pytest.mark.asyncio
+    async def test_cost_merged_from_completed_research(self):
+        """merge_stats called with provider_cost=total when research has costDollars."""
+        from backend.blocks.exa.research import ExaGetResearchBlock
+
+        block = ExaGetResearchBlock()
+        merged: list[NodeExecutionStats] = []
+        block.merge_stats = lambda s: merged.append(s)  # type: ignore[assignment]
+
+        get_resp = MagicMock()
+        get_resp.json.return_value = COMPLETED_RESEARCH_RESPONSE
+
+        mock_instance = MagicMock()
+        mock_instance.get = AsyncMock(return_value=get_resp)
+
+        with patch("backend.blocks.exa.research.Requests", return_value=mock_instance):
+            async for _ in block.run(
+                block.Input(
+                    research_id="test-research-id",
+                    credentials=TEST_CREDENTIALS_INPUT,  # type: ignore[arg-type]
+                ),
+                credentials=TEST_CREDENTIALS,
+            ):
+                pass
+
+        assert len(merged) == 1
+        assert merged[0].provider_cost == pytest.approx(0.05)
+
+    @pytest.mark.asyncio
+    async def test_no_merge_when_no_cost_dollars(self):
+        """When research has no costDollars, merge_stats is not called."""
+        from backend.blocks.exa.research import ExaGetResearchBlock
+
+        block = ExaGetResearchBlock()
+        merged: list[NodeExecutionStats] = []
+        block.merge_stats = lambda s: merged.append(s)  # type: ignore[assignment]
+
+        no_cost_response = {**COMPLETED_RESEARCH_RESPONSE, "costDollars": None}
+        get_resp = MagicMock()
+        get_resp.json.return_value = no_cost_response
+
+        mock_instance = MagicMock()
+        mock_instance.get = AsyncMock(return_value=get_resp)
+
+        with patch("backend.blocks.exa.research.Requests", return_value=mock_instance):
+            async for _ in block.run(
+                block.Input(
+                    research_id="test-research-id",
+                    credentials=TEST_CREDENTIALS_INPUT,  # type: ignore[arg-type]
+                ),
+                credentials=TEST_CREDENTIALS,
+            ):
+                pass
+
+        assert merged == []
+
+
+# ---------------------------------------------------------------------------
+# ExaWaitForResearchBlock — cost_dollars from polling response
+# ---------------------------------------------------------------------------
+
+
+class TestExaWaitForResearchBlockCostTracking:
+    """ExaWaitForResearchBlock merges cost when the polled research has cost_dollars."""
+
+    @pytest.mark.asyncio
+    async def test_cost_merged_when_research_completes(self):
+        """merge_stats called with provider_cost=total once polling returns completed."""
+        from backend.blocks.exa.research import ExaWaitForResearchBlock
+
+        block = ExaWaitForResearchBlock()
+        merged: list[NodeExecutionStats] = []
+        block.merge_stats = lambda s: merged.append(s)  # type: ignore[assignment]
+
+        poll_resp = MagicMock()
+        poll_resp.json.return_value = COMPLETED_RESEARCH_RESPONSE
+
+        mock_instance = MagicMock()
+        mock_instance.get = AsyncMock(return_value=poll_resp)
+
+        with (
+            patch("backend.blocks.exa.research.Requests", return_value=mock_instance),
+            patch("asyncio.sleep", new=AsyncMock()),
+        ):
+            async for _ in block.run(
+                block.Input(
+                    research_id="test-research-id",
+                    credentials=TEST_CREDENTIALS_INPUT,  # type: ignore[arg-type]
+                ),
+                credentials=TEST_CREDENTIALS,
+            ):
+                pass
+
+        assert len(merged) == 1
+        assert merged[0].provider_cost == pytest.approx(0.05)
+
+    @pytest.mark.asyncio
+    async def test_no_merge_when_no_cost_dollars(self):
+        """When completed research has no costDollars, merge_stats is not called."""
+        from backend.blocks.exa.research import ExaWaitForResearchBlock
+
+        block = ExaWaitForResearchBlock()
+        merged: list[NodeExecutionStats] = []
+        block.merge_stats = lambda s: merged.append(s)  # type: ignore[assignment]
+
+        no_cost_response = {**COMPLETED_RESEARCH_RESPONSE, "costDollars": None}
+        poll_resp = MagicMock()
+        poll_resp.json.return_value = no_cost_response
+
+        mock_instance = MagicMock()
+        mock_instance.get = AsyncMock(return_value=poll_resp)
+
+        with (
+            patch("backend.blocks.exa.research.Requests", return_value=mock_instance),
+            patch("asyncio.sleep", new=AsyncMock()),
+        ):
+            async for _ in block.run(
+                block.Input(
+                    research_id="test-research-id",
+                    credentials=TEST_CREDENTIALS_INPUT,  # type: ignore[arg-type]
+                ),
+                credentials=TEST_CREDENTIALS,
+            ):
+                pass
+
+        assert merged == []
--- a/autogpt_platform/backend/backend/blocks/exa/research.py
+++ b/autogpt_platform/backend/backend/blocks/exa/research.py
@@ -12,6 +12,7 @@ from typing import Any, Dict, List, Optional

 from pydantic import BaseModel

+from backend.data.model import NodeExecutionStats
 from backend.sdk import (
    APIKeyCredentials,
    Block,
@@ -232,6 +233,11 @@ class ExaCreateResearchBlock(Block):

                    if research.cost_dollars:
                        yield "cost_total", research.cost_dollars.total
+                        self.merge_stats(
+                            NodeExecutionStats(
+                                provider_cost=research.cost_dollars.total
+                            )
+                        )
                    return

                await asyncio.sleep(check_interval)
@@ -346,6 +352,9 @@ class ExaGetResearchBlock(Block):
            yield "cost_searches", research.cost_dollars.num_searches
            yield "cost_pages", research.cost_dollars.num_pages
            yield "cost_reasoning_tokens", research.cost_dollars.reasoning_tokens
+            self.merge_stats(
+                NodeExecutionStats(provider_cost=research.cost_dollars.total)
+            )

        yield "error_message", research.error

@@ -432,6 +441,9 @@ class ExaWaitForResearchBlock(Block):

                if research.cost_dollars:
                    yield "cost_total", research.cost_dollars.total
+                    self.merge_stats(
+                        NodeExecutionStats(provider_cost=research.cost_dollars.total)
+                    )

                return

--- a/autogpt_platform/backend/backend/blocks/exa/search.py
+++ b/autogpt_platform/backend/backend/blocks/exa/search.py
@@ -4,6 +4,7 @@ from typing import Optional

 from exa_py import AsyncExa

+from backend.data.model import NodeExecutionStats
 from backend.sdk import (
    APIKeyCredentials,
    Block,
@@ -206,3 +207,6 @@ class ExaSearchBlock(Block):

        if response.cost_dollars:
            yield "cost_dollars", response.cost_dollars
+            self.merge_stats(
+                NodeExecutionStats(provider_cost=response.cost_dollars.total)
+            )
--- a/autogpt_platform/backend/backend/blocks/exa/similar.py
+++ b/autogpt_platform/backend/backend/blocks/exa/similar.py
@@ -3,6 +3,7 @@ from typing import Optional

 from exa_py import AsyncExa

+from backend.data.model import NodeExecutionStats
 from backend.sdk import (
    APIKeyCredentials,
    Block,
@@ -167,3 +168,6 @@ class ExaFindSimilarBlock(Block):

        if response.cost_dollars:
            yield "cost_dollars", response.cost_dollars
+            self.merge_stats(
+                NodeExecutionStats(provider_cost=response.cost_dollars.total)
+            )
--- a/autogpt_platform/backend/backend/blocks/google_maps.py
+++ b/autogpt_platform/backend/backend/blocks/google_maps.py
@@ -14,6 +14,7 @@ from backend.data.model import (
    APIKeyCredentials,
    CredentialsField,
    CredentialsMetaInput,
+    NodeExecutionStats,
    SchemaField,
 )
 from backend.integrations.providers import ProviderName
@@ -117,6 +118,11 @@ class GoogleMapsSearchBlock(Block):
            input_data.radius,
            input_data.max_results,
        )
+        self.merge_stats(
+            NodeExecutionStats(
+                provider_cost=float(len(places)), provider_cost_type="items"
+            )
+        )
        for place in places:
            yield "place", place

--- a/autogpt_platform/backend/backend/blocks/jina/embeddings.py
+++ b/autogpt_platform/backend/backend/blocks/jina/embeddings.py
@@ -10,7 +10,7 @@ from backend.blocks.jina._auth import (
    JinaCredentialsField,
    JinaCredentialsInput,
 )
-from backend.data.model import SchemaField
+from backend.data.model import NodeExecutionStats, SchemaField
 from backend.util.request import Requests


@@ -45,5 +45,13 @@ class JinaEmbeddingBlock(Block):
        }
        data = {"input": input_data.texts, "model": input_data.model}
        response = await Requests().post(url, headers=headers, json=data)
-        embeddings = [e["embedding"] for e in response.json()["data"]]
+        resp_json = response.json()
+        embeddings = [e["embedding"] for e in resp_json["data"]]
+        usage = resp_json.get("usage") or {}
+        if usage.get("total_tokens"):
+            self.merge_stats(
+                NodeExecutionStats(
+                    input_token_count=usage.get("total_tokens", 0),
+                )
+            )
        yield "embeddings", embeddings
--- a/autogpt_platform/backend/backend/blocks/llm.py
+++ b/autogpt_platform/backend/backend/blocks/llm.py
@@ -1,6 +1,7 @@
 # This file contains a lot of prompt block strings that would trigger "line too long"
 # flake8: noqa: E501
 import logging
+import math
 import re
 import secrets
 from abc import ABC
@@ -13,6 +14,7 @@ import ollama
 import openai
 from anthropic.types import ToolParam
 from groq import AsyncGroq
+from openai.types.chat import ChatCompletion as OpenAIChatCompletion
 from pydantic import BaseModel, SecretStr

 from backend.blocks._base import (
@@ -205,6 +207,19 @@ class LlmModel(str, Enum, metaclass=LlmModelMeta):
    KIMI_K2 = "moonshotai/kimi-k2"
    QWEN3_235B_A22B_THINKING = "qwen/qwen3-235b-a22b-thinking-2507"
    QWEN3_CODER = "qwen/qwen3-coder"
+    # Z.ai (Zhipu) models
+    ZAI_GLM_4_32B = "z-ai/glm-4-32b"
+    ZAI_GLM_4_5 = "z-ai/glm-4.5"
+    ZAI_GLM_4_5_AIR = "z-ai/glm-4.5-air"
+    ZAI_GLM_4_5_AIR_FREE = "z-ai/glm-4.5-air:free"
+    ZAI_GLM_4_5V = "z-ai/glm-4.5v"
+    ZAI_GLM_4_6 = "z-ai/glm-4.6"
+    ZAI_GLM_4_6V = "z-ai/glm-4.6v"
+    ZAI_GLM_4_7 = "z-ai/glm-4.7"
+    ZAI_GLM_4_7_FLASH = "z-ai/glm-4.7-flash"
+    ZAI_GLM_5 = "z-ai/glm-5"
+    ZAI_GLM_5_TURBO = "z-ai/glm-5-turbo"
+    ZAI_GLM_5V_TURBO = "z-ai/glm-5v-turbo"
    # Llama API models
    LLAMA_API_LLAMA_4_SCOUT = "Llama-4-Scout-17B-16E-Instruct-FP8"
    LLAMA_API_LLAMA4_MAVERICK = "Llama-4-Maverick-17B-128E-Instruct-FP8"
@@ -630,6 +645,43 @@ MODEL_METADATA = {
    LlmModel.QWEN3_CODER: ModelMetadata(
        "open_router", 262144, 262144, "Qwen 3 Coder", "OpenRouter", "Qwen", 3
    ),
+    # https://openrouter.ai/models?q=z-ai
+    LlmModel.ZAI_GLM_4_32B: ModelMetadata(
+        "open_router", 128000, 128000, "GLM 4 32B", "OpenRouter", "Z.ai", 1
+    ),
+    LlmModel.ZAI_GLM_4_5: ModelMetadata(
+        "open_router", 131072, 98304, "GLM 4.5", "OpenRouter", "Z.ai", 2
+    ),
+    LlmModel.ZAI_GLM_4_5_AIR: ModelMetadata(
+        "open_router", 131072, 98304, "GLM 4.5 Air", "OpenRouter", "Z.ai", 1
+    ),
+    LlmModel.ZAI_GLM_4_5_AIR_FREE: ModelMetadata(
+        "open_router", 131072, 96000, "GLM 4.5 Air (Free)", "OpenRouter", "Z.ai", 1
+    ),
+    LlmModel.ZAI_GLM_4_5V: ModelMetadata(
+        "open_router", 65536, 16384, "GLM 4.5V", "OpenRouter", "Z.ai", 2
+    ),
+    LlmModel.ZAI_GLM_4_6: ModelMetadata(
+        "open_router", 204800, 204800, "GLM 4.6", "OpenRouter", "Z.ai", 1
+    ),
+    LlmModel.ZAI_GLM_4_6V: ModelMetadata(
+        "open_router", 131072, 131072, "GLM 4.6V", "OpenRouter", "Z.ai", 1
+    ),
+    LlmModel.ZAI_GLM_4_7: ModelMetadata(
+        "open_router", 202752, 65535, "GLM 4.7", "OpenRouter", "Z.ai", 1
+    ),
+    LlmModel.ZAI_GLM_4_7_FLASH: ModelMetadata(
+        "open_router", 202752, 202752, "GLM 4.7 Flash", "OpenRouter", "Z.ai", 1
+    ),
+    LlmModel.ZAI_GLM_5: ModelMetadata(
+        "open_router", 80000, 80000, "GLM 5", "OpenRouter", "Z.ai", 2
+    ),
+    LlmModel.ZAI_GLM_5_TURBO: ModelMetadata(
+        "open_router", 202752, 131072, "GLM 5 Turbo", "OpenRouter", "Z.ai", 3
+    ),
+    LlmModel.ZAI_GLM_5V_TURBO: ModelMetadata(
+        "open_router", 202752, 131072, "GLM 5V Turbo", "OpenRouter", "Z.ai", 3
+    ),
    # Llama API models
    LlmModel.LLAMA_API_LLAMA_4_SCOUT: ModelMetadata(
        "llama_api",
@@ -687,6 +739,7 @@ class LLMResponse(BaseModel):
    prompt_tokens: int
    completion_tokens: int
    reasoning: Optional[str] = None
+    provider_cost: float | None = None


 def convert_openai_tool_fmt_to_anthropic(
@@ -721,6 +774,35 @@ def convert_openai_tool_fmt_to_anthropic(
    return anthropic_tools


+def extract_openrouter_cost(response: OpenAIChatCompletion) -> float | None:
+    """Extract OpenRouter's `x-total-cost` header from an OpenAI SDK response.
+
+    OpenRouter returns the per-request USD cost in a response header. The
+    OpenAI SDK exposes the raw httpx response via an undocumented `_response`
+    attribute; if the SDK drops or renames it, a warning is logged instead of
+    silently losing cost tracking. Returns None when the header is missing,
+    empty, non-numeric, or non-finite (inf/nan).
+    """
+    try:
+        raw_resp = response._response  # type: ignore[attr-defined]
+    except AttributeError:
+        logger.warning(
+            "OpenAI SDK response missing _response attribute"
+            " — OpenRouter cost tracking unavailable"
+        )
+        return None
+    try:
+        cost_header = raw_resp.headers.get("x-total-cost")
+        if not cost_header:
+            return None
+        cost = float(cost_header)
+        if not math.isfinite(cost):
+            return None
+        return cost
+    except (ValueError, TypeError, AttributeError):
+        return None
+
+
 def extract_openai_reasoning(response) -> str | None:
    """Extract reasoning from OpenAI-compatible response if available."""
    """Note: This will likely not working since the reasoning is not present in another Response API"""
@@ -1053,6 +1135,7 @@ async def llm_call(
            prompt_tokens=response.usage.prompt_tokens if response.usage else 0,
            completion_tokens=response.usage.completion_tokens if response.usage else 0,
            reasoning=reasoning,
+            provider_cost=extract_openrouter_cost(response),
        )
    elif provider == "llama_api":
        tools_param = tools if tools else openai.NOT_GIVEN
@@ -1360,6 +1443,7 @@ class AIStructuredResponseGeneratorBlock(AIBlockBase):

        error_feedback_message = ""
        llm_model = input_data.model
+        last_attempt_cost: float | None = None

        for retry_count in range(input_data.retry):
            logger.debug(f"LLM request: {prompt}")
@@ -1377,12 +1461,15 @@ class AIStructuredResponseGeneratorBlock(AIBlockBase):
                    max_tokens=input_data.max_tokens,
                )
                response_text = llm_response.response
-                self.merge_stats(
-                    NodeExecutionStats(
-                        input_token_count=llm_response.prompt_tokens,
-                        output_token_count=llm_response.completion_tokens,
-                    )
+                # Merge token counts for every attempt (each call costs tokens).
+                # provider_cost (actual USD) is tracked separately and only merged
+                # on success to avoid double-counting across retries.
+                token_stats = NodeExecutionStats(
+                    input_token_count=llm_response.prompt_tokens,
+                    output_token_count=llm_response.completion_tokens,
                )
+                self.merge_stats(token_stats)
+                last_attempt_cost = llm_response.provider_cost
                logger.debug(f"LLM attempt-{retry_count} response: {response_text}")

                if input_data.expected_format:
@@ -1451,6 +1538,7 @@ class AIStructuredResponseGeneratorBlock(AIBlockBase):
                            NodeExecutionStats(
                                llm_call_count=retry_count + 1,
                                llm_retry_count=retry_count,
+                                provider_cost=last_attempt_cost,
                            )
                        )
                        yield "response", response_obj
@@ -1471,6 +1559,7 @@ class AIStructuredResponseGeneratorBlock(AIBlockBase):
                        NodeExecutionStats(
                            llm_call_count=retry_count + 1,
                            llm_retry_count=retry_count,
+                            provider_cost=last_attempt_cost,
                        )
                    )
                    yield "response", {"response": response_text}
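The retry accounting in the hunks above (token counts merged on every attempt, `provider_cost` taken from the attempt that succeeds) can be sketched in isolation. This is a minimal illustration, not the block's code: `Attempt`, `Totals`, and `account` are hypothetical names, and leaving the cost as `None` when no attempt validates is an assumption based only on the hunks shown.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Attempt:
    """Hypothetical record of one LLM call."""

    prompt_tokens: int
    completion_tokens: int
    provider_cost: Optional[float]
    valid: bool  # whether the response passed validation


@dataclass
class Totals:
    """Hypothetical stand-in for the accumulated execution stats."""

    input_tokens: int = 0
    output_tokens: int = 0
    provider_cost: Optional[float] = None


def account(attempts: List[Attempt]) -> Totals:
    totals = Totals()
    last_attempt_cost: Optional[float] = None
    for attempt in attempts:
        # Tokens are summed for every attempt, successful or not.
        totals.input_tokens += attempt.prompt_tokens
        totals.output_tokens += attempt.completion_tokens
        # The cost is overwritten on each attempt, never summed.
        last_attempt_cost = attempt.provider_cost
        if attempt.valid:
            # Assumption: cost is recorded only on the success path.
            totals.provider_cost = last_attempt_cost
            break
    return totals


if __name__ == "__main__":
    result = account(
        [
            Attempt(10, 5, 0.01, valid=False),
            Attempt(20, 10, 0.02, valid=True),
        ]
    )
    print(result)
```

With one failed and one successful attempt, the token totals cover both calls while the recorded cost is that of the second call only, which is the behaviour `test_retry_cost_uses_last_attempt_only` asserts.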
--- a/autogpt_platform/backend/backend/blocks/smartlead/campaign.py
+++ b/autogpt_platform/backend/backend/blocks/smartlead/campaign.py
@@ -23,7 +23,7 @@ from backend.blocks.smartlead.models import (
    SaveSequencesResponse,
    Sequence,
 )
-from backend.data.model import CredentialsField, SchemaField
+from backend.data.model import CredentialsField, NodeExecutionStats, SchemaField


 class CreateCampaignBlock(Block):
@@ -226,6 +226,12 @@ class AddLeadToCampaignBlock(Block):
        response = await self.add_leads_to_campaign(
            input_data.campaign_id, input_data.lead_list, credentials
        )
+        self.merge_stats(
+            NodeExecutionStats(
+                provider_cost=float(len(input_data.lead_list)),
+                provider_cost_type="items",
+            )
+        )

        yield "campaign_id", input_data.campaign_id
        yield "upload_count", response.upload_count
--- a/autogpt_platform/backend/backend/blocks/test/test_llm.py
+++ b/autogpt_platform/backend/backend/blocks/test/test_llm.py
@@ -199,6 +199,66 @@ class TestLLMStatsTracking:
        assert block.execution_stats.llm_call_count == 2  # retry_count + 1 = 1 + 1 = 2
        assert block.execution_stats.llm_retry_count == 1

+    @pytest.mark.asyncio
+    async def test_retry_cost_uses_last_attempt_only(self):
+        """provider_cost is only merged from the final successful attempt.
+
+        Costs of earlier failed attempts are intentionally dropped:
+        last_attempt_cost is overwritten on every call, so only the cost of
+        the attempt that finally succeeds is merged.
+        """
+        import backend.blocks.llm as llm
+
+        block = llm.AIStructuredResponseGeneratorBlock()
+        call_count = 0
+
+        async def mock_llm_call(*args, **kwargs):
+            nonlocal call_count
+            call_count += 1
+            if call_count == 1:
+                # First attempt: fails validation, returns cost $0.01
+                return llm.LLMResponse(
+                    raw_response="",
+                    prompt=[],
+                    response='<json_output id="test123456">{"wrong": "key"}</json_output>',
+                    tool_calls=None,
+                    prompt_tokens=10,
+                    completion_tokens=5,
+                    reasoning=None,
+                    provider_cost=0.01,
+                )
+            # Second attempt: succeeds, returns cost $0.02
+            return llm.LLMResponse(
+                raw_response="",
+                prompt=[],
+                response='<json_output id="test123456">{"key1": "value1", "key2": "value2"}</json_output>',
+                tool_calls=None,
+                prompt_tokens=20,
+                completion_tokens=10,
+                reasoning=None,
+                provider_cost=0.02,
+            )
+
+        block.llm_call = mock_llm_call  # type: ignore
+
+        input_data = llm.AIStructuredResponseGeneratorBlock.Input(
+            prompt="Test prompt",
+            expected_format={"key1": "desc1", "key2": "desc2"},
+            model=llm.DEFAULT_LLM_MODEL,
+            credentials=llm.TEST_CREDENTIALS_INPUT,  # type: ignore
+            retry=2,
+        )
+
+        with patch("secrets.token_hex", return_value="test123456"):
+            async for _ in block.run(input_data, credentials=llm.TEST_CREDENTIALS):
+                pass
+
+        # Only the final successful attempt's cost is merged
+        assert block.execution_stats.provider_cost == pytest.approx(0.02)
+        # Tokens from both attempts accumulate
+        assert block.execution_stats.input_token_count == 30
+        assert block.execution_stats.output_token_count == 15
+
    @pytest.mark.asyncio
    async def test_ai_text_summarizer_multiple_chunks(self):
        """Test that AITextSummarizerBlock correctly accumulates stats across multiple chunks."""
@@ -987,3 +1047,63 @@ class TestLlmModelMissing:
        assert (
            llm.LlmModel("extra/google/gemini-2.5-pro") == llm.LlmModel.GEMINI_2_5_PRO
        )
+
+
+class TestExtractOpenRouterCost:
+    """Tests for extract_openrouter_cost — the x-total-cost header parser."""
+
+    def _mk_response(self, headers: dict | None):
+        response = MagicMock()
+        if headers is None:
+            response._response = None
+        else:
+            raw = MagicMock()
+            raw.headers = headers
+            response._response = raw
+        return response
+
+    def test_extracts_numeric_cost(self):
+        response = self._mk_response({"x-total-cost": "0.0042"})
+        assert llm.extract_openrouter_cost(response) == 0.0042
+
+    def test_returns_none_when_header_missing(self):
+        response = self._mk_response({})
+        assert llm.extract_openrouter_cost(response) is None
+
+    def test_returns_none_when_header_empty_string(self):
+        response = self._mk_response({"x-total-cost": ""})
+        assert llm.extract_openrouter_cost(response) is None
+
+    def test_returns_none_when_header_non_numeric(self):
+        response = self._mk_response({"x-total-cost": "not-a-number"})
+        assert llm.extract_openrouter_cost(response) is None
+
+    def test_returns_none_when_no_response_attr(self):
+        response = MagicMock(spec=[])  # no _response attr
+        assert llm.extract_openrouter_cost(response) is None
+
+    def test_returns_none_when_raw_is_none(self):
+        response = self._mk_response(None)
+        assert llm.extract_openrouter_cost(response) is None
+
+    def test_returns_none_when_raw_has_no_headers(self):
+        response = MagicMock()
+        response._response = MagicMock(spec=[])  # no headers attr
+        assert llm.extract_openrouter_cost(response) is None
+
+    def test_returns_zero_for_zero_cost(self):
+        """Zero-cost is a valid value (free tier) and must not become None."""
+        response = self._mk_response({"x-total-cost": "0"})
+        assert llm.extract_openrouter_cost(response) == 0.0
+
+    def test_returns_none_for_inf(self):
+        response = self._mk_response({"x-total-cost": "inf"})
+        assert llm.extract_openrouter_cost(response) is None
+
+    def test_returns_none_for_negative_inf(self):
+        response = self._mk_response({"x-total-cost": "-inf"})
+        assert llm.extract_openrouter_cost(response) is None
+
+    def test_returns_none_for_nan(self):
+        response = self._mk_response({"x-total-cost": "nan"})
+        assert llm.extract_openrouter_cost(response) is None
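The header-parsing rules exercised by the tests above can be sketched without the OpenAI SDK. This is a minimal illustration, not the code in `llm.py`: `parse_cost_header` is a hypothetical helper name, and it takes the raw header string directly rather than an SDK response object.

```python
import math
from typing import Optional


def parse_cost_header(value: Optional[str]) -> Optional[float]:
    """Hypothetical helper: parse a cost header string into a finite float."""
    # Missing or empty header: no cost was reported.
    if not value:
        return None
    try:
        cost = float(value)
    except (ValueError, TypeError):
        # Non-numeric text such as "not-a-number".
        return None
    # float() accepts "inf", "-inf" and "nan", so reject them explicitly.
    if not math.isfinite(cost):
        return None
    return cost


if __name__ == "__main__":
    for raw in ["0.0042", "0", "", None, "not-a-number", "inf", "-inf", "nan"]:
        print(repr(raw), parse_cost_header(raw))
```

The emptiness check is applied to the string, not to the parsed number, so a header of `"0"` still yields `0.0` rather than `None`.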
--- a/autogpt_platform/backend/backend/blocks/text_to_speech_block.py
+++ b/autogpt_platform/backend/backend/blocks/text_to_speech_block.py
@@ -13,6 +13,7 @@ from backend.data.model import (
    APIKeyCredentials,
    CredentialsField,
    CredentialsMetaInput,
+    NodeExecutionStats,
    SchemaField,
 )
 from backend.integrations.providers import ProviderName
@@ -104,4 +105,10 @@ class UnrealTextToSpeechBlock(Block):
            input_data.text,
            input_data.voice_id,
        )
+        self.merge_stats(
+            NodeExecutionStats(
+                provider_cost=float(len(input_data.text)),
+                provider_cost_type="characters",
+            )
+        )
        yield "mp3_url", api_response["OutputUri"]
--- a/autogpt_platform/backend/backend/copilot/baseline/service.py
+++ b/autogpt_platform/backend/backend/copilot/baseline/service.py
--- a/autogpt_platform/backend/backend/copilot/baseline/service_unit_test.py
+++ b/autogpt_platform/backend/backend/copilot/baseline/service_unit_test.py
@@ -0,0 +1,799 @@
+"""Unit tests for baseline service pure-logic helpers.
+
+These tests cover ``_baseline_conversation_updater`` and ``_BaselineStreamState``
+without requiring API keys, database connections, or network access.
+"""
+
+from unittest.mock import AsyncMock, MagicMock, patch
+
+import pytest
+from openai.types.chat import ChatCompletionToolParam
+
+from backend.copilot.baseline.service import (
+    _baseline_conversation_updater,
+    _BaselineStreamState,
+    _compress_session_messages,
+    _ThinkingStripper,
+)
+from backend.copilot.model import ChatMessage
+from backend.copilot.transcript_builder import TranscriptBuilder
+from backend.util.prompt import CompressResult
+from backend.util.tool_call_loop import LLMLoopResponse, LLMToolCall, ToolCallResult
+
+
+class TestBaselineStreamState:
+    def test_defaults(self):
+        state = _BaselineStreamState()
+        assert state.pending_events == []
+        assert state.assistant_text == ""
+        assert state.text_started is False
+        assert state.turn_prompt_tokens == 0
+        assert state.turn_completion_tokens == 0
+        assert state.text_block_id  # Should be a UUID string
+
+    def test_mutable_fields(self):
+        state = _BaselineStreamState()
+        state.assistant_text = "hello"
+        state.turn_prompt_tokens = 100
+        state.turn_completion_tokens = 50
+        assert state.assistant_text == "hello"
+        assert state.turn_prompt_tokens == 100
+        assert state.turn_completion_tokens == 50
+
+
+class TestBaselineConversationUpdater:
+    """Tests for _baseline_conversation_updater which updates the OpenAI
+    message list and transcript builder after each LLM call."""
+
+    def _make_transcript_builder(self) -> TranscriptBuilder:
+        builder = TranscriptBuilder()
+        builder.append_user("test question")
+        return builder
+
+    def test_text_only_response(self):
+        """When the LLM returns text without tool calls, the updater appends
+        a single assistant message and records it in the transcript."""
+        messages: list = []
+        builder = self._make_transcript_builder()
+        response = LLMLoopResponse(
+            response_text="Hello, world!",
+            tool_calls=[],
+            raw_response=None,
+            prompt_tokens=0,
+            completion_tokens=0,
+        )
+
+        _baseline_conversation_updater(
+            messages,
+            response,
+            tool_results=None,
+            transcript_builder=builder,
+            model="test-model",
+        )
+
+        assert len(messages) == 1
+        assert messages[0]["role"] == "assistant"
+        assert messages[0]["content"] == "Hello, world!"
+        # Transcript should have user + assistant
+        assert builder.entry_count == 2
+        assert builder.last_entry_type == "assistant"
+
+    def test_tool_calls_response(self):
+        """When the LLM returns tool calls, the updater appends the assistant
+        message with tool_calls and tool result messages."""
+        messages: list = []
+        builder = self._make_transcript_builder()
+        response = LLMLoopResponse(
+            response_text="Let me search...",
+            tool_calls=[
+                LLMToolCall(
+                    id="tc_1",
+                    name="search",
+                    arguments='{"query": "test"}',
+                ),
+            ],
+            raw_response=None,
+            prompt_tokens=0,
+            completion_tokens=0,
+        )
+        tool_results = [
+            ToolCallResult(
+                tool_call_id="tc_1",
+                tool_name="search",
+                content="Found result",
+            ),
+        ]
+
+        _baseline_conversation_updater(
+            messages,
+            response,
+            tool_results=tool_results,
+            transcript_builder=builder,
+            model="test-model",
+        )
+
+        # Messages: assistant (with tool_calls) + tool result
+        assert len(messages) == 2
+        assert messages[0]["role"] == "assistant"
+        assert messages[0]["content"] == "Let me search..."
+        assert len(messages[0]["tool_calls"]) == 1
+        assert messages[0]["tool_calls"][0]["id"] == "tc_1"
+        assert messages[1]["role"] == "tool"
+        assert messages[1]["tool_call_id"] == "tc_1"
+        assert messages[1]["content"] == "Found result"
+
+        # Transcript: user + assistant(tool_use) + user(tool_result)
+        assert builder.entry_count == 3
+
+    def test_tool_calls_without_text(self):
+        """Tool calls without accompanying text should still work."""
+        messages: list = []
+        builder = self._make_transcript_builder()
+        response = LLMLoopResponse(
+            response_text=None,
+            tool_calls=[
+                LLMToolCall(id="tc_1", name="run", arguments="{}"),
+            ],
+            raw_response=None,
+            prompt_tokens=0,
+            completion_tokens=0,
+        )
+        tool_results = [
+            ToolCallResult(tool_call_id="tc_1", tool_name="run", content="done"),
+        ]
+
+        _baseline_conversation_updater(
+            messages,
+            response,
+            tool_results=tool_results,
+            transcript_builder=builder,
+            model="test-model",
+        )
+
+        assert len(messages) == 2
+        assert "content" not in messages[0]  # No text content
+        assert messages[0]["tool_calls"][0]["function"]["name"] == "run"
+
+    def test_no_text_no_tools(self):
+        """When the response has no text and no tool calls, nothing is appended."""
+        messages: list = []
+        builder = self._make_transcript_builder()
+        response = LLMLoopResponse(
+            response_text=None,
+            tool_calls=[],
+            raw_response=None,
+            prompt_tokens=0,
+            completion_tokens=0,
+        )
+
+        _baseline_conversation_updater(
+            messages,
+            response,
+            tool_results=None,
+            transcript_builder=builder,
+            model="test-model",
+        )
+
+        assert len(messages) == 0
+        # Only the user entry from setup
+        assert builder.entry_count == 1
+
+    def test_multiple_tool_calls(self):
+        """Multiple tool calls in a single response are all recorded."""
+        messages: list = []
+        builder = self._make_transcript_builder()
+        response = LLMLoopResponse(
+            response_text=None,
+            tool_calls=[
+                LLMToolCall(id="tc_1", name="tool_a", arguments="{}"),
+                LLMToolCall(id="tc_2", name="tool_b", arguments='{"x": 1}'),
+            ],
+            raw_response=None,
+            prompt_tokens=0,
+            completion_tokens=0,
+        )
+        tool_results = [
+            ToolCallResult(tool_call_id="tc_1", tool_name="tool_a", content="result_a"),
+            ToolCallResult(tool_call_id="tc_2", tool_name="tool_b", content="result_b"),
+        ]
+
+        _baseline_conversation_updater(
+            messages,
+            response,
+            tool_results=tool_results,
+            transcript_builder=builder,
+            model="test-model",
+        )
+
+        # 1 assistant + 2 tool results
+        assert len(messages) == 3
+        assert len(messages[0]["tool_calls"]) == 2
+        assert messages[1]["tool_call_id"] == "tc_1"
+        assert messages[2]["tool_call_id"] == "tc_2"
+
+    def test_invalid_tool_arguments_handled(self):
+        """Tool call with invalid JSON arguments: the arguments field is
+        stored as-is in the message, and orjson failure falls back to {}
+        in the transcript content_blocks."""
+        messages: list = []
+        builder = self._make_transcript_builder()
+        response = LLMLoopResponse(
+            response_text=None,
+            tool_calls=[
+                LLMToolCall(id="tc_1", name="tool_x", arguments="not-json"),
+            ],
+            raw_response=None,
+            prompt_tokens=0,
+            completion_tokens=0,
+        )
+        tool_results = [
+            ToolCallResult(tool_call_id="tc_1", tool_name="tool_x", content="ok"),
+        ]
+
+        _baseline_conversation_updater(
+            messages,
+            response,
+            tool_results=tool_results,
+            transcript_builder=builder,
+            model="test-model",
+        )
+
+        # Should not raise — invalid JSON falls back to {} in transcript
+        assert len(messages) == 2
+        assert messages[0]["tool_calls"][0]["function"]["arguments"] == "not-json"
+
+
+class TestCompressSessionMessagesPreservesToolCalls:
+    """``_compress_session_messages`` must round-trip tool_calls + tool_call_id.
+
+    Compression serialises ChatMessage to dict for ``compress_context`` and
+    reifies the result back to ChatMessage.  A regression that drops
+    ``tool_calls`` or ``tool_call_id`` would corrupt the OpenAI message
+    list and break downstream tool-execution rounds.
+    """
+
+    @pytest.mark.asyncio
+    async def test_compressed_output_keeps_tool_calls_and_ids(self):
+        # Simulate compression that returns a summary + the most recent
+        # assistant(tool_call) + tool(tool_result) intact.
+        summary = {"role": "system", "content": "prior turns: user asked X"}
+        assistant_with_tc = {
+            "role": "assistant",
+            "content": "calling tool",
+            "tool_calls": [
+                {
+                    "id": "tc_abc",
+                    "type": "function",
+                    "function": {"name": "search", "arguments": '{"q":"y"}'},
+                }
+            ],
+        }
+        tool_result = {
+            "role": "tool",
+            "tool_call_id": "tc_abc",
+            "content": "search result",
+        }
+
+        compress_result = CompressResult(
+            messages=[summary, assistant_with_tc, tool_result],
+            token_count=100,
+            was_compacted=True,
+            original_token_count=5000,
+            messages_summarized=10,
+            messages_dropped=0,
+        )
+
+        # Input: messages that should be compressed.
+        input_messages = [
+            ChatMessage(role="user", content="q1"),
+            ChatMessage(
+                role="assistant",
+                content="calling tool",
+                tool_calls=[
+                    {
+                        "id": "tc_abc",
+                        "type": "function",
+                        "function": {
+                            "name": "search",
+                            "arguments": '{"q":"y"}',
+                        },
+                    }
+                ],
+            ),
+            ChatMessage(
+                role="tool",
+                tool_call_id="tc_abc",
+                content="search result",
+            ),
+        ]
+
+        with patch(
+            "backend.copilot.baseline.service.compress_context",
+            new=AsyncMock(return_value=compress_result),
+        ):
+            compressed = await _compress_session_messages(
+                input_messages, model="openrouter/anthropic/claude-opus-4"
+            )
+
+        # Summary, assistant(tool_calls), tool(tool_call_id).
+        assert len(compressed) == 3
+        # Assistant message must keep its tool_calls intact.
+        assistant_msg = compressed[1]
+        assert assistant_msg.role == "assistant"
+        assert assistant_msg.tool_calls is not None
+        assert len(assistant_msg.tool_calls) == 1
+        assert assistant_msg.tool_calls[0]["id"] == "tc_abc"
+        assert assistant_msg.tool_calls[0]["function"]["name"] == "search"
+        # Tool-role message must keep tool_call_id for OpenAI linkage.
+        tool_msg = compressed[2]
+        assert tool_msg.role == "tool"
+        assert tool_msg.tool_call_id == "tc_abc"
+        assert tool_msg.content == "search result"
+
+    @pytest.mark.asyncio
+    async def test_uncompressed_passthrough_keeps_fields(self):
+        """When compression is a no-op (was_compacted=False), the original
+        messages must be returned unchanged — including tool_calls."""
+        input_messages = [
+            ChatMessage(
+                role="assistant",
+                content="c",
+                tool_calls=[
+                    {
+                        "id": "t1",
+                        "type": "function",
+                        "function": {"name": "f", "arguments": "{}"},
+                    }
+                ],
+            ),
+            ChatMessage(role="tool", tool_call_id="t1", content="ok"),
+        ]
+
+        noop_result = CompressResult(
+            messages=[],  # ignored when was_compacted=False
+            token_count=10,
+            was_compacted=False,
+        )
+
+        with patch(
+            "backend.copilot.baseline.service.compress_context",
+            new=AsyncMock(return_value=noop_result),
+        ):
+            out = await _compress_session_messages(
+                input_messages, model="openrouter/anthropic/claude-opus-4"
+            )
+
+        assert out is input_messages  # same list returned
+        assert out[0].tool_calls is not None
+        assert out[0].tool_calls[0]["id"] == "t1"
+        assert out[1].tool_call_id == "t1"
+
+
+# ---- _ThinkingStripper tests ---- #
+
+
+def test_thinking_stripper_basic_thinking_tag() -> None:
+    """<thinking>...</thinking> blocks are fully stripped."""
+    s = _ThinkingStripper()
+    assert s.process("<thinking>internal reasoning here</thinking>Hello!") == "Hello!"
+
+
+def test_thinking_stripper_internal_reasoning_tag() -> None:
+    """<internal_reasoning>...</internal_reasoning> blocks (Gemini) are stripped."""
+    s = _ThinkingStripper()
+    assert (
+        s.process("<internal_reasoning>step by step</internal_reasoning>Answer")
+        == "Answer"
+    )
+
+
+def test_thinking_stripper_split_across_chunks() -> None:
+    """Tags split across multiple chunks are handled correctly."""
+    s = _ThinkingStripper()
+    out = s.process("Hello <thin")
+    out += s.process("king>secret</thinking> world")
+    assert out == "Hello  world"
+
+
+def test_thinking_stripper_plain_text_preserved() -> None:
+    """Plain text with the word 'thinking' is not stripped."""
+    s = _ThinkingStripper()
+    assert (
+        s.process("I am thinking about this problem")
+        == "I am thinking about this problem"
+    )
+
+
+def test_thinking_stripper_multiple_blocks() -> None:
+    """Multiple reasoning blocks in one stream are all stripped."""
+    s = _ThinkingStripper()
+    result = s.process(
+        "A<thinking>x</thinking>B<internal_reasoning>y</internal_reasoning>C"
+    )
+    assert result == "ABC"
+
+
+def test_thinking_stripper_flush_discards_unclosed() -> None:
+    """Unclosed reasoning block is discarded on flush."""
+    s = _ThinkingStripper()
+    s.process("Start<thinking>never closed")
+    flushed = s.flush()
+    assert "never closed" not in flushed
+
+
+def test_thinking_stripper_empty_block() -> None:
+    """Empty reasoning blocks are handled gracefully."""
+    s = _ThinkingStripper()
+    assert s.process("Before<thinking></thinking>After") == "BeforeAfter"
+
+
+# ---- _filter_tools_by_permissions tests ---- #
+
+
+def _make_tool(name: str) -> ChatCompletionToolParam:
+    """Build a minimal OpenAI ChatCompletionToolParam."""
+    return ChatCompletionToolParam(
+        type="function",
+        function={"name": name, "parameters": {}},
+    )
+
+
+class TestFilterToolsByPermissions:
+    """Tests for _filter_tools_by_permissions."""
+
+    @patch(
+        "backend.copilot.permissions.all_known_tool_names",
+        return_value=frozenset({"run_block", "web_fetch", "bash_exec"}),
+    )
+    def test_empty_permissions_returns_all(self, _mock_names):
+        """Empty permissions (no filtering) returns every tool unchanged."""
+        from backend.copilot.baseline.service import _filter_tools_by_permissions
+        from backend.copilot.permissions import CopilotPermissions
+
+        tools = [_make_tool("run_block"), _make_tool("web_fetch")]
+        perms = CopilotPermissions()
+        result = _filter_tools_by_permissions(tools, perms)
+        assert result == tools
+
+    @patch(
+        "backend.copilot.permissions.all_known_tool_names",
+        return_value=frozenset({"run_block", "web_fetch", "bash_exec"}),
+    )
+    def test_allowlist_keeps_only_matching(self, _mock_names):
+        """Explicit allowlist (tools_exclude=False) keeps only listed tools."""
+        from backend.copilot.baseline.service import _filter_tools_by_permissions
+        from backend.copilot.permissions import CopilotPermissions
+
+        tools = [
+            _make_tool("run_block"),
+            _make_tool("web_fetch"),
+            _make_tool("bash_exec"),
+        ]
+        perms = CopilotPermissions(tools=["web_fetch"], tools_exclude=False)
+        result = _filter_tools_by_permissions(tools, perms)
+        assert len(result) == 1
+        assert result[0]["function"]["name"] == "web_fetch"
+
+    @patch(
+        "backend.copilot.permissions.all_known_tool_names",
+        return_value=frozenset({"run_block", "web_fetch", "bash_exec"}),
+    )
+    def test_blacklist_excludes_listed(self, _mock_names):
+        """Blacklist (tools_exclude=True) removes only the listed tools."""
+        from backend.copilot.baseline.service import _filter_tools_by_permissions
+        from backend.copilot.permissions import CopilotPermissions
+
+        tools = [
+            _make_tool("run_block"),
+            _make_tool("web_fetch"),
+            _make_tool("bash_exec"),
+        ]
+        perms = CopilotPermissions(tools=["bash_exec"], tools_exclude=True)
+        result = _filter_tools_by_permissions(tools, perms)
+        names = [t["function"]["name"] for t in result]
+        assert "bash_exec" not in names
+        assert "run_block" in names
+        assert "web_fetch" in names
+        assert len(result) == 2
+
+    @patch(
+        "backend.copilot.permissions.all_known_tool_names",
+        return_value=frozenset({"run_block", "web_fetch", "bash_exec"}),
+    )
+    def test_unknown_tool_name_filtered_out(self, _mock_names):
+        """A tool whose name is not in all_known_tool_names is dropped."""
+        from backend.copilot.baseline.service import _filter_tools_by_permissions
+        from backend.copilot.permissions import CopilotPermissions
+
+        tools = [_make_tool("run_block"), _make_tool("unknown_tool")]
+        perms = CopilotPermissions(tools=["run_block"], tools_exclude=False)
+        result = _filter_tools_by_permissions(tools, perms)
+        names = [t["function"]["name"] for t in result]
+        assert "unknown_tool" not in names
+        assert names == ["run_block"]
+
+
+# ---- _prepare_baseline_attachments tests ---- #
+
+
+class TestPrepareBaselineAttachments:
+    """Tests for _prepare_baseline_attachments."""
+
+    @pytest.mark.asyncio
+    async def test_empty_file_ids(self):
+        """Empty file_ids returns empty hint and blocks."""
+        from backend.copilot.baseline.service import _prepare_baseline_attachments
+
+        hint, blocks = await _prepare_baseline_attachments([], "user1", "sess1", "/tmp")
+        assert hint == ""
+        assert blocks == []
+
+    @pytest.mark.asyncio
+    async def test_empty_user_id(self):
+        """Empty user_id returns empty hint and blocks."""
+        from backend.copilot.baseline.service import _prepare_baseline_attachments
+
+        hint, blocks = await _prepare_baseline_attachments(
+            ["file1"], "", "sess1", "/tmp"
+        )
+        assert hint == ""
+        assert blocks == []
+
+    @pytest.mark.asyncio
+    async def test_image_file_returns_vision_blocks(self):
+        """A PNG image within size limits is returned as a base64 vision block."""
+        from backend.copilot.baseline.service import _prepare_baseline_attachments
+
+        fake_info = AsyncMock()
+        fake_info.name = "photo.png"
+        fake_info.mime_type = "image/png"
+        fake_info.size_bytes = 1024
+
+        fake_manager = AsyncMock()
+        fake_manager.get_file_info = AsyncMock(return_value=fake_info)
+        fake_manager.read_file_by_id = AsyncMock(return_value=b"\x89PNG_FAKE_DATA")
+
+        with patch(
+            "backend.copilot.baseline.service.get_workspace_manager",
+            new=AsyncMock(return_value=fake_manager),
+        ):
+            hint, blocks = await _prepare_baseline_attachments(
+                ["fid1"], "user1", "sess1", "/tmp/workdir"
+            )
+
+        assert len(blocks) == 1
+        assert blocks[0]["type"] == "image"
+        assert blocks[0]["source"]["media_type"] == "image/png"
+        assert blocks[0]["source"]["type"] == "base64"
+        assert "photo.png" in hint
+        assert "embedded as image" in hint
+
+    @pytest.mark.asyncio
+    async def test_non_image_file_saved_to_working_dir(self, tmp_path):
+        """A non-image file is written to working_dir."""
+        from backend.copilot.baseline.service import _prepare_baseline_attachments
+
+        fake_info = AsyncMock()
+        fake_info.name = "data.csv"
+        fake_info.mime_type = "text/csv"
+        fake_info.size_bytes = 42
+
+        fake_manager = AsyncMock()
+        fake_manager.get_file_info = AsyncMock(return_value=fake_info)
+        fake_manager.read_file_by_id = AsyncMock(return_value=b"col1,col2\na,b")
+
+        with patch(
+            "backend.copilot.baseline.service.get_workspace_manager",
+            new=AsyncMock(return_value=fake_manager),
+        ):
+            hint, blocks = await _prepare_baseline_attachments(
+                ["fid1"], "user1", "sess1", str(tmp_path)
+            )
+
+        assert blocks == []
+        assert "data.csv" in hint
+        assert "saved to" in hint
+        saved = tmp_path / "data.csv"
+        assert saved.exists()
+        assert saved.read_bytes() == b"col1,col2\na,b"
+
+    @pytest.mark.asyncio
+    async def test_file_not_found_skipped(self):
+        """When get_file_info returns None, the file is silently skipped."""
+        from backend.copilot.baseline.service import _prepare_baseline_attachments
+
+        fake_manager = AsyncMock()
+        fake_manager.get_file_info = AsyncMock(return_value=None)
+
+        with patch(
+            "backend.copilot.baseline.service.get_workspace_manager",
+            new=AsyncMock(return_value=fake_manager),
+        ):
+            hint, blocks = await _prepare_baseline_attachments(
+                ["missing_id"], "user1", "sess1", "/tmp"
+            )
+
+        assert hint == ""
+        assert blocks == []
+
+    @pytest.mark.asyncio
+    async def test_workspace_manager_error(self):
+        """When get_workspace_manager raises, returns empty hint and blocks."""
+        from backend.copilot.baseline.service import _prepare_baseline_attachments
+
+        with patch(
+            "backend.copilot.baseline.service.get_workspace_manager",
+            new=AsyncMock(side_effect=RuntimeError("connection failed")),
+        ):
+            hint, blocks = await _prepare_baseline_attachments(
+                ["fid1"], "user1", "sess1", "/tmp"
+            )
+
+        assert hint == ""
+        assert blocks == []
+
+
+class TestBaselineCostExtraction:
+    """Tests for x-total-cost header extraction in _baseline_llm_caller."""
+
+    @pytest.mark.asyncio
+    async def test_cost_usd_extracted_from_response_header(self):
+        """state.cost_usd is set from x-total-cost header when present."""
+        from backend.copilot.baseline.service import (
+            _baseline_llm_caller,
+            _BaselineStreamState,
+        )
+
+        state = _BaselineStreamState(model="gpt-4o-mini")
+
+        # Build a mock raw httpx response with the cost header
+        mock_raw_response = MagicMock()
+        mock_raw_response.headers = {"x-total-cost": "0.0123"}
+
+        # Build a mock async streaming response that yields no chunks but has
+        # a _response attribute pointing to the mock httpx response
+        mock_stream_response = MagicMock()
+        mock_stream_response._response = mock_raw_response
+
+        async def empty_aiter():
+            return
+            yield  # make it an async generator
+
+        mock_stream_response.__aiter__ = lambda self: empty_aiter()
+
+        mock_client = MagicMock()
+        mock_client.chat.completions.create = AsyncMock(
+            return_value=mock_stream_response
+        )
+
+        with patch(
+            "backend.copilot.baseline.service._get_openai_client",
+            return_value=mock_client,
+        ):
+            await _baseline_llm_caller(
+                messages=[{"role": "user", "content": "hi"}],
+                tools=[],
+                state=state,
+            )
+
+        assert state.cost_usd == pytest.approx(0.0123)
+
+    @pytest.mark.asyncio
+    async def test_cost_usd_accumulates_across_calls(self):
+        """cost_usd accumulates when _baseline_llm_caller is called multiple times."""
+        from backend.copilot.baseline.service import (
+            _baseline_llm_caller,
+            _BaselineStreamState,
+        )
+
+        state = _BaselineStreamState(model="gpt-4o-mini")
+
+        def make_stream_mock(cost: str) -> MagicMock:
+            mock_raw = MagicMock()
+            mock_raw.headers = {"x-total-cost": cost}
+            mock_stream = MagicMock()
+            mock_stream._response = mock_raw
+
+            async def empty_aiter():
+                return
+                yield
+
+            mock_stream.__aiter__ = lambda self: empty_aiter()
+            return mock_stream
+
+        mock_client = MagicMock()
+        mock_client.chat.completions.create = AsyncMock(
+            side_effect=[make_stream_mock("0.01"), make_stream_mock("0.02")]
+        )
+
+        with patch(
+            "backend.copilot.baseline.service._get_openai_client",
+            return_value=mock_client,
+        ):
+            await _baseline_llm_caller(
+                messages=[{"role": "user", "content": "first"}],
+                tools=[],
+                state=state,
+            )
+            await _baseline_llm_caller(
+                messages=[{"role": "user", "content": "second"}],
+                tools=[],
+                state=state,
+            )
+
+        assert state.cost_usd == pytest.approx(0.03)
+
+    @pytest.mark.asyncio
+    async def test_no_cost_when_header_absent(self):
+        """state.cost_usd remains None when response has no x-total-cost header."""
+        from backend.copilot.baseline.service import (
+            _baseline_llm_caller,
+            _BaselineStreamState,
+        )
+
+        state = _BaselineStreamState(model="gpt-4o-mini")
+
+        mock_raw = MagicMock()
+        mock_raw.headers = {}
+        mock_stream = MagicMock()
+        mock_stream._response = mock_raw
+
+        async def empty_aiter():
+            return
+            yield
+
+        mock_stream.__aiter__ = lambda self: empty_aiter()
+
+        mock_client = MagicMock()
+        mock_client.chat.completions.create = AsyncMock(return_value=mock_stream)
+
+        with patch(
+            "backend.copilot.baseline.service._get_openai_client",
+            return_value=mock_client,
+        ):
+            await _baseline_llm_caller(
+                messages=[{"role": "user", "content": "hi"}],
+                tools=[],
+                state=state,
+            )
+
+        assert state.cost_usd is None
+
+    @pytest.mark.asyncio
+    async def test_cost_extracted_even_when_stream_raises(self):
+        """cost_usd is captured in the finally block even when streaming fails."""
+        from backend.copilot.baseline.service import (
+            _baseline_llm_caller,
+            _BaselineStreamState,
+        )
+
+        state = _BaselineStreamState(model="gpt-4o-mini")
+
+        mock_raw = MagicMock()
+        mock_raw.headers = {"x-total-cost": "0.005"}
+        mock_stream = MagicMock()
+        mock_stream._response = mock_raw
+
+        async def failing_aiter():
+            raise RuntimeError("stream error")
+            yield  # make it an async generator
+
+        mock_stream.__aiter__ = lambda self: failing_aiter()
+
+        mock_client = MagicMock()
+        mock_client.chat.completions.create = AsyncMock(return_value=mock_stream)
+
+        with (
+            patch(
+                "backend.copilot.baseline.service._get_openai_client",
+                return_value=mock_client,
+            ),
+            pytest.raises(RuntimeError, match="stream error"),
+        ):
+            await _baseline_llm_caller(
+                messages=[{"role": "user", "content": "hi"}],
+                tools=[],
+                state=state,
+            )
+
+        assert state.cost_usd == pytest.approx(0.005)
--- a/autogpt_platform/backend/backend/copilot/baseline/transcript_integration_test.py
+++ b/autogpt_platform/backend/backend/copilot/baseline/transcript_integration_test.py
@@ -0,0 +1,667 @@
+"""Integration tests for baseline transcript flow.
+
+Exercises the real helpers in ``baseline/service.py`` that download,
+validate, load, append to, backfill, and upload the transcript.
+Storage is mocked via ``download_transcript`` / ``upload_transcript``
+patches; no network access is required.
+"""
+
+import json as stdlib_json
+from unittest.mock import AsyncMock, patch
+
+import pytest
+
+from backend.copilot.baseline.service import (
+    _load_prior_transcript,
+    _record_turn_to_transcript,
+    _resolve_baseline_model,
+    _upload_final_transcript,
+    is_transcript_stale,
+    should_upload_transcript,
+)
+from backend.copilot.service import config
+from backend.copilot.transcript import (
+    STOP_REASON_END_TURN,
+    STOP_REASON_TOOL_USE,
+    TranscriptDownload,
+)
+from backend.copilot.transcript_builder import TranscriptBuilder
+from backend.util.tool_call_loop import LLMLoopResponse, LLMToolCall, ToolCallResult
+
+
+def _make_transcript_content(*roles: str) -> str:
+    """Build a minimal valid JSONL transcript from role names."""
+    lines = []
+    parent = ""
+    for i, role in enumerate(roles):
+        uid = f"uuid-{i}"
+        entry: dict = {
+            "type": role,
+            "uuid": uid,
+            "parentUuid": parent,
+            "message": {
+                "role": role,
+                "content": [{"type": "text", "text": f"{role} message {i}"}],
+            },
+        }
+        if role == "assistant":
+            entry["message"]["id"] = f"msg_{i}"
+            entry["message"]["model"] = "test-model"
+            entry["message"]["type"] = "message"
+            entry["message"]["stop_reason"] = STOP_REASON_END_TURN
+        lines.append(stdlib_json.dumps(entry))
+        parent = uid
+    return "\n".join(lines) + "\n"
+
+
+class TestResolveBaselineModel:
+    """Model selection honours the per-request mode."""
+
+    def test_fast_mode_selects_fast_model(self):
+        assert _resolve_baseline_model("fast") == config.fast_model
+
+    def test_extended_thinking_selects_default_model(self):
+        assert _resolve_baseline_model("extended_thinking") == config.model
+
+    def test_none_mode_selects_default_model(self):
+        """Critical: baseline users without a mode MUST keep the default (opus)."""
+        assert _resolve_baseline_model(None) == config.model
+
+    def test_default_and_fast_models_differ(self):
+        """Sanity: the two tiers are actually distinct in production config."""
+        assert config.model != config.fast_model
+
+
+class TestLoadPriorTranscript:
+    """``_load_prior_transcript`` wraps the download + validate + load flow."""
+
+    @pytest.mark.asyncio
+    async def test_loads_fresh_transcript(self):
+        builder = TranscriptBuilder()
+        content = _make_transcript_content("user", "assistant")
+        download = TranscriptDownload(content=content, message_count=2)
+
+        with patch(
+            "backend.copilot.baseline.service.download_transcript",
+            new=AsyncMock(return_value=download),
+        ):
+            covers = await _load_prior_transcript(
+                user_id="user-1",
+                session_id="session-1",
+                session_msg_count=3,
+                transcript_builder=builder,
+            )
+
+        assert covers is True
+        assert builder.entry_count == 2
+        assert builder.last_entry_type == "assistant"
+
+    @pytest.mark.asyncio
+    async def test_rejects_stale_transcript(self):
+        """A message_count strictly less than session_msg_count - 1 is stale."""
+        builder = TranscriptBuilder()
+        content = _make_transcript_content("user", "assistant")
+        # session has 6 messages, transcript only covers 2 → stale.
+        download = TranscriptDownload(content=content, message_count=2)
+
+        with patch(
+            "backend.copilot.baseline.service.download_transcript",
+            new=AsyncMock(return_value=download),
+        ):
+            covers = await _load_prior_transcript(
+                user_id="user-1",
+                session_id="session-1",
+                session_msg_count=6,
+                transcript_builder=builder,
+            )
+
+        assert covers is False
+        assert builder.is_empty
+
+    @pytest.mark.asyncio
+    async def test_missing_transcript_returns_false(self):
+        builder = TranscriptBuilder()
+        with patch(
+            "backend.copilot.baseline.service.download_transcript",
+            new=AsyncMock(return_value=None),
+        ):
+            covers = await _load_prior_transcript(
+                user_id="user-1",
+                session_id="session-1",
+                session_msg_count=2,
+                transcript_builder=builder,
+            )
+
+        assert covers is False
+        assert builder.is_empty
+
+    @pytest.mark.asyncio
+    async def test_invalid_transcript_returns_false(self):
+        builder = TranscriptBuilder()
+        download = TranscriptDownload(
+            content='{"type":"progress","uuid":"a"}\n',
+            message_count=1,
+        )
+        with patch(
+            "backend.copilot.baseline.service.download_transcript",
+            new=AsyncMock(return_value=download),
+        ):
+            covers = await _load_prior_transcript(
+                user_id="user-1",
+                session_id="session-1",
+                session_msg_count=2,
+                transcript_builder=builder,
+            )
+
+        assert covers is False
+        assert builder.is_empty
+
+    @pytest.mark.asyncio
+    async def test_download_exception_returns_false(self):
+        builder = TranscriptBuilder()
+        with patch(
+            "backend.copilot.baseline.service.download_transcript",
+            new=AsyncMock(side_effect=RuntimeError("boom")),
+        ):
+            covers = await _load_prior_transcript(
+                user_id="user-1",
+                session_id="session-1",
+                session_msg_count=2,
+                transcript_builder=builder,
+            )
+
+        assert covers is False
+        assert builder.is_empty
+
+    @pytest.mark.asyncio
+    async def test_zero_message_count_not_stale(self):
+        """When message_count is 0 (unknown), the staleness check is skipped."""
+        builder = TranscriptBuilder()
+        download = TranscriptDownload(
+            content=_make_transcript_content("user", "assistant"),
+            message_count=0,
+        )
+        with patch(
+            "backend.copilot.baseline.service.download_transcript",
+            new=AsyncMock(return_value=download),
+        ):
+            covers = await _load_prior_transcript(
+                user_id="user-1",
+                session_id="session-1",
+                session_msg_count=20,
+                transcript_builder=builder,
+            )
+
+        assert covers is True
+        assert builder.entry_count == 2
+
+
+class TestUploadFinalTranscript:
+    """``_upload_final_transcript`` serialises and calls storage."""
+
+    @pytest.mark.asyncio
+    async def test_uploads_valid_transcript(self):
+        builder = TranscriptBuilder()
+        builder.append_user(content="hi")
+        builder.append_assistant(
+            content_blocks=[{"type": "text", "text": "hello"}],
+            model="test-model",
+            stop_reason=STOP_REASON_END_TURN,
+        )
+
+        upload_mock = AsyncMock(return_value=None)
+        with patch(
+            "backend.copilot.baseline.service.upload_transcript",
+            new=upload_mock,
+        ):
+            await _upload_final_transcript(
+                user_id="user-1",
+                session_id="session-1",
+                transcript_builder=builder,
+                session_msg_count=2,
+            )
+
+        upload_mock.assert_awaited_once()
+        assert upload_mock.await_args is not None
+        call_kwargs = upload_mock.await_args.kwargs
+        assert call_kwargs["user_id"] == "user-1"
+        assert call_kwargs["session_id"] == "session-1"
+        assert call_kwargs["message_count"] == 2
+        assert "hello" in call_kwargs["content"]
+
+    @pytest.mark.asyncio
+    async def test_skips_upload_when_builder_empty(self):
+        builder = TranscriptBuilder()
+        upload_mock = AsyncMock(return_value=None)
+        with patch(
+            "backend.copilot.baseline.service.upload_transcript",
+            new=upload_mock,
+        ):
+            await _upload_final_transcript(
+                user_id="user-1",
+                session_id="session-1",
+                transcript_builder=builder,
+                session_msg_count=0,
+            )
+
+        upload_mock.assert_not_awaited()
+
+    @pytest.mark.asyncio
+    async def test_swallows_upload_exceptions(self):
+        """Upload failures should not propagate (flow continues for the user)."""
+        builder = TranscriptBuilder()
+        builder.append_user(content="hi")
+        builder.append_assistant(
+            content_blocks=[{"type": "text", "text": "hello"}],
+            model="test-model",
+            stop_reason=STOP_REASON_END_TURN,
+        )
+
+        with patch(
+            "backend.copilot.baseline.service.upload_transcript",
+            new=AsyncMock(side_effect=RuntimeError("storage unavailable")),
+        ):
+            # Should not raise.
+            await _upload_final_transcript(
+                user_id="user-1",
+                session_id="session-1",
+                transcript_builder=builder,
+                session_msg_count=2,
+            )
+
+
+class TestRecordTurnToTranscript:
+    """``_record_turn_to_transcript`` translates LLMLoopResponse → transcript."""
+
+    def test_records_final_assistant_text(self):
+        builder = TranscriptBuilder()
+        builder.append_user(content="hi")
+
+        response = LLMLoopResponse(
+            response_text="hello there",
+            tool_calls=[],
+            raw_response=None,
+        )
+        _record_turn_to_transcript(
+            response,
+            tool_results=None,
+            transcript_builder=builder,
+            model="test-model",
+        )
+
+        assert builder.entry_count == 2
+        assert builder.last_entry_type == "assistant"
+        jsonl = builder.to_jsonl()
+        assert "hello there" in jsonl
+        assert STOP_REASON_END_TURN in jsonl
+
+    def test_records_tool_use_then_tool_result(self):
+        """Anthropic ordering: assistant(tool_use) → user(tool_result)."""
+        builder = TranscriptBuilder()
+        builder.append_user(content="use a tool")
+
+        response = LLMLoopResponse(
+            response_text=None,
+            tool_calls=[
+                LLMToolCall(id="call-1", name="echo", arguments='{"text":"hi"}')
+            ],
+            raw_response=None,
+        )
+        tool_results = [
+            ToolCallResult(tool_call_id="call-1", tool_name="echo", content="hi")
+        ]
+        _record_turn_to_transcript(
+            response,
+            tool_results,
+            transcript_builder=builder,
+            model="test-model",
+        )
+
+        # user, assistant(tool_use), user(tool_result) = 3 entries
+        assert builder.entry_count == 3
+        jsonl = builder.to_jsonl()
+        assert STOP_REASON_TOOL_USE in jsonl
+        assert "tool_use" in jsonl
+        assert "tool_result" in jsonl
+        assert "call-1" in jsonl
+
+    def test_records_nothing_on_empty_response(self):
+        builder = TranscriptBuilder()
+        builder.append_user(content="hi")
+
+        response = LLMLoopResponse(
+            response_text=None,
+            tool_calls=[],
+            raw_response=None,
+        )
+        _record_turn_to_transcript(
+            response,
+            tool_results=None,
+            transcript_builder=builder,
+            model="test-model",
+        )
+
+        assert builder.entry_count == 1
+
+    def test_malformed_tool_args_dont_crash(self):
+        """Bad JSON in tool arguments falls back to {} without raising."""
+        builder = TranscriptBuilder()
+        builder.append_user(content="hi")
+
+        response = LLMLoopResponse(
+            response_text=None,
+            tool_calls=[LLMToolCall(id="call-1", name="echo", arguments="{not-json")],
+            raw_response=None,
+        )
+        tool_results = [
+            ToolCallResult(tool_call_id="call-1", tool_name="echo", content="ok")
+        ]
+        _record_turn_to_transcript(
+            response,
+            tool_results,
+            transcript_builder=builder,
+            model="test-model",
+        )
+
+        assert builder.entry_count == 3
+        jsonl = builder.to_jsonl()
+        assert '"input":{}' in jsonl
+
+
+class TestRoundTrip:
+    """End-to-end: load prior → append new turn → upload."""
+
+    @pytest.mark.asyncio
+    async def test_full_round_trip(self):
+        prior = _make_transcript_content("user", "assistant")
+        download = TranscriptDownload(content=prior, message_count=2)
+
+        builder = TranscriptBuilder()
+        with patch(
+            "backend.copilot.baseline.service.download_transcript",
+            new=AsyncMock(return_value=download),
+        ):
+            covers = await _load_prior_transcript(
+                user_id="user-1",
+                session_id="session-1",
+                session_msg_count=3,
+                transcript_builder=builder,
+            )
+        assert covers is True
+        assert builder.entry_count == 2
+
+        # New user turn.
+        builder.append_user(content="new question")
+        assert builder.entry_count == 3
+
+        # New assistant turn.
+        response = LLMLoopResponse(
+            response_text="new answer",
+            tool_calls=[],
+            raw_response=None,
+        )
+        _record_turn_to_transcript(
+            response,
+            tool_results=None,
+            transcript_builder=builder,
+            model="test-model",
+        )
+        assert builder.entry_count == 4
+
+        # Upload.
+        upload_mock = AsyncMock(return_value=None)
+        with patch(
+            "backend.copilot.baseline.service.upload_transcript",
+            new=upload_mock,
+        ):
+            await _upload_final_transcript(
+                user_id="user-1",
+                session_id="session-1",
+                transcript_builder=builder,
+                session_msg_count=4,
+            )
+
+        upload_mock.assert_awaited_once()
+        assert upload_mock.await_args is not None
+        uploaded = upload_mock.await_args.kwargs["content"]
+        assert "new question" in uploaded
+        assert "new answer" in uploaded
+        # Original content preserved in the round trip.
+        assert "user message 0" in uploaded
+        assert "assistant message 1" in uploaded
+
+    @pytest.mark.asyncio
+    async def test_backfill_append_guard(self):
+        """Backfill only runs when the last entry is not already assistant."""
+        builder = TranscriptBuilder()
+        builder.append_user(content="hi")
+
+        # Simulate the backfill guard from stream_chat_completion_baseline.
+        assistant_text = "partial text before error"
+        if builder.last_entry_type != "assistant":
+            builder.append_assistant(
+                content_blocks=[{"type": "text", "text": assistant_text}],
+                model="test-model",
+                stop_reason=STOP_REASON_END_TURN,
+            )
+
+        assert builder.last_entry_type == "assistant"
+        assert "partial text before error" in builder.to_jsonl()
+
+        # Second invocation: the guard must prevent double-append.
+        initial_count = builder.entry_count
+        if builder.last_entry_type != "assistant":
+            builder.append_assistant(
+                content_blocks=[{"type": "text", "text": "duplicate"}],
+                model="test-model",
+                stop_reason=STOP_REASON_END_TURN,
+            )
+        assert builder.entry_count == initial_count
+
+
+class TestIsTranscriptStale:
+    """``is_transcript_stale`` gates prior-transcript loading."""
+
+    def test_none_download_is_not_stale(self):
+        assert is_transcript_stale(None, session_msg_count=5) is False
+
+    def test_zero_message_count_is_not_stale(self):
+        """Legacy transcripts without msg_count tracking must remain usable."""
+        dl = TranscriptDownload(content="", message_count=0)
+        assert is_transcript_stale(dl, session_msg_count=20) is False
+
+    def test_stale_when_covers_less_than_prefix(self):
+        dl = TranscriptDownload(content="", message_count=2)
+        # session has 6 messages; transcript must cover at least 5 (6-1).
+        assert is_transcript_stale(dl, session_msg_count=6) is True
+
+    def test_fresh_when_covers_full_prefix(self):
+        dl = TranscriptDownload(content="", message_count=5)
+        assert is_transcript_stale(dl, session_msg_count=6) is False
+
+    def test_fresh_when_exceeds_prefix(self):
+        """Race: transcript ahead of session count is still acceptable."""
+        dl = TranscriptDownload(content="", message_count=10)
+        assert is_transcript_stale(dl, session_msg_count=6) is False
+
+    def test_boundary_equal_to_prefix_minus_one(self):
+        dl = TranscriptDownload(content="", message_count=5)
+        assert is_transcript_stale(dl, session_msg_count=6) is False
+
+
+class TestShouldUploadTranscript:
+    """``should_upload_transcript`` gates the final upload."""
+
+    def test_upload_allowed_for_user_with_coverage(self):
+        assert should_upload_transcript("user-1", True) is True
+
+    def test_upload_skipped_when_no_user(self):
+        assert should_upload_transcript(None, True) is False
+
+    def test_upload_skipped_when_empty_user(self):
+        assert should_upload_transcript("", True) is False
+
+    def test_upload_skipped_without_coverage(self):
+        """Partial transcript must never clobber a more complete stored one."""
+        assert should_upload_transcript("user-1", False) is False
+
+    def test_upload_skipped_when_no_user_and_no_coverage(self):
+        assert should_upload_transcript(None, False) is False
+
+
+class TestTranscriptLifecycle:
+    """End-to-end: download → validate → build → upload.
+
+    Simulates the full transcript lifecycle inside
+    ``stream_chat_completion_baseline`` by mocking the storage layer and
+    driving each step through the real helpers.
+    """
+
+    @pytest.mark.asyncio
+    async def test_full_lifecycle_happy_path(self):
+        """Fresh download, append a turn, upload covers the session."""
+        builder = TranscriptBuilder()
+        prior = _make_transcript_content("user", "assistant")
+        download = TranscriptDownload(content=prior, message_count=2)
+
+        upload_mock = AsyncMock(return_value=None)
+        with (
+            patch(
+                "backend.copilot.baseline.service.download_transcript",
+                new=AsyncMock(return_value=download),
+            ),
+            patch(
+                "backend.copilot.baseline.service.upload_transcript",
+                new=upload_mock,
+            ),
+        ):
+            # --- 1. Download & load prior transcript ---
+            covers = await _load_prior_transcript(
+                user_id="user-1",
+                session_id="session-1",
+                session_msg_count=3,
+                transcript_builder=builder,
+            )
+            assert covers is True
+
+            # --- 2. Append a new user turn + a new assistant response ---
+            builder.append_user(content="follow-up question")
+            _record_turn_to_transcript(
+                LLMLoopResponse(
+                    response_text="follow-up answer",
+                    tool_calls=[],
+                    raw_response=None,
+                ),
+                tool_results=None,
+                transcript_builder=builder,
+                model="test-model",
+            )
+
+            # --- 3. Gate + upload ---
+            assert (
+                should_upload_transcript(
+                    user_id="user-1", transcript_covers_prefix=covers
+                )
+                is True
+            )
+            await _upload_final_transcript(
+                user_id="user-1",
+                session_id="session-1",
+                transcript_builder=builder,
+                session_msg_count=4,
+            )
+
+        upload_mock.assert_awaited_once()
+        assert upload_mock.await_args is not None
+        uploaded = upload_mock.await_args.kwargs["content"]
+        assert "follow-up question" in uploaded
+        assert "follow-up answer" in uploaded
+        # Original prior-turn content preserved.
+        assert "user message 0" in uploaded
+        assert "assistant message 1" in uploaded
+
+    @pytest.mark.asyncio
+    async def test_lifecycle_stale_download_suppresses_upload(self):
+        """Stale download → covers=False → upload must be skipped."""
+        builder = TranscriptBuilder()
+        # session has 10 msgs but stored transcript only covers 2 → stale.
+        stale = TranscriptDownload(
+            content=_make_transcript_content("user", "assistant"),
+            message_count=2,
+        )
+
+        upload_mock = AsyncMock(return_value=None)
+        with (
+            patch(
+                "backend.copilot.baseline.service.download_transcript",
+                new=AsyncMock(return_value=stale),
+            ),
+            patch(
+                "backend.copilot.baseline.service.upload_transcript",
+                new=upload_mock,
+            ),
+        ):
+            covers = await _load_prior_transcript(
+                user_id="user-1",
+                session_id="session-1",
+                session_msg_count=10,
+                transcript_builder=builder,
+            )
+
+        assert covers is False
+        # The caller's gate mirrors the production path.
+        assert (
+            should_upload_transcript(user_id="user-1", transcript_covers_prefix=covers)
+            is False
+        )
+        upload_mock.assert_not_awaited()
+
+    @pytest.mark.asyncio
+    async def test_lifecycle_anonymous_user_skips_upload(self):
+        """Anonymous (user_id=None) → upload gate must return False."""
+        builder = TranscriptBuilder()
+        builder.append_user(content="hi")
+        builder.append_assistant(
+            content_blocks=[{"type": "text", "text": "hello"}],
+            model="test-model",
+            stop_reason=STOP_REASON_END_TURN,
+        )
+
+        assert (
+            should_upload_transcript(user_id=None, transcript_covers_prefix=True)
+            is False
+        )
+
+    @pytest.mark.asyncio
+    async def test_lifecycle_missing_download_still_uploads_new_content(self):
+        """No prior transcript → covers is False, so the upload gate
+        must skip the upload."""
+        builder = TranscriptBuilder()
+        upload_mock = AsyncMock(return_value=None)
+        with (
+            patch(
+                "backend.copilot.baseline.service.download_transcript",
+                new=AsyncMock(return_value=None),
+            ),
+            patch(
+                "backend.copilot.baseline.service.upload_transcript",
+                new=upload_mock,
+            ),
+        ):
+            covers = await _load_prior_transcript(
+                user_id="user-1",
+                session_id="session-1",
+                session_msg_count=1,
+                transcript_builder=builder,
+            )
+            # No download: covers is False, so the production path would
+            # skip upload. This protects against overwriting a future
+            # more-complete transcript with a single-turn snapshot.
+            assert covers is False
+            assert (
+                should_upload_transcript(
+                    user_id="user-1", transcript_covers_prefix=covers
+                )
+                is False
+            )
+            upload_mock.assert_not_awaited()
--- a/autogpt_platform/backend/backend/copilot/config.py
+++ b/autogpt_platform/backend/backend/copilot/config.py
@@ -8,13 +8,26 @@ from pydantic_settings import BaseSettings

 from backend.util.clients import OPENROUTER_BASE_URL

+# Per-request routing mode for a single chat turn.
+# - 'fast': route to the baseline OpenAI-compatible path with the cheaper model.
+# - 'extended_thinking': route to the Claude Agent SDK path with the default
+#   (opus) model.
+# ``None`` means "no override"; the server falls back to the Claude Code
+# subscription flag → LaunchDarkly COPILOT_SDK → config.use_claude_agent_sdk.
+CopilotMode = Literal["fast", "extended_thinking"]
+

 class ChatConfig(BaseSettings):
    """Configuration for the chat system."""

    # OpenAI API Configuration
    model: str = Field(
-        default="anthropic/claude-opus-4.6", description="Default model to use"
+        default="anthropic/claude-opus-4.6",
+        description="Default model for extended thinking mode",
+    )
+    fast_model: str = Field(
+        default="anthropic/claude-sonnet-4",
+        description="Model for fast mode (baseline path). Should be faster/cheaper than the default model.",
    )
    title_model: str = Field(
        default="openai/gpt-4o-mini",
@@ -81,11 +94,11 @@ class ChatConfig(BaseSettings):
    # allows ~70-100 turns/day.
    # Checked at the HTTP layer (routes.py) before each turn.
    #
-    # TODO: These are deploy-time constants applied identically to every user.
-    #  If per-user or per-plan limits are needed (e.g., free tier vs paid), these
-    #  must move to the database (e.g., a UserPlan table) and get_usage_status /
-    #  check_rate_limit would look up each user's specific limits instead of
-    #  reading config.daily_token_limit / config.weekly_token_limit.
+    # These are base limits for the FREE tier. Higher tiers (PRO, BUSINESS,
+    # ENTERPRISE) multiply these by their tier multiplier (see
+    # rate_limit.TIER_MULTIPLIERS). User tier is stored in the
+    # User.subscriptionTier DB column and resolved inside
+    # get_global_rate_limits().
    daily_token_limit: int = Field(
        default=2_500_000,
        description="Max tokens per day, resets at midnight UTC (0 = unlimited)",
--- a/autogpt_platform/backend/backend/copilot/db.py
+++ b/autogpt_platform/backend/backend/copilot/db.py
@@ -14,6 +14,7 @@ from prisma.types import (
    ChatSessionUpdateInput,
    ChatSessionWhereInput,
 )
+from pydantic import BaseModel

 from backend.data import db
 from backend.util.json import SafeJson, sanitize_string
@@ -23,12 +24,22 @@ from .model import (
    ChatSession,
    ChatSessionInfo,
    ChatSessionMetadata,
-    invalidate_session_cache,
+    cache_chat_session,
 )
+from .model import get_chat_session as get_chat_session_cached

 logger = logging.getLogger(__name__)


+class PaginatedMessages(BaseModel):
+    """Result of a paginated message query."""
+
+    messages: list[ChatMessage]
+    has_more: bool
+    oldest_sequence: int | None
+    session: ChatSessionInfo
+
+
 async def get_chat_session(session_id: str) -> ChatSession | None:
    """Get a chat session by ID from the database."""
    session = await PrismaChatSession.prisma().find_unique(
@@ -38,6 +49,116 @@ async def get_chat_session(session_id: str) -> ChatSession | None:
    return ChatSession.from_db(session) if session else None


+async def get_chat_session_metadata(session_id: str) -> ChatSessionInfo | None:
+    """Get chat session metadata (without messages) for ownership validation."""
+    session = await PrismaChatSession.prisma().find_unique(
+        where={"id": session_id},
+    )
+    return ChatSessionInfo.from_db(session) if session else None
+
+
+async def get_chat_messages_paginated(
+    session_id: str,
+    limit: int = 50,
+    before_sequence: int | None = None,
+    user_id: str | None = None,
+) -> PaginatedMessages | None:
+    """Get paginated messages for a session, newest first.
+
+    Verifies session existence (and ownership when ``user_id`` is provided)
+    in the same query that fetches the messages.  Returns ``None`` when the
+    session is not found or does not belong to the user.
+
+    Args:
+        session_id: The chat session ID.
+        limit: Max messages to return.
+        before_sequence: Cursor — return messages with sequence < this value.
+        user_id: If provided, filters via ``Session.userId`` so only the
+            session owner's messages are returned (acts as an ownership guard).
+    """
+    # Build session-existence / ownership check
+    session_where: ChatSessionWhereInput = {"id": session_id}
+    if user_id is not None:
+        session_where["userId"] = user_id
+
+    # Build message include — fetch paginated messages in the same query
+    msg_include: dict[str, Any] = {
+        "order_by": {"sequence": "desc"},
+        "take": limit + 1,
+    }
+    if before_sequence is not None:
+        msg_include["where"] = {"sequence": {"lt": before_sequence}}
+
+    # Single query: session existence/ownership + paginated messages
+    session = await PrismaChatSession.prisma().find_first(
+        where=session_where,
+        include={"Messages": msg_include},
+    )
+
+    if session is None:
+        return None
+
+    session_info = ChatSessionInfo.from_db(session)
+    results = list(session.Messages) if session.Messages else []
+
+    has_more = len(results) > limit
+    results = results[:limit]
+
+    # Reverse to ascending order
+    results.reverse()
+
+    # Tool-call boundary fix: if the oldest message is a tool message,
+    # expand backward to include the preceding assistant message that
+    # owns the tool_calls, so convertChatSessionMessagesToUiMessages
+    # can pair them correctly.
+    _BOUNDARY_SCAN_LIMIT = 10
+    if results and results[0].role == "tool":
+        boundary_where: dict[str, Any] = {
+            "sessionId": session_id,
+            "sequence": {"lt": results[0].sequence},
+        }
+        if user_id is not None:
+            boundary_where["Session"] = {"is": {"userId": user_id}}
+        extra = await PrismaChatMessage.prisma().find_many(
+            where=boundary_where,
+            order={"sequence": "desc"},
+            take=_BOUNDARY_SCAN_LIMIT,
+        )
+        # Find the first non-tool message (should be the assistant)
+        boundary_msgs = []
+        found_owner = False
+        for msg in extra:
+            boundary_msgs.append(msg)
+            if msg.role != "tool":
+                found_owner = True
+                break
+        boundary_msgs.reverse()
+        if not found_owner:
+            logger.warning(
+                "Boundary expansion did not find owning assistant message "
+                "for session=%s before sequence=%s (%d msgs scanned)",
+                session_id,
+                results[0].sequence,
+                len(extra),
+            )
+        if boundary_msgs:
+            results = boundary_msgs + results
+            # Only mark has_more if the expanded boundary isn't the
+            # very start of the conversation (sequence 0).
+            if boundary_msgs[0].sequence > 0:
+                has_more = True
+
+    messages = [ChatMessage.from_db(m) for m in results]
+    oldest_sequence = messages[0].sequence if messages else None
+
+    return PaginatedMessages(
+        messages=messages,
+        has_more=has_more,
+        oldest_sequence=oldest_sequence,
+        session=session_info,
+    )
+
+
 async def create_chat_session(
    session_id: str,
    user_id: str,
@@ -380,8 +501,11 @@ async def update_tool_message_content(
 async def set_turn_duration(session_id: str, duration_ms: int) -> None:
    """Set durationMs on the last assistant message in a session.

-    Also invalidates the Redis session cache so the next GET returns
-    the updated duration.
+    Updates the Redis cache in-place instead of invalidating it.
+    Invalidation would delete the key, creating a window where concurrent
+    ``get_chat_session`` calls re-populate the cache from DB — potentially
+    with stale data if the DB write from the previous turn hasn't propagated.
+    This race caused duplicate user messages on the next turn.
    """
    last_msg = await PrismaChatMessage.prisma().find_first(
        where={"sessionId": session_id, "role": "assistant"},
@@ -392,5 +516,13 @@ async def set_turn_duration(session_id: str, duration_ms: int) -> None:
            where={"id": last_msg.id},
            data={"durationMs": duration_ms},
        )
-        # Invalidate cache so the session is re-fetched from DB with durationMs
-        await invalidate_session_cache(session_id)
+        # Update cache in-place rather than invalidating to avoid a
+        # race window where the empty cache gets re-populated with
+        # stale data by a concurrent get_chat_session call.
+        session = await get_chat_session_cached(session_id)
+        if session and session.messages:
+            for msg in reversed(session.messages):
+                if msg.role == "assistant":
+                    msg.duration_ms = duration_ms
+                    break
+            await cache_chat_session(session)
--- a/autogpt_platform/backend/backend/copilot/db_test.py
+++ b/autogpt_platform/backend/backend/copilot/db_test.py
@@ -0,0 +1,388 @@
+"""Unit tests for copilot.db — paginated message queries."""
+
+from __future__ import annotations
+
+from datetime import UTC, datetime
+from typing import Any
+from unittest.mock import AsyncMock, patch
+
+import pytest
+from prisma.models import ChatMessage as PrismaChatMessage
+from prisma.models import ChatSession as PrismaChatSession
+
+from backend.copilot.db import (
+    PaginatedMessages,
+    get_chat_messages_paginated,
+    set_turn_duration,
+)
+from backend.copilot.model import ChatMessage as CopilotChatMessage
+from backend.copilot.model import ChatSession, get_chat_session, upsert_chat_session
+
+
+def _make_msg(
+    sequence: int,
+    role: str = "assistant",
+    content: str | None = "hello",
+    tool_calls: Any = None,
+) -> PrismaChatMessage:
+    """Build a minimal PrismaChatMessage for testing."""
+    return PrismaChatMessage(
+        id=f"msg-{sequence}",
+        createdAt=datetime.now(UTC),
+        sessionId="sess-1",
+        role=role,
+        content=content,
+        sequence=sequence,
+        toolCalls=tool_calls,
+        name=None,
+        toolCallId=None,
+        refusal=None,
+        functionCall=None,
+    )
+
+
+def _make_session(
+    session_id: str = "sess-1",
+    user_id: str = "user-1",
+    messages: list[PrismaChatMessage] | None = None,
+) -> PrismaChatSession:
+    """Build a minimal PrismaChatSession for testing."""
+    now = datetime.now(UTC)
+    session = PrismaChatSession.model_construct(
+        id=session_id,
+        createdAt=now,
+        updatedAt=now,
+        userId=user_id,
+        credentials={},
+        successfulAgentRuns={},
+        successfulAgentSchedules={},
+        totalPromptTokens=0,
+        totalCompletionTokens=0,
+        title=None,
+        metadata={},
+        Messages=messages or [],
+    )
+    return session
+
+
+SESSION_ID = "sess-1"
+
+
+@pytest.fixture()
+def mock_db():
+    """Patch ChatSession.prisma().find_first and ChatMessage.prisma().find_many.
+
+    find_first is used for the main query (session + included messages).
+    find_many is used only for boundary expansion queries.
+    """
+    with (
+        patch.object(PrismaChatSession, "prisma") as mock_session_prisma,
+        patch.object(PrismaChatMessage, "prisma") as mock_msg_prisma,
+    ):
+        find_first = AsyncMock()
+        mock_session_prisma.return_value.find_first = find_first
+
+        find_many = AsyncMock(return_value=[])
+        mock_msg_prisma.return_value.find_many = find_many
+
+        yield find_first, find_many
+
+
+# ---------- Basic pagination ----------
+
+
+@pytest.mark.asyncio
+async def test_basic_page_returns_messages_ascending(
+    mock_db: tuple[AsyncMock, AsyncMock],
+):
+    """Messages are returned in ascending sequence order."""
+    find_first, _ = mock_db
+    find_first.return_value = _make_session(
+        messages=[_make_msg(3), _make_msg(2), _make_msg(1)],
+    )
+
+    page = await get_chat_messages_paginated(SESSION_ID, limit=5)
+
+    assert isinstance(page, PaginatedMessages)
+    assert [m.sequence for m in page.messages] == [1, 2, 3]
+    assert page.has_more is False
+    assert page.oldest_sequence == 1
+
+
+@pytest.mark.asyncio
+async def test_has_more_when_results_exceed_limit(
+    mock_db: tuple[AsyncMock, AsyncMock],
+):
+    """has_more is True when DB returns more than limit items."""
+    find_first, _ = mock_db
+    find_first.return_value = _make_session(
+        messages=[_make_msg(3), _make_msg(2), _make_msg(1)],
+    )
+
+    page = await get_chat_messages_paginated(SESSION_ID, limit=2)
+
+    assert page is not None
+    assert page.has_more is True
+    assert len(page.messages) == 2
+    assert [m.sequence for m in page.messages] == [2, 3]
+
+
+@pytest.mark.asyncio
+async def test_empty_session_returns_no_messages(
+    mock_db: tuple[AsyncMock, AsyncMock],
+):
+    find_first, _ = mock_db
+    find_first.return_value = _make_session(messages=[])
+
+    page = await get_chat_messages_paginated(SESSION_ID, limit=50)
+
+    assert page is not None
+    assert page.messages == []
+    assert page.has_more is False
+    assert page.oldest_sequence is None
+
+
+@pytest.mark.asyncio
+async def test_before_sequence_filters_correctly(
+    mock_db: tuple[AsyncMock, AsyncMock],
+):
+    """before_sequence is passed as a where filter inside the Messages include."""
+    find_first, _ = mock_db
+    find_first.return_value = _make_session(
+        messages=[_make_msg(2), _make_msg(1)],
+    )
+
+    await get_chat_messages_paginated(SESSION_ID, limit=50, before_sequence=5)
+
+    call_kwargs = find_first.call_args.kwargs
+    include = call_kwargs["include"]
+    assert include["Messages"]["where"] == {"sequence": {"lt": 5}}
+
+
+@pytest.mark.asyncio
+async def test_no_where_on_messages_without_before_sequence(
+    mock_db: tuple[AsyncMock, AsyncMock],
+):
+    """Without before_sequence, the Messages include has no where clause."""
+    find_first, _ = mock_db
+    find_first.return_value = _make_session(messages=[_make_msg(1)])
+
+    await get_chat_messages_paginated(SESSION_ID, limit=50)
+
+    call_kwargs = find_first.call_args.kwargs
+    include = call_kwargs["include"]
+    assert "where" not in include["Messages"]
+
+
+@pytest.mark.asyncio
+async def test_user_id_filter_applied_to_session_where(
+    mock_db: tuple[AsyncMock, AsyncMock],
+):
+    """user_id adds a userId filter to the session-level where clause."""
+    find_first, _ = mock_db
+    find_first.return_value = _make_session(messages=[_make_msg(1)])
+
+    await get_chat_messages_paginated(SESSION_ID, limit=50, user_id="user-abc")
+
+    call_kwargs = find_first.call_args.kwargs
+    where = call_kwargs["where"]
+    assert where["userId"] == "user-abc"
+
+
+@pytest.mark.asyncio
+async def test_session_not_found_returns_none(
+    mock_db: tuple[AsyncMock, AsyncMock],
+):
+    """Returns None when session doesn't exist or user doesn't own it."""
+    find_first, _ = mock_db
+    find_first.return_value = None
+
+    page = await get_chat_messages_paginated(SESSION_ID, limit=50)
+
+    assert page is None
+
+
+@pytest.mark.asyncio
+async def test_session_info_included_in_result(
+    mock_db: tuple[AsyncMock, AsyncMock],
+):
+    """PaginatedMessages includes session metadata."""
+    find_first, _ = mock_db
+    find_first.return_value = _make_session(messages=[_make_msg(1)])
+
+    page = await get_chat_messages_paginated(SESSION_ID, limit=50)
+
+    assert page is not None
+    assert page.session.session_id == SESSION_ID
+
+
+# ---------- Backward boundary expansion ----------
+
+
+@pytest.mark.asyncio
+async def test_boundary_expansion_includes_assistant(
+    mock_db: tuple[AsyncMock, AsyncMock],
+):
+    """When page starts with a tool message, expand backward to include
+    the owning assistant message."""
+    find_first, find_many = mock_db
+    find_first.return_value = _make_session(
+        messages=[_make_msg(5, role="tool"), _make_msg(4, role="tool")],
+    )
+    find_many.return_value = [_make_msg(3, role="assistant")]
+
+    page = await get_chat_messages_paginated(SESSION_ID, limit=5)
+
+    assert page is not None
+    assert [m.sequence for m in page.messages] == [3, 4, 5]
+    assert page.messages[0].role == "assistant"
+    assert page.oldest_sequence == 3
+
+
+@pytest.mark.asyncio
+async def test_boundary_expansion_includes_multiple_tool_msgs(
+    mock_db: tuple[AsyncMock, AsyncMock],
+):
+    """Boundary expansion scans past consecutive tool messages to find
+    the owning assistant."""
+    find_first, find_many = mock_db
+    find_first.return_value = _make_session(
+        messages=[_make_msg(7, role="tool")],
+    )
+    find_many.return_value = [
+        _make_msg(6, role="tool"),
+        _make_msg(5, role="tool"),
+        _make_msg(4, role="assistant"),
+    ]
+
+    page = await get_chat_messages_paginated(SESSION_ID, limit=5)
+
+    assert page is not None
+    assert [m.sequence for m in page.messages] == [4, 5, 6, 7]
+    assert page.messages[0].role == "assistant"
+
+
+@pytest.mark.asyncio
+async def test_boundary_expansion_sets_has_more_when_not_at_start(
+    mock_db: tuple[AsyncMock, AsyncMock],
+):
+    """After boundary expansion, has_more=True if expanded msgs aren't at seq 0."""
+    find_first, find_many = mock_db
+    find_first.return_value = _make_session(
+        messages=[_make_msg(3, role="tool")],
+    )
+    find_many.return_value = [_make_msg(2, role="assistant")]
+
+    page = await get_chat_messages_paginated(SESSION_ID, limit=5)
+
+    assert page is not None
+    assert page.has_more is True
+
+
+@pytest.mark.asyncio
+async def test_boundary_expansion_no_has_more_at_conversation_start(
+    mock_db: tuple[AsyncMock, AsyncMock],
+):
+    """has_more stays False when boundary expansion reaches seq 0."""
+    find_first, find_many = mock_db
+    find_first.return_value = _make_session(
+        messages=[_make_msg(1, role="tool")],
+    )
+    find_many.return_value = [_make_msg(0, role="assistant")]
+
+    page = await get_chat_messages_paginated(SESSION_ID, limit=5)
+
+    assert page is not None
+    assert page.has_more is False
+    assert page.oldest_sequence == 0
+
+
+@pytest.mark.asyncio
+async def test_no_boundary_expansion_when_first_msg_not_tool(
+    mock_db: tuple[AsyncMock, AsyncMock],
+):
+    """No boundary expansion when the first message is not a tool message."""
+    find_first, find_many = mock_db
+    find_first.return_value = _make_session(
+        messages=[_make_msg(3, role="user"), _make_msg(2, role="assistant")],
+    )
+
+    page = await get_chat_messages_paginated(SESSION_ID, limit=5)
+
+    assert page is not None
+    assert find_many.call_count == 0
+    assert [m.sequence for m in page.messages] == [2, 3]
+
+
+@pytest.mark.asyncio
+async def test_boundary_expansion_warns_when_no_owner_found(
+    mock_db: tuple[AsyncMock, AsyncMock],
+):
+    """When boundary scan doesn't find a non-tool message, a warning is logged
+    and the boundary messages are still included."""
+    find_first, find_many = mock_db
+    find_first.return_value = _make_session(
+        messages=[_make_msg(10, role="tool")],
+    )
+    find_many.return_value = [_make_msg(i, role="tool") for i in range(9, -1, -1)]
+
+    with patch("backend.copilot.db.logger") as mock_logger:
+        page = await get_chat_messages_paginated(SESSION_ID, limit=5)
+        mock_logger.warning.assert_called_once()
+
+    assert page is not None
+    assert page.messages[0].role == "tool"
+    assert len(page.messages) > 1
+
+
+# ---------- Turn duration (integration tests) ----------
+
+
+@pytest.mark.asyncio(loop_scope="session")
+async def test_set_turn_duration_updates_cache_in_place(setup_test_user, test_user_id):
+    """set_turn_duration patches the cached session without invalidation.
+
+    Verifies that after calling set_turn_duration the Redis-cached session
+    reflects the updated durationMs on the last assistant message, without
+    the cache having been deleted and re-populated (which could race with
+    concurrent get_chat_session calls).
+    """
+    session = ChatSession.new(user_id=test_user_id, dry_run=False)
+    session.messages = [
+        CopilotChatMessage(role="user", content="hello"),
+        CopilotChatMessage(role="assistant", content="hi there"),
+    ]
+    session = await upsert_chat_session(session)
+
+    # Ensure the session is in cache
+    cached = await get_chat_session(session.session_id, test_user_id)
+    assert cached is not None
+    assert cached.messages[-1].duration_ms is None
+
+    # Update turn duration — should patch cache in-place
+    await set_turn_duration(session.session_id, 1234)
+
+    # Read from cache (not DB) — the cache should already have the update
+    updated = await get_chat_session(session.session_id, test_user_id)
+    assert updated is not None
+    assistant_msgs = [m for m in updated.messages if m.role == "assistant"]
+    assert len(assistant_msgs) == 1
+    assert assistant_msgs[0].duration_ms == 1234
+
+
+@pytest.mark.asyncio(loop_scope="session")
+async def test_set_turn_duration_no_assistant_message(setup_test_user, test_user_id):
+    """set_turn_duration is a no-op when there are no assistant messages."""
+    session = ChatSession.new(user_id=test_user_id, dry_run=False)
+    session.messages = [
+        CopilotChatMessage(role="user", content="hello"),
+    ]
+    session = await upsert_chat_session(session)
+
+    # Should not raise
+    await set_turn_duration(session.session_id, 5678)
+
+    cached = await get_chat_session(session.session_id, test_user_id)
+    assert cached is not None
+    # User message should not have durationMs
+    assert cached.messages[0].duration_ms is None
--- a/autogpt_platform/backend/backend/copilot/executor/processor.py
+++ b/autogpt_platform/backend/backend/copilot/executor/processor.py
@@ -13,7 +13,7 @@ import time

 from backend.copilot import stream_registry
 from backend.copilot.baseline import stream_chat_completion_baseline
-from backend.copilot.config import ChatConfig
+from backend.copilot.config import ChatConfig, CopilotMode
 from backend.copilot.response_model import StreamError
 from backend.copilot.sdk import service as sdk_service
 from backend.copilot.sdk.dummy import stream_chat_completion_dummy
@@ -30,6 +30,57 @@ from .utils import CoPilotExecutionEntry, CoPilotLogMetadata
 logger = TruncatedLogger(logging.getLogger(__name__), prefix="[CoPilotExecutor]")


+# ============ Mode Routing ============ #
+
+
+async def resolve_effective_mode(
+    mode: CopilotMode | None,
+    user_id: str | None,
+) -> CopilotMode | None:
+    """Strip ``mode`` when the user is not entitled to the toggle.
+
+    The UI gates the mode toggle behind ``CHAT_MODE_OPTION``; the
+    processor enforces the same gate server-side so an authenticated
+    user cannot bypass the flag by crafting a request directly.
+    """
+    if mode is None:
+        return None
+    allowed = await is_feature_enabled(
+        Flag.CHAT_MODE_OPTION,
+        user_id or "anonymous",
+        default=False,
+    )
+    if not allowed:
+        logger.info(f"Ignoring mode={mode} — CHAT_MODE_OPTION is disabled for user")
+        return None
+    return mode
+
+
+async def resolve_use_sdk_for_mode(
+    mode: CopilotMode | None,
+    user_id: str | None,
+    *,
+    use_claude_code_subscription: bool,
+    config_default: bool,
+) -> bool:
+    """Pick the SDK vs baseline path for a single turn.
+
+    Per-request ``mode`` wins whenever it is set (after the
+    ``CHAT_MODE_OPTION`` gate has been applied upstream).  Otherwise
+    falls back to the Claude Code subscription override, then the
+    ``COPILOT_SDK`` LaunchDarkly flag, then the config default.
+    """
+    if mode == "fast":
+        return False
+    if mode == "extended_thinking":
+        return True
+    return use_claude_code_subscription or await is_feature_enabled(
+        Flag.COPILOT_SDK,
+        user_id or "anonymous",
+        default=config_default,
+    )
+
+
 # ============ Module Entry Points ============ #

 # Thread-local storage for processor instances
@@ -100,8 +151,8 @@ class CoPilotProcessor:
        This method is called once per worker thread to set up the async event
        loop and initialize any required resources.

-        Database is accessed only through DatabaseManager, so we don't need to connect
-        to Prisma directly.
+        DB operations route through DatabaseManagerAsyncClient (RPC) via the
+        db_accessors pattern — no direct Prisma connection is needed here.
        """
        configure_logging()
        set_service_name("CoPilotExecutor")
@@ -250,21 +301,26 @@ class CoPilotProcessor:
            if config.test_mode:
                stream_fn = stream_chat_completion_dummy
                log.warning("Using DUMMY service (CHAT_TEST_MODE=true)")
+                effective_mode = None
            else:
-                use_sdk = (
-                    config.use_claude_code_subscription
-                    or await is_feature_enabled(
-                        Flag.COPILOT_SDK,
-                        entry.user_id or "anonymous",
-                        default=config.use_claude_agent_sdk,
-                    )
+                # Enforce server-side feature-flag gate so unauthorised
+                # users cannot force a mode by crafting the request.
+                effective_mode = await resolve_effective_mode(entry.mode, entry.user_id)
+                use_sdk = await resolve_use_sdk_for_mode(
+                    effective_mode,
+                    entry.user_id,
+                    use_claude_code_subscription=config.use_claude_code_subscription,
+                    config_default=config.use_claude_agent_sdk,
                )
                stream_fn = (
                    sdk_service.stream_chat_completion_sdk
                    if use_sdk
                    else stream_chat_completion_baseline
                )
-                log.info(f"Using {'SDK' if use_sdk else 'baseline'} service")
+                log.info(
+                    f"Using {'SDK' if use_sdk else 'baseline'} service "
+                    f"(mode={effective_mode or 'default'})"
+                )

            # Stream chat completion and publish chunks to Redis.
            # stream_and_publish wraps the raw stream with registry
@@ -276,6 +332,7 @@ class CoPilotProcessor:
                user_id=entry.user_id,
                context=entry.context,
                file_ids=entry.file_ids,
+                mode=effective_mode,
            )
            async for chunk in stream_registry.stream_and_publish(
                session_id=entry.session_id,
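Review note: the routing precedence introduced in `processor.py` above can be summarised in a small synchronous sketch. This is an illustration only, not the production code. `route_turn` and its boolean parameters are hypothetical stand-ins for the awaited `is_feature_enabled` lookups and the config values, and the `COPILOT_SDK` default is assumed to be already folded into `sdk_flag`.

```python
from typing import Literal, Optional

Mode = Literal["fast", "extended_thinking"]


def route_turn(
    mode: Optional[Mode],
    *,
    mode_toggle_enabled: bool,  # hypothetical: result of the CHAT_MODE_OPTION lookup
    subscription: bool,         # hypothetical: config.use_claude_code_subscription
    sdk_flag: bool,             # hypothetical: result of the COPILOT_SDK lookup
) -> tuple[Optional[Mode], bool]:
    """Return (effective_mode, use_sdk) for a single turn."""
    # Gate first: a requested mode is dropped unless the toggle is enabled.
    effective = mode if (mode is not None and mode_toggle_enabled) else None
    # An explicit mode that survives the gate overrides every other input.
    if effective == "fast":
        return effective, False
    if effective == "extended_thinking":
        return effective, True
    # No effective mode: subscription override first, then the SDK flag.
    return None, subscription or sdk_flag
```

Under these assumptions, a gated-off `"fast"` request behaves exactly like a request with no mode, so it can still land on the SDK path when the subscription override or the SDK flag is on.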
--- a/autogpt_platform/backend/backend/copilot/executor/processor_test.py
+++ b/autogpt_platform/backend/backend/copilot/executor/processor_test.py
@@ -0,0 +1,175 @@
+"""Unit tests for CoPilot mode routing logic in the processor.
+
+Tests cover the mode→service mapping:
+  - 'fast' → baseline service
+  - 'extended_thinking' → SDK service
+  - None → feature flag / config fallback
+
+as well as the ``CHAT_MODE_OPTION`` server-side gate.  The tests import
+the real production helpers from ``processor.py`` so the routing logic
+has meaningful coverage.
+"""
+
+from unittest.mock import AsyncMock, patch
+
+import pytest
+
+from backend.copilot.executor.processor import (
+    resolve_effective_mode,
+    resolve_use_sdk_for_mode,
+)
+
+
+class TestResolveUseSdkForMode:
+    """Tests for the per-request mode routing logic."""
+
+    @pytest.mark.asyncio
+    async def test_fast_mode_uses_baseline(self):
+        """mode='fast' always routes to baseline, regardless of flags."""
+        with patch(
+            "backend.copilot.executor.processor.is_feature_enabled",
+            new=AsyncMock(return_value=True),
+        ):
+            assert (
+                await resolve_use_sdk_for_mode(
+                    "fast",
+                    "user-1",
+                    use_claude_code_subscription=True,
+                    config_default=True,
+                )
+                is False
+            )
+
+    @pytest.mark.asyncio
+    async def test_extended_thinking_uses_sdk(self):
+        """mode='extended_thinking' always routes to SDK, regardless of flags."""
+        with patch(
+            "backend.copilot.executor.processor.is_feature_enabled",
+            new=AsyncMock(return_value=False),
+        ):
+            assert (
+                await resolve_use_sdk_for_mode(
+                    "extended_thinking",
+                    "user-1",
+                    use_claude_code_subscription=False,
+                    config_default=False,
+                )
+                is True
+            )
+
+    @pytest.mark.asyncio
+    async def test_none_mode_uses_subscription_override(self):
+        """mode=None with claude_code_subscription=True routes to SDK."""
+        with patch(
+            "backend.copilot.executor.processor.is_feature_enabled",
+            new=AsyncMock(return_value=False),
+        ):
+            assert (
+                await resolve_use_sdk_for_mode(
+                    None,
+                    "user-1",
+                    use_claude_code_subscription=True,
+                    config_default=False,
+                )
+                is True
+            )
+
+    @pytest.mark.asyncio
+    async def test_none_mode_uses_feature_flag(self):
+        """mode=None with feature flag enabled routes to SDK."""
+        with patch(
+            "backend.copilot.executor.processor.is_feature_enabled",
+            new=AsyncMock(return_value=True),
+        ) as flag_mock:
+            assert (
+                await resolve_use_sdk_for_mode(
+                    None,
+                    "user-1",
+                    use_claude_code_subscription=False,
+                    config_default=False,
+                )
+                is True
+            )
+            flag_mock.assert_awaited_once()
+
+    @pytest.mark.asyncio
+    async def test_none_mode_uses_config_default(self):
+        """mode=None falls back to config.use_claude_agent_sdk."""
+        # When LaunchDarkly returns the default (True), we expect SDK routing.
+        with patch(
+            "backend.copilot.executor.processor.is_feature_enabled",
+            new=AsyncMock(return_value=True),
+        ):
+            assert (
+                await resolve_use_sdk_for_mode(
+                    None,
+                    "user-1",
+                    use_claude_code_subscription=False,
+                    config_default=True,
+                )
+                is True
+            )
+
+    @pytest.mark.asyncio
+    async def test_none_mode_all_disabled(self):
+        """mode=None with all flags off routes to baseline."""
+        with patch(
+            "backend.copilot.executor.processor.is_feature_enabled",
+            new=AsyncMock(return_value=False),
+        ):
+            assert (
+                await resolve_use_sdk_for_mode(
+                    None,
+                    "user-1",
+                    use_claude_code_subscription=False,
+                    config_default=False,
+                )
+                is False
+            )
+
+
+class TestResolveEffectiveMode:
+    """Tests for the CHAT_MODE_OPTION server-side gate."""
+
+    @pytest.mark.asyncio
+    async def test_none_mode_passes_through(self):
+        """mode=None is returned as-is without a flag check."""
+        with patch(
+            "backend.copilot.executor.processor.is_feature_enabled",
+            new=AsyncMock(return_value=False),
+        ) as flag_mock:
+            assert await resolve_effective_mode(None, "user-1") is None
+            flag_mock.assert_not_awaited()
+
+    @pytest.mark.asyncio
+    async def test_mode_stripped_when_flag_disabled(self):
+        """When CHAT_MODE_OPTION is off, mode is dropped to None."""
+        with patch(
+            "backend.copilot.executor.processor.is_feature_enabled",
+            new=AsyncMock(return_value=False),
+        ):
+            assert await resolve_effective_mode("fast", "user-1") is None
+            assert await resolve_effective_mode("extended_thinking", "user-1") is None
+
+    @pytest.mark.asyncio
+    async def test_mode_preserved_when_flag_enabled(self):
+        """When CHAT_MODE_OPTION is on, the user-selected mode is preserved."""
+        with patch(
+            "backend.copilot.executor.processor.is_feature_enabled",
+            new=AsyncMock(return_value=True),
+        ):
+            assert await resolve_effective_mode("fast", "user-1") == "fast"
+            assert (
+                await resolve_effective_mode("extended_thinking", "user-1")
+                == "extended_thinking"
+            )
+
+    @pytest.mark.asyncio
+    async def test_anonymous_user_with_mode(self):
+        """Anonymous users (user_id=None) are still checked against the gate."""
+        with patch(
+            "backend.copilot.executor.processor.is_feature_enabled",
+            new=AsyncMock(return_value=False),
+        ) as flag_mock:
+            assert await resolve_effective_mode("fast", None) is None
+            flag_mock.assert_awaited_once()
--- a/autogpt_platform/backend/backend/copilot/executor/utils.py
+++ b/autogpt_platform/backend/backend/copilot/executor/utils.py
@@ -9,6 +9,7 @@ import logging

 from pydantic import BaseModel

+from backend.copilot.config import CopilotMode
 from backend.data.rabbitmq import Exchange, ExchangeType, Queue, RabbitMQConfig
 from backend.util.logging import TruncatedLogger, is_structured_logging_enabled

@@ -156,6 +157,9 @@ class CoPilotExecutionEntry(BaseModel):
    file_ids: list[str] | None = None
    """Workspace file IDs attached to the user's message"""

+    mode: CopilotMode | None = None
+    """Autopilot mode override: 'fast' or 'extended_thinking'. None = server default."""
+

 class CancelCoPilotEvent(BaseModel):
    """Event to cancel a CoPilot operation."""
@@ -175,6 +179,7 @@ async def enqueue_copilot_turn(
    is_user_message: bool = True,
    context: dict[str, str] | None = None,
    file_ids: list[str] | None = None,
+    mode: CopilotMode | None = None,
 ) -> None:
    """Enqueue a CoPilot task for processing by the executor service.

@@ -186,6 +191,7 @@ async def enqueue_copilot_turn(
        is_user_message: Whether the message is from the user (vs system/assistant)
        context: Optional context for the message (e.g., {url: str, content: str})
        file_ids: Optional workspace file IDs attached to the user's message
+        mode: Autopilot mode override ('fast' or 'extended_thinking'). None = server default.
    """
    from backend.util.clients import get_async_copilot_queue

@@ -197,6 +203,7 @@ async def enqueue_copilot_turn(
        is_user_message=is_user_message,
        context=context,
        file_ids=file_ids,
+        mode=mode,
    )

    queue_client = await get_async_copilot_queue()
--- a/autogpt_platform/backend/backend/copilot/executor/utils_test.py
+++ b/autogpt_platform/backend/backend/copilot/executor/utils_test.py
@@ -0,0 +1,123 @@
+"""Tests for CoPilot executor utils (queue config, message models, logging)."""
+
+from backend.copilot.executor.utils import (
+    COPILOT_EXECUTION_EXCHANGE,
+    COPILOT_EXECUTION_QUEUE_NAME,
+    COPILOT_EXECUTION_ROUTING_KEY,
+    CancelCoPilotEvent,
+    CoPilotExecutionEntry,
+    CoPilotLogMetadata,
+    create_copilot_queue_config,
+)
+
+
+class TestCoPilotExecutionEntry:
+    def test_basic_fields(self):
+        entry = CoPilotExecutionEntry(
+            session_id="s1",
+            user_id="u1",
+            message="hello",
+        )
+        assert entry.session_id == "s1"
+        assert entry.user_id == "u1"
+        assert entry.message == "hello"
+        assert entry.is_user_message is True
+        assert entry.mode is None
+        assert entry.context is None
+        assert entry.file_ids is None
+
+    def test_mode_field(self):
+        entry = CoPilotExecutionEntry(
+            session_id="s1",
+            user_id="u1",
+            message="test",
+            mode="fast",
+        )
+        assert entry.mode == "fast"
+
+        entry2 = CoPilotExecutionEntry(
+            session_id="s1",
+            user_id="u1",
+            message="test",
+            mode="extended_thinking",
+        )
+        assert entry2.mode == "extended_thinking"
+
+    def test_optional_fields(self):
+        entry = CoPilotExecutionEntry(
+            session_id="s1",
+            user_id="u1",
+            message="test",
+            turn_id="t1",
+            context={"url": "https://example.com"},
+            file_ids=["f1", "f2"],
+            is_user_message=False,
+        )
+        assert entry.turn_id == "t1"
+        assert entry.context == {"url": "https://example.com"}
+        assert entry.file_ids == ["f1", "f2"]
+        assert entry.is_user_message is False
+
+    def test_serialization_roundtrip(self):
+        entry = CoPilotExecutionEntry(
+            session_id="s1",
+            user_id="u1",
+            message="hello",
+            mode="fast",
+        )
+        json_str = entry.model_dump_json()
+        restored = CoPilotExecutionEntry.model_validate_json(json_str)
+        assert restored == entry
+
+
+class TestCancelCoPilotEvent:
+    def test_basic(self):
+        event = CancelCoPilotEvent(session_id="s1")
+        assert event.session_id == "s1"
+
+    def test_serialization(self):
+        event = CancelCoPilotEvent(session_id="s1")
+        restored = CancelCoPilotEvent.model_validate_json(event.model_dump_json())
+        assert restored.session_id == "s1"
+
+
+class TestCreateCopilotQueueConfig:
+    def test_returns_valid_config(self):
+        config = create_copilot_queue_config()
+        assert len(config.exchanges) == 2
+        assert len(config.queues) == 2
+
+    def test_execution_queue_properties(self):
+        config = create_copilot_queue_config()
+        exec_queue = next(
+            q for q in config.queues if q.name == COPILOT_EXECUTION_QUEUE_NAME
+        )
+        assert exec_queue.durable is True
+        assert exec_queue.exchange == COPILOT_EXECUTION_EXCHANGE
+        assert exec_queue.routing_key == COPILOT_EXECUTION_ROUTING_KEY
+
+    def test_cancel_queue_uses_fanout(self):
+        config = create_copilot_queue_config()
+        cancel_queue = next(
+            q for q in config.queues if q.name != COPILOT_EXECUTION_QUEUE_NAME
+        )
+        assert cancel_queue.exchange is not None
+        assert cancel_queue.exchange.type.value == "fanout"
+
+
+class TestCoPilotLogMetadata:
+    def test_creates_logger_with_metadata(self):
+        import logging
+
+        base_logger = logging.getLogger("test")
+        log = CoPilotLogMetadata(base_logger, session_id="s1", user_id="u1")
+        assert log is not None
+
+    def test_filters_none_values(self):
+        import logging
+
+        base_logger = logging.getLogger("test")
+        log = CoPilotLogMetadata(
+            base_logger, session_id="s1", user_id=None, turn_id="t1"
+        )
+        assert log is not None
--- a/autogpt_platform/backend/backend/copilot/model.py
+++ b/autogpt_platform/backend/backend/copilot/model.py
@@ -64,6 +64,7 @@ class ChatMessage(BaseModel):
    refusal: str | None = None
    tool_calls: list[dict] | None = None
    function_call: dict | None = None
+    sequence: int | None = None
    duration_ms: int | None = None

    @staticmethod
@@ -77,10 +78,54 @@ class ChatMessage(BaseModel):
            refusal=prisma_message.refusal,
            tool_calls=_parse_json_field(prisma_message.toolCalls),
            function_call=_parse_json_field(prisma_message.functionCall),
+            sequence=prisma_message.sequence,
            duration_ms=prisma_message.durationMs,
        )


+def is_message_duplicate(
+    messages: list[ChatMessage],
+    role: str,
+    content: str,
+) -> bool:
+    """Check whether *content* is already present in the current pending turn.
+
+    Only inspects trailing messages that share the given *role* (i.e. the
+    current turn). This ensures legitimately repeated messages across different
+    turns are not suppressed, while same-turn duplicates from stale cache are
+    still caught.
+    """
+    for m in reversed(messages):
+        if m.role == role:
+            if m.content == content:
+                return True
+        else:
+            break
+    return False
+
+
+def maybe_append_user_message(
+    session: "ChatSession",
+    message: str | None,
+    is_user_message: bool,
+) -> bool:
+    """Append a user/assistant message to the session if not already present.
+
+    The route handler already persists the user message before enqueueing,
+    so we check trailing same-role messages to avoid re-appending when the
+    session cache is slightly stale.
+
+    Returns True if the message was appended, False if skipped.
+    """
+    if not message:
+        return False
+    role = "user" if is_user_message else "assistant"
+    if is_message_duplicate(session.messages, role, message):
+        return False
+    session.messages.append(ChatMessage(role=role, content=message))
+    return True
+
+
 class Usage(BaseModel):
    prompt_tokens: int
    completion_tokens: int
--- a/autogpt_platform/backend/backend/copilot/model_test.py
+++ b/autogpt_platform/backend/backend/copilot/model_test.py
@@ -17,6 +17,8 @@ from .model import (
    ChatSession,
    Usage,
    get_chat_session,
+    is_message_duplicate,
+    maybe_append_user_message,
    upsert_chat_session,
 )

@@ -424,3 +426,151 @@ async def test_concurrent_saves_collision_detection(setup_test_user, test_user_i
    assert "Streaming message 1" in contents
    assert "Streaming message 2" in contents
    assert "Callback result" in contents
+
+
+# --------------------------------------------------------------------------- #
+#  is_message_duplicate                                                        #
+# --------------------------------------------------------------------------- #
+
+
+def test_duplicate_detected_in_trailing_same_role():
+    """Duplicate user message at the tail is detected."""
+    msgs = [
+        ChatMessage(role="user", content="hello"),
+        ChatMessage(role="assistant", content="hi there"),
+        ChatMessage(role="user", content="yes"),
+    ]
+    assert is_message_duplicate(msgs, "user", "yes") is True
+
+
+def test_duplicate_not_detected_across_turns():
+    """Same text in a previous turn (separated by assistant) is NOT a duplicate."""
+    msgs = [
+        ChatMessage(role="user", content="yes"),
+        ChatMessage(role="assistant", content="ok"),
+    ]
+    assert is_message_duplicate(msgs, "user", "yes") is False
+
+
+def test_no_duplicate_on_empty_messages():
+    """Empty message list never reports a duplicate."""
+    assert is_message_duplicate([], "user", "hello") is False
+
+
+def test_no_duplicate_when_content_differs():
+    """Different content in the trailing same-role block is not a duplicate."""
+    msgs = [
+        ChatMessage(role="assistant", content="response"),
+        ChatMessage(role="user", content="first message"),
+    ]
+    assert is_message_duplicate(msgs, "user", "second message") is False
+
+
+def test_duplicate_with_multiple_trailing_same_role():
+    """Detects duplicate among multiple consecutive same-role messages."""
+    msgs = [
+        ChatMessage(role="assistant", content="response"),
+        ChatMessage(role="user", content="msg1"),
+        ChatMessage(role="user", content="msg2"),
+    ]
+    assert is_message_duplicate(msgs, "user", "msg1") is True
+    assert is_message_duplicate(msgs, "user", "msg2") is True
+    assert is_message_duplicate(msgs, "user", "msg3") is False
+
+
+def test_duplicate_check_for_assistant_role():
+    """Works correctly when checking assistant role too."""
+    msgs = [
+        ChatMessage(role="user", content="hi"),
+        ChatMessage(role="assistant", content="hello"),
+        ChatMessage(role="assistant", content="how can I help?"),
+    ]
+    assert is_message_duplicate(msgs, "assistant", "hello") is True
+    assert is_message_duplicate(msgs, "assistant", "new response") is False
+
+
+def test_no_false_positive_when_content_is_none():
+    """Messages with content=None in the trailing block do not match."""
+    msgs = [
+        ChatMessage(role="user", content=None),
+        ChatMessage(role="user", content="hello"),
+    ]
+    assert is_message_duplicate(msgs, "user", "hello") is True
+    # None-content message should not match any string
+    msgs2 = [
+        ChatMessage(role="user", content=None),
+    ]
+    assert is_message_duplicate(msgs2, "user", "hello") is False
+
+
+def test_all_same_role_messages():
+    """When all messages share the same role, the entire list is scanned."""
+    msgs = [
+        ChatMessage(role="user", content="first"),
+        ChatMessage(role="user", content="second"),
+        ChatMessage(role="user", content="third"),
+    ]
+    assert is_message_duplicate(msgs, "user", "first") is True
+    assert is_message_duplicate(msgs, "user", "new") is False
+
+
+# --------------------------------------------------------------------------- #
+#  maybe_append_user_message                                                   #
+# --------------------------------------------------------------------------- #
+
+
+def test_maybe_append_user_message_appends_new():
+    """A new user message is appended and returns True."""
+    session = ChatSession.new(user_id="u", dry_run=False)
+    session.messages = [
+        ChatMessage(role="assistant", content="hello"),
+    ]
+    result = maybe_append_user_message(session, "new msg", is_user_message=True)
+    assert result is True
+    assert len(session.messages) == 2
+    assert session.messages[-1].role == "user"
+    assert session.messages[-1].content == "new msg"
+
+
+def test_maybe_append_user_message_skips_duplicate():
+    """A duplicate user message is skipped and returns False."""
+    session = ChatSession.new(user_id="u", dry_run=False)
+    session.messages = [
+        ChatMessage(role="assistant", content="hello"),
+        ChatMessage(role="user", content="dup"),
+    ]
+    result = maybe_append_user_message(session, "dup", is_user_message=True)
+    assert result is False
+    assert len(session.messages) == 2
+
+
+def test_maybe_append_user_message_none_message():
+    """None/empty message returns False without appending."""
+    session = ChatSession.new(user_id="u", dry_run=False)
+    assert maybe_append_user_message(session, None, is_user_message=True) is False
+    assert maybe_append_user_message(session, "", is_user_message=True) is False
+    assert len(session.messages) == 0
+
+
+def test_maybe_append_assistant_message():
+    """Works for assistant role when is_user_message=False."""
+    session = ChatSession.new(user_id="u", dry_run=False)
+    session.messages = [
+        ChatMessage(role="user", content="hi"),
+    ]
+    result = maybe_append_user_message(session, "response", is_user_message=False)
+    assert result is True
+    assert session.messages[-1].role == "assistant"
+    assert session.messages[-1].content == "response"
+
+
+def test_maybe_append_assistant_skips_duplicate():
+    """Duplicate assistant message is skipped."""
+    session = ChatSession.new(user_id="u", dry_run=False)
+    session.messages = [
+        ChatMessage(role="user", content="hi"),
+        ChatMessage(role="assistant", content="dup"),
+    ]
+    result = maybe_append_user_message(session, "dup", is_user_message=False)
+    assert result is False
+    assert len(session.messages) == 2
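Review note: the trailing-same-role scan that these tests exercise can be reproduced in isolation. The sketch below is a minimal stand-alone version, assuming a hypothetical `Msg` dataclass in place of `ChatMessage` and a hypothetical function name. It mirrors the behaviour described in the `is_message_duplicate` docstring rather than importing the real helper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Msg:  # hypothetical stand-in for ChatMessage
    role: str
    content: Optional[str]


def is_duplicate_in_trailing_turn(messages: list[Msg], role: str, content: str) -> bool:
    """Scan backward over the trailing run of same-role messages only."""
    for m in reversed(messages):
        if m.role != role:
            break  # reached a different role, i.e. the previous turn
        if m.content == content:
            return True
    return False


history = [Msg("user", "yes"), Msg("assistant", "ok")]
# "yes" was said in an earlier turn, so it is not treated as a duplicate.
across_turns = is_duplicate_in_trailing_turn(history, "user", "yes")

history.append(Msg("user", "yes"))
# Now "yes" sits in the trailing user run, so a second append would be skipped.
same_turn = is_duplicate_in_trailing_turn(history, "user", "yes")
```

Stopping at the first message with a different role is what keeps a legitimately repeated reply in a later turn from being suppressed.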
--- a/autogpt_platform/backend/backend/copilot/prompting.py
+++ b/autogpt_platform/backend/backend/copilot/prompting.py
@@ -126,6 +126,21 @@ After building the file, reference it with `@@agptfile:` in other tools:
 - When spawning sub-agents for research, ensure each has a distinct
  non-overlapping scope to avoid redundant searches.

+
+### Tool Discovery Priority
+
+When the user asks to interact with a service or API, follow this order:
+
+1. **find_block first** — Search platform blocks with `find_block`. The platform has hundreds of built-in blocks (Google Sheets, Docs, Calendar, Gmail, Slack, GitHub, etc.) that work without extra setup.
+
+2. **run_mcp_tool** — If no matching block exists, check if a hosted MCP server is available for the service. Only use known MCP server URLs from the registry.
+
+3. **SendAuthenticatedWebRequestBlock** — If no block or MCP server exists, use `SendAuthenticatedWebRequestBlock` with existing host-scoped credentials. Check available credentials via `connect_integration`.
+
+4. **Manual API call** — As a last resort, guide the user to set up credentials and use `SendAuthenticatedWebRequestBlock` with direct API calls.
+
+**Never skip step 1.** Built-in blocks are more reliable, better tested, and more user-friendly than MCP or raw API calls.
+
 ### Sub-agent tasks
 - When using the Task tool, NEVER set `run_in_background` to true.
  All tasks must run in the foreground.
--- a/autogpt_platform/backend/backend/copilot/rate_limit.py
+++ b/autogpt_platform/backend/backend/copilot/rate_limit.py
@@ -9,11 +9,15 @@ UTC). Fails open when Redis is unavailable to avoid blocking users.
 import asyncio
 import logging
 from datetime import UTC, datetime, timedelta
+from enum import Enum

+from prisma.models import User as PrismaUser
 from pydantic import BaseModel, Field
 from redis.exceptions import RedisError

+from backend.data.db_accessors import user_db
 from backend.data.redis_client import get_redis_async
+from backend.util.cache import cached

 logger = logging.getLogger(__name__)

@@ -21,6 +25,40 @@ logger = logging.getLogger(__name__)
 _USAGE_KEY_PREFIX = "copilot:usage"


+# ---------------------------------------------------------------------------
+# Subscription tier definitions
+# ---------------------------------------------------------------------------
+
+
+class SubscriptionTier(str, Enum):
+    """Subscription tiers with increasing token allowances.
+
+    Mirrors the ``SubscriptionTier`` enum in ``schema.prisma``.
+    Once ``prisma generate`` is run, this can be replaced with::
+
+        from prisma.enums import SubscriptionTier
+    """
+
+    FREE = "FREE"
+    PRO = "PRO"
+    BUSINESS = "BUSINESS"
+    ENTERPRISE = "ENTERPRISE"
+
+
+# Multiplier applied to the base limits (from LD / config) for each tier.
+# Intentionally int (not float): keeps limits as whole token counts and avoids
+# floating-point rounding.  If fractional multipliers are ever needed, change
+# the type and round the result in get_global_rate_limits().
+TIER_MULTIPLIERS: dict[SubscriptionTier, int] = {
+    SubscriptionTier.FREE: 1,
+    SubscriptionTier.PRO: 5,
+    SubscriptionTier.BUSINESS: 20,
+    SubscriptionTier.ENTERPRISE: 60,
+}
+
+DEFAULT_TIER = SubscriptionTier.FREE
+
+
 class UsageWindow(BaseModel):
    """Usage within a single time window."""

@@ -36,6 +74,7 @@ class CoPilotUsageStatus(BaseModel):

    daily: UsageWindow
    weekly: UsageWindow
+    tier: SubscriptionTier = DEFAULT_TIER
    reset_cost: int = Field(
        default=0,
        description="Credit cost (in cents) to reset the daily limit. 0 = feature disabled.",
@@ -66,6 +105,7 @@ async def get_usage_status(
    daily_token_limit: int,
    weekly_token_limit: int,
    rate_limit_reset_cost: int = 0,
+    tier: SubscriptionTier = DEFAULT_TIER,
 ) -> CoPilotUsageStatus:
    """Get current usage status for a user.

@@ -74,6 +114,7 @@ async def get_usage_status(
        daily_token_limit: Max tokens per day (0 = unlimited).
        weekly_token_limit: Max tokens per week (0 = unlimited).
        rate_limit_reset_cost: Credit cost (cents) to reset daily limit (0 = disabled).
+        tier: The user's rate-limit tier (included in the response).

    Returns:
        CoPilotUsageStatus with current usage and limits.
@@ -103,6 +144,7 @@ async def get_usage_status(
            limit=weekly_token_limit,
            resets_at=_weekly_reset_time(now=now),
        ),
+        tier=tier,
        reset_cost=rate_limit_reset_cost,
    )

@@ -343,20 +385,103 @@ async def record_token_usage(
        )


+class _UserNotFoundError(Exception):
+    """Raised when a user record is missing or has no subscription tier.
+
+    Used internally by ``_fetch_user_tier`` to signal a cache-miss condition:
+    by raising instead of returning ``DEFAULT_TIER``, we prevent the ``@cached``
+    decorator from storing the fallback value.  This avoids a race condition
+    where a non-existent user's DEFAULT_TIER is cached, then the user is
+    created with a higher tier but receives the stale cached FREE tier for
+    up to 5 minutes.
+    """
+
+
+@cached(maxsize=1000, ttl_seconds=300, shared_cache=True)
+async def _fetch_user_tier(user_id: str) -> SubscriptionTier:
+    """Fetch the user's rate-limit tier from the database (cached via Redis).
+
+    Uses ``shared_cache=True`` so that tier changes propagate across all pods
+    immediately when the cache entry is invalidated (via ``cache_delete``).
+
+    Only successful DB lookups of existing users with a valid tier are cached.
+    Raises ``_UserNotFoundError`` when the user is missing or has no tier, so
+    the ``@cached`` decorator does **not** store a fallback value.  This
+    prevents a race condition where a non-existent user's ``DEFAULT_TIER`` is
+    cached and then persists after the user is created with a higher tier.
+    """
+    try:
+        user = await user_db().get_user_by_id(user_id)
+    except Exception:
+        raise _UserNotFoundError(user_id)
+    if user.subscription_tier:
+        return SubscriptionTier(user.subscription_tier)
+    raise _UserNotFoundError(user_id)
+
+
+async def get_user_tier(user_id: str) -> SubscriptionTier:
+    """Look up the user's rate-limit tier from the database.
+
+    Successful results are cached for 5 minutes (via ``_fetch_user_tier``)
+    to avoid a DB round-trip on every rate-limit check.
+
+    Falls back to ``DEFAULT_TIER`` **without caching** when the DB is
+    unreachable or returns an unrecognised value, so the next call retries
+    the query instead of serving a stale fallback for up to 5 minutes.
+    """
+    try:
+        return await _fetch_user_tier(user_id)
+    except Exception as exc:
+        logger.warning(
+            "Failed to resolve rate-limit tier for user %s, defaulting to %s: %s",
+            user_id[:8],
+            DEFAULT_TIER.value,
+            exc,
+        )
+    return DEFAULT_TIER
+
+
+# Expose cache management on the public function so callers (including tests)
+# never need to reach into the private ``_fetch_user_tier``.
+get_user_tier.cache_clear = _fetch_user_tier.cache_clear  # type: ignore[attr-defined]
+get_user_tier.cache_delete = _fetch_user_tier.cache_delete  # type: ignore[attr-defined]
+
+
+async def set_user_tier(user_id: str, tier: SubscriptionTier) -> None:
+    """Persist the user's rate-limit tier to the database.
+
+    Also invalidates the ``get_user_tier`` cache for this user so that
+    subsequent rate-limit checks immediately see the new tier.
+
+    Raises:
+        prisma.errors.RecordNotFoundError: If the user does not exist.
+    """
+    await PrismaUser.prisma().update(
+        where={"id": user_id},
+        data={"subscriptionTier": tier.value},
+    )
+    # Invalidate cached tier so rate-limit checks pick up the change immediately.
+    get_user_tier.cache_delete(user_id)  # type: ignore[attr-defined]
+
+
 async def get_global_rate_limits(
    user_id: str,
    config_daily: int,
    config_weekly: int,
-) -> tuple[int, int]:
+) -> tuple[int, int, SubscriptionTier]:
    """Resolve global rate limits from LaunchDarkly, falling back to config.

+    The base limits (from LD or config) are multiplied by the user's
+    tier multiplier so that higher tiers receive proportionally larger
+    allowances.
+
    Args:
        user_id: User ID for LD flag evaluation context.
        config_daily: Fallback daily limit from ChatConfig.
        config_weekly: Fallback weekly limit from ChatConfig.

    Returns:
-        (daily_token_limit, weekly_token_limit) tuple.
+        (daily_token_limit, weekly_token_limit, tier) 3-tuple.
    """
    # Lazy import to avoid circular dependency:
    # rate_limit -> feature_flag -> settings -> ... -> rate_limit
@@ -378,7 +503,15 @@ async def get_global_rate_limits(
    except (TypeError, ValueError):
        logger.warning("Invalid LD value for weekly token limit: %r", weekly_raw)
        weekly = config_weekly
-    return daily, weekly
+
+    # Apply tier multiplier
+    tier = await get_user_tier(user_id)
+    multiplier = TIER_MULTIPLIERS.get(tier, 1)
+    if multiplier != 1:
+        daily = daily * multiplier
+        weekly = weekly * multiplier
+
+    return daily, weekly, tier


 async def reset_user_usage(user_id: str, *, reset_weekly: bool = False) -> None:
--- a/autogpt_platform/backend/backend/copilot/rate_limit_test.py
+++ b/autogpt_platform/backend/backend/copilot/rate_limit_test.py
--- a/autogpt_platform/backend/backend/copilot/reset_usage_test.py
+++ b/autogpt_platform/backend/backend/copilot/reset_usage_test.py
@@ -9,7 +9,7 @@ import pytest
 from fastapi import HTTPException

 from backend.api.features.chat.routes import reset_copilot_usage
-from backend.copilot.rate_limit import CoPilotUsageStatus, UsageWindow
+from backend.copilot.rate_limit import CoPilotUsageStatus, SubscriptionTier, UsageWindow
 from backend.util.exceptions import InsufficientBalanceError


@@ -53,6 +53,18 @@ def _mock_settings(enable_credit: bool = True):
    return mock


+def _mock_rate_limits(
+    daily: int = 2_500_000,
+    weekly: int = 12_500_000,
+    tier: SubscriptionTier = SubscriptionTier.PRO,
+):
+    """Mock get_global_rate_limits to return fixed limits (no tier multiplier)."""
+    return patch(
+        f"{_MODULE}.get_global_rate_limits",
+        AsyncMock(return_value=(daily, weekly, tier)),
+    )
+
+
 @pytest.mark.asyncio
 class TestResetCopilotUsage:
    async def test_feature_disabled_returns_400(self):
@@ -70,10 +82,7 @@ class TestResetCopilotUsage:
        with (
            patch(f"{_MODULE}.config", _make_config(daily_token_limit=0)),
            patch(f"{_MODULE}.settings", _mock_settings()),
-            patch(
-                f"{_MODULE}.get_global_rate_limits",
-                AsyncMock(return_value=(0, 12_500_000)),
-            ),
+            _mock_rate_limits(daily=0),
        ):
            with pytest.raises(HTTPException) as exc_info:
                await reset_copilot_usage(user_id="user-1")
@@ -87,10 +96,7 @@ class TestResetCopilotUsage:
        with (
            patch(f"{_MODULE}.config", cfg),
            patch(f"{_MODULE}.settings", _mock_settings()),
-            patch(
-                f"{_MODULE}.get_global_rate_limits",
-                AsyncMock(return_value=(2_500_000, 12_500_000)),
-            ),
+            _mock_rate_limits(),
            patch(f"{_MODULE}.get_daily_reset_count", AsyncMock(return_value=0)),
            patch(f"{_MODULE}.acquire_reset_lock", AsyncMock(return_value=True)),
            patch(f"{_MODULE}.release_reset_lock", AsyncMock()) as mock_release,
@@ -120,10 +126,7 @@ class TestResetCopilotUsage:
        with (
            patch(f"{_MODULE}.config", cfg),
            patch(f"{_MODULE}.settings", _mock_settings()),
-            patch(
-                f"{_MODULE}.get_global_rate_limits",
-                AsyncMock(return_value=(2_500_000, 12_500_000)),
-            ),
+            _mock_rate_limits(),
            patch(f"{_MODULE}.get_daily_reset_count", AsyncMock(return_value=0)),
            patch(f"{_MODULE}.acquire_reset_lock", AsyncMock(return_value=True)),
            patch(f"{_MODULE}.release_reset_lock", AsyncMock()) as mock_release,
@@ -153,10 +156,7 @@ class TestResetCopilotUsage:
        with (
            patch(f"{_MODULE}.config", cfg),
            patch(f"{_MODULE}.settings", _mock_settings()),
-            patch(
-                f"{_MODULE}.get_global_rate_limits",
-                AsyncMock(return_value=(2_500_000, 12_500_000)),
-            ),
+            _mock_rate_limits(),
            patch(f"{_MODULE}.get_daily_reset_count", AsyncMock(return_value=0)),
            patch(f"{_MODULE}.acquire_reset_lock", AsyncMock(return_value=True)),
            patch(f"{_MODULE}.release_reset_lock", AsyncMock()),
@@ -187,10 +187,7 @@ class TestResetCopilotUsage:
        with (
            patch(f"{_MODULE}.config", cfg),
            patch(f"{_MODULE}.settings", _mock_settings()),
-            patch(
-                f"{_MODULE}.get_global_rate_limits",
-                AsyncMock(return_value=(2_500_000, 12_500_000)),
-            ),
+            _mock_rate_limits(),
            patch(f"{_MODULE}.get_daily_reset_count", AsyncMock(return_value=3)),
        ):
            with pytest.raises(HTTPException) as exc_info:
@@ -228,10 +225,7 @@ class TestResetCopilotUsage:
        with (
            patch(f"{_MODULE}.config", cfg),
            patch(f"{_MODULE}.settings", _mock_settings()),
-            patch(
-                f"{_MODULE}.get_global_rate_limits",
-                AsyncMock(return_value=(2_500_000, 12_500_000)),
-            ),
+            _mock_rate_limits(),
            patch(f"{_MODULE}.get_daily_reset_count", AsyncMock(return_value=0)),
            patch(f"{_MODULE}.acquire_reset_lock", AsyncMock(return_value=True)),
            patch(f"{_MODULE}.release_reset_lock", AsyncMock()) as mock_release,
@@ -252,10 +246,7 @@ class TestResetCopilotUsage:
        with (
            patch(f"{_MODULE}.config", _make_config()),
            patch(f"{_MODULE}.settings", _mock_settings()),
-            patch(
-                f"{_MODULE}.get_global_rate_limits",
-                AsyncMock(return_value=(2_500_000, 12_500_000)),
-            ),
+            _mock_rate_limits(),
            patch(f"{_MODULE}.get_daily_reset_count", AsyncMock(return_value=None)),
        ):
            with pytest.raises(HTTPException) as exc_info:
@@ -273,10 +264,7 @@ class TestResetCopilotUsage:
        with (
            patch(f"{_MODULE}.config", cfg),
            patch(f"{_MODULE}.settings", _mock_settings()),
-            patch(
-                f"{_MODULE}.get_global_rate_limits",
-                AsyncMock(return_value=(2_500_000, 12_500_000)),
-            ),
+            _mock_rate_limits(),
            patch(f"{_MODULE}.get_daily_reset_count", AsyncMock(return_value=0)),
            patch(f"{_MODULE}.acquire_reset_lock", AsyncMock(return_value=True)),
            patch(f"{_MODULE}.release_reset_lock", AsyncMock()),
@@ -307,10 +295,7 @@ class TestResetCopilotUsage:
        with (
            patch(f"{_MODULE}.config", cfg),
            patch(f"{_MODULE}.settings", _mock_settings()),
-            patch(
-                f"{_MODULE}.get_global_rate_limits",
-                AsyncMock(return_value=(2_500_000, 12_500_000)),
-            ),
+            _mock_rate_limits(),
            patch(f"{_MODULE}.get_daily_reset_count", AsyncMock(return_value=0)),
            patch(f"{_MODULE}.acquire_reset_lock", AsyncMock(return_value=True)),
            patch(f"{_MODULE}.release_reset_lock", AsyncMock()),
--- a/autogpt_platform/backend/backend/copilot/sdk/agent_generation_guide.md
+++ b/autogpt_platform/backend/backend/copilot/sdk/agent_generation_guide.md
@@ -53,6 +53,12 @@ Steps:
   or fix manually based on the error descriptions. Iterate until valid.
 8. **Save**: Call `create_agent` (new) or `edit_agent` (existing) with
   the final `agent_json`
+9. **Dry-run**: ALWAYS call `run_agent` with `dry_run=True` and
+   `wait_for_result=120` to verify the agent works end-to-end.
+10. **Inspect & fix**: Check the dry-run output for errors. If issues are
+    found, call `edit_agent` to fix and dry-run again. Repeat until the
+    simulation passes or the problems are clearly unfixable.
+    See "REQUIRED: Dry-Run Verification Loop" section below for details.

 ### Agent JSON Structure

@@ -246,19 +252,51 @@ call in a loop until the task is complete:
 Regular blocks work exactly like sub-agents as tools — wire each input
 field from `source_name: "tools"` on the Orchestrator side.

-### Testing with Dry Run
+### REQUIRED: Dry-Run Verification Loop (create -> dry-run -> fix)

-After saving an agent, suggest a dry run to validate wiring without consuming
-real API calls, credentials, or credits:
+After creating or editing an agent, you MUST dry-run it before telling the
+user the agent is ready. NEVER skip this step.

-1. **Run**: Call `run_agent` or `run_block` with `dry_run=True` and provide
-   sample inputs. This executes the graph with mock outputs, verifying that
-   links resolve correctly and required inputs are satisfied.
-2. **Check results**: Call `view_agent_output` with `show_execution_details=True`
-   to inspect the full node-by-node execution trace. This shows what each node
-   received as input and produced as output, making it easy to spot wiring issues.
-3. **Iterate**: If the dry run reveals wiring issues or missing inputs, fix
-   the agent JSON and re-save before suggesting a real execution.
+#### Step-by-step workflow
+
+1. **Create/Edit**: Call `create_agent` or `edit_agent` to save the agent.
+2. **Dry-run**: Call `run_agent` with `dry_run=True`, `wait_for_result=120`,
+   and realistic sample inputs that exercise every path in the agent. This
+   simulates execution using an LLM for each block — no real API calls,
+   credentials, or credits are consumed.
+3. **Inspect output**: Examine the dry-run result for problems. If
+   `wait_for_result` returns only a summary, call
+   `view_agent_output(execution_id=..., show_execution_details=True)` to
+   see the full node-by-node execution trace. Look for:
+   - **Errors / failed nodes** — a node raised an exception or returned an
+     error status. Common causes: wrong `source_name`/`sink_name` in links,
+     missing `input_default` values, or referencing a nonexistent block output.
+   - **Null / empty outputs** — data did not flow through a link. Verify that
+     `source_name` and `sink_name` match the block schemas exactly (case-
+     sensitive, including nested `_#_` notation).
+   - **Nodes that never executed** — the node was not reached. Likely a
+     missing or broken link from an upstream node.
+   - **Unexpected values** — data arrived but in the wrong type or
+     structure. Check type compatibility between linked ports.
+4. **Fix**: If any issues are found, call `edit_agent` with the corrected
+   agent JSON, then go back to step 2.
+5. **Repeat**: Continue the dry-run -> fix cycle until the simulation passes
+   or the problems are clearly unfixable. If you stop making progress,
+   report the remaining issues to the user and ask for guidance.
+
+#### Good vs bad dry-run output
+
+**Good output** (agent is ready):
+- All nodes executed successfully (no errors in the execution trace)
+- Data flows through every link with non-null, correctly-typed values
+- The final `AgentOutputBlock` contains a meaningful result
+- Status is `COMPLETED`
+
+**Bad output** (needs fixing):
+- Status is `FAILED` — check the error message for the failing node
+- An output node received `null` — trace back to find the broken link
+- A node received data in the wrong format (e.g. string where list expected)
+- Nodes downstream of a failing node were skipped entirely

 **Special block behaviour in dry-run mode:**
 - **OrchestratorBlock** and **AgentExecutorBlock** execute for real so the
--- a/autogpt_platform/backend/backend/copilot/sdk/mcp_tool_guide.md
+++ b/autogpt_platform/backend/backend/copilot/sdk/mcp_tool_guide.md
@@ -28,13 +28,12 @@ Each result includes a `remotes` array with the exact server URL to use.

 ### Important: Check blocks first

-Before using `run_mcp_tool`, always check if the platform already has blocks for the service
-using `find_block`. The platform has hundreds of built-in blocks (Google Sheets, Google Docs,
-Google Calendar, Gmail, etc.) that work without MCP setup.
+Always follow the **Tool Discovery Priority** described in the tool notes:
+call `find_block` before resorting to `run_mcp_tool`.

 Only use `run_mcp_tool` when:
- The service is in the known hosted MCP servers list above, OR
- You searched `find_block` first and found no matching blocks
+- You searched `find_block` first and found no matching blocks, AND
+- The service is in the known hosted MCP servers list above or found via the registry API

 **Never guess or construct MCP server URLs.** Only use URLs from the known servers list above
 or from the `remotes[].url` field in MCP registry search results.
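
Reviewer note: the "never guess or construct MCP server URLs" rule amounts to an allowlist check. A minimal sketch follows, assuming registry results are dicts carrying a `remotes` list whose entries have a `url` key, as the guide describes. The function names and the example URLs are hypothetical and are not part of the platform.

```python
from collections.abc import Iterable, Mapping
from typing import Any


def allowed_mcp_urls(
    known_servers: Iterable[str],
    registry_results: Iterable[Mapping[str, Any]],
) -> set[str]:
    """Collect URLs from the known list and from each result's remotes[].url."""
    urls = set(known_servers)
    for result in registry_results:
        for remote in result.get("remotes", []):
            url = remote.get("url")
            if url:
                urls.add(url)
    return urls


def is_allowed_mcp_url(
    url: str,
    known_servers: Iterable[str],
    registry_results: Iterable[Mapping[str, Any]],
) -> bool:
    """Exact-match check: a URL that merely looks plausible is rejected."""
    return url in allowed_mcp_urls(known_servers, registry_results)
```

Exact string matching is deliberate here: a URL assembled from a service name would fail the check unless the registry actually returned it.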
--- a/autogpt_platform/backend/backend/copilot/sdk/prompt_too_long_test.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/prompt_too_long_test.py
@@ -8,20 +8,19 @@ from uuid import uuid4

 import pytest

-from backend.util import json
-from backend.util.prompt import CompressResult
-
-from .conftest import build_test_transcript as _build_transcript
-from .service import _friendly_error_text, _is_prompt_too_long
-from .transcript import (
+from backend.copilot.transcript import (
    _flatten_assistant_content,
    _flatten_tool_result_content,
    _messages_to_transcript,
    _run_compression,
    _transcript_to_messages,
-    compact_transcript,
-    validate_transcript,
 )
+from backend.util import json
+from backend.util.prompt import CompressResult
+
+from .conftest import build_test_transcript as _build_transcript
+from .service import _friendly_error_text, _is_prompt_too_long
+from .transcript import compact_transcript, validate_transcript

 # ---------------------------------------------------------------------------
 # _flatten_assistant_content
@@ -403,7 +402,7 @@ class TestCompactTranscript:
            },
        )()
        with patch(
-            "backend.copilot.sdk.transcript._run_compression",
+            "backend.copilot.transcript._run_compression",
            new_callable=AsyncMock,
            return_value=mock_result,
        ):
@@ -438,7 +437,7 @@ class TestCompactTranscript:
            },
        )()
        with patch(
-            "backend.copilot.sdk.transcript._run_compression",
+            "backend.copilot.transcript._run_compression",
            new_callable=AsyncMock,
            return_value=mock_result,
        ):
@@ -462,7 +461,7 @@ class TestCompactTranscript:
            ]
        )
        with patch(
-            "backend.copilot.sdk.transcript._run_compression",
+            "backend.copilot.transcript._run_compression",
            new_callable=AsyncMock,
            side_effect=RuntimeError("LLM unavailable"),
        ):
@@ -568,11 +567,11 @@ class TestRunCompressionTimeout:

        with (
            patch(
-                "backend.copilot.sdk.transcript.get_openai_client",
+                "backend.copilot.transcript.get_openai_client",
                return_value="fake-client",
            ),
            patch(
-                "backend.copilot.sdk.transcript.compress_context",
+                "backend.copilot.transcript.compress_context",
                side_effect=_mock_compress,
            ),
        ):
@@ -602,11 +601,11 @@ class TestRunCompressionTimeout:

        with (
            patch(
-                "backend.copilot.sdk.transcript.get_openai_client",
+                "backend.copilot.transcript.get_openai_client",
                return_value=None,
            ),
            patch(
-                "backend.copilot.sdk.transcript.compress_context",
+                "backend.copilot.transcript.compress_context",
                new_callable=AsyncMock,
                return_value=truncation_result,
            ) as mock_compress,
--- a/autogpt_platform/backend/backend/copilot/sdk/retry_scenarios_test.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/retry_scenarios_test.py
@@ -26,18 +26,17 @@ from unittest.mock import AsyncMock, MagicMock, patch

 import pytest

-from backend.util import json
-
-from .conftest import build_test_transcript as _build_transcript
-from .service import _MAX_STREAM_ATTEMPTS, _reduce_context
-from .transcript import (
+from backend.copilot.transcript import (
    _flatten_assistant_content,
    _flatten_tool_result_content,
    _messages_to_transcript,
    _transcript_to_messages,
-    compact_transcript,
-    validate_transcript,
 )
+from backend.util import json
+
+from .conftest import build_test_transcript as _build_transcript
+from .service import _MAX_STREAM_ATTEMPTS, _reduce_context
+from .transcript import compact_transcript, validate_transcript
 from .transcript_builder import TranscriptBuilder

 # ---------------------------------------------------------------------------
@@ -113,7 +112,7 @@ class TestScenarioCompactAndRetry:
                )(),
            ),
            patch(
-                "backend.copilot.sdk.transcript._run_compression",
+                "backend.copilot.transcript._run_compression",
                new_callable=AsyncMock,
                return_value=mock_result,
            ),
@@ -170,7 +169,7 @@ class TestScenarioCompactFailsFallback:
                )(),
            ),
            patch(
-                "backend.copilot.sdk.transcript._run_compression",
+                "backend.copilot.transcript._run_compression",
                new_callable=AsyncMock,
                side_effect=RuntimeError("LLM unavailable"),
            ),
@@ -261,7 +260,7 @@ class TestScenarioDoubleFailDBFallback:
                )(),
            ),
            patch(
-                "backend.copilot.sdk.transcript._run_compression",
+                "backend.copilot.transcript._run_compression",
                new_callable=AsyncMock,
                return_value=mock_result,
            ),
@@ -337,7 +336,7 @@ class TestScenarioCompactionIdentical:
                )(),
            ),
            patch(
-                "backend.copilot.sdk.transcript._run_compression",
+                "backend.copilot.transcript._run_compression",
                new_callable=AsyncMock,
                return_value=mock_result,
            ),
@@ -730,7 +729,7 @@ class TestRetryEdgeCases:
                )(),
            ),
            patch(
-                "backend.copilot.sdk.transcript._run_compression",
+                "backend.copilot.transcript._run_compression",
                new_callable=AsyncMock,
                return_value=mock_result,
            ),
@@ -841,7 +840,7 @@ class TestRetryStateReset:
                )(),
            ),
            patch(
-                "backend.copilot.sdk.transcript._run_compression",
+                "backend.copilot.transcript._run_compression",
                new_callable=AsyncMock,
                side_effect=RuntimeError("boom"),
            ),
@@ -1405,9 +1404,9 @@ class TestStreamChatCompletionRetryIntegration:
                events.append(event)

        # Should NOT retry — only 1 attempt for auth errors
-        assert attempt_count[0] == 1, (
-            f"Expected 1 attempt (no retry for auth error), " f"got {attempt_count[0]}"
-        )
+        assert (
+            attempt_count[0] == 1
+        ), f"Expected 1 attempt (no retry for auth error), got {attempt_count[0]}"
        errors = [e for e in events if isinstance(e, StreamError)]
        assert errors, "Expected StreamError"
        assert errors[0].code == "sdk_stream_error"
--- a/autogpt_platform/backend/backend/copilot/sdk/service.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/service.py
@@ -29,16 +29,29 @@ from claude_agent_sdk import (
 )
 from langfuse import propagate_attributes
 from langsmith.integrations.claude_agent_sdk import configure_claude_agent_sdk
+from opentelemetry import trace as otel_trace
 from pydantic import BaseModel

 from backend.copilot.context import get_workspace_manager
 from backend.copilot.permissions import apply_tool_permissions
+from backend.copilot.rate_limit import get_user_tier
+from backend.copilot.transcript import (
+    _run_compression,
+    cleanup_stale_project_dirs,
+    compact_transcript,
+    download_transcript,
+    read_compacted_entries,
+    upload_transcript,
+    validate_transcript,
+    write_transcript_to_tempfile,
+)
+from backend.copilot.transcript_builder import TranscriptBuilder
 from backend.data.redis_client import get_redis_async
 from backend.executor.cluster_lock import AsyncClusterLock
 from backend.util.exceptions import NotFoundError
 from backend.util.settings import Settings

-from ..config import ChatConfig
+from ..config import ChatConfig, CopilotMode
 from ..constants import (
    COPILOT_ERROR_PREFIX,
    COPILOT_RETRYABLE_ERROR_PREFIX,
@@ -51,7 +64,7 @@ from ..model import (
    ChatMessage,
    ChatSession,
    get_chat_session,
-    update_session_title,
+    maybe_append_user_message,
    upsert_chat_session,
 )
 from ..prompting import get_sdk_supplement
@@ -70,11 +83,7 @@ from ..response_model import (
    StreamToolOutputAvailable,
    StreamUsage,
 )
-from ..service import (
-    _build_system_prompt,
-    _generate_session_title,
-    _is_langfuse_configured,
-)
+from ..service import _build_system_prompt, _is_langfuse_configured, _update_title_async
 from ..token_tracking import persist_and_record_usage
 from ..tools.e2b_sandbox import get_or_create_sandbox, pause_sandbox_direct
 from ..tools.sandbox import WORKSPACE_PREFIX, make_session_path
@@ -92,17 +101,6 @@ from .tool_adapter import (
    set_execution_context,
    wait_for_stash,
 )
-from .transcript import (
-    _run_compression,
-    cleanup_stale_project_dirs,
-    compact_transcript,
-    download_transcript,
-    read_compacted_entries,
-    upload_transcript,
-    validate_transcript,
-    write_transcript_to_tempfile,
-)
-from .transcript_builder import TranscriptBuilder

 logger = logging.getLogger(__name__)
 config = ChatConfig()
@@ -129,6 +127,11 @@ _CIRCUIT_BREAKER_ERROR_MSG = (
    "Try breaking your request into smaller parts."
 )

+# Idle timeout: abort the stream if no meaningful SDK message (only heartbeats)
+# arrives for this many seconds. This catches hung tool calls (e.g. WebSearch
+# hanging on a search provider that never responds).
+_IDLE_TIMEOUT_SECONDS = 10 * 60  # 10 minutes
+
 # Patterns that indicate the prompt/request exceeds the model's context limit.
 # Matched case-insensitively against the full exception chain.
 _PROMPT_TOO_LONG_PATTERNS: tuple[str, ...] = (
@@ -1271,6 +1274,8 @@ async def _run_stream_attempt(
            await client.query(state.query_message, session_id=ctx.session_id)
            state.transcript_builder.append_user(content=ctx.current_message)

+        _last_real_msg_time = time.monotonic()
+
        async for sdk_msg in _iter_sdk_messages(client):
            # Heartbeat sentinel — refresh lock and keep SSE alive
            if sdk_msg is None:
@@ -1278,8 +1283,34 @@ async def _run_stream_attempt(
                for ev in ctx.compaction.emit_start_if_ready():
                    yield ev
                yield StreamHeartbeat()
+
+                # Idle timeout: if no real SDK message for too long, a tool
+                # call is likely hung (e.g. WebSearch provider not responding).
+                idle_seconds = time.monotonic() - _last_real_msg_time
+                if idle_seconds >= _IDLE_TIMEOUT_SECONDS:
+                    logger.error(
+                        "%s Idle timeout after %.0fs with no SDK message — "
+                        "aborting stream (likely hung tool call)",
+                        ctx.log_prefix,
+                        idle_seconds,
+                    )
+                    stream_error_msg = (
+                        "A tool call appears to be stuck "
+                        "(no response for 10 minutes). "
+                        "Please try again."
+                    )
+                    stream_error_code = "idle_timeout"
+                    _append_error_marker(ctx.session, stream_error_msg, retryable=True)
+                    yield StreamError(
+                        errorText=stream_error_msg,
+                        code=stream_error_code,
+                    )
+                    ended_with_stream_error = True
+                    break
                continue

+            _last_real_msg_time = time.monotonic()
+
            logger.info(
                "%s Received: %s %s (unresolved=%d, current=%d, resolved=%d)",
                ctx.log_prefix,
@@ -1528,9 +1559,21 @@ async def _run_stream_attempt(
            # --- Intermediate persistence ---
            # Flush session messages to DB periodically so page reloads
            # show progress during long-running turns.
+            #
+            # IMPORTANT: Skip the flush while tool calls are pending
+            # (tool_calls set on assistant but results not yet received).
+            # The DB save is append-only (uses start_sequence), so if we
+            # flush the assistant message before tool_calls are set on it
+            # (text and tool_use arrive as separate SDK events), the
+            # tool_calls update is lost — the next flush starts past it.
            _msgs_since_flush += 1
            now = time.monotonic()
-            if (
+            has_pending_tools = (
+                acc.has_appended_assistant
+                and acc.accumulated_tool_calls
+                and not acc.has_tool_results
+            )
+            if not has_pending_tools and (
                _msgs_since_flush >= _FLUSH_MESSAGE_THRESHOLD
                or (now - _last_flush_time) >= _FLUSH_INTERVAL_SECONDS
            ):
@@ -1630,6 +1673,7 @@ async def stream_chat_completion_sdk(
    session: ChatSession | None = None,
    file_ids: list[str] | None = None,
    permissions: "CopilotPermissions | None" = None,
+    mode: CopilotMode | None = None,
    **_kwargs: Any,
 ) -> AsyncIterator[StreamBaseResponse]:
    """Stream chat completion using Claude Agent SDK.
@@ -1638,7 +1682,10 @@ async def stream_chat_completion_sdk(
        file_ids: Optional workspace file IDs attached to the user's message.
            Images are embedded as vision content blocks; other files are
            saved to the SDK working directory for the Read tool.
+        mode: Accepted for signature compatibility with the baseline path.
+            The SDK path does not currently branch on this value.
    """
+    _ = mode  # SDK path ignores the requested mode.

    if session is None:
        session = await get_chat_session(session_id, user_id)
@@ -1669,19 +1716,12 @@ async def stream_chat_completion_sdk(
        )
        session.messages.pop()

-    # Append the new message to the session if it's not already there
-    new_message_role = "user" if is_user_message else "assistant"
-    if message and (
-        len(session.messages) == 0
-        or not (
-            session.messages[-1].role == new_message_role
-            and session.messages[-1].content == message
-        )
-    ):
-        session.messages.append(ChatMessage(role=new_message_role, content=message))
+    if maybe_append_user_message(session, message, is_user_message):
        if is_user_message:
            track_user_message(
-                user_id=user_id, session_id=session_id, message_length=len(message)
+                user_id=user_id,
+                session_id=session_id,
+                message_length=len(message or ""),
            )

    # Structured log prefix: [SDK][<session>][T<turn>]
@@ -1946,15 +1986,20 @@ async def stream_chat_completion_sdk(
        # langsmith tracing integration attaches them to every span.  This
        # is what Langfuse (or any OTEL backend) maps to its native
        # user/session fields.
+        _user_tier = await get_user_tier(user_id) if user_id else None
+        _otel_metadata: dict[str, str] = {
+            "resume": str(use_resume),
+            "conversation_turn": str(turn),
+        }
+        if _user_tier:
+            _otel_metadata["subscription_tier"] = _user_tier.value
+
        _otel_ctx = propagate_attributes(
            user_id=user_id,
            session_id=session_id,
            trace_name="copilot-sdk",
            tags=["sdk"],
-            metadata={
-                "resume": str(use_resume),
-                "conversation_turn": str(turn),
-            },
+            metadata=_otel_metadata,
        )
        _otel_ctx.__enter__()

@@ -2323,8 +2368,26 @@ async def stream_chat_completion_sdk(

        raise
    finally:
-        # --- Close OTEL context ---
+        # --- Close OTEL context (with cost attributes) ---
        if _otel_ctx is not None:
+            try:
+                span = otel_trace.get_current_span()
+                if span and span.is_recording():
+                    span.set_attribute("gen_ai.usage.prompt_tokens", turn_prompt_tokens)
+                    span.set_attribute(
+                        "gen_ai.usage.completion_tokens", turn_completion_tokens
+                    )
+                    span.set_attribute(
+                        "gen_ai.usage.cache_read_tokens", turn_cache_read_tokens
+                    )
+                    span.set_attribute(
+                        "gen_ai.usage.cache_creation_tokens",
+                        turn_cache_creation_tokens,
+                    )
+                    if turn_cost_usd is not None:
+                        span.set_attribute("gen_ai.usage.cost_usd", turn_cost_usd)
+            except Exception:
+                logger.debug("Failed to set OTEL cost attributes", exc_info=True)
            try:
                _otel_ctx.__exit__(*sys.exc_info())
            except Exception:
@@ -2342,6 +2405,8 @@ async def stream_chat_completion_sdk(
            cache_creation_tokens=turn_cache_creation_tokens,
            log_prefix=log_prefix,
            cost_usd=turn_cost_usd,
+            model=config.model,
+            provider="anthropic",
        )

        # --- Persist session messages ---
@@ -2446,18 +2511,3 @@ async def stream_chat_completion_sdk(
        finally:
            # Release stream lock to allow new streams for this session
            await lock.release()
-
-
-async def _update_title_async(
-    session_id: str, message: str, user_id: str | None = None
-) -> None:
-    """Background task to update session title."""
-    try:
-        title = await _generate_session_title(
-            message, user_id=user_id, session_id=session_id
-        )
-        if title and user_id:
-            await update_session_title(session_id, user_id, title, only_if_empty=True)
-            logger.debug("[SDK] Generated title for %s: %s", session_id, title)
-    except Exception as e:
-        logger.warning("[SDK] Failed to update session title: %s", e)
--- a/autogpt_platform/backend/backend/copilot/sdk/thinking_blocks_test.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/thinking_blocks_test.py
@@ -27,20 +27,19 @@ from backend.copilot.response_model import (
    StreamTextDelta,
    StreamTextStart,
 )
-from backend.util import json
-
-from .conftest import build_structured_transcript
-from .response_adapter import SDKResponseAdapter
-from .service import _format_sdk_content_blocks
-from .transcript import (
+from backend.copilot.transcript import (
    _find_last_assistant_entry,
    _flatten_assistant_content,
    _messages_to_transcript,
    _rechain_tail,
    _transcript_to_messages,
-    compact_transcript,
-    validate_transcript,
 )
+from backend.util import json
+
+from .conftest import build_structured_transcript
+from .response_adapter import SDKResponseAdapter
+from .service import _format_sdk_content_blocks
+from .transcript import compact_transcript, validate_transcript

 # ---------------------------------------------------------------------------
 # Fixtures: realistic thinking block content
@@ -439,7 +438,7 @@ class TestCompactTranscriptThinkingBlocks:
            },
        )()
        with patch(
-            "backend.copilot.sdk.transcript._run_compression",
+            "backend.copilot.transcript._run_compression",
            new_callable=AsyncMock,
            return_value=mock_result,
        ):
@@ -498,7 +497,7 @@ class TestCompactTranscriptThinkingBlocks:
            )()

        with patch(
-            "backend.copilot.sdk.transcript._run_compression",
+            "backend.copilot.transcript._run_compression",
            side_effect=mock_compression,
        ):
            await compact_transcript(transcript, model="test-model")
@@ -551,7 +550,7 @@ class TestCompactTranscriptThinkingBlocks:
            },
        )()
        with patch(
-            "backend.copilot.sdk.transcript._run_compression",
+            "backend.copilot.transcript._run_compression",
            new_callable=AsyncMock,
            return_value=mock_result,
        ):
@@ -601,7 +600,7 @@ class TestCompactTranscriptThinkingBlocks:
            },
        )()
        with patch(
-            "backend.copilot.sdk.transcript._run_compression",
+            "backend.copilot.transcript._run_compression",
            new_callable=AsyncMock,
            return_value=mock_result,
        ):
@@ -638,7 +637,7 @@ class TestCompactTranscriptThinkingBlocks:
            },
        )()
        with patch(
-            "backend.copilot.sdk.transcript._run_compression",
+            "backend.copilot.transcript._run_compression",
            new_callable=AsyncMock,
            return_value=mock_result,
        ):
@@ -699,7 +698,7 @@ class TestCompactTranscriptThinkingBlocks:
            },
        )()
        with patch(
-            "backend.copilot.sdk.transcript._run_compression",
+            "backend.copilot.transcript._run_compression",
            new_callable=AsyncMock,
            return_value=mock_result,
        ):
--- a/autogpt_platform/backend/backend/copilot/sdk/transcript.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/transcript.py
--- a/autogpt_platform/backend/backend/copilot/sdk/transcript_builder.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/transcript_builder.py
@@ -1,235 +1,10 @@
-"""Build complete JSONL transcript from SDK messages.
+"""Re-export from shared ``backend.copilot.transcript_builder`` for backward compat.

-The transcript represents the FULL active context at any point in time.
-Each upload REPLACES the previous transcript atomically.
-
-Flow:
-  Turn 1: Upload [msg1, msg2]
-  Turn 2: Download [msg1, msg2] → Upload [msg1, msg2, msg3, msg4] (REPLACE)
-  Turn 3: Download [msg1, msg2, msg3, msg4] → Upload [all messages] (REPLACE)
-
-The transcript is never incremental - always the complete atomic state.
+The canonical implementation now lives at ``backend.copilot.transcript_builder``
+so both the SDK and baseline paths can import without cross-package
+dependencies.
 """

-import logging
-from typing import Any
-from uuid import uuid4
+from backend.copilot.transcript_builder import TranscriptBuilder, TranscriptEntry

-from pydantic import BaseModel
-
-from backend.util import json
-
-from .transcript import STRIPPABLE_TYPES
-
-logger = logging.getLogger(__name__)
-
-
-class TranscriptEntry(BaseModel):
-    """Single transcript entry (user or assistant turn)."""
-
-    type: str
-    uuid: str
-    parentUuid: str | None
-    isCompactSummary: bool | None = None
-    message: dict[str, Any]
-
-
-class TranscriptBuilder:
-    """Build complete JSONL transcript from SDK messages.
-
-    This builder maintains the FULL conversation state, not incremental changes.
-    The output is always the complete active context.
-    """
-
-    def __init__(self) -> None:
-        self._entries: list[TranscriptEntry] = []
-        self._last_uuid: str | None = None
-
-    def _last_is_assistant(self) -> bool:
-        return bool(self._entries) and self._entries[-1].type == "assistant"
-
-    def _last_message_id(self) -> str:
-        """Return the message.id of the last entry, or '' if none."""
-        if self._entries:
-            return self._entries[-1].message.get("id", "")
-        return ""
-
-    @staticmethod
-    def _parse_entry(data: dict) -> TranscriptEntry | None:
-        """Parse a single transcript entry, filtering strippable types.
-
-        Returns ``None`` for entries that should be skipped (strippable types
-        that are not compaction summaries).
-        """
-        entry_type = data.get("type", "")
-        if entry_type in STRIPPABLE_TYPES and not data.get("isCompactSummary"):
-            return None
-        return TranscriptEntry(
-            type=entry_type,
-            uuid=data.get("uuid") or str(uuid4()),
-            parentUuid=data.get("parentUuid"),
-            isCompactSummary=data.get("isCompactSummary"),
-            message=data.get("message", {}),
-        )
-
-    def load_previous(self, content: str, log_prefix: str = "[Transcript]") -> None:
-        """Load complete previous transcript.
-
-        This loads the FULL previous context. As new messages come in,
-        we append to this state. The final output is the complete context
-        (previous + new), not just the delta.
-        """
-        if not content or not content.strip():
-            return
-
-        lines = content.strip().split("\n")
-        for line_num, line in enumerate(lines, 1):
-            if not line.strip():
-                continue
-
-            data = json.loads(line, fallback=None)
-            if data is None:
-                logger.warning(
-                    "%s Failed to parse transcript line %d/%d",
-                    log_prefix,
-                    line_num,
-                    len(lines),
-                )
-                continue
-
-            entry = self._parse_entry(data)
-            if entry is None:
-                continue
-            self._entries.append(entry)
-            self._last_uuid = entry.uuid
-
-        logger.info(
-            "%s Loaded %d entries from previous transcript (last_uuid=%s)",
-            log_prefix,
-            len(self._entries),
-            self._last_uuid[:12] if self._last_uuid else None,
-        )
-
-    def append_user(self, content: str | list[dict], uuid: str | None = None) -> None:
-        """Append a user entry."""
-        msg_uuid = uuid or str(uuid4())
-
-        self._entries.append(
-            TranscriptEntry(
-                type="user",
-                uuid=msg_uuid,
-                parentUuid=self._last_uuid,
-                message={"role": "user", "content": content},
-            )
-        )
-        self._last_uuid = msg_uuid
-
-    def append_tool_result(self, tool_use_id: str, content: str) -> None:
-        """Append a tool result as a user entry (one per tool call)."""
-        self.append_user(
-            content=[
-                {"type": "tool_result", "tool_use_id": tool_use_id, "content": content}
-            ]
-        )
-
-    def append_assistant(
-        self,
-        content_blocks: list[dict],
-        model: str = "",
-        stop_reason: str | None = None,
-    ) -> None:
-        """Append an assistant entry.
-
-        Consecutive assistant entries automatically share the same message ID
-        so the CLI can merge them (thinking → text → tool_use) into a single
-        API message on ``--resume``.  A new ID is assigned whenever an
-        assistant entry follows a non-assistant entry (user message or tool
-        result), because that marks the start of a new API response.
-        """
-        message_id = (
-            self._last_message_id()
-            if self._last_is_assistant()
-            else f"msg_sdk_{uuid4().hex[:24]}"
-        )
-
-        msg_uuid = str(uuid4())
-
-        self._entries.append(
-            TranscriptEntry(
-                type="assistant",
-                uuid=msg_uuid,
-                parentUuid=self._last_uuid,
-                message={
-                    "role": "assistant",
-                    "model": model,
-                    "id": message_id,
-                    "type": "message",
-                    "content": content_blocks,
-                    "stop_reason": stop_reason,
-                    "stop_sequence": None,
-                },
-            )
-        )
-        self._last_uuid = msg_uuid
-
-    def replace_entries(
-        self, compacted_entries: list[dict], log_prefix: str = "[Transcript]"
-    ) -> None:
-        """Replace all entries with compacted entries from the CLI session file.
-
-        Called after mid-stream compaction so TranscriptBuilder mirrors the
-        CLI's active context (compaction summary + post-compaction entries).
-
-        Builds the new list first and validates it's non-empty before swapping,
-        so corrupt input cannot wipe the conversation history.
-        """
-        new_entries: list[TranscriptEntry] = []
-        for data in compacted_entries:
-            entry = self._parse_entry(data)
-            if entry is not None:
-                new_entries.append(entry)
-
-        if not new_entries:
-            logger.warning(
-                "%s replace_entries produced 0 entries from %d inputs, keeping old (%d entries)",
-                log_prefix,
-                len(compacted_entries),
-                len(self._entries),
-            )
-            return
-
-        old_count = len(self._entries)
-        self._entries = new_entries
-        self._last_uuid = new_entries[-1].uuid
-
-        logger.info(
-            "%s TranscriptBuilder compacted: %d entries -> %d entries",
-            log_prefix,
-            old_count,
-            len(self._entries),
-        )
-
-    def to_jsonl(self) -> str:
-        """Export complete context as JSONL.
-
-        Consecutive assistant entries are kept separate to match the
-        native CLI format — the SDK merges them internally on resume.
-
-        Returns the FULL conversation state (all entries), not incremental.
-        This output REPLACES any previous transcript.
-        """
-        if not self._entries:
-            return ""
-
-        lines = [entry.model_dump_json(exclude_none=True) for entry in self._entries]
-        return "\n".join(lines) + "\n"
-
-    @property
-    def entry_count(self) -> int:
-        """Total number of entries in the complete context."""
-        return len(self._entries)
-
-    @property
-    def is_empty(self) -> bool:
-        """Whether this builder has any entries."""
-        return len(self._entries) == 0
+__all__ = ["TranscriptBuilder", "TranscriptEntry"]
--- a/autogpt_platform/backend/backend/copilot/sdk/transcript_test.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/transcript_test.py
@@ -303,7 +303,7 @@ class TestDeleteTranscript:
        mock_storage.delete = AsyncMock()

        with patch(
-            "backend.copilot.sdk.transcript.get_workspace_storage",
+            "backend.copilot.transcript.get_workspace_storage",
            new_callable=AsyncMock,
            return_value=mock_storage,
        ):
@@ -323,7 +323,7 @@ class TestDeleteTranscript:
        )

        with patch(
-            "backend.copilot.sdk.transcript.get_workspace_storage",
+            "backend.copilot.transcript.get_workspace_storage",
            new_callable=AsyncMock,
            return_value=mock_storage,
        ):
@@ -341,7 +341,7 @@ class TestDeleteTranscript:
        )

        with patch(
-            "backend.copilot.sdk.transcript.get_workspace_storage",
+            "backend.copilot.transcript.get_workspace_storage",
            new_callable=AsyncMock,
            return_value=mock_storage,
        ):
@@ -850,7 +850,7 @@ class TestRunCompression:
    @pytest.mark.asyncio
    async def test_no_client_uses_truncation(self):
        """Path (a): ``get_openai_client()`` returns None → truncation only."""
-        from .transcript import _run_compression
+        from backend.copilot.transcript import _run_compression

        truncation_result = self._make_compress_result(
            True, [{"role": "user", "content": "truncated"}]
@@ -858,11 +858,11 @@ class TestRunCompression:

        with (
            patch(
-                "backend.copilot.sdk.transcript.get_openai_client",
+                "backend.copilot.transcript.get_openai_client",
                return_value=None,
            ),
            patch(
-                "backend.copilot.sdk.transcript.compress_context",
+                "backend.copilot.transcript.compress_context",
                new_callable=AsyncMock,
                return_value=truncation_result,
            ) as mock_compress,
@@ -885,7 +885,7 @@ class TestRunCompression:
    @pytest.mark.asyncio
    async def test_llm_success_returns_llm_result(self):
        """Path (b): ``get_openai_client()`` returns a client → LLM compresses."""
-        from .transcript import _run_compression
+        from backend.copilot.transcript import _run_compression

        llm_result = self._make_compress_result(
            True, [{"role": "user", "content": "LLM summary"}]
@@ -894,11 +894,11 @@ class TestRunCompression:

        with (
            patch(
-                "backend.copilot.sdk.transcript.get_openai_client",
+                "backend.copilot.transcript.get_openai_client",
                return_value=mock_client,
            ),
            patch(
-                "backend.copilot.sdk.transcript.compress_context",
+                "backend.copilot.transcript.compress_context",
                new_callable=AsyncMock,
                return_value=llm_result,
            ) as mock_compress,
@@ -916,7 +916,7 @@ class TestRunCompression:
    @pytest.mark.asyncio
    async def test_llm_failure_falls_back_to_truncation(self):
        """Path (c): LLM call raises → truncation fallback used instead."""
-        from .transcript import _run_compression
+        from backend.copilot.transcript import _run_compression

        truncation_result = self._make_compress_result(
            True, [{"role": "user", "content": "truncated fallback"}]
@@ -932,11 +932,11 @@ class TestRunCompression:

        with (
            patch(
-                "backend.copilot.sdk.transcript.get_openai_client",
+                "backend.copilot.transcript.get_openai_client",
                return_value=mock_client,
            ),
            patch(
-                "backend.copilot.sdk.transcript.compress_context",
+                "backend.copilot.transcript.compress_context",
                side_effect=_compress_side_effect,
            ),
        ):
@@ -953,7 +953,7 @@ class TestRunCompression:
    @pytest.mark.asyncio
    async def test_llm_timeout_falls_back_to_truncation(self):
        """Path (d): LLM call exceeds timeout → truncation fallback used."""
-        from .transcript import _run_compression
+        from backend.copilot.transcript import _run_compression

        truncation_result = self._make_compress_result(
            True, [{"role": "user", "content": "truncated after timeout"}]
@@ -970,19 +970,19 @@ class TestRunCompression:
        fake_client = MagicMock()
        with (
            patch(
-                "backend.copilot.sdk.transcript.get_openai_client",
+                "backend.copilot.transcript.get_openai_client",
                return_value=fake_client,
            ),
            patch(
-                "backend.copilot.sdk.transcript.compress_context",
+                "backend.copilot.transcript.compress_context",
                side_effect=_compress_side_effect,
            ),
            patch(
-                "backend.copilot.sdk.transcript._COMPACTION_TIMEOUT_SECONDS",
+                "backend.copilot.transcript._COMPACTION_TIMEOUT_SECONDS",
                0.05,
            ),
            patch(
-                "backend.copilot.sdk.transcript._TRUNCATION_TIMEOUT_SECONDS",
+                "backend.copilot.transcript._TRUNCATION_TIMEOUT_SECONDS",
                5,
            ),
        ):
@@ -1007,7 +1007,7 @@ class TestCleanupStaleProjectDirs:

    def test_removes_old_copilot_dirs(self, tmp_path, monkeypatch):
        """Directories matching copilot pattern older than threshold are removed."""
-        from backend.copilot.sdk.transcript import (
+        from backend.copilot.transcript import (
            _STALE_PROJECT_DIR_SECONDS,
            cleanup_stale_project_dirs,
        )
@@ -1015,7 +1015,7 @@ class TestCleanupStaleProjectDirs:
        projects_dir = tmp_path / "projects"
        projects_dir.mkdir()
        monkeypatch.setattr(
-            "backend.copilot.sdk.transcript._projects_base",
+            "backend.copilot.transcript._projects_base",
            lambda: str(projects_dir),
        )

@@ -1039,12 +1039,12 @@ class TestCleanupStaleProjectDirs:

    def test_ignores_non_copilot_dirs(self, tmp_path, monkeypatch):
        """Directories not matching copilot pattern are left alone."""
-        from backend.copilot.sdk.transcript import cleanup_stale_project_dirs
+        from backend.copilot.transcript import cleanup_stale_project_dirs

        projects_dir = tmp_path / "projects"
        projects_dir.mkdir()
        monkeypatch.setattr(
-            "backend.copilot.sdk.transcript._projects_base",
+            "backend.copilot.transcript._projects_base",
            lambda: str(projects_dir),
        )

@@ -1062,7 +1062,7 @@ class TestCleanupStaleProjectDirs:

    def test_ttl_boundary_not_removed(self, tmp_path, monkeypatch):
        """A directory exactly at the TTL boundary should NOT be removed."""
-        from backend.copilot.sdk.transcript import (
+        from backend.copilot.transcript import (
            _STALE_PROJECT_DIR_SECONDS,
            cleanup_stale_project_dirs,
        )
@@ -1070,7 +1070,7 @@ class TestCleanupStaleProjectDirs:
        projects_dir = tmp_path / "projects"
        projects_dir.mkdir()
        monkeypatch.setattr(
-            "backend.copilot.sdk.transcript._projects_base",
+            "backend.copilot.transcript._projects_base",
            lambda: str(projects_dir),
        )

@@ -1088,7 +1088,7 @@ class TestCleanupStaleProjectDirs:

    def test_skips_non_directory_entries(self, tmp_path, monkeypatch):
        """Regular files matching the copilot pattern are not removed."""
-        from backend.copilot.sdk.transcript import (
+        from backend.copilot.transcript import (
            _STALE_PROJECT_DIR_SECONDS,
            cleanup_stale_project_dirs,
        )
@@ -1096,7 +1096,7 @@ class TestCleanupStaleProjectDirs:
        projects_dir = tmp_path / "projects"
        projects_dir.mkdir()
        monkeypatch.setattr(
-            "backend.copilot.sdk.transcript._projects_base",
+            "backend.copilot.transcript._projects_base",
            lambda: str(projects_dir),
        )

@@ -1114,11 +1114,11 @@ class TestCleanupStaleProjectDirs:

    def test_missing_base_dir_returns_zero(self, tmp_path, monkeypatch):
        """If the projects base directory doesn't exist, return 0 gracefully."""
-        from backend.copilot.sdk.transcript import cleanup_stale_project_dirs
+        from backend.copilot.transcript import cleanup_stale_project_dirs

        nonexistent = str(tmp_path / "does-not-exist" / "projects")
        monkeypatch.setattr(
-            "backend.copilot.sdk.transcript._projects_base",
+            "backend.copilot.transcript._projects_base",
            lambda: nonexistent,
        )

@@ -1129,7 +1129,7 @@ class TestCleanupStaleProjectDirs:
        """When encoded_cwd is supplied only that directory is swept."""
        import time

-        from backend.copilot.sdk.transcript import (
+        from backend.copilot.transcript import (
            _STALE_PROJECT_DIR_SECONDS,
            cleanup_stale_project_dirs,
        )
@@ -1137,7 +1137,7 @@ class TestCleanupStaleProjectDirs:
        projects_dir = tmp_path / "projects"
        projects_dir.mkdir()
        monkeypatch.setattr(
-            "backend.copilot.sdk.transcript._projects_base",
+            "backend.copilot.transcript._projects_base",
            lambda: str(projects_dir),
        )

@@ -1160,12 +1160,12 @@ class TestCleanupStaleProjectDirs:

    def test_scoped_fresh_dir_not_removed(self, tmp_path, monkeypatch):
        """Scoped sweep leaves a fresh directory alone."""
-        from backend.copilot.sdk.transcript import cleanup_stale_project_dirs
+        from backend.copilot.transcript import cleanup_stale_project_dirs

        projects_dir = tmp_path / "projects"
        projects_dir.mkdir()
        monkeypatch.setattr(
-            "backend.copilot.sdk.transcript._projects_base",
+            "backend.copilot.transcript._projects_base",
            lambda: str(projects_dir),
        )

@@ -1181,7 +1181,7 @@ class TestCleanupStaleProjectDirs:
        """Scoped sweep refuses to remove a non-copilot directory."""
        import time

-        from backend.copilot.sdk.transcript import (
+        from backend.copilot.transcript import (
            _STALE_PROJECT_DIR_SECONDS,
            cleanup_stale_project_dirs,
        )
@@ -1189,7 +1189,7 @@ class TestCleanupStaleProjectDirs:
        projects_dir = tmp_path / "projects"
        projects_dir.mkdir()
        monkeypatch.setattr(
-            "backend.copilot.sdk.transcript._projects_base",
+            "backend.copilot.transcript._projects_base",
            lambda: str(projects_dir),
        )

--- a/autogpt_platform/backend/backend/copilot/service.py
+++ b/autogpt_platform/backend/backend/copilot/service.py
@@ -22,7 +22,12 @@ from backend.util.exceptions import NotAuthorizedError, NotFoundError
 from backend.util.settings import AppEnvironment, Settings

 from .config import ChatConfig
-from .model import ChatSessionInfo, get_chat_session, upsert_chat_session
+from .model import (
+    ChatSessionInfo,
+    get_chat_session,
+    update_session_title,
+    upsert_chat_session,
+)

 logger = logging.getLogger(__name__)

@@ -202,6 +207,22 @@ async def _generate_session_title(
        return None


+async def _update_title_async(
+    session_id: str, message: str, user_id: str | None = None
+) -> None:
+    """Generate and persist a session title in the background.
+
+    Shared by both the SDK and baseline execution paths.
+    """
+    try:
+        title = await _generate_session_title(message, user_id, session_id)
+        if title and user_id:
+            await update_session_title(session_id, user_id, title, only_if_empty=True)
+            logger.debug("Generated title for session %s", session_id)
+    except Exception as e:
+        logger.warning("Failed to update session title for %s: %s", session_id, e)
+
+
 async def assign_user_to_session(
    session_id: str,
    user_id: str,
--- a/autogpt_platform/backend/backend/copilot/service_test.py
+++ b/autogpt_platform/backend/backend/copilot/service_test.py
@@ -7,7 +7,7 @@ import pytest
 from .model import create_chat_session, get_chat_session, upsert_chat_session
 from .response_model import StreamError, StreamTextDelta
 from .sdk import service as sdk_service
-from .sdk.transcript import download_transcript
+from .transcript import download_transcript

 logger = logging.getLogger(__name__)

--- a/autogpt_platform/backend/backend/copilot/token_tracking.py
+++ b/autogpt_platform/backend/backend/copilot/token_tracking.py
@@ -4,17 +4,85 @@ Both the baseline (OpenRouter) and SDK (Anthropic) service layers need to:
  1. Append a ``Usage`` record to the session.
  2. Log the turn's token counts.
  3. Record weighted usage in Redis for rate-limiting.
+  4. Write a PlatformCostLog entry for admin cost tracking.

 This module extracts that common logic so both paths stay in sync.
 """

+import asyncio
 import logging
+import math
+import re
+import threading
+
+from backend.data.db_accessors import platform_cost_db
+from backend.data.platform_cost import PlatformCostEntry, usd_to_microdollars

 from .model import ChatSession, Usage
 from .rate_limit import record_token_usage

 logger = logging.getLogger(__name__)

+# Hold strong references to in-flight cost log tasks to prevent GC.
+_pending_log_tasks: set[asyncio.Task[None]] = set()
+# Guards all reads and writes to _pending_log_tasks. Done callbacks (discard)
+# fire from the event loop thread; drain_pending_cost_logs iterates the set
+# from any caller — the lock prevents RuntimeError from concurrent modification.
+_pending_log_tasks_lock = threading.Lock()
+# Per-loop semaphores: asyncio.Semaphore is not thread-safe and must not be
+# shared across event loops running in different threads.
+_log_semaphores: dict[asyncio.AbstractEventLoop, asyncio.Semaphore] = {}
+
+
+def _get_log_semaphore() -> asyncio.Semaphore:
+    loop = asyncio.get_running_loop()
+    sem = _log_semaphores.get(loop)
+    if sem is None:
+        sem = asyncio.Semaphore(50)
+        _log_semaphores[loop] = sem
+    return sem
+
+
+def _schedule_cost_log(entry: PlatformCostEntry) -> None:
+    """Schedule a fire-and-forget cost log via DatabaseManagerAsyncClient RPC."""
+
+    async def _safe_log() -> None:
+        async with _get_log_semaphore():
+            try:
+                await platform_cost_db().log_platform_cost(entry)
+            except Exception:
+                logger.exception(
+                    "Failed to log platform cost for user=%s provider=%s block=%s",
+                    entry.user_id,
+                    entry.provider,
+                    entry.block_name,
+                )
+
+    task = asyncio.create_task(_safe_log())
+    with _pending_log_tasks_lock:
+        _pending_log_tasks.add(task)
+
+    def _remove(t: asyncio.Task[None]) -> None:
+        with _pending_log_tasks_lock:
+            _pending_log_tasks.discard(t)
+
+    task.add_done_callback(_remove)
+
+
+# Identifiers used by PlatformCostLog for copilot turns (not tied to a real
+# block/credential in the block_cost_config or credentials_store tables).
+COPILOT_BLOCK_ID = "copilot"
+COPILOT_CREDENTIAL_ID = "copilot_system"
+
+
+def _copilot_block_name(log_prefix: str) -> str:
+    """Derive a stable block_name, e.g. ``"[SDK][session][T1]"`` -> ``"copilot:SDK"``."""
+    match = re.search(r"\[([A-Za-z][A-Za-z0-9_]*)\]", log_prefix)
+    if match:
+        return f"{COPILOT_BLOCK_ID}:{match.group(1)}"
+    tag = log_prefix.strip(" []")
+    return f"{COPILOT_BLOCK_ID}:{tag}" if tag else COPILOT_BLOCK_ID
+

 async def persist_and_record_usage(
    *,
@@ -26,6 +94,8 @@ async def persist_and_record_usage(
    cache_creation_tokens: int = 0,
    log_prefix: str = "",
    cost_usd: float | str | None = None,
+    model: str | None = None,
+    provider: str = "open_router",
 ) -> int:
    """Persist token usage to session and record for rate limiting.

@@ -38,6 +108,7 @@ async def persist_and_record_usage(
        cache_creation_tokens: Tokens written to prompt cache (Anthropic only).
        log_prefix: Prefix for log messages (e.g. "[SDK]", "[Baseline]").
        cost_usd: Optional cost for logging (float from SDK, str otherwise).
+        provider: Cost provider name (e.g. "anthropic", "open_router").

    Returns:
        The computed total_tokens (prompt + completion; cache excluded).
@@ -47,12 +118,13 @@ async def persist_and_record_usage(
    cache_read_tokens = max(0, cache_read_tokens)
    cache_creation_tokens = max(0, cache_creation_tokens)

-    if (
+    no_tokens = (
        prompt_tokens <= 0
        and completion_tokens <= 0
        and cache_read_tokens <= 0
        and cache_creation_tokens <= 0
-    ):
+    )
+    if no_tokens and cost_usd is None:
        return 0

    # total_tokens = prompt + completion. Cache tokens are tracked
@@ -73,14 +145,14 @@ async def persist_and_record_usage(

    if cache_read_tokens or cache_creation_tokens:
        logger.info(
-            f"{log_prefix} Turn usage: uncached={prompt_tokens}, "
-            f"cache_read={cache_read_tokens}, cache_create={cache_creation_tokens}, "
-            f"output={completion_tokens}, total={total_tokens}, cost_usd={cost_usd}"
+            f"{log_prefix} Turn usage: uncached={prompt_tokens}, cache_read={cache_read_tokens},"
+            f" cache_create={cache_creation_tokens}, output={completion_tokens},"
+            f" total={total_tokens}, cost_usd={cost_usd}"
        )
    else:
        logger.info(
-            f"{log_prefix} Turn usage: prompt={prompt_tokens}, "
-            f"completion={completion_tokens}, total={total_tokens}"
+            f"{log_prefix} Turn usage: prompt={prompt_tokens}, completion={completion_tokens},"
+            f" total={total_tokens}"
        )

    if user_id:
@@ -93,6 +165,54 @@ async def persist_and_record_usage(
                cache_creation_tokens=cache_creation_tokens,
            )
        except Exception as usage_err:
-            logger.warning(f"{log_prefix} Failed to record token usage: {usage_err}")
+            logger.warning("%s Failed to record token usage: %s", log_prefix, usage_err)
+
+    # Log to PlatformCostLog for admin cost dashboard.
+    # Include entries where cost_usd is set even if token count is 0
+    # (e.g. fully-cached Anthropic responses where only cache tokens
+    # accumulate a charge without incrementing total_tokens).
+    if user_id and (total_tokens > 0 or cost_usd is not None):
+        cost_float = None
+        if cost_usd is not None:
+            try:
+                val = float(cost_usd)
+                if math.isfinite(val) and val >= 0:
+                    cost_float = val
+            except (ValueError, TypeError):
+                pass
+
+        cost_microdollars = usd_to_microdollars(cost_float)
+        session_id = session.session_id if session else None
+
+        if cost_float is not None:
+            tracking_type = "cost_usd"
+            tracking_amount = cost_float
+        else:
+            tracking_type = "tokens"
+            tracking_amount = total_tokens
+
+        _schedule_cost_log(
+            PlatformCostEntry(
+                user_id=user_id,
+                graph_exec_id=session_id,
+                block_id=COPILOT_BLOCK_ID,
+                block_name=_copilot_block_name(log_prefix),
+                provider=provider,
+                credential_id=COPILOT_CREDENTIAL_ID,
+                cost_microdollars=cost_microdollars,
+                input_tokens=prompt_tokens,
+                output_tokens=completion_tokens,
+                model=model,
+                tracking_type=tracking_type,
+                tracking_amount=tracking_amount,
+                metadata={
+                    "tracking_type": tracking_type,
+                    "tracking_amount": tracking_amount,
+                    "cache_read_tokens": cache_read_tokens,
+                    "cache_creation_tokens": cache_creation_tokens,
+                    "source": "copilot",
+                },
+            )
+        )

    return total_tokens
--- a/autogpt_platform/backend/backend/copilot/token_tracking_test.py
+++ b/autogpt_platform/backend/backend/copilot/token_tracking_test.py
@@ -4,6 +4,7 @@ Covers both the baseline (prompt+completion only) and SDK (with cache breakdown)
 calling conventions, session persistence, and rate-limit recording.
 """

+import asyncio
 from datetime import UTC, datetime
 from unittest.mock import AsyncMock, patch

@@ -279,3 +280,260 @@ class TestRateLimitRecording:
                completion_tokens=0,
            )
        mock_record.assert_not_awaited()
+
+
+# ---------------------------------------------------------------------------
+# PlatformCostLog integration
+# ---------------------------------------------------------------------------
+
+
+class TestPlatformCostLogging:
+    @pytest.mark.asyncio
+    async def test_logs_cost_entry_with_cost_usd(self):
+        """When cost_usd is provided, tracking_type should be 'cost_usd'."""
+        mock_log = AsyncMock()
+        with (
+            patch(
+                "backend.copilot.token_tracking.record_token_usage",
+                new_callable=AsyncMock,
+            ),
+            patch(
+                "backend.copilot.token_tracking.platform_cost_db",
+                return_value=type(
+                    "FakePlatformCostDb", (), {"log_platform_cost": mock_log}
+                )(),
+            ),
+        ):
+            await persist_and_record_usage(
+                session=_make_session(),
+                user_id="user-cost",
+                prompt_tokens=200,
+                completion_tokens=100,
+                cost_usd=0.005,
+                model="gpt-4",
+                provider="anthropic",
+                log_prefix="[SDK]",
+            )
+            await asyncio.sleep(0)
+        mock_log.assert_awaited_once()
+        entry = mock_log.call_args[0][0]
+        assert entry.user_id == "user-cost"
+        assert entry.provider == "anthropic"
+        assert entry.model == "gpt-4"
+        assert entry.cost_microdollars == 5000
+        assert entry.input_tokens == 200
+        assert entry.output_tokens == 100
+        assert entry.tracking_type == "cost_usd"
+        assert entry.metadata["tracking_type"] == "cost_usd"
+        assert entry.metadata["tracking_amount"] == 0.005
+        assert entry.block_name == "copilot:SDK"
+        assert entry.graph_exec_id == "sess-test"
+
+    @pytest.mark.asyncio
+    async def test_logs_cost_entry_without_cost_usd(self):
+        """When cost_usd is None, tracking_type should be 'tokens'."""
+        mock_log = AsyncMock()
+        with (
+            patch(
+                "backend.copilot.token_tracking.record_token_usage",
+                new_callable=AsyncMock,
+            ),
+            patch(
+                "backend.copilot.token_tracking.platform_cost_db",
+                return_value=type(
+                    "FakePlatformCostDb", (), {"log_platform_cost": mock_log}
+                )(),
+            ),
+        ):
+            await persist_and_record_usage(
+                session=None,
+                user_id="user-tokens",
+                prompt_tokens=100,
+                completion_tokens=50,
+                log_prefix="[Baseline]",
+            )
+            await asyncio.sleep(0)
+        mock_log.assert_awaited_once()
+        entry = mock_log.call_args[0][0]
+        assert entry.cost_microdollars is None
+        assert entry.tracking_type == "tokens"
+        assert entry.metadata["tracking_type"] == "tokens"
+        assert entry.metadata["tracking_amount"] == 150
+        assert entry.graph_exec_id is None
+        assert entry.block_name == "copilot:Baseline"
+
+    @pytest.mark.asyncio
+    async def test_skips_cost_log_when_no_user_id(self):
+        """No PlatformCostLog entry when user_id is None."""
+        mock_log = AsyncMock()
+        with (
+            patch(
+                "backend.copilot.token_tracking.record_token_usage",
+                new_callable=AsyncMock,
+            ),
+            patch(
+                "backend.copilot.token_tracking.platform_cost_db",
+                return_value=type(
+                    "FakePlatformCostDb", (), {"log_platform_cost": mock_log}
+                )(),
+            ),
+        ):
+            await persist_and_record_usage(
+                session=None,
+                user_id=None,
+                prompt_tokens=100,
+                completion_tokens=50,
+            )
+            await asyncio.sleep(0)
+        mock_log.assert_not_awaited()
+
+    @pytest.mark.asyncio
+    async def test_cost_usd_invalid_string_falls_back_to_tokens(self):
+        """Invalid cost_usd string should fall back to tokens tracking."""
+        mock_log = AsyncMock()
+        with (
+            patch(
+                "backend.copilot.token_tracking.record_token_usage",
+                new_callable=AsyncMock,
+            ),
+            patch(
+                "backend.copilot.token_tracking.platform_cost_db",
+                return_value=type(
+                    "FakePlatformCostDb", (), {"log_platform_cost": mock_log}
+                )(),
+            ),
+        ):
+            await persist_and_record_usage(
+                session=None,
+                user_id="user-invalid",
+                prompt_tokens=100,
+                completion_tokens=50,
+                cost_usd="not-a-number",
+            )
+            await asyncio.sleep(0)
+        mock_log.assert_awaited_once()
+        entry = mock_log.call_args[0][0]
+        assert entry.cost_microdollars is None
+        assert entry.metadata["tracking_type"] == "tokens"
+
+    @pytest.mark.asyncio
+    async def test_cost_usd_string_number_is_parsed(self):
+        """String-encoded cost_usd (e.g. from OpenRouter) should be parsed."""
+        mock_log = AsyncMock()
+        with (
+            patch(
+                "backend.copilot.token_tracking.record_token_usage",
+                new_callable=AsyncMock,
+            ),
+            patch(
+                "backend.copilot.token_tracking.platform_cost_db",
+                return_value=type(
+                    "FakePlatformCostDb", (), {"log_platform_cost": mock_log}
+                )(),
+            ),
+        ):
+            await persist_and_record_usage(
+                session=None,
+                user_id="user-str",
+                prompt_tokens=100,
+                completion_tokens=50,
+                cost_usd="0.01",
+            )
+            await asyncio.sleep(0)
+        mock_log.assert_awaited_once()
+        entry = mock_log.call_args[0][0]
+        assert entry.cost_microdollars == 10_000
+        assert entry.metadata["tracking_type"] == "cost_usd"
+
+    @pytest.mark.asyncio
+    async def test_empty_log_prefix_produces_copilot_block_name(self):
+        """Empty log_prefix results in block_name='copilot'."""
+        mock_log = AsyncMock()
+        with (
+            patch(
+                "backend.copilot.token_tracking.record_token_usage",
+                new_callable=AsyncMock,
+            ),
+            patch(
+                "backend.copilot.token_tracking.platform_cost_db",
+                return_value=type(
+                    "FakePlatformCostDb", (), {"log_platform_cost": mock_log}
+                )(),
+            ),
+        ):
+            await persist_and_record_usage(
+                session=None,
+                user_id="user-empty",
+                prompt_tokens=10,
+                completion_tokens=5,
+                log_prefix="",
+            )
+            await asyncio.sleep(0)
+        entry = mock_log.call_args[0][0]
+        assert entry.block_name == "copilot"
+
+    @pytest.mark.asyncio
+    async def test_cache_tokens_included_in_metadata(self):
+        """Cache token counts should be present in the metadata."""
+        mock_log = AsyncMock()
+        with (
+            patch(
+                "backend.copilot.token_tracking.record_token_usage",
+                new_callable=AsyncMock,
+            ),
+            patch(
+                "backend.copilot.token_tracking.platform_cost_db",
+                return_value=type(
+                    "FakePlatformCostDb", (), {"log_platform_cost": mock_log}
+                )(),
+            ),
+        ):
+            await persist_and_record_usage(
+                session=None,
+                user_id="user-cache",
+                prompt_tokens=100,
+                completion_tokens=50,
+                cache_read_tokens=5000,
+                cache_creation_tokens=300,
+            )
+            await asyncio.sleep(0)
+        entry = mock_log.call_args[0][0]
+        assert entry.metadata["cache_read_tokens"] == 5000
+        assert entry.metadata["cache_creation_tokens"] == 300
+        assert entry.metadata["source"] == "copilot"
+
+    @pytest.mark.asyncio
+    async def test_logs_cost_only_when_tokens_zero(self):
+        """Zero prompt+completion tokens with cost_usd set still logs the entry."""
+        mock_log = AsyncMock()
+        with (
+            patch(
+                "backend.copilot.token_tracking.record_token_usage",
+                new_callable=AsyncMock,
+            ),
+            patch(
+                "backend.copilot.token_tracking.platform_cost_db",
+                return_value=type(
+                    "FakePlatformCostDb", (), {"log_platform_cost": mock_log}
+                )(),
+            ),
+        ):
+            await persist_and_record_usage(
+                session=None,
+                user_id="user-cached",
+                prompt_tokens=0,
+                completion_tokens=0,
+                cost_usd=0.005,
+                model="claude-3-5-sonnet",
+                provider="anthropic",
+                log_prefix="[SDK]",
+            )
+            await asyncio.sleep(0)
+        # Guard: total_tokens == 0 but cost_usd is set — must still log
+        mock_log.assert_awaited_once()
+        entry = mock_log.call_args[0][0]
+        assert entry.user_id == "user-cached"
+        assert entry.tracking_type == "cost_usd"
+        assert entry.cost_microdollars == 5000
+        assert entry.input_tokens == 0
+        assert entry.output_tokens == 0
--- a/autogpt_platform/backend/backend/copilot/tools/agent_generator/fixer.py
+++ b/autogpt_platform/backend/backend/copilot/tools/agent_generator/fixer.py
@@ -33,12 +33,23 @@ _GET_CURRENT_DATE_BLOCK_ID = "b29c1b50-5d0e-4d9f-8f9d-1b0e6fcbf0b1"
 _GMAIL_SEND_BLOCK_ID = "6c27abc2-e51d-499e-a85f-5a0041ba94f0"
 _TEXT_REPLACE_BLOCK_ID = "7e7c87ab-3469-4bcc-9abe-67705091b713"

+# Default OrchestratorBlock model/mode — kept in sync with ChatConfig.model.
+# ChatConfig uses the OpenRouter format ("anthropic/claude-opus-4.6");
+# OrchestratorBlock uses the native Anthropic model name.
+ORCHESTRATOR_DEFAULT_MODEL = "claude-opus-4-6"
+ORCHESTRATOR_DEFAULT_EXECUTION_MODE = "extended_thinking"
+
 # Defaults applied to OrchestratorBlock nodes by the fixer.
-_SDM_DEFAULTS: dict[str, int | bool] = {
+# execution_mode and model match the copilot's default (extended thinking
+# with Opus) so generated agents inherit the same reasoning capabilities.
+# If the user explicitly sets these fields, the fixer won't override them.
+_SDM_DEFAULTS: dict[str, int | bool | str] = {
    "agent_mode_max_iterations": 10,
    "conversation_compaction": True,
    "retry": 3,
    "multiple_tool_calls": False,
+    "execution_mode": ORCHESTRATOR_DEFAULT_EXECUTION_MODE,
+    "model": ORCHESTRATOR_DEFAULT_MODEL,
 }


@@ -879,6 +890,12 @@ class AgentFixer:
            )

            if is_ai_block:
+                # Skip AI blocks that don't expose a "model" input property
+                # (some AI-category blocks have no model selector at all).
+                input_properties = block.get("inputSchema", {}).get("properties", {})
+                if "model" not in input_properties:
+                    continue
+
                node_id = node.get("id")
                input_default = node.get("input_default", {})
                current_model = input_default.get("model")
@@ -887,9 +904,7 @@ class AgentFixer:
                # Blocks with a block-specific enum on the model field (e.g.
                # PerplexityBlock) use their own enum values; others use the
                # generic set.
-                model_schema = (
-                    block.get("inputSchema", {}).get("properties", {}).get("model", {})
-                )
+                model_schema = input_properties.get("model", {})
                block_model_enum = model_schema.get("enum")

                if block_model_enum:
@@ -1649,6 +1664,8 @@ class AgentFixer:
        2. ``conversation_compaction`` defaults to ``True``
        3. ``retry`` defaults to ``3``
        4. ``multiple_tool_calls`` defaults to ``False``
+        5. ``execution_mode`` defaults to ``"extended_thinking"``
+        6. ``model`` defaults to ``"claude-opus-4-6"``

        Args:
            agent: The agent dictionary to fix
@@ -1748,6 +1765,12 @@ class AgentFixer:
        agent = self.fix_node_x_coordinates(agent, node_lookup=node_lookup)
        agent = self.fix_getcurrentdate_offset(agent)

+        # Apply OrchestratorBlock defaults BEFORE fix_ai_model_parameter so that
+        # the orchestrator-specific model (claude-opus-4-6) is set first and
+        # fix_ai_model_parameter sees it as a valid allowed model instead of
+        # overwriting it with the generic default (gpt-4o).
+        agent = self.fix_orchestrator_blocks(agent)
+
        # Apply fixes that require blocks information
        if blocks:
            agent = self.fix_invalid_nested_sink_links(
@@ -1765,9 +1788,6 @@ class AgentFixer:
        # Apply fixes for MCPToolBlock nodes
        agent = self.fix_mcp_tool_blocks(agent)

-        # Apply fixes for OrchestratorBlock nodes (agent-mode defaults)
-        agent = self.fix_orchestrator_blocks(agent)
-
        # Apply fixes for AgentExecutorBlock nodes (sub-agents)
        if library_agents:
            agent = self.fix_agent_executor_blocks(agent, library_agents)
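The comment on `_SDM_DEFAULTS` says the fixer does not override fields the user set explicitly. A minimal sketch of that rule follows. The default values are copied from the diff above, but `apply_defaults` is a hypothetical helper, and treating a key as user-set whenever it is present (whatever its value) is an assumption about the real `fix_orchestrator_blocks` behaviour.

```python
from typing import Any

# Values copied from the diff above; the helper below is hypothetical.
DEFAULTS: dict[str, Any] = {
    "agent_mode_max_iterations": 10,
    "conversation_compaction": True,
    "retry": 3,
    "multiple_tool_calls": False,
    "execution_mode": "extended_thinking",
    "model": "claude-opus-4-6",
}


def apply_defaults(input_default: dict[str, Any]) -> dict[str, Any]:
    """Return a copy with missing keys filled in; existing values win."""
    merged = dict(input_default)
    for key, value in DEFAULTS.items():
        merged.setdefault(key, value)
    return merged
```

Running this step before the generic model fixer, as the reordering above does, means the model field is already populated by the time the later check looks at it.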
--- a/autogpt_platform/backend/backend/copilot/tools/agent_generator/fixer_test.py
+++ b/autogpt_platform/backend/backend/copilot/tools/agent_generator/fixer_test.py
@@ -580,6 +580,29 @@ class TestFixAiModelParameter:

        assert result["nodes"][0]["input_default"]["model"] == "perplexity/sonar"

+    def test_ai_block_without_model_property_is_skipped(self):
+        """AI-category blocks that have no 'model' input property should not
+        have a model injected — they simply don't expose a model selector."""
+        fixer = AgentFixer()
+        block_id = generate_uuid()
+        node = _make_node(node_id="n1", block_id=block_id, input_default={})
+        agent = _make_agent(nodes=[node])
+
+        blocks = [
+            {
+                "id": block_id,
+                "name": "SomeAIBlock",
+                "categories": [{"category": "AI"}],
+                "inputSchema": {
+                    "properties": {"prompt": {"type": "string"}},
+                },
+            }
+        ]
+
+        result = fixer.fix_ai_model_parameter(agent, blocks)
+
+        assert "model" not in result["nodes"][0]["input_default"]
+

 class TestFixAgentExecutorBlocks:
    """Tests for fix_agent_executor_blocks."""
--- a/autogpt_platform/backend/backend/copilot/tools/get_agent_building_guide.py
+++ b/autogpt_platform/backend/backend/copilot/tools/get_agent_building_guide.py
@@ -42,7 +42,10 @@ class GetAgentBuildingGuideTool(BaseTool):

    @property
    def description(self) -> str:
-        return "Get the agent JSON building guide (nodes, links, AgentExecutorBlock, MCPToolBlock usage). Call before generating agent JSON."
+        return (
+            "Get the agent JSON building guide (nodes, links, AgentExecutorBlock, MCPToolBlock usage, "
+            "and the create->dry-run->fix iterative workflow). Call before generating agent JSON."
+        )

    @property
    def parameters(self) -> dict[str, Any]:
--- a/autogpt_platform/backend/backend/copilot/tools/get_agent_building_guide_test.py
+++ b/autogpt_platform/backend/backend/copilot/tools/get_agent_building_guide_test.py
@@ -0,0 +1,15 @@
+"""Tests for GetAgentBuildingGuideTool."""
+
+from backend.copilot.tools.get_agent_building_guide import _load_guide
+
+
+def test_load_guide_returns_string():
+    guide = _load_guide()
+    assert isinstance(guide, str)
+    assert len(guide) > 100
+
+
+def test_load_guide_caches():
+    guide1 = _load_guide()
+    guide2 = _load_guide()
+    assert guide1 is guide2
--- a/autogpt_platform/backend/backend/copilot/tools/helpers.py
+++ b/autogpt_platform/backend/backend/copilot/tools/helpers.py
@@ -48,27 +48,41 @@ logger = logging.getLogger(__name__)
 def get_inputs_from_schema(
    input_schema: dict[str, Any],
    exclude_fields: set[str] | None = None,
+    input_data: dict[str, Any] | None = None,
 ) -> list[dict[str, Any]]:
-    """Extract input field info from JSON schema."""
+    """Extract input field info from JSON schema.
+
+    When *input_data* is provided, each field's ``value`` key is populated
+    with the value the CoPilot already supplied — so the frontend can
+    prefill the form instead of showing empty inputs.  Fields marked
+    ``advanced`` in the schema are flagged so the frontend can hide them
+    by default (matching the builder behaviour).
+    """
    if not isinstance(input_schema, dict):
        return []

    exclude = exclude_fields or set()
    properties = input_schema.get("properties", {})
    required = set(input_schema.get("required", []))
+    provided = input_data or {}

-    return [
-        {
+    results: list[dict[str, Any]] = []
+    for name, schema in properties.items():
+        if name in exclude:
+            continue
+        entry: dict[str, Any] = {
            "name": name,
            "title": schema.get("title", name),
            "type": schema.get("type", "string"),
            "description": schema.get("description", ""),
            "required": name in required,
            "default": schema.get("default"),
+            "advanced": schema.get("advanced", False),
        }
-        for name, schema in properties.items()
-        if name not in exclude
-    ]
+        if name in provided:
+            entry["value"] = provided[name]
+        results.append(entry)
+    return results


 async def execute_block(
@@ -446,7 +460,9 @@ async def prepare_block_for_execution(
                requirements={
                    "credentials": missing_creds_list,
                    "inputs": get_inputs_from_schema(
-                        input_schema, exclude_fields=credentials_fields
+                        input_schema,
+                        exclude_fields=credentials_fields,
+                        input_data=input_data,
                    ),
                    "execution_modes": ["immediate"],
                },
--- a/autogpt_platform/backend/backend/copilot/tools/run_agent.py
+++ b/autogpt_platform/backend/backend/copilot/tools/run_agent.py
@@ -153,7 +153,11 @@ class RunAgentTool(BaseTool):
                },
                "dry_run": {
                    "type": "boolean",
-                    "description": "Execute in preview mode.",
+                    "description": (
+                        "When true, simulates execution using an LLM for each block "
+                        "— no real API calls, credentials, or credits. "
+                        "See agent_generation_guide for the full workflow."
+                    ),
                },
            },
            "required": ["dry_run"],
--- a/autogpt_platform/backend/backend/copilot/tools/workspace_files.py
+++ b/autogpt_platform/backend/backend/copilot/tools/workspace_files.py
@@ -845,6 +845,7 @@ class WriteWorkspaceFileTool(BaseTool):
                path=path,
                mime_type=mime_type,
                overwrite=overwrite,
+                metadata={"origin": "agent-created"},
            )

            # Build informative source label and message.
--- a/autogpt_platform/backend/backend/copilot/transcript.py
+++ b/autogpt_platform/backend/backend/copilot/transcript.py
--- a/autogpt_platform/backend/backend/copilot/transcript_builder.py
+++ b/autogpt_platform/backend/backend/copilot/transcript_builder.py
@@ -0,0 +1,240 @@
+"""Build complete JSONL transcript from SDK messages.
+
+The transcript represents the FULL active context at any point in time.
+Each upload REPLACES the previous transcript atomically.
+
+Flow:
+  Turn 1: Upload [msg1, msg2]
+  Turn 2: Download [msg1, msg2] → Upload [msg1, msg2, msg3, msg4] (REPLACE)
+  Turn 3: Download [msg1, msg2, msg3, msg4] → Upload [all messages] (REPLACE)
+
+The transcript is never incremental - always the complete atomic state.
+"""
+
+import logging
+from typing import Any
+from uuid import uuid4
+
+from pydantic import BaseModel
+
+from backend.util import json
+
+from .transcript import STRIPPABLE_TYPES
+
+logger = logging.getLogger(__name__)
+
+
+class TranscriptEntry(BaseModel):
+    """Single transcript entry (user or assistant turn)."""
+
+    type: str
+    uuid: str
+    parentUuid: str = ""
+    isCompactSummary: bool | None = None
+    message: dict[str, Any]
+
+
+class TranscriptBuilder:
+    """Build complete JSONL transcript from SDK messages.
+
+    This builder maintains the FULL conversation state, not incremental changes.
+    The output is always the complete active context.
+    """
+
+    def __init__(self) -> None:
+        self._entries: list[TranscriptEntry] = []
+        self._last_uuid: str | None = None
+
+    def _last_is_assistant(self) -> bool:
+        return bool(self._entries) and self._entries[-1].type == "assistant"
+
+    def _last_message_id(self) -> str:
+        """Return the message.id of the last entry, or '' if none."""
+        if self._entries:
+            return self._entries[-1].message.get("id", "")
+        return ""
+
+    @staticmethod
+    def _parse_entry(data: dict) -> TranscriptEntry | None:
+        """Parse a single transcript entry, filtering strippable types.
+
+        Returns ``None`` for entries that should be skipped (strippable types
+        that are not compaction summaries).
+        """
+        entry_type = data.get("type", "")
+        if entry_type in STRIPPABLE_TYPES and not data.get("isCompactSummary"):
+            return None
+        return TranscriptEntry(
+            type=entry_type,
+            uuid=data.get("uuid") or str(uuid4()),
+            parentUuid=data.get("parentUuid") or "",
+            isCompactSummary=data.get("isCompactSummary"),
+            message=data.get("message", {}),
+        )
+
+    def load_previous(self, content: str, log_prefix: str = "[Transcript]") -> None:
+        """Load complete previous transcript.
+
+        This loads the FULL previous context. As new messages come in,
+        we append to this state. The final output is the complete context
+        (previous + new), not just the delta.
+        """
+        if not content or not content.strip():
+            return
+
+        lines = content.strip().split("\n")
+        for line_num, line in enumerate(lines, 1):
+            if not line.strip():
+                continue
+
+            data = json.loads(line, fallback=None)
+            if data is None:
+                logger.warning(
+                    "%s Failed to parse transcript line %d/%d",
+                    log_prefix,
+                    line_num,
+                    len(lines),
+                )
+                continue
+
+            entry = self._parse_entry(data)
+            if entry is None:
+                continue
+            self._entries.append(entry)
+            self._last_uuid = entry.uuid
+
+        logger.info(
+            "%s Loaded %d entries from previous transcript (last_uuid=%s)",
+            log_prefix,
+            len(self._entries),
+            self._last_uuid[:12] if self._last_uuid else None,
+        )
+
+    def append_user(self, content: str | list[dict], uuid: str | None = None) -> None:
+        """Append a user entry."""
+        msg_uuid = uuid or str(uuid4())
+
+        self._entries.append(
+            TranscriptEntry(
+                type="user",
+                uuid=msg_uuid,
+                parentUuid=self._last_uuid or "",
+                message={"role": "user", "content": content},
+            )
+        )
+        self._last_uuid = msg_uuid
+
+    def append_tool_result(self, tool_use_id: str, content: str) -> None:
+        """Append a tool result as a user entry (one per tool call)."""
+        self.append_user(
+            content=[
+                {"type": "tool_result", "tool_use_id": tool_use_id, "content": content}
+            ]
+        )
+
+    def append_assistant(
+        self,
+        content_blocks: list[dict],
+        model: str = "",
+        stop_reason: str | None = None,
+    ) -> None:
+        """Append an assistant entry.
+
+        Consecutive assistant entries automatically share the same message ID
+        so the CLI can merge them (thinking → text → tool_use) into a single
+        API message on ``--resume``.  A new ID is assigned whenever an
+        assistant entry follows a non-assistant entry (user message or tool
+        result), because that marks the start of a new API response.
+        """
+        message_id = (
+            self._last_message_id()
+            if self._last_is_assistant()
+            else f"msg_sdk_{uuid4().hex[:24]}"
+        )
+
+        msg_uuid = str(uuid4())
+
+        self._entries.append(
+            TranscriptEntry(
+                type="assistant",
+                uuid=msg_uuid,
+                parentUuid=self._last_uuid or "",
+                message={
+                    "role": "assistant",
+                    "model": model,
+                    "id": message_id,
+                    "type": "message",
+                    "content": content_blocks,
+                    "stop_reason": stop_reason,
+                    "stop_sequence": None,
+                },
+            )
+        )
+        self._last_uuid = msg_uuid
+
+    def replace_entries(
+        self, compacted_entries: list[dict], log_prefix: str = "[Transcript]"
+    ) -> None:
+        """Replace all entries with compacted entries from the CLI session file.
+
+        Called after mid-stream compaction so TranscriptBuilder mirrors the
+        CLI's active context (compaction summary + post-compaction entries).
+
+        Builds the new list first and validates it's non-empty before swapping,
+        so corrupt input cannot wipe the conversation history.
+        """
+        new_entries: list[TranscriptEntry] = []
+        for data in compacted_entries:
+            entry = self._parse_entry(data)
+            if entry is not None:
+                new_entries.append(entry)
+
+        if not new_entries:
+            logger.warning(
+                "%s replace_entries produced 0 entries from %d inputs, keeping old (%d entries)",
+                log_prefix,
+                len(compacted_entries),
+                len(self._entries),
+            )
+            return
+
+        old_count = len(self._entries)
+        self._entries = new_entries
+        self._last_uuid = new_entries[-1].uuid
+
+        logger.info(
+            "%s TranscriptBuilder compacted: %d entries -> %d entries",
+            log_prefix,
+            old_count,
+            len(self._entries),
+        )
+
+    def to_jsonl(self) -> str:
+        """Export complete context as JSONL.
+
+        Consecutive assistant entries are kept separate to match the
+        native CLI format — the SDK merges them internally on resume.
+
+        Returns the FULL conversation state (all entries), not incremental.
+        This output REPLACES any previous transcript.
+        """
+        if not self._entries:
+            return ""
+
+        lines = [entry.model_dump_json(exclude_none=True) for entry in self._entries]
+        return "\n".join(lines) + "\n"
+
+    @property
+    def entry_count(self) -> int:
+        """Total number of entries in the complete context."""
+        return len(self._entries)
+
+    @property
+    def is_empty(self) -> bool:
+        """Whether this builder has no entries."""
+        return len(self._entries) == 0
+
+    @property
+    def last_entry_type(self) -> str | None:
+        """Type of the last entry, or None if empty."""
+        return self._entries[-1].type if self._entries else None
--- a/autogpt_platform/backend/backend/copilot/transcript_builder_test.py
+++ b/autogpt_platform/backend/backend/copilot/transcript_builder_test.py
@@ -0,0 +1,260 @@
+"""Tests for canonical TranscriptBuilder (backend.copilot.transcript_builder).
+
+These tests directly import from the canonical module to ensure codecov
+patch coverage for the new file.
+"""
+
+from backend.copilot.transcript_builder import TranscriptBuilder, TranscriptEntry
+from backend.util import json
+
+
+def _make_jsonl(*entries: dict) -> str:
+    return "\n".join(json.dumps(e) for e in entries) + "\n"
+
+
+USER_MSG = {
+    "type": "user",
+    "uuid": "u1",
+    "message": {"role": "user", "content": "hello"},
+}
+ASST_MSG = {
+    "type": "assistant",
+    "uuid": "a1",
+    "parentUuid": "u1",
+    "message": {
+        "role": "assistant",
+        "id": "msg_1",
+        "type": "message",
+        "content": [{"type": "text", "text": "hi"}],
+        "stop_reason": "end_turn",
+        "stop_sequence": None,
+    },
+}
+
+
+class TestTranscriptEntry:
+    def test_basic_construction(self):
+        entry = TranscriptEntry(
+            type="user", uuid="u1", message={"role": "user", "content": "hi"}
+        )
+        assert entry.type == "user"
+        assert entry.uuid == "u1"
+        assert entry.parentUuid == ""
+        assert entry.isCompactSummary is None
+
+    def test_optional_fields(self):
+        entry = TranscriptEntry(
+            type="summary",
+            uuid="s1",
+            parentUuid="p1",
+            isCompactSummary=True,
+            message={"role": "user", "content": "summary"},
+        )
+        assert entry.isCompactSummary is True
+        assert entry.parentUuid == "p1"
+
+
+class TestTranscriptBuilderInit:
+    def test_starts_empty(self):
+        builder = TranscriptBuilder()
+        assert builder.is_empty
+        assert builder.entry_count == 0
+        assert builder.last_entry_type is None
+        assert builder.to_jsonl() == ""
+
+
+class TestAppendUser:
+    def test_appends_user_entry(self):
+        builder = TranscriptBuilder()
+        builder.append_user("hello")
+        assert builder.entry_count == 1
+        assert builder.last_entry_type == "user"
+
+    def test_chains_parent_uuid(self):
+        builder = TranscriptBuilder()
+        builder.append_user("first", uuid="u1")
+        builder.append_user("second", uuid="u2")
+        output = builder.to_jsonl()
+        entries = [json.loads(line) for line in output.strip().split("\n")]
+        assert entries[0]["parentUuid"] == ""
+        assert entries[1]["parentUuid"] == "u1"
+
+    def test_custom_uuid(self):
+        builder = TranscriptBuilder()
+        builder.append_user("hello", uuid="custom-id")
+        output = builder.to_jsonl()
+        entry = json.loads(output.strip())
+        assert entry["uuid"] == "custom-id"
+
+
+class TestAppendToolResult:
+    def test_appends_as_user_entry(self):
+        builder = TranscriptBuilder()
+        builder.append_tool_result(tool_use_id="tc_1", content="result text")
+        assert builder.entry_count == 1
+        assert builder.last_entry_type == "user"
+        output = builder.to_jsonl()
+        entry = json.loads(output.strip())
+        content = entry["message"]["content"]
+        assert len(content) == 1
+        assert content[0]["type"] == "tool_result"
+        assert content[0]["tool_use_id"] == "tc_1"
+        assert content[0]["content"] == "result text"
+
+
+class TestAppendAssistant:
+    def test_appends_assistant_entry(self):
+        builder = TranscriptBuilder()
+        builder.append_user("hi")
+        builder.append_assistant(
+            content_blocks=[{"type": "text", "text": "hello"}],
+            model="test-model",
+            stop_reason="end_turn",
+        )
+        assert builder.entry_count == 2
+        assert builder.last_entry_type == "assistant"
+
+    def test_consecutive_assistants_share_message_id(self):
+        builder = TranscriptBuilder()
+        builder.append_user("hi")
+        builder.append_assistant(
+            content_blocks=[{"type": "text", "text": "part 1"}],
+            model="m",
+        )
+        builder.append_assistant(
+            content_blocks=[{"type": "text", "text": "part 2"}],
+            model="m",
+        )
+        output = builder.to_jsonl()
+        entries = [json.loads(line) for line in output.strip().split("\n")]
+        # The two assistant entries share the same message ID
+        assert entries[1]["message"]["id"] == entries[2]["message"]["id"]
+
+    def test_non_consecutive_assistants_get_different_ids(self):
+        builder = TranscriptBuilder()
+        builder.append_user("q1")
+        builder.append_assistant(
+            content_blocks=[{"type": "text", "text": "a1"}],
+            model="m",
+        )
+        builder.append_user("q2")
+        builder.append_assistant(
+            content_blocks=[{"type": "text", "text": "a2"}],
+            model="m",
+        )
+        output = builder.to_jsonl()
+        entries = [json.loads(line) for line in output.strip().split("\n")]
+        assert entries[1]["message"]["id"] != entries[3]["message"]["id"]
+
+
+class TestLoadPrevious:
+    def test_loads_valid_entries(self):
+        content = _make_jsonl(USER_MSG, ASST_MSG)
+        builder = TranscriptBuilder()
+        builder.load_previous(content)
+        assert builder.entry_count == 2
+
+    def test_skips_empty_content(self):
+        builder = TranscriptBuilder()
+        builder.load_previous("")
+        assert builder.is_empty
+        builder.load_previous("   ")
+        assert builder.is_empty
+
+    def test_skips_strippable_types(self):
+        progress = {"type": "progress", "uuid": "p1", "message": {}}
+        content = _make_jsonl(USER_MSG, progress, ASST_MSG)
+        builder = TranscriptBuilder()
+        builder.load_previous(content)
+        assert builder.entry_count == 2  # progress was skipped
+
+    def test_preserves_compact_summary(self):
+        compact = {
+            "type": "summary",
+            "uuid": "cs1",
+            "isCompactSummary": True,
+            "message": {"role": "user", "content": "summary"},
+        }
+        content = _make_jsonl(compact, ASST_MSG)
+        builder = TranscriptBuilder()
+        builder.load_previous(content)
+        assert builder.entry_count == 2
+
+    def test_skips_invalid_json_lines(self):
+        content = '{"type":"user","uuid":"u1","message":{}}\nnot-valid-json\n'
+        builder = TranscriptBuilder()
+        builder.load_previous(content)
+        assert builder.entry_count == 1
+
+
+class TestToJsonl:
+    def test_roundtrip(self):
+        builder = TranscriptBuilder()
+        builder.append_user("hello", uuid="u1")
+        builder.append_assistant(
+            content_blocks=[{"type": "text", "text": "world"}],
+            model="m",
+        )
+        output = builder.to_jsonl()
+        assert output.endswith("\n")
+        lines = output.strip().split("\n")
+        assert len(lines) == 2
+        for line in lines:
+            parsed = json.loads(line)
+            assert "type" in parsed
+            assert "uuid" in parsed
+            assert "message" in parsed
+
+
+class TestReplaceEntries:
+    def test_replaces_all_entries(self):
+        builder = TranscriptBuilder()
+        builder.append_user("old")
+        builder.append_assistant(
+            content_blocks=[{"type": "text", "text": "old answer"}], model="m"
+        )
+        assert builder.entry_count == 2
+
+        compacted = [
+            {
+                "type": "summary",
+                "uuid": "cs1",
+                "isCompactSummary": True,
+                "message": {"role": "user", "content": "compacted"},
+            }
+        ]
+        builder.replace_entries(compacted)
+        assert builder.entry_count == 1
+
+    def test_empty_replacement_keeps_existing(self):
+        builder = TranscriptBuilder()
+        builder.append_user("keep me")
+        builder.replace_entries([])
+        assert builder.entry_count == 1
+
+
+class TestParseEntry:
+    def test_filters_strippable_non_compact(self):
+        result = TranscriptBuilder._parse_entry(
+            {"type": "progress", "uuid": "p1", "message": {}}
+        )
+        assert result is None
+
+    def test_keeps_compact_summary(self):
+        result = TranscriptBuilder._parse_entry(
+            {
+                "type": "summary",
+                "uuid": "cs1",
+                "isCompactSummary": True,
+                "message": {},
+            }
+        )
+        assert result is not None
+        assert result.isCompactSummary is True
+
+    def test_generates_uuid_if_missing(self):
+        result = TranscriptBuilder._parse_entry(
+            {"type": "user", "message": {"role": "user", "content": "hi"}}
+        )
+        assert result is not None
+        assert result.uuid  # Should be a generated UUID
--- a/autogpt_platform/backend/backend/copilot/transcript_test.py
+++ b/autogpt_platform/backend/backend/copilot/transcript_test.py
@@ -0,0 +1,726 @@
+"""Tests for canonical transcript module (backend.copilot.transcript).
+
+Covers pure helper functions that are not exercised by the SDK re-export tests.
+"""
+
+from __future__ import annotations
+
+from unittest.mock import MagicMock
+
+from backend.util import json
+
+from .transcript import (
+    TranscriptDownload,
+    _build_path_from_parts,
+    _find_last_assistant_entry,
+    _flatten_assistant_content,
+    _flatten_tool_result_content,
+    _messages_to_transcript,
+    _meta_storage_path_parts,
+    _rechain_tail,
+    _sanitize_id,
+    _storage_path_parts,
+    _transcript_to_messages,
+    strip_for_upload,
+    validate_transcript,
+)
+
+
+def _make_jsonl(*entries: dict) -> str:
+    return "\n".join(json.dumps(e) for e in entries) + "\n"
+
+
+# ---------------------------------------------------------------------------
+# _sanitize_id
+# ---------------------------------------------------------------------------
+
+
+class TestSanitizeId:
+    def test_uuid_passes_through(self):
+        assert _sanitize_id("abcdef12-3456-7890-abcd-ef1234567890") == (
+            "abcdef12-3456-7890-abcd-ef1234567890"
+        )
+
+    def test_strips_non_hex_characters(self):
+        # Only hex chars (0-9, a-f, A-F) and hyphens are kept
+        result = _sanitize_id("abc/../../etc/passwd")
+        assert "/" not in result
+        assert "." not in result
+        # 'p', 's', 'w' are not hex chars, so they are stripped
+        assert all(c in "0123456789abcdefABCDEF-" for c in result)
+
+    def test_truncates_to_max_len(self):
+        long_id = "a" * 100
+        result = _sanitize_id(long_id, max_len=10)
+        assert len(result) == 10
+
+    def test_empty_returns_unknown(self):
+        assert _sanitize_id("") == "unknown"
+
+    def test_none_returns_unknown(self):
+        assert _sanitize_id(None) == "unknown"  # type: ignore[arg-type]
+
+    def test_special_chars_only_returns_unknown(self):
+        assert _sanitize_id("!@#$%^&*()") == "unknown"
+
+
+# ---------------------------------------------------------------------------
+# _storage_path_parts / _meta_storage_path_parts
+# ---------------------------------------------------------------------------
+
+
+class TestStoragePathParts:
+    def test_returns_triple(self):
+        prefix, uid, fname = _storage_path_parts("user-1", "sess-2")
+        assert prefix == "chat-transcripts"
+        assert "e" in uid  # hex chars from "user-1" sanitized
+        assert fname.endswith(".jsonl")
+
+    def test_meta_returns_meta_json(self):
+        prefix, uid, fname = _meta_storage_path_parts("user-1", "sess-2")
+        assert prefix == "chat-transcripts"
+        assert fname.endswith(".meta.json")
+
+
+# ---------------------------------------------------------------------------
+# _build_path_from_parts
+# ---------------------------------------------------------------------------
+
+
+class TestBuildPathFromParts:
+    def test_gcs_backend(self):
+        from backend.util.workspace_storage import GCSWorkspaceStorage
+
+        mock_gcs = MagicMock(spec=GCSWorkspaceStorage)
+        mock_gcs.bucket_name = "my-bucket"
+        path = _build_path_from_parts(("wid", "fid", "file.jsonl"), mock_gcs)
+        assert path == "gcs://my-bucket/workspaces/wid/fid/file.jsonl"
+
+    def test_local_backend(self):
+        # Use a plain object (not MagicMock) so isinstance(GCSWorkspaceStorage) is False
+        local_backend = type("LocalBackend", (), {})()
+        path = _build_path_from_parts(("wid", "fid", "file.jsonl"), local_backend)
+        assert path == "local://wid/fid/file.jsonl"
+
+
+# ---------------------------------------------------------------------------
+# TranscriptDownload dataclass
+# ---------------------------------------------------------------------------
+
+
+class TestTranscriptDownload:
+    def test_defaults(self):
+        td = TranscriptDownload(content="hello")
+        assert td.content == "hello"
+        assert td.message_count == 0
+        assert td.uploaded_at == 0.0
+
+    def test_custom_values(self):
+        td = TranscriptDownload(content="data", message_count=5, uploaded_at=123.45)
+        assert td.message_count == 5
+        assert td.uploaded_at == 123.45
+
+
+# ---------------------------------------------------------------------------
+# _flatten_assistant_content
+# ---------------------------------------------------------------------------
+
+
+class TestFlattenAssistantContent:
+    def test_text_blocks(self):
+        blocks = [
+            {"type": "text", "text": "Hello"},
+            {"type": "text", "text": "World"},
+        ]
+        assert _flatten_assistant_content(blocks) == "Hello\nWorld"
+
+    def test_thinking_blocks_stripped(self):
+        blocks = [
+            {"type": "thinking", "thinking": "hmm..."},
+            {"type": "text", "text": "answer"},
+            {"type": "redacted_thinking", "data": "secret"},
+        ]
+        assert _flatten_assistant_content(blocks) == "answer"
+
+    def test_tool_use_blocks_stripped(self):
+        blocks = [
+            {"type": "text", "text": "I'll run a tool"},
+            {"type": "tool_use", "name": "bash", "id": "tc1", "input": {}},
+        ]
+        assert _flatten_assistant_content(blocks) == "I'll run a tool"
+
+    def test_string_blocks(self):
+        blocks = ["hello", "world"]
+        assert _flatten_assistant_content(blocks) == "hello\nworld"
+
+    def test_empty_blocks(self):
+        assert _flatten_assistant_content([]) == ""
+
+    def test_unknown_dict_blocks_skipped(self):
+        blocks = [{"type": "image", "data": "base64..."}]
+        assert _flatten_assistant_content(blocks) == ""
+
+
+# ---------------------------------------------------------------------------
+# _flatten_tool_result_content
+# ---------------------------------------------------------------------------
+
+
+class TestFlattenToolResultContent:
+    def test_tool_result_with_text_content(self):
+        blocks = [
+            {
+                "type": "tool_result",
+                "tool_use_id": "tc1",
+                "content": [{"type": "text", "text": "output data"}],
+            }
+        ]
+        assert _flatten_tool_result_content(blocks) == "output data"
+
+    def test_tool_result_with_string_content(self):
+        blocks = [
+            {"type": "tool_result", "tool_use_id": "tc1", "content": "simple string"}
+        ]
+        assert _flatten_tool_result_content(blocks) == "simple string"
+
+    def test_tool_result_with_image_placeholder(self):
+        blocks = [
+            {
+                "type": "tool_result",
+                "tool_use_id": "tc1",
+                "content": [{"type": "image", "data": "base64..."}],
+            }
+        ]
+        assert _flatten_tool_result_content(blocks) == "[__image__]"
+
+    def test_tool_result_with_document_placeholder(self):
+        blocks = [
+            {
+                "type": "tool_result",
+                "tool_use_id": "tc1",
+                "content": [{"type": "document", "data": "base64..."}],
+            }
+        ]
+        assert _flatten_tool_result_content(blocks) == "[__document__]"
+
+    def test_tool_result_with_none_content(self):
+        blocks = [{"type": "tool_result", "tool_use_id": "tc1", "content": None}]
+        assert _flatten_tool_result_content(blocks) == ""
+
+    def test_text_block_outside_tool_result(self):
+        blocks = [{"type": "text", "text": "standalone"}]
+        assert _flatten_tool_result_content(blocks) == "standalone"
+
+    def test_unknown_dict_block_placeholder(self):
+        blocks = [{"type": "custom_widget", "data": "x"}]
+        assert _flatten_tool_result_content(blocks) == "[__custom_widget__]"
+
+    def test_string_blocks(self):
+        blocks = ["raw text"]
+        assert _flatten_tool_result_content(blocks) == "raw text"
+
+    def test_empty_blocks(self):
+        assert _flatten_tool_result_content([]) == ""
+
+    def test_mixed_content_in_tool_result(self):
+        blocks = [
+            {
+                "type": "tool_result",
+                "tool_use_id": "tc1",
+                "content": [
+                    {"type": "text", "text": "line1"},
+                    {"type": "image", "data": "..."},
+                    "raw string",
+                ],
+            }
+        ]
+        result = _flatten_tool_result_content(blocks)
+        assert "line1" in result
+        assert "[__image__]" in result
+        assert "raw string" in result
+
+    def test_tool_result_with_dict_without_text_key(self):
+        blocks = [
+            {
+                "type": "tool_result",
+                "tool_use_id": "tc1",
+                "content": [{"count": 42}],
+            }
+        ]
+        result = _flatten_tool_result_content(blocks)
+        assert "42" in result
+
+    def test_tool_result_text_block_with_none_text(self):
+        blocks = [
+            {
+                "type": "tool_result",
+                "tool_use_id": "tc1",
+                "content": [{"type": "text", "text": None}],
+            }
+        ]
+        result = _flatten_tool_result_content(blocks)
+        assert result == "None"
+
+
+# ---------------------------------------------------------------------------
+# _transcript_to_messages
+# ---------------------------------------------------------------------------
+
+USER_ENTRY = {
+    "type": "user",
+    "uuid": "u1",
+    "parentUuid": "",
+    "message": {"role": "user", "content": "hello"},
+}
+ASST_ENTRY = {
+    "type": "assistant",
+    "uuid": "a1",
+    "parentUuid": "u1",
+    "message": {
+        "role": "assistant",
+        "id": "msg_1",
+        "content": [{"type": "text", "text": "hi there"}],
+    },
+}
+PROGRESS_ENTRY = {
+    "type": "progress",
+    "uuid": "p1",
+    "parentUuid": "u1",
+    "data": {},
+}
+
+
+class TestTranscriptToMessages:
+    def test_basic_conversion(self):
+        content = _make_jsonl(USER_ENTRY, ASST_ENTRY)
+        messages = _transcript_to_messages(content)
+        assert len(messages) == 2
+        assert messages[0] == {"role": "user", "content": "hello"}
+        assert messages[1]["role"] == "assistant"
+        assert messages[1]["content"] == "hi there"
+
+    def test_skips_strippable_types(self):
+        content = _make_jsonl(USER_ENTRY, PROGRESS_ENTRY, ASST_ENTRY)
+        messages = _transcript_to_messages(content)
+        assert len(messages) == 2
+
+    def test_skips_entries_without_role(self):
+        no_role = {"type": "user", "uuid": "x", "message": {"content": "no role"}}
+        content = _make_jsonl(no_role)
+        messages = _transcript_to_messages(content)
+        assert len(messages) == 0
+
+    def test_handles_string_content(self):
+        entry = {
+            "type": "user",
+            "uuid": "u1",
+            "message": {"role": "user", "content": "plain string"},
+        }
+        content = _make_jsonl(entry)
+        messages = _transcript_to_messages(content)
+        assert messages[0]["content"] == "plain string"
+
+    def test_handles_tool_result_content(self):
+        entry = {
+            "type": "user",
+            "uuid": "u1",
+            "message": {
+                "role": "user",
+                "content": [
+                    {"type": "tool_result", "tool_use_id": "tc1", "content": "output"}
+                ],
+            },
+        }
+        content = _make_jsonl(entry)
+        messages = _transcript_to_messages(content)
+        assert messages[0]["content"] == "output"
+
+    def test_handles_none_content(self):
+        entry = {
+            "type": "assistant",
+            "uuid": "a1",
+            "message": {"role": "assistant", "content": None},
+        }
+        content = _make_jsonl(entry)
+        messages = _transcript_to_messages(content)
+        assert messages[0]["content"] == ""
+
+    def test_skips_invalid_json(self):
+        content = "not valid json\n"
+        messages = _transcript_to_messages(content)
+        assert len(messages) == 0
+
+    def test_preserves_compact_summary(self):
+        compact = {
+            "type": "summary",
+            "uuid": "cs1",
+            "isCompactSummary": True,
+            "message": {"role": "user", "content": "summary of conversation"},
+        }
+        content = _make_jsonl(compact)
+        messages = _transcript_to_messages(content)
+        assert len(messages) == 1
+
+    def test_strips_summary_without_compact_flag(self):
+        summary = {
+            "type": "summary",
+            "uuid": "s1",
+            "message": {"role": "user", "content": "summary"},
+        }
+        content = _make_jsonl(summary)
+        messages = _transcript_to_messages(content)
+        assert len(messages) == 0
+
+
+# ---------------------------------------------------------------------------
+# _messages_to_transcript
+# ---------------------------------------------------------------------------
+
+
+class TestMessagesToTranscript:
+    def test_basic_roundtrip(self):
+        messages = [
+            {"role": "user", "content": "hello"},
+            {"role": "assistant", "content": "world"},
+        ]
+        result = _messages_to_transcript(messages)
+        assert result.endswith("\n")
+        lines = result.strip().split("\n")
+        assert len(lines) == 2
+
+        user_entry = json.loads(lines[0])
+        assert user_entry["type"] == "user"
+        assert user_entry["message"]["role"] == "user"
+        assert user_entry["message"]["content"] == "hello"
+        assert user_entry["parentUuid"] == ""
+
+        asst_entry = json.loads(lines[1])
+        assert asst_entry["type"] == "assistant"
+        assert asst_entry["message"]["role"] == "assistant"
+        assert asst_entry["message"]["content"] == [{"type": "text", "text": "world"}]
+        assert asst_entry["parentUuid"] == user_entry["uuid"]
+
+    def test_empty_messages(self):
+        assert _messages_to_transcript([]) == ""
+
+    def test_assistant_has_message_envelope(self):
+        messages = [{"role": "assistant", "content": "test"}]
+        result = _messages_to_transcript(messages)
+        entry = json.loads(result.strip())
+        msg = entry["message"]
+        assert "id" in msg
+        assert msg["id"].startswith("msg_compact_")
+        assert msg["type"] == "message"
+        assert msg["stop_reason"] == "end_turn"
+        assert msg["stop_sequence"] is None
+
+    def test_uuid_chain(self):
+        messages = [
+            {"role": "user", "content": "a"},
+            {"role": "assistant", "content": "b"},
+            {"role": "user", "content": "c"},
+        ]
+        result = _messages_to_transcript(messages)
+        lines = result.strip().split("\n")
+        entries = [json.loads(line) for line in lines]
+        assert entries[0]["parentUuid"] == ""
+        assert entries[1]["parentUuid"] == entries[0]["uuid"]
+        assert entries[2]["parentUuid"] == entries[1]["uuid"]
+
+    def test_assistant_with_empty_content(self):
+        messages = [{"role": "assistant", "content": ""}]
+        result = _messages_to_transcript(messages)
+        entry = json.loads(result.strip())
+        assert entry["message"]["content"] == []
+
+
+# ---------------------------------------------------------------------------
+# _find_last_assistant_entry
+# ---------------------------------------------------------------------------
+
+
+class TestFindLastAssistantEntry:
+    def test_splits_at_last_assistant(self):
+        user = {
+            "type": "user",
+            "uuid": "u1",
+            "message": {"role": "user", "content": "hi"},
+        }
+        asst = {
+            "type": "assistant",
+            "uuid": "a1",
+            "message": {"role": "assistant", "id": "msg1", "content": "answer"},
+        }
+        content = _make_jsonl(user, asst)
+        prefix, tail = _find_last_assistant_entry(content)
+        assert len(prefix) == 1
+        assert len(tail) == 1
+
+    def test_no_assistant_returns_all_in_prefix(self):
+        user1 = {
+            "type": "user",
+            "uuid": "u1",
+            "message": {"role": "user", "content": "hi"},
+        }
+        user2 = {
+            "type": "user",
+            "uuid": "u2",
+            "message": {"role": "user", "content": "hey"},
+        }
+        content = _make_jsonl(user1, user2)
+        prefix, tail = _find_last_assistant_entry(content)
+        assert len(prefix) == 2
+        assert len(tail) == 0
+
+    def test_multi_entry_turn_preserved(self):
+        user = {
+            "type": "user",
+            "uuid": "u1",
+            "message": {"role": "user", "content": "q"},
+        }
+        asst1 = {
+            "type": "assistant",
+            "uuid": "a1",
+            "message": {
+                "role": "assistant",
+                "id": "msg_turn",
+                "content": [{"type": "thinking", "thinking": "hmm"}],
+            },
+        }
+        asst2 = {
+            "type": "assistant",
+            "uuid": "a2",
+            "message": {
+                "role": "assistant",
+                "id": "msg_turn",
+                "content": [{"type": "text", "text": "answer"}],
+            },
+        }
+        content = _make_jsonl(user, asst1, asst2)
+        prefix, tail = _find_last_assistant_entry(content)
+        assert len(prefix) == 1  # just the user
+        assert len(tail) == 2  # both assistant entries
+
+    def test_assistant_without_id(self):
+        user = {
+            "type": "user",
+            "uuid": "u1",
+            "message": {"role": "user", "content": "q"},
+        }
+        asst = {
+            "type": "assistant",
+            "uuid": "a1",
+            "message": {"role": "assistant", "content": "no id"},
+        }
+        content = _make_jsonl(user, asst)
+        prefix, tail = _find_last_assistant_entry(content)
+        assert len(prefix) == 1
+        assert len(tail) == 1
+
+    def test_trailing_user_after_assistant(self):
+        user1 = {
+            "type": "user",
+            "uuid": "u1",
+            "message": {"role": "user", "content": "q"},
+        }
+        asst = {
+            "type": "assistant",
+            "uuid": "a1",
+            "message": {"role": "assistant", "id": "msg1", "content": "a"},
+        }
+        user2 = {
+            "type": "user",
+            "uuid": "u2",
+            "message": {"role": "user", "content": "follow"},
+        }
+        content = _make_jsonl(user1, asst, user2)
+        prefix, tail = _find_last_assistant_entry(content)
+        assert len(prefix) == 1  # user1
+        assert len(tail) == 2  # asst + user2
+
+
+# ---------------------------------------------------------------------------
+# _rechain_tail
+# ---------------------------------------------------------------------------
+
+
+class TestRechainTail:
+    def test_empty_tail(self):
+        assert _rechain_tail("some prefix\n", []) == ""
+
+    def test_patches_first_entry_parent(self):
+        prefix_entry = {"uuid": "last-prefix-uuid", "type": "user", "message": {}}
+        prefix = json.dumps(prefix_entry) + "\n"
+
+        tail_entry = {
+            "uuid": "t1",
+            "parentUuid": "old-parent",
+            "type": "assistant",
+            "message": {},
+        }
+        tail_lines = [json.dumps(tail_entry)]
+
+        result = _rechain_tail(prefix, tail_lines)
+        parsed = json.loads(result.strip())
+        assert parsed["parentUuid"] == "last-prefix-uuid"
+
+    def test_chains_consecutive_tail_entries(self):
+        prefix_entry = {"uuid": "p1", "type": "user", "message": {}}
+        prefix = json.dumps(prefix_entry) + "\n"
+
+        t1 = {"uuid": "t1", "parentUuid": "old1", "type": "assistant", "message": {}}
+        t2 = {"uuid": "t2", "parentUuid": "old2", "type": "user", "message": {}}
+        tail_lines = [json.dumps(t1), json.dumps(t2)]
+
+        result = _rechain_tail(prefix, tail_lines)
+        entries = [json.loads(line) for line in result.strip().split("\n")]
+        assert entries[0]["parentUuid"] == "p1"
+        assert entries[1]["parentUuid"] == "t1"
+
+    def test_non_dict_lines_passed_through(self):
+        prefix_entry = {"uuid": "p1", "type": "user", "message": {}}
+        prefix = json.dumps(prefix_entry) + "\n"
+
+        tail_lines = ["not-a-json-dict"]
+        result = _rechain_tail(prefix, tail_lines)
+        assert "not-a-json-dict" in result
+
+
+# ---------------------------------------------------------------------------
+# strip_for_upload (combined single-parse)
+# ---------------------------------------------------------------------------
+
+
+class TestStripForUpload:
+    def test_strips_progress_and_thinking(self):
+        user = {
+            "type": "user",
+            "uuid": "u1",
+            "parentUuid": "",
+            "message": {"role": "user", "content": "hi"},
+        }
+        progress = {"type": "progress", "uuid": "p1", "parentUuid": "u1", "data": {}}
+        asst_old = {
+            "type": "assistant",
+            "uuid": "a1",
+            "parentUuid": "p1",
+            "message": {
+                "role": "assistant",
+                "id": "msg_old",
+                "content": [
+                    {"type": "thinking", "thinking": "stale thinking"},
+                    {"type": "text", "text": "old answer"},
+                ],
+            },
+        }
+        user2 = {
+            "type": "user",
+            "uuid": "u2",
+            "parentUuid": "a1",
+            "message": {"role": "user", "content": "next"},
+        }
+        asst_new = {
+            "type": "assistant",
+            "uuid": "a2",
+            "parentUuid": "u2",
+            "message": {
+                "role": "assistant",
+                "id": "msg_new",
+                "content": [
+                    {"type": "thinking", "thinking": "fresh thinking"},
+                    {"type": "text", "text": "new answer"},
+                ],
+            },
+        }
+        content = _make_jsonl(user, progress, asst_old, user2, asst_new)
+        result = strip_for_upload(content)
+
+        lines = result.strip().split("\n")
+        # Progress should be stripped -> 4 entries remain
+        assert len(lines) == 4
+
+        # Old assistant's parent (progress) was stripped, so no progress entries remain
+        entries = [json.loads(line) for line in lines]
+        types = [e.get("type") for e in entries]
+        assert "progress" not in types
+
+        # Old assistant thinking stripped, new assistant thinking preserved
+        old_asst = next(
+            e for e in entries if e.get("message", {}).get("id") == "msg_old"
+        )
+        old_content = old_asst["message"]["content"]
+        old_types = [b["type"] for b in old_content if isinstance(b, dict)]
+        assert "thinking" not in old_types
+        assert "text" in old_types
+
+        new_asst = next(
+            e for e in entries if e.get("message", {}).get("id") == "msg_new"
+        )
+        new_content = new_asst["message"]["content"]
+        new_types = [b["type"] for b in new_content if isinstance(b, dict)]
+        assert "thinking" in new_types  # last assistant preserved
+
+    def test_empty_content(self):
+        result = strip_for_upload("")
+        # Empty string produces a single empty line after split, resulting in "\n"
+        assert result.strip() == ""
+
+    def test_preserves_compact_summary(self):
+        compact = {
+            "type": "summary",
+            "uuid": "cs1",
+            "isCompactSummary": True,
+            "message": {"role": "user", "content": "summary"},
+        }
+        asst = {
+            "type": "assistant",
+            "uuid": "a1",
+            "parentUuid": "cs1",
+            "message": {"role": "assistant", "id": "msg1", "content": "answer"},
+        }
+        content = _make_jsonl(compact, asst)
+        result = strip_for_upload(content)
+        lines = result.strip().split("\n")
+        assert len(lines) == 2
+
+    def test_no_assistant_entries(self):
+        user = {
+            "type": "user",
+            "uuid": "u1",
+            "message": {"role": "user", "content": "hi"},
+        }
+        content = _make_jsonl(user)
+        result = strip_for_upload(content)
+        lines = result.strip().split("\n")
+        assert len(lines) == 1
+
+
+# ---------------------------------------------------------------------------
+# validate_transcript (additional edge cases)
+# ---------------------------------------------------------------------------
+
+
+class TestValidateTranscript:
+    def test_valid_with_assistant(self):
+        content = _make_jsonl(
+            USER_ENTRY,
+            ASST_ENTRY,
+        )
+        assert validate_transcript(content) is True
+
+    def test_none_returns_false(self):
+        assert validate_transcript(None) is False
+
+    def test_whitespace_only_returns_false(self):
+        assert validate_transcript("   \n  ") is False
+
+    def test_no_assistant_returns_false(self):
+        content = _make_jsonl(USER_ENTRY)
+        assert validate_transcript(content) is False
+
+    def test_invalid_json_returns_false(self):
+        assert validate_transcript("not json\n") is False
+
+    def test_assistant_only_is_valid(self):
+        content = _make_jsonl(ASST_ENTRY)
+        assert validate_transcript(content) is True
--- a/autogpt_platform/backend/backend/data/block_cost_config.py
+++ b/autogpt_platform/backend/backend/data/block_cost_config.py
@@ -147,6 +147,19 @@ MODEL_COST: dict[LlmModel, int] = {
    LlmModel.KIMI_K2: 1,
    LlmModel.QWEN3_235B_A22B_THINKING: 1,
    LlmModel.QWEN3_CODER: 9,
+    # Z.ai (Zhipu) models
+    LlmModel.ZAI_GLM_4_32B: 1,
+    LlmModel.ZAI_GLM_4_5: 2,
+    LlmModel.ZAI_GLM_4_5_AIR: 1,
+    LlmModel.ZAI_GLM_4_5_AIR_FREE: 1,
+    LlmModel.ZAI_GLM_4_5V: 2,
+    LlmModel.ZAI_GLM_4_6: 1,
+    LlmModel.ZAI_GLM_4_6V: 1,
+    LlmModel.ZAI_GLM_4_7: 1,
+    LlmModel.ZAI_GLM_4_7_FLASH: 1,
+    LlmModel.ZAI_GLM_5: 2,
+    LlmModel.ZAI_GLM_5_TURBO: 4,
+    LlmModel.ZAI_GLM_5V_TURBO: 4,
    # v0 by Vercel models
    LlmModel.V0_1_5_MD: 1,
    LlmModel.V0_1_5_LG: 2,
--- a/autogpt_platform/backend/backend/data/db_accessors.py
+++ b/autogpt_platform/backend/backend/data/db_accessors.py
@@ -142,3 +142,9 @@ def credit_db():
        credit_db = get_database_manager_async_client()

    return credit_db
+
+
+def platform_cost_db():
+    from backend.util.clients import get_database_manager_async_client
+
+    return get_database_manager_async_client()
--- a/autogpt_platform/backend/backend/data/db_manager.py
+++ b/autogpt_platform/backend/backend/data/db_manager.py
@@ -96,6 +96,7 @@ from backend.data.notifications import (
    remove_notifications_from_batch,
 )
 from backend.data.onboarding import increment_onboarding_runs
+from backend.data.platform_cost import log_platform_cost
 from backend.data.understanding import (
    get_business_understanding,
    upsert_business_understanding,
@@ -332,6 +333,9 @@ class DatabaseManager(AppService):
    get_blocks_needing_optimization = _(get_blocks_needing_optimization)
    update_block_optimized_description = _(update_block_optimized_description)

+    # ============ Platform Cost Tracking ============ #
+    log_platform_cost = _(log_platform_cost)
+
    # ============ CoPilot Chat Sessions ============ #
    get_chat_session = _(chat_db.get_chat_session)
    create_chat_session = _(chat_db.create_chat_session)
@@ -529,6 +533,9 @@ class DatabaseManagerAsyncClient(AppServiceClient):
    # ============ Block Descriptions ============ #
    get_blocks_needing_optimization = d.get_blocks_needing_optimization

+    # ============ Platform Cost Tracking ============ #
+    log_platform_cost = d.log_platform_cost
+
    # ============ CoPilot Chat Sessions ============ #
    get_chat_session = d.get_chat_session
    create_chat_session = d.create_chat_session
--- a/autogpt_platform/backend/backend/data/model.py
+++ b/autogpt_platform/backend/backend/data/model.py
@@ -104,6 +104,11 @@ class User(BaseModel):
        description="User timezone (IANA timezone identifier or 'not-set')",
    )

+    # Subscription / rate-limit tier
+    subscription_tier: str | None = Field(
+        default=None, description="Subscription tier (FREE, PRO, BUSINESS, ENTERPRISE)"
+    )
+
    @classmethod
    def from_db(cls, prisma_user: "PrismaUser") -> "User":
        """Convert a database User object to application User model."""
@@ -158,6 +163,7 @@ class User(BaseModel):
            notify_on_weekly_summary=prisma_user.notifyOnWeeklySummary or True,
            notify_on_monthly_summary=prisma_user.notifyOnMonthlySummary or True,
            timezone=prisma_user.timezone or USER_TIMEZONE_NOT_SET,
+            subscription_tier=prisma_user.subscriptionTier,
        )


@@ -819,6 +825,17 @@ class RefundRequest(BaseModel):
    updated_at: datetime


+ProviderCostType = Literal[
+    "cost_usd",  # Actual USD cost reported by the provider
+    "tokens",  # LLM token counts (sum of input + output)
+    "characters",  # Per-character billing (TTS providers)
+    "sandbox_seconds",  # Per-second compute billing (e.g. E2B)
+    "walltime_seconds",  # Per-second billing incl. queue/polling
+    "per_run",  # Per-API-call billing with fixed cost
+    "items",  # Per-item billing (lead/organization/result count)
+]
+
+
 class NodeExecutionStats(BaseModel):
    """Execution statistics for a node execution."""

@@ -838,32 +855,39 @@ class NodeExecutionStats(BaseModel):
    output_token_count: int = 0
    extra_cost: int = 0
    extra_steps: int = 0
+    provider_cost: float | None = None
+    # Type of the provider-reported cost/usage captured above. When set
+    # by a block, resolve_tracking honors this directly instead of
+    # guessing from provider name.
+    provider_cost_type: Optional[ProviderCostType] = None
    # Moderation fields
    cleared_inputs: Optional[dict[str, list[str]]] = None
    cleared_outputs: Optional[dict[str, list[str]]] = None

    def __iadd__(self, other: "NodeExecutionStats") -> "NodeExecutionStats":
-        """Mutate this instance by adding another NodeExecutionStats."""
+        """Mutate this instance by adding another NodeExecutionStats.
+
+        Avoids calling model_dump() twice per merge (this runs on every
+        merge_stats() call from ~20+ blocks); reads fields via getattr instead.
+        """
        if not isinstance(other, NodeExecutionStats):
            return NotImplemented

-        stats_dict = other.model_dump()
-        current_stats = self.model_dump()
-
-        for key, value in stats_dict.items():
-            if key not in current_stats:
-                # Field doesn't exist yet, just set it
+        for key in type(other).model_fields:
+            value = getattr(other, key)
+            if value is None:
+                # Never overwrite an existing value with None
+                continue
+            current = getattr(self, key, None)
+            if current is None:
+                # Field doesn't exist yet or is None, just set it
                setattr(self, key, value)
-            elif isinstance(value, dict) and isinstance(current_stats[key], dict):
-                current_stats[key].update(value)
-                setattr(self, key, current_stats[key])
-            elif isinstance(value, (int, float)) and isinstance(
-                current_stats[key], (int, float)
-            ):
-                setattr(self, key, current_stats[key] + value)
-            elif isinstance(value, list) and isinstance(current_stats[key], list):
-                current_stats[key].extend(value)
-                setattr(self, key, current_stats[key])
+            elif isinstance(value, dict) and isinstance(current, dict):
+                current.update(value)
+            elif isinstance(value, (int, float)) and isinstance(current, (int, float)):
+                setattr(self, key, current + value)
+            elif isinstance(value, list) and isinstance(current, list):
+                current.extend(value)
            else:
                setattr(self, key, value)

--- a/autogpt_platform/backend/backend/data/model_test.py
+++ b/autogpt_platform/backend/backend/data/model_test.py
@@ -1,7 +1,7 @@
 import pytest
 from pydantic import SecretStr

-from backend.data.model import HostScopedCredentials
+from backend.data.model import HostScopedCredentials, NodeExecutionStats


 class TestHostScopedCredentials:
@@ -166,3 +166,84 @@ class TestHostScopedCredentials:
        )

        assert creds.matches_url(test_url) == expected
+
+
+class TestNodeExecutionStatsIadd:
+    def test_adds_numeric_fields(self):
+        a = NodeExecutionStats(input_token_count=100, output_token_count=50)
+        b = NodeExecutionStats(input_token_count=200, output_token_count=30)
+        a += b
+        assert a.input_token_count == 300
+        assert a.output_token_count == 80
+
+    def test_none_does_not_overwrite(self):
+        a = NodeExecutionStats(provider_cost=0.5, error="some error")
+        b = NodeExecutionStats(provider_cost=None, error=None)
+        a += b
+        assert a.provider_cost == 0.5
+        assert a.error == "some error"
+
+    def test_none_is_skipped_preserving_existing_value(self):
+        a = NodeExecutionStats(input_token_count=100)
+        b = NodeExecutionStats()
+        a += b
+        assert a.input_token_count == 100
+
+    def test_dict_fields_are_merged(self):
+        a = NodeExecutionStats(
+            cleared_inputs={"field1": ["val1"]},
+        )
+        b = NodeExecutionStats(
+            cleared_inputs={"field2": ["val2"]},
+        )
+        a += b
+        assert a.cleared_inputs == {"field1": ["val1"], "field2": ["val2"]}
+
+    def test_returns_self(self):
+        a = NodeExecutionStats()
+        b = NodeExecutionStats(input_token_count=10)
+        result = a.__iadd__(b)
+        assert result is a
+
+    def test_not_implemented_for_non_stats(self):
+        a = NodeExecutionStats()
+        result = a.__iadd__("not a stats")  # type: ignore[arg-type]
+        assert result is NotImplemented
+
+    def test_error_none_does_not_clear_existing_error(self):
+        a = NodeExecutionStats(error="existing error")
+        b = NodeExecutionStats(error=None)
+        a += b
+        assert a.error == "existing error"
+
+    def test_provider_cost_none_does_not_clear_existing_cost(self):
+        a = NodeExecutionStats(provider_cost=0.05)
+        b = NodeExecutionStats(provider_cost=None)
+        a += b
+        assert a.provider_cost == 0.05
+
+    def test_provider_cost_accumulates_when_both_set(self):
+        a = NodeExecutionStats(provider_cost=0.01)
+        b = NodeExecutionStats(provider_cost=0.02)
+        a += b
+        assert abs((a.provider_cost or 0) - 0.03) < 1e-9
+
+    def test_provider_cost_first_write_from_none(self):
+        a = NodeExecutionStats()
+        b = NodeExecutionStats(provider_cost=0.05)
+        a += b
+        assert a.provider_cost == 0.05
+
+    def test_provider_cost_type_first_write_from_none(self):
+        """Writing provider_cost_type into a stats with None sets it."""
+        a = NodeExecutionStats()
+        b = NodeExecutionStats(provider_cost_type="characters")
+        a += b
+        assert a.provider_cost_type == "characters"
+
+    def test_provider_cost_type_none_does_not_overwrite(self):
+        """A None provider_cost_type from other must not clear an existing value."""
+        a = NodeExecutionStats(provider_cost_type="tokens")
+        b = NodeExecutionStats()
+        a += b
+        assert a.provider_cost_type == "tokens"
--- a/autogpt_platform/backend/backend/data/platform_cost.py
+++ b/autogpt_platform/backend/backend/data/platform_cost.py
@@ -0,0 +1,390 @@
+import asyncio
+import json
+import logging
+from datetime import datetime, timedelta, timezone
+from typing import Any
+
+from pydantic import BaseModel
+
+from backend.data.db import execute_raw_with_schema, query_raw_with_schema
+
+logger = logging.getLogger(__name__)
+
+MICRODOLLARS_PER_USD = 1_000_000
+
+# Dashboard query limits, interpolated into the LIMIT clauses of the SQL below
+MAX_PROVIDER_ROWS = 500
+MAX_USER_ROWS = 100
+
+# Default date range for dashboard queries when no start date is provided.
+# Prevents full-table scans on large deployments.
+DEFAULT_DASHBOARD_DAYS = 30
+
+
+def usd_to_microdollars(cost_usd: float | None) -> int | None:
+    """Convert a USD amount (float) to microdollars (int). None-safe."""
+    if cost_usd is None:
+        return None
+    return round(cost_usd * MICRODOLLARS_PER_USD)
+
+
+class PlatformCostEntry(BaseModel):
+    user_id: str
+    graph_exec_id: str | None = None
+    node_exec_id: str | None = None
+    graph_id: str | None = None
+    node_id: str | None = None
+    block_id: str
+    block_name: str
+    provider: str
+    credential_id: str
+    cost_microdollars: int | None = None
+    input_tokens: int | None = None
+    output_tokens: int | None = None
+    data_size: int | None = None
+    duration: float | None = None
+    model: str | None = None
+    tracking_type: str | None = None
+    tracking_amount: float | None = None
+    metadata: dict[str, Any] | None = None
+
+
+async def log_platform_cost(entry: PlatformCostEntry) -> None:
+    await execute_raw_with_schema(
+        """
+        INSERT INTO {schema_prefix}"PlatformCostLog"
+            ("id", "createdAt", "userId", "graphExecId", "nodeExecId",
+             "graphId", "nodeId", "blockId", "blockName", "provider",
+             "credentialId", "costMicrodollars", "inputTokens", "outputTokens",
+             "dataSize", "duration", "model", "trackingType", "trackingAmount",
+             "metadata")
+        VALUES (
+            gen_random_uuid(), NOW(), $1, $2, $3, $4, $5, $6, $7, $8, $9,
+            $10, $11, $12, $13, $14, $15, $16, $17, $18::jsonb
+        )
+        """,
+        entry.user_id,
+        entry.graph_exec_id,
+        entry.node_exec_id,
+        entry.graph_id,
+        entry.node_id,
+        entry.block_id,
+        entry.block_name,
+        # Normalize to lowercase so the (provider, createdAt) index is always
+        # used without LOWER() on the read side.
+        entry.provider.lower(),
+        entry.credential_id,
+        entry.cost_microdollars,
+        entry.input_tokens,
+        entry.output_tokens,
+        entry.data_size,
+        entry.duration,
+        entry.model,
+        entry.tracking_type,
+        entry.tracking_amount,
+        _json_or_none(entry.metadata),
+    )
+
+
+# Bound the number of concurrent cost-log DB inserts to prevent unbounded
+# task/connection growth under sustained load or DB slowness.
+_log_semaphore = asyncio.Semaphore(50)
+
+
+async def log_platform_cost_safe(entry: PlatformCostEntry) -> None:
+    """Fire-and-forget wrapper that never raises."""
+    try:
+        async with _log_semaphore:
+            await log_platform_cost(entry)
+    except Exception:
+        logger.exception(
+            "Failed to log platform cost for user=%s provider=%s block=%s",
+            entry.user_id,
+            entry.provider,
+            entry.block_name,
+        )
+
+
+def _json_or_none(data: dict[str, Any] | None) -> str | None:
+    if data is None:
+        return None
+    return json.dumps(data)
+
+
+def _mask_email(email: str | None) -> str | None:
+    """Mask an email address to reduce PII exposure in admin API responses.
+
+    Turns 'user@example.com' into 'us***@example.com'.
+    Handles short local parts gracefully (e.g. 'a@b.com' → 'a***@b.com').
+    """
+    if not email:
+        return email
+    at = email.find("@")
+    if at < 0:
+        return "***"
+    local = email[:at]
+    domain = email[at:]
+    visible = local[:2] if len(local) >= 2 else local[:1]
+    return f"{visible}***{domain}"
+
+
+class ProviderCostSummary(BaseModel):
+    provider: str
+    tracking_type: str | None = None
+    total_cost_microdollars: int
+    total_input_tokens: int
+    total_output_tokens: int
+    total_duration_seconds: float = 0.0
+    total_tracking_amount: float = 0.0
+    request_count: int
+
+
+class UserCostSummary(BaseModel):
+    user_id: str | None = None
+    email: str | None = None
+    total_cost_microdollars: int
+    total_input_tokens: int
+    total_output_tokens: int
+    request_count: int
+
+
+class CostLogRow(BaseModel):
+    id: str
+    created_at: datetime
+    user_id: str | None = None
+    email: str | None = None
+    graph_exec_id: str | None = None
+    node_exec_id: str | None = None
+    block_name: str
+    provider: str
+    tracking_type: str | None = None
+    cost_microdollars: int | None = None
+    input_tokens: int | None = None
+    output_tokens: int | None = None
+    duration: float | None = None
+    model: str | None = None
+
+
+class PlatformCostDashboard(BaseModel):
+    by_provider: list[ProviderCostSummary]
+    by_user: list[UserCostSummary]
+    total_cost_microdollars: int
+    total_requests: int
+    total_users: int
+
+
+def _build_where(
+    start: datetime | None,
+    end: datetime | None,
+    provider: str | None,
+    user_id: str | None,
+    table_alias: str = "",
+) -> tuple[str, list[Any]]:
+    prefix = f"{table_alias}." if table_alias else ""
+    clauses: list[str] = []
+    params: list[Any] = []
+    idx = 1
+
+    if start:
+        clauses.append(f'{prefix}"createdAt" >= ${idx}::timestamptz')
+        params.append(start)
+        idx += 1
+    if end:
+        clauses.append(f'{prefix}"createdAt" <= ${idx}::timestamptz')
+        params.append(end)
+        idx += 1
+    if provider:
+        # Provider names are normalized to lowercase at write time so a plain
+        # equality check is sufficient and the (provider, createdAt) index is used.
+        clauses.append(f'{prefix}"provider" = ${idx}')
+        params.append(provider.lower())
+        idx += 1
+    if user_id:
+        clauses.append(f'{prefix}"userId" = ${idx}')
+        params.append(user_id)
+        idx += 1
+
+    return (" AND ".join(clauses) if clauses else "TRUE", params)
+
+
+async def get_platform_cost_dashboard(
+    start: datetime | None = None,
+    end: datetime | None = None,
+    provider: str | None = None,
+    user_id: str | None = None,
+) -> PlatformCostDashboard:
+    """Aggregate platform cost logs for the admin dashboard.
+
+    Note: by_provider rows are keyed on (provider, tracking_type). A single
+    provider can therefore appear in multiple rows if it has entries with
+    different billing models (e.g. "openai" with both "tokens" and "cost_usd"
+    if pricing is later added for some entries). Frontend treats each row
+    independently rather than as a provider primary key.
+
+    Defaults to the last DEFAULT_DASHBOARD_DAYS days when no start date is
+    provided to avoid full-table scans on large deployments.
+    """
+    if start is None:
+        start = datetime.now(timezone.utc) - timedelta(days=DEFAULT_DASHBOARD_DAYS)
+    where_p, params_p = _build_where(start, end, provider, user_id, "p")
+
+    by_provider_rows, by_user_rows, total_user_rows = await asyncio.gather(
+        query_raw_with_schema(
+            f"""
+            SELECT
+                p."provider",
+                p."trackingType" AS tracking_type,
+                COALESCE(SUM(p."costMicrodollars"), 0)::bigint AS total_cost,
+                COALESCE(SUM(p."inputTokens"), 0)::bigint AS total_input_tokens,
+                COALESCE(SUM(p."outputTokens"), 0)::bigint AS total_output_tokens,
+                COALESCE(SUM(p."duration"), 0)::float AS total_duration,
+                COALESCE(SUM(p."trackingAmount"), 0)::float AS total_tracking_amount,
+                COUNT(*)::bigint AS request_count
+            FROM {{schema_prefix}}"PlatformCostLog" p
+            WHERE {where_p}
+            GROUP BY p."provider", p."trackingType"
+            ORDER BY total_cost DESC
+            LIMIT {MAX_PROVIDER_ROWS}
+            """,
+            *params_p,
+        ),
+        query_raw_with_schema(
+            f"""
+            SELECT
+                p."userId" AS user_id,
+                u."email",
+                COALESCE(SUM(p."costMicrodollars"), 0)::bigint AS total_cost,
+                COALESCE(SUM(p."inputTokens"), 0)::bigint AS total_input_tokens,
+                COALESCE(SUM(p."outputTokens"), 0)::bigint AS total_output_tokens,
+                COUNT(*)::bigint AS request_count
+            FROM {{schema_prefix}}"PlatformCostLog" p
+            LEFT JOIN {{schema_prefix}}"User" u ON u."id" = p."userId"
+            WHERE {where_p}
+            GROUP BY p."userId", u."email"
+            ORDER BY total_cost DESC
+            LIMIT {MAX_USER_ROWS}
+            """,
+            *params_p,
+        ),
+        query_raw_with_schema(
+            f"""
+            SELECT COUNT(DISTINCT p."userId")::bigint AS cnt
+            FROM {{schema_prefix}}"PlatformCostLog" p
+            WHERE {where_p}
+            """,
+            *params_p,
+        ),
+    )
+
+    # Use the exact COUNT(DISTINCT userId) so total_users is not capped at
+    # MAX_USER_ROWS (which would silently report 100 for >100 active users).
+    total_users = int(total_user_rows[0]["cnt"]) if total_user_rows else 0
+    total_cost = sum(r["total_cost"] for r in by_provider_rows)
+    total_requests = sum(r["request_count"] for r in by_provider_rows)
+
+    return PlatformCostDashboard(
+        by_provider=[
+            ProviderCostSummary(
+                provider=r["provider"],
+                tracking_type=r.get("tracking_type"),
+                total_cost_microdollars=r["total_cost"],
+                total_input_tokens=r["total_input_tokens"],
+                total_output_tokens=r["total_output_tokens"],
+                total_duration_seconds=r.get("total_duration", 0.0),
+                total_tracking_amount=r.get("total_tracking_amount", 0.0),
+                request_count=r["request_count"],
+            )
+            for r in by_provider_rows
+        ],
+        by_user=[
+            UserCostSummary(
+                user_id=r.get("user_id"),
+                email=_mask_email(r.get("email")),
+                total_cost_microdollars=r["total_cost"],
+                total_input_tokens=r["total_input_tokens"],
+                total_output_tokens=r["total_output_tokens"],
+                request_count=r["request_count"],
+            )
+            for r in by_user_rows
+        ],
+        total_cost_microdollars=total_cost,
+        total_requests=total_requests,
+        total_users=total_users,
+    )
+
+
+async def get_platform_cost_logs(
+    start: datetime | None = None,
+    end: datetime | None = None,
+    provider: str | None = None,
+    user_id: str | None = None,
+    page: int = 1,
+    page_size: int = 50,
+) -> tuple[list[CostLogRow], int]:
+    if start is None:
+        start = datetime.now(tz=timezone.utc) - timedelta(days=DEFAULT_DASHBOARD_DAYS)
+    where_sql, params = _build_where(start, end, provider, user_id, "p")
+
+    offset = (page - 1) * page_size
+    limit_idx = len(params) + 1
+    offset_idx = len(params) + 2
+
+    count_rows, rows = await asyncio.gather(
+        query_raw_with_schema(
+            f"""
+            SELECT COUNT(*)::bigint AS cnt
+            FROM {{schema_prefix}}"PlatformCostLog" p
+            WHERE {where_sql}
+            """,
+            *params,
+        ),
+        query_raw_with_schema(
+            f"""
+            SELECT
+                p."id",
+                p."createdAt" AS created_at,
+                p."userId" AS user_id,
+                u."email",
+                p."graphExecId" AS graph_exec_id,
+                p."nodeExecId" AS node_exec_id,
+                p."blockName" AS block_name,
+                p."provider",
+                p."trackingType" AS tracking_type,
+                p."costMicrodollars" AS cost_microdollars,
+                p."inputTokens" AS input_tokens,
+                p."outputTokens" AS output_tokens,
+                p."duration",
+                p."model"
+            FROM {{schema_prefix}}"PlatformCostLog" p
+            LEFT JOIN {{schema_prefix}}"User" u ON u."id" = p."userId"
+            WHERE {where_sql}
+            ORDER BY p."createdAt" DESC, p."id" DESC
+            LIMIT ${limit_idx} OFFSET ${offset_idx}
+            """,
+            *params,
+            page_size,
+            offset,
+        ),
+    )
+    total = count_rows[0]["cnt"] if count_rows else 0
+
+    logs = [
+        CostLogRow(
+            id=r["id"],
+            created_at=r["created_at"],
+            user_id=r.get("user_id"),
+            email=_mask_email(r.get("email")),
+            graph_exec_id=r.get("graph_exec_id"),
+            node_exec_id=r.get("node_exec_id"),
+            block_name=r["block_name"],
+            provider=r["provider"],
+            tracking_type=r.get("tracking_type"),
+            cost_microdollars=r.get("cost_microdollars"),
+            input_tokens=r.get("input_tokens"),
+            output_tokens=r.get("output_tokens"),
+            duration=r.get("duration"),
+            model=r.get("model"),
+        )
+        for r in rows
+    ]
+    return logs, total
--- a/autogpt_platform/backend/backend/data/platform_cost_test.py
+++ b/autogpt_platform/backend/backend/data/platform_cost_test.py
@@ -0,0 +1,266 @@
+"""Unit tests for helpers and async functions in platform_cost module."""
+
+from datetime import datetime, timezone
+from unittest.mock import AsyncMock, patch
+
+import pytest
+
+from .platform_cost import (
+    PlatformCostEntry,
+    _build_where,
+    _json_or_none,
+    get_platform_cost_dashboard,
+    get_platform_cost_logs,
+    log_platform_cost,
+    log_platform_cost_safe,
+)
+
+
+class TestJsonOrNone:
+    def test_returns_none_for_none(self):
+        assert _json_or_none(None) is None
+
+    def test_returns_json_string_for_dict(self):
+        result = _json_or_none({"key": "value", "num": 42})
+        assert result is not None
+        assert '"key"' in result
+        assert '"value"' in result
+
+    def test_returns_json_for_empty_dict(self):
+        assert _json_or_none({}) == "{}"
+
+
+class TestBuildWhere:
+    def test_no_filters_returns_true(self):
+        sql, params = _build_where(None, None, None, None)
+        assert sql == "TRUE"
+        assert params == []
+
+    def test_start_only(self):
+        dt = datetime(2026, 1, 1, tzinfo=timezone.utc)
+        sql, params = _build_where(dt, None, None, None)
+        assert '"createdAt" >= $1::timestamptz' in sql
+        assert params == [dt]
+
+    def test_end_only(self):
+        dt = datetime(2026, 6, 1, tzinfo=timezone.utc)
+        sql, params = _build_where(None, dt, None, None)
+        assert '"createdAt" <= $1::timestamptz' in sql
+        assert params == [dt]
+
+    def test_provider_only(self):
+        # Provider names are normalized to lowercase at write time, so the
+        # filter uses a plain equality check. The input is also lowercased so
+        # "OpenAI" and "openai" both match stored rows.
+        sql, params = _build_where(None, None, "OpenAI", None)
+        assert '"provider" = $1' in sql
+        assert params == ["openai"]
+
+    def test_user_id_only(self):
+        sql, params = _build_where(None, None, None, "user-123")
+        assert '"userId" = $1' in sql
+        assert params == ["user-123"]
+
+    def test_all_filters(self):
+        start = datetime(2026, 1, 1, tzinfo=timezone.utc)
+        end = datetime(2026, 6, 1, tzinfo=timezone.utc)
+        sql, params = _build_where(start, end, "Anthropic", "u1")
+        assert "$1" in sql
+        assert "$2" in sql
+        assert "$3" in sql
+        assert "$4" in sql
+        assert len(params) == 4
+        # Provider is lowercased at filter time to match stored lowercase values.
+        assert params == [start, end, "anthropic", "u1"]
+
+    def test_table_alias(self):
+        dt = datetime(2026, 1, 1, tzinfo=timezone.utc)
+        sql, params = _build_where(dt, None, None, None, table_alias="p")
+        assert 'p."createdAt"' in sql
+        assert params == [dt]
+
+    def test_clauses_joined_with_and(self):
+        start = datetime(2026, 1, 1, tzinfo=timezone.utc)
+        end = datetime(2026, 6, 1, tzinfo=timezone.utc)
+        sql, _ = _build_where(start, end, None, None)
+        assert " AND " in sql
+
+
+def _make_entry(**overrides: object) -> PlatformCostEntry:
+    return PlatformCostEntry.model_validate(
+        {
+            "user_id": "user-1",
+            "block_id": "block-1",
+            "block_name": "TestBlock",
+            "provider": "openai",
+            "credential_id": "cred-1",
+            **overrides,
+        }
+    )
+
+
+class TestLogPlatformCost:
+    @pytest.mark.asyncio
+    async def test_calls_execute_raw_with_schema(self):
+        mock_exec = AsyncMock()
+        with patch("backend.data.platform_cost.execute_raw_with_schema", new=mock_exec):
+            entry = _make_entry(
+                input_tokens=100,
+                output_tokens=50,
+                cost_microdollars=5000,
+                model="gpt-4",
+                metadata={"key": "val"},
+            )
+            await log_platform_cost(entry)
+        mock_exec.assert_awaited_once()
+        args = mock_exec.call_args
+        assert args[0][1] == "user-1"  # user_id is first param
+        assert args[0][6] == "block-1"  # block_id
+        assert args[0][7] == "TestBlock"  # block_name
+
+    @pytest.mark.asyncio
+    async def test_metadata_none_passes_none(self):
+        mock_exec = AsyncMock()
+        with patch("backend.data.platform_cost.execute_raw_with_schema", new=mock_exec):
+            entry = _make_entry(metadata=None)
+            await log_platform_cost(entry)
+        args = mock_exec.call_args
+        assert args[0][-1] is None  # last arg is metadata json
+
+
+class TestLogPlatformCostSafe:
+    @pytest.mark.asyncio
+    async def test_does_not_raise_on_error(self):
+        with patch(
+            "backend.data.platform_cost.execute_raw_with_schema",
+            new=AsyncMock(side_effect=RuntimeError("DB down")),
+        ):
+            entry = _make_entry()
+            await log_platform_cost_safe(entry)
+
+    @pytest.mark.asyncio
+    async def test_succeeds_when_no_error(self):
+        mock_exec = AsyncMock()
+        with patch("backend.data.platform_cost.execute_raw_with_schema", new=mock_exec):
+            entry = _make_entry()
+            await log_platform_cost_safe(entry)
+        mock_exec.assert_awaited_once()
+
+
+class TestGetPlatformCostDashboard:
+    @pytest.mark.asyncio
+    async def test_returns_dashboard_with_data(self):
+        provider_rows = [
+            {
+                "provider": "openai",
+                "tracking_type": "tokens",
+                "total_cost": 5000,
+                "total_input_tokens": 1000,
+                "total_output_tokens": 500,
+                "total_duration": 10.5,
+                "request_count": 3,
+            }
+        ]
+        user_rows = [
+            {
+                "user_id": "u1",
+                "email": "a@b.com",
+                "total_cost": 5000,
+                "total_input_tokens": 1000,
+                "total_output_tokens": 500,
+                "request_count": 3,
+            }
+        ]
+        # Dashboard runs 3 queries: by_provider, by_user, COUNT(DISTINCT userId).
+        mock_query = AsyncMock(side_effect=[provider_rows, user_rows, [{"cnt": 1}]])
+        with patch("backend.data.platform_cost.query_raw_with_schema", new=mock_query):
+            dashboard = await get_platform_cost_dashboard()
+        assert dashboard.total_cost_microdollars == 5000
+        assert dashboard.total_requests == 3
+        assert dashboard.total_users == 1
+        assert len(dashboard.by_provider) == 1
+        assert dashboard.by_provider[0].provider == "openai"
+        assert dashboard.by_provider[0].tracking_type == "tokens"
+        assert dashboard.by_provider[0].total_duration_seconds == 10.5
+        assert len(dashboard.by_user) == 1
+        assert dashboard.by_user[0].email == "a***@b.com"
+
+    @pytest.mark.asyncio
+    async def test_returns_empty_dashboard(self):
+        mock_query = AsyncMock(side_effect=[[], [], []])
+        with patch("backend.data.platform_cost.query_raw_with_schema", new=mock_query):
+            dashboard = await get_platform_cost_dashboard()
+        assert dashboard.total_cost_microdollars == 0
+        assert dashboard.total_requests == 0
+        assert dashboard.total_users == 0
+        assert dashboard.by_provider == []
+        assert dashboard.by_user == []
+
+    @pytest.mark.asyncio
+    async def test_passes_filters_to_queries(self):
+        start = datetime(2026, 1, 1, tzinfo=timezone.utc)
+        mock_query = AsyncMock(side_effect=[[], [], []])
+        with patch("backend.data.platform_cost.query_raw_with_schema", new=mock_query):
+            await get_platform_cost_dashboard(
+                start=start, provider="openai", user_id="u1"
+            )
+        assert mock_query.await_count == 3
+        first_call_sql = mock_query.call_args_list[0][0][0]
+        assert "createdAt" in first_call_sql
+
+
+class TestGetPlatformCostLogs:
+    @pytest.mark.asyncio
+    async def test_returns_logs_and_total(self):
+        count_rows = [{"cnt": 1}]
+        log_rows = [
+            {
+                "id": "log-1",
+                "created_at": datetime(2026, 3, 1, tzinfo=timezone.utc),
+                "user_id": "u1",
+                "email": "a@b.com",
+                "graph_exec_id": "g1",
+                "node_exec_id": "n1",
+                "block_name": "TestBlock",
+                "provider": "openai",
+                "tracking_type": "tokens",
+                "cost_microdollars": 5000,
+                "input_tokens": 100,
+                "output_tokens": 50,
+                "duration": 1.5,
+                "model": "gpt-4",
+            }
+        ]
+        mock_query = AsyncMock(side_effect=[count_rows, log_rows])
+        with patch("backend.data.platform_cost.query_raw_with_schema", new=mock_query):
+            logs, total = await get_platform_cost_logs(page=1, page_size=10)
+        assert total == 1
+        assert len(logs) == 1
+        assert logs[0].id == "log-1"
+        assert logs[0].provider == "openai"
+        assert logs[0].model == "gpt-4"
+
+    @pytest.mark.asyncio
+    async def test_returns_empty_when_no_data(self):
+        mock_query = AsyncMock(side_effect=[[{"cnt": 0}], []])
+        with patch("backend.data.platform_cost.query_raw_with_schema", new=mock_query):
+            logs, total = await get_platform_cost_logs()
+        assert total == 0
+        assert logs == []
+
+    @pytest.mark.asyncio
+    async def test_pagination_offset(self):
+        mock_query = AsyncMock(side_effect=[[{"cnt": 100}], []])
+        with patch("backend.data.platform_cost.query_raw_with_schema", new=mock_query):
+            logs, total = await get_platform_cost_logs(page=3, page_size=25)
+        assert total == 100
+        second_call_args = mock_query.call_args_list[1][0]
+        assert 25 in second_call_args  # page_size
+        assert 50 in second_call_args  # offset = (3-1) * 25
+
+    @pytest.mark.asyncio
+    async def test_empty_count_returns_zero(self):
+        mock_query = AsyncMock(side_effect=[[], []])
+        with patch("backend.data.platform_cost.query_raw_with_schema", new=mock_query):
+            logs, total = await get_platform_cost_logs()
+        assert total == 0
--- a/autogpt_platform/backend/backend/data/user.py
+++ b/autogpt_platform/backend/backend/data/user.py
@@ -82,6 +82,28 @@ async def get_user_by_email(email: str) -> Optional[User]:
        raise DatabaseError(f"Failed to get user by email {email}: {e}") from e


+async def search_users(query: str, limit: int = 20) -> list[tuple[str, str | None]]:
+    """Search users by partial email or name.
+
+    Returns up to *limit* ``(user_id, email)`` tuples from the User table (no
+    dependency on credit history); queries under 3 characters return ``[]``.
+    """
+    query = query.strip()
+    if len(query) < 3:
+        return []
+    users = await prisma.user.find_many(
+        where={
+            "OR": [
+                {"email": {"contains": query, "mode": "insensitive"}},
+                {"name": {"contains": query, "mode": "insensitive"}},
+            ],
+        },
+        take=limit,
+        order={"email": "asc"},
+    )
+    return [(u.id, u.email) for u in users]
+
+
 async def update_user_email(user_id: str, email: str):
    try:
        # Get old email first for cache invalidation
--- a/autogpt_platform/backend/backend/executor/cost_tracking.py
+++ b/autogpt_platform/backend/backend/executor/cost_tracking.py
@@ -0,0 +1,291 @@
+"""Helpers for platform cost tracking on system-credential block executions."""
+
+import asyncio
+import logging
+import threading
+from typing import TYPE_CHECKING, Any, cast
+
+from backend.blocks._base import Block, BlockSchema
+from backend.copilot.token_tracking import _pending_log_tasks as _copilot_tasks
+from backend.copilot.token_tracking import (
+    _pending_log_tasks_lock as _copilot_tasks_lock,
+)
+from backend.data.execution import NodeExecutionEntry
+from backend.data.model import NodeExecutionStats
+from backend.data.platform_cost import PlatformCostEntry, usd_to_microdollars
+from backend.executor.utils import block_usage_cost
+from backend.integrations.credentials_store import is_system_credential
+from backend.integrations.providers import ProviderName
+
+if TYPE_CHECKING:
+    from backend.data.db_manager import DatabaseManagerAsyncClient
+
+logger = logging.getLogger(__name__)
+
+# Provider groupings by billing model — used when the block didn't explicitly
+# declare stats.provider_cost_type and we fall back to provider-name
+# heuristics. Values match ProviderName enum values.
+_CHARACTER_BILLED_PROVIDERS = frozenset(
+    {ProviderName.D_ID.value, ProviderName.ELEVENLABS.value}
+)
+_WALLTIME_BILLED_PROVIDERS = frozenset(
+    {
+        ProviderName.FAL.value,
+        ProviderName.REVID.value,
+        ProviderName.REPLICATE.value,
+    }
+)
+
+# Hold strong references to in-flight log tasks; the event loop keeps only weak
+# references, so they could otherwise be garbage-collected mid-execution. Tasks
+# remove themselves on completion. _pending_log_tasks_lock guards all access:
+# worker threads discard() via done callbacks while drain_pending_cost_logs() iterates.
+_pending_log_tasks: set[asyncio.Task] = set()
+_pending_log_tasks_lock = threading.Lock()
+# Per-loop semaphores: asyncio.Semaphore is not thread-safe and must not be
+# shared across event loops running in different threads. Key by loop instance
+# so each executor worker thread gets its own semaphore.
+_log_semaphores: dict[asyncio.AbstractEventLoop, asyncio.Semaphore] = {}
+
+
+def _get_log_semaphore() -> asyncio.Semaphore:
+    loop = asyncio.get_running_loop()
+    sem = _log_semaphores.get(loop)
+    if sem is None:
+        sem = asyncio.Semaphore(50)
+        _log_semaphores[loop] = sem
+    return sem
+
+
+async def drain_pending_cost_logs(timeout: float = 5.0) -> None:
+    """Await all in-flight cost log tasks with a timeout.
+
+    Drains both the executor cost log tasks (_pending_log_tasks in this module,
+    used for block execution cost tracking via DatabaseManagerAsyncClient) and
+    the copilot cost log tasks (token_tracking._pending_log_tasks, used for
+    copilot LLM turns via platform_cost_db()).
+
+    Call this during graceful shutdown to flush pending INSERTs. Each group
+    waits up to `timeout` seconds (up to 2 * timeout in total); tasks still
+    pending are abandoned, and any failures are logged by _safe_log.
+    """
+    # asyncio.wait() requires all tasks to belong to the running event loop.
+    # _pending_log_tasks is shared across executor worker threads (each with
+    # its own loop), so filter to only tasks owned by the current loop.
+    # Acquire the lock to take a consistent snapshot (worker threads call
+    # discard() via done callbacks concurrently with this iteration).
+    current_loop = asyncio.get_running_loop()
+    with _pending_log_tasks_lock:
+        all_pending = [t for t in _pending_log_tasks if t.get_loop() is current_loop]
+    if all_pending:
+        logger.info("Draining %d executor cost log task(s)", len(all_pending))
+        _, still_pending = await asyncio.wait(all_pending, timeout=timeout)
+        if still_pending:
+            logger.warning(
+                "%d executor cost log task(s) did not complete within %.1fs",
+                len(still_pending),
+                timeout,
+            )
+    # Also drain copilot cost log tasks (token_tracking._pending_log_tasks)
+    with _copilot_tasks_lock:
+        copilot_pending = [t for t in _copilot_tasks if t.get_loop() is current_loop]
+    if copilot_pending:
+        logger.info("Draining %d copilot cost log task(s)", len(copilot_pending))
+        _, still_pending = await asyncio.wait(copilot_pending, timeout=timeout)
+        if still_pending:
+            logger.warning(
+                "%d copilot cost log task(s) did not complete within %.1fs",
+                len(still_pending),
+                timeout,
+            )
+
+
+def _schedule_log(
+    db_client: "DatabaseManagerAsyncClient", entry: PlatformCostEntry
+) -> None:
+    async def _safe_log() -> None:
+        async with _get_log_semaphore():
+            try:
+                await db_client.log_platform_cost(entry)
+            except Exception:
+                logger.exception(
+                    "Failed to log platform cost for user=%s provider=%s block=%s",
+                    entry.user_id,
+                    entry.provider,
+                    entry.block_name,
+                )
+
+    task = asyncio.create_task(_safe_log())
+    with _pending_log_tasks_lock:
+        _pending_log_tasks.add(task)
+
+    def _remove(t: asyncio.Task) -> None:
+        with _pending_log_tasks_lock:
+            _pending_log_tasks.discard(t)
+
+    task.add_done_callback(_remove)
+
+
+def _extract_model_name(raw: str | dict | None) -> str | None:
+    """Return a string model name from a block input field, or None.
+
+    Handles str (returned as-is), dict (e.g. an enum wrapper, skipped), and
+    None (no model field). Unexpected types are coerced to str as a fallback.
+    """
+    if raw is None:
+        return None
+    if isinstance(raw, str):
+        return raw
+    if isinstance(raw, dict):
+        return None
+    return str(raw)
+
+
+def resolve_tracking(
+    provider: str,
+    stats: NodeExecutionStats,
+    input_data: dict[str, Any],
+) -> tuple[str, float]:
+    """Return (tracking_type, tracking_amount) based on provider billing model.
+
+    Preference order:
+    1. Block-declared: if the block set `provider_cost_type` on its stats,
+       honor it directly (paired with `provider_cost` as the amount).
+    2. Heuristic fallback: infer from `provider_cost`/token counts, then
+       from provider name (per-character / per-second), else "per_run".
+    """
+    # 1. Block explicitly declared its cost type (only when an amount is present)
+    if stats.provider_cost_type and stats.provider_cost is not None:
+        return stats.provider_cost_type, stats.provider_cost
+
+    # 2. Provider returned actual USD cost (OpenRouter, Exa)
+    if stats.provider_cost is not None:
+        return "cost_usd", stats.provider_cost
+
+    # 3. LLM providers: track by tokens
+    if stats.input_token_count or stats.output_token_count:
+        return "tokens", float(
+            (stats.input_token_count or 0) + (stats.output_token_count or 0)
+        )
+
+    # 4. Provider-specific billing heuristics
+
+    # TTS: billed per character of input text
+    if provider == ProviderName.UNREAL_SPEECH.value:
+        text = input_data.get("text", "")
+        return "characters", float(len(text)) if isinstance(text, str) else 0.0
+
+    # D-ID + ElevenLabs voice: billed per character of script
+    if provider in _CHARACTER_BILLED_PROVIDERS:
+        text = (
+            input_data.get("script_input", "")
+            or input_data.get("text", "")
+            or input_data.get("script", "")  # VideoNarrationBlock uses `script`
+        )
+        return "characters", float(len(text)) if isinstance(text, str) else 0.0
+
+    # E2B: billed per second of sandbox time
+    if provider == ProviderName.E2B.value:
+        return "sandbox_seconds", round(stats.walltime, 3) if stats.walltime else 0.0
+
+    # Video/image gen: walltime includes queue + generation + polling
+    if provider in _WALLTIME_BILLED_PROVIDERS:
+        return "walltime_seconds", round(stats.walltime, 3) if stats.walltime else 0.0
+
+    # Per-request: Google Maps, Ideogram, Nvidia, Apollo, etc.
+    # All billed per API call - count 1 per block execution.
+    return "per_run", 1.0
+
+
+async def log_system_credential_cost(
+    node_exec: NodeExecutionEntry,
+    block: Block,
+    stats: NodeExecutionStats,
+    db_client: "DatabaseManagerAsyncClient",
+) -> None:
+    """Check if a system credential was used and log the platform cost.
+
+    Routes through DatabaseManagerAsyncClient so the write goes via the
+    message-passing DB service rather than calling Prisma directly (which
+    is not connected in the executor process).
+
+    Logs only the first matching system credential field (one log per
+    execution). Any unexpected error is caught and logged — cost logging
+    is strictly best-effort and must never disrupt block execution.
+
+    Note: costMicrodollars is left null for providers that don't return
+    a USD cost. The credit_cost in metadata captures our internal credit
+    charge as a proxy.
+    """
+    try:
+        if node_exec.execution_context.dry_run:
+            return
+
+        input_data = node_exec.inputs
+        input_model = cast(type[BlockSchema], block.input_schema)
+
+        for field_name in input_model.get_credentials_fields():
+            cred_data = input_data.get(field_name)
+            if not cred_data or not isinstance(cred_data, dict):
+                continue
+            cred_id = cred_data.get("id", "")
+            if not cred_id or not is_system_credential(cred_id):
+                continue
+
+            model_name = _extract_model_name(input_data.get("model"))
+
+            credit_cost, _ = block_usage_cost(block=block, input_data=input_data)
+
+            provider_name = cred_data.get("provider", "unknown")
+            tracking_type, tracking_amount = resolve_tracking(
+                provider=provider_name,
+                stats=stats,
+                input_data=input_data,
+            )
+
+            # Only treat provider_cost as USD when the tracking type says so.
+            # For other types (items, characters, per_run, ...) the
+            # provider_cost field holds the raw amount, not a dollar value.
+            # Use tracking_amount (the normalized value from resolve_tracking)
+            # rather than raw stats.provider_cost to avoid unit mismatches.
+            cost_microdollars = None
+            if tracking_type == "cost_usd":
+                cost_microdollars = usd_to_microdollars(tracking_amount)
+
+            meta: dict[str, Any] = {
+                "tracking_type": tracking_type,
+                "tracking_amount": tracking_amount,
+            }
+            if credit_cost is not None:
+                meta["credit_cost"] = credit_cost
+            if stats.provider_cost is not None:
+                # Use 'provider_cost_raw' — the value's unit varies by tracking
+                # type (USD for cost_usd, count for items/characters/per_run, etc.)
+                meta["provider_cost_raw"] = stats.provider_cost
+
+            _schedule_log(
+                db_client,
+                PlatformCostEntry(
+                    user_id=node_exec.user_id,
+                    graph_exec_id=node_exec.graph_exec_id,
+                    node_exec_id=node_exec.node_exec_id,
+                    graph_id=node_exec.graph_id,
+                    node_id=node_exec.node_id,
+                    block_id=node_exec.block_id,
+                    block_name=block.name,
+                    provider=provider_name,
+                    credential_id=cred_id,
+                    cost_microdollars=cost_microdollars,
+                    input_tokens=stats.input_token_count,
+                    output_tokens=stats.output_token_count,
+                    data_size=stats.output_size if stats.output_size > 0 else None,
+                    duration=stats.walltime if stats.walltime > 0 else None,
+                    model=model_name,
+                    tracking_type=tracking_type,
+                    tracking_amount=tracking_amount,
+                    metadata=meta,
+                ),
+            )
+            return  # One log per execution is enough
+    except Exception:
+        logger.exception("log_system_credential_cost failed unexpectedly")
--- a/autogpt_platform/backend/backend/executor/manager.py
+++ b/autogpt_platform/backend/backend/executor/manager.py
@@ -45,6 +45,10 @@ from backend.data.notifications import (
    ZeroBalanceData,
 )
 from backend.data.rabbitmq import SyncRabbitMQ
+from backend.executor.cost_tracking import (
+    drain_pending_cost_logs,
+    log_system_credential_cost,
+)
 from backend.integrations.creds_manager import IntegrationCredentialsManager
 from backend.notifications.notifications import queue_notification
 from backend.util import json
@@ -692,6 +696,15 @@ class ExecutionProcessor:
            stats=graph_stats,
        )

+        # Log platform cost if system credentials were used (only on success)
+        if status == ExecutionStatus.COMPLETED:
+            await log_system_credential_cost(
+                node_exec=node_exec,
+                block=node.block,
+                stats=execution_stats,
+                db_client=db_client,
+            )
+
        return execution_stats

    @async_time_measured
@@ -2044,6 +2057,18 @@ class ExecutionManager(AppProcess):
            prefix + " [cancel-consumer]",
        )

+        # Drain any in-flight cost log tasks before exit so we don't silently
+        # drop INSERT operations during deployments.
+        loop = getattr(self, "node_execution_loop", None)
+        if loop is not None and loop.is_running():
+            try:
+                asyncio.run_coroutine_threadsafe(
+                    drain_pending_cost_logs(), loop
+                ).result(timeout=10)
+                logger.info(f"{prefix} ✅ Cost log tasks drained")
+            except Exception as e:
+                logger.warning(f"{prefix} ⚠️ Failed to drain cost log tasks: {e}")
+
        logger.info(f"{prefix} ✅ Finished GraphExec cleanup")

        super().cleanup()
--- a/autogpt_platform/backend/backend/executor/manager_cost_tracking_test.py
+++ b/autogpt_platform/backend/backend/executor/manager_cost_tracking_test.py
@@ -0,0 +1,567 @@
+"""Unit tests for resolve_tracking and log_system_credential_cost."""
+
+import asyncio
+from typing import Any
+from unittest.mock import AsyncMock, MagicMock, patch
+
+import pytest
+
+from backend.data.execution import ExecutionContext, NodeExecutionEntry
+from backend.data.model import NodeExecutionStats
+from backend.executor.cost_tracking import log_system_credential_cost, resolve_tracking
+
+# ---------------------------------------------------------------------------
+# resolve_tracking
+# ---------------------------------------------------------------------------
+
+
+class TestResolveTracking:
+    def _stats(self, **overrides: Any) -> NodeExecutionStats:
+        return NodeExecutionStats(**overrides)
+
+    def test_provider_cost_returns_cost_usd(self):
+        stats = self._stats(provider_cost=0.0042)
+        tt, amt = resolve_tracking("openai", stats, {})
+        assert tt == "cost_usd"
+        assert amt == 0.0042
+
+    def test_token_counts_return_tokens(self):
+        stats = self._stats(input_token_count=300, output_token_count=100)
+        tt, amt = resolve_tracking("anthropic", stats, {})
+        assert tt == "tokens"
+        assert amt == 400.0
+
+    def test_token_counts_only_input(self):
+        stats = self._stats(input_token_count=500)
+        tt, amt = resolve_tracking("groq", stats, {})
+        assert tt == "tokens"
+        assert amt == 500.0
+
+    def test_unreal_speech_returns_characters(self):
+        stats = self._stats()
+        tt, amt = resolve_tracking("unreal_speech", stats, {"text": "Hello world"})
+        assert tt == "characters"
+        assert amt == 11.0
+
+    def test_unreal_speech_empty_text(self):
+        stats = self._stats()
+        tt, amt = resolve_tracking("unreal_speech", stats, {"text": ""})
+        assert tt == "characters"
+        assert amt == 0.0
+
+    def test_unreal_speech_non_string_text(self):
+        stats = self._stats()
+        tt, amt = resolve_tracking("unreal_speech", stats, {"text": 123})
+        assert tt == "characters"
+        assert amt == 0.0
+
+    def test_d_id_uses_script_input(self):
+        stats = self._stats()
+        tt, amt = resolve_tracking("d_id", stats, {"script_input": "Hello"})
+        assert tt == "characters"
+        assert amt == 5.0
+
+    def test_elevenlabs_uses_text(self):
+        stats = self._stats()
+        tt, amt = resolve_tracking("elevenlabs", stats, {"text": "Say this"})
+        assert tt == "characters"
+        assert amt == 8.0
+
+    def test_elevenlabs_fallback_to_text_when_no_script_input(self):
+        stats = self._stats()
+        tt, amt = resolve_tracking("elevenlabs", stats, {"text": "Fallback text"})
+        assert tt == "characters"
+        assert amt == 13.0
+
+    def test_elevenlabs_uses_script_field(self):
+        """VideoNarrationBlock (elevenlabs) uses `script` field, not script_input/text."""
+        stats = self._stats()
+        tt, amt = resolve_tracking("elevenlabs", stats, {"script": "Narration"})
+        assert tt == "characters"
+        assert amt == 9.0
+
+    def test_block_declared_cost_type_items(self):
+        """Block explicitly setting provider_cost_type='items' short-circuits heuristics."""
+        stats = self._stats(provider_cost=5.0, provider_cost_type="items")
+        tt, amt = resolve_tracking("google_maps", stats, {})
+        assert tt == "items"
+        assert amt == 5.0
+
+    def test_block_declared_cost_type_characters(self):
+        """TTS block can declare characters directly, bypassing input_data lookup."""
+        stats = self._stats(provider_cost=42.0, provider_cost_type="characters")
+        tt, amt = resolve_tracking("unreal_speech", stats, {})
+        assert tt == "characters"
+        assert amt == 42.0
+
+    def test_block_declared_cost_type_wins_over_tokens(self):
+        """provider_cost_type takes precedence over token-based heuristic."""
+        stats = self._stats(
+            provider_cost=1.0,
+            provider_cost_type="per_run",
+            input_token_count=500,
+        )
+        tt, amt = resolve_tracking("openai", stats, {})
+        assert tt == "per_run"
+        assert amt == 1.0
+
+    def test_e2b_returns_sandbox_seconds(self):
+        stats = self._stats(walltime=45.123)
+        tt, amt = resolve_tracking("e2b", stats, {})
+        assert tt == "sandbox_seconds"
+        assert amt == 45.123
+
+    def test_e2b_no_walltime(self):
+        stats = self._stats(walltime=0)
+        tt, amt = resolve_tracking("e2b", stats, {})
+        assert tt == "sandbox_seconds"
+        assert amt == 0.0
+
+    def test_fal_returns_walltime(self):
+        stats = self._stats(walltime=12.5)
+        tt, amt = resolve_tracking("fal", stats, {})
+        assert tt == "walltime_seconds"
+        assert amt == 12.5
+
+    def test_revid_returns_walltime(self):
+        stats = self._stats(walltime=60.0)
+        tt, amt = resolve_tracking("revid", stats, {})
+        assert tt == "walltime_seconds"
+        assert amt == 60.0
+
+    def test_replicate_returns_walltime(self):
+        stats = self._stats(walltime=30.0)
+        tt, amt = resolve_tracking("replicate", stats, {})
+        assert tt == "walltime_seconds"
+        assert amt == 30.0
+
+    def test_unknown_provider_returns_per_run(self):
+        stats = self._stats()
+        tt, amt = resolve_tracking("google_maps", stats, {})
+        assert tt == "per_run"
+        assert amt == 1.0
+
+    def test_provider_cost_takes_precedence_over_tokens(self):
+        stats = self._stats(
+            provider_cost=0.01, input_token_count=500, output_token_count=200
+        )
+        tt, amt = resolve_tracking("openai", stats, {})
+        assert tt == "cost_usd"
+        assert amt == 0.01
+
+    def test_provider_cost_zero_is_not_none(self):
+        """provider_cost=0.0 is falsy but should still be tracked as cost_usd
+        (e.g. free-tier or fully-cached responses from OpenRouter)."""
+        stats = self._stats(provider_cost=0.0)
+        tt, amt = resolve_tracking("open_router", stats, {})
+        assert tt == "cost_usd"
+        assert amt == 0.0
+
+    def test_tokens_take_precedence_over_provider_specific(self):
+        stats = self._stats(input_token_count=100, walltime=10.0)
+        tt, amt = resolve_tracking("fal", stats, {})
+        assert tt == "tokens"
+        assert amt == 100.0
+
+
+# ---------------------------------------------------------------------------
+# log_system_credential_cost
+# ---------------------------------------------------------------------------
+
+
+def _make_db_client() -> MagicMock:
+    db_client = MagicMock()
+    db_client.log_platform_cost = AsyncMock()
+    return db_client
+
+
+def _make_block(has_credentials: bool = True) -> MagicMock:
+    block = MagicMock()
+    block.name = "TestBlock"
+    input_schema = MagicMock()
+    if has_credentials:
+        input_schema.get_credentials_fields.return_value = {"credentials": MagicMock()}
+    else:
+        input_schema.get_credentials_fields.return_value = {}
+    block.input_schema = input_schema
+    return block
+
+
+def _make_node_exec(
+    inputs: dict | None = None,
+    dry_run: bool = False,
+) -> NodeExecutionEntry:
+    return NodeExecutionEntry(
+        user_id="user-1",
+        graph_exec_id="gx-1",
+        graph_id="g-1",
+        graph_version=1,
+        node_exec_id="nx-1",
+        node_id="n-1",
+        block_id="b-1",
+        inputs=inputs or {},
+        execution_context=ExecutionContext(dry_run=dry_run),
+    )
+
+
+class TestLogSystemCredentialCost:
+    @pytest.mark.asyncio
+    async def test_skips_dry_run(self):
+        db_client = _make_db_client()
+        node_exec = _make_node_exec(dry_run=True)
+        block = _make_block()
+        stats = NodeExecutionStats()
+        await log_system_credential_cost(node_exec, block, stats, db_client)
+        db_client.log_platform_cost.assert_not_awaited()
+
+    @pytest.mark.asyncio
+    async def test_skips_when_no_credential_fields(self):
+        db_client = _make_db_client()
+        node_exec = _make_node_exec(inputs={})
+        block = _make_block(has_credentials=False)
+        stats = NodeExecutionStats()
+        await log_system_credential_cost(node_exec, block, stats, db_client)
+        db_client.log_platform_cost.assert_not_awaited()
+
+    @pytest.mark.asyncio
+    async def test_skips_when_cred_data_missing(self):
+        db_client = _make_db_client()
+        node_exec = _make_node_exec(inputs={})
+        block = _make_block()
+        stats = NodeExecutionStats()
+        await log_system_credential_cost(node_exec, block, stats, db_client)
+        db_client.log_platform_cost.assert_not_awaited()
+
+    @pytest.mark.asyncio
+    async def test_skips_when_not_system_credential(self):
+        db_client = _make_db_client()
+        with patch(
+            "backend.executor.cost_tracking.is_system_credential",
+            return_value=False,
+        ):
+            node_exec = _make_node_exec(
+                inputs={
+                    "credentials": {"id": "user-cred-123", "provider": "openai"},
+                }
+            )
+            block = _make_block()
+            stats = NodeExecutionStats()
+            await log_system_credential_cost(node_exec, block, stats, db_client)
+        db_client.log_platform_cost.assert_not_awaited()
+
+    @pytest.mark.asyncio
+    async def test_logs_with_system_credential(self):
+        db_client = _make_db_client()
+        with (
+            patch(
+                "backend.executor.cost_tracking.is_system_credential", return_value=True
+            ),
+            patch(
+                "backend.executor.cost_tracking.block_usage_cost",
+                return_value=(10, None),
+            ),
+        ):
+            node_exec = _make_node_exec(
+                inputs={
+                    "credentials": {"id": "sys-cred-1", "provider": "openai"},
+                    "model": "gpt-4",
+                }
+            )
+            block = _make_block()
+            stats = NodeExecutionStats(input_token_count=500, output_token_count=200)
+            await log_system_credential_cost(node_exec, block, stats, db_client)
+            await asyncio.sleep(0)
+
+        db_client.log_platform_cost.assert_awaited_once()
+        entry = db_client.log_platform_cost.call_args[0][0]
+        assert entry.user_id == "user-1"
+        assert entry.provider == "openai"
+        assert entry.block_name == "TestBlock"
+        assert entry.model == "gpt-4"
+        assert entry.input_tokens == 500
+        assert entry.output_tokens == 200
+        assert entry.tracking_type == "tokens"
+        assert entry.metadata["tracking_type"] == "tokens"
+        assert entry.metadata["tracking_amount"] == 700.0
+        assert entry.metadata["credit_cost"] == 10
+
+    @pytest.mark.asyncio
+    async def test_logs_with_provider_cost(self):
+        db_client = _make_db_client()
+        with (
+            patch(
+                "backend.executor.cost_tracking.is_system_credential", return_value=True
+            ),
+            patch(
+                "backend.executor.cost_tracking.block_usage_cost",
+                return_value=(5, None),
+            ),
+        ):
+            node_exec = _make_node_exec(
+                inputs={
+                    "credentials": {"id": "sys-cred-2", "provider": "open_router"},
+                }
+            )
+            block = _make_block()
+            stats = NodeExecutionStats(provider_cost=0.0015)
+            await log_system_credential_cost(node_exec, block, stats, db_client)
+            await asyncio.sleep(0)
+
+        entry = db_client.log_platform_cost.call_args[0][0]
+        assert entry.cost_microdollars == 1500
+        assert entry.tracking_type == "cost_usd"
+        assert entry.metadata["tracking_type"] == "cost_usd"
+        assert entry.metadata["provider_cost_raw"] == 0.0015
+
+    @pytest.mark.asyncio
+    async def test_model_name_enum_converted_to_str(self):
+        db_client = _make_db_client()
+        with (
+            patch(
+                "backend.executor.cost_tracking.is_system_credential", return_value=True
+            ),
+            patch(
+                "backend.executor.cost_tracking.block_usage_cost",
+                return_value=(0, None),
+            ),
+        ):
+            from enum import Enum
+
+            class FakeModel(Enum):
+                GPT4 = "gpt-4"
+
+            node_exec = _make_node_exec(
+                inputs={
+                    "credentials": {"id": "sys-cred", "provider": "openai"},
+                    "model": FakeModel.GPT4,
+                }
+            )
+            block = _make_block()
+            stats = NodeExecutionStats()
+            await log_system_credential_cost(node_exec, block, stats, db_client)
+            await asyncio.sleep(0)
+
+        entry = db_client.log_platform_cost.call_args[0][0]
+        assert entry.model == "FakeModel.GPT4"
+
+    @pytest.mark.asyncio
+    async def test_model_name_dict_becomes_none(self):
+        db_client = _make_db_client()
+        with (
+            patch(
+                "backend.executor.cost_tracking.is_system_credential", return_value=True
+            ),
+            patch(
+                "backend.executor.cost_tracking.block_usage_cost",
+                return_value=(0, None),
+            ),
+        ):
+            node_exec = _make_node_exec(
+                inputs={
+                    "credentials": {"id": "sys-cred", "provider": "openai"},
+                    "model": {"nested": "value"},
+                }
+            )
+            block = _make_block()
+            stats = NodeExecutionStats()
+            await log_system_credential_cost(node_exec, block, stats, db_client)
+            await asyncio.sleep(0)
+
+        entry = db_client.log_platform_cost.call_args[0][0]
+        assert entry.model is None
+
+    @pytest.mark.asyncio
+    async def test_does_not_raise_when_block_usage_cost_raises(self):
+        """log_system_credential_cost must swallow exceptions from block_usage_cost."""
+        db_client = _make_db_client()
+        with (
+            patch(
+                "backend.executor.cost_tracking.is_system_credential", return_value=True
+            ),
+            patch(
+                "backend.executor.cost_tracking.block_usage_cost",
+                side_effect=RuntimeError("pricing lookup failed"),
+            ),
+        ):
+            node_exec = _make_node_exec(
+                inputs={
+                    "credentials": {"id": "sys-cred", "provider": "openai"},
+                }
+            )
+            block = _make_block()
+            stats = NodeExecutionStats()
+            # Should not raise — outer except must catch block_usage_cost error
+            await log_system_credential_cost(node_exec, block, stats, db_client)
+
+    @pytest.mark.asyncio
+    async def test_round_instead_of_int_for_microdollars(self):
+        db_client = _make_db_client()
+        with (
+            patch(
+                "backend.executor.cost_tracking.is_system_credential", return_value=True
+            ),
+            patch(
+                "backend.executor.cost_tracking.block_usage_cost",
+                return_value=(0, None),
+            ),
+        ):
+            node_exec = _make_node_exec(
+                inputs={
+                    "credentials": {"id": "sys-cred", "provider": "openai"},
+                }
+            )
+            block = _make_block()
+            # 0.0015 * 1_000_000 is exactly 1500.0 in double precision, so int()
+            # would also pass here; round() matters for products just below an int
+            stats = NodeExecutionStats(provider_cost=0.0015)
+            await log_system_credential_cost(node_exec, block, stats, db_client)
+            await asyncio.sleep(0)
+
+        entry = db_client.log_platform_cost.call_args[0][0]
+        assert entry.cost_microdollars == 1500
+
+    @pytest.mark.asyncio
+    async def test_per_run_metadata_has_no_provider_cost_raw(self):
+        """For per-run providers (google_maps etc), provider_cost_raw is absent
+        from metadata since stats.provider_cost is None."""
+        db_client = _make_db_client()
+        with (
+            patch(
+                "backend.executor.cost_tracking.is_system_credential", return_value=True
+            ),
+            patch(
+                "backend.executor.cost_tracking.block_usage_cost",
+                return_value=(0, None),
+            ),
+        ):
+            node_exec = _make_node_exec(
+                inputs={
+                    "credentials": {"id": "sys-cred", "provider": "google_maps"},
+                }
+            )
+            block = _make_block()
+            stats = NodeExecutionStats()  # no provider_cost
+            await log_system_credential_cost(node_exec, block, stats, db_client)
+            await asyncio.sleep(0)
+
+        entry = db_client.log_platform_cost.call_args[0][0]
+        assert entry.tracking_type == "per_run"
+        assert "provider_cost_raw" not in (entry.metadata or {})
+
+
+# ---------------------------------------------------------------------------
+# merge_stats accumulation
+# ---------------------------------------------------------------------------
+
+
+class TestMergeStats:
+    """Tests for NodeExecutionStats accumulation via += (used by Block.merge_stats)."""
+
+    def test_accumulates_output_size(self):
+        stats = NodeExecutionStats()
+        stats += NodeExecutionStats(output_size=10)
+        stats += NodeExecutionStats(output_size=25)
+        assert stats.output_size == 35
+
+    def test_accumulates_tokens(self):
+        stats = NodeExecutionStats()
+        stats += NodeExecutionStats(input_token_count=100, output_token_count=50)
+        stats += NodeExecutionStats(input_token_count=200, output_token_count=150)
+        assert stats.input_token_count == 300
+        assert stats.output_token_count == 200
+
+    def test_preserves_provider_cost(self):
+        stats = NodeExecutionStats()
+        stats += NodeExecutionStats(provider_cost=0.005)
+        stats += NodeExecutionStats(output_size=10)
+        assert stats.provider_cost == 0.005
+        assert stats.output_size == 10
+
+    def test_provider_cost_accumulates(self):
+        """Multiple merge_stats with provider_cost should sum (multi-round
+        tool-calling in copilot / retries can report cost separately)."""
+        stats = NodeExecutionStats()
+        stats += NodeExecutionStats(provider_cost=0.001)
+        stats += NodeExecutionStats(provider_cost=0.002)
+        stats += NodeExecutionStats(provider_cost=0.003)
+        assert stats.provider_cost == pytest.approx(0.006)
+
+    def test_provider_cost_none_does_not_overwrite(self):
+        """A None provider_cost must not wipe a previously-set value."""
+        stats = NodeExecutionStats(provider_cost=0.01)
+        stats += NodeExecutionStats()  # provider_cost=None by default
+        assert stats.provider_cost == 0.01
+
+    def test_provider_cost_type_last_write_wins(self):
+        """provider_cost_type is a Literal — last set value wins on merge."""
+        stats = NodeExecutionStats(provider_cost_type="tokens")
+        stats += NodeExecutionStats(provider_cost_type="items")
+        assert stats.provider_cost_type == "items"
+
+
+# ---------------------------------------------------------------------------
+# on_node_execution -> log_system_credential_cost integration
+# ---------------------------------------------------------------------------
+
+
+class TestManagerCostTrackingIntegration:
+    @pytest.mark.asyncio
+    async def test_log_called_with_accumulated_stats(self):
+        """Verify that log_system_credential_cost receives stats that could
+        have been accumulated by merge_stats across multiple yield steps."""
+        db_client = _make_db_client()
+        with (
+            patch(
+                "backend.executor.cost_tracking.is_system_credential", return_value=True
+            ),
+            patch(
+                "backend.executor.cost_tracking.block_usage_cost",
+                return_value=(5, None),
+            ),
+        ):
+            stats = NodeExecutionStats()
+            stats += NodeExecutionStats(output_size=10, input_token_count=100)
+            stats += NodeExecutionStats(output_size=25, input_token_count=200)
+
+            assert stats.output_size == 35
+            assert stats.input_token_count == 300
+
+            node_exec = _make_node_exec(
+                inputs={
+                    "credentials": {"id": "sys-cred-acc", "provider": "openai"},
+                    "model": "gpt-4",
+                }
+            )
+            block = _make_block()
+            await log_system_credential_cost(node_exec, block, stats, db_client)
+            await asyncio.sleep(0)
+
+        db_client.log_platform_cost.assert_awaited_once()
+        entry = db_client.log_platform_cost.call_args[0][0]
+        assert entry.input_tokens == 300
+        assert entry.tracking_type == "tokens"
+        assert entry.metadata["tracking_amount"] == 300.0
+
+    @pytest.mark.asyncio
+    async def test_skips_cost_log_when_status_is_failed(self):
+        """Manager only calls log_system_credential_cost on COMPLETED status.
+
+        This test mirrors the manager's guard (`if status == COMPLETED`) inline
+        rather than invoking the manager itself: with a FAILED status the
+        guarded call is skipped, so no cost entry is logged.
+        """
+        from backend.data.execution import ExecutionStatus
+
+        db_client = _make_db_client()
+        node_exec = _make_node_exec(
+            inputs={"credentials": {"id": "sys-cred", "provider": "openai"}}
+        )
+        block = _make_block()
+        stats = NodeExecutionStats(input_token_count=100)
+
+        # Simulate the manager guard: only call on COMPLETED
+        status = ExecutionStatus.FAILED
+        if status == ExecutionStatus.COMPLETED:
+            await log_system_credential_cost(node_exec, block, stats, db_client)
+
+        db_client.log_platform_cost.assert_not_awaited()
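
The accumulation rules exercised by `TestMergeStats` above (counters add up, `provider_cost` sums, and a `None` cost never wipes an earlier value) can be sketched in isolation. This is a minimal sketch under stated assumptions: `Stats` is a hypothetical stand-in, not the real `NodeExecutionStats`, whose `__iadd__` implementation is not shown in this diff.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Stats:
    """Hypothetical stand-in for NodeExecutionStats (illustration only)."""

    output_size: int = 0
    provider_cost: Optional[float] = None

    def __iadd__(self, other: "Stats") -> "Stats":
        # Plain counters always add up.
        self.output_size += other.output_size
        # A None cost means "nothing reported": keep the running value.
        if other.provider_cost is not None:
            self.provider_cost = (self.provider_cost or 0.0) + other.provider_cost
        return self


total = Stats()
for cost in (0.001, 0.002, 0.003):
    total += Stats(provider_cost=cost)
total += Stats(output_size=10)  # no cost reported; the running sum is kept
```

Because the sum is built from floats, comparing it with a tolerance (as the tests do with `pytest.approx`) is safer than exact equality.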
--- a/autogpt_platform/backend/backend/util/cache.py
+++ b/autogpt_platform/backend/backend/util/cache.py
@@ -121,10 +121,16 @@ def _make_hashable_key(


 def _make_redis_key(key: tuple[Any, ...], func_name: str) -> str:
-    """Convert a hashable key tuple to a Redis key string."""
-    # Ensure key is already hashable
-    hashable_key = key if isinstance(key, tuple) else (key,)
-    return f"cache:{func_name}:{hash(hashable_key)}"
+    """Convert a hashable key tuple to a Redis key string.
+
+    Uses SHA-256 instead of Python's built-in ``hash()`` because ``hash()``
+    of ``str``/``bytes`` is salted per process (see ``PYTHONHASHSEED``). In a
+    multi-pod deployment every pod must derive the **same** Redis key for the
+    same arguments, otherwise cache lookups and invalidations silently miss.
+    """
+    key_bytes = repr(key).encode()
+    digest = hashlib.sha256(key_bytes).hexdigest()
+    return f"cache:{func_name}:{digest}"


@runtime_checkable
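
The key derivation in the `cache.py` hunk above can be tried on its own. The following is a minimal sketch that restates the helper under the hypothetical name `make_redis_key`; the sample key tuple and the function name `get_user` are made up for illustration.

```python
import hashlib
from typing import Any


def make_redis_key(key: tuple[Any, ...], func_name: str) -> str:
    """Illustrative copy of the helper: digest of repr(key), not hash(key)."""
    digest = hashlib.sha256(repr(key).encode()).hexdigest()
    return f"cache:{func_name}:{digest}"


# Hypothetical key tuple: positional args plus sorted keyword pairs.
key = (("user-1", 42), (("flag", True),))
first = make_redis_key(key, "get_user")
second = make_redis_key(key, "get_user")
```

The digest is only as stable as `repr(key)`. Strings, numbers and tuples of them have a deterministic `repr`, but an object that falls back to the default `repr` embeds its memory address and would still produce a different key in every process.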
--- a/autogpt_platform/backend/backend/util/feature_flag.py
+++ b/autogpt_platform/backend/backend/util/feature_flag.py
@@ -1,5 +1,6 @@
 import contextlib
 import logging
+import os
 from enum import Enum
 from functools import wraps
 from typing import Any, Awaitable, Callable, TypeVar
@@ -38,6 +39,7 @@ class Flag(str, Enum):
    AGENT_ACTIVITY = "agent-activity"
    ENABLE_PLATFORM_PAYMENT = "enable-platform-payment"
    CHAT = "chat"
+    CHAT_MODE_OPTION = "chat-mode-option"
    COPILOT_SDK = "copilot-sdk"
    COPILOT_DAILY_TOKEN_LIMIT = "copilot-daily-token-limit"
    COPILOT_WEEKLY_TOKEN_LIMIT = "copilot-weekly-token-limit"
@@ -165,6 +167,30 @@ async def get_feature_flag_value(
        return default


+def _env_flag_override(flag_key: Flag) -> bool | None:
+    """Return a local override for ``flag_key`` from the environment.
+
+    Set ``FORCE_FLAG_<NAME>=true|false`` (``NAME`` = flag value with
+    ``-`` → ``_``, upper-cased) to bypass LaunchDarkly for a single
+    flag in local dev or tests.  Returns ``None`` when no override
+    is configured so the caller falls through to LaunchDarkly.
+
+    The ``NEXT_PUBLIC_FORCE_FLAG_<NAME>`` prefix is also accepted so a
+    single shared env var can toggle a flag across backend and
+    frontend (the frontend requires the ``NEXT_PUBLIC_`` prefix to
+    expose the value to the browser bundle).
+
+    Example: ``FORCE_FLAG_CHAT_MODE_OPTION=true`` forces
+    ``Flag.CHAT_MODE_OPTION`` on regardless of LaunchDarkly.
+    """
+    suffix = flag_key.value.upper().replace("-", "_")
+    for prefix in ("FORCE_FLAG_", "NEXT_PUBLIC_FORCE_FLAG_"):
+        raw = os.environ.get(prefix + suffix)
+        if raw is not None:
+            return raw.strip().lower() in ("1", "true", "yes", "on")
+    return None
+
+
 async def is_feature_enabled(
    flag_key: Flag,
    user_id: str,
@@ -181,6 +207,11 @@ async def is_feature_enabled(
    Returns:
        True if feature is enabled, False otherwise
    """
+    override = _env_flag_override(flag_key)
+    if override is not None:
+        logger.debug(f"Feature flag {flag_key} overridden by env: {override}")
+        return override
+
    result = await get_feature_flag_value(flag_key.value, user_id, default)

    # If the result is already a boolean, return it
--- a/autogpt_platform/backend/backend/util/feature_flag_test.py
+++ b/autogpt_platform/backend/backend/util/feature_flag_test.py
@@ -4,6 +4,7 @@ from ldclient import LDClient

 from backend.util.feature_flag import (
    Flag,
+    _env_flag_override,
    feature_flag,
    is_feature_enabled,
    mock_flag_variation,
@@ -111,3 +112,59 @@ async def test_is_feature_enabled_with_flag_enum(mocker):
    assert result is True
    # Should call with the flag's string value
    mock_get_feature_flag_value.assert_called_once()
+
+
+class TestEnvFlagOverride:
+    def test_force_flag_true(self, monkeypatch: pytest.MonkeyPatch):
+        monkeypatch.setenv("FORCE_FLAG_CHAT", "true")
+        assert _env_flag_override(Flag.CHAT) is True
+
+    def test_force_flag_false(self, monkeypatch: pytest.MonkeyPatch):
+        monkeypatch.setenv("FORCE_FLAG_CHAT", "false")
+        assert _env_flag_override(Flag.CHAT) is False
+
+    def test_next_public_prefix_true(self, monkeypatch: pytest.MonkeyPatch):
+        monkeypatch.setenv("NEXT_PUBLIC_FORCE_FLAG_CHAT", "true")
+        assert _env_flag_override(Flag.CHAT) is True
+
+    def test_unset_returns_none(self, monkeypatch: pytest.MonkeyPatch):
+        monkeypatch.delenv("FORCE_FLAG_CHAT", raising=False)
+        monkeypatch.delenv("NEXT_PUBLIC_FORCE_FLAG_CHAT", raising=False)
+        assert _env_flag_override(Flag.CHAT) is None
+
+    def test_invalid_value_returns_false(self, monkeypatch: pytest.MonkeyPatch):
+        monkeypatch.setenv("FORCE_FLAG_CHAT", "notaboolean")
+        assert _env_flag_override(Flag.CHAT) is False
+
+    def test_numeric_one_returns_true(self, monkeypatch: pytest.MonkeyPatch):
+        monkeypatch.setenv("FORCE_FLAG_CHAT", "1")
+        assert _env_flag_override(Flag.CHAT) is True
+
+    def test_yes_returns_true(self, monkeypatch: pytest.MonkeyPatch):
+        monkeypatch.setenv("FORCE_FLAG_CHAT", "yes")
+        assert _env_flag_override(Flag.CHAT) is True
+
+    def test_on_returns_true(self, monkeypatch: pytest.MonkeyPatch):
+        monkeypatch.setenv("FORCE_FLAG_CHAT", "on")
+        assert _env_flag_override(Flag.CHAT) is True
+
+    def test_hyphenated_flag_converts_to_underscore(
+        self, monkeypatch: pytest.MonkeyPatch
+    ):
+        monkeypatch.setenv("FORCE_FLAG_CHAT_MODE_OPTION", "true")
+        assert _env_flag_override(Flag.CHAT_MODE_OPTION) is True
+
+    def test_force_flag_takes_precedence_over_next_public(
+        self, monkeypatch: pytest.MonkeyPatch
+    ):
+        monkeypatch.setenv("FORCE_FLAG_CHAT", "false")
+        monkeypatch.setenv("NEXT_PUBLIC_FORCE_FLAG_CHAT", "true")
+        assert _env_flag_override(Flag.CHAT) is False
+
+    def test_whitespace_is_stripped(self, monkeypatch: pytest.MonkeyPatch):
+        monkeypatch.setenv("FORCE_FLAG_CHAT", "  true  ")
+        assert _env_flag_override(Flag.CHAT) is True
+
+    def test_case_insensitive_value(self, monkeypatch: pytest.MonkeyPatch):
+        monkeypatch.setenv("FORCE_FLAG_CHAT", "TRUE")
+        assert _env_flag_override(Flag.CHAT) is True
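
The override lookup tested above can be exercised without LaunchDarkly or the `Flag` enum. This is a minimal sketch, assuming a hypothetical standalone `env_flag_override` that takes the raw flag string; it restates the logic from the `feature_flag.py` hunk and is not a replacement for it.

```python
import os
from typing import Optional

TRUTHY = ("1", "true", "yes", "on")


def env_flag_override(flag_value: str) -> Optional[bool]:
    """Illustrative restatement of the lookup; takes the raw flag string."""
    suffix = flag_value.upper().replace("-", "_")
    # FORCE_FLAG_ is checked first, so it wins over NEXT_PUBLIC_FORCE_FLAG_.
    for prefix in ("FORCE_FLAG_", "NEXT_PUBLIC_FORCE_FLAG_"):
        raw = os.environ.get(prefix + suffix)
        if raw is not None:
            return raw.strip().lower() in TRUTHY
    return None  # no override configured; the caller consults LaunchDarkly


os.environ.pop("FORCE_FLAG_CHAT_MODE_OPTION", None)
os.environ["NEXT_PUBLIC_FORCE_FLAG_CHAT_MODE_OPTION"] = " On "
forced = env_flag_override("chat-mode-option")

os.environ.pop("NEXT_PUBLIC_FORCE_FLAG_CHAT_MODE_OPTION")
unset = env_flag_override("chat-mode-option")
```

One consequence worth keeping in mind: any value outside the truthy set, including a typo such as `notaboolean`, forces the flag off rather than falling through to LaunchDarkly.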
--- a/autogpt_platform/backend/backend/util/workspace.py
+++ b/autogpt_platform/backend/backend/util/workspace.py
@@ -155,6 +155,7 @@ class WorkspaceManager:
        path: Optional[str] = None,
        mime_type: Optional[str] = None,
        overwrite: bool = False,
+        metadata: Optional[dict] = None,
    ) -> WorkspaceFile:
        """
        Write file to workspace.
@@ -168,6 +169,7 @@ class WorkspaceManager:
            path: Virtual path (defaults to "/{filename}", session-scoped if session_id set)
            mime_type: MIME type (auto-detected if not provided)
            overwrite: Whether to overwrite existing file at path
+            metadata: Optional metadata dict (e.g., origin tracking)

        Returns:
            Created WorkspaceFile instance
@@ -246,6 +248,7 @@ class WorkspaceManager:
                    mime_type=mime_type,
                    size_bytes=len(content),
                    checksum=checksum,
+                    metadata=metadata,
                )
            except UniqueViolationError:
                if retries > 0:
--- a/autogpt_platform/backend/migrations/20260326200000_add_rate_limit_tier/migration.sql
+++ b/autogpt_platform/backend/migrations/20260326200000_add_rate_limit_tier/migration.sql
@@ -0,0 +1,5 @@
+-- CreateEnum
+CREATE TYPE "SubscriptionTier" AS ENUM ('FREE', 'PRO', 'BUSINESS', 'ENTERPRISE');
+
+-- AlterTable: add subscriptionTier column with default PRO (beta testing)
+ALTER TABLE "User" ADD COLUMN "subscriptionTier" "SubscriptionTier" NOT NULL DEFAULT 'PRO';
--- a/autogpt_platform/backend/migrations/20260402120000_add_platform_cost_log/migration.sql
+++ b/autogpt_platform/backend/migrations/20260402120000_add_platform_cost_log/migration.sql
@@ -0,0 +1,42 @@
+-- CreateTable
+CREATE TABLE "PlatformCostLog" (
+    "id" TEXT NOT NULL,
+    "createdAt" TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
+    "userId" TEXT,
+    "graphExecId" TEXT,
+    "nodeExecId" TEXT,
+    "graphId" TEXT,
+    "nodeId" TEXT,
+    "blockId" TEXT NOT NULL,
+    "blockName" TEXT NOT NULL,
+    "provider" TEXT NOT NULL,
+    "credentialId" TEXT NOT NULL,
+    "costMicrodollars" BIGINT,
+    "inputTokens" INTEGER,
+    "outputTokens" INTEGER,
+    "dataSize" INTEGER,
+    "duration" DOUBLE PRECISION,
+    "model" TEXT,
+    "trackingType" TEXT,
+    "metadata" JSONB,
+
+    CONSTRAINT "PlatformCostLog_pkey" PRIMARY KEY ("id")
+);
+
+-- CreateIndex
+CREATE INDEX "PlatformCostLog_userId_createdAt_idx" ON "PlatformCostLog"("userId", "createdAt");
+
+-- CreateIndex
+CREATE INDEX "PlatformCostLog_provider_createdAt_idx" ON "PlatformCostLog"("provider", "createdAt");
+
+-- CreateIndex
+CREATE INDEX "PlatformCostLog_createdAt_idx" ON "PlatformCostLog"("createdAt");
+
+-- CreateIndex
+CREATE INDEX "PlatformCostLog_graphExecId_idx" ON "PlatformCostLog"("graphExecId");
+
+-- CreateIndex
+CREATE INDEX "PlatformCostLog_provider_trackingType_idx" ON "PlatformCostLog"("provider", "trackingType");
+
+-- AddForeignKey
+ALTER TABLE "PlatformCostLog" ADD CONSTRAINT "PlatformCostLog_userId_fkey" FOREIGN KEY ("userId") REFERENCES "User"("id") ON DELETE SET NULL ON UPDATE CASCADE;
--- a/autogpt_platform/backend/migrations/20260405140000_add_platform_cost_tracking_amount/migration.sql
+++ b/autogpt_platform/backend/migrations/20260405140000_add_platform_cost_tracking_amount/migration.sql
@@ -0,0 +1,2 @@
+-- AlterTable
+ALTER TABLE "PlatformCostLog" ADD COLUMN "trackingAmount" DOUBLE PRECISION;
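
The `costMicrodollars` column stores an integer number of microdollars (1 USD = 1,000,000). A minimal sketch of the conversion follows, assuming a hypothetical helper `to_microdollars`; the `math.isfinite` guard mirrors the one the commit message describes for `extract_openrouter_cost`, and whether the logging path applies the same guard is an assumption here.

```python
import math
from typing import Optional


def to_microdollars(cost_usd: Optional[float]) -> Optional[int]:
    """Hypothetical helper: USD float -> integer microdollars, or None."""
    # Missing, inf and nan costs are treated as unknown rather than stored.
    if cost_usd is None or not math.isfinite(cost_usd):
        return None
    # round() picks the nearest integer; int() would truncate toward zero.
    return round(cost_usd * 1_000_000)


results = [to_microdollars(c) for c in (0.0015, None, float("inf"))]
```

Rounding instead of truncating keeps a product that lands a hair below a whole number from losing a full microdollar.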
--- a/autogpt_platform/backend/schema.prisma
+++ b/autogpt_platform/backend/schema.prisma
@@ -40,6 +40,15 @@ model User {

  timezone String @default("not-set")

+  // CoPilot subscription tier — controls rate-limit multipliers.
+  // Multipliers applied in get_global_rate_limits(): FREE=1x, PRO=5x, BUSINESS=20x, ENTERPRISE=60x.
+  // NOTE: @default(PRO) is intentional for the beta period — all existing and new
+  // users receive PRO-level (5x) rate limits by default. The Python-level constant
+  // DEFAULT_TIER=FREE (in copilot/rate_limit.py) acts as a code-level fallback when
+  // the DB value is NULL or unrecognised. At GA, a migration will flip the column
+  // default to FREE and batch-update users to their billing-derived tiers.
+  subscriptionTier SubscriptionTier @default(PRO)
+
  // Relations

  AgentGraphs          AgentGraph[]
@@ -66,6 +75,8 @@ model User {
  PendingHumanReviews   PendingHumanReview[]
  Workspace             UserWorkspace?

+  PlatformCostLogs PlatformCostLog[]
+
  // OAuth Provider relations
  OAuthApplications       OAuthApplication[]
  OAuthAuthorizationCodes OAuthAuthorizationCode[]
@@ -73,6 +84,13 @@ model User {
  OAuthRefreshTokens      OAuthRefreshToken[]
 }

+enum SubscriptionTier {
+  FREE
+  PRO
+  BUSINESS
+  ENTERPRISE
+}
+
 enum OnboardingStep {
  // Introductory onboarding (Library)
  WELCOME
@@ -799,6 +817,45 @@ model CreditRefundRequest {
  @@index([userId, transactionKey])
 }

+////////////////////////////////////////////////////////////
+////////////////////////////////////////////////////////////
+//////////   Platform Cost Tracking TABLES    //////////////
+////////////////////////////////////////////////////////////
+
+model PlatformCostLog {
+  id        String   @id @default(uuid())
+  createdAt DateTime @default(now())
+
+  userId       String?
+  User         User?  @relation(fields: [userId], references: [id], onDelete: SetNull)
+  graphExecId  String?
+  nodeExecId   String?
+  graphId      String?
+  nodeId       String?
+  blockId      String
+  blockName    String
+  provider     String
+  credentialId String
+
+  // Cost in microdollars (1 USD = 1,000,000). Null if unknown.
+  costMicrodollars BigInt?
+
+  inputTokens  Int?
+  outputTokens Int?
+  dataSize     Int? // bytes
+  duration     Float? // seconds
+  model        String?
+  trackingType   String? // e.g. "cost_usd", "tokens", "characters", "items", "per_run", "sandbox_seconds", "walltime_seconds"
+  trackingAmount Float? // Amount in the unit implied by trackingType
+  metadata       Json?
+
+  @@index([userId, createdAt])
+  @@index([provider, createdAt])
+  @@index([createdAt])
+  @@index([graphExecId])
+  @@index([provider, trackingType])
+}
+
 ////////////////////////////////////////////////////////////
 ////////////////////////////////////////////////////////////
 //////////////   Store TABLES    ///////////////////////////
--- a/autogpt_platform/backend/snapshots/get_rate_limit
+++ b/autogpt_platform/backend/snapshots/get_rate_limit
@@ -1,6 +1,7 @@
 {
  "daily_token_limit": 2500000,
  "daily_tokens_used": 500000,
+  "tier": "FREE",
  "user_email": "target@example.com",
  "user_id": "5e53486c-cf57-477e-ba2a-cb02dc828e1c",
  "weekly_token_limit": 12500000,
--- a/autogpt_platform/backend/snapshots/reset_user_usage_daily_and_weekly
+++ b/autogpt_platform/backend/snapshots/reset_user_usage_daily_and_weekly
@@ -1,6 +1,7 @@
 {
  "daily_token_limit": 2500000,
  "daily_tokens_used": 0,
+  "tier": "FREE",
  "user_email": "target@example.com",
  "user_id": "5e53486c-cf57-477e-ba2a-cb02dc828e1c",
  "weekly_token_limit": 12500000,
--- a/autogpt_platform/backend/snapshots/reset_user_usage_daily_only
+++ b/autogpt_platform/backend/snapshots/reset_user_usage_daily_only
@@ -1,6 +1,7 @@
 {
  "daily_token_limit": 2500000,
  "daily_tokens_used": 0,
+  "tier": "FREE",
  "user_email": "target@example.com",
  "user_id": "5e53486c-cf57-477e-ba2a-cb02dc828e1c",
  "weekly_token_limit": 12500000,
--- a/autogpt_platform/backend/test/agent_generator/test_orchestrator.py
+++ b/autogpt_platform/backend/test/agent_generator/test_orchestrator.py
@@ -140,7 +140,9 @@ class TestFixOrchestratorBlocks:
        assert defaults["conversation_compaction"] is True
        assert defaults["retry"] == 3
        assert defaults["multiple_tool_calls"] is False
-        assert len(fixer.fixes_applied) == 4
+        assert defaults["execution_mode"] == "extended_thinking"
+        assert defaults["model"] == "claude-opus-4-6"
+        assert len(fixer.fixes_applied) == 6

    def test_preserves_existing_values(self):
        """Existing user-set values are never overwritten."""
@@ -153,6 +155,8 @@ class TestFixOrchestratorBlocks:
                        "conversation_compaction": False,
                        "retry": 1,
                        "multiple_tool_calls": True,
+                        "execution_mode": "built_in",
+                        "model": "gpt-4o",
                    }
                )
            ],
@@ -166,6 +170,8 @@ class TestFixOrchestratorBlocks:
        assert defaults["conversation_compaction"] is False
        assert defaults["retry"] == 1
        assert defaults["multiple_tool_calls"] is True
+        assert defaults["execution_mode"] == "built_in"
+        assert defaults["model"] == "gpt-4o"
        assert len(fixer.fixes_applied) == 0

    def test_partial_defaults(self):
@@ -189,7 +195,9 @@ class TestFixOrchestratorBlocks:
        assert defaults["conversation_compaction"] is True  # filled
        assert defaults["retry"] == 3  # filled
        assert defaults["multiple_tool_calls"] is False  # filled
-        assert len(fixer.fixes_applied) == 3
+        assert defaults["execution_mode"] == "extended_thinking"  # filled
+        assert defaults["model"] == "claude-opus-4-6"  # filled
+        assert len(fixer.fixes_applied) == 5

    def test_skips_non_sdm_nodes(self):
        """Non-Orchestrator nodes are untouched."""
@@ -258,11 +266,13 @@ class TestFixOrchestratorBlocks:
        result = fixer.fix_orchestrator_blocks(agent)

        defaults = result["nodes"][0]["input_default"]
-        assert defaults["agent_mode_max_iterations"] == 10  # None → default
-        assert defaults["conversation_compaction"] is True  # None → default
+        assert defaults["agent_mode_max_iterations"] == 10  # None -> default
+        assert defaults["conversation_compaction"] is True  # None -> default
        assert defaults["retry"] == 3  # kept
        assert defaults["multiple_tool_calls"] is False  # kept
-        assert len(fixer.fixes_applied) == 2
+        assert defaults["execution_mode"] == "extended_thinking"  # filled
+        assert defaults["model"] == "claude-opus-4-6"  # filled
+        assert len(fixer.fixes_applied) == 4

    def test_multiple_sdm_nodes(self):
        """Multiple SDM nodes are all fixed independently."""
@@ -277,11 +287,11 @@ class TestFixOrchestratorBlocks:

        result = fixer.fix_orchestrator_blocks(agent)

-        # First node: 3 defaults filled (agent_mode was already set)
+        # First node: 5 defaults filled (agent_mode was already set)
        assert result["nodes"][0]["input_default"]["agent_mode_max_iterations"] == 3
-        # Second node: all 4 defaults filled
+        # Second node: all 6 defaults filled
        assert result["nodes"][1]["input_default"]["agent_mode_max_iterations"] == 10
-        assert len(fixer.fixes_applied) == 7  # 3 + 4
+        assert len(fixer.fixes_applied) == 11  # 5 + 6

    def test_registered_in_apply_all_fixes(self):
        """fix_orchestrator_blocks runs as part of apply_all_fixes."""
@@ -655,6 +665,7 @@ class TestOrchestratorE2EPipeline:
                        "conversation_compaction": {"type": "boolean"},
                        "retry": {"type": "integer"},
                        "multiple_tool_calls": {"type": "boolean"},
+                        "execution_mode": {"type": "string"},
                    },
                    "required": ["prompt"],
                },
--- a/autogpt_platform/backend/test/copilot/__init__.py
+++ b/autogpt_platform/backend/test/copilot/__init__.py
--- a/autogpt_platform/backend/test/copilot/dry_run_loop_test.py
+++ b/autogpt_platform/backend/test/copilot/dry_run_loop_test.py
@@ -0,0 +1,394 @@
+"""Prompt regression tests AND functional tests for the dry-run verification loop.
+
+NOTE: This file lives in test/copilot/ rather than being colocated with a
+single source module because it is a cross-cutting test spanning multiple
+modules: prompting.py, service.py, agent_generation_guide.md, and run_agent.py.
+
+These tests verify that the create -> dry-run -> fix iterative workflow is
+properly communicated through tool descriptions, the prompting supplement,
+and the agent building guide.
+
+After deduplication, the full dry-run workflow lives in the
+agent_generation_guide.md only. The system prompt and individual tool
+descriptions no longer repeat it — they keep a minimal footprint.
+
+**Intentionally brittle**: the assertions check for specific substrings so
+that accidental removal or rewording of key instructions is caught. If you
+deliberately reword a prompt, update the corresponding assertion here.
+
+--- Functional tests (added separately) ---
+
+The dry-run loop is primarily a *prompt/guide* feature — the copilot reads
+the guide and follows its instructions.  There are no standalone Python
+functions that implement "loop until passing" logic; the loop is driven by
+the LLM.  However, several pieces of real Python infrastructure make the
+loop possible:
+
+1. The ``run_agent`` and ``run_block`` OpenAI tool schemas expose a
+   ``dry_run`` boolean parameter that the LLM must be able to set.
+2. The ``RunAgentInput`` Pydantic model validates ``dry_run`` as a required
+   bool, so the executor can branch on it.
+3. The ``_check_prerequisites`` method in ``RunAgentTool`` bypasses
+   credential and missing-input gates when ``dry_run=True``.
+4. The guide documents the workflow steps in a specific order that the LLM
+   must follow: create/edit -> dry-run -> inspect -> fix -> repeat.
+
+The functional test classes below exercise items 1-4 directly.
+"""
+
+import re
+from pathlib import Path
+from typing import Any, cast
+
+import pytest
+from openai.types.chat import ChatCompletionToolParam
+from pydantic import ValidationError
+
+from backend.copilot.prompting import get_sdk_supplement
+from backend.copilot.service import DEFAULT_SYSTEM_PROMPT
+from backend.copilot.tools import TOOL_REGISTRY
+from backend.copilot.tools.run_agent import RunAgentInput
+
+# Resolved once for the whole module so individual tests stay fast.
+_SDK_SUPPLEMENT = get_sdk_supplement(use_e2b=False, cwd="/tmp/test")
+
+
+# ---------------------------------------------------------------------------
+# Prompt regression tests (original)
+# ---------------------------------------------------------------------------
+
+
+class TestSystemPromptBasics:
+    """Verify the system prompt includes essential baseline content.
+
+    After deduplication, the dry-run workflow lives only in the guide.
+    The system prompt carries tone and personality only.
+    """
+
+    def test_mentions_automations(self):
+        assert "automations" in DEFAULT_SYSTEM_PROMPT.lower()
+
+    def test_mentions_action_oriented(self):
+        assert "action-oriented" in DEFAULT_SYSTEM_PROMPT.lower()
+
+
+class TestToolDescriptionsDryRunLoop:
+    """Verify tool descriptions and parameters related to the dry-run loop."""
+
+    def test_get_agent_building_guide_mentions_workflow(self):
+        desc = TOOL_REGISTRY["get_agent_building_guide"].description
+        assert "dry-run" in desc.lower()
+
+    def test_run_agent_dry_run_param_exists_and_is_boolean(self):
+        schema = TOOL_REGISTRY["run_agent"].as_openai_tool()
+        params = cast(dict[str, Any], schema["function"].get("parameters", {}))
+        assert "dry_run" in params["properties"]
+        assert params["properties"]["dry_run"]["type"] == "boolean"
+
+    def test_run_agent_dry_run_param_mentions_simulation(self):
+        """After deduplication the dry_run param description mentions simulation."""
+        schema = TOOL_REGISTRY["run_agent"].as_openai_tool()
+        params = cast(dict[str, Any], schema["function"].get("parameters", {}))
+        dry_run_desc = params["properties"]["dry_run"]["description"]
+        assert "simulat" in dry_run_desc.lower()
+
+
+class TestPromptingSupplementContent:
+    """Verify the prompting supplement (via get_sdk_supplement) includes
+    essential shared tool notes.  After deduplication, the dry-run workflow
+    lives only in the guide; the supplement carries storage, file-handling,
+    and tool-discovery notes.
+    """
+
+    def test_includes_tool_discovery_priority(self):
+        assert "Tool Discovery Priority" in _SDK_SUPPLEMENT
+
+    def test_includes_find_block_first(self):
+        assert "find_block" in _SDK_SUPPLEMENT
+
+    def test_includes_send_authenticated_web_request(self):
+        assert "SendAuthenticatedWebRequestBlock" in _SDK_SUPPLEMENT
+
+
+class TestAgentBuildingGuideDryRunLoop:
+    """Verify the agent building guide includes the dry-run loop."""
+
+    @pytest.fixture
+    def guide_content(self):
+        guide_path = (
+            Path(__file__).resolve().parent.parent.parent
+            / "backend"
+            / "copilot"
+            / "sdk"
+            / "agent_generation_guide.md"
+        )
+        return guide_path.read_text(encoding="utf-8")
+
+    def test_has_dry_run_verification_section(self, guide_content):
+        assert "REQUIRED: Dry-Run Verification Loop" in guide_content
+
+    def test_workflow_includes_dry_run_step(self, guide_content):
+        assert "dry_run=True" in guide_content
+
+    def test_mentions_good_vs_bad_output(self, guide_content):
+        assert "**Good output**" in guide_content
+        assert "**Bad output**" in guide_content
+
+    def test_mentions_repeat_until_pass(self, guide_content):
+        lower = guide_content.lower()
+        assert "repeat" in lower
+        assert "clearly unfixable" in lower
+
+    def test_mentions_wait_for_result(self, guide_content):
+        assert "wait_for_result=120" in guide_content
+
+    def test_mentions_view_agent_output(self, guide_content):
+        assert "view_agent_output" in guide_content
+
+    def test_workflow_has_dry_run_and_inspect_steps(self, guide_content):
+        assert "**Dry-run**" in guide_content
+        assert "**Inspect & fix**" in guide_content
+
+
+# ---------------------------------------------------------------------------
+# Functional tests: tool schema validation
+# ---------------------------------------------------------------------------
+
+
+class TestRunAgentToolSchema:
+    """Validate the run_agent OpenAI tool schema exposes dry_run correctly.
+
+    These go beyond substring checks — they verify the full schema structure
+    that the LLM receives, ensuring the parameter is well-formed and will be
+    parsed correctly by OpenAI function-calling.
+    """
+
+    @pytest.fixture
+    def schema(self) -> ChatCompletionToolParam:
+        return TOOL_REGISTRY["run_agent"].as_openai_tool()
+
+    def test_schema_is_valid_openai_tool(self, schema: ChatCompletionToolParam):
+        """The schema has the required top-level OpenAI structure."""
+        assert schema["type"] == "function"
+        assert "function" in schema
+        func = schema["function"]
+        assert "name" in func
+        assert "description" in func
+        assert "parameters" in func
+        assert func["name"] == "run_agent"
+
+    def test_dry_run_is_required(self, schema: ChatCompletionToolParam):
+        """dry_run must be in 'required' so the LLM always provides it explicitly."""
+        params = cast(dict[str, Any], schema["function"].get("parameters", {}))
+        required = params.get("required", [])
+        assert "dry_run" in required
+
+    def test_dry_run_is_boolean_type(self, schema: ChatCompletionToolParam):
+        """dry_run must be typed as boolean so the LLM generates true/false."""
+        params = cast(dict[str, Any], schema["function"].get("parameters", {}))
+        assert params["properties"]["dry_run"]["type"] == "boolean"
+
+    def test_dry_run_description_is_nonempty(self, schema: ChatCompletionToolParam):
+        """The description must be present and substantive for LLM guidance."""
+        params = cast(dict[str, Any], schema["function"].get("parameters", {}))
+        desc = params["properties"]["dry_run"]["description"]
+        assert isinstance(desc, str)
+        assert len(desc) > 10, "Description too short to guide the LLM"
+
+    def test_wait_for_result_coexists_with_dry_run(
+        self, schema: ChatCompletionToolParam
+    ):
+        """wait_for_result must also be present — the guide instructs the LLM
+        to pass both dry_run=True and wait_for_result=120 together."""
+        params = cast(dict[str, Any], schema["function"].get("parameters", {}))
+        assert "wait_for_result" in params["properties"]
+        assert params["properties"]["wait_for_result"]["type"] == "integer"
+
+
+class TestRunBlockToolSchema:
+    """Validate the run_block OpenAI tool schema exposes dry_run correctly."""
+
+    @pytest.fixture
+    def schema(self) -> ChatCompletionToolParam:
+        return TOOL_REGISTRY["run_block"].as_openai_tool()
+
+    def test_schema_is_valid_openai_tool(self, schema: ChatCompletionToolParam):
+        assert schema["type"] == "function"
+        func = schema["function"]
+        assert func["name"] == "run_block"
+        assert "parameters" in func
+
+    def test_dry_run_exists_and_is_boolean(self, schema: ChatCompletionToolParam):
+        params = cast(dict[str, Any], schema["function"].get("parameters", {}))
+        props = params["properties"]
+        assert "dry_run" in props
+        assert props["dry_run"]["type"] == "boolean"
+
+    def test_dry_run_is_required(self, schema: ChatCompletionToolParam):
+        """dry_run must be required — along with block_id and input_data."""
+        params = cast(dict[str, Any], schema["function"].get("parameters", {}))
+        required = params.get("required", [])
+        assert "dry_run" in required
+        assert "block_id" in required
+        assert "input_data" in required
+
+    def test_dry_run_description_mentions_preview(
+        self, schema: ChatCompletionToolParam
+    ):
+        params = cast(dict[str, Any], schema["function"].get("parameters", {}))
+        desc = params["properties"]["dry_run"]["description"]
+        assert isinstance(desc, str)
+        assert (
+            "preview mode" in desc.lower()
+        ), "run_block dry_run description should mention preview mode"
+
+
+# ---------------------------------------------------------------------------
+# Functional tests: RunAgentInput Pydantic model
+# ---------------------------------------------------------------------------
+
+
+class TestRunAgentInputModel:
+    """Validate RunAgentInput Pydantic model handles dry_run correctly.
+
+    The executor reads dry_run from this model, so it must parse, coerce,
+    and validate properly.
+    """
+
+    def test_dry_run_accepts_true(self):
+        model = RunAgentInput(username_agent_slug="user/agent", dry_run=True)
+        assert model.dry_run is True
+
+    def test_dry_run_accepts_false(self):
+        """dry_run=False must be accepted when provided explicitly."""
+        model = RunAgentInput(username_agent_slug="user/agent", dry_run=False)
+        assert model.dry_run is False
+
+    def test_dry_run_coerces_truthy_int(self):
+        """Pydantic bool fields coerce int 1 to True."""
+        model = RunAgentInput(username_agent_slug="user/agent", dry_run=1)  # type: ignore[arg-type]
+        assert model.dry_run is True
+
+    def test_dry_run_coerces_falsy_int(self):
+        """Pydantic bool fields coerce int 0 to False."""
+        model = RunAgentInput(username_agent_slug="user/agent", dry_run=0)  # type: ignore[arg-type]
+        assert model.dry_run is False
+
+    def test_dry_run_with_wait_for_result(self):
+        """The guide instructs passing both dry_run=True and wait_for_result=120.
+        The model must accept this combination."""
+        model = RunAgentInput(
+            username_agent_slug="user/agent",
+            dry_run=True,
+            wait_for_result=120,
+        )
+        assert model.dry_run is True
+        assert model.wait_for_result == 120
+
+    def test_wait_for_result_upper_bound(self):
+        """wait_for_result is bounded at 300 seconds (ge=0, le=300)."""
+        with pytest.raises(ValidationError):
+            RunAgentInput(
+                username_agent_slug="user/agent",
+                dry_run=True,
+                wait_for_result=301,
+            )
+
+    def test_string_fields_are_stripped(self):
+        """The strip_strings validator should strip whitespace from string fields."""
+        model = RunAgentInput(username_agent_slug="  user/agent  ", dry_run=True)
+        assert model.username_agent_slug == "user/agent"
+
+
+# ---------------------------------------------------------------------------
+# Functional tests: guide documents the correct workflow ordering
+# ---------------------------------------------------------------------------
+
+
+class TestGuideWorkflowOrdering:
+    """Verify the guide documents workflow steps in the correct order.
+
+    The LLM must see: create/edit -> dry-run -> inspect -> fix -> repeat.
+    If these steps are reordered, the copilot would follow the wrong sequence.
+    These tests verify *ordering*, not just presence.
+    """
+
+    @pytest.fixture
+    def guide_content(self) -> str:
+        guide_path = (
+            Path(__file__).resolve().parent.parent.parent
+            / "backend"
+            / "copilot"
+            / "sdk"
+            / "agent_generation_guide.md"
+        )
+        return guide_path.read_text(encoding="utf-8")
+
+    def test_create_before_dry_run_in_workflow(self, guide_content: str):
+        """Step 7 (Save/create_agent) must appear before step 8 (Dry-run)."""
+        create_pos = guide_content.index("create_agent")
+        dry_run_pos = guide_content.index("dry_run=True")
+        assert (
+            create_pos < dry_run_pos
+        ), "create_agent must appear before dry_run=True in the workflow"
+
+    def test_dry_run_before_inspect_in_verification_section(self, guide_content: str):
+        """In the verification loop section, Dry-run step must come before
+        Inspect & fix step."""
+        section_start = guide_content.index("REQUIRED: Dry-Run Verification Loop")
+        section = guide_content[section_start:]
+        dry_run_pos = section.index("**Dry-run**")
+        inspect_pos = section.index("**Inspect")
+        assert (
+            dry_run_pos < inspect_pos
+        ), "Dry-run step must come before Inspect & fix in the verification loop"
+
+    def test_fix_before_repeat_in_verification_section(self, guide_content: str):
+        """The Fix step must come before the Repeat step."""
+        section_start = guide_content.index("REQUIRED: Dry-Run Verification Loop")
+        section = guide_content[section_start:]
+        fix_pos = section.index("**Fix**")
+        repeat_pos = section.index("**Repeat**")
+        assert fix_pos < repeat_pos
+
+    def test_good_output_before_bad_output(self, guide_content: str):
+        """Good output examples should be listed before bad output examples,
+        so the LLM sees the success pattern first."""
+        good_pos = guide_content.index("**Good output**")
+        bad_pos = guide_content.index("**Bad output**")
+        assert good_pos < bad_pos
+
+    def test_numbered_steps_in_verification_section(self, guide_content: str):
+        """The step-by-step workflow should have numbered steps 1-5."""
+        section_start = guide_content.index("Step-by-step workflow")
+        section = guide_content[section_start:]
+        # The section should contain numbered items 1 through 5
+        for step_num in range(1, 6):
+            assert (
+                f"{step_num}. " in section
+            ), f"Missing numbered step {step_num} in verification workflow"
+
+    def test_workflow_steps_are_in_numbered_order(self, guide_content: str):
+        """The main workflow steps (1-9) must appear in ascending order."""
+        # Extract the numbered workflow items from the top-level workflow section
+        workflow_start = guide_content.index("### Workflow for Creating/Editing Agents")
+        # End at the next ### section
+        next_section = guide_content.index("### Agent JSON Structure")
+        workflow_section = guide_content[workflow_start:next_section]
+        step_positions = []
+        for step_num in range(1, 10):
+            pattern = rf"^{step_num}\.\s"
+            match = re.search(pattern, workflow_section, re.MULTILINE)
+            if match:
+                step_positions.append((step_num, match.start()))
+        # Verify that all of steps 1-9 are present and appear in order
+        assert (
+            len(step_positions) >= 9
+        ), f"Expected 9 workflow steps, found {len(step_positions)}"
+        for i in range(1, len(step_positions)):
+            prev_num, prev_pos = step_positions[i - 1]
+            curr_num, curr_pos = step_positions[i]
+            assert prev_pos < curr_pos, (
+                f"Step {prev_num} (pos {prev_pos}) should appear before "
+                f"step {curr_num} (pos {curr_pos})"
+            )
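Reviewer sketch (not part of the patch): the ordering tests above all compare substring positions or scan for numbered lines. A minimal, standalone illustration of both checks follows. `SAMPLE_GUIDE`, `markers_in_order`, and `numbered_steps` are hypothetical names introduced here for illustration. The sample text is invented and is not the real `agent_generation_guide.md`.

```python
import re

# Hypothetical excerpt; the real guide content is not reproduced here.
SAMPLE_GUIDE = """\
### Workflow
1. Call create_agent to save the draft.
2. **Dry-run** with dry_run=True and wait_for_result=120.
3. **Inspect & fix** the output.
4. **Fix** any failing node.
5. **Repeat** until the run passes.
"""


def markers_in_order(text: str, markers: list[str]) -> bool:
    """Return True if the markers occur in text as an ordered, non-overlapping sequence."""
    cursor = 0
    for marker in markers:
        found = text.find(marker, cursor)
        if found == -1:
            return False
        # Continue searching only after the marker just matched.
        cursor = found + len(marker)
    return True


def numbered_steps(text: str) -> list[int]:
    """Collect the leading numbers of lines shaped like '3. ...', in document order."""
    return [int(m.group(1)) for m in re.finditer(r"^(\d+)\.\s", text, re.MULTILINE)]
```

Unlike the `str.index` comparisons in the tests, which compare first occurrences only, this helper searches forward from the previous match. It therefore tolerates a marker that also appears earlier in the text. Which behaviour is preferable depends on how strict the regression check should be.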
--- a/autogpt_platform/docker-compose.yml
+++ b/autogpt_platform/docker-compose.yml
@@ -98,6 +98,7 @@ services:
      - CLAMD_CONF_MaxScanSize=100M
      - CLAMD_CONF_MaxThreads=12
      - CLAMD_CONF_ReadTimeout=300
+      - CLAMD_CONF_TCPAddr=0.0.0.0
    healthcheck:
      test: ["CMD-SHELL", "clamdscan --version || exit 1"]
      interval: 30s
--- a/autogpt_platform/frontend/AGENTS.md
+++ b/autogpt_platform/frontend/AGENTS.md
@@ -40,6 +40,8 @@ After making **any** code changes in the frontend, you MUST run the following co

 Do NOT skip these steps. If any command reports errors, fix them and re-run until clean. Only then may you consider the task complete. If typing keeps failing, stop and ask the user.

+4. `pnpm test:unit` — run integration tests; fix any failures
+
 ### Code Style

 - Fully capitalize acronyms in symbols, e.g. `graphID`, `useBackendAPI`
@@ -62,7 +64,7 @@ Do NOT skip these steps. If any command reports errors, fix them and re-run unti
 - **Icons**: Phosphor Icons only
 - **Feature Flags**: LaunchDarkly integration
 - **Error Handling**: ErrorCard for render errors, toast for mutations, Sentry for exceptions
- **Testing**: Playwright for E2E, Storybook for component development
+- **Testing**: Vitest + React Testing Library + MSW for integration tests (primary), Playwright for E2E, Storybook for visual

 ## Environment Configuration

@@ -84,7 +86,12 @@ See @CONTRIBUTING.md for complete patterns. Quick reference:
   - Regenerate with `pnpm generate:api`
   - Pattern: `use{Method}{Version}{OperationName}`
 4. **Styling**: Tailwind CSS only, use design tokens, Phosphor Icons only
-5. **Testing**: Add Storybook stories for new components, Playwright for E2E. When fixing a bug, write a failing Playwright test first (use `.fixme` annotation), implement the fix, then remove the annotation.
+5. **Testing**: Integration tests are the default (~90%). See `TESTING.md` for full details.
+   - **New pages/features**: Write integration tests in `__tests__/` next to `page.tsx` using Vitest + RTL + MSW
+   - **API mocking**: Use Orval-generated MSW handlers from `@/app/api/__generated__/endpoints/{tag}/{tag}.msw.ts`
+   - **Run**: `pnpm test:unit` (integration/unit), `pnpm test` (Playwright E2E)
+   - **Storybook**: For design system components in `src/components/`
+   - **TDD**: Write a failing test first, implement, then verify
 6. **Code conventions**:
   - Use function declarations (not arrow functions) for components/handlers
   - Do not use `useCallback` or `useMemo` unless asked to optimise a given function