fix(copilot): use transient_api_error code for exhausted transient retries

When the except-Exception transient-retry budget was exhausted, the post-loop StreamError was yielded with code='sdk_stream_error' instead of 'transient_api_error', and its text came from _friendly_error_text(raw) instead of FRIENDLY_TRANSIENT_MSG. As a result, the client could not show the same "Try again" affordance as the _HandledStreamError path.

Add a transient_exhausted flag and check it in the post-loop alongside attempts_exhausted to emit the correct code/text.

Also collapse the unnecessary split f-string in the retry StreamStatus message, and add a version comment on the CLAUDE_CODE_DISABLE_* env var block.
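A minimal sketch of the post-loop check this commit describes, not the actual copilot code. The `StreamError` dataclass, `is_transient`, `MAX_TRANSIENT_RETRIES`, `run_with_retries`, and the message wording are all hypothetical stand-ins; only the names `transient_exhausted`, `FRIENDLY_TRANSIENT_MSG`, `_friendly_error_text`, and the two error codes come from the commit message. The `attempts_exhausted` branch is left out because the message does not say what it emits.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Placeholder wording: the real FRIENDLY_TRANSIENT_MSG text is not shown in the commit.
FRIENDLY_TRANSIENT_MSG = "Temporary connection problem. Please try again."
MAX_TRANSIENT_RETRIES = 3  # hypothetical retry budget


@dataclass
class StreamError:
    """Simplified stand-in for the real StreamError event."""
    text: str
    code: str


def _friendly_error_text(raw: str) -> str:
    """Stand-in for the real helper that maps raw errors to user-facing text."""
    return f"Something went wrong: {raw}"


def is_transient(exc: Exception) -> bool:
    """Hypothetical classifier: treat connection and timeout errors as transient."""
    return isinstance(exc, (ConnectionError, TimeoutError))


def run_with_retries(call: Callable[[], str]) -> Optional[StreamError]:
    """Run `call`; return None on success, or the StreamError to yield on failure."""
    transient_exhausted = False
    transient_retries = 0
    raw = ""
    while True:
        try:
            call()
            return None
        except Exception as exc:
            raw = str(exc)
            if not is_transient(exc):
                break  # non-transient failure: no retry
            if transient_retries >= MAX_TRANSIENT_RETRIES:
                transient_exhausted = True  # budget used up: remember why we stopped
                break
            transient_retries += 1

    # Post-loop: the flag selects the code/text the client uses for "Try again".
    if transient_exhausted:
        return StreamError(FRIENDLY_TRANSIENT_MSG, "transient_api_error")
    return StreamError(_friendly_error_text(raw), "sdk_stream_error")
```

Without the flag, both exits from the loop would fall through to the generic `sdk_stream_error` branch, which is the behaviour the commit reports as the bug.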
fix(copilot): fix StreamError ordering and cap exponential backoff
2026-04-08 03:00:28 -04:00 · 2026-04-08 10:19:57 +07:00 · 2026-04-08 10:04:17 +07:00 · 2026-04-08 09:48:56 +07:00 · 2026-04-07 23:19:06 +07:00 · 2026-04-07 21:16:20 +07:00
204 changed files with 21986 additions and 2365 deletions
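The second commit title mentions capping exponential backoff. The sketch below shows that general technique under stated assumptions: the function name `backoff_delay`, the 1-second base, and the 30-second cap are hypothetical, since the commit header does not give the actual values.

```python
def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Seconds to wait before retry number `attempt` (1-based), never above `cap`.

    `base` and `cap` are illustrative defaults, not values taken from the PR.
    """
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    delay = base
    for _ in range(attempt - 1):
        if delay >= cap:
            break  # stop doubling once the cap is reached
        delay *= 2
    return min(delay, cap)


# Delays for the first seven retries: doubling until the cap takes over.
delays = [backoff_delay(n) for n in range(1, 8)]
# → [1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```

Uncapped doubling grows without bound, so a long run of retries could leave the stream idle for minutes; the cap bounds the worst-case wait per attempt.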
--- a/.claude/skills/pr-address/SKILL.md
+++ b/.claude/skills/pr-address/SKILL.md
@@ -95,6 +95,28 @@ Address comments **one at a time**: fix → commit → push → inline reply →
 | Inline review (`pulls/{N}/comments`) | `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments/{ID}/replies -f body="🤖 Fixed in <commit-sha>: <description>"` |
 | Conversation (`issues/{N}/comments`) | `gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments -f body="🤖 Fixed in <commit-sha>: <description>"` |

+## Codecov coverage
+
+Codecov patch target is **80%** on changed lines. Checks are **informational** (not blocking) but should be green.
+
+### Running coverage locally
+
+**Backend** (from `autogpt_platform/backend/`):
+```bash
+poetry run pytest -s -vv --cov=backend --cov-branch --cov-report term-missing
+```
+
+**Frontend** (from `autogpt_platform/frontend/`):
+```bash
+pnpm vitest run --coverage
+```
+
+### When codecov/patch fails
+
+1. Find uncovered files: `git diff --name-only $(gh pr view --json baseRefName --jq '.baseRefName')...HEAD`
+2. For each uncovered file — extract inline logic to `helpers.ts`/`helpers.py` and test those (highest ROI). Colocate tests as `*_test.py` (backend) or `__tests__/*.test.ts` (frontend).
+3. Run coverage locally to verify, commit, push.
+
 ## Format and commit

 After fixing, format the changed code:
--- a/.claude/skills/pr-test/SKILL.md
+++ b/.claude/skills/pr-test/SKILL.md
@@ -530,9 +530,19 @@ After showing all screenshots, output a **detailed** summary table:
 # but Homebrew bash is 5.x; Linux typically has bash 5.x). If running on Bash <4, use a
 # plain variable with a lookup function instead.
 declare -A SCREENSHOT_EXPLANATIONS=(
-  ["01-login-page.png"]="Shows the login page loaded successfully with SSO options visible."
-  ["02-builder-with-block.png"]="The builder canvas displays the newly added block connected to the trigger."
-  # ... one entry per screenshot, using the same explanations you showed the user above
+  # Each explanation MUST answer three things:
+  #   1. FLOW: Which test scenario / user journey is this part of?
+  #   2. STEPS: What exact actions were taken to reach this state?
+  #   3. EVIDENCE: What does this screenshot prove (pass/fail/data)?
+  #
+  # Good example:
+  #   ["03-cost-log-after-run.png"]="Flow: LLM block cost tracking. Steps: Logged in as tester@gmail.com → ran 'Cost Test Agent' → waited for COMPLETED status. Evidence: PlatformCostLog table shows 1 new row with cost_microdollars=1234 and correct user_id."
+  #
+  # Bad example (too vague — never do this):
+  #   ["03-cost-log.png"]="Shows the cost log table."
+  ["01-login-page.png"]="Flow: Login flow. Steps: Opened /login. Evidence: Login page renders with email/password fields and SSO options visible."
+  ["02-builder-with-block.png"]="Flow: Block execution. Steps: Logged in → /build → added LLM block. Evidence: Builder canvas shows block connected to trigger, ready to run."
+  # ... one entry per screenshot using the flow/steps/evidence format above
 )

 TEST_RESULTS_TABLE="| 1 | Login flow | PASS | N/A | 01-login-before.png, 02-login-after.png |
@@ -547,6 +557,9 @@ Upload screenshots to the PR using the GitHub Git API (no local git operations

 **This step is MANDATORY. Every test run MUST post a PR comment with screenshots. No exceptions.**

+> **CRITICAL — NEVER post a bare directory link like `https://github.com/.../tree/...`.**
+> Every screenshot MUST appear as `![name](raw_url)` inline in the PR comment so reviewers can see them without clicking any links. After posting, the verification step below greps the comment for `![` tags and exits 1 if none are found — the test run is considered incomplete until this passes.
+
 ```bash
 # Upload screenshots via GitHub Git API (creates blobs, tree, commit, and ref remotely)
 REPO="Significant-Gravitas/AutoGPT"
@@ -582,12 +595,25 @@ for img in "${SCREENSHOT_FILES[@]}"; do
 done
 TREE_JSON+=']'

-# Step 2: Create tree, commit, and branch ref
+# Step 2: Create tree, commit (with parent), and branch ref
 TREE_SHA=$(echo "$TREE_JSON" | jq -c '{tree: .}' | gh api "repos/${REPO}/git/trees" --input - --jq '.sha')
-COMMIT_SHA=$(gh api "repos/${REPO}/git/commits" \
-  -f message="test: add E2E test screenshots for PR #${PR_NUMBER}" \
-  -f tree="$TREE_SHA" \
-  --jq '.sha')
+
+# Resolve existing branch tip as parent (avoids orphan commits on repeat runs)
+PARENT_SHA=$(gh api "repos/${REPO}/git/refs/heads/${SCREENSHOTS_BRANCH}" --jq '.object.sha' 2>/dev/null || true)
+if [ -n "$PARENT_SHA" ]; then
+  COMMIT_SHA=$(gh api "repos/${REPO}/git/commits" \
+    -f message="test: add E2E test screenshots for PR #${PR_NUMBER}" \
+    -f tree="$TREE_SHA" \
+    -f "parents[]=$PARENT_SHA" \
+    --jq '.sha')
+else
+  # First commit on this branch — no parent
+  COMMIT_SHA=$(gh api "repos/${REPO}/git/commits" \
+    -f message="test: add E2E test screenshots for PR #${PR_NUMBER}" \
+    -f tree="$TREE_SHA" \
+    --jq '.sha')
+fi
+
 gh api "repos/${REPO}/git/refs" \
  -f ref="refs/heads/${SCREENSHOTS_BRANCH}" \
  -f sha="$COMMIT_SHA" 2>/dev/null \
@@ -656,17 +682,123 @@ ${IMAGE_MARKDOWN}
 ${FAILED_SECTION}
 INNEREOF

-gh api "repos/${REPO}/issues/$PR_NUMBER/comments" -F body=@"$COMMENT_FILE"
+POSTED_BODY=$(gh api "repos/${REPO}/issues/$PR_NUMBER/comments" -F body=@"$COMMENT_FILE" --jq '.body')
 rm -f "$COMMENT_FILE"
 ```

 **The PR comment MUST include:**
 1. A summary table of all scenarios with PASS/FAIL and before/after API evidence
 2. Every successfully uploaded screenshot rendered inline; any failed uploads listed with manual attachment instructions
-3. A 1-2 sentence explanation below each screenshot describing what it proves
+3. A structured explanation below each screenshot covering: **Flow** (which scenario), **Steps** (exact actions taken to reach this state), **Evidence** (what this proves — pass/fail/data values). A bare "shows the page" caption is not acceptable.

 This approach uses the GitHub Git API to create blobs, trees, commits, and refs entirely server-side. No local `git checkout` or `git push` — safe for worktrees and won't interfere with the PR branch.

+**Verify inline rendering after posting — this is required, not optional:**
+
+```bash
+# 1. Confirm the posted comment body contains inline image markdown syntax
+if ! echo "$POSTED_BODY" | grep -q '!\['; then
+  echo "❌ FAIL: No inline image tags in posted comment body. Re-check IMAGE_MARKDOWN and re-post."
+  exit 1
+fi
+
+# 2. Verify at least one raw URL actually resolves (catches wrong branch name, wrong path, etc.)
+FIRST_IMG_URL=$(echo "$POSTED_BODY" | grep -o 'https://raw.githubusercontent.com[^)]*' | head -1)
+if [ -n "$FIRST_IMG_URL" ]; then
+  HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" --max-time 10 "$FIRST_IMG_URL")
+  if [ "$HTTP_STATUS" = "200" ]; then
+    echo "✅ Inline images confirmed and raw URL resolves (HTTP 200)"
+  else
+    echo "❌ FAIL: Raw image URL returned HTTP $HTTP_STATUS — images will not render inline."
+    echo "   URL: $FIRST_IMG_URL"
+    echo "   Check branch name, path, and that the push succeeded."
+    exit 1
+  fi
+else
+  echo "⚠️  Could not extract a raw URL from the comment — verify manually."
+fi
+```
+
+## Step 8: Evaluate test completeness and post a GitHub review
+
+After posting the PR comment, evaluate whether the test run actually covered everything it needed to. This is NOT a rubber-stamp — be critical. Then post a formal GitHub review so the PR author and reviewers can see the verdict.
+
+### 8a. Evaluate against the test plan
+
+Re-read `$RESULTS_DIR/test-plan.md` (written in Step 2) and `$RESULTS_DIR/test-report.md` (written in Step 5). For each scenario in the plan, answer:
+
+> **Note:** `test-report.md` is written in Step 5. If it doesn't exist, write it before proceeding here — see the Step 5 template. Do not skip evaluation because the file is missing; create it from your notes instead.
+
+| Question | Pass criteria |
+|----------|--------------|
+| Was it tested? | Explicit steps were executed, not just described |
+| Is there screenshot evidence? | At least one before/after screenshot per scenario |
+| Did the core feature work correctly? | Expected state matches actual state |
+| Were negative cases tested? | At least one failure/rejection case per feature |
+| Was DB/API state verified (not just UI)? | Raw API response or DB query confirms state change |
+
+Build a verdict:
+- **APPROVE** — every scenario tested, evidence present, no bugs found or all bugs are minor/known
+- **REQUEST_CHANGES** — one or more: untested scenarios, missing evidence, bugs found, data not verified
+
+### 8b. Post the GitHub review
+
+```bash
+EVAL_FILE=$(mktemp)
+
+# === STEP A: Write header ===
+cat > "$EVAL_FILE" << 'ENDEVAL'
+## 🧪 Test Evaluation
+
+### Coverage checklist
+ENDEVAL
+
+# === STEP B: Append ONE line per scenario — do this BEFORE calculating verdict ===
+# Format: "- ✅ **Scenario N – name**: <what was done and verified>"
+#      or "- ❌ **Scenario N – name**: <what is missing or broken>"
+# Examples:
+#   echo "- ✅ **Scenario 1 – Login flow**: tested, screenshot evidence present, auth token verified via API" >> "$EVAL_FILE"
+#   echo "- ❌ **Scenario 3 – Cost logging**: NOT verified in DB — UI showed entry but raw SQL query was skipped" >> "$EVAL_FILE"
+#
+# !!! IMPORTANT: append ALL scenario lines here before proceeding to STEP C !!!
+
+# === STEP C: Derive verdict from the checklist — runs AFTER all lines are appended ===
+FAIL_COUNT=$(grep -c "^- ❌" "$EVAL_FILE" || true)
+if [ "$FAIL_COUNT" -eq 0 ]; then
+  VERDICT="APPROVE"
+else
+  VERDICT="REQUEST_CHANGES"
+fi
+
+# === STEP D: Append verdict section ===
+cat >> "$EVAL_FILE" << ENDVERDICT
+
+### Verdict
+ENDVERDICT
+
+if [ "$VERDICT" = "APPROVE" ]; then
+  echo "✅ All scenarios covered with evidence. No blocking issues found." >> "$EVAL_FILE"
+else
+  echo "❌ $FAIL_COUNT scenario(s) incomplete or have confirmed bugs. See ❌ items above." >> "$EVAL_FILE"
+  echo "" >> "$EVAL_FILE"
+  echo "**Required before merge:** address each ❌ item above." >> "$EVAL_FILE"
+fi
+
+# === STEP E: Post the review ===
+gh api "repos/${REPO}/pulls/$PR_NUMBER/reviews" \
+  --method POST \
+  -f body="$(cat "$EVAL_FILE")" \
+  -f event="$VERDICT"
+
+rm -f "$EVAL_FILE"
+```
+
+**Rules:**
+- Never auto-approve without checking every scenario in the test plan
+- `REQUEST_CHANGES` if ANY scenario is untested, lacks DB/API evidence, or has a confirmed bug
+- The evaluation body must list every scenario explicitly (✅ or ❌) — not just the failures
+- If you find new bugs during evaluation, add them to the request-changes body and (if `--fix` flag is set) fix them before posting
+
 ## Fix mode (--fix flag)

 When `--fix` is present, the standard is HIGHER. Do not just note issues — FIX them immediately.
--- a/.claude/skills/write-frontend-tests/SKILL.md
+++ b/.claude/skills/write-frontend-tests/SKILL.md
@@ -0,0 +1,224 @@
+---
+name: write-frontend-tests
+description: "Analyze the current branch diff against dev, plan integration tests for changed frontend pages/components, and write them. TRIGGER when user asks to write frontend tests, add test coverage, or 'write tests for my changes'."
+user-invocable: true
+args: "[base branch] — defaults to dev. Optionally pass a specific base branch to diff against."
+metadata:
+  author: autogpt-team
+  version: "1.0.0"
+---
+
+# Write Frontend Tests
+
+Analyze the current branch's frontend changes, plan integration tests, and write them.
+
+## References
+
+Before writing any tests, read the testing rules and conventions:
+
+- `autogpt_platform/frontend/TESTING.md` — testing strategy, file locations, examples
+- `autogpt_platform/frontend/src/tests/AGENTS.md` — detailed testing rules, MSW patterns, decision flowchart
+- `autogpt_platform/frontend/src/tests/integrations/test-utils.tsx` — custom render with providers
+- `autogpt_platform/frontend/src/tests/integrations/vitest.setup.tsx` — MSW server setup
+
+## Step 1: Identify changed frontend files
+
+```bash
+BASE_BRANCH="${ARGUMENTS:-dev}"
+cd autogpt_platform/frontend
+
+# Get changed frontend files (excluding generated, config, and test files)
+git diff "$BASE_BRANCH"...HEAD --name-only -- src/ \
+  | grep -v '__generated__' \
+  | grep -v '__tests__' \
+  | grep -v '\.test\.' \
+  | grep -v '\.stories\.' \
+  | grep -v '\.spec\.'
+```
+
+Also read the diff to understand what changed:
+
+```bash
+git diff "$BASE_BRANCH"...HEAD --stat -- src/
+git diff "$BASE_BRANCH"...HEAD -- src/ | head -500
+```
+
+## Step 2: Categorize changes and find test targets
+
+For each changed file, determine:
+
+1. **Is it a page?** (`page.tsx`) — these are the primary test targets
+2. **Is it a hook?** (`use*.ts`) — test via the page that uses it
+3. **Is it a component?** (`.tsx` in `components/`) — test via the parent page unless it's complex enough to warrant isolation
+4. **Is it a helper?** (`helpers.ts`, `utils.ts`) — unit test directly if pure logic
+
+**Priority order:**
+1. Pages with new/changed data fetching or user interactions
+2. Components with complex internal logic (modals, forms, wizards)
+3. Hooks with non-trivial business logic
+4. Pure helper functions
+
+Skip: styling-only changes, type-only changes, config changes.
+
+## Step 3: Check for existing tests
+
+For each test target, check if tests already exist:
+
+```bash
+# For a page at src/app/(platform)/library/page.tsx
+ls src/app/\(platform\)/library/__tests__/ 2>/dev/null
+
+# For a component at src/app/(platform)/library/components/AgentCard/AgentCard.tsx
+ls src/app/\(platform\)/library/components/AgentCard/__tests__/ 2>/dev/null
+```
+
+Note which targets have no tests (need new files) vs which have tests that need updating.
+
+## Step 4: Identify API endpoints used
+
+For each test target, find which API hooks are used:
+
+```bash
+# Find generated API hook imports in the changed files
+grep -rn 'from.*__generated__/endpoints' src/app/\(platform\)/library/
+grep -rn 'use[A-Z].*V[12]' src/app/\(platform\)/library/
+```
+
+For each API hook found, locate the corresponding MSW handler:
+
+```bash
+# If the page uses useGetV2ListLibraryAgents, find its MSW handlers
+grep -rn 'getGetV2ListLibraryAgents.*Handler' src/app/api/__generated__/endpoints/library/library.msw.ts
+```
+
+List every MSW handler you will need (200 for happy path, 4xx for error paths).
+
+## Step 5: Write the test plan
+
+Before writing code, output a plan as a numbered list:
+
+```
+Test plan for [branch name]:
+
+1. src/app/(platform)/library/__tests__/main.test.tsx (NEW)
+   - Renders page with agent list (MSW 200)
+   - Shows loading state
+   - Shows error state (MSW 422)
+   - Handles empty agent list
+
+2. src/app/(platform)/library/__tests__/search.test.tsx (NEW)
+   - Filters agents by search query
+   - Shows no results message
+   - Clears search
+
+3. src/app/(platform)/library/components/AgentCard/__tests__/AgentCard.test.tsx (UPDATE)
+   - Add test for new "duplicate" action
+```
+
+Present this plan to the user. Wait for confirmation before proceeding. If the user has feedback, adjust the plan.
+
+## Step 6: Write the tests
+
+For each test file in the plan, follow these conventions:
+
+### File structure
+
+```tsx
+import { render, screen, waitFor } from "@/tests/integrations/test-utils";
+import { server } from "@/mocks/mock-server";
+// Import MSW handlers for endpoints the page uses
+import {
+  getGetV2ListLibraryAgentsMockHandler200,
+  getGetV2ListLibraryAgentsMockHandler422,
+} from "@/app/api/__generated__/endpoints/library/library.msw";
+// Import the component under test
+import LibraryPage from "../page";
+
+describe("LibraryPage", () => {
+  test("renders agent list from API", async () => {
+    server.use(getGetV2ListLibraryAgentsMockHandler200());
+
+    render(<LibraryPage />);
+
+    expect(await screen.findByText(/my agents/i)).toBeDefined();
+  });
+
+  test("shows error state on API failure", async () => {
+    server.use(getGetV2ListLibraryAgentsMockHandler422());
+
+    render(<LibraryPage />);
+
+    expect(await screen.findByText(/error/i)).toBeDefined();
+  });
+});
+```
+
+### Rules
+
+- Use `render()` from `@/tests/integrations/test-utils` (NOT from `@testing-library/react` directly)
+- Use `server.use()` to set up MSW handlers BEFORE rendering
+- Use `findBy*` (async) for elements that appear after data fetching — NOT `getBy*`
+- Use `getBy*` only for elements that are immediately present in the DOM
+- Use `screen` queries — do NOT destructure from `render()`
+- Use `waitFor` when asserting side effects or state changes after interactions
+- Import `fireEvent` or `userEvent` from the test-utils for interactions
+- Do NOT mock internal hooks or functions — mock at the API boundary via MSW
+- Do NOT use `act()` manually — `render` and `fireEvent` handle it
+- Keep tests focused: one behavior per test
+- Use descriptive test names that read like sentences
+
+### Test location
+
+```
+# For pages: __tests__/ next to page.tsx
+src/app/(platform)/library/__tests__/main.test.tsx
+
+# For complex standalone components: __tests__/ inside component folder
+src/app/(platform)/library/components/AgentCard/__tests__/AgentCard.test.tsx
+
+# For pure helpers: co-located .test.ts
+src/app/(platform)/library/helpers.test.ts
+```
+
+### Custom MSW overrides
+
+When the auto-generated faker data is not enough, override with specific data:
+
+```tsx
+import { http, HttpResponse } from "msw";
+
+server.use(
+  http.get("http://localhost:3000/api/proxy/api/v2/library/agents", () => {
+    return HttpResponse.json({
+      agents: [
+        { id: "1", name: "Test Agent", description: "A test agent" },
+      ],
+      pagination: { total_items: 1, total_pages: 1, page: 1, page_size: 10 },
+    });
+  }),
+);
+```
+
+Use the proxy URL pattern: `http://localhost:3000/api/proxy/api/v{version}/{path}` — this matches the MSW base URL configured in `orval.config.ts`.
+
+## Step 7: Run and verify
+
+After writing all tests:
+
+```bash
+cd autogpt_platform/frontend
+pnpm test:unit --reporter=verbose
+```
+
+If tests fail:
+1. Read the error output carefully
+2. Fix the test (not the source code, unless there is a genuine bug)
+3. Re-run until all pass
+
+Then run the full checks:
+
+```bash
+pnpm format
+pnpm lint
+pnpm types
+```
--- a/.github/workflows/platform-fullstack-ci.yml
+++ b/.github/workflows/platform-fullstack-ci.yml
@@ -179,21 +179,30 @@ jobs:
          pip install pyyaml

          # Resolve extends and generate a flat compose file that bake can understand
+          export NEXT_PUBLIC_SOURCEMAPS NEXT_PUBLIC_PW_TEST
          docker compose -f docker-compose.yml config > docker-compose.resolved.yml

+          # Ensure NEXT_PUBLIC_SOURCEMAPS is in resolved compose
+          # (docker compose config on some versions drops this arg)
+          if ! grep -q "NEXT_PUBLIC_SOURCEMAPS" docker-compose.resolved.yml; then
+            echo "Injecting NEXT_PUBLIC_SOURCEMAPS into resolved compose (docker compose config dropped it)"
+            sed -i '/NEXT_PUBLIC_PW_TEST/a\        NEXT_PUBLIC_SOURCEMAPS: "true"' docker-compose.resolved.yml
+          fi
+
          # Add cache configuration to the resolved compose file
          python ../.github/workflows/scripts/docker-ci-fix-compose-build-cache.py \
            --source docker-compose.resolved.yml \
            --cache-from "type=gha" \
            --cache-to "type=gha,mode=max" \
            --backend-hash "${{ hashFiles('autogpt_platform/backend/Dockerfile', 'autogpt_platform/backend/poetry.lock', 'autogpt_platform/backend/backend/**') }}" \
-            --frontend-hash "${{ hashFiles('autogpt_platform/frontend/Dockerfile', 'autogpt_platform/frontend/pnpm-lock.yaml', 'autogpt_platform/frontend/src/**') }}" \
+            --frontend-hash "${{ hashFiles('autogpt_platform/frontend/Dockerfile', 'autogpt_platform/frontend/pnpm-lock.yaml', 'autogpt_platform/frontend/src/**') }}-sourcemaps" \
            --git-ref "${{ github.ref }}"

          # Build with bake using the resolved compose file (now includes cache config)
          docker buildx bake --allow=fs.read=.. -f docker-compose.resolved.yml --load
        env:
          NEXT_PUBLIC_PW_TEST: true
+          NEXT_PUBLIC_SOURCEMAPS: true

      - name: Set up tests - Cache E2E test data
        id: e2e-data-cache
@@ -279,6 +288,11 @@ jobs:
          cache: "pnpm"
          cache-dependency-path: autogpt_platform/frontend/pnpm-lock.yaml

+      - name: Copy source maps from Docker for E2E coverage
+        run: |
+          FRONTEND_CONTAINER=$(docker compose -f ../docker-compose.resolved.yml ps -q frontend)
+          docker cp "$FRONTEND_CONTAINER":/app/.next/static .next-static-coverage
+
      - name: Set up tests - Install dependencies
        run: pnpm install --frozen-lockfile

@@ -289,6 +303,15 @@ jobs:
        run: pnpm test:no-build
        continue-on-error: false

+      - name: Upload E2E coverage to Codecov
+        if: ${{ !cancelled() }}
+        uses: codecov/codecov-action@v5
+        with:
+          token: ${{ secrets.CODECOV_TOKEN }}
+          flags: platform-frontend-e2e
+          files: ./autogpt_platform/frontend/coverage/e2e/cobertura-coverage.xml
+          disable_search: true
+
      - name: Upload Playwright report
        if: always()
        uses: actions/upload-artifact@v4
--- a/.gitleaks.toml
+++ b/.gitleaks.toml
@@ -0,0 +1,36 @@
+title = "AutoGPT Gitleaks Config"
+
+[extend]
+useDefault = true
+
+[allowlist]
+description = "Global allowlist"
+paths = [
+    # Template/example env files (no real secrets)
+    '''\.env\.(default|example|template)$''',
+    # Lock files
+    '''pnpm-lock\.yaml$''',
+    '''poetry\.lock$''',
+    # Secrets baseline
+    '''\.secrets\.baseline$''',
+    # Build artifacts and caches (should not be committed)
+    '''__pycache__/''',
+    '''classic/frontend/build/''',
+    # Docker dev setup (local dev JWTs/keys only)
+    '''autogpt_platform/db/docker/''',
+    # Load test configs (dev JWTs)
+    '''load-tests/configs/''',
+    # Test files with fake/fixture keys (_test.py, test_*.py, conftest.py)
+    '''(_test|test_.*|conftest)\.py$''',
+    # Documentation (only contains placeholder keys in curl/API examples)
+    '''docs/.*\.md$''',
+    # Firebase config (public API keys by design)
+    '''google-services\.json$''',
+    '''classic/frontend/(lib|web)/''',
+]
+# CI test-only encryption key (marked DO NOT USE IN PRODUCTION)
+regexes = [
+    '''dvziYgz0KSK8FENhju0ZYi8''',
+    # LLM model name enum values falsely flagged as API keys
+    '''Llama-\d.*Instruct''',
+]
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -23,9 +23,15 @@ repos:
      - id: detect-secrets
        name: Detect secrets
        description: Detects high entropy strings that are likely to be passwords.
+        args: ["--baseline", ".secrets.baseline"]
        files: ^autogpt_platform/
-        exclude: pnpm-lock\.yaml$
-        stages: [pre-push]
+        exclude: (pnpm-lock\.yaml|\.env\.(default|example|template))$
+
+  - repo: https://github.com/gitleaks/gitleaks
+    rev: v8.24.3
+    hooks:
+      - id: gitleaks
+        name: Detect secrets (gitleaks)

  - repo: local
    # For proper type checking, all dependencies need to be up-to-date.
--- a/.secrets.baseline
+++ b/.secrets.baseline
@@ -0,0 +1,467 @@
+{
+  "version": "1.5.0",
+  "plugins_used": [
+    {
+      "name": "ArtifactoryDetector"
+    },
+    {
+      "name": "AWSKeyDetector"
+    },
+    {
+      "name": "AzureStorageKeyDetector"
+    },
+    {
+      "name": "Base64HighEntropyString",
+      "limit": 4.5
+    },
+    {
+      "name": "BasicAuthDetector"
+    },
+    {
+      "name": "CloudantDetector"
+    },
+    {
+      "name": "DiscordBotTokenDetector"
+    },
+    {
+      "name": "GitHubTokenDetector"
+    },
+    {
+      "name": "GitLabTokenDetector"
+    },
+    {
+      "name": "HexHighEntropyString",
+      "limit": 3.0
+    },
+    {
+      "name": "IbmCloudIamDetector"
+    },
+    {
+      "name": "IbmCosHmacDetector"
+    },
+    {
+      "name": "IPPublicDetector"
+    },
+    {
+      "name": "JwtTokenDetector"
+    },
+    {
+      "name": "KeywordDetector",
+      "keyword_exclude": ""
+    },
+    {
+      "name": "MailchimpDetector"
+    },
+    {
+      "name": "NpmDetector"
+    },
+    {
+      "name": "OpenAIDetector"
+    },
+    {
+      "name": "PrivateKeyDetector"
+    },
+    {
+      "name": "PypiTokenDetector"
+    },
+    {
+      "name": "SendGridDetector"
+    },
+    {
+      "name": "SlackDetector"
+    },
+    {
+      "name": "SoftlayerDetector"
+    },
+    {
+      "name": "SquareOAuthDetector"
+    },
+    {
+      "name": "StripeDetector"
+    },
+    {
+      "name": "TelegramBotTokenDetector"
+    },
+    {
+      "name": "TwilioKeyDetector"
+    }
+  ],
+  "filters_used": [
+    {
+      "path": "detect_secrets.filters.allowlist.is_line_allowlisted"
+    },
+    {
+      "path": "detect_secrets.filters.common.is_ignored_due_to_verification_policies",
+      "min_level": 2
+    },
+    {
+      "path": "detect_secrets.filters.heuristic.is_indirect_reference"
+    },
+    {
+      "path": "detect_secrets.filters.heuristic.is_likely_id_string"
+    },
+    {
+      "path": "detect_secrets.filters.heuristic.is_lock_file"
+    },
+    {
+      "path": "detect_secrets.filters.heuristic.is_not_alphanumeric_string"
+    },
+    {
+      "path": "detect_secrets.filters.heuristic.is_potential_uuid"
+    },
+    {
+      "path": "detect_secrets.filters.heuristic.is_prefixed_with_dollar_sign"
+    },
+    {
+      "path": "detect_secrets.filters.heuristic.is_sequential_string"
+    },
+    {
+      "path": "detect_secrets.filters.heuristic.is_swagger_file"
+    },
+    {
+      "path": "detect_secrets.filters.heuristic.is_templated_secret"
+    },
+    {
+      "path": "detect_secrets.filters.regex.should_exclude_file",
+      "pattern": [
+        "\\.env$",
+        "pnpm-lock\\.yaml$",
+        "\\.env\\.(default|example|template)$",
+        "__pycache__",
+        "_test\\.py$",
+        "test_.*\\.py$",
+        "conftest\\.py$",
+        "poetry\\.lock$",
+        "node_modules"
+      ]
+    }
+  ],
+  "results": {
+    "autogpt_platform/backend/backend/api/external/v1/integrations.py": [
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/backend/backend/api/external/v1/integrations.py",
+        "hashed_secret": "665b1e3851eefefa3fb878654292f16597d25155",
+        "is_verified": false,
+        "line_number": 289
+      }
+    ],
+    "autogpt_platform/backend/backend/blocks/airtable/_config.py": [
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/backend/backend/blocks/airtable/_config.py",
+        "hashed_secret": "57e168b03afb7c1ee3cdc4ee3db2fe1cc6e0df26",
+        "is_verified": false,
+        "line_number": 29
+      }
+    ],
+    "autogpt_platform/backend/backend/blocks/dataforseo/_config.py": [
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/backend/backend/blocks/dataforseo/_config.py",
+        "hashed_secret": "32ce93887331fa5d192f2876ea15ec000c7d58b8",
+        "is_verified": false,
+        "line_number": 12
+      }
+    ],
+    "autogpt_platform/backend/backend/blocks/github/checks.py": [
+      {
+        "type": "Hex High Entropy String",
+        "filename": "autogpt_platform/backend/backend/blocks/github/checks.py",
+        "hashed_secret": "8ac6f92737d8586790519c5d7bfb4d2eb172c238",
+        "is_verified": false,
+        "line_number": 108
+      }
+    ],
+    "autogpt_platform/backend/backend/blocks/github/ci.py": [
+      {
+        "type": "Hex High Entropy String",
+        "filename": "autogpt_platform/backend/backend/blocks/github/ci.py",
+        "hashed_secret": "90bd1b48e958257948487b90bee080ba5ed00caa",
+        "is_verified": false,
+        "line_number": 123
+      }
+    ],
+    "autogpt_platform/backend/backend/blocks/github/example_payloads/pull_request.synchronize.json": [
+      {
+        "type": "Hex High Entropy String",
+        "filename": "autogpt_platform/backend/backend/blocks/github/example_payloads/pull_request.synchronize.json",
+        "hashed_secret": "f96896dafced7387dcd22343b8ea29d3d2c65663",
+        "is_verified": false,
+        "line_number": 42
+      },
+      {
+        "type": "Hex High Entropy String",
+        "filename": "autogpt_platform/backend/backend/blocks/github/example_payloads/pull_request.synchronize.json",
+        "hashed_secret": "b80a94d5e70bedf4f5f89d2f5a5255cc9492d12e",
+        "is_verified": false,
+        "line_number": 193
+      },
+      {
+        "type": "Hex High Entropy String",
+        "filename": "autogpt_platform/backend/backend/blocks/github/example_payloads/pull_request.synchronize.json",
+        "hashed_secret": "75b17e517fe1b3136394f6bec80c4f892da75e42",
+        "is_verified": false,
+        "line_number": 344
+      },
+      {
+        "type": "Hex High Entropy String",
+        "filename": "autogpt_platform/backend/backend/blocks/github/example_payloads/pull_request.synchronize.json",
+        "hashed_secret": "b0bfb5e4e2394e7f8906e5ed1dffd88b2bc89dd5",
+        "is_verified": false,
+        "line_number": 534
+      }
+    ],
+    "autogpt_platform/backend/backend/blocks/github/statuses.py": [
+      {
+        "type": "Hex High Entropy String",
+        "filename": "autogpt_platform/backend/backend/blocks/github/statuses.py",
+        "hashed_secret": "8ac6f92737d8586790519c5d7bfb4d2eb172c238",
+        "is_verified": false,
+        "line_number": 85
+      }
+    ],
+    "autogpt_platform/backend/backend/blocks/google/docs.py": [
+      {
+        "type": "Hex High Entropy String",
+        "filename": "autogpt_platform/backend/backend/blocks/google/docs.py",
+        "hashed_secret": "c95da0c6696342c867ef0c8258d2f74d20fd94d4",
+        "is_verified": false,
+        "line_number": 203
+      }
+    ],
+    "autogpt_platform/backend/backend/blocks/google/sheets.py": [
+      {
+        "type": "Base64 High Entropy String",
+        "filename": "autogpt_platform/backend/backend/blocks/google/sheets.py",
+        "hashed_secret": "bd5a04fa3667e693edc13239b6d310c5c7a8564b",
+        "is_verified": false,
+        "line_number": 57
+      }
+    ],
+    "autogpt_platform/backend/backend/blocks/linear/_config.py": [
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/backend/backend/blocks/linear/_config.py",
+        "hashed_secret": "b37f020f42d6d613b6ce30103e4d408c4499b3bb",
+        "is_verified": false,
+        "line_number": 53
+      }
+    ],
+    "autogpt_platform/backend/backend/blocks/medium.py": [
+      {
+        "type": "Hex High Entropy String",
+        "filename": "autogpt_platform/backend/backend/blocks/medium.py",
+        "hashed_secret": "ff998abc1ce6d8f01a675fa197368e44c8916e9c",
+        "is_verified": false,
+        "line_number": 131
+      }
+    ],
+    "autogpt_platform/backend/backend/blocks/replicate/replicate_block.py": [
+      {
+        "type": "Hex High Entropy String",
+        "filename": "autogpt_platform/backend/backend/blocks/replicate/replicate_block.py",
+        "hashed_secret": "8bbdd6f26368f58ea4011d13d7f763cb662e66f0",
+        "is_verified": false,
+        "line_number": 55
+      }
+    ],
+    "autogpt_platform/backend/backend/blocks/slant3d/webhook.py": [
+      {
+        "type": "Hex High Entropy String",
+        "filename": "autogpt_platform/backend/backend/blocks/slant3d/webhook.py",
+        "hashed_secret": "36263c76947443b2f6e6b78153967ac4a7da99f9",
+        "is_verified": false,
+        "line_number": 100
+      }
+    ],
+    "autogpt_platform/backend/backend/blocks/talking_head.py": [
+      {
+        "type": "Base64 High Entropy String",
+        "filename": "autogpt_platform/backend/backend/blocks/talking_head.py",
+        "hashed_secret": "44ce2d66222529eea4a32932823466fc0601c799",
+        "is_verified": false,
+        "line_number": 113
+      }
+    ],
+    "autogpt_platform/backend/backend/blocks/wordpress/_config.py": [
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/backend/backend/blocks/wordpress/_config.py",
+        "hashed_secret": "e62679512436161b78e8a8d68c8829c2a1031ccb",
+        "is_verified": false,
+        "line_number": 17
+      }
+    ],
+    "autogpt_platform/backend/backend/util/cache.py": [
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/backend/backend/util/cache.py",
+        "hashed_secret": "37f0c918c3fa47ca4a70e42037f9f123fdfbc75b",
+        "is_verified": false,
+        "line_number": 449
+      }
+    ],
+    "autogpt_platform/frontend/src/app/(platform)/build/components/FlowEditor/nodes/helpers.ts": [
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/frontend/src/app/(platform)/build/components/FlowEditor/nodes/helpers.ts",
+        "hashed_secret": "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8",
+        "is_verified": false,
+        "line_number": 6
+      }
+    ],
+    "autogpt_platform/frontend/src/app/(platform)/dictionaries/en.json": [
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/frontend/src/app/(platform)/dictionaries/en.json",
+        "hashed_secret": "8be3c943b1609fffbfc51aad666d0a04adf83c9d",
+        "is_verified": false,
+        "line_number": 5
+      }
+    ],
+    "autogpt_platform/frontend/src/app/(platform)/dictionaries/es.json": [
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/frontend/src/app/(platform)/dictionaries/es.json",
+        "hashed_secret": "5a6d1c612954979ea99ee33dbb2d231b00f6ac0a",
+        "is_verified": false,
+        "line_number": 5
+      }
+    ],
+    "autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/AgentInputsReadOnly/helpers.ts": [
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/AgentInputsReadOnly/helpers.ts",
+        "hashed_secret": "cf678cab87dc1f7d1b95b964f15375e088461679",
+        "is_verified": false,
+        "line_number": 6
+      },
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/AgentInputsReadOnly/helpers.ts",
+        "hashed_secret": "f72cbb45464d487064610c5411c576ca4019d380",
+        "is_verified": false,
+        "line_number": 8
+      }
+    ],
+    "autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/RunAgentModal/components/ModalRunSection/helpers.ts": [
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/RunAgentModal/components/ModalRunSection/helpers.ts",
+        "hashed_secret": "cf678cab87dc1f7d1b95b964f15375e088461679",
+        "is_verified": false,
+        "line_number": 5
+      },
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/RunAgentModal/components/ModalRunSection/helpers.ts",
+        "hashed_secret": "f72cbb45464d487064610c5411c576ca4019d380",
+        "is_verified": false,
+        "line_number": 7
+      }
+    ],
+    "autogpt_platform/frontend/src/app/(platform)/profile/(user)/integrations/page.tsx": [
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/frontend/src/app/(platform)/profile/(user)/integrations/page.tsx",
+        "hashed_secret": "cf678cab87dc1f7d1b95b964f15375e088461679",
+        "is_verified": false,
+        "line_number": 192
+      },
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/frontend/src/app/(platform)/profile/(user)/integrations/page.tsx",
+        "hashed_secret": "86275db852204937bbdbdebe5fabe8536e030ab6",
+        "is_verified": false,
+        "line_number": 193
+      }
+    ],
+    "autogpt_platform/frontend/src/components/contextual/CredentialsInput/helpers.ts": [
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/frontend/src/components/contextual/CredentialsInput/helpers.ts",
+        "hashed_secret": "47acd2028cf81b5da88ddeedb2aea4eca4b71fbd",
+        "is_verified": false,
+        "line_number": 102
+      },
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/frontend/src/components/contextual/CredentialsInput/helpers.ts",
+        "hashed_secret": "8be3c943b1609fffbfc51aad666d0a04adf83c9d",
+        "is_verified": false,
+        "line_number": 103
+      }
+    ],
+    "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts": [
+      {
+        "type": "Base64 High Entropy String",
+        "filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
+        "hashed_secret": "9c486c92f1a7420e1045c7ad963fbb7ba3621025",
+        "is_verified": false,
+        "line_number": 73
+      },
+      {
+        "type": "Base64 High Entropy String",
+        "filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
+        "hashed_secret": "9277508c7a6effc8fb59163efbfada189e35425c",
+        "is_verified": false,
+        "line_number": 75
+      },
+      {
+        "type": "Base64 High Entropy String",
+        "filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
+        "hashed_secret": "8dc7e2cb1d0935897d541bf5facab389b8a50340",
+        "is_verified": false,
+        "line_number": 77
+      },
+      {
+        "type": "Base64 High Entropy String",
+        "filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
+        "hashed_secret": "79a26ad48775944299be6aaf9fb1d5302c1ed75b",
+        "is_verified": false,
+        "line_number": 79
+      },
+      {
+        "type": "Base64 High Entropy String",
+        "filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
+        "hashed_secret": "a3b62b44500a1612e48d4cab8294df81561b3b1a",
+        "is_verified": false,
+        "line_number": 81
+      },
+      {
+        "type": "Base64 High Entropy String",
+        "filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
+        "hashed_secret": "a58979bd0b21ef4f50417d001008e60dd7a85c64",
+        "is_verified": false,
+        "line_number": 83
+      },
+      {
+        "type": "Base64 High Entropy String",
+        "filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
+        "hashed_secret": "6cb6e075f8e8c7c850f9d128d6608e5dbe209a79",
+        "is_verified": false,
+        "line_number": 85
+      }
+    ],
+    "autogpt_platform/frontend/src/lib/constants.ts": [
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/frontend/src/lib/constants.ts",
+        "hashed_secret": "27b924db06a28cc755fb07c54f0fddc30659fe4d",
+        "is_verified": false,
+        "line_number": 10
+      }
+    ],
+    "autogpt_platform/frontend/src/tests/credentials/index.ts": [
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/frontend/src/tests/credentials/index.ts",
+        "hashed_secret": "c18006fc138809314751cd1991f1e0b820fabd37",
+        "is_verified": false,
+        "line_number": 4
+      }
+    ]
+  },
+  "generated_at": "2026-04-02T13:10:54Z"
+}
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -30,7 +30,7 @@ See `/frontend/CONTRIBUTING.md` for complete patterns. Quick reference:
   - Regenerate with `pnpm generate:api`
   - Pattern: `use{Method}{Version}{OperationName}`
 4. **Styling**: Tailwind CSS only, use design tokens, Phosphor Icons only
-5. **Testing**: Add Storybook stories for new components, Playwright for E2E
+5. **Testing**: Integration tests (Vitest + RTL + MSW) are the default (~90%, page-level). Playwright for E2E critical flows. Storybook for design system components. See `autogpt_platform/frontend/TESTING.md`
 6. **Code conventions**: Function declarations (not arrow functions) for components/handlers

 - Component props should be `interface Props { ... }` (not exported) unless the interface needs to be used outside the component
@@ -47,7 +47,9 @@ See `/frontend/CONTRIBUTING.md` for complete patterns. Quick reference:
 ## Testing

 - Backend: `poetry run test` (runs pytest with a docker based postgres + prisma).
- Frontend: `pnpm test` or `pnpm test-ui` for Playwright tests. See `docs/content/platform/contributing/tests.md` for tips.
+- Frontend integration tests: `pnpm test:unit` (Vitest + RTL + MSW, primary testing approach).
+- Frontend E2E tests: `pnpm test` or `pnpm test-ui` for Playwright tests.
+- See `autogpt_platform/frontend/TESTING.md` for the full testing strategy.

 Always run the relevant linters and tests before committing.
 Use conventional commit messages for all commits (e.g. `feat(backend): add API`).
--- a/autogpt_platform/backend/backend/api/features/admin/rate_limit_admin_routes.py
+++ b/autogpt_platform/backend/backend/api/features/admin/rate_limit_admin_routes.py
@@ -9,11 +9,14 @@ from pydantic import BaseModel

 from backend.copilot.config import ChatConfig
 from backend.copilot.rate_limit import (
+    SubscriptionTier,
    get_global_rate_limits,
    get_usage_status,
+    get_user_tier,
    reset_user_usage,
+    set_user_tier,
 )
-from backend.data.user import get_user_by_email, get_user_email_by_id
+from backend.data.user import get_user_by_email, get_user_email_by_id, search_users

 logger = logging.getLogger(__name__)

@@ -33,6 +36,17 @@ class UserRateLimitResponse(BaseModel):
    weekly_token_limit: int
    daily_tokens_used: int
    weekly_tokens_used: int
+    tier: SubscriptionTier
+
+
+class UserTierResponse(BaseModel):
+    user_id: str
+    tier: SubscriptionTier
+
+
+class SetUserTierRequest(BaseModel):
+    user_id: str
+    tier: SubscriptionTier


 async def _resolve_user_id(
@@ -86,10 +100,10 @@ async def get_user_rate_limit(

    logger.info("Admin %s checking rate limit for user %s", admin_user_id, resolved_id)

-    daily_limit, weekly_limit = await get_global_rate_limits(
+    daily_limit, weekly_limit, tier = await get_global_rate_limits(
        resolved_id, config.daily_token_limit, config.weekly_token_limit
    )
-    usage = await get_usage_status(resolved_id, daily_limit, weekly_limit)
+    usage = await get_usage_status(resolved_id, daily_limit, weekly_limit, tier=tier)

    return UserRateLimitResponse(
        user_id=resolved_id,
@@ -98,6 +112,7 @@ async def get_user_rate_limit(
        weekly_token_limit=weekly_limit,
        daily_tokens_used=usage.daily.used,
        weekly_tokens_used=usage.weekly.used,
+        tier=tier,
    )


@@ -125,10 +140,10 @@ async def reset_user_rate_limit(
        logger.exception("Failed to reset user usage")
        raise HTTPException(status_code=500, detail="Failed to reset usage") from e

-    daily_limit, weekly_limit = await get_global_rate_limits(
+    daily_limit, weekly_limit, tier = await get_global_rate_limits(
        user_id, config.daily_token_limit, config.weekly_token_limit
    )
-    usage = await get_usage_status(user_id, daily_limit, weekly_limit)
+    usage = await get_usage_status(user_id, daily_limit, weekly_limit, tier=tier)

    try:
        resolved_email = await get_user_email_by_id(user_id)
@@ -143,4 +158,102 @@ async def reset_user_rate_limit(
        weekly_token_limit=weekly_limit,
        daily_tokens_used=usage.daily.used,
        weekly_tokens_used=usage.weekly.used,
+        tier=tier,
    )
+
+
+@router.get(
+    "/rate_limit/tier",
+    response_model=UserTierResponse,
+    summary="Get User Rate Limit Tier",
+)
+async def get_user_rate_limit_tier(
+    user_id: str,
+    admin_user_id: str = Security(get_user_id),
+) -> UserTierResponse:
+    """Get a user's current rate-limit tier. Admin-only.
+
+    Returns 404 if the user does not exist in the database.
+    """
+    logger.info("Admin %s checking tier for user %s", admin_user_id, user_id)
+
+    resolved_email = await get_user_email_by_id(user_id)
+    if resolved_email is None:
+        raise HTTPException(status_code=404, detail=f"User {user_id} not found")
+
+    tier = await get_user_tier(user_id)
+    return UserTierResponse(user_id=user_id, tier=tier)
+
+
+@router.post(
+    "/rate_limit/tier",
+    response_model=UserTierResponse,
+    summary="Set User Rate Limit Tier",
+)
+async def set_user_rate_limit_tier(
+    request: SetUserTierRequest,
+    admin_user_id: str = Security(get_user_id),
+) -> UserTierResponse:
+    """Set a user's rate-limit tier. Admin-only.
+
+    Returns 404 if the user does not exist in the database.
+    """
+    try:
+        resolved_email = await get_user_email_by_id(request.user_id)
+    except Exception:
+        logger.warning(
+            "Failed to resolve email for user %s",
+            request.user_id,
+            exc_info=True,
+        )
+        resolved_email = None
+
+    if resolved_email is None:
+        raise HTTPException(status_code=404, detail=f"User {request.user_id} not found")
+
+    old_tier = await get_user_tier(request.user_id)
+    logger.info(
+        "Admin %s changing tier for user %s (%s): %s -> %s",
+        admin_user_id,
+        request.user_id,
+        resolved_email,
+        old_tier.value,
+        request.tier.value,
+    )
+    try:
+        await set_user_tier(request.user_id, request.tier)
+    except Exception as e:
+        logger.exception("Failed to set user tier")
+        raise HTTPException(status_code=500, detail="Failed to set tier") from e
+
+    return UserTierResponse(user_id=request.user_id, tier=request.tier)
+
+
+class UserSearchResult(BaseModel):
+    user_id: str
+    user_email: Optional[str] = None
+
+
+@router.get(
+    "/rate_limit/search_users",
+    response_model=list[UserSearchResult],
+    summary="Search Users by Name or Email",
+)
+async def admin_search_users(
+    query: str,
+    limit: int = 20,
+    admin_user_id: str = Security(get_user_id),
+) -> list[UserSearchResult]:
+    """Search users by partial email or name. Admin-only.
+
+    Queries the User table directly, so it returns results even for users
+    without credit transaction history.
+    """
+    if len(query.strip()) < 3:
+        raise HTTPException(
+            status_code=400,
+            detail="Search query must be at least 3 characters.",
+        )
+    logger.info("Admin %s searching users with query=%r", admin_user_id, query)
+    results = await search_users(query, limit=max(1, min(limit, 50)))
+    return [UserSearchResult(user_id=uid, user_email=email) for uid, email in results]
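The `admin_search_users` endpoint above rejects queries shorter than 3 characters after stripping whitespace and clamps `limit` with `max(1, min(limit, 50))`. A minimal sketch of those two checks in isolation follows. The names `clamp_limit`, `validate_query`, `MIN_QUERY_LENGTH`, and `MAX_LIMIT` are hypothetical and not part of this PR, and `ValueError` stands in for the `HTTPException(status_code=400)` the route raises.

```python
MIN_QUERY_LENGTH = 3  # mirrors the "< 3" check in the endpoint
MAX_LIMIT = 50  # mirrors the upper bound in max(1, min(limit, 50))


def clamp_limit(limit: int, lower: int = 1, upper: int = MAX_LIMIT) -> int:
    """Force limit into the closed range [lower, upper] (hypothetical helper)."""
    return max(lower, min(limit, upper))


def validate_query(query: str) -> str:
    """Reject queries whose stripped length is below MIN_QUERY_LENGTH.

    The query is returned unchanged, since the endpoint passes the raw
    (unstripped) query on to the search function.
    """
    if len(query.strip()) < MIN_QUERY_LENGTH:
        # Stand-in for the HTTP 400 raised by the real route.
        raise ValueError("Search query must be at least 3 characters.")
    return query
```

Clamping rather than rejecting out-of-range limits is what `test_search_users_negative_limit_clamped` relies on: a request with `limit=-1` still succeeds and the search is called with `limit=1`.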
--- a/autogpt_platform/backend/backend/api/features/admin/rate_limit_admin_routes_test.py
+++ b/autogpt_platform/backend/backend/api/features/admin/rate_limit_admin_routes_test.py
@@ -9,7 +9,7 @@ import pytest_mock
 from autogpt_libs.auth.jwt_utils import get_jwt_payload
 from pytest_snapshot.plugin import Snapshot

-from backend.copilot.rate_limit import CoPilotUsageStatus, UsageWindow
+from backend.copilot.rate_limit import CoPilotUsageStatus, SubscriptionTier, UsageWindow

 from .rate_limit_admin_routes import router as rate_limit_admin_router

@@ -57,7 +57,7 @@ def _patch_rate_limit_deps(
    mocker.patch(
        f"{_MOCK_MODULE}.get_global_rate_limits",
        new_callable=AsyncMock,
-        return_value=(2_500_000, 12_500_000),
+        return_value=(2_500_000, 12_500_000, SubscriptionTier.FREE),
    )
    mocker.patch(
        f"{_MOCK_MODULE}.get_usage_status",
@@ -89,6 +89,7 @@ def test_get_rate_limit(
    assert data["weekly_token_limit"] == 12_500_000
    assert data["daily_tokens_used"] == 500_000
    assert data["weekly_tokens_used"] == 3_000_000
+    assert data["tier"] == "FREE"

    configured_snapshot.assert_match(
        json.dumps(data, indent=2, sort_keys=True) + "\n",
@@ -162,6 +163,7 @@ def test_reset_user_usage_daily_only(
    assert data["daily_tokens_used"] == 0
    # Weekly is untouched
    assert data["weekly_tokens_used"] == 3_000_000
+    assert data["tier"] == "FREE"

    mock_reset.assert_awaited_once_with(target_user_id, reset_weekly=False)

@@ -192,6 +194,7 @@ def test_reset_user_usage_daily_and_weekly(
    data = response.json()
    assert data["daily_tokens_used"] == 0
    assert data["weekly_tokens_used"] == 0
+    assert data["tier"] == "FREE"

    mock_reset.assert_awaited_once_with(target_user_id, reset_weekly=True)

@@ -228,7 +231,7 @@ def test_get_rate_limit_email_lookup_failure(
    mocker.patch(
        f"{_MOCK_MODULE}.get_global_rate_limits",
        new_callable=AsyncMock,
-        return_value=(2_500_000, 12_500_000),
+        return_value=(2_500_000, 12_500_000, SubscriptionTier.FREE),
    )
    mocker.patch(
        f"{_MOCK_MODULE}.get_usage_status",
@@ -261,3 +264,303 @@ def test_admin_endpoints_require_admin_role(mock_jwt_user) -> None:
        json={"user_id": "test"},
    )
    assert response.status_code == 403
+
+
+# ---------------------------------------------------------------------------
+# Tier management endpoints
+# ---------------------------------------------------------------------------
+
+
+def test_get_user_tier(
+    mocker: pytest_mock.MockerFixture,
+    target_user_id: str,
+) -> None:
+    """Test getting a user's rate-limit tier."""
+    mocker.patch(
+        f"{_MOCK_MODULE}.get_user_email_by_id",
+        new_callable=AsyncMock,
+        return_value=_TARGET_EMAIL,
+    )
+    mocker.patch(
+        f"{_MOCK_MODULE}.get_user_tier",
+        new_callable=AsyncMock,
+        return_value=SubscriptionTier.PRO,
+    )
+
+    response = client.get("/admin/rate_limit/tier", params={"user_id": target_user_id})
+
+    assert response.status_code == 200
+    data = response.json()
+    assert data["user_id"] == target_user_id
+    assert data["tier"] == "PRO"
+
+
+def test_get_user_tier_user_not_found(
+    mocker: pytest_mock.MockerFixture,
+    target_user_id: str,
+) -> None:
+    """Test that getting tier for a non-existent user returns 404."""
+    mocker.patch(
+        f"{_MOCK_MODULE}.get_user_email_by_id",
+        new_callable=AsyncMock,
+        return_value=None,
+    )
+
+    response = client.get("/admin/rate_limit/tier", params={"user_id": target_user_id})
+
+    assert response.status_code == 404
+
+
+def test_set_user_tier(
+    mocker: pytest_mock.MockerFixture,
+    target_user_id: str,
+) -> None:
+    """Test setting a user's rate-limit tier (upgrade)."""
+    mocker.patch(
+        f"{_MOCK_MODULE}.get_user_email_by_id",
+        new_callable=AsyncMock,
+        return_value=_TARGET_EMAIL,
+    )
+    mocker.patch(
+        f"{_MOCK_MODULE}.get_user_tier",
+        new_callable=AsyncMock,
+        return_value=SubscriptionTier.FREE,
+    )
+    mock_set = mocker.patch(
+        f"{_MOCK_MODULE}.set_user_tier",
+        new_callable=AsyncMock,
+    )
+
+    response = client.post(
+        "/admin/rate_limit/tier",
+        json={"user_id": target_user_id, "tier": "ENTERPRISE"},
+    )
+
+    assert response.status_code == 200
+    data = response.json()
+    assert data["user_id"] == target_user_id
+    assert data["tier"] == "ENTERPRISE"
+    mock_set.assert_awaited_once_with(target_user_id, SubscriptionTier.ENTERPRISE)
+
+
+def test_set_user_tier_downgrade(
+    mocker: pytest_mock.MockerFixture,
+    target_user_id: str,
+) -> None:
+    """Test downgrading a user's tier from PRO to FREE."""
+    mocker.patch(
+        f"{_MOCK_MODULE}.get_user_email_by_id",
+        new_callable=AsyncMock,
+        return_value=_TARGET_EMAIL,
+    )
+    mocker.patch(
+        f"{_MOCK_MODULE}.get_user_tier",
+        new_callable=AsyncMock,
+        return_value=SubscriptionTier.PRO,
+    )
+    mock_set = mocker.patch(
+        f"{_MOCK_MODULE}.set_user_tier",
+        new_callable=AsyncMock,
+    )
+
+    response = client.post(
+        "/admin/rate_limit/tier",
+        json={"user_id": target_user_id, "tier": "FREE"},
+    )
+
+    assert response.status_code == 200
+    data = response.json()
+    assert data["user_id"] == target_user_id
+    assert data["tier"] == "FREE"
+    mock_set.assert_awaited_once_with(target_user_id, SubscriptionTier.FREE)
+
+
+def test_set_user_tier_invalid_tier(
+    target_user_id: str,
+) -> None:
+    """Test that setting an invalid tier returns 422."""
+    response = client.post(
+        "/admin/rate_limit/tier",
+        json={"user_id": target_user_id, "tier": "invalid"},
+    )
+
+    assert response.status_code == 422
+
+
+def test_set_user_tier_invalid_tier_uppercase(
+    target_user_id: str,
+) -> None:
+    """Test that setting an unrecognised uppercase tier (e.g. 'INVALID') returns 422.
+
+    Regression: ensures Pydantic enum validation rejects values that are not
+    members of SubscriptionTier, even when they look like valid enum names.
+    """
+    response = client.post(
+        "/admin/rate_limit/tier",
+        json={"user_id": target_user_id, "tier": "INVALID"},
+    )
+
+    assert response.status_code == 422
+    body = response.json()
+    assert "detail" in body
+
+
+def test_set_user_tier_email_lookup_failure_returns_404(
+    mocker: pytest_mock.MockerFixture,
+    target_user_id: str,
+) -> None:
+    """Test that email lookup failure returns 404 (user unverifiable)."""
+    mocker.patch(
+        f"{_MOCK_MODULE}.get_user_email_by_id",
+        new_callable=AsyncMock,
+        side_effect=Exception("DB connection failed"),
+    )
+
+    response = client.post(
+        "/admin/rate_limit/tier",
+        json={"user_id": target_user_id, "tier": "PRO"},
+    )
+
+    assert response.status_code == 404
+
+
+def test_set_user_tier_user_not_found(
+    mocker: pytest_mock.MockerFixture,
+    target_user_id: str,
+) -> None:
+    """Test that setting tier for a non-existent user returns 404."""
+    mocker.patch(
+        f"{_MOCK_MODULE}.get_user_email_by_id",
+        new_callable=AsyncMock,
+        return_value=None,
+    )
+
+    response = client.post(
+        "/admin/rate_limit/tier",
+        json={"user_id": target_user_id, "tier": "PRO"},
+    )
+
+    assert response.status_code == 404
+
+
+def test_set_user_tier_db_failure(
+    mocker: pytest_mock.MockerFixture,
+    target_user_id: str,
+) -> None:
+    """Test that DB failure on set tier returns 500."""
+    mocker.patch(
+        f"{_MOCK_MODULE}.get_user_email_by_id",
+        new_callable=AsyncMock,
+        return_value=_TARGET_EMAIL,
+    )
+    mocker.patch(
+        f"{_MOCK_MODULE}.get_user_tier",
+        new_callable=AsyncMock,
+        return_value=SubscriptionTier.FREE,
+    )
+    mocker.patch(
+        f"{_MOCK_MODULE}.set_user_tier",
+        new_callable=AsyncMock,
+        side_effect=Exception("DB connection refused"),
+    )
+
+    response = client.post(
+        "/admin/rate_limit/tier",
+        json={"user_id": target_user_id, "tier": "PRO"},
+    )
+
+    assert response.status_code == 500
+
+
+def test_tier_endpoints_require_admin_role(mock_jwt_user) -> None:
+    """Test that tier admin endpoints require admin role."""
+    app.dependency_overrides[get_jwt_payload] = mock_jwt_user["get_jwt_payload"]
+
+    response = client.get("/admin/rate_limit/tier", params={"user_id": "test"})
+    assert response.status_code == 403
+
+    response = client.post(
+        "/admin/rate_limit/tier",
+        json={"user_id": "test", "tier": "PRO"},
+    )
+    assert response.status_code == 403
+
+
+# ─── search_users endpoint ──────────────────────────────────────────
+
+
+def test_search_users_returns_matching_users(
+    mocker: pytest_mock.MockerFixture,
+    admin_user_id: str,
+) -> None:
+    """Partial search should return all matching users from the User table."""
+    mocker.patch(
+        _MOCK_MODULE + ".search_users",
+        new_callable=AsyncMock,
+        return_value=[
+            ("user-1", "zamil.majdy@gmail.com"),
+            ("user-2", "zamil.majdy@agpt.co"),
+        ],
+    )
+
+    response = client.get("/admin/rate_limit/search_users", params={"query": "zamil"})
+
+    assert response.status_code == 200
+    results = response.json()
+    assert len(results) == 2
+    assert results[0]["user_email"] == "zamil.majdy@gmail.com"
+    assert results[1]["user_email"] == "zamil.majdy@agpt.co"
+
+
+def test_search_users_empty_results(
+    mocker: pytest_mock.MockerFixture,
+    admin_user_id: str,
+) -> None:
+    """Search with no matches returns empty list."""
+    mocker.patch(
+        _MOCK_MODULE + ".search_users",
+        new_callable=AsyncMock,
+        return_value=[],
+    )
+
+    response = client.get(
+        "/admin/rate_limit/search_users", params={"query": "nonexistent"}
+    )
+
+    assert response.status_code == 200
+    assert response.json() == []
+
+
+def test_search_users_short_query_rejected(
+    admin_user_id: str,
+) -> None:
+    """Query shorter than 3 characters should return 400."""
+    response = client.get("/admin/rate_limit/search_users", params={"query": "ab"})
+    assert response.status_code == 400
+
+
+def test_search_users_negative_limit_clamped(
+    mocker: pytest_mock.MockerFixture,
+    admin_user_id: str,
+) -> None:
+    """Negative limit should be clamped to 1, not passed through."""
+    mock_search = mocker.patch(
+        _MOCK_MODULE + ".search_users",
+        new_callable=AsyncMock,
+        return_value=[],
+    )
+
+    response = client.get(
+        "/admin/rate_limit/search_users", params={"query": "test", "limit": -1}
+    )
+
+    assert response.status_code == 200
+    mock_search.assert_awaited_once_with("test", limit=1)
+
+
+def test_search_users_requires_admin_role(mock_jwt_user) -> None:
+    """Test that the search_users endpoint requires admin role."""
+    app.dependency_overrides[get_jwt_payload] = mock_jwt_user["get_jwt_payload"]
+
+    response = client.get("/admin/rate_limit/search_users", params={"query": "test"})
+    assert response.status_code == 403
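The two invalid-tier tests above (`"invalid"` and `"INVALID"`) depend on enum lookup being by exact value. A minimal sketch of that behaviour with the standard library follows. The `SubscriptionTier` class here is a hypothetical stand-in listing only the members these tests mention; the real definition lives in `backend.copilot.rate_limit` and may differ. `parse_tier` is also hypothetical. In the actual routes, Pydantic performs the equivalent check while parsing `SetUserTierRequest`, and FastAPI reports the failure as a 422 response.

```python
from enum import Enum


class SubscriptionTier(str, Enum):
    # Hypothetical stand-in: only the members referenced by the tests.
    FREE = "FREE"
    PRO = "PRO"
    ENTERPRISE = "ENTERPRISE"


def parse_tier(raw: str) -> SubscriptionTier:
    """Look up a tier by its value; raises ValueError for non-members."""
    return SubscriptionTier(raw)
```

Because the lookup is case-sensitive, a lowercase value such as `"pro"` is rejected in this sketch just like an unknown name.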
--- a/autogpt_platform/backend/backend/api/features/chat/routes.py
+++ b/autogpt_platform/backend/backend/api/features/chat/routes.py
@@ -15,7 +15,8 @@ from pydantic import BaseModel, ConfigDict, Field, field_validator

 from backend.copilot import service as chat_service
 from backend.copilot import stream_registry
-from backend.copilot.config import ChatConfig
+from backend.copilot.config import ChatConfig, CopilotMode
+from backend.copilot.db import get_chat_messages_paginated
 from backend.copilot.executor.utils import enqueue_cancel_task, enqueue_copilot_turn
 from backend.copilot.model import (
    ChatMessage,
@@ -111,6 +112,11 @@ class StreamChatRequest(BaseModel):
    file_ids: list[str] | None = Field(
        default=None, max_length=20
    )  # Workspace file IDs attached to this message
+    mode: CopilotMode | None = Field(
+        default=None,
+        description="Autopilot mode: 'fast' for baseline LLM, 'extended_thinking' for Claude Agent SDK. "
+        "If None, uses the server default (extended_thinking).",
+    )


 class CreateSessionRequest(BaseModel):
@@ -150,6 +156,8 @@ class SessionDetailResponse(BaseModel):
    user_id: str | None
    messages: list[dict]
    active_stream: ActiveStreamInfo | None = None  # Present if stream is still active
+    has_more_messages: bool = False
+    oldest_sequence: int | None = None
    total_prompt_tokens: int = 0
    total_completion_tokens: int = 0
    metadata: ChatSessionMetadata = ChatSessionMetadata()
@@ -389,60 +397,78 @@ async def update_session_title_route(
 async def get_session(
    session_id: str,
    user_id: Annotated[str, Security(auth.get_user_id)],
+    limit: int = Query(default=50, ge=1, le=200),
+    before_sequence: int | None = Query(default=None, ge=0),
 ) -> SessionDetailResponse:
    """
    Retrieve the details of a specific chat session.

-    Looks up a chat session by ID for the given user (if authenticated) and returns all session data including messages.
-    If there's an active stream for this session, returns active_stream info for reconnection.
+    Supports cursor-based pagination via ``limit`` and ``before_sequence``.
+    When no pagination params are provided, returns the most recent messages.

    Args:
        session_id: The unique identifier for the desired chat session.
-        user_id: The optional authenticated user ID, or None for anonymous access.
+        user_id: The authenticated user's ID.
+        limit: Maximum number of messages to return (1-200, default 50).
+        before_sequence: Return messages with sequence < this value (cursor).

    Returns:
-        SessionDetailResponse: Details for the requested session, including active_stream info if applicable.
-
+        SessionDetailResponse: Details for the requested session, including
+            active_stream info and pagination metadata.
    """
-    session = await get_chat_session(session_id, user_id)
-    if not session:
+    page = await get_chat_messages_paginated(
+        session_id, limit, before_sequence, user_id=user_id
+    )
+    if page is None:
        raise NotFoundError(f"Session {session_id} not found.")
+    messages = [message.model_dump() for message in page.messages]

-    messages = [message.model_dump() for message in session.messages]
-
-    # Check if there's an active stream for this session
+    # Only check active stream on initial load (not on "load more" requests)
    active_stream_info = None
-    active_session, last_message_id = await stream_registry.get_active_session(
-        session_id, user_id
-    )
-    logger.info(
-        f"[GET_SESSION] session={session_id}, active_session={active_session is not None}, "
-        f"msg_count={len(messages)}, last_role={messages[-1].get('role') if messages else 'none'}"
-    )
-    if active_session:
-        # Keep the assistant message (including tool_calls) so the frontend can
-        # render the correct tool UI (e.g. CreateAgent with mini game).
-        # convertChatSessionToUiMessages handles isComplete=false by setting
-        # tool parts without output to state "input-available".
-        active_stream_info = ActiveStreamInfo(
-            turn_id=active_session.turn_id,
-            last_message_id=last_message_id,
+    if before_sequence is None:
+        active_session, last_message_id = await stream_registry.get_active_session(
+            session_id, user_id
+        )
+        logger.info(
+            f"[GET_SESSION] session={session_id}, active_session={active_session is not None}, "
+            f"msg_count={len(messages)}, last_role={messages[-1].get('role') if messages else 'none'}"
+        )
+        if active_session:
+            active_stream_info = ActiveStreamInfo(
+                turn_id=active_session.turn_id,
+                last_message_id=last_message_id,
+            )
+
+    # Skip session metadata on "load more" — frontend only needs messages
+    if before_sequence is not None:
+        return SessionDetailResponse(
+            id=page.session.session_id,
+            created_at=page.session.started_at.isoformat(),
+            updated_at=page.session.updated_at.isoformat(),
+            user_id=page.session.user_id or None,
+            messages=messages,
+            active_stream=None,
+            has_more_messages=page.has_more,
+            oldest_sequence=page.oldest_sequence,
+            total_prompt_tokens=0,
+            total_completion_tokens=0,
        )

-    # Sum token usage from session
-    total_prompt = sum(u.prompt_tokens for u in session.usage)
-    total_completion = sum(u.completion_tokens for u in session.usage)
+    total_prompt = sum(u.prompt_tokens for u in page.session.usage)
+    total_completion = sum(u.completion_tokens for u in page.session.usage)

    return SessionDetailResponse(
-        id=session.session_id,
-        created_at=session.started_at.isoformat(),
-        updated_at=session.updated_at.isoformat(),
-        user_id=session.user_id or None,
+        id=page.session.session_id,
+        created_at=page.session.started_at.isoformat(),
+        updated_at=page.session.updated_at.isoformat(),
+        user_id=page.session.user_id or None,
        messages=messages,
        active_stream=active_stream_info,
+        has_more_messages=page.has_more,
+        oldest_sequence=page.oldest_sequence,
        total_prompt_tokens=total_prompt,
        total_completion_tokens=total_completion,
-        metadata=session.metadata,
+        metadata=page.session.metadata,
    )


@@ -456,8 +482,9 @@ async def get_copilot_usage(

    Returns current token usage vs limits for daily and weekly windows.
    Global defaults sourced from LaunchDarkly (falling back to config).
+    Includes the user's rate-limit tier.
    """
-    daily_limit, weekly_limit = await get_global_rate_limits(
+    daily_limit, weekly_limit, tier = await get_global_rate_limits(
        user_id, config.daily_token_limit, config.weekly_token_limit
    )
    return await get_usage_status(
@@ -465,6 +492,7 @@ async def get_copilot_usage(
        daily_token_limit=daily_limit,
        weekly_token_limit=weekly_limit,
        rate_limit_reset_cost=config.rate_limit_reset_cost,
+        tier=tier,
    )


@@ -516,7 +544,7 @@ async def reset_copilot_usage(
            detail="Rate limit reset is not available (credit system is disabled).",
        )

-    daily_limit, weekly_limit = await get_global_rate_limits(
+    daily_limit, weekly_limit, tier = await get_global_rate_limits(
        user_id, config.daily_token_limit, config.weekly_token_limit
    )

@@ -556,6 +584,7 @@ async def reset_copilot_usage(
            user_id=user_id,
            daily_token_limit=daily_limit,
            weekly_token_limit=weekly_limit,
+            tier=tier,
        )
        if daily_limit > 0 and usage_status.daily.used < daily_limit:
            raise HTTPException(
@@ -631,6 +660,7 @@ async def reset_copilot_usage(
        daily_token_limit=daily_limit,
        weekly_token_limit=weekly_limit,
        rate_limit_reset_cost=config.rate_limit_reset_cost,
+        tier=tier,
    )

    return RateLimitResetResponse(
@@ -741,7 +771,7 @@ async def stream_chat_post(
    # Global defaults sourced from LaunchDarkly, falling back to config.
    if user_id:
        try:
-            daily_limit, weekly_limit = await get_global_rate_limits(
+            daily_limit, weekly_limit, _ = await get_global_rate_limits(
                user_id, config.daily_token_limit, config.weekly_token_limit
            )
            await check_rate_limit(
@@ -836,6 +866,7 @@ async def stream_chat_post(
        is_user_message=request.is_user_message,
        context=request.context,
        file_ids=sanitized_file_ids,
+        mode=request.mode,
    )

    setup_time = (time.perf_counter() - stream_start_time) * 1000
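Note on the `before_sequence` change in `routes.py`: the handler now does cursor-based pagination over the message history. Below is a minimal sketch of that idea, assuming each message carries an integer `sequence` and that a page holds the newest `limit` messages older than the cursor. `MessagePage` and `paginate_before` are hypothetical names for illustration, not the actual `get_chat_messages_paginated` implementation, which this diff does not show.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MessagePage:
    messages: list
    has_more: bool
    oldest_sequence: Optional[int]


def paginate_before(
    messages: list, limit: int, before_sequence: Optional[int] = None
) -> MessagePage:
    """Return up to `limit` messages older than `before_sequence`, oldest first."""
    if limit < 1:
        raise ValueError("limit must be >= 1")
    # Hypothetical in-memory stand-in for a DB query ordered by sequence.
    ordered = sorted(messages, key=lambda m: m["sequence"])
    if before_sequence is not None:
        ordered = [m for m in ordered if m["sequence"] < before_sequence]
    page = ordered[-limit:]
    # Anything left in front of the page is an older message still to load.
    has_more = len(ordered) > len(page)
    oldest = page[0]["sequence"] if page else None
    return MessagePage(messages=page, has_more=has_more, oldest_sequence=oldest)
```

A client would pass the returned `oldest_sequence` back as `before_sequence` on the next "load more" request and stop once `has_more` is false.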
--- a/autogpt_platform/backend/backend/api/features/chat/routes_test.py
+++ b/autogpt_platform/backend/backend/api/features/chat/routes_test.py
@@ -9,6 +9,7 @@ import pytest
 import pytest_mock

 from backend.api.features.chat import routes as chat_routes
+from backend.copilot.rate_limit import SubscriptionTier

 app = fastapi.FastAPI()
 app.include_router(chat_routes.router)
@@ -331,14 +332,28 @@ def _mock_usage(
    *,
    daily_used: int = 500,
    weekly_used: int = 2000,
+    daily_limit: int = 10000,
+    weekly_limit: int = 50000,
+    tier: SubscriptionTier = SubscriptionTier.FREE,
 ) -> AsyncMock:
-    """Mock get_usage_status to return a predictable CoPilotUsageStatus."""
+    """Mock get_usage_status and get_global_rate_limits for usage endpoint tests.
+
+    Mocks both ``get_global_rate_limits`` (returns the given limits + tier) and
+    ``get_usage_status`` so that tests exercise the endpoint without hitting
+    LaunchDarkly or Prisma.
+    """
    from backend.copilot.rate_limit import CoPilotUsageStatus, UsageWindow

+    mocker.patch(
+        "backend.api.features.chat.routes.get_global_rate_limits",
+        new_callable=AsyncMock,
+        return_value=(daily_limit, weekly_limit, tier),
+    )
+
    resets_at = datetime.now(UTC) + timedelta(days=1)
    status = CoPilotUsageStatus(
-        daily=UsageWindow(used=daily_used, limit=10000, resets_at=resets_at),
-        weekly=UsageWindow(used=weekly_used, limit=50000, resets_at=resets_at),
+        daily=UsageWindow(used=daily_used, limit=daily_limit, resets_at=resets_at),
+        weekly=UsageWindow(used=weekly_used, limit=weekly_limit, resets_at=resets_at),
    )
    return mocker.patch(
        "backend.api.features.chat.routes.get_usage_status",
@@ -369,6 +384,7 @@ def test_usage_returns_daily_and_weekly(
        daily_token_limit=10000,
        weekly_token_limit=50000,
        rate_limit_reset_cost=chat_routes.config.rate_limit_reset_cost,
+        tier=SubscriptionTier.FREE,
    )


@@ -376,11 +392,9 @@ def test_usage_uses_config_limits(
    mocker: pytest_mock.MockerFixture,
    test_user_id: str,
 ) -> None:
-    """The endpoint forwards daily_token_limit and weekly_token_limit from config."""
-    mock_get = _mock_usage(mocker)
+    """The endpoint forwards resolved limits from get_global_rate_limits to get_usage_status."""
+    mock_get = _mock_usage(mocker, daily_limit=99999, weekly_limit=77777)

-    mocker.patch.object(chat_routes.config, "daily_token_limit", 99999)
-    mocker.patch.object(chat_routes.config, "weekly_token_limit", 77777)
    mocker.patch.object(chat_routes.config, "rate_limit_reset_cost", 500)

    response = client.get("/usage")
@@ -391,6 +405,7 @@ def test_usage_uses_config_limits(
        daily_token_limit=99999,
        weekly_token_limit=77777,
        rate_limit_reset_cost=500,
+        tier=SubscriptionTier.FREE,
    )


@@ -526,3 +541,41 @@ def test_create_session_rejects_nested_metadata(
    )

    assert response.status_code == 422
+
+
+class TestStreamChatRequestModeValidation:
+    """Pydantic-level validation of the ``mode`` field on StreamChatRequest."""
+
+    def test_rejects_invalid_mode_value(self) -> None:
+        """Any string outside the Literal set must raise ValidationError."""
+        from pydantic import ValidationError
+
+        from backend.api.features.chat.routes import StreamChatRequest
+
+        with pytest.raises(ValidationError):
+            StreamChatRequest(message="hi", mode="turbo")  # type: ignore[arg-type]
+
+    def test_accepts_fast_mode(self) -> None:
+        from backend.api.features.chat.routes import StreamChatRequest
+
+        req = StreamChatRequest(message="hi", mode="fast")
+        assert req.mode == "fast"
+
+    def test_accepts_extended_thinking_mode(self) -> None:
+        from backend.api.features.chat.routes import StreamChatRequest
+
+        req = StreamChatRequest(message="hi", mode="extended_thinking")
+        assert req.mode == "extended_thinking"
+
+    def test_accepts_none_mode(self) -> None:
+        """``mode=None`` is valid (server decides via feature flags)."""
+        from backend.api.features.chat.routes import StreamChatRequest
+
+        req = StreamChatRequest(message="hi", mode=None)
+        assert req.mode is None
+
+    def test_mode_defaults_to_none_when_omitted(self) -> None:
+        from backend.api.features.chat.routes import StreamChatRequest
+
+        req = StreamChatRequest(message="hi")
+        assert req.mode is None
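Note on `TestStreamChatRequestModeValidation`: the tests pin down the contract of the `mode` field (two accepted strings, `None` allowed, `None` as the default). The field definition itself is not part of this diff, so the following is only a stdlib approximation of that contract. `StreamChatRequestSketch` and `ChatMode` are hypothetical names, and the real model is a Pydantic class that raises `ValidationError` rather than `ValueError`.

```python
from dataclasses import dataclass
from typing import Literal, Optional, get_args

# Accepted values taken from the tests above; the alias name is an assumption.
ChatMode = Literal["fast", "extended_thinking"]


@dataclass
class StreamChatRequestSketch:
    message: str
    mode: Optional[ChatMode] = None  # None means the server picks the mode

    def __post_init__(self) -> None:
        # Dataclasses do not enforce Literal types, so check membership by hand.
        if self.mode is not None and self.mode not in get_args(ChatMode):
            raise ValueError(f"invalid mode: {self.mode!r}")
```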
--- a/autogpt_platform/backend/backend/api/features/store/db_test.py
+++ b/autogpt_platform/backend/backend/api/features/store/db_test.py
@@ -189,6 +189,7 @@ async def test_create_store_submission(mocker):
        notifyOnAgentApproved=True,
        notifyOnAgentRejected=True,
        timezone="Europe/Delft",
+        subscriptionTier=prisma.enums.SubscriptionTier.FREE,  # type: ignore[reportCallIssue,reportAttributeAccessIssue]
    )
    mock_agent = prisma.models.AgentGraph(
        id="agent-id",
--- a/autogpt_platform/backend/backend/api/features/workspace/routes.py
+++ b/autogpt_platform/backend/backend/api/features/workspace/routes.py
@@ -12,7 +12,7 @@ import fastapi
 from autogpt_libs.auth.dependencies import get_user_id, requires_user
 from fastapi import Query, UploadFile
 from fastapi.responses import Response
-from pydantic import BaseModel
+from pydantic import BaseModel, Field

 from backend.data.workspace import (
    WorkspaceFile,
@@ -131,9 +131,26 @@ class StorageUsageResponse(BaseModel):
    file_count: int


+class WorkspaceFileItem(BaseModel):
+    id: str
+    name: str
+    path: str
+    mime_type: str
+    size_bytes: int
+    metadata: dict = Field(default_factory=dict)
+    created_at: str
+
+
+class ListFilesResponse(BaseModel):
+    files: list[WorkspaceFileItem]
+    offset: int = 0
+    has_more: bool = False
+
+
 @router.get(
    "/files/{file_id}/download",
    summary="Download file by ID",
+    operation_id="getWorkspaceDownloadFileById",
 )
 async def download_file(
    user_id: Annotated[str, fastapi.Security(get_user_id)],
@@ -158,6 +175,7 @@ async def download_file(
 @router.delete(
    "/files/{file_id}",
    summary="Delete a workspace file",
+    operation_id="deleteWorkspaceFile",
 )
 async def delete_workspace_file(
    user_id: Annotated[str, fastapi.Security(get_user_id)],
@@ -183,6 +201,7 @@ async def delete_workspace_file(
 @router.post(
    "/files/upload",
    summary="Upload file to workspace",
+    operation_id="uploadWorkspaceFile",
 )
 async def upload_file(
    user_id: Annotated[str, fastapi.Security(get_user_id)],
@@ -196,6 +215,9 @@ async def upload_file(
    Files are stored in session-scoped paths when session_id is provided,
    so the agent's session-scoped tools can discover them automatically.
    """
+    # An empty-string session_id means no session scoping; normalize it to None.
+    session_id = session_id or None
+
    config = Config()

    # Sanitize filename — strip any directory components
@@ -250,16 +272,27 @@ async def upload_file(
    manager = WorkspaceManager(user_id, workspace.id, session_id)
    try:
        workspace_file = await manager.write_file(
-            content, filename, overwrite=overwrite
+            content, filename, overwrite=overwrite, metadata={"origin": "user-upload"}
        )
    except ValueError as e:
-        raise fastapi.HTTPException(status_code=409, detail=str(e)) from e
+        # write_file raises ValueError for both path-conflict and size-limit
+        # cases; map each to its correct HTTP status.
+        message = str(e)
+        if message.startswith("File too large"):
+            raise fastapi.HTTPException(status_code=413, detail=message) from e
+        raise fastapi.HTTPException(status_code=409, detail=message) from e

    # Post-write storage check — eliminates TOCTOU race on the quota.
    # If a concurrent upload pushed us over the limit, undo this write.
    new_total = await get_workspace_total_size(workspace.id)
    if storage_limit_bytes and new_total > storage_limit_bytes:
-        await soft_delete_workspace_file(workspace_file.id, workspace.id)
+        try:
+            await soft_delete_workspace_file(workspace_file.id, workspace.id)
+        except Exception as e:
+            logger.warning(
+                f"Failed to soft-delete over-quota file {workspace_file.id} "
+                f"in workspace {workspace.id}: {e}"
+            )
        raise fastapi.HTTPException(
            status_code=413,
            detail={
@@ -281,6 +314,7 @@ async def upload_file(
 @router.get(
    "/storage/usage",
    summary="Get workspace storage usage",
+    operation_id="getWorkspaceStorageUsage",
 )
 async def get_storage_usage(
    user_id: Annotated[str, fastapi.Security(get_user_id)],
@@ -301,3 +335,57 @@ async def get_storage_usage(
        used_percent=round((used_bytes / limit_bytes) * 100, 1) if limit_bytes else 0,
        file_count=file_count,
    )
+
+
+@router.get(
+    "/files",
+    summary="List workspace files",
+    operation_id="listWorkspaceFiles",
+)
+async def list_workspace_files(
+    user_id: Annotated[str, fastapi.Security(get_user_id)],
+    session_id: str | None = Query(default=None),
+    limit: int = Query(default=200, ge=1, le=1000),
+    offset: int = Query(default=0, ge=0),
+) -> ListFilesResponse:
+    """
+    List files in the user's workspace.
+
+    When session_id is provided, only files for that session are returned.
+    Otherwise, all files across sessions are listed. Results are paginated
+    via `limit`/`offset`; `has_more` indicates whether additional pages exist.
+    """
+    workspace = await get_or_create_workspace(user_id)
+
+    # Treat empty-string session_id the same as omitted — an empty value
+    # would otherwise silently list files across every session instead of
+    # scoping to one.
+    session_id = session_id or None
+
+    manager = WorkspaceManager(user_id, workspace.id, session_id)
+    include_all = session_id is None
+    # Fetch one extra to compute has_more without a separate count query.
+    files = await manager.list_files(
+        limit=limit + 1,
+        offset=offset,
+        include_all_sessions=include_all,
+    )
+    has_more = len(files) > limit
+    page = files[:limit]
+
+    return ListFilesResponse(
+        files=[
+            WorkspaceFileItem(
+                id=f.id,
+                name=f.name,
+                path=f.path,
+                mime_type=f.mime_type,
+                size_bytes=f.size_bytes,
+                metadata=f.metadata or {},
+                created_at=f.created_at.isoformat(),
+            )
+            for f in page
+        ],
+        offset=offset,
+        has_more=has_more,
+    )
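Note on `list_workspace_files`: the route requests `limit + 1` rows so it can set `has_more` without a separate count query. The sketch below shows that technique in isolation, with a plain sequence standing in for `manager.list_files`. `fetch_page` is a hypothetical helper and does not exist in the codebase.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def fetch_page(rows: Sequence[T], limit: int, offset: int = 0) -> tuple[list[T], bool]:
    """Return one page of `rows` and whether another page exists after it."""
    # Ask for one row beyond the page size, as the route does with limit + 1.
    fetched = list(rows[offset : offset + limit + 1])
    # The extra row is only a signal; it is trimmed off before returning.
    has_more = len(fetched) > limit
    return fetched[:limit], has_more
```

The trade-off is one extra row per request in exchange for skipping a `COUNT` query. The response can then say whether more pages exist, but not how many.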
--- a/autogpt_platform/backend/backend/api/features/workspace/routes_test.py
+++ b/autogpt_platform/backend/backend/api/features/workspace/routes_test.py
@@ -1,48 +1,28 @@
-"""Tests for workspace file upload and download routes."""
-
 import io
 from datetime import datetime, timezone
+from unittest.mock import AsyncMock, MagicMock, patch

 import fastapi
 import fastapi.testclient
 import pytest
-import pytest_mock

-from backend.api.features.workspace import routes as workspace_routes
-from backend.data.workspace import WorkspaceFile
+from backend.api.features.workspace.routes import router
+from backend.data.workspace import Workspace, WorkspaceFile

 app = fastapi.FastAPI()
-app.include_router(workspace_routes.router)
+app.include_router(router)


 @app.exception_handler(ValueError)
 async def _value_error_handler(
    request: fastapi.Request, exc: ValueError
 ) -> fastapi.responses.JSONResponse:
-    """Mirror the production ValueError → 400 mapping from rest_api.py."""
+    """Mirror the production ValueError → 400 mapping from the REST app."""
    return fastapi.responses.JSONResponse(status_code=400, content={"detail": str(exc)})


 client = fastapi.testclient.TestClient(app)

-TEST_USER_ID = "3e53486c-cf57-477e-ba2a-cb02dc828e1a"
-
-MOCK_WORKSPACE = type("W", (), {"id": "ws-1"})()
-
-_NOW = datetime(2023, 1, 1, tzinfo=timezone.utc)
-
-MOCK_FILE = WorkspaceFile(
-    id="file-aaa-bbb",
-    workspace_id="ws-1",
-    created_at=_NOW,
-    updated_at=_NOW,
-    name="hello.txt",
-    path="/session/hello.txt",
-    mime_type="text/plain",
-    size_bytes=13,
-    storage_path="local://hello.txt",
-)
-

 @pytest.fixture(autouse=True)
 def setup_app_auth(mock_jwt_user):
@@ -53,25 +33,201 @@ def setup_app_auth(mock_jwt_user):
    app.dependency_overrides.clear()


+def _make_workspace(user_id: str = "test-user-id") -> Workspace:
+    return Workspace(
+        id="ws-001",
+        user_id=user_id,
+        created_at=datetime(2026, 1, 1, tzinfo=timezone.utc),
+        updated_at=datetime(2026, 1, 1, tzinfo=timezone.utc),
+    )
+
+
+def _make_file(**overrides) -> WorkspaceFile:
+    defaults = {
+        "id": "file-001",
+        "workspace_id": "ws-001",
+        "created_at": datetime(2026, 1, 1, tzinfo=timezone.utc),
+        "updated_at": datetime(2026, 1, 1, tzinfo=timezone.utc),
+        "name": "test.txt",
+        "path": "/test.txt",
+        "storage_path": "local://test.txt",
+        "mime_type": "text/plain",
+        "size_bytes": 100,
+        "checksum": None,
+        "is_deleted": False,
+        "deleted_at": None,
+        "metadata": {},
+    }
+    defaults.update(overrides)
+    return WorkspaceFile(**defaults)
+
+
+def _make_file_mock(**overrides) -> MagicMock:
+    """Create a mock WorkspaceFile to simulate DB records with null fields."""
+    defaults = {
+        "id": "file-001",
+        "name": "test.txt",
+        "path": "/test.txt",
+        "mime_type": "text/plain",
+        "size_bytes": 100,
+        "metadata": {},
+        "created_at": datetime(2026, 1, 1, tzinfo=timezone.utc),
+    }
+    defaults.update(overrides)
+    mock = MagicMock(spec=WorkspaceFile)
+    for k, v in defaults.items():
+        setattr(mock, k, v)
+    return mock
+
+
+# -- list_workspace_files tests --
+
+
+@patch("backend.api.features.workspace.routes.get_or_create_workspace")
+@patch("backend.api.features.workspace.routes.WorkspaceManager")
+def test_list_files_returns_all_when_no_session(mock_manager_cls, mock_get_workspace):
+    mock_get_workspace.return_value = _make_workspace()
+    files = [
+        _make_file(id="f1", name="a.txt", metadata={"origin": "user-upload"}),
+        _make_file(id="f2", name="b.csv", metadata={"origin": "agent-created"}),
+    ]
+    mock_instance = AsyncMock()
+    mock_instance.list_files.return_value = files
+    mock_manager_cls.return_value = mock_instance
+
+    response = client.get("/files")
+    assert response.status_code == 200
+
+    data = response.json()
+    assert len(data["files"]) == 2
+    assert data["has_more"] is False
+    assert data["offset"] == 0
+    assert data["files"][0]["id"] == "f1"
+    assert data["files"][0]["metadata"] == {"origin": "user-upload"}
+    assert data["files"][1]["id"] == "f2"
+    mock_instance.list_files.assert_called_once_with(
+        limit=201, offset=0, include_all_sessions=True
+    )
+
+
+@patch("backend.api.features.workspace.routes.get_or_create_workspace")
+@patch("backend.api.features.workspace.routes.WorkspaceManager")
+def test_list_files_scopes_to_session_when_provided(
+    mock_manager_cls, mock_get_workspace, test_user_id
+):
+    mock_get_workspace.return_value = _make_workspace(user_id=test_user_id)
+    mock_instance = AsyncMock()
+    mock_instance.list_files.return_value = []
+    mock_manager_cls.return_value = mock_instance
+
+    response = client.get("/files?session_id=sess-123")
+    assert response.status_code == 200
+
+    data = response.json()
+    assert data["files"] == []
+    assert data["has_more"] is False
+    mock_manager_cls.assert_called_once_with(test_user_id, "ws-001", "sess-123")
+    mock_instance.list_files.assert_called_once_with(
+        limit=201, offset=0, include_all_sessions=False
+    )
+
+
+@patch("backend.api.features.workspace.routes.get_or_create_workspace")
+@patch("backend.api.features.workspace.routes.WorkspaceManager")
+def test_list_files_null_metadata_coerced_to_empty_dict(
+    mock_manager_cls, mock_get_workspace
+):
+    """Route uses `f.metadata or {}` for pre-existing files with null metadata."""
+    mock_get_workspace.return_value = _make_workspace()
+    mock_instance = AsyncMock()
+    mock_instance.list_files.return_value = [_make_file_mock(metadata=None)]
+    mock_manager_cls.return_value = mock_instance
+
+    response = client.get("/files")
+    assert response.status_code == 200
+    assert response.json()["files"][0]["metadata"] == {}
+
+
+# -- upload_file metadata tests --
+
+
+@patch("backend.api.features.workspace.routes.get_or_create_workspace")
+@patch("backend.api.features.workspace.routes.get_workspace_total_size")
+@patch("backend.api.features.workspace.routes.scan_content_safe")
+@patch("backend.api.features.workspace.routes.WorkspaceManager")
+def test_upload_passes_user_upload_origin_metadata(
+    mock_manager_cls, mock_scan, mock_total_size, mock_get_workspace
+):
+    mock_get_workspace.return_value = _make_workspace()
+    mock_total_size.return_value = 100
+    written = _make_file(id="new-file", name="doc.pdf")
+    mock_instance = AsyncMock()
+    mock_instance.write_file.return_value = written
+    mock_manager_cls.return_value = mock_instance
+
+    response = client.post(
+        "/files/upload",
+        files={"file": ("doc.pdf", b"fake-pdf-content", "application/pdf")},
+    )
+    assert response.status_code == 200
+
+    mock_instance.write_file.assert_called_once()
+    call_kwargs = mock_instance.write_file.call_args
+    assert call_kwargs.kwargs.get("metadata") == {"origin": "user-upload"}
+
+
+@patch("backend.api.features.workspace.routes.get_or_create_workspace")
+@patch("backend.api.features.workspace.routes.get_workspace_total_size")
+@patch("backend.api.features.workspace.routes.scan_content_safe")
+@patch("backend.api.features.workspace.routes.WorkspaceManager")
+def test_upload_returns_409_on_file_conflict(
+    mock_manager_cls, mock_scan, mock_total_size, mock_get_workspace
+):
+    mock_get_workspace.return_value = _make_workspace()
+    mock_total_size.return_value = 100
+    mock_instance = AsyncMock()
+    mock_instance.write_file.side_effect = ValueError("File already exists at path")
+    mock_manager_cls.return_value = mock_instance
+
+    response = client.post(
+        "/files/upload",
+        files={"file": ("dup.txt", b"content", "text/plain")},
+    )
+    assert response.status_code == 409
+    assert "already exists" in response.json()["detail"]
+
+
+# -- Restored upload/download/delete security + invariant tests --
+
+
 def _upload(
    filename: str = "hello.txt",
    content: bytes = b"Hello, world!",
    content_type: str = "text/plain",
 ):
-    """Helper to POST a file upload."""
    return client.post(
        "/files/upload?session_id=sess-1",
        files={"file": (filename, io.BytesIO(content), content_type)},
    )


-# ---- Happy path ----
+_MOCK_FILE = WorkspaceFile(
+    id="file-aaa-bbb",
+    workspace_id="ws-001",
+    created_at=datetime(2026, 1, 1, tzinfo=timezone.utc),
+    updated_at=datetime(2026, 1, 1, tzinfo=timezone.utc),
+    name="hello.txt",
+    path="/sessions/sess-1/hello.txt",
+    mime_type="text/plain",
+    size_bytes=13,
+    storage_path="local://hello.txt",
+)


-def test_upload_happy_path(mocker: pytest_mock.MockFixture):
+def test_upload_happy_path(mocker):
    mocker.patch(
        "backend.api.features.workspace.routes.get_or_create_workspace",
-        return_value=MOCK_WORKSPACE,
+        return_value=_make_workspace(),
    )
    mocker.patch(
        "backend.api.features.workspace.routes.get_workspace_total_size",
@@ -82,7 +238,7 @@ def test_upload_happy_path(mocker: pytest_mock.MockFixture):
        return_value=None,
    )
    mock_manager = mocker.MagicMock()
-    mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE)
+    mock_manager.write_file = mocker.AsyncMock(return_value=_MOCK_FILE)
    mocker.patch(
        "backend.api.features.workspace.routes.WorkspaceManager",
        return_value=mock_manager,
@@ -96,10 +252,7 @@ def test_upload_happy_path(mocker: pytest_mock.MockFixture):
    assert data["size_bytes"] == 13


-# ---- Per-file size limit ----
-
-
-def test_upload_exceeds_max_file_size(mocker: pytest_mock.MockFixture):
+def test_upload_exceeds_max_file_size(mocker):
    """Files larger than max_file_size_mb should be rejected with 413."""
    cfg = mocker.patch("backend.api.features.workspace.routes.Config")
    cfg.return_value.max_file_size_mb = 0  # 0 MB → any content is too big
@@ -109,15 +262,11 @@ def test_upload_exceeds_max_file_size(mocker: pytest_mock.MockFixture):
    assert response.status_code == 413


-# ---- Storage quota exceeded ----
-
-
-def test_upload_storage_quota_exceeded(mocker: pytest_mock.MockFixture):
+def test_upload_storage_quota_exceeded(mocker):
    mocker.patch(
        "backend.api.features.workspace.routes.get_or_create_workspace",
-        return_value=MOCK_WORKSPACE,
+        return_value=_make_workspace(),
    )
-    # Current usage already at limit
    mocker.patch(
        "backend.api.features.workspace.routes.get_workspace_total_size",
        return_value=500 * 1024 * 1024,
@@ -128,27 +277,22 @@ def test_upload_storage_quota_exceeded(mocker: pytest_mock.MockFixture):
    assert "Storage limit exceeded" in response.text


-# ---- Post-write quota race (B2) ----
-
-
-def test_upload_post_write_quota_race(mocker: pytest_mock.MockFixture):
-    """If a concurrent upload tips the total over the limit after write,
-    the file should be soft-deleted and 413 returned."""
+def test_upload_post_write_quota_race(mocker):
+    """If a concurrent upload exceeds the quota after the write: soft-delete and 413."""
    mocker.patch(
        "backend.api.features.workspace.routes.get_or_create_workspace",
-        return_value=MOCK_WORKSPACE,
+        return_value=_make_workspace(),
    )
-    # Pre-write check passes (under limit), but post-write check fails
    mocker.patch(
        "backend.api.features.workspace.routes.get_workspace_total_size",
-        side_effect=[0, 600 * 1024 * 1024],  # first call OK, second over limit
+        side_effect=[0, 600 * 1024 * 1024],
    )
    mocker.patch(
        "backend.api.features.workspace.routes.scan_content_safe",
        return_value=None,
    )
    mock_manager = mocker.MagicMock()
-    mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE)
+    mock_manager.write_file = mocker.AsyncMock(return_value=_MOCK_FILE)
    mocker.patch(
        "backend.api.features.workspace.routes.WorkspaceManager",
        return_value=mock_manager,
@@ -160,17 +304,14 @@ def test_upload_post_write_quota_race(mocker: pytest_mock.MockFixture):

    response = _upload()
    assert response.status_code == 413
-    mock_delete.assert_called_once_with("file-aaa-bbb", "ws-1")
+    mock_delete.assert_called_once_with("file-aaa-bbb", "ws-001")


-# ---- Any extension accepted (no allowlist) ----
-
-
-def test_upload_any_extension(mocker: pytest_mock.MockFixture):
+def test_upload_any_extension(mocker):
    """Any file extension should be accepted — ClamAV is the security layer."""
    mocker.patch(
        "backend.api.features.workspace.routes.get_or_create_workspace",
-        return_value=MOCK_WORKSPACE,
+        return_value=_make_workspace(),
    )
    mocker.patch(
        "backend.api.features.workspace.routes.get_workspace_total_size",
@@ -181,7 +322,7 @@ def test_upload_any_extension(mocker: pytest_mock.MockFixture):
        return_value=None,
    )
    mock_manager = mocker.MagicMock()
-    mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE)
+    mock_manager.write_file = mocker.AsyncMock(return_value=_MOCK_FILE)
    mocker.patch(
        "backend.api.features.workspace.routes.WorkspaceManager",
        return_value=mock_manager,
@@ -191,16 +332,13 @@ def test_upload_any_extension(mocker: pytest_mock.MockFixture):
    assert response.status_code == 200


-# ---- Virus scan rejection ----
-
-
-def test_upload_blocked_by_virus_scan(mocker: pytest_mock.MockFixture):
+def test_upload_blocked_by_virus_scan(mocker):
    """Files flagged by ClamAV should be rejected and never written to storage."""
    from backend.api.features.store.exceptions import VirusDetectedError

    mocker.patch(
        "backend.api.features.workspace.routes.get_or_create_workspace",
-        return_value=MOCK_WORKSPACE,
+        return_value=_make_workspace(),
    )
    mocker.patch(
        "backend.api.features.workspace.routes.get_workspace_total_size",
@@ -211,7 +349,7 @@ def test_upload_blocked_by_virus_scan(mocker: pytest_mock.MockFixture):
        side_effect=VirusDetectedError("Eicar-Test-Signature"),
    )
    mock_manager = mocker.MagicMock()
-    mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE)
+    mock_manager.write_file = mocker.AsyncMock(return_value=_MOCK_FILE)
    mocker.patch(
        "backend.api.features.workspace.routes.WorkspaceManager",
        return_value=mock_manager,
@@ -219,18 +357,14 @@ def test_upload_blocked_by_virus_scan(mocker: pytest_mock.MockFixture):

    response = _upload(filename="evil.exe", content=b"X5O!P%@AP...")
    assert response.status_code == 400
-    assert "Virus detected" in response.text
    mock_manager.write_file.assert_not_called()


-# ---- No file extension ----
-
-
-def test_upload_file_without_extension(mocker: pytest_mock.MockFixture):
+def test_upload_file_without_extension(mocker):
    """Files without an extension should be accepted and stored as-is."""
    mocker.patch(
        "backend.api.features.workspace.routes.get_or_create_workspace",
-        return_value=MOCK_WORKSPACE,
+        return_value=_make_workspace(),
    )
    mocker.patch(
        "backend.api.features.workspace.routes.get_workspace_total_size",
@@ -241,7 +375,7 @@ def test_upload_file_without_extension(mocker: pytest_mock.MockFixture):
        return_value=None,
    )
    mock_manager = mocker.MagicMock()
-    mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE)
+    mock_manager.write_file = mocker.AsyncMock(return_value=_MOCK_FILE)
    mocker.patch(
        "backend.api.features.workspace.routes.WorkspaceManager",
        return_value=mock_manager,
@@ -257,14 +391,11 @@ def test_upload_file_without_extension(mocker: pytest_mock.MockFixture):
    assert mock_manager.write_file.call_args[0][1] == "Makefile"


-# ---- Filename sanitization (SF5) ----
-
-
-def test_upload_strips_path_components(mocker: pytest_mock.MockFixture):
+def test_upload_strips_path_components(mocker):
    """Path-traversal filenames should be reduced to their basename."""
    mocker.patch(
        "backend.api.features.workspace.routes.get_or_create_workspace",
-        return_value=MOCK_WORKSPACE,
+        return_value=_make_workspace(),
    )
    mocker.patch(
        "backend.api.features.workspace.routes.get_workspace_total_size",
@@ -275,28 +406,23 @@ def test_upload_strips_path_components(mocker: pytest_mock.MockFixture):
        return_value=None,
    )
    mock_manager = mocker.MagicMock()
-    mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE)
+    mock_manager.write_file = mocker.AsyncMock(return_value=_MOCK_FILE)
    mocker.patch(
        "backend.api.features.workspace.routes.WorkspaceManager",
        return_value=mock_manager,
    )

-    # Filename with traversal
    _upload(filename="../../etc/passwd.txt")

-    # write_file should have been called with just the basename
    mock_manager.write_file.assert_called_once()
    call_args = mock_manager.write_file.call_args
    assert call_args[0][1] == "passwd.txt"


-# ---- Download ----
-
-
-def test_download_file_not_found(mocker: pytest_mock.MockFixture):
+def test_download_file_not_found(mocker):
    mocker.patch(
        "backend.api.features.workspace.routes.get_workspace",
-        return_value=MOCK_WORKSPACE,
+        return_value=_make_workspace(),
    )
    mocker.patch(
        "backend.api.features.workspace.routes.get_workspace_file",
@@ -307,14 +433,11 @@ def test_download_file_not_found(mocker: pytest_mock.MockFixture):
    assert response.status_code == 404


-# ---- Delete ----
-
-
-def test_delete_file_success(mocker: pytest_mock.MockFixture):
+def test_delete_file_success(mocker):
    """Deleting an existing file should return {"deleted": true}."""
    mocker.patch(
        "backend.api.features.workspace.routes.get_workspace",
-        return_value=MOCK_WORKSPACE,
+        return_value=_make_workspace(),
    )
    mock_manager = mocker.MagicMock()
    mock_manager.delete_file = mocker.AsyncMock(return_value=True)
@@ -329,11 +452,11 @@ def test_delete_file_success(mocker: pytest_mock.MockFixture):
    mock_manager.delete_file.assert_called_once_with("file-aaa-bbb")


-def test_delete_file_not_found(mocker: pytest_mock.MockFixture):
+def test_delete_file_not_found(mocker):
    """Deleting a non-existent file should return 404."""
    mocker.patch(
        "backend.api.features.workspace.routes.get_workspace",
-        return_value=MOCK_WORKSPACE,
+        return_value=_make_workspace(),
    )
    mock_manager = mocker.MagicMock()
    mock_manager.delete_file = mocker.AsyncMock(return_value=False)
@@ -347,7 +470,7 @@ def test_delete_file_not_found(mocker: pytest_mock.MockFixture):
    assert "File not found" in response.text


-def test_delete_file_no_workspace(mocker: pytest_mock.MockFixture):
+def test_delete_file_no_workspace(mocker):
    """Deleting when user has no workspace should return 404."""
    mocker.patch(
        "backend.api.features.workspace.routes.get_workspace",
@@ -357,3 +480,123 @@ def test_delete_file_no_workspace(mocker: pytest_mock.MockFixture):
    response = client.delete("/files/file-aaa-bbb")
    assert response.status_code == 404
    assert "Workspace not found" in response.text
+
+
+def test_upload_write_file_too_large_returns_413(mocker):
+    """write_file raises ValueError("File too large: …") → must map to 413."""
+    mocker.patch(
+        "backend.api.features.workspace.routes.get_or_create_workspace",
+        return_value=_make_workspace(),
+    )
+    mocker.patch(
+        "backend.api.features.workspace.routes.get_workspace_total_size",
+        return_value=0,
+    )
+    mocker.patch(
+        "backend.api.features.workspace.routes.scan_content_safe",
+        return_value=None,
+    )
+    mock_manager = mocker.MagicMock()
+    mock_manager.write_file = mocker.AsyncMock(
+        side_effect=ValueError("File too large: 900 bytes exceeds 1MB limit")
+    )
+    mocker.patch(
+        "backend.api.features.workspace.routes.WorkspaceManager",
+        return_value=mock_manager,
+    )
+
+    response = _upload()
+    assert response.status_code == 413
+    assert "File too large" in response.text
+
+
+def test_upload_write_file_conflict_returns_409(mocker):
+    """Non-'File too large' ValueErrors from write_file stay as 409."""
+    mocker.patch(
+        "backend.api.features.workspace.routes.get_or_create_workspace",
+        return_value=_make_workspace(),
+    )
+    mocker.patch(
+        "backend.api.features.workspace.routes.get_workspace_total_size",
+        return_value=0,
+    )
+    mocker.patch(
+        "backend.api.features.workspace.routes.scan_content_safe",
+        return_value=None,
+    )
+    mock_manager = mocker.MagicMock()
+    mock_manager.write_file = mocker.AsyncMock(
+        side_effect=ValueError("File already exists at path: /sessions/x/a.txt")
+    )
+    mocker.patch(
+        "backend.api.features.workspace.routes.WorkspaceManager",
+        return_value=mock_manager,
+    )
+
+    response = _upload()
+    assert response.status_code == 409
+    assert "already exists" in response.text
+
+
+@patch("backend.api.features.workspace.routes.get_or_create_workspace")
+@patch("backend.api.features.workspace.routes.WorkspaceManager")
+def test_list_files_has_more_true_when_limit_exceeded(
+    mock_manager_cls, mock_get_workspace
+):
+    """The limit+1 fetch trick must flip has_more=True and trim the page."""
+    mock_get_workspace.return_value = _make_workspace()
+    # Backend was asked for limit+1=3, and returned exactly 3 items.
+    files = [
+        _make_file(id="f1", name="a.txt"),
+        _make_file(id="f2", name="b.txt"),
+        _make_file(id="f3", name="c.txt"),
+    ]
+    mock_instance = AsyncMock()
+    mock_instance.list_files.return_value = files
+    mock_manager_cls.return_value = mock_instance
+
+    response = client.get("/files?limit=2")
+    assert response.status_code == 200
+    data = response.json()
+    assert data["has_more"] is True
+    assert len(data["files"]) == 2
+    assert data["files"][0]["id"] == "f1"
+    assert data["files"][1]["id"] == "f2"
+    mock_instance.list_files.assert_called_once_with(
+        limit=3, offset=0, include_all_sessions=True
+    )
+
+
+@patch("backend.api.features.workspace.routes.get_or_create_workspace")
+@patch("backend.api.features.workspace.routes.WorkspaceManager")
+def test_list_files_has_more_false_when_exactly_page_size(
+    mock_manager_cls, mock_get_workspace
+):
+    """Exactly `limit` rows means we're on the last page — has_more=False."""
+    mock_get_workspace.return_value = _make_workspace()
+    files = [_make_file(id="f1", name="a.txt"), _make_file(id="f2", name="b.txt")]
+    mock_instance = AsyncMock()
+    mock_instance.list_files.return_value = files
+    mock_manager_cls.return_value = mock_instance
+
+    response = client.get("/files?limit=2")
+    assert response.status_code == 200
+    data = response.json()
+    assert data["has_more"] is False
+    assert len(data["files"]) == 2
+
+
+@patch("backend.api.features.workspace.routes.get_or_create_workspace")
+@patch("backend.api.features.workspace.routes.WorkspaceManager")
+def test_list_files_offset_is_echoed_back(mock_manager_cls, mock_get_workspace):
+    mock_get_workspace.return_value = _make_workspace()
+    mock_instance = AsyncMock()
+    mock_instance.list_files.return_value = []
+    mock_manager_cls.return_value = mock_instance
+
+    response = client.get("/files?offset=50&limit=10")
+    assert response.status_code == 200
+    assert response.json()["offset"] == 50
+    mock_instance.list_files.assert_called_once_with(
+        limit=11, offset=50, include_all_sessions=True
+    )
--- a/autogpt_platform/backend/backend/blocks/llm.py
+++ b/autogpt_platform/backend/backend/blocks/llm.py
@@ -205,6 +205,19 @@ class LlmModel(str, Enum, metaclass=LlmModelMeta):
    KIMI_K2 = "moonshotai/kimi-k2"
    QWEN3_235B_A22B_THINKING = "qwen/qwen3-235b-a22b-thinking-2507"
    QWEN3_CODER = "qwen/qwen3-coder"
+    # Z.ai (Zhipu) models
+    ZAI_GLM_4_32B = "z-ai/glm-4-32b"
+    ZAI_GLM_4_5 = "z-ai/glm-4.5"
+    ZAI_GLM_4_5_AIR = "z-ai/glm-4.5-air"
+    ZAI_GLM_4_5_AIR_FREE = "z-ai/glm-4.5-air:free"
+    ZAI_GLM_4_5V = "z-ai/glm-4.5v"
+    ZAI_GLM_4_6 = "z-ai/glm-4.6"
+    ZAI_GLM_4_6V = "z-ai/glm-4.6v"
+    ZAI_GLM_4_7 = "z-ai/glm-4.7"
+    ZAI_GLM_4_7_FLASH = "z-ai/glm-4.7-flash"
+    ZAI_GLM_5 = "z-ai/glm-5"
+    ZAI_GLM_5_TURBO = "z-ai/glm-5-turbo"
+    ZAI_GLM_5V_TURBO = "z-ai/glm-5v-turbo"
    # Llama API models
    LLAMA_API_LLAMA_4_SCOUT = "Llama-4-Scout-17B-16E-Instruct-FP8"
    LLAMA_API_LLAMA4_MAVERICK = "Llama-4-Maverick-17B-128E-Instruct-FP8"
@@ -630,6 +643,43 @@ MODEL_METADATA = {
    LlmModel.QWEN3_CODER: ModelMetadata(
        "open_router", 262144, 262144, "Qwen 3 Coder", "OpenRouter", "Qwen", 3
    ),
+    # https://openrouter.ai/models?q=z-ai
+    LlmModel.ZAI_GLM_4_32B: ModelMetadata(
+        "open_router", 128000, 128000, "GLM 4 32B", "OpenRouter", "Z.ai", 1
+    ),
+    LlmModel.ZAI_GLM_4_5: ModelMetadata(
+        "open_router", 131072, 98304, "GLM 4.5", "OpenRouter", "Z.ai", 2
+    ),
+    LlmModel.ZAI_GLM_4_5_AIR: ModelMetadata(
+        "open_router", 131072, 98304, "GLM 4.5 Air", "OpenRouter", "Z.ai", 1
+    ),
+    LlmModel.ZAI_GLM_4_5_AIR_FREE: ModelMetadata(
+        "open_router", 131072, 96000, "GLM 4.5 Air (Free)", "OpenRouter", "Z.ai", 1
+    ),
+    LlmModel.ZAI_GLM_4_5V: ModelMetadata(
+        "open_router", 65536, 16384, "GLM 4.5V", "OpenRouter", "Z.ai", 2
+    ),
+    LlmModel.ZAI_GLM_4_6: ModelMetadata(
+        "open_router", 204800, 204800, "GLM 4.6", "OpenRouter", "Z.ai", 1
+    ),
+    LlmModel.ZAI_GLM_4_6V: ModelMetadata(
+        "open_router", 131072, 131072, "GLM 4.6V", "OpenRouter", "Z.ai", 1
+    ),
+    LlmModel.ZAI_GLM_4_7: ModelMetadata(
+        "open_router", 202752, 65535, "GLM 4.7", "OpenRouter", "Z.ai", 1
+    ),
+    LlmModel.ZAI_GLM_4_7_FLASH: ModelMetadata(
+        "open_router", 202752, 202752, "GLM 4.7 Flash", "OpenRouter", "Z.ai", 1
+    ),
+    LlmModel.ZAI_GLM_5: ModelMetadata(
+        "open_router", 80000, 80000, "GLM 5", "OpenRouter", "Z.ai", 2
+    ),
+    LlmModel.ZAI_GLM_5_TURBO: ModelMetadata(
+        "open_router", 202752, 131072, "GLM 5 Turbo", "OpenRouter", "Z.ai", 3
+    ),
+    LlmModel.ZAI_GLM_5V_TURBO: ModelMetadata(
+        "open_router", 202752, 131072, "GLM 5V Turbo", "OpenRouter", "Z.ai", 3
+    ),
    # Llama API models
    LlmModel.LLAMA_API_LLAMA_4_SCOUT: ModelMetadata(
        "llama_api",
--- a/autogpt_platform/backend/backend/copilot/baseline/service.py
+++ b/autogpt_platform/backend/backend/copilot/baseline/service.py
--- a/autogpt_platform/backend/backend/copilot/baseline/service_unit_test.py
+++ b/autogpt_platform/backend/backend/copilot/baseline/service_unit_test.py
@@ -0,0 +1,633 @@
+"""Unit tests for baseline service pure-logic helpers.
+
+These tests cover ``_baseline_conversation_updater`` and ``_BaselineStreamState``
+without requiring API keys, database connections, or network access.
+"""
+
+from unittest.mock import AsyncMock, patch
+
+import pytest
+from openai.types.chat import ChatCompletionToolParam
+
+from backend.copilot.baseline.service import (
+    _baseline_conversation_updater,
+    _BaselineStreamState,
+    _compress_session_messages,
+    _ThinkingStripper,
+)
+from backend.copilot.model import ChatMessage
+from backend.copilot.transcript_builder import TranscriptBuilder
+from backend.util.prompt import CompressResult
+from backend.util.tool_call_loop import LLMLoopResponse, LLMToolCall, ToolCallResult
+
+
+class TestBaselineStreamState:
+    def test_defaults(self):
+        state = _BaselineStreamState()
+        assert state.pending_events == []
+        assert state.assistant_text == ""
+        assert state.text_started is False
+        assert state.turn_prompt_tokens == 0
+        assert state.turn_completion_tokens == 0
+        assert state.text_block_id  # Should be a UUID string
+
+    def test_mutable_fields(self):
+        state = _BaselineStreamState()
+        state.assistant_text = "hello"
+        state.turn_prompt_tokens = 100
+        state.turn_completion_tokens = 50
+        assert state.assistant_text == "hello"
+        assert state.turn_prompt_tokens == 100
+        assert state.turn_completion_tokens == 50
+
+
+class TestBaselineConversationUpdater:
+    """Tests for _baseline_conversation_updater which updates the OpenAI
+    message list and transcript builder after each LLM call."""
+
+    def _make_transcript_builder(self) -> TranscriptBuilder:
+        builder = TranscriptBuilder()
+        builder.append_user("test question")
+        return builder
+
+    def test_text_only_response(self):
+        """When the LLM returns text without tool calls, the updater appends
+        a single assistant message and records it in the transcript."""
+        messages: list = []
+        builder = self._make_transcript_builder()
+        response = LLMLoopResponse(
+            response_text="Hello, world!",
+            tool_calls=[],
+            raw_response=None,
+            prompt_tokens=0,
+            completion_tokens=0,
+        )
+
+        _baseline_conversation_updater(
+            messages,
+            response,
+            tool_results=None,
+            transcript_builder=builder,
+            model="test-model",
+        )
+
+        assert len(messages) == 1
+        assert messages[0]["role"] == "assistant"
+        assert messages[0]["content"] == "Hello, world!"
+        # Transcript should have user + assistant
+        assert builder.entry_count == 2
+        assert builder.last_entry_type == "assistant"
+
+    def test_tool_calls_response(self):
+        """When the LLM returns tool calls, the updater appends the assistant
+        message with tool_calls and tool result messages."""
+        messages: list = []
+        builder = self._make_transcript_builder()
+        response = LLMLoopResponse(
+            response_text="Let me search...",
+            tool_calls=[
+                LLMToolCall(
+                    id="tc_1",
+                    name="search",
+                    arguments='{"query": "test"}',
+                ),
+            ],
+            raw_response=None,
+            prompt_tokens=0,
+            completion_tokens=0,
+        )
+        tool_results = [
+            ToolCallResult(
+                tool_call_id="tc_1",
+                tool_name="search",
+                content="Found result",
+            ),
+        ]
+
+        _baseline_conversation_updater(
+            messages,
+            response,
+            tool_results=tool_results,
+            transcript_builder=builder,
+            model="test-model",
+        )
+
+        # Messages: assistant (with tool_calls) + tool result
+        assert len(messages) == 2
+        assert messages[0]["role"] == "assistant"
+        assert messages[0]["content"] == "Let me search..."
+        assert len(messages[0]["tool_calls"]) == 1
+        assert messages[0]["tool_calls"][0]["id"] == "tc_1"
+        assert messages[1]["role"] == "tool"
+        assert messages[1]["tool_call_id"] == "tc_1"
+        assert messages[1]["content"] == "Found result"
+
+        # Transcript: user + assistant(tool_use) + user(tool_result)
+        assert builder.entry_count == 3
+
+    def test_tool_calls_without_text(self):
+        """Tool calls without accompanying text should still work."""
+        messages: list = []
+        builder = self._make_transcript_builder()
+        response = LLMLoopResponse(
+            response_text=None,
+            tool_calls=[
+                LLMToolCall(id="tc_1", name="run", arguments="{}"),
+            ],
+            raw_response=None,
+            prompt_tokens=0,
+            completion_tokens=0,
+        )
+        tool_results = [
+            ToolCallResult(tool_call_id="tc_1", tool_name="run", content="done"),
+        ]
+
+        _baseline_conversation_updater(
+            messages,
+            response,
+            tool_results=tool_results,
+            transcript_builder=builder,
+            model="test-model",
+        )
+
+        assert len(messages) == 2
+        assert "content" not in messages[0]  # No text content
+        assert messages[0]["tool_calls"][0]["function"]["name"] == "run"
+
+    def test_no_text_no_tools(self):
+        """When the response has no text and no tool calls, nothing is appended."""
+        messages: list = []
+        builder = self._make_transcript_builder()
+        response = LLMLoopResponse(
+            response_text=None,
+            tool_calls=[],
+            raw_response=None,
+            prompt_tokens=0,
+            completion_tokens=0,
+        )
+
+        _baseline_conversation_updater(
+            messages,
+            response,
+            tool_results=None,
+            transcript_builder=builder,
+            model="test-model",
+        )
+
+        assert len(messages) == 0
+        # Only the user entry from setup
+        assert builder.entry_count == 1
+
+    def test_multiple_tool_calls(self):
+        """Multiple tool calls in a single response are all recorded."""
+        messages: list = []
+        builder = self._make_transcript_builder()
+        response = LLMLoopResponse(
+            response_text=None,
+            tool_calls=[
+                LLMToolCall(id="tc_1", name="tool_a", arguments="{}"),
+                LLMToolCall(id="tc_2", name="tool_b", arguments='{"x": 1}'),
+            ],
+            raw_response=None,
+            prompt_tokens=0,
+            completion_tokens=0,
+        )
+        tool_results = [
+            ToolCallResult(tool_call_id="tc_1", tool_name="tool_a", content="result_a"),
+            ToolCallResult(tool_call_id="tc_2", tool_name="tool_b", content="result_b"),
+        ]
+
+        _baseline_conversation_updater(
+            messages,
+            response,
+            tool_results=tool_results,
+            transcript_builder=builder,
+            model="test-model",
+        )
+
+        # 1 assistant + 2 tool results
+        assert len(messages) == 3
+        assert len(messages[0]["tool_calls"]) == 2
+        assert messages[1]["tool_call_id"] == "tc_1"
+        assert messages[2]["tool_call_id"] == "tc_2"
+
+    def test_invalid_tool_arguments_handled(self):
+        """Tool call with invalid JSON arguments: the arguments field is
+        stored as-is in the message, and orjson failure falls back to {}
+        in the transcript content_blocks."""
+        messages: list = []
+        builder = self._make_transcript_builder()
+        response = LLMLoopResponse(
+            response_text=None,
+            tool_calls=[
+                LLMToolCall(id="tc_1", name="tool_x", arguments="not-json"),
+            ],
+            raw_response=None,
+            prompt_tokens=0,
+            completion_tokens=0,
+        )
+        tool_results = [
+            ToolCallResult(tool_call_id="tc_1", tool_name="tool_x", content="ok"),
+        ]
+
+        _baseline_conversation_updater(
+            messages,
+            response,
+            tool_results=tool_results,
+            transcript_builder=builder,
+            model="test-model",
+        )
+
+        # Should not raise — invalid JSON falls back to {} in transcript
+        assert len(messages) == 2
+        assert messages[0]["tool_calls"][0]["function"]["arguments"] == "not-json"
+
+
+class TestCompressSessionMessagesPreservesToolCalls:
+    """``_compress_session_messages`` must round-trip tool_calls + tool_call_id.
+
+    Compression serialises ChatMessage to dict for ``compress_context`` and
+    reifies the result back to ChatMessage.  A regression that drops
+    ``tool_calls`` or ``tool_call_id`` would corrupt the OpenAI message
+    list and break downstream tool-execution rounds.
+    """
+
+    @pytest.mark.asyncio
+    async def test_compressed_output_keeps_tool_calls_and_ids(self):
+        # Simulate compression that returns a summary + the most recent
+        # assistant(tool_call) + tool(tool_result) intact.
+        summary = {"role": "system", "content": "prior turns: user asked X"}
+        assistant_with_tc = {
+            "role": "assistant",
+            "content": "calling tool",
+            "tool_calls": [
+                {
+                    "id": "tc_abc",
+                    "type": "function",
+                    "function": {"name": "search", "arguments": '{"q":"y"}'},
+                }
+            ],
+        }
+        tool_result = {
+            "role": "tool",
+            "tool_call_id": "tc_abc",
+            "content": "search result",
+        }
+
+        compress_result = CompressResult(
+            messages=[summary, assistant_with_tc, tool_result],
+            token_count=100,
+            was_compacted=True,
+            original_token_count=5000,
+            messages_summarized=10,
+            messages_dropped=0,
+        )
+
+        # Input: messages that should be compressed.
+        input_messages = [
+            ChatMessage(role="user", content="q1"),
+            ChatMessage(
+                role="assistant",
+                content="calling tool",
+                tool_calls=[
+                    {
+                        "id": "tc_abc",
+                        "type": "function",
+                        "function": {
+                            "name": "search",
+                            "arguments": '{"q":"y"}',
+                        },
+                    }
+                ],
+            ),
+            ChatMessage(
+                role="tool",
+                tool_call_id="tc_abc",
+                content="search result",
+            ),
+        ]
+
+        with patch(
+            "backend.copilot.baseline.service.compress_context",
+            new=AsyncMock(return_value=compress_result),
+        ):
+            compressed = await _compress_session_messages(
+                input_messages, model="openrouter/anthropic/claude-opus-4"
+            )
+
+        # Summary, assistant(tool_calls), tool(tool_call_id).
+        assert len(compressed) == 3
+        # Assistant message must keep its tool_calls intact.
+        assistant_msg = compressed[1]
+        assert assistant_msg.role == "assistant"
+        assert assistant_msg.tool_calls is not None
+        assert len(assistant_msg.tool_calls) == 1
+        assert assistant_msg.tool_calls[0]["id"] == "tc_abc"
+        assert assistant_msg.tool_calls[0]["function"]["name"] == "search"
+        # Tool-role message must keep tool_call_id for OpenAI linkage.
+        tool_msg = compressed[2]
+        assert tool_msg.role == "tool"
+        assert tool_msg.tool_call_id == "tc_abc"
+        assert tool_msg.content == "search result"
+
+    @pytest.mark.asyncio
+    async def test_uncompressed_passthrough_keeps_fields(self):
+        """When compression is a no-op (was_compacted=False), the original
+        messages must be returned unchanged — including tool_calls."""
+        input_messages = [
+            ChatMessage(
+                role="assistant",
+                content="c",
+                tool_calls=[
+                    {
+                        "id": "t1",
+                        "type": "function",
+                        "function": {"name": "f", "arguments": "{}"},
+                    }
+                ],
+            ),
+            ChatMessage(role="tool", tool_call_id="t1", content="ok"),
+        ]
+
+        noop_result = CompressResult(
+            messages=[],  # ignored when was_compacted=False
+            token_count=10,
+            was_compacted=False,
+        )
+
+        with patch(
+            "backend.copilot.baseline.service.compress_context",
+            new=AsyncMock(return_value=noop_result),
+        ):
+            out = await _compress_session_messages(
+                input_messages, model="openrouter/anthropic/claude-opus-4"
+            )
+
+        assert out is input_messages  # same list returned
+        assert out[0].tool_calls is not None
+        assert out[0].tool_calls[0]["id"] == "t1"
+        assert out[1].tool_call_id == "t1"
+
+
+# ---- _ThinkingStripper tests ---- #
+
+
+def test_thinking_stripper_basic_thinking_tag() -> None:
+    """<thinking>...</thinking> blocks are fully stripped."""
+    s = _ThinkingStripper()
+    assert s.process("<thinking>internal reasoning here</thinking>Hello!") == "Hello!"
+
+
+def test_thinking_stripper_internal_reasoning_tag() -> None:
+    """<internal_reasoning>...</internal_reasoning> blocks (Gemini) are stripped."""
+    s = _ThinkingStripper()
+    assert (
+        s.process("<internal_reasoning>step by step</internal_reasoning>Answer")
+        == "Answer"
+    )
+
+
+def test_thinking_stripper_split_across_chunks() -> None:
+    """Tags split across multiple chunks are handled correctly."""
+    s = _ThinkingStripper()
+    out = s.process("Hello <thin")
+    out += s.process("king>secret</thinking> world")
+    assert out == "Hello  world"
+
+
+def test_thinking_stripper_plain_text_preserved() -> None:
+    """Plain text with the word 'thinking' is not stripped."""
+    s = _ThinkingStripper()
+    assert (
+        s.process("I am thinking about this problem")
+        == "I am thinking about this problem"
+    )
+
+
+def test_thinking_stripper_multiple_blocks() -> None:
+    """Multiple reasoning blocks in one stream are all stripped."""
+    s = _ThinkingStripper()
+    result = s.process(
+        "A<thinking>x</thinking>B<internal_reasoning>y</internal_reasoning>C"
+    )
+    assert result == "ABC"
+
+
+def test_thinking_stripper_flush_discards_unclosed() -> None:
+    """Unclosed reasoning block is discarded on flush."""
+    s = _ThinkingStripper()
+    s.process("Start<thinking>never closed")
+    flushed = s.flush()
+    assert "never closed" not in flushed
+
+
+def test_thinking_stripper_empty_block() -> None:
+    """Empty reasoning blocks are handled gracefully."""
+    s = _ThinkingStripper()
+    assert s.process("Before<thinking></thinking>After") == "BeforeAfter"
+
+
+# ---- _filter_tools_by_permissions tests ---- #
+
+
+def _make_tool(name: str) -> ChatCompletionToolParam:
+    """Build a minimal OpenAI ChatCompletionToolParam."""
+    return ChatCompletionToolParam(
+        type="function",
+        function={"name": name, "parameters": {}},
+    )
+
+
+class TestFilterToolsByPermissions:
+    """Tests for _filter_tools_by_permissions."""
+
+    @patch(
+        "backend.copilot.permissions.all_known_tool_names",
+        return_value=frozenset({"run_block", "web_fetch", "bash_exec"}),
+    )
+    def test_empty_permissions_returns_all(self, _mock_names):
+        """Empty permissions (no filtering) returns every tool unchanged."""
+        from backend.copilot.baseline.service import _filter_tools_by_permissions
+        from backend.copilot.permissions import CopilotPermissions
+
+        tools = [_make_tool("run_block"), _make_tool("web_fetch")]
+        perms = CopilotPermissions()
+        result = _filter_tools_by_permissions(tools, perms)
+        assert result == tools
+
+    @patch(
+        "backend.copilot.permissions.all_known_tool_names",
+        return_value=frozenset({"run_block", "web_fetch", "bash_exec"}),
+    )
+    def test_allowlist_keeps_only_matching(self, _mock_names):
+        """Explicit allowlist (tools_exclude=False) keeps only listed tools."""
+        from backend.copilot.baseline.service import _filter_tools_by_permissions
+        from backend.copilot.permissions import CopilotPermissions
+
+        tools = [
+            _make_tool("run_block"),
+            _make_tool("web_fetch"),
+            _make_tool("bash_exec"),
+        ]
+        perms = CopilotPermissions(tools=["web_fetch"], tools_exclude=False)
+        result = _filter_tools_by_permissions(tools, perms)
+        assert len(result) == 1
+        assert result[0]["function"]["name"] == "web_fetch"
+
+    @patch(
+        "backend.copilot.permissions.all_known_tool_names",
+        return_value=frozenset({"run_block", "web_fetch", "bash_exec"}),
+    )
+    def test_blacklist_excludes_listed(self, _mock_names):
+        """Blacklist (tools_exclude=True) removes only the listed tools."""
+        from backend.copilot.baseline.service import _filter_tools_by_permissions
+        from backend.copilot.permissions import CopilotPermissions
+
+        tools = [
+            _make_tool("run_block"),
+            _make_tool("web_fetch"),
+            _make_tool("bash_exec"),
+        ]
+        perms = CopilotPermissions(tools=["bash_exec"], tools_exclude=True)
+        result = _filter_tools_by_permissions(tools, perms)
+        names = [t["function"]["name"] for t in result]
+        assert "bash_exec" not in names
+        assert "run_block" in names
+        assert "web_fetch" in names
+        assert len(result) == 2
+
+    @patch(
+        "backend.copilot.permissions.all_known_tool_names",
+        return_value=frozenset({"run_block", "web_fetch", "bash_exec"}),
+    )
+    def test_unknown_tool_name_filtered_out(self, _mock_names):
+        """A tool whose name is not in all_known_tool_names is dropped."""
+        from backend.copilot.baseline.service import _filter_tools_by_permissions
+        from backend.copilot.permissions import CopilotPermissions
+
+        tools = [_make_tool("run_block"), _make_tool("unknown_tool")]
+        perms = CopilotPermissions(tools=["run_block"], tools_exclude=False)
+        result = _filter_tools_by_permissions(tools, perms)
+        names = [t["function"]["name"] for t in result]
+        assert "unknown_tool" not in names
+        assert names == ["run_block"]
+
+
+# ---- _prepare_baseline_attachments tests ---- #
+
+
+class TestPrepareBaselineAttachments:
+    """Tests for _prepare_baseline_attachments."""
+
+    @pytest.mark.asyncio
+    async def test_empty_file_ids(self):
+        """Empty file_ids returns empty hint and blocks."""
+        from backend.copilot.baseline.service import _prepare_baseline_attachments
+
+        hint, blocks = await _prepare_baseline_attachments([], "user1", "sess1", "/tmp")
+        assert hint == ""
+        assert blocks == []
+
+    @pytest.mark.asyncio
+    async def test_empty_user_id(self):
+        """Empty user_id returns empty hint and blocks."""
+        from backend.copilot.baseline.service import _prepare_baseline_attachments
+
+        hint, blocks = await _prepare_baseline_attachments(
+            ["file1"], "", "sess1", "/tmp"
+        )
+        assert hint == ""
+        assert blocks == []
+
+    @pytest.mark.asyncio
+    async def test_image_file_returns_vision_blocks(self):
+        """A PNG image within size limits is returned as a base64 vision block."""
+        from backend.copilot.baseline.service import _prepare_baseline_attachments
+
+        fake_info = AsyncMock()
+        fake_info.name = "photo.png"
+        fake_info.mime_type = "image/png"
+        fake_info.size_bytes = 1024
+
+        fake_manager = AsyncMock()
+        fake_manager.get_file_info = AsyncMock(return_value=fake_info)
+        fake_manager.read_file_by_id = AsyncMock(return_value=b"\x89PNG_FAKE_DATA")
+
+        with patch(
+            "backend.copilot.baseline.service.get_workspace_manager",
+            new=AsyncMock(return_value=fake_manager),
+        ):
+            hint, blocks = await _prepare_baseline_attachments(
+                ["fid1"], "user1", "sess1", "/tmp/workdir"
+            )
+
+        assert len(blocks) == 1
+        assert blocks[0]["type"] == "image"
+        assert blocks[0]["source"]["media_type"] == "image/png"
+        assert blocks[0]["source"]["type"] == "base64"
+        assert "photo.png" in hint
+        assert "embedded as image" in hint
+
+    @pytest.mark.asyncio
+    async def test_non_image_file_saved_to_working_dir(self, tmp_path):
+        """A non-image file is written to working_dir."""
+        from backend.copilot.baseline.service import _prepare_baseline_attachments
+
+        fake_info = AsyncMock()
+        fake_info.name = "data.csv"
+        fake_info.mime_type = "text/csv"
+        fake_info.size_bytes = 42
+
+        fake_manager = AsyncMock()
+        fake_manager.get_file_info = AsyncMock(return_value=fake_info)
+        fake_manager.read_file_by_id = AsyncMock(return_value=b"col1,col2\na,b")
+
+        with patch(
+            "backend.copilot.baseline.service.get_workspace_manager",
+            new=AsyncMock(return_value=fake_manager),
+        ):
+            hint, blocks = await _prepare_baseline_attachments(
+                ["fid1"], "user1", "sess1", str(tmp_path)
+            )
+
+        assert blocks == []
+        assert "data.csv" in hint
+        assert "saved to" in hint
+        saved = tmp_path / "data.csv"
+        assert saved.exists()
+        assert saved.read_bytes() == b"col1,col2\na,b"
+
+    @pytest.mark.asyncio
+    async def test_file_not_found_skipped(self):
+        """When get_file_info returns None the file is silently skipped."""
+        from backend.copilot.baseline.service import _prepare_baseline_attachments
+
+        fake_manager = AsyncMock()
+        fake_manager.get_file_info = AsyncMock(return_value=None)
+
+        with patch(
+            "backend.copilot.baseline.service.get_workspace_manager",
+            new=AsyncMock(return_value=fake_manager),
+        ):
+            hint, blocks = await _prepare_baseline_attachments(
+                ["missing_id"], "user1", "sess1", "/tmp"
+            )
+
+        assert hint == ""
+        assert blocks == []
+
+    @pytest.mark.asyncio
+    async def test_workspace_manager_error(self):
+        """When get_workspace_manager raises, returns empty results."""
+        from backend.copilot.baseline.service import _prepare_baseline_attachments
+
+        with patch(
+            "backend.copilot.baseline.service.get_workspace_manager",
+            new=AsyncMock(side_effect=RuntimeError("connection failed")),
+        ):
+            hint, blocks = await _prepare_baseline_attachments(
+                ["fid1"], "user1", "sess1", "/tmp"
+            )
+
+        assert hint == ""
+        assert blocks == []
--- a/autogpt_platform/backend/backend/copilot/baseline/transcript_integration_test.py
+++ b/autogpt_platform/backend/backend/copilot/baseline/transcript_integration_test.py
@@ -0,0 +1,667 @@
+"""Integration tests for baseline transcript flow.
+
+Exercises the real helpers in ``baseline/service.py`` that download,
+validate, load, append to, backfill, and upload the transcript.
+Storage is mocked via ``download_transcript`` / ``upload_transcript``
+patches; no network access is required.
+"""
+
+import json as stdlib_json
+from unittest.mock import AsyncMock, patch
+
+import pytest
+
+from backend.copilot.baseline.service import (
+    _load_prior_transcript,
+    _record_turn_to_transcript,
+    _resolve_baseline_model,
+    _upload_final_transcript,
+    is_transcript_stale,
+    should_upload_transcript,
+)
+from backend.copilot.service import config
+from backend.copilot.transcript import (
+    STOP_REASON_END_TURN,
+    STOP_REASON_TOOL_USE,
+    TranscriptDownload,
+)
+from backend.copilot.transcript_builder import TranscriptBuilder
+from backend.util.tool_call_loop import LLMLoopResponse, LLMToolCall, ToolCallResult
+
+
+def _make_transcript_content(*roles: str) -> str:
+    """Build a minimal valid JSONL transcript from role names."""
+    lines = []
+    parent = ""
+    for i, role in enumerate(roles):
+        uid = f"uuid-{i}"
+        entry: dict = {
+            "type": role,
+            "uuid": uid,
+            "parentUuid": parent,
+            "message": {
+                "role": role,
+                "content": [{"type": "text", "text": f"{role} message {i}"}],
+            },
+        }
+        if role == "assistant":
+            entry["message"]["id"] = f"msg_{i}"
+            entry["message"]["model"] = "test-model"
+            entry["message"]["type"] = "message"
+            entry["message"]["stop_reason"] = STOP_REASON_END_TURN
+        lines.append(stdlib_json.dumps(entry))
+        parent = uid
+    return "\n".join(lines) + "\n"
+
+
+class TestResolveBaselineModel:
+    """Model selection honours the per-request mode."""
+
+    def test_fast_mode_selects_fast_model(self):
+        assert _resolve_baseline_model("fast") == config.fast_model
+
+    def test_extended_thinking_selects_default_model(self):
+        assert _resolve_baseline_model("extended_thinking") == config.model
+
+    def test_none_mode_selects_default_model(self):
+        """Critical: baseline users without a mode MUST keep the default (opus)."""
+        assert _resolve_baseline_model(None) == config.model
+
+    def test_default_and_fast_models_differ(self):
+        """Sanity: the two tiers are actually distinct in production config."""
+        assert config.model != config.fast_model
+
+
+class TestLoadPriorTranscript:
+    """``_load_prior_transcript`` wraps the download + validate + load flow."""
+
+    @pytest.mark.asyncio
+    async def test_loads_fresh_transcript(self):
+        builder = TranscriptBuilder()
+        content = _make_transcript_content("user", "assistant")
+        download = TranscriptDownload(content=content, message_count=2)
+
+        with patch(
+            "backend.copilot.baseline.service.download_transcript",
+            new=AsyncMock(return_value=download),
+        ):
+            covers = await _load_prior_transcript(
+                user_id="user-1",
+                session_id="session-1",
+                session_msg_count=3,
+                transcript_builder=builder,
+            )
+
+        assert covers is True
+        assert builder.entry_count == 2
+        assert builder.last_entry_type == "assistant"
+
+    @pytest.mark.asyncio
+    async def test_rejects_stale_transcript(self):
+        """message_count below session_msg_count - 1 is treated as stale."""
+        builder = TranscriptBuilder()
+        content = _make_transcript_content("user", "assistant")
+        # session has 6 messages, transcript only covers 2 → stale.
+        download = TranscriptDownload(content=content, message_count=2)
+
+        with patch(
+            "backend.copilot.baseline.service.download_transcript",
+            new=AsyncMock(return_value=download),
+        ):
+            covers = await _load_prior_transcript(
+                user_id="user-1",
+                session_id="session-1",
+                session_msg_count=6,
+                transcript_builder=builder,
+            )
+
+        assert covers is False
+        assert builder.is_empty
+
+    @pytest.mark.asyncio
+    async def test_missing_transcript_returns_false(self):
+        builder = TranscriptBuilder()
+        with patch(
+            "backend.copilot.baseline.service.download_transcript",
+            new=AsyncMock(return_value=None),
+        ):
+            covers = await _load_prior_transcript(
+                user_id="user-1",
+                session_id="session-1",
+                session_msg_count=2,
+                transcript_builder=builder,
+            )
+
+        assert covers is False
+        assert builder.is_empty
+
+    @pytest.mark.asyncio
+    async def test_invalid_transcript_returns_false(self):
+        builder = TranscriptBuilder()
+        download = TranscriptDownload(
+            content='{"type":"progress","uuid":"a"}\n',
+            message_count=1,
+        )
+        with patch(
+            "backend.copilot.baseline.service.download_transcript",
+            new=AsyncMock(return_value=download),
+        ):
+            covers = await _load_prior_transcript(
+                user_id="user-1",
+                session_id="session-1",
+                session_msg_count=2,
+                transcript_builder=builder,
+            )
+
+        assert covers is False
+        assert builder.is_empty
+
+    @pytest.mark.asyncio
+    async def test_download_exception_returns_false(self):
+        builder = TranscriptBuilder()
+        with patch(
+            "backend.copilot.baseline.service.download_transcript",
+            new=AsyncMock(side_effect=RuntimeError("boom")),
+        ):
+            covers = await _load_prior_transcript(
+                user_id="user-1",
+                session_id="session-1",
+                session_msg_count=2,
+                transcript_builder=builder,
+            )
+
+        assert covers is False
+        assert builder.is_empty
+
+    @pytest.mark.asyncio
+    async def test_zero_message_count_not_stale(self):
+        """When msg_count is 0 (unknown), staleness check is skipped."""
+        builder = TranscriptBuilder()
+        download = TranscriptDownload(
+            content=_make_transcript_content("user", "assistant"),
+            message_count=0,
+        )
+        with patch(
+            "backend.copilot.baseline.service.download_transcript",
+            new=AsyncMock(return_value=download),
+        ):
+            covers = await _load_prior_transcript(
+                user_id="user-1",
+                session_id="session-1",
+                session_msg_count=20,
+                transcript_builder=builder,
+            )
+
+        assert covers is True
+        assert builder.entry_count == 2
+
+
+class TestUploadFinalTranscript:
+    """``_upload_final_transcript`` serialises and calls storage."""
+
+    @pytest.mark.asyncio
+    async def test_uploads_valid_transcript(self):
+        builder = TranscriptBuilder()
+        builder.append_user(content="hi")
+        builder.append_assistant(
+            content_blocks=[{"type": "text", "text": "hello"}],
+            model="test-model",
+            stop_reason=STOP_REASON_END_TURN,
+        )
+
+        upload_mock = AsyncMock(return_value=None)
+        with patch(
+            "backend.copilot.baseline.service.upload_transcript",
+            new=upload_mock,
+        ):
+            await _upload_final_transcript(
+                user_id="user-1",
+                session_id="session-1",
+                transcript_builder=builder,
+                session_msg_count=2,
+            )
+
+        upload_mock.assert_awaited_once()
+        assert upload_mock.await_args is not None
+        call_kwargs = upload_mock.await_args.kwargs
+        assert call_kwargs["user_id"] == "user-1"
+        assert call_kwargs["session_id"] == "session-1"
+        assert call_kwargs["message_count"] == 2
+        assert "hello" in call_kwargs["content"]
+
+    @pytest.mark.asyncio
+    async def test_skips_upload_when_builder_empty(self):
+        builder = TranscriptBuilder()
+        upload_mock = AsyncMock(return_value=None)
+        with patch(
+            "backend.copilot.baseline.service.upload_transcript",
+            new=upload_mock,
+        ):
+            await _upload_final_transcript(
+                user_id="user-1",
+                session_id="session-1",
+                transcript_builder=builder,
+                session_msg_count=0,
+            )
+
+        upload_mock.assert_not_awaited()
+
+    @pytest.mark.asyncio
+    async def test_swallows_upload_exceptions(self):
+        """Upload failures should not propagate (flow continues for the user)."""
+        builder = TranscriptBuilder()
+        builder.append_user(content="hi")
+        builder.append_assistant(
+            content_blocks=[{"type": "text", "text": "hello"}],
+            model="test-model",
+            stop_reason=STOP_REASON_END_TURN,
+        )
+
+        with patch(
+            "backend.copilot.baseline.service.upload_transcript",
+            new=AsyncMock(side_effect=RuntimeError("storage unavailable")),
+        ):
+            # Should not raise.
+            await _upload_final_transcript(
+                user_id="user-1",
+                session_id="session-1",
+                transcript_builder=builder,
+                session_msg_count=2,
+            )
+
+
+class TestRecordTurnToTranscript:
+    """``_record_turn_to_transcript`` translates LLMLoopResponse → transcript."""
+
+    def test_records_final_assistant_text(self):
+        builder = TranscriptBuilder()
+        builder.append_user(content="hi")
+
+        response = LLMLoopResponse(
+            response_text="hello there",
+            tool_calls=[],
+            raw_response=None,
+        )
+        _record_turn_to_transcript(
+            response,
+            tool_results=None,
+            transcript_builder=builder,
+            model="test-model",
+        )
+
+        assert builder.entry_count == 2
+        assert builder.last_entry_type == "assistant"
+        jsonl = builder.to_jsonl()
+        assert "hello there" in jsonl
+        assert STOP_REASON_END_TURN in jsonl
+
+    def test_records_tool_use_then_tool_result(self):
+        """Anthropic ordering: assistant(tool_use) → user(tool_result)."""
+        builder = TranscriptBuilder()
+        builder.append_user(content="use a tool")
+
+        response = LLMLoopResponse(
+            response_text=None,
+            tool_calls=[
+                LLMToolCall(id="call-1", name="echo", arguments='{"text":"hi"}')
+            ],
+            raw_response=None,
+        )
+        tool_results = [
+            ToolCallResult(tool_call_id="call-1", tool_name="echo", content="hi")
+        ]
+        _record_turn_to_transcript(
+            response,
+            tool_results,
+            transcript_builder=builder,
+            model="test-model",
+        )
+
+        # user, assistant(tool_use), user(tool_result) = 3 entries
+        assert builder.entry_count == 3
+        jsonl = builder.to_jsonl()
+        assert STOP_REASON_TOOL_USE in jsonl
+        assert "tool_use" in jsonl
+        assert "tool_result" in jsonl
+        assert "call-1" in jsonl
+
+    def test_records_nothing_on_empty_response(self):
+        builder = TranscriptBuilder()
+        builder.append_user(content="hi")
+
+        response = LLMLoopResponse(
+            response_text=None,
+            tool_calls=[],
+            raw_response=None,
+        )
+        _record_turn_to_transcript(
+            response,
+            tool_results=None,
+            transcript_builder=builder,
+            model="test-model",
+        )
+
+        assert builder.entry_count == 1
+
+    def test_malformed_tool_args_dont_crash(self):
+        """Bad JSON in tool arguments falls back to {} without raising."""
+        builder = TranscriptBuilder()
+        builder.append_user(content="hi")
+
+        response = LLMLoopResponse(
+            response_text=None,
+            tool_calls=[LLMToolCall(id="call-1", name="echo", arguments="{not-json")],
+            raw_response=None,
+        )
+        tool_results = [
+            ToolCallResult(tool_call_id="call-1", tool_name="echo", content="ok")
+        ]
+        _record_turn_to_transcript(
+            response,
+            tool_results,
+            transcript_builder=builder,
+            model="test-model",
+        )
+
+        assert builder.entry_count == 3
+        jsonl = builder.to_jsonl()
+        assert '"input":{}' in jsonl
+
+
+class TestRoundTrip:
+    """End-to-end: load prior → append new turn → upload."""
+
+    @pytest.mark.asyncio
+    async def test_full_round_trip(self):
+        prior = _make_transcript_content("user", "assistant")
+        download = TranscriptDownload(content=prior, message_count=2)
+
+        builder = TranscriptBuilder()
+        with patch(
+            "backend.copilot.baseline.service.download_transcript",
+            new=AsyncMock(return_value=download),
+        ):
+            covers = await _load_prior_transcript(
+                user_id="user-1",
+                session_id="session-1",
+                session_msg_count=3,
+                transcript_builder=builder,
+            )
+        assert covers is True
+        assert builder.entry_count == 2
+
+        # New user turn.
+        builder.append_user(content="new question")
+        assert builder.entry_count == 3
+
+        # New assistant turn.
+        response = LLMLoopResponse(
+            response_text="new answer",
+            tool_calls=[],
+            raw_response=None,
+        )
+        _record_turn_to_transcript(
+            response,
+            tool_results=None,
+            transcript_builder=builder,
+            model="test-model",
+        )
+        assert builder.entry_count == 4
+
+        # Upload.
+        upload_mock = AsyncMock(return_value=None)
+        with patch(
+            "backend.copilot.baseline.service.upload_transcript",
+            new=upload_mock,
+        ):
+            await _upload_final_transcript(
+                user_id="user-1",
+                session_id="session-1",
+                transcript_builder=builder,
+                session_msg_count=4,
+            )
+
+        upload_mock.assert_awaited_once()
+        assert upload_mock.await_args is not None
+        uploaded = upload_mock.await_args.kwargs["content"]
+        assert "new question" in uploaded
+        assert "new answer" in uploaded
+        # Original content preserved in the round trip.
+        assert "user message 0" in uploaded
+        assert "assistant message 1" in uploaded
+
+    @pytest.mark.asyncio
+    async def test_backfill_append_guard(self):
+        """Backfill only runs when the last entry is not already assistant."""
+        builder = TranscriptBuilder()
+        builder.append_user(content="hi")
+
+        # Simulate the backfill guard from stream_chat_completion_baseline.
+        assistant_text = "partial text before error"
+        if builder.last_entry_type != "assistant":
+            builder.append_assistant(
+                content_blocks=[{"type": "text", "text": assistant_text}],
+                model="test-model",
+                stop_reason=STOP_REASON_END_TURN,
+            )
+
+        assert builder.last_entry_type == "assistant"
+        assert "partial text before error" in builder.to_jsonl()
+
+        # Second invocation: the guard must prevent double-append.
+        initial_count = builder.entry_count
+        if builder.last_entry_type != "assistant":
+            builder.append_assistant(
+                content_blocks=[{"type": "text", "text": "duplicate"}],
+                model="test-model",
+                stop_reason=STOP_REASON_END_TURN,
+            )
+        assert builder.entry_count == initial_count
+
+
+class TestIsTranscriptStale:
+    """``is_transcript_stale`` gates prior-transcript loading."""
+
+    def test_none_download_is_not_stale(self):
+        assert is_transcript_stale(None, session_msg_count=5) is False
+
+    def test_zero_message_count_is_not_stale(self):
+        """Legacy transcripts without msg_count tracking must remain usable."""
+        dl = TranscriptDownload(content="", message_count=0)
+        assert is_transcript_stale(dl, session_msg_count=20) is False
+
+    def test_stale_when_covers_less_than_prefix(self):
+        dl = TranscriptDownload(content="", message_count=2)
+        # session has 6 messages; transcript must cover at least 5 (6-1).
+        assert is_transcript_stale(dl, session_msg_count=6) is True
+
+    def test_fresh_when_covers_full_prefix(self):
+        dl = TranscriptDownload(content="", message_count=5)
+        assert is_transcript_stale(dl, session_msg_count=6) is False
+
+    def test_fresh_when_exceeds_prefix(self):
+        """Race: transcript ahead of session count is still acceptable."""
+        dl = TranscriptDownload(content="", message_count=10)
+        assert is_transcript_stale(dl, session_msg_count=6) is False
+
+    def test_boundary_equal_to_prefix_minus_one(self):
+        """Boundary: message_count == session_msg_count - 1 is still fresh."""
+        dl = TranscriptDownload(content="", message_count=5)
+        assert is_transcript_stale(dl, session_msg_count=6) is False
+
+
+class TestShouldUploadTranscript:
+    """``should_upload_transcript`` gates the final upload."""
+
+    def test_upload_allowed_for_user_with_coverage(self):
+        assert should_upload_transcript("user-1", True) is True
+
+    def test_upload_skipped_when_no_user(self):
+        assert should_upload_transcript(None, True) is False
+
+    def test_upload_skipped_when_empty_user(self):
+        assert should_upload_transcript("", True) is False
+
+    def test_upload_skipped_without_coverage(self):
+        """Partial transcript must never clobber a more complete stored one."""
+        assert should_upload_transcript("user-1", False) is False
+
+    def test_upload_skipped_when_no_user_and_no_coverage(self):
+        assert should_upload_transcript(None, False) is False
+
+
+class TestTranscriptLifecycle:
+    """End-to-end: download → validate → build → upload.
+
+    Simulates the full transcript lifecycle inside
+    ``stream_chat_completion_baseline`` by mocking the storage layer and
+    driving each step through the real helpers.
+    """
+
+    @pytest.mark.asyncio
+    async def test_full_lifecycle_happy_path(self):
+        """Fresh download, append a turn, upload covers the session."""
+        builder = TranscriptBuilder()
+        prior = _make_transcript_content("user", "assistant")
+        download = TranscriptDownload(content=prior, message_count=2)
+
+        upload_mock = AsyncMock(return_value=None)
+        with (
+            patch(
+                "backend.copilot.baseline.service.download_transcript",
+                new=AsyncMock(return_value=download),
+            ),
+            patch(
+                "backend.copilot.baseline.service.upload_transcript",
+                new=upload_mock,
+            ),
+        ):
+            # --- 1. Download & load prior transcript ---
+            covers = await _load_prior_transcript(
+                user_id="user-1",
+                session_id="session-1",
+                session_msg_count=3,
+                transcript_builder=builder,
+            )
+            assert covers is True
+
+            # --- 2. Append a new user turn + a new assistant response ---
+            builder.append_user(content="follow-up question")
+            _record_turn_to_transcript(
+                LLMLoopResponse(
+                    response_text="follow-up answer",
+                    tool_calls=[],
+                    raw_response=None,
+                ),
+                tool_results=None,
+                transcript_builder=builder,
+                model="test-model",
+            )
+
+            # --- 3. Gate + upload ---
+            assert (
+                should_upload_transcript(
+                    user_id="user-1", transcript_covers_prefix=covers
+                )
+                is True
+            )
+            await _upload_final_transcript(
+                user_id="user-1",
+                session_id="session-1",
+                transcript_builder=builder,
+                session_msg_count=4,
+            )
+
+        upload_mock.assert_awaited_once()
+        assert upload_mock.await_args is not None
+        uploaded = upload_mock.await_args.kwargs["content"]
+        assert "follow-up question" in uploaded
+        assert "follow-up answer" in uploaded
+        # Original prior-turn content preserved.
+        assert "user message 0" in uploaded
+        assert "assistant message 1" in uploaded
+
+    @pytest.mark.asyncio
+    async def test_lifecycle_stale_download_suppresses_upload(self):
+        """Stale download → covers=False → upload must be skipped."""
+        builder = TranscriptBuilder()
+        # session has 10 msgs but stored transcript only covers 2 → stale.
+        stale = TranscriptDownload(
+            content=_make_transcript_content("user", "assistant"),
+            message_count=2,
+        )
+
+        upload_mock = AsyncMock(return_value=None)
+        with (
+            patch(
+                "backend.copilot.baseline.service.download_transcript",
+                new=AsyncMock(return_value=stale),
+            ),
+            patch(
+                "backend.copilot.baseline.service.upload_transcript",
+                new=upload_mock,
+            ),
+        ):
+            covers = await _load_prior_transcript(
+                user_id="user-1",
+                session_id="session-1",
+                session_msg_count=10,
+                transcript_builder=builder,
+            )
+
+        assert covers is False
+        # The caller's gate mirrors the production path.
+        assert (
+            should_upload_transcript(user_id="user-1", transcript_covers_prefix=covers)
+            is False
+        )
+        upload_mock.assert_not_awaited()
+
+    @pytest.mark.asyncio
+    async def test_lifecycle_anonymous_user_skips_upload(self):
+        """Anonymous (user_id=None) → upload gate must return False."""
+        builder = TranscriptBuilder()
+        builder.append_user(content="hi")
+        builder.append_assistant(
+            content_blocks=[{"type": "text", "text": "hello"}],
+            model="test-model",
+            stop_reason=STOP_REASON_END_TURN,
+        )
+
+        assert (
+            should_upload_transcript(user_id=None, transcript_covers_prefix=True)
+            is False
+        )
+
+    @pytest.mark.asyncio
+    async def test_lifecycle_missing_download_skips_upload(self):
+        """No prior transcript → covers is False, so the upload gate
+        skips the upload."""
+        builder = TranscriptBuilder()
+        upload_mock = AsyncMock(return_value=None)
+        with (
+            patch(
+                "backend.copilot.baseline.service.download_transcript",
+                new=AsyncMock(return_value=None),
+            ),
+            patch(
+                "backend.copilot.baseline.service.upload_transcript",
+                new=upload_mock,
+            ),
+        ):
+            covers = await _load_prior_transcript(
+                user_id="user-1",
+                session_id="session-1",
+                session_msg_count=1,
+                transcript_builder=builder,
+            )
+            # No download: covers is False, so the production path would
+            # skip upload. This protects against overwriting a future
+            # more-complete transcript with a single-turn snapshot.
+            assert covers is False
+            assert (
+                should_upload_transcript(
+                    user_id="user-1", transcript_covers_prefix=covers
+                )
+                is False
+            )
+            upload_mock.assert_not_awaited()
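
The tests above pin down the staleness and upload gates fairly tightly. A minimal sketch consistent with those assertions might look like the following. It is inferred from the test expectations only, not taken from `baseline/service.py`, and the `TranscriptDownload` dataclass here is a hypothetical stand-in for the real class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TranscriptDownload:
    """Hypothetical stand-in for backend.copilot.transcript.TranscriptDownload."""

    content: str
    message_count: int = 0


def is_transcript_stale(
    download: Optional[TranscriptDownload], session_msg_count: int
) -> bool:
    """Return True when the stored transcript lags behind the session."""
    if download is None:
        # Nothing was downloaded, so there is nothing to call stale.
        return False
    if download.message_count == 0:
        # Assumed meaning: 0 is "unknown" (legacy transcripts), so skip the check.
        return False
    # The transcript must cover everything except the newest message.
    return download.message_count < session_msg_count - 1


def should_upload_transcript(
    user_id: Optional[str], transcript_covers_prefix: bool
) -> bool:
    """Upload only for a known user whose transcript covers the prefix."""
    return bool(user_id) and transcript_covers_prefix
```

Under these assumptions, a transcript covering 5 of 6 session messages is fresh, one covering 2 of 6 is stale, and an anonymous or empty user never uploads.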
--- a/autogpt_platform/backend/backend/copilot/config.py
+++ b/autogpt_platform/backend/backend/copilot/config.py
@@ -8,13 +8,26 @@ from pydantic_settings import BaseSettings

 from backend.util.clients import OPENROUTER_BASE_URL

+# Per-request routing mode for a single chat turn.
+# - 'fast': route to the baseline OpenAI-compatible path with the cheaper model.
+# - 'extended_thinking': route to the Claude Agent SDK path with the default
+#   (opus) model.
+# ``None`` means "no override"; the server falls back to the Claude Code
+# subscription flag → LaunchDarkly COPILOT_SDK → config.use_claude_agent_sdk.
+CopilotMode = Literal["fast", "extended_thinking"]
+

 class ChatConfig(BaseSettings):
    """Configuration for the chat system."""

    # OpenAI API Configuration
    model: str = Field(
-        default="anthropic/claude-opus-4.6", description="Default model to use"
+        default="anthropic/claude-opus-4.6",
+        description="Default model for extended thinking mode",
+    )
+    fast_model: str = Field(
+        default="anthropic/claude-sonnet-4",
+        description="Model for fast mode (baseline path). Should be faster/cheaper than the default model.",
    )
    title_model: str = Field(
        default="openai/gpt-4o-mini",
@@ -81,11 +94,11 @@ class ChatConfig(BaseSettings):
    # allows ~70-100 turns/day.
    # Checked at the HTTP layer (routes.py) before each turn.
    #
-    # TODO: These are deploy-time constants applied identically to every user.
-    #  If per-user or per-plan limits are needed (e.g., free tier vs paid), these
-    #  must move to the database (e.g., a UserPlan table) and get_usage_status /
-    #  check_rate_limit would look up each user's specific limits instead of
-    #  reading config.daily_token_limit / config.weekly_token_limit.
+    # These are base limits for the FREE tier. Higher tiers (PRO, BUSINESS,
+    # ENTERPRISE) multiply these by their tier multiplier (see
+    # rate_limit.TIER_MULTIPLIERS). User tier is stored in the
+    # User.subscriptionTier DB column and resolved inside
+    # get_global_rate_limits().
    daily_token_limit: int = Field(
        default=2_500_000,
        description="Max tokens per day, resets at midnight UTC (0 = unlimited)",
@@ -133,6 +146,32 @@ class ChatConfig(BaseSettings):
        description="Use --resume for multi-turn conversations instead of "
        "history compression. Falls back to compression when unavailable.",
    )
+    claude_agent_fallback_model: str = Field(
+        default="claude-sonnet-4-20250514",
+        description="Fallback model when the primary model is unavailable (e.g. 529 "
+        "overloaded). The SDK automatically retries with this cheaper model.",
+    )
+    claude_agent_max_turns: int = Field(
+        default=1000,
+        ge=1,
+        le=10000,
+        description="Maximum number of agentic turns (tool-use loops) per query. "
+        "Prevents runaway tool loops from burning budget.",
+    )
+    claude_agent_max_budget_usd: float = Field(
+        default=100.0,
+        ge=0.01,
+        le=1000.0,
+        description="Maximum spend in USD per SDK query. The CLI aborts the "
+        "request if this budget is exceeded.",
+    )
+    claude_agent_max_transient_retries: int = Field(
+        default=3,
+        ge=0,
+        le=10,
+        description="Maximum number of retries for transient API errors "
+        "(429, 5xx, ECONNRESET) before surfacing the error to the user.",
+    )
    use_openrouter: bool = Field(
        default=True,
        description="Enable routing API calls through the OpenRouter proxy. "
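
The two model fields and `CopilotMode` suggest a simple per-request selection rule, which `TestResolveBaselineModel` exercises. A rough sketch of that rule, assuming a hypothetical `ChatConfigSketch` in place of the real `ChatConfig` and a hypothetical `resolve_baseline_model` in place of the private helper:

```python
from dataclasses import dataclass
from typing import Literal, Optional

CopilotMode = Literal["fast", "extended_thinking"]


@dataclass
class ChatConfigSketch:
    """Hypothetical stand-in for ChatConfig; defaults copied from the diff."""

    model: str = "anthropic/claude-opus-4.6"
    fast_model: str = "anthropic/claude-sonnet-4"


def resolve_baseline_model(
    mode: Optional[CopilotMode], config: ChatConfigSketch
) -> str:
    """Pick the cheaper model only when the request explicitly asks for 'fast'."""
    if mode == "fast":
        return config.fast_model
    # 'extended_thinking' and None both keep the default model.
    return config.model
```

Treating `None` the same as `extended_thinking` keeps existing callers, which send no mode, on the default model.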
--- a/autogpt_platform/backend/backend/copilot/constants.py
+++ b/autogpt_platform/backend/backend/copilot/constants.py
@@ -44,12 +44,31 @@ def parse_node_id_from_exec_id(node_exec_id: str) -> str:
 # Transient Anthropic API error detection
 # ---------------------------------------------------------------------------
 # Patterns in error text that indicate a transient Anthropic API error
-# (ECONNRESET / dropped TCP connection) which is retryable.
+# which is retryable.  Covers:
+#   - Connection-level: ECONNRESET, dropped TCP connections
+#   - HTTP 429: rate-limit / too-many-requests
+#   - HTTP 5xx: server errors
+#
+# Prefer specific status-code patterns over natural-language phrases
+# (e.g. "overloaded", "bad gateway") — those phrases can appear in
+# application-level SDK messages and would trigger spurious retries.
 _TRANSIENT_ERROR_PATTERNS = (
+    # Connection-level
    "socket connection was closed unexpectedly",
    "ECONNRESET",
    "connection was forcibly closed",
    "network socket disconnected",
+    # 429 rate-limit patterns
+    "rate limit",
+    "rate_limit",
+    "too many requests",
+    "status code 429",
+    # 5xx server error patterns (status-code-specific to avoid false positives)
+    "status code 529",
+    "status code 500",
+    "status code 502",
+    "status code 503",
+    "status code 504",
 )

 FRIENDLY_TRANSIENT_MSG = "Anthropic connection interrupted — please retry"
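
How the pattern tuple is consumed is not shown in this hunk. One plausible shape, sketched under assumptions: substring matching is case-insensitive, and retries use an exponential delay with a ceiling. The helper names `is_transient_error` and `backoff_delay`, the shortened pattern list, and the `base` and `cap` values are all hypothetical.

```python
# Abbreviated copy of the pattern tuple, for illustration only.
TRANSIENT_PATTERNS = (
    "ECONNRESET",
    "rate limit",
    "too many requests",
    "status code 429",
    "status code 529",
    "status code 503",
)


def is_transient_error(error_text: str) -> bool:
    """Return True if the error text contains any transient pattern.

    Case-insensitive matching is an assumption, not confirmed by the diff.
    """
    lowered = error_text.lower()
    return any(pattern.lower() in lowered for pattern in TRANSIENT_PATTERNS)


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff in seconds for a zero-based attempt, capped at `cap`."""
    return min(cap, base * (2**attempt))
```

With `claude_agent_max_transient_retries` at its default of 3, this sketch would wait 1, 2, and 4 seconds before surfacing the error. The cap only matters at higher retry counts.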
--- a/autogpt_platform/backend/backend/copilot/db.py
+++ b/autogpt_platform/backend/backend/copilot/db.py
@@ -14,6 +14,7 @@ from prisma.types import (
    ChatSessionUpdateInput,
    ChatSessionWhereInput,
 )
+from pydantic import BaseModel

 from backend.data import db
 from backend.util.json import SafeJson, sanitize_string
@@ -23,12 +24,22 @@ from .model import (
    ChatSession,
    ChatSessionInfo,
    ChatSessionMetadata,
-    invalidate_session_cache,
+    cache_chat_session,
 )
+from .model import get_chat_session as get_chat_session_cached

 logger = logging.getLogger(__name__)


+class PaginatedMessages(BaseModel):
+    """Result of a paginated message query."""
+
+    messages: list[ChatMessage]
+    has_more: bool
+    oldest_sequence: int | None
+    session: ChatSessionInfo
+
+
 async def get_chat_session(session_id: str) -> ChatSession | None:
    """Get a chat session by ID from the database."""
    session = await PrismaChatSession.prisma().find_unique(
@@ -38,6 +49,116 @@ async def get_chat_session(session_id: str) -> ChatSession | None:
    return ChatSession.from_db(session) if session else None


+async def get_chat_session_metadata(session_id: str) -> ChatSessionInfo | None:
+    """Get chat session metadata (without messages) for ownership validation."""
+    session = await PrismaChatSession.prisma().find_unique(
+        where={"id": session_id},
+    )
+    return ChatSessionInfo.from_db(session) if session else None
+
+
+async def get_chat_messages_paginated(
+    session_id: str,
+    limit: int = 50,
+    before_sequence: int | None = None,
+    user_id: str | None = None,
+) -> PaginatedMessages | None:
+    """Get a page of a session's most recent messages, in ascending order.
+
+    Verifies session existence (and ownership when ``user_id`` is provided)
+    in the same query that fetches the messages.  Returns ``None`` when the
+    session is not found or does not belong to the user.
+
+    Args:
+        session_id: The chat session ID.
+        limit: Max messages to return.
+        before_sequence: Cursor — return messages with sequence < this value.
+        user_id: If provided, filters via ``Session.userId`` so only the
+            session owner's messages are returned (acts as an ownership guard).
+    """
+    # Build session-existence / ownership check
+    session_where: ChatSessionWhereInput = {"id": session_id}
+    if user_id is not None:
+        session_where["userId"] = user_id
+
+    # Build message include — fetch paginated messages in the same query
+    msg_include: dict[str, Any] = {
+        "order_by": {"sequence": "desc"},
+        "take": limit + 1,
+    }
+    if before_sequence is not None:
+        msg_include["where"] = {"sequence": {"lt": before_sequence}}
+
+    # Single query: session existence/ownership + paginated messages
+    session = await PrismaChatSession.prisma().find_first(
+        where=session_where,
+        include={"Messages": msg_include},
+    )
+
+    if session is None:
+        return None
+
+    session_info = ChatSessionInfo.from_db(session)
+    results = list(session.Messages) if session.Messages else []
+
+    has_more = len(results) > limit
+    results = results[:limit]
+
+    # Reverse to ascending order
+    results.reverse()
+
+    # Tool-call boundary fix: if the oldest message is a tool message,
+    # expand backward to include the preceding assistant message that
+    # owns the tool_calls, so convertChatSessionMessagesToUiMessages
+    # can pair them correctly.
+    _BOUNDARY_SCAN_LIMIT = 10
+    if results and results[0].role == "tool":
+        boundary_where: dict[str, Any] = {
+            "sessionId": session_id,
+            "sequence": {"lt": results[0].sequence},
+        }
+        if user_id is not None:
+            boundary_where["Session"] = {"is": {"userId": user_id}}
+        extra = await PrismaChatMessage.prisma().find_many(
+            where=boundary_where,
+            order={"sequence": "desc"},
+            take=_BOUNDARY_SCAN_LIMIT,
+        )
+        # Walk back to the first non-tool message (expected to be the assistant)
+        boundary_msgs = []
+        found_owner = False
+        for msg in extra:
+            boundary_msgs.append(msg)
+            if msg.role != "tool":
+                found_owner = True
+                break
+        boundary_msgs.reverse()
+        if not found_owner:
+            logger.warning(
+                "Boundary expansion did not find owning assistant message "
+                "for session=%s before sequence=%s (%d msgs scanned)",
+                session_id,
+                results[0].sequence,
+                len(extra),
+            )
+        if boundary_msgs:
+            results = boundary_msgs + results
+            # Only mark has_more if the expanded boundary isn't the
+            # very start of the conversation (sequence 0).
+            if boundary_msgs[0].sequence > 0:
+                has_more = True
+
+    messages = [ChatMessage.from_db(m) for m in results]
+    oldest_sequence = messages[0].sequence if messages else None
+
+    return PaginatedMessages(
+        messages=messages,
+        has_more=has_more,
+        oldest_sequence=oldest_sequence,
+        session=session_info,
+    )
+
+
 async def create_chat_session(
    session_id: str,
    user_id: str,
@@ -380,8 +501,11 @@ async def update_tool_message_content(
 async def set_turn_duration(session_id: str, duration_ms: int) -> None:
    """Set durationMs on the last assistant message in a session.

-    Also invalidates the Redis session cache so the next GET returns
-    the updated duration.
+    Updates the Redis cache in-place instead of invalidating it.
+    Invalidation would delete the key, creating a window where concurrent
+    ``get_chat_session`` calls re-populate the cache from DB — potentially
+    with stale data if the DB write from the previous turn hasn't propagated.
+    This race caused duplicate user messages on the next turn.
    """
    last_msg = await PrismaChatMessage.prisma().find_first(
        where={"sessionId": session_id, "role": "assistant"},
@@ -392,5 +516,13 @@ async def set_turn_duration(session_id: str, duration_ms: int) -> None:
            where={"id": last_msg.id},
            data={"durationMs": duration_ms},
        )
-        # Invalidate cache so the session is re-fetched from DB with durationMs
-        await invalidate_session_cache(session_id)
+        # Update cache in-place rather than invalidating to avoid a
+        # race window where the empty cache gets re-populated with
+        # stale data by a concurrent get_chat_session call.
+        session = await get_chat_session_cached(session_id)
+        if session and session.messages:
+            for msg in reversed(session.messages):
+                if msg.role == "assistant":
+                    msg.duration_ms = duration_ms
+                    break
+            await cache_chat_session(session)
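
Reviewer note: the pagination logic added to `db.py` above (fetch `limit + 1` rows newest first, trim, reverse, then expand backward when the page starts with a tool message) can be sketched without a database. This is only an illustrative in-memory model, not the PR's implementation: `Msg`, `paginate`, and `history` are hypothetical names, and the Prisma query is replaced by a sorted list.

```python
from dataclasses import dataclass


@dataclass
class Msg:
    sequence: int
    role: str


def paginate(all_msgs, limit, before_sequence=None, scan_limit=10):
    """In-memory stand-in for the paginated query (hypothetical helper)."""
    # Newest first, optionally restricted to sequence < cursor.
    pool = sorted(
        (m for m in all_msgs
         if before_sequence is None or m.sequence < before_sequence),
        key=lambda m: m.sequence,
        reverse=True,
    )
    # Fetch one extra row to learn whether an older page exists.
    rows = pool[: limit + 1]
    has_more = len(rows) > limit
    page = rows[:limit]
    page.reverse()  # ascending order for display

    # Boundary expansion: avoid starting a page with an orphaned tool message.
    if page and page[0].role == "tool":
        older = [m for m in pool if m.sequence < page[0].sequence][:scan_limit]
        extra = []
        for m in older:
            extra.append(m)
            if m.role != "tool":
                break  # reached the message that owns the tool calls
        extra.reverse()
        if extra:
            page = extra + page
            if extra[0].sequence > 0:
                has_more = True

    oldest = page[0].sequence if page else None
    return page, has_more, oldest


history = [
    Msg(0, "user"),
    Msg(1, "assistant"),
    Msg(2, "tool"),
    Msg(3, "tool"),
    Msg(4, "assistant"),
]
page, has_more, oldest = paginate(history, limit=2)
next_page, next_has_more, next_oldest = paginate(
    history, limit=2, before_sequence=oldest
)
```

With `limit=2` the first page would start at the tool message with sequence 3, so it is widened back to the assistant message at sequence 1; the caller then passes `oldest` as the next `before_sequence` cursor.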
--- a/autogpt_platform/backend/backend/copilot/db_test.py
+++ b/autogpt_platform/backend/backend/copilot/db_test.py
@@ -0,0 +1,388 @@
+"""Unit tests for copilot.db — paginated message queries."""
+
+from __future__ import annotations
+
+from datetime import UTC, datetime
+from typing import Any
+from unittest.mock import AsyncMock, patch
+
+import pytest
+from prisma.models import ChatMessage as PrismaChatMessage
+from prisma.models import ChatSession as PrismaChatSession
+
+from backend.copilot.db import (
+    PaginatedMessages,
+    get_chat_messages_paginated,
+    set_turn_duration,
+)
+from backend.copilot.model import ChatMessage as CopilotChatMessage
+from backend.copilot.model import ChatSession, get_chat_session, upsert_chat_session
+
+
+def _make_msg(
+    sequence: int,
+    role: str = "assistant",
+    content: str | None = "hello",
+    tool_calls: Any = None,
+) -> PrismaChatMessage:
+    """Build a minimal PrismaChatMessage for testing."""
+    return PrismaChatMessage(
+        id=f"msg-{sequence}",
+        createdAt=datetime.now(UTC),
+        sessionId="sess-1",
+        role=role,
+        content=content,
+        sequence=sequence,
+        toolCalls=tool_calls,
+        name=None,
+        toolCallId=None,
+        refusal=None,
+        functionCall=None,
+    )
+
+
+def _make_session(
+    session_id: str = "sess-1",
+    user_id: str = "user-1",
+    messages: list[PrismaChatMessage] | None = None,
+) -> PrismaChatSession:
+    """Build a minimal PrismaChatSession for testing."""
+    now = datetime.now(UTC)
+    session = PrismaChatSession.model_construct(
+        id=session_id,
+        createdAt=now,
+        updatedAt=now,
+        userId=user_id,
+        credentials={},
+        successfulAgentRuns={},
+        successfulAgentSchedules={},
+        totalPromptTokens=0,
+        totalCompletionTokens=0,
+        title=None,
+        metadata={},
+        Messages=messages or [],
+    )
+    return session
+
+
+SESSION_ID = "sess-1"
+
+
+@pytest.fixture()
+def mock_db():
+    """Patch ChatSession.prisma().find_first and ChatMessage.prisma().find_many.
+
+    find_first is used for the main query (session + included messages).
+    find_many is used only for boundary expansion queries.
+    """
+    with (
+        patch.object(PrismaChatSession, "prisma") as mock_session_prisma,
+        patch.object(PrismaChatMessage, "prisma") as mock_msg_prisma,
+    ):
+        find_first = AsyncMock()
+        mock_session_prisma.return_value.find_first = find_first
+
+        find_many = AsyncMock(return_value=[])
+        mock_msg_prisma.return_value.find_many = find_many
+
+        yield find_first, find_many
+
+
+# ---------- Basic pagination ----------
+
+
+@pytest.mark.asyncio
+async def test_basic_page_returns_messages_ascending(
+    mock_db: tuple[AsyncMock, AsyncMock],
+):
+    """Messages are returned in ascending sequence order."""
+    find_first, _ = mock_db
+    find_first.return_value = _make_session(
+        messages=[_make_msg(3), _make_msg(2), _make_msg(1)],
+    )
+
+    page = await get_chat_messages_paginated(SESSION_ID, limit=5)
+
+    assert isinstance(page, PaginatedMessages)
+    assert [m.sequence for m in page.messages] == [1, 2, 3]
+    assert page.has_more is False
+    assert page.oldest_sequence == 1
+
+
+@pytest.mark.asyncio
+async def test_has_more_when_results_exceed_limit(
+    mock_db: tuple[AsyncMock, AsyncMock],
+):
+    """has_more is True when DB returns more than limit items."""
+    find_first, _ = mock_db
+    find_first.return_value = _make_session(
+        messages=[_make_msg(3), _make_msg(2), _make_msg(1)],
+    )
+
+    page = await get_chat_messages_paginated(SESSION_ID, limit=2)
+
+    assert page is not None
+    assert page.has_more is True
+    assert len(page.messages) == 2
+    assert [m.sequence for m in page.messages] == [2, 3]
+
+
+@pytest.mark.asyncio
+async def test_empty_session_returns_no_messages(
+    mock_db: tuple[AsyncMock, AsyncMock],
+):
+    find_first, _ = mock_db
+    find_first.return_value = _make_session(messages=[])
+
+    page = await get_chat_messages_paginated(SESSION_ID, limit=50)
+
+    assert page is not None
+    assert page.messages == []
+    assert page.has_more is False
+    assert page.oldest_sequence is None
+
+
+@pytest.mark.asyncio
+async def test_before_sequence_filters_correctly(
+    mock_db: tuple[AsyncMock, AsyncMock],
+):
+    """before_sequence is passed as a where filter inside the Messages include."""
+    find_first, _ = mock_db
+    find_first.return_value = _make_session(
+        messages=[_make_msg(2), _make_msg(1)],
+    )
+
+    await get_chat_messages_paginated(SESSION_ID, limit=50, before_sequence=5)
+
+    call_kwargs = find_first.call_args
+    include = call_kwargs.kwargs.get("include") or call_kwargs[1].get("include")
+    assert include["Messages"]["where"] == {"sequence": {"lt": 5}}
+
+
+@pytest.mark.asyncio
+async def test_no_where_on_messages_without_before_sequence(
+    mock_db: tuple[AsyncMock, AsyncMock],
+):
+    """Without before_sequence, the Messages include has no where clause."""
+    find_first, _ = mock_db
+    find_first.return_value = _make_session(messages=[_make_msg(1)])
+
+    await get_chat_messages_paginated(SESSION_ID, limit=50)
+
+    call_kwargs = find_first.call_args
+    include = call_kwargs.kwargs.get("include") or call_kwargs[1].get("include")
+    assert "where" not in include["Messages"]
+
+
+@pytest.mark.asyncio
+async def test_user_id_filter_applied_to_session_where(
+    mock_db: tuple[AsyncMock, AsyncMock],
+):
+    """user_id adds a userId filter to the session-level where clause."""
+    find_first, _ = mock_db
+    find_first.return_value = _make_session(messages=[_make_msg(1)])
+
+    await get_chat_messages_paginated(SESSION_ID, limit=50, user_id="user-abc")
+
+    call_kwargs = find_first.call_args
+    where = call_kwargs.kwargs.get("where") or call_kwargs[1].get("where")
+    assert where["userId"] == "user-abc"
+
+
+@pytest.mark.asyncio
+async def test_session_not_found_returns_none(
+    mock_db: tuple[AsyncMock, AsyncMock],
+):
+    """Returns None when session doesn't exist or user doesn't own it."""
+    find_first, _ = mock_db
+    find_first.return_value = None
+
+    page = await get_chat_messages_paginated(SESSION_ID, limit=50)
+
+    assert page is None
+
+
+@pytest.mark.asyncio
+async def test_session_info_included_in_result(
+    mock_db: tuple[AsyncMock, AsyncMock],
+):
+    """PaginatedMessages includes session metadata."""
+    find_first, _ = mock_db
+    find_first.return_value = _make_session(messages=[_make_msg(1)])
+
+    page = await get_chat_messages_paginated(SESSION_ID, limit=50)
+
+    assert page is not None
+    assert page.session.session_id == SESSION_ID
+
+
+# ---------- Backward boundary expansion ----------
+
+
+@pytest.mark.asyncio
+async def test_boundary_expansion_includes_assistant(
+    mock_db: tuple[AsyncMock, AsyncMock],
+):
+    """When page starts with a tool message, expand backward to include
+    the owning assistant message."""
+    find_first, find_many = mock_db
+    find_first.return_value = _make_session(
+        messages=[_make_msg(5, role="tool"), _make_msg(4, role="tool")],
+    )
+    find_many.return_value = [_make_msg(3, role="assistant")]
+
+    page = await get_chat_messages_paginated(SESSION_ID, limit=5)
+
+    assert page is not None
+    assert [m.sequence for m in page.messages] == [3, 4, 5]
+    assert page.messages[0].role == "assistant"
+    assert page.oldest_sequence == 3
+
+
+@pytest.mark.asyncio
+async def test_boundary_expansion_includes_multiple_tool_msgs(
+    mock_db: tuple[AsyncMock, AsyncMock],
+):
+    """Boundary expansion scans past consecutive tool messages to find
+    the owning assistant."""
+    find_first, find_many = mock_db
+    find_first.return_value = _make_session(
+        messages=[_make_msg(7, role="tool")],
+    )
+    find_many.return_value = [
+        _make_msg(6, role="tool"),
+        _make_msg(5, role="tool"),
+        _make_msg(4, role="assistant"),
+    ]
+
+    page = await get_chat_messages_paginated(SESSION_ID, limit=5)
+
+    assert page is not None
+    assert [m.sequence for m in page.messages] == [4, 5, 6, 7]
+    assert page.messages[0].role == "assistant"
+
+
+@pytest.mark.asyncio
+async def test_boundary_expansion_sets_has_more_when_not_at_start(
+    mock_db: tuple[AsyncMock, AsyncMock],
+):
+    """After boundary expansion, has_more=True if expanded msgs aren't at seq 0."""
+    find_first, find_many = mock_db
+    find_first.return_value = _make_session(
+        messages=[_make_msg(3, role="tool")],
+    )
+    find_many.return_value = [_make_msg(2, role="assistant")]
+
+    page = await get_chat_messages_paginated(SESSION_ID, limit=5)
+
+    assert page is not None
+    assert page.has_more is True
+
+
+@pytest.mark.asyncio
+async def test_boundary_expansion_no_has_more_at_conversation_start(
+    mock_db: tuple[AsyncMock, AsyncMock],
+):
+    """has_more stays False when boundary expansion reaches seq 0."""
+    find_first, find_many = mock_db
+    find_first.return_value = _make_session(
+        messages=[_make_msg(1, role="tool")],
+    )
+    find_many.return_value = [_make_msg(0, role="assistant")]
+
+    page = await get_chat_messages_paginated(SESSION_ID, limit=5)
+
+    assert page is not None
+    assert page.has_more is False
+    assert page.oldest_sequence == 0
+
+
+@pytest.mark.asyncio
+async def test_no_boundary_expansion_when_first_msg_not_tool(
+    mock_db: tuple[AsyncMock, AsyncMock],
+):
+    """No boundary expansion when the first message is not a tool message."""
+    find_first, find_many = mock_db
+    find_first.return_value = _make_session(
+        messages=[_make_msg(3, role="user"), _make_msg(2, role="assistant")],
+    )
+
+    page = await get_chat_messages_paginated(SESSION_ID, limit=5)
+
+    assert page is not None
+    assert find_many.call_count == 0
+    assert [m.sequence for m in page.messages] == [2, 3]
+
+
+@pytest.mark.asyncio
+async def test_boundary_expansion_warns_when_no_owner_found(
+    mock_db: tuple[AsyncMock, AsyncMock],
+):
+    """When boundary scan doesn't find a non-tool message, a warning is logged
+    and the boundary messages are still included."""
+    find_first, find_many = mock_db
+    find_first.return_value = _make_session(
+        messages=[_make_msg(10, role="tool")],
+    )
+    find_many.return_value = [_make_msg(i, role="tool") for i in range(9, -1, -1)]
+
+    with patch("backend.copilot.db.logger") as mock_logger:
+        page = await get_chat_messages_paginated(SESSION_ID, limit=5)
+        mock_logger.warning.assert_called_once()
+
+    assert page is not None
+    assert page.messages[0].role == "tool"
+    assert len(page.messages) > 1
+
+
+# ---------- Turn duration (integration tests) ----------
+
+
+@pytest.mark.asyncio(loop_scope="session")
+async def test_set_turn_duration_updates_cache_in_place(setup_test_user, test_user_id):
+    """set_turn_duration patches the cached session without invalidation.
+
+    Verifies that after calling set_turn_duration the Redis-cached session
+    reflects the updated durationMs on the last assistant message, without
+    the cache having been deleted and re-populated (which could race with
+    concurrent get_chat_session calls).
+    """
+    session = ChatSession.new(user_id=test_user_id, dry_run=False)
+    session.messages = [
+        CopilotChatMessage(role="user", content="hello"),
+        CopilotChatMessage(role="assistant", content="hi there"),
+    ]
+    session = await upsert_chat_session(session)
+
+    # Ensure the session is in cache
+    cached = await get_chat_session(session.session_id, test_user_id)
+    assert cached is not None
+    assert cached.messages[-1].duration_ms is None
+
+    # Update turn duration — should patch cache in-place
+    await set_turn_duration(session.session_id, 1234)
+
+    # Read from cache (not DB) — the cache should already have the update
+    updated = await get_chat_session(session.session_id, test_user_id)
+    assert updated is not None
+    assistant_msgs = [m for m in updated.messages if m.role == "assistant"]
+    assert len(assistant_msgs) == 1
+    assert assistant_msgs[0].duration_ms == 1234
+
+
+@pytest.mark.asyncio(loop_scope="session")
+async def test_set_turn_duration_no_assistant_message(setup_test_user, test_user_id):
+    """set_turn_duration is a no-op when there are no assistant messages."""
+    session = ChatSession.new(user_id=test_user_id, dry_run=False)
+    session.messages = [
+        CopilotChatMessage(role="user", content="hello"),
+    ]
+    session = await upsert_chat_session(session)
+
+    # Should not raise
+    await set_turn_duration(session.session_id, 5678)
+
+    cached = await get_chat_session(session.session_id, test_user_id)
+    assert cached is not None
+    # User message should not have durationMs
+    assert cached.messages[0].duration_ms is None
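
Reviewer note: the two turn-duration tests above exercise the "patch the cache in place instead of invalidating it" behaviour. A minimal sketch of that idea follows, assuming a plain dict as a stand-in for the Redis session cache; `Cache` and `patch_turn_duration` are hypothetical names and do not appear in the PR.

```python
from typing import Any

# Hypothetical in-memory stand-in for the Redis session cache.
Cache = dict[str, dict[str, Any]]


def patch_turn_duration(cache: Cache, session_id: str, duration_ms: int) -> bool:
    """Set duration_ms on the newest assistant message of a cached session.

    Returns True when a message was patched. The cache entry is never
    deleted, so no reader has to repopulate it from the database.
    """
    session = cache.get(session_id)
    if not session:
        return False
    for msg in reversed(session["messages"]):
        if msg["role"] == "assistant":
            msg["duration_ms"] = duration_ms
            return True
    return False


cache: Cache = {
    "sess-1": {
        "messages": [
            {"role": "user", "content": "hello", "duration_ms": None},
            {"role": "assistant", "content": "hi there", "duration_ms": None},
        ]
    }
}
patched = patch_turn_duration(cache, "sess-1", 1234)
```

The sketch differs from the diff in one detail: it reports whether anything was patched, whereas `set_turn_duration` returns `None` and re-caches the session whenever it has messages.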
--- a/autogpt_platform/backend/backend/copilot/executor/processor.py
+++ b/autogpt_platform/backend/backend/copilot/executor/processor.py
@@ -13,7 +13,7 @@ import time

 from backend.copilot import stream_registry
 from backend.copilot.baseline import stream_chat_completion_baseline
-from backend.copilot.config import ChatConfig
+from backend.copilot.config import ChatConfig, CopilotMode
 from backend.copilot.response_model import StreamError
 from backend.copilot.sdk import service as sdk_service
 from backend.copilot.sdk.dummy import stream_chat_completion_dummy
@@ -30,6 +30,57 @@ from .utils import CoPilotExecutionEntry, CoPilotLogMetadata
 logger = TruncatedLogger(logging.getLogger(__name__), prefix="[CoPilotExecutor]")


+# ============ Mode Routing ============ #
+
+
+async def resolve_effective_mode(
+    mode: CopilotMode | None,
+    user_id: str | None,
+) -> CopilotMode | None:
+    """Strip ``mode`` when the user is not entitled to the toggle.
+
+    The UI gates the mode toggle behind ``CHAT_MODE_OPTION``; the
+    processor enforces the same gate server-side so an authenticated
+    user cannot bypass the flag by crafting a request directly.
+    """
+    if mode is None:
+        return None
+    allowed = await is_feature_enabled(
+        Flag.CHAT_MODE_OPTION,
+        user_id or "anonymous",
+        default=False,
+    )
+    if not allowed:
+        logger.info(f"Ignoring mode={mode} — CHAT_MODE_OPTION is disabled for user")
+        return None
+    return mode
+
+
+async def resolve_use_sdk_for_mode(
+    mode: CopilotMode | None,
+    user_id: str | None,
+    *,
+    use_claude_code_subscription: bool,
+    config_default: bool,
+) -> bool:
+    """Pick the SDK vs baseline path for a single turn.
+
+    Per-request ``mode`` wins whenever it is set (after the
+    ``CHAT_MODE_OPTION`` gate has been applied upstream).  Otherwise
+    falls back to the Claude Code subscription override, then the
+    ``COPILOT_SDK`` LaunchDarkly flag, then the config default.
+    """
+    if mode == "fast":
+        return False
+    if mode == "extended_thinking":
+        return True
+    return use_claude_code_subscription or await is_feature_enabled(
+        Flag.COPILOT_SDK,
+        user_id or "anonymous",
+        default=config_default,
+    )
+
+
 # ============ Module Entry Points ============ #

 # Thread-local storage for processor instances
@@ -250,21 +301,26 @@ class CoPilotProcessor:
            if config.test_mode:
                stream_fn = stream_chat_completion_dummy
                log.warning("Using DUMMY service (CHAT_TEST_MODE=true)")
+                effective_mode = None
            else:
-                use_sdk = (
-                    config.use_claude_code_subscription
-                    or await is_feature_enabled(
-                        Flag.COPILOT_SDK,
-                        entry.user_id or "anonymous",
-                        default=config.use_claude_agent_sdk,
-                    )
+                # Enforce server-side feature-flag gate so unauthorised
+                # users cannot force a mode by crafting the request.
+                effective_mode = await resolve_effective_mode(entry.mode, entry.user_id)
+                use_sdk = await resolve_use_sdk_for_mode(
+                    effective_mode,
+                    entry.user_id,
+                    use_claude_code_subscription=config.use_claude_code_subscription,
+                    config_default=config.use_claude_agent_sdk,
                )
                stream_fn = (
                    sdk_service.stream_chat_completion_sdk
                    if use_sdk
                    else stream_chat_completion_baseline
                )
-                log.info(f"Using {'SDK' if use_sdk else 'baseline'} service")
+                log.info(
+                    f"Using {'SDK' if use_sdk else 'baseline'} service "
+                    f"(mode={effective_mode or 'default'})"
+                )

            # Stream chat completion and publish chunks to Redis.
            # stream_and_publish wraps the raw stream with registry
@@ -276,6 +332,7 @@ class CoPilotProcessor:
                user_id=entry.user_id,
                context=entry.context,
                file_ids=entry.file_ids,
+                mode=effective_mode,
            )
            async for chunk in stream_registry.stream_and_publish(
                session_id=entry.session_id,
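
Reviewer note: the routing precedence introduced in `processor.py` above (gate the requested mode first, then let an explicit mode win over the subscription override and the SDK flag) can be summarised with a synchronous sketch. The real helpers are async and call `is_feature_enabled`; here the flag results are passed in as plain booleans, and `effective_mode`, `use_sdk`, and `route` are hypothetical names used for illustration only.

```python
from typing import Literal, Optional

Mode = Literal["fast", "extended_thinking"]


def effective_mode(mode: Optional[Mode], toggle_enabled: bool) -> Optional[Mode]:
    """Drop the requested mode unless the toggle flag is on for the user."""
    return mode if (mode is not None and toggle_enabled) else None


def use_sdk(mode: Optional[Mode], subscription: bool, sdk_flag: bool) -> bool:
    """Explicit mode wins; otherwise subscription override, then the flag."""
    if mode == "fast":
        return False
    if mode == "extended_thinking":
        return True
    return subscription or sdk_flag


def route(
    requested: Optional[Mode],
    toggle_enabled: bool,
    subscription: bool,
    sdk_flag: bool,
) -> str:
    """Return which service a turn would be sent to under these inputs."""
    mode = effective_mode(requested, toggle_enabled)
    return "sdk" if use_sdk(mode, subscription, sdk_flag) else "baseline"
```

For example, `route("fast", False, True, True)` ignores the requested mode because the toggle is off and falls through to the subscription override.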
--- a/autogpt_platform/backend/backend/copilot/executor/processor_test.py
+++ b/autogpt_platform/backend/backend/copilot/executor/processor_test.py
@@ -0,0 +1,175 @@
+"""Unit tests for CoPilot mode routing logic in the processor.
+
+Tests cover the mode→service mapping:
+  - 'fast' → baseline service
+  - 'extended_thinking' → SDK service
+  - None → feature flag / config fallback
+
+as well as the ``CHAT_MODE_OPTION`` server-side gate.  The tests import
+the real production helpers from ``processor.py`` so the routing logic
+has meaningful coverage.
+"""
+
+from unittest.mock import AsyncMock, patch
+
+import pytest
+
+from backend.copilot.executor.processor import (
+    resolve_effective_mode,
+    resolve_use_sdk_for_mode,
+)
+
+
+class TestResolveUseSdkForMode:
+    """Tests for the per-request mode routing logic."""
+
+    @pytest.mark.asyncio
+    async def test_fast_mode_uses_baseline(self):
+        """mode='fast' always routes to baseline, regardless of flags."""
+        with patch(
+            "backend.copilot.executor.processor.is_feature_enabled",
+            new=AsyncMock(return_value=True),
+        ):
+            assert (
+                await resolve_use_sdk_for_mode(
+                    "fast",
+                    "user-1",
+                    use_claude_code_subscription=True,
+                    config_default=True,
+                )
+                is False
+            )
+
+    @pytest.mark.asyncio
+    async def test_extended_thinking_uses_sdk(self):
+        """mode='extended_thinking' always routes to SDK, regardless of flags."""
+        with patch(
+            "backend.copilot.executor.processor.is_feature_enabled",
+            new=AsyncMock(return_value=False),
+        ):
+            assert (
+                await resolve_use_sdk_for_mode(
+                    "extended_thinking",
+                    "user-1",
+                    use_claude_code_subscription=False,
+                    config_default=False,
+                )
+                is True
+            )
+
+    @pytest.mark.asyncio
+    async def test_none_mode_uses_subscription_override(self):
+        """mode=None with claude_code_subscription=True routes to SDK."""
+        with patch(
+            "backend.copilot.executor.processor.is_feature_enabled",
+            new=AsyncMock(return_value=False),
+        ):
+            assert (
+                await resolve_use_sdk_for_mode(
+                    None,
+                    "user-1",
+                    use_claude_code_subscription=True,
+                    config_default=False,
+                )
+                is True
+            )
+
+    @pytest.mark.asyncio
+    async def test_none_mode_uses_feature_flag(self):
+        """mode=None with feature flag enabled routes to SDK."""
+        with patch(
+            "backend.copilot.executor.processor.is_feature_enabled",
+            new=AsyncMock(return_value=True),
+        ) as flag_mock:
+            assert (
+                await resolve_use_sdk_for_mode(
+                    None,
+                    "user-1",
+                    use_claude_code_subscription=False,
+                    config_default=False,
+                )
+                is True
+            )
+            flag_mock.assert_awaited_once()
+
+    @pytest.mark.asyncio
+    async def test_none_mode_uses_config_default(self):
+        """mode=None falls back to config.use_claude_agent_sdk."""
+        # When LaunchDarkly returns the default (True), we expect SDK routing.
+        with patch(
+            "backend.copilot.executor.processor.is_feature_enabled",
+            new=AsyncMock(return_value=True),
+        ):
+            assert (
+                await resolve_use_sdk_for_mode(
+                    None,
+                    "user-1",
+                    use_claude_code_subscription=False,
+                    config_default=True,
+                )
+                is True
+            )
+
+    @pytest.mark.asyncio
+    async def test_none_mode_all_disabled(self):
+        """mode=None with all flags off routes to baseline."""
+        with patch(
+            "backend.copilot.executor.processor.is_feature_enabled",
+            new=AsyncMock(return_value=False),
+        ):
+            assert (
+                await resolve_use_sdk_for_mode(
+                    None,
+                    "user-1",
+                    use_claude_code_subscription=False,
+                    config_default=False,
+                )
+                is False
+            )
+
+
+class TestResolveEffectiveMode:
+    """Tests for the CHAT_MODE_OPTION server-side gate."""
+
+    @pytest.mark.asyncio
+    async def test_none_mode_passes_through(self):
+        """mode=None is returned as-is without a flag check."""
+        with patch(
+            "backend.copilot.executor.processor.is_feature_enabled",
+            new=AsyncMock(return_value=False),
+        ) as flag_mock:
+            assert await resolve_effective_mode(None, "user-1") is None
+            flag_mock.assert_not_awaited()
+
+    @pytest.mark.asyncio
+    async def test_mode_stripped_when_flag_disabled(self):
+        """When CHAT_MODE_OPTION is off, mode is dropped to None."""
+        with patch(
+            "backend.copilot.executor.processor.is_feature_enabled",
+            new=AsyncMock(return_value=False),
+        ):
+            assert await resolve_effective_mode("fast", "user-1") is None
+            assert await resolve_effective_mode("extended_thinking", "user-1") is None
+
+    @pytest.mark.asyncio
+    async def test_mode_preserved_when_flag_enabled(self):
+        """When CHAT_MODE_OPTION is on, the user-selected mode is preserved."""
+        with patch(
+            "backend.copilot.executor.processor.is_feature_enabled",
+            new=AsyncMock(return_value=True),
+        ):
+            assert await resolve_effective_mode("fast", "user-1") == "fast"
+            assert (
+                await resolve_effective_mode("extended_thinking", "user-1")
+                == "extended_thinking"
+            )
+
+    @pytest.mark.asyncio
+    async def test_anonymous_user_with_mode(self):
+        """Anonymous users (user_id=None) still pass through the gate."""
+        with patch(
+            "backend.copilot.executor.processor.is_feature_enabled",
+            new=AsyncMock(return_value=False),
+        ) as flag_mock:
+            assert await resolve_effective_mode("fast", None) is None
+            flag_mock.assert_awaited_once()
--- a/autogpt_platform/backend/backend/copilot/executor/utils.py
+++ b/autogpt_platform/backend/backend/copilot/executor/utils.py
@@ -9,6 +9,7 @@ import logging

 from pydantic import BaseModel

+from backend.copilot.config import CopilotMode
 from backend.data.rabbitmq import Exchange, ExchangeType, Queue, RabbitMQConfig
 from backend.util.logging import TruncatedLogger, is_structured_logging_enabled

@@ -156,6 +157,9 @@ class CoPilotExecutionEntry(BaseModel):
    file_ids: list[str] | None = None
    """Workspace file IDs attached to the user's message"""

+    mode: CopilotMode | None = None
+    """Autopilot mode override: 'fast' or 'extended_thinking'. None = server default."""
+

 class CancelCoPilotEvent(BaseModel):
    """Event to cancel a CoPilot operation."""
@@ -175,6 +179,7 @@ async def enqueue_copilot_turn(
    is_user_message: bool = True,
    context: dict[str, str] | None = None,
    file_ids: list[str] | None = None,
+    mode: CopilotMode | None = None,
 ) -> None:
    """Enqueue a CoPilot task for processing by the executor service.

@@ -186,6 +191,7 @@ async def enqueue_copilot_turn(
        is_user_message: Whether the message is from the user (vs system/assistant)
        context: Optional context for the message (e.g., {url: str, content: str})
        file_ids: Optional workspace file IDs attached to the user's message
+        mode: Autopilot mode override ('fast' or 'extended_thinking'). None = server default.
    """
    from backend.util.clients import get_async_copilot_queue

@@ -197,6 +203,7 @@ async def enqueue_copilot_turn(
        is_user_message=is_user_message,
        context=context,
        file_ids=file_ids,
+        mode=mode,
    )

    queue_client = await get_async_copilot_queue()
--- a/autogpt_platform/backend/backend/copilot/executor/utils_test.py
+++ b/autogpt_platform/backend/backend/copilot/executor/utils_test.py
@@ -0,0 +1,123 @@
+"""Tests for CoPilot executor utils (queue config, message models, logging)."""
+
+from backend.copilot.executor.utils import (
+    COPILOT_EXECUTION_EXCHANGE,
+    COPILOT_EXECUTION_QUEUE_NAME,
+    COPILOT_EXECUTION_ROUTING_KEY,
+    CancelCoPilotEvent,
+    CoPilotExecutionEntry,
+    CoPilotLogMetadata,
+    create_copilot_queue_config,
+)
+
+
+class TestCoPilotExecutionEntry:
+    def test_basic_fields(self):
+        entry = CoPilotExecutionEntry(
+            session_id="s1",
+            user_id="u1",
+            message="hello",
+        )
+        assert entry.session_id == "s1"
+        assert entry.user_id == "u1"
+        assert entry.message == "hello"
+        assert entry.is_user_message is True
+        assert entry.mode is None
+        assert entry.context is None
+        assert entry.file_ids is None
+
+    def test_mode_field(self):
+        entry = CoPilotExecutionEntry(
+            session_id="s1",
+            user_id="u1",
+            message="test",
+            mode="fast",
+        )
+        assert entry.mode == "fast"
+
+        entry2 = CoPilotExecutionEntry(
+            session_id="s1",
+            user_id="u1",
+            message="test",
+            mode="extended_thinking",
+        )
+        assert entry2.mode == "extended_thinking"
+
+    def test_optional_fields(self):
+        entry = CoPilotExecutionEntry(
+            session_id="s1",
+            user_id="u1",
+            message="test",
+            turn_id="t1",
+            context={"url": "https://example.com"},
+            file_ids=["f1", "f2"],
+            is_user_message=False,
+        )
+        assert entry.turn_id == "t1"
+        assert entry.context == {"url": "https://example.com"}
+        assert entry.file_ids == ["f1", "f2"]
+        assert entry.is_user_message is False
+
+    def test_serialization_roundtrip(self):
+        entry = CoPilotExecutionEntry(
+            session_id="s1",
+            user_id="u1",
+            message="hello",
+            mode="fast",
+        )
+        json_str = entry.model_dump_json()
+        restored = CoPilotExecutionEntry.model_validate_json(json_str)
+        assert restored == entry
+
+
+class TestCancelCoPilotEvent:
+    def test_basic(self):
+        event = CancelCoPilotEvent(session_id="s1")
+        assert event.session_id == "s1"
+
+    def test_serialization(self):
+        event = CancelCoPilotEvent(session_id="s1")
+        restored = CancelCoPilotEvent.model_validate_json(event.model_dump_json())
+        assert restored.session_id == "s1"
+
+
+class TestCreateCopilotQueueConfig:
+    def test_returns_valid_config(self):
+        config = create_copilot_queue_config()
+        assert len(config.exchanges) == 2
+        assert len(config.queues) == 2
+
+    def test_execution_queue_properties(self):
+        config = create_copilot_queue_config()
+        exec_queue = next(
+            q for q in config.queues if q.name == COPILOT_EXECUTION_QUEUE_NAME
+        )
+        assert exec_queue.durable is True
+        assert exec_queue.exchange == COPILOT_EXECUTION_EXCHANGE
+        assert exec_queue.routing_key == COPILOT_EXECUTION_ROUTING_KEY
+
+    def test_cancel_queue_uses_fanout(self):
+        config = create_copilot_queue_config()
+        cancel_queue = next(
+            q for q in config.queues if q.name != COPILOT_EXECUTION_QUEUE_NAME
+        )
+        assert cancel_queue.exchange is not None
+        assert cancel_queue.exchange.type.value == "fanout"
+
+
+class TestCoPilotLogMetadata:
+    def test_creates_logger_with_metadata(self):
+        import logging
+
+        base_logger = logging.getLogger("test")
+        log = CoPilotLogMetadata(base_logger, session_id="s1", user_id="u1")
+        assert log is not None
+
+    def test_filters_none_values(self):
+        import logging
+
+        base_logger = logging.getLogger("test")
+        log = CoPilotLogMetadata(
+            base_logger, session_id="s1", user_id=None, turn_id="t1"
+        )
+        assert log is not None
--- a/autogpt_platform/backend/backend/copilot/model.py
+++ b/autogpt_platform/backend/backend/copilot/model.py
@@ -64,6 +64,7 @@ class ChatMessage(BaseModel):
    refusal: str | None = None
    tool_calls: list[dict] | None = None
    function_call: dict | None = None
+    sequence: int | None = None
    duration_ms: int | None = None

    @staticmethod
@@ -77,10 +78,54 @@ class ChatMessage(BaseModel):
            refusal=prisma_message.refusal,
            tool_calls=_parse_json_field(prisma_message.toolCalls),
            function_call=_parse_json_field(prisma_message.functionCall),
+            sequence=prisma_message.sequence,
            duration_ms=prisma_message.durationMs,
        )


+def is_message_duplicate(
+    messages: list[ChatMessage],
+    role: str,
+    content: str,
+) -> bool:
+    """Check whether *content* is already present in the current pending turn.
+
+    Only inspects trailing messages that share the given *role* (i.e. the
+    current turn). This ensures legitimately repeated messages across different
+    turns are not suppressed, while same-turn duplicates from stale cache are
+    still caught.
+    """
+    for m in reversed(messages):
+        if m.role == role:
+            if m.content == content:
+                return True
+        else:
+            break
+    return False
+
+
+def maybe_append_user_message(
+    session: "ChatSession",
+    message: str | None,
+    is_user_message: bool,
+) -> bool:
+    """Append a user/assistant message to the session if not already present.
+
+    The route handler already persists the user message before enqueueing,
+    so we check trailing same-role messages to avoid re-appending when the
+    session cache is slightly stale.
+
+    Returns True if the message was appended, False if skipped.
+    """
+    if not message:
+        return False
+    role = "user" if is_user_message else "assistant"
+    if is_message_duplicate(session.messages, role, message):
+        return False
+    session.messages.append(ChatMessage(role=role, content=message))
+    return True
+
+
 class Usage(BaseModel):
    prompt_tokens: int
    completion_tokens: int
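The trailing-same-role scan added above can be tried in isolation. This is a minimal sketch, not the backend code: `Msg` is a hypothetical stand-in for `ChatMessage`, and `is_duplicate_in_trailing_turn` is an illustrative name chosen so the snippet runs without the backend package.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Msg:
    """Hypothetical stand-in for ChatMessage (role and optional content only)."""

    role: str
    content: Optional[str] = None


def is_duplicate_in_trailing_turn(messages: list[Msg], role: str, content: str) -> bool:
    """Scan backwards while the role matches; stop at the first other role."""
    for m in reversed(messages):
        if m.role != role:
            break  # reached the previous turn, so older matches do not count
        if m.content == content:
            return True
    return False


history = [
    Msg("user", "yes"),
    Msg("assistant", "ok"),
    Msg("user", "yes"),
]
# The last message is a same-turn duplicate of "yes".
same_turn = is_duplicate_in_trailing_turn(history, "user", "yes")
# With only the first two messages, "yes" sits behind an assistant reply.
earlier_turn = is_duplicate_in_trailing_turn(history[:2], "user", "yes")
```

Stopping at the first message with a different role is what lets a user legitimately repeat "yes" in a later turn while still catching a re-append caused by a stale session cache.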
--- a/autogpt_platform/backend/backend/copilot/model_test.py
+++ b/autogpt_platform/backend/backend/copilot/model_test.py
@@ -17,6 +17,8 @@ from .model import (
    ChatSession,
    Usage,
    get_chat_session,
+    is_message_duplicate,
+    maybe_append_user_message,
    upsert_chat_session,
 )

@@ -424,3 +426,151 @@ async def test_concurrent_saves_collision_detection(setup_test_user, test_user_i
    assert "Streaming message 1" in contents
    assert "Streaming message 2" in contents
    assert "Callback result" in contents
+
+
+# --------------------------------------------------------------------------- #
+#  is_message_duplicate                                                        #
+# --------------------------------------------------------------------------- #
+
+
+def test_duplicate_detected_in_trailing_same_role():
+    """Duplicate user message at the tail is detected."""
+    msgs = [
+        ChatMessage(role="user", content="hello"),
+        ChatMessage(role="assistant", content="hi there"),
+        ChatMessage(role="user", content="yes"),
+    ]
+    assert is_message_duplicate(msgs, "user", "yes") is True
+
+
+def test_duplicate_not_detected_across_turns():
+    """Same text in a previous turn (separated by assistant) is NOT a duplicate."""
+    msgs = [
+        ChatMessage(role="user", content="yes"),
+        ChatMessage(role="assistant", content="ok"),
+    ]
+    assert is_message_duplicate(msgs, "user", "yes") is False
+
+
+def test_no_duplicate_on_empty_messages():
+    """Empty message list never reports a duplicate."""
+    assert is_message_duplicate([], "user", "hello") is False
+
+
+def test_no_duplicate_when_content_differs():
+    """Different content in the trailing same-role block is not a duplicate."""
+    msgs = [
+        ChatMessage(role="assistant", content="response"),
+        ChatMessage(role="user", content="first message"),
+    ]
+    assert is_message_duplicate(msgs, "user", "second message") is False
+
+
+def test_duplicate_with_multiple_trailing_same_role():
+    """Detects duplicate among multiple consecutive same-role messages."""
+    msgs = [
+        ChatMessage(role="assistant", content="response"),
+        ChatMessage(role="user", content="msg1"),
+        ChatMessage(role="user", content="msg2"),
+    ]
+    assert is_message_duplicate(msgs, "user", "msg1") is True
+    assert is_message_duplicate(msgs, "user", "msg2") is True
+    assert is_message_duplicate(msgs, "user", "msg3") is False
+
+
+def test_duplicate_check_for_assistant_role():
+    """Works correctly when checking assistant role too."""
+    msgs = [
+        ChatMessage(role="user", content="hi"),
+        ChatMessage(role="assistant", content="hello"),
+        ChatMessage(role="assistant", content="how can I help?"),
+    ]
+    assert is_message_duplicate(msgs, "assistant", "hello") is True
+    assert is_message_duplicate(msgs, "assistant", "new response") is False
+
+
+def test_no_false_positive_when_content_is_none():
+    """Messages with content=None in the trailing block do not match."""
+    msgs = [
+        ChatMessage(role="user", content=None),
+        ChatMessage(role="user", content="hello"),
+    ]
+    assert is_message_duplicate(msgs, "user", "hello") is True
+    # None-content message should not match any string
+    msgs2 = [
+        ChatMessage(role="user", content=None),
+    ]
+    assert is_message_duplicate(msgs2, "user", "hello") is False
+
+
+def test_all_same_role_messages():
+    """When all messages share the same role, the entire list is scanned."""
+    msgs = [
+        ChatMessage(role="user", content="first"),
+        ChatMessage(role="user", content="second"),
+        ChatMessage(role="user", content="third"),
+    ]
+    assert is_message_duplicate(msgs, "user", "first") is True
+    assert is_message_duplicate(msgs, "user", "new") is False
+
+
+# --------------------------------------------------------------------------- #
+#  maybe_append_user_message                                                   #
+# --------------------------------------------------------------------------- #
+
+
+def test_maybe_append_user_message_appends_new():
+    """A new user message is appended and returns True."""
+    session = ChatSession.new(user_id="u", dry_run=False)
+    session.messages = [
+        ChatMessage(role="assistant", content="hello"),
+    ]
+    result = maybe_append_user_message(session, "new msg", is_user_message=True)
+    assert result is True
+    assert len(session.messages) == 2
+    assert session.messages[-1].role == "user"
+    assert session.messages[-1].content == "new msg"
+
+
+def test_maybe_append_user_message_skips_duplicate():
+    """A duplicate user message is skipped and returns False."""
+    session = ChatSession.new(user_id="u", dry_run=False)
+    session.messages = [
+        ChatMessage(role="assistant", content="hello"),
+        ChatMessage(role="user", content="dup"),
+    ]
+    result = maybe_append_user_message(session, "dup", is_user_message=True)
+    assert result is False
+    assert len(session.messages) == 2
+
+
+def test_maybe_append_user_message_none_message():
+    """None/empty message returns False without appending."""
+    session = ChatSession.new(user_id="u", dry_run=False)
+    assert maybe_append_user_message(session, None, is_user_message=True) is False
+    assert maybe_append_user_message(session, "", is_user_message=True) is False
+    assert len(session.messages) == 0
+
+
+def test_maybe_append_assistant_message():
+    """Works for assistant role when is_user_message=False."""
+    session = ChatSession.new(user_id="u", dry_run=False)
+    session.messages = [
+        ChatMessage(role="user", content="hi"),
+    ]
+    result = maybe_append_user_message(session, "response", is_user_message=False)
+    assert result is True
+    assert session.messages[-1].role == "assistant"
+    assert session.messages[-1].content == "response"
+
+
+def test_maybe_append_assistant_skips_duplicate():
+    """Duplicate assistant message is skipped."""
+    session = ChatSession.new(user_id="u", dry_run=False)
+    session.messages = [
+        ChatMessage(role="user", content="hi"),
+        ChatMessage(role="assistant", content="dup"),
+    ]
+    result = maybe_append_user_message(session, "dup", is_user_message=False)
+    assert result is False
+    assert len(session.messages) == 2
--- a/autogpt_platform/backend/backend/copilot/prompting.py
+++ b/autogpt_platform/backend/backend/copilot/prompting.py
@@ -126,6 +126,21 @@ After building the file, reference it with `@@agptfile:` in other tools:
 - When spawning sub-agents for research, ensure each has a distinct
  non-overlapping scope to avoid redundant searches.

+
+### Tool Discovery Priority
+
+When the user asks to interact with a service or API, follow this order:
+
+1. **find_block first** — Search platform blocks with `find_block`. The platform has hundreds of built-in blocks (Google Sheets, Docs, Calendar, Gmail, Slack, GitHub, etc.) that work without extra setup.
+
+2. **run_mcp_tool** — If no matching block exists, check if a hosted MCP server is available for the service. Only use known MCP server URLs from the registry.
+
+3. **SendAuthenticatedWebRequestBlock** — If no block or MCP server exists, use `SendAuthenticatedWebRequestBlock` with existing host-scoped credentials. Check available credentials via `connect_integration`.
+
+4. **Manual API call** — As a last resort, guide the user to set up credentials and use `SendAuthenticatedWebRequestBlock` with direct API calls.
+
+**Never skip step 1.** Built-in blocks are more reliable, tested, and user-friendly than MCP or raw API calls.
+
 ### Sub-agent tasks
 - When using the Task tool, NEVER set `run_in_background` to true.
  All tasks must run in the foreground.
--- a/autogpt_platform/backend/backend/copilot/rate_limit.py
+++ b/autogpt_platform/backend/backend/copilot/rate_limit.py
@@ -9,11 +9,14 @@ UTC). Fails open when Redis is unavailable to avoid blocking users.
 import asyncio
 import logging
 from datetime import UTC, datetime, timedelta
+from enum import Enum

+from prisma.models import User as PrismaUser
 from pydantic import BaseModel, Field
 from redis.exceptions import RedisError

 from backend.data.redis_client import get_redis_async
+from backend.util.cache import cached

 logger = logging.getLogger(__name__)

@@ -21,6 +24,40 @@ logger = logging.getLogger(__name__)
 _USAGE_KEY_PREFIX = "copilot:usage"


+# ---------------------------------------------------------------------------
+# Subscription tier definitions
+# ---------------------------------------------------------------------------
+
+
+class SubscriptionTier(str, Enum):
+    """Subscription tiers with increasing token allowances.
+
+    Mirrors the ``SubscriptionTier`` enum in ``schema.prisma``.
+    Once ``prisma generate`` is run, this can be replaced with::
+
+        from prisma.enums import SubscriptionTier
+    """
+
+    FREE = "FREE"
+    PRO = "PRO"
+    BUSINESS = "BUSINESS"
+    ENTERPRISE = "ENTERPRISE"
+
+
+# Multiplier applied to the base limits (from LD / config) for each tier.
+# Intentionally int (not float): keeps limits as whole token counts and avoids
+# floating-point rounding.  If fractional multipliers are ever needed, change
+# the type and round the result in get_global_rate_limits().
+TIER_MULTIPLIERS: dict[SubscriptionTier, int] = {
+    SubscriptionTier.FREE: 1,
+    SubscriptionTier.PRO: 5,
+    SubscriptionTier.BUSINESS: 20,
+    SubscriptionTier.ENTERPRISE: 60,
+}
+
+DEFAULT_TIER = SubscriptionTier.FREE
+
+
 class UsageWindow(BaseModel):
    """Usage within a single time window."""

@@ -36,6 +73,7 @@ class CoPilotUsageStatus(BaseModel):

    daily: UsageWindow
    weekly: UsageWindow
+    tier: SubscriptionTier = DEFAULT_TIER
    reset_cost: int = Field(
        default=0,
        description="Credit cost (in cents) to reset the daily limit. 0 = feature disabled.",
@@ -66,6 +104,7 @@ async def get_usage_status(
    daily_token_limit: int,
    weekly_token_limit: int,
    rate_limit_reset_cost: int = 0,
+    tier: SubscriptionTier = DEFAULT_TIER,
 ) -> CoPilotUsageStatus:
    """Get current usage status for a user.

@@ -74,6 +113,7 @@ async def get_usage_status(
        daily_token_limit: Max tokens per day (0 = unlimited).
        weekly_token_limit: Max tokens per week (0 = unlimited).
        rate_limit_reset_cost: Credit cost (cents) to reset daily limit (0 = disabled).
+        tier: The user's rate-limit tier (included in the response).

    Returns:
        CoPilotUsageStatus with current usage and limits.
@@ -103,6 +143,7 @@ async def get_usage_status(
            limit=weekly_token_limit,
            resets_at=_weekly_reset_time(now=now),
        ),
+        tier=tier,
        reset_cost=rate_limit_reset_cost,
    )

@@ -343,20 +384,100 @@ async def record_token_usage(
        )


+class _UserNotFoundError(Exception):
+    """Raised when a user record is missing or has no subscription tier.
+
+    Used internally by ``_fetch_user_tier`` to signal a cache-miss condition:
+    by raising instead of returning ``DEFAULT_TIER``, we prevent the ``@cached``
+    decorator from storing the fallback value.  This avoids a race condition
+    where a non-existent user's DEFAULT_TIER is cached, then the user is
+    created with a higher tier but receives the stale cached FREE tier for
+    up to 5 minutes.
+    """
+
+
+@cached(maxsize=1000, ttl_seconds=300, shared_cache=True)
+async def _fetch_user_tier(user_id: str) -> SubscriptionTier:
+    """Fetch the user's rate-limit tier from the database (cached via Redis).
+
+    Uses ``shared_cache=True`` so that tier changes propagate across all pods
+    immediately when the cache entry is invalidated (via ``cache_delete``).
+
+    Only successful DB lookups of existing users with a valid tier are cached.
+    Raises ``_UserNotFoundError`` when the user is missing or has no tier, so
+    the ``@cached`` decorator does **not** store a fallback value.  This
+    prevents a race condition where a non-existent user's ``DEFAULT_TIER`` is
+    cached and then persists after the user is created with a higher tier.
+    """
+    user = await PrismaUser.prisma().find_unique(where={"id": user_id})
+    if user and user.subscriptionTier:  # type: ignore[reportAttributeAccessIssue]
+        return SubscriptionTier(user.subscriptionTier)  # type: ignore[reportAttributeAccessIssue]
+    raise _UserNotFoundError(user_id)
+
+
+async def get_user_tier(user_id: str) -> SubscriptionTier:
+    """Look up the user's rate-limit tier from the database.
+
+    Successful results are cached for 5 minutes (via ``_fetch_user_tier``)
+    to avoid a DB round-trip on every rate-limit check.
+
+    Falls back to ``DEFAULT_TIER`` **without caching** when the user is missing
+    or has no tier, the DB is unreachable, or the stored value is unrecognised,
+    so the next call retries the query instead of serving a stale fallback.
+    """
+    try:
+        return await _fetch_user_tier(user_id)
+    except Exception as exc:
+        logger.warning(
+            "Failed to resolve rate-limit tier for user %s, defaulting to %s: %s",
+            user_id[:8],
+            DEFAULT_TIER.value,
+            exc,
+        )
+    return DEFAULT_TIER
+
+
+# Expose cache management on the public function so callers (including tests)
+# never need to reach into the private ``_fetch_user_tier``.
+get_user_tier.cache_clear = _fetch_user_tier.cache_clear  # type: ignore[attr-defined]
+get_user_tier.cache_delete = _fetch_user_tier.cache_delete  # type: ignore[attr-defined]
+
+
+async def set_user_tier(user_id: str, tier: SubscriptionTier) -> None:
+    """Persist the user's rate-limit tier to the database.
+
+    Also invalidates the ``get_user_tier`` cache for this user so that
+    subsequent rate-limit checks immediately see the new tier.
+
+    Raises:
+        prisma.errors.RecordNotFoundError: If the user does not exist.
+    """
+    await PrismaUser.prisma().update(
+        where={"id": user_id},
+        data={"subscriptionTier": tier.value},
+    )
+    # Invalidate cached tier so rate-limit checks pick up the change immediately.
+    get_user_tier.cache_delete(user_id)  # type: ignore[attr-defined]
+
+
 async def get_global_rate_limits(
    user_id: str,
    config_daily: int,
    config_weekly: int,
-) -> tuple[int, int]:
+) -> tuple[int, int, SubscriptionTier]:
    """Resolve global rate limits from LaunchDarkly, falling back to config.

+    The base limits (from LD or config) are multiplied by the user's
+    tier multiplier so that higher tiers receive proportionally larger
+    allowances.
+
    Args:
        user_id: User ID for LD flag evaluation context.
        config_daily: Fallback daily limit from ChatConfig.
        config_weekly: Fallback weekly limit from ChatConfig.

    Returns:
-        (daily_token_limit, weekly_token_limit) tuple.
+        (daily_token_limit, weekly_token_limit, tier) 3-tuple.
    """
    # Lazy import to avoid circular dependency:
    # rate_limit -> feature_flag -> settings -> ... -> rate_limit
@@ -378,7 +499,15 @@ async def get_global_rate_limits(
    except (TypeError, ValueError):
        logger.warning("Invalid LD value for weekly token limit: %r", weekly_raw)
        weekly = config_weekly
-    return daily, weekly
+
+    # Apply tier multiplier
+    tier = await get_user_tier(user_id)
+    multiplier = TIER_MULTIPLIERS.get(tier, 1)
+    if multiplier != 1:
+        daily = daily * multiplier
+        weekly = weekly * multiplier
+
+    return daily, weekly, tier


 async def reset_user_usage(user_id: str, *, reset_weekly: bool = False) -> None:
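The "raise instead of return, so the fallback is never cached" pattern used by `_fetch_user_tier` and `get_user_tier` can be sketched on its own. Everything here is an assumption made for illustration: `cache_successes` is a hypothetical in-process decorator (not `backend.util.cache.cached`), `FAKE_DB` replaces the Prisma lookup, and the wrapper catches only `UserNotFound` where the real code catches `Exception`.

```python
import asyncio
import functools

DEFAULT_TIER = "FREE"


class UserNotFound(Exception):
    """Raised instead of returning a fallback, so the cache never stores it."""


def cache_successes(fn):
    """Hypothetical minimal async cache: stores only values that are returned."""
    store = {}

    @functools.wraps(fn)
    async def wrapper(key):
        if key not in store:
            # If fn raises, the exception propagates before the assignment,
            # so nothing is stored for this key.
            store[key] = await fn(key)
        return store[key]

    wrapper.cache_delete = lambda key: store.pop(key, None)
    return wrapper


FAKE_DB = {}  # user_id -> tier; stands in for the database


@cache_successes
async def fetch_tier(user_id):
    tier = FAKE_DB.get(user_id)
    if tier is None:
        raise UserNotFound(user_id)
    return tier


async def get_tier(user_id):
    try:
        return await fetch_tier(user_id)
    except UserNotFound:
        return DEFAULT_TIER  # fallback is returned but never cached


async def demo():
    before = await get_tier("u1")  # user missing: fallback, nothing cached
    FAKE_DB["u1"] = "PRO"          # user is then created with a higher tier
    after = await get_tier("u1")   # next call queries again and sees PRO
    return before, after


result = asyncio.run(demo())
```

Had `fetch_tier` returned `DEFAULT_TIER` directly, the decorator would have stored it and the second call would have kept serving `FREE` until the entry expired or was deleted.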
--- a/autogpt_platform/backend/backend/copilot/rate_limit_test.py
+++ b/autogpt_platform/backend/backend/copilot/rate_limit_test.py
--- a/autogpt_platform/backend/backend/copilot/reset_usage_test.py
+++ b/autogpt_platform/backend/backend/copilot/reset_usage_test.py
@@ -9,7 +9,7 @@ import pytest
 from fastapi import HTTPException

 from backend.api.features.chat.routes import reset_copilot_usage
-from backend.copilot.rate_limit import CoPilotUsageStatus, UsageWindow
+from backend.copilot.rate_limit import CoPilotUsageStatus, SubscriptionTier, UsageWindow
 from backend.util.exceptions import InsufficientBalanceError


@@ -53,6 +53,18 @@ def _mock_settings(enable_credit: bool = True):
    return mock


+def _mock_rate_limits(
+    daily: int = 2_500_000,
+    weekly: int = 12_500_000,
+    tier: SubscriptionTier = SubscriptionTier.PRO,
+):
+    """Mock get_global_rate_limits to return fixed limits (no tier multiplier)."""
+    return patch(
+        f"{_MODULE}.get_global_rate_limits",
+        AsyncMock(return_value=(daily, weekly, tier)),
+    )
+
+
 @pytest.mark.asyncio
 class TestResetCopilotUsage:
    async def test_feature_disabled_returns_400(self):
@@ -70,10 +82,7 @@ class TestResetCopilotUsage:
        with (
            patch(f"{_MODULE}.config", _make_config(daily_token_limit=0)),
            patch(f"{_MODULE}.settings", _mock_settings()),
-            patch(
-                f"{_MODULE}.get_global_rate_limits",
-                AsyncMock(return_value=(0, 12_500_000)),
-            ),
+            _mock_rate_limits(daily=0),
        ):
            with pytest.raises(HTTPException) as exc_info:
                await reset_copilot_usage(user_id="user-1")
@@ -87,10 +96,7 @@ class TestResetCopilotUsage:
        with (
            patch(f"{_MODULE}.config", cfg),
            patch(f"{_MODULE}.settings", _mock_settings()),
-            patch(
-                f"{_MODULE}.get_global_rate_limits",
-                AsyncMock(return_value=(2_500_000, 12_500_000)),
-            ),
+            _mock_rate_limits(),
            patch(f"{_MODULE}.get_daily_reset_count", AsyncMock(return_value=0)),
            patch(f"{_MODULE}.acquire_reset_lock", AsyncMock(return_value=True)),
            patch(f"{_MODULE}.release_reset_lock", AsyncMock()) as mock_release,
@@ -120,10 +126,7 @@ class TestResetCopilotUsage:
        with (
            patch(f"{_MODULE}.config", cfg),
            patch(f"{_MODULE}.settings", _mock_settings()),
-            patch(
-                f"{_MODULE}.get_global_rate_limits",
-                AsyncMock(return_value=(2_500_000, 12_500_000)),
-            ),
+            _mock_rate_limits(),
            patch(f"{_MODULE}.get_daily_reset_count", AsyncMock(return_value=0)),
            patch(f"{_MODULE}.acquire_reset_lock", AsyncMock(return_value=True)),
            patch(f"{_MODULE}.release_reset_lock", AsyncMock()) as mock_release,
@@ -153,10 +156,7 @@ class TestResetCopilotUsage:
        with (
            patch(f"{_MODULE}.config", cfg),
            patch(f"{_MODULE}.settings", _mock_settings()),
-            patch(
-                f"{_MODULE}.get_global_rate_limits",
-                AsyncMock(return_value=(2_500_000, 12_500_000)),
-            ),
+            _mock_rate_limits(),
            patch(f"{_MODULE}.get_daily_reset_count", AsyncMock(return_value=0)),
            patch(f"{_MODULE}.acquire_reset_lock", AsyncMock(return_value=True)),
            patch(f"{_MODULE}.release_reset_lock", AsyncMock()),
@@ -187,10 +187,7 @@ class TestResetCopilotUsage:
        with (
            patch(f"{_MODULE}.config", cfg),
            patch(f"{_MODULE}.settings", _mock_settings()),
-            patch(
-                f"{_MODULE}.get_global_rate_limits",
-                AsyncMock(return_value=(2_500_000, 12_500_000)),
-            ),
+            _mock_rate_limits(),
            patch(f"{_MODULE}.get_daily_reset_count", AsyncMock(return_value=3)),
        ):
            with pytest.raises(HTTPException) as exc_info:
@@ -228,10 +225,7 @@ class TestResetCopilotUsage:
        with (
            patch(f"{_MODULE}.config", cfg),
            patch(f"{_MODULE}.settings", _mock_settings()),
-            patch(
-                f"{_MODULE}.get_global_rate_limits",
-                AsyncMock(return_value=(2_500_000, 12_500_000)),
-            ),
+            _mock_rate_limits(),
            patch(f"{_MODULE}.get_daily_reset_count", AsyncMock(return_value=0)),
            patch(f"{_MODULE}.acquire_reset_lock", AsyncMock(return_value=True)),
            patch(f"{_MODULE}.release_reset_lock", AsyncMock()) as mock_release,
@@ -252,10 +246,7 @@ class TestResetCopilotUsage:
        with (
            patch(f"{_MODULE}.config", _make_config()),
            patch(f"{_MODULE}.settings", _mock_settings()),
-            patch(
-                f"{_MODULE}.get_global_rate_limits",
-                AsyncMock(return_value=(2_500_000, 12_500_000)),
-            ),
+            _mock_rate_limits(),
            patch(f"{_MODULE}.get_daily_reset_count", AsyncMock(return_value=None)),
        ):
            with pytest.raises(HTTPException) as exc_info:
@@ -273,10 +264,7 @@ class TestResetCopilotUsage:
        with (
            patch(f"{_MODULE}.config", cfg),
            patch(f"{_MODULE}.settings", _mock_settings()),
-            patch(
-                f"{_MODULE}.get_global_rate_limits",
-                AsyncMock(return_value=(2_500_000, 12_500_000)),
-            ),
+            _mock_rate_limits(),
            patch(f"{_MODULE}.get_daily_reset_count", AsyncMock(return_value=0)),
            patch(f"{_MODULE}.acquire_reset_lock", AsyncMock(return_value=True)),
            patch(f"{_MODULE}.release_reset_lock", AsyncMock()),
@@ -307,10 +295,7 @@ class TestResetCopilotUsage:
        with (
            patch(f"{_MODULE}.config", cfg),
            patch(f"{_MODULE}.settings", _mock_settings()),
-            patch(
-                f"{_MODULE}.get_global_rate_limits",
-                AsyncMock(return_value=(2_500_000, 12_500_000)),
-            ),
+            _mock_rate_limits(),
            patch(f"{_MODULE}.get_daily_reset_count", AsyncMock(return_value=0)),
            patch(f"{_MODULE}.acquire_reset_lock", AsyncMock(return_value=True)),
            patch(f"{_MODULE}.release_reset_lock", AsyncMock()),
--- a/autogpt_platform/backend/backend/copilot/sdk/agent_generation_guide.md
+++ b/autogpt_platform/backend/backend/copilot/sdk/agent_generation_guide.md
@@ -53,6 +53,12 @@ Steps:
   or fix manually based on the error descriptions. Iterate until valid.
 8. **Save**: Call `create_agent` (new) or `edit_agent` (existing) with
   the final `agent_json`
+9. **Dry-run**: ALWAYS call `run_agent` with `dry_run=True` and
+   `wait_for_result=120` to verify the agent works end-to-end.
+10. **Inspect & fix**: Check the dry-run output for errors. If issues are
+   found, call `edit_agent` to fix and dry-run again. Repeat until the
+   simulation passes or the problems are clearly unfixable.
+   See "REQUIRED: Dry-Run Verification Loop" section below for details.

 ### Agent JSON Structure

@@ -246,19 +252,51 @@ call in a loop until the task is complete:
 Regular blocks work exactly like sub-agents as tools — wire each input
 field from `source_name: "tools"` on the Orchestrator side.

-### Testing with Dry Run
+### REQUIRED: Dry-Run Verification Loop (create -> dry-run -> fix)

-After saving an agent, suggest a dry run to validate wiring without consuming
-real API calls, credentials, or credits:
+After creating or editing an agent, you MUST dry-run it before telling the
+user the agent is ready. NEVER skip this step.

-1. **Run**: Call `run_agent` or `run_block` with `dry_run=True` and provide
-   sample inputs. This executes the graph with mock outputs, verifying that
-   links resolve correctly and required inputs are satisfied.
-2. **Check results**: Call `view_agent_output` with `show_execution_details=True`
-   to inspect the full node-by-node execution trace. This shows what each node
-   received as input and produced as output, making it easy to spot wiring issues.
-3. **Iterate**: If the dry run reveals wiring issues or missing inputs, fix
-   the agent JSON and re-save before suggesting a real execution.
+#### Step-by-step workflow
+
+1. **Create/Edit**: Call `create_agent` or `edit_agent` to save the agent.
+2. **Dry-run**: Call `run_agent` with `dry_run=True`, `wait_for_result=120`,
+   and realistic sample inputs that exercise every path in the agent. This
+   simulates execution using an LLM for each block — no real API calls,
+   credentials, or credits are consumed.
+3. **Inspect output**: Examine the dry-run result for problems. If
+   `wait_for_result` returns only a summary, call
+   `view_agent_output(execution_id=..., show_execution_details=True)` to
+   see the full node-by-node execution trace. Look for:
+   - **Errors / failed nodes** — a node raised an exception or returned an
+     error status. Common causes: wrong `source_name`/`sink_name` in links,
+     missing `input_default` values, or referencing a nonexistent block output.
+   - **Null / empty outputs** — data did not flow through a link. Verify that
+     `source_name` and `sink_name` match the block schemas exactly (case-
+     sensitive, including nested `_#_` notation).
+   - **Nodes that never executed** — the node was not reached. Likely a
+     missing or broken link from an upstream node.
+   - **Unexpected values** — data arrived but in the wrong type or
+     structure. Check type compatibility between linked ports.
+4. **Fix**: If any issues are found, call `edit_agent` with the corrected
+   agent JSON, then go back to step 2.
+5. **Repeat**: Continue the dry-run -> fix cycle until the simulation passes
+   or the problems are clearly unfixable. If you stop making progress,
+   report the remaining issues to the user and ask for guidance.
+
+#### Good vs bad dry-run output
+
+**Good output** (agent is ready):
+- All nodes executed successfully (no errors in the execution trace)
+- Data flows through every link with non-null, correctly-typed values
+- The final `AgentOutputBlock` contains a meaningful result
+- Status is `COMPLETED`
+
+**Bad output** (needs fixing):
+- Status is `FAILED` — check the error message for the failing node
+- An output node received `null` — trace back to find the broken link
+- A node received data in the wrong format (e.g. string where list expected)
+- Nodes downstream of a failing node were skipped entirely

 **Special block behaviour in dry-run mode:**
 - **OrchestratorBlock** and **AgentExecutorBlock** execute for real so the
--- a/autogpt_platform/backend/backend/copilot/sdk/env.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/env.py
@@ -8,6 +8,8 @@ circular import through ``executor`` → ``credit`` → ``block_cost_config``).

 from __future__ import annotations

+import re
+
 from backend.copilot.config import ChatConfig
 from backend.copilot.sdk.subscription import validate_subscription

@@ -26,14 +28,14 @@ def build_sdk_env(

    Three modes (checked in order):
    1. **Subscription** — clears all keys; CLI uses ``claude login`` auth.
-    2. **Direct Anthropic** — returns ``{}``; subprocess inherits
-       ``ANTHROPIC_API_KEY`` from the parent environment.
+    2. **Direct Anthropic** — subprocess inherits ``ANTHROPIC_API_KEY``
+       from the parent environment (no overrides needed).
    3. **OpenRouter** (default) — overrides base URL and auth token to
       route through the proxy, with Langfuse trace headers.

-    When *sdk_cwd* is provided, ``CLAUDE_CODE_TMPDIR`` is set so that
-    the CLI writes temp/sub-agent output inside the per-session workspace
-    directory rather than an inaccessible system temp path.
+    All modes receive workspace isolation (``CLAUDE_CODE_TMPDIR``) and
+    security hardening env vars to prevent CLAUDE.md loading, prompt
+    history persistence, auto-memory writes, and non-essential traffic.
    """
    # --- Mode 1: Claude Code subscription auth ---
    if config.use_claude_code_subscription:
@@ -43,40 +45,51 @@ def build_sdk_env(
            "ANTHROPIC_AUTH_TOKEN": "",
            "ANTHROPIC_BASE_URL": "",
        }
-        if sdk_cwd:
-            env["CLAUDE_CODE_TMPDIR"] = sdk_cwd
-        return env

    # --- Mode 2: Direct Anthropic (no proxy hop) ---
-    if not config.openrouter_active:
+    elif not config.openrouter_active:
        env = {}
-        if sdk_cwd:
-            env["CLAUDE_CODE_TMPDIR"] = sdk_cwd
-        return env

    # --- Mode 3: OpenRouter proxy ---
-    base = (config.base_url or "").rstrip("/")
-    if base.endswith("/v1"):
-        base = base[:-3]
-    env = {
-        "ANTHROPIC_BASE_URL": base,
-        "ANTHROPIC_AUTH_TOKEN": config.api_key or "",
-        "ANTHROPIC_API_KEY": "",  # force CLI to use AUTH_TOKEN
-    }
+    else:
+        base = (config.base_url or "").rstrip("/")
+        if base.endswith("/v1"):
+            base = base[:-3]
+        env = {
+            "ANTHROPIC_BASE_URL": base,
+            "ANTHROPIC_AUTH_TOKEN": config.api_key or "",
+            "ANTHROPIC_API_KEY": "",  # force CLI to use AUTH_TOKEN
+        }

-    # Inject broadcast headers so OpenRouter forwards traces to Langfuse.
-    def _safe(v: str) -> str:
-        return v.replace("\r", "").replace("\n", "").strip()[:128]
+        # Inject broadcast headers so OpenRouter forwards traces to Langfuse.
+        def _safe(v: str) -> str:
+            # Keep only printable ASCII (0x20–0x7e); strip control chars,
+            # null bytes, and non-ASCII to produce a valid HTTP header value
+            # (RFC 7230 §3.2.6).
+            return re.sub(r"[^\x20-\x7e]", "", v).strip()[:128]

-    parts = []
-    if session_id:
-        parts.append(f"x-session-id: {_safe(session_id)}")
-    if user_id:
-        parts.append(f"x-user-id: {_safe(user_id)}")
-    if parts:
-        env["ANTHROPIC_CUSTOM_HEADERS"] = "\n".join(parts)
+        parts = []
+        if session_id:
+            parts.append(f"x-session-id: {_safe(session_id)}")
+        if user_id:
+            parts.append(f"x-user-id: {_safe(user_id)}")
+        if parts:
+            env["ANTHROPIC_CUSTOM_HEADERS"] = "\n".join(parts)

+    # --- Common: workspace isolation + security hardening (all modes) ---
+    # Route subagent temp files into the per-session workspace so output
+    # files are accessible (fixes /tmp/claude-0/ permission errors in E2B).
    if sdk_cwd:
        env["CLAUDE_CODE_TMPDIR"] = sdk_cwd

+    # Harden multi-tenant deployment: prevent loading untrusted workspace
+    # CLAUDE.md files, persisting prompt history, writing auto-memory,
+    # and sending non-essential telemetry traffic.
+    # These are undocumented CLI internals validated against
+    # claude-agent-sdk 0.1.45 — re-verify when upgrading the SDK.
+    env["CLAUDE_CODE_DISABLE_CLAUDE_MDS"] = "1"
+    env["CLAUDE_CODE_SKIP_PROMPT_HISTORY"] = "1"
+    env["CLAUDE_CODE_DISABLE_AUTO_MEMORY"] = "1"
+    env["CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC"] = "1"
+
    return env
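Note on the header sanitiser in the `env.py` hunk above: the following is a standalone sketch of the same idea, for trying it outside the module. `safe_header_value` and `build_custom_headers` are illustrative names, not functions in the repository. The regex and the 128-character cap are copied from the diff's `_safe` helper, and the newline-joined header format mirrors what the hunk writes to `ANTHROPIC_CUSTOM_HEADERS`.

```python
from __future__ import annotations

import re

# Same cap as the _safe helper in the diff.
_MAX_LEN = 128


def safe_header_value(value: str) -> str:
    """Keep only printable ASCII (0x20-0x7e), trim whitespace, cap the length."""
    return re.sub(r"[^\x20-\x7e]", "", value).strip()[:_MAX_LEN]


def build_custom_headers(session_id: str | None, user_id: str | None) -> str | None:
    """Return newline-separated header lines, or None when no ID is given."""
    parts = []
    if session_id:
        parts.append(f"x-session-id: {safe_header_value(session_id)}")
    if user_id:
        parts.append(f"x-user-id: {safe_header_value(user_id)}")
    return "\n".join(parts) if parts else None
```

Because CR and LF fall outside the allowed range, a value such as `"sess\r\nx-evil: 1"` collapses onto a single line and cannot start a second header, which is presumably the point of tightening the old `replace("\r", "")` approach.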
--- a/autogpt_platform/backend/backend/copilot/sdk/env_test.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/env_test.py
@@ -41,11 +41,9 @@ class TestBuildSdkEnvSubscription:

            result = build_sdk_env()

-        assert result == {
-            "ANTHROPIC_API_KEY": "",
-            "ANTHROPIC_AUTH_TOKEN": "",
-            "ANTHROPIC_BASE_URL": "",
-        }
+        assert result["ANTHROPIC_API_KEY"] == ""
+        assert result["ANTHROPIC_AUTH_TOKEN"] == ""
+        assert result["ANTHROPIC_BASE_URL"] == ""
        mock_validate.assert_called_once()

    @patch(
@@ -68,18 +66,20 @@ class TestBuildSdkEnvSubscription:


 class TestBuildSdkEnvDirectAnthropic:
-    """When OpenRouter is inactive, return empty dict (inherit parent env)."""
+    """When OpenRouter is inactive, no ANTHROPIC_* overrides (inherit parent env)."""

-    def test_returns_empty_dict_when_openrouter_inactive(self):
+    def test_no_anthropic_key_overrides_when_openrouter_inactive(self):
        cfg = _make_config(use_openrouter=False)
        with patch("backend.copilot.sdk.env.config", cfg):
            from backend.copilot.sdk.env import build_sdk_env

            result = build_sdk_env()

-        assert result == {}
+        assert "ANTHROPIC_API_KEY" not in result
+        assert "ANTHROPIC_AUTH_TOKEN" not in result
+        assert "ANTHROPIC_BASE_URL" not in result

-    def test_returns_empty_dict_when_openrouter_flag_true_but_no_key(self):
+    def test_no_anthropic_key_overrides_when_openrouter_flag_true_but_no_key(self):
        """OpenRouter flag is True but no api_key => openrouter_active is False."""
        cfg = _make_config(use_openrouter=True, base_url="https://openrouter.ai/api/v1")
        # Force api_key to None after construction (field_validator may pick up env vars)
@@ -90,7 +90,9 @@ class TestBuildSdkEnvDirectAnthropic:

            result = build_sdk_env()

-        assert result == {}
+        assert "ANTHROPIC_API_KEY" not in result
+        assert "ANTHROPIC_AUTH_TOKEN" not in result
+        assert "ANTHROPIC_BASE_URL" not in result


 # ---------------------------------------------------------------------------
@@ -234,12 +236,12 @@ class TestBuildSdkEnvModePriority:

            result = build_sdk_env()

-        # Should get subscription result, not OpenRouter
-        assert result == {
-            "ANTHROPIC_API_KEY": "",
-            "ANTHROPIC_AUTH_TOKEN": "",
-            "ANTHROPIC_BASE_URL": "",
-        }
+        # Should get subscription result (blanked keys), not OpenRouter proxy
+        assert result["ANTHROPIC_API_KEY"] == ""
+        assert result["ANTHROPIC_AUTH_TOKEN"] == ""
+        assert result["ANTHROPIC_BASE_URL"] == ""
+        # OpenRouter-specific key must NOT be present
+        assert "ANTHROPIC_CUSTOM_HEADERS" not in result


 # ---------------------------------------------------------------------------
--- a/autogpt_platform/backend/backend/copilot/sdk/mcp_tool_guide.md
+++ b/autogpt_platform/backend/backend/copilot/sdk/mcp_tool_guide.md
@@ -28,13 +28,12 @@ Each result includes a `remotes` array with the exact server URL to use.

 ### Important: Check blocks first

-Before using `run_mcp_tool`, always check if the platform already has blocks for the service
-using `find_block`. The platform has hundreds of built-in blocks (Google Sheets, Google Docs,
-Google Calendar, Gmail, etc.) that work without MCP setup.
+Always follow the **Tool Discovery Priority** described in the tool notes:
+call `find_block` before resorting to `run_mcp_tool`.

 Only use `run_mcp_tool` when:
- The service is in the known hosted MCP servers list above, OR
- You searched `find_block` first and found no matching blocks
+- You searched `find_block` first and found no matching blocks, AND
+- The service is in the known hosted MCP servers list above or found via the registry API

 **Never guess or construct MCP server URLs.** Only use URLs from the known servers list above
 or from the `remotes[].url` field in MCP registry search results.
--- a/autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/p0_guardrails_test.py
@@ -0,0 +1,535 @@
+"""Tests for P0 guardrails: _resolve_fallback_model, security env vars, TMPDIR."""
+
+from unittest.mock import patch
+
+import pytest
+from pydantic import ValidationError
+
+from backend.copilot.config import ChatConfig
+from backend.copilot.constants import is_transient_api_error
+
+
+def _make_config(**overrides) -> ChatConfig:
+    """Create a ChatConfig with safe defaults, applying *overrides*."""
+    defaults = {
+        "use_claude_code_subscription": False,
+        "use_openrouter": False,
+        "api_key": None,
+        "base_url": None,
+    }
+    defaults.update(overrides)
+    return ChatConfig(**defaults)
+
+
+# ---------------------------------------------------------------------------
+# _resolve_fallback_model
+# ---------------------------------------------------------------------------
+
+_SVC = "backend.copilot.sdk.service"
+_ENV = "backend.copilot.sdk.env"
+
+
+class TestResolveFallbackModel:
+    """Provider-aware fallback model resolution."""
+
+    def test_returns_none_when_empty(self):
+        cfg = _make_config(claude_agent_fallback_model="")
+        with patch(f"{_SVC}.config", cfg):
+            from backend.copilot.sdk.service import _resolve_fallback_model
+
+            assert _resolve_fallback_model() is None
+
+    def test_strips_provider_prefix(self):
+        """OpenRouter-style 'anthropic/claude-sonnet-4-...' is stripped."""
+        cfg = _make_config(
+            claude_agent_fallback_model="anthropic/claude-sonnet-4-20250514",
+            use_openrouter=True,
+            api_key="sk-test",
+            base_url="https://openrouter.ai/api/v1",
+        )
+        with patch(f"{_SVC}.config", cfg):
+            from backend.copilot.sdk.service import _resolve_fallback_model
+
+            result = _resolve_fallback_model()
+
+        assert result == "claude-sonnet-4-20250514"
+        assert "/" not in result
+
+    def test_dots_replaced_for_direct_anthropic(self):
+        """Direct Anthropic requires hyphen-separated versions."""
+        cfg = _make_config(
+            claude_agent_fallback_model="claude-sonnet-4.5-20250514",
+            use_openrouter=False,
+        )
+        with patch(f"{_SVC}.config", cfg):
+            from backend.copilot.sdk.service import _resolve_fallback_model
+
+            result = _resolve_fallback_model()
+
+        assert result is not None
+        assert "." not in result
+        assert result == "claude-sonnet-4-5-20250514"
+
+    def test_dots_preserved_for_openrouter(self):
+        """OpenRouter uses dot-separated versions — don't normalise."""
+        cfg = _make_config(
+            claude_agent_fallback_model="claude-sonnet-4.5-20250514",
+            use_openrouter=True,
+            api_key="sk-test",
+            base_url="https://openrouter.ai/api/v1",
+        )
+        with patch(f"{_SVC}.config", cfg):
+            from backend.copilot.sdk.service import _resolve_fallback_model
+
+            result = _resolve_fallback_model()
+
+        assert result == "claude-sonnet-4.5-20250514"
+
+    def test_default_value(self):
+        """Default fallback model resolves to a valid string."""
+        cfg = _make_config()
+        with patch(f"{_SVC}.config", cfg):
+            from backend.copilot.sdk.service import _resolve_fallback_model
+
+            result = _resolve_fallback_model()
+
+        assert result is not None
+        assert "sonnet" in result.lower() or "claude" in result.lower()
+
+
+# ---------------------------------------------------------------------------
+# Security & isolation env vars
+# ---------------------------------------------------------------------------
+
+
+_SECURITY_VARS = (
+    "CLAUDE_CODE_DISABLE_CLAUDE_MDS",
+    "CLAUDE_CODE_SKIP_PROMPT_HISTORY",
+    "CLAUDE_CODE_DISABLE_AUTO_MEMORY",
+    "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC",
+)
+
+
+class TestSecurityEnvVars:
+    """Verify security env vars are set in the returned dict for every auth mode.
+
+    Tests call ``build_sdk_env()`` directly and assert the vars are present
+    in the returned dict — not just present somewhere in the source file.
+    """
+
+    def test_security_vars_set_in_openrouter_mode(self):
+        """Mode 3 (OpenRouter): security vars must be in the returned env."""
+        cfg = _make_config(
+            use_claude_code_subscription=False,
+            use_openrouter=True,
+            api_key="sk-or-test",
+            base_url="https://openrouter.ai/api/v1",
+        )
+        with patch(f"{_ENV}.config", cfg):
+            from backend.copilot.sdk.env import build_sdk_env
+
+            env = build_sdk_env(session_id="s1", user_id="u1")
+
+        for var in _SECURITY_VARS:
+            assert env.get(var) == "1", f"{var} not set in OpenRouter mode"
+
+    def test_security_vars_set_in_direct_anthropic_mode(self):
+        """Mode 2 (direct Anthropic): security vars must be in the returned env."""
+        cfg = _make_config(use_claude_code_subscription=False, use_openrouter=False)
+        with patch(f"{_ENV}.config", cfg):
+            from backend.copilot.sdk.env import build_sdk_env
+
+            env = build_sdk_env()
+
+        for var in _SECURITY_VARS:
+            assert env.get(var) == "1", f"{var} not set in direct Anthropic mode"
+
+    def test_security_vars_set_in_subscription_mode(self):
+        """Mode 1 (subscription): security vars must be in the returned env."""
+        cfg = _make_config(use_claude_code_subscription=True)
+        with (
+            patch(f"{_ENV}.config", cfg),
+            patch(f"{_ENV}.validate_subscription"),
+        ):
+            from backend.copilot.sdk.env import build_sdk_env
+
+            env = build_sdk_env(session_id="s1", user_id="u1")
+
+        for var in _SECURITY_VARS:
+            assert env.get(var) == "1", f"{var} not set in subscription mode"
+
+    def test_tmpdir_set_when_sdk_cwd_provided(self):
+        """CLAUDE_CODE_TMPDIR must be set when sdk_cwd is provided."""
+        cfg = _make_config(use_openrouter=False)
+        with patch(f"{_ENV}.config", cfg):
+            from backend.copilot.sdk.env import build_sdk_env
+
+            env = build_sdk_env(sdk_cwd="/workspace/session-1")
+
+        assert env.get("CLAUDE_CODE_TMPDIR") == "/workspace/session-1"
+
+    def test_tmpdir_absent_when_sdk_cwd_not_provided(self):
+        """CLAUDE_CODE_TMPDIR must NOT be set when sdk_cwd is None."""
+        cfg = _make_config(use_openrouter=False)
+        with patch(f"{_ENV}.config", cfg):
+            from backend.copilot.sdk.env import build_sdk_env
+
+            env = build_sdk_env()
+
+        assert "CLAUDE_CODE_TMPDIR" not in env
+
+    def test_home_not_overridden(self):
+        """HOME must NOT be overridden — would break git/ssh/npm in subprocesses."""
+        cfg = _make_config(use_openrouter=False)
+        with patch(f"{_ENV}.config", cfg):
+            from backend.copilot.sdk.env import build_sdk_env
+
+            env = build_sdk_env()
+
+        assert "HOME" not in env
+
+
+# ---------------------------------------------------------------------------
+# Config defaults
+# ---------------------------------------------------------------------------
+
+
+class TestConfigDefaults:
+    """Verify ChatConfig P0 fields have correct defaults."""
+
+    def test_fallback_model_default(self):
+        cfg = _make_config()
+        assert cfg.claude_agent_fallback_model
+        assert "sonnet" in cfg.claude_agent_fallback_model.lower()
+
+    def test_max_turns_default(self):
+        cfg = _make_config()
+        assert cfg.claude_agent_max_turns == 1000
+
+    def test_max_budget_usd_default(self):
+        cfg = _make_config()
+        assert cfg.claude_agent_max_budget_usd == 100.0
+
+    def test_max_transient_retries_default(self):
+        cfg = _make_config()
+        assert cfg.claude_agent_max_transient_retries == 3
+
+
+# ---------------------------------------------------------------------------
+# build_sdk_env — all 3 auth modes
+# ---------------------------------------------------------------------------
+
+
+class TestBuildSdkEnv:
+    """Verify build_sdk_env returns correct dicts for each auth mode."""
+
+    def test_subscription_mode_clears_keys(self):
+        """Mode 1: subscription clears API key / auth token / base URL."""
+        cfg = _make_config(use_claude_code_subscription=True)
+        with (
+            patch(f"{_ENV}.config", cfg),
+            patch(f"{_ENV}.validate_subscription"),
+        ):
+            from backend.copilot.sdk.env import build_sdk_env
+
+            env = build_sdk_env(session_id="s1", user_id="u1")
+
+        assert env["ANTHROPIC_API_KEY"] == ""
+        assert env["ANTHROPIC_AUTH_TOKEN"] == ""
+        assert env["ANTHROPIC_BASE_URL"] == ""
+
+    def test_direct_anthropic_inherits_api_key(self):
+        """Mode 2: direct Anthropic doesn't set ANTHROPIC_* keys (inherits from parent)."""
+        cfg = _make_config(
+            use_claude_code_subscription=False,
+            use_openrouter=False,
+        )
+        with patch(f"{_ENV}.config", cfg):
+            from backend.copilot.sdk.env import build_sdk_env
+
+            env = build_sdk_env()
+
+        assert "ANTHROPIC_API_KEY" not in env
+        assert "ANTHROPIC_AUTH_TOKEN" not in env
+        assert "ANTHROPIC_BASE_URL" not in env
+
+    def test_openrouter_sets_base_url_and_auth(self):
+        """Mode 3: OpenRouter sets base URL, auth token, and clears API key."""
+        cfg = _make_config(
+            use_claude_code_subscription=False,
+            use_openrouter=True,
+            api_key="sk-or-test",
+            base_url="https://openrouter.ai/api/v1",
+        )
+        with patch(f"{_ENV}.config", cfg):
+            from backend.copilot.sdk.env import build_sdk_env
+
+            env = build_sdk_env(session_id="sess-1", user_id="user-1")
+
+        assert env["ANTHROPIC_BASE_URL"] == "https://openrouter.ai/api"
+        assert env["ANTHROPIC_AUTH_TOKEN"] == "sk-or-test"
+        assert env["ANTHROPIC_API_KEY"] == ""
+        assert "x-session-id: sess-1" in env["ANTHROPIC_CUSTOM_HEADERS"]
+        assert "x-user-id: user-1" in env["ANTHROPIC_CUSTOM_HEADERS"]
+
+    def test_openrouter_no_headers_when_ids_empty(self):
+        """Mode 3: No custom headers when session_id/user_id are not given."""
+        cfg = _make_config(
+            use_claude_code_subscription=False,
+            use_openrouter=True,
+            api_key="sk-or-test",
+            base_url="https://openrouter.ai/api/v1",
+        )
+        with patch(f"{_ENV}.config", cfg):
+            from backend.copilot.sdk.env import build_sdk_env
+
+            env = build_sdk_env()
+
+        assert "ANTHROPIC_CUSTOM_HEADERS" not in env
+
+    def test_all_modes_return_mutable_dict(self):
+        """build_sdk_env must return a mutable dict (not None) in every mode."""
+        for cfg in (
+            _make_config(use_claude_code_subscription=True),
+            _make_config(use_openrouter=False),
+            _make_config(
+                use_openrouter=True,
+                api_key="k",
+                base_url="https://openrouter.ai/api/v1",
+            ),
+        ):
+            with (
+                patch(f"{_ENV}.config", cfg),
+                patch(f"{_ENV}.validate_subscription"),
+            ):
+                from backend.copilot.sdk.env import build_sdk_env
+
+                env = build_sdk_env()
+
+            assert isinstance(env, dict)
+            env["CLAUDE_CODE_TMPDIR"] = "/tmp/test"
+            assert env["CLAUDE_CODE_TMPDIR"] == "/tmp/test"
+
+
+# ---------------------------------------------------------------------------
+# is_transient_api_error
+# ---------------------------------------------------------------------------
+
+
+class TestIsTransientApiError:
+    """Verify that is_transient_api_error detects all transient patterns."""
+
+    @pytest.mark.parametrize(
+        "error_text",
+        [
+            "socket connection was closed unexpectedly",
+            "ECONNRESET",
+            "connection was forcibly closed",
+            "network socket disconnected",
+        ],
+    )
+    def test_connection_level_errors(self, error_text: str):
+        assert is_transient_api_error(error_text)
+
+    @pytest.mark.parametrize(
+        "error_text",
+        [
+            "rate limit exceeded",
+            "rate_limit_error",
+            "Too Many Requests",
+            "status code 429",
+        ],
+    )
+    def test_429_rate_limit_errors(self, error_text: str):
+        assert is_transient_api_error(error_text)
+
+    @pytest.mark.parametrize(
+        "error_text",
+        [
+            # Status-code-specific patterns (preferred — no false-positive risk)
+            "status code 529",
+            "status code 500",
+            "status code 502",
+            "status code 503",
+            "status code 504",
+        ],
+    )
+    def test_5xx_server_errors(self, error_text: str):
+        assert is_transient_api_error(error_text)
+
+    @pytest.mark.parametrize(
+        "error_text",
+        [
+            "invalid_api_key",
+            "Authentication failed",
+            "prompt is too long",
+            "model not found",
+            "",
+            # Natural-language phrases intentionally NOT matched — they are too
+            # broad and could appear in application-level SDK messages unrelated
+            # to Anthropic API transient conditions.
+            "API is overloaded",
+            "Internal Server Error",
+            "Bad Gateway",
+            "Service Unavailable",
+            "Gateway Timeout",
+        ],
+    )
+    def test_non_transient_errors(self, error_text: str):
+        assert not is_transient_api_error(error_text)
+
+    def test_case_insensitive(self):
+        assert is_transient_api_error("SOCKET CONNECTION WAS CLOSED UNEXPECTEDLY")
+        assert is_transient_api_error("econnreset")
+
+
+# ---------------------------------------------------------------------------
+# _HandledStreamError.already_yielded contract
+# ---------------------------------------------------------------------------
+
+
+class TestHandledStreamErrorAlreadyYielded:
+    """Verify the already_yielded semantics on _HandledStreamError."""
+
+    def test_default_already_yielded_is_true(self):
+        """Non-transient callers (circuit-breaker, idle timeout) don't pass the flag —
+        the default True means the outer loop won't yield a duplicate StreamError."""
+        from backend.copilot.sdk.service import _HandledStreamError
+
+        exc = _HandledStreamError("some error", code="circuit_breaker_empty_tool_calls")
+        assert exc.already_yielded is True
+
+    def test_transient_error_sets_already_yielded_false(self):
+        """Transient errors pass already_yielded=False so the outer loop
+        yields StreamError only once (when retries are exhausted)."""
+        from backend.copilot.sdk.service import _HandledStreamError
+
+        exc = _HandledStreamError(
+            "transient",
+            code="transient_api_error",
+            already_yielded=False,
+        )
+        assert exc.already_yielded is False
+
+    def test_backoff_capped_at_30s(self):
+        """Exponential backoff must be capped at 30 seconds.
+
+        With max_transient_retries=10, uncapped 2^9=512s would stall users
+        for 8+ minutes.  min(30, 2**(n-1)) keeps the ceiling at 30s.
+        """
+        # Check that 2^(10-1)=512 would exceed 30 but min() caps it.
+        assert min(30, 2 ** (10 - 1)) == 30
+        # Verify every backoff in the schedule stays within the 30s cap.
+        backoffs = [min(30, 2 ** (n - 1)) for n in range(1, 11)]
+        assert all(b <= 30 for b in backoffs)
+        assert backoffs[-1] == 30  # last retry is capped
+        assert backoffs[0] == 1  # first retry starts at 1s
+
+
+# ---------------------------------------------------------------------------
+# Config validators for max_turns / max_budget_usd / max_transient_retries
+# ---------------------------------------------------------------------------
+
+
+class TestConfigValidators:
+    """Verify ge/le bounds on max_turns, max_budget_usd, and max_transient_retries."""
+
+    def test_max_turns_rejects_zero(self):
+        with pytest.raises(ValidationError):
+            _make_config(claude_agent_max_turns=0)
+
+    def test_max_turns_rejects_negative(self):
+        with pytest.raises(ValidationError):
+            _make_config(claude_agent_max_turns=-1)
+
+    def test_max_turns_rejects_above_10000(self):
+        with pytest.raises(ValidationError):
+            _make_config(claude_agent_max_turns=10001)
+
+    def test_max_turns_accepts_boundary_values(self):
+        cfg_low = _make_config(claude_agent_max_turns=1)
+        assert cfg_low.claude_agent_max_turns == 1
+        cfg_high = _make_config(claude_agent_max_turns=10000)
+        assert cfg_high.claude_agent_max_turns == 10000
+
+    def test_max_budget_rejects_zero(self):
+        with pytest.raises(ValidationError):
+            _make_config(claude_agent_max_budget_usd=0.0)
+
+    def test_max_budget_rejects_negative(self):
+        with pytest.raises(ValidationError):
+            _make_config(claude_agent_max_budget_usd=-1.0)
+
+    def test_max_budget_rejects_above_1000(self):
+        with pytest.raises(ValidationError):
+            _make_config(claude_agent_max_budget_usd=1000.01)
+
+    def test_max_budget_accepts_boundary_values(self):
+        cfg_low = _make_config(claude_agent_max_budget_usd=0.01)
+        assert cfg_low.claude_agent_max_budget_usd == 0.01
+        cfg_high = _make_config(claude_agent_max_budget_usd=1000.0)
+        assert cfg_high.claude_agent_max_budget_usd == 1000.0
+
+    def test_max_transient_retries_rejects_negative(self):
+        with pytest.raises(ValidationError):
+            _make_config(claude_agent_max_transient_retries=-1)
+
+    def test_max_transient_retries_rejects_above_10(self):
+        with pytest.raises(ValidationError):
+            _make_config(claude_agent_max_transient_retries=11)
+
+    def test_max_transient_retries_accepts_boundary_values(self):
+        cfg_low = _make_config(claude_agent_max_transient_retries=0)
+        assert cfg_low.claude_agent_max_transient_retries == 0
+        cfg_high = _make_config(claude_agent_max_transient_retries=10)
+        assert cfg_high.claude_agent_max_transient_retries == 10
+
+
+# ---------------------------------------------------------------------------
+# transient_exhausted SSE code contract
+# ---------------------------------------------------------------------------
+
+
+class TestTransientExhaustedErrorCode:
+    """Verify transient-exhausted path emits the correct SSE error code."""
+
+    def test_transient_exhausted_uses_transient_api_error_code(self):
+        """When except-Exception transient retries are exhausted, the SSE
+        StreamError must use code='transient_api_error', not 'sdk_stream_error'.
+
+        This ensures the frontend shows the same 'Try again' affordance as
+        the _HandledStreamError path.
+        """
+        from backend.copilot.constants import FRIENDLY_TRANSIENT_MSG
+
+        # Simulate the post-loop branching logic extracted from service.py
+        attempts_exhausted = False
+        transient_exhausted = True
+        stream_err: Exception | None = ConnectionResetError("ECONNRESET")
+
+        if attempts_exhausted:
+            error_code = "all_attempts_exhausted"
+            error_text = "conversation too long"
+        elif transient_exhausted:
+            error_code = "transient_api_error"
+            error_text = FRIENDLY_TRANSIENT_MSG
+        else:
+            error_code = "sdk_stream_error"
+            error_text = f"SDK stream error: {stream_err}"
+
+        assert error_code == "transient_api_error"
+        assert error_text == FRIENDLY_TRANSIENT_MSG
+
+    def test_non_transient_exhausted_uses_sdk_stream_error_code(self):
+        """Non-transient fatal errors (e.g. auth failures) keep 'sdk_stream_error'."""
+        attempts_exhausted = False
+        transient_exhausted = False
+
+        if attempts_exhausted:
+            error_code = "all_attempts_exhausted"
+        elif transient_exhausted:
+            error_code = "transient_api_error"
+        else:
+            error_code = "sdk_stream_error"
+
+        assert error_code == "sdk_stream_error"
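Note on the two tests above that re-implement service logic inline (`test_backoff_capped_at_30s` and the `TestTransientExhaustedErrorCode` cases): a minimal sketch of how that logic could be expressed as small pure functions, so tests can call it instead of copying the branches. `backoff_seconds` and `select_stream_error` are hypothetical names and are not part of `service.py`. The message constant here is a placeholder for `FRIENDLY_TRANSIENT_MSG`, and `"conversation too long"` is the same stand-in text the test uses.

```python
from __future__ import annotations

# Placeholder only; the real text lives in backend.copilot.constants.
FRIENDLY_TRANSIENT_MSG = "<friendly transient message>"


def backoff_seconds(attempt: int, cap: int = 30) -> int:
    """Delay before retry `attempt` (1-based): 1, 2, 4, ... capped at `cap`."""
    if attempt < 1:
        raise ValueError("attempt is 1-based")
    return min(cap, 2 ** (attempt - 1))


def select_stream_error(
    attempts_exhausted: bool,
    transient_exhausted: bool,
    stream_err: Exception | None,
) -> tuple[str, str]:
    """Pick the (code, text) pair for the post-loop StreamError."""
    if attempts_exhausted:
        return "all_attempts_exhausted", "conversation too long"
    if transient_exhausted:
        return "transient_api_error", FRIENDLY_TRANSIENT_MSG
    return "sdk_stream_error", f"SDK stream error: {stream_err}"
```

With helpers of this shape, a test could assert on the full schedule (including that it never decreases) and on each branch of the error selection without duplicating the `if`/`elif` chain.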
--- a/autogpt_platform/backend/backend/copilot/sdk/prompt_too_long_test.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/prompt_too_long_test.py
@@ -8,20 +8,19 @@ from uuid import uuid4

 import pytest

-from backend.util import json
-from backend.util.prompt import CompressResult
-
-from .conftest import build_test_transcript as _build_transcript
-from .service import _friendly_error_text, _is_prompt_too_long
-from .transcript import (
+from backend.copilot.transcript import (
    _flatten_assistant_content,
    _flatten_tool_result_content,
    _messages_to_transcript,
    _run_compression,
    _transcript_to_messages,
-    compact_transcript,
-    validate_transcript,
 )
+from backend.util import json
+from backend.util.prompt import CompressResult
+
+from .conftest import build_test_transcript as _build_transcript
+from .service import _friendly_error_text, _is_prompt_too_long
+from .transcript import compact_transcript, validate_transcript

 # ---------------------------------------------------------------------------
 # _flatten_assistant_content
@@ -403,7 +402,7 @@ class TestCompactTranscript:
            },
        )()
        with patch(
-            "backend.copilot.sdk.transcript._run_compression",
+            "backend.copilot.transcript._run_compression",
            new_callable=AsyncMock,
            return_value=mock_result,
        ):
@@ -438,7 +437,7 @@ class TestCompactTranscript:
            },
        )()
        with patch(
-            "backend.copilot.sdk.transcript._run_compression",
+            "backend.copilot.transcript._run_compression",
            new_callable=AsyncMock,
            return_value=mock_result,
        ):
@@ -462,7 +461,7 @@ class TestCompactTranscript:
            ]
        )
        with patch(
-            "backend.copilot.sdk.transcript._run_compression",
+            "backend.copilot.transcript._run_compression",
            new_callable=AsyncMock,
            side_effect=RuntimeError("LLM unavailable"),
        ):
@@ -568,11 +567,11 @@ class TestRunCompressionTimeout:

        with (
            patch(
-                "backend.copilot.sdk.transcript.get_openai_client",
+                "backend.copilot.transcript.get_openai_client",
                return_value="fake-client",
            ),
            patch(
-                "backend.copilot.sdk.transcript.compress_context",
+                "backend.copilot.transcript.compress_context",
                side_effect=_mock_compress,
            ),
        ):
@@ -602,11 +601,11 @@ class TestRunCompressionTimeout:

        with (
            patch(
-                "backend.copilot.sdk.transcript.get_openai_client",
+                "backend.copilot.transcript.get_openai_client",
                return_value=None,
            ),
            patch(
-                "backend.copilot.sdk.transcript.compress_context",
+                "backend.copilot.transcript.compress_context",
                new_callable=AsyncMock,
                return_value=truncation_result,
            ) as mock_compress,
--- a/autogpt_platform/backend/backend/copilot/sdk/response_adapter_test.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/response_adapter_test.py
@@ -260,13 +260,13 @@ def test_result_error_emits_error_and_finish():
        is_error=True,
        num_turns=0,
        session_id="s1",
-        result="API rate limited",
+        result="Invalid API key provided",
    )
    results = adapter.convert_message(msg)
    # No step was open, so no FinishStep — just Error + Finish
    assert len(results) == 2
    assert isinstance(results[0], StreamError)
-    assert "API rate limited" in results[0].errorText
+    assert "Invalid API key provided" in results[0].errorText
    assert isinstance(results[1], StreamFinish)


--- a/autogpt_platform/backend/backend/copilot/sdk/retry_scenarios_test.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/retry_scenarios_test.py
@@ -26,18 +26,17 @@ from unittest.mock import AsyncMock, MagicMock, patch

 import pytest

-from backend.util import json
-
-from .conftest import build_test_transcript as _build_transcript
-from .service import _MAX_STREAM_ATTEMPTS, _reduce_context
-from .transcript import (
+from backend.copilot.transcript import (
    _flatten_assistant_content,
    _flatten_tool_result_content,
    _messages_to_transcript,
    _transcript_to_messages,
-    compact_transcript,
-    validate_transcript,
 )
+from backend.util import json
+
+from .conftest import build_test_transcript as _build_transcript
+from .service import _MAX_STREAM_ATTEMPTS, _reduce_context
+from .transcript import compact_transcript, validate_transcript
 from .transcript_builder import TranscriptBuilder

 # ---------------------------------------------------------------------------
@@ -113,7 +112,7 @@ class TestScenarioCompactAndRetry:
                )(),
            ),
            patch(
-                "backend.copilot.sdk.transcript._run_compression",
+                "backend.copilot.transcript._run_compression",
                new_callable=AsyncMock,
                return_value=mock_result,
            ),
@@ -170,7 +169,7 @@ class TestScenarioCompactFailsFallback:
                )(),
            ),
            patch(
-                "backend.copilot.sdk.transcript._run_compression",
+                "backend.copilot.transcript._run_compression",
                new_callable=AsyncMock,
                side_effect=RuntimeError("LLM unavailable"),
            ),
@@ -261,7 +260,7 @@ class TestScenarioDoubleFailDBFallback:
                )(),
            ),
            patch(
-                "backend.copilot.sdk.transcript._run_compression",
+                "backend.copilot.transcript._run_compression",
                new_callable=AsyncMock,
                return_value=mock_result,
            ),
@@ -337,7 +336,7 @@ class TestScenarioCompactionIdentical:
                )(),
            ),
            patch(
-                "backend.copilot.sdk.transcript._run_compression",
+                "backend.copilot.transcript._run_compression",
                new_callable=AsyncMock,
                return_value=mock_result,
            ),
@@ -730,7 +729,7 @@ class TestRetryEdgeCases:
                )(),
            ),
            patch(
-                "backend.copilot.sdk.transcript._run_compression",
+                "backend.copilot.transcript._run_compression",
                new_callable=AsyncMock,
                return_value=mock_result,
            ),
@@ -841,7 +840,7 @@ class TestRetryStateReset:
                )(),
            ),
            patch(
-                "backend.copilot.sdk.transcript._run_compression",
+                "backend.copilot.transcript._run_compression",
                new_callable=AsyncMock,
                side_effect=RuntimeError("boom"),
            ),
@@ -1405,9 +1404,9 @@ class TestStreamChatCompletionRetryIntegration:
                events.append(event)

        # Should NOT retry — only 1 attempt for auth errors
-        assert attempt_count[0] == 1, (
-            f"Expected 1 attempt (no retry for auth error), " f"got {attempt_count[0]}"
-        )
+        assert (
+            attempt_count[0] == 1
+        ), f"Expected 1 attempt (no retry for auth error), got {attempt_count[0]}"
        errors = [e for e in events if isinstance(e, StreamError)]
        assert errors, "Expected StreamError"
        assert errors[0].code == "sdk_stream_error"
--- a/autogpt_platform/backend/backend/copilot/sdk/sdk_compat_test.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/sdk_compat_test.py
@@ -105,6 +105,10 @@ def test_agent_options_accepts_all_our_fields():
        "env",
        "resume",
        "max_buffer_size",
+        "stderr",
+        "fallback_model",
+        "max_turns",
+        "max_budget_usd",
    ]
    sig = inspect.signature(ClaudeAgentOptions)
    for field in fields_we_use:
--- a/autogpt_platform/backend/backend/copilot/sdk/service.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/service.py
@@ -33,12 +33,24 @@ from pydantic import BaseModel

 from backend.copilot.context import get_workspace_manager
 from backend.copilot.permissions import apply_tool_permissions
+from backend.copilot.rate_limit import get_user_tier
+from backend.copilot.transcript import (
+    _run_compression,
+    cleanup_stale_project_dirs,
+    compact_transcript,
+    download_transcript,
+    read_compacted_entries,
+    upload_transcript,
+    validate_transcript,
+    write_transcript_to_tempfile,
+)
+from backend.copilot.transcript_builder import TranscriptBuilder
 from backend.data.redis_client import get_redis_async
 from backend.executor.cluster_lock import AsyncClusterLock
 from backend.util.exceptions import NotFoundError
 from backend.util.settings import Settings

-from ..config import ChatConfig
+from ..config import ChatConfig, CopilotMode
 from ..constants import (
    COPILOT_ERROR_PREFIX,
    COPILOT_RETRYABLE_ERROR_PREFIX,
@@ -51,6 +63,7 @@ from ..model import (
    ChatMessage,
    ChatSession,
    get_chat_session,
+    maybe_append_user_message,
    update_session_title,
    upsert_chat_session,
 )
@@ -92,17 +105,6 @@ from .tool_adapter import (
    set_execution_context,
    wait_for_stash,
 )
-from .transcript import (
-    _run_compression,
-    cleanup_stale_project_dirs,
-    compact_transcript,
-    download_transcript,
-    read_compacted_entries,
-    upload_transcript,
-    validate_transcript,
-    write_transcript_to_tempfile,
-)
-from .transcript_builder import TranscriptBuilder

 logger = logging.getLogger(__name__)
 config = ChatConfig()
@@ -129,6 +131,11 @@ _CIRCUIT_BREAKER_ERROR_MSG = (
    "Try breaking your request into smaller parts."
 )

+# Idle timeout: abort the stream if no meaningful SDK message (only heartbeats)
+# arrives for this many seconds. This catches hung tool calls (e.g. WebSearch
+# hanging on a search provider that never responds).
+_IDLE_TIMEOUT_SECONDS = 10 * 60  # 10 minutes
+
 # Patterns that indicate the prompt/request exceeds the model's context limit.
 # Matched case-insensitively against the full exception chain.
 _PROMPT_TOO_LONG_PATTERNS: tuple[str, ...] = (
@@ -538,17 +545,34 @@ async def _iter_sdk_messages(
                pass


+def _normalize_model_name(raw_model: str) -> str:
+    """Normalize a model name for the current routing configuration.
+
+    Applies two transformations shared by both the primary and fallback
+    model resolution paths:
+
+    1. **Strip provider prefix** — OpenRouter-style names like
+       ``"anthropic/claude-opus-4.6"`` are reduced to ``"claude-opus-4.6"``.
+    2. **Dot-to-hyphen conversion** — when *not* routing through OpenRouter
+       the direct Anthropic API requires hyphen-separated versions
+       (``"claude-opus-4-6"``), so dots are replaced with hyphens.
+    """
+    model = raw_model
+    if "/" in model:
+        model = model.split("/", 1)[1]
+    # OpenRouter uses dots in versions (claude-opus-4.6) but the direct
+    # Anthropic API requires hyphens (claude-opus-4-6).  Only normalise
+    # when NOT routing through OpenRouter.
+    if not config.openrouter_active:
+        model = model.replace(".", "-")
+    return model
+
+
 def _resolve_sdk_model() -> str | None:
    """Resolve the model name for the Claude Agent SDK CLI.

    Uses `config.claude_agent_model` if set, otherwise derives from
-    `config.model` by stripping the OpenRouter provider prefix (e.g.,
-    `"anthropic/claude-opus-4.6"` → `"claude-opus-4-6"`).
-
-    OpenRouter uses dot-separated versions (`claude-opus-4.6`) while the
-    direct Anthropic API uses hyphen-separated versions (`claude-opus-4-6`).
-    Normalisation is only applied when the SDK will actually talk to
-    Anthropic directly (not through OpenRouter).
+    `config.model` via :func:`_normalize_model_name`.

    When `use_claude_code_subscription` is enabled and no explicit
    `claude_agent_model` is set, returns `None` so the CLI uses the
@@ -558,15 +582,18 @@ def _resolve_sdk_model() -> str | None:
        return config.claude_agent_model
    if config.use_claude_code_subscription:
        return None
-    model = config.model
-    if "/" in model:
-        model = model.split("/", 1)[1]
-    # OpenRouter uses dots in versions (claude-opus-4.6) but the direct
-    # Anthropic API requires hyphens (claude-opus-4-6).  Only normalise
-    # when NOT routing through OpenRouter.
-    if not config.openrouter_active:
-        model = model.replace(".", "-")
-    return model
+    return _normalize_model_name(config.model)
+
+
+def _resolve_fallback_model() -> str | None:
+    """Resolve the fallback model name via :func:`_normalize_model_name`.
+
+    Returns ``None`` when no fallback is configured (empty string).
+    """
+    raw = config.claude_agent_fallback_model
+    if not raw:
+        return None
+    return _normalize_model_name(raw)


 def _make_sdk_cwd(session_id: str) -> str:
@@ -1056,17 +1083,25 @@ def _dispatch_response(


 class _HandledStreamError(Exception):
-    """Raised by `_run_stream_attempt` after it has already yielded a
-    `StreamError` to the client (e.g. transient API error, circuit breaker).
+    """Raised by `_run_stream_attempt` when an attempt fails and the outer
+    retry loop must roll back session state.

-    This signals the outer retry loop that the attempt failed so it can
-    perform session-message rollback and set the `ended_with_stream_error`
-    flag, **without** yielding a duplicate `StreamError` to the client.
+    Two sub-cases:
+
+    * ``already_yielded=True`` (default) — a ``StreamError`` was already sent
+      to the client inside ``_run_stream_attempt`` (circuit-breaker, idle
+      timeout, etc.).  The outer loop must **not** yield another one.
+    * ``already_yielded=False`` — the error is transient and the outer loop
+      will decide whether to retry or surface the error.  If retrying it
+      yields a ``StreamStatus("retrying…")``; if exhausted it yields the
+      ``StreamError`` itself so the client sees it only once.

    Attributes:
        error_msg: The user-facing error message to persist.
        code: Machine-readable error code (e.g. ``circuit_breaker_empty_tool_calls``).
        retryable: Whether the frontend should offer a retry button.
+        already_yielded: ``True`` when ``StreamError`` was already sent to the
+            client before this exception was raised.
    """

    def __init__(
@@ -1075,11 +1110,13 @@ class _HandledStreamError(Exception):
        error_msg: str | None = None,
        code: str | None = None,
        retryable: bool = True,
+        already_yielded: bool = True,
    ):
        super().__init__(message)
        self.error_msg = error_msg
        self.code = code
        self.retryable = retryable
+        self.already_yielded = already_yielded


@dataclass
@@ -1271,6 +1308,8 @@ async def _run_stream_attempt(
            await client.query(state.query_message, session_id=ctx.session_id)
            state.transcript_builder.append_user(content=ctx.current_message)

+        _last_real_msg_time = time.monotonic()
+
        async for sdk_msg in _iter_sdk_messages(client):
            # Heartbeat sentinel — refresh lock and keep SSE alive
            if sdk_msg is None:
@@ -1278,8 +1317,34 @@ async def _run_stream_attempt(
                for ev in ctx.compaction.emit_start_if_ready():
                    yield ev
                yield StreamHeartbeat()
+
+                # Idle timeout: if no real SDK message for too long, a tool
+                # call is likely hung (e.g. WebSearch provider not responding).
+                idle_seconds = time.monotonic() - _last_real_msg_time
+                if idle_seconds >= _IDLE_TIMEOUT_SECONDS:
+                    logger.error(
+                        "%s Idle timeout after %.0fs with no SDK message — "
+                        "aborting stream (likely hung tool call)",
+                        ctx.log_prefix,
+                        idle_seconds,
+                    )
+                    stream_error_msg = (
+                        "A tool call appears to be stuck "
+                        "(no response for 10 minutes). "
+                        "Please try again."
+                    )
+                    stream_error_code = "idle_timeout"
+                    _append_error_marker(ctx.session, stream_error_msg, retryable=True)
+                    yield StreamError(
+                        errorText=stream_error_msg,
+                        code=stream_error_code,
+                    )
+                    ended_with_stream_error = True
+                    break
                continue

+            _last_real_msg_time = time.monotonic()
+
            logger.info(
                "%s Received: %s %s (unresolved=%d, current=%d, resolved=%d)",
                ctx.log_prefix,
@@ -1342,15 +1407,12 @@ async def _run_stream_attempt(
                    )
                    stream_error_msg = FRIENDLY_TRANSIENT_MSG
                    stream_error_code = "transient_api_error"
-                    _append_error_marker(
-                        ctx.session,
-                        stream_error_msg,
-                        retryable=True,
-                    )
-                    yield StreamError(
-                        errorText=stream_error_msg,
-                        code=stream_error_code,
-                    )
+                    # Do NOT yield StreamError or append error marker here.
+                    # The outer retry loop decides: if a retry is available it
+                    # yields StreamStatus("retrying…"); if retries are exhausted
+                    # it appends the marker and yields StreamError exactly once.
+                    # Yielding StreamError before the retry decision causes the
+                    # client to display an error that is immediately superseded.
                    ended_with_stream_error = True
                    break

@@ -1528,9 +1590,21 @@ async def _run_stream_attempt(
            # --- Intermediate persistence ---
            # Flush session messages to DB periodically so page reloads
            # show progress during long-running turns.
+            #
+            # IMPORTANT: Skip the flush while tool calls are pending
+            # (tool_calls set on assistant but results not yet received).
+            # The DB save is append-only (uses start_sequence), so if we
+            # flush the assistant message before tool_calls are set on it
+            # (text and tool_use arrive as separate SDK events), the
+            # tool_calls update is lost — the next flush starts past it.
            _msgs_since_flush += 1
            now = time.monotonic()
-            if (
+            has_pending_tools = (
+                acc.has_appended_assistant
+                and acc.accumulated_tool_calls
+                and not acc.has_tool_results
+            )
+            if not has_pending_tools and (
                _msgs_since_flush >= _FLUSH_MESSAGE_THRESHOLD
                or (now - _last_flush_time) >= _FLUSH_INTERVAL_SECONDS
            ):
@@ -1611,14 +1685,16 @@ async def _run_stream_attempt(
    ) and not acc.has_appended_assistant:
        ctx.session.messages.append(acc.assistant_response)

-    # If the attempt ended with a transient error that was already surfaced
-    # to the client (StreamError yielded above), raise so the outer retry
-    # loop can rollback session messages and set its error flags properly.
+    # Raise so the outer retry loop can rollback session messages.
+    # already_yielded=False for transient_api_error: StreamError was NOT
+    # sent to the client yet (the outer loop does it when retries are
+    # exhausted, avoiding a premature error flash before the retry).
    if ended_with_stream_error:
        raise _HandledStreamError(
-            "Stream error handled — StreamError already yielded",
+            "Stream error handled",
            error_msg=stream_error_msg,
            code=stream_error_code,
+            already_yielded=(stream_error_code != "transient_api_error"),
        )


@@ -1630,6 +1706,7 @@ async def stream_chat_completion_sdk(
    session: ChatSession | None = None,
    file_ids: list[str] | None = None,
    permissions: "CopilotPermissions | None" = None,
+    mode: CopilotMode | None = None,
    **_kwargs: Any,
 ) -> AsyncIterator[StreamBaseResponse]:
    """Stream chat completion using Claude Agent SDK.
@@ -1638,7 +1715,10 @@ async def stream_chat_completion_sdk(
        file_ids: Optional workspace file IDs attached to the user's message.
            Images are embedded as vision content blocks; other files are
            saved to the SDK working directory for the Read tool.
+        mode: Accepted for signature compatibility with the baseline path.
+            The SDK path does not currently branch on this value.
    """
+    _ = mode  # SDK path ignores the requested mode.

    if session is None:
        session = await get_chat_session(session_id, user_id)
@@ -1669,19 +1749,12 @@ async def stream_chat_completion_sdk(
        )
        session.messages.pop()

-    # Append the new message to the session if it's not already there
-    new_message_role = "user" if is_user_message else "assistant"
-    if message and (
-        len(session.messages) == 0
-        or not (
-            session.messages[-1].role == new_message_role
-            and session.messages[-1].content == message
-        )
-    ):
-        session.messages.append(ChatMessage(role=new_message_role, content=message))
+    if maybe_append_user_message(session, message, is_user_message):
        if is_user_message:
            track_user_message(
-                user_id=user_id, session_id=session_id, message_length=len(message)
+                user_id=user_id,
+                session_id=session_id,
+                message_length=len(message or ""),
            )

    # Structured log prefix: [SDK][<session>][T<turn>]
@@ -1916,10 +1989,29 @@ async def stream_chat_completion_sdk(
            allowed = get_copilot_tool_names(use_e2b=use_e2b)
            disallowed = get_sdk_disallowed_tools(use_e2b=use_e2b)

+        # Flag set by _on_stderr when the SDK logs that it switched to the
+        # fallback model (e.g. on a 529 overloaded error).  Checked as each
+        # stream event is yielded and emitted once as a StreamStatus notification.
+        fallback_model_activated = False
+
        def _on_stderr(line: str) -> None:
            """Log a stderr line emitted by the Claude CLI subprocess."""
+            nonlocal fallback_model_activated
            sid = session_id[:12] if session_id else "?"
            logger.info("[SDK] [%s] CLI stderr: %s", sid, line.rstrip())
+            # Detect SDK fallback-model activation.  The CLI logs a
+            # message containing "fallback model" when it switches models
+            # after a 529/overloaded error.  Match "fallback model" rather
+            # than just "fallback" to avoid false positives from unrelated
+            # stderr lines (e.g. tool-level retries, cached result fallbacks).
+            lower = line.lower()
+            if not fallback_model_activated and "fallback model" in lower:
+                fallback_model_activated = True
+                logger.warning(
+                    "[SDK] [%s] Fallback model activated — primary model "
+                    "overloaded, switching to fallback",
+                    sid,
+                )

        sdk_options_kwargs: dict[str, Any] = {
            "system_prompt": system_prompt,
@@ -1930,6 +2022,15 @@ async def stream_chat_completion_sdk(
            "cwd": sdk_cwd,
            "max_buffer_size": config.claude_agent_max_buffer_size,
            "stderr": _on_stderr,
+            # --- P0 guardrails ---
+            # fallback_model: SDK auto-retries with this cheaper model on
+            # 529 (overloaded) errors, avoiding user-visible failures.
+            "fallback_model": _resolve_fallback_model(),
+            # max_turns: hard cap on agentic tool-use loops per query to
+            # prevent runaway execution from burning budget.
+            "max_turns": config.claude_agent_max_turns,
+            # max_budget_usd: per-query spend ceiling enforced by the CLI.
+            "max_budget_usd": config.claude_agent_max_budget_usd,
        }
        if sdk_model:
            sdk_options_kwargs["model"] = sdk_model
@@ -1946,15 +2047,20 @@ async def stream_chat_completion_sdk(
        # langsmith tracing integration attaches them to every span.  This
        # is what Langfuse (or any OTEL backend) maps to its native
        # user/session fields.
+        _user_tier = await get_user_tier(user_id) if user_id else None
+        _otel_metadata: dict[str, str] = {
+            "resume": str(use_resume),
+            "conversation_turn": str(turn),
+        }
+        if _user_tier:
+            _otel_metadata["subscription_tier"] = _user_tier.value
+
        _otel_ctx = propagate_attributes(
            user_id=user_id,
            session_id=session_id,
            trace_name="copilot-sdk",
            tags=["sdk"],
-            metadata={
-                "resume": str(use_resume),
-                "conversation_turn": str(turn),
-            },
+            metadata=_otel_metadata,
        )
        _otel_ctx.__enter__()

@@ -2009,8 +2115,29 @@ async def stream_chat_completion_sdk(
        # ---------------------------------------------------------------
        ended_with_stream_error = False
        attempts_exhausted = False
+        transient_exhausted = False
        stream_err: Exception | None = None

+        # Transient retry helper — deduplicates the logic shared between
+        # _HandledStreamError and the generic except-Exception handler.
+        transient_retries = 0
+        max_transient_retries = config.claude_agent_max_transient_retries
+
+        def _next_transient_backoff() -> int | None:
+            """Return the next backoff delay in seconds, or ``None``.
+
+            ``None`` means the error should be surfaced: either retries are
+            exhausted or events were already yielded to the client.
+            Mutates the outer ``transient_retries`` counter via nonlocal.
+            """
+            nonlocal transient_retries
+            if events_yielded > 0:
+                return None
+            transient_retries += 1
+            if transient_retries > max_transient_retries:
+                return None
+            return min(30, 2 ** (transient_retries - 1))  # 1s, 2s, 4s, …, cap 30s
+
        state = _RetryState(
            options=options,
            query_message=query_message,
@@ -2023,7 +2150,19 @@ async def stream_chat_completion_sdk(
            usage=_TokenUsage(),
        )

-        for attempt in range(_MAX_STREAM_ATTEMPTS):
+        attempt = 0
+        _last_reset_attempt = -1
+        while attempt < _MAX_STREAM_ATTEMPTS:
+            # Reset transient retry counter per context-level attempt so
+            # each attempt (original, compacted, no-transcript) gets the
+            # full retry budget for transient errors.
+            # Only reset when the attempt number actually changes —
+            # transient retries `continue` back to the loop top without
+            # incrementing `attempt`, so resetting unconditionally would
+            # create an infinite retry loop.
+            if attempt != _last_reset_attempt:
+                transient_retries = 0
+                _last_reset_attempt = attempt
            # Clear any stale stash signal from the previous attempt so
            # wait_for_stash() doesn't fire prematurely on a leftover event.
            reset_stash_event()
@@ -2078,7 +2217,15 @@ async def stream_chat_completion_sdk(
                state.usage.reset()

            pre_attempt_msg_count = len(session.messages)
+            # Snapshot transcript builder state — it maintains an
+            # independent _entries list from session.messages, so rolling
+            # back session.messages alone would leave duplicate entries
+            # from the failed attempt in the uploaded transcript.
+            pre_transcript_entries = list(state.transcript_builder._entries)
+            pre_transcript_uuid = state.transcript_builder._last_uuid
            events_yielded = 0
+            fallback_model_activated = False
+            fallback_notified = False

            try:
                async for event in _run_stream_attempt(stream_ctx, state):
@@ -2094,9 +2241,24 @@ async def stream_chat_completion_sdk(
                            StreamToolInputStart,
                            StreamToolInputAvailable,
                            StreamToolOutputAvailable,
+                            # Transient StreamError and StreamStatus are
+                            # ephemeral notifications, not content.  Counting
+                            # them would prevent the backoff retry from firing
+                            # because _next_transient_backoff() returns None
+                            # when events_yielded > 0.
+                            StreamError,
+                            StreamStatus,
                        ),
                    ):
                        events_yielded += 1
+                    # Emit a one-time StreamStatus when the SDK switches
+                    # to the fallback model (detected via stderr).
+                    if fallback_model_activated and not fallback_notified:
+                        fallback_notified = True
+                        yield StreamStatus(
+                            message="Primary model overloaded — "
+                            "using fallback model for this request"
+                        )
                    yield event
                break  # Stream completed — exit retry loop
            except asyncio.CancelledError:
@@ -2113,6 +2275,31 @@ async def stream_chat_completion_sdk(
                # session messages and set the error flag — do NOT set
                # stream_err so the post-loop code won't emit a
                # duplicate StreamError.
+                session.messages = session.messages[:pre_attempt_msg_count]
+                state.transcript_builder._entries = pre_transcript_entries
+                state.transcript_builder._last_uuid = pre_transcript_uuid
+                # Check if this is a transient error we can retry with backoff.
+                # exc.code is the only reliable signal: str(exc) is always the
+                # static "Stream error handled" message.
+                if exc.code == "transient_api_error":
+                    backoff = _next_transient_backoff()
+                    if backoff is not None:
+                        logger.warning(
+                            "%s Transient error — retrying in %ds (%d/%d)",
+                            log_prefix,
+                            backoff,
+                            transient_retries,
+                            max_transient_retries,
+                        )
+                        yield StreamStatus(
+                            message=f"Connection interrupted, retrying in {backoff}s…"
+                        )
+                        await asyncio.sleep(backoff)
+                        state.adapter = SDKResponseAdapter(
+                            message_id=message_id, session_id=session_id
+                        )
+                        state.usage.reset()
+                        continue  # retry the same context-level attempt
                logger.warning(
                    "%s Stream error handled in attempt "
                    "(attempt %d/%d, code=%s, events_yielded=%d)",
@@ -2122,7 +2309,6 @@ async def stream_chat_completion_sdk(
                    exc.code or "transient",
                    events_yielded,
                )
-                session.messages = session.messages[:pre_attempt_msg_count]
                # transcript_builder still contains entries from the aborted
                # attempt that no longer match session.messages.  Skip upload
                # so a future --resume doesn't replay rolled-back content.
@@ -2137,22 +2323,37 @@ async def stream_chat_completion_sdk(
                    retryable=True,
                )
                ended_with_stream_error = True
+                # For transient errors the StreamError was deliberately NOT
+                # yielded inside _run_stream_attempt (already_yielded=False)
+                # so the client didn't see a premature error flash.  Yield it
+                # now that we know retries are exhausted.
+                # For non-transient errors (circuit breaker, idle timeout)
+                # already_yielded=True — do NOT yield again.
+                if not exc.already_yielded:
+                    yield StreamError(
+                        errorText=exc.error_msg or FRIENDLY_TRANSIENT_MSG,
+                        code=exc.code or "transient_api_error",
+                    )
                break
            except Exception as e:
                stream_err = e
                is_context_error = _is_prompt_too_long(e)
+                is_transient = is_transient_api_error(str(e))
                logger.warning(
                    "%s Stream error (attempt %d/%d, context_error=%s, "
-                    "events_yielded=%d): %s",
+                    "transient=%s, events_yielded=%d): %s",
                    log_prefix,
                    attempt + 1,
                    _MAX_STREAM_ATTEMPTS,
                    is_context_error,
+                    is_transient,
                    events_yielded,
                    stream_err,
                    exc_info=True,
                )
                session.messages = session.messages[:pre_attempt_msg_count]
+                state.transcript_builder._entries = pre_transcript_entries
+                state.transcript_builder._last_uuid = pre_transcript_uuid
                if events_yielded > 0:
                    # Events were already sent to the frontend and cannot be
                    # unsent.  Retrying would produce duplicate/inconsistent
@@ -2165,16 +2366,50 @@ async def stream_chat_completion_sdk(
                    skip_transcript_upload = True
                    ended_with_stream_error = True
                    break
+                # Transient API errors (ECONNRESET, 429, 5xx) — retry
+                # with exponential backoff via the shared helper.
+                if is_transient:
+                    backoff = _next_transient_backoff()
+                    if backoff is not None:
+                        logger.warning(
+                            "%s Transient exception — retrying in %ds (%d/%d)",
+                            log_prefix,
+                            backoff,
+                            transient_retries,
+                            max_transient_retries,
+                        )
+                        yield StreamStatus(
+                            message=f"Connection interrupted, retrying in {backoff}s…"
+                        )
+                        await asyncio.sleep(backoff)
+                        state.adapter = SDKResponseAdapter(
+                            message_id=message_id, session_id=session_id
+                        )
+                        state.usage.reset()
+                        continue  # retry same context-level attempt
+                    # Retries exhausted — persist retryable marker so the
+                    # frontend shows "Try again" after refresh.
+                    # Mirrors the exhausted-retry branch of the
+                    # _HandledStreamError handler above.
+                    transient_exhausted = True
+                    skip_transcript_upload = True
+                    _append_error_marker(
+                        session, FRIENDLY_TRANSIENT_MSG, retryable=True
+                    )
+                    ended_with_stream_error = True
+                    break
+
                if not is_context_error:
-                    # Non-context errors (network, auth, rate-limit) should
-                    # not trigger compaction — surface the error immediately.
+                    # Non-context, non-transient errors (auth, fatal)
+                    # should not trigger compaction — surface immediately.
                    skip_transcript_upload = True
                    ended_with_stream_error = True
                    break
+                attempt += 1  # advance to next context-level attempt
                continue
        else:
-            # All retry attempts exhausted (loop ended without break)
-            # skip_transcript_upload is already set by _reduce_context
+            # Loop condition became False: all attempts were exhausted without a
+            # break.  skip_transcript_upload is already set by _reduce_context
            # when the transcript was dropped (transcript_lost=True).
            ended_with_stream_error = True
            attempts_exhausted = True
@@ -2203,25 +2438,24 @@ async def stream_chat_completion_sdk(
                yield response

        if ended_with_stream_error and stream_err is not None:
-            # Use distinct error codes: "all_attempts_exhausted" when all
-            # retries were consumed vs "sdk_stream_error" for non-context
-            # errors that broke the loop immediately (network, auth, etc.).
+            # Use distinct error codes depending on how the loop ended:
+            # • "all_attempts_exhausted" — context compaction ran out of room
+            # • "transient_api_error" — 429/5xx/ECONNRESET retries exhausted
+            # • "sdk_stream_error" — non-context, non-transient fatal error
            safe_err = str(stream_err).replace("\n", " ").replace("\r", "")[:500]
            if attempts_exhausted:
                error_text = (
                    "Your conversation is too long. "
                    "Please start a new chat or clear some history."
                )
+                error_code = "all_attempts_exhausted"
+            elif transient_exhausted:
+                error_text = FRIENDLY_TRANSIENT_MSG
+                error_code = "transient_api_error"
            else:
                error_text = _friendly_error_text(safe_err)
-            yield StreamError(
-                errorText=error_text,
-                code=(
-                    "all_attempts_exhausted"
-                    if attempts_exhausted
-                    else "sdk_stream_error"
-                ),
-            )
+                error_code = "sdk_stream_error"
+            yield StreamError(errorText=error_text, code=error_code)

        # Copy token usage from retry state to outer-scope accumulators
        # so the finally block can persist them.
--- a/autogpt_platform/backend/backend/copilot/sdk/service_test.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/service_test.py
@@ -10,6 +10,7 @@ import pytest

 from .service import (
    _is_sdk_disconnect_error,
+    _normalize_model_name,
    _prepare_file_attachments,
    _resolve_sdk_model,
    _safe_close_sdk_client,
@@ -405,6 +406,49 @@ def _clean_config_env(monkeypatch: pytest.MonkeyPatch) -> None:
        monkeypatch.delenv(var, raising=False)


+class TestNormalizeModelName:
+    """Tests for _normalize_model_name — shared provider-aware normalization."""
+
+    def test_strips_provider_prefix(self, monkeypatch, _clean_config_env):
+        from backend.copilot import config as cfg_mod
+
+        cfg = cfg_mod.ChatConfig(
+            use_openrouter=False,
+            api_key=None,
+            base_url=None,
+            use_claude_code_subscription=False,
+        )
+        monkeypatch.setattr("backend.copilot.sdk.service.config", cfg)
+        assert _normalize_model_name("anthropic/claude-opus-4.6") == "claude-opus-4-6"
+
+    def test_dots_preserved_for_openrouter(self, monkeypatch, _clean_config_env):
+        from backend.copilot import config as cfg_mod
+
+        cfg = cfg_mod.ChatConfig(
+            use_openrouter=True,
+            api_key="or-key",
+            base_url="https://openrouter.ai/api/v1",
+            use_claude_code_subscription=False,
+        )
+        monkeypatch.setattr("backend.copilot.sdk.service.config", cfg)
+        assert _normalize_model_name("anthropic/claude-opus-4.6") == "claude-opus-4.6"
+
+    def test_no_prefix_no_dots(self, monkeypatch, _clean_config_env):
+        from backend.copilot import config as cfg_mod
+
+        cfg = cfg_mod.ChatConfig(
+            use_openrouter=False,
+            api_key=None,
+            base_url=None,
+            use_claude_code_subscription=False,
+        )
+        monkeypatch.setattr("backend.copilot.sdk.service.config", cfg)
+        assert (
+            _normalize_model_name("claude-sonnet-4-20250514")
+            == "claude-sonnet-4-20250514"
+        )
+
+
 class TestResolveSdkModel:
    """Tests for _resolve_sdk_model — model ID resolution for the SDK CLI."""

--- a/autogpt_platform/backend/backend/copilot/sdk/thinking_blocks_test.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/thinking_blocks_test.py
@@ -27,20 +27,19 @@ from backend.copilot.response_model import (
    StreamTextDelta,
    StreamTextStart,
 )
-from backend.util import json
-
-from .conftest import build_structured_transcript
-from .response_adapter import SDKResponseAdapter
-from .service import _format_sdk_content_blocks
-from .transcript import (
+from backend.copilot.transcript import (
    _find_last_assistant_entry,
    _flatten_assistant_content,
    _messages_to_transcript,
    _rechain_tail,
    _transcript_to_messages,
-    compact_transcript,
-    validate_transcript,
 )
+from backend.util import json
+
+from .conftest import build_structured_transcript
+from .response_adapter import SDKResponseAdapter
+from .service import _format_sdk_content_blocks
+from .transcript import compact_transcript, validate_transcript

 # ---------------------------------------------------------------------------
 # Fixtures: realistic thinking block content
@@ -439,7 +438,7 @@ class TestCompactTranscriptThinkingBlocks:
            },
        )()
        with patch(
-            "backend.copilot.sdk.transcript._run_compression",
+            "backend.copilot.transcript._run_compression",
            new_callable=AsyncMock,
            return_value=mock_result,
        ):
@@ -498,7 +497,7 @@ class TestCompactTranscriptThinkingBlocks:
            )()

        with patch(
-            "backend.copilot.sdk.transcript._run_compression",
+            "backend.copilot.transcript._run_compression",
            side_effect=mock_compression,
        ):
            await compact_transcript(transcript, model="test-model")
@@ -551,7 +550,7 @@ class TestCompactTranscriptThinkingBlocks:
            },
        )()
        with patch(
-            "backend.copilot.sdk.transcript._run_compression",
+            "backend.copilot.transcript._run_compression",
            new_callable=AsyncMock,
            return_value=mock_result,
        ):
@@ -601,7 +600,7 @@ class TestCompactTranscriptThinkingBlocks:
            },
        )()
        with patch(
-            "backend.copilot.sdk.transcript._run_compression",
+            "backend.copilot.transcript._run_compression",
            new_callable=AsyncMock,
            return_value=mock_result,
        ):
@@ -638,7 +637,7 @@ class TestCompactTranscriptThinkingBlocks:
            },
        )()
        with patch(
-            "backend.copilot.sdk.transcript._run_compression",
+            "backend.copilot.transcript._run_compression",
            new_callable=AsyncMock,
            return_value=mock_result,
        ):
@@ -699,7 +698,7 @@ class TestCompactTranscriptThinkingBlocks:
            },
        )()
        with patch(
-            "backend.copilot.sdk.transcript._run_compression",
+            "backend.copilot.transcript._run_compression",
            new_callable=AsyncMock,
            return_value=mock_result,
        ):
--- a/autogpt_platform/backend/backend/copilot/sdk/transcript.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/transcript.py
--- a/autogpt_platform/backend/backend/copilot/sdk/transcript_builder.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/transcript_builder.py
@@ -1,235 +1,10 @@
-"""Build complete JSONL transcript from SDK messages.
+"""Re-export from shared ``backend.copilot.transcript_builder`` for backward compat.

-The transcript represents the FULL active context at any point in time.
-Each upload REPLACES the previous transcript atomically.
-
-Flow:
-  Turn 1: Upload [msg1, msg2]
-  Turn 2: Download [msg1, msg2] → Upload [msg1, msg2, msg3, msg4] (REPLACE)
-  Turn 3: Download [msg1, msg2, msg3, msg4] → Upload [all messages] (REPLACE)
-
-The transcript is never incremental - always the complete atomic state.
+The canonical implementation now lives at ``backend.copilot.transcript_builder``
+so both the SDK and baseline paths can import without cross-package
+dependencies.
 """

-import logging
-from typing import Any
-from uuid import uuid4
+from backend.copilot.transcript_builder import TranscriptBuilder, TranscriptEntry

-from pydantic import BaseModel
-
-from backend.util import json
-
-from .transcript import STRIPPABLE_TYPES
-
-logger = logging.getLogger(__name__)
-
-
-class TranscriptEntry(BaseModel):
-    """Single transcript entry (user or assistant turn)."""
-
-    type: str
-    uuid: str
-    parentUuid: str | None
-    isCompactSummary: bool | None = None
-    message: dict[str, Any]
-
-
-class TranscriptBuilder:
-    """Build complete JSONL transcript from SDK messages.
-
-    This builder maintains the FULL conversation state, not incremental changes.
-    The output is always the complete active context.
-    """
-
-    def __init__(self) -> None:
-        self._entries: list[TranscriptEntry] = []
-        self._last_uuid: str | None = None
-
-    def _last_is_assistant(self) -> bool:
-        return bool(self._entries) and self._entries[-1].type == "assistant"
-
-    def _last_message_id(self) -> str:
-        """Return the message.id of the last entry, or '' if none."""
-        if self._entries:
-            return self._entries[-1].message.get("id", "")
-        return ""
-
-    @staticmethod
-    def _parse_entry(data: dict) -> TranscriptEntry | None:
-        """Parse a single transcript entry, filtering strippable types.
-
-        Returns ``None`` for entries that should be skipped (strippable types
-        that are not compaction summaries).
-        """
-        entry_type = data.get("type", "")
-        if entry_type in STRIPPABLE_TYPES and not data.get("isCompactSummary"):
-            return None
-        return TranscriptEntry(
-            type=entry_type,
-            uuid=data.get("uuid") or str(uuid4()),
-            parentUuid=data.get("parentUuid"),
-            isCompactSummary=data.get("isCompactSummary"),
-            message=data.get("message", {}),
-        )
-
-    def load_previous(self, content: str, log_prefix: str = "[Transcript]") -> None:
-        """Load complete previous transcript.
-
-        This loads the FULL previous context. As new messages come in,
-        we append to this state. The final output is the complete context
-        (previous + new), not just the delta.
-        """
-        if not content or not content.strip():
-            return
-
-        lines = content.strip().split("\n")
-        for line_num, line in enumerate(lines, 1):
-            if not line.strip():
-                continue
-
-            data = json.loads(line, fallback=None)
-            if data is None:
-                logger.warning(
-                    "%s Failed to parse transcript line %d/%d",
-                    log_prefix,
-                    line_num,
-                    len(lines),
-                )
-                continue
-
-            entry = self._parse_entry(data)
-            if entry is None:
-                continue
-            self._entries.append(entry)
-            self._last_uuid = entry.uuid
-
-        logger.info(
-            "%s Loaded %d entries from previous transcript (last_uuid=%s)",
-            log_prefix,
-            len(self._entries),
-            self._last_uuid[:12] if self._last_uuid else None,
-        )
-
-    def append_user(self, content: str | list[dict], uuid: str | None = None) -> None:
-        """Append a user entry."""
-        msg_uuid = uuid or str(uuid4())
-
-        self._entries.append(
-            TranscriptEntry(
-                type="user",
-                uuid=msg_uuid,
-                parentUuid=self._last_uuid,
-                message={"role": "user", "content": content},
-            )
-        )
-        self._last_uuid = msg_uuid
-
-    def append_tool_result(self, tool_use_id: str, content: str) -> None:
-        """Append a tool result as a user entry (one per tool call)."""
-        self.append_user(
-            content=[
-                {"type": "tool_result", "tool_use_id": tool_use_id, "content": content}
-            ]
-        )
-
-    def append_assistant(
-        self,
-        content_blocks: list[dict],
-        model: str = "",
-        stop_reason: str | None = None,
-    ) -> None:
-        """Append an assistant entry.
-
-        Consecutive assistant entries automatically share the same message ID
-        so the CLI can merge them (thinking → text → tool_use) into a single
-        API message on ``--resume``.  A new ID is assigned whenever an
-        assistant entry follows a non-assistant entry (user message or tool
-        result), because that marks the start of a new API response.
-        """
-        message_id = (
-            self._last_message_id()
-            if self._last_is_assistant()
-            else f"msg_sdk_{uuid4().hex[:24]}"
-        )
-
-        msg_uuid = str(uuid4())
-
-        self._entries.append(
-            TranscriptEntry(
-                type="assistant",
-                uuid=msg_uuid,
-                parentUuid=self._last_uuid,
-                message={
-                    "role": "assistant",
-                    "model": model,
-                    "id": message_id,
-                    "type": "message",
-                    "content": content_blocks,
-                    "stop_reason": stop_reason,
-                    "stop_sequence": None,
-                },
-            )
-        )
-        self._last_uuid = msg_uuid
-
-    def replace_entries(
-        self, compacted_entries: list[dict], log_prefix: str = "[Transcript]"
-    ) -> None:
-        """Replace all entries with compacted entries from the CLI session file.
-
-        Called after mid-stream compaction so TranscriptBuilder mirrors the
-        CLI's active context (compaction summary + post-compaction entries).
-
-        Builds the new list first and validates it's non-empty before swapping,
-        so corrupt input cannot wipe the conversation history.
-        """
-        new_entries: list[TranscriptEntry] = []
-        for data in compacted_entries:
-            entry = self._parse_entry(data)
-            if entry is not None:
-                new_entries.append(entry)
-
-        if not new_entries:
-            logger.warning(
-                "%s replace_entries produced 0 entries from %d inputs, keeping old (%d entries)",
-                log_prefix,
-                len(compacted_entries),
-                len(self._entries),
-            )
-            return
-
-        old_count = len(self._entries)
-        self._entries = new_entries
-        self._last_uuid = new_entries[-1].uuid
-
-        logger.info(
-            "%s TranscriptBuilder compacted: %d entries -> %d entries",
-            log_prefix,
-            old_count,
-            len(self._entries),
-        )
-
-    def to_jsonl(self) -> str:
-        """Export complete context as JSONL.
-
-        Consecutive assistant entries are kept separate to match the
-        native CLI format — the SDK merges them internally on resume.
-
-        Returns the FULL conversation state (all entries), not incremental.
-        This output REPLACES any previous transcript.
-        """
-        if not self._entries:
-            return ""
-
-        lines = [entry.model_dump_json(exclude_none=True) for entry in self._entries]
-        return "\n".join(lines) + "\n"
-
-    @property
-    def entry_count(self) -> int:
-        """Total number of entries in the complete context."""
-        return len(self._entries)
-
-    @property
-    def is_empty(self) -> bool:
-        """Whether this builder has any entries."""
-        return len(self._entries) == 0
+__all__ = ["TranscriptBuilder", "TranscriptEntry"]
--- a/autogpt_platform/backend/backend/copilot/sdk/transcript_test.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/transcript_test.py
@@ -303,7 +303,7 @@ class TestDeleteTranscript:
        mock_storage.delete = AsyncMock()

        with patch(
-            "backend.copilot.sdk.transcript.get_workspace_storage",
+            "backend.copilot.transcript.get_workspace_storage",
            new_callable=AsyncMock,
            return_value=mock_storage,
        ):
@@ -323,7 +323,7 @@ class TestDeleteTranscript:
        )

        with patch(
-            "backend.copilot.sdk.transcript.get_workspace_storage",
+            "backend.copilot.transcript.get_workspace_storage",
            new_callable=AsyncMock,
            return_value=mock_storage,
        ):
@@ -341,7 +341,7 @@ class TestDeleteTranscript:
        )

        with patch(
-            "backend.copilot.sdk.transcript.get_workspace_storage",
+            "backend.copilot.transcript.get_workspace_storage",
            new_callable=AsyncMock,
            return_value=mock_storage,
        ):
@@ -850,7 +850,7 @@ class TestRunCompression:
    @pytest.mark.asyncio
    async def test_no_client_uses_truncation(self):
        """Path (a): ``get_openai_client()`` returns None → truncation only."""
-        from .transcript import _run_compression
+        from backend.copilot.transcript import _run_compression

        truncation_result = self._make_compress_result(
            True, [{"role": "user", "content": "truncated"}]
@@ -858,11 +858,11 @@ class TestRunCompression:

        with (
            patch(
-                "backend.copilot.sdk.transcript.get_openai_client",
+                "backend.copilot.transcript.get_openai_client",
                return_value=None,
            ),
            patch(
-                "backend.copilot.sdk.transcript.compress_context",
+                "backend.copilot.transcript.compress_context",
                new_callable=AsyncMock,
                return_value=truncation_result,
            ) as mock_compress,
@@ -885,7 +885,7 @@ class TestRunCompression:
    @pytest.mark.asyncio
    async def test_llm_success_returns_llm_result(self):
        """Path (b): ``get_openai_client()`` returns a client → LLM compresses."""
-        from .transcript import _run_compression
+        from backend.copilot.transcript import _run_compression

        llm_result = self._make_compress_result(
            True, [{"role": "user", "content": "LLM summary"}]
@@ -894,11 +894,11 @@ class TestRunCompression:

        with (
            patch(
-                "backend.copilot.sdk.transcript.get_openai_client",
+                "backend.copilot.transcript.get_openai_client",
                return_value=mock_client,
            ),
            patch(
-                "backend.copilot.sdk.transcript.compress_context",
+                "backend.copilot.transcript.compress_context",
                new_callable=AsyncMock,
                return_value=llm_result,
            ) as mock_compress,
@@ -916,7 +916,7 @@ class TestRunCompression:
    @pytest.mark.asyncio
    async def test_llm_failure_falls_back_to_truncation(self):
        """Path (c): LLM call raises → truncation fallback used instead."""
-        from .transcript import _run_compression
+        from backend.copilot.transcript import _run_compression

        truncation_result = self._make_compress_result(
            True, [{"role": "user", "content": "truncated fallback"}]
@@ -932,11 +932,11 @@ class TestRunCompression:

        with (
            patch(
-                "backend.copilot.sdk.transcript.get_openai_client",
+                "backend.copilot.transcript.get_openai_client",
                return_value=mock_client,
            ),
            patch(
-                "backend.copilot.sdk.transcript.compress_context",
+                "backend.copilot.transcript.compress_context",
                side_effect=_compress_side_effect,
            ),
        ):
@@ -953,7 +953,7 @@ class TestRunCompression:
    @pytest.mark.asyncio
    async def test_llm_timeout_falls_back_to_truncation(self):
        """Path (d): LLM call exceeds timeout → truncation fallback used."""
-        from .transcript import _run_compression
+        from backend.copilot.transcript import _run_compression

        truncation_result = self._make_compress_result(
            True, [{"role": "user", "content": "truncated after timeout"}]
@@ -970,19 +970,19 @@ class TestRunCompression:
        fake_client = MagicMock()
        with (
            patch(
-                "backend.copilot.sdk.transcript.get_openai_client",
+                "backend.copilot.transcript.get_openai_client",
                return_value=fake_client,
            ),
            patch(
-                "backend.copilot.sdk.transcript.compress_context",
+                "backend.copilot.transcript.compress_context",
                side_effect=_compress_side_effect,
            ),
            patch(
-                "backend.copilot.sdk.transcript._COMPACTION_TIMEOUT_SECONDS",
+                "backend.copilot.transcript._COMPACTION_TIMEOUT_SECONDS",
                0.05,
            ),
            patch(
-                "backend.copilot.sdk.transcript._TRUNCATION_TIMEOUT_SECONDS",
+                "backend.copilot.transcript._TRUNCATION_TIMEOUT_SECONDS",
                5,
            ),
        ):
@@ -1007,7 +1007,7 @@ class TestCleanupStaleProjectDirs:

    def test_removes_old_copilot_dirs(self, tmp_path, monkeypatch):
        """Directories matching copilot pattern older than threshold are removed."""
-        from backend.copilot.sdk.transcript import (
+        from backend.copilot.transcript import (
            _STALE_PROJECT_DIR_SECONDS,
            cleanup_stale_project_dirs,
        )
@@ -1015,7 +1015,7 @@ class TestCleanupStaleProjectDirs:
        projects_dir = tmp_path / "projects"
        projects_dir.mkdir()
        monkeypatch.setattr(
-            "backend.copilot.sdk.transcript._projects_base",
+            "backend.copilot.transcript._projects_base",
            lambda: str(projects_dir),
        )

@@ -1039,12 +1039,12 @@ class TestCleanupStaleProjectDirs:

    def test_ignores_non_copilot_dirs(self, tmp_path, monkeypatch):
        """Directories not matching copilot pattern are left alone."""
-        from backend.copilot.sdk.transcript import cleanup_stale_project_dirs
+        from backend.copilot.transcript import cleanup_stale_project_dirs

        projects_dir = tmp_path / "projects"
        projects_dir.mkdir()
        monkeypatch.setattr(
-            "backend.copilot.sdk.transcript._projects_base",
+            "backend.copilot.transcript._projects_base",
            lambda: str(projects_dir),
        )

@@ -1062,7 +1062,7 @@ class TestCleanupStaleProjectDirs:

    def test_ttl_boundary_not_removed(self, tmp_path, monkeypatch):
        """A directory exactly at the TTL boundary should NOT be removed."""
-        from backend.copilot.sdk.transcript import (
+        from backend.copilot.transcript import (
            _STALE_PROJECT_DIR_SECONDS,
            cleanup_stale_project_dirs,
        )
@@ -1070,7 +1070,7 @@ class TestCleanupStaleProjectDirs:
        projects_dir = tmp_path / "projects"
        projects_dir.mkdir()
        monkeypatch.setattr(
-            "backend.copilot.sdk.transcript._projects_base",
+            "backend.copilot.transcript._projects_base",
            lambda: str(projects_dir),
        )

@@ -1088,7 +1088,7 @@ class TestCleanupStaleProjectDirs:

    def test_skips_non_directory_entries(self, tmp_path, monkeypatch):
        """Regular files matching the copilot pattern are not removed."""
-        from backend.copilot.sdk.transcript import (
+        from backend.copilot.transcript import (
            _STALE_PROJECT_DIR_SECONDS,
            cleanup_stale_project_dirs,
        )
@@ -1096,7 +1096,7 @@ class TestCleanupStaleProjectDirs:
        projects_dir = tmp_path / "projects"
        projects_dir.mkdir()
        monkeypatch.setattr(
-            "backend.copilot.sdk.transcript._projects_base",
+            "backend.copilot.transcript._projects_base",
            lambda: str(projects_dir),
        )

@@ -1114,11 +1114,11 @@ class TestCleanupStaleProjectDirs:

    def test_missing_base_dir_returns_zero(self, tmp_path, monkeypatch):
        """If the projects base directory doesn't exist, return 0 gracefully."""
-        from backend.copilot.sdk.transcript import cleanup_stale_project_dirs
+        from backend.copilot.transcript import cleanup_stale_project_dirs

        nonexistent = str(tmp_path / "does-not-exist" / "projects")
        monkeypatch.setattr(
-            "backend.copilot.sdk.transcript._projects_base",
+            "backend.copilot.transcript._projects_base",
            lambda: nonexistent,
        )

@@ -1129,7 +1129,7 @@ class TestCleanupStaleProjectDirs:
        """When encoded_cwd is supplied only that directory is swept."""
        import time

-        from backend.copilot.sdk.transcript import (
+        from backend.copilot.transcript import (
            _STALE_PROJECT_DIR_SECONDS,
            cleanup_stale_project_dirs,
        )
@@ -1137,7 +1137,7 @@ class TestCleanupStaleProjectDirs:
        projects_dir = tmp_path / "projects"
        projects_dir.mkdir()
        monkeypatch.setattr(
-            "backend.copilot.sdk.transcript._projects_base",
+            "backend.copilot.transcript._projects_base",
            lambda: str(projects_dir),
        )

@@ -1160,12 +1160,12 @@ class TestCleanupStaleProjectDirs:

    def test_scoped_fresh_dir_not_removed(self, tmp_path, monkeypatch):
        """Scoped sweep leaves a fresh directory alone."""
-        from backend.copilot.sdk.transcript import cleanup_stale_project_dirs
+        from backend.copilot.transcript import cleanup_stale_project_dirs

        projects_dir = tmp_path / "projects"
        projects_dir.mkdir()
        monkeypatch.setattr(
-            "backend.copilot.sdk.transcript._projects_base",
+            "backend.copilot.transcript._projects_base",
            lambda: str(projects_dir),
        )

@@ -1181,7 +1181,7 @@ class TestCleanupStaleProjectDirs:
        """Scoped sweep refuses to remove a non-copilot directory."""
        import time

-        from backend.copilot.sdk.transcript import (
+        from backend.copilot.transcript import (
            _STALE_PROJECT_DIR_SECONDS,
            cleanup_stale_project_dirs,
        )
@@ -1189,7 +1189,7 @@ class TestCleanupStaleProjectDirs:
        projects_dir = tmp_path / "projects"
        projects_dir.mkdir()
        monkeypatch.setattr(
-            "backend.copilot.sdk.transcript._projects_base",
+            "backend.copilot.transcript._projects_base",
            lambda: str(projects_dir),
        )

--- a/autogpt_platform/backend/backend/copilot/service_test.py
+++ b/autogpt_platform/backend/backend/copilot/service_test.py
@@ -7,7 +7,7 @@ import pytest
 from .model import create_chat_session, get_chat_session, upsert_chat_session
 from .response_model import StreamError, StreamTextDelta
 from .sdk import service as sdk_service
-from .sdk.transcript import download_transcript
+from .transcript import download_transcript

 logger = logging.getLogger(__name__)

--- a/autogpt_platform/backend/backend/copilot/tools/agent_generator/fixer.py
+++ b/autogpt_platform/backend/backend/copilot/tools/agent_generator/fixer.py
@@ -33,12 +33,23 @@ _GET_CURRENT_DATE_BLOCK_ID = "b29c1b50-5d0e-4d9f-8f9d-1b0e6fcbf0b1"
 _GMAIL_SEND_BLOCK_ID = "6c27abc2-e51d-499e-a85f-5a0041ba94f0"
 _TEXT_REPLACE_BLOCK_ID = "7e7c87ab-3469-4bcc-9abe-67705091b713"

+# Default OrchestratorBlock model/mode — kept in sync with ChatConfig.model.
+# ChatConfig uses the OpenRouter format ("anthropic/claude-opus-4.6");
+# OrchestratorBlock uses the native Anthropic model name.
+ORCHESTRATOR_DEFAULT_MODEL = "claude-opus-4-6"
+ORCHESTRATOR_DEFAULT_EXECUTION_MODE = "extended_thinking"
+
 # Defaults applied to OrchestratorBlock nodes by the fixer.
-_SDM_DEFAULTS: dict[str, int | bool] = {
+# execution_mode and model match the copilot's default (extended thinking
+# with Opus) so generated agents inherit the same reasoning capabilities.
+# If the user explicitly sets these fields, the fixer won't override them.
+_SDM_DEFAULTS: dict[str, int | bool | str] = {
    "agent_mode_max_iterations": 10,
    "conversation_compaction": True,
    "retry": 3,
    "multiple_tool_calls": False,
+    "execution_mode": ORCHESTRATOR_DEFAULT_EXECUTION_MODE,
+    "model": ORCHESTRATOR_DEFAULT_MODEL,
 }


@@ -879,6 +890,12 @@ class AgentFixer:
            )

            if is_ai_block:
+                # Skip AI blocks that don't expose a "model" input property
+                # (some AI-category blocks have no model selector at all).
+                input_properties = block.get("inputSchema", {}).get("properties", {})
+                if "model" not in input_properties:
+                    continue
+
                node_id = node.get("id")
                input_default = node.get("input_default", {})
                current_model = input_default.get("model")
@@ -887,9 +904,7 @@ class AgentFixer:
                # Blocks with a block-specific enum on the model field (e.g.
                # PerplexityBlock) use their own enum values; others use the
                # generic set.
-                model_schema = (
-                    block.get("inputSchema", {}).get("properties", {}).get("model", {})
-                )
+                model_schema = input_properties.get("model", {})
                block_model_enum = model_schema.get("enum")

                if block_model_enum:
@@ -1649,6 +1664,8 @@ class AgentFixer:
        2. ``conversation_compaction`` defaults to ``True``
        3. ``retry`` defaults to ``3``
        4. ``multiple_tool_calls`` defaults to ``False``
+        5. ``execution_mode`` defaults to ``"extended_thinking"``
+        6. ``model`` defaults to ``"claude-opus-4-6"``

        Args:
            agent: The agent dictionary to fix
@@ -1748,6 +1765,12 @@ class AgentFixer:
        agent = self.fix_node_x_coordinates(agent, node_lookup=node_lookup)
        agent = self.fix_getcurrentdate_offset(agent)

+        # Apply OrchestratorBlock defaults BEFORE fix_ai_model_parameter so that
+        # the orchestrator-specific model (claude-opus-4-6) is set first and
+        # fix_ai_model_parameter sees it as a valid allowed model instead of
+        # overwriting it with the generic default (gpt-4o).
+        agent = self.fix_orchestrator_blocks(agent)
+
        # Apply fixes that require blocks information
        if blocks:
            agent = self.fix_invalid_nested_sink_links(
@@ -1765,9 +1788,6 @@ class AgentFixer:
        # Apply fixes for MCPToolBlock nodes
        agent = self.fix_mcp_tool_blocks(agent)

-        # Apply fixes for OrchestratorBlock nodes (agent-mode defaults)
-        agent = self.fix_orchestrator_blocks(agent)
-
        # Apply fixes for AgentExecutorBlock nodes (sub-agents)
        if library_agents:
            agent = self.fix_agent_executor_blocks(agent, library_agents)
--- a/autogpt_platform/backend/backend/copilot/tools/agent_generator/fixer_test.py
+++ b/autogpt_platform/backend/backend/copilot/tools/agent_generator/fixer_test.py
@@ -580,6 +580,29 @@ class TestFixAiModelParameter:

        assert result["nodes"][0]["input_default"]["model"] == "perplexity/sonar"

+    def test_ai_block_without_model_property_is_skipped(self):
+        """AI-category blocks that have no 'model' input property should not
+        have a model injected — they simply don't expose a model selector."""
+        fixer = AgentFixer()
+        block_id = generate_uuid()
+        node = _make_node(node_id="n1", block_id=block_id, input_default={})
+        agent = _make_agent(nodes=[node])
+
+        blocks = [
+            {
+                "id": block_id,
+                "name": "SomeAIBlock",
+                "categories": [{"category": "AI"}],
+                "inputSchema": {
+                    "properties": {"prompt": {"type": "string"}},
+                },
+            }
+        ]
+
+        result = fixer.fix_ai_model_parameter(agent, blocks)
+
+        assert "model" not in result["nodes"][0]["input_default"]
+

 class TestFixAgentExecutorBlocks:
    """Tests for fix_agent_executor_blocks."""
--- a/autogpt_platform/backend/backend/copilot/tools/get_agent_building_guide.py
+++ b/autogpt_platform/backend/backend/copilot/tools/get_agent_building_guide.py
@@ -42,7 +42,10 @@ class GetAgentBuildingGuideTool(BaseTool):

    @property
    def description(self) -> str:
-        return "Get the agent JSON building guide (nodes, links, AgentExecutorBlock, MCPToolBlock usage). Call before generating agent JSON."
+        return (
+            "Get the agent JSON building guide (nodes, links, AgentExecutorBlock, MCPToolBlock usage, "
+            "and the create->dry-run->fix iterative workflow). Call before generating agent JSON."
+        )

    @property
    def parameters(self) -> dict[str, Any]:
--- a/autogpt_platform/backend/backend/copilot/tools/get_agent_building_guide_test.py
+++ b/autogpt_platform/backend/backend/copilot/tools/get_agent_building_guide_test.py
@@ -0,0 +1,15 @@
+"""Tests for GetAgentBuildingGuideTool."""
+
+from backend.copilot.tools.get_agent_building_guide import _load_guide
+
+
+def test_load_guide_returns_string():
+    guide = _load_guide()
+    assert isinstance(guide, str)
+    assert len(guide) > 100
+
+
+def test_load_guide_caches():
+    guide1 = _load_guide()
+    guide2 = _load_guide()
+    assert guide1 is guide2
--- a/autogpt_platform/backend/backend/copilot/tools/helpers.py
+++ b/autogpt_platform/backend/backend/copilot/tools/helpers.py
@@ -48,27 +48,41 @@ logger = logging.getLogger(__name__)
 def get_inputs_from_schema(
    input_schema: dict[str, Any],
    exclude_fields: set[str] | None = None,
+    input_data: dict[str, Any] | None = None,
 ) -> list[dict[str, Any]]:
-    """Extract input field info from JSON schema."""
+    """Extract input field info from JSON schema.
+
+    When *input_data* is provided, each field's ``value`` key is populated
+    with the value the CoPilot already supplied — so the frontend can
+    prefill the form instead of showing empty inputs.  Fields marked
+    ``advanced`` in the schema are flagged so the frontend can hide them
+    by default (matching the builder behaviour).
+    """
    if not isinstance(input_schema, dict):
        return []

    exclude = exclude_fields or set()
    properties = input_schema.get("properties", {})
    required = set(input_schema.get("required", []))
+    provided = input_data or {}

-    return [
-        {
+    results: list[dict[str, Any]] = []
+    for name, schema in properties.items():
+        if name in exclude:
+            continue
+        entry: dict[str, Any] = {
            "name": name,
            "title": schema.get("title", name),
            "type": schema.get("type", "string"),
            "description": schema.get("description", ""),
            "required": name in required,
            "default": schema.get("default"),
+            "advanced": schema.get("advanced", False),
        }
-        for name, schema in properties.items()
-        if name not in exclude
-    ]
+        if name in provided:
+            entry["value"] = provided[name]
+        results.append(entry)
+    return results


 async def execute_block(
@@ -446,7 +460,9 @@ async def prepare_block_for_execution(
                requirements={
                    "credentials": missing_creds_list,
                    "inputs": get_inputs_from_schema(
-                        input_schema, exclude_fields=credentials_fields
+                        input_schema,
+                        exclude_fields=credentials_fields,
+                        input_data=input_data,
                    ),
                    "execution_modes": ["immediate"],
                },
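Review note: the `get_inputs_from_schema` change above can be exercised in isolation. The following is a minimal sketch that copies the new function body from this diff into a standalone script. The sample schema, the field names (`query`, `api_key`, `timeout`), and the supplied value are made up for illustration and do not come from any real block.

```python
from typing import Any, Optional


def get_inputs_from_schema(
    input_schema: dict[str, Any],
    exclude_fields: Optional[set[str]] = None,
    input_data: Optional[dict[str, Any]] = None,
) -> list[dict[str, Any]]:
    """Standalone copy of the helper as it appears in the diff."""
    if not isinstance(input_schema, dict):
        return []

    exclude = exclude_fields or set()
    properties = input_schema.get("properties", {})
    required = set(input_schema.get("required", []))
    provided = input_data or {}

    results: list[dict[str, Any]] = []
    for name, schema in properties.items():
        if name in exclude:
            continue
        entry: dict[str, Any] = {
            "name": name,
            "title": schema.get("title", name),
            "type": schema.get("type", "string"),
            "description": schema.get("description", ""),
            "required": name in required,
            "default": schema.get("default"),
            "advanced": schema.get("advanced", False),
        }
        # "value" is only present when the caller already supplied one.
        if name in provided:
            entry["value"] = provided[name]
        results.append(entry)
    return results


# Hypothetical schema, for illustration only.
sample_schema = {
    "properties": {
        "query": {"type": "string", "title": "Query"},
        "api_key": {"type": "string"},
        "timeout": {"type": "integer", "default": 30, "advanced": True},
    },
    "required": ["query"],
}

fields = get_inputs_from_schema(
    sample_schema,
    exclude_fields={"api_key"},
    input_data={"query": "cats"},
)
for field in fields:
    print(field)
```

Under these assumptions, the excluded credentials field is dropped, `query` carries a `value` key for form prefill, and `timeout` has no `value` key but is flagged `advanced`. Consumers should therefore check for the presence of `value` rather than assume it is always set.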
--- a/autogpt_platform/backend/backend/copilot/tools/run_agent.py
+++ b/autogpt_platform/backend/backend/copilot/tools/run_agent.py
@@ -153,7 +153,11 @@ class RunAgentTool(BaseTool):
                },
                "dry_run": {
                    "type": "boolean",
-                    "description": "Execute in preview mode.",
+                    "description": (
+                        "When true, simulates execution using an LLM for each block "
+                        "— no real API calls, credentials, or credits. "
+                        "See agent_generation_guide for the full workflow."
+                    ),
                },
            },
            "required": ["dry_run"],
--- a/autogpt_platform/backend/backend/copilot/tools/workspace_files.py
+++ b/autogpt_platform/backend/backend/copilot/tools/workspace_files.py
@@ -845,6 +845,7 @@ class WriteWorkspaceFileTool(BaseTool):
                path=path,
                mime_type=mime_type,
                overwrite=overwrite,
+                metadata={"origin": "agent-created"},
            )

            # Build informative source label and message.
--- a/autogpt_platform/backend/backend/copilot/transcript.py
+++ b/autogpt_platform/backend/backend/copilot/transcript.py
--- a/autogpt_platform/backend/backend/copilot/transcript_builder.py
+++ b/autogpt_platform/backend/backend/copilot/transcript_builder.py
@@ -0,0 +1,240 @@
+"""Build complete JSONL transcript from SDK messages.
+
+The transcript represents the FULL active context at any point in time.
+Each upload REPLACES the previous transcript atomically.
+
+Flow:
+  Turn 1: Upload [msg1, msg2]
+  Turn 2: Download [msg1, msg2] → Upload [msg1, msg2, msg3, msg4] (REPLACE)
+  Turn 3: Download [msg1, msg2, msg3, msg4] → Upload [all messages] (REPLACE)
+
+The transcript is never incremental - always the complete atomic state.
+"""
+
+import logging
+from typing import Any
+from uuid import uuid4
+
+from pydantic import BaseModel
+
+from backend.util import json
+
+from .transcript import STRIPPABLE_TYPES
+
+logger = logging.getLogger(__name__)
+
+
+class TranscriptEntry(BaseModel):
+    """Single transcript entry (user or assistant turn)."""
+
+    type: str
+    uuid: str
+    parentUuid: str = ""
+    isCompactSummary: bool | None = None
+    message: dict[str, Any]
+
+
+class TranscriptBuilder:
+    """Build complete JSONL transcript from SDK messages.
+
+    This builder maintains the FULL conversation state, not incremental changes.
+    The output is always the complete active context.
+    """
+
+    def __init__(self) -> None:
+        self._entries: list[TranscriptEntry] = []
+        self._last_uuid: str | None = None
+
+    def _last_is_assistant(self) -> bool:
+        return bool(self._entries) and self._entries[-1].type == "assistant"
+
+    def _last_message_id(self) -> str:
+        """Return the message.id of the last entry, or '' if none."""
+        if self._entries:
+            return self._entries[-1].message.get("id", "")
+        return ""
+
+    @staticmethod
+    def _parse_entry(data: dict) -> TranscriptEntry | None:
+        """Parse a single transcript entry, filtering strippable types.
+
+        Returns ``None`` for entries that should be skipped (strippable types
+        that are not compaction summaries).
+        """
+        entry_type = data.get("type", "")
+        if entry_type in STRIPPABLE_TYPES and not data.get("isCompactSummary"):
+            return None
+        return TranscriptEntry(
+            type=entry_type,
+            uuid=data.get("uuid") or str(uuid4()),
+            parentUuid=data.get("parentUuid") or "",
+            isCompactSummary=data.get("isCompactSummary"),
+            message=data.get("message", {}),
+        )
+
+    def load_previous(self, content: str, log_prefix: str = "[Transcript]") -> None:
+        """Load complete previous transcript.
+
+        This loads the FULL previous context. As new messages come in,
+        we append to this state. The final output is the complete context
+        (previous + new), not just the delta.
+        """
+        if not content or not content.strip():
+            return
+
+        lines = content.strip().split("\n")
+        for line_num, line in enumerate(lines, 1):
+            if not line.strip():
+                continue
+
+            data = json.loads(line, fallback=None)
+            if data is None:
+                logger.warning(
+                    "%s Failed to parse transcript line %d/%d",
+                    log_prefix,
+                    line_num,
+                    len(lines),
+                )
+                continue
+
+            entry = self._parse_entry(data)
+            if entry is None:
+                continue
+            self._entries.append(entry)
+            self._last_uuid = entry.uuid
+
+        logger.info(
+            "%s Loaded %d entries from previous transcript (last_uuid=%s)",
+            log_prefix,
+            len(self._entries),
+            self._last_uuid[:12] if self._last_uuid else None,
+        )
+
+    def append_user(self, content: str | list[dict], uuid: str | None = None) -> None:
+        """Append a user entry."""
+        msg_uuid = uuid or str(uuid4())
+
+        self._entries.append(
+            TranscriptEntry(
+                type="user",
+                uuid=msg_uuid,
+                parentUuid=self._last_uuid or "",
+                message={"role": "user", "content": content},
+            )
+        )
+        self._last_uuid = msg_uuid
+
+    def append_tool_result(self, tool_use_id: str, content: str) -> None:
+        """Append a tool result as a user entry (one per tool call)."""
+        self.append_user(
+            content=[
+                {"type": "tool_result", "tool_use_id": tool_use_id, "content": content}
+            ]
+        )
+
+    def append_assistant(
+        self,
+        content_blocks: list[dict],
+        model: str = "",
+        stop_reason: str | None = None,
+    ) -> None:
+        """Append an assistant entry.
+
+        Consecutive assistant entries automatically share the same message ID
+        so the CLI can merge them (thinking → text → tool_use) into a single
+        API message on ``--resume``.  A new ID is assigned whenever an
+        assistant entry follows a non-assistant entry (user message or tool
+        result), because that marks the start of a new API response.
+        """
+        message_id = (
+            self._last_message_id()
+            if self._last_is_assistant()
+            else f"msg_sdk_{uuid4().hex[:24]}"
+        )
+
+        msg_uuid = str(uuid4())
+
+        self._entries.append(
+            TranscriptEntry(
+                type="assistant",
+                uuid=msg_uuid,
+                parentUuid=self._last_uuid or "",
+                message={
+                    "role": "assistant",
+                    "model": model,
+                    "id": message_id,
+                    "type": "message",
+                    "content": content_blocks,
+                    "stop_reason": stop_reason,
+                    "stop_sequence": None,
+                },
+            )
+        )
+        self._last_uuid = msg_uuid
+
+    def replace_entries(
+        self, compacted_entries: list[dict], log_prefix: str = "[Transcript]"
+    ) -> None:
+        """Replace all entries with compacted entries from the CLI session file.
+
+        Called after mid-stream compaction so TranscriptBuilder mirrors the
+        CLI's active context (compaction summary + post-compaction entries).
+
+        Builds the new list first and validates it's non-empty before swapping,
+        so corrupt input cannot wipe the conversation history.
+        """
+        new_entries: list[TranscriptEntry] = []
+        for data in compacted_entries:
+            entry = self._parse_entry(data)
+            if entry is not None:
+                new_entries.append(entry)
+
+        if not new_entries:
+            logger.warning(
+                "%s replace_entries produced 0 entries from %d inputs, keeping old (%d entries)",
+                log_prefix,
+                len(compacted_entries),
+                len(self._entries),
+            )
+            return
+
+        old_count = len(self._entries)
+        self._entries = new_entries
+        self._last_uuid = new_entries[-1].uuid
+
+        logger.info(
+            "%s TranscriptBuilder compacted: %d entries -> %d entries",
+            log_prefix,
+            old_count,
+            len(self._entries),
+        )
+
+    def to_jsonl(self) -> str:
+        """Export complete context as JSONL.
+
+        Consecutive assistant entries are kept separate to match the
+        native CLI format — the SDK merges them internally on resume.
+
+        Returns the FULL conversation state (all entries), not incremental.
+        This output REPLACES any previous transcript.
+        """
+        if not self._entries:
+            return ""
+
+        lines = [entry.model_dump_json(exclude_none=True) for entry in self._entries]
+        return "\n".join(lines) + "\n"
+
+    @property
+    def entry_count(self) -> int:
+        """Total number of entries in the complete context."""
+        return len(self._entries)
+
+    @property
+    def is_empty(self) -> bool:
+        """Whether this builder has no entries."""
+        return len(self._entries) == 0
+
+    @property
+    def last_entry_type(self) -> str | None:
+        """Type of the last entry, or None if empty."""
+        return self._entries[-1].type if self._entries else None
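Review note: the message-ID rule in `append_assistant` (consecutive assistant entries reuse the previous ID, and an assistant entry after a user entry or tool result gets a new one) is easier to see in a stripped-down form. The sketch below uses a hypothetical `MiniTranscript` class with plain dicts instead of pydantic models. It is a simplified stand-in for illustration, not the `TranscriptBuilder` from this diff.

```python
from uuid import uuid4


class MiniTranscript:
    """Hypothetical, simplified stand-in for TranscriptBuilder."""

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def _append(self, entry_type: str, message: dict) -> None:
        # Each entry points at the previous one through parentUuid.
        parent = self.entries[-1]["uuid"] if self.entries else ""
        self.entries.append(
            {
                "type": entry_type,
                "uuid": str(uuid4()),
                "parentUuid": parent,
                "message": message,
            }
        )

    def append_user(self, content) -> None:
        self._append("user", {"role": "user", "content": content})

    def append_assistant(self, content_blocks: list) -> None:
        last = self.entries[-1] if self.entries else None
        if last is not None and last["type"] == "assistant":
            # Same API response continues: reuse the previous message ID.
            message_id = last["message"]["id"]
        else:
            # A user entry (or nothing) came before: start a new API message.
            message_id = f"msg_sdk_{uuid4().hex[:24]}"
        self._append(
            "assistant",
            {"role": "assistant", "id": message_id, "content": content_blocks},
        )


transcript = MiniTranscript()
transcript.append_user("hi")
transcript.append_assistant([{"type": "thinking", "thinking": "..."}])
transcript.append_assistant([{"type": "text", "text": "hello"}])
transcript.append_user(
    [{"type": "tool_result", "tool_use_id": "tc_1", "content": "ok"}]
)
transcript.append_assistant([{"type": "text", "text": "done"}])

ids = [
    e["message"].get("id") for e in transcript.entries if e["type"] == "assistant"
]
print(ids[0] == ids[1], ids[1] == ids[2])
```

Entries 1 and 2 share one ID because nothing separates them, while the assistant entry after the tool result starts a new ID. This mirrors what `test_consecutive_assistants_share_message_id` and `test_non_consecutive_assistants_get_different_ids` check against the real builder.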
--- a/autogpt_platform/backend/backend/copilot/transcript_builder_test.py
+++ b/autogpt_platform/backend/backend/copilot/transcript_builder_test.py
@@ -0,0 +1,260 @@
+"""Tests for canonical TranscriptBuilder (backend.copilot.transcript_builder).
+
+These tests directly import from the canonical module to ensure codecov
+patch coverage for the new file.
+"""
+
+from backend.copilot.transcript_builder import TranscriptBuilder, TranscriptEntry
+from backend.util import json
+
+
+def _make_jsonl(*entries: dict) -> str:
+    return "\n".join(json.dumps(e) for e in entries) + "\n"
+
+
+USER_MSG = {
+    "type": "user",
+    "uuid": "u1",
+    "message": {"role": "user", "content": "hello"},
+}
+ASST_MSG = {
+    "type": "assistant",
+    "uuid": "a1",
+    "parentUuid": "u1",
+    "message": {
+        "role": "assistant",
+        "id": "msg_1",
+        "type": "message",
+        "content": [{"type": "text", "text": "hi"}],
+        "stop_reason": "end_turn",
+        "stop_sequence": None,
+    },
+}
+
+
+class TestTranscriptEntry:
+    def test_basic_construction(self):
+        entry = TranscriptEntry(
+            type="user", uuid="u1", message={"role": "user", "content": "hi"}
+        )
+        assert entry.type == "user"
+        assert entry.uuid == "u1"
+        assert entry.parentUuid == ""
+        assert entry.isCompactSummary is None
+
+    def test_optional_fields(self):
+        entry = TranscriptEntry(
+            type="summary",
+            uuid="s1",
+            parentUuid="p1",
+            isCompactSummary=True,
+            message={"role": "user", "content": "summary"},
+        )
+        assert entry.isCompactSummary is True
+        assert entry.parentUuid == "p1"
+
+
+class TestTranscriptBuilderInit:
+    def test_starts_empty(self):
+        builder = TranscriptBuilder()
+        assert builder.is_empty
+        assert builder.entry_count == 0
+        assert builder.last_entry_type is None
+        assert builder.to_jsonl() == ""
+
+
+class TestAppendUser:
+    def test_appends_user_entry(self):
+        builder = TranscriptBuilder()
+        builder.append_user("hello")
+        assert builder.entry_count == 1
+        assert builder.last_entry_type == "user"
+
+    def test_chains_parent_uuid(self):
+        builder = TranscriptBuilder()
+        builder.append_user("first", uuid="u1")
+        builder.append_user("second", uuid="u2")
+        output = builder.to_jsonl()
+        entries = [json.loads(line) for line in output.strip().split("\n")]
+        assert entries[0]["parentUuid"] == ""
+        assert entries[1]["parentUuid"] == "u1"
+
+    def test_custom_uuid(self):
+        builder = TranscriptBuilder()
+        builder.append_user("hello", uuid="custom-id")
+        output = builder.to_jsonl()
+        entry = json.loads(output.strip())
+        assert entry["uuid"] == "custom-id"
+
+
+class TestAppendToolResult:
+    def test_appends_as_user_entry(self):
+        builder = TranscriptBuilder()
+        builder.append_tool_result(tool_use_id="tc_1", content="result text")
+        assert builder.entry_count == 1
+        assert builder.last_entry_type == "user"
+        output = builder.to_jsonl()
+        entry = json.loads(output.strip())
+        content = entry["message"]["content"]
+        assert len(content) == 1
+        assert content[0]["type"] == "tool_result"
+        assert content[0]["tool_use_id"] == "tc_1"
+        assert content[0]["content"] == "result text"
+
+
+class TestAppendAssistant:
+    def test_appends_assistant_entry(self):
+        builder = TranscriptBuilder()
+        builder.append_user("hi")
+        builder.append_assistant(
+            content_blocks=[{"type": "text", "text": "hello"}],
+            model="test-model",
+            stop_reason="end_turn",
+        )
+        assert builder.entry_count == 2
+        assert builder.last_entry_type == "assistant"
+
+    def test_consecutive_assistants_share_message_id(self):
+        builder = TranscriptBuilder()
+        builder.append_user("hi")
+        builder.append_assistant(
+            content_blocks=[{"type": "text", "text": "part 1"}],
+            model="m",
+        )
+        builder.append_assistant(
+            content_blocks=[{"type": "text", "text": "part 2"}],
+            model="m",
+        )
+        output = builder.to_jsonl()
+        entries = [json.loads(line) for line in output.strip().split("\n")]
+        # The two assistant entries share the same message ID
+        assert entries[1]["message"]["id"] == entries[2]["message"]["id"]
+
+    def test_non_consecutive_assistants_get_different_ids(self):
+        builder = TranscriptBuilder()
+        builder.append_user("q1")
+        builder.append_assistant(
+            content_blocks=[{"type": "text", "text": "a1"}],
+            model="m",
+        )
+        builder.append_user("q2")
+        builder.append_assistant(
+            content_blocks=[{"type": "text", "text": "a2"}],
+            model="m",
+        )
+        output = builder.to_jsonl()
+        entries = [json.loads(line) for line in output.strip().split("\n")]
+        assert entries[1]["message"]["id"] != entries[3]["message"]["id"]
+
+
+class TestLoadPrevious:
+    def test_loads_valid_entries(self):
+        content = _make_jsonl(USER_MSG, ASST_MSG)
+        builder = TranscriptBuilder()
+        builder.load_previous(content)
+        assert builder.entry_count == 2
+
+    def test_skips_empty_content(self):
+        builder = TranscriptBuilder()
+        builder.load_previous("")
+        assert builder.is_empty
+        builder.load_previous("   ")
+        assert builder.is_empty
+
+    def test_skips_strippable_types(self):
+        progress = {"type": "progress", "uuid": "p1", "message": {}}
+        content = _make_jsonl(USER_MSG, progress, ASST_MSG)
+        builder = TranscriptBuilder()
+        builder.load_previous(content)
+        assert builder.entry_count == 2  # progress was skipped
+
+    def test_preserves_compact_summary(self):
+        compact = {
+            "type": "summary",
+            "uuid": "cs1",
+            "isCompactSummary": True,
+            "message": {"role": "user", "content": "summary"},
+        }
+        content = _make_jsonl(compact, ASST_MSG)
+        builder = TranscriptBuilder()
+        builder.load_previous(content)
+        assert builder.entry_count == 2
+
+    def test_skips_invalid_json_lines(self):
+        content = '{"type":"user","uuid":"u1","message":{}}\nnot-valid-json\n'
+        builder = TranscriptBuilder()
+        builder.load_previous(content)
+        assert builder.entry_count == 1
+
+
+class TestToJsonl:
+    def test_roundtrip(self):
+        builder = TranscriptBuilder()
+        builder.append_user("hello", uuid="u1")
+        builder.append_assistant(
+            content_blocks=[{"type": "text", "text": "world"}],
+            model="m",
+        )
+        output = builder.to_jsonl()
+        assert output.endswith("\n")
+        lines = output.strip().split("\n")
+        assert len(lines) == 2
+        for line in lines:
+            parsed = json.loads(line)
+            assert "type" in parsed
+            assert "uuid" in parsed
+            assert "message" in parsed
+
+
+class TestReplaceEntries:
+    def test_replaces_all_entries(self):
+        builder = TranscriptBuilder()
+        builder.append_user("old")
+        builder.append_assistant(
+            content_blocks=[{"type": "text", "text": "old answer"}], model="m"
+        )
+        assert builder.entry_count == 2
+
+        compacted = [
+            {
+                "type": "summary",
+                "uuid": "cs1",
+                "isCompactSummary": True,
+                "message": {"role": "user", "content": "compacted"},
+            }
+        ]
+        builder.replace_entries(compacted)
+        assert builder.entry_count == 1
+
+    def test_empty_replacement_keeps_existing(self):
+        builder = TranscriptBuilder()
+        builder.append_user("keep me")
+        builder.replace_entries([])
+        assert builder.entry_count == 1
+
+
+class TestParseEntry:
+    def test_filters_strippable_non_compact(self):
+        result = TranscriptBuilder._parse_entry(
+            {"type": "progress", "uuid": "p1", "message": {}}
+        )
+        assert result is None
+
+    def test_keeps_compact_summary(self):
+        result = TranscriptBuilder._parse_entry(
+            {
+                "type": "summary",
+                "uuid": "cs1",
+                "isCompactSummary": True,
+                "message": {},
+            }
+        )
+        assert result is not None
+        assert result.isCompactSummary is True
+
+    def test_generates_uuid_if_missing(self):
+        result = TranscriptBuilder._parse_entry(
+            {"type": "user", "message": {"role": "user", "content": "hi"}}
+        )
+        assert result is not None
+        assert result.uuid  # Should be a generated UUID
--- a/autogpt_platform/backend/backend/copilot/transcript_test.py
+++ b/autogpt_platform/backend/backend/copilot/transcript_test.py
@@ -0,0 +1,726 @@
+"""Tests for canonical transcript module (backend.copilot.transcript).
+
+Covers pure helper functions that are not exercised by the SDK re-export tests.
+"""
+
+from __future__ import annotations
+
+from unittest.mock import MagicMock
+
+from backend.util import json
+
+from .transcript import (
+    TranscriptDownload,
+    _build_path_from_parts,
+    _find_last_assistant_entry,
+    _flatten_assistant_content,
+    _flatten_tool_result_content,
+    _messages_to_transcript,
+    _meta_storage_path_parts,
+    _rechain_tail,
+    _sanitize_id,
+    _storage_path_parts,
+    _transcript_to_messages,
+    strip_for_upload,
+    validate_transcript,
+)
+
+
+def _make_jsonl(*entries: dict) -> str:
+    return "\n".join(json.dumps(e) for e in entries) + "\n"
+
+
+# ---------------------------------------------------------------------------
+# _sanitize_id
+# ---------------------------------------------------------------------------
+
+
+class TestSanitizeId:
+    def test_uuid_passes_through(self):
+        assert _sanitize_id("abcdef12-3456-7890-abcd-ef1234567890") == (
+            "abcdef12-3456-7890-abcd-ef1234567890"
+        )
+
+    def test_strips_non_hex_characters(self):
+        # Only hex chars (0-9, a-f, A-F) and hyphens are kept
+        result = _sanitize_id("abc/../../etc/passwd")
+        assert "/" not in result
+        assert "." not in result
+        # 't', 'p', 's', 'w' are not hex chars, so they are stripped
+        assert all(c in "0123456789abcdefABCDEF-" for c in result)
+
+    def test_truncates_to_max_len(self):
+        long_id = "a" * 100
+        result = _sanitize_id(long_id, max_len=10)
+        assert len(result) == 10
+
+    def test_empty_returns_unknown(self):
+        assert _sanitize_id("") == "unknown"
+
+    def test_none_returns_unknown(self):
+        assert _sanitize_id(None) == "unknown"  # type: ignore[arg-type]
+
+    def test_special_chars_only_returns_unknown(self):
+        assert _sanitize_id("!@#$%^&*()") == "unknown"
+
+
+# ---------------------------------------------------------------------------
+# _storage_path_parts / _meta_storage_path_parts
+# ---------------------------------------------------------------------------
+
+
+class TestStoragePathParts:
+    def test_returns_triple(self):
+        prefix, uid, fname = _storage_path_parts("user-1", "sess-2")
+        assert prefix == "chat-transcripts"
+        assert "e" in uid  # hex chars from "user-1" sanitized
+        assert fname.endswith(".jsonl")
+
+    def test_meta_returns_meta_json(self):
+        prefix, uid, fname = _meta_storage_path_parts("user-1", "sess-2")
+        assert prefix == "chat-transcripts"
+        assert fname.endswith(".meta.json")
+
+
+# ---------------------------------------------------------------------------
+# _build_path_from_parts
+# ---------------------------------------------------------------------------
+
+
+class TestBuildPathFromParts:
+    def test_gcs_backend(self):
+        from backend.util.workspace_storage import GCSWorkspaceStorage
+
+        mock_gcs = MagicMock(spec=GCSWorkspaceStorage)
+        mock_gcs.bucket_name = "my-bucket"
+        path = _build_path_from_parts(("wid", "fid", "file.jsonl"), mock_gcs)
+        assert path == "gcs://my-bucket/workspaces/wid/fid/file.jsonl"
+
+    def test_local_backend(self):
+        # Use a plain object (not MagicMock) so isinstance(GCSWorkspaceStorage) is False
+        local_backend = type("LocalBackend", (), {})()
+        path = _build_path_from_parts(("wid", "fid", "file.jsonl"), local_backend)
+        assert path == "local://wid/fid/file.jsonl"
+
+
+# ---------------------------------------------------------------------------
+# TranscriptDownload dataclass
+# ---------------------------------------------------------------------------
+
+
+class TestTranscriptDownload:
+    def test_defaults(self):
+        td = TranscriptDownload(content="hello")
+        assert td.content == "hello"
+        assert td.message_count == 0
+        assert td.uploaded_at == 0.0
+
+    def test_custom_values(self):
+        td = TranscriptDownload(content="data", message_count=5, uploaded_at=123.45)
+        assert td.message_count == 5
+        assert td.uploaded_at == 123.45
+
+
+# ---------------------------------------------------------------------------
+# _flatten_assistant_content
+# ---------------------------------------------------------------------------
+
+
+class TestFlattenAssistantContent:
+    def test_text_blocks(self):
+        blocks = [
+            {"type": "text", "text": "Hello"},
+            {"type": "text", "text": "World"},
+        ]
+        assert _flatten_assistant_content(blocks) == "Hello\nWorld"
+
+    def test_thinking_blocks_stripped(self):
+        blocks = [
+            {"type": "thinking", "thinking": "hmm..."},
+            {"type": "text", "text": "answer"},
+            {"type": "redacted_thinking", "data": "secret"},
+        ]
+        assert _flatten_assistant_content(blocks) == "answer"
+
+    def test_tool_use_blocks_stripped(self):
+        blocks = [
+            {"type": "text", "text": "I'll run a tool"},
+            {"type": "tool_use", "name": "bash", "id": "tc1", "input": {}},
+        ]
+        assert _flatten_assistant_content(blocks) == "I'll run a tool"
+
+    def test_string_blocks(self):
+        blocks = ["hello", "world"]
+        assert _flatten_assistant_content(blocks) == "hello\nworld"
+
+    def test_empty_blocks(self):
+        assert _flatten_assistant_content([]) == ""
+
+    def test_unknown_dict_blocks_skipped(self):
+        blocks = [{"type": "image", "data": "base64..."}]
+        assert _flatten_assistant_content(blocks) == ""
+
+
+# ---------------------------------------------------------------------------
+# _flatten_tool_result_content
+# ---------------------------------------------------------------------------
+
+
+class TestFlattenToolResultContent:
+    def test_tool_result_with_text_content(self):
+        blocks = [
+            {
+                "type": "tool_result",
+                "tool_use_id": "tc1",
+                "content": [{"type": "text", "text": "output data"}],
+            }
+        ]
+        assert _flatten_tool_result_content(blocks) == "output data"
+
+    def test_tool_result_with_string_content(self):
+        blocks = [
+            {"type": "tool_result", "tool_use_id": "tc1", "content": "simple string"}
+        ]
+        assert _flatten_tool_result_content(blocks) == "simple string"
+
+    def test_tool_result_with_image_placeholder(self):
+        blocks = [
+            {
+                "type": "tool_result",
+                "tool_use_id": "tc1",
+                "content": [{"type": "image", "data": "base64..."}],
+            }
+        ]
+        assert _flatten_tool_result_content(blocks) == "[__image__]"
+
+    def test_tool_result_with_document_placeholder(self):
+        blocks = [
+            {
+                "type": "tool_result",
+                "tool_use_id": "tc1",
+                "content": [{"type": "document", "data": "base64..."}],
+            }
+        ]
+        assert _flatten_tool_result_content(blocks) == "[__document__]"
+
+    def test_tool_result_with_none_content(self):
+        blocks = [{"type": "tool_result", "tool_use_id": "tc1", "content": None}]
+        assert _flatten_tool_result_content(blocks) == ""
+
+    def test_text_block_outside_tool_result(self):
+        blocks = [{"type": "text", "text": "standalone"}]
+        assert _flatten_tool_result_content(blocks) == "standalone"
+
+    def test_unknown_dict_block_placeholder(self):
+        blocks = [{"type": "custom_widget", "data": "x"}]
+        assert _flatten_tool_result_content(blocks) == "[__custom_widget__]"
+
+    def test_string_blocks(self):
+        blocks = ["raw text"]
+        assert _flatten_tool_result_content(blocks) == "raw text"
+
+    def test_empty_blocks(self):
+        assert _flatten_tool_result_content([]) == ""
+
+    def test_mixed_content_in_tool_result(self):
+        blocks = [
+            {
+                "type": "tool_result",
+                "tool_use_id": "tc1",
+                "content": [
+                    {"type": "text", "text": "line1"},
+                    {"type": "image", "data": "..."},
+                    "raw string",
+                ],
+            }
+        ]
+        result = _flatten_tool_result_content(blocks)
+        assert "line1" in result
+        assert "[__image__]" in result
+        assert "raw string" in result
+
+    def test_tool_result_with_dict_without_text_key(self):
+        blocks = [
+            {
+                "type": "tool_result",
+                "tool_use_id": "tc1",
+                "content": [{"count": 42}],
+            }
+        ]
+        result = _flatten_tool_result_content(blocks)
+        assert "42" in result
+
+    def test_tool_result_text_block_with_none_text(self):
+        blocks = [
+            {
+                "type": "tool_result",
+                "tool_use_id": "tc1",
+                "content": [{"type": "text", "text": None}],
+            }
+        ]
+        result = _flatten_tool_result_content(blocks)
+        assert result == "None"
+
+
+# ---------------------------------------------------------------------------
+# _transcript_to_messages
+# ---------------------------------------------------------------------------
+
+USER_ENTRY = {
+    "type": "user",
+    "uuid": "u1",
+    "parentUuid": "",
+    "message": {"role": "user", "content": "hello"},
+}
+ASST_ENTRY = {
+    "type": "assistant",
+    "uuid": "a1",
+    "parentUuid": "u1",
+    "message": {
+        "role": "assistant",
+        "id": "msg_1",
+        "content": [{"type": "text", "text": "hi there"}],
+    },
+}
+PROGRESS_ENTRY = {
+    "type": "progress",
+    "uuid": "p1",
+    "parentUuid": "u1",
+    "data": {},
+}
+
+
+class TestTranscriptToMessages:
+    def test_basic_conversion(self):
+        content = _make_jsonl(USER_ENTRY, ASST_ENTRY)
+        messages = _transcript_to_messages(content)
+        assert len(messages) == 2
+        assert messages[0] == {"role": "user", "content": "hello"}
+        assert messages[1]["role"] == "assistant"
+        assert messages[1]["content"] == "hi there"
+
+    def test_skips_strippable_types(self):
+        content = _make_jsonl(USER_ENTRY, PROGRESS_ENTRY, ASST_ENTRY)
+        messages = _transcript_to_messages(content)
+        assert len(messages) == 2
+
+    def test_skips_entries_without_role(self):
+        no_role = {"type": "user", "uuid": "x", "message": {"content": "no role"}}
+        content = _make_jsonl(no_role)
+        messages = _transcript_to_messages(content)
+        assert len(messages) == 0
+
+    def test_handles_string_content(self):
+        entry = {
+            "type": "user",
+            "uuid": "u1",
+            "message": {"role": "user", "content": "plain string"},
+        }
+        content = _make_jsonl(entry)
+        messages = _transcript_to_messages(content)
+        assert messages[0]["content"] == "plain string"
+
+    def test_handles_tool_result_content(self):
+        entry = {
+            "type": "user",
+            "uuid": "u1",
+            "message": {
+                "role": "user",
+                "content": [
+                    {"type": "tool_result", "tool_use_id": "tc1", "content": "output"}
+                ],
+            },
+        }
+        content = _make_jsonl(entry)
+        messages = _transcript_to_messages(content)
+        assert messages[0]["content"] == "output"
+
+    def test_handles_none_content(self):
+        entry = {
+            "type": "assistant",
+            "uuid": "a1",
+            "message": {"role": "assistant", "content": None},
+        }
+        content = _make_jsonl(entry)
+        messages = _transcript_to_messages(content)
+        assert messages[0]["content"] == ""
+
+    def test_skips_invalid_json(self):
+        content = "not valid json\n"
+        messages = _transcript_to_messages(content)
+        assert len(messages) == 0
+
+    def test_preserves_compact_summary(self):
+        compact = {
+            "type": "summary",
+            "uuid": "cs1",
+            "isCompactSummary": True,
+            "message": {"role": "user", "content": "summary of conversation"},
+        }
+        content = _make_jsonl(compact)
+        messages = _transcript_to_messages(content)
+        assert len(messages) == 1
+
+    def test_strips_summary_without_compact_flag(self):
+        summary = {
+            "type": "summary",
+            "uuid": "s1",
+            "message": {"role": "user", "content": "summary"},
+        }
+        content = _make_jsonl(summary)
+        messages = _transcript_to_messages(content)
+        assert len(messages) == 0
+
+
+# ---------------------------------------------------------------------------
+# _messages_to_transcript
+# ---------------------------------------------------------------------------
+
+
+class TestMessagesToTranscript:
+    def test_basic_roundtrip(self):
+        messages = [
+            {"role": "user", "content": "hello"},
+            {"role": "assistant", "content": "world"},
+        ]
+        result = _messages_to_transcript(messages)
+        assert result.endswith("\n")
+        lines = result.strip().split("\n")
+        assert len(lines) == 2
+
+        user_entry = json.loads(lines[0])
+        assert user_entry["type"] == "user"
+        assert user_entry["message"]["role"] == "user"
+        assert user_entry["message"]["content"] == "hello"
+        assert user_entry["parentUuid"] == ""
+
+        asst_entry = json.loads(lines[1])
+        assert asst_entry["type"] == "assistant"
+        assert asst_entry["message"]["role"] == "assistant"
+        assert asst_entry["message"]["content"] == [{"type": "text", "text": "world"}]
+        assert asst_entry["parentUuid"] == user_entry["uuid"]
+
+    def test_empty_messages(self):
+        assert _messages_to_transcript([]) == ""
+
+    def test_assistant_has_message_envelope(self):
+        messages = [{"role": "assistant", "content": "test"}]
+        result = _messages_to_transcript(messages)
+        entry = json.loads(result.strip())
+        msg = entry["message"]
+        assert "id" in msg
+        assert msg["id"].startswith("msg_compact_")
+        assert msg["type"] == "message"
+        assert msg["stop_reason"] == "end_turn"
+        assert msg["stop_sequence"] is None
+
+    def test_uuid_chain(self):
+        messages = [
+            {"role": "user", "content": "a"},
+            {"role": "assistant", "content": "b"},
+            {"role": "user", "content": "c"},
+        ]
+        result = _messages_to_transcript(messages)
+        lines = result.strip().split("\n")
+        entries = [json.loads(line) for line in lines]
+        assert entries[0]["parentUuid"] == ""
+        assert entries[1]["parentUuid"] == entries[0]["uuid"]
+        assert entries[2]["parentUuid"] == entries[1]["uuid"]
+
+    def test_assistant_with_empty_content(self):
+        messages = [{"role": "assistant", "content": ""}]
+        result = _messages_to_transcript(messages)
+        entry = json.loads(result.strip())
+        assert entry["message"]["content"] == []
+
+
+# ---------------------------------------------------------------------------
+# _find_last_assistant_entry
+# ---------------------------------------------------------------------------
+
+
+class TestFindLastAssistantEntry:
+    def test_splits_at_last_assistant(self):
+        user = {
+            "type": "user",
+            "uuid": "u1",
+            "message": {"role": "user", "content": "hi"},
+        }
+        asst = {
+            "type": "assistant",
+            "uuid": "a1",
+            "message": {"role": "assistant", "id": "msg1", "content": "answer"},
+        }
+        content = _make_jsonl(user, asst)
+        prefix, tail = _find_last_assistant_entry(content)
+        assert len(prefix) == 1
+        assert len(tail) == 1
+
+    def test_no_assistant_returns_all_in_prefix(self):
+        user1 = {
+            "type": "user",
+            "uuid": "u1",
+            "message": {"role": "user", "content": "hi"},
+        }
+        user2 = {
+            "type": "user",
+            "uuid": "u2",
+            "message": {"role": "user", "content": "hey"},
+        }
+        content = _make_jsonl(user1, user2)
+        prefix, tail = _find_last_assistant_entry(content)
+        assert len(prefix) == 2
+        assert len(tail) == 0
+
+    def test_multi_entry_turn_preserved(self):
+        user = {
+            "type": "user",
+            "uuid": "u1",
+            "message": {"role": "user", "content": "q"},
+        }
+        asst1 = {
+            "type": "assistant",
+            "uuid": "a1",
+            "message": {
+                "role": "assistant",
+                "id": "msg_turn",
+                "content": [{"type": "thinking", "thinking": "hmm"}],
+            },
+        }
+        asst2 = {
+            "type": "assistant",
+            "uuid": "a2",
+            "message": {
+                "role": "assistant",
+                "id": "msg_turn",
+                "content": [{"type": "text", "text": "answer"}],
+            },
+        }
+        content = _make_jsonl(user, asst1, asst2)
+        prefix, tail = _find_last_assistant_entry(content)
+        assert len(prefix) == 1  # just the user
+        assert len(tail) == 2  # both assistant entries
+
+    def test_assistant_without_id(self):
+        user = {
+            "type": "user",
+            "uuid": "u1",
+            "message": {"role": "user", "content": "q"},
+        }
+        asst = {
+            "type": "assistant",
+            "uuid": "a1",
+            "message": {"role": "assistant", "content": "no id"},
+        }
+        content = _make_jsonl(user, asst)
+        prefix, tail = _find_last_assistant_entry(content)
+        assert len(prefix) == 1
+        assert len(tail) == 1
+
+    def test_trailing_user_after_assistant(self):
+        user1 = {
+            "type": "user",
+            "uuid": "u1",
+            "message": {"role": "user", "content": "q"},
+        }
+        asst = {
+            "type": "assistant",
+            "uuid": "a1",
+            "message": {"role": "assistant", "id": "msg1", "content": "a"},
+        }
+        user2 = {
+            "type": "user",
+            "uuid": "u2",
+            "message": {"role": "user", "content": "follow"},
+        }
+        content = _make_jsonl(user1, asst, user2)
+        prefix, tail = _find_last_assistant_entry(content)
+        assert len(prefix) == 1  # user1
+        assert len(tail) == 2  # asst + user2
+
+
+# ---------------------------------------------------------------------------
+# _rechain_tail
+# ---------------------------------------------------------------------------
+
+
+class TestRechainTail:
+    def test_empty_tail(self):
+        assert _rechain_tail("some prefix\n", []) == ""
+
+    def test_patches_first_entry_parent(self):
+        prefix_entry = {"uuid": "last-prefix-uuid", "type": "user", "message": {}}
+        prefix = json.dumps(prefix_entry) + "\n"
+
+        tail_entry = {
+            "uuid": "t1",
+            "parentUuid": "old-parent",
+            "type": "assistant",
+            "message": {},
+        }
+        tail_lines = [json.dumps(tail_entry)]
+
+        result = _rechain_tail(prefix, tail_lines)
+        parsed = json.loads(result.strip())
+        assert parsed["parentUuid"] == "last-prefix-uuid"
+
+    def test_chains_consecutive_tail_entries(self):
+        prefix_entry = {"uuid": "p1", "type": "user", "message": {}}
+        prefix = json.dumps(prefix_entry) + "\n"
+
+        t1 = {"uuid": "t1", "parentUuid": "old1", "type": "assistant", "message": {}}
+        t2 = {"uuid": "t2", "parentUuid": "old2", "type": "user", "message": {}}
+        tail_lines = [json.dumps(t1), json.dumps(t2)]
+
+        result = _rechain_tail(prefix, tail_lines)
+        entries = [json.loads(line) for line in result.strip().split("\n")]
+        assert entries[0]["parentUuid"] == "p1"
+        assert entries[1]["parentUuid"] == "t1"
+
+    def test_non_dict_lines_passed_through(self):
+        prefix_entry = {"uuid": "p1", "type": "user", "message": {}}
+        prefix = json.dumps(prefix_entry) + "\n"
+
+        tail_lines = ["not-a-json-dict"]
+        result = _rechain_tail(prefix, tail_lines)
+        assert "not-a-json-dict" in result
+
+
+# ---------------------------------------------------------------------------
+# strip_for_upload (combined single-parse)
+# ---------------------------------------------------------------------------
+
+
+class TestStripForUpload:
+    def test_strips_progress_and_thinking(self):
+        user = {
+            "type": "user",
+            "uuid": "u1",
+            "parentUuid": "",
+            "message": {"role": "user", "content": "hi"},
+        }
+        progress = {"type": "progress", "uuid": "p1", "parentUuid": "u1", "data": {}}
+        asst_old = {
+            "type": "assistant",
+            "uuid": "a1",
+            "parentUuid": "p1",
+            "message": {
+                "role": "assistant",
+                "id": "msg_old",
+                "content": [
+                    {"type": "thinking", "thinking": "stale thinking"},
+                    {"type": "text", "text": "old answer"},
+                ],
+            },
+        }
+        user2 = {
+            "type": "user",
+            "uuid": "u2",
+            "parentUuid": "a1",
+            "message": {"role": "user", "content": "next"},
+        }
+        asst_new = {
+            "type": "assistant",
+            "uuid": "a2",
+            "parentUuid": "u2",
+            "message": {
+                "role": "assistant",
+                "id": "msg_new",
+                "content": [
+                    {"type": "thinking", "thinking": "fresh thinking"},
+                    {"type": "text", "text": "new answer"},
+                ],
+            },
+        }
+        content = _make_jsonl(user, progress, asst_old, user2, asst_new)
+        result = strip_for_upload(content)
+
+        lines = result.strip().split("\n")
+        # Progress should be stripped -> 4 entries remain
+        assert len(lines) == 4
+
+        # Progress is stripped, so its child (asst_old) must be reparented
+        entries = [json.loads(line) for line in lines]
+        types = [e.get("type") for e in entries]
+        assert "progress" not in types
+
+        # Old assistant thinking stripped, new assistant thinking preserved
+        old_asst = next(
+            e for e in entries if e.get("message", {}).get("id") == "msg_old"
+        )
+        old_content = old_asst["message"]["content"]
+        old_types = [b["type"] for b in old_content if isinstance(b, dict)]
+        assert "thinking" not in old_types
+        assert "text" in old_types
+
+        new_asst = next(
+            e for e in entries if e.get("message", {}).get("id") == "msg_new"
+        )
+        new_content = new_asst["message"]["content"]
+        new_types = [b["type"] for b in new_content if isinstance(b, dict)]
+        assert "thinking" in new_types  # last assistant preserved
+
+    def test_empty_content(self):
+        result = strip_for_upload("")
+        # Empty string produces a single empty line after split, resulting in "\n"
+        assert result.strip() == ""
+
+    def test_preserves_compact_summary(self):
+        compact = {
+            "type": "summary",
+            "uuid": "cs1",
+            "isCompactSummary": True,
+            "message": {"role": "user", "content": "summary"},
+        }
+        asst = {
+            "type": "assistant",
+            "uuid": "a1",
+            "parentUuid": "cs1",
+            "message": {"role": "assistant", "id": "msg1", "content": "answer"},
+        }
+        content = _make_jsonl(compact, asst)
+        result = strip_for_upload(content)
+        lines = result.strip().split("\n")
+        assert len(lines) == 2
+
+    def test_no_assistant_entries(self):
+        user = {
+            "type": "user",
+            "uuid": "u1",
+            "message": {"role": "user", "content": "hi"},
+        }
+        content = _make_jsonl(user)
+        result = strip_for_upload(content)
+        lines = result.strip().split("\n")
+        assert len(lines) == 1
+
+
+# ---------------------------------------------------------------------------
+# validate_transcript (additional edge cases)
+# ---------------------------------------------------------------------------
+
+
+class TestValidateTranscript:
+    def test_valid_with_assistant(self):
+        content = _make_jsonl(
+            USER_ENTRY,
+            ASST_ENTRY,
+        )
+        assert validate_transcript(content) is True
+
+    def test_none_returns_false(self):
+        assert validate_transcript(None) is False
+
+    def test_whitespace_only_returns_false(self):
+        assert validate_transcript("   \n  ") is False
+
+    def test_no_assistant_returns_false(self):
+        content = _make_jsonl(USER_ENTRY)
+        assert validate_transcript(content) is False
+
+    def test_invalid_json_returns_false(self):
+        assert validate_transcript("not json\n") is False
+
+    def test_assistant_only_is_valid(self):
+        content = _make_jsonl(ASST_ENTRY)
+        assert validate_transcript(content) is True
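Review note on `TestRechainTail` above: the tests pin down four behaviours (an empty tail yields `""`, the first tail entry is re-parented onto the last prefix entry, later tail entries chain onto their predecessor, and unparseable lines pass through untouched). The real `_rechain_tail` is not part of this hunk, so the following is only a minimal sketch of logic that would satisfy those tests. The name `rechain_tail_sketch` and every implementation detail are assumptions, not the project's code.

```python
import json


def rechain_tail_sketch(prefix: str, tail_lines: list) -> str:
    """Hypothetical re-chaining of tail entries onto a JSONL prefix."""
    if not tail_lines:
        return ""

    # Find the uuid of the last parseable dict entry in the prefix.
    last_uuid = ""
    for line in reversed(prefix.strip().split("\n")):
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(entry, dict) and "uuid" in entry:
            last_uuid = entry["uuid"]
            break

    out = []
    for line in tail_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            out.append(line)  # not JSON: pass through unchanged
            continue
        if not isinstance(entry, dict):
            out.append(line)  # JSON but not an object: pass through
            continue
        entry["parentUuid"] = last_uuid
        last_uuid = entry.get("uuid", last_uuid)
        out.append(json.dumps(entry))
    return "\n".join(out) + "\n"
```

Each patched entry becomes the parent of the next one, which is what keeps the chain unbroken after entries in the middle of a transcript are dropped.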
--- a/autogpt_platform/backend/backend/data/block_cost_config.py
+++ b/autogpt_platform/backend/backend/data/block_cost_config.py
@@ -147,6 +147,19 @@ MODEL_COST: dict[LlmModel, int] = {
    LlmModel.KIMI_K2: 1,
    LlmModel.QWEN3_235B_A22B_THINKING: 1,
    LlmModel.QWEN3_CODER: 9,
+    # Z.ai (Zhipu) models
+    LlmModel.ZAI_GLM_4_32B: 1,
+    LlmModel.ZAI_GLM_4_5: 2,
+    LlmModel.ZAI_GLM_4_5_AIR: 1,
+    LlmModel.ZAI_GLM_4_5_AIR_FREE: 1,
+    LlmModel.ZAI_GLM_4_5V: 2,
+    LlmModel.ZAI_GLM_4_6: 1,
+    LlmModel.ZAI_GLM_4_6V: 1,
+    LlmModel.ZAI_GLM_4_7: 1,
+    LlmModel.ZAI_GLM_4_7_FLASH: 1,
+    LlmModel.ZAI_GLM_5: 2,
+    LlmModel.ZAI_GLM_5_TURBO: 4,
+    LlmModel.ZAI_GLM_5V_TURBO: 4,
    # v0 by Vercel models
    LlmModel.V0_1_5_MD: 1,
    LlmModel.V0_1_5_LG: 2,
--- a/autogpt_platform/backend/backend/data/user.py
+++ b/autogpt_platform/backend/backend/data/user.py
@@ -82,6 +82,28 @@ async def get_user_by_email(email: str) -> Optional[User]:
        raise DatabaseError(f"Failed to get user by email {email}: {e}") from e


+async def search_users(query: str, limit: int = 20) -> list[tuple[str, str | None]]:
+    """Search users by partial email or name (minimum 3 characters).
+
+    Returns a list of ``(user_id, email)`` tuples, up to *limit* results.
+    Searches the User table directly — no dependency on credit history.
+    """
+    query = query.strip()
+    if len(query) < 3:
+        return []
+    users = await prisma.user.find_many(
+        where={
+            "OR": [
+                {"email": {"contains": query, "mode": "insensitive"}},
+                {"name": {"contains": query, "mode": "insensitive"}},
+            ],
+        },
+        take=limit,
+        order={"email": "asc"},
+    )
+    return [(u.id, u.email) for u in users]
+
+
 async def update_user_email(user_id: str, email: str):
    try:
        # Get old email first for cache invalidation
--- a/autogpt_platform/backend/backend/util/cache.py
+++ b/autogpt_platform/backend/backend/util/cache.py
@@ -121,10 +121,16 @@ def _make_hashable_key(


 def _make_redis_key(key: tuple[Any, ...], func_name: str) -> str:
-    """Convert a hashable key tuple to a Redis key string."""
-    # Ensure key is already hashable
-    hashable_key = key if isinstance(key, tuple) else (key,)
-    return f"cache:{func_name}:{hash(hashable_key)}"
+    """Convert a hashable key tuple to a Redis key string.
+
+    Uses SHA-256 instead of Python's built-in ``hash()`` because ``hash()``
+    is randomised per-process (``PYTHONHASHSEED``).  In a multi-pod
+    deployment every pod must derive the **same** Redis key for the same
+    arguments, otherwise cache lookups and invalidations silently miss.
+    """
+    key_bytes = repr(key).encode()
+    digest = hashlib.sha256(key_bytes).hexdigest()
+    return f"cache:{func_name}:{digest}"


@runtime_checkable
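Review note on the `_make_redis_key` change above: the docstring's argument is that a content-derived digest is identical in every process, whereas `hash()` of a string varies with `PYTHONHASHSEED`. A minimal standalone sketch of the same derivation follows. The function name `make_redis_key` and the sample arguments are illustrative only.

```python
import hashlib
from typing import Any, Tuple


def make_redis_key(key: Tuple[Any, ...], func_name: str) -> str:
    """Derive a process-independent Redis key from a key tuple."""
    # repr() of str/int/tuple values is deterministic, so the digest is too.
    digest = hashlib.sha256(repr(key).encode()).hexdigest()
    return f"cache:{func_name}:{digest}"


example = make_redis_key(("user-123", ("limit", 20)), "get_user")
```

One caveat with this approach: it is only as stable as `repr(key)`. Values that fall back to the default `object.__repr__` embed a memory address, so a key containing such an object would still differ between pods.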
--- a/autogpt_platform/backend/backend/util/feature_flag.py
+++ b/autogpt_platform/backend/backend/util/feature_flag.py
@@ -1,5 +1,6 @@
 import contextlib
 import logging
+import os
 from enum import Enum
 from functools import wraps
 from typing import Any, Awaitable, Callable, TypeVar
@@ -38,6 +39,7 @@ class Flag(str, Enum):
    AGENT_ACTIVITY = "agent-activity"
    ENABLE_PLATFORM_PAYMENT = "enable-platform-payment"
    CHAT = "chat"
+    CHAT_MODE_OPTION = "chat-mode-option"
    COPILOT_SDK = "copilot-sdk"
    COPILOT_DAILY_TOKEN_LIMIT = "copilot-daily-token-limit"
    COPILOT_WEEKLY_TOKEN_LIMIT = "copilot-weekly-token-limit"
@@ -165,6 +167,30 @@ async def get_feature_flag_value(
        return default


+def _env_flag_override(flag_key: Flag) -> bool | None:
+    """Return a local override for ``flag_key`` from the environment.
+
+    Set ``FORCE_FLAG_<NAME>=true|false`` (``NAME`` = flag value with
+    ``-`` → ``_``, upper-cased) to bypass LaunchDarkly for a single
+    flag in local dev or tests.  Returns ``None`` when no override
+    is configured so the caller falls through to LaunchDarkly.
+
+    The ``NEXT_PUBLIC_FORCE_FLAG_<NAME>`` prefix is also accepted so a
+    single shared env var can toggle a flag across backend and
+    frontend (the frontend requires the ``NEXT_PUBLIC_`` prefix to
+    expose the value to the browser bundle).
+
+    Example: ``FORCE_FLAG_CHAT_MODE_OPTION=true`` forces
+    ``Flag.CHAT_MODE_OPTION`` on regardless of LaunchDarkly.
+    """
+    suffix = flag_key.value.upper().replace("-", "_")
+    for prefix in ("FORCE_FLAG_", "NEXT_PUBLIC_FORCE_FLAG_"):
+        raw = os.environ.get(prefix + suffix)
+        if raw is not None:
+            return raw.strip().lower() in ("1", "true", "yes", "on")
+    return None
+
+
 async def is_feature_enabled(
    flag_key: Flag,
    user_id: str,
@@ -181,6 +207,11 @@ async def is_feature_enabled(
    Returns:
        True if feature is enabled, False otherwise
    """
+    override = _env_flag_override(flag_key)
+    if override is not None:
+        logger.debug(f"Feature flag {flag_key} overridden by env: {override}")
+        return override
+
    result = await get_feature_flag_value(flag_key.value, user_id, default)

    # If the result is already a boolean, return it
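Review note on `_env_flag_override` above: the lookup order and value parsing can be tried in isolation with the sketch below. It mirrors the logic in the hunk, but takes the environment as an explicit mapping (the `environ` parameter is an addition for illustration) and uses a two-member stand-in for the real `Flag` enum.

```python
import os
from enum import Enum
from typing import Mapping, Optional


class Flag(str, Enum):  # stand-in with two of the real flag values
    CHAT = "chat"
    CHAT_MODE_OPTION = "chat-mode-option"


def env_flag_override(
    flag: Flag, environ: Optional[Mapping[str, str]] = None
) -> Optional[bool]:
    """Return True/False if an override is set, else None."""
    env = os.environ if environ is None else environ
    suffix = flag.value.upper().replace("-", "_")
    # FORCE_FLAG_ is checked first, so it wins over NEXT_PUBLIC_FORCE_FLAG_.
    for prefix in ("FORCE_FLAG_", "NEXT_PUBLIC_FORCE_FLAG_"):
        raw = env.get(prefix + suffix)
        if raw is not None:
            return raw.strip().lower() in ("1", "true", "yes", "on")
    return None
```

Note the consequence of the membership test: any value outside the accepted set, including a typo such as `ture`, forces the flag off rather than falling through to LaunchDarkly.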
--- a/autogpt_platform/backend/backend/util/feature_flag_test.py
+++ b/autogpt_platform/backend/backend/util/feature_flag_test.py
@@ -4,6 +4,7 @@ from ldclient import LDClient

 from backend.util.feature_flag import (
    Flag,
+    _env_flag_override,
    feature_flag,
    is_feature_enabled,
    mock_flag_variation,
@@ -111,3 +112,59 @@ async def test_is_feature_enabled_with_flag_enum(mocker):
    assert result is True
    # Should call with the flag's string value
    mock_get_feature_flag_value.assert_called_once()
+
+
+class TestEnvFlagOverride:
+    def test_force_flag_true(self, monkeypatch: pytest.MonkeyPatch):
+        monkeypatch.setenv("FORCE_FLAG_CHAT", "true")
+        assert _env_flag_override(Flag.CHAT) is True
+
+    def test_force_flag_false(self, monkeypatch: pytest.MonkeyPatch):
+        monkeypatch.setenv("FORCE_FLAG_CHAT", "false")
+        assert _env_flag_override(Flag.CHAT) is False
+
+    def test_next_public_prefix_true(self, monkeypatch: pytest.MonkeyPatch):
+        monkeypatch.setenv("NEXT_PUBLIC_FORCE_FLAG_CHAT", "true")
+        assert _env_flag_override(Flag.CHAT) is True
+
+    def test_unset_returns_none(self, monkeypatch: pytest.MonkeyPatch):
+        monkeypatch.delenv("FORCE_FLAG_CHAT", raising=False)
+        monkeypatch.delenv("NEXT_PUBLIC_FORCE_FLAG_CHAT", raising=False)
+        assert _env_flag_override(Flag.CHAT) is None
+
+    def test_invalid_value_returns_false(self, monkeypatch: pytest.MonkeyPatch):
+        monkeypatch.setenv("FORCE_FLAG_CHAT", "notaboolean")
+        assert _env_flag_override(Flag.CHAT) is False
+
+    def test_numeric_one_returns_true(self, monkeypatch: pytest.MonkeyPatch):
+        monkeypatch.setenv("FORCE_FLAG_CHAT", "1")
+        assert _env_flag_override(Flag.CHAT) is True
+
+    def test_yes_returns_true(self, monkeypatch: pytest.MonkeyPatch):
+        monkeypatch.setenv("FORCE_FLAG_CHAT", "yes")
+        assert _env_flag_override(Flag.CHAT) is True
+
+    def test_on_returns_true(self, monkeypatch: pytest.MonkeyPatch):
+        monkeypatch.setenv("FORCE_FLAG_CHAT", "on")
+        assert _env_flag_override(Flag.CHAT) is True
+
+    def test_hyphenated_flag_converts_to_underscore(
+        self, monkeypatch: pytest.MonkeyPatch
+    ):
+        monkeypatch.setenv("FORCE_FLAG_CHAT_MODE_OPTION", "true")
+        assert _env_flag_override(Flag.CHAT_MODE_OPTION) is True
+
+    def test_force_flag_takes_precedence_over_next_public(
+        self, monkeypatch: pytest.MonkeyPatch
+    ):
+        monkeypatch.setenv("FORCE_FLAG_CHAT", "false")
+        monkeypatch.setenv("NEXT_PUBLIC_FORCE_FLAG_CHAT", "true")
+        assert _env_flag_override(Flag.CHAT) is False
+
+    def test_whitespace_is_stripped(self, monkeypatch: pytest.MonkeyPatch):
+        monkeypatch.setenv("FORCE_FLAG_CHAT", "  true  ")
+        assert _env_flag_override(Flag.CHAT) is True
+
+    def test_case_insensitive_value(self, monkeypatch: pytest.MonkeyPatch):
+        monkeypatch.setenv("FORCE_FLAG_CHAT", "TRUE")
+        assert _env_flag_override(Flag.CHAT) is True
--- a/autogpt_platform/backend/backend/util/workspace.py
+++ b/autogpt_platform/backend/backend/util/workspace.py
@@ -155,6 +155,7 @@ class WorkspaceManager:
        path: Optional[str] = None,
        mime_type: Optional[str] = None,
        overwrite: bool = False,
+        metadata: Optional[dict] = None,
    ) -> WorkspaceFile:
        """
        Write file to workspace.
@@ -168,6 +169,7 @@ class WorkspaceManager:
            path: Virtual path (defaults to "/{filename}", session-scoped if session_id set)
            mime_type: MIME type (auto-detected if not provided)
            overwrite: Whether to overwrite existing file at path
+            metadata: Optional metadata dict (e.g., origin tracking)

        Returns:
            Created WorkspaceFile instance
@@ -246,6 +248,7 @@ class WorkspaceManager:
                    mime_type=mime_type,
                    size_bytes=len(content),
                    checksum=checksum,
+                    metadata=metadata,
                )
            except UniqueViolationError:
                if retries > 0:
--- a/autogpt_platform/backend/migrations/20260326200000_add_rate_limit_tier/migration.sql
+++ b/autogpt_platform/backend/migrations/20260326200000_add_rate_limit_tier/migration.sql
@@ -0,0 +1,5 @@
+-- CreateEnum
+CREATE TYPE "SubscriptionTier" AS ENUM ('FREE', 'PRO', 'BUSINESS', 'ENTERPRISE');
+
+-- AlterTable: add subscriptionTier column with default PRO (beta testing)
+ALTER TABLE "User" ADD COLUMN "subscriptionTier" "SubscriptionTier" NOT NULL DEFAULT 'PRO';
--- a/autogpt_platform/backend/schema.prisma
+++ b/autogpt_platform/backend/schema.prisma
@@ -40,6 +40,15 @@ model User {

  timezone String @default("not-set")

+  // CoPilot subscription tier — controls rate-limit multipliers.
+  // Multipliers applied in get_global_rate_limits(): FREE=1x, PRO=5x, BUSINESS=20x, ENTERPRISE=60x.
+  // NOTE: @default(PRO) is intentional for the beta period — all existing and new
+  // users receive PRO-level (5x) rate limits by default. The Python-level constant
+  // DEFAULT_TIER=FREE (in copilot/rate_limit.py) acts as a code-level fallback when
+  // the DB value is NULL or unrecognised. At GA, a migration will flip the column
+  // default to FREE and batch-update users to their billing-derived tiers.
+  subscriptionTier SubscriptionTier @default(PRO)
+
  // Relations

  AgentGraphs          AgentGraph[]
@@ -73,6 +82,13 @@ model User {
  OAuthRefreshTokens      OAuthRefreshToken[]
 }

+enum SubscriptionTier {
+  FREE
+  PRO
+  BUSINESS
+  ENTERPRISE
+}
+
 enum OnboardingStep {
  // Introductory onboarding (Library)
  WELCOME
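Review note on the `subscriptionTier` comment above: the schema documents the multipliers (FREE=1x, PRO=5x, BUSINESS=20x, ENTERPRISE=60x) and a code-level fallback to FREE for unrecognised values. The sketch below shows how such a lookup might behave. `TIER_MULTIPLIERS`, `resolve_tier`, and `scaled_limit` are hypothetical names, and the real logic in `get_global_rate_limits()` is not shown in this diff.

```python
from enum import Enum


class SubscriptionTier(str, Enum):
    FREE = "FREE"
    PRO = "PRO"
    BUSINESS = "BUSINESS"
    ENTERPRISE = "ENTERPRISE"


# Multipliers as stated in the schema.prisma comment.
TIER_MULTIPLIERS = {
    SubscriptionTier.FREE: 1,
    SubscriptionTier.PRO: 5,
    SubscriptionTier.BUSINESS: 20,
    SubscriptionTier.ENTERPRISE: 60,
}

DEFAULT_TIER = SubscriptionTier.FREE  # code-level fallback


def resolve_tier(raw) -> SubscriptionTier:
    """Map a raw DB value to a tier, falling back to DEFAULT_TIER."""
    try:
        return SubscriptionTier(raw)
    except ValueError:  # None or an unrecognised string
        return DEFAULT_TIER


def scaled_limit(base_limit: int, raw_tier) -> int:
    """Scale a base token limit by the tier's multiplier."""
    return base_limit * TIER_MULTIPLIERS[resolve_tier(raw_tier)]
```

Using the 2,500,000 daily figure from the snapshots as an example base, a PRO user would get 12,500,000 under this sketch, while a missing or unknown tier stays at the FREE value.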
--- a/autogpt_platform/backend/snapshots/get_rate_limit
+++ b/autogpt_platform/backend/snapshots/get_rate_limit
@@ -1,6 +1,7 @@
 {
  "daily_token_limit": 2500000,
  "daily_tokens_used": 500000,
+  "tier": "FREE",
  "user_email": "target@example.com",
  "user_id": "5e53486c-cf57-477e-ba2a-cb02dc828e1c",
  "weekly_token_limit": 12500000,
--- a/autogpt_platform/backend/snapshots/reset_user_usage_daily_and_weekly
+++ b/autogpt_platform/backend/snapshots/reset_user_usage_daily_and_weekly
@@ -1,6 +1,7 @@
 {
  "daily_token_limit": 2500000,
  "daily_tokens_used": 0,
+  "tier": "FREE",
  "user_email": "target@example.com",
  "user_id": "5e53486c-cf57-477e-ba2a-cb02dc828e1c",
  "weekly_token_limit": 12500000,
--- a/autogpt_platform/backend/snapshots/reset_user_usage_daily_only
+++ b/autogpt_platform/backend/snapshots/reset_user_usage_daily_only
@@ -1,6 +1,7 @@
 {
  "daily_token_limit": 2500000,
  "daily_tokens_used": 0,
+  "tier": "FREE",
  "user_email": "target@example.com",
  "user_id": "5e53486c-cf57-477e-ba2a-cb02dc828e1c",
  "weekly_token_limit": 12500000,
--- a/autogpt_platform/backend/test/agent_generator/test_orchestrator.py
+++ b/autogpt_platform/backend/test/agent_generator/test_orchestrator.py
@@ -140,7 +140,9 @@ class TestFixOrchestratorBlocks:
        assert defaults["conversation_compaction"] is True
        assert defaults["retry"] == 3
        assert defaults["multiple_tool_calls"] is False
-        assert len(fixer.fixes_applied) == 4
+        assert defaults["execution_mode"] == "extended_thinking"
+        assert defaults["model"] == "claude-opus-4-6"
+        assert len(fixer.fixes_applied) == 6

    def test_preserves_existing_values(self):
        """Existing user-set values are never overwritten."""
@@ -153,6 +155,8 @@ class TestFixOrchestratorBlocks:
                        "conversation_compaction": False,
                        "retry": 1,
                        "multiple_tool_calls": True,
+                        "execution_mode": "built_in",
+                        "model": "gpt-4o",
                    }
                )
            ],
@@ -166,6 +170,8 @@ class TestFixOrchestratorBlocks:
        assert defaults["conversation_compaction"] is False
        assert defaults["retry"] == 1
        assert defaults["multiple_tool_calls"] is True
+        assert defaults["execution_mode"] == "built_in"
+        assert defaults["model"] == "gpt-4o"
        assert len(fixer.fixes_applied) == 0

    def test_partial_defaults(self):
@@ -189,7 +195,9 @@ class TestFixOrchestratorBlocks:
        assert defaults["conversation_compaction"] is True  # filled
        assert defaults["retry"] == 3  # filled
        assert defaults["multiple_tool_calls"] is False  # filled
-        assert len(fixer.fixes_applied) == 3
+        assert defaults["execution_mode"] == "extended_thinking"  # filled
+        assert defaults["model"] == "claude-opus-4-6"  # filled
+        assert len(fixer.fixes_applied) == 5

    def test_skips_non_sdm_nodes(self):
        """Non-Orchestrator nodes are untouched."""
@@ -258,11 +266,13 @@ class TestFixOrchestratorBlocks:
        result = fixer.fix_orchestrator_blocks(agent)

        defaults = result["nodes"][0]["input_default"]
-        assert defaults["agent_mode_max_iterations"] == 10  # None → default
-        assert defaults["conversation_compaction"] is True  # None → default
+        assert defaults["agent_mode_max_iterations"] == 10  # None -> default
+        assert defaults["conversation_compaction"] is True  # None -> default
        assert defaults["retry"] == 3  # kept
        assert defaults["multiple_tool_calls"] is False  # kept
-        assert len(fixer.fixes_applied) == 2
+        assert defaults["execution_mode"] == "extended_thinking"  # filled
+        assert defaults["model"] == "claude-opus-4-6"  # filled
+        assert len(fixer.fixes_applied) == 4

    def test_multiple_sdm_nodes(self):
        """Multiple SDM nodes are all fixed independently."""
@@ -277,11 +287,11 @@ class TestFixOrchestratorBlocks:

        result = fixer.fix_orchestrator_blocks(agent)

-        # First node: 3 defaults filled (agent_mode was already set)
+        # First node: 5 defaults filled (agent_mode_max_iterations was already set)
        assert result["nodes"][0]["input_default"]["agent_mode_max_iterations"] == 3
-        # Second node: all 4 defaults filled
+        # Second node: all 6 defaults filled
        assert result["nodes"][1]["input_default"]["agent_mode_max_iterations"] == 10
-        assert len(fixer.fixes_applied) == 7  # 3 + 4
+        assert len(fixer.fixes_applied) == 11  # 5 + 6

    def test_registered_in_apply_all_fixes(self):
        """fix_orchestrator_blocks runs as part of apply_all_fixes."""
@@ -655,6 +665,7 @@ class TestOrchestratorE2EPipeline:
                        "conversation_compaction": {"type": "boolean"},
                        "retry": {"type": "integer"},
                        "multiple_tool_calls": {"type": "boolean"},
+                        "execution_mode": {"type": "string"},
                    },
                    "required": ["prompt"],
                },
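Reviewer note: the updated counts above (6, 5, 4, 11) follow from a fill-if-missing-or-None rule applied over six defaults. The sketch below is a minimal illustration of that rule, not the fixer's actual implementation. The names `ORCHESTRATOR_DEFAULTS` and `fill_orchestrator_defaults` are hypothetical; the default values themselves are taken from the assertions in these tests.

```python
from typing import Any

# Default values as asserted by the tests above (the constant name is hypothetical).
ORCHESTRATOR_DEFAULTS: dict[str, Any] = {
    "agent_mode_max_iterations": 10,
    "conversation_compaction": True,
    "retry": 3,
    "multiple_tool_calls": False,
    "execution_mode": "extended_thinking",
    "model": "claude-opus-4-6",
}


def fill_orchestrator_defaults(input_default: dict[str, Any]) -> list[str]:
    """Fill keys that are missing or None, in place.

    Returns the names of the keys that were filled, so the caller can
    record one fix per filled key. Explicit falsy values such as False
    or 0 are kept because only None counts as "unset".
    """
    filled: list[str] = []
    for key, value in ORCHESTRATOR_DEFAULTS.items():
        if input_default.get(key) is None:
            input_default[key] = value
            filled.append(key)
    return filled


# One key set by the user, one explicitly None, four missing: five fills.
node = {"agent_mode_max_iterations": 3, "retry": None}
filled_keys = fill_orchestrator_defaults(node)
```

Under this rule a node with no settings yields six fixes and a node with one user-set key yields five, which matches the `11  # 5 + 6` total in `test_multiple_sdm_nodes`.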
--- a/autogpt_platform/backend/test/copilot/__init__.py
+++ b/autogpt_platform/backend/test/copilot/__init__.py
--- a/autogpt_platform/backend/test/copilot/dry_run_loop_test.py
+++ b/autogpt_platform/backend/test/copilot/dry_run_loop_test.py
@@ -0,0 +1,394 @@
+"""Prompt regression tests AND functional tests for the dry-run verification loop.
+
+NOTE: This file lives in test/copilot/ rather than being colocated with a
+single source module because it is a cross-cutting test spanning multiple
+modules: prompting.py, service.py, agent_generation_guide.md, and run_agent.py.
+
+These tests verify that the create -> dry-run -> fix iterative workflow is
+properly communicated through tool descriptions, the prompting supplement,
+and the agent building guide.
+
+After deduplication, the full dry-run workflow lives in the
+agent_generation_guide.md only. The system prompt and individual tool
+descriptions no longer repeat it — they keep a minimal footprint.
+
+**Intentionally brittle**: the assertions check for specific substrings so
+that accidental removal or rewording of key instructions is caught. If you
+deliberately reword a prompt, update the corresponding assertion here.
+
+--- Functional tests (added separately) ---
+
+The dry-run loop is primarily a *prompt/guide* feature — the copilot reads
+the guide and follows its instructions.  There are no standalone Python
+functions that implement "loop until passing" logic; the loop is driven by
+the LLM.  However, several pieces of real Python infrastructure make the
+loop possible:
+
+1. The ``run_agent`` and ``run_block`` OpenAI tool schemas expose a
+   ``dry_run`` boolean parameter that the LLM must be able to set.
+2. The ``RunAgentInput`` Pydantic model validates ``dry_run`` as a required
+   bool, so the executor can branch on it.
+3. The ``_check_prerequisites`` method in ``RunAgentTool`` bypasses
+   credential and missing-input gates when ``dry_run=True``.
+4. The guide documents the workflow steps in a specific order that the LLM
+   must follow: create/edit -> dry-run -> inspect -> fix -> repeat.
+
+The functional test classes below exercise items 1-4 directly.
+"""
+
+import re
+from pathlib import Path
+from typing import Any, cast
+
+import pytest
+from openai.types.chat import ChatCompletionToolParam
+from pydantic import ValidationError
+
+from backend.copilot.prompting import get_sdk_supplement
+from backend.copilot.service import DEFAULT_SYSTEM_PROMPT
+from backend.copilot.tools import TOOL_REGISTRY
+from backend.copilot.tools.run_agent import RunAgentInput
+
+# Resolved once for the whole module so individual tests stay fast.
+_SDK_SUPPLEMENT = get_sdk_supplement(use_e2b=False, cwd="/tmp/test")
+
+
+# ---------------------------------------------------------------------------
+# Prompt regression tests (original)
+# ---------------------------------------------------------------------------
+
+
+class TestSystemPromptBasics:
+    """Verify the system prompt includes essential baseline content.
+
+    After deduplication, the dry-run workflow lives only in the guide.
+    The system prompt carries tone and personality only.
+    """
+
+    def test_mentions_automations(self):
+        assert "automations" in DEFAULT_SYSTEM_PROMPT.lower()
+
+    def test_mentions_action_oriented(self):
+        assert "action-oriented" in DEFAULT_SYSTEM_PROMPT.lower()
+
+
+class TestToolDescriptionsDryRunLoop:
+    """Verify tool descriptions and parameters related to the dry-run loop."""
+
+    def test_get_agent_building_guide_mentions_workflow(self):
+        desc = TOOL_REGISTRY["get_agent_building_guide"].description
+        assert "dry-run" in desc.lower()
+
+    def test_run_agent_dry_run_param_exists_and_is_boolean(self):
+        schema = TOOL_REGISTRY["run_agent"].as_openai_tool()
+        params = cast(dict[str, Any], schema["function"].get("parameters", {}))
+        assert "dry_run" in params["properties"]
+        assert params["properties"]["dry_run"]["type"] == "boolean"
+
+    def test_run_agent_dry_run_param_mentions_simulation(self):
+        """After deduplication the dry_run param description mentions simulation."""
+        schema = TOOL_REGISTRY["run_agent"].as_openai_tool()
+        params = cast(dict[str, Any], schema["function"].get("parameters", {}))
+        dry_run_desc = params["properties"]["dry_run"]["description"]
+        assert "simulat" in dry_run_desc.lower()
+
+
+class TestPromptingSupplementContent:
+    """Verify the prompting supplement (via get_sdk_supplement) includes
+    essential shared tool notes.  After deduplication, the dry-run workflow
+    lives only in the guide; the supplement carries storage, file-handling,
+    and tool-discovery notes.
+    """
+
+    def test_includes_tool_discovery_priority(self):
+        assert "Tool Discovery Priority" in _SDK_SUPPLEMENT
+
+    def test_includes_find_block_first(self):
+        assert "find_block" in _SDK_SUPPLEMENT
+
+    def test_includes_send_authenticated_web_request(self):
+        assert "SendAuthenticatedWebRequestBlock" in _SDK_SUPPLEMENT
+
+
+class TestAgentBuildingGuideDryRunLoop:
+    """Verify the agent building guide includes the dry-run loop."""
+
+    @pytest.fixture
+    def guide_content(self):
+        guide_path = (
+            Path(__file__).resolve().parent.parent.parent
+            / "backend"
+            / "copilot"
+            / "sdk"
+            / "agent_generation_guide.md"
+        )
+        return guide_path.read_text(encoding="utf-8")
+
+    def test_has_dry_run_verification_section(self, guide_content):
+        assert "REQUIRED: Dry-Run Verification Loop" in guide_content
+
+    def test_workflow_includes_dry_run_step(self, guide_content):
+        assert "dry_run=True" in guide_content
+
+    def test_mentions_good_vs_bad_output(self, guide_content):
+        assert "**Good output**" in guide_content
+        assert "**Bad output**" in guide_content
+
+    def test_mentions_repeat_until_pass(self, guide_content):
+        lower = guide_content.lower()
+        assert "repeat" in lower
+        assert "clearly unfixable" in lower
+
+    def test_mentions_wait_for_result(self, guide_content):
+        assert "wait_for_result=120" in guide_content
+
+    def test_mentions_view_agent_output(self, guide_content):
+        assert "view_agent_output" in guide_content
+
+    def test_workflow_has_dry_run_and_inspect_steps(self, guide_content):
+        assert "**Dry-run**" in guide_content
+        assert "**Inspect & fix**" in guide_content
+
+
+# ---------------------------------------------------------------------------
+# Functional tests: tool schema validation
+# ---------------------------------------------------------------------------
+
+
+class TestRunAgentToolSchema:
+    """Validate the run_agent OpenAI tool schema exposes dry_run correctly.
+
+    These go beyond substring checks — they verify the full schema structure
+    that the LLM receives, ensuring the parameter is well-formed and will be
+    parsed correctly by OpenAI function-calling.
+    """
+
+    @pytest.fixture
+    def schema(self) -> ChatCompletionToolParam:
+        return TOOL_REGISTRY["run_agent"].as_openai_tool()
+
+    def test_schema_is_valid_openai_tool(self, schema: ChatCompletionToolParam):
+        """The schema has the required top-level OpenAI structure."""
+        assert schema["type"] == "function"
+        assert "function" in schema
+        func = schema["function"]
+        assert "name" in func
+        assert "description" in func
+        assert "parameters" in func
+        assert func["name"] == "run_agent"
+
+    def test_dry_run_is_required(self, schema: ChatCompletionToolParam):
+        """dry_run must be in 'required' so the LLM always provides it explicitly."""
+        params = cast(dict[str, Any], schema["function"].get("parameters", {}))
+        required = params.get("required", [])
+        assert "dry_run" in required
+
+    def test_dry_run_is_boolean_type(self, schema: ChatCompletionToolParam):
+        """dry_run must be typed as boolean so the LLM generates true/false."""
+        params = cast(dict[str, Any], schema["function"].get("parameters", {}))
+        assert params["properties"]["dry_run"]["type"] == "boolean"
+
+    def test_dry_run_description_is_nonempty(self, schema: ChatCompletionToolParam):
+        """The description must be present and substantive for LLM guidance."""
+        params = cast(dict[str, Any], schema["function"].get("parameters", {}))
+        desc = params["properties"]["dry_run"]["description"]
+        assert isinstance(desc, str)
+        assert len(desc) > 10, "Description too short to guide the LLM"
+
+    def test_wait_for_result_coexists_with_dry_run(
+        self, schema: ChatCompletionToolParam
+    ):
+        """wait_for_result must also be present — the guide instructs the LLM
+        to pass both dry_run=True and wait_for_result=120 together."""
+        params = cast(dict[str, Any], schema["function"].get("parameters", {}))
+        assert "wait_for_result" in params["properties"]
+        assert params["properties"]["wait_for_result"]["type"] == "integer"
+
+
+class TestRunBlockToolSchema:
+    """Validate the run_block OpenAI tool schema exposes dry_run correctly."""
+
+    @pytest.fixture
+    def schema(self) -> ChatCompletionToolParam:
+        return TOOL_REGISTRY["run_block"].as_openai_tool()
+
+    def test_schema_is_valid_openai_tool(self, schema: ChatCompletionToolParam):
+        assert schema["type"] == "function"
+        func = schema["function"]
+        assert func["name"] == "run_block"
+        assert "parameters" in func
+
+    def test_dry_run_exists_and_is_boolean(self, schema: ChatCompletionToolParam):
+        params = cast(dict[str, Any], schema["function"].get("parameters", {}))
+        props = params["properties"]
+        assert "dry_run" in props
+        assert props["dry_run"]["type"] == "boolean"
+
+    def test_dry_run_is_required(self, schema: ChatCompletionToolParam):
+        """dry_run must be required — along with block_id and input_data."""
+        params = cast(dict[str, Any], schema["function"].get("parameters", {}))
+        required = params.get("required", [])
+        assert "dry_run" in required
+        assert "block_id" in required
+        assert "input_data" in required
+
+    def test_dry_run_description_mentions_preview(
+        self, schema: ChatCompletionToolParam
+    ):
+        params = cast(dict[str, Any], schema["function"].get("parameters", {}))
+        desc = params["properties"]["dry_run"]["description"]
+        assert isinstance(desc, str)
+        assert (
+            "preview mode" in desc.lower()
+        ), "run_block dry_run description should mention preview mode"
+
+
+# ---------------------------------------------------------------------------
+# Functional tests: RunAgentInput Pydantic model
+# ---------------------------------------------------------------------------
+
+
+class TestRunAgentInputModel:
+    """Validate RunAgentInput Pydantic model handles dry_run correctly.
+
+    The executor reads dry_run from this model, so the field must parse,
+    coerce, and validate properly.
+    """
+
+    def test_dry_run_accepts_true(self):
+        model = RunAgentInput(username_agent_slug="user/agent", dry_run=True)
+        assert model.dry_run is True
+
+    def test_dry_run_accepts_false(self):
+        """dry_run=False must be accepted when provided explicitly."""
+        model = RunAgentInput(username_agent_slug="user/agent", dry_run=False)
+        assert model.dry_run is False
+
+    def test_dry_run_coerces_truthy_int(self):
+        """Pydantic bool fields coerce int 1 to True."""
+        model = RunAgentInput(username_agent_slug="user/agent", dry_run=1)  # type: ignore[arg-type]
+        assert model.dry_run is True
+
+    def test_dry_run_coerces_falsy_int(self):
+        """Pydantic bool fields coerce int 0 to False."""
+        model = RunAgentInput(username_agent_slug="user/agent", dry_run=0)  # type: ignore[arg-type]
+        assert model.dry_run is False
+
+    def test_dry_run_with_wait_for_result(self):
+        """The guide instructs passing both dry_run=True and wait_for_result=120.
+        The model must accept this combination."""
+        model = RunAgentInput(
+            username_agent_slug="user/agent",
+            dry_run=True,
+            wait_for_result=120,
+        )
+        assert model.dry_run is True
+        assert model.wait_for_result == 120
+
+    def test_wait_for_result_upper_bound(self):
+        """wait_for_result is bounded at 300 seconds (ge=0, le=300)."""
+        with pytest.raises(ValidationError):
+            RunAgentInput(
+                username_agent_slug="user/agent",
+                dry_run=True,
+                wait_for_result=301,
+            )
+
+    def test_string_fields_are_stripped(self):
+        """The strip_strings validator should strip whitespace from string fields."""
+        model = RunAgentInput(username_agent_slug="  user/agent  ", dry_run=True)
+        assert model.username_agent_slug == "user/agent"
+
+
+# ---------------------------------------------------------------------------
+# Functional tests: guide documents the correct workflow ordering
+# ---------------------------------------------------------------------------
+
+
+class TestGuideWorkflowOrdering:
+    """Verify the guide documents workflow steps in the correct order.
+
+    The LLM must see: create/edit -> dry-run -> inspect -> fix -> repeat.
+    If these steps are reordered, the copilot would follow the wrong sequence.
+    These tests verify *ordering*, not just presence.
+    """
+
+    @pytest.fixture
+    def guide_content(self) -> str:
+        guide_path = (
+            Path(__file__).resolve().parent.parent.parent
+            / "backend"
+            / "copilot"
+            / "sdk"
+            / "agent_generation_guide.md"
+        )
+        return guide_path.read_text(encoding="utf-8")
+
+    def test_create_before_dry_run_in_workflow(self, guide_content: str):
+        """Step 7 (Save/create_agent) must appear before step 8 (Dry-run)."""
+        create_pos = guide_content.index("create_agent")
+        dry_run_pos = guide_content.index("dry_run=True")
+        assert (
+            create_pos < dry_run_pos
+        ), "create_agent must appear before dry_run=True in the workflow"
+
+    def test_dry_run_before_inspect_in_verification_section(self, guide_content: str):
+        """In the verification loop section, Dry-run step must come before
+        Inspect & fix step."""
+        section_start = guide_content.index("REQUIRED: Dry-Run Verification Loop")
+        section = guide_content[section_start:]
+        dry_run_pos = section.index("**Dry-run**")
+        inspect_pos = section.index("**Inspect")
+        assert (
+            dry_run_pos < inspect_pos
+        ), "Dry-run step must come before Inspect & fix in the verification loop"
+
+    def test_fix_before_repeat_in_verification_section(self, guide_content: str):
+        """The Fix step must come before the Repeat step."""
+        section_start = guide_content.index("REQUIRED: Dry-Run Verification Loop")
+        section = guide_content[section_start:]
+        fix_pos = section.index("**Fix**")
+        repeat_pos = section.index("**Repeat**")
+        assert fix_pos < repeat_pos
+
+    def test_good_output_before_bad_output(self, guide_content: str):
+        """Good output examples should be listed before bad output examples,
+        so the LLM sees the success pattern first."""
+        good_pos = guide_content.index("**Good output**")
+        bad_pos = guide_content.index("**Bad output**")
+        assert good_pos < bad_pos
+
+    def test_numbered_steps_in_verification_section(self, guide_content: str):
+        """The step-by-step workflow should have numbered steps 1-5."""
+        section_start = guide_content.index("Step-by-step workflow")
+        section = guide_content[section_start:]
+        # The section should contain numbered items 1 through 5
+        for step_num in range(1, 6):
+            assert (
+                f"{step_num}. " in section
+            ), f"Missing numbered step {step_num} in verification workflow"
+
+    def test_workflow_steps_are_in_numbered_order(self, guide_content: str):
+        """The main workflow steps (1-9) must appear in ascending order."""
+        # Extract the numbered workflow items from the top-level workflow section
+        workflow_start = guide_content.index("### Workflow for Creating/Editing Agents")
+        # End at the next ### section
+        next_section = guide_content.index("### Agent JSON Structure")
+        workflow_section = guide_content[workflow_start:next_section]
+        step_positions = []
+        for step_num in range(1, 10):
+            pattern = rf"^{step_num}\.\s"
+            match = re.search(pattern, workflow_section, re.MULTILINE)
+            if match:
+                step_positions.append((step_num, match.start()))
+        # Verify that all of steps 1-9 are present and in order
+        assert (
+            len(step_positions) >= 9
+        ), f"Expected 9 workflow steps, found {len(step_positions)}"
+        for i in range(1, len(step_positions)):
+            prev_num, prev_pos = step_positions[i - 1]
+            curr_num, curr_pos = step_positions[i]
+            assert prev_pos < curr_pos, (
+                f"Step {prev_num} (pos {prev_pos}) should appear before "
+                f"step {curr_num} (pos {curr_pos})"
+            )
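Reviewer note: the ordering tests above compare the first occurrence of each marker via `str.index`. A stricter variant, sketched here as an assumption rather than as what the test file does, searches for each marker only after the previous match, so an early stray mention of a later marker cannot satisfy the check. The helper name `find_in_order` and the sample text are hypothetical.

```python
def find_in_order(text: str, markers: list[str]) -> list[int]:
    """Return the offset of each marker, searching strictly left to right.

    Each marker is looked up starting just past the end of the previous
    match. Raises AssertionError if a marker is absent from the remainder.
    """
    positions: list[int] = []
    start = 0
    for marker in markers:
        pos = text.find(marker, start)
        if pos == -1:
            raise AssertionError(f"{marker!r} not found after offset {start}")
        positions.append(pos)
        start = pos + len(marker)
    return positions


# Hypothetical stand-in for the verification-loop section of the guide.
sample = "1. **Dry-run** 2. **Inspect & fix** 3. **Fix** 4. **Repeat**"
offsets = find_in_order(
    sample, ["**Dry-run**", "**Inspect", "**Fix**", "**Repeat**"]
)
```

A single call covers the whole sequence, whereas the pairwise `index` comparisons need one test per adjacent pair of steps.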
--- a/autogpt_platform/docker-compose.yml
+++ b/autogpt_platform/docker-compose.yml
@@ -98,6 +98,7 @@ services:
      - CLAMD_CONF_MaxScanSize=100M
      - CLAMD_CONF_MaxThreads=12
      - CLAMD_CONF_ReadTimeout=300
+      - CLAMD_CONF_TCPAddr=0.0.0.0
    healthcheck:
      test: ["CMD-SHELL", "clamdscan --version || exit 1"]
      interval: 30s
--- a/autogpt_platform/frontend/AGENTS.md
+++ b/autogpt_platform/frontend/AGENTS.md
@@ -40,6 +40,8 @@ After making **any** code changes in the frontend, you MUST run the following co

 Do NOT skip these steps. If any command reports errors, fix them and re-run until clean. Only then may you consider the task complete. If typing keeps failing, stop and ask the user.

+4. `pnpm test:unit` — run integration tests; fix any failures
+
 ### Code Style

 - Fully capitalize acronyms in symbols, e.g. `graphID`, `useBackendAPI`
@@ -62,7 +64,7 @@ Do NOT skip these steps. If any command reports errors, fix them and re-run unti
 - **Icons**: Phosphor Icons only
 - **Feature Flags**: LaunchDarkly integration
 - **Error Handling**: ErrorCard for render errors, toast for mutations, Sentry for exceptions
- **Testing**: Playwright for E2E, Storybook for component development
+- **Testing**: Vitest + React Testing Library + MSW for integration tests (primary), Playwright for E2E, Storybook for visual

 ## Environment Configuration

@@ -84,7 +86,12 @@ See @CONTRIBUTING.md for complete patterns. Quick reference:
   - Regenerate with `pnpm generate:api`
   - Pattern: `use{Method}{Version}{OperationName}`
 4. **Styling**: Tailwind CSS only, use design tokens, Phosphor Icons only
-5. **Testing**: Add Storybook stories for new components, Playwright for E2E. When fixing a bug, write a failing Playwright test first (use `.fixme` annotation), implement the fix, then remove the annotation.
+5. **Testing**: Integration tests are the default (~90%). See `TESTING.md` for full details.
+   - **New pages/features**: Write integration tests in `__tests__/` next to `page.tsx` using Vitest + RTL + MSW
+   - **API mocking**: Use Orval-generated MSW handlers from `@/app/api/__generated__/endpoints/{tag}/{tag}.msw.ts`
+   - **Run**: `pnpm test:unit` (integration/unit), `pnpm test` (Playwright E2E)
+   - **Storybook**: For design system components in `src/components/`
+   - **TDD**: Write a failing test first, implement, then verify
 6. **Code conventions**:
   - Use function declarations (not arrow functions) for components/handlers
   - Do not use `useCallback` or `useMemo` unless asked to optimise a given function
--- a/autogpt_platform/frontend/CONTRIBUTING.md
+++ b/autogpt_platform/frontend/CONTRIBUTING.md
@@ -747,9 +747,65 @@ export function CreateButton() {

 ---

-## 🧪 Testing & Storybook
+## 🧪 Testing

- See `TESTING.md` for Playwright setup, E2E data seeding, and Storybook usage.
+See `TESTING.md` for full details. Key principles:
+
+### Integration tests are the default (~90% of tests)
+
+We test at the **page level**: render the page with React Testing Library, mock API requests with MSW (auto-generated by Orval), and assert with testing-library queries.
+
+```bash
+pnpm test:unit              # run integration/unit tests
+pnpm test:unit:watch        # watch mode
+```
+
+### Test file location
+
+Tests live in `__tests__/` next to the page or component:
+
+```
+app/(platform)/library/
+  __tests__/
+    main.test.tsx           # main page rendering & interactions
+    search.test.tsx         # search-specific behavior
+  components/
+  page.tsx
+  useLibraryPage.ts
+```
+
+### Writing a test
+
+1. Render the page using `render()` from `@/tests/integrations/test-utils`
+2. Mock API responses using Orval-generated MSW handlers from `@/app/api/__generated__/endpoints/{tag}/{tag}.msw.ts`
+3. Assert with `screen.findByText`, `screen.getByRole`, etc.
+
+```tsx
+import { render, screen } from "@/tests/integrations/test-utils";
+import { server } from "@/mocks/mock-server";
+import { getGetV2ListLibraryAgentsMockHandler200 } from "@/app/api/__generated__/endpoints/library/library.msw";
+import LibraryPage from "../page";
+
+test("renders agent list", async () => {
+  server.use(getGetV2ListLibraryAgentsMockHandler200());
+  render(<LibraryPage />);
+  expect(await screen.findByText("My Agents")).toBeDefined();
+});
+```
+
+### When to use each test type
+
+| Type                                 | When                                          |
+| ------------------------------------ | --------------------------------------------- |
+| **Integration (Vitest + RTL + MSW)** | Default for all new pages and features        |
+| **E2E (Playwright)**                 | Auth flows, payments, cross-page navigation   |
+| **Storybook**                        | Design system components in `src/components/` |
+
+### TDD workflow
+
+1. Write a failing test (integration test or Playwright with `.fixme`)
+2. Implement the fix/feature
+3. Remove annotations and run the full suite

 ---

@@ -763,8 +819,10 @@ Common scripts (see `package.json` for full list):
 - `pnpm lint` — ESLint + Prettier check
 - `pnpm format` — Format code
 - `pnpm types` — Type-check
+- `pnpm test:unit` — Run integration/unit tests (Vitest + RTL + MSW)
+- `pnpm test:unit:watch` — Watch mode for integration tests
+- `pnpm test` — Run Playwright E2E tests
 - `pnpm storybook` — Run Storybook
- `pnpm test` — Run Playwright tests

 Generated API client:

@@ -780,6 +838,7 @@ Generated API client:
 - Logic is separated into `use*.ts` and `helpers.ts` when non-trivial
 - Reusable logic extracted to `src/services/` or `src/lib/utils.ts` when appropriate
 - Navigation uses the Next.js router
+- Integration tests added/updated for new pages and features (`pnpm test:unit`)
 - Lint, format, type-check, and tests pass locally
 - Stories updated/added if UI changed; verified in Storybook

--- a/autogpt_platform/frontend/Dockerfile
+++ b/autogpt_platform/frontend/Dockerfile
@@ -12,6 +12,10 @@ COPY autogpt_platform/frontend/ .
 # Allow CI to opt-in to Playwright test build-time flags
 ARG NEXT_PUBLIC_PW_TEST="false"
 ENV NEXT_PUBLIC_PW_TEST=$NEXT_PUBLIC_PW_TEST
+# Allow CI to opt-in to browser sourcemaps for coverage path resolution.
+# Keep Docker builds defaulting to false to avoid the memory hit.
+ARG NEXT_PUBLIC_SOURCEMAPS="false"
+ENV NEXT_PUBLIC_SOURCEMAPS=$NEXT_PUBLIC_SOURCEMAPS
 ENV NODE_ENV="production"
 # Merge env files appropriately based on environment
 RUN if [ -f .env.production ]; then \
@@ -25,10 +29,6 @@ RUN if [ -f .env.production ]; then \
      cp .env.default .env; \
    fi
 RUN pnpm run generate:api
-# Disable source-map generation in Docker builds to halve webpack memory usage.
-# Source maps are only useful when SENTRY_AUTH_TOKEN is set (Vercel deploys);
-# the Docker image never uploads them, so generating them just wastes RAM.
-ENV NEXT_PUBLIC_SOURCEMAPS="false"
 # In CI, we want NEXT_PUBLIC_PW_TEST=true during build so Next.js inlines it
 RUN if [ "$NEXT_PUBLIC_PW_TEST" = "true" ]; then NEXT_PUBLIC_PW_TEST=true NODE_OPTIONS="--max-old-space-size=8192" pnpm build; else NODE_OPTIONS="--max-old-space-size=8192" pnpm build; fi

--- a/autogpt_platform/frontend/TESTING.md
+++ b/autogpt_platform/frontend/TESTING.md
@@ -1,57 +1,168 @@
-# Frontend Testing 🧪
+# Frontend Testing

-## Quick Start (local) 🚀
+## Testing Strategy
+
+| Type                      | Tool                                 | Speed         | When to use                                           |
+| ------------------------- | ------------------------------------ | ------------- | ----------------------------------------------------- |
+| **Integration (primary)** | Vitest + React Testing Library + MSW | Fast (~100ms) | ~90% of tests — page-level rendering with mocked API  |
+| **E2E**                   | Playwright                           | Slow (~5s)    | Critical flows: auth, payments, cross-page navigation |
+| **Visual**                | Storybook + Chromatic                | N/A           | Design system components                              |
+
+**Integration tests are the default.** Since most of our code is client-only, we test at the page level: render the page with React Testing Library, mock API requests with MSW (handlers auto-generated by Orval), and assert with testing-library queries.
+
+## Integration Tests (Vitest + RTL + MSW)
+
+### Running
+
+```bash
+pnpm test:unit              # run all integration/unit tests with coverage
+pnpm test:unit:watch        # watch mode for development
+```
+
+### File location
+
+Tests live in a `__tests__/` folder next to the page or component they test:
+
+```
+app/(platform)/library/
+  __tests__/
+    main.test.tsx           # tests the main page rendering & interactions
+    search.test.tsx         # tests search-specific behavior
+  components/
+    AgentCard/
+      AgentCard.tsx
+      __tests__/
+        AgentCard.test.tsx  # only when testing the component in isolation
+  page.tsx
+  useLibraryPage.ts
+```
+
+**Naming**: use descriptive names like `main.test.tsx`, `search.test.tsx`, `filters.test.tsx` — not `page.test.tsx` or `index.test.tsx`.
+
+### Writing an integration test
+
+1. **Render the page** using the custom `render()` from `@/tests/integrations/test-utils` (wraps providers)
+2. **Mock API responses** using Orval-generated MSW handlers from `@/app/api/__generated__/endpoints/{tag}/{tag}.msw.ts`
+3. **Assert** with React Testing Library queries (`screen.findByText`, `screen.getByRole`, etc.)
+
+```tsx
+import { render, screen } from "@/tests/integrations/test-utils";
+import { server } from "@/mocks/mock-server";
+import {
+  getGetV2ListLibraryAgentsMockHandler200,
+  getGetV2ListLibraryAgentsMockHandler422,
+} from "@/app/api/__generated__/endpoints/library/library.msw";
+import LibraryPage from "../page";
+
+describe("LibraryPage", () => {
+  test("renders agent list from API", async () => {
+    server.use(getGetV2ListLibraryAgentsMockHandler200());
+
+    render(<LibraryPage />);
+
+    expect(await screen.findByText("My Agents")).toBeDefined();
+  });
+
+  test("shows error state on API failure", async () => {
+    server.use(getGetV2ListLibraryAgentsMockHandler422());
+
+    render(<LibraryPage />);
+
+    expect(await screen.findByText(/error/i)).toBeDefined();
+  });
+});
+```
+
+### MSW handlers
+
+Orval generates typed MSW handlers for every endpoint and HTTP status code:
+
+- `getGetV2ListLibraryAgentsMockHandler200()` — success response with faker data
+- `getGetV2ListLibraryAgentsMockHandler422()` — validation error response
+- `getGetV2ListLibraryAgentsMockHandler401()` — unauthorized response
+
+To override with custom data, register a handler with your own resolver:
+
+```tsx
+import { http, HttpResponse } from "msw";
+
+server.use(
+  http.get("http://localhost:3000/api/proxy/api/library/agents", () => {
+    return HttpResponse.json({
+      agents: [{ id: "1", name: "My Agent" }],
+      pagination: { total: 1 },
+    });
+  }),
+);
+```
+
+All handlers are aggregated in `src/mocks/mock-handlers.ts` and the MSW server is set up in `src/mocks/mock-server.ts`.
+
+### Test utilities
+
+- **`@/tests/integrations/test-utils`** — custom `render()` that wraps components with `QueryClientProvider`, `BackendAPIProvider`, `OnboardingProvider`, `NuqsTestingAdapter`, and `TooltipProvider`, so query-state hooks and tooltips work out of the box in page-level tests
+- **`@/tests/integrations/setup-nextjs-mocks`** — mocks for `next/navigation`, `next/image`, `next/headers`, `next/link`
+- **`@/tests/integrations/mock-supabase-request`** — mocks Supabase auth (returns null user by default)
+
+### What to test at page level
+
+- Page renders with API data (happy path)
+- Loading and error states
+- User interactions that trigger mutations (clicks, form submissions)
+- Conditional rendering based on API responses
+- Search, filtering, pagination behavior
+
+### When to test a component in isolation
+
+Only when the component has complex internal logic that is hard to exercise through the page test. Prefer page-level tests as the default.
+
+## E2E Tests (Playwright)
+
+### Running
+
+```bash
+pnpm test                   # build + run all Playwright tests
+pnpm test-ui                # run with Playwright UI
+pnpm test:no-build          # run against a running dev server
+```
+
+### Setup

 1. Start the backend + Supabase stack:
   - From `autogpt_platform`: `docker compose --profile local up deps_backend -d`
-   - Or run the full stack: `docker compose up -d`
 2. Seed rich E2E data (creates `test123@gmail.com` with library agents):
   - From `autogpt_platform/backend`: `poetry run python test/e2e_test_data.py`
-3. Run Playwright:
-   - From `autogpt_platform/frontend`: `pnpm test` or `pnpm test-ui`

-## How Playwright setup works 🎭
+### How Playwright setup works

- Playwright runs from `frontend/playwright.config.ts` with a global setup step.
- The global setup creates a user pool via the real signup UI and stores it in `frontend/.auth/user-pool.json`.
- Most tests call `getTestUser()` (from `src/tests/utils/auth.ts`) which pulls a random user from that pool.
-  - these users do not contain library agents, it's user that just "signed up" on the platform, hence some tests to make use of users created via script (see below) with more data
+- Playwright runs from `frontend/playwright.config.ts` with a global setup step
+- Global setup creates a user pool via the real signup UI, stored in `frontend/.auth/user-pool.json`
+- `getTestUser()` (from `src/tests/utils/auth.ts`) pulls a random user from the pool
+- `getTestUserWithLibraryAgents()` uses the rich user created by the data script

-## Test users 👤
+### Test users

- **User pool (basic users)**  
-  Created automatically by the Playwright global setup through `/signup`.  
-  Used by `getTestUser()` in `src/tests/utils/auth.ts`.
+- **User pool (basic users)** — created automatically by Playwright global setup. Used by `getTestUser()`
+- **Rich user with library agents** — created by `backend/test/e2e_test_data.py`. Used by `getTestUserWithLibraryAgents()`

- **Rich user with library agents**  
-  Created by `backend/test/e2e_test_data.py`.  
-  Accessed via `getTestUserWithLibraryAgents()` in `src/tests/credentials/index.ts`.
-
-Use the rich user when a test needs existing library agents (e.g. `library.spec.ts`).
-
-## Resetting or wiping the DB 🔁
+### Resetting the DB

 If you reset the Docker DB and logins start failing:

-1. Delete `frontend/.auth/user-pool.json` so the pool is regenerated.
-2. Re-run the E2E data script to recreate the rich user + library agents:
-   - `poetry run python test/e2e_test_data.py`
+1. Delete `frontend/.auth/user-pool.json`
+2. Re-run `poetry run python test/e2e_test_data.py`

-## Storybook 📚
+## Storybook

-## Flow diagram 🗺️
+- `pnpm storybook` — run locally
+- `pnpm build-storybook` — build static
+- `pnpm test-storybook` — CI runner
+- When changing components in `src/components`, update or add stories and verify in Storybook/Chromatic

-```mermaid
-flowchart TD
-  A[Start Docker stack] --> B[Run e2e_test_data.py]
-  B --> C[Run Playwright tests]
-  C --> D[Global setup creates user pool]
-  D --> E{Test needs rich data?}
-  E -->|No| F[getTestUser from user pool]
-  E -->|Yes| G[getTestUserWithLibraryAgents]
-```
+## TDD Workflow

- `pnpm storybook` – Run Storybook locally
- `pnpm build-storybook` – Build a static Storybook
- CI runner: `pnpm test-storybook`
- When changing components in `src/components`, update or add stories and verify in Storybook/Chromatic.
+When fixing a bug or adding a feature:
+
+1. **Write a failing test first** — for integration tests, write the test and confirm it fails. For Playwright, mark the test with the `.fixme` annotation
+2. **Implement the fix/feature** — write the minimal code to make the test pass
+3. **Remove annotations** — once passing, remove `.fixme` and run the full suite
--- a/autogpt_platform/frontend/package.json
+++ b/autogpt_platform/frontend/package.json
@@ -161,6 +161,7 @@
    "eslint-plugin-storybook": "9.1.5",
    "happy-dom": "20.3.4",
    "import-in-the-middle": "2.0.2",
+    "monocart-reporter": "2.10.0",
    "msw": "2.11.6",
    "msw-storybook-addon": "2.0.6",
    "orval": "7.13.0",
--- a/autogpt_platform/frontend/playwright.config.ts
+++ b/autogpt_platform/frontend/playwright.config.ts
@@ -5,10 +5,57 @@ import { defineConfig, devices } from "@playwright/test";
 * https://github.com/motdotla/dotenv
 */
 import dotenv from "dotenv";
+import fs from "fs";
 import path from "path";
 dotenv.config({ path: path.resolve(__dirname, ".env") });
 dotenv.config({ path: path.resolve(__dirname, "../backend/.env") });

+const frontendRoot = __dirname.replaceAll("\\", "/");
+
+// Directory where CI copies .next/static from the Docker container
+const staticCoverageDir = path.resolve(__dirname, ".next-static-coverage");
+
+function normalizeCoverageSourcePath(filePath: string) {
+  const normalizedFilePath = filePath.replaceAll("\\", "/");
+  const withoutWebpackPrefix = normalizedFilePath.replace(
+    /^webpack:\/\/_N_E\//,
+    "",
+  );
+
+  if (withoutWebpackPrefix.startsWith("./")) {
+    return withoutWebpackPrefix.slice(2);
+  }
+
+  if (withoutWebpackPrefix.startsWith(frontendRoot)) {
+    return path.posix.relative(frontendRoot, withoutWebpackPrefix);
+  }
+
+  return withoutWebpackPrefix;
+}
+
+// Resolve source maps from the copied .next/static directory.
+// Cache parsed results to avoid repeated disk reads during report generation.
+const sourceMapCache = new Map<string, object | undefined>();
+
+function resolveSourceMap(sourcePath: string) {
+  // sourcePath is the sourceMappingURL, e.g.:
+  //   "http://localhost:3000/_next/static/chunks/abc123.js.map"
+  const match = sourcePath.match(/_next\/static\/(.+)$/);
+  if (!match) return undefined;
+
+  const mapFile = path.join(staticCoverageDir, match[1]);
+  if (sourceMapCache.has(mapFile)) return sourceMapCache.get(mapFile);
+
+  try {
+    const result = JSON.parse(fs.readFileSync(mapFile, "utf8")) as object;
+    sourceMapCache.set(mapFile, result);
+    return result;
+  } catch {
+    sourceMapCache.set(mapFile, undefined);
+    return undefined;
+  }
+}
+
 export default defineConfig({
  testDir: "./src/tests",
  /* Global setup file that runs before all tests */
@@ -22,7 +69,30 @@ export default defineConfig({
  /* use more workers on CI. */
  workers: process.env.CI ? 4 : undefined,
  /* Reporter to use. See https://playwright.dev/docs/test-reporters */
-  reporter: [["list"], ["html", { open: "never" }]],
+  reporter: [
+    ["list"],
+    ["html", { open: "never" }],
+    [
+      "monocart-reporter",
+      {
+        name: "E2E Coverage Report",
+        outputFile: "./coverage/e2e/report.html",
+        coverage: {
+          reports: ["cobertura"],
+          outputDir: "./coverage/e2e",
+          entryFilter: (entry: { url: string }) =>
+            entry.url.includes("/_next/static/") &&
+            !entry.url.includes("node_modules"),
+          sourceFilter: (sourcePath: string) =>
+            sourcePath.includes("src/") && !sourcePath.includes("node_modules"),
+          sourcePath: (filePath: string) =>
+            normalizeCoverageSourcePath(filePath),
+          sourceMapResolver: (sourcePath: string) =>
+            resolveSourceMap(sourcePath),
+        },
+      },
+    ],
+  ],
  /* Shared settings for all the projects below. See https://playwright.dev/docs/api/class-testoptions. */
  use: {
    /* Base URL to use in actions like `await page.goto('/')`. */
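
The path normalization added to `playwright.config.ts` above is easier to review with concrete inputs. The following is a minimal, standalone sketch of `normalizeCoverageSourcePath`, not the config itself. It makes two assumptions: `frontendRoot` is a hypothetical hard-coded path (the config derives it from `__dirname`), and the `path.posix.relative` call is replaced with a plain prefix slice so the snippet needs no imports.

```typescript
// Hypothetical project root; the real config derives this from `__dirname`.
const frontendRoot = "/repo/autogpt_platform/frontend";

function normalizeCoverageSourcePath(filePath: string): string {
  // Use forward slashes so Windows and POSIX paths are handled the same way.
  const normalized = filePath.replace(/\\/g, "/");
  // Drop the `webpack://_N_E/` prefix that the config strips from source paths.
  const withoutWebpackPrefix = normalized.replace(/^webpack:\/\/_N_E\//, "");

  if (withoutWebpackPrefix.startsWith("./")) {
    return withoutWebpackPrefix.slice(2);
  }

  // Simplified stand-in for `path.posix.relative(frontendRoot, ...)`.
  if (withoutWebpackPrefix.startsWith(frontendRoot + "/")) {
    return withoutWebpackPrefix.slice(frontendRoot.length + 1);
  }

  return withoutWebpackPrefix;
}

// Sample inputs (illustrative, not taken from a real coverage run).
const samples: string[] = [
  "webpack://_N_E/./src/app/page.tsx",
  "/repo/autogpt_platform/frontend/src/lib/utils.ts",
  "src\\components\\Button.tsx",
];

for (const sample of samples) {
  console.log(`${sample} -> ${normalizeCoverageSourcePath(sample)}`);
}
```

All three samples normalize to a repo-relative path beginning with `src/`, which is the shape the `sourceFilter` callback in the reporter config checks for with `sourcePath.includes("src/")`.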
--- a/autogpt_platform/frontend/pnpm-lock.yaml
+++ b/autogpt_platform/frontend/pnpm-lock.yaml
@@ -400,6 +400,9 @@ importers:
      import-in-the-middle:
        specifier: 2.0.2
        version: 2.0.2
+      monocart-reporter:
+        specifier: 2.10.0
+        version: 2.10.0
      msw:
        specifier: 2.11.6
        version: 2.11.6(@types/node@24.10.0)(typescript@5.9.3)
@@ -4064,6 +4067,10 @@ packages:
    resolution: {integrity: sha512-h8lQ8tacZYnR3vNQTgibj+tODHI5/+l06Au2Pcriv/Gmet0eaj4TwWH41sO9wnHDiQsEj19q0drzdWdeAHtweg==}
    engines: {node: '>=6.5'}

+  accepts@1.3.8:
+    resolution: {integrity: sha512-PYAthTa2m2VKxuvSD3DPC/Gy+U+sOA1LAuT8mkmRuvw+NACSaeXEQ+NHcVF7rONl6qcaxV3Uuemwawk+7+SJLw==}
+    engines: {node: '>= 0.6'}
+
  acorn-import-attributes@1.9.5:
    resolution: {integrity: sha512-n02Vykv5uA3eHGM/Z2dQrcD56kL8TyDb2p1+0P83PClMnC/nc+anbQRhIOWnSq4Ke/KvDPrY3C9hDtC/A3eHnQ==}
    peerDependencies:
@@ -4080,6 +4087,14 @@ packages:
    peerDependencies:
      acorn: ^6.0.0 || ^7.0.0 || ^8.0.0

+  acorn-loose@8.5.2:
+    resolution: {integrity: sha512-PPvV6g8UGMGgjrMu+n/f9E/tCSkNQ2Y97eFvuVdJfG11+xdIeDcLyNdC8SHcrHbRqkfwLASdplyR6B6sKM1U4A==}
+    engines: {node: '>=0.4.0'}
+
+  acorn-walk@8.3.5:
+    resolution: {integrity: sha512-HEHNfbars9v4pgpW6SO1KSPkfoS0xVOM/9UzkJltjlsHZmJasxg8aXkuZa7SMf8vKGIBhpUsPluQSqhJFCqebw==}
+    engines: {node: '>=0.4.0'}
+
  acorn@8.15.0:
    resolution: {integrity: sha512-NZyJarBfL7nWwIq+FDL6Zp/yHEhePMNnnJ0y3qfieCrmNvYct8uvtiV41UvlSe6apAfk0fY1FbWx+NwfmpvtTg==}
    engines: {node: '>=0.4.0'}
@@ -4610,9 +4625,20 @@ packages:
  console-browserify@1.2.0:
    resolution: {integrity: sha512-ZMkYO/LkF17QvCPqM0gxw8yUzigAOZOSWSHg91FH6orS7vcEj5dVZTidN2fQ14yBSdg97RqhSNwLUXInd52OTA==}

+  console-grid@2.2.3:
+    resolution: {integrity: sha512-+mecFacaFxGl+1G31IsCx41taUXuW2FxX+4xIE0TIPhgML+Jb9JFcBWGhhWerd1/vhScubdmHqTwOhB0KCUUAg==}
+
  constants-browserify@1.0.0:
    resolution: {integrity: sha512-xFxOwqIzR/e1k1gLiWEophSCMqXcwVHIH7akf7b/vxcUeGunlj3hvZaaqxwHsTgn+IndtkQJgSztIDWeumWJDQ==}

+  content-disposition@1.0.1:
+    resolution: {integrity: sha512-oIXISMynqSqm241k6kcQ5UwttDILMK4BiurCfGEREw6+X9jkkpEe5T9FZaApyLGGOnFuyMWZpdolTXMtvEJ08Q==}
+    engines: {node: '>=18'}
+
+  content-type@1.0.5:
+    resolution: {integrity: sha512-nTjqfcBFEipKdXCv4YDQWCfmcLZKm81ldF0pAopTvyrFGVbcR6P/VAAd5G7N+0tTr8QqiU0tFadD6FK4NtJwOA==}
+    engines: {node: '>= 0.6'}
+
  convert-source-map@1.9.0:
    resolution: {integrity: sha512-ASFBup0Mz1uyiIjANan1jzLQami9z1PoYSZCiiYW2FczPbenXc45FZdBZLzOT+r6+iciuEModtmCti+hjaAk0A==}

@@ -4623,6 +4649,10 @@ packages:
    resolution: {integrity: sha512-9Kr/j4O16ISv8zBBhJoi4bXOYNTkFLOqSL3UDB0njXxCXNezjeyVrJyGOWtgfs/q2km1gwBcfH8q1yEGoMYunA==}
    engines: {node: '>=18'}

+  cookies@0.9.1:
+    resolution: {integrity: sha512-TG2hpqe4ELx54QER/S3HQ9SRVnQnGBtKUz5bLQWtYAQ+o6GpgMs6sYUvaiJjVxb+UXwhRhAEP3m7LbsIZ77Hmw==}
+    engines: {node: '>= 0.8'}
+
  core-js-compat@3.47.0:
    resolution: {integrity: sha512-IGfuznZ/n7Kp9+nypamBhvwdwLsW6KC8IOaURw2doAK5e98AG3acVLdh0woOnEqCfUtS+Vu882JE4k/DAm3ItQ==}

@@ -4931,6 +4961,9 @@ packages:
    resolution: {integrity: sha512-h5k/5U50IJJFpzfL6nO9jaaumfjO/f2NjK/oYB2Djzm4p9L+3T9qWpZqZ2hAbLPuuYq9wrU08WQyBTL5GbPk5Q==}
    engines: {node: '>=6'}

+  deep-equal@1.0.1:
+    resolution: {integrity: sha512-bHtC0iYvWhyaTzvV3CZgPeZQqCOBGyGsVV7v4eevpdkLHfiSrXUdBG+qAuSz4RI70sszvjQ1QSZ98An1yNwpSw==}
+
  deep-is@0.1.4:
    resolution: {integrity: sha512-oIPzksmTg4/MriiaYGO+okXDT7ztn/w3Eptv/+gSIdMdKsJo0u4CfYNFJPy+4SKMuCqGw2wxnA+URMg3t8a/bQ==}

@@ -4957,6 +4990,17 @@ packages:
  delaunator@5.0.1:
    resolution: {integrity: sha512-8nvh+XBe96aCESrGOqMp/84b13H9cdKbG5P2ejQCh4d4sK9RL4371qou9drQjMhvnPmhWl5hnmqbEE0fXr9Xnw==}

+  delegates@1.0.0:
+    resolution: {integrity: sha512-bd2L678uiWATM6m5Z1VzNCErI3jiGzt6HGY8OVICs40JQq/HALfbyNJmp0UDakEY4pMMaN0Ly5om/B1VI/+xfQ==}
+
+  depd@1.1.2:
+    resolution: {integrity: sha512-7emPTl6Dpo6JRXOXjLRxck+FlLRX5847cLKEn00PLAgc3g2hTZZgr+e4c2v6QpSmLeFP3n5yUo7ft6avBK/5jQ==}
+    engines: {node: '>= 0.6'}
+
+  depd@2.0.0:
+    resolution: {integrity: sha512-g7nH6P6dyDioJogAAGprGpCtVImJhpPk/roCzdb3fIh61/s/nPsfR6onyMwkCAR/OlC3yBC0lESvUoQEAssIrw==}
+    engines: {node: '>= 0.8'}
+
  dependency-graph@0.11.0:
    resolution: {integrity: sha512-JeMq7fEshyepOWDfcfHK06N3MhyPhz++vtqWhMT5O9A3K42rdsEDpfdVqjaqaAhsw6a+ZqeDvQVtD0hFHQWrzg==}
    engines: {node: '>= 0.6.0'}
@@ -4968,6 +5012,10 @@ packages:
  des.js@1.1.0:
    resolution: {integrity: sha512-r17GxjhUCjSRy8aiJpr8/UadFIzMzJGexI3Nmz4ADi9LYSFx4gTBp80+NaX/YsXWWLhpZ7v/v/ubEc/bCNfKwg==}

+  destroy@1.2.0:
+    resolution: {integrity: sha512-2sJGJTaXIIaR1w4iJSNoN0hnMY7Gpc/n8D4qSCJw8QqFWXf7cuAgnEHxBpweaVcPevC2l3KpjYCx3NypQQgaJg==}
+    engines: {node: '>= 0.8', npm: 1.2.8000 || >= 1.4.16}
+
  detect-libc@2.1.2:
    resolution: {integrity: sha512-Btj2BOOO83o3WyH59e8MgXsxEQVcarkUOpEYrubB0urwnN10yQ364rsiByU11nZlqWYZm05i/of7io4mzihBtQ==}
    engines: {node: '>=8'}
@@ -5049,6 +5097,12 @@ packages:
  eastasianwidth@0.2.0:
    resolution: {integrity: sha512-I88TYZWc9XiYHRQ4/3c5rjjfgkjhLyW2luGIheGERbNQ6OY7yTybanSpDXZa8y7VUP9YmDcYa+eyq4ca7iLqWA==}

+  ee-first@1.1.1:
+    resolution: {integrity: sha512-WMwm9LhRUo+WUaRN+vRuETqG89IgZphVSNkdFgeb6sS/E4OrDIN7t48CAewSHXc6C8lefD8KKfr5vY61brQlow==}
+
+  eight-colors@1.3.2:
+    resolution: {integrity: sha512-qo7BAEbNnadiWn3EgZFD8tk2DWpifEHJE7CVyp09I0FiUJZ6z0YSyCGFmmtopVMi32iaL4hEK6m+/pPkx1iMFA==}
+
  electron-to-chromium@1.5.267:
    resolution: {integrity: sha512-0Drusm6MVRXSOJpGbaSVgcQsuB4hEkMpHXaVstcPmhu5LIedxs1xNK/nIxmQIU/RPC0+1/o0AVZfBTkTNJOdUw==}

@@ -5081,6 +5135,10 @@ packages:
    resolution: {integrity: sha512-/kyM18EfinwXZbno9FyUGeFh87KC8HRQBQGildHZbEuRyWFOmv1U10o9BBp8XVZDVNNuQKyIGIu5ZYAAXJ0V2Q==}
    engines: {node: '>= 4'}

+  encodeurl@2.0.0:
+    resolution: {integrity: sha512-Q0n9HRi4m6JuGIV1eFlmvJB7ZEVxu93IrMyiMsGC0lrMJMWzRgx6WGquyfQgZVb31vhGgXnfmPNNXmxnOkRBrg==}
+    engines: {node: '>= 0.8'}
+
  endent@2.1.0:
    resolution: {integrity: sha512-r8VyPX7XL8U01Xgnb1CjZ3XV+z90cXIJ9JPE/R9SEC9vpw2P6CfsRPJmp20DppC5N7ZAMCmjYkJIa744Iyg96w==}

@@ -5180,6 +5238,9 @@ packages:
    resolution: {integrity: sha512-WUj2qlxaQtO4g6Pq5c29GTcWGDyd8itL8zTlipgECz3JesAiiOKotd8JU6otB3PACgG6xkJUyVhboMS+bje/jA==}
    engines: {node: '>=6'}

+  escape-html@1.0.3:
+    resolution: {integrity: sha512-NiSupZ4OeuGwr68lGIeym/ksIZMJodUGOSCZ/FSnTxcrekbvqrgdUxlJOMpijaKZVjAJrWrGs/6Jy8OMuyj9ow==}
+
  escape-string-regexp@4.0.0:
    resolution: {integrity: sha512-TtpcNJ3XAzx3Gq8sWRzJaVajRs0uVxA2YAkdb1jm2YkPz4G6egUFAyA3n5vtEIZefPk5Wa4UXbKuS5fKkJWdgA==}
    engines: {node: '>=10'}
@@ -5493,6 +5554,10 @@ packages:
      react-dom:
        optional: true

+  fresh@0.5.2:
+    resolution: {integrity: sha512-zJ2mQYM18rEFOudeV4GShTGIQ7RbzA7ozbU9I/XBpm7kqgMywgmylMwXHxZJmkVoYkna9d2pVXVXPdYTP9ej8Q==}
+    engines: {node: '>= 0.6'}
+
  fs-extra@10.1.0:
    resolution: {integrity: sha512-oRXApq54ETRj4eMiFzGnHWGy+zo5raudjuxN0b8H7s/RU2oW0Wvsx9O0ACRN/kRq9E8Vu/ReskGB5o3ji+FzHQ==}
    engines: {node: '>=12'}
@@ -5773,6 +5838,18 @@ packages:
  htmlparser2@6.1.0:
    resolution: {integrity: sha512-gyyPk6rgonLFEDGoeRgQNaEUvdJ4ktTmmUh/h2t7s+M8oPpIPxgNACWa+6ESR57kXstwqPiCut0V8NRpcwgU7A==}

+  http-assert@1.5.0:
+    resolution: {integrity: sha512-uPpH7OKX4H25hBmU6G1jWNaqJGpTXxey+YOUizJUAgu0AjLUeC8D73hTrhvDS5D+GJN1DN1+hhc/eF/wpxtp0w==}
+    engines: {node: '>= 0.8'}
+
+  http-errors@1.8.1:
+    resolution: {integrity: sha512-Kpk9Sm7NmI+RHhnj6OIWDI1d6fIoFAtFt9RLaTMRlg/8w49juAStsrBgp0Dp4OdxdVbRIeKhtCUvoi/RuAhO4g==}
+    engines: {node: '>= 0.6'}
+
+  http-errors@2.0.1:
+    resolution: {integrity: sha512-4FbRdAX+bSdmo4AUFuS0WNiPz8NgFt+r8ThgNWmlrjQjt1Q7ZR9+zTlce2859x4KSXrwIsaeTqDoKQmtP8pLmQ==}
+    engines: {node: '>= 0.8'}
+
  http-proxy-agent@7.0.2:
    resolution: {integrity: sha512-T1gkAiYYDWYx3V5Bmyu7HcfcvL7mUrTWiM6yOfa3PIphViJ/gFPbvidQ+veqSOHci/PxBcDabeUNCzpOODJZig==}
    engines: {node: '>= 14'}
@@ -6193,12 +6270,26 @@ packages:
    resolution: {integrity: sha512-YHzO7721WbmAL6Ov1uzN/l5mY5WWWhJBSW+jq4tkfZfsxmo1hu6frS0EOswvjBUnWE6NtjEs48SFn5CQESRLZg==}
    hasBin: true

+  keygrip@1.1.0:
+    resolution: {integrity: sha512-iYSchDJ+liQ8iwbSI2QqsQOvqv58eJCEanyJPJi+Khyu8smkcKSFUCbPwzFcL7YVtZ6eONjqRX/38caJ7QjRAQ==}
+    engines: {node: '>= 0.6'}
+
  keyv@4.5.4:
    resolution: {integrity: sha512-oxVHkHR/EJf2CNXnWxRLW6mg7JyCCUcG0DtEGmL2ctUo1PNTin1PUil+r/+4r5MpVgC/fn1kjsx7mjSujKqIpw==}

  khroma@2.1.0:
    resolution: {integrity: sha512-Ls993zuzfayK269Svk9hzpeGUKob/sIgZzyHYdjQoAdQetRKpOLj+k/QQQ/6Qi0Yz65mlROrfd+Ev+1+7dz9Kw==}

+  koa-compose@4.1.0:
+    resolution: {integrity: sha512-8ODW8TrDuMYvXRwra/Kh7/rJo9BtOfPc6qO8eAfC80CnCvSjSl0bkRM24X6/XBBEyj0v1nRUQ1LyOy3dbqOWXw==}
+
+  koa-static-resolver@1.0.6:
+    resolution: {integrity: sha512-ZX5RshSzH8nFn05/vUNQzqw32nEigsPa67AVUr6ZuQxuGdnCcTLcdgr4C81+YbJjpgqKHfacMBd7NmJIbj7fXw==}
+
+  koa@3.2.0:
+    resolution: {integrity: sha512-TrM4/tnNY7uJ1aW55sIIa+dqBvc4V14WRIAlGcWat9wV5pRS9Wr5Zk2ZTjQP1jtfIHDoHiSbPuV08P0fUZo2pg==}
+    engines: {node: '>= 18'}
+
  langium@3.3.1:
    resolution: {integrity: sha512-QJv/h939gDpvT+9SiLVlY7tZC3xB2qK57v0J04Sh9wpMb6MP1q8gB21L3WIo8T5P1MSMg3Ep14L7KkDCFG3y4w==}
    engines: {node: '>=16.0.0'}
@@ -6351,6 +6442,9 @@ packages:
    resolution: {integrity: sha512-h5bgJWpxJNswbU7qCrV0tIKQCaS3blPDrqKWx+QxzuzL1zGUzij9XCWLrSLsJPu5t+eWA/ycetzYAO5IOMcWAQ==}
    hasBin: true

+  lz-utils@2.1.0:
+    resolution: {integrity: sha512-CMkfimAypidTtWjNDxY8a1bc1mJdyEh04V2FfEQ5Zh8Nx4v7k850EYa+dOWGn9hKG5xOyHP5MkuduAZCTHRvJw==}
+
  magic-string@0.30.21:
    resolution: {integrity: sha512-vd2F4YUyEXKGcLHoq+TEyCjxueSeHnFxyyjNp80yg0XV4vUhnDer/lvvlqM/arB5bXQN5K2/3oinyCRyx8T2CQ==}

@@ -6456,6 +6550,10 @@ packages:
  mdurl@2.0.0:
    resolution: {integrity: sha512-Lf+9+2r+Tdp5wXDXC4PcIBjTDtq4UKjCPMQhKIuzpJNW0b96kVqSwW0bT7FhRSfmAiFYgP+SCRvdrDozfh0U5w==}

+  media-typer@1.1.0:
+    resolution: {integrity: sha512-aisnrDP4GNe06UcKFnV5bfMNPBUw4jsLGaWwWfnH3v02GnBuXX2MCVn5RbrWo0j3pczUilYblq7fQ7Nw2t5XKw==}
+    engines: {node: '>= 0.8'}
+
  memfs@3.5.3:
    resolution: {integrity: sha512-UERzLsxzllchadvbPs5aolHh65ISpKpM+ccLbOJ8/vvpBKmAWf+la7dXFy7Mr0ySHbdHrFv5kGFCUHHe6GFEmw==}
    engines: {node: '>= 4.0.0'}
@@ -6598,10 +6696,18 @@ packages:
    resolution: {integrity: sha512-sPU4uV7dYlvtWJxwwxHD0PuihVNiE7TyAbQ5SWxDCB9mUYvOgroQOwYQQOKPJ8CIbE+1ETVlOoK1UC2nU3gYvg==}
    engines: {node: '>= 0.6'}

+  mime-db@1.54.0:
+    resolution: {integrity: sha512-aU5EJuIN2WDemCcAp2vFBfp/m4EAhWJnUNSSw0ixs7/kXbd6Pg64EmwJkNdFhB8aWt1sH2CTXrLxo/iAGV3oPQ==}
+    engines: {node: '>= 0.6'}
+
  mime-types@2.1.35:
    resolution: {integrity: sha512-ZDY+bPm5zTTF+YpCrAU9nK0UgICYPT0QtT1NZWFv4s++TNkcgVaT0g6+4R2uI4MjQjzysHB1zxuWL50hzaeXiw==}
    engines: {node: '>= 0.6'}

+  mime-types@3.0.2:
+    resolution: {integrity: sha512-Lbgzdk0h4juoQ9fCKXW4by0UJqj+nOOrI9MJ1sSj4nI8aI2eo1qmvQEie4VD1glsS250n15LsWsYtCugiStS5A==}
+    engines: {node: '>=18'}
+
  mimic-fn@2.1.0:
    resolution: {integrity: sha512-OqbOk5oEQeAZ8WXWydlu9HJjz9WVdEIvamMCcXmuqUYjTknH/sqsWvhQ3vgwKFRR1HpjvNBKQ37nbJgYzGqGcg==}
    engines: {node: '>=6'}
@@ -6640,6 +6746,17 @@ packages:
  module-details-from-path@1.0.4:
    resolution: {integrity: sha512-EGWKgxALGMgzvxYF1UyGTy0HXX/2vHLkw6+NvDKW2jypWbHpjQuj4UMcqQWXHERJhVGKikolT06G3bcKe4fi7w==}

+  monocart-coverage-reports@2.12.9:
+    resolution: {integrity: sha512-vtFqbC3Egl4nVa1FSIrQvMPO6HZtb9lo+3IW7/crdvrLNW2IH8lUsxaK0TsKNmMO2mhFWwqQywLV2CZelqPgwA==}
+    hasBin: true
+
+  monocart-locator@1.0.2:
+    resolution: {integrity: sha512-v8W5hJLcWMIxLCcSi/MHh+VeefI+ycFmGz23Froer9QzWjrbg4J3gFJBuI/T1VLNoYxF47bVPPxq8ZlNX4gVCw==}
+
+  monocart-reporter@2.10.0:
+    resolution: {integrity: sha512-Q421HL8hCr024HMjQcQylEpOLy69FE6Zli2s/A0zptfFEPW/kaz6B1Ll3CYs8L1j67+egt1HeNC1LTHUsp6W+A==}
+    hasBin: true
+
  motion-dom@12.24.8:
    resolution: {integrity: sha512-wX64WITk6gKOhaTqhsFqmIkayLAAx45SVFiMnJIxIrH5uqyrwrxjrfo8WX9Kh8CaUAixjeMn82iH0W0QT9wD5w==}

@@ -6688,6 +6805,10 @@ packages:
  natural-compare@1.4.0:
    resolution: {integrity: sha512-OWND8ei3VtNC9h7V60qff3SVobHr996CTwgxubgyQYEpg290h9J0buyECNNJexkFm5sOajh5G116RYA1c8ZMSw==}

+  negotiator@0.6.3:
+    resolution: {integrity: sha512-+EUsqGPLsM+j/zdChZjsnX51g4XrHFOIXwfnCVPGlQk/k5giakcKsuxCObBRu6DSm9opw/O6slWbJdghQM4bBg==}
+    engines: {node: '>= 0.6'}
+
  neo-async@2.6.2:
    resolution: {integrity: sha512-Yd3UES5mWCSqR+qNT93S3UoYUkqAZ9lLg8a7g9rimsWmYGK8cVToA4/sF3RrshdyV3sAGMXVUmpMYOw+dLpOuw==}

@@ -6757,6 +6878,10 @@ packages:
  node-releases@2.0.27:
    resolution: {integrity: sha512-nmh3lCkYZ3grZvqcCH+fjmQ7X+H0OeZgP40OierEaAptX4XofMh5kwNbWh7lBduUzCcV/8kZ+NDLCwm2iorIlA==}

+  nodemailer@7.0.13:
+    resolution: {integrity: sha512-PNDFSJdP+KFgdsG3ZzMXCgquO7I6McjY2vlqILjtJd0hy8wEvtugS9xKRF2NWlPNGxvLCXlTNIae4serI7dinw==}
+    engines: {node: '>=6.0.0'}
+
  normalize-path@3.0.0:
    resolution: {integrity: sha512-6eZs5Ls3WtCisHWp9S2GUy8dqkpGi4BVSz3GaqiE6ezub0512ESztXUwUB6C6IKbQkY2Pnb/mD4WYojCRwcwLA==}
    engines: {node: '>=0.10.0'}
@@ -6851,6 +6976,10 @@ packages:
  obug@2.1.1:
    resolution: {integrity: sha512-uTqF9MuPraAQ+IsnPf366RG4cP9RtUi7MLO1N3KEc+wb0a6yKpeL0lmk2IB1jY5KHPAlTc6T/JRdC/YqxHNwkQ==}

+  on-finished@2.4.1:
+    resolution: {integrity: sha512-oVlzkg3ENAhCk2zdv7IJwd/QUD4z2RxRwpkcGY8psCVcCYZNq4wYnVWALHM+brtuJjePWiYF/ClmuDr8Ch5+kg==}
+    engines: {node: '>= 0.8'}
+
  once@1.4.0:
    resolution: {integrity: sha512-lNaJgI+2Q5URQBkccEKHTQOPaXdUxnZZElQTZY0MFUAuaEqe1E+Nyvgdz/aIyNi6Z9MzO5dv1H8n58/GELp3+w==}

@@ -6953,6 +7082,10 @@ packages:
  parse5@8.0.0:
    resolution: {integrity: sha512-9m4m5GSgXjL4AjumKzq1Fgfp3Z8rsvjRNbnkVwfu2ImRqE5D0LnY2QfDen18FSY9C573YU5XxSapdHZTZ2WolA==}

+  parseurl@1.3.3:
+    resolution: {integrity: sha512-CiyeOxFT/JZyN5m0z9PfXw4SCBJ6Sygz1Dpl0wqjlhDEGGBP1GnsUVEL0p63hoG1fcj3fHynXi9NYO4nWOL+qQ==}
+    engines: {node: '>= 0.8'}
+
  pascal-case@3.1.2:
    resolution: {integrity: sha512-uWlGT3YSnK9x3BQJaOdcZwrnV6hPpd8jFH1/ucpiLRPh/2zCVJKS19E4GvYHvaCcACn3foXZ0cLB9Wrx1KGe5g==}

@@ -7751,6 +7884,9 @@ packages:
  setimmediate@1.0.5:
    resolution: {integrity: sha512-MATJdZp8sLqDl/68LfQmbP8zKPLQNV6BIZoIgrscFDQ+RsvK/BxeDQOgyxKKoh0y/8h3BqVFnCqQ/gd+reiIXA==}

+  setprototypeof@1.2.0:
+    resolution: {integrity: sha512-E5LDX7Wrp85Kil5bhZv46j8jOeboKq5JMmYM3gVGdGH8xFpPWXUMsNrlODCrkoxMEeNi/XZIwuRvY4XNwYMJpw==}
+
  sha.js@2.4.12:
    resolution: {integrity: sha512-8LzC5+bvI45BjpfXU8V5fdU2mfeKiQe1D1gIMn7XUlF3OTUrpdJpPPH4EMAnF0DsHHdSZqCdSss5qCmJKuiO3w==}
    engines: {node: '>= 0.10'}
@@ -7872,6 +8008,10 @@ packages:
    resolution: {integrity: sha512-WjlahMgHmCJpqzU8bIBy4qtsZdU9lRlcZE3Lvyej6t4tuOuv1vk57OW3MBrj6hXBFx/nNoC9MPMTcr5YA7NQbg==}
    engines: {node: '>=6'}

+  statuses@1.5.0:
+    resolution: {integrity: sha512-OpZ3zP+jT1PI7I8nemJX4AKmAX070ZkYPVWV/AaKTJl+tXCTGyVdC1a4SL8RUQYEwk/f34ZX8UTykN68FwrqAA==}
+    engines: {node: '>= 0.6'}
+
  statuses@2.0.2:
    resolution: {integrity: sha512-DvEy55V3DB7uknRo+4iOGT5fP1slR8wQohVdknigZPMpMstaKJQWhwiYBACJE3Ul2pTnATihhBYnRhZQHGBiRw==}
    engines: {node: '>= 0.8'}
@@ -8157,6 +8297,10 @@ packages:
    resolution: {integrity: sha512-65P7iz6X5yEr1cwcgvQxbbIw7Uk3gOy5dIdtZ4rDveLqhrdJP+Li/Hx6tyK0NEb+2GCyneCMJiGqrADCSNk8sQ==}
    engines: {node: '>=8.0'}

+  toidentifier@1.0.1:
+    resolution: {integrity: sha512-o5sSPKEkg/DIQNmH43V0/uerLrpzVedkUh8tGNvaeXpfpuwjKenlSox/2O/BTlZUtEe+JG7s5YhEz608PlAHRA==}
+    engines: {node: '>=0.6'}
+
  tough-cookie@6.0.0:
    resolution: {integrity: sha512-kXuRi1mtaKMrsLUxz3sQYvVl37B0Ns6MzfrtV5DvJceE9bPyspOqk9xxv7XbZWcfLWbFmm997vl83qUWVJA64w==}
    engines: {node: '>=16'}
@@ -8228,6 +8372,10 @@ packages:
  tslib@2.8.1:
    resolution: {integrity: sha512-oJFu94HQb+KVduSUQL7wnpmqnfmLsOA/nAh6b6EH0wCEoK0/mPeXU6c3wKDV83MkOuHPRHtSXKKU99IBazS/2w==}

+  tsscmp@1.0.6:
+    resolution: {integrity: sha512-LxhtAkPDTkVCMQjt2h6eBVY28KCjikZqZfMcC15YBeNjkgUpdCfBu5HoiOTDu86v6smE8yOjyEktJ8hlbANHQA==}
+    engines: {node: '>=0.6.x'}
+
  tty-browserify@0.0.1:
    resolution: {integrity: sha512-C3TaO7K81YvjCgQH9Q1S3R3P3BtN3RIM8n+OvX4il1K1zgE8ZhI0op7kClgkxtutIE8hQrcrHBXvIheqKUUCxw==}

@@ -8257,6 +8405,10 @@ packages:
    resolution: {integrity: sha512-TeTSQ6H5YHvpqVwBRcnLDCBnDOHWYu7IvGbHT6N8AOymcr9PJGjc1GTtiWZTYg0NCgYwvnYWEkVChQAr9bjfwA==}
    engines: {node: '>=16'}

+  type-is@2.0.1:
+    resolution: {integrity: sha512-OZs6gsjF4vMp32qrCbiVSkrFmXtG/AZhY3t0iAMrMBiAZyV9oALtXO8hsrHbMXF9x6L3grlFuwW2oAz7cav+Gw==}
+    engines: {node: '>= 0.6'}
+
  typed-array-buffer@1.0.3:
    resolution: {integrity: sha512-nAYYwfY3qnzX30IkA6AQZjVbtK6duGontcQm1WSG1MD94YLqK0515GNApXkoxKOWMusVssAHWLh9SeaoefYFGw==}
    engines: {node: '>= 0.4'}
@@ -8457,6 +8609,10 @@ packages:
    resolution: {integrity: sha512-spH26xU080ydGggxRyR1Yhcbgx+j3y5jbNXk/8L+iRvdIEQ4uTRH2Sgf2dokud6Q4oAtsbNvJ1Ft+9xmm6IZcA==}
    engines: {node: '>= 0.10'}

+  vary@1.1.2:
+    resolution: {integrity: sha512-BNGbWLfd0eUPabhkXUVm0j8uuvREyTh5ovRa/dyow/BqAbZJyC+5fU+IzQOzmAKzYqYRAISoRhdQr3eIZ/PXqg==}
+    engines: {node: '>= 0.8'}
+
  vaul@1.1.2:
    resolution: {integrity: sha512-ZFkClGpWyI2WUQjdLJ/BaGuV6AVQiJ3uELGk3OYtP+B6yCO7Cmn9vPFXVJkRaGkOJu3m8bQMgtyzNHixULceQA==}
    peerDependencies:
@@ -12911,6 +13067,11 @@ snapshots:
    dependencies:
      event-target-shim: 5.0.1

+  accepts@1.3.8:
+    dependencies:
+      mime-types: 2.1.35
+      negotiator: 0.6.3
+
  acorn-import-attributes@1.9.5(acorn@8.15.0):
    dependencies:
      acorn: 8.15.0
@@ -12923,6 +13084,14 @@ snapshots:
    dependencies:
      acorn: 8.15.0

+  acorn-loose@8.5.2:
+    dependencies:
+      acorn: 8.15.0
+
+  acorn-walk@8.3.5:
+    dependencies:
+      acorn: 8.15.0
+
  acorn@8.15.0: {}

  adjust-sourcemap-loader@4.0.0:
@@ -13472,14 +13641,25 @@ snapshots:

  console-browserify@1.2.0: {}

+  console-grid@2.2.3: {}
+
  constants-browserify@1.0.0: {}

+  content-disposition@1.0.1: {}
+
+  content-type@1.0.5: {}
+
  convert-source-map@1.9.0: {}

  convert-source-map@2.0.0: {}

  cookie@1.0.2: {}

+  cookies@0.9.1:
+    dependencies:
+      depd: 2.0.0
+      keygrip: 1.1.0
+
  core-js-compat@3.47.0:
    dependencies:
      browserslist: 4.28.1
@@ -13843,6 +14023,8 @@ snapshots:

  deep-eql@5.0.2: {}

+  deep-equal@1.0.1: {}
+
  deep-is@0.1.4: {}

  deepmerge-ts@7.1.5: {}
@@ -13867,6 +14049,12 @@ snapshots:
    dependencies:
      robust-predicates: 3.0.2

+  delegates@1.0.0: {}
+
+  depd@1.1.2: {}
+
+  depd@2.0.0: {}
+
  dependency-graph@0.11.0: {}

  dequal@2.0.3: {}
@@ -13876,6 +14064,8 @@ snapshots:
      inherits: 2.0.4
      minimalistic-assert: 1.0.1

+  destroy@1.2.0: {}
+
  detect-libc@2.1.2:
    optional: true

@@ -13958,6 +14148,10 @@ snapshots:

  eastasianwidth@0.2.0: {}

+  ee-first@1.1.1: {}
+
+  eight-colors@1.3.2: {}
+
  electron-to-chromium@1.5.267: {}

  elliptic@6.6.1:
@@ -13990,6 +14184,8 @@ snapshots:

  emojis-list@3.0.0: {}

+  encodeurl@2.0.0: {}
+
  endent@2.1.0:
    dependencies:
      dedent: 0.7.0
@@ -14209,6 +14405,8 @@ snapshots:

  escalade@3.2.0: {}

+  escape-html@1.0.3: {}
+
  escape-string-regexp@4.0.0: {}

  escape-string-regexp@5.0.0: {}
@@ -14606,6 +14804,8 @@ snapshots:
      react: 18.3.1
      react-dom: 18.3.1(react@18.3.1)

+  fresh@0.5.2: {}
+
  fs-extra@10.1.0:
    dependencies:
      graceful-fs: 4.2.11
@@ -14994,6 +15194,27 @@ snapshots:
      domutils: 2.8.0
      entities: 2.2.0

+  http-assert@1.5.0:
+    dependencies:
+      deep-equal: 1.0.1
+      http-errors: 1.8.1
+
+  http-errors@1.8.1:
+    dependencies:
+      depd: 1.1.2
+      inherits: 2.0.4
+      setprototypeof: 1.2.0
+      statuses: 1.5.0
+      toidentifier: 1.0.1
+
+  http-errors@2.0.1:
+    dependencies:
+      depd: 2.0.0
+      inherits: 2.0.4
+      setprototypeof: 1.2.0
+      statuses: 2.0.2
+      toidentifier: 1.0.1
+
  http-proxy-agent@7.0.2:
    dependencies:
      agent-base: 7.1.4
@@ -15409,12 +15630,41 @@ snapshots:
    dependencies:
      commander: 8.3.0

+  keygrip@1.1.0:
+    dependencies:
+      tsscmp: 1.0.6
+
  keyv@4.5.4:
    dependencies:
      json-buffer: 3.0.1

  khroma@2.1.0: {}

+  koa-compose@4.1.0: {}
+
+  koa-static-resolver@1.0.6: {}
+
+  koa@3.2.0:
+    dependencies:
+      accepts: 1.3.8
+      content-disposition: 1.0.1
+      content-type: 1.0.5
+      cookies: 0.9.1
+      delegates: 1.0.0
+      destroy: 1.2.0
+      encodeurl: 2.0.0
+      escape-html: 1.0.3
+      fresh: 0.5.2
+      http-assert: 1.5.0
+      http-errors: 2.0.1
+      koa-compose: 4.1.0
+      mime-types: 3.0.2
+      on-finished: 2.4.1
+      parseurl: 1.3.3
+      statuses: 2.0.2
+      type-is: 2.0.1
+      vary: 1.1.2
+
  langium@3.3.1:
    dependencies:
      chevrotain: 11.0.3
@@ -15552,6 +15802,8 @@ snapshots:

  lz-string@1.5.0: {}

+  lz-utils@2.1.0: {}
+
  magic-string@0.30.21:
    dependencies:
      '@jridgewell/sourcemap-codec': 1.5.5
@@ -15771,6 +16023,8 @@ snapshots:

  mdurl@2.0.0: {}

+  media-typer@1.1.0: {}
+
  memfs@3.5.3:
    dependencies:
      fs-monkey: 1.1.0
@@ -16047,10 +16301,16 @@ snapshots:

  mime-db@1.52.0: {}

+  mime-db@1.54.0: {}
+
  mime-types@2.1.35:
    dependencies:
      mime-db: 1.52.0

+  mime-types@3.0.2:
+    dependencies:
+      mime-db: 1.54.0
+
  mimic-fn@2.1.0: {}

  min-indent@1.0.1: {}
@@ -16084,6 +16344,34 @@ snapshots:

  module-details-from-path@1.0.4: {}

+  monocart-coverage-reports@2.12.9:
+    dependencies:
+      acorn: 8.15.0
+      acorn-loose: 8.5.2
+      acorn-walk: 8.3.5
+      commander: 14.0.2
+      console-grid: 2.2.3
+      eight-colors: 1.3.2
+      foreground-child: 3.3.1
+      istanbul-lib-coverage: 3.2.2
+      istanbul-lib-report: 3.0.1
+      istanbul-reports: 3.2.0
+      lz-utils: 2.1.0
+      monocart-locator: 1.0.2
+
+  monocart-locator@1.0.2: {}
+
+  monocart-reporter@2.10.0:
+    dependencies:
+      console-grid: 2.2.3
+      eight-colors: 1.3.2
+      koa: 3.2.0
+      koa-static-resolver: 1.0.6
+      lz-utils: 2.1.0
+      monocart-coverage-reports: 2.12.9
+      monocart-locator: 1.0.2
+      nodemailer: 7.0.13
+
  motion-dom@12.24.8:
    dependencies:
      motion-utils: 12.23.28
@@ -16138,6 +16426,8 @@ snapshots:

  natural-compare@1.4.0: {}

+  negotiator@0.6.3: {}
+
  neo-async@2.6.2: {}

  next-themes@0.4.6(react-dom@18.3.1(react@18.3.1))(react@18.3.1):
@@ -16237,6 +16527,8 @@ snapshots:

  node-releases@2.0.27: {}

+  nodemailer@7.0.13: {}
+
  normalize-path@3.0.0: {}

  npm-run-path@4.0.1:
@@ -16338,6 +16630,10 @@ snapshots:

  obug@2.1.1: {}

+  on-finished@2.4.1:
+    dependencies:
+      ee-first: 1.1.1
+
  once@1.4.0:
    dependencies:
      wrappy: 1.0.2
@@ -16495,6 +16791,8 @@ snapshots:
      entities: 6.0.1
    optional: true

+  parseurl@1.3.3: {}
+
  pascal-case@3.1.2:
    dependencies:
      no-case: 3.0.4
@@ -17365,6 +17663,8 @@ snapshots:

  setimmediate@1.0.5: {}

+  setprototypeof@1.2.0: {}
+
  sha.js@2.4.12:
    dependencies:
      inherits: 2.0.4
@@ -17526,6 +17826,8 @@ snapshots:
    dependencies:
      type-fest: 0.7.1

+  statuses@1.5.0: {}
+
  statuses@2.0.2: {}

  std-env@3.10.0: {}
@@ -17873,6 +18175,8 @@ snapshots:
    dependencies:
      is-number: 7.0.0

+  toidentifier@1.0.1: {}
+
  tough-cookie@6.0.0:
    dependencies:
      tldts: 7.0.19
@@ -17930,6 +18234,8 @@ snapshots:

  tslib@2.8.1: {}

+  tsscmp@1.0.6: {}
+
  tty-browserify@0.0.1: {}

  twemoji-parser@14.0.0: {}
@@ -17953,6 +18259,12 @@ snapshots:

  type-fest@4.41.0: {}

+  type-is@2.0.1:
+    dependencies:
+      content-type: 1.0.5
+      media-typer: 1.1.0
+      mime-types: 3.0.2
+
  typed-array-buffer@1.0.3:
    dependencies:
      call-bound: 1.0.4
@@ -18182,6 +18494,8 @@ snapshots:

  validator@13.15.26: {}

+  vary@1.1.2: {}
+
  vaul@1.1.2(@types/react-dom@18.3.5(@types/react@18.3.17))(@types/react@18.3.17)(react-dom@18.3.1(react@18.3.1))(react@18.3.1):
    dependencies:
      '@radix-ui/react-dialog': 1.1.15(@types/react-dom@18.3.5(@types/react@18.3.17))(@types/react@18.3.17)(react-dom@18.3.1(react@18.3.1))(react@18.3.1)
--- a/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/tests/store.test.ts
+++ b/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/tests/store.test.ts
@@ -0,0 +1,156 @@
+import { describe, it, expect, beforeEach } from "vitest";
+import { useOnboardingWizardStore } from "../store";
+
+beforeEach(() => {
+  useOnboardingWizardStore.getState().reset();
+});
+
+describe("useOnboardingWizardStore", () => {
+  describe("initial state", () => {
+    it("starts at step 1 with empty fields", () => {
+      const state = useOnboardingWizardStore.getState();
+      expect(state.currentStep).toBe(1);
+      expect(state.name).toBe("");
+      expect(state.role).toBe("");
+      expect(state.otherRole).toBe("");
+      expect(state.painPoints).toEqual([]);
+      expect(state.otherPainPoint).toBe("");
+    });
+  });
+
+  describe("setName", () => {
+    it("updates the name", () => {
+      useOnboardingWizardStore.getState().setName("Alice");
+      expect(useOnboardingWizardStore.getState().name).toBe("Alice");
+    });
+  });
+
+  describe("setRole", () => {
+    it("updates the role", () => {
+      useOnboardingWizardStore.getState().setRole("Engineer");
+      expect(useOnboardingWizardStore.getState().role).toBe("Engineer");
+    });
+  });
+
+  describe("setOtherRole", () => {
+    it("updates the other role text", () => {
+      useOnboardingWizardStore.getState().setOtherRole("Designer");
+      expect(useOnboardingWizardStore.getState().otherRole).toBe("Designer");
+    });
+  });
+
+  describe("togglePainPoint", () => {
+    it("adds a pain point", () => {
+      useOnboardingWizardStore.getState().togglePainPoint("slow builds");
+      expect(useOnboardingWizardStore.getState().painPoints).toEqual([
+        "slow builds",
+      ]);
+    });
+
+    it("removes a pain point when toggled again", () => {
+      useOnboardingWizardStore.getState().togglePainPoint("slow builds");
+      useOnboardingWizardStore.getState().togglePainPoint("slow builds");
+      expect(useOnboardingWizardStore.getState().painPoints).toEqual([]);
+    });
+
+    it("handles multiple pain points", () => {
+      useOnboardingWizardStore.getState().togglePainPoint("slow builds");
+      useOnboardingWizardStore.getState().togglePainPoint("no tests");
+      expect(useOnboardingWizardStore.getState().painPoints).toEqual([
+        "slow builds",
+        "no tests",
+      ]);
+
+      useOnboardingWizardStore.getState().togglePainPoint("slow builds");
+      expect(useOnboardingWizardStore.getState().painPoints).toEqual([
+        "no tests",
+      ]);
+    });
+
+    it("ignores new selections when at the max limit", () => {
+      useOnboardingWizardStore.getState().togglePainPoint("a");
+      useOnboardingWizardStore.getState().togglePainPoint("b");
+      useOnboardingWizardStore.getState().togglePainPoint("c");
+      useOnboardingWizardStore.getState().togglePainPoint("d");
+      expect(useOnboardingWizardStore.getState().painPoints).toEqual([
+        "a",
+        "b",
+        "c",
+      ]);
+    });
+
+    it("still allows deselecting when at the max limit", () => {
+      useOnboardingWizardStore.getState().togglePainPoint("a");
+      useOnboardingWizardStore.getState().togglePainPoint("b");
+      useOnboardingWizardStore.getState().togglePainPoint("c");
+      useOnboardingWizardStore.getState().togglePainPoint("b");
+      expect(useOnboardingWizardStore.getState().painPoints).toEqual([
+        "a",
+        "c",
+      ]);
+    });
+  });
+
+  describe("setOtherPainPoint", () => {
+    it("updates the other pain point text", () => {
+      useOnboardingWizardStore.getState().setOtherPainPoint("flaky CI");
+      expect(useOnboardingWizardStore.getState().otherPainPoint).toBe(
+        "flaky CI",
+      );
+    });
+  });
+
+  describe("nextStep", () => {
+    it("increments the step", () => {
+      useOnboardingWizardStore.getState().nextStep();
+      expect(useOnboardingWizardStore.getState().currentStep).toBe(2);
+    });
+
+    it("clamps at step 4", () => {
+      useOnboardingWizardStore.getState().goToStep(4);
+      useOnboardingWizardStore.getState().nextStep();
+      expect(useOnboardingWizardStore.getState().currentStep).toBe(4);
+    });
+  });
+
+  describe("prevStep", () => {
+    it("decrements the step", () => {
+      useOnboardingWizardStore.getState().goToStep(3);
+      useOnboardingWizardStore.getState().prevStep();
+      expect(useOnboardingWizardStore.getState().currentStep).toBe(2);
+    });
+
+    it("clamps at step 1", () => {
+      useOnboardingWizardStore.getState().prevStep();
+      expect(useOnboardingWizardStore.getState().currentStep).toBe(1);
+    });
+  });
+
+  describe("goToStep", () => {
+    it("jumps to an arbitrary step", () => {
+      useOnboardingWizardStore.getState().goToStep(3);
+      expect(useOnboardingWizardStore.getState().currentStep).toBe(3);
+    });
+  });
+
+  describe("reset", () => {
+    it("resets all fields to defaults", () => {
+      useOnboardingWizardStore.getState().setName("Alice");
+      useOnboardingWizardStore.getState().setRole("Engineer");
+      useOnboardingWizardStore.getState().setOtherRole("Other");
+      useOnboardingWizardStore.getState().togglePainPoint("slow builds");
+      useOnboardingWizardStore.getState().setOtherPainPoint("flaky CI");
+      useOnboardingWizardStore.getState().goToStep(3);
+
+      useOnboardingWizardStore.getState().reset();
+
+      const state = useOnboardingWizardStore.getState();
+      expect(state.currentStep).toBe(1);
+      expect(state.name).toBe("");
+      expect(state.role).toBe("");
+      expect(state.otherRole).toBe("");
+      expect(state.painPoints).toEqual([]);
+      expect(state.otherPainPoint).toBe("");
+    });
+  });
+});
--- a/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/components/ProgressBar.tsx
+++ b/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/components/ProgressBar.tsx
@@ -7,9 +7,9 @@ export function ProgressBar({ currentStep, totalSteps }: Props) {
  const percent = (currentStep / totalSteps) * 100;

  return (
-    <div className="absolute left-0 top-0 h-[0.625rem] w-full bg-neutral-300">
+    <div className="absolute left-0 top-0 h-[3px] w-full bg-neutral-200">
      <div
-        className="h-full bg-purple-400 shadow-[0_0_4px_2px_rgba(168,85,247,0.5)] transition-all duration-500 ease-out"
+        className="h-full bg-purple-400 transition-all duration-500 ease-out"
        style={{ width: `${percent}%` }}
      />
    </div>
--- a/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/components/SelectableCard.tsx
+++ b/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/components/SelectableCard.tsx
@@ -2,6 +2,7 @@

 import { Text } from "@/components/atoms/Text/Text";
 import { cn } from "@/lib/utils";
+import { Check } from "@phosphor-icons/react";

 interface Props {
  icon: React.ReactNode;
@@ -24,13 +25,18 @@ export function SelectableCard({
      onClick={onClick}
      aria-pressed={selected}
      className={cn(
-        "flex h-[9rem] w-[10.375rem] shrink-0 flex-col items-center justify-center gap-3 rounded-xl border-2 bg-white px-6 py-5 transition-all hover:shadow-sm md:shrink lg:gap-2 lg:px-10 lg:py-8",
+        "relative flex h-[9rem] w-[10.375rem] shrink-0 flex-col items-center justify-center gap-3 rounded-xl border-2 bg-white px-6 py-5 transition-all hover:shadow-sm md:shrink lg:gap-2 lg:px-10 lg:py-8",
        className,
        selected
          ? "border-purple-500 bg-purple-50 shadow-sm"
          : "border-transparent",
      )}
    >
+      {selected && (
+        <span className="absolute right-2 top-2 flex h-5 w-5 items-center justify-center rounded-full bg-purple-500">
+          <Check size={12} weight="bold" className="text-white" />
+        </span>
+      )}
      <Text
        variant="lead"
        as="span"
--- a/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/steps/PainPointsStep.tsx
+++ b/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/steps/PainPointsStep.tsx
@@ -3,6 +3,7 @@
 import { Button } from "@/components/atoms/Button/Button";
 import { Input } from "@/components/atoms/Input/Input";
 import { Text } from "@/components/atoms/Text/Text";
+import { cn } from "@/lib/utils";
 import { ReactNode } from "react";

 import { FadeIn } from "@/components/atoms/FadeIn/FadeIn";
@@ -73,6 +74,8 @@ export function PainPointsStep() {
    togglePainPoint,
    setOtherPainPoint,
    hasSomethingElse,
+    atLimit,
+    shaking,
    canContinue,
    handleLaunch,
  } = usePainPointsStep();
@@ -90,7 +93,7 @@ export function PainPointsStep() {
            What&apos;s eating your time?
          </Text>
          <Text variant="lead" className="!text-zinc-500">
-            Pick the tasks you&apos;d love to hand off to Autopilot
+            Pick the tasks you&apos;d love to hand off to AutoPilot
          </Text>
        </div>

@@ -107,11 +110,22 @@ export function PainPointsStep() {
              />
            ))}
          </div>
-          {!hasSomethingElse ? (
-            <Text variant="small" className="!text-zinc-500">
-              Pick as many as you want — you can always change later
-            </Text>
-          ) : null}
+          <Text
+            variant="small"
+            className={cn(
+              "transition-colors",
+              atLimit && canContinue ? "!text-green-600" : "!text-zinc-500",
+              shaking && "animate-shake",
+            )}
+          >
+            {shaking
+              ? "You've picked 3 — tap one to swap it out"
+              : atLimit && canContinue
+                ? "3 selected — you're all set!"
+                : atLimit && hasSomethingElse
+                  ? "Tell us what else takes up your time"
+                  : "Pick up to 3 to start — AutoPilot can help with anything else later"}
+          </Text>
        </div>

        {hasSomethingElse && (
@@ -133,7 +147,7 @@ export function PainPointsStep() {
          disabled={!canContinue}
          className="w-full max-w-xs"
        >
-          Launch Autopilot
+          Launch AutoPilot
        </Button>
      </div>
    </FadeIn>
--- a/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/steps/RoleStep.tsx
+++ b/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/steps/RoleStep.tsx
@@ -8,6 +8,7 @@ import { FadeIn } from "@/components/atoms/FadeIn/FadeIn";
 import { SelectableCard } from "../components/SelectableCard";
 import { useOnboardingWizardStore } from "../store";
 import { Emoji } from "@/components/atoms/Emoji/Emoji";
+import { useEffect, useRef } from "react";

 const IMG_SIZE = 42;

@@ -57,12 +58,26 @@ export function RoleStep() {
  const setRole = useOnboardingWizardStore((s) => s.setRole);
  const setOtherRole = useOnboardingWizardStore((s) => s.setOtherRole);
  const nextStep = useOnboardingWizardStore((s) => s.nextStep);
+  const autoAdvanceTimer = useRef<ReturnType<typeof setTimeout> | null>(null);

  const isOther = role === "Other";
-  const canContinue = role && (!isOther || otherRole.trim());

-  function handleContinue() {
-    if (canContinue) {
+  useEffect(() => {
+    return () => {
+      if (autoAdvanceTimer.current) clearTimeout(autoAdvanceTimer.current);
+    };
+  }, []);
+
+  function handleRoleSelect(id: string) {
+    if (autoAdvanceTimer.current) clearTimeout(autoAdvanceTimer.current);
+    setRole(id);
+    if (id !== "Other") {
+      autoAdvanceTimer.current = setTimeout(nextStep, 350);
+    }
+  }
+
+  function handleOtherContinue() {
+    if (otherRole.trim()) {
      nextStep();
    }
  }
@@ -78,7 +93,7 @@ export function RoleStep() {
            What best describes you, {name}?
          </Text>
          <Text variant="lead" className="!text-zinc-500">
-            Autopilot will tailor automations to your world
+            So AutoPilot knows how to help you best
          </Text>
        </div>

@@ -89,33 +104,35 @@ export function RoleStep() {
              icon={r.icon}
              label={r.label}
              selected={role === r.id}
-              onClick={() => setRole(r.id)}
+              onClick={() => handleRoleSelect(r.id)}
              className="p-8"
            />
          ))}
        </div>

        {isOther && (
-          <div className="-mb-5 w-full px-8 md:px-0">
-            <Input
-              id="other-role"
-              label="Other role"
-              hideLabel
-              placeholder="Describe your role..."
-              value={otherRole}
-              onChange={(e) => setOtherRole(e.target.value)}
-              autoFocus
-            />
-          </div>
-        )}
+          <>
+            <div className="-mb-5 w-full px-8 md:px-0">
+              <Input
+                id="other-role"
+                label="Other role"
+                hideLabel
+                placeholder="Describe your role..."
+                value={otherRole}
+                onChange={(e) => setOtherRole(e.target.value)}
+                autoFocus
+              />
+            </div>

-        <Button
-          onClick={handleContinue}
-          disabled={!canContinue}
-          className="w-full max-w-xs"
-        >
-          Continue
-        </Button>
+            <Button
+              onClick={handleOtherContinue}
+              disabled={!otherRole.trim()}
+              className="w-full max-w-xs"
+            >
+              Continue
+            </Button>
+          </>
+        )}
      </div>
    </FadeIn>
  );
--- a/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/steps/WelcomeStep.tsx
+++ b/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/steps/WelcomeStep.tsx
@@ -4,13 +4,6 @@ import { AutoGPTLogo } from "@/components/atoms/AutoGPTLogo/AutoGPTLogo";
 import { Button } from "@/components/atoms/Button/Button";
 import { Input } from "@/components/atoms/Input/Input";
 import { Text } from "@/components/atoms/Text/Text";
-import {
-  Tooltip,
-  TooltipContent,
-  TooltipProvider,
-  TooltipTrigger,
-} from "@/components/atoms/Tooltip/BaseTooltip";
-import { Question } from "@phosphor-icons/react";
 import { FadeIn } from "@/components/atoms/FadeIn/FadeIn";
 import { useOnboardingWizardStore } from "../store";

@@ -40,36 +33,16 @@ export function WelcomeStep() {
          <Text variant="h3">Welcome to AutoGPT</Text>
          <Text variant="lead" as="span" className="!text-zinc-500">
            Let&apos;s personalize your experience so{" "}
-            <span className="relative mr-3 inline-block bg-gradient-to-r from-purple-500 to-indigo-500 bg-clip-text text-transparent">
-              Autopilot
-              <span className="absolute -right-4 top-0">
-                <TooltipProvider delayDuration={400}>
-                  <Tooltip>
-                    <TooltipTrigger asChild>
-                      <button
-                        type="button"
-                        aria-label="What is Autopilot?"
-                        className="inline-flex text-purple-500"
-                      >
-                        <Question size={14} />
-                      </button>
-                    </TooltipTrigger>
-                    <TooltipContent>
-                      Autopilot is AutoGPT&apos;s AI assistant that watches your
-                      connected apps, spots repetitive tasks you do every day
-                      and runs them for you automatically.
-                    </TooltipContent>
-                  </Tooltip>
-                </TooltipProvider>
-              </span>
+            <span className="bg-gradient-to-r from-purple-500 to-indigo-500 bg-clip-text text-transparent">
+              AutoPilot
            </span>{" "}
-            can start saving you time right away
+            can start saving you time
          </Text>
        </div>

        <Input
          id="first-name"
-          label="Your first name"
+          label="What should I call you?"
          placeholder="e.g. John"
          value={name}
          onChange={(e) => setName(e.target.value)}
--- a/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/steps/tests/PainPointsStep.test.tsx
+++ b/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/steps/tests/PainPointsStep.test.tsx
@@ -0,0 +1,154 @@
+import {
+  render,
+  screen,
+  fireEvent,
+  cleanup,
+} from "@/tests/integrations/test-utils";
+import { afterEach, beforeEach, describe, expect, test, vi } from "vitest";
+import { useOnboardingWizardStore } from "../../store";
+import { PainPointsStep } from "../PainPointsStep";
+
+vi.mock("@/components/atoms/Emoji/Emoji", () => ({
+  Emoji: ({ text }: { text: string }) => <span>{text}</span>,
+}));
+
+vi.mock("@/components/atoms/FadeIn/FadeIn", () => ({
+  FadeIn: ({ children }: { children: React.ReactNode }) => (
+    <div>{children}</div>
+  ),
+}));
+
+function getCard(name: RegExp) {
+  return screen.getByRole("button", { name });
+}
+
+function clickCard(name: RegExp) {
+  fireEvent.click(getCard(name));
+}
+
+function getLaunchButton() {
+  return screen.getByRole("button", { name: /launch autopilot/i });
+}
+
+afterEach(cleanup);
+
+beforeEach(() => {
+  useOnboardingWizardStore.getState().reset();
+  useOnboardingWizardStore.getState().setName("Alice");
+  useOnboardingWizardStore.getState().setRole("Founder/CEO");
+  useOnboardingWizardStore.getState().goToStep(3);
+});
+
+describe("PainPointsStep", () => {
+  test("renders all pain point cards", () => {
+    render(<PainPointsStep />);
+
+    expect(getCard(/finding leads/i)).toBeDefined();
+    expect(getCard(/email & outreach/i)).toBeDefined();
+    expect(getCard(/reports & data/i)).toBeDefined();
+    expect(getCard(/customer support/i)).toBeDefined();
+    expect(getCard(/social media/i)).toBeDefined();
+    expect(getCard(/something else/i)).toBeDefined();
+  });
+
+  test("shows default helper text", () => {
+    render(<PainPointsStep />);
+
+    expect(
+      screen.getAllByText(/pick up to 3 to start/i).length,
+    ).toBeGreaterThan(0);
+  });
+
+  test("selecting a card marks it as pressed", () => {
+    render(<PainPointsStep />);
+
+    clickCard(/finding leads/i);
+
+    expect(getCard(/finding leads/i).getAttribute("aria-pressed")).toBe("true");
+  });
+
+  test("launch button is disabled when nothing is selected", () => {
+    render(<PainPointsStep />);
+
+    expect(getLaunchButton().hasAttribute("disabled")).toBe(true);
+  });
+
+  test("launch button is enabled after selecting a pain point", () => {
+    render(<PainPointsStep />);
+
+    clickCard(/finding leads/i);
+
+    expect(getLaunchButton().hasAttribute("disabled")).toBe(false);
+  });
+
+  test("shows success text when 3 items are selected", () => {
+    render(<PainPointsStep />);
+
+    clickCard(/finding leads/i);
+    clickCard(/email & outreach/i);
+    clickCard(/reports & data/i);
+
+    expect(screen.getAllByText(/3 selected/i).length).toBeGreaterThan(0);
+  });
+
+  test("does not select a 4th item when at the limit", () => {
+    render(<PainPointsStep />);
+
+    clickCard(/finding leads/i);
+    clickCard(/email & outreach/i);
+    clickCard(/reports & data/i);
+    clickCard(/customer support/i);
+
+    expect(getCard(/customer support/i).getAttribute("aria-pressed")).toBe(
+      "false",
+    );
+  });
+
+  test("can deselect when at the limit and select a different one", () => {
+    render(<PainPointsStep />);
+
+    clickCard(/finding leads/i);
+    clickCard(/email & outreach/i);
+    clickCard(/reports & data/i);
+
+    clickCard(/finding leads/i);
+    expect(getCard(/finding leads/i).getAttribute("aria-pressed")).toBe(
+      "false",
+    );
+
+    clickCard(/customer support/i);
+    expect(getCard(/customer support/i).getAttribute("aria-pressed")).toBe(
+      "true",
+    );
+  });
+
+  test("shows input when 'Something else' is selected", () => {
+    render(<PainPointsStep />);
+
+    clickCard(/something else/i);
+
+    expect(
+      screen.getByPlaceholderText(/what else takes up your time/i),
+    ).toBeDefined();
+  });
+
+  test("launch button is disabled when 'Something else' selected but input empty", () => {
+    render(<PainPointsStep />);
+
+    clickCard(/something else/i);
+
+    expect(getLaunchButton().hasAttribute("disabled")).toBe(true);
+  });
+
+  test("launch button is enabled when 'Something else' selected and input filled", () => {
+    render(<PainPointsStep />);
+
+    clickCard(/something else/i);
+    fireEvent.change(
+      screen.getByPlaceholderText(/what else takes up your time/i),
+      { target: { value: "Manual invoicing" } },
+    );
+
+    expect(getLaunchButton().hasAttribute("disabled")).toBe(false);
+  });
+});
--- a/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/steps/tests/RoleStep.test.tsx
+++ b/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/steps/tests/RoleStep.test.tsx
@@ -0,0 +1,123 @@
+import {
+  render,
+  screen,
+  fireEvent,
+  cleanup,
+} from "@/tests/integrations/test-utils";
+import { afterEach, beforeEach, describe, expect, test, vi } from "vitest";
+import { useOnboardingWizardStore } from "../../store";
+import { RoleStep } from "../RoleStep";
+
+vi.mock("@/components/atoms/Emoji/Emoji", () => ({
+  Emoji: ({ text }: { text: string }) => <span>{text}</span>,
+}));
+
+vi.mock("@/components/atoms/FadeIn/FadeIn", () => ({
+  FadeIn: ({ children }: { children: React.ReactNode }) => (
+    <div>{children}</div>
+  ),
+}));
+
+afterEach(() => {
+  cleanup();
+  vi.useRealTimers();
+});
+
+beforeEach(() => {
+  vi.useFakeTimers();
+  useOnboardingWizardStore.getState().reset();
+  useOnboardingWizardStore.getState().setName("Alice");
+  useOnboardingWizardStore.getState().goToStep(2);
+});
+
+describe("RoleStep", () => {
+  test("renders all role cards", () => {
+    render(<RoleStep />);
+
+    expect(screen.getByText("Founder / CEO")).toBeDefined();
+    expect(screen.getByText("Operations")).toBeDefined();
+    expect(screen.getByText("Sales / BD")).toBeDefined();
+    expect(screen.getByText("Marketing")).toBeDefined();
+    expect(screen.getByText("Product / PM")).toBeDefined();
+    expect(screen.getByText("Engineering")).toBeDefined();
+    expect(screen.getByText("HR / People")).toBeDefined();
+    expect(screen.getByText("Other")).toBeDefined();
+  });
+
+  test("displays the user name in the heading", () => {
+    render(<RoleStep />);
+
+    expect(
+      screen.getAllByText(/what best describes you, alice/i).length,
+    ).toBeGreaterThan(0);
+  });
+
+  test("selecting a non-Other role auto-advances after delay", () => {
+    render(<RoleStep />);
+
+    fireEvent.click(screen.getByRole("button", { name: /engineering/i }));
+
+    expect(useOnboardingWizardStore.getState().role).toBe("Engineering");
+    expect(useOnboardingWizardStore.getState().currentStep).toBe(2);
+
+    vi.advanceTimersByTime(350);
+
+    expect(useOnboardingWizardStore.getState().currentStep).toBe(3);
+  });
+
+  test("selecting 'Other' does not auto-advance", () => {
+    render(<RoleStep />);
+
+    fireEvent.click(screen.getByRole("button", { name: /\bother\b/i }));
+
+    vi.advanceTimersByTime(500);
+
+    expect(useOnboardingWizardStore.getState().currentStep).toBe(2);
+  });
+
+  test("selecting 'Other' shows text input and Continue button", () => {
+    render(<RoleStep />);
+
+    fireEvent.click(screen.getByRole("button", { name: /\bother\b/i }));
+
+    expect(screen.getByPlaceholderText(/describe your role/i)).toBeDefined();
+    expect(screen.getByRole("button", { name: /continue/i })).toBeDefined();
+  });
+
+  test("Continue button is disabled when Other input is empty", () => {
+    render(<RoleStep />);
+
+    fireEvent.click(screen.getByRole("button", { name: /\bother\b/i }));
+
+    const continueBtn = screen.getByRole("button", { name: /continue/i });
+    expect(continueBtn.hasAttribute("disabled")).toBe(true);
+  });
+
+  test("Continue button advances when Other role text is filled", () => {
+    render(<RoleStep />);
+
+    fireEvent.click(screen.getByRole("button", { name: /\bother\b/i }));
+    fireEvent.change(screen.getByPlaceholderText(/describe your role/i), {
+      target: { value: "Designer" },
+    });
+
+    const continueBtn = screen.getByRole("button", { name: /continue/i });
+    expect(continueBtn.hasAttribute("disabled")).toBe(false);
+
+    fireEvent.click(continueBtn);
+    expect(useOnboardingWizardStore.getState().currentStep).toBe(3);
+  });
+
+  test("switching from Other to a regular role cancels Other and auto-advances", () => {
+    render(<RoleStep />);
+
+    fireEvent.click(screen.getByRole("button", { name: /\bother\b/i }));
+    expect(screen.getByPlaceholderText(/describe your role/i)).toBeDefined();
+
+    fireEvent.click(screen.getByRole("button", { name: /marketing/i }));
+
+    expect(useOnboardingWizardStore.getState().role).toBe("Marketing");
+    vi.advanceTimersByTime(350);
+    expect(useOnboardingWizardStore.getState().currentStep).toBe(3);
+  });
+});
--- a/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/steps/usePainPointsStep.ts
+++ b/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/steps/usePainPointsStep.ts
@@ -1,4 +1,5 @@
-import { useOnboardingWizardStore } from "../store";
+import { useEffect, useRef, useState } from "react";
+import { MAX_PAIN_POINT_SELECTIONS, useOnboardingWizardStore } from "../store";

 const ROLE_TOP_PICKS: Record<string, string[]> = {
  "Founder/CEO": [
@@ -23,18 +24,38 @@ export function usePainPointsStep() {
  const role = useOnboardingWizardStore((s) => s.role);
  const painPoints = useOnboardingWizardStore((s) => s.painPoints);
  const otherPainPoint = useOnboardingWizardStore((s) => s.otherPainPoint);
-  const togglePainPoint = useOnboardingWizardStore((s) => s.togglePainPoint);
+  const storeToggle = useOnboardingWizardStore((s) => s.togglePainPoint);
  const setOtherPainPoint = useOnboardingWizardStore(
    (s) => s.setOtherPainPoint,
  );
  const nextStep = useOnboardingWizardStore((s) => s.nextStep);
+  const [shaking, setShaking] = useState(false);
+  const shakeTimer = useRef<ReturnType<typeof setTimeout> | null>(null);
+
+  useEffect(() => {
+    return () => {
+      if (shakeTimer.current) clearTimeout(shakeTimer.current);
+    };
+  }, []);

  const topIDs = getTopPickIDs(role);
  const hasSomethingElse = painPoints.includes("Something else");
+  const atLimit = painPoints.length >= MAX_PAIN_POINT_SELECTIONS;
  const canContinue =
    painPoints.length > 0 &&
    (!hasSomethingElse || Boolean(otherPainPoint.trim()));

+  function togglePainPoint(id: string) {
+    const alreadySelected = painPoints.includes(id);
+    if (!alreadySelected && atLimit) {
+      if (shakeTimer.current) clearTimeout(shakeTimer.current);
+      setShaking(true);
+      shakeTimer.current = setTimeout(() => setShaking(false), 600);
+      return;
+    }
+    storeToggle(id);
+  }
+
  function handleLaunch() {
    if (canContinue) {
      nextStep();
@@ -48,6 +69,8 @@ export function usePainPointsStep() {
    togglePainPoint,
    setOtherPainPoint,
    hasSomethingElse,
+    atLimit,
+    shaking,
    canContinue,
    handleLaunch,
  };
--- a/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/store.ts
+++ b/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/store.ts
@@ -1,5 +1,6 @@
 import { create } from "zustand";

+export const MAX_PAIN_POINT_SELECTIONS = 3;
 export type Step = 1 | 2 | 3 | 4;

 interface OnboardingWizardState {
@@ -40,6 +41,8 @@ export const useOnboardingWizardStore = create<OnboardingWizardState>(
    togglePainPoint(painPoint) {
      set((state) => {
        const exists = state.painPoints.includes(painPoint);
+        if (!exists && state.painPoints.length >= MAX_PAIN_POINT_SELECTIONS)
+          return state;
        return {
          painPoints: exists
            ? state.painPoints.filter((p) => p !== painPoint)
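Reviewer note: the capped toggle in the store hunk above can be read as a pure function. The sketch below is a standalone illustration, not code from this PR. `toggleWithCap` and `MAX_SELECTIONS` are hypothetical names, and the append-on-add behaviour is an assumption inferred from the ordering asserted in `store.test.ts`.

```typescript
// Hypothetical standalone helper mirroring the store's capped toggle.
// Not part of the PR; names are illustrative only.
const MAX_SELECTIONS = 3;

function toggleWithCap(
  selected: readonly string[],
  id: string,
  max: number = MAX_SELECTIONS,
): string[] {
  const exists = selected.includes(id);
  // A new selection is ignored once the cap is reached.
  if (!exists && selected.length >= max) return [...selected];
  // Deselecting is always allowed; adding appends (assumed from the tests).
  return exists ? selected.filter((p) => p !== id) : [...selected, id];
}

let picks: string[] = [];
for (const id of ["a", "b", "c", "d"]) picks = toggleWithCap(picks, id);
console.log(picks); // "d" is dropped because three items are already selected
```

Keeping the guard in the store as well as in `usePainPointsStep` means the limit still holds for callers that bypass the hook, at the cost of the check living in two places.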
--- a/autogpt_platform/frontend/src/app/(platform)/admin/rate-limits/components/RateLimitDisplay.tsx
+++ b/autogpt_platform/frontend/src/app/(platform)/admin/rate-limits/components/RateLimitDisplay.tsx
@@ -3,18 +3,48 @@
 import { useState } from "react";
 import { Button } from "@/components/atoms/Button/Button";
 import type { UserRateLimitResponse } from "@/app/api/__generated__/models/userRateLimitResponse";
+import { useToast } from "@/components/molecules/Toast/use-toast";
 import { UsageBar } from "../../components/UsageBar";

+const TIERS = ["FREE", "PRO", "BUSINESS", "ENTERPRISE"] as const;
+type Tier = (typeof TIERS)[number];
+
+const TIER_MULTIPLIERS: Record<Tier, string> = {
+  FREE: "1x base limits",
+  PRO: "5x base limits",
+  BUSINESS: "20x base limits",
+  ENTERPRISE: "60x base limits",
+};
+
+const TIER_COLORS: Record<Tier, string> = {
+  FREE: "bg-gray-100 text-gray-700",
+  PRO: "bg-blue-100 text-blue-700",
+  BUSINESS: "bg-purple-100 text-purple-700",
+  ENTERPRISE: "bg-amber-100 text-amber-700",
+};
+
 interface Props {
  data: UserRateLimitResponse;
  onReset: (resetWeekly: boolean) => Promise<void>;
+  onTierChange?: (newTier: string) => Promise<void>;
  /** Override the outer container classes (default: bordered card). */
  className?: string;
 }

-export function RateLimitDisplay({ data, onReset, className }: Props) {
+export function RateLimitDisplay({
+  data,
+  onReset,
+  onTierChange,
+  className,
+}: Props) {
  const [isResetting, setIsResetting] = useState(false);
  const [resetWeekly, setResetWeekly] = useState(false);
+  const [isChangingTier, setIsChangingTier] = useState(false);
+  const { toast } = useToast();
+
+  const currentTier = TIERS.includes(data.tier as Tier)
+    ? (data.tier as Tier)
+    : "FREE";

  async function handleReset() {
    const msg = resetWeekly
@@ -30,19 +60,76 @@ export function RateLimitDisplay({ data, onReset, className }: Props) {
    }
  }

+  async function handleTierChange(newTier: string) {
+    if (newTier === currentTier || !onTierChange) return;
+    if (
+      !window.confirm(
+        `Change tier from ${currentTier} to ${newTier}? This will change the user's rate limits.`,
+      )
+    )
+      return;
+
+    setIsChangingTier(true);
+    try {
+      await onTierChange(newTier);
+      toast({
+        title: "Tier updated",
+        description: `Changed to ${newTier} (${TIER_MULTIPLIERS[newTier as Tier]}).`,
+      });
+    } catch {
+      toast({
+        title: "Error",
+        description: "Failed to update tier.",
+        variant: "destructive",
+      });
+    } finally {
+      setIsChangingTier(false);
+    }
+  }
+
  const nothingToReset = resetWeekly
    ? data.daily_tokens_used === 0 && data.weekly_tokens_used === 0
    : data.daily_tokens_used === 0;

  return (
    <div className={className ?? "rounded-md border bg-white p-6"}>
-      <h2 className="mb-1 text-lg font-semibold">
-        Rate Limits for {data.user_email ?? data.user_id}
-      </h2>
-      {data.user_email && (
-        <p className="mb-4 text-xs text-gray-500">User ID: {data.user_id}</p>
-      )}
-      {!data.user_email && <div className="mb-4" />}
+      <div className="mb-4 flex items-start justify-between">
+        <div>
+          <h2 className="mb-1 text-lg font-semibold">
+            Rate Limits for {data.user_email ?? data.user_id}
+          </h2>
+          {data.user_email && (
+            <p className="text-xs text-gray-500">User ID: {data.user_id}</p>
+          )}
+        </div>
+        <span
+          className={`rounded-full px-3 py-1 text-xs font-medium ${TIER_COLORS[currentTier] ?? "bg-gray-100 text-gray-700"}`}
+        >
+          {currentTier}
+        </span>
+      </div>
+
+      <div className="mb-4 flex items-center gap-3">
+        <label className="text-sm font-medium text-gray-700">
+          Subscription Tier
+        </label>
+        <select
+          aria-label="Subscription tier"
+          value={currentTier}
+          onChange={(e) => handleTierChange(e.target.value)}
+          className="rounded-md border bg-white px-3 py-1.5 text-sm"
+          disabled={isChangingTier || !onTierChange}
+        >
+          {TIERS.map((tier) => (
+            <option key={tier} value={tier}>
+              {tier} — {TIER_MULTIPLIERS[tier]}
+            </option>
+          ))}
+        </select>
+        {isChangingTier && (
+          <span className="text-xs text-gray-500">Updating...</span>
+        )}
+      </div>

      <div className="grid grid-cols-2 gap-6">
        <div className="space-y-2">
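Reviewer note: the `TIERS.includes(data.tier as Tier)` check and the `newTier as Tier` cast in the hunks above could also be written as a type guard, which would avoid the casts at the call sites. This is a sketch under that assumption only. `isTier` and `normalizeTier` are hypothetical helpers and do not exist in the PR.

```typescript
// Hypothetical helpers; the TIERS tuple is copied from the diff above.
const TIERS = ["FREE", "PRO", "BUSINESS", "ENTERPRISE"] as const;
type Tier = (typeof TIERS)[number];

// Narrows an arbitrary value to Tier without a cast at the call site.
function isTier(value: unknown): value is Tier {
  return (
    typeof value === "string" && (TIERS as readonly string[]).includes(value)
  );
}

// Falls back to "FREE" for unknown or missing tiers, as the component does.
function normalizeTier(value: unknown): Tier {
  return isTier(value) ? value : "FREE";
}

console.log(normalizeTier("PRO"), normalizeTier("GOLD"), normalizeTier(undefined));
```

With a guard like this, `handleTierChange` could return early on values that are not a known tier instead of indexing `TIER_MULTIPLIERS` through a cast.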
--- a/autogpt_platform/frontend/src/app/(platform)/admin/rate-limits/components/RateLimitManager.tsx
+++ b/autogpt_platform/frontend/src/app/(platform)/admin/rate-limits/components/RateLimitManager.tsx
@@ -14,6 +14,7 @@ export function RateLimitManager() {
    handleSearch,
    handleSelectUser,
    handleReset,
+    handleTierChange,
  } = useRateLimitManager();

  return (
@@ -74,7 +75,11 @@ export function RateLimitManager() {
      )}

      {rateLimitData && (
-        <RateLimitDisplay data={rateLimitData} onReset={handleReset} />
+        <RateLimitDisplay
+          data={rateLimitData}
+          onReset={handleReset}
+          onTierChange={handleTierChange}
+        />
      )}
    </div>
  );
--- a/autogpt_platform/frontend/src/app/(platform)/admin/rate-limits/components/tests/RateLimitDisplay.test.tsx
+++ b/autogpt_platform/frontend/src/app/(platform)/admin/rate-limits/components/tests/RateLimitDisplay.test.tsx
@@ -0,0 +1,281 @@
+import {
+  render,
+  screen,
+  fireEvent,
+  waitFor,
+  cleanup,
+} from "@/tests/integrations/test-utils";
+import { afterEach, beforeEach, describe, expect, it, vi } from "vitest";
+import { RateLimitDisplay } from "../RateLimitDisplay";
+import type { UserRateLimitResponse } from "@/app/api/__generated__/models/userRateLimitResponse";
+
+vi.mock("@/components/molecules/Toast/use-toast", () => ({
+  useToast: () => ({ toast: vi.fn() }),
+}));
+
+const mockConfirm = vi.fn();
+
+beforeEach(() => {
+  mockConfirm.mockReset();
+  window.confirm = mockConfirm;
+});
+
+afterEach(() => {
+  cleanup();
+});
+
+function makeData(
+  overrides: Partial<UserRateLimitResponse> = {},
+): UserRateLimitResponse {
+  return {
+    user_id: "user-abc-123",
+    user_email: "alice@example.com",
+    daily_token_limit: 10000,
+    weekly_token_limit: 50000,
+    daily_tokens_used: 2500,
+    weekly_tokens_used: 10000,
+    tier: "FREE",
+    ...overrides,
+  };
+}
+
+describe("RateLimitDisplay", () => {
+  it("renders the user email heading", () => {
+    render(<RateLimitDisplay data={makeData()} onReset={vi.fn()} />);
+    expect(
+      screen.getByText(/Rate Limits for alice@example\.com/),
+    ).toBeDefined();
+  });
+
+  it("renders user ID when email is present", () => {
+    render(<RateLimitDisplay data={makeData()} onReset={vi.fn()} />);
+    expect(screen.getByText(/user-abc-123/)).toBeDefined();
+  });
+
+  it("falls back to user_id in heading when email is absent", () => {
+    render(
+      <RateLimitDisplay
+        data={makeData({ user_email: undefined })}
+        onReset={vi.fn()}
+      />,
+    );
+    expect(screen.getByText(/Rate Limits for user-abc-123/)).toBeDefined();
+  });
+
+  it("displays the current tier badge", () => {
+    render(
+      <RateLimitDisplay data={makeData({ tier: "PRO" })} onReset={vi.fn()} />,
+    );
+    const badge = screen.getByText("PRO");
+    expect(badge).toBeDefined();
+    expect(badge.className).toContain("bg-blue-100");
+  });
+
+  it("defaults unknown tier to FREE", () => {
+    render(
+      <RateLimitDisplay
+        data={makeData({ tier: "UNKNOWN" as UserRateLimitResponse["tier"] })}
+        onReset={vi.fn()}
+      />,
+    );
+    const badge = screen.getByText("FREE");
+    expect(badge).toBeDefined();
+  });
+
+  it("renders tier dropdown with all tiers", () => {
+    render(<RateLimitDisplay data={makeData()} onReset={vi.fn()} />);
+    const select = screen.getByLabelText("Subscription tier");
+    expect(select).toBeDefined();
+    expect(select.querySelectorAll("option").length).toBe(4);
+  });
+
+  it("disables tier dropdown when onTierChange is not provided", () => {
+    render(<RateLimitDisplay data={makeData()} onReset={vi.fn()} />);
+    const select = screen.getByLabelText(
+      "Subscription tier",
+    ) as HTMLSelectElement;
+    expect(select.disabled).toBe(true);
+  });
+
+  it("enables tier dropdown when onTierChange is provided", () => {
+    render(
+      <RateLimitDisplay
+        data={makeData()}
+        onReset={vi.fn()}
+        onTierChange={vi.fn()}
+      />,
+    );
+    const select = screen.getByLabelText(
+      "Subscription tier",
+    ) as HTMLSelectElement;
+    expect(select.disabled).toBe(false);
+  });
+
+  it("renders daily and weekly usage sections", () => {
+    render(<RateLimitDisplay data={makeData()} onReset={vi.fn()} />);
+    expect(screen.getByText("Daily Usage")).toBeDefined();
+    expect(screen.getByText("Weekly Usage")).toBeDefined();
+  });
+
+  it("renders reset scope dropdown and reset button", () => {
+    render(<RateLimitDisplay data={makeData()} onReset={vi.fn()} />);
+    expect(screen.getByLabelText("Reset scope")).toBeDefined();
+    expect(screen.getByText("Reset Usage")).toBeDefined();
+  });
+
+  it("disables reset button when nothing to reset", () => {
+    render(
+      <RateLimitDisplay
+        data={makeData({ daily_tokens_used: 0 })}
+        onReset={vi.fn()}
+      />,
+    );
+    const button = screen.getByText("Reset Usage").closest("button")!;
+    expect(button.disabled).toBe(true);
+  });
+
+  it("enables reset button when there is usage to reset", () => {
+    render(
+      <RateLimitDisplay
+        data={makeData({ daily_tokens_used: 100 })}
+        onReset={vi.fn()}
+      />,
+    );
+    const button = screen.getByText("Reset Usage").closest("button")!;
+    expect(button.disabled).toBe(false);
+  });
+
+  it("calls onReset when reset button is clicked and confirmed", async () => {
+    const onReset = vi.fn().mockResolvedValue(undefined);
+    mockConfirm.mockReturnValue(true);
+
+    render(<RateLimitDisplay data={makeData()} onReset={onReset} />);
+
+    fireEvent.click(screen.getByText("Reset Usage"));
+
+    await waitFor(() => {
+      expect(onReset).toHaveBeenCalledWith(false);
+    });
+  });
+
+  it("does not call onReset when confirm is cancelled", () => {
+    const onReset = vi.fn();
+    mockConfirm.mockReturnValue(false);
+
+    render(<RateLimitDisplay data={makeData()} onReset={onReset} />);
+
+    fireEvent.click(screen.getByText("Reset Usage"));
+    expect(onReset).not.toHaveBeenCalled();
+  });
+
+  it("passes resetWeekly=true when 'both' is selected", async () => {
+    const onReset = vi.fn().mockResolvedValue(undefined);
+    mockConfirm.mockReturnValue(true);
+
+    render(
+      <RateLimitDisplay
+        data={makeData({ weekly_tokens_used: 100 })}
+        onReset={onReset}
+      />,
+    );
+
+    fireEvent.change(screen.getByLabelText("Reset scope"), {
+      target: { value: "both" },
+    });
+    fireEvent.click(screen.getByText("Reset Usage"));
+
+    await waitFor(() => {
+      expect(onReset).toHaveBeenCalledWith(true);
+    });
+  });
+
+  it("calls onTierChange when tier is changed and confirmed", async () => {
+    const onTierChange = vi.fn().mockResolvedValue(undefined);
+    mockConfirm.mockReturnValue(true);
+
+    render(
+      <RateLimitDisplay
+        data={makeData({ tier: "FREE" })}
+        onReset={vi.fn()}
+        onTierChange={onTierChange}
+      />,
+    );
+
+    fireEvent.change(screen.getByLabelText("Subscription tier"), {
+      target: { value: "PRO" },
+    });
+
+    await waitFor(() => {
+      expect(onTierChange).toHaveBeenCalledWith("PRO");
+    });
+  });
+
+  it("does not call onTierChange when selecting the same tier", () => {
+    const onTierChange = vi.fn();
+
+    render(
+      <RateLimitDisplay
+        data={makeData({ tier: "FREE" })}
+        onReset={vi.fn()}
+        onTierChange={onTierChange}
+      />,
+    );
+
+    fireEvent.change(screen.getByLabelText("Subscription tier"), {
+      target: { value: "FREE" },
+    });
+
+    expect(onTierChange).not.toHaveBeenCalled();
+  });
+
+  it("does not call onTierChange when confirm is cancelled", () => {
+    const onTierChange = vi.fn();
+    mockConfirm.mockReturnValue(false);
+
+    render(
+      <RateLimitDisplay
+        data={makeData({ tier: "FREE" })}
+        onReset={vi.fn()}
+        onTierChange={onTierChange}
+      />,
+    );
+
+    fireEvent.change(screen.getByLabelText("Subscription tier"), {
+      target: { value: "PRO" },
+    });
+
+    expect(onTierChange).not.toHaveBeenCalled();
+  });
+
+  it("catches error when onTierChange rejects", async () => {
+    const onTierChange = vi.fn().mockRejectedValue(new Error("fail"));
+    mockConfirm.mockReturnValue(true);
+
+    render(
+      <RateLimitDisplay
+        data={makeData({ tier: "FREE" })}
+        onReset={vi.fn()}
+        onTierChange={onTierChange}
+      />,
+    );
+
+    fireEvent.change(screen.getByLabelText("Subscription tier"), {
+      target: { value: "PRO" },
+    });
+
+    await waitFor(() => {
+      expect(onTierChange).toHaveBeenCalledWith("PRO");
+    });
+  });
+
+  it("applies custom className when provided", () => {
+    const { container } = render(
+      <RateLimitDisplay
+        data={makeData()}
+        onReset={vi.fn()}
+        className="custom-class"
+      />,
+    );
+    expect(container.firstElementChild?.className).toBe("custom-class");
+  });
+});
--- a/autogpt_platform/frontend/src/app/(platform)/admin/rate-limits/components/tests/RateLimitManager.test.tsx
+++ b/autogpt_platform/frontend/src/app/(platform)/admin/rate-limits/components/tests/RateLimitManager.test.tsx
@@ -0,0 +1,216 @@
+import {
+  render,
+  screen,
+  fireEvent,
+  cleanup,
+} from "@/tests/integrations/test-utils";
+import { afterEach, describe, expect, it, vi } from "vitest";
+import { RateLimitManager } from "../RateLimitManager";
+import type { UserRateLimitResponse } from "@/app/api/__generated__/models/userRateLimitResponse";
+
+const mockHandleSearch = vi.fn();
+const mockHandleSelectUser = vi.fn();
+const mockHandleReset = vi.fn();
+const mockHandleTierChange = vi.fn();
+
+vi.mock("../useRateLimitManager", () => ({
+  useRateLimitManager: () => mockHookReturn,
+}));
+
+vi.mock("../../../components/AdminUserSearch", () => ({
+  AdminUserSearch: ({
+    onSearch,
+    placeholder,
+    isLoading,
+  }: {
+    onSearch: (q: string) => void;
+    placeholder: string;
+    isLoading: boolean;
+  }) => (
+    <div data-testid="admin-user-search">
+      <input
+        data-testid="search-input"
+        placeholder={placeholder}
+        disabled={isLoading}
+        onKeyDown={(e) => {
+          if (e.key === "Enter") onSearch((e.target as HTMLInputElement).value);
+        }}
+      />
+    </div>
+  ),
+}));
+
+vi.mock("../RateLimitDisplay", () => ({
+  RateLimitDisplay: ({
+    data,
+    onReset,
+    onTierChange,
+  }: {
+    data: UserRateLimitResponse;
+    onReset: (rw: boolean) => void;
+    onTierChange: (t: string) => void;
+  }) => (
+    <div data-testid="rate-limit-display">
+      <span>{data.user_email ?? data.user_id}</span>
+      <button onClick={() => onReset(false)}>mock-reset</button>
+      <button onClick={() => onTierChange("PRO")}>mock-tier</button>
+    </div>
+  ),
+}));
+
+let mockHookReturn = buildHookReturn();
+
+function buildHookReturn(overrides: Record<string, unknown> = {}) {
+  return {
+    isSearching: false,
+    isLoadingRateLimit: false,
+    searchResults: [] as Array<{ user_id: string; user_email: string }>,
+    selectedUser: null as { user_id: string; user_email: string } | null,
+    rateLimitData: null as UserRateLimitResponse | null,
+    handleSearch: mockHandleSearch,
+    handleSelectUser: mockHandleSelectUser,
+    handleReset: mockHandleReset,
+    handleTierChange: mockHandleTierChange,
+    ...overrides,
+  };
+}
+
+afterEach(() => {
+  cleanup();
+  mockHandleSearch.mockClear();
+  mockHandleSelectUser.mockClear();
+  mockHandleReset.mockClear();
+  mockHandleTierChange.mockClear();
+  mockHookReturn = buildHookReturn();
+});
+
+describe("RateLimitManager", () => {
+  it("renders the search section", () => {
+    render(<RateLimitManager />);
+    expect(screen.getByText("Search User")).toBeDefined();
+    expect(screen.getByTestId("admin-user-search")).toBeDefined();
+  });
+
+  it("renders description text for search", () => {
+    render(<RateLimitManager />);
+    expect(
+      screen.getByText(/Exact email or user ID does a direct lookup/),
+    ).toBeDefined();
+  });
+
+  it("does not show user list when searchResults is empty", () => {
+    render(<RateLimitManager />);
+    expect(screen.queryByText(/Select a user/)).toBeNull();
+  });
+
+  it("shows user selection list when results exist and no user selected", () => {
+    mockHookReturn = buildHookReturn({
+      searchResults: [
+        { user_id: "u1", user_email: "alice@example.com" },
+        { user_id: "u2", user_email: "bob@example.com" },
+      ],
+    });
+
+    render(<RateLimitManager />);
+
+    expect(screen.getByText("Select a user (2 results)")).toBeDefined();
+    expect(screen.getByText("alice@example.com")).toBeDefined();
+    expect(screen.getByText("bob@example.com")).toBeDefined();
+  });
+
+  it("shows singular 'result' text for single result", () => {
+    mockHookReturn = buildHookReturn({
+      searchResults: [{ user_id: "u1", user_email: "alice@example.com" }],
+    });
+
+    render(<RateLimitManager />);
+    expect(screen.getByText("Select a user (1 result)")).toBeDefined();
+  });
+
+  it("calls handleSelectUser when a user in the list is clicked", () => {
+    const users = [
+      { user_id: "u1", user_email: "alice@example.com" },
+      { user_id: "u2", user_email: "bob@example.com" },
+    ];
+    mockHookReturn = buildHookReturn({ searchResults: users });
+
+    render(<RateLimitManager />);
+
+    fireEvent.click(screen.getByText("bob@example.com"));
+    expect(mockHandleSelectUser).toHaveBeenCalledWith(users[1]);
+  });
+
+  it("hides selection list when a user is selected", () => {
+    const users = [{ user_id: "u1", user_email: "alice@example.com" }];
+    mockHookReturn = buildHookReturn({
+      searchResults: users,
+      selectedUser: users[0],
+    });
+
+    render(<RateLimitManager />);
+    expect(screen.queryByText(/Select a user/)).toBeNull();
+  });
+
+  it("shows selected user indicator", () => {
+    const users = [{ user_id: "u1", user_email: "alice@example.com" }];
+    mockHookReturn = buildHookReturn({
+      searchResults: users,
+      selectedUser: users[0],
+    });
+
+    render(<RateLimitManager />);
+    expect(screen.getByText("Selected:")).toBeDefined();
+  });
+
+  it("shows loading message when isLoadingRateLimit is true", () => {
+    mockHookReturn = buildHookReturn({ isLoadingRateLimit: true });
+
+    render(<RateLimitManager />);
+    expect(screen.getByText("Loading rate limits...")).toBeDefined();
+  });
+
+  it("renders RateLimitDisplay when rateLimitData is present", () => {
+    mockHookReturn = buildHookReturn({
+      rateLimitData: {
+        user_id: "user-123",
+        user_email: "alice@example.com",
+        daily_token_limit: 10000,
+        weekly_token_limit: 50000,
+        daily_tokens_used: 2500,
+        weekly_tokens_used: 10000,
+        tier: "FREE",
+      },
+    });
+
+    render(<RateLimitManager />);
+    expect(screen.getByTestId("rate-limit-display")).toBeDefined();
+    expect(screen.getByText("alice@example.com")).toBeDefined();
+  });
+
+  it("does not render RateLimitDisplay when rateLimitData is null", () => {
+    render(<RateLimitManager />);
+    expect(screen.queryByTestId("rate-limit-display")).toBeNull();
+  });
+
+  it("passes handleReset and handleTierChange to RateLimitDisplay", () => {
+    mockHookReturn = buildHookReturn({
+      rateLimitData: {
+        user_id: "user-123",
+        user_email: "alice@example.com",
+        daily_token_limit: 10000,
+        weekly_token_limit: 50000,
+        daily_tokens_used: 2500,
+        weekly_tokens_used: 10000,
+        tier: "FREE",
+      },
+    });
+
+    render(<RateLimitManager />);
+
+    fireEvent.click(screen.getByText("mock-reset"));
+    expect(mockHandleReset).toHaveBeenCalledWith(false);
+
+    fireEvent.click(screen.getByText("mock-tier"));
+    expect(mockHandleTierChange).toHaveBeenCalledWith("PRO");
+  });
+});
--- a/autogpt_platform/frontend/src/app/(platform)/admin/rate-limits/components/tests/useRateLimitManager.test.ts
+++ b/autogpt_platform/frontend/src/app/(platform)/admin/rate-limits/components/tests/useRateLimitManager.test.ts
@@ -0,0 +1,387 @@
+import { describe, expect, it, vi, beforeEach, afterEach } from "vitest";
+import { renderHook, act, cleanup } from "@testing-library/react";
+
+const mockToast = vi.fn();
+vi.mock("@/components/molecules/Toast/use-toast", () => ({
+  useToast: () => ({ toast: mockToast }),
+}));
+
+const mockGetV2GetUserRateLimit = vi.fn();
+const mockGetV2SearchUsersByNameOrEmail = vi.fn();
+const mockPostV2ResetUserRateLimitUsage = vi.fn();
+const mockPostV2SetUserRateLimitTier = vi.fn();
+
+vi.mock("@/app/api/__generated__/endpoints/admin/admin", () => ({
+  getV2GetUserRateLimit: (...args: unknown[]) =>
+    mockGetV2GetUserRateLimit(...args),
+  getV2SearchUsersByNameOrEmail: (...args: unknown[]) =>
+    mockGetV2SearchUsersByNameOrEmail(...args),
+  postV2ResetUserRateLimitUsage: (...args: unknown[]) =>
+    mockPostV2ResetUserRateLimitUsage(...args),
+  postV2SetUserRateLimitTier: (...args: unknown[]) =>
+    mockPostV2SetUserRateLimitTier(...args),
+}));
+
+import { useRateLimitManager } from "../useRateLimitManager";
+
+function makeRateLimitResponse(overrides = {}) {
+  return {
+    user_id: "user-123",
+    user_email: "alice@example.com",
+    daily_token_limit: 10000,
+    weekly_token_limit: 50000,
+    daily_tokens_used: 2500,
+    weekly_tokens_used: 10000,
+    tier: "FREE",
+    ...overrides,
+  };
+}
+
+beforeEach(() => {
+  mockToast.mockClear();
+  mockGetV2GetUserRateLimit.mockReset();
+  mockGetV2SearchUsersByNameOrEmail.mockReset();
+  mockPostV2ResetUserRateLimitUsage.mockReset();
+  mockPostV2SetUserRateLimitTier.mockReset();
+});
+
+afterEach(() => {
+  cleanup();
+});
+
+describe("useRateLimitManager", () => {
+  it("returns initial state", () => {
+    const { result } = renderHook(() => useRateLimitManager());
+
+    expect(result.current.isSearching).toBe(false);
+    expect(result.current.isLoadingRateLimit).toBe(false);
+    expect(result.current.searchResults).toEqual([]);
+    expect(result.current.selectedUser).toBeNull();
+    expect(result.current.rateLimitData).toBeNull();
+  });
+
+  it("handleSearch does nothing for empty query", async () => {
+    const { result } = renderHook(() => useRateLimitManager());
+
+    await act(async () => {
+      await result.current.handleSearch("  ");
+    });
+
+    expect(mockGetV2GetUserRateLimit).not.toHaveBeenCalled();
+    expect(mockGetV2SearchUsersByNameOrEmail).not.toHaveBeenCalled();
+  });
+
+  it("handleSearch does direct lookup for email input", async () => {
+    const data = makeRateLimitResponse();
+    mockGetV2GetUserRateLimit.mockResolvedValue({ status: 200, data });
+
+    const { result } = renderHook(() => useRateLimitManager());
+
+    await act(async () => {
+      await result.current.handleSearch("alice@example.com");
+    });
+
+    expect(mockGetV2GetUserRateLimit).toHaveBeenCalledWith({
+      email: "alice@example.com",
+    });
+    expect(result.current.rateLimitData).toEqual(data);
+    expect(result.current.selectedUser).toEqual({
+      user_id: "user-123",
+      user_email: "alice@example.com",
+    });
+  });
+
+  it("handleSearch does direct lookup for UUID input", async () => {
+    const uuid = "550e8400-e29b-41d4-a716-446655440000";
+    const data = makeRateLimitResponse({ user_id: uuid });
+    mockGetV2GetUserRateLimit.mockResolvedValue({ status: 200, data });
+
+    const { result } = renderHook(() => useRateLimitManager());
+
+    await act(async () => {
+      await result.current.handleSearch(uuid);
+    });
+
+    expect(mockGetV2GetUserRateLimit).toHaveBeenCalledWith({
+      user_id: uuid,
+    });
+    expect(result.current.rateLimitData).toEqual(data);
+  });
+
+  it("handleSearch shows error toast on direct lookup failure", async () => {
+    mockGetV2GetUserRateLimit.mockResolvedValue({ status: 404 });
+
+    const { result } = renderHook(() => useRateLimitManager());
+
+    await act(async () => {
+      await result.current.handleSearch("alice@example.com");
+    });
+
+    expect(mockToast).toHaveBeenCalledWith(
+      expect.objectContaining({
+        title: "Error",
+        variant: "destructive",
+      }),
+    );
+    expect(result.current.rateLimitData).toBeNull();
+  });
+
+  it("handleSearch does fuzzy search for partial text", async () => {
+    const users = [
+      { user_id: "u1", user_email: "alice@example.com" },
+      { user_id: "u2", user_email: "bob@example.com" },
+    ];
+    mockGetV2SearchUsersByNameOrEmail.mockResolvedValue({
+      status: 200,
+      data: users,
+    });
+
+    const { result } = renderHook(() => useRateLimitManager());
+
+    await act(async () => {
+      await result.current.handleSearch("alice");
+    });
+
+    expect(mockGetV2SearchUsersByNameOrEmail).toHaveBeenCalledWith({
+      query: "alice",
+      limit: 20,
+    });
+    expect(result.current.searchResults).toEqual(users);
+  });
+
+  it("handleSearch shows toast when fuzzy search returns no results", async () => {
+    mockGetV2SearchUsersByNameOrEmail.mockResolvedValue({
+      status: 200,
+      data: [],
+    });
+
+    const { result } = renderHook(() => useRateLimitManager());
+
+    await act(async () => {
+      await result.current.handleSearch("nonexistent");
+    });
+
+    expect(mockToast).toHaveBeenCalledWith(
+      expect.objectContaining({ title: "No results" }),
+    );
+    expect(result.current.searchResults).toEqual([]);
+  });
+
+  it("handleSearch shows error toast on fuzzy search failure", async () => {
+    mockGetV2SearchUsersByNameOrEmail.mockResolvedValue({ status: 500 });
+
+    const { result } = renderHook(() => useRateLimitManager());
+
+    await act(async () => {
+      await result.current.handleSearch("alice");
+    });
+
+    expect(mockToast).toHaveBeenCalledWith(
+      expect.objectContaining({
+        title: "Error",
+        variant: "destructive",
+      }),
+    );
+  });
+
+  it("handleSelectUser fetches rate limit for selected user", async () => {
+    const data = makeRateLimitResponse();
+    mockGetV2GetUserRateLimit.mockResolvedValue({ status: 200, data });
+
+    const { result } = renderHook(() => useRateLimitManager());
+
+    await act(async () => {
+      await result.current.handleSelectUser({
+        user_id: "user-123",
+        user_email: "alice@example.com",
+      });
+    });
+
+    expect(mockGetV2GetUserRateLimit).toHaveBeenCalledWith({
+      user_id: "user-123",
+    });
+    expect(result.current.selectedUser).toEqual({
+      user_id: "user-123",
+      user_email: "alice@example.com",
+    });
+    expect(result.current.rateLimitData).toEqual(data);
+  });
+
+  it("handleSelectUser shows error toast on fetch failure", async () => {
+    mockGetV2GetUserRateLimit.mockResolvedValue({ status: 500 });
+
+    const { result } = renderHook(() => useRateLimitManager());
+
+    await act(async () => {
+      await result.current.handleSelectUser({
+        user_id: "user-123",
+        user_email: "alice@example.com",
+      });
+    });
+
+    expect(mockToast).toHaveBeenCalledWith(
+      expect.objectContaining({
+        title: "Error",
+        variant: "destructive",
+      }),
+    );
+    expect(result.current.rateLimitData).toBeNull();
+  });
+
+  it("handleReset calls reset endpoint and updates data", async () => {
+    const initial = makeRateLimitResponse({ daily_tokens_used: 5000 });
+    const after = makeRateLimitResponse({ daily_tokens_used: 0 });
+    mockGetV2GetUserRateLimit.mockResolvedValue({ status: 200, data: initial });
+    mockPostV2ResetUserRateLimitUsage.mockResolvedValue({
+      status: 200,
+      data: after,
+    });
+
+    const { result } = renderHook(() => useRateLimitManager());
+
+    await act(async () => {
+      await result.current.handleSelectUser({
+        user_id: "user-123",
+        user_email: "alice@example.com",
+      });
+    });
+
+    await act(async () => {
+      await result.current.handleReset(false);
+    });
+
+    expect(mockPostV2ResetUserRateLimitUsage).toHaveBeenCalledWith({
+      user_id: "user-123",
+      reset_weekly: false,
+    });
+    expect(result.current.rateLimitData).toEqual(after);
+    expect(mockToast).toHaveBeenCalledWith(
+      expect.objectContaining({ title: "Success" }),
+    );
+  });
+
+  it("handleReset does nothing when no rate limit data", async () => {
+    const { result } = renderHook(() => useRateLimitManager());
+
+    await act(async () => {
+      await result.current.handleReset(false);
+    });
+
+    expect(mockPostV2ResetUserRateLimitUsage).not.toHaveBeenCalled();
+  });
+
+  it("handleReset shows error toast on failure", async () => {
+    const initial = makeRateLimitResponse();
+    mockGetV2GetUserRateLimit.mockResolvedValue({ status: 200, data: initial });
+    mockPostV2ResetUserRateLimitUsage.mockRejectedValue(
+      new Error("network error"),
+    );
+
+    const { result } = renderHook(() => useRateLimitManager());
+
+    await act(async () => {
+      await result.current.handleSelectUser({
+        user_id: "user-123",
+        user_email: "alice@example.com",
+      });
+    });
+
+    await act(async () => {
+      await result.current.handleReset(true);
+    });
+
+    expect(mockToast).toHaveBeenCalledWith(
+      expect.objectContaining({
+        title: "Error",
+        description: "Failed to reset rate limit usage.",
+        variant: "destructive",
+      }),
+    );
+  });
+
+  it("handleTierChange calls set tier and re-fetches", async () => {
+    const initial = makeRateLimitResponse({ tier: "FREE" });
+    const updated = makeRateLimitResponse({ tier: "PRO" });
+    mockGetV2GetUserRateLimit
+      .mockResolvedValueOnce({ status: 200, data: initial })
+      .mockResolvedValueOnce({ status: 200, data: updated });
+    mockPostV2SetUserRateLimitTier.mockResolvedValue({ status: 200 });
+
+    const { result } = renderHook(() => useRateLimitManager());
+
+    await act(async () => {
+      await result.current.handleSelectUser({
+        user_id: "user-123",
+        user_email: "alice@example.com",
+      });
+    });
+
+    await act(async () => {
+      await result.current.handleTierChange("PRO");
+    });
+
+    expect(mockPostV2SetUserRateLimitTier).toHaveBeenCalledWith({
+      user_id: "user-123",
+      tier: "PRO",
+    });
+    expect(result.current.rateLimitData).toEqual(updated);
+  });
+
+  it("handleTierChange does nothing when no rate limit data", async () => {
+    const { result } = renderHook(() => useRateLimitManager());
+
+    await act(async () => {
+      await result.current.handleTierChange("PRO");
+    });
+
+    expect(mockPostV2SetUserRateLimitTier).not.toHaveBeenCalled();
+  });
+
+  it("handleReset shows error toast when endpoint returns non-200 status", async () => {
+    const initial = makeRateLimitResponse({ daily_tokens_used: 5000 });
+    mockGetV2GetUserRateLimit.mockResolvedValue({ status: 200, data: initial });
+    mockPostV2ResetUserRateLimitUsage.mockResolvedValue({ status: 500 });
+
+    const { result } = renderHook(() => useRateLimitManager());
+
+    await act(async () => {
+      await result.current.handleSelectUser({
+        user_id: "user-123",
+        user_email: "alice@example.com",
+      });
+    });
+
+    await act(async () => {
+      await result.current.handleReset(false);
+    });
+
+    expect(mockToast).toHaveBeenCalledWith(
+      expect.objectContaining({
+        title: "Error",
+        description: "Failed to reset rate limit usage.",
+        variant: "destructive",
+      }),
+    );
+  });
+
+  it("handleTierChange throws when set-tier endpoint returns non-200", async () => {
+    const initial = makeRateLimitResponse({ tier: "FREE" });
+    mockGetV2GetUserRateLimit.mockResolvedValue({ status: 200, data: initial });
+    mockPostV2SetUserRateLimitTier.mockResolvedValue({ status: 500 });
+
+    const { result } = renderHook(() => useRateLimitManager());
+
+    await act(async () => {
+      await result.current.handleSelectUser({
+        user_id: "user-123",
+        user_email: "alice@example.com",
+      });
+    });
+
+    await expect(
+      act(async () => {
+        await result.current.handleTierChange("PRO");
+      }),
+    ).rejects.toThrow("Failed to update tier");
+  });
+});
--- a/autogpt_platform/frontend/src/app/(platform)/admin/rate-limits/components/useRateLimitManager.ts
+++ b/autogpt_platform/frontend/src/app/(platform)/admin/rate-limits/components/useRateLimitManager.ts
@@ -2,11 +2,13 @@

 import { useState } from "react";
 import { useToast } from "@/components/molecules/Toast/use-toast";
+import type { SetUserTierRequest } from "@/app/api/__generated__/models/setUserTierRequest";
 import type { UserRateLimitResponse } from "@/app/api/__generated__/models/userRateLimitResponse";
 import {
  getV2GetUserRateLimit,
-  getV2GetAllUsersHistory,
+  getV2SearchUsersByNameOrEmail,
  postV2ResetUserRateLimitUsage,
+  postV2SetUserRateLimitTier,
 } from "@/app/api/__generated__/endpoints/admin/admin";

 export interface UserOption {
@@ -14,18 +16,10 @@ export interface UserOption {
  user_email: string;
 }

-/**
- * Returns true when the input looks like a complete email address.
- * Used to decide whether to call the direct email lookup endpoint
- * vs. the broader user-history search.
- */
 function looksLikeEmail(input: string): boolean {
  return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(input);
 }

-/**
- * Returns true when the input looks like a UUID (user ID).
- */
 function looksLikeUuid(input: string): boolean {
  return /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i.test(
    input,
@@ -41,7 +35,6 @@ export function useRateLimitManager() {
  const [rateLimitData, setRateLimitData] =
    useState<UserRateLimitResponse | null>(null);

-  /** Direct lookup by email or user ID via the rate-limit endpoint. */
  async function handleDirectLookup(trimmed: string) {
    setIsSearching(true);
    setSearchResults([]);
@@ -77,7 +70,6 @@ export function useRateLimitManager() {
    }
  }

-  /** Fuzzy name/email search via the spending-history endpoint. */
  async function handleFuzzySearch(trimmed: string) {
    setIsSearching(true);
    setSearchResults([]);
@@ -85,38 +77,21 @@ export function useRateLimitManager() {
    setRateLimitData(null);

    try {
-      const response = await getV2GetAllUsersHistory({
-        search: trimmed,
-        page: 1,
-        page_size: 50,
+      const response = await getV2SearchUsersByNameOrEmail({
+        query: trimmed,
+        limit: 20,
      });
      if (response.status !== 200) {
        throw new Error("Failed to search users");
      }

-      // Deduplicate by user_id to get unique users
-      const seen = new Set<string>();
-      const users: UserOption[] = [];
-      for (const tx of response.data.history) {
-        if (!seen.has(tx.user_id)) {
-          seen.add(tx.user_id);
-          users.push({
-            user_id: tx.user_id,
-            user_email: String(tx.user_email ?? tx.user_id),
-          });
-        }
-      }
-
+      const users = (response.data ?? []).map((u) => ({
+        user_id: u.user_id,
+        user_email: u.user_email ?? u.user_id,
+      }));
      if (users.length === 0) {
-        toast({
-          title: "No results",
-          description: "No users found matching your search.",
-        });
+        toast({ title: "No results", description: "No users found." });
      }
-
-      // Always show the result list so the user explicitly picks a match.
-      // The history endpoint paginates transactions, not users, so a single
-      // page may not be authoritative -- avoid auto-selecting.
      setSearchResults(users);
    } catch (error) {
      console.error("Error searching users:", error);
@@ -199,6 +174,32 @@ export function useRateLimitManager() {
    }
  }

+  async function handleTierChange(newTier: string) {
+    if (!rateLimitData) return;
+
+    const response = await postV2SetUserRateLimitTier({
+      user_id: rateLimitData.user_id,
+      tier: newTier as SetUserTierRequest["tier"],
+    });
+
+    if (response.status !== 200) {
+      throw new Error("Failed to update tier");
+    }
+
+    // Re-fetch rate limit data to reflect new tier-adjusted limits.
+    try {
+      const refreshResponse = await getV2GetUserRateLimit({
+        user_id: rateLimitData.user_id,
+      });
+      if (refreshResponse.status === 200) {
+        setRateLimitData(refreshResponse.data);
+      }
+    } catch {
+      // Tier was changed server-side; if this refresh fails the UI shows stale limits.
+      // The caller's success toast is still valid because the tier change worked.
+    }
+  }
+
  return {
    isSearching,
    isLoadingRateLimit,
@@ -208,5 +209,6 @@ export function useRateLimitManager() {
    handleSearch,
    handleSelectUser,
    handleReset,
+    handleTierChange,
  };
 }
--- a/autogpt_platform/frontend/src/app/(platform)/build/components/FlowEditor/nodes/CustomNode/components/NodeOutput/components/ContentRenderer.tsx
+++ b/autogpt_platform/frontend/src/app/(platform)/build/components/FlowEditor/nodes/CustomNode/components/NodeOutput/components/ContentRenderer.tsx
@@ -40,14 +40,14 @@ export const ContentRenderer: React.FC<{
    !shortContent
  ) {
    return (
-      <div className="overflow-hidden [&>*]:rounded-xlarge [&>*]:!text-xs [&_pre]:whitespace-pre-wrap [&_pre]:break-words">
+      <div className="overflow-x-auto [&>*]:rounded-xlarge [&>*]:!text-xs [&_pre]:whitespace-pre-wrap [&_pre]:break-words">
        {renderer?.render(value, metadata)}
      </div>
    );
  }

  return (
-    <div className="overflow-hidden [&>*]:rounded-xlarge [&>*]:!text-xs">
+    <div className="overflow-x-auto [&>*]:rounded-xlarge [&>*]:!text-xs">
      <TextRenderer value={value} truncateLengthLimit={200} />
    </div>
  );
--- a/autogpt_platform/frontend/src/app/(platform)/copilot/CopilotPage.tsx
+++ b/autogpt_platform/frontend/src/app/(platform)/copilot/CopilotPage.tsx
@@ -8,6 +8,7 @@ import { Flag, useGetFlag } from "@/services/feature-flags/use-get-flag";
 import { SidebarProvider } from "@/components/ui/sidebar";
 import { cn } from "@/lib/utils";
 import { UploadSimple } from "@phosphor-icons/react";
+import dynamic from "next/dynamic";
 import { useCallback, useEffect, useRef, useState } from "react";
 import { ChatContainer } from "./components/ChatContainer/ChatContainer";
 import { ChatSidebar } from "./components/ChatSidebar/ChatSidebar";
@@ -20,6 +21,14 @@ import { RateLimitResetDialog } from "./components/RateLimitResetDialog/RateLimi
 import { ScaleLoader } from "./components/ScaleLoader/ScaleLoader";
 import { useCopilotPage } from "./useCopilotPage";

+const ArtifactPanel = dynamic(
+  () =>
+    import("./components/ArtifactPanel/ArtifactPanel").then(
+      (m) => m.ArtifactPanel,
+    ),
+  { ssr: false },
+);
+
 export function CopilotPage() {
  const [isDragging, setIsDragging] = useState(false);
  const [droppedFiles, setDroppedFiles] = useState<File[]>([]);
@@ -80,6 +89,10 @@ export function CopilotPage() {
    isUploadingFiles,
    isUserLoading,
    isLoggedIn,
+    // Pagination
+    hasMoreMessages,
+    isLoadingMore,
+    loadMore,
    // Mobile drawer
    isMobile,
    isDrawerOpen,
@@ -116,6 +129,7 @@ export function CopilotPage() {
  const resetCost = usage?.reset_cost;

  const isBillingEnabled = useGetFlag(Flag.ENABLE_PLATFORM_PAYMENT);
+  const isArtifactsEnabled = useGetFlag(Flag.ARTIFACTS);
  const { credits, fetchCredits } = useCredits({ fetchInitialCredits: true });
  const hasInsufficientCredits =
    credits !== null && resetCost != null && credits < resetCost;
@@ -150,48 +164,55 @@ export function CopilotPage() {
      className="h-[calc(100vh-72px)] min-h-0"
    >
      {!isMobile && <ChatSidebar />}
-      <div
-        className="relative flex h-full w-full flex-col overflow-hidden bg-[#f8f8f9] px-0"
-        onDragEnter={handleDragEnter}
-        onDragOver={handleDragOver}
-        onDragLeave={handleDragLeave}
-        onDrop={handleDrop}
-      >
-        {isMobile && <MobileHeader onOpenDrawer={handleOpenDrawer} />}
-        <NotificationBanner />
-        {/* Drop overlay */}
+      <div className="flex h-full w-full flex-row overflow-hidden">
        <div
-          className={cn(
-            "pointer-events-none absolute inset-0 z-50 flex flex-col items-center justify-center gap-3 rounded-lg border-2 border-dashed border-violet-400 bg-violet-500/10 transition-opacity duration-150",
-            isDragging ? "opacity-100" : "opacity-0",
-          )}
+          className="relative flex min-w-0 flex-1 flex-col overflow-hidden bg-[#f8f8f9] px-0"
+          onDragEnter={handleDragEnter}
+          onDragOver={handleDragOver}
+          onDragLeave={handleDragLeave}
+          onDrop={handleDrop}
        >
-          <UploadSimple className="h-10 w-10 text-violet-500" weight="bold" />
-          <span className="text-lg font-medium text-violet-600">
-            Drop files here
-          </span>
-        </div>
-        <div className="flex-1 overflow-hidden">
-          <ChatContainer
-            messages={messages}
-            status={status}
-            error={error}
-            sessionId={sessionId}
-            isLoadingSession={isLoadingSession}
-            isSessionError={isSessionError}
-            isCreatingSession={isCreatingSession}
-            isReconnecting={isReconnecting}
-            isSyncing={isSyncing}
-            onCreateSession={createSession}
-            onSend={onSend}
-            onStop={stop}
-            isUploadingFiles={isUploadingFiles}
-            droppedFiles={droppedFiles}
-            onDroppedFilesConsumed={handleDroppedFilesConsumed}
-            historicalDurations={historicalDurations}
-          />
+          {isMobile && <MobileHeader onOpenDrawer={handleOpenDrawer} />}
+          <NotificationBanner />
+          {/* Drop overlay */}
+          <div
+            className={cn(
+              "pointer-events-none absolute inset-0 z-50 flex flex-col items-center justify-center gap-3 rounded-lg border-2 border-dashed border-violet-400 bg-violet-500/10 transition-opacity duration-150",
+              isDragging ? "opacity-100" : "opacity-0",
+            )}
+          >
+            <UploadSimple className="h-10 w-10 text-violet-500" weight="bold" />
+            <span className="text-lg font-medium text-violet-600">
+              Drop files here
+            </span>
+          </div>
+          <div className="flex-1 overflow-hidden">
+            <ChatContainer
+              messages={messages}
+              status={status}
+              error={error}
+              sessionId={sessionId}
+              isLoadingSession={isLoadingSession}
+              isSessionError={isSessionError}
+              isCreatingSession={isCreatingSession}
+              isReconnecting={isReconnecting}
+              isSyncing={isSyncing}
+              onCreateSession={createSession}
+              onSend={onSend}
+              onStop={stop}
+              isUploadingFiles={isUploadingFiles}
+              hasMoreMessages={hasMoreMessages}
+              isLoadingMore={isLoadingMore}
+              onLoadMore={loadMore}
+              droppedFiles={droppedFiles}
+              onDroppedFilesConsumed={handleDroppedFilesConsumed}
+              historicalDurations={historicalDurations}
+            />
+          </div>
        </div>
+        {!isMobile && isArtifactsEnabled && <ArtifactPanel />}
      </div>
+      {isMobile && isArtifactsEnabled && <ArtifactPanel mobile />}
      {isMobile && (
        <MobileDrawer
          isOpen={isDrawerOpen}