fix(frontend): address latest CodeRabbit review suggestions

- Use valid sort value "runs" instead of undefined in MainSearchResultPage test defaultProps to match production default and satisfy type contract
- Remove redundant marketplacePage.goto() navigation in E2E test since the page is already at /marketplace after login

Co-authored-by: Ubbe <0ubbe@users.noreply.github.com>
fix(frontend): address CodeRabbit review suggestions for marketplace tests
2026-04-08 03:00:28 -04:00 · 2026-02-12 15:02:18 +00:00 · 2026-02-12 14:36:34 +00:00 · 2026-01-30 06:34:39 +00:00 · 2026-01-30 11:42:52 +05:30 · 2026-01-30 06:11:53 +00:00
3758 changed files with 875366 additions and 243148 deletions
--- a/.agents/skills
+++ b/.agents/skills
@@ -1 +0,0 @@
-../.claude/skills
--- a/.claude/settings.json
+++ b/.claude/settings.json
@@ -1,10 +0,0 @@
-{
-  "permissions": {
-    "allowedTools": [
-      "Read", "Grep", "Glob",
-      "Bash(ls:*)", "Bash(cat:*)", "Bash(grep:*)", "Bash(find:*)",
-      "Bash(git status:*)", "Bash(git diff:*)", "Bash(git log:*)", "Bash(git worktree:*)",
-      "Bash(tmux:*)", "Bash(sleep:*)", "Bash(branchlet:*)"
-    ]
-  }
-}
--- a/.claude/skills/open-pr/SKILL.md
+++ b/.claude/skills/open-pr/SKILL.md
@@ -1,106 +0,0 @@
---
-name: open-pr
-description: Open a pull request with proper PR template, test coverage, and review workflow. Guides agents through creating a PR that follows repo conventions, ensures existing behaviors aren't broken, covers new behaviors with tests, and handles review via bot when local testing isn't possible. TRIGGER when user asks to "open a PR", "create a PR", "make a PR", "submit a PR", "open pull request", "push and create PR", or any variation of opening/submitting a pull request.
-user-invocable: true
-args: "[base-branch] — optional target branch (defaults to dev)."
-metadata:
-  author: autogpt-team
-  version: "1.0.0"
---
-
-# Open a Pull Request
-
-## Step 1: Pre-flight checks
-
-Before opening the PR:
-
-1. Ensure all changes are committed
-2. Ensure the branch is pushed to the remote (`git push -u origin <branch>`)
-3. Run linters/formatters across the whole repo (not just changed files) and commit any fixes
-
-## Step 2: Test coverage
-
-**This is critical.** Before opening the PR, verify:
-
-### Existing behavior is not broken
- Identify which modules/components your changes touch
- Run the existing test suites for those areas
- If tests fail, fix them before opening the PR — do not open a PR with known regressions
-
-### New behavior has test coverage
- Every new feature, endpoint, or behavior change needs tests
- If you added a new block, add tests for that block
- If you changed API behavior, add or update API tests
- If you changed frontend behavior, verify it doesn't break existing flows
-
-If you cannot run the full test suite locally, note which tests you ran and which you couldn't in the test plan.
-
-## Step 3: Create the PR using the repo template
-
-Read the canonical PR template at `.github/PULL_REQUEST_TEMPLATE.md` and use it **verbatim** as your PR body:
-
-1. Read the template: `cat .github/PULL_REQUEST_TEMPLATE.md`
-2. Preserve the exact section titles and formatting, including:
-   - `### Why / What / How`
-   - `### Changes 🏗️`
-   - `### Checklist 📋`
-3. Replace HTML comment prompts (`<!-- ... -->`) with actual content; do not leave them in
-4. **Do not pre-check boxes** — leave all checkboxes as `- [ ]` until each step is actually completed
-5. Do not alter the template structure, rename sections, or remove any checklist items
-
-**PR title must use conventional commit format** (e.g., `feat(backend): add new block`, `fix(frontend): resolve routing bug`, `dx(skills): update PR workflow`). See CLAUDE.md for the full list of scopes.
-
-Use `gh pr create` with the base branch (defaults to `dev` if no `[base-branch]` was provided). Use `--body-file` to avoid shell interpretation of backticks and special characters:
-
-```bash
-BASE_BRANCH="${BASE_BRANCH:-dev}"
-PR_BODY=$(mktemp)
-cat > "$PR_BODY" << 'PREOF'
-<filled-in template from .github/PULL_REQUEST_TEMPLATE.md>
-PREOF
-gh pr create --base "$BASE_BRANCH" --title "<type>(scope): short description" --body-file "$PR_BODY"
-rm "$PR_BODY"
-```
-
-## Step 4: Review workflow
-
-### If you have a workspace that allows testing (docker, running backend, etc.)
- Run `/pr-test` to do E2E manual testing of the PR using docker compose, agent-browser, and API calls. This is the most thorough way to validate your changes before review.
- After testing, run `/pr-review` to self-review the PR for correctness, security, code quality, and testing gaps before requesting human review.
-
-### If you do NOT have a workspace that allows testing
-This is common for agents running in worktrees without a full stack. In this case:
-
-1. Run `/pr-review` locally to catch obvious issues before pushing
-2. **Comment `/review` on the PR** after creating it to trigger the review bot
-3. **Poll for the review** rather than blindly waiting — check for new review comments every 30 seconds using `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews --paginate` and the GraphQL inline threads query. The bot typically responds within 30 minutes, but polling lets the agent react as soon as it arrives.
-4. Do NOT proceed or merge until the bot review comes back
-5. Address any issues the bot raises — use `/pr-address` which has a full polling loop with CI + comment tracking
-
-```bash
-# After creating the PR:
-PR_NUMBER=$(gh pr view --json number -q .number)
-gh pr comment "$PR_NUMBER" --body "/review"
-# Then use /pr-address to poll for and address the review when it arrives
-```
-
-## Step 5: Address review feedback
-
-Once the review bot or human reviewers leave comments:
- Run `/pr-address` to address review comments. It will loop until CI is green and all comments are resolved.
- Do not merge without human approval.
-
-## Related skills
-
-| Skill | When to use |
-|---|---|
-| `/pr-test` | E2E testing with docker compose, agent-browser, API calls — use when you have a running workspace |
-| `/pr-review` | Review for correctness, security, code quality — use before requesting human review |
-| `/pr-address` | Address reviewer comments and loop until CI green — use after reviews come in |
-
-## Step 6: Post-creation
-
-After the PR is created and review is triggered:
- Share the PR URL with the user
- If waiting on the review bot, let the user know the expected wait time (~30 min)
- Do not merge without human approval
--- a/.claude/skills/pr-address/SKILL.md
+++ b/.claude/skills/pr-address/SKILL.md
@@ -1,232 +0,0 @@
---
-name: pr-address
-description: Address PR review comments and loop until CI green and all comments resolved. TRIGGER when user asks to address comments, fix PR feedback, respond to reviewers, or babysit/monitor a PR.
-user-invocable: true
-argument-hint: "[PR number or URL] — if omitted, finds PR for current branch."
-metadata:
-  author: autogpt-team
-  version: "1.0.0"
---
-
-# PR Address
-
-## Find the PR
-
-```bash
-gh pr list --head $(git branch --show-current) --repo Significant-Gravitas/AutoGPT
-gh pr view {N}
-```
-
-## Read the PR description
-
-Understand the **Why / What / How** before addressing comments — you need context to make good fixes:
-
-```bash
-gh pr view {N} --json body --jq '.body'
-```
-
-## Fetch comments (all sources)
-
-### 1. Inline review threads — GraphQL (primary source of actionable items)
-
-Use GraphQL to fetch inline threads. It natively exposes `isResolved`, returns threads already grouped with all replies, and paginates via cursor — no manual thread reconstruction needed.
-
-```bash
-gh api graphql -f query='
-{
-  repository(owner: "Significant-Gravitas", name: "AutoGPT") {
-    pullRequest(number: {N}) {
-      reviewThreads(first: 100) {
-        pageInfo { hasNextPage endCursor }
-        nodes {
-          id
-          isResolved
-          path
-          comments(last: 1) {
-            nodes { databaseId body author { login } createdAt }
-          }
-        }
-      }
-    }
-  }
-}'
-```
-
-If `pageInfo.hasNextPage` is true, fetch subsequent pages by adding `after: "<endCursor>"` to `reviewThreads(first: 100, after: "...")` and repeat until `hasNextPage` is false.
-
-**Filter to unresolved threads only** — skip any thread where `isResolved: true`. `comments(last: 1)` returns the most recent comment in the thread — act on that; it reflects the reviewer's final ask. Use the thread `id` (Relay global ID) to track threads across polls.
-
-### 2. Top-level reviews — REST (MUST paginate)
-
-```bash
-gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews --paginate
-```
-
-**CRITICAL — always `--paginate`.** Reviews default to 30 per page. PRs can have 80–170+ reviews (mostly empty resolution events). Without pagination you miss reviews past position 30 — including `autogpt-reviewer`'s structured review which is typically posted after several CI runs and sits well beyond the first page.
-
-Two things to extract:
- **Overall state**: look for `CHANGES_REQUESTED` or `APPROVED` reviews.
- **Actionable feedback**: non-empty bodies only. Empty-body reviews are thread-resolution events — they indicate progress but have no feedback to act on.
-
-**Where each reviewer posts:**
- `autogpt-reviewer` — posts detailed structured reviews ("Blockers", "Should Fix", "Nice to Have") as **top-level reviews**. Not present on every PR. Address ALL items.
- `sentry[bot]` — posts bug predictions as **inline threads**. Fix real bugs, explain false positives.
- `coderabbitai[bot]` — posts summaries as **top-level reviews** AND actionable items as **inline threads**. Address actionable items.
- Human reviewers — can post in any source. Address ALL non-empty feedback.
-
-### 3. PR conversation comments — REST
-
-```bash
-gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments --paginate
-```
-
-Mostly contains: bot summaries (`coderabbitai[bot]`), CI/conflict detection (`github-actions[bot]`), and author status updates. Scan for non-empty messages from non-bot human reviewers that aren't the PR author — those are the ones that need a response.
-
-## For each unaddressed comment
-
-Address comments **one at a time**: fix → commit → push → inline reply → next.
-
-1. Read the referenced code, make the fix (or reply explaining why it's not needed)
-2. Commit and push the fix
-3. Reply **inline** (not as a new top-level comment) referencing the fixing commit — this is what resolves the conversation for bot reviewers (coderabbitai, sentry):
-
-| Comment type | How to reply |
-|---|---|
-| Inline review (`pulls/{N}/comments`) | `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments/{ID}/replies -f body="🤖 Fixed in <commit-sha>: <description>"` |
-| Conversation (`issues/{N}/comments`) | `gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments -f body="🤖 Fixed in <commit-sha>: <description>"` |
-
-## Codecov coverage
-
-Codecov patch target is **80%** on changed lines. Checks are **informational** (not blocking) but should be green.
-
-### Running coverage locally
-
-**Backend** (from `autogpt_platform/backend/`):
-```bash
-poetry run pytest -s -vv --cov=backend --cov-branch --cov-report term-missing
-```
-
-**Frontend** (from `autogpt_platform/frontend/`):
-```bash
-pnpm vitest run --coverage
-```
-
-### When codecov/patch fails
-
-1. Find uncovered files: `git diff --name-only $(gh pr view --json baseRefName --jq '.baseRefName')...HEAD`
-2. For each uncovered file — extract inline logic to `helpers.ts`/`helpers.py` and test those (highest ROI). Colocate tests as `*_test.py` (backend) or `__tests__/*.test.ts` (frontend).
-3. Run coverage locally to verify, commit, push.
-
-## Format and commit
-
-After fixing, format the changed code:
-
- **Backend** (from `autogpt_platform/backend/`): `poetry run format`
- **Frontend** (from `autogpt_platform/frontend/`): `pnpm format && pnpm lint && pnpm types`
-
-If API routes changed, regenerate the frontend client:
-```bash
-(cd autogpt_platform/backend && exec poetry run rest) &  # subshell: the caller's working directory stays at the repo root
-REST_PID=$!
-trap "kill $REST_PID 2>/dev/null" EXIT
-WAIT=0; until curl -sf http://localhost:8006/health > /dev/null 2>&1; do sleep 1; WAIT=$((WAIT+1)); [ $WAIT -ge 60 ] && echo "Timed out" && exit 1; done
-(cd autogpt_platform/frontend && pnpm generate:api:force)
-kill $REST_PID 2>/dev/null; trap - EXIT
-```
-Never manually edit files in `src/app/api/__generated__/`.
-
-Then commit and **push immediately** — never batch commits without pushing. Each fix should be visible on GitHub right away so CI can start and reviewers can see progress.
-
-**Never push empty commits** (`git commit --allow-empty`) to re-trigger CI or bot checks. When a check fails, investigate the root cause (unchecked PR checklist, unaddressed review comments, code issues) and fix those directly. Empty commits add noise to git history.
-
-For backend commits in worktrees: `poetry run git commit` (pre-commit hooks).
-
-## The loop
-
-```text
-address comments → format → commit → push
-→ wait for CI (while addressing new comments) → fix failures → push
-→ re-check comments after CI settles
-→ repeat until: all comments addressed AND CI green AND no new comments arriving
-```
-
-### Polling for CI + new comments
-
-After pushing, poll for **both** CI status and new comments in a single loop. Do not use `gh pr checks --watch` — it blocks the tool and prevents reacting to new comments while CI is running.
-
-> **Note:** `gh pr checks --watch --fail-fast` is tempting but it blocks the entire Bash tool call, meaning the agent cannot check for or address new comments until CI fully completes. Always poll manually instead.
-
-**Polling loop — repeat every 30 seconds:**
-
-1. Check CI status:
-```bash
-gh pr checks {N} --repo Significant-Gravitas/AutoGPT --json bucket,name,link
-```
-   Parse the results: if every check has `bucket` of `"pass"` or `"skipping"`, CI is green. If any has `"fail"`, CI has failed. Otherwise CI is still pending.
-
-2. Check for merge conflicts:
-```bash
-gh pr view {N} --repo Significant-Gravitas/AutoGPT --json mergeable --jq '.mergeable'
-```
-   If the result is `"CONFLICTING"`, the PR has a merge conflict — see "Resolving merge conflicts" below. If `"UNKNOWN"`, GitHub is still computing mergeability — wait and re-check next poll.
-
-3. Check for new/changed comments (all three sources):
-
-   **Inline threads** — re-run the GraphQL query from "Fetch comments". For each unresolved thread, record `{thread_id, last_comment_databaseId}` as your baseline. On each poll, action is needed if:
-   - A new thread `id` appears that wasn't in the baseline (new thread), OR
-   - An existing thread's `last_comment_databaseId` has changed (new reply on existing thread)
-
-   **Conversation comments:**
-   ```bash
-   gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments --paginate
-   ```
-   Compare total count and newest `id` against baseline. Filter to non-empty, non-bot, non-author-update messages.
-
-   **Top-level reviews:**
-   ```bash
-   gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews --paginate
-   ```
-   Watch for new non-empty reviews (`CHANGES_REQUESTED` or `COMMENTED` with body). Compare total count and newest `id` against baseline.
-
-4. **React in this precedence order (first match wins):**
-
-| What happened | Action |
-|---|---|
-| Merge conflict detected | See "Resolving merge conflicts" below. |
-| Mergeability is `UNKNOWN` | GitHub is still computing mergeability. Sleep 30 seconds, then restart polling from the top. |
-| New comments detected | Address them (fix → commit → push → reply). After pushing, re-fetch all comments to update your baseline, then restart this polling loop from the top (new commits invalidate CI status). |
-| CI failed (bucket == "fail") | Get failed check links: `gh pr checks {N} --repo Significant-Gravitas/AutoGPT --json bucket,link --jq '.[] \| select(.bucket == "fail") \| .link'`. Extract run ID from link (format: `.../actions/runs/<run-id>/job/...`), read logs with `gh run view <run-id> --repo Significant-Gravitas/AutoGPT --log-failed`. Fix → commit → push → restart polling. |
-| CI green + no new comments | **Do not exit immediately.** Bots (coderabbitai, sentry) often post reviews shortly after CI settles. Continue polling for **2 more cycles (60s)** after CI goes green. Only exit after 2 consecutive green+quiet polls. |
-| CI pending + no new comments | Sleep 30 seconds, then poll again. |
-
-**The loop ends when:** CI fully green + all comments addressed + **2 consecutive polls with no new comments after CI settled.**
-
-### Resolving merge conflicts
-
-1. Identify the PR's target branch and remote:
-```bash
-gh pr view {N} --repo Significant-Gravitas/AutoGPT --json baseRefName --jq '.baseRefName'
-git remote -v   # find the remote pointing to Significant-Gravitas/AutoGPT (typically 'upstream' in forks, 'origin' for direct contributors)
-```
-
-2. Pull the latest base branch with a 3-way merge:
-```bash
-git pull {base-remote} {base-branch} --no-rebase
-```
-
-3. Resolve conflicting files, then verify no conflict markers remain:
-```bash
-if grep -R -n -E '^(<<<<<<<|=======|>>>>>>>)' <conflicted-files>; then
-  echo "Unresolved conflict markers found — resolve before proceeding."
-  exit 1
-fi
-```
-
-4. Stage and push:
-```bash
-git add <conflicted-files>
-git commit -m "Resolve merge conflicts with {base-branch}"
-git push
-```
-
-5. Restart the polling loop from the top — new commits reset CI status.
--- a/.claude/skills/pr-review/SKILL.md
+++ b/.claude/skills/pr-review/SKILL.md
@@ -1,86 +0,0 @@
---
-name: pr-review
-description: Review a PR for correctness, security, code quality, and testing issues. TRIGGER when user asks to review a PR, check PR quality, or give feedback on a PR.
-user-invocable: true
-args: "[PR number or URL] — if omitted, finds PR for current branch."
-metadata:
-  author: autogpt-team
-  version: "1.0.0"
---
-
-# PR Review
-
-## Find the PR
-
-```bash
-gh pr list --head $(git branch --show-current) --repo Significant-Gravitas/AutoGPT
-gh pr view {N}
-```
-
-## Read the PR description
-
-Before reading code, understand the **why**, **what**, and **how** from the PR description:
-
-```bash
-gh pr view {N} --json body --jq '.body'
-```
-
-Every PR should have a Why / What / How structure. If any of these are missing, note it as feedback.
-
-## Read the diff
-
-```bash
-gh pr diff {N}
-```
-
-## Fetch existing review comments
-
-Before posting anything, fetch existing inline comments to avoid duplicates:
-
-```bash
-gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments --paginate
-gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews --paginate
-```
-
-## What to check
-
-**Description quality:** Does the PR description cover Why (motivation/problem), What (summary of changes), and How (approach/implementation details)? If any are missing, request them — you can't judge the approach without understanding the problem and intent.
-
-**Correctness:** logic errors, off-by-one, missing edge cases, race conditions (TOCTOU in file access, credit charging), error handling gaps, async correctness (missing `await`, unclosed resources).
-
-**Security:** input validation at boundaries, no injection (command, XSS, SQL), secrets not logged, file paths sanitized (`os.path.basename()` in error messages).
-
-**Code quality:** apply rules from backend/frontend CLAUDE.md files.
-
-**Architecture:** DRY, single responsibility, modular functions. `Security()` vs `Depends()` for FastAPI auth. `data:` for SSE events, `: comment` for heartbeats. `transaction=True` for Redis pipelines.
-
-**Testing:** edge cases covered, colocated `*_test.py` (backend) / `__tests__/` (frontend), mocks target where symbol is **used** not defined, `AsyncMock` for async.
-
-## Output format
-
-Every comment **must** be prefixed with `🤖` and a criticality badge:
-
-| Tier | Badge | Meaning |
-|---|---|---|
-| Blocker | `🔴 **Blocker**` | Must fix before merge |
-| Should Fix | `🟠 **Should Fix**` | Important improvement |
-| Nice to Have | `🟡 **Nice to Have**` | Minor suggestion |
-| Nit | `🔵 **Nit**` | Style / wording |
-
-Example: `🤖 🔴 **Blocker**: Missing error handling for X — suggest wrapping in try/except.`
-
-## Post inline comments
-
-For each finding, post an inline comment on the PR (do not just write a local report):
-
-```bash
-# Get the latest commit SHA for the PR
-COMMIT_SHA=$(gh api repos/Significant-Gravitas/AutoGPT/pulls/{N} --jq '.head.sha')
-
-# Post an inline comment on a specific file/line
-gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments \
-  -f body="🤖 🔴 **Blocker**: <description>" \
-  -f commit_id="$COMMIT_SHA" \
-  -f path="<file path>" \
-  -F line=<line number>
-```
--- a/.claude/skills/pr-test/SKILL.md
+++ b/.claude/skills/pr-test/SKILL.md
@@ -1,886 +0,0 @@
---
-name: pr-test
-description: "E2E manual testing of PRs/branches using docker compose, agent-browser, and API calls. TRIGGER when user asks to manually test a PR, test a feature end-to-end, or run integration tests against a running system."
-user-invocable: true
-argument-hint: "[worktree path or PR number] — tests the PR in the given worktree. Optional flags: --fix (auto-fix issues found)"
-metadata:
-  author: autogpt-team
-  version: "2.0.0"
---
-
-# Manual E2E Test
-
-Test a PR/branch end-to-end by building the full platform, interacting via browser and API, capturing screenshots, and reporting results.
-
-## Critical Requirements
-
-These are NON-NEGOTIABLE. Every test run MUST satisfy ALL of the following:
-
-### 1. Screenshots at Every Step
- Take a screenshot at EVERY significant test step — not just at the end
- Every test scenario MUST have at least one BEFORE and one AFTER screenshot
- Name screenshots sequentially: `{NN}-{action}-{state}.png` (e.g., `01-credits-before.png`, `02-credits-after.png`)
- If a screenshot is missing for a scenario, the test is INCOMPLETE — go back and take it
-
-### 2. Screenshots MUST Be Posted to PR
- Push ALL screenshots to a temp branch `test-screenshots/pr-{N}`
- Post a PR comment with ALL screenshots embedded inline using GitHub raw URLs
- This is NOT optional — every test run MUST end with a PR comment containing screenshots
- If screenshot upload fails, retry. If it still fails, list failed files and require manual drag-and-drop/paste attachment in the PR comment
-
-### 3. State Verification with Before/After Evidence
- For EVERY state-changing operation (API call, user action), capture the state BEFORE and AFTER
- Log the actual API response values (e.g., `credits_before=100, credits_after=95`)
- Screenshot MUST show the relevant UI state change
- Compare expected vs actual values explicitly — do not just eyeball it
-
-### 4. Negative Test Cases Are Mandatory
- Test at least ONE negative case per feature (e.g., insufficient credits, invalid input, unauthorized access)
- Verify error messages are user-friendly and accurate
- Verify the system state did NOT change after a rejected operation
-
-### 5. Test Report Must Include Full Evidence
-Each test scenario in the report MUST have:
- **Steps**: What was done (exact commands or UI actions)
- **Expected**: What should happen
- **Actual**: What actually happened
- **API Evidence**: Before/after API response values for state-changing operations
- **Screenshot Evidence**: Before/after screenshots with explanations
-
-## State Manipulation for Realistic Testing
-
-When testing features that depend on specific states (rate limits, credits, quotas):
-
-1. **Use Redis CLI to set counters directly:**
-   ```bash
-   # Find the Redis container
-   REDIS_CONTAINER=$(docker ps --format '{{.Names}}' | grep redis | head -1)
-   # Set a key with expiry
-   docker exec $REDIS_CONTAINER redis-cli SET key value EX ttl
-   # Example: Set rate limit counter to near-limit
-   docker exec $REDIS_CONTAINER redis-cli SET "rate_limit:user:test@test.com" 99 EX 3600
-   # Example: Check current value
-   docker exec $REDIS_CONTAINER redis-cli GET "rate_limit:user:test@test.com"
-   ```
-
-2. **Use API calls to check before/after state:**
-   ```bash
-   # BEFORE: Record current state
-   BEFORE=$(curl -s -H "Authorization: Bearer $TOKEN" http://localhost:8006/api/credits | jq '.credits')
-   echo "Credits BEFORE: $BEFORE"
-
-   # Perform the action...
-
-   # AFTER: Record new state and compare
-   AFTER=$(curl -s -H "Authorization: Bearer $TOKEN" http://localhost:8006/api/credits | jq '.credits')
-   echo "Credits AFTER: $AFTER"
-   echo "Delta: $(( BEFORE - AFTER ))"
-   ```
-
-3. **Take screenshots BEFORE and AFTER state changes** — the UI must reflect the backend state change
-
-4. **Never rely on mocked/injected browser state** — always use real backend state. Do NOT use `agent-browser eval` to fake UI state. The backend must be the source of truth.
-
-5. **Use direct DB queries when needed:**
-   ```bash
-   # Query via Supabase's PostgREST or docker exec into the DB
-   docker exec supabase-db psql -U supabase_admin -d postgres -c "SELECT credits FROM user_credits WHERE user_id = '...';"
-   ```
-
-6. **After every API test, verify the state change actually persisted:**
-   ```bash
-   # Example: After a credits purchase, verify DB matches API
-   API_CREDITS=$(curl -s -H "Authorization: Bearer $TOKEN" http://localhost:8006/api/credits | jq '.credits')
-   DB_CREDITS=$(docker exec supabase-db psql -U supabase_admin -d postgres -t -c "SELECT credits FROM user_credits WHERE user_id = '...';" | tr -d ' ')
-   [ "$API_CREDITS" = "$DB_CREDITS" ] && echo "CONSISTENT" || echo "MISMATCH: API=$API_CREDITS DB=$DB_CREDITS"
-   ```
-
-## Arguments
-
- `$ARGUMENTS` — worktree path (e.g. `$REPO_ROOT`) or PR number
- If `--fix` flag is present, auto-fix bugs found and push fixes (like pr-address loop)
-
-## Step 0: Resolve the target
-
-```bash
-# If argument is a PR number, find its worktree
-gh pr view {N} --json headRefName --jq '.headRefName'
-# If argument is a path, use it directly
-```
-
-Determine:
- `REPO_ROOT` — the root repo directory: `git -C "$WORKTREE_PATH" worktree list | head -1 | awk '{print $1}'` (or `git rev-parse --show-toplevel` if not a worktree)
- `WORKTREE_PATH` — the worktree directory
- `PLATFORM_DIR` — `$WORKTREE_PATH/autogpt_platform`
- `BACKEND_DIR` — `$PLATFORM_DIR/backend`
- `FRONTEND_DIR` — `$PLATFORM_DIR/frontend`
- `PR_NUMBER` — the PR number (from `gh pr list --head $(git branch --show-current)`)
- `PR_TITLE` — the PR title, slugified (e.g. "Add copilot permissions" → "add-copilot-permissions")
- `RESULTS_DIR` — `$REPO_ROOT/test-results/PR-{PR_NUMBER}-{slugified-title}`
-
-Create the results directory:
-```bash
-PR_NUMBER=$(cd $WORKTREE_PATH && gh pr list --head $(git branch --show-current) --repo Significant-Gravitas/AutoGPT --json number --jq '.[0].number')
-PR_TITLE=$(cd $WORKTREE_PATH && gh pr list --head $(git branch --show-current) --repo Significant-Gravitas/AutoGPT --json title --jq '.[0].title' | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9]/-/g' | sed 's/--*/-/g' | sed 's/^-//;s/-$//' | head -c 50)
-RESULTS_DIR="$REPO_ROOT/test-results/PR-${PR_NUMBER}-${PR_TITLE}"
-mkdir -p $RESULTS_DIR
-```
-
-**Test user credentials** (for logging into the UI or verifying results manually):
- Email: `test@test.com`
- Password: `testtest123`
-
-## Step 1: Understand the PR
-
-Before testing, understand what changed:
-
-```bash
-cd $WORKTREE_PATH
-
-# Read PR description to understand the WHY
-gh pr view {N} --json body --jq '.body'
-
-git log --oneline dev..HEAD | head -20
-git diff dev --stat
-```
-
-Read the PR description (Why / What / How) and changed files to understand:
-0. **Why** does this PR exist? What problem does it solve?
-1. **What** feature/fix does this PR implement?
-2. **How** does it work? What's the approach?
-3. What components are affected? (backend, frontend, copilot, executor, etc.)
-4. What are the key user-facing behaviors to test?
-
-## Step 2: Write test scenarios
-
-Based on the PR analysis, write a test plan to `$RESULTS_DIR/test-plan.md`:
-
-```markdown
-# Test Plan: PR #{N} — {title}
-
-## Scenarios
-1. [Scenario name] — [what to verify]
-2. ...
-
-## API Tests (if applicable)
-1. [Endpoint] — [expected behavior]
-   - Before state: [what to check before]
-   - After state: [what to verify changed]
-
-## UI Tests (if applicable)
-1. [Page/component] — [interaction to test]
-   - Screenshot before: [what to capture]
-   - Screenshot after: [what to capture]
-
-## Negative Tests (REQUIRED — at least one per feature)
-1. [What should NOT happen] — [how to trigger it]
-   - Expected error: [what error message/code]
-   - State unchanged: [what to verify did NOT change]
-```
-
-**Be critical** — include edge cases, error paths, and security checks. Every scenario MUST specify what screenshots to take and what state to verify.
-
-## Step 3: Environment setup
-
-### 3a. Copy .env files from the root worktree
-
-The root worktree (`$REPO_ROOT`) has the canonical `.env` files with all API keys. Copy them to the target worktree:
-
-```bash
-# CRITICAL: .env files are NOT checked into git. They must be copied manually.
-cp $REPO_ROOT/autogpt_platform/.env $PLATFORM_DIR/.env
-cp $REPO_ROOT/autogpt_platform/backend/.env $BACKEND_DIR/.env
-cp $REPO_ROOT/autogpt_platform/frontend/.env $FRONTEND_DIR/.env
-```
-
-### 3b. Configure copilot authentication
-
-The copilot needs an LLM API to function. Two approaches (try subscription first):
-
-#### Option 1: Subscription mode (preferred — uses your Claude Max/Pro subscription)
-
-The `claude_agent_sdk` Python package **bundles its own Claude CLI binary** — no need to install `@anthropic-ai/claude-code` via npm. The backend auto-provisions credentials from environment variables on startup.
-
-Run the helper script to extract tokens from your host and auto-update `backend/.env` (works on macOS, Linux, and Windows/WSL):
-
-```bash
-# Extracts OAuth tokens and writes CLAUDE_CODE_OAUTH_TOKEN + CLAUDE_CODE_REFRESH_TOKEN into .env
-bash $BACKEND_DIR/scripts/refresh_claude_token.sh --env-file $BACKEND_DIR/.env
-```
-
-**How it works:** The script reads the OAuth token from:
- **macOS**: system keychain (`"Claude Code-credentials"`)
- **Linux/WSL**: `~/.claude/.credentials.json`
- **Windows**: `%APPDATA%/claude/.credentials.json`
-
-It sets `CLAUDE_CODE_OAUTH_TOKEN`, `CLAUDE_CODE_REFRESH_TOKEN`, and `CHAT_USE_CLAUDE_CODE_SUBSCRIPTION=true` in the `.env` file. On container startup, the backend auto-provisions `~/.claude/.credentials.json` inside the container from these env vars. The SDK's bundled CLI then authenticates using that file. No `claude login`, no npm install needed.
-
-**Note:** The OAuth token expires (~24h). If copilot returns auth errors, re-run the script and restart: `$BACKEND_DIR/scripts/refresh_claude_token.sh --env-file $BACKEND_DIR/.env && docker compose up -d copilot_executor`
-
-#### Option 2: OpenRouter API key mode (fallback)
-
-If subscription mode doesn't work, switch to API key mode using OpenRouter:
-
-```bash
-# In $BACKEND_DIR/.env, ensure these are set:
-CHAT_USE_CLAUDE_CODE_SUBSCRIPTION=false
-CHAT_API_KEY=<value of OPEN_ROUTER_API_KEY from the same .env>
-CHAT_BASE_URL=https://openrouter.ai/api/v1
-CHAT_USE_CLAUDE_AGENT_SDK=true
-```
-
-Use `perl` to update these values:
-```bash
-ORKEY=$(grep "^OPEN_ROUTER_API_KEY=" $BACKEND_DIR/.env | cut -d= -f2-)
-[ -n "$ORKEY" ] || { echo "ERROR: OPEN_ROUTER_API_KEY is missing in $BACKEND_DIR/.env"; exit 1; }
-perl -i -pe 's/CHAT_USE_CLAUDE_CODE_SUBSCRIPTION=true/CHAT_USE_CLAUDE_CODE_SUBSCRIPTION=false/' $BACKEND_DIR/.env
-# Add or update CHAT_API_KEY and CHAT_BASE_URL
-grep -q "^CHAT_API_KEY=" $BACKEND_DIR/.env && perl -i -pe "s|^CHAT_API_KEY=.*|CHAT_API_KEY=$ORKEY|" $BACKEND_DIR/.env || echo "CHAT_API_KEY=$ORKEY" >> $BACKEND_DIR/.env
-grep -q "^CHAT_BASE_URL=" $BACKEND_DIR/.env && perl -i -pe 's|^CHAT_BASE_URL=.*|CHAT_BASE_URL=https://openrouter.ai/api/v1|' $BACKEND_DIR/.env || echo "CHAT_BASE_URL=https://openrouter.ai/api/v1" >> $BACKEND_DIR/.env
-```
-
-### 3c. Stop conflicting containers
-
-```bash
-# Stop any running app containers (keep infra: supabase, redis, rabbitmq, clamav)
-docker ps --format "{{.Names}}" | grep -E "rest_server|executor|copilot|websocket|database_manager|scheduler|notification|frontend|migrate" | while read name; do
-  docker stop "$name" 2>/dev/null
-done
-```
-
-### 3e. Build and start
-
-```bash
-cd $PLATFORM_DIR && docker compose build --no-cache 2>&1 | tail -20
-if [ ${PIPESTATUS[0]} -ne 0 ]; then echo "ERROR: Docker build failed"; exit 1; fi
-
-cd $PLATFORM_DIR && docker compose up -d 2>&1 | tail -20
-if [ ${PIPESTATUS[0]} -ne 0 ]; then echo "ERROR: Docker compose up failed"; exit 1; fi
-```
-
-**Note:** The build above passes `--no-cache` on purpose. Docker BuildKit may sometimes reuse cached `COPY` layers from a previous build on a different branch, which leaves the container running old code (e.g. missing PR changes). A full rebuild avoids that.
-
-**Expected time:** 5-10 minutes with `--no-cache`, 3-8 minutes for a cached build.
-
-### 3f. Wait for services to be ready
-
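-```bash
-# Poll until backend and frontend respond (up to 60 attempts x 5s = 5 minutes)
-READY=false
-for i in $(seq 1 60); do
-  BACKEND=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:8006/docs 2>/dev/null)
-  FRONTEND=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:3000 2>/dev/null)
-  if [ "$BACKEND" = "200" ] && [ "$FRONTEND" = "200" ]; then
-    echo "Services ready"
-    READY=true
-    break
-  fi
-  sleep 5
-done
-# Fail loudly instead of continuing against services that never came up
-[ "$READY" = true ] || { echo "ERROR: services not ready after 5 minutes"; exit 1; }
-```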
-
-
-### 3h. Create test user and get auth token
-
-```bash
-ANON_KEY=$(grep "NEXT_PUBLIC_SUPABASE_ANON_KEY=" $FRONTEND_DIR/.env | sed 's/.*NEXT_PUBLIC_SUPABASE_ANON_KEY=//' | tr -d '[:space:]')
-
-# Signup (idempotent — returns "User already registered" if exists)
-RESULT=$(curl -s -X POST 'http://localhost:8000/auth/v1/signup' \
-  -H "apikey: $ANON_KEY" \
-  -H 'Content-Type: application/json' \
-  -d '{"email":"test@test.com","password":"testtest123"}')
-
-# If "Database error finding user", restart supabase-auth and retry
-if echo "$RESULT" | grep -q "Database error"; then
-  docker restart supabase-auth && sleep 5
-  curl -s -X POST 'http://localhost:8000/auth/v1/signup' \
-    -H "apikey: $ANON_KEY" \
-    -H 'Content-Type: application/json' \
-    -d '{"email":"test@test.com","password":"testtest123"}'
-fi
-
-# Get auth token
-TOKEN=$(curl -s -X POST 'http://localhost:8000/auth/v1/token?grant_type=password' \
-  -H "apikey: $ANON_KEY" \
-  -H 'Content-Type: application/json' \
-  -d '{"email":"test@test.com","password":"testtest123"}' | jq -r '.access_token // ""')
-```
-
-**Use this token for ALL API calls:**
-```bash
-curl -H "Authorization: Bearer $TOKEN" http://localhost:8006/api/...
-```
-
-## Step 4: Run tests
-
-### Service ports reference
-
-| Service | Port | URL |
-|---------|------|-----|
-| Frontend | 3000 | http://localhost:3000 |
-| Backend REST | 8006 | http://localhost:8006 |
-| Supabase Auth (via Kong) | 8000 | http://localhost:8000 |
-| Executor | 8002 | http://localhost:8002 |
-| Copilot Executor | 8008 | http://localhost:8008 |
-| WebSocket | 8001 | http://localhost:8001 |
-| Database Manager | 8005 | http://localhost:8005 |
-| Redis | 6379 | localhost:6379 |
-| RabbitMQ | 5672 | localhost:5672 |
-
-### API testing
-
-Use `curl` with the auth token for backend API tests. **For EVERY API call that changes state, record before/after values:**
-
-```bash
-# Example: List agents
-curl -s -H "Authorization: Bearer $TOKEN" http://localhost:8006/api/graphs | jq . | head -20
-
-# Example: Create an agent
-curl -s -X POST http://localhost:8006/api/graphs \
-  -H "Authorization: Bearer $TOKEN" \
-  -H 'Content-Type: application/json' \
-  -d '{...}' | jq .
-
-# Example: Run an agent
-curl -s -X POST "http://localhost:8006/api/graphs/{graph_id}/execute" \
-  -H "Authorization: Bearer $TOKEN" \
-  -H 'Content-Type: application/json' \
-  -d '{"data": {...}}'
-
-# Example: Get execution results
-curl -s -H "Authorization: Bearer $TOKEN" \
-  "http://localhost:8006/api/graphs/{graph_id}/executions/{exec_id}" | jq .
-```
-
-**State verification pattern (use for EVERY state-changing API call):**
-```bash
-# 1. Record BEFORE state
-BEFORE_STATE=$(curl -s -H "Authorization: Bearer $TOKEN" http://localhost:8006/api/{resource} | jq '{relevant_fields}')
-echo "BEFORE: $BEFORE_STATE"
-
-# 2. Perform the action
-ACTION_RESULT=$(curl -s -X POST ... | jq .)
-echo "ACTION RESULT: $ACTION_RESULT"
-
-# 3. Record AFTER state
-AFTER_STATE=$(curl -s -H "Authorization: Bearer $TOKEN" http://localhost:8006/api/{resource} | jq '{relevant_fields}')
-echo "AFTER: $AFTER_STATE"
-
-# 4. Log the comparison
-echo "=== STATE CHANGE VERIFICATION ==="
-echo "Before: $BEFORE_STATE"
-echo "After: $AFTER_STATE"
-echo "Expected change: {describe what should have changed}"
-```
-
-### Browser testing with agent-browser
-
-```bash
-# Close any existing session
-agent-browser close 2>/dev/null || true
-
-# Use --session-name to persist cookies across navigations
-# This means login only needs to happen once per test session
-agent-browser --session-name pr-test open 'http://localhost:3000/login' --timeout 15000
-
-# Get interactive elements
-agent-browser --session-name pr-test snapshot | grep "textbox\|button"
-
-# Login
-agent-browser --session-name pr-test fill {email_ref} "test@test.com"
-agent-browser --session-name pr-test fill {password_ref} "testtest123"
-agent-browser --session-name pr-test click {login_button_ref}
-sleep 5
-
-# Dismiss cookie banner if present
-agent-browser --session-name pr-test click 'text=Accept All' 2>/dev/null || true
-
-# Navigate — cookies are preserved so login persists
-agent-browser --session-name pr-test open 'http://localhost:3000/copilot' --timeout 10000
-
-# Take screenshot
-agent-browser --session-name pr-test screenshot $RESULTS_DIR/01-page.png
-
-# Interact with elements
-agent-browser --session-name pr-test fill {ref} "text"
-agent-browser --session-name pr-test press "Enter"
-agent-browser --session-name pr-test click {ref}
-agent-browser --session-name pr-test click 'text=Button Text'
-
-# Read page content
-agent-browser --session-name pr-test snapshot | grep "text:"
-```
-
-**Key pages:**
-- `/copilot` — CoPilot chat (for testing copilot features)
-- `/build` — Agent builder (for testing block/node features)
-- `/build?flowID={id}` — Specific agent in builder
-- `/library` — Agent library (for testing listing/import features)
-- `/library/agents/{id}` — Agent detail with run history
-- `/marketplace` — Marketplace
-
-### Checking logs
-
-```bash
-# Backend REST server
-docker logs autogpt_platform-rest_server-1 2>&1 | tail -30
-
-# Executor (runs agent graphs)
-docker logs autogpt_platform-executor-1 2>&1 | tail -30
-
-# Copilot executor (runs copilot chat sessions)
-docker logs autogpt_platform-copilot_executor-1 2>&1 | tail -30
-
-# Frontend
-docker logs autogpt_platform-frontend-1 2>&1 | tail -30
-
-# Filter for errors
-docker logs autogpt_platform-executor-1 2>&1 | grep -i "error\|exception\|traceback" | tail -20
-```
-
-### Copilot chat testing
-
-The copilot uses SSE streaming. To test via API:
-
-```bash
-# Create a session
-SESSION_ID=$(curl -s -X POST 'http://localhost:8006/api/chat/sessions' \
-  -H "Authorization: Bearer $TOKEN" \
-  -H 'Content-Type: application/json' \
-  -d '{}' | jq -r '.id // .session_id // ""')
-
-# Stream a message (SSE - will stream chunks)
-curl -N -X POST "http://localhost:8006/api/chat/sessions/$SESSION_ID/stream" \
-  -H "Authorization: Bearer $TOKEN" \
-  -H 'Content-Type: application/json' \
-  -d '{"message": "Hello, what can you help me with?"}' \
-  --max-time 60 2>/dev/null | head -50
-```
-
-Or test via browser (preferred for UI verification):
-```bash
-agent-browser --session-name pr-test open 'http://localhost:3000/copilot' --timeout 10000
-# ... fill chat input and press Enter, wait 20-30s for response
-```
-
-## Step 5: Record results and take screenshots
-
-**Take a screenshot at EVERY significant test step** — before and after interactions, on success, and on failure. This is NON-NEGOTIABLE.
-
-**Required screenshot pattern for each test scenario:**
-```bash
-# BEFORE the action
-agent-browser --session-name pr-test screenshot $RESULTS_DIR/{NN}-{scenario}-before.png
-
-# Perform the action...
-
-# AFTER the action
-agent-browser --session-name pr-test screenshot $RESULTS_DIR/{NN}-{scenario}-after.png
-```
-
-**Naming convention:**
-```bash
-# Examples:
-# $RESULTS_DIR/01-login-page-before.png
-# $RESULTS_DIR/02-login-page-after.png
-# $RESULTS_DIR/03-credits-page-before.png
-# $RESULTS_DIR/04-credits-purchase-after.png
-# $RESULTS_DIR/05-negative-insufficient-credits.png
-# $RESULTS_DIR/06-error-state.png
-```
-
-**Minimum requirements:**
-- At least TWO screenshots per test scenario (before + after)
-- At least ONE screenshot for each negative test case showing the error state
-- If a test fails, screenshot the failure state AND any error logs visible in the UI
-
-## Step 6: Show results to user with screenshots
-
-**CRITICAL: After all tests complete, you MUST show every screenshot to the user using the Read tool, with an explanation of what each screenshot shows.** This is the most important part of the test report — the user needs to visually verify the results.
-
-For each screenshot:
-1. Use the `Read` tool to display the PNG file (Claude can read images)
-2. Write a 1-2 sentence explanation below it describing:
-   - What page/state is being shown
-   - What the screenshot proves (which test scenario it validates)
-   - Any notable details visible in the UI
-
-Format the output like this:
-
-```markdown
-### Screenshot 1: {descriptive title}
-[Read the PNG file here]
-
-**What it shows:** {1-2 sentence explanation of what this screenshot proves}
-
----
-```
-
-After showing all screenshots, output a **detailed** summary table:
-
-| # | Scenario | Result | API Evidence | Screenshot Evidence |
-|---|----------|--------|-------------|-------------------|
-| 1 | {name} | PASS/FAIL | Before: X, After: Y | 01-before.png, 02-after.png |
-| 2 | ... | ... | ... | ... |
-
-**IMPORTANT:** As you show each screenshot and record test results, persist them in shell variables for Step 7:
-
-```bash
-# Build these variables during Step 6 — they are required by Step 7's script
-# NOTE: declare -A requires Bash 4.0+. Most Linux distributions ship Bash 5.x, but the
-# /bin/bash bundled with macOS is 3.2, so install a newer Bash there (e.g. via Homebrew).
-# If running on Bash <4, use a plain variable with a lookup function instead.
-declare -A SCREENSHOT_EXPLANATIONS=(
-  # Each explanation MUST answer three things:
-  #   1. FLOW: Which test scenario / user journey is this part of?
-  #   2. STEPS: What exact actions were taken to reach this state?
-  #   3. EVIDENCE: What does this screenshot prove (pass/fail/data)?
-  #
-  # Good example:
-  #   ["03-cost-log-after-run.png"]="Flow: LLM block cost tracking. Steps: Logged in as tester@gmail.com → ran 'Cost Test Agent' → waited for COMPLETED status. Evidence: PlatformCostLog table shows 1 new row with cost_microdollars=1234 and correct user_id."
-  #
-  # Bad example (too vague — never do this):
-  #   ["03-cost-log.png"]="Shows the cost log table."
-  ["01-login-page.png"]="Flow: Login flow. Steps: Opened /login. Evidence: Login page renders with email/password fields and SSO options visible."
-  ["02-builder-with-block.png"]="Flow: Block execution. Steps: Logged in → /build → added LLM block. Evidence: Builder canvas shows block connected to trigger, ready to run."
-  # ... one entry per screenshot using the flow/steps/evidence format above
-)
-
-TEST_RESULTS_TABLE="| 1 | Login flow | PASS | N/A | 01-login-before.png, 02-login-after.png |
-| 2 | Credits purchase | PASS | Before: 100, After: 95 | 03-credits-before.png, 04-credits-after.png |
-| 3 | Insufficient credits (negative) | PASS | Credits: 0, rejected | 05-insufficient-credits-error.png |"
-# ... one row per test scenario with actual results
-```
-
-## Step 7: Post test report as PR comment with screenshots
-
-Upload screenshots to the PR using the GitHub Git API (no local git operations — safe for worktrees), then post a comment with inline images and per-screenshot explanations.
-
-**This step is MANDATORY. Every test run MUST post a PR comment with screenshots. No exceptions.**
-
-> **CRITICAL — NEVER post a bare directory link like `https://github.com/.../tree/...`.**
-> Every screenshot MUST appear as `![name](raw_url)` inline in the PR comment so reviewers can see them without clicking any links. After posting, the verification step below greps the comment for `![` tags and exits 1 if none are found — the test run is considered incomplete until this passes.
-
-```bash
-# Upload screenshots via GitHub Git API (creates blobs, tree, commit, and ref remotely)
-REPO="Significant-Gravitas/AutoGPT"
-SCREENSHOTS_BRANCH="test-screenshots/pr-${PR_NUMBER}"
-SCREENSHOTS_DIR="test-screenshots/PR-${PR_NUMBER}"
-
-# Step 1: Create blobs for each screenshot and build tree JSON
-# Retry each blob upload up to 3 times. If still failing, list them at end of report.
-shopt -s nullglob
-SCREENSHOT_FILES=("$RESULTS_DIR"/*.png)
-if [ ${#SCREENSHOT_FILES[@]} -eq 0 ]; then
-  echo "ERROR: No screenshots found in $RESULTS_DIR. Test run is incomplete."
-  exit 1
-fi
-TREE_JSON='['
-FIRST=true
-FAILED_UPLOADS=()
-for img in "${SCREENSHOT_FILES[@]}"; do
-  BASENAME=$(basename "$img")
-  B64=$(base64 < "$img")
-  BLOB_SHA=""
-  for attempt in 1 2 3; do
-    BLOB_SHA=$(gh api "repos/${REPO}/git/blobs" -f content="$B64" -f encoding="base64" --jq '.sha' 2>/dev/null || true)
-    [ -n "$BLOB_SHA" ] && break
-    sleep 1
-  done
-  if [ -z "$BLOB_SHA" ]; then
-    FAILED_UPLOADS+=("$img")
-    continue
-  fi
-  if [ "$FIRST" = true ]; then FIRST=false; else TREE_JSON+=','; fi
-  TREE_JSON+="{\"path\":\"${SCREENSHOTS_DIR}/${BASENAME}\",\"mode\":\"100644\",\"type\":\"blob\",\"sha\":\"${BLOB_SHA}\"}"
-done
-TREE_JSON+=']'
-
-# Step 2: Create tree, commit (with parent), and branch ref
-TREE_SHA=$(echo "$TREE_JSON" | jq -c '{tree: .}' | gh api "repos/${REPO}/git/trees" --input - --jq '.sha')
-
-# Resolve existing branch tip as parent (avoids orphan commits on repeat runs)
-PARENT_SHA=$(gh api "repos/${REPO}/git/refs/heads/${SCREENSHOTS_BRANCH}" --jq '.object.sha' 2>/dev/null || true)
-if [ -n "$PARENT_SHA" ]; then
-  COMMIT_SHA=$(gh api "repos/${REPO}/git/commits" \
-    -f message="test: add E2E test screenshots for PR #${PR_NUMBER}" \
-    -f tree="$TREE_SHA" \
-    -f "parents[]=$PARENT_SHA" \
-    --jq '.sha')
-else
-  # First commit on this branch — no parent
-  COMMIT_SHA=$(gh api "repos/${REPO}/git/commits" \
-    -f message="test: add E2E test screenshots for PR #${PR_NUMBER}" \
-    -f tree="$TREE_SHA" \
-    --jq '.sha')
-fi
-
-gh api "repos/${REPO}/git/refs" \
-  -f ref="refs/heads/${SCREENSHOTS_BRANCH}" \
-  -f sha="$COMMIT_SHA" 2>/dev/null \
-  || gh api "repos/${REPO}/git/refs/heads/${SCREENSHOTS_BRANCH}" \
-    -X PATCH -f sha="$COMMIT_SHA" -f force=true
-```
-
-Then post the comment with **inline images AND explanations for each screenshot**:
-
-```bash
-REPO_URL="https://raw.githubusercontent.com/${REPO}/${SCREENSHOTS_BRANCH}"
-
-# Build image markdown using uploaded image URLs; skip FAILED_UPLOADS (listed separately)
-
-IMAGE_MARKDOWN=""
-for img in "${SCREENSHOT_FILES[@]}"; do
-  BASENAME=$(basename "$img")
-  TITLE=$(echo "${BASENAME%.png}" | sed 's/^[0-9]*-//' | sed 's/-/ /g' | awk '{for(i=1;i<=NF;i++) $i=toupper(substr($i,1,1)) tolower(substr($i,2))}1')
-  # Skip images that failed to upload — they will be listed at the end
-  IS_FAILED=false
-  for failed in "${FAILED_UPLOADS[@]}"; do
-    [ "$(basename "$failed")" = "$BASENAME" ] && IS_FAILED=true && break
-  done
-  if [ "$IS_FAILED" = true ]; then
-    continue
-  fi
-  EXPLANATION="${SCREENSHOT_EXPLANATIONS[$BASENAME]}"
-  if [ -z "$EXPLANATION" ]; then
-    echo "ERROR: Missing screenshot explanation for $BASENAME. Add it to SCREENSHOT_EXPLANATIONS in Step 6."
-    exit 1
-  fi
-  IMAGE_MARKDOWN="${IMAGE_MARKDOWN}
-### ${TITLE}
-![${BASENAME}](${REPO_URL}/${SCREENSHOTS_DIR}/${BASENAME})
-${EXPLANATION}
-"
-done
-
-# Write comment body to file to avoid shell interpretation issues with special characters
-COMMENT_FILE=$(mktemp)
-# If any uploads failed, append a section listing them with instructions
-FAILED_SECTION=""
-if [ ${#FAILED_UPLOADS[@]} -gt 0 ]; then
-  FAILED_SECTION="
-## ⚠️ Failed Screenshot Uploads
-The following screenshots could not be uploaded via the GitHub API after 3 retries.
-**To add them:** drag-and-drop or paste these files into a PR comment manually:
-"
-  for failed in "${FAILED_UPLOADS[@]}"; do
-    FAILED_SECTION="${FAILED_SECTION}
-- \`$(basename "$failed")\` (local path: \`$failed\`)"
-  done
-  FAILED_SECTION="${FAILED_SECTION}
-
-**Run status:** INCOMPLETE until the files above are manually attached and visible inline in the PR."
-fi
-
-cat > "$COMMENT_FILE" <<INNEREOF
-## E2E Test Report
-
-| # | Scenario | Result | API Evidence | Screenshot Evidence |
-|---|----------|--------|-------------|-------------------|
-${TEST_RESULTS_TABLE}
-
-${IMAGE_MARKDOWN}
-${FAILED_SECTION}
-INNEREOF
-
-POSTED_BODY=$(gh api "repos/${REPO}/issues/$PR_NUMBER/comments" -F body=@"$COMMENT_FILE" --jq '.body')
-rm -f "$COMMENT_FILE"
-```
-
-**The PR comment MUST include:**
-1. A summary table of all scenarios with PASS/FAIL and before/after API evidence
-2. Every successfully uploaded screenshot rendered inline; any failed uploads listed with manual attachment instructions
-3. A structured explanation below each screenshot covering: **Flow** (which scenario), **Steps** (exact actions taken to reach this state), **Evidence** (what this proves — pass/fail/data values). A bare "shows the page" caption is not acceptable.
-
-This approach uses the GitHub Git API to create blobs, trees, commits, and refs entirely server-side. No local `git checkout` or `git push` — safe for worktrees and won't interfere with the PR branch.
-
-**Verify inline rendering after posting — this is required, not optional:**
-
-```bash
-# 1. Confirm the posted comment body contains inline image markdown syntax
-if ! echo "$POSTED_BODY" | grep -q '!\['; then
-  echo "❌ FAIL: No inline image tags in posted comment body. Re-check IMAGE_MARKDOWN and re-post."
-  exit 1
-fi
-
-# 2. Verify at least one raw URL actually resolves (catches wrong branch name, wrong path, etc.)
-FIRST_IMG_URL=$(echo "$POSTED_BODY" | grep -o 'https://raw.githubusercontent.com[^)]*' | head -1)
-if [ -n "$FIRST_IMG_URL" ]; then
-  HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" --max-time 10 "$FIRST_IMG_URL")
-  if [ "$HTTP_STATUS" = "200" ]; then
-    echo "✅ Inline images confirmed and raw URL resolves (HTTP 200)"
-  else
-    echo "❌ FAIL: Raw image URL returned HTTP $HTTP_STATUS — images will not render inline."
-    echo "   URL: $FIRST_IMG_URL"
-    echo "   Check branch name, path, and that the branch ref was created or updated."
-    exit 1
-  fi
-else
-  echo "⚠️  Could not extract a raw URL from the comment — verify manually."
-fi
-```
-
-## Step 8: Evaluate test completeness and post a GitHub review
-
-After posting the PR comment, evaluate whether the test run actually covered everything it needed to. This is NOT a rubber-stamp — be critical. Then post a formal GitHub review so the PR author and reviewers can see the verdict.
-
-### 8a. Evaluate against the test plan
-
-Re-read `$RESULTS_DIR/test-plan.md` (written in Step 2) and `$RESULTS_DIR/test-report.md` (written in Step 5).
-
-> **Note:** If `test-report.md` doesn't exist, write it before proceeding (see the Step 5 template). Do not skip evaluation because the file is missing; create it from your notes instead.
-
-For each scenario in the plan, answer:
-
-| Question | Pass criteria |
-|----------|--------------|
-| Was it tested? | Explicit steps were executed, not just described |
-| Is there screenshot evidence? | At least one before/after screenshot pair per scenario |
-| Did the core feature work correctly? | Expected state matches actual state |
-| Were negative cases tested? | At least one failure/rejection case per feature |
-| Was DB/API state verified (not just UI)? | Raw API response or DB query confirms state change |
-
-Build a verdict:
-- **APPROVE** — every scenario tested, evidence present, no bugs found or all bugs are minor/known
-- **REQUEST_CHANGES** — one or more: untested scenarios, missing evidence, bugs found, data not verified
-
-### 8b. Post the GitHub review
-
-```bash
-EVAL_FILE=$(mktemp)
-
-# === STEP A: Write header ===
-cat > "$EVAL_FILE" << 'ENDEVAL'
-## 🧪 Test Evaluation
-
-### Coverage checklist
-ENDEVAL
-
-# === STEP B: Append ONE line per scenario — do this BEFORE calculating verdict ===
-# Format: "- ✅ **Scenario N – name**: <what was done and verified>"
-#      or "- ❌ **Scenario N – name**: <what is missing or broken>"
-# Examples:
-#   echo "- ✅ **Scenario 1 – Login flow**: tested, screenshot evidence present, auth token verified via API" >> "$EVAL_FILE"
-#   echo "- ❌ **Scenario 3 – Cost logging**: NOT verified in DB — UI showed entry but raw SQL query was skipped" >> "$EVAL_FILE"
-#
-# !!! IMPORTANT: append ALL scenario lines here before proceeding to STEP C !!!
-
-# === STEP C: Derive verdict from the checklist — runs AFTER all lines are appended ===
-FAIL_COUNT=$(grep -c "^- ❌" "$EVAL_FILE" || true)
-if [ "$FAIL_COUNT" -eq 0 ]; then
-  VERDICT="APPROVE"
-else
-  VERDICT="REQUEST_CHANGES"
-fi
-
-# === STEP D: Append verdict section ===
-cat >> "$EVAL_FILE" << ENDVERDICT
-
-### Verdict
-ENDVERDICT
-
-if [ "$VERDICT" = "APPROVE" ]; then
-  echo "✅ All scenarios covered with evidence. No blocking issues found." >> "$EVAL_FILE"
-else
-  echo "❌ $FAIL_COUNT scenario(s) incomplete or have confirmed bugs. See ❌ items above." >> "$EVAL_FILE"
-  echo "" >> "$EVAL_FILE"
-  echo "**Required before merge:** address each ❌ item above." >> "$EVAL_FILE"
-fi
-
-# === STEP E: Post the review ===
-gh api "repos/${REPO}/pulls/$PR_NUMBER/reviews" \
-  --method POST \
-  -f body="$(cat "$EVAL_FILE")" \
-  -f event="$VERDICT"
-
-rm -f "$EVAL_FILE"
-```
-
-**Rules:**
-- Never auto-approve without checking every scenario in the test plan
-- `REQUEST_CHANGES` if ANY scenario is untested, lacks DB/API evidence, or has a confirmed bug
-- The evaluation body must list every scenario explicitly (✅ or ❌) — not just the failures
-- If you find new bugs during evaluation, add them to the request-changes body and (if `--fix` flag is set) fix them before posting
-
-## Fix mode (--fix flag)
-
-When `--fix` is present, the standard is HIGHER. Do not just note issues — FIX them immediately.
-
-### Fix protocol for EVERY issue found (including UX issues):
-
-1. **Identify** the root cause in the code — read the relevant source files
-2. **Write a failing test first** (TDD): For backend bugs, write a test decorated with `@pytest.mark.xfail(reason="...")`. For frontend/Playwright bugs, write a test with the `.fixme` annotation. Run it to confirm it fails as expected (Playwright skips `fixme` tests, so run the test once without the annotation to see the failure).
-3. **Screenshot** the broken state: `agent-browser --session-name pr-test screenshot $RESULTS_DIR/{NN}-broken-{description}.png`
-4. **Fix** the code in the worktree
-5. **Rebuild** ONLY the affected service (not the whole stack):
-   ```bash
-   cd $PLATFORM_DIR && docker compose up --build -d {service_name}
-   # e.g., docker compose up --build -d rest_server
-   # e.g., docker compose up --build -d frontend
-   ```
-6. **Wait** for the service to be ready (poll health endpoint)
-7. **Re-test** the same scenario
-8. **Screenshot** the fixed state: `agent-browser --session-name pr-test screenshot $RESULTS_DIR/{NN}-fixed-{description}.png`
-9. **Remove the xfail/fixme marker** from the test written in step 2, and verify it passes
-10. **Verify** the fix did not break other scenarios (run a quick smoke test)
-11. **Commit and push** immediately:
-   ```bash
-   cd $WORKTREE_PATH
-   git add -A
-   git commit -m "fix: {description of fix}"
-   git push
-   ```
-12. **Continue** to the next test scenario
-
-### Fix loop (like pr-address)
-
-```text
-test scenario → find issue (bug OR UX problem) → screenshot broken state
-→ fix code → rebuild affected service only → re-test → screenshot fixed state
-→ verify no regressions → commit + push
-→ repeat for next scenario
-→ after ALL scenarios pass, run full re-test to verify everything together
-```
-
-**Key differences from non-fix mode:**
-- UX issues count as bugs — fix them (bad alignment, confusing labels, missing loading states)
-- Every fix MUST have a before/after screenshot pair proving it works
-- Commit after EACH fix, not in a batch at the end
-- The final re-test must produce a clean set of all-passing screenshots
-
-## Known issues and workarounds
-
-### Problem: "Database error finding user" on signup
-**Cause:** Supabase auth service schema cache is stale after migration.
-**Fix:** `docker restart supabase-auth && sleep 5` then retry signup.
-
-### Problem: Copilot returns auth errors in subscription mode
-**Cause:** `CHAT_USE_CLAUDE_CODE_SUBSCRIPTION=true` but `CLAUDE_CODE_OAUTH_TOKEN` is unset or has expired.
-**Fix:** Re-run `refresh_claude_token.sh` to re-extract the OAuth token (see step 3b, Option 1), then recreate the container (`docker compose up -d copilot_executor`). The backend auto-provisions `~/.claude/.credentials.json` from the env vars on startup. No `npm install` or `claude login` is needed because the SDK bundles its own CLI binary.
-
-### Problem: agent-browser can't find chromium
-**Cause:** The Dockerfile auto-provisions system chromium on all architectures (including ARM64). If your branch is behind `dev`, this may not be present yet.
-**Fix:** Check if chromium exists: `which chromium || which chromium-browser`. If missing, install it: `apt-get install -y chromium` and set `AGENT_BROWSER_EXECUTABLE_PATH=/usr/bin/chromium` in the container environment.
-
-### Problem: agent-browser selector matches multiple elements
-**Cause:** `text=X` matches all elements containing that text.
-**Fix:** Use `agent-browser snapshot` to get specific `ref=eNN` references, then use those: `agent-browser click eNN`.
-
-### Problem: Frontend shows cookie banner blocking interaction
-**Fix:** `agent-browser click 'text=Accept All'` before other interactions.
-
-### Problem: Container loses npm packages after rebuild
-**Cause:** `docker compose up --build` rebuilds the image, losing runtime installs.
-**Fix:** Add packages to the Dockerfile instead of installing at runtime.
-
-### Problem: Services not starting after `docker compose up`
-**Fix:** Wait and check health: `docker compose ps`. Common cause: migration hasn't finished. Check: `docker logs autogpt_platform-migrate-1 2>&1 | tail -5`. If supabase-db isn't healthy: `docker restart supabase-db && sleep 10`.
-
-### Problem: Docker uses cached layers with old code (PR changes not visible)
-**Cause:** `docker compose up --build` reuses cached `COPY` layers from previous builds. If the PR branch changes Python files but the previous build already cached that layer from `dev`, the container runs `dev` code.
-**Fix:** Always use `docker compose build --no-cache` for the first build of a PR branch. Subsequent rebuilds within the same branch can use `--build`.
-
-### Problem: `agent-browser open` loses login session
-**Cause:** Without session persistence, `agent-browser open` starts fresh.
-**Fix:** Use `--session-name pr-test` on ALL agent-browser commands. This auto-saves/restores cookies and localStorage across navigations. Alternatively, use `agent-browser eval "window.location.href = '...'"` to navigate within the same context.
-
-### Problem: Supabase auth returns "Database error querying schema"
-**Cause:** The database schema changed (migration ran) but supabase-auth has a stale schema cache.
-**Fix:** `docker restart supabase-db && sleep 10 && docker restart supabase-auth && sleep 8`. If user data was lost, re-signup.
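
Reviewer sketch (not part of the deleted file): several of the workarounds above pair `docker restart` with a fixed `sleep`, and step 3f polls with a hand-rolled loop. A small retry helper could cover both cases. This is only a minimal sketch under stated assumptions: `retry_until` and `http_ok` are hypothetical names introduced here, not functions from the repo, and the attempt counts and delays are illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the repo): retry a command until it succeeds
# or the attempt budget runs out.
# Usage: retry_until <max_attempts> <delay_seconds> <command> [args...]
retry_until() {
  local max_attempts=$1 delay=$2
  shift 2
  local attempt
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the final failure.
    if ((attempt < max_attempts)); then
      sleep "$delay"
    fi
  done
  echo "ERROR: '$*' did not succeed after $max_attempts attempts" >&2
  return 1
}

# Hypothetical probe: succeeds only when the URL answers with HTTP 200.
http_ok() {
  [ "$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1")" = "200" ]
}
```

For example, `retry_until 60 5 http_ok http://localhost:8006/docs` would mirror the 60 x 5s backend poll from step 3f, with the difference that it returns a non-zero status on timeout instead of falling through silently.
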
--- a/.claude/skills/setup-repo/SKILL.md
+++ b/.claude/skills/setup-repo/SKILL.md
@@ -1,195 +0,0 @@
----
-name: setup-repo
-description: Initialize a worktree-based repo layout for parallel development. Creates a main worktree, a reviews worktree for PR reviews, and N numbered work branches. Handles .env creation, dependency installation, and branchlet config. TRIGGER when user asks to set up the repo from scratch, initialize worktrees, bootstrap their dev environment, "setup repo", "setup worktrees", "initialize dev environment", "set up branches", or when a freshly cloned repo has no sibling worktrees.
-user-invocable: true
-args: "No arguments — interactive setup via prompts."
-metadata:
-  author: autogpt-team
-  version: "1.0.0"
----
-
-# Repository Setup
-
-This skill sets up a worktree-based development layout from a freshly cloned repo. It creates:
-- A **main** worktree (the primary checkout)
-- A **reviews** worktree (for PR reviews)
-- **N work branches** (branch1..branchN) for parallel development
-
-## Step 1: Identify the repo
-
-Determine the repo root and parent directory:
-
-```bash
-ROOT=$(git rev-parse --show-toplevel)
-REPO_NAME=$(basename "$ROOT")
-PARENT=$(dirname "$ROOT")
-```
-
-Detect if the repo is already inside a worktree layout by counting sibling worktrees (not just checking the directory name, which could be anything):
-
-```bash
-# Count worktrees under $PARENT (the count includes $ROOT itself, hence the "> 1" check below)
-SIBLING_COUNT=$(git worktree list --porcelain 2>/dev/null | grep "^worktree " | grep -cF "$PARENT/" || true)
-if [ "$SIBLING_COUNT" -gt 1 ]; then
-  echo "INFO: Existing worktree layout detected at $PARENT ($SIBLING_COUNT worktrees)"
-  # Use $ROOT as-is; skip renaming/restructuring
-else
-  echo "INFO: Fresh clone detected, proceeding with setup"
-fi
-```
-
-## Step 2: Ask the user questions
-
-Use AskUserQuestion to gather setup preferences:
-
-1. **How many parallel work branches do you need?** (Options: 4, 8, 16, or custom)
-   - These become `branch1` through `branchN`
-2. **Which branch should be the base?** (Options: origin/master, origin/dev, or custom)
-   - All work branches and reviews will start from this
-
-## Step 3: Fetch and set up branches
-
-```bash
-# COUNT and BASE_BRANCH hold the answers from Step 2 (e.g. COUNT=4, BASE_BRANCH=origin/dev).
-# BASE_BRANCH is a placeholder variable name for whichever base the user picked.
-cd "$ROOT"
-git fetch origin
-
-# Create the reviews branch from base (skip if already exists)
-if git show-ref --verify --quiet refs/heads/reviews; then
-  echo "INFO: Branch 'reviews' already exists, skipping"
-else
-  git branch reviews "$BASE_BRANCH"
-fi
-
-# Create numbered work branches from base (skip if already exists)
-for i in $(seq 1 "$COUNT"); do
-  if git show-ref --verify --quiet "refs/heads/branch$i"; then
-    echo "INFO: Branch 'branch$i' already exists, skipping"
-  else
-    git branch "branch$i" "$BASE_BRANCH"
-  fi
-done
-```
-
-## Step 4: Create worktrees
-
-Create worktrees as siblings to the main checkout:
-
-```bash
-if [ -d "$PARENT/reviews" ]; then
-  echo "INFO: Worktree '$PARENT/reviews' already exists, skipping"
-else
-  git worktree add "$PARENT/reviews" reviews
-fi
-
-for i in $(seq 1 "$COUNT"); do
-  if [ -d "$PARENT/branch$i" ]; then
-    echo "INFO: Worktree '$PARENT/branch$i' already exists, skipping"
-  else
-    git worktree add "$PARENT/branch$i" "branch$i"
-  fi
-done
-```
-
-## Step 5: Set up environment files
-
-**Do NOT assume .env files exist.** For each worktree (including main if needed):
-
-1. Check if `.env` exists in the source worktree for each path
-2. If `.env` exists, copy it
-3. If only `.env.default` or `.env.example` exists, copy that as `.env`
-4. If neither exists, warn the user and list which env files are missing
-
-Env file locations to check (same as the `/worktree` skill — keep these in sync):
-- `autogpt_platform/.env`
-- `autogpt_platform/backend/.env`
-- `autogpt_platform/frontend/.env`
-
-> **Note:** This env copying logic intentionally mirrors the `/worktree` skill's approach. If you update the path list or fallback logic here, update `/worktree` as well.
-
-```bash
-SOURCE="$ROOT"
-WORKTREES="reviews"
-for i in $(seq 1 "$COUNT"); do WORKTREES="$WORKTREES branch$i"; done
-
-FOUND_ANY_ENV=0
-for wt in $WORKTREES; do
-  TARGET="$PARENT/$wt"
-  for envpath in autogpt_platform autogpt_platform/backend autogpt_platform/frontend; do
-    if [ -f "$SOURCE/$envpath/.env" ]; then
-      FOUND_ANY_ENV=1
-      cp "$SOURCE/$envpath/.env" "$TARGET/$envpath/.env"
-    elif [ -f "$SOURCE/$envpath/.env.default" ]; then
-      FOUND_ANY_ENV=1
-      cp "$SOURCE/$envpath/.env.default" "$TARGET/$envpath/.env"
-      echo "NOTE: $wt/$envpath/.env was created from .env.default — you may need to edit it"
-    elif [ -f "$SOURCE/$envpath/.env.example" ]; then
-      FOUND_ANY_ENV=1
-      cp "$SOURCE/$envpath/.env.example" "$TARGET/$envpath/.env"
-      echo "NOTE: $wt/$envpath/.env was created from .env.example — you may need to edit it"
-    else
-      echo "WARNING: No .env, .env.default, or .env.example found at $SOURCE/$envpath/"
-    fi
-  done
-done
-
-if [ "$FOUND_ANY_ENV" -eq 0 ]; then
-  echo "WARNING: No environment files or templates were found in the source worktree."
-  # Use AskUserQuestion to confirm: "Continue setup without env files?"
-  # If the user declines, stop here and let them set up .env files first.
-fi
-```
-
-## Step 6: Copy branchlet config
-
-Copy `.branchlet.json` from main to each worktree so branchlet can manage sub-worktrees:
-
-```bash
-if [ -f "$ROOT/.branchlet.json" ]; then
-  for wt in $WORKTREES; do
-    cp "$ROOT/.branchlet.json" "$PARENT/$wt/.branchlet.json"
-  done
-fi
-```
-
-## Step 7: Install dependencies
-
-Install dependencies in every worktree, processing one worktree at a time:
-
-```bash
-for wt in $WORKTREES; do
-  TARGET="$PARENT/$wt"
-  echo "=== Installing deps for $wt ==="
-  (cd "$TARGET/autogpt_platform/autogpt_libs" && poetry install) &&
-  (cd "$TARGET/autogpt_platform/backend" && poetry install && poetry run prisma generate) &&
-  (cd "$TARGET/autogpt_platform/frontend" && pnpm install) &&
-  echo "=== Done: $wt ===" ||
-  echo "=== FAILED: $wt ==="
-done
-```
-
-This step is slow. Run it in the background if possible and notify the user when it completes.
-
-## Step 8: Verify and report
-
-After setup, verify and report to the user:
-
-```bash
-git worktree list
-```
-
-Summarize:
- Number of worktrees created
- Which env files were copied vs created from defaults vs missing
- Any warnings or errors encountered
-
-## Final directory layout
-
-```
-parent/
-  main/              # Primary checkout (already exists)
-  reviews/           # PR review worktree
-  branch1/           # Work branch 1
-  branch2/           # Work branch 2
-  ...
-  branchN/           # Work branch N
-```
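The env-file fallback order from Step 5 of the removed skill (`.env`, then `.env.default`, then `.env.example`) can be isolated into one small function. This is only a sketch: `pick_env_source` is a hypothetical helper name that does not appear in the skill, and the caller is assumed to do the copying and warning itself.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the original skill): print the first env
# file that exists in a directory, trying .env, .env.default, .env.example.
pick_env_source() {
  local dir="$1" candidate
  for candidate in .env .env.default .env.example; do
    if [ -f "$dir/$candidate" ]; then
      printf '%s\n' "$dir/$candidate"
      return 0
    fi
  done
  return 1  # nothing found; the caller decides how to warn the user
}
```

A caller could then write `src=$(pick_env_source "$SOURCE/$envpath") && cp "$src" "$TARGET/$envpath/.env"` and print the warning when the function returns non-zero.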
--- a/.claude/skills/worktree/SKILL.md
+++ b/.claude/skills/worktree/SKILL.md
@@ -1,85 +0,0 @@
---
-name: worktree
-description: Set up a new git worktree for parallel development. Creates the worktree, copies .env files, installs dependencies, and generates Prisma client. TRIGGER when user asks to set up a worktree, work on a branch in isolation, or needs a separate environment for a branch or PR.
-user-invocable: true
-args: "[name] — optional worktree name (e.g., 'AutoGPT7'). If omitted, uses next available AutoGPT<N>."
-metadata:
-  author: autogpt-team
-  version: "3.0.0"
---
-
-# Worktree Setup
-
-## Create the worktree
-
-Derive paths from the git toplevel. If a name is provided as argument, use it. Otherwise, check `git worktree list` and pick the next `AutoGPT<N>`.
-
-```bash
-ROOT=$(git rev-parse --show-toplevel)
-PARENT=$(dirname "$ROOT")
-
-# From an existing branch
-git worktree add "$PARENT/<NAME>" <branch-name>
-
-# From a new branch off dev
-git worktree add -b <new-branch> "$PARENT/<NAME>" dev
-```
-
-## Copy environment files
-
-Copy `.env` from the root worktree, falling back to `.env.default` if `.env` doesn't exist.
-
-```bash
-ROOT=$(git rev-parse --show-toplevel)
-TARGET="$(dirname "$ROOT")/<NAME>"
-
-for envpath in autogpt_platform/backend autogpt_platform/frontend autogpt_platform; do
-  if [ -f "$ROOT/$envpath/.env" ]; then
-    cp "$ROOT/$envpath/.env" "$TARGET/$envpath/.env"
-  elif [ -f "$ROOT/$envpath/.env.default" ]; then
-    cp "$ROOT/$envpath/.env.default" "$TARGET/$envpath/.env"
-  fi
-done
-```
-
-## Install dependencies
-
-```bash
-TARGET="$(dirname "$(git rev-parse --show-toplevel)")/<NAME>"
-cd "$TARGET/autogpt_platform/autogpt_libs" && poetry install
-cd "$TARGET/autogpt_platform/backend" && poetry install && poetry run prisma generate
-cd "$TARGET/autogpt_platform/frontend" && pnpm install
-```
-
-Replace `<NAME>` with the actual worktree name (e.g., `AutoGPT7`).
-
-## Running the app (optional)
-
-Backend uses ports: 8001, 8002, 8003, 8005, 8006, 8007, 8008. Free them first if needed:
-
-```bash
-TARGET="$(dirname "$(git rev-parse --show-toplevel)")/<NAME>"
-for port in 8001 8002 8003 8005 8006 8007 8008; do
-  lsof -ti :$port | xargs kill -9 2>/dev/null || true
-done
-cd "$TARGET/autogpt_platform/backend" && poetry run app
-```
-
-## CoPilot testing
-
-SDK mode spawns a Claude subprocess, which won't work inside Claude Code. Set `CHAT_USE_CLAUDE_AGENT_SDK=false` in `backend/.env` to use baseline mode.
-
-## Cleanup
-
-```bash
-# Replace <NAME> with the actual worktree name (e.g., AutoGPT7)
-git worktree remove "$(dirname "$(git rev-parse --show-toplevel)")/<NAME>"
-```
-
-## Alternative: Branchlet (optional)
-
-If [branchlet](https://www.npmjs.com/package/branchlet) is installed:
-
-```bash
-branchlet create -n <name> -s <source-branch> -b <new-branch>
-```
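The removed worktree skill says to pick the next `AutoGPT<N>` from `git worktree list` but does not show how. One possible sketch follows; `next_worktree_name` is a hypothetical function, and taking "next" to mean the highest existing number plus one is an assumption (the skill does not say whether gaps should be reused).

```bash
#!/usr/bin/env bash
# Hypothetical helper: read worktree paths on stdin (one per line) and print
# the next free AutoGPT<N> name, assuming "next" means highest N plus one.
next_worktree_name() {
  local max=0 path name n
  while IFS= read -r path; do
    name="${path##*/}"                       # last path component
    if [[ "$name" =~ ^AutoGPT([0-9]+)$ ]]; then
      n=$((10#${BASH_REMATCH[1]}))           # force base 10 (handles "08")
      if (( n > max )); then max=$n; fi
    fi
  done
  printf 'AutoGPT%d\n' "$((max + 1))"
}
```

The paths can be fed in with `git worktree list --porcelain | sed -n 's/^worktree //p' | next_worktree_name`.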
--- a/.claude/skills/write-frontend-tests/SKILL.md
+++ b/.claude/skills/write-frontend-tests/SKILL.md
@@ -1,224 +0,0 @@
---
-name: write-frontend-tests
-description: "Analyze the current branch diff against dev, plan integration tests for changed frontend pages/components, and write them. TRIGGER when user asks to write frontend tests, add test coverage, or 'write tests for my changes'."
-user-invocable: true
-args: "[base branch] — defaults to dev. Optionally pass a specific base branch to diff against."
-metadata:
-  author: autogpt-team
-  version: "1.0.0"
---
-
-# Write Frontend Tests
-
-Analyze the current branch's frontend changes, plan integration tests, and write them.
-
-## References
-
-Before writing any tests, read the testing rules and conventions:
-
- `autogpt_platform/frontend/TESTING.md` — testing strategy, file locations, examples
- `autogpt_platform/frontend/src/tests/AGENTS.md` — detailed testing rules, MSW patterns, decision flowchart
- `autogpt_platform/frontend/src/tests/integrations/test-utils.tsx` — custom render with providers
- `autogpt_platform/frontend/src/tests/integrations/vitest.setup.tsx` — MSW server setup
-
-## Step 1: Identify changed frontend files
-
-```bash
-BASE_BRANCH="${ARGUMENTS:-dev}"
-cd autogpt_platform/frontend
-
-# Get changed frontend files (excluding generated, config, and test files)
-git diff "$BASE_BRANCH"...HEAD --name-only -- src/ \
-  | grep -v '__generated__' \
-  | grep -v '__tests__' \
-  | grep -v '\.test\.' \
-  | grep -v '\.stories\.' \
-  | grep -v '\.spec\.'
-```
-
-Also read the diff to understand what changed:
-
-```bash
-git diff "$BASE_BRANCH"...HEAD --stat -- src/
-git diff "$BASE_BRANCH"...HEAD -- src/ | head -500
-```
-
-## Step 2: Categorize changes and find test targets
-
-For each changed file, determine:
-
-1. **Is it a page?** (`page.tsx`) — these are the primary test targets
-2. **Is it a hook?** (`use*.ts`) — test via the page that uses it
-3. **Is it a component?** (`.tsx` in `components/`) — test via the parent page unless it's complex enough to warrant isolation
-4. **Is it a helper?** (`helpers.ts`, `utils.ts`) — unit test directly if pure logic
-
-**Priority order:**
-1. Pages with new/changed data fetching or user interactions
-2. Components with complex internal logic (modals, forms, wizards)
-3. Hooks with non-trivial business logic
-4. Pure helper functions
-
-Skip: styling-only changes, type-only changes, config changes.
-
-## Step 3: Check for existing tests
-
-For each test target, check if tests already exist:
-
-```bash
-# For a page at src/app/(platform)/library/page.tsx
-ls src/app/\(platform\)/library/__tests__/ 2>/dev/null
-
-# For a component at src/app/(platform)/library/components/AgentCard/AgentCard.tsx
-ls src/app/\(platform\)/library/components/AgentCard/__tests__/ 2>/dev/null
-```
-
-Note which targets have no tests (need new files) vs which have tests that need updating.
-
-## Step 4: Identify API endpoints used
-
-For each test target, find which API hooks are used:
-
-```bash
-# Find generated API hook imports in the changed files
-grep -rn 'from.*__generated__/endpoints' src/app/\(platform\)/library/
-grep -rn 'use[A-Z].*V[12]' src/app/\(platform\)/library/
-```
-
-For each API hook found, locate the corresponding MSW handler:
-
-```bash
-# If the page uses useGetV2ListLibraryAgents, find its MSW handlers
-grep -rn 'getGetV2ListLibraryAgents.*Handler' src/app/api/__generated__/endpoints/library/library.msw.ts
-```
-
-List every MSW handler you will need (200 for happy path, 4xx for error paths).
-
-## Step 5: Write the test plan
-
-Before writing code, output a plan as a numbered list:
-
-```
-Test plan for [branch name]:
-
-1. src/app/(platform)/library/__tests__/main.test.tsx (NEW)
-   - Renders page with agent list (MSW 200)
-   - Shows loading state
-   - Shows error state (MSW 422)
-   - Handles empty agent list
-
-2. src/app/(platform)/library/__tests__/search.test.tsx (NEW)
-   - Filters agents by search query
-   - Shows no results message
-   - Clears search
-
-3. src/app/(platform)/library/components/AgentCard/__tests__/AgentCard.test.tsx (UPDATE)
-   - Add test for new "duplicate" action
-```
-
-Present this plan to the user. Wait for confirmation before proceeding. If the user has feedback, adjust the plan.
-
-## Step 6: Write the tests
-
-For each test file in the plan, follow these conventions:
-
-### File structure
-
-```tsx
-import { render, screen, waitFor } from "@/tests/integrations/test-utils";
-import { server } from "@/mocks/mock-server";
-// Import MSW handlers for endpoints the page uses
-import {
-  getGetV2ListLibraryAgentsMockHandler200,
-  getGetV2ListLibraryAgentsMockHandler422,
-} from "@/app/api/__generated__/endpoints/library/library.msw";
-// Import the component under test
-import LibraryPage from "../page";
-
-describe("LibraryPage", () => {
-  test("renders agent list from API", async () => {
-    server.use(getGetV2ListLibraryAgentsMockHandler200());
-
-    render(<LibraryPage />);
-
-    expect(await screen.findByText(/my agents/i)).toBeDefined();
-  });
-
-  test("shows error state on API failure", async () => {
-    server.use(getGetV2ListLibraryAgentsMockHandler422());
-
-    render(<LibraryPage />);
-
-    expect(await screen.findByText(/error/i)).toBeDefined();
-  });
-});
-```
-
-### Rules
-
- Use `render()` from `@/tests/integrations/test-utils` (NOT from `@testing-library/react` directly)
- Use `server.use()` to set up MSW handlers BEFORE rendering
- Use `findBy*` (async) for elements that appear after data fetching — NOT `getBy*`
- Use `getBy*` only for elements that are immediately present in the DOM
- Use `screen` queries — do NOT destructure from `render()`
- Use `waitFor` when asserting side effects or state changes after interactions
- Import `fireEvent` or `userEvent` from the test-utils for interactions
- Do NOT mock internal hooks or functions — mock at the API boundary via MSW
- Do NOT use `act()` manually — `render` and `fireEvent` handle it
- Keep tests focused: one behavior per test
- Use descriptive test names that read like sentences
-
-### Test location
-
-```
-# For pages: __tests__/ next to page.tsx
-src/app/(platform)/library/__tests__/main.test.tsx
-
-# For complex standalone components: __tests__/ inside component folder
-src/app/(platform)/library/components/AgentCard/__tests__/AgentCard.test.tsx
-
-# For pure helpers: co-located .test.ts
-src/app/(platform)/library/helpers.test.ts
-```
-
-### Custom MSW overrides
-
-When the auto-generated faker data is not enough, override with specific data:
-
-```tsx
-import { http, HttpResponse } from "msw";
-
-server.use(
-  http.get("http://localhost:3000/api/proxy/api/v2/library/agents", () => {
-    return HttpResponse.json({
-      agents: [
-        { id: "1", name: "Test Agent", description: "A test agent" },
-      ],
-      pagination: { total_items: 1, total_pages: 1, page: 1, page_size: 10 },
-    });
-  }),
-);
-```
-
-Use the proxy URL pattern: `http://localhost:3000/api/proxy/api/v{version}/{path}` — this matches the MSW base URL configured in `orval.config.ts`.
-
-## Step 7: Run and verify
-
-After writing all tests:
-
-```bash
-cd autogpt_platform/frontend
-pnpm test:unit --reporter=verbose
-```
-
-If tests fail:
-1. Read the error output carefully
-2. Fix the test (not the source code, unless there is a genuine bug)
-3. Re-run until all pass
-
-Then run the full checks:
-
-```bash
-pnpm format
-pnpm lint
-pnpm types
-```
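The chain of `grep -v` filters in Step 1 of the removed write-frontend-tests skill could also be written as a single extended-regex filter. The sketch below assumes the same five exclusion patterns; `filter_test_targets` is a hypothetical name, not something the skill defines.

```bash
#!/usr/bin/env bash
# Hypothetical helper: drop generated files, test folders, and
# *.test.* / *.stories.* / *.spec.* files from a list of paths on stdin.
filter_test_targets() {
  # grep exits 1 when every line is filtered out; treat that as success.
  grep -Ev '__generated__|__tests__|\.(test|stories|spec)\.' || true
}
```

It would be used as `git diff "$BASE_BRANCH"...HEAD --name-only -- src/ | filter_test_targets`.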
--- a/.dockerignore
+++ b/.dockerignore
@@ -5,13 +5,42 @@
 !docs/

 # Platform - Libs
-!autogpt_platform/autogpt_libs/
+!autogpt_platform/autogpt_libs/autogpt_libs/
+!autogpt_platform/autogpt_libs/pyproject.toml
+!autogpt_platform/autogpt_libs/poetry.lock
+!autogpt_platform/autogpt_libs/README.md

 # Platform - Backend
-!autogpt_platform/backend/
+!autogpt_platform/backend/backend/
+!autogpt_platform/backend/test/e2e_test_data.py
+!autogpt_platform/backend/migrations/
+!autogpt_platform/backend/schema.prisma
+!autogpt_platform/backend/pyproject.toml
+!autogpt_platform/backend/poetry.lock
+!autogpt_platform/backend/README.md
+!autogpt_platform/backend/.env
+!autogpt_platform/backend/gen_prisma_types_stub.py
+
+# Platform - Market
+!autogpt_platform/market/market/
+!autogpt_platform/market/scripts.py
+!autogpt_platform/market/schema.prisma
+!autogpt_platform/market/pyproject.toml
+!autogpt_platform/market/poetry.lock
+!autogpt_platform/market/README.md

 # Platform - Frontend
-!autogpt_platform/frontend/
+!autogpt_platform/frontend/src/
+!autogpt_platform/frontend/public/
+!autogpt_platform/frontend/scripts/
+!autogpt_platform/frontend/package.json
+!autogpt_platform/frontend/pnpm-lock.yaml
+!autogpt_platform/frontend/tsconfig.json
+!autogpt_platform/frontend/README.md
+## config
+!autogpt_platform/frontend/*.config.*
+!autogpt_platform/frontend/.env.*
+!autogpt_platform/frontend/.env

 # Classic - AutoGPT
 !classic/original_autogpt/autogpt/
@@ -35,38 +64,6 @@
 # Classic - Frontend
 !classic/frontend/build/web/

-# Explicitly re-ignore unwanted files from whitelisted directories
-# Note: These patterns MUST come after the whitelist rules to take effect
-
-# Hidden files and directories (but keep frontend .env files needed for build)
-**/.*
-!autogpt_platform/frontend/.env
-!autogpt_platform/frontend/.env.default
-!autogpt_platform/frontend/.env.production
-
-# Python artifacts
-**/__pycache__/
-**/*.pyc
-**/*.pyo
-**/.venv/
-**/.ruff_cache/
-**/.pytest_cache/
-**/.coverage
-**/htmlcov/
-
-# Node artifacts
-**/node_modules/
-**/.next/
-**/storybook-static/
-**/playwright-report/
-**/test-results/
-
-# Build artifacts
-**/dist/
-**/build/
-!autogpt_platform/frontend/src/**/build/
-**/target/
-
-# Logs and temp files
-**/*.log
-**/*.tmp
+# Explicitly re-ignore some folders
+.*
+**/__pycache__
--- a/.github/PULL_REQUEST_TEMPLATE.md
+++ b/.github/PULL_REQUEST_TEMPLATE.md
@@ -1,12 +1,8 @@
-### Why / What / How
-
-<!-- Why: Why does this PR exist? What problem does it solve, or what's broken/missing without it? -->
-<!-- What: What does this PR change? Summarize the changes at a high level. -->
-<!-- How: How does it work? Describe the approach, key implementation details, or architecture decisions. -->
+<!-- Clearly explain the need for these changes: -->

 ### Changes 🏗️

-<!-- List the key changes. Keep it higher level than the diff but specific enough to highlight what's new/modified. -->
+<!-- Concisely describe all of the changes made in this pull request: -->

 ### Checklist 📋

--- a/.github/scripts/detect_overlaps.py
+++ b/.github/scripts/detect_overlaps.py
--- a/.github/workflows/classic-autogpt-ci.yml
+++ b/.github/workflows/classic-autogpt-ci.yml
@@ -6,19 +6,11 @@ on:
    paths:
      - '.github/workflows/classic-autogpt-ci.yml'
      - 'classic/original_autogpt/**'
-      - 'classic/direct_benchmark/**'
-      - 'classic/forge/**'
-      - 'classic/pyproject.toml'
-      - 'classic/poetry.lock'
  pull_request:
    branches: [ master, dev, release-* ]
    paths:
      - '.github/workflows/classic-autogpt-ci.yml'
      - 'classic/original_autogpt/**'
-      - 'classic/direct_benchmark/**'
-      - 'classic/forge/**'
-      - 'classic/pyproject.toml'
-      - 'classic/poetry.lock'

 concurrency:
  group: ${{ format('classic-autogpt-ci-{0}', github.head_ref && format('{0}-{1}', github.event_name, github.event.pull_request.number) || github.sha) }}
@@ -27,22 +19,47 @@ concurrency:
 defaults:
  run:
    shell: bash
-    working-directory: classic
+    working-directory: classic/original_autogpt

 jobs:
  test:
    permissions:
      contents: read
    timeout-minutes: 30
-    runs-on: ubuntu-latest
+    strategy:
+      fail-fast: false
+      matrix:
+        python-version: ["3.10"]
+        platform-os: [ubuntu, macos, macos-arm64, windows]
+    runs-on: ${{ matrix.platform-os != 'macos-arm64' && format('{0}-latest', matrix.platform-os) || 'macos-14' }}

    steps:
-      - name: Start MinIO service
+      # Quite slow on macOS (2~4 minutes to set up Docker)
+      # - name: Set up Docker (macOS)
+      #   if: runner.os == 'macOS'
+      #   uses: crazy-max/ghaction-setup-docker@v3
+
+      - name: Start MinIO service (Linux)
+        if: runner.os == 'Linux'
        working-directory: '.'
        run: |
          docker pull minio/minio:edge-cicd
          docker run -d -p 9000:9000 minio/minio:edge-cicd

+      - name: Start MinIO service (macOS)
+        if: runner.os == 'macOS'
+        working-directory: ${{ runner.temp }}
+        run: |
+          brew install minio/stable/minio
+          mkdir data
+          minio server ./data &
+
+      # No MinIO on Windows:
+      # - Windows doesn't support running Linux Docker containers
+      # - It doesn't seem possible to start background processes on Windows. They are
+      #   killed after the step returns.
+      #   See: https://github.com/actions/runner/issues/598#issuecomment-2011890429
+
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
@@ -54,23 +71,41 @@ jobs:
          git config --global user.name "Auto-GPT-Bot"
          git config --global user.email "github-bot@agpt.co"

-      - name: Set up Python 3.12
+      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
-          python-version: "3.12"
+          python-version: ${{ matrix.python-version }}

      - id: get_date
        name: Get date
        run: echo "date=$(date +'%Y-%m-%d')" >> $GITHUB_OUTPUT

      - name: Set up Python dependency cache
+        # On Windows, unpacking cached dependencies takes longer than just installing them
+        if: runner.os != 'Windows'
        uses: actions/cache@v4
        with:
-          path: ~/.cache/pypoetry
-          key: poetry-${{ runner.os }}-${{ hashFiles('classic/poetry.lock') }}
+          path: ${{ runner.os == 'macOS' && '~/Library/Caches/pypoetry' || '~/.cache/pypoetry' }}
+          key: poetry-${{ runner.os }}-${{ hashFiles('classic/original_autogpt/poetry.lock') }}

-      - name: Install Poetry
-        run: curl -sSL https://install.python-poetry.org | python3 -
+      - name: Install Poetry (Unix)
+        if: runner.os != 'Windows'
+        run: |
+          curl -sSL https://install.python-poetry.org | python3 -
+
+          if [ "${{ runner.os }}" = "macOS" ]; then
+            PATH="$HOME/.local/bin:$PATH"
+            echo "$HOME/.local/bin" >> $GITHUB_PATH
+          fi
+
+      - name: Install Poetry (Windows)
+        if: runner.os == 'Windows'
+        shell: pwsh
+        run: |
+          (Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | python -
+
+          $env:PATH += ";$env:APPDATA\Python\Scripts"
+          echo "$env:APPDATA\Python\Scripts" >> $env:GITHUB_PATH

      - name: Install Python dependencies
        run: poetry install
@@ -81,13 +116,12 @@ jobs:
            --cov=autogpt --cov-branch --cov-report term-missing --cov-report xml \
            --numprocesses=logical --durations=10 \
            --junitxml=junit.xml -o junit_family=legacy \
-            original_autogpt/tests/unit original_autogpt/tests/integration
+            tests/unit tests/integration
        env:
          CI: true
          PLAIN_OUTPUT: True
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
-          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
-          S3_ENDPOINT_URL: http://127.0.0.1:9000
+          S3_ENDPOINT_URL: ${{ runner.os != 'Windows' && 'http://127.0.0.1:9000' || '' }}
          AWS_ACCESS_KEY_ID: minioadmin
          AWS_SECRET_ACCESS_KEY: minioadmin

@@ -101,11 +135,11 @@ jobs:
        uses: codecov/codecov-action@v5
        with:
          token: ${{ secrets.CODECOV_TOKEN }}
-          flags: autogpt-agent
+          flags: autogpt-agent,${{ runner.os }}

      - name: Upload logs to artifact
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: test-logs
-          path: classic/logs/
+          path: classic/original_autogpt/logs/
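The `runs-on` expression added in this workflow maps each `platform-os` matrix value to a runner label: every value becomes `<value>-latest`, except `macos-arm64`, which becomes `macos-14`. The shell function below mirrors that mapping for illustration only; `resolve_runner` is a hypothetical name and is not part of the workflow.

```bash
#!/usr/bin/env bash
# Illustrative only: reproduce the runner label chosen by
#   matrix.platform-os != 'macos-arm64' && format('{0}-latest', ...) || 'macos-14'
resolve_runner() {
  case "$1" in
    macos-arm64) printf '%s\n' "macos-14" ;;
    *)           printf '%s\n' "$1-latest" ;;
  esac
}
```

For the matrix in this diff, that gives `ubuntu-latest`, `macos-latest`, `macos-14`, and `windows-latest`.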
--- a/.github/workflows/classic-autogpt-docker-ci.yml
+++ b/.github/workflows/classic-autogpt-docker-ci.yml
@@ -148,7 +148,7 @@ jobs:
            --entrypoint poetry ${{ env.IMAGE_NAME }} run \
            pytest -v --cov=autogpt --cov-branch --cov-report term-missing \
            --numprocesses=4 --durations=10 \
-            original_autogpt/tests/unit original_autogpt/tests/integration 2>&1 | tee test_output.txt
+            tests/unit tests/integration 2>&1 | tee test_output.txt

          test_failure=${PIPESTATUS[0]}

--- a/.github/workflows/classic-autogpts-ci.yml
+++ b/.github/workflows/classic-autogpts-ci.yml
@@ -10,9 +10,10 @@ on:
      - '.github/workflows/classic-autogpts-ci.yml'
      - 'classic/original_autogpt/**'
      - 'classic/forge/**'
-      - 'classic/direct_benchmark/**'
-      - 'classic/pyproject.toml'
-      - 'classic/poetry.lock'
+      - 'classic/benchmark/**'
+      - 'classic/run'
+      - 'classic/cli.py'
+      - 'classic/setup.py'
      - '!**/*.md'
  pull_request:
    branches: [ master, dev, release-* ]
@@ -20,9 +21,10 @@ on:
      - '.github/workflows/classic-autogpts-ci.yml'
      - 'classic/original_autogpt/**'
      - 'classic/forge/**'
-      - 'classic/direct_benchmark/**'
-      - 'classic/pyproject.toml'
-      - 'classic/poetry.lock'
+      - 'classic/benchmark/**'
+      - 'classic/run'
+      - 'classic/cli.py'
+      - 'classic/setup.py'
      - '!**/*.md'

 defaults:
@@ -33,9 +35,13 @@ defaults:
 jobs:
  serve-agent-protocol:
    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        agent-name: [ original_autogpt ]
+      fail-fast: false
    timeout-minutes: 20
    env:
-      min-python-version: '3.12'
+      min-python-version: '3.10'
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
@@ -49,22 +55,22 @@ jobs:
          python-version: ${{ env.min-python-version }}

      - name: Install Poetry
+        working-directory: ./classic/${{ matrix.agent-name }}/
        run: |
          curl -sSL https://install.python-poetry.org | python -

-      - name: Install dependencies
-        run: poetry install
-
-      - name: Run smoke tests with direct-benchmark
+      - name: Run regression tests
        run: |
-          poetry run direct-benchmark run \
-            --strategies one_shot \
-            --models claude \
-            --tests ReadFile,WriteFile \
-            --json
+          ./run agent start ${{ matrix.agent-name }}
+          cd ${{ matrix.agent-name }}
+          poetry run agbenchmark --mock --test=BasicRetrieval --test=Battleship --test=WebArenaTask_0
+          poetry run agbenchmark --test=WriteFile
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
-          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
+          AGENT_NAME: ${{ matrix.agent-name }}
          REQUESTS_CA_BUNDLE: /etc/ssl/certs/ca-certificates.crt
-          NONINTERACTIVE_MODE: "true"
-          CI: true
+          HELICONE_CACHE_ENABLED: false
+          HELICONE_PROPERTY_AGENT: ${{ matrix.agent-name }}
+          REPORTS_FOLDER: ${{ format('../../reports/{0}', matrix.agent-name) }}
+          TELEMETRY_ENVIRONMENT: autogpt-ci
+          TELEMETRY_OPT_IN: ${{ github.ref_name == 'master' }}
--- a/.github/workflows/classic-benchmark-ci.yml
+++ b/.github/workflows/classic-benchmark-ci.yml
@@ -1,24 +1,18 @@
-name: Classic - Direct Benchmark CI
+name: Classic - AGBenchmark CI

 on:
  push:
    branches: [ master, dev, ci-test* ]
    paths:
-      - 'classic/direct_benchmark/**'
-      - 'classic/original_autogpt/**'
-      - 'classic/forge/**'
+      - 'classic/benchmark/**'
+      - '!classic/benchmark/reports/**'
      - .github/workflows/classic-benchmark-ci.yml
-      - 'classic/pyproject.toml'
-      - 'classic/poetry.lock'
  pull_request:
    branches: [ master, dev, release-* ]
    paths:
-      - 'classic/direct_benchmark/**'
-      - 'classic/original_autogpt/**'
-      - 'classic/forge/**'
+      - 'classic/benchmark/**'
+      - '!classic/benchmark/reports/**'
      - .github/workflows/classic-benchmark-ci.yml
-      - 'classic/pyproject.toml'
-      - 'classic/poetry.lock'

 concurrency:
  group: ${{ format('benchmark-ci-{0}', github.head_ref && format('{0}-{1}', github.event_name, github.event.pull_request.number) || github.sha) }}
@@ -29,16 +23,23 @@ defaults:
    shell: bash

 env:
-  min-python-version: '3.12'
+  min-python-version: '3.10'

 jobs:
-  benchmark-tests:
-    runs-on: ubuntu-latest
+  test:
+    permissions:
+      contents: read
    timeout-minutes: 30
+    strategy:
+      fail-fast: false
+      matrix:
+        python-version: ["3.10"]
+        platform-os: [ubuntu, macos, macos-arm64, windows]
+    runs-on: ${{ matrix.platform-os != 'macos-arm64' && format('{0}-latest', matrix.platform-os) || 'macos-14' }}
    defaults:
      run:
        shell: bash
-        working-directory: classic
+        working-directory: classic/benchmark
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
@@ -46,88 +47,71 @@ jobs:
          fetch-depth: 0
          submodules: true

-      - name: Set up Python ${{ env.min-python-version }}
+      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
-          python-version: ${{ env.min-python-version }}
+          python-version: ${{ matrix.python-version }}

      - name: Set up Python dependency cache
+        # On Windows, unpacking cached dependencies takes longer than just installing them
+        if: runner.os != 'Windows'
        uses: actions/cache@v4
        with:
-          path: ~/.cache/pypoetry
-          key: poetry-${{ runner.os }}-${{ hashFiles('classic/poetry.lock') }}
+          path: ${{ runner.os == 'macOS' && '~/Library/Caches/pypoetry' || '~/.cache/pypoetry' }}
+          key: poetry-${{ runner.os }}-${{ hashFiles('classic/benchmark/poetry.lock') }}

-      - name: Install Poetry
+      - name: Install Poetry (Unix)
+        if: runner.os != 'Windows'
        run: |
          curl -sSL https://install.python-poetry.org | python3 -

-      - name: Install dependencies
+          if [ "${{ runner.os }}" = "macOS" ]; then
+            PATH="$HOME/.local/bin:$PATH"
+            echo "$HOME/.local/bin" >> $GITHUB_PATH
+          fi
+
+      - name: Install Poetry (Windows)
+        if: runner.os == 'Windows'
+        shell: pwsh
+        run: |
+          (Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | python -
+
+          $env:PATH += ";$env:APPDATA\Python\Scripts"
+          echo "$env:APPDATA\Python\Scripts" >> $env:GITHUB_PATH
+
+      - name: Install Python dependencies
        run: poetry install

-      - name: Run basic benchmark tests
+      - name: Run pytest with coverage
        run: |
-          echo "Testing ReadFile challenge with one_shot strategy..."
-          poetry run direct-benchmark run \
-            --fresh \
-            --strategies one_shot \
-            --models claude \
-            --tests ReadFile \
-            --json
-
-          echo "Testing WriteFile challenge..."
-          poetry run direct-benchmark run \
-            --fresh \
-            --strategies one_shot \
-            --models claude \
-            --tests WriteFile \
-            --json
+          poetry run pytest -vv \
+            --cov=agbenchmark --cov-branch --cov-report term-missing --cov-report xml \
+            --durations=10 \
+            --junitxml=junit.xml -o junit_family=legacy \
+            tests
        env:
          CI: true
-          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
-          NONINTERACTIVE_MODE: "true"

-      - name: Test category filtering
-        run: |
-          echo "Testing coding category..."
-          poetry run direct-benchmark run \
-            --fresh \
-            --strategies one_shot \
-            --models claude \
-            --categories coding \
-            --tests ReadFile,WriteFile \
-            --json
-        env:
-          CI: true
-          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
-          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
-          NONINTERACTIVE_MODE: "true"
+      - name: Upload test results to Codecov
+        if: ${{ !cancelled() }}  # Run even if tests fail
+        uses: codecov/test-results-action@v1
+        with:
+          token: ${{ secrets.CODECOV_TOKEN }}

-      - name: Test multiple strategies
-        run: |
-          echo "Testing multiple strategies..."
-          poetry run direct-benchmark run \
-            --fresh \
-            --strategies one_shot,plan_execute \
-            --models claude \
-            --tests ReadFile \
-            --parallel 2 \
-            --json
-        env:
-          CI: true
-          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
-          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
-          NONINTERACTIVE_MODE: "true"
+      - name: Upload coverage reports to Codecov
+        uses: codecov/codecov-action@v5
+        with:
+          token: ${{ secrets.CODECOV_TOKEN }}
+          flags: agbenchmark,${{ runner.os }}

-  # Run regression tests on maintain challenges
-  regression-tests:
+  self-test-with-agent:
    runs-on: ubuntu-latest
-    timeout-minutes: 45
-    if: github.ref == 'refs/heads/master' || github.ref == 'refs/heads/dev'
-    defaults:
-      run:
-        shell: bash
-        working-directory: classic
+    strategy:
+      matrix:
+        agent-name: [forge]
+      fail-fast: false
+    timeout-minutes: 20
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
@@ -140,31 +124,53 @@ jobs:
        with:
          python-version: ${{ env.min-python-version }}

-      - name: Set up Python dependency cache
-        uses: actions/cache@v4
-        with:
-          path: ~/.cache/pypoetry
-          key: poetry-${{ runner.os }}-${{ hashFiles('classic/poetry.lock') }}
-
      - name: Install Poetry
        run: |
-          curl -sSL https://install.python-poetry.org | python3 -
-
-      - name: Install dependencies
-        run: poetry install
+          curl -sSL https://install.python-poetry.org | python -

      - name: Run regression tests
+        working-directory: classic
        run: |
-          echo "Running regression tests (previously beaten challenges)..."
-          poetry run direct-benchmark run \
-            --fresh \
-            --strategies one_shot \
-            --models claude \
-            --maintain \
-            --parallel 4 \
-            --json
+          ./run agent start ${{ matrix.agent-name }}
+          cd ${{ matrix.agent-name }}
+
+          set +e # Ignore non-zero exit codes and continue execution
+          echo "Running the following command: poetry run agbenchmark --maintain --mock"
+          poetry run agbenchmark --maintain --mock
+          EXIT_CODE=$?
+          set -e  # Stop ignoring non-zero exit codes
+          # Exit code 5 means regression_tests.json is empty; report it and carry on
+          if [ $EXIT_CODE -eq 5 ]; then
+            echo "regression_tests.json is empty."
+          fi
+
+          echo "Running the following command: poetry run agbenchmark --mock"
+          poetry run agbenchmark --mock
+
+          echo "Running the following command: poetry run agbenchmark --mock --category=data"
+          poetry run agbenchmark --mock --category=data
+
+          echo "Running the following command: poetry run agbenchmark --mock --category=coding"
+          poetry run agbenchmark --mock --category=coding
+
+          # echo "Running the following command: poetry run agbenchmark --test=WriteFile"
+          # poetry run agbenchmark --test=WriteFile
+          cd ../benchmark
+          poetry install
+          echo "Setting the BUILD_SKILL_TREE environment variable. This will attempt to add new elements to the skill tree. If new elements are added, the CI fails because they should have been committed and pushed."
+          export BUILD_SKILL_TREE=true
+
+          # poetry run agbenchmark --mock
+
+          # CHANGED=$(git diff --name-only | grep -E '(agbenchmark/challenges)|(../classic/frontend/assets)') || echo "No diffs"
+          # if [ ! -z "$CHANGED" ]; then
+          #   echo "There are unstaged changes. Please run agbenchmark and commit them, since they are needed."
+          #   echo "$CHANGED"
+          #   exit 1
+          # else
+          #   echo "No unstaged changes."
+          # fi
        env:
-          CI: true
-          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
-          NONINTERACTIVE_MODE: "true"
+          TELEMETRY_ENVIRONMENT: autogpt-benchmark-ci
+          TELEMETRY_OPT_IN: ${{ github.ref_name == 'master' }}
--- a/.github/workflows/classic-forge-ci.yml
+++ b/.github/workflows/classic-forge-ci.yml
@@ -6,15 +6,13 @@ on:
    paths:
      - '.github/workflows/classic-forge-ci.yml'
      - 'classic/forge/**'
-      - 'classic/pyproject.toml'
-      - 'classic/poetry.lock'
+      - '!classic/forge/tests/vcr_cassettes'
  pull_request:
    branches: [ master, dev, release-* ]
    paths:
      - '.github/workflows/classic-forge-ci.yml'
      - 'classic/forge/**'
-      - 'classic/pyproject.toml'
-      - 'classic/poetry.lock'
+      - '!classic/forge/tests/vcr_cassettes'

 concurrency:
  group: ${{ format('forge-ci-{0}', github.head_ref && format('{0}-{1}', github.event_name, github.event.pull_request.number) || github.sha) }}
@@ -23,60 +21,131 @@ concurrency:
 defaults:
  run:
    shell: bash
-    working-directory: classic
+    working-directory: classic/forge

 jobs:
  test:
    permissions:
      contents: read
    timeout-minutes: 30
-    runs-on: ubuntu-latest
+    strategy:
+      fail-fast: false
+      matrix:
+        python-version: ["3.10"]
+        platform-os: [ubuntu, macos, macos-arm64, windows]
+    runs-on: ${{ matrix.platform-os != 'macos-arm64' && format('{0}-latest', matrix.platform-os) || 'macos-14' }}

    steps:
-      - name: Start MinIO service
+      # Quite slow on macOS (2~4 minutes to set up Docker)
+      # - name: Set up Docker (macOS)
+      #   if: runner.os == 'macOS'
+      #   uses: crazy-max/ghaction-setup-docker@v3
+
+      - name: Start MinIO service (Linux)
+        if: runner.os == 'Linux'
        working-directory: '.'
        run: |
          docker pull minio/minio:edge-cicd
          docker run -d -p 9000:9000 minio/minio:edge-cicd

+      - name: Start MinIO service (macOS)
+        if: runner.os == 'macOS'
+        working-directory: ${{ runner.temp }}
+        run: |
+          brew install minio/stable/minio
+          mkdir data
+          minio server ./data &
+
+      # No MinIO on Windows:
+      # - Windows doesn't support running Linux Docker containers
+      # - It doesn't seem possible to start background processes on Windows. They are
+      #   killed after the step returns.
+      #   See: https://github.com/actions/runner/issues/598#issuecomment-2011890429
+
      - name: Checkout repository
        uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+          submodules: true

-      - name: Set up Python 3.12
+      - name: Checkout cassettes
+        if: ${{ startsWith(github.event_name, 'pull_request') }}
+        env:
+          PR_BASE: ${{ github.event.pull_request.base.ref }}
+          PR_BRANCH: ${{ github.event.pull_request.head.ref }}
+          PR_AUTHOR: ${{ github.event.pull_request.user.login }}
+        run: |
+          cassette_branch="${PR_AUTHOR}-${PR_BRANCH}"
+          cassette_base_branch="${PR_BASE}"
+          cd tests/vcr_cassettes
+
+          if ! git ls-remote --exit-code --heads origin $cassette_base_branch ; then
+            cassette_base_branch="master"
+          fi
+
+          if git ls-remote --exit-code --heads origin $cassette_branch ; then
+            git fetch origin $cassette_branch
+            git fetch origin $cassette_base_branch
+
+            git checkout $cassette_branch
+
+            # Pick non-conflicting cassette updates from the base branch
+            git merge --no-commit --strategy-option=ours origin/$cassette_base_branch
+            echo "Using cassettes from mirror branch '$cassette_branch'," \
+              "synced to upstream branch '$cassette_base_branch'."
+          else
+            git checkout -b $cassette_branch
+            echo "Branch '$cassette_branch' does not exist in cassette submodule." \
+              "Using cassettes from '$cassette_base_branch'."
+          fi
+
+      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
-          python-version: "3.12"
+          python-version: ${{ matrix.python-version }}

      - name: Set up Python dependency cache
+        # On Windows, unpacking cached dependencies takes longer than just installing them
+        if: runner.os != 'Windows'
        uses: actions/cache@v4
        with:
-          path: ~/.cache/pypoetry
-          key: poetry-${{ runner.os }}-${{ hashFiles('classic/poetry.lock') }}
+          path: ${{ runner.os == 'macOS' && '~/Library/Caches/pypoetry' || '~/.cache/pypoetry' }}
+          key: poetry-${{ runner.os }}-${{ hashFiles('classic/forge/poetry.lock') }}

-      - name: Install Poetry
-        run: curl -sSL https://install.python-poetry.org | python3 -
+      - name: Install Poetry (Unix)
+        if: runner.os != 'Windows'
+        run: |
+          curl -sSL https://install.python-poetry.org | python3 -
+
+          if [ "${{ runner.os }}" = "macOS" ]; then
+            PATH="$HOME/.local/bin:$PATH"
+            echo "$HOME/.local/bin" >> $GITHUB_PATH
+          fi
+
+      - name: Install Poetry (Windows)
+        if: runner.os == 'Windows'
+        shell: pwsh
+        run: |
+          (Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | python -
+
+          $env:PATH += ";$env:APPDATA\Python\Scripts"
+          echo "$env:APPDATA\Python\Scripts" >> $env:GITHUB_PATH

      - name: Install Python dependencies
        run: poetry install

-      - name: Install Playwright browsers
-        run: poetry run playwright install chromium
-
      - name: Run pytest with coverage
        run: |
          poetry run pytest -vv \
            --cov=forge --cov-branch --cov-report term-missing --cov-report xml \
            --durations=10 \
            --junitxml=junit.xml -o junit_family=legacy \
-            forge/forge forge/tests
+            forge
        env:
          CI: true
          PLAIN_OUTPUT: True
-          # API keys - tests that need these will skip if not available
-          # Secrets are not available to fork PRs (GitHub security feature)
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
-          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
-          S3_ENDPOINT_URL: http://127.0.0.1:9000
+          S3_ENDPOINT_URL: ${{ runner.os != 'Windows' && 'http://127.0.0.1:9000' || '' }}
          AWS_ACCESS_KEY_ID: minioadmin
          AWS_SECRET_ACCESS_KEY: minioadmin

@@ -90,11 +159,85 @@ jobs:
        uses: codecov/codecov-action@v5
        with:
          token: ${{ secrets.CODECOV_TOKEN }}
-          flags: forge
+          flags: forge,${{ runner.os }}
+
+      - id: setup_git_auth
+        name: Set up git token authentication
+        # Cassettes may be pushed even when tests fail
+        if: success() || failure()
+        run: |
+          config_key="http.${{ github.server_url }}/.extraheader"
+          if [ "${{ runner.os }}" = 'macOS' ]; then
+            base64_pat=$(echo -n "pat:${{ secrets.PAT_REVIEW }}" | base64)
+          else
+            base64_pat=$(echo -n "pat:${{ secrets.PAT_REVIEW }}" | base64 -w0)
+          fi
+
+          git config "$config_key" \
+            "Authorization: Basic $base64_pat"
+
+          cd tests/vcr_cassettes
+          git config "$config_key" \
+            "Authorization: Basic $base64_pat"
+
+          echo "config_key=$config_key" >> $GITHUB_OUTPUT
+
+      - id: push_cassettes
+        name: Push updated cassettes
+        # For pull requests, push updated cassettes even when tests fail
+        if: github.event_name == 'push' || (! github.event.pull_request.head.repo.fork && (success() || failure()))
+        env:
+          PR_BRANCH: ${{ github.event.pull_request.head.ref }}
+          PR_AUTHOR: ${{ github.event.pull_request.user.login }}
+        run: |
+          if [ "${{ startsWith(github.event_name, 'pull_request') }}" = "true" ]; then
+            is_pull_request=true
+            cassette_branch="${PR_AUTHOR}-${PR_BRANCH}"
+          else
+            cassette_branch="${{ github.ref_name }}"
+          fi
+
+          cd tests/vcr_cassettes
+          # Commit & push changes to cassettes if any
+          if ! git diff --quiet; then
+            git add .
+            git commit -m "Auto-update cassettes"
+            git push origin HEAD:$cassette_branch
+            if [ ! $is_pull_request ]; then
+              cd ../..
+              git add tests/vcr_cassettes
+              git commit -m "Update cassette submodule"
+              git push origin HEAD:$cassette_branch
+            fi
+            echo "updated=true" >> $GITHUB_OUTPUT
+          else
+            echo "updated=false" >> $GITHUB_OUTPUT
+            echo "No cassette changes to commit"
+          fi
+
+      - name: Post Set up git token auth
+        if: steps.setup_git_auth.outcome == 'success'
+        run: |
+          git config --unset-all '${{ steps.setup_git_auth.outputs.config_key }}'
+          git submodule foreach git config --unset-all '${{ steps.setup_git_auth.outputs.config_key }}'
+
+      - name: Apply "behaviour change" label and comment on PR
+        if: ${{ startsWith(github.event_name, 'pull_request') }}
+        run: |
+          PR_NUMBER="${{ github.event.pull_request.number }}"
+          TOKEN="${{ secrets.PAT_REVIEW }}"
+          REPO="${{ github.repository }}"
+
+          if [[ "${{ steps.push_cassettes.outputs.updated }}" == "true" ]]; then
+            echo "Adding label and comment..."
+            echo $TOKEN | gh auth login --with-token
+            gh issue edit $PR_NUMBER --add-label "behaviour change"
+            gh issue comment $PR_NUMBER --body "You changed AutoGPT's behaviour on ${{ runner.os }}. The cassettes have been updated and will be merged to the submodule when this Pull Request gets merged."
+          fi

      - name: Upload logs to artifact
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: test-logs
-          path: classic/logs/
+          path: classic/forge/logs/
--- a/.github/workflows/classic-frontend-ci.yml
+++ b/.github/workflows/classic-frontend-ci.yml
@@ -0,0 +1,60 @@
+name: Classic - Frontend CI/CD
+
+on:
+  push:
+    branches:
+      - master
+      - dev
+      - 'ci-test*' # This will match any branch that starts with "ci-test"
+    paths:
+      - 'classic/frontend/**'
+      - '.github/workflows/classic-frontend-ci.yml'
+  pull_request:
+    paths:
+      - 'classic/frontend/**'
+      - '.github/workflows/classic-frontend-ci.yml'
+
+jobs:
+  build:
+    permissions:
+      contents: write
+      pull-requests: write
+    runs-on: ubuntu-latest
+    env:
+      BUILD_BRANCH: ${{ format('classic-frontend-build/{0}', github.ref_name) }}
+
+    steps:
+      - name: Checkout Repo
+        uses: actions/checkout@v4
+
+      - name: Setup Flutter
+        uses: subosito/flutter-action@v2
+        with:
+          flutter-version: '3.13.2'
+
+      - name: Build Flutter to Web
+        run: |
+          cd classic/frontend
+          flutter build web --base-href /app/
+
+      # - name: Commit and Push to ${{ env.BUILD_BRANCH }}
+      #   if: github.event_name == 'push'
+      #   run: |
+      #     git config --local user.email "action@github.com"
+      #     git config --local user.name "GitHub Action"
+      #     git add classic/frontend/build/web
+      #     git checkout -B ${{ env.BUILD_BRANCH }}
+      #     git commit -m "Update frontend build to ${GITHUB_SHA:0:7}" -a
+      #     git push -f origin ${{ env.BUILD_BRANCH }}
+
+      - name: Create PR ${{ env.BUILD_BRANCH }} -> ${{ github.ref_name }}
+        if: github.event_name == 'push'
+        uses: peter-evans/create-pull-request@v7
+        with:
+          add-paths: classic/frontend/build/web
+          base: ${{ github.ref_name }}
+          branch: ${{ env.BUILD_BRANCH }}
+          delete-branch: true
+          title: "Update frontend build in `${{ github.ref_name }}`"
+          body: "This PR updates the frontend build based on commit ${{ github.sha }}."
+          commit-message: "Update frontend build based on commit ${{ github.sha }}"
--- a/.github/workflows/classic-python-checks.yml
+++ b/.github/workflows/classic-python-checks.yml
@@ -7,9 +7,7 @@ on:
      - '.github/workflows/classic-python-checks-ci.yml'
      - 'classic/original_autogpt/**'
      - 'classic/forge/**'
-      - 'classic/direct_benchmark/**'
-      - 'classic/pyproject.toml'
-      - 'classic/poetry.lock'
+      - 'classic/benchmark/**'
      - '**.py'
      - '!classic/forge/tests/vcr_cassettes'
  pull_request:
@@ -18,9 +16,7 @@ on:
      - '.github/workflows/classic-python-checks-ci.yml'
      - 'classic/original_autogpt/**'
      - 'classic/forge/**'
-      - 'classic/direct_benchmark/**'
-      - 'classic/pyproject.toml'
-      - 'classic/poetry.lock'
+      - 'classic/benchmark/**'
      - '**.py'
      - '!classic/forge/tests/vcr_cassettes'

@@ -31,13 +27,44 @@ concurrency:
 defaults:
  run:
    shell: bash
-    working-directory: classic

 jobs:
+  get-changed-parts:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+
+      - id: changes-in
+        name: Determine affected subprojects
+        uses: dorny/paths-filter@v3
+        with:
+          filters: |
+            original_autogpt:
+              - classic/original_autogpt/autogpt/**
+              - classic/original_autogpt/tests/**
+              - classic/original_autogpt/poetry.lock
+            forge:
+              - classic/forge/forge/**
+              - classic/forge/tests/**
+              - classic/forge/poetry.lock
+            benchmark:
+              - classic/benchmark/agbenchmark/**
+              - classic/benchmark/tests/**
+              - classic/benchmark/poetry.lock
+    outputs:
+      changed-parts: ${{ steps.changes-in.outputs.changes }}
+
  lint:
+    needs: get-changed-parts
    runs-on: ubuntu-latest
    env:
-      min-python-version: "3.12"
+      min-python-version: "3.10"
+
+    strategy:
+      matrix:
+        sub-package: ${{ fromJson(needs.get-changed-parts.outputs.changed-parts) }}
+      fail-fast: false

    steps:
      - name: Checkout repository
@@ -54,31 +81,42 @@ jobs:
        uses: actions/cache@v4
        with:
          path: ~/.cache/pypoetry
-          key: ${{ runner.os }}-poetry-${{ hashFiles('classic/poetry.lock') }}
+          key: ${{ runner.os }}-poetry-${{ hashFiles(format('classic/{0}/poetry.lock', matrix.sub-package)) }}

      - name: Install Poetry
        run: curl -sSL https://install.python-poetry.org | python3 -

+      # Install dependencies
+
      - name: Install Python dependencies
-        run: poetry install
+        run: poetry -C classic/${{ matrix.sub-package }} install

      # Lint

      - name: Lint (isort)
        run: poetry run isort --check .
+        working-directory: classic/${{ matrix.sub-package }}

      - name: Lint (Black)
        if: success() || failure()
        run: poetry run black --check .
+        working-directory: classic/${{ matrix.sub-package }}

      - name: Lint (Flake8)
        if: success() || failure()
        run: poetry run flake8 .
+        working-directory: classic/${{ matrix.sub-package }}

  types:
+    needs: get-changed-parts
    runs-on: ubuntu-latest
    env:
-      min-python-version: "3.12"
+      min-python-version: "3.10"
+
+    strategy:
+      matrix:
+        sub-package: ${{ fromJson(needs.get-changed-parts.outputs.changed-parts) }}
+      fail-fast: false

    steps:
      - name: Checkout repository
@@ -95,16 +133,19 @@ jobs:
        uses: actions/cache@v4
        with:
          path: ~/.cache/pypoetry
-          key: ${{ runner.os }}-poetry-${{ hashFiles('classic/poetry.lock') }}
+          key: ${{ runner.os }}-poetry-${{ hashFiles(format('classic/{0}/poetry.lock', matrix.sub-package)) }}

      - name: Install Poetry
        run: curl -sSL https://install.python-poetry.org | python3 -

+      # Install dependencies
+
      - name: Install Python dependencies
-        run: poetry install
+        run: poetry -C classic/${{ matrix.sub-package }} install

      # Typecheck

      - name: Typecheck
        if: success() || failure()
        run: poetry run pyright
+        working-directory: classic/${{ matrix.sub-package }}
--- a/.github/workflows/claude-ci-failure-auto-fix.yml
+++ b/.github/workflows/claude-ci-failure-auto-fix.yml
@@ -22,7 +22,7 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
-        uses: actions/checkout@v6
+        uses: actions/checkout@v4
        with:
          ref: ${{ github.event.workflow_run.head_branch }}
          fetch-depth: 0
@@ -40,51 +40,9 @@ jobs:
          git checkout -b "$BRANCH_NAME"
          echo "branch_name=$BRANCH_NAME" >> $GITHUB_OUTPUT

-      # Backend Python/Poetry setup (so Claude can run linting/tests)
-      - name: Set up Python
-        uses: actions/setup-python@v5
-        with:
-          python-version: "3.11"
-
-      - name: Set up Python dependency cache
-        uses: actions/cache@v5
-        with:
-          path: ~/.cache/pypoetry
-          key: poetry-${{ runner.os }}-${{ hashFiles('autogpt_platform/backend/poetry.lock') }}
-
-      - name: Install Poetry
-        run: |
-          cd autogpt_platform/backend
-          HEAD_POETRY_VERSION=$(python3 ../../.github/workflows/scripts/get_package_version_from_lockfile.py poetry)
-          curl -sSL https://install.python-poetry.org | POETRY_VERSION=$HEAD_POETRY_VERSION python3 -
-          echo "$HOME/.local/bin" >> $GITHUB_PATH
-
-      - name: Install Python dependencies
-        working-directory: autogpt_platform/backend
-        run: poetry install
-
-      - name: Generate Prisma Client
-        working-directory: autogpt_platform/backend
-        run: poetry run prisma generate && poetry run gen-prisma-stub
-
-      # Frontend Node.js/pnpm setup (so Claude can run linting/tests)
-      - name: Enable corepack
-        run: corepack enable
-
-      - name: Set up Node.js
-        uses: actions/setup-node@v6
-        with:
-          node-version: "22"
-          cache: "pnpm"
-          cache-dependency-path: autogpt_platform/frontend/pnpm-lock.yaml
-
-      - name: Install JavaScript dependencies
-        working-directory: autogpt_platform/frontend
-        run: pnpm install --frozen-lockfile
-
      - name: Get CI failure details
        id: failure_details
-        uses: actions/github-script@v8
+        uses: actions/github-script@v7
        with:
          script: |
            const run = await github.rest.actions.getWorkflowRun({
--- a/.github/workflows/claude-dependabot.yml
+++ b/.github/workflows/claude-dependabot.yml
@@ -30,7 +30,7 @@ jobs:
      actions: read # Required for CI access
    steps:
      - name: Checkout code
-        uses: actions/checkout@v6
+        uses: actions/checkout@v4
        with:
          fetch-depth: 1

@@ -41,7 +41,7 @@ jobs:
          python-version: "3.11"  # Use standard version matching CI

      - name: Set up Python dependency cache
-        uses: actions/cache@v5
+        uses: actions/cache@v4
        with:
          path: ~/.cache/pypoetry
          key: poetry-${{ runner.os }}-${{ hashFiles('autogpt_platform/backend/poetry.lock') }}
@@ -77,15 +77,27 @@ jobs:
        run: poetry run prisma generate && poetry run gen-prisma-stub

      # Frontend Node.js/pnpm setup (mirrors platform-frontend-ci.yml)
+      - name: Set up Node.js
+        uses: actions/setup-node@v4
+        with:
+          node-version: "22"
+
      - name: Enable corepack
        run: corepack enable

-      - name: Set up Node.js
-        uses: actions/setup-node@v6
+      - name: Set pnpm store directory
+        run: |
+          pnpm config set store-dir ~/.pnpm-store
+          echo "PNPM_HOME=$HOME/.pnpm-store" >> $GITHUB_ENV
+
+      - name: Cache frontend dependencies
+        uses: actions/cache@v4
        with:
-          node-version: "22"
-          cache: "pnpm"
-          cache-dependency-path: autogpt_platform/frontend/pnpm-lock.yaml
+          path: ~/.pnpm-store
+          key: ${{ runner.os }}-pnpm-${{ hashFiles('autogpt_platform/frontend/pnpm-lock.yaml', 'autogpt_platform/frontend/package.json') }}
+          restore-keys: |
+            ${{ runner.os }}-pnpm-${{ hashFiles('autogpt_platform/frontend/pnpm-lock.yaml') }}
+            ${{ runner.os }}-pnpm-

      - name: Install JavaScript dependencies
        working-directory: autogpt_platform/frontend
@@ -112,7 +124,7 @@ jobs:
      # Phase 1: Cache and load Docker images for faster setup
      - name: Set up Docker image cache
        id: docker-cache
-        uses: actions/cache@v5
+        uses: actions/cache@v4
        with:
          path: ~/docker-cache
          # Use a versioned key for cache invalidation when image list changes
@@ -297,7 +309,6 @@ jobs:
        uses: anthropics/claude-code-action@v1
        with:
          claude_code_oauth_token: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
-          allowed_bots: "dependabot[bot]"
          claude_args: |
            --allowedTools "Bash(npm:*),Bash(pnpm:*),Bash(poetry:*),Bash(git:*),Edit,Replace,NotebookEditCell,mcp__github_inline_comment__create_inline_comment,Bash(gh pr comment:*), Bash(gh pr diff:*), Bash(gh pr view:*)"
          prompt: |
--- a/.github/workflows/claude.yml
+++ b/.github/workflows/claude.yml
@@ -40,7 +40,7 @@ jobs:
      actions: read # Required for CI access
    steps:
      - name: Checkout code
-        uses: actions/checkout@v6
+        uses: actions/checkout@v4
        with:
          fetch-depth: 1

@@ -57,7 +57,7 @@ jobs:
          python-version: "3.11"  # Use standard version matching CI

      - name: Set up Python dependency cache
-        uses: actions/cache@v5
+        uses: actions/cache@v4
        with:
          path: ~/.cache/pypoetry
          key: poetry-${{ runner.os }}-${{ hashFiles('autogpt_platform/backend/poetry.lock') }}
@@ -93,15 +93,27 @@ jobs:
        run: poetry run prisma generate && poetry run gen-prisma-stub

      # Frontend Node.js/pnpm setup (mirrors platform-frontend-ci.yml)
+      - name: Set up Node.js
+        uses: actions/setup-node@v4
+        with:
+          node-version: "22"
+
      - name: Enable corepack
        run: corepack enable

-      - name: Set up Node.js
-        uses: actions/setup-node@v6
+      - name: Set pnpm store directory
+        run: |
+          pnpm config set store-dir ~/.pnpm-store
+          echo "PNPM_HOME=$HOME/.pnpm-store" >> $GITHUB_ENV
+
+      - name: Cache frontend dependencies
+        uses: actions/cache@v4
        with:
-          node-version: "22"
-          cache: "pnpm"
-          cache-dependency-path: autogpt_platform/frontend/pnpm-lock.yaml
+          path: ~/.pnpm-store
+          key: ${{ runner.os }}-pnpm-${{ hashFiles('autogpt_platform/frontend/pnpm-lock.yaml', 'autogpt_platform/frontend/package.json') }}
+          restore-keys: |
+            ${{ runner.os }}-pnpm-${{ hashFiles('autogpt_platform/frontend/pnpm-lock.yaml') }}
+            ${{ runner.os }}-pnpm-

      - name: Install JavaScript dependencies
        working-directory: autogpt_platform/frontend
@@ -128,7 +140,7 @@ jobs:
      # Phase 1: Cache and load Docker images for faster setup
      - name: Set up Docker image cache
        id: docker-cache
-        uses: actions/cache@v5
+        uses: actions/cache@v4
        with:
          path: ~/docker-cache
          # Use a versioned key for cache invalidation when image list changes
--- a/.github/workflows/codeql.yml
+++ b/.github/workflows/codeql.yml
@@ -58,11 +58,11 @@ jobs:
        # your codebase is analyzed, see https://docs.github.com/en/code-security/code-scanning/creating-an-advanced-setup-for-code-scanning/codeql-code-scanning-for-compiled-languages
    steps:
    - name: Checkout repository
-      uses: actions/checkout@v6
+      uses: actions/checkout@v4

    # Initializes the CodeQL tools for scanning.
    - name: Initialize CodeQL
-      uses: github/codeql-action/init@v4
+      uses: github/codeql-action/init@v3
      with:
        languages: ${{ matrix.language }}
        build-mode: ${{ matrix.build-mode }}
@@ -93,6 +93,6 @@ jobs:
        exit 1

    - name: Perform CodeQL Analysis
-      uses: github/codeql-action/analyze@v4
+      uses: github/codeql-action/analyze@v3
      with:
        category: "/language:${{matrix.language}}"
--- a/.github/workflows/copilot-setup-steps.yml
+++ b/.github/workflows/copilot-setup-steps.yml
@@ -27,7 +27,7 @@ jobs:
    # If you do not check out your code, Copilot will do this for you.
    steps:
      - name: Checkout code
-        uses: actions/checkout@v6
+        uses: actions/checkout@v4
        with:
          fetch-depth: 0
          submodules: true
@@ -39,7 +39,7 @@ jobs:
          python-version: "3.11"  # Use standard version matching CI

      - name: Set up Python dependency cache
-        uses: actions/cache@v5
+        uses: actions/cache@v4
        with:
          path: ~/.cache/pypoetry
          key: poetry-${{ runner.os }}-${{ hashFiles('autogpt_platform/backend/poetry.lock') }}
@@ -76,7 +76,7 @@ jobs:

      # Frontend Node.js/pnpm setup (mirrors platform-frontend-ci.yml)
      - name: Set up Node.js
-        uses: actions/setup-node@v6
+        uses: actions/setup-node@v4
        with:
          node-version: "22"

@@ -89,7 +89,7 @@ jobs:
          echo "PNPM_HOME=$HOME/.pnpm-store" >> $GITHUB_ENV

      - name: Cache frontend dependencies
-        uses: actions/cache@v5
+        uses: actions/cache@v4
        with:
          path: ~/.pnpm-store
          key: ${{ runner.os }}-pnpm-${{ hashFiles('autogpt_platform/frontend/pnpm-lock.yaml', 'autogpt_platform/frontend/package.json') }}
@@ -132,7 +132,7 @@ jobs:
      # Phase 1: Cache and load Docker images for faster setup
      - name: Set up Docker image cache
        id: docker-cache
-        uses: actions/cache@v5
+        uses: actions/cache@v4
        with:
          path: ~/docker-cache
          # Use a versioned key for cache invalidation when image list changes
--- a/.github/workflows/docs-block-sync.yml
+++ b/.github/workflows/docs-block-sync.yml
@@ -23,7 +23,7 @@ jobs:

    steps:
      - name: Checkout code
-        uses: actions/checkout@v6
+        uses: actions/checkout@v4
        with:
          fetch-depth: 1

@@ -33,7 +33,7 @@ jobs:
          python-version: "3.11"

      - name: Set up Python dependency cache
-        uses: actions/cache@v5
+        uses: actions/cache@v4
        with:
          path: ~/.cache/pypoetry
          key: poetry-${{ runner.os }}-${{ hashFiles('autogpt_platform/backend/poetry.lock') }}
--- a/.github/workflows/docs-claude-review.yml
+++ b/.github/workflows/docs-claude-review.yml
@@ -7,10 +7,6 @@ on:
      - "docs/integrations/**"
      - "autogpt_platform/backend/backend/blocks/**"

-concurrency:
-  group: claude-docs-review-${{ github.event.pull_request.number }}
-  cancel-in-progress: true
-
 jobs:
  claude-review:
    # Only run for PRs from members/collaborators
@@ -27,7 +23,7 @@ jobs:

    steps:
      - name: Checkout code
-        uses: actions/checkout@v6
+        uses: actions/checkout@v4
        with:
          fetch-depth: 0

@@ -37,7 +33,7 @@ jobs:
          python-version: "3.11"

      - name: Set up Python dependency cache
-        uses: actions/cache@v5
+        uses: actions/cache@v4
        with:
          path: ~/.cache/pypoetry
          key: poetry-${{ runner.os }}-${{ hashFiles('autogpt_platform/backend/poetry.lock') }}
@@ -95,35 +91,5 @@ jobs:
            3. Read corresponding documentation files to verify accuracy
            4. Provide your feedback as a PR comment

-            ## IMPORTANT: Comment Marker
-            Start your PR comment with exactly this HTML comment marker on its own line:
-            <!-- CLAUDE_DOCS_REVIEW -->
-
-            This marker is used to identify and replace your comment on subsequent runs.
-
            Be constructive and specific. If everything looks good, say so!
            If there are issues, explain what's wrong and suggest how to fix it.
-
-      - name: Delete old Claude review comments
-        env:
-          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-        run: |
-          # Get all comment IDs with our marker, sorted by creation date (oldest first)
-          COMMENT_IDS=$(gh api \
-            repos/${{ github.repository }}/issues/${{ github.event.pull_request.number }}/comments \
-            --jq '[.[] | select(.body | contains("<!-- CLAUDE_DOCS_REVIEW -->"))] | sort_by(.created_at) | .[].id')
-
-          # Count comments
-          COMMENT_COUNT=$(echo "$COMMENT_IDS" | grep -c . || true)
-
-          if [ "$COMMENT_COUNT" -gt 1 ]; then
-            # Delete all but the last (newest) comment
-            echo "$COMMENT_IDS" | head -n -1 | while read -r COMMENT_ID; do
-              if [ -n "$COMMENT_ID" ]; then
-                echo "Deleting old review comment: $COMMENT_ID"
-                gh api -X DELETE repos/${{ github.repository }}/issues/comments/$COMMENT_ID
-              fi
-            done
-          else
-            echo "No old review comments to clean up"
-          fi
--- a/.github/workflows/docs-enhance.yml
+++ b/.github/workflows/docs-enhance.yml
@@ -28,7 +28,7 @@ jobs:

    steps:
      - name: Checkout code
-        uses: actions/checkout@v6
+        uses: actions/checkout@v4
        with:
          fetch-depth: 1

@@ -38,7 +38,7 @@ jobs:
          python-version: "3.11"

      - name: Set up Python dependency cache
-        uses: actions/cache@v5
+        uses: actions/cache@v4
        with:
          path: ~/.cache/pypoetry
          key: poetry-${{ runner.os }}-${{ hashFiles('autogpt_platform/backend/poetry.lock') }}
--- a/.github/workflows/platform-autogpt-deploy-dev.yaml
+++ b/.github/workflows/platform-autogpt-deploy-dev.yaml
@@ -25,7 +25,7 @@ jobs:

    steps:
      - name: Checkout code
-        uses: actions/checkout@v6
+        uses: actions/checkout@v4
        with:
          ref: ${{ github.event.inputs.git_ref || github.ref_name }}

@@ -52,7 +52,7 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - name: Trigger deploy workflow
-        uses: peter-evans/repository-dispatch@v4
+        uses: peter-evans/repository-dispatch@v3
        with:
          token: ${{ secrets.DEPLOY_TOKEN }}
          repository: Significant-Gravitas/AutoGPT_cloud_infrastructure
--- a/.github/workflows/platform-autogpt-deploy-prod.yml
+++ b/.github/workflows/platform-autogpt-deploy-prod.yml
@@ -17,7 +17,7 @@ jobs:

    steps:
      - name: Checkout code
-        uses: actions/checkout@v6
+        uses: actions/checkout@v4
        with:
          ref: ${{ github.ref_name || 'master' }}

@@ -45,7 +45,7 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - name: Trigger deploy workflow
-        uses: peter-evans/repository-dispatch@v4
+        uses: peter-evans/repository-dispatch@v3
        with:
          token: ${{ secrets.DEPLOY_TOKEN }}
          repository: Significant-Gravitas/AutoGPT_cloud_infrastructure
--- a/.github/workflows/platform-backend-ci.yml
+++ b/.github/workflows/platform-backend-ci.yml
@@ -5,14 +5,12 @@ on:
    branches: [master, dev, ci-test*]
    paths:
      - ".github/workflows/platform-backend-ci.yml"
-      - ".github/workflows/scripts/get_package_version_from_lockfile.py"
      - "autogpt_platform/backend/**"
      - "autogpt_platform/autogpt_libs/**"
  pull_request:
    branches: [master, dev, release-*]
    paths:
      - ".github/workflows/platform-backend-ci.yml"
-      - ".github/workflows/scripts/get_package_version_from_lockfile.py"
      - "autogpt_platform/backend/**"
      - "autogpt_platform/autogpt_libs/**"
  merge_group:
@@ -27,91 +25,10 @@ defaults:
    working-directory: autogpt_platform/backend

 jobs:
-  lint:
-    permissions:
-      contents: read
-    timeout-minutes: 10
-    runs-on: ubuntu-latest
-
-    steps:
-      - name: Checkout repository
-        uses: actions/checkout@v6
-
-      - name: Set up Python 3.12
-        uses: actions/setup-python@v5
-        with:
-          python-version: "3.12"
-
-      - name: Set up Python dependency cache
-        uses: actions/cache@v5
-        with:
-          path: ~/.cache/pypoetry
-          key: poetry-${{ runner.os }}-py3.12-${{ hashFiles('autogpt_platform/backend/poetry.lock') }}
-
-      - name: Install Poetry
-        run: |
-          HEAD_POETRY_VERSION=$(python ../../.github/workflows/scripts/get_package_version_from_lockfile.py poetry)
-          echo "Using Poetry version ${HEAD_POETRY_VERSION}"
-          curl -sSL https://install.python-poetry.org | POETRY_VERSION=$HEAD_POETRY_VERSION python3 -
-
-      - name: Install Python dependencies
-        run: poetry install
-
-      - name: Run Linters
-        run: poetry run lint --skip-pyright
-
-    env:
-      CI: true
-      PLAIN_OUTPUT: True
-
-  type-check:
-    permissions:
-      contents: read
-    timeout-minutes: 10
-    strategy:
-      fail-fast: false
-      matrix:
-        python-version: ["3.11", "3.12", "3.13"]
-    runs-on: ubuntu-latest
-
-    steps:
-      - name: Checkout repository
-        uses: actions/checkout@v6
-
-      - name: Set up Python ${{ matrix.python-version }}
-        uses: actions/setup-python@v5
-        with:
-          python-version: ${{ matrix.python-version }}
-
-      - name: Set up Python dependency cache
-        uses: actions/cache@v5
-        with:
-          path: ~/.cache/pypoetry
-          key: poetry-${{ runner.os }}-py${{ matrix.python-version }}-${{ hashFiles('autogpt_platform/backend/poetry.lock') }}
-
-      - name: Install Poetry
-        run: |
-          HEAD_POETRY_VERSION=$(python ../../.github/workflows/scripts/get_package_version_from_lockfile.py poetry)
-          echo "Using Poetry version ${HEAD_POETRY_VERSION}"
-          curl -sSL https://install.python-poetry.org | POETRY_VERSION=$HEAD_POETRY_VERSION python3 -
-
-      - name: Install Python dependencies
-        run: poetry install
-
-      - name: Generate Prisma Client
-        run: poetry run prisma generate && poetry run gen-prisma-stub
-
-      - name: Run Pyright
-        run: poetry run pyright --pythonversion ${{ matrix.python-version }}
-
-    env:
-      CI: true
-      PLAIN_OUTPUT: True
-
  test:
    permissions:
      contents: read
-    timeout-minutes: 15
+    timeout-minutes: 30
    strategy:
      fail-fast: false
      matrix:
@@ -124,18 +41,13 @@ jobs:
        ports:
          - 6379:6379
      rabbitmq:
-        image: rabbitmq:4.1.4
+        image: rabbitmq:3.12-management
        ports:
          - 5672:5672
+          - 15672:15672
        env:
          RABBITMQ_DEFAULT_USER: ${{ env.RABBITMQ_DEFAULT_USER }}
          RABBITMQ_DEFAULT_PASS: ${{ env.RABBITMQ_DEFAULT_PASS }}
-        options: >-
-          --health-cmd "rabbitmq-diagnostics -q ping"
-          --health-interval 30s
-          --health-timeout 10s
-          --health-retries 5
-          --health-start-period 10s
      clamav:
        image: clamav/clamav-debian:latest
        ports:
@@ -156,7 +68,7 @@ jobs:

    steps:
      - name: Checkout repository
-        uses: actions/checkout@v6
+        uses: actions/checkout@v4
        with:
          fetch-depth: 0
          submodules: true
@@ -176,12 +88,12 @@ jobs:
        run: echo "date=$(date +'%Y-%m-%d')" >> $GITHUB_OUTPUT

      - name: Set up Python dependency cache
-        uses: actions/cache@v5
+        uses: actions/cache@v4
        with:
          path: ~/.cache/pypoetry
-          key: poetry-${{ runner.os }}-py${{ matrix.python-version }}-${{ hashFiles('autogpt_platform/backend/poetry.lock') }}
+          key: poetry-${{ runner.os }}-${{ hashFiles('autogpt_platform/backend/poetry.lock') }}

-      - name: Install Poetry
+      - name: Install Poetry (Unix)
        run: |
          # Extract Poetry version from backend/poetry.lock
          HEAD_POETRY_VERSION=$(python ../../.github/workflows/scripts/get_package_version_from_lockfile.py poetry)
@@ -239,22 +151,22 @@ jobs:
          echo "Waiting for ClamAV daemon to start..."
          max_attempts=60
          attempt=0
-
+          
          until nc -z localhost 3310 || [ $attempt -eq $max_attempts ]; do
            echo "ClamAV is unavailable - sleeping (attempt $((attempt+1))/$max_attempts)"
            sleep 5
            attempt=$((attempt+1))
          done
-
+          
          if [ $attempt -eq $max_attempts ]; then
            echo "ClamAV failed to start after $((max_attempts*5)) seconds"
            echo "Checking ClamAV service logs..."
            docker logs $(docker ps -q --filter "ancestor=clamav/clamav-debian:latest") 2>&1 | tail -50 || echo "No ClamAV container found"
            exit 1
          fi
-
+          
          echo "ClamAV is ready!"
-
+          
          # Verify ClamAV is responsive
          echo "Testing ClamAV connection..."
          timeout 10 bash -c 'echo "PING" | nc localhost 3310' || {
@@ -269,15 +181,18 @@ jobs:
          DATABASE_URL: ${{ steps.supabase.outputs.DB_URL }}
          DIRECT_URL: ${{ steps.supabase.outputs.DB_URL }}

+      - id: lint
+        name: Run Linter
+        run: poetry run lint
+
      - name: Run pytest with coverage
        run: |
          if [[ "${{ runner.debug }}" == "1" ]]; then
-            poetry run pytest -s -vv -o log_cli=true -o log_cli_level=DEBUG \
-              --cov=backend --cov-branch --cov-report term-missing --cov-report xml
+            poetry run pytest -s -vv -o log_cli=true -o log_cli_level=DEBUG
          else
-            poetry run pytest -s -vv \
-              --cov=backend --cov-branch --cov-report term-missing --cov-report xml
+            poetry run pytest -s -vv
          fi
+        if: success() || (failure() && steps.lint.outcome == 'failure')
        env:
          LOG_LEVEL: ${{ runner.debug && 'DEBUG' || 'INFO' }}
          DATABASE_URL: ${{ steps.supabase.outputs.DB_URL }}
@@ -289,14 +204,6 @@ jobs:
          REDIS_PORT: "6379"
          ENCRYPTION_KEY: "dvziYgz0KSK8FENhju0ZYi8-fRTfAdlz6YLhdB_jhNw=" # DO NOT USE IN PRODUCTION!!

-      - name: Upload coverage reports to Codecov
-        if: ${{ !cancelled() }}
-        uses: codecov/codecov-action@v5
-        with:
-          token: ${{ secrets.CODECOV_TOKEN }}
-          flags: platform-backend
-          files: ./autogpt_platform/backend/coverage.xml
-
    env:
      CI: true
      PLAIN_OUTPUT: True
@@ -310,3 +217,9 @@ jobs:
      # the backend service, docker composes, and examples
      RABBITMQ_DEFAULT_USER: "rabbitmq_user_default"
      RABBITMQ_DEFAULT_PASS: "k0VMxyIJF9S35f3x2uaw5IWAl6Y536O7"
+
+      # - name: Upload coverage reports to Codecov
+      #   uses: codecov/codecov-action@v4
+      #   with:
+      #     token: ${{ secrets.CODECOV_TOKEN }}
+      #     flags: backend,${{ runner.os }}
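The hunk above drops the `--health-*` options from the `rabbitmq` service, so the runner no longer has a health check to wait on for that container. If a step needs the broker to be up, it may have to poll for it itself, much as the ClamAV step does. A minimal sketch of a reusable bounded-retry helper, assuming a POSIX shell; `wait_for` is a hypothetical name and is not part of this repository.

```shell
# wait_for MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds, sleeping DELAY_SECONDS between tries.
# Returns 0 on the first success, 1 after MAX_ATTEMPTS failed tries.
# The body runs in a subshell so its variables do not leak to the caller.
wait_for() (
  max_attempts=$1
  delay=$2
  shift 2
  attempt=0
  until "$@"; do
    attempt=$((attempt + 1))
    [ "$attempt" -ge "$max_attempts" ] && exit 1
    sleep "$delay"
  done
  exit 0
)

# Possible use in a workflow step (not executed here):
#   wait_for 60 5 nc -z localhost 5672 || { echo "RabbitMQ did not start"; exit 1; }
```

Unlike a `... || echo "timeout, continuing..."` fallback, this form lets the caller decide whether a timeout should fail the job.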
--- a/.github/workflows/platform-dev-deploy-event-dispatcher.yml
+++ b/.github/workflows/platform-dev-deploy-event-dispatcher.yml
@@ -17,7 +17,7 @@ jobs:
      - name: Check comment permissions and deployment status
        id: check_status
        if: github.event_name == 'issue_comment' && github.event.issue.pull_request
-        uses: actions/github-script@v8
+        uses: actions/github-script@v7
        with:
          script: |
            const commentBody = context.payload.comment.body.trim();
@@ -55,7 +55,7 @@ jobs:

      - name: Post permission denied comment
        if: steps.check_status.outputs.permission_denied == 'true'
-        uses: actions/github-script@v8
+        uses: actions/github-script@v7
        with:
          script: |
            await github.rest.issues.createComment({
@@ -68,7 +68,7 @@ jobs:
      - name: Get PR details for deployment
        id: pr_details
        if: steps.check_status.outputs.should_deploy == 'true' || steps.check_status.outputs.should_undeploy == 'true'
-        uses: actions/github-script@v8
+        uses: actions/github-script@v7
        with:
          script: |
            const pr = await github.rest.pulls.get({
@@ -82,7 +82,7 @@ jobs:
          
      - name: Dispatch Deploy Event
        if: steps.check_status.outputs.should_deploy == 'true'
-        uses: peter-evans/repository-dispatch@v4
+        uses: peter-evans/repository-dispatch@v3
        with:
          token: ${{ secrets.DISPATCH_TOKEN }}
          repository: Significant-Gravitas/AutoGPT_cloud_infrastructure
@@ -98,7 +98,7 @@ jobs:

      - name: Post deploy success comment
        if: steps.check_status.outputs.should_deploy == 'true'
-        uses: actions/github-script@v8
+        uses: actions/github-script@v7
        with:
          script: |
            await github.rest.issues.createComment({
@@ -110,7 +110,7 @@ jobs:

      - name: Dispatch Undeploy Event (from comment)
        if: steps.check_status.outputs.should_undeploy == 'true'
-        uses: peter-evans/repository-dispatch@v4
+        uses: peter-evans/repository-dispatch@v3
        with:
          token: ${{ secrets.DISPATCH_TOKEN }}
          repository: Significant-Gravitas/AutoGPT_cloud_infrastructure
@@ -126,7 +126,7 @@ jobs:

      - name: Post undeploy success comment
        if: steps.check_status.outputs.should_undeploy == 'true'
-        uses: actions/github-script@v8
+        uses: actions/github-script@v7
        with:
          script: |
            await github.rest.issues.createComment({
@@ -139,7 +139,7 @@ jobs:
      - name: Check deployment status on PR close
        id: check_pr_close
        if: github.event_name == 'pull_request' && github.event.action == 'closed'
-        uses: actions/github-script@v8
+        uses: actions/github-script@v7
        with:
          script: |
            const comments = await github.rest.issues.listComments({
@@ -168,7 +168,7 @@ jobs:
          github.event_name == 'pull_request' &&
          github.event.action == 'closed' &&
          steps.check_pr_close.outputs.should_undeploy == 'true'
-        uses: peter-evans/repository-dispatch@v4
+        uses: peter-evans/repository-dispatch@v3
        with:
          token: ${{ secrets.DISPATCH_TOKEN }}
          repository: Significant-Gravitas/AutoGPT_cloud_infrastructure
@@ -187,7 +187,7 @@ jobs:
          github.event_name == 'pull_request' &&
          github.event.action == 'closed' &&
          steps.check_pr_close.outputs.should_undeploy == 'true'
-        uses: actions/github-script@v8
+        uses: actions/github-script@v7
        with:
          script: |
            await github.rest.issues.createComment({
--- a/.github/workflows/platform-frontend-ci.yml
+++ b/.github/workflows/platform-frontend-ci.yml
@@ -6,16 +6,10 @@ on:
    paths:
      - ".github/workflows/platform-frontend-ci.yml"
      - "autogpt_platform/frontend/**"
-      - "autogpt_platform/backend/Dockerfile"
-      - "autogpt_platform/docker-compose.yml"
-      - "autogpt_platform/docker-compose.platform.yml"
  pull_request:
    paths:
      - ".github/workflows/platform-frontend-ci.yml"
      - "autogpt_platform/frontend/**"
-      - "autogpt_platform/backend/Dockerfile"
-      - "autogpt_platform/docker-compose.yml"
-      - "autogpt_platform/docker-compose.platform.yml"
  merge_group:
  workflow_dispatch:

@@ -32,31 +26,34 @@ jobs:
  setup:
    runs-on: ubuntu-latest
    outputs:
-      components-changed: ${{ steps.filter.outputs.components }}
+      cache-key: ${{ steps.cache-key.outputs.key }}

    steps:
      - name: Checkout repository
-        uses: actions/checkout@v6
+        uses: actions/checkout@v4

-      - name: Check for component changes
-        uses: dorny/paths-filter@v3
-        id: filter
+      - name: Set up Node.js
+        uses: actions/setup-node@v4
        with:
-          filters: |
-            components:
-              - 'autogpt_platform/frontend/src/components/**'
+          node-version: "22.18.0"

      - name: Enable corepack
        run: corepack enable

-      - name: Set up Node
-        uses: actions/setup-node@v6
-        with:
-          node-version: "22.18.0"
-          cache: "pnpm"
-          cache-dependency-path: autogpt_platform/frontend/pnpm-lock.yaml
+      - name: Generate cache key
+        id: cache-key
+        run: echo "key=${{ runner.os }}-pnpm-${{ hashFiles('autogpt_platform/frontend/pnpm-lock.yaml', 'autogpt_platform/frontend/package.json') }}" >> $GITHUB_OUTPUT

-      - name: Install dependencies to populate cache
+      - name: Cache dependencies
+        uses: actions/cache@v4
+        with:
+          path: ~/.pnpm-store
+          key: ${{ steps.cache-key.outputs.key }}
+          restore-keys: |
+            ${{ runner.os }}-pnpm-${{ hashFiles('autogpt_platform/frontend/pnpm-lock.yaml') }}
+            ${{ runner.os }}-pnpm-
+
+      - name: Install dependencies
        run: pnpm install --frozen-lockfile

  lint:
@@ -65,17 +62,24 @@ jobs:

    steps:
      - name: Checkout repository
-        uses: actions/checkout@v6
+        uses: actions/checkout@v4
+
+      - name: Set up Node.js
+        uses: actions/setup-node@v4
+        with:
+          node-version: "22.18.0"

      - name: Enable corepack
        run: corepack enable

-      - name: Set up Node
-        uses: actions/setup-node@v6
+      - name: Restore dependencies cache
+        uses: actions/cache@v4
        with:
-          node-version: "22.18.0"
-          cache: "pnpm"
-          cache-dependency-path: autogpt_platform/frontend/pnpm-lock.yaml
+          path: ~/.pnpm-store
+          key: ${{ needs.setup.outputs.cache-key }}
+          restore-keys: |
+            ${{ runner.os }}-pnpm-${{ hashFiles('autogpt_platform/frontend/pnpm-lock.yaml') }}
+            ${{ runner.os }}-pnpm-

      - name: Install dependencies
        run: pnpm install --frozen-lockfile
@@ -86,27 +90,31 @@ jobs:
  chromatic:
    runs-on: ubuntu-latest
    needs: setup
-    # Disabled: to re-enable, remove 'false &&' from the condition below
-    if: >-
-      false
-      && (github.ref == 'refs/heads/dev' || github.base_ref == 'dev')
-      && needs.setup.outputs.components-changed == 'true'
+    # Only run on dev branch pushes or PRs targeting dev
+    if: github.ref == 'refs/heads/dev' || github.base_ref == 'dev'

    steps:
      - name: Checkout repository
-        uses: actions/checkout@v6
+        uses: actions/checkout@v4
        with:
          fetch-depth: 0

+      - name: Set up Node.js
+        uses: actions/setup-node@v4
+        with:
+          node-version: "22.18.0"
+
      - name: Enable corepack
        run: corepack enable

-      - name: Set up Node
-        uses: actions/setup-node@v6
+      - name: Restore dependencies cache
+        uses: actions/cache@v4
        with:
-          node-version: "22.18.0"
-          cache: "pnpm"
-          cache-dependency-path: autogpt_platform/frontend/pnpm-lock.yaml
+          path: ~/.pnpm-store
+          key: ${{ needs.setup.outputs.cache-key }}
+          restore-keys: |
+            ${{ runner.os }}-pnpm-${{ hashFiles('autogpt_platform/frontend/pnpm-lock.yaml') }}
+            ${{ runner.os }}-pnpm-

      - name: Install dependencies
        run: pnpm install --frozen-lockfile
@@ -120,25 +128,163 @@ jobs:
          token: ${{ secrets.GITHUB_TOKEN }}
          exitOnceUploaded: true

+  e2e_test:
+    runs-on: big-boi
+    needs: setup
+    strategy:
+      fail-fast: false
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+        with:
+          submodules: recursive
+
+      - name: Set up Node.js
+        uses: actions/setup-node@v4
+        with:
+          node-version: "22.18.0"
+
+      - name: Enable corepack
+        run: corepack enable
+
+      - name: Copy default supabase .env
+        run: |
+          cp ../.env.default ../.env
+
+      - name: Copy backend .env and set OpenAI API key
+        run: |
+          cp ../backend/.env.default ../backend/.env
+          echo "OPENAI_INTERNAL_API_KEY=${{ secrets.OPENAI_API_KEY }}" >> ../backend/.env
+        env:
+          # Used by E2E test data script to generate embeddings for approved store agents
+          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
+
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v3
+
+      - name: Cache Docker layers
+        uses: actions/cache@v4
+        with:
+          path: /tmp/.buildx-cache
+          key: ${{ runner.os }}-buildx-frontend-test-${{ hashFiles('autogpt_platform/docker-compose.yml', 'autogpt_platform/backend/Dockerfile', 'autogpt_platform/backend/pyproject.toml', 'autogpt_platform/backend/poetry.lock') }}
+          restore-keys: |
+            ${{ runner.os }}-buildx-frontend-test-
+
+      - name: Run docker compose
+        run: |
+          NEXT_PUBLIC_PW_TEST=true docker compose -f ../docker-compose.yml up -d
+        env:
+          DOCKER_BUILDKIT: 1
+          BUILDX_CACHE_FROM: type=local,src=/tmp/.buildx-cache
+          BUILDX_CACHE_TO: type=local,dest=/tmp/.buildx-cache-new,mode=max
+
+      - name: Move cache
+        run: |
+          rm -rf /tmp/.buildx-cache
+          if [ -d "/tmp/.buildx-cache-new" ]; then
+            mv /tmp/.buildx-cache-new /tmp/.buildx-cache
+          fi
+
+      - name: Wait for services to be ready
+        run: |
+          echo "Waiting for rest_server to be ready..."
+          timeout 60 sh -c 'until curl -f http://localhost:8006/health 2>/dev/null; do sleep 2; done' || echo "Rest server health check timeout, continuing..."
+          echo "Waiting for database to be ready..."
+          timeout 60 sh -c 'until docker compose -f ../docker-compose.yml exec -T db pg_isready -U postgres 2>/dev/null; do sleep 2; done' || echo "Database ready check timeout, continuing..."
+
+      - name: Create E2E test data
+        run: |
+          echo "Creating E2E test data..."
+          # First try to run the script from inside the container
+          if docker compose -f ../docker-compose.yml exec -T rest_server test -f /app/autogpt_platform/backend/test/e2e_test_data.py; then
+            echo "✅ Found e2e_test_data.py in container, running it..."
+            docker compose -f ../docker-compose.yml exec -T rest_server sh -c "cd /app/autogpt_platform && python backend/test/e2e_test_data.py" || {
+              echo "❌ E2E test data creation failed!"
+              docker compose -f ../docker-compose.yml logs --tail=50 rest_server
+              exit 1
+            }
+          else
+            echo "⚠️ e2e_test_data.py not found in container, copying and running..."
+            # Copy the script into the container and run it
+            docker cp ../backend/test/e2e_test_data.py $(docker compose -f ../docker-compose.yml ps -q rest_server):/tmp/e2e_test_data.py || {
+              echo "❌ Failed to copy script to container"
+              exit 1
+            }
+            docker compose -f ../docker-compose.yml exec -T rest_server sh -c "cd /app/autogpt_platform && python /tmp/e2e_test_data.py" || {
+              echo "❌ E2E test data creation failed!"
+              docker compose -f ../docker-compose.yml logs --tail=50 rest_server
+              exit 1
+            }
+          fi
+
+      - name: Restore dependencies cache
+        uses: actions/cache@v4
+        with:
+          path: ~/.pnpm-store
+          key: ${{ needs.setup.outputs.cache-key }}
+          restore-keys: |
+            ${{ runner.os }}-pnpm-${{ hashFiles('autogpt_platform/frontend/pnpm-lock.yaml') }}
+            ${{ runner.os }}-pnpm-
+
+      - name: Install dependencies
+        run: pnpm install --frozen-lockfile
+
+      - name: Install Browser 'chromium'
+        run: pnpm playwright install --with-deps chromium
+
+      - name: Run Playwright tests
+        run: pnpm test:no-build
+        continue-on-error: false
+
+      - name: Upload Playwright report
+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: playwright-report
+          path: playwright-report
+          if-no-files-found: ignore
+          retention-days: 3
+
+      - name: Upload Playwright test results
+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: playwright-test-results
+          path: test-results
+          if-no-files-found: ignore
+          retention-days: 3
+
+      - name: Print Final Docker Compose logs
+        if: always()
+        run: docker compose -f ../docker-compose.yml logs
+
  integration_test:
    runs-on: ubuntu-latest
    needs: setup

    steps:
      - name: Checkout repository
-        uses: actions/checkout@v6
+        uses: actions/checkout@v4
        with:
          submodules: recursive

+      - name: Set up Node.js
+        uses: actions/setup-node@v4
+        with:
+          node-version: "22.18.0"
+
      - name: Enable corepack
        run: corepack enable

-      - name: Set up Node
-        uses: actions/setup-node@v6
+      - name: Restore dependencies cache
+        uses: actions/cache@v4
        with:
-          node-version: "22.18.0"
-          cache: "pnpm"
-          cache-dependency-path: autogpt_platform/frontend/pnpm-lock.yaml
+          path: ~/.pnpm-store
+          key: ${{ needs.setup.outputs.cache-key }}
+          restore-keys: |
+            ${{ runner.os }}-pnpm-${{ hashFiles('autogpt_platform/frontend/pnpm-lock.yaml') }}
+            ${{ runner.os }}-pnpm-

      - name: Install dependencies
        run: pnpm install --frozen-lockfile
@@ -148,11 +294,3 @@ jobs:

      - name: Run Integration Tests
        run: pnpm test:unit
-
-      - name: Upload coverage reports to Codecov
-        if: ${{ !cancelled() }}
-        uses: codecov/codecov-action@v5
-        with:
-          token: ${{ secrets.CODECOV_TOKEN }}
-          flags: platform-frontend
-          files: ./autogpt_platform/frontend/coverage/cobertura-coverage.xml
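The `Move cache` step in the `e2e_test` job deletes `/tmp/.buildx-cache` before checking whether `/tmp/.buildx-cache-new` exists, so a build that wrote no new cache leaves the job with no cache at all. A minimal sketch of a variant that only replaces the old directory when a new one is present; `rotate_cache` is a hypothetical helper name used for illustration.

```shell
# rotate_cache OLD_DIR NEW_DIR
# Replaces OLD_DIR with NEW_DIR, but only if NEW_DIR exists.
# If NEW_DIR is missing, OLD_DIR is left untouched.
rotate_cache() {
  old_dir=$1
  new_dir=$2
  if [ -d "$new_dir" ]; then
    rm -rf "$old_dir"
    mv "$new_dir" "$old_dir"
  fi
}

# Possible use in the workflow step (not executed here):
#   rotate_cache /tmp/.buildx-cache /tmp/.buildx-cache-new
```

Whether keeping a stale cache is preferable to starting empty depends on how the cache key is computed, so treat this as one option, not a required change.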
--- a/.github/workflows/platform-fullstack-ci.yml
+++ b/.github/workflows/platform-fullstack-ci.yml
@@ -1,18 +1,14 @@
-name: AutoGPT Platform - Full-stack CI
+name: AutoGPT Platform - Frontend CI

 on:
  push:
    branches: [master, dev]
    paths:
      - ".github/workflows/platform-fullstack-ci.yml"
-      - ".github/workflows/scripts/docker-ci-fix-compose-build-cache.py"
-      - ".github/workflows/scripts/get_package_version_from_lockfile.py"
      - "autogpt_platform/**"
  pull_request:
    paths:
      - ".github/workflows/platform-fullstack-ci.yml"
-      - ".github/workflows/scripts/docker-ci-fix-compose-build-cache.py"
-      - ".github/workflows/scripts/get_package_version_from_lockfile.py"
      - "autogpt_platform/**"
  merge_group:

@@ -28,308 +24,113 @@ defaults:
 jobs:
  setup:
    runs-on: ubuntu-latest
+    outputs:
+      cache-key: ${{ steps.cache-key.outputs.key }}

    steps:
      - name: Checkout repository
-        uses: actions/checkout@v6
+        uses: actions/checkout@v4
+
+      - name: Set up Node.js
+        uses: actions/setup-node@v4
+        with:
+          node-version: "22.18.0"

      - name: Enable corepack
        run: corepack enable

-      - name: Set up Node
-        uses: actions/setup-node@v6
-        with:
-          node-version: "22.18.0"
-          cache: "pnpm"
-          cache-dependency-path: autogpt_platform/frontend/pnpm-lock.yaml
+      - name: Generate cache key
+        id: cache-key
+        run: echo "key=${{ runner.os }}-pnpm-${{ hashFiles('autogpt_platform/frontend/pnpm-lock.yaml', 'autogpt_platform/frontend/package.json') }}" >> $GITHUB_OUTPUT

-      - name: Install dependencies to populate cache
+      - name: Cache dependencies
+        uses: actions/cache@v4
+        with:
+          path: ~/.pnpm-store
+          key: ${{ steps.cache-key.outputs.key }}
+          restore-keys: |
+            ${{ runner.os }}-pnpm-${{ hashFiles('autogpt_platform/frontend/pnpm-lock.yaml') }}
+            ${{ runner.os }}-pnpm-
+
+      - name: Install dependencies
        run: pnpm install --frozen-lockfile

-  check-api-types:
-    name: check API types
+  types:
    runs-on: ubuntu-latest
    needs: setup
+    strategy:
+      fail-fast: false

    steps:
      - name: Checkout repository
-        uses: actions/checkout@v6
+        uses: actions/checkout@v4
        with:
          submodules: recursive

-      # ------------------------ Backend setup ------------------------
-
-      - name: Set up Backend - Set up Python
-        uses: actions/setup-python@v5
-        with:
-          python-version: "3.12"
-
-      - name: Set up Backend - Install Poetry
-        working-directory: autogpt_platform/backend
-        run: |
-          POETRY_VERSION=$(python ../../.github/workflows/scripts/get_package_version_from_lockfile.py poetry)
-          echo "Installing Poetry version ${POETRY_VERSION}"
-          curl -sSL https://install.python-poetry.org | POETRY_VERSION=$POETRY_VERSION python3 -
-
-      - name: Set up Backend - Set up dependency cache
-        uses: actions/cache@v5
-        with:
-          path: ~/.cache/pypoetry
-          key: poetry-${{ runner.os }}-${{ hashFiles('autogpt_platform/backend/poetry.lock') }}
-
-      - name: Set up Backend - Install dependencies
-        working-directory: autogpt_platform/backend
-        run: poetry install
-
-      - name: Set up Backend - Generate Prisma client
-        working-directory: autogpt_platform/backend
-        run: poetry run prisma generate && poetry run gen-prisma-stub
-
-      - name: Set up Frontend - Export OpenAPI schema from Backend
-        working-directory: autogpt_platform/backend
-        run: poetry run export-api-schema --output ../frontend/src/app/api/openapi.json
-
-      # ------------------------ Frontend setup ------------------------
-
-      - name: Set up Frontend - Enable corepack
-        run: corepack enable
-
-      - name: Set up Frontend - Set up Node
-        uses: actions/setup-node@v6
+      - name: Set up Node.js
+        uses: actions/setup-node@v4
        with:
          node-version: "22.18.0"
-          cache: "pnpm"
-          cache-dependency-path: autogpt_platform/frontend/pnpm-lock.yaml

-      - name: Set up Frontend - Install dependencies
+      - name: Enable corepack
+        run: corepack enable
+
+      - name: Copy default supabase .env
+        run: |
+          cp ../.env.default ../.env
+
+      - name: Copy backend .env
+        run: |
+          cp ../backend/.env.default ../backend/.env
+
+      - name: Run docker compose
+        run: |
+          docker compose -f ../docker-compose.yml --profile local --profile deps_backend up -d
+
+      - name: Restore dependencies cache
+        uses: actions/cache@v4
+        with:
+          path: ~/.pnpm-store
+          key: ${{ needs.setup.outputs.cache-key }}
+          restore-keys: |
+            ${{ runner.os }}-pnpm-
+
+      - name: Install dependencies
        run: pnpm install --frozen-lockfile

-      - name: Set up Frontend - Format OpenAPI schema
-        id: format-schema
-        run: pnpm prettier --write ./src/app/api/openapi.json
+      - name: Setup .env
+        run: cp .env.default .env
+
+      - name: Wait for services to be ready
+        run: |
+          echo "Waiting for rest_server to be ready..."
+          timeout 60 sh -c 'until curl -f http://localhost:8006/health 2>/dev/null; do sleep 2; done' || echo "Rest server health check timeout, continuing..."
+          echo "Waiting for database to be ready..."
+          timeout 60 sh -c 'until docker compose -f ../docker-compose.yml exec -T db pg_isready -U postgres 2>/dev/null; do sleep 2; done' || echo "Database ready check timeout, continuing..."
+
+      - name: Generate API queries
+        run: pnpm generate:api:force

      - name: Check for API schema changes
        run: |
          if ! git diff --exit-code src/app/api/openapi.json; then
            echo "❌ API schema changes detected in src/app/api/openapi.json"
            echo ""
-            echo "The openapi.json file has been modified after exporting the API schema."
+            echo "The openapi.json file has been modified after running 'pnpm generate:api:force'."
            echo "This usually means changes have been made in the BE endpoints without updating the Frontend."
            echo "The API schema is now out of sync with the Front-end queries."
            echo ""
            echo "To fix this:"
-            echo "\nIn the backend directory:"
-            echo "1. Run 'poetry run export-api-schema --output ../frontend/src/app/api/openapi.json'"
-            echo "\nIn the frontend directory:"
-            echo "2. Run 'pnpm prettier --write src/app/api/openapi.json'"
-            echo "3. Run 'pnpm generate:api'"
-            echo "4. Run 'pnpm types'"
-            echo "5. Fix any TypeScript errors that may have been introduced"
-            echo "6. Commit and push your changes"
+            echo "1. Pull and rebuild the backend: 'docker compose pull && docker compose up -d --build --force-recreate'"
+            echo "2. Run 'pnpm generate:api' locally"
+            echo "3. Run 'pnpm types' locally"
+            echo "4. Fix any TypeScript errors that may have been introduced"
+            echo "5. Commit and push your changes"
            echo ""
            exit 1
          else
            echo "✅ No API schema changes detected"
          fi

-      - name: Set up Frontend - Generate API client
-        id: generate-api-client
-        run: pnpm orval --config ./orval.config.ts
-        # Continue with type generation & check even if there are schema changes
-        if: success() || (steps.format-schema.outcome == 'success')
-
-      - name: Check for TypeScript errors
+      - name: Run TypeScript checks
        run: pnpm types
-        if: success() || (steps.generate-api-client.outcome == 'success')
-
-  e2e_test:
-    name: end-to-end tests
-    runs-on: big-boi
-
-    steps:
-      - name: Checkout repository
-        uses: actions/checkout@v6
-        with:
-          submodules: recursive
-
-      - name: Set up Platform - Copy default supabase .env
-        run: |
-          cp ../.env.default ../.env
-
-      - name: Set up Platform - Copy backend .env and set OpenAI API key
-        run: |
-          cp ../backend/.env.default ../backend/.env
-          echo "OPENAI_INTERNAL_API_KEY=${{ secrets.OPENAI_API_KEY }}" >> ../backend/.env
-        env:
-          # Used by E2E test data script to generate embeddings for approved store agents
-          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
-
-      - name: Set up Platform - Set up Docker Buildx
-        uses: docker/setup-buildx-action@v3
-        with:
-          driver: docker-container
-          driver-opts: network=host
-
-      - name: Set up Platform - Expose GHA cache to docker buildx CLI
-        uses: crazy-max/ghaction-github-runtime@v4
-
-      - name: Set up Platform - Build Docker images (with cache)
-        working-directory: autogpt_platform
-        run: |
-          pip install pyyaml
-
-          # Resolve extends and generate a flat compose file that bake can understand
-          export NEXT_PUBLIC_SOURCEMAPS NEXT_PUBLIC_PW_TEST
-          docker compose -f docker-compose.yml config > docker-compose.resolved.yml
-
-          # Ensure NEXT_PUBLIC_SOURCEMAPS is in resolved compose
-          # (docker compose config on some versions drops this arg)
-          if ! grep -q "NEXT_PUBLIC_SOURCEMAPS" docker-compose.resolved.yml; then
-            echo "Injecting NEXT_PUBLIC_SOURCEMAPS into resolved compose (docker compose config dropped it)"
-            sed -i '/NEXT_PUBLIC_PW_TEST/a\        NEXT_PUBLIC_SOURCEMAPS: "true"' docker-compose.resolved.yml
-          fi
-
-          # Add cache configuration to the resolved compose file
-          python ../.github/workflows/scripts/docker-ci-fix-compose-build-cache.py \
-            --source docker-compose.resolved.yml \
-            --cache-from "type=gha" \
-            --cache-to "type=gha,mode=max" \
-            --backend-hash "${{ hashFiles('autogpt_platform/backend/Dockerfile', 'autogpt_platform/backend/poetry.lock', 'autogpt_platform/backend/backend/**') }}" \
-            --frontend-hash "${{ hashFiles('autogpt_platform/frontend/Dockerfile', 'autogpt_platform/frontend/pnpm-lock.yaml', 'autogpt_platform/frontend/src/**') }}-sourcemaps" \
-            --git-ref "${{ github.ref }}"
-
-          # Build with bake using the resolved compose file (now includes cache config)
-          docker buildx bake --allow=fs.read=.. -f docker-compose.resolved.yml --load
-        env:
-          NEXT_PUBLIC_PW_TEST: true
-          NEXT_PUBLIC_SOURCEMAPS: true
-
-      - name: Set up tests - Cache E2E test data
-        id: e2e-data-cache
-        uses: actions/cache@v5
-        with:
-          path: /tmp/e2e_test_data.sql
-          key: e2e-test-data-${{ hashFiles('autogpt_platform/backend/test/e2e_test_data.py', 'autogpt_platform/backend/migrations/**', '.github/workflows/platform-fullstack-ci.yml') }}
-
-      - name: Set up Platform - Start Supabase DB + Auth
-        run: |
-          docker compose -f ../docker-compose.resolved.yml up -d db auth --no-build
-          echo "Waiting for database to be ready..."
-          timeout 60 sh -c 'until docker compose -f ../docker-compose.resolved.yml exec -T db pg_isready -U postgres 2>/dev/null; do sleep 2; done'
-          echo "Waiting for auth service to be ready..."
-          timeout 60 sh -c 'until docker compose -f ../docker-compose.resolved.yml exec -T db psql -U postgres -d postgres -c "SELECT 1 FROM auth.users LIMIT 1" 2>/dev/null; do sleep 2; done' || echo "Auth schema check timeout, continuing..."
-
-      - name: Set up Platform - Run migrations
-        run: |
-          echo "Running migrations..."
-          docker compose -f ../docker-compose.resolved.yml run --rm migrate
-          echo "✅ Migrations completed"
-        env:
-          NEXT_PUBLIC_PW_TEST: true
-
-      - name: Set up tests - Load cached E2E test data
-        if: steps.e2e-data-cache.outputs.cache-hit == 'true'
-        run: |
-          echo "✅ Found cached E2E test data, restoring..."
-          {
-            echo "SET session_replication_role = 'replica';"
-            cat /tmp/e2e_test_data.sql
-            echo "SET session_replication_role = 'origin';"
-          } | docker compose -f ../docker-compose.resolved.yml exec -T db psql -U postgres -d postgres -b
-          # Refresh materialized views after restore
-          docker compose -f ../docker-compose.resolved.yml exec -T db \
-            psql -U postgres -d postgres -b -c "SET search_path TO platform; SELECT refresh_store_materialized_views();" || true
-
-          echo "✅ E2E test data restored from cache"
-
-      - name: Set up Platform - Start (all other services)
-        run: |
-          docker compose -f ../docker-compose.resolved.yml up -d --no-build
-          echo "Waiting for rest_server to be ready..."
-          timeout 60 sh -c 'until curl -f http://localhost:8006/health 2>/dev/null; do sleep 2; done' || echo "Rest server health check timeout, continuing..."
-        env:
-          NEXT_PUBLIC_PW_TEST: true
-
-      - name: Set up tests - Create E2E test data
-        if: steps.e2e-data-cache.outputs.cache-hit != 'true'
-        run: |
-          echo "Creating E2E test data..."
-          docker cp ../backend/test/e2e_test_data.py $(docker compose -f ../docker-compose.resolved.yml ps -q rest_server):/tmp/e2e_test_data.py
-          docker compose -f ../docker-compose.resolved.yml exec -T rest_server sh -c "cd /app/autogpt_platform && python /tmp/e2e_test_data.py" || {
-            echo "❌ E2E test data creation failed!"
-            docker compose -f ../docker-compose.resolved.yml logs --tail=50 rest_server
-            exit 1
-          }
-
-          # Dump auth.users + platform schema for cache (two separate dumps)
-          echo "Dumping database for cache..."
-          {
-            docker compose -f ../docker-compose.resolved.yml exec -T db \
-              pg_dump -U postgres --data-only --column-inserts \
-              --table='auth.users' postgres
-            docker compose -f ../docker-compose.resolved.yml exec -T db \
-              pg_dump -U postgres --data-only --column-inserts \
-              --schema=platform \
-              --exclude-table='platform._prisma_migrations' \
-              --exclude-table='platform.apscheduler_jobs' \
-              --exclude-table='platform.apscheduler_jobs_batched_notifications' \
-              postgres
-          } > /tmp/e2e_test_data.sql
-
-          echo "✅ Database dump created for caching ($(wc -l < /tmp/e2e_test_data.sql) lines)"
-
-      - name: Set up tests - Enable corepack
-        run: corepack enable
-
-      - name: Set up tests - Set up Node
-        uses: actions/setup-node@v6
-        with:
-          node-version: "22.18.0"
-          cache: "pnpm"
-          cache-dependency-path: autogpt_platform/frontend/pnpm-lock.yaml
-
-      - name: Copy source maps from Docker for E2E coverage
-        run: |
-          FRONTEND_CONTAINER=$(docker compose -f ../docker-compose.resolved.yml ps -q frontend)
-          docker cp "$FRONTEND_CONTAINER":/app/.next/static .next-static-coverage
-
-      - name: Set up tests - Install dependencies
-        run: pnpm install --frozen-lockfile
-
-      - name: Set up tests - Install browser 'chromium'
-        run: pnpm playwright install --with-deps chromium
-
-      - name: Run Playwright tests
-        run: pnpm test:no-build
-        continue-on-error: false
-
-      - name: Upload E2E coverage to Codecov
-        if: ${{ !cancelled() }}
-        uses: codecov/codecov-action@v5
-        with:
-          token: ${{ secrets.CODECOV_TOKEN }}
-          flags: platform-frontend-e2e
-          files: ./autogpt_platform/frontend/coverage/e2e/cobertura-coverage.xml
-          disable_search: true
-
-      - name: Upload Playwright report
-        if: always()
-        uses: actions/upload-artifact@v4
-        with:
-          name: playwright-report
-          path: autogpt_platform/frontend/playwright-report
-          if-no-files-found: ignore
-          retention-days: 3
-
-      - name: Upload Playwright test results
-        if: always()
-        uses: actions/upload-artifact@v4
-        with:
-          name: playwright-test-results
-          path: autogpt_platform/frontend/test-results
-          if-no-files-found: ignore
-          retention-days: 3
-
-      - name: Print Final Docker Compose logs
-        if: always()
-        run: docker compose -f ../docker-compose.resolved.yml logs
--- a/.github/workflows/pr-overlap-check.yml
+++ b/.github/workflows/pr-overlap-check.yml
@@ -1,39 +0,0 @@
-name: PR Overlap Detection
-
-on:
-  pull_request:
-    types: [opened, synchronize, reopened]
-    branches:
-      - dev
-      - master
-
-permissions:
-  contents: read
-  pull-requests: write
-
-jobs:
-  check-overlaps:
-    runs-on: ubuntu-latest
-    steps:
-      - name: Checkout repository
-        uses: actions/checkout@v4
-        with:
-          fetch-depth: 0  # Need full history for merge testing
-
-      - name: Set up Python
-        uses: actions/setup-python@v5
-        with:
-          python-version: '3.11'
-
-      - name: Configure git
-        run: |
-          git config user.email "github-actions[bot]@users.noreply.github.com"
-          git config user.name "github-actions[bot]"
-
-      - name: Run overlap detection
-        env:
-          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-        # Always succeed - this check informs contributors, it shouldn't block merging
-        continue-on-error: true
-        run: |
-          python .github/scripts/detect_overlaps.py ${{ github.event.pull_request.number }}
--- a/.github/workflows/repo-workflow-checker.yml
+++ b/.github/workflows/repo-workflow-checker.yml
@@ -11,7 +11,7 @@ jobs:
    steps:
      # - name: Wait some time for all actions to start
      #   run: sleep 30
-      - uses: actions/checkout@v6
+      - uses: actions/checkout@v4
        # with:
          # fetch-depth: 0
      - name: Set up Python
--- a/.github/workflows/scripts/docker-ci-fix-compose-build-cache.py
+++ b/.github/workflows/scripts/docker-ci-fix-compose-build-cache.py
@@ -1,195 +0,0 @@
-#!/usr/bin/env python3
-"""
-Add cache configuration to a resolved docker-compose file for all services
-that have a build key, and ensure image names match what docker compose expects.
-"""
-
-import argparse
-
-import yaml
-
-
-DEFAULT_BRANCH = "dev"
-CACHE_BUILDS_FOR_COMPONENTS = ["backend", "frontend"]
-
-
-def main():
-    parser = argparse.ArgumentParser(
-        description="Add cache config to a resolved compose file"
-    )
-    parser.add_argument(
-        "--source",
-        required=True,
-        help="Source compose file to read (should be output of `docker compose config`)",
-    )
-    parser.add_argument(
-        "--cache-from",
-        default="type=gha",
-        help="Cache source configuration",
-    )
-    parser.add_argument(
-        "--cache-to",
-        default="type=gha,mode=max",
-        help="Cache destination configuration",
-    )
-    for component in CACHE_BUILDS_FOR_COMPONENTS:
-        parser.add_argument(
-            f"--{component}-hash",
-            default="",
-            help=f"Hash for {component} cache scope (e.g., from hashFiles())",
-        )
-    parser.add_argument(
-        "--git-ref",
-        default="",
-        help="Git ref for branch-based cache scope (e.g., refs/heads/master)",
-    )
-    args = parser.parse_args()
-
-    # Normalize git ref to a safe scope name (e.g., refs/heads/master -> master)
-    git_ref_scope = ""
-    if args.git_ref:
-        git_ref_scope = args.git_ref.replace("refs/heads/", "").replace("/", "-")
-
-    with open(args.source, "r") as f:
-        compose = yaml.safe_load(f)
-
-    # Get project name from compose file or default
-    project_name = compose.get("name", "autogpt_platform")
-
-    def get_image_name(dockerfile: str, target: str) -> str:
-        """Generate image name based on Dockerfile folder and build target."""
-        dockerfile_parts = dockerfile.replace("\\", "/").split("/")
-        if len(dockerfile_parts) >= 2:
-            folder_name = dockerfile_parts[-2]  # e.g., "backend" or "frontend"
-        else:
-            folder_name = "app"
-        return f"{project_name}-{folder_name}:{target}"
-
-    def get_build_key(dockerfile: str, target: str) -> str:
-        """Generate a unique key for a Dockerfile+target combination."""
-        return f"{dockerfile}:{target}"
-
-    def get_component(dockerfile: str) -> str | None:
-        """Get component name (frontend/backend) from dockerfile path."""
-        for component in CACHE_BUILDS_FOR_COMPONENTS:
-            if component in dockerfile:
-                return component
-        return None
-
-    # First pass: collect all services with build configs and identify duplicates
-    # Track which (dockerfile, target) combinations we've seen
-    build_key_to_first_service: dict[str, str] = {}
-    services_to_build: list[str] = []
-    services_to_dedupe: list[str] = []
-
-    for service_name, service_config in compose.get("services", {}).items():
-        if "build" not in service_config:
-            continue
-
-        build_config = service_config["build"]
-        dockerfile = build_config.get("dockerfile", "Dockerfile")
-        target = build_config.get("target", "default")
-        build_key = get_build_key(dockerfile, target)
-
-        if build_key not in build_key_to_first_service:
-            # First service with this build config - it will do the actual build
-            build_key_to_first_service[build_key] = service_name
-            services_to_build.append(service_name)
-        else:
-            # Duplicate - will just use the image from the first service
-            services_to_dedupe.append(service_name)
-
-    # Second pass: configure builds and deduplicate
-    modified_services = []
-    for service_name, service_config in compose.get("services", {}).items():
-        if "build" not in service_config:
-            continue
-
-        build_config = service_config["build"]
-        dockerfile = build_config.get("dockerfile", "Dockerfile")
-        target = build_config.get("target", "latest")
-        image_name = get_image_name(dockerfile, target)
-
-        # Set image name for all services (needed for both builders and deduped)
-        service_config["image"] = image_name
-
-        if service_name in services_to_dedupe:
-            # Remove build config - this service will use the pre-built image
-            del service_config["build"]
-            continue
-
-        # This service will do the actual build - add cache config
-        cache_from_list = []
-        cache_to_list = []
-
-        component = get_component(dockerfile)
-        if not component:
-            # Skip services that don't clearly match frontend/backend
-            continue
-
-        # Get the hash for this component
-        component_hash = getattr(args, f"{component}_hash")
-
-        # Scope format: platform-{component}-{target}-{hash|ref}
-        # Example: platform-backend-server-abc123
-
-        if "type=gha" in args.cache_from:
-            # 1. Primary: exact hash match (most specific)
-            if component_hash:
-                hash_scope = f"platform-{component}-{target}-{component_hash}"
-                cache_from_list.append(f"{args.cache_from},scope={hash_scope}")
-
-            # 2. Fallback: branch-based cache
-            if git_ref_scope:
-                ref_scope = f"platform-{component}-{target}-{git_ref_scope}"
-                cache_from_list.append(f"{args.cache_from},scope={ref_scope}")
-
-            # 3. Fallback: dev branch cache (for PRs/feature branches)
-            if git_ref_scope and git_ref_scope != DEFAULT_BRANCH:
-                master_scope = f"platform-{component}-{target}-{DEFAULT_BRANCH}"
-                cache_from_list.append(f"{args.cache_from},scope={master_scope}")
-
-        if "type=gha" in args.cache_to:
-            # Write to both hash-based and branch-based scopes
-            if component_hash:
-                hash_scope = f"platform-{component}-{target}-{component_hash}"
-                cache_to_list.append(f"{args.cache_to},scope={hash_scope}")
-
-            if git_ref_scope:
-                ref_scope = f"platform-{component}-{target}-{git_ref_scope}"
-                cache_to_list.append(f"{args.cache_to},scope={ref_scope}")
-
-        # Ensure we have at least one cache source/target
-        if not cache_from_list:
-            cache_from_list.append(args.cache_from)
-        if not cache_to_list:
-            cache_to_list.append(args.cache_to)
-
-        build_config["cache_from"] = cache_from_list
-        build_config["cache_to"] = cache_to_list
-        modified_services.append(service_name)
-
-    # Write back to the same file
-    with open(args.source, "w") as f:
-        yaml.dump(compose, f, default_flow_style=False, sort_keys=False)
-
-    print(f"Added cache config to {len(modified_services)} services in {args.source}:")
-    for svc in modified_services:
-        svc_config = compose["services"][svc]
-        build_cfg = svc_config.get("build", {})
-        cache_from_list = build_cfg.get("cache_from", ["none"])
-        cache_to_list = build_cfg.get("cache_to", ["none"])
-        print(f"  - {svc}")
-        print(f"      image: {svc_config.get('image', 'N/A')}")
-        print(f"      cache_from: {cache_from_list}")
-        print(f"      cache_to: {cache_to_list}")
-    if services_to_dedupe:
-        print(
-            f"Deduplicated {len(services_to_dedupe)} services (will use pre-built images):"
-        )
-        for svc in services_to_dedupe:
-            print(f"  - {svc} -> {compose['services'][svc].get('image', 'N/A')}")
-
-
-if __name__ == "__main__":
-    main()
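The removed script above builds its `cache_from` list in a fixed order: an exact content-hash scope first, then the current branch, then the default branch as a last fallback. A minimal sketch of that ordering as a standalone function follows; `cache_from_scopes` is a hypothetical helper name used only for illustration and does not appear in the script.

```python
DEFAULT_BRANCH = "dev"


def cache_from_scopes(
    component: str, target: str, component_hash: str, git_ref: str
) -> list[str]:
    """Return cache scope names ordered from most to least specific."""
    # Normalize the ref the same way the script does,
    # e.g. refs/heads/feat/x -> feat-x
    ref_scope = ""
    if git_ref:
        ref_scope = git_ref.replace("refs/heads/", "").replace("/", "-")

    scopes: list[str] = []
    # 1. Exact hash match (most specific)
    if component_hash:
        scopes.append(f"platform-{component}-{target}-{component_hash}")
    # 2. Branch-based scope
    if ref_scope:
        scopes.append(f"platform-{component}-{target}-{ref_scope}")
    # 3. Default-branch scope, only when not already on the default branch
    if ref_scope and ref_scope != DEFAULT_BRANCH:
        scopes.append(f"platform-{component}-{target}-{DEFAULT_BRANCH}")
    return scopes


if __name__ == "__main__":
    for scope in cache_from_scopes("backend", "server", "abc123", "refs/heads/feat/x"):
        print(scope)
```

On the default branch itself, steps 2 and 3 collapse into a single scope, so the list never contains the same scope twice.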
--- a/.gitignore
+++ b/.gitignore
@@ -3,7 +3,6 @@
 classic/original_autogpt/keys.py
 classic/original_autogpt/*.json
 auto_gpt_workspace/*
-.autogpt/
 *.mpeg
 .env
 # Root .env files
@@ -17,7 +16,6 @@ log-ingestion.txt
 /logs
 *.log
 *.mp3
-!autogpt_platform/frontend/public/notification.mp3
 mem.sqlite3
 venvAutoGPT

@@ -161,10 +159,6 @@ CURRENT_BULLETIN.md

 # AgBenchmark
 classic/benchmark/agbenchmark/reports/
-classic/reports/
-classic/direct_benchmark/reports/
-classic/.benchmark_workspaces/
-classic/direct_benchmark/.benchmark_workspaces/

 # Nodejs
 package-lock.json
@@ -183,13 +177,6 @@ autogpt_platform/backend/settings.py

 *.ign.*
 .test-contents
-**/.claude/settings.local.json
 .claude/settings.local.json
 CLAUDE.local.md
 /autogpt_platform/backend/logs
-
-# Test database
-test.db
-.next
-# Implementation plans (generated by AI agents)
-plans/
--- a/.gitleaks.toml
+++ b/.gitleaks.toml
@@ -1,36 +0,0 @@
-title = "AutoGPT Gitleaks Config"
-
-[extend]
-useDefault = true
-
-[allowlist]
-description = "Global allowlist"
-paths = [
-    # Template/example env files (no real secrets)
-    '''\.env\.(default|example|template)$''',
-    # Lock files
-    '''pnpm-lock\.yaml$''',
-    '''poetry\.lock$''',
-    # Secrets baseline
-    '''\.secrets\.baseline$''',
-    # Build artifacts and caches (should not be committed)
-    '''__pycache__/''',
-    '''classic/frontend/build/''',
-    # Docker dev setup (local dev JWTs/keys only)
-    '''autogpt_platform/db/docker/''',
-    # Load test configs (dev JWTs)
-    '''load-tests/configs/''',
-    # Test files with fake/fixture keys (_test.py, test_*.py, conftest.py)
-    '''(_test|test_.*|conftest)\.py$''',
-    # Documentation (only contains placeholder keys in curl/API examples)
-    '''docs/.*\.md$''',
-    # Firebase config (public API keys by design)
-    '''google-services\.json$''',
-    '''classic/frontend/(lib|web)/''',
-]
-# CI test-only encryption key (marked DO NOT USE IN PRODUCTION)
-regexes = [
-    '''dvziYgz0KSK8FENhju0ZYi8''',
-    # LLM model name enum values falsely flagged as API keys
-    '''Llama-\d.*Instruct''',
-]
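To see which files the path allowlist in the removed `.gitleaks.toml` would exempt, a few of its patterns can be tried against sample paths. This is only a sketch: gitleaks evaluates patterns with Go's regexp engine, and the code below assumes unanchored search semantics and uses Python's `re` because these particular patterns use syntax both engines share. The sample paths and the `is_allowlisted` helper are hypothetical.

```python
import re

# A subset of the allowlist path patterns from the removed config
ALLOWLIST_PATHS = [
    r"\.env\.(default|example|template)$",
    r"pnpm-lock\.yaml$",
    r"(_test|test_.*|conftest)\.py$",
    r"docs/.*\.md$",
]


def is_allowlisted(path: str) -> bool:
    """Return True if any allowlist pattern is found anywhere in the path."""
    return any(re.search(pattern, path) for pattern in ALLOWLIST_PATHS)


if __name__ == "__main__":
    samples = [
        "autogpt_platform/backend/.env.default",
        "autogpt_platform/backend/.env",
        "backend/util/service_test.py",
        "scripts/latest_run.py",
    ]
    for sample in samples:
        print(sample, is_allowlisted(sample))
```

Under these assumptions the test-file pattern is not tied to a path separator, so a name such as `latest_run.py` is exempted too, because it contains `test_` followed by `.py`.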
--- a/.gitmodules
+++ b/.gitmodules
@@ -0,0 +1,3 @@
+[submodule "classic/forge/tests/vcr_cassettes"]
+	path = classic/forge/tests/vcr_cassettes
+	url = https://github.com/Significant-Gravitas/Auto-GPT-test-cassettes
--- a/.nvmrc
+++ b/.nvmrc
@@ -1 +0,0 @@
-22
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -1,10 +1,3 @@
-default_install_hook_types:
-  - pre-commit
-  - pre-push
-  - post-checkout
-
-default_stages: [pre-commit]
-
 repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
@@ -23,15 +16,8 @@ repos:
      - id: detect-secrets
        name: Detect secrets
        description: Detects high entropy strings that are likely to be passwords.
-        args: ["--baseline", ".secrets.baseline"]
        files: ^autogpt_platform/
-        exclude: (pnpm-lock\.yaml|\.env\.(default|example|template))$
-
-  - repo: https://github.com/gitleaks/gitleaks
-    rev: v8.24.3
-    hooks:
-      - id: gitleaks
-        name: Detect secrets (gitleaks)
+        stages: [pre-push]

  - repo: local
    # For proper type checking, all dependencies need to be up-to-date.
@@ -40,71 +26,49 @@ repos:
      - id: poetry-install
        name: Check & Install dependencies - AutoGPT Platform - Backend
        alias: poetry-install-platform-backend
+        entry: poetry -C autogpt_platform/backend install
        # include autogpt_libs source (since it's a path dependency)
-        entry: >
-          bash -c '
-          if [ -n "$PRE_COMMIT_FROM_REF" ]; then
-            git diff --name-only "$PRE_COMMIT_FROM_REF" "$PRE_COMMIT_TO_REF"
-          else
-            git diff --cached --name-only
-          fi | grep -qE "^autogpt_platform/(backend|autogpt_libs)/poetry\.lock$" || exit 0;
-          poetry -C autogpt_platform/backend install
-          '
-        always_run: true
+        files: ^autogpt_platform/(backend|autogpt_libs)/poetry\.lock$
+        types: [file]
        language: system
        pass_filenames: false
-        stages: [pre-commit, post-checkout]

      - id: poetry-install
        name: Check & Install dependencies - AutoGPT Platform - Libs
        alias: poetry-install-platform-libs
-        entry: >
-          bash -c '
-          if [ -n "$PRE_COMMIT_FROM_REF" ]; then
-            git diff --name-only "$PRE_COMMIT_FROM_REF" "$PRE_COMMIT_TO_REF"
-          else
-            git diff --cached --name-only
-          fi | grep -qE "^autogpt_platform/autogpt_libs/poetry\.lock$" || exit 0;
-          poetry -C autogpt_platform/autogpt_libs install
-          '
-        always_run: true
+        entry: poetry -C autogpt_platform/autogpt_libs install
+        files: ^autogpt_platform/autogpt_libs/poetry\.lock$
+        types: [file]
        language: system
        pass_filenames: false
-        stages: [pre-commit, post-checkout]
-
-      - id: pnpm-install
-        name: Check & Install dependencies - AutoGPT Platform - Frontend
-        alias: pnpm-install-platform-frontend
-        entry: >
-          bash -c '
-          if [ -n "$PRE_COMMIT_FROM_REF" ]; then
-            git diff --name-only "$PRE_COMMIT_FROM_REF" "$PRE_COMMIT_TO_REF"
-          else
-            git diff --cached --name-only
-          fi | grep -qE "^autogpt_platform/frontend/pnpm-lock\.yaml$" || exit 0;
-          pnpm --prefix autogpt_platform/frontend install
-          '
-        always_run: true
-        language: system
-        pass_filenames: false
-        stages: [pre-commit, post-checkout]

      - id: poetry-install
-        name: Check & Install dependencies - Classic
-        alias: poetry-install-classic
-        entry: >
-          bash -c '
-          if [ -n "$PRE_COMMIT_FROM_REF" ]; then
-            git diff --name-only "$PRE_COMMIT_FROM_REF" "$PRE_COMMIT_TO_REF"
-          else
-            git diff --cached --name-only
-          fi | grep -qE "^classic/poetry\.lock$" || exit 0;
-          poetry -C classic install
-          '
-        always_run: true
+        name: Check & Install dependencies - Classic - AutoGPT
+        alias: poetry-install-classic-autogpt
+        entry: poetry -C classic/original_autogpt install
+        # include forge source (since it's a path dependency)
+        files: ^classic/(original_autogpt|forge)/poetry\.lock$
+        types: [file]
+        language: system
+        pass_filenames: false
+
+      - id: poetry-install
+        name: Check & Install dependencies - Classic - Forge
+        alias: poetry-install-classic-forge
+        entry: poetry -C classic/forge install
+        files: ^classic/forge/poetry\.lock$
+        types: [file]
+        language: system
+        pass_filenames: false
+
+      - id: poetry-install
+        name: Check & Install dependencies - Classic - Benchmark
+        alias: poetry-install-classic-benchmark
+        entry: poetry -C classic/benchmark install
+        files: ^classic/benchmark/poetry\.lock$
+        types: [file]
        language: system
        pass_filenames: false
-        stages: [pre-commit, post-checkout]

  - repo: local
    # For proper type checking, Prisma client must be up-to-date.
@@ -112,54 +76,12 @@ repos:
      - id: prisma-generate
        name: Prisma Generate - AutoGPT Platform - Backend
        alias: prisma-generate-platform-backend
-        entry: >
-          bash -c '
-          if [ -n "$PRE_COMMIT_FROM_REF" ]; then
-            git diff --name-only "$PRE_COMMIT_FROM_REF" "$PRE_COMMIT_TO_REF"
-          else
-            git diff --cached --name-only
-          fi | grep -qE "^autogpt_platform/((backend|autogpt_libs)/poetry\.lock|backend/schema\.prisma)$" || exit 0;
-          cd autogpt_platform/backend
-          && poetry run prisma generate
-          && poetry run gen-prisma-stub
-          '
+        entry: bash -c 'cd autogpt_platform/backend && poetry run prisma generate'
        # include everything that triggers poetry install + the prisma schema
-        always_run: true
+        files: ^autogpt_platform/((backend|autogpt_libs)/poetry\.lock|backend/schema.prisma)$
+        types: [file]
        language: system
        pass_filenames: false
-        stages: [pre-commit, post-checkout]
-
-      - id: export-api-schema
-        name: Export API schema - AutoGPT Platform - Backend -> Frontend
-        alias: export-api-schema-platform
-        entry: >
-          bash -c '
-          cd autogpt_platform/backend
-          && poetry run export-api-schema --output ../frontend/src/app/api/openapi.json
-          && cd ../frontend
-          && pnpm prettier --write ./src/app/api/openapi.json
-          '
-        files: ^autogpt_platform/backend/
-        language: system
-        pass_filenames: false
-
-      - id: generate-api-client
-        name: Generate API client - AutoGPT Platform - Frontend
-        alias: generate-api-client-platform-frontend
-        entry: >
-          bash -c '
-          SCHEMA=autogpt_platform/frontend/src/app/api/openapi.json;
-          if [ -n "$PRE_COMMIT_FROM_REF" ]; then
-            git diff --quiet "$PRE_COMMIT_FROM_REF" "$PRE_COMMIT_TO_REF" -- "$SCHEMA" && exit 0
-          else
-            git diff --quiet HEAD -- "$SCHEMA" && exit 0
-          fi;
-          cd autogpt_platform/frontend && pnpm generate:api
-          '
-        always_run: true
-        language: system
-        pass_filenames: false
-        stages: [pre-commit, post-checkout]

  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.7.2
@@ -194,10 +116,26 @@ repos:
        language: system

      - id: isort
-        name: Lint (isort) - Classic
-        alias: isort-classic
-        entry: bash -c 'cd classic && poetry run isort $(echo "$@" | sed "s|classic/||g")' --
-        files: ^classic/(original_autogpt|forge|direct_benchmark)/
+        name: Lint (isort) - Classic - AutoGPT
+        alias: isort-classic-autogpt
+        entry: poetry -P classic/original_autogpt run isort -p autogpt
+        files: ^classic/original_autogpt/
+        types: [file, python]
+        language: system
+
+      - id: isort
+        name: Lint (isort) - Classic - Forge
+        alias: isort-classic-forge
+        entry: poetry -P classic/forge run isort -p forge
+        files: ^classic/forge/
+        types: [file, python]
+        language: system
+
+      - id: isort
+        name: Lint (isort) - Classic - Benchmark
+        alias: isort-classic-benchmark
+        entry: poetry -P classic/benchmark run isort -p agbenchmark
+        files: ^classic/benchmark/
        types: [file, python]
        language: system

@@ -211,13 +149,26 @@ repos:

  - repo: https://github.com/PyCQA/flake8
    rev: 7.0.0
-    # Use consolidated flake8 config at classic/.flake8
+    # To have flake8 load the config of the individual subprojects, we have to call
+    # them separately.
    hooks:
      - id: flake8
-        name: Lint (Flake8) - Classic
-        alias: flake8-classic
-        files: ^classic/(original_autogpt|forge|direct_benchmark)/
-        args: [--config=classic/.flake8]
+        name: Lint (Flake8) - Classic - AutoGPT
+        alias: flake8-classic-autogpt
+        files: ^classic/original_autogpt/(autogpt|scripts|tests)/
+        args: [--config=classic/original_autogpt/.flake8]
+
+      - id: flake8
+        name: Lint (Flake8) - Classic - Forge
+        alias: flake8-classic-forge
+        files: ^classic/forge/(forge|tests)/
+        args: [--config=classic/forge/.flake8]
+
+      - id: flake8
+        name: Lint (Flake8) - Classic - Benchmark
+        alias: flake8-classic-benchmark
+        files: ^classic/benchmark/(agbenchmark|tests)/((?!reports).)*[/.]
+        args: [--config=classic/benchmark/.flake8]

  - repo: local
    hooks:
@@ -253,10 +204,29 @@ repos:
        pass_filenames: false

      - id: pyright
-        name: Typecheck - Classic
-        alias: pyright-classic
-        entry: poetry -C classic run pyright
-        files: ^classic/(original_autogpt|forge|direct_benchmark)/.*\.py$|^classic/poetry\.lock$
+        name: Typecheck - Classic - AutoGPT
+        alias: pyright-classic-autogpt
+        entry: poetry -C classic/original_autogpt run pyright
+        # include forge source (since it's a path dependency) but exclude *_test.py files:
+        files: ^(classic/original_autogpt/((autogpt|scripts|tests)/|poetry\.lock$)|classic/forge/(forge/.*(?<!_test)\.py|poetry\.lock)$)
+        types: [file]
+        language: system
+        pass_filenames: false
+
+      - id: pyright
+        name: Typecheck - Classic - Forge
+        alias: pyright-classic-forge
+        entry: poetry -C classic/forge run pyright
+        files: ^classic/forge/(forge/|poetry\.lock$)
+        types: [file]
+        language: system
+        pass_filenames: false
+
+      - id: pyright
+        name: Typecheck - Classic - Benchmark
+        alias: pyright-classic-benchmark
+        entry: poetry -C classic/benchmark run pyright
+        files: ^classic/benchmark/(agbenchmark/|tests/|poetry\.lock$)
        types: [file]
        language: system
        pass_filenames: false
@@ -283,9 +253,26 @@ repos:
  #       pass_filenames: false

  #     - id: pytest
-  #       name: Run tests - Classic (excl. slow tests)
-  #       alias: pytest-classic
-  #       entry: bash -c 'cd classic && poetry run pytest -m "not slow"'
-  #       files: ^classic/(original_autogpt|forge|direct_benchmark)/
+  #       name: Run tests - Classic - AutoGPT (excl. slow tests)
+  #       alias: pytest-classic-autogpt
+  #       entry: bash -c 'cd classic/original_autogpt && poetry run pytest --cov=autogpt -m "not slow" tests/unit tests/integration'
+  #       # include forge source (since it's a path dependency) but exclude *_test.py files:
+  #       files: ^(classic/original_autogpt/((autogpt|tests)/|poetry\.lock$)|classic/forge/(forge/.*(?<!_test)\.py|poetry\.lock)$)
+  #       language: system
+  #       pass_filenames: false
+
+  #     - id: pytest
+  #       name: Run tests - Classic - Forge (excl. slow tests)
+  #       alias: pytest-classic-forge
+  #       entry: bash -c 'cd classic/forge && poetry run pytest --cov=forge -m "not slow"'
+  #       files: ^classic/forge/(forge/|tests/|poetry\.lock$)
+  #       language: system
+  #       pass_filenames: false
+
+  #     - id: pytest
+  #       name: Run tests - Classic - Benchmark
+  #       alias: pytest-classic-benchmark
+  #       entry: bash -c 'cd classic/benchmark && poetry run pytest --cov=benchmark'
+  #       files: ^classic/benchmark/(agbenchmark/|tests/|poetry\.lock$)
  #       language: system
  #       pass_filenames: false
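
Reviewer note on the `files:` patterns above: the negative lookbehind `(?<!_test)` is what keeps Forge's `*_test.py` files from triggering the Classic AutoGPT hooks. The sketch below is not part of the diff. It copies the pattern from the hook whose entry is `poetry -C classic/original_autogpt run pyright` and checks it against a few sample paths, assuming (as pre-commit is understood to do) that `files` patterns are applied with Python's `re.search`. The sample paths and the `triggers_hook` helper are hypothetical.

```python
import re

# Pattern copied verbatim from the hook's `files:` key in the diff above.
FILES_PATTERN = re.compile(
    r"^(classic/original_autogpt/((autogpt|scripts|tests)/|poetry\.lock$)"
    r"|classic/forge/(forge/.*(?<!_test)\.py|poetry\.lock)$)"
)


def triggers_hook(path: str) -> bool:
    """Return True if a changed path would select this hook (hypothetical helper)."""
    return FILES_PATTERN.search(path) is not None


# Hypothetical paths, chosen only to exercise each branch of the pattern.
SAMPLE_PATHS = [
    "classic/original_autogpt/autogpt/app/main.py",
    "classic/original_autogpt/poetry.lock",
    "classic/forge/forge/agent/base.py",
    "classic/forge/forge/agent/base_test.py",
    "classic/forge/poetry.lock",
    "classic/benchmark/agbenchmark/main.py",
]

if __name__ == "__main__":
    for sample in SAMPLE_PATHS:
        print(f"{sample}: {triggers_hook(sample)}")
```

Only the Forge test file and the Benchmark path are expected to be skipped; everything else selects the hook.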
--- a/.secrets.baseline
+++ b/.secrets.baseline
@@ -1,467 +0,0 @@
-{
-  "version": "1.5.0",
-  "plugins_used": [
-    {
-      "name": "ArtifactoryDetector"
-    },
-    {
-      "name": "AWSKeyDetector"
-    },
-    {
-      "name": "AzureStorageKeyDetector"
-    },
-    {
-      "name": "Base64HighEntropyString",
-      "limit": 4.5
-    },
-    {
-      "name": "BasicAuthDetector"
-    },
-    {
-      "name": "CloudantDetector"
-    },
-    {
-      "name": "DiscordBotTokenDetector"
-    },
-    {
-      "name": "GitHubTokenDetector"
-    },
-    {
-      "name": "GitLabTokenDetector"
-    },
-    {
-      "name": "HexHighEntropyString",
-      "limit": 3.0
-    },
-    {
-      "name": "IbmCloudIamDetector"
-    },
-    {
-      "name": "IbmCosHmacDetector"
-    },
-    {
-      "name": "IPPublicDetector"
-    },
-    {
-      "name": "JwtTokenDetector"
-    },
-    {
-      "name": "KeywordDetector",
-      "keyword_exclude": ""
-    },
-    {
-      "name": "MailchimpDetector"
-    },
-    {
-      "name": "NpmDetector"
-    },
-    {
-      "name": "OpenAIDetector"
-    },
-    {
-      "name": "PrivateKeyDetector"
-    },
-    {
-      "name": "PypiTokenDetector"
-    },
-    {
-      "name": "SendGridDetector"
-    },
-    {
-      "name": "SlackDetector"
-    },
-    {
-      "name": "SoftlayerDetector"
-    },
-    {
-      "name": "SquareOAuthDetector"
-    },
-    {
-      "name": "StripeDetector"
-    },
-    {
-      "name": "TelegramBotTokenDetector"
-    },
-    {
-      "name": "TwilioKeyDetector"
-    }
-  ],
-  "filters_used": [
-    {
-      "path": "detect_secrets.filters.allowlist.is_line_allowlisted"
-    },
-    {
-      "path": "detect_secrets.filters.common.is_ignored_due_to_verification_policies",
-      "min_level": 2
-    },
-    {
-      "path": "detect_secrets.filters.heuristic.is_indirect_reference"
-    },
-    {
-      "path": "detect_secrets.filters.heuristic.is_likely_id_string"
-    },
-    {
-      "path": "detect_secrets.filters.heuristic.is_lock_file"
-    },
-    {
-      "path": "detect_secrets.filters.heuristic.is_not_alphanumeric_string"
-    },
-    {
-      "path": "detect_secrets.filters.heuristic.is_potential_uuid"
-    },
-    {
-      "path": "detect_secrets.filters.heuristic.is_prefixed_with_dollar_sign"
-    },
-    {
-      "path": "detect_secrets.filters.heuristic.is_sequential_string"
-    },
-    {
-      "path": "detect_secrets.filters.heuristic.is_swagger_file"
-    },
-    {
-      "path": "detect_secrets.filters.heuristic.is_templated_secret"
-    },
-    {
-      "path": "detect_secrets.filters.regex.should_exclude_file",
-      "pattern": [
-        "\\.env$",
-        "pnpm-lock\\.yaml$",
-        "\\.env\\.(default|example|template)$",
-        "__pycache__",
-        "_test\\.py$",
-        "test_.*\\.py$",
-        "conftest\\.py$",
-        "poetry\\.lock$",
-        "node_modules"
-      ]
-    }
-  ],
-  "results": {
-    "autogpt_platform/backend/backend/api/external/v1/integrations.py": [
-      {
-        "type": "Secret Keyword",
-        "filename": "autogpt_platform/backend/backend/api/external/v1/integrations.py",
-        "hashed_secret": "665b1e3851eefefa3fb878654292f16597d25155",
-        "is_verified": false,
-        "line_number": 289
-      }
-    ],
-    "autogpt_platform/backend/backend/blocks/airtable/_config.py": [
-      {
-        "type": "Secret Keyword",
-        "filename": "autogpt_platform/backend/backend/blocks/airtable/_config.py",
-        "hashed_secret": "57e168b03afb7c1ee3cdc4ee3db2fe1cc6e0df26",
-        "is_verified": false,
-        "line_number": 29
-      }
-    ],
-    "autogpt_platform/backend/backend/blocks/dataforseo/_config.py": [
-      {
-        "type": "Secret Keyword",
-        "filename": "autogpt_platform/backend/backend/blocks/dataforseo/_config.py",
-        "hashed_secret": "32ce93887331fa5d192f2876ea15ec000c7d58b8",
-        "is_verified": false,
-        "line_number": 12
-      }
-    ],
-    "autogpt_platform/backend/backend/blocks/github/checks.py": [
-      {
-        "type": "Hex High Entropy String",
-        "filename": "autogpt_platform/backend/backend/blocks/github/checks.py",
-        "hashed_secret": "8ac6f92737d8586790519c5d7bfb4d2eb172c238",
-        "is_verified": false,
-        "line_number": 108
-      }
-    ],
-    "autogpt_platform/backend/backend/blocks/github/ci.py": [
-      {
-        "type": "Hex High Entropy String",
-        "filename": "autogpt_platform/backend/backend/blocks/github/ci.py",
-        "hashed_secret": "90bd1b48e958257948487b90bee080ba5ed00caa",
-        "is_verified": false,
-        "line_number": 123
-      }
-    ],
-    "autogpt_platform/backend/backend/blocks/github/example_payloads/pull_request.synchronize.json": [
-      {
-        "type": "Hex High Entropy String",
-        "filename": "autogpt_platform/backend/backend/blocks/github/example_payloads/pull_request.synchronize.json",
-        "hashed_secret": "f96896dafced7387dcd22343b8ea29d3d2c65663",
-        "is_verified": false,
-        "line_number": 42
-      },
-      {
-        "type": "Hex High Entropy String",
-        "filename": "autogpt_platform/backend/backend/blocks/github/example_payloads/pull_request.synchronize.json",
-        "hashed_secret": "b80a94d5e70bedf4f5f89d2f5a5255cc9492d12e",
-        "is_verified": false,
-        "line_number": 193
-      },
-      {
-        "type": "Hex High Entropy String",
-        "filename": "autogpt_platform/backend/backend/blocks/github/example_payloads/pull_request.synchronize.json",
-        "hashed_secret": "75b17e517fe1b3136394f6bec80c4f892da75e42",
-        "is_verified": false,
-        "line_number": 344
-      },
-      {
-        "type": "Hex High Entropy String",
-        "filename": "autogpt_platform/backend/backend/blocks/github/example_payloads/pull_request.synchronize.json",
-        "hashed_secret": "b0bfb5e4e2394e7f8906e5ed1dffd88b2bc89dd5",
-        "is_verified": false,
-        "line_number": 534
-      }
-    ],
-    "autogpt_platform/backend/backend/blocks/github/statuses.py": [
-      {
-        "type": "Hex High Entropy String",
-        "filename": "autogpt_platform/backend/backend/blocks/github/statuses.py",
-        "hashed_secret": "8ac6f92737d8586790519c5d7bfb4d2eb172c238",
-        "is_verified": false,
-        "line_number": 85
-      }
-    ],
-    "autogpt_platform/backend/backend/blocks/google/docs.py": [
-      {
-        "type": "Hex High Entropy String",
-        "filename": "autogpt_platform/backend/backend/blocks/google/docs.py",
-        "hashed_secret": "c95da0c6696342c867ef0c8258d2f74d20fd94d4",
-        "is_verified": false,
-        "line_number": 203
-      }
-    ],
-    "autogpt_platform/backend/backend/blocks/google/sheets.py": [
-      {
-        "type": "Base64 High Entropy String",
-        "filename": "autogpt_platform/backend/backend/blocks/google/sheets.py",
-        "hashed_secret": "bd5a04fa3667e693edc13239b6d310c5c7a8564b",
-        "is_verified": false,
-        "line_number": 57
-      }
-    ],
-    "autogpt_platform/backend/backend/blocks/linear/_config.py": [
-      {
-        "type": "Secret Keyword",
-        "filename": "autogpt_platform/backend/backend/blocks/linear/_config.py",
-        "hashed_secret": "b37f020f42d6d613b6ce30103e4d408c4499b3bb",
-        "is_verified": false,
-        "line_number": 53
-      }
-    ],
-    "autogpt_platform/backend/backend/blocks/medium.py": [
-      {
-        "type": "Hex High Entropy String",
-        "filename": "autogpt_platform/backend/backend/blocks/medium.py",
-        "hashed_secret": "ff998abc1ce6d8f01a675fa197368e44c8916e9c",
-        "is_verified": false,
-        "line_number": 131
-      }
-    ],
-    "autogpt_platform/backend/backend/blocks/replicate/replicate_block.py": [
-      {
-        "type": "Hex High Entropy String",
-        "filename": "autogpt_platform/backend/backend/blocks/replicate/replicate_block.py",
-        "hashed_secret": "8bbdd6f26368f58ea4011d13d7f763cb662e66f0",
-        "is_verified": false,
-        "line_number": 55
-      }
-    ],
-    "autogpt_platform/backend/backend/blocks/slant3d/webhook.py": [
-      {
-        "type": "Hex High Entropy String",
-        "filename": "autogpt_platform/backend/backend/blocks/slant3d/webhook.py",
-        "hashed_secret": "36263c76947443b2f6e6b78153967ac4a7da99f9",
-        "is_verified": false,
-        "line_number": 100
-      }
-    ],
-    "autogpt_platform/backend/backend/blocks/talking_head.py": [
-      {
-        "type": "Base64 High Entropy String",
-        "filename": "autogpt_platform/backend/backend/blocks/talking_head.py",
-        "hashed_secret": "44ce2d66222529eea4a32932823466fc0601c799",
-        "is_verified": false,
-        "line_number": 113
-      }
-    ],
-    "autogpt_platform/backend/backend/blocks/wordpress/_config.py": [
-      {
-        "type": "Secret Keyword",
-        "filename": "autogpt_platform/backend/backend/blocks/wordpress/_config.py",
-        "hashed_secret": "e62679512436161b78e8a8d68c8829c2a1031ccb",
-        "is_verified": false,
-        "line_number": 17
-      }
-    ],
-    "autogpt_platform/backend/backend/util/cache.py": [
-      {
-        "type": "Secret Keyword",
-        "filename": "autogpt_platform/backend/backend/util/cache.py",
-        "hashed_secret": "37f0c918c3fa47ca4a70e42037f9f123fdfbc75b",
-        "is_verified": false,
-        "line_number": 449
-      }
-    ],
-    "autogpt_platform/frontend/src/app/(platform)/build/components/FlowEditor/nodes/helpers.ts": [
-      {
-        "type": "Secret Keyword",
-        "filename": "autogpt_platform/frontend/src/app/(platform)/build/components/FlowEditor/nodes/helpers.ts",
-        "hashed_secret": "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8",
-        "is_verified": false,
-        "line_number": 6
-      }
-    ],
-    "autogpt_platform/frontend/src/app/(platform)/dictionaries/en.json": [
-      {
-        "type": "Secret Keyword",
-        "filename": "autogpt_platform/frontend/src/app/(platform)/dictionaries/en.json",
-        "hashed_secret": "8be3c943b1609fffbfc51aad666d0a04adf83c9d",
-        "is_verified": false,
-        "line_number": 5
-      }
-    ],
-    "autogpt_platform/frontend/src/app/(platform)/dictionaries/es.json": [
-      {
-        "type": "Secret Keyword",
-        "filename": "autogpt_platform/frontend/src/app/(platform)/dictionaries/es.json",
-        "hashed_secret": "5a6d1c612954979ea99ee33dbb2d231b00f6ac0a",
-        "is_verified": false,
-        "line_number": 5
-      }
-    ],
-    "autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/AgentInputsReadOnly/helpers.ts": [
-      {
-        "type": "Secret Keyword",
-        "filename": "autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/AgentInputsReadOnly/helpers.ts",
-        "hashed_secret": "cf678cab87dc1f7d1b95b964f15375e088461679",
-        "is_verified": false,
-        "line_number": 6
-      },
-      {
-        "type": "Secret Keyword",
-        "filename": "autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/AgentInputsReadOnly/helpers.ts",
-        "hashed_secret": "f72cbb45464d487064610c5411c576ca4019d380",
-        "is_verified": false,
-        "line_number": 8
-      }
-    ],
-    "autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/RunAgentModal/components/ModalRunSection/helpers.ts": [
-      {
-        "type": "Secret Keyword",
-        "filename": "autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/RunAgentModal/components/ModalRunSection/helpers.ts",
-        "hashed_secret": "cf678cab87dc1f7d1b95b964f15375e088461679",
-        "is_verified": false,
-        "line_number": 5
-      },
-      {
-        "type": "Secret Keyword",
-        "filename": "autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/RunAgentModal/components/ModalRunSection/helpers.ts",
-        "hashed_secret": "f72cbb45464d487064610c5411c576ca4019d380",
-        "is_verified": false,
-        "line_number": 7
-      }
-    ],
-    "autogpt_platform/frontend/src/app/(platform)/profile/(user)/integrations/page.tsx": [
-      {
-        "type": "Secret Keyword",
-        "filename": "autogpt_platform/frontend/src/app/(platform)/profile/(user)/integrations/page.tsx",
-        "hashed_secret": "cf678cab87dc1f7d1b95b964f15375e088461679",
-        "is_verified": false,
-        "line_number": 192
-      },
-      {
-        "type": "Secret Keyword",
-        "filename": "autogpt_platform/frontend/src/app/(platform)/profile/(user)/integrations/page.tsx",
-        "hashed_secret": "86275db852204937bbdbdebe5fabe8536e030ab6",
-        "is_verified": false,
-        "line_number": 193
-      }
-    ],
-    "autogpt_platform/frontend/src/components/contextual/CredentialsInput/helpers.ts": [
-      {
-        "type": "Secret Keyword",
-        "filename": "autogpt_platform/frontend/src/components/contextual/CredentialsInput/helpers.ts",
-        "hashed_secret": "47acd2028cf81b5da88ddeedb2aea4eca4b71fbd",
-        "is_verified": false,
-        "line_number": 102
-      },
-      {
-        "type": "Secret Keyword",
-        "filename": "autogpt_platform/frontend/src/components/contextual/CredentialsInput/helpers.ts",
-        "hashed_secret": "8be3c943b1609fffbfc51aad666d0a04adf83c9d",
-        "is_verified": false,
-        "line_number": 103
-      }
-    ],
-    "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts": [
-      {
-        "type": "Base64 High Entropy String",
-        "filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
-        "hashed_secret": "9c486c92f1a7420e1045c7ad963fbb7ba3621025",
-        "is_verified": false,
-        "line_number": 73
-      },
-      {
-        "type": "Base64 High Entropy String",
-        "filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
-        "hashed_secret": "9277508c7a6effc8fb59163efbfada189e35425c",
-        "is_verified": false,
-        "line_number": 75
-      },
-      {
-        "type": "Base64 High Entropy String",
-        "filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
-        "hashed_secret": "8dc7e2cb1d0935897d541bf5facab389b8a50340",
-        "is_verified": false,
-        "line_number": 77
-      },
-      {
-        "type": "Base64 High Entropy String",
-        "filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
-        "hashed_secret": "79a26ad48775944299be6aaf9fb1d5302c1ed75b",
-        "is_verified": false,
-        "line_number": 79
-      },
-      {
-        "type": "Base64 High Entropy String",
-        "filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
-        "hashed_secret": "a3b62b44500a1612e48d4cab8294df81561b3b1a",
-        "is_verified": false,
-        "line_number": 81
-      },
-      {
-        "type": "Base64 High Entropy String",
-        "filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
-        "hashed_secret": "a58979bd0b21ef4f50417d001008e60dd7a85c64",
-        "is_verified": false,
-        "line_number": 83
-      },
-      {
-        "type": "Base64 High Entropy String",
-        "filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
-        "hashed_secret": "6cb6e075f8e8c7c850f9d128d6608e5dbe209a79",
-        "is_verified": false,
-        "line_number": 85
-      }
-    ],
-    "autogpt_platform/frontend/src/lib/constants.ts": [
-      {
-        "type": "Secret Keyword",
-        "filename": "autogpt_platform/frontend/src/lib/constants.ts",
-        "hashed_secret": "27b924db06a28cc755fb07c54f0fddc30659fe4d",
-        "is_verified": false,
-        "line_number": 10
-      }
-    ],
-    "autogpt_platform/frontend/src/tests/credentials/index.ts": [
-      {
-        "type": "Secret Keyword",
-        "filename": "autogpt_platform/frontend/src/tests/credentials/index.ts",
-        "hashed_secret": "c18006fc138809314751cd1991f1e0b820fabd37",
-        "is_verified": false,
-        "line_number": 4
-      }
-    ]
-  },
-  "generated_at": "2026-04-02T13:10:54Z"
-}
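
Reviewer note on the removed baseline: each entry under `results` records a finding type, a file, a line number, and a `hashed_secret` rather than the secret itself. The sketch below is not part of the diff. It tallies findings by type over an abridged two-entry sample copied from the file above, and it assumes that `hashed_secret` is a SHA-1 hex digest, which is the usual understanding of the detect-secrets format. The helper names are hypothetical.

```python
import hashlib
from collections import Counter

# Abridged sample in the same shape as the removed .secrets.baseline;
# both entries are copied from the diff above.
SAMPLE_BASELINE = {
    "results": {
        "autogpt_platform/backend/backend/blocks/github/checks.py": [
            {
                "type": "Hex High Entropy String",
                "hashed_secret": "8ac6f92737d8586790519c5d7bfb4d2eb172c238",
                "is_verified": False,
                "line_number": 108,
            }
        ],
        "autogpt_platform/frontend/src/app/(platform)/build/components/FlowEditor/nodes/helpers.ts": [
            {
                "type": "Secret Keyword",
                "hashed_secret": "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8",
                "is_verified": False,
                "line_number": 6,
            }
        ],
    }
}


def count_by_type(baseline: dict) -> Counter:
    """Count baseline findings per detector type (hypothetical helper)."""
    counts = Counter()
    for findings in baseline["results"].values():
        for finding in findings:
            counts[finding["type"]] += 1
    return counts


def sha1_hex(candidate: str) -> str:
    """Hash a candidate string the way the baseline is assumed to store it."""
    return hashlib.sha1(candidate.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(count_by_type(SAMPLE_BASELINE))
    print(sha1_hex("password"))
```

The second printed value equals the `hashed_secret` recorded for `helpers.ts`, which is consistent with that finding being the literal word `password` rather than a real credential. The diff does not show the flagged source line, so this remains an inference.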
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -1,6 +1,6 @@
 # AutoGPT Platform Contribution Guide

-This guide provides context for coding agents when updating the **autogpt_platform** folder.
+This guide provides context for Codex when updating the **autogpt_platform** folder.

 ## Directory overview

@@ -30,7 +30,7 @@ See `/frontend/CONTRIBUTING.md` for complete patterns. Quick reference:
   - Regenerate with `pnpm generate:api`
   - Pattern: `use{Method}{Version}{OperationName}`
 4. **Styling**: Tailwind CSS only, use design tokens, Phosphor Icons only
-5. **Testing**: Integration tests (Vitest + RTL + MSW) are the default (~90%, page-level). Playwright for E2E critical flows. Storybook for design system components. See `autogpt_platform/frontend/TESTING.md`
+5. **Testing**: Add Storybook stories for new components, Playwright for E2E
 6. **Code conventions**: Function declarations (not arrow functions) for components/handlers

 - Component props should be `interface Props { ... }` (not exported) unless the interface needs to be used outside the component
@@ -47,9 +47,7 @@ See `/frontend/CONTRIBUTING.md` for complete patterns. Quick reference:
 ## Testing

 - Backend: `poetry run test` (runs pytest with a docker based postgres + prisma).
-- Frontend integration tests: `pnpm test:unit` (Vitest + RTL + MSW, primary testing approach).
-- Frontend E2E tests: `pnpm test` or `pnpm test-ui` for Playwright tests.
-- See `autogpt_platform/frontend/TESTING.md` for the full testing strategy.
+- Frontend: `pnpm test` or `pnpm test-ui` for Playwright tests. See `docs/content/platform/contributing/tests.md` for tips.

 Always run the relevant linters and tests before committing.
 Use conventional commit messages for all commits (e.g. `feat(backend): add API`).
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -1 +0,0 @@
-@AGENTS.md
--- a/README.md
+++ b/README.md
@@ -54,7 +54,7 @@ Before proceeding with the installation, ensure your system meets the following
 ### Updated Setup Instructions:
 We've moved to a fully maintained and regularly updated documentation site.

-👉 [Follow the official self-hosting guide here](https://agpt.co/docs/platform/getting-started/getting-started)
+👉 [Follow the official self-hosting guide here](https://docs.agpt.co/platform/getting-started/)


 This tutorial assumes you have Docker, VSCode, git and npm installed.
@@ -83,13 +83,13 @@ The AutoGPT frontend is where users interact with our powerful AI automation pla

   **Agent Builder:** For those who want to customize, our intuitive, low-code interface allows you to design and configure your own AI agents. 
   
-   **Workflow Management:** Build, modify, and optimize your automation workflows with ease. You build your agent by connecting blocks, where each block performs a single action.
+   **Workflow Management:** Build, modify, and optimize your automation workflows with ease. You build your agent by connecting blocks, where each block     performs a single action.
   
   **Deployment Controls:** Manage the lifecycle of your agents, from testing to production.
   
   **Ready-to-Use Agents:** Don't want to build? Simply select from our library of pre-configured agents and put them to work immediately.
   
-   **Agent Interaction:** Whether you've built your own or are using pre-configured agents, easily run and interact with them through our user-friendly interface.
+   **Agent Interaction:** Whether you've built your own or are using pre-configured agents, easily run and interact with them through our user-friendly      interface.

   **Monitoring and Analytics:** Keep track of your agents' performance and gain insights to continually improve your automation processes.

--- a/autogpt_platform/.gitignore
+++ b/autogpt_platform/.gitignore
@@ -1,3 +1,2 @@
 *.ignore.*
-*.ign.*
-.application.logs
+*.ign.*
--- a/autogpt_platform/AGENTS.md
+++ b/autogpt_platform/AGENTS.md
@@ -1,120 +0,0 @@
-# AutoGPT Platform
-
-This file provides guidance to coding agents when working with code in this repository.
-
-## Repository Overview
-
-AutoGPT Platform is a monorepo containing:
-
-- **Backend** (`backend`): Python FastAPI server with async support
-- **Frontend** (`frontend`): Next.js React application
-- **Shared Libraries** (`autogpt_libs`): Common Python utilities
-
-## Component Documentation
-
-- **Backend**: See @backend/AGENTS.md for backend-specific commands, architecture, and development tasks
-- **Frontend**: See @frontend/AGENTS.md for frontend-specific commands, architecture, and development patterns
-
-## Key Concepts
-
-1. **Agent Graphs**: Workflow definitions stored as JSON, executed by the backend
-2. **Blocks**: Reusable components in `backend/backend/blocks/` that perform specific tasks
-3. **Integrations**: OAuth and API connections stored per user
-4. **Store**: Marketplace for sharing agent templates
-5. **Virus Scanning**: ClamAV integration for file upload security
-
-### Environment Configuration
-
-#### Configuration Files
-
-- **Backend**: `backend/.env.default` (defaults) → `backend/.env` (user overrides)
-- **Frontend**: `frontend/.env.default` (defaults) → `frontend/.env` (user overrides)
-- **Platform**: `.env.default` (Supabase/shared defaults) → `.env` (user overrides)
-
-#### Docker Environment Loading Order
-
-1. `.env.default` files provide base configuration (tracked in git)
-2. `.env` files provide user-specific overrides (gitignored)
-3. Docker Compose `environment:` sections provide service-specific overrides
-4. Shell environment variables have highest precedence
-
-#### Key Points
-
-- All services use hardcoded defaults in docker-compose files (no `${VARIABLE}` substitutions)
-- The `env_file` directive loads variables INTO containers at runtime
-- Backend/Frontend services use YAML anchors for consistent configuration
-- Supabase services (`db/docker/docker-compose.yml`) follow the same pattern
-
-### Branching Strategy
-
-- **`dev`** is the main development branch. All PRs should target `dev`.
-- **`master`** is the production branch. Only used for production releases.
-
-### Creating Pull Requests
-
-- Create the PR against the `dev` branch of the repository.
-- **Split PRs by concern** — each PR should have a single clear purpose. For example, "usage tracking" and "credit charging" should be separate PRs even if related. Combining multiple concerns makes it harder for reviewers to understand what belongs to what.
-- Ensure the branch name is descriptive (e.g., `feature/add-new-block`)
-- Use conventional commit messages (see below)
-- **Structure the PR description with Why / What / How** — Why: the motivation (what problem it solves, what's broken/missing without it); What: high-level summary of changes; How: approach, key implementation details, or architecture decisions. Reviewers need all three to judge whether the approach fits the problem.
-- Fill out the .github/PULL_REQUEST_TEMPLATE.md template as the PR description
-- Always use `--body-file` to pass PR body — avoids shell interpretation of backticks and special characters:
-  ```bash
-  PR_BODY=$(mktemp)
-  cat > "$PR_BODY" << 'PREOF'
-  ## Summary
-  - use `backticks` freely here
-  PREOF
-  gh pr create --title "..." --body-file "$PR_BODY" --base dev
-  rm "$PR_BODY"
-  ```
-- Run the github pre-commit hooks to ensure code quality.
-
-### Test-Driven Development (TDD)
-
-When fixing a bug or adding a feature, follow a test-first approach:
-
-1. **Write a failing test first** — create a test that reproduces the bug or validates the new behavior, marked with `@pytest.mark.xfail` (backend) or `.fixme` (Playwright). Run it to confirm it fails for the right reason.
-2. **Implement the fix/feature** — write the minimal code to make the test pass.
-3. **Remove the xfail marker** — once the test passes, remove the `xfail`/`.fixme` annotation and run the full test suite to confirm nothing else broke.
-
-This ensures every change is covered by a test and that the test actually validates the intended behavior.
-
-### Reviewing/Revising Pull Requests
-
-Use `/pr-review` to review a PR or `/pr-address` to address comments.
-
-When fetching comments manually:
-- `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews --paginate` — top-level reviews
-- `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments --paginate` — inline review comments (always paginate to avoid missing comments beyond page 1)
-- `gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments` — PR conversation comments
-
-### Conventional Commits
-
-Use this format for commit messages and Pull Request titles:
-
-**Conventional Commit Types:**
-
-- `feat`: Introduces a new feature to the codebase
-- `fix`: Patches a bug in the codebase
-- `refactor`: Code change that neither fixes a bug nor adds a feature; also applies to removing features
-- `ci`: Changes to CI configuration
-- `docs`: Documentation-only changes
-- `dx`: Improvements to the developer experience
-
-**Recommended Base Scopes:**
-
-- `platform`: Changes affecting both frontend and backend
-- `frontend`
-- `backend`
-- `infra`
-- `blocks`: Modifications/additions of individual blocks
-
-**Subscope Examples:**
-
-- `backend/executor`
-- `backend/db`
-- `frontend/builder` (includes changes to the block UI component)
-- `infra/prod`
-
-Use these scopes and subscopes for clarity and consistency in commit messages.
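
Reviewer note on the "Docker Environment Loading Order" list in the removed file: the four sources are listed from lowest to highest precedence. The sketch below is not part of the diff and only models that documented order with a `ChainMap`. It does not reproduce how Docker Compose actually resolves variables, and the variable names and values are hypothetical.

```python
from collections import ChainMap


def resolve_env(
    env_default: dict,
    env_file: dict,
    compose_environment: dict,
    shell: dict,
) -> dict:
    """Merge the four documented sources; later arguments win (hypothetical helper)."""
    # ChainMap looks keys up left to right, so the highest-precedence
    # source (the shell) goes first and `.env.default` goes last.
    return dict(ChainMap(shell, compose_environment, env_file, env_default))


if __name__ == "__main__":
    resolved = resolve_env(
        env_default={"LOG_LEVEL": "info", "PORT": "8000"},  # .env.default
        env_file={"LOG_LEVEL": "debug"},                    # .env
        compose_environment={"PORT": "8006"},               # environment: section
        shell={},                                           # nothing exported
    )
    print(resolved)
```

With these sample values, `LOG_LEVEL` comes from `.env` and `PORT` from the Compose `environment:` section, because each overrides the `.env.default` value.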
--- a/autogpt_platform/CLAUDE.md
+++ b/autogpt_platform/CLAUDE.md
@@ -1 +1,90 @@
-@AGENTS.md
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Repository Overview
+
+AutoGPT Platform is a monorepo containing:
+
+- **Backend** (`backend`): Python FastAPI server with async support
+- **Frontend** (`frontend`): Next.js React application
+- **Shared Libraries** (`autogpt_libs`): Common Python utilities
+
+## Component Documentation
+
+- **Backend**: See @backend/CLAUDE.md for backend-specific commands, architecture, and development tasks
+- **Frontend**: See @frontend/CLAUDE.md for frontend-specific commands, architecture, and development patterns
+
+## Key Concepts
+
+1. **Agent Graphs**: Workflow definitions stored as JSON, executed by the backend
+2. **Blocks**: Reusable components in `backend/backend/blocks/` that perform specific tasks
+3. **Integrations**: OAuth and API connections stored per user
+4. **Store**: Marketplace for sharing agent templates
+5. **Virus Scanning**: ClamAV integration for file upload security
+
+### Environment Configuration
+
+#### Configuration Files
+
+- **Backend**: `backend/.env.default` (defaults) → `backend/.env` (user overrides)
+- **Frontend**: `frontend/.env.default` (defaults) → `frontend/.env` (user overrides)
+- **Platform**: `.env.default` (Supabase/shared defaults) → `.env` (user overrides)
+
+#### Docker Environment Loading Order
+
+1. `.env.default` files provide base configuration (tracked in git)
+2. `.env` files provide user-specific overrides (gitignored)
+3. Docker Compose `environment:` sections provide service-specific overrides
+4. Shell environment variables have highest precedence
+
+#### Key Points
+
+- All services use hardcoded defaults in docker-compose files (no `${VARIABLE}` substitutions)
+- The `env_file` directive loads variables INTO containers at runtime
+- Backend/Frontend services use YAML anchors for consistent configuration
+- Supabase services (`db/docker/docker-compose.yml`) follow the same pattern
+
+### Creating Pull Requests
+
+- Create the PR against the `dev` branch of the repository.
+- Ensure the branch name is descriptive (e.g., `feature/add-new-block`)
+- Use conventional commit messages (see below)
+- Fill out the .github/PULL_REQUEST_TEMPLATE.md template as the PR description
+- Run the github pre-commit hooks to ensure code quality.
+
+### Reviewing/Revising Pull Requests
+
+- When the user runs /pr-comments or tries to fetch them, also run gh api /repos/Significant-Gravitas/AutoGPT/pulls/[issuenum]/reviews to get the reviews
+- Use gh api /repos/Significant-Gravitas/AutoGPT/pulls/[issuenum]/reviews/[review_id]/comments to get the review contents
+- Use gh api /repos/Significant-Gravitas/AutoGPT/issues/[issuenum]/comments to get the PR-specific comments
+
+### Conventional Commits
+
+Use this format for commit messages and Pull Request titles:
+
+**Conventional Commit Types:**
+
+- `feat`: Introduces a new feature to the codebase
+- `fix`: Patches a bug in the codebase
+- `refactor`: Code change that neither fixes a bug nor adds a feature; also applies to removing features
+- `ci`: Changes to CI configuration
+- `docs`: Documentation-only changes
+- `dx`: Improvements to the developer experience
+
+**Recommended Base Scopes:**
+
+- `platform`: Changes affecting both frontend and backend
+- `frontend`
+- `backend`
+- `infra`
+- `blocks`: Modifications/additions of individual blocks
+
+**Subscope Examples:**
+
+- `backend/executor`
+- `backend/db`
+- `frontend/builder` (includes changes to the block UI component)
+- `infra/prod`
+
+Use these scopes and subscopes for clarity and consistency in commit messages.
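
Reviewer note on the "Conventional Commits" section: the types and base scopes listed there can be checked mechanically. The sketch below is not part of the diff. It assumes the title shape `type(scope): summary`, inferred from the `feat(backend): add API` example in the root `AGENTS.md`, and it treats the scope list as a recommendation, as the document does. The `check_title` helper and its messages are hypothetical.

```python
import re

# Types and base scopes as listed in the document.
COMMIT_TYPES = ("feat", "fix", "refactor", "ci", "docs", "dx")
BASE_SCOPES = ("platform", "frontend", "backend", "infra", "blocks")

# Assumed title shape: type, optional (scope), colon, space, summary.
TITLE_RE = re.compile(
    r"^(?P<type>[a-z]+)(?:\((?P<scope>[^()]+)\))?: (?P<summary>\S.*)$"
)


def check_title(title: str) -> list[str]:
    """Return a list of problems found in a commit or PR title (empty if none)."""
    match = TITLE_RE.match(title)
    if match is None:
        return ["title does not look like 'type(scope): summary'"]

    problems = []
    if match["type"] not in COMMIT_TYPES:
        problems.append(f"unknown type '{match['type']}'")

    scope = match["scope"]
    # Subscopes such as `backend/executor` are judged by their base scope.
    if scope is not None and scope.split("/")[0] not in BASE_SCOPES:
        problems.append(f"scope '{scope}' is outside the recommended base scopes")
    return problems


if __name__ == "__main__":
    for title in ("feat(backend): add API", "chore(api): bump deps", "add API"):
        print(title, "->", check_title(title) or "ok")
```

Because the scopes are only recommended, a real check would probably report the scope message as a warning, not a failure.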
--- a/autogpt_platform/analytics/queries/auth_activities.sql
+++ b/autogpt_platform/analytics/queries/auth_activities.sql
@@ -1,40 +0,0 @@
--- =============================================================
--- View: analytics.auth_activities
--- Looker source alias: ds49  |  Charts: 1
--- =============================================================
--- DESCRIPTION
---   Tracks authentication events (login, logout, SSO, password
---   reset, etc.) from Supabase's internal audit log.
---   Useful for monitoring sign-in patterns and detecting anomalies.
---
--- SOURCE TABLES
---   auth.audit_log_entries  — Supabase internal auth event log
---
--- OUTPUT COLUMNS
---   created_at      TIMESTAMPTZ  When the auth event occurred
---   actor_id        TEXT         User ID who triggered the event
---   actor_via_sso   TEXT         Whether the action was via SSO ('true'/'false')
---   action          TEXT         Event type (e.g. 'login', 'logout', 'token_refreshed')
---
--- WINDOW
---   Rolling 90 days from current date
---
--- EXAMPLE QUERIES
---   -- Daily login counts
---   SELECT DATE_TRUNC('day', created_at) AS day, COUNT(*) AS logins
---   FROM analytics.auth_activities
---   WHERE action = 'login'
---   GROUP BY 1 ORDER BY 1;
---
---   -- SSO vs password login breakdown
---   SELECT actor_via_sso, COUNT(*) FROM analytics.auth_activities
---   WHERE action = 'login' GROUP BY 1;
--- =============================================================
-
-SELECT
-    created_at,
-    payload->>'actor_id'      AS actor_id,
-    payload->>'actor_via_sso' AS actor_via_sso,
-    payload->>'action'        AS action
-FROM auth.audit_log_entries
-WHERE created_at >= NOW() - INTERVAL '90 days'
--- a/autogpt_platform/analytics/queries/graph_execution.sql
+++ b/autogpt_platform/analytics/queries/graph_execution.sql
@@ -1,105 +0,0 @@
-- =============================================================
-- View: analytics.graph_execution
-- Looker source alias: ds16  |  Charts: 21
-- =============================================================
-- DESCRIPTION
--   One row per agent graph execution (last 90 days).
--   Unpacks the JSONB stats column into individual numeric columns
--   and normalises the executionStatus — runs that failed due to
--   insufficient credits are reclassified as 'NO_CREDITS' for
--   easier filtering.  Error messages are scrubbed of IDs and URLs
--   to allow safe grouping.
--
-- SOURCE TABLES
--   platform.AgentGraphExecution  — Execution records
--   platform.AgentGraph           — Agent graph metadata (for name)
--   platform.LibraryAgent         — To flag possibly-AI (safe-mode) agents
--
-- OUTPUT COLUMNS
--   id                TEXT         Execution UUID
--   agentGraphId      TEXT         Agent graph UUID
--   agentGraphVersion INT          Graph version number
--   executionStatus   TEXT         COMPLETED | FAILED | NO_CREDITS | RUNNING | QUEUED | TERMINATED
--   createdAt         TIMESTAMPTZ  When the execution was queued
--   updatedAt         TIMESTAMPTZ  Last status update time
--   userId            TEXT         Owner user UUID
--   agentGraphName    TEXT         Human-readable agent name
--   cputime           DECIMAL      Total CPU seconds consumed
--   walltime          DECIMAL      Total wall-clock seconds
--   node_count        DECIMAL      Number of nodes in the graph
--   nodes_cputime     DECIMAL      CPU time across all nodes
--   nodes_walltime    DECIMAL      Wall time across all nodes
--   execution_cost    DECIMAL      Credit cost of this execution
--   correctness_score FLOAT        AI correctness score (if available)
--   possibly_ai       BOOLEAN      True if agent has sensitive_action_safe_mode enabled
--   groupedErrorMessage TEXT       Scrubbed error string (IDs/URLs replaced with wildcards)
--
-- WINDOW
--   Rolling 90 days (createdAt > CURRENT_DATE - 90 days)
--
-- EXAMPLE QUERIES
--   -- Daily execution counts by status
--   SELECT DATE_TRUNC('day', createdAt) AS day, executionStatus, COUNT(*)
--   FROM analytics.graph_execution
--   GROUP BY 1, 2 ORDER BY 1;
--
--   -- Average cost per execution by agent
--   SELECT agentGraphName, AVG(execution_cost) AS avg_cost, COUNT(*) AS runs
--   FROM analytics.graph_execution
--   WHERE executionStatus = 'COMPLETED'
--   GROUP BY 1 ORDER BY avg_cost DESC;
--
--   -- Top error messages
--   SELECT groupedErrorMessage, COUNT(*) AS occurrences
--   FROM analytics.graph_execution
--   WHERE executionStatus = 'FAILED'
--   GROUP BY 1 ORDER BY 2 DESC LIMIT 20;
-- =============================================================
-
-SELECT
-    ge."id"                                                        AS id,
-    ge."agentGraphId"                                              AS agentGraphId,
-    ge."agentGraphVersion"                                         AS agentGraphVersion,
-    CASE
-        WHEN jsonb_exists(ge."stats"::jsonb, 'error')
-         AND (
-               (ge."stats"::jsonb->>'error') ILIKE '%insufficient balance%'
-            OR (ge."stats"::jsonb->>'error') ILIKE '%you have no credits left%'
-             )
-        THEN 'NO_CREDITS'
-        ELSE CAST(ge."executionStatus" AS TEXT)
-    END                                                            AS executionStatus,
-    ge."createdAt"                                                 AS createdAt,
-    ge."updatedAt"                                                 AS updatedAt,
-    ge."userId"                                                    AS userId,
-    g."name"                                                       AS agentGraphName,
-    (ge."stats"::jsonb->>'cputime')::decimal                       AS cputime,
-    (ge."stats"::jsonb->>'walltime')::decimal                      AS walltime,
-    (ge."stats"::jsonb->>'node_count')::decimal                    AS node_count,
-    (ge."stats"::jsonb->>'nodes_cputime')::decimal                 AS nodes_cputime,
-    (ge."stats"::jsonb->>'nodes_walltime')::decimal                AS nodes_walltime,
-    (ge."stats"::jsonb->>'cost')::decimal                          AS execution_cost,
-    (ge."stats"::jsonb->>'correctness_score')::float               AS correctness_score,
-    COALESCE(la.possibly_ai, FALSE)                                AS possibly_ai,
-    REGEXP_REPLACE(
-        REGEXP_REPLACE(
-            TRIM(BOTH '"' FROM ge."stats"::jsonb->>'error'),
-            '(https?://)([A-Za-z0-9.-]+)(:[0-9]+)?(/[^\s]*)?',
-            '\1\2/...', 'gi'
-        ),
-        '[a-zA-Z0-9_:-]*\d[a-zA-Z0-9_:-]*', '*', 'g'
-    )                                                              AS groupedErrorMessage
-FROM platform."AgentGraphExecution" ge
-LEFT JOIN platform."AgentGraph" g
-       ON ge."agentGraphId" = g."id"
-      AND ge."agentGraphVersion" = g."version"
-LEFT JOIN (
-    SELECT DISTINCT ON ("userId", "agentGraphId")
-           "userId", "agentGraphId",
-           ("settings"::jsonb->>'sensitive_action_safe_mode')::boolean AS possibly_ai
-    FROM platform."LibraryAgent"
-    WHERE "isDeleted"  = FALSE
-      AND "isArchived" = FALSE
-    ORDER BY "userId", "agentGraphId", "agentGraphVersion" DESC
-) la ON la."userId" = ge."userId" AND la."agentGraphId" = ge."agentGraphId"
-WHERE ge."createdAt" > CURRENT_DATE - INTERVAL '90 days'
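The status reclassification performed by the CASE expression in graph_execution.sql can be mirrored outside SQL, for example to unit-test the rule. The following is a minimal sketch, not code from this repository: `normalise_status` and its parameters are hypothetical names, and `stats` is assumed to be the already-parsed JSON object (or None when the column is NULL).

```python
from typing import Any, Mapping, Optional

# Substrings that mark a credit-related failure, taken from the two
# ILIKE patterns in the view's CASE expression.
NO_CREDIT_MARKERS = ("insufficient balance", "you have no credits left")


def normalise_status(execution_status: str,
                     stats: Optional[Mapping[str, Any]]) -> str:
    """Return 'NO_CREDITS' when the error text mentions missing credits.

    Hypothetical helper: mirrors the SQL rule for string error values only.
    """
    error = (stats or {}).get("error")
    if isinstance(error, str):
        lowered = error.lower()  # ILIKE is case-insensitive
        if any(marker in lowered for marker in NO_CREDIT_MARKERS):
            return "NO_CREDITS"
    return execution_status
```

Any execution whose error text does not match either marker keeps its original status, as in the ELSE branch of the view.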
--- a/autogpt_platform/analytics/queries/node_block_execution.sql
+++ b/autogpt_platform/analytics/queries/node_block_execution.sql
@@ -1,101 +0,0 @@
-- =============================================================
-- View: analytics.node_block_execution
-- Looker source alias: ds14  |  Charts: 11
-- =============================================================
-- DESCRIPTION
--   One row per node (block) execution (last 90 days).
--   Unpacks stats JSONB and joins to identify which block type
--   was run.  For failed nodes, joins the error output and
--   scrubs it for safe grouping.
--
-- SOURCE TABLES
--   platform.AgentNodeExecution              — Node execution records
--   platform.AgentNode                       — Node → block mapping
--   platform.AgentBlock                      — Block name/ID
--   platform.AgentNodeExecutionInputOutput   — Error output values
--
-- OUTPUT COLUMNS
--   id                    TEXT         Node execution UUID
--   agentGraphExecutionId TEXT         Parent graph execution UUID
--   agentNodeId           TEXT         Node UUID within the graph
--   executionStatus       TEXT         COMPLETED | FAILED | QUEUED | RUNNING | TERMINATED
--   addedTime             TIMESTAMPTZ  When the node was queued
--   queuedTime            TIMESTAMPTZ  When it entered the queue
--   startedTime           TIMESTAMPTZ  When execution started
--   endedTime             TIMESTAMPTZ  When execution finished
--   inputSize             BIGINT       Input payload size in bytes
--   outputSize            BIGINT       Output payload size in bytes
--   walltime              NUMERIC      Wall-clock seconds for this node
--   cputime               NUMERIC      CPU seconds for this node
--   llmRetryCount         INT          Number of LLM retries
--   llmCallCount          INT          Number of LLM API calls made
--   inputTokenCount       BIGINT       LLM input tokens consumed
--   outputTokenCount      BIGINT       LLM output tokens produced
--   blockName             TEXT         Human-readable block name (e.g. 'OpenAIBlock')
--   blockId               TEXT         Block UUID
--   groupedErrorMessage   TEXT         Scrubbed error (IDs/URLs wildcarded)
--   errorMessage          TEXT         Raw error output (only set when FAILED)
--
-- WINDOW
--   Rolling 90 days (addedTime > CURRENT_DATE - 90 days)
--
-- EXAMPLE QUERIES
--   -- Most-used blocks by execution count
--   SELECT blockName, COUNT(*) AS executions,
--          COUNT(*) FILTER (WHERE executionStatus = 'FAILED') AS failures
--   FROM analytics.node_block_execution
--   GROUP BY 1 ORDER BY executions DESC LIMIT 20;
--
--   -- Average LLM token usage per block
--   SELECT blockName,
--          AVG(inputTokenCount) AS avg_input_tokens,
--          AVG(outputTokenCount) AS avg_output_tokens
--   FROM analytics.node_block_execution
--   WHERE llmCallCount > 0
--   GROUP BY 1 ORDER BY avg_input_tokens DESC;
--
--   -- Top failure reasons
--   SELECT blockName, groupedErrorMessage, COUNT(*) AS count
--   FROM analytics.node_block_execution
--   WHERE executionStatus = 'FAILED'
--   GROUP BY 1, 2 ORDER BY count DESC LIMIT 20;
-- =============================================================
-
-SELECT
-    ne."id"                                                            AS id,
-    ne."agentGraphExecutionId"                                         AS agentGraphExecutionId,
-    ne."agentNodeId"                                                   AS agentNodeId,
-    CAST(ne."executionStatus" AS TEXT)                                 AS executionStatus,
-    ne."addedTime"                                                     AS addedTime,
-    ne."queuedTime"                                                    AS queuedTime,
-    ne."startedTime"                                                   AS startedTime,
-    ne."endedTime"                                                     AS endedTime,
-    (ne."stats"::jsonb->>'input_size')::bigint                         AS inputSize,
-    (ne."stats"::jsonb->>'output_size')::bigint                        AS outputSize,
-    (ne."stats"::jsonb->>'walltime')::numeric                          AS walltime,
-    (ne."stats"::jsonb->>'cputime')::numeric                           AS cputime,
-    (ne."stats"::jsonb->>'llm_retry_count')::int                       AS llmRetryCount,
-    (ne."stats"::jsonb->>'llm_call_count')::int                        AS llmCallCount,
-    (ne."stats"::jsonb->>'input_token_count')::bigint                  AS inputTokenCount,
-    (ne."stats"::jsonb->>'output_token_count')::bigint                 AS outputTokenCount,
-    b."name"                                                           AS blockName,
-    b."id"                                                             AS blockId,
-    REGEXP_REPLACE(
-        REGEXP_REPLACE(
-            TRIM(BOTH '"' FROM eio."data"::text),
-            '(https?://)([A-Za-z0-9.-]+)(:[0-9]+)?(/[^\s]*)?',
-            '\1\2/...', 'gi'
-        ),
-        '[a-zA-Z0-9_:-]*\d[a-zA-Z0-9_:-]*', '*', 'g'
-    )                                                                  AS groupedErrorMessage,
-    eio."data"                                                         AS errorMessage
-FROM platform."AgentNodeExecution" ne
-LEFT JOIN platform."AgentNode" nd
-       ON ne."agentNodeId" = nd."id"
-LEFT JOIN platform."AgentBlock" b
-       ON nd."agentBlockId" = b."id"
-LEFT JOIN platform."AgentNodeExecutionInputOutput" eio
-       ON eio."referencedByOutputExecId" = ne."id"
-      AND eio."name" = 'error'
-      AND ne."executionStatus" = 'FAILED'
-WHERE ne."addedTime" > CURRENT_DATE - INTERVAL '90 days'
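The two nested REGEXP_REPLACE calls that produce groupedErrorMessage can be approximated in Python to check how a given error string would be grouped. This is a sketch under assumptions: `scrub_error` is a hypothetical name, and Python's `re` engine is assumed to treat these particular patterns the same way PostgreSQL does.

```python
import re
from typing import Optional

# Step 1: keep scheme and host, drop port and path (flags 'gi' in the SQL).
URL_RE = re.compile(r"(https?://)([A-Za-z0-9.-]+)(:[0-9]+)?(/[^\s]*)?",
                    re.IGNORECASE)
# Step 2: any token that contains at least one digit becomes '*'.
ID_RE = re.compile(r"[a-zA-Z0-9_:-]*\d[a-zA-Z0-9_:-]*")


def scrub_error(message: Optional[str]) -> Optional[str]:
    """Replace URLs paths and digit-bearing tokens so errors group together."""
    if message is None:
        return None
    trimmed = message.strip('"')  # TRIM(BOTH '"' FROM ...)
    without_paths = URL_RE.sub(r"\1\2/...", trimmed)
    return ID_RE.sub("*", without_paths)
```

Because the second step wildcards every token containing a digit, values such as durations ("30s") are collapsed along with IDs, which is what makes the result safe to group on.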
--- a/autogpt_platform/analytics/queries/retention_agent.sql
+++ b/autogpt_platform/analytics/queries/retention_agent.sql
@@ -1,97 +0,0 @@
-- =============================================================
-- View: analytics.retention_agent
-- Looker source alias: ds35  |  Charts: 2
-- =============================================================
-- DESCRIPTION
--   Weekly cohort retention broken down per individual agent.
--   Cohort = week of a user's first use of THAT specific agent.
--   Tells you which agents keep users coming back vs. one-shot
--   use. Only includes cohorts from the last 180 days.
--
-- SOURCE TABLES
--   platform.AgentGraphExecution  — Execution records (user × agent × time)
--   platform.AgentGraph           — Agent names
--
-- OUTPUT COLUMNS
--   agent_id            TEXT   Agent graph UUID
--   agent_label         TEXT   'AgentName [first8chars]'
--   agent_label_n       TEXT   'AgentName [first8chars] (n=total_users)'
--   cohort_week_start   DATE   Week users first ran this agent
--   cohort_label        TEXT   ISO week label
--   cohort_label_n      TEXT   ISO week label with cohort size
--   user_lifetime_week  INT    Weeks since first use of this agent
--   cohort_users        BIGINT Users in this cohort for this agent
--   active_users        BIGINT Users who ran the agent again in week k
--   retention_rate      FLOAT  active_users / cohort_users
--   cohort_users_w0     BIGINT cohort_users only at week 0 (safe to SUM)
--   agent_total_users   BIGINT Total users across all cohorts for this agent
--
-- EXAMPLE QUERIES
--   -- Best-retained agents at week 2
--   SELECT agent_label, AVG(retention_rate) AS w2_retention
--   FROM analytics.retention_agent
--   WHERE user_lifetime_week = 2 AND cohort_users >= 10
--   GROUP BY 1 ORDER BY w2_retention DESC LIMIT 10;
--
--   -- Agents with most unique users
--   SELECT DISTINCT agent_label, agent_total_users
--   FROM analytics.retention_agent
--   ORDER BY agent_total_users DESC LIMIT 20;
-- =============================================================
-
-WITH params AS (SELECT 12::int AS max_weeks, (CURRENT_DATE - INTERVAL '180 days') AS cohort_start),
-events AS (
-  SELECT e."userId"::text AS user_id, e."agentGraphId" AS agent_id,
-         e."createdAt"::timestamptz AS created_at,
-         DATE_TRUNC('week', e."createdAt")::date AS week_start
-  FROM platform."AgentGraphExecution" e
-),
-first_use AS (
-  SELECT user_id, agent_id, MIN(created_at) AS first_use_at,
-         DATE_TRUNC('week', MIN(created_at))::date AS cohort_week_start
-  FROM events GROUP BY 1,2
-  HAVING MIN(created_at) >= (SELECT cohort_start FROM params)
-),
-activity_weeks AS (SELECT DISTINCT user_id, agent_id, week_start FROM events),
-user_week_age AS (
-  SELECT aw.user_id, aw.agent_id, fu.cohort_week_start,
-         ((aw.week_start - DATE_TRUNC('week',fu.first_use_at)::date)/7)::int AS user_lifetime_week
-  FROM activity_weeks aw JOIN first_use fu USING (user_id, agent_id)
-  WHERE aw.week_start >= DATE_TRUNC('week',fu.first_use_at)::date
-),
-active_counts AS (
-  SELECT agent_id, cohort_week_start, user_lifetime_week, COUNT(DISTINCT user_id) AS active_users
-  FROM user_week_age WHERE user_lifetime_week >= 0 GROUP BY 1,2,3
-),
-cohort_sizes AS (
-  SELECT agent_id, cohort_week_start, COUNT(DISTINCT user_id) AS cohort_users FROM first_use GROUP BY 1,2
-),
-cohort_caps AS (
-  SELECT cs.agent_id, cs.cohort_week_start, cs.cohort_users,
-         LEAST((SELECT max_weeks FROM params),
-               GREATEST(0,((DATE_TRUNC('week',CURRENT_DATE)::date-cs.cohort_week_start)/7)::int)) AS cap_weeks
-  FROM cohort_sizes cs
-),
-grid AS (
-  SELECT cc.agent_id, cc.cohort_week_start, gs AS user_lifetime_week, cc.cohort_users
-  FROM cohort_caps cc CROSS JOIN LATERAL generate_series(0, cc.cap_weeks) gs
-),
-agent_names AS (SELECT DISTINCT ON (g."id") g."id" AS agent_id, g."name" AS agent_name FROM platform."AgentGraph" g ORDER BY g."id", g."version" DESC),
-agent_total_users AS (SELECT agent_id, SUM(cohort_users) AS agent_total_users FROM cohort_sizes GROUP BY 1)
-SELECT
-  g.agent_id,
-  COALESCE(an.agent_name,'(unnamed)')||' ['||LEFT(g.agent_id::text,8)||']'  AS agent_label,
-  COALESCE(an.agent_name,'(unnamed)')||' ['||LEFT(g.agent_id::text,8)||'] (n='||COALESCE(atu.agent_total_users,0)||')' AS agent_label_n,
-  g.cohort_week_start,
-  TO_CHAR(g.cohort_week_start,'IYYY-"W"IW')                               AS cohort_label,
-  TO_CHAR(g.cohort_week_start,'IYYY-"W"IW')||' (n='||g.cohort_users||')'  AS cohort_label_n,
-  g.user_lifetime_week, g.cohort_users,
-  COALESCE(ac.active_users,0)                                              AS active_users,
-  COALESCE(ac.active_users,0)::float / NULLIF(g.cohort_users,0)           AS retention_rate,
-  CASE WHEN g.user_lifetime_week=0 THEN g.cohort_users ELSE 0 END         AS cohort_users_w0,
-  COALESCE(atu.agent_total_users,0)                                        AS agent_total_users
-FROM grid g
-LEFT JOIN active_counts     ac  ON ac.agent_id=g.agent_id AND ac.cohort_week_start=g.cohort_week_start AND ac.user_lifetime_week=g.user_lifetime_week
-LEFT JOIN agent_names       an  ON an.agent_id=g.agent_id
-LEFT JOIN agent_total_users atu ON atu.agent_id=g.agent_id
-ORDER BY agent_label, g.cohort_week_start, g.user_lifetime_week;
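The week arithmetic used by the weekly retention views (Monday-based week starts, whole-week offsets, ISO week labels, and the per-cohort cap) can be sketched in Python as follows. The function names are hypothetical and chosen to echo the SQL column names; the default of 12 weeks is taken from the `params` CTE.

```python
from datetime import date, timedelta


def week_start(day: date) -> date:
    """Monday of the week containing `day`, like DATE_TRUNC('week', ...)::date."""
    return day - timedelta(days=day.weekday())


def lifetime_week(activity_day: date, first_use_day: date) -> int:
    """Whole weeks between the activity week and the first-use week."""
    return (week_start(activity_day) - week_start(first_use_day)).days // 7


def cohort_label(cohort_week_start: date) -> str:
    """ISO year and week, like TO_CHAR(..., 'IYYY-"W"IW')."""
    iso_year, iso_week, _ = cohort_week_start.isocalendar()
    return f"{iso_year}-W{iso_week:02d}"


def cap_weeks(cohort_week_start: date, today: date, max_weeks: int = 12) -> int:
    """Number of lifetime weeks a cohort can have reached so far, capped."""
    elapsed = (week_start(today) - cohort_week_start).days // 7
    return min(max_weeks, max(0, elapsed))
```

The cap is what keeps young cohorts from showing artificial zero-retention rows for weeks that have not happened yet.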
--- a/autogpt_platform/analytics/queries/retention_execution_daily.sql
+++ b/autogpt_platform/analytics/queries/retention_execution_daily.sql
@@ -1,81 +0,0 @@
-- =============================================================
-- View: analytics.retention_execution_daily
-- Looker source alias: ds111  |  Charts: 1
-- =============================================================
-- DESCRIPTION
--   Daily cohort retention based on agent executions.
--   Cohort anchor = day of user's FIRST ever execution.
--   Only includes cohorts from the last 90 days, up to day 30.
--   Great for early engagement analysis (did users run another
--   agent the next day?).
--
-- SOURCE TABLES
--   platform.AgentGraphExecution  — Execution records
--
-- OUTPUT COLUMNS
--   Same pattern as retention_login_daily.
--   cohort_day_start = day of first execution (not first login)
--
-- EXAMPLE QUERIES
--   -- Day-3 execution retention
--   SELECT cohort_label, retention_rate_bounded AS d3_retention
--   FROM analytics.retention_execution_daily
--   WHERE user_lifetime_day = 3 ORDER BY cohort_day_start;
-- =============================================================
-
-WITH params AS (SELECT 30::int AS max_days, (CURRENT_DATE - INTERVAL '90 days') AS cohort_start),
-events AS (
-  SELECT e."userId"::text AS user_id, e."createdAt"::timestamptz AS created_at,
-         DATE_TRUNC('day', e."createdAt")::date AS day_start
-  FROM platform."AgentGraphExecution" e WHERE e."userId" IS NOT NULL
-),
-first_exec AS (
-  SELECT user_id, MIN(created_at) AS first_exec_at,
-         DATE_TRUNC('day', MIN(created_at))::date AS cohort_day_start
-  FROM events GROUP BY 1
-  HAVING MIN(created_at) >= (SELECT cohort_start FROM params)
-),
-activity_days AS (SELECT DISTINCT user_id, day_start FROM events),
-user_day_age AS (
-  SELECT ad.user_id, fe.cohort_day_start,
-         (ad.day_start - DATE_TRUNC('day',fe.first_exec_at)::date)::int AS user_lifetime_day
-  FROM activity_days ad JOIN first_exec fe USING (user_id)
-  WHERE ad.day_start >= DATE_TRUNC('day',fe.first_exec_at)::date
-),
-bounded_counts AS (
-  SELECT cohort_day_start, user_lifetime_day, COUNT(DISTINCT user_id) AS active_users_bounded
-  FROM user_day_age WHERE user_lifetime_day >= 0 GROUP BY 1,2
-),
-last_active AS (
-  SELECT cohort_day_start, user_id, MAX(user_lifetime_day) AS last_active_day FROM user_day_age GROUP BY 1,2
-),
-unbounded_counts AS (
-  SELECT la.cohort_day_start, gs AS user_lifetime_day, COUNT(*) AS retained_users_unbounded
-  FROM last_active la
-  CROSS JOIN LATERAL generate_series(0, LEAST(la.last_active_day,(SELECT max_days FROM params))) gs
-  GROUP BY 1,2
-),
-cohort_sizes AS (SELECT cohort_day_start, COUNT(DISTINCT user_id) AS cohort_users FROM first_exec GROUP BY 1),
-cohort_caps AS (
-  SELECT cs.cohort_day_start, cs.cohort_users,
-         LEAST((SELECT max_days FROM params), GREATEST(0,(CURRENT_DATE-cs.cohort_day_start)::int)) AS cap_days
-  FROM cohort_sizes cs
-),
-grid AS (
-  SELECT cc.cohort_day_start, gs AS user_lifetime_day, cc.cohort_users
-  FROM cohort_caps cc CROSS JOIN LATERAL generate_series(0, cc.cap_days) gs
-)
-SELECT
-  g.cohort_day_start,
-  TO_CHAR(g.cohort_day_start,'YYYY-MM-DD')                                AS cohort_label,
-  TO_CHAR(g.cohort_day_start,'YYYY-MM-DD')||' (n='||g.cohort_users||')'   AS cohort_label_n,
-  g.user_lifetime_day, g.cohort_users,
-  COALESCE(b.active_users_bounded,0)     AS active_users_bounded,
-  COALESCE(u.retained_users_unbounded,0) AS retained_users_unbounded,
-  CASE WHEN g.cohort_users>0 THEN COALESCE(b.active_users_bounded,0)::float/g.cohort_users END    AS retention_rate_bounded,
-  CASE WHEN g.cohort_users>0 THEN COALESCE(u.retained_users_unbounded,0)::float/g.cohort_users END AS retention_rate_unbounded,
-  CASE WHEN g.user_lifetime_day=0 THEN g.cohort_users ELSE 0 END          AS cohort_users_d0
-FROM grid g
-LEFT JOIN bounded_counts   b ON b.cohort_day_start=g.cohort_day_start AND b.user_lifetime_day=g.user_lifetime_day
-LEFT JOIN unbounded_counts u ON u.cohort_day_start=g.cohort_day_start AND u.user_lifetime_day=g.user_lifetime_day
-ORDER BY g.cohort_day_start, g.user_lifetime_day;
--- a/autogpt_platform/analytics/queries/retention_execution_weekly.sql
+++ b/autogpt_platform/analytics/queries/retention_execution_weekly.sql
@@ -1,81 +0,0 @@
-- =============================================================
-- View: analytics.retention_execution_weekly
-- Looker source alias: ds92  |  Charts: 2
-- =============================================================
-- DESCRIPTION
--   Weekly cohort retention based on agent executions.
--   Cohort anchor = week of user's FIRST ever agent execution
--   (not first login). Only includes cohorts from the last 180 days.
--   Useful when you care about product engagement, not just visits.
--
-- SOURCE TABLES
--   platform.AgentGraphExecution  — Execution records
--
-- OUTPUT COLUMNS
--   Same pattern as retention_login_weekly.
--   cohort_week_start = week of first execution (not first login)
--
-- EXAMPLE QUERIES
--   -- Week-2 execution retention
--   SELECT cohort_label, retention_rate_bounded
--   FROM analytics.retention_execution_weekly
--   WHERE user_lifetime_week = 2 ORDER BY cohort_week_start;
-- =============================================================
-
-WITH params AS (SELECT 12::int AS max_weeks, (CURRENT_DATE - INTERVAL '180 days') AS cohort_start),
-events AS (
-  SELECT e."userId"::text AS user_id, e."createdAt"::timestamptz AS created_at,
-         DATE_TRUNC('week', e."createdAt")::date AS week_start
-  FROM platform."AgentGraphExecution" e WHERE e."userId" IS NOT NULL
-),
-first_exec AS (
-  SELECT user_id, MIN(created_at) AS first_exec_at,
-         DATE_TRUNC('week', MIN(created_at))::date AS cohort_week_start
-  FROM events GROUP BY 1
-  HAVING MIN(created_at) >= (SELECT cohort_start FROM params)
-),
-activity_weeks AS (SELECT DISTINCT user_id, week_start FROM events),
-user_week_age AS (
-  SELECT aw.user_id, fe.cohort_week_start,
-         ((aw.week_start - DATE_TRUNC('week',fe.first_exec_at)::date)/7)::int AS user_lifetime_week
-  FROM activity_weeks aw JOIN first_exec fe USING (user_id)
-  WHERE aw.week_start >= DATE_TRUNC('week',fe.first_exec_at)::date
-),
-bounded_counts AS (
-  SELECT cohort_week_start, user_lifetime_week, COUNT(DISTINCT user_id) AS active_users_bounded
-  FROM user_week_age WHERE user_lifetime_week >= 0 GROUP BY 1,2
-),
-last_active AS (
-  SELECT cohort_week_start, user_id, MAX(user_lifetime_week) AS last_active_week FROM user_week_age GROUP BY 1,2
-),
-unbounded_counts AS (
-  SELECT la.cohort_week_start, gs AS user_lifetime_week, COUNT(*) AS retained_users_unbounded
-  FROM last_active la
-  CROSS JOIN LATERAL generate_series(0, LEAST(la.last_active_week,(SELECT max_weeks FROM params))) gs
-  GROUP BY 1,2
-),
-cohort_sizes AS (SELECT cohort_week_start, COUNT(DISTINCT user_id) AS cohort_users FROM first_exec GROUP BY 1),
-cohort_caps AS (
-  SELECT cs.cohort_week_start, cs.cohort_users,
-         LEAST((SELECT max_weeks FROM params),
-               GREATEST(0,((DATE_TRUNC('week',CURRENT_DATE)::date-cs.cohort_week_start)/7)::int)) AS cap_weeks
-  FROM cohort_sizes cs
-),
-grid AS (
-  SELECT cc.cohort_week_start, gs AS user_lifetime_week, cc.cohort_users
-  FROM cohort_caps cc CROSS JOIN LATERAL generate_series(0, cc.cap_weeks) gs
-)
-SELECT
-  g.cohort_week_start,
-  TO_CHAR(g.cohort_week_start,'IYYY-"W"IW')                               AS cohort_label,
-  TO_CHAR(g.cohort_week_start,'IYYY-"W"IW')||' (n='||g.cohort_users||')'  AS cohort_label_n,
-  g.user_lifetime_week, g.cohort_users,
-  COALESCE(b.active_users_bounded,0)     AS active_users_bounded,
-  COALESCE(u.retained_users_unbounded,0) AS retained_users_unbounded,
-  CASE WHEN g.cohort_users>0 THEN COALESCE(b.active_users_bounded,0)::float/g.cohort_users END    AS retention_rate_bounded,
-  CASE WHEN g.cohort_users>0 THEN COALESCE(u.retained_users_unbounded,0)::float/g.cohort_users END AS retention_rate_unbounded,
-  CASE WHEN g.user_lifetime_week=0 THEN g.cohort_users ELSE 0 END         AS cohort_users_w0
-FROM grid g
-LEFT JOIN bounded_counts   b ON b.cohort_week_start=g.cohort_week_start AND b.user_lifetime_week=g.user_lifetime_week
-LEFT JOIN unbounded_counts u ON u.cohort_week_start=g.cohort_week_start AND u.user_lifetime_week=g.user_lifetime_week
-ORDER BY g.cohort_week_start, g.user_lifetime_week;
--- a/autogpt_platform/analytics/queries/retention_login_daily.sql
+++ b/autogpt_platform/analytics/queries/retention_login_daily.sql
@@ -1,94 +0,0 @@
-- =============================================================
-- View: analytics.retention_login_daily
-- Looker source alias: ds112  |  Charts: 1
-- =============================================================
-- DESCRIPTION
--   Daily cohort retention based on login sessions.
--   Same logic as retention_login_weekly but at day granularity,
--   showing up to day 30 for cohorts from the last 90 days.
--   Useful for analysing early activation (days 1-7) in detail.
--
-- SOURCE TABLES
--   auth.sessions  — Login session records
--
-- OUTPUT COLUMNS (same pattern as retention_login_weekly)
--   cohort_day_start          DATE     First day the cohort logged in
--   cohort_label              TEXT     Date string (e.g. '2025-03-01')
--   cohort_label_n            TEXT     Date + cohort size (e.g. '2025-03-01 (n=12)')
--   user_lifetime_day         INT      Days since first login (0 = first-login day)
--   cohort_users              BIGINT   Total users in cohort
--   active_users_bounded      BIGINT   Users active on exactly day k
--   retained_users_unbounded  BIGINT   Users active any time on/after day k
--   retention_rate_bounded    FLOAT    bounded / cohort_users
--   retention_rate_unbounded  FLOAT    unbounded / cohort_users
--   cohort_users_d0           BIGINT   cohort_users only at day 0, else 0 (safe to SUM)
--
-- EXAMPLE QUERIES
--   -- Day-1 retention rate (came back next day)
--   SELECT cohort_label, retention_rate_bounded AS d1_retention
--   FROM analytics.retention_login_daily
--   WHERE user_lifetime_day = 1 ORDER BY cohort_day_start;
--
--   -- Average retention curve across all cohorts
--   SELECT user_lifetime_day,
--          SUM(active_users_bounded)::float / NULLIF(SUM(cohort_users), 0) AS avg_retention
--   FROM analytics.retention_login_daily
--   GROUP BY 1 ORDER BY 1;
-- =============================================================
-
-WITH params AS (SELECT 30::int AS max_days, (CURRENT_DATE - INTERVAL '90 days')::date AS cohort_start),
-events AS (
-  SELECT s.user_id::text AS user_id, s.created_at::timestamptz AS created_at,
-         DATE_TRUNC('day', s.created_at)::date AS day_start
-  FROM auth.sessions s WHERE s.user_id IS NOT NULL
-),
-first_login AS (
-  SELECT user_id, MIN(created_at) AS first_login_time,
-         DATE_TRUNC('day', MIN(created_at))::date AS cohort_day_start
-  FROM events GROUP BY 1
-  HAVING MIN(created_at) >= (SELECT cohort_start FROM params)
-),
-activity_days AS (SELECT DISTINCT user_id, day_start FROM events),
-user_day_age AS (
-  SELECT ad.user_id, fl.cohort_day_start,
-         (ad.day_start - DATE_TRUNC('day', fl.first_login_time)::date)::int AS user_lifetime_day
-  FROM activity_days ad JOIN first_login fl USING (user_id)
-  WHERE ad.day_start >= DATE_TRUNC('day', fl.first_login_time)::date
-),
-bounded_counts AS (
-  SELECT cohort_day_start, user_lifetime_day, COUNT(DISTINCT user_id) AS active_users_bounded
-  FROM user_day_age WHERE user_lifetime_day >= 0 GROUP BY 1,2
-),
-last_active AS (
-  SELECT cohort_day_start, user_id, MAX(user_lifetime_day) AS last_active_day FROM user_day_age GROUP BY 1,2
-),
-unbounded_counts AS (
-  SELECT la.cohort_day_start, gs AS user_lifetime_day, COUNT(*) AS retained_users_unbounded
-  FROM last_active la
-  CROSS JOIN LATERAL generate_series(0, LEAST(la.last_active_day,(SELECT max_days FROM params))) gs
-  GROUP BY 1,2
-),
-cohort_sizes AS (SELECT cohort_day_start, COUNT(DISTINCT user_id) AS cohort_users FROM first_login GROUP BY 1),
-cohort_caps AS (
-  SELECT cs.cohort_day_start, cs.cohort_users,
-         LEAST((SELECT max_days FROM params), GREATEST(0,(CURRENT_DATE-cs.cohort_day_start)::int)) AS cap_days
-  FROM cohort_sizes cs
-),
-grid AS (
-  SELECT cc.cohort_day_start, gs AS user_lifetime_day, cc.cohort_users
-  FROM cohort_caps cc CROSS JOIN LATERAL generate_series(0, cc.cap_days) gs
-)
-SELECT
-  g.cohort_day_start,
-  TO_CHAR(g.cohort_day_start,'YYYY-MM-DD')                                  AS cohort_label,
-  TO_CHAR(g.cohort_day_start,'YYYY-MM-DD')||' (n='||g.cohort_users||')'     AS cohort_label_n,
-  g.user_lifetime_day, g.cohort_users,
-  COALESCE(b.active_users_bounded,0)     AS active_users_bounded,
-  COALESCE(u.retained_users_unbounded,0) AS retained_users_unbounded,
-  CASE WHEN g.cohort_users>0 THEN COALESCE(b.active_users_bounded,0)::float/g.cohort_users END    AS retention_rate_bounded,
-  CASE WHEN g.cohort_users>0 THEN COALESCE(u.retained_users_unbounded,0)::float/g.cohort_users END AS retention_rate_unbounded,
-  CASE WHEN g.user_lifetime_day=0 THEN g.cohort_users ELSE 0 END            AS cohort_users_d0
-FROM grid g
-LEFT JOIN bounded_counts   b ON b.cohort_day_start=g.cohort_day_start AND b.user_lifetime_day=g.user_lifetime_day
-LEFT JOIN unbounded_counts u ON u.cohort_day_start=g.cohort_day_start AND u.user_lifetime_day=g.user_lifetime_day
-ORDER BY g.cohort_day_start, g.user_lifetime_day;
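The difference between the bounded and unbounded columns is easiest to see on a small in-memory example. The sketch below reimplements the daily cohort logic in Python under simplifying assumptions: `daily_retention` is a hypothetical name, events are plain `(user_id, date)` pairs, and the 90-day cohort filter from the `params` CTE is omitted.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Tuple


def daily_retention(events: Iterable[Tuple[str, date]], today: date,
                    max_days: int = 30) -> List[Dict[str, object]]:
    """One row per (cohort day, lifetime day), like the view's final SELECT."""
    days_by_user = defaultdict(set)
    for user_id, day in events:
        days_by_user[user_id].add(day)

    cohort_users = defaultdict(int)
    bounded = defaultdict(int)    # active on exactly day k
    unbounded = defaultdict(int)  # active on day k or any later day
    for days in days_by_user.values():
        cohort = min(days)  # cohort anchor = first active day
        cohort_users[cohort] += 1
        ages = {(d - cohort).days for d in days}
        for age in ages:
            bounded[(cohort, age)] += 1
        for age in range(min(max(ages), max_days) + 1):
            unbounded[(cohort, age)] += 1

    rows = []
    for cohort in sorted(cohort_users):
        n = cohort_users[cohort]
        # Do not emit lifetime days that have not happened yet.
        cap_days = min(max_days, max(0, (today - cohort).days))
        for age in range(cap_days + 1):
            b = bounded.get((cohort, age), 0)
            u = unbounded.get((cohort, age), 0)
            rows.append({
                "cohort_day_start": cohort,
                "user_lifetime_day": age,
                "cohort_users": n,
                "active_users_bounded": b,
                "retained_users_unbounded": u,
                "retention_rate_bounded": b / n,
                "retention_rate_unbounded": u / n,
                "cohort_users_d0": n if age == 0 else 0,
            })
    return rows
```

A user who is active on days 0, 1 and 3 counts as unbounded-retained on day 2 (they came back later) but not as bounded-active on day 2, which is why the unbounded rate never increases with lifetime day while the bounded rate can.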
--- a/autogpt_platform/analytics/queries/retention_login_onboarded_weekly.sql
+++ b/autogpt_platform/analytics/queries/retention_login_onboarded_weekly.sql
@@ -1,96 +0,0 @@
-- =============================================================
-- View: analytics.retention_login_onboarded_weekly
-- Looker source alias: ds101  |  Charts: 2
-- =============================================================
-- DESCRIPTION
--   Weekly cohort retention from login sessions, restricted to
--   users who "onboarded" — defined as running at least one
--   agent within 365 days of their first login.
--   Filters out users who signed up but never activated,
--   giving a cleaner view of engaged-user retention.
--
-- SOURCE TABLES
--   auth.sessions                  — Login session records
--   platform.AgentGraphExecution   — Used to identify onboarders
--
-- OUTPUT COLUMNS
--   Same as retention_login_weekly (cohort_week_start, user_lifetime_week,
--   retention_rate_bounded, retention_rate_unbounded, etc.)
--   Only difference: cohort is filtered to onboarded users only.
--
-- EXAMPLE QUERIES
--   -- Compare week-4 retention: all users vs onboarded only
--   SELECT 'all_users' AS segment, AVG(retention_rate_bounded) AS w4_retention
--   FROM analytics.retention_login_weekly WHERE user_lifetime_week = 4
--   UNION ALL
--   SELECT 'onboarded', AVG(retention_rate_bounded)
--   FROM analytics.retention_login_onboarded_weekly WHERE user_lifetime_week = 4;
-- =============================================================
-
-WITH params AS (SELECT 12::int AS max_weeks, 365::int AS onboarding_window_days),
-events AS (
-  SELECT s.user_id::text AS user_id, s.created_at::timestamptz AS created_at,
-         DATE_TRUNC('week', s.created_at)::date AS week_start
-  FROM auth.sessions s WHERE s.user_id IS NOT NULL
-),
-first_login_all AS (
-  SELECT user_id, MIN(created_at) AS first_login_time,
-         DATE_TRUNC('week', MIN(created_at))::date AS cohort_week_start
-  FROM events GROUP BY 1
-),
-onboarders AS (
-  SELECT fl.user_id FROM first_login_all fl
-  WHERE EXISTS (
-    SELECT 1 FROM platform."AgentGraphExecution" e
-    WHERE e."userId"::text = fl.user_id
-      AND e."createdAt" >= fl.first_login_time
-      AND e."createdAt" < fl.first_login_time
-          + make_interval(days => (SELECT onboarding_window_days FROM params))
-  )
-),
-first_login AS (SELECT * FROM first_login_all WHERE user_id IN (SELECT user_id FROM onboarders)),
-activity_weeks AS (SELECT DISTINCT user_id, week_start FROM events),
-user_week_age AS (
-  SELECT aw.user_id, fl.cohort_week_start,
-         ((aw.week_start - DATE_TRUNC('week',fl.first_login_time)::date)/7)::int AS user_lifetime_week
-  FROM activity_weeks aw JOIN first_login fl USING (user_id)
-  WHERE aw.week_start >= DATE_TRUNC('week',fl.first_login_time)::date
-),
-bounded_counts AS (
-  SELECT cohort_week_start, user_lifetime_week, COUNT(DISTINCT user_id) AS active_users_bounded
-  FROM user_week_age WHERE user_lifetime_week >= 0 GROUP BY 1,2
-),
-last_active AS (
-  SELECT cohort_week_start, user_id, MAX(user_lifetime_week) AS last_active_week FROM user_week_age GROUP BY 1,2
-),
-unbounded_counts AS (
-  SELECT la.cohort_week_start, gs AS user_lifetime_week, COUNT(*) AS retained_users_unbounded
-  FROM last_active la
-  CROSS JOIN LATERAL generate_series(0, LEAST(la.last_active_week,(SELECT max_weeks FROM params))) gs
-  GROUP BY 1,2
-),
-cohort_sizes AS (SELECT cohort_week_start, COUNT(DISTINCT user_id) AS cohort_users FROM first_login GROUP BY 1),
-cohort_caps AS (
-  SELECT cs.cohort_week_start, cs.cohort_users,
-         LEAST((SELECT max_weeks FROM params),
-               GREATEST(0,((DATE_TRUNC('week',CURRENT_DATE)::date-cs.cohort_week_start)/7)::int)) AS cap_weeks
-  FROM cohort_sizes cs
-),
-grid AS (
-  SELECT cc.cohort_week_start, gs AS user_lifetime_week, cc.cohort_users
-  FROM cohort_caps cc CROSS JOIN LATERAL generate_series(0, cc.cap_weeks) gs
-)
-SELECT
-  g.cohort_week_start,
-  TO_CHAR(g.cohort_week_start,'IYYY-"W"IW')                               AS cohort_label,
-  TO_CHAR(g.cohort_week_start,'IYYY-"W"IW')||' (n='||g.cohort_users||')'  AS cohort_label_n,
-  g.user_lifetime_week, g.cohort_users,
-  COALESCE(b.active_users_bounded,0)     AS active_users_bounded,
-  COALESCE(u.retained_users_unbounded,0) AS retained_users_unbounded,
-  CASE WHEN g.cohort_users>0 THEN COALESCE(b.active_users_bounded,0)::float/g.cohort_users END    AS retention_rate_bounded,
-  CASE WHEN g.cohort_users>0 THEN COALESCE(u.retained_users_unbounded,0)::float/g.cohort_users END AS retention_rate_unbounded,
-  CASE WHEN g.user_lifetime_week=0 THEN g.cohort_users ELSE 0 END         AS cohort_users_w0
-FROM grid g
-LEFT JOIN bounded_counts   b ON b.cohort_week_start=g.cohort_week_start AND b.user_lifetime_week=g.user_lifetime_week
-LEFT JOIN unbounded_counts u ON u.cohort_week_start=g.cohort_week_start AND u.user_lifetime_week=g.user_lifetime_week
-ORDER BY g.cohort_week_start, g.user_lifetime_week;
--- a/autogpt_platform/analytics/queries/retention_login_weekly.sql
+++ b/autogpt_platform/analytics/queries/retention_login_weekly.sql
@@ -1,103 +0,0 @@
-- =============================================================
-- View: analytics.retention_login_weekly
-- Looker source alias: ds83  |  Charts: 2
-- =============================================================
-- DESCRIPTION
--   Weekly cohort retention based on login sessions.
--   Users are grouped by the ISO week of their first ever login.
--   For each cohort × lifetime-week combination, outputs both:
--     - bounded rate: % active in exactly that week
--     - unbounded rate: % who were ever active on or after that week
--   Weeks are capped to the cohort's actual age (no future data points).
--
-- SOURCE TABLES
--   auth.sessions  — Login session records
--
-- HOW TO READ THE OUTPUT
--   cohort_week_start   The Monday of the week users first logged in
--   user_lifetime_week  0 = first-login week, 1 = one week later, etc.
--   retention_rate_bounded   = active_users_bounded / cohort_users
--   retention_rate_unbounded = retained_users_unbounded / cohort_users
--
-- OUTPUT COLUMNS
--   cohort_week_start         DATE     Monday of the cohort's first-login week
--   cohort_label              TEXT     ISO week label (e.g. '2025-W01')
--   cohort_label_n            TEXT     ISO week label with cohort size (e.g. '2025-W01 (n=42)')
--   user_lifetime_week        INT      Weeks since first login (0 = first-login week)
--   cohort_users              BIGINT   Total users in this cohort (denominator)
--   active_users_bounded      BIGINT   Users active in exactly week k
--   retained_users_unbounded  BIGINT   Users active any time on/after week k
--   retention_rate_bounded    FLOAT    bounded active / cohort_users
--   retention_rate_unbounded  FLOAT    unbounded retained / cohort_users
--   cohort_users_w0           BIGINT   cohort_users only at week 0, else 0 (safe to SUM in pivot tables)
--
-- EXAMPLE QUERIES
--   -- Week-1 retention rate per cohort
--   SELECT cohort_label, retention_rate_bounded AS w1_retention
--   FROM analytics.retention_login_weekly
--   WHERE user_lifetime_week = 1
--   ORDER BY cohort_week_start;
--
--   -- Overall average retention curve (all cohorts combined)
--   SELECT user_lifetime_week,
--          SUM(active_users_bounded)::float / NULLIF(SUM(cohort_users_w0), 0) AS avg_retention
--   FROM analytics.retention_login_weekly
--   GROUP BY 1 ORDER BY 1;
-- =============================================================
-
-WITH params AS (SELECT 12::int AS max_weeks),
-events AS (
-  SELECT s.user_id::text AS user_id, s.created_at::timestamptz AS created_at,
-         DATE_TRUNC('week', s.created_at)::date AS week_start
-  FROM auth.sessions s WHERE s.user_id IS NOT NULL
-),
-first_login AS (
-  SELECT user_id, MIN(created_at) AS first_login_time,
-         DATE_TRUNC('week', MIN(created_at))::date AS cohort_week_start
-  FROM events GROUP BY 1
-),
-activity_weeks AS (SELECT DISTINCT user_id, week_start FROM events),
-user_week_age AS (
-  SELECT aw.user_id, fl.cohort_week_start,
-         ((aw.week_start - DATE_TRUNC('week', fl.first_login_time)::date) / 7)::int AS user_lifetime_week
-  FROM activity_weeks aw JOIN first_login fl USING (user_id)
-  WHERE aw.week_start >= DATE_TRUNC('week', fl.first_login_time)::date
-),
-bounded_counts AS (
-  SELECT cohort_week_start, user_lifetime_week, COUNT(DISTINCT user_id) AS active_users_bounded
-  FROM user_week_age WHERE user_lifetime_week >= 0 GROUP BY 1,2
-),
-last_active AS (
-  SELECT cohort_week_start, user_id, MAX(user_lifetime_week) AS last_active_week FROM user_week_age GROUP BY 1,2
-),
-unbounded_counts AS (
-  SELECT la.cohort_week_start, gs AS user_lifetime_week, COUNT(*) AS retained_users_unbounded
-  FROM last_active la
-  CROSS JOIN LATERAL generate_series(0, LEAST(la.last_active_week,(SELECT max_weeks FROM params))) gs
-  GROUP BY 1,2
-),
-cohort_sizes AS (SELECT cohort_week_start, COUNT(DISTINCT user_id) AS cohort_users FROM first_login GROUP BY 1),
-cohort_caps AS (
-  SELECT cs.cohort_week_start, cs.cohort_users,
-         LEAST((SELECT max_weeks FROM params),
-               GREATEST(0,((DATE_TRUNC('week',CURRENT_DATE)::date - cs.cohort_week_start)/7)::int)) AS cap_weeks
-  FROM cohort_sizes cs
-),
-grid AS (
-  SELECT cc.cohort_week_start, gs AS user_lifetime_week, cc.cohort_users
-  FROM cohort_caps cc CROSS JOIN LATERAL generate_series(0, cc.cap_weeks) gs
-)
-SELECT
-  g.cohort_week_start,
-  TO_CHAR(g.cohort_week_start,'IYYY-"W"IW')                                    AS cohort_label,
-  TO_CHAR(g.cohort_week_start,'IYYY-"W"IW')||' (n='||g.cohort_users||')'       AS cohort_label_n,
-  g.user_lifetime_week, g.cohort_users,
-  COALESCE(b.active_users_bounded,0)     AS active_users_bounded,
-  COALESCE(u.retained_users_unbounded,0) AS retained_users_unbounded,
-  CASE WHEN g.cohort_users>0 THEN COALESCE(b.active_users_bounded,0)::float/g.cohort_users END    AS retention_rate_bounded,
-  CASE WHEN g.cohort_users>0 THEN COALESCE(u.retained_users_unbounded,0)::float/g.cohort_users END AS retention_rate_unbounded,
-  CASE WHEN g.user_lifetime_week=0 THEN g.cohort_users ELSE 0 END               AS cohort_users_w0
-FROM grid g
-LEFT JOIN bounded_counts   b ON b.cohort_week_start=g.cohort_week_start AND b.user_lifetime_week=g.user_lifetime_week
-LEFT JOIN unbounded_counts u ON u.cohort_week_start=g.cohort_week_start AND u.user_lifetime_week=g.user_lifetime_week
-ORDER BY g.cohort_week_start, g.user_lifetime_week
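
A minimal sketch of the bounded vs. unbounded retention logic in the query above, run on toy data instead of `auth.sessions`. The names `week_start`, `weekly_retention`, and the `logins` mapping are illustrative assumptions and do not exist in the repository; the sketch only mirrors the CTE steps (cohort by first-login week, distinct activity weeks, last active week, grid capped at cohort age).

```python
from collections import defaultdict
from datetime import date, timedelta


def week_start(d: date) -> date:
    """Monday of the week containing d, like DATE_TRUNC('week', d)::date."""
    return d - timedelta(days=d.weekday())


def weekly_retention(logins, today, max_weeks=12):
    """Map (cohort_week_start, k) -> (bounded_rate, unbounded_rate).

    logins: {user_id: [login dates]}, a hypothetical stand-in for auth.sessions.
    """
    cohort_users = defaultdict(int)
    bounded = defaultdict(int)    # users active in exactly week k
    unbounded = defaultdict(int)  # users whose last active week is >= k

    for days in logins.values():
        cohort = week_start(min(days))
        active_weeks = {(week_start(d) - cohort).days // 7 for d in days}
        cohort_users[cohort] += 1
        for k in active_weeks:
            bounded[cohort, k] += 1
        for k in range(min(max(active_weeks), max_weeks) + 1):
            unbounded[cohort, k] += 1

    rates = {}
    for cohort, n in cohort_users.items():
        # Cap the grid at the cohort's actual age so no future weeks appear.
        age = max(0, (week_start(today) - cohort).days // 7)
        for k in range(min(max_weeks, age) + 1):
            rates[cohort, k] = (bounded[cohort, k] / n, unbounded[cohort, k] / n)
    return rates


logins = {
    "alice": [date(2025, 1, 6), date(2025, 1, 14), date(2025, 1, 28)],
    "bob": [date(2025, 1, 7)],
}
rates = weekly_retention(logins, today=date(2025, 2, 3))
cohort = date(2025, 1, 6)
print(rates[cohort, 2])  # (0.0, 0.5): alice skipped week 2 but returned in week 3
```

The week-2 row is where the two rates diverge: nobody logged in during that week, yet one of the two users was still active later, so the unbounded rate stays at 0.5.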
--- a/autogpt_platform/analytics/queries/user_block_spending.sql
+++ b/autogpt_platform/analytics/queries/user_block_spending.sql
@@ -1,71 +0,0 @@
-- =============================================================
-- View: analytics.user_block_spending
-- Looker source alias: ds6  |  Charts: 5
-- =============================================================
-- DESCRIPTION
--   One row per credit transaction (last 90 days).
--   Shows how users spend credits broken down by block type,
--   LLM provider and model.  Joins node execution stats for
--   token-level detail.
--
-- SOURCE TABLES
--   platform.CreditTransaction   — Credit debit/credit records
--   platform.AgentNodeExecution  — Node execution stats (for token counts)
--
-- OUTPUT COLUMNS
--   transactionKey        TEXT         Unique transaction identifier
--   userId                TEXT         User who was charged
--   amount                DECIMAL      Credit amount (positive = credit, negative = debit)
--   negativeAmount        DECIMAL      amount * -1 (convenience for spend charts)
--   transactionType       TEXT         Transaction type (e.g. 'USAGE', 'REFUND', 'TOP_UP')
--   transactionTime       TIMESTAMPTZ  When the transaction was recorded
--   blockId               TEXT         Block UUID that triggered the spend
--   blockName             TEXT         Human-readable block name
--   llm_provider          TEXT         LLM provider (e.g. 'openai', 'anthropic')
--   llm_model             TEXT         Model name (e.g. 'gpt-4o', 'claude-3-5-sonnet')
--   node_exec_id          TEXT         Linked node execution UUID
--   llm_call_count        INT          LLM API calls made in that execution
--   llm_retry_count       INT          LLM retries in that execution
--   llm_input_token_count INT          Input tokens consumed
--   llm_output_token_count INT         Output tokens produced
--
-- WINDOW
--   Rolling 90 days (createdAt > CURRENT_DATE - 90 days)
--
-- EXAMPLE QUERIES
--   -- Total spend per user (last 90 days)
--   SELECT "userId", SUM("negativeAmount") AS total_spent
--   FROM analytics.user_block_spending
--   WHERE "transactionType" = 'USAGE'
--   GROUP BY 1 ORDER BY total_spent DESC;
--
--   -- Spend by LLM provider + model
--   SELECT "llm_provider", "llm_model",
--          SUM("negativeAmount") AS total_cost,
--          SUM("llm_input_token_count") AS input_tokens,
--          SUM("llm_output_token_count") AS output_tokens
--   FROM analytics.user_block_spending
--   WHERE "llm_provider" IS NOT NULL
--   GROUP BY 1, 2 ORDER BY total_cost DESC;
-- =============================================================
-
-SELECT
-    c."transactionKey"                                        AS "transactionKey",
-    c."userId"                                                AS "userId",
-    c."amount"                                                AS amount,
-    c."amount" * -1                                           AS "negativeAmount",
-    c."type"                                                  AS "transactionType",
-    c."createdAt"                                             AS "transactionTime",
-    c.metadata->>'block_id'                                   AS "blockId",
-    c.metadata->>'block'                                      AS "blockName",
-    c.metadata->'input'->'credentials'->>'provider'           AS llm_provider,
-    c.metadata->'input'->>'model'                             AS llm_model,
-    c.metadata->>'node_exec_id'                               AS node_exec_id,
-    (ne."stats"->>'llm_call_count')::int                       AS llm_call_count,
-    (ne."stats"->>'llm_retry_count')::int                      AS llm_retry_count,
-    (ne."stats"->>'input_token_count')::int                    AS llm_input_token_count,
-    (ne."stats"->>'output_token_count')::int                   AS llm_output_token_count
-FROM platform."CreditTransaction" c
-LEFT JOIN platform."AgentNodeExecution" ne
-       ON (c.metadata->>'node_exec_id') = ne."id"::text
-WHERE c."createdAt" > CURRENT_DATE - INTERVAL '90 days'
--- a/autogpt_platform/analytics/queries/user_onboarding.sql
+++ b/autogpt_platform/analytics/queries/user_onboarding.sql
@@ -1,45 +0,0 @@
-- =============================================================
-- View: analytics.user_onboarding
-- Looker source alias: ds68  |  Charts: 3
-- =============================================================
-- DESCRIPTION
--   One row per user onboarding record.  Contains the user's
--   stated usage reason, selected integrations, completed
--   onboarding steps and optional first agent selection.
--   Full history (no date filter) since onboarding happens
--   once per user.
--
-- SOURCE TABLES
--   platform.UserOnboarding  — Onboarding state per user
--
-- OUTPUT COLUMNS
--   id                            TEXT         Onboarding record UUID
--   createdAt                     TIMESTAMPTZ  When onboarding started
--   updatedAt                     TIMESTAMPTZ  Last update to onboarding state
--   usageReason                   TEXT         Why user signed up (e.g. 'work', 'personal')
--   integrations                  TEXT[]       Array of integration names the user selected
--   userId                        TEXT         User UUID
--   completedSteps                TEXT[]       Array of onboarding step enums completed
--   selectedStoreListingVersionId TEXT         First marketplace agent the user chose (if any)
--
-- EXAMPLE QUERIES
--   -- Usage reason breakdown
--   SELECT "usageReason", COUNT(*) FROM analytics.user_onboarding GROUP BY 1;
--
--   -- Completion rate per step
--   SELECT step, COUNT(*) AS users_completed
--   FROM analytics.user_onboarding
--   CROSS JOIN LATERAL UNNEST("completedSteps") AS step
--   GROUP BY 1 ORDER BY users_completed DESC;
-- =============================================================
-
-SELECT
-    id,
-    "createdAt",
-    "updatedAt",
-    "usageReason",
-    integrations,
-    "userId",
-    "completedSteps",
-    "selectedStoreListingVersionId"
-FROM platform."UserOnboarding"
--- a/autogpt_platform/analytics/queries/user_onboarding_funnel.sql
+++ b/autogpt_platform/analytics/queries/user_onboarding_funnel.sql
@@ -1,100 +0,0 @@
-- =============================================================
-- View: analytics.user_onboarding_funnel
-- Looker source alias: ds74  |  Charts: 1
-- =============================================================
-- DESCRIPTION
--   Pre-aggregated onboarding funnel showing how many users
--   completed each step and the conversion percentage from the
--   previous step.  One row per onboarding step (all 22 steps
--   always present, even with 0 completions — prevents sparse
--   gaps from making LAG compare the wrong predecessors).
--
-- SOURCE TABLES
--   platform.UserOnboarding  — Onboarding records with completedSteps array
--
-- OUTPUT COLUMNS
--   step             TEXT     Onboarding step enum name (e.g. 'WELCOME', 'CONGRATS')
--   step_order       INT      Numeric position in the funnel (1=first, 22=last)
--   users_completed  BIGINT   Distinct users who completed this step
--   pct_from_prev    NUMERIC  % of users from the previous step who reached this one
--
-- STEP ORDER
--   1  WELCOME               9  MARKETPLACE_VISIT     17  SCHEDULE_AGENT
--   2  USAGE_REASON         10  MARKETPLACE_ADD_AGENT  18  RUN_AGENTS
--   3  INTEGRATIONS         11  MARKETPLACE_RUN_AGENT  19  RUN_3_DAYS
--   4  AGENT_CHOICE         12  BUILDER_OPEN           20  TRIGGER_WEBHOOK
--   5  AGENT_NEW_RUN        13  BUILDER_SAVE_AGENT     21  RUN_14_DAYS
--   6  AGENT_INPUT          14  BUILDER_RUN_AGENT      22  RUN_AGENTS_100
--   7  CONGRATS             15  VISIT_COPILOT
--   8  GET_RESULTS          16  RE_RUN_AGENT
--
-- WINDOW
--   Users who started onboarding in the last 90 days
--
-- EXAMPLE QUERIES
--   -- Full funnel
--   SELECT * FROM analytics.user_onboarding_funnel ORDER BY step_order;
--
--   -- Biggest drop-off point
--   SELECT step, pct_from_prev FROM analytics.user_onboarding_funnel
--   ORDER BY pct_from_prev ASC LIMIT 3;
-- =============================================================
-
-WITH all_steps AS (
-  -- Complete ordered grid of all 22 steps so zero-completion steps
-  -- are always present, keeping LAG comparisons correct.
-  SELECT step_name, step_order
-  FROM (VALUES
-    ('WELCOME',               1),
-    ('USAGE_REASON',          2),
-    ('INTEGRATIONS',          3),
-    ('AGENT_CHOICE',          4),
-    ('AGENT_NEW_RUN',         5),
-    ('AGENT_INPUT',           6),
-    ('CONGRATS',              7),
-    ('GET_RESULTS',           8),
-    ('MARKETPLACE_VISIT',     9),
-    ('MARKETPLACE_ADD_AGENT', 10),
-    ('MARKETPLACE_RUN_AGENT', 11),
-    ('BUILDER_OPEN',          12),
-    ('BUILDER_SAVE_AGENT',    13),
-    ('BUILDER_RUN_AGENT',     14),
-    ('VISIT_COPILOT',         15),
-    ('RE_RUN_AGENT',          16),
-    ('SCHEDULE_AGENT',        17),
-    ('RUN_AGENTS',            18),
-    ('RUN_3_DAYS',            19),
-    ('TRIGGER_WEBHOOK',       20),
-    ('RUN_14_DAYS',           21),
-    ('RUN_AGENTS_100',        22)
-  ) AS t(step_name, step_order)
-),
-raw AS (
-  SELECT
-      u."userId",
-      step_txt::text AS step
-  FROM platform."UserOnboarding" u
-  CROSS JOIN LATERAL UNNEST(u."completedSteps") AS step_txt
-  WHERE u."createdAt" >= CURRENT_DATE - INTERVAL '90 days'
-),
-step_counts AS (
-  SELECT step, COUNT(DISTINCT "userId") AS users_completed
-  FROM raw GROUP BY step
-),
-funnel AS (
-  SELECT
-      a.step_name                          AS step,
-      a.step_order,
-      COALESCE(sc.users_completed, 0)      AS users_completed,
-      ROUND(
-        100.0 * COALESCE(sc.users_completed, 0)
-        / NULLIF(
-            LAG(COALESCE(sc.users_completed, 0)) OVER (ORDER BY a.step_order),
-            0
-          ),
-        2
-      )                                    AS pct_from_prev
-  FROM all_steps a
-  LEFT JOIN step_counts sc ON sc.step = a.step_name
-)
-SELECT * FROM funnel ORDER BY step_order
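
A minimal sketch of how `pct_from_prev` behaves in the funnel query above, assuming a short four-step grid for brevity. The function name `funnel_rows` and the sample counts are hypothetical; the logic mirrors `COALESCE(..., 0)`, `LAG(...) OVER (ORDER BY step_order)`, and `NULLIF(..., 0)`.

```python
from typing import Optional


def funnel_rows(
    step_order: list[str], completed: dict[str, int]
) -> list[tuple[str, int, Optional[float]]]:
    """Return (step, users_completed, pct_from_prev) for every step in order."""
    rows = []
    prev: Optional[int] = None  # LAG is NULL on the first row
    for step in step_order:
        count = completed.get(step, 0)  # COALESCE(sc.users_completed, 0)
        # NULLIF(prev, 0): a zero or missing predecessor yields NULL, not an error.
        pct = round(100.0 * count / prev, 2) if prev else None
        rows.append((step, count, pct))
        prev = count
    return rows


steps = ["WELCOME", "USAGE_REASON", "INTEGRATIONS", "AGENT_CHOICE"]
counts = {"WELCOME": 200, "USAGE_REASON": 150, "AGENT_CHOICE": 30}
for row in funnel_rows(steps, counts):
    print(row)
```

Because `INTEGRATIONS` stays in the grid with a count of 0, `AGENT_CHOICE` is compared against it and gets a NULL percentage. On a sparse grid it would instead be compared against `USAGE_REASON`, which is the wrong predecessor.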
--- a/autogpt_platform/analytics/queries/user_onboarding_integration.sql
+++ b/autogpt_platform/analytics/queries/user_onboarding_integration.sql
@@ -1,41 +0,0 @@
-- =============================================================
-- View: analytics.user_onboarding_integration
-- Looker source alias: ds75  |  Charts: 1
-- =============================================================
-- DESCRIPTION
--   Pre-aggregated count of users who selected each integration
--   during onboarding.  One row per integration type, sorted
--   by popularity.
--
-- SOURCE TABLES
--   platform.UserOnboarding  — integrations array column
--
-- OUTPUT COLUMNS
--   integration            TEXT    Integration name (e.g. 'github', 'slack', 'notion')
--   users_with_integration BIGINT  Distinct users who selected this integration
--
-- WINDOW
--   Users who started onboarding in the last 90 days
--
-- EXAMPLE QUERIES
--   -- Full integration popularity ranking
--   SELECT * FROM analytics.user_onboarding_integration;
--
--   -- Top 5 integrations
--   SELECT * FROM analytics.user_onboarding_integration ORDER BY users_with_integration DESC LIMIT 5;
-- =============================================================
-
-WITH exploded AS (
-  SELECT
-      u."userId" AS user_id,
-      UNNEST(u."integrations") AS integration
-  FROM platform."UserOnboarding" u
-  WHERE u."createdAt" >= CURRENT_DATE - INTERVAL '90 days'
-)
-SELECT
-    integration,
-    COUNT(DISTINCT user_id) AS users_with_integration
-FROM exploded
-WHERE integration IS NOT NULL AND integration <> ''
-GROUP BY integration
-ORDER BY users_with_integration DESC
--- a/autogpt_platform/analytics/queries/users_activities.sql
+++ b/autogpt_platform/analytics/queries/users_activities.sql
@@ -1,145 +0,0 @@
-- =============================================================
-- View: analytics.users_activities
-- Looker source alias: ds56  |  Charts: 5
-- =============================================================
-- DESCRIPTION
--   One row per user with lifetime activity summary.
--   Joins login sessions with agent graphs, executions and
--   node-level runs to give a full picture of how engaged
--   each user is.  Includes a convenience flag for 7-day
--   activation (did the user return at least 7 days after
--   their first login?).
--
-- SOURCE TABLES
--   auth.sessions                    — Login/session records
--   platform.AgentGraph              — Graphs (agents) built by the user
--   platform.AgentGraphExecution     — Agent run history
--   platform.AgentNodeExecution      — Individual block execution history
--
-- PERFORMANCE NOTE
--   Each CTE aggregates its own table independently by userId.
--   This avoids the fan-out that occurs when driving every join
--   from user_logins across the two largest tables
--   (AgentGraphExecution and AgentNodeExecution).
--
-- OUTPUT COLUMNS
--   user_id                   TEXT         Supabase user UUID
--   first_login_time          TIMESTAMPTZ  First ever session created_at
--   last_login_time           TIMESTAMPTZ  Most recent session created_at
--   last_visit_time           TIMESTAMPTZ  Max of last refresh or login
--   last_agent_save_time      TIMESTAMPTZ  Last time user saved an agent graph
--   agent_count               BIGINT       Number of distinct active graphs built (0 if none)
--   first_agent_run_time      TIMESTAMPTZ  First ever graph execution
--   last_agent_run_time       TIMESTAMPTZ  Most recent graph execution
--   unique_agent_runs         BIGINT       Distinct agent graphs ever run (0 if none)
--   agent_runs                BIGINT       Total graph execution count (0 if none)
--   node_execution_count      BIGINT       Total node executions across all runs
--   node_execution_failed     BIGINT       Node executions with FAILED status
--   node_execution_completed  BIGINT       Node executions with COMPLETED status
--   node_execution_terminated BIGINT       Node executions with TERMINATED status
--   node_execution_queued     BIGINT       Node executions with QUEUED status
--   node_execution_running    BIGINT       Node executions with RUNNING status
--   is_active_after_7d        INT          1=returned after day 7, 0=did not, NULL=too early to tell
--   node_execution_incomplete BIGINT       Node executions with INCOMPLETE status
--   node_execution_review     BIGINT       Node executions with REVIEW status
--
-- EXAMPLE QUERIES
--   -- Users who ran at least one agent and returned after 7 days
--   SELECT COUNT(*) FROM analytics.users_activities
--   WHERE agent_runs > 0 AND is_active_after_7d = 1;
--
--   -- Top 10 most active users by agent runs
--   SELECT user_id, agent_runs, node_execution_count
--   FROM analytics.users_activities
--   ORDER BY agent_runs DESC LIMIT 10;
--
--   -- 7-day activation rate
--   SELECT
--     SUM(CASE WHEN is_active_after_7d = 1 THEN 1 ELSE 0 END)::float
--     / NULLIF(COUNT(CASE WHEN is_active_after_7d IS NOT NULL THEN 1 END), 0)
--     AS activation_rate
--   FROM analytics.users_activities;
-- =============================================================
-
-WITH user_logins AS (
-  SELECT
-    user_id::text                                    AS user_id,
-    MIN(created_at)                                  AS first_login_time,
-    MAX(created_at)                                  AS last_login_time,
-    GREATEST(
-      MAX(refreshed_at)::timestamptz,
-      MAX(created_at)::timestamptz
-    )                                                AS last_visit_time
-  FROM auth.sessions
-  GROUP BY user_id
-),
-user_agents AS (
-  -- Aggregate AgentGraph directly by userId (no fan-out from user_logins)
-  SELECT
-    "userId"::text                AS user_id,
-    MAX("updatedAt")              AS last_agent_save_time,
-    COUNT(DISTINCT "id")          AS agent_count
-  FROM platform."AgentGraph"
-  WHERE "isActive"
-  GROUP BY "userId"
-),
-user_graph_runs AS (
-  -- Aggregate AgentGraphExecution directly by userId
-  SELECT
-    "userId"::text                        AS user_id,
-    MIN("createdAt")                      AS first_agent_run_time,
-    MAX("createdAt")                      AS last_agent_run_time,
-    COUNT(DISTINCT "agentGraphId")        AS unique_agent_runs,
-    COUNT("id")                           AS agent_runs
-  FROM platform."AgentGraphExecution"
-  GROUP BY "userId"
-),
-user_node_runs AS (
-  -- Aggregate AgentNodeExecution directly; resolve userId via a
-  -- single join to AgentGraphExecution instead of fanning out from
-  -- user_logins through both large tables.
-  SELECT
-    g."userId"::text                                                   AS user_id,
-    COUNT(*)                                                           AS node_execution_count,
-    COUNT(*) FILTER (WHERE n."executionStatus" = 'FAILED')             AS node_execution_failed,
-    COUNT(*) FILTER (WHERE n."executionStatus" = 'COMPLETED')          AS node_execution_completed,
-    COUNT(*) FILTER (WHERE n."executionStatus" = 'TERMINATED')         AS node_execution_terminated,
-    COUNT(*) FILTER (WHERE n."executionStatus" = 'QUEUED')             AS node_execution_queued,
-    COUNT(*) FILTER (WHERE n."executionStatus" = 'RUNNING')            AS node_execution_running,
-    COUNT(*) FILTER (WHERE n."executionStatus" = 'INCOMPLETE')         AS node_execution_incomplete,
-    COUNT(*) FILTER (WHERE n."executionStatus" = 'REVIEW')             AS node_execution_review
-  FROM platform."AgentNodeExecution" n
-  JOIN platform."AgentGraphExecution" g
-    ON g."id" = n."agentGraphExecutionId"
-  GROUP BY g."userId"
-)
-SELECT
-  ul.user_id,
-  ul.first_login_time,
-  ul.last_login_time,
-  ul.last_visit_time,
-  ua.last_agent_save_time,
-  COALESCE(ua.agent_count, 0)             AS agent_count,
-  gr.first_agent_run_time,
-  gr.last_agent_run_time,
-  COALESCE(gr.unique_agent_runs, 0)       AS unique_agent_runs,
-  COALESCE(gr.agent_runs, 0)              AS agent_runs,
-  COALESCE(nr.node_execution_count, 0)      AS node_execution_count,
-  COALESCE(nr.node_execution_failed, 0)     AS node_execution_failed,
-  COALESCE(nr.node_execution_completed, 0)  AS node_execution_completed,
-  COALESCE(nr.node_execution_terminated, 0) AS node_execution_terminated,
-  COALESCE(nr.node_execution_queued, 0)     AS node_execution_queued,
-  COALESCE(nr.node_execution_running, 0)    AS node_execution_running,
-  CASE
-    WHEN ul.first_login_time < NOW() - INTERVAL '7 days'
-     AND ul.last_visit_time  >= ul.first_login_time + INTERVAL '7 days' THEN 1
-    WHEN ul.first_login_time < NOW() - INTERVAL '7 days'
-     AND ul.last_visit_time  <  ul.first_login_time + INTERVAL '7 days' THEN 0
-    ELSE NULL
-  END AS is_active_after_7d,
-  COALESCE(nr.node_execution_incomplete, 0) AS node_execution_incomplete,
-  COALESCE(nr.node_execution_review, 0)     AS node_execution_review
-FROM user_logins ul
-LEFT JOIN user_agents     ua ON ul.user_id = ua.user_id
-LEFT JOIN user_graph_runs gr ON ul.user_id = gr.user_id
-LEFT JOIN user_node_runs  nr ON ul.user_id = nr.user_id
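
A minimal sketch of the three-state `is_active_after_7d` flag computed by the query above. The Python function and the sample timestamps are illustrative assumptions; only the CASE logic is taken from the SQL.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_active_after_7d(
    first_login: datetime, last_visit: datetime, now: datetime
) -> Optional[int]:
    """1 = returned on or after day 7, 0 = did not, None = too early to tell."""
    window = timedelta(days=7)
    if first_login >= now - window:
        # The user has not existed for a full 7 days yet (ELSE NULL branch).
        return None
    return 1 if last_visit >= first_login + window else 0


utc = timezone.utc
now = datetime(2025, 3, 1, tzinfo=utc)
returned = is_active_after_7d(
    datetime(2025, 2, 1, tzinfo=utc), datetime(2025, 2, 20, tzinfo=utc), now
)
churned = is_active_after_7d(
    datetime(2025, 2, 1, tzinfo=utc), datetime(2025, 2, 3, tzinfo=utc), now
)
too_new = is_active_after_7d(
    datetime(2025, 2, 26, tzinfo=utc), datetime(2025, 2, 28, tzinfo=utc), now
)
print(returned, churned, too_new)
```

Keeping the third state as None rather than 0 is what lets the activation-rate example in the header exclude users who are too new from the denominator.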
--- a/autogpt_platform/autogpt_libs/poetry.lock
+++ b/autogpt_platform/autogpt_libs/poetry.lock
--- a/autogpt_platform/autogpt_libs/pyproject.toml
+++ b/autogpt_platform/autogpt_libs/pyproject.toml
@@ -9,25 +9,25 @@ packages = [{ include = "autogpt_libs" }]
 [tool.poetry.dependencies]
 python = ">=3.10,<4.0"
 colorama = "^0.4.6"
-cryptography = "^46.0"
+cryptography = "^45.0"
 expiringdict = "^1.2.2"
-fastapi = "^0.128.7"
-google-cloud-logging = "^3.13.0"
-launchdarkly-server-sdk = "^9.15.0"
-pydantic = "^2.12.5"
-pydantic-settings = "^2.12.0"
-pyjwt = { version = "^2.11.0", extras = ["crypto"] }
+fastapi = "^0.116.1"
+google-cloud-logging = "^3.12.1"
+launchdarkly-server-sdk = "^9.12.0"
+pydantic = "^2.11.7"
+pydantic-settings = "^2.10.1"
+pyjwt = { version = "^2.10.1", extras = ["crypto"] }
 redis = "^6.2.0"
-supabase = "^2.28.0"
-uvicorn = "^0.40.0"
+supabase = "^2.16.0"
+uvicorn = "^0.35.0"

 [tool.poetry.group.dev.dependencies]
-pyright = "^1.1.408"
+pyright = "^1.1.404"
 pytest = "^8.4.1"
-pytest-asyncio = "^1.3.0"
-pytest-mock = "^3.15.1"
-pytest-cov = "^7.1.0"
-ruff = "^0.15.7"
+pytest-asyncio = "^1.1.0"
+pytest-mock = "^3.14.1"
+pytest-cov = "^6.2.1"
+ruff = "^0.12.11"

 [build-system]
 requires = ["poetry-core"]
--- a/autogpt_platform/backend/.env.default
+++ b/autogpt_platform/backend/.env.default
@@ -104,12 +104,6 @@ TWITTER_CLIENT_SECRET=
 # Make a new workspace for your OAuth APP -- trust me
 # https://linear.app/settings/api/applications/new
 # Callback URL: http://localhost:3000/auth/integrations/oauth_callback
-LINEAR_API_KEY=
-# Linear project and team IDs for the feature request tracker.
-# Find these in your Linear workspace URL: linear.app/<workspace>/project/<project-id>
-# and in team settings. Used by the chat copilot to file and search feature requests.
-LINEAR_FEATURE_REQUEST_PROJECT_ID=
-LINEAR_FEATURE_REQUEST_TEAM_ID=
 LINEAR_CLIENT_ID=
 LINEAR_CLIENT_SECRET=

@@ -158,7 +152,6 @@ REPLICATE_API_KEY=
 REVID_API_KEY=
 SCREENSHOTONE_API_KEY=
 UNREAL_SPEECH_API_KEY=
-ELEVENLABS_API_KEY=

 # Data & Search Services
 E2B_API_KEY=
@@ -178,7 +171,6 @@ SMTP_USERNAME=
 SMTP_PASSWORD=

 # Business & Marketing Tools
-AGENTMAIL_API_KEY=
 APOLLO_API_KEY=
 ENRICHLAYER_API_KEY=
 AYRSHARE_API_KEY=
@@ -191,8 +183,5 @@ ZEROBOUNCE_API_KEY=
 POSTHOG_API_KEY=
 POSTHOG_HOST=https://eu.i.posthog.com

-# Tally Form Integration (pre-populate business understanding on signup)
-TALLY_API_KEY=
-
 # Other Services
 AUTOMOD_API_KEY=
--- a/autogpt_platform/backend/.gitignore
+++ b/autogpt_platform/backend/.gitignore
@@ -19,6 +19,3 @@ load-tests/*.json
 load-tests/*.log
 load-tests/node_modules/*
 migrations/*/rollback*.sql
-
-# Workspace files
-workspaces/
--- a/autogpt_platform/backend/AGENTS.md
+++ b/autogpt_platform/backend/AGENTS.md
@@ -1,227 +0,0 @@
-# Backend
-
-This file provides guidance to coding agents when working with the backend.
-
-## Essential Commands
-
-To run something with Python package dependencies you MUST use `poetry run ...`.
-
-```bash
-# Install dependencies
-poetry install
-
-# Run database migrations
-poetry run prisma migrate dev
-
-# Start all services (database, redis, rabbitmq, clamav)
-docker compose up -d
-
-# Run the backend as a whole
-poetry run app
-
-# Run tests
-poetry run test
-
-# Run specific test
-poetry run pytest path/to/test_file.py::test_function_name
-
-# Run block tests (tests that validate all blocks work correctly)
-poetry run pytest backend/blocks/test/test_block.py -xvs
-
-# Run tests for a specific block (e.g., GetCurrentTimeBlock)
-poetry run pytest 'backend/blocks/test/test_block.py::test_available_blocks[GetCurrentTimeBlock]' -xvs
-
-# Lint and format
-# prefer format if you want to just "fix" it and only get the errors that can't be autofixed
-poetry run format  # Black + isort
-poetry run lint    # ruff
-```
-
-More details can be found in @TESTING.md
-
-### Creating/Updating Snapshots
-
-When you first write a test or when the expected output changes:
-
-```bash
-poetry run pytest path/to/test.py --snapshot-update
-```
-
-⚠️ **Important**: Always review snapshot changes before committing! Use `git diff` to verify the changes are expected.
-
-## Architecture
-
- **API Layer**: FastAPI with REST and WebSocket endpoints
- **Database**: PostgreSQL with Prisma ORM, includes pgvector for embeddings
- **Queue System**: RabbitMQ for async task processing
- **Execution Engine**: Separate executor service processes agent workflows
- **Authentication**: JWT-based with Supabase integration
- **Security**: Cache protection middleware prevents sensitive data caching in browsers/proxies
-
-## Code Style
-
-- **Top-level imports only** — no local/inner imports (lazy imports only for heavy optional deps like `openpyxl`)
-- **Absolute imports** — use `from backend.module import ...` for cross-package imports. Single-dot relative (`from .sibling import ...`) is acceptable for sibling modules within the same package (e.g., blocks). Avoid double-dot relative imports (`from ..parent import ...`) — use the absolute path instead
-- **No duck typing** — no `hasattr`/`getattr`/`isinstance` for type dispatch; use typed interfaces/unions/protocols
-- **Pydantic models** over dataclass/namedtuple/dict for structured data
-- **No linter suppressors** — no `# type: ignore`, `# noqa`, `# pyright: ignore`; fix the type/code
-- **List comprehensions** over manual loop-and-append
-- **Early return** — guard clauses first, avoid deep nesting
-- **f-strings vs printf syntax in log statements** — Use `%s` for deferred interpolation in `debug` statements, f-strings elsewhere for readability: `logger.debug("Processing %s items", count)`, `logger.info(f"Processing {count} items")`
-- **Sanitize error paths** — `os.path.basename()` in error messages to avoid leaking directory structure
-- **TOCTOU awareness** — avoid check-then-act patterns for file access and credit charging
-- **`Security()` vs `Depends()`** — use `Security()` for auth deps to get proper OpenAPI security spec
-- **Redis pipelines** — `transaction=True` for atomicity on multi-step operations
-- **`max(0, value)` guards** — for computed values that should never be negative
-- **SSE protocol** — `data:` lines for frontend-parsed events (must match Zod schema), `: comment` lines for heartbeats/status
-- **File length** — keep files under ~300 lines; if a file grows beyond this, split by responsibility (e.g. extract helpers, models, or a sub-module into a new file). Never keep appending to a long file.
-- **Function length** — keep functions under ~40 lines; extract named helpers when a function grows longer. Long functions are a sign of mixed concerns, not complexity.
-- **Top-down ordering** — define the main/public function or class first, then the helpers it uses below. A reader should encounter high-level logic before implementation details.
-
-## Testing Approach
-
-- Uses pytest with snapshot testing for API responses
-- Test files are colocated with source files (`*_test.py`)
-- Mock at boundaries — mock where the symbol is **used**, not where it's **defined**
-- After refactoring, update mock targets to match new module paths
-- Use `AsyncMock` for async functions (`from unittest.mock import AsyncMock`)
-
-### Test-Driven Development (TDD)
-
-When fixing a bug or adding a feature, write the test **before** the implementation:
-
-```python
-# 1. Write a failing test marked xfail
-@pytest.mark.xfail(reason="Bug #1234: widget crashes on empty input")
-def test_widget_handles_empty_input():
-    result = widget.process("")
-    assert result == Widget.EMPTY_RESULT
-
-# 2. Run it — confirm it fails (XFAIL)
-# poetry run pytest path/to/test.py::test_widget_handles_empty_input -xvs
-
-# 3. Implement the fix
-
-# 4. Remove xfail, run again — confirm it passes
-def test_widget_handles_empty_input():
-    result = widget.process("")
-    assert result == Widget.EMPTY_RESULT
-```
-
-This catches regressions and proves the fix actually works. **Every bug fix should include a test that would have caught it.**
-
-## Database Schema
-
-Key models (defined in `schema.prisma`):
-
-- `User`: Authentication and profile data
-- `AgentGraph`: Workflow definitions with version control
-- `AgentGraphExecution`: Execution history and results
-- `AgentNode`: Individual nodes in a workflow
-- `StoreListing`: Marketplace listings for sharing agents
-
-## Environment Configuration
-
-- **Backend**: `.env.default` (defaults) → `.env` (user overrides)
-
-## Common Development Tasks
-
-### Adding a new block
-
-Follow the comprehensive [Block SDK Guide](@../../docs/platform/block-sdk-guide.md) which covers:
-
-- Provider configuration with `ProviderBuilder`
-- Block schema definition
-- Authentication (API keys, OAuth, webhooks)
-- Testing and validation
-- File organization
-
-Quick steps:
-
-1. Create new file in `backend/blocks/`
-2. Configure provider using `ProviderBuilder` in `_config.py`
-3. Inherit from `Block` base class
-4. Define input/output schemas using `BlockSchema`
-5. Implement async `run` method
-6. Generate unique block ID using `uuid.uuid4()`
-7. Test with `poetry run pytest backend/blocks/test/test_block.py`
-
-Note: when making many new blocks analyze the interfaces for each of these blocks and picture if they would go well together in a graph-based editor or would they struggle to connect productively?
-ex: do the inputs and outputs tie well together?
-
-If you get any pushback or hit complex block conditions check the new_blocks guide in the docs.
-
-#### Handling files in blocks with `store_media_file()`
-
-When blocks need to work with files (images, videos, documents), use `store_media_file()` from `backend.util.file`. The `return_format` parameter determines what you get back:
-
-| Format | Use When | Returns |
-|--------|----------|---------|
-| `"for_local_processing"` | Processing with local tools (ffmpeg, MoviePy, PIL) | Local file path (e.g., `"image.png"`) |
-| `"for_external_api"` | Sending content to external APIs (Replicate, OpenAI) | Data URI (e.g., `"data:image/png;base64,..."`) |
-| `"for_block_output"` | Returning output from your block | Smart: `workspace://` in CoPilot, data URI in graphs |
-
-**Examples:**
-
-```python
-# INPUT: Need to process file locally with ffmpeg
-local_path = await store_media_file(
-    file=input_data.video,
-    execution_context=execution_context,
-    return_format="for_local_processing",
-)
-# local_path = "video.mp4" - use with Path/ffmpeg/etc
-
-# INPUT: Need to send to external API like Replicate
-image_b64 = await store_media_file(
-    file=input_data.image,
-    execution_context=execution_context,
-    return_format="for_external_api",
-)
-# image_b64 = "data:image/png;base64,iVBORw0..." - send to API
-
-# OUTPUT: Returning result from block
-result_url = await store_media_file(
-    file=generated_image_url,
-    execution_context=execution_context,
-    return_format="for_block_output",
-)
-yield "image_url", result_url
-# In CoPilot: result_url = "workspace://abc123"
-# In graphs:  result_url = "data:image/png;base64,..."
-```
-
-**Key points:**
-
-- `for_block_output` is the ONLY format that auto-adapts to execution context
-- Always use `for_block_output` for block outputs unless you have a specific reason not to
-- Never hardcode workspace checks - let `for_block_output` handle it
-
-### Modifying the API
-
-1. Update route in `backend/api/features/`
-2. Add/update Pydantic models in same directory
-3. Write tests alongside the route file
-4. Run `poetry run test` to verify
-
-## Workspace & Media Files
-
-**Read [Workspace & Media Architecture](../../docs/platform/workspace-media-architecture.md) when:**
-- Working on CoPilot file upload/download features
-- Building blocks that handle `MediaFileType` inputs/outputs
-- Modifying `WorkspaceManager` or `store_media_file()`
-- Debugging file persistence or virus scanning issues
-
-Covers: `WorkspaceManager` (persistent storage with session scoping), `store_media_file()` (media normalization pipeline), and responsibility boundaries for virus scanning and persistence.
-
-## Security Implementation
-
-### Cache Protection Middleware
-
-- Located in `backend/api/middleware/security.py`
-- Default behavior: Disables caching for ALL endpoints with `Cache-Control: no-store, no-cache, must-revalidate, private`
-- Uses an allow list approach - only explicitly permitted paths can be cached
-- Cacheable paths include: static assets (`static/*`, `_next/static/*`), health checks, public store pages, documentation
-- Prevents sensitive data (auth tokens, API keys, user data) from being cached by browsers/proxies
-- To allow caching for a new endpoint, add it to `CACHEABLE_PATHS` in the middleware
-- Applied to both main API server and external API applications
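Note on the removed Code Style guidance: the deleted AGENTS.md asked for `%s` placeholders in `debug` log statements so that interpolation is deferred. The sketch below illustrates that rule with the standard `logging` module only. The `Expensive` class and the `style_demo` logger name are hypothetical and exist purely for the demonstration; nothing here is taken from the backend code.

```python
import logging


class Expensive:
    """Hypothetical object that counts how often it is rendered to a string."""

    def __init__(self) -> None:
        self.rendered = 0

    def __str__(self) -> str:
        self.rendered += 1
        return "expensive"


# Assumed setup: DEBUG is disabled, as it typically is in production.
logger = logging.getLogger("style_demo")
logger.setLevel(logging.INFO)
logger.addHandler(logging.NullHandler())

item = Expensive()

# Deferred interpolation: the argument is only formatted if the record is emitted.
logger.debug("Processing %s", item)
deferred_renders = item.rendered

# Eager interpolation: the f-string is built before logging checks the level.
logger.debug(f"Processing {item}")
eager_renders = item.rendered
```

With DEBUG disabled, the `%s` form never calls `__str__`, while the f-string form renders the object even though the message is discarded.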
--- a/autogpt_platform/backend/CLAUDE.md
+++ b/autogpt_platform/backend/CLAUDE.md
@@ -1 +1,170 @@
-@AGENTS.md
+# CLAUDE.md - Backend
+
+This file provides guidance to Claude Code when working with the backend.
+
+## Essential Commands
+
+To run something with Python package dependencies you MUST use `poetry run ...`.
+
+```bash
+# Install dependencies
+poetry install
+
+# Run database migrations
+poetry run prisma migrate dev
+
+# Start all services (database, redis, rabbitmq, clamav)
+docker compose up -d
+
+# Run the backend as a whole
+poetry run app
+
+# Run tests
+poetry run test
+
+# Run specific test
+poetry run pytest path/to/test_file.py::test_function_name
+
+# Run block tests (tests that validate all blocks work correctly)
+poetry run pytest backend/blocks/test/test_block.py -xvs
+
+# Run tests for a specific block (e.g., GetCurrentTimeBlock)
+poetry run pytest 'backend/blocks/test/test_block.py::test_available_blocks[GetCurrentTimeBlock]' -xvs
+
+# Lint and format
+# prefer format if you want to just "fix" it and only get the errors that can't be autofixed
+poetry run format  # Black + isort
+poetry run lint    # ruff
+```
+
+More details can be found in @TESTING.md
+
+### Creating/Updating Snapshots
+
+When you first write a test or when the expected output changes:
+
+```bash
+poetry run pytest path/to/test.py --snapshot-update
+```
+
+⚠️ **Important**: Always review snapshot changes before committing! Use `git diff` to verify the changes are expected.
+
+## Architecture
+
+- **API Layer**: FastAPI with REST and WebSocket endpoints
+- **Database**: PostgreSQL with Prisma ORM, includes pgvector for embeddings
+- **Queue System**: RabbitMQ for async task processing
+- **Execution Engine**: Separate executor service processes agent workflows
+- **Authentication**: JWT-based with Supabase integration
+- **Security**: Cache protection middleware prevents sensitive data caching in browsers/proxies
+
+## Testing Approach
+
+- Uses pytest with snapshot testing for API responses
+- Test files are colocated with source files (`*_test.py`)
+
+## Database Schema
+
+Key models (defined in `schema.prisma`):
+
+- `User`: Authentication and profile data
+- `AgentGraph`: Workflow definitions with version control
+- `AgentGraphExecution`: Execution history and results
+- `AgentNode`: Individual nodes in a workflow
+- `StoreListing`: Marketplace listings for sharing agents
+
+## Environment Configuration
+
+- **Backend**: `.env.default` (defaults) → `.env` (user overrides)
+
+## Common Development Tasks
+
+### Adding a new block
+
+Follow the comprehensive [Block SDK Guide](@../../docs/content/platform/block-sdk-guide.md) which covers:
+
+- Provider configuration with `ProviderBuilder`
+- Block schema definition
+- Authentication (API keys, OAuth, webhooks)
+- Testing and validation
+- File organization
+
+Quick steps:
+
+1. Create new file in `backend/blocks/`
+2. Configure provider using `ProviderBuilder` in `_config.py`
+3. Inherit from `Block` base class
+4. Define input/output schemas using `BlockSchema`
+5. Implement async `run` method
+6. Generate unique block ID using `uuid.uuid4()`
+7. Test with `poetry run pytest backend/blocks/test/test_block.py`
+
+Note: when making many new blocks analyze the interfaces for each of these blocks and picture if they would go well together in a graph-based editor or would they struggle to connect productively?
+ex: do the inputs and outputs tie well together?
+
+If you get any pushback or hit complex block conditions check the new_blocks guide in the docs.
+
+#### Handling files in blocks with `store_media_file()`
+
+When blocks need to work with files (images, videos, documents), use `store_media_file()` from `backend.util.file`. The `return_format` parameter determines what you get back:
+
+| Format | Use When | Returns |
+|--------|----------|---------|
+| `"for_local_processing"` | Processing with local tools (ffmpeg, MoviePy, PIL) | Local file path (e.g., `"image.png"`) |
+| `"for_external_api"` | Sending content to external APIs (Replicate, OpenAI) | Data URI (e.g., `"data:image/png;base64,..."`) |
+| `"for_block_output"` | Returning output from your block | Smart: `workspace://` in CoPilot, data URI in graphs |
+
+**Examples:**
+
+```python
+# INPUT: Need to process file locally with ffmpeg
+local_path = await store_media_file(
+    file=input_data.video,
+    execution_context=execution_context,
+    return_format="for_local_processing",
+)
+# local_path = "video.mp4" - use with Path/ffmpeg/etc
+
+# INPUT: Need to send to external API like Replicate
+image_b64 = await store_media_file(
+    file=input_data.image,
+    execution_context=execution_context,
+    return_format="for_external_api",
+)
+# image_b64 = "data:image/png;base64,iVBORw0..." - send to API
+
+# OUTPUT: Returning result from block
+result_url = await store_media_file(
+    file=generated_image_url,
+    execution_context=execution_context,
+    return_format="for_block_output",
+)
+yield "image_url", result_url
+# In CoPilot: result_url = "workspace://abc123"
+# In graphs:  result_url = "data:image/png;base64,..."
+```
+
+**Key points:**
+
+- `for_block_output` is the ONLY format that auto-adapts to execution context
+- Always use `for_block_output` for block outputs unless you have a specific reason not to
+- Never hardcode workspace checks - let `for_block_output` handle it
+
+### Modifying the API
+
+1. Update route in `backend/api/features/`
+2. Add/update Pydantic models in same directory
+3. Write tests alongside the route file
+4. Run `poetry run test` to verify
+
+## Security Implementation
+
+### Cache Protection Middleware
+
+- Located in `backend/api/middleware/security.py`
+- Default behavior: Disables caching for ALL endpoints with `Cache-Control: no-store, no-cache, must-revalidate, private`
+- Uses an allow list approach - only explicitly permitted paths can be cached
+- Cacheable paths include: static assets (`static/*`, `_next/static/*`), health checks, public store pages, documentation
+- Prevents sensitive data (auth tokens, API keys, user data) from being cached by browsers/proxies
+- To allow caching for a new endpoint, add it to `CACHEABLE_PATHS` in the middleware
+- Applied to both main API server and external API applications
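Note on the Cache Protection Middleware bullets above: the allow-list behavior they describe can be sketched as a pure function. This is an illustration under stated assumptions, not the code in `backend/api/middleware/security.py`. The names `CACHEABLE_PATTERNS` and `cache_control_for`, the example paths, and the use of `fnmatch` for matching are all hypothetical; the real `CACHEABLE_PATHS` may be structured and matched differently.

```python
from fnmatch import fnmatch
from typing import Optional

# Hypothetical allow list, loosely modeled on the documented cacheable paths.
CACHEABLE_PATTERNS = ("/static/*", "/_next/static/*", "/health")

# Header value quoted in the documentation above.
NO_CACHE = "no-store, no-cache, must-revalidate, private"


def cache_control_for(path: str) -> Optional[str]:
    """Return the Cache-Control value to force, or None if the path may be cached."""
    if any(fnmatch(path, pattern) for pattern in CACHEABLE_PATTERNS):
        return None  # explicitly allowed: leave caching headers untouched
    return NO_CACHE  # default: deny caching for everything else
```

The point of the design is that a new endpoint is uncacheable unless someone adds it to the list, so forgetting to classify a route fails safe.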
--- a/autogpt_platform/backend/Dockerfile
+++ b/autogpt_platform/backend/Dockerfile
@@ -1,5 +1,3 @@
-# ============================ DEPENDENCY BUILDER ============================ #
-
 FROM debian:13-slim AS builder

 # Set environment variables
@@ -50,111 +48,61 @@ RUN poetry install --no-ansi --no-root
 # Generate Prisma client
 COPY autogpt_platform/backend/schema.prisma ./
 COPY autogpt_platform/backend/backend/data/partial_types.py ./backend/data/partial_types.py
-COPY autogpt_platform/backend/scripts/gen_prisma_types_stub.py ./scripts/
+COPY autogpt_platform/backend/gen_prisma_types_stub.py ./
 RUN poetry run prisma generate && poetry run gen-prisma-stub

-# =============================== DB MIGRATOR =============================== #
-
-# Lightweight migrate stage - only needs Prisma CLI, not full Python environment
-FROM debian:13-slim AS migrate
-
-WORKDIR /app/autogpt_platform/backend
-
-ENV DEBIAN_FRONTEND=noninteractive
-
-# Install only what's needed for prisma migrate: Node.js and minimal Python for prisma-python
-RUN apt-get update && apt-get install -y --no-install-recommends \
-    python3.13 \
-    python3-pip \
-    ca-certificates \
-    && rm -rf /var/lib/apt/lists/*
-
-# Copy Node.js from builder (needed for Prisma CLI)
-COPY --from=builder /usr/bin/node /usr/bin/node
-COPY --from=builder /usr/lib/node_modules /usr/lib/node_modules
-COPY --from=builder /usr/bin/npm /usr/bin/npm
-
-# Copy Prisma binaries
-COPY --from=builder /root/.cache/prisma-python/binaries /root/.cache/prisma-python/binaries
-
-# Install prisma-client-py directly (much smaller than copying full venv)
-RUN pip3 install prisma>=0.15.0 --break-system-packages
-
-COPY autogpt_platform/backend/schema.prisma ./
-COPY autogpt_platform/backend/backend/data/partial_types.py ./backend/data/partial_types.py
-COPY autogpt_platform/backend/scripts/gen_prisma_types_stub.py ./scripts/
-COPY autogpt_platform/backend/migrations ./migrations
-
-# ============================== BACKEND SERVER ============================== #
-
-FROM debian:13-slim AS server
+FROM debian:13-slim AS server_dependencies

 WORKDIR /app

-ENV DEBIAN_FRONTEND=noninteractive
+ENV POETRY_HOME=/opt/poetry \
+    POETRY_NO_INTERACTION=1 \
+    POETRY_VIRTUALENVS_CREATE=true \
+    POETRY_VIRTUALENVS_IN_PROJECT=true \
+    DEBIAN_FRONTEND=noninteractive
+ENV PATH=/opt/poetry/bin:$PATH

-# Install Python, FFmpeg, ImageMagick, and CLI tools for agent use.
-# bubblewrap provides OS-level sandbox (whitelist-only FS + no network)
-# for the bash_exec MCP tool (fallback when E2B is not configured).
-# Using --no-install-recommends saves ~650MB by skipping unnecessary deps like llvm, mesa, etc.
-RUN apt-get update && apt-get install -y --no-install-recommends \
+# Install Python without upgrading system-managed packages
+RUN apt-get update && apt-get install -y \
    python3.13 \
    python3-pip \
-    ffmpeg \
-    imagemagick \
-    jq \
-    ripgrep \
-    tree \
-    bubblewrap \
    && rm -rf /var/lib/apt/lists/*

-# Copy poetry (build-time only, for `poetry install --only-root` to create entry points)
+# Copy only necessary files from builder
+COPY --from=builder /app /app
 COPY --from=builder /usr/local/lib/python3* /usr/local/lib/python3*
 COPY --from=builder /usr/local/bin/poetry /usr/local/bin/poetry
-# Copy Node.js installation for Prisma and agent-browser.
-# npm/npx are symlinks in the builder (-> ../lib/node_modules/npm/bin/*-cli.js);
-# COPY resolves them to regular files, breaking require() paths.  Recreate as
-# proper symlinks so npm/npx can find their modules.
+# Copy Node.js installation for Prisma
 COPY --from=builder /usr/bin/node /usr/bin/node
 COPY --from=builder /usr/lib/node_modules /usr/lib/node_modules
-RUN ln -s ../lib/node_modules/npm/bin/npm-cli.js /usr/bin/npm \
-    && ln -s ../lib/node_modules/npm/bin/npx-cli.js /usr/bin/npx
+COPY --from=builder /usr/bin/npm /usr/bin/npm
+COPY --from=builder /usr/bin/npx /usr/bin/npx
 COPY --from=builder /root/.cache/prisma-python/binaries /root/.cache/prisma-python/binaries

-# Install agent-browser (Copilot browser tool) using the system chromium package.
-# Chrome for Testing (the binary agent-browser downloads via `agent-browser install`)
-# has no ARM64 builds, so we use the distro-packaged chromium instead — verified to
-# work with agent-browser via Docker tests on arm64; amd64 is validated in CI.
-# Note: system chromium tracks the Debian package schedule rather than a pinned
-# Chrome for Testing release. If agent-browser requires a specific Chrome version,
-# verify compatibility against the chromium package version in the base image.
-RUN apt-get update \
-    && apt-get install -y --no-install-recommends chromium fonts-liberation \
-    && rm -rf /var/lib/apt/lists/* \
-    && npm install -g agent-browser \
-    && rm -rf /tmp/* /root/.npm
+ENV PATH="/app/autogpt_platform/backend/.venv/bin:$PATH"

-ENV AGENT_BROWSER_EXECUTABLE_PATH=/usr/bin/chromium
+RUN mkdir -p /app/autogpt_platform/autogpt_libs
+RUN mkdir -p /app/autogpt_platform/backend
+
+COPY autogpt_platform/autogpt_libs /app/autogpt_platform/autogpt_libs
+
+COPY autogpt_platform/backend/poetry.lock autogpt_platform/backend/pyproject.toml /app/autogpt_platform/backend/

 WORKDIR /app/autogpt_platform/backend

-# Copy only the .venv from builder (not the entire /app directory)
-# The .venv includes the generated Prisma client
-COPY --from=builder /app/autogpt_platform/backend/.venv ./.venv
-ENV PATH="/app/autogpt_platform/backend/.venv/bin:$PATH"
+FROM server_dependencies AS migrate

-# Copy dependency files + autogpt_libs (path dependency)
-COPY autogpt_platform/autogpt_libs /app/autogpt_platform/autogpt_libs
-COPY autogpt_platform/backend/poetry.lock autogpt_platform/backend/pyproject.toml ./
+# Migration stage only needs schema and migrations - much lighter than full backend
+COPY autogpt_platform/backend/schema.prisma /app/autogpt_platform/backend/
+COPY autogpt_platform/backend/backend/data/partial_types.py /app/autogpt_platform/backend/backend/data/partial_types.py
+COPY autogpt_platform/backend/migrations /app/autogpt_platform/backend/migrations

-# Copy backend code + docs (for Copilot docs search)
-COPY autogpt_platform/backend ./
+FROM server_dependencies AS server
+
+COPY autogpt_platform/backend /app/autogpt_platform/backend
 COPY docs /app/docs
-# Install the project package to create entry point scripts in .venv/bin/
-# (e.g., rest, executor, ws, db, scheduler, notification - see [tool.poetry.scripts])
-RUN POETRY_VIRTUALENVS_CREATE=true POETRY_VIRTUALENVS_IN_PROJECT=true \
-    poetry install --no-ansi --only-root
+RUN poetry install --no-ansi --only-root

 ENV PORT=8000

-CMD ["rest"]
+CMD ["poetry", "run", "rest"]
--- a/autogpt_platform/backend/backend/api/conftest.py
+++ b/autogpt_platform/backend/backend/api/conftest.py
@@ -1,9 +1,4 @@
-"""Common test fixtures for server tests.
-
-Note: Common fixtures like test_user_id, admin_user_id, target_user_id,
-setup_test_user, and setup_admin_user are defined in the parent conftest.py
-(backend/conftest.py) and are available here automatically.
-"""
+"""Common test fixtures for server tests."""

 import pytest
 from pytest_snapshot.plugin import Snapshot
@@ -16,6 +11,54 @@ def configured_snapshot(snapshot: Snapshot) -> Snapshot:
    return snapshot


+@pytest.fixture
+def test_user_id() -> str:
+    """Test user ID fixture."""
+    return "3e53486c-cf57-477e-ba2a-cb02dc828e1a"
+
+
+@pytest.fixture
+def admin_user_id() -> str:
+    """Admin user ID fixture."""
+    return "4e53486c-cf57-477e-ba2a-cb02dc828e1b"
+
+
+@pytest.fixture
+def target_user_id() -> str:
+    """Target user ID fixture."""
+    return "5e53486c-cf57-477e-ba2a-cb02dc828e1c"
+
+
+@pytest.fixture
+async def setup_test_user(test_user_id):
+    """Create test user in database before tests."""
+    from backend.data.user import get_or_create_user
+
+    # Create the test user in the database using JWT token format
+    user_data = {
+        "sub": test_user_id,
+        "email": "test@example.com",
+        "user_metadata": {"name": "Test User"},
+    }
+    await get_or_create_user(user_data)
+    return test_user_id
+
+
+@pytest.fixture
+async def setup_admin_user(admin_user_id):
+    """Create admin user in database before tests."""
+    from backend.data.user import get_or_create_user
+
+    # Create the admin user in the database using JWT token format
+    user_data = {
+        "sub": admin_user_id,
+        "email": "test-admin@example.com",
+        "user_metadata": {"name": "Test Admin"},
+    }
+    await get_or_create_user(user_data)
+    return admin_user_id
+
+
 @pytest.fixture
 def mock_jwt_user(test_user_id):
    """Provide mock JWT payload for regular user testing."""
--- a/autogpt_platform/backend/backend/api/external/middleware.py
+++ b/autogpt_platform/backend/backend/api/external/middleware.py
@@ -88,23 +88,20 @@ async def require_auth(
    )


-def require_permission(*permissions: APIKeyPermission):
+def require_permission(permission: APIKeyPermission):
    """
-    Dependency function for checking required permissions.
-    All listed permissions must be present.
+    Dependency function for checking specific permissions
    (works with API keys and OAuth tokens)
    """

-    async def check_permissions(
+    async def check_permission(
        auth: APIAuthorizationInfo = Security(require_auth),
    ) -> APIAuthorizationInfo:
-        missing = [p for p in permissions if p not in auth.scopes]
-        if missing:
+        if permission not in auth.scopes:
            raise HTTPException(
                status_code=status.HTTP_403_FORBIDDEN,
-                detail=f"Missing required permission(s): "
-                f"{', '.join(p.value for p in missing)}",
+                detail=f"Missing required permission: {permission.value}",
            )
        return auth

-    return check_permissions
+    return check_permission
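Note on the `require_permission` change above: the hunk swaps a variadic "all listed permissions must be present" check for a single-permission check. The difference is sketched below in framework-free Python. `Permission`, `Forbidden`, `require_all`, and `require_one` are hypothetical stand-ins for `APIKeyPermission`, `HTTPException` with status 403, and the two versions of the dependency factory; the real functions receive an `APIAuthorizationInfo` through `Security(require_auth)` rather than a plain set of scopes.

```python
from enum import Enum
from typing import Callable, Set


class Permission(str, Enum):
    """Hypothetical stand-in for prisma.enums.APIKeyPermission."""

    WRITE_GRAPH = "WRITE_GRAPH"
    WRITE_LIBRARY = "WRITE_LIBRARY"


class Forbidden(Exception):
    """Hypothetical stand-in for HTTPException(status_code=403)."""


def require_all(*permissions: Permission) -> Callable[[Set[Permission]], None]:
    """Variadic form (removed side of the hunk): every permission must be present."""

    def check(scopes: Set[Permission]) -> None:
        missing = [p for p in permissions if p not in scopes]
        if missing:
            names = ", ".join(p.value for p in missing)
            raise Forbidden(f"Missing required permission(s): {names}")

    return check


def require_one(permission: Permission) -> Callable[[Set[Permission]], None]:
    """Single-permission form (added side of the hunk)."""

    def check(scopes: Set[Permission]) -> None:
        if permission not in scopes:
            raise Forbidden(f"Missing required permission: {permission.value}")

    return check
```

Under the single-permission form, an endpoint that needs two scopes has to declare two separate dependencies, and each one reports only its own missing permission.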
--- a/autogpt_platform/backend/backend/api/external/v1/integrations.py
+++ b/autogpt_platform/backend/backend/api/external/v1/integrations.py
@@ -18,22 +18,14 @@ from pydantic import BaseModel, Field, SecretStr

 from backend.api.external.middleware import require_permission
 from backend.api.features.integrations.models import get_all_provider_names
-from backend.api.features.integrations.router import (
-    CredentialsMetaResponse,
-    to_meta_response,
-)
 from backend.data.auth.base import APIAuthorizationInfo
 from backend.data.model import (
    APIKeyCredentials,
    Credentials,
    CredentialsType,
    HostScopedCredentials,
+    OAuth2Credentials,
    UserPasswordCredentials,
-    is_sdk_default,
-)
-from backend.integrations.credentials_store import (
-    is_system_credential,
-    provider_matches,
 )
 from backend.integrations.creds_manager import IntegrationCredentialsManager
 from backend.integrations.oauth import CREDENTIALS_BY_PROVIDER, HANDLERS_BY_NAME
@@ -99,6 +91,18 @@ class OAuthCompleteResponse(BaseModel):
    )


+class CredentialSummary(BaseModel):
+    """Summary of a credential without sensitive data."""
+
+    id: str
+    provider: str
+    type: CredentialsType
+    title: Optional[str] = None
+    scopes: Optional[list[str]] = None
+    username: Optional[str] = None
+    host: Optional[str] = None
+
+
 class ProviderInfo(BaseModel):
    """Information about an integration provider."""

@@ -469,12 +473,12 @@ async def complete_oauth(
    )


-@integrations_router.get("/credentials", response_model=list[CredentialsMetaResponse])
+@integrations_router.get("/credentials", response_model=list[CredentialSummary])
 async def list_credentials(
    auth: APIAuthorizationInfo = Security(
        require_permission(APIKeyPermission.READ_INTEGRATIONS)
    ),
-) -> list[CredentialsMetaResponse]:
+) -> list[CredentialSummary]:
    """
    List all credentials for the authenticated user.

@@ -482,19 +486,28 @@ async def list_credentials(
    """
    credentials = await creds_manager.store.get_all_creds(auth.user_id)
    return [
-        to_meta_response(cred) for cred in credentials if not is_sdk_default(cred.id)
+        CredentialSummary(
+            id=cred.id,
+            provider=cred.provider,
+            type=cred.type,
+            title=cred.title,
+            scopes=cred.scopes if isinstance(cred, OAuth2Credentials) else None,
+            username=cred.username if isinstance(cred, OAuth2Credentials) else None,
+            host=cred.host if isinstance(cred, HostScopedCredentials) else None,
+        )
+        for cred in credentials
    ]


 @integrations_router.get(
-    "/{provider}/credentials", response_model=list[CredentialsMetaResponse]
+    "/{provider}/credentials", response_model=list[CredentialSummary]
 )
 async def list_credentials_by_provider(
    provider: Annotated[str, Path(title="The provider to list credentials for")],
    auth: APIAuthorizationInfo = Security(
        require_permission(APIKeyPermission.READ_INTEGRATIONS)
    ),
-) -> list[CredentialsMetaResponse]:
+) -> list[CredentialSummary]:
    """
    List credentials for a specific provider.
    """
@@ -502,7 +515,16 @@ async def list_credentials_by_provider(
        auth.user_id, provider
    )
    return [
-        to_meta_response(cred) for cred in credentials if not is_sdk_default(cred.id)
+        CredentialSummary(
+            id=cred.id,
+            provider=cred.provider,
+            type=cred.type,
+            title=cred.title,
+            scopes=cred.scopes if isinstance(cred, OAuth2Credentials) else None,
+            username=cred.username if isinstance(cred, OAuth2Credentials) else None,
+            host=cred.host if isinstance(cred, HostScopedCredentials) else None,
+        )
+        for cred in credentials
    ]


@@ -575,11 +597,11 @@ async def create_credential(
    # Store credentials
    try:
        await creds_manager.create(auth.user_id, credentials)
-    except Exception:
-        logger.exception("Failed to store credentials")
+    except Exception as e:
+        logger.error(f"Failed to store credentials: {e}")
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
-            detail="Failed to store credentials",
+            detail=f"Failed to store credentials: {str(e)}",
        )

    logger.info(f"Created {request.type} credentials for provider {provider}")
@@ -617,23 +639,15 @@ async def delete_credential(
    use the main API's delete endpoint which handles webhook cleanup and
    token revocation.
    """
-    if is_sdk_default(cred_id):
-        raise HTTPException(
-            status_code=status.HTTP_404_NOT_FOUND, detail="Credentials not found"
-        )
-    if is_system_credential(cred_id):
-        raise HTTPException(
-            status_code=status.HTTP_403_FORBIDDEN,
-            detail="System-managed credentials cannot be deleted",
-        )
    creds = await creds_manager.store.get_creds_by_id(auth.user_id, cred_id)
    if not creds:
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND, detail="Credentials not found"
        )
-    if not provider_matches(creds.provider, provider):
+    if creds.provider != provider:
        raise HTTPException(
-            status_code=status.HTTP_404_NOT_FOUND, detail="Credentials not found"
+            status_code=status.HTTP_404_NOT_FOUND,
+            detail="Credentials do not match the specified provider",
        )

    await creds_manager.delete(auth.user_id, cred_id)
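Note on the `CredentialSummary` mapping above: both list endpoints build the summary with the same inline comprehension. The sketch below shows that mapping as one helper, using standard-library dataclasses so it runs without Pydantic. `OAuth2Cred`, `HostScopedCred`, `Summary`, and `summarize` are hypothetical stand-ins for `OAuth2Credentials`, `HostScopedCredentials`, `CredentialSummary`, and the inline construction; the field sets are assumptions reduced to what the hunk references.

```python
from dataclasses import asdict, dataclass
from typing import Optional, Union


@dataclass
class OAuth2Cred:
    """Hypothetical stand-in for OAuth2Credentials."""

    id: str
    provider: str
    access_token: str  # secret: must never reach the summary
    scopes: list
    title: Optional[str] = None
    username: Optional[str] = None
    type: str = "oauth2"


@dataclass
class HostScopedCred:
    """Hypothetical stand-in for HostScopedCredentials."""

    id: str
    provider: str
    host: str
    headers: dict  # secret: must never reach the summary
    title: Optional[str] = None
    type: str = "host_scoped"


@dataclass
class Summary:
    """Hypothetical stand-in for CredentialSummary (non-sensitive fields only)."""

    id: str
    provider: str
    type: str
    title: Optional[str] = None
    scopes: Optional[list] = None
    username: Optional[str] = None
    host: Optional[str] = None


def summarize(cred: Union[OAuth2Cred, HostScopedCred]) -> Summary:
    """Copy only non-sensitive fields; type-specific ones default to None."""
    return Summary(
        id=cred.id,
        provider=cred.provider,
        type=cred.type,
        title=cred.title,
        scopes=cred.scopes if isinstance(cred, OAuth2Cred) else None,
        username=cred.username if isinstance(cred, OAuth2Cred) else None,
        host=cred.host if isinstance(cred, HostScopedCred) else None,
    )
```

A shared helper of this shape would let `list_credentials` and `list_credentials_by_provider` reduce to `[summarize(c) for c in credentials]`, keeping the two responses in sync.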
--- a/autogpt_platform/backend/backend/api/external/v1/routes.py
+++ b/autogpt_platform/backend/backend/api/external/v1/routes.py
@@ -1,7 +1,7 @@
 import logging
 import urllib.parse
 from collections import defaultdict
-from typing import Annotated, Any, Optional, Sequence
+from typing import Annotated, Any, Literal, Optional, Sequence

 from fastapi import APIRouter, Body, HTTPException, Security
 from prisma.enums import AgentExecutionStatus, APIKeyPermission
@@ -9,17 +9,15 @@ from pydantic import BaseModel, Field
 from typing_extensions import TypedDict

 import backend.api.features.store.cache as store_cache
-import backend.api.features.store.db as store_db
 import backend.api.features.store.model as store_model
-import backend.blocks
-from backend.api.external.middleware import require_auth, require_permission
+import backend.data.block
+from backend.api.external.middleware import require_permission
 from backend.data import execution as execution_db
 from backend.data import graph as graph_db
 from backend.data import user as user_db
 from backend.data.auth.base import APIAuthorizationInfo
 from backend.data.block import BlockInput, CompletedBlockOutput
 from backend.executor.utils import add_graph_execution
-from backend.integrations.webhooks.graph_lifecycle_hooks import on_graph_activate
 from backend.util.settings import Settings

 from .integrations import integrations_router
@@ -69,7 +67,7 @@ async def get_user_info(
    dependencies=[Security(require_permission(APIKeyPermission.READ_BLOCK))],
 )
 async def get_graph_blocks() -> Sequence[dict[Any, Any]]:
-    blocks = [block() for block in backend.blocks.get_blocks().values()]
+    blocks = [block() for block in backend.data.block.get_blocks().values()]
    return [b.to_dict() for b in blocks if not b.disabled]


@@ -85,7 +83,7 @@ async def execute_graph_block(
        require_permission(APIKeyPermission.EXECUTE_BLOCK)
    ),
 ) -> CompletedBlockOutput:
-    obj = backend.blocks.get_block(block_id)
+    obj = backend.data.block.get_block(block_id)
    if not obj:
        raise HTTPException(status_code=404, detail=f"Block #{block_id} not found.")
    if obj.disabled:
@@ -97,43 +95,6 @@ async def execute_graph_block(
    return output


-@v1_router.post(
-    path="/graphs",
-    tags=["graphs"],
-    status_code=201,
-    dependencies=[
-        Security(
-            require_permission(
-                APIKeyPermission.WRITE_GRAPH, APIKeyPermission.WRITE_LIBRARY
-            )
-        )
-    ],
-)
-async def create_graph(
-    graph: graph_db.Graph,
-    auth: APIAuthorizationInfo = Security(
-        require_permission(APIKeyPermission.WRITE_GRAPH, APIKeyPermission.WRITE_LIBRARY)
-    ),
-) -> graph_db.GraphModel:
-    """
-    Create a new agent graph.
-
-    The graph will be validated and assigned a new ID.
-    It is automatically added to the user's library.
-    """
-    from backend.api.features.library import db as library_db
-
-    graph_model = graph_db.make_graph_model(graph, auth.user_id)
-    graph_model.reassign_ids(user_id=auth.user_id, reassign_graph_id=True)
-    graph_model.validate_graph(for_run=False)
-
-    await graph_db.create_graph(graph_model, user_id=auth.user_id)
-    await library_db.create_library_agent(graph_model, auth.user_id)
-    activated_graph = await on_graph_activate(graph_model, user_id=auth.user_id)
-
-    return activated_graph
-
-
@v1_router.post(
    path="/graphs/{graph_id}/execute/{graph_version}",
    tags=["graphs"],
@@ -231,13 +192,13 @@ async def get_graph_execution_results(
@v1_router.get(
    path="/store/agents",
    tags=["store"],
-    dependencies=[Security(require_auth)],  # data is public; auth required as anti-DDoS
+    dependencies=[Security(require_permission(APIKeyPermission.READ_STORE))],
    response_model=store_model.StoreAgentsResponse,
 )
 async def get_store_agents(
    featured: bool = False,
    creator: str | None = None,
-    sorted_by: store_db.StoreAgentsSortOptions | None = None,
+    sorted_by: Literal["rating", "runs", "name", "updated_at"] | None = None,
    search_query: str | None = None,
    category: str | None = None,
    page: int = 1,
@@ -279,7 +240,7 @@ async def get_store_agents(
@v1_router.get(
    path="/store/agents/{username}/{agent_name}",
    tags=["store"],
-    dependencies=[Security(require_auth)],  # data is public; auth required as anti-DDoS
+    dependencies=[Security(require_permission(APIKeyPermission.READ_STORE))],
    response_model=store_model.StoreAgentDetails,
 )
 async def get_store_agent(
@@ -307,13 +268,13 @@ async def get_store_agent(
@v1_router.get(
    path="/store/creators",
    tags=["store"],
-    dependencies=[Security(require_auth)],  # data is public; auth required as anti-DDoS
+    dependencies=[Security(require_permission(APIKeyPermission.READ_STORE))],
    response_model=store_model.CreatorsResponse,
 )
 async def get_store_creators(
    featured: bool = False,
    search_query: str | None = None,
-    sorted_by: store_db.StoreCreatorsSortOptions | None = None,
+    sorted_by: Literal["agent_rating", "agent_runs", "num_agents"] | None = None,
    page: int = 1,
    page_size: int = 20,
 ) -> store_model.CreatorsResponse:
@@ -349,7 +310,7 @@ async def get_store_creators(
@v1_router.get(
    path="/store/creators/{username}",
    tags=["store"],
-    dependencies=[Security(require_auth)],  # data is public; auth required as anti-DDoS
+    dependencies=[Security(require_permission(APIKeyPermission.READ_STORE))],
    response_model=store_model.CreatorDetails,
 )
 async def get_store_creator(
--- a/autogpt_platform/backend/backend/api/external/v1/tools.py
+++ b/autogpt_platform/backend/backend/api/external/v1/tools.py
@@ -15,9 +15,9 @@ from prisma.enums import APIKeyPermission
 from pydantic import BaseModel, Field

 from backend.api.external.middleware import require_permission
-from backend.copilot.model import ChatSession
-from backend.copilot.tools import find_agent_tool, run_agent_tool
-from backend.copilot.tools.models import ToolResponseBase
+from backend.api.features.chat.model import ChatSession
+from backend.api.features.chat.tools import find_agent_tool, run_agent_tool
+from backend.api.features.chat.tools.models import ToolResponseBase
 from backend.data.auth.base import APIAuthorizationInfo

 logger = logging.getLogger(__name__)
@@ -72,7 +72,7 @@ class RunAgentRequest(BaseModel):

 def _create_ephemeral_session(user_id: str) -> ChatSession:
    """Create an ephemeral session for stateless API requests."""
-    return ChatSession.new(user_id, dry_run=False)
+    return ChatSession.new(user_id)


@tools_router.post(
--- a/autogpt_platform/backend/backend/api/features/admin/platform_cost_routes.py
+++ b/autogpt_platform/backend/backend/api/features/admin/platform_cost_routes.py
@@ -1,98 +0,0 @@
-import logging
-from datetime import datetime
-
-from autogpt_libs.auth import get_user_id, requires_admin_user
-from cachetools import TTLCache
-from fastapi import APIRouter, Query, Security
-from pydantic import BaseModel
-
-from backend.data.platform_cost import (
-    CostLogRow,
-    PlatformCostDashboard,
-    get_platform_cost_dashboard,
-    get_platform_cost_logs,
-)
-from backend.util.models import Pagination
-
-logger = logging.getLogger(__name__)
-
-# Cache dashboard results for 30 seconds per unique filter combination.
-# The table is append-only so stale reads are acceptable for analytics.
-_DASHBOARD_CACHE_TTL = 30
-_dashboard_cache: TTLCache[tuple, PlatformCostDashboard] = TTLCache(
-    maxsize=256, ttl=_DASHBOARD_CACHE_TTL
-)
-
-
-router = APIRouter(
-    prefix="/platform-costs",
-    tags=["platform-cost", "admin"],
-    dependencies=[Security(requires_admin_user)],
-)
-
-
-class PlatformCostLogsResponse(BaseModel):
-    logs: list[CostLogRow]
-    pagination: Pagination
-
-
-@router.get(
-    "/dashboard",
-    response_model=PlatformCostDashboard,
-    summary="Get Platform Cost Dashboard",
-)
-async def get_cost_dashboard(
-    admin_user_id: str = Security(get_user_id),
-    start: datetime | None = Query(None),
-    end: datetime | None = Query(None),
-    provider: str | None = Query(None),
-    user_id: str | None = Query(None),
-):
-    logger.info("Admin %s fetching platform cost dashboard", admin_user_id)
-    cache_key = (start, end, provider, user_id)
-    cached = _dashboard_cache.get(cache_key)
-    if cached is not None:
-        return cached
-    result = await get_platform_cost_dashboard(
-        start=start,
-        end=end,
-        provider=provider,
-        user_id=user_id,
-    )
-    _dashboard_cache[cache_key] = result
-    return result
-
-
-@router.get(
-    "/logs",
-    response_model=PlatformCostLogsResponse,
-    summary="Get Platform Cost Logs",
-)
-async def get_cost_logs(
-    admin_user_id: str = Security(get_user_id),
-    start: datetime | None = Query(None),
-    end: datetime | None = Query(None),
-    provider: str | None = Query(None),
-    user_id: str | None = Query(None),
-    page: int = Query(1, ge=1),
-    page_size: int = Query(50, ge=1, le=200),
-):
-    logger.info("Admin %s fetching platform cost logs", admin_user_id)
-    logs, total = await get_platform_cost_logs(
-        start=start,
-        end=end,
-        provider=provider,
-        user_id=user_id,
-        page=page,
-        page_size=page_size,
-    )
-    total_pages = (total + page_size - 1) // page_size
-    return PlatformCostLogsResponse(
-        logs=logs,
-        pagination=Pagination(
-            total_items=total,
-            total_pages=total_pages,
-            current_page=page,
-            page_size=page_size,
-        ),
-    )
--- a/autogpt_platform/backend/backend/api/features/admin/platform_cost_routes_test.py
+++ b/autogpt_platform/backend/backend/api/features/admin/platform_cost_routes_test.py
@@ -1,192 +0,0 @@
-from unittest.mock import AsyncMock
-
-import fastapi
-import fastapi.testclient
-import pytest
-import pytest_mock
-from autogpt_libs.auth.jwt_utils import get_jwt_payload
-
-from backend.data.platform_cost import PlatformCostDashboard
-
-from . import platform_cost_routes
-from .platform_cost_routes import router as platform_cost_router
-
-app = fastapi.FastAPI()
-app.include_router(platform_cost_router)
-
-client = fastapi.testclient.TestClient(app)
-
-
-@pytest.fixture(autouse=True)
-def setup_app_admin_auth(mock_jwt_admin):
-    """Setup admin auth overrides for all tests in this module"""
-    app.dependency_overrides[get_jwt_payload] = mock_jwt_admin["get_jwt_payload"]
-    # Clear TTL cache so each test starts cold.
-    platform_cost_routes._dashboard_cache.clear()
-    yield
-    app.dependency_overrides.clear()
-
-
-def test_get_dashboard_success(
-    mocker: pytest_mock.MockerFixture,
-) -> None:
-    real_dashboard = PlatformCostDashboard(
-        by_provider=[],
-        by_user=[],
-        total_cost_microdollars=0,
-        total_requests=0,
-        total_users=0,
-    )
-    mocker.patch(
-        "backend.api.features.admin.platform_cost_routes.get_platform_cost_dashboard",
-        AsyncMock(return_value=real_dashboard),
-    )
-
-    response = client.get("/platform-costs/dashboard")
-    assert response.status_code == 200
-    data = response.json()
-    assert "by_provider" in data
-    assert "by_user" in data
-    assert data["total_cost_microdollars"] == 0
-
-
-def test_get_logs_success(
-    mocker: pytest_mock.MockerFixture,
-) -> None:
-    mocker.patch(
-        "backend.api.features.admin.platform_cost_routes.get_platform_cost_logs",
-        AsyncMock(return_value=([], 0)),
-    )
-
-    response = client.get("/platform-costs/logs")
-    assert response.status_code == 200
-    data = response.json()
-    assert data["logs"] == []
-    assert data["pagination"]["total_items"] == 0
-
-
-def test_get_dashboard_with_filters(
-    mocker: pytest_mock.MockerFixture,
-) -> None:
-    real_dashboard = PlatformCostDashboard(
-        by_provider=[],
-        by_user=[],
-        total_cost_microdollars=0,
-        total_requests=0,
-        total_users=0,
-    )
-    mock_dashboard = AsyncMock(return_value=real_dashboard)
-    mocker.patch(
-        "backend.api.features.admin.platform_cost_routes.get_platform_cost_dashboard",
-        mock_dashboard,
-    )
-
-    response = client.get(
-        "/platform-costs/dashboard",
-        params={
-            "start": "2026-01-01T00:00:00",
-            "end": "2026-04-01T00:00:00",
-            "provider": "openai",
-            "user_id": "test-user-123",
-        },
-    )
-    assert response.status_code == 200
-    mock_dashboard.assert_called_once()
-    call_kwargs = mock_dashboard.call_args.kwargs
-    assert call_kwargs["provider"] == "openai"
-    assert call_kwargs["user_id"] == "test-user-123"
-    assert call_kwargs["start"] is not None
-    assert call_kwargs["end"] is not None
-
-
-def test_get_logs_with_pagination(
-    mocker: pytest_mock.MockerFixture,
-) -> None:
-    mocker.patch(
-        "backend.api.features.admin.platform_cost_routes.get_platform_cost_logs",
-        AsyncMock(return_value=([], 0)),
-    )
-
-    response = client.get(
-        "/platform-costs/logs",
-        params={"page": 2, "page_size": 25, "provider": "anthropic"},
-    )
-    assert response.status_code == 200
-    data = response.json()
-    assert data["pagination"]["current_page"] == 2
-    assert data["pagination"]["page_size"] == 25
-
-
-def test_get_dashboard_requires_admin() -> None:
-    import fastapi
-    from fastapi import HTTPException
-
-    def reject_jwt(request: fastapi.Request):
-        raise HTTPException(status_code=401, detail="Not authenticated")
-
-    app.dependency_overrides[get_jwt_payload] = reject_jwt
-    try:
-        response = client.get("/platform-costs/dashboard")
-        assert response.status_code == 401
-        response = client.get("/platform-costs/logs")
-        assert response.status_code == 401
-    finally:
-        app.dependency_overrides.clear()
-
-
-def test_get_dashboard_rejects_non_admin(mock_jwt_user, mock_jwt_admin) -> None:
-    """Non-admin JWT must be rejected with 403 by requires_admin_user."""
-    app.dependency_overrides[get_jwt_payload] = mock_jwt_user["get_jwt_payload"]
-    try:
-        response = client.get("/platform-costs/dashboard")
-        assert response.status_code == 403
-        response = client.get("/platform-costs/logs")
-        assert response.status_code == 403
-    finally:
-        app.dependency_overrides[get_jwt_payload] = mock_jwt_admin["get_jwt_payload"]
-
-
-def test_get_logs_invalid_page_size_too_large() -> None:
-    """page_size > 200 must be rejected with 422."""
-    response = client.get("/platform-costs/logs", params={"page_size": 201})
-    assert response.status_code == 422
-
-
-def test_get_logs_invalid_page_size_zero() -> None:
-    """page_size = 0 (below ge=1) must be rejected with 422."""
-    response = client.get("/platform-costs/logs", params={"page_size": 0})
-    assert response.status_code == 422
-
-
-def test_get_logs_invalid_page_negative() -> None:
-    """page < 1 must be rejected with 422."""
-    response = client.get("/platform-costs/logs", params={"page": 0})
-    assert response.status_code == 422
-
-
-def test_get_dashboard_invalid_date_format() -> None:
-    """Malformed start date must be rejected with 422."""
-    response = client.get("/platform-costs/dashboard", params={"start": "not-a-date"})
-    assert response.status_code == 422
-
-
-def test_get_dashboard_cache_hit(
-    mocker: pytest_mock.MockerFixture,
-) -> None:
-    """Second identical request returns cached result without calling the DB again."""
-    real_dashboard = PlatformCostDashboard(
-        by_provider=[],
-        by_user=[],
-        total_cost_microdollars=42,
-        total_requests=1,
-        total_users=1,
-    )
-    mock_fn = mocker.patch(
-        "backend.api.features.admin.platform_cost_routes.get_platform_cost_dashboard",
-        AsyncMock(return_value=real_dashboard),
-    )
-
-    client.get("/platform-costs/dashboard")
-    client.get("/platform-costs/dashboard")
-
-    mock_fn.assert_awaited_once()  # second request hit the cache
--- a/autogpt_platform/backend/backend/api/features/admin/rate_limit_admin_routes.py
+++ b/autogpt_platform/backend/backend/api/features/admin/rate_limit_admin_routes.py
@@ -1,259 +0,0 @@
-"""Admin endpoints for checking and resetting user CoPilot rate limit usage."""
-
-import logging
-from typing import Optional
-
-from autogpt_libs.auth import get_user_id, requires_admin_user
-from fastapi import APIRouter, Body, HTTPException, Security
-from pydantic import BaseModel
-
-from backend.copilot.config import ChatConfig
-from backend.copilot.rate_limit import (
-    SubscriptionTier,
-    get_global_rate_limits,
-    get_usage_status,
-    get_user_tier,
-    reset_user_usage,
-    set_user_tier,
-)
-from backend.data.user import get_user_by_email, get_user_email_by_id, search_users
-
-logger = logging.getLogger(__name__)
-
-config = ChatConfig()
-
-router = APIRouter(
-    prefix="/admin",
-    tags=["copilot", "admin"],
-    dependencies=[Security(requires_admin_user)],
-)
-
-
-class UserRateLimitResponse(BaseModel):
-    user_id: str
-    user_email: Optional[str] = None
-    daily_token_limit: int
-    weekly_token_limit: int
-    daily_tokens_used: int
-    weekly_tokens_used: int
-    tier: SubscriptionTier
-
-
-class UserTierResponse(BaseModel):
-    user_id: str
-    tier: SubscriptionTier
-
-
-class SetUserTierRequest(BaseModel):
-    user_id: str
-    tier: SubscriptionTier
-
-
-async def _resolve_user_id(
-    user_id: Optional[str], email: Optional[str]
-) -> tuple[str, Optional[str]]:
-    """Resolve a user_id and email from the provided parameters.
-
-    Returns (user_id, email). Accepts either user_id or email; at least one
-    must be provided.  When both are provided, ``email`` takes precedence.
-    """
-    if email:
-        user = await get_user_by_email(email)
-        if not user:
-            raise HTTPException(
-                status_code=404, detail="No user found with the provided email."
-            )
-        return user.id, email
-
-    if not user_id:
-        raise HTTPException(
-            status_code=400,
-            detail="Either user_id or email query parameter is required.",
-        )
-
-    # We have a user_id; try to look up their email for display purposes.
-    # This is non-critical -- a failure should not block the response.
-    try:
-        resolved_email = await get_user_email_by_id(user_id)
-    except Exception:
-        logger.warning("Failed to resolve email for user %s", user_id, exc_info=True)
-        resolved_email = None
-    return user_id, resolved_email
-
-
-@router.get(
-    "/rate_limit",
-    response_model=UserRateLimitResponse,
-    summary="Get User Rate Limit",
-)
-async def get_user_rate_limit(
-    user_id: Optional[str] = None,
-    email: Optional[str] = None,
-    admin_user_id: str = Security(get_user_id),
-) -> UserRateLimitResponse:
-    """Get a user's current usage and effective rate limits. Admin-only.
-
-    Accepts either ``user_id`` or ``email`` as a query parameter.
-    When ``email`` is provided the user is looked up by email first.
-    """
-    resolved_id, resolved_email = await _resolve_user_id(user_id, email)
-
-    logger.info("Admin %s checking rate limit for user %s", admin_user_id, resolved_id)
-
-    daily_limit, weekly_limit, tier = await get_global_rate_limits(
-        resolved_id, config.daily_token_limit, config.weekly_token_limit
-    )
-    usage = await get_usage_status(resolved_id, daily_limit, weekly_limit, tier=tier)
-
-    return UserRateLimitResponse(
-        user_id=resolved_id,
-        user_email=resolved_email,
-        daily_token_limit=daily_limit,
-        weekly_token_limit=weekly_limit,
-        daily_tokens_used=usage.daily.used,
-        weekly_tokens_used=usage.weekly.used,
-        tier=tier,
-    )
-
-
-@router.post(
-    "/rate_limit/reset",
-    response_model=UserRateLimitResponse,
-    summary="Reset User Rate Limit Usage",
-)
-async def reset_user_rate_limit(
-    user_id: str = Body(embed=True),
-    reset_weekly: bool = Body(False, embed=True),
-    admin_user_id: str = Security(get_user_id),
-) -> UserRateLimitResponse:
-    """Reset a user's daily usage counter (and optionally weekly). Admin-only."""
-    logger.info(
-        "Admin %s resetting rate limit for user %s (reset_weekly=%s)",
-        admin_user_id,
-        user_id,
-        reset_weekly,
-    )
-
-    try:
-        await reset_user_usage(user_id, reset_weekly=reset_weekly)
-    except Exception as e:
-        logger.exception("Failed to reset user usage")
-        raise HTTPException(status_code=500, detail="Failed to reset usage") from e
-
-    daily_limit, weekly_limit, tier = await get_global_rate_limits(
-        user_id, config.daily_token_limit, config.weekly_token_limit
-    )
-    usage = await get_usage_status(user_id, daily_limit, weekly_limit, tier=tier)
-
-    try:
-        resolved_email = await get_user_email_by_id(user_id)
-    except Exception:
-        logger.warning("Failed to resolve email for user %s", user_id, exc_info=True)
-        resolved_email = None
-
-    return UserRateLimitResponse(
-        user_id=user_id,
-        user_email=resolved_email,
-        daily_token_limit=daily_limit,
-        weekly_token_limit=weekly_limit,
-        daily_tokens_used=usage.daily.used,
-        weekly_tokens_used=usage.weekly.used,
-        tier=tier,
-    )
-
-
-@router.get(
-    "/rate_limit/tier",
-    response_model=UserTierResponse,
-    summary="Get User Rate Limit Tier",
-)
-async def get_user_rate_limit_tier(
-    user_id: str,
-    admin_user_id: str = Security(get_user_id),
-) -> UserTierResponse:
-    """Get a user's current rate-limit tier. Admin-only.
-
-    Returns 404 if the user does not exist in the database.
-    """
-    logger.info("Admin %s checking tier for user %s", admin_user_id, user_id)
-
-    resolved_email = await get_user_email_by_id(user_id)
-    if resolved_email is None:
-        raise HTTPException(status_code=404, detail=f"User {user_id} not found")
-
-    tier = await get_user_tier(user_id)
-    return UserTierResponse(user_id=user_id, tier=tier)
-
-
-@router.post(
-    "/rate_limit/tier",
-    response_model=UserTierResponse,
-    summary="Set User Rate Limit Tier",
-)
-async def set_user_rate_limit_tier(
-    request: SetUserTierRequest,
-    admin_user_id: str = Security(get_user_id),
-) -> UserTierResponse:
-    """Set a user's rate-limit tier. Admin-only.
-
-    Returns 404 if the user does not exist in the database.
-    """
-    try:
-        resolved_email = await get_user_email_by_id(request.user_id)
-    except Exception:
-        logger.warning(
-            "Failed to resolve email for user %s",
-            request.user_id,
-            exc_info=True,
-        )
-        resolved_email = None
-
-    if resolved_email is None:
-        raise HTTPException(status_code=404, detail=f"User {request.user_id} not found")
-
-    old_tier = await get_user_tier(request.user_id)
-    logger.info(
-        "Admin %s changing tier for user %s (%s): %s -> %s",
-        admin_user_id,
-        request.user_id,
-        resolved_email,
-        old_tier.value,
-        request.tier.value,
-    )
-    try:
-        await set_user_tier(request.user_id, request.tier)
-    except Exception as e:
-        logger.exception("Failed to set user tier")
-        raise HTTPException(status_code=500, detail="Failed to set tier") from e
-
-    return UserTierResponse(user_id=request.user_id, tier=request.tier)
-
-
-class UserSearchResult(BaseModel):
-    user_id: str
-    user_email: Optional[str] = None
-
-
-@router.get(
-    "/rate_limit/search_users",
-    response_model=list[UserSearchResult],
-    summary="Search Users by Name or Email",
-)
-async def admin_search_users(
-    query: str,
-    limit: int = 20,
-    admin_user_id: str = Security(get_user_id),
-) -> list[UserSearchResult]:
-    """Search users by partial email or name. Admin-only.
-
-    Queries the User table directly — returns results even for users
-    without credit transaction history.
-    """
-    if len(query.strip()) < 3:
-        raise HTTPException(
-            status_code=400,
-            detail="Search query must be at least 3 characters.",
-        )
-    logger.info("Admin %s searching users with query=%r", admin_user_id, query)
-    results = await search_users(query, limit=max(1, min(limit, 50)))
-    return [UserSearchResult(user_id=uid, user_email=email) for uid, email in results]
--- a/autogpt_platform/backend/backend/api/features/admin/rate_limit_admin_routes_test.py
+++ b/autogpt_platform/backend/backend/api/features/admin/rate_limit_admin_routes_test.py
@@ -1,566 +0,0 @@
-import json
-from types import SimpleNamespace
-from unittest.mock import AsyncMock
-
-import fastapi
-import fastapi.testclient
-import pytest
-import pytest_mock
-from autogpt_libs.auth.jwt_utils import get_jwt_payload
-from pytest_snapshot.plugin import Snapshot
-
-from backend.copilot.rate_limit import CoPilotUsageStatus, SubscriptionTier, UsageWindow
-
-from .rate_limit_admin_routes import router as rate_limit_admin_router
-
-app = fastapi.FastAPI()
-app.include_router(rate_limit_admin_router)
-
-client = fastapi.testclient.TestClient(app)
-
-_MOCK_MODULE = "backend.api.features.admin.rate_limit_admin_routes"
-
-_TARGET_EMAIL = "target@example.com"
-
-
-@pytest.fixture(autouse=True)
-def setup_app_admin_auth(mock_jwt_admin):
-    """Setup admin auth overrides for all tests in this module"""
-    app.dependency_overrides[get_jwt_payload] = mock_jwt_admin["get_jwt_payload"]
-    yield
-    app.dependency_overrides.clear()
-
-
-def _mock_usage_status(
-    daily_used: int = 500_000, weekly_used: int = 3_000_000
-) -> CoPilotUsageStatus:
-    from datetime import UTC, datetime, timedelta
-
-    now = datetime.now(UTC)
-    return CoPilotUsageStatus(
-        daily=UsageWindow(
-            used=daily_used, limit=2_500_000, resets_at=now + timedelta(hours=6)
-        ),
-        weekly=UsageWindow(
-            used=weekly_used, limit=12_500_000, resets_at=now + timedelta(days=3)
-        ),
-    )
-
-
-def _patch_rate_limit_deps(
-    mocker: pytest_mock.MockerFixture,
-    target_user_id: str,
-    daily_used: int = 500_000,
-    weekly_used: int = 3_000_000,
-):
-    """Patch the common rate-limit + user-lookup dependencies."""
-    mocker.patch(
-        f"{_MOCK_MODULE}.get_global_rate_limits",
-        new_callable=AsyncMock,
-        return_value=(2_500_000, 12_500_000, SubscriptionTier.FREE),
-    )
-    mocker.patch(
-        f"{_MOCK_MODULE}.get_usage_status",
-        new_callable=AsyncMock,
-        return_value=_mock_usage_status(daily_used=daily_used, weekly_used=weekly_used),
-    )
-    mocker.patch(
-        f"{_MOCK_MODULE}.get_user_email_by_id",
-        new_callable=AsyncMock,
-        return_value=_TARGET_EMAIL,
-    )
-
-
-def test_get_rate_limit(
-    mocker: pytest_mock.MockerFixture,
-    configured_snapshot: Snapshot,
-    target_user_id: str,
-) -> None:
-    """Test getting rate limit and usage for a user."""
-    _patch_rate_limit_deps(mocker, target_user_id)
-
-    response = client.get("/admin/rate_limit", params={"user_id": target_user_id})
-
-    assert response.status_code == 200
-    data = response.json()
-    assert data["user_id"] == target_user_id
-    assert data["user_email"] == _TARGET_EMAIL
-    assert data["daily_token_limit"] == 2_500_000
-    assert data["weekly_token_limit"] == 12_500_000
-    assert data["daily_tokens_used"] == 500_000
-    assert data["weekly_tokens_used"] == 3_000_000
-    assert data["tier"] == "FREE"
-
-    configured_snapshot.assert_match(
-        json.dumps(data, indent=2, sort_keys=True) + "\n",
-        "get_rate_limit",
-    )
-
-
-def test_get_rate_limit_by_email(
-    mocker: pytest_mock.MockerFixture,
-    target_user_id: str,
-) -> None:
-    """Test looking up rate limits via email instead of user_id."""
-    _patch_rate_limit_deps(mocker, target_user_id)
-
-    mock_user = SimpleNamespace(id=target_user_id, email=_TARGET_EMAIL)
-    mocker.patch(
-        f"{_MOCK_MODULE}.get_user_by_email",
-        new_callable=AsyncMock,
-        return_value=mock_user,
-    )
-
-    response = client.get("/admin/rate_limit", params={"email": _TARGET_EMAIL})
-
-    assert response.status_code == 200
-    data = response.json()
-    assert data["user_id"] == target_user_id
-    assert data["user_email"] == _TARGET_EMAIL
-    assert data["daily_token_limit"] == 2_500_000
-
-
-def test_get_rate_limit_by_email_not_found(
-    mocker: pytest_mock.MockerFixture,
-) -> None:
-    """Test that looking up a non-existent email returns 404."""
-    mocker.patch(
-        f"{_MOCK_MODULE}.get_user_by_email",
-        new_callable=AsyncMock,
-        return_value=None,
-    )
-
-    response = client.get("/admin/rate_limit", params={"email": "nobody@example.com"})
-
-    assert response.status_code == 404
-
-
-def test_get_rate_limit_no_params() -> None:
-    """Test that omitting both user_id and email returns 400."""
-    response = client.get("/admin/rate_limit")
-    assert response.status_code == 400
-
-
-def test_reset_user_usage_daily_only(
-    mocker: pytest_mock.MockerFixture,
-    configured_snapshot: Snapshot,
-    target_user_id: str,
-) -> None:
-    """Test resetting only daily usage (default behaviour)."""
-    mock_reset = mocker.patch(
-        f"{_MOCK_MODULE}.reset_user_usage",
-        new_callable=AsyncMock,
-    )
-    _patch_rate_limit_deps(mocker, target_user_id, daily_used=0, weekly_used=3_000_000)
-
-    response = client.post(
-        "/admin/rate_limit/reset",
-        json={"user_id": target_user_id},
-    )
-
-    assert response.status_code == 200
-    data = response.json()
-    assert data["daily_tokens_used"] == 0
-    # Weekly is untouched
-    assert data["weekly_tokens_used"] == 3_000_000
-    assert data["tier"] == "FREE"
-
-    mock_reset.assert_awaited_once_with(target_user_id, reset_weekly=False)
-
-    configured_snapshot.assert_match(
-        json.dumps(data, indent=2, sort_keys=True) + "\n",
-        "reset_user_usage_daily_only",
-    )
-
-
-def test_reset_user_usage_daily_and_weekly(
-    mocker: pytest_mock.MockerFixture,
-    configured_snapshot: Snapshot,
-    target_user_id: str,
-) -> None:
-    """Test resetting both daily and weekly usage."""
-    mock_reset = mocker.patch(
-        f"{_MOCK_MODULE}.reset_user_usage",
-        new_callable=AsyncMock,
-    )
-    _patch_rate_limit_deps(mocker, target_user_id, daily_used=0, weekly_used=0)
-
-    response = client.post(
-        "/admin/rate_limit/reset",
-        json={"user_id": target_user_id, "reset_weekly": True},
-    )
-
-    assert response.status_code == 200
-    data = response.json()
-    assert data["daily_tokens_used"] == 0
-    assert data["weekly_tokens_used"] == 0
-    assert data["tier"] == "FREE"
-
-    mock_reset.assert_awaited_once_with(target_user_id, reset_weekly=True)
-
-    configured_snapshot.assert_match(
-        json.dumps(data, indent=2, sort_keys=True) + "\n",
-        "reset_user_usage_daily_and_weekly",
-    )
-
-
-def test_reset_user_usage_redis_failure(
-    mocker: pytest_mock.MockerFixture,
-    target_user_id: str,
-) -> None:
-    """Test that Redis failure on reset returns 500."""
-    mocker.patch(
-        f"{_MOCK_MODULE}.reset_user_usage",
-        new_callable=AsyncMock,
-        side_effect=Exception("Redis connection refused"),
-    )
-
-    response = client.post(
-        "/admin/rate_limit/reset",
-        json={"user_id": target_user_id},
-    )
-
-    assert response.status_code == 500
-
-
-def test_get_rate_limit_email_lookup_failure(
-    mocker: pytest_mock.MockerFixture,
-    target_user_id: str,
-) -> None:
-    """Test that failing to resolve a user email degrades gracefully."""
-    mocker.patch(
-        f"{_MOCK_MODULE}.get_global_rate_limits",
-        new_callable=AsyncMock,
-        return_value=(2_500_000, 12_500_000, SubscriptionTier.FREE),
-    )
-    mocker.patch(
-        f"{_MOCK_MODULE}.get_usage_status",
-        new_callable=AsyncMock,
-        return_value=_mock_usage_status(),
-    )
-    mocker.patch(
-        f"{_MOCK_MODULE}.get_user_email_by_id",
-        new_callable=AsyncMock,
-        side_effect=Exception("DB connection lost"),
-    )
-
-    response = client.get("/admin/rate_limit", params={"user_id": target_user_id})
-
-    assert response.status_code == 200
-    data = response.json()
-    assert data["user_id"] == target_user_id
-    assert data["user_email"] is None
-
-
-def test_admin_endpoints_require_admin_role(mock_jwt_user) -> None:
-    """Test that rate limit admin endpoints require admin role."""
-    app.dependency_overrides[get_jwt_payload] = mock_jwt_user["get_jwt_payload"]
-
-    response = client.get("/admin/rate_limit", params={"user_id": "test"})
-    assert response.status_code == 403
-
-    response = client.post(
-        "/admin/rate_limit/reset",
-        json={"user_id": "test"},
-    )
-    assert response.status_code == 403
-
-
-# ---------------------------------------------------------------------------
-# Tier management endpoints
-# ---------------------------------------------------------------------------
-
-
-def test_get_user_tier(
-    mocker: pytest_mock.MockerFixture,
-    target_user_id: str,
-) -> None:
-    """Test getting a user's rate-limit tier."""
-    mocker.patch(
-        f"{_MOCK_MODULE}.get_user_email_by_id",
-        new_callable=AsyncMock,
-        return_value=_TARGET_EMAIL,
-    )
-    mocker.patch(
-        f"{_MOCK_MODULE}.get_user_tier",
-        new_callable=AsyncMock,
-        return_value=SubscriptionTier.PRO,
-    )
-
-    response = client.get("/admin/rate_limit/tier", params={"user_id": target_user_id})
-
-    assert response.status_code == 200
-    data = response.json()
-    assert data["user_id"] == target_user_id
-    assert data["tier"] == "PRO"
-
-
-def test_get_user_tier_user_not_found(
-    mocker: pytest_mock.MockerFixture,
-    target_user_id: str,
-) -> None:
-    """Test that getting tier for a non-existent user returns 404."""
-    mocker.patch(
-        f"{_MOCK_MODULE}.get_user_email_by_id",
-        new_callable=AsyncMock,
-        return_value=None,
-    )
-
-    response = client.get("/admin/rate_limit/tier", params={"user_id": target_user_id})
-
-    assert response.status_code == 404
-
-
-def test_set_user_tier(
-    mocker: pytest_mock.MockerFixture,
-    target_user_id: str,
-) -> None:
-    """Test setting a user's rate-limit tier (upgrade)."""
-    mocker.patch(
-        f"{_MOCK_MODULE}.get_user_email_by_id",
-        new_callable=AsyncMock,
-        return_value=_TARGET_EMAIL,
-    )
-    mocker.patch(
-        f"{_MOCK_MODULE}.get_user_tier",
-        new_callable=AsyncMock,
-        return_value=SubscriptionTier.FREE,
-    )
-    mock_set = mocker.patch(
-        f"{_MOCK_MODULE}.set_user_tier",
-        new_callable=AsyncMock,
-    )
-
-    response = client.post(
-        "/admin/rate_limit/tier",
-        json={"user_id": target_user_id, "tier": "ENTERPRISE"},
-    )
-
-    assert response.status_code == 200
-    data = response.json()
-    assert data["user_id"] == target_user_id
-    assert data["tier"] == "ENTERPRISE"
-    mock_set.assert_awaited_once_with(target_user_id, SubscriptionTier.ENTERPRISE)
-
-
-def test_set_user_tier_downgrade(
-    mocker: pytest_mock.MockerFixture,
-    target_user_id: str,
-) -> None:
-    """Test downgrading a user's tier from PRO to FREE."""
-    mocker.patch(
-        f"{_MOCK_MODULE}.get_user_email_by_id",
-        new_callable=AsyncMock,
-        return_value=_TARGET_EMAIL,
-    )
-    mocker.patch(
-        f"{_MOCK_MODULE}.get_user_tier",
-        new_callable=AsyncMock,
-        return_value=SubscriptionTier.PRO,
-    )
-    mock_set = mocker.patch(
-        f"{_MOCK_MODULE}.set_user_tier",
-        new_callable=AsyncMock,
-    )
-
-    response = client.post(
-        "/admin/rate_limit/tier",
-        json={"user_id": target_user_id, "tier": "FREE"},
-    )
-
-    assert response.status_code == 200
-    data = response.json()
-    assert data["user_id"] == target_user_id
-    assert data["tier"] == "FREE"
-    mock_set.assert_awaited_once_with(target_user_id, SubscriptionTier.FREE)
-
-
-def test_set_user_tier_invalid_tier(
-    target_user_id: str,
-) -> None:
-    """Test that setting an invalid tier returns 422."""
-    response = client.post(
-        "/admin/rate_limit/tier",
-        json={"user_id": target_user_id, "tier": "invalid"},
-    )
-
-    assert response.status_code == 422
-
-
-def test_set_user_tier_invalid_tier_uppercase(
-    target_user_id: str,
-) -> None:
-    """Test that setting an unrecognised uppercase tier (e.g. 'INVALID') returns 422.
-
-    Regression: ensures Pydantic enum validation rejects values that are not
-    members of SubscriptionTier, even when they look like valid enum names.
-    """
-    response = client.post(
-        "/admin/rate_limit/tier",
-        json={"user_id": target_user_id, "tier": "INVALID"},
-    )
-
-    assert response.status_code == 422
-    body = response.json()
-    assert "detail" in body
-
-
-def test_set_user_tier_email_lookup_failure_returns_404(
-    mocker: pytest_mock.MockerFixture,
-    target_user_id: str,
-) -> None:
-    """Test that email lookup failure returns 404 (user unverifiable)."""
-    mocker.patch(
-        f"{_MOCK_MODULE}.get_user_email_by_id",
-        new_callable=AsyncMock,
-        side_effect=Exception("DB connection failed"),
-    )
-
-    response = client.post(
-        "/admin/rate_limit/tier",
-        json={"user_id": target_user_id, "tier": "PRO"},
-    )
-
-    assert response.status_code == 404
-
-
-def test_set_user_tier_user_not_found(
-    mocker: pytest_mock.MockerFixture,
-    target_user_id: str,
-) -> None:
-    """Test that setting tier for a non-existent user returns 404."""
-    mocker.patch(
-        f"{_MOCK_MODULE}.get_user_email_by_id",
-        new_callable=AsyncMock,
-        return_value=None,
-    )
-
-    response = client.post(
-        "/admin/rate_limit/tier",
-        json={"user_id": target_user_id, "tier": "PRO"},
-    )
-
-    assert response.status_code == 404
-
-
-def test_set_user_tier_db_failure(
-    mocker: pytest_mock.MockerFixture,
-    target_user_id: str,
-) -> None:
-    """Test that DB failure on set tier returns 500."""
-    mocker.patch(
-        f"{_MOCK_MODULE}.get_user_email_by_id",
-        new_callable=AsyncMock,
-        return_value=_TARGET_EMAIL,
-    )
-    mocker.patch(
-        f"{_MOCK_MODULE}.get_user_tier",
-        new_callable=AsyncMock,
-        return_value=SubscriptionTier.FREE,
-    )
-    mocker.patch(
-        f"{_MOCK_MODULE}.set_user_tier",
-        new_callable=AsyncMock,
-        side_effect=Exception("DB connection refused"),
-    )
-
-    response = client.post(
-        "/admin/rate_limit/tier",
-        json={"user_id": target_user_id, "tier": "PRO"},
-    )
-
-    assert response.status_code == 500
-
-
-def test_tier_endpoints_require_admin_role(mock_jwt_user) -> None:
-    """Test that tier admin endpoints require admin role."""
-    app.dependency_overrides[get_jwt_payload] = mock_jwt_user["get_jwt_payload"]
-
-    response = client.get("/admin/rate_limit/tier", params={"user_id": "test"})
-    assert response.status_code == 403
-
-    response = client.post(
-        "/admin/rate_limit/tier",
-        json={"user_id": "test", "tier": "PRO"},
-    )
-    assert response.status_code == 403
-
-
-# ─── search_users endpoint ──────────────────────────────────────────
-
-
-def test_search_users_returns_matching_users(
-    mocker: pytest_mock.MockerFixture,
-    admin_user_id: str,
-) -> None:
-    """Partial search should return all matching users from the User table."""
-    mocker.patch(
-        _MOCK_MODULE + ".search_users",
-        new_callable=AsyncMock,
-        return_value=[
-            ("user-1", "zamil.majdy@gmail.com"),
-            ("user-2", "zamil.majdy@agpt.co"),
-        ],
-    )
-
-    response = client.get("/admin/rate_limit/search_users", params={"query": "zamil"})
-
-    assert response.status_code == 200
-    results = response.json()
-    assert len(results) == 2
-    assert results[0]["user_email"] == "zamil.majdy@gmail.com"
-    assert results[1]["user_email"] == "zamil.majdy@agpt.co"
-
-
-def test_search_users_empty_results(
-    mocker: pytest_mock.MockerFixture,
-    admin_user_id: str,
-) -> None:
-    """Search with no matches returns empty list."""
-    mocker.patch(
-        _MOCK_MODULE + ".search_users",
-        new_callable=AsyncMock,
-        return_value=[],
-    )
-
-    response = client.get(
-        "/admin/rate_limit/search_users", params={"query": "nonexistent"}
-    )
-
-    assert response.status_code == 200
-    assert response.json() == []
-
-
-def test_search_users_short_query_rejected(
-    admin_user_id: str,
-) -> None:
-    """Query shorter than 3 characters should return 400."""
-    response = client.get("/admin/rate_limit/search_users", params={"query": "ab"})
-    assert response.status_code == 400
-
-
-def test_search_users_negative_limit_clamped(
-    mocker: pytest_mock.MockerFixture,
-    admin_user_id: str,
-) -> None:
-    """Negative limit should be clamped to 1, not passed through."""
-    mock_search = mocker.patch(
-        _MOCK_MODULE + ".search_users",
-        new_callable=AsyncMock,
-        return_value=[],
-    )
-
-    response = client.get(
-        "/admin/rate_limit/search_users", params={"query": "test", "limit": -1}
-    )
-
-    assert response.status_code == 200
-    mock_search.assert_awaited_once_with("test", limit=1)
-
-
-def test_search_users_requires_admin_role(mock_jwt_user) -> None:
-    """Test that the search_users endpoint requires admin role."""
-    app.dependency_overrides[get_jwt_payload] = mock_jwt_user["get_jwt_payload"]
-
-    response = client.get("/admin/rate_limit/search_users", params={"query": "test"})
-    assert response.status_code == 403
--- a/autogpt_platform/backend/backend/api/features/admin/store_admin_routes.py
+++ b/autogpt_platform/backend/backend/api/features/admin/store_admin_routes.py
@@ -7,8 +7,6 @@ import fastapi
 import fastapi.responses
 import prisma.enums

-import backend.api.features.library.db as library_db
-import backend.api.features.library.model as library_model
 import backend.api.features.store.cache as store_cache
 import backend.api.features.store.db as store_db
 import backend.api.features.store.model as store_model
@@ -26,13 +24,14 @@ router = fastapi.APIRouter(
 @router.get(
    "/listings",
    summary="Get Admin Listings History",
+    response_model=store_model.StoreListingsWithVersionsResponse,
 )
 async def get_admin_listings_with_versions(
    status: typing.Optional[prisma.enums.SubmissionStatus] = None,
    search: typing.Optional[str] = None,
    page: int = 1,
    page_size: int = 20,
-) -> store_model.StoreListingsWithVersionsAdminViewResponse:
+):
    """
    Get store listings with their version history for admins.

@@ -46,26 +45,36 @@ async def get_admin_listings_with_versions(
        page_size: Number of items per page

    Returns:
-        Paginated listings with their versions
+        StoreListingsWithVersionsResponse with listings and their versions
    """
-    listings = await store_db.get_admin_listings_with_versions(
-        status=status,
-        search_query=search,
-        page=page,
-        page_size=page_size,
-    )
-    return listings
+    try:
+        listings = await store_db.get_admin_listings_with_versions(
+            status=status,
+            search_query=search,
+            page=page,
+            page_size=page_size,
+        )
+        return listings
+    except Exception as e:
+        logger.exception("Error getting admin listings with versions: %s", e)
+        return fastapi.responses.JSONResponse(
+            status_code=500,
+            content={
+                "detail": "An error occurred while retrieving listings with versions"
+            },
+        )


 @router.post(
    "/submissions/{store_listing_version_id}/review",
    summary="Review Store Submission",
+    response_model=store_model.StoreSubmission,
 )
 async def review_submission(
    store_listing_version_id: str,
    request: store_model.ReviewSubmissionRequest,
    user_id: str = fastapi.Security(autogpt_libs.auth.get_user_id),
-) -> store_model.StoreSubmissionAdminView:
+):
    """
    Review a store listing submission.

@@ -75,24 +84,31 @@ async def review_submission(
        user_id: Authenticated admin user performing the review

    Returns:
-        StoreSubmissionAdminView with updated review information
+        StoreSubmission with updated review information
    """
-    already_approved = await store_db.check_submission_already_approved(
-        store_listing_version_id=store_listing_version_id,
-    )
-    submission = await store_db.review_store_submission(
-        store_listing_version_id=store_listing_version_id,
-        is_approved=request.is_approved,
-        external_comments=request.comments,
-        internal_comments=request.internal_comments or "",
-        reviewer_id=user_id,
-    )
+    try:
+        already_approved = await store_db.check_submission_already_approved(
+            store_listing_version_id=store_listing_version_id,
+        )
+        submission = await store_db.review_store_submission(
+            store_listing_version_id=store_listing_version_id,
+            is_approved=request.is_approved,
+            external_comments=request.comments,
+            internal_comments=request.internal_comments or "",
+            reviewer_id=user_id,
+        )

-    state_changed = already_approved != request.is_approved
-    # Clear caches whenever approval state changes, since store visibility can change
-    if state_changed:
-        store_cache.clear_all_caches()
-    return submission
+        state_changed = already_approved != request.is_approved
+        # Clear caches whenever approval state changes, since store visibility can change
+        if state_changed:
+            store_cache.clear_all_caches()
+        return submission
+    except Exception as e:
+        logger.exception("Error reviewing submission: %s", e)
+        return fastapi.responses.JSONResponse(
+            status_code=500,
+            content={"detail": "An error occurred while reviewing the submission"},
+        )


 @router.get(
@@ -134,40 +150,3 @@ async def admin_download_agent_file(
        return fastapi.responses.FileResponse(
            tmp_file.name, filename=file_name, media_type="application/json"
        )
-
-
-@router.get(
-    "/submissions/{store_listing_version_id}/preview",
-    summary="Admin Preview Submission Listing",
-)
-async def admin_preview_submission(
-    store_listing_version_id: str,
-) -> store_model.StoreAgentDetails:
-    """
-    Preview a marketplace submission as it would appear on the listing page.
-    Bypasses the APPROVED-only StoreAgent view so admins can preview pending
-    submissions before approving.
-    """
-    return await store_db.get_store_agent_details_as_admin(store_listing_version_id)
-
-
-@router.post(
-    "/submissions/{store_listing_version_id}/add-to-library",
-    summary="Admin Add Pending Agent to Library",
-    status_code=201,
-)
-async def admin_add_agent_to_library(
-    store_listing_version_id: str,
-    user_id: str = fastapi.Security(autogpt_libs.auth.get_user_id),
-) -> library_model.LibraryAgent:
-    """
-    Add a pending marketplace agent to the admin's library for review.
-    Uses admin-level access to bypass marketplace APPROVED-only checks.
-
-    The builder can load the graph because get_graph() checks library
-    membership as a fallback: "you added it, you keep it."
-    """
-    return await library_db.add_store_agent_to_library_as_admin(
-        store_listing_version_id=store_listing_version_id,
-        user_id=user_id,
-    )
--- a/autogpt_platform/backend/backend/api/features/admin/store_admin_routes_test.py
+++ b/autogpt_platform/backend/backend/api/features/admin/store_admin_routes_test.py
@@ -1,335 +0,0 @@
-"""Tests for admin store routes and the bypass logic they depend on.
-
-Tests are organized by what they protect:
- SECRT-2162: get_graph_as_admin bypasses ownership/marketplace checks
- SECRT-2167 security: admin endpoints reject non-admin users
- SECRT-2167 bypass: preview queries StoreListingVersion (not StoreAgent view),
-  and add-to-library uses get_graph_as_admin (not get_graph)
-"""
-
-from datetime import datetime, timezone
-from unittest.mock import AsyncMock, MagicMock, patch
-
-import fastapi
-import fastapi.responses
-import fastapi.testclient
-import pytest
-import pytest_mock
-from autogpt_libs.auth.jwt_utils import get_jwt_payload
-
-from backend.data.graph import get_graph_as_admin
-from backend.util.exceptions import NotFoundError
-
-from .store_admin_routes import router as store_admin_router
-
-# Shared constants
-ADMIN_USER_ID = "admin-user-id"
-CREATOR_USER_ID = "other-creator-id"
-GRAPH_ID = "test-graph-id"
-GRAPH_VERSION = 3
-SLV_ID = "test-store-listing-version-id"
-
-
-def _make_mock_graph(user_id: str = CREATOR_USER_ID) -> MagicMock:
-    graph = MagicMock()
-    graph.userId = user_id
-    graph.id = GRAPH_ID
-    graph.version = GRAPH_VERSION
-    graph.Nodes = []
-    return graph
-
-
-# ---- SECRT-2162: get_graph_as_admin bypasses ownership checks ---- #
-
-
-@pytest.mark.asyncio
-async def test_admin_can_access_pending_agent_not_owned() -> None:
-    """get_graph_as_admin must return a graph even when the admin doesn't own
-    it and it's not APPROVED in the marketplace."""
-    mock_graph = _make_mock_graph()
-    mock_graph_model = MagicMock(name="GraphModel")
-
-    with (
-        patch("backend.data.graph.AgentGraph.prisma") as mock_prisma,
-        patch(
-            "backend.data.graph.GraphModel.from_db",
-            return_value=mock_graph_model,
-        ),
-    ):
-        mock_prisma.return_value.find_first = AsyncMock(return_value=mock_graph)
-
-        result = await get_graph_as_admin(
-            graph_id=GRAPH_ID,
-            version=GRAPH_VERSION,
-            user_id=ADMIN_USER_ID,
-            for_export=False,
-        )
-
-    assert result is mock_graph_model
-
-
-@pytest.mark.asyncio
-async def test_admin_download_pending_agent_with_subagents() -> None:
-    """get_graph_as_admin with for_export=True must call get_sub_graphs
-    and pass sub_graphs to GraphModel.from_db."""
-    mock_graph = _make_mock_graph()
-    mock_sub_graph = MagicMock(name="SubGraph")
-    mock_graph_model = MagicMock(name="GraphModel")
-
-    with (
-        patch("backend.data.graph.AgentGraph.prisma") as mock_prisma,
-        patch(
-            "backend.data.graph.get_sub_graphs",
-            new_callable=AsyncMock,
-            return_value=[mock_sub_graph],
-        ) as mock_get_sub,
-        patch(
-            "backend.data.graph.GraphModel.from_db",
-            return_value=mock_graph_model,
-        ) as mock_from_db,
-    ):
-        mock_prisma.return_value.find_first = AsyncMock(return_value=mock_graph)
-
-        result = await get_graph_as_admin(
-            graph_id=GRAPH_ID,
-            version=GRAPH_VERSION,
-            user_id=ADMIN_USER_ID,
-            for_export=True,
-        )
-
-    assert result is mock_graph_model
-    mock_get_sub.assert_awaited_once_with(mock_graph)
-    mock_from_db.assert_called_once_with(
-        graph=mock_graph,
-        sub_graphs=[mock_sub_graph],
-        for_export=True,
-    )
-
-
-# ---- SECRT-2167 security: admin endpoints reject non-admin users ---- #
-
-app = fastapi.FastAPI()
-app.include_router(store_admin_router)
-
-
-@app.exception_handler(NotFoundError)
-async def _not_found_handler(
-    request: fastapi.Request, exc: NotFoundError
-) -> fastapi.responses.JSONResponse:
-    return fastapi.responses.JSONResponse(status_code=404, content={"detail": str(exc)})
-
-
-client = fastapi.testclient.TestClient(app)
-
-
-@pytest.fixture(autouse=True)
-def setup_app_admin_auth(mock_jwt_admin):
-    """Setup admin auth overrides for all route tests in this module."""
-    app.dependency_overrides[get_jwt_payload] = mock_jwt_admin["get_jwt_payload"]
-    yield
-    app.dependency_overrides.clear()
-
-
-def test_preview_requires_admin(mock_jwt_user) -> None:
-    """Non-admin users must get 403 on the preview endpoint."""
-    app.dependency_overrides[get_jwt_payload] = mock_jwt_user["get_jwt_payload"]
-    response = client.get(f"/admin/submissions/{SLV_ID}/preview")
-    assert response.status_code == 403
-
-
-def test_add_to_library_requires_admin(mock_jwt_user) -> None:
-    """Non-admin users must get 403 on the add-to-library endpoint."""
-    app.dependency_overrides[get_jwt_payload] = mock_jwt_user["get_jwt_payload"]
-    response = client.post(f"/admin/submissions/{SLV_ID}/add-to-library")
-    assert response.status_code == 403
-
-
-def test_preview_nonexistent_submission(
-    mocker: pytest_mock.MockerFixture,
-) -> None:
-    """Preview of a nonexistent submission returns 404."""
-    mocker.patch(
-        "backend.api.features.admin.store_admin_routes.store_db"
-        ".get_store_agent_details_as_admin",
-        side_effect=NotFoundError("not found"),
-    )
-    response = client.get(f"/admin/submissions/{SLV_ID}/preview")
-    assert response.status_code == 404
-
-
-# ---- SECRT-2167 bypass: verify the right data sources are used ---- #
-
-
-@pytest.mark.asyncio
-async def test_preview_queries_store_listing_version_not_store_agent() -> None:
-    """get_store_agent_details_as_admin must query StoreListingVersion
-    directly (not the APPROVED-only StoreAgent view). This is THE test that
-    prevents the bypass from being accidentally reverted."""
-    from backend.api.features.store.db import get_store_agent_details_as_admin
-
-    mock_slv = MagicMock()
-    mock_slv.id = SLV_ID
-    mock_slv.name = "Test Agent"
-    mock_slv.subHeading = "Short desc"
-    mock_slv.description = "Long desc"
-    mock_slv.videoUrl = None
-    mock_slv.agentOutputDemoUrl = None
-    mock_slv.imageUrls = ["https://example.com/img.png"]
-    mock_slv.instructions = None
-    mock_slv.categories = ["productivity"]
-    mock_slv.version = 1
-    mock_slv.agentGraphId = GRAPH_ID
-    mock_slv.agentGraphVersion = GRAPH_VERSION
-    mock_slv.updatedAt = datetime(2026, 3, 24, tzinfo=timezone.utc)
-    mock_slv.recommendedScheduleCron = "0 9 * * *"
-
-    mock_listing = MagicMock()
-    mock_listing.id = "listing-id"
-    mock_listing.slug = "test-agent"
-    mock_listing.activeVersionId = SLV_ID
-    mock_listing.hasApprovedVersion = False
-    mock_listing.CreatorProfile = MagicMock(username="creator", avatarUrl="")
-    mock_slv.StoreListing = mock_listing
-
-    with (
-        patch(
-            "backend.api.features.store.db.prisma.models" ".StoreListingVersion.prisma",
-        ) as mock_slv_prisma,
-        patch(
-            "backend.api.features.store.db.prisma.models.StoreAgent.prisma",
-        ) as mock_store_agent_prisma,
-    ):
-        mock_slv_prisma.return_value.find_unique = AsyncMock(return_value=mock_slv)
-
-        result = await get_store_agent_details_as_admin(SLV_ID)
-
-    # Verify it queried StoreListingVersion (not the APPROVED-only StoreAgent)
-    mock_slv_prisma.return_value.find_unique.assert_awaited_once()
-    await_args = mock_slv_prisma.return_value.find_unique.await_args
-    assert await_args is not None
-    assert await_args.kwargs["where"] == {"id": SLV_ID}
-
-    # Verify the APPROVED-only StoreAgent view was NOT touched
-    mock_store_agent_prisma.assert_not_called()
-
-    # Verify the result has the right data
-    assert result.agent_name == "Test Agent"
-    assert result.agent_image == ["https://example.com/img.png"]
-    assert result.has_approved_version is False
-    assert result.runs == 0
-    assert result.rating == 0.0
-
-
-@pytest.mark.asyncio
-async def test_resolve_graph_admin_uses_get_graph_as_admin() -> None:
-    """resolve_graph_for_library(admin=True) must call get_graph_as_admin,
-    not get_graph. This is THE test that prevents the add-to-library bypass
-    from being accidentally reverted."""
-    from backend.api.features.library._add_to_library import resolve_graph_for_library
-
-    mock_slv = MagicMock()
-    mock_slv.AgentGraph = MagicMock(id=GRAPH_ID, version=GRAPH_VERSION)
-    mock_graph_model = MagicMock(name="GraphModel")
-
-    with (
-        patch(
-            "backend.api.features.library._add_to_library.prisma.models"
-            ".StoreListingVersion.prisma",
-        ) as mock_prisma,
-        patch(
-            "backend.api.features.library._add_to_library.graph_db"
-            ".get_graph_as_admin",
-            new_callable=AsyncMock,
-            return_value=mock_graph_model,
-        ) as mock_admin,
-        patch(
-            "backend.api.features.library._add_to_library.graph_db.get_graph",
-            new_callable=AsyncMock,
-        ) as mock_regular,
-    ):
-        mock_prisma.return_value.find_unique = AsyncMock(return_value=mock_slv)
-
-        result = await resolve_graph_for_library(SLV_ID, ADMIN_USER_ID, admin=True)
-
-    assert result is mock_graph_model
-    mock_admin.assert_awaited_once_with(
-        graph_id=GRAPH_ID, version=GRAPH_VERSION, user_id=ADMIN_USER_ID
-    )
-    mock_regular.assert_not_awaited()
-
-
-@pytest.mark.asyncio
-async def test_resolve_graph_regular_uses_get_graph() -> None:
-    """resolve_graph_for_library(admin=False) must call get_graph,
-    not get_graph_as_admin. Ensures the non-admin path is preserved."""
-    from backend.api.features.library._add_to_library import resolve_graph_for_library
-
-    mock_slv = MagicMock()
-    mock_slv.AgentGraph = MagicMock(id=GRAPH_ID, version=GRAPH_VERSION)
-    mock_graph_model = MagicMock(name="GraphModel")
-
-    with (
-        patch(
-            "backend.api.features.library._add_to_library.prisma.models"
-            ".StoreListingVersion.prisma",
-        ) as mock_prisma,
-        patch(
-            "backend.api.features.library._add_to_library.graph_db"
-            ".get_graph_as_admin",
-            new_callable=AsyncMock,
-        ) as mock_admin,
-        patch(
-            "backend.api.features.library._add_to_library.graph_db.get_graph",
-            new_callable=AsyncMock,
-            return_value=mock_graph_model,
-        ) as mock_regular,
-    ):
-        mock_prisma.return_value.find_unique = AsyncMock(return_value=mock_slv)
-
-        result = await resolve_graph_for_library(SLV_ID, "regular-user-id", admin=False)
-
-    assert result is mock_graph_model
-    mock_regular.assert_awaited_once_with(
-        graph_id=GRAPH_ID, version=GRAPH_VERSION, user_id="regular-user-id"
-    )
-    mock_admin.assert_not_awaited()
-
-
-# ---- Library membership grants graph access (product decision) ---- #
-
-
-@pytest.mark.asyncio
-async def test_library_member_can_view_pending_agent_in_builder() -> None:
-    """After adding a pending agent to their library, the user should be
-    able to load the graph in the builder via get_graph()."""
-    mock_graph = _make_mock_graph()
-    mock_graph_model = MagicMock(name="GraphModel")
-    mock_library_agent = MagicMock()
-    mock_library_agent.AgentGraph = mock_graph
-
-    with (
-        patch("backend.data.graph.AgentGraph.prisma") as mock_ag_prisma,
-        patch(
-            "backend.data.graph.StoreListingVersion.prisma",
-        ) as mock_slv_prisma,
-        patch("backend.data.graph.LibraryAgent.prisma") as mock_lib_prisma,
-        patch(
-            "backend.data.graph.GraphModel.from_db",
-            return_value=mock_graph_model,
-        ),
-    ):
-        mock_ag_prisma.return_value.find_first = AsyncMock(return_value=None)
-        mock_slv_prisma.return_value.find_first = AsyncMock(return_value=None)
-        mock_lib_prisma.return_value.find_first = AsyncMock(
-            return_value=mock_library_agent
-        )
-
-        from backend.data.graph import get_graph
-
-        result = await get_graph(
-            graph_id=GRAPH_ID,
-            version=GRAPH_VERSION,
-            user_id=ADMIN_USER_ID,
-        )
-
-    assert result is mock_graph_model, "Library membership should grant graph access"
--- a/autogpt_platform/backend/backend/api/features/builder/db.py
+++ b/autogpt_platform/backend/backend/api/features/builder/db.py
@@ -1,33 +1,28 @@
 import logging
 from dataclasses import dataclass
+from datetime import datetime, timedelta, timezone
 from difflib import SequenceMatcher
-from typing import Any, Sequence, get_args, get_origin
+from typing import Sequence

 import prisma
-from prisma.models import mv_suggested_blocks

 import backend.api.features.library.db as library_db
 import backend.api.features.library.model as library_model
 import backend.api.features.store.db as store_db
 import backend.api.features.store.model as store_model
+import backend.data.block
 from backend.blocks import load_all_blocks
-from backend.blocks._base import (
-    AnyBlockSchema,
-    BlockCategory,
-    BlockInfo,
-    BlockSchema,
-    BlockType,
-)
 from backend.blocks.llm import LlmModel
+from backend.data.block import AnyBlockSchema, BlockCategory, BlockInfo, BlockSchema
+from backend.data.db import query_raw_with_schema
 from backend.integrations.providers import ProviderName
 from backend.util.cache import cached
 from backend.util.models import Pagination
-from backend.util.text import split_camelcase

 from .model import (
    BlockCategoryResponse,
    BlockResponse,
-    BlockTypeFilter,
+    BlockType,
    CountResponse,
    FilterType,
    Provider,
@@ -42,16 +37,6 @@ MAX_LIBRARY_AGENT_RESULTS = 100
 MAX_MARKETPLACE_AGENT_RESULTS = 100
 MIN_SCORE_FOR_FILTERED_RESULTS = 10.0

-# Boost blocks over marketplace agents in search results
-BLOCK_SCORE_BOOST = 50.0
-
-# Block IDs to exclude from search results
-EXCLUDED_BLOCK_IDS = frozenset(
-    {
-        "e189baac-8c20-45a1-94a7-55177ea42565",  # AgentExecutorBlock
-    }
-)
-
 SearchResultItem = BlockInfo | library_model.LibraryAgent | store_model.StoreAgent


@@ -74,8 +59,8 @@ def get_block_categories(category_blocks: int = 3) -> list[BlockCategoryResponse

    for block_type in load_all_blocks().values():
        block: AnyBlockSchema = block_type()
-        # Skip disabled and excluded blocks
-        if block.disabled or block.id in EXCLUDED_BLOCK_IDS:
+        # Skip disabled blocks
+        if block.disabled:
            continue
        # Skip blocks that don't have categories (all should have at least one)
        if not block.categories:
@@ -103,7 +88,7 @@ def get_block_categories(category_blocks: int = 3) -> list[BlockCategoryResponse
 def get_blocks(
    *,
    category: str | None = None,
-    type: BlockTypeFilter | None = None,
+    type: BlockType | None = None,
    provider: ProviderName | None = None,
    page: int = 1,
    page_size: int = 50,
@@ -126,9 +111,6 @@ def get_blocks(
        # Skip disabled blocks
        if block.disabled:
            continue
-        # Skip excluded blocks
-        if block.id in EXCLUDED_BLOCK_IDS:
-            continue
        # Skip blocks that don't match the category
        if category and category not in {c.name.lower() for c in block.categories}:
            continue
@@ -268,25 +250,14 @@ async def _build_cached_search_results(
        "my_agents": 0,
    }

-    # Use hybrid search when query is present, otherwise list all blocks
-    if (include_blocks or include_integrations) and normalized_query:
-        block_results, block_total, integration_total = await _text_search_blocks(
-            query=search_query,
-            include_blocks=include_blocks,
-            include_integrations=include_integrations,
-        )
-        scored_items.extend(block_results)
-        total_items["blocks"] = block_total
-        total_items["integrations"] = integration_total
-    elif include_blocks or include_integrations:
-        # No query - list all blocks using in-memory approach
-        block_results, block_total, integration_total = _collect_block_results(
-            include_blocks=include_blocks,
-            include_integrations=include_integrations,
-        )
-        scored_items.extend(block_results)
-        total_items["blocks"] = block_total
-        total_items["integrations"] = integration_total
+    block_results, block_total, integration_total = _collect_block_results(
+        normalized_query=normalized_query,
+        include_blocks=include_blocks,
+        include_integrations=include_integrations,
+    )
+    scored_items.extend(block_results)
+    total_items["blocks"] = block_total
+    total_items["integrations"] = integration_total

    if include_library_agents:
        library_response = await library_db.list_library_agents(
@@ -331,14 +302,10 @@ async def _build_cached_search_results(

 def _collect_block_results(
    *,
+    normalized_query: str,
    include_blocks: bool,
    include_integrations: bool,
 ) -> tuple[list[_ScoredItem], int, int]:
-    """
-    Collect all blocks for listing (no search query).
-
-    All blocks get BLOCK_SCORE_BOOST to prioritize them over marketplace agents.
-    """
    results: list[_ScoredItem] = []
    block_count = 0
    integration_count = 0
@@ -351,10 +318,6 @@ def _collect_block_results(
        if block.disabled:
            continue

-        # Skip excluded blocks
-        if block.id in EXCLUDED_BLOCK_IDS:
-            continue
-
        block_info = block.get_info()
        credentials = list(block.input_schema.get_credentials_fields().values())
        is_integration = len(credentials) > 0
@@ -364,6 +327,10 @@ def _collect_block_results(
        if not is_integration and not include_blocks:
            continue

+        score = _score_block(block, block_info, normalized_query)
+        if not _should_include_item(score, normalized_query):
+            continue
+
        filter_type: FilterType = "integrations" if is_integration else "blocks"
        if is_integration:
            integration_count += 1
@@ -374,86 +341,14 @@ def _collect_block_results(
            _ScoredItem(
                item=block_info,
                filter_type=filter_type,
-                score=BLOCK_SCORE_BOOST,
-                sort_key=block_info.name.lower(),
+                score=score,
+                sort_key=_get_item_name(block_info),
            )
        )

    return results, block_count, integration_count


-async def _text_search_blocks(
-    *,
-    query: str,
-    include_blocks: bool,
-    include_integrations: bool,
-) -> tuple[list[_ScoredItem], int, int]:
-    """
-    Search blocks using in-memory text matching over the block registry.
-
-    All blocks are already loaded in memory, so this is fast and reliable
-    regardless of whether OpenAI embeddings are available.
-
-    Scoring:
-        - Base: text relevance via _score_primary_fields, plus BLOCK_SCORE_BOOST
-          to prioritize blocks over marketplace agents in combined results
-        - +20 if the block has an LlmModel field and the query matches an LLM model name
-    """
-    results: list[_ScoredItem] = []
-
-    if not include_blocks and not include_integrations:
-        return results, 0, 0
-
-    normalized_query = query.strip().lower()
-
-    all_results, _, _ = _collect_block_results(
-        include_blocks=include_blocks,
-        include_integrations=include_integrations,
-    )
-
-    all_blocks = load_all_blocks()
-
-    for item in all_results:
-        block_info = item.item
-        assert isinstance(block_info, BlockInfo)
-        name = split_camelcase(block_info.name).lower()
-
-        # Build rich description including input field descriptions,
-        # matching the searchable text that the embedding pipeline uses
-        desc_parts = [block_info.description or ""]
-        block_cls = all_blocks.get(block_info.id)
-        if block_cls is not None:
-            block: AnyBlockSchema = block_cls()
-            desc_parts += [
-                f"{f}: {info.description}"
-                for f, info in block.input_schema.model_fields.items()
-                if info.description
-            ]
-        description = " ".join(desc_parts).lower()
-
-        score = _score_primary_fields(name, description, normalized_query)
-
-        # Add LLM model match bonus
-        if block_cls is not None and _matches_llm_model(
-            block_cls().input_schema, normalized_query
-        ):
-            score += 20
-
-        if score >= MIN_SCORE_FOR_FILTERED_RESULTS:
-            results.append(
-                _ScoredItem(
-                    item=block_info,
-                    filter_type=item.filter_type,
-                    score=score + BLOCK_SCORE_BOOST,
-                    sort_key=name,
-                )
-            )
-
-    block_count = sum(1 for r in results if r.filter_type == "blocks")
-    integration_count = sum(1 for r in results if r.filter_type == "integrations")
-    return results, block_count, integration_count
-
-
 def _build_library_items(
    *,
    agents: list[library_model.LibraryAgent],
@@ -572,8 +467,6 @@ async def _get_static_counts():
        block: AnyBlockSchema = block_type()
        if block.disabled:
            continue
-        if block.id in EXCLUDED_BLOCK_IDS:
-            continue

        all_blocks += 1

@@ -600,25 +493,47 @@ async def _get_static_counts():
    }


-def _contains_type(annotation: Any, target: type) -> bool:
-    """Check if an annotation is or contains the target type (handles Optional/Union/Annotated)."""
-    if annotation is target:
-        return True
-    origin = get_origin(annotation)
-    if origin is None:
-        return False
-    return any(_contains_type(arg, target) for arg in get_args(annotation))
-
-
 def _matches_llm_model(schema_cls: type[BlockSchema], query: str) -> bool:
    for field in schema_cls.model_fields.values():
-        if _contains_type(field.annotation, LlmModel):
+        if field.annotation == LlmModel:
            # Check if query matches any value in llm_models
            if any(query in name for name in llm_models):
                return True
    return False


+def _score_block(
+    block: AnyBlockSchema,
+    block_info: BlockInfo,
+    normalized_query: str,
+) -> float:
+    if not normalized_query:
+        return 0.0
+
+    name = block_info.name.lower()
+    description = block_info.description.lower()
+    score = _score_primary_fields(name, description, normalized_query)
+
+    category_text = " ".join(
+        category.get("category", "").lower() for category in block_info.categories
+    )
+    score += _score_additional_field(category_text, normalized_query, 12, 6)
+
+    credentials_info = block.input_schema.get_credentials_fields_info().values()
+    provider_names = [
+        provider.value.lower()
+        for info in credentials_info
+        for provider in info.provider
+    ]
+    provider_text = " ".join(provider_names)
+    score += _score_additional_field(provider_text, normalized_query, 15, 6)
+
+    if _matches_llm_model(block.input_schema, normalized_query):
+        score += 20
+
+    return score
+
+
 def _score_library_agent(
    agent: library_model.LibraryAgent,
    normalized_query: str,
@@ -725,32 +640,45 @@ def _get_all_providers() -> dict[ProviderName, Provider]:
    return providers


-@cached(ttl_seconds=3600, shared_cache=True)
+@cached(ttl_seconds=3600)
 async def get_suggested_blocks(count: int = 5) -> list[BlockInfo]:
-    """Return the most-executed blocks from the last 14 days.
+    suggested_blocks = []
+    # Sum the number of executions for each block type
+    # Prisma cannot group by nested relations, so we do a raw query
+    # Calculate the cutoff timestamp
+    timestamp_threshold = datetime.now(timezone.utc) - timedelta(days=30)

-    Queries the mv_suggested_blocks materialized view (refreshed hourly via pg_cron)
-    and returns the top `count` blocks sorted by execution count, excluding
-    Input/Output/Agent block types and blocks in EXCLUDED_BLOCK_IDS.
-    """
-    results = await mv_suggested_blocks.prisma().find_many()
+    results = await query_raw_with_schema(
+        """
+        SELECT
+            agent_node."agentBlockId" AS block_id,
+            COUNT(execution.id) AS execution_count
+        FROM {schema_prefix}"AgentNodeExecution" execution
+        JOIN {schema_prefix}"AgentNode" agent_node ON execution."agentNodeId" = agent_node.id
+        WHERE execution."endedTime" >= $1::timestamp
+        GROUP BY agent_node."agentBlockId"
+        ORDER BY execution_count DESC;
+        """,
+        timestamp_threshold,
+    )

    # Get the top blocks based on execution count
-    # But ignore Input, Output, Agent, and excluded blocks
+    # But ignore Input, Output, and Agent blocks
    blocks: list[tuple[BlockInfo, int]] = []
-    execution_counts = {row.block_id: row.execution_count for row in results}

    for block_type in load_all_blocks().values():
        block: AnyBlockSchema = block_type()
        if block.disabled or block.block_type in (
-            BlockType.INPUT,
-            BlockType.OUTPUT,
-            BlockType.AGENT,
+            backend.data.block.BlockType.INPUT,
+            backend.data.block.BlockType.OUTPUT,
+            backend.data.block.BlockType.AGENT,
        ):
            continue
-        if block.id in EXCLUDED_BLOCK_IDS:
-            continue
-        execution_count = execution_counts.get(block.id, 0)
+        # Find the execution count for this block
+        execution_count = next(
+            (row["execution_count"] for row in results if row["block_id"] == block.id),
+            0,
+        )
        blocks.append((block.get_info(), execution_count))
    # Sort blocks by execution count
    blocks.sort(key=lambda x: x[1], reverse=True)
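Note on the execution-count lookup in the hunk above: the added `next(...)` expression rescans `results` once per block, whereas the removed code built a dictionary first. The sketch below contrasts the two approaches on plain dictionaries. The sample rows and the helper names `count_by_scan` and `index_counts` are hypothetical and are not part of this change.

```python
from typing import Any

# Hypothetical rows, shaped like the raw-query result in the hunk above.
rows: list[dict[str, Any]] = [
    {"block_id": "block-a", "execution_count": 7},
    {"block_id": "block-b", "execution_count": 3},
]


def count_by_scan(rows: list[dict[str, Any]], block_id: str) -> int:
    """Linear scan per lookup, mirroring the added next(...) expression."""
    return next(
        (row["execution_count"] for row in rows if row["block_id"] == block_id),
        0,
    )


def index_counts(rows: list[dict[str, Any]]) -> dict[str, int]:
    """Build the lookup table once; each later lookup is a single dict access."""
    return {row["block_id"]: row["execution_count"] for row in rows}


counts = index_counts(rows)
```

Both forms return the same counts. The indexed form mainly matters when the number of blocks and result rows is large.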
--- a/autogpt_platform/backend/backend/api/features/builder/model.py
+++ b/autogpt_platform/backend/backend/api/features/builder/model.py
@@ -4,7 +4,7 @@ from pydantic import BaseModel

 import backend.api.features.library.model as library_model
 import backend.api.features.store.model as store_model
-from backend.blocks._base import BlockInfo
+from backend.data.block import BlockInfo
 from backend.integrations.providers import ProviderName
 from backend.util.models import Pagination

@@ -15,7 +15,7 @@ FilterType = Literal[
    "my_agents",
 ]

-BlockTypeFilter = Literal["all", "input", "action", "output"]
+BlockType = Literal["all", "input", "action", "output"]


 class SearchEntry(BaseModel):
@@ -27,6 +27,7 @@ class SearchEntry(BaseModel):

 # Suggestions
 class SuggestionsResponse(BaseModel):
+    otto_suggestions: list[str]
    recent_searches: list[SearchEntry]
    providers: list[ProviderName]
    top_blocks: list[BlockInfo]
--- a/autogpt_platform/backend/backend/api/features/builder/routes.py
+++ b/autogpt_platform/backend/backend/api/features/builder/routes.py
@@ -1,5 +1,5 @@
 import logging
-from typing import Annotated, Sequence, cast, get_args
+from typing import Annotated, Sequence

 import fastapi
 from autogpt_libs.auth.dependencies import get_user_id, requires_user
@@ -10,8 +10,6 @@ from backend.util.models import Pagination
 from . import db as builder_db
 from . import model as builder_model

-VALID_FILTER_VALUES = get_args(builder_model.FilterType)
-
 logger = logging.getLogger(__name__)

 router = fastapi.APIRouter(
@@ -51,6 +49,11 @@ async def get_suggestions(
    Get all suggestions for the Blocks Menu.
    """
    return builder_model.SuggestionsResponse(
+        otto_suggestions=[
+            "What blocks do I need to get started?",
+            "Help me create a list",
+            "Help me feed my data to Google Maps",
+        ],
        recent_searches=await builder_db.get_recent_searches(user_id),
        providers=[
            ProviderName.TWITTER,
@@ -85,7 +88,7 @@ async def get_block_categories(
 )
 async def get_blocks(
    category: Annotated[str | None, fastapi.Query()] = None,
-    type: Annotated[builder_model.BlockTypeFilter | None, fastapi.Query()] = None,
+    type: Annotated[builder_model.BlockType | None, fastapi.Query()] = None,
    provider: Annotated[ProviderName | None, fastapi.Query()] = None,
    page: Annotated[int, fastapi.Query()] = 1,
    page_size: Annotated[int, fastapi.Query()] = 50,
@@ -148,7 +151,7 @@ async def get_providers(
 async def search(
    user_id: Annotated[str, fastapi.Security(get_user_id)],
    search_query: Annotated[str | None, fastapi.Query()] = None,
-    filter: Annotated[str | None, fastapi.Query()] = None,
+    filter: Annotated[list[builder_model.FilterType] | None, fastapi.Query()] = None,
    search_id: Annotated[str | None, fastapi.Query()] = None,
    by_creator: Annotated[list[str] | None, fastapi.Query()] = None,
    page: Annotated[int, fastapi.Query()] = 1,
@@ -157,20 +160,9 @@ async def search(
    """
    Search for blocks (including integrations), marketplace agents, and user library agents.
    """
-    # Parse and validate filter parameter
-    filters: list[builder_model.FilterType]
-    if filter:
-        filter_values = [f.strip() for f in filter.split(",")]
-        invalid_filters = [f for f in filter_values if f not in VALID_FILTER_VALUES]
-        if invalid_filters:
-            raise fastapi.HTTPException(
-                status_code=400,
-                detail=f"Invalid filter value(s): {', '.join(invalid_filters)}. "
-                f"Valid values are: {', '.join(VALID_FILTER_VALUES)}",
-            )
-        filters = cast(list[builder_model.FilterType], filter_values)
-    else:
-        filters = [
+    # If no filters are provided, then we will return all types
+    if not filter:
+        filter = [
            "blocks",
            "integrations",
            "marketplace_agents",
@@ -182,7 +174,7 @@ async def search(
    cached_results = await builder_db.get_sorted_search_results(
        user_id=user_id,
        search_query=search_query,
-        filters=filters,
+        filters=filter,
        by_creator=by_creator,
    )

@@ -204,7 +196,7 @@ async def search(
        user_id,
        builder_model.SearchEntry(
            search_query=search_query,
-            filter=filters,
+            filter=filter,
            by_creator=by_creator,
            search_id=search_id,
        ),
--- a/autogpt_platform/backend/backend/api/features/chat/config.py
+++ b/autogpt_platform/backend/backend/api/features/chat/config.py
@@ -0,0 +1,96 @@
+"""Configuration management for chat system."""
+
+import os
+
+from pydantic import Field, field_validator
+from pydantic_settings import BaseSettings
+
+
+class ChatConfig(BaseSettings):
+    """Configuration for the chat system."""
+
+    # LLM API configuration (OpenAI-compatible endpoint; defaults to OpenRouter)
+    model: str = Field(
+        default="anthropic/claude-opus-4.5", description="Default model to use"
+    )
+    title_model: str = Field(
+        default="openai/gpt-4o-mini",
+        description="Model to use for generating session titles (should be fast/cheap)",
+    )
+    api_key: str | None = Field(default=None, description="OpenAI API key")
+    base_url: str | None = Field(
+        default="https://openrouter.ai/api/v1",
+        description="Base URL for API (e.g., for OpenRouter)",
+    )
+
+    # Session TTL Configuration - 12 hours
+    session_ttl: int = Field(default=43200, description="Session TTL in seconds")
+
+    # Streaming Configuration
+    max_context_messages: int = Field(
+        default=50, ge=1, le=200, description="Maximum context messages"
+    )
+
+    stream_timeout: int = Field(default=300, description="Stream timeout in seconds")
+    max_retries: int = Field(default=3, description="Maximum number of retries")
+    max_agent_runs: int = Field(default=30, description="Maximum number of agent runs")
+    max_agent_schedules: int = Field(
+        default=30, description="Maximum number of agent schedules"
+    )
+
+    # Long-running operation configuration
+    long_running_operation_ttl: int = Field(
+        default=600,
+        description="TTL in seconds for long-running operation tracking in Redis (safety net if pod dies)",
+    )
+
+    # Langfuse Prompt Management Configuration
+    # Note: Langfuse credentials are in Settings().secrets (settings.py)
+    langfuse_prompt_name: str = Field(
+        default="CoPilot Prompt",
+        description="Name of the prompt in Langfuse to fetch",
+    )
+
+    @field_validator("api_key", mode="before")
+    @classmethod
+    def get_api_key(cls, v):
+        """Get API key from environment if not provided."""
+        if v is None:
+            # Try to get from environment variables
+            # First check for CHAT_API_KEY (read manually; no env_prefix is set on this class)
+            v = os.getenv("CHAT_API_KEY")
+            if not v:
+                # Fall back to OPEN_ROUTER_API_KEY
+                v = os.getenv("OPEN_ROUTER_API_KEY")
+            if not v:
+                # Fall back to OPENAI_API_KEY
+                v = os.getenv("OPENAI_API_KEY")
+        return v
+
+    @field_validator("base_url", mode="before")
+    @classmethod
+    def get_base_url(cls, v):
+        """Get base URL from environment if not provided."""
+        if v is None:
+            # Check for OpenRouter or custom base URL
+            v = os.getenv("CHAT_BASE_URL")
+            if not v:
+                v = os.getenv("OPENROUTER_BASE_URL")
+            if not v:
+                v = os.getenv("OPENAI_BASE_URL")
+            if not v:
+                v = "https://openrouter.ai/api/v1"
+        return v
+
+    # Prompt paths for different contexts
+    PROMPT_PATHS: dict[str, str] = {
+        "default": "prompts/chat_system.md",
+        "onboarding": "prompts/onboarding_system.md",
+    }
+
+    class Config:
+        """Pydantic config."""
+
+        env_file = ".env"
+        env_file_encoding = "utf-8"
+        extra = "ignore"  # Ignore extra environment variables
--- a/autogpt_platform/backend/backend/api/features/chat/db.py
+++ b/autogpt_platform/backend/backend/api/features/chat/db.py
@@ -0,0 +1,291 @@
+"""Database operations for chat sessions."""
+
+import asyncio
+import logging
+from datetime import UTC, datetime
+from typing import Any, cast
+
+from prisma.models import ChatMessage as PrismaChatMessage
+from prisma.models import ChatSession as PrismaChatSession
+from prisma.types import (
+    ChatMessageCreateInput,
+    ChatSessionCreateInput,
+    ChatSessionUpdateInput,
+    ChatSessionWhereInput,
+)
+
+from backend.data.db import transaction
+from backend.util.json import SafeJson
+
+logger = logging.getLogger(__name__)
+
+
+async def get_chat_session(session_id: str) -> PrismaChatSession | None:
+    """Get a chat session by ID from the database."""
+    session = await PrismaChatSession.prisma().find_unique(
+        where={"id": session_id},
+        include={"Messages": True},
+    )
+    if session and session.Messages:
+        # Sort messages by sequence in Python - Prisma Python client doesn't support
+        # order_by in include clauses (unlike Prisma JS), so we sort after fetching
+        session.Messages.sort(key=lambda m: m.sequence)
+    return session
+
+
+async def create_chat_session(
+    session_id: str,
+    user_id: str,
+) -> PrismaChatSession:
+    """Create a new chat session in the database."""
+    data = ChatSessionCreateInput(
+        id=session_id,
+        userId=user_id,
+        credentials=SafeJson({}),
+        successfulAgentRuns=SafeJson({}),
+        successfulAgentSchedules=SafeJson({}),
+    )
+    return await PrismaChatSession.prisma().create(
+        data=data,
+        include={"Messages": True},
+    )
+
+
+async def update_chat_session(
+    session_id: str,
+    credentials: dict[str, Any] | None = None,
+    successful_agent_runs: dict[str, Any] | None = None,
+    successful_agent_schedules: dict[str, Any] | None = None,
+    total_prompt_tokens: int | None = None,
+    total_completion_tokens: int | None = None,
+    title: str | None = None,
+) -> PrismaChatSession | None:
+    """Update a chat session's metadata."""
+    data: ChatSessionUpdateInput = {"updatedAt": datetime.now(UTC)}
+
+    if credentials is not None:
+        data["credentials"] = SafeJson(credentials)
+    if successful_agent_runs is not None:
+        data["successfulAgentRuns"] = SafeJson(successful_agent_runs)
+    if successful_agent_schedules is not None:
+        data["successfulAgentSchedules"] = SafeJson(successful_agent_schedules)
+    if total_prompt_tokens is not None:
+        data["totalPromptTokens"] = total_prompt_tokens
+    if total_completion_tokens is not None:
+        data["totalCompletionTokens"] = total_completion_tokens
+    if title is not None:
+        data["title"] = title
+
+    session = await PrismaChatSession.prisma().update(
+        where={"id": session_id},
+        data=data,
+        include={"Messages": True},
+    )
+    if session and session.Messages:
+        # Sort in Python - Prisma Python doesn't support order_by in include clauses
+        session.Messages.sort(key=lambda m: m.sequence)
+    return session
+
+
+async def add_chat_message(
+    session_id: str,
+    role: str,
+    sequence: int,
+    content: str | None = None,
+    name: str | None = None,
+    tool_call_id: str | None = None,
+    refusal: str | None = None,
+    tool_calls: list[dict[str, Any]] | None = None,
+    function_call: dict[str, Any] | None = None,
+) -> PrismaChatMessage:
+    """Add a message to a chat session."""
+    # Build input dict dynamically rather than using ChatMessageCreateInput directly
+    # because Prisma's TypedDict validation rejects optional fields set to None.
+    # We only include fields that have values, then cast at the end.
+    data: dict[str, Any] = {
+        "Session": {"connect": {"id": session_id}},
+        "role": role,
+        "sequence": sequence,
+    }
+
+    # Add optional string fields
+    if content is not None:
+        data["content"] = content
+    if name is not None:
+        data["name"] = name
+    if tool_call_id is not None:
+        data["toolCallId"] = tool_call_id
+    if refusal is not None:
+        data["refusal"] = refusal
+
+    # Add optional JSON fields only when they have values
+    if tool_calls is not None:
+        data["toolCalls"] = SafeJson(tool_calls)
+    if function_call is not None:
+        data["functionCall"] = SafeJson(function_call)
+
+    # Run message create and session timestamp update in parallel for lower latency
+    _, message = await asyncio.gather(
+        PrismaChatSession.prisma().update(
+            where={"id": session_id},
+            data={"updatedAt": datetime.now(UTC)},
+        ),
+        PrismaChatMessage.prisma().create(data=cast(ChatMessageCreateInput, data)),
+    )
+    return message
+
+
+async def add_chat_messages_batch(
+    session_id: str,
+    messages: list[dict[str, Any]],
+    start_sequence: int,
+) -> list[PrismaChatMessage]:
+    """Add multiple messages to a chat session in a batch.
+
+    Uses a transaction for atomicity - if any message creation fails,
+    the entire batch is rolled back.
+    """
+    if not messages:
+        return []
+
+    created_messages = []
+
+    async with transaction() as tx:
+        for i, msg in enumerate(messages):
+            # Build input dict dynamically rather than using ChatMessageCreateInput
+            # directly because Prisma's TypedDict validation rejects optional fields
+            # set to None. We only include fields that have values, then cast.
+            data: dict[str, Any] = {
+                "Session": {"connect": {"id": session_id}},
+                "role": msg["role"],
+                "sequence": start_sequence + i,
+            }
+
+            # Add optional string fields
+            if msg.get("content") is not None:
+                data["content"] = msg["content"]
+            if msg.get("name") is not None:
+                data["name"] = msg["name"]
+            if msg.get("tool_call_id") is not None:
+                data["toolCallId"] = msg["tool_call_id"]
+            if msg.get("refusal") is not None:
+                data["refusal"] = msg["refusal"]
+
+            # Add optional JSON fields only when they have values
+            if msg.get("tool_calls") is not None:
+                data["toolCalls"] = SafeJson(msg["tool_calls"])
+            if msg.get("function_call") is not None:
+                data["functionCall"] = SafeJson(msg["function_call"])
+
+            created = await PrismaChatMessage.prisma(tx).create(
+                data=cast(ChatMessageCreateInput, data)
+            )
+            created_messages.append(created)
+
+        # Update session's updatedAt timestamp within the same transaction.
+        # Note: Token usage (total_prompt_tokens, total_completion_tokens) is updated
+        # separately via update_chat_session() after streaming completes.
+        await PrismaChatSession.prisma(tx).update(
+            where={"id": session_id},
+            data={"updatedAt": datetime.now(UTC)},
+        )
+
+    return created_messages
+
+
+async def get_user_chat_sessions(
+    user_id: str,
+    limit: int = 50,
+    offset: int = 0,
+) -> list[PrismaChatSession]:
+    """Get chat sessions for a user, ordered by most recent."""
+    return await PrismaChatSession.prisma().find_many(
+        where={"userId": user_id},
+        order={"updatedAt": "desc"},
+        take=limit,
+        skip=offset,
+    )
+
+
+async def get_user_session_count(user_id: str) -> int:
+    """Get the total number of chat sessions for a user."""
+    return await PrismaChatSession.prisma().count(where={"userId": user_id})
+
+
+async def delete_chat_session(session_id: str, user_id: str | None = None) -> bool:
+    """Delete a chat session and all its messages.
+
+    Args:
+        session_id: The session ID to delete.
+        user_id: If provided, validates that the session belongs to this user
+            before deletion. This prevents unauthorized deletion of other
+            users' sessions.
+
+    Returns:
+        True if deleted successfully, False otherwise.
+    """
+    try:
+        # Build typed where clause with optional user_id validation
+        where_clause: ChatSessionWhereInput = {"id": session_id}
+        if user_id is not None:
+            where_clause["userId"] = user_id
+
+        result = await PrismaChatSession.prisma().delete_many(where=where_clause)
+        if result == 0:
+            logger.warning(
+                f"No session deleted for {session_id} "
+                f"(user_id validation: {user_id is not None})"
+            )
+            return False
+        return True
+    except Exception as e:
+        logger.error(f"Failed to delete chat session {session_id}: {e}")
+        return False
+
+
+async def get_chat_session_message_count(session_id: str) -> int:
+    """Get the number of messages in a chat session."""
+    count = await PrismaChatMessage.prisma().count(where={"sessionId": session_id})
+    return count
+
+
+async def update_tool_message_content(
+    session_id: str,
+    tool_call_id: str,
+    new_content: str,
+) -> bool:
+    """Update the content of a tool message in chat history.
+
+    Used by background tasks to update pending operation messages with final results.
+
+    Args:
+        session_id: The chat session ID.
+        tool_call_id: The tool call ID to find the message.
+        new_content: The new content to set.
+
+    Returns:
+        True if a message was updated, False otherwise.
+    """
+    try:
+        result = await PrismaChatMessage.prisma().update_many(
+            where={
+                "sessionId": session_id,
+                "toolCallId": tool_call_id,
+            },
+            data={
+                "content": new_content,
+            },
+        )
+        if result == 0:
+            logger.warning(
+                f"No message found to update for session {session_id}, "
+                f"tool_call_id {tool_call_id}"
+            )
+            return False
+        return True
+    except Exception as e:
+        logger.error(
+            f"Failed to update tool message for session {session_id}, "
+            f"tool_call_id {tool_call_id}: {e}"
+        )
+        return False
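Note on `add_chat_message` and `add_chat_messages_batch` above: both build the create payload by hand and include an optional field only when it has a value. A minimal sketch of that pattern as a single helper follows. `build_message_data` and `OPTIONAL_FIELDS` are hypothetical names, and the `SafeJson` wrapping of the JSON fields is deliberately left out so the example runs without the backend.

```python
from typing import Any

# Message key -> Prisma column name, as used in the two functions above.
OPTIONAL_FIELDS = {
    "content": "content",
    "name": "name",
    "tool_call_id": "toolCallId",
    "refusal": "refusal",
    "tool_calls": "toolCalls",
    "function_call": "functionCall",
}


def build_message_data(
    session_id: str, sequence: int, msg: dict[str, Any]
) -> dict[str, Any]:
    """Build a create payload that omits optional fields whose value is None."""
    data: dict[str, Any] = {
        "Session": {"connect": {"id": session_id}},
        "role": msg["role"],
        "sequence": sequence,
    }
    for source_key, column in OPTIONAL_FIELDS.items():
        value = msg.get(source_key)
        if value is not None:
            data[column] = value
    return data
```

Sharing one helper between the single and batch paths would keep the field mapping in one place. Whether that refactor fits this change is a judgment call.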
--- a/autogpt_platform/backend/backend/api/features/chat/model.py
+++ b/autogpt_platform/backend/backend/api/features/chat/model.py
@@ -0,0 +1,617 @@
+import asyncio
+import logging
+import uuid
+from datetime import UTC, datetime
+from typing import Any
+from weakref import WeakValueDictionary
+
+from openai.types.chat import (
+    ChatCompletionAssistantMessageParam,
+    ChatCompletionDeveloperMessageParam,
+    ChatCompletionFunctionMessageParam,
+    ChatCompletionMessageParam,
+    ChatCompletionSystemMessageParam,
+    ChatCompletionToolMessageParam,
+    ChatCompletionUserMessageParam,
+)
+from openai.types.chat.chat_completion_assistant_message_param import FunctionCall
+from openai.types.chat.chat_completion_message_tool_call_param import (
+    ChatCompletionMessageToolCallParam,
+    Function,
+)
+from prisma.models import ChatMessage as PrismaChatMessage
+from prisma.models import ChatSession as PrismaChatSession
+from pydantic import BaseModel
+
+from backend.data.redis_client import get_redis_async
+from backend.util import json
+from backend.util.exceptions import DatabaseError, RedisError
+
+from . import db as chat_db
+from .config import ChatConfig
+
+logger = logging.getLogger(__name__)
+config = ChatConfig()
+
+
+def _parse_json_field(value: str | dict | list | None, default: Any = None) -> Any:
+    """Parse a JSON field that may be stored as string or already parsed."""
+    if value is None:
+        return default
+    if isinstance(value, str):
+        return json.loads(value)
+    return value
+
+
+# Redis cache key prefix for chat sessions
+CHAT_SESSION_CACHE_PREFIX = "chat:session:"
+
+
+def _get_session_cache_key(session_id: str) -> str:
+    """Get the Redis cache key for a chat session."""
+    return f"{CHAT_SESSION_CACHE_PREFIX}{session_id}"
+
+
+# Session-level locks to prevent race conditions during concurrent upserts.
+# Uses WeakValueDictionary to automatically garbage collect locks when no longer referenced,
+# preventing unbounded memory growth while maintaining lock semantics for active sessions.
+# Invalidation: Locks are auto-removed by GC when no coroutine holds a reference (after
+# async with lock: completes). Explicit cleanup also occurs in delete_chat_session().
+_session_locks: WeakValueDictionary[str, asyncio.Lock] = WeakValueDictionary()
+_session_locks_mutex = asyncio.Lock()
+
+
+async def _get_session_lock(session_id: str) -> asyncio.Lock:
+    """Get or create a lock for a specific session to prevent concurrent upserts.
+
+    Uses WeakValueDictionary for automatic cleanup: locks are garbage collected
+    when no coroutine holds a reference to them, preventing memory leaks from
+    unbounded growth of session locks.
+    """
+    async with _session_locks_mutex:
+        lock = _session_locks.get(session_id)
+        if lock is None:
+            lock = asyncio.Lock()
+            _session_locks[session_id] = lock
+        return lock
+
+
+class ChatMessage(BaseModel):
+    role: str
+    content: str | None = None
+    name: str | None = None
+    tool_call_id: str | None = None
+    refusal: str | None = None
+    tool_calls: list[dict] | None = None
+    function_call: dict | None = None
+
+
+class Usage(BaseModel):
+    prompt_tokens: int
+    completion_tokens: int
+    total_tokens: int
+
+
+class ChatSession(BaseModel):
+    session_id: str
+    user_id: str
+    title: str | None = None
+    messages: list[ChatMessage]
+    usage: list[Usage]
+    credentials: dict[str, dict] = {}  # Map of provider -> credential metadata
+    started_at: datetime
+    updated_at: datetime
+    successful_agent_runs: dict[str, int] = {}
+    successful_agent_schedules: dict[str, int] = {}
+
+    @staticmethod
+    def new(user_id: str) -> "ChatSession":
+        return ChatSession(
+            session_id=str(uuid.uuid4()),
+            user_id=user_id,
+            title=None,
+            messages=[],
+            usage=[],
+            credentials={},
+            started_at=datetime.now(UTC),
+            updated_at=datetime.now(UTC),
+        )
+
+    @staticmethod
+    def from_db(
+        prisma_session: PrismaChatSession,
+        prisma_messages: list[PrismaChatMessage] | None = None,
+    ) -> "ChatSession":
+        """Convert Prisma models to Pydantic ChatSession."""
+        messages = []
+        if prisma_messages:
+            for msg in prisma_messages:
+                messages.append(
+                    ChatMessage(
+                        role=msg.role,
+                        content=msg.content,
+                        name=msg.name,
+                        tool_call_id=msg.toolCallId,
+                        refusal=msg.refusal,
+                        tool_calls=_parse_json_field(msg.toolCalls),
+                        function_call=_parse_json_field(msg.functionCall),
+                    )
+                )
+
+        # Parse JSON fields from Prisma
+        credentials = _parse_json_field(prisma_session.credentials, default={})
+        successful_agent_runs = _parse_json_field(
+            prisma_session.successfulAgentRuns, default={}
+        )
+        successful_agent_schedules = _parse_json_field(
+            prisma_session.successfulAgentSchedules, default={}
+        )
+
+        # Calculate usage from token counts
+        usage = []
+        if prisma_session.totalPromptTokens or prisma_session.totalCompletionTokens:
+            usage.append(
+                Usage(
+                    prompt_tokens=prisma_session.totalPromptTokens or 0,
+                    completion_tokens=prisma_session.totalCompletionTokens or 0,
+                    total_tokens=(prisma_session.totalPromptTokens or 0)
+                    + (prisma_session.totalCompletionTokens or 0),
+                )
+            )
+
+        return ChatSession(
+            session_id=prisma_session.id,
+            user_id=prisma_session.userId,
+            title=prisma_session.title,
+            messages=messages,
+            usage=usage,
+            credentials=credentials,
+            started_at=prisma_session.createdAt,
+            updated_at=prisma_session.updatedAt,
+            successful_agent_runs=successful_agent_runs,
+            successful_agent_schedules=successful_agent_schedules,
+        )
+
+    def to_openai_messages(self) -> list[ChatCompletionMessageParam]:
+        messages = []
+        for message in self.messages:
+            if message.role == "developer":
+                m = ChatCompletionDeveloperMessageParam(
+                    role="developer",
+                    content=message.content or "",
+                )
+                if message.name:
+                    m["name"] = message.name
+                messages.append(m)
+            elif message.role == "system":
+                m = ChatCompletionSystemMessageParam(
+                    role="system",
+                    content=message.content or "",
+                )
+                if message.name:
+                    m["name"] = message.name
+                messages.append(m)
+            elif message.role == "user":
+                m = ChatCompletionUserMessageParam(
+                    role="user",
+                    content=message.content or "",
+                )
+                if message.name:
+                    m["name"] = message.name
+                messages.append(m)
+            elif message.role == "assistant":
+                m = ChatCompletionAssistantMessageParam(
+                    role="assistant",
+                    content=message.content or "",
+                )
+                if message.function_call:
+                    m["function_call"] = FunctionCall(
+                        arguments=message.function_call["arguments"],
+                        name=message.function_call["name"],
+                    )
+                if message.refusal:
+                    m["refusal"] = message.refusal
+                if message.tool_calls:
+                    t: list[ChatCompletionMessageToolCallParam] = []
+                    for tool_call in message.tool_calls:
+                        # Tool calls are stored with nested structure: {id, type, function: {name, arguments}}
+                        function_data = tool_call.get("function", {})
+
+                        # Skip tool calls that are missing required fields
+                        if "id" not in tool_call or "name" not in function_data:
+                            logger.warning(
+                                "Skipping invalid tool call: missing required fields. "
+                                f"Got: {list(tool_call)}, function keys: {list(function_data)}"
+                            )
+                            continue
+
+                        # Arguments are stored as a JSON string
+                        arguments_str = function_data.get("arguments", "{}")
+
+                        t.append(
+                            ChatCompletionMessageToolCallParam(
+                                id=tool_call["id"],
+                                type="function",
+                                function=Function(
+                                    arguments=arguments_str,
+                                    name=function_data["name"],
+                                ),
+                            )
+                        )
+                    m["tool_calls"] = t
+                if message.name:
+                    m["name"] = message.name
+                messages.append(m)
+            elif message.role == "tool":
+                messages.append(
+                    ChatCompletionToolMessageParam(
+                        role="tool",
+                        content=message.content or "",
+                        tool_call_id=message.tool_call_id or "",
+                    )
+                )
+            elif message.role == "function":
+                messages.append(
+                    ChatCompletionFunctionMessageParam(
+                        role="function",
+                        content=message.content,
+                        name=message.name or "",
+                    )
+                )
+        return messages
+
+
+async def _get_session_from_cache(session_id: str) -> ChatSession | None:
+    """Get a chat session from Redis cache."""
+    redis_key = _get_session_cache_key(session_id)
+    async_redis = await get_redis_async()
+    raw_session: bytes | None = await async_redis.get(redis_key)
+
+    if raw_session is None:
+        return None
+
+    try:
+        session = ChatSession.model_validate_json(raw_session)
+        logger.info(
+            f"Loading session {session_id} from cache: "
+            f"message_count={len(session.messages)}, "
+            f"roles={[m.role for m in session.messages]}"
+        )
+        return session
+    except Exception as e:
+        logger.error(f"Failed to deserialize session {session_id}: {e}", exc_info=True)
+        raise RedisError(f"Corrupted session data for {session_id}") from e
+
+
+async def _cache_session(session: ChatSession) -> None:
+    """Cache a chat session in Redis."""
+    redis_key = _get_session_cache_key(session.session_id)
+    async_redis = await get_redis_async()
+    await async_redis.setex(redis_key, config.session_ttl, session.model_dump_json())
+
+
+async def cache_chat_session(session: ChatSession) -> None:
+    """Cache a chat session without persisting to the database."""
+    await _cache_session(session)
+
+
+async def invalidate_session_cache(session_id: str) -> None:
+    """Invalidate a chat session from Redis cache.
+
+    Used by background tasks to ensure fresh data is loaded on next access.
+    This is best-effort - Redis failures are logged but don't fail the operation.
+    """
+    try:
+        redis_key = _get_session_cache_key(session_id)
+        async_redis = await get_redis_async()
+        await async_redis.delete(redis_key)
+    except Exception as e:
+        # Best-effort: log but don't fail - cache will expire naturally
+        logger.warning(f"Failed to invalidate session cache for {session_id}: {e}")
+
+
+async def _get_session_from_db(session_id: str) -> ChatSession | None:
+    """Get a chat session from the database."""
+    prisma_session = await chat_db.get_chat_session(session_id)
+    if not prisma_session:
+        return None
+
+    messages = prisma_session.Messages
+    logger.info(
+        f"Loading session {session_id} from DB: "
+        f"has_messages={messages is not None}, "
+        f"message_count={len(messages) if messages else 0}, "
+        f"roles={[m.role for m in messages] if messages else []}"
+    )
+
+    return ChatSession.from_db(prisma_session, messages)
+
+
+async def _save_session_to_db(
+    session: ChatSession, existing_message_count: int
+) -> None:
+    """Save or update a chat session in the database."""
+    # Check if session exists in DB
+    existing = await chat_db.get_chat_session(session.session_id)
+
+    if not existing:
+        # Create new session
+        await chat_db.create_chat_session(
+            session_id=session.session_id,
+            user_id=session.user_id,
+        )
+        existing_message_count = 0
+
+    # Calculate total tokens from usage
+    total_prompt = sum(u.prompt_tokens for u in session.usage)
+    total_completion = sum(u.completion_tokens for u in session.usage)
+
+    # Update session metadata
+    await chat_db.update_chat_session(
+        session_id=session.session_id,
+        credentials=session.credentials,
+        successful_agent_runs=session.successful_agent_runs,
+        successful_agent_schedules=session.successful_agent_schedules,
+        total_prompt_tokens=total_prompt,
+        total_completion_tokens=total_completion,
+    )
+
+    # Add new messages (only those after existing count)
+    new_messages = session.messages[existing_message_count:]
+    if new_messages:
+        messages_data = []
+        for msg in new_messages:
+            messages_data.append(
+                {
+                    "role": msg.role,
+                    "content": msg.content,
+                    "name": msg.name,
+                    "tool_call_id": msg.tool_call_id,
+                    "refusal": msg.refusal,
+                    "tool_calls": msg.tool_calls,
+                    "function_call": msg.function_call,
+                }
+            )
+        logger.info(
+            f"Saving {len(new_messages)} new messages to DB for session {session.session_id}: "
+            f"roles={[m['role'] for m in messages_data]}, "
+            f"start_sequence={existing_message_count}"
+        )
+        await chat_db.add_chat_messages_batch(
+            session_id=session.session_id,
+            messages=messages_data,
+            start_sequence=existing_message_count,
+        )
+
+
+async def get_chat_session(
+    session_id: str,
+    user_id: str | None = None,
+) -> ChatSession | None:
+    """Get a chat session by ID.
+
+    Checks Redis cache first, falls back to database if not found.
+    Caches database results back to Redis.
+
+    Args:
+        session_id: The session ID to fetch.
+        user_id: If provided, validates that the session belongs to this user.
+            If None, ownership is not validated (admin/system access).
+    """
+    # Try cache first
+    try:
+        session = await _get_session_from_cache(session_id)
+        if session:
+            # Verify user ownership if user_id was provided for validation
+            if user_id is not None and session.user_id != user_id:
+                logger.warning(
+                    f"Session {session_id} user id mismatch: {session.user_id} != {user_id}"
+                )
+                return None
+            return session
+    except RedisError:
+        logger.warning(f"Cache error for session {session_id}, trying database")
+    except Exception as e:
+        logger.warning(f"Unexpected cache error for session {session_id}: {e}")
+
+    # Fall back to database
+    logger.info(f"Session {session_id} not in cache, checking database")
+    session = await _get_session_from_db(session_id)
+
+    if session is None:
+        logger.warning(f"Session {session_id} not found in cache or database")
+        return None
+
+    # Verify user ownership if user_id was provided for validation
+    if user_id is not None and session.user_id != user_id:
+        logger.warning(
+            f"Session {session_id} user id mismatch: {session.user_id} != {user_id}"
+        )
+        return None
+
+    # Cache the session from DB
+    try:
+        await _cache_session(session)
+        logger.info(f"Cached session {session_id} from database")
+    except Exception as e:
+        logger.warning(f"Failed to cache session {session_id}: {e}")
+
+    return session
+
+
+async def upsert_chat_session(
+    session: ChatSession,
+) -> ChatSession:
+    """Create or update a chat session in both the database and the cache.
+
+    Uses session-level locking to prevent race conditions when concurrent
+    operations (e.g., background title update and main stream handler)
+    attempt to upsert the same session simultaneously.
+
+    Raises:
+        DatabaseError: If the database write fails. The cache is still updated
+            as a best-effort optimization, but the error is propagated to ensure
+            callers are aware of the persistence failure.
+        RedisError: If the cache write fails (after successful DB write).
+    """
+    # Acquire session-specific lock to prevent concurrent upserts
+    lock = await _get_session_lock(session.session_id)
+
+    async with lock:
+        # Get existing message count from DB for incremental saves
+        existing_message_count = await chat_db.get_chat_session_message_count(
+            session.session_id
+        )
+
+        db_error: Exception | None = None
+
+        # Save to database (primary storage)
+        try:
+            await _save_session_to_db(session, existing_message_count)
+        except Exception as e:
+            logger.error(
+                f"Failed to save session {session.session_id} to database: {e}"
+            )
+            db_error = e
+
+        # Save to cache (best-effort, even if DB failed)
+        try:
+            await _cache_session(session)
+        except Exception as e:
+            # If DB succeeded but cache failed, raise cache error
+            if db_error is None:
+                raise RedisError(
+                    f"Failed to persist chat session {session.session_id} to Redis: {e}"
+                ) from e
+            # If both failed, log cache error but raise DB error (more critical)
+            logger.warning(
+                f"Cache write also failed for session {session.session_id}: {e}"
+            )
+
+        # Propagate the DB error only after the cache write was attempted
+        if db_error is not None:
+            raise DatabaseError(
+                f"Failed to persist chat session {session.session_id} to database"
+            ) from db_error
+
+        return session
+
+
+async def create_chat_session(user_id: str) -> ChatSession:
+    """Create a new chat session and persist it.
+
+    Raises:
+        DatabaseError: If the database write fails. We fail fast to ensure
+            callers never receive a non-persisted session that only exists
+            in cache (which would be lost when the cache expires).
+    """
+    session = ChatSession.new(user_id)
+
+    # Create in database first - fail fast if this fails
+    try:
+        await chat_db.create_chat_session(
+            session_id=session.session_id,
+            user_id=user_id,
+        )
+    except Exception as e:
+        logger.error(f"Failed to create session {session.session_id} in database: {e}")
+        raise DatabaseError(
+            f"Failed to create chat session {session.session_id} in database"
+        ) from e
+
+    # Cache the session (best-effort optimization, DB is source of truth)
+    try:
+        await _cache_session(session)
+    except Exception as e:
+        logger.warning(f"Failed to cache new session {session.session_id}: {e}")
+
+    return session
+
+
+async def get_user_sessions(
+    user_id: str,
+    limit: int = 50,
+    offset: int = 0,
+) -> tuple[list[ChatSession], int]:
+    """Get chat sessions for a user from the database with total count.
+
+    Returns:
+        A tuple of (sessions, total_count) where total_count is the overall
+        number of sessions for the user (not just the current page).
+    """
+    prisma_sessions = await chat_db.get_user_chat_sessions(user_id, limit, offset)
+    total_count = await chat_db.get_user_session_count(user_id)
+
+    sessions = []
+    for prisma_session in prisma_sessions:
+        # Convert without messages for listing (lighter weight)
+        sessions.append(ChatSession.from_db(prisma_session, None))
+
+    return sessions, total_count
+
+
+async def delete_chat_session(session_id: str, user_id: str | None = None) -> bool:
+    """Delete a chat session from both cache and database.
+
+    Args:
+        session_id: The session ID to delete.
+        user_id: If provided, validates that the session belongs to this user
+            before deletion. This prevents unauthorized deletion.
+
+    Returns:
+        True if deleted successfully, False otherwise.
+    """
+    # Delete from database first (with optional user_id validation)
+    # This confirms ownership before invalidating cache
+    deleted = await chat_db.delete_chat_session(session_id, user_id)
+
+    if not deleted:
+        return False
+
+    # Only invalidate cache and clean up lock after DB confirms deletion
+    try:
+        redis_key = _get_session_cache_key(session_id)
+        async_redis = await get_redis_async()
+        await async_redis.delete(redis_key)
+    except Exception as e:
+        logger.warning(f"Failed to delete session {session_id} from cache: {e}")
+
+    # Clean up session lock (belt-and-suspenders with WeakValueDictionary)
+    async with _session_locks_mutex:
+        _session_locks.pop(session_id, None)
+
+    return True
+
+
+async def update_session_title(session_id: str, title: str) -> bool:
+    """Update only the title of a chat session.
+
+    This is a lightweight operation that doesn't touch messages, avoiding
+    race conditions with concurrent message updates. Use this for background
+    title generation instead of upsert_chat_session.
+
+    Args:
+        session_id: The session ID to update.
+        title: The new title to set.
+
+    Returns:
+        True if updated successfully, False otherwise.
+    """
+    try:
+        result = await chat_db.update_chat_session(session_id=session_id, title=title)
+        if result is None:
+            logger.warning(f"Session {session_id} not found for title update")
+            return False
+
+        # Invalidate cache so next fetch gets updated title
+        try:
+            redis_key = _get_session_cache_key(session_id)
+            async_redis = await get_redis_async()
+            await async_redis.delete(redis_key)
+        except Exception as e:
+            logger.warning(f"Failed to invalidate cache for session {session_id}: {e}")
+
+        return True
+    except Exception as e:
+        logger.error(f"Failed to update title for session {session_id}: {e}")
+        return False
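Review note: the read path in `get_chat_session` is a cache-aside lookup with an ownership check on both branches. A minimal sketch of that control flow follows, assuming plain dictionaries in place of Redis and Prisma; `CACHE`, `DB`, `_owned_by`, and `get_session` are hypothetical names used only for illustration and are not part of this change.

```python
import asyncio
from typing import Optional

# Hypothetical in-memory stand-ins for the Redis cache and the database.
CACHE: dict = {}
DB: dict = {"s1": {"session_id": "s1", "user_id": "u1"}}


def _owned_by(session: dict, user_id: Optional[str]) -> bool:
    # user_id=None means "skip ownership validation" (admin/system access).
    return user_id is None or session["user_id"] == user_id


async def get_session(session_id: str, user_id: Optional[str] = None) -> Optional[dict]:
    # 1. Try the cache first.
    cached = CACHE.get(session_id)
    if cached is not None:
        return cached if _owned_by(cached, user_id) else None

    # 2. Fall back to the database.
    session = DB.get(session_id)
    if session is None or not _owned_by(session, user_id):
        return None

    # 3. Populate the cache only after the ownership check has passed.
    CACHE[session_id] = session
    return session


if __name__ == "__main__":
    print(asyncio.run(get_session("s1", "u1")))
```

In this sketch a caller with the wrong `user_id` gets `None` and the cache is left untouched, which mirrors the order of the checks in the diff above.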
--- a/autogpt_platform/backend/backend/api/features/chat/model_test.py
+++ b/autogpt_platform/backend/backend/api/features/chat/model_test.py
@@ -0,0 +1,119 @@
+import pytest
+
+from .model import (
+    ChatMessage,
+    ChatSession,
+    Usage,
+    get_chat_session,
+    upsert_chat_session,
+)
+
+messages = [
+    ChatMessage(content="Hello, how are you?", role="user"),
+    ChatMessage(
+        content="I'm fine, thank you!",
+        role="assistant",
+        tool_calls=[
+            {
+                "id": "t123",
+                "type": "function",
+                "function": {
+                    "name": "get_weather",
+                    "arguments": '{"city": "New York"}',
+                },
+            }
+        ],
+    ),
+    ChatMessage(
+        content="I'm using the tool to get the weather",
+        role="tool",
+        tool_call_id="t123",
+    ),
+]
+
+
+@pytest.mark.asyncio(loop_scope="session")
+async def test_chatsession_serialization_deserialization():
+    s = ChatSession.new(user_id="abc123")
+    s.messages = messages
+    s.usage = [Usage(prompt_tokens=100, completion_tokens=200, total_tokens=300)]
+    serialized = s.model_dump_json()
+    s2 = ChatSession.model_validate_json(serialized)
+    assert s2.model_dump() == s.model_dump()
+
+
+@pytest.mark.asyncio(loop_scope="session")
+async def test_chatsession_redis_storage(setup_test_user, test_user_id):
+
+    s = ChatSession.new(user_id=test_user_id)
+    s.messages = messages
+
+    s = await upsert_chat_session(s)
+
+    s2 = await get_chat_session(
+        session_id=s.session_id,
+        user_id=s.user_id,
+    )
+
+    assert s2 == s
+
+
+@pytest.mark.asyncio(loop_scope="session")
+async def test_chatsession_redis_storage_user_id_mismatch(
+    setup_test_user, test_user_id
+):
+
+    s = ChatSession.new(user_id=test_user_id)
+    s.messages = messages
+    s = await upsert_chat_session(s)
+
+    s2 = await get_chat_session(s.session_id, "different_user_id")
+
+    assert s2 is None
+
+
+@pytest.mark.asyncio(loop_scope="session")
+async def test_chatsession_db_storage(setup_test_user, test_user_id):
+    """Test that messages are correctly saved to and loaded from DB (not cache)."""
+    from backend.data.redis_client import get_redis_async
+
+    # Create session with messages including assistant message
+    s = ChatSession.new(user_id=test_user_id)
+    s.messages = messages  # Contains user, assistant, and tool messages
+    assert s.session_id is not None, "Session id is not set"
+    # Upsert to save to both cache and DB
+    s = await upsert_chat_session(s)
+
+    # Clear the Redis cache to force DB load
+    redis_key = f"chat:session:{s.session_id}"
+    async_redis = await get_redis_async()
+    await async_redis.delete(redis_key)
+
+    # Load from DB (cache was cleared)
+    s2 = await get_chat_session(
+        session_id=s.session_id,
+        user_id=s.user_id,
+    )
+
+    assert s2 is not None, "Session not found after loading from DB"
+    assert len(s2.messages) == len(
+        s.messages
+    ), f"Message count mismatch: expected {len(s.messages)}, got {len(s2.messages)}"
+
+    # Verify all roles are present
+    roles = [m.role for m in s2.messages]
+    assert "user" in roles, f"User message missing. Roles found: {roles}"
+    assert "assistant" in roles, f"Assistant message missing. Roles found: {roles}"
+    assert "tool" in roles, f"Tool message missing. Roles found: {roles}"
+
+    # Verify message content
+    for orig, loaded in zip(s.messages, s2.messages):
+        assert orig.role == loaded.role, f"Role mismatch: {orig.role} != {loaded.role}"
+        assert (
+            orig.content == loaded.content
+        ), f"Content mismatch for {orig.role}: {orig.content} != {loaded.content}"
+        if orig.tool_calls:
+            assert (
+                loaded.tool_calls is not None
+            ), f"Tool calls missing for {orig.role} message"
+            assert len(orig.tool_calls) == len(loaded.tool_calls)
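Review note: the DB round trip tested above depends on `_save_session_to_db` writing only the messages beyond the count already stored. A rough sketch of that slicing step is below, assuming a plain list in place of the message table; `append_new_messages` is a hypothetical helper, not a function from this change.

```python
def append_new_messages(stored: list, session_messages: list) -> int:
    """Append only the messages that are not yet stored; return how many were added."""
    existing_count = len(stored)
    # Same slice as the diff: everything after the already-persisted prefix.
    new_messages = session_messages[existing_count:]
    stored.extend(new_messages)
    return len(new_messages)


if __name__ == "__main__":
    table = ["user: hi"]
    added = append_new_messages(table, ["user: hi", "assistant: hello", "tool: ok"])
    print(added, table)
```

The approach assumes the in-memory message list is append-only. If earlier messages were edited or removed, a count-based slice would not detect it.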
--- a/autogpt_platform/backend/backend/api/features/chat/response_model.py
+++ b/autogpt_platform/backend/backend/api/features/chat/response_model.py
@@ -0,0 +1,162 @@
+"""
+Response models for Vercel AI SDK UI Stream Protocol.
+
+This module implements the AI SDK UI Stream Protocol (v1) for streaming chat responses.
+See: https://ai-sdk.dev/docs/ai-sdk-ui/stream-protocol
+"""
+
+from enum import Enum
+from typing import Any
+
+from pydantic import BaseModel, Field
+
+
+class ResponseType(str, Enum):
+    """Types of streaming responses following AI SDK protocol."""
+
+    # Message lifecycle
+    START = "start"
+    FINISH = "finish"
+
+    # Text streaming
+    TEXT_START = "text-start"
+    TEXT_DELTA = "text-delta"
+    TEXT_END = "text-end"
+
+    # Tool interaction
+    TOOL_INPUT_START = "tool-input-start"
+    TOOL_INPUT_AVAILABLE = "tool-input-available"
+    TOOL_OUTPUT_AVAILABLE = "tool-output-available"
+
+    # Other
+    ERROR = "error"
+    USAGE = "usage"
+    HEARTBEAT = "heartbeat"
+
+
+class StreamBaseResponse(BaseModel):
+    """Base response model for all streaming responses."""
+
+    type: ResponseType
+
+    def to_sse(self) -> str:
+        """Convert to SSE format."""
+        return f"data: {self.model_dump_json()}\n\n"
+
+
+# ========== Message Lifecycle ==========
+
+
+class StreamStart(StreamBaseResponse):
+    """Start of a new message."""
+
+    type: ResponseType = ResponseType.START
+    messageId: str = Field(..., description="Unique message ID")
+
+
+class StreamFinish(StreamBaseResponse):
+    """End of message/stream."""
+
+    type: ResponseType = ResponseType.FINISH
+
+
+# ========== Text Streaming ==========
+
+
+class StreamTextStart(StreamBaseResponse):
+    """Start of a text block."""
+
+    type: ResponseType = ResponseType.TEXT_START
+    id: str = Field(..., description="Text block ID")
+
+
+class StreamTextDelta(StreamBaseResponse):
+    """Streaming text content delta."""
+
+    type: ResponseType = ResponseType.TEXT_DELTA
+    id: str = Field(..., description="Text block ID")
+    delta: str = Field(..., description="Text content delta")
+
+
+class StreamTextEnd(StreamBaseResponse):
+    """End of a text block."""
+
+    type: ResponseType = ResponseType.TEXT_END
+    id: str = Field(..., description="Text block ID")
+
+
+# ========== Tool Interaction ==========
+
+
+class StreamToolInputStart(StreamBaseResponse):
+    """Tool call started notification."""
+
+    type: ResponseType = ResponseType.TOOL_INPUT_START
+    toolCallId: str = Field(..., description="Unique tool call ID")
+    toolName: str = Field(..., description="Name of the tool being called")
+
+
+class StreamToolInputAvailable(StreamBaseResponse):
+    """Tool input is ready for execution."""
+
+    type: ResponseType = ResponseType.TOOL_INPUT_AVAILABLE
+    toolCallId: str = Field(..., description="Unique tool call ID")
+    toolName: str = Field(..., description="Name of the tool being called")
+    input: dict[str, Any] = Field(
+        default_factory=dict, description="Tool input arguments"
+    )
+
+
+class StreamToolOutputAvailable(StreamBaseResponse):
+    """Tool execution result."""
+
+    type: ResponseType = ResponseType.TOOL_OUTPUT_AVAILABLE
+    toolCallId: str = Field(..., description="Tool call ID this responds to")
+    output: str | dict[str, Any] = Field(..., description="Tool execution output")
+    # Additional fields for internal use (not part of AI SDK spec but useful)
+    toolName: str | None = Field(
+        default=None, description="Name of the tool that was executed"
+    )
+    success: bool = Field(
+        default=True, description="Whether the tool execution succeeded"
+    )
+
+
+# ========== Other ==========
+
+
+class StreamUsage(StreamBaseResponse):
+    """Token usage statistics."""
+
+    type: ResponseType = ResponseType.USAGE
+    promptTokens: int = Field(..., description="Number of prompt tokens")
+    completionTokens: int = Field(..., description="Number of completion tokens")
+    totalTokens: int = Field(..., description="Total number of tokens")
+
+
+class StreamError(StreamBaseResponse):
+    """Error response."""
+
+    type: ResponseType = ResponseType.ERROR
+    errorText: str = Field(..., description="Error message text")
+    code: str | None = Field(default=None, description="Error code")
+    details: dict[str, Any] | None = Field(
+        default=None, description="Additional error details"
+    )
+
+
+class StreamHeartbeat(StreamBaseResponse):
+    """Heartbeat to keep SSE connection alive during long-running operations.
+
+    Uses SSE comment format (: comment) which is ignored by clients but keeps
+    the connection alive through proxies and load balancers.
+    """
+
+    type: ResponseType = ResponseType.HEARTBEAT
+    toolCallId: str | None = Field(
+        default=None, description="Tool call ID if heartbeat is for a specific tool"
+    )
+
+    def to_sse(self) -> str:
+        """Convert to SSE comment format to keep connection alive."""
+        return ": heartbeat\n\n"
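Review note: `to_sse` emits one `data:` frame per event, while `StreamHeartbeat` overrides it to emit an SSE comment line. The sketch below shows how a client-side parser would see such a stream, assuming standard-library `json` in place of the Pydantic models; `to_sse`, `heartbeat`, and `parse_sse` here are hypothetical stand-ins written for illustration.

```python
import json


def to_sse(payload: dict) -> str:
    # One SSE event: a "data:" line followed by a blank line.
    return f"data: {json.dumps(payload)}\n\n"


def heartbeat() -> str:
    # Lines starting with ":" are SSE comments and carry no event data.
    return ": heartbeat\n\n"


def parse_sse(stream: str) -> list:
    """Collect JSON payloads from data frames, skipping comment frames."""
    events = []
    for frame in stream.split("\n\n"):
        if frame.startswith("data: "):
            events.append(json.loads(frame[len("data: "):]))
    return events


if __name__ == "__main__":
    stream = (
        to_sse({"type": "text-delta", "id": "t1", "delta": "Hi"})
        + heartbeat()
        + to_sse({"type": "finish"})
    )
    print(parse_sse(stream))
```

Because `json.dumps` escapes newlines inside string values, a delta containing blank lines does not terminate the frame early in this sketch.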
--- a/autogpt_platform/backend/backend/api/features/chat/routes.py
+++ b/autogpt_platform/backend/backend/api/features/chat/routes.py
--- a/autogpt_platform/backend/backend/api/features/chat/routes_test.py
+++ b/autogpt_platform/backend/backend/api/features/chat/routes_test.py
@@ -1,581 +0,0 @@
-"""Tests for chat API routes: session title update, file attachment validation, usage, and rate limiting."""
-
-from datetime import UTC, datetime, timedelta
-from unittest.mock import AsyncMock, MagicMock
-
-import fastapi
-import fastapi.testclient
-import pytest
-import pytest_mock
-
-from backend.api.features.chat import routes as chat_routes
-from backend.copilot.rate_limit import SubscriptionTier
-
-app = fastapi.FastAPI()
-app.include_router(chat_routes.router)
-
-client = fastapi.testclient.TestClient(app)
-
-TEST_USER_ID = "3e53486c-cf57-477e-ba2a-cb02dc828e1a"
-
-
-@pytest.fixture(autouse=True)
-def setup_app_auth(mock_jwt_user):
-    """Setup auth overrides for all tests in this module"""
-    from autogpt_libs.auth.jwt_utils import get_jwt_payload
-
-    app.dependency_overrides[get_jwt_payload] = mock_jwt_user["get_jwt_payload"]
-    yield
-    app.dependency_overrides.clear()
-
-
-def _mock_update_session_title(
-    mocker: pytest_mock.MockerFixture, *, success: bool = True
-):
-    """Mock update_session_title."""
-    return mocker.patch(
-        "backend.api.features.chat.routes.update_session_title",
-        new_callable=AsyncMock,
-        return_value=success,
-    )
-
-
-# ─── Update title: success ─────────────────────────────────────────────
-
-
-def test_update_title_success(
-    mocker: pytest_mock.MockerFixture,
-    test_user_id: str,
-) -> None:
-    mock_update = _mock_update_session_title(mocker, success=True)
-
-    response = client.patch(
-        "/sessions/sess-1/title",
-        json={"title": "My project"},
-    )
-
-    assert response.status_code == 200
-    assert response.json() == {"status": "ok"}
-    mock_update.assert_called_once_with("sess-1", test_user_id, "My project")
-
-
-def test_update_title_trims_whitespace(
-    mocker: pytest_mock.MockerFixture,
-    test_user_id: str,
-) -> None:
-    mock_update = _mock_update_session_title(mocker, success=True)
-
-    response = client.patch(
-        "/sessions/sess-1/title",
-        json={"title": "  trimmed  "},
-    )
-
-    assert response.status_code == 200
-    mock_update.assert_called_once_with("sess-1", test_user_id, "trimmed")
-
-
-# ─── Update title: blank / whitespace-only → 422 ──────────────────────
-
-
-def test_update_title_blank_rejected(
-    test_user_id: str,
-) -> None:
-    """Whitespace-only titles must be rejected before hitting the DB."""
-    response = client.patch(
-        "/sessions/sess-1/title",
-        json={"title": "   "},
-    )
-
-    assert response.status_code == 422
-
-
-def test_update_title_empty_rejected(
-    test_user_id: str,
-) -> None:
-    response = client.patch(
-        "/sessions/sess-1/title",
-        json={"title": ""},
-    )
-
-    assert response.status_code == 422
-
-
-# ─── Update title: session not found or wrong user → 404 ──────────────
-
-
-def test_update_title_not_found(
-    mocker: pytest_mock.MockerFixture,
-    test_user_id: str,
-) -> None:
-    _mock_update_session_title(mocker, success=False)
-
-    response = client.patch(
-        "/sessions/sess-1/title",
-        json={"title": "New name"},
-    )
-
-    assert response.status_code == 404
-
-
-# ─── file_ids Pydantic validation ─────────────────────────────────────
-
-
-def test_stream_chat_rejects_too_many_file_ids():
-    """More than 20 file_ids should be rejected by Pydantic validation (422)."""
-    response = client.post(
-        "/sessions/sess-1/stream",
-        json={
-            "message": "hello",
-            "file_ids": [f"00000000-0000-0000-0000-{i:012d}" for i in range(21)],
-        },
-    )
-    assert response.status_code == 422
-
-
-def _mock_stream_internals(mocker: pytest_mock.MockFixture):
-    """Mock the async internals of stream_chat_post so tests can exercise
-    validation and enrichment logic without needing Redis/RabbitMQ."""
-    mocker.patch(
-        "backend.api.features.chat.routes._validate_and_get_session",
-        return_value=None,
-    )
-    mocker.patch(
-        "backend.api.features.chat.routes.append_and_save_message",
-        return_value=None,
-    )
-    mock_registry = mocker.MagicMock()
-    mock_registry.create_session = mocker.AsyncMock(return_value=None)
-    mocker.patch(
-        "backend.api.features.chat.routes.stream_registry",
-        mock_registry,
-    )
-    mocker.patch(
-        "backend.api.features.chat.routes.enqueue_copilot_turn",
-        return_value=None,
-    )
-    mocker.patch(
-        "backend.api.features.chat.routes.track_user_message",
-        return_value=None,
-    )
-
-
-def test_stream_chat_accepts_20_file_ids(mocker: pytest_mock.MockFixture):
-    """Exactly 20 file_ids should be accepted (not rejected by validation)."""
-    _mock_stream_internals(mocker)
-    # Patch workspace lookup as imported by the routes module
-    mocker.patch(
-        "backend.api.features.chat.routes.get_or_create_workspace",
-        return_value=type("W", (), {"id": "ws-1"})(),
-    )
-    mock_prisma = mocker.MagicMock()
-    mock_prisma.find_many = mocker.AsyncMock(return_value=[])
-    mocker.patch(
-        "prisma.models.UserWorkspaceFile.prisma",
-        return_value=mock_prisma,
-    )
-
-    response = client.post(
-        "/sessions/sess-1/stream",
-        json={
-            "message": "hello",
-            "file_ids": [f"00000000-0000-0000-0000-{i:012d}" for i in range(20)],
-        },
-    )
-    # Should get past validation — 200 streaming response expected
-    assert response.status_code == 200
-
-
-# ─── UUID format filtering ─────────────────────────────────────────────
-
-
-def test_file_ids_filters_invalid_uuids(mocker: pytest_mock.MockFixture):
-    """Non-UUID strings in file_ids should be silently filtered out
-    and NOT passed to the database query."""
-    _mock_stream_internals(mocker)
-    mocker.patch(
-        "backend.api.features.chat.routes.get_or_create_workspace",
-        return_value=type("W", (), {"id": "ws-1"})(),
-    )
-
-    mock_prisma = mocker.MagicMock()
-    mock_prisma.find_many = mocker.AsyncMock(return_value=[])
-    mocker.patch(
-        "prisma.models.UserWorkspaceFile.prisma",
-        return_value=mock_prisma,
-    )
-
-    valid_id = "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
-    client.post(
-        "/sessions/sess-1/stream",
-        json={
-            "message": "hello",
-            "file_ids": [
-                valid_id,
-                "not-a-uuid",
-                "../../../etc/passwd",
-                "",
-            ],
-        },
-    )
-
-    # The find_many call should only receive the one valid UUID
-    mock_prisma.find_many.assert_called_once()
-    call_kwargs = mock_prisma.find_many.call_args[1]
-    assert call_kwargs["where"]["id"]["in"] == [valid_id]
-
-
-# ─── Cross-workspace file_ids ─────────────────────────────────────────
-
-
-def test_file_ids_scoped_to_workspace(mocker: pytest_mock.MockFixture):
-    """The batch query should scope to the user's workspace."""
-    _mock_stream_internals(mocker)
-    mocker.patch(
-        "backend.api.features.chat.routes.get_or_create_workspace",
-        return_value=type("W", (), {"id": "my-workspace-id"})(),
-    )
-
-    mock_prisma = mocker.MagicMock()
-    mock_prisma.find_many = mocker.AsyncMock(return_value=[])
-    mocker.patch(
-        "prisma.models.UserWorkspaceFile.prisma",
-        return_value=mock_prisma,
-    )
-
-    fid = "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
-    client.post(
-        "/sessions/sess-1/stream",
-        json={"message": "hi", "file_ids": [fid]},
-    )
-
-    call_kwargs = mock_prisma.find_many.call_args[1]
-    assert call_kwargs["where"]["workspaceId"] == "my-workspace-id"
-    assert call_kwargs["where"]["isDeleted"] is False
-
-
-# ─── Rate limit → 429 ─────────────────────────────────────────────────
-
-
-def test_stream_chat_returns_429_on_daily_rate_limit(mocker: pytest_mock.MockFixture):
-    """When check_rate_limit raises RateLimitExceeded for daily limit the endpoint returns 429."""
-    from backend.copilot.rate_limit import RateLimitExceeded
-
-    _mock_stream_internals(mocker)
-    # Ensure the rate-limit branch is entered by setting a non-zero limit.
-    mocker.patch.object(chat_routes.config, "daily_token_limit", 10000)
-    mocker.patch.object(chat_routes.config, "weekly_token_limit", 50000)
-    mocker.patch(
-        "backend.api.features.chat.routes.check_rate_limit",
-        side_effect=RateLimitExceeded("daily", datetime.now(UTC) + timedelta(hours=1)),
-    )
-
-    response = client.post(
-        "/sessions/sess-1/stream",
-        json={"message": "hello"},
-    )
-    assert response.status_code == 429
-    assert "daily" in response.json()["detail"].lower()
-
-
-def test_stream_chat_returns_429_on_weekly_rate_limit(mocker: pytest_mock.MockFixture):
-    """When check_rate_limit raises RateLimitExceeded for weekly limit the endpoint returns 429."""
-    from backend.copilot.rate_limit import RateLimitExceeded
-
-    _mock_stream_internals(mocker)
-    mocker.patch.object(chat_routes.config, "daily_token_limit", 10000)
-    mocker.patch.object(chat_routes.config, "weekly_token_limit", 50000)
-    resets_at = datetime.now(UTC) + timedelta(days=3)
-    mocker.patch(
-        "backend.api.features.chat.routes.check_rate_limit",
-        side_effect=RateLimitExceeded("weekly", resets_at),
-    )
-
-    response = client.post(
-        "/sessions/sess-1/stream",
-        json={"message": "hello"},
-    )
-    assert response.status_code == 429
-    detail = response.json()["detail"].lower()
-    assert "weekly" in detail
-    assert "resets in" in detail
-
-
-def test_stream_chat_429_includes_reset_time(mocker: pytest_mock.MockFixture):
-    """The 429 response detail should include the human-readable reset time."""
-    from backend.copilot.rate_limit import RateLimitExceeded
-
-    _mock_stream_internals(mocker)
-    mocker.patch.object(chat_routes.config, "daily_token_limit", 10000)
-    mocker.patch.object(chat_routes.config, "weekly_token_limit", 50000)
-    mocker.patch(
-        "backend.api.features.chat.routes.check_rate_limit",
-        side_effect=RateLimitExceeded(
-            "daily", datetime.now(UTC) + timedelta(hours=2, minutes=30)
-        ),
-    )
-
-    response = client.post(
-        "/sessions/sess-1/stream",
-        json={"message": "hello"},
-    )
-    assert response.status_code == 429
-    detail = response.json()["detail"]
-    assert "2h" in detail
-    assert "Resets in" in detail
-
-
-# ─── Usage endpoint ───────────────────────────────────────────────────
-
-
-def _mock_usage(
-    mocker: pytest_mock.MockerFixture,
-    *,
-    daily_used: int = 500,
-    weekly_used: int = 2000,
-    daily_limit: int = 10000,
-    weekly_limit: int = 50000,
-    tier: "SubscriptionTier" = SubscriptionTier.FREE,
-) -> AsyncMock:
-    """Mock get_usage_status and get_global_rate_limits for usage endpoint tests.
-
-    Mocks both ``get_global_rate_limits`` (returns the given limits + tier) and
-    ``get_usage_status`` so that tests exercise the endpoint without hitting
-    LaunchDarkly or Prisma.
-    """
-    from backend.copilot.rate_limit import CoPilotUsageStatus, UsageWindow
-
-    mocker.patch(
-        "backend.api.features.chat.routes.get_global_rate_limits",
-        new_callable=AsyncMock,
-        return_value=(daily_limit, weekly_limit, tier),
-    )
-
-    resets_at = datetime.now(UTC) + timedelta(days=1)
-    status = CoPilotUsageStatus(
-        daily=UsageWindow(used=daily_used, limit=daily_limit, resets_at=resets_at),
-        weekly=UsageWindow(used=weekly_used, limit=weekly_limit, resets_at=resets_at),
-    )
-    return mocker.patch(
-        "backend.api.features.chat.routes.get_usage_status",
-        new_callable=AsyncMock,
-        return_value=status,
-    )
-
-
-def test_usage_returns_daily_and_weekly(
-    mocker: pytest_mock.MockerFixture,
-    test_user_id: str,
-) -> None:
-    """GET /usage returns daily and weekly usage."""
-    mock_get = _mock_usage(mocker, daily_used=500, weekly_used=2000)
-
-    mocker.patch.object(chat_routes.config, "daily_token_limit", 10000)
-    mocker.patch.object(chat_routes.config, "weekly_token_limit", 50000)
-
-    response = client.get("/usage")
-
-    assert response.status_code == 200
-    data = response.json()
-    assert data["daily"]["used"] == 500
-    assert data["weekly"]["used"] == 2000
-
-    mock_get.assert_called_once_with(
-        user_id=test_user_id,
-        daily_token_limit=10000,
-        weekly_token_limit=50000,
-        rate_limit_reset_cost=chat_routes.config.rate_limit_reset_cost,
-        tier=SubscriptionTier.FREE,
-    )
-
-
-def test_usage_uses_config_limits(
-    mocker: pytest_mock.MockerFixture,
-    test_user_id: str,
-) -> None:
-    """The endpoint forwards resolved limits from get_global_rate_limits to get_usage_status."""
-    mock_get = _mock_usage(mocker, daily_limit=99999, weekly_limit=77777)
-
-    mocker.patch.object(chat_routes.config, "rate_limit_reset_cost", 500)
-
-    response = client.get("/usage")
-
-    assert response.status_code == 200
-    mock_get.assert_called_once_with(
-        user_id=test_user_id,
-        daily_token_limit=99999,
-        weekly_token_limit=77777,
-        rate_limit_reset_cost=500,
-        tier=SubscriptionTier.FREE,
-    )
-
-
-def test_usage_rejects_unauthenticated_request() -> None:
-    """GET /usage should return 401 when no valid JWT is provided."""
-    unauthenticated_app = fastapi.FastAPI()
-    unauthenticated_app.include_router(chat_routes.router)
-    unauthenticated_client = fastapi.testclient.TestClient(unauthenticated_app)
-
-    response = unauthenticated_client.get("/usage")
-
-    assert response.status_code == 401
-
-
-# ─── Suggested prompts endpoint ──────────────────────────────────────
-
-
-def _mock_get_business_understanding(
-    mocker: pytest_mock.MockerFixture,
-    *,
-    return_value=None,
-):
-    """Mock get_business_understanding."""
-    return mocker.patch(
-        "backend.api.features.chat.routes.get_business_understanding",
-        new_callable=AsyncMock,
-        return_value=return_value,
-    )
-
-
-def test_suggested_prompts_returns_themes(
-    mocker: pytest_mock.MockerFixture,
-    test_user_id: str,
-) -> None:
-    """User with themed prompts gets them back as themes list."""
-    mock_understanding = MagicMock()
-    mock_understanding.suggested_prompts = {
-        "Learn": ["L1", "L2"],
-        "Create": ["C1"],
-    }
-    _mock_get_business_understanding(mocker, return_value=mock_understanding)
-
-    response = client.get("/suggested-prompts")
-
-    assert response.status_code == 200
-    data = response.json()
-    assert "themes" in data
-    themes_by_name = {t["name"]: t["prompts"] for t in data["themes"]}
-    assert themes_by_name["Learn"] == ["L1", "L2"]
-    assert themes_by_name["Create"] == ["C1"]
-
-
-def test_suggested_prompts_no_understanding(
-    mocker: pytest_mock.MockerFixture,
-    test_user_id: str,
-) -> None:
-    """User with no understanding gets empty themes list."""
-    _mock_get_business_understanding(mocker, return_value=None)
-
-    response = client.get("/suggested-prompts")
-
-    assert response.status_code == 200
-    assert response.json() == {"themes": []}
-
-
-def test_suggested_prompts_empty_prompts(
-    mocker: pytest_mock.MockerFixture,
-    test_user_id: str,
-) -> None:
-    """User with understanding but empty prompts gets empty themes list."""
-    mock_understanding = MagicMock()
-    mock_understanding.suggested_prompts = {}
-    _mock_get_business_understanding(mocker, return_value=mock_understanding)
-
-    response = client.get("/suggested-prompts")
-
-    assert response.status_code == 200
-    assert response.json() == {"themes": []}
-
-
-# ─── Create session: dry_run contract ─────────────────────────────────
-
-
-def _mock_create_chat_session(mocker: pytest_mock.MockerFixture):
-    """Mock create_chat_session to return a fake session."""
-    from backend.copilot.model import ChatSession
-
-    async def _fake_create(user_id: str, *, dry_run: bool):
-        return ChatSession.new(user_id, dry_run=dry_run)
-
-    return mocker.patch(
-        "backend.api.features.chat.routes.create_chat_session",
-        new_callable=AsyncMock,
-        side_effect=_fake_create,
-    )
-
-
-def test_create_session_dry_run_true(
-    mocker: pytest_mock.MockerFixture,
-    test_user_id: str,
-) -> None:
-    """Sending ``{"dry_run": true}`` sets metadata.dry_run to True."""
-    _mock_create_chat_session(mocker)
-
-    response = client.post("/sessions", json={"dry_run": True})
-
-    assert response.status_code == 200
-    assert response.json()["metadata"]["dry_run"] is True
-
-
-def test_create_session_dry_run_default_false(
-    mocker: pytest_mock.MockerFixture,
-    test_user_id: str,
-) -> None:
-    """Empty body defaults dry_run to False."""
-    _mock_create_chat_session(mocker)
-
-    response = client.post("/sessions")
-
-    assert response.status_code == 200
-    assert response.json()["metadata"]["dry_run"] is False
-
-
-def test_create_session_rejects_nested_metadata(
-    test_user_id: str,
-) -> None:
-    """Sending ``{"metadata": {"dry_run": true}}`` must return 422, not silently
-    default to ``dry_run=False``. This guards against the common mistake of
-    nesting dry_run inside metadata instead of providing it at the top level."""
-    response = client.post(
-        "/sessions",
-        json={"metadata": {"dry_run": True}},
-    )
-
-    assert response.status_code == 422
-
-
-class TestStreamChatRequestModeValidation:
-    """Pydantic-level validation of the ``mode`` field on StreamChatRequest."""
-
-    def test_rejects_invalid_mode_value(self) -> None:
-        """Any string outside the Literal set must raise ValidationError."""
-        from pydantic import ValidationError
-
-        from backend.api.features.chat.routes import StreamChatRequest
-
-        with pytest.raises(ValidationError):
-            StreamChatRequest(message="hi", mode="turbo")  # type: ignore[arg-type]
-
-    def test_accepts_fast_mode(self) -> None:
-        from backend.api.features.chat.routes import StreamChatRequest
-
-        req = StreamChatRequest(message="hi", mode="fast")
-        assert req.mode == "fast"
-
-    def test_accepts_extended_thinking_mode(self) -> None:
-        from backend.api.features.chat.routes import StreamChatRequest
-
-        req = StreamChatRequest(message="hi", mode="extended_thinking")
-        assert req.mode == "extended_thinking"
-
-    def test_accepts_none_mode(self) -> None:
-        """``mode=None`` is valid (server decides via feature flags)."""
-        from backend.api.features.chat.routes import StreamChatRequest
-
-        req = StreamChatRequest(message="hi", mode=None)
-        assert req.mode is None
-
-    def test_mode_defaults_to_none_when_omitted(self) -> None:
-        from backend.api.features.chat.routes import StreamChatRequest
-
-        req = StreamChatRequest(message="hi")
-        assert req.mode is None
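Reviewer note on the file_ids tests in this file: test_file_ids_filters_invalid_uuids expects that only well-formed UUIDs reach the find_many query. The route's actual filter is not part of this diff, so the following is only a minimal sketch of one way to get that behavior with the standard library. The helper name filter_valid_uuids is hypothetical, and rejecting non-canonical spellings (braces, urn: prefixes, missing hyphens) is an assumption, not something the tests require.

```python
import uuid


def filter_valid_uuids(file_ids: list[str]) -> list[str]:
    """Keep only canonical hyphenated UUID strings (hypothetical helper)."""
    valid: list[str] = []
    for candidate in file_ids:
        try:
            parsed = uuid.UUID(candidate)
        except (ValueError, AttributeError, TypeError):
            # Malformed strings and non-string values are dropped silently.
            continue
        # uuid.UUID also accepts braces, "urn:uuid:" prefixes and bare hex,
        # so compare against the canonical form to reject those variants.
        if str(parsed) == candidate.lower():
            valid.append(candidate)
    return valid


ids = filter_valid_uuids(
    [
        "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
        "not-a-uuid",
        "../../../etc/passwd",
        "",
    ]
)
print(ids)
```

Filtering before the query keeps path-like or empty strings out of the database layer entirely, which is what the assertion on call_kwargs["where"]["id"]["in"] checks.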
--- a/autogpt_platform/backend/backend/api/features/chat/service.py
+++ b/autogpt_platform/backend/backend/api/features/chat/service.py
--- a/autogpt_platform/backend/backend/api/features/chat/service_test.py
+++ b/autogpt_platform/backend/backend/api/features/chat/service_test.py
@@ -0,0 +1,82 @@
+import logging
+from os import getenv
+
+import pytest
+
+from . import service as chat_service
+from .model import create_chat_session, get_chat_session, upsert_chat_session
+from .response_model import (
+    StreamError,
+    StreamFinish,
+    StreamTextDelta,
+    StreamToolOutputAvailable,
+)
+
+logger = logging.getLogger(__name__)
+
+
+@pytest.mark.asyncio(loop_scope="session")
+async def test_stream_chat_completion(setup_test_user, test_user_id):
+    """
+    Test that stream_chat_completion streams a non-empty reply and finishes without errors.
+    """
+    api_key: str | None = getenv("OPEN_ROUTER_API_KEY")
+    if not api_key:
+        pytest.skip("OPEN_ROUTER_API_KEY is not set, skipping test")
+
+    session = await create_chat_session(test_user_id)
+
+    has_errors = False
+    has_ended = False
+    assistant_message = ""
+    async for chunk in chat_service.stream_chat_completion(
+        session.session_id, "Hello, how are you?", user_id=session.user_id
+    ):
+        logger.info(chunk)
+        if isinstance(chunk, StreamError):
+            has_errors = True
+        if isinstance(chunk, StreamTextDelta):
+            assistant_message += chunk.delta
+        if isinstance(chunk, StreamFinish):
+            has_ended = True
+
+    assert has_ended, "Chat completion did not end"
+    assert not has_errors, "Error occurred while streaming chat completion"
+    assert assistant_message, "Assistant message is empty"
+
+
+@pytest.mark.asyncio(loop_scope="session")
+async def test_stream_chat_completion_with_tool_calls(setup_test_user, test_user_id):
+    """
+    Test that stream_chat_completion emits tool output and records usage when a tool call is triggered.
+    """
+    api_key: str | None = getenv("OPEN_ROUTER_API_KEY")
+    if not api_key:
+        pytest.skip("OPEN_ROUTER_API_KEY is not set, skipping test")
+
+    session = await create_chat_session(test_user_id)
+    session = await upsert_chat_session(session)
+
+    has_errors = False
+    has_ended = False
+    had_tool_calls = False
+    async for chunk in chat_service.stream_chat_completion(
+        session.session_id,
+        "Please find me an agent that can help me with my business. Use the query 'money printing agent'",
+        user_id=session.user_id,
+    ):
+        logger.info(chunk)
+        if isinstance(chunk, StreamError):
+            has_errors = True
+
+        if isinstance(chunk, StreamFinish):
+            has_ended = True
+        if isinstance(chunk, StreamToolOutputAvailable):
+            had_tool_calls = True
+
+    assert has_ended, "Chat completion did not end"
+    assert not has_errors, "Error occurred while streaming chat completion"
+    assert had_tool_calls, "Tool calls did not occur"
+    session = await get_chat_session(session.session_id)
+    assert session, "Session not found"
+    assert session.usage, "Usage is empty"
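Reviewer note: both tests above need OPEN_ROUTER_API_KEY and a live model, so the chunk-handling loop itself is never exercised offline. A minimal sketch of the same loop against a fake async stream follows. The chunk classes here are hypothetical stand-ins for StreamTextDelta, StreamFinish and StreamError, and fake_stream and summarize are illustrative names, not part of the codebase.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator


@dataclass
class TextDelta:  # hypothetical stand-in for StreamTextDelta
    delta: str


@dataclass
class Finish:  # hypothetical stand-in for StreamFinish
    pass


@dataclass
class Error:  # hypothetical stand-in for StreamError
    message: str


@dataclass
class StreamSummary:
    text: str = ""
    has_errors: bool = False
    has_ended: bool = False


async def fake_stream() -> AsyncIterator[object]:
    """Yield a fixed sequence of chunks instead of calling a model."""
    yield TextDelta("Hello")
    yield TextDelta(", world")
    yield Finish()


async def summarize(chunks: AsyncIterator[object]) -> StreamSummary:
    """Fold a chunk stream into the flags the tests assert on."""
    summary = StreamSummary()
    async for chunk in chunks:
        if isinstance(chunk, Error):
            summary.has_errors = True
        elif isinstance(chunk, TextDelta):
            summary.text += chunk.delta
        elif isinstance(chunk, Finish):
            summary.has_ended = True
    return summary


summary = asyncio.run(summarize(fake_stream()))
print(summary)
```

Pulling the fold into one function would let the error and finish handling be checked without network access, while the live tests keep covering the real service.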
--- a/autogpt_platform/backend/backend/api/features/chat/tools/IDEAS.md
+++ b/autogpt_platform/backend/backend/api/features/chat/tools/IDEAS.md
--- a/autogpt_platform/backend/backend/api/features/chat/tools/__init__.py
+++ b/autogpt_platform/backend/backend/api/features/chat/tools/__init__.py
@@ -0,0 +1,92 @@
+import logging
+from typing import TYPE_CHECKING, Any
+
+from openai.types.chat import ChatCompletionToolParam
+
+from backend.api.features.chat.model import ChatSession
+from backend.api.features.chat.tracking import track_tool_called
+
+from .add_understanding import AddUnderstandingTool
+from .agent_output import AgentOutputTool
+from .base import BaseTool
+from .create_agent import CreateAgentTool
+from .edit_agent import EditAgentTool
+from .find_agent import FindAgentTool
+from .find_block import FindBlockTool
+from .find_library_agent import FindLibraryAgentTool
+from .get_doc_page import GetDocPageTool
+from .run_agent import RunAgentTool
+from .run_block import RunBlockTool
+from .search_docs import SearchDocsTool
+from .workspace_files import (
+    DeleteWorkspaceFileTool,
+    ListWorkspaceFilesTool,
+    ReadWorkspaceFileTool,
+    WriteWorkspaceFileTool,
+)
+
+if TYPE_CHECKING:
+    from backend.api.features.chat.response_model import StreamToolOutputAvailable
+
+logger = logging.getLogger(__name__)
+
+# Single source of truth for all tools
+TOOL_REGISTRY: dict[str, BaseTool] = {
+    "add_understanding": AddUnderstandingTool(),
+    "create_agent": CreateAgentTool(),
+    "edit_agent": EditAgentTool(),
+    "find_agent": FindAgentTool(),
+    "find_block": FindBlockTool(),
+    "find_library_agent": FindLibraryAgentTool(),
+    "run_agent": RunAgentTool(),
+    "run_block": RunBlockTool(),
+    "view_agent_output": AgentOutputTool(),
+    "search_docs": SearchDocsTool(),
+    "get_doc_page": GetDocPageTool(),
+    # Workspace tools for CoPilot file operations
+    "list_workspace_files": ListWorkspaceFilesTool(),
+    "read_workspace_file": ReadWorkspaceFileTool(),
+    "write_workspace_file": WriteWorkspaceFileTool(),
+    "delete_workspace_file": DeleteWorkspaceFileTool(),
+}
+
+# Export individual tool instances for backwards compatibility
+find_agent_tool = TOOL_REGISTRY["find_agent"]
+run_agent_tool = TOOL_REGISTRY["run_agent"]
+
+# Generated from registry for OpenAI API
+tools: list[ChatCompletionToolParam] = [
+    tool.as_openai_tool() for tool in TOOL_REGISTRY.values()
+]
+
+
+def get_tool(tool_name: str) -> BaseTool | None:
+    """Get a tool instance by name."""
+    return TOOL_REGISTRY.get(tool_name)
+
+
+async def execute_tool(
+    tool_name: str,
+    parameters: dict[str, Any],
+    user_id: str | None,
+    session: ChatSession,
+    tool_call_id: str,
+) -> "StreamToolOutputAvailable":
+    """Execute a tool by name."""
+    tool = get_tool(tool_name)
+    if not tool:
+        raise ValueError(f"Tool {tool_name} not found")
+
+    # Track tool call in PostHog
+    logger.info(
+        f"Tracking tool call: tool={tool_name}, user={user_id}, "
+        f"session={session.session_id}, call_id={tool_call_id}"
+    )
+    track_tool_called(
+        user_id=user_id,
+        session_id=session.session_id,
+        tool_name=tool_name,
+        tool_call_id=tool_call_id,
+    )
+
+    return await tool.execute(user_id, session, tool_call_id, **parameters)
--- a/autogpt_platform/backend/backend/api/features/chat/tools/_test_data.py
+++ b/autogpt_platform/backend/backend/api/features/chat/tools/_test_data.py
@@ -1,46 +1,22 @@
-import logging
 import uuid
 from datetime import UTC, datetime
 from os import getenv

 import pytest
-import pytest_asyncio
 from prisma.types import ProfileCreateInput
 from pydantic import SecretStr

+from backend.api.features.chat.model import ChatSession
 from backend.api.features.store import db as store_db
 from backend.blocks.firecrawl.scrape import FirecrawlScrapeBlock
 from backend.blocks.io import AgentInputBlock, AgentOutputBlock
 from backend.blocks.llm import AITextGeneratorBlock
-from backend.copilot.model import ChatSession
-from backend.data import db as db_module
 from backend.data.db import prisma
 from backend.data.graph import Graph, Link, Node, create_graph
 from backend.data.model import APIKeyCredentials
 from backend.data.user import get_or_create_user
 from backend.integrations.credentials_store import IntegrationCredentialsStore

-_logger = logging.getLogger(__name__)
-
-
-async def _ensure_db_connected() -> None:
-    """Ensure the Prisma connection is alive on the current event loop.
-
-    On Python 3.11, the httpx transport inside Prisma can reference a stale
-    (closed) event loop when session-scoped async fixtures are evaluated long
-    after the initial ``server`` fixture connected Prisma.  A cheap health-check
-    followed by a reconnect fixes this without affecting other fixtures.
-    """
-    try:
-        await prisma.query_raw("SELECT 1")
-    except Exception:
-        _logger.info("Prisma connection stale – reconnecting")
-        try:
-            await db_module.disconnect()
-        except Exception:
-            pass
-        await db_module.connect()
-

 def make_session(user_id: str):
    return ChatSession(
@@ -55,19 +31,15 @@ def make_session(user_id: str):
    )


-@pytest_asyncio.fixture(scope="session", loop_scope="session")
-async def setup_test_data(server):
+@pytest.fixture(scope="session")
+async def setup_test_data():
    """
    Set up test data for run_agent tests:
    1. Create a test user
    2. Create a test graph (agent input -> agent output)
    3. Create a store listing and store listing version
    4. Approve the store listing version
-
-    Depends on ``server`` to ensure Prisma is connected.
    """
-    await _ensure_db_connected()
-
    # 1. Create a test user
    user_data = {
        "sub": f"test-user-{uuid.uuid4()}",
@@ -102,6 +74,7 @@ async def setup_test_data(server):
            "value": "",
            "advanced": False,
            "description": "Test input field",
+            "placeholder_values": [],
        },
        metadata={"position": {"x": 0, "y": 0}},
    )
@@ -150,8 +123,8 @@ async def setup_test_data(server):
    unique_slug = f"test-agent-{str(uuid.uuid4())[:8]}"
    store_submission = await store_db.create_store_submission(
        user_id=user.id,
-        graph_id=created_graph.id,
-        graph_version=created_graph.version,
+        agent_id=created_graph.id,
+        agent_version=created_graph.version,
        slug=unique_slug,
        name="Test Agent",
        description="A simple test agent",
@@ -160,10 +133,10 @@ async def setup_test_data(server):
        image_urls=["https://example.com/image.jpg"],
    )

-    assert store_submission.listing_version_id is not None
+    assert store_submission.store_listing_version_id is not None
    # 4. Approve the store listing version
    await store_db.review_store_submission(
-        store_listing_version_id=store_submission.listing_version_id,
+        store_listing_version_id=store_submission.store_listing_version_id,
        is_approved=True,
        external_comments="Approved for testing",
        internal_comments="Test approval",
@@ -177,19 +150,15 @@ async def setup_test_data(server):
    }


-@pytest_asyncio.fixture(scope="session", loop_scope="session")
-async def setup_llm_test_data(server):
+@pytest.fixture(scope="session")
+async def setup_llm_test_data():
    """
    Set up test data for LLM agent tests:
    1. Create a test user
    2. Create test OpenAI credentials for the user
    3. Create a test graph with input -> LLM block -> output
    4. Create and approve a store listing
-
-    Depends on ``server`` to ensure Prisma is connected.
    """
-    await _ensure_db_connected()
-
    key = getenv("OPENAI_API_KEY")
    if not key:
        return pytest.skip("OPENAI_API_KEY is not set")
@@ -241,6 +210,7 @@ async def setup_llm_test_data(server):
            "value": "",
            "advanced": False,
            "description": "Prompt for the LLM",
+            "placeholder_values": [],
        },
        metadata={"position": {"x": 0, "y": 0}},
    )
@@ -319,8 +289,8 @@ async def setup_llm_test_data(server):
    unique_slug = f"llm-test-agent-{str(uuid.uuid4())[:8]}"
    store_submission = await store_db.create_store_submission(
        user_id=user.id,
-        graph_id=created_graph.id,
-        graph_version=created_graph.version,
+        agent_id=created_graph.id,
+        agent_version=created_graph.version,
        slug=unique_slug,
        name="LLM Test Agent",
        description="An agent with LLM capabilities",
@@ -328,9 +298,9 @@ async def setup_llm_test_data(server):
        categories=["testing", "ai"],
        image_urls=["https://example.com/image.jpg"],
    )
-    assert store_submission.listing_version_id is not None
+    assert store_submission.store_listing_version_id is not None
    await store_db.review_store_submission(
-        store_listing_version_id=store_submission.listing_version_id,
+        store_listing_version_id=store_submission.store_listing_version_id,
        is_approved=True,
        external_comments="Approved for testing",
        internal_comments="Test approval for LLM agent",
@@ -345,18 +315,14 @@ async def setup_llm_test_data(server):
    }


-@pytest_asyncio.fixture(scope="session", loop_scope="session")
-async def setup_firecrawl_test_data(server):
+@pytest.fixture(scope="session")
+async def setup_firecrawl_test_data():
    """
    Set up test data for Firecrawl agent tests (missing credentials scenario):
    1. Create a test user (WITHOUT Firecrawl credentials)
    2. Create a test graph with input -> Firecrawl block -> output
    3. Create and approve a store listing
-
-    Depends on ``server`` to ensure Prisma is connected.
    """
-    await _ensure_db_connected()
-
    # 1. Create a test user
    user_data = {
        "sub": f"test-user-{uuid.uuid4()}",
@@ -394,6 +360,7 @@ async def setup_firecrawl_test_data(server):
            "value": "",
            "advanced": False,
            "description": "URL for Firecrawl to scrape",
+            "placeholder_values": [],
        },
        metadata={"position": {"x": 0, "y": 0}},
    )
@@ -473,8 +440,8 @@ async def setup_firecrawl_test_data(server):
    unique_slug = f"firecrawl-test-agent-{str(uuid.uuid4())[:8]}"
    store_submission = await store_db.create_store_submission(
        user_id=user.id,
-        graph_id=created_graph.id,
-        graph_version=created_graph.version,
+        agent_id=created_graph.id,
+        agent_version=created_graph.version,
        slug=unique_slug,
        name="Firecrawl Test Agent",
        description="An agent with Firecrawl integration (no credentials)",
@@ -482,9 +449,9 @@ async def setup_firecrawl_test_data(server):
        categories=["testing", "scraping"],
        image_urls=["https://example.com/image.jpg"],
    )
-    assert store_submission.listing_version_id is not None
+    assert store_submission.store_listing_version_id is not None
    await store_db.review_store_submission(
-        store_listing_version_id=store_submission.listing_version_id,
+        store_listing_version_id=store_submission.store_listing_version_id,
        is_approved=True,
        external_comments="Approved for testing",
        internal_comments="Test approval for Firecrawl agent",
--- a/autogpt_platform/backend/backend/api/features/chat/tools/add_understanding.py
+++ b/autogpt_platform/backend/backend/api/features/chat/tools/add_understanding.py
@@ -3,9 +3,11 @@
 import logging
 from typing import Any

-from backend.copilot.model import ChatSession
-from backend.data.db_accessors import understanding_db
-from backend.data.understanding import BusinessUnderstandingInput
+from backend.api.features.chat.model import ChatSession
+from backend.data.understanding import (
+    BusinessUnderstandingInput,
+    upsert_business_understanding,
+)

 from .base import BaseTool
 from .models import ErrorResponse, ToolResponseBase, UnderstandingUpdatedResponse
@@ -22,12 +24,13 @@ class AddUnderstandingTool(BaseTool):

    @property
    def description(self) -> str:
-        return (
-            "Store user's business context, workflows, pain points, and automation goals. "
-            "Call whenever the user shares business info. Each call incrementally merges "
-            "with existing data — provide only the fields you have. "
-            "Builds a profile that helps recommend better agents for the user's needs."
-        )
+        return """Capture and store information about the user's business context,
+workflows, pain points, and automation goals. Call this tool whenever the user
+shares information about their business. Each call incrementally adds to the
+existing understanding - you don't need to provide all fields at once.
+
+Use this to build a comprehensive profile that helps recommend better agents
+and automations for the user's specific needs."""

    @property
    def parameters(self) -> dict[str, Any]:
@@ -68,9 +71,6 @@ class AddUnderstandingTool(BaseTool):
        Each call merges new data with existing understanding:
        - String fields are overwritten if provided
        - List fields are appended (with deduplication)
-
-        Note: This tool accepts **kwargs because its parameters are derived
-        dynamically from the BusinessUnderstandingInput model schema.
        """
        session_id = session.session_id

@@ -80,26 +80,26 @@ class AddUnderstandingTool(BaseTool):
                session_id=session_id,
            )

-        # Build input model from kwargs (only include fields defined in the model)
-        valid_fields = set(BusinessUnderstandingInput.model_fields.keys())
-        filtered = {k: v for k, v in kwargs.items() if k in valid_fields}
-
        # Check if any data was provided
-        if not any(v is not None for v in filtered.values()):
+        if not any(v is not None for v in kwargs.values()):
            return ErrorResponse(
                message="Please provide at least one field to update.",
                session_id=session_id,
            )

-        input_data = BusinessUnderstandingInput(**filtered)
+        # Build input model from kwargs (only include fields defined in the model)
+        valid_fields = set(BusinessUnderstandingInput.model_fields.keys())
+        input_data = BusinessUnderstandingInput(
+            **{k: v for k, v in kwargs.items() if k in valid_fields}
+        )

        # Track which fields were updated
-        updated_fields = [k for k, v in filtered.items() if v is not None]
+        updated_fields = [
+            k for k, v in kwargs.items() if k in valid_fields and v is not None
+        ]

        # Upsert with merge
-        understanding = await understanding_db().upsert_business_understanding(
-            user_id, input_data
-        )
+        understanding = await upsert_business_understanding(user_id, input_data)

        # Build current understanding summary (filter out empty values)
        current_understanding = {
--- a/autogpt_platform/backend/backend/api/features/chat/tools/agent_generator/__init__.py
+++ b/autogpt_platform/backend/backend/api/features/chat/tools/agent_generator/__init__.py
@@ -0,0 +1,31 @@
+"""Agent generator package - Creates agents from natural language."""
+
+from .core import (
+    AgentGeneratorNotConfiguredError,
+    decompose_goal,
+    generate_agent,
+    generate_agent_patch,
+    get_agent_as_json,
+    json_to_graph,
+    save_agent_to_library,
+)
+from .errors import get_user_message_for_error
+from .service import health_check as check_external_service_health
+from .service import is_external_service_configured
+
+__all__ = [
+    # Core functions
+    "decompose_goal",
+    "generate_agent",
+    "generate_agent_patch",
+    "save_agent_to_library",
+    "get_agent_as_json",
+    "json_to_graph",
+    # Exceptions
+    "AgentGeneratorNotConfiguredError",
+    # Service
+    "is_external_service_configured",
+    "check_external_service_health",
+    # Error handling
+    "get_user_message_for_error",
+]
--- a/autogpt_platform/backend/backend/api/features/chat/tools/agent_generator/core.py
+++ b/autogpt_platform/backend/backend/api/features/chat/tools/agent_generator/core.py
@@ -0,0 +1,281 @@
+"""Core agent generation functions."""
+
+import logging
+import uuid
+from typing import Any
+
+from backend.api.features.library import db as library_db
+from backend.data.graph import Graph, Link, Node, create_graph
+
+from .service import (
+    decompose_goal_external,
+    generate_agent_external,
+    generate_agent_patch_external,
+    is_external_service_configured,
+)
+
+logger = logging.getLogger(__name__)
+
+
+class AgentGeneratorNotConfiguredError(Exception):
+    """Raised when the external Agent Generator service is not configured."""
+
+    pass
+
+
+def _check_service_configured() -> None:
+    """Check if the external Agent Generator service is configured.
+
+    Raises:
+        AgentGeneratorNotConfiguredError: If the service is not configured.
+    """
+    if not is_external_service_configured():
+        raise AgentGeneratorNotConfiguredError(
+            "Agent Generator service is not configured. "
+            "Set AGENTGENERATOR_HOST environment variable to enable agent generation."
+        )
+
+
+async def decompose_goal(description: str, context: str = "") -> dict[str, Any] | None:
+    """Break down a goal into steps or return clarifying questions.
+
+    Args:
+        description: Natural language goal description
+        context: Additional context (e.g., answers to previous questions)
+
+    Returns:
+        Dict whose "type" is "clarifying_questions", "instructions",
+        "unachievable_goal", "vague_goal", or "error" (see
+        decompose_goal_external for the fields of each),
+        or None on unexpected error
+
+    Raises:
+        AgentGeneratorNotConfiguredError: If the external service is not configured.
+    """
+    _check_service_configured()
+    logger.info("Calling external Agent Generator service for decompose_goal")
+    return await decompose_goal_external(description, context)
+
+
+async def generate_agent(instructions: dict[str, Any]) -> dict[str, Any] | None:
+    """Generate agent JSON from instructions.
+
+    Args:
+        instructions: Structured instructions from decompose_goal
+
+    Returns:
+        Agent JSON dict, error dict {"type": "error", ...}, or None on error
+
+    Raises:
+        AgentGeneratorNotConfiguredError: If the external service is not configured.
+    """
+    _check_service_configured()
+    logger.info("Calling external Agent Generator service for generate_agent")
+    result = await generate_agent_external(instructions)
+    if result:
+        # Check if it's an error response - pass through as-is
+        if isinstance(result, dict) and result.get("type") == "error":
+            return result
+        # Ensure required fields for successful agent generation
+        if "id" not in result:
+            result["id"] = str(uuid.uuid4())
+        if "version" not in result:
+            result["version"] = 1
+        if "is_active" not in result:
+            result["is_active"] = True
+    return result
+
+
+def json_to_graph(agent_json: dict[str, Any]) -> Graph:
+    """Convert agent JSON dict to Graph model.
+
+    Args:
+        agent_json: Agent JSON with nodes and links
+
+    Returns:
+        Graph ready for saving
+    """
+    nodes = []
+    for n in agent_json.get("nodes", []):
+        node = Node(
+            id=n.get("id", str(uuid.uuid4())),
+            block_id=n["block_id"],
+            input_default=n.get("input_default", {}),
+            metadata=n.get("metadata", {}),
+        )
+        nodes.append(node)
+
+    links = []
+    for link_data in agent_json.get("links", []):
+        link = Link(
+            id=link_data.get("id", str(uuid.uuid4())),
+            source_id=link_data["source_id"],
+            sink_id=link_data["sink_id"],
+            source_name=link_data["source_name"],
+            sink_name=link_data["sink_name"],
+            is_static=link_data.get("is_static", False),
+        )
+        links.append(link)
+
+    return Graph(
+        id=agent_json.get("id", str(uuid.uuid4())),
+        version=agent_json.get("version", 1),
+        is_active=agent_json.get("is_active", True),
+        name=agent_json.get("name", "Generated Agent"),
+        description=agent_json.get("description", ""),
+        nodes=nodes,
+        links=links,
+    )
+
+
+def _reassign_node_ids(graph: Graph) -> None:
+    """Reassign all node and link IDs to new UUIDs.
+
+    This is needed when creating a new version to avoid unique constraint violations.
+    """
+    # Create mapping from old node IDs to new UUIDs
+    id_map = {node.id: str(uuid.uuid4()) for node in graph.nodes}
+
+    # Reassign node IDs
+    for node in graph.nodes:
+        node.id = id_map[node.id]
+
+    # Update link references to use new node IDs
+    for link in graph.links:
+        link.id = str(uuid.uuid4())  # Also give links new IDs
+        if link.source_id in id_map:
+            link.source_id = id_map[link.source_id]
+        if link.sink_id in id_map:
+            link.sink_id = id_map[link.sink_id]
+
+
+async def save_agent_to_library(
+    agent_json: dict[str, Any], user_id: str, is_update: bool = False
+) -> tuple[Graph, Any]:
+    """Save agent to database and user's library.
+
+    Args:
+        agent_json: Agent JSON dict
+        user_id: User ID
+        is_update: Whether this is an update to an existing agent
+
+    Returns:
+        Tuple of (created Graph, LibraryAgent)
+    """
+    from backend.data.graph import get_graph_all_versions
+
+    graph = json_to_graph(agent_json)
+
+    if is_update:
+        # For updates, keep the same graph ID but increment version
+        # and reassign node/link IDs to avoid conflicts
+        if graph.id:
+            existing_versions = await get_graph_all_versions(graph.id, user_id)
+            if existing_versions:
+                latest_version = max(v.version for v in existing_versions)
+                graph.version = latest_version + 1
+                # Reassign node IDs (but keep graph ID the same)
+                _reassign_node_ids(graph)
+                logger.info(f"Updating agent {graph.id} to version {graph.version}")
+    else:
+        # For new agents, always generate a fresh UUID to avoid collisions
+        graph.id = str(uuid.uuid4())
+        graph.version = 1
+        # Reassign all node IDs as well
+        _reassign_node_ids(graph)
+        logger.info(f"Creating new agent with ID {graph.id}")
+
+    # Save to database
+    created_graph = await create_graph(graph, user_id)
+
+    # Add to user's library (or update existing library agent)
+    library_agents = await library_db.create_library_agent(
+        graph=created_graph,
+        user_id=user_id,
+        sensitive_action_safe_mode=True,
+        create_library_agents_for_sub_graphs=False,
+    )
+
+    return created_graph, library_agents[0]
+
+
+async def get_agent_as_json(
+    graph_id: str, user_id: str | None
+) -> dict[str, Any] | None:
+    """Fetch an agent and convert to JSON format for editing.
+
+    Args:
+        graph_id: Graph ID or library agent ID
+        user_id: User ID
+
+    Returns:
+        Agent as JSON dict or None if not found
+    """
+    from backend.data.graph import get_graph
+
+    # Try to get the graph (version=None gets the active version)
+    graph = await get_graph(graph_id, version=None, user_id=user_id)
+    if not graph:
+        return None
+
+    # Convert to JSON format
+    nodes = []
+    for node in graph.nodes:
+        nodes.append(
+            {
+                "id": node.id,
+                "block_id": node.block_id,
+                "input_default": node.input_default,
+                "metadata": node.metadata,
+            }
+        )
+
+    links = []
+    for node in graph.nodes:
+        for link in node.output_links:
+            links.append(
+                {
+                    "id": link.id,
+                    "source_id": link.source_id,
+                    "sink_id": link.sink_id,
+                    "source_name": link.source_name,
+                    "sink_name": link.sink_name,
+                    "is_static": link.is_static,
+                }
+            )
+
+    return {
+        "id": graph.id,
+        "name": graph.name,
+        "description": graph.description,
+        "version": graph.version,
+        "is_active": graph.is_active,
+        "nodes": nodes,
+        "links": links,
+    }
+
+
+async def generate_agent_patch(
+    update_request: str, current_agent: dict[str, Any]
+) -> dict[str, Any] | None:
+    """Update an existing agent using natural language.
+
+    The external Agent Generator service handles:
+    - Generating the patch
+    - Applying the patch
+    - Fixing and validating the result
+
+    Args:
+        update_request: Natural language description of changes
+        current_agent: Current agent JSON
+
+    Returns:
+        Updated agent JSON, clarifying questions dict {"type": "clarifying_questions", ...},
+        error dict {"type": "error", ...}, or None on unexpected error
+
+    Raises:
+        AgentGeneratorNotConfiguredError: If the external service is not configured.
+    """
+    _check_service_configured()
+    logger.info("Calling external Agent Generator service for generate_agent_patch")
+    return await generate_agent_patch_external(update_request, current_agent)
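The ID remapping done by `_reassign_node_ids` in core.py can be sketched in isolation. This is a minimal sketch, not the code in the diff: the `Node`, `Link`, and `Graph` dataclasses are hypothetical stand-ins for the models in `backend.data.graph`, and the helper returns the old-to-new mapping (the original returns `None`) so the result is easy to inspect.

```python
import uuid
from dataclasses import dataclass, field


# Hypothetical stand-ins for backend.data.graph.Node / Link / Graph;
# the real models carry more fields.
@dataclass
class Node:
    id: str


@dataclass
class Link:
    id: str
    source_id: str
    sink_id: str


@dataclass
class Graph:
    nodes: list[Node] = field(default_factory=list)
    links: list[Link] = field(default_factory=list)


def reassign_node_ids(graph: Graph) -> dict[str, str]:
    """Give every node and link a fresh UUID while keeping links attached."""
    # Build the old -> new mapping before mutating anything.
    id_map = {node.id: str(uuid.uuid4()) for node in graph.nodes}
    for node in graph.nodes:
        node.id = id_map[node.id]
    for link in graph.links:
        link.id = str(uuid.uuid4())
        # Endpoints that do not belong to this graph are left untouched.
        link.source_id = id_map.get(link.source_id, link.source_id)
        link.sink_id = id_map.get(link.sink_id, link.sink_id)
    return id_map


graph = Graph(
    nodes=[Node("a"), Node("b")],
    links=[Link("l1", source_id="a", sink_id="b")],
)
id_map = reassign_node_ids(graph)
```

Building the full mapping first matters: links are rewritten from the old IDs, so the mapping has to exist before any node is renamed.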
--- a/autogpt_platform/backend/backend/api/features/chat/tools/agent_generator/errors.py
+++ b/autogpt_platform/backend/backend/api/features/chat/tools/agent_generator/errors.py
@@ -0,0 +1,43 @@
+"""Error handling utilities for agent generator."""
+
+
+def get_user_message_for_error(
+    error_type: str,
+    operation: str = "process the request",
+    llm_parse_message: str | None = None,
+    validation_message: str | None = None,
+) -> str:
+    """Get a user-friendly error message based on error type.
+
+    This function maps internal error types to user-friendly messages,
+    providing a consistent experience across different agent operations.
+
+    Args:
+        error_type: The error type from the external service
+            (e.g., "llm_parse_error", "timeout", "rate_limit")
+        operation: Description of what operation failed, used in the default
+            message (e.g., "analyze the goal", "generate the agent")
+        llm_parse_message: Custom message for llm_parse_error type
+        validation_message: Custom message for validation_error type
+
+    Returns:
+        User-friendly error message suitable for display to the user
+    """
+    if error_type == "llm_parse_error":
+        return (
+            llm_parse_message
+            or "The AI had trouble processing this request. Please try again."
+        )
+    elif error_type == "validation_error":
+        return (
+            validation_message
+            or "The request failed validation. Please try rephrasing."
+        )
+    elif error_type == "patch_error":
+        return "Failed to apply the changes. Please try a different approach."
+    elif error_type in ("timeout", "llm_timeout"):
+        return "The request took too long. Please try again."
+    elif error_type in ("rate_limit", "llm_rate_limit"):
+        return "The service is currently busy. Please try again in a moment."
+    else:
+        return f"Failed to {operation}. Please try again."
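A caller might combine the error dicts produced by service.py with this mapping roughly as follows. The block restates `get_user_message_for_error` in condensed form only so that it runs on its own; `to_user_message` is a hypothetical helper that does not appear in the diff.

```python
from typing import Any, Optional


def get_user_message_for_error(
    error_type: str,
    operation: str = "process the request",
    llm_parse_message: Optional[str] = None,
    validation_message: Optional[str] = None,
) -> str:
    """Condensed restatement of the mapping in errors.py."""
    if error_type == "llm_parse_error":
        return (
            llm_parse_message
            or "The AI had trouble processing this request. Please try again."
        )
    if error_type == "validation_error":
        return (
            validation_message
            or "The request failed validation. Please try rephrasing."
        )
    if error_type == "patch_error":
        return "Failed to apply the changes. Please try a different approach."
    if error_type in ("timeout", "llm_timeout"):
        return "The request took too long. Please try again."
    if error_type in ("rate_limit", "llm_rate_limit"):
        return "The service is currently busy. Please try again in a moment."
    return f"Failed to {operation}. Please try again."


def to_user_message(result: dict[str, Any], operation: str) -> str:
    """Hypothetical helper: map a {"type": "error", ...} dict to display text."""
    return get_user_message_for_error(
        result.get("error_type", "unknown"), operation=operation
    )


busy = to_user_message(
    {"type": "error", "error": "429 from upstream", "error_type": "rate_limit"},
    operation="generate the agent",
)
fallback = to_user_message(
    {"type": "error", "error": "boom", "error_type": "unexpected_error"},
    operation="generate the agent",
)
```

Note that the raw `error` string never reaches the user in this sketch; only `error_type` selects the message, which keeps internal details out of the chat.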
--- a/autogpt_platform/backend/backend/api/features/chat/tools/agent_generator/service.py
+++ b/autogpt_platform/backend/backend/api/features/chat/tools/agent_generator/service.py
@@ -0,0 +1,374 @@
+"""External Agent Generator service client.
+
+This module provides a client for communicating with the external Agent Generator
+microservice. When AGENTGENERATOR_HOST is configured, the functions in core.py delegate
+to it; when it is not, they raise AgentGeneratorNotConfiguredError.
+"""
+
+import logging
+from typing import Any
+
+import httpx
+
+from backend.util.settings import Settings
+
+logger = logging.getLogger(__name__)
+
+
+def _create_error_response(
+    error_message: str,
+    error_type: str = "unknown",
+    details: dict[str, Any] | None = None,
+) -> dict[str, Any]:
+    """Create a standardized error response dict.
+
+    Args:
+        error_message: Human-readable error message
+        error_type: Machine-readable error type
+        details: Optional additional error details
+
+    Returns:
+        Error dict with type="error" and error details
+    """
+    response: dict[str, Any] = {
+        "type": "error",
+        "error": error_message,
+        "error_type": error_type,
+    }
+    if details:
+        response["details"] = details
+    return response
+
+
+def _classify_http_error(e: httpx.HTTPStatusError) -> tuple[str, str]:
+    """Classify an HTTP error into error_type and message.
+
+    Args:
+        e: The HTTP status error
+
+    Returns:
+        Tuple of (error_type, error_message)
+    """
+    status = e.response.status_code
+    if status == 429:
+        return "rate_limit", f"Agent Generator rate limited: {e}"
+    elif status == 503:
+        return "service_unavailable", f"Agent Generator unavailable: {e}"
+    elif status in (408, 504):
+        return "timeout", f"Agent Generator timed out: {e}"
+    else:
+        return "http_error", f"HTTP error calling Agent Generator: {e}"
+
+
+def _classify_request_error(e: httpx.RequestError) -> tuple[str, str]:
+    """Classify a request error into error_type and message.
+
+    Args:
+        e: The request error
+
+    Returns:
+        Tuple of (error_type, error_message)
+    """
+    error_str = str(e).lower()
+    if "timeout" in error_str or "timed out" in error_str:
+        return "timeout", f"Agent Generator request timed out: {e}"
+    elif "connect" in error_str:
+        return "connection_error", f"Could not connect to Agent Generator: {e}"
+    else:
+        return "request_error", f"Request error calling Agent Generator: {e}"
+
+
+_client: httpx.AsyncClient | None = None
+_settings: Settings | None = None
+
+
+def _get_settings() -> Settings:
+    """Get or create settings singleton."""
+    global _settings
+    if _settings is None:
+        _settings = Settings()
+    return _settings
+
+
+def is_external_service_configured() -> bool:
+    """Check if external Agent Generator service is configured."""
+    settings = _get_settings()
+    return bool(settings.config.agentgenerator_host)
+
+
+def _get_base_url() -> str:
+    """Get the base URL for the external service."""
+    settings = _get_settings()
+    host = settings.config.agentgenerator_host
+    port = settings.config.agentgenerator_port
+    return f"http://{host}:{port}"
+
+
+def _get_client() -> httpx.AsyncClient:
+    """Get or create the HTTP client for the external service."""
+    global _client
+    if _client is None:
+        settings = _get_settings()
+        _client = httpx.AsyncClient(
+            base_url=_get_base_url(),
+            timeout=httpx.Timeout(settings.config.agentgenerator_timeout),
+        )
+    return _client
+
+
+async def decompose_goal_external(
+    description: str, context: str = ""
+) -> dict[str, Any] | None:
+    """Call the external service to decompose a goal.
+
+    Args:
+        description: Natural language goal description
+        context: Additional context (e.g., answers to previous questions)
+
+    Returns:
+        Dict with either:
+        - {"type": "clarifying_questions", "questions": [...]}
+        - {"type": "instructions", "steps": [...]}
+        - {"type": "unachievable_goal", ...}
+        - {"type": "vague_goal", ...}
+        - {"type": "error", "error": "...", "error_type": "..."} on error
+        Or None on unexpected error
+    """
+    client = _get_client()
+
+    # Build the request payload
+    payload: dict[str, Any] = {"description": description}
+    if context:
+        # The external service uses user_instruction for additional context
+        payload["user_instruction"] = context
+
+    try:
+        response = await client.post("/api/decompose-description", json=payload)
+        response.raise_for_status()
+        data = response.json()
+
+        if not data.get("success"):
+            error_msg = data.get("error", "Unknown error from Agent Generator")
+            error_type = data.get("error_type", "unknown")
+            logger.error(
+                f"Agent Generator decomposition failed: {error_msg} "
+                f"(type: {error_type})"
+            )
+            return _create_error_response(error_msg, error_type)
+
+        # Map the response to the expected format
+        response_type = data.get("type")
+        if response_type == "instructions":
+            return {"type": "instructions", "steps": data.get("steps", [])}
+        elif response_type == "clarifying_questions":
+            return {
+                "type": "clarifying_questions",
+                "questions": data.get("questions", []),
+            }
+        elif response_type == "unachievable_goal":
+            return {
+                "type": "unachievable_goal",
+                "reason": data.get("reason"),
+                "suggested_goal": data.get("suggested_goal"),
+            }
+        elif response_type == "vague_goal":
+            return {
+                "type": "vague_goal",
+                "suggested_goal": data.get("suggested_goal"),
+            }
+        elif response_type == "error":
+            # Pass through error from the service
+            return _create_error_response(
+                data.get("error", "Unknown error"),
+                data.get("error_type", "unknown"),
+            )
+        else:
+            logger.error(
+                f"Unknown response type from external service: {response_type}"
+            )
+            return _create_error_response(
+                f"Unknown response type from Agent Generator: {response_type}",
+                "invalid_response",
+            )
+
+    except httpx.HTTPStatusError as e:
+        error_type, error_msg = _classify_http_error(e)
+        logger.error(error_msg)
+        return _create_error_response(error_msg, error_type)
+    except httpx.RequestError as e:
+        error_type, error_msg = _classify_request_error(e)
+        logger.error(error_msg)
+        return _create_error_response(error_msg, error_type)
+    except Exception as e:
+        error_msg = f"Unexpected error calling Agent Generator: {e}"
+        logger.error(error_msg)
+        return _create_error_response(error_msg, "unexpected_error")
+
+
+async def generate_agent_external(
+    instructions: dict[str, Any],
+) -> dict[str, Any] | None:
+    """Call the external service to generate an agent from instructions.
+
+    Args:
+        instructions: Structured instructions from decompose_goal
+
+    Returns:
+        Agent JSON dict on success, or error dict {"type": "error", ...} on error
+    """
+    client = _get_client()
+
+    try:
+        response = await client.post(
+            "/api/generate-agent", json={"instructions": instructions}
+        )
+        response.raise_for_status()
+        data = response.json()
+
+        if not data.get("success"):
+            error_msg = data.get("error", "Unknown error from Agent Generator")
+            error_type = data.get("error_type", "unknown")
+            logger.error(
+                f"Agent Generator generation failed: {error_msg} "
+                f"(type: {error_type})"
+            )
+            return _create_error_response(error_msg, error_type)
+
+        return data.get("agent_json")
+
+    except httpx.HTTPStatusError as e:
+        error_type, error_msg = _classify_http_error(e)
+        logger.error(error_msg)
+        return _create_error_response(error_msg, error_type)
+    except httpx.RequestError as e:
+        error_type, error_msg = _classify_request_error(e)
+        logger.error(error_msg)
+        return _create_error_response(error_msg, error_type)
+    except Exception as e:
+        error_msg = f"Unexpected error calling Agent Generator: {e}"
+        logger.error(error_msg)
+        return _create_error_response(error_msg, "unexpected_error")
+
+
+async def generate_agent_patch_external(
+    update_request: str, current_agent: dict[str, Any]
+) -> dict[str, Any] | None:
+    """Call the external service to generate a patch for an existing agent.
+
+    Args:
+        update_request: Natural language description of changes
+        current_agent: Current agent JSON
+
+    Returns:
+        Updated agent JSON, clarifying questions dict, or error dict on error
+    """
+    client = _get_client()
+
+    try:
+        response = await client.post(
+            "/api/update-agent",
+            json={
+                "update_request": update_request,
+                "current_agent_json": current_agent,
+            },
+        )
+        response.raise_for_status()
+        data = response.json()
+
+        if not data.get("success"):
+            error_msg = data.get("error", "Unknown error from Agent Generator")
+            error_type = data.get("error_type", "unknown")
+            logger.error(
+                f"Agent Generator patch generation failed: {error_msg} "
+                f"(type: {error_type})"
+            )
+            return _create_error_response(error_msg, error_type)
+
+        # Check if it's clarifying questions
+        if data.get("type") == "clarifying_questions":
+            return {
+                "type": "clarifying_questions",
+                "questions": data.get("questions", []),
+            }
+
+        # Check if it's an error passed through
+        if data.get("type") == "error":
+            return _create_error_response(
+                data.get("error", "Unknown error"),
+                data.get("error_type", "unknown"),
+            )
+
+        # Otherwise return the updated agent JSON
+        return data.get("agent_json")
+
+    except httpx.HTTPStatusError as e:
+        error_type, error_msg = _classify_http_error(e)
+        logger.error(error_msg)
+        return _create_error_response(error_msg, error_type)
+    except httpx.RequestError as e:
+        error_type, error_msg = _classify_request_error(e)
+        logger.error(error_msg)
+        return _create_error_response(error_msg, error_type)
+    except Exception as e:
+        error_msg = f"Unexpected error calling Agent Generator: {e}"
+        logger.error(error_msg)
+        return _create_error_response(error_msg, "unexpected_error")
+
+
+async def get_blocks_external() -> list[dict[str, Any]] | None:
+    """Get available blocks from the external service.
+
+    Returns:
+        List of block info dicts or None on error
+    """
+    client = _get_client()
+
+    try:
+        response = await client.get("/api/blocks")
+        response.raise_for_status()
+        data = response.json()
+
+        if not data.get("success"):
+            logger.error("External service returned error getting blocks")
+            return None
+
+        return data.get("blocks", [])
+
+    except httpx.HTTPStatusError as e:
+        logger.error(f"HTTP error getting blocks from external service: {e}")
+        return None
+    except httpx.RequestError as e:
+        logger.error(f"Request error getting blocks from external service: {e}")
+        return None
+    except Exception as e:
+        logger.error(f"Unexpected error getting blocks from external service: {e}")
+        return None
+
+
+async def health_check() -> bool:
+    """Check if the external service is healthy.
+
+    Returns:
+        True if healthy, False otherwise
+    """
+    if not is_external_service_configured():
+        return False
+
+    client = _get_client()
+
+    try:
+        response = await client.get("/health")
+        response.raise_for_status()
+        data = response.json()
+        return data.get("status") == "healthy" and data.get("blocks_loaded", False)
+    except Exception as e:
+        logger.warning(f"External agent generator health check failed: {e}")
+        return False
+
+
+async def close_client() -> None:
+    """Close the HTTP client."""
+    global _client
+    if _client is not None:
+        await _client.aclose()
+        _client = None
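The client functions in service.py return plain dicts tagged with a `"type"` key (or `None`), so callers have to branch on that tag. A minimal sketch of such a dispatch, assuming only the response shapes documented in `decompose_goal_external`; `summarize_result` is a hypothetical name, not part of the diff.

```python
from typing import Any, Optional


def summarize_result(result: Optional[dict[str, Any]]) -> str:
    """Turn a decompose_goal-style result into one line of text."""
    if result is None:
        return "no response"
    kind = result.get("type")
    if kind == "error":
        error_type = result.get("error_type", "unknown")
        return f"error ({error_type}): {result.get('error')}"
    if kind == "clarifying_questions":
        return f"{len(result.get('questions', []))} question(s) to answer"
    if kind == "instructions":
        return f"{len(result.get('steps', []))} step(s) ready"
    if kind in ("unachievable_goal", "vague_goal"):
        return f"goal needs rework; suggestion: {result.get('suggested_goal')}"
    # Anything else is treated like the "invalid_response" case in service.py.
    return f"unrecognized response type: {kind}"


samples = [
    None,
    {"type": "error", "error": "timed out", "error_type": "timeout"},
    {"type": "clarifying_questions", "questions": ["Which inbox?"]},
    {"type": "instructions", "steps": ["fetch mail", "summarize"]},
    {"type": "vague_goal", "suggested_goal": "Summarize unread email daily"},
]
lines = [summarize_result(sample) for sample in samples]
```

Checking for `None` first mirrors the `dict[str, Any] | None` return annotations in the diff, even though most failure paths there already return an error dict.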
Author	SHA1	Message	Date
claude[bot]	657190e759	fix(frontend): address latest CodeRabbit review suggestions - Use valid sort value "runs" instead of undefined in MainSearchResultPage test defaultProps to match production default and satisfy type contract - Remove redundant marketplacePage.goto() navigation in E2E test since the page is already at /marketplace after login Co-authored-by: Ubbe <0ubbe@users.noreply.github.com>	2026-02-12 15:02:18 +00:00
claude[bot]	caabee9278	fix(frontend): address CodeRabbit review suggestions for marketplace tests - Fix filename typo: supress → suppress and update imports - Replace waitFor + getByText/getByRole with findByText/findByRole (idiomatic RTL async queries) - Remove unnecessary comments in test files per coding guidelines - Fix operator precedence with explicit parentheses in suppress helper - Remove redundant `undefined as undefined` type casts - Extract inline props to `interface Props` in MockOnboardingProvider - Widen body type in create-500-handler from Record<string,unknown> to unknown - Add isValidating reset in mock-supabase-auth helpers - Add missing creators MSW handler in no-results tests - Clean up vitest.setup.tsx: replace nested afterAll with module-scoped variable - Fix lint errors: unused imports (act, matchesUrl) and unused params - Fix formatting in custom-mutator.ts Co-authored-by: Ubbe <0ubbe@users.noreply.github.com>	2026-02-12 14:36:34 +00:00
Otto	0fcaa63162	style(frontend): fix formatting in marketplace integration tests	2026-01-30 06:34:39 +00:00
Abhimanyu Yadav	6299045f98	Merge branch 'dev' into abhi/marketplace-integration-tests	2026-01-30 11:42:52 +05:30
Otto	24cd34ed3f	refactor(frontend): reorganize marketplace integration tests into file-specific locations - Split main.test.tsx files into dedicated test files: - rendering.test.tsx - Component rendering tests - auth-state.test.tsx - Authentication state tests - error-handling.test.tsx - API error handling tests - Add new test files: - loading-state.test.tsx - Loading skeleton tests - empty-state.test.tsx - Empty data handling tests - no-results.test.tsx - Search with no results tests Test coverage: - MainMarketplacePage: 14 tests (5 files) - MainAgentPage: 13 tests (3 files) - MainCreatorPage: 10 tests (3 files) - MainSearchResultPage: 11 tests (4 files) - Total: 48 tests across 15 files	2026-01-30 06:11:53 +00:00
abhi1992002	876c6677de	fix(frontend): enhance testing and error handling in marketplace components ### Changes 🏗️ - Updated `MainMarketplacePage` tests to include rendering checks for various sections and error handling for API failures. - Improved `AgentInfo` component to filter out NaN values from version numbers. - Modified `customMutator` to conditionally log errors based on the environment. - Enhanced Vitest configuration for better integration testing setup. - Refactored existing tests for marketplace agents and creators to focus on cross-page flows. ### Checklist 📋 - [x] Verified that all tests pass with the new changes. - [x] Ensured comprehensive coverage for error handling scenarios in tests. - [x] Updated documentation for testing practices in `CLAUDE.md`.	2026-01-23 12:26:00 +05:30
abhi1992002	3e3af45456	fix(frontend): update testing setup with @testing-library/jest-dom and happy-dom ### Changes 🏗️ - Removed `happy-dom` from `devDependencies` and added it back in a different section for clarity. - Added `@testing-library/jest-dom` to `devDependencies` for improved testing assertions. - Updated `tsconfig.json` to include types for `@testing-library/jest-dom`. - Configured Vitest to enable global variables for testing. - Imported `@testing-library/jest-dom` in the Vitest setup file for enhanced testing capabilities. ### Checklist 📋 - [x] Verified that all tests pass with the new setup. - [x] Ensured that the testing environment is correctly configured for integration tests.	2026-01-23 10:07:36 +05:30