dx(orchestrate): fix stale-review gate and add pr-test evaluation rules to SKILL.md (#12701)

## Changes

### verify-complete.sh

- CHANGES_REQUESTED reviews are now compared against the latest commit timestamp. If the review was submitted **before** the latest commit, it is treated as stale and does not block verification.
- Added fail-closed guard: if the `gh pr view` fetch fails, the script exits 1 (rather than treating missing data as "no blocking reviews").
- Fixed edge case: a `CHANGES_REQUESTED` review with a null `submittedAt` is now counted as fresh/blocking (previously silently skipped).
- Combined two separate `gh pr view` calls into one (`--json commits,reviews`) to reduce API calls and ensure consistency.

### SKILL.md (orchestrate skill)

- Added `### /pr-test result evaluation` section with explicit pass/partial/fail handling table.
- **PARTIAL on any headline feature scenario = immediate blocker**: re-brief the agent, fix, and re-run from scratch. Never approve or output ORCHESTRATOR:DONE with a PARTIAL headline result.
- Concrete incident callout: PR #12699 S5 (Apply suggestions) was PARTIAL (the AI never output JSON action blocks) but was nearly approved. This rule prevents recurrence.
- Updated `verify-complete.sh` description throughout to include "no fresh CHANGES_REQUESTED".
- Added staleness rule documentation: a review only blocks if submitted *after* the latest commit.

## Why

Two separate incidents prompted these changes:

1. **verify-complete.sh false positive**: An automated bot (autogpt-pr-reviewer) submitted a `CHANGES_REQUESTED` review in April. An agent then pushed fixing commits. The old script still blocked on the stale review, preventing the PR from being verified as done.
2. **Missed PARTIAL signal**: PR #12699 had a PARTIAL result on its headline scenario (S5 Apply button) because the AI emitted direct builder tool calls instead of JSON action blocks. The orchestrator nearly approved it. The new SKILL.md rule makes PARTIAL = blocker explicit.

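
The staleness rule for `CHANGES_REQUESTED` reviews described above can be sketched as a small shell helper. This is an illustration only, not the code in `verify-complete.sh`: the function name `review_blocks` is hypothetical, the timestamps are made-up examples, and it assumes both timestamps are ISO 8601 UTC strings in the same fixed-width form, so that plain string comparison matches chronological order. Treating an equal timestamp or a missing commit timestamp as blocking is also an assumption, chosen to fail closed.

```bash
#!/usr/bin/env bash
# Hypothetical helper illustrating the staleness rule; NOT the actual
# verify-complete.sh implementation.
# Returns 0 (blocks) or 1 (stale) for a single CHANGES_REQUESTED review.

review_blocks() {
  local submitted_at="$1"      # review's submittedAt; may be empty or "null"
  local latest_commit_at="$2"  # timestamp of the PR's latest commit

  # A review with no usable timestamp counts as fresh and blocking.
  if [[ -z "$submitted_at" || "$submitted_at" == "null" ]]; then
    return 0
  fi

  # Assumption: missing commit data also blocks (fail closed).
  if [[ -z "$latest_commit_at" || "$latest_commit_at" == "null" ]]; then
    return 0
  fi

  # Same-format UTC timestamps compare chronologically as strings.
  # Only a review strictly older than the latest commit is stale.
  if [[ "$submitted_at" < "$latest_commit_at" ]]; then
    return 1
  fi
  return 0
}

# Example with made-up timestamps: the review predates the latest commit.
if review_blocks "2026-04-01T10:00:00Z" "2026-04-07T12:43:47Z"; then
  echo "blocking"
else
  echo "stale"
fi
# prints: stale
```

Keeping the decision in one function with an exit status makes it easy to call once per review and to block as soon as any single review returns 0.
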
## Checklist

- [x] I have read the contribution guide
- [x] My changes follow the code style of this project
- [x] Changes are limited to the scope of this PR (< 20% unrelated changes)
- [x] All new and existing tests pass
dx: add /orchestrate skill — Claude Code agent fleet supervisor with spare worktree lifecycle (#12691)
2026-04-08 03:00:28 -04:00 · 2026-04-08 08:58:42 +07:00 · 2026-04-08 00:18:32 +07:00 · 2026-04-07 12:43:47 +00:00 · 2026-04-07 19:28:40 +07:00 · 2026-04-07 19:11:09 +07:00
2981 changed files with 131893 additions and 835857 deletions
--- a/.agents/skills
+++ b/.agents/skills
@@ -0,0 +1 @@
+../.claude/skills
--- a/.claude/settings.json
+++ b/.claude/settings.json
@@ -0,0 +1,10 @@
+{
+  "permissions": {
+    "allowedTools": [
+      "Read", "Grep", "Glob",
+      "Bash(ls:*)", "Bash(cat:*)", "Bash(grep:*)", "Bash(find:*)",
+      "Bash(git status:*)", "Bash(git diff:*)", "Bash(git log:*)", "Bash(git worktree:*)",
+      "Bash(tmux:*)", "Bash(sleep:*)", "Bash(branchlet:*)"
+    ]
+  }
+}
--- a/.claude/skills/open-pr/SKILL.md
+++ b/.claude/skills/open-pr/SKILL.md
@@ -0,0 +1,106 @@
+---
+name: open-pr
+description: Open a pull request with proper PR template, test coverage, and review workflow. Guides agents through creating a PR that follows repo conventions, ensures existing behaviors aren't broken, covers new behaviors with tests, and handles review via bot when local testing isn't possible. TRIGGER when user asks to "open a PR", "create a PR", "make a PR", "submit a PR", "open pull request", "push and create PR", or any variation of opening/submitting a pull request.
+user-invocable: true
+args: "[base-branch] — optional target branch (defaults to dev)."
+metadata:
+  author: autogpt-team
+  version: "1.0.0"
+---
+
+# Open a Pull Request
+
+## Step 1: Pre-flight checks
+
+Before opening the PR:
+
+1. Ensure all changes are committed
+2. Ensure the branch is pushed to the remote (`git push -u origin <branch>`)
+3. Run linters/formatters across the whole repo (not just changed files) and commit any fixes
+
+## Step 2: Test coverage
+
+**This is critical.** Before opening the PR, verify:
+
+### Existing behavior is not broken
+- Identify which modules/components your changes touch
+- Run the existing test suites for those areas
+- If tests fail, fix them before opening the PR — do not open a PR with known regressions
+
+### New behavior has test coverage
+- Every new feature, endpoint, or behavior change needs tests
+- If you added a new block, add tests for that block
+- If you changed API behavior, add or update API tests
+- If you changed frontend behavior, verify it doesn't break existing flows
+
+If you cannot run the full test suite locally, note which tests you ran and which you couldn't in the test plan.
+
+## Step 3: Create the PR using the repo template
+
+Read the canonical PR template at `.github/PULL_REQUEST_TEMPLATE.md` and use it **verbatim** as your PR body:
+
+1. Read the template: `cat .github/PULL_REQUEST_TEMPLATE.md`
+2. Preserve the exact section titles and formatting, including:
+   - `### Why / What / How`
+   - `### Changes 🏗️`
+   - `### Checklist 📋`
+3. Replace HTML comment prompts (`<!-- ... -->`) with actual content; do not leave them in
+4. **Do not pre-check boxes** — leave all checkboxes as `- [ ]` until each step is actually completed
+5. Do not alter the template structure, rename sections, or remove any checklist items
+
+**PR title must use conventional commit format** (e.g., `feat(backend): add new block`, `fix(frontend): resolve routing bug`, `dx(skills): update PR workflow`). See CLAUDE.md for the full list of scopes.
+
+Use `gh pr create` with the base branch (defaults to `dev` if no `[base-branch]` was provided). Use `--body-file` to avoid shell interpretation of backticks and special characters:
+
+```bash
+BASE_BRANCH="${BASE_BRANCH:-dev}"
+PR_BODY=$(mktemp)
+cat > "$PR_BODY" << 'PREOF'
+<filled-in template from .github/PULL_REQUEST_TEMPLATE.md>
+PREOF
+gh pr create --base "$BASE_BRANCH" --title "<type>(scope): short description" --body-file "$PR_BODY"
+rm "$PR_BODY"
+```
+
+## Step 4: Review workflow
+
+### If you have a workspace that allows testing (docker, running backend, etc.)
+- Run `/pr-test` to do E2E manual testing of the PR using docker compose, agent-browser, and API calls. This is the most thorough way to validate your changes before review.
+- After testing, run `/pr-review` to self-review the PR for correctness, security, code quality, and testing gaps before requesting human review.
+
+### If you do NOT have a workspace that allows testing
+This is common for agents running in worktrees without a full stack. In this case:
+
+1. Run `/pr-review` locally to catch obvious issues before pushing
+2. **Comment `/review` on the PR** after creating it to trigger the review bot
+3. **Poll for the review** rather than blindly waiting — check for new review comments every 30 seconds using `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews --paginate` and the GraphQL inline threads query. The bot typically responds within 30 minutes, but polling lets the agent react as soon as it arrives.
+4. Do NOT proceed or merge until the bot review comes back
+5. Address any issues the bot raises — use `/pr-address` which has a full polling loop with CI + comment tracking
+
+```bash
+# After creating the PR:
+PR_NUMBER=$(gh pr view --json number -q .number)
+gh pr comment "$PR_NUMBER" --body "/review"
+# Then use /pr-address to poll for and address the review when it arrives
+```
+
+## Step 5: Address review feedback
+
+Once the review bot or human reviewers leave comments:
+- Run `/pr-address` to address review comments. It will loop until CI is green and all comments are resolved.
+- Do not merge without human approval.
+
+## Step 6: Post-creation
+
+After the PR is created and review is triggered:
+- Share the PR URL with the user
+- If waiting on the review bot, let the user know the expected wait time (~30 min)
+- Do not merge without human approval
+
+## Related skills
+
+| Skill | When to use |
+|---|---|
+| `/pr-test` | E2E testing with docker compose, agent-browser, API calls — use when you have a running workspace |
+| `/pr-review` | Review for correctness, security, code quality — use before requesting human review |
+| `/pr-address` | Address reviewer comments and loop until CI green — use after reviews come in |
--- a/.claude/skills/orchestrate/SKILL.md
+++ b/.claude/skills/orchestrate/SKILL.md
@@ -0,0 +1,545 @@
+---
+name: orchestrate
+description: "Meta-agent supervisor that manages a fleet of Claude Code agents running in tmux windows. Auto-discovers spare worktrees, spawns agents, monitors state, kicks idle agents, approves safe confirmations, and recycles worktrees when done. TRIGGER when user asks to supervise agents, run parallel tasks, manage worktrees, check agent status, or orchestrate parallel work."
+user-invocable: true
+argument-hint: "any free text — e.g. 'start 3 agents on X Y Z', 'show status', 'add task: implement feature A', 'stop', 'how many are free?'"
+metadata:
+  author: autogpt-team
+  version: "6.0.0"
+---
+
+# Orchestrate — Agent Fleet Supervisor
+
+One tmux session, N windows — each window is one agent working in its own worktree. Speak naturally; Claude maps your intent to the right scripts.
+
+## Scripts
+
+```bash
+SKILLS_DIR=$(git rev-parse --show-toplevel)/.claude/skills/orchestrate/scripts
+STATE_FILE=~/.claude/orchestrator-state.json
+```
+
+| Script | Purpose |
+|---|---|
+| `find-spare.sh [REPO_ROOT]` | List free worktrees — one `PATH BRANCH` per line |
+| `spawn-agent.sh SESSION PATH SPARE NEW_BRANCH OBJECTIVE [PR_NUMBER] [STEPS...]` | Create window + checkout branch + launch claude + send task. **Stdout: `SESSION:WIN` only** |
+| `recycle-agent.sh WINDOW PATH SPARE_BRANCH` | Kill window + restore spare branch |
+| `run-loop.sh` | **Mechanical babysitter** — idle restart + dialog approval + recycle on ORCHESTRATOR:DONE + supervisor health check + all-done notification |
+| `verify-complete.sh WINDOW` | Verify PR is done: checkpoints ✓ + 0 unresolved threads + CI green + no fresh CHANGES_REQUESTED. Repo auto-derived from state file `.repo` or git remote. |
+| `notify.sh MESSAGE` | Send notification via Discord webhook (env `DISCORD_WEBHOOK_URL` or state `.discord_webhook`), macOS notification center, and stdout |
+| `capacity.sh [REPO_ROOT]` | Print available + in-use worktrees |
+| `status.sh` | Print fleet status + live pane commands |
+| `poll-cycle.sh` | One monitoring cycle — classifies panes, tracks checkpoints, returns JSON action array |
+| `classify-pane.sh WINDOW` | Classify one pane state |
+
+## Supervision model
+
+```
+Orchestrating Claude (this Claude session — IS the supervisor)
+  └── Reads pane output, checks CI, intervenes with targeted guidance
+        run-loop.sh (separate tmux window, every 30s)
+          └── Mechanical only: idle restart, dialog approval, recycle on ORCHESTRATOR:DONE
+```
+
+**You (the orchestrating Claude)** are the supervisor. After spawning agents, stay in this conversation and actively monitor: poll each agent's pane every 2-3 minutes, check CI, nudge stalled agents, and verify completions. Do not spawn a separate supervisor Claude window — it loses context, is hard to observe, and compounds context compression problems.
+
+**run-loop.sh** is the mechanical layer — zero tokens, handles things that need no judgment: restart crashed agents, press Enter on dialogs, recycle completed worktrees (only after `verify-complete.sh` passes).
+
+## Checkpoint protocol
+
+Agents output checkpoints as they complete each required step:
+
+```
+CHECKPOINT:<step-name>
+```
+
+Required steps are passed as args to `spawn-agent.sh` (e.g. `pr-address pr-test`). `run-loop.sh` will not recycle a window until all required checkpoints are found in the pane output. If `verify-complete.sh` fails, the agent is re-briefed automatically.
+
+## Worktree lifecycle
+
+```text
+spare/N branch  →  spawn-agent.sh (--session-id UUID)  →  window + feat/branch + claude running
+                                                                 ↓
+                                               CHECKPOINT:<step> (as steps complete)
+                                                                 ↓
+                                                        ORCHESTRATOR:DONE
+                                                                 ↓
+                                    verify-complete.sh: checkpoints ✓ + 0 threads + CI green + no fresh CHANGES_REQUESTED
+                                                                 ↓
+                                              state → "done", notify, window KEPT OPEN
+                                                                 ↓
+                              user/orchestrator explicitly requests recycle
+                                                                 ↓
+                                         recycle-agent.sh → spare/N (free again)
+```
+
+**Windows are never auto-killed.** The worktree stays on its branch, the session stays alive. The agent is done working but the window, git state, and Claude session are all preserved until you choose to recycle.
+
+**To resume a done or crashed session:**
+```bash
+# Resume by stored session ID (preferred — exact session, full context)
+claude --resume SESSION_ID --permission-mode bypassPermissions
+
+# Or resume most recent session in that worktree directory
+cd /path/to/worktree && claude --continue --permission-mode bypassPermissions
+```
+
+**To manually recycle when ready:**
+```bash
+bash $SKILLS_DIR/recycle-agent.sh SESSION:WIN WORKTREE_PATH spare/N
+# Then update state:
+jq --arg w "SESSION:WIN" '.agents |= map(if .window == $w then .state = "recycled" else . end)' \
+  ~/.claude/orchestrator-state.json > /tmp/orch.tmp && mv /tmp/orch.tmp ~/.claude/orchestrator-state.json
+```
+
+## State file (`~/.claude/orchestrator-state.json`)
+
+Never committed to git. You maintain this file directly using `jq` + atomic writes (`.tmp` → `mv`).
+
+```json
+{
+  "active": true,
+  "tmux_session": "autogpt1",
+  "idle_threshold_seconds": 300,
+  "loop_window": "autogpt1:5",
+  "repo": "Significant-Gravitas/AutoGPT",
+  "discord_webhook": "https://discord.com/api/webhooks/...",
+  "last_poll_at": 0,
+  "agents": [
+    {
+      "window": "autogpt1:3",
+      "worktree": "AutoGPT6",
+      "worktree_path": "/path/to/AutoGPT6",
+      "spare_branch": "spare/6",
+      "branch": "feat/my-feature",
+      "objective": "Implement X and open a PR",
+      "pr_number": "12345",
+      "session_id": "550e8400-e29b-41d4-a716-446655440000",
+      "steps": ["pr-address", "pr-test"],
+      "checkpoints": ["pr-address"],
+      "state": "running",
+      "last_output_hash": "",
+      "last_seen_at": 0,
+      "spawned_at": 0,
+      "idle_since": 0,
+      "revision_count": 0,
+      "last_rebriefed_at": 0
+    }
+  ]
+}
+```
+
+Top-level optional fields:
+- `repo` — GitHub `owner/repo` for CI/thread checks. Auto-derived from git remote if omitted.
+- `discord_webhook` — Discord webhook URL for completion notifications. Also reads `DISCORD_WEBHOOK_URL` env var.
+
+Per-agent fields:
+- `session_id` — UUID passed to `claude --session-id` at spawn; use with `claude --resume UUID` to restore exact session context after a crash or window close.
+- `last_rebriefed_at` — Unix timestamp of last re-brief; enforces 5-min cooldown to prevent spam.
+
+Agent states: `running` | `idle` | `stuck` | `waiting_approval` | `complete` | `pending_evaluation` | `done` | `escalated` | `recycled`
+
+`done` means verified complete — window is still open, session still alive, worktree still on task branch. Not recycled yet.
+
+## Serial /pr-test rule
+
+`/pr-test` and `/pr-test --fix` run local Docker + integration tests that use shared ports, a shared database, and shared build caches. **Running two `/pr-test` jobs simultaneously will cause port conflicts and database corruption.**
+
+**Rule: only one `/pr-test` runs at a time. The orchestrator serializes them.**
+
+You (the orchestrating Claude) own the test queue:
+1. Agents do `pr-review` and `pr-address` in parallel — that's safe (they only push code and reply to GitHub).
+2. When a PR needs local testing, add it to your mental queue — don't give agents a `pr-test` step.
+3. Run `/pr-test https://github.com/OWNER/REPO/pull/PR_NUMBER --fix` yourself, sequentially.
+4. Feed results back to the relevant agent via `tmux send-keys`:
+   ```bash
+   tmux send-keys -t SESSION:WIN "Local tests for PR #N: <paste failure output or 'all passed'>. Fix any failures and push, then output ORCHESTRATOR:DONE."
+   sleep 0.3
+   tmux send-keys -t SESSION:WIN Enter
+   ```
+5. Wait for CI to confirm green before marking the agent done.
+
+If multiple PRs need testing at the same time, pick the one furthest along (fewest pending CI checks) and test it first. Only start the next test after the previous one completes.
+
+## Session restore (tested and confirmed)
+
+Agent sessions are saved to disk. To restore a closed or crashed session:
+
+```bash
+# If session_id is in state (preferred):
+NEW_WIN=$(tmux new-window -t SESSION -n WORKTREE_NAME -P -F '#{window_index}')
+tmux send-keys -t "SESSION:${NEW_WIN}" "cd /path/to/worktree && claude --resume SESSION_ID --permission-mode bypassPermissions" Enter
+
+# If no session_id (use --continue for most recent session in that directory):
+tmux send-keys -t "SESSION:${NEW_WIN}" "cd /path/to/worktree && claude --continue --permission-mode bypassPermissions" Enter
+```
+
+Both `--resume` and `--continue` restore the full conversation history, including all tool calls, file edits, and context, so the agent resumes exactly where it left off. After restoring, update the window address in the state file:
+
+```bash
+jq --arg old "SESSION:OLD_WIN" --arg new "SESSION:NEW_WIN" \
+  '(.agents[] | select(.window == $old)).window = $new' \
+  ~/.claude/orchestrator-state.json > /tmp/orch.tmp && mv /tmp/orch.tmp ~/.claude/orchestrator-state.json
+```
+
+## Intent → action mapping
+
+Match the user's message to one of these intents:
+
+| The user says something like… | What to do |
+|---|---|
+| "status", "what's running", "show agents" | Run `status.sh` + `capacity.sh`, show output |
+| "how many free", "capacity", "available worktrees" | Run `capacity.sh`, show output |
+| "start N agents on X, Y, Z" or "run these tasks: …" | See **Spawning agents** below |
+| "add task: …", "add one more agent for …" | See **Adding an agent** below |
+| "stop", "shut down", "pause the fleet" | See **Stopping** below |
+| "poll", "check now", "run a cycle" | Run `poll-cycle.sh`, process actions |
+| "recycle window X", "free up autogpt3" | Run `recycle-agent.sh` directly |
+
+When the intent is ambiguous, show capacity first and ask what tasks to run.
+
+## Spawning agents
+
+### 1. Resolve tmux session
+
+```bash
+tmux list-sessions -F "#{session_name}: #{session_windows} windows" 2>/dev/null
+```
+
+Use an existing session. **Never create a tmux session from within Claude** — it becomes a child of Claude's process and dies when the session ends. If no session exists, tell the user to run `tmux new-session -d -s autogpt1` in their terminal first, then re-invoke `/orchestrate`.
+
+### 2. Show available capacity
+
+```bash
+bash $SKILLS_DIR/capacity.sh $(git rev-parse --show-toplevel)
+```
+
+### 3. Collect tasks from the user
+
+For each task, gather:
+- **objective** — what to do (e.g. "implement feature X and open a PR")
+- **branch name** — e.g. `feat/my-feature` (derive from objective if not given)
+- **pr_number** — GitHub PR number if working on an existing PR (for verification)
+- **steps** — required checkpoint names in order (e.g. `pr-address pr-test`) — derive from objective
+
+Ask for `idle_threshold_seconds` only if the user mentions it (default: 300).
+
+Never ask the user to specify a worktree — auto-assign from `find-spare.sh`.
+
+### 4. Spawn one agent per task
+
+```bash
+# Get ordered list of spare worktrees
+SPARE_LIST=$(bash $SKILLS_DIR/find-spare.sh $(git rev-parse --show-toplevel))
+
+# For each task, take the next spare line:
+WORKTREE_PATH=$(echo "$SPARE_LINE" | awk '{print $1}')
+SPARE_BRANCH=$(echo "$SPARE_LINE" | awk '{print $2}')
+
+# With PR number and required steps:
+WINDOW=$(bash $SKILLS_DIR/spawn-agent.sh "$SESSION" "$WORKTREE_PATH" "$SPARE_BRANCH" "$NEW_BRANCH" "$OBJECTIVE" "$PR_NUMBER" "pr-address" "pr-test")
+
+# Without PR (new work):
+WINDOW=$(bash $SKILLS_DIR/spawn-agent.sh "$SESSION" "$WORKTREE_PATH" "$SPARE_BRANCH" "$NEW_BRANCH" "$OBJECTIVE")
+```
+
+The state file must exist before agent records can be written to it. If it doesn't exist yet, initialize it:
+
+```bash
+# Derive repo from git remote (used by verify-complete.sh + supervisor)
+REPO=$(git remote get-url origin 2>/dev/null | sed 's|.*github\.com[:/]||; s|\.git$||' || echo "")
+
+jq -n \
+  --arg session "$SESSION" \
+  --arg repo "$REPO" \
+  --argjson threshold 300 \
+  '{active:true, tmux_session:$session, idle_threshold_seconds:$threshold,
+    repo:$repo, loop_window:null, supervisor_window:null, last_poll_at:0, agents:[]}' \
+  > ~/.claude/orchestrator-state.json
+```
+
+Optionally add a Discord webhook for completion notifications:
+```bash
+jq --arg hook "$DISCORD_WEBHOOK_URL" '.discord_webhook = $hook' ~/.claude/orchestrator-state.json \
+  > /tmp/orch.tmp && mv /tmp/orch.tmp ~/.claude/orchestrator-state.json
+```
+
+`spawn-agent.sh` writes the initial agent record (window, worktree_path, branch, objective, state, etc.) to the state file automatically — **do not append the record again after calling it.** The record already exists and `pr_number`/`steps` are patched in by the script itself.
+
+### 5. Start the mechanical babysitter
+
+```bash
+LOOP_WIN=$(tmux new-window -t "$SESSION" -n "orchestrator" -P -F '#{window_index}')
+LOOP_WINDOW="${SESSION}:${LOOP_WIN}"
+tmux send-keys -t "$LOOP_WINDOW" "bash $SKILLS_DIR/run-loop.sh" Enter
+
+jq --arg w "$LOOP_WINDOW" '.loop_window = $w' ~/.claude/orchestrator-state.json \
+  > /tmp/orch.tmp && mv /tmp/orch.tmp ~/.claude/orchestrator-state.json
+```
+
+### 6. Begin supervising directly in this conversation
+
+You are the supervisor. After spawning, immediately start your first poll loop (see **Supervisor duties** below) and continue every 2-3 minutes. Do NOT spawn a separate supervisor Claude window.
+
+## Adding an agent
+
+Find the next spare worktree, then spawn the agent: same as steps 2–4 above, but for a single task. `spawn-agent.sh` writes the agent record to the state file itself. If no spare worktrees are available, tell the user.
+
+## Supervisor duties (YOUR job, every 2-3 min in this conversation)
+
+You are the supervisor. Run this poll loop directly in your Claude session — not in a separate window.
+
+### Poll loop mechanism
+
+You are reactive — you only act when a tool completes or the user sends a message. To create a self-sustaining poll loop without user involvement:
+
+1. Start each poll with `run_in_background: true` + a sleep before the work:
+   ```bash
+   sleep 120 && tmux capture-pane -t autogpt1:0 -p -S -200 | tail -40
+   # + similar for each active window
+   ```
+2. When the background job notifies you, read the pane output and take action.
+3. Immediately schedule the next background poll — this keeps the loop alive.
+4. Stop scheduling when all agents are done/escalated.
+
+**Never tell the user "I'll poll every 2-3 minutes"** — that does nothing without a trigger. Start the background job instead.
+
+### Each poll: what to check
+
+```bash
+# 1. Read state
+cat ~/.claude/orchestrator-state.json | jq '.agents[] | {window, worktree, branch, state, pr_number, checkpoints}'
+
+# 2. For each running/stuck/idle agent, capture pane
+tmux capture-pane -t SESSION:WIN -p -S -200 | tail -60
+```
+
+For each agent, decide:
+
+| What you see | Action |
+|---|---|
+| Spinner / tools running | Do nothing — agent is working |
+| Idle `❯` prompt, no `ORCHESTRATOR:DONE` | Stalled — send specific nudge with objective from state |
+| Stuck in error loop | Send targeted fix with exact error + solution |
+| Waiting for input / question | Answer and unblock via `tmux send-keys` |
+| CI red | `gh pr checks PR_NUMBER --repo REPO` → tell agent exactly what's failing |
+| Context compacted / agent lost | Send recovery: `cat ~/.claude/orchestrator-state.json \| jq '.agents[] \| select(.window=="WIN")'` + `gh pr view PR_NUMBER --json title,body` |
+| `ORCHESTRATOR:DONE` in output | Run `verify-complete.sh` — if it fails, re-brief with specific reason |
+
+### Strict ORCHESTRATOR:DONE gate
+
+`verify-complete.sh` handles the main checks automatically (checkpoints, threads, CI green, spawned_at, and CHANGES_REQUESTED).
+
+**CHANGES_REQUESTED staleness rule**: a `CHANGES_REQUESTED` review only blocks if it was submitted *after* the latest commit. If the latest commit postdates the review, the review is considered stale (feedback already addressed) and does not block. This keeps verification from failing spuriously when a bot reviewer hasn't re-reviewed after the agent's fixing commits. Run the script:
+
+```bash
+SKILLS_DIR=$(git rev-parse --show-toplevel)/.claude/skills/orchestrate/scripts
+bash $SKILLS_DIR/verify-complete.sh SESSION:WIN
+```
+
+If it passes → run-loop.sh will recycle the window automatically. No manual action needed.
+If it fails → re-brief the agent with the failure reason. Never manually mark state `done` to bypass this.
+
+### Re-brief a stalled agent
+
+```bash
+OBJ=$(jq -r --arg w SESSION:WIN '.agents[] | select(.window==$w) | .objective' ~/.claude/orchestrator-state.json)
+PR=$(jq -r --arg w SESSION:WIN '.agents[] | select(.window==$w) | .pr_number' ~/.claude/orchestrator-state.json)
+tmux send-keys -t SESSION:WIN "You appear stalled. Your objective: $OBJ. Check: gh pr view $PR --json title,body,headRefName to reorient."
+sleep 0.3
+tmux send-keys -t SESSION:WIN Enter
+```
+
+If `image_path` is set on the agent record, include: "Re-read context at IMAGE_PATH with the Read tool."
+
+## Self-recovery protocol (agents)
+
+spawn-agent.sh automatically includes this instruction in every objective:
+
+> If your context compacts and you lose track of what to do, run:
+> `cat ~/.claude/orchestrator-state.json | jq '.agents[] | select(.window=="SESSION:WIN")'`
+> and `gh pr view PR_NUMBER --json title,body,headRefName` to reorient.
+> Output each completed step as `CHECKPOINT:<step-name>` on its own line.
+
+## Passing images and screenshots to agents
+
+`tmux send-keys` is text-only — you cannot paste a raw image into a pane. To give an agent visual context (screenshots, diagrams, mockups):
+
+1. **Save the image to a temp file** with a stable path:
+   ```bash
+   # If the user drags in a screenshot or you receive a file path:
+   IMAGE_PATH="/tmp/orchestrator-context-$(date +%s).png"
+   cp "$USER_PROVIDED_PATH" "$IMAGE_PATH"
+   ```
+
+2. **Reference the path in the objective string**:
+   ```bash
+   OBJECTIVE="Implement the layout shown in /tmp/orchestrator-context-1234567890.png. Read that image first with the Read tool to understand the design."
+   ```
+
+3. The agent uses its `Read` tool to view the image at startup — Claude Code agents are multimodal and can read image files directly.
+
+**Rule**: always use `/tmp/orchestrator-context-<timestamp>.png` as the naming convention so the supervisor knows what to look for if it needs to re-brief an agent with the same image.
+
+---
+
+## Orchestrator final evaluation (YOU decide, not the script)
+
+`verify-complete.sh` is a gate — it blocks premature marking. But it cannot tell you if the work is actually good. That is YOUR job.
+
+When run-loop marks an agent `pending_evaluation` and you're notified, do all of these before marking done:
+
+### 1. Run /pr-test (required, serialized, use TodoWrite to queue)
+
+`/pr-test` is the only reliable confirmation that the objective is actually met. Run it yourself, not the agent.
+
+**When multiple PRs reach `pending_evaluation` at the same time, use TodoWrite to queue them:**
+```
+- [ ] /pr-test PR #12636 — fix copilot retry logic
+- [ ] /pr-test PR #12699 — builder chat panel
+```
+Run one at a time. Check off as you go.
+
+```
+/pr-test https://github.com/Significant-Gravitas/AutoGPT/pull/PR_NUMBER
+```
+
+**/pr-test can be lazy** — if it gives vague output, re-run with full context:
+
+```
+/pr-test https://github.com/OWNER/REPO/pull/PR_NUMBER
+Context: This PR implements <objective from state file>. Key files: <list>.
+Please verify: <specific behaviors to check>.
+```
+
+Only one `/pr-test` at a time — they share ports and DB.
+
+### /pr-test result evaluation
+
+**PARTIAL on any headline feature scenario is an immediate blocker.** Do not approve, do not mark done, do not let the agent output `ORCHESTRATOR:DONE`.
+
+| `/pr-test` result | Action |
+|---|---|
+| All headline scenarios **PASS** | Proceed to evaluation step 2 |
+| Any headline scenario **PARTIAL** | Re-brief the agent immediately — see below |
+| Any headline scenario **FAIL** | Re-brief the agent immediately |
+
+**What PARTIAL means**: the feature is only partly working. Example: the Apply button never appeared, or the AI returned no action blocks. The agent addressed part of the objective but not all of it.
+
+**When any headline scenario is PARTIAL or FAIL:**
+
+1. Do NOT mark the agent done or accept `ORCHESTRATOR:DONE`
+2. Re-brief the agent with the specific scenario that failed and what was missing:
+   ```bash
+   tmux send-keys -t SESSION:WIN "PARTIAL result on /pr-test — S5 (Apply button) never appeared. The AI must output JSON action blocks for the Apply button to render. Fix this before re-running /pr-test."
+   sleep 0.3
+   tmux send-keys -t SESSION:WIN Enter
+   ```
+3. Set state back to `running`:
+   ```bash
+   jq --arg w "SESSION:WIN" '(.agents[] | select(.window == $w)).state = "running"' \
+     ~/.claude/orchestrator-state.json > /tmp/orch.tmp && mv /tmp/orch.tmp ~/.claude/orchestrator-state.json
+   ```
+4. Wait for new `ORCHESTRATOR:DONE`, then re-run `/pr-test` from scratch
+
+**Rule: only ALL-PASS qualifies for approval.** A mix of PASS + PARTIAL is a failure.
+
+> **Why this matters**: PR #12699 was wrongly approved with S5 PARTIAL. The AI never output JSON action blocks, so the Apply button never appeared. The fix was already within the agent's reach but slipped through because PARTIAL was not treated as blocking.
+
+### 2. Do your own evaluation
+
+1. **Read the PR diff and objective** — does the code actually implement what was asked? Is anything obviously missing or half-done?
+2. **Read the resolved threads** — were comments addressed with real fixes, or just dismissed/resolved without changes?
+3. **Check CI run names** — any suspicious retries that shouldn't have passed?
+4. **Check the PR description** — title, summary, test plan complete?
+
+### 3. Decide
+
+- `/pr-test` all scenarios PASS + evaluation looks good → mark `done` in state, tell the user the PR is ready, ask if window should be closed
+- `/pr-test` any scenario PARTIAL or FAIL → re-brief the agent with the specific failing scenario, set state back to `running` (see `/pr-test result evaluation` above)
+- Evaluation finds gaps even with all PASS → re-brief the agent with specific gaps, set state back to `running`
+
+**Never mark done based purely on script output.** You hold the full objective context; the script does not.
+
+```bash
+# Mark done after your positive evaluation:
+jq --arg w "SESSION:WIN" '(.agents[] | select(.window == $w)).state = "done"' \
+  ~/.claude/orchestrator-state.json > /tmp/orch.tmp && mv /tmp/orch.tmp ~/.claude/orchestrator-state.json
+```
+
+## When to stop the fleet
+
+Stop the fleet (`active = false`) when **all** of the following are true:
+
+| Check | How to verify |
+|---|---|
+| All agents are `done` or `escalated` | `jq '[.agents[] \| select(.state \| test("running\|stuck\|idle\|waiting_approval"))] \| length' ~/.claude/orchestrator-state.json` == 0 |
+| All PRs have 0 unresolved review threads | GraphQL `isResolved` check per PR |
+| All PRs have green CI **on a run triggered after the agent's last push** | `gh run list --branch BRANCH --limit 1` timestamp > `spawned_at` in state |
+| No fresh CHANGES_REQUESTED (after latest commit) | `verify-complete.sh` checks this — stale pre-commit reviews are ignored |
+| No agents are `escalated` without human review | If any are escalated, surface to user first |
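
The staleness rule in the table (a `CHANGES_REQUESTED` review blocks unless it was submitted before the latest commit) might look roughly like the sketch below. It is not the code in `verify-complete.sh`: `review_blocks` is a hypothetical name, and the sketch assumes both timestamps are ISO 8601 UTC strings in the same format, so that string order matches time order.

```shell
# Hypothetical helper (the real logic lives in verify-complete.sh).
# Usage: review_blocks SUBMITTED_AT LATEST_COMMIT_AT
# Both arguments are assumed to be ISO 8601 UTC timestamps in the same format
# (for example 2024-05-01T12:00:00Z), so string order equals time order.
# Returns 0 if the CHANGES_REQUESTED review still blocks, 1 if it is stale.
review_blocks() {
  submitted_at="$1"
  latest_commit_at="$2"
  # A missing or null timestamp counts as fresh, so the check fails closed.
  if [ -z "$submitted_at" ] || [ "$submitted_at" = "null" ]; then
    return 0
  fi
  # Stale only when the review predates the latest commit.
  if expr "$submitted_at" \< "$latest_commit_at" >/dev/null; then
    return 1
  fi
  return 0
}
```

A review submitted in April followed by a commit in May is reported as stale, while a review with a null `submittedAt` is still treated as blocking.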
+
+**Do NOT stop just because agents output `ORCHESTRATOR:DONE`.** That is a signal to verify, not a signal to stop.
+
+**Do stop** if the user explicitly says "stop", "shut down", or "kill everything", even with agents still running.
+
+```bash
+# Graceful stop
+jq '.active = false' ~/.claude/orchestrator-state.json > /tmp/orch.tmp \
+  && mv /tmp/orch.tmp ~/.claude/orchestrator-state.json
+
+LOOP_WINDOW=$(jq -r '.loop_window // ""' ~/.claude/orchestrator-state.json)
+[ -n "$LOOP_WINDOW" ] && tmux kill-window -t "$LOOP_WINDOW" 2>/dev/null || true
+```
+
+A graceful stop does **not** recycle running worktrees, because agents may still be mid-task. Run `capacity.sh` to see what's still in progress.
+
+## tmux send-keys pattern
+
+**Always split long messages into text + Enter as two separate calls with a sleep between them.** If sent as one call (`"text" Enter`), Enter can fire before the full string is buffered into Claude's input — leaving the message stuck as `[Pasted text +N lines]` unsent.
+
+```bash
+# CORRECT — text then Enter separately
+tmux send-keys -t "$WINDOW" "your long message here"
+sleep 0.3
+tmux send-keys -t "$WINDOW" Enter
+
+# WRONG — Enter may fire before text is buffered
+tmux send-keys -t "$WINDOW" "your long message here" Enter
+```
+
+Short single-key sends (`y`, `Down`, empty Enter for dialog approval) are safe to combine since they have no buffering lag.
+
+---
+
+## Protected worktrees
+
+Some worktrees must **never** be used as spare worktrees for agent tasks because they host files critical to the orchestrator itself:
+
+| Worktree | Protected branch | Why |
+|---|---|---|
+| `AutoGPT1` | `dx/orchestrate-skill` | Hosts the orchestrate skill scripts. `recycle-agent.sh` would check out `spare/1`, wiping `.claude/skills/` and breaking all subsequent `spawn-agent.sh` calls. |
+
+**Rule**: when selecting spare worktrees via `find-spare.sh`, skip any worktree whose CURRENT branch matches a protected branch. If you accidentally spawn an agent in a protected worktree, do not let `recycle-agent.sh` run on it — manually restore the branch after the agent finishes.
+
+When `dx/orchestrate-skill` is merged into `dev`, `AutoGPT1` becomes a normal spare again.
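
As a rough illustration of the rule above, a candidate list in the `PATH BRANCH` format could be passed through a filter like the one below before a worktree is assigned. Everything here is an assumption for the example: `skip_protected` and the `PROTECTED_BRANCHES` variable are hypothetical and are not part of `find-spare.sh`.

```shell
# Hypothetical filter (not part of find-spare.sh).
# Reads "PATH BRANCH" lines on stdin and drops any whose branch is listed in
# PROTECTED_BRANCHES (space-separated; the variable name is an assumption).
PROTECTED_BRANCHES="${PROTECTED_BRANCHES:-dx/orchestrate-skill}"

skip_protected() {
  while IFS= read -r line; do
    [ -z "$line" ] && continue
    branch="${line##* }"   # text after the last space
    keep=1
    for protected in $PROTECTED_BRANCHES; do
      [ "$branch" = "$protected" ] && keep=0
    done
    [ "$keep" -eq 1 ] && printf '%s\n' "$line"
  done
  return 0
}
```

Piping a candidate list through `skip_protected` leaves only the worktrees whose current branch is not protected.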
+
+---
+
+## Key rules
+
+1. **Scripts do all the heavy lifting** — don't reimplement their logic inline in this file
+2. **Never ask the user to pick a worktree** — auto-assign from `find-spare.sh` output
+3. **Never restart a running agent** — only restart on `idle` kicks (foreground is a shell)
+4. **Auto-dismiss settings dialogs** — if "Enter to confirm" appears, send Down+Enter
+5. **Always `--permission-mode bypassPermissions`** on every spawn
+6. **Escalate after 3 kicks** — mark `escalated`, surface to user
+7. **Atomic state writes** — always write to `.tmp` then `mv`
+8. **Never approve destructive commands** outside the worktree scope — when in doubt, escalate
+9. **Never recycle without verification** — `verify-complete.sh` must pass before recycling
+10. **No TASK.md files** — commit risk; use state file + `gh pr view` for agent context persistence
+11. **Re-brief stalled agents** — read objective from state file + `gh pr view`, send via tmux
+12. **ORCHESTRATOR:DONE is a signal to verify, not to accept** — always run `verify-complete.sh` and check CI run timestamp before recycling
+13. **Protected worktrees** — never use the worktree hosting the skill scripts as a spare
+14. **Images via file path** — save screenshots to `/tmp/orchestrator-context-<ts>.png`, pass path in objective; agents read with the `Read` tool
+15. **Split send-keys** — always separate text and Enter with `sleep 0.3` between calls for long strings
--- a/.claude/skills/orchestrate/scripts/capacity.sh
+++ b/.claude/skills/orchestrate/scripts/capacity.sh
@@ -0,0 +1,43 @@
+#!/usr/bin/env bash
+# capacity.sh — show fleet capacity: available spare worktrees + in-use agents
+#
+# Usage: capacity.sh [REPO_ROOT]
+#   REPO_ROOT defaults to the root worktree of the current git repo.
+#
+# Reads: ~/.claude/orchestrator-state.json (skipped if missing or corrupt)
+
+set -euo pipefail
+
+SCRIPTS_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+STATE_FILE="${ORCHESTRATOR_STATE_FILE:-$HOME/.claude/orchestrator-state.json}"
+REPO_ROOT="${1:-$(git rev-parse --show-toplevel 2>/dev/null || echo "")}"
+
+echo "=== Available (spare) worktrees ==="
+if [ -n "$REPO_ROOT" ]; then
+  SPARE=$("$SCRIPTS_DIR/find-spare.sh" "$REPO_ROOT" 2>/dev/null || echo "")
+else
+  SPARE=$("$SCRIPTS_DIR/find-spare.sh" 2>/dev/null || echo "")
+fi
+
+if [ -z "$SPARE" ]; then
+  echo "  (none)"
+else
+  while IFS= read -r line; do
+    [ -z "$line" ] && continue
+    echo "  ✓ $line"
+  done <<< "$SPARE"
+fi
+
+echo ""
+echo "=== In-use worktrees ==="
+if [ -f "$STATE_FILE" ] && jq -e '.' "$STATE_FILE" >/dev/null 2>&1; then
+  IN_USE=$(jq -r '.agents[] | select(.state != "done") | "  [\(.state)] \(.worktree_path) → \(.branch)"' \
+    "$STATE_FILE" 2>/dev/null || echo "")
+  if [ -n "$IN_USE" ]; then
+    echo "$IN_USE"
+  else
+    echo "  (none)"
+  fi
+else
+  echo "  (no active state file)"
+fi
--- a/.claude/skills/orchestrate/scripts/classify-pane.sh
+++ b/.claude/skills/orchestrate/scripts/classify-pane.sh
@@ -0,0 +1,85 @@
+#!/usr/bin/env bash
+# classify-pane.sh — Classify the current state of a tmux pane
+#
+# Usage: classify-pane.sh <tmux-target>
+#   tmux-target: e.g. "work:0", "work:1.0"
+#
+# Output (stdout): JSON object:
+#   { "state": "running|idle|waiting_approval|complete|error", "reason": "...", "pane_cmd": "..." }
+#
+# Exit codes: 0=ok, 1=error (invalid target or tmux window not found)
+
+set -euo pipefail
+
+TARGET="${1:-}"
+
+if [ -z "$TARGET" ]; then
+  echo '{"state":"error","reason":"no target provided","pane_cmd":""}'
+  exit 1
+fi
+
+# Validate tmux target format: session:window or session:window.pane
+if ! [[ "$TARGET" =~ ^[a-zA-Z0-9_.-]+:[a-zA-Z0-9_.-]+(\.[0-9]+)?$ ]]; then
+  echo '{"state":"error","reason":"invalid tmux target format","pane_cmd":""}'
+  exit 1
+fi
+
+# Check that the session exists (%%:* strips the ":window" suffix to leave the session name)
+if ! tmux list-windows -t "${TARGET%%:*}" &>/dev/null; then
+  echo '{"state":"error","reason":"tmux target not found","pane_cmd":""}'
+  exit 1
+fi
+
+# Get the current foreground command in the pane
+PANE_CMD=$(tmux display-message -t "$TARGET" -p '#{pane_current_command}' 2>/dev/null || echo "unknown")
+
+# Capture and strip ANSI codes (use perl for cross-platform compatibility — BSD sed lacks \x1b support)
+RAW=$(tmux capture-pane -t "$TARGET" -p -S -50 2>/dev/null || echo "")
+CLEAN=$(echo "$RAW" | perl -pe 's/\x1b\[[0-9;]*[a-zA-Z]//g; s/\x1b\(B//g; s/\x1b\[\?[0-9]*[hl]//g; s/\r//g' \
+  | grep -v '^[[:space:]]*$' || true)
+
+# --- Check: explicit completion marker ---
+# Must be on its own line (not buried in the objective text sent at spawn time).
+if echo "$CLEAN" | grep -qE "^[[:space:]]*ORCHESTRATOR:DONE[[:space:]]*$"; then
+  jq -n --arg cmd "$PANE_CMD" '{"state":"complete","reason":"ORCHESTRATOR:DONE marker found","pane_cmd":$cmd}'
+  exit 0
+fi
+
+# --- Check: Claude Code approval prompt patterns ---
+LAST_40=$(echo "$CLEAN" | tail -40)
+APPROVAL_PATTERNS=(
+  "Do you want to proceed"
+  "Do you want to make this"
+  "\\[y/n\\]"
+  "\\[Y/n\\]"
+  "\\[n/Y\\]"
+  "Proceed\\?"
+  "Allow this command"
+  "Run bash command"
+  "Allow bash"
+  "Would you like"
+  "Press enter to continue"
+  "Esc to cancel"
+)
+for pattern in "${APPROVAL_PATTERNS[@]}"; do
+  if echo "$LAST_40" | grep -qiE "$pattern"; then
+    jq -n --arg pattern "$pattern" --arg cmd "$PANE_CMD" \
+      '{"state":"waiting_approval","reason":"approval pattern: \($pattern)","pane_cmd":$cmd}'
+    exit 0
+  fi
+done
+
+# --- Check: shell prompt (claude has exited) ---
+# If the foreground process is a shell (not claude/node), the agent has exited
+case "$PANE_CMD" in
+  zsh|bash|fish|sh|dash|tcsh|ksh)
+    jq -n --arg cmd "$PANE_CMD" \
+      '{"state":"idle","reason":"agent exited — shell prompt active","pane_cmd":$cmd}'
+    exit 0
+    ;;
+esac
+
+# Agent is still running (claude/node/python is the foreground process)
+jq -n --arg cmd "$PANE_CMD" \
+  '{"state":"running","reason":"foreground process: \($cmd)","pane_cmd":$cmd}'
+exit 0
--- a/.claude/skills/orchestrate/scripts/find-spare.sh
+++ b/.claude/skills/orchestrate/scripts/find-spare.sh
@@ -0,0 +1,24 @@
+#!/usr/bin/env bash
+# find-spare.sh — list worktrees on spare/N branches (free to use)
+#
+# Usage: find-spare.sh [REPO_ROOT]
+#   REPO_ROOT defaults to the root worktree containing the current git repo.
+#
+# Output (stdout): one line per available worktree: "PATH BRANCH"
+#   e.g.: /Users/me/Code/AutoGPT3 spare/3
+
+set -euo pipefail
+
+REPO_ROOT="${1:-$(git rev-parse --show-toplevel 2>/dev/null || echo "")}"
+if [ -z "$REPO_ROOT" ]; then
+  echo "Error: not inside a git repo and no REPO_ROOT provided" >&2
+  exit 1
+fi
+
+git -C "$REPO_ROOT" worktree list --porcelain \
+  | awk '
+      /^worktree / { path = substr($0, 10) }
+      /^branch /   { branch = substr($0, 8); print path " " branch }
+    ' \
+  | { grep -E " refs/heads/spare/[0-9]+$" || true; } \
+  | sed 's|refs/heads/||'
--- a/.claude/skills/orchestrate/scripts/notify.sh
+++ b/.claude/skills/orchestrate/scripts/notify.sh
@@ -0,0 +1,40 @@
+#!/usr/bin/env bash
+# notify.sh — send a fleet notification message
+#
+# Delivery channels (every available channel is used, not just the first):
+#   1. Discord webhook: DISCORD_WEBHOOK_URL env var OR state file .discord_webhook
+#   2. macOS notification center: osascript (silent fail if unavailable)
+#   3. Stdout (always)
+#
+# Usage: notify.sh MESSAGE
+# Exit: always 0 (notification failure must not abort the caller)
+
+MESSAGE="${1:-}"
+[ -z "$MESSAGE" ] && exit 0
+
+STATE_FILE="${ORCHESTRATOR_STATE_FILE:-$HOME/.claude/orchestrator-state.json}"
+
+# --- Resolve Discord webhook ---
+WEBHOOK="${DISCORD_WEBHOOK_URL:-}"
+if [ -z "$WEBHOOK" ] && [ -f "$STATE_FILE" ]; then
+  WEBHOOK=$(jq -r '.discord_webhook // ""' "$STATE_FILE" 2>/dev/null || echo "")
+fi
+
+# --- Discord delivery ---
+if [ -n "$WEBHOOK" ]; then
+  PAYLOAD=$(jq -n --arg msg "$MESSAGE" '{"content": $msg}')
+  curl -s -X POST "$WEBHOOK" \
+    -H "Content-Type: application/json" \
+    -d "$PAYLOAD" > /dev/null 2>&1 || true
+fi
+
+# --- macOS notification center (silent if not macOS or osascript missing) ---
+if command -v osascript &>/dev/null; then
+  # Escape backslashes and double quotes, since the message is embedded in a
+  # double-quoted AppleScript string
+  SAFE_MSG=$(printf '%s' "$MESSAGE" | sed 's/\\/\\\\/g; s/"/\\"/g')
+  osascript -e "display notification \"${SAFE_MSG}\" with title \"Orchestrator\"" 2>/dev/null || true
+fi
+
+# Always print to stdout so run-loop.sh logs it
+echo "$MESSAGE"
+exit 0
--- a/.claude/skills/orchestrate/scripts/poll-cycle.sh
+++ b/.claude/skills/orchestrate/scripts/poll-cycle.sh
@@ -0,0 +1,257 @@
+#!/usr/bin/env bash
+# poll-cycle.sh — Single orchestrator poll cycle
+#
+# Reads ~/.claude/orchestrator-state.json, classifies each agent, updates state,
+# and outputs a JSON array of actions for Claude to take.
+#
+# Usage: poll-cycle.sh
+# Output (stdout): JSON array of action objects
+#   [{ "window": "work:0", "action": "kick|approve|complete", "state": "...",
+#      "worktree": "...", "objective": "...", "reason": "..." }]
+#
+# The state file is updated in-place (atomic write via .tmp).
+
+set -euo pipefail
+
+STATE_FILE="${ORCHESTRATOR_STATE_FILE:-$HOME/.claude/orchestrator-state.json}"
+SCRIPTS_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+CLASSIFY="$SCRIPTS_DIR/classify-pane.sh"
+
+# Cross-platform md5: always outputs just the hex digest
+md5_hash() {
+  if command -v md5sum &>/dev/null; then
+    md5sum | awk '{print $1}'
+  else
+    md5 | awk '{print $NF}'
+  fi
+}
+
+# Clean up temp file on any exit (avoids stale .tmp if jq write fails)
+trap 'rm -f "${STATE_FILE}.tmp"' EXIT
+
+# Ensure state file exists
+if [ ! -f "$STATE_FILE" ]; then
+  echo '{"active":false,"agents":[]}' > "$STATE_FILE"
+fi
+
+# Validate JSON upfront before any jq reads that run under set -e.
+# A truncated/corrupt file (e.g. from a SIGKILL mid-write) would otherwise
+# abort the script at the ACTIVE read below without emitting any JSON output.
+if ! jq -e '.' "$STATE_FILE" >/dev/null 2>&1; then
+  echo "State file parse error — check $STATE_FILE" >&2
+  echo "[]"
+  exit 0
+fi
+
+ACTIVE=$(jq -r '.active // false' "$STATE_FILE")
+if [ "$ACTIVE" != "true" ]; then
+  echo "[]"
+  exit 0
+fi
+
+NOW=$(date +%s)
+IDLE_THRESHOLD=$(jq -r '.idle_threshold_seconds // 300' "$STATE_FILE")
+
+ACTIONS="[]"
+UPDATED_AGENTS="[]"
+
+# Read agents as newline-delimited JSON objects.
+# jq exits non-zero when .agents[] has no matches on an empty array, which is valid —
+# so we suppress that exit code and separately validate the file is well-formed JSON.
+if ! AGENTS_JSON=$(jq -e -c '.agents // empty | .[]' "$STATE_FILE" 2>/dev/null); then
+  if ! jq -e '.' "$STATE_FILE" > /dev/null 2>&1; then
+    echo "State file parse error — check $STATE_FILE" >&2
+  fi
+  echo "[]"
+  exit 0
+fi
+
+if [ -z "$AGENTS_JSON" ]; then
+  echo "[]"
+  exit 0
+fi
+
+while IFS= read -r agent; do
+  [ -z "$agent" ] && continue
+
+  # Use // "" defaults so a single malformed field doesn't abort the whole cycle
+  WINDOW=$(echo "$agent"   | jq -r '.window // ""')
+  WORKTREE=$(echo "$agent" | jq -r '.worktree // ""')
+  OBJECTIVE=$(echo "$agent"| jq -r '.objective // ""')
+  STATE=$(echo "$agent"    | jq -r '.state // "running"')
+  LAST_HASH=$(echo "$agent"| jq -r '.last_output_hash // ""')
+  IDLE_SINCE=$(echo "$agent"| jq -r '.idle_since // 0')
+  REVISION_COUNT=$(echo "$agent"| jq -r '.revision_count // 0')
+
+  # Validate window format to prevent tmux target injection.
+  # Allow session:window (numeric or named) and session:window.pane
+  if ! [[ "$WINDOW" =~ ^[a-zA-Z0-9_.-]+:[a-zA-Z0-9_.-]+(\.[0-9]+)?$ ]]; then
+    echo "Skipping agent with invalid window value: $WINDOW" >&2
+    UPDATED_AGENTS=$(echo "$UPDATED_AGENTS" | jq --argjson a "$agent" '. + [$a]')
+    continue
+  fi
+
+  # Pass-through terminal-state agents
+  if [[ "$STATE" == "done" || "$STATE" == "escalated" || "$STATE" == "complete" || "$STATE" == "pending_evaluation" ]]; then
+    UPDATED_AGENTS=$(echo "$UPDATED_AGENTS" | jq --argjson a "$agent" '. + [$a]')
+    continue
+  fi
+
+  # Classify pane.
+  # classify-pane.sh always emits JSON before exit (even on error), so using
+  # "|| echo '...'" would concatenate two JSON objects when it exits non-zero.
+  # Use "|| true" inside the substitution so set -euo pipefail does not abort
+  # the poll cycle when classify exits with a non-zero status code.
+  CLASSIFICATION=$("$CLASSIFY" "$WINDOW" 2>/dev/null || true)
+  [ -z "$CLASSIFICATION" ] && CLASSIFICATION='{"state":"error","reason":"classify failed","pane_cmd":"unknown"}'
+
+  PANE_STATE=$(echo "$CLASSIFICATION" | jq -r '.state')
+  PANE_REASON=$(echo "$CLASSIFICATION" | jq -r '.reason')
+
+  # Capture full pane output once — used for hash (stuck detection) and checkpoint parsing.
+  # Use -S -500 to get the last ~500 lines of scrollback so checkpoints aren't missed.
+  RAW=$(tmux capture-pane -t "$WINDOW" -p -S -500 2>/dev/null || echo "")
+
+  # --- Checkpoint tracking ---
+  # Parse any "CHECKPOINT:<step>" lines the agent has output and merge into state file.
+  # The agent writes these as it completes each required step so verify-complete.sh can gate recycling.
+  EXISTING_CPS=$(echo "$agent" | jq -c '.checkpoints // []')
+  NEW_CHECKPOINTS_JSON="$EXISTING_CPS"
+  if [ -n "$RAW" ]; then
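+    # grep exits 1 when nothing matches; "|| true" keeps pipefail from firing the
+    # fallback echo, which would append a second "[]" to jq's output.
+    FOUND_CPS=$(echo "$RAW" \
+      | { grep -oE "CHECKPOINT:[a-zA-Z0-9_-]+" || true; } \
+      | sed 's/CHECKPOINT://' \
+      | sort -u \
+      | jq -R . | jq -s . 2>/dev/null || echo "[]")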
+    NEW_CHECKPOINTS_JSON=$(jq -n \
+      --argjson existing "$EXISTING_CPS" \
+      --argjson found "$FOUND_CPS" \
+      '($existing + $found) | unique' 2>/dev/null || echo "$EXISTING_CPS")
+  fi
+
+  # Compute content hash for stuck-detection (only for running agents)
+  CURRENT_HASH=""
+  if [[ "$PANE_STATE" == "running" ]] && [ -n "$RAW" ]; then
+    CURRENT_HASH=$(echo "$RAW" | tail -20 | md5_hash)
+  fi
+
+  NEW_STATE="$STATE"
+  NEW_IDLE_SINCE="$IDLE_SINCE"
+  NEW_REVISION_COUNT="$REVISION_COUNT"
+  ACTION="none"
+  REASON="$PANE_REASON"
+
+  case "$PANE_STATE" in
+    complete)
+      # Agent output ORCHESTRATOR:DONE — mark pending_evaluation so orchestrator handles it.
+      # run-loop does NOT verify or notify; orchestrator's background poll picks this up.
+      NEW_STATE="pending_evaluation"
+      ACTION="complete"  # run-loop logs it but takes no action
+      ;;
+    waiting_approval)
+      NEW_STATE="waiting_approval"
+      ACTION="approve"
+      ;;
+    idle)
+      # Agent process has exited — needs restart
+      NEW_STATE="idle"
+      ACTION="kick"
+      REASON="agent exited (shell is foreground)"
+      NEW_REVISION_COUNT=$(( REVISION_COUNT + 1 ))
+      NEW_IDLE_SINCE=$NOW
+      if [ "$NEW_REVISION_COUNT" -ge 3 ]; then
+        NEW_STATE="escalated"
+        ACTION="none"
+        REASON="escalated after ${NEW_REVISION_COUNT} kicks — needs human attention"
+      fi
+      ;;
+    running)
+      # Clear idle_since only when transitioning from idle (agent was kicked and
+      # restarted). Do NOT reset for stuck — idle_since must persist across polls
+      # so STUCK_DURATION can accumulate and trigger escalation.
+      # Also update the local IDLE_SINCE so the hash-stability check below uses
+      # the reset value on this same poll, not the stale kick timestamp.
+      if [[ "$STATE" == "idle" ]]; then
+        NEW_IDLE_SINCE=0
+        IDLE_SINCE=0
+      fi
+      # Check if hash has been stable (agent may be stuck mid-task)
+      if [ -n "$CURRENT_HASH" ] && [ "$CURRENT_HASH" = "$LAST_HASH" ] && [ "$LAST_HASH" != "" ]; then
+        if [ "$IDLE_SINCE" = "0" ] || [ "$IDLE_SINCE" = "null" ]; then
+          NEW_IDLE_SINCE=$NOW
+        else
+          STUCK_DURATION=$(( NOW - IDLE_SINCE ))
+          if [ "$STUCK_DURATION" -gt "$IDLE_THRESHOLD" ]; then
+            NEW_REVISION_COUNT=$(( REVISION_COUNT + 1 ))
+            NEW_IDLE_SINCE=$NOW
+            if [ "$NEW_REVISION_COUNT" -ge 3 ]; then
+              NEW_STATE="escalated"
+              ACTION="none"
+              REASON="escalated after ${NEW_REVISION_COUNT} kicks — needs human attention"
+            else
+              NEW_STATE="stuck"
+              ACTION="kick"
+              REASON="output unchanged for ${STUCK_DURATION}s (threshold: ${IDLE_THRESHOLD}s)"
+            fi
+          fi
+        fi
+      else
+        # Only reset the idle timer when we have a valid hash comparison (pane
+        # capture succeeded). If CURRENT_HASH is empty (tmux capture-pane failed),
+        # preserve existing timers so stuck detection is not inadvertently reset.
+        if [ -n "$CURRENT_HASH" ]; then
+          NEW_STATE="running"
+          NEW_IDLE_SINCE=0
+        fi
+      fi
+      ;;
+    error)
+      REASON="classify error: $PANE_REASON"
+      ;;
+  esac
+
+  # Build updated agent record (ensure idle_since and revision_count are numeric)
+  # If jq fails on a malformed field, keep the original record for this agent
+  # rather than aborting the entire poll cycle under set -e.
+  UPDATED_AGENT=$(echo "$agent" | jq \
+    --arg state "$NEW_STATE" \
+    --arg hash "$CURRENT_HASH" \
+    --argjson now "$NOW" \
+    --arg idle_since "$NEW_IDLE_SINCE" \
+    --arg revision_count "$NEW_REVISION_COUNT" \
+    --argjson checkpoints "$NEW_CHECKPOINTS_JSON" \
+    '.state = $state
+     | .last_output_hash = (if $hash == "" then .last_output_hash else $hash end)
+     | .last_seen_at = $now
+     | .idle_since = ($idle_since | tonumber)
+     | .revision_count = ($revision_count | tonumber)
+     | .checkpoints = $checkpoints' 2>/dev/null) || {
+    echo "Warning: failed to build updated agent for window $WINDOW — keeping original" >&2
+    UPDATED_AGENTS=$(echo "$UPDATED_AGENTS" | jq --argjson a "$agent" '. + [$a]')
+    continue
+  }
+
+  UPDATED_AGENTS=$(echo "$UPDATED_AGENTS" | jq --argjson a "$UPDATED_AGENT" '. + [$a]')
+
+  # Add action if needed
+  if [ "$ACTION" != "none" ]; then
+    ACTION_OBJ=$(jq -n \
+      --arg window "$WINDOW" \
+      --arg action "$ACTION" \
+      --arg state "$NEW_STATE" \
+      --arg worktree "$WORKTREE" \
+      --arg objective "$OBJECTIVE" \
+      --arg reason "$REASON" \
+      '{window:$window, action:$action, state:$state, worktree:$worktree, objective:$objective, reason:$reason}')
+    ACTIONS=$(echo "$ACTIONS" | jq --argjson a "$ACTION_OBJ" '. + [$a]')
+  fi
+
+done <<< "$AGENTS_JSON"
+
+# Atomic state file update
+jq --argjson agents "$UPDATED_AGENTS" \
+   --argjson now "$NOW" \
+   '.agents = $agents | .last_poll_at = $now' \
+   "$STATE_FILE" > "${STATE_FILE}.tmp" && mv "${STATE_FILE}.tmp" "$STATE_FILE"
+
+echo "$ACTIONS"
--- a/.claude/skills/orchestrate/scripts/recycle-agent.sh
+++ b/.claude/skills/orchestrate/scripts/recycle-agent.sh
@@ -0,0 +1,32 @@
+#!/usr/bin/env bash
+# recycle-agent.sh — kill a tmux window and restore the worktree to its spare branch
+#
+# Usage: recycle-agent.sh WINDOW WORKTREE_PATH SPARE_BRANCH
+#   WINDOW        — tmux target, e.g. autogpt1:3
+#   WORKTREE_PATH — absolute path to the git worktree
+#   SPARE_BRANCH  — branch to restore, e.g. spare/6
+#
+# Stdout: one status line
+
+set -euo pipefail
+
+if [ $# -lt 3 ]; then
+  echo "Usage: recycle-agent.sh WINDOW WORKTREE_PATH SPARE_BRANCH" >&2
+  exit 1
+fi
+
+WINDOW="$1"
+WORKTREE_PATH="$2"
+SPARE_BRANCH="$3"
+
+# Kill the tmux window (ignore error — may already be gone)
+tmux kill-window -t "$WINDOW" 2>/dev/null || true
+
+# Restore to spare branch: abort any in-progress operation, then clean
+git -C "$WORKTREE_PATH" rebase --abort 2>/dev/null || true
+git -C "$WORKTREE_PATH" merge --abort 2>/dev/null || true
+git -C "$WORKTREE_PATH" reset --hard HEAD 2>/dev/null
+git -C "$WORKTREE_PATH" clean -fd 2>/dev/null
+git -C "$WORKTREE_PATH" checkout "$SPARE_BRANCH"
+
+echo "Recycled: $(basename "$WORKTREE_PATH") → $SPARE_BRANCH (window $WINDOW closed)"
--- a/.claude/skills/orchestrate/scripts/run-loop.sh
+++ b/.claude/skills/orchestrate/scripts/run-loop.sh
@@ -0,0 +1,164 @@
+#!/usr/bin/env bash
+# run-loop.sh — Mechanical babysitter for the agent fleet (runs in its own tmux window)
+#
+# Handles ONLY two things that need no intelligence:
+#   idle    → restart claude using --resume SESSION_ID (or --continue) to restore context
+#   approve → auto-approve safe dialogs, press Enter on numbered-option dialogs
+#
+# Everything else — ORCHESTRATOR:DONE, verification, /pr-test, final evaluation,
+# marking done, deciding to close windows — is the orchestrating Claude's job.
+# poll-cycle.sh sets state to pending_evaluation when ORCHESTRATOR:DONE is detected;
+# the orchestrator's background poll loop handles it from there.
+#
+# Usage: run-loop.sh
+# Env:   POLL_INTERVAL (default: 30), ORCHESTRATOR_STATE_FILE
+
+set -euo pipefail
+
+# Copy scripts to a stable location outside the repo so they survive branch
+# checkouts (e.g. recycle-agent.sh switching spare/N back into this worktree
+# would wipe .claude/skills/orchestrate/scripts if the skill only exists on the
+# current branch).
+_ORIGIN_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+STABLE_SCRIPTS_DIR="$HOME/.claude/orchestrator/scripts"
+mkdir -p "$STABLE_SCRIPTS_DIR"
+cp "$_ORIGIN_DIR"/*.sh "$STABLE_SCRIPTS_DIR/"
+chmod +x "$STABLE_SCRIPTS_DIR"/*.sh
+SCRIPTS_DIR="$STABLE_SCRIPTS_DIR"
+
+STATE_FILE="${ORCHESTRATOR_STATE_FILE:-$HOME/.claude/orchestrator-state.json}"
+POLL_INTERVAL="${POLL_INTERVAL:-30}"
+
+# ---------------------------------------------------------------------------
+# update_state WINDOW FIELD VALUE
+# ---------------------------------------------------------------------------
+update_state() {
+  local window="$1" field="$2" value="$3"
+  jq --arg w "$window" --arg f "$field" --arg v "$value" \
+    '.agents |= map(if .window == $w then .[$f] = $v else . end)' \
+    "$STATE_FILE" > "${STATE_FILE}.tmp" && mv "${STATE_FILE}.tmp" "$STATE_FILE"
+}
+
+update_state_int() {
+  local window="$1" field="$2" value="$3"
+  jq --arg w "$window" --arg f "$field" --argjson v "$value" \
+    '.agents |= map(if .window == $w then .[$f] = $v else . end)' \
+    "$STATE_FILE" > "${STATE_FILE}.tmp" && mv "${STATE_FILE}.tmp" "$STATE_FILE"
+}
+
+agent_field() {
+  jq -r --arg w "$1" --arg f "$2" \
+    '.agents[] | select(.window == $w) | .[$f] // ""' \
+    "$STATE_FILE" 2>/dev/null
+}
+
+# ---------------------------------------------------------------------------
+# wait_for_prompt WINDOW — wait up to 60s for Claude's ❯ prompt
+# ---------------------------------------------------------------------------
+wait_for_prompt() {
+  local window="$1"
+  for i in $(seq 1 60); do
+    local cmd pane
+    cmd=$(tmux display-message -t "$window" -p '#{pane_current_command}' 2>/dev/null || echo "")
+    pane=$(tmux capture-pane -t "$window" -p 2>/dev/null || echo "")
+    if echo "$pane" | grep -q "Enter to confirm"; then
+      tmux send-keys -t "$window" Down Enter; sleep 2; continue
+    fi
+    [[ "$cmd" == "node" ]] && echo "$pane" | grep -q "❯" && return 0
+    sleep 1
+  done
+  return 1  # timed out
+}
+
+# ---------------------------------------------------------------------------
+# handle_kick WINDOW STATE — only for idle (crashed) agents, not stuck
+# ---------------------------------------------------------------------------
+handle_kick() {
+  local window="$1" state="$2"
+  [[ "$state" != "idle" ]] && return  # stuck agents handled by supervisor
+
+  local worktree_path session_id
+  worktree_path=$(agent_field "$window" "worktree_path")
+  session_id=$(agent_field "$window" "session_id")
+
+  echo "[$(date +%H:%M:%S)] KICK restart  $window — agent exited, resuming session"
+
+  # Resume the exact session so the agent retains full context — no need to re-send objective
+  if [ -n "$session_id" ]; then
+    tmux send-keys -t "$window" "cd '${worktree_path}' && claude --resume '${session_id}' --permission-mode bypassPermissions" Enter
+  else
+    tmux send-keys -t "$window" "cd '${worktree_path}' && claude --continue --permission-mode bypassPermissions" Enter
+  fi
+
+  wait_for_prompt "$window" || echo "[$(date +%H:%M:%S)] KICK WARNING  $window — timed out waiting for ❯"
+}
+
+# ---------------------------------------------------------------------------
+# handle_approve WINDOW — auto-approve dialogs that need no judgment
+# ---------------------------------------------------------------------------
+handle_approve() {
+  local window="$1"
+  local pane_tail
+  pane_tail=$(tmux capture-pane -t "$window" -p 2>/dev/null | tail -3 || echo "")
+
+  # Settings error dialog at startup
+  if echo "$pane_tail" | grep -q "Enter to confirm"; then
+    echo "[$(date +%H:%M:%S)] APPROVE dialog $window — settings error"
+    tmux send-keys -t "$window" Down Enter
+    return
+  fi
+
+  # Numbered-option dialog (e.g. "Do you want to make this edit?")
+  # ❯ is already on option 1 (Yes) — Enter confirms it
+  if echo "$pane_tail" | grep -qE "❯[[:space:]]*1\." || echo "$pane_tail" | grep -q "Esc to cancel"; then
+    echo "[$(date +%H:%M:%S)] APPROVE edit   $window"
+    tmux send-keys -t "$window" "" Enter
+    return
+  fi
+
+  # y/n prompt for safe operations
+  if echo "$pane_tail" | grep -qiE "(^git |^npm |^pnpm |^poetry |^pytest|^docker |^make |^cargo |^pip |^yarn |curl .*(localhost|127\.0\.0\.1))"; then
+    echo "[$(date +%H:%M:%S)] APPROVE safe   $window"
+    tmux send-keys -t "$window" "y" Enter
+    return
+  fi
+
+  # Anything else — supervisor handles it, just log
+  echo "[$(date +%H:%M:%S)] APPROVE skip   $window — unknown dialog, supervisor will handle"
+}
+
+# ---------------------------------------------------------------------------
+# Main loop
+# ---------------------------------------------------------------------------
+echo "[$(date +%H:%M:%S)] run-loop started (mechanical only, poll every ${POLL_INTERVAL}s)"
+echo "[$(date +%H:%M:%S)] Supervisor: orchestrating Claude session (not a separate window)"
+echo "---"
+
+while true; do
+  if ! jq -e '.active == true' "$STATE_FILE" >/dev/null 2>&1; then
+    echo "[$(date +%H:%M:%S)] active=false — exiting."
+    exit 0
+  fi
+
+  ACTIONS=$("$SCRIPTS_DIR/poll-cycle.sh" 2>/dev/null || echo "[]")
+  KICKED=0; DONE=0
+
+  while IFS= read -r action; do
+    [ -z "$action" ] && continue
+    WINDOW=$(echo "$action" | jq -r '.window // ""')
+    ACTION=$(echo "$action" | jq -r '.action // ""')
+    STATE=$(echo "$action"  | jq -r '.state // ""')
+
+    case "$ACTION" in
+      kick)     handle_kick "$WINDOW" "$STATE" || true; KICKED=$(( KICKED + 1 )) ;;
+      approve)  handle_approve "$WINDOW" || true ;;
+      complete) DONE=$(( DONE + 1 )) ;;  # poll-cycle already set state=pending_evaluation; orchestrator handles
+    esac
+  done < <(echo "$ACTIONS" | jq -c '.[]' 2>/dev/null || true)
+
+  RUNNING=$(jq '[.agents[] | select(.state | test("running|stuck|waiting_approval|idle"))] | length' \
+    "$STATE_FILE" 2>/dev/null || echo 0)
+
+  echo "[$(date +%H:%M:%S)] Poll — ${RUNNING} running  ${KICKED} kicked  ${DONE} pending evaluation"
+  sleep "$POLL_INTERVAL"
+done
--- a/.claude/skills/orchestrate/scripts/spawn-agent.sh
+++ b/.claude/skills/orchestrate/scripts/spawn-agent.sh
@@ -0,0 +1,122 @@
+#!/usr/bin/env bash
+# spawn-agent.sh — create tmux window, checkout branch, launch claude, send task
+#
+# Usage: spawn-agent.sh SESSION WORKTREE_PATH SPARE_BRANCH NEW_BRANCH OBJECTIVE [PR_NUMBER] [STEPS...]
+#   SESSION       — tmux session name, e.g. autogpt1
+#   WORKTREE_PATH — absolute path to the git worktree
+#   SPARE_BRANCH  — spare branch being replaced, e.g. spare/6 (saved for recycle)
+#   NEW_BRANCH    — task branch to create, e.g. feat/my-feature
+#   OBJECTIVE     — task description sent to the agent
+#   PR_NUMBER     — (optional) GitHub PR number for completion verification
+#   STEPS...      — (optional) required checkpoint names, e.g. pr-address pr-test
+#
+# Stdout: SESSION:WINDOW_INDEX (nothing else — callers rely on this)
+# Exit non-zero on failure.
+
+set -euo pipefail
+
+if [ $# -lt 5 ]; then
+  echo "Usage: spawn-agent.sh SESSION WORKTREE_PATH SPARE_BRANCH NEW_BRANCH OBJECTIVE [PR_NUMBER] [STEPS...]" >&2
+  exit 1
+fi
+
+SESSION="$1"
+WORKTREE_PATH="$2"
+SPARE_BRANCH="$3"
+NEW_BRANCH="$4"
+OBJECTIVE="$5"
+PR_NUMBER="${6:-}"
+STEPS=("${@:7}")
+WORKTREE_NAME=$(basename "$WORKTREE_PATH")
+STATE_FILE="${ORCHESTRATOR_STATE_FILE:-$HOME/.claude/orchestrator-state.json}"
+
+# Generate a stable session ID so this agent's Claude session can always be resumed:
+#   claude --resume $SESSION_ID --permission-mode bypassPermissions
+SESSION_ID=$(uuidgen 2>/dev/null || python3 -c "import uuid; print(uuid.uuid4())")
+
+# Create (or switch to) the task branch
+git -C "$WORKTREE_PATH" checkout -b "$NEW_BRANCH" 2>/dev/null \
+  || git -C "$WORKTREE_PATH" checkout "$NEW_BRANCH"
+
+# Open a new named tmux window; capture its numeric index
+WIN_IDX=$(tmux new-window -t "$SESSION" -n "$WORKTREE_NAME" -P -F '#{window_index}')
+WINDOW="${SESSION}:${WIN_IDX}"
+
+# Append the initial agent record to the state file so subsequent jq updates find it.
+# This must happen before the pr_number/steps update below.
+if [ -f "$STATE_FILE" ]; then
+  NOW=$(date +%s)
+  jq --arg window "$WINDOW" \
+     --arg worktree "$WORKTREE_NAME" \
+     --arg worktree_path "$WORKTREE_PATH" \
+     --arg spare_branch "$SPARE_BRANCH" \
+     --arg branch "$NEW_BRANCH" \
+     --arg objective "$OBJECTIVE" \
+     --arg session_id "$SESSION_ID" \
+     --argjson now "$NOW" \
+     '.agents += [{
+       "window": $window,
+       "worktree": $worktree,
+       "worktree_path": $worktree_path,
+       "spare_branch": $spare_branch,
+       "branch": $branch,
+       "objective": $objective,
+       "session_id": $session_id,
+       "state": "running",
+       "checkpoints": [],
+       "last_output_hash": "",
+       "last_seen_at": $now,
+       "spawned_at": $now,
+       "idle_since": 0,
+       "revision_count": 0,
+       "last_rebriefed_at": 0
+     }]' "$STATE_FILE" > "${STATE_FILE}.tmp" && mv "${STATE_FILE}.tmp" "$STATE_FILE"
+fi
+
+# Store pr_number + steps in state file if provided (enables verify-complete.sh).
+# The agent record was appended above so the jq select now finds it.
+if [ -n "$PR_NUMBER" ] && [ -f "$STATE_FILE" ]; then
+  if [ "${#STEPS[@]}" -gt 0 ]; then
+    STEPS_JSON=$(printf '%s\n' "${STEPS[@]}" | jq -R . | jq -s .)
+  else
+    STEPS_JSON='[]'
+  fi
+  jq --arg w "$WINDOW" --arg pr "$PR_NUMBER" --argjson steps "$STEPS_JSON" \
+    '.agents |= map(if .window == $w then . + {pr_number: $pr, steps: $steps, checkpoints: []} else . end)' \
+    "$STATE_FILE" > "${STATE_FILE}.tmp" && mv "${STATE_FILE}.tmp" "$STATE_FILE"
+fi
+
+# Launch claude with a stable session ID so it can always be resumed after a crash:
+#   claude --resume SESSION_ID --permission-mode bypassPermissions
+tmux send-keys -t "$WINDOW" "cd '${WORKTREE_PATH}' && claude --permission-mode bypassPermissions --session-id '${SESSION_ID}'" Enter
+
+# Wait up to 60s for claude to be fully interactive:
+# both pane_current_command == 'node' AND the '❯' prompt is visible.
+PROMPT_FOUND=false
+for i in $(seq 1 60); do
+  CMD=$(tmux display-message -t "$WINDOW" -p '#{pane_current_command}' 2>/dev/null || echo "")
+  PANE=$(tmux capture-pane -t "$WINDOW" -p 2>/dev/null || echo "")
+  if echo "$PANE" | grep -q "Enter to confirm"; then
+    tmux send-keys -t "$WINDOW" Down Enter
+    sleep 2
+    continue
+  fi
+  if [[ "$CMD" == "node" ]] && echo "$PANE" | grep -q "❯"; then
+    PROMPT_FOUND=true
+    break
+  fi
+  sleep 1
+done
+
+if ! $PROMPT_FOUND; then
+  echo "[spawn-agent] WARNING: timed out waiting for ❯ prompt on $WINDOW — sending objective anyway" >&2
+fi
+
+# Send the task. Split text and Enter — if combined, Enter can fire before the string
+# is fully buffered, leaving the message stuck as "[Pasted text +N lines]" unsent.
+tmux send-keys -t "$WINDOW" "${OBJECTIVE} Output each completed step as CHECKPOINT:<step-name>. When ALL steps are done, output ORCHESTRATOR:DONE on its own line."
+sleep 0.3
+tmux send-keys -t "$WINDOW" Enter
+
+# Only output the window address — nothing else (callers parse this)
+echo "$WINDOW"
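The `CHECKPOINT:<step-name>` protocol that spawn-agent.sh asks the agent to follow could be read back from captured pane text roughly as follows. This is a minimal sketch, not the orchestrator's actual parser (which is not shown in this section): `extract_checkpoints` is a hypothetical helper, and the set of characters allowed in a step name is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the unique step names an agent has reported.
# Assumption: step names contain only letters, digits, '_' and '-'.
extract_checkpoints() {
  # Reads pane text on stdin. The literal "CHECKPOINT:<step-name>" from the
  # prompt itself is skipped because '<' is not an allowed name character.
  # "|| true" keeps a no-match grep from failing callers that use pipefail.
  { grep -oE 'CHECKPOINT:[A-Za-z0-9_-]+' || true; } \
    | sed 's/^CHECKPOINT://' \
    | sort -u
}
```

A caller might feed it live output with `tmux capture-pane -t "$WINDOW" -p | extract_checkpoints` and compare the result against the required steps stored in the state file.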
--- a/.claude/skills/orchestrate/scripts/status.sh
+++ b/.claude/skills/orchestrate/scripts/status.sh
@@ -0,0 +1,43 @@
+#!/usr/bin/env bash
+# status.sh — print orchestrator status: state file summary + live tmux pane commands
+#
+# Usage: status.sh
+# Reads: $ORCHESTRATOR_STATE_FILE (default: ~/.claude/orchestrator-state.json)
+
+set -euo pipefail
+
+STATE_FILE="${ORCHESTRATOR_STATE_FILE:-$HOME/.claude/orchestrator-state.json}"
+
+if [ ! -f "$STATE_FILE" ] || ! jq -e '.' "$STATE_FILE" >/dev/null 2>&1; then
+  echo "No orchestrator state found at $STATE_FILE"
+  exit 0
+fi
+
+# Header: active status, session, thresholds, last poll
+jq -r '
+  "=== Orchestrator [\(if .active then "RUNNING" else "STOPPED" end)] ===",
+  "Session: \(.tmux_session // "unknown")  |  Idle threshold: \(.idle_threshold_seconds // 300)s",
+  "Last poll: \(if (.last_poll_at // 0) == 0 then "never" else (.last_poll_at | strftime("%H:%M:%S")) end)",
+  ""
+' "$STATE_FILE"
+
+# Each agent: state, window, worktree/branch, truncated objective
+AGENT_COUNT=$(jq '.agents | length' "$STATE_FILE")
+if [ "$AGENT_COUNT" -eq 0 ]; then
+  echo "  (no agents registered)"
+else
+  jq -r '
+    .agents[] |
+    "  [\(.state | ascii_upcase)] \(.window)  \(.worktree)/\(.branch)",
+    "    \(.objective // "" | .[0:70])"
+  ' "$STATE_FILE"
+fi
+
+echo ""
+
+# Live pane_current_command for non-done agents
+while IFS= read -r WINDOW; do
+  [ -z "$WINDOW" ] && continue
+  CMD=$(tmux display-message -t "$WINDOW" -p '#{pane_current_command}' 2>/dev/null || echo "unreachable")
+  echo "  $WINDOW live: $CMD"
+done < <(jq -r '.agents[] | select(.state != "done") | .window' "$STATE_FILE" 2>/dev/null || true)
--- a/.claude/skills/orchestrate/scripts/verify-complete.sh
+++ b/.claude/skills/orchestrate/scripts/verify-complete.sh
@@ -0,0 +1,180 @@
+#!/usr/bin/env bash
+# verify-complete.sh — verify a PR task is truly done before marking the agent done
+#
+# Check order matters:
+#   1. Checkpoints — did the agent do all required steps?
+#   2. CI complete — no pending (bots post comments AFTER their check runs, must wait)
+#   3. CI passing — no failures (agent must fix before done)
+#   4. spawned_at — a new CI run was triggered after agent spawned (proves real work)
+#   5. Unresolved threads — checked AFTER CI so bot-posted comments are included
+#   6. Fresh CHANGES_REQUESTED — checked AFTER CI so bot reviews are included; reviews older than the latest commit are stale
+#
+# Usage: verify-complete.sh WINDOW
+# Exit 0 = verified complete; exit 1 = not complete (stderr has reason)
+
+set -euo pipefail
+
+WINDOW="$1"
+STATE_FILE="${ORCHESTRATOR_STATE_FILE:-$HOME/.claude/orchestrator-state.json}"
+
+PR_NUMBER=$(jq -r --arg w "$WINDOW" '.agents[] | select(.window == $w) | .pr_number // ""' "$STATE_FILE" 2>/dev/null)
+STEPS=$(jq -r --arg w "$WINDOW" '.agents[] | select(.window == $w) | .steps // [] | .[]' "$STATE_FILE" 2>/dev/null || true)
+CHECKPOINTS=$(jq -r --arg w "$WINDOW" '.agents[] | select(.window == $w) | .checkpoints // [] | .[]' "$STATE_FILE" 2>/dev/null || true)
+WORKTREE_PATH=$(jq -r --arg w "$WINDOW" '.agents[] | select(.window == $w) | .worktree_path // ""' "$STATE_FILE" 2>/dev/null)
+BRANCH=$(jq -r --arg w "$WINDOW" '.agents[] | select(.window == $w) | .branch // ""' "$STATE_FILE" 2>/dev/null)
+SPAWNED_AT=$(jq -r --arg w "$WINDOW" '.agents[] | select(.window == $w) | .spawned_at // "0"' "$STATE_FILE" 2>/dev/null || echo "0")
+
+# No PR number = cannot verify
+if [ -z "$PR_NUMBER" ]; then
+  echo "NOT COMPLETE: no pr_number in state — set pr_number or mark done manually" >&2
+  exit 1
+fi
+
+# --- Check 1: all required steps are checkpointed ---
+MISSING=""
+while IFS= read -r step; do
+  [ -z "$step" ] && continue
+  if ! echo "$CHECKPOINTS" | grep -qFx "$step"; then
+    MISSING="$MISSING $step"
+  fi
+done <<< "$STEPS"
+
+if [ -n "$MISSING" ]; then
+  echo "NOT COMPLETE: missing checkpoints:$MISSING on PR #$PR_NUMBER" >&2
+  exit 1
+fi
+
+# Resolve repo for all GitHub checks below
+REPO=$(jq -r '.repo // ""' "$STATE_FILE" 2>/dev/null || echo "")
+if [ -z "$REPO" ] && [ -n "$WORKTREE_PATH" ] && [ -d "$WORKTREE_PATH" ]; then
+  REPO=$(git -C "$WORKTREE_PATH" remote get-url origin 2>/dev/null \
+    | sed 's|.*github\.com[:/]||; s|\.git$||' || echo "")
+fi
+
+if [ -z "$REPO" ]; then
+  echo "Warning: cannot resolve repo — skipping CI/thread checks" >&2
+  echo "VERIFIED: PR #$PR_NUMBER — checkpoints ✓ (CI/thread checks skipped — no repo)"
+  exit 0
+fi
+
+CI_BUCKETS=$(gh pr checks "$PR_NUMBER" --repo "$REPO" --json bucket 2>/dev/null || echo "[]")
+
+# --- Check 2: CI fully complete — no pending checks ---
+# Pending checks MUST finish before we check threads/reviews:
+# bots (Seer, Check PR Status, etc.) post comments and CHANGES_REQUESTED AFTER their CI check runs.
+PENDING=$(echo "$CI_BUCKETS" | jq '[.[] | select(.bucket == "pending")] | length' 2>/dev/null || echo "0")
+if [ "$PENDING" -gt 0 ]; then
+  PENDING_NAMES=$(gh pr checks "$PR_NUMBER" --repo "$REPO" --json bucket,name 2>/dev/null \
+    | jq -r '[.[] | select(.bucket == "pending") | .name] | join(", ")' 2>/dev/null || echo "unknown")
+  echo "NOT COMPLETE: $PENDING CI checks still pending on PR #$PR_NUMBER ($PENDING_NAMES)" >&2
+  exit 1
+fi
+
+# --- Check 3: CI passing — no failures ---
+FAILING=$(echo "$CI_BUCKETS" | jq '[.[] | select(.bucket == "fail")] | length' 2>/dev/null || echo "0")
+if [ "$FAILING" -gt 0 ]; then
+  FAILING_NAMES=$(gh pr checks "$PR_NUMBER" --repo "$REPO" --json bucket,name 2>/dev/null \
+    | jq -r '[.[] | select(.bucket == "fail") | .name] | join(", ")' 2>/dev/null || echo "unknown")
+  echo "NOT COMPLETE: $FAILING failing CI checks on PR #$PR_NUMBER ($FAILING_NAMES)" >&2
+  exit 1
+fi
+
+# --- Check 4: a new CI run was triggered AFTER the agent spawned ---
+if [ -n "$BRANCH" ] && [ "${SPAWNED_AT:-0}" -gt 0 ]; then
+  LATEST_RUN_AT=$(gh run list --repo "$REPO" --branch "$BRANCH" \
+    --json createdAt --limit 1 2>/dev/null | jq -r '.[0].createdAt // ""')
+  if [ -n "$LATEST_RUN_AT" ]; then
+    if date --version >/dev/null 2>&1; then
+      LATEST_RUN_EPOCH=$(date -d "$LATEST_RUN_AT" "+%s" 2>/dev/null || echo "0")
+    else
+      LATEST_RUN_EPOCH=$(TZ=UTC date -j -f "%Y-%m-%dT%H:%M:%SZ" "$LATEST_RUN_AT" "+%s" 2>/dev/null || echo "0")
+    fi
+    if [ "$LATEST_RUN_EPOCH" -le "$SPAWNED_AT" ]; then
+      echo "NOT COMPLETE: latest CI run on $BRANCH predates agent spawn — agent may not have pushed yet" >&2
+      exit 1
+    fi
+  fi
+fi
+
+OWNER=$(echo "$REPO" | cut -d/ -f1)
+REPONAME=$(echo "$REPO" | cut -d/ -f2)
+
+# --- Check 5: no unresolved review threads (checked AFTER CI — bots post after their check) ---
+UNRESOLVED=$(gh api graphql -f query="
+  { repository(owner: \"${OWNER}\", name: \"${REPONAME}\") {
+      pullRequest(number: ${PR_NUMBER}) {
+        reviewThreads(first: 50) { nodes { isResolved } }
+      }
+    }
+  }
+" --jq '[.data.repository.pullRequest.reviewThreads.nodes[] | select(.isResolved == false)] | length' 2>/dev/null || echo "0")
+
+if [ "$UNRESOLVED" -gt 0 ]; then
+  echo "NOT COMPLETE: $UNRESOLVED unresolved review threads on PR #$PR_NUMBER" >&2
+  exit 1
+fi
+
+# --- Check 6: no fresh CHANGES_REQUESTED (checked AFTER CI — bots post reviews after their check) ---
+# A CHANGES_REQUESTED review is stale if the latest commit's committedDate is AFTER the review's submittedAt.
+# Stale reviews (pre-dating the fixing commits) should not block verification.
+#
+# Fetch commits and latestReviews in a single call and fail closed — if gh fails,
+# treat that as NOT COMPLETE rather than silently passing.
+# Use latestReviews (not reviews) so each reviewer's latest state is used — superseded
+# CHANGES_REQUESTED entries are automatically excluded when the reviewer later approved.
+# Note: we intentionally use committedDate (not PR updatedAt) because updatedAt changes on any
+# PR activity (bot comments, label changes) which would create false negatives.
+PR_REVIEW_METADATA=$(gh pr view "$PR_NUMBER" --repo "$REPO" \
+  --json commits,latestReviews 2>/dev/null) || {
+  echo "NOT COMPLETE: unable to fetch PR review metadata for PR #$PR_NUMBER" >&2
+  exit 1
+}
+
+LATEST_COMMIT_DATE=$(jq -r '.commits[-1].committedDate // ""' <<< "$PR_REVIEW_METADATA")
+CHANGES_REQUESTED_REVIEWS=$(jq '[.latestReviews[]? | select(.state == "CHANGES_REQUESTED")]' <<< "$PR_REVIEW_METADATA")
+
+BLOCKING_CHANGES_REQUESTED=0
+BLOCKING_REQUESTERS=""
+
+if [ -n "$LATEST_COMMIT_DATE" ] && [ "$(echo "$CHANGES_REQUESTED_REVIEWS" | jq length)" -gt 0 ]; then
+  if date --version >/dev/null 2>&1; then
+    LATEST_COMMIT_EPOCH=$(date -d "$LATEST_COMMIT_DATE" "+%s" 2>/dev/null || echo "0")
+  else
+    LATEST_COMMIT_EPOCH=$(TZ=UTC date -j -f "%Y-%m-%dT%H:%M:%SZ" "$LATEST_COMMIT_DATE" "+%s" 2>/dev/null || echo "0")
+  fi
+
+  while IFS= read -r review; do
+    [ -z "$review" ] && continue
+    REVIEW_DATE=$(echo "$review" | jq -r '.submittedAt // ""')
+    REVIEWER=$(echo "$review" | jq -r '.author.login // "unknown"')
+    if [ -z "$REVIEW_DATE" ]; then
+      # No submission date — treat as fresh (conservative: blocks verification)
+      BLOCKING_CHANGES_REQUESTED=$(( BLOCKING_CHANGES_REQUESTED + 1 ))
+      BLOCKING_REQUESTERS="${BLOCKING_REQUESTERS:+$BLOCKING_REQUESTERS, }${REVIEWER}"
+    else
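+      if date --version >/dev/null 2>&1; then
+        REVIEW_EPOCH=$(date -d "$REVIEW_DATE" "+%s" 2>/dev/null || echo "$(( LATEST_COMMIT_EPOCH + 1 ))")
+      else
+        REVIEW_EPOCH=$(TZ=UTC date -j -f "%Y-%m-%dT%H:%M:%SZ" "$REVIEW_DATE" "+%s" 2>/dev/null || echo "$(( LATEST_COMMIT_EPOCH + 1 ))")
+      fi  # an unparseable submittedAt falls back to "after latest commit", so it blocks (fail closed)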
+      if [ "$REVIEW_EPOCH" -gt "$LATEST_COMMIT_EPOCH" ]; then
+        # Review was submitted AFTER latest commit — still fresh, blocks verification
+        BLOCKING_CHANGES_REQUESTED=$(( BLOCKING_CHANGES_REQUESTED + 1 ))
+        BLOCKING_REQUESTERS="${BLOCKING_REQUESTERS:+$BLOCKING_REQUESTERS, }${REVIEWER}"
+      fi
+      # Review submitted BEFORE latest commit — stale, skip
+    fi
+  done <<< "$(echo "$CHANGES_REQUESTED_REVIEWS" | jq -c '.[]')"
+else
+  # No commit date or no changes_requested — check raw count as fallback
+  BLOCKING_CHANGES_REQUESTED=$(echo "$CHANGES_REQUESTED_REVIEWS" | jq length 2>/dev/null || echo "0")
+  BLOCKING_REQUESTERS=$(echo "$CHANGES_REQUESTED_REVIEWS" | jq -r '[.[].author.login] | join(", ")' 2>/dev/null || echo "unknown")
+fi
+
+if [ "$BLOCKING_CHANGES_REQUESTED" -gt 0 ]; then
+  echo "NOT COMPLETE: CHANGES_REQUESTED (after latest commit) from ${BLOCKING_REQUESTERS} on PR #$PR_NUMBER" >&2
+  exit 1
+fi
+
+echo "VERIFIED: PR #$PR_NUMBER — checkpoints ✓, CI complete + green, 0 unresolved threads, no fresh CHANGES_REQUESTED"
+exit 0
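The staleness rule in Check 6 can be illustrated without any date parsing. The sketch below is an alternative technique, not what verify-complete.sh does: the script converts both timestamps to epoch seconds, whereas this version compares the strings directly. It assumes both values are UTC timestamps in the fixed-width `YYYY-MM-DDTHH:MM:SSZ` form, so that string order matches chronological order. `review_blocks` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper illustrating the staleness rule from Check 6.
# Assumption: both arguments are UTC timestamps shaped like
# 2024-04-01T12:00:00Z, so comparing them as strings orders them in time.
# Returns 0 when the review still blocks, 1 when it is stale.
review_blocks() {
  local review_at="$1" commit_at="$2"
  # A missing submission date is treated as fresh, matching the script.
  if [ -z "$review_at" ]; then
    return 0
  fi
  # Strictly later than the latest commit means fresh; equal or earlier is stale.
  [[ "$review_at" > "$commit_at" ]]
}
```

As in the script's `-gt` comparison, a review submitted at exactly the same second as the latest commit counts as stale. An empty commit date makes every dated review blocking, which mirrors the script's conservative fallback.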
--- a/.claude/skills/pr-address/SKILL.md
+++ b/.claude/skills/pr-address/SKILL.md
@@ -2,7 +2,7 @@
 name: pr-address
 description: Address PR review comments and loop until CI green and all comments resolved. TRIGGER when user asks to address comments, fix PR feedback, respond to reviewers, or babysit/monitor a PR.
 user-invocable: true
-args: "[PR number or URL] — if omitted, finds PR for current branch."
+argument-hint: "[PR number or URL] — if omitted, finds PR for current branch."
 metadata:
  author: autogpt-team
  version: "1.0.0"
@@ -17,18 +17,70 @@ gh pr list --head $(git branch --show-current) --repo Significant-Gravitas/AutoG
 gh pr view {N}
 ```

-## Fetch comments (all sources)
+## Read the PR description
+
+Understand the **Why / What / How** before addressing comments — you need context to make good fixes:

 ```bash
-gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews       # top-level reviews
-gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments      # inline review comments
-gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments     # PR conversation comments
+gh pr view {N} --json body --jq '.body'
 ```

-**Bots to watch for:**
- `autogpt-reviewer` — posts "Blockers", "Should Fix", "Nice to Have". Address ALL of them.
- `sentry[bot]` — bug predictions. Fix real bugs, explain false positives.
- `coderabbitai[bot]` — automated review. Address actionable items.
+## Fetch comments (all sources)
+
+### 1. Inline review threads — GraphQL (primary source of actionable items)
+
+Use GraphQL to fetch inline threads. It natively exposes `isResolved`, returns threads already grouped with all replies, and paginates via cursor — no manual thread reconstruction needed.
+
+```bash
+gh api graphql -f query='
+{
+  repository(owner: "Significant-Gravitas", name: "AutoGPT") {
+    pullRequest(number: {N}) {
+      reviewThreads(first: 100) {
+        pageInfo { hasNextPage endCursor }
+        nodes {
+          id
+          isResolved
+          path
+          comments(last: 1) {
+            nodes { databaseId body author { login } createdAt }
+          }
+        }
+      }
+    }
+  }
+}'
+```
+
+If `pageInfo.hasNextPage` is true, fetch subsequent pages by adding `after: "<endCursor>"` to `reviewThreads(first: 100, after: "...")` and repeat until `hasNextPage` is false.
+
+**Filter to unresolved threads only** — skip any thread where `isResolved: true`. `comments(last: 1)` returns the most recent comment in the thread — act on that; it reflects the reviewer's final ask. Use the thread `id` (Relay global ID) to track threads across polls.
+
+### 2. Top-level reviews — REST (MUST paginate)
+
+```bash
+gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews --paginate
+```
+
+**CRITICAL — always `--paginate`.** Reviews default to 30 per page. PRs can have 80–170+ reviews (mostly empty resolution events). Without pagination you miss reviews past position 30 — including `autogpt-reviewer`'s structured review which is typically posted after several CI runs and sits well beyond the first page.
+
+Two things to extract:
+- **Overall state**: look for `CHANGES_REQUESTED` or `APPROVED` reviews.
+- **Actionable feedback**: non-empty bodies only. Empty-body reviews are thread-resolution events — they indicate progress but have no feedback to act on.
+
+**Where each reviewer posts:**
+- `autogpt-reviewer` — posts detailed structured reviews ("Blockers", "Should Fix", "Nice to Have") as **top-level reviews**. Not present on every PR. Address ALL items.
+- `sentry[bot]` — posts bug predictions as **inline threads**. Fix real bugs, explain false positives.
+- `coderabbitai[bot]` — posts summaries as **top-level reviews** AND actionable items as **inline threads**. Address actionable items.
+- Human reviewers — can post in any source. Address ALL non-empty feedback.
+
+### 3. PR conversation comments — REST
+
+```bash
+gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments --paginate
+```
+
+Mostly contains: bot summaries (`coderabbitai[bot]`), CI/conflict detection (`github-actions[bot]`), and author status updates. Scan for non-empty messages from non-bot human reviewers that aren't the PR author — those are the ones that need a response.

 ## For each unaddressed comment

@@ -38,10 +90,34 @@ Address comments **one at a time**: fix → commit → push → inline reply →
 2. Commit and push the fix
 3. Reply **inline** (not as a new top-level comment) referencing the fixing commit — this is what resolves the conversation for bot reviewers (coderabbitai, sentry):

+Use a **markdown commit link** so GitHub renders it as a clickable reference. Get the full SHA with `git rev-parse HEAD` after committing:
+
 | Comment type | How to reply |
 |---|---|
-| Inline review (`pulls/{N}/comments`) | `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments/{ID}/replies -f body="Fixed in <commit-sha>: <description>"` |
-| Conversation (`issues/{N}/comments`) | `gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments -f body="Fixed in <commit-sha>: <description>"` |
+| Inline review (`pulls/{N}/comments`) | `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments/{ID}/replies -f body="🤖 Fixed in [abc1234](https://github.com/Significant-Gravitas/AutoGPT/commit/FULL_SHA): <description>"` |
+| Conversation (`issues/{N}/comments`) | `gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments -f body="🤖 Fixed in [abc1234](https://github.com/Significant-Gravitas/AutoGPT/commit/FULL_SHA): <description>"` |
+
+## Codecov coverage
+
+Codecov patch target is **80%** on changed lines. Checks are **informational** (not blocking) but should be green.
+
+### Running coverage locally
+
+**Backend** (from `autogpt_platform/backend/`):
+```bash
+poetry run pytest -s -vv --cov=backend --cov-branch --cov-report term-missing
+```
+
+**Frontend** (from `autogpt_platform/frontend/`):
+```bash
+pnpm vitest run --coverage
+```
+
+### When codecov/patch fails
+
+1. List the PR's changed files (candidates for missing coverage): `git diff --name-only $(gh pr view --json baseRefName --jq '.baseRefName')...HEAD`
+2. For each uncovered file — extract inline logic to `helpers.ts`/`helpers.py` and test those (highest ROI). Colocate tests as `*_test.py` (backend) or `__tests__/*.test.ts` (frontend).
+3. Run coverage locally to verify, commit, push.

 ## Format and commit

@@ -61,7 +137,9 @@ kill $REST_PID 2>/dev/null; trap - EXIT
 ```
 Never manually edit files in `src/app/api/__generated__/`.

-Then commit and **push immediately** — never batch commits without pushing.
+Then commit and **push immediately** — never batch commits without pushing. Each fix should be visible on GitHub right away so CI can start and reviewers can see progress.
+
+**Never push empty commits** (`git commit --allow-empty`) to re-trigger CI or bot checks. When a check fails, investigate the root cause (unchecked PR checklist, unaddressed review comments, code issues) and fix those directly. Empty commits add noise to git history.

 For backend commits in worktrees: `poetry run git commit` (pre-commit hooks).

@@ -69,11 +147,88 @@ For backend commits in worktrees: `poetry run git commit` (pre-commit hooks).

 ```text
 address comments → format → commit → push
-→ re-check comments → fix new ones → push
-→ wait for CI → re-check comments after CI settles
+→ wait for CI (while addressing new comments) → fix failures → push
+→ re-check comments after CI settles
 → repeat until: all comments addressed AND CI green AND no new comments arriving
 ```

-While CI runs, stay productive: run local tests, address remaining comments.
+### Polling for CI + new comments

-**The loop ends when:** CI fully green + all comments addressed + no new comments since CI settled.
+After pushing, poll for **both** CI status and new comments in a single loop.
+
+> **Note:** `gh pr checks --watch --fail-fast` is tempting but it blocks the entire Bash tool call, meaning the agent cannot check for or address new comments until CI fully completes. Always poll manually instead.
+
+**Polling loop — repeat every 30 seconds:**
+
+1. Check CI status:
+```bash
+gh pr checks {N} --repo Significant-Gravitas/AutoGPT --json bucket,name,link
+```
+   Parse the results: if every check has `bucket` of `"pass"` or `"skipping"`, CI is green. If any has `"fail"`, CI has failed. Otherwise CI is still pending.
+
+2. Check for merge conflicts:
+```bash
+gh pr view {N} --repo Significant-Gravitas/AutoGPT --json mergeable --jq '.mergeable'
+```
+   If the result is `"CONFLICTING"`, the PR has a merge conflict — see "Resolving merge conflicts" below. If `"UNKNOWN"`, GitHub is still computing mergeability — wait and re-check next poll.
+
+3. Check for new/changed comments (all three sources):
+
+   **Inline threads** — re-run the GraphQL query from "Fetch comments". For each unresolved thread, record `{thread_id, last_comment_databaseId}` as your baseline. On each poll, action is needed if:
+   - A new thread `id` appears that wasn't in the baseline (new thread), OR
+   - An existing thread's `last_comment_databaseId` has changed (new reply on existing thread)
+
+   **Conversation comments:**
+   ```bash
+   gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments --paginate
+   ```
+   Compare total count and newest `id` against baseline. Filter to non-empty, non-bot, non-author-update messages.
+
+   **Top-level reviews:**
+   ```bash
+   gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews --paginate
+   ```
+   Watch for new non-empty reviews (`CHANGES_REQUESTED` or `COMMENTED` with body). Compare total count and newest `id` against baseline.
+
+4. **React in this precedence order (first match wins):**
+
+| What happened | Action |
+|---|---|
+| Merge conflict detected | See "Resolving merge conflicts" below. |
+| Mergeability is `UNKNOWN` | GitHub is still computing mergeability. Sleep 30 seconds, then restart polling from the top. |
+| New comments detected | Address them (fix → commit → push → reply). After pushing, re-fetch all comments to update your baseline, then restart this polling loop from the top (new commits invalidate CI status). |
+| CI failed (bucket == "fail") | Get failed check links: `gh pr checks {N} --repo Significant-Gravitas/AutoGPT --json bucket,link --jq '.[] \| select(.bucket == "fail") \| .link'`. Extract run ID from link (format: `.../actions/runs/<run-id>/job/...`), read logs with `gh run view <run-id> --repo Significant-Gravitas/AutoGPT --log-failed`. Fix → commit → push → restart polling. |
+| CI green + no new comments | **Do not exit immediately.** Bots (coderabbitai, sentry) often post reviews shortly after CI settles. Continue polling for **2 more cycles (60s)** after CI goes green. Only exit after 2 consecutive green+quiet polls. |
+| CI pending + no new comments | Sleep 30 seconds, then poll again. |
+
+**The loop ends when:** CI fully green + all comments addressed + **2 consecutive polls with no new comments after CI settled.**
+
+### Resolving merge conflicts
+
+1. Identify the PR's target branch and remote:
+```bash
+gh pr view {N} --repo Significant-Gravitas/AutoGPT --json baseRefName --jq '.baseRefName'
+git remote -v   # find the remote pointing to Significant-Gravitas/AutoGPT (typically 'upstream' in forks, 'origin' for direct contributors)
+```
+
+2. Pull the latest base branch with a 3-way merge:
+```bash
+git pull {base-remote} {base-branch} --no-rebase
+```
+
+3. Resolve conflicting files, then verify no conflict markers remain:
+```bash
+if grep -R -n -E '^(<<<<<<<|=======|>>>>>>>)' <conflicted-files>; then
+  echo "Unresolved conflict markers found — resolve before proceeding."
+  exit 1
+fi
+```
+
+4. Stage, commit, and push:
+```bash
+git add <conflicted-files>
+git commit -m "Resolve merge conflicts with {base-branch}"
+git push
+```
+
+5. Restart the polling loop from the top — new commits reset CI status.
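The first-match-wins precedence table in the pr-address polling loop can be expressed as a small decision function. This is only a sketch of the ordering: `next_action`, its three summarised inputs, and the action names it prints are all hypothetical, and the skill itself describes these steps in prose rather than prescribing code.

```bash
#!/usr/bin/env bash
# Hypothetical helper: pick the next action for one poll, first match wins.
# Arguments (assumed summaries a caller would compute from the gh output):
#   $1 mergeable     MERGEABLE | CONFLICTING | UNKNOWN
#   $2 new_comments  count of new or changed comments since the baseline
#   $3 ci            pass | fail | pending
next_action() {
  local mergeable="$1" new_comments="$2" ci="$3"
  if [ "$mergeable" = "CONFLICTING" ]; then
    echo "resolve-conflicts"
  elif [ "$mergeable" = "UNKNOWN" ]; then
    echo "wait"               # GitHub is still computing mergeability
  elif [ "$new_comments" -gt 0 ]; then
    echo "address-comments"   # new commits will invalidate CI status anyway
  elif [ "$ci" = "fail" ]; then
    echo "fix-ci"
  elif [ "$ci" = "pass" ]; then
    echo "confirm-quiet"      # keep polling until 2 consecutive green+quiet polls
  else
    echo "wait"               # CI pending, nothing new
  fi
}
```

Putting the conflict check first means a failing CI run is never investigated while the branch cannot merge, since resolving the conflict produces a new commit and a fresh CI run.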
--- a/.claude/skills/pr-review/SKILL.md
+++ b/.claude/skills/pr-review/SKILL.md
@@ -17,6 +17,16 @@ gh pr list --head $(git branch --show-current) --repo Significant-Gravitas/AutoG
 gh pr view {N}
 ```

+## Read the PR description
+
+Before reading code, understand the **why**, **what**, and **how** from the PR description:
+
+```bash
+gh pr view {N} --json body --jq '.body'
+```
+
+Every PR should have a Why / What / How structure. If any of these are missing, note it as feedback.
+
 ## Read the diff

 ```bash
@@ -28,12 +38,14 @@ gh pr diff {N}
 Before posting anything, fetch existing inline comments to avoid duplicates:

 ```bash
-gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments
+gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments --paginate
 gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews
 ```

 ## What to check

+**Description quality:** Does the PR description cover Why (motivation/problem), What (summary of changes), and How (approach/implementation details)? If any are missing, request them — you can't judge the approach without understanding the problem and intent.
+
 **Correctness:** logic errors, off-by-one, missing edge cases, race conditions (TOCTOU in file access, credit charging), error handling gaps, async correctness (missing `await`, unclosed resources).

 **Security:** input validation at boundaries, no injection (command, XSS, SQL), secrets not logged, file paths sanitized (`os.path.basename()` in error messages).
--- a/.claude/skills/pr-test/SKILL.md
+++ b/.claude/skills/pr-test/SKILL.md
@@ -0,0 +1,874 @@
+---
+name: pr-test
+description: "E2E manual testing of PRs/branches using docker compose, agent-browser, and API calls. TRIGGER when user asks to manually test a PR, test a feature end-to-end, or run integration tests against a running system."
+user-invocable: true
+argument-hint: "[worktree path or PR number] — tests the PR in the given worktree. Optional flags: --fix (auto-fix issues found)"
+metadata:
+  author: autogpt-team
+  version: "2.0.0"
+---
+
+# Manual E2E Test
+
+Test a PR/branch end-to-end by building the full platform, interacting via browser and API, capturing screenshots, and reporting results.
+
+## Critical Requirements
+
+These are NON-NEGOTIABLE. Every test run MUST satisfy ALL of the following:
+
+### 1. Screenshots at Every Step
+- Take a screenshot at EVERY significant test step — not just at the end
+- Every test scenario MUST have at least one BEFORE and one AFTER screenshot
+- Name screenshots sequentially: `{NN}-{action}-{state}.png` (e.g., `01-credits-before.png`, `02-credits-after.png`)
+- If a screenshot is missing for a scenario, the test is INCOMPLETE — go back and take it
+
+### 2. Screenshots MUST Be Posted to PR
+- Push ALL screenshots to a temp branch `test-screenshots/pr-{N}`
+- Post a PR comment with ALL screenshots embedded inline using GitHub raw URLs
+- This is NOT optional — every test run MUST end with a PR comment containing screenshots
+- If screenshot upload fails, retry. If it still fails, list failed files and require manual drag-and-drop/paste attachment in the PR comment
+
+### 3. State Verification with Before/After Evidence
+- For EVERY state-changing operation (API call, user action), capture the state BEFORE and AFTER
+- Log the actual API response values (e.g., `credits_before=100, credits_after=95`)
+- Screenshot MUST show the relevant UI state change
+- Compare expected vs actual values explicitly — do not just eyeball it
+
+### 4. Negative Test Cases Are Mandatory
+- Test at least ONE negative case per feature (e.g., insufficient credits, invalid input, unauthorized access)
+- Verify error messages are user-friendly and accurate
+- Verify the system state did NOT change after a rejected operation
+
+### 5. Test Report Must Include Full Evidence
+Each test scenario in the report MUST have:
+- **Steps**: What was done (exact commands or UI actions)
+- **Expected**: What should happen
+- **Actual**: What actually happened
+- **API Evidence**: Before/after API response values for state-changing operations
+- **Screenshot Evidence**: Before/after screenshots with explanations
+
+## State Manipulation for Realistic Testing
+
+When testing features that depend on specific states (rate limits, credits, quotas):
+
+1. **Use Redis CLI to set counters directly:**
+   ```bash
+   # Find the Redis container
+   REDIS_CONTAINER=$(docker ps --format '{{.Names}}' | grep redis | head -1)
+   # Set a key with expiry
+   docker exec "$REDIS_CONTAINER" redis-cli SET key value EX ttl
+   # Example: Set rate limit counter to near-limit
+   docker exec "$REDIS_CONTAINER" redis-cli SET "rate_limit:user:test@test.com" 99 EX 3600
+   # Example: Check current value
+   docker exec "$REDIS_CONTAINER" redis-cli GET "rate_limit:user:test@test.com"
+   ```
+
+2. **Use API calls to check before/after state:**
+   ```bash
+   # BEFORE: Record current state
+   BEFORE=$(curl -s -H "Authorization: Bearer $TOKEN" http://localhost:8006/api/credits | jq '.credits')
+   echo "Credits BEFORE: $BEFORE"
+
+   # Perform the action...
+
+   # AFTER: Record new state and compare
+   AFTER=$(curl -s -H "Authorization: Bearer $TOKEN" http://localhost:8006/api/credits | jq '.credits')
+   echo "Credits AFTER: $AFTER"
+   echo "Delta: $(( BEFORE - AFTER ))"
+   ```
+
+3. **Take screenshots BEFORE and AFTER state changes** — the UI must reflect the backend state change
+
+4. **Never rely on mocked/injected browser state** — always use real backend state. Do NOT use `agent-browser eval` to fake UI state. The backend must be the source of truth.
+
+5. **Use direct DB queries when needed:**
+   ```bash
+   # Query via Supabase's PostgREST or docker exec into the DB
+   docker exec supabase-db psql -U supabase_admin -d postgres -c "SELECT credits FROM user_credits WHERE user_id = '...';"
+   ```
+
+6. **After every API test, verify the state change actually persisted:**
+   ```bash
+   # Example: After a credits purchase, verify DB matches API
+   API_CREDITS=$(curl -s -H "Authorization: Bearer $TOKEN" http://localhost:8006/api/credits | jq '.credits')
+   DB_CREDITS=$(docker exec supabase-db psql -U supabase_admin -d postgres -t -c "SELECT credits FROM user_credits WHERE user_id = '...';" | tr -d ' ')
+   [ "$API_CREDITS" = "$DB_CREDITS" ] && echo "CONSISTENT" || echo "MISMATCH: API=$API_CREDITS DB=$DB_CREDITS"
+   ```
+
+## Arguments
+
+- `$ARGUMENTS` — worktree path (e.g. `$REPO_ROOT`) or PR number
+- If `--fix` flag is present, auto-fix bugs found and push fixes (like pr-address loop)
+
+## Step 0: Resolve the target
+
+```bash
+# If argument is a PR number, find its worktree
+gh pr view {N} --json headRefName --jq '.headRefName'
+# If argument is a path, use it directly
+```
+
+Determine:
+- `REPO_ROOT` — the root repo directory: `git -C "$WORKTREE_PATH" worktree list | head -1 | awk '{print $1}'` (or `git rev-parse --show-toplevel` if not a worktree)
+- `WORKTREE_PATH` — the worktree directory
+- `PLATFORM_DIR` — `$WORKTREE_PATH/autogpt_platform`
+- `BACKEND_DIR` — `$PLATFORM_DIR/backend`
+- `FRONTEND_DIR` — `$PLATFORM_DIR/frontend`
+- `PR_NUMBER` — the PR number (from `gh pr list --head $(git branch --show-current)`)
+- `PR_TITLE` — the PR title, slugified (e.g. "Add copilot permissions" → "add-copilot-permissions")
+- `RESULTS_DIR` — `$REPO_ROOT/test-results/PR-{PR_NUMBER}-{slugified-title}`
+
+Create the results directory:
+```bash
+PR_NUMBER=$(cd "$WORKTREE_PATH" && gh pr list --head "$(git branch --show-current)" --repo Significant-Gravitas/AutoGPT --json number --jq '.[0].number')
+PR_TITLE=$(cd "$WORKTREE_PATH" && gh pr list --head "$(git branch --show-current)" --repo Significant-Gravitas/AutoGPT --json title --jq '.[0].title' | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9]/-/g' | sed 's/--*/-/g' | sed 's/^-//;s/-$//' | head -c 50)
+RESULTS_DIR="$REPO_ROOT/test-results/PR-${PR_NUMBER}-${PR_TITLE}"
+mkdir -p "$RESULTS_DIR"
+```
+
+**Test user credentials** (for logging into the UI or verifying results manually):
+- Email: `test@test.com`
+- Password: `testtest123`
+
+## Step 1: Understand the PR
+
+Before testing, understand what changed:
+
+```bash
+cd $WORKTREE_PATH
+
+# Read PR description to understand the WHY
+gh pr view {N} --json body --jq '.body'
+
+git log --oneline dev..HEAD | head -20
+git diff dev --stat
+```
+
+Read the PR description (Why / What / How) and changed files to understand:
+0. **Why** does this PR exist? What problem does it solve?
+1. **What** feature/fix does this PR implement?
+2. **How** does it work? What's the approach?
+3. What components are affected? (backend, frontend, copilot, executor, etc.)
+4. What are the key user-facing behaviors to test?
+
+## Step 2: Write test scenarios
+
+Based on the PR analysis, write a test plan to `$RESULTS_DIR/test-plan.md`:
+
+```markdown
+# Test Plan: PR #{N} — {title}
+
+## Scenarios
+1. [Scenario name] — [what to verify]
+2. ...
+
+## API Tests (if applicable)
+1. [Endpoint] — [expected behavior]
+   - Before state: [what to check before]
+   - After state: [what to verify changed]
+
+## UI Tests (if applicable)
+1. [Page/component] — [interaction to test]
+   - Screenshot before: [what to capture]
+   - Screenshot after: [what to capture]
+
+## Negative Tests (REQUIRED — at least one per feature)
+1. [What should NOT happen] — [how to trigger it]
+   - Expected error: [what error message/code]
+   - State unchanged: [what to verify did NOT change]
+```
+
+**Be critical** — include edge cases, error paths, and security checks. Every scenario MUST specify what screenshots to take and what state to verify.
+
+## Step 3: Environment setup
+
+### 3a. Copy .env files from the root worktree
+
+The root worktree (`$REPO_ROOT`) has the canonical `.env` files with all API keys. Copy them to the target worktree:
+
+```bash
+# CRITICAL: .env files are NOT checked into git. They must be copied manually.
+cp $REPO_ROOT/autogpt_platform/.env $PLATFORM_DIR/.env
+cp $REPO_ROOT/autogpt_platform/backend/.env $BACKEND_DIR/.env
+cp $REPO_ROOT/autogpt_platform/frontend/.env $FRONTEND_DIR/.env
+```
+
+### 3b. Configure copilot authentication
+
+The copilot needs an LLM API to function. Two approaches (try subscription first):
+
+#### Option 1: Subscription mode (preferred — uses your Claude Max/Pro subscription)
+
+The `claude_agent_sdk` Python package **bundles its own Claude CLI binary** — no need to install `@anthropic-ai/claude-code` via npm. The backend auto-provisions credentials from environment variables on startup.
+
+Run the helper script to extract tokens from your host and auto-update `backend/.env` (works on macOS, Linux, and Windows/WSL):
+
+```bash
+# Extracts OAuth tokens and writes CLAUDE_CODE_OAUTH_TOKEN + CLAUDE_CODE_REFRESH_TOKEN into .env
+bash $BACKEND_DIR/scripts/refresh_claude_token.sh --env-file $BACKEND_DIR/.env
+```
+
+**How it works:** The script reads the OAuth token from:
+- **macOS**: system keychain (`"Claude Code-credentials"`)
+- **Linux/WSL**: `~/.claude/.credentials.json`
+- **Windows**: `%APPDATA%/claude/.credentials.json`
+
+It sets `CLAUDE_CODE_OAUTH_TOKEN`, `CLAUDE_CODE_REFRESH_TOKEN`, and `CHAT_USE_CLAUDE_CODE_SUBSCRIPTION=true` in the `.env` file. On container startup, the backend auto-provisions `~/.claude/.credentials.json` inside the container from these env vars. The SDK's bundled CLI then authenticates using that file. No `claude login`, no npm install needed.
+
+**Note:** The OAuth token expires after roughly 24 hours. If copilot returns auth errors, re-run the script and recreate the container: `bash $BACKEND_DIR/scripts/refresh_claude_token.sh --env-file $BACKEND_DIR/.env && (cd $PLATFORM_DIR && docker compose up -d copilot_executor)`
+
+#### Option 2: OpenRouter API key mode (fallback)
+
+If subscription mode doesn't work, switch to API key mode using OpenRouter:
+
+```bash
+# In $BACKEND_DIR/.env, ensure these are set:
+CHAT_USE_CLAUDE_CODE_SUBSCRIPTION=false
+CHAT_API_KEY=<value of OPEN_ROUTER_API_KEY from the same .env>
+CHAT_BASE_URL=https://openrouter.ai/api/v1
+CHAT_USE_CLAUDE_AGENT_SDK=true
+```
+
+Update these values in place (the snippet uses `perl -i`, which behaves the same on macOS and Linux):
+```bash
+# -f2- keeps any '=' characters that appear inside the key value
+ORKEY=$(grep "^OPEN_ROUTER_API_KEY=" $BACKEND_DIR/.env | cut -d= -f2-)
+[ -n "$ORKEY" ] || { echo "ERROR: OPEN_ROUTER_API_KEY is missing in $BACKEND_DIR/.env"; exit 1; }
+perl -i -pe 's/CHAT_USE_CLAUDE_CODE_SUBSCRIPTION=true/CHAT_USE_CLAUDE_CODE_SUBSCRIPTION=false/' $BACKEND_DIR/.env
+# Add or update CHAT_API_KEY and CHAT_BASE_URL
+grep -q "^CHAT_API_KEY=" $BACKEND_DIR/.env && perl -i -pe "s|^CHAT_API_KEY=.*|CHAT_API_KEY=$ORKEY|" $BACKEND_DIR/.env || echo "CHAT_API_KEY=$ORKEY" >> $BACKEND_DIR/.env
+grep -q "^CHAT_BASE_URL=" $BACKEND_DIR/.env && perl -i -pe 's|^CHAT_BASE_URL=.*|CHAT_BASE_URL=https://openrouter.ai/api/v1|' $BACKEND_DIR/.env || echo "CHAT_BASE_URL=https://openrouter.ai/api/v1" >> $BACKEND_DIR/.env
+```
+
+### 3c. Stop conflicting containers
+
+```bash
+# Stop any running app containers (keep infra: supabase, redis, rabbitmq, clamav)
+docker ps --format "{{.Names}}" | grep -E "rest_server|executor|copilot|websocket|database_manager|scheduler|notification|frontend|migrate" | while read name; do
+  docker stop "$name" 2>/dev/null
+done
+```
+
+### 3e. Build and start
+
+```bash
+cd $PLATFORM_DIR && docker compose build --no-cache 2>&1 | tail -20
+if [ ${PIPESTATUS[0]} -ne 0 ]; then echo "ERROR: Docker build failed"; exit 1; fi
+
+cd $PLATFORM_DIR && docker compose up -d 2>&1 | tail -20
+if [ ${PIPESTATUS[0]} -ne 0 ]; then echo "ERROR: Docker compose up failed"; exit 1; fi
+```
+
+**Note:** `--no-cache` forces a full rebuild. Without it, Docker BuildKit may reuse cached `COPY` layers from a previous build on a different branch, and the container ends up running old code (e.g. missing the PR changes).
+
+**Expected time: 3-8 minutes** for build, 5-10 minutes with `--no-cache`.
+
+### 3f. Wait for services to be ready
+
+```bash
+# Poll until backend and frontend respond
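+READY=false
+for i in $(seq 1 60); do
+  BACKEND=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:8006/docs 2>/dev/null)
+  FRONTEND=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:3000 2>/dev/null)
+  if [ "$BACKEND" = "200" ] && [ "$FRONTEND" = "200" ]; then
+    echo "Services ready"
+    READY=true
+    break
+  fi
+  sleep 5
+done
+# Fail loudly instead of testing against services that never came up (60 x 5s = 5 min)
+[ "$READY" = true ] || { echo "ERROR: services not ready after 5 minutes (backend=$BACKEND frontend=$FRONTEND)"; exit 1; }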
+```
+
+
+### 3h. Create test user and get auth token
+
+```bash
+ANON_KEY=$(grep "NEXT_PUBLIC_SUPABASE_ANON_KEY=" $FRONTEND_DIR/.env | sed 's/.*NEXT_PUBLIC_SUPABASE_ANON_KEY=//' | tr -d '[:space:]')
+
+# Signup (idempotent — returns "User already registered" if exists)
+RESULT=$(curl -s -X POST 'http://localhost:8000/auth/v1/signup' \
+  -H "apikey: $ANON_KEY" \
+  -H 'Content-Type: application/json' \
+  -d '{"email":"test@test.com","password":"testtest123"}')
+
+# If "Database error finding user", restart supabase-auth and retry
+if echo "$RESULT" | grep -q "Database error"; then
+  docker restart supabase-auth && sleep 5
+  curl -s -X POST 'http://localhost:8000/auth/v1/signup' \
+    -H "apikey: $ANON_KEY" \
+    -H 'Content-Type: application/json' \
+    -d '{"email":"test@test.com","password":"testtest123"}'
+fi
+
+# Get auth token
+TOKEN=$(curl -s -X POST 'http://localhost:8000/auth/v1/token?grant_type=password' \
+  -H "apikey: $ANON_KEY" \
+  -H 'Content-Type: application/json' \
+  -d '{"email":"test@test.com","password":"testtest123"}' | jq -r '.access_token // ""')
+[ -n "$TOKEN" ] || { echo "ERROR: failed to obtain an auth token"; exit 1; }
+```
+
+**Use this token for ALL API calls:**
+```bash
+curl -H "Authorization: Bearer $TOKEN" http://localhost:8006/api/...
+```
+
+## Step 4: Run tests
+
+### Service ports reference
+
+| Service | Port | URL |
+|---------|------|-----|
+| Frontend | 3000 | http://localhost:3000 |
+| Backend REST | 8006 | http://localhost:8006 |
+| Supabase Auth (via Kong) | 8000 | http://localhost:8000 |
+| Executor | 8002 | http://localhost:8002 |
+| Copilot Executor | 8008 | http://localhost:8008 |
+| WebSocket | 8001 | http://localhost:8001 |
+| Database Manager | 8005 | http://localhost:8005 |
+| Redis | 6379 | localhost:6379 |
+| RabbitMQ | 5672 | localhost:5672 |
+
+### API testing
+
+Use `curl` with the auth token for backend API tests. **For EVERY API call that changes state, record before/after values:**
+
+```bash
+# Example: List agents
+curl -s -H "Authorization: Bearer $TOKEN" http://localhost:8006/api/graphs | jq . | head -20
+
+# Example: Create an agent
+curl -s -X POST http://localhost:8006/api/graphs \
+  -H "Authorization: Bearer $TOKEN" \
+  -H 'Content-Type: application/json' \
+  -d '{...}' | jq .
+
+# Example: Run an agent
+curl -s -X POST "http://localhost:8006/api/graphs/{graph_id}/execute" \
+  -H "Authorization: Bearer $TOKEN" \
+  -H 'Content-Type: application/json' \
+  -d '{"data": {...}}'
+
+# Example: Get execution results
+curl -s -H "Authorization: Bearer $TOKEN" \
+  "http://localhost:8006/api/graphs/{graph_id}/executions/{exec_id}" | jq .
+```
+
+**State verification pattern (use for EVERY state-changing API call):**
+```bash
+# 1. Record BEFORE state
+BEFORE_STATE=$(curl -s -H "Authorization: Bearer $TOKEN" http://localhost:8006/api/{resource} | jq '{relevant_fields}')
+echo "BEFORE: $BEFORE_STATE"
+
+# 2. Perform the action
+ACTION_RESULT=$(curl -s -X POST ... | jq .)
+echo "ACTION RESULT: $ACTION_RESULT"
+
+# 3. Record AFTER state
+AFTER_STATE=$(curl -s -H "Authorization: Bearer $TOKEN" http://localhost:8006/api/{resource} | jq '{relevant_fields}')
+echo "AFTER: $AFTER_STATE"
+
+# 4. Log the comparison
+echo "=== STATE CHANGE VERIFICATION ==="
+echo "Before: $BEFORE_STATE"
+echo "After: $AFTER_STATE"
+echo "Expected change: {describe what should have changed}"
+```
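+
+Negative tests can follow the same pattern by asserting on the HTTP status code. A minimal sketch, assuming the backend answers unauthenticated requests with `401` (confirm the actual code for the endpoint under test); `expect_status` is a hypothetical helper introduced for illustration, not part of the platform:
+
+```bash
+# Hypothetical helper: compare an observed HTTP status with the expected one.
+# Args: <expected_code> <actual_code> <label>
+expect_status() {
+  local expected="$1" actual="$2" label="$3"
+  if [ "$actual" = "$expected" ]; then
+    echo "PASS: $label (HTTP $actual)"
+  else
+    echo "FAIL: $label (expected HTTP $expected, got HTTP $actual)"
+    return 1
+  fi
+}
+
+# Usage (left commented so the block has no network side effects):
+# CODE=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:8006/api/graphs)
+# expect_status 401 "$CODE" "request without a token is rejected"
+```
+
+Pair the status check with a before/after read of the affected resource to confirm that the rejected call left state unchanged.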
+
+### Browser testing with agent-browser
+
+```bash
+# Close any existing session
+agent-browser close 2>/dev/null || true
+
+# Use --session-name to persist cookies across navigations
+# This means login only needs to happen once per test session
+agent-browser --session-name pr-test open 'http://localhost:3000/login' --timeout 15000
+
+# Get interactive elements
+agent-browser --session-name pr-test snapshot | grep "textbox\|button"
+
+# Login
+agent-browser --session-name pr-test fill {email_ref} "test@test.com"
+agent-browser --session-name pr-test fill {password_ref} "testtest123"
+agent-browser --session-name pr-test click {login_button_ref}
+sleep 5
+
+# Dismiss cookie banner if present
+agent-browser --session-name pr-test click 'text=Accept All' 2>/dev/null || true
+
+# Navigate — cookies are preserved so login persists
+agent-browser --session-name pr-test open 'http://localhost:3000/copilot' --timeout 10000
+
+# Take screenshot
+agent-browser --session-name pr-test screenshot $RESULTS_DIR/01-page.png
+
+# Interact with elements
+agent-browser --session-name pr-test fill {ref} "text"
+agent-browser --session-name pr-test press "Enter"
+agent-browser --session-name pr-test click {ref}
+agent-browser --session-name pr-test click 'text=Button Text'
+
+# Read page content
+agent-browser --session-name pr-test snapshot | grep "text:"
+```
+
+**Key pages:**
+- `/copilot` — CoPilot chat (for testing copilot features)
+- `/build` — Agent builder (for testing block/node features)
+- `/build?flowID={id}` — Specific agent in builder
+- `/library` — Agent library (for testing listing/import features)
+- `/library/agents/{id}` — Agent detail with run history
+- `/marketplace` — Marketplace
+
+### Checking logs
+
+```bash
+# Backend REST server
+docker logs autogpt_platform-rest_server-1 2>&1 | tail -30
+
+# Executor (runs agent graphs)
+docker logs autogpt_platform-executor-1 2>&1 | tail -30
+
+# Copilot executor (runs copilot chat sessions)
+docker logs autogpt_platform-copilot_executor-1 2>&1 | tail -30
+
+# Frontend
+docker logs autogpt_platform-frontend-1 2>&1 | tail -30
+
+# Filter for errors
+docker logs autogpt_platform-executor-1 2>&1 | grep -i "error\|exception\|traceback" | tail -20
+```
+
+### Copilot chat testing
+
+The copilot uses SSE streaming. To test via API:
+
+```bash
+# Create a session
+SESSION_ID=$(curl -s -X POST 'http://localhost:8006/api/chat/sessions' \
+  -H "Authorization: Bearer $TOKEN" \
+  -H 'Content-Type: application/json' \
+  -d '{}' | jq -r '.id // .session_id // ""')
+
+# Stream a message (SSE - will stream chunks)
+curl -N -X POST "http://localhost:8006/api/chat/sessions/$SESSION_ID/stream" \
+  -H "Authorization: Bearer $TOKEN" \
+  -H 'Content-Type: application/json' \
+  -d '{"message": "Hello, what can you help me with?"}' \
+  --max-time 60 2>/dev/null | head -50
+```
+
+Or test via browser (preferred for UI verification):
+```bash
+agent-browser --session-name pr-test open 'http://localhost:3000/copilot' --timeout 10000
+# ... fill chat input and press Enter, wait 20-30s for response
+```
+
+## Step 5: Record results and take screenshots
+
+**Take a screenshot at EVERY significant test step** — before and after interactions, on success, and on failure. This is NON-NEGOTIABLE.
+
+**Required screenshot pattern for each test scenario:**
+```bash
+# BEFORE the action
+agent-browser --session-name pr-test screenshot $RESULTS_DIR/{NN}-{scenario}-before.png
+
+# Perform the action...
+
+# AFTER the action
+agent-browser --session-name pr-test screenshot $RESULTS_DIR/{NN}-{scenario}-after.png
+```
+
+**Naming convention:**
+```bash
+# Examples:
+# $RESULTS_DIR/01-login-page-before.png
+# $RESULTS_DIR/02-login-page-after.png
+# $RESULTS_DIR/03-credits-page-before.png
+# $RESULTS_DIR/04-credits-purchase-after.png
+# $RESULTS_DIR/05-negative-insufficient-credits.png
+# $RESULTS_DIR/06-error-state.png
+```
+
+**Minimum requirements:**
+- At least TWO screenshots per test scenario (before + after)
+- At least ONE screenshot for each negative test case showing the error state
+- If a test fails, screenshot the failure state AND any error logs visible in the UI
+
+## Step 6: Show results to user with screenshots
+
+**CRITICAL: After all tests complete, you MUST show every screenshot to the user using the Read tool, with an explanation of what each screenshot shows.** This is the most important part of the test report — the user needs to visually verify the results.
+
+For each screenshot:
+1. Use the `Read` tool to display the PNG file (Claude can read images)
+2. Write a 1-2 sentence explanation below it describing:
+   - What page/state is being shown
+   - What the screenshot proves (which test scenario it validates)
+   - Any notable details visible in the UI
+
+Format the output like this:
+
+```markdown
+### Screenshot 1: {descriptive title}
+[Read the PNG file here]
+
+**What it shows:** {1-2 sentence explanation of what this screenshot proves}
+
+---
+```
+
+After showing all screenshots, output a **detailed** summary table:
+
+| # | Scenario | Result | API Evidence | Screenshot Evidence |
+|---|----------|--------|-------------|-------------------|
+| 1 | {name} | PASS/FAIL | Before: X, After: Y | 01-before.png, 02-after.png |
+| 2 | ... | ... | ... | ... |
+
+**IMPORTANT:** As you show each screenshot and record test results, persist them in shell variables for Step 7:
+
+```bash
+# Build these variables during Step 6 — they are required by Step 7's script
+# NOTE: declare -A requires Bash 4.0+. Linux typically ships Bash 5.x. macOS defaults to zsh
+# and its bundled /bin/bash is 3.2, so run this with Homebrew bash (5.x) there. If you are
+# stuck on Bash <4, use a plain variable with a lookup function instead.
+declare -A SCREENSHOT_EXPLANATIONS=(
+  ["01-login-page.png"]="Shows the login page loaded successfully with SSO options visible."
+  ["02-builder-with-block.png"]="The builder canvas displays the newly added block connected to the trigger."
+  # ... one entry per screenshot, using the same explanations you showed the user above
+)
+
+TEST_RESULTS_TABLE="| 1 | Login flow | PASS | N/A | 01-login-before.png, 02-login-after.png |
+| 2 | Credits purchase | PASS | Before: 100, After: 95 | 03-credits-before.png, 04-credits-after.png |
+| 3 | Insufficient credits (negative) | PASS | Credits: 0, rejected | 05-insufficient-credits-error.png |"
+# ... one row per test scenario with actual results
+```
+
+## Step 7: Post test report as PR comment with screenshots
+
+Upload screenshots to the PR using the GitHub Git API (no local git operations — safe for worktrees), then post a comment with inline images and per-screenshot explanations.
+
+**This step is MANDATORY. Every test run MUST post a PR comment with screenshots. No exceptions.**
+
+**CRITICAL — NEVER post a bare directory link like `https://github.com/.../tree/...`.** Every screenshot MUST appear as `![name](raw_url)` inline in the PR comment so reviewers can see them without clicking any links. After posting, the verification step below greps the comment for `![` tags and exits 1 if none are found — the test run is considered incomplete until this passes.
+
+```bash
+# Upload screenshots via GitHub Git API (creates blobs, tree, commit, and ref remotely)
+REPO="Significant-Gravitas/AutoGPT"
+SCREENSHOTS_BRANCH="test-screenshots/pr-${PR_NUMBER}"
+SCREENSHOTS_DIR="test-screenshots/PR-${PR_NUMBER}"
+
+# Step 1: Create blobs for each screenshot and build tree JSON
+# Retry each blob upload up to 3 times. If still failing, list them at end of report.
+shopt -s nullglob
+SCREENSHOT_FILES=("$RESULTS_DIR"/*.png)
+if [ ${#SCREENSHOT_FILES[@]} -eq 0 ]; then
+  echo "ERROR: No screenshots found in $RESULTS_DIR. Test run is incomplete."
+  exit 1
+fi
+TREE_JSON='['
+FIRST=true
+FAILED_UPLOADS=()
+for img in "${SCREENSHOT_FILES[@]}"; do
+  BASENAME=$(basename "$img")
+  B64=$(base64 < "$img")
+  BLOB_SHA=""
+  for attempt in 1 2 3; do
+    BLOB_SHA=$(gh api "repos/${REPO}/git/blobs" -f content="$B64" -f encoding="base64" --jq '.sha' 2>/dev/null || true)
+    [ -n "$BLOB_SHA" ] && break
+    sleep 1
+  done
+  if [ -z "$BLOB_SHA" ]; then
+    FAILED_UPLOADS+=("$img")
+    continue
+  fi
+  if [ "$FIRST" = true ]; then FIRST=false; else TREE_JSON+=','; fi
+  TREE_JSON+="{\"path\":\"${SCREENSHOTS_DIR}/${BASENAME}\",\"mode\":\"100644\",\"type\":\"blob\",\"sha\":\"${BLOB_SHA}\"}"
+done
+TREE_JSON+=']'
+
+# Step 2: Create tree, commit, and branch ref
+TREE_SHA=$(echo "$TREE_JSON" | jq -c '{tree: .}' | gh api "repos/${REPO}/git/trees" --input - --jq '.sha')
+
+# Resolve parent commit so screenshots are chained, not orphan root commits
+PARENT_SHA=$(gh api "repos/${REPO}/git/refs/heads/${SCREENSHOTS_BRANCH}" --jq '.object.sha' 2>/dev/null || echo "")
+if [ -n "$PARENT_SHA" ]; then
+  COMMIT_SHA=$(gh api "repos/${REPO}/git/commits" \
+    -f message="test: add E2E test screenshots for PR #${PR_NUMBER}" \
+    -f tree="$TREE_SHA" \
+    -f "parents[]=$PARENT_SHA" \
+    --jq '.sha')
+else
+  COMMIT_SHA=$(gh api "repos/${REPO}/git/commits" \
+    -f message="test: add E2E test screenshots for PR #${PR_NUMBER}" \
+    -f tree="$TREE_SHA" \
+    --jq '.sha')
+fi
+
+gh api "repos/${REPO}/git/refs" \
+  -f ref="refs/heads/${SCREENSHOTS_BRANCH}" \
+  -f sha="$COMMIT_SHA" 2>/dev/null \
+  || gh api "repos/${REPO}/git/refs/heads/${SCREENSHOTS_BRANCH}" \
+    -X PATCH -f sha="$COMMIT_SHA" -F force=true
+```
+
+Then post the comment with **inline images AND explanations for each screenshot**:
+
+```bash
+REPO_URL="https://raw.githubusercontent.com/${REPO}/${SCREENSHOTS_BRANCH}"
+
+# Build image markdown using uploaded image URLs; skip FAILED_UPLOADS (listed separately)
+
+IMAGE_MARKDOWN=""
+for img in "${SCREENSHOT_FILES[@]}"; do
+  BASENAME=$(basename "$img")
+  TITLE=$(echo "${BASENAME%.png}" | sed 's/^[0-9]*-//' | sed 's/-/ /g' | awk '{for(i=1;i<=NF;i++) $i=toupper(substr($i,1,1)) tolower(substr($i,2))}1')
+  # Skip images that failed to upload — they will be listed at the end
+  IS_FAILED=false
+  for failed in "${FAILED_UPLOADS[@]}"; do
+    [ "$(basename "$failed")" = "$BASENAME" ] && IS_FAILED=true && break
+  done
+  if [ "$IS_FAILED" = true ]; then
+    continue
+  fi
+  EXPLANATION="${SCREENSHOT_EXPLANATIONS[$BASENAME]}"
+  if [ -z "$EXPLANATION" ]; then
+    echo "ERROR: Missing screenshot explanation for $BASENAME. Add it to SCREENSHOT_EXPLANATIONS in Step 6."
+    exit 1
+  fi
+  IMAGE_MARKDOWN="${IMAGE_MARKDOWN}
+### ${TITLE}
+![${BASENAME}](${REPO_URL}/${SCREENSHOTS_DIR}/${BASENAME})
+${EXPLANATION}
+"
+done
+
+# Write comment body to file to avoid shell interpretation issues with special characters
+COMMENT_FILE=$(mktemp)
+# If any uploads failed, append a section listing them with instructions
+FAILED_SECTION=""
+if [ ${#FAILED_UPLOADS[@]} -gt 0 ]; then
+  FAILED_SECTION="
+## ⚠️ Failed Screenshot Uploads
+The following screenshots could not be uploaded via the GitHub API after 3 retries.
+**To add them:** drag-and-drop or paste these files into a PR comment manually:
+"
+  for failed in "${FAILED_UPLOADS[@]}"; do
+    FAILED_SECTION="${FAILED_SECTION}
+- \`$(basename "$failed")\` (local path: \`$failed\`)"
+  done
+  FAILED_SECTION="${FAILED_SECTION}
+
+**Run status:** INCOMPLETE until the files above are manually attached and visible inline in the PR."
+fi
+
+cat > "$COMMENT_FILE" <<INNEREOF
+## E2E Test Report
+
+| # | Scenario | Result | API Evidence | Screenshot Evidence |
+|---|----------|--------|-------------|-------------------|
+${TEST_RESULTS_TABLE}
+
+${IMAGE_MARKDOWN}
+${FAILED_SECTION}
+INNEREOF
+
+gh api "repos/${REPO}/issues/$PR_NUMBER/comments" -F body=@"$COMMENT_FILE"
+rm -f "$COMMENT_FILE"
+
+# Verify the posted comment contains inline images — exit 1 if none found
+# Use separate --paginate + jq pipe: --jq applies per-page, not to the full list
+LAST_COMMENT=$(gh api "repos/${REPO}/issues/$PR_NUMBER/comments" --paginate 2>/dev/null | jq -r '.[-1].body // ""')
+if ! echo "$LAST_COMMENT" | grep -q '!\['; then
+  echo "ERROR: Posted comment contains no inline images (![). Bare directory links are not acceptable." >&2
+  exit 1
+fi
+echo "✓ Inline images verified in posted comment"
+```
+
+**The PR comment MUST include:**
+1. A summary table of all scenarios with PASS/FAIL and before/after API evidence
+2. Every successfully uploaded screenshot rendered inline; any failed uploads listed with manual attachment instructions
+3. A 1-2 sentence explanation below each screenshot describing what it proves
+
+This approach uses the GitHub Git API to create blobs, trees, commits, and refs entirely server-side. No local `git checkout` or `git push` — safe for worktrees and won't interfere with the PR branch.
+
+## Step 8: Evaluate and post a formal PR review
+
+After the test comment is posted, evaluate whether the run was thorough enough to make a merge decision, then post a formal GitHub review (approve or request changes). **This step is mandatory — every test run MUST end with a formal review decision.**
+
+### Evaluation criteria
+
+Re-read the PR description:
+```bash
+gh pr view "$PR_NUMBER" --json body --jq '.body' --repo "$REPO"
+```
+
+Score the run against each criterion:
+
+| Criterion | Pass condition |
+|-----------|---------------|
+| **Coverage** | Every feature/change described in the PR has at least one test scenario |
+| **All scenarios pass** | No FAIL rows in the results table |
+| **Negative tests** | At least one failure-path test per feature (invalid input, unauthorized, edge case) |
+| **Before/after evidence** | Every state-changing API call has before/after values logged |
+| **Screenshots are meaningful** | Screenshots show the actual state change, not just a loading spinner or blank page |
+| **No regressions** | Existing core flows (login, agent create/run) still work |
+
+### Decision logic
+
+```
+ALL criteria pass                            → APPROVE
+Any scenario FAIL or missing PR feature      → REQUEST_CHANGES (list gaps)
+Evidence weak (no before/after, vague shots) → REQUEST_CHANGES (list what's missing)
+```
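+
+The decision logic above can be expressed as a small function so the outcome is reproducible. A minimal sketch: `review_decision` and its `yes`/`no` evidence argument are hypothetical names introduced for illustration, and judging whether the evidence is strong enough remains a manual call.
+
+```bash
+# Hypothetical helper: map evaluation results to a review decision.
+# Args: <fail_count> <coverage_gap_count> <evidence_ok: yes|no>
+review_decision() {
+  local fail_count="$1" gap_count="$2" evidence_ok="$3"
+  if [ "$fail_count" -gt 0 ] || [ "$gap_count" -gt 0 ]; then
+    # Any failed scenario or untested PR feature blocks approval
+    echo "REQUEST_CHANGES"
+  elif [ "$evidence_ok" != "yes" ]; then
+    # Weak evidence (no before/after values, vague screenshots) also blocks
+    echo "REQUEST_CHANGES"
+  else
+    echo "APPROVE"
+  fi
+}
+```
+
+For example, once `FAIL_COUNT` and `COVERAGE_GAPS` have been populated, `review_decision "$FAIL_COUNT" "${#COVERAGE_GAPS[@]}" yes` prints `APPROVE` only when both counts are zero.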
+
+### Post the review
+
+```bash
+REVIEW_FILE=$(mktemp)
+
+# Count results (match the Result column so scenario names containing PASS/FAIL are not counted)
+PASS_COUNT=$(echo "$TEST_RESULTS_TABLE" | grep -c "| PASS |" || true)
+FAIL_COUNT=$(echo "$TEST_RESULTS_TABLE" | grep -c "| FAIL |" || true)
+TOTAL=$(( PASS_COUNT + FAIL_COUNT ))
+
+# List any coverage gaps found during evaluation (populate this array as you assess)
+# e.g. COVERAGE_GAPS=("PR claims to add X but no test covers it")
+COVERAGE_GAPS=()
+```
+
+**If APPROVING** — all criteria met, zero failures, full coverage:
+
+```bash
+cat > "$REVIEW_FILE" <<REVIEWEOF
+## E2E Test Evaluation — APPROVED
+
+**Results:** ${PASS_COUNT}/${TOTAL} scenarios passed.
+
+**Coverage:** All features described in the PR were exercised.
+
+**Evidence:** Before/after API values logged for all state-changing operations; screenshots show meaningful state transitions.
+
+**Negative tests:** Failure paths tested for each feature.
+
+No regressions observed on core flows.
+REVIEWEOF
+
+gh pr review "$PR_NUMBER" --repo "$REPO" --approve --body "$(cat "$REVIEW_FILE")"
+echo "✅ PR approved"
+```
+
+**If REQUESTING CHANGES** — any failure, coverage gap, or missing evidence:
+
+```bash
+FAIL_LIST=$(echo "$TEST_RESULTS_TABLE" | grep "FAIL" | awk -F'|' '{print "- Scenario" $2 "failed"}' || true)
+
+cat > "$REVIEW_FILE" <<REVIEWEOF
+## E2E Test Evaluation — Changes Requested
+
+**Results:** ${PASS_COUNT}/${TOTAL} scenarios passed, ${FAIL_COUNT} failed.
+
+### Required before merge
+
+${FAIL_LIST}
+$(for gap in "${COVERAGE_GAPS[@]}"; do echo "- $gap"; done)
+
+Please fix the above and re-run the E2E tests.
+REVIEWEOF
+
+gh pr review "$PR_NUMBER" --repo "$REPO" --request-changes --body "$(cat "$REVIEW_FILE")"
+echo "❌ Changes requested"
+```
+
+```bash
+rm -f "$REVIEW_FILE"
+```
+
+**Rules:**
+- In `--fix` mode, fix all failures before posting the review — the review reflects the final state after fixes
+- Never approve if any scenario failed, even if it seems like a flake — rerun that scenario first
+- Never request changes for issues already fixed in this run
+
+## Fix mode (--fix flag)
+
+When `--fix` is present, the standard is HIGHER. Do not just note issues — FIX them immediately.
+
+### Fix protocol for EVERY issue found (including UX issues):
+
+1. **Identify** the root cause in the code — read the relevant source files
+2. **Write a failing test first** (TDD): For backend bugs, write a test marked with `pytest.mark.xfail(reason="...")`. For frontend/Playwright bugs, write a test with `.fixme` annotation. Run it to confirm it fails as expected.
+3. **Screenshot** the broken state: `agent-browser --session-name pr-test screenshot $RESULTS_DIR/{NN}-broken-{description}.png`
+4. **Fix** the code in the worktree
+5. **Rebuild** ONLY the affected service (not the whole stack):
+   ```bash
+   cd $PLATFORM_DIR && docker compose up --build -d {service_name}
+   # e.g., docker compose up --build -d rest_server
+   # e.g., docker compose up --build -d frontend
+   ```
+6. **Wait** for the service to be ready (poll health endpoint)
+7. **Re-test** the same scenario
+8. **Screenshot** the fixed state: `agent-browser --session-name pr-test screenshot $RESULTS_DIR/{NN}-fixed-{description}.png`
+9. **Remove the xfail/fixme marker** from the test written in step 2, and verify it passes
+10. **Verify** the fix did not break other scenarios (run a quick smoke test)
+11. **Commit and push** immediately:
+    ```bash
+    cd $WORKTREE_PATH
+    git add -A
+    git commit -m "fix: {description of fix}"
+    git push
+    ```
+12. **Continue** to the next test scenario
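+
+Step 6 of the protocol (waiting for the rebuilt service) can be sketched as a generic retry loop. This is an illustration only: `wait_until` is a hypothetical helper, and the probe URL in the usage comment assumes the backend `/docs` endpoint polled in step 3f.
+
+```bash
+# Hypothetical helper: retry a command until it succeeds or attempts run out.
+# Args: <max_attempts> <delay_seconds> <command...>
+wait_until() {
+  local attempts="$1" delay="$2" i=1
+  shift 2
+  while [ "$i" -le "$attempts" ]; do
+    if "$@"; then
+      return 0
+    fi
+    # Skip the pause after the final failed attempt
+    if [ "$i" -lt "$attempts" ]; then
+      sleep "$delay"
+    fi
+    i=$((i + 1))
+  done
+  return 1
+}
+
+# Usage (commented out: needs the stack running):
+# wait_until 60 5 curl -sf -o /dev/null http://localhost:8006/docs || { echo "ERROR: rest_server not ready"; exit 1; }
+```
+
+Passing the probe as a command keeps the loop reusable for any service, whether the check is an HTTP request or a `docker compose ps` status.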
+
+### Fix loop (like pr-address)
+
+```text
+test scenario → find issue (bug OR UX problem) → screenshot broken state
+→ fix code → rebuild affected service only → re-test → screenshot fixed state
+→ verify no regressions → commit + push
+→ repeat for next scenario
+→ after ALL scenarios pass, run full re-test to verify everything together
+```
+
+**Key differences from non-fix mode:**
+- UX issues count as bugs — fix them (bad alignment, confusing labels, missing loading states)
+- Every fix MUST have a before/after screenshot pair proving it works
+- Commit after EACH fix, not in a batch at the end
+- The final re-test must produce a clean set of all-passing screenshots
+
+## Known issues and workarounds
+
+### Problem: "Database error finding user" on signup
+**Cause:** Supabase auth service schema cache is stale after migration.
+**Fix:** `docker restart supabase-auth && sleep 5` then retry signup.
+
+### Problem: Copilot returns auth errors in subscription mode
+**Cause:** `CHAT_USE_CLAUDE_CODE_SUBSCRIPTION=true` but `CLAUDE_CODE_OAUTH_TOKEN` is not set or expired.
+**Fix:** Re-run `refresh_claude_token.sh` to re-extract the OAuth token (see step 3b, Option 1), then recreate the container (`docker compose up -d copilot_executor`). The backend auto-provisions `~/.claude/.credentials.json` from the env var on startup. No `npm install` or `claude login` is needed: the SDK bundles its own CLI binary.
+
+### Problem: agent-browser can't find chromium
+**Cause:** The Dockerfile auto-provisions system chromium on all architectures (including ARM64). If your branch is behind `dev`, this may not be present yet.
+**Fix:** Check if chromium exists: `which chromium || which chromium-browser`. If missing, install it: `apt-get install -y chromium` and set `AGENT_BROWSER_EXECUTABLE_PATH=/usr/bin/chromium` in the container environment.
+
+### Problem: agent-browser selector matches multiple elements
+**Cause:** `text=X` matches all elements containing that text.
+**Fix:** Use `agent-browser snapshot` to get specific `ref=eNN` references, then use those: `agent-browser click eNN`.
+
+### Problem: Frontend shows cookie banner blocking interaction
+**Fix:** `agent-browser click 'text=Accept All'` before other interactions.
+
+### Problem: Container loses npm packages after rebuild
+**Cause:** `docker compose up --build` rebuilds the image, losing runtime installs.
+**Fix:** Add packages to the Dockerfile instead of installing at runtime.
+
+### Problem: Services not starting after `docker compose up`
+**Fix:** Wait and check health: `docker compose ps`. Common cause: migration hasn't finished. Check: `docker logs autogpt_platform-migrate-1 2>&1 | tail -5`. If supabase-db isn't healthy: `docker restart supabase-db && sleep 10`.
+
+### Problem: Docker uses cached layers with old code (PR changes not visible)
+**Cause:** `docker compose up --build` reuses cached `COPY` layers from previous builds. If the PR branch changes Python files but the previous build already cached that layer from `dev`, the container runs `dev` code.
+**Fix:** Always use `docker compose build --no-cache` for the first build of a PR branch. Subsequent rebuilds within the same branch can use `--build`.
+
+### Problem: `agent-browser open` loses login session
+**Cause:** Without session persistence, `agent-browser open` starts fresh.
+**Fix:** Use `--session-name pr-test` on ALL agent-browser commands. This auto-saves/restores cookies and localStorage across navigations. Alternatively, use `agent-browser eval "window.location.href = '...'"` to navigate within the same context.
+
+### Problem: Supabase auth returns "Database error querying schema"
+**Cause:** The database schema changed (migration ran) but supabase-auth has a stale schema cache.
+**Fix:** `docker restart supabase-db && sleep 10 && docker restart supabase-auth && sleep 8`. If user data was lost, re-signup.
--- a/.claude/skills/setup-repo/SKILL.md
+++ b/.claude/skills/setup-repo/SKILL.md
@@ -0,0 +1,195 @@
+---
+name: setup-repo
+description: Initialize a worktree-based repo layout for parallel development. Creates a main worktree, a reviews worktree for PR reviews, and N numbered work branches. Handles .env creation, dependency installation, and branchlet config. TRIGGER when user asks to set up the repo from scratch, initialize worktrees, bootstrap their dev environment, "setup repo", "setup worktrees", "initialize dev environment", "set up branches", or when a freshly cloned repo has no sibling worktrees.
+user-invocable: true
+args: "No arguments — interactive setup via prompts."
+metadata:
+  author: autogpt-team
+  version: "1.0.0"
+---
+
+# Repository Setup
+
+This skill sets up a worktree-based development layout from a freshly cloned repo. It creates:
+- A **main** worktree (the primary checkout)
+- A **reviews** worktree (for PR reviews)
+- **N work branches** (branch1..branchN) for parallel development
+
+## Step 1: Identify the repo
+
+Determine the repo root and parent directory:
+
+```bash
+ROOT=$(git rev-parse --show-toplevel)
+REPO_NAME=$(basename "$ROOT")
+PARENT=$(dirname "$ROOT")
+```
+
+Detect if the repo is already inside a worktree layout by counting sibling worktrees (not just checking the directory name, which could be anything):
+
+```bash
+# Count worktrees under $PARENT. The count includes $ROOT itself, so a value above 1 means siblings exist.
+SIBLING_COUNT=$(git worktree list --porcelain 2>/dev/null | grep -cF "worktree $PARENT/" || true)
+if [ "$SIBLING_COUNT" -gt 1 ]; then
+  echo "INFO: Existing worktree layout detected at $PARENT ($SIBLING_COUNT worktrees)"
+  # Use $ROOT as-is; skip renaming/restructuring
+else
+  echo "INFO: Fresh clone detected, proceeding with setup"
+fi
+```
+
+## Step 2: Ask the user questions
+
+Use AskUserQuestion to gather setup preferences:
+
+1. **How many parallel work branches do you need?** (Options: 4, 8, 16, or custom)
+   - These become `branch1` through `branchN`
+2. **Which branch should be the base?** (Options: origin/master, origin/dev, or custom)
+   - All work branches and reviews will start from this
+
+## Step 3: Fetch and set up branches
+
+```bash
+cd "$ROOT"
+git fetch origin
+
+# Create the reviews branch from the base chosen in Step 2, assumed to be stored in $BASE_BRANCH (skip if already exists)
+if git show-ref --verify --quiet refs/heads/reviews; then
+  echo "INFO: Branch 'reviews' already exists, skipping"
+else
+  git branch reviews "$BASE_BRANCH"
+fi
+
+# Create numbered work branches from base (skip if already exists)
+for i in $(seq 1 "$COUNT"); do
+  if git show-ref --verify --quiet "refs/heads/branch$i"; then
+    echo "INFO: Branch 'branch$i' already exists, skipping"
+  else
+    git branch "branch$i" "$BASE_BRANCH"
+  fi
+done
+```
+
+## Step 4: Create worktrees
+
+Create worktrees as siblings to the main checkout:
+
+```bash
+if [ -d "$PARENT/reviews" ]; then
+  echo "INFO: Worktree '$PARENT/reviews' already exists, skipping"
+else
+  git worktree add "$PARENT/reviews" reviews
+fi
+
+for i in $(seq 1 "$COUNT"); do
+  if [ -d "$PARENT/branch$i" ]; then
+    echo "INFO: Worktree '$PARENT/branch$i' already exists, skipping"
+  else
+    git worktree add "$PARENT/branch$i" "branch$i"
+  fi
+done
+```
+
+## Step 5: Set up environment files
+
+**Do NOT assume .env files exist.** For each worktree (including main if needed):
+
+1. Check if `.env` exists in the source worktree for each path
+2. If `.env` exists, copy it
+3. If only `.env.default` or `.env.example` exists, copy that as `.env`
+4. If none of these exist, warn the user and list which env files are missing
+
+Env file locations to check (same as the `/worktree` skill — keep these in sync):
+- `autogpt_platform/.env`
+- `autogpt_platform/backend/.env`
+- `autogpt_platform/frontend/.env`
+
+> **Note:** This env copying logic intentionally mirrors the `/worktree` skill's approach. If you update the path list or fallback logic here, update `/worktree` as well.
+
+```bash
+SOURCE="$ROOT"
+WORKTREES="reviews"
+for i in $(seq 1 "$COUNT"); do WORKTREES="$WORKTREES branch$i"; done
+
+FOUND_ANY_ENV=0
+for wt in $WORKTREES; do
+  TARGET="$PARENT/$wt"
+  for envpath in autogpt_platform autogpt_platform/backend autogpt_platform/frontend; do
+    if [ -f "$SOURCE/$envpath/.env" ]; then
+      FOUND_ANY_ENV=1
+      cp "$SOURCE/$envpath/.env" "$TARGET/$envpath/.env"
+    elif [ -f "$SOURCE/$envpath/.env.default" ]; then
+      FOUND_ANY_ENV=1
+      cp "$SOURCE/$envpath/.env.default" "$TARGET/$envpath/.env"
+      echo "NOTE: $wt/$envpath/.env was created from .env.default — you may need to edit it"
+    elif [ -f "$SOURCE/$envpath/.env.example" ]; then
+      FOUND_ANY_ENV=1
+      cp "$SOURCE/$envpath/.env.example" "$TARGET/$envpath/.env"
+      echo "NOTE: $wt/$envpath/.env was created from .env.example — you may need to edit it"
+    else
+      echo "WARNING: No .env, .env.default, or .env.example found at $SOURCE/$envpath/"
+    fi
+  done
+done
+
+if [ "$FOUND_ANY_ENV" -eq 0 ]; then
+  echo "WARNING: No environment files or templates were found in the source worktree."
+  # Use AskUserQuestion to confirm: "Continue setup without env files?"
+  # If the user declines, stop here and let them set up .env files first.
+fi
+```
+
+## Step 6: Copy branchlet config
+
+Copy `.branchlet.json` from main to each worktree so branchlet can manage sub-worktrees:
+
+```bash
+if [ -f "$ROOT/.branchlet.json" ]; then
+  for wt in $WORKTREES; do
+    cp "$ROOT/.branchlet.json" "$PARENT/$wt/.branchlet.json"
+  done
+fi
+```
+
+## Step 7: Install dependencies
+
+Install deps in all worktrees. Run these sequentially per worktree:
+
+```bash
+for wt in $WORKTREES; do
+  TARGET="$PARENT/$wt"
+  echo "=== Installing deps for $wt ==="
+  (cd "$TARGET/autogpt_platform/autogpt_libs" && poetry install) &&
+  (cd "$TARGET/autogpt_platform/backend" && poetry install && poetry run prisma generate) &&
+  (cd "$TARGET/autogpt_platform/frontend" && pnpm install) &&
+  echo "=== Done: $wt ===" ||
+  echo "=== FAILED: $wt ==="
+done
+```
+
+This is slow. Run in background if possible and notify when complete.
+
+## Step 8: Verify and report
+
+After setup, verify and report to the user:
+
+```bash
+git worktree list
+```
+
+Summarize:
+- Number of worktrees created
+- Which env files were copied vs created from defaults vs missing
+- Any warnings or errors encountered
+
+## Final directory layout
+
+```
+parent/
+  main/              # Primary checkout (already exists)
+  reviews/           # PR review worktree
+  branch1/           # Work branch 1
+  branch2/           # Work branch 2
+  ...
+  branchN/           # Work branch N
+```
--- a/.claude/skills/write-frontend-tests/SKILL.md
+++ b/.claude/skills/write-frontend-tests/SKILL.md
@@ -0,0 +1,224 @@
+---
+name: write-frontend-tests
+description: "Analyze the current branch diff against dev, plan integration tests for changed frontend pages/components, and write them. TRIGGER when user asks to write frontend tests, add test coverage, or 'write tests for my changes'."
+user-invocable: true
+args: "[base branch] — defaults to dev. Optionally pass a specific base branch to diff against."
+metadata:
+  author: autogpt-team
+  version: "1.0.0"
+---
+
+# Write Frontend Tests
+
+Analyze the current branch's frontend changes, plan integration tests, and write them.
+
+## References
+
+Before writing any tests, read the testing rules and conventions:
+
+- `autogpt_platform/frontend/TESTING.md` — testing strategy, file locations, examples
+- `autogpt_platform/frontend/src/tests/AGENTS.md` — detailed testing rules, MSW patterns, decision flowchart
+- `autogpt_platform/frontend/src/tests/integrations/test-utils.tsx` — custom render with providers
+- `autogpt_platform/frontend/src/tests/integrations/vitest.setup.tsx` — MSW server setup
+
+## Step 1: Identify changed frontend files
+
+```bash
+BASE_BRANCH="${ARGUMENTS:-dev}"
+cd autogpt_platform/frontend
+
+# Get changed frontend files (excluding generated, test, spec, and story files)
+git diff "$BASE_BRANCH"...HEAD --name-only -- src/ \
+  | grep -v '__generated__' \
+  | grep -v '__tests__' \
+  | grep -v '\.test\.' \
+  | grep -v '\.stories\.' \
+  | grep -v '\.spec\.'
+```
+
+Also read the diff to understand what changed:
+
+```bash
+git diff "$BASE_BRANCH"...HEAD --stat -- src/
+git diff "$BASE_BRANCH"...HEAD -- src/ | head -500
+```
+
+## Step 2: Categorize changes and find test targets
+
+For each changed file, determine:
+
+1. **Is it a page?** (`page.tsx`) — these are the primary test targets
+2. **Is it a hook?** (`use*.ts`) — test via the page that uses it
+3. **Is it a component?** (`.tsx` in `components/`) — test via the parent page unless it's complex enough to warrant isolation
+4. **Is it a helper?** (`helpers.ts`, `utils.ts`) — unit test directly if pure logic
+
+**Priority order:**
+1. Pages with new/changed data fetching or user interactions
+2. Components with complex internal logic (modals, forms, wizards)
+3. Hooks with non-trivial business logic
+4. Pure helper functions
+
+Skip: styling-only changes, type-only changes, config changes.
+
+## Step 3: Check for existing tests
+
+For each test target, check if tests already exist:
+
+```bash
+# For a page at src/app/(platform)/library/page.tsx
+ls src/app/\(platform\)/library/__tests__/ 2>/dev/null
+
+# For a component at src/app/(platform)/library/components/AgentCard/AgentCard.tsx
+ls src/app/\(platform\)/library/components/AgentCard/__tests__/ 2>/dev/null
+```
+
+Note which targets have no tests (need new files) vs which have tests that need updating.
+
+## Step 4: Identify API endpoints used
+
+For each test target, find which API hooks are used:
+
+```bash
+# Find generated API hook imports in the changed files
+grep -rn 'from.*__generated__/endpoints' src/app/\(platform\)/library/
+grep -rn 'use[A-Z].*V[12]' src/app/\(platform\)/library/
+```
+
+For each API hook found, locate the corresponding MSW handler:
+
+```bash
+# If the page uses useGetV2ListLibraryAgents, find its MSW handlers
+grep -rn 'getGetV2ListLibraryAgents.*Handler' src/app/api/__generated__/endpoints/library/library.msw.ts
+```
+
+List every MSW handler you will need (200 for happy path, 4xx for error paths).
+
+## Step 5: Write the test plan
+
+Before writing code, output a plan as a numbered list:
+
+```
+Test plan for [branch name]:
+
+1. src/app/(platform)/library/__tests__/main.test.tsx (NEW)
+   - Renders page with agent list (MSW 200)
+   - Shows loading state
+   - Shows error state (MSW 422)
+   - Handles empty agent list
+
+2. src/app/(platform)/library/__tests__/search.test.tsx (NEW)
+   - Filters agents by search query
+   - Shows no results message
+   - Clears search
+
+3. src/app/(platform)/library/components/AgentCard/__tests__/AgentCard.test.tsx (UPDATE)
+   - Add test for new "duplicate" action
+```
+
+Present this plan to the user. Wait for confirmation before proceeding. If the user has feedback, adjust the plan.
+
+## Step 6: Write the tests
+
+For each test file in the plan, follow these conventions:
+
+### File structure
+
+```tsx
+import { render, screen, waitFor } from "@/tests/integrations/test-utils";
+import { server } from "@/mocks/mock-server";
+// Import MSW handlers for endpoints the page uses
+import {
+  getGetV2ListLibraryAgentsMockHandler200,
+  getGetV2ListLibraryAgentsMockHandler422,
+} from "@/app/api/__generated__/endpoints/library/library.msw";
+// Import the component under test
+import LibraryPage from "../page";
+
+describe("LibraryPage", () => {
+  test("renders agent list from API", async () => {
+    server.use(getGetV2ListLibraryAgentsMockHandler200());
+
+    render(<LibraryPage />);
+
+    expect(await screen.findByText(/my agents/i)).toBeDefined();
+  });
+
+  test("shows error state on API failure", async () => {
+    server.use(getGetV2ListLibraryAgentsMockHandler422());
+
+    render(<LibraryPage />);
+
+    expect(await screen.findByText(/error/i)).toBeDefined();
+  });
+});
+```
+
+### Rules
+
+- Use `render()` from `@/tests/integrations/test-utils` (NOT from `@testing-library/react` directly)
+- Use `server.use()` to set up MSW handlers BEFORE rendering
+- Use `findBy*` (async) for elements that appear after data fetching — NOT `getBy*`
+- Use `getBy*` only for elements that are immediately present in the DOM
+- Use `screen` queries — do NOT destructure from `render()`
+- Use `waitFor` when asserting side effects or state changes after interactions
+- Import `fireEvent` or `userEvent` from the test-utils for interactions
+- Do NOT mock internal hooks or functions — mock at the API boundary via MSW
+- Do NOT use `act()` manually — `render` and `fireEvent` handle it
+- Keep tests focused: one behavior per test
+- Use descriptive test names that read like sentences
+
+### Test location
+
+```
+# For pages: __tests__/ next to page.tsx
+src/app/(platform)/library/__tests__/main.test.tsx
+
+# For complex standalone components: __tests__/ inside component folder
+src/app/(platform)/library/components/AgentCard/__tests__/AgentCard.test.tsx
+
+# For pure helpers: co-located .test.ts
+src/app/(platform)/library/helpers.test.ts
+```
+
+### Custom MSW overrides
+
+When the auto-generated faker data is not enough, override with specific data:
+
+```tsx
+import { http, HttpResponse } from "msw";
+
+server.use(
+  http.get("http://localhost:3000/api/proxy/api/v2/library/agents", () => {
+    return HttpResponse.json({
+      agents: [
+        { id: "1", name: "Test Agent", description: "A test agent" },
+      ],
+      pagination: { total_items: 1, total_pages: 1, page: 1, page_size: 10 },
+    });
+  }),
+);
+```
+
+Use the proxy URL pattern: `http://localhost:3000/api/proxy/api/v{version}/{path}` — this matches the MSW base URL configured in `orval.config.ts`.
+
+## Step 7: Run and verify
+
+After writing all tests:
+
+```bash
+cd autogpt_platform/frontend
+pnpm test:unit --reporter=verbose
+```
+
+If tests fail:
+1. Read the error output carefully
+2. Fix the test (not the source code, unless there is a genuine bug)
+3. Re-run until all pass
+
+Then run the full checks:
+
+```bash
+pnpm format
+pnpm lint
+pnpm types
+```
--- a/.github/PULL_REQUEST_TEMPLATE.md
+++ b/.github/PULL_REQUEST_TEMPLATE.md
@@ -1,8 +1,12 @@
-<!-- Clearly explain the need for these changes: -->
+### Why / What / How
+
+<!-- Why: Why does this PR exist? What problem does it solve, or what's broken/missing without it? -->
+<!-- What: What does this PR change? Summarize the changes at a high level. -->
+<!-- How: How does it work? Describe the approach, key implementation details, or architecture decisions. -->

 ### Changes 🏗️

-<!-- Concisely describe all of the changes made in this pull request: -->
+<!-- List the key changes. Keep it higher level than the diff but specific enough to highlight what's new/modified. -->

 ### Checklist 📋

--- a/.github/workflows/classic-autogpt-ci.yml
+++ b/.github/workflows/classic-autogpt-ci.yml
@@ -6,11 +6,19 @@ on:
    paths:
      - '.github/workflows/classic-autogpt-ci.yml'
      - 'classic/original_autogpt/**'
+      - 'classic/direct_benchmark/**'
+      - 'classic/forge/**'
+      - 'classic/pyproject.toml'
+      - 'classic/poetry.lock'
  pull_request:
    branches: [ master, dev, release-* ]
    paths:
      - '.github/workflows/classic-autogpt-ci.yml'
      - 'classic/original_autogpt/**'
+      - 'classic/direct_benchmark/**'
+      - 'classic/forge/**'
+      - 'classic/pyproject.toml'
+      - 'classic/poetry.lock'

 concurrency:
  group: ${{ format('classic-autogpt-ci-{0}', github.head_ref && format('{0}-{1}', github.event_name, github.event.pull_request.number) || github.sha) }}
@@ -19,47 +27,22 @@ concurrency:
 defaults:
  run:
    shell: bash
-    working-directory: classic/original_autogpt
+    working-directory: classic

 jobs:
  test:
    permissions:
      contents: read
    timeout-minutes: 30
-    strategy:
-      fail-fast: false
-      matrix:
-        python-version: ["3.10"]
-        platform-os: [ubuntu, macos, macos-arm64, windows]
-    runs-on: ${{ matrix.platform-os != 'macos-arm64' && format('{0}-latest', matrix.platform-os) || 'macos-14' }}
+    runs-on: ubuntu-latest

    steps:
-      # Quite slow on macOS (2~4 minutes to set up Docker)
-      # - name: Set up Docker (macOS)
-      #   if: runner.os == 'macOS'
-      #   uses: crazy-max/ghaction-setup-docker@v3
-
-      - name: Start MinIO service (Linux)
-        if: runner.os == 'Linux'
+      - name: Start MinIO service
        working-directory: '.'
        run: |
          docker pull minio/minio:edge-cicd
          docker run -d -p 9000:9000 minio/minio:edge-cicd

-      - name: Start MinIO service (macOS)
-        if: runner.os == 'macOS'
-        working-directory: ${{ runner.temp }}
-        run: |
-          brew install minio/stable/minio
-          mkdir data
-          minio server ./data &
-
-      # No MinIO on Windows:
-      # - Windows doesn't support running Linux Docker containers
-      # - It doesn't seem possible to start background processes on Windows. They are
-      #   killed after the step returns.
-      #   See: https://github.com/actions/runner/issues/598#issuecomment-2011890429
-
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
@@ -71,41 +54,23 @@ jobs:
          git config --global user.name "Auto-GPT-Bot"
          git config --global user.email "github-bot@agpt.co"

-      - name: Set up Python ${{ matrix.python-version }}
+      - name: Set up Python 3.12
        uses: actions/setup-python@v5
        with:
-          python-version: ${{ matrix.python-version }}
+          python-version: "3.12"

      - id: get_date
        name: Get date
        run: echo "date=$(date +'%Y-%m-%d')" >> $GITHUB_OUTPUT

      - name: Set up Python dependency cache
-        # On Windows, unpacking cached dependencies takes longer than just installing them
-        if: runner.os != 'Windows'
        uses: actions/cache@v4
        with:
-          path: ${{ runner.os == 'macOS' && '~/Library/Caches/pypoetry' || '~/.cache/pypoetry' }}
-          key: poetry-${{ runner.os }}-${{ hashFiles('classic/original_autogpt/poetry.lock') }}
+          path: ~/.cache/pypoetry
+          key: poetry-${{ runner.os }}-${{ hashFiles('classic/poetry.lock') }}

-      - name: Install Poetry (Unix)
-        if: runner.os != 'Windows'
-        run: |
-          curl -sSL https://install.python-poetry.org | python3 -
-
-          if [ "${{ runner.os }}" = "macOS" ]; then
-            PATH="$HOME/.local/bin:$PATH"
-            echo "$HOME/.local/bin" >> $GITHUB_PATH
-          fi
-
-      - name: Install Poetry (Windows)
-        if: runner.os == 'Windows'
-        shell: pwsh
-        run: |
-          (Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | python -
-
-          $env:PATH += ";$env:APPDATA\Python\Scripts"
-          echo "$env:APPDATA\Python\Scripts" >> $env:GITHUB_PATH
+      - name: Install Poetry
+        run: curl -sSL https://install.python-poetry.org | python3 -

      - name: Install Python dependencies
        run: poetry install
@@ -116,12 +81,13 @@ jobs:
            --cov=autogpt --cov-branch --cov-report term-missing --cov-report xml \
            --numprocesses=logical --durations=10 \
            --junitxml=junit.xml -o junit_family=legacy \
-            tests/unit tests/integration
+            original_autogpt/tests/unit original_autogpt/tests/integration
        env:
          CI: true
          PLAIN_OUTPUT: True
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
-          S3_ENDPOINT_URL: ${{ runner.os != 'Windows' && 'http://127.0.0.1:9000' || '' }}
+          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
+          S3_ENDPOINT_URL: http://127.0.0.1:9000
          AWS_ACCESS_KEY_ID: minioadmin
          AWS_SECRET_ACCESS_KEY: minioadmin

@@ -135,11 +101,11 @@ jobs:
        uses: codecov/codecov-action@v5
        with:
          token: ${{ secrets.CODECOV_TOKEN }}
-          flags: autogpt-agent,${{ runner.os }}
+          flags: autogpt-agent

      - name: Upload logs to artifact
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: test-logs
-          path: classic/original_autogpt/logs/
+          path: classic/logs/
--- a/.github/workflows/classic-autogpt-docker-ci.yml
+++ b/.github/workflows/classic-autogpt-docker-ci.yml
@@ -148,7 +148,7 @@ jobs:
            --entrypoint poetry ${{ env.IMAGE_NAME }} run \
            pytest -v --cov=autogpt --cov-branch --cov-report term-missing \
            --numprocesses=4 --durations=10 \
-            tests/unit tests/integration 2>&1 | tee test_output.txt
+            original_autogpt/tests/unit original_autogpt/tests/integration 2>&1 | tee test_output.txt

          test_failure=${PIPESTATUS[0]}

--- a/.github/workflows/classic-autogpts-ci.yml
+++ b/.github/workflows/classic-autogpts-ci.yml
@@ -10,10 +10,9 @@ on:
      - '.github/workflows/classic-autogpts-ci.yml'
      - 'classic/original_autogpt/**'
      - 'classic/forge/**'
-      - 'classic/benchmark/**'
-      - 'classic/run'
-      - 'classic/cli.py'
-      - 'classic/setup.py'
+      - 'classic/direct_benchmark/**'
+      - 'classic/pyproject.toml'
+      - 'classic/poetry.lock'
      - '!**/*.md'
  pull_request:
    branches: [ master, dev, release-* ]
@@ -21,10 +20,9 @@ on:
      - '.github/workflows/classic-autogpts-ci.yml'
      - 'classic/original_autogpt/**'
      - 'classic/forge/**'
-      - 'classic/benchmark/**'
-      - 'classic/run'
-      - 'classic/cli.py'
-      - 'classic/setup.py'
+      - 'classic/direct_benchmark/**'
+      - 'classic/pyproject.toml'
+      - 'classic/poetry.lock'
      - '!**/*.md'

 defaults:
@@ -35,13 +33,9 @@ defaults:
 jobs:
  serve-agent-protocol:
    runs-on: ubuntu-latest
-    strategy:
-      matrix:
-        agent-name: [ original_autogpt ]
-      fail-fast: false
    timeout-minutes: 20
    env:
-      min-python-version: '3.10'
+      min-python-version: '3.12'
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
@@ -55,22 +49,22 @@ jobs:
          python-version: ${{ env.min-python-version }}

      - name: Install Poetry
-        working-directory: ./classic/${{ matrix.agent-name }}/
        run: |
          curl -sSL https://install.python-poetry.org | python -

-      - name: Run regression tests
+      - name: Install dependencies
+        run: poetry install
+
+      - name: Run smoke tests with direct-benchmark
        run: |
-          ./run agent start ${{ matrix.agent-name }}
-          cd ${{ matrix.agent-name }}
-          poetry run agbenchmark --mock --test=BasicRetrieval --test=Battleship --test=WebArenaTask_0
-          poetry run agbenchmark --test=WriteFile
+          poetry run direct-benchmark run \
+            --strategies one_shot \
+            --models claude \
+            --tests ReadFile,WriteFile \
+            --json
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
-          AGENT_NAME: ${{ matrix.agent-name }}
+          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          REQUESTS_CA_BUNDLE: /etc/ssl/certs/ca-certificates.crt
-          HELICONE_CACHE_ENABLED: false
-          HELICONE_PROPERTY_AGENT: ${{ matrix.agent-name }}
-          REPORTS_FOLDER: ${{ format('../../reports/{0}', matrix.agent-name) }}
-          TELEMETRY_ENVIRONMENT: autogpt-ci
-          TELEMETRY_OPT_IN: ${{ github.ref_name == 'master' }}
+          NONINTERACTIVE_MODE: "true"
+          CI: true
--- a/.github/workflows/classic-benchmark-ci.yml
+++ b/.github/workflows/classic-benchmark-ci.yml
@@ -1,18 +1,24 @@
-name: Classic - AGBenchmark CI
+name: Classic - Direct Benchmark CI

 on:
  push:
    branches: [ master, dev, ci-test* ]
    paths:
-      - 'classic/benchmark/**'
-      - '!classic/benchmark/reports/**'
+      - 'classic/direct_benchmark/**'
+      - 'classic/original_autogpt/**'
+      - 'classic/forge/**'
      - .github/workflows/classic-benchmark-ci.yml
+      - 'classic/pyproject.toml'
+      - 'classic/poetry.lock'
  pull_request:
    branches: [ master, dev, release-* ]
    paths:
-      - 'classic/benchmark/**'
-      - '!classic/benchmark/reports/**'
+      - 'classic/direct_benchmark/**'
+      - 'classic/original_autogpt/**'
+      - 'classic/forge/**'
      - .github/workflows/classic-benchmark-ci.yml
+      - 'classic/pyproject.toml'
+      - 'classic/poetry.lock'

 concurrency:
  group: ${{ format('benchmark-ci-{0}', github.head_ref && format('{0}-{1}', github.event_name, github.event.pull_request.number) || github.sha) }}
@@ -23,95 +29,16 @@ defaults:
    shell: bash

 env:
-  min-python-version: '3.10'
+  min-python-version: '3.12'

 jobs:
-  test:
-    permissions:
-      contents: read
+  benchmark-tests:
+    runs-on: ubuntu-latest
    timeout-minutes: 30
-    strategy:
-      fail-fast: false
-      matrix:
-        python-version: ["3.10"]
-        platform-os: [ubuntu, macos, macos-arm64, windows]
-    runs-on: ${{ matrix.platform-os != 'macos-arm64' && format('{0}-latest', matrix.platform-os) || 'macos-14' }}
    defaults:
      run:
        shell: bash
-        working-directory: classic/benchmark
-    steps:
-      - name: Checkout repository
-        uses: actions/checkout@v4
-        with:
-          fetch-depth: 0
-          submodules: true
-
-      - name: Set up Python ${{ matrix.python-version }}
-        uses: actions/setup-python@v5
-        with:
-          python-version: ${{ matrix.python-version }}
-
-      - name: Set up Python dependency cache
-        # On Windows, unpacking cached dependencies takes longer than just installing them
-        if: runner.os != 'Windows'
-        uses: actions/cache@v4
-        with:
-          path: ${{ runner.os == 'macOS' && '~/Library/Caches/pypoetry' || '~/.cache/pypoetry' }}
-          key: poetry-${{ runner.os }}-${{ hashFiles('classic/benchmark/poetry.lock') }}
-
-      - name: Install Poetry (Unix)
-        if: runner.os != 'Windows'
-        run: |
-          curl -sSL https://install.python-poetry.org | python3 -
-
-          if [ "${{ runner.os }}" = "macOS" ]; then
-            PATH="$HOME/.local/bin:$PATH"
-            echo "$HOME/.local/bin" >> $GITHUB_PATH
-          fi
-
-      - name: Install Poetry (Windows)
-        if: runner.os == 'Windows'
-        shell: pwsh
-        run: |
-          (Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | python -
-
-          $env:PATH += ";$env:APPDATA\Python\Scripts"
-          echo "$env:APPDATA\Python\Scripts" >> $env:GITHUB_PATH
-
-      - name: Install Python dependencies
-        run: poetry install
-
-      - name: Run pytest with coverage
-        run: |
-          poetry run pytest -vv \
-            --cov=agbenchmark --cov-branch --cov-report term-missing --cov-report xml \
-            --durations=10 \
-            --junitxml=junit.xml -o junit_family=legacy \
-            tests
-        env:
-          CI: true
-          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
-
-      - name: Upload test results to Codecov
-        if: ${{ !cancelled() }}  # Run even if tests fail
-        uses: codecov/test-results-action@v1
-        with:
-          token: ${{ secrets.CODECOV_TOKEN }}
-
-      - name: Upload coverage reports to Codecov
-        uses: codecov/codecov-action@v5
-        with:
-          token: ${{ secrets.CODECOV_TOKEN }}
-          flags: agbenchmark,${{ runner.os }}
-
-  self-test-with-agent:
-    runs-on: ubuntu-latest
-    strategy:
-      matrix:
-        agent-name: [forge]
-      fail-fast: false
-    timeout-minutes: 20
+        working-directory: classic
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
@@ -124,53 +51,120 @@ jobs:
        with:
          python-version: ${{ env.min-python-version }}

+      - name: Set up Python dependency cache
+        uses: actions/cache@v4
+        with:
+          path: ~/.cache/pypoetry
+          key: poetry-${{ runner.os }}-${{ hashFiles('classic/poetry.lock') }}
+
      - name: Install Poetry
        run: |
-          curl -sSL https://install.python-poetry.org | python -
+          curl -sSL https://install.python-poetry.org | python3 -
+
+      - name: Install dependencies
+        run: poetry install
+
+      - name: Run basic benchmark tests
+        run: |
+          echo "Testing ReadFile challenge with one_shot strategy..."
+          poetry run direct-benchmark run \
+            --fresh \
+            --strategies one_shot \
+            --models claude \
+            --tests ReadFile \
+            --json
+
+          echo "Testing WriteFile challenge..."
+          poetry run direct-benchmark run \
+            --fresh \
+            --strategies one_shot \
+            --models claude \
+            --tests WriteFile \
+            --json
+        env:
+          CI: true
+          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
+          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
+          NONINTERACTIVE_MODE: "true"
+
+      - name: Test category filtering
+        run: |
+          echo "Testing coding category..."
+          poetry run direct-benchmark run \
+            --fresh \
+            --strategies one_shot \
+            --models claude \
+            --categories coding \
+            --tests ReadFile,WriteFile \
+            --json
+        env:
+          CI: true
+          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
+          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
+          NONINTERACTIVE_MODE: "true"
+
+      - name: Test multiple strategies
+        run: |
+          echo "Testing multiple strategies..."
+          poetry run direct-benchmark run \
+            --fresh \
+            --strategies one_shot,plan_execute \
+            --models claude \
+            --tests ReadFile \
+            --parallel 2 \
+            --json
+        env:
+          CI: true
+          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
+          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
+          NONINTERACTIVE_MODE: "true"
+
+  # Run regression tests on previously beaten challenges (--maintain)
+  regression-tests:
+    runs-on: ubuntu-latest
+    timeout-minutes: 45
+    if: github.ref == 'refs/heads/master' || github.ref == 'refs/heads/dev'
+    defaults:
+      run:
+        shell: bash
+        working-directory: classic
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+          submodules: true
+
+      - name: Set up Python ${{ env.min-python-version }}
+        uses: actions/setup-python@v5
+        with:
+          python-version: ${{ env.min-python-version }}
+
+      - name: Set up Python dependency cache
+        uses: actions/cache@v4
+        with:
+          path: ~/.cache/pypoetry
+          key: poetry-${{ runner.os }}-${{ hashFiles('classic/poetry.lock') }}
+
+      - name: Install Poetry
+        run: |
+          curl -sSL https://install.python-poetry.org | python3 -
+
+      - name: Install dependencies
+        run: poetry install

      - name: Run regression tests
-        working-directory: classic
        run: |
-          ./run agent start ${{ matrix.agent-name }}
-          cd ${{ matrix.agent-name }}
-
-          set +e # Ignore non-zero exit codes and continue execution
-          echo "Running the following command: poetry run agbenchmark --maintain --mock"
-          poetry run agbenchmark --maintain --mock
-          EXIT_CODE=$?
-          set -e  # Stop ignoring non-zero exit codes
-          # Check if the exit code was 5, and if so, exit with 0 instead
-          if [ $EXIT_CODE -eq 5 ]; then
-            echo "regression_tests.json is empty."
-          fi
-
-          echo "Running the following command: poetry run agbenchmark --mock"
-          poetry run agbenchmark --mock
-
-          echo "Running the following command: poetry run agbenchmark --mock --category=data"
-          poetry run agbenchmark --mock --category=data
-
-          echo "Running the following command: poetry run agbenchmark --mock --category=coding"
-          poetry run agbenchmark --mock --category=coding
-
-          # echo "Running the following command: poetry run agbenchmark --test=WriteFile"
-          # poetry run agbenchmark --test=WriteFile
-          cd ../benchmark
-          poetry install
-          echo "Adding the BUILD_SKILL_TREE environment variable. This will attempt to add new elements in the skill tree. If new elements are added, the CI fails because they should have been pushed"
-          export BUILD_SKILL_TREE=true
-
-          # poetry run agbenchmark --mock
-
-          # CHANGED=$(git diff --name-only | grep -E '(agbenchmark/challenges)|(../classic/frontend/assets)') || echo "No diffs"
-          # if [ ! -z "$CHANGED" ]; then
-          #   echo "There are unstaged changes please run agbenchmark and commit those changes since they are needed."
-          #   echo "$CHANGED"
-          #   exit 1
-          # else
-          #   echo "No unstaged changes."
-          # fi
+          echo "Running regression tests (previously beaten challenges)..."
+          poetry run direct-benchmark run \
+            --fresh \
+            --strategies one_shot \
+            --models claude \
+            --maintain \
+            --parallel 4 \
+            --json
        env:
+          CI: true
+          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
-          TELEMETRY_ENVIRONMENT: autogpt-benchmark-ci
-          TELEMETRY_OPT_IN: ${{ github.ref_name == 'master' }}
+          NONINTERACTIVE_MODE: "true"
--- a/.github/workflows/classic-forge-ci.yml
+++ b/.github/workflows/classic-forge-ci.yml
@@ -6,13 +6,15 @@ on:
    paths:
      - '.github/workflows/classic-forge-ci.yml'
      - 'classic/forge/**'
-      - '!classic/forge/tests/vcr_cassettes'
+      - 'classic/pyproject.toml'
+      - 'classic/poetry.lock'
  pull_request:
    branches: [ master, dev, release-* ]
    paths:
      - '.github/workflows/classic-forge-ci.yml'
      - 'classic/forge/**'
-      - '!classic/forge/tests/vcr_cassettes'
+      - 'classic/pyproject.toml'
+      - 'classic/poetry.lock'

 concurrency:
  group: ${{ format('forge-ci-{0}', github.head_ref && format('{0}-{1}', github.event_name, github.event.pull_request.number) || github.sha) }}
@@ -21,131 +23,60 @@ concurrency:
 defaults:
  run:
    shell: bash
-    working-directory: classic/forge
+    working-directory: classic

 jobs:
  test:
    permissions:
      contents: read
    timeout-minutes: 30
-    strategy:
-      fail-fast: false
-      matrix:
-        python-version: ["3.10"]
-        platform-os: [ubuntu, macos, macos-arm64, windows]
-    runs-on: ${{ matrix.platform-os != 'macos-arm64' && format('{0}-latest', matrix.platform-os) || 'macos-14' }}
+    runs-on: ubuntu-latest

    steps:
-      # Quite slow on macOS (2~4 minutes to set up Docker)
-      # - name: Set up Docker (macOS)
-      #   if: runner.os == 'macOS'
-      #   uses: crazy-max/ghaction-setup-docker@v3
-
-      - name: Start MinIO service (Linux)
-        if: runner.os == 'Linux'
+      - name: Start MinIO service
        working-directory: '.'
        run: |
          docker pull minio/minio:edge-cicd
          docker run -d -p 9000:9000 minio/minio:edge-cicd

-      - name: Start MinIO service (macOS)
-        if: runner.os == 'macOS'
-        working-directory: ${{ runner.temp }}
-        run: |
-          brew install minio/stable/minio
-          mkdir data
-          minio server ./data &
-
-      # No MinIO on Windows:
-      # - Windows doesn't support running Linux Docker containers
-      # - It doesn't seem possible to start background processes on Windows. They are
-      #   killed after the step returns.
-      #   See: https://github.com/actions/runner/issues/598#issuecomment-2011890429
-
      - name: Checkout repository
        uses: actions/checkout@v4
-        with:
-          fetch-depth: 0
-          submodules: true

-      - name: Checkout cassettes
-        if: ${{ startsWith(github.event_name, 'pull_request') }}
-        env:
-          PR_BASE: ${{ github.event.pull_request.base.ref }}
-          PR_BRANCH: ${{ github.event.pull_request.head.ref }}
-          PR_AUTHOR: ${{ github.event.pull_request.user.login }}
-        run: |
-          cassette_branch="${PR_AUTHOR}-${PR_BRANCH}"
-          cassette_base_branch="${PR_BASE}"
-          cd tests/vcr_cassettes
-
-          if ! git ls-remote --exit-code --heads origin $cassette_base_branch ; then
-            cassette_base_branch="master"
-          fi
-
-          if git ls-remote --exit-code --heads origin $cassette_branch ; then
-            git fetch origin $cassette_branch
-            git fetch origin $cassette_base_branch
-
-            git checkout $cassette_branch
-
-            # Pick non-conflicting cassette updates from the base branch
-            git merge --no-commit --strategy-option=ours origin/$cassette_base_branch
-            echo "Using cassettes from mirror branch '$cassette_branch'," \
-              "synced to upstream branch '$cassette_base_branch'."
-          else
-            git checkout -b $cassette_branch
-            echo "Branch '$cassette_branch' does not exist in cassette submodule." \
-              "Using cassettes from '$cassette_base_branch'."
-          fi
-
-      - name: Set up Python ${{ matrix.python-version }}
+      - name: Set up Python 3.12
        uses: actions/setup-python@v5
        with:
-          python-version: ${{ matrix.python-version }}
+          python-version: "3.12"

      - name: Set up Python dependency cache
-        # On Windows, unpacking cached dependencies takes longer than just installing them
-        if: runner.os != 'Windows'
        uses: actions/cache@v4
        with:
-          path: ${{ runner.os == 'macOS' && '~/Library/Caches/pypoetry' || '~/.cache/pypoetry' }}
-          key: poetry-${{ runner.os }}-${{ hashFiles('classic/forge/poetry.lock') }}
+          path: ~/.cache/pypoetry
+          key: poetry-${{ runner.os }}-${{ hashFiles('classic/poetry.lock') }}

-      - name: Install Poetry (Unix)
-        if: runner.os != 'Windows'
-        run: |
-          curl -sSL https://install.python-poetry.org | python3 -
-
-          if [ "${{ runner.os }}" = "macOS" ]; then
-            PATH="$HOME/.local/bin:$PATH"
-            echo "$HOME/.local/bin" >> $GITHUB_PATH
-          fi
-
-      - name: Install Poetry (Windows)
-        if: runner.os == 'Windows'
-        shell: pwsh
-        run: |
-          (Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | python -
-
-          $env:PATH += ";$env:APPDATA\Python\Scripts"
-          echo "$env:APPDATA\Python\Scripts" >> $env:GITHUB_PATH
+      - name: Install Poetry
+        run: curl -sSL https://install.python-poetry.org | python3 -

      - name: Install Python dependencies
        run: poetry install

+      - name: Install Playwright browsers
+        run: poetry run playwright install chromium
+
      - name: Run pytest with coverage
        run: |
          poetry run pytest -vv \
            --cov=forge --cov-branch --cov-report term-missing --cov-report xml \
            --durations=10 \
            --junitxml=junit.xml -o junit_family=legacy \
-            forge
+            forge/forge forge/tests
        env:
          CI: true
          PLAIN_OUTPUT: True
+          # API keys: tests that need them are skipped when the keys are unavailable.
+          # Secrets are not exposed to PRs from forks (GitHub security feature).
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
-          S3_ENDPOINT_URL: ${{ runner.os != 'Windows' && 'http://127.0.0.1:9000' || '' }}
+          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
+          S3_ENDPOINT_URL: http://127.0.0.1:9000
          AWS_ACCESS_KEY_ID: minioadmin
          AWS_SECRET_ACCESS_KEY: minioadmin

@@ -159,85 +90,11 @@ jobs:
        uses: codecov/codecov-action@v5
        with:
          token: ${{ secrets.CODECOV_TOKEN }}
-          flags: forge,${{ runner.os }}
-
-      - id: setup_git_auth
-        name: Set up git token authentication
-        # Cassettes may be pushed even when tests fail
-        if: success() || failure()
-        run: |
-          config_key="http.${{ github.server_url }}/.extraheader"
-          if [ "${{ runner.os }}" = 'macOS' ]; then
-            base64_pat=$(echo -n "pat:${{ secrets.PAT_REVIEW }}" | base64)
-          else
-            base64_pat=$(echo -n "pat:${{ secrets.PAT_REVIEW }}" | base64 -w0)
-          fi
-
-          git config "$config_key" \
-            "Authorization: Basic $base64_pat"
-
-          cd tests/vcr_cassettes
-          git config "$config_key" \
-            "Authorization: Basic $base64_pat"
-
-          echo "config_key=$config_key" >> $GITHUB_OUTPUT
-
-      - id: push_cassettes
-        name: Push updated cassettes
-        # For pull requests, push updated cassettes even when tests fail
-        if: github.event_name == 'push' || (! github.event.pull_request.head.repo.fork && (success() || failure()))
-        env:
-          PR_BRANCH: ${{ github.event.pull_request.head.ref }}
-          PR_AUTHOR: ${{ github.event.pull_request.user.login }}
-        run: |
-          if [ "${{ startsWith(github.event_name, 'pull_request') }}" = "true" ]; then
-            is_pull_request=true
-            cassette_branch="${PR_AUTHOR}-${PR_BRANCH}"
-          else
-            cassette_branch="${{ github.ref_name }}"
-          fi
-
-          cd tests/vcr_cassettes
-          # Commit & push changes to cassettes if any
-          if ! git diff --quiet; then
-            git add .
-            git commit -m "Auto-update cassettes"
-            git push origin HEAD:$cassette_branch
-            if [ ! $is_pull_request ]; then
-              cd ../..
-              git add tests/vcr_cassettes
-              git commit -m "Update cassette submodule"
-              git push origin HEAD:$cassette_branch
-            fi
-            echo "updated=true" >> $GITHUB_OUTPUT
-          else
-            echo "updated=false" >> $GITHUB_OUTPUT
-            echo "No cassette changes to commit"
-          fi
-
-      - name: Post Set up git token auth
-        if: steps.setup_git_auth.outcome == 'success'
-        run: |
-          git config --unset-all '${{ steps.setup_git_auth.outputs.config_key }}'
-          git submodule foreach git config --unset-all '${{ steps.setup_git_auth.outputs.config_key }}'
-
-      - name: Apply "behaviour change" label and comment on PR
-        if: ${{ startsWith(github.event_name, 'pull_request') }}
-        run: |
-          PR_NUMBER="${{ github.event.pull_request.number }}"
-          TOKEN="${{ secrets.PAT_REVIEW }}"
-          REPO="${{ github.repository }}"
-
-          if [[ "${{ steps.push_cassettes.outputs.updated }}" == "true" ]]; then
-            echo "Adding label and comment..."
-            echo $TOKEN | gh auth login --with-token
-            gh issue edit $PR_NUMBER --add-label "behaviour change"
-            gh issue comment $PR_NUMBER --body "You changed AutoGPT's behaviour on ${{ runner.os }}. The cassettes have been updated and will be merged to the submodule when this Pull Request gets merged."
-          fi
+          flags: forge

      - name: Upload logs to artifact
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: test-logs
-          path: classic/forge/logs/
+          path: classic/logs/
--- a/.github/workflows/classic-frontend-ci.yml
+++ b/.github/workflows/classic-frontend-ci.yml
@@ -1,60 +0,0 @@
-name: Classic - Frontend CI/CD
-
-on:
-  push:
-    branches:
-      - master
-      - dev
-      - 'ci-test*' # This will match any branch that starts with "ci-test"
-    paths:
-      - 'classic/frontend/**'
-      - '.github/workflows/classic-frontend-ci.yml'
-  pull_request:
-    paths:
-      - 'classic/frontend/**'
-      - '.github/workflows/classic-frontend-ci.yml'
-
-jobs:
-  build:
-    permissions:
-      contents: write
-      pull-requests: write
-    runs-on: ubuntu-latest
-    env:
-      BUILD_BRANCH: ${{ format('classic-frontend-build/{0}', github.ref_name) }}
-
-    steps:
-      - name: Checkout Repo
-        uses: actions/checkout@v4
-
-      - name: Setup Flutter
-        uses: subosito/flutter-action@v2
-        with:
-          flutter-version: '3.13.2'
-
-      - name: Build Flutter to Web
-        run: |
-          cd classic/frontend
-          flutter build web --base-href /app/
-
-      # - name: Commit and Push to ${{ env.BUILD_BRANCH }}
-      #   if: github.event_name == 'push'
-      #   run: |
-      #     git config --local user.email "action@github.com"
-      #     git config --local user.name "GitHub Action"
-      #     git add classic/frontend/build/web
-      #     git checkout -B ${{ env.BUILD_BRANCH }}
-      #     git commit -m "Update frontend build to ${GITHUB_SHA:0:7}" -a
-      #     git push -f origin ${{ env.BUILD_BRANCH }}
-
-      - name: Create PR ${{ env.BUILD_BRANCH }} -> ${{ github.ref_name }}
-        if: github.event_name == 'push'
-        uses: peter-evans/create-pull-request@v8
-        with:
-          add-paths: classic/frontend/build/web
-          base: ${{ github.ref_name }}
-          branch: ${{ env.BUILD_BRANCH }}
-          delete-branch: true
-          title: "Update frontend build in `${{ github.ref_name }}`"
-          body: "This PR updates the frontend build based on commit ${{ github.sha }}."
-          commit-message: "Update frontend build based on commit ${{ github.sha }}"
--- a/.github/workflows/classic-python-checks.yml
+++ b/.github/workflows/classic-python-checks.yml
@@ -7,7 +7,9 @@ on:
      - '.github/workflows/classic-python-checks-ci.yml'
      - 'classic/original_autogpt/**'
      - 'classic/forge/**'
-      - 'classic/benchmark/**'
+      - 'classic/direct_benchmark/**'
+      - 'classic/pyproject.toml'
+      - 'classic/poetry.lock'
      - '**.py'
      - '!classic/forge/tests/vcr_cassettes'
  pull_request:
@@ -16,7 +18,9 @@ on:
      - '.github/workflows/classic-python-checks-ci.yml'
      - 'classic/original_autogpt/**'
      - 'classic/forge/**'
-      - 'classic/benchmark/**'
+      - 'classic/direct_benchmark/**'
+      - 'classic/pyproject.toml'
+      - 'classic/poetry.lock'
      - '**.py'
      - '!classic/forge/tests/vcr_cassettes'

@@ -27,44 +31,13 @@ concurrency:
 defaults:
  run:
    shell: bash
+    working-directory: classic

 jobs:
-  get-changed-parts:
-    runs-on: ubuntu-latest
-    steps:
-      - name: Checkout repository
-        uses: actions/checkout@v4
-
-      - id: changes-in
-        name: Determine affected subprojects
-        uses: dorny/paths-filter@v3
-        with:
-          filters: |
-            original_autogpt:
-              - classic/original_autogpt/autogpt/**
-              - classic/original_autogpt/tests/**
-              - classic/original_autogpt/poetry.lock
-            forge:
-              - classic/forge/forge/**
-              - classic/forge/tests/**
-              - classic/forge/poetry.lock
-            benchmark:
-              - classic/benchmark/agbenchmark/**
-              - classic/benchmark/tests/**
-              - classic/benchmark/poetry.lock
-    outputs:
-      changed-parts: ${{ steps.changes-in.outputs.changes }}
-
  lint:
-    needs: get-changed-parts
    runs-on: ubuntu-latest
    env:
-      min-python-version: "3.10"
-
-    strategy:
-      matrix:
-        sub-package: ${{ fromJson(needs.get-changed-parts.outputs.changed-parts) }}
-      fail-fast: false
+      min-python-version: "3.12"

    steps:
      - name: Checkout repository
@@ -81,42 +54,31 @@ jobs:
        uses: actions/cache@v4
        with:
          path: ~/.cache/pypoetry
-          key: ${{ runner.os }}-poetry-${{ hashFiles(format('{0}/poetry.lock', matrix.sub-package)) }}
+          key: ${{ runner.os }}-poetry-${{ hashFiles('classic/poetry.lock') }}

      - name: Install Poetry
        run: curl -sSL https://install.python-poetry.org | python3 -

-      # Install dependencies
-
      - name: Install Python dependencies
-        run: poetry -C classic/${{ matrix.sub-package }} install
+        run: poetry install

      # Lint

      - name: Lint (isort)
        run: poetry run isort --check .
-        working-directory: classic/${{ matrix.sub-package }}

      - name: Lint (Black)
        if: success() || failure()
        run: poetry run black --check .
-        working-directory: classic/${{ matrix.sub-package }}

      - name: Lint (Flake8)
        if: success() || failure()
        run: poetry run flake8 .
-        working-directory: classic/${{ matrix.sub-package }}

  types:
-    needs: get-changed-parts
    runs-on: ubuntu-latest
    env:
-      min-python-version: "3.10"
-
-    strategy:
-      matrix:
-        sub-package: ${{ fromJson(needs.get-changed-parts.outputs.changed-parts) }}
-      fail-fast: false
+      min-python-version: "3.12"

    steps:
      - name: Checkout repository
@@ -133,19 +95,16 @@ jobs:
        uses: actions/cache@v4
        with:
          path: ~/.cache/pypoetry
-          key: ${{ runner.os }}-poetry-${{ hashFiles(format('{0}/poetry.lock', matrix.sub-package)) }}
+          key: ${{ runner.os }}-poetry-${{ hashFiles('classic/poetry.lock') }}

      - name: Install Poetry
        run: curl -sSL https://install.python-poetry.org | python3 -

-      # Install dependencies
-
      - name: Install Python dependencies
-        run: poetry -C classic/${{ matrix.sub-package }} install
+        run: poetry install

      # Typecheck

      - name: Typecheck
        if: success() || failure()
        run: poetry run pyright
-        working-directory: classic/${{ matrix.sub-package }}
--- a/.github/workflows/platform-backend-ci.yml
+++ b/.github/workflows/platform-backend-ci.yml
@@ -5,12 +5,14 @@ on:
    branches: [master, dev, ci-test*]
    paths:
      - ".github/workflows/platform-backend-ci.yml"
+      - ".github/workflows/scripts/get_package_version_from_lockfile.py"
      - "autogpt_platform/backend/**"
      - "autogpt_platform/autogpt_libs/**"
  pull_request:
    branches: [master, dev, release-*]
    paths:
      - ".github/workflows/platform-backend-ci.yml"
+      - ".github/workflows/scripts/get_package_version_from_lockfile.py"
      - "autogpt_platform/backend/**"
      - "autogpt_platform/autogpt_libs/**"
  merge_group:
@@ -25,10 +27,91 @@ defaults:
    working-directory: autogpt_platform/backend

 jobs:
+  lint:
+    permissions:
+      contents: read
+    timeout-minutes: 10
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v6
+
+      - name: Set up Python 3.12
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+
+      - name: Set up Python dependency cache
+        uses: actions/cache@v5
+        with:
+          path: ~/.cache/pypoetry
+          key: poetry-${{ runner.os }}-py3.12-${{ hashFiles('autogpt_platform/backend/poetry.lock') }}
+
+      - name: Install Poetry
+        run: |
+          HEAD_POETRY_VERSION=$(python ../../.github/workflows/scripts/get_package_version_from_lockfile.py poetry)
+          echo "Using Poetry version ${HEAD_POETRY_VERSION}"
+          curl -sSL https://install.python-poetry.org | POETRY_VERSION=$HEAD_POETRY_VERSION python3 -
+
+      - name: Install Python dependencies
+        run: poetry install
+
+      - name: Run Linters
+        run: poetry run lint --skip-pyright
+
+    env:
+      CI: true
+      PLAIN_OUTPUT: True
+
+  type-check:
+    permissions:
+      contents: read
+    timeout-minutes: 10
+    strategy:
+      fail-fast: false
+      matrix:
+        python-version: ["3.11", "3.12", "3.13"]
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v6
+
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v5
+        with:
+          python-version: ${{ matrix.python-version }}
+
+      - name: Set up Python dependency cache
+        uses: actions/cache@v5
+        with:
+          path: ~/.cache/pypoetry
+          key: poetry-${{ runner.os }}-py${{ matrix.python-version }}-${{ hashFiles('autogpt_platform/backend/poetry.lock') }}
+
+      - name: Install Poetry
+        run: |
+          HEAD_POETRY_VERSION=$(python ../../.github/workflows/scripts/get_package_version_from_lockfile.py poetry)
+          echo "Using Poetry version ${HEAD_POETRY_VERSION}"
+          curl -sSL https://install.python-poetry.org | POETRY_VERSION=$HEAD_POETRY_VERSION python3 -
+
+      - name: Install Python dependencies
+        run: poetry install
+
+      - name: Generate Prisma Client
+        run: poetry run prisma generate && poetry run gen-prisma-stub
+
+      - name: Run Pyright
+        run: poetry run pyright --pythonversion ${{ matrix.python-version }}
+
+    env:
+      CI: true
+      PLAIN_OUTPUT: True
+
  test:
    permissions:
      contents: read
-    timeout-minutes: 30
+    timeout-minutes: 15
    strategy:
      fail-fast: false
      matrix:
@@ -96,9 +179,9 @@ jobs:
        uses: actions/cache@v5
        with:
          path: ~/.cache/pypoetry
-          key: poetry-${{ runner.os }}-${{ hashFiles('autogpt_platform/backend/poetry.lock') }}
+          key: poetry-${{ runner.os }}-py${{ matrix.python-version }}-${{ hashFiles('autogpt_platform/backend/poetry.lock') }}

-      - name: Install Poetry (Unix)
+      - name: Install Poetry
        run: |
          # Extract Poetry version from backend/poetry.lock
          HEAD_POETRY_VERSION=$(python ../../.github/workflows/scripts/get_package_version_from_lockfile.py poetry)
@@ -156,22 +239,22 @@ jobs:
          echo "Waiting for ClamAV daemon to start..."
          max_attempts=60
          attempt=0
-          
+
          until nc -z localhost 3310 || [ $attempt -eq $max_attempts ]; do
            echo "ClamAV is unavailable - sleeping (attempt $((attempt+1))/$max_attempts)"
            sleep 5
            attempt=$((attempt+1))
          done
-          
+
          if [ $attempt -eq $max_attempts ]; then
            echo "ClamAV failed to start after $((max_attempts*5)) seconds"
            echo "Checking ClamAV service logs..."
            docker logs $(docker ps -q --filter "ancestor=clamav/clamav-debian:latest") 2>&1 | tail -50 || echo "No ClamAV container found"
            exit 1
          fi
-          
+
          echo "ClamAV is ready!"
-          
+
          # Verify ClamAV is responsive
          echo "Testing ClamAV connection..."
          timeout 10 bash -c 'echo "PING" | nc localhost 3310' || {
@@ -186,18 +269,15 @@ jobs:
          DATABASE_URL: ${{ steps.supabase.outputs.DB_URL }}
          DIRECT_URL: ${{ steps.supabase.outputs.DB_URL }}

-      - id: lint
-        name: Run Linter
-        run: poetry run lint
-
      - name: Run pytest with coverage
        run: |
          if [[ "${{ runner.debug }}" == "1" ]]; then
-            poetry run pytest -s -vv -o log_cli=true -o log_cli_level=DEBUG
+            poetry run pytest -s -vv -o log_cli=true -o log_cli_level=DEBUG \
+              --cov=backend --cov-branch --cov-report term-missing --cov-report xml
          else
-            poetry run pytest -s -vv
+            poetry run pytest -s -vv \
+              --cov=backend --cov-branch --cov-report term-missing --cov-report xml
          fi
-        if: success() || (failure() && steps.lint.outcome == 'failure')
        env:
          LOG_LEVEL: ${{ runner.debug && 'DEBUG' || 'INFO' }}
          DATABASE_URL: ${{ steps.supabase.outputs.DB_URL }}
@@ -209,6 +289,14 @@ jobs:
          REDIS_PORT: "6379"
          ENCRYPTION_KEY: "dvziYgz0KSK8FENhju0ZYi8-fRTfAdlz6YLhdB_jhNw=" # DO NOT USE IN PRODUCTION!!

+      - name: Upload coverage reports to Codecov
+        if: ${{ !cancelled() }}
+        uses: codecov/codecov-action@v5
+        with:
+          token: ${{ secrets.CODECOV_TOKEN }}
+          flags: platform-backend
+          files: ./autogpt_platform/backend/coverage.xml
+
    env:
      CI: true
      PLAIN_OUTPUT: True
@@ -222,9 +310,3 @@ jobs:
      # the backend service, docker composes, and examples
      RABBITMQ_DEFAULT_USER: "rabbitmq_user_default"
      RABBITMQ_DEFAULT_PASS: "k0VMxyIJF9S35f3x2uaw5IWAl6Y536O7"
-
-      # - name: Upload coverage reports to Codecov
-      #   uses: codecov/codecov-action@v4
-      #   with:
-      #     token: ${{ secrets.CODECOV_TOKEN }}
-      #     flags: backend,${{ runner.os }}
--- a/.github/workflows/platform-frontend-ci.yml
+++ b/.github/workflows/platform-frontend-ci.yml
@@ -120,175 +120,6 @@ jobs:
          token: ${{ secrets.GITHUB_TOKEN }}
          exitOnceUploaded: true

-  e2e_test:
-    name: end-to-end tests
-    runs-on: big-boi
-
-    steps:
-      - name: Checkout repository
-        uses: actions/checkout@v6
-        with:
-          submodules: recursive
-
-      - name: Set up Platform - Copy default supabase .env
-        run: |
-          cp ../.env.default ../.env
-
-      - name: Set up Platform - Copy backend .env and set OpenAI API key
-        run: |
-          cp ../backend/.env.default ../backend/.env
-          echo "OPENAI_INTERNAL_API_KEY=${{ secrets.OPENAI_API_KEY }}" >> ../backend/.env
-        env:
-          # Used by E2E test data script to generate embeddings for approved store agents
-          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
-
-      - name: Set up Platform - Set up Docker Buildx
-        uses: docker/setup-buildx-action@v3
-        with:
-          driver: docker-container
-          driver-opts: network=host
-
-      - name: Set up Platform - Expose GHA cache to docker buildx CLI
-        uses: crazy-max/ghaction-github-runtime@v4
-
-      - name: Set up Platform - Build Docker images (with cache)
-        working-directory: autogpt_platform
-        run: |
-          pip install pyyaml
-
-          # Resolve extends and generate a flat compose file that bake can understand
-          docker compose -f docker-compose.yml config > docker-compose.resolved.yml
-
-          # Add cache configuration to the resolved compose file
-          python ../.github/workflows/scripts/docker-ci-fix-compose-build-cache.py \
-            --source docker-compose.resolved.yml \
-            --cache-from "type=gha" \
-            --cache-to "type=gha,mode=max" \
-            --backend-hash "${{ hashFiles('autogpt_platform/backend/Dockerfile', 'autogpt_platform/backend/poetry.lock', 'autogpt_platform/backend/backend') }}" \
-            --frontend-hash "${{ hashFiles('autogpt_platform/frontend/Dockerfile', 'autogpt_platform/frontend/pnpm-lock.yaml', 'autogpt_platform/frontend/src') }}" \
-            --git-ref "${{ github.ref }}"
-
-          # Build with bake using the resolved compose file (now includes cache config)
-          docker buildx bake --allow=fs.read=.. -f docker-compose.resolved.yml --load
-        env:
-          NEXT_PUBLIC_PW_TEST: true
-
-      - name: Set up tests - Cache E2E test data
-        id: e2e-data-cache
-        uses: actions/cache@v5
-        with:
-          path: /tmp/e2e_test_data.sql
-          key: e2e-test-data-${{ hashFiles('autogpt_platform/backend/test/e2e_test_data.py', 'autogpt_platform/backend/migrations/**', '.github/workflows/platform-frontend-ci.yml') }}
-
-      - name: Set up Platform - Start Supabase DB + Auth
-        run: |
-          docker compose -f ../docker-compose.resolved.yml up -d db auth --no-build
-          echo "Waiting for database to be ready..."
-          timeout 60 sh -c 'until docker compose -f ../docker-compose.resolved.yml exec -T db pg_isready -U postgres 2>/dev/null; do sleep 2; done'
-          echo "Waiting for auth service to be ready..."
-          timeout 60 sh -c 'until docker compose -f ../docker-compose.resolved.yml exec -T db psql -U postgres -d postgres -c "SELECT 1 FROM auth.users LIMIT 1" 2>/dev/null; do sleep 2; done' || echo "Auth schema check timeout, continuing..."
-
-      - name: Set up Platform - Run migrations
-        run: |
-          echo "Running migrations..."
-          docker compose -f ../docker-compose.resolved.yml run --rm migrate
-          echo "✅ Migrations completed"
-        env:
-          NEXT_PUBLIC_PW_TEST: true
-
-      - name: Set up tests - Load cached E2E test data
-        if: steps.e2e-data-cache.outputs.cache-hit == 'true'
-        run: |
-          echo "✅ Found cached E2E test data, restoring..."
-          {
-            echo "SET session_replication_role = 'replica';"
-            cat /tmp/e2e_test_data.sql
-            echo "SET session_replication_role = 'origin';"
-          } | docker compose -f ../docker-compose.resolved.yml exec -T db psql -U postgres -d postgres -b
-          # Refresh materialized views after restore
-          docker compose -f ../docker-compose.resolved.yml exec -T db \
-            psql -U postgres -d postgres -b -c "SET search_path TO platform; SELECT refresh_store_materialized_views();" || true
-
-          echo "✅ E2E test data restored from cache"
-
-      - name: Set up Platform - Start (all other services)
-        run: |
-          docker compose -f ../docker-compose.resolved.yml up -d --no-build
-          echo "Waiting for rest_server to be ready..."
-          timeout 60 sh -c 'until curl -f http://localhost:8006/health 2>/dev/null; do sleep 2; done' || echo "Rest server health check timeout, continuing..."
-        env:
-          NEXT_PUBLIC_PW_TEST: true
-
-      - name: Set up tests - Create E2E test data
-        if: steps.e2e-data-cache.outputs.cache-hit != 'true'
-        run: |
-          echo "Creating E2E test data..."
-          docker cp ../backend/test/e2e_test_data.py $(docker compose -f ../docker-compose.resolved.yml ps -q rest_server):/tmp/e2e_test_data.py
-          docker compose -f ../docker-compose.resolved.yml exec -T rest_server sh -c "cd /app/autogpt_platform && python /tmp/e2e_test_data.py" || {
-            echo "❌ E2E test data creation failed!"
-            docker compose -f ../docker-compose.resolved.yml logs --tail=50 rest_server
-            exit 1
-          }
-
-          # Dump auth.users + platform schema for cache (two separate dumps)
-          echo "Dumping database for cache..."
-          {
-            docker compose -f ../docker-compose.resolved.yml exec -T db \
-              pg_dump -U postgres --data-only --column-inserts \
-              --table='auth.users' postgres
-            docker compose -f ../docker-compose.resolved.yml exec -T db \
-              pg_dump -U postgres --data-only --column-inserts \
-              --schema=platform \
-              --exclude-table='platform._prisma_migrations' \
-              --exclude-table='platform.apscheduler_jobs' \
-              --exclude-table='platform.apscheduler_jobs_batched_notifications' \
-              postgres
-          } > /tmp/e2e_test_data.sql
-
-          echo "✅ Database dump created for caching ($(wc -l < /tmp/e2e_test_data.sql) lines)"
-
-      - name: Set up tests - Enable corepack
-        run: corepack enable
-
-      - name: Set up tests - Set up Node
-        uses: actions/setup-node@v6
-        with:
-          node-version: "22.18.0"
-          cache: "pnpm"
-          cache-dependency-path: autogpt_platform/frontend/pnpm-lock.yaml
-
-      - name: Set up tests - Install dependencies
-        run: pnpm install --frozen-lockfile
-
-      - name: Set up tests - Install browser 'chromium'
-        run: pnpm playwright install --with-deps chromium
-
-      - name: Run Playwright tests
-        run: pnpm test:no-build
-        continue-on-error: false
-
-      - name: Upload Playwright report
-        if: always()
-        uses: actions/upload-artifact@v4
-        with:
-          name: playwright-report
-          path: playwright-report
-          if-no-files-found: ignore
-          retention-days: 3
-
-      - name: Upload Playwright test results
-        if: always()
-        uses: actions/upload-artifact@v4
-        with:
-          name: playwright-test-results
-          path: test-results
-          if-no-files-found: ignore
-          retention-days: 3
-
-      - name: Print Final Docker Compose logs
-        if: always()
-        run: docker compose -f ../docker-compose.resolved.yml logs
-
  integration_test:
    runs-on: ubuntu-latest
    needs: setup
@@ -317,3 +148,11 @@ jobs:

      - name: Run Integration Tests
        run: pnpm test:unit
+
+      - name: Upload coverage reports to Codecov
+        if: ${{ !cancelled() }}
+        uses: codecov/codecov-action@v5
+        with:
+          token: ${{ secrets.CODECOV_TOKEN }}
+          flags: platform-frontend
+          files: ./autogpt_platform/frontend/coverage/cobertura-coverage.xml
--- a/.github/workflows/platform-fullstack-ci.yml
+++ b/.github/workflows/platform-fullstack-ci.yml
@@ -1,14 +1,18 @@
-name: AutoGPT Platform - Frontend CI
+name: AutoGPT Platform - Full-stack CI

 on:
  push:
    branches: [master, dev]
    paths:
      - ".github/workflows/platform-fullstack-ci.yml"
+      - ".github/workflows/scripts/docker-ci-fix-compose-build-cache.py"
+      - ".github/workflows/scripts/get_package_version_from_lockfile.py"
      - "autogpt_platform/**"
  pull_request:
    paths:
      - ".github/workflows/platform-fullstack-ci.yml"
+      - ".github/workflows/scripts/docker-ci-fix-compose-build-cache.py"
+      - ".github/workflows/scripts/get_package_version_from_lockfile.py"
      - "autogpt_platform/**"
  merge_group:

@@ -24,42 +28,28 @@ defaults:
 jobs:
  setup:
    runs-on: ubuntu-latest
-    outputs:
-      cache-key: ${{ steps.cache-key.outputs.key }}

    steps:
      - name: Checkout repository
        uses: actions/checkout@v6

-      - name: Set up Node.js
-        uses: actions/setup-node@v6
-        with:
-          node-version: "22.18.0"
-
      - name: Enable corepack
        run: corepack enable

-      - name: Generate cache key
-        id: cache-key
-        run: echo "key=${{ runner.os }}-pnpm-${{ hashFiles('autogpt_platform/frontend/pnpm-lock.yaml', 'autogpt_platform/frontend/package.json') }}" >> $GITHUB_OUTPUT
-
-      - name: Cache dependencies
-        uses: actions/cache@v5
+      - name: Set up Node
+        uses: actions/setup-node@v6
        with:
-          path: ~/.pnpm-store
-          key: ${{ steps.cache-key.outputs.key }}
-          restore-keys: |
-            ${{ runner.os }}-pnpm-${{ hashFiles('autogpt_platform/frontend/pnpm-lock.yaml') }}
-            ${{ runner.os }}-pnpm-
+          node-version: "22.18.0"
+          cache: "pnpm"
+          cache-dependency-path: autogpt_platform/frontend/pnpm-lock.yaml

-      - name: Install dependencies
+      - name: Install dependencies to populate cache
        run: pnpm install --frozen-lockfile

-  types:
-    runs-on: big-boi
+  check-api-types:
+    name: check API types
+    runs-on: ubuntu-latest
    needs: setup
-    strategy:
-      fail-fast: false

    steps:
      - name: Checkout repository
@@ -67,70 +57,279 @@ jobs:
        with:
          submodules: recursive

-      - name: Set up Node.js
+      # ------------------------ Backend setup ------------------------
+
+      - name: Set up Backend - Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+
+      - name: Set up Backend - Install Poetry
+        working-directory: autogpt_platform/backend
+        run: |
+          POETRY_VERSION=$(python ../../.github/workflows/scripts/get_package_version_from_lockfile.py poetry)
+          echo "Installing Poetry version ${POETRY_VERSION}"
+          curl -sSL https://install.python-poetry.org | POETRY_VERSION=$POETRY_VERSION python3 -
+
+      - name: Set up Backend - Set up dependency cache
+        uses: actions/cache@v5
+        with:
+          path: ~/.cache/pypoetry
+          key: poetry-${{ runner.os }}-${{ hashFiles('autogpt_platform/backend/poetry.lock') }}
+
+      - name: Set up Backend - Install dependencies
+        working-directory: autogpt_platform/backend
+        run: poetry install
+
+      - name: Set up Backend - Generate Prisma client
+        working-directory: autogpt_platform/backend
+        run: poetry run prisma generate && poetry run gen-prisma-stub
+
+      - name: Set up Frontend - Export OpenAPI schema from Backend
+        working-directory: autogpt_platform/backend
+        run: poetry run export-api-schema --output ../frontend/src/app/api/openapi.json
+
+      # ------------------------ Frontend setup ------------------------
+
+      - name: Set up Frontend - Enable corepack
+        run: corepack enable
+
+      - name: Set up Frontend - Set up Node
        uses: actions/setup-node@v6
        with:
          node-version: "22.18.0"
+          cache: "pnpm"
+          cache-dependency-path: autogpt_platform/frontend/pnpm-lock.yaml

-      - name: Enable corepack
-        run: corepack enable
-
-      - name: Copy default supabase .env
-        run: |
-          cp ../.env.default ../.env
-
-      - name: Copy backend .env
-        run: |
-          cp ../backend/.env.default ../backend/.env
-
-      - name: Run docker compose
-        run: |
-          docker compose -f ../docker-compose.yml --profile local up -d deps_backend
-
-      - name: Restore dependencies cache
-        uses: actions/cache@v5
-        with:
-          path: ~/.pnpm-store
-          key: ${{ needs.setup.outputs.cache-key }}
-          restore-keys: |
-            ${{ runner.os }}-pnpm-
-
-      - name: Install dependencies
+      - name: Set up Frontend - Install dependencies
        run: pnpm install --frozen-lockfile

-      - name: Setup .env
-        run: cp .env.default .env
-
-      - name: Wait for services to be ready
-        run: |
-          echo "Waiting for rest_server to be ready..."
-          timeout 60 sh -c 'until curl -f http://localhost:8006/health 2>/dev/null; do sleep 2; done' || echo "Rest server health check timeout, continuing..."
-          echo "Waiting for database to be ready..."
-          timeout 60 sh -c 'until docker compose -f ../docker-compose.yml exec -T db pg_isready -U postgres 2>/dev/null; do sleep 2; done' || echo "Database ready check timeout, continuing..."
-
-      - name: Generate API queries
-        run: pnpm generate:api:force
+      - name: Set up Frontend - Format OpenAPI schema
+        id: format-schema
+        run: pnpm prettier --write ./src/app/api/openapi.json

      - name: Check for API schema changes
        run: |
          if ! git diff --exit-code src/app/api/openapi.json; then
            echo "❌ API schema changes detected in src/app/api/openapi.json"
            echo ""
-            echo "The openapi.json file has been modified after running 'pnpm generate:api-all'."
+            echo "The openapi.json file has been modified after exporting the API schema."
            echo "This usually means changes have been made in the BE endpoints without updating the Frontend."
            echo "The API schema is now out of sync with the Front-end queries."
            echo ""
            echo "To fix this:"
-            echo "1. Pull the backend 'docker compose pull && docker compose up -d --build --force-recreate'"
-            echo "2. Run 'pnpm generate:api' locally"
-            echo "3. Run 'pnpm types' locally"
-            echo "4. Fix any TypeScript errors that may have been introduced"
-            echo "5. Commit and push your changes"
+            printf '\nIn the backend directory:\n'
+            echo "1. Run 'poetry run export-api-schema --output ../frontend/src/app/api/openapi.json'"
+            printf '\nIn the frontend directory:\n'
+            echo "2. Run 'pnpm prettier --write src/app/api/openapi.json'"
+            echo "3. Run 'pnpm generate:api'"
+            echo "4. Run 'pnpm types'"
+            echo "5. Fix any TypeScript errors that may have been introduced"
+            echo "6. Commit and push your changes"
            echo ""
            exit 1
          else
            echo "✅ No API schema changes detected"
          fi

-      - name: Run Typescript checks
+      - name: Set up Frontend - Generate API client
+        id: generate-api-client
+        run: pnpm orval --config ./orval.config.ts
+        # Continue with type generation & check even if there are schema changes
+        if: success() || (steps.format-schema.outcome == 'success')
+
+      - name: Check for TypeScript errors
        run: pnpm types
+        if: success() || (steps.generate-api-client.outcome == 'success')
+
+  e2e_test:
+    name: end-to-end tests
+    runs-on: big-boi
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v6
+        with:
+          submodules: recursive
+
+      - name: Set up Platform - Copy default supabase .env
+        run: |
+          cp ../.env.default ../.env
+
+      - name: Set up Platform - Copy backend .env and set OpenAI API key
+        run: |
+          cp ../backend/.env.default ../backend/.env
+          echo "OPENAI_INTERNAL_API_KEY=${OPENAI_API_KEY}" >> ../backend/.env
+        env:
+          # Used by E2E test data script to generate embeddings for approved store agents
+          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
+
+      - name: Set up Platform - Set up Docker Buildx
+        uses: docker/setup-buildx-action@v3
+        with:
+          driver: docker-container
+          driver-opts: network=host
+
+      - name: Set up Platform - Expose GHA cache to docker buildx CLI
+        uses: crazy-max/ghaction-github-runtime@v4
+
+      - name: Set up Platform - Build Docker images (with cache)
+        working-directory: autogpt_platform
+        run: |
+          pip install pyyaml
+
+          # Resolve extends and generate a flat compose file that bake can understand
+          export NEXT_PUBLIC_SOURCEMAPS NEXT_PUBLIC_PW_TEST
+          docker compose -f docker-compose.yml config > docker-compose.resolved.yml
+
+          # Ensure NEXT_PUBLIC_SOURCEMAPS is in resolved compose
+          # (docker compose config on some versions drops this arg)
+          if ! grep -q "NEXT_PUBLIC_SOURCEMAPS" docker-compose.resolved.yml; then
+            echo "Injecting NEXT_PUBLIC_SOURCEMAPS into resolved compose (docker compose config dropped it)"
+            sed -i '/NEXT_PUBLIC_PW_TEST/a\        NEXT_PUBLIC_SOURCEMAPS: "true"' docker-compose.resolved.yml
+          fi
+
+          # Add cache configuration to the resolved compose file
+          python ../.github/workflows/scripts/docker-ci-fix-compose-build-cache.py \
+            --source docker-compose.resolved.yml \
+            --cache-from "type=gha" \
+            --cache-to "type=gha,mode=max" \
+            --backend-hash "${{ hashFiles('autogpt_platform/backend/Dockerfile', 'autogpt_platform/backend/poetry.lock', 'autogpt_platform/backend/backend/**') }}" \
+            --frontend-hash "${{ hashFiles('autogpt_platform/frontend/Dockerfile', 'autogpt_platform/frontend/pnpm-lock.yaml', 'autogpt_platform/frontend/src/**') }}-sourcemaps" \
+            --git-ref "${{ github.ref }}"
+
+          # Build with bake using the resolved compose file (now includes cache config)
+          docker buildx bake --allow=fs.read=.. -f docker-compose.resolved.yml --load
+        env:
+          NEXT_PUBLIC_PW_TEST: true
+          NEXT_PUBLIC_SOURCEMAPS: true
+
+      - name: Set up tests - Cache E2E test data
+        id: e2e-data-cache
+        uses: actions/cache@v5
+        with:
+          path: /tmp/e2e_test_data.sql
+          key: e2e-test-data-${{ hashFiles('autogpt_platform/backend/test/e2e_test_data.py', 'autogpt_platform/backend/migrations/**', '.github/workflows/platform-fullstack-ci.yml') }}
+
+      - name: Set up Platform - Start Supabase DB + Auth
+        run: |
+          docker compose -f ../docker-compose.resolved.yml up -d db auth --no-build
+          echo "Waiting for database to be ready..."
+          timeout 60 sh -c 'until docker compose -f ../docker-compose.resolved.yml exec -T db pg_isready -U postgres 2>/dev/null; do sleep 2; done'
+          echo "Waiting for auth service to be ready..."
+          timeout 60 sh -c 'until docker compose -f ../docker-compose.resolved.yml exec -T db psql -U postgres -d postgres -c "SELECT 1 FROM auth.users LIMIT 1" 2>/dev/null; do sleep 2; done' || echo "Auth schema check timeout, continuing..."
+
+      - name: Set up Platform - Run migrations
+        run: |
+          echo "Running migrations..."
+          docker compose -f ../docker-compose.resolved.yml run --rm migrate
+          echo "✅ Migrations completed"
+        env:
+          NEXT_PUBLIC_PW_TEST: true
+
+      - name: Set up tests - Load cached E2E test data
+        if: steps.e2e-data-cache.outputs.cache-hit == 'true'
+        run: |
+          echo "✅ Found cached E2E test data, restoring..."
+          {
+            echo "SET session_replication_role = 'replica';"
+            cat /tmp/e2e_test_data.sql
+            echo "SET session_replication_role = 'origin';"
+          } | docker compose -f ../docker-compose.resolved.yml exec -T db psql -U postgres -d postgres -b
+          # Refresh materialized views after restore
+          docker compose -f ../docker-compose.resolved.yml exec -T db \
+            psql -U postgres -d postgres -b -c "SET search_path TO platform; SELECT refresh_store_materialized_views();" || true
+
+          echo "✅ E2E test data restored from cache"
+
+      - name: Set up Platform - Start (all other services)
+        run: |
+          docker compose -f ../docker-compose.resolved.yml up -d --no-build
+          echo "Waiting for rest_server to be ready..."
+          timeout 60 sh -c 'until curl -f http://localhost:8006/health 2>/dev/null; do sleep 2; done' || echo "Rest server health check timeout, continuing..."
+        env:
+          NEXT_PUBLIC_PW_TEST: true
+
+      - name: Set up tests - Create E2E test data
+        if: steps.e2e-data-cache.outputs.cache-hit != 'true'
+        run: |
+          echo "Creating E2E test data..."
+          docker cp ../backend/test/e2e_test_data.py $(docker compose -f ../docker-compose.resolved.yml ps -q rest_server):/tmp/e2e_test_data.py
+          docker compose -f ../docker-compose.resolved.yml exec -T rest_server sh -c "cd /app/autogpt_platform && python /tmp/e2e_test_data.py" || {
+            echo "❌ E2E test data creation failed!"
+            docker compose -f ../docker-compose.resolved.yml logs --tail=50 rest_server
+            exit 1
+          }
+
+          # Dump auth.users + platform schema for cache (two separate dumps)
+          echo "Dumping database for cache..."
+          {
+            docker compose -f ../docker-compose.resolved.yml exec -T db \
+              pg_dump -U postgres --data-only --column-inserts \
+              --table='auth.users' postgres
+            docker compose -f ../docker-compose.resolved.yml exec -T db \
+              pg_dump -U postgres --data-only --column-inserts \
+              --schema=platform \
+              --exclude-table='platform._prisma_migrations' \
+              --exclude-table='platform.apscheduler_jobs' \
+              --exclude-table='platform.apscheduler_jobs_batched_notifications' \
+              postgres
+          } > /tmp/e2e_test_data.sql
+
+          echo "✅ Database dump created for caching ($(wc -l < /tmp/e2e_test_data.sql) lines)"
+
+      - name: Set up tests - Enable corepack
+        run: corepack enable
+
+      - name: Set up tests - Set up Node
+        uses: actions/setup-node@v6
+        with:
+          node-version: "22.18.0"
+          cache: "pnpm"
+          cache-dependency-path: autogpt_platform/frontend/pnpm-lock.yaml
+
+      - name: Copy source maps from Docker for E2E coverage
+        run: |
+          FRONTEND_CONTAINER=$(docker compose -f ../docker-compose.resolved.yml ps -q frontend)
+          docker cp "$FRONTEND_CONTAINER":/app/.next/static .next-static-coverage
+
+      - name: Set up tests - Install dependencies
+        run: pnpm install --frozen-lockfile
+
+      - name: Set up tests - Install browser 'chromium'
+        run: pnpm playwright install --with-deps chromium
+
+      - name: Run Playwright tests
+        run: pnpm test:no-build
+        continue-on-error: false
+
+      - name: Upload E2E coverage to Codecov
+        if: ${{ !cancelled() }}
+        uses: codecov/codecov-action@v5
+        with:
+          token: ${{ secrets.CODECOV_TOKEN }}
+          flags: platform-frontend-e2e
+          files: ./autogpt_platform/frontend/coverage/e2e/cobertura-coverage.xml
+          disable_search: true
+
+      - name: Upload Playwright report
+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: playwright-report
+          path: autogpt_platform/frontend/playwright-report
+          if-no-files-found: ignore
+          retention-days: 3
+
+      - name: Upload Playwright test results
+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: playwright-test-results
+          path: autogpt_platform/frontend/test-results
+          if-no-files-found: ignore
+          retention-days: 3
+
+      - name: Print Final Docker Compose logs
+        if: always()
+        run: docker compose -f ../docker-compose.resolved.yml logs
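
Note on the readiness checks in the `e2e_test` job above: they all share one shape, `timeout 60 sh -c 'until <probe>; do sleep 2; done'`. The following is only an illustrative Python sketch of that poll-until-ready loop and is not part of this change. The names `wait_until` and `make_flaky_probe`, their parameters, and the fake probe are hypothetical and exist only for this example.

```python
import time
from typing import Callable


def wait_until(probe: Callable[[], bool], timeout_s: float = 60.0,
               interval_s: float = 2.0,
               sleep: Callable[[float], None] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll `probe` until it returns True or `timeout_s` elapses.

    Roughly mirrors `timeout 60 sh -c 'until <probe>; do sleep 2; done'`:
    returns True on success and False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


def make_flaky_probe(ready_after: int) -> Callable[[], bool]:
    """Hypothetical stand-in for `pg_isready` or `curl -f .../health`:
    reports ready on the `ready_after`-th call."""
    calls = {"n": 0}

    def probe() -> bool:
        calls["n"] += 1
        return calls["n"] >= ready_after

    return probe


if __name__ == "__main__":
    ok = wait_until(make_flaky_probe(3), timeout_s=1.0, interval_s=0.01)
    print("ready" if ok else "timed out")
```

In the workflow itself, the `pg_isready` loop has no fallback, so a timeout fails the step, whereas the auth schema and `rest_server` checks append `|| echo "... timeout, continuing..."` and therefore let the job proceed after a timeout.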
--- a/.gitignore
+++ b/.gitignore
@@ -3,6 +3,7 @@
 classic/original_autogpt/keys.py
 classic/original_autogpt/*.json
 auto_gpt_workspace/*
+.autogpt/
 *.mpeg
 .env
 # Root .env files
@@ -16,6 +17,7 @@ log-ingestion.txt
 /logs
 *.log
 *.mp3
+!autogpt_platform/frontend/public/notification.mp3
 mem.sqlite3
 venvAutoGPT

@@ -159,6 +161,10 @@ CURRENT_BULLETIN.md

 # AgBenchmark
 classic/benchmark/agbenchmark/reports/
+classic/reports/
+classic/direct_benchmark/reports/
+classic/.benchmark_workspaces/
+classic/direct_benchmark/.benchmark_workspaces/

 # Nodejs
 package-lock.json
@@ -177,9 +183,13 @@ autogpt_platform/backend/settings.py

 *.ign.*
 .test-contents
+**/.claude/settings.local.json
 .claude/settings.local.json
 CLAUDE.local.md
 /autogpt_platform/backend/logs
+
+# Test database
+test.db
 .next
 # Implementation plans (generated by AI agents)
 plans/
--- a/.gitleaks.toml
+++ b/.gitleaks.toml
@@ -0,0 +1,36 @@
+title = "AutoGPT Gitleaks Config"
+
+[extend]
+useDefault = true
+
+[allowlist]
+description = "Global allowlist"
+paths = [
+    # Template/example env files (no real secrets)
+    '''\.env\.(default|example|template)$''',
+    # Lock files
+    '''pnpm-lock\.yaml$''',
+    '''poetry\.lock$''',
+    # Secrets baseline
+    '''\.secrets\.baseline$''',
+    # Build artifacts and caches (should not be committed)
+    '''__pycache__/''',
+    '''classic/frontend/build/''',
+    # Docker dev setup (local dev JWTs/keys only)
+    '''autogpt_platform/db/docker/''',
+    # Load test configs (dev JWTs)
+    '''load-tests/configs/''',
+    # Test files with fake/fixture keys (_test.py, test_*.py, conftest.py)
+    '''(_test|test_.*|conftest)\.py$''',
+    # Documentation (only contains placeholder keys in curl/API examples)
+    '''docs/.*\.md$''',
+    # Firebase config (public API keys by design)
+    '''google-services\.json$''',
+    '''classic/frontend/(lib|web)/''',
+]
+# CI test-only encryption key (marked DO NOT USE IN PRODUCTION)
+regexes = [
+    '''dvziYgz0KSK8FENhju0ZYi8''',
+    # LLM model name enum values falsely flagged as API keys
+    '''Llama-\d.*Instruct''',
+]
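
Note on the allowlist above: a quick way to sanity-check which files the `paths` patterns exempt is to run them against sample paths. The sketch below copies the patterns from `.gitleaks.toml` and assumes they are applied as unanchored regex searches against the file path. `is_allowlisted` and every sample path are hypothetical, chosen only for illustration.

```python
import re

# Path patterns copied from the [allowlist] `paths` array in .gitleaks.toml.
ALLOWLIST_PATHS = [
    r"\.env\.(default|example|template)$",
    r"pnpm-lock\.yaml$",
    r"poetry\.lock$",
    r"\.secrets\.baseline$",
    r"__pycache__/",
    r"classic/frontend/build/",
    r"autogpt_platform/db/docker/",
    r"load-tests/configs/",
    r"(_test|test_.*|conftest)\.py$",
    r"docs/.*\.md$",
    r"google-services\.json$",
    r"classic/frontend/(lib|web)/",
]
COMPILED = [re.compile(pattern) for pattern in ALLOWLIST_PATHS]


def is_allowlisted(path: str) -> bool:
    """Return True if any allowlist pattern matches somewhere in `path`.

    Assumption: patterns are searched (not full-matched) against the path.
    """
    return any(rx.search(path) for rx in COMPILED)


if __name__ == "__main__":
    # Hypothetical sample paths, not taken from the repository listing.
    for sample in [
        "autogpt_platform/backend/.env.default",
        "autogpt_platform/backend/.env",
        "autogpt_platform/backend/backend/util/service_test.py",
        "autogpt_platform/backend/backend/util/service.py",
    ]:
        print(sample, "->", is_allowlisted(sample))
```

Under that assumption, the test-file pattern is not anchored to a path separator, so a hypothetical name such as `latest_news.py` would also be exempted because it contains `test_`.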
--- a/.gitmodules
+++ b/.gitmodules
@@ -1,3 +0,0 @@
-[submodule "classic/forge/tests/vcr_cassettes"]
-	path = classic/forge/tests/vcr_cassettes
-	url = https://github.com/Significant-Gravitas/Auto-GPT-test-cassettes
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -23,9 +23,15 @@ repos:
      - id: detect-secrets
        name: Detect secrets
        description: Detects high entropy strings that are likely to be passwords.
+        args: ["--baseline", ".secrets.baseline"]
        files: ^autogpt_platform/
-        exclude: pnpm-lock\.yaml$
-        stages: [pre-push]
+        exclude: (pnpm-lock\.yaml|\.env\.(default|example|template))$
+
+  - repo: https://github.com/gitleaks/gitleaks
+    rev: v8.24.3
+    hooks:
+      - id: gitleaks
+        name: Detect secrets (gitleaks)

  - repo: local
    # For proper type checking, all dependencies need to be up-to-date.
@@ -84,51 +90,16 @@ repos:
        stages: [pre-commit, post-checkout]

      - id: poetry-install
-        name: Check & Install dependencies - Classic - AutoGPT
-        alias: poetry-install-classic-autogpt
+        name: Check & Install dependencies - Classic
+        alias: poetry-install-classic
        entry: >
          bash -c '
          if [ -n "$PRE_COMMIT_FROM_REF" ]; then
            git diff --name-only "$PRE_COMMIT_FROM_REF" "$PRE_COMMIT_TO_REF"
          else
            git diff --cached --name-only
-          fi | grep -qE "^classic/(original_autogpt|forge)/poetry\.lock$" || exit 0;
-          poetry -C classic/original_autogpt install
-          '
-        # include forge source (since it's a path dependency)
-        always_run: true
-        language: system
-        pass_filenames: false
-        stages: [pre-commit, post-checkout]
-
-      - id: poetry-install
-        name: Check & Install dependencies - Classic - Forge
-        alias: poetry-install-classic-forge
-        entry: >
-          bash -c '
-          if [ -n "$PRE_COMMIT_FROM_REF" ]; then
-            git diff --name-only "$PRE_COMMIT_FROM_REF" "$PRE_COMMIT_TO_REF"
-          else
-            git diff --cached --name-only
-          fi | grep -qE "^classic/forge/poetry\.lock$" || exit 0;
-          poetry -C classic/forge install
-          '
-        always_run: true
-        language: system
-        pass_filenames: false
-        stages: [pre-commit, post-checkout]
-
-      - id: poetry-install
-        name: Check & Install dependencies - Classic - Benchmark
-        alias: poetry-install-classic-benchmark
-        entry: >
-          bash -c '
-          if [ -n "$PRE_COMMIT_FROM_REF" ]; then
-            git diff --name-only "$PRE_COMMIT_FROM_REF" "$PRE_COMMIT_TO_REF"
-          else
-            git diff --cached --name-only
-          fi | grep -qE "^classic/benchmark/poetry\.lock$" || exit 0;
-          poetry -C classic/benchmark install
+          fi | grep -qE "^classic/poetry\.lock$" || exit 0;
+          poetry -C classic install
          '
        always_run: true
        language: system
@@ -223,26 +194,10 @@ repos:
        language: system

      - id: isort
-        name: Lint (isort) - Classic - AutoGPT
-        alias: isort-classic-autogpt
-        entry: poetry -P classic/original_autogpt run isort -p autogpt
-        files: ^classic/original_autogpt/
-        types: [file, python]
-        language: system
-
-      - id: isort
-        name: Lint (isort) - Classic - Forge
-        alias: isort-classic-forge
-        entry: poetry -P classic/forge run isort -p forge
-        files: ^classic/forge/
-        types: [file, python]
-        language: system
-
-      - id: isort
-        name: Lint (isort) - Classic - Benchmark
-        alias: isort-classic-benchmark
-        entry: poetry -P classic/benchmark run isort -p agbenchmark
-        files: ^classic/benchmark/
+        name: Lint (isort) - Classic
+        alias: isort-classic
+        entry: bash -c 'cd classic && poetry run isort $(echo "$@" | sed "s|classic/||g")' --
+        files: ^classic/(original_autogpt|forge|direct_benchmark)/
        types: [file, python]
        language: system

@@ -256,26 +211,13 @@ repos:

  - repo: https://github.com/PyCQA/flake8
    rev: 7.0.0
-    # To have flake8 load the config of the individual subprojects, we have to call
-    # them separately.
+    # Use consolidated flake8 config at classic/.flake8
    hooks:
      - id: flake8
-        name: Lint (Flake8) - Classic - AutoGPT
-        alias: flake8-classic-autogpt
-        files: ^classic/original_autogpt/(autogpt|scripts|tests)/
-        args: [--config=classic/original_autogpt/.flake8]
-
-      - id: flake8
-        name: Lint (Flake8) - Classic - Forge
-        alias: flake8-classic-forge
-        files: ^classic/forge/(forge|tests)/
-        args: [--config=classic/forge/.flake8]
-
-      - id: flake8
-        name: Lint (Flake8) - Classic - Benchmark
-        alias: flake8-classic-benchmark
-        files: ^classic/benchmark/(agbenchmark|tests)/((?!reports).)*[/.]
-        args: [--config=classic/benchmark/.flake8]
+        name: Lint (Flake8) - Classic
+        alias: flake8-classic
+        files: ^classic/(original_autogpt|forge|direct_benchmark)/
+        args: [--config=classic/.flake8]

  - repo: local
    hooks:
@@ -311,29 +253,10 @@ repos:
        pass_filenames: false

      - id: pyright
-        name: Typecheck - Classic - AutoGPT
-        alias: pyright-classic-autogpt
-        entry: poetry -C classic/original_autogpt run pyright
-        # include forge source (since it's a path dependency) but exclude *_test.py files:
-        files: ^(classic/original_autogpt/((autogpt|scripts|tests)/|poetry\.lock$)|classic/forge/(forge/.*(?<!_test)\.py|poetry\.lock)$)
-        types: [file]
-        language: system
-        pass_filenames: false
-
-      - id: pyright
-        name: Typecheck - Classic - Forge
-        alias: pyright-classic-forge
-        entry: poetry -C classic/forge run pyright
-        files: ^classic/forge/(forge/|poetry\.lock$)
-        types: [file]
-        language: system
-        pass_filenames: false
-
-      - id: pyright
-        name: Typecheck - Classic - Benchmark
-        alias: pyright-classic-benchmark
-        entry: poetry -C classic/benchmark run pyright
-        files: ^classic/benchmark/(agbenchmark/|tests/|poetry\.lock$)
+        name: Typecheck - Classic
+        alias: pyright-classic
+        entry: poetry -C classic run pyright
+        files: ^classic/(original_autogpt|forge|direct_benchmark)/.*\.py$|^classic/poetry\.lock$
        types: [file]
        language: system
        pass_filenames: false
@@ -360,26 +283,9 @@ repos:
  #       pass_filenames: false

  #     - id: pytest
-  #       name: Run tests - Classic - AutoGPT (excl. slow tests)
-  #       alias: pytest-classic-autogpt
-  #       entry: bash -c 'cd classic/original_autogpt && poetry run pytest --cov=autogpt -m "not slow" tests/unit tests/integration'
-  #       # include forge source (since it's a path dependency) but exclude *_test.py files:
-  #       files: ^(classic/original_autogpt/((autogpt|tests)/|poetry\.lock$)|classic/forge/(forge/.*(?<!_test)\.py|poetry\.lock)$)
-  #       language: system
-  #       pass_filenames: false
-
-  #     - id: pytest
-  #       name: Run tests - Classic - Forge (excl. slow tests)
-  #       alias: pytest-classic-forge
-  #       entry: bash -c 'cd classic/forge && poetry run pytest --cov=forge -m "not slow"'
-  #       files: ^classic/forge/(forge/|tests/|poetry\.lock$)
-  #       language: system
-  #       pass_filenames: false
-
-  #     - id: pytest
-  #       name: Run tests - Classic - Benchmark
-  #       alias: pytest-classic-benchmark
-  #       entry: bash -c 'cd classic/benchmark && poetry run pytest --cov=benchmark'
-  #       files: ^classic/benchmark/(agbenchmark/|tests/|poetry\.lock$)
+  #       name: Run tests - Classic (excl. slow tests)
+  #       alias: pytest-classic
+  #       entry: bash -c 'cd classic && poetry run pytest -m "not slow"'
+  #       files: ^classic/(original_autogpt|forge|direct_benchmark)/
  #       language: system
  #       pass_filenames: false
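
Note on the consolidated Classic hooks above: pre-commit matches each hook's `files` pattern against repo-relative paths with a regex search, and the isort hook then strips the `classic/` prefix with `sed "s|classic/||g"` before running from inside `classic/`. The sketch below reproduces both steps in Python for illustration only. `strip_classic_prefix` and the sample paths are hypothetical and are not part of this change.

```python
import re

# `files` patterns copied from the consolidated hooks in .pre-commit-config.yaml.
PYRIGHT_FILES = re.compile(
    r"^classic/(original_autogpt|forge|direct_benchmark)/.*\.py$|^classic/poetry\.lock$"
)
LINT_FILES = re.compile(r"^classic/(original_autogpt|forge|direct_benchmark)/")


def strip_classic_prefix(paths):
    """Approximate `sed "s|classic/||g"`: drop every `classic/` occurrence."""
    return [p.replace("classic/", "") for p in paths]


if __name__ == "__main__":
    # Hypothetical staged files.
    staged = [
        "classic/forge/forge/agent/base.py",
        "classic/poetry.lock",
        "classic/benchmark/agbenchmark/main.py",
    ]
    for path in staged:
        print(path, "pyright:", bool(PYRIGHT_FILES.search(path)),
              "lint:", bool(LINT_FILES.search(path)))
    print(strip_classic_prefix(staged[:1]))
```

Because the alternation lists only `original_autogpt`, `forge`, and `direct_benchmark`, paths under the old `classic/benchmark/` directory no longer trigger these hooks.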
--- a/.secrets.baseline
+++ b/.secrets.baseline
@@ -0,0 +1,467 @@
+{
+  "version": "1.5.0",
+  "plugins_used": [
+    {
+      "name": "ArtifactoryDetector"
+    },
+    {
+      "name": "AWSKeyDetector"
+    },
+    {
+      "name": "AzureStorageKeyDetector"
+    },
+    {
+      "name": "Base64HighEntropyString",
+      "limit": 4.5
+    },
+    {
+      "name": "BasicAuthDetector"
+    },
+    {
+      "name": "CloudantDetector"
+    },
+    {
+      "name": "DiscordBotTokenDetector"
+    },
+    {
+      "name": "GitHubTokenDetector"
+    },
+    {
+      "name": "GitLabTokenDetector"
+    },
+    {
+      "name": "HexHighEntropyString",
+      "limit": 3.0
+    },
+    {
+      "name": "IbmCloudIamDetector"
+    },
+    {
+      "name": "IbmCosHmacDetector"
+    },
+    {
+      "name": "IPPublicDetector"
+    },
+    {
+      "name": "JwtTokenDetector"
+    },
+    {
+      "name": "KeywordDetector",
+      "keyword_exclude": ""
+    },
+    {
+      "name": "MailchimpDetector"
+    },
+    {
+      "name": "NpmDetector"
+    },
+    {
+      "name": "OpenAIDetector"
+    },
+    {
+      "name": "PrivateKeyDetector"
+    },
+    {
+      "name": "PypiTokenDetector"
+    },
+    {
+      "name": "SendGridDetector"
+    },
+    {
+      "name": "SlackDetector"
+    },
+    {
+      "name": "SoftlayerDetector"
+    },
+    {
+      "name": "SquareOAuthDetector"
+    },
+    {
+      "name": "StripeDetector"
+    },
+    {
+      "name": "TelegramBotTokenDetector"
+    },
+    {
+      "name": "TwilioKeyDetector"
+    }
+  ],
+  "filters_used": [
+    {
+      "path": "detect_secrets.filters.allowlist.is_line_allowlisted"
+    },
+    {
+      "path": "detect_secrets.filters.common.is_ignored_due_to_verification_policies",
+      "min_level": 2
+    },
+    {
+      "path": "detect_secrets.filters.heuristic.is_indirect_reference"
+    },
+    {
+      "path": "detect_secrets.filters.heuristic.is_likely_id_string"
+    },
+    {
+      "path": "detect_secrets.filters.heuristic.is_lock_file"
+    },
+    {
+      "path": "detect_secrets.filters.heuristic.is_not_alphanumeric_string"
+    },
+    {
+      "path": "detect_secrets.filters.heuristic.is_potential_uuid"
+    },
+    {
+      "path": "detect_secrets.filters.heuristic.is_prefixed_with_dollar_sign"
+    },
+    {
+      "path": "detect_secrets.filters.heuristic.is_sequential_string"
+    },
+    {
+      "path": "detect_secrets.filters.heuristic.is_swagger_file"
+    },
+    {
+      "path": "detect_secrets.filters.heuristic.is_templated_secret"
+    },
+    {
+      "path": "detect_secrets.filters.regex.should_exclude_file",
+      "pattern": [
+        "\\.env$",
+        "pnpm-lock\\.yaml$",
+        "\\.env\\.(default|example|template)$",
+        "__pycache__",
+        "_test\\.py$",
+        "test_.*\\.py$",
+        "conftest\\.py$",
+        "poetry\\.lock$",
+        "node_modules"
+      ]
+    }
+  ],
+  "results": {
+    "autogpt_platform/backend/backend/api/external/v1/integrations.py": [
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/backend/backend/api/external/v1/integrations.py",
+        "hashed_secret": "665b1e3851eefefa3fb878654292f16597d25155",
+        "is_verified": false,
+        "line_number": 289
+      }
+    ],
+    "autogpt_platform/backend/backend/blocks/airtable/_config.py": [
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/backend/backend/blocks/airtable/_config.py",
+        "hashed_secret": "57e168b03afb7c1ee3cdc4ee3db2fe1cc6e0df26",
+        "is_verified": false,
+        "line_number": 29
+      }
+    ],
+    "autogpt_platform/backend/backend/blocks/dataforseo/_config.py": [
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/backend/backend/blocks/dataforseo/_config.py",
+        "hashed_secret": "32ce93887331fa5d192f2876ea15ec000c7d58b8",
+        "is_verified": false,
+        "line_number": 12
+      }
+    ],
+    "autogpt_platform/backend/backend/blocks/github/checks.py": [
+      {
+        "type": "Hex High Entropy String",
+        "filename": "autogpt_platform/backend/backend/blocks/github/checks.py",
+        "hashed_secret": "8ac6f92737d8586790519c5d7bfb4d2eb172c238",
+        "is_verified": false,
+        "line_number": 108
+      }
+    ],
+    "autogpt_platform/backend/backend/blocks/github/ci.py": [
+      {
+        "type": "Hex High Entropy String",
+        "filename": "autogpt_platform/backend/backend/blocks/github/ci.py",
+        "hashed_secret": "90bd1b48e958257948487b90bee080ba5ed00caa",
+        "is_verified": false,
+        "line_number": 123
+      }
+    ],
+    "autogpt_platform/backend/backend/blocks/github/example_payloads/pull_request.synchronize.json": [
+      {
+        "type": "Hex High Entropy String",
+        "filename": "autogpt_platform/backend/backend/blocks/github/example_payloads/pull_request.synchronize.json",
+        "hashed_secret": "f96896dafced7387dcd22343b8ea29d3d2c65663",
+        "is_verified": false,
+        "line_number": 42
+      },
+      {
+        "type": "Hex High Entropy String",
+        "filename": "autogpt_platform/backend/backend/blocks/github/example_payloads/pull_request.synchronize.json",
+        "hashed_secret": "b80a94d5e70bedf4f5f89d2f5a5255cc9492d12e",
+        "is_verified": false,
+        "line_number": 193
+      },
+      {
+        "type": "Hex High Entropy String",
+        "filename": "autogpt_platform/backend/backend/blocks/github/example_payloads/pull_request.synchronize.json",
+        "hashed_secret": "75b17e517fe1b3136394f6bec80c4f892da75e42",
+        "is_verified": false,
+        "line_number": 344
+      },
+      {
+        "type": "Hex High Entropy String",
+        "filename": "autogpt_platform/backend/backend/blocks/github/example_payloads/pull_request.synchronize.json",
+        "hashed_secret": "b0bfb5e4e2394e7f8906e5ed1dffd88b2bc89dd5",
+        "is_verified": false,
+        "line_number": 534
+      }
+    ],
+    "autogpt_platform/backend/backend/blocks/github/statuses.py": [
+      {
+        "type": "Hex High Entropy String",
+        "filename": "autogpt_platform/backend/backend/blocks/github/statuses.py",
+        "hashed_secret": "8ac6f92737d8586790519c5d7bfb4d2eb172c238",
+        "is_verified": false,
+        "line_number": 85
+      }
+    ],
+    "autogpt_platform/backend/backend/blocks/google/docs.py": [
+      {
+        "type": "Hex High Entropy String",
+        "filename": "autogpt_platform/backend/backend/blocks/google/docs.py",
+        "hashed_secret": "c95da0c6696342c867ef0c8258d2f74d20fd94d4",
+        "is_verified": false,
+        "line_number": 203
+      }
+    ],
+    "autogpt_platform/backend/backend/blocks/google/sheets.py": [
+      {
+        "type": "Base64 High Entropy String",
+        "filename": "autogpt_platform/backend/backend/blocks/google/sheets.py",
+        "hashed_secret": "bd5a04fa3667e693edc13239b6d310c5c7a8564b",
+        "is_verified": false,
+        "line_number": 57
+      }
+    ],
+    "autogpt_platform/backend/backend/blocks/linear/_config.py": [
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/backend/backend/blocks/linear/_config.py",
+        "hashed_secret": "b37f020f42d6d613b6ce30103e4d408c4499b3bb",
+        "is_verified": false,
+        "line_number": 53
+      }
+    ],
+    "autogpt_platform/backend/backend/blocks/medium.py": [
+      {
+        "type": "Hex High Entropy String",
+        "filename": "autogpt_platform/backend/backend/blocks/medium.py",
+        "hashed_secret": "ff998abc1ce6d8f01a675fa197368e44c8916e9c",
+        "is_verified": false,
+        "line_number": 131
+      }
+    ],
+    "autogpt_platform/backend/backend/blocks/replicate/replicate_block.py": [
+      {
+        "type": "Hex High Entropy String",
+        "filename": "autogpt_platform/backend/backend/blocks/replicate/replicate_block.py",
+        "hashed_secret": "8bbdd6f26368f58ea4011d13d7f763cb662e66f0",
+        "is_verified": false,
+        "line_number": 55
+      }
+    ],
+    "autogpt_platform/backend/backend/blocks/slant3d/webhook.py": [
+      {
+        "type": "Hex High Entropy String",
+        "filename": "autogpt_platform/backend/backend/blocks/slant3d/webhook.py",
+        "hashed_secret": "36263c76947443b2f6e6b78153967ac4a7da99f9",
+        "is_verified": false,
+        "line_number": 100
+      }
+    ],
+    "autogpt_platform/backend/backend/blocks/talking_head.py": [
+      {
+        "type": "Base64 High Entropy String",
+        "filename": "autogpt_platform/backend/backend/blocks/talking_head.py",
+        "hashed_secret": "44ce2d66222529eea4a32932823466fc0601c799",
+        "is_verified": false,
+        "line_number": 113
+      }
+    ],
+    "autogpt_platform/backend/backend/blocks/wordpress/_config.py": [
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/backend/backend/blocks/wordpress/_config.py",
+        "hashed_secret": "e62679512436161b78e8a8d68c8829c2a1031ccb",
+        "is_verified": false,
+        "line_number": 17
+      }
+    ],
+    "autogpt_platform/backend/backend/util/cache.py": [
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/backend/backend/util/cache.py",
+        "hashed_secret": "37f0c918c3fa47ca4a70e42037f9f123fdfbc75b",
+        "is_verified": false,
+        "line_number": 449
+      }
+    ],
+    "autogpt_platform/frontend/src/app/(platform)/build/components/FlowEditor/nodes/helpers.ts": [
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/frontend/src/app/(platform)/build/components/FlowEditor/nodes/helpers.ts",
+        "hashed_secret": "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8",
+        "is_verified": false,
+        "line_number": 6
+      }
+    ],
+    "autogpt_platform/frontend/src/app/(platform)/dictionaries/en.json": [
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/frontend/src/app/(platform)/dictionaries/en.json",
+        "hashed_secret": "8be3c943b1609fffbfc51aad666d0a04adf83c9d",
+        "is_verified": false,
+        "line_number": 5
+      }
+    ],
+    "autogpt_platform/frontend/src/app/(platform)/dictionaries/es.json": [
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/frontend/src/app/(platform)/dictionaries/es.json",
+        "hashed_secret": "5a6d1c612954979ea99ee33dbb2d231b00f6ac0a",
+        "is_verified": false,
+        "line_number": 5
+      }
+    ],
+    "autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/AgentInputsReadOnly/helpers.ts": [
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/AgentInputsReadOnly/helpers.ts",
+        "hashed_secret": "cf678cab87dc1f7d1b95b964f15375e088461679",
+        "is_verified": false,
+        "line_number": 6
+      },
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/AgentInputsReadOnly/helpers.ts",
+        "hashed_secret": "f72cbb45464d487064610c5411c576ca4019d380",
+        "is_verified": false,
+        "line_number": 8
+      }
+    ],
+    "autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/RunAgentModal/components/ModalRunSection/helpers.ts": [
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/RunAgentModal/components/ModalRunSection/helpers.ts",
+        "hashed_secret": "cf678cab87dc1f7d1b95b964f15375e088461679",
+        "is_verified": false,
+        "line_number": 5
+      },
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/RunAgentModal/components/ModalRunSection/helpers.ts",
+        "hashed_secret": "f72cbb45464d487064610c5411c576ca4019d380",
+        "is_verified": false,
+        "line_number": 7
+      }
+    ],
+    "autogpt_platform/frontend/src/app/(platform)/profile/(user)/integrations/page.tsx": [
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/frontend/src/app/(platform)/profile/(user)/integrations/page.tsx",
+        "hashed_secret": "cf678cab87dc1f7d1b95b964f15375e088461679",
+        "is_verified": false,
+        "line_number": 192
+      },
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/frontend/src/app/(platform)/profile/(user)/integrations/page.tsx",
+        "hashed_secret": "86275db852204937bbdbdebe5fabe8536e030ab6",
+        "is_verified": false,
+        "line_number": 193
+      }
+    ],
+    "autogpt_platform/frontend/src/components/contextual/CredentialsInput/helpers.ts": [
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/frontend/src/components/contextual/CredentialsInput/helpers.ts",
+        "hashed_secret": "47acd2028cf81b5da88ddeedb2aea4eca4b71fbd",
+        "is_verified": false,
+        "line_number": 102
+      },
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/frontend/src/components/contextual/CredentialsInput/helpers.ts",
+        "hashed_secret": "8be3c943b1609fffbfc51aad666d0a04adf83c9d",
+        "is_verified": false,
+        "line_number": 103
+      }
+    ],
+    "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts": [
+      {
+        "type": "Base64 High Entropy String",
+        "filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
+        "hashed_secret": "9c486c92f1a7420e1045c7ad963fbb7ba3621025",
+        "is_verified": false,
+        "line_number": 73
+      },
+      {
+        "type": "Base64 High Entropy String",
+        "filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
+        "hashed_secret": "9277508c7a6effc8fb59163efbfada189e35425c",
+        "is_verified": false,
+        "line_number": 75
+      },
+      {
+        "type": "Base64 High Entropy String",
+        "filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
+        "hashed_secret": "8dc7e2cb1d0935897d541bf5facab389b8a50340",
+        "is_verified": false,
+        "line_number": 77
+      },
+      {
+        "type": "Base64 High Entropy String",
+        "filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
+        "hashed_secret": "79a26ad48775944299be6aaf9fb1d5302c1ed75b",
+        "is_verified": false,
+        "line_number": 79
+      },
+      {
+        "type": "Base64 High Entropy String",
+        "filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
+        "hashed_secret": "a3b62b44500a1612e48d4cab8294df81561b3b1a",
+        "is_verified": false,
+        "line_number": 81
+      },
+      {
+        "type": "Base64 High Entropy String",
+        "filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
+        "hashed_secret": "a58979bd0b21ef4f50417d001008e60dd7a85c64",
+        "is_verified": false,
+        "line_number": 83
+      },
+      {
+        "type": "Base64 High Entropy String",
+        "filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts",
+        "hashed_secret": "6cb6e075f8e8c7c850f9d128d6608e5dbe209a79",
+        "is_verified": false,
+        "line_number": 85
+      }
+    ],
+    "autogpt_platform/frontend/src/lib/constants.ts": [
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/frontend/src/lib/constants.ts",
+        "hashed_secret": "27b924db06a28cc755fb07c54f0fddc30659fe4d",
+        "is_verified": false,
+        "line_number": 10
+      }
+    ],
+    "autogpt_platform/frontend/src/tests/credentials/index.ts": [
+      {
+        "type": "Secret Keyword",
+        "filename": "autogpt_platform/frontend/src/tests/credentials/index.ts",
+        "hashed_secret": "c18006fc138809314751cd1991f1e0b820fabd37",
+        "is_verified": false,
+        "line_number": 4
+      }
+    ]
+  },
+  "generated_at": "2026-04-02T13:10:54Z"
+}
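
Reviewer note on the baseline above: a minimal sketch of how the `should_exclude_file` patterns partition repository paths. It assumes each pattern is applied as an unanchored regular-expression search against the file path, which is an assumption about detect-secrets' behavior that this diff does not confirm. `is_excluded` is a hypothetical helper written for illustration and is not part of detect-secrets.

```python
import re

# Patterns copied from the `should_exclude_file` filter in the baseline,
# with the JSON string escaping removed.
EXCLUDE_PATTERNS = [
    r"\.env$",
    r"pnpm-lock\.yaml$",
    r"\.env\.(default|example|template)$",
    r"__pycache__",
    r"_test\.py$",
    r"test_.*\.py$",
    r"conftest\.py$",
    r"poetry\.lock$",
    r"node_modules",
]

_COMPILED = [re.compile(pattern) for pattern in EXCLUDE_PATTERNS]


def is_excluded(path: str) -> bool:
    """Hypothetical helper: True if any exclusion pattern occurs in the path.

    Assumption: unanchored search semantics (re.search), so a pattern such as
    `node_modules` matches anywhere in the path, not only at the start.
    """
    return any(regex.search(path) for regex in _COMPILED)


if __name__ == "__main__":
    sample_paths = [
        "autogpt_platform/backend/.env",
        "autogpt_platform/backend/.env.default",
        "autogpt_platform/autogpt_libs/poetry.lock",
        "autogpt_platform/backend/backend/util/cache.py",
    ]
    for path in sample_paths:
        print(f"{path}: excluded={is_excluded(path)}")
```

Under that assumption, paths listed in `results` (for example `autogpt_platform/backend/backend/util/cache.py`) pass through the filter and are scanned, while `.env` files, lockfiles, and Python test files are skipped.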
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -1,6 +1,6 @@
 # AutoGPT Platform Contribution Guide

-This guide provides context for Codex when updating the **autogpt_platform** folder.
+This guide provides context for coding agents when updating the **autogpt_platform** folder.

 ## Directory overview

@@ -30,7 +30,7 @@ See `/frontend/CONTRIBUTING.md` for complete patterns. Quick reference:
   - Regenerate with `pnpm generate:api`
   - Pattern: `use{Method}{Version}{OperationName}`
 4. **Styling**: Tailwind CSS only, use design tokens, Phosphor Icons only
-5. **Testing**: Add Storybook stories for new components, Playwright for E2E
+5. **Testing**: Integration tests (Vitest + RTL + MSW) are the default (~90%, page-level). Playwright for E2E critical flows. Storybook for design system components. See `autogpt_platform/frontend/TESTING.md`
 6. **Code conventions**: Function declarations (not arrow functions) for components/handlers

 - Component props should be `interface Props { ... }` (not exported) unless the interface needs to be used outside the component
@@ -47,7 +47,9 @@ See `/frontend/CONTRIBUTING.md` for complete patterns. Quick reference:
 ## Testing

 - Backend: `poetry run test` (runs pytest with a docker based postgres + prisma).
-- Frontend: `pnpm test` or `pnpm test-ui` for Playwright tests. See `docs/content/platform/contributing/tests.md` for tips.
+- Frontend integration tests: `pnpm test:unit` (Vitest + RTL + MSW, primary testing approach).
+- Frontend E2E tests: `pnpm test` or `pnpm test-ui` for Playwright tests.
+- See `autogpt_platform/frontend/TESTING.md` for the full testing strategy.

 Always run the relevant linters and tests before committing.
 Use conventional commit messages for all commits (e.g. `feat(backend): add API`).
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -0,0 +1 @@
+@AGENTS.md
--- a/README.md
+++ b/README.md
@@ -83,13 +83,13 @@ The AutoGPT frontend is where users interact with our powerful AI automation pla

   **Agent Builder:** For those who want to customize, our intuitive, low-code interface allows you to design and configure your own AI agents. 
   
-   **Workflow Management:** Build, modify, and optimize your automation workflows with ease. You build your agent by connecting blocks, where each block     performs a single action.
+   **Workflow Management:** Build, modify, and optimize your automation workflows with ease. You build your agent by connecting blocks, where each block performs a single action.
   
   **Deployment Controls:** Manage the lifecycle of your agents, from testing to production.
   
   **Ready-to-Use Agents:** Don't want to build? Simply select from our library of pre-configured agents and put them to work immediately.
   
-   **Agent Interaction:** Whether you've built your own or are using pre-configured agents, easily run and interact with them through our user-friendly      interface.
+   **Agent Interaction:** Whether you've built your own or are using pre-configured agents, easily run and interact with them through our user-friendly interface.

   **Monitoring and Analytics:** Keep track of your agents' performance and gain insights to continually improve your automation processes.

--- a/autogpt_platform/AGENTS.md
+++ b/autogpt_platform/AGENTS.md
@@ -0,0 +1,120 @@
+# AutoGPT Platform
+
+This file provides guidance to coding agents when working with code in this repository.
+
+## Repository Overview
+
+AutoGPT Platform is a monorepo containing:
+
+- **Backend** (`backend`): Python FastAPI server with async support
+- **Frontend** (`frontend`): Next.js React application
+- **Shared Libraries** (`autogpt_libs`): Common Python utilities
+
+## Component Documentation
+
+- **Backend**: See @backend/AGENTS.md for backend-specific commands, architecture, and development tasks
+- **Frontend**: See @frontend/AGENTS.md for frontend-specific commands, architecture, and development patterns
+
+## Key Concepts
+
+1. **Agent Graphs**: Workflow definitions stored as JSON, executed by the backend
+2. **Blocks**: Reusable components in `backend/backend/blocks/` that perform specific tasks
+3. **Integrations**: OAuth and API connections stored per user
+4. **Store**: Marketplace for sharing agent templates
+5. **Virus Scanning**: ClamAV integration for file upload security
+
+### Environment Configuration
+
+#### Configuration Files
+
+- **Backend**: `backend/.env.default` (defaults) → `backend/.env` (user overrides)
+- **Frontend**: `frontend/.env.default` (defaults) → `frontend/.env` (user overrides)
+- **Platform**: `.env.default` (Supabase/shared defaults) → `.env` (user overrides)
+
+#### Docker Environment Loading Order
+
+1. `.env.default` files provide base configuration (tracked in git)
+2. `.env` files provide user-specific overrides (gitignored)
+3. Docker Compose `environment:` sections provide service-specific overrides
+4. Shell environment variables have highest precedence
+
+#### Key Points
+
+- All services use hardcoded defaults in docker-compose files (no `${VARIABLE}` substitutions)
+- The `env_file` directive loads variables INTO containers at runtime
+- Backend/Frontend services use YAML anchors for consistent configuration
+- Supabase services (`db/docker/docker-compose.yml`) follow the same pattern
+
+### Branching Strategy
+
+- **`dev`** is the main development branch. All PRs should target `dev`.
+- **`master`** is the production branch, used only for production releases.
+
+### Creating Pull Requests
+
+- Create the PR against the `dev` branch of the repository.
+- **Split PRs by concern** — each PR should have a single clear purpose. For example, "usage tracking" and "credit charging" should be separate PRs even if related. Combining multiple concerns makes it harder for reviewers to understand what belongs to what.
+- Ensure the branch name is descriptive (e.g., `feature/add-new-block`)
+- Use conventional commit messages (see below)
+- **Structure the PR description with Why / What / How** — Why: the motivation (what problem it solves, what's broken/missing without it); What: high-level summary of changes; How: approach, key implementation details, or architecture decisions. Reviewers need all three to judge whether the approach fits the problem.
+- Fill out the `.github/PULL_REQUEST_TEMPLATE.md` template as the PR description
+- Always use `--body-file` to pass PR body — avoids shell interpretation of backticks and special characters:
+  ```bash
+  PR_BODY=$(mktemp)
+  cat > "$PR_BODY" << 'PREOF'
+  ## Summary
+  - use `backticks` freely here
+  PREOF
+  gh pr create --title "..." --body-file "$PR_BODY" --base dev
+  rm "$PR_BODY"
+  ```
+- Run the GitHub pre-commit hooks to ensure code quality.
+
+### Test-Driven Development (TDD)
+
+When fixing a bug or adding a feature, follow a test-first approach:
+
+1. **Write a failing test first** — create a test that reproduces the bug or validates the new behavior, marked with `@pytest.mark.xfail` (backend) or `.fixme` (Playwright). Run it to confirm it fails for the right reason.
+2. **Implement the fix/feature** — write the minimal code to make the test pass.
+3. **Remove the xfail marker** — once the test passes, remove the `xfail`/`.fixme` annotation and run the full test suite to confirm nothing else broke.
+
+This ensures every change is covered by a test and that the test actually validates the intended behavior.
+
+### Reviewing/Revising Pull Requests
+
+Use `/pr-review` to review a PR or `/pr-address` to address comments.
+
+When fetching comments manually:
+- `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews --paginate` — top-level reviews
+- `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments --paginate` — inline review comments (always paginate to avoid missing comments beyond page 1)
+- `gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments` — PR conversation comments
+
+### Conventional Commits
+
+Use the `type(scope): description` format (e.g. `feat(backend): add API`) for commit messages and Pull Request titles:
+
+**Conventional Commit Types:**
+
+- `feat`: Introduces a new feature to the codebase
+- `fix`: Patches a bug in the codebase
+- `refactor`: Code change that neither fixes a bug nor adds a feature; also applies to removing features
+- `ci`: Changes to CI configuration
+- `docs`: Documentation-only changes
+- `dx`: Improvements to the developer experience
+
+**Recommended Base Scopes:**
+
+- `platform`: Changes affecting both frontend and backend
+- `frontend`
+- `backend`
+- `infra`
+- `blocks`: Modifications/additions of individual blocks
+
+**Subscope Examples:**
+
+- `backend/executor`
+- `backend/db`
+- `frontend/builder` (includes changes to the block UI component)
+- `infra/prod`
+
+Use these scopes and subscopes for clarity and consistency in commit messages.
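
Reviewer note on the Conventional Commits section above: a minimal sketch of a title check built from the types and base scopes listed in `autogpt_platform/AGENTS.md`. The `type(scope): description` shape is taken from the `feat(backend): add API` example in the root `AGENTS.md`. Treating the scope as required, and the helper name `check_title`, are assumptions made for illustration, not an existing repository tool. Because the scopes are described as recommended, an unlisted scope is reported as advisory wording rather than as a format error.

```python
import re

# Types and base scopes as listed in autogpt_platform/AGENTS.md.
TYPES = ("feat", "fix", "refactor", "ci", "docs", "dx")
BASE_SCOPES = ("platform", "frontend", "backend", "infra", "blocks")

# Assumed shape: type(scope): description, where scope may carry one
# "/subscope" segment such as backend/executor.
_TITLE_RE = re.compile(
    r"^(?P<type>[a-z]+)"
    r"\((?P<scope>[a-z0-9_-]+)(?:/(?P<subscope>[a-z0-9_-]+))?\)"
    r": (?P<description>\S.*)$"
)


def check_title(title: str) -> list[str]:
    """Hypothetical helper: return a list of problems found in a title."""
    match = _TITLE_RE.match(title)
    if match is None:
        return ["title does not look like 'type(scope): description'"]

    problems = []
    if match["type"] not in TYPES:
        problems.append(f"unknown type '{match['type']}'")
    if match["scope"] not in BASE_SCOPES:
        # Advisory only: the guide calls these scopes "recommended".
        problems.append(f"scope '{match['scope']}' is not a recommended base scope")
    return problems


if __name__ == "__main__":
    for title in (
        "feat(backend): add API",
        "fix(backend/executor): handle timeout",
        "chore(backend): bump deps",
        "add API",
    ):
        print(title, "->", check_title(title) or "ok")
```

A check like this could run in a commit-msg hook or a CI step on the pull request title; where it should run is left open here.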
--- a/autogpt_platform/CLAUDE.md
+++ b/autogpt_platform/CLAUDE.md
@@ -1,98 +1 @@
-# CLAUDE.md
-
-This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
-
-## Repository Overview
-
-AutoGPT Platform is a monorepo containing:
-
-- **Backend** (`backend`): Python FastAPI server with async support
-- **Frontend** (`frontend`): Next.js React application
-- **Shared Libraries** (`autogpt_libs`): Common Python utilities
-
-## Component Documentation
-
-- **Backend**: See @backend/CLAUDE.md for backend-specific commands, architecture, and development tasks
-- **Frontend**: See @frontend/CLAUDE.md for frontend-specific commands, architecture, and development patterns
-
-## Key Concepts
-
-1. **Agent Graphs**: Workflow definitions stored as JSON, executed by the backend
-2. **Blocks**: Reusable components in `backend/backend/blocks/` that perform specific tasks
-3. **Integrations**: OAuth and API connections stored per user
-4. **Store**: Marketplace for sharing agent templates
-5. **Virus Scanning**: ClamAV integration for file upload security
-
-### Environment Configuration
-
-#### Configuration Files
-
-- **Backend**: `backend/.env.default` (defaults) → `backend/.env` (user overrides)
-- **Frontend**: `frontend/.env.default` (defaults) → `frontend/.env` (user overrides)
-- **Platform**: `.env.default` (Supabase/shared defaults) → `.env` (user overrides)
-
-#### Docker Environment Loading Order
-
-1. `.env.default` files provide base configuration (tracked in git)
-2. `.env` files provide user-specific overrides (gitignored)
-3. Docker Compose `environment:` sections provide service-specific overrides
-4. Shell environment variables have highest precedence
-
-#### Key Points
-
-- All services use hardcoded defaults in docker-compose files (no `${VARIABLE}` substitutions)
-- The `env_file` directive loads variables INTO containers at runtime
-- Backend/Frontend services use YAML anchors for consistent configuration
-- Supabase services (`db/docker/docker-compose.yml`) follow the same pattern
-
-### Branching Strategy
-
-- **`dev`** is the main development branch. All PRs should target `dev`.
-- **`master`** is the production branch. Only used for production releases.
-
-### Creating Pull Requests
-
-- Create the PR against the `dev` branch of the repository.
-- Ensure the branch name is descriptive (e.g., `feature/add-new-block`)
-- Use conventional commit messages (see below)
-- Fill out the .github/PULL_REQUEST_TEMPLATE.md template as the PR description
-- Run the github pre-commit hooks to ensure code quality.
-
-### Reviewing/Revising Pull Requests
-
-Use `/pr-review` to review a PR or `/pr-address` to address comments.
-
-When fetching comments manually:
-- `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews` — top-level reviews
-- `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments` — inline review comments
-- `gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments` — PR conversation comments
-
-### Conventional Commits
-
-Use this format for commit messages and Pull Request titles:
-
-**Conventional Commit Types:**
-
-- `feat`: Introduces a new feature to the codebase
-- `fix`: Patches a bug in the codebase
-- `refactor`: Code change that neither fixes a bug nor adds a feature; also applies to removing features
-- `ci`: Changes to CI configuration
-- `docs`: Documentation-only changes
-- `dx`: Improvements to the developer experience
-
-**Recommended Base Scopes:**
-
-- `platform`: Changes affecting both frontend and backend
-- `frontend`
-- `backend`
-- `infra`
-- `blocks`: Modifications/additions of individual blocks
-
-**Subscope Examples:**
-
-- `backend/executor`
-- `backend/db`
-- `frontend/builder` (includes changes to the block UI component)
-- `infra/prod`
-
-Use these scopes and subscopes for clarity and consistency in commit messages.
+@AGENTS.md
--- a/autogpt_platform/autogpt_libs/poetry.lock
+++ b/autogpt_platform/autogpt_libs/poetry.lock
@@ -1,4 +1,4 @@
-# This file is automatically @generated by Poetry 2.1.1 and should not be changed by hand.
+# This file is automatically @generated by Poetry 2.2.1 and should not be changed by hand.

 [[package]]
 name = "annotated-doc"
@@ -67,7 +67,7 @@ description = "Backport of asyncio.Runner, a context manager that controls event
 optional = false
 python-versions = "<3.11,>=3.8"
 groups = ["dev"]
-markers = "python_version < \"3.11\""
+markers = "python_version == \"3.10\""
 files = [
    {file = "backports_asyncio_runner-1.2.0-py3-none-any.whl", hash = "sha256:0da0a936a8aeb554eccb426dc55af3ba63bcdc69fa1a600b5bb305413a4477b5"},
    {file = "backports_asyncio_runner-1.2.0.tar.gz", hash = "sha256:a5aa7b2b7d8f8bfcaa2b57313f70792df84e32a2a746f585213373f900b42162"},
@@ -541,7 +541,7 @@ description = "Backport of PEP 654 (exception groups)"
 optional = false
 python-versions = ">=3.7"
 groups = ["main", "dev"]
-markers = "python_version < \"3.11\""
+markers = "python_version == \"3.10\""
 files = [
    {file = "exceptiongroup-1.3.0-py3-none-any.whl", hash = "sha256:4d111e6e0c13d0644cad6ddaa7ed0261a0b36971f6d23e7ec9b4b9097da78a10"},
    {file = "exceptiongroup-1.3.0.tar.gz", hash = "sha256:b241f5885f560bc56a59ee63ca4c6a8bfa46ae4ad651af316d4e81817bb9fd88"},
@@ -2181,14 +2181,14 @@ testing = ["coverage (>=6.2)", "hypothesis (>=5.7.1)"]

 [[package]]
 name = "pytest-cov"
-version = "7.0.0"
+version = "7.1.0"
 description = "Pytest plugin for measuring coverage."
 optional = false
 python-versions = ">=3.9"
 groups = ["dev"]
 files = [
-    {file = "pytest_cov-7.0.0-py3-none-any.whl", hash = "sha256:3b8e9558b16cc1479da72058bdecf8073661c7f57f7d3c5f22a1c23507f2d861"},
-    {file = "pytest_cov-7.0.0.tar.gz", hash = "sha256:33c97eda2e049a0c5298e91f519302a1334c26ac65c1a483d6206fd458361af1"},
+    {file = "pytest_cov-7.1.0-py3-none-any.whl", hash = "sha256:a0461110b7865f9a271aa1b51e516c9a95de9d696734a2f71e3e78f46e1d4678"},
+    {file = "pytest_cov-7.1.0.tar.gz", hash = "sha256:30674f2b5f6351aa09702a9c8c364f6a01c27aae0c1366ae8016160d1efc56b2"},
 ]

 [package.dependencies]
@@ -2342,30 +2342,30 @@ pyasn1 = ">=0.1.3"

 [[package]]
 name = "ruff"
-version = "0.15.0"
+version = "0.15.7"
 description = "An extremely fast Python linter and code formatter, written in Rust."
 optional = false
 python-versions = ">=3.7"
 groups = ["dev"]
 files = [
-    {file = "ruff-0.15.0-py3-none-linux_armv6l.whl", hash = "sha256:aac4ebaa612a82b23d45964586f24ae9bc23ca101919f5590bdb368d74ad5455"},
-    {file = "ruff-0.15.0-py3-none-macosx_10_12_x86_64.whl", hash = "sha256:dcd4be7cc75cfbbca24a98d04d0b9b36a270d0833241f776b788d59f4142b14d"},
-    {file = "ruff-0.15.0-py3-none-macosx_11_0_arm64.whl", hash = "sha256:d747e3319b2bce179c7c1eaad3d884dc0a199b5f4d5187620530adf9105268ce"},
-    {file = "ruff-0.15.0-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:650bd9c56ae03102c51a5e4b554d74d825ff3abe4db22b90fd32d816c2e90621"},
-    {file = "ruff-0.15.0-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:a6664b7eac559e3048223a2da77769c2f92b43a6dfd4720cef42654299a599c9"},
-    {file = "ruff-0.15.0-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:6f811f97b0f092b35320d1556f3353bf238763420ade5d9e62ebd2b73f2ff179"},
-    {file = "ruff-0.15.0-py3-none-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:761ec0a66680fab6454236635a39abaf14198818c8cdf691e036f4bc0f406b2d"},
-    {file = "ruff-0.15.0-py3-none-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:940f11c2604d317e797b289f4f9f3fa5555ffe4fb574b55ed006c3d9b6f0eb78"},
-    {file = "ruff-0.15.0-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:bcbca3d40558789126da91d7ef9a7c87772ee107033db7191edefa34e2c7f1b4"},
-    {file = "ruff-0.15.0-py3-none-manylinux_2_31_riscv64.whl", hash = "sha256:9a121a96db1d75fa3eb39c4539e607f628920dd72ff1f7c5ee4f1b768ac62d6e"},
-    {file = "ruff-0.15.0-py3-none-musllinux_1_2_aarch64.whl", hash = "sha256:5298d518e493061f2eabd4abd067c7e4fb89e2f63291c94332e35631c07c3662"},
-    {file = "ruff-0.15.0-py3-none-musllinux_1_2_armv7l.whl", hash = "sha256:afb6e603d6375ff0d6b0cee563fa21ab570fd15e65c852cb24922cef25050cf1"},
-    {file = "ruff-0.15.0-py3-none-musllinux_1_2_i686.whl", hash = "sha256:77e515f6b15f828b94dc17d2b4ace334c9ddb7d9468c54b2f9ed2b9c1593ef16"},
-    {file = "ruff-0.15.0-py3-none-musllinux_1_2_x86_64.whl", hash = "sha256:6f6e80850a01eb13b3e42ee0ebdf6e4497151b48c35051aab51c101266d187a3"},
-    {file = "ruff-0.15.0-py3-none-win32.whl", hash = "sha256:238a717ef803e501b6d51e0bdd0d2c6e8513fe9eec14002445134d3907cd46c3"},
-    {file = "ruff-0.15.0-py3-none-win_amd64.whl", hash = "sha256:dd5e4d3301dc01de614da3cdffc33d4b1b96fb89e45721f1598e5532ccf78b18"},
-    {file = "ruff-0.15.0-py3-none-win_arm64.whl", hash = "sha256:c480d632cc0ca3f0727acac8b7d053542d9e114a462a145d0b00e7cd658c515a"},
-    {file = "ruff-0.15.0.tar.gz", hash = "sha256:6bdea47cdbea30d40f8f8d7d69c0854ba7c15420ec75a26f463290949d7f7e9a"},
+    {file = "ruff-0.15.7-py3-none-linux_armv6l.whl", hash = "sha256:a81cc5b6910fb7dfc7c32d20652e50fa05963f6e13ead3c5915c41ac5d16668e"},
+    {file = "ruff-0.15.7-py3-none-macosx_10_12_x86_64.whl", hash = "sha256:722d165bd52403f3bdabc0ce9e41fc47070ac56d7a91b4e0d097b516a53a3477"},
+    {file = "ruff-0.15.7-py3-none-macosx_11_0_arm64.whl", hash = "sha256:7fbc2448094262552146cbe1b9643a92f66559d3761f1ad0656d4991491af49e"},
+    {file = "ruff-0.15.7-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:6b39329b60eba44156d138275323cc726bbfbddcec3063da57caa8a8b1d50adf"},
+    {file = "ruff-0.15.7-py3-none-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:87768c151808505f2bfc93ae44e5f9e7c8518943e5074f76ac21558ef5627c85"},
+    {file = "ruff-0.15.7-py3-none-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:fb0511670002c6c529ec66c0e30641c976c8963de26a113f3a30456b702468b0"},
+    {file = "ruff-0.15.7-py3-none-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:e0d19644f801849229db8345180a71bee5407b429dd217f853ec515e968a6912"},
+    {file = "ruff-0.15.7-py3-none-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:4806d8e09ef5e84eb19ba833d0442f7e300b23fe3f0981cae159a248a10f0036"},
+    {file = "ruff-0.15.7-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:dce0896488562f09a27b9c91b1f58a097457143931f3c4d519690dea54e624c5"},
+    {file = "ruff-0.15.7-py3-none-manylinux_2_31_riscv64.whl", hash = "sha256:1852ce241d2bc89e5dc823e03cff4ce73d816b5c6cdadd27dbfe7b03217d2a12"},
+    {file = "ruff-0.15.7-py3-none-musllinux_1_2_aarch64.whl", hash = "sha256:5f3e4b221fb4bd293f79912fc5e93a9063ebd6d0dcbd528f91b89172a9b8436c"},
+    {file = "ruff-0.15.7-py3-none-musllinux_1_2_armv7l.whl", hash = "sha256:b15e48602c9c1d9bdc504b472e90b90c97dc7d46c7028011ae67f3861ceba7b4"},
+    {file = "ruff-0.15.7-py3-none-musllinux_1_2_i686.whl", hash = "sha256:1b4705e0e85cedc74b0a23cf6a179dbb3df184cb227761979cc76c0440b5ab0d"},
+    {file = "ruff-0.15.7-py3-none-musllinux_1_2_x86_64.whl", hash = "sha256:112c1fa316a558bb34319282c1200a8bf0495f1b735aeb78bfcb2991e6087580"},
+    {file = "ruff-0.15.7-py3-none-win32.whl", hash = "sha256:6d39e2d3505b082323352f733599f28169d12e891f7dd407f2d4f54b4c2886de"},
+    {file = "ruff-0.15.7-py3-none-win_amd64.whl", hash = "sha256:4d53d712ddebcd7dace1bc395367aec12c057aacfe9adbb6d832302575f4d3a1"},
+    {file = "ruff-0.15.7-py3-none-win_arm64.whl", hash = "sha256:18e8d73f1c3fdf27931497972250340f92e8c861722161a9caeb89a58ead6ed2"},
+    {file = "ruff-0.15.7.tar.gz", hash = "sha256:04f1ae61fc20fe0b148617c324d9d009b5f63412c0b16474f3d5f1a1a665f7ac"},
 ]

 [[package]]
@@ -2564,7 +2564,7 @@ description = "A lil' TOML parser"
 optional = false
 python-versions = ">=3.8"
 groups = ["dev"]
-markers = "python_version < \"3.11\""
+markers = "python_version == \"3.10\""
 files = [
    {file = "tomli-2.2.1-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:678e4fa69e4575eb77d103de3df8a895e1591b48e740211bd1067378c69e8249"},
    {file = "tomli-2.2.1-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:023aa114dd824ade0100497eb2318602af309e5a55595f76b626d6d9f3b7b0a6"},
@@ -2912,4 +2912,4 @@ type = ["pytest-mypy"]
 [metadata]
 lock-version = "2.1"
 python-versions = ">=3.10,<4.0"
-content-hash = "9619cae908ad38fa2c48016a58bcf4241f6f5793aa0e6cc140276e91c433cbbb"
+content-hash = "e0936a065565550afed18f6298b7e04e814b44100def7049f1a0d68662624a39"
--- a/autogpt_platform/autogpt_libs/pyproject.toml
+++ b/autogpt_platform/autogpt_libs/pyproject.toml
@@ -26,8 +26,8 @@ pyright = "^1.1.408"
 pytest = "^8.4.1"
 pytest-asyncio = "^1.3.0"
 pytest-mock = "^3.15.1"
-pytest-cov = "^7.0.0"
-ruff = "^0.15.0"
+pytest-cov = "^7.1.0"
+ruff = "^0.15.7"

 [build-system]
 requires = ["poetry-core"]
--- a/autogpt_platform/backend/.env.default
+++ b/autogpt_platform/backend/.env.default
@@ -37,10 +37,6 @@ JWT_VERIFY_KEY=your-super-secret-jwt-token-with-at-least-32-characters-long
 ENCRYPTION_KEY=dvziYgz0KSK8FENhju0ZYi8-fRTfAdlz6YLhdB_jhNw=
 UNSUBSCRIBE_SECRET_KEY=HlP8ivStJjmbf6NKi78m_3FnOogut0t5ckzjsIqeaio=

-## ===== SIGNUP / INVITE GATE ===== ##
-# Set to true to require an invite before users can sign up
-ENABLE_INVITE_GATE=false
-
 ## ===== IMPORTANT OPTIONAL CONFIGURATION ===== ##
 # Platform URLs (set these for webhooks and OAuth to work)
 PLATFORM_BASE_URL=http://localhost:8000
@@ -182,6 +178,7 @@ SMTP_USERNAME=
 SMTP_PASSWORD=

 # Business & Marketing Tools
+AGENTMAIL_API_KEY=
 APOLLO_API_KEY=
 ENRICHLAYER_API_KEY=
 AYRSHARE_API_KEY=
--- a/autogpt_platform/backend/AGENTS.md
+++ b/autogpt_platform/backend/AGENTS.md
@@ -0,0 +1,227 @@
+# Backend
+
+This file provides guidance to coding agents when working with the backend.
+
+## Essential Commands
+
+To run something with Python package dependencies you MUST use `poetry run ...`.
+
+```bash
+# Install dependencies
+poetry install
+
+# Run database migrations
+poetry run prisma migrate dev
+
+# Start all services (database, redis, rabbitmq, clamav)
+docker compose up -d
+
+# Run the backend as a whole
+poetry run app
+
+# Run tests
+poetry run test
+
+# Run specific test
+poetry run pytest path/to/test_file.py::test_function_name
+
+# Run block tests (tests that validate all blocks work correctly)
+poetry run pytest backend/blocks/test/test_block.py -xvs
+
+# Run tests for a specific block (e.g., GetCurrentTimeBlock)
+poetry run pytest 'backend/blocks/test/test_block.py::test_available_blocks[GetCurrentTimeBlock]' -xvs
+
+# Lint and format
+# prefer format if you want to just "fix" it and only get the errors that can't be autofixed
+poetry run format  # Black + isort
+poetry run lint    # ruff
+```
+
+More details can be found in @TESTING.md
+
+### Creating/Updating Snapshots
+
+When you first write a test or when the expected output changes:
+
+```bash
+poetry run pytest path/to/test.py --snapshot-update
+```
+
+⚠️ **Important**: Always review snapshot changes before committing! Use `git diff` to verify the changes are expected.
+
+## Architecture
+
+- **API Layer**: FastAPI with REST and WebSocket endpoints
+- **Database**: PostgreSQL with Prisma ORM, includes pgvector for embeddings
+- **Queue System**: RabbitMQ for async task processing
+- **Execution Engine**: Separate executor service processes agent workflows
+- **Authentication**: JWT-based with Supabase integration
+- **Security**: Cache protection middleware prevents sensitive data caching in browsers/proxies
+
+## Code Style
+
+- **Top-level imports only** — no local/inner imports (lazy imports only for heavy optional deps like `openpyxl`)
+- **Absolute imports** — use `from backend.module import ...` for cross-package imports. Single-dot relative (`from .sibling import ...`) is acceptable for sibling modules within the same package (e.g., blocks). Avoid double-dot relative imports (`from ..parent import ...`) — use the absolute path instead
+- **No duck typing** — no `hasattr`/`getattr`/`isinstance` for type dispatch; use typed interfaces/unions/protocols
+- **Pydantic models** over dataclass/namedtuple/dict for structured data
+- **No linter suppressors** — no `# type: ignore`, `# noqa`, `# pyright: ignore`; fix the type/code
+- **List comprehensions** over manual loop-and-append
+- **Early return** — guard clauses first, avoid deep nesting
+- **f-strings vs printf syntax in log statements** — Use `%s` for deferred interpolation in `debug` statements, f-strings elsewhere for readability: `logger.debug("Processing %s items", count)`, `logger.info(f"Processing {count} items")`
+- **Sanitize error paths** — `os.path.basename()` in error messages to avoid leaking directory structure
+- **TOCTOU awareness** — avoid check-then-act patterns for file access and credit charging
+- **`Security()` vs `Depends()`** — use `Security()` for auth deps to get proper OpenAPI security spec
+- **Redis pipelines** — `transaction=True` for atomicity on multi-step operations
+- **`max(0, value)` guards** — for computed values that should never be negative
+- **SSE protocol** — `data:` lines for frontend-parsed events (must match Zod schema), `: comment` lines for heartbeats/status
+- **File length** — keep files under ~300 lines; if a file grows beyond this, split by responsibility (e.g. extract helpers, models, or a sub-module into a new file). Never keep appending to a long file.
+- **Function length** — keep functions under ~40 lines; extract named helpers when a function grows longer. Long functions are a sign of mixed concerns, not complexity.
+- **Top-down ordering** — define the main/public function or class first, then the helpers it uses below. A reader should encounter high-level logic before implementation details.
+
+## Testing Approach
+
+- Uses pytest with snapshot testing for API responses
+- Test files are colocated with source files (`*_test.py`)
+- Mock at boundaries — mock where the symbol is **used**, not where it's **defined**
+- After refactoring, update mock targets to match new module paths
+- Use `AsyncMock` for async functions (`from unittest.mock import AsyncMock`)
+
+### Test-Driven Development (TDD)
+
+When fixing a bug or adding a feature, write the test **before** the implementation:
+
+```python
+# 1. Write a failing test marked xfail
+@pytest.mark.xfail(reason="Bug #1234: widget crashes on empty input")
+def test_widget_handles_empty_input():
+    result = widget.process("")
+    assert result == Widget.EMPTY_RESULT
+
+# 2. Run it — confirm it fails (XFAIL)
+# poetry run pytest path/to/test.py::test_widget_handles_empty_input -xvs
+
+# 3. Implement the fix
+
+# 4. Remove xfail, run again — confirm it passes
+def test_widget_handles_empty_input():
+    result = widget.process("")
+    assert result == Widget.EMPTY_RESULT
+```
+
+This catches regressions and proves the fix actually works. **Every bug fix should include a test that would have caught it.**
+
+## Database Schema
+
+Key models (defined in `schema.prisma`):
+
+- `User`: Authentication and profile data
+- `AgentGraph`: Workflow definitions with version control
+- `AgentGraphExecution`: Execution history and results
+- `AgentNode`: Individual nodes in a workflow
+- `StoreListing`: Marketplace listings for sharing agents
+
+## Environment Configuration
+
+- **Backend**: `.env.default` (defaults) → `.env` (user overrides)
+
+## Common Development Tasks
+
+### Adding a new block
+
+Follow the comprehensive [Block SDK Guide](@../../docs/platform/block-sdk-guide.md) which covers:
+
+- Provider configuration with `ProviderBuilder`
+- Block schema definition
+- Authentication (API keys, OAuth, webhooks)
+- Testing and validation
+- File organization
+
+Quick steps:
+
+1. Create new file in `backend/blocks/`
+2. Configure provider using `ProviderBuilder` in `_config.py`
+3. Inherit from `Block` base class
+4. Define input/output schemas using `BlockSchema`
+5. Implement async `run` method
+6. Generate unique block ID using `uuid.uuid4()`
+7. Test with `poetry run pytest backend/blocks/test/test_block.py`
+
+Note: when creating several new blocks, analyze each block's interface and consider whether the blocks would connect well in a graph-based editor or struggle to connect productively.
+For example: do the inputs and outputs tie together well?
+
+If you get pushback or hit complex block conditions, check the new_blocks guide in the docs.
+
+#### Handling files in blocks with `store_media_file()`
+
+When blocks need to work with files (images, videos, documents), use `store_media_file()` from `backend.util.file`. The `return_format` parameter determines what you get back:
+
+| Format | Use When | Returns |
+|--------|----------|---------|
+| `"for_local_processing"` | Processing with local tools (ffmpeg, MoviePy, PIL) | Local file path (e.g., `"image.png"`) |
+| `"for_external_api"` | Sending content to external APIs (Replicate, OpenAI) | Data URI (e.g., `"data:image/png;base64,..."`) |
+| `"for_block_output"` | Returning output from your block | Smart: `workspace://` in CoPilot, data URI in graphs |
+
+**Examples:**
+
+```python
+# INPUT: Need to process file locally with ffmpeg
+local_path = await store_media_file(
+    file=input_data.video,
+    execution_context=execution_context,
+    return_format="for_local_processing",
+)
+# local_path = "video.mp4" - use with Path/ffmpeg/etc
+
+# INPUT: Need to send to external API like Replicate
+image_b64 = await store_media_file(
+    file=input_data.image,
+    execution_context=execution_context,
+    return_format="for_external_api",
+)
+# image_b64 = "data:image/png;base64,iVBORw0..." - send to API
+
+# OUTPUT: Returning result from block
+result_url = await store_media_file(
+    file=generated_image_url,
+    execution_context=execution_context,
+    return_format="for_block_output",
+)
+yield "image_url", result_url
+# In CoPilot: result_url = "workspace://abc123"
+# In graphs:  result_url = "data:image/png;base64,..."
+```
+
+**Key points:**
+
+- `for_block_output` is the ONLY format that auto-adapts to execution context
+- Always use `for_block_output` for block outputs unless you have a specific reason not to
+- Never hardcode workspace checks - let `for_block_output` handle it
+
+### Modifying the API
+
+1. Update route in `backend/api/features/`
+2. Add/update Pydantic models in same directory
+3. Write tests alongside the route file
+4. Run `poetry run test` to verify
+
+## Workspace & Media Files
+
+**Read [Workspace & Media Architecture](../../docs/platform/workspace-media-architecture.md) when:**
+- Working on CoPilot file upload/download features
+- Building blocks that handle `MediaFileType` inputs/outputs
+- Modifying `WorkspaceManager` or `store_media_file()`
+- Debugging file persistence or virus scanning issues
+
+Covers: `WorkspaceManager` (persistent storage with session scoping), `store_media_file()` (media normalization pipeline), and responsibility boundaries for virus scanning and persistence.
+
+## Security Implementation
+
+### Cache Protection Middleware
+
+- Located in `backend/api/middleware/security.py`
+- Default behavior: Disables caching for ALL endpoints with `Cache-Control: no-store, no-cache, must-revalidate, private`
+- Uses an allow list approach - only explicitly permitted paths can be cached
+- Cacheable paths include: static assets (`static/*`, `_next/static/*`), health checks, public store pages, documentation
+- Prevents sensitive data (auth tokens, API keys, user data) from being cached by browsers/proxies
+- To allow caching for a new endpoint, add it to `CACHEABLE_PATHS` in the middleware
+- Applied to both main API server and external API applications
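Reviewer note: the allow-list behavior described in the Cache Protection Middleware section can be pictured with a minimal sketch. This is not the code in `backend/api/middleware/security.py`; the tuple contents, the glob matching via `fnmatch`, and the helper name `cache_control_for` are assumptions made for illustration. Only the `Cache-Control` value is taken from the text above.

```python
from fnmatch import fnmatch
from typing import Optional

# Hypothetical allow list. The real CACHEABLE_PATHS may have a different
# shape and different entries; these are illustrative only.
CACHEABLE_PATHS = ("/static/*", "/_next/static/*", "/health")

# Header value quoted in the AGENTS.md text above.
NO_CACHE_VALUE = "no-store, no-cache, must-revalidate, private"


def cache_control_for(path: str) -> Optional[str]:
    """Return the Cache-Control value to force, or None if caching is allowed."""
    # Deny by default: a path is cacheable only when it matches the allow list.
    if any(fnmatch(path, pattern) for pattern in CACHEABLE_PATHS):
        return None
    return NO_CACHE_VALUE
```

The point of the deny-by-default shape is that a newly added endpoint is uncached until someone explicitly lists it, which matches the "add it to `CACHEABLE_PATHS`" instruction in the section.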
--- a/autogpt_platform/backend/CLAUDE.md
+++ b/autogpt_platform/backend/CLAUDE.md
@@ -1,191 +1 @@
-# CLAUDE.md - Backend
-
-This file provides guidance to Claude Code when working with the backend.
-
-## Essential Commands
-
-To run something with Python package dependencies you MUST use `poetry run ...`.
-
-```bash
-# Install dependencies
-poetry install
-
-# Run database migrations
-poetry run prisma migrate dev
-
-# Start all services (database, redis, rabbitmq, clamav)
-docker compose up -d
-
-# Run the backend as a whole
-poetry run app
-
-# Run tests
-poetry run test
-
-# Run specific test
-poetry run pytest path/to/test_file.py::test_function_name
-
-# Run block tests (tests that validate all blocks work correctly)
-poetry run pytest backend/blocks/test/test_block.py -xvs
-
-# Run tests for a specific block (e.g., GetCurrentTimeBlock)
-poetry run pytest 'backend/blocks/test/test_block.py::test_available_blocks[GetCurrentTimeBlock]' -xvs
-
-# Lint and format
-# prefer format if you want to just "fix" it and only get the errors that can't be autofixed
-poetry run format  # Black + isort
-poetry run lint    # ruff
-```
-
-More details can be found in @TESTING.md
-
-### Creating/Updating Snapshots
-
-When you first write a test or when the expected output changes:
-
-```bash
-poetry run pytest path/to/test.py --snapshot-update
-```
-
-⚠️ **Important**: Always review snapshot changes before committing! Use `git diff` to verify the changes are expected.
-
-## Architecture
-
- **API Layer**: FastAPI with REST and WebSocket endpoints
- **Database**: PostgreSQL with Prisma ORM, includes pgvector for embeddings
- **Queue System**: RabbitMQ for async task processing
- **Execution Engine**: Separate executor service processes agent workflows
- **Authentication**: JWT-based with Supabase integration
- **Security**: Cache protection middleware prevents sensitive data caching in browsers/proxies
-
-## Code Style
-
- **Top-level imports only** — no local/inner imports (lazy imports only for heavy optional deps like `openpyxl`)
- **No duck typing** — no `hasattr`/`getattr`/`isinstance` for type dispatch; use typed interfaces/unions/protocols
- **Pydantic models** over dataclass/namedtuple/dict for structured data
- **No linter suppressors** — no `# type: ignore`, `# noqa`, `# pyright: ignore`; fix the type/code
- **List comprehensions** over manual loop-and-append
- **Early return** — guard clauses first, avoid deep nesting
- **Lazy `%s` logging** — `logger.info("Processing %s items", count)` not `logger.info(f"Processing {count} items")`
- **Sanitize error paths** — `os.path.basename()` in error messages to avoid leaking directory structure
- **TOCTOU awareness** — avoid check-then-act patterns for file access and credit charging
- **`Security()` vs `Depends()`** — use `Security()` for auth deps to get proper OpenAPI security spec
- **Redis pipelines** — `transaction=True` for atomicity on multi-step operations
- **`max(0, value)` guards** — for computed values that should never be negative
- **SSE protocol** — `data:` lines for frontend-parsed events (must match Zod schema), `: comment` lines for heartbeats/status
- **File length** — keep files under ~300 lines; if a file grows beyond this, split by responsibility (e.g. extract helpers, models, or a sub-module into a new file). Never keep appending to a long file.
- **Function length** — keep functions under ~40 lines; extract named helpers when a function grows longer. Long functions are a sign of mixed concerns, not complexity.
-
-## Testing Approach
-
- Uses pytest with snapshot testing for API responses
- Test files are colocated with source files (`*_test.py`)
- Mock at boundaries — mock where the symbol is **used**, not where it's **defined**
- After refactoring, update mock targets to match new module paths
- Use `AsyncMock` for async functions (`from unittest.mock import AsyncMock`)
-
-## Database Schema
-
-Key models (defined in `schema.prisma`):
-
- `User`: Authentication and profile data
- `AgentGraph`: Workflow definitions with version control
- `AgentGraphExecution`: Execution history and results
- `AgentNode`: Individual nodes in a workflow
- `StoreListing`: Marketplace listings for sharing agents
-
-## Environment Configuration
-
- **Backend**: `.env.default` (defaults) → `.env` (user overrides)
-
-## Common Development Tasks
-
-### Adding a new block
-
-Follow the comprehensive [Block SDK Guide](@../../docs/content/platform/block-sdk-guide.md) which covers:
-
- Provider configuration with `ProviderBuilder`
- Block schema definition
- Authentication (API keys, OAuth, webhooks)
- Testing and validation
- File organization
-
-Quick steps:
-
-1. Create new file in `backend/blocks/`
-2. Configure provider using `ProviderBuilder` in `_config.py`
-3. Inherit from `Block` base class
-4. Define input/output schemas using `BlockSchema`
-5. Implement async `run` method
-6. Generate unique block ID using `uuid.uuid4()`
-7. Test with `poetry run pytest backend/blocks/test/test_block.py`
-
-Note: when making many new blocks analyze the interfaces for each of these blocks and picture if they would go well together in a graph-based editor or would they struggle to connect productively?
-ex: do the inputs and outputs tie well together?
-
-If you get any pushback or hit complex block conditions check the new_blocks guide in the docs.
-
-#### Handling files in blocks with `store_media_file()`
-
-When blocks need to work with files (images, videos, documents), use `store_media_file()` from `backend.util.file`. The `return_format` parameter determines what you get back:
-
-| Format | Use When | Returns |
-|--------|----------|---------|
-| `"for_local_processing"` | Processing with local tools (ffmpeg, MoviePy, PIL) | Local file path (e.g., `"image.png"`) |
-| `"for_external_api"` | Sending content to external APIs (Replicate, OpenAI) | Data URI (e.g., `"data:image/png;base64,..."`) |
-| `"for_block_output"` | Returning output from your block | Smart: `workspace://` in CoPilot, data URI in graphs |
-
-**Examples:**
-
-```python
-# INPUT: Need to process file locally with ffmpeg
-local_path = await store_media_file(
-    file=input_data.video,
-    execution_context=execution_context,
-    return_format="for_local_processing",
-)
-# local_path = "video.mp4" - use with Path/ffmpeg/etc
-
-# INPUT: Need to send to external API like Replicate
-image_b64 = await store_media_file(
-    file=input_data.image,
-    execution_context=execution_context,
-    return_format="for_external_api",
-)
-# image_b64 = "data:image/png;base64,iVBORw0..." - send to API
-
-# OUTPUT: Returning result from block
-result_url = await store_media_file(
-    file=generated_image_url,
-    execution_context=execution_context,
-    return_format="for_block_output",
-)
-yield "image_url", result_url
-# In CoPilot: result_url = "workspace://abc123"
-# In graphs:  result_url = "data:image/png;base64,..."
-```
-
-**Key points:**
-
- `for_block_output` is the ONLY format that auto-adapts to execution context
- Always use `for_block_output` for block outputs unless you have a specific reason not to
- Never hardcode workspace checks - let `for_block_output` handle it
-
-### Modifying the API
-
-1. Update route in `backend/api/features/`
-2. Add/update Pydantic models in same directory
-3. Write tests alongside the route file
-4. Run `poetry run test` to verify
-
-## Security Implementation
-
-### Cache Protection Middleware
-
- Located in `backend/api/middleware/security.py`
- Default behavior: Disables caching for ALL endpoints with `Cache-Control: no-store, no-cache, must-revalidate, private`
- Uses an allow list approach - only explicitly permitted paths can be cached
- Cacheable paths include: static assets (`static/*`, `_next/static/*`), health checks, public store pages, documentation
- Prevents sensitive data (auth tokens, API keys, user data) from being cached by browsers/proxies
- To allow caching for a new endpoint, add it to `CACHEABLE_PATHS` in the middleware
- Applied to both main API server and external API applications
+@AGENTS.md
--- a/autogpt_platform/backend/Dockerfile
+++ b/autogpt_platform/backend/Dockerfile
@@ -50,7 +50,7 @@ RUN poetry install --no-ansi --no-root
 # Generate Prisma client
 COPY autogpt_platform/backend/schema.prisma ./
 COPY autogpt_platform/backend/backend/data/partial_types.py ./backend/data/partial_types.py
-COPY autogpt_platform/backend/gen_prisma_types_stub.py ./
+COPY autogpt_platform/backend/scripts/gen_prisma_types_stub.py ./scripts/
 RUN poetry run prisma generate && poetry run gen-prisma-stub

 # =============================== DB MIGRATOR =============================== #
@@ -82,7 +82,7 @@ RUN pip3 install prisma>=0.15.0 --break-system-packages

 COPY autogpt_platform/backend/schema.prisma ./
 COPY autogpt_platform/backend/backend/data/partial_types.py ./backend/data/partial_types.py
-COPY autogpt_platform/backend/gen_prisma_types_stub.py ./
+COPY autogpt_platform/backend/scripts/gen_prisma_types_stub.py ./scripts/
 COPY autogpt_platform/backend/migrations ./migrations

 # ============================== BACKEND SERVER ============================== #
@@ -121,19 +121,21 @@ RUN ln -s ../lib/node_modules/npm/bin/npm-cli.js /usr/bin/npm \
    && ln -s ../lib/node_modules/npm/bin/npx-cli.js /usr/bin/npx
 COPY --from=builder /root/.cache/prisma-python/binaries /root/.cache/prisma-python/binaries

-# Install agent-browser (Copilot browser tool) + Chromium runtime dependencies.
-# These are the runtime libraries Chromium/Playwright needs on Debian 13 (trixie).
-RUN apt-get update && apt-get install -y --no-install-recommends \
-    libnss3 libnspr4 libatk1.0-0 libatk-bridge2.0-0 libcups2 libdrm2 \
-    libdbus-1-3 libxkbcommon0 libatspi2.0-0t64 libxcomposite1 libxdamage1 \
-    libxfixes3 libxrandr2 libgbm1 libasound2t64 libpango-1.0-0 libcairo2 \
-    libx11-6 libx11-xcb1 libxcb1 libxext6 libglib2.0-0t64 \
-    fonts-liberation libfontconfig1 \
+# Install agent-browser (Copilot browser tool) using the system chromium package.
+# Chrome for Testing (the binary agent-browser downloads via `agent-browser install`)
+# has no ARM64 builds, so we use the distro-packaged chromium instead — verified to
+# work with agent-browser via Docker tests on arm64; amd64 is validated in CI.
+# Note: system chromium tracks the Debian package schedule rather than a pinned
+# Chrome for Testing release. If agent-browser requires a specific Chrome version,
+# verify compatibility against the chromium package version in the base image.
+RUN apt-get update \
+    && apt-get install -y --no-install-recommends chromium fonts-liberation \
    && rm -rf /var/lib/apt/lists/* \
    && npm install -g agent-browser \
-    && agent-browser install \
    && rm -rf /tmp/* /root/.npm

+ENV AGENT_BROWSER_EXECUTABLE_PATH=/usr/bin/chromium
+
 WORKDIR /app/autogpt_platform/backend

 # Copy only the .venv from builder (not the entire /app directory)
--- a/autogpt_platform/backend/backend/api/external/v1/integrations.py
+++ b/autogpt_platform/backend/backend/api/external/v1/integrations.py
@@ -18,14 +18,22 @@ from pydantic import BaseModel, Field, SecretStr

 from backend.api.external.middleware import require_permission
 from backend.api.features.integrations.models import get_all_provider_names
+from backend.api.features.integrations.router import (
+    CredentialsMetaResponse,
+    to_meta_response,
+)
 from backend.data.auth.base import APIAuthorizationInfo
 from backend.data.model import (
    APIKeyCredentials,
    Credentials,
    CredentialsType,
    HostScopedCredentials,
-    OAuth2Credentials,
    UserPasswordCredentials,
+    is_sdk_default,
+)
+from backend.integrations.credentials_store import (
+    is_system_credential,
+    provider_matches,
 )
 from backend.integrations.creds_manager import IntegrationCredentialsManager
 from backend.integrations.oauth import CREDENTIALS_BY_PROVIDER, HANDLERS_BY_NAME
@@ -91,18 +99,6 @@ class OAuthCompleteResponse(BaseModel):
    )


-class CredentialSummary(BaseModel):
-    """Summary of a credential without sensitive data."""
-
-    id: str
-    provider: str
-    type: CredentialsType
-    title: Optional[str] = None
-    scopes: Optional[list[str]] = None
-    username: Optional[str] = None
-    host: Optional[str] = None
-
-
 class ProviderInfo(BaseModel):
    """Information about an integration provider."""

@@ -473,12 +469,12 @@ async def complete_oauth(
    )


-@integrations_router.get("/credentials", response_model=list[CredentialSummary])
+@integrations_router.get("/credentials", response_model=list[CredentialsMetaResponse])
 async def list_credentials(
    auth: APIAuthorizationInfo = Security(
        require_permission(APIKeyPermission.READ_INTEGRATIONS)
    ),
-) -> list[CredentialSummary]:
+) -> list[CredentialsMetaResponse]:
    """
    List all credentials for the authenticated user.

@@ -486,28 +482,19 @@ async def list_credentials(
    """
    credentials = await creds_manager.store.get_all_creds(auth.user_id)
    return [
-        CredentialSummary(
-            id=cred.id,
-            provider=cred.provider,
-            type=cred.type,
-            title=cred.title,
-            scopes=cred.scopes if isinstance(cred, OAuth2Credentials) else None,
-            username=cred.username if isinstance(cred, OAuth2Credentials) else None,
-            host=cred.host if isinstance(cred, HostScopedCredentials) else None,
-        )
-        for cred in credentials
+        to_meta_response(cred) for cred in credentials if not is_sdk_default(cred.id)
    ]


@integrations_router.get(
-    "/{provider}/credentials", response_model=list[CredentialSummary]
+    "/{provider}/credentials", response_model=list[CredentialsMetaResponse]
 )
 async def list_credentials_by_provider(
    provider: Annotated[str, Path(title="The provider to list credentials for")],
    auth: APIAuthorizationInfo = Security(
        require_permission(APIKeyPermission.READ_INTEGRATIONS)
    ),
-) -> list[CredentialSummary]:
+) -> list[CredentialsMetaResponse]:
    """
    List credentials for a specific provider.
    """
@@ -515,16 +502,7 @@ async def list_credentials_by_provider(
        auth.user_id, provider
    )
    return [
-        CredentialSummary(
-            id=cred.id,
-            provider=cred.provider,
-            type=cred.type,
-            title=cred.title,
-            scopes=cred.scopes if isinstance(cred, OAuth2Credentials) else None,
-            username=cred.username if isinstance(cred, OAuth2Credentials) else None,
-            host=cred.host if isinstance(cred, HostScopedCredentials) else None,
-        )
-        for cred in credentials
+        to_meta_response(cred) for cred in credentials if not is_sdk_default(cred.id)
    ]


@@ -597,11 +575,11 @@ async def create_credential(
    # Store credentials
    try:
        await creds_manager.create(auth.user_id, credentials)
-    except Exception as e:
-        logger.error(f"Failed to store credentials: {e}")
+    except Exception:
+        logger.exception("Failed to store credentials")
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
-            detail=f"Failed to store credentials: {str(e)}",
+            detail="Failed to store credentials",
        )

    logger.info(f"Created {request.type} credentials for provider {provider}")
@@ -639,15 +617,23 @@ async def delete_credential(
    use the main API's delete endpoint which handles webhook cleanup and
    token revocation.
    """
+    if is_sdk_default(cred_id):
+        raise HTTPException(
+            status_code=status.HTTP_404_NOT_FOUND, detail="Credentials not found"
+        )
+    if is_system_credential(cred_id):
+        raise HTTPException(
+            status_code=status.HTTP_403_FORBIDDEN,
+            detail="System-managed credentials cannot be deleted",
+        )
    creds = await creds_manager.store.get_creds_by_id(auth.user_id, cred_id)
    if not creds:
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND, detail="Credentials not found"
        )
-    if creds.provider != provider:
+    if not provider_matches(creds.provider, provider):
        raise HTTPException(
-            status_code=status.HTTP_404_NOT_FOUND,
-            detail="Credentials do not match the specified provider",
+            status_code=status.HTTP_404_NOT_FOUND, detail="Credentials not found"
        )

    await creds_manager.delete(auth.user_id, cred_id)
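Reviewer note: the guard order in `delete_credential` after this change can be summarized with a small standalone sketch. The ID sets and the function name `delete_guard` are hypothetical stand-ins for `is_sdk_default` and `is_system_credential`, and plain equality stands in for `provider_matches`, whose real matching rules are not shown in this diff. In the endpoint itself the first two checks run before the store lookup; here the stored provider is passed in as a parameter to keep the example self-contained.

```python
from http import HTTPStatus
from typing import Optional

# Hypothetical stand-ins for is_sdk_default() and is_system_credential().
SDK_DEFAULT_IDS = {"sdk-default-1"}
SYSTEM_IDS = {"system-openai"}


def delete_guard(
    cred_id: str, stored_provider: Optional[str], requested_provider: str
) -> Optional[HTTPStatus]:
    """Return the error status to raise, or None when deletion may proceed."""
    if cred_id in SDK_DEFAULT_IDS:
        # SDK defaults are hidden from listings, so they look nonexistent.
        return HTTPStatus.NOT_FOUND
    if cred_id in SYSTEM_IDS:
        return HTTPStatus.FORBIDDEN
    if stored_provider is None:
        # No credential with this ID for the user.
        return HTTPStatus.NOT_FOUND
    if stored_provider != requested_provider:
        # Same response as a missing credential (assumed equality match).
        return HTTPStatus.NOT_FOUND
    return None
```

With the detail string changed to "Credentials not found", a provider mismatch is now indistinguishable from a missing credential, presumably so that a caller cannot probe which provider a credential ID belongs to.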
--- a/autogpt_platform/backend/backend/api/external/v1/tools.py
+++ b/autogpt_platform/backend/backend/api/external/v1/tools.py
@@ -72,7 +72,7 @@ class RunAgentRequest(BaseModel):

 def _create_ephemeral_session(user_id: str) -> ChatSession:
    """Create an ephemeral session for stateless API requests."""
-    return ChatSession.new(user_id)
+    return ChatSession.new(user_id, dry_run=False)


@tools_router.post(
--- a/autogpt_platform/backend/backend/api/features/admin/model.py
+++ b/autogpt_platform/backend/backend/api/features/admin/model.py
@@ -1,17 +1,8 @@
-from __future__ import annotations
-
-from datetime import datetime
-from typing import TYPE_CHECKING, Any, Literal, Optional
-
-import prisma.enums
-from pydantic import BaseModel, EmailStr
+from pydantic import BaseModel

 from backend.data.model import UserTransaction
 from backend.util.models import Pagination

-if TYPE_CHECKING:
-    from backend.data.invited_user import BulkInvitedUsersResult, InvitedUserRecord
-

 class UserHistoryResponse(BaseModel):
    """Response model for listings with version history"""
@@ -23,70 +14,3 @@ class UserHistoryResponse(BaseModel):
 class AddUserCreditsResponse(BaseModel):
    new_balance: int
    transaction_key: str
-
-
-class CreateInvitedUserRequest(BaseModel):
-    email: EmailStr
-    name: Optional[str] = None
-
-
-class InvitedUserResponse(BaseModel):
-    id: str
-    email: str
-    status: prisma.enums.InvitedUserStatus
-    auth_user_id: Optional[str] = None
-    name: Optional[str] = None
-    tally_understanding: Optional[dict[str, Any]] = None
-    tally_status: prisma.enums.TallyComputationStatus
-    tally_computed_at: Optional[datetime] = None
-    tally_error: Optional[str] = None
-    created_at: datetime
-    updated_at: datetime
-
-    @classmethod
-    def from_record(cls, record: InvitedUserRecord) -> InvitedUserResponse:
-        return cls.model_validate(record.model_dump())
-
-
-class InvitedUsersResponse(BaseModel):
-    invited_users: list[InvitedUserResponse]
-    pagination: Pagination
-
-
-class BulkInvitedUserRowResponse(BaseModel):
-    row_number: int
-    email: Optional[str] = None
-    name: Optional[str] = None
-    status: Literal["CREATED", "SKIPPED", "ERROR"]
-    message: str
-    invited_user: Optional[InvitedUserResponse] = None
-
-
-class BulkInvitedUsersResponse(BaseModel):
-    created_count: int
-    skipped_count: int
-    error_count: int
-    results: list[BulkInvitedUserRowResponse]
-
-    @classmethod
-    def from_result(cls, result: BulkInvitedUsersResult) -> BulkInvitedUsersResponse:
-        return cls(
-            created_count=result.created_count,
-            skipped_count=result.skipped_count,
-            error_count=result.error_count,
-            results=[
-                BulkInvitedUserRowResponse(
-                    row_number=row.row_number,
-                    email=row.email,
-                    name=row.name,
-                    status=row.status,
-                    message=row.message,
-                    invited_user=(
-                        InvitedUserResponse.from_record(row.invited_user)
-                        if row.invited_user is not None
-                        else None
-                    ),
-                )
-                for row in result.results
-            ],
-        )
--- a/autogpt_platform/backend/backend/api/features/admin/rate_limit_admin_routes.py
+++ b/autogpt_platform/backend/backend/api/features/admin/rate_limit_admin_routes.py
@@ -0,0 +1,259 @@
+"""Admin endpoints for checking and resetting user CoPilot rate limit usage."""
+
+import logging
+from typing import Optional
+
+from autogpt_libs.auth import get_user_id, requires_admin_user
+from fastapi import APIRouter, Body, HTTPException, Security
+from pydantic import BaseModel
+
+from backend.copilot.config import ChatConfig
+from backend.copilot.rate_limit import (
+    SubscriptionTier,
+    get_global_rate_limits,
+    get_usage_status,
+    get_user_tier,
+    reset_user_usage,
+    set_user_tier,
+)
+from backend.data.user import get_user_by_email, get_user_email_by_id, search_users
+
+logger = logging.getLogger(__name__)
+
+config = ChatConfig()
+
+router = APIRouter(
+    prefix="/admin",
+    tags=["copilot", "admin"],
+    dependencies=[Security(requires_admin_user)],
+)
+
+
+class UserRateLimitResponse(BaseModel):
+    user_id: str
+    user_email: Optional[str] = None
+    daily_token_limit: int
+    weekly_token_limit: int
+    daily_tokens_used: int
+    weekly_tokens_used: int
+    tier: SubscriptionTier
+
+
+class UserTierResponse(BaseModel):
+    user_id: str
+    tier: SubscriptionTier
+
+
+class SetUserTierRequest(BaseModel):
+    user_id: str
+    tier: SubscriptionTier
+
+
+async def _resolve_user_id(
+    user_id: Optional[str], email: Optional[str]
+) -> tuple[str, Optional[str]]:
+    """Resolve a user_id and email from the provided parameters.
+
+    Returns (user_id, email). Accepts either user_id or email; at least one
+    must be provided.  When both are provided, ``email`` takes precedence.
+    """
+    if email:
+        user = await get_user_by_email(email)
+        if not user:
+            raise HTTPException(
+                status_code=404, detail="No user found with the provided email."
+            )
+        return user.id, email
+
+    if not user_id:
+        raise HTTPException(
+            status_code=400,
+            detail="Either user_id or email query parameter is required.",
+        )
+
+    # We have a user_id; try to look up their email for display purposes.
+    # This is non-critical -- a failure should not block the response.
+    try:
+        resolved_email = await get_user_email_by_id(user_id)
+    except Exception:
+        logger.warning("Failed to resolve email for user %s", user_id, exc_info=True)
+        resolved_email = None
+    return user_id, resolved_email
+
+
+@router.get(
+    "/rate_limit",
+    response_model=UserRateLimitResponse,
+    summary="Get User Rate Limit",
+)
+async def get_user_rate_limit(
+    user_id: Optional[str] = None,
+    email: Optional[str] = None,
+    admin_user_id: str = Security(get_user_id),
+) -> UserRateLimitResponse:
+    """Get a user's current usage and effective rate limits. Admin-only.
+
+    Accepts either ``user_id`` or ``email`` as a query parameter; if both are
+    given, ``email`` takes precedence. Omitting both returns 400.
+    """
+    resolved_id, resolved_email = await _resolve_user_id(user_id, email)
+
+    logger.info("Admin %s checking rate limit for user %s", admin_user_id, resolved_id)
+
+    daily_limit, weekly_limit, tier = await get_global_rate_limits(
+        resolved_id, config.daily_token_limit, config.weekly_token_limit
+    )
+    usage = await get_usage_status(resolved_id, daily_limit, weekly_limit, tier=tier)
+
+    return UserRateLimitResponse(
+        user_id=resolved_id,
+        user_email=resolved_email,
+        daily_token_limit=daily_limit,
+        weekly_token_limit=weekly_limit,
+        daily_tokens_used=usage.daily.used,
+        weekly_tokens_used=usage.weekly.used,
+        tier=tier,
+    )
+
+
+@router.post(
+    "/rate_limit/reset",
+    response_model=UserRateLimitResponse,
+    summary="Reset User Rate Limit Usage",
+)
+async def reset_user_rate_limit(
+    user_id: str = Body(embed=True),
+    reset_weekly: bool = Body(False, embed=True),
+    admin_user_id: str = Security(get_user_id),
+) -> UserRateLimitResponse:
+    """Reset a user's daily usage counter (and optionally weekly). Admin-only."""
+    logger.info(
+        "Admin %s resetting rate limit for user %s (reset_weekly=%s)",
+        admin_user_id,
+        user_id,
+        reset_weekly,
+    )
+
+    try:
+        await reset_user_usage(user_id, reset_weekly=reset_weekly)
+    except Exception as e:
+        logger.exception("Failed to reset user usage")
+        raise HTTPException(status_code=500, detail="Failed to reset usage") from e
+
+    daily_limit, weekly_limit, tier = await get_global_rate_limits(
+        user_id, config.daily_token_limit, config.weekly_token_limit
+    )
+    usage = await get_usage_status(user_id, daily_limit, weekly_limit, tier=tier)
+
+    try:
+        resolved_email = await get_user_email_by_id(user_id)
+    except Exception:
+        logger.warning("Failed to resolve email for user %s", user_id, exc_info=True)
+        resolved_email = None
+
+    return UserRateLimitResponse(
+        user_id=user_id,
+        user_email=resolved_email,
+        daily_token_limit=daily_limit,
+        weekly_token_limit=weekly_limit,
+        daily_tokens_used=usage.daily.used,
+        weekly_tokens_used=usage.weekly.used,
+        tier=tier,
+    )
+
+
+@router.get(
+    "/rate_limit/tier",
+    response_model=UserTierResponse,
+    summary="Get User Rate Limit Tier",
+)
+async def get_user_rate_limit_tier(
+    user_id: str,
+    admin_user_id: str = Security(get_user_id),
+) -> UserTierResponse:
+    """Get a user's current rate-limit tier. Admin-only.
+
+    Returns 404 if the user does not exist in the database.
+    """
+    logger.info("Admin %s checking tier for user %s", admin_user_id, user_id)
+
+    resolved_email = await get_user_email_by_id(user_id)
+    if resolved_email is None:
+        raise HTTPException(status_code=404, detail=f"User {user_id} not found")
+
+    tier = await get_user_tier(user_id)
+    return UserTierResponse(user_id=user_id, tier=tier)
+
+
+@router.post(
+    "/rate_limit/tier",
+    response_model=UserTierResponse,
+    summary="Set User Rate Limit Tier",
+)
+async def set_user_rate_limit_tier(
+    request: SetUserTierRequest,
+    admin_user_id: str = Security(get_user_id),
+) -> UserTierResponse:
+    """Set a user's rate-limit tier. Admin-only.
+
+    Returns 404 if the user does not exist or the email lookup fails.
+    """
+    try:
+        resolved_email = await get_user_email_by_id(request.user_id)
+    except Exception:
+        logger.warning(
+            "Failed to resolve email for user %s",
+            request.user_id,
+            exc_info=True,
+        )
+        resolved_email = None
+
+    if resolved_email is None:
+        raise HTTPException(status_code=404, detail=f"User {request.user_id} not found")
+
+    old_tier = await get_user_tier(request.user_id)
+    logger.info(
+        "Admin %s changing tier for user %s (%s): %s -> %s",
+        admin_user_id,
+        request.user_id,
+        resolved_email,
+        old_tier.value,
+        request.tier.value,
+    )
+    try:
+        await set_user_tier(request.user_id, request.tier)
+    except Exception as e:
+        logger.exception("Failed to set user tier")
+        raise HTTPException(status_code=500, detail="Failed to set tier") from e
+
+    return UserTierResponse(user_id=request.user_id, tier=request.tier)
+
+
+class UserSearchResult(BaseModel):
+    user_id: str
+    user_email: Optional[str] = None
+
+
+@router.get(
+    "/rate_limit/search_users",
+    response_model=list[UserSearchResult],
+    summary="Search Users by Name or Email",
+)
+async def admin_search_users(
+    query: str,
+    limit: int = 20,
+    admin_user_id: str = Security(get_user_id),
+) -> list[UserSearchResult]:
+    """Search users by partial email or name. Admin-only.
+
+    Queries the User table directly, so results include users
+    without credit transaction history.
+    """
+    if len(query.strip()) < 3:
+        raise HTTPException(
+            status_code=400,
+            detail="Search query must be at least 3 characters.",
+        )
+    logger.info("Admin %s searching users with query=%r", admin_user_id, query)
+    results = await search_users(query.strip(), limit=max(1, min(limit, 50)))
+    return [UserSearchResult(user_id=uid, user_email=email) for uid, email in results]
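Reviewer note: the input handling in `admin_search_users` above reduces to two small rules, a minimum query length measured after stripping whitespace and a page size clamped into `[1, 50]`. The sketch below isolates them as pure functions so they can be reasoned about without FastAPI. The helper names `clamp_limit` and `is_searchable` are hypothetical and do not appear in the PR; only the bounds (3 characters, 1 to 50 results) come from the diff.

```python
def clamp_limit(limit: int, lower: int = 1, upper: int = 50) -> int:
    """Clamp a caller-supplied page size into [lower, upper].

    Hypothetical helper mirroring the expression max(1, min(limit, 50))
    used in the endpoint.
    """
    return max(lower, min(limit, upper))


def is_searchable(query: str, min_chars: int = 3) -> bool:
    """Return True when the query keeps at least min_chars after stripping.

    Hypothetical helper mirroring the len(query.strip()) < 3 guard.
    """
    return len(query.strip()) >= min_chars
```

Under these assumptions a negative or zero limit becomes 1, an oversized limit becomes 50, and a query such as `"  ab  "` is rejected because padding does not count toward the minimum length.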
--- a/autogpt_platform/backend/backend/api/features/admin/rate_limit_admin_routes_test.py
+++ b/autogpt_platform/backend/backend/api/features/admin/rate_limit_admin_routes_test.py
@@ -0,0 +1,566 @@
+import json
+from types import SimpleNamespace
+from unittest.mock import AsyncMock
+
+import fastapi
+import fastapi.testclient
+import pytest
+import pytest_mock
+from autogpt_libs.auth.jwt_utils import get_jwt_payload
+from pytest_snapshot.plugin import Snapshot
+
+from backend.copilot.rate_limit import CoPilotUsageStatus, SubscriptionTier, UsageWindow
+
+from .rate_limit_admin_routes import router as rate_limit_admin_router
+
+app = fastapi.FastAPI()
+app.include_router(rate_limit_admin_router)
+
+client = fastapi.testclient.TestClient(app)
+
+_MOCK_MODULE = "backend.api.features.admin.rate_limit_admin_routes"
+
+_TARGET_EMAIL = "target@example.com"
+
+
+@pytest.fixture(autouse=True)
+def setup_app_admin_auth(mock_jwt_admin):
+    """Set up admin auth overrides for all tests in this module."""
+    app.dependency_overrides[get_jwt_payload] = mock_jwt_admin["get_jwt_payload"]
+    yield
+    app.dependency_overrides.clear()
+
+
+def _mock_usage_status(
+    daily_used: int = 500_000, weekly_used: int = 3_000_000
+) -> CoPilotUsageStatus:
+    from datetime import UTC, datetime, timedelta
+
+    now = datetime.now(UTC)
+    return CoPilotUsageStatus(
+        daily=UsageWindow(
+            used=daily_used, limit=2_500_000, resets_at=now + timedelta(hours=6)
+        ),
+        weekly=UsageWindow(
+            used=weekly_used, limit=12_500_000, resets_at=now + timedelta(days=3)
+        ),
+    )
+
+
+def _patch_rate_limit_deps(
+    mocker: pytest_mock.MockerFixture,
+    target_user_id: str,
+    daily_used: int = 500_000,
+    weekly_used: int = 3_000_000,
+):
+    """Patch the common rate-limit + user-lookup dependencies."""
+    mocker.patch(
+        f"{_MOCK_MODULE}.get_global_rate_limits",
+        new_callable=AsyncMock,
+        return_value=(2_500_000, 12_500_000, SubscriptionTier.FREE),
+    )
+    mocker.patch(
+        f"{_MOCK_MODULE}.get_usage_status",
+        new_callable=AsyncMock,
+        return_value=_mock_usage_status(daily_used=daily_used, weekly_used=weekly_used),
+    )
+    mocker.patch(
+        f"{_MOCK_MODULE}.get_user_email_by_id",
+        new_callable=AsyncMock,
+        return_value=_TARGET_EMAIL,
+    )
+
+
+def test_get_rate_limit(
+    mocker: pytest_mock.MockerFixture,
+    configured_snapshot: Snapshot,
+    target_user_id: str,
+) -> None:
+    """Test getting rate limit and usage for a user."""
+    _patch_rate_limit_deps(mocker, target_user_id)
+
+    response = client.get("/admin/rate_limit", params={"user_id": target_user_id})
+
+    assert response.status_code == 200
+    data = response.json()
+    assert data["user_id"] == target_user_id
+    assert data["user_email"] == _TARGET_EMAIL
+    assert data["daily_token_limit"] == 2_500_000
+    assert data["weekly_token_limit"] == 12_500_000
+    assert data["daily_tokens_used"] == 500_000
+    assert data["weekly_tokens_used"] == 3_000_000
+    assert data["tier"] == "FREE"
+
+    configured_snapshot.assert_match(
+        json.dumps(data, indent=2, sort_keys=True) + "\n",
+        "get_rate_limit",
+    )
+
+
+def test_get_rate_limit_by_email(
+    mocker: pytest_mock.MockerFixture,
+    target_user_id: str,
+) -> None:
+    """Test looking up rate limits via email instead of user_id."""
+    _patch_rate_limit_deps(mocker, target_user_id)
+
+    mock_user = SimpleNamespace(id=target_user_id, email=_TARGET_EMAIL)
+    mocker.patch(
+        f"{_MOCK_MODULE}.get_user_by_email",
+        new_callable=AsyncMock,
+        return_value=mock_user,
+    )
+
+    response = client.get("/admin/rate_limit", params={"email": _TARGET_EMAIL})
+
+    assert response.status_code == 200
+    data = response.json()
+    assert data["user_id"] == target_user_id
+    assert data["user_email"] == _TARGET_EMAIL
+    assert data["daily_token_limit"] == 2_500_000
+
+
+def test_get_rate_limit_by_email_not_found(
+    mocker: pytest_mock.MockerFixture,
+) -> None:
+    """Test that looking up a non-existent email returns 404."""
+    mocker.patch(
+        f"{_MOCK_MODULE}.get_user_by_email",
+        new_callable=AsyncMock,
+        return_value=None,
+    )
+
+    response = client.get("/admin/rate_limit", params={"email": "nobody@example.com"})
+
+    assert response.status_code == 404
+
+
+def test_get_rate_limit_no_params() -> None:
+    """Test that omitting both user_id and email returns 400."""
+    response = client.get("/admin/rate_limit")
+    assert response.status_code == 400
+
+
+def test_reset_user_usage_daily_only(
+    mocker: pytest_mock.MockerFixture,
+    configured_snapshot: Snapshot,
+    target_user_id: str,
+) -> None:
+    """Test resetting only daily usage (default behaviour)."""
+    mock_reset = mocker.patch(
+        f"{_MOCK_MODULE}.reset_user_usage",
+        new_callable=AsyncMock,
+    )
+    _patch_rate_limit_deps(mocker, target_user_id, daily_used=0, weekly_used=3_000_000)
+
+    response = client.post(
+        "/admin/rate_limit/reset",
+        json={"user_id": target_user_id},
+    )
+
+    assert response.status_code == 200
+    data = response.json()
+    assert data["daily_tokens_used"] == 0
+    # Weekly is untouched
+    assert data["weekly_tokens_used"] == 3_000_000
+    assert data["tier"] == "FREE"
+
+    mock_reset.assert_awaited_once_with(target_user_id, reset_weekly=False)
+
+    configured_snapshot.assert_match(
+        json.dumps(data, indent=2, sort_keys=True) + "\n",
+        "reset_user_usage_daily_only",
+    )
+
+
+def test_reset_user_usage_daily_and_weekly(
+    mocker: pytest_mock.MockerFixture,
+    configured_snapshot: Snapshot,
+    target_user_id: str,
+) -> None:
+    """Test resetting both daily and weekly usage."""
+    mock_reset = mocker.patch(
+        f"{_MOCK_MODULE}.reset_user_usage",
+        new_callable=AsyncMock,
+    )
+    _patch_rate_limit_deps(mocker, target_user_id, daily_used=0, weekly_used=0)
+
+    response = client.post(
+        "/admin/rate_limit/reset",
+        json={"user_id": target_user_id, "reset_weekly": True},
+    )
+
+    assert response.status_code == 200
+    data = response.json()
+    assert data["daily_tokens_used"] == 0
+    assert data["weekly_tokens_used"] == 0
+    assert data["tier"] == "FREE"
+
+    mock_reset.assert_awaited_once_with(target_user_id, reset_weekly=True)
+
+    configured_snapshot.assert_match(
+        json.dumps(data, indent=2, sort_keys=True) + "\n",
+        "reset_user_usage_daily_and_weekly",
+    )
+
+
+def test_reset_user_usage_redis_failure(
+    mocker: pytest_mock.MockerFixture,
+    target_user_id: str,
+) -> None:
+    """Test that Redis failure on reset returns 500."""
+    mocker.patch(
+        f"{_MOCK_MODULE}.reset_user_usage",
+        new_callable=AsyncMock,
+        side_effect=Exception("Redis connection refused"),
+    )
+
+    response = client.post(
+        "/admin/rate_limit/reset",
+        json={"user_id": target_user_id},
+    )
+
+    assert response.status_code == 500
+
+
+def test_get_rate_limit_email_lookup_failure(
+    mocker: pytest_mock.MockerFixture,
+    target_user_id: str,
+) -> None:
+    """Test that failing to resolve a user email degrades gracefully."""
+    mocker.patch(
+        f"{_MOCK_MODULE}.get_global_rate_limits",
+        new_callable=AsyncMock,
+        return_value=(2_500_000, 12_500_000, SubscriptionTier.FREE),
+    )
+    mocker.patch(
+        f"{_MOCK_MODULE}.get_usage_status",
+        new_callable=AsyncMock,
+        return_value=_mock_usage_status(),
+    )
+    mocker.patch(
+        f"{_MOCK_MODULE}.get_user_email_by_id",
+        new_callable=AsyncMock,
+        side_effect=Exception("DB connection lost"),
+    )
+
+    response = client.get("/admin/rate_limit", params={"user_id": target_user_id})
+
+    assert response.status_code == 200
+    data = response.json()
+    assert data["user_id"] == target_user_id
+    assert data["user_email"] is None
+
+
+def test_admin_endpoints_require_admin_role(mock_jwt_user) -> None:
+    """Test that rate limit admin endpoints require admin role."""
+    app.dependency_overrides[get_jwt_payload] = mock_jwt_user["get_jwt_payload"]
+
+    response = client.get("/admin/rate_limit", params={"user_id": "test"})
+    assert response.status_code == 403
+
+    response = client.post(
+        "/admin/rate_limit/reset",
+        json={"user_id": "test"},
+    )
+    assert response.status_code == 403
+
+
+# ---------------------------------------------------------------------------
+# Tier management endpoints
+# ---------------------------------------------------------------------------
+
+
+def test_get_user_tier(
+    mocker: pytest_mock.MockerFixture,
+    target_user_id: str,
+) -> None:
+    """Test getting a user's rate-limit tier."""
+    mocker.patch(
+        f"{_MOCK_MODULE}.get_user_email_by_id",
+        new_callable=AsyncMock,
+        return_value=_TARGET_EMAIL,
+    )
+    mocker.patch(
+        f"{_MOCK_MODULE}.get_user_tier",
+        new_callable=AsyncMock,
+        return_value=SubscriptionTier.PRO,
+    )
+
+    response = client.get("/admin/rate_limit/tier", params={"user_id": target_user_id})
+
+    assert response.status_code == 200
+    data = response.json()
+    assert data["user_id"] == target_user_id
+    assert data["tier"] == "PRO"
+
+
+def test_get_user_tier_user_not_found(
+    mocker: pytest_mock.MockerFixture,
+    target_user_id: str,
+) -> None:
+    """Test that getting tier for a non-existent user returns 404."""
+    mocker.patch(
+        f"{_MOCK_MODULE}.get_user_email_by_id",
+        new_callable=AsyncMock,
+        return_value=None,
+    )
+
+    response = client.get("/admin/rate_limit/tier", params={"user_id": target_user_id})
+
+    assert response.status_code == 404
+
+
+def test_set_user_tier(
+    mocker: pytest_mock.MockerFixture,
+    target_user_id: str,
+) -> None:
+    """Test setting a user's rate-limit tier (upgrade)."""
+    mocker.patch(
+        f"{_MOCK_MODULE}.get_user_email_by_id",
+        new_callable=AsyncMock,
+        return_value=_TARGET_EMAIL,
+    )
+    mocker.patch(
+        f"{_MOCK_MODULE}.get_user_tier",
+        new_callable=AsyncMock,
+        return_value=SubscriptionTier.FREE,
+    )
+    mock_set = mocker.patch(
+        f"{_MOCK_MODULE}.set_user_tier",
+        new_callable=AsyncMock,
+    )
+
+    response = client.post(
+        "/admin/rate_limit/tier",
+        json={"user_id": target_user_id, "tier": "ENTERPRISE"},
+    )
+
+    assert response.status_code == 200
+    data = response.json()
+    assert data["user_id"] == target_user_id
+    assert data["tier"] == "ENTERPRISE"
+    mock_set.assert_awaited_once_with(target_user_id, SubscriptionTier.ENTERPRISE)
+
+
+def test_set_user_tier_downgrade(
+    mocker: pytest_mock.MockerFixture,
+    target_user_id: str,
+) -> None:
+    """Test downgrading a user's tier from PRO to FREE."""
+    mocker.patch(
+        f"{_MOCK_MODULE}.get_user_email_by_id",
+        new_callable=AsyncMock,
+        return_value=_TARGET_EMAIL,
+    )
+    mocker.patch(
+        f"{_MOCK_MODULE}.get_user_tier",
+        new_callable=AsyncMock,
+        return_value=SubscriptionTier.PRO,
+    )
+    mock_set = mocker.patch(
+        f"{_MOCK_MODULE}.set_user_tier",
+        new_callable=AsyncMock,
+    )
+
+    response = client.post(
+        "/admin/rate_limit/tier",
+        json={"user_id": target_user_id, "tier": "FREE"},
+    )
+
+    assert response.status_code == 200
+    data = response.json()
+    assert data["user_id"] == target_user_id
+    assert data["tier"] == "FREE"
+    mock_set.assert_awaited_once_with(target_user_id, SubscriptionTier.FREE)
+
+
+def test_set_user_tier_invalid_tier(
+    target_user_id: str,
+) -> None:
+    """Test that setting an invalid tier returns 422."""
+    response = client.post(
+        "/admin/rate_limit/tier",
+        json={"user_id": target_user_id, "tier": "invalid"},
+    )
+
+    assert response.status_code == 422
+
+
+def test_set_user_tier_invalid_tier_uppercase(
+    target_user_id: str,
+) -> None:
+    """Test that setting an unrecognised uppercase tier (e.g. 'INVALID') returns 422.
+
+    Regression: ensures Pydantic enum validation rejects values that are not
+    members of SubscriptionTier, even when they look like valid enum names.
+    """
+    response = client.post(
+        "/admin/rate_limit/tier",
+        json={"user_id": target_user_id, "tier": "INVALID"},
+    )
+
+    assert response.status_code == 422
+    body = response.json()
+    assert "detail" in body
+
+
+def test_set_user_tier_email_lookup_failure_returns_404(
+    mocker: pytest_mock.MockerFixture,
+    target_user_id: str,
+) -> None:
+    """Test that email lookup failure returns 404 (user unverifiable)."""
+    mocker.patch(
+        f"{_MOCK_MODULE}.get_user_email_by_id",
+        new_callable=AsyncMock,
+        side_effect=Exception("DB connection failed"),
+    )
+
+    response = client.post(
+        "/admin/rate_limit/tier",
+        json={"user_id": target_user_id, "tier": "PRO"},
+    )
+
+    assert response.status_code == 404
+
+
+def test_set_user_tier_user_not_found(
+    mocker: pytest_mock.MockerFixture,
+    target_user_id: str,
+) -> None:
+    """Test that setting tier for a non-existent user returns 404."""
+    mocker.patch(
+        f"{_MOCK_MODULE}.get_user_email_by_id",
+        new_callable=AsyncMock,
+        return_value=None,
+    )
+
+    response = client.post(
+        "/admin/rate_limit/tier",
+        json={"user_id": target_user_id, "tier": "PRO"},
+    )
+
+    assert response.status_code == 404
+
+
+def test_set_user_tier_db_failure(
+    mocker: pytest_mock.MockerFixture,
+    target_user_id: str,
+) -> None:
+    """Test that DB failure on set tier returns 500."""
+    mocker.patch(
+        f"{_MOCK_MODULE}.get_user_email_by_id",
+        new_callable=AsyncMock,
+        return_value=_TARGET_EMAIL,
+    )
+    mocker.patch(
+        f"{_MOCK_MODULE}.get_user_tier",
+        new_callable=AsyncMock,
+        return_value=SubscriptionTier.FREE,
+    )
+    mocker.patch(
+        f"{_MOCK_MODULE}.set_user_tier",
+        new_callable=AsyncMock,
+        side_effect=Exception("DB connection refused"),
+    )
+
+    response = client.post(
+        "/admin/rate_limit/tier",
+        json={"user_id": target_user_id, "tier": "PRO"},
+    )
+
+    assert response.status_code == 500
+
+
+def test_tier_endpoints_require_admin_role(mock_jwt_user) -> None:
+    """Test that tier admin endpoints require admin role."""
+    app.dependency_overrides[get_jwt_payload] = mock_jwt_user["get_jwt_payload"]
+
+    response = client.get("/admin/rate_limit/tier", params={"user_id": "test"})
+    assert response.status_code == 403
+
+    response = client.post(
+        "/admin/rate_limit/tier",
+        json={"user_id": "test", "tier": "PRO"},
+    )
+    assert response.status_code == 403
+
+
+# ─── search_users endpoint ──────────────────────────────────────────
+
+
+def test_search_users_returns_matching_users(
+    mocker: pytest_mock.MockerFixture,
+    admin_user_id: str,
+) -> None:
+    """Partial search should return all matching users from the User table."""
+    mocker.patch(
+        _MOCK_MODULE + ".search_users",
+        new_callable=AsyncMock,
+        return_value=[
+            ("user-1", "zamil.majdy@gmail.com"),
+            ("user-2", "zamil.majdy@agpt.co"),
+        ],
+    )
+
+    response = client.get("/admin/rate_limit/search_users", params={"query": "zamil"})
+
+    assert response.status_code == 200
+    results = response.json()
+    assert len(results) == 2
+    assert results[0]["user_email"] == "zamil.majdy@gmail.com"
+    assert results[1]["user_email"] == "zamil.majdy@agpt.co"
+
+
+def test_search_users_empty_results(
+    mocker: pytest_mock.MockerFixture,
+    admin_user_id: str,
+) -> None:
+    """Search with no matches returns empty list."""
+    mocker.patch(
+        _MOCK_MODULE + ".search_users",
+        new_callable=AsyncMock,
+        return_value=[],
+    )
+
+    response = client.get(
+        "/admin/rate_limit/search_users", params={"query": "nonexistent"}
+    )
+
+    assert response.status_code == 200
+    assert response.json() == []
+
+
+def test_search_users_short_query_rejected(
+    admin_user_id: str,
+) -> None:
+    """Query shorter than 3 characters should return 400."""
+    response = client.get("/admin/rate_limit/search_users", params={"query": "ab"})
+    assert response.status_code == 400
+
+
+def test_search_users_negative_limit_clamped(
+    mocker: pytest_mock.MockerFixture,
+    admin_user_id: str,
+) -> None:
+    """Negative limit should be clamped to 1, not passed through."""
+    mock_search = mocker.patch(
+        _MOCK_MODULE + ".search_users",
+        new_callable=AsyncMock,
+        return_value=[],
+    )
+
+    response = client.get(
+        "/admin/rate_limit/search_users", params={"query": "test", "limit": -1}
+    )
+
+    assert response.status_code == 200
+    mock_search.assert_awaited_once_with("test", limit=1)
+
+
+def test_search_users_requires_admin_role(mock_jwt_user) -> None:
+    """Test that the search_users endpoint requires admin role."""
+    app.dependency_overrides[get_jwt_payload] = mock_jwt_user["get_jwt_payload"]
+
+    response = client.get("/admin/rate_limit/search_users", params={"query": "test"})
+    assert response.status_code == 403
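Reviewer note: the tests above pin two different failure policies for the same email lookup. On the read path (`test_get_rate_limit_email_lookup_failure`) a lookup error degrades to `user_email: null` with a 200, while on the write path (`test_set_user_tier_email_lookup_failure_returns_404`) the same error is treated as "user cannot be verified" and yields a 404. The sketch below models that contrast with the standard library only. `UserNotFound`, `email_for_display`, `email_or_not_found`, and the two stub lookups are hypothetical names for illustration, not code from the PR.

```python
import asyncio
from typing import Awaitable, Callable, Optional

EmailLookup = Callable[[str], Awaitable[Optional[str]]]


class UserNotFound(Exception):
    """Hypothetical stand-in for HTTPException(status_code=404)."""


async def email_for_display(lookup: EmailLookup, user_id: str) -> Optional[str]:
    """Read path: a lookup failure degrades to None and the caller carries on."""
    try:
        return await lookup(user_id)
    except Exception:
        return None


async def email_or_not_found(lookup: EmailLookup, user_id: str) -> str:
    """Write path: a failed or empty lookup means the user cannot be verified."""
    email = await email_for_display(lookup, user_id)
    if email is None:
        raise UserNotFound(user_id)
    return email


async def broken_lookup(user_id: str) -> Optional[str]:
    """Stub that simulates a database outage."""
    raise RuntimeError("DB connection lost")


async def working_lookup(user_id: str) -> Optional[str]:
    """Stub that simulates a successful lookup."""
    return "target@example.com"
```

The design choice this illustrates: a display-only field can fail open, but an operation that mutates a user's tier fails closed when the user's existence cannot be confirmed.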
--- a/autogpt_platform/backend/backend/api/features/admin/store_admin_routes.py
+++ b/autogpt_platform/backend/backend/api/features/admin/store_admin_routes.py
@@ -7,6 +7,8 @@ import fastapi
 import fastapi.responses
 import prisma.enums

+import backend.api.features.library.db as library_db
+import backend.api.features.library.model as library_model
 import backend.api.features.store.cache as store_cache
 import backend.api.features.store.db as store_db
 import backend.api.features.store.model as store_model
@@ -132,3 +134,40 @@ async def admin_download_agent_file(
        return fastapi.responses.FileResponse(
            tmp_file.name, filename=file_name, media_type="application/json"
        )
+
+
+@router.get(
+    "/submissions/{store_listing_version_id}/preview",
+    summary="Admin Preview Submission Listing",
+)
+async def admin_preview_submission(
+    store_listing_version_id: str,
+) -> store_model.StoreAgentDetails:
+    """
+    Preview a marketplace submission as it would appear on the listing page.
+    Bypasses the APPROVED-only StoreAgent view so admins can preview pending
+    submissions before approving.
+    """
+    return await store_db.get_store_agent_details_as_admin(store_listing_version_id)
+
+
+@router.post(
+    "/submissions/{store_listing_version_id}/add-to-library",
+    summary="Admin Add Pending Agent to Library",
+    status_code=201,
+)
+async def admin_add_agent_to_library(
+    store_listing_version_id: str,
+    user_id: str = fastapi.Security(autogpt_libs.auth.get_user_id),
+) -> library_model.LibraryAgent:
+    """
+    Add a pending marketplace agent to the admin's library for review.
+    Uses admin-level access to bypass marketplace APPROVED-only checks.
+
+    The builder can load the graph because get_graph() checks library
+    membership as a fallback: "you added it, you keep it."
+    """
+    return await library_db.add_store_agent_to_library_as_admin(
+        store_listing_version_id=store_listing_version_id,
+        user_id=user_id,
+    )
--- a/autogpt_platform/backend/backend/api/features/admin/store_admin_routes_test.py
+++ b/autogpt_platform/backend/backend/api/features/admin/store_admin_routes_test.py
@@ -0,0 +1,335 @@
+"""Tests for admin store routes and the bypass logic they depend on.
+
+Tests are organized by what they protect:
+- SECRT-2162: get_graph_as_admin bypasses ownership/marketplace checks
+- SECRT-2167 security: admin endpoints reject non-admin users
+- SECRT-2167 bypass: preview queries StoreListingVersion (not StoreAgent view),
+  and add-to-library uses get_graph_as_admin (not get_graph)
+"""
+
+from datetime import datetime, timezone
+from unittest.mock import AsyncMock, MagicMock, patch
+
+import fastapi
+import fastapi.responses
+import fastapi.testclient
+import pytest
+import pytest_mock
+from autogpt_libs.auth.jwt_utils import get_jwt_payload
+
+from backend.data.graph import get_graph_as_admin
+from backend.util.exceptions import NotFoundError
+
+from .store_admin_routes import router as store_admin_router
+
+# Shared constants
+ADMIN_USER_ID = "admin-user-id"
+CREATOR_USER_ID = "other-creator-id"
+GRAPH_ID = "test-graph-id"
+GRAPH_VERSION = 3
+SLV_ID = "test-store-listing-version-id"
+
+
+def _make_mock_graph(user_id: str = CREATOR_USER_ID) -> MagicMock:
+    graph = MagicMock()
+    graph.userId = user_id
+    graph.id = GRAPH_ID
+    graph.version = GRAPH_VERSION
+    graph.Nodes = []
+    return graph
+
+
+# ---- SECRT-2162: get_graph_as_admin bypasses ownership checks ---- #
+
+
+@pytest.mark.asyncio
+async def test_admin_can_access_pending_agent_not_owned() -> None:
+    """get_graph_as_admin must return a graph even when the admin doesn't own
+    it and it's not APPROVED in the marketplace."""
+    mock_graph = _make_mock_graph()
+    mock_graph_model = MagicMock(name="GraphModel")
+
+    with (
+        patch("backend.data.graph.AgentGraph.prisma") as mock_prisma,
+        patch(
+            "backend.data.graph.GraphModel.from_db",
+            return_value=mock_graph_model,
+        ),
+    ):
+        mock_prisma.return_value.find_first = AsyncMock(return_value=mock_graph)
+
+        result = await get_graph_as_admin(
+            graph_id=GRAPH_ID,
+            version=GRAPH_VERSION,
+            user_id=ADMIN_USER_ID,
+            for_export=False,
+        )
+
+    assert result is mock_graph_model
+
+
+@pytest.mark.asyncio
+async def test_admin_download_pending_agent_with_subagents() -> None:
+    """get_graph_as_admin with for_export=True must call get_sub_graphs
+    and pass sub_graphs to GraphModel.from_db."""
+    mock_graph = _make_mock_graph()
+    mock_sub_graph = MagicMock(name="SubGraph")
+    mock_graph_model = MagicMock(name="GraphModel")
+
+    with (
+        patch("backend.data.graph.AgentGraph.prisma") as mock_prisma,
+        patch(
+            "backend.data.graph.get_sub_graphs",
+            new_callable=AsyncMock,
+            return_value=[mock_sub_graph],
+        ) as mock_get_sub,
+        patch(
+            "backend.data.graph.GraphModel.from_db",
+            return_value=mock_graph_model,
+        ) as mock_from_db,
+    ):
+        mock_prisma.return_value.find_first = AsyncMock(return_value=mock_graph)
+
+        result = await get_graph_as_admin(
+            graph_id=GRAPH_ID,
+            version=GRAPH_VERSION,
+            user_id=ADMIN_USER_ID,
+            for_export=True,
+        )
+
+    assert result is mock_graph_model
+    mock_get_sub.assert_awaited_once_with(mock_graph)
+    mock_from_db.assert_called_once_with(
+        graph=mock_graph,
+        sub_graphs=[mock_sub_graph],
+        for_export=True,
+    )
+
+
+# ---- SECRT-2167 security: admin endpoints reject non-admin users ---- #
+
+app = fastapi.FastAPI()
+app.include_router(store_admin_router)
+
+
+@app.exception_handler(NotFoundError)
+async def _not_found_handler(
+    request: fastapi.Request, exc: NotFoundError
+) -> fastapi.responses.JSONResponse:
+    return fastapi.responses.JSONResponse(status_code=404, content={"detail": str(exc)})
+
+
+client = fastapi.testclient.TestClient(app)
+
+
+@pytest.fixture(autouse=True)
+def setup_app_admin_auth(mock_jwt_admin):
+    """Setup admin auth overrides for all route tests in this module."""
+    app.dependency_overrides[get_jwt_payload] = mock_jwt_admin["get_jwt_payload"]
+    yield
+    app.dependency_overrides.clear()
+
+
+def test_preview_requires_admin(mock_jwt_user) -> None:
+    """Non-admin users must get 403 on the preview endpoint."""
+    app.dependency_overrides[get_jwt_payload] = mock_jwt_user["get_jwt_payload"]
+    response = client.get(f"/admin/submissions/{SLV_ID}/preview")
+    assert response.status_code == 403
+
+
+def test_add_to_library_requires_admin(mock_jwt_user) -> None:
+    """Non-admin users must get 403 on the add-to-library endpoint."""
+    app.dependency_overrides[get_jwt_payload] = mock_jwt_user["get_jwt_payload"]
+    response = client.post(f"/admin/submissions/{SLV_ID}/add-to-library")
+    assert response.status_code == 403
+
+
+def test_preview_nonexistent_submission(
+    mocker: pytest_mock.MockerFixture,
+) -> None:
+    """Preview of a nonexistent submission returns 404."""
+    mocker.patch(
+        "backend.api.features.admin.store_admin_routes.store_db"
+        ".get_store_agent_details_as_admin",
+        side_effect=NotFoundError("not found"),
+    )
+    response = client.get(f"/admin/submissions/{SLV_ID}/preview")
+    assert response.status_code == 404
+
+
+# ---- SECRT-2167 bypass: verify the right data sources are used ---- #
+
+
+@pytest.mark.asyncio
+async def test_preview_queries_store_listing_version_not_store_agent() -> None:
+    """get_store_agent_details_as_admin must query StoreListingVersion
+    directly (not the APPROVED-only StoreAgent view). This is THE test that
+    prevents the bypass from being accidentally reverted."""
+    from backend.api.features.store.db import get_store_agent_details_as_admin
+
+    mock_slv = MagicMock()
+    mock_slv.id = SLV_ID
+    mock_slv.name = "Test Agent"
+    mock_slv.subHeading = "Short desc"
+    mock_slv.description = "Long desc"
+    mock_slv.videoUrl = None
+    mock_slv.agentOutputDemoUrl = None
+    mock_slv.imageUrls = ["https://example.com/img.png"]
+    mock_slv.instructions = None
+    mock_slv.categories = ["productivity"]
+    mock_slv.version = 1
+    mock_slv.agentGraphId = GRAPH_ID
+    mock_slv.agentGraphVersion = GRAPH_VERSION
+    mock_slv.updatedAt = datetime(2026, 3, 24, tzinfo=timezone.utc)
+    mock_slv.recommendedScheduleCron = "0 9 * * *"
+
+    mock_listing = MagicMock()
+    mock_listing.id = "listing-id"
+    mock_listing.slug = "test-agent"
+    mock_listing.activeVersionId = SLV_ID
+    mock_listing.hasApprovedVersion = False
+    mock_listing.CreatorProfile = MagicMock(username="creator", avatarUrl="")
+    mock_slv.StoreListing = mock_listing
+
+    with (
+        patch(
+            "backend.api.features.store.db.prisma.models.StoreListingVersion.prisma",
+        ) as mock_slv_prisma,
+        patch(
+            "backend.api.features.store.db.prisma.models.StoreAgent.prisma",
+        ) as mock_store_agent_prisma,
+    ):
+        mock_slv_prisma.return_value.find_unique = AsyncMock(return_value=mock_slv)
+
+        result = await get_store_agent_details_as_admin(SLV_ID)
+
+    # Verify it queried StoreListingVersion (not the APPROVED-only StoreAgent)
+    mock_slv_prisma.return_value.find_unique.assert_awaited_once()
+    await_args = mock_slv_prisma.return_value.find_unique.await_args
+    assert await_args is not None
+    assert await_args.kwargs["where"] == {"id": SLV_ID}
+
+    # Verify the APPROVED-only StoreAgent view was NOT touched
+    mock_store_agent_prisma.assert_not_called()
+
+    # Verify the result has the right data
+    assert result.agent_name == "Test Agent"
+    assert result.agent_image == ["https://example.com/img.png"]
+    assert result.has_approved_version is False
+    assert result.runs == 0
+    assert result.rating == 0.0
+
+
+@pytest.mark.asyncio
+async def test_resolve_graph_admin_uses_get_graph_as_admin() -> None:
+    """resolve_graph_for_library(admin=True) must call get_graph_as_admin,
+    not get_graph. This is THE test that prevents the add-to-library bypass
+    from being accidentally reverted."""
+    from backend.api.features.library._add_to_library import resolve_graph_for_library
+
+    mock_slv = MagicMock()
+    mock_slv.AgentGraph = MagicMock(id=GRAPH_ID, version=GRAPH_VERSION)
+    mock_graph_model = MagicMock(name="GraphModel")
+
+    with (
+        patch(
+            "backend.api.features.library._add_to_library.prisma.models"
+            ".StoreListingVersion.prisma",
+        ) as mock_prisma,
+        patch(
+            "backend.api.features.library._add_to_library.graph_db"
+            ".get_graph_as_admin",
+            new_callable=AsyncMock,
+            return_value=mock_graph_model,
+        ) as mock_admin,
+        patch(
+            "backend.api.features.library._add_to_library.graph_db.get_graph",
+            new_callable=AsyncMock,
+        ) as mock_regular,
+    ):
+        mock_prisma.return_value.find_unique = AsyncMock(return_value=mock_slv)
+
+        result = await resolve_graph_for_library(SLV_ID, ADMIN_USER_ID, admin=True)
+
+    assert result is mock_graph_model
+    mock_admin.assert_awaited_once_with(
+        graph_id=GRAPH_ID, version=GRAPH_VERSION, user_id=ADMIN_USER_ID
+    )
+    mock_regular.assert_not_awaited()
+
+
+@pytest.mark.asyncio
+async def test_resolve_graph_regular_uses_get_graph() -> None:
+    """resolve_graph_for_library(admin=False) must call get_graph,
+    not get_graph_as_admin. Ensures the non-admin path is preserved."""
+    from backend.api.features.library._add_to_library import resolve_graph_for_library
+
+    mock_slv = MagicMock()
+    mock_slv.AgentGraph = MagicMock(id=GRAPH_ID, version=GRAPH_VERSION)
+    mock_graph_model = MagicMock(name="GraphModel")
+
+    with (
+        patch(
+            "backend.api.features.library._add_to_library.prisma.models"
+            ".StoreListingVersion.prisma",
+        ) as mock_prisma,
+        patch(
+            "backend.api.features.library._add_to_library.graph_db"
+            ".get_graph_as_admin",
+            new_callable=AsyncMock,
+        ) as mock_admin,
+        patch(
+            "backend.api.features.library._add_to_library.graph_db.get_graph",
+            new_callable=AsyncMock,
+            return_value=mock_graph_model,
+        ) as mock_regular,
+    ):
+        mock_prisma.return_value.find_unique = AsyncMock(return_value=mock_slv)
+
+        result = await resolve_graph_for_library(SLV_ID, "regular-user-id", admin=False)
+
+    assert result is mock_graph_model
+    mock_regular.assert_awaited_once_with(
+        graph_id=GRAPH_ID, version=GRAPH_VERSION, user_id="regular-user-id"
+    )
+    mock_admin.assert_not_awaited()
+
+
+# ---- Library membership grants graph access (product decision) ---- #
+
+
+@pytest.mark.asyncio
+async def test_library_member_can_view_pending_agent_in_builder() -> None:
+    """After adding a pending agent to their library, the user should be
+    able to load the graph in the builder via get_graph()."""
+    mock_graph = _make_mock_graph()
+    mock_graph_model = MagicMock(name="GraphModel")
+    mock_library_agent = MagicMock()
+    mock_library_agent.AgentGraph = mock_graph
+
+    with (
+        patch("backend.data.graph.AgentGraph.prisma") as mock_ag_prisma,
+        patch(
+            "backend.data.graph.StoreListingVersion.prisma",
+        ) as mock_slv_prisma,
+        patch("backend.data.graph.LibraryAgent.prisma") as mock_lib_prisma,
+        patch(
+            "backend.data.graph.GraphModel.from_db",
+            return_value=mock_graph_model,
+        ),
+    ):
+        mock_ag_prisma.return_value.find_first = AsyncMock(return_value=None)
+        mock_slv_prisma.return_value.find_first = AsyncMock(return_value=None)
+        mock_lib_prisma.return_value.find_first = AsyncMock(
+            return_value=mock_library_agent
+        )
+
+        from backend.data.graph import get_graph
+
+        result = await get_graph(
+            graph_id=GRAPH_ID,
+            version=GRAPH_VERSION,
+            user_id=ADMIN_USER_ID,
+        )
+
+    assert result is mock_graph_model, "Library membership should grant graph access"
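
Reviewer note: the two `resolve_graph_for_library` tests above share a paired-mock pattern. Both fetchers are patched, then the test asserts that exactly one was awaited with the expected keyword arguments and that the other was never awaited. A minimal, stdlib-only sketch of that pattern follows. `resolve_graph` here is a hypothetical stand-in, not the real `resolve_graph_for_library`, and the fetchers are injected as arguments rather than patched by import path.

```python
import asyncio
from unittest.mock import AsyncMock


async def resolve_graph(graph_id, user_id, *, admin, get_graph, get_graph_as_admin):
    """Hypothetical dispatcher mirroring the admin / non-admin split under test."""
    fetch = get_graph_as_admin if admin else get_graph
    return await fetch(graph_id=graph_id, user_id=user_id)


def check_admin_path():
    # Both fetchers are mocked so the test can see which one was used.
    admin_fetch = AsyncMock(return_value="admin-graph")
    regular_fetch = AsyncMock(return_value="regular-graph")

    result = asyncio.run(
        resolve_graph(
            "g1",
            "u1",
            admin=True,
            get_graph=regular_fetch,
            get_graph_as_admin=admin_fetch,
        )
    )

    # The admin fetcher must be awaited exactly once with these kwargs...
    admin_fetch.assert_awaited_once_with(graph_id="g1", user_id="u1")
    # ...and the regular fetcher must not be awaited at all.
    regular_fetch.assert_not_awaited()
    return result
```

Asserting the negative case (`assert_not_awaited`) is what keeps a later refactor from silently routing the admin path back through the regular fetcher.
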
--- a/autogpt_platform/backend/backend/api/features/admin/user_admin_routes.py
+++ b/autogpt_platform/backend/backend/api/features/admin/user_admin_routes.py
@@ -1,137 +0,0 @@
-import logging
-import math
-
-from autogpt_libs.auth import get_user_id, requires_admin_user
-from fastapi import APIRouter, File, Query, Security, UploadFile
-
-from backend.data.invited_user import (
-    bulk_create_invited_users_from_file,
-    create_invited_user,
-    list_invited_users,
-    retry_invited_user_tally,
-    revoke_invited_user,
-)
-from backend.data.tally import mask_email
-from backend.util.models import Pagination
-
-from .model import (
-    BulkInvitedUsersResponse,
-    CreateInvitedUserRequest,
-    InvitedUserResponse,
-    InvitedUsersResponse,
-)
-
-logger = logging.getLogger(__name__)
-
-
-router = APIRouter(
-    prefix="/admin",
-    tags=["users", "admin"],
-    dependencies=[Security(requires_admin_user)],
-)
-
-
-@router.get(
-    "/invited-users",
-    response_model=InvitedUsersResponse,
-    summary="List Invited Users",
-)
-async def get_invited_users(
-    admin_user_id: str = Security(get_user_id),
-    page: int = Query(1, ge=1),
-    page_size: int = Query(50, ge=1, le=200),
-) -> InvitedUsersResponse:
-    logger.info("Admin user %s requested invited users", admin_user_id)
-    invited_users, total = await list_invited_users(page=page, page_size=page_size)
-    return InvitedUsersResponse(
-        invited_users=[InvitedUserResponse.from_record(iu) for iu in invited_users],
-        pagination=Pagination(
-            total_items=total,
-            total_pages=max(1, math.ceil(total / page_size)),
-            current_page=page,
-            page_size=page_size,
-        ),
-    )
-
-
-@router.post(
-    "/invited-users",
-    response_model=InvitedUserResponse,
-    summary="Create Invited User",
-)
-async def create_invited_user_route(
-    request: CreateInvitedUserRequest,
-    admin_user_id: str = Security(get_user_id),
-) -> InvitedUserResponse:
-    logger.info(
-        "Admin user %s creating invited user for %s",
-        admin_user_id,
-        mask_email(request.email),
-    )
-    invited_user = await create_invited_user(request.email, request.name)
-    logger.info(
-        "Admin user %s created invited user %s",
-        admin_user_id,
-        invited_user.id,
-    )
-    return InvitedUserResponse.from_record(invited_user)
-
-
-@router.post(
-    "/invited-users/bulk",
-    response_model=BulkInvitedUsersResponse,
-    summary="Bulk Create Invited Users",
-    operation_id="postV2BulkCreateInvitedUsers",
-)
-async def bulk_create_invited_users_route(
-    file: UploadFile = File(...),
-    admin_user_id: str = Security(get_user_id),
-) -> BulkInvitedUsersResponse:
-    logger.info(
-        "Admin user %s bulk invited users from %s",
-        admin_user_id,
-        file.filename or "<unnamed>",
-    )
-    content = await file.read()
-    result = await bulk_create_invited_users_from_file(file.filename, content)
-    return BulkInvitedUsersResponse.from_result(result)
-
-
-@router.post(
-    "/invited-users/{invited_user_id}/revoke",
-    response_model=InvitedUserResponse,
-    summary="Revoke Invited User",
-)
-async def revoke_invited_user_route(
-    invited_user_id: str,
-    admin_user_id: str = Security(get_user_id),
-) -> InvitedUserResponse:
-    logger.info(
-        "Admin user %s revoking invited user %s", admin_user_id, invited_user_id
-    )
-    invited_user = await revoke_invited_user(invited_user_id)
-    logger.info("Admin user %s revoked invited user %s", admin_user_id, invited_user_id)
-    return InvitedUserResponse.from_record(invited_user)
-
-
-@router.post(
-    "/invited-users/{invited_user_id}/retry-tally",
-    response_model=InvitedUserResponse,
-    summary="Retry Invited User Tally",
-)
-async def retry_invited_user_tally_route(
-    invited_user_id: str,
-    admin_user_id: str = Security(get_user_id),
-) -> InvitedUserResponse:
-    logger.info(
-        "Admin user %s retrying Tally seed for invited user %s",
-        admin_user_id,
-        invited_user_id,
-    )
-    invited_user = await retry_invited_user_tally(invited_user_id)
-    logger.info(
-        "Admin user %s retried Tally seed for invited user %s",
-        admin_user_id,
-        invited_user_id,
-    )
-    return InvitedUserResponse.from_record(invited_user)
--- a/autogpt_platform/backend/backend/api/features/admin/user_admin_routes_test.py
+++ b/autogpt_platform/backend/backend/api/features/admin/user_admin_routes_test.py
@@ -1,168 +0,0 @@
-from datetime import datetime, timezone
-from unittest.mock import AsyncMock
-
-import fastapi
-import fastapi.testclient
-import prisma.enums
-import pytest
-import pytest_mock
-from autogpt_libs.auth.jwt_utils import get_jwt_payload
-
-from backend.data.invited_user import (
-    BulkInvitedUserRowResult,
-    BulkInvitedUsersResult,
-    InvitedUserRecord,
-)
-
-from .user_admin_routes import router as user_admin_router
-
-app = fastapi.FastAPI()
-app.include_router(user_admin_router)
-
-client = fastapi.testclient.TestClient(app)
-
-
-@pytest.fixture(autouse=True)
-def setup_app_admin_auth(mock_jwt_admin):
-    app.dependency_overrides[get_jwt_payload] = mock_jwt_admin["get_jwt_payload"]
-    yield
-    app.dependency_overrides.clear()
-
-
-def _sample_invited_user() -> InvitedUserRecord:
-    now = datetime.now(timezone.utc)
-    return InvitedUserRecord(
-        id="invite-1",
-        email="invited@example.com",
-        status=prisma.enums.InvitedUserStatus.INVITED,
-        auth_user_id=None,
-        name="Invited User",
-        tally_understanding=None,
-        tally_status=prisma.enums.TallyComputationStatus.PENDING,
-        tally_computed_at=None,
-        tally_error=None,
-        created_at=now,
-        updated_at=now,
-    )
-
-
-def _sample_bulk_invited_users_result() -> BulkInvitedUsersResult:
-    return BulkInvitedUsersResult(
-        created_count=1,
-        skipped_count=1,
-        error_count=0,
-        results=[
-            BulkInvitedUserRowResult(
-                row_number=1,
-                email="invited@example.com",
-                name=None,
-                status="CREATED",
-                message="Invite created",
-                invited_user=_sample_invited_user(),
-            ),
-            BulkInvitedUserRowResult(
-                row_number=2,
-                email="duplicate@example.com",
-                name=None,
-                status="SKIPPED",
-                message="An invited user with this email already exists",
-                invited_user=None,
-            ),
-        ],
-    )
-
-
-def test_get_invited_users(
-    mocker: pytest_mock.MockerFixture,
-) -> None:
-    mocker.patch(
-        "backend.api.features.admin.user_admin_routes.list_invited_users",
-        AsyncMock(return_value=([_sample_invited_user()], 1)),
-    )
-
-    response = client.get("/admin/invited-users")
-
-    assert response.status_code == 200
-    data = response.json()
-    assert len(data["invited_users"]) == 1
-    assert data["invited_users"][0]["email"] == "invited@example.com"
-    assert data["invited_users"][0]["status"] == "INVITED"
-    assert data["pagination"]["total_items"] == 1
-    assert data["pagination"]["current_page"] == 1
-    assert data["pagination"]["page_size"] == 50
-
-
-def test_create_invited_user(
-    mocker: pytest_mock.MockerFixture,
-) -> None:
-    mocker.patch(
-        "backend.api.features.admin.user_admin_routes.create_invited_user",
-        AsyncMock(return_value=_sample_invited_user()),
-    )
-
-    response = client.post(
-        "/admin/invited-users",
-        json={"email": "invited@example.com", "name": "Invited User"},
-    )
-
-    assert response.status_code == 200
-    data = response.json()
-    assert data["email"] == "invited@example.com"
-    assert data["name"] == "Invited User"
-
-
-def test_bulk_create_invited_users(
-    mocker: pytest_mock.MockerFixture,
-) -> None:
-    mocker.patch(
-        "backend.api.features.admin.user_admin_routes.bulk_create_invited_users_from_file",
-        AsyncMock(return_value=_sample_bulk_invited_users_result()),
-    )
-
-    response = client.post(
-        "/admin/invited-users/bulk",
-        files={
-            "file": ("invites.txt", b"invited@example.com\nduplicate@example.com\n")
-        },
-    )
-
-    assert response.status_code == 200
-    data = response.json()
-    assert data["created_count"] == 1
-    assert data["skipped_count"] == 1
-    assert data["results"][0]["status"] == "CREATED"
-    assert data["results"][1]["status"] == "SKIPPED"
-
-
-def test_revoke_invited_user(
-    mocker: pytest_mock.MockerFixture,
-) -> None:
-    revoked = _sample_invited_user().model_copy(
-        update={"status": prisma.enums.InvitedUserStatus.REVOKED}
-    )
-    mocker.patch(
-        "backend.api.features.admin.user_admin_routes.revoke_invited_user",
-        AsyncMock(return_value=revoked),
-    )
-
-    response = client.post("/admin/invited-users/invite-1/revoke")
-
-    assert response.status_code == 200
-    assert response.json()["status"] == "REVOKED"
-
-
-def test_retry_invited_user_tally(
-    mocker: pytest_mock.MockerFixture,
-) -> None:
-    retried = _sample_invited_user().model_copy(
-        update={"tally_status": prisma.enums.TallyComputationStatus.RUNNING}
-    )
-    mocker.patch(
-        "backend.api.features.admin.user_admin_routes.retry_invited_user_tally",
-        AsyncMock(return_value=retried),
-    )
-
-    response = client.post("/admin/invited-users/invite-1/retry-tally")
-
-    assert response.status_code == 200
-    assert response.json()["tally_status"] == "RUNNING"
--- a/autogpt_platform/backend/backend/api/features/builder/db.py
+++ b/autogpt_platform/backend/backend/api/features/builder/db.py
@@ -4,14 +4,12 @@ from difflib import SequenceMatcher
 from typing import Any, Sequence, get_args, get_origin

 import prisma
-from prisma.enums import ContentType
 from prisma.models import mv_suggested_blocks

 import backend.api.features.library.db as library_db
 import backend.api.features.library.model as library_model
 import backend.api.features.store.db as store_db
 import backend.api.features.store.model as store_model
-from backend.api.features.store.hybrid_search import unified_hybrid_search
 from backend.blocks import load_all_blocks
 from backend.blocks._base import (
    AnyBlockSchema,
@@ -24,6 +22,7 @@ from backend.blocks.llm import LlmModel
 from backend.integrations.providers import ProviderName
 from backend.util.cache import cached
 from backend.util.models import Pagination
+from backend.util.text import split_camelcase

 from .model import (
    BlockCategoryResponse,
@@ -271,7 +270,7 @@ async def _build_cached_search_results(

    # Use hybrid search when query is present, otherwise list all blocks
    if (include_blocks or include_integrations) and normalized_query:
-        block_results, block_total, integration_total = await _hybrid_search_blocks(
+        block_results, block_total, integration_total = await _text_search_blocks(
            query=search_query,
            include_blocks=include_blocks,
            include_integrations=include_integrations,
@@ -383,117 +382,75 @@ def _collect_block_results(
    return results, block_count, integration_count


-async def _hybrid_search_blocks(
+async def _text_search_blocks(
    *,
    query: str,
    include_blocks: bool,
    include_integrations: bool,
 ) -> tuple[list[_ScoredItem], int, int]:
    """
-    Search blocks using hybrid search with builder-specific filtering.
+    Search blocks using in-memory text matching over the block registry.

-    Uses unified_hybrid_search for semantic + lexical search, then applies
-    post-filtering for block/integration types and scoring adjustments.
+    All blocks are already loaded in memory, so this is fast and reliable
+    regardless of whether OpenAI embeddings are available.

    Scoring:
-        - Base: hybrid relevance score (0-1) scaled to 0-100, plus BLOCK_SCORE_BOOST
+        - Base: text relevance via _score_primary_fields, plus BLOCK_SCORE_BOOST
          to prioritize blocks over marketplace agents in combined results
-        - +30 for exact name match, +15 for prefix name match
        - +20 if the block has an LlmModel field and the query matches an LLM model name
-
-    Args:
-        query: The search query string
-        include_blocks: Whether to include regular blocks
-        include_integrations: Whether to include integration blocks
-
-    Returns:
-        Tuple of (scored_items, block_count, integration_count)
    """
    results: list[_ScoredItem] = []
-    block_count = 0
-    integration_count = 0

    if not include_blocks and not include_integrations:
-        return results, block_count, integration_count
+        return results, 0, 0

    normalized_query = query.strip().lower()

-    # Fetch more results to account for post-filtering
-    search_results, _ = await unified_hybrid_search(
-        query=query,
-        content_types=[ContentType.BLOCK],
-        page=1,
-        page_size=150,
-        min_score=0.10,
+    all_results, _, _ = _collect_block_results(
+        include_blocks=include_blocks,
+        include_integrations=include_integrations,
    )

-    # Load all blocks for getting BlockInfo
    all_blocks = load_all_blocks()

-    for result in search_results:
-        block_id = result["content_id"]
+    for item in all_results:
+        block_info = item.item
+        assert isinstance(block_info, BlockInfo)
+        name = split_camelcase(block_info.name).lower()

-        # Skip excluded blocks
-        if block_id in EXCLUDED_BLOCK_IDS:
-            continue
+        # Build rich description including input field descriptions,
+        # matching the searchable text that the embedding pipeline uses
+        desc_parts = [block_info.description or ""]
+        block_cls = all_blocks.get(block_info.id)
+        if block_cls is not None:
+            block: AnyBlockSchema = block_cls()
+            desc_parts += [
+                f"{f}: {info.description}"
+                for f, info in block.input_schema.model_fields.items()
+                if info.description
+            ]
+        description = " ".join(desc_parts).lower()

-        metadata = result.get("metadata", {})
-        hybrid_score = result.get("relevance", 0.0)
-
-        # Get the actual block class
-        if block_id not in all_blocks:
-            continue
-
-        block_cls = all_blocks[block_id]
-        block: AnyBlockSchema = block_cls()
-
-        if block.disabled:
-            continue
-
-        # Check block/integration filter using metadata
-        is_integration = metadata.get("is_integration", False)
-
-        if is_integration and not include_integrations:
-            continue
-        if not is_integration and not include_blocks:
-            continue
-
-        # Get block info
-        block_info = block.get_info()
-
-        # Calculate final score: scale hybrid score and add builder-specific bonuses
-        # Hybrid scores are 0-1, builder scores were 0-200+
-        # Add BLOCK_SCORE_BOOST to prioritize blocks over marketplace agents
-        final_score = hybrid_score * 100 + BLOCK_SCORE_BOOST
+        score = _score_primary_fields(name, description, normalized_query)

        # Add LLM model match bonus
-        has_llm_field = metadata.get("has_llm_model_field", False)
-        if has_llm_field and _matches_llm_model(block.input_schema, normalized_query):
-            final_score += 20
+        if block_cls is not None and _matches_llm_model(
+            block_cls().input_schema, normalized_query
+        ):
+            score += 20

-        # Add exact/prefix match bonus for deterministic tie-breaking
-        name = block_info.name.lower()
-        if name == normalized_query:
-            final_score += 30
-        elif name.startswith(normalized_query):
-            final_score += 15
-
-        # Track counts
-        filter_type: FilterType = "integrations" if is_integration else "blocks"
-        if is_integration:
-            integration_count += 1
-        else:
-            block_count += 1
-
-        results.append(
-            _ScoredItem(
-                item=block_info,
-                filter_type=filter_type,
-                score=final_score,
-                sort_key=name,
+        if score >= MIN_SCORE_FOR_FILTERED_RESULTS:
+            results.append(
+                _ScoredItem(
+                    item=block_info,
+                    filter_type=item.filter_type,
+                    score=score + BLOCK_SCORE_BOOST,
+                    sort_key=name,
+                )
            )
-        )

+    block_count = sum(1 for r in results if r.filter_type == "blocks")
+    integration_count = sum(1 for r in results if r.filter_type == "integrations")
    return results, block_count, integration_count


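
Reviewer note: `_text_search_blocks` replaces the embedding-backed search with in-memory scoring over block names and descriptions. The sketch below illustrates the general shape of that approach under stated assumptions. `split_camelcase`, `score_text`, the score weights, and `MIN_SCORE` are all hypothetical stand-ins written for this note; the real `backend.util.text.split_camelcase`, `_score_primary_fields`, and `MIN_SCORE_FOR_FILTERED_RESULTS` may behave differently.

```python
import re

MIN_SCORE = 1.0  # assumed threshold; the real constant is not shown in this diff


def split_camelcase(text: str) -> str:
    """Hypothetical stand-in: add a space at each lower-to-upper boundary."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", text)


def score_text(name: str, description: str, query: str) -> float:
    """Illustrative weights only: name matches outrank description matches."""
    if not query:
        return 0.0
    score = 0.0
    if name == query:
        score += 100
    elif name.startswith(query):
        score += 60
    elif query in name:
        score += 40
    if query in description:
        score += 10
    return score


def search(blocks: dict[str, str], query: str) -> list[tuple[str, float]]:
    """Score every block in memory and keep those at or above MIN_SCORE."""
    normalized_query = query.strip().lower()
    hits = []
    for raw_name, description in blocks.items():
        name = split_camelcase(raw_name).lower()
        score = score_text(name, description.lower(), normalized_query)
        if score >= MIN_SCORE:
            hits.append((raw_name, score))
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


blocks = {
    "SendEmailBlock": "Sends an email via SMTP",
    "HttpRequestBlock": "Makes an HTTP request",
    "EmailParserBlock": "Parses raw email text",
}
results = search(blocks, "email")
```

Because every block is scored on each call, cost grows linearly with the registry size, which is acceptable when the registry is already loaded in memory, as the new docstring states.
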
--- a/autogpt_platform/backend/backend/api/features/chat/routes.py
+++ b/autogpt_platform/backend/backend/api/features/chat/routes.py
@@ -8,18 +8,20 @@ from typing import Annotated
 from uuid import uuid4

 from autogpt_libs import auth
-from fastapi import APIRouter, Depends, HTTPException, Query, Response, Security
+from fastapi import APIRouter, HTTPException, Query, Response, Security
 from fastapi.responses import StreamingResponse
 from prisma.models import UserWorkspaceFile
-from pydantic import BaseModel, Field, field_validator
+from pydantic import BaseModel, ConfigDict, Field, field_validator

 from backend.copilot import service as chat_service
 from backend.copilot import stream_registry
-from backend.copilot.config import ChatConfig
+from backend.copilot.config import ChatConfig, CopilotMode
+from backend.copilot.db import get_chat_messages_paginated
 from backend.copilot.executor.utils import enqueue_cancel_task, enqueue_copilot_turn
 from backend.copilot.model import (
    ChatMessage,
    ChatSession,
+    ChatSessionMetadata,
    append_and_save_message,
    create_chat_session,
    delete_chat_session,
@@ -27,6 +29,18 @@ from backend.copilot.model import (
    get_user_sessions,
    update_session_title,
 )
+from backend.copilot.rate_limit import (
+    CoPilotUsageStatus,
+    RateLimitExceeded,
+    acquire_reset_lock,
+    check_rate_limit,
+    get_daily_reset_count,
+    get_global_rate_limits,
+    get_usage_status,
+    increment_daily_reset_count,
+    release_reset_lock,
+    reset_daily_usage,
+)
 from backend.copilot.response_model import StreamError, StreamFinish, StreamHeartbeat
 from backend.copilot.tools.e2b_sandbox import kill_sandbox
 from backend.copilot.tools.models import (
@@ -53,10 +67,16 @@ from backend.copilot.tools.models import (
    UnderstandingUpdatedResponse,
 )
 from backend.copilot.tracking import track_user_message
+from backend.data.credit import UsageTransactionMetadata, get_user_credit_model
 from backend.data.redis_client import get_redis_async
 from backend.data.understanding import get_business_understanding
 from backend.data.workspace import get_or_create_workspace
-from backend.util.exceptions import NotFoundError
+from backend.util.exceptions import InsufficientBalanceError, NotFoundError
+from backend.util.settings import Settings
+
+settings = Settings()
+
+logger = logging.getLogger(__name__)

 config = ChatConfig()

@@ -64,8 +84,6 @@ _UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", re.I
 )

-logger = logging.getLogger(__name__)
-

 async def _validate_and_get_session(
    session_id: str,
@@ -94,6 +112,23 @@ class StreamChatRequest(BaseModel):
    file_ids: list[str] | None = Field(
        default=None, max_length=20
    )  # Workspace file IDs attached to this message
+    mode: CopilotMode | None = Field(
+        default=None,
+        description="Autopilot mode: 'fast' for baseline LLM, 'extended_thinking' for Claude Agent SDK. "
+        "If None, uses the server default (extended_thinking).",
+    )
+
+
+class CreateSessionRequest(BaseModel):
+    """Request model for creating a new chat session.
+
+    ``dry_run`` is a **top-level** field — do not nest it inside ``metadata``.
+    Extra/unknown fields are rejected (422) to prevent silent mis-use.
+    """
+
+    model_config = ConfigDict(extra="forbid")
+
+    dry_run: bool = False


 class CreateSessionResponse(BaseModel):
@@ -102,6 +137,7 @@ class CreateSessionResponse(BaseModel):
    id: str
    created_at: str
    user_id: str | None
+    metadata: ChatSessionMetadata = ChatSessionMetadata()


 class ActiveStreamInfo(BaseModel):
@@ -120,6 +156,11 @@ class SessionDetailResponse(BaseModel):
    user_id: str | None
    messages: list[dict]
    active_stream: ActiveStreamInfo | None = None  # Present if stream is still active
+    has_more_messages: bool = False
+    oldest_sequence: int | None = None
+    total_prompt_tokens: int = 0
+    total_completion_tokens: int = 0
+    metadata: ChatSessionMetadata = ChatSessionMetadata()


 class SessionSummaryResponse(BaseModel):
@@ -207,7 +248,7 @@ async def list_sessions(
            }
        except Exception:
            logger.warning(
-                "Failed to fetch processing status from Redis; " "defaulting to empty"
+                "Failed to fetch processing status from Redis; defaulting to empty"
            )

    return ListSessionsResponse(
@@ -229,7 +270,8 @@ async def list_sessions(
    "/sessions",
 )
 async def create_session(
-    user_id: Annotated[str, Depends(auth.get_user_id)],
+    user_id: Annotated[str, Security(auth.get_user_id)],
+    request: CreateSessionRequest | None = None,
 ) -> CreateSessionResponse:
    """
    Create a new chat session.
@@ -238,22 +280,28 @@ async def create_session(

    Args:
        user_id: The authenticated user ID parsed from the JWT (required).
+        request: Optional request body. When provided, ``dry_run=True``
+            forces run_block and run_agent calls to use dry-run simulation.

    Returns:
        CreateSessionResponse: Details of the created session.

    """
+    dry_run = request.dry_run if request else False
+
    logger.info(
        f"Creating session with user_id: "
        f"...{user_id[-8:] if len(user_id) > 8 else '<redacted>'}"
+        f"{', dry_run=True' if dry_run else ''}"
    )

-    session = await create_chat_session(user_id)
+    session = await create_chat_session(user_id, dry_run=dry_run)

    return CreateSessionResponse(
        id=session.session_id,
        created_at=session.started_at.isoformat(),
        user_id=session.user_id,
+        metadata=session.metadata,
    )


@@ -348,54 +396,278 @@ async def update_session_title_route(
 )
 async def get_session(
    session_id: str,
-    user_id: Annotated[str | None, Depends(auth.get_user_id)],
+    user_id: Annotated[str, Security(auth.get_user_id)],
+    limit: int = Query(default=50, ge=1, le=200),
+    before_sequence: int | None = Query(default=None, ge=0),
 ) -> SessionDetailResponse:
    """
    Retrieve the details of a specific chat session.

-    Looks up a chat session by ID for the given user (if authenticated) and returns all session data including messages.
-    If there's an active stream for this session, returns active_stream info for reconnection.
+    Supports cursor-based pagination via ``limit`` and ``before_sequence``.
+    When no pagination params are provided, returns the most recent messages.

    Args:
        session_id: The unique identifier for the desired chat session.
-        user_id: The optional authenticated user ID, or None for anonymous access.
+        user_id: The authenticated user's ID.
+        limit: Maximum number of messages to return (1-200, default 50).
+        before_sequence: Return messages with sequence < this value (cursor).

    Returns:
-        SessionDetailResponse: Details for the requested session, including active_stream info if applicable.
-
+        SessionDetailResponse: Details for the requested session, including
+            active_stream info and pagination metadata.
    """
-    session = await get_chat_session(session_id, user_id)
-    if not session:
+    page = await get_chat_messages_paginated(
+        session_id, limit, before_sequence, user_id=user_id
+    )
+    if page is None:
        raise NotFoundError(f"Session {session_id} not found.")
+    messages = [message.model_dump() for message in page.messages]

-    messages = [message.model_dump() for message in session.messages]
-
-    # Check if there's an active stream for this session
+    # Only check active stream on initial load (not on "load more" requests)
    active_stream_info = None
-    active_session, last_message_id = await stream_registry.get_active_session(
-        session_id, user_id
-    )
-    logger.info(
-        f"[GET_SESSION] session={session_id}, active_session={active_session is not None}, "
-        f"msg_count={len(messages)}, last_role={messages[-1].get('role') if messages else 'none'}"
-    )
-    if active_session:
-        # Keep the assistant message (including tool_calls) so the frontend can
-        # render the correct tool UI (e.g. CreateAgent with mini game).
-        # convertChatSessionToUiMessages handles isComplete=false by setting
-        # tool parts without output to state "input-available".
-        active_stream_info = ActiveStreamInfo(
-            turn_id=active_session.turn_id,
-            last_message_id=last_message_id,
+    if before_sequence is None:
+        active_session, last_message_id = await stream_registry.get_active_session(
+            session_id, user_id
+        )
+        logger.info(
+            f"[GET_SESSION] session={session_id}, active_session={active_session is not None}, "
+            f"msg_count={len(messages)}, last_role={messages[-1].get('role') if messages else 'none'}"
+        )
+        if active_session:
+            active_stream_info = ActiveStreamInfo(
+                turn_id=active_session.turn_id,
+                last_message_id=last_message_id,
+            )
+
+    # Skip session metadata on "load more" — frontend only needs messages
+    if before_sequence is not None:
+        return SessionDetailResponse(
+            id=page.session.session_id,
+            created_at=page.session.started_at.isoformat(),
+            updated_at=page.session.updated_at.isoformat(),
+            user_id=page.session.user_id or None,
+            messages=messages,
+            active_stream=None,
+            has_more_messages=page.has_more,
+            oldest_sequence=page.oldest_sequence,
+            total_prompt_tokens=0,
+            total_completion_tokens=0,
        )

+    total_prompt = sum(u.prompt_tokens for u in page.session.usage)
+    total_completion = sum(u.completion_tokens for u in page.session.usage)
+
    return SessionDetailResponse(
-        id=session.session_id,
-        created_at=session.started_at.isoformat(),
-        updated_at=session.updated_at.isoformat(),
-        user_id=session.user_id or None,
+        id=page.session.session_id,
+        created_at=page.session.started_at.isoformat(),
+        updated_at=page.session.updated_at.isoformat(),
+        user_id=page.session.user_id or None,
        messages=messages,
        active_stream=active_stream_info,
+        has_more_messages=page.has_more,
+        oldest_sequence=page.oldest_sequence,
+        total_prompt_tokens=total_prompt,
+        total_completion_tokens=total_completion,
+        metadata=page.session.metadata,
+    )
+
+
+@router.get(
+    "/usage",
+)
+async def get_copilot_usage(
+    user_id: Annotated[str, Security(auth.get_user_id)],
+) -> CoPilotUsageStatus:
+    """Get CoPilot usage status for the authenticated user.
+
+    Returns current token usage vs limits for daily and weekly windows.
+    Global defaults are sourced from LaunchDarkly (falling back to config).
+    Includes the user's rate-limit tier.
+    """
+    daily_limit, weekly_limit, tier = await get_global_rate_limits(
+        user_id, config.daily_token_limit, config.weekly_token_limit
+    )
+    return await get_usage_status(
+        user_id=user_id,
+        daily_token_limit=daily_limit,
+        weekly_token_limit=weekly_limit,
+        rate_limit_reset_cost=config.rate_limit_reset_cost,
+        tier=tier,
+    )
+
+
+class RateLimitResetResponse(BaseModel):
+    """Response from resetting the daily rate limit."""
+
+    success: bool
+    credits_charged: int = Field(description="Credits charged (in cents)")
+    remaining_balance: int = Field(description="Credit balance after charge (in cents)")
+    usage: CoPilotUsageStatus = Field(description="Updated usage status after reset")
+
+
+@router.post(
+    "/usage/reset",
+    status_code=200,
+    responses={
+        400: {
+            "description": "Bad Request (feature disabled, no daily limit configured, daily limit not reached, or weekly limit also reached)"
+        },
+        402: {"description": "Payment Required (insufficient credits)"},
+        429: {
+            "description": "Too Many Requests (max daily resets exceeded or reset in progress)"
+        },
+        503: {
+            "description": "Service Unavailable (reset eligibility could not be verified, or Redis reset failed; credits refunded or support needed)"
+        },
+    },
+)
+async def reset_copilot_usage(
+    user_id: Annotated[str, Security(auth.get_user_id)],
+) -> RateLimitResetResponse:
+    """Reset the daily CoPilot rate limit by spending credits.
+
+    Allows users who have hit their daily token limit to spend credits
+    to reset their daily usage counter and continue working.
+    Returns 400 if the feature is disabled or the user is not over the limit.
+    Returns 402 if the user has insufficient credits.
+    """
+    cost = config.rate_limit_reset_cost
+    if cost <= 0:
+        raise HTTPException(
+            status_code=400,
+            detail="Rate limit reset is not available.",
+        )
+
+    if not settings.config.enable_credit:
+        raise HTTPException(
+            status_code=400,
+            detail="Rate limit reset is not available (credit system is disabled).",
+        )
+
+    daily_limit, weekly_limit, tier = await get_global_rate_limits(
+        user_id, config.daily_token_limit, config.weekly_token_limit
+    )
+
+    if daily_limit <= 0:
+        raise HTTPException(
+            status_code=400,
+            detail="No daily limit is configured — nothing to reset.",
+        )
+
+    # Check max daily resets.  get_daily_reset_count returns None when Redis
+    # is unavailable; reject the reset in that case to prevent unlimited
+    # free resets when the counter store is down.
+    reset_count = await get_daily_reset_count(user_id)
+    if reset_count is None:
+        raise HTTPException(
+            status_code=503,
+            detail="Unable to verify reset eligibility — please try again later.",
+        )
+    if config.max_daily_resets > 0 and reset_count >= config.max_daily_resets:
+        raise HTTPException(
+            status_code=429,
+            detail=f"You've used all {config.max_daily_resets} resets for today.",
+        )
+
+    # Acquire a per-user lock to prevent TOCTOU races (concurrent resets).
+    if not await acquire_reset_lock(user_id):
+        raise HTTPException(
+            status_code=429,
+            detail="A reset is already in progress. Please try again.",
+        )
+
+    try:
+        # Verify the user is actually at or over their daily limit.
+        # (rate_limit_reset_cost intentionally omitted — this object is only
+        # used for limit checks, not returned to the client.)
+        usage_status = await get_usage_status(
+            user_id=user_id,
+            daily_token_limit=daily_limit,
+            weekly_token_limit=weekly_limit,
+            tier=tier,
+        )
+        if daily_limit > 0 and usage_status.daily.used < daily_limit:
+            raise HTTPException(
+                status_code=400,
+                detail="You have not reached your daily limit yet.",
+            )
+
+        # If the weekly limit is also exhausted, resetting the daily counter
+        # won't help — the user would still be blocked by the weekly limit.
+        if weekly_limit > 0 and usage_status.weekly.used >= weekly_limit:
+            raise HTTPException(
+                status_code=400,
+                detail="Your weekly limit is also reached. Resetting the daily limit won't help.",
+            )
+
+        # Charge credits.
+        credit_model = await get_user_credit_model(user_id)
+        try:
+            remaining = await credit_model.spend_credits(
+                user_id=user_id,
+                cost=cost,
+                metadata=UsageTransactionMetadata(
+                    reason="CoPilot daily rate limit reset",
+                ),
+            )
+        except InsufficientBalanceError as e:
+            raise HTTPException(
+                status_code=402,
+                detail="Insufficient credits to reset your rate limit.",
+            ) from e
+
+        # Reset daily usage in Redis.  If this fails, refund the credits
+        # so the user is not charged for a service they did not receive.
+        if not await reset_daily_usage(user_id, daily_token_limit=daily_limit):
+            # Compensate: refund the charged credits.
+            refunded = False
+            try:
+                await credit_model.top_up_credits(user_id, cost)
+                refunded = True
+                logger.warning(
+                    "Refunded %d credits to user %s after Redis reset failure",
+                    cost,
+                    user_id[:8],
+                )
+            except Exception:
+                logger.error(
+                    "CRITICAL: Failed to refund %d credits to user %s "
+                    "after Redis reset failure — manual intervention required",
+                    cost,
+                    user_id[:8],
+                    exc_info=True,
+                )
+            if refunded:
+                raise HTTPException(
+                    status_code=503,
+                    detail="Rate limit reset failed — please try again later. "
+                    "Your credits have been refunded.",
+                )
+            raise HTTPException(
+                status_code=503,
+                detail="Rate limit reset failed and the automatic refund "
+                "also failed. Please contact support for assistance.",
+            )
+
+        # Track the reset count for daily cap enforcement.
+        await increment_daily_reset_count(user_id)
+    finally:
+        await release_reset_lock(user_id)
+
+    # Return updated usage status.
+    updated_usage = await get_usage_status(
+        user_id=user_id,
+        daily_token_limit=daily_limit,
+        weekly_token_limit=weekly_limit,
+        rate_limit_reset_cost=config.rate_limit_reset_cost,
+        tier=tier,
+    )
+
+    return RateLimitResetResponse(
+        success=True,
+        credits_charged=cost,
+        remaining_balance=remaining,
+        usage=updated_usage,
    )


@@ -405,7 +677,7 @@ async def get_session(
 )
 async def cancel_session_task(
    session_id: str,
-    user_id: Annotated[str | None, Depends(auth.get_user_id)],
+    user_id: Annotated[str, Security(auth.get_user_id)],
 ) -> CancelSessionResponse:
    """Cancel the active streaming task for a session.

@@ -450,7 +722,7 @@ async def cancel_session_task(
 async def stream_chat_post(
    session_id: str,
    request: StreamChatRequest,
-    user_id: str | None = Depends(auth.get_user_id),
+    user_id: str = Security(auth.get_user_id),
 ):
    """
    Stream chat responses for a session (POST with context support).
@@ -467,7 +739,7 @@ async def stream_chat_post(
    Args:
        session_id: The chat session identifier to associate with the streamed messages.
        request: Request body containing message, is_user_message, and optional context.
-        user_id: Optional authenticated user ID.
+        user_id: Authenticated user ID.
    Returns:
        StreamingResponse: SSE-formatted response chunks.

@@ -476,9 +748,7 @@ async def stream_chat_post(
    import time

    stream_start_time = time.perf_counter()
-    log_meta = {"component": "ChatStream", "session_id": session_id}
-    if user_id:
-        log_meta["user_id"] = user_id
+    log_meta = {"component": "ChatStream", "session_id": session_id, "user_id": user_id}

    logger.info(
        f"[TIMING] stream_chat_post STARTED, session={session_id}, "
@@ -496,6 +766,22 @@ async def stream_chat_post(
        },
    )

+    # Pre-turn rate limit check (token-based).
+    # check_rate_limit short-circuits internally when both limits are 0.
+    # Global defaults sourced from LaunchDarkly, falling back to config.
+    if user_id:
+        try:
+            daily_limit, weekly_limit, _ = await get_global_rate_limits(
+                user_id, config.daily_token_limit, config.weekly_token_limit
+            )
+            await check_rate_limit(
+                user_id=user_id,
+                daily_token_limit=daily_limit,
+                weekly_token_limit=weekly_limit,
+            )
+        except RateLimitExceeded as e:
+            raise HTTPException(status_code=429, detail=str(e)) from e
+
    # Enrich message with file metadata if file_ids are provided.
    # Also sanitise file_ids so only validated, workspace-scoped IDs are
    # forwarded downstream (e.g. to the executor via enqueue_copilot_turn).
@@ -580,6 +866,7 @@ async def stream_chat_post(
        is_user_message=request.is_user_message,
        context=request.context,
        file_ids=sanitized_file_ids,
+        mode=request.mode,
    )

    setup_time = (time.perf_counter() - stream_start_time) * 1000
@@ -730,7 +1017,7 @@ async def stream_chat_post(
 )
 async def resume_session_stream(
    session_id: str,
-    user_id: str | None = Depends(auth.get_user_id),
+    user_id: str = Security(auth.get_user_id),
 ):
    """
    Resume an active stream for a session.
@@ -857,12 +1144,19 @@ async def session_assign_user(
 # ========== Suggested Prompts ==========


-class SuggestedPromptsResponse(BaseModel):
-    """Response model for user-specific suggested prompts."""
+class SuggestedTheme(BaseModel):
+    """A themed group of suggested prompts."""

+    name: str
    prompts: list[str]


+class SuggestedPromptsResponse(BaseModel):
+    """Response model for user-specific suggested prompts grouped by theme."""
+
+    themes: list[SuggestedTheme]
+
+
@router.get(
    "/suggested-prompts",
    dependencies=[Security(auth.requires_user)],
@@ -871,17 +1165,21 @@ async def get_suggested_prompts(
    user_id: Annotated[str, Security(auth.get_user_id)],
 ) -> SuggestedPromptsResponse:
    """
-    Get LLM-generated suggested prompts for the authenticated user.
+    Get LLM-generated suggested prompts grouped by theme.

    Returns personalized quick-action prompts based on the user's
-    business understanding. Returns an empty list if no custom prompts
-    are available.
+    business understanding. Returns an empty themes list if no custom
+    prompts are available.
    """
    understanding = await get_business_understanding(user_id)
-    if understanding is None:
-        return SuggestedPromptsResponse(prompts=[])
+    if understanding is None or not understanding.suggested_prompts:
+        return SuggestedPromptsResponse(themes=[])

-    return SuggestedPromptsResponse(prompts=understanding.suggested_prompts)
+    themes = [
+        SuggestedTheme(name=name, prompts=prompts)
+        for name, prompts in understanding.suggested_prompts.items()
+    ]
+    return SuggestedPromptsResponse(themes=themes)


 # ========== Configuration ==========
@@ -932,7 +1230,7 @@ async def health_check() -> dict:
    )

    # Create and retrieve session to verify full data layer
-    session = await create_chat_session(health_check_user_id)
+    session = await create_chat_session(health_check_user_id, dry_run=False)
    await get_chat_session(session.session_id, health_check_user_id)

    return {
--- a/autogpt_platform/backend/backend/api/features/chat/routes_test.py
+++ b/autogpt_platform/backend/backend/api/features/chat/routes_test.py
@@ -1,5 +1,6 @@
-"""Tests for chat API routes: session title update, file attachment validation, and suggested prompts."""
+"""Tests for chat API routes: session title update, file attachment validation, rate limiting, usage, suggested prompts, session creation (dry_run), and stream mode validation."""

+from datetime import UTC, datetime, timedelta
 from unittest.mock import AsyncMock, MagicMock

 import fastapi
@@ -8,6 +9,7 @@ import pytest
 import pytest_mock

 from backend.api.features.chat import routes as chat_routes
+from backend.copilot.rate_limit import SubscriptionTier

 app = fastapi.FastAPI()
 app.include_router(chat_routes.router)
@@ -251,6 +253,173 @@ def test_file_ids_scoped_to_workspace(mocker: pytest_mock.MockFixture):
    assert call_kwargs["where"]["isDeleted"] is False


+# ─── Rate limit → 429 ─────────────────────────────────────────────────
+
+
+def test_stream_chat_returns_429_on_daily_rate_limit(mocker: pytest_mock.MockFixture):
+    """When check_rate_limit raises RateLimitExceeded for daily limit the endpoint returns 429."""
+    from backend.copilot.rate_limit import RateLimitExceeded
+
+    _mock_stream_internals(mocker)
+    # Ensure the rate-limit branch is entered by setting a non-zero limit.
+    mocker.patch.object(chat_routes.config, "daily_token_limit", 10000)
+    mocker.patch.object(chat_routes.config, "weekly_token_limit", 50000)
+    mocker.patch(
+        "backend.api.features.chat.routes.check_rate_limit",
+        side_effect=RateLimitExceeded("daily", datetime.now(UTC) + timedelta(hours=1)),
+    )
+
+    response = client.post(
+        "/sessions/sess-1/stream",
+        json={"message": "hello"},
+    )
+    assert response.status_code == 429
+    assert "daily" in response.json()["detail"].lower()
+
+
+def test_stream_chat_returns_429_on_weekly_rate_limit(mocker: pytest_mock.MockFixture):
+    """When check_rate_limit raises RateLimitExceeded for weekly limit the endpoint returns 429."""
+    from backend.copilot.rate_limit import RateLimitExceeded
+
+    _mock_stream_internals(mocker)
+    mocker.patch.object(chat_routes.config, "daily_token_limit", 10000)
+    mocker.patch.object(chat_routes.config, "weekly_token_limit", 50000)
+    resets_at = datetime.now(UTC) + timedelta(days=3)
+    mocker.patch(
+        "backend.api.features.chat.routes.check_rate_limit",
+        side_effect=RateLimitExceeded("weekly", resets_at),
+    )
+
+    response = client.post(
+        "/sessions/sess-1/stream",
+        json={"message": "hello"},
+    )
+    assert response.status_code == 429
+    detail = response.json()["detail"].lower()
+    assert "weekly" in detail
+    assert "resets in" in detail
+
+
+def test_stream_chat_429_includes_reset_time(mocker: pytest_mock.MockFixture):
+    """The 429 response detail should include the human-readable reset time."""
+    from backend.copilot.rate_limit import RateLimitExceeded
+
+    _mock_stream_internals(mocker)
+    mocker.patch.object(chat_routes.config, "daily_token_limit", 10000)
+    mocker.patch.object(chat_routes.config, "weekly_token_limit", 50000)
+    mocker.patch(
+        "backend.api.features.chat.routes.check_rate_limit",
+        side_effect=RateLimitExceeded(
+            "daily", datetime.now(UTC) + timedelta(hours=2, minutes=30)
+        ),
+    )
+
+    response = client.post(
+        "/sessions/sess-1/stream",
+        json={"message": "hello"},
+    )
+    assert response.status_code == 429
+    detail = response.json()["detail"]
+    assert "2h" in detail
+    assert "Resets in" in detail
+
+
+# ─── Usage endpoint ───────────────────────────────────────────────────
+
+
+def _mock_usage(
+    mocker: pytest_mock.MockerFixture,
+    *,
+    daily_used: int = 500,
+    weekly_used: int = 2000,
+    daily_limit: int = 10000,
+    weekly_limit: int = 50000,
+    tier: SubscriptionTier = SubscriptionTier.FREE,
+) -> AsyncMock:
+    """Mock get_usage_status and get_global_rate_limits for usage endpoint tests.
+
+    Mocks both ``get_global_rate_limits`` (returns the given limits + tier) and
+    ``get_usage_status`` so that tests exercise the endpoint without hitting
+    LaunchDarkly or Prisma.
+    """
+    from backend.copilot.rate_limit import CoPilotUsageStatus, UsageWindow
+
+    mocker.patch(
+        "backend.api.features.chat.routes.get_global_rate_limits",
+        new_callable=AsyncMock,
+        return_value=(daily_limit, weekly_limit, tier),
+    )
+
+    resets_at = datetime.now(UTC) + timedelta(days=1)
+    status = CoPilotUsageStatus(
+        daily=UsageWindow(used=daily_used, limit=daily_limit, resets_at=resets_at),
+        weekly=UsageWindow(used=weekly_used, limit=weekly_limit, resets_at=resets_at),
+    )
+    return mocker.patch(
+        "backend.api.features.chat.routes.get_usage_status",
+        new_callable=AsyncMock,
+        return_value=status,
+    )
+
+
+def test_usage_returns_daily_and_weekly(
+    mocker: pytest_mock.MockerFixture,
+    test_user_id: str,
+) -> None:
+    """GET /usage returns daily and weekly usage."""
+    mock_get = _mock_usage(mocker, daily_used=500, weekly_used=2000)
+
+    mocker.patch.object(chat_routes.config, "daily_token_limit", 10000)
+    mocker.patch.object(chat_routes.config, "weekly_token_limit", 50000)
+
+    response = client.get("/usage")
+
+    assert response.status_code == 200
+    data = response.json()
+    assert data["daily"]["used"] == 500
+    assert data["weekly"]["used"] == 2000
+
+    mock_get.assert_called_once_with(
+        user_id=test_user_id,
+        daily_token_limit=10000,
+        weekly_token_limit=50000,
+        rate_limit_reset_cost=chat_routes.config.rate_limit_reset_cost,
+        tier=SubscriptionTier.FREE,
+    )
+
+
+def test_usage_uses_config_limits(
+    mocker: pytest_mock.MockerFixture,
+    test_user_id: str,
+) -> None:
+    """The endpoint forwards resolved limits from get_global_rate_limits to get_usage_status."""
+    mock_get = _mock_usage(mocker, daily_limit=99999, weekly_limit=77777)
+
+    mocker.patch.object(chat_routes.config, "rate_limit_reset_cost", 500)
+
+    response = client.get("/usage")
+
+    assert response.status_code == 200
+    mock_get.assert_called_once_with(
+        user_id=test_user_id,
+        daily_token_limit=99999,
+        weekly_token_limit=77777,
+        rate_limit_reset_cost=500,
+        tier=SubscriptionTier.FREE,
+    )
+
+
+def test_usage_rejects_unauthenticated_request() -> None:
+    """GET /usage should return 401 when no valid JWT is provided."""
+    unauthenticated_app = fastapi.FastAPI()
+    unauthenticated_app.include_router(chat_routes.router)
+    unauthenticated_client = fastapi.testclient.TestClient(unauthenticated_app)
+
+    response = unauthenticated_client.get("/usage")
+
+    assert response.status_code == 401
+
+
 # ─── Suggested prompts endpoint ──────────────────────────────────────


@@ -267,44 +436,146 @@ def _mock_get_business_understanding(
    )


-def test_suggested_prompts_returns_prompts(
+def test_suggested_prompts_returns_themes(
    mocker: pytest_mock.MockerFixture,
    test_user_id: str,
 ) -> None:
-    """User with understanding and prompts gets them back."""
+    """User with themed prompts gets them back as themes list."""
    mock_understanding = MagicMock()
-    mock_understanding.suggested_prompts = ["Do X", "Do Y", "Do Z"]
+    mock_understanding.suggested_prompts = {
+        "Learn": ["L1", "L2"],
+        "Create": ["C1"],
+    }
    _mock_get_business_understanding(mocker, return_value=mock_understanding)

    response = client.get("/suggested-prompts")

    assert response.status_code == 200
-    assert response.json() == {"prompts": ["Do X", "Do Y", "Do Z"]}
+    data = response.json()
+    assert "themes" in data
+    themes_by_name = {t["name"]: t["prompts"] for t in data["themes"]}
+    assert themes_by_name["Learn"] == ["L1", "L2"]
+    assert themes_by_name["Create"] == ["C1"]


 def test_suggested_prompts_no_understanding(
    mocker: pytest_mock.MockerFixture,
    test_user_id: str,
 ) -> None:
-    """User with no understanding gets empty list."""
+    """User with no understanding gets empty themes list."""
    _mock_get_business_understanding(mocker, return_value=None)

    response = client.get("/suggested-prompts")

    assert response.status_code == 200
-    assert response.json() == {"prompts": []}
+    assert response.json() == {"themes": []}


 def test_suggested_prompts_empty_prompts(
    mocker: pytest_mock.MockerFixture,
    test_user_id: str,
 ) -> None:
-    """User with understanding but no prompts gets empty list."""
+    """User with understanding but empty prompts gets empty themes list."""
    mock_understanding = MagicMock()
-    mock_understanding.suggested_prompts = []
+    mock_understanding.suggested_prompts = {}
    _mock_get_business_understanding(mocker, return_value=mock_understanding)

    response = client.get("/suggested-prompts")

    assert response.status_code == 200
-    assert response.json() == {"prompts": []}
+    assert response.json() == {"themes": []}
+
+
+# ─── Create session: dry_run contract ─────────────────────────────────
+
+
+def _mock_create_chat_session(mocker: pytest_mock.MockerFixture):
+    """Mock create_chat_session to return a fake session."""
+    from backend.copilot.model import ChatSession
+
+    async def _fake_create(user_id: str, *, dry_run: bool):
+        return ChatSession.new(user_id, dry_run=dry_run)
+
+    return mocker.patch(
+        "backend.api.features.chat.routes.create_chat_session",
+        new_callable=AsyncMock,
+        side_effect=_fake_create,
+    )
+
+
+def test_create_session_dry_run_true(
+    mocker: pytest_mock.MockerFixture,
+    test_user_id: str,
+) -> None:
+    """Sending ``{"dry_run": true}`` sets metadata.dry_run to True."""
+    _mock_create_chat_session(mocker)
+
+    response = client.post("/sessions", json={"dry_run": True})
+
+    assert response.status_code == 200
+    assert response.json()["metadata"]["dry_run"] is True
+
+
+def test_create_session_dry_run_default_false(
+    mocker: pytest_mock.MockerFixture,
+    test_user_id: str,
+) -> None:
+    """Empty body defaults dry_run to False."""
+    _mock_create_chat_session(mocker)
+
+    response = client.post("/sessions")
+
+    assert response.status_code == 200
+    assert response.json()["metadata"]["dry_run"] is False
+
+
+def test_create_session_rejects_nested_metadata(
+    test_user_id: str,
+) -> None:
+    """Sending ``{"metadata": {"dry_run": true}}`` must return 422, not silently
+    default to ``dry_run=False``. This guards against the common mistake of
+    nesting dry_run inside metadata instead of providing it at the top level."""
+    response = client.post(
+        "/sessions",
+        json={"metadata": {"dry_run": True}},
+    )
+
+    assert response.status_code == 422
+
+
+class TestStreamChatRequestModeValidation:
+    """Pydantic-level validation of the ``mode`` field on StreamChatRequest."""
+
+    def test_rejects_invalid_mode_value(self) -> None:
+        """Any string outside the Literal set must raise ValidationError."""
+        from pydantic import ValidationError
+
+        from backend.api.features.chat.routes import StreamChatRequest
+
+        with pytest.raises(ValidationError):
+            StreamChatRequest(message="hi", mode="turbo")  # type: ignore[arg-type]
+
+    def test_accepts_fast_mode(self) -> None:
+        from backend.api.features.chat.routes import StreamChatRequest
+
+        req = StreamChatRequest(message="hi", mode="fast")
+        assert req.mode == "fast"
+
+    def test_accepts_extended_thinking_mode(self) -> None:
+        from backend.api.features.chat.routes import StreamChatRequest
+
+        req = StreamChatRequest(message="hi", mode="extended_thinking")
+        assert req.mode == "extended_thinking"
+
+    def test_accepts_none_mode(self) -> None:
+        """``mode=None`` is valid (server decides via feature flags)."""
+        from backend.api.features.chat.routes import StreamChatRequest
+
+        req = StreamChatRequest(message="hi", mode=None)
+        assert req.mode is None
+
+    def test_mode_defaults_to_none_when_omitted(self) -> None:
+        from backend.api.features.chat.routes import StreamChatRequest
+
+        req = StreamChatRequest(message="hi")
+        assert req.mode is None
--- a/autogpt_platform/backend/backend/api/features/integrations/conftest.py
+++ b/autogpt_platform/backend/backend/api/features/integrations/conftest.py
@@ -0,0 +1,13 @@
+"""Override session-scoped fixtures so unit tests run without the server."""
+
+import pytest
+
+
+@pytest.fixture(scope="session")
+def server():
+    yield None
+
+
+@pytest.fixture(scope="session", autouse=True)
+def graph_cleanup():
+    yield
--- a/autogpt_platform/backend/backend/api/features/integrations/router.py
+++ b/autogpt_platform/backend/backend/api/features/integrations/router.py
@@ -34,16 +34,21 @@ from backend.data.model import (
    HostScopedCredentials,
    OAuth2Credentials,
    UserIntegrations,
+    is_sdk_default,
 )
 from backend.data.onboarding import OnboardingStep, complete_onboarding_step
 from backend.data.user import get_user_integrations
 from backend.executor.utils import add_graph_execution
 from backend.integrations.ayrshare import AyrshareClient, SocialPlatform
-from backend.integrations.credentials_store import provider_matches
+from backend.integrations.credentials_store import (
+    is_system_credential,
+    provider_matches,
+)
 from backend.integrations.creds_manager import (
    IntegrationCredentialsManager,
    create_mcp_oauth_handler,
 )
+from backend.integrations.managed_credentials import ensure_managed_credentials
 from backend.integrations.oauth import CREDENTIALS_BY_PROVIDER, HANDLERS_BY_NAME
 from backend.integrations.providers import ProviderName
 from backend.integrations.webhooks import get_webhook_manager
@@ -109,6 +114,7 @@ class CredentialsMetaResponse(BaseModel):
        default=None,
        description="Host pattern for host-scoped or MCP server URL for MCP credentials",
    )
+    is_managed: bool = False

    @model_validator(mode="before")
    @classmethod
@@ -138,6 +144,19 @@ class CredentialsMetaResponse(BaseModel):
        return None


+def to_meta_response(cred: Credentials) -> CredentialsMetaResponse:
+    return CredentialsMetaResponse(
+        id=cred.id,
+        provider=cred.provider,
+        type=cred.type,
+        title=cred.title,
+        scopes=cred.scopes if isinstance(cred, OAuth2Credentials) else None,
+        username=cred.username if isinstance(cred, OAuth2Credentials) else None,
+        host=CredentialsMetaResponse.get_host(cred),
+        is_managed=cred.is_managed,
+    )
+
+
@router.post("/{provider}/callback", summary="Exchange OAuth code for tokens")
 async def callback(
    provider: Annotated[
@@ -204,34 +223,20 @@ async def callback(
        f"and provider {provider.value}"
    )

-    return CredentialsMetaResponse(
-        id=credentials.id,
-        provider=credentials.provider,
-        type=credentials.type,
-        title=credentials.title,
-        scopes=credentials.scopes,
-        username=credentials.username,
-        host=(CredentialsMetaResponse.get_host(credentials)),
-    )
+    return to_meta_response(credentials)


 @router.get("/credentials", summary="List Credentials")
 async def list_credentials(
    user_id: Annotated[str, Security(get_user_id)],
 ) -> list[CredentialsMetaResponse]:
+    # Fire-and-forget: provision missing managed credentials in the background.
+    # The credential appears on the next page load; listing is never blocked.
+    asyncio.create_task(ensure_managed_credentials(user_id, creds_manager.store))
    credentials = await creds_manager.store.get_all_creds(user_id)

    return [
-        CredentialsMetaResponse(
-            id=cred.id,
-            provider=cred.provider,
-            type=cred.type,
-            title=cred.title,
-            scopes=cred.scopes if isinstance(cred, OAuth2Credentials) else None,
-            username=cred.username if isinstance(cred, OAuth2Credentials) else None,
-            host=CredentialsMetaResponse.get_host(cred),
-        )
-        for cred in credentials
+        to_meta_response(cred) for cred in credentials if not is_sdk_default(cred.id)
    ]


@@ -242,19 +247,11 @@ async def list_credentials_by_provider(
    ],
    user_id: Annotated[str, Security(get_user_id)],
 ) -> list[CredentialsMetaResponse]:
+    asyncio.create_task(ensure_managed_credentials(user_id, creds_manager.store))
    credentials = await creds_manager.store.get_creds_by_provider(user_id, provider)

    return [
-        CredentialsMetaResponse(
-            id=cred.id,
-            provider=cred.provider,
-            type=cred.type,
-            title=cred.title,
-            scopes=cred.scopes if isinstance(cred, OAuth2Credentials) else None,
-            username=cred.username if isinstance(cred, OAuth2Credentials) else None,
-            host=CredentialsMetaResponse.get_host(cred),
-        )
-        for cred in credentials
+        to_meta_response(cred) for cred in credentials if not is_sdk_default(cred.id)
    ]


@@ -267,18 +264,21 @@ async def get_credential(
    ],
    cred_id: Annotated[str, Path(title="The ID of the credentials to retrieve")],
    user_id: Annotated[str, Security(get_user_id)],
-) -> Credentials:
+) -> CredentialsMetaResponse:
+    if is_sdk_default(cred_id):
+        raise HTTPException(
+            status_code=status.HTTP_404_NOT_FOUND, detail="Credentials not found"
+        )
    credential = await creds_manager.get(user_id, cred_id)
    if not credential:
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND, detail="Credentials not found"
        )
-    if credential.provider != provider:
+    if not provider_matches(credential.provider, provider):
        raise HTTPException(
-            status_code=status.HTTP_404_NOT_FOUND,
-            detail="Credentials do not match the specified provider",
+            status_code=status.HTTP_404_NOT_FOUND, detail="Credentials not found"
        )
-    return credential
+    return to_meta_response(credential)


 @router.post("/{provider}/credentials", status_code=201, summary="Create Credentials")
@@ -288,16 +288,22 @@ async def create_credentials(
        ProviderName, Path(title="The provider to create credentials for")
    ],
    credentials: Credentials,
-) -> Credentials:
+) -> CredentialsMetaResponse:
+    if is_sdk_default(credentials.id):
+        raise HTTPException(
+            status_code=status.HTTP_403_FORBIDDEN,
+            detail="Cannot create credentials with a reserved ID",
+        )
    credentials.provider = provider
    try:
        await creds_manager.create(user_id, credentials)
-    except Exception as e:
+    except Exception:
+        logger.exception("Failed to store credentials")
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
-            detail=f"Failed to store credentials: {str(e)}",
+            detail="Failed to store credentials",
        )
-    return credentials
+    return to_meta_response(credentials)


 class CredentialsDeletionResponse(BaseModel):
@@ -332,15 +338,29 @@ async def delete_credentials(
        bool, Query(title="Whether to proceed if any linked webhooks are still in use")
    ] = False,
 ) -> CredentialsDeletionResponse | CredentialsDeletionNeedsConfirmationResponse:
+    if is_sdk_default(cred_id):
+        raise HTTPException(
+            status_code=status.HTTP_404_NOT_FOUND, detail="Credentials not found"
+        )
+    if is_system_credential(cred_id):
+        raise HTTPException(
+            status_code=status.HTTP_403_FORBIDDEN,
+            detail="System-managed credentials cannot be deleted",
+        )
    creds = await creds_manager.store.get_creds_by_id(user_id, cred_id)
    if not creds:
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND, detail="Credentials not found"
        )
-    if creds.provider != provider:
+    if not provider_matches(creds.provider, provider):
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND,
-            detail="Credentials do not match the specified provider",
+            detail="Credentials not found",
+        )
+    if creds.is_managed:
+        raise HTTPException(
+            status_code=status.HTTP_403_FORBIDDEN,
+            detail="AutoGPT-managed credentials cannot be deleted",
        )

    try:
--- a/autogpt_platform/backend/backend/api/features/integrations/router_test.py
+++ b/autogpt_platform/backend/backend/api/features/integrations/router_test.py
@@ -0,0 +1,570 @@
+"""Tests for credentials API security: no secret leakage, SDK defaults filtered."""
+
+from contextlib import asynccontextmanager
+from unittest.mock import AsyncMock, MagicMock, patch
+
+import fastapi
+import fastapi.testclient
+import pytest
+from pydantic import SecretStr
+
+from backend.api.features.integrations.router import router
+from backend.data.model import (
+    APIKeyCredentials,
+    HostScopedCredentials,
+    OAuth2Credentials,
+    UserPasswordCredentials,
+)
+
+app = fastapi.FastAPI()
+app.include_router(router)
+client = fastapi.testclient.TestClient(app)
+
+TEST_USER_ID = "test-user-id"
+
+
+def _make_api_key_cred(cred_id: str = "cred-123", provider: str = "openai"):
+    return APIKeyCredentials(
+        id=cred_id,
+        provider=provider,
+        title="My API Key",
+        api_key=SecretStr("sk-secret-key-value"),
+    )
+
+
+def _make_oauth2_cred(cred_id: str = "cred-456", provider: str = "github"):
+    return OAuth2Credentials(
+        id=cred_id,
+        provider=provider,
+        title="My OAuth",
+        access_token=SecretStr("ghp_secret_token"),
+        refresh_token=SecretStr("ghp_refresh_secret"),
+        scopes=["repo", "user"],
+        username="testuser",
+    )
+
+
+def _make_user_password_cred(cred_id: str = "cred-789", provider: str = "openai"):
+    return UserPasswordCredentials(
+        id=cred_id,
+        provider=provider,
+        title="My Login",
+        username=SecretStr("admin"),
+        password=SecretStr("s3cret-pass"),
+    )
+
+
+def _make_host_scoped_cred(cred_id: str = "cred-host", provider: str = "openai"):
+    return HostScopedCredentials(
+        id=cred_id,
+        provider=provider,
+        title="Host Cred",
+        host="https://api.example.com",
+        headers={"Authorization": SecretStr("Bearer top-secret")},
+    )
+
+
+def _make_sdk_default_cred(provider: str = "openai"):
+    return APIKeyCredentials(
+        id=f"{provider}-default",
+        provider=provider,
+        title=f"{provider} (default)",
+        api_key=SecretStr("sk-platform-secret-key"),
+    )
+
+
+@pytest.fixture(autouse=True)
+def setup_auth(mock_jwt_user):
+    from autogpt_libs.auth.jwt_utils import get_jwt_payload
+
+    app.dependency_overrides[get_jwt_payload] = mock_jwt_user["get_jwt_payload"]
+    yield
+    app.dependency_overrides.clear()
+
+
+class TestGetCredentialReturnsMetaOnly:
+    """GET /{provider}/credentials/{cred_id} must not return secrets."""
+
+    def test_api_key_credential_no_secret(self):
+        cred = _make_api_key_cred()
+        with (
+            patch.object(router, "dependencies", []),
+            patch("backend.api.features.integrations.router.creds_manager") as mock_mgr,
+        ):
+            mock_mgr.get = AsyncMock(return_value=cred)
+            resp = client.get("/openai/credentials/cred-123")
+
+        assert resp.status_code == 200
+        data = resp.json()
+        assert data["id"] == "cred-123"
+        assert data["provider"] == "openai"
+        assert data["type"] == "api_key"
+        assert "api_key" not in data
+        assert "sk-secret-key-value" not in str(data)
+
+    def test_oauth2_credential_no_secret(self):
+        cred = _make_oauth2_cred()
+        with patch(
+            "backend.api.features.integrations.router.creds_manager"
+        ) as mock_mgr:
+            mock_mgr.get = AsyncMock(return_value=cred)
+            resp = client.get("/github/credentials/cred-456")
+
+        assert resp.status_code == 200
+        data = resp.json()
+        assert data["id"] == "cred-456"
+        assert data["scopes"] == ["repo", "user"]
+        assert data["username"] == "testuser"
+        assert "access_token" not in data
+        assert "refresh_token" not in data
+        assert "ghp_" not in str(data)
+
+    def test_user_password_credential_no_secret(self):
+        cred = _make_user_password_cred()
+        with patch(
+            "backend.api.features.integrations.router.creds_manager"
+        ) as mock_mgr:
+            mock_mgr.get = AsyncMock(return_value=cred)
+            resp = client.get("/openai/credentials/cred-789")
+
+        assert resp.status_code == 200
+        data = resp.json()
+        assert data["id"] == "cred-789"
+        assert "password" not in data
+        assert "username" not in data or data["username"] is None
+        assert "s3cret-pass" not in str(data)
+        assert "admin" not in str(data)
+
+    def test_host_scoped_credential_no_secret(self):
+        cred = _make_host_scoped_cred()
+        with patch(
+            "backend.api.features.integrations.router.creds_manager"
+        ) as mock_mgr:
+            mock_mgr.get = AsyncMock(return_value=cred)
+            resp = client.get("/openai/credentials/cred-host")
+
+        assert resp.status_code == 200
+        data = resp.json()
+        assert data["id"] == "cred-host"
+        assert data["host"] == "https://api.example.com"
+        assert "headers" not in data
+        assert "top-secret" not in str(data)
+
+    def test_get_credential_wrong_provider_returns_404(self):
+        """Provider mismatch should return generic 404, not leak credential existence."""
+        cred = _make_api_key_cred(provider="openai")
+        with patch(
+            "backend.api.features.integrations.router.creds_manager"
+        ) as mock_mgr:
+            mock_mgr.get = AsyncMock(return_value=cred)
+            resp = client.get("/github/credentials/cred-123")
+
+        assert resp.status_code == 404
+        assert resp.json()["detail"] == "Credentials not found"
+
+    def test_list_credentials_no_secrets(self):
+        """List endpoint must not leak secrets in any credential."""
+        creds = [_make_api_key_cred(), _make_oauth2_cred()]
+        with patch(
+            "backend.api.features.integrations.router.creds_manager"
+        ) as mock_mgr:
+            mock_mgr.store.get_all_creds = AsyncMock(return_value=creds)
+            resp = client.get("/credentials")
+
+        assert resp.status_code == 200
+        raw = str(resp.json())
+        assert "sk-secret-key-value" not in raw
+        assert "ghp_secret_token" not in raw
+        assert "ghp_refresh_secret" not in raw
+
+
+class TestSdkDefaultCredentialsNotAccessible:
+    """SDK default credentials (ID ending in '-default') must be hidden."""
+
+    def test_get_sdk_default_returns_404(self):
+        with patch(
+            "backend.api.features.integrations.router.creds_manager"
+        ) as mock_mgr:
+            mock_mgr.get = AsyncMock()
+            resp = client.get("/openai/credentials/openai-default")
+
+        assert resp.status_code == 404
+        mock_mgr.get.assert_not_called()
+
+    def test_list_credentials_excludes_sdk_defaults(self):
+        user_cred = _make_api_key_cred()
+        sdk_cred = _make_sdk_default_cred("openai")
+        with patch(
+            "backend.api.features.integrations.router.creds_manager"
+        ) as mock_mgr:
+            mock_mgr.store.get_all_creds = AsyncMock(return_value=[user_cred, sdk_cred])
+            resp = client.get("/credentials")
+
+        assert resp.status_code == 200
+        data = resp.json()
+        ids = [c["id"] for c in data]
+        assert "cred-123" in ids
+        assert "openai-default" not in ids
+
+    def test_list_by_provider_excludes_sdk_defaults(self):
+        user_cred = _make_api_key_cred()
+        sdk_cred = _make_sdk_default_cred("openai")
+        with patch(
+            "backend.api.features.integrations.router.creds_manager"
+        ) as mock_mgr:
+            mock_mgr.store.get_creds_by_provider = AsyncMock(
+                return_value=[user_cred, sdk_cred]
+            )
+            resp = client.get("/openai/credentials")
+
+        assert resp.status_code == 200
+        data = resp.json()
+        ids = [c["id"] for c in data]
+        assert "cred-123" in ids
+        assert "openai-default" not in ids
+
+    def test_delete_sdk_default_returns_404(self):
+        with patch(
+            "backend.api.features.integrations.router.creds_manager"
+        ) as mock_mgr:
+            mock_mgr.store.get_creds_by_id = AsyncMock()
+            resp = client.request("DELETE", "/openai/credentials/openai-default")
+
+        assert resp.status_code == 404
+        mock_mgr.store.get_creds_by_id.assert_not_called()
+
+
+class TestCreateCredentialNoSecretInResponse:
+    """POST /{provider}/credentials must not return secrets."""
+
+    def test_create_api_key_no_secret_in_response(self):
+        with patch(
+            "backend.api.features.integrations.router.creds_manager"
+        ) as mock_mgr:
+            mock_mgr.create = AsyncMock()
+            resp = client.post(
+                "/openai/credentials",
+                json={
+                    "id": "new-cred",
+                    "provider": "openai",
+                    "type": "api_key",
+                    "title": "New Key",
+                    "api_key": "sk-newsecret",
+                },
+            )
+
+        assert resp.status_code == 201
+        data = resp.json()
+        assert data["id"] == "new-cred"
+        assert "api_key" not in data
+        assert "sk-newsecret" not in str(data)
+
+    def test_create_with_sdk_default_id_rejected(self):
+        with patch(
+            "backend.api.features.integrations.router.creds_manager"
+        ) as mock_mgr:
+            mock_mgr.create = AsyncMock()
+            resp = client.post(
+                "/openai/credentials",
+                json={
+                    "id": "openai-default",
+                    "provider": "openai",
+                    "type": "api_key",
+                    "title": "Sneaky",
+                    "api_key": "sk-evil",
+                },
+            )
+
+        assert resp.status_code == 403
+        mock_mgr.create.assert_not_called()
+
+
+class TestManagedCredentials:
+    """AutoGPT-managed credentials cannot be deleted by users."""
+
+    def test_delete_is_managed_returns_403(self):
+        cred = APIKeyCredentials(
+            id="managed-cred-1",
+            provider="agent_mail",
+            title="AgentMail (managed by AutoGPT)",
+            api_key=SecretStr("sk-managed-key"),
+            is_managed=True,
+        )
+        with patch(
+            "backend.api.features.integrations.router.creds_manager"
+        ) as mock_mgr:
+            mock_mgr.store.get_creds_by_id = AsyncMock(return_value=cred)
+            resp = client.request("DELETE", "/agent_mail/credentials/managed-cred-1")
+
+        assert resp.status_code == 403
+        assert "AutoGPT-managed" in resp.json()["detail"]
+
+    def test_list_credentials_includes_is_managed_field(self):
+        managed = APIKeyCredentials(
+            id="managed-1",
+            provider="agent_mail",
+            title="AgentMail (managed)",
+            api_key=SecretStr("sk-key"),
+            is_managed=True,
+        )
+        regular = APIKeyCredentials(
+            id="regular-1",
+            provider="openai",
+            title="My Key",
+            api_key=SecretStr("sk-key"),
+        )
+        with patch(
+            "backend.api.features.integrations.router.creds_manager"
+        ) as mock_mgr:
+            mock_mgr.store.get_all_creds = AsyncMock(return_value=[managed, regular])
+            resp = client.get("/credentials")
+
+        assert resp.status_code == 200
+        data = resp.json()
+        managed_cred = next(c for c in data if c["id"] == "managed-1")
+        regular_cred = next(c for c in data if c["id"] == "regular-1")
+        assert managed_cred["is_managed"] is True
+        assert regular_cred["is_managed"] is False
+
+
+# ---------------------------------------------------------------------------
+# Managed credential provisioning infrastructure
+# ---------------------------------------------------------------------------
+
+
+def _make_managed_cred(
+    provider: str = "agent_mail", pod_id: str = "pod-abc"
+) -> APIKeyCredentials:
+    return APIKeyCredentials(
+        id="managed-auto",
+        provider=provider,
+        title="AgentMail (managed by AutoGPT)",
+        api_key=SecretStr("sk-pod-key"),
+        is_managed=True,
+        metadata={"pod_id": pod_id},
+    )
+
+
+def _make_store_mock(**kwargs) -> MagicMock:
+    """Store mock: ``await locks()`` gives a no-op async ``locked(key)`` manager."""
+
+    @asynccontextmanager
+    async def _noop_locked(key):
+        yield
+
+    locks_obj = MagicMock()
+    locks_obj.locked = _noop_locked
+
+    store = MagicMock(**kwargs)
+    store.locks = AsyncMock(return_value=locks_obj)
+    return store
+
+
+class TestEnsureManagedCredentials:
+    """Unit tests for ensure_managed_credentials in managed_credentials.py."""
+
+    @pytest.mark.asyncio
+    async def test_provisions_when_missing(self):
+        """Provider.provision() is called when no managed credential exists."""
+        from backend.integrations.managed_credentials import (
+            _PROVIDERS,
+            _provisioned_users,
+            ensure_managed_credentials,
+        )
+
+        cred = _make_managed_cred()
+        provider = MagicMock()
+        provider.provider_name = "test_provider"
+        provider.is_available = AsyncMock(return_value=True)
+        provider.provision = AsyncMock(return_value=cred)
+
+        store = _make_store_mock()
+        store.has_managed_credential = AsyncMock(return_value=False)
+        store.add_managed_credential = AsyncMock()
+
+        saved = dict(_PROVIDERS)
+        _PROVIDERS.clear()
+        _PROVIDERS["test_provider"] = provider
+        _provisioned_users.pop("user-1", None)
+        try:
+            await ensure_managed_credentials("user-1", store)
+        finally:
+            _PROVIDERS.clear()
+            _PROVIDERS.update(saved)
+            _provisioned_users.pop("user-1", None)
+
+        provider.provision.assert_awaited_once_with("user-1")
+        store.add_managed_credential.assert_awaited_once_with("user-1", cred)
+
+    @pytest.mark.asyncio
+    async def test_skips_when_already_exists(self):
+        """Provider.provision() is NOT called when managed credential exists."""
+        from backend.integrations.managed_credentials import (
+            _PROVIDERS,
+            _provisioned_users,
+            ensure_managed_credentials,
+        )
+
+        provider = MagicMock()
+        provider.provider_name = "test_provider"
+        provider.is_available = AsyncMock(return_value=True)
+        provider.provision = AsyncMock()
+
+        store = _make_store_mock()
+        store.has_managed_credential = AsyncMock(return_value=True)
+
+        saved = dict(_PROVIDERS)
+        _PROVIDERS.clear()
+        _PROVIDERS["test_provider"] = provider
+        _provisioned_users.pop("user-1", None)
+        try:
+            await ensure_managed_credentials("user-1", store)
+        finally:
+            _PROVIDERS.clear()
+            _PROVIDERS.update(saved)
+            _provisioned_users.pop("user-1", None)
+
+        provider.provision.assert_not_awaited()
+
+    @pytest.mark.asyncio
+    async def test_skips_when_unavailable(self):
+        """Provider.provision() is NOT called when provider is not available."""
+        from backend.integrations.managed_credentials import (
+            _PROVIDERS,
+            _provisioned_users,
+            ensure_managed_credentials,
+        )
+
+        provider = MagicMock()
+        provider.provider_name = "test_provider"
+        provider.is_available = AsyncMock(return_value=False)
+        provider.provision = AsyncMock()
+
+        store = _make_store_mock()
+        store.has_managed_credential = AsyncMock()
+
+        saved = dict(_PROVIDERS)
+        _PROVIDERS.clear()
+        _PROVIDERS["test_provider"] = provider
+        _provisioned_users.pop("user-1", None)
+        try:
+            await ensure_managed_credentials("user-1", store)
+        finally:
+            _PROVIDERS.clear()
+            _PROVIDERS.update(saved)
+            _provisioned_users.pop("user-1", None)
+
+        provider.provision.assert_not_awaited()
+        store.has_managed_credential.assert_not_awaited()
+
+    @pytest.mark.asyncio
+    async def test_provision_failure_does_not_propagate(self):
+        """A failed provision is logged but does not raise."""
+        from backend.integrations.managed_credentials import (
+            _PROVIDERS,
+            _provisioned_users,
+            ensure_managed_credentials,
+        )
+
+        provider = MagicMock()
+        provider.provider_name = "test_provider"
+        provider.is_available = AsyncMock(return_value=True)
+        provider.provision = AsyncMock(side_effect=RuntimeError("boom"))
+
+        store = _make_store_mock()
+        store.has_managed_credential = AsyncMock(return_value=False)
+
+        saved = dict(_PROVIDERS)
+        _PROVIDERS.clear()
+        _PROVIDERS["test_provider"] = provider
+        _provisioned_users.pop("user-1", None)
+        try:
+            await ensure_managed_credentials("user-1", store)
+        finally:
+            _PROVIDERS.clear()
+            _PROVIDERS.update(saved)
+            _provisioned_users.pop("user-1", None)
+
+        # No exception raised — provisioning failure is swallowed.
+
+
+class TestCleanupManagedCredentials:
+    """Unit tests for cleanup_managed_credentials."""
+
+    @pytest.mark.asyncio
+    async def test_calls_deprovision_for_managed_creds(self):
+        from backend.integrations.managed_credentials import (
+            _PROVIDERS,
+            cleanup_managed_credentials,
+        )
+
+        cred = _make_managed_cred()
+        provider = MagicMock()
+        provider.provider_name = "agent_mail"
+        provider.deprovision = AsyncMock()
+
+        store = MagicMock()
+        store.get_all_creds = AsyncMock(return_value=[cred])
+
+        saved = dict(_PROVIDERS)
+        _PROVIDERS.clear()
+        _PROVIDERS["agent_mail"] = provider
+        try:
+            await cleanup_managed_credentials("user-1", store)
+        finally:
+            _PROVIDERS.clear()
+            _PROVIDERS.update(saved)
+
+        provider.deprovision.assert_awaited_once_with("user-1", cred)
+
+    @pytest.mark.asyncio
+    async def test_skips_non_managed_creds(self):
+        from backend.integrations.managed_credentials import (
+            _PROVIDERS,
+            cleanup_managed_credentials,
+        )
+
+        regular = _make_api_key_cred()
+        provider = MagicMock()
+        provider.provider_name = "openai"
+        provider.deprovision = AsyncMock()
+
+        store = MagicMock()
+        store.get_all_creds = AsyncMock(return_value=[regular])
+
+        saved = dict(_PROVIDERS)
+        _PROVIDERS.clear()
+        _PROVIDERS["openai"] = provider
+        try:
+            await cleanup_managed_credentials("user-1", store)
+        finally:
+            _PROVIDERS.clear()
+            _PROVIDERS.update(saved)
+
+        provider.deprovision.assert_not_awaited()
+
+    @pytest.mark.asyncio
+    async def test_deprovision_failure_does_not_propagate(self):
+        from backend.integrations.managed_credentials import (
+            _PROVIDERS,
+            cleanup_managed_credentials,
+        )
+
+        cred = _make_managed_cred()
+        provider = MagicMock()
+        provider.provider_name = "agent_mail"
+        provider.deprovision = AsyncMock(side_effect=RuntimeError("boom"))
+
+        store = MagicMock()
+        store.get_all_creds = AsyncMock(return_value=[cred])
+
+        saved = dict(_PROVIDERS)
+        _PROVIDERS.clear()
+        _PROVIDERS["agent_mail"] = provider
+        try:
+            await cleanup_managed_credentials("user-1", store)
+        finally:
+            _PROVIDERS.clear()
+            _PROVIDERS.update(saved)
+
+        # No exception raised — cleanup failure is swallowed.
--- a/autogpt_platform/backend/backend/api/features/library/_add_to_library.py
+++ b/autogpt_platform/backend/backend/api/features/library/_add_to_library.py
@@ -0,0 +1,120 @@
+"""Shared logic for adding store agents to a user's library.
+
+Both `add_store_agent_to_library` and `add_store_agent_to_library_as_admin`
+delegate to these helpers so the duplication-prone create/restore/dedup
+logic lives in exactly one place.
+"""
+
+import logging
+
+import prisma.errors
+import prisma.models
+
+import backend.api.features.library.model as library_model
+import backend.data.graph as graph_db
+from backend.data.graph import GraphModel, GraphSettings
+from backend.data.includes import library_agent_include
+from backend.util.exceptions import NotFoundError
+from backend.util.json import SafeJson
+
+logger = logging.getLogger(__name__)
+
+
+async def resolve_graph_for_library(
+    store_listing_version_id: str,
+    user_id: str,
+    *,
+    admin: bool,
+) -> GraphModel:
+    """Look up a StoreListingVersion and resolve its graph.
+
+    When ``admin=True``, uses ``get_graph_as_admin`` to bypass the marketplace
+    APPROVED-only check.  Otherwise uses the regular ``get_graph``.
+    """
+    slv = await prisma.models.StoreListingVersion.prisma().find_unique(
+        where={"id": store_listing_version_id}, include={"AgentGraph": True}
+    )
+    if not slv or not slv.AgentGraph:
+        raise NotFoundError(
+            f"Store listing version {store_listing_version_id} not found or invalid"
+        )
+
+    ag = slv.AgentGraph
+    if admin:
+        graph_model = await graph_db.get_graph_as_admin(
+            graph_id=ag.id, version=ag.version, user_id=user_id
+        )
+    else:
+        graph_model = await graph_db.get_graph(
+            graph_id=ag.id, version=ag.version, user_id=user_id
+        )
+
+    if not graph_model:
+        raise NotFoundError(f"Graph #{ag.id} v{ag.version} not found or accessible")
+    return graph_model
+
+
+async def add_graph_to_library(
+    store_listing_version_id: str,
+    graph_model: GraphModel,
+    user_id: str,
+) -> library_model.LibraryAgent:
+    """Create a new LibraryAgent, or restore the existing one if it already exists.
+
+    Uses a create-then-catch-UniqueViolationError-then-update pattern on
+    the (userId, agentGraphId, agentGraphVersion) composite unique constraint.
+    This is more robust than ``upsert`` because Prisma's upsert atomicity
+    guarantees are not well-documented for all versions.
+    """
+    settings_json = SafeJson(GraphSettings.from_graph(graph_model).model_dump())
+    _include = library_agent_include(
+        user_id, include_nodes=False, include_executions=False
+    )
+
+    try:
+        added_agent = await prisma.models.LibraryAgent.prisma().create(
+            data={
+                "User": {"connect": {"id": user_id}},
+                "AgentGraph": {
+                    "connect": {
+                        "graphVersionId": {
+                            "id": graph_model.id,
+                            "version": graph_model.version,
+                        }
+                    }
+                },
+                "isCreatedByUser": False,
+                "useGraphIsActiveVersion": False,
+                "settings": settings_json,
+            },
+            include=_include,
+        )
+    except prisma.errors.UniqueViolationError:
+        # Already exists — update to restore if previously soft-deleted/archived
+        added_agent = await prisma.models.LibraryAgent.prisma().update(
+            where={
+                "userId_agentGraphId_agentGraphVersion": {
+                    "userId": user_id,
+                    "agentGraphId": graph_model.id,
+                    "agentGraphVersion": graph_model.version,
+                }
+            },
+            data={
+                "isDeleted": False,
+                "isArchived": False,
+                "settings": settings_json,
+            },
+            include=_include,
+        )
+        if added_agent is None:
+            raise NotFoundError(
+                f"LibraryAgent for graph #{graph_model.id} "
+                f"v{graph_model.version} not found after UniqueViolationError"
+            )
+
+    logger.debug(
+        f"Added graph #{graph_model.id} v{graph_model.version} "
+        f"for store listing version #{store_listing_version_id} "
+        f"to library for user #{user_id}"
+    )
+    return library_model.LibraryAgent.from_db(added_agent)
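Note on the create-then-catch-then-update pattern used by `add_graph_to_library` above: the same idea can be shown without Prisma. The sketch below uses the standard-library `sqlite3` module, where a violated `UNIQUE` constraint raises `sqlite3.IntegrityError`. The table layout and the names `make_db` and `add_or_restore` are assumptions made for illustration; they do not mirror the real schema.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """In-memory table with a composite unique key, loosely modelled on LibraryAgent."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE library_agent (
            user_id     TEXT    NOT NULL,
            graph_id    TEXT    NOT NULL,
            version     INTEGER NOT NULL,
            is_deleted  INTEGER NOT NULL DEFAULT 0,
            is_archived INTEGER NOT NULL DEFAULT 0,
            UNIQUE (user_id, graph_id, version)
        )
        """
    )
    return conn


def add_or_restore(
    conn: sqlite3.Connection, user_id: str, graph_id: str, version: int
) -> str:
    """Insert a row; if the unique key already exists, un-delete it instead."""
    key = (user_id, graph_id, version)
    try:
        conn.execute(
            "INSERT INTO library_agent (user_id, graph_id, version) VALUES (?, ?, ?)",
            key,
        )
        return "created"
    except sqlite3.IntegrityError:
        # The row already exists: clear the soft-delete and archive flags.
        conn.execute(
            "UPDATE library_agent SET is_deleted = 0, is_archived = 0 "
            "WHERE user_id = ? AND graph_id = ? AND version = ?",
            key,
        )
        return "restored"
```

Attempting the insert first lets the database's unique constraint arbitrate between concurrent callers, which a separate "check, then insert" sequence would not.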
--- a/autogpt_platform/backend/backend/api/features/library/_add_to_library_test.py
+++ b/autogpt_platform/backend/backend/api/features/library/_add_to_library_test.py
@@ -0,0 +1,80 @@
+from unittest.mock import AsyncMock, MagicMock, patch
+
+import prisma.errors
+import pytest
+
+from ._add_to_library import add_graph_to_library
+
+
+@pytest.mark.asyncio
+async def test_add_graph_to_library_create_new_agent() -> None:
+    """When no matching LibraryAgent exists, create inserts a new one."""
+    graph_model = MagicMock(id="graph-id", version=2, nodes=[])
+    created_agent = MagicMock(name="CreatedLibraryAgent")
+    converted_agent = MagicMock(name="ConvertedLibraryAgent")
+
+    with (
+        patch(
+            "backend.api.features.library._add_to_library.prisma.models.LibraryAgent.prisma"
+        ) as mock_prisma,
+        patch(
+            "backend.api.features.library._add_to_library.library_model.LibraryAgent.from_db",
+            return_value=converted_agent,
+        ) as mock_from_db,
+    ):
+        mock_prisma.return_value.create = AsyncMock(return_value=created_agent)
+
+        result = await add_graph_to_library("slv-id", graph_model, "user-id")
+
+    assert result is converted_agent
+    mock_from_db.assert_called_once_with(created_agent)
+    # Verify create was called with correct data
+    create_call = mock_prisma.return_value.create.call_args
+    create_data = create_call.kwargs["data"]
+    assert create_data["User"] == {"connect": {"id": "user-id"}}
+    assert create_data["AgentGraph"] == {
+        "connect": {"graphVersionId": {"id": "graph-id", "version": 2}}
+    }
+    assert create_data["isCreatedByUser"] is False
+    assert create_data["useGraphIsActiveVersion"] is False
+
+
+@pytest.mark.asyncio
+async def test_add_graph_to_library_unique_violation_updates_existing() -> None:
+    """UniqueViolationError on create falls back to update."""
+    graph_model = MagicMock(id="graph-id", version=2, nodes=[])
+    updated_agent = MagicMock(name="UpdatedLibraryAgent")
+    converted_agent = MagicMock(name="ConvertedLibraryAgent")
+
+    with (
+        patch(
+            "backend.api.features.library._add_to_library.prisma.models.LibraryAgent.prisma"
+        ) as mock_prisma,
+        patch(
+            "backend.api.features.library._add_to_library.library_model.LibraryAgent.from_db",
+            return_value=converted_agent,
+        ) as mock_from_db,
+    ):
+        mock_prisma.return_value.create = AsyncMock(
+            side_effect=prisma.errors.UniqueViolationError(
+                MagicMock(), message="unique constraint"
+            )
+        )
+        mock_prisma.return_value.update = AsyncMock(return_value=updated_agent)
+
+        result = await add_graph_to_library("slv-id", graph_model, "user-id")
+
+    assert result is converted_agent
+    mock_from_db.assert_called_once_with(updated_agent)
+    # Verify update was called with correct where and data
+    update_call = mock_prisma.return_value.update.call_args
+    assert update_call.kwargs["where"] == {
+        "userId_agentGraphId_agentGraphVersion": {
+            "userId": "user-id",
+            "agentGraphId": "graph-id",
+            "agentGraphVersion": 2,
+        }
+    }
+    update_data = update_call.kwargs["data"]
+    assert update_data["isDeleted"] is False
+    assert update_data["isArchived"] is False
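
Reviewer note: the two tests above exercise a create-first, update-on-conflict flow. The following is a minimal sketch of that pattern against an in-memory stand-in, not the real `add_graph_to_library` implementation. `FakeStore`, `UniqueViolation`, and `add_to_library` are hypothetical names introduced for illustration only; the real code uses the Prisma client and `prisma.errors.UniqueViolationError`.

```python
import asyncio


class UniqueViolation(Exception):
    """Hypothetical stand-in for prisma.errors.UniqueViolationError."""


class FakeStore:
    """Hypothetical in-memory table keyed by (user_id, graph_id, version)."""

    def __init__(self) -> None:
        self.rows: dict[tuple[str, str, int], dict] = {}

    async def create(self, key: tuple[str, str, int], data: dict) -> dict:
        # Mimic a composite unique constraint on the key.
        if key in self.rows:
            raise UniqueViolation(key)
        self.rows[key] = dict(data)
        return self.rows[key]

    async def update(self, key: tuple[str, str, int], data: dict) -> dict:
        self.rows[key].update(data)
        return self.rows[key]


async def add_to_library(
    store: FakeStore, user_id: str, graph_id: str, version: int
) -> dict:
    key = (user_id, graph_id, version)
    restored = {"isDeleted": False, "isArchived": False}
    try:
        return await store.create(key, restored)
    except UniqueViolation:
        # The row already exists (possibly soft-deleted or archived): restore it.
        return await store.update(key, restored)


async def demo() -> tuple[dict, dict, FakeStore]:
    store = FakeStore()
    first = await add_to_library(store, "user-id", "graph-id", 2)
    first["isDeleted"] = True  # simulate a soft delete
    second = await add_to_library(store, "user-id", "graph-id", 2)
    return first, second, store
```

Attempting the create first and handling the conflict avoids a separate read-before-write, which is presumably why the tests assert on both the `create` and the `update` call arguments.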
--- a/autogpt_platform/backend/backend/api/features/library/db.py
+++ b/autogpt_platform/backend/backend/api/features/library/db.py
@@ -336,12 +336,15 @@ async def get_library_agent_by_graph_id(
    user_id: str,
    graph_id: str,
    graph_version: Optional[int] = None,
+    include_archived: bool = False,
 ) -> library_model.LibraryAgent | None:
    filter: prisma.types.LibraryAgentWhereInput = {
        "agentGraphId": graph_id,
        "userId": user_id,
        "isDeleted": False,
    }
+    if not include_archived:
+        filter["isArchived"] = False
    if graph_version is not None:
        filter["agentGraphVersion"] = graph_version

@@ -433,32 +436,58 @@ async def create_library_agent(
    async with transaction() as tx:
        library_agents = await asyncio.gather(
            *(
-                prisma.models.LibraryAgent.prisma(tx).create(
-                    data=prisma.types.LibraryAgentCreateInput(
-                        isCreatedByUser=(user_id == user_id),
-                        useGraphIsActiveVersion=True,
-                        User={"connect": {"id": user_id}},
-                        AgentGraph={
-                            "connect": {
-                                "graphVersionId": {
-                                    "id": graph_entry.id,
-                                    "version": graph_entry.version,
+                prisma.models.LibraryAgent.prisma(tx).upsert(
+                    where={
+                        "userId_agentGraphId_agentGraphVersion": {
+                            "userId": user_id,
+                            "agentGraphId": graph_entry.id,
+                            "agentGraphVersion": graph_entry.version,
+                        }
+                    },
+                    data={
+                        "create": prisma.types.LibraryAgentCreateInput(
+                            isCreatedByUser=(user_id == graph.user_id),
+                            useGraphIsActiveVersion=True,
+                            User={"connect": {"id": user_id}},
+                            AgentGraph={
+                                "connect": {
+                                    "graphVersionId": {
+                                        "id": graph_entry.id,
+                                        "version": graph_entry.version,
+                                    }
                                }
-                            }
+                            },
+                            settings=SafeJson(
+                                GraphSettings.from_graph(
+                                    graph_entry,
+                                    hitl_safe_mode=hitl_safe_mode,
+                                    sensitive_action_safe_mode=sensitive_action_safe_mode,
+                                ).model_dump()
+                            ),
+                            **(
+                                {"Folder": {"connect": {"id": folder_id}}}
+                                if folder_id and graph_entry is graph
+                                else {}
+                            ),
+                        ),
+                        "update": {
+                            "isDeleted": False,
+                            "isArchived": False,
+                            "useGraphIsActiveVersion": True,
+                            "settings": SafeJson(
+                                GraphSettings.from_graph(
+                                    graph_entry,
+                                    hitl_safe_mode=hitl_safe_mode,
+                                    sensitive_action_safe_mode=sensitive_action_safe_mode,
+                                ).model_dump()
+                            ),
+                            **(
+                                {"Folder": {"connect": {"id": folder_id}}}
+                                if folder_id and graph_entry is graph
+                                else {}
+                            ),
                        },
-                        settings=SafeJson(
-                            GraphSettings.from_graph(
-                                graph_entry,
-                                hitl_safe_mode=hitl_safe_mode,
-                                sensitive_action_safe_mode=sensitive_action_safe_mode,
-                            ).model_dump()
-                        ),
-                        **(
-                            {"Folder": {"connect": {"id": folder_id}}}
-                            if folder_id and graph_entry is graph
-                            else {}
-                        ),
-                    ),
+                    },
                    include=library_agent_include(
                        user_id, include_nodes=False, include_executions=False
                    ),
@@ -582,7 +611,9 @@ async def update_graph_in_library(

    created_graph = await graph_db.create_graph(graph_model, user_id)

-    library_agent = await get_library_agent_by_graph_id(user_id, created_graph.id)
+    library_agent = await get_library_agent_by_graph_id(
+        user_id, created_graph.id, include_archived=True
+    )
    if not library_agent:
        raise NotFoundError(f"Library agent not found for graph {created_graph.id}")

@@ -818,92 +849,38 @@ async def delete_library_agent_by_graph_id(graph_id: str, user_id: str) -> None:
 async def add_store_agent_to_library(
    store_listing_version_id: str, user_id: str
 ) -> library_model.LibraryAgent:
+    """Adds a marketplace agent to the user’s library.
+
+    See also: `add_store_agent_to_library_as_admin()` which uses
+    `get_graph_as_admin` to bypass marketplace status checks for admin review.
    """
-    Adds an agent from a store listing version to the user's library if they don't already have it.
+    from ._add_to_library import add_graph_to_library, resolve_graph_for_library

-    Args:
-        store_listing_version_id: The ID of the store listing version containing the agent.
-        user_id: The user’s library to which the agent is being added.
-
-    Returns:
-        The newly created LibraryAgent if successfully added, the existing corresponding one if any.
-
-    Raises:
-        NotFoundError: If the store listing or associated agent is not found.
-        DatabaseError: If there's an issue creating the LibraryAgent record.
-    """
    logger.debug(
        f"Adding agent from store listing version #{store_listing_version_id} "
        f"to library for user #{user_id}"
    )
-
-    store_listing_version = (
-        await prisma.models.StoreListingVersion.prisma().find_unique(
-            where={"id": store_listing_version_id}, include={"AgentGraph": True}
-        )
+    graph_model = await resolve_graph_for_library(
+        store_listing_version_id, user_id, admin=False
    )
-    if not store_listing_version or not store_listing_version.AgentGraph:
-        logger.warning(f"Store listing version not found: {store_listing_version_id}")
-        raise NotFoundError(
-            f"Store listing version {store_listing_version_id} not found or invalid"
-        )
+    return await add_graph_to_library(store_listing_version_id, graph_model, user_id)

-    graph = store_listing_version.AgentGraph

-    # Convert to GraphModel to check for HITL blocks
-    graph_model = await graph_db.get_graph(
-        graph_id=graph.id,
-        version=graph.version,
-        user_id=user_id,
-        include_subgraphs=False,
+async def add_store_agent_to_library_as_admin(
+    store_listing_version_id: str, user_id: str
+) -> library_model.LibraryAgent:
+    """Admin variant that uses `get_graph_as_admin` to bypass marketplace
+    APPROVED-only checks, allowing admins to add pending agents for review."""
+    from ._add_to_library import add_graph_to_library, resolve_graph_for_library
+
+    logger.warning(
+        f"ADMIN adding agent from store listing version "
+        f"#{store_listing_version_id} to library for user #{user_id}"
    )
-    if not graph_model:
-        raise NotFoundError(
-            f"Graph #{graph.id} v{graph.version} not found or accessible"
-        )
-
-    # Check if user already has this agent (non-deleted)
-    if existing := await get_library_agent_by_graph_id(
-        user_id, graph.id, graph.version
-    ):
-        return existing
-
-    # Check for soft-deleted version and restore it
-    deleted_agent = await prisma.models.LibraryAgent.prisma().find_unique(
-        where={
-            "userId_agentGraphId_agentGraphVersion": {
-                "userId": user_id,
-                "agentGraphId": graph.id,
-                "agentGraphVersion": graph.version,
-            }
-        },
+    graph_model = await resolve_graph_for_library(
+        store_listing_version_id, user_id, admin=True
    )
-    if deleted_agent and deleted_agent.isDeleted:
-        return await update_library_agent(deleted_agent.id, user_id, is_deleted=False)
-
-    # Create LibraryAgent entry
-    added_agent = await prisma.models.LibraryAgent.prisma().create(
-        data={
-            "User": {"connect": {"id": user_id}},
-            "AgentGraph": {
-                "connect": {
-                    "graphVersionId": {"id": graph.id, "version": graph.version}
-                }
-            },
-            "isCreatedByUser": False,
-            "useGraphIsActiveVersion": False,
-            "settings": SafeJson(GraphSettings.from_graph(graph_model).model_dump()),
-        },
-        include=library_agent_include(
-            user_id, include_nodes=False, include_executions=False
-        ),
-    )
-    logger.debug(
-        f"Added graph #{graph.id} v{graph.version}"
-        f"for store listing version #{store_listing_version.id} "
-        f"to library for user #{user_id}"
-    )
-    return library_model.LibraryAgent.from_db(added_agent)
+    return await add_graph_to_library(store_listing_version_id, graph_model, user_id)


 ##############################################
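
Reviewer note: the `include_archived` flag added to `get_library_agent_by_graph_id` only changes how the `where` filter is assembled. A minimal sketch of that filter logic as a standalone function follows; `build_library_agent_filter` is a hypothetical helper that does not exist in the codebase, and it returns a plain dict rather than a `prisma.types.LibraryAgentWhereInput`.

```python
from typing import Any, Optional


def build_library_agent_filter(
    user_id: str,
    graph_id: str,
    graph_version: Optional[int] = None,
    include_archived: bool = False,
) -> dict[str, Any]:
    # Soft-deleted rows are always excluded.
    where: dict[str, Any] = {
        "agentGraphId": graph_id,
        "userId": user_id,
        "isDeleted": False,
    }
    # Archived rows are excluded unless the caller opts in.
    if not include_archived:
        where["isArchived"] = False
    # A version constraint is added only when one is requested.
    if graph_version is not None:
        where["agentGraphVersion"] = graph_version
    return where
```

With the default, archived agents are filtered out; `update_graph_in_library` passes `include_archived=True` so that an archived agent can still be found and updated.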
--- a/autogpt_platform/backend/backend/api/features/library/db_test.py
+++ b/autogpt_platform/backend/backend/api/features/library/db_test.py
@@ -1,4 +1,6 @@
+from contextlib import asynccontextmanager
 from datetime import datetime
+from unittest.mock import AsyncMock, MagicMock, patch

 import prisma.enums
 import prisma.models
@@ -85,10 +87,6 @@ async def test_get_library_agents(mocker):
 async def test_add_agent_to_library(mocker):
    await connect()

-    # Mock the transaction context
-    mock_transaction = mocker.patch("backend.api.features.library.db.transaction")
-    mock_transaction.return_value.__aenter__ = mocker.AsyncMock(return_value=None)
-    mock_transaction.return_value.__aexit__ = mocker.AsyncMock(return_value=None)
    # Mock data
    mock_store_listing_data = prisma.models.StoreListingVersion(
        id="version123",
@@ -143,15 +141,18 @@ async def test_add_agent_to_library(mocker):
    )

    mock_library_agent = mocker.patch("prisma.models.LibraryAgent.prisma")
-    mock_library_agent.return_value.find_first = mocker.AsyncMock(return_value=None)
-    mock_library_agent.return_value.find_unique = mocker.AsyncMock(return_value=None)
    mock_library_agent.return_value.create = mocker.AsyncMock(
        return_value=mock_library_agent_data
    )

-    # Mock graph_db.get_graph function that's called to check for HITL blocks
-    mock_graph_db = mocker.patch("backend.api.features.library.db.graph_db")
+    # Mock graph_db.get_graph function that's called in resolve_graph_for_library
+    # (lives in _add_to_library.py after refactor, not db.py)
+    mock_graph_db = mocker.patch(
+        "backend.api.features.library._add_to_library.graph_db"
+    )
    mock_graph_model = mocker.Mock()
+    mock_graph_model.id = "agent1"
+    mock_graph_model.version = 1
    mock_graph_model.nodes = (
        []
    )  # Empty list so _has_human_in_the_loop_blocks returns False
@@ -170,37 +171,27 @@ async def test_add_agent_to_library(mocker):
    mock_store_listing_version.return_value.find_unique.assert_called_once_with(
        where={"id": "version123"}, include={"AgentGraph": True}
    )
-    mock_library_agent.return_value.find_unique.assert_called_once_with(
-        where={
-            "userId_agentGraphId_agentGraphVersion": {
-                "userId": "test-user",
-                "agentGraphId": "agent1",
-                "agentGraphVersion": 1,
-            }
-        },
-    )
    # Check that create was called with the expected data including settings
    create_call_args = mock_library_agent.return_value.create.call_args
    assert create_call_args is not None

-    # Verify the main structure
-    expected_data = {
+    # Verify the create data structure
+    create_data = create_call_args.kwargs["data"]
+    expected_create = {
        "User": {"connect": {"id": "test-user"}},
        "AgentGraph": {"connect": {"graphVersionId": {"id": "agent1", "version": 1}}},
        "isCreatedByUser": False,
+        "useGraphIsActiveVersion": False,
    }
-
-    actual_data = create_call_args[1]["data"]
-    # Check that all expected fields are present
-    for key, value in expected_data.items():
-        assert actual_data[key] == value
+    for key, value in expected_create.items():
+        assert create_data[key] == value

    # Check that settings field is present and is a SafeJson object
-    assert "settings" in actual_data
-    assert hasattr(actual_data["settings"], "__class__")  # Should be a SafeJson object
+    assert "settings" in create_data
+    assert hasattr(create_data["settings"], "__class__")  # Should be a SafeJson object

    # Check include parameter
-    assert create_call_args[1]["include"] == library_agent_include(
+    assert create_call_args.kwargs["include"] == library_agent_include(
        "test-user", include_nodes=False, include_executions=False
    )

@@ -224,3 +215,141 @@ async def test_add_agent_to_library_not_found(mocker):
    mock_store_listing_version.return_value.find_unique.assert_called_once_with(
        where={"id": "version123"}, include={"AgentGraph": True}
    )
+
+
+@pytest.mark.asyncio
+async def test_get_library_agent_by_graph_id_excludes_archived(mocker):
+    mock_library_agent = mocker.patch("prisma.models.LibraryAgent.prisma")
+    mock_library_agent.return_value.find_first = mocker.AsyncMock(return_value=None)
+
+    result = await db.get_library_agent_by_graph_id("test-user", "agent1", 7)
+
+    assert result is None
+    mock_library_agent.return_value.find_first.assert_called_once()
+    where = mock_library_agent.return_value.find_first.call_args.kwargs["where"]
+    assert where == {
+        "agentGraphId": "agent1",
+        "userId": "test-user",
+        "isDeleted": False,
+        "isArchived": False,
+        "agentGraphVersion": 7,
+    }
+
+
+@pytest.mark.asyncio
+async def test_get_library_agent_by_graph_id_can_include_archived(mocker):
+    mock_library_agent = mocker.patch("prisma.models.LibraryAgent.prisma")
+    mock_library_agent.return_value.find_first = mocker.AsyncMock(return_value=None)
+
+    result = await db.get_library_agent_by_graph_id(
+        "test-user",
+        "agent1",
+        7,
+        include_archived=True,
+    )
+
+    assert result is None
+    mock_library_agent.return_value.find_first.assert_called_once()
+    where = mock_library_agent.return_value.find_first.call_args.kwargs["where"]
+    assert where == {
+        "agentGraphId": "agent1",
+        "userId": "test-user",
+        "isDeleted": False,
+        "agentGraphVersion": 7,
+    }
+
+
+@pytest.mark.asyncio
+async def test_update_graph_in_library_allows_archived_library_agent(mocker):
+    graph = mocker.Mock(id="graph-id")
+    existing_version = mocker.Mock(version=1, is_active=True)
+    graph_model = mocker.Mock()
+    created_graph = mocker.Mock(id="graph-id", version=2, is_active=False)
+    current_library_agent = mocker.Mock()
+    updated_library_agent = mocker.Mock()
+
+    mocker.patch(
+        "backend.api.features.library.db.graph_db.get_graph_all_versions",
+        new=mocker.AsyncMock(return_value=[existing_version]),
+    )
+    mocker.patch(
+        "backend.api.features.library.db.graph_db.make_graph_model",
+        return_value=graph_model,
+    )
+    mocker.patch(
+        "backend.api.features.library.db.graph_db.create_graph",
+        new=mocker.AsyncMock(return_value=created_graph),
+    )
+    mock_get_library_agent = mocker.patch(
+        "backend.api.features.library.db.get_library_agent_by_graph_id",
+        new=mocker.AsyncMock(return_value=current_library_agent),
+    )
+    mock_update_library_agent = mocker.patch(
+        "backend.api.features.library.db.update_library_agent_version_and_settings",
+        new=mocker.AsyncMock(return_value=updated_library_agent),
+    )
+
+    result_graph, result_library_agent = await db.update_graph_in_library(
+        graph,
+        "test-user",
+    )
+
+    assert result_graph is created_graph
+    assert result_library_agent is updated_library_agent
+    assert graph.version == 2
+    graph_model.reassign_ids.assert_called_once_with(
+        user_id="test-user", reassign_graph_id=False
+    )
+    mock_get_library_agent.assert_awaited_once_with(
+        "test-user",
+        "graph-id",
+        include_archived=True,
+    )
+    mock_update_library_agent.assert_awaited_once_with("test-user", created_graph)
+
+
+@pytest.mark.asyncio
+async def test_create_library_agent_uses_upsert():
+    """create_library_agent should use upsert (not create) to handle duplicates."""
+    mock_graph = MagicMock()
+    mock_graph.id = "graph-1"
+    mock_graph.version = 1
+    mock_graph.user_id = "user-1"
+    mock_graph.nodes = []
+    mock_graph.sub_graphs = []
+
+    mock_upserted = MagicMock(name="UpsertedLibraryAgent")
+
+    @asynccontextmanager
+    async def fake_tx():
+        yield None
+
+    with (
+        patch("backend.api.features.library.db.transaction", fake_tx),
+        patch("prisma.models.LibraryAgent.prisma") as mock_prisma,
+        patch(
+            "backend.api.features.library.db.add_generated_agent_image",
+            new=AsyncMock(),
+        ),
+        patch(
+            "backend.api.features.library.model.LibraryAgent.from_db",
+            return_value=MagicMock(),
+        ),
+    ):
+        mock_prisma.return_value.upsert = AsyncMock(return_value=mock_upserted)
+
+        result = await db.create_library_agent(mock_graph, "user-1")
+
+    assert len(result) == 1
+    upsert_call = mock_prisma.return_value.upsert.call_args
+    assert upsert_call is not None
+    # Verify the upsert where clause uses the composite unique key
+    where = upsert_call.kwargs["where"]
+    assert "userId_agentGraphId_agentGraphVersion" in where
+    # Verify the upsert data has both create and update branches
+    data = upsert_call.kwargs["data"]
+    assert "create" in data
+    assert "update" in data
+    # Verify update branch restores soft-deleted/archived agents
+    assert data["update"]["isDeleted"] is False
+    assert data["update"]["isArchived"] is False
--- a/autogpt_platform/backend/backend/api/features/oauth_test.py
+++ b/autogpt_platform/backend/backend/api/features/oauth_test.py
@@ -12,6 +12,7 @@ Tests cover:
 5. Complete OAuth flow end-to-end
 """

+import asyncio
 import base64
 import hashlib
 import secrets
@@ -58,14 +59,27 @@ async def test_user(server, test_user_id: str):

    yield test_user_id

-    # Cleanup - delete in correct order due to foreign key constraints
-    await PrismaOAuthAccessToken.prisma().delete_many(where={"userId": test_user_id})
-    await PrismaOAuthRefreshToken.prisma().delete_many(where={"userId": test_user_id})
-    await PrismaOAuthAuthorizationCode.prisma().delete_many(
-        where={"userId": test_user_id}
-    )
-    await PrismaOAuthApplication.prisma().delete_many(where={"ownerId": test_user_id})
-    await PrismaUser.prisma().delete(where={"id": test_user_id})
+    # Cleanup in two stages (foreign keys): tokens and codes, then apps and user.
+    # Wrap in try/except because the event loop or Prisma engine may already
+    # be closed during session teardown on Python 3.12+.
+    try:
+        await asyncio.gather(
+            PrismaOAuthAccessToken.prisma().delete_many(where={"userId": test_user_id}),
+            PrismaOAuthRefreshToken.prisma().delete_many(
+                where={"userId": test_user_id}
+            ),
+            PrismaOAuthAuthorizationCode.prisma().delete_many(
+                where={"userId": test_user_id}
+            ),
+        )
+        await asyncio.gather(
+            PrismaOAuthApplication.prisma().delete_many(
+                where={"ownerId": test_user_id}
+            ),
+            PrismaUser.prisma().delete(where={"id": test_user_id}),
+        )
+    except RuntimeError:
+        pass


@pytest_asyncio.fixture
--- a/autogpt_platform/backend/backend/api/features/onboarding_profile_test.py
+++ b/autogpt_platform/backend/backend/api/features/onboarding_profile_test.py
@@ -0,0 +1,61 @@
+from unittest.mock import AsyncMock
+
+import fastapi
+import fastapi.testclient
+import pytest
+
+from backend.api.features.v1 import v1_router
+
+app = fastapi.FastAPI()
+app.include_router(v1_router)
+client = fastapi.testclient.TestClient(app)
+
+
+@pytest.fixture(autouse=True)
+def setup_app_auth(mock_jwt_user):
+    from autogpt_libs.auth.jwt_utils import get_jwt_payload
+
+    app.dependency_overrides[get_jwt_payload] = mock_jwt_user["get_jwt_payload"]
+    yield
+    app.dependency_overrides.clear()
+
+
+def test_onboarding_profile_success(mocker):
+    mock_extract = mocker.patch(
+        "backend.api.features.v1.extract_business_understanding",
+        new_callable=AsyncMock,
+    )
+    mock_upsert = mocker.patch(
+        "backend.api.features.v1.upsert_business_understanding",
+        new_callable=AsyncMock,
+    )
+
+    from backend.data.understanding import BusinessUnderstandingInput
+
+    mock_extract.return_value = BusinessUnderstandingInput.model_construct(
+        user_name="John",
+        user_role="Founder/CEO",
+        pain_points=["Finding leads"],
+        suggested_prompts={"Learn": ["How do I automate lead gen?"]},
+    )
+    mock_upsert.return_value = AsyncMock()
+
+    response = client.post(
+        "/onboarding/profile",
+        json={
+            "user_name": "John",
+            "user_role": "Founder/CEO",
+            "pain_points": ["Finding leads", "Email & outreach"],
+        },
+    )
+    assert response.status_code == 200
+    mock_extract.assert_awaited_once()
+    mock_upsert.assert_awaited_once()
+
+
+def test_onboarding_profile_missing_fields():
+    response = client.post(
+        "/onboarding/profile",
+        json={"user_name": "John"},
+    )
+    assert response.status_code == 422
--- a/autogpt_platform/backend/backend/api/features/store/content_handlers.py
+++ b/autogpt_platform/backend/backend/api/features/store/content_handlers.py
@@ -5,16 +5,26 @@ Pluggable system for different content sources (store agents, blocks, docs).
 Each handler knows how to fetch and process its content type for embedding.
 """

+from __future__ import annotations
+
+import asyncio
+import functools
+import itertools
 import logging
 from abc import ABC, abstractmethod
 from dataclasses import dataclass
 from pathlib import Path
-from typing import Any, get_args, get_origin
+from typing import TYPE_CHECKING, Any, get_args, get_origin

 from prisma.enums import ContentType

+from backend.blocks import get_blocks
 from backend.blocks.llm import LlmModel
 from backend.data.db import query_raw_with_schema
+from backend.util.text import split_camelcase
+
+if TYPE_CHECKING:
+    from backend.blocks._base import AnyBlockSchema

 logger = logging.getLogger(__name__)

@@ -154,6 +164,28 @@ class StoreAgentHandler(ContentHandler):
        }


+@functools.lru_cache(maxsize=1)
+def _get_enabled_blocks() -> dict[str, AnyBlockSchema]:
+    """Return ``{block_id: block_instance}`` for all enabled, instantiable blocks.
+
+    Disabled blocks are skipped silently; blocks that fail to instantiate are
+    skipped with a warning log, so callers never need their own try/except loop.
+
+    Results are cached for the process lifetime via ``lru_cache`` because
+    blocks are registered at import time and never change while running.
+    """
+    enabled: dict[str, AnyBlockSchema] = {}
+    for block_id, block_cls in get_blocks().items():
+        try:
+            instance = block_cls()
+        except Exception as e:
+            logger.warning(f"Skipping block {block_id}: init failed: {e}")
+            continue
+        if not instance.disabled:
+            enabled[block_id] = instance
+    return enabled
+
+
 class BlockHandler(ContentHandler):
    """Handler for block definitions (Python classes)."""

@@ -163,16 +195,14 @@ class BlockHandler(ContentHandler):

    async def get_missing_items(self, batch_size: int) -> list[ContentItem]:
        """Fetch blocks without embeddings."""
-        from backend.blocks import get_blocks
-
-        # Get all available blocks
-        all_blocks = get_blocks()
-
-        # Check which ones have embeddings
-        if not all_blocks:
+        # to_thread keeps the first (heavy) call off the event loop.  On
+        # subsequent calls the lru_cache makes this a dict lookup, so the
+        # thread-pool overhead is negligible compared to the DB queries below.
+        enabled = await asyncio.to_thread(_get_enabled_blocks)
+        if not enabled:
            return []

-        block_ids = list(all_blocks.keys())
+        block_ids = list(enabled.keys())

        # Query for existing embeddings
        placeholders = ",".join([f"${i+1}" for i in range(len(block_ids))])
@@ -187,52 +217,42 @@ class BlockHandler(ContentHandler):
        )

        existing_ids = {row["contentId"] for row in existing_result}
-        missing_blocks = [
-            (block_id, block_cls)
-            for block_id, block_cls in all_blocks.items()
-            if block_id not in existing_ids
-        ]

-        # Convert to ContentItem
+        # Convert to ContentItem — disabled filtering already done by
+        # _get_enabled_blocks so batch_size won't be exhausted by disabled blocks.
+        missing = ((bid, b) for bid, b in enabled.items() if bid not in existing_ids)
        items = []
-        for block_id, block_cls in missing_blocks[:batch_size]:
+        for block_id, block in itertools.islice(missing, batch_size):
            try:
-                block_instance = block_cls()
-
-                if block_instance.disabled:
-                    continue
-
                # Build searchable text from block metadata
-                parts = []
-                if block_instance.name:
-                    parts.append(block_instance.name)
-                if block_instance.description:
-                    parts.append(block_instance.description)
-                if block_instance.categories:
-                    parts.append(
-                        " ".join(str(cat.value) for cat in block_instance.categories)
+                if not block.name:
+                    logger.warning(
+                        f"Block {block_id} has no name — using block_id as fallback"
                    )
+                display_name = split_camelcase(block.name) if block.name else ""
+                parts = []
+                if display_name:
+                    parts.append(display_name)
+                if block.description:
+                    parts.append(block.description)
+                if block.categories:
+                    parts.append(" ".join(str(cat.value) for cat in block.categories))

                # Add input schema field descriptions
-                block_input_fields = block_instance.input_schema.model_fields
                parts += [
                    f"{field_name}: {field_info.description}"
-                    for field_name, field_info in block_input_fields.items()
+                    for field_name, field_info in block.input_schema.model_fields.items()
                    if field_info.description
                ]

                searchable_text = " ".join(parts)

                categories_list = (
-                    [cat.value for cat in block_instance.categories]
-                    if block_instance.categories
-                    else []
+                    [cat.value for cat in block.categories] if block.categories else []
                )

                # Extract provider names from credentials fields
-                credentials_info = (
-                    block_instance.input_schema.get_credentials_fields_info()
-                )
+                credentials_info = block.input_schema.get_credentials_fields_info()
                is_integration = len(credentials_info) > 0
                provider_names = [
                    provider.value.lower()
@@ -243,7 +263,7 @@ class BlockHandler(ContentHandler):
                # Check if block has LlmModel field in input schema
                has_llm_model_field = any(
                    _contains_type(field.annotation, LlmModel)
-                    for field in block_instance.input_schema.model_fields.values()
+                    for field in block.input_schema.model_fields.values()
                )

                items.append(
@@ -252,13 +272,13 @@ class BlockHandler(ContentHandler):
                        content_type=ContentType.BLOCK,
                        searchable_text=searchable_text,
                        metadata={
-                            "name": block_instance.name,
+                            "name": display_name or block.name or block_id,
                            "categories": categories_list,
                            "providers": provider_names,
                            "has_llm_model_field": has_llm_model_field,
                            "is_integration": is_integration,
                        },
-                        user_id=None,  # Blocks are public
+                        user_id=None,
                    )
                )
            except Exception as e:
@@ -269,22 +289,13 @@ class BlockHandler(ContentHandler):

    async def get_stats(self) -> dict[str, int]:
        """Get statistics about block embedding coverage."""
-        from backend.blocks import get_blocks
-
-        all_blocks = get_blocks()
-
-        # Filter out disabled blocks - they're not indexed
-        enabled_block_ids = [
-            block_id
-            for block_id, block_cls in all_blocks.items()
-            if not block_cls().disabled
-        ]
-        total_blocks = len(enabled_block_ids)
+        enabled = await asyncio.to_thread(_get_enabled_blocks)
+        total_blocks = len(enabled)

        if total_blocks == 0:
            return {"total": 0, "with_embeddings": 0, "without_embeddings": 0}

-        block_ids = enabled_block_ids
+        block_ids = list(enabled.keys())
        placeholders = ",".join([f"${i+1}" for i in range(len(block_ids))])

        embedded_result = await query_raw_with_schema(
--- a/autogpt_platform/backend/backend/api/features/store/content_handlers_test.py
+++ b/autogpt_platform/backend/backend/api/features/store/content_handlers_test.py
@@ -1,7 +1,5 @@
 """
-E2E tests for content handlers (blocks, store agents, documentation).
-
-Tests the full flow: discovering content → generating embeddings → storing.
+Tests for content handlers (blocks, store agents, documentation).
 """

 from pathlib import Path
@@ -15,15 +13,103 @@ from backend.api.features.store.content_handlers import (
    BlockHandler,
    DocumentationHandler,
    StoreAgentHandler,
+    _get_enabled_blocks,
 )


+@pytest.fixture(autouse=True)
+def _clear_block_cache():
+    """Clear the lru_cache on _get_enabled_blocks before each test."""
+    _get_enabled_blocks.cache_clear()
+    yield
+    _get_enabled_blocks.cache_clear()
+
+
+# ---------------------------------------------------------------------------
+# Helper to build a mock block class that returns a pre-configured instance
+# ---------------------------------------------------------------------------
+
+
+def _make_block_class(
+    *,
+    name: str = "Block",
+    description: str = "",
+    disabled: bool = False,
+    categories: list[MagicMock] | None = None,
+    fields: dict[str, str] | None = None,
+    raise_on_init: Exception | None = None,
+) -> MagicMock:
+    cls = MagicMock()
+    if raise_on_init is not None:
+        cls.side_effect = raise_on_init
+        return cls
+    inst = MagicMock()
+    inst.name = name
+    inst.disabled = disabled
+    inst.description = description
+    inst.categories = categories or []
+    field_mocks = {
+        fname: MagicMock(description=fdesc) for fname, fdesc in (fields or {}).items()
+    }
+    inst.input_schema.model_fields = field_mocks
+    inst.input_schema.get_credentials_fields_info.return_value = {}
+    cls.return_value = inst
+    return cls
+
+
+# ---------------------------------------------------------------------------
+# _get_enabled_blocks
+# ---------------------------------------------------------------------------
+
+
+def test_get_enabled_blocks_filters_disabled():
+    """Disabled blocks are excluded."""
+    blocks = {
+        "enabled": _make_block_class(name="E", disabled=False),
+        "disabled": _make_block_class(name="D", disabled=True),
+    }
+    with patch(
+        "backend.api.features.store.content_handlers.get_blocks", return_value=blocks
+    ):
+        result = _get_enabled_blocks()
+    assert list(result.keys()) == ["enabled"]
+
+
+def test_get_enabled_blocks_skips_broken():
+    """Blocks that raise on init are skipped without crashing."""
+    blocks = {
+        "good": _make_block_class(name="Good"),
+        "bad": _make_block_class(raise_on_init=RuntimeError("boom")),
+    }
+    with patch(
+        "backend.api.features.store.content_handlers.get_blocks", return_value=blocks
+    ):
+        result = _get_enabled_blocks()
+    assert list(result.keys()) == ["good"]
+
+
+def test_get_enabled_blocks_cached():
+    """_get_enabled_blocks() calls get_blocks() only once across multiple calls."""
+    blocks = {"b1": _make_block_class(name="B1")}
+    with patch(
+        "backend.api.features.store.content_handlers.get_blocks", return_value=blocks
+    ) as mock_get_blocks:
+        result1 = _get_enabled_blocks()
+        result2 = _get_enabled_blocks()
+    assert result1 is result2
+    mock_get_blocks.assert_called_once()
+
+
+# ---------------------------------------------------------------------------
+# StoreAgentHandler
+# ---------------------------------------------------------------------------
+
+
 @pytest.mark.asyncio(loop_scope="session")
 async def test_store_agent_handler_get_missing_items(mocker):
    """Test StoreAgentHandler fetches approved agents without embeddings."""
    handler = StoreAgentHandler()

-    # Mock database query
    mock_missing = [
        {
            "id": "agent-1",
@@ -54,9 +140,7 @@ async def test_store_agent_handler_get_stats(mocker):
    """Test StoreAgentHandler returns correct stats."""
    handler = StoreAgentHandler()

-    # Mock approved count query
    mock_approved = [{"count": 50}]
-    # Mock embedded count query
    mock_embedded = [{"count": 30}]

    with patch(
@@ -70,74 +154,130 @@ async def test_store_agent_handler_get_stats(mocker):
        assert stats["without_embeddings"] == 20


+# ---------------------------------------------------------------------------
+# BlockHandler
+# ---------------------------------------------------------------------------
+
+
 @pytest.mark.asyncio(loop_scope="session")
-async def test_block_handler_get_missing_items(mocker):
+async def test_block_handler_get_missing_items():
    """Test BlockHandler discovers blocks without embeddings."""
    handler = BlockHandler()

-    # Mock get_blocks to return test blocks
-    mock_block_class = MagicMock()
-    mock_block_instance = MagicMock()
-    mock_block_instance.name = "Calculator Block"
-    mock_block_instance.description = "Performs calculations"
-    mock_block_instance.categories = [MagicMock(value="MATH")]
-    mock_block_instance.disabled = False
-    mock_field = MagicMock()
-    mock_field.description = "Math expression to evaluate"
-    mock_block_instance.input_schema.model_fields = {"expression": mock_field}
-    mock_block_instance.input_schema.get_credentials_fields_info.return_value = {}
-    mock_block_class.return_value = mock_block_instance
-
-    mock_blocks = {"block-uuid-1": mock_block_class}
-
-    # Mock existing embeddings query (no embeddings exist)
-    mock_existing = []
+    blocks = {
+        "block-uuid-1": _make_block_class(
+            name="CalculatorBlock",
+            description="Performs calculations",
+            categories=[MagicMock(value="MATH")],
+            fields={"expression": "Math expression to evaluate"},
+        ),
+    }

    with patch(
-        "backend.blocks.get_blocks",
-        return_value=mock_blocks,
+        "backend.api.features.store.content_handlers.get_blocks", return_value=blocks
    ):
        with patch(
            "backend.api.features.store.content_handlers.query_raw_with_schema",
-            return_value=mock_existing,
+            return_value=[],
        ):
            items = await handler.get_missing_items(batch_size=10)

            assert len(items) == 1
            assert items[0].content_id == "block-uuid-1"
            assert items[0].content_type == ContentType.BLOCK
+            # CamelCase should be split in searchable text and metadata name
            assert "Calculator Block" in items[0].searchable_text
            assert "Performs calculations" in items[0].searchable_text
            assert "MATH" in items[0].searchable_text
            assert "expression: Math expression" in items[0].searchable_text
+            assert items[0].metadata["name"] == "Calculator Block"
            assert items[0].user_id is None


 @pytest.mark.asyncio(loop_scope="session")
-async def test_block_handler_get_stats(mocker):
+async def test_block_handler_get_missing_items_splits_camelcase():
+    """CamelCase block names are split for better search indexing."""
+    handler = BlockHandler()
+
+    blocks = {
+        "ai-block": _make_block_class(name="AITextGeneratorBlock"),
+    }
+
+    with patch(
+        "backend.api.features.store.content_handlers.get_blocks", return_value=blocks
+    ):
+        with patch(
+            "backend.api.features.store.content_handlers.query_raw_with_schema",
+            return_value=[],
+        ):
+            items = await handler.get_missing_items(batch_size=10)
+
+            assert len(items) == 1
+            assert "AI Text Generator Block" in items[0].searchable_text
+
+
+@pytest.mark.asyncio(loop_scope="session")
+async def test_block_handler_get_missing_items_batch_size_zero():
+    """batch_size=0 returns an empty list; the DB is still queried to find missing IDs."""
+    handler = BlockHandler()
+
+    blocks = {"b1": _make_block_class(name="B1")}
+
+    with patch(
+        "backend.api.features.store.content_handlers.get_blocks", return_value=blocks
+    ):
+        with patch(
+            "backend.api.features.store.content_handlers.query_raw_with_schema",
+            return_value=[],
+        ) as mock_query:
+            items = await handler.get_missing_items(batch_size=0)
+            assert items == []
+            # DB query is still issued to learn which blocks lack embeddings;
+            # the empty result comes from itertools.islice limiting to 0 items.
+            mock_query.assert_called_once()
+
+
+@pytest.mark.asyncio(loop_scope="session")
+async def test_block_handler_disabled_dont_exhaust_batch():
+    """Disabled blocks don't consume batch budget, so enabled blocks get indexed."""
+    handler = BlockHandler()
+
+    # 5 disabled + 3 enabled, batch_size=2
+    blocks = {
+        **{
+            f"dis-{i}": _make_block_class(name=f"D{i}", disabled=True) for i in range(5)
+        },
+        **{f"en-{i}": _make_block_class(name=f"E{i}") for i in range(3)},
+    }
+
+    with patch(
+        "backend.api.features.store.content_handlers.get_blocks", return_value=blocks
+    ):
+        with patch(
+            "backend.api.features.store.content_handlers.query_raw_with_schema",
+            return_value=[],
+        ):
+            items = await handler.get_missing_items(batch_size=2)
+
+            assert len(items) == 2
+            assert all(item.content_id.startswith("en-") for item in items)
+
+
+@pytest.mark.asyncio(loop_scope="session")
+async def test_block_handler_get_stats():
    """Test BlockHandler returns correct stats."""
    handler = BlockHandler()

-    # Mock get_blocks - each block class returns an instance with disabled=False
-    def make_mock_block_class():
-        mock_class = MagicMock()
-        mock_instance = MagicMock()
-        mock_instance.disabled = False
-        mock_class.return_value = mock_instance
-        return mock_class
-
-    mock_blocks = {
-        "block-1": make_mock_block_class(),
-        "block-2": make_mock_block_class(),
-        "block-3": make_mock_block_class(),
+    blocks = {
+        "block-1": _make_block_class(name="B1"),
+        "block-2": _make_block_class(name="B2"),
+        "block-3": _make_block_class(name="B3"),
    }

-    # Mock embedded count query (2 blocks have embeddings)
    mock_embedded = [{"count": 2}]

    with patch(
-        "backend.blocks.get_blocks",
-        return_value=mock_blocks,
+        "backend.api.features.store.content_handlers.get_blocks", return_value=blocks
    ):
        with patch(
            "backend.api.features.store.content_handlers.query_raw_with_schema",
@@ -150,21 +290,123 @@ async def test_block_handler_get_stats(mocker):
            assert stats["without_embeddings"] == 1


+@pytest.mark.asyncio(loop_scope="session")
+async def test_block_handler_get_stats_skips_broken():
+    """get_stats skips broken blocks instead of crashing."""
+    handler = BlockHandler()
+
+    blocks = {
+        "good": _make_block_class(name="Good"),
+        "bad": _make_block_class(raise_on_init=RuntimeError("boom")),
+    }
+
+    mock_embedded = [{"count": 1}]
+
+    with patch(
+        "backend.api.features.store.content_handlers.get_blocks", return_value=blocks
+    ):
+        with patch(
+            "backend.api.features.store.content_handlers.query_raw_with_schema",
+            return_value=mock_embedded,
+        ):
+            stats = await handler.get_stats()
+
+            assert stats["total"] == 1  # only the good block
+            assert stats["with_embeddings"] == 1
+
+
+@pytest.mark.asyncio(loop_scope="session")
+async def test_block_handler_handles_none_name():
+    """When block.name is None the fallback display name logic is used."""
+    handler = BlockHandler()
+
+    blocks = {
+        "none-name-block": _make_block_class(
+            name="placeholder",  # will be overridden to None below
+            description="A block with no name",
+        ),
+    }
+    # Override the name to None after construction so _make_block_class
+    # doesn't interfere with the mock wiring.
+    blocks["none-name-block"].return_value.name = None
+
+    with patch(
+        "backend.api.features.store.content_handlers.get_blocks", return_value=blocks
+    ):
+        with patch(
+            "backend.api.features.store.content_handlers.query_raw_with_schema",
+            return_value=[],
+        ):
+            items = await handler.get_missing_items(batch_size=10)
+
+            assert len(items) == 1
+            # display_name should be "" because block.name is None
+            # searchable_text should still contain the description
+            assert "A block with no name" in items[0].searchable_text
+            # metadata["name"] falls back to block_id when both display_name
+            # and block.name are falsy, ensuring it is always a non-empty string.
+            assert items[0].metadata["name"] == "none-name-block"
+
+
+@pytest.mark.asyncio(loop_scope="session")
+async def test_block_handler_handles_empty_attributes():
+    """Test BlockHandler handles blocks with empty/falsy attribute values."""
+    handler = BlockHandler()
+
+    blocks = {"block-minimal": _make_block_class(name="Minimal Block")}
+
+    with patch(
+        "backend.api.features.store.content_handlers.get_blocks", return_value=blocks
+    ):
+        with patch(
+            "backend.api.features.store.content_handlers.query_raw_with_schema",
+            return_value=[],
+        ):
+            items = await handler.get_missing_items(batch_size=10)
+
+            assert len(items) == 1
+            assert items[0].searchable_text == "Minimal Block"
+
+
+@pytest.mark.asyncio(loop_scope="session")
+async def test_block_handler_skips_failed_blocks():
+    """Test BlockHandler skips blocks that fail to instantiate."""
+    handler = BlockHandler()
+
+    blocks = {
+        "good-block": _make_block_class(name="Good Block", description="Works fine"),
+        "bad-block": _make_block_class(raise_on_init=Exception("Instantiation failed")),
+    }
+
+    with patch(
+        "backend.api.features.store.content_handlers.get_blocks", return_value=blocks
+    ):
+        with patch(
+            "backend.api.features.store.content_handlers.query_raw_with_schema",
+            return_value=[],
+        ):
+            items = await handler.get_missing_items(batch_size=10)
+
+            assert len(items) == 1
+            assert items[0].content_id == "good-block"
+
+
+# ---------------------------------------------------------------------------
+# DocumentationHandler
+# ---------------------------------------------------------------------------
+
+
 @pytest.mark.asyncio(loop_scope="session")
 async def test_documentation_handler_get_missing_items(tmp_path, mocker):
    """Test DocumentationHandler discovers docs without embeddings."""
    handler = DocumentationHandler()

-    # Create temporary docs directory with test files
    docs_root = tmp_path / "docs"
    docs_root.mkdir()
-
    (docs_root / "guide.md").write_text("# Getting Started\n\nThis is a guide.")
    (docs_root / "api.mdx").write_text("# API Reference\n\nAPI documentation.")

-    # Mock _get_docs_root to return temp dir
    with patch.object(handler, "_get_docs_root", return_value=docs_root):
-        # Mock existing embeddings query (no embeddings exist)
        with patch(
            "backend.api.features.store.content_handlers.query_raw_with_schema",
            return_value=[],
@@ -173,7 +415,6 @@ async def test_documentation_handler_get_missing_items(tmp_path, mocker):

            assert len(items) == 2

-            # Check guide.md (content_id format: doc_path::section_index)
            guide_item = next(
                (item for item in items if item.content_id == "guide.md::0"), None
            )
@@ -184,7 +425,6 @@ async def test_documentation_handler_get_missing_items(tmp_path, mocker):
            assert guide_item.metadata["doc_title"] == "Getting Started"
            assert guide_item.user_id is None

-            # Check api.mdx (content_id format: doc_path::section_index)
            api_item = next(
                (item for item in items if item.content_id == "api.mdx::0"), None
            )
@@ -197,14 +437,12 @@ async def test_documentation_handler_get_stats(tmp_path, mocker):
    """Test DocumentationHandler returns correct stats."""
    handler = DocumentationHandler()

-    # Create temporary docs directory
    docs_root = tmp_path / "docs"
    docs_root.mkdir()
    (docs_root / "doc1.md").write_text("# Doc 1")
    (docs_root / "doc2.md").write_text("# Doc 2")
    (docs_root / "doc3.mdx").write_text("# Doc 3")

-    # Mock embedded count query (1 doc has embedding)
    mock_embedded = [{"count": 1}]

    with patch.object(handler, "_get_docs_root", return_value=docs_root):
@@ -224,13 +462,11 @@ async def test_documentation_handler_title_extraction(tmp_path):
    """Test DocumentationHandler extracts title from markdown heading."""
    handler = DocumentationHandler()

-    # Test with heading
    doc_with_heading = tmp_path / "with_heading.md"
    doc_with_heading.write_text("# My Title\n\nContent here")
    title = handler._extract_doc_title(doc_with_heading)
    assert title == "My Title"

-    # Test without heading
    doc_without_heading = tmp_path / "no-heading.md"
    doc_without_heading.write_text("Just content, no heading")
    title = handler._extract_doc_title(doc_without_heading)
@@ -242,7 +478,6 @@ async def test_documentation_handler_markdown_chunking(tmp_path):
    """Test DocumentationHandler chunks markdown by headings."""
    handler = DocumentationHandler()

-    # Test document with multiple sections
    doc_with_sections = tmp_path / "sections.md"
    doc_with_sections.write_text(
        "# Document Title\n\n"
@@ -254,7 +489,6 @@ async def test_documentation_handler_markdown_chunking(tmp_path):
    )
    sections = handler._chunk_markdown_by_headings(doc_with_sections)

-    # Should have 3 sections: intro (with doc title), section one, section two
    assert len(sections) == 3
    assert sections[0].title == "Document Title"
    assert sections[0].index == 0
@@ -268,7 +502,6 @@ async def test_documentation_handler_markdown_chunking(tmp_path):
    assert sections[2].index == 2
    assert "Content for section two" in sections[2].content

-    # Test document without headings
    doc_no_sections = tmp_path / "no-sections.md"
    doc_no_sections.write_text("Just plain content without any headings.")
    sections = handler._chunk_markdown_by_headings(doc_no_sections)
@@ -282,21 +515,39 @@ async def test_documentation_handler_section_content_ids():
    """Test DocumentationHandler creates and parses section content IDs."""
    handler = DocumentationHandler()

-    # Test making content ID
    content_id = handler._make_section_content_id("docs/guide.md", 2)
    assert content_id == "docs/guide.md::2"

-    # Test parsing content ID
    doc_path, section_index = handler._parse_section_content_id("docs/guide.md::2")
    assert doc_path == "docs/guide.md"
    assert section_index == 2

-    # Test parsing legacy format (no section index)
    doc_path, section_index = handler._parse_section_content_id("docs/old-format.md")
    assert doc_path == "docs/old-format.md"
    assert section_index == 0


+@pytest.mark.asyncio(loop_scope="session")
+async def test_documentation_handler_missing_docs_directory():
+    """Test DocumentationHandler handles missing docs directory gracefully."""
+    handler = DocumentationHandler()
+
+    fake_path = Path("/nonexistent/docs")
+    with patch.object(handler, "_get_docs_root", return_value=fake_path):
+        items = await handler.get_missing_items(batch_size=10)
+        assert items == []
+
+        stats = await handler.get_stats()
+        assert stats["total"] == 0
+        assert stats["with_embeddings"] == 0
+        assert stats["without_embeddings"] == 0
+
+
+# ---------------------------------------------------------------------------
+# Registry
+# ---------------------------------------------------------------------------
+
+
 @pytest.mark.asyncio(loop_scope="session")
 async def test_content_handlers_registry():
    """Test all content types are registered."""
@@ -307,88 +558,3 @@ async def test_content_handlers_registry():
    assert isinstance(CONTENT_HANDLERS[ContentType.STORE_AGENT], StoreAgentHandler)
    assert isinstance(CONTENT_HANDLERS[ContentType.BLOCK], BlockHandler)
    assert isinstance(CONTENT_HANDLERS[ContentType.DOCUMENTATION], DocumentationHandler)
-
-
-@pytest.mark.asyncio(loop_scope="session")
-async def test_block_handler_handles_empty_attributes():
-    """Test BlockHandler handles blocks with empty/falsy attribute values."""
-    handler = BlockHandler()
-
-    # Mock block with empty values (all attributes exist but are falsy)
-    mock_block_class = MagicMock()
-    mock_block_instance = MagicMock()
-    mock_block_instance.name = "Minimal Block"
-    mock_block_instance.disabled = False
-    mock_block_instance.description = ""
-    mock_block_instance.categories = set()
-    mock_block_instance.input_schema.model_fields = {}
-    mock_block_instance.input_schema.get_credentials_fields_info.return_value = {}
-    mock_block_class.return_value = mock_block_instance
-
-    mock_blocks = {"block-minimal": mock_block_class}
-
-    with patch(
-        "backend.blocks.get_blocks",
-        return_value=mock_blocks,
-    ):
-        with patch(
-            "backend.api.features.store.content_handlers.query_raw_with_schema",
-            return_value=[],
-        ):
-            items = await handler.get_missing_items(batch_size=10)
-
-            assert len(items) == 1
-            assert items[0].searchable_text == "Minimal Block"
-
-
-@pytest.mark.asyncio(loop_scope="session")
-async def test_block_handler_skips_failed_blocks():
-    """Test BlockHandler skips blocks that fail to instantiate."""
-    handler = BlockHandler()
-
-    # Mock one good block and one bad block
-    good_block = MagicMock()
-    good_instance = MagicMock()
-    good_instance.name = "Good Block"
-    good_instance.description = "Works fine"
-    good_instance.categories = []
-    good_instance.disabled = False
-    good_instance.input_schema.model_fields = {}
-    good_instance.input_schema.get_credentials_fields_info.return_value = {}
-    good_block.return_value = good_instance
-
-    bad_block = MagicMock()
-    bad_block.side_effect = Exception("Instantiation failed")
-
-    mock_blocks = {"good-block": good_block, "bad-block": bad_block}
-
-    with patch(
-        "backend.blocks.get_blocks",
-        return_value=mock_blocks,
-    ):
-        with patch(
-            "backend.api.features.store.content_handlers.query_raw_with_schema",
-            return_value=[],
-        ):
-            items = await handler.get_missing_items(batch_size=10)
-
-            # Should only get the good block
-            assert len(items) == 1
-            assert items[0].content_id == "good-block"
-
-
-@pytest.mark.asyncio(loop_scope="session")
-async def test_documentation_handler_missing_docs_directory():
-    """Test DocumentationHandler handles missing docs directory gracefully."""
-    handler = DocumentationHandler()
-
-    # Mock _get_docs_root to return non-existent path
-    fake_path = Path("/nonexistent/docs")
-    with patch.object(handler, "_get_docs_root", return_value=fake_path):
-        items = await handler.get_missing_items(batch_size=10)
-        assert items == []
-
-        stats = await handler.get_stats()
-        assert stats["total"] == 0
-        assert stats["with_embeddings"] == 0
-        assert stats["without_embeddings"] == 0
--- a/autogpt_platform/backend/backend/api/features/store/db.py
+++ b/autogpt_platform/backend/backend/api/features/store/db.py
@@ -9,7 +9,7 @@ import prisma.errors
 import prisma.models
 import prisma.types

-from backend.data.db import transaction
+from backend.data.db import query_raw_with_schema, transaction
 from backend.data.graph import (
    GraphModel,
    GraphModelWithoutNodes,
@@ -104,7 +104,8 @@ async def get_store_agents(
                # search_used_hybrid remains False, will use fallback path below

            # Convert hybrid search results (dict format) if hybrid succeeded
-            if search_used_hybrid:
+            # Fall through to direct DB search if hybrid returned nothing
+            if search_used_hybrid and agents:
                total_pages = (total + page_size - 1) // page_size
                store_agents: list[store_model.StoreAgent] = []
                for agent in agents:
@@ -130,52 +131,20 @@ async def get_store_agents(
                        )
                        continue

-        if not search_used_hybrid:
-            # Fallback path - use basic search or no search
-            where_clause: prisma.types.StoreAgentWhereInput = {"is_available": True}
-            if featured:
-                where_clause["featured"] = featured
-            if creators:
-                where_clause["creator_username"] = {"in": creators}
-            if category:
-                where_clause["categories"] = {"has": category}
-
-            # Add basic text search if search_query provided but hybrid failed
-            if search_query:
-                where_clause["OR"] = [
-                    {"agent_name": {"contains": search_query, "mode": "insensitive"}},
-                    {"sub_heading": {"contains": search_query, "mode": "insensitive"}},
-                    {"description": {"contains": search_query, "mode": "insensitive"}},
-                ]
-
-            order_by = []
-            if sorted_by == StoreAgentsSortOptions.RATING:
-                order_by.append({"rating": "desc"})
-            elif sorted_by == StoreAgentsSortOptions.RUNS:
-                order_by.append({"runs": "desc"})
-            elif sorted_by == StoreAgentsSortOptions.NAME:
-                order_by.append({"agent_name": "asc"})
-            elif sorted_by == StoreAgentsSortOptions.UPDATED_AT:
-                order_by.append({"updated_at": "desc"})
-
-            db_agents = await prisma.models.StoreAgent.prisma().find_many(
-                where=where_clause,
-                order=order_by,
-                skip=(page - 1) * page_size,
-                take=page_size,
+        if not search_used_hybrid or not agents:
+            # Fallback path: direct DB query with optional tsvector search.
+            # This mirrors the original pre-hybrid-search implementation.
+            store_agents, total = await _fallback_store_agent_search(
+                search_query=search_query,
+                featured=featured,
+                creators=creators,
+                category=category,
+                sorted_by=sorted_by,
+                page=page,
+                page_size=page_size,
            )
-
-            total = await prisma.models.StoreAgent.prisma().count(where=where_clause)
            total_pages = (total + page_size - 1) // page_size

-            store_agents: list[store_model.StoreAgent] = []
-            for agent in db_agents:
-                try:
-                    store_agents.append(store_model.StoreAgent.from_db(agent))
-                except Exception as e:
-                    logger.error(f"Error parsing StoreAgent from db: {e}")
-                    continue
-
        logger.debug(f"Found {len(store_agents)} agents")
        return store_model.StoreAgentsResponse(
            agents=store_agents,
@@ -195,6 +164,126 @@ async def get_store_agents(
    #         await log_search_term(search_query=search_term)


+async def _fallback_store_agent_search(
+    *,
+    search_query: str | None,
+    featured: bool,
+    creators: list[str] | None,
+    category: str | None,
+    sorted_by: StoreAgentsSortOptions | None,
+    page: int,
+    page_size: int,
+) -> tuple[list[store_model.StoreAgent], int]:
+    """Direct DB search fallback when hybrid search is unavailable or empty.
+
+    Uses ad-hoc to_tsvector/plainto_tsquery with ts_rank_cd for text search,
+    matching the quality of the original pre-hybrid-search implementation.
+    Falls back to simple listing when no search query is provided.
+    """
+    if not search_query:
+        # No search query — use Prisma for simple filtered listing
+        where_clause: prisma.types.StoreAgentWhereInput = {"is_available": True}
+        if featured:
+            where_clause["featured"] = featured
+        if creators:
+            where_clause["creator_username"] = {"in": creators}
+        if category:
+            where_clause["categories"] = {"has": category}
+
+        order_by = []
+        if sorted_by == StoreAgentsSortOptions.RATING:
+            order_by.append({"rating": "desc"})
+        elif sorted_by == StoreAgentsSortOptions.RUNS:
+            order_by.append({"runs": "desc"})
+        elif sorted_by == StoreAgentsSortOptions.NAME:
+            order_by.append({"agent_name": "asc"})
+        elif sorted_by == StoreAgentsSortOptions.UPDATED_AT:
+            order_by.append({"updated_at": "desc"})
+
+        db_agents = await prisma.models.StoreAgent.prisma().find_many(
+            where=where_clause,
+            order=order_by,
+            skip=(page - 1) * page_size,
+            take=page_size,
+        )
+        total = await prisma.models.StoreAgent.prisma().count(where=where_clause)
+        return [store_model.StoreAgent.from_db(a) for a in db_agents], total
+
+    # Text search using ad-hoc tsvector on StoreAgent view fields
+    params: list[Any] = [search_query]
+    filters = ["sa.is_available = true"]
+    param_idx = 2
+
+    if featured:
+        filters.append("sa.featured = true")
+    if creators:
+        params.append(creators)
+        filters.append(f"sa.creator_username = ANY(${param_idx})")
+        param_idx += 1
+    if category:
+        params.append(category)
+        filters.append(f"${param_idx} = ANY(sa.categories)")
+        param_idx += 1
+
+    where_sql = " AND ".join(filters)
+
+    params.extend([page_size, (page - 1) * page_size])
+    limit_param = f"${param_idx}"
+    param_idx += 1
+    offset_param = f"${param_idx}"
+
+    sql = f"""
+        WITH ranked AS (
+            SELECT sa.*,
+                ts_rank_cd(
+                    to_tsvector('english',
+                        COALESCE(sa.agent_name, '') || ' ' ||
+                        COALESCE(sa.sub_heading, '') || ' ' ||
+                        COALESCE(sa.description, '')
+                    ),
+                    plainto_tsquery('english', $1)
+                ) AS rank,
+                COUNT(*) OVER () AS total_count
+            FROM {{schema_prefix}}"StoreAgent" sa
+            WHERE {where_sql}
+            AND to_tsvector('english',
+                    COALESCE(sa.agent_name, '') || ' ' ||
+                    COALESCE(sa.sub_heading, '') || ' ' ||
+                    COALESCE(sa.description, '')
+                ) @@ plainto_tsquery('english', $1)
+        )
+        SELECT * FROM ranked
+        ORDER BY rank DESC
+        LIMIT {limit_param} OFFSET {offset_param}
+    """
+
+    results = await query_raw_with_schema(sql, *params)
+    total = results[0]["total_count"] if results else 0
+
+    store_agents = []
+    for row in results:
+        try:
+            store_agents.append(
+                store_model.StoreAgent(
+                    slug=row["slug"],
+                    agent_name=row["agent_name"],
+                    agent_image=row["agent_image"][0] if row["agent_image"] else "",
+                    creator=row["creator_username"] or "Needs Profile",
+                    creator_avatar=row["creator_avatar"] or "",
+                    sub_heading=row["sub_heading"],
+                    description=row["description"],
+                    runs=row["runs"],
+                    rating=row["rating"],
+                    agent_graph_id=row.get("graph_id", ""),
+                )
+            )
+        except Exception as e:
+            logger.error(f"Error parsing StoreAgent from fallback search: {e}")
+            continue
+
+    return store_agents, total
+
+
 async def log_search_term(search_query: str):
    """Log a search term to the database"""

@@ -302,6 +391,11 @@ async def get_available_graph(
 async def get_store_agent_by_version_id(
    store_listing_version_id: str,
 ) -> store_model.StoreAgentDetails:
+    """Get agent details from the StoreAgent view (APPROVED agents only).
+
+    See also: `get_store_agent_details_as_admin()` which bypasses the
+    APPROVED-only StoreAgent view for admin preview of pending submissions.
+    """
    logger.debug(f"Getting store agent details for {store_listing_version_id}")

    try:
@@ -322,6 +416,57 @@ async def get_store_agent_by_version_id(
        raise DatabaseError("Failed to fetch agent details") from e


+async def get_store_agent_details_as_admin(
+    store_listing_version_id: str,
+) -> store_model.StoreAgentDetails:
+    """Get agent details for admin preview, bypassing the APPROVED-only
+    StoreAgent view. Queries StoreListingVersion directly so pending
+    submissions are visible."""
+    slv = await prisma.models.StoreListingVersion.prisma().find_unique(
+        where={"id": store_listing_version_id},
+        include={
+            "StoreListing": {"include": {"CreatorProfile": True}},
+        },
+    )
+    if not slv or not slv.StoreListing:
+        raise NotFoundError(
+            f"Store listing version {store_listing_version_id} not found"
+        )
+
+    listing = slv.StoreListing
+    # CreatorProfile is a required FK relation — should always exist.
+    # If it's None, the DB is in a bad state.
+    profile = listing.CreatorProfile
+    if not profile:
+        raise DatabaseError(
+            f"StoreListing {listing.id} has no CreatorProfile — FK violated"
+        )
+
+    return store_model.StoreAgentDetails(
+        store_listing_version_id=slv.id,
+        slug=listing.slug,
+        agent_name=slv.name,
+        agent_video=slv.videoUrl or "",
+        agent_output_demo=slv.agentOutputDemoUrl or "",
+        agent_image=slv.imageUrls,
+        creator=profile.username,
+        creator_avatar=profile.avatarUrl or "",
+        sub_heading=slv.subHeading,
+        description=slv.description,
+        instructions=slv.instructions,
+        categories=slv.categories,
+        runs=0,
+        rating=0.0,
+        versions=[str(slv.version)],
+        graph_id=slv.agentGraphId,
+        graph_versions=[str(slv.agentGraphVersion)],
+        last_updated=slv.updatedAt,
+        recommended_schedule_cron=slv.recommendedScheduleCron,
+        active_version_id=listing.activeVersionId or slv.id,
+        has_approved_version=listing.hasApprovedVersion,
+    )
+
+
 class StoreCreatorsSortOptions(Enum):
    # NOTE: values correspond 1:1 to columns of the Creator view
    AGENT_RATING = "agent_rating"
@@ -1139,16 +1284,21 @@ async def review_store_submission(
                    },
                )

-                # Generate embedding for approved listing (blocking - admin operation)
-                # Inside transaction: if embedding fails, entire transaction rolls back
-                await ensure_embedding(
-                    version_id=store_listing_version_id,
-                    name=submission.name,
-                    description=submission.description,
-                    sub_heading=submission.subHeading,
-                    categories=submission.categories,
-                    tx=tx,
-                )
+                # Generate embedding for approved listing (best-effort)
+                try:
+                    await ensure_embedding(
+                        version_id=store_listing_version_id,
+                        name=submission.name,
+                        description=submission.description,
+                        sub_heading=submission.subHeading,
+                        categories=submission.categories,
+                        tx=tx,
+                    )
+                except Exception as emb_err:
+                    logger.warning(
+                        f"Could not generate embedding for listing "
+                        f"{store_listing_version_id}: {emb_err}"
+                    )

                await prisma.models.StoreListing.prisma(tx).update(
                    where={"id": submission.storeListingId},
--- a/autogpt_platform/backend/backend/api/features/store/db_test.py
+++ b/autogpt_platform/backend/backend/api/features/store/db_test.py
@@ -189,6 +189,7 @@ async def test_create_store_submission(mocker):
        notifyOnAgentApproved=True,
        notifyOnAgentRejected=True,
        timezone="Europe/Delft",
+        subscriptionTier=prisma.enums.SubscriptionTier.FREE,  # type: ignore[reportCallIssue,reportAttributeAccessIssue]
    )
    mock_agent = prisma.models.AgentGraph(
        id="agent-id",
--- a/autogpt_platform/backend/backend/api/features/store/embeddings.py
+++ b/autogpt_platform/backend/backend/api/features/store/embeddings.py
@@ -15,6 +15,7 @@ from prisma.enums import ContentType
 from tiktoken import encoding_for_model

 from backend.api.features.store.content_handlers import CONTENT_HANDLERS
+from backend.blocks import get_blocks
 from backend.data.db import execute_raw_with_schema, query_raw_with_schema
 from backend.util.clients import get_openai_client
 from backend.util.json import dumps
@@ -662,8 +663,6 @@ async def cleanup_orphaned_embeddings() -> dict[str, Any]:
                )
                current_ids = {row["id"] for row in valid_agents}
            elif content_type == ContentType.BLOCK:
-                from backend.blocks import get_blocks
-
                current_ids = set(get_blocks().keys())
            elif content_type == ContentType.DOCUMENTATION:
                # Use DocumentationHandler to get section-based content IDs
--- a/autogpt_platform/backend/backend/api/features/store/hybrid_search.py
+++ b/autogpt_platform/backend/backend/api/features/store/hybrid_search.py
@@ -31,12 +31,10 @@ logger = logging.getLogger(__name__)


 def tokenize(text: str) -> list[str]:
-    """Simple tokenizer for BM25 - lowercase and split on non-alphanumeric."""
+    """Tokenize text for BM25."""
    if not text:
        return []
-    # Lowercase and split on non-alphanumeric characters
-    tokens = re.findall(r"\b\w+\b", text.lower())
-    return tokens
+    return re.findall(r"\b\w+\b", text.lower())


 def bm25_rerank(
--- a/autogpt_platform/backend/backend/api/features/store/hybrid_search_test.py
+++ b/autogpt_platform/backend/backend/api/features/store/hybrid_search_test.py
@@ -14,9 +14,27 @@ from backend.api.features.store.hybrid_search import (
    HybridSearchWeights,
    UnifiedSearchWeights,
    hybrid_search,
+    tokenize,
    unified_hybrid_search,
 )

+# ---------------------------------------------------------------------------
+# tokenize (BM25)
+# ---------------------------------------------------------------------------
+
+
+@pytest.mark.parametrize(
+    "input_text, expected",
+    [
+        ("AITextGeneratorBlock", ["aitextgeneratorblock"]),
+        ("hello world", ["hello", "world"]),
+        ("", []),
+        ("HTTPRequest", ["httprequest"]),
+    ],
+)
+def test_tokenize(input_text: str, expected: list[str]):
+    assert tokenize(input_text) == expected
+

 @pytest.mark.asyncio(loop_scope="session")
 @pytest.mark.integration
--- a/autogpt_platform/backend/backend/api/features/store/routes.py
+++ b/autogpt_platform/backend/backend/api/features/store/routes.py
@@ -1,5 +1,4 @@
 import logging
-import tempfile
 import urllib.parse

 import autogpt_libs.auth
@@ -259,21 +258,18 @@ async def get_graph_meta_by_store_listing_version_id(
 )
 async def download_agent_file(
    store_listing_version_id: str,
-) -> fastapi.responses.FileResponse:
+) -> fastapi.responses.Response:
    """Download agent graph file for a specific marketplace listing version"""
    graph_data = await store_db.get_agent(store_listing_version_id)
    file_name = f"agent_{graph_data.id}_v{graph_data.version or 'latest'}.json"

-    # Sending graph as a stream (similar to marketplace v1)
-    with tempfile.NamedTemporaryFile(
-        mode="w", suffix=".json", delete=False
-    ) as tmp_file:
-        tmp_file.write(backend.util.json.dumps(graph_data))
-        tmp_file.flush()
-
-        return fastapi.responses.FileResponse(
-            tmp_file.name, filename=file_name, media_type="application/json"
-        )
+    return fastapi.responses.Response(
+        content=backend.util.json.dumps(graph_data),
+        media_type="application/json",
+        headers={
+            "Content-Disposition": f'attachment; filename="{file_name}"',
+        },
+    )


 ##############################################
--- a/autogpt_platform/backend/backend/api/features/store/text_utils.py
+++ b/autogpt_platform/backend/backend/api/features/store/text_utils.py
@@ -0,0 +1,5 @@
+"""Backward-compatibility shim — ``split_camelcase`` now lives in backend.util.text."""
+
+from backend.util.text import split_camelcase  # noqa: F401
+
+__all__ = ["split_camelcase"]
--- a/autogpt_platform/backend/backend/api/features/store/text_utils_test.py
+++ b/autogpt_platform/backend/backend/api/features/store/text_utils_test.py
@@ -0,0 +1,49 @@
+"""Tests for split_camelcase (now in backend.util.text)."""
+
+import pytest
+
+from backend.util.text import split_camelcase
+
+# ---------------------------------------------------------------------------
+# split_camelcase
+# ---------------------------------------------------------------------------
+
+
+@pytest.mark.parametrize(
+    "input_text, expected",
+    [
+        ("AITextGeneratorBlock", "AI Text Generator Block"),
+        ("HTTPRequestBlock", "HTTP Request Block"),
+        ("simpleWord", "simple Word"),
+        ("already spaced", "already spaced"),
+        ("XMLParser", "XML Parser"),
+        ("getHTTPResponse", "get HTTP Response"),
+        ("Block", "Block"),
+        ("", ""),
+        ("OAuth2Block", "OAuth2 Block"),
+        ("IOError", "IO Error"),
+        ("getHTTPSResponse", "get HTTPS Response"),
+        # Known limitation: single-letter uppercase prefixes are NOT split.
+        # "ABlock" stays "ABlock" because the algorithm requires the left
+        # part of an uppercase run to retain at least 2 uppercase chars.
+        ("ABlock", "ABlock"),
+        # Digit-to-uppercase transitions
+        ("Base64Encoder", "Base64 Encoder"),
+        ("UTF8Decoder", "UTF8 Decoder"),
+        # Pure digits — no camelCase boundaries to split
+        ("123", "123"),
+        # Known limitation: single-letter uppercase segments after digits
+        # are not split from the following word.  "3D" is only 1 uppercase
+        # char so the uppercase-run rule cannot fire, producing "3 DRenderer"
+        # rather than the ideal "3D Renderer".
+        ("3DRenderer", "3 DRenderer"),
+        # Exception list — compound terms that should stay together
+        ("YouTubeBlock", "YouTube Block"),
+        ("OpenAIBlock", "OpenAI Block"),
+        ("AutoGPTAgent", "AutoGPT Agent"),
+        ("GitHubIntegration", "GitHub Integration"),
+        ("LinkedInBlock", "LinkedIn Block"),
+    ],
+)
+def test_split_camelcase(input_text: str, expected: str):
+    assert split_camelcase(input_text) == expected
--- a/autogpt_platform/backend/backend/api/features/v1.py
+++ b/autogpt_platform/backend/backend/api/features/v1.py
@@ -55,7 +55,6 @@ from backend.data.credit import (
    set_auto_top_up,
 )
 from backend.data.graph import GraphSettings
-from backend.data.invited_user import get_or_activate_user
 from backend.data.model import CredentialsMetaInput, UserOnboarding
 from backend.data.notifications import NotificationPreference, NotificationPreferenceDTO
 from backend.data.onboarding import (
@@ -64,13 +63,19 @@ from backend.data.onboarding import (
    UserOnboardingUpdate,
    complete_onboarding_step,
    complete_re_run_agent,
+    format_onboarding_for_extraction,
    get_recommended_agents,
    get_user_onboarding,
-    onboarding_enabled,
    reset_user_onboarding,
    update_user_onboarding,
 )
+from backend.data.tally import extract_business_understanding
+from backend.data.understanding import (
+    BusinessUnderstandingInput,
+    upsert_business_understanding,
+)
 from backend.data.user import (
+    get_or_create_user,
    get_user_by_id,
    get_user_notification_preference,
    update_user_email,
@@ -136,10 +141,12 @@ _tally_background_tasks: set[asyncio.Task] = set()
    dependencies=[Security(requires_user)],
 )
 async def get_or_create_user_route(user_data: dict = Security(get_jwt_payload)):
-    user = await get_or_activate_user(user_data)
+    user = await get_or_create_user(user_data)

-    # Fire-and-forget: backfill Tally understanding when invite pre-seeding did
-    # not produce a stored result before first activation.
+    # Fire-and-forget: populate business understanding from Tally form.
+    # We use created_at proximity instead of an is_new flag because
+    # get_or_create_user is cached — a separate is_new return value would be
+    # unreliable on repeated calls within the cache TTL.
    age_seconds = (datetime.now(timezone.utc) - user.created_at).total_seconds()
    if age_seconds < 30:
        try:
@@ -163,8 +170,7 @@ async def get_or_create_user_route(user_data: dict = Security(get_jwt_payload)):
    dependencies=[Security(requires_user)],
 )
 async def update_user_email_route(
-    user_id: Annotated[str, Security(get_user_id)],
-    email: str = Body(...),
+    user_id: Annotated[str, Security(get_user_id)], email: str = Body(...)
 ) -> dict[str, str]:
    await update_user_email(user_id, email)

@@ -178,16 +184,10 @@ async def update_user_email_route(
    dependencies=[Security(requires_user)],
 )
 async def get_user_timezone_route(
-    user_id: Annotated[str, Security(get_user_id)],
+    user_data: dict = Security(get_jwt_payload),
 ) -> TimezoneResponse:
    """Get user timezone setting."""
-    try:
-        user = await get_user_by_id(user_id)
-    except ValueError:
-        raise HTTPException(
-            status_code=HTTP_404_NOT_FOUND,
-            detail="User not found. Please complete activation via /auth/user first.",
-        )
+    user = await get_or_create_user(user_data)
    return TimezoneResponse(timezone=user.timezone)


@@ -198,8 +198,7 @@ async def get_user_timezone_route(
    dependencies=[Security(requires_user)],
 )
 async def update_user_timezone_route(
-    user_id: Annotated[str, Security(get_user_id)],
-    request: UpdateTimezoneRequest,
+    user_id: Annotated[str, Security(get_user_id)], request: UpdateTimezoneRequest
 ) -> TimezoneResponse:
    """Update user timezone. The timezone should be a valid IANA timezone identifier."""
    user = await update_user_timezone(user_id, str(request.timezone))
@@ -288,35 +287,33 @@ async def get_onboarding_agents(
    return await get_recommended_agents(user_id)


-class OnboardingStatusResponse(pydantic.BaseModel):
-    """Response for onboarding status check."""
+class OnboardingProfileRequest(pydantic.BaseModel):
+    """Request body for onboarding profile submission."""

-    is_onboarding_enabled: bool
-    is_chat_enabled: bool
+    user_name: str = pydantic.Field(min_length=1, max_length=100)
+    user_role: str = pydantic.Field(min_length=1, max_length=100)
+    pain_points: list[str] = pydantic.Field(default_factory=list, max_length=20)
+
+
+class OnboardingStatusResponse(pydantic.BaseModel):
+    """Response for onboarding completion check."""
+
+    is_completed: bool


 @v1_router.get(
-    "/onboarding/enabled",
-    summary="Is onboarding enabled",
+    "/onboarding/completed",
+    summary="Check if onboarding is completed",
    tags=["onboarding", "public"],
    response_model=OnboardingStatusResponse,
+    dependencies=[Security(requires_user)],
 )
-async def is_onboarding_enabled(
+async def is_onboarding_completed(
    user_id: Annotated[str, Security(get_user_id)],
 ) -> OnboardingStatusResponse:
-    # Check if chat is enabled for user
-    is_chat_enabled = await is_feature_enabled(Flag.CHAT, user_id, False)
-
-    # If chat is enabled, skip legacy onboarding
-    if is_chat_enabled:
-        return OnboardingStatusResponse(
-            is_onboarding_enabled=False,
-            is_chat_enabled=True,
-        )
-
+    user_onboarding = await get_user_onboarding(user_id)
    return OnboardingStatusResponse(
-        is_onboarding_enabled=await onboarding_enabled(),
-        is_chat_enabled=False,
+        is_completed=OnboardingStep.VISIT_COPILOT in user_onboarding.completedSteps,
    )


@@ -331,6 +328,38 @@ async def reset_onboarding(user_id: Annotated[str, Security(get_user_id)]):
    return await reset_user_onboarding(user_id)


+@v1_router.post(
+    "/onboarding/profile",
+    summary="Submit onboarding profile",
+    tags=["onboarding"],
+    dependencies=[Security(requires_user)],
+)
+async def submit_onboarding_profile(
+    data: OnboardingProfileRequest,
+    user_id: Annotated[str, Security(get_user_id)],
+):
+    formatted = format_onboarding_for_extraction(
+        user_name=data.user_name,
+        user_role=data.user_role,
+        pain_points=data.pain_points,
+    )
+
+    try:
+        understanding_input = await extract_business_understanding(formatted)
+    except Exception:
+        understanding_input = BusinessUnderstandingInput.model_construct()
+
+    # Ensure the direct fields are set even if LLM missed them
+    understanding_input.user_name = data.user_name
+    understanding_input.user_role = data.user_role
+    if not understanding_input.pain_points:
+        understanding_input.pain_points = data.pain_points
+
+    await upsert_business_understanding(user_id, understanding_input)
+
+    return {"status": "ok"}
+
+
 ########################################################
 ##################### Blocks ###########################
 ########################################################
@@ -598,6 +627,11 @@ async def fulfill_checkout(user_id: Annotated[str, Security(get_user_id)]):
 async def configure_user_auto_top_up(
    request: AutoTopUpConfig, user_id: Annotated[str, Security(get_user_id)]
 ) -> str:
+    """Configure auto top-up settings and perform an immediate top-up if needed.
+
+    Raises HTTPException(422) if the request parameters are invalid or the
+    top-up fails with a known ValueError; other errors propagate unchanged.
+    """
    if request.threshold < 0:
        raise HTTPException(status_code=422, detail="Threshold must be greater than 0")
    if request.amount < 500 and request.amount != 0:
@@ -612,10 +646,20 @@ async def configure_user_auto_top_up(
    user_credit_model = await get_user_credit_model(user_id)
    current_balance = await user_credit_model.get_credits(user_id)

-    if current_balance < request.threshold:
-        await user_credit_model.top_up_credits(user_id, request.amount)
-    else:
-        await user_credit_model.top_up_credits(user_id, 0)
+    try:
+        if current_balance < request.threshold:
+            await user_credit_model.top_up_credits(user_id, request.amount)
+        else:
+            await user_credit_model.top_up_credits(user_id, 0)
+    except ValueError as e:
+        known_messages = (
+            "must not be negative",
+            "already exists for user",
+            "No payment method found",
+        )
+        if any(msg in str(e) for msg in known_messages):
+            raise HTTPException(status_code=422, detail=str(e))
+        raise

    await set_auto_top_up(
        user_id, AutoTopUpConfig(threshold=request.threshold, amount=request.amount)
@@ -971,14 +1015,16 @@ async def execute_graph(
    source: Annotated[GraphExecutionSource | None, Body(embed=True)] = None,
    graph_version: Optional[int] = None,
    preset_id: Optional[str] = None,
+    dry_run: Annotated[bool, Body(embed=True)] = False,
 ) -> execution_db.GraphExecutionMeta:
-    user_credit_model = await get_user_credit_model(user_id)
-    current_balance = await user_credit_model.get_credits(user_id)
-    if current_balance <= 0:
-        raise HTTPException(
-            status_code=402,
-            detail="Insufficient balance to execute the agent. Please top up your account.",
-        )
+    if not dry_run:
+        user_credit_model = await get_user_credit_model(user_id)
+        current_balance = await user_credit_model.get_credits(user_id)
+        if current_balance <= 0:
+            raise HTTPException(
+                status_code=402,
+                detail="Insufficient balance to execute the agent. Please top up your account.",
+            )

    try:
        result = await execution_utils.add_graph_execution(
@@ -988,6 +1034,7 @@ async def execute_graph(
            preset_id=preset_id,
            graph_version=graph_version,
            graph_credentials_inputs=credentials_inputs,
+            dry_run=dry_run,
        )
        # Record successful graph execution
        record_graph_execution(graph_id=graph_id, status="success", user_id=user_id)
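Note on the `configure_user_auto_top_up` change above: it turns a `ValueError` into a 422 only when the message contains one of a few known substrings, and re-raises everything else. The framework-free sketch below shows that decision on its own. `ClientError` and `translate_top_up_error` are hypothetical names standing in for `HTTPException` and the inline `except` block. The message tuple is copied from the diff.

```python
class ClientError(Exception):
    """Hypothetical stand-in for an HTTP 422 response."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


# Substrings copied from the diff; anything else is an unexpected failure.
KNOWN_MESSAGES = (
    "must not be negative",
    "already exists for user",
    "No payment method found",
)


def translate_top_up_error(exc: ValueError) -> Exception:
    """Return a 422-style error for known messages, else the original error."""
    message = str(exc)
    if any(known in message for known in KNOWN_MESSAGES):
        return ClientError(422, message)
    return exc  # the caller re-raises this unchanged
```

Matching on message text ties the route to the exact wording used by the credit model. Dedicated exception subclasses would likely be sturdier, though that is a design suggestion and not something this PR does.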
--- a/autogpt_platform/backend/backend/api/features/v1_test.py
+++ b/autogpt_platform/backend/backend/api/features/v1_test.py
@@ -51,7 +51,7 @@ def test_get_or_create_user_route(
    }

    mocker.patch(
-        "backend.api.features.v1.get_or_activate_user",
+        "backend.api.features.v1.get_or_create_user",
        return_value=mock_user,
    )

--- a/autogpt_platform/backend/backend/api/features/workspace/routes.py
+++ b/autogpt_platform/backend/backend/api/features/workspace/routes.py
@@ -12,7 +12,7 @@ import fastapi
 from autogpt_libs.auth.dependencies import get_user_id, requires_user
 from fastapi import Query, UploadFile
 from fastapi.responses import Response
-from pydantic import BaseModel
+from pydantic import BaseModel, Field

 from backend.data.workspace import (
    WorkspaceFile,
@@ -131,9 +131,26 @@ class StorageUsageResponse(BaseModel):
    file_count: int


+class WorkspaceFileItem(BaseModel):
+    id: str
+    name: str
+    path: str
+    mime_type: str
+    size_bytes: int
+    metadata: dict = Field(default_factory=dict)
+    created_at: str
+
+
+class ListFilesResponse(BaseModel):
+    files: list[WorkspaceFileItem]
+    offset: int = 0
+    has_more: bool = False
+
+
 @router.get(
    "/files/{file_id}/download",
    summary="Download file by ID",
+    operation_id="getWorkspaceDownloadFileById",
 )
 async def download_file(
    user_id: Annotated[str, fastapi.Security(get_user_id)],
@@ -158,6 +175,7 @@ async def download_file(
 @router.delete(
    "/files/{file_id}",
    summary="Delete a workspace file",
+    operation_id="deleteWorkspaceFile",
 )
 async def delete_workspace_file(
    user_id: Annotated[str, fastapi.Security(get_user_id)],
@@ -183,11 +201,13 @@ async def delete_workspace_file(
 @router.post(
    "/files/upload",
    summary="Upload file to workspace",
+    operation_id="uploadWorkspaceFile",
 )
 async def upload_file(
    user_id: Annotated[str, fastapi.Security(get_user_id)],
    file: UploadFile,
    session_id: str | None = Query(default=None),
+    overwrite: bool = Query(default=False),
 ) -> UploadFileResponse:
    """
    Upload a file to the user's workspace.
@@ -195,6 +215,9 @@ async def upload_file(
    Files are stored in session-scoped paths when session_id is provided,
    so the agent's session-scoped tools can discover them automatically.
    """
+    # Empty-string session_id drops session scoping; normalize to None.
+    session_id = session_id or None
+
    config = Config()

    # Sanitize filename — strip any directory components
@@ -248,15 +271,28 @@ async def upload_file(
    # Write file via WorkspaceManager
    manager = WorkspaceManager(user_id, workspace.id, session_id)
    try:
-        workspace_file = await manager.write_file(content, filename)
+        workspace_file = await manager.write_file(
+            content, filename, overwrite=overwrite, metadata={"origin": "user-upload"}
+        )
    except ValueError as e:
-        raise fastapi.HTTPException(status_code=409, detail=str(e)) from e
+        # write_file raises ValueError for both path-conflict and size-limit
+        # cases; map each to its correct HTTP status.
+        message = str(e)
+        if message.startswith("File too large"):
+            raise fastapi.HTTPException(status_code=413, detail=message) from e
+        raise fastapi.HTTPException(status_code=409, detail=message) from e

    # Post-write storage check — eliminates TOCTOU race on the quota.
    # If a concurrent upload pushed us over the limit, undo this write.
    new_total = await get_workspace_total_size(workspace.id)
    if storage_limit_bytes and new_total > storage_limit_bytes:
-        await soft_delete_workspace_file(workspace_file.id, workspace.id)
+        try:
+            await soft_delete_workspace_file(workspace_file.id, workspace.id)
+        except Exception as e:
+            logger.warning(
+                f"Failed to soft-delete over-quota file {workspace_file.id} "
+                f"in workspace {workspace.id}: {e}"
+            )
        raise fastapi.HTTPException(
            status_code=413,
            detail={
@@ -278,6 +314,7 @@ async def upload_file(
 @router.get(
    "/storage/usage",
    summary="Get workspace storage usage",
+    operation_id="getWorkspaceStorageUsage",
 )
 async def get_storage_usage(
    user_id: Annotated[str, fastapi.Security(get_user_id)],
@@ -298,3 +335,57 @@ async def get_storage_usage(
        used_percent=round((used_bytes / limit_bytes) * 100, 1) if limit_bytes else 0,
        file_count=file_count,
    )
+
+
+@router.get(
+    "/files",
+    summary="List workspace files",
+    operation_id="listWorkspaceFiles",
+)
+async def list_workspace_files(
+    user_id: Annotated[str, fastapi.Security(get_user_id)],
+    session_id: str | None = Query(default=None),
+    limit: int = Query(default=200, ge=1, le=1000),
+    offset: int = Query(default=0, ge=0),
+) -> ListFilesResponse:
+    """
+    List files in the user's workspace.
+
+    When session_id is provided, only files for that session are returned.
+    Otherwise, all files across sessions are listed. Results are paginated
+    via `limit`/`offset`; `has_more` indicates whether additional pages exist.
+    """
+    workspace = await get_or_create_workspace(user_id)
+
+    # Treat empty-string session_id the same as omitted — an empty value
+    # would otherwise silently list files across every session instead of
+    # scoping to one.
+    session_id = session_id or None
+
+    manager = WorkspaceManager(user_id, workspace.id, session_id)
+    include_all = session_id is None
+    # Fetch one extra to compute has_more without a separate count query.
+    files = await manager.list_files(
+        limit=limit + 1,
+        offset=offset,
+        include_all_sessions=include_all,
+    )
+    has_more = len(files) > limit
+    page = files[:limit]
+
+    return ListFilesResponse(
+        files=[
+            WorkspaceFileItem(
+                id=f.id,
+                name=f.name,
+                path=f.path,
+                mime_type=f.mime_type,
+                size_bytes=f.size_bytes,
+                metadata=f.metadata or {},
+                created_at=f.created_at.isoformat(),
+            )
+            for f in page
+        ],
+        offset=offset,
+        has_more=has_more,
+    )
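Note on `list_workspace_files` above: it requests `limit + 1` rows so that `has_more` can be computed without a separate count query. The sketch below isolates that over-fetch pattern using an in-memory sequence. `fetch` and `paginate` are hypothetical helpers, with `fetch` standing in for whatever `WorkspaceManager.list_files` does against the database.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def fetch(items: Sequence[T], limit: int, offset: int) -> list[T]:
    """Hypothetical stand-in for a data-layer query with LIMIT/OFFSET."""
    return list(items[offset : offset + limit])


def paginate(items: Sequence[T], limit: int, offset: int) -> tuple[list[T], bool]:
    """Return one page plus a has_more flag, using a single over-fetch."""
    rows = fetch(items, limit + 1, offset)  # ask for one row beyond the page
    has_more = len(rows) > limit  # the extra row proves another page exists
    return rows[:limit], has_more  # never expose the extra row to the caller
```

This is also why the tests in `routes_test.py` expect `list_files` to be called with `limit=201` when the default `limit` of 200 applies.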
--- a/autogpt_platform/backend/backend/api/features/workspace/routes_test.py
+++ b/autogpt_platform/backend/backend/api/features/workspace/routes_test.py
@@ -1,48 +1,28 @@
-"""Tests for workspace file upload and download routes."""
-
 import io
 from datetime import datetime, timezone
+from unittest.mock import AsyncMock, MagicMock, patch

 import fastapi
 import fastapi.testclient
 import pytest
-import pytest_mock

-from backend.api.features.workspace import routes as workspace_routes
-from backend.data.workspace import WorkspaceFile
+from backend.api.features.workspace.routes import router
+from backend.data.workspace import Workspace, WorkspaceFile

 app = fastapi.FastAPI()
-app.include_router(workspace_routes.router)
+app.include_router(router)


 @app.exception_handler(ValueError)
 async def _value_error_handler(
    request: fastapi.Request, exc: ValueError
 ) -> fastapi.responses.JSONResponse:
-    """Mirror the production ValueError → 400 mapping from rest_api.py."""
+    """Mirror the production ValueError → 400 mapping from the REST app."""
    return fastapi.responses.JSONResponse(status_code=400, content={"detail": str(exc)})


 client = fastapi.testclient.TestClient(app)

-TEST_USER_ID = "3e53486c-cf57-477e-ba2a-cb02dc828e1a"
-
-MOCK_WORKSPACE = type("W", (), {"id": "ws-1"})()
-
-_NOW = datetime(2023, 1, 1, tzinfo=timezone.utc)
-
-MOCK_FILE = WorkspaceFile(
-    id="file-aaa-bbb",
-    workspace_id="ws-1",
-    created_at=_NOW,
-    updated_at=_NOW,
-    name="hello.txt",
-    path="/session/hello.txt",
-    mime_type="text/plain",
-    size_bytes=13,
-    storage_path="local://hello.txt",
-)
-

 @pytest.fixture(autouse=True)
 def setup_app_auth(mock_jwt_user):
@@ -53,25 +33,201 @@ def setup_app_auth(mock_jwt_user):
    app.dependency_overrides.clear()


+def _make_workspace(user_id: str = "test-user-id") -> Workspace:
+    return Workspace(
+        id="ws-001",
+        user_id=user_id,
+        created_at=datetime(2026, 1, 1, tzinfo=timezone.utc),
+        updated_at=datetime(2026, 1, 1, tzinfo=timezone.utc),
+    )
+
+
+def _make_file(**overrides) -> WorkspaceFile:
+    defaults = {
+        "id": "file-001",
+        "workspace_id": "ws-001",
+        "created_at": datetime(2026, 1, 1, tzinfo=timezone.utc),
+        "updated_at": datetime(2026, 1, 1, tzinfo=timezone.utc),
+        "name": "test.txt",
+        "path": "/test.txt",
+        "storage_path": "local://test.txt",
+        "mime_type": "text/plain",
+        "size_bytes": 100,
+        "checksum": None,
+        "is_deleted": False,
+        "deleted_at": None,
+        "metadata": {},
+    }
+    defaults.update(overrides)
+    return WorkspaceFile(**defaults)
+
+
+def _make_file_mock(**overrides) -> MagicMock:
+    """Create a mock WorkspaceFile to simulate DB records with null fields."""
+    defaults = {
+        "id": "file-001",
+        "name": "test.txt",
+        "path": "/test.txt",
+        "mime_type": "text/plain",
+        "size_bytes": 100,
+        "metadata": {},
+        "created_at": datetime(2026, 1, 1, tzinfo=timezone.utc),
+    }
+    defaults.update(overrides)
+    mock = MagicMock(spec=WorkspaceFile)
+    for k, v in defaults.items():
+        setattr(mock, k, v)
+    return mock
+
+
+# -- list_workspace_files tests --
+
+
+@patch("backend.api.features.workspace.routes.get_or_create_workspace")
+@patch("backend.api.features.workspace.routes.WorkspaceManager")
+def test_list_files_returns_all_when_no_session(mock_manager_cls, mock_get_workspace):
+    mock_get_workspace.return_value = _make_workspace()
+    files = [
+        _make_file(id="f1", name="a.txt", metadata={"origin": "user-upload"}),
+        _make_file(id="f2", name="b.csv", metadata={"origin": "agent-created"}),
+    ]
+    mock_instance = AsyncMock()
+    mock_instance.list_files.return_value = files
+    mock_manager_cls.return_value = mock_instance
+
+    response = client.get("/files")
+    assert response.status_code == 200
+
+    data = response.json()
+    assert len(data["files"]) == 2
+    assert data["has_more"] is False
+    assert data["offset"] == 0
+    assert data["files"][0]["id"] == "f1"
+    assert data["files"][0]["metadata"] == {"origin": "user-upload"}
+    assert data["files"][1]["id"] == "f2"
+    mock_instance.list_files.assert_called_once_with(
+        limit=201, offset=0, include_all_sessions=True
+    )
+
+
+@patch("backend.api.features.workspace.routes.get_or_create_workspace")
+@patch("backend.api.features.workspace.routes.WorkspaceManager")
+def test_list_files_scopes_to_session_when_provided(
+    mock_manager_cls, mock_get_workspace, test_user_id
+):
+    mock_get_workspace.return_value = _make_workspace(user_id=test_user_id)
+    mock_instance = AsyncMock()
+    mock_instance.list_files.return_value = []
+    mock_manager_cls.return_value = mock_instance
+
+    response = client.get("/files?session_id=sess-123")
+    assert response.status_code == 200
+
+    data = response.json()
+    assert data["files"] == []
+    assert data["has_more"] is False
+    mock_manager_cls.assert_called_once_with(test_user_id, "ws-001", "sess-123")
+    mock_instance.list_files.assert_called_once_with(
+        limit=201, offset=0, include_all_sessions=False
+    )
+
+
+@patch("backend.api.features.workspace.routes.get_or_create_workspace")
+@patch("backend.api.features.workspace.routes.WorkspaceManager")
+def test_list_files_null_metadata_coerced_to_empty_dict(
+    mock_manager_cls, mock_get_workspace
+):
+    """Route uses `f.metadata or {}` for pre-existing files with null metadata."""
+    mock_get_workspace.return_value = _make_workspace()
+    mock_instance = AsyncMock()
+    mock_instance.list_files.return_value = [_make_file_mock(metadata=None)]
+    mock_manager_cls.return_value = mock_instance
+
+    response = client.get("/files")
+    assert response.status_code == 200
+    assert response.json()["files"][0]["metadata"] == {}
+
+
+# -- upload_file metadata tests --
+
+
+@patch("backend.api.features.workspace.routes.get_or_create_workspace")
+@patch("backend.api.features.workspace.routes.get_workspace_total_size")
+@patch("backend.api.features.workspace.routes.scan_content_safe")
+@patch("backend.api.features.workspace.routes.WorkspaceManager")
+def test_upload_passes_user_upload_origin_metadata(
+    mock_manager_cls, mock_scan, mock_total_size, mock_get_workspace
+):
+    mock_get_workspace.return_value = _make_workspace()
+    mock_total_size.return_value = 100
+    written = _make_file(id="new-file", name="doc.pdf")
+    mock_instance = AsyncMock()
+    mock_instance.write_file.return_value = written
+    mock_manager_cls.return_value = mock_instance
+
+    response = client.post(
+        "/files/upload",
+        files={"file": ("doc.pdf", b"fake-pdf-content", "application/pdf")},
+    )
+    assert response.status_code == 200
+
+    mock_instance.write_file.assert_called_once()
+    call_kwargs = mock_instance.write_file.call_args
+    assert call_kwargs.kwargs.get("metadata") == {"origin": "user-upload"}
+
+
+@patch("backend.api.features.workspace.routes.get_or_create_workspace")
+@patch("backend.api.features.workspace.routes.get_workspace_total_size")
+@patch("backend.api.features.workspace.routes.scan_content_safe")
+@patch("backend.api.features.workspace.routes.WorkspaceManager")
+def test_upload_returns_409_on_file_conflict(
+    mock_manager_cls, mock_scan, mock_total_size, mock_get_workspace
+):
+    mock_get_workspace.return_value = _make_workspace()
+    mock_total_size.return_value = 100
+    mock_instance = AsyncMock()
+    mock_instance.write_file.side_effect = ValueError("File already exists at path")
+    mock_manager_cls.return_value = mock_instance
+
+    response = client.post(
+        "/files/upload",
+        files={"file": ("dup.txt", b"content", "text/plain")},
+    )
+    assert response.status_code == 409
+    assert "already exists" in response.json()["detail"]
+
+
+# -- Restored upload/download/delete security + invariant tests --
+
+
 def _upload(
    filename: str = "hello.txt",
    content: bytes = b"Hello, world!",
    content_type: str = "text/plain",
 ):
-    """Helper to POST a file upload."""
    return client.post(
        "/files/upload?session_id=sess-1",
        files={"file": (filename, io.BytesIO(content), content_type)},
    )


-# ---- Happy path ----
+_MOCK_FILE = WorkspaceFile(
+    id="file-aaa-bbb",
+    workspace_id="ws-001",
+    created_at=datetime(2026, 1, 1, tzinfo=timezone.utc),
+    updated_at=datetime(2026, 1, 1, tzinfo=timezone.utc),
+    name="hello.txt",
+    path="/sessions/sess-1/hello.txt",
+    mime_type="text/plain",
+    size_bytes=13,
+    storage_path="local://hello.txt",
+)


-def test_upload_happy_path(mocker: pytest_mock.MockFixture):
+def test_upload_happy_path(mocker):
    mocker.patch(
        "backend.api.features.workspace.routes.get_or_create_workspace",
-        return_value=MOCK_WORKSPACE,
+        return_value=_make_workspace(),
    )
    mocker.patch(
        "backend.api.features.workspace.routes.get_workspace_total_size",
@@ -82,7 +238,7 @@ def test_upload_happy_path(mocker: pytest_mock.MockFixture):
        return_value=None,
    )
    mock_manager = mocker.MagicMock()
-    mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE)
+    mock_manager.write_file = mocker.AsyncMock(return_value=_MOCK_FILE)
    mocker.patch(
        "backend.api.features.workspace.routes.WorkspaceManager",
        return_value=mock_manager,
@@ -96,10 +252,7 @@ def test_upload_happy_path(mocker: pytest_mock.MockFixture):
    assert data["size_bytes"] == 13


-# ---- Per-file size limit ----
-
-
-def test_upload_exceeds_max_file_size(mocker: pytest_mock.MockFixture):
+def test_upload_exceeds_max_file_size(mocker):
    """Files larger than max_file_size_mb should be rejected with 413."""
    cfg = mocker.patch("backend.api.features.workspace.routes.Config")
    cfg.return_value.max_file_size_mb = 0  # 0 MB → any content is too big
@@ -109,15 +262,11 @@ def test_upload_exceeds_max_file_size(mocker: pytest_mock.MockFixture):
    assert response.status_code == 413


-# ---- Storage quota exceeded ----
-
-
-def test_upload_storage_quota_exceeded(mocker: pytest_mock.MockFixture):
+def test_upload_storage_quota_exceeded(mocker):
    mocker.patch(
        "backend.api.features.workspace.routes.get_or_create_workspace",
-        return_value=MOCK_WORKSPACE,
+        return_value=_make_workspace(),
    )
-    # Current usage already at limit
    mocker.patch(
        "backend.api.features.workspace.routes.get_workspace_total_size",
        return_value=500 * 1024 * 1024,
@@ -128,27 +277,22 @@ def test_upload_storage_quota_exceeded(mocker: pytest_mock.MockFixture):
    assert "Storage limit exceeded" in response.text


-# ---- Post-write quota race (B2) ----
-
-
-def test_upload_post_write_quota_race(mocker: pytest_mock.MockFixture):
-    """If a concurrent upload tips the total over the limit after write,
-    the file should be soft-deleted and 413 returned."""
+def test_upload_post_write_quota_race(mocker):
+    """Concurrent upload tipping over limit after write should soft-delete + 413."""
    mocker.patch(
        "backend.api.features.workspace.routes.get_or_create_workspace",
-        return_value=MOCK_WORKSPACE,
+        return_value=_make_workspace(),
    )
-    # Pre-write check passes (under limit), but post-write check fails
    mocker.patch(
        "backend.api.features.workspace.routes.get_workspace_total_size",
-        side_effect=[0, 600 * 1024 * 1024],  # first call OK, second over limit
+        side_effect=[0, 600 * 1024 * 1024],
    )
    mocker.patch(
        "backend.api.features.workspace.routes.scan_content_safe",
        return_value=None,
    )
    mock_manager = mocker.MagicMock()
-    mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE)
+    mock_manager.write_file = mocker.AsyncMock(return_value=_MOCK_FILE)
    mocker.patch(
        "backend.api.features.workspace.routes.WorkspaceManager",
        return_value=mock_manager,
@@ -160,17 +304,14 @@ def test_upload_post_write_quota_race(mocker: pytest_mock.MockFixture):

    response = _upload()
    assert response.status_code == 413
-    mock_delete.assert_called_once_with("file-aaa-bbb", "ws-1")
+    mock_delete.assert_called_once_with("file-aaa-bbb", "ws-001")


-# ---- Any extension accepted (no allowlist) ----
-
-
-def test_upload_any_extension(mocker: pytest_mock.MockFixture):
+def test_upload_any_extension(mocker):
    """Any file extension should be accepted — ClamAV is the security layer."""
    mocker.patch(
        "backend.api.features.workspace.routes.get_or_create_workspace",
-        return_value=MOCK_WORKSPACE,
+        return_value=_make_workspace(),
    )
    mocker.patch(
        "backend.api.features.workspace.routes.get_workspace_total_size",
@@ -181,7 +322,7 @@ def test_upload_any_extension(mocker: pytest_mock.MockFixture):
        return_value=None,
    )
    mock_manager = mocker.MagicMock()
-    mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE)
+    mock_manager.write_file = mocker.AsyncMock(return_value=_MOCK_FILE)
    mocker.patch(
        "backend.api.features.workspace.routes.WorkspaceManager",
        return_value=mock_manager,
@@ -191,16 +332,13 @@ def test_upload_any_extension(mocker: pytest_mock.MockFixture):
    assert response.status_code == 200


-# ---- Virus scan rejection ----
-
-
-def test_upload_blocked_by_virus_scan(mocker: pytest_mock.MockFixture):
+def test_upload_blocked_by_virus_scan(mocker):
    """Files flagged by ClamAV should be rejected and never written to storage."""
    from backend.api.features.store.exceptions import VirusDetectedError

    mocker.patch(
        "backend.api.features.workspace.routes.get_or_create_workspace",
-        return_value=MOCK_WORKSPACE,
+        return_value=_make_workspace(),
    )
    mocker.patch(
        "backend.api.features.workspace.routes.get_workspace_total_size",
@@ -211,7 +349,7 @@ def test_upload_blocked_by_virus_scan(mocker: pytest_mock.MockFixture):
        side_effect=VirusDetectedError("Eicar-Test-Signature"),
    )
    mock_manager = mocker.MagicMock()
-    mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE)
+    mock_manager.write_file = mocker.AsyncMock(return_value=_MOCK_FILE)
    mocker.patch(
        "backend.api.features.workspace.routes.WorkspaceManager",
        return_value=mock_manager,
@@ -219,18 +357,14 @@ def test_upload_blocked_by_virus_scan(mocker: pytest_mock.MockFixture):

    response = _upload(filename="evil.exe", content=b"X5O!P%@AP...")
    assert response.status_code == 400
-    assert "Virus detected" in response.text
    mock_manager.write_file.assert_not_called()


-# ---- No file extension ----
-
-
-def test_upload_file_without_extension(mocker: pytest_mock.MockFixture):
+def test_upload_file_without_extension(mocker):
    """Files without an extension should be accepted and stored as-is."""
    mocker.patch(
        "backend.api.features.workspace.routes.get_or_create_workspace",
-        return_value=MOCK_WORKSPACE,
+        return_value=_make_workspace(),
    )
    mocker.patch(
        "backend.api.features.workspace.routes.get_workspace_total_size",
@@ -241,7 +375,7 @@ def test_upload_file_without_extension(mocker: pytest_mock.MockFixture):
        return_value=None,
    )
    mock_manager = mocker.MagicMock()
-    mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE)
+    mock_manager.write_file = mocker.AsyncMock(return_value=_MOCK_FILE)
    mocker.patch(
        "backend.api.features.workspace.routes.WorkspaceManager",
        return_value=mock_manager,
@@ -257,14 +391,11 @@ def test_upload_file_without_extension(mocker: pytest_mock.MockFixture):
    assert mock_manager.write_file.call_args[0][1] == "Makefile"


-# ---- Filename sanitization (SF5) ----
-
-
-def test_upload_strips_path_components(mocker: pytest_mock.MockFixture):
+def test_upload_strips_path_components(mocker):
    """Path-traversal filenames should be reduced to their basename."""
    mocker.patch(
        "backend.api.features.workspace.routes.get_or_create_workspace",
-        return_value=MOCK_WORKSPACE,
+        return_value=_make_workspace(),
    )
    mocker.patch(
        "backend.api.features.workspace.routes.get_workspace_total_size",
@@ -275,28 +406,23 @@ def test_upload_strips_path_components(mocker: pytest_mock.MockFixture):
        return_value=None,
    )
    mock_manager = mocker.MagicMock()
-    mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE)
+    mock_manager.write_file = mocker.AsyncMock(return_value=_MOCK_FILE)
    mocker.patch(
        "backend.api.features.workspace.routes.WorkspaceManager",
        return_value=mock_manager,
    )

-    # Filename with traversal
    _upload(filename="../../etc/passwd.txt")

-    # write_file should have been called with just the basename
    mock_manager.write_file.assert_called_once()
    call_args = mock_manager.write_file.call_args
    assert call_args[0][1] == "passwd.txt"


-# ---- Download ----
-
-
-def test_download_file_not_found(mocker: pytest_mock.MockFixture):
+def test_download_file_not_found(mocker):
    mocker.patch(
        "backend.api.features.workspace.routes.get_workspace",
-        return_value=MOCK_WORKSPACE,
+        return_value=_make_workspace(),
    )
    mocker.patch(
        "backend.api.features.workspace.routes.get_workspace_file",
@@ -307,14 +433,11 @@ def test_download_file_not_found(mocker: pytest_mock.MockFixture):
    assert response.status_code == 404


-# ---- Delete ----
-
-
-def test_delete_file_success(mocker: pytest_mock.MockFixture):
+def test_delete_file_success(mocker):
    """Deleting an existing file should return {"deleted": true}."""
    mocker.patch(
        "backend.api.features.workspace.routes.get_workspace",
-        return_value=MOCK_WORKSPACE,
+        return_value=_make_workspace(),
    )
    mock_manager = mocker.MagicMock()
    mock_manager.delete_file = mocker.AsyncMock(return_value=True)
@@ -329,11 +452,11 @@ def test_delete_file_success(mocker: pytest_mock.MockFixture):
    mock_manager.delete_file.assert_called_once_with("file-aaa-bbb")


-def test_delete_file_not_found(mocker: pytest_mock.MockFixture):
+def test_delete_file_not_found(mocker):
    """Deleting a non-existent file should return 404."""
    mocker.patch(
        "backend.api.features.workspace.routes.get_workspace",
-        return_value=MOCK_WORKSPACE,
+        return_value=_make_workspace(),
    )
    mock_manager = mocker.MagicMock()
    mock_manager.delete_file = mocker.AsyncMock(return_value=False)
@@ -347,7 +470,7 @@ def test_delete_file_not_found(mocker: pytest_mock.MockFixture):
    assert "File not found" in response.text


-def test_delete_file_no_workspace(mocker: pytest_mock.MockFixture):
+def test_delete_file_no_workspace(mocker):
    """Deleting when user has no workspace should return 404."""
    mocker.patch(
        "backend.api.features.workspace.routes.get_workspace",
@@ -357,3 +480,123 @@ def test_delete_file_no_workspace(mocker: pytest_mock.MockFixture):
    response = client.delete("/files/file-aaa-bbb")
    assert response.status_code == 404
    assert "Workspace not found" in response.text
+
+
+def test_upload_write_file_too_large_returns_413(mocker):
+    """write_file raises ValueError("File too large: …") → must map to 413."""
+    mocker.patch(
+        "backend.api.features.workspace.routes.get_or_create_workspace",
+        return_value=_make_workspace(),
+    )
+    mocker.patch(
+        "backend.api.features.workspace.routes.get_workspace_total_size",
+        return_value=0,
+    )
+    mocker.patch(
+        "backend.api.features.workspace.routes.scan_content_safe",
+        return_value=None,
+    )
+    mock_manager = mocker.MagicMock()
+    mock_manager.write_file = mocker.AsyncMock(
+        side_effect=ValueError("File too large: 900 bytes exceeds 1MB limit")
+    )
+    mocker.patch(
+        "backend.api.features.workspace.routes.WorkspaceManager",
+        return_value=mock_manager,
+    )
+
+    response = _upload()
+    assert response.status_code == 413
+    assert "File too large" in response.text
+
+
+def test_upload_write_file_conflict_returns_409(mocker):
+    """Non-'File too large' ValueErrors from write_file stay as 409."""
+    mocker.patch(
+        "backend.api.features.workspace.routes.get_or_create_workspace",
+        return_value=_make_workspace(),
+    )
+    mocker.patch(
+        "backend.api.features.workspace.routes.get_workspace_total_size",
+        return_value=0,
+    )
+    mocker.patch(
+        "backend.api.features.workspace.routes.scan_content_safe",
+        return_value=None,
+    )
+    mock_manager = mocker.MagicMock()
+    mock_manager.write_file = mocker.AsyncMock(
+        side_effect=ValueError("File already exists at path: /sessions/x/a.txt")
+    )
+    mocker.patch(
+        "backend.api.features.workspace.routes.WorkspaceManager",
+        return_value=mock_manager,
+    )
+
+    response = _upload()
+    assert response.status_code == 409
+    assert "already exists" in response.text
+
+
+@patch("backend.api.features.workspace.routes.get_or_create_workspace")
+@patch("backend.api.features.workspace.routes.WorkspaceManager")
+def test_list_files_has_more_true_when_limit_exceeded(
+    mock_manager_cls, mock_get_workspace
+):
+    """The limit+1 fetch trick must flip has_more=True and trim the page."""
+    mock_get_workspace.return_value = _make_workspace()
+    # Backend was asked for limit+1=3, and returned exactly 3 items.
+    files = [
+        _make_file(id="f1", name="a.txt"),
+        _make_file(id="f2", name="b.txt"),
+        _make_file(id="f3", name="c.txt"),
+    ]
+    mock_instance = AsyncMock()
+    mock_instance.list_files.return_value = files
+    mock_manager_cls.return_value = mock_instance
+
+    response = client.get("/files?limit=2")
+    assert response.status_code == 200
+    data = response.json()
+    assert data["has_more"] is True
+    assert len(data["files"]) == 2
+    assert data["files"][0]["id"] == "f1"
+    assert data["files"][1]["id"] == "f2"
+    mock_instance.list_files.assert_called_once_with(
+        limit=3, offset=0, include_all_sessions=True
+    )
+
+
+@patch("backend.api.features.workspace.routes.get_or_create_workspace")
+@patch("backend.api.features.workspace.routes.WorkspaceManager")
+def test_list_files_has_more_false_when_exactly_page_size(
+    mock_manager_cls, mock_get_workspace
+):
+    """Exactly `limit` rows means we're on the last page — has_more=False."""
+    mock_get_workspace.return_value = _make_workspace()
+    files = [_make_file(id="f1", name="a.txt"), _make_file(id="f2", name="b.txt")]
+    mock_instance = AsyncMock()
+    mock_instance.list_files.return_value = files
+    mock_manager_cls.return_value = mock_instance
+
+    response = client.get("/files?limit=2")
+    assert response.status_code == 200
+    data = response.json()
+    assert data["has_more"] is False
+    assert len(data["files"]) == 2
+
+
+@patch("backend.api.features.workspace.routes.get_or_create_workspace")
+@patch("backend.api.features.workspace.routes.WorkspaceManager")
+def test_list_files_offset_is_echoed_back(mock_manager_cls, mock_get_workspace):
+    mock_get_workspace.return_value = _make_workspace()
+    mock_instance = AsyncMock()
+    mock_instance.list_files.return_value = []
+    mock_manager_cls.return_value = mock_instance
+
+    response = client.get("/files?offset=50&limit=10")
+    assert response.status_code == 200
+    assert response.json()["offset"] == 50
+    mock_instance.list_files.assert_called_once_with(
+        limit=11, offset=50, include_all_sessions=True
+    )
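Reviewer note on the pagination tests above: they expect the route to ask the manager for `limit + 1` rows (201 for the default page, 3 for `limit=2`, 11 for `limit=10`) and to use the extra row only as a "there is more" signal. A minimal sketch of that trick follows. The `paginate` helper is a hypothetical name for illustration; the actual route code is not part of this hunk.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def paginate(rows: Sequence[T], limit: int) -> tuple[list[T], bool]:
    """Trim a limit+1 fetch down to one page and report whether more rows exist.

    `rows` is assumed to be the result of a query issued with `limit + 1`.
    """
    # More than `limit` rows came back, so at least one row lies beyond this page.
    has_more = len(rows) > limit
    # Never hand the sentinel row to the caller.
    return list(rows[:limit]), has_more
```

With this shape, a fetch that returns exactly `limit` rows yields `has_more=False`, which is the case `test_list_files_has_more_false_when_exactly_page_size` pins down.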
--- a/autogpt_platform/backend/backend/api/rest_api.py
+++ b/autogpt_platform/backend/backend/api/rest_api.py
@@ -18,8 +18,8 @@ from prisma.errors import PrismaError

 import backend.api.features.admin.credit_admin_routes
 import backend.api.features.admin.execution_analytics_routes
+import backend.api.features.admin.rate_limit_admin_routes
 import backend.api.features.admin.store_admin_routes
-import backend.api.features.admin.user_admin_routes
 import backend.api.features.builder
 import backend.api.features.builder.routes
 import backend.api.features.chat.routes as chat_routes
@@ -118,6 +118,11 @@ async def lifespan_context(app: fastapi.FastAPI):

    AutoRegistry.patch_integrations()

+    # Register managed credential providers (e.g. AgentMail)
+    from backend.integrations.managed_providers import register_all
+
+    register_all()
+
    await backend.data.block.initialize_blocks()

    await backend.data.user.migrate_and_encrypt_user_integrations()
@@ -211,13 +216,22 @@ instrument_fastapi(
 def handle_internal_http_error(status_code: int = 500, log_error: bool = True):
    def handler(request: fastapi.Request, exc: Exception):
        if log_error:
-            logger.exception(
-                "%s %s failed. Investigate and resolve the underlying issue: %s",
-                request.method,
-                request.url.path,
-                exc,
-                exc_info=exc,
-            )
+            if status_code >= 500:
+                logger.exception(
+                    "%s %s failed. Investigate and resolve the underlying issue: %s",
+                    request.method,
+                    request.url.path,
+                    exc,
+                    exc_info=exc,
+                )
+            else:
+                logger.warning(
+                    "%s %s failed with %d: %s",
+                    request.method,
+                    request.url.path,
+                    status_code,
+                    exc,
+                )

        hint = (
            "Adjust the request and retry."
@@ -267,12 +281,10 @@ async def validation_error_handler(


 app.add_exception_handler(PrismaError, handle_internal_http_error(500))
-app.add_exception_handler(
-    FolderAlreadyExistsError, handle_internal_http_error(409, False)
-)
-app.add_exception_handler(FolderValidationError, handle_internal_http_error(400, False))
-app.add_exception_handler(NotFoundError, handle_internal_http_error(404, False))
-app.add_exception_handler(NotAuthorizedError, handle_internal_http_error(403, False))
+app.add_exception_handler(FolderAlreadyExistsError, handle_internal_http_error(409))
+app.add_exception_handler(FolderValidationError, handle_internal_http_error(400))
+app.add_exception_handler(NotFoundError, handle_internal_http_error(404))
+app.add_exception_handler(NotAuthorizedError, handle_internal_http_error(403))
 app.add_exception_handler(RequestValidationError, validation_error_handler)
 app.add_exception_handler(pydantic.ValidationError, validation_error_handler)
 app.add_exception_handler(MissingConfigError, handle_internal_http_error(503))
@@ -313,9 +325,9 @@ app.include_router(
    prefix="/api/executions",
 )
 app.include_router(
-    backend.api.features.admin.user_admin_routes.router,
+    backend.api.features.admin.rate_limit_admin_routes.router,
    tags=["v2", "admin"],
-    prefix="/api/users",
+    prefix="/api/copilot",
 )
 app.include_router(
    backend.api.features.executions.review.routes.router,
@@ -527,8 +539,11 @@ class AgentServer(backend.util.service.AppProcess):
        user_id: str,
        provider: ProviderName,
        credentials: Credentials,
-    ) -> Credentials:
-        from .features.integrations.router import create_credentials, get_credential
+    ):
+        from backend.api.features.integrations.router import (
+            create_credentials,
+            get_credential,
+        )

        try:
            return await create_credentials(
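Reviewer note on the `handle_internal_http_error` change in this file: the handler now picks the log level from the status code, and the 4xx registrations no longer pass `False` for `log_error`, so client errors are logged as warnings rather than suppressed. A minimal sketch of the level selection, assuming a hypothetical standalone `log_failure` function (the real handler also builds the HTTP response, which is omitted here):

```python
import logging

logger = logging.getLogger("rest_api_sketch")


def log_failure(method: str, path: str, status_code: int, exc: Exception) -> int:
    """Log server errors with a traceback and client errors as one-line warnings.

    Returns the logging level that was used so callers can inspect it.
    """
    if status_code >= 500:
        # Server-side fault: keep the traceback for investigation.
        logger.error("%s %s failed: %s", method, path, exc, exc_info=exc)
        return logging.ERROR
    # Client-side fault: a short warning without the traceback.
    logger.warning("%s %s failed with %d: %s", method, path, status_code, exc)
    return logging.WARNING
```

The split keeps expected 4xx conditions such as `NotFoundError` visible in logs without producing tracebacks for them.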
--- a/autogpt_platform/backend/backend/blocks/_base.py
+++ b/autogpt_platform/backend/backend/blocks/_base.py
@@ -698,13 +698,30 @@ class Block(ABC, Generic[BlockSchemaInputType, BlockSchemaOutputType]):
            if should_pause:
                return

-        # Validate the input data (original or reviewer-modified) once
-        if error := self.input_schema.validate_data(input_data):
-            raise BlockInputError(
-                message=f"Unable to execute block with invalid input data: {error}",
-                block_name=self.name,
-                block_id=self.id,
-            )
+        # Validate the input data (original or reviewer-modified) once.
+        # In dry-run mode, credential fields may contain sentinel None values
+        # that would fail JSON schema required checks.  We still validate the
+        # non-credential fields so blocks that execute for real during dry-run
+        # (e.g. AgentExecutorBlock) get proper input validation.
+        is_dry_run = getattr(kwargs.get("execution_context"), "dry_run", False)
+        if is_dry_run:
+            cred_field_names = set(self.input_schema.get_credentials_fields().keys())
+            non_cred_data = {
+                k: v for k, v in input_data.items() if k not in cred_field_names
+            }
+            if error := self.input_schema.validate_data(non_cred_data):
+                raise BlockInputError(
+                    message=f"Unable to execute block with invalid input data: {error}",
+                    block_name=self.name,
+                    block_id=self.id,
+                )
+        else:
+            if error := self.input_schema.validate_data(input_data):
+                raise BlockInputError(
+                    message=f"Unable to execute block with invalid input data: {error}",
+                    block_name=self.name,
+                    block_id=self.id,
+                )

        # Use the validated input data
        async for output_name, output_data in self.run(
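Reviewer note on the dry-run branch above: credential fields are dropped before validation so placeholder `None` values do not trip the schema, while every other field is still checked. A minimal sketch of that filtering, assuming a hypothetical `validate_inputs` wrapper and a toy validator that rejects `None` values (the real check is `input_schema.validate_data`):

```python
from typing import Any, Callable, Optional

Validator = Callable[[dict[str, Any]], Optional[str]]


def reject_nulls(data: dict[str, Any]) -> Optional[str]:
    """Toy validator: returns an error string if any value is None."""
    bad = sorted(k for k, v in data.items() if v is None)
    return f"null fields: {bad}" if bad else None


def validate_inputs(
    input_data: dict[str, Any],
    credential_fields: set[str],
    validate: Validator,
    dry_run: bool,
) -> Optional[str]:
    """Return a validation error string, or None when the data is acceptable."""
    if dry_run:
        # Skip credential fields only; everything else is still validated.
        input_data = {
            k: v for k, v in input_data.items() if k not in credential_fields
        }
    return validate(input_data)
```

Non-credential errors still surface during a dry run, which matters for blocks that execute for real in that mode.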
--- a/autogpt_platform/backend/backend/blocks/agent.py
+++ b/autogpt_platform/backend/backend/blocks/agent.py
@@ -49,11 +49,17 @@ class AgentExecutorBlock(Block):
        @classmethod
        def get_missing_input(cls, data: BlockInput) -> set[str]:
            required_fields = cls.get_input_schema(data).get("required", [])
-            return set(required_fields) - set(data)
+            # Check against the nested `inputs` dict, not the top-level node
+            # data — required fields like "topic" live inside data["inputs"],
+            # not at data["topic"].
+            provided = data.get("inputs", {})
+            return set(required_fields) - set(provided)

        @classmethod
        def get_mismatch_error(cls, data: BlockInput) -> str | None:
-            return validate_with_jsonschema(cls.get_input_schema(data), data)
+            return validate_with_jsonschema(
+                cls.get_input_schema(data), data.get("inputs", {})
+            )

    class Output(BlockSchema):
        # Use BlockSchema to avoid automatic error field that could clash with graph outputs
@@ -88,6 +94,7 @@ class AgentExecutorBlock(Block):
            execution_context=execution_context.model_copy(
                update={"parent_execution_id": graph_exec_id},
            ),
+            dry_run=execution_context.dry_run,
        )

        logger = execution_utils.LogMetadata(
@@ -149,14 +156,19 @@ class AgentExecutorBlock(Block):
                ExecutionStatus.TERMINATED,
                ExecutionStatus.FAILED,
            ]:
-                logger.debug(
-                    f"Execution {log_id} received event {event.event_type} with status {event.status}"
+                logger.info(
+                    f"Execution {log_id} skipping event {event.event_type} status={event.status} "
+                    f"node={getattr(event, 'node_exec_id', '?')}"
                )
                continue

            if event.event_type == ExecutionEventType.GRAPH_EXEC_UPDATE:
                # If the graph execution is COMPLETED, TERMINATED, or FAILED,
                # we can stop listening for further events.
+                logger.info(
+                    f"Execution {log_id} graph finished with status {event.status}, "
+                    f"yielded {len(yielded_node_exec_ids)} outputs"
+                )
                self.merge_stats(
                    NodeExecutionStats(
                        extra_cost=event.stats.cost if event.stats else 0,
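Reviewer note on the `get_missing_input` fix in this file: required fields are now compared against the nested `inputs` dict instead of the top-level node data. A minimal sketch of the difference, assuming a hypothetical `missing_inputs` function and example field names. The `or {}` fallback for a null `inputs` value is an addition of this sketch; the diff uses `data.get("inputs", {})`.

```python
from typing import Any


def missing_inputs(schema: dict[str, Any], data: dict[str, Any]) -> set[str]:
    """Return required field names that are absent from data["inputs"]."""
    required = schema.get("required", [])
    # Required fields live inside the nested dict, not at the top level.
    provided = data.get("inputs") or {}
    return set(required) - set(provided)
```

A node whose data carries `topic` only at the top level is reported as missing it, while the same key inside `inputs` satisfies the schema.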
--- a/autogpt_platform/backend/backend/blocks/agent_mail/_config.py
+++ b/autogpt_platform/backend/backend/blocks/agent_mail/_config.py
@@ -0,0 +1,33 @@
+"""
+Shared configuration for all AgentMail blocks.
+"""
+
+from agentmail import AsyncAgentMail
+
+from backend.sdk import APIKeyCredentials, ProviderBuilder, SecretStr
+
+agent_mail = (
+    ProviderBuilder("agent_mail")
+    .with_api_key("AGENTMAIL_API_KEY", "AgentMail API Key")
+    .build()
+)
+
+TEST_CREDENTIALS = APIKeyCredentials(
+    id="01234567-89ab-cdef-0123-456789abcdef",
+    provider="agent_mail",
+    title="Mock AgentMail API Key",
+    api_key=SecretStr("mock-agentmail-api-key"),
+    expires_at=None,
+)
+
+TEST_CREDENTIALS_INPUT = {
+    "id": TEST_CREDENTIALS.id,
+    "provider": TEST_CREDENTIALS.provider,
+    "type": TEST_CREDENTIALS.type,
+    "title": TEST_CREDENTIALS.title,
+}
+
+
+def _client(credentials: APIKeyCredentials) -> AsyncAgentMail:
+    """Create an AsyncAgentMail client from credentials."""
+    return AsyncAgentMail(api_key=credentials.api_key.get_secret_value())
--- a/autogpt_platform/backend/backend/blocks/agent_mail/attachments.py
+++ b/autogpt_platform/backend/backend/blocks/agent_mail/attachments.py
@@ -0,0 +1,211 @@
+"""
+AgentMail Attachment blocks — download file attachments from messages and threads.
+
+Attachments are files associated with messages (PDFs, CSVs, images, etc.).
+To send attachments, include them in the attachments parameter when using
+AgentMailSendMessageBlock or AgentMailReplyToMessageBlock.
+
+To download, first get the attachment_id from a message's attachments array,
+then use these blocks to retrieve the file content as base64.
+"""
+
+import base64
+
+from backend.sdk import (
+    APIKeyCredentials,
+    Block,
+    BlockCategory,
+    BlockOutput,
+    BlockSchemaInput,
+    BlockSchemaOutput,
+    CredentialsMetaInput,
+    SchemaField,
+)
+
+from ._config import TEST_CREDENTIALS, TEST_CREDENTIALS_INPUT, _client, agent_mail
+
+
+class AgentMailGetMessageAttachmentBlock(Block):
+    """
+    Download a file attachment from a specific email message.
+
+    Retrieves the raw file content and returns it as base64-encoded data.
+    First get the attachment_id from a message object's attachments array,
+    then use this block to download the file.
+    """
+
+    class Input(BlockSchemaInput):
+        credentials: CredentialsMetaInput = agent_mail.credentials_field(
+            description="AgentMail API key from https://console.agentmail.to"
+        )
+        inbox_id: str = SchemaField(
+            description="Inbox ID or email address the message belongs to"
+        )
+        message_id: str = SchemaField(
+            description="Message ID containing the attachment"
+        )
+        attachment_id: str = SchemaField(
+            description="Attachment ID to download (from the message's attachments array)"
+        )
+
+    class Output(BlockSchemaOutput):
+        content_base64: str = SchemaField(
+            description="File content encoded as a base64 string. Decode with base64.b64decode() to get raw bytes."
+        )
+        attachment_id: str = SchemaField(
+            description="The attachment ID that was downloaded"
+        )
+        error: str = SchemaField(description="Error message if the operation failed")
+
+    def __init__(self):
+        super().__init__(
+            id="a283ffc4-8087-4c3d-9135-8f26b86742ec",
+            description="Download a file attachment from an email message. Returns base64-encoded file content.",
+            categories={BlockCategory.COMMUNICATION},
+            input_schema=self.Input,
+            output_schema=self.Output,
+            test_credentials=TEST_CREDENTIALS,
+            test_input={
+                "credentials": TEST_CREDENTIALS_INPUT,
+                "inbox_id": "test-inbox",
+                "message_id": "test-msg",
+                "attachment_id": "test-attach",
+            },
+            test_output=[
+                ("content_base64", "dGVzdA=="),
+                ("attachment_id", "test-attach"),
+            ],
+            test_mock={
+                "get_attachment": lambda *a, **kw: b"test",
+            },
+        )
+
+    @staticmethod
+    async def get_attachment(
+        credentials: APIKeyCredentials,
+        inbox_id: str,
+        message_id: str,
+        attachment_id: str,
+    ):
+        client = _client(credentials)
+        return await client.inboxes.messages.get_attachment(
+            inbox_id=inbox_id,
+            message_id=message_id,
+            attachment_id=attachment_id,
+        )
+
+    async def run(
+        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
+    ) -> BlockOutput:
+        try:
+            data = await self.get_attachment(
+                credentials=credentials,
+                inbox_id=input_data.inbox_id,
+                message_id=input_data.message_id,
+                attachment_id=input_data.attachment_id,
+            )
+            if isinstance(data, bytes):
+                encoded = base64.b64encode(data).decode()
+            elif isinstance(data, str):
+                encoded = base64.b64encode(data.encode("utf-8")).decode()
+            else:
+                raise TypeError(
+                    f"Unexpected attachment data type: {type(data).__name__}"
+                )
+
+            yield "content_base64", encoded
+            yield "attachment_id", input_data.attachment_id
+        except Exception as e:
+            yield "error", str(e)
+
+
+class AgentMailGetThreadAttachmentBlock(Block):
+    """
+    Download a file attachment from a conversation thread.
+
+    Same as AgentMailGetMessageAttachmentBlock, but looks up the attachment
+    by thread ID instead of message ID. Useful when you know the thread but
+    not the specific message containing the attachment.
+    """
+
+    class Input(BlockSchemaInput):
+        credentials: CredentialsMetaInput = agent_mail.credentials_field(
+            description="AgentMail API key from https://console.agentmail.to"
+        )
+        inbox_id: str = SchemaField(
+            description="Inbox ID or email address the thread belongs to"
+        )
+        thread_id: str = SchemaField(description="Thread ID containing the attachment")
+        attachment_id: str = SchemaField(
+            description="Attachment ID to download (from a message's attachments array within the thread)"
+        )
+
+    class Output(BlockSchemaOutput):
+        content_base64: str = SchemaField(
+            description="File content encoded as a base64 string. Decode with base64.b64decode() to get raw bytes."
+        )
+        attachment_id: str = SchemaField(
+            description="The attachment ID that was downloaded"
+        )
+        error: str = SchemaField(description="Error message if the operation failed")
+
+    def __init__(self):
+        super().__init__(
+            id="06b6a4c4-9d71-4992-9e9c-cf3b352763b5",
+            description="Download a file attachment from a conversation thread. Returns base64-encoded file content.",
+            categories={BlockCategory.COMMUNICATION},
+            input_schema=self.Input,
+            output_schema=self.Output,
+            test_credentials=TEST_CREDENTIALS,
+            test_input={
+                "credentials": TEST_CREDENTIALS_INPUT,
+                "inbox_id": "test-inbox",
+                "thread_id": "test-thread",
+                "attachment_id": "test-attach",
+            },
+            test_output=[
+                ("content_base64", "dGVzdA=="),
+                ("attachment_id", "test-attach"),
+            ],
+            test_mock={
+                "get_attachment": lambda *a, **kw: b"test",
+            },
+        )
+
+    @staticmethod
+    async def get_attachment(
+        credentials: APIKeyCredentials,
+        inbox_id: str,
+        thread_id: str,
+        attachment_id: str,
+    ):
+        client = _client(credentials)
+        return await client.inboxes.threads.get_attachment(
+            inbox_id=inbox_id,
+            thread_id=thread_id,
+            attachment_id=attachment_id,
+        )
+
+    async def run(
+        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
+    ) -> BlockOutput:
+        try:
+            data = await self.get_attachment(
+                credentials=credentials,
+                inbox_id=input_data.inbox_id,
+                thread_id=input_data.thread_id,
+                attachment_id=input_data.attachment_id,
+            )
+            if isinstance(data, bytes):
+                encoded = base64.b64encode(data).decode()
+            elif isinstance(data, str):
+                encoded = base64.b64encode(data.encode("utf-8")).decode()
+            else:
+                raise TypeError(
+                    f"Unexpected attachment data type: {type(data).__name__}"
+                )
+
+            yield "content_base64", encoded
+            yield "attachment_id", input_data.attachment_id
+        except Exception as e:
+            yield "error", str(e)
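Note on the two attachment blocks above: both yield `content_base64`, and the output description says to decode it with `base64.b64decode()`. The following is a minimal sketch of that round trip, mirroring the bytes/str branching in `run()`. The helper names `encode_attachment` and `decode_attachment` are hypothetical and not part of this PR.

```python
import base64
from typing import Union


def encode_attachment(data: Union[bytes, str]) -> str:
    """Encode attachment data the same way the blocks' run() methods do."""
    if isinstance(data, bytes):
        return base64.b64encode(data).decode()
    if isinstance(data, str):
        # Text payloads are encoded as UTF-8 before base64-encoding.
        return base64.b64encode(data.encode("utf-8")).decode()
    raise TypeError(f"Unexpected attachment data type: {type(data).__name__}")


def decode_attachment(content_base64: str) -> bytes:
    """Recover the raw bytes from a block's content_base64 output."""
    return base64.b64decode(content_base64)
```

A downstream consumer would call `decode_attachment` on the block output before writing the file to disk. Note that a `str` payload comes back as UTF-8 bytes, not as the original `str`.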
--- a/autogpt_platform/backend/backend/blocks/agent_mail/drafts.py
+++ b/autogpt_platform/backend/backend/blocks/agent_mail/drafts.py
@@ -0,0 +1,678 @@
+"""
+AgentMail Draft blocks — create, get, list, update, send, and delete drafts.
+
+A Draft is an unsent message that can be reviewed, edited, and sent later.
+Drafts enable human-in-the-loop review, scheduled sending (via send_at),
+and complex multi-step email composition workflows.
+"""
+
+from typing import Optional
+
+from backend.sdk import (
+    APIKeyCredentials,
+    Block,
+    BlockCategory,
+    BlockOutput,
+    BlockSchemaInput,
+    BlockSchemaOutput,
+    CredentialsMetaInput,
+    SchemaField,
+)
+
+from ._config import TEST_CREDENTIALS, TEST_CREDENTIALS_INPUT, _client, agent_mail
+
+
+class AgentMailCreateDraftBlock(Block):
+    """
+    Create a draft email in an AgentMail inbox for review or scheduled sending.
+
+    Drafts let agents prepare emails without sending immediately. Use send_at
+    to schedule automatic sending at a future time (ISO 8601 format).
+    Scheduled drafts are auto-labeled 'scheduled' and can be cancelled by
+    deleting the draft.
+    """
+
+    class Input(BlockSchemaInput):
+        credentials: CredentialsMetaInput = agent_mail.credentials_field(
+            description="AgentMail API key from https://console.agentmail.to"
+        )
+        inbox_id: str = SchemaField(
+            description="Inbox ID or email address to create the draft in"
+        )
+        to: list[str] = SchemaField(
+            description="Recipient email addresses (e.g. ['user@example.com'])"
+        )
+        subject: str = SchemaField(description="Email subject line", default="")
+        text: str = SchemaField(description="Plain text body of the draft", default="")
+        html: str = SchemaField(
+            description="Rich HTML body of the draft", default="", advanced=True
+        )
+        cc: list[str] = SchemaField(
+            description="CC recipient email addresses",
+            default_factory=list,
+            advanced=True,
+        )
+        bcc: list[str] = SchemaField(
+            description="BCC recipient email addresses",
+            default_factory=list,
+            advanced=True,
+        )
+        in_reply_to: str = SchemaField(
+            description="Message ID this draft replies to, for threading follow-up drafts",
+            default="",
+            advanced=True,
+        )
+        send_at: str = SchemaField(
+            description="Schedule automatic sending at this ISO 8601 datetime (e.g. '2025-01-15T09:00:00Z'). Leave empty for manual send.",
+            default="",
+            advanced=True,
+        )
+
+    class Output(BlockSchemaOutput):
+        draft_id: str = SchemaField(
+            description="Unique identifier of the created draft"
+        )
+        send_status: str = SchemaField(
+            description="Scheduled send status: 'scheduled' if send_at was set, empty otherwise. Possible values: 'scheduled', 'sending', 'failed'.",
+            default="",
+        )
+        result: dict = SchemaField(
+            description="Complete draft object with all metadata"
+        )
+        error: str = SchemaField(description="Error message if the operation failed")
+
+    def __init__(self):
+        super().__init__(
+            id="25ac9086-69fd-48b8-b910-9dbe04b8f3bd",
+            description="Create a draft email for review or scheduled sending. Use send_at for automatic future delivery.",
+            categories={BlockCategory.COMMUNICATION},
+            input_schema=self.Input,
+            output_schema=self.Output,
+            test_credentials=TEST_CREDENTIALS,
+            test_input={
+                "credentials": TEST_CREDENTIALS_INPUT,
+                "inbox_id": "test-inbox",
+                "to": ["user@example.com"],
+            },
+            test_output=[
+                ("draft_id", "mock-draft-id"),
+                ("send_status", ""),
+                ("result", dict),
+            ],
+            test_mock={
+                "create_draft": lambda *a, **kw: type(
+                    "Draft",
+                    (),
+                    {
+                        "draft_id": "mock-draft-id",
+                        "send_status": "",
+                        "model_dump": lambda self: {"draft_id": "mock-draft-id"},
+                    },
+                )(),
+            },
+        )
+
+    @staticmethod
+    async def create_draft(credentials: APIKeyCredentials, inbox_id: str, **params):
+        client = _client(credentials)
+        return await client.inboxes.drafts.create(inbox_id, **params)
+
+    async def run(
+        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
+    ) -> BlockOutput:
+        try:
+            params: dict = {"to": input_data.to}
+            if input_data.subject:
+                params["subject"] = input_data.subject
+            if input_data.text:
+                params["text"] = input_data.text
+            if input_data.html:
+                params["html"] = input_data.html
+            if input_data.cc:
+                params["cc"] = input_data.cc
+            if input_data.bcc:
+                params["bcc"] = input_data.bcc
+            if input_data.in_reply_to:
+                params["in_reply_to"] = input_data.in_reply_to
+            if input_data.send_at:
+                params["send_at"] = input_data.send_at
+
+            draft = await self.create_draft(credentials, input_data.inbox_id, **params)
+            result = draft.model_dump()
+
+            yield "draft_id", draft.draft_id
+            yield "send_status", draft.send_status or ""
+            yield "result", result
+        except Exception as e:
+            yield "error", str(e)
+
+
+class AgentMailGetDraftBlock(Block):
+    """
+    Retrieve a specific draft from an AgentMail inbox.
+
+    Returns the draft contents including recipients, subject, body, and
+    scheduled send status. Use this to review a draft before approving it.
+    """
+
+    class Input(BlockSchemaInput):
+        credentials: CredentialsMetaInput = agent_mail.credentials_field(
+            description="AgentMail API key from https://console.agentmail.to"
+        )
+        inbox_id: str = SchemaField(
+            description="Inbox ID or email address the draft belongs to"
+        )
+        draft_id: str = SchemaField(description="Draft ID to retrieve")
+
+    class Output(BlockSchemaOutput):
+        draft_id: str = SchemaField(description="Unique identifier of the draft")
+        subject: str = SchemaField(description="Draft subject line", default="")
+        send_status: str = SchemaField(
+            description="Scheduled send status: 'scheduled', 'sending', 'failed', or empty",
+            default="",
+        )
+        send_at: str = SchemaField(
+            description="Scheduled send time (ISO 8601) if set", default=""
+        )
+        result: dict = SchemaField(description="Complete draft object with all fields")
+        error: str = SchemaField(description="Error message if the operation failed")
+
+    def __init__(self):
+        super().__init__(
+            id="8e57780d-dc25-43d4-a0f4-1f02877b09fb",
+            description="Retrieve a draft email to review its contents, recipients, and scheduled send status.",
+            categories={BlockCategory.COMMUNICATION},
+            input_schema=self.Input,
+            output_schema=self.Output,
+            test_credentials=TEST_CREDENTIALS,
+            test_input={
+                "credentials": TEST_CREDENTIALS_INPUT,
+                "inbox_id": "test-inbox",
+                "draft_id": "test-draft",
+            },
+            test_output=[
+                ("draft_id", "test-draft"),
+                ("subject", ""),
+                ("send_status", ""),
+                ("send_at", ""),
+                ("result", dict),
+            ],
+            test_mock={
+                "get_draft": lambda *a, **kw: type(
+                    "Draft",
+                    (),
+                    {
+                        "draft_id": "test-draft",
+                        "subject": "",
+                        "send_status": "",
+                        "send_at": "",
+                        "model_dump": lambda self: {"draft_id": "test-draft"},
+                    },
+                )(),
+            },
+        )
+
+    @staticmethod
+    async def get_draft(credentials: APIKeyCredentials, inbox_id: str, draft_id: str):
+        client = _client(credentials)
+        return await client.inboxes.drafts.get(inbox_id=inbox_id, draft_id=draft_id)
+
+    async def run(
+        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
+    ) -> BlockOutput:
+        try:
+            draft = await self.get_draft(
+                credentials, input_data.inbox_id, input_data.draft_id
+            )
+            result = draft.model_dump()
+
+            yield "draft_id", draft.draft_id
+            yield "subject", draft.subject or ""
+            yield "send_status", draft.send_status or ""
+            yield "send_at", draft.send_at or ""
+            yield "result", result
+        except Exception as e:
+            yield "error", str(e)
+
+
+class AgentMailListDraftsBlock(Block):
+    """
+    List all drafts in an AgentMail inbox with optional label filtering.
+
+    Use labels=['scheduled'] to find all drafts queued for future sending.
+    Useful for building approval dashboards or monitoring pending outreach.
+    """
+
+    class Input(BlockSchemaInput):
+        credentials: CredentialsMetaInput = agent_mail.credentials_field(
+            description="AgentMail API key from https://console.agentmail.to"
+        )
+        inbox_id: str = SchemaField(
+            description="Inbox ID or email address to list drafts from"
+        )
+        limit: int = SchemaField(
+            description="Maximum number of drafts to return per page (1-100)",
+            default=20,
+            advanced=True,
+        )
+        page_token: str = SchemaField(
+            description="Token from a previous response to fetch the next page",
+            default="",
+            advanced=True,
+        )
+        labels: list[str] = SchemaField(
+            description="Filter drafts by labels (e.g. ['scheduled'] for pending sends)",
+            default_factory=list,
+            advanced=True,
+        )
+
+    class Output(BlockSchemaOutput):
+        drafts: list[dict] = SchemaField(
+            description="List of draft objects with subject, recipients, send_status, etc."
+        )
+        count: int = SchemaField(description="Number of drafts returned")
+        next_page_token: str = SchemaField(
+            description="Token for the next page. Empty if no more results.",
+            default="",
+        )
+        error: str = SchemaField(description="Error message if the operation failed")
+
+    def __init__(self):
+        super().__init__(
+            id="e84883b7-7c39-4c5c-88e8-0a72b078ea63",
+            description="List drafts in an AgentMail inbox. Filter by labels=['scheduled'] to find pending sends.",
+            categories={BlockCategory.COMMUNICATION},
+            input_schema=self.Input,
+            output_schema=self.Output,
+            test_credentials=TEST_CREDENTIALS,
+            test_input={
+                "credentials": TEST_CREDENTIALS_INPUT,
+                "inbox_id": "test-inbox",
+            },
+            test_output=[
+                ("drafts", []),
+                ("count", 0),
+                ("next_page_token", ""),
+            ],
+            test_mock={
+                "list_drafts": lambda *a, **kw: type(
+                    "Resp",
+                    (),
+                    {
+                        "drafts": [],
+                        "count": 0,
+                        "next_page_token": "",
+                    },
+                )(),
+            },
+        )
+
+    @staticmethod
+    async def list_drafts(credentials: APIKeyCredentials, inbox_id: str, **params):
+        client = _client(credentials)
+        return await client.inboxes.drafts.list(inbox_id, **params)
+
+    async def run(
+        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
+    ) -> BlockOutput:
+        try:
+            params: dict = {"limit": input_data.limit}
+            if input_data.page_token:
+                params["page_token"] = input_data.page_token
+            if input_data.labels:
+                params["labels"] = input_data.labels
+
+            response = await self.list_drafts(
+                credentials, input_data.inbox_id, **params
+            )
+            drafts = [d.model_dump() for d in response.drafts]
+
+            yield "drafts", drafts
+            yield "count", response.count
+            yield "next_page_token", response.next_page_token or ""
+        except Exception as e:
+            yield "error", str(e)
+
+
+class AgentMailUpdateDraftBlock(Block):
+    """
+    Update an existing draft's content, recipients, or scheduled send time.
+
+    Use this to reschedule a draft (change send_at), modify recipients,
+    or edit the subject/body before sending. To cancel a scheduled send,
+    delete the draft instead.
+    """
+
+    class Input(BlockSchemaInput):
+        credentials: CredentialsMetaInput = agent_mail.credentials_field(
+            description="AgentMail API key from https://console.agentmail.to"
+        )
+        inbox_id: str = SchemaField(
+            description="Inbox ID or email address the draft belongs to"
+        )
+        draft_id: str = SchemaField(description="Draft ID to update")
+        to: Optional[list[str]] = SchemaField(
+            description="Updated recipient email addresses (replaces existing list). Omit to keep current value.",
+            default=None,
+        )
+        subject: Optional[str] = SchemaField(
+            description="Updated subject line. Omit to keep current value.",
+            default=None,
+        )
+        text: Optional[str] = SchemaField(
+            description="Updated plain text body. Omit to keep current value.",
+            default=None,
+        )
+        html: Optional[str] = SchemaField(
+            description="Updated HTML body. Omit to keep current value.",
+            default=None,
+            advanced=True,
+        )
+        send_at: Optional[str] = SchemaField(
+            description="Reschedule: new ISO 8601 send time (e.g. '2025-01-20T14:00:00Z'). Omit to keep current value.",
+            default=None,
+            advanced=True,
+        )
+
+    class Output(BlockSchemaOutput):
+        draft_id: str = SchemaField(description="The updated draft ID")
+        send_status: str = SchemaField(description="Updated send status", default="")
+        result: dict = SchemaField(description="Complete updated draft object")
+        error: str = SchemaField(description="Error message if the operation failed")
+
+    def __init__(self):
+        super().__init__(
+            id="351f6e51-695a-421a-9032-46a587b10336",
+            description="Update a draft's content, recipients, or scheduled send time. Use to reschedule or edit before sending.",
+            categories={BlockCategory.COMMUNICATION},
+            input_schema=self.Input,
+            output_schema=self.Output,
+            test_credentials=TEST_CREDENTIALS,
+            test_input={
+                "credentials": TEST_CREDENTIALS_INPUT,
+                "inbox_id": "test-inbox",
+                "draft_id": "test-draft",
+            },
+            test_output=[
+                ("draft_id", "test-draft"),
+                ("send_status", ""),
+                ("result", dict),
+            ],
+            test_mock={
+                "update_draft": lambda *a, **kw: type(
+                    "Draft",
+                    (),
+                    {
+                        "draft_id": "test-draft",
+                        "send_status": "",
+                        "model_dump": lambda self: {"draft_id": "test-draft"},
+                    },
+                )(),
+            },
+        )
+
+    @staticmethod
+    async def update_draft(
+        credentials: APIKeyCredentials, inbox_id: str, draft_id: str, **params
+    ):
+        client = _client(credentials)
+        return await client.inboxes.drafts.update(
+            inbox_id=inbox_id, draft_id=draft_id, **params
+        )
+
+    async def run(
+        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
+    ) -> BlockOutput:
+        try:
+            params: dict = {}
+            if input_data.to is not None:
+                params["to"] = input_data.to
+            if input_data.subject is not None:
+                params["subject"] = input_data.subject
+            if input_data.text is not None:
+                params["text"] = input_data.text
+            if input_data.html is not None:
+                params["html"] = input_data.html
+            if input_data.send_at is not None:
+                params["send_at"] = input_data.send_at
+
+            draft = await self.update_draft(
+                credentials, input_data.inbox_id, input_data.draft_id, **params
+            )
+            result = draft.model_dump()
+
+            yield "draft_id", draft.draft_id
+            yield "send_status", draft.send_status or ""
+            yield "result", result
+        except Exception as e:
+            yield "error", str(e)
+
+
+class AgentMailSendDraftBlock(Block):
+    """
+    Send a draft immediately, converting it into a delivered message.
+
+    After a successful send, the draft is deleted and its content becomes a
+    regular message with a message_id. Use this for human-in-the-loop approval
+    workflows: agent creates draft, human reviews, then this block sends it.
+    """
+
+    class Input(BlockSchemaInput):
+        credentials: CredentialsMetaInput = agent_mail.credentials_field(
+            description="AgentMail API key from https://console.agentmail.to"
+        )
+        inbox_id: str = SchemaField(
+            description="Inbox ID or email address the draft belongs to"
+        )
+        draft_id: str = SchemaField(description="Draft ID to send now")
+
+    class Output(BlockSchemaOutput):
+        message_id: str = SchemaField(
+            description="Message ID of the now-sent email (draft is deleted)"
+        )
+        thread_id: str = SchemaField(
+            description="Thread ID the sent message belongs to"
+        )
+        result: dict = SchemaField(description="Complete sent message object")
+        error: str = SchemaField(description="Error message if the operation failed")
+
+    def __init__(self):
+        super().__init__(
+            id="37c39e83-475d-4b3d-843a-d923d001b85a",
+            description="Send a draft immediately, converting it into a delivered message. The draft is deleted after sending.",
+            categories={BlockCategory.COMMUNICATION},
+            input_schema=self.Input,
+            output_schema=self.Output,
+            is_sensitive_action=True,
+            test_credentials=TEST_CREDENTIALS,
+            test_input={
+                "credentials": TEST_CREDENTIALS_INPUT,
+                "inbox_id": "test-inbox",
+                "draft_id": "test-draft",
+            },
+            test_output=[
+                ("message_id", "mock-msg-id"),
+                ("thread_id", "mock-thread-id"),
+                ("result", dict),
+            ],
+            test_mock={
+                "send_draft": lambda *a, **kw: type(
+                    "Msg",
+                    (),
+                    {
+                        "message_id": "mock-msg-id",
+                        "thread_id": "mock-thread-id",
+                        "model_dump": lambda self: {"message_id": "mock-msg-id"},
+                    },
+                )(),
+            },
+        )
+
+    @staticmethod
+    async def send_draft(credentials: APIKeyCredentials, inbox_id: str, draft_id: str):
+        client = _client(credentials)
+        return await client.inboxes.drafts.send(inbox_id=inbox_id, draft_id=draft_id)
+
+    async def run(
+        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
+    ) -> BlockOutput:
+        try:
+            msg = await self.send_draft(
+                credentials, input_data.inbox_id, input_data.draft_id
+            )
+            result = msg.model_dump()
+
+            yield "message_id", msg.message_id
+            yield "thread_id", msg.thread_id or ""
+            yield "result", result
+        except Exception as e:
+            yield "error", str(e)
+
+
+class AgentMailDeleteDraftBlock(Block):
+    """
+    Delete a draft from an AgentMail inbox. Also cancels any scheduled send.
+
+    If the draft was scheduled with send_at, deleting it cancels the
+    scheduled delivery. This is the way to cancel a scheduled email.
+    """
+
+    class Input(BlockSchemaInput):
+        credentials: CredentialsMetaInput = agent_mail.credentials_field(
+            description="AgentMail API key from https://console.agentmail.to"
+        )
+        inbox_id: str = SchemaField(
+            description="Inbox ID or email address the draft belongs to"
+        )
+        draft_id: str = SchemaField(
+            description="Draft ID to delete (also cancels scheduled sends)"
+        )
+
+    class Output(BlockSchemaOutput):
+        success: bool = SchemaField(
+            description="True if the draft was successfully deleted/cancelled"
+        )
+        error: str = SchemaField(description="Error message if the operation failed")
+
+    def __init__(self):
+        super().__init__(
+            id="9023eb99-3e2f-4def-808b-d9c584b3d9e7",
+            description="Delete a draft or cancel a scheduled email. Removes the draft permanently.",
+            categories={BlockCategory.COMMUNICATION},
+            input_schema=self.Input,
+            output_schema=self.Output,
+            is_sensitive_action=True,
+            test_credentials=TEST_CREDENTIALS,
+            test_input={
+                "credentials": TEST_CREDENTIALS_INPUT,
+                "inbox_id": "test-inbox",
+                "draft_id": "test-draft",
+            },
+            test_output=[("success", True)],
+            test_mock={
+                "delete_draft": lambda *a, **kw: None,
+            },
+        )
+
+    @staticmethod
+    async def delete_draft(
+        credentials: APIKeyCredentials, inbox_id: str, draft_id: str
+    ):
+        client = _client(credentials)
+        await client.inboxes.drafts.delete(inbox_id=inbox_id, draft_id=draft_id)
+
+    async def run(
+        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
+    ) -> BlockOutput:
+        try:
+            await self.delete_draft(
+                credentials, input_data.inbox_id, input_data.draft_id
+            )
+            yield "success", True
+        except Exception as e:
+            yield "error", str(e)
+
+
+class AgentMailListOrgDraftsBlock(Block):
+    """
+    List all drafts across every inbox in your organization.
+
+    Returns drafts from all inboxes in one query. Useful for building
+    a central approval dashboard where a human supervisor can review
+    and approve any draft created by any agent.
+    """
+
+    class Input(BlockSchemaInput):
+        credentials: CredentialsMetaInput = agent_mail.credentials_field(
+            description="AgentMail API key from https://console.agentmail.to"
+        )
+        limit: int = SchemaField(
+            description="Maximum number of drafts to return per page (1-100)",
+            default=20,
+            advanced=True,
+        )
+        page_token: str = SchemaField(
+            description="Token from a previous response to fetch the next page",
+            default="",
+            advanced=True,
+        )
+
+    class Output(BlockSchemaOutput):
+        drafts: list[dict] = SchemaField(
+            description="List of draft objects from all inboxes in the organization"
+        )
+        count: int = SchemaField(description="Number of drafts returned")
+        next_page_token: str = SchemaField(
+            description="Token for the next page. Empty if no more results.",
+            default="",
+        )
+        error: str = SchemaField(description="Error message if the operation failed")
+
+    def __init__(self):
+        super().__init__(
+            id="ed7558ae-3a07-45f5-af55-a25fe88c9971",
+            description="List all drafts across every inbox in your organization. Use for central approval dashboards.",
+            categories={BlockCategory.COMMUNICATION},
+            input_schema=self.Input,
+            output_schema=self.Output,
+            test_credentials=TEST_CREDENTIALS,
+            test_input={"credentials": TEST_CREDENTIALS_INPUT},
+            test_output=[
+                ("drafts", []),
+                ("count", 0),
+                ("next_page_token", ""),
+            ],
+            test_mock={
+                "list_org_drafts": lambda *a, **kw: type(
+                    "Resp",
+                    (),
+                    {
+                        "drafts": [],
+                        "count": 0,
+                        "next_page_token": "",
+                    },
+                )(),
+            },
+        )
+
+    @staticmethod
+    async def list_org_drafts(credentials: APIKeyCredentials, **params):
+        client = _client(credentials)
+        return await client.drafts.list(**params)
+
+    async def run(
+        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
+    ) -> BlockOutput:
+        try:
+            params: dict = {"limit": input_data.limit}
+            if input_data.page_token:
+                params["page_token"] = input_data.page_token
+
+            response = await self.list_org_drafts(credentials, **params)
+            drafts = [d.model_dump() for d in response.drafts]
+
+            yield "drafts", drafts
+            yield "count", response.count
+            yield "next_page_token", response.next_page_token or ""
+        except Exception as e:
+            yield "error", str(e)
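Note on parameter handling in the draft blocks above: `AgentMailCreateDraftBlock` drops any falsy optional field (`if input_data.subject:`), while `AgentMailUpdateDraftBlock` drops only `None` (`if input_data.subject is not None:`), so an empty string is forwarded to the API on update. The sketch below isolates that difference. The function names `create_params` and `update_params` are hypothetical, only two of the optional fields are shown, and how the server treats a forwarded empty string is not covered by this diff.

```python
from typing import Any, Optional


def create_params(to: list[str], subject: str = "", text: str = "") -> dict[str, Any]:
    """Create-style: empty strings mean 'not provided' and are dropped."""
    params: dict[str, Any] = {"to": to}
    if subject:
        params["subject"] = subject
    if text:
        params["text"] = text
    return params


def update_params(
    subject: Optional[str] = None, text: Optional[str] = None
) -> dict[str, Any]:
    """Update-style: only None is dropped, so "" is forwarded as a value."""
    candidates = {"subject": subject, "text": text}
    return {key: value for key, value in candidates.items() if value is not None}
```

The `Optional[...]` defaults of `None` in the update block's input schema are what make the "omit to keep current value" behaviour described in its field descriptions possible.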
--- a/autogpt_platform/backend/backend/blocks/agent_mail/inbox.py
+++ b/autogpt_platform/backend/backend/blocks/agent_mail/inbox.py
@@ -0,0 +1,414 @@
+"""
+AgentMail Inbox blocks — create, get, list, update, and delete inboxes.
+
+An Inbox is a fully programmable email account for AI agents. Each inbox gets
+a unique email address and can send, receive, and manage emails via the
+AgentMail API. You can create thousands of inboxes on demand.
+"""
+
+from agentmail.inboxes.types import CreateInboxRequest
+
+from backend.sdk import (
+    APIKeyCredentials,
+    Block,
+    BlockCategory,
+    BlockOutput,
+    BlockSchemaInput,
+    BlockSchemaOutput,
+    CredentialsMetaInput,
+    SchemaField,
+)
+
+from ._config import TEST_CREDENTIALS, TEST_CREDENTIALS_INPUT, _client, agent_mail
+
+
+class AgentMailCreateInboxBlock(Block):
+    """
+    Create a new email inbox for an AI agent via AgentMail.
+
+    Each inbox gets a unique email address (e.g. username@agentmail.to).
+    If username and domain are not provided, AgentMail auto-generates them.
+    Use custom domains by specifying the domain field.
+    """
+
+    class Input(BlockSchemaInput):
+        credentials: CredentialsMetaInput = agent_mail.credentials_field(
+            description="AgentMail API key from https://console.agentmail.to"
+        )
+        username: str = SchemaField(
+            description="Local part of the email address (e.g. 'support' for support@domain.com). Leave empty to auto-generate.",
+            default="",
+            advanced=False,
+        )
+        domain: str = SchemaField(
+            description="Email domain (e.g. 'mydomain.com'). Defaults to agentmail.to if empty.",
+            default="",
+            advanced=False,
+        )
+        display_name: str = SchemaField(
+            description="Friendly name shown in the 'From' field of sent emails (e.g. 'Support Agent')",
+            default="",
+            advanced=False,
+        )
+
+    class Output(BlockSchemaOutput):
+        inbox_id: str = SchemaField(
+            description="Unique identifier for the created inbox (also the email address)"
+        )
+        email_address: str = SchemaField(
+            description="Full email address of the inbox (e.g. support@agentmail.to)"
+        )
+        result: dict = SchemaField(
+            description="Complete inbox object with all metadata"
+        )
+        error: str = SchemaField(description="Error message if the operation failed")
+
+    def __init__(self):
+        super().__init__(
+            id="7a8ac219-c6ec-4eec-a828-81af283ce04c",
+            description="Create a new email inbox for an AI agent via AgentMail. Each inbox gets a unique address and can send/receive emails.",
+            categories={BlockCategory.COMMUNICATION},
+            input_schema=self.Input,
+            output_schema=self.Output,
+            test_credentials=TEST_CREDENTIALS,
+            test_input={"credentials": TEST_CREDENTIALS_INPUT},
+            test_output=[
+                ("inbox_id", "mock-inbox-id"),
+                ("email_address", "mock-inbox-id"),
+                ("result", dict),
+            ],
+            test_mock={
+                "create_inbox": lambda *a, **kw: type(
+                    "Inbox",
+                    (),
+                    {
+                        "inbox_id": "mock-inbox-id",
+                        "model_dump": lambda self: {"inbox_id": "mock-inbox-id"},
+                    },
+                )(),
+            },
+        )
+
+    @staticmethod
+    async def create_inbox(credentials: APIKeyCredentials, **params):
+        client = _client(credentials)
+        return await client.inboxes.create(request=CreateInboxRequest(**params))
+
+    async def run(
+        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
+    ) -> BlockOutput:
+        try:
+            params: dict = {}
+            if input_data.username:
+                params["username"] = input_data.username
+            if input_data.domain:
+                params["domain"] = input_data.domain
+            if input_data.display_name:
+                params["display_name"] = input_data.display_name
+
+            inbox = await self.create_inbox(credentials, **params)
+            result = inbox.model_dump()
+
+            yield "inbox_id", inbox.inbox_id
+            yield "email_address", inbox.inbox_id
+            yield "result", result
+        except Exception as e:
+            yield "error", str(e)
+
+
+class AgentMailGetInboxBlock(Block):
+    """
+    Retrieve details of an existing AgentMail inbox by its ID or email address.
+
+    Returns the inbox metadata including email address, display name, and
+    configuration. Use this to check if an inbox exists or get its properties.
+    """
+
+    class Input(BlockSchemaInput):
+        credentials: CredentialsMetaInput = agent_mail.credentials_field(
+            description="AgentMail API key from https://console.agentmail.to"
+        )
+        inbox_id: str = SchemaField(
+            description="Inbox ID or email address to look up (e.g. 'support@agentmail.to')"
+        )
+
+    class Output(BlockSchemaOutput):
+        inbox_id: str = SchemaField(description="Unique identifier of the inbox")
+        email_address: str = SchemaField(description="Full email address of the inbox")
+        display_name: str = SchemaField(
+            description="Friendly name shown in the 'From' field", default=""
+        )
+        result: dict = SchemaField(
+            description="Complete inbox object with all metadata"
+        )
+        error: str = SchemaField(description="Error message if the operation failed")
+
+    def __init__(self):
+        super().__init__(
+            id="b858f62b-6c12-4736-aaf2-dbc5a9281320",
+            description="Retrieve details of an existing AgentMail inbox including its email address, display name, and configuration.",
+            categories={BlockCategory.COMMUNICATION},
+            input_schema=self.Input,
+            output_schema=self.Output,
+            test_credentials=TEST_CREDENTIALS,
+            test_input={
+                "credentials": TEST_CREDENTIALS_INPUT,
+                "inbox_id": "test-inbox",
+            },
+            test_output=[
+                ("inbox_id", "test-inbox"),
+                ("email_address", "test-inbox"),
+                ("display_name", ""),
+                ("result", dict),
+            ],
+            test_mock={
+                "get_inbox": lambda *a, **kw: type(
+                    "Inbox",
+                    (),
+                    {
+                        "inbox_id": "test-inbox",
+                        "display_name": "",
+                        "model_dump": lambda self: {"inbox_id": "test-inbox"},
+                    },
+                )(),
+            },
+        )
+
+    @staticmethod
+    async def get_inbox(credentials: APIKeyCredentials, inbox_id: str):
+        client = _client(credentials)
+        return await client.inboxes.get(inbox_id=inbox_id)
+
+    async def run(
+        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
+    ) -> BlockOutput:
+        try:
+            inbox = await self.get_inbox(credentials, input_data.inbox_id)
+            result = inbox.model_dump()
+
+            yield "inbox_id", inbox.inbox_id
+            yield "email_address", inbox.inbox_id
+            yield "display_name", inbox.display_name or ""
+            yield "result", result
+        except Exception as e:
+            yield "error", str(e)
+
+
+class AgentMailListInboxesBlock(Block):
+    """
+    List all email inboxes in your AgentMail organization.
+
+    Returns a paginated list of all inboxes with their metadata.
+    Use page_token for pagination when you have many inboxes.
+    """
+
+    class Input(BlockSchemaInput):
+        credentials: CredentialsMetaInput = agent_mail.credentials_field(
+            description="AgentMail API key from https://console.agentmail.to"
+        )
+        limit: int = SchemaField(
+            description="Maximum number of inboxes to return per page (1-100)",
+            default=20,
+            advanced=True,
+        )
+        page_token: str = SchemaField(
+            description="Token from a previous response to fetch the next page of results",
+            default="",
+            advanced=True,
+        )
+
+    class Output(BlockSchemaOutput):
+        inboxes: list[dict] = SchemaField(
+            description="List of inbox objects, each containing inbox_id, email_address, display_name, etc."
+        )
+        count: int = SchemaField(
+            description="Number of inboxes returned in this response"
+        )
+        next_page_token: str = SchemaField(
+            description="Token to pass as page_token to get the next page. Empty if no more results.",
+            default="",
+        )
+        error: str = SchemaField(description="Error message if the operation failed")
+
+    def __init__(self):
+        super().__init__(
+            id="cfd84a06-2121-4cef-8d14-8badf52d22f0",
+            description="List all email inboxes in your AgentMail organization with pagination support.",
+            categories={BlockCategory.COMMUNICATION},
+            input_schema=self.Input,
+            output_schema=self.Output,
+            test_credentials=TEST_CREDENTIALS,
+            test_input={"credentials": TEST_CREDENTIALS_INPUT},
+            test_output=[
+                ("inboxes", []),
+                ("count", 0),
+                ("next_page_token", ""),
+            ],
+            test_mock={
+                "list_inboxes": lambda *a, **kw: type(
+                    "Resp",
+                    (),
+                    {
+                        "inboxes": [],
+                        "count": 0,
+                        "next_page_token": "",
+                    },
+                )(),
+            },
+        )
+
+    @staticmethod
+    async def list_inboxes(credentials: APIKeyCredentials, **params):
+        client = _client(credentials)
+        return await client.inboxes.list(**params)
+
+    async def run(
+        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
+    ) -> BlockOutput:
+        try:
+            params: dict = {"limit": input_data.limit}
+            if input_data.page_token:
+                params["page_token"] = input_data.page_token
+
+            response = await self.list_inboxes(credentials, **params)
+            inboxes = [i.model_dump() for i in response.inboxes]
+
+            yield "inboxes", inboxes
+            yield "count", (c if (c := response.count) is not None else len(inboxes))
+            yield "next_page_token", response.next_page_token or ""
+        except Exception as e:
+            yield "error", str(e)
+
+
+class AgentMailUpdateInboxBlock(Block):
+    """
+    Update the display name of an existing AgentMail inbox.
+
+    Changes the friendly name shown in the 'From' field when emails are sent
+    from this inbox. The email address itself cannot be changed.
+    """
+
+    class Input(BlockSchemaInput):
+        credentials: CredentialsMetaInput = agent_mail.credentials_field(
+            description="AgentMail API key from https://console.agentmail.to"
+        )
+        inbox_id: str = SchemaField(
+            description="Inbox ID or email address to update (e.g. 'support@agentmail.to')"
+        )
+        display_name: str = SchemaField(
+            description="New display name for the inbox (e.g. 'Customer Support Bot')"
+        )
+
+    class Output(BlockSchemaOutput):
+        inbox_id: str = SchemaField(description="The updated inbox ID")
+        result: dict = SchemaField(
+            description="Complete updated inbox object with all metadata"
+        )
+        error: str = SchemaField(description="Error message if the operation failed")
+
+    def __init__(self):
+        super().__init__(
+            id="59b49f59-a6d1-4203-94c0-3908adac50b6",
+            description="Update the display name of an AgentMail inbox. Changes the 'From' name shown when emails are sent.",
+            categories={BlockCategory.COMMUNICATION},
+            input_schema=self.Input,
+            output_schema=self.Output,
+            test_credentials=TEST_CREDENTIALS,
+            test_input={
+                "credentials": TEST_CREDENTIALS_INPUT,
+                "inbox_id": "test-inbox",
+                "display_name": "Updated",
+            },
+            test_output=[
+                ("inbox_id", "test-inbox"),
+                ("result", dict),
+            ],
+            test_mock={
+                "update_inbox": lambda *a, **kw: type(
+                    "Inbox",
+                    (),
+                    {
+                        "inbox_id": "test-inbox",
+                        "model_dump": lambda self: {"inbox_id": "test-inbox"},
+                    },
+                )(),
+            },
+        )
+
+    @staticmethod
+    async def update_inbox(credentials: APIKeyCredentials, inbox_id: str, **params):
+        client = _client(credentials)
+        return await client.inboxes.update(inbox_id=inbox_id, **params)
+
+    async def run(
+        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
+    ) -> BlockOutput:
+        try:
+            inbox = await self.update_inbox(
+                credentials,
+                input_data.inbox_id,
+                display_name=input_data.display_name,
+            )
+            result = inbox.model_dump()
+
+            yield "inbox_id", inbox.inbox_id
+            yield "result", result
+        except Exception as e:
+            yield "error", str(e)
+
+
+class AgentMailDeleteInboxBlock(Block):
+    """
+    Permanently delete an AgentMail inbox and all its data.
+
+    This removes the inbox, all its messages, threads, and drafts.
+    This action cannot be undone. The email address will no longer
+    receive or send emails.
+    """
+
+    class Input(BlockSchemaInput):
+        credentials: CredentialsMetaInput = agent_mail.credentials_field(
+            description="AgentMail API key from https://console.agentmail.to"
+        )
+        inbox_id: str = SchemaField(
+            description="Inbox ID or email address to permanently delete"
+        )
+
+    class Output(BlockSchemaOutput):
+        success: bool = SchemaField(
+            description="True if the inbox was successfully deleted"
+        )
+        error: str = SchemaField(description="Error message if the operation failed")
+
+    def __init__(self):
+        super().__init__(
+            id="ade970ae-8428-4a7b-9278-b52054dbf535",
+            description="Permanently delete an AgentMail inbox and all its messages, threads, and drafts. This action cannot be undone.",
+            categories={BlockCategory.COMMUNICATION},
+            input_schema=self.Input,
+            output_schema=self.Output,
+            is_sensitive_action=True,
+            test_credentials=TEST_CREDENTIALS,
+            test_input={
+                "credentials": TEST_CREDENTIALS_INPUT,
+                "inbox_id": "test-inbox",
+            },
+            test_output=[("success", True)],
+            test_mock={
+                "delete_inbox": lambda *a, **kw: None,
+            },
+        )
+
+    @staticmethod
+    async def delete_inbox(credentials: APIKeyCredentials, inbox_id: str):
+        client = _client(credentials)
+        await client.inboxes.delete(inbox_id=inbox_id)
+
+    async def run(
+        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
+    ) -> BlockOutput:
+        try:
+            await self.delete_inbox(credentials, input_data.inbox_id)
+            yield "success", True
+        except Exception as e:
+            yield "error", str(e)
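Note on pagination: the list blocks above omit `page_token` on the first call and emit an empty `next_page_token` when there are no more results. A minimal sketch of draining every page under that contract, assuming a hypothetical `fetch_page` callable in place of the real client call and an in-memory `fake_fetch` stand-in (neither is part of the AgentMail SDK, and the numeric token format is invented for the demo):

```python
import asyncio
from typing import Any, Awaitable, Callable

# Hypothetical page shape mirroring the block outputs.
Page = dict[str, Any]


async def collect_all(
    fetch_page: Callable[..., Awaitable[Page]], limit: int = 20
) -> list[dict]:
    """Follow next_page_token until it comes back empty."""
    items: list[dict] = []
    page_token = ""
    while True:
        params: dict[str, Any] = {"limit": limit}
        if page_token:  # same rule as the blocks: no token on the first call
            params["page_token"] = page_token
        page = await fetch_page(**params)
        items.extend(page["inboxes"])
        page_token = page.get("next_page_token") or ""
        if not page_token:
            return items


# In-memory stand-in for the API (assumption: real tokens are opaque strings).
_FAKE = [{"inbox_id": f"inbox-{n}@agentmail.to"} for n in range(5)]


async def fake_fetch(limit: int, page_token: str = "") -> Page:
    start = int(page_token or 0)
    chunk = _FAKE[start : start + limit]
    nxt = str(start + limit) if start + limit < len(_FAKE) else ""
    return {"inboxes": chunk, "count": len(chunk), "next_page_token": nxt}


def demo() -> list[dict]:
    return asyncio.run(collect_all(fake_fetch, limit=2))
```

In a graph, the same loop would be expressed by wiring `next_page_token` back into `page_token` and stopping when it is empty.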
--- a/autogpt_platform/backend/backend/blocks/agent_mail/lists.py
+++ b/autogpt_platform/backend/backend/blocks/agent_mail/lists.py
@@ -0,0 +1,384 @@
+"""
+AgentMail List blocks — manage allow/block lists for email filtering.
+
+Lists let you control which email addresses and domains your agents can
+send to or receive from. There are four list types based on two dimensions:
+direction (send/receive) and type (allow/block).
+
+- receive + allow: Only accept emails from these addresses/domains
+- receive + block: Reject emails from these addresses/domains
+- send + allow: Only send emails to these addresses/domains
+- send + block: Prevent sending emails to these addresses/domains
+"""
+
+from enum import Enum
+
+from backend.sdk import (
+    APIKeyCredentials,
+    Block,
+    BlockCategory,
+    BlockOutput,
+    BlockSchemaInput,
+    BlockSchemaOutput,
+    CredentialsMetaInput,
+    SchemaField,
+)
+
+from ._config import TEST_CREDENTIALS, TEST_CREDENTIALS_INPUT, _client, agent_mail
+
+
+class ListDirection(str, Enum):
+    SEND = "send"
+    RECEIVE = "receive"
+
+
+class ListType(str, Enum):
+    ALLOW = "allow"
+    BLOCK = "block"
+
+
+class AgentMailListEntriesBlock(Block):
+    """
+    List all entries in an AgentMail allow/block list.
+
+    Retrieves email addresses and domains that are currently allowed
+    or blocked for sending or receiving. Use direction and list_type
+    to select which of the four lists to query.
+    """
+
+    class Input(BlockSchemaInput):
+        credentials: CredentialsMetaInput = agent_mail.credentials_field(
+            description="AgentMail API key from https://console.agentmail.to"
+        )
+        direction: ListDirection = SchemaField(
+            description="'send' to filter outgoing emails, 'receive' to filter incoming emails"
+        )
+        list_type: ListType = SchemaField(
+            description="'allow' for the allow list (only permit these), 'block' for the block list (reject these)"
+        )
+        limit: int = SchemaField(
+            description="Maximum number of entries to return per page",
+            default=20,
+            advanced=True,
+        )
+        page_token: str = SchemaField(
+            description="Token from a previous response to fetch the next page",
+            default="",
+            advanced=True,
+        )
+
+    class Output(BlockSchemaOutput):
+        entries: list[dict] = SchemaField(
+            description="List of entries, each with an email address or domain"
+        )
+        count: int = SchemaField(description="Number of entries returned")
+        next_page_token: str = SchemaField(
+            description="Token for the next page. Empty if no more results.",
+            default="",
+        )
+        error: str = SchemaField(description="Error message if the operation failed")
+
+    def __init__(self):
+        super().__init__(
+            id="01489100-35da-45aa-8a01-9540ba0e9a21",
+            description="List all entries in an AgentMail allow/block list. Choose send/receive direction and allow/block type.",
+            categories={BlockCategory.COMMUNICATION},
+            input_schema=self.Input,
+            output_schema=self.Output,
+            test_credentials=TEST_CREDENTIALS,
+            test_input={
+                "credentials": TEST_CREDENTIALS_INPUT,
+                "direction": "receive",
+                "list_type": "block",
+            },
+            test_output=[
+                ("entries", []),
+                ("count", 0),
+                ("next_page_token", ""),
+            ],
+            test_mock={
+                "list_entries": lambda *a, **kw: type(
+                    "Resp",
+                    (),
+                    {
+                        "entries": [],
+                        "count": 0,
+                        "next_page_token": "",
+                    },
+                )(),
+            },
+        )
+
+    @staticmethod
+    async def list_entries(
+        credentials: APIKeyCredentials, direction: str, list_type: str, **params
+    ):
+        client = _client(credentials)
+        return await client.lists.list(direction, list_type, **params)
+
+    async def run(
+        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
+    ) -> BlockOutput:
+        try:
+            params: dict = {"limit": input_data.limit}
+            if input_data.page_token:
+                params["page_token"] = input_data.page_token
+
+            response = await self.list_entries(
+                credentials,
+                input_data.direction.value,
+                input_data.list_type.value,
+                **params,
+            )
+            entries = [e.model_dump() for e in response.entries]
+
+            yield "entries", entries
+            yield "count", (c if (c := response.count) is not None else len(entries))
+            yield "next_page_token", response.next_page_token or ""
+        except Exception as e:
+            yield "error", str(e)
+
+
+class AgentMailCreateListEntryBlock(Block):
+    """
+    Add an email address or domain to an AgentMail allow/block list.
+
+    Entries can be full email addresses (e.g. 'partner@example.com') or
+    entire domains (e.g. 'example.com'). For block lists, you can optionally
+    provide a reason (e.g. 'spam', 'competitor').
+    """
+
+    class Input(BlockSchemaInput):
+        credentials: CredentialsMetaInput = agent_mail.credentials_field(
+            description="AgentMail API key from https://console.agentmail.to"
+        )
+        direction: ListDirection = SchemaField(
+            description="'send' for outgoing email rules, 'receive' for incoming email rules"
+        )
+        list_type: ListType = SchemaField(
+            description="'allow' to add to the allow list, 'block' to add to the block list"
+        )
+        entry: str = SchemaField(
+            description="Email address (user@example.com) or domain (example.com) to add"
+        )
+        reason: str = SchemaField(
+            description="Reason for blocking (only used with block lists, e.g. 'spam', 'competitor')",
+            default="",
+            advanced=True,
+        )
+
+    class Output(BlockSchemaOutput):
+        entry: str = SchemaField(
+            description="The email address or domain that was added"
+        )
+        result: dict = SchemaField(description="Complete entry object")
+        error: str = SchemaField(description="Error message if the operation failed")
+
+    def __init__(self):
+        super().__init__(
+            id="b6650a0a-b113-40cf-8243-ff20f684f9b8",
+            description="Add an email address or domain to an allow/block list. Block spam senders or allow trusted domains.",
+            categories={BlockCategory.COMMUNICATION},
+            input_schema=self.Input,
+            output_schema=self.Output,
+            is_sensitive_action=True,
+            test_credentials=TEST_CREDENTIALS,
+            test_input={
+                "credentials": TEST_CREDENTIALS_INPUT,
+                "direction": "receive",
+                "list_type": "block",
+                "entry": "spam@example.com",
+            },
+            test_output=[
+                ("entry", "spam@example.com"),
+                ("result", dict),
+            ],
+            test_mock={
+                "create_entry": lambda *a, **kw: type(
+                    "Entry",
+                    (),
+                    {
+                        "model_dump": lambda self: {"entry": "spam@example.com"},
+                    },
+                )(),
+            },
+        )
+
+    @staticmethod
+    async def create_entry(
+        credentials: APIKeyCredentials, direction: str, list_type: str, **params
+    ):
+        client = _client(credentials)
+        return await client.lists.create(direction, list_type, **params)
+
+    async def run(
+        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
+    ) -> BlockOutput:
+        try:
+            params: dict = {"entry": input_data.entry}
+            if input_data.reason and input_data.list_type == ListType.BLOCK:
+                params["reason"] = input_data.reason
+
+            result = await self.create_entry(
+                credentials,
+                input_data.direction.value,
+                input_data.list_type.value,
+                **params,
+            )
+            result_dict = result.model_dump()
+
+            yield "entry", input_data.entry
+            yield "result", result_dict
+        except Exception as e:
+            yield "error", str(e)
+
+
+class AgentMailGetListEntryBlock(Block):
+    """
+    Check if an email address or domain exists in an AgentMail allow/block list.
+
+    Returns the entry details if found. Use this to verify whether a specific
+    address or domain is currently allowed or blocked.
+    """
+
+    class Input(BlockSchemaInput):
+        credentials: CredentialsMetaInput = agent_mail.credentials_field(
+            description="AgentMail API key from https://console.agentmail.to"
+        )
+        direction: ListDirection = SchemaField(
+            description="'send' for outgoing rules, 'receive' for incoming rules"
+        )
+        list_type: ListType = SchemaField(
+            description="'allow' for the allow list, 'block' for the block list"
+        )
+        entry: str = SchemaField(description="Email address or domain to look up")
+
+    class Output(BlockSchemaOutput):
+        entry: str = SchemaField(
+            description="The email address or domain that was found"
+        )
+        result: dict = SchemaField(description="Complete entry object with metadata")
+        error: str = SchemaField(description="Error message if the operation failed")
+
+    def __init__(self):
+        super().__init__(
+            id="fb117058-ab27-40d1-9231-eb1dd526fc7a",
+            description="Check if an email address or domain is in an allow/block list. Verify filtering rules.",
+            categories={BlockCategory.COMMUNICATION},
+            input_schema=self.Input,
+            output_schema=self.Output,
+            test_credentials=TEST_CREDENTIALS,
+            test_input={
+                "credentials": TEST_CREDENTIALS_INPUT,
+                "direction": "receive",
+                "list_type": "block",
+                "entry": "spam@example.com",
+            },
+            test_output=[
+                ("entry", "spam@example.com"),
+                ("result", dict),
+            ],
+            test_mock={
+                "get_entry": lambda *a, **kw: type(
+                    "Entry",
+                    (),
+                    {
+                        "model_dump": lambda self: {"entry": "spam@example.com"},
+                    },
+                )(),
+            },
+        )
+
+    @staticmethod
+    async def get_entry(
+        credentials: APIKeyCredentials, direction: str, list_type: str, entry: str
+    ):
+        client = _client(credentials)
+        return await client.lists.get(direction, list_type, entry=entry)
+
+    async def run(
+        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
+    ) -> BlockOutput:
+        try:
+            result = await self.get_entry(
+                credentials,
+                input_data.direction.value,
+                input_data.list_type.value,
+                input_data.entry,
+            )
+            result_dict = result.model_dump()
+
+            yield "entry", input_data.entry
+            yield "result", result_dict
+        except Exception as e:
+            yield "error", str(e)
+
+
+class AgentMailDeleteListEntryBlock(Block):
+    """
+    Remove an email address or domain from an AgentMail allow/block list.
+
+    After removal, the address/domain will no longer be filtered by this list.
+    """
+
+    class Input(BlockSchemaInput):
+        credentials: CredentialsMetaInput = agent_mail.credentials_field(
+            description="AgentMail API key from https://console.agentmail.to"
+        )
+        direction: ListDirection = SchemaField(
+            description="'send' for outgoing rules, 'receive' for incoming rules"
+        )
+        list_type: ListType = SchemaField(
+            description="'allow' for the allow list, 'block' for the block list"
+        )
+        entry: str = SchemaField(
+            description="Email address or domain to remove from the list"
+        )
+
+    class Output(BlockSchemaOutput):
+        success: bool = SchemaField(
+            description="True if the entry was successfully removed"
+        )
+        error: str = SchemaField(description="Error message if the operation failed")
+
+    def __init__(self):
+        super().__init__(
+            id="2b8d57f1-1c9e-470f-a70b-5991c80fad5f",
+            description="Remove an email address or domain from an allow/block list to stop filtering it.",
+            categories={BlockCategory.COMMUNICATION},
+            input_schema=self.Input,
+            output_schema=self.Output,
+            is_sensitive_action=True,
+            test_credentials=TEST_CREDENTIALS,
+            test_input={
+                "credentials": TEST_CREDENTIALS_INPUT,
+                "direction": "receive",
+                "list_type": "block",
+                "entry": "spam@example.com",
+            },
+            test_output=[("success", True)],
+            test_mock={
+                "delete_entry": lambda *a, **kw: None,
+            },
+        )
+
+    @staticmethod
+    async def delete_entry(
+        credentials: APIKeyCredentials, direction: str, list_type: str, entry: str
+    ):
+        client = _client(credentials)
+        await client.lists.delete(direction, list_type, entry=entry)
+
+    async def run(
+        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
+    ) -> BlockOutput:
+        try:
+            await self.delete_entry(
+                credentials,
+                input_data.direction.value,
+                input_data.list_type.value,
+                input_data.entry,
+            )
+            yield "success", True
+        except Exception as e:
+            yield "error", str(e)
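Note on list semantics: the module docstring describes entries as either full addresses or whole domains, with an allow list and a block list per direction. A minimal local sketch of how such a pair of lists could be evaluated for one direction. This is not AgentMail's implementation: the helper names are hypothetical, and two behaviours are assumptions made only for illustration (block entries take precedence over allow entries, and an empty allow list means no restriction).

```python
def _matches(address: str, entries: set[str]) -> bool:
    """An entry is either a full address or a bare domain."""
    address = address.lower()
    domain = address.rsplit("@", 1)[-1]
    return address in entries or domain in entries


def is_permitted(address: str, allow: set[str], block: set[str]) -> bool:
    """Decide whether mail to/from `address` passes one direction's lists."""
    # ASSUMPTION: a block entry wins over any allow entry.
    if _matches(address, block):
        return False
    # ASSUMPTION: an empty allow list places no restriction.
    return not allow or _matches(address, allow)


# Example: the receive direction with one blocked domain and one allowed domain.
receive_block = {"example.com"}
receive_allow = {"partner.com"}
blocked = is_permitted("spam@example.com", receive_allow, receive_block)
allowed = is_permitted("alice@partner.com", receive_allow, receive_block)
unlisted = is_permitted("bob@other.com", receive_allow, receive_block)
```

The real filtering happens server-side; the blocks in this file only create, read, and delete the entries.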
--- a/autogpt_platform/backend/backend/blocks/agent_mail/messages.py
+++ b/autogpt_platform/backend/backend/blocks/agent_mail/messages.py
@@ -0,0 +1,695 @@
+"""
+AgentMail Message blocks — send, list, get, reply, forward, and update messages.
+
+A Message is an individual email within a Thread. Agents can send new messages
+(which create threads), reply to existing messages, forward them, and manage
+labels for state tracking (e.g. read/unread, campaign tags).
+"""
+
+from backend.sdk import (
+    APIKeyCredentials,
+    Block,
+    BlockCategory,
+    BlockOutput,
+    BlockSchemaInput,
+    BlockSchemaOutput,
+    CredentialsMetaInput,
+    SchemaField,
+)
+
+from ._config import TEST_CREDENTIALS, TEST_CREDENTIALS_INPUT, _client, agent_mail
+
+
+class AgentMailSendMessageBlock(Block):
+    """
+    Send a new email from an AgentMail inbox, automatically creating a new thread.
+
+    Supports plain text and HTML bodies, CC/BCC recipients, and labels for
+    organizing messages (e.g. campaign tracking, state management).
+    Max 50 combined recipients across to, cc, and bcc.
+    """
+
+    class Input(BlockSchemaInput):
+        credentials: CredentialsMetaInput = agent_mail.credentials_field(
+            description="AgentMail API key from https://console.agentmail.to"
+        )
+        inbox_id: str = SchemaField(
+            description="Inbox ID or email address to send from (e.g. 'agent@agentmail.to')"
+        )
+        to: list[str] = SchemaField(
+            description="Recipient email addresses (e.g. ['user@example.com'])"
+        )
+        subject: str = SchemaField(description="Email subject line")
+        text: str = SchemaField(
+            description="Plain text body of the email. Always provide this as a fallback for email clients that don't render HTML."
+        )
+        html: str = SchemaField(
+            description="Rich HTML body of the email. Embed CSS in a <style> tag for best compatibility across email clients.",
+            default="",
+            advanced=True,
+        )
+        cc: list[str] = SchemaField(
+            description="CC recipient email addresses for human-in-the-loop oversight",
+            default_factory=list,
+            advanced=True,
+        )
+        bcc: list[str] = SchemaField(
+            description="BCC recipient email addresses (hidden from other recipients)",
+            default_factory=list,
+            advanced=True,
+        )
+        labels: list[str] = SchemaField(
+            description="Labels to tag the message for filtering and state management (e.g. ['outreach', 'q4-campaign'])",
+            default_factory=list,
+            advanced=True,
+        )
+
+    class Output(BlockSchemaOutput):
+        message_id: str = SchemaField(
+            description="Unique identifier of the sent message"
+        )
+        thread_id: str = SchemaField(
+            description="Thread ID grouping this message and any future replies"
+        )
+        result: dict = SchemaField(
+            description="Complete sent message object with all metadata"
+        )
+        error: str = SchemaField(description="Error message if the operation failed")
+
+    def __init__(self):
+        super().__init__(
+            id="b67469b2-7748-4d81-a223-4ebd332cca89",
+            description="Send a new email from an AgentMail inbox. Creates a new conversation thread. Supports HTML, CC/BCC, and labels.",
+            categories={BlockCategory.COMMUNICATION},
+            input_schema=self.Input,
+            output_schema=self.Output,
+            is_sensitive_action=True,
+            test_credentials=TEST_CREDENTIALS,
+            test_input={
+                "credentials": TEST_CREDENTIALS_INPUT,
+                "inbox_id": "test-inbox",
+                "to": ["user@example.com"],
+                "subject": "Test",
+                "text": "Hello",
+            },
+            test_output=[
+                ("message_id", "mock-msg-id"),
+                ("thread_id", "mock-thread-id"),
+                ("result", dict),
+            ],
+            test_mock={
+                "send_message": lambda *a, **kw: type(
+                    "Msg",
+                    (),
+                    {
+                        "message_id": "mock-msg-id",
+                        "thread_id": "mock-thread-id",
+                        "model_dump": lambda self: {
+                            "message_id": "mock-msg-id",
+                            "thread_id": "mock-thread-id",
+                        },
+                    },
+                )(),
+            },
+        )
+
+    @staticmethod
+    async def send_message(credentials: APIKeyCredentials, inbox_id: str, **params):
+        client = _client(credentials)
+        return await client.inboxes.messages.send(inbox_id, **params)
+
+    async def run(
+        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
+    ) -> BlockOutput:
+        try:
+            total = len(input_data.to) + len(input_data.cc) + len(input_data.bcc)
+            if total > 50:
+                raise ValueError(
+                    f"Max 50 combined recipients across to, cc, and bcc (got {total})"
+                )
+
+            params: dict = {
+                "to": input_data.to,
+                "subject": input_data.subject,
+                "text": input_data.text,
+            }
+            if input_data.html:
+                params["html"] = input_data.html
+            if input_data.cc:
+                params["cc"] = input_data.cc
+            if input_data.bcc:
+                params["bcc"] = input_data.bcc
+            if input_data.labels:
+                params["labels"] = input_data.labels
+
+            msg = await self.send_message(credentials, input_data.inbox_id, **params)
+            result = msg.model_dump()
+
+            yield "message_id", msg.message_id
+            yield "thread_id", msg.thread_id or ""
+            yield "result", result
+        except Exception as e:
+            yield "error", str(e)
+
+
+class AgentMailListMessagesBlock(Block):
+    """
+    List all messages in an AgentMail inbox with optional label filtering.
+
+    Returns a paginated list of messages. Use labels to filter (e.g.
+    labels=['unread'] to only get unprocessed messages). Useful for
+    polling workflows or building inbox views.
+    """
+
+    class Input(BlockSchemaInput):
+        credentials: CredentialsMetaInput = agent_mail.credentials_field(
+            description="AgentMail API key from https://console.agentmail.to"
+        )
+        inbox_id: str = SchemaField(
+            description="Inbox ID or email address to list messages from"
+        )
+        limit: int = SchemaField(
+            description="Maximum number of messages to return per page (1-100)",
+            default=20,
+            advanced=True,
+        )
+        page_token: str = SchemaField(
+            description="Token from a previous response to fetch the next page",
+            default="",
+            advanced=True,
+        )
+        labels: list[str] = SchemaField(
+            description="Only return messages with ALL of these labels (e.g. ['unread'] or ['q4-campaign', 'follow-up'])",
+            default_factory=list,
+            advanced=True,
+        )
+
+    class Output(BlockSchemaOutput):
+        messages: list[dict] = SchemaField(
+            description="List of message objects with subject, sender, text, html, labels, etc."
+        )
+        count: int = SchemaField(description="Number of messages returned")
+        next_page_token: str = SchemaField(
+            description="Token for the next page. Empty if no more results.",
+            default="",
+        )
+        error: str = SchemaField(description="Error message if the operation failed")
+
+    def __init__(self):
+        super().__init__(
+            id="721234df-c7a2-4927-b205-744badbd5844",
+            description="List messages in an AgentMail inbox. Filter by labels to find unread, campaign-tagged, or categorized messages.",
+            categories={BlockCategory.COMMUNICATION},
+            input_schema=self.Input,
+            output_schema=self.Output,
+            test_credentials=TEST_CREDENTIALS,
+            test_input={
+                "credentials": TEST_CREDENTIALS_INPUT,
+                "inbox_id": "test-inbox",
+            },
+            test_output=[
+                ("messages", []),
+                ("count", 0),
+                ("next_page_token", ""),
+            ],
+            test_mock={
+                "list_messages": lambda *a, **kw: type(
+                    "Resp",
+                    (),
+                    {
+                        "messages": [],
+                        "count": 0,
+                        "next_page_token": "",
+                    },
+                )(),
+            },
+        )
+
+    @staticmethod
+    async def list_messages(credentials: APIKeyCredentials, inbox_id: str, **params):
+        client = _client(credentials)
+        return await client.inboxes.messages.list(inbox_id, **params)
+
+    async def run(
+        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
+    ) -> BlockOutput:
+        try:
+            params: dict = {"limit": input_data.limit}
+            if input_data.page_token:
+                params["page_token"] = input_data.page_token
+            if input_data.labels:
+                params["labels"] = input_data.labels
+
+            response = await self.list_messages(
+                credentials, input_data.inbox_id, **params
+            )
+            messages = [m.model_dump() for m in response.messages]
+
+            yield "messages", messages
+            yield "count", (c if (c := response.count) is not None else len(messages))
+            yield "next_page_token", response.next_page_token or ""
+        except Exception as e:
+            yield "error", str(e)
+
+
+class AgentMailGetMessageBlock(Block):
+    """
+    Retrieve a specific email message by ID from an AgentMail inbox.
+
+    Returns the full message including subject, body (text and HTML),
+    sender, recipients, and attachments. Use extracted_text to get
+    only the new reply content without quoted history.
+    """
+
+    class Input(BlockSchemaInput):
+        credentials: CredentialsMetaInput = agent_mail.credentials_field(
+            description="AgentMail API key from https://console.agentmail.to"
+        )
+        inbox_id: str = SchemaField(
+            description="Inbox ID or email address the message belongs to"
+        )
+        message_id: str = SchemaField(
+            description="Message ID to retrieve (e.g. '<abc123@agentmail.to>')"
+        )
+
+    class Output(BlockSchemaOutput):
+        message_id: str = SchemaField(description="Unique identifier of the message")
+        thread_id: str = SchemaField(description="Thread this message belongs to")
+        subject: str = SchemaField(description="Email subject line")
+        text: str = SchemaField(
+            description="Full plain text body (may include quoted reply history)"
+        )
+        extracted_text: str = SchemaField(
+            description="Just the new reply content with quoted history stripped. Best for AI processing.",
+            default="",
+        )
+        html: str = SchemaField(description="HTML body of the email", default="")
+        result: dict = SchemaField(
+            description="Complete message object with all fields including sender, recipients, attachments, labels"
+        )
+        error: str = SchemaField(description="Error message if the operation failed")
+
+    def __init__(self):
+        super().__init__(
+            id="2788bdfa-1527-4603-a5e4-a455c05c032f",
+            description="Retrieve a specific email message by ID. Includes extracted_text for clean reply content without quoted history.",
+            categories={BlockCategory.COMMUNICATION},
+            input_schema=self.Input,
+            output_schema=self.Output,
+            test_credentials=TEST_CREDENTIALS,
+            test_input={
+                "credentials": TEST_CREDENTIALS_INPUT,
+                "inbox_id": "test-inbox",
+                "message_id": "test-msg",
+            },
+            test_output=[
+                ("message_id", "test-msg"),
+                ("thread_id", "t1"),
+                ("subject", "Hi"),
+                ("text", "Hello"),
+                ("extracted_text", "Hello"),
+                ("html", ""),
+                ("result", dict),
+            ],
+            test_mock={
+                "get_message": lambda *a, **kw: type(
+                    "Msg",
+                    (),
+                    {
+                        "message_id": "test-msg",
+                        "thread_id": "t1",
+                        "subject": "Hi",
+                        "text": "Hello",
+                        "extracted_text": "Hello",
+                        "html": "",
+                        "model_dump": lambda self: {"message_id": "test-msg"},
+                    },
+                )(),
+            },
+        )
+
+    @staticmethod
+    async def get_message(
+        credentials: APIKeyCredentials,
+        inbox_id: str,
+        message_id: str,
+    ):
+        client = _client(credentials)
+        return await client.inboxes.messages.get(
+            inbox_id=inbox_id, message_id=message_id
+        )
+
+    async def run(
+        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
+    ) -> BlockOutput:
+        try:
+            msg = await self.get_message(
+                credentials, input_data.inbox_id, input_data.message_id
+            )
+            result = msg.model_dump()
+
+            yield "message_id", msg.message_id
+            yield "thread_id", msg.thread_id or ""
+            yield "subject", msg.subject or ""
+            yield "text", msg.text or ""
+            yield "extracted_text", msg.extracted_text or ""
+            yield "html", msg.html or ""
+            yield "result", result
+        except Exception as e:
+            yield "error", str(e)
+
+
+class AgentMailReplyToMessageBlock(Block):
+    """
+    Reply to an existing email message, keeping the reply in the same thread.
+
+    The reply is automatically added to the same conversation thread as the
+    original message. Use this for multi-turn agent conversations.
+    """
+
+    class Input(BlockSchemaInput):
+        credentials: CredentialsMetaInput = agent_mail.credentials_field(
+            description="AgentMail API key from https://console.agentmail.to"
+        )
+        inbox_id: str = SchemaField(
+            description="Inbox ID or email address to send the reply from"
+        )
+        message_id: str = SchemaField(
+            description="Message ID to reply to (e.g. '<abc123@agentmail.to>')"
+        )
+        text: str = SchemaField(description="Plain text body of the reply")
+        html: str = SchemaField(
+            description="Rich HTML body of the reply",
+            default="",
+            advanced=True,
+        )
+
+    class Output(BlockSchemaOutput):
+        message_id: str = SchemaField(
+            description="Unique identifier of the reply message"
+        )
+        thread_id: str = SchemaField(description="Thread ID the reply was added to")
+        result: dict = SchemaField(
+            description="Complete reply message object with all metadata"
+        )
+        error: str = SchemaField(description="Error message if the operation failed")
+
+    def __init__(self):
+        super().__init__(
+            id="b9fe53fa-5026-4547-9570-b54ccb487229",
+            description="Reply to an existing email in the same conversation thread. Use for multi-turn agent conversations.",
+            categories={BlockCategory.COMMUNICATION},
+            input_schema=self.Input,
+            output_schema=self.Output,
+            is_sensitive_action=True,
+            test_credentials=TEST_CREDENTIALS,
+            test_input={
+                "credentials": TEST_CREDENTIALS_INPUT,
+                "inbox_id": "test-inbox",
+                "message_id": "test-msg",
+                "text": "Reply",
+            },
+            test_output=[
+                ("message_id", "mock-reply-id"),
+                ("thread_id", "mock-thread-id"),
+                ("result", dict),
+            ],
+            test_mock={
+                "reply_to_message": lambda *a, **kw: type(
+                    "Msg",
+                    (),
+                    {
+                        "message_id": "mock-reply-id",
+                        "thread_id": "mock-thread-id",
+                        "model_dump": lambda self: {"message_id": "mock-reply-id"},
+                    },
+                )(),
+            },
+        )
+
+    @staticmethod
+    async def reply_to_message(
+        credentials: APIKeyCredentials, inbox_id: str, message_id: str, **params
+    ):
+        client = _client(credentials)
+        return await client.inboxes.messages.reply(
+            inbox_id=inbox_id, message_id=message_id, **params
+        )
+
+    async def run(
+        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
+    ) -> BlockOutput:
+        try:
+            params: dict = {"text": input_data.text}
+            if input_data.html:
+                params["html"] = input_data.html
+
+            reply = await self.reply_to_message(
+                credentials,
+                input_data.inbox_id,
+                input_data.message_id,
+                **params,
+            )
+            result = reply.model_dump()
+
+            yield "message_id", reply.message_id
+            yield "thread_id", reply.thread_id or ""
+            yield "result", result
+        except Exception as e:
+            yield "error", str(e)
+
+
+class AgentMailForwardMessageBlock(Block):
+    """
+    Forward an existing email message to one or more recipients.
+
+    Sends the original message content to different email addresses.
+    Optionally prepend additional text or override the subject line.
+    Max 50 combined recipients across to, cc, and bcc.
+    """
+
+    class Input(BlockSchemaInput):
+        credentials: CredentialsMetaInput = agent_mail.credentials_field(
+            description="AgentMail API key from https://console.agentmail.to"
+        )
+        inbox_id: str = SchemaField(
+            description="Inbox ID or email address to forward from"
+        )
+        message_id: str = SchemaField(description="Message ID to forward")
+        to: list[str] = SchemaField(
+            description="Recipient email addresses to forward the message to (e.g. ['user@example.com'])"
+        )
+        cc: list[str] = SchemaField(
+            description="CC recipient email addresses",
+            default_factory=list,
+            advanced=True,
+        )
+        bcc: list[str] = SchemaField(
+            description="BCC recipient email addresses (hidden from other recipients)",
+            default_factory=list,
+            advanced=True,
+        )
+        subject: str = SchemaField(
+            description="Override the subject line (defaults to 'Fwd: <original subject>')",
+            default="",
+            advanced=True,
+        )
+        text: str = SchemaField(
+            description="Additional plain text to prepend before the forwarded content",
+            default="",
+            advanced=True,
+        )
+        html: str = SchemaField(
+            description="Additional HTML to prepend before the forwarded content",
+            default="",
+            advanced=True,
+        )
+
+    class Output(BlockSchemaOutput):
+        message_id: str = SchemaField(
+            description="Unique identifier of the forwarded message"
+        )
+        thread_id: str = SchemaField(description="Thread ID of the forward")
+        result: dict = SchemaField(
+            description="Complete forwarded message object with all metadata"
+        )
+        error: str = SchemaField(description="Error message if the operation failed")
+
+    def __init__(self):
+        super().__init__(
+            id="b70c7e33-5d66-4f8e-897f-ac73a7bfce82",
+            description="Forward an email message to one or more recipients. Supports CC/BCC and optional extra text or subject override.",
+            categories={BlockCategory.COMMUNICATION},
+            input_schema=self.Input,
+            output_schema=self.Output,
+            is_sensitive_action=True,
+            test_credentials=TEST_CREDENTIALS,
+            test_input={
+                "credentials": TEST_CREDENTIALS_INPUT,
+                "inbox_id": "test-inbox",
+                "message_id": "test-msg",
+                "to": ["user@example.com"],
+            },
+            test_output=[
+                ("message_id", "mock-fwd-id"),
+                ("thread_id", "mock-thread-id"),
+                ("result", dict),
+            ],
+            test_mock={
+                "forward_message": lambda *a, **kw: type(
+                    "Msg",
+                    (),
+                    {
+                        "message_id": "mock-fwd-id",
+                        "thread_id": "mock-thread-id",
+                        "model_dump": lambda self: {"message_id": "mock-fwd-id"},
+                    },
+                )(),
+            },
+        )
+
+    @staticmethod
+    async def forward_message(
+        credentials: APIKeyCredentials, inbox_id: str, message_id: str, **params
+    ):
+        client = _client(credentials)
+        return await client.inboxes.messages.forward(
+            inbox_id=inbox_id, message_id=message_id, **params
+        )
+
+    async def run(
+        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
+    ) -> BlockOutput:
+        try:
+            total = len(input_data.to) + len(input_data.cc) + len(input_data.bcc)
+            if total > 50:
+                raise ValueError(
+                    f"Max 50 combined recipients across to, cc, and bcc (got {total})"
+                )
+
+            params: dict = {"to": input_data.to}
+            if input_data.cc:
+                params["cc"] = input_data.cc
+            if input_data.bcc:
+                params["bcc"] = input_data.bcc
+            if input_data.subject:
+                params["subject"] = input_data.subject
+            if input_data.text:
+                params["text"] = input_data.text
+            if input_data.html:
+                params["html"] = input_data.html
+
+            fwd = await self.forward_message(
+                credentials,
+                input_data.inbox_id,
+                input_data.message_id,
+                **params,
+            )
+            result = fwd.model_dump()
+
+            yield "message_id", fwd.message_id
+            yield "thread_id", fwd.thread_id or ""
+            yield "result", result
+        except Exception as e:
+            yield "error", str(e)
+
+
+class AgentMailUpdateMessageBlock(Block):
+    """
+    Add or remove labels on an email message for state management.
+
+    Labels are string tags used to track message state (read/unread),
+    categorize messages (billing, support), or tag campaigns (q4-outreach).
+    Common pattern: add 'read' and remove 'unread' after processing a message.
+    """
+
+    class Input(BlockSchemaInput):
+        credentials: CredentialsMetaInput = agent_mail.credentials_field(
+            description="AgentMail API key from https://console.agentmail.to"
+        )
+        inbox_id: str = SchemaField(
+            description="Inbox ID or email address the message belongs to"
+        )
+        message_id: str = SchemaField(description="Message ID to update labels on")
+        add_labels: list[str] = SchemaField(
+            description="Labels to add (e.g. ['read', 'processed', 'high-priority'])",
+            default_factory=list,
+        )
+        remove_labels: list[str] = SchemaField(
+            description="Labels to remove (e.g. ['unread', 'pending'])",
+            default_factory=list,
+        )
+
+    class Output(BlockSchemaOutput):
+        message_id: str = SchemaField(description="The updated message ID")
+        result: dict = SchemaField(
+            description="Complete updated message object with current labels"
+        )
+        error: str = SchemaField(description="Error message if the operation failed")
+
+    def __init__(self):
+        super().__init__(
+            id="694ff816-4c89-4a5e-a552-8c31be187735",
+            description="Add or remove labels on an email message. Use for read/unread tracking, campaign tagging, or state management.",
+            categories={BlockCategory.COMMUNICATION},
+            input_schema=self.Input,
+            output_schema=self.Output,
+            test_credentials=TEST_CREDENTIALS,
+            test_input={
+                "credentials": TEST_CREDENTIALS_INPUT,
+                "inbox_id": "test-inbox",
+                "message_id": "test-msg",
+                "add_labels": ["read"],
+            },
+            test_output=[
+                ("message_id", "test-msg"),
+                ("result", dict),
+            ],
+            test_mock={
+                "update_message": lambda *a, **kw: type(
+                    "Msg",
+                    (),
+                    {
+                        "message_id": "test-msg",
+                        "model_dump": lambda self: {"message_id": "test-msg"},
+                    },
+                )(),
+            },
+        )
+
+    @staticmethod
+    async def update_message(
+        credentials: APIKeyCredentials, inbox_id: str, message_id: str, **params
+    ):
+        client = _client(credentials)
+        return await client.inboxes.messages.update(
+            inbox_id=inbox_id, message_id=message_id, **params
+        )
+
+    async def run(
+        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
+    ) -> BlockOutput:
+        try:
+            if not input_data.add_labels and not input_data.remove_labels:
+                raise ValueError(
+                    "Must specify at least one label operation: add_labels or remove_labels"
+                )
+
+            params: dict = {}
+            if input_data.add_labels:
+                params["add_labels"] = input_data.add_labels
+            if input_data.remove_labels:
+                params["remove_labels"] = input_data.remove_labels
+
+            msg = await self.update_message(
+                credentials,
+                input_data.inbox_id,
+                input_data.message_id,
+                **params,
+            )
+            result = msg.model_dump()
+
+            yield "message_id", msg.message_id
+            yield "result", result
+        except Exception as e:
+            yield "error", str(e)
--- a/autogpt_platform/backend/backend/blocks/agent_mail/pods.py
+++ b/autogpt_platform/backend/backend/blocks/agent_mail/pods.py
@@ -0,0 +1,651 @@
+"""
+AgentMail Pod blocks — create, get, list, delete pods and list pod-scoped resources.
+
+Pods provide multi-tenant isolation between your customers. Each pod acts as
+an isolated workspace containing its own inboxes, domains, threads, and drafts.
+Use pods when building SaaS platforms, agency tools, or AI agent fleets that
+serve multiple customers.
+"""
+
+from backend.sdk import (
+    APIKeyCredentials,
+    Block,
+    BlockCategory,
+    BlockOutput,
+    BlockSchemaInput,
+    BlockSchemaOutput,
+    CredentialsMetaInput,
+    SchemaField,
+)
+
+from ._config import TEST_CREDENTIALS, TEST_CREDENTIALS_INPUT, _client, agent_mail
+
+
+class AgentMailCreatePodBlock(Block):
+    """
+    Create a new pod for multi-tenant customer isolation.
+
+    Each pod acts as an isolated workspace for one customer or tenant.
+    Use client_id to map pods to your internal tenant IDs for idempotent
+    creation (safe to retry without creating duplicates).
+    """
+
+    class Input(BlockSchemaInput):
+        credentials: CredentialsMetaInput = agent_mail.credentials_field(
+            description="AgentMail API key from https://console.agentmail.to"
+        )
+        client_id: str = SchemaField(
+            description="Your internal tenant/customer ID for idempotent mapping. Lets you access the pod by your own ID instead of AgentMail's pod_id.",
+            default="",
+        )
+
+    class Output(BlockSchemaOutput):
+        pod_id: str = SchemaField(description="Unique identifier of the created pod")
+        result: dict = SchemaField(description="Complete pod object with all metadata")
+        error: str = SchemaField(description="Error message if the operation failed")
+
+    def __init__(self):
+        super().__init__(
+            id="a2db9784-2d17-4f8f-9d6b-0214e6f22101",
+            description="Create a new pod for multi-tenant customer isolation. Use client_id to map to your internal tenant IDs.",
+            categories={BlockCategory.COMMUNICATION},
+            input_schema=self.Input,
+            output_schema=self.Output,
+            test_credentials=TEST_CREDENTIALS,
+            test_input={"credentials": TEST_CREDENTIALS_INPUT},
+            test_output=[
+                ("pod_id", "mock-pod-id"),
+                ("result", dict),
+            ],
+            test_mock={
+                "create_pod": lambda *a, **kw: type(
+                    "Pod",
+                    (),
+                    {
+                        "pod_id": "mock-pod-id",
+                        "model_dump": lambda self: {"pod_id": "mock-pod-id"},
+                    },
+                )(),
+            },
+        )
+
+    @staticmethod
+    async def create_pod(credentials: APIKeyCredentials, **params):
+        client = _client(credentials)
+        return await client.pods.create(**params)
+
+    async def run(
+        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
+    ) -> BlockOutput:
+        try:
+            params: dict = {}
+            if input_data.client_id:
+                params["client_id"] = input_data.client_id
+
+            pod = await self.create_pod(credentials, **params)
+            result = pod.model_dump()
+
+            yield "pod_id", pod.pod_id
+            yield "result", result
+        except Exception as e:
+            yield "error", str(e)
+
+
+class AgentMailGetPodBlock(Block):
+    """
+    Retrieve details of an existing pod by its ID.
+
+    Returns the pod metadata including its client_id mapping and
+    creation timestamp.
+    """
+
+    class Input(BlockSchemaInput):
+        credentials: CredentialsMetaInput = agent_mail.credentials_field(
+            description="AgentMail API key from https://console.agentmail.to"
+        )
+        pod_id: str = SchemaField(description="Pod ID to retrieve")
+
+    class Output(BlockSchemaOutput):
+        pod_id: str = SchemaField(description="Unique identifier of the pod")
+        result: dict = SchemaField(description="Complete pod object with all metadata")
+        error: str = SchemaField(description="Error message if the operation failed")
+
+    def __init__(self):
+        super().__init__(
+            id="553361bc-bb1b-4322-9ad4-0c226200217e",
+            description="Retrieve details of an existing pod including its client_id mapping and metadata.",
+            categories={BlockCategory.COMMUNICATION},
+            input_schema=self.Input,
+            output_schema=self.Output,
+            test_credentials=TEST_CREDENTIALS,
+            test_input={"credentials": TEST_CREDENTIALS_INPUT, "pod_id": "test-pod"},
+            test_output=[
+                ("pod_id", "test-pod"),
+                ("result", dict),
+            ],
+            test_mock={
+                "get_pod": lambda *a, **kw: type(
+                    "Pod",
+                    (),
+                    {
+                        "pod_id": "test-pod",
+                        "model_dump": lambda self: {"pod_id": "test-pod"},
+                    },
+                )(),
+            },
+        )
+
+    @staticmethod
+    async def get_pod(credentials: APIKeyCredentials, pod_id: str):
+        client = _client(credentials)
+        return await client.pods.get(pod_id=pod_id)
+
+    async def run(
+        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
+    ) -> BlockOutput:
+        try:
+            pod = await self.get_pod(credentials, pod_id=input_data.pod_id)
+            result = pod.model_dump()
+
+            yield "pod_id", pod.pod_id
+            yield "result", result
+        except Exception as e:
+            yield "error", str(e)
+
+
+class AgentMailListPodsBlock(Block):
+    """
+    List all pods in your AgentMail organization.
+
+    Returns a paginated list of all tenant pods with their metadata.
+    Use this to see all customer workspaces at a glance.
+    """
+
+    class Input(BlockSchemaInput):
+        credentials: CredentialsMetaInput = agent_mail.credentials_field(
+            description="AgentMail API key from https://console.agentmail.to"
+        )
+        limit: int = SchemaField(
+            description="Maximum number of pods to return per page (1-100)",
+            default=20,
+            advanced=True,
+        )
+        page_token: str = SchemaField(
+            description="Token from a previous response to fetch the next page",
+            default="",
+            advanced=True,
+        )
+
+    class Output(BlockSchemaOutput):
+        pods: list[dict] = SchemaField(
+            description="List of pod objects with pod_id, client_id, creation time, etc."
+        )
+        count: int = SchemaField(description="Number of pods returned")
+        next_page_token: str = SchemaField(
+            description="Token for the next page. Empty if no more results.",
+            default="",
+        )
+        error: str = SchemaField(description="Error message if the operation failed")
+
+    def __init__(self):
+        super().__init__(
+            id="9d3725ee-2968-431a-a816-857ab41e1420",
+            description="List all tenant pods in your organization. See all customer workspaces at a glance.",
+            categories={BlockCategory.COMMUNICATION},
+            input_schema=self.Input,
+            output_schema=self.Output,
+            test_credentials=TEST_CREDENTIALS,
+            test_input={"credentials": TEST_CREDENTIALS_INPUT},
+            test_output=[
+                ("pods", []),
+                ("count", 0),
+                ("next_page_token", ""),
+            ],
+            test_mock={
+                "list_pods": lambda *a, **kw: type(
+                    "Resp",
+                    (),
+                    {
+                        "pods": [],
+                        "count": 0,
+                        "next_page_token": "",
+                    },
+                )(),
+            },
+        )
+
+    @staticmethod
+    async def list_pods(credentials: APIKeyCredentials, **params):
+        client = _client(credentials)
+        return await client.pods.list(**params)
+
+    async def run(
+        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
+    ) -> BlockOutput:
+        try:
+            params: dict = {"limit": input_data.limit}
+            if input_data.page_token:
+                params["page_token"] = input_data.page_token
+
+            response = await self.list_pods(credentials, **params)
+            pods = [p.model_dump() for p in response.pods]
+
+            yield "pods", pods
+            yield "count", (c if (c := response.count) is not None else len(pods))
+            yield "next_page_token", response.next_page_token or ""
+        except Exception as e:
+            yield "error", str(e)
+
+
+class AgentMailDeletePodBlock(Block):
+    """
+    Permanently delete a pod. All inboxes and domains must be removed first.
+
+    You cannot delete a pod that still contains inboxes or domains.
+    Delete all child resources first, then delete the pod.
+    """
+
+    class Input(BlockSchemaInput):
+        credentials: CredentialsMetaInput = agent_mail.credentials_field(
+            description="AgentMail API key from https://console.agentmail.to"
+        )
+        pod_id: str = SchemaField(
+            description="Pod ID to permanently delete (must have no inboxes or domains)"
+        )
+
+    class Output(BlockSchemaOutput):
+        success: bool = SchemaField(
+            description="True if the pod was successfully deleted"
+        )
+        error: str = SchemaField(description="Error message if the operation failed")
+
+    def __init__(self):
+        super().__init__(
+            id="f371f8cd-682d-4f5f-905c-529c74a8fb35",
+            description="Permanently delete a pod. All inboxes and domains must be removed first.",
+            categories={BlockCategory.COMMUNICATION},
+            input_schema=self.Input,
+            output_schema=self.Output,
+            is_sensitive_action=True,
+            test_credentials=TEST_CREDENTIALS,
+            test_input={"credentials": TEST_CREDENTIALS_INPUT, "pod_id": "test-pod"},
+            test_output=[("success", True)],
+            test_mock={
+                "delete_pod": lambda *a, **kw: None,
+            },
+        )
+
+    @staticmethod
+    async def delete_pod(credentials: APIKeyCredentials, pod_id: str):
+        client = _client(credentials)
+        await client.pods.delete(pod_id=pod_id)
+
+    async def run(
+        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
+    ) -> BlockOutput:
+        try:
+            await self.delete_pod(credentials, pod_id=input_data.pod_id)
+            yield "success", True
+        except Exception as e:
+            yield "error", str(e)
+
+
+class AgentMailListPodInboxesBlock(Block):
+    """
+    List all inboxes within a specific pod (customer workspace).
+
+    Returns only the inboxes belonging to this pod, providing
+    tenant-scoped visibility.
+    """
+
+    class Input(BlockSchemaInput):
+        credentials: CredentialsMetaInput = agent_mail.credentials_field(
+            description="AgentMail API key from https://console.agentmail.to"
+        )
+        pod_id: str = SchemaField(description="Pod ID to list inboxes from")
+        limit: int = SchemaField(
+            description="Maximum number of inboxes to return per page (1-100)",
+            default=20,
+            advanced=True,
+        )
+        page_token: str = SchemaField(
+            description="Token from a previous response to fetch the next page",
+            default="",
+            advanced=True,
+        )
+
+    class Output(BlockSchemaOutput):
+        inboxes: list[dict] = SchemaField(
+            description="List of inbox objects within this pod"
+        )
+        count: int = SchemaField(description="Number of inboxes returned")
+        next_page_token: str = SchemaField(
+            description="Token for the next page. Empty if no more results.",
+            default="",
+        )
+        error: str = SchemaField(description="Error message if the operation failed")
+
+    def __init__(self):
+        super().__init__(
+            id="a8c17ce0-b7c1-4bc3-ae39-680e1952e5d0",
+            description="List all inboxes within a pod. View email accounts scoped to a specific customer.",
+            categories={BlockCategory.COMMUNICATION},
+            input_schema=self.Input,
+            output_schema=self.Output,
+            test_credentials=TEST_CREDENTIALS,
+            test_input={"credentials": TEST_CREDENTIALS_INPUT, "pod_id": "test-pod"},
+            test_output=[
+                ("inboxes", []),
+                ("count", 0),
+                ("next_page_token", ""),
+            ],
+            test_mock={
+                "list_pod_inboxes": lambda *a, **kw: type(
+                    "Resp",
+                    (),
+                    {
+                        "inboxes": [],
+                        "count": 0,
+                        "next_page_token": "",
+                    },
+                )(),
+            },
+        )
+
+    @staticmethod
+    async def list_pod_inboxes(credentials: APIKeyCredentials, pod_id: str, **params):
+        client = _client(credentials)
+        return await client.pods.inboxes.list(pod_id=pod_id, **params)
+
+    async def run(
+        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
+    ) -> BlockOutput:
+        try:
+            params: dict = {"limit": input_data.limit}
+            if input_data.page_token:
+                params["page_token"] = input_data.page_token
+
+            response = await self.list_pod_inboxes(
+                credentials, pod_id=input_data.pod_id, **params
+            )
+            inboxes = [i.model_dump() for i in response.inboxes]
+
+            yield "inboxes", inboxes
+            yield "count", (c if (c := response.count) is not None else len(inboxes))
+            yield "next_page_token", response.next_page_token or ""
+        except Exception as e:
+            yield "error", str(e)
+
+
+class AgentMailListPodThreadsBlock(Block):
+    """
+    List all conversation threads across all inboxes within a pod.
+
+    Returns threads from every inbox in the pod. Use for building
+    per-customer dashboards showing all email activity, or for
+    supervisor agents monitoring a customer's conversations.
+    """
+
+    class Input(BlockSchemaInput):
+        credentials: CredentialsMetaInput = agent_mail.credentials_field(
+            description="AgentMail API key from https://console.agentmail.to"
+        )
+        pod_id: str = SchemaField(description="Pod ID to list threads from")
+        limit: int = SchemaField(
+            description="Maximum number of threads to return per page (1-100)",
+            default=20,
+            advanced=True,
+        )
+        page_token: str = SchemaField(
+            description="Token from a previous response to fetch the next page",
+            default="",
+            advanced=True,
+        )
+        labels: list[str] = SchemaField(
+            description="Only return threads matching ALL of these labels",
+            default_factory=list,
+            advanced=True,
+        )
+
+    class Output(BlockSchemaOutput):
+        threads: list[dict] = SchemaField(
+            description="List of thread objects from all inboxes in this pod"
+        )
+        count: int = SchemaField(description="Number of threads returned")
+        next_page_token: str = SchemaField(
+            description="Token for the next page. Empty if no more results.",
+            default="",
+        )
+        error: str = SchemaField(description="Error message if the operation failed")
+
+    def __init__(self):
+        super().__init__(
+            id="80214f08-8b85-4533-a6b8-f8123bfcb410",
+            description="List all conversation threads across all inboxes within a pod. View all email activity for a customer.",
+            categories={BlockCategory.COMMUNICATION},
+            input_schema=self.Input,
+            output_schema=self.Output,
+            test_credentials=TEST_CREDENTIALS,
+            test_input={"credentials": TEST_CREDENTIALS_INPUT, "pod_id": "test-pod"},
+            test_output=[
+                ("threads", []),
+                ("count", 0),
+                ("next_page_token", ""),
+            ],
+            test_mock={
+                "list_pod_threads": lambda *a, **kw: type(
+                    "Resp",
+                    (),
+                    {
+                        "threads": [],
+                        "count": 0,
+                        "next_page_token": "",
+                    },
+                )(),
+            },
+        )
+
+    @staticmethod
+    async def list_pod_threads(credentials: APIKeyCredentials, pod_id: str, **params):
+        client = _client(credentials)
+        return await client.pods.threads.list(pod_id=pod_id, **params)
+
+    async def run(
+        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
+    ) -> BlockOutput:
+        try:
+            params: dict = {"limit": input_data.limit}
+            if input_data.page_token:
+                params["page_token"] = input_data.page_token
+            if input_data.labels:
+                params["labels"] = input_data.labels
+
+            response = await self.list_pod_threads(
+                credentials, pod_id=input_data.pod_id, **params
+            )
+            threads = [t.model_dump() for t in response.threads]
+
+            yield "threads", threads
+            yield "count", (c if (c := response.count) is not None else len(threads))
+            yield "next_page_token", response.next_page_token or ""
+        except Exception as e:
+            yield "error", str(e)
+
+
+class AgentMailListPodDraftsBlock(Block):
+    """
+    List all drafts across all inboxes within a pod.
+
+    Returns pending drafts from every inbox in the pod. Use for
+    per-customer approval dashboards or monitoring scheduled sends.
+    """
+
+    class Input(BlockSchemaInput):
+        credentials: CredentialsMetaInput = agent_mail.credentials_field(
+            description="AgentMail API key from https://console.agentmail.to"
+        )
+        pod_id: str = SchemaField(description="Pod ID to list drafts from")
+        limit: int = SchemaField(
+            description="Maximum number of drafts to return per page (1-100)",
+            default=20,
+            advanced=True,
+        )
+        page_token: str = SchemaField(
+            description="Token from a previous response to fetch the next page",
+            default="",
+            advanced=True,
+        )
+
+    class Output(BlockSchemaOutput):
+        drafts: list[dict] = SchemaField(
+            description="List of draft objects from all inboxes in this pod"
+        )
+        count: int = SchemaField(description="Number of drafts returned")
+        next_page_token: str = SchemaField(
+            description="Token for the next page. Empty if no more results.",
+            default="",
+        )
+        error: str = SchemaField(description="Error message if the operation failed")
+
+    def __init__(self):
+        super().__init__(
+            id="12fd7a3e-51ad-4b20-97c1-0391f207f517",
+            description="List all drafts across all inboxes within a pod. View pending emails for a customer.",
+            categories={BlockCategory.COMMUNICATION},
+            input_schema=self.Input,
+            output_schema=self.Output,
+            test_credentials=TEST_CREDENTIALS,
+            test_input={"credentials": TEST_CREDENTIALS_INPUT, "pod_id": "test-pod"},
+            test_output=[
+                ("drafts", []),
+                ("count", 0),
+                ("next_page_token", ""),
+            ],
+            test_mock={
+                "list_pod_drafts": lambda *a, **kw: type(
+                    "Resp",
+                    (),
+                    {
+                        "drafts": [],
+                        "count": 0,
+                        "next_page_token": "",
+                    },
+                )(),
+            },
+        )
+
+    @staticmethod
+    async def list_pod_drafts(credentials: APIKeyCredentials, pod_id: str, **params):
+        client = _client(credentials)
+        return await client.pods.drafts.list(pod_id=pod_id, **params)
+
+    async def run(
+        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
+    ) -> BlockOutput:
+        try:
+            params: dict = {"limit": input_data.limit}
+            if input_data.page_token:
+                params["page_token"] = input_data.page_token
+
+            response = await self.list_pod_drafts(
+                credentials, pod_id=input_data.pod_id, **params
+            )
+            drafts = [d.model_dump() for d in response.drafts]
+
+            yield "drafts", drafts
+            yield "count", (c if (c := response.count) is not None else len(drafts))
+            yield "next_page_token", response.next_page_token or ""
+        except Exception as e:
+            yield "error", str(e)
+
+
+class AgentMailCreatePodInboxBlock(Block):
+    """
+    Create a new email inbox within a specific pod (customer workspace).
+
+    The inbox is automatically scoped to the pod and inherits its
+    isolation guarantees. If username/domain are not provided,
+    AgentMail auto-generates a unique address.
+    """
+
+    class Input(BlockSchemaInput):
+        credentials: CredentialsMetaInput = agent_mail.credentials_field(
+            description="AgentMail API key from https://console.agentmail.to"
+        )
+        pod_id: str = SchemaField(description="Pod ID to create the inbox in")
+        username: str = SchemaField(
+            description="Local part of the email address (e.g. 'support'). Leave empty to auto-generate.",
+            default="",
+        )
+        domain: str = SchemaField(
+            description="Email domain (e.g. 'mydomain.com'). Defaults to agentmail.to if empty.",
+            default="",
+        )
+        display_name: str = SchemaField(
+            description="Friendly name shown in the 'From' field (e.g. 'Customer Support')",
+            default="",
+        )
+
+    class Output(BlockSchemaOutput):
+        inbox_id: str = SchemaField(
+            description="Unique identifier of the created inbox"
+        )
+        email_address: str = SchemaField(description="Full email address of the inbox")
+        result: dict = SchemaField(
+            description="Complete inbox object with all metadata"
+        )
+        error: str = SchemaField(description="Error message if the operation failed")
+
+    def __init__(self):
+        super().__init__(
+            id="c6862373-1ac6-402e-89e6-7db1fea882af",
+            description="Create a new email inbox within a pod. The inbox is scoped to the customer workspace.",
+            categories={BlockCategory.COMMUNICATION},
+            input_schema=self.Input,
+            output_schema=self.Output,
+            test_credentials=TEST_CREDENTIALS,
+            test_input={"credentials": TEST_CREDENTIALS_INPUT, "pod_id": "test-pod"},
+            test_output=[
+                ("inbox_id", "mock-inbox-id"),
+                ("email_address", "mock-inbox-id"),
+                ("result", dict),
+            ],
+            test_mock={
+                "create_pod_inbox": lambda *a, **kw: type(
+                    "Inbox",
+                    (),
+                    {
+                        "inbox_id": "mock-inbox-id",
+                        "model_dump": lambda self: {"inbox_id": "mock-inbox-id"},
+                    },
+                )(),
+            },
+        )
+
+    @staticmethod
+    async def create_pod_inbox(credentials: APIKeyCredentials, pod_id: str, **params):
+        client = _client(credentials)
+        return await client.pods.inboxes.create(pod_id=pod_id, **params)
+
+    async def run(
+        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
+    ) -> BlockOutput:
+        try:
+            params: dict = {}
+            if input_data.username:
+                params["username"] = input_data.username
+            if input_data.domain:
+                params["domain"] = input_data.domain
+            if input_data.display_name:
+                params["display_name"] = input_data.display_name
+
+            inbox = await self.create_pod_inbox(
+                credentials, pod_id=input_data.pod_id, **params
+            )
+            result = inbox.model_dump()
+
+            yield "inbox_id", inbox.inbox_id
+            yield "email_address", inbox.inbox_id
+            yield "result", result
+        except Exception as e:
+            yield "error", str(e)
--- a/autogpt_platform/backend/backend/blocks/agent_mail/threads.py
+++ b/autogpt_platform/backend/backend/blocks/agent_mail/threads.py
@@ -0,0 +1,438 @@
+"""
+AgentMail Thread blocks — list, get, and delete conversation threads.
+
+A Thread groups related messages into a single conversation. Threads are
+created automatically when a new message is sent and grow as replies are added.
+Threads can be queried per-inbox or across the entire organization.
+"""
+
+from backend.sdk import (
+    APIKeyCredentials,
+    Block,
+    BlockCategory,
+    BlockOutput,
+    BlockSchemaInput,
+    BlockSchemaOutput,
+    CredentialsMetaInput,
+    SchemaField,
+)
+
+from ._config import TEST_CREDENTIALS, TEST_CREDENTIALS_INPUT, _client, agent_mail
+
+
+class AgentMailListInboxThreadsBlock(Block):
+    """
+    List all conversation threads within a specific AgentMail inbox.
+
+    Returns a paginated list of threads with optional label filtering.
+    Use labels to find threads by campaign, status, or custom tags.
+    """
+
+    class Input(BlockSchemaInput):
+        credentials: CredentialsMetaInput = agent_mail.credentials_field(
+            description="AgentMail API key from https://console.agentmail.to"
+        )
+        inbox_id: str = SchemaField(
+            description="Inbox ID or email address to list threads from"
+        )
+        limit: int = SchemaField(
+            description="Maximum number of threads to return per page (1-100)",
+            default=20,
+            advanced=True,
+        )
+        page_token: str = SchemaField(
+            description="Token from a previous response to fetch the next page",
+            default="",
+            advanced=True,
+        )
+        labels: list[str] = SchemaField(
+            description="Only return threads matching ALL of these labels (e.g. ['q4-campaign', 'follow-up'])",
+            default_factory=list,
+            advanced=True,
+        )
+
+    class Output(BlockSchemaOutput):
+        threads: list[dict] = SchemaField(
+            description="List of thread objects with thread_id, subject, message count, labels, etc."
+        )
+        count: int = SchemaField(description="Number of threads returned")
+        next_page_token: str = SchemaField(
+            description="Token for the next page. Empty if no more results.",
+            default="",
+        )
+        error: str = SchemaField(description="Error message if the operation failed")
+
+    def __init__(self):
+        super().__init__(
+            id="63dd9e2d-ef81-405c-b034-c031f0437334",
+            description="List all conversation threads in an AgentMail inbox. Filter by labels for campaign tracking or status management.",
+            categories={BlockCategory.COMMUNICATION},
+            input_schema=self.Input,
+            output_schema=self.Output,
+            test_credentials=TEST_CREDENTIALS,
+            test_input={
+                "credentials": TEST_CREDENTIALS_INPUT,
+                "inbox_id": "test-inbox",
+            },
+            test_output=[
+                ("threads", []),
+                ("count", 0),
+                ("next_page_token", ""),
+            ],
+            test_mock={
+                "list_threads": lambda *a, **kw: type(
+                    "Resp",
+                    (),
+                    {
+                        "threads": [],
+                        "count": 0,
+                        "next_page_token": "",
+                    },
+                )(),
+            },
+        )
+
+    @staticmethod
+    async def list_threads(credentials: APIKeyCredentials, inbox_id: str, **params):
+        client = _client(credentials)
+        return await client.inboxes.threads.list(inbox_id=inbox_id, **params)
+
+    async def run(
+        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
+    ) -> BlockOutput:
+        try:
+            params: dict = {"limit": input_data.limit}
+            if input_data.page_token:
+                params["page_token"] = input_data.page_token
+            if input_data.labels:
+                params["labels"] = input_data.labels
+
+            response = await self.list_threads(
+                credentials, input_data.inbox_id, **params
+            )
+            threads = [t.model_dump() for t in response.threads]
+
+            yield "threads", threads
+            yield "count", (c if (c := response.count) is not None else len(threads))
+            yield "next_page_token", response.next_page_token or ""
+        except Exception as e:
+            yield "error", str(e)
+
+
+class AgentMailGetInboxThreadBlock(Block):
+    """
+    Retrieve a single conversation thread from an AgentMail inbox.
+
+    Returns the thread with all its messages in chronological order.
+    Use this to get the full conversation history for context when
+    composing replies.
+    """
+
+    class Input(BlockSchemaInput):
+        credentials: CredentialsMetaInput = agent_mail.credentials_field(
+            description="AgentMail API key from https://console.agentmail.to"
+        )
+        inbox_id: str = SchemaField(
+            description="Inbox ID or email address the thread belongs to"
+        )
+        thread_id: str = SchemaField(description="Thread ID to retrieve")
+
+    class Output(BlockSchemaOutput):
+        thread_id: str = SchemaField(description="Unique identifier of the thread")
+        messages: list[dict] = SchemaField(
+            description="All messages in the thread, in chronological order"
+        )
+        result: dict = SchemaField(
+            description="Complete thread object with all metadata"
+        )
+        error: str = SchemaField(description="Error message if the operation failed")
+
+    def __init__(self):
+        super().__init__(
+            id="42866290-1479-4153-83e7-550b703e9da2",
+            description="Retrieve a conversation thread with all its messages. Use for getting full conversation context before replying.",
+            categories={BlockCategory.COMMUNICATION},
+            input_schema=self.Input,
+            output_schema=self.Output,
+            test_credentials=TEST_CREDENTIALS,
+            test_input={
+                "credentials": TEST_CREDENTIALS_INPUT,
+                "inbox_id": "test-inbox",
+                "thread_id": "test-thread",
+            },
+            test_output=[
+                ("thread_id", "test-thread"),
+                ("messages", []),
+                ("result", dict),
+            ],
+            test_mock={
+                "get_thread": lambda *a, **kw: type(
+                    "Thread",
+                    (),
+                    {
+                        "thread_id": "test-thread",
+                        "messages": [],
+                        "model_dump": lambda self: {
+                            "thread_id": "test-thread",
+                            "messages": [],
+                        },
+                    },
+                )(),
+            },
+        )
+
+    @staticmethod
+    async def get_thread(credentials: APIKeyCredentials, inbox_id: str, thread_id: str):
+        client = _client(credentials)
+        return await client.inboxes.threads.get(inbox_id=inbox_id, thread_id=thread_id)
+
+    async def run(
+        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
+    ) -> BlockOutput:
+        try:
+            thread = await self.get_thread(
+                credentials, input_data.inbox_id, input_data.thread_id
+            )
+            messages = [m.model_dump() for m in thread.messages]
+            result = thread.model_dump()
+            result["messages"] = messages
+
+            yield "thread_id", thread.thread_id
+            yield "messages", messages
+            yield "result", result
+        except Exception as e:
+            yield "error", str(e)
+
+
+class AgentMailDeleteInboxThreadBlock(Block):
+    """
+    Permanently delete a conversation thread and all its messages from an inbox.
+
+    This removes the thread and every message within it. This action
+    cannot be undone.
+    """
+
+    class Input(BlockSchemaInput):
+        credentials: CredentialsMetaInput = agent_mail.credentials_field(
+            description="AgentMail API key from https://console.agentmail.to"
+        )
+        inbox_id: str = SchemaField(
+            description="Inbox ID or email address the thread belongs to"
+        )
+        thread_id: str = SchemaField(description="Thread ID to permanently delete")
+
+    class Output(BlockSchemaOutput):
+        success: bool = SchemaField(
+            description="True if the thread was successfully deleted"
+        )
+        error: str = SchemaField(description="Error message if the operation failed")
+
+    def __init__(self):
+        super().__init__(
+            id="18cd5f6f-4ff6-45da-8300-25a50ea7fb75",
+            description="Permanently delete a conversation thread and all its messages. This action cannot be undone.",
+            categories={BlockCategory.COMMUNICATION},
+            input_schema=self.Input,
+            output_schema=self.Output,
+            is_sensitive_action=True,
+            test_credentials=TEST_CREDENTIALS,
+            test_input={
+                "credentials": TEST_CREDENTIALS_INPUT,
+                "inbox_id": "test-inbox",
+                "thread_id": "test-thread",
+            },
+            test_output=[("success", True)],
+            test_mock={
+                "delete_thread": lambda *a, **kw: None,
+            },
+        )
+
+    @staticmethod
+    async def delete_thread(
+        credentials: APIKeyCredentials, inbox_id: str, thread_id: str
+    ):
+        client = _client(credentials)
+        await client.inboxes.threads.delete(inbox_id=inbox_id, thread_id=thread_id)
+
+    async def run(
+        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
+    ) -> BlockOutput:
+        try:
+            await self.delete_thread(
+                credentials, input_data.inbox_id, input_data.thread_id
+            )
+            yield "success", True
+        except Exception as e:
+            yield "error", str(e)
+
+
+class AgentMailListOrgThreadsBlock(Block):
+    """
+    List conversation threads across ALL inboxes in your organization.
+
+    Unlike per-inbox listing, this returns threads from every inbox.
+    Ideal for building supervisor agents that monitor all conversations,
+    analytics dashboards, or cross-agent routing workflows.
+    """
+
+    class Input(BlockSchemaInput):
+        credentials: CredentialsMetaInput = agent_mail.credentials_field(
+            description="AgentMail API key from https://console.agentmail.to"
+        )
+        limit: int = SchemaField(
+            description="Maximum number of threads to return per page (1-100)",
+            default=20,
+            advanced=True,
+        )
+        page_token: str = SchemaField(
+            description="Token from a previous response to fetch the next page",
+            default="",
+            advanced=True,
+        )
+        labels: list[str] = SchemaField(
+            description="Only return threads matching ALL of these labels",
+            default_factory=list,
+            advanced=True,
+        )
+
+    class Output(BlockSchemaOutput):
+        threads: list[dict] = SchemaField(
+            description="List of thread objects from all inboxes in the organization"
+        )
+        count: int = SchemaField(description="Number of threads returned")
+        next_page_token: str = SchemaField(
+            description="Token for the next page. Empty if no more results.",
+            default="",
+        )
+        error: str = SchemaField(description="Error message if the operation failed")
+
+    def __init__(self):
+        super().__init__(
+            id="d7a0657b-58ab-48b2-898b-7bd94f44a708",
+            description="List threads across ALL inboxes in your organization. Use for supervisor agents, dashboards, or cross-agent monitoring.",
+            categories={BlockCategory.COMMUNICATION},
+            input_schema=self.Input,
+            output_schema=self.Output,
+            test_credentials=TEST_CREDENTIALS,
+            test_input={"credentials": TEST_CREDENTIALS_INPUT},
+            test_output=[
+                ("threads", []),
+                ("count", 0),
+                ("next_page_token", ""),
+            ],
+            test_mock={
+                "list_org_threads": lambda *a, **kw: type(
+                    "Resp",
+                    (),
+                    {
+                        "threads": [],
+                        "count": 0,
+                        "next_page_token": "",
+                    },
+                )(),
+            },
+        )
+
+    @staticmethod
+    async def list_org_threads(credentials: APIKeyCredentials, **params):
+        client = _client(credentials)
+        return await client.threads.list(**params)
+
+    async def run(
+        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
+    ) -> BlockOutput:
+        try:
+            params: dict = {"limit": input_data.limit}
+            if input_data.page_token:
+                params["page_token"] = input_data.page_token
+            if input_data.labels:
+                params["labels"] = input_data.labels
+
+            response = await self.list_org_threads(credentials, **params)
+            threads = [t.model_dump() for t in response.threads]
+
+            yield "threads", threads
+            yield "count", (c if (c := response.count) is not None else len(threads))
+            yield "next_page_token", response.next_page_token or ""
+        except Exception as e:
+            yield "error", str(e)
+
+
+class AgentMailGetOrgThreadBlock(Block):
+    """
+    Retrieve a single conversation thread by ID from anywhere in the organization.
+
+    Works without needing to know which inbox the thread belongs to.
+    Returns the thread with all its messages in chronological order.
+    """
+
+    class Input(BlockSchemaInput):
+        credentials: CredentialsMetaInput = agent_mail.credentials_field(
+            description="AgentMail API key from https://console.agentmail.to"
+        )
+        thread_id: str = SchemaField(
+            description="Thread ID to retrieve (works across all inboxes)"
+        )
+
+    class Output(BlockSchemaOutput):
+        thread_id: str = SchemaField(description="Unique identifier of the thread")
+        messages: list[dict] = SchemaField(
+            description="All messages in the thread, in chronological order"
+        )
+        result: dict = SchemaField(
+            description="Complete thread object with all metadata"
+        )
+        error: str = SchemaField(description="Error message if the operation failed")
+
+    def __init__(self):
+        super().__init__(
+            id="39aaae31-3eb1-44c6-9e37-5a44a4529649",
+            description="Retrieve a conversation thread by ID from anywhere in the organization, without needing the inbox ID.",
+            categories={BlockCategory.COMMUNICATION},
+            input_schema=self.Input,
+            output_schema=self.Output,
+            test_credentials=TEST_CREDENTIALS,
+            test_input={
+                "credentials": TEST_CREDENTIALS_INPUT,
+                "thread_id": "test-thread",
+            },
+            test_output=[
+                ("thread_id", "test-thread"),
+                ("messages", []),
+                ("result", dict),
+            ],
+            test_mock={
+                "get_org_thread": lambda *a, **kw: type(
+                    "Thread",
+                    (),
+                    {
+                        "thread_id": "test-thread",
+                        "messages": [],
+                        "model_dump": lambda self: {
+                            "thread_id": "test-thread",
+                            "messages": [],
+                        },
+                    },
+                )(),
+            },
+        )
+
+    @staticmethod
+    async def get_org_thread(credentials: APIKeyCredentials, thread_id: str):
+        client = _client(credentials)
+        return await client.threads.get(thread_id=thread_id)
+
+    async def run(
+        self, input_data: Input, *, credentials: APIKeyCredentials, **kwargs
+    ) -> BlockOutput:
+        try:
+            thread = await self.get_org_thread(credentials, input_data.thread_id)
+            messages = [m.model_dump() for m in thread.messages]
+            result = thread.model_dump()
+            result["messages"] = messages
+
+            yield "thread_id", thread.thread_id
+            yield "messages", messages
+            yield "result", result
+        except Exception as e:
+            yield "error", str(e)
--- a/autogpt_platform/backend/backend/blocks/ai_condition.py
+++ b/autogpt_platform/backend/backend/blocks/ai_condition.py
@@ -1,3 +1,4 @@
+import re
 from typing import Any

 from backend.blocks._base import (
@@ -19,6 +20,33 @@ from backend.blocks.llm import (
 )
 from backend.data.model import APIKeyCredentials, NodeExecutionStats, SchemaField

+# Minimum max_output_tokens accepted by OpenAI-compatible APIs.
+# A true/false answer fits comfortably within this budget.
+MIN_LLM_OUTPUT_TOKENS = 16
+
+
+def _parse_boolean_response(response_text: str) -> tuple[bool, str | None]:
+    """Parse an LLM response into a boolean result.
+
+    Returns a ``(result, error)`` tuple.  *error* is ``None`` when the
+    response is unambiguous; otherwise it contains a diagnostic message
+    and *result* defaults to ``False``.
+    """
+    text = response_text.strip().lower()
+    if text == "true":
+        return True, None
+    if text == "false":
+        return False, None
+
+    # Fuzzy match – use word boundaries to avoid false positives like "untrue".
+    tokens = set(re.findall(r"\b(true|false|yes|no|1|0)\b", text))
+    if tokens == {"true"} or tokens == {"yes"} or tokens == {"1"}:
+        return True, None
+    if tokens == {"false"} or tokens == {"no"} or tokens == {"0"}:
+        return False, None
+
+    return False, f"Unclear AI response: '{response_text}'"
+

 class AIConditionBlock(AIBlockBase):
    """
@@ -162,54 +190,26 @@ class AIConditionBlock(AIBlockBase):
        ]

        # Call the LLM
-        try:
-            response = await self.llm_call(
-                credentials=credentials,
-                llm_model=input_data.model,
-                prompt=prompt,
-                max_tokens=10,  # We only expect a true/false response
+        response = await self.llm_call(
+            credentials=credentials,
+            llm_model=input_data.model,
+            prompt=prompt,
+            max_tokens=MIN_LLM_OUTPUT_TOKENS,
+        )
+
+        # Extract the boolean result from the response
+        result, error = _parse_boolean_response(response.response)
+        if error:
+            yield "error", error
+
+        # Update internal stats
+        self.merge_stats(
+            NodeExecutionStats(
+                input_token_count=response.prompt_tokens,
+                output_token_count=response.completion_tokens,
            )
-
-            # Extract the boolean result from the response
-            response_text = response.response.strip().lower()
-            if response_text == "true":
-                result = True
-            elif response_text == "false":
-                result = False
-            else:
-                # If the response is not clear, try to interpret it using word boundaries
-                import re
-
-                # Use word boundaries to avoid false positives like 'untrue' or '10'
-                tokens = set(re.findall(r"\b(true|false|yes|no|1|0)\b", response_text))
-
-                if tokens == {"true"} or tokens == {"yes"} or tokens == {"1"}:
-                    result = True
-                elif tokens == {"false"} or tokens == {"no"} or tokens == {"0"}:
-                    result = False
-                else:
-                    # Unclear or conflicting response - default to False and yield error
-                    result = False
-                    yield "error", f"Unclear AI response: '{response.response}'"
-
-            # Update internal stats
-            self.merge_stats(
-                NodeExecutionStats(
-                    input_token_count=response.prompt_tokens,
-                    output_token_count=response.completion_tokens,
-                )
-            )
-            self.prompt = response.prompt
-
-        except Exception as e:
-            # In case of any error, default to False to be safe
-            result = False
-            # Log the error but don't fail the block execution
-            import logging
-
-            logger = logging.getLogger(__name__)
-            logger.error(f"AI condition evaluation failed: {str(e)}")
-            yield "error", f"AI evaluation failed: {str(e)}"
+        )
+        self.prompt = response.prompt

        # Yield results
        yield "result", result
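The word-boundary matching in `_parse_boolean_response` can be exercised on its own. The following is a minimal standalone sketch that copies the parsing logic from the hunk above. The name `parse_boolean` is hypothetical, used only so the snippet runs without the backend package. It illustrates why inputs such as "untrue" or a mixed answer are reported as unclear rather than coerced to a boolean.

```python
from __future__ import annotations

import re


def parse_boolean(response_text: str) -> tuple[bool, str | None]:
    """Standalone copy of the parsing logic, for illustration only."""
    text = response_text.strip().lower()
    if text == "true":
        return True, None
    if text == "false":
        return False, None

    # Collect whole-word matches only; "untrue" and "10" yield no tokens
    # because there is no word boundary inside them.
    tokens = set(re.findall(r"\b(true|false|yes|no|1|0)\b", text))
    if tokens in ({"true"}, {"yes"}, {"1"}):
        return True, None
    if tokens in ({"false"}, {"no"}, {"0"}):
        return False, None

    # Empty or conflicting token sets fall through to the error case.
    return False, f"Unclear AI response: '{response_text}'"


clear = parse_boolean("Yes, it is.")
embedded = parse_boolean("untrue")
conflicting = parse_boolean("true and false")
```

In this sketch a sentence-like answer still resolves when it contains exactly one kind of recognised token, while an empty or conflicting token set returns `False` together with a diagnostic string.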
--- a/autogpt_platform/backend/backend/blocks/ai_condition_test.py
+++ b/autogpt_platform/backend/backend/blocks/ai_condition_test.py
@@ -0,0 +1,147 @@
+"""Tests for AIConditionBlock – regression coverage for max_tokens and error propagation."""
+
+from __future__ import annotations
+
+from typing import cast
+
+import pytest
+
+from backend.blocks.ai_condition import (
+    MIN_LLM_OUTPUT_TOKENS,
+    AIConditionBlock,
+    _parse_boolean_response,
+)
+from backend.blocks.llm import (
+    DEFAULT_LLM_MODEL,
+    TEST_CREDENTIALS,
+    TEST_CREDENTIALS_INPUT,
+    AICredentials,
+    LLMResponse,
+)
+
+_TEST_AI_CREDENTIALS = cast(AICredentials, TEST_CREDENTIALS_INPUT)
+
+
+# ---------------------------------------------------------------------------
+# Helper to collect all yields from the async generator
+# ---------------------------------------------------------------------------
+
+
+async def _collect_outputs(block: AIConditionBlock, input_data, credentials):
+    outputs: dict[str, object] = {}
+    async for name, value in block.run(input_data, credentials=credentials):
+        outputs[name] = value
+    return outputs
+
+
+def _make_input(**overrides) -> AIConditionBlock.Input:
+    defaults: dict = {
+        "input_value": "hello@example.com",
+        "condition": "the input is an email address",
+        "yes_value": "yes!",
+        "no_value": "no!",
+        "model": DEFAULT_LLM_MODEL,
+        "credentials": TEST_CREDENTIALS_INPUT,
+    }
+    defaults.update(overrides)
+    return AIConditionBlock.Input(**defaults)
+
+
+def _mock_llm_response(response_text: str) -> LLMResponse:
+    return LLMResponse(
+        raw_response="",
+        prompt=[],
+        response=response_text,
+        tool_calls=None,
+        prompt_tokens=10,
+        completion_tokens=5,
+        reasoning=None,
+    )
+
+
+# ---------------------------------------------------------------------------
+# _parse_boolean_response unit tests
+# ---------------------------------------------------------------------------
+
+
+class TestParseBooleanResponse:
+    def test_true_exact(self):
+        assert _parse_boolean_response("true") == (True, None)
+
+    def test_false_exact(self):
+        assert _parse_boolean_response("false") == (False, None)
+
+    def test_true_with_whitespace(self):
+        assert _parse_boolean_response("  True  ") == (True, None)
+
+    def test_yes_fuzzy(self):
+        assert _parse_boolean_response("Yes") == (True, None)
+
+    def test_no_fuzzy(self):
+        assert _parse_boolean_response("no") == (False, None)
+
+    def test_one_fuzzy(self):
+        assert _parse_boolean_response("1") == (True, None)
+
+    def test_zero_fuzzy(self):
+        assert _parse_boolean_response("0") == (False, None)
+
+    def test_unclear_response(self):
+        result, error = _parse_boolean_response("I'm not sure")
+        assert result is False
+        assert error is not None
+        assert "Unclear" in error
+
+    def test_conflicting_tokens(self):
+        result, error = _parse_boolean_response("true and false")
+        assert result is False
+        assert error is not None
+
+
+# ---------------------------------------------------------------------------
+# Regression: max_tokens is set to MIN_LLM_OUTPUT_TOKENS
+# ---------------------------------------------------------------------------
+
+
+class TestMaxTokensRegression:
+    @pytest.mark.asyncio
+    async def test_llm_call_receives_min_output_tokens(self):
+        """max_tokens must be MIN_LLM_OUTPUT_TOKENS (16) – the previous value
+        of 10 was too low and caused OpenAI to reject the request."""
+        block = AIConditionBlock()
+        captured_kwargs: dict = {}
+
+        async def spy_llm_call(**kwargs):
+            captured_kwargs.update(kwargs)
+            return _mock_llm_response("true")
+
+        block.llm_call = spy_llm_call  # type: ignore[assignment]
+
+        input_data = _make_input()
+        await _collect_outputs(block, input_data, credentials=TEST_CREDENTIALS)
+
+        assert captured_kwargs["max_tokens"] == MIN_LLM_OUTPUT_TOKENS
+        assert captured_kwargs["max_tokens"] == 16
+
+
+# ---------------------------------------------------------------------------
+# Regression: exceptions from llm_call must propagate
+# ---------------------------------------------------------------------------
+
+
+class TestExceptionPropagation:
+    @pytest.mark.asyncio
+    async def test_llm_call_exception_propagates(self):
+        """If llm_call raises, the exception must NOT be swallowed.
+        Previously the block caught all exceptions and silently returned
+        result=False."""
+        block = AIConditionBlock()
+
+        async def boom(**kwargs):
+            raise RuntimeError("LLM provider error")
+
+        block.llm_call = boom  # type: ignore[assignment]
+
+        input_data = _make_input()
+        with pytest.raises(RuntimeError, match="LLM provider error"):
+            await _collect_outputs(block, input_data, credentials=TEST_CREDENTIALS)
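The propagation test relies on an exception raised inside an async generator surfacing at the consumer's `async for`. The sketch below contrasts a catch-all generator with one that lets the error through. All names (`failing_call`, `run_swallowing`, `run_propagating`, `collect`) are hypothetical stand-ins, not the block's actual code.

```python
import asyncio


async def failing_call():
    # Stand-in for an LLM call that fails at the provider.
    raise RuntimeError("LLM provider error")


async def run_swallowing():
    # Catch-all style: the failure becomes an "error" output plus a default result.
    try:
        await failing_call()
        result = True
    except Exception as e:
        result = False
        yield "error", f"AI evaluation failed: {e}"
    yield "result", result


async def run_propagating():
    # No try/except: the exception reaches whoever iterates the generator.
    await failing_call()
    yield "result", True


async def collect(gen):
    # Drain an async generator into a list of (name, value) pairs.
    return [item async for item in gen]


swallowed = asyncio.run(collect(run_swallowing()))

try:
    asyncio.run(collect(run_propagating()))
    propagated = None
except RuntimeError as exc:
    propagated = str(exc)
```

With the catch-all variant the consumer receives a `result` of `False` that looks like a legitimate evaluation. With the propagating variant the caller sees the `RuntimeError` and can decide how to handle it.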
--- a/autogpt_platform/backend/backend/blocks/ai_image_customizer.py
+++ b/autogpt_platform/backend/backend/blocks/ai_image_customizer.py
@@ -27,6 +27,7 @@ from backend.util.file import MediaFileType, store_media_file
 class GeminiImageModel(str, Enum):
    NANO_BANANA = "google/nano-banana"
    NANO_BANANA_PRO = "google/nano-banana-pro"
+    NANO_BANANA_2 = "google/nano-banana-2"


 class AspectRatio(str, Enum):
@@ -77,7 +78,7 @@ class AIImageCustomizerBlock(Block):
        )
        model: GeminiImageModel = SchemaField(
            description="The AI model to use for image generation and editing",
-            default=GeminiImageModel.NANO_BANANA,
+            default=GeminiImageModel.NANO_BANANA_2,
            title="Model",
        )
        images: list[MediaFileType] = SchemaField(
@@ -103,7 +104,7 @@ class AIImageCustomizerBlock(Block):
        super().__init__(
            id="d76bbe4c-930e-4894-8469-b66775511f71",
            description=(
-                "Generate and edit custom images using Google's Nano-Banana model from Gemini 2.5. "
+                "Generate and edit custom images using Google's Nano-Banana models from Gemini. "
                "Provide a prompt and optional reference images to create or modify images."
            ),
            categories={BlockCategory.AI, BlockCategory.MULTIMEDIA},
@@ -111,7 +112,7 @@ class AIImageCustomizerBlock(Block):
            output_schema=AIImageCustomizerBlock.Output,
            test_input={
                "prompt": "Make the scene more vibrant and colorful",
-                "model": GeminiImageModel.NANO_BANANA,
+                "model": GeminiImageModel.NANO_BANANA_2,
                "images": [],
                "aspect_ratio": AspectRatio.MATCH_INPUT_IMAGE,
                "output_format": OutputFormat.JPG,
--- a/autogpt_platform/backend/backend/blocks/ai_image_generator_block.py
+++ b/autogpt_platform/backend/backend/blocks/ai_image_generator_block.py
@@ -115,6 +115,7 @@ class ImageGenModel(str, Enum):
    RECRAFT = "Recraft v3"
    SD3_5 = "Stable Diffusion 3.5 Medium"
    NANO_BANANA_PRO = "Nano Banana Pro"
+    NANO_BANANA_2 = "Nano Banana 2"


 class AIImageGeneratorBlock(Block):
@@ -131,7 +132,7 @@ class AIImageGeneratorBlock(Block):
        )
        model: ImageGenModel = SchemaField(
            description="The AI model to use for image generation",
-            default=ImageGenModel.SD3_5,
+            default=ImageGenModel.NANO_BANANA_2,
            title="Model",
        )
        size: ImageSize = SchemaField(
@@ -165,7 +166,7 @@ class AIImageGeneratorBlock(Block):
            test_input={
                "credentials": TEST_CREDENTIALS_INPUT,
                "prompt": "An octopus using a laptop in a snowy forest with 'AutoGPT' clearly visible on the screen",
-                "model": ImageGenModel.RECRAFT,
+                "model": ImageGenModel.NANO_BANANA_2,
                "size": ImageSize.SQUARE,
                "style": ImageStyle.REALISTIC,
            },
@@ -179,7 +180,9 @@ class AIImageGeneratorBlock(Block):
            ],
            test_mock={
                # Return a data URI directly so store_media_file doesn't need to download
-                "_run_client": lambda *args, **kwargs: "data:image/webp;base64,UklGRiQAAABXRUJQVlA4IBgAAAAwAQCdASoBAAEAAQAcJYgCdAEO"
+                "_run_client": lambda *args, **kwargs: (
+                    "data:image/webp;base64,UklGRiQAAABXRUJQVlA4IBgAAAAwAQCdASoBAAEAAQAcJYgCdAEO"
+                )
            },
        )

@@ -280,17 +283,24 @@ class AIImageGeneratorBlock(Block):
                )
                return output

-            elif input_data.model == ImageGenModel.NANO_BANANA_PRO:
-                # Use Nano Banana Pro (Google Gemini 3 Pro Image)
+            elif input_data.model in (
+                ImageGenModel.NANO_BANANA_PRO,
+                ImageGenModel.NANO_BANANA_2,
+            ):
+                # Use Nano Banana models (Google Gemini image variants)
+                model_map = {
+                    ImageGenModel.NANO_BANANA_PRO: "google/nano-banana-pro",
+                    ImageGenModel.NANO_BANANA_2: "google/nano-banana-2",
+                }
                input_params = {
                    "prompt": modified_prompt,
                    "aspect_ratio": SIZE_TO_NANO_BANANA_RATIO[input_data.size],
-                    "resolution": "2K",  # Default to 2K for good quality/cost balance
+                    "resolution": "2K",
                    "output_format": "jpg",
-                    "safety_filter_level": "block_only_high",  # Most permissive
+                    "safety_filter_level": "block_only_high",
                }
                output = await self._run_client(
-                    credentials, "google/nano-banana-pro", input_params
+                    credentials, model_map[input_data.model], input_params
                )
                return output

--- a/autogpt_platform/backend/backend/blocks/autopilot.py
+++ b/autogpt_platform/backend/backend/blocks/autopilot.py
@@ -0,0 +1,537 @@
+from __future__ import annotations
+
+import asyncio
+import contextvars
+import json
+import logging
+from typing import TYPE_CHECKING, Any
+
+from typing_extensions import TypedDict  # Needed for Python <3.12 compatibility
+
+from backend.blocks._base import (
+    Block,
+    BlockCategory,
+    BlockOutput,
+    BlockSchemaInput,
+    BlockSchemaOutput,
+)
+from backend.copilot.permissions import (
+    CopilotPermissions,
+    ToolName,
+    all_known_tool_names,
+    validate_block_identifiers,
+)
+from backend.data.model import SchemaField
+
+if TYPE_CHECKING:
+    from backend.data.execution import ExecutionContext
+
+logger = logging.getLogger(__name__)
+
+# Block ID shared between autopilot.py and copilot prompting.py.
+AUTOPILOT_BLOCK_ID = "c069dc6b-c3ed-4c12-b6e5-d47361e64ce6"
+
+
+class ToolCallEntry(TypedDict):
+    """A single tool invocation record from an autopilot execution."""
+
+    tool_call_id: str
+    tool_name: str
+    input: Any
+    output: Any | None
+    success: bool | None
+
+
+class TokenUsage(TypedDict):
+    """Aggregated token counts from the autopilot stream."""
+
+    prompt_tokens: int
+    completion_tokens: int
+    total_tokens: int
+
+
+class AutoPilotBlock(Block):
+    """Execute tasks using AutoGPT AutoPilot with full access to platform tools.
+
+    The autopilot can manage agents, access workspace files, fetch web content,
+    run blocks, and more. This block enables sub-agent patterns (autopilot calling
+    autopilot) and scheduled autopilot execution via the agent executor.
+    """
+
+    class Input(BlockSchemaInput):
+        """Input schema for the AutoPilot block."""
+
+        prompt: str = SchemaField(
+            description=(
+                "The task or instruction for the autopilot to execute. "
+                "The autopilot has access to platform tools like agent management, "
+                "workspace files, web fetch, block execution, and more."
+            ),
+            placeholder="Find my agents and list them",
+            advanced=False,
+        )
+
+        system_context: str = SchemaField(
+            description=(
+                "Optional additional context prepended to the prompt. "
+                "Use this to constrain autopilot behavior, provide domain "
+                "context, or set output format requirements."
+            ),
+            default="",
+            advanced=True,
+        )
+
+        session_id: str = SchemaField(
+            description=(
+                "Session ID to continue an existing autopilot conversation. "
+                "Leave empty to start a new session. "
+                "Use the session_id output from a previous run to continue."
+            ),
+            default="",
+            advanced=True,
+        )
+
+        max_recursion_depth: int = SchemaField(
+            description=(
+                "Maximum nesting depth when the autopilot calls this block "
+                "recursively (sub-agent pattern). Prevents infinite loops."
+            ),
+            default=3,
+            ge=1,
+            le=10,
+            advanced=True,
+        )
+
+        tools: list[ToolName] = SchemaField(
+            description=(
+                "Tool names to filter. Works with tools_exclude to form an "
+                "allow-list or deny-list. "
+                "Leave empty to apply no tool filter."
+            ),
+            default=[],
+            advanced=True,
+        )
+
+        tools_exclude: bool = SchemaField(
+            description=(
+                "Controls how the 'tools' list is interpreted. "
+                "True (default): 'tools' is a deny-list — listed tools are blocked, "
+                "all others are allowed. An empty 'tools' list means allow everything. "
+                "False: 'tools' is an allow-list — only listed tools are permitted."
+            ),
+            default=True,
+            advanced=True,
+        )
+
+        blocks: list[str] = SchemaField(
+            description=(
+                "Block identifiers to filter when the copilot uses run_block. "
+                "Each entry can be: a block name (e.g. 'HTTP Request'), "
+                "a full block UUID, or the first 8 hex characters of the UUID "
+                "(e.g. 'c069dc6b'). Works with blocks_exclude. "
+                "Leave empty to apply no block filter."
+            ),
+            default=[],
+            advanced=True,
+        )
+
+        blocks_exclude: bool = SchemaField(
+            description=(
+                "Controls how the 'blocks' list is interpreted. "
+                "True (default): 'blocks' is a deny-list — listed blocks are blocked, "
+                "all others are allowed. An empty 'blocks' list means allow everything. "
+                "False: 'blocks' is an allow-list — only listed blocks are permitted."
+            ),
+            default=True,
+            advanced=True,
+        )
+
+        dry_run: bool = SchemaField(
+            description=(
+                "When enabled, run_block and run_agent tool calls in this "
+                "autopilot session are forced to use dry-run simulation mode. "
+                "No real API calls, side effects, or credits are consumed "
+                "by those tools. Useful for testing agent wiring and "
+                "previewing outputs. "
+                "Only applies when creating a new session (session_id is empty). "
+                "When reusing an existing session_id, the session's original "
+                "dry_run setting is preserved."
+            ),
+            default=False,
+            advanced=True,
+        )
+
+        # timeout_seconds removed: the SDK manages its own heartbeat-based
+        # timeouts internally; wrapping with asyncio.timeout corrupts the
+        # SDK's internal stream (see service.py CRITICAL comment).
+
+    class Output(BlockSchemaOutput):
+        """Output schema for the AutoPilot block."""
+
+        response: str = SchemaField(
+            description="The final text response from the autopilot."
+        )
+        tool_calls: list[ToolCallEntry] = SchemaField(
+            description=(
+                "List of tools called during execution. Each entry has "
+                "tool_call_id, tool_name, input, output, and success fields."
+            ),
+        )
+        conversation_history: str = SchemaField(
+            description=(
+                "Current turn messages (user prompt + assistant reply) as JSON. "
+                "It can be used for logging or analysis."
+            ),
+        )
+        session_id: str = SchemaField(
+            description=(
+                "Session ID for this conversation. "
+                "Pass this back to continue the conversation in a future run."
+            ),
+        )
+        token_usage: TokenUsage = SchemaField(
+            description=(
+                "Token usage statistics: prompt_tokens, "
+                "completion_tokens, total_tokens."
+            ),
+        )
+
+    def __init__(self):
+        super().__init__(
+            id=AUTOPILOT_BLOCK_ID,
+            description=(
+                "Execute tasks using AutoGPT AutoPilot with full access to "
+                "platform tools (agent management, workspace files, web fetch, "
+                "block execution, and more). Enables sub-agent patterns and "
+                "scheduled autopilot execution."
+            ),
+            categories={BlockCategory.AI, BlockCategory.AGENT},
+            input_schema=AutoPilotBlock.Input,
+            output_schema=AutoPilotBlock.Output,
+            test_input={
+                "prompt": "List my agents",
+                "system_context": "",
+                "session_id": "",
+                "max_recursion_depth": 3,
+            },
+            test_output=[
+                ("response", "You have 2 agents: Agent A and Agent B."),
+                ("tool_calls", []),
+                (
+                    "conversation_history",
+                    '[{"role": "user", "content": "List my agents"}]',
+                ),
+                ("session_id", "test-session-id"),
+                (
+                    "token_usage",
+                    {
+                        "prompt_tokens": 100,
+                        "completion_tokens": 50,
+                        "total_tokens": 150,
+                    },
+                ),
+            ],
+            test_mock={
+                "create_session": lambda *args, **kwargs: "test-session-id",
+                "execute_copilot": lambda *args, **kwargs: (
+                    "You have 2 agents: Agent A and Agent B.",
+                    [],
+                    '[{"role": "user", "content": "List my agents"}]',
+                    "test-session-id",
+                    {
+                        "prompt_tokens": 100,
+                        "completion_tokens": 50,
+                        "total_tokens": 150,
+                    },
+                ),
+            },
+        )
+
+    async def create_session(self, user_id: str, *, dry_run: bool) -> str:
+        """Create a new chat session and return its ID (mockable for tests)."""
+        from backend.copilot.model import create_chat_session  # avoid circular import
+
+        session = await create_chat_session(user_id, dry_run=dry_run)
+        return session.session_id
+
+    async def execute_copilot(
+        self,
+        prompt: str,
+        system_context: str,
+        session_id: str,
+        max_recursion_depth: int,
+        user_id: str,
+        permissions: CopilotPermissions | None = None,
+    ) -> tuple[str, list[ToolCallEntry], str, str, TokenUsage]:
+        """Invoke the copilot and collect all stream results.
+
+        Delegates to :func:`collect_copilot_response` — the shared helper that
+        consumes ``stream_chat_completion_sdk`` without wrapping it in an
+        ``asyncio.timeout`` (the SDK manages its own heartbeat-based timeouts).
+
+        Args:
+            prompt: The user task/instruction.
+            system_context: Optional context prepended to the prompt.
+            session_id: Chat session to use.
+            max_recursion_depth: Maximum allowed recursion nesting.
+            user_id: Authenticated user ID.
+            permissions: Optional capability filter restricting tools/blocks.
+
+        Returns:
+            A tuple of (response_text, tool_calls, history_json, session_id, usage).
+        """
+        from backend.copilot.sdk.collect import (
+            collect_copilot_response,  # avoid circular import
+        )
+
+        tokens = _check_recursion(max_recursion_depth)
+        perm_token = None
+        try:
+            effective_permissions, perm_token = _merge_inherited_permissions(
+                permissions
+            )
+            effective_prompt = prompt
+            if system_context:
+                effective_prompt = f"[System Context: {system_context}]\n\n{prompt}"
+
+            result = await collect_copilot_response(
+                session_id=session_id,
+                message=effective_prompt,
+                user_id=user_id,
+                permissions=effective_permissions,
+            )
+
+            # Build a lightweight conversation summary from streamed data.
+            turn_messages: list[dict[str, Any]] = [
+                {"role": "user", "content": effective_prompt},
+            ]
+            if result.tool_calls:
+                turn_messages.append(
+                    {
+                        "role": "assistant",
+                        "content": result.response_text,
+                        "tool_calls": result.tool_calls,
+                    }
+                )
+            else:
+                turn_messages.append(
+                    {"role": "assistant", "content": result.response_text}
+                )
+            history_json = json.dumps(turn_messages, default=str)
+
+            tool_calls: list[ToolCallEntry] = [
+                {
+                    "tool_call_id": tc["tool_call_id"],
+                    "tool_name": tc["tool_name"],
+                    "input": tc["input"],
+                    "output": tc["output"],
+                    "success": tc["success"],
+                }
+                for tc in result.tool_calls
+            ]
+
+            usage: TokenUsage = {
+                "prompt_tokens": result.prompt_tokens,
+                "completion_tokens": result.completion_tokens,
+                "total_tokens": result.total_tokens,
+            }
+
+            return (
+                result.response_text,
+                tool_calls,
+                history_json,
+                session_id,
+                usage,
+            )
+        finally:
+            _reset_recursion(tokens)
+            if perm_token is not None:
+                _inherited_permissions.reset(perm_token)
+
+    async def run(
+        self,
+        input_data: Input,
+        *,
+        execution_context: ExecutionContext,
+        **kwargs,
+    ) -> BlockOutput:
+        """Validate inputs, invoke the autopilot, and yield structured outputs.
+
+        Yields session_id even on failure so callers can inspect/resume the session.
+        """
+        if not input_data.prompt.strip():
+            yield "error", "Prompt cannot be empty."
+            return
+
+        if not execution_context.user_id:
+            yield "error", "Cannot run autopilot without an authenticated user."
+            return
+
+        if input_data.max_recursion_depth < 1:
+            yield "error", "max_recursion_depth must be at least 1."
+            return
+
+        # Validate and build permissions eagerly — fail before creating a session.
+        permissions = await _build_and_validate_permissions(input_data)
+        if isinstance(permissions, str):
+            # Validation error returned as a string message.
+            yield "error", permissions
+            return
+
+        # Create the session eagerly so the caller always receives a session_id,
+        # even if the copilot stream fails later (avoids orphaned sessions).
+        sid = input_data.session_id
+        if not sid:
+            sid = await self.create_session(
+                execution_context.user_id, dry_run=input_data.dry_run
+            )
+
+        # NOTE: No asyncio.timeout() here — the SDK manages its own
+        # heartbeat-based timeouts internally.  Wrapping with asyncio.timeout
+        # would cancel the task mid-flight, corrupting the SDK's internal
+        # anyio memory stream (see service.py CRITICAL comment).
+        try:
+            response, tool_calls, history, _, usage = await self.execute_copilot(
+                prompt=input_data.prompt,
+                system_context=input_data.system_context,
+                session_id=sid,
+                max_recursion_depth=input_data.max_recursion_depth,
+                user_id=execution_context.user_id,
+                permissions=permissions,
+            )
+
+            yield "response", response
+            yield "tool_calls", tool_calls
+            yield "conversation_history", history
+            yield "session_id", sid
+            yield "token_usage", usage
+        except asyncio.CancelledError:
+            yield "session_id", sid
+            yield "error", "AutoPilot execution was cancelled."
+            raise
+        except Exception as exc:
+            yield "session_id", sid
+            yield "error", str(exc)
+
+
+# ---------------------------------------------------------------------------
+# Helpers – placed after the block class for top-down readability.
+# ---------------------------------------------------------------------------
+
+# Task-scoped recursion depth counter & chain-wide limit.
+# contextvars are scoped to the current asyncio task, so concurrent
+# graph executions each get independent counters.
+_autopilot_recursion_depth: contextvars.ContextVar[int] = contextvars.ContextVar(
+    "_autopilot_recursion_depth", default=0
+)
+_autopilot_recursion_limit: contextvars.ContextVar[int | None] = contextvars.ContextVar(
+    "_autopilot_recursion_limit", default=None
+)
+
+
+def _check_recursion(
+    max_depth: int,
+) -> tuple[contextvars.Token[int], contextvars.Token[int | None]]:
+    """Check and increment recursion depth.
+
+    Returns ContextVar tokens that must be passed to ``_reset_recursion``
+    when the caller exits to restore the previous depth.
+
+    Raises:
+        RuntimeError: If the current depth already meets or exceeds the limit.
+    """
+    current = _autopilot_recursion_depth.get()
+    inherited = _autopilot_recursion_limit.get()
+    limit = max_depth if inherited is None else min(inherited, max_depth)
+    if current >= limit:
+        raise RuntimeError(
+            f"AutoPilot recursion depth limit reached ({limit}). "
+            "The autopilot has called itself too many times."
+        )
+    return (
+        _autopilot_recursion_depth.set(current + 1),
+        _autopilot_recursion_limit.set(limit),
+    )
+
+
+def _reset_recursion(
+    tokens: tuple[contextvars.Token[int], contextvars.Token[int | None]],
+) -> None:
+    """Restore recursion depth and limit to their previous values."""
+    _autopilot_recursion_depth.reset(tokens[0])
+    _autopilot_recursion_limit.reset(tokens[1])
+
+
+# ---------------------------------------------------------------------------
+# Permission helpers
+# ---------------------------------------------------------------------------
+
+# Inherited permissions from a parent AutoPilotBlock execution.
+# This acts as a ceiling: child executions can only be more restrictive.
+_inherited_permissions: contextvars.ContextVar["CopilotPermissions | None"] = (
+    contextvars.ContextVar("_inherited_permissions", default=None)
+)
+
+
+async def _build_and_validate_permissions(
+    input_data: "AutoPilotBlock.Input",
+) -> "CopilotPermissions | str":
+    """Build a :class:`CopilotPermissions` from block input and validate it.
+
+    Returns a :class:`CopilotPermissions` on success or a human-readable
+    error string if validation fails.
+    """
+    # Tool names are validated by Pydantic via the ToolName Literal type
+    # at model construction time, so no runtime check is needed here.
+    # Block identifiers are validated against the live block registry.
+    if input_data.blocks:
+        invalid_blocks = await validate_block_identifiers(input_data.blocks)
+        if invalid_blocks:
+            return (
+                f"Unknown block identifier(s) in 'blocks': {invalid_blocks}. "
+                "Use find_block to discover valid block names and IDs. "
+                "You may also use the first 8 characters of a block UUID."
+            )
+
+    return CopilotPermissions(
+        tools=list(input_data.tools),
+        tools_exclude=input_data.tools_exclude,
+        blocks=input_data.blocks,
+        blocks_exclude=input_data.blocks_exclude,
+    )
+
+
+def _merge_inherited_permissions(
+    permissions: "CopilotPermissions | None",
+) -> "tuple[CopilotPermissions | None, contextvars.Token[CopilotPermissions | None] | None]":
+    """Merge *permissions* with any inherited parent permissions.
+
+    The merged result is stored back into the contextvar so that any nested
+    AutoPilotBlock invocation (sub-agent) inherits the merged ceiling.
+
+    Returns a tuple of (merged_permissions, reset_token).  The caller MUST
+    reset the contextvar via ``_inherited_permissions.reset(token)`` in a
+    ``finally`` block when ``reset_token`` is not None — this prevents
+    permission leakage between sequential independent executions in the same
+    asyncio task.
+    """
+    parent = _inherited_permissions.get()
+
+    if permissions is None and parent is None:
+        return None, None
+
+    all_tools = all_known_tool_names()
+
+    if permissions is None:
+        permissions = CopilotPermissions()  # allow-all; will be narrowed by parent
+
+    merged = (
+        permissions.merged_with_parent(parent, all_tools)
+        if parent is not None
+        else permissions
+    )
+
+    # Store merged permissions as the new inherited ceiling for nested calls.
+    # Return the token so the caller can restore the previous value in finally.
+    token = _inherited_permissions.set(merged)
+    return merged, token
--- a/autogpt_platform/backend/backend/blocks/autopilot_permissions_test.py
+++ b/autogpt_platform/backend/backend/blocks/autopilot_permissions_test.py
@@ -0,0 +1,265 @@
+"""Tests for AutoPilotBlock permission fields and validation."""
+
+from __future__ import annotations
+
+from unittest.mock import AsyncMock, MagicMock, patch
+
+import pytest
+from pydantic import ValidationError
+
+from backend.blocks.autopilot import (
+    AutoPilotBlock,
+    _build_and_validate_permissions,
+    _inherited_permissions,
+    _merge_inherited_permissions,
+)
+from backend.copilot.permissions import CopilotPermissions, all_known_tool_names
+from backend.data.execution import ExecutionContext
+
+# ---------------------------------------------------------------------------
+# Helpers
+# ---------------------------------------------------------------------------
+
+
+def _make_input(**kwargs) -> AutoPilotBlock.Input:
+    defaults = {
+        "prompt": "Do something",
+        "system_context": "",
+        "session_id": "",
+        "max_recursion_depth": 3,
+        "tools": [],
+        "tools_exclude": True,
+        "blocks": [],
+        "blocks_exclude": True,
+    }
+    defaults.update(kwargs)
+    return AutoPilotBlock.Input(**defaults)
+
+
+# ---------------------------------------------------------------------------
+# _build_and_validate_permissions
+# ---------------------------------------------------------------------------
+
+
+@pytest.mark.asyncio
+class TestBuildAndValidatePermissions:
+    async def test_empty_inputs_returns_empty_permissions(self):
+        inp = _make_input()
+        result = await _build_and_validate_permissions(inp)
+        assert isinstance(result, CopilotPermissions)
+        assert result.is_empty()
+
+    async def test_valid_tool_names_accepted(self):
+        inp = _make_input(tools=["run_block", "web_fetch"], tools_exclude=True)
+        result = await _build_and_validate_permissions(inp)
+        assert isinstance(result, CopilotPermissions)
+        assert result.tools == ["run_block", "web_fetch"]
+        assert result.tools_exclude is True
+
+    async def test_invalid_tool_rejected_by_pydantic(self):
+        """Invalid tool names are now caught at Pydantic validation time
+        (Literal type), before ``_build_and_validate_permissions`` is called."""
+        with pytest.raises(ValidationError, match="not_a_real_tool"):
+            _make_input(tools=["not_a_real_tool"])
+
+    async def test_valid_block_name_accepted(self):
+        mock_block_cls = MagicMock()
+        mock_block_cls.return_value.name = "HTTP Request"
+        with patch(
+            "backend.blocks.get_blocks",
+            return_value={"c069dc6b-c3ed-4c12-b6e5-d47361e64ce6": mock_block_cls},
+        ):
+            inp = _make_input(blocks=["HTTP Request"], blocks_exclude=True)
+            result = await _build_and_validate_permissions(inp)
+        assert isinstance(result, CopilotPermissions)
+        assert result.blocks == ["HTTP Request"]
+
+    async def test_valid_partial_uuid_accepted(self):
+        mock_block_cls = MagicMock()
+        mock_block_cls.return_value.name = "HTTP Request"
+        with patch(
+            "backend.blocks.get_blocks",
+            return_value={"c069dc6b-c3ed-4c12-b6e5-d47361e64ce6": mock_block_cls},
+        ):
+            inp = _make_input(blocks=["c069dc6b"], blocks_exclude=False)
+            result = await _build_and_validate_permissions(inp)
+        assert isinstance(result, CopilotPermissions)
+
+    async def test_invalid_block_identifier_returns_error(self):
+        mock_block_cls = MagicMock()
+        mock_block_cls.return_value.name = "HTTP Request"
+        with patch(
+            "backend.blocks.get_blocks",
+            return_value={"c069dc6b-c3ed-4c12-b6e5-d47361e64ce6": mock_block_cls},
+        ):
+            inp = _make_input(blocks=["totally_fake_block"])
+            result = await _build_and_validate_permissions(inp)
+        assert isinstance(result, str)
+        assert "totally_fake_block" in result
+        assert "Unknown block identifier" in result
+
+    async def test_sdk_builtin_tool_names_accepted(self):
+        inp = _make_input(tools=["Read", "Task", "WebSearch"], tools_exclude=False)
+        result = await _build_and_validate_permissions(inp)
+        assert isinstance(result, CopilotPermissions)
+        assert not result.tools_exclude
+
+    async def test_empty_blocks_skips_validation(self):
+        # Should not call validate_block_identifiers at all when blocks=[].
+        with patch(
+            "backend.copilot.permissions.validate_block_identifiers"
+        ) as mock_validate:
+            inp = _make_input(blocks=[])
+            await _build_and_validate_permissions(inp)
+            mock_validate.assert_not_called()
+
+
+# ---------------------------------------------------------------------------
+# _merge_inherited_permissions
+# ---------------------------------------------------------------------------
+
+
+class TestMergeInheritedPermissions:
+    def test_no_permissions_no_parent_returns_none(self):
+        merged, token = _merge_inherited_permissions(None)
+        assert merged is None
+        assert token is None
+
+    def test_permissions_no_parent_returned_unchanged(self):
+        perms = CopilotPermissions(tools=["bash_exec"], tools_exclude=True)
+        merged, token = _merge_inherited_permissions(perms)
+        try:
+            assert merged is perms
+            assert token is not None
+        finally:
+            if token is not None:
+                _inherited_permissions.reset(token)
+
+    def test_child_narrows_parent(self):
+        parent = CopilotPermissions(tools=["bash_exec"], tools_exclude=True)
+        # Set parent as inherited
+        outer_token = _inherited_permissions.set(parent)
+        try:
+            child = CopilotPermissions(tools=["web_fetch"], tools_exclude=True)
+            merged, inner_token = _merge_inherited_permissions(child)
+            try:
+                assert merged is not None
+                all_t = all_known_tool_names()
+                effective = merged.effective_allowed_tools(all_t)
+                assert "bash_exec" not in effective
+                assert "web_fetch" not in effective
+            finally:
+                if inner_token is not None:
+                    _inherited_permissions.reset(inner_token)
+        finally:
+            _inherited_permissions.reset(outer_token)
+
+    def test_none_permissions_with_parent_uses_parent(self):
+        parent = CopilotPermissions(tools=["bash_exec"], tools_exclude=True)
+        outer_token = _inherited_permissions.set(parent)
+        try:
+            merged, inner_token = _merge_inherited_permissions(None)
+            try:
+                assert merged is not None
+                # Merged should have parent's restrictions
+                effective = merged.effective_allowed_tools(all_known_tool_names())
+                assert "bash_exec" not in effective
+            finally:
+                if inner_token is not None:
+                    _inherited_permissions.reset(inner_token)
+        finally:
+            _inherited_permissions.reset(outer_token)
+
+    def test_child_cannot_expand_parent_whitelist(self):
+        parent = CopilotPermissions(tools=["run_block"], tools_exclude=False)
+        outer_token = _inherited_permissions.set(parent)
+        try:
+            # Child tries to allow more tools
+            child = CopilotPermissions(
+                tools=["run_block", "bash_exec"], tools_exclude=False
+            )
+            merged, inner_token = _merge_inherited_permissions(child)
+            try:
+                assert merged is not None
+                effective = merged.effective_allowed_tools(all_known_tool_names())
+                assert "bash_exec" not in effective
+                assert "run_block" in effective
+            finally:
+                if inner_token is not None:
+                    _inherited_permissions.reset(inner_token)
+        finally:
+            _inherited_permissions.reset(outer_token)
+
+
+# ---------------------------------------------------------------------------
+# AutoPilotBlock.run — validation integration
+# ---------------------------------------------------------------------------
+
+
+@pytest.mark.asyncio
+class TestAutoPilotBlockRunPermissions:
+    async def _collect_outputs(self, block, input_data, user_id="test-user"):
+        """Helper to collect all yields from block.run()."""
+        ctx = ExecutionContext(
+            user_id=user_id,
+            graph_id="g1",
+            graph_exec_id="ge1",
+            node_exec_id="ne1",
+            node_id="n1",
+        )
+        outputs = {}
+        async for key, val in block.run(input_data, execution_context=ctx):
+            outputs[key] = val
+        return outputs
+
+    async def test_invalid_tool_rejected_by_pydantic(self):
+        """Invalid tool names are caught at Pydantic validation (Literal type)."""
+        with pytest.raises(ValidationError, match="not_a_tool"):
+            _make_input(tools=["not_a_tool"])
+
+    async def test_invalid_block_yields_error(self):
+        mock_block_cls = MagicMock()
+        mock_block_cls.return_value.name = "HTTP Request"
+        with patch(
+            "backend.blocks.get_blocks",
+            return_value={"c069dc6b-c3ed-4c12-b6e5-d47361e64ce6": mock_block_cls},
+        ):
+            block = AutoPilotBlock()
+            inp = _make_input(blocks=["nonexistent_block"])
+            outputs = await self._collect_outputs(block, inp)
+        assert "error" in outputs
+        assert "nonexistent_block" in outputs["error"]
+
+    async def test_empty_prompt_yields_error_before_permission_check(self):
+        block = AutoPilotBlock()
+        inp = _make_input(prompt="   ", tools=["run_block"])
+        outputs = await self._collect_outputs(block, inp)
+        assert "error" in outputs
+        assert "Prompt cannot be empty" in outputs["error"]
+
+    async def test_valid_permissions_passed_to_execute(self):
+        """Permissions are forwarded to execute_copilot when valid."""
+        block = AutoPilotBlock()
+        captured: dict = {}
+
+        async def fake_execute_copilot(self_inner, **kwargs):
+            captured["permissions"] = kwargs.get("permissions")
+            return (
+                "ok",
+                [],
+                '[{"role":"user","content":"hi"}]',
+                "test-sid",
+                {"prompt_tokens": 1, "completion_tokens": 1, "total_tokens": 2},
+            )
+
+        with patch.object(
+            AutoPilotBlock, "create_session", new=AsyncMock(return_value="test-sid")
+        ), patch.object(AutoPilotBlock, "execute_copilot", new=fake_execute_copilot):
+            inp = _make_input(tools=["run_block"], tools_exclude=False)
+            outputs = await self._collect_outputs(block, inp)
+
+        assert "error" not in outputs
+        perms = captured.get("permissions")
+        assert isinstance(perms, CopilotPermissions)
+        assert perms.tools == ["run_block"]
+        assert perms.tools_exclude is False
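
The `TestMergeInheritedPermissions` cases above all check one idea: the parent acts as a ceiling, so the merged tool set is never wider than the parent's. The following is a rough sketch of that idea using a simplified, hypothetical `Perms` model. It is not the real `CopilotPermissions` class, it covers tools only, and unlike the tests' `is_empty()` case it treats an empty whitelist as "nothing allowed". The merge here is assumed to be a plain intersection of the two effective sets; the actual `merged_with_parent` may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Perms:
    """Hypothetical, simplified stand-in for CopilotPermissions (tools only)."""

    tools: tuple = ()
    tools_exclude: bool = True  # True: `tools` is a blacklist; False: a whitelist

    def effective_allowed(self, all_tools: frozenset) -> frozenset:
        listed = frozenset(self.tools) & all_tools
        return all_tools - listed if self.tools_exclude else listed


def merge_with_parent(child: Perms, parent: Perms, all_tools: frozenset) -> Perms:
    """Intersect both effective sets and express the result as a whitelist."""
    allowed = child.effective_allowed(all_tools) & parent.effective_allowed(all_tools)
    return Perms(tools=tuple(sorted(allowed)), tools_exclude=False)


ALL_TOOLS = frozenset({"bash_exec", "run_block", "web_fetch"})

# Two blacklists combine: both excluded tools stay excluded.
narrowed = merge_with_parent(
    Perms(("web_fetch",), tools_exclude=True),
    Perms(("bash_exec",), tools_exclude=True),
    ALL_TOOLS,
)

# A child whitelist cannot add tools the parent whitelist does not allow.
capped = merge_with_parent(
    Perms(("run_block", "bash_exec"), tools_exclude=False),
    Perms(("run_block",), tools_exclude=False),
    ALL_TOOLS,
)
```

Under these assumptions both `narrowed` and `capped` allow only `run_block`, which mirrors the assertions in `test_child_narrows_parent` and `test_child_cannot_expand_parent_whitelist`.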
--- a/autogpt_platform/backend/backend/blocks/data_manipulation.py
+++ b/autogpt_platform/backend/backend/blocks/data_manipulation.py
@@ -472,7 +472,7 @@ class AddToListBlock(Block):

    async def run(self, input_data: Input, **kwargs) -> BlockOutput:
        entries_added = input_data.entries.copy()
-        if input_data.entry:
+        if input_data.entry is not None:
            entries_added.append(input_data.entry)

        updated_list = input_data.list.copy()
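
The `AddToListBlock` change above swaps a truthiness check for an explicit `is not None` check. A short sketch of the difference follows, using two hypothetical helper functions rather than the block itself: with the old check, legitimate falsy entries such as `0`, `""` or `False` are silently dropped.

```python
from typing import Any, List, Optional


def add_entry_truthy(entries: List[Any], entry: Optional[Any]) -> List[Any]:
    """Old check: skips every falsy entry, not only a missing one."""
    out = entries.copy()
    if entry:
        out.append(entry)
    return out


def add_entry_not_none(entries: List[Any], entry: Optional[Any]) -> List[Any]:
    """New check: skips the entry only when it is actually None."""
    out = entries.copy()
    if entry is not None:
        out.append(entry)
    return out
```

With these helpers, `add_entry_truthy([1], 0)` leaves the list unchanged while `add_entry_not_none([1], 0)` appends the zero; both leave the list unchanged for `None`.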
--- a/autogpt_platform/backend/backend/blocks/discord/bot_blocks.py
+++ b/autogpt_platform/backend/backend/blocks/discord/bot_blocks.py
@@ -73,7 +73,7 @@ class ReadDiscordMessagesBlock(Block):
            id="df06086a-d5ac-4abb-9996-2ad0acb2eff7",
            input_schema=ReadDiscordMessagesBlock.Input,  # Assign input schema
            output_schema=ReadDiscordMessagesBlock.Output,  # Assign output schema
-            description="Reads messages from a Discord channel using a bot token.",
+            description="Reads new messages from a Discord channel using a bot token and triggers when a new message is posted.",
            categories={BlockCategory.SOCIAL},
            test_input={
                "continuous_read": False,