diff --git a/.claude/skills/orchestrate/SKILL.md b/.claude/skills/orchestrate/SKILL.md
new file mode 100644
index 0000000000..eb82da0395
--- /dev/null
+++ b/.claude/skills/orchestrate/SKILL.md
@@ -0,0 +1,509 @@
+---
+name: orchestrate
+description: "Meta-agent supervisor that manages a fleet of Claude Code agents running in tmux windows. Auto-discovers spare worktrees, spawns agents, monitors state, kicks idle agents, approves safe confirmations, and recycles worktrees when done. TRIGGER when user asks to supervise agents, run parallel tasks, manage worktrees, check agent status, or orchestrate parallel work."
+user-invocable: true
+argument-hint: "any free text — e.g. 'start 3 agents on X Y Z', 'show status', 'add task: implement feature A', 'stop', 'how many are free?'"
+metadata:
+  author: autogpt-team
+  version: "6.0.0"
+---
+
+# Orchestrate — Agent Fleet Supervisor
+
+One tmux session, N windows — each window is one agent working in its own worktree. Speak naturally; Claude maps your intent to the right scripts.
+
+## Scripts
+
+```bash
+SKILLS_DIR=$(git rev-parse --show-toplevel)/.claude/skills/orchestrate/scripts
+STATE_FILE=~/.claude/orchestrator-state.json
+```
+
+| Script | Purpose |
+|---|---|
+| `find-spare.sh [REPO_ROOT]` | List free worktrees — one `PATH BRANCH` per line |
+| `spawn-agent.sh SESSION PATH SPARE NEW_BRANCH OBJECTIVE [PR_NUMBER] [STEPS...]` | Create window + checkout branch + launch claude + send task. **Stdout: `SESSION:WIN` only** |
+| `recycle-agent.sh WINDOW PATH SPARE_BRANCH` | Kill window + restore spare branch |
+| `run-loop.sh` | **Mechanical babysitter** — idle restart + dialog approval; flags `ORCHESTRATOR:DONE` windows as `pending_evaluation` for the orchestrator |
+| `verify-complete.sh WINDOW` | Verify PR is done: checkpoints ✓ + 0 unresolved threads + CI green. Repo auto-derived from state file `.repo` or git remote. |
+| `notify.sh MESSAGE` | Send notification via Discord webhook (env `DISCORD_WEBHOOK_URL` or state `.discord_webhook`), macOS notification center, and stdout |
+| `capacity.sh [REPO_ROOT]` | Print available + in-use worktrees |
+| `status.sh` | Print fleet status + live pane commands |
+| `poll-cycle.sh` | One monitoring cycle — classifies panes, tracks checkpoints, returns JSON action array |
+| `classify-pane.sh WINDOW` | Classify one pane state |
+
+## Supervision model
+
+```
+Orchestrating Claude (this Claude session — IS the supervisor)
+  └── Reads pane output, checks CI, intervenes with targeted guidance
+run-loop.sh (separate tmux window, every 30s)
+  └── Mechanical only: idle restart, dialog approval, flag ORCHESTRATOR:DONE for evaluation
+```
+
+**You (the orchestrating Claude)** are the supervisor. After spawning agents, stay in this conversation and actively monitor: poll each agent's pane every 2-3 minutes, check CI, nudge stalled agents, and verify completions. Do not spawn a separate supervisor Claude window — it loses context, is hard to observe, and compounds context compression problems.
+
+**run-loop.sh** is the mechanical layer — zero tokens, handles things that need no judgment: restart crashed agents, press Enter on safe dialogs, and flag completed windows as `pending_evaluation` for your review.
+
+## Checkpoint protocol
+
+Agents output checkpoints as they complete each required step:
+
+```
+CHECKPOINT:<step-name>
+```
+
+Required steps are passed as args to `spawn-agent.sh` (e.g. `pr-address pr-test`). `verify-complete.sh` will not pass until all required checkpoints are found in the pane output. If `verify-complete.sh` fails, re-brief the agent with the specific failure reason.
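+
+To check which checkpoints an agent has already emitted, grep its pane; this mirrors the exact pattern `poll-cycle.sh` uses (the window address is illustrative):
+
+```bash
+tmux capture-pane -t autogpt1:3 -p -S -500 \
+  | grep -oE "CHECKPOINT:[a-zA-Z0-9_-]+" | sed 's/CHECKPOINT://' | sort -u
+# prints one completed step per line, e.g.
+#   pr-address
+#   pr-test
+```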
+
+## Worktree lifecycle
+
+```text
+spare/N branch → spawn-agent.sh (--session-id UUID) → window + feat/branch + claude running
+    ↓
+CHECKPOINT:<step> (as steps complete)
+    ↓
+ORCHESTRATOR:DONE
+    ↓
+verify-complete.sh: checkpoints ✓ + 0 threads + CI green
+    ↓
+state → "done" (after your evaluation), notify, window KEPT OPEN
+    ↓
+user/orchestrator explicitly requests recycle
+    ↓
+recycle-agent.sh → spare/N (free again)
+```
+
+**Windows are never auto-killed.** The worktree stays on its branch, the session stays alive. The agent is done working but the window, git state, and Claude session are all preserved until you choose to recycle.
+
+**To resume a done or crashed session:**
+```bash
+# Resume by stored session ID (preferred — exact session, full context)
+claude --resume SESSION_ID --permission-mode bypassPermissions
+
+# Or resume most recent session in that worktree directory
+cd /path/to/worktree && claude --continue --permission-mode bypassPermissions
+```
+
+**To manually recycle when ready:**
+```bash
+bash ~/.claude/orchestrator/scripts/recycle-agent.sh SESSION:WIN WORKTREE_PATH spare/N
+# Then update state:
+jq --arg w "SESSION:WIN" '.agents |= map(if .window == $w then .state = "recycled" else . end)' \
+  ~/.claude/orchestrator-state.json > /tmp/orch.tmp && mv /tmp/orch.tmp ~/.claude/orchestrator-state.json
+```
+
+## State file (`~/.claude/orchestrator-state.json`)
+
+Never committed to git. You maintain this file directly using `jq` + atomic writes (`.tmp` → `mv`).
+
+```json
+{
+  "active": true,
+  "tmux_session": "autogpt1",
+  "idle_threshold_seconds": 300,
+  "loop_window": "autogpt1:5",
+  "repo": "Significant-Gravitas/AutoGPT",
+  "discord_webhook": "https://discord.com/api/webhooks/...",
+  "last_poll_at": 0,
+  "agents": [
+    {
+      "window": "autogpt1:3",
+      "worktree": "AutoGPT6",
+      "worktree_path": "/path/to/AutoGPT6",
+      "spare_branch": "spare/6",
+      "branch": "feat/my-feature",
+      "objective": "Implement X and open a PR",
+      "pr_number": "12345",
+      "session_id": "550e8400-e29b-41d4-a716-446655440000",
+      "steps": ["pr-address", "pr-test"],
+      "checkpoints": ["pr-address"],
+      "state": "running",
+      "last_output_hash": "",
+      "last_seen_at": 0,
+      "spawned_at": 0,
+      "idle_since": 0,
+      "revision_count": 0,
+      "last_rebriefed_at": 0
+    }
+  ]
+}
+```
+
+Top-level optional fields:
+- `repo` — GitHub `owner/repo` for CI/thread checks. Auto-derived from git remote if omitted.
+- `discord_webhook` — Discord webhook URL for completion notifications. Also reads `DISCORD_WEBHOOK_URL` env var.
+
+Per-agent fields:
+- `session_id` — UUID passed to `claude --session-id` at spawn; use with `claude --resume UUID` to restore exact session context after a crash or window close.
+- `last_rebriefed_at` — Unix timestamp of last re-brief; enforces 5-min cooldown to prevent spam.
+
+Agent states: `running` | `idle` | `stuck` | `waiting_approval` | `complete` | `pending_evaluation` | `done` | `escalated`
+
+`done` means verified complete — window is still open, session still alive, worktree still on task branch. Not recycled yet.
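+
+Every mutation follows the same jq-then-rename shape; a small helper keeps that pattern honest (a sketch; the window and state values are illustrative):
+
+```bash
+set_agent_state() {
+  jq --arg w "$1" --arg s "$2" '(.agents[] | select(.window == $w)).state = $s' \
+    ~/.claude/orchestrator-state.json > /tmp/orch.tmp && mv /tmp/orch.tmp ~/.claude/orchestrator-state.json
+}
+set_agent_state "autogpt1:3" "running"
+```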
+
+## Serial /pr-test rule
+
+`/pr-test` and `/pr-test --fix` run local Docker + integration tests that use shared ports, a shared database, and shared build caches. **Running two `/pr-test` jobs simultaneously will cause port conflicts and database corruption.**
+
+**Rule: only one `/pr-test` runs at a time. The orchestrator serializes them.**
+
+You (the orchestrating Claude) own the test queue:
+1. Agents do `pr-review` and `pr-address` in parallel — that's safe (they only push code and reply to GitHub).
+2. When a PR needs local testing, add it to your mental queue — don't give agents a `pr-test` step.
+3. Run `/pr-test https://github.com/OWNER/REPO/pull/PR_NUMBER --fix` yourself, sequentially.
+4. Feed results back to the relevant agent via `tmux send-keys`:
+   ```bash
+   tmux send-keys -t SESSION:WIN "Local tests for PR #N: <results>. Fix any failures and push, then output ORCHESTRATOR:DONE."
+   sleep 0.3
+   tmux send-keys -t SESSION:WIN Enter
+   ```
+5. Wait for CI to confirm green before marking the agent done.
+
+If multiple PRs need testing at the same time, pick the one furthest along (fewest pending CI checks) and test it first. Only start the next test after the previous one completes.
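+
+One way to pick the next PR to test: count pending checks per candidate and take the smallest. A sketch using the same `gh`/`jq` calls as `verify-complete.sh` (PR numbers are illustrative):
+
+```bash
+for pr in 12636 12699; do
+  pending=$(gh pr checks "$pr" --repo Significant-Gravitas/AutoGPT --json bucket \
+    | jq '[.[] | select(.bucket == "pending")] | length')
+  echo "$pending $pr"
+done | sort -n | head -1   # test the PR on the first line next
+```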
+
+## Session restore (tested and confirmed)
+
+Agent sessions are saved to disk. To restore a closed or crashed session:
+
+```bash
+# If session_id is in state (preferred):
+NEW_WIN=$(tmux new-window -t SESSION -n WORKTREE_NAME -P -F '#{window_index}')
+tmux send-keys -t "SESSION:${NEW_WIN}" "cd /path/to/worktree && claude --resume SESSION_ID --permission-mode bypassPermissions" Enter
+
+# If no session_id (use --continue for most recent session in that directory):
+tmux send-keys -t "SESSION:${NEW_WIN}" "cd /path/to/worktree && claude --continue --permission-mode bypassPermissions" Enter
+```
+
+`--continue` restores the full conversation history including all tool calls, file edits, and context. The agent resumes exactly where it left off. After restoring, update the window address in the state file:
+
+```bash
+jq --arg old "SESSION:OLD_WIN" --arg new "SESSION:NEW_WIN" \
+  '(.agents[] | select(.window == $old)).window = $new' \
+  ~/.claude/orchestrator-state.json > /tmp/orch.tmp && mv /tmp/orch.tmp ~/.claude/orchestrator-state.json
+```
+
+## Intent → action mapping
+
+Match the user's message to one of these intents:
+
+| The user says something like… | What to do |
+|---|---|
+| "status", "what's running", "show agents" | Run `status.sh` + `capacity.sh`, show output |
+| "how many free", "capacity", "available worktrees" | Run `capacity.sh`, show output |
+| "start N agents on X, Y, Z" or "run these tasks: …" | See **Spawning agents** below |
+| "add task: …", "add one more agent for …" | See **Adding an agent** below |
+| "stop", "shut down", "pause the fleet" | See **When to stop the fleet** below |
+| "poll", "check now", "run a cycle" | Run `poll-cycle.sh`, process actions |
+| "recycle window X", "free up autogpt3" | Run `recycle-agent.sh` directly |
+
+When the intent is ambiguous, show capacity first and ask what tasks to run.
+
+## Spawning agents
+
+### 1. Resolve tmux session
+
+```bash
+tmux list-sessions -F "#{session_name}: #{session_windows} windows" 2>/dev/null
+```
+
+Use an existing session. **Never create a tmux session from within Claude** — it becomes a child of Claude's process and dies when the session ends. If no session exists, tell the user to run `tmux new-session -d -s autogpt1` in their terminal first, then re-invoke `/orchestrate`.
+
+### 2. Show available capacity
+
+```bash
+bash $SKILLS_DIR/capacity.sh $(git rev-parse --show-toplevel)
+```
+
+### 3. Collect tasks from the user
+
+For each task, gather:
+- **objective** — what to do (e.g. "implement feature X and open a PR")
+- **branch name** — e.g. `feat/my-feature` (derive from objective if not given)
+- **pr_number** — GitHub PR number if working on an existing PR (for verification)
+- **steps** — required checkpoint names in order (e.g. `pr-address pr-test`) — derive from objective
+
+Ask for `idle_threshold_seconds` only if the user mentions it (default: 300).
+
+Never ask the user to specify a worktree — auto-assign from `find-spare.sh`.
+
+### 4. Spawn one agent per task
+
+```bash
+# Get ordered list of spare worktrees
+SPARE_LIST=$(bash $SKILLS_DIR/find-spare.sh $(git rev-parse --show-toplevel))
+
+# For each task, take the next spare line:
+WORKTREE_PATH=$(echo "$SPARE_LINE" | awk '{print $1}')
+SPARE_BRANCH=$(echo "$SPARE_LINE" | awk '{print $2}')
+
+# With PR number and required steps:
+WINDOW=$(bash $SKILLS_DIR/spawn-agent.sh "$SESSION" "$WORKTREE_PATH" "$SPARE_BRANCH" "$NEW_BRANCH" "$OBJECTIVE" "$PR_NUMBER" "pr-address" "pr-test")
+
+# Without PR (new work):
+WINDOW=$(bash $SKILLS_DIR/spawn-agent.sh "$SESSION" "$WORKTREE_PATH" "$SPARE_BRANCH" "$NEW_BRANCH" "$OBJECTIVE")
+```
+
+If the state file doesn't exist yet, initialize it before the first spawn:
+
+```bash
+# Derive repo from git remote (used by verify-complete.sh + supervisor)
+REPO=$(git remote get-url origin 2>/dev/null | sed 's|.*github\.com[:/]||; s|\.git$||' || echo "")
+
+jq -n \
+  --arg session "$SESSION" \
+  --arg repo "$REPO" \
+  --argjson threshold 300 \
+  '{active:true, tmux_session:$session, idle_threshold_seconds:$threshold,
+    repo:$repo, loop_window:null, supervisor_window:null, last_poll_at:0, agents:[]}' \
+  > ~/.claude/orchestrator-state.json
+```
+
+Optionally add a Discord webhook for completion notifications:
+```bash
+jq --arg hook "$DISCORD_WEBHOOK_URL" '.discord_webhook = $hook' ~/.claude/orchestrator-state.json \
+  > /tmp/orch.tmp && mv /tmp/orch.tmp ~/.claude/orchestrator-state.json
+```
+
+`spawn-agent.sh` writes the initial agent record (window, worktree_path, branch, objective, state, etc.) to the state file automatically — **do not append the record again after calling it.** The record already exists and `pr_number`/`steps` are patched in by the script itself.
+
+### 5. Start the mechanical babysitter
+
+```bash
+LOOP_WIN=$(tmux new-window -t "$SESSION" -n "orchestrator" -P -F '#{window_index}')
+LOOP_WINDOW="${SESSION}:${LOOP_WIN}"
+tmux send-keys -t "$LOOP_WINDOW" "bash $SKILLS_DIR/run-loop.sh" Enter
+
+jq --arg w "$LOOP_WINDOW" '.loop_window = $w' ~/.claude/orchestrator-state.json \
+  > /tmp/orch.tmp && mv /tmp/orch.tmp ~/.claude/orchestrator-state.json
+```
+
+### 6. Begin supervising directly in this conversation
+
+You are the supervisor. After spawning, immediately start your first poll loop (see **Supervisor duties** below) and continue every 2-3 minutes. Do NOT spawn a separate supervisor Claude window.
+
+## Adding an agent
+
+Find the next spare worktree, then spawn — same as steps 2–4 above but for a single task; `spawn-agent.sh` writes the state record itself. If no spare worktrees are available, tell the user.
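+
+For example (a sketch; branch name and objective are illustrative):
+
+```bash
+SPARE_LINE=$(bash $SKILLS_DIR/find-spare.sh $(git rev-parse --show-toplevel) | head -1)
+# (if $SPARE_LINE is empty, there is no capacity — tell the user instead)
+WORKTREE_PATH=$(echo "$SPARE_LINE" | awk '{print $1}')
+SPARE_BRANCH=$(echo "$SPARE_LINE" | awk '{print $2}')
+WINDOW=$(bash $SKILLS_DIR/spawn-agent.sh "$SESSION" "$WORKTREE_PATH" "$SPARE_BRANCH" \
+  feat/one-more-task "Implement feature A and open a PR")
+```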
+
+## Supervisor duties (YOUR job, every 2-3 min in this conversation)
+
+You are the supervisor. Run this poll loop directly in your Claude session — not in a separate window.
+
+### Poll loop mechanism
+
+You are reactive — you only act when a tool completes or the user sends a message. To create a self-sustaining poll loop without user involvement:
+
+1. Start each poll with `run_in_background: true` + a sleep before the work:
+   ```bash
+   sleep 120 && tmux capture-pane -t autogpt1:0 -p -S -200 | tail -40
+   # + similar for each active window
+   ```
+2. When the background job notifies you, read the pane output and take action.
+3. Immediately schedule the next background poll — this keeps the loop alive.
+4. Stop scheduling when all agents are done/escalated.
+
+**Never tell the user "I'll poll every 2-3 minutes"** — that does nothing without a trigger. Start the background job instead.
+
+### Each poll: what to check
+
+```bash
+# 1. Read state
+cat ~/.claude/orchestrator-state.json | jq '.agents[] | {window, worktree, branch, state, pr_number, checkpoints}'
+
+# 2. For each running/stuck/idle agent, capture pane
+tmux capture-pane -t SESSION:WIN -p -S -200 | tail -60
+```
+
+For each agent, decide:
+
+| What you see | Action |
+|---|---|
+| Spinner / tools running | Do nothing — agent is working |
+| Idle `❯` prompt, no `ORCHESTRATOR:DONE` | Stalled — send specific nudge with objective from state |
+| Stuck in error loop | Send targeted fix with exact error + solution |
+| Waiting for input / question | Answer and unblock via `tmux send-keys` |
+| CI red | `gh pr checks PR_NUMBER --repo REPO` → tell agent exactly what's failing |
+| Context compacted / agent lost | Send recovery: `cat ~/.claude/orchestrator-state.json \| jq '.agents[] \| select(.window=="WIN")'` + `gh pr view PR_NUMBER --json title,body` |
+| `ORCHESTRATOR:DONE` in output | Run `verify-complete.sh` — if it fails, re-brief with specific reason |
+
+### Strict ORCHESTRATOR:DONE gate
+
+`verify-complete.sh` handles the main checks automatically (checkpoints, threads, CHANGES_REQUESTED, CI green, spawned_at). Run it:
+
+```bash
+SKILLS_DIR=~/.claude/orchestrator/scripts
+bash $SKILLS_DIR/verify-complete.sh SESSION:WIN
+```
+
+If it passes → proceed to your final evaluation (below), then mark `done`; the window stays open until recycling is requested.
+If it fails → re-brief the agent with the failure reason. Never manually mark state `done` to bypass this.
+
+### Re-brief a stalled agent
+
+```bash
+OBJ=$(jq -r --arg w SESSION:WIN '.agents[] | select(.window==$w) | .objective' ~/.claude/orchestrator-state.json)
+PR=$(jq -r --arg w SESSION:WIN '.agents[] | select(.window==$w) | .pr_number' ~/.claude/orchestrator-state.json)
+tmux send-keys -t SESSION:WIN "You appear stalled. Your objective: $OBJ. Check: gh pr view $PR --json title,body,headRefName to reorient."
+sleep 0.3
+tmux send-keys -t SESSION:WIN Enter
+```
+
+If `image_path` is set on the agent record, include: "Re-read context at IMAGE_PATH with the Read tool."
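+
+Before nudging, honor the 5-minute cooldown tracked in `last_rebriefed_at` (a sketch):
+
+```bash
+LAST=$(jq -r --arg w SESSION:WIN '.agents[] | select(.window==$w) | .last_rebriefed_at // 0' \
+  ~/.claude/orchestrator-state.json)
+AGO=$(( $(date +%s) - LAST ))
+[ "$AGO" -lt 300 ] && echo "re-briefed ${AGO}s ago — skip this nudge"
+```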
+
+## Self-recovery protocol (agents)
+
+spawn-agent.sh automatically includes this instruction in every objective:
+
+> If your context compacts and you lose track of what to do, run:
+> `cat ~/.claude/orchestrator-state.json | jq '.agents[] | select(.window=="SESSION:WIN")'`
+> and `gh pr view PR_NUMBER --json title,body,headRefName` to reorient.
+> Output each completed step as `CHECKPOINT:<step-name>` on its own line.
+
+## Passing images and screenshots to agents
+
+`tmux send-keys` is text-only — you cannot paste a raw image into a pane. To give an agent visual context (screenshots, diagrams, mockups):
+
+1. **Save the image to a temp file** with a stable path:
+   ```bash
+   # If the user drags in a screenshot or you receive a file path:
+   IMAGE_PATH="/tmp/orchestrator-context-$(date +%s).png"
+   cp "$USER_PROVIDED_PATH" "$IMAGE_PATH"
+   ```
+
+2. **Reference the path in the objective string**:
+   ```bash
+   OBJECTIVE="Implement the layout shown in /tmp/orchestrator-context-1234567890.png. Read that image first with the Read tool to understand the design."
+   ```
+
+3. The agent uses its `Read` tool to view the image at startup — Claude Code agents are multimodal and can read image files directly.
+
+**Rule**: always use `/tmp/orchestrator-context-<timestamp>.png` as the naming convention so the supervisor knows what to look for if it needs to re-brief an agent with the same image.
+
+---
+
+## Orchestrator final evaluation (YOU decide, not the script)
+
+`verify-complete.sh` is a gate — it blocks premature marking. But it cannot tell you if the work is actually good. That is YOUR job.
+
+When run-loop marks an agent `pending_evaluation` and you're notified, do all of these before marking done:
+
+### 1. Run /pr-test (required, serialized, use TodoWrite to queue)
+
+`/pr-test` is the only reliable confirmation that the objective is actually met. Run it yourself, not the agent.
+
+**When multiple PRs reach `pending_evaluation` at the same time, use TodoWrite to queue them:**
+```
+- [ ] /pr-test PR #12636 — fix copilot retry logic
+- [ ] /pr-test PR #12699 — builder chat panel
+```
+Run one at a time. Check off as you go.
+
+```
+/pr-test https://github.com/Significant-Gravitas/AutoGPT/pull/PR_NUMBER
+```
+
+**/pr-test can be lazy** — if it gives vague output, re-run with full context:
+
+```
+/pr-test https://github.com/OWNER/REPO/pull/PR_NUMBER
+Context: This PR implements <summary of the change>. Key files: <paths>.
+Please verify: <the specific behaviors to confirm>.
+```
+
+Only one `/pr-test` at a time — they share ports and DB.
+
+### 2. Do your own evaluation
+
+1. **Read the PR diff and objective** — does the code actually implement what was asked? Is anything obviously missing or half-done?
+2. **Read the resolved threads** — were comments addressed with real fixes, or just dismissed/resolved without changes?
+3. **Check CI run names** — any suspicious retries that shouldn't have passed?
+4. **Check the PR description** — title, summary, test plan complete?
+
+### 3. Decide
+
+- `/pr-test` passes + evaluation looks good → mark `done` in state, tell the user the PR is ready, ask if window should be closed
+- `/pr-test` fails or evaluation finds gaps → re-brief the agent with specific failures, set state back to `running`
+
+**Never mark done based purely on script output.** You hold the full objective context; the script does not.
+
+```bash
+# Mark done after your positive evaluation:
+jq --arg w "SESSION:WIN" '(.agents[] | select(.window == $w)).state = "done"' \
+  ~/.claude/orchestrator-state.json > /tmp/orch.tmp && mv /tmp/orch.tmp ~/.claude/orchestrator-state.json
+```
+
+## When to stop the fleet
+
+Stop the fleet (`active = false`) when **all** of the following are true:
+
+| Check | How to verify |
+|---|---|
+| All agents are `done` or `escalated` | `jq '[.agents[] \| select(.state \| test("running\|stuck\|idle\|waiting_approval"))] \| length' ~/.claude/orchestrator-state.json` == 0 |
+| All PRs have 0 unresolved review threads | GraphQL `isResolved` check per PR |
+| All PRs have green CI **on a run triggered after the agent's last push** | `gh run list --branch BRANCH --limit 1` timestamp > `spawned_at` in state |
+| No agents are `escalated` without human review | If any are escalated, surface to user first |
+
+**Do NOT stop just because agents output `ORCHESTRATOR:DONE`.** That is a signal to verify, not a signal to stop.
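+
+A one-shot all-clear probe combining the first and last rows of the table (a sketch):
+
+```bash
+BUSY=$(jq '[.agents[] | select(.state | test("running|stuck|idle|waiting_approval"))] | length' \
+  ~/.claude/orchestrator-state.json)
+ESCALATED=$(jq '[.agents[] | select(.state == "escalated")] | length' ~/.claude/orchestrator-state.json)
+[ "$BUSY" -eq 0 ] && [ "$ESCALATED" -eq 0 ] && echo "fleet can stop — verify PRs, then set active=false"
+```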
+
+**Do stop** if the user explicitly says "stop", "shut down", or "kill everything", even with agents still running.
+
+```bash
+# Graceful stop
+jq '.active = false' ~/.claude/orchestrator-state.json > /tmp/orch.tmp \
+  && mv /tmp/orch.tmp ~/.claude/orchestrator-state.json
+
+LOOP_WINDOW=$(jq -r '.loop_window // ""' ~/.claude/orchestrator-state.json)
+[ -n "$LOOP_WINDOW" ] && tmux kill-window -t "$LOOP_WINDOW" 2>/dev/null || true
+```
+
+Does **not** recycle running worktrees — agents may still be mid-task. Run `capacity.sh` to see what's still in progress.
+
+## tmux send-keys pattern
+
+**Always split long messages into text + Enter as two separate calls with a sleep between them.** If sent as one call (`"text" Enter`), Enter can fire before the full string is buffered into Claude's input — leaving the message stuck as `[Pasted text +N lines]` unsent.
+
+```bash
+# CORRECT — text then Enter separately
+tmux send-keys -t "$WINDOW" "your long message here"
+sleep 0.3
+tmux send-keys -t "$WINDOW" Enter
+
+# WRONG — Enter may fire before text is buffered
+tmux send-keys -t "$WINDOW" "your long message here" Enter
+```
+
+Short single-character sends (`y`, `Down`, empty Enter for dialog approval) are safe to combine since they have no buffering lag.
+
+---
+
+## Protected worktrees
+
+Some worktrees must **never** be used as spare worktrees for agent tasks because they host files critical to the orchestrator itself:
+
+| Worktree | Protected branch | Why |
+|---|---|---|
+| `AutoGPT1` | `dx/orchestrate-skill` | Hosts the orchestrate skill scripts. `recycle-agent.sh` would check out `spare/1`, wiping `.claude/skills/` and breaking all subsequent `spawn-agent.sh` calls. |
+
+**Rule**: when selecting spare worktrees via `find-spare.sh`, skip any worktree whose CURRENT branch matches a protected branch. If you accidentally spawn an agent in a protected worktree, do not let `recycle-agent.sh` run on it — manually restore the branch after the agent finishes.
+
+When `dx/orchestrate-skill` is merged into `dev`, `AutoGPT1` becomes a normal spare again.
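+
+`find-spare.sh` only lists `spare/N` branches, so a protected worktree normally never appears in its output; filtering the path anyway is a cheap extra guard (a sketch; the path fragment assumes this repo's layout):
+
+```bash
+bash $SKILLS_DIR/find-spare.sh "$REPO_ROOT" | grep -v "/AutoGPT1 "
+```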
+
+---
+
+## Key rules
+
+1. **Scripts do all the heavy lifting** — don't reimplement their logic inline in this file
+2. **Never ask the user to pick a worktree** — auto-assign from `find-spare.sh` output
+3. **Never restart a running agent** — only restart on `idle` kicks (the foreground process is a shell)
+4. **Auto-dismiss settings dialogs** — if "Enter to confirm" appears, send Down+Enter
+5. **Always `--permission-mode bypassPermissions`** on every spawn
+6. **Escalate after 3 kicks** — mark `escalated`, surface to user
+7. **Atomic state writes** — always write to `.tmp` then `mv`
+8. **Never approve destructive commands** outside the worktree scope — when in doubt, escalate
+9. **Never recycle without verification** — `verify-complete.sh` must pass before recycling
+10. **No TASK.md files** — commit risk; use state file + `gh pr view` for agent context persistence
+11. **Re-brief stalled agents** — read objective from state file + `gh pr view`, send via tmux
+12. **ORCHESTRATOR:DONE is a signal to verify, not to accept** — always run `verify-complete.sh` and check CI run timestamp before recycling
+13. **Protected worktrees** — never use the worktree hosting the skill scripts as a spare
+14. **Images via file path** — save screenshots to `/tmp/orchestrator-context-<timestamp>.png`, pass path in objective; agents read with the `Read` tool
+15. **Split send-keys** — always separate text and Enter with `sleep 0.3` between calls for long strings
diff --git a/.claude/skills/orchestrate/scripts/capacity.sh b/.claude/skills/orchestrate/scripts/capacity.sh
new file mode 100755
index 0000000000..1bbf376297
--- /dev/null
+++ b/.claude/skills/orchestrate/scripts/capacity.sh
@@ -0,0 +1,43 @@
+#!/usr/bin/env bash
+# capacity.sh — show fleet capacity: available spare worktrees + in-use agents
+#
+# Usage: capacity.sh [REPO_ROOT]
+#   REPO_ROOT defaults to the root worktree of the current git repo.
+#
+# Reads: ~/.claude/orchestrator-state.json (skipped if missing or corrupt)
+
+set -euo pipefail
+
+SCRIPTS_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+STATE_FILE="${ORCHESTRATOR_STATE_FILE:-$HOME/.claude/orchestrator-state.json}"
+REPO_ROOT="${1:-$(git rev-parse --show-toplevel 2>/dev/null || echo "")}"
+
+echo "=== Available (spare) worktrees ==="
+if [ -n "$REPO_ROOT" ]; then
+  SPARE=$("$SCRIPTS_DIR/find-spare.sh" "$REPO_ROOT" 2>/dev/null || echo "")
+else
+  SPARE=$("$SCRIPTS_DIR/find-spare.sh" 2>/dev/null || echo "")
+fi
+
+if [ -z "$SPARE" ]; then
+  echo "  (none)"
+else
+  while IFS= read -r line; do
+    [ -z "$line" ] && continue
+    echo "  ✓ $line"
+  done <<< "$SPARE"
+fi
+
+echo ""
+echo "=== In-use worktrees ==="
+if [ -f "$STATE_FILE" ] && jq -e '.' "$STATE_FILE" >/dev/null 2>&1; then
+  IN_USE=$(jq -r '.agents[] | select(.state != "done") | "  [\(.state)] \(.worktree_path) → \(.branch)"' \
+    "$STATE_FILE" 2>/dev/null || echo "")
+  if [ -n "$IN_USE" ]; then
+    echo "$IN_USE"
+  else
+    echo "  (none)"
+  fi
+else
+  echo "  (no active state file)"
+fi
diff --git a/.claude/skills/orchestrate/scripts/classify-pane.sh b/.claude/skills/orchestrate/scripts/classify-pane.sh
new file mode 100755
index 0000000000..57504c72ce
--- /dev/null
+++ b/.claude/skills/orchestrate/scripts/classify-pane.sh
@@ -0,0 +1,85 @@
+#!/usr/bin/env bash
+# classify-pane.sh — Classify the current state of a tmux pane
+#
+# Usage: classify-pane.sh <tmux-target>
+#   tmux-target: e.g. "work:0", "work:1.0"
+#
+# Output (stdout): JSON object:
+#   { "state": "running|idle|waiting_approval|complete", "reason": "...", "pane_cmd": "..." }
+#
+# Exit codes: 0=ok, 1=error (invalid target or tmux window not found)
+
+set -euo pipefail
+
+TARGET="${1:-}"
+
+if [ -z "$TARGET" ]; then
+  echo '{"state":"error","reason":"no target provided","pane_cmd":""}'
+  exit 1
+fi
+
+# Validate tmux target format: session:window or session:window.pane
+if ! [[ "$TARGET" =~ ^[a-zA-Z0-9_.-]+:[a-zA-Z0-9_.-]+(\.[0-9]+)?$ ]]; then
+  echo '{"state":"error","reason":"invalid tmux target format","pane_cmd":""}'
+  exit 1
+fi
+
+# Check session exists (use %%:* to extract session name from session:window)
+if ! tmux list-windows -t "${TARGET%%:*}" >/dev/null 2>&1; then
+  echo '{"state":"error","reason":"tmux target not found","pane_cmd":""}'
+  exit 1
+fi
+
+# Get the current foreground command in the pane
+PANE_CMD=$(tmux display-message -t "$TARGET" -p '#{pane_current_command}' 2>/dev/null || echo "unknown")
+
+# Capture and strip ANSI codes (use perl for cross-platform compatibility — BSD sed lacks \x1b support)
+RAW=$(tmux capture-pane -t "$TARGET" -p -S -50 2>/dev/null || echo "")
+CLEAN=$(echo "$RAW" | perl -pe 's/\x1b\[[0-9;]*[a-zA-Z]//g; s/\x1b\(B//g; s/\x1b\[\?[0-9]*[hl]//g; s/\r//g' \
+  | grep -v '^[[:space:]]*$' || true)
+
+# --- Check: explicit completion marker ---
+# Must be on its own line (not buried in the objective text sent at spawn time).
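+# e.g. a line containing only "ORCHESTRATOR:DONE" (plus whitespace) matches, while the
+# copy of the marker echoed back inside the objective text does not.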
+if echo "$CLEAN" | grep -qE "^[[:space:]]*ORCHESTRATOR:DONE[[:space:]]*$"; then + jq -n --arg cmd "$PANE_CMD" '{"state":"complete","reason":"ORCHESTRATOR:DONE marker found","pane_cmd":$cmd}' + exit 0 +fi + +# --- Check: Claude Code approval prompt patterns --- +LAST_40=$(echo "$CLEAN" | tail -40) +APPROVAL_PATTERNS=( + "Do you want to proceed" + "Do you want to make this" + "\\[y/n\\]" + "\\[Y/n\\]" + "\\[n/Y\\]" + "Proceed\\?" + "Allow this command" + "Run bash command" + "Allow bash" + "Would you like" + "Press enter to continue" + "Esc to cancel" +) +for pattern in "${APPROVAL_PATTERNS[@]}"; do + if echo "$LAST_40" | grep -qiE "$pattern"; then + jq -n --arg pattern "$pattern" --arg cmd "$PANE_CMD" \ + '{"state":"waiting_approval","reason":"approval pattern: \($pattern)","pane_cmd":$cmd}' + exit 0 + fi +done + +# --- Check: shell prompt (claude has exited) --- +# If the foreground process is a shell (not claude/node), the agent has exited +case "$PANE_CMD" in + zsh|bash|fish|sh|dash|tcsh|ksh) + jq -n --arg cmd "$PANE_CMD" \ + '{"state":"idle","reason":"agent exited — shell prompt active","pane_cmd":$cmd}' + exit 0 + ;; +esac + +# Agent is still running (claude/node/python is the foreground process) +jq -n --arg cmd "$PANE_CMD" \ + '{"state":"running","reason":"foreground process: \($cmd)","pane_cmd":$cmd}' +exit 0 diff --git a/.claude/skills/orchestrate/scripts/find-spare.sh b/.claude/skills/orchestrate/scripts/find-spare.sh new file mode 100755 index 0000000000..e374a41c9b --- /dev/null +++ b/.claude/skills/orchestrate/scripts/find-spare.sh @@ -0,0 +1,24 @@ +#!/usr/bin/env bash +# find-spare.sh — list worktrees on spare/N branches (free to use) +# +# Usage: find-spare.sh [REPO_ROOT] +# REPO_ROOT defaults to the root worktree containing the current git repo. +# +# Output (stdout): one line per available worktree: "PATH BRANCH" +# e.g.: /Users/me/Code/AutoGPT3 spare/3 + +set -euo pipefail + +REPO_ROOT="${1:-$(git rev-parse --show-toplevel 2>/dev/null || echo "")}" +if [ -z "$REPO_ROOT" ]; then + echo "Error: not inside a git repo and no REPO_ROOT provided" >&2 + exit 1 +fi + +git -C "$REPO_ROOT" worktree list --porcelain \ + | awk ' + /^worktree / { path = substr($0, 10) } + /^branch / { branch = substr($0, 8); print path " " branch } + ' \ + | { grep -E " refs/heads/spare/[0-9]+$" || true; } \ + | sed 's|refs/heads/||' diff --git a/.claude/skills/orchestrate/scripts/notify.sh b/.claude/skills/orchestrate/scripts/notify.sh new file mode 100755 index 0000000000..ace46cc152 --- /dev/null +++ b/.claude/skills/orchestrate/scripts/notify.sh @@ -0,0 +1,40 @@ +#!/usr/bin/env bash +# notify.sh — send a fleet notification message +# +# Delivery order (first available wins): +# 1. Discord webhook — DISCORD_WEBHOOK_URL env var OR state file .discord_webhook +# 2. macOS notification center — osascript (silent fail if unavailable) +# 3. 
+#
+# Usage: notify.sh MESSAGE
+# Exit: always 0 (notification failure must not abort the caller)
+
+MESSAGE="${1:-}"
+[ -z "$MESSAGE" ] && exit 0
+
+STATE_FILE="${ORCHESTRATOR_STATE_FILE:-$HOME/.claude/orchestrator-state.json}"
+
+# --- Resolve Discord webhook ---
+WEBHOOK="${DISCORD_WEBHOOK_URL:-}"
+if [ -z "$WEBHOOK" ] && [ -f "$STATE_FILE" ]; then
+  WEBHOOK=$(jq -r '.discord_webhook // ""' "$STATE_FILE" 2>/dev/null || echo "")
+fi
+
+# --- Discord delivery ---
+if [ -n "$WEBHOOK" ]; then
+  PAYLOAD=$(jq -n --arg msg "$MESSAGE" '{"content": $msg}')
+  curl -s -X POST "$WEBHOOK" \
+    -H "Content-Type: application/json" \
+    -d "$PAYLOAD" > /dev/null 2>&1 || true
+fi
+
+# --- macOS notification center (silent if not macOS or osascript missing) ---
+if command -v osascript >/dev/null 2>&1; then
+  # Escape backslashes and double quotes for the AppleScript string literal
+  SAFE_MSG=$(printf '%s' "$MESSAGE" | sed 's/\\/\\\\/g; s/"/\\"/g')
+  osascript -e "display notification \"${SAFE_MSG}\" with title \"Orchestrator\"" 2>/dev/null || true
+fi
+
+# Always print to stdout so run-loop.sh logs it
+echo "$MESSAGE"
+exit 0
diff --git a/.claude/skills/orchestrate/scripts/poll-cycle.sh b/.claude/skills/orchestrate/scripts/poll-cycle.sh
new file mode 100755
index 0000000000..dafd307bf3
--- /dev/null
+++ b/.claude/skills/orchestrate/scripts/poll-cycle.sh
@@ -0,0 +1,257 @@
+#!/usr/bin/env bash
+# poll-cycle.sh — Single orchestrator poll cycle
+#
+# Reads ~/.claude/orchestrator-state.json, classifies each agent, updates state,
+# and outputs a JSON array of actions for Claude to take.
+#
+# Usage: poll-cycle.sh
+# Output (stdout): JSON array of action objects
+#   [{ "window": "work:0", "action": "kick|approve|none", "state": "...",
+#      "worktree": "...", "objective": "...", "reason": "..." }]
+#
+# The state file is updated in-place (atomic write via .tmp).
+
+set -euo pipefail
+
+STATE_FILE="${ORCHESTRATOR_STATE_FILE:-$HOME/.claude/orchestrator-state.json}"
+SCRIPTS_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+CLASSIFY="$SCRIPTS_DIR/classify-pane.sh"
+
+# Cross-platform md5: always outputs just the hex digest
+md5_hash() {
+  if command -v md5sum &>/dev/null; then
+    md5sum | awk '{print $1}'
+  else
+    md5 | awk '{print $NF}'
+  fi
+}
+
+# Clean up temp file on any exit (avoids stale .tmp if jq write fails)
+trap 'rm -f "${STATE_FILE}.tmp"' EXIT
+
+# Ensure state file exists
+if [ ! -f "$STATE_FILE" ]; then
+  echo '{"active":false,"agents":[]}' > "$STATE_FILE"
+fi
+
+# Validate JSON upfront before any jq reads that run under set -e.
+# A truncated/corrupt file (e.g. from a SIGKILL mid-write) would otherwise
+# abort the script at the ACTIVE read below without emitting any JSON output.
+if ! jq -e '.' "$STATE_FILE" >/dev/null 2>&1; then
+  echo "State file parse error — check $STATE_FILE" >&2
+  echo "[]"
+  exit 0
+fi
+
+ACTIVE=$(jq -r '.active // false' "$STATE_FILE")
+if [ "$ACTIVE" != "true" ]; then
+  echo "[]"
+  exit 0
+fi
+
+NOW=$(date +%s)
+IDLE_THRESHOLD=$(jq -r '.idle_threshold_seconds // 300' "$STATE_FILE")
+
+ACTIONS="[]"
+UPDATED_AGENTS="[]"
+
+# Read agents as newline-delimited JSON objects.
+# jq exits non-zero when .agents[] has no matches on an empty array, which is valid —
+# so we suppress that exit code and separately validate the file is well-formed JSON.
+if ! AGENTS_JSON=$(jq -e -c '.agents // empty | .[]' "$STATE_FILE" 2>/dev/null); then
"$STATE_FILE" > /dev/null 2>&1; then + echo "State file parse error — check $STATE_FILE" >&2 + fi + echo "[]" + exit 0 +fi + +if [ -z "$AGENTS_JSON" ]; then + echo "[]" + exit 0 +fi + +while IFS= read -r agent; do + [ -z "$agent" ] && continue + + # Use // "" defaults so a single malformed field doesn't abort the whole cycle + WINDOW=$(echo "$agent" | jq -r '.window // ""') + WORKTREE=$(echo "$agent" | jq -r '.worktree // ""') + OBJECTIVE=$(echo "$agent"| jq -r '.objective // ""') + STATE=$(echo "$agent" | jq -r '.state // "running"') + LAST_HASH=$(echo "$agent"| jq -r '.last_output_hash // ""') + IDLE_SINCE=$(echo "$agent"| jq -r '.idle_since // 0') + REVISION_COUNT=$(echo "$agent"| jq -r '.revision_count // 0') + + # Validate window format to prevent tmux target injection. + # Allow session:window (numeric or named) and session:window.pane + if ! [[ "$WINDOW" =~ ^[a-zA-Z0-9_.-]+:[a-zA-Z0-9_.-]+(\.[0-9]+)?$ ]]; then + echo "Skipping agent with invalid window value: $WINDOW" >&2 + UPDATED_AGENTS=$(echo "$UPDATED_AGENTS" | jq --argjson a "$agent" '. + [$a]') + continue + fi + + # Pass-through terminal-state agents + if [[ "$STATE" == "done" || "$STATE" == "escalated" || "$STATE" == "complete" || "$STATE" == "pending_evaluation" ]]; then + UPDATED_AGENTS=$(echo "$UPDATED_AGENTS" | jq --argjson a "$agent" '. + [$a]') + continue + fi + + # Classify pane. + # classify-pane.sh always emits JSON before exit (even on error), so using + # "|| echo '...'" would concatenate two JSON objects when it exits non-zero. + # Use "|| true" inside the substitution so set -euo pipefail does not abort + # the poll cycle when classify exits with a non-zero status code. + CLASSIFICATION=$("$CLASSIFY" "$WINDOW" 2>/dev/null || true) + [ -z "$CLASSIFICATION" ] && CLASSIFICATION='{"state":"error","reason":"classify failed","pane_cmd":"unknown"}' + + PANE_STATE=$(echo "$CLASSIFICATION" | jq -r '.state') + PANE_REASON=$(echo "$CLASSIFICATION" | jq -r '.reason') + + # Capture full pane output once — used for hash (stuck detection) and checkpoint parsing. + # Use -S -500 to get the last ~500 lines of scrollback so checkpoints aren't missed. + RAW=$(tmux capture-pane -t "$WINDOW" -p -S -500 2>/dev/null || echo "") + + # --- Checkpoint tracking --- + # Parse any "CHECKPOINT:" lines the agent has output and merge into state file. + # The agent writes these as it completes each required step so verify-complete.sh can gate recycling. + EXISTING_CPS=$(echo "$agent" | jq -c '.checkpoints // []') + NEW_CHECKPOINTS_JSON="$EXISTING_CPS" + if [ -n "$RAW" ]; then + FOUND_CPS=$(echo "$RAW" \ + | grep -oE "CHECKPOINT:[a-zA-Z0-9_-]+" \ + | sed 's/CHECKPOINT://' \ + | sort -u \ + | jq -R . | jq -s . 2>/dev/null || echo "[]") + NEW_CHECKPOINTS_JSON=$(jq -n \ + --argjson existing "$EXISTING_CPS" \ + --argjson found "$FOUND_CPS" \ + '($existing + $found) | unique' 2>/dev/null || echo "$EXISTING_CPS") + fi + + # Compute content hash for stuck-detection (only for running agents) + CURRENT_HASH="" + if [[ "$PANE_STATE" == "running" ]] && [ -n "$RAW" ]; then + CURRENT_HASH=$(echo "$RAW" | tail -20 | md5_hash) + fi + + NEW_STATE="$STATE" + NEW_IDLE_SINCE="$IDLE_SINCE" + NEW_REVISION_COUNT="$REVISION_COUNT" + ACTION="none" + REASON="$PANE_REASON" + + case "$PANE_STATE" in + complete) + # Agent output ORCHESTRATOR:DONE — mark pending_evaluation so orchestrator handles it. + # run-loop does NOT verify or notify; orchestrator's background poll picks this up. 
+ NEW_STATE="pending_evaluation" + ACTION="complete" # run-loop logs it but takes no action + ;; + waiting_approval) + NEW_STATE="waiting_approval" + ACTION="approve" + ;; + idle) + # Agent process has exited — needs restart + NEW_STATE="idle" + ACTION="kick" + REASON="agent exited (shell is foreground)" + NEW_REVISION_COUNT=$(( REVISION_COUNT + 1 )) + NEW_IDLE_SINCE=$NOW + if [ "$NEW_REVISION_COUNT" -ge 3 ]; then + NEW_STATE="escalated" + ACTION="none" + REASON="escalated after ${NEW_REVISION_COUNT} kicks — needs human attention" + fi + ;; + running) + # Clear idle_since only when transitioning from idle (agent was kicked and + # restarted). Do NOT reset for stuck — idle_since must persist across polls + # so STUCK_DURATION can accumulate and trigger escalation. + # Also update the local IDLE_SINCE so the hash-stability check below uses + # the reset value on this same poll, not the stale kick timestamp. + if [[ "$STATE" == "idle" ]]; then + NEW_IDLE_SINCE=0 + IDLE_SINCE=0 + fi + # Check if hash has been stable (agent may be stuck mid-task) + if [ -n "$CURRENT_HASH" ] && [ "$CURRENT_HASH" = "$LAST_HASH" ] && [ "$LAST_HASH" != "" ]; then + if [ "$IDLE_SINCE" = "0" ] || [ "$IDLE_SINCE" = "null" ]; then + NEW_IDLE_SINCE=$NOW + else + STUCK_DURATION=$(( NOW - IDLE_SINCE )) + if [ "$STUCK_DURATION" -gt "$IDLE_THRESHOLD" ]; then + NEW_REVISION_COUNT=$(( REVISION_COUNT + 1 )) + NEW_IDLE_SINCE=$NOW + if [ "$NEW_REVISION_COUNT" -ge 3 ]; then + NEW_STATE="escalated" + ACTION="none" + REASON="escalated after ${NEW_REVISION_COUNT} kicks — needs human attention" + else + NEW_STATE="stuck" + ACTION="kick" + REASON="output unchanged for ${STUCK_DURATION}s (threshold: ${IDLE_THRESHOLD}s)" + fi + fi + fi + else + # Only reset the idle timer when we have a valid hash comparison (pane + # capture succeeded). If CURRENT_HASH is empty (tmux capture-pane failed), + # preserve existing timers so stuck detection is not inadvertently reset. + if [ -n "$CURRENT_HASH" ]; then + NEW_STATE="running" + NEW_IDLE_SINCE=0 + fi + fi + ;; + error) + REASON="classify error: $PANE_REASON" + ;; + esac + + # Build updated agent record (ensure idle_since and revision_count are numeric) + # Use || true on each jq call so a malformed field skips this agent rather than + # aborting the entire poll cycle under set -e. + UPDATED_AGENT=$(echo "$agent" | jq \ + --arg state "$NEW_STATE" \ + --arg hash "$CURRENT_HASH" \ + --argjson now "$NOW" \ + --arg idle_since "$NEW_IDLE_SINCE" \ + --arg revision_count "$NEW_REVISION_COUNT" \ + --argjson checkpoints "$NEW_CHECKPOINTS_JSON" \ + '.state = $state + | .last_output_hash = (if $hash == "" then .last_output_hash else $hash end) + | .last_seen_at = $now + | .idle_since = ($idle_since | tonumber) + | .revision_count = ($revision_count | tonumber) + | .checkpoints = $checkpoints' 2>/dev/null) || { + echo "Warning: failed to build updated agent for window $WINDOW — keeping original" >&2 + UPDATED_AGENTS=$(echo "$UPDATED_AGENTS" | jq --argjson a "$agent" '. + [$a]') + continue + } + + UPDATED_AGENTS=$(echo "$UPDATED_AGENTS" | jq --argjson a "$UPDATED_AGENT" '. + [$a]') + + # Add action if needed + if [ "$ACTION" != "none" ]; then + ACTION_OBJ=$(jq -n \ + --arg window "$WINDOW" \ + --arg action "$ACTION" \ + --arg state "$NEW_STATE" \ + --arg worktree "$WORKTREE" \ + --arg objective "$OBJECTIVE" \ + --arg reason "$REASON" \ + '{window:$window, action:$action, state:$state, worktree:$worktree, objective:$objective, reason:$reason}') + ACTIONS=$(echo "$ACTIONS" | jq --argjson a "$ACTION_OBJ" '. 
+  fi
+
+done <<< "$AGENTS_JSON"
+
+# Atomic state file update
+jq --argjson agents "$UPDATED_AGENTS" \
+   --argjson now "$NOW" \
+   '.agents = $agents | .last_poll_at = $now' \
+   "$STATE_FILE" > "${STATE_FILE}.tmp" && mv "${STATE_FILE}.tmp" "$STATE_FILE"
+
+echo "$ACTIONS"
diff --git a/.claude/skills/orchestrate/scripts/recycle-agent.sh b/.claude/skills/orchestrate/scripts/recycle-agent.sh
new file mode 100755
index 0000000000..6d5e2fdc8f
--- /dev/null
+++ b/.claude/skills/orchestrate/scripts/recycle-agent.sh
@@ -0,0 +1,32 @@
+#!/usr/bin/env bash
+# recycle-agent.sh — kill a tmux window and restore the worktree to its spare branch
+#
+# Usage: recycle-agent.sh WINDOW WORKTREE_PATH SPARE_BRANCH
+#   WINDOW        — tmux target, e.g. autogpt1:3
+#   WORKTREE_PATH — absolute path to the git worktree
+#   SPARE_BRANCH  — branch to restore, e.g. spare/6
+#
+# Stdout: one status line
+
+set -euo pipefail
+
+if [ $# -lt 3 ]; then
+  echo "Usage: recycle-agent.sh WINDOW WORKTREE_PATH SPARE_BRANCH" >&2
+  exit 1
+fi
+
+WINDOW="$1"
+WORKTREE_PATH="$2"
+SPARE_BRANCH="$3"
+
+# Kill the tmux window (ignore error — may already be gone)
+tmux kill-window -t "$WINDOW" 2>/dev/null || true
+
+# Restore to spare branch: abort any in-progress operation, then clean
+git -C "$WORKTREE_PATH" rebase --abort 2>/dev/null || true
+git -C "$WORKTREE_PATH" merge --abort 2>/dev/null || true
+git -C "$WORKTREE_PATH" reset --hard HEAD 2>/dev/null
+git -C "$WORKTREE_PATH" clean -fd 2>/dev/null
+git -C "$WORKTREE_PATH" checkout "$SPARE_BRANCH"
+
+echo "Recycled: $(basename "$WORKTREE_PATH") → $SPARE_BRANCH (window $WINDOW closed)"
diff --git a/.claude/skills/orchestrate/scripts/run-loop.sh b/.claude/skills/orchestrate/scripts/run-loop.sh
new file mode 100755
index 0000000000..cfa7cf9a67
--- /dev/null
+++ b/.claude/skills/orchestrate/scripts/run-loop.sh
@@ -0,0 +1,164 @@
+#!/usr/bin/env bash
+# run-loop.sh — Mechanical babysitter for the agent fleet (runs in its own tmux window)
+#
+# Handles ONLY two things that need no intelligence:
+#   idle    → restart claude using --resume SESSION_ID (or --continue) to restore context
+#   approve → auto-approve safe dialogs, press Enter on numbered-option dialogs
+#
+# Everything else — ORCHESTRATOR:DONE, verification, /pr-test, final evaluation,
+# marking done, deciding to close windows — is the orchestrating Claude's job.
+# poll-cycle.sh sets state to pending_evaluation when ORCHESTRATOR:DONE is detected;
+# the orchestrator's background poll loop handles it from there.
+#
+# Usage: run-loop.sh
+# Env: POLL_INTERVAL (default: 30), ORCHESTRATOR_STATE_FILE
+
+set -euo pipefail
+
+# Copy scripts to a stable location outside the repo so they survive branch
+# checkouts (e.g. recycle-agent.sh switching spare/N back into this worktree
+# would wipe .claude/skills/orchestrate/scripts if the skill only exists on the
+# current branch).
+_ORIGIN_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +STABLE_SCRIPTS_DIR="$HOME/.claude/orchestrator/scripts" +mkdir -p "$STABLE_SCRIPTS_DIR" +cp "$_ORIGIN_DIR"/*.sh "$STABLE_SCRIPTS_DIR/" +chmod +x "$STABLE_SCRIPTS_DIR"/*.sh +SCRIPTS_DIR="$STABLE_SCRIPTS_DIR" + +STATE_FILE="${ORCHESTRATOR_STATE_FILE:-$HOME/.claude/orchestrator-state.json}" +POLL_INTERVAL="${POLL_INTERVAL:-30}" + +# --------------------------------------------------------------------------- +# update_state WINDOW FIELD VALUE +# --------------------------------------------------------------------------- +update_state() { + local window="$1" field="$2" value="$3" + jq --arg w "$window" --arg f "$field" --arg v "$value" \ + '.agents |= map(if .window == $w then .[$f] = $v else . end)' \ + "$STATE_FILE" > "${STATE_FILE}.tmp" && mv "${STATE_FILE}.tmp" "$STATE_FILE" +} + +update_state_int() { + local window="$1" field="$2" value="$3" + jq --arg w "$window" --arg f "$field" --argjson v "$value" \ + '.agents |= map(if .window == $w then .[$f] = $v else . end)' \ + "$STATE_FILE" > "${STATE_FILE}.tmp" && mv "${STATE_FILE}.tmp" "$STATE_FILE" +} + +agent_field() { + jq -r --arg w "$1" --arg f "$2" \ + '.agents[] | select(.window == $w) | .[$f] // ""' \ + "$STATE_FILE" 2>/dev/null +} + +# --------------------------------------------------------------------------- +# wait_for_prompt WINDOW — wait up to 60s for Claude's ❯ prompt +# --------------------------------------------------------------------------- +wait_for_prompt() { + local window="$1" + for i in $(seq 1 60); do + local cmd pane + cmd=$(tmux display-message -t "$window" -p '#{pane_current_command}' 2>/dev/null || echo "") + pane=$(tmux capture-pane -t "$window" -p 2>/dev/null || echo "") + if echo "$pane" | grep -q "Enter to confirm"; then + tmux send-keys -t "$window" Down Enter; sleep 2; continue + fi + [[ "$cmd" == "node" ]] && echo "$pane" | grep -q "❯" && return 0 + sleep 1 + done + return 1 # timed out +} + +# --------------------------------------------------------------------------- +# handle_kick WINDOW STATE — only for idle (crashed) agents, not stuck +# --------------------------------------------------------------------------- +handle_kick() { + local window="$1" state="$2" + [[ "$state" != "idle" ]] && return # stuck agents handled by supervisor + + local worktree_path session_id + worktree_path=$(agent_field "$window" "worktree_path") + session_id=$(agent_field "$window" "session_id") + + echo "[$(date +%H:%M:%S)] KICK restart $window — agent exited, resuming session" + + # Resume the exact session so the agent retains full context — no need to re-send objective + if [ -n "$session_id" ]; then + tmux send-keys -t "$window" "cd '${worktree_path}' && claude --resume '${session_id}' --permission-mode bypassPermissions" Enter + else + tmux send-keys -t "$window" "cd '${worktree_path}' && claude --continue --permission-mode bypassPermissions" Enter + fi + + wait_for_prompt "$window" || echo "[$(date +%H:%M:%S)] KICK WARNING $window — timed out waiting for ❯" +} + +# --------------------------------------------------------------------------- +# handle_approve WINDOW — auto-approve dialogs that need no judgment +# --------------------------------------------------------------------------- +handle_approve() { + local window="$1" + local pane_tail + pane_tail=$(tmux capture-pane -t "$window" -p 2>/dev/null | tail -3 || echo "") + + # Settings error dialog at startup + if echo "$pane_tail" | grep -q "Enter to confirm"; then + echo "[$(date +%H:%M:%S)] APPROVE 
+    tmux send-keys -t "$window" Down Enter
+    return
+  fi
+
+  # Numbered-option dialog (e.g. "Do you want to make this edit?")
+  # ❯ is already on option 1 (Yes) — Enter confirms it
+  if echo "$pane_tail" | grep -qE "❯\s*1\." || echo "$pane_tail" | grep -q "Esc to cancel"; then
+    echo "[$(date +%H:%M:%S)] APPROVE edit $window"
+    tmux send-keys -t "$window" "" Enter
+    return
+  fi
+
+  # y/n prompt for safe operations
+  if echo "$pane_tail" | grep -qiE "(^git |^npm |^pnpm |^poetry |^pytest|^docker |^make |^cargo |^pip |^yarn |curl .*(localhost|127\.0\.0\.1))"; then
+    echo "[$(date +%H:%M:%S)] APPROVE safe $window"
+    tmux send-keys -t "$window" "y" Enter
+    return
+  fi
+
+  # Anything else — supervisor handles it, just log
+  echo "[$(date +%H:%M:%S)] APPROVE skip $window — unknown dialog, supervisor will handle"
+}
+
+# ---------------------------------------------------------------------------
+# Main loop
+# ---------------------------------------------------------------------------
+echo "[$(date +%H:%M:%S)] run-loop started (mechanical only, poll every ${POLL_INTERVAL}s)"
+echo "[$(date +%H:%M:%S)] Supervisor: orchestrating Claude session (not a separate window)"
+echo "---"
+
+while true; do
+  if ! jq -e '.active == true' "$STATE_FILE" >/dev/null 2>&1; then
+    echo "[$(date +%H:%M:%S)] active=false — exiting."
+    exit 0
+  fi
+
+  ACTIONS=$("$SCRIPTS_DIR/poll-cycle.sh" 2>/dev/null || echo "[]")
+  KICKED=0; DONE=0
+
+  while IFS= read -r action; do
+    [ -z "$action" ] && continue
+    WINDOW=$(echo "$action" | jq -r '.window // ""')
+    ACTION=$(echo "$action" | jq -r '.action // ""')
+    STATE=$(echo "$action" | jq -r '.state // ""')
+
+    case "$ACTION" in
+      kick)     handle_kick "$WINDOW" "$STATE" || true; KICKED=$(( KICKED + 1 )) ;;
+      approve)  handle_approve "$WINDOW" || true ;;
+      complete) DONE=$(( DONE + 1 )) ;;  # poll-cycle already set state=pending_evaluation; orchestrator handles
+    esac
+  done < <(echo "$ACTIONS" | jq -c '.[]' 2>/dev/null || true)
+
+  RUNNING=$(jq '[.agents[] | select(.state | test("running|stuck|waiting_approval|idle"))] | length' \
+    "$STATE_FILE" 2>/dev/null || echo 0)
+
+  echo "[$(date +%H:%M:%S)] Poll — ${RUNNING} running, ${KICKED} kicked, ${DONE} pending evaluation"
+  sleep "$POLL_INTERVAL"
+done
diff --git a/.claude/skills/orchestrate/scripts/spawn-agent.sh b/.claude/skills/orchestrate/scripts/spawn-agent.sh
new file mode 100755
index 0000000000..526a32f067
--- /dev/null
+++ b/.claude/skills/orchestrate/scripts/spawn-agent.sh
@@ -0,0 +1,122 @@
+#!/usr/bin/env bash
+# spawn-agent.sh — create tmux window, checkout branch, launch claude, send task
+#
+# Usage: spawn-agent.sh SESSION WORKTREE_PATH SPARE_BRANCH NEW_BRANCH OBJECTIVE [PR_NUMBER] [STEPS...]
+#   SESSION       — tmux session name, e.g. autogpt1
+#   WORKTREE_PATH — absolute path to the git worktree
+#   SPARE_BRANCH  — spare branch being replaced, e.g. spare/6 (saved for recycle)
+#   NEW_BRANCH    — task branch to create, e.g. feat/my-feature
+#   OBJECTIVE     — task description sent to the agent
+#   PR_NUMBER     — (optional) GitHub PR number for completion verification
+#   STEPS...      — (optional) required checkpoint names, e.g. pr-address pr-test
+#
+# Stdout: SESSION:WINDOW_INDEX (nothing else — callers rely on this)
+# Exit non-zero on failure.
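+#
+# Example (illustrative values):
+#   spawn-agent.sh autogpt1 /Users/me/Code/AutoGPT6 spare/6 feat/my-feature \
+#     "Address review comments on PR #12345" 12345 pr-address pr-test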
+
+set -euo pipefail
+
+if [ $# -lt 5 ]; then
+  echo "Usage: spawn-agent.sh SESSION WORKTREE_PATH SPARE_BRANCH NEW_BRANCH OBJECTIVE [PR_NUMBER] [STEPS...]" >&2
+  exit 1
+fi
+
+SESSION="$1"
+WORKTREE_PATH="$2"
+SPARE_BRANCH="$3"
+NEW_BRANCH="$4"
+OBJECTIVE="$5"
+PR_NUMBER="${6:-}"
+STEPS=("${@:7}")
+WORKTREE_NAME=$(basename "$WORKTREE_PATH")
+STATE_FILE="${ORCHESTRATOR_STATE_FILE:-$HOME/.claude/orchestrator-state.json}"
+
+# Generate a stable session ID so this agent's Claude session can always be resumed:
+#   claude --resume $SESSION_ID --permission-mode bypassPermissions
+SESSION_ID=$(uuidgen 2>/dev/null || python3 -c "import uuid; print(uuid.uuid4())")
+
+# Create (or switch to) the task branch
+git -C "$WORKTREE_PATH" checkout -b "$NEW_BRANCH" 2>/dev/null \
+  || git -C "$WORKTREE_PATH" checkout "$NEW_BRANCH"
+
+# Open a new named tmux window; capture its numeric index
+WIN_IDX=$(tmux new-window -t "$SESSION" -n "$WORKTREE_NAME" -P -F '#{window_index}')
+WINDOW="${SESSION}:${WIN_IDX}"
+
+# Append the initial agent record to the state file so subsequent jq updates find it.
+# This must happen before the pr_number/steps update below.
+if [ -f "$STATE_FILE" ]; then
+  NOW=$(date +%s)
+  jq --arg window "$WINDOW" \
+     --arg worktree "$WORKTREE_NAME" \
+     --arg worktree_path "$WORKTREE_PATH" \
+     --arg spare_branch "$SPARE_BRANCH" \
+     --arg branch "$NEW_BRANCH" \
+     --arg objective "$OBJECTIVE" \
+     --arg session_id "$SESSION_ID" \
+     --argjson now "$NOW" \
+     '.agents += [{
+        "window": $window,
+        "worktree": $worktree,
+        "worktree_path": $worktree_path,
+        "spare_branch": $spare_branch,
+        "branch": $branch,
+        "objective": $objective,
+        "session_id": $session_id,
+        "state": "running",
+        "checkpoints": [],
+        "last_output_hash": "",
+        "last_seen_at": $now,
+        "spawned_at": $now,
+        "idle_since": 0,
+        "revision_count": 0,
+        "last_rebriefed_at": 0
+      }]' "$STATE_FILE" > "${STATE_FILE}.tmp" && mv "${STATE_FILE}.tmp" "$STATE_FILE"
+fi
+
+# Store pr_number + steps in state file if provided (enables verify-complete.sh).
+# The agent record was appended above so the jq select now finds it.
+if [ -n "$PR_NUMBER" ] && [ -f "$STATE_FILE" ]; then
+  if [ "${#STEPS[@]}" -gt 0 ]; then
+    STEPS_JSON=$(printf '%s\n' "${STEPS[@]}" | jq -R . | jq -s .)
+  else
+    STEPS_JSON='[]'
+  fi
+  jq --arg w "$WINDOW" --arg pr "$PR_NUMBER" --argjson steps "$STEPS_JSON" \
+    '.agents |= map(if .window == $w then . + {pr_number: $pr, steps: $steps, checkpoints: []} else . end)' \
+    "$STATE_FILE" > "${STATE_FILE}.tmp" && mv "${STATE_FILE}.tmp" "$STATE_FILE"
+fi
+
+# Launch claude with a stable session ID so it can always be resumed after a crash:
+#   claude --resume SESSION_ID --permission-mode bypassPermissions
+tmux send-keys -t "$WINDOW" "cd '${WORKTREE_PATH}' && claude --permission-mode bypassPermissions --session-id '${SESSION_ID}'" Enter
+
+# Wait up to 60s for claude to be fully interactive:
+# both pane_current_command == 'node' AND the '❯' prompt is visible.
+PROMPT_FOUND=false
+for i in $(seq 1 60); do
+  CMD=$(tmux display-message -t "$WINDOW" -p '#{pane_current_command}' 2>/dev/null || echo "")
+  PANE=$(tmux capture-pane -t "$WINDOW" -p 2>/dev/null || echo "")
+  if echo "$PANE" | grep -q "Enter to confirm"; then
+    tmux send-keys -t "$WINDOW" Down Enter
+    sleep 2
+    continue
+  fi
+  if [[ "$CMD" == "node" ]] && echo "$PANE" | grep -q "❯"; then
+    PROMPT_FOUND=true
+    break
+  fi
+  sleep 1
+done
+
+if ! $PROMPT_FOUND; then
+  echo "[spawn-agent] WARNING: timed out waiting for ❯ prompt on $WINDOW — sending objective anyway" >&2
+fi
+
+# Send the task. Split text and Enter — if combined, Enter can fire before the string
+# is fully buffered, leaving the message stuck as "[Pasted text +N lines]" unsent.
+tmux send-keys -t "$WINDOW" "${OBJECTIVE} Output each completed step as CHECKPOINT:<step-name>. When ALL steps are done, output ORCHESTRATOR:DONE on its own line."
+sleep 0.3
+tmux send-keys -t "$WINDOW" Enter
+
+# Only output the window address — nothing else (callers parse this)
+echo "$WINDOW"
diff --git a/.claude/skills/orchestrate/scripts/status.sh b/.claude/skills/orchestrate/scripts/status.sh
new file mode 100755
index 0000000000..d1b191c05f
--- /dev/null
+++ b/.claude/skills/orchestrate/scripts/status.sh
@@ -0,0 +1,43 @@
+#!/usr/bin/env bash
+# status.sh — print orchestrator status: state file summary + live tmux pane commands
+#
+# Usage: status.sh
+# Reads: ~/.claude/orchestrator-state.json
+
+set -euo pipefail
+
+STATE_FILE="${ORCHESTRATOR_STATE_FILE:-$HOME/.claude/orchestrator-state.json}"
+
+if [ ! -f "$STATE_FILE" ] || ! jq -e '.' "$STATE_FILE" >/dev/null 2>&1; then
+  echo "No orchestrator state found at $STATE_FILE"
+  exit 0
+fi
+
+# Header: active status, session, thresholds, last poll
+jq -r '
+  "=== Orchestrator [\(if .active then "RUNNING" else "STOPPED" end)] ===",
+  "Session: \(.tmux_session // "unknown") | Idle threshold: \(.idle_threshold_seconds // 300)s",
+  "Last poll: \(if (.last_poll_at // 0) == 0 then "never" else (.last_poll_at | strftime("%H:%M:%S")) end)",
+  ""
+' "$STATE_FILE"
+
+# Each agent: state, window, worktree/branch, truncated objective
+AGENT_COUNT=$(jq '.agents | length' "$STATE_FILE")
+if [ "$AGENT_COUNT" -eq 0 ]; then
+  echo "  (no agents registered)"
+else
+  jq -r '
+    .agents[] |
+    "  [\(.state | ascii_upcase)] \(.window)  \(.worktree)/\(.branch)",
+    "      \(.objective // "" | .[0:70])"
+  ' "$STATE_FILE"
+fi
+
+echo ""
+
+# Live pane_current_command for non-done agents
+while IFS= read -r WINDOW; do
+  [ -z "$WINDOW" ] && continue
+  CMD=$(tmux display-message -t "$WINDOW" -p '#{pane_current_command}' 2>/dev/null || echo "unreachable")
+  echo "  $WINDOW live: $CMD"
+done < <(jq -r '.agents[] | select(.state != "done") | .window' "$STATE_FILE" 2>/dev/null || true)
diff --git a/.claude/skills/orchestrate/scripts/verify-complete.sh b/.claude/skills/orchestrate/scripts/verify-complete.sh
new file mode 100644
index 0000000000..4ce6ae7eec
--- /dev/null
+++ b/.claude/skills/orchestrate/scripts/verify-complete.sh
@@ -0,0 +1,129 @@
+#!/usr/bin/env bash
+# verify-complete.sh — verify a PR task is truly done before marking the agent done
+#
+# Check order matters:
+#   1. Checkpoints — did the agent do all required steps?
+#   2. CI complete — no pending (bots post comments AFTER their check runs, must wait)
+#   3. CI passing — no failures (agent must fix before done)
+#   4. spawned_at — a new CI run was triggered after agent spawned (proves real work)
+#   5. Unresolved threads — checked AFTER CI so bot-posted comments are included
CHANGES_REQUESTED — checked AFTER CI so bot reviews are included +# +# Usage: verify-complete.sh WINDOW +# Exit 0 = verified complete; exit 1 = not complete (stderr has reason) + +set -euo pipefail + +WINDOW="$1" +STATE_FILE="${ORCHESTRATOR_STATE_FILE:-$HOME/.claude/orchestrator-state.json}" + +PR_NUMBER=$(jq -r --arg w "$WINDOW" '.agents[] | select(.window == $w) | .pr_number // ""' "$STATE_FILE" 2>/dev/null) +STEPS=$(jq -r --arg w "$WINDOW" '.agents[] | select(.window == $w) | .steps // [] | .[]' "$STATE_FILE" 2>/dev/null || true) +CHECKPOINTS=$(jq -r --arg w "$WINDOW" '.agents[] | select(.window == $w) | .checkpoints // [] | .[]' "$STATE_FILE" 2>/dev/null || true) +WORKTREE_PATH=$(jq -r --arg w "$WINDOW" '.agents[] | select(.window == $w) | .worktree_path // ""' "$STATE_FILE" 2>/dev/null) +BRANCH=$(jq -r --arg w "$WINDOW" '.agents[] | select(.window == $w) | .branch // ""' "$STATE_FILE" 2>/dev/null) +SPAWNED_AT=$(jq -r --arg w "$WINDOW" '.agents[] | select(.window == $w) | .spawned_at // "0"' "$STATE_FILE" 2>/dev/null || echo "0") + +# No PR number = cannot verify +if [ -z "$PR_NUMBER" ]; then + echo "NOT COMPLETE: no pr_number in state — set pr_number or mark done manually" >&2 + exit 1 +fi + +# --- Check 1: all required steps are checkpointed --- +MISSING="" +while IFS= read -r step; do + [ -z "$step" ] && continue + if ! echo "$CHECKPOINTS" | grep -qFx "$step"; then + MISSING="$MISSING $step" + fi +done <<< "$STEPS" + +if [ -n "$MISSING" ]; then + echo "NOT COMPLETE: missing checkpoints:$MISSING on PR #$PR_NUMBER" >&2 + exit 1 +fi + +# Resolve repo for all GitHub checks below +REPO=$(jq -r '.repo // ""' "$STATE_FILE" 2>/dev/null || echo "") +if [ -z "$REPO" ] && [ -n "$WORKTREE_PATH" ] && [ -d "$WORKTREE_PATH" ]; then + REPO=$(git -C "$WORKTREE_PATH" remote get-url origin 2>/dev/null \ + | sed 's|.*github\.com[:/]||; s|\.git$||' || echo "") +fi + +if [ -z "$REPO" ]; then + echo "Warning: cannot resolve repo — skipping CI/thread checks" >&2 + echo "VERIFIED: PR #$PR_NUMBER — checkpoints ✓ (CI/thread checks skipped — no repo)" + exit 0 +fi + +CI_BUCKETS=$(gh pr checks "$PR_NUMBER" --repo "$REPO" --json bucket 2>/dev/null || echo "[]") + +# --- Check 2: CI fully complete — no pending checks --- +# Pending checks MUST finish before we check threads/reviews: +# bots (Seer, Check PR Status, etc.) post comments and CHANGES_REQUESTED AFTER their CI check runs. 
+PENDING=$(echo "$CI_BUCKETS" | jq '[.[] | select(.bucket == "pending")] | length' 2>/dev/null || echo "0") +if [ "$PENDING" -gt 0 ]; then + PENDING_NAMES=$(gh pr checks "$PR_NUMBER" --repo "$REPO" --json bucket,name 2>/dev/null \ + | jq -r '[.[] | select(.bucket == "pending") | .name] | join(", ")' 2>/dev/null || echo "unknown") + echo "NOT COMPLETE: $PENDING CI checks still pending on PR #$PR_NUMBER ($PENDING_NAMES)" >&2 + exit 1 +fi + +# --- Check 3: CI passing — no failures --- +FAILING=$(echo "$CI_BUCKETS" | jq '[.[] | select(.bucket == "fail")] | length' 2>/dev/null || echo "0") +if [ "$FAILING" -gt 0 ]; then + FAILING_NAMES=$(gh pr checks "$PR_NUMBER" --repo "$REPO" --json bucket,name 2>/dev/null \ + | jq -r '[.[] | select(.bucket == "fail") | .name] | join(", ")' 2>/dev/null || echo "unknown") + echo "NOT COMPLETE: $FAILING failing CI checks on PR #$PR_NUMBER ($FAILING_NAMES)" >&2 + exit 1 +fi + +# --- Check 4: a new CI run was triggered AFTER the agent spawned --- +if [ -n "$BRANCH" ] && [ "${SPAWNED_AT:-0}" -gt 0 ]; then + LATEST_RUN_AT=$(gh run list --repo "$REPO" --branch "$BRANCH" \ + --json createdAt --limit 1 2>/dev/null | jq -r '.[0].createdAt // ""') + if [ -n "$LATEST_RUN_AT" ]; then + if date --version >/dev/null 2>&1; then + LATEST_RUN_EPOCH=$(date -d "$LATEST_RUN_AT" "+%s" 2>/dev/null || echo "0") + else + LATEST_RUN_EPOCH=$(TZ=UTC date -j -f "%Y-%m-%dT%H:%M:%SZ" "$LATEST_RUN_AT" "+%s" 2>/dev/null || echo "0") + fi + if [ "$LATEST_RUN_EPOCH" -le "$SPAWNED_AT" ]; then + echo "NOT COMPLETE: latest CI run on $BRANCH predates agent spawn — agent may not have pushed yet" >&2 + exit 1 + fi + fi +fi + +OWNER=$(echo "$REPO" | cut -d/ -f1) +REPONAME=$(echo "$REPO" | cut -d/ -f2) + +# --- Check 5: no unresolved review threads (checked AFTER CI — bots post after their check) --- +UNRESOLVED=$(gh api graphql -f query=" + { repository(owner: \"${OWNER}\", name: \"${REPONAME}\") { + pullRequest(number: ${PR_NUMBER}) { + reviewThreads(first: 50) { nodes { isResolved } } + } + } + } +" --jq '[.data.repository.pullRequest.reviewThreads.nodes[] | select(.isResolved == false)] | length' 2>/dev/null || echo "0") + +if [ "$UNRESOLVED" -gt 0 ]; then + echo "NOT COMPLETE: $UNRESOLVED unresolved review threads on PR #$PR_NUMBER" >&2 + exit 1 +fi + +# --- Check 6: no CHANGES_REQUESTED (checked AFTER CI — bots post reviews after their check) --- +CHANGES_REQUESTED=$(gh pr view "$PR_NUMBER" --repo "$REPO" \ + --json reviews --jq '[.reviews[] | select(.state == "CHANGES_REQUESTED")] | length' 2>/dev/null || echo "0") + +if [ "$CHANGES_REQUESTED" -gt 0 ]; then + REQUESTERS=$(gh pr view "$PR_NUMBER" --repo "$REPO" \ + --json reviews --jq '[.reviews[] | select(.state == "CHANGES_REQUESTED") | .author.login] | join(", ")' 2>/dev/null || echo "unknown") + echo "NOT COMPLETE: CHANGES_REQUESTED from ${REQUESTERS} on PR #$PR_NUMBER" >&2 + exit 1 +fi + +echo "VERIFIED: PR #$PR_NUMBER — checkpoints ✓, CI complete + green, 0 unresolved threads, no CHANGES_REQUESTED" +exit 0 diff --git a/.claude/skills/pr-address/SKILL.md b/.claude/skills/pr-address/SKILL.md index 4c6ab81e58..9a9c89e0ec 100644 --- a/.claude/skills/pr-address/SKILL.md +++ b/.claude/skills/pr-address/SKILL.md @@ -90,10 +90,12 @@ Address comments **one at a time**: fix → commit → push → inline reply → 2. Commit and push the fix 3. 
Reply **inline** (not as a new top-level comment) referencing the fixing commit — this is what resolves the conversation for bot reviewers (coderabbitai, sentry):
+Use a **markdown commit link** so GitHub renders it as a clickable reference. Get the full SHA with `git rev-parse HEAD` after committing:
+
 | Comment type | How to reply |
 |---|---|
-| Inline review (`pulls/{N}/comments`) | `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments/{ID}/replies -f body="🤖 Fixed in <commit>: <summary>"` |
-| Conversation (`issues/{N}/comments`) | `gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments -f body="🤖 Fixed in <commit>: <summary>"` |
+| Inline review (`pulls/{N}/comments`) | `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments/{ID}/replies -f body="🤖 Fixed in [abc1234](https://github.com/Significant-Gravitas/AutoGPT/commit/FULL_SHA): <summary>"` |
+| Conversation (`issues/{N}/comments`) | `gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments -f body="🤖 Fixed in [abc1234](https://github.com/Significant-Gravitas/AutoGPT/commit/FULL_SHA): <summary>"` |
 
 ## Codecov coverage
diff --git a/.claude/skills/pr-test/SKILL.md b/.claude/skills/pr-test/SKILL.md
index b915cc55ab..f11feda332 100644
--- a/.claude/skills/pr-test/SKILL.md
+++ b/.claude/skills/pr-test/SKILL.md
@@ -547,6 +547,8 @@ Upload screenshots to the PR using the GitHub Git API (no local git operations
 
 **This step is MANDATORY. Every test run MUST post a PR comment with screenshots. No exceptions.**
 
+**CRITICAL — NEVER post a bare directory link like `https://github.com/.../tree/...`.** Every screenshot MUST appear as `![name](raw_url)` inline in the PR comment so reviewers can see them without clicking any links. After posting, the verification step below greps the comment for `![` tags and exits 1 if none are found — the test run is considered incomplete until this passes.
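+
+For reference, a minimal sketch of the required line format — illustrative only: it assumes the `REPO` and `SCREENSHOTS_BRANCH` variables from the upload script below, a hypothetical local `/tmp/pr-screenshots/` directory, and screenshots committed at the branch root:
+
+```bash
+# Build one inline-image markdown line per uploaded screenshot.
+# raw.githubusercontent.com URLs render inline in PR comments; /tree/ directory links do not.
+for f in /tmp/pr-screenshots/*.png; do
+  name=$(basename "$f" .png)
+  echo "![${name}](https://raw.githubusercontent.com/${REPO}/${SCREENSHOTS_BRANCH}/${name}.png)"
+done
+```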
+
 ```bash
 # Upload screenshots via GitHub Git API (creates blobs, tree, commit, and ref remotely)
 REPO="Significant-Gravitas/AutoGPT"
@@ -584,15 +586,27 @@ TREE_JSON+=']'
 
 # Step 2: Create tree, commit, and branch ref
 TREE_SHA=$(echo "$TREE_JSON" | jq -c '{tree: .}' | gh api "repos/${REPO}/git/trees" --input - --jq '.sha')
-COMMIT_SHA=$(gh api "repos/${REPO}/git/commits" \
-  -f message="test: add E2E test screenshots for PR #${PR_NUMBER}" \
-  -f tree="$TREE_SHA" \
-  --jq '.sha')
+
+# Resolve parent commit so screenshots are chained, not orphan root commits
+PARENT_SHA=$(gh api "repos/${REPO}/git/refs/heads/${SCREENSHOTS_BRANCH}" --jq '.object.sha' 2>/dev/null || echo "")
+if [ -n "$PARENT_SHA" ]; then
+  COMMIT_SHA=$(gh api "repos/${REPO}/git/commits" \
+    -f message="test: add E2E test screenshots for PR #${PR_NUMBER}" \
+    -f tree="$TREE_SHA" \
+    -f "parents[]=$PARENT_SHA" \
+    --jq '.sha')
+else
+  COMMIT_SHA=$(gh api "repos/${REPO}/git/commits" \
+    -f message="test: add E2E test screenshots for PR #${PR_NUMBER}" \
+    -f tree="$TREE_SHA" \
+    --jq '.sha')
+fi
+
 gh api "repos/${REPO}/git/refs" \
   -f ref="refs/heads/${SCREENSHOTS_BRANCH}" \
   -f sha="$COMMIT_SHA" 2>/dev/null \
   || gh api "repos/${REPO}/git/refs/heads/${SCREENSHOTS_BRANCH}" \
-  -X PATCH -f sha="$COMMIT_SHA" -f force=true
+  -X PATCH -f sha="$COMMIT_SHA" -F force=true
 ```
 
 Then post the comment with **inline images AND explanations for each screenshot**:
@@ -658,6 +672,15 @@ INNEREOF
 
 gh api "repos/${REPO}/issues/$PR_NUMBER/comments" -F body=@"$COMMENT_FILE"
 rm -f "$COMMENT_FILE"
+
+# Verify the posted comment contains inline images — exit 1 if none found
+# gh --paginate emits one JSON array per page, so slurp (-s) all pages and
+# index the last comment of the last page — the one we just posted
+LAST_COMMENT=$(gh api "repos/${REPO}/issues/$PR_NUMBER/comments" --paginate 2>/dev/null | jq -rs '.[-1][-1].body // ""')
+if ! echo "$LAST_COMMENT" | grep -q '!\['; then
+  echo "ERROR: Posted comment contains no inline images (![). Bare directory links are not acceptable." >&2
+  exit 1
+fi
+echo "✓ Inline images verified in posted comment"
 ```
 
 **The PR comment MUST include:**
@@ -667,6 +690,103 @@ rm -f "$COMMENT_FILE"
 
 This approach uses the GitHub Git API to create blobs, trees, commits, and refs entirely server-side. No local `git checkout` or `git push` — safe for worktrees and won't interfere with the PR branch.
+
+## Step 8: Evaluate and post a formal PR review
+
+After the test comment is posted, evaluate whether the run was thorough enough to make a merge decision, then post a formal GitHub review (approve or request changes).
+**This step is mandatory — every test run MUST end with a formal review decision.**
+
+### Evaluation criteria
+
+Re-read the PR description:
+```bash
+gh pr view "$PR_NUMBER" --json body --jq '.body' --repo "$REPO"
+```
+
+Score the run against each criterion:
+
+| Criterion | Pass condition |
+|-----------|---------------|
+| **Coverage** | Every feature/change described in the PR has at least one test scenario |
+| **All scenarios pass** | No FAIL rows in the results table |
+| **Negative tests** | At least one failure-path test per feature (invalid input, unauthorized, edge case) |
+| **Before/after evidence** | Every state-changing API call has before/after values logged |
+| **Screenshots are meaningful** | Screenshots show the actual state change, not just a loading spinner or blank page |
+| **No regressions** | Existing core flows (login, agent create/run) still work |
+
+### Decision logic
+
+```
+ALL criteria pass                            → APPROVE
+Any scenario FAIL or missing PR feature      → REQUEST_CHANGES (list gaps)
+Evidence weak (no before/after, vague shots) → REQUEST_CHANGES (list what's missing)
+```
+
+### Post the review
+
+```bash
+REVIEW_FILE=$(mktemp)
+
+# Count results
+PASS_COUNT=$(echo "$TEST_RESULTS_TABLE" | grep -c "PASS" || true)
+FAIL_COUNT=$(echo "$TEST_RESULTS_TABLE" | grep -c "FAIL" || true)
+TOTAL=$(( PASS_COUNT + FAIL_COUNT ))
+
+# List any coverage gaps found during evaluation (populate this array as you assess)
+# e.g. COVERAGE_GAPS=("PR claims to add X but no test covers it")
+COVERAGE_GAPS=()
+```
+
+**If APPROVING** — all criteria met, zero failures, full coverage:
+
+```bash
+cat > "$REVIEW_FILE" <<EOF
+[approval review body — scenario counts and coverage summary; elided]
+EOF
+gh pr review "$PR_NUMBER" --repo "$REPO" --approve --body-file "$REVIEW_FILE"
+```
+
+**If REQUESTING CHANGES** — any failure, weak evidence, or coverage gap:
+
+```bash
+cat > "$REVIEW_FILE" <<EOF
+[request-changes review body — failing scenarios and gaps; elided]
+EOF
+gh pr review "$PR_NUMBER" --repo "$REPO" --request-changes --body-file "$REVIEW_FILE"
+```

[...]

+```bash
+# For a component at src/app/(platform)/library/components/AgentCard/AgentCard.tsx
+ls src/app/\(platform\)/library/components/AgentCard/__tests__/ 2>/dev/null
+```
+
+Note which targets have no tests (need new files) vs which have tests that need updating.
+
+## Step 4: Identify API endpoints used
+
+For each test target, find which API hooks are used:
+
+```bash
+# Find generated API hook imports in the changed files
+grep -rn 'from.*__generated__/endpoints' src/app/\(platform\)/library/
+grep -rn 'use[A-Z].*V[12]' src/app/\(platform\)/library/
+```
+
+For each API hook found, locate the corresponding MSW handler:
+
+```bash
+# If the page uses useGetV2ListLibraryAgents, find its MSW handlers
+grep -rn 'getGetV2ListLibraryAgents.*Handler' src/app/api/__generated__/endpoints/library/library.msw.ts
+```
+
+List every MSW handler you will need (200 for happy path, 4xx for error paths).
+
+## Step 5: Write the test plan
+
+Before writing code, output a plan as a numbered list:
+
+```
+Test plan for [branch name]:
+
+1. src/app/(platform)/library/__tests__/main.test.tsx (NEW)
+   - Renders page with agent list (MSW 200)
+   - Shows loading state
+   - Shows error state (MSW 422)
+   - Handles empty agent list
+
+2. src/app/(platform)/library/__tests__/search.test.tsx (NEW)
+   - Filters agents by search query
+   - Shows no results message
+   - Clears search
+
+3. src/app/(platform)/library/components/AgentCard/__tests__/AgentCard.test.tsx (UPDATE)
+   - Add test for new "duplicate" action
+```
+
+Present this plan to the user. Wait for confirmation before proceeding. If the user has feedback, adjust the plan.
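+
+Once the plan is confirmed, a small sketch like the following (using the illustrative paths from the plan above) can pre-create any missing `__tests__` directories before writing files:
+
+```bash
+# Create the test directories named in the approved plan (illustrative paths)
+mkdir -p "src/app/(platform)/library/__tests__" \
+         "src/app/(platform)/library/components/AgentCard/__tests__"
+```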
+## Step 6: Write the tests
+
+For each test file in the plan, follow these conventions:
+
+### File structure
+
+```tsx
+import { render, screen, waitFor } from "@/tests/integrations/test-utils";
+import { server } from "@/mocks/mock-server";
+// Import MSW handlers for endpoints the page uses
+import {
+  getGetV2ListLibraryAgentsMockHandler200,
+  getGetV2ListLibraryAgentsMockHandler422,
+} from "@/app/api/__generated__/endpoints/library/library.msw";
+// Import the component under test
+import LibraryPage from "../page";
+
+describe("LibraryPage", () => {
+  test("renders agent list from API", async () => {
+    server.use(getGetV2ListLibraryAgentsMockHandler200());
+
+    render(<LibraryPage />);
+
+    expect(await screen.findByText(/my agents/i)).toBeDefined();
+  });
+
+  test("shows error state on API failure", async () => {
+    server.use(getGetV2ListLibraryAgentsMockHandler422());
+
+    render(<LibraryPage />);
+
+    expect(await screen.findByText(/error/i)).toBeDefined();
+  });
+});
+```
+
+### Rules
+
+- Use `render()` from `@/tests/integrations/test-utils` (NOT from `@testing-library/react` directly)
+- Use `server.use()` to set up MSW handlers BEFORE rendering
+- Use `findBy*` (async) for elements that appear after data fetching — NOT `getBy*`
+- Use `getBy*` only for elements that are immediately present in the DOM
+- Use `screen` queries — do NOT destructure from `render()`
+- Use `waitFor` when asserting side effects or state changes after interactions
+- Import `fireEvent` or `userEvent` from the test-utils for interactions
+- Do NOT mock internal hooks or functions — mock at the API boundary via MSW
+- Do NOT use `act()` manually — `render` and `fireEvent` handle it
+- Keep tests focused: one behavior per test
+- Use descriptive test names that read like sentences
+
+### Test location
+
+```
+# For pages: __tests__/ next to page.tsx
+src/app/(platform)/library/__tests__/main.test.tsx
+
+# For complex standalone components: __tests__/ inside component folder
+src/app/(platform)/library/components/AgentCard/__tests__/AgentCard.test.tsx
+
+# For pure helpers: co-located .test.ts
+src/app/(platform)/library/helpers.test.ts
+```
+
+### Custom MSW overrides
+
+When the auto-generated faker data is not enough, override with specific data:
+
+```tsx
+import { http, HttpResponse } from "msw";
+
+server.use(
+  http.get("http://localhost:3000/api/proxy/api/v2/library/agents", () => {
+    return HttpResponse.json({
+      agents: [
+        { id: "1", name: "Test Agent", description: "A test agent" },
+      ],
+      pagination: { total_items: 1, total_pages: 1, page: 1, page_size: 10 },
+    });
+  }),
+);
+```
+
+Use the proxy URL pattern: `http://localhost:3000/api/proxy/api/v{version}/{path}` — this matches the MSW base URL configured in `orval.config.ts`.
+
+## Step 7: Run and verify
+
+After writing all tests:
+
+```bash
+cd autogpt_platform/frontend
+pnpm test:unit --reporter=verbose
+```
+
+If tests fail:
+1. Read the error output carefully
+2. Fix the test (not the source code, unless there is a genuine bug)
+3. 
Re-run until all pass + +Then run the full checks: + +```bash +pnpm format +pnpm lint +pnpm types +``` diff --git a/.github/workflows/platform-fullstack-ci.yml b/.github/workflows/platform-fullstack-ci.yml index fc772171b1..5020f8aa2e 100644 --- a/.github/workflows/platform-fullstack-ci.yml +++ b/.github/workflows/platform-fullstack-ci.yml @@ -179,21 +179,30 @@ jobs: pip install pyyaml # Resolve extends and generate a flat compose file that bake can understand + export NEXT_PUBLIC_SOURCEMAPS NEXT_PUBLIC_PW_TEST docker compose -f docker-compose.yml config > docker-compose.resolved.yml + # Ensure NEXT_PUBLIC_SOURCEMAPS is in resolved compose + # (docker compose config on some versions drops this arg) + if ! grep -q "NEXT_PUBLIC_SOURCEMAPS" docker-compose.resolved.yml; then + echo "Injecting NEXT_PUBLIC_SOURCEMAPS into resolved compose (docker compose config dropped it)" + sed -i '/NEXT_PUBLIC_PW_TEST/a\ NEXT_PUBLIC_SOURCEMAPS: "true"' docker-compose.resolved.yml + fi + # Add cache configuration to the resolved compose file python ../.github/workflows/scripts/docker-ci-fix-compose-build-cache.py \ --source docker-compose.resolved.yml \ --cache-from "type=gha" \ --cache-to "type=gha,mode=max" \ --backend-hash "${{ hashFiles('autogpt_platform/backend/Dockerfile', 'autogpt_platform/backend/poetry.lock', 'autogpt_platform/backend/backend/**') }}" \ - --frontend-hash "${{ hashFiles('autogpt_platform/frontend/Dockerfile', 'autogpt_platform/frontend/pnpm-lock.yaml', 'autogpt_platform/frontend/src/**') }}" \ + --frontend-hash "${{ hashFiles('autogpt_platform/frontend/Dockerfile', 'autogpt_platform/frontend/pnpm-lock.yaml', 'autogpt_platform/frontend/src/**') }}-sourcemaps" \ --git-ref "${{ github.ref }}" # Build with bake using the resolved compose file (now includes cache config) docker buildx bake --allow=fs.read=.. 
-f docker-compose.resolved.yml --load env: NEXT_PUBLIC_PW_TEST: true + NEXT_PUBLIC_SOURCEMAPS: true - name: Set up tests - Cache E2E test data id: e2e-data-cache @@ -279,6 +288,11 @@ jobs: cache: "pnpm" cache-dependency-path: autogpt_platform/frontend/pnpm-lock.yaml + - name: Copy source maps from Docker for E2E coverage + run: | + FRONTEND_CONTAINER=$(docker compose -f ../docker-compose.resolved.yml ps -q frontend) + docker cp "$FRONTEND_CONTAINER":/app/.next/static .next-static-coverage + - name: Set up tests - Install dependencies run: pnpm install --frozen-lockfile @@ -289,6 +303,15 @@ jobs: run: pnpm test:no-build continue-on-error: false + - name: Upload E2E coverage to Codecov + if: ${{ !cancelled() }} + uses: codecov/codecov-action@v5 + with: + token: ${{ secrets.CODECOV_TOKEN }} + flags: platform-frontend-e2e + files: ./autogpt_platform/frontend/coverage/e2e/cobertura-coverage.xml + disable_search: true + - name: Upload Playwright report if: always() uses: actions/upload-artifact@v4 diff --git a/.gitleaks.toml b/.gitleaks.toml new file mode 100644 index 0000000000..75867a7f50 --- /dev/null +++ b/.gitleaks.toml @@ -0,0 +1,36 @@ +title = "AutoGPT Gitleaks Config" + +[extend] +useDefault = true + +[allowlist] +description = "Global allowlist" +paths = [ + # Template/example env files (no real secrets) + '''\.env\.(default|example|template)$''', + # Lock files + '''pnpm-lock\.yaml$''', + '''poetry\.lock$''', + # Secrets baseline + '''\.secrets\.baseline$''', + # Build artifacts and caches (should not be committed) + '''__pycache__/''', + '''classic/frontend/build/''', + # Docker dev setup (local dev JWTs/keys only) + '''autogpt_platform/db/docker/''', + # Load test configs (dev JWTs) + '''load-tests/configs/''', + # Test files with fake/fixture keys (_test.py, test_*.py, conftest.py) + '''(_test|test_.*|conftest)\.py$''', + # Documentation (only contains placeholder keys in curl/API examples) + '''docs/.*\.md$''', + # Firebase config (public API keys by design) + '''google-services\.json$''', + '''classic/frontend/(lib|web)/''', +] +# CI test-only encryption key (marked DO NOT USE IN PRODUCTION) +regexes = [ + '''dvziYgz0KSK8FENhju0ZYi8''', + # LLM model name enum values falsely flagged as API keys + '''Llama-\d.*Instruct''', +] diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index 9dc1951992..b5527825ac 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -23,9 +23,15 @@ repos: - id: detect-secrets name: Detect secrets description: Detects high entropy strings that are likely to be passwords. + args: ["--baseline", ".secrets.baseline"] files: ^autogpt_platform/ - exclude: pnpm-lock\.yaml$ - stages: [pre-push] + exclude: (pnpm-lock\.yaml|\.env\.(default|example|template))$ + + - repo: https://github.com/gitleaks/gitleaks + rev: v8.24.3 + hooks: + - id: gitleaks + name: Detect secrets (gitleaks) - repo: local # For proper type checking, all dependencies need to be up-to-date. 
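
To try the new hooks locally before pushing — a hedged sketch, assuming the hook ids configured above (`gitleaks`, `detect-secrets`) and a local `detect-secrets` install:

```bash
# Run both secret scanners over the whole tree, exactly as the hooks would
pre-commit run gitleaks --all-files
pre-commit run detect-secrets --all-files

# After triaging any new finding, refresh and audit the baseline
detect-secrets scan --baseline .secrets.baseline
detect-secrets audit .secrets.baseline
```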
diff --git a/.secrets.baseline b/.secrets.baseline new file mode 100644 index 0000000000..4b3deeb6b5 --- /dev/null +++ b/.secrets.baseline @@ -0,0 +1,467 @@ +{ + "version": "1.5.0", + "plugins_used": [ + { + "name": "ArtifactoryDetector" + }, + { + "name": "AWSKeyDetector" + }, + { + "name": "AzureStorageKeyDetector" + }, + { + "name": "Base64HighEntropyString", + "limit": 4.5 + }, + { + "name": "BasicAuthDetector" + }, + { + "name": "CloudantDetector" + }, + { + "name": "DiscordBotTokenDetector" + }, + { + "name": "GitHubTokenDetector" + }, + { + "name": "GitLabTokenDetector" + }, + { + "name": "HexHighEntropyString", + "limit": 3.0 + }, + { + "name": "IbmCloudIamDetector" + }, + { + "name": "IbmCosHmacDetector" + }, + { + "name": "IPPublicDetector" + }, + { + "name": "JwtTokenDetector" + }, + { + "name": "KeywordDetector", + "keyword_exclude": "" + }, + { + "name": "MailchimpDetector" + }, + { + "name": "NpmDetector" + }, + { + "name": "OpenAIDetector" + }, + { + "name": "PrivateKeyDetector" + }, + { + "name": "PypiTokenDetector" + }, + { + "name": "SendGridDetector" + }, + { + "name": "SlackDetector" + }, + { + "name": "SoftlayerDetector" + }, + { + "name": "SquareOAuthDetector" + }, + { + "name": "StripeDetector" + }, + { + "name": "TelegramBotTokenDetector" + }, + { + "name": "TwilioKeyDetector" + } + ], + "filters_used": [ + { + "path": "detect_secrets.filters.allowlist.is_line_allowlisted" + }, + { + "path": "detect_secrets.filters.common.is_ignored_due_to_verification_policies", + "min_level": 2 + }, + { + "path": "detect_secrets.filters.heuristic.is_indirect_reference" + }, + { + "path": "detect_secrets.filters.heuristic.is_likely_id_string" + }, + { + "path": "detect_secrets.filters.heuristic.is_lock_file" + }, + { + "path": "detect_secrets.filters.heuristic.is_not_alphanumeric_string" + }, + { + "path": "detect_secrets.filters.heuristic.is_potential_uuid" + }, + { + "path": "detect_secrets.filters.heuristic.is_prefixed_with_dollar_sign" + }, + { + "path": "detect_secrets.filters.heuristic.is_sequential_string" + }, + { + "path": "detect_secrets.filters.heuristic.is_swagger_file" + }, + { + "path": "detect_secrets.filters.heuristic.is_templated_secret" + }, + { + "path": "detect_secrets.filters.regex.should_exclude_file", + "pattern": [ + "\\.env$", + "pnpm-lock\\.yaml$", + "\\.env\\.(default|example|template)$", + "__pycache__", + "_test\\.py$", + "test_.*\\.py$", + "conftest\\.py$", + "poetry\\.lock$", + "node_modules" + ] + } + ], + "results": { + "autogpt_platform/backend/backend/api/external/v1/integrations.py": [ + { + "type": "Secret Keyword", + "filename": "autogpt_platform/backend/backend/api/external/v1/integrations.py", + "hashed_secret": "665b1e3851eefefa3fb878654292f16597d25155", + "is_verified": false, + "line_number": 289 + } + ], + "autogpt_platform/backend/backend/blocks/airtable/_config.py": [ + { + "type": "Secret Keyword", + "filename": "autogpt_platform/backend/backend/blocks/airtable/_config.py", + "hashed_secret": "57e168b03afb7c1ee3cdc4ee3db2fe1cc6e0df26", + "is_verified": false, + "line_number": 29 + } + ], + "autogpt_platform/backend/backend/blocks/dataforseo/_config.py": [ + { + "type": "Secret Keyword", + "filename": "autogpt_platform/backend/backend/blocks/dataforseo/_config.py", + "hashed_secret": "32ce93887331fa5d192f2876ea15ec000c7d58b8", + "is_verified": false, + "line_number": 12 + } + ], + "autogpt_platform/backend/backend/blocks/github/checks.py": [ + { + "type": "Hex High Entropy String", + "filename": 
"autogpt_platform/backend/backend/blocks/github/checks.py", + "hashed_secret": "8ac6f92737d8586790519c5d7bfb4d2eb172c238", + "is_verified": false, + "line_number": 108 + } + ], + "autogpt_platform/backend/backend/blocks/github/ci.py": [ + { + "type": "Hex High Entropy String", + "filename": "autogpt_platform/backend/backend/blocks/github/ci.py", + "hashed_secret": "90bd1b48e958257948487b90bee080ba5ed00caa", + "is_verified": false, + "line_number": 123 + } + ], + "autogpt_platform/backend/backend/blocks/github/example_payloads/pull_request.synchronize.json": [ + { + "type": "Hex High Entropy String", + "filename": "autogpt_platform/backend/backend/blocks/github/example_payloads/pull_request.synchronize.json", + "hashed_secret": "f96896dafced7387dcd22343b8ea29d3d2c65663", + "is_verified": false, + "line_number": 42 + }, + { + "type": "Hex High Entropy String", + "filename": "autogpt_platform/backend/backend/blocks/github/example_payloads/pull_request.synchronize.json", + "hashed_secret": "b80a94d5e70bedf4f5f89d2f5a5255cc9492d12e", + "is_verified": false, + "line_number": 193 + }, + { + "type": "Hex High Entropy String", + "filename": "autogpt_platform/backend/backend/blocks/github/example_payloads/pull_request.synchronize.json", + "hashed_secret": "75b17e517fe1b3136394f6bec80c4f892da75e42", + "is_verified": false, + "line_number": 344 + }, + { + "type": "Hex High Entropy String", + "filename": "autogpt_platform/backend/backend/blocks/github/example_payloads/pull_request.synchronize.json", + "hashed_secret": "b0bfb5e4e2394e7f8906e5ed1dffd88b2bc89dd5", + "is_verified": false, + "line_number": 534 + } + ], + "autogpt_platform/backend/backend/blocks/github/statuses.py": [ + { + "type": "Hex High Entropy String", + "filename": "autogpt_platform/backend/backend/blocks/github/statuses.py", + "hashed_secret": "8ac6f92737d8586790519c5d7bfb4d2eb172c238", + "is_verified": false, + "line_number": 85 + } + ], + "autogpt_platform/backend/backend/blocks/google/docs.py": [ + { + "type": "Hex High Entropy String", + "filename": "autogpt_platform/backend/backend/blocks/google/docs.py", + "hashed_secret": "c95da0c6696342c867ef0c8258d2f74d20fd94d4", + "is_verified": false, + "line_number": 203 + } + ], + "autogpt_platform/backend/backend/blocks/google/sheets.py": [ + { + "type": "Base64 High Entropy String", + "filename": "autogpt_platform/backend/backend/blocks/google/sheets.py", + "hashed_secret": "bd5a04fa3667e693edc13239b6d310c5c7a8564b", + "is_verified": false, + "line_number": 57 + } + ], + "autogpt_platform/backend/backend/blocks/linear/_config.py": [ + { + "type": "Secret Keyword", + "filename": "autogpt_platform/backend/backend/blocks/linear/_config.py", + "hashed_secret": "b37f020f42d6d613b6ce30103e4d408c4499b3bb", + "is_verified": false, + "line_number": 53 + } + ], + "autogpt_platform/backend/backend/blocks/medium.py": [ + { + "type": "Hex High Entropy String", + "filename": "autogpt_platform/backend/backend/blocks/medium.py", + "hashed_secret": "ff998abc1ce6d8f01a675fa197368e44c8916e9c", + "is_verified": false, + "line_number": 131 + } + ], + "autogpt_platform/backend/backend/blocks/replicate/replicate_block.py": [ + { + "type": "Hex High Entropy String", + "filename": "autogpt_platform/backend/backend/blocks/replicate/replicate_block.py", + "hashed_secret": "8bbdd6f26368f58ea4011d13d7f763cb662e66f0", + "is_verified": false, + "line_number": 55 + } + ], + "autogpt_platform/backend/backend/blocks/slant3d/webhook.py": [ + { + "type": "Hex High Entropy String", + "filename": 
"autogpt_platform/backend/backend/blocks/slant3d/webhook.py", + "hashed_secret": "36263c76947443b2f6e6b78153967ac4a7da99f9", + "is_verified": false, + "line_number": 100 + } + ], + "autogpt_platform/backend/backend/blocks/talking_head.py": [ + { + "type": "Base64 High Entropy String", + "filename": "autogpt_platform/backend/backend/blocks/talking_head.py", + "hashed_secret": "44ce2d66222529eea4a32932823466fc0601c799", + "is_verified": false, + "line_number": 113 + } + ], + "autogpt_platform/backend/backend/blocks/wordpress/_config.py": [ + { + "type": "Secret Keyword", + "filename": "autogpt_platform/backend/backend/blocks/wordpress/_config.py", + "hashed_secret": "e62679512436161b78e8a8d68c8829c2a1031ccb", + "is_verified": false, + "line_number": 17 + } + ], + "autogpt_platform/backend/backend/util/cache.py": [ + { + "type": "Secret Keyword", + "filename": "autogpt_platform/backend/backend/util/cache.py", + "hashed_secret": "37f0c918c3fa47ca4a70e42037f9f123fdfbc75b", + "is_verified": false, + "line_number": 449 + } + ], + "autogpt_platform/frontend/src/app/(platform)/build/components/FlowEditor/nodes/helpers.ts": [ + { + "type": "Secret Keyword", + "filename": "autogpt_platform/frontend/src/app/(platform)/build/components/FlowEditor/nodes/helpers.ts", + "hashed_secret": "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8", + "is_verified": false, + "line_number": 6 + } + ], + "autogpt_platform/frontend/src/app/(platform)/dictionaries/en.json": [ + { + "type": "Secret Keyword", + "filename": "autogpt_platform/frontend/src/app/(platform)/dictionaries/en.json", + "hashed_secret": "8be3c943b1609fffbfc51aad666d0a04adf83c9d", + "is_verified": false, + "line_number": 5 + } + ], + "autogpt_platform/frontend/src/app/(platform)/dictionaries/es.json": [ + { + "type": "Secret Keyword", + "filename": "autogpt_platform/frontend/src/app/(platform)/dictionaries/es.json", + "hashed_secret": "5a6d1c612954979ea99ee33dbb2d231b00f6ac0a", + "is_verified": false, + "line_number": 5 + } + ], + "autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/AgentInputsReadOnly/helpers.ts": [ + { + "type": "Secret Keyword", + "filename": "autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/AgentInputsReadOnly/helpers.ts", + "hashed_secret": "cf678cab87dc1f7d1b95b964f15375e088461679", + "is_verified": false, + "line_number": 6 + }, + { + "type": "Secret Keyword", + "filename": "autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/AgentInputsReadOnly/helpers.ts", + "hashed_secret": "f72cbb45464d487064610c5411c576ca4019d380", + "is_verified": false, + "line_number": 8 + } + ], + "autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/RunAgentModal/components/ModalRunSection/helpers.ts": [ + { + "type": "Secret Keyword", + "filename": "autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/RunAgentModal/components/ModalRunSection/helpers.ts", + "hashed_secret": "cf678cab87dc1f7d1b95b964f15375e088461679", + "is_verified": false, + "line_number": 5 + }, + { + "type": "Secret Keyword", + "filename": "autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/NewAgentLibraryView/components/modals/RunAgentModal/components/ModalRunSection/helpers.ts", + "hashed_secret": "f72cbb45464d487064610c5411c576ca4019d380", + "is_verified": 
false, + "line_number": 7 + } + ], + "autogpt_platform/frontend/src/app/(platform)/profile/(user)/integrations/page.tsx": [ + { + "type": "Secret Keyword", + "filename": "autogpt_platform/frontend/src/app/(platform)/profile/(user)/integrations/page.tsx", + "hashed_secret": "cf678cab87dc1f7d1b95b964f15375e088461679", + "is_verified": false, + "line_number": 192 + }, + { + "type": "Secret Keyword", + "filename": "autogpt_platform/frontend/src/app/(platform)/profile/(user)/integrations/page.tsx", + "hashed_secret": "86275db852204937bbdbdebe5fabe8536e030ab6", + "is_verified": false, + "line_number": 193 + } + ], + "autogpt_platform/frontend/src/components/contextual/CredentialsInput/helpers.ts": [ + { + "type": "Secret Keyword", + "filename": "autogpt_platform/frontend/src/components/contextual/CredentialsInput/helpers.ts", + "hashed_secret": "47acd2028cf81b5da88ddeedb2aea4eca4b71fbd", + "is_verified": false, + "line_number": 102 + }, + { + "type": "Secret Keyword", + "filename": "autogpt_platform/frontend/src/components/contextual/CredentialsInput/helpers.ts", + "hashed_secret": "8be3c943b1609fffbfc51aad666d0a04adf83c9d", + "is_verified": false, + "line_number": 103 + } + ], + "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts": [ + { + "type": "Base64 High Entropy String", + "filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts", + "hashed_secret": "9c486c92f1a7420e1045c7ad963fbb7ba3621025", + "is_verified": false, + "line_number": 73 + }, + { + "type": "Base64 High Entropy String", + "filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts", + "hashed_secret": "9277508c7a6effc8fb59163efbfada189e35425c", + "is_verified": false, + "line_number": 75 + }, + { + "type": "Base64 High Entropy String", + "filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts", + "hashed_secret": "8dc7e2cb1d0935897d541bf5facab389b8a50340", + "is_verified": false, + "line_number": 77 + }, + { + "type": "Base64 High Entropy String", + "filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts", + "hashed_secret": "79a26ad48775944299be6aaf9fb1d5302c1ed75b", + "is_verified": false, + "line_number": 79 + }, + { + "type": "Base64 High Entropy String", + "filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts", + "hashed_secret": "a3b62b44500a1612e48d4cab8294df81561b3b1a", + "is_verified": false, + "line_number": 81 + }, + { + "type": "Base64 High Entropy String", + "filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts", + "hashed_secret": "a58979bd0b21ef4f50417d001008e60dd7a85c64", + "is_verified": false, + "line_number": 83 + }, + { + "type": "Base64 High Entropy String", + "filename": "autogpt_platform/frontend/src/lib/autogpt-server-api/utils.ts", + "hashed_secret": "6cb6e075f8e8c7c850f9d128d6608e5dbe209a79", + "is_verified": false, + "line_number": 85 + } + ], + "autogpt_platform/frontend/src/lib/constants.ts": [ + { + "type": "Secret Keyword", + "filename": "autogpt_platform/frontend/src/lib/constants.ts", + "hashed_secret": "27b924db06a28cc755fb07c54f0fddc30659fe4d", + "is_verified": false, + "line_number": 10 + } + ], + "autogpt_platform/frontend/src/tests/credentials/index.ts": [ + { + "type": "Secret Keyword", + "filename": "autogpt_platform/frontend/src/tests/credentials/index.ts", + "hashed_secret": "c18006fc138809314751cd1991f1e0b820fabd37", + "is_verified": false, + "line_number": 4 + } + ] + }, + "generated_at": "2026-04-02T13:10:54Z" +} diff --git a/AGENTS.md b/AGENTS.md index 
f88741ae3a..d0b325167c 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -30,7 +30,7 @@ See `/frontend/CONTRIBUTING.md` for complete patterns. Quick reference: - Regenerate with `pnpm generate:api` - Pattern: `use{Method}{Version}{OperationName}` 4. **Styling**: Tailwind CSS only, use design tokens, Phosphor Icons only -5. **Testing**: Add Storybook stories for new components, Playwright for E2E +5. **Testing**: Integration tests (Vitest + RTL + MSW) are the default (~90%, page-level). Playwright for E2E critical flows. Storybook for design system components. See `autogpt_platform/frontend/TESTING.md` 6. **Code conventions**: Function declarations (not arrow functions) for components/handlers - Component props should be `interface Props { ... }` (not exported) unless the interface needs to be used outside the component @@ -47,7 +47,9 @@ See `/frontend/CONTRIBUTING.md` for complete patterns. Quick reference: ## Testing - Backend: `poetry run test` (runs pytest with a docker based postgres + prisma). -- Frontend: `pnpm test` or `pnpm test-ui` for Playwright tests. See `docs/content/platform/contributing/tests.md` for tips. +- Frontend integration tests: `pnpm test:unit` (Vitest + RTL + MSW, primary testing approach). +- Frontend E2E tests: `pnpm test` or `pnpm test-ui` for Playwright tests. +- See `autogpt_platform/frontend/TESTING.md` for the full testing strategy. Always run the relevant linters and tests before committing. Use conventional commit messages for all commits (e.g. `feat(backend): add API`). diff --git a/autogpt_platform/backend/backend/api/features/chat/routes.py b/autogpt_platform/backend/backend/api/features/chat/routes.py index b4f876aea4..083ad586f9 100644 --- a/autogpt_platform/backend/backend/api/features/chat/routes.py +++ b/autogpt_platform/backend/backend/api/features/chat/routes.py @@ -15,7 +15,8 @@ from pydantic import BaseModel, ConfigDict, Field, field_validator from backend.copilot import service as chat_service from backend.copilot import stream_registry -from backend.copilot.config import ChatConfig +from backend.copilot.config import ChatConfig, CopilotMode +from backend.copilot.db import get_chat_messages_paginated from backend.copilot.executor.utils import enqueue_cancel_task, enqueue_copilot_turn from backend.copilot.model import ( ChatMessage, @@ -111,6 +112,11 @@ class StreamChatRequest(BaseModel): file_ids: list[str] | None = Field( default=None, max_length=20 ) # Workspace file IDs attached to this message + mode: CopilotMode | None = Field( + default=None, + description="Autopilot mode: 'fast' for baseline LLM, 'extended_thinking' for Claude Agent SDK. " + "If None, uses the server default (extended_thinking).", + ) class CreateSessionRequest(BaseModel): @@ -150,6 +156,8 @@ class SessionDetailResponse(BaseModel): user_id: str | None messages: list[dict] active_stream: ActiveStreamInfo | None = None # Present if stream is still active + has_more_messages: bool = False + oldest_sequence: int | None = None total_prompt_tokens: int = 0 total_completion_tokens: int = 0 metadata: ChatSessionMetadata = ChatSessionMetadata() @@ -389,60 +397,78 @@ async def update_session_title_route( async def get_session( session_id: str, user_id: Annotated[str, Security(auth.get_user_id)], + limit: int = Query(default=50, ge=1, le=200), + before_sequence: int | None = Query(default=None, ge=0), ) -> SessionDetailResponse: """ Retrieve the details of a specific chat session. 
- Looks up a chat session by ID for the given user (if authenticated) and returns all session data including messages. - If there's an active stream for this session, returns active_stream info for reconnection. + Supports cursor-based pagination via ``limit`` and ``before_sequence``. + When no pagination params are provided, returns the most recent messages. Args: session_id: The unique identifier for the desired chat session. - user_id: The optional authenticated user ID, or None for anonymous access. + user_id: The authenticated user's ID. + limit: Maximum number of messages to return (1-200, default 50). + before_sequence: Return messages with sequence < this value (cursor). Returns: - SessionDetailResponse: Details for the requested session, including active_stream info if applicable. - + SessionDetailResponse: Details for the requested session, including + active_stream info and pagination metadata. """ - session = await get_chat_session(session_id, user_id) - if not session: + page = await get_chat_messages_paginated( + session_id, limit, before_sequence, user_id=user_id + ) + if page is None: raise NotFoundError(f"Session {session_id} not found.") + messages = [message.model_dump() for message in page.messages] - messages = [message.model_dump() for message in session.messages] - - # Check if there's an active stream for this session + # Only check active stream on initial load (not on "load more" requests) active_stream_info = None - active_session, last_message_id = await stream_registry.get_active_session( - session_id, user_id - ) - logger.info( - f"[GET_SESSION] session={session_id}, active_session={active_session is not None}, " - f"msg_count={len(messages)}, last_role={messages[-1].get('role') if messages else 'none'}" - ) - if active_session: - # Keep the assistant message (including tool_calls) so the frontend can - # render the correct tool UI (e.g. CreateAgent with mini game). - # convertChatSessionToUiMessages handles isComplete=false by setting - # tool parts without output to state "input-available". 
- active_stream_info = ActiveStreamInfo( - turn_id=active_session.turn_id, - last_message_id=last_message_id, + if before_sequence is None: + active_session, last_message_id = await stream_registry.get_active_session( + session_id, user_id + ) + logger.info( + f"[GET_SESSION] session={session_id}, active_session={active_session is not None}, " + f"msg_count={len(messages)}, last_role={messages[-1].get('role') if messages else 'none'}" + ) + if active_session: + active_stream_info = ActiveStreamInfo( + turn_id=active_session.turn_id, + last_message_id=last_message_id, + ) + + # Skip session metadata on "load more" — frontend only needs messages + if before_sequence is not None: + return SessionDetailResponse( + id=page.session.session_id, + created_at=page.session.started_at.isoformat(), + updated_at=page.session.updated_at.isoformat(), + user_id=page.session.user_id or None, + messages=messages, + active_stream=None, + has_more_messages=page.has_more, + oldest_sequence=page.oldest_sequence, + total_prompt_tokens=0, + total_completion_tokens=0, ) - # Sum token usage from session - total_prompt = sum(u.prompt_tokens for u in session.usage) - total_completion = sum(u.completion_tokens for u in session.usage) + total_prompt = sum(u.prompt_tokens for u in page.session.usage) + total_completion = sum(u.completion_tokens for u in page.session.usage) return SessionDetailResponse( - id=session.session_id, - created_at=session.started_at.isoformat(), - updated_at=session.updated_at.isoformat(), - user_id=session.user_id or None, + id=page.session.session_id, + created_at=page.session.started_at.isoformat(), + updated_at=page.session.updated_at.isoformat(), + user_id=page.session.user_id or None, messages=messages, active_stream=active_stream_info, + has_more_messages=page.has_more, + oldest_sequence=page.oldest_sequence, total_prompt_tokens=total_prompt, total_completion_tokens=total_completion, - metadata=session.metadata, + metadata=page.session.metadata, ) @@ -843,6 +869,7 @@ async def stream_chat_post( file_ids=sanitized_file_ids, organization_id=ctx.org_id, team_id=ctx.team_id, + mode=request.mode, ) setup_time = (time.perf_counter() - stream_start_time) * 1000 diff --git a/autogpt_platform/backend/backend/api/features/chat/routes_test.py b/autogpt_platform/backend/backend/api/features/chat/routes_test.py index be3f0962fb..cd87fe611f 100644 --- a/autogpt_platform/backend/backend/api/features/chat/routes_test.py +++ b/autogpt_platform/backend/backend/api/features/chat/routes_test.py @@ -541,3 +541,41 @@ def test_create_session_rejects_nested_metadata( ) assert response.status_code == 422 + + +class TestStreamChatRequestModeValidation: + """Pydantic-level validation of the ``mode`` field on StreamChatRequest.""" + + def test_rejects_invalid_mode_value(self) -> None: + """Any string outside the Literal set must raise ValidationError.""" + from pydantic import ValidationError + + from backend.api.features.chat.routes import StreamChatRequest + + with pytest.raises(ValidationError): + StreamChatRequest(message="hi", mode="turbo") # type: ignore[arg-type] + + def test_accepts_fast_mode(self) -> None: + from backend.api.features.chat.routes import StreamChatRequest + + req = StreamChatRequest(message="hi", mode="fast") + assert req.mode == "fast" + + def test_accepts_extended_thinking_mode(self) -> None: + from backend.api.features.chat.routes import StreamChatRequest + + req = StreamChatRequest(message="hi", mode="extended_thinking") + assert req.mode == "extended_thinking" + + def 
test_accepts_none_mode(self) -> None: + """``mode=None`` is valid (server decides via feature flags).""" + from backend.api.features.chat.routes import StreamChatRequest + + req = StreamChatRequest(message="hi", mode=None) + assert req.mode is None + + def test_mode_defaults_to_none_when_omitted(self) -> None: + from backend.api.features.chat.routes import StreamChatRequest + + req = StreamChatRequest(message="hi") + assert req.mode is None diff --git a/autogpt_platform/backend/backend/api/features/workspace/routes.py b/autogpt_platform/backend/backend/api/features/workspace/routes.py index 8ca339edbd..39bcc6c7c4 100644 --- a/autogpt_platform/backend/backend/api/features/workspace/routes.py +++ b/autogpt_platform/backend/backend/api/features/workspace/routes.py @@ -12,7 +12,7 @@ import fastapi from autogpt_libs.auth.dependencies import get_user_id, requires_user from fastapi import Query, UploadFile from fastapi.responses import Response -from pydantic import BaseModel +from pydantic import BaseModel, Field from backend.data.workspace import ( WorkspaceFile, @@ -131,9 +131,26 @@ class StorageUsageResponse(BaseModel): file_count: int +class WorkspaceFileItem(BaseModel): + id: str + name: str + path: str + mime_type: str + size_bytes: int + metadata: dict = Field(default_factory=dict) + created_at: str + + +class ListFilesResponse(BaseModel): + files: list[WorkspaceFileItem] + offset: int = 0 + has_more: bool = False + + @router.get( "/files/{file_id}/download", summary="Download file by ID", + operation_id="getWorkspaceDownloadFileById", ) async def download_file( user_id: Annotated[str, fastapi.Security(get_user_id)], @@ -158,6 +175,7 @@ async def download_file( @router.delete( "/files/{file_id}", summary="Delete a workspace file", + operation_id="deleteWorkspaceFile", ) async def delete_workspace_file( user_id: Annotated[str, fastapi.Security(get_user_id)], @@ -183,6 +201,7 @@ async def delete_workspace_file( @router.post( "/files/upload", summary="Upload file to workspace", + operation_id="uploadWorkspaceFile", ) async def upload_file( user_id: Annotated[str, fastapi.Security(get_user_id)], @@ -196,6 +215,9 @@ async def upload_file( Files are stored in session-scoped paths when session_id is provided, so the agent's session-scoped tools can discover them automatically. """ + # Empty-string session_id drops session scoping; normalize to None. + session_id = session_id or None + config = Config() # Sanitize filename — strip any directory components @@ -250,16 +272,27 @@ async def upload_file( manager = WorkspaceManager(user_id, workspace.id, session_id) try: workspace_file = await manager.write_file( - content, filename, overwrite=overwrite + content, filename, overwrite=overwrite, metadata={"origin": "user-upload"} ) except ValueError as e: - raise fastapi.HTTPException(status_code=409, detail=str(e)) from e + # write_file raises ValueError for both path-conflict and size-limit + # cases; map each to its correct HTTP status. + message = str(e) + if message.startswith("File too large"): + raise fastapi.HTTPException(status_code=413, detail=message) from e + raise fastapi.HTTPException(status_code=409, detail=message) from e # Post-write storage check — eliminates TOCTOU race on the quota. # If a concurrent upload pushed us over the limit, undo this write. 
new_total = await get_workspace_total_size(workspace.id) if storage_limit_bytes and new_total > storage_limit_bytes: - await soft_delete_workspace_file(workspace_file.id, workspace.id) + try: + await soft_delete_workspace_file(workspace_file.id, workspace.id) + except Exception as e: + logger.warning( + f"Failed to soft-delete over-quota file {workspace_file.id} " + f"in workspace {workspace.id}: {e}" + ) raise fastapi.HTTPException( status_code=413, detail={ @@ -281,6 +314,7 @@ async def upload_file( @router.get( "/storage/usage", summary="Get workspace storage usage", + operation_id="getWorkspaceStorageUsage", ) async def get_storage_usage( user_id: Annotated[str, fastapi.Security(get_user_id)], @@ -301,3 +335,57 @@ async def get_storage_usage( used_percent=round((used_bytes / limit_bytes) * 100, 1) if limit_bytes else 0, file_count=file_count, ) + + +@router.get( + "/files", + summary="List workspace files", + operation_id="listWorkspaceFiles", +) +async def list_workspace_files( + user_id: Annotated[str, fastapi.Security(get_user_id)], + session_id: str | None = Query(default=None), + limit: int = Query(default=200, ge=1, le=1000), + offset: int = Query(default=0, ge=0), +) -> ListFilesResponse: + """ + List files in the user's workspace. + + When session_id is provided, only files for that session are returned. + Otherwise, all files across sessions are listed. Results are paginated + via `limit`/`offset`; `has_more` indicates whether additional pages exist. + """ + workspace = await get_or_create_workspace(user_id) + + # Treat empty-string session_id the same as omitted — an empty value + # would otherwise silently list files across every session instead of + # scoping to one. + session_id = session_id or None + + manager = WorkspaceManager(user_id, workspace.id, session_id) + include_all = session_id is None + # Fetch one extra to compute has_more without a separate count query. 
+ files = await manager.list_files( + limit=limit + 1, + offset=offset, + include_all_sessions=include_all, + ) + has_more = len(files) > limit + page = files[:limit] + + return ListFilesResponse( + files=[ + WorkspaceFileItem( + id=f.id, + name=f.name, + path=f.path, + mime_type=f.mime_type, + size_bytes=f.size_bytes, + metadata=f.metadata or {}, + created_at=f.created_at.isoformat(), + ) + for f in page + ], + offset=offset, + has_more=has_more, + ) diff --git a/autogpt_platform/backend/backend/api/features/workspace/routes_test.py b/autogpt_platform/backend/backend/api/features/workspace/routes_test.py index 76da67aaa1..42726ba051 100644 --- a/autogpt_platform/backend/backend/api/features/workspace/routes_test.py +++ b/autogpt_platform/backend/backend/api/features/workspace/routes_test.py @@ -1,48 +1,28 @@ -"""Tests for workspace file upload and download routes.""" - import io from datetime import datetime, timezone +from unittest.mock import AsyncMock, MagicMock, patch import fastapi import fastapi.testclient import pytest -import pytest_mock -from backend.api.features.workspace import routes as workspace_routes -from backend.data.workspace import WorkspaceFile +from backend.api.features.workspace.routes import router +from backend.data.workspace import Workspace, WorkspaceFile app = fastapi.FastAPI() -app.include_router(workspace_routes.router) +app.include_router(router) @app.exception_handler(ValueError) async def _value_error_handler( request: fastapi.Request, exc: ValueError ) -> fastapi.responses.JSONResponse: - """Mirror the production ValueError → 400 mapping from rest_api.py.""" + """Mirror the production ValueError → 400 mapping from the REST app.""" return fastapi.responses.JSONResponse(status_code=400, content={"detail": str(exc)}) client = fastapi.testclient.TestClient(app) -TEST_USER_ID = "3e53486c-cf57-477e-ba2a-cb02dc828e1a" - -MOCK_WORKSPACE = type("W", (), {"id": "ws-1"})() - -_NOW = datetime(2023, 1, 1, tzinfo=timezone.utc) - -MOCK_FILE = WorkspaceFile( - id="file-aaa-bbb", - workspace_id="ws-1", - created_at=_NOW, - updated_at=_NOW, - name="hello.txt", - path="/session/hello.txt", - mime_type="text/plain", - size_bytes=13, - storage_path="local://hello.txt", -) - @pytest.fixture(autouse=True) def setup_app_auth(mock_jwt_user): @@ -53,25 +33,201 @@ def setup_app_auth(mock_jwt_user): app.dependency_overrides.clear() +def _make_workspace(user_id: str = "test-user-id") -> Workspace: + return Workspace( + id="ws-001", + user_id=user_id, + created_at=datetime(2026, 1, 1, tzinfo=timezone.utc), + updated_at=datetime(2026, 1, 1, tzinfo=timezone.utc), + ) + + +def _make_file(**overrides) -> WorkspaceFile: + defaults = { + "id": "file-001", + "workspace_id": "ws-001", + "created_at": datetime(2026, 1, 1, tzinfo=timezone.utc), + "updated_at": datetime(2026, 1, 1, tzinfo=timezone.utc), + "name": "test.txt", + "path": "/test.txt", + "storage_path": "local://test.txt", + "mime_type": "text/plain", + "size_bytes": 100, + "checksum": None, + "is_deleted": False, + "deleted_at": None, + "metadata": {}, + } + defaults.update(overrides) + return WorkspaceFile(**defaults) + + +def _make_file_mock(**overrides) -> MagicMock: + """Create a mock WorkspaceFile to simulate DB records with null fields.""" + defaults = { + "id": "file-001", + "name": "test.txt", + "path": "/test.txt", + "mime_type": "text/plain", + "size_bytes": 100, + "metadata": {}, + "created_at": datetime(2026, 1, 1, tzinfo=timezone.utc), + } + defaults.update(overrides) + mock = MagicMock(spec=WorkspaceFile) + for k, v in 
defaults.items(): + setattr(mock, k, v) + return mock + + +# -- list_workspace_files tests -- + + +@patch("backend.api.features.workspace.routes.get_or_create_workspace") +@patch("backend.api.features.workspace.routes.WorkspaceManager") +def test_list_files_returns_all_when_no_session(mock_manager_cls, mock_get_workspace): + mock_get_workspace.return_value = _make_workspace() + files = [ + _make_file(id="f1", name="a.txt", metadata={"origin": "user-upload"}), + _make_file(id="f2", name="b.csv", metadata={"origin": "agent-created"}), + ] + mock_instance = AsyncMock() + mock_instance.list_files.return_value = files + mock_manager_cls.return_value = mock_instance + + response = client.get("/files") + assert response.status_code == 200 + + data = response.json() + assert len(data["files"]) == 2 + assert data["has_more"] is False + assert data["offset"] == 0 + assert data["files"][0]["id"] == "f1" + assert data["files"][0]["metadata"] == {"origin": "user-upload"} + assert data["files"][1]["id"] == "f2" + mock_instance.list_files.assert_called_once_with( + limit=201, offset=0, include_all_sessions=True + ) + + +@patch("backend.api.features.workspace.routes.get_or_create_workspace") +@patch("backend.api.features.workspace.routes.WorkspaceManager") +def test_list_files_scopes_to_session_when_provided( + mock_manager_cls, mock_get_workspace, test_user_id +): + mock_get_workspace.return_value = _make_workspace(user_id=test_user_id) + mock_instance = AsyncMock() + mock_instance.list_files.return_value = [] + mock_manager_cls.return_value = mock_instance + + response = client.get("/files?session_id=sess-123") + assert response.status_code == 200 + + data = response.json() + assert data["files"] == [] + assert data["has_more"] is False + mock_manager_cls.assert_called_once_with(test_user_id, "ws-001", "sess-123") + mock_instance.list_files.assert_called_once_with( + limit=201, offset=0, include_all_sessions=False + ) + + +@patch("backend.api.features.workspace.routes.get_or_create_workspace") +@patch("backend.api.features.workspace.routes.WorkspaceManager") +def test_list_files_null_metadata_coerced_to_empty_dict( + mock_manager_cls, mock_get_workspace +): + """Route uses `f.metadata or {}` for pre-existing files with null metadata.""" + mock_get_workspace.return_value = _make_workspace() + mock_instance = AsyncMock() + mock_instance.list_files.return_value = [_make_file_mock(metadata=None)] + mock_manager_cls.return_value = mock_instance + + response = client.get("/files") + assert response.status_code == 200 + assert response.json()["files"][0]["metadata"] == {} + + +# -- upload_file metadata tests -- + + +@patch("backend.api.features.workspace.routes.get_or_create_workspace") +@patch("backend.api.features.workspace.routes.get_workspace_total_size") +@patch("backend.api.features.workspace.routes.scan_content_safe") +@patch("backend.api.features.workspace.routes.WorkspaceManager") +def test_upload_passes_user_upload_origin_metadata( + mock_manager_cls, mock_scan, mock_total_size, mock_get_workspace +): + mock_get_workspace.return_value = _make_workspace() + mock_total_size.return_value = 100 + written = _make_file(id="new-file", name="doc.pdf") + mock_instance = AsyncMock() + mock_instance.write_file.return_value = written + mock_manager_cls.return_value = mock_instance + + response = client.post( + "/files/upload", + files={"file": ("doc.pdf", b"fake-pdf-content", "application/pdf")}, + ) + assert response.status_code == 200 + + mock_instance.write_file.assert_called_once() + call_kwargs = 
mock_instance.write_file.call_args + assert call_kwargs.kwargs.get("metadata") == {"origin": "user-upload"} + + +@patch("backend.api.features.workspace.routes.get_or_create_workspace") +@patch("backend.api.features.workspace.routes.get_workspace_total_size") +@patch("backend.api.features.workspace.routes.scan_content_safe") +@patch("backend.api.features.workspace.routes.WorkspaceManager") +def test_upload_returns_409_on_file_conflict( + mock_manager_cls, mock_scan, mock_total_size, mock_get_workspace +): + mock_get_workspace.return_value = _make_workspace() + mock_total_size.return_value = 100 + mock_instance = AsyncMock() + mock_instance.write_file.side_effect = ValueError("File already exists at path") + mock_manager_cls.return_value = mock_instance + + response = client.post( + "/files/upload", + files={"file": ("dup.txt", b"content", "text/plain")}, + ) + assert response.status_code == 409 + assert "already exists" in response.json()["detail"] + + +# -- Restored upload/download/delete security + invariant tests -- + + def _upload( filename: str = "hello.txt", content: bytes = b"Hello, world!", content_type: str = "text/plain", ): - """Helper to POST a file upload.""" return client.post( "/files/upload?session_id=sess-1", files={"file": (filename, io.BytesIO(content), content_type)}, ) -# ---- Happy path ---- +_MOCK_FILE = WorkspaceFile( + id="file-aaa-bbb", + workspace_id="ws-001", + created_at=datetime(2026, 1, 1, tzinfo=timezone.utc), + updated_at=datetime(2026, 1, 1, tzinfo=timezone.utc), + name="hello.txt", + path="/sessions/sess-1/hello.txt", + mime_type="text/plain", + size_bytes=13, + storage_path="local://hello.txt", +) -def test_upload_happy_path(mocker: pytest_mock.MockFixture): +def test_upload_happy_path(mocker): mocker.patch( "backend.api.features.workspace.routes.get_or_create_workspace", - return_value=MOCK_WORKSPACE, + return_value=_make_workspace(), ) mocker.patch( "backend.api.features.workspace.routes.get_workspace_total_size", @@ -82,7 +238,7 @@ def test_upload_happy_path(mocker: pytest_mock.MockFixture): return_value=None, ) mock_manager = mocker.MagicMock() - mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE) + mock_manager.write_file = mocker.AsyncMock(return_value=_MOCK_FILE) mocker.patch( "backend.api.features.workspace.routes.WorkspaceManager", return_value=mock_manager, @@ -96,10 +252,7 @@ def test_upload_happy_path(mocker: pytest_mock.MockFixture): assert data["size_bytes"] == 13 -# ---- Per-file size limit ---- - - -def test_upload_exceeds_max_file_size(mocker: pytest_mock.MockFixture): +def test_upload_exceeds_max_file_size(mocker): """Files larger than max_file_size_mb should be rejected with 413.""" cfg = mocker.patch("backend.api.features.workspace.routes.Config") cfg.return_value.max_file_size_mb = 0 # 0 MB → any content is too big @@ -109,15 +262,11 @@ def test_upload_exceeds_max_file_size(mocker: pytest_mock.MockFixture): assert response.status_code == 413 -# ---- Storage quota exceeded ---- - - -def test_upload_storage_quota_exceeded(mocker: pytest_mock.MockFixture): +def test_upload_storage_quota_exceeded(mocker): mocker.patch( "backend.api.features.workspace.routes.get_or_create_workspace", - return_value=MOCK_WORKSPACE, + return_value=_make_workspace(), ) - # Current usage already at limit mocker.patch( "backend.api.features.workspace.routes.get_workspace_total_size", return_value=500 * 1024 * 1024, @@ -128,27 +277,22 @@ def test_upload_storage_quota_exceeded(mocker: pytest_mock.MockFixture): assert "Storage limit exceeded" in 
response.text -# ---- Post-write quota race (B2) ---- - - -def test_upload_post_write_quota_race(mocker: pytest_mock.MockFixture): - """If a concurrent upload tips the total over the limit after write, - the file should be soft-deleted and 413 returned.""" +def test_upload_post_write_quota_race(mocker): + """Concurrent upload tipping over limit after write should soft-delete + 413.""" mocker.patch( "backend.api.features.workspace.routes.get_or_create_workspace", - return_value=MOCK_WORKSPACE, + return_value=_make_workspace(), ) - # Pre-write check passes (under limit), but post-write check fails mocker.patch( "backend.api.features.workspace.routes.get_workspace_total_size", - side_effect=[0, 600 * 1024 * 1024], # first call OK, second over limit + side_effect=[0, 600 * 1024 * 1024], ) mocker.patch( "backend.api.features.workspace.routes.scan_content_safe", return_value=None, ) mock_manager = mocker.MagicMock() - mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE) + mock_manager.write_file = mocker.AsyncMock(return_value=_MOCK_FILE) mocker.patch( "backend.api.features.workspace.routes.WorkspaceManager", return_value=mock_manager, @@ -160,17 +304,14 @@ def test_upload_post_write_quota_race(mocker: pytest_mock.MockFixture): response = _upload() assert response.status_code == 413 - mock_delete.assert_called_once_with("file-aaa-bbb", "ws-1") + mock_delete.assert_called_once_with("file-aaa-bbb", "ws-001") -# ---- Any extension accepted (no allowlist) ---- - - -def test_upload_any_extension(mocker: pytest_mock.MockFixture): +def test_upload_any_extension(mocker): """Any file extension should be accepted — ClamAV is the security layer.""" mocker.patch( "backend.api.features.workspace.routes.get_or_create_workspace", - return_value=MOCK_WORKSPACE, + return_value=_make_workspace(), ) mocker.patch( "backend.api.features.workspace.routes.get_workspace_total_size", @@ -181,7 +322,7 @@ def test_upload_any_extension(mocker: pytest_mock.MockFixture): return_value=None, ) mock_manager = mocker.MagicMock() - mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE) + mock_manager.write_file = mocker.AsyncMock(return_value=_MOCK_FILE) mocker.patch( "backend.api.features.workspace.routes.WorkspaceManager", return_value=mock_manager, @@ -191,16 +332,13 @@ def test_upload_any_extension(mocker: pytest_mock.MockFixture): assert response.status_code == 200 -# ---- Virus scan rejection ---- - - -def test_upload_blocked_by_virus_scan(mocker: pytest_mock.MockFixture): +def test_upload_blocked_by_virus_scan(mocker): """Files flagged by ClamAV should be rejected and never written to storage.""" from backend.api.features.store.exceptions import VirusDetectedError mocker.patch( "backend.api.features.workspace.routes.get_or_create_workspace", - return_value=MOCK_WORKSPACE, + return_value=_make_workspace(), ) mocker.patch( "backend.api.features.workspace.routes.get_workspace_total_size", @@ -211,7 +349,7 @@ def test_upload_blocked_by_virus_scan(mocker: pytest_mock.MockFixture): side_effect=VirusDetectedError("Eicar-Test-Signature"), ) mock_manager = mocker.MagicMock() - mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE) + mock_manager.write_file = mocker.AsyncMock(return_value=_MOCK_FILE) mocker.patch( "backend.api.features.workspace.routes.WorkspaceManager", return_value=mock_manager, @@ -219,18 +357,14 @@ def test_upload_blocked_by_virus_scan(mocker: pytest_mock.MockFixture): response = _upload(filename="evil.exe", content=b"X5O!P%@AP...") assert response.status_code == 400 - 
assert "Virus detected" in response.text mock_manager.write_file.assert_not_called() -# ---- No file extension ---- - - -def test_upload_file_without_extension(mocker: pytest_mock.MockFixture): +def test_upload_file_without_extension(mocker): """Files without an extension should be accepted and stored as-is.""" mocker.patch( "backend.api.features.workspace.routes.get_or_create_workspace", - return_value=MOCK_WORKSPACE, + return_value=_make_workspace(), ) mocker.patch( "backend.api.features.workspace.routes.get_workspace_total_size", @@ -241,7 +375,7 @@ def test_upload_file_without_extension(mocker: pytest_mock.MockFixture): return_value=None, ) mock_manager = mocker.MagicMock() - mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE) + mock_manager.write_file = mocker.AsyncMock(return_value=_MOCK_FILE) mocker.patch( "backend.api.features.workspace.routes.WorkspaceManager", return_value=mock_manager, @@ -257,14 +391,11 @@ def test_upload_file_without_extension(mocker: pytest_mock.MockFixture): assert mock_manager.write_file.call_args[0][1] == "Makefile" -# ---- Filename sanitization (SF5) ---- - - -def test_upload_strips_path_components(mocker: pytest_mock.MockFixture): +def test_upload_strips_path_components(mocker): """Path-traversal filenames should be reduced to their basename.""" mocker.patch( "backend.api.features.workspace.routes.get_or_create_workspace", - return_value=MOCK_WORKSPACE, + return_value=_make_workspace(), ) mocker.patch( "backend.api.features.workspace.routes.get_workspace_total_size", @@ -275,28 +406,23 @@ def test_upload_strips_path_components(mocker: pytest_mock.MockFixture): return_value=None, ) mock_manager = mocker.MagicMock() - mock_manager.write_file = mocker.AsyncMock(return_value=MOCK_FILE) + mock_manager.write_file = mocker.AsyncMock(return_value=_MOCK_FILE) mocker.patch( "backend.api.features.workspace.routes.WorkspaceManager", return_value=mock_manager, ) - # Filename with traversal _upload(filename="../../etc/passwd.txt") - # write_file should have been called with just the basename mock_manager.write_file.assert_called_once() call_args = mock_manager.write_file.call_args assert call_args[0][1] == "passwd.txt" -# ---- Download ---- - - -def test_download_file_not_found(mocker: pytest_mock.MockFixture): +def test_download_file_not_found(mocker): mocker.patch( "backend.api.features.workspace.routes.get_workspace", - return_value=MOCK_WORKSPACE, + return_value=_make_workspace(), ) mocker.patch( "backend.api.features.workspace.routes.get_workspace_file", @@ -307,14 +433,11 @@ def test_download_file_not_found(mocker: pytest_mock.MockFixture): assert response.status_code == 404 -# ---- Delete ---- - - -def test_delete_file_success(mocker: pytest_mock.MockFixture): +def test_delete_file_success(mocker): """Deleting an existing file should return {"deleted": true}.""" mocker.patch( "backend.api.features.workspace.routes.get_workspace", - return_value=MOCK_WORKSPACE, + return_value=_make_workspace(), ) mock_manager = mocker.MagicMock() mock_manager.delete_file = mocker.AsyncMock(return_value=True) @@ -329,11 +452,11 @@ def test_delete_file_success(mocker: pytest_mock.MockFixture): mock_manager.delete_file.assert_called_once_with("file-aaa-bbb") -def test_delete_file_not_found(mocker: pytest_mock.MockFixture): +def test_delete_file_not_found(mocker): """Deleting a non-existent file should return 404.""" mocker.patch( "backend.api.features.workspace.routes.get_workspace", - return_value=MOCK_WORKSPACE, + return_value=_make_workspace(), ) mock_manager = 
mocker.MagicMock() mock_manager.delete_file = mocker.AsyncMock(return_value=False) @@ -347,7 +470,7 @@ def test_delete_file_not_found(mocker: pytest_mock.MockFixture): assert "File not found" in response.text -def test_delete_file_no_workspace(mocker: pytest_mock.MockFixture): +def test_delete_file_no_workspace(mocker): """Deleting when user has no workspace should return 404.""" mocker.patch( "backend.api.features.workspace.routes.get_workspace", @@ -357,3 +480,123 @@ def test_delete_file_no_workspace(mocker: pytest_mock.MockFixture): response = client.delete("/files/file-aaa-bbb") assert response.status_code == 404 assert "Workspace not found" in response.text + + +def test_upload_write_file_too_large_returns_413(mocker): + """write_file raises ValueError("File too large: …") → must map to 413.""" + mocker.patch( + "backend.api.features.workspace.routes.get_or_create_workspace", + return_value=_make_workspace(), + ) + mocker.patch( + "backend.api.features.workspace.routes.get_workspace_total_size", + return_value=0, + ) + mocker.patch( + "backend.api.features.workspace.routes.scan_content_safe", + return_value=None, + ) + mock_manager = mocker.MagicMock() + mock_manager.write_file = mocker.AsyncMock( + side_effect=ValueError("File too large: 900 bytes exceeds 1MB limit") + ) + mocker.patch( + "backend.api.features.workspace.routes.WorkspaceManager", + return_value=mock_manager, + ) + + response = _upload() + assert response.status_code == 413 + assert "File too large" in response.text + + +def test_upload_write_file_conflict_returns_409(mocker): + """Non-'File too large' ValueErrors from write_file stay as 409.""" + mocker.patch( + "backend.api.features.workspace.routes.get_or_create_workspace", + return_value=_make_workspace(), + ) + mocker.patch( + "backend.api.features.workspace.routes.get_workspace_total_size", + return_value=0, + ) + mocker.patch( + "backend.api.features.workspace.routes.scan_content_safe", + return_value=None, + ) + mock_manager = mocker.MagicMock() + mock_manager.write_file = mocker.AsyncMock( + side_effect=ValueError("File already exists at path: /sessions/x/a.txt") + ) + mocker.patch( + "backend.api.features.workspace.routes.WorkspaceManager", + return_value=mock_manager, + ) + + response = _upload() + assert response.status_code == 409 + assert "already exists" in response.text + + +@patch("backend.api.features.workspace.routes.get_or_create_workspace") +@patch("backend.api.features.workspace.routes.WorkspaceManager") +def test_list_files_has_more_true_when_limit_exceeded( + mock_manager_cls, mock_get_workspace +): + """The limit+1 fetch trick must flip has_more=True and trim the page.""" + mock_get_workspace.return_value = _make_workspace() + # Backend was asked for limit+1=3, and returned exactly 3 items. 
+ files = [ + _make_file(id="f1", name="a.txt"), + _make_file(id="f2", name="b.txt"), + _make_file(id="f3", name="c.txt"), + ] + mock_instance = AsyncMock() + mock_instance.list_files.return_value = files + mock_manager_cls.return_value = mock_instance + + response = client.get("/files?limit=2") + assert response.status_code == 200 + data = response.json() + assert data["has_more"] is True + assert len(data["files"]) == 2 + assert data["files"][0]["id"] == "f1" + assert data["files"][1]["id"] == "f2" + mock_instance.list_files.assert_called_once_with( + limit=3, offset=0, include_all_sessions=True + ) + + +@patch("backend.api.features.workspace.routes.get_or_create_workspace") +@patch("backend.api.features.workspace.routes.WorkspaceManager") +def test_list_files_has_more_false_when_exactly_page_size( + mock_manager_cls, mock_get_workspace +): + """Exactly `limit` rows means we're on the last page — has_more=False.""" + mock_get_workspace.return_value = _make_workspace() + files = [_make_file(id="f1", name="a.txt"), _make_file(id="f2", name="b.txt")] + mock_instance = AsyncMock() + mock_instance.list_files.return_value = files + mock_manager_cls.return_value = mock_instance + + response = client.get("/files?limit=2") + assert response.status_code == 200 + data = response.json() + assert data["has_more"] is False + assert len(data["files"]) == 2 + + +@patch("backend.api.features.workspace.routes.get_or_create_workspace") +@patch("backend.api.features.workspace.routes.WorkspaceManager") +def test_list_files_offset_is_echoed_back(mock_manager_cls, mock_get_workspace): + mock_get_workspace.return_value = _make_workspace() + mock_instance = AsyncMock() + mock_instance.list_files.return_value = [] + mock_manager_cls.return_value = mock_instance + + response = client.get("/files?offset=50&limit=10") + assert response.status_code == 200 + assert response.json()["offset"] == 50 + mock_instance.list_files.assert_called_once_with( + limit=11, offset=50, include_all_sessions=True + ) diff --git a/autogpt_platform/backend/backend/blocks/llm.py b/autogpt_platform/backend/backend/blocks/llm.py index e3e34c9968..66f87b7f47 100644 --- a/autogpt_platform/backend/backend/blocks/llm.py +++ b/autogpt_platform/backend/backend/blocks/llm.py @@ -205,6 +205,19 @@ class LlmModel(str, Enum, metaclass=LlmModelMeta): KIMI_K2 = "moonshotai/kimi-k2" QWEN3_235B_A22B_THINKING = "qwen/qwen3-235b-a22b-thinking-2507" QWEN3_CODER = "qwen/qwen3-coder" + # Z.ai (Zhipu) models + ZAI_GLM_4_32B = "z-ai/glm-4-32b" + ZAI_GLM_4_5 = "z-ai/glm-4.5" + ZAI_GLM_4_5_AIR = "z-ai/glm-4.5-air" + ZAI_GLM_4_5_AIR_FREE = "z-ai/glm-4.5-air:free" + ZAI_GLM_4_5V = "z-ai/glm-4.5v" + ZAI_GLM_4_6 = "z-ai/glm-4.6" + ZAI_GLM_4_6V = "z-ai/glm-4.6v" + ZAI_GLM_4_7 = "z-ai/glm-4.7" + ZAI_GLM_4_7_FLASH = "z-ai/glm-4.7-flash" + ZAI_GLM_5 = "z-ai/glm-5" + ZAI_GLM_5_TURBO = "z-ai/glm-5-turbo" + ZAI_GLM_5V_TURBO = "z-ai/glm-5v-turbo" # Llama API models LLAMA_API_LLAMA_4_SCOUT = "Llama-4-Scout-17B-16E-Instruct-FP8" LLAMA_API_LLAMA4_MAVERICK = "Llama-4-Maverick-17B-128E-Instruct-FP8" @@ -630,6 +643,43 @@ MODEL_METADATA = { LlmModel.QWEN3_CODER: ModelMetadata( "open_router", 262144, 262144, "Qwen 3 Coder", "OpenRouter", "Qwen", 3 ), + # https://openrouter.ai/models?q=z-ai + LlmModel.ZAI_GLM_4_32B: ModelMetadata( + "open_router", 128000, 128000, "GLM 4 32B", "OpenRouter", "Z.ai", 1 + ), + LlmModel.ZAI_GLM_4_5: ModelMetadata( + "open_router", 131072, 98304, "GLM 4.5", "OpenRouter", "Z.ai", 2 + ), + LlmModel.ZAI_GLM_4_5_AIR: ModelMetadata( + "open_router", 131072, 
98304, "GLM 4.5 Air", "OpenRouter", "Z.ai", 1 + ), + LlmModel.ZAI_GLM_4_5_AIR_FREE: ModelMetadata( + "open_router", 131072, 96000, "GLM 4.5 Air (Free)", "OpenRouter", "Z.ai", 1 + ), + LlmModel.ZAI_GLM_4_5V: ModelMetadata( + "open_router", 65536, 16384, "GLM 4.5V", "OpenRouter", "Z.ai", 2 + ), + LlmModel.ZAI_GLM_4_6: ModelMetadata( + "open_router", 204800, 204800, "GLM 4.6", "OpenRouter", "Z.ai", 1 + ), + LlmModel.ZAI_GLM_4_6V: ModelMetadata( + "open_router", 131072, 131072, "GLM 4.6V", "OpenRouter", "Z.ai", 1 + ), + LlmModel.ZAI_GLM_4_7: ModelMetadata( + "open_router", 202752, 65535, "GLM 4.7", "OpenRouter", "Z.ai", 1 + ), + LlmModel.ZAI_GLM_4_7_FLASH: ModelMetadata( + "open_router", 202752, 202752, "GLM 4.7 Flash", "OpenRouter", "Z.ai", 1 + ), + LlmModel.ZAI_GLM_5: ModelMetadata( + "open_router", 80000, 80000, "GLM 5", "OpenRouter", "Z.ai", 2 + ), + LlmModel.ZAI_GLM_5_TURBO: ModelMetadata( + "open_router", 202752, 131072, "GLM 5 Turbo", "OpenRouter", "Z.ai", 3 + ), + LlmModel.ZAI_GLM_5V_TURBO: ModelMetadata( + "open_router", 202752, 131072, "GLM 5V Turbo", "OpenRouter", "Z.ai", 3 + ), # Llama API models LlmModel.LLAMA_API_LLAMA_4_SCOUT: ModelMetadata( "llama_api", diff --git a/autogpt_platform/backend/backend/copilot/baseline/service.py b/autogpt_platform/backend/backend/copilot/baseline/service.py index 379686b64d..abbe159b9b 100644 --- a/autogpt_platform/backend/backend/copilot/baseline/service.py +++ b/autogpt_platform/backend/backend/copilot/baseline/service.py @@ -7,22 +7,29 @@ shared tool registry as the SDK path. """ import asyncio +import base64 import logging +import os +import re +import shutil +import tempfile import uuid from collections.abc import AsyncGenerator, Sequence from dataclasses import dataclass, field from functools import partial -from typing import Any, cast +from typing import TYPE_CHECKING, Any, cast import orjson from langfuse import propagate_attributes from openai.types.chat import ChatCompletionMessageParam, ChatCompletionToolParam -from backend.copilot.context import set_execution_context +from backend.copilot.config import CopilotMode +from backend.copilot.context import get_workspace_manager, set_execution_context from backend.copilot.model import ( ChatMessage, ChatSession, get_chat_session, + maybe_append_user_message, update_session_title, upsert_chat_session, ) @@ -51,6 +58,15 @@ from backend.copilot.service import ( from backend.copilot.token_tracking import persist_and_record_usage from backend.copilot.tools import execute_tool, get_available_tools from backend.copilot.tracking import track_user_message +from backend.copilot.transcript import ( + STOP_REASON_END_TURN, + STOP_REASON_TOOL_USE, + TranscriptDownload, + download_transcript, + upload_transcript, + validate_transcript, +) +from backend.copilot.transcript_builder import TranscriptBuilder from backend.util.exceptions import NotFoundError from backend.util.prompt import ( compress_context, @@ -64,6 +80,9 @@ from backend.util.tool_call_loop import ( tool_call_loop, ) +if TYPE_CHECKING: + from backend.copilot.permissions import CopilotPermissions + logger = logging.getLogger(__name__) # Set to hold background tasks to prevent garbage collection @@ -72,6 +91,233 @@ _background_tasks: set[asyncio.Task[Any]] = set() # Maximum number of tool-call rounds before forcing a text response. _MAX_TOOL_ROUNDS = 30 +# Max seconds to wait for transcript upload in the finally block before +# letting it continue as a background task (tracked in _background_tasks). 
+_TRANSCRIPT_UPLOAD_TIMEOUT_S = 5 + +# MIME types that can be embedded as vision content blocks (OpenAI format). +_VISION_MIME_TYPES = frozenset({"image/png", "image/jpeg", "image/gif", "image/webp"}) + +# Max size for embedding images directly in the user message (20 MiB raw). +_MAX_INLINE_IMAGE_BYTES = 20 * 1024 * 1024 + +# Matches characters unsafe for filenames. +_UNSAFE_FILENAME = re.compile(r"[^\w.\-]") + + +async def _prepare_baseline_attachments( + file_ids: list[str], + user_id: str, + session_id: str, + working_dir: str, +) -> tuple[str, list[dict[str, Any]]]: + """Download workspace files and prepare them for the baseline LLM. + + Images become OpenAI-format vision content blocks. Non-image files are + saved to *working_dir* so tool handlers can access them. + + Returns ``(hint_text, image_blocks)``. + """ + if not file_ids or not user_id: + return "", [] + + try: + manager = await get_workspace_manager(user_id, session_id) + except Exception: + logger.warning( + "Failed to create workspace manager for file attachments", + exc_info=True, + ) + return "", [] + + image_blocks: list[dict[str, Any]] = [] + file_descriptions: list[str] = [] + + for fid in file_ids: + try: + file_info = await manager.get_file_info(fid) + if file_info is None: + continue + content = await manager.read_file_by_id(fid) + mime = (file_info.mime_type or "").split(";")[0].strip().lower() + + if mime in _VISION_MIME_TYPES and len(content) <= _MAX_INLINE_IMAGE_BYTES: + b64 = base64.b64encode(content).decode("ascii") + image_blocks.append( + { + "type": "image", + "source": {"type": "base64", "media_type": mime, "data": b64}, + } + ) + file_descriptions.append( + f"- {file_info.name} ({mime}, " + f"{file_info.size_bytes:,} bytes) [embedded as image]" + ) + else: + safe = _UNSAFE_FILENAME.sub("_", file_info.name) or "file" + candidate = os.path.join(working_dir, safe) + if os.path.exists(candidate): + stem, ext = os.path.splitext(safe) + idx = 1 + while os.path.exists(candidate): + candidate = os.path.join(working_dir, f"{stem}_{idx}{ext}") + idx += 1 + with open(candidate, "wb") as f: + f.write(content) + file_descriptions.append( + f"- {file_info.name} ({mime}, " + f"{file_info.size_bytes:,} bytes) saved to " + f"{os.path.basename(candidate)}" + ) + except Exception: + logger.warning("Failed to prepare file %s", fid[:12], exc_info=True) + + if not file_descriptions: + return "", [] + + noun = "file" if len(file_descriptions) == 1 else "files" + has_non_images = len(file_descriptions) > len(image_blocks) + read_hint = ( + " Use the read_workspace_file tool to view non-image files." + if has_non_images + else "" + ) + hint = ( + f"\n[The user attached {len(file_descriptions)} {noun}.{read_hint}\n" + + "\n".join(file_descriptions) + + "]" + ) + return hint, image_blocks + + +def _filter_tools_by_permissions( + tools: list[ChatCompletionToolParam], + permissions: "CopilotPermissions", +) -> list[ChatCompletionToolParam]: + """Filter OpenAI-format tools based on CopilotPermissions. + + Uses short tool names (the ``function.name`` field) to compute the + effective allowed set, then keeps only matching tools. 
+ """ + from backend.copilot.permissions import all_known_tool_names + + if permissions.is_empty(): + return tools + + all_tools = all_known_tool_names() + effective = permissions.effective_allowed_tools(all_tools) + + return [ + t + for t in tools + if t.get("function", {}).get("name") in effective # type: ignore[union-attr] + ] + + +def _resolve_baseline_model(mode: CopilotMode | None) -> str: + """Pick the model for the baseline path based on the per-request mode. + + Only ``mode='fast'`` downgrades to the cheaper/faster model. Any other + value (including ``None`` and ``'extended_thinking'``) preserves the + default model so that users who never select a mode don't get + silently moved to the cheaper tier. + """ + if mode == "fast": + return config.fast_model + return config.model + + +# Tag pairs to strip from baseline streaming output. Different models use +# different tag names for their internal reasoning (Claude uses , +# Gemini uses , etc.). +_REASONING_TAG_PAIRS: list[tuple[str, str]] = [ + ("", ""), + ("", ""), +] + +# Longest opener — used to size the partial-tag buffer. +_MAX_OPEN_TAG_LEN = max(len(o) for o, _ in _REASONING_TAG_PAIRS) + + +class _ThinkingStripper: + """Strip reasoning blocks from a stream of text deltas. + + Handles multiple tag patterns (````, ````, + etc.) so the same stripper works across Claude, Gemini, and other models. + + Buffers just enough characters to detect a tag that may be split + across chunks; emits text immediately when no tag is in-flight. + Robust to single chunks that open and close a block, multiple + blocks per stream, and tags that straddle chunk boundaries. + """ + + def __init__(self) -> None: + self._buffer: str = "" + self._in_thinking: bool = False + self._close_tag: str = "" # closing tag for the currently open block + + def _find_open_tag(self) -> tuple[int, str, str]: + """Find the earliest opening tag in the buffer. + + Returns (position, open_tag, close_tag) or (-1, "", "") if none. + """ + best_pos = -1 + best_open = "" + best_close = "" + for open_tag, close_tag in _REASONING_TAG_PAIRS: + pos = self._buffer.find(open_tag) + if pos != -1 and (best_pos == -1 or pos < best_pos): + best_pos = pos + best_open = open_tag + best_close = close_tag + return best_pos, best_open, best_close + + def process(self, chunk: str) -> str: + """Feed a chunk and return the text that is safe to emit now.""" + self._buffer += chunk + out: list[str] = [] + while self._buffer: + if self._in_thinking: + end = self._buffer.find(self._close_tag) + if end == -1: + keep = len(self._close_tag) - 1 + self._buffer = self._buffer[-keep:] if keep else "" + return "".join(out) + self._buffer = self._buffer[end + len(self._close_tag) :] + self._in_thinking = False + self._close_tag = "" + else: + start, open_tag, close_tag = self._find_open_tag() + if start == -1: + # No opening tag; emit everything except a tail that + # could start a partial opener on the next chunk. 
+ safe_end = len(self._buffer) + for keep in range( + min(_MAX_OPEN_TAG_LEN - 1, len(self._buffer)), 0, -1 + ): + tail = self._buffer[-keep:] + if any(o[:keep] == tail for o, _ in _REASONING_TAG_PAIRS): + safe_end = len(self._buffer) - keep + break + out.append(self._buffer[:safe_end]) + self._buffer = self._buffer[safe_end:] + return "".join(out) + out.append(self._buffer[:start]) + self._buffer = self._buffer[start + len(open_tag) :] + self._in_thinking = True + self._close_tag = close_tag + return "".join(out) + + def flush(self) -> str: + """Return any remaining emittable text when the stream ends.""" + if self._in_thinking: + # Unclosed thinking block — discard the buffered reasoning. + self._buffer = "" + return "" + out = self._buffer + self._buffer = "" + return out + @dataclass class _BaselineStreamState: @@ -81,12 +327,15 @@ class _BaselineStreamState: can be module-level functions instead of deeply nested closures. """ + model: str = "" pending_events: list[StreamBaseResponse] = field(default_factory=list) assistant_text: str = "" text_block_id: str = field(default_factory=lambda: str(uuid.uuid4())) text_started: bool = False turn_prompt_tokens: int = 0 turn_completion_tokens: int = 0 + thinking_stripper: _ThinkingStripper = field(default_factory=_ThinkingStripper) + session_messages: list[ChatMessage] = field(default_factory=list) async def _baseline_llm_caller( @@ -100,6 +349,9 @@ async def _baseline_llm_caller( Extracted from ``stream_chat_completion_baseline`` for readability. """ state.pending_events.append(StreamStartStep()) + # Fresh thinking-strip state per round so a malformed unclosed + # block in one LLM call cannot silently drop content in the next. + state.thinking_stripper = _ThinkingStripper() round_text = "" try: @@ -108,7 +360,7 @@ async def _baseline_llm_caller( if tools: typed_tools = cast(list[ChatCompletionToolParam], tools) response = await client.chat.completions.create( - model=config.model, + model=state.model, messages=typed_messages, tools=typed_tools, stream=True, @@ -116,7 +368,7 @@ async def _baseline_llm_caller( ) else: response = await client.chat.completions.create( - model=config.model, + model=state.model, messages=typed_messages, stream=True, stream_options={"include_usage": True}, @@ -133,13 +385,17 @@ async def _baseline_llm_caller( continue if delta.content: - if not state.text_started: - state.pending_events.append(StreamTextStart(id=state.text_block_id)) - state.text_started = True - round_text += delta.content - state.pending_events.append( - StreamTextDelta(id=state.text_block_id, delta=delta.content) - ) + emit = state.thinking_stripper.process(delta.content) + if emit: + if not state.text_started: + state.pending_events.append( + StreamTextStart(id=state.text_block_id) + ) + state.text_started = True + round_text += emit + state.pending_events.append( + StreamTextDelta(id=state.text_block_id, delta=emit) + ) if delta.tool_calls: for tc in delta.tool_calls: @@ -158,6 +414,16 @@ async def _baseline_llm_caller( if tc.function and tc.function.arguments: entry["arguments"] += tc.function.arguments + # Flush any buffered text held back by the thinking stripper. 
+ tail = state.thinking_stripper.flush() + if tail: + if not state.text_started: + state.pending_events.append(StreamTextStart(id=state.text_block_id)) + state.text_started = True + round_text += tail + state.pending_events.append( + StreamTextDelta(id=state.text_block_id, delta=tail) + ) # Close text block if state.text_started: state.pending_events.append(StreamTextEnd(id=state.text_block_id)) @@ -278,17 +544,17 @@ async def _baseline_tool_executor( ) -def _baseline_conversation_updater( +def _mutate_openai_messages( messages: list[dict[str, Any]], response: LLMLoopResponse, - tool_results: list[ToolCallResult] | None = None, + tool_results: list[ToolCallResult] | None, ) -> None: - """Update OpenAI message list with assistant response + tool results. + """Append assistant / tool-result entries to the OpenAI message list. - Extracted from ``stream_chat_completion_baseline`` for readability. + This is the side-effect boundary for the next LLM call — no transcript + mutation happens here. """ if tool_results: - # Build assistant message with tool_calls assistant_msg: dict[str, Any] = {"role": "assistant"} if response.response_text: assistant_msg["content"] = response.response_text @@ -309,9 +575,115 @@ def _baseline_conversation_updater( "content": tr.content, } ) - else: + elif response.response_text: + messages.append({"role": "assistant", "content": response.response_text}) + + +def _record_turn_to_transcript( + response: LLMLoopResponse, + tool_results: list[ToolCallResult] | None, + *, + transcript_builder: TranscriptBuilder, + model: str, +) -> None: + """Append assistant + tool-result entries to the transcript builder. + + Kept separate from :func:`_mutate_openai_messages` so the two + concerns (next-LLM-call payload vs. durable conversation log) can + evolve independently. + """ + if tool_results: + content_blocks: list[dict[str, Any]] = [] if response.response_text: - messages.append({"role": "assistant", "content": response.response_text}) + content_blocks.append({"type": "text", "text": response.response_text}) + for tc in response.tool_calls: + try: + args = orjson.loads(tc.arguments) if tc.arguments else {} + except (ValueError, TypeError, orjson.JSONDecodeError) as parse_err: + logger.debug( + "[Baseline] Failed to parse tool_call arguments " + "(tool=%s, id=%s): %s", + tc.name, + tc.id, + parse_err, + ) + args = {} + content_blocks.append( + { + "type": "tool_use", + "id": tc.id, + "name": tc.name, + "input": args, + } + ) + if content_blocks: + transcript_builder.append_assistant( + content_blocks=content_blocks, + model=model, + stop_reason=STOP_REASON_TOOL_USE, + ) + for tr in tool_results: + # Record tool result to transcript AFTER the assistant tool_use + # block to maintain correct Anthropic API ordering: + # assistant(tool_use) → user(tool_result) + transcript_builder.append_tool_result( + tool_use_id=tr.tool_call_id, + content=tr.content, + ) + elif response.response_text: + transcript_builder.append_assistant( + content_blocks=[{"type": "text", "text": response.response_text}], + model=model, + stop_reason=STOP_REASON_END_TURN, + ) + + +def _baseline_conversation_updater( + messages: list[dict[str, Any]], + response: LLMLoopResponse, + tool_results: list[ToolCallResult] | None = None, + *, + transcript_builder: TranscriptBuilder, + model: str = "", + state: _BaselineStreamState | None = None, +) -> None: + """Update OpenAI message list with assistant response + tool results. 
+ + Also records structured ChatMessage entries in ``state.session_messages`` + so the full tool-call history is persisted to the session (not just the + concatenated assistant text). + """ + _mutate_openai_messages(messages, response, tool_results) + _record_turn_to_transcript( + response, + tool_results, + transcript_builder=transcript_builder, + model=model, + ) + # Record structured messages for session persistence so tool calls + # and tool results survive across turns and mode switches. + if state is not None and tool_results: + assistant_msg = ChatMessage( + role="assistant", + content=response.response_text or "", + tool_calls=[ + { + "id": tc.id, + "type": "function", + "function": {"name": tc.name, "arguments": tc.arguments}, + } + for tc in response.tool_calls + ], + ) + state.session_messages.append(assistant_msg) + for tr in tool_results: + state.session_messages.append( + ChatMessage( + role="tool", + content=tr.content, + tool_call_id=tr.tool_call_id, + ) + ) async def _update_title_async( @@ -328,6 +700,7 @@ async def _update_title_async( async def _compress_session_messages( messages: list[ChatMessage], + model: str, ) -> list[ChatMessage]: """Compress session messages if they exceed the model's token limit. @@ -340,45 +713,189 @@ async def _compress_session_messages( msg_dict: dict[str, Any] = {"role": msg.role} if msg.content: msg_dict["content"] = msg.content + if msg.tool_calls: + msg_dict["tool_calls"] = msg.tool_calls + if msg.tool_call_id: + msg_dict["tool_call_id"] = msg.tool_call_id messages_dict.append(msg_dict) try: result = await compress_context( messages=messages_dict, - model=config.model, + model=model, client=_get_openai_client(), ) except Exception as e: logger.warning("[Baseline] Context compression with LLM failed: %s", e) result = await compress_context( messages=messages_dict, - model=config.model, + model=model, client=None, ) if result.was_compacted: logger.info( - "[Baseline] Context compacted: %d -> %d tokens " - "(%d summarized, %d dropped)", + "[Baseline] Context compacted: %d -> %d tokens (%d summarized, %d dropped)", result.original_token_count, result.token_count, result.messages_summarized, result.messages_dropped, ) return [ - ChatMessage(role=m["role"], content=m.get("content")) + ChatMessage( + role=m["role"], + content=m.get("content"), + tool_calls=m.get("tool_calls"), + tool_call_id=m.get("tool_call_id"), + ) for m in result.messages ] return messages +def is_transcript_stale(dl: TranscriptDownload | None, session_msg_count: int) -> bool: + """Return ``True`` when a download doesn't cover the current session. + + A transcript is stale when it has a known ``message_count`` and that + count doesn't reach ``session_msg_count - 1`` (i.e. the session has + already advanced beyond what the stored transcript captures). + Loading a stale transcript would silently drop intermediate turns, + so callers should treat stale as "skip load, skip upload". + + An unknown ``message_count`` (``0``) is treated as **not stale** + because older transcripts uploaded before msg_count tracking + existed must still be usable. + """ + if dl is None: + return False + if not dl.message_count: + return False + return dl.message_count < session_msg_count - 1 + + +def should_upload_transcript( + user_id: str | None, transcript_covers_prefix: bool +) -> bool: + """Return ``True`` when the caller should upload the final transcript. 
+ + Uploads require a logged-in user (for the storage key) *and* a + transcript that covered the session prefix when loaded — otherwise + we'd be overwriting a more complete version in storage with a + partial one built from just the current turn. + """ + return bool(user_id) and transcript_covers_prefix + + +async def _load_prior_transcript( + user_id: str, + session_id: str, + session_msg_count: int, + transcript_builder: TranscriptBuilder, +) -> bool: + """Download and load the prior transcript into ``transcript_builder``. + + Returns ``True`` when the loaded transcript fully covers the session + prefix; ``False`` otherwise (stale, missing, invalid, or download + error). Callers should suppress uploads when this returns ``False`` + to avoid overwriting a more complete version in storage. + """ + try: + dl = await download_transcript(user_id, session_id, log_prefix="[Baseline]") + except Exception as e: + logger.warning("[Baseline] Transcript download failed: %s", e) + return False + + if dl is None: + logger.debug("[Baseline] No transcript available") + return False + + if not validate_transcript(dl.content): + logger.warning("[Baseline] Downloaded transcript but invalid") + return False + + if is_transcript_stale(dl, session_msg_count): + logger.warning( + "[Baseline] Transcript stale: covers %d of %d messages, skipping", + dl.message_count, + session_msg_count, + ) + return False + + transcript_builder.load_previous(dl.content, log_prefix="[Baseline]") + logger.info( + "[Baseline] Loaded transcript: %dB, msg_count=%d", + len(dl.content), + dl.message_count, + ) + return True + + +async def _upload_final_transcript( + user_id: str, + session_id: str, + transcript_builder: TranscriptBuilder, + session_msg_count: int, +) -> None: + """Serialize and upload the transcript for next-turn continuity. + + Uses the builder's own invariants to decide whether to upload, + avoiding a JSONL re-parse. A builder that ends with an assistant + entry is structurally complete; a builder that doesn't (empty, or + ends mid-turn) is skipped. + """ + try: + if transcript_builder.last_entry_type != "assistant": + logger.debug( + "[Baseline] No complete assistant turn to upload (last_entry=%s)", + transcript_builder.last_entry_type, + ) + return + content = transcript_builder.to_jsonl() + if not content: + logger.debug("[Baseline] Empty transcript content, skipping upload") + return + # Track the upload as a background task so a timeout doesn't leak an + # orphaned coroutine; shield it so cancellation of this caller doesn't + # abort the in-flight GCS write. + upload_task = asyncio.create_task( + upload_transcript( + user_id=user_id, + session_id=session_id, + content=content, + message_count=session_msg_count, + log_prefix="[Baseline]", + skip_strip=True, + ) + ) + _background_tasks.add(upload_task) + upload_task.add_done_callback(_background_tasks.discard) + # Bound the wait: a hung storage backend must not block the response + # from finishing. The task keeps running in _background_tasks on + # timeout and will be cleaned up when it resolves. + await asyncio.wait_for( + asyncio.shield(upload_task), timeout=_TRANSCRIPT_UPLOAD_TIMEOUT_S + ) + except asyncio.TimeoutError: + # Upload is still running in _background_tasks; we just stopped waiting. 
+ logger.info( + "[Baseline] Transcript upload exceeded %ss wait — continuing as background task", + _TRANSCRIPT_UPLOAD_TIMEOUT_S, + ) + except Exception as upload_err: + logger.error("[Baseline] Transcript upload failed: %s", upload_err) + + async def stream_chat_completion_baseline( session_id: str, message: str | None = None, is_user_message: bool = True, user_id: str | None = None, session: ChatSession | None = None, + file_ids: list[str] | None = None, + permissions: "CopilotPermissions | None" = None, + context: dict[str, str] | None = None, + mode: CopilotMode | None = None, **_kwargs: Any, ) -> AsyncGenerator[StreamBaseResponse, None]: """Baseline LLM with tool calling via OpenAI-compatible API. @@ -397,25 +914,74 @@ async def stream_chat_completion_baseline( f"Session {session_id} not found. Please create a new session first." ) - # Append user message - new_role = "user" if is_user_message else "assistant" - if message and ( - len(session.messages) == 0 - or not ( - session.messages[-1].role == new_role - and session.messages[-1].content == message - ) - ): - session.messages.append(ChatMessage(role=new_role, content=message)) + if maybe_append_user_message(session, message, is_user_message): if is_user_message: track_user_message( user_id=user_id, session_id=session_id, - message_length=len(message), + message_length=len(message or ""), ) session = await upsert_chat_session(session) + # Select model based on the per-request mode. 'fast' downgrades to + # the cheaper/faster model; everything else keeps the default. + active_model = _resolve_baseline_model(mode) + + # --- E2B sandbox setup (feature parity with SDK path) --- + e2b_sandbox = None + e2b_api_key = config.active_e2b_api_key + if e2b_api_key: + try: + from backend.copilot.tools.e2b_sandbox import get_or_create_sandbox + + e2b_sandbox = await get_or_create_sandbox( + session_id, + api_key=e2b_api_key, + template=config.e2b_sandbox_template, + timeout=config.e2b_sandbox_timeout, + on_timeout=config.e2b_sandbox_on_timeout, + ) + except Exception: + logger.warning("[Baseline] E2B sandbox setup failed", exc_info=True) + + # --- Transcript support (feature parity with SDK path) --- + transcript_builder = TranscriptBuilder() + transcript_covers_prefix = True + + # Build system prompt only on the first turn to avoid mid-conversation + # changes from concurrent chats updating business understanding. + is_first_turn = len(session.messages) <= 1 + if is_first_turn: + prompt_task = _build_system_prompt(user_id, has_conversation_history=False) + else: + prompt_task = _build_system_prompt(user_id=None, has_conversation_history=True) + + # Run download + prompt build concurrently — both are independent I/O + # on the request critical path. + if user_id and len(session.messages) > 1: + transcript_covers_prefix, (base_system_prompt, _) = await asyncio.gather( + _load_prior_transcript( + user_id=user_id, + session_id=session_id, + session_msg_count=len(session.messages), + transcript_builder=transcript_builder, + ), + prompt_task, + ) + else: + base_system_prompt, _ = await prompt_task + + # Append user message to transcript. + # Always append when the message is present and is from the user, + # even on duplicate-suppressed retries (is_new_message=False). + # The loaded transcript may be stale (uploaded before the previous + # attempt stored this message), so skipping it would leave the + # transcript without the user turn, creating a malformed + # assistant-after-assistant structure when the LLM reply is added. 
+ if message and is_user_message: + transcript_builder.append_user(content=message) + # Generate title for new sessions if is_user_message and not session.title: user_messages = [m for m in session.messages if m.role == "user"] @@ -430,36 +996,104 @@ async def stream_chat_completion_baseline( message_id = str(uuid.uuid4()) - # Build system prompt only on the first turn to avoid mid-conversation - # changes from concurrent chats updating business understanding. - is_first_turn = len(session.messages) <= 1 - if is_first_turn: - base_system_prompt, _ = await _build_system_prompt( - user_id, has_conversation_history=False - ) - else: - base_system_prompt, _ = await _build_system_prompt( - user_id=None, has_conversation_history=True - ) - # Append tool documentation and technical notes system_prompt = base_system_prompt + get_baseline_supplement() # Compress context if approaching the model's token limit - messages_for_context = await _compress_session_messages(session.messages) + messages_for_context = await _compress_session_messages( + session.messages, model=active_model + ) - # Build OpenAI message list from session history + # Build OpenAI message list from session history. + # Include tool_calls on assistant messages and tool-role results so the + # model retains full context of what tools were invoked and their outcomes. openai_messages: list[dict[str, Any]] = [ {"role": "system", "content": system_prompt} ] for msg in messages_for_context: - if msg.role in ("user", "assistant") and msg.content: + if msg.role == "assistant": + entry: dict[str, Any] = {"role": "assistant"} + if msg.content: + entry["content"] = msg.content + if msg.tool_calls: + entry["tool_calls"] = msg.tool_calls + if msg.content or msg.tool_calls: + openai_messages.append(entry) + elif msg.role == "tool" and msg.tool_call_id: + openai_messages.append( + { + "role": "tool", + "tool_call_id": msg.tool_call_id, + "content": msg.content or "", + } + ) + elif msg.role == "user" and msg.content: openai_messages.append({"role": msg.role, "content": msg.content}) + # --- File attachments (feature parity with SDK path) --- + working_dir: str | None = None + attachment_hint = "" + image_blocks: list[dict[str, Any]] = [] + if file_ids and user_id: + working_dir = tempfile.mkdtemp(prefix=f"copilot-baseline-{session_id[:8]}-") + attachment_hint, image_blocks = await _prepare_baseline_attachments( + file_ids, user_id, session_id, working_dir + ) + + # --- URL context --- + context_hint = "" + if context and context.get("url"): + url = context["url"] + content_text = context.get("content", "") + if content_text: + context_hint = ( + f"\n[The user shared a URL: {url}\n" f"Content:\n{content_text[:8000]}]" + ) + else: + context_hint = f"\n[The user shared a URL: {url}]" + + # Append attachment + context hints and image blocks to the last user + # message in a single reverse scan. 
+ extra_hint = attachment_hint + context_hint + if extra_hint or image_blocks: + for i in range(len(openai_messages) - 1, -1, -1): + if openai_messages[i].get("role") == "user": + existing = openai_messages[i].get("content", "") + if isinstance(existing, str): + text = existing + "\n" + extra_hint if extra_hint else existing + if image_blocks: + parts: list[dict[str, Any]] = [{"type": "text", "text": text}] + for img in image_blocks: + parts.append( + { + "type": "image_url", + "image_url": { + "url": ( + f"data:{img['source']['media_type']};" + f"base64,{img['source']['data']}" + ) + }, + } + ) + openai_messages[i]["content"] = parts + else: + openai_messages[i]["content"] = text + break + tools = get_available_tools() + # --- Permission filtering --- + if permissions is not None: + tools = _filter_tools_by_permissions(tools, permissions) + # Propagate execution context so tool handlers can read session-level flags. - set_execution_context(user_id, session) + set_execution_context( + user_id, + session, + sandbox=e2b_sandbox, + sdk_cwd=working_dir, + permissions=permissions, + ) yield StreamStart(messageId=message_id, sessionId=session_id) @@ -478,7 +1112,7 @@ async def stream_chat_completion_baseline( logger.warning("[Baseline] Langfuse trace context setup failed") _stream_error = False # Track whether an error occurred during streaming - state = _BaselineStreamState() + state = _BaselineStreamState(model=active_model) # Bind extracted module-level callbacks to this request's state/session # using functools.partial so they satisfy the Protocol signatures. @@ -487,6 +1121,13 @@ async def stream_chat_completion_baseline( _baseline_tool_executor, state=state, user_id=user_id, session=session ) + _bound_conversation_updater = partial( + _baseline_conversation_updater, + transcript_builder=transcript_builder, + model=active_model, + state=state, + ) + try: loop_result = None async for loop_result in tool_call_loop( @@ -494,7 +1135,7 @@ async def stream_chat_completion_baseline( tools=tools, llm_call=_bound_llm_caller, execute_tool=_bound_tool_executor, - update_conversation=_baseline_conversation_updater, + update_conversation=_bound_conversation_updater, max_iterations=_MAX_TOOL_ROUNDS, ): # Drain buffered events after each iteration (real-time streaming) @@ -563,10 +1204,10 @@ async def stream_chat_completion_baseline( and not (_stream_error and not state.assistant_text) ): state.turn_prompt_tokens = max( - estimate_token_count(openai_messages, model=config.model), 1 + estimate_token_count(openai_messages, model=active_model), 1 ) state.turn_completion_tokens = estimate_token_count_str( - state.assistant_text, model=config.model + state.assistant_text, model=active_model ) logger.info( "[Baseline] No streaming usage reported; estimated tokens: " @@ -587,16 +1228,54 @@ async def stream_chat_completion_baseline( log_prefix="[Baseline]", ) - # Persist assistant response - if state.assistant_text: - session.messages.append( - ChatMessage(role="assistant", content=state.assistant_text) + # Persist structured tool-call history (assistant + tool messages) + # collected by the conversation updater, then the final text response. + for msg in state.session_messages: + session.messages.append(msg) + # Append the final assistant text (from the last LLM call that had + # no tool calls, i.e. the natural finish). Only add it if the + # conversation updater didn't already record it as part of a + # tool-call round (which would have empty response_text). 
+ final_text = state.assistant_text + if state.session_messages: + # Strip text already captured in tool-call round messages + recorded = "".join( + m.content or "" for m in state.session_messages if m.role == "assistant" ) + if final_text.startswith(recorded): + final_text = final_text[len(recorded) :] + if final_text.strip(): + session.messages.append(ChatMessage(role="assistant", content=final_text)) try: await upsert_chat_session(session) except Exception as persist_err: logger.error("[Baseline] Failed to persist session: %s", persist_err) + # --- Upload transcript for next-turn continuity --- + # Backfill partial assistant text that wasn't recorded by the + # conversation updater (e.g. when the stream aborted mid-round). + # Without this, mode-switching after a failed turn would lose + # the partial assistant response from the transcript. + if _stream_error and state.assistant_text: + if transcript_builder.last_entry_type != "assistant": + transcript_builder.append_assistant( + content_blocks=[{"type": "text", "text": state.assistant_text}], + model=active_model, + stop_reason=STOP_REASON_END_TURN, + ) + + if user_id and should_upload_transcript(user_id, transcript_covers_prefix): + await _upload_final_transcript( + user_id=user_id, + session_id=session_id, + transcript_builder=transcript_builder, + session_msg_count=len(session.messages), + ) + + # Clean up the ephemeral working directory used for file attachments. + if working_dir is not None: + shutil.rmtree(working_dir, ignore_errors=True) + # Yield usage and finish AFTER try/finally (not inside finally). # PEP 525 prohibits yielding from finally in async generators during # aclose() — doing so raises RuntimeError on client disconnect. diff --git a/autogpt_platform/backend/backend/copilot/baseline/service_unit_test.py b/autogpt_platform/backend/backend/copilot/baseline/service_unit_test.py new file mode 100644 index 0000000000..c5cbb9d882 --- /dev/null +++ b/autogpt_platform/backend/backend/copilot/baseline/service_unit_test.py @@ -0,0 +1,633 @@ +"""Unit tests for baseline service pure-logic helpers. + +These tests cover ``_baseline_conversation_updater`` and ``_BaselineStreamState`` +without requiring API keys, database connections, or network access. 
+""" + +from unittest.mock import AsyncMock, patch + +import pytest +from openai.types.chat import ChatCompletionToolParam + +from backend.copilot.baseline.service import ( + _baseline_conversation_updater, + _BaselineStreamState, + _compress_session_messages, + _ThinkingStripper, +) +from backend.copilot.model import ChatMessage +from backend.copilot.transcript_builder import TranscriptBuilder +from backend.util.prompt import CompressResult +from backend.util.tool_call_loop import LLMLoopResponse, LLMToolCall, ToolCallResult + + +class TestBaselineStreamState: + def test_defaults(self): + state = _BaselineStreamState() + assert state.pending_events == [] + assert state.assistant_text == "" + assert state.text_started is False + assert state.turn_prompt_tokens == 0 + assert state.turn_completion_tokens == 0 + assert state.text_block_id # Should be a UUID string + + def test_mutable_fields(self): + state = _BaselineStreamState() + state.assistant_text = "hello" + state.turn_prompt_tokens = 100 + state.turn_completion_tokens = 50 + assert state.assistant_text == "hello" + assert state.turn_prompt_tokens == 100 + assert state.turn_completion_tokens == 50 + + +class TestBaselineConversationUpdater: + """Tests for _baseline_conversation_updater which updates the OpenAI + message list and transcript builder after each LLM call.""" + + def _make_transcript_builder(self) -> TranscriptBuilder: + builder = TranscriptBuilder() + builder.append_user("test question") + return builder + + def test_text_only_response(self): + """When the LLM returns text without tool calls, the updater appends + a single assistant message and records it in the transcript.""" + messages: list = [] + builder = self._make_transcript_builder() + response = LLMLoopResponse( + response_text="Hello, world!", + tool_calls=[], + raw_response=None, + prompt_tokens=0, + completion_tokens=0, + ) + + _baseline_conversation_updater( + messages, + response, + tool_results=None, + transcript_builder=builder, + model="test-model", + ) + + assert len(messages) == 1 + assert messages[0]["role"] == "assistant" + assert messages[0]["content"] == "Hello, world!" + # Transcript should have user + assistant + assert builder.entry_count == 2 + assert builder.last_entry_type == "assistant" + + def test_tool_calls_response(self): + """When the LLM returns tool calls, the updater appends the assistant + message with tool_calls and tool result messages.""" + messages: list = [] + builder = self._make_transcript_builder() + response = LLMLoopResponse( + response_text="Let me search...", + tool_calls=[ + LLMToolCall( + id="tc_1", + name="search", + arguments='{"query": "test"}', + ), + ], + raw_response=None, + prompt_tokens=0, + completion_tokens=0, + ) + tool_results = [ + ToolCallResult( + tool_call_id="tc_1", + tool_name="search", + content="Found result", + ), + ] + + _baseline_conversation_updater( + messages, + response, + tool_results=tool_results, + transcript_builder=builder, + model="test-model", + ) + + # Messages: assistant (with tool_calls) + tool result + assert len(messages) == 2 + assert messages[0]["role"] == "assistant" + assert messages[0]["content"] == "Let me search..." 
+ assert len(messages[0]["tool_calls"]) == 1 + assert messages[0]["tool_calls"][0]["id"] == "tc_1" + assert messages[1]["role"] == "tool" + assert messages[1]["tool_call_id"] == "tc_1" + assert messages[1]["content"] == "Found result" + + # Transcript: user + assistant(tool_use) + user(tool_result) + assert builder.entry_count == 3 + + def test_tool_calls_without_text(self): + """Tool calls without accompanying text should still work.""" + messages: list = [] + builder = self._make_transcript_builder() + response = LLMLoopResponse( + response_text=None, + tool_calls=[ + LLMToolCall(id="tc_1", name="run", arguments="{}"), + ], + raw_response=None, + prompt_tokens=0, + completion_tokens=0, + ) + tool_results = [ + ToolCallResult(tool_call_id="tc_1", tool_name="run", content="done"), + ] + + _baseline_conversation_updater( + messages, + response, + tool_results=tool_results, + transcript_builder=builder, + model="test-model", + ) + + assert len(messages) == 2 + assert "content" not in messages[0] # No text content + assert messages[0]["tool_calls"][0]["function"]["name"] == "run" + + def test_no_text_no_tools(self): + """When the response has no text and no tool calls, nothing is appended.""" + messages: list = [] + builder = self._make_transcript_builder() + response = LLMLoopResponse( + response_text=None, + tool_calls=[], + raw_response=None, + prompt_tokens=0, + completion_tokens=0, + ) + + _baseline_conversation_updater( + messages, + response, + tool_results=None, + transcript_builder=builder, + model="test-model", + ) + + assert len(messages) == 0 + # Only the user entry from setup + assert builder.entry_count == 1 + + def test_multiple_tool_calls(self): + """Multiple tool calls in a single response are all recorded.""" + messages: list = [] + builder = self._make_transcript_builder() + response = LLMLoopResponse( + response_text=None, + tool_calls=[ + LLMToolCall(id="tc_1", name="tool_a", arguments="{}"), + LLMToolCall(id="tc_2", name="tool_b", arguments='{"x": 1}'), + ], + raw_response=None, + prompt_tokens=0, + completion_tokens=0, + ) + tool_results = [ + ToolCallResult(tool_call_id="tc_1", tool_name="tool_a", content="result_a"), + ToolCallResult(tool_call_id="tc_2", tool_name="tool_b", content="result_b"), + ] + + _baseline_conversation_updater( + messages, + response, + tool_results=tool_results, + transcript_builder=builder, + model="test-model", + ) + + # 1 assistant + 2 tool results + assert len(messages) == 3 + assert len(messages[0]["tool_calls"]) == 2 + assert messages[1]["tool_call_id"] == "tc_1" + assert messages[2]["tool_call_id"] == "tc_2" + + def test_invalid_tool_arguments_handled(self): + """Tool call with invalid JSON arguments: the arguments field is + stored as-is in the message, and orjson failure falls back to {} + in the transcript content_blocks.""" + messages: list = [] + builder = self._make_transcript_builder() + response = LLMLoopResponse( + response_text=None, + tool_calls=[ + LLMToolCall(id="tc_1", name="tool_x", arguments="not-json"), + ], + raw_response=None, + prompt_tokens=0, + completion_tokens=0, + ) + tool_results = [ + ToolCallResult(tool_call_id="tc_1", tool_name="tool_x", content="ok"), + ] + + _baseline_conversation_updater( + messages, + response, + tool_results=tool_results, + transcript_builder=builder, + model="test-model", + ) + + # Should not raise — invalid JSON falls back to {} in transcript + assert len(messages) == 2 + assert messages[0]["tool_calls"][0]["function"]["arguments"] == "not-json" + + +class 
TestCompressSessionMessagesPreservesToolCalls: + """``_compress_session_messages`` must round-trip tool_calls + tool_call_id. + + Compression serialises ChatMessage to dict for ``compress_context`` and + reifies the result back to ChatMessage. A regression that drops + ``tool_calls`` or ``tool_call_id`` would corrupt the OpenAI message + list and break downstream tool-execution rounds. + """ + + @pytest.mark.asyncio + async def test_compressed_output_keeps_tool_calls_and_ids(self): + # Simulate compression that returns a summary + the most recent + # assistant(tool_call) + tool(tool_result) intact. + summary = {"role": "system", "content": "prior turns: user asked X"} + assistant_with_tc = { + "role": "assistant", + "content": "calling tool", + "tool_calls": [ + { + "id": "tc_abc", + "type": "function", + "function": {"name": "search", "arguments": '{"q":"y"}'}, + } + ], + } + tool_result = { + "role": "tool", + "tool_call_id": "tc_abc", + "content": "search result", + } + + compress_result = CompressResult( + messages=[summary, assistant_with_tc, tool_result], + token_count=100, + was_compacted=True, + original_token_count=5000, + messages_summarized=10, + messages_dropped=0, + ) + + # Input: messages that should be compressed. + input_messages = [ + ChatMessage(role="user", content="q1"), + ChatMessage( + role="assistant", + content="calling tool", + tool_calls=[ + { + "id": "tc_abc", + "type": "function", + "function": { + "name": "search", + "arguments": '{"q":"y"}', + }, + } + ], + ), + ChatMessage( + role="tool", + tool_call_id="tc_abc", + content="search result", + ), + ] + + with patch( + "backend.copilot.baseline.service.compress_context", + new=AsyncMock(return_value=compress_result), + ): + compressed = await _compress_session_messages( + input_messages, model="openrouter/anthropic/claude-opus-4" + ) + + # Summary, assistant(tool_calls), tool(tool_call_id). + assert len(compressed) == 3 + # Assistant message must keep its tool_calls intact. + assistant_msg = compressed[1] + assert assistant_msg.role == "assistant" + assert assistant_msg.tool_calls is not None + assert len(assistant_msg.tool_calls) == 1 + assert assistant_msg.tool_calls[0]["id"] == "tc_abc" + assert assistant_msg.tool_calls[0]["function"]["name"] == "search" + # Tool-role message must keep tool_call_id for OpenAI linkage. 
+ tool_msg = compressed[2] + assert tool_msg.role == "tool" + assert tool_msg.tool_call_id == "tc_abc" + assert tool_msg.content == "search result" + + @pytest.mark.asyncio + async def test_uncompressed_passthrough_keeps_fields(self): + """When compression is a no-op (was_compacted=False), the original + messages must be returned unchanged — including tool_calls.""" + input_messages = [ + ChatMessage( + role="assistant", + content="c", + tool_calls=[ + { + "id": "t1", + "type": "function", + "function": {"name": "f", "arguments": "{}"}, + } + ], + ), + ChatMessage(role="tool", tool_call_id="t1", content="ok"), + ] + + noop_result = CompressResult( + messages=[], # ignored when was_compacted=False + token_count=10, + was_compacted=False, + ) + + with patch( + "backend.copilot.baseline.service.compress_context", + new=AsyncMock(return_value=noop_result), + ): + out = await _compress_session_messages( + input_messages, model="openrouter/anthropic/claude-opus-4" + ) + + assert out is input_messages # same list returned + assert out[0].tool_calls is not None + assert out[0].tool_calls[0]["id"] == "t1" + assert out[1].tool_call_id == "t1" + + +# ---- _ThinkingStripper tests ---- # + + +def test_thinking_stripper_basic_thinking_tag() -> None: + """... blocks are fully stripped.""" + s = _ThinkingStripper() + assert s.process("internal reasoning hereHello!") == "Hello!" + + +def test_thinking_stripper_internal_reasoning_tag() -> None: + """... blocks (Gemini) are stripped.""" + s = _ThinkingStripper() + assert ( + s.process("step by stepAnswer") + == "Answer" + ) + + +def test_thinking_stripper_split_across_chunks() -> None: + """Tags split across multiple chunks are handled correctly.""" + s = _ThinkingStripper() + out = s.process("Hello secret world") + assert out == "Hello world" + + +def test_thinking_stripper_plain_text_preserved() -> None: + """Plain text with the word 'thinking' is not stripped.""" + s = _ThinkingStripper() + assert ( + s.process("I am thinking about this problem") + == "I am thinking about this problem" + ) + + +def test_thinking_stripper_multiple_blocks() -> None: + """Multiple reasoning blocks in one stream are all stripped.""" + s = _ThinkingStripper() + result = s.process( + "AxByC" + ) + assert result == "ABC" + + +def test_thinking_stripper_flush_discards_unclosed() -> None: + """Unclosed reasoning block is discarded on flush.""" + s = _ThinkingStripper() + s.process("Startnever closed") + flushed = s.flush() + assert "never closed" not in flushed + + +def test_thinking_stripper_empty_block() -> None: + """Empty reasoning blocks are handled gracefully.""" + s = _ThinkingStripper() + assert s.process("BeforeAfter") == "BeforeAfter" + + +# ---- _filter_tools_by_permissions tests ---- # + + +def _make_tool(name: str) -> ChatCompletionToolParam: + """Build a minimal OpenAI ChatCompletionToolParam.""" + return ChatCompletionToolParam( + type="function", + function={"name": name, "parameters": {}}, + ) + + +class TestFilterToolsByPermissions: + """Tests for _filter_tools_by_permissions.""" + + @patch( + "backend.copilot.permissions.all_known_tool_names", + return_value=frozenset({"run_block", "web_fetch", "bash_exec"}), + ) + def test_empty_permissions_returns_all(self, _mock_names): + """Empty permissions (no filtering) returns every tool unchanged.""" + from backend.copilot.baseline.service import _filter_tools_by_permissions + from backend.copilot.permissions import CopilotPermissions + + tools = [_make_tool("run_block"), _make_tool("web_fetch")] + perms = 
CopilotPermissions() + result = _filter_tools_by_permissions(tools, perms) + assert result == tools + + @patch( + "backend.copilot.permissions.all_known_tool_names", + return_value=frozenset({"run_block", "web_fetch", "bash_exec"}), + ) + def test_allowlist_keeps_only_matching(self, _mock_names): + """Explicit allowlist (tools_exclude=False) keeps only listed tools.""" + from backend.copilot.baseline.service import _filter_tools_by_permissions + from backend.copilot.permissions import CopilotPermissions + + tools = [ + _make_tool("run_block"), + _make_tool("web_fetch"), + _make_tool("bash_exec"), + ] + perms = CopilotPermissions(tools=["web_fetch"], tools_exclude=False) + result = _filter_tools_by_permissions(tools, perms) + assert len(result) == 1 + assert result[0]["function"]["name"] == "web_fetch" + + @patch( + "backend.copilot.permissions.all_known_tool_names", + return_value=frozenset({"run_block", "web_fetch", "bash_exec"}), + ) + def test_blacklist_excludes_listed(self, _mock_names): + """Blacklist (tools_exclude=True) removes only the listed tools.""" + from backend.copilot.baseline.service import _filter_tools_by_permissions + from backend.copilot.permissions import CopilotPermissions + + tools = [ + _make_tool("run_block"), + _make_tool("web_fetch"), + _make_tool("bash_exec"), + ] + perms = CopilotPermissions(tools=["bash_exec"], tools_exclude=True) + result = _filter_tools_by_permissions(tools, perms) + names = [t["function"]["name"] for t in result] + assert "bash_exec" not in names + assert "run_block" in names + assert "web_fetch" in names + assert len(result) == 2 + + @patch( + "backend.copilot.permissions.all_known_tool_names", + return_value=frozenset({"run_block", "web_fetch", "bash_exec"}), + ) + def test_unknown_tool_name_filtered_out(self, _mock_names): + """A tool whose name is not in all_known_tool_names is dropped.""" + from backend.copilot.baseline.service import _filter_tools_by_permissions + from backend.copilot.permissions import CopilotPermissions + + tools = [_make_tool("run_block"), _make_tool("unknown_tool")] + perms = CopilotPermissions(tools=["run_block"], tools_exclude=False) + result = _filter_tools_by_permissions(tools, perms) + names = [t["function"]["name"] for t in result] + assert "unknown_tool" not in names + assert names == ["run_block"] + + +# ---- _prepare_baseline_attachments tests ---- # + + +class TestPrepareBaselineAttachments: + """Tests for _prepare_baseline_attachments.""" + + @pytest.mark.asyncio + async def test_empty_file_ids(self): + """Empty file_ids returns empty hint and blocks.""" + from backend.copilot.baseline.service import _prepare_baseline_attachments + + hint, blocks = await _prepare_baseline_attachments([], "user1", "sess1", "/tmp") + assert hint == "" + assert blocks == [] + + @pytest.mark.asyncio + async def test_empty_user_id(self): + """Empty user_id returns empty hint and blocks.""" + from backend.copilot.baseline.service import _prepare_baseline_attachments + + hint, blocks = await _prepare_baseline_attachments( + ["file1"], "", "sess1", "/tmp" + ) + assert hint == "" + assert blocks == [] + + @pytest.mark.asyncio + async def test_image_file_returns_vision_blocks(self): + """A PNG image within size limits is returned as a base64 vision block.""" + from backend.copilot.baseline.service import _prepare_baseline_attachments + + fake_info = AsyncMock() + fake_info.name = "photo.png" + fake_info.mime_type = "image/png" + fake_info.size_bytes = 1024 + + fake_manager = AsyncMock() + fake_manager.get_file_info = 
AsyncMock(return_value=fake_info) + fake_manager.read_file_by_id = AsyncMock(return_value=b"\x89PNG_FAKE_DATA") + + with patch( + "backend.copilot.baseline.service.get_workspace_manager", + new=AsyncMock(return_value=fake_manager), + ): + hint, blocks = await _prepare_baseline_attachments( + ["fid1"], "user1", "sess1", "/tmp/workdir" + ) + + assert len(blocks) == 1 + assert blocks[0]["type"] == "image" + assert blocks[0]["source"]["media_type"] == "image/png" + assert blocks[0]["source"]["type"] == "base64" + assert "photo.png" in hint + assert "embedded as image" in hint + + @pytest.mark.asyncio + async def test_non_image_file_saved_to_working_dir(self, tmp_path): + """A non-image file is written to working_dir.""" + from backend.copilot.baseline.service import _prepare_baseline_attachments + + fake_info = AsyncMock() + fake_info.name = "data.csv" + fake_info.mime_type = "text/csv" + fake_info.size_bytes = 42 + + fake_manager = AsyncMock() + fake_manager.get_file_info = AsyncMock(return_value=fake_info) + fake_manager.read_file_by_id = AsyncMock(return_value=b"col1,col2\na,b") + + with patch( + "backend.copilot.baseline.service.get_workspace_manager", + new=AsyncMock(return_value=fake_manager), + ): + hint, blocks = await _prepare_baseline_attachments( + ["fid1"], "user1", "sess1", str(tmp_path) + ) + + assert blocks == [] + assert "data.csv" in hint + assert "saved to" in hint + saved = tmp_path / "data.csv" + assert saved.exists() + assert saved.read_bytes() == b"col1,col2\na,b" + + @pytest.mark.asyncio + async def test_file_not_found_skipped(self): + """When get_file_info returns None the file is silently skipped.""" + from backend.copilot.baseline.service import _prepare_baseline_attachments + + fake_manager = AsyncMock() + fake_manager.get_file_info = AsyncMock(return_value=None) + + with patch( + "backend.copilot.baseline.service.get_workspace_manager", + new=AsyncMock(return_value=fake_manager), + ): + hint, blocks = await _prepare_baseline_attachments( + ["missing_id"], "user1", "sess1", "/tmp" + ) + + assert hint == "" + assert blocks == [] + + @pytest.mark.asyncio + async def test_workspace_manager_error(self): + """When get_workspace_manager raises, returns empty results.""" + from backend.copilot.baseline.service import _prepare_baseline_attachments + + with patch( + "backend.copilot.baseline.service.get_workspace_manager", + new=AsyncMock(side_effect=RuntimeError("connection failed")), + ): + hint, blocks = await _prepare_baseline_attachments( + ["fid1"], "user1", "sess1", "/tmp" + ) + + assert hint == "" + assert blocks == [] diff --git a/autogpt_platform/backend/backend/copilot/baseline/transcript_integration_test.py b/autogpt_platform/backend/backend/copilot/baseline/transcript_integration_test.py new file mode 100644 index 0000000000..fccf7c6387 --- /dev/null +++ b/autogpt_platform/backend/backend/copilot/baseline/transcript_integration_test.py @@ -0,0 +1,667 @@ +"""Integration tests for baseline transcript flow. + +Exercises the real helpers in ``baseline/service.py`` that download, +validate, load, append to, backfill, and upload the transcript. +Storage is mocked via ``download_transcript`` / ``upload_transcript`` +patches; no network access is required. 
+""" + +import json as stdlib_json +from unittest.mock import AsyncMock, patch + +import pytest + +from backend.copilot.baseline.service import ( + _load_prior_transcript, + _record_turn_to_transcript, + _resolve_baseline_model, + _upload_final_transcript, + is_transcript_stale, + should_upload_transcript, +) +from backend.copilot.service import config +from backend.copilot.transcript import ( + STOP_REASON_END_TURN, + STOP_REASON_TOOL_USE, + TranscriptDownload, +) +from backend.copilot.transcript_builder import TranscriptBuilder +from backend.util.tool_call_loop import LLMLoopResponse, LLMToolCall, ToolCallResult + + +def _make_transcript_content(*roles: str) -> str: + """Build a minimal valid JSONL transcript from role names.""" + lines = [] + parent = "" + for i, role in enumerate(roles): + uid = f"uuid-{i}" + entry: dict = { + "type": role, + "uuid": uid, + "parentUuid": parent, + "message": { + "role": role, + "content": [{"type": "text", "text": f"{role} message {i}"}], + }, + } + if role == "assistant": + entry["message"]["id"] = f"msg_{i}" + entry["message"]["model"] = "test-model" + entry["message"]["type"] = "message" + entry["message"]["stop_reason"] = STOP_REASON_END_TURN + lines.append(stdlib_json.dumps(entry)) + parent = uid + return "\n".join(lines) + "\n" + + +class TestResolveBaselineModel: + """Model selection honours the per-request mode.""" + + def test_fast_mode_selects_fast_model(self): + assert _resolve_baseline_model("fast") == config.fast_model + + def test_extended_thinking_selects_default_model(self): + assert _resolve_baseline_model("extended_thinking") == config.model + + def test_none_mode_selects_default_model(self): + """Critical: baseline users without a mode MUST keep the default (opus).""" + assert _resolve_baseline_model(None) == config.model + + def test_default_and_fast_models_differ(self): + """Sanity: the two tiers are actually distinct in production config.""" + assert config.model != config.fast_model + + +class TestLoadPriorTranscript: + """``_load_prior_transcript`` wraps the download + validate + load flow.""" + + @pytest.mark.asyncio + async def test_loads_fresh_transcript(self): + builder = TranscriptBuilder() + content = _make_transcript_content("user", "assistant") + download = TranscriptDownload(content=content, message_count=2) + + with patch( + "backend.copilot.baseline.service.download_transcript", + new=AsyncMock(return_value=download), + ): + covers = await _load_prior_transcript( + user_id="user-1", + session_id="session-1", + session_msg_count=3, + transcript_builder=builder, + ) + + assert covers is True + assert builder.entry_count == 2 + assert builder.last_entry_type == "assistant" + + @pytest.mark.asyncio + async def test_rejects_stale_transcript(self): + """msg_count strictly less than session-1 is treated as stale.""" + builder = TranscriptBuilder() + content = _make_transcript_content("user", "assistant") + # session has 6 messages, transcript only covers 2 → stale. 
+ download = TranscriptDownload(content=content, message_count=2) + + with patch( + "backend.copilot.baseline.service.download_transcript", + new=AsyncMock(return_value=download), + ): + covers = await _load_prior_transcript( + user_id="user-1", + session_id="session-1", + session_msg_count=6, + transcript_builder=builder, + ) + + assert covers is False + assert builder.is_empty + + @pytest.mark.asyncio + async def test_missing_transcript_returns_false(self): + builder = TranscriptBuilder() + with patch( + "backend.copilot.baseline.service.download_transcript", + new=AsyncMock(return_value=None), + ): + covers = await _load_prior_transcript( + user_id="user-1", + session_id="session-1", + session_msg_count=2, + transcript_builder=builder, + ) + + assert covers is False + assert builder.is_empty + + @pytest.mark.asyncio + async def test_invalid_transcript_returns_false(self): + builder = TranscriptBuilder() + download = TranscriptDownload( + content='{"type":"progress","uuid":"a"}\n', + message_count=1, + ) + with patch( + "backend.copilot.baseline.service.download_transcript", + new=AsyncMock(return_value=download), + ): + covers = await _load_prior_transcript( + user_id="user-1", + session_id="session-1", + session_msg_count=2, + transcript_builder=builder, + ) + + assert covers is False + assert builder.is_empty + + @pytest.mark.asyncio + async def test_download_exception_returns_false(self): + builder = TranscriptBuilder() + with patch( + "backend.copilot.baseline.service.download_transcript", + new=AsyncMock(side_effect=RuntimeError("boom")), + ): + covers = await _load_prior_transcript( + user_id="user-1", + session_id="session-1", + session_msg_count=2, + transcript_builder=builder, + ) + + assert covers is False + assert builder.is_empty + + @pytest.mark.asyncio + async def test_zero_message_count_not_stale(self): + """When msg_count is 0 (unknown), staleness check is skipped.""" + builder = TranscriptBuilder() + download = TranscriptDownload( + content=_make_transcript_content("user", "assistant"), + message_count=0, + ) + with patch( + "backend.copilot.baseline.service.download_transcript", + new=AsyncMock(return_value=download), + ): + covers = await _load_prior_transcript( + user_id="user-1", + session_id="session-1", + session_msg_count=20, + transcript_builder=builder, + ) + + assert covers is True + assert builder.entry_count == 2 + + +class TestUploadFinalTranscript: + """``_upload_final_transcript`` serialises and calls storage.""" + + @pytest.mark.asyncio + async def test_uploads_valid_transcript(self): + builder = TranscriptBuilder() + builder.append_user(content="hi") + builder.append_assistant( + content_blocks=[{"type": "text", "text": "hello"}], + model="test-model", + stop_reason=STOP_REASON_END_TURN, + ) + + upload_mock = AsyncMock(return_value=None) + with patch( + "backend.copilot.baseline.service.upload_transcript", + new=upload_mock, + ): + await _upload_final_transcript( + user_id="user-1", + session_id="session-1", + transcript_builder=builder, + session_msg_count=2, + ) + + upload_mock.assert_awaited_once() + assert upload_mock.await_args is not None + call_kwargs = upload_mock.await_args.kwargs + assert call_kwargs["user_id"] == "user-1" + assert call_kwargs["session_id"] == "session-1" + assert call_kwargs["message_count"] == 2 + assert "hello" in call_kwargs["content"] + + @pytest.mark.asyncio + async def test_skips_upload_when_builder_empty(self): + builder = TranscriptBuilder() + upload_mock = AsyncMock(return_value=None) + with patch( + 
"backend.copilot.baseline.service.upload_transcript", + new=upload_mock, + ): + await _upload_final_transcript( + user_id="user-1", + session_id="session-1", + transcript_builder=builder, + session_msg_count=0, + ) + + upload_mock.assert_not_awaited() + + @pytest.mark.asyncio + async def test_swallows_upload_exceptions(self): + """Upload failures should not propagate (flow continues for the user).""" + builder = TranscriptBuilder() + builder.append_user(content="hi") + builder.append_assistant( + content_blocks=[{"type": "text", "text": "hello"}], + model="test-model", + stop_reason=STOP_REASON_END_TURN, + ) + + with patch( + "backend.copilot.baseline.service.upload_transcript", + new=AsyncMock(side_effect=RuntimeError("storage unavailable")), + ): + # Should not raise. + await _upload_final_transcript( + user_id="user-1", + session_id="session-1", + transcript_builder=builder, + session_msg_count=2, + ) + + +class TestRecordTurnToTranscript: + """``_record_turn_to_transcript`` translates LLMLoopResponse → transcript.""" + + def test_records_final_assistant_text(self): + builder = TranscriptBuilder() + builder.append_user(content="hi") + + response = LLMLoopResponse( + response_text="hello there", + tool_calls=[], + raw_response=None, + ) + _record_turn_to_transcript( + response, + tool_results=None, + transcript_builder=builder, + model="test-model", + ) + + assert builder.entry_count == 2 + assert builder.last_entry_type == "assistant" + jsonl = builder.to_jsonl() + assert "hello there" in jsonl + assert STOP_REASON_END_TURN in jsonl + + def test_records_tool_use_then_tool_result(self): + """Anthropic ordering: assistant(tool_use) → user(tool_result).""" + builder = TranscriptBuilder() + builder.append_user(content="use a tool") + + response = LLMLoopResponse( + response_text=None, + tool_calls=[ + LLMToolCall(id="call-1", name="echo", arguments='{"text":"hi"}') + ], + raw_response=None, + ) + tool_results = [ + ToolCallResult(tool_call_id="call-1", tool_name="echo", content="hi") + ] + _record_turn_to_transcript( + response, + tool_results, + transcript_builder=builder, + model="test-model", + ) + + # user, assistant(tool_use), user(tool_result) = 3 entries + assert builder.entry_count == 3 + jsonl = builder.to_jsonl() + assert STOP_REASON_TOOL_USE in jsonl + assert "tool_use" in jsonl + assert "tool_result" in jsonl + assert "call-1" in jsonl + + def test_records_nothing_on_empty_response(self): + builder = TranscriptBuilder() + builder.append_user(content="hi") + + response = LLMLoopResponse( + response_text=None, + tool_calls=[], + raw_response=None, + ) + _record_turn_to_transcript( + response, + tool_results=None, + transcript_builder=builder, + model="test-model", + ) + + assert builder.entry_count == 1 + + def test_malformed_tool_args_dont_crash(self): + """Bad JSON in tool arguments falls back to {} without raising.""" + builder = TranscriptBuilder() + builder.append_user(content="hi") + + response = LLMLoopResponse( + response_text=None, + tool_calls=[LLMToolCall(id="call-1", name="echo", arguments="{not-json")], + raw_response=None, + ) + tool_results = [ + ToolCallResult(tool_call_id="call-1", tool_name="echo", content="ok") + ] + _record_turn_to_transcript( + response, + tool_results, + transcript_builder=builder, + model="test-model", + ) + + assert builder.entry_count == 3 + jsonl = builder.to_jsonl() + assert '"input":{}' in jsonl + + +class TestRoundTrip: + """End-to-end: load prior → append new turn → upload.""" + + @pytest.mark.asyncio + async def 
test_full_round_trip(self): + prior = _make_transcript_content("user", "assistant") + download = TranscriptDownload(content=prior, message_count=2) + + builder = TranscriptBuilder() + with patch( + "backend.copilot.baseline.service.download_transcript", + new=AsyncMock(return_value=download), + ): + covers = await _load_prior_transcript( + user_id="user-1", + session_id="session-1", + session_msg_count=3, + transcript_builder=builder, + ) + assert covers is True + assert builder.entry_count == 2 + + # New user turn. + builder.append_user(content="new question") + assert builder.entry_count == 3 + + # New assistant turn. + response = LLMLoopResponse( + response_text="new answer", + tool_calls=[], + raw_response=None, + ) + _record_turn_to_transcript( + response, + tool_results=None, + transcript_builder=builder, + model="test-model", + ) + assert builder.entry_count == 4 + + # Upload. + upload_mock = AsyncMock(return_value=None) + with patch( + "backend.copilot.baseline.service.upload_transcript", + new=upload_mock, + ): + await _upload_final_transcript( + user_id="user-1", + session_id="session-1", + transcript_builder=builder, + session_msg_count=4, + ) + + upload_mock.assert_awaited_once() + assert upload_mock.await_args is not None + uploaded = upload_mock.await_args.kwargs["content"] + assert "new question" in uploaded + assert "new answer" in uploaded + # Original content preserved in the round trip. + assert "user message 0" in uploaded + assert "assistant message 1" in uploaded + + @pytest.mark.asyncio + async def test_backfill_append_guard(self): + """Backfill only runs when the last entry is not already assistant.""" + builder = TranscriptBuilder() + builder.append_user(content="hi") + + # Simulate the backfill guard from stream_chat_completion_baseline. + assistant_text = "partial text before error" + if builder.last_entry_type != "assistant": + builder.append_assistant( + content_blocks=[{"type": "text", "text": assistant_text}], + model="test-model", + stop_reason=STOP_REASON_END_TURN, + ) + + assert builder.last_entry_type == "assistant" + assert "partial text before error" in builder.to_jsonl() + + # Second invocation: the guard must prevent double-append. + initial_count = builder.entry_count + if builder.last_entry_type != "assistant": + builder.append_assistant( + content_blocks=[{"type": "text", "text": "duplicate"}], + model="test-model", + stop_reason=STOP_REASON_END_TURN, + ) + assert builder.entry_count == initial_count + + +class TestIsTranscriptStale: + """``is_transcript_stale`` gates prior-transcript loading.""" + + def test_none_download_is_not_stale(self): + assert is_transcript_stale(None, session_msg_count=5) is False + + def test_zero_message_count_is_not_stale(self): + """Legacy transcripts without msg_count tracking must remain usable.""" + dl = TranscriptDownload(content="", message_count=0) + assert is_transcript_stale(dl, session_msg_count=20) is False + + def test_stale_when_covers_less_than_prefix(self): + dl = TranscriptDownload(content="", message_count=2) + # session has 6 messages; transcript must cover at least 5 (6-1). 
+ assert is_transcript_stale(dl, session_msg_count=6) is True + + def test_fresh_when_covers_full_prefix(self): + dl = TranscriptDownload(content="", message_count=5) + assert is_transcript_stale(dl, session_msg_count=6) is False + + def test_fresh_when_exceeds_prefix(self): + """Race: transcript ahead of session count is still acceptable.""" + dl = TranscriptDownload(content="", message_count=10) + assert is_transcript_stale(dl, session_msg_count=6) is False + + def test_boundary_equal_to_prefix_minus_one(self): + dl = TranscriptDownload(content="", message_count=5) + assert is_transcript_stale(dl, session_msg_count=6) is False + + +class TestShouldUploadTranscript: + """``should_upload_transcript`` gates the final upload.""" + + def test_upload_allowed_for_user_with_coverage(self): + assert should_upload_transcript("user-1", True) is True + + def test_upload_skipped_when_no_user(self): + assert should_upload_transcript(None, True) is False + + def test_upload_skipped_when_empty_user(self): + assert should_upload_transcript("", True) is False + + def test_upload_skipped_without_coverage(self): + """Partial transcript must never clobber a more complete stored one.""" + assert should_upload_transcript("user-1", False) is False + + def test_upload_skipped_when_no_user_and_no_coverage(self): + assert should_upload_transcript(None, False) is False + + +class TestTranscriptLifecycle: + """End-to-end: download → validate → build → upload. + + Simulates the full transcript lifecycle inside + ``stream_chat_completion_baseline`` by mocking the storage layer and + driving each step through the real helpers. + """ + + @pytest.mark.asyncio + async def test_full_lifecycle_happy_path(self): + """Fresh download, append a turn, upload covers the session.""" + builder = TranscriptBuilder() + prior = _make_transcript_content("user", "assistant") + download = TranscriptDownload(content=prior, message_count=2) + + upload_mock = AsyncMock(return_value=None) + with ( + patch( + "backend.copilot.baseline.service.download_transcript", + new=AsyncMock(return_value=download), + ), + patch( + "backend.copilot.baseline.service.upload_transcript", + new=upload_mock, + ), + ): + # --- 1. Download & load prior transcript --- + covers = await _load_prior_transcript( + user_id="user-1", + session_id="session-1", + session_msg_count=3, + transcript_builder=builder, + ) + assert covers is True + + # --- 2. Append a new user turn + a new assistant response --- + builder.append_user(content="follow-up question") + _record_turn_to_transcript( + LLMLoopResponse( + response_text="follow-up answer", + tool_calls=[], + raw_response=None, + ), + tool_results=None, + transcript_builder=builder, + model="test-model", + ) + + # --- 3. Gate + upload --- + assert ( + should_upload_transcript( + user_id="user-1", transcript_covers_prefix=covers + ) + is True + ) + await _upload_final_transcript( + user_id="user-1", + session_id="session-1", + transcript_builder=builder, + session_msg_count=4, + ) + + upload_mock.assert_awaited_once() + assert upload_mock.await_args is not None + uploaded = upload_mock.await_args.kwargs["content"] + assert "follow-up question" in uploaded + assert "follow-up answer" in uploaded + # Original prior-turn content preserved. 
+ assert "user message 0" in uploaded + assert "assistant message 1" in uploaded + + @pytest.mark.asyncio + async def test_lifecycle_stale_download_suppresses_upload(self): + """Stale download → covers=False → upload must be skipped.""" + builder = TranscriptBuilder() + # session has 10 msgs but stored transcript only covers 2 → stale. + stale = TranscriptDownload( + content=_make_transcript_content("user", "assistant"), + message_count=2, + ) + + upload_mock = AsyncMock(return_value=None) + with ( + patch( + "backend.copilot.baseline.service.download_transcript", + new=AsyncMock(return_value=stale), + ), + patch( + "backend.copilot.baseline.service.upload_transcript", + new=upload_mock, + ), + ): + covers = await _load_prior_transcript( + user_id="user-1", + session_id="session-1", + session_msg_count=10, + transcript_builder=builder, + ) + + assert covers is False + # The caller's gate mirrors the production path. + assert ( + should_upload_transcript(user_id="user-1", transcript_covers_prefix=covers) + is False + ) + upload_mock.assert_not_awaited() + + @pytest.mark.asyncio + async def test_lifecycle_anonymous_user_skips_upload(self): + """Anonymous (user_id=None) → upload gate must return False.""" + builder = TranscriptBuilder() + builder.append_user(content="hi") + builder.append_assistant( + content_blocks=[{"type": "text", "text": "hello"}], + model="test-model", + stop_reason=STOP_REASON_END_TURN, + ) + + assert ( + should_upload_transcript(user_id=None, transcript_covers_prefix=True) + is False + ) + + @pytest.mark.asyncio + async def test_lifecycle_missing_download_still_uploads_new_content(self): + """No prior transcript → covers defaults to True in the service, + new turn should upload cleanly.""" + builder = TranscriptBuilder() + upload_mock = AsyncMock(return_value=None) + with ( + patch( + "backend.copilot.baseline.service.download_transcript", + new=AsyncMock(return_value=None), + ), + patch( + "backend.copilot.baseline.service.upload_transcript", + new=upload_mock, + ), + ): + covers = await _load_prior_transcript( + user_id="user-1", + session_id="session-1", + session_msg_count=1, + transcript_builder=builder, + ) + # No download: covers is False, so the production path would + # skip upload. This protects against overwriting a future + # more-complete transcript with a single-turn snapshot. + assert covers is False + assert ( + should_upload_transcript( + user_id="user-1", transcript_covers_prefix=covers + ) + is False + ) + upload_mock.assert_not_awaited() diff --git a/autogpt_platform/backend/backend/copilot/config.py b/autogpt_platform/backend/backend/copilot/config.py index 6c271322a6..2db5c2f03f 100644 --- a/autogpt_platform/backend/backend/copilot/config.py +++ b/autogpt_platform/backend/backend/copilot/config.py @@ -8,13 +8,26 @@ from pydantic_settings import BaseSettings from backend.util.clients import OPENROUTER_BASE_URL +# Per-request routing mode for a single chat turn. +# - 'fast': route to the baseline OpenAI-compatible path with the cheaper model. +# - 'extended_thinking': route to the Claude Agent SDK path with the default +# (opus) model. +# ``None`` means "no override"; the server falls back to the Claude Code +# subscription flag → LaunchDarkly COPILOT_SDK → config.use_claude_agent_sdk. 
+CopilotMode = Literal["fast", "extended_thinking"] + class ChatConfig(BaseSettings): """Configuration for the chat system.""" # OpenAI API Configuration model: str = Field( - default="anthropic/claude-opus-4.6", description="Default model to use" + default="anthropic/claude-opus-4.6", + description="Default model for extended thinking mode", + ) + fast_model: str = Field( + default="anthropic/claude-sonnet-4", + description="Model for fast mode (baseline path). Should be faster/cheaper than the default model.", ) title_model: str = Field( default="openai/gpt-4o-mini", diff --git a/autogpt_platform/backend/backend/copilot/db.py b/autogpt_platform/backend/backend/copilot/db.py index f94f1d56c7..24d0e1a558 100644 --- a/autogpt_platform/backend/backend/copilot/db.py +++ b/autogpt_platform/backend/backend/copilot/db.py @@ -14,6 +14,7 @@ from prisma.types import ( ChatSessionUpdateInput, ChatSessionWhereInput, ) +from pydantic import BaseModel from backend.data import db from backend.util.json import SafeJson, sanitize_string @@ -23,12 +24,22 @@ from .model import ( ChatSession, ChatSessionInfo, ChatSessionMetadata, - invalidate_session_cache, + cache_chat_session, ) +from .model import get_chat_session as get_chat_session_cached logger = logging.getLogger(__name__) +class PaginatedMessages(BaseModel): + """Result of a paginated message query.""" + + messages: list[ChatMessage] + has_more: bool + oldest_sequence: int | None + session: ChatSessionInfo + + async def get_chat_session(session_id: str) -> ChatSession | None: """Get a chat session by ID from the database.""" session = await PrismaChatSession.prisma().find_unique( @@ -38,6 +49,116 @@ async def get_chat_session(session_id: str) -> ChatSession | None: return ChatSession.from_db(session) if session else None +async def get_chat_session_metadata(session_id: str) -> ChatSessionInfo | None: + """Get chat session metadata (without messages) for ownership validation.""" + session = await PrismaChatSession.prisma().find_unique( + where={"id": session_id}, + ) + return ChatSessionInfo.from_db(session) if session else None + + +async def get_chat_messages_paginated( + session_id: str, + limit: int = 50, + before_sequence: int | None = None, + user_id: str | None = None, +) -> PaginatedMessages | None: + """Get paginated messages for a session, newest first. + + Verifies session existence (and ownership when ``user_id`` is provided) + in parallel with the message query. Returns ``None`` when the session + is not found or does not belong to the user. + + Args: + session_id: The chat session ID. + limit: Max messages to return. + before_sequence: Cursor — return messages with sequence < this value. + user_id: If provided, filters via ``Session.userId`` so only the + session owner's messages are returned (acts as an ownership guard). 
+ """ + # Build session-existence / ownership check + session_where: ChatSessionWhereInput = {"id": session_id} + if user_id is not None: + session_where["userId"] = user_id + + # Build message include — fetch paginated messages in the same query + msg_include: dict[str, Any] = { + "order_by": {"sequence": "desc"}, + "take": limit + 1, + } + if before_sequence is not None: + msg_include["where"] = {"sequence": {"lt": before_sequence}} + + # Single query: session existence/ownership + paginated messages + session = await PrismaChatSession.prisma().find_first( + where=session_where, + include={"Messages": msg_include}, + ) + + if session is None: + return None + + session_info = ChatSessionInfo.from_db(session) + results = list(session.Messages) if session.Messages else [] + + has_more = len(results) > limit + results = results[:limit] + + # Reverse to ascending order + results.reverse() + + # Tool-call boundary fix: if the oldest message is a tool message, + # expand backward to include the preceding assistant message that + # owns the tool_calls, so convertChatSessionMessagesToUiMessages + # can pair them correctly. + _BOUNDARY_SCAN_LIMIT = 10 + if results and results[0].role == "tool": + boundary_where: dict[str, Any] = { + "sessionId": session_id, + "sequence": {"lt": results[0].sequence}, + } + if user_id is not None: + boundary_where["Session"] = {"is": {"userId": user_id}} + extra = await PrismaChatMessage.prisma().find_many( + where=boundary_where, + order={"sequence": "desc"}, + take=_BOUNDARY_SCAN_LIMIT, + ) + # Find the first non-tool message (should be the assistant) + boundary_msgs = [] + found_owner = False + for msg in extra: + boundary_msgs.append(msg) + if msg.role != "tool": + found_owner = True + break + boundary_msgs.reverse() + if not found_owner: + logger.warning( + "Boundary expansion did not find owning assistant message " + "for session=%s before sequence=%s (%d msgs scanned)", + session_id, + results[0].sequence, + len(extra), + ) + if boundary_msgs: + results = boundary_msgs + results + # Only mark has_more if the expanded boundary isn't the + # very start of the conversation (sequence 0). + if boundary_msgs[0].sequence > 0: + has_more = True + + messages = [ChatMessage.from_db(m) for m in results] + oldest_sequence = messages[0].sequence if messages else None + + return PaginatedMessages( + messages=messages, + has_more=has_more, + oldest_sequence=oldest_sequence, + session=session_info, + ) + + async def create_chat_session( session_id: str, user_id: str, @@ -386,8 +507,11 @@ async def update_tool_message_content( async def set_turn_duration(session_id: str, duration_ms: int) -> None: """Set durationMs on the last assistant message in a session. - Also invalidates the Redis session cache so the next GET returns - the updated duration. + Updates the Redis cache in-place instead of invalidating it. + Invalidation would delete the key, creating a window where concurrent + ``get_chat_session`` calls re-populate the cache from DB — potentially + with stale data if the DB write from the previous turn hasn't propagated. + This race caused duplicate user messages on the next turn. 
""" last_msg = await PrismaChatMessage.prisma().find_first( where={"sessionId": session_id, "role": "assistant"}, @@ -398,5 +522,13 @@ async def set_turn_duration(session_id: str, duration_ms: int) -> None: where={"id": last_msg.id}, data={"durationMs": duration_ms}, ) - # Invalidate cache so the session is re-fetched from DB with durationMs - await invalidate_session_cache(session_id) + # Update cache in-place rather than invalidating to avoid a + # race window where the empty cache gets re-populated with + # stale data by a concurrent get_chat_session call. + session = await get_chat_session_cached(session_id) + if session and session.messages: + for msg in reversed(session.messages): + if msg.role == "assistant": + msg.duration_ms = duration_ms + break + await cache_chat_session(session) diff --git a/autogpt_platform/backend/backend/copilot/db_test.py b/autogpt_platform/backend/backend/copilot/db_test.py new file mode 100644 index 0000000000..27fa788702 --- /dev/null +++ b/autogpt_platform/backend/backend/copilot/db_test.py @@ -0,0 +1,388 @@ +"""Unit tests for copilot.db — paginated message queries.""" + +from __future__ import annotations + +from datetime import UTC, datetime +from typing import Any +from unittest.mock import AsyncMock, patch + +import pytest +from prisma.models import ChatMessage as PrismaChatMessage +from prisma.models import ChatSession as PrismaChatSession + +from backend.copilot.db import ( + PaginatedMessages, + get_chat_messages_paginated, + set_turn_duration, +) +from backend.copilot.model import ChatMessage as CopilotChatMessage +from backend.copilot.model import ChatSession, get_chat_session, upsert_chat_session + + +def _make_msg( + sequence: int, + role: str = "assistant", + content: str | None = "hello", + tool_calls: Any = None, +) -> PrismaChatMessage: + """Build a minimal PrismaChatMessage for testing.""" + return PrismaChatMessage( + id=f"msg-{sequence}", + createdAt=datetime.now(UTC), + sessionId="sess-1", + role=role, + content=content, + sequence=sequence, + toolCalls=tool_calls, + name=None, + toolCallId=None, + refusal=None, + functionCall=None, + ) + + +def _make_session( + session_id: str = "sess-1", + user_id: str = "user-1", + messages: list[PrismaChatMessage] | None = None, +) -> PrismaChatSession: + """Build a minimal PrismaChatSession for testing.""" + now = datetime.now(UTC) + session = PrismaChatSession.model_construct( + id=session_id, + createdAt=now, + updatedAt=now, + userId=user_id, + credentials={}, + successfulAgentRuns={}, + successfulAgentSchedules={}, + totalPromptTokens=0, + totalCompletionTokens=0, + title=None, + metadata={}, + Messages=messages or [], + ) + return session + + +SESSION_ID = "sess-1" + + +@pytest.fixture() +def mock_db(): + """Patch ChatSession.prisma().find_first and ChatMessage.prisma().find_many. + + find_first is used for the main query (session + included messages). + find_many is used only for boundary expansion queries. 
+ """ + with ( + patch.object(PrismaChatSession, "prisma") as mock_session_prisma, + patch.object(PrismaChatMessage, "prisma") as mock_msg_prisma, + ): + find_first = AsyncMock() + mock_session_prisma.return_value.find_first = find_first + + find_many = AsyncMock(return_value=[]) + mock_msg_prisma.return_value.find_many = find_many + + yield find_first, find_many + + +# ---------- Basic pagination ---------- + + +@pytest.mark.asyncio +async def test_basic_page_returns_messages_ascending( + mock_db: tuple[AsyncMock, AsyncMock], +): + """Messages are returned in ascending sequence order.""" + find_first, _ = mock_db + find_first.return_value = _make_session( + messages=[_make_msg(3), _make_msg(2), _make_msg(1)], + ) + + page = await get_chat_messages_paginated(SESSION_ID, limit=5) + + assert isinstance(page, PaginatedMessages) + assert [m.sequence for m in page.messages] == [1, 2, 3] + assert page.has_more is False + assert page.oldest_sequence == 1 + + +@pytest.mark.asyncio +async def test_has_more_when_results_exceed_limit( + mock_db: tuple[AsyncMock, AsyncMock], +): + """has_more is True when DB returns more than limit items.""" + find_first, _ = mock_db + find_first.return_value = _make_session( + messages=[_make_msg(3), _make_msg(2), _make_msg(1)], + ) + + page = await get_chat_messages_paginated(SESSION_ID, limit=2) + + assert page is not None + assert page.has_more is True + assert len(page.messages) == 2 + assert [m.sequence for m in page.messages] == [2, 3] + + +@pytest.mark.asyncio +async def test_empty_session_returns_no_messages( + mock_db: tuple[AsyncMock, AsyncMock], +): + find_first, _ = mock_db + find_first.return_value = _make_session(messages=[]) + + page = await get_chat_messages_paginated(SESSION_ID, limit=50) + + assert page is not None + assert page.messages == [] + assert page.has_more is False + assert page.oldest_sequence is None + + +@pytest.mark.asyncio +async def test_before_sequence_filters_correctly( + mock_db: tuple[AsyncMock, AsyncMock], +): + """before_sequence is passed as a where filter inside the Messages include.""" + find_first, _ = mock_db + find_first.return_value = _make_session( + messages=[_make_msg(2), _make_msg(1)], + ) + + await get_chat_messages_paginated(SESSION_ID, limit=50, before_sequence=5) + + call_kwargs = find_first.call_args + include = call_kwargs.kwargs.get("include") or call_kwargs[1].get("include") + assert include["Messages"]["where"] == {"sequence": {"lt": 5}} + + +@pytest.mark.asyncio +async def test_no_where_on_messages_without_before_sequence( + mock_db: tuple[AsyncMock, AsyncMock], +): + """Without before_sequence, the Messages include has no where clause.""" + find_first, _ = mock_db + find_first.return_value = _make_session(messages=[_make_msg(1)]) + + await get_chat_messages_paginated(SESSION_ID, limit=50) + + call_kwargs = find_first.call_args + include = call_kwargs.kwargs.get("include") or call_kwargs[1].get("include") + assert "where" not in include["Messages"] + + +@pytest.mark.asyncio +async def test_user_id_filter_applied_to_session_where( + mock_db: tuple[AsyncMock, AsyncMock], +): + """user_id adds a userId filter to the session-level where clause.""" + find_first, _ = mock_db + find_first.return_value = _make_session(messages=[_make_msg(1)]) + + await get_chat_messages_paginated(SESSION_ID, limit=50, user_id="user-abc") + + call_kwargs = find_first.call_args + where = call_kwargs.kwargs.get("where") or call_kwargs[1].get("where") + assert where["userId"] == "user-abc" + + +@pytest.mark.asyncio +async def 
test_session_not_found_returns_none( + mock_db: tuple[AsyncMock, AsyncMock], +): + """Returns None when session doesn't exist or user doesn't own it.""" + find_first, _ = mock_db + find_first.return_value = None + + page = await get_chat_messages_paginated(SESSION_ID, limit=50) + + assert page is None + + +@pytest.mark.asyncio +async def test_session_info_included_in_result( + mock_db: tuple[AsyncMock, AsyncMock], +): + """PaginatedMessages includes session metadata.""" + find_first, _ = mock_db + find_first.return_value = _make_session(messages=[_make_msg(1)]) + + page = await get_chat_messages_paginated(SESSION_ID, limit=50) + + assert page is not None + assert page.session.session_id == SESSION_ID + + +# ---------- Backward boundary expansion ---------- + + +@pytest.mark.asyncio +async def test_boundary_expansion_includes_assistant( + mock_db: tuple[AsyncMock, AsyncMock], +): + """When page starts with a tool message, expand backward to include + the owning assistant message.""" + find_first, find_many = mock_db + find_first.return_value = _make_session( + messages=[_make_msg(5, role="tool"), _make_msg(4, role="tool")], + ) + find_many.return_value = [_make_msg(3, role="assistant")] + + page = await get_chat_messages_paginated(SESSION_ID, limit=5) + + assert page is not None + assert [m.sequence for m in page.messages] == [3, 4, 5] + assert page.messages[0].role == "assistant" + assert page.oldest_sequence == 3 + + +@pytest.mark.asyncio +async def test_boundary_expansion_includes_multiple_tool_msgs( + mock_db: tuple[AsyncMock, AsyncMock], +): + """Boundary expansion scans past consecutive tool messages to find + the owning assistant.""" + find_first, find_many = mock_db + find_first.return_value = _make_session( + messages=[_make_msg(7, role="tool")], + ) + find_many.return_value = [ + _make_msg(6, role="tool"), + _make_msg(5, role="tool"), + _make_msg(4, role="assistant"), + ] + + page = await get_chat_messages_paginated(SESSION_ID, limit=5) + + assert page is not None + assert [m.sequence for m in page.messages] == [4, 5, 6, 7] + assert page.messages[0].role == "assistant" + + +@pytest.mark.asyncio +async def test_boundary_expansion_sets_has_more_when_not_at_start( + mock_db: tuple[AsyncMock, AsyncMock], +): + """After boundary expansion, has_more=True if expanded msgs aren't at seq 0.""" + find_first, find_many = mock_db + find_first.return_value = _make_session( + messages=[_make_msg(3, role="tool")], + ) + find_many.return_value = [_make_msg(2, role="assistant")] + + page = await get_chat_messages_paginated(SESSION_ID, limit=5) + + assert page is not None + assert page.has_more is True + + +@pytest.mark.asyncio +async def test_boundary_expansion_no_has_more_at_conversation_start( + mock_db: tuple[AsyncMock, AsyncMock], +): + """has_more stays False when boundary expansion reaches seq 0.""" + find_first, find_many = mock_db + find_first.return_value = _make_session( + messages=[_make_msg(1, role="tool")], + ) + find_many.return_value = [_make_msg(0, role="assistant")] + + page = await get_chat_messages_paginated(SESSION_ID, limit=5) + + assert page is not None + assert page.has_more is False + assert page.oldest_sequence == 0 + + +@pytest.mark.asyncio +async def test_no_boundary_expansion_when_first_msg_not_tool( + mock_db: tuple[AsyncMock, AsyncMock], +): + """No boundary expansion when the first message is not a tool message.""" + find_first, find_many = mock_db + find_first.return_value = _make_session( + messages=[_make_msg(3, role="user"), _make_msg(2, role="assistant")], + ) + 
+ page = await get_chat_messages_paginated(SESSION_ID, limit=5) + + assert page is not None + assert find_many.call_count == 0 + assert [m.sequence for m in page.messages] == [2, 3] + + +@pytest.mark.asyncio +async def test_boundary_expansion_warns_when_no_owner_found( + mock_db: tuple[AsyncMock, AsyncMock], +): + """When boundary scan doesn't find a non-tool message, a warning is logged + and the boundary messages are still included.""" + find_first, find_many = mock_db + find_first.return_value = _make_session( + messages=[_make_msg(10, role="tool")], + ) + find_many.return_value = [_make_msg(i, role="tool") for i in range(9, -1, -1)] + + with patch("backend.copilot.db.logger") as mock_logger: + page = await get_chat_messages_paginated(SESSION_ID, limit=5) + mock_logger.warning.assert_called_once() + + assert page is not None + assert page.messages[0].role == "tool" + assert len(page.messages) > 1 + + +# ---------- Turn duration (integration tests) ---------- + + +@pytest.mark.asyncio(loop_scope="session") +async def test_set_turn_duration_updates_cache_in_place(setup_test_user, test_user_id): + """set_turn_duration patches the cached session without invalidation. + + Verifies that after calling set_turn_duration the Redis-cached session + reflects the updated durationMs on the last assistant message, without + the cache having been deleted and re-populated (which could race with + concurrent get_chat_session calls). + """ + session = ChatSession.new(user_id=test_user_id, dry_run=False) + session.messages = [ + CopilotChatMessage(role="user", content="hello"), + CopilotChatMessage(role="assistant", content="hi there"), + ] + session = await upsert_chat_session(session) + + # Ensure the session is in cache + cached = await get_chat_session(session.session_id, test_user_id) + assert cached is not None + assert cached.messages[-1].duration_ms is None + + # Update turn duration — should patch cache in-place + await set_turn_duration(session.session_id, 1234) + + # Read from cache (not DB) — the cache should already have the update + updated = await get_chat_session(session.session_id, test_user_id) + assert updated is not None + assistant_msgs = [m for m in updated.messages if m.role == "assistant"] + assert len(assistant_msgs) == 1 + assert assistant_msgs[0].duration_ms == 1234 + + +@pytest.mark.asyncio(loop_scope="session") +async def test_set_turn_duration_no_assistant_message(setup_test_user, test_user_id): + """set_turn_duration is a no-op when there are no assistant messages.""" + session = ChatSession.new(user_id=test_user_id, dry_run=False) + session.messages = [ + CopilotChatMessage(role="user", content="hello"), + ] + session = await upsert_chat_session(session) + + # Should not raise + await set_turn_duration(session.session_id, 5678) + + cached = await get_chat_session(session.session_id, test_user_id) + assert cached is not None + # User message should not have durationMs + assert cached.messages[0].duration_ms is None diff --git a/autogpt_platform/backend/backend/copilot/executor/processor.py b/autogpt_platform/backend/backend/copilot/executor/processor.py index c111cd6df7..f94821f0e1 100644 --- a/autogpt_platform/backend/backend/copilot/executor/processor.py +++ b/autogpt_platform/backend/backend/copilot/executor/processor.py @@ -13,7 +13,7 @@ import time from backend.copilot import stream_registry from backend.copilot.baseline import stream_chat_completion_baseline -from backend.copilot.config import ChatConfig +from backend.copilot.config import ChatConfig, CopilotMode from 
backend.copilot.response_model import StreamError from backend.copilot.sdk import service as sdk_service from backend.copilot.sdk.dummy import stream_chat_completion_dummy @@ -30,6 +30,57 @@ from .utils import CoPilotExecutionEntry, CoPilotLogMetadata logger = TruncatedLogger(logging.getLogger(__name__), prefix="[CoPilotExecutor]") +# ============ Mode Routing ============ # + + +async def resolve_effective_mode( + mode: CopilotMode | None, + user_id: str | None, +) -> CopilotMode | None: + """Strip ``mode`` when the user is not entitled to the toggle. + + The UI gates the mode toggle behind ``CHAT_MODE_OPTION``; the + processor enforces the same gate server-side so an authenticated + user cannot bypass the flag by crafting a request directly. + """ + if mode is None: + return None + allowed = await is_feature_enabled( + Flag.CHAT_MODE_OPTION, + user_id or "anonymous", + default=False, + ) + if not allowed: + logger.info(f"Ignoring mode={mode} — CHAT_MODE_OPTION is disabled for user") + return None + return mode + + +async def resolve_use_sdk_for_mode( + mode: CopilotMode | None, + user_id: str | None, + *, + use_claude_code_subscription: bool, + config_default: bool, +) -> bool: + """Pick the SDK vs baseline path for a single turn. + + Per-request ``mode`` wins whenever it is set (after the + ``CHAT_MODE_OPTION`` gate has been applied upstream). Otherwise + falls back to the Claude Code subscription override, then the + ``COPILOT_SDK`` LaunchDarkly flag, then the config default. + """ + if mode == "fast": + return False + if mode == "extended_thinking": + return True + return use_claude_code_subscription or await is_feature_enabled( + Flag.COPILOT_SDK, + user_id or "anonymous", + default=config_default, + ) + + # ============ Module Entry Points ============ # # Thread-local storage for processor instances @@ -250,21 +301,26 @@ class CoPilotProcessor: if config.test_mode: stream_fn = stream_chat_completion_dummy log.warning("Using DUMMY service (CHAT_TEST_MODE=true)") + effective_mode = None else: - use_sdk = ( - config.use_claude_code_subscription - or await is_feature_enabled( - Flag.COPILOT_SDK, - entry.user_id or "anonymous", - default=config.use_claude_agent_sdk, - ) + # Enforce server-side feature-flag gate so unauthorised + # users cannot force a mode by crafting the request. + effective_mode = await resolve_effective_mode(entry.mode, entry.user_id) + use_sdk = await resolve_use_sdk_for_mode( + effective_mode, + entry.user_id, + use_claude_code_subscription=config.use_claude_code_subscription, + config_default=config.use_claude_agent_sdk, ) stream_fn = ( sdk_service.stream_chat_completion_sdk if use_sdk else stream_chat_completion_baseline ) - log.info(f"Using {'SDK' if use_sdk else 'baseline'} service") + log.info( + f"Using {'SDK' if use_sdk else 'baseline'} service " + f"(mode={effective_mode or 'default'})" + ) # Stream chat completion and publish chunks to Redis. 
# stream_and_publish wraps the raw stream with registry @@ -276,6 +332,7 @@ class CoPilotProcessor: user_id=entry.user_id, context=entry.context, file_ids=entry.file_ids, + mode=effective_mode, ) async for chunk in stream_registry.stream_and_publish( session_id=entry.session_id, diff --git a/autogpt_platform/backend/backend/copilot/executor/processor_test.py b/autogpt_platform/backend/backend/copilot/executor/processor_test.py new file mode 100644 index 0000000000..f565c5a2b3 --- /dev/null +++ b/autogpt_platform/backend/backend/copilot/executor/processor_test.py @@ -0,0 +1,175 @@ +"""Unit tests for CoPilot mode routing logic in the processor. + +Tests cover the mode→service mapping: + - 'fast' → baseline service + - 'extended_thinking' → SDK service + - None → feature flag / config fallback + +as well as the ``CHAT_MODE_OPTION`` server-side gate. The tests import +the real production helpers from ``processor.py`` so the routing logic +has meaningful coverage. +""" + +from unittest.mock import AsyncMock, patch + +import pytest + +from backend.copilot.executor.processor import ( + resolve_effective_mode, + resolve_use_sdk_for_mode, +) + + +class TestResolveUseSdkForMode: + """Tests for the per-request mode routing logic.""" + + @pytest.mark.asyncio + async def test_fast_mode_uses_baseline(self): + """mode='fast' always routes to baseline, regardless of flags.""" + with patch( + "backend.copilot.executor.processor.is_feature_enabled", + new=AsyncMock(return_value=True), + ): + assert ( + await resolve_use_sdk_for_mode( + "fast", + "user-1", + use_claude_code_subscription=True, + config_default=True, + ) + is False + ) + + @pytest.mark.asyncio + async def test_extended_thinking_uses_sdk(self): + """mode='extended_thinking' always routes to SDK, regardless of flags.""" + with patch( + "backend.copilot.executor.processor.is_feature_enabled", + new=AsyncMock(return_value=False), + ): + assert ( + await resolve_use_sdk_for_mode( + "extended_thinking", + "user-1", + use_claude_code_subscription=False, + config_default=False, + ) + is True + ) + + @pytest.mark.asyncio + async def test_none_mode_uses_subscription_override(self): + """mode=None with claude_code_subscription=True routes to SDK.""" + with patch( + "backend.copilot.executor.processor.is_feature_enabled", + new=AsyncMock(return_value=False), + ): + assert ( + await resolve_use_sdk_for_mode( + None, + "user-1", + use_claude_code_subscription=True, + config_default=False, + ) + is True + ) + + @pytest.mark.asyncio + async def test_none_mode_uses_feature_flag(self): + """mode=None with feature flag enabled routes to SDK.""" + with patch( + "backend.copilot.executor.processor.is_feature_enabled", + new=AsyncMock(return_value=True), + ) as flag_mock: + assert ( + await resolve_use_sdk_for_mode( + None, + "user-1", + use_claude_code_subscription=False, + config_default=False, + ) + is True + ) + flag_mock.assert_awaited_once() + + @pytest.mark.asyncio + async def test_none_mode_uses_config_default(self): + """mode=None falls back to config.use_claude_agent_sdk.""" + # When LaunchDarkly returns the default (True), we expect SDK routing. 
+ with patch( + "backend.copilot.executor.processor.is_feature_enabled", + new=AsyncMock(return_value=True), + ): + assert ( + await resolve_use_sdk_for_mode( + None, + "user-1", + use_claude_code_subscription=False, + config_default=True, + ) + is True + ) + + @pytest.mark.asyncio + async def test_none_mode_all_disabled(self): + """mode=None with all flags off routes to baseline.""" + with patch( + "backend.copilot.executor.processor.is_feature_enabled", + new=AsyncMock(return_value=False), + ): + assert ( + await resolve_use_sdk_for_mode( + None, + "user-1", + use_claude_code_subscription=False, + config_default=False, + ) + is False + ) + + +class TestResolveEffectiveMode: + """Tests for the CHAT_MODE_OPTION server-side gate.""" + + @pytest.mark.asyncio + async def test_none_mode_passes_through(self): + """mode=None is returned as-is without a flag check.""" + with patch( + "backend.copilot.executor.processor.is_feature_enabled", + new=AsyncMock(return_value=False), + ) as flag_mock: + assert await resolve_effective_mode(None, "user-1") is None + flag_mock.assert_not_awaited() + + @pytest.mark.asyncio + async def test_mode_stripped_when_flag_disabled(self): + """When CHAT_MODE_OPTION is off, mode is dropped to None.""" + with patch( + "backend.copilot.executor.processor.is_feature_enabled", + new=AsyncMock(return_value=False), + ): + assert await resolve_effective_mode("fast", "user-1") is None + assert await resolve_effective_mode("extended_thinking", "user-1") is None + + @pytest.mark.asyncio + async def test_mode_preserved_when_flag_enabled(self): + """When CHAT_MODE_OPTION is on, the user-selected mode is preserved.""" + with patch( + "backend.copilot.executor.processor.is_feature_enabled", + new=AsyncMock(return_value=True), + ): + assert await resolve_effective_mode("fast", "user-1") == "fast" + assert ( + await resolve_effective_mode("extended_thinking", "user-1") + == "extended_thinking" + ) + + @pytest.mark.asyncio + async def test_anonymous_user_with_mode(self): + """Anonymous users (user_id=None) still pass through the gate.""" + with patch( + "backend.copilot.executor.processor.is_feature_enabled", + new=AsyncMock(return_value=False), + ) as flag_mock: + assert await resolve_effective_mode("fast", None) is None + flag_mock.assert_awaited_once() diff --git a/autogpt_platform/backend/backend/copilot/executor/utils.py b/autogpt_platform/backend/backend/copilot/executor/utils.py index 9edd90b462..2a25c202fe 100644 --- a/autogpt_platform/backend/backend/copilot/executor/utils.py +++ b/autogpt_platform/backend/backend/copilot/executor/utils.py @@ -9,6 +9,7 @@ import logging from pydantic import BaseModel +from backend.copilot.config import CopilotMode from backend.data.rabbitmq import Exchange, ExchangeType, Queue, RabbitMQConfig from backend.util.logging import TruncatedLogger, is_structured_logging_enabled @@ -162,6 +163,9 @@ class CoPilotExecutionEntry(BaseModel): team_id: str | None = None """Active workspace for tenant-scoped execution""" + mode: CopilotMode | None = None + """Autopilot mode override: 'fast' or 'extended_thinking'. None = server default.""" + class CancelCoPilotEvent(BaseModel): """Event to cancel a CoPilot operation.""" @@ -183,6 +187,7 @@ async def enqueue_copilot_turn( file_ids: list[str] | None = None, organization_id: str | None = None, team_id: str | None = None, + mode: CopilotMode | None = None, ) -> None: """Enqueue a CoPilot task for processing by the executor service. 
@@ -194,6 +199,7 @@ async def enqueue_copilot_turn( is_user_message: Whether the message is from the user (vs system/assistant) context: Optional context for the message (e.g., {url: str, content: str}) file_ids: Optional workspace file IDs attached to the user's message + mode: Autopilot mode override ('fast' or 'extended_thinking'). None = server default. """ from backend.util.clients import get_async_copilot_queue @@ -207,6 +213,7 @@ async def enqueue_copilot_turn( file_ids=file_ids, organization_id=organization_id, team_id=team_id, + mode=mode, ) queue_client = await get_async_copilot_queue() diff --git a/autogpt_platform/backend/backend/copilot/executor/utils_test.py b/autogpt_platform/backend/backend/copilot/executor/utils_test.py new file mode 100644 index 0000000000..47602551ba --- /dev/null +++ b/autogpt_platform/backend/backend/copilot/executor/utils_test.py @@ -0,0 +1,123 @@ +"""Tests for CoPilot executor utils (queue config, message models, logging).""" + +from backend.copilot.executor.utils import ( + COPILOT_EXECUTION_EXCHANGE, + COPILOT_EXECUTION_QUEUE_NAME, + COPILOT_EXECUTION_ROUTING_KEY, + CancelCoPilotEvent, + CoPilotExecutionEntry, + CoPilotLogMetadata, + create_copilot_queue_config, +) + + +class TestCoPilotExecutionEntry: + def test_basic_fields(self): + entry = CoPilotExecutionEntry( + session_id="s1", + user_id="u1", + message="hello", + ) + assert entry.session_id == "s1" + assert entry.user_id == "u1" + assert entry.message == "hello" + assert entry.is_user_message is True + assert entry.mode is None + assert entry.context is None + assert entry.file_ids is None + + def test_mode_field(self): + entry = CoPilotExecutionEntry( + session_id="s1", + user_id="u1", + message="test", + mode="fast", + ) + assert entry.mode == "fast" + + entry2 = CoPilotExecutionEntry( + session_id="s1", + user_id="u1", + message="test", + mode="extended_thinking", + ) + assert entry2.mode == "extended_thinking" + + def test_optional_fields(self): + entry = CoPilotExecutionEntry( + session_id="s1", + user_id="u1", + message="test", + turn_id="t1", + context={"url": "https://example.com"}, + file_ids=["f1", "f2"], + is_user_message=False, + ) + assert entry.turn_id == "t1" + assert entry.context == {"url": "https://example.com"} + assert entry.file_ids == ["f1", "f2"] + assert entry.is_user_message is False + + def test_serialization_roundtrip(self): + entry = CoPilotExecutionEntry( + session_id="s1", + user_id="u1", + message="hello", + mode="fast", + ) + json_str = entry.model_dump_json() + restored = CoPilotExecutionEntry.model_validate_json(json_str) + assert restored == entry + + +class TestCancelCoPilotEvent: + def test_basic(self): + event = CancelCoPilotEvent(session_id="s1") + assert event.session_id == "s1" + + def test_serialization(self): + event = CancelCoPilotEvent(session_id="s1") + restored = CancelCoPilotEvent.model_validate_json(event.model_dump_json()) + assert restored.session_id == "s1" + + +class TestCreateCopilotQueueConfig: + def test_returns_valid_config(self): + config = create_copilot_queue_config() + assert len(config.exchanges) == 2 + assert len(config.queues) == 2 + + def test_execution_queue_properties(self): + config = create_copilot_queue_config() + exec_queue = next( + q for q in config.queues if q.name == COPILOT_EXECUTION_QUEUE_NAME + ) + assert exec_queue.durable is True + assert exec_queue.exchange == COPILOT_EXECUTION_EXCHANGE + assert exec_queue.routing_key == COPILOT_EXECUTION_ROUTING_KEY + + def test_cancel_queue_uses_fanout(self): + config = 
create_copilot_queue_config() + cancel_queue = next( + q for q in config.queues if q.name != COPILOT_EXECUTION_QUEUE_NAME + ) + assert cancel_queue.exchange is not None + assert cancel_queue.exchange.type.value == "fanout" + + +class TestCoPilotLogMetadata: + def test_creates_logger_with_metadata(self): + import logging + + base_logger = logging.getLogger("test") + log = CoPilotLogMetadata(base_logger, session_id="s1", user_id="u1") + assert log is not None + + def test_filters_none_values(self): + import logging + + base_logger = logging.getLogger("test") + log = CoPilotLogMetadata( + base_logger, session_id="s1", user_id=None, turn_id="t1" + ) + assert log is not None diff --git a/autogpt_platform/backend/backend/copilot/model.py b/autogpt_platform/backend/backend/copilot/model.py index 9afc380d68..9bb7964b93 100644 --- a/autogpt_platform/backend/backend/copilot/model.py +++ b/autogpt_platform/backend/backend/copilot/model.py @@ -64,6 +64,7 @@ class ChatMessage(BaseModel): refusal: str | None = None tool_calls: list[dict] | None = None function_call: dict | None = None + sequence: int | None = None duration_ms: int | None = None @staticmethod @@ -77,10 +78,54 @@ class ChatMessage(BaseModel): refusal=prisma_message.refusal, tool_calls=_parse_json_field(prisma_message.toolCalls), function_call=_parse_json_field(prisma_message.functionCall), + sequence=prisma_message.sequence, duration_ms=prisma_message.durationMs, ) +def is_message_duplicate( + messages: list[ChatMessage], + role: str, + content: str, +) -> bool: + """Check whether *content* is already present in the current pending turn. + + Only inspects trailing messages that share the given *role* (i.e. the + current turn). This ensures legitimately repeated messages across different + turns are not suppressed, while same-turn duplicates from stale cache are + still caught. + """ + for m in reversed(messages): + if m.role == role: + if m.content == content: + return True + else: + break + return False + + +def maybe_append_user_message( + session: "ChatSession", + message: str | None, + is_user_message: bool, +) -> bool: + """Append a user/assistant message to the session if not already present. + + The route handler already persists the user message before enqueueing, + so we check trailing same-role messages to avoid re-appending when the + session cache is slightly stale. + + Returns True if the message was appended, False if skipped. 
+ """ + if not message: + return False + role = "user" if is_user_message else "assistant" + if is_message_duplicate(session.messages, role, message): + return False + session.messages.append(ChatMessage(role=role, content=message)) + return True + + class Usage(BaseModel): prompt_tokens: int completion_tokens: int diff --git a/autogpt_platform/backend/backend/copilot/model_test.py b/autogpt_platform/backend/backend/copilot/model_test.py index 6e748d9c6d..c78d63cc5a 100644 --- a/autogpt_platform/backend/backend/copilot/model_test.py +++ b/autogpt_platform/backend/backend/copilot/model_test.py @@ -17,6 +17,8 @@ from .model import ( ChatSession, Usage, get_chat_session, + is_message_duplicate, + maybe_append_user_message, upsert_chat_session, ) @@ -424,3 +426,151 @@ async def test_concurrent_saves_collision_detection(setup_test_user, test_user_i assert "Streaming message 1" in contents assert "Streaming message 2" in contents assert "Callback result" in contents + + +# --------------------------------------------------------------------------- # +# is_message_duplicate # +# --------------------------------------------------------------------------- # + + +def test_duplicate_detected_in_trailing_same_role(): + """Duplicate user message at the tail is detected.""" + msgs = [ + ChatMessage(role="user", content="hello"), + ChatMessage(role="assistant", content="hi there"), + ChatMessage(role="user", content="yes"), + ] + assert is_message_duplicate(msgs, "user", "yes") is True + + +def test_duplicate_not_detected_across_turns(): + """Same text in a previous turn (separated by assistant) is NOT a duplicate.""" + msgs = [ + ChatMessage(role="user", content="yes"), + ChatMessage(role="assistant", content="ok"), + ] + assert is_message_duplicate(msgs, "user", "yes") is False + + +def test_no_duplicate_on_empty_messages(): + """Empty message list never reports a duplicate.""" + assert is_message_duplicate([], "user", "hello") is False + + +def test_no_duplicate_when_content_differs(): + """Different content in the trailing same-role block is not a duplicate.""" + msgs = [ + ChatMessage(role="assistant", content="response"), + ChatMessage(role="user", content="first message"), + ] + assert is_message_duplicate(msgs, "user", "second message") is False + + +def test_duplicate_with_multiple_trailing_same_role(): + """Detects duplicate among multiple consecutive same-role messages.""" + msgs = [ + ChatMessage(role="assistant", content="response"), + ChatMessage(role="user", content="msg1"), + ChatMessage(role="user", content="msg2"), + ] + assert is_message_duplicate(msgs, "user", "msg1") is True + assert is_message_duplicate(msgs, "user", "msg2") is True + assert is_message_duplicate(msgs, "user", "msg3") is False + + +def test_duplicate_check_for_assistant_role(): + """Works correctly when checking assistant role too.""" + msgs = [ + ChatMessage(role="user", content="hi"), + ChatMessage(role="assistant", content="hello"), + ChatMessage(role="assistant", content="how can I help?"), + ] + assert is_message_duplicate(msgs, "assistant", "hello") is True + assert is_message_duplicate(msgs, "assistant", "new response") is False + + +def test_no_false_positive_when_content_is_none(): + """Messages with content=None in the trailing block do not match.""" + msgs = [ + ChatMessage(role="user", content=None), + ChatMessage(role="user", content="hello"), + ] + assert is_message_duplicate(msgs, "user", "hello") is True + # None-content message should not match any string + msgs2 = [ + ChatMessage(role="user", 
content=None), + ] + assert is_message_duplicate(msgs2, "user", "hello") is False + + +def test_all_same_role_messages(): + """When all messages share the same role, the entire list is scanned.""" + msgs = [ + ChatMessage(role="user", content="first"), + ChatMessage(role="user", content="second"), + ChatMessage(role="user", content="third"), + ] + assert is_message_duplicate(msgs, "user", "first") is True + assert is_message_duplicate(msgs, "user", "new") is False + + +# --------------------------------------------------------------------------- # +# maybe_append_user_message # +# --------------------------------------------------------------------------- # + + +def test_maybe_append_user_message_appends_new(): + """A new user message is appended and returns True.""" + session = ChatSession.new(user_id="u", dry_run=False) + session.messages = [ + ChatMessage(role="assistant", content="hello"), + ] + result = maybe_append_user_message(session, "new msg", is_user_message=True) + assert result is True + assert len(session.messages) == 2 + assert session.messages[-1].role == "user" + assert session.messages[-1].content == "new msg" + + +def test_maybe_append_user_message_skips_duplicate(): + """A duplicate user message is skipped and returns False.""" + session = ChatSession.new(user_id="u", dry_run=False) + session.messages = [ + ChatMessage(role="assistant", content="hello"), + ChatMessage(role="user", content="dup"), + ] + result = maybe_append_user_message(session, "dup", is_user_message=True) + assert result is False + assert len(session.messages) == 2 + + +def test_maybe_append_user_message_none_message(): + """None/empty message returns False without appending.""" + session = ChatSession.new(user_id="u", dry_run=False) + assert maybe_append_user_message(session, None, is_user_message=True) is False + assert maybe_append_user_message(session, "", is_user_message=True) is False + assert len(session.messages) == 0 + + +def test_maybe_append_assistant_message(): + """Works for assistant role when is_user_message=False.""" + session = ChatSession.new(user_id="u", dry_run=False) + session.messages = [ + ChatMessage(role="user", content="hi"), + ] + result = maybe_append_user_message(session, "response", is_user_message=False) + assert result is True + assert session.messages[-1].role == "assistant" + assert session.messages[-1].content == "response" + + +def test_maybe_append_assistant_skips_duplicate(): + """Duplicate assistant message is skipped.""" + session = ChatSession.new(user_id="u", dry_run=False) + session.messages = [ + ChatMessage(role="user", content="hi"), + ChatMessage(role="assistant", content="dup"), + ] + result = maybe_append_user_message(session, "dup", is_user_message=False) + assert result is False + assert len(session.messages) == 2 diff --git a/autogpt_platform/backend/backend/copilot/prompting.py b/autogpt_platform/backend/backend/copilot/prompting.py index 2c95c1721b..dd630a2e9b 100644 --- a/autogpt_platform/backend/backend/copilot/prompting.py +++ b/autogpt_platform/backend/backend/copilot/prompting.py @@ -126,6 +126,21 @@ After building the file, reference it with `@@agptfile:` in other tools: - When spawning sub-agents for research, ensure each has a distinct non-overlapping scope to avoid redundant searches. + +### Tool Discovery Priority + +When the user asks to interact with a service or API, follow this order: + +1. **find_block first** — Search platform blocks with `find_block`. 
The platform has hundreds of built-in blocks (Google Sheets, Docs, Calendar, Gmail, Slack, GitHub, etc.) that work without extra setup. + +2. **run_mcp_tool** — If no matching block exists, check if a hosted MCP server is available for the service. Only use known MCP server URLs from the registry. + +3. **SendAuthenticatedWebRequestBlock** — If no block or MCP server exists, use `SendAuthenticatedWebRequestBlock` with existing host-scoped credentials. Check available credentials via `connect_integration`. + +4. **Manual API call** — As a last resort, guide the user to set up credentials and use `SendAuthenticatedWebRequestBlock` with direct API calls. + +**Never skip step 1.** Built-in blocks are more reliable, tested, and user-friendly than MCP or raw API calls. + ### Sub-agent tasks - When using the Task tool, NEVER set `run_in_background` to true. All tasks must run in the foreground. diff --git a/autogpt_platform/backend/backend/copilot/rate_limit_test.py b/autogpt_platform/backend/backend/copilot/rate_limit_test.py index 6daca40175..6a4416148c 100644 --- a/autogpt_platform/backend/backend/copilot/rate_limit_test.py +++ b/autogpt_platform/backend/backend/copilot/rate_limit_test.py @@ -13,12 +13,21 @@ from .rate_limit import ( RateLimitExceeded, SubscriptionTier, UsageWindow, + _daily_key, + _daily_reset_time, + _weekly_key, + _weekly_reset_time, + acquire_reset_lock, check_rate_limit, + get_daily_reset_count, get_global_rate_limits, get_usage_status, get_user_tier, + increment_daily_reset_count, record_token_usage, + release_reset_lock, reset_daily_usage, + reset_user_usage, set_user_tier, ) @@ -1210,3 +1219,205 @@ class TestTierLimitsEnforced: assert daily == biz_daily # 20x # Should NOT raise — usage is within the BUSINESS tier allowance await check_rate_limit(_USER, daily, weekly) + + +# --------------------------------------------------------------------------- +# Private key/reset helpers +# --------------------------------------------------------------------------- + + +class TestKeyHelpers: + def test_daily_key_format(self): + now = datetime(2026, 4, 3, 12, 0, 0, tzinfo=UTC) + key = _daily_key("user-1", now=now) + assert "daily" in key + assert "user-1" in key + assert "2026-04-03" in key + + def test_daily_key_defaults_to_now(self): + key = _daily_key("user-1") + assert "daily" in key + assert "user-1" in key + + def test_weekly_key_format(self): + now = datetime(2026, 4, 3, 12, 0, 0, tzinfo=UTC) + key = _weekly_key("user-1", now=now) + assert "weekly" in key + assert "user-1" in key + assert "2026-W" in key + + def test_weekly_key_defaults_to_now(self): + key = _weekly_key("user-1") + assert "weekly" in key + + def test_daily_reset_time_is_next_midnight(self): + now = datetime(2026, 4, 3, 15, 30, 0, tzinfo=UTC) + reset = _daily_reset_time(now=now) + assert reset == datetime(2026, 4, 4, 0, 0, 0, tzinfo=UTC) + + def test_daily_reset_time_defaults_to_now(self): + reset = _daily_reset_time() + assert reset.hour == 0 + assert reset.minute == 0 + + def test_weekly_reset_time_is_next_monday(self): + # 2026-04-03 is a Friday + now = datetime(2026, 4, 3, 15, 30, 0, tzinfo=UTC) + reset = _weekly_reset_time(now=now) + assert reset.weekday() == 0 # Monday + assert reset == datetime(2026, 4, 6, 0, 0, 0, tzinfo=UTC) + + def test_weekly_reset_time_defaults_to_now(self): + reset = _weekly_reset_time() + assert reset.weekday() == 0 # Monday + + +# --------------------------------------------------------------------------- +# acquire_reset_lock / release_reset_lock +# 
--------------------------------------------------------------------------- + + +class TestResetLock: + @pytest.mark.asyncio + async def test_acquire_lock_success(self): + mock_redis = AsyncMock() + mock_redis.set = AsyncMock(return_value=True) + with patch( + "backend.copilot.rate_limit.get_redis_async", return_value=mock_redis + ): + result = await acquire_reset_lock("user-1") + assert result is True + + @pytest.mark.asyncio + async def test_acquire_lock_already_held(self): + mock_redis = AsyncMock() + mock_redis.set = AsyncMock(return_value=False) + with patch( + "backend.copilot.rate_limit.get_redis_async", return_value=mock_redis + ): + result = await acquire_reset_lock("user-1") + assert result is False + + @pytest.mark.asyncio + async def test_acquire_lock_redis_unavailable(self): + with patch( + "backend.copilot.rate_limit.get_redis_async", + side_effect=RedisError("down"), + ): + result = await acquire_reset_lock("user-1") + assert result is False + + @pytest.mark.asyncio + async def test_release_lock_success(self): + mock_redis = AsyncMock() + with patch( + "backend.copilot.rate_limit.get_redis_async", return_value=mock_redis + ): + await release_reset_lock("user-1") + mock_redis.delete.assert_called_once() + + @pytest.mark.asyncio + async def test_release_lock_redis_unavailable(self): + with patch( + "backend.copilot.rate_limit.get_redis_async", + side_effect=RedisError("down"), + ): + # Should not raise + await release_reset_lock("user-1") + + +# --------------------------------------------------------------------------- +# get_daily_reset_count / increment_daily_reset_count +# --------------------------------------------------------------------------- + + +class TestDailyResetCount: + @pytest.mark.asyncio + async def test_get_count_returns_value(self): + mock_redis = AsyncMock() + mock_redis.get = AsyncMock(return_value="3") + with patch( + "backend.copilot.rate_limit.get_redis_async", return_value=mock_redis + ): + count = await get_daily_reset_count("user-1") + assert count == 3 + + @pytest.mark.asyncio + async def test_get_count_returns_zero_when_no_key(self): + mock_redis = AsyncMock() + mock_redis.get = AsyncMock(return_value=None) + with patch( + "backend.copilot.rate_limit.get_redis_async", return_value=mock_redis + ): + count = await get_daily_reset_count("user-1") + assert count == 0 + + @pytest.mark.asyncio + async def test_get_count_returns_none_when_redis_unavailable(self): + with patch( + "backend.copilot.rate_limit.get_redis_async", + side_effect=RedisError("down"), + ): + count = await get_daily_reset_count("user-1") + assert count is None + + @pytest.mark.asyncio + async def test_increment_count(self): + mock_pipe = MagicMock() + mock_pipe.incr = MagicMock() + mock_pipe.expire = MagicMock() + mock_pipe.execute = AsyncMock() + + mock_redis = AsyncMock() + mock_redis.pipeline = MagicMock(return_value=mock_pipe) + + with patch( + "backend.copilot.rate_limit.get_redis_async", return_value=mock_redis + ): + await increment_daily_reset_count("user-1") + mock_pipe.incr.assert_called_once() + mock_pipe.expire.assert_called_once() + + @pytest.mark.asyncio + async def test_increment_count_redis_unavailable(self): + with patch( + "backend.copilot.rate_limit.get_redis_async", + side_effect=RedisError("down"), + ): + # Should not raise + await increment_daily_reset_count("user-1") + + +# --------------------------------------------------------------------------- +# reset_user_usage +# --------------------------------------------------------------------------- + + +class 
TestResetUserUsage: + @pytest.mark.asyncio + async def test_resets_daily_key(self): + mock_redis = AsyncMock() + with patch( + "backend.copilot.rate_limit.get_redis_async", return_value=mock_redis + ): + await reset_user_usage("user-1") + mock_redis.delete.assert_called_once() + + @pytest.mark.asyncio + async def test_resets_daily_and_weekly(self): + mock_redis = AsyncMock() + with patch( + "backend.copilot.rate_limit.get_redis_async", return_value=mock_redis + ): + await reset_user_usage("user-1", reset_weekly=True) + args = mock_redis.delete.call_args[0] + assert len(args) == 2 # both daily and weekly keys + + @pytest.mark.asyncio + async def test_raises_on_redis_failure(self): + with patch( + "backend.copilot.rate_limit.get_redis_async", + side_effect=RedisError("down"), + ): + with pytest.raises(RedisError): + await reset_user_usage("user-1") diff --git a/autogpt_platform/backend/backend/copilot/sdk/agent_generation_guide.md b/autogpt_platform/backend/backend/copilot/sdk/agent_generation_guide.md index cdb436429e..28b6f1c7dc 100644 --- a/autogpt_platform/backend/backend/copilot/sdk/agent_generation_guide.md +++ b/autogpt_platform/backend/backend/copilot/sdk/agent_generation_guide.md @@ -53,6 +53,12 @@ Steps: or fix manually based on the error descriptions. Iterate until valid. 8. **Save**: Call `create_agent` (new) or `edit_agent` (existing) with the final `agent_json` +9. **Dry-run**: ALWAYS call `run_agent` with `dry_run=True` and + `wait_for_result=120` to verify the agent works end-to-end. +10. **Inspect & fix**: Check the dry-run output for errors. If issues are + found, call `edit_agent` to fix and dry-run again. Repeat until the + simulation passes or the problems are clearly unfixable. + See "REQUIRED: Dry-Run Verification Loop" section below for details. ### Agent JSON Structure @@ -246,19 +252,51 @@ call in a loop until the task is complete: Regular blocks work exactly like sub-agents as tools — wire each input field from `source_name: "tools"` on the Orchestrator side. -### Testing with Dry Run +### REQUIRED: Dry-Run Verification Loop (create -> dry-run -> fix) -After saving an agent, suggest a dry run to validate wiring without consuming -real API calls, credentials, or credits: +After creating or editing an agent, you MUST dry-run it before telling the +user the agent is ready. NEVER skip this step. -1. **Run**: Call `run_agent` or `run_block` with `dry_run=True` and provide - sample inputs. This executes the graph with mock outputs, verifying that - links resolve correctly and required inputs are satisfied. -2. **Check results**: Call `view_agent_output` with `show_execution_details=True` - to inspect the full node-by-node execution trace. This shows what each node - received as input and produced as output, making it easy to spot wiring issues. -3. **Iterate**: If the dry run reveals wiring issues or missing inputs, fix - the agent JSON and re-save before suggesting a real execution. +#### Step-by-step workflow + +1. **Create/Edit**: Call `create_agent` or `edit_agent` to save the agent. +2. **Dry-run**: Call `run_agent` with `dry_run=True`, `wait_for_result=120`, + and realistic sample inputs that exercise every path in the agent. This + simulates execution using an LLM for each block — no real API calls, + credentials, or credits are consumed. +3. **Inspect output**: Examine the dry-run result for problems.
If + `wait_for_result` returns only a summary, call + `view_agent_output(execution_id=..., show_execution_details=True)` to + see the full node-by-node execution trace. Look for: + - **Errors / failed nodes** — a node raised an exception or returned an + error status. Common causes: wrong `source_name`/`sink_name` in links, + missing `input_default` values, or referencing a nonexistent block output. + - **Null / empty outputs** — data did not flow through a link. Verify that + `source_name` and `sink_name` match the block schemas exactly (case- + sensitive, including nested `_#_` notation). + - **Nodes that never executed** — the node was not reached. Likely a + missing or broken link from an upstream node. + - **Unexpected values** — data arrived but in the wrong type or + structure. Check type compatibility between linked ports. +4. **Fix**: If any issues are found, call `edit_agent` with the corrected + agent JSON, then go back to step 2. +5. **Repeat**: Continue the dry-run -> fix cycle until the simulation passes + or the problems are clearly unfixable. If you stop making progress, + report the remaining issues to the user and ask for guidance. + +#### Good vs bad dry-run output + +**Good output** (agent is ready): +- All nodes executed successfully (no errors in the execution trace) +- Data flows through every link with non-null, correctly-typed values +- The final `AgentOutputBlock` contains a meaningful result +- Status is `COMPLETED` + +**Bad output** (needs fixing): +- Status is `FAILED` — check the error message for the failing node +- An output node received `null` — trace back to find the broken link +- A node received data in the wrong format (e.g. string where list expected) +- Nodes downstream of a failing node were skipped entirely **Special block behaviour in dry-run mode:** - **OrchestratorBlock** and **AgentExecutorBlock** execute for real so the diff --git a/autogpt_platform/backend/backend/copilot/sdk/mcp_tool_guide.md b/autogpt_platform/backend/backend/copilot/sdk/mcp_tool_guide.md index 97c60168b8..a86aa2d12b 100644 --- a/autogpt_platform/backend/backend/copilot/sdk/mcp_tool_guide.md +++ b/autogpt_platform/backend/backend/copilot/sdk/mcp_tool_guide.md @@ -28,13 +28,12 @@ Each result includes a `remotes` array with the exact server URL to use. ### Important: Check blocks first -Before using `run_mcp_tool`, always check if the platform already has blocks for the service -using `find_block`. The platform has hundreds of built-in blocks (Google Sheets, Google Docs, -Google Calendar, Gmail, etc.) that work without MCP setup. +Always follow the **Tool Discovery Priority** described in the tool notes: +call `find_block` before resorting to `run_mcp_tool`. Only use `run_mcp_tool` when: -- The service is in the known hosted MCP servers list above, OR -- You searched `find_block` first and found no matching blocks +- You searched `find_block` first and found no matching blocks, AND +- The service is in the known hosted MCP servers list above or found via the registry API **Never guess or construct MCP server URLs.** Only use URLs from the known servers list above or from the `remotes[].url` field in MCP registry search results. 
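Taken together, the workflow above reduces to a small retry loop. The sketch below is illustrative pseudocode in Python, not a real implementation: the stub functions stand in for the actual `create_agent` / `run_agent` / `view_agent_output` / `edit_agent` tool calls, the return shapes (`execution_id`, `status`, `errors`) and the attempt cap are assumptions, and only the loop structure is meant to mirror the guide.

```python
from typing import Any


def create_agent(agent_json: dict) -> dict:
    return {"id": "agent-1"}  # stub: the real tool persists the graph


def edit_agent(agent_id: str, agent_json: dict) -> dict:
    return {"id": agent_id}  # stub


def run_agent(**kwargs: Any) -> dict:
    return {"execution_id": "exec-1"}  # stub: dry_run=True simulates blocks


def view_agent_output(**kwargs: Any) -> dict:
    return {"status": "COMPLETED", "errors": []}  # stub execution trace


def find_problems(trace: dict) -> list[str]:
    # "Bad output" per the guide: FAILED status, failed nodes, null links,
    # wrong types, skipped downstream nodes.
    problems = list(trace.get("errors", []))
    if trace.get("status") == "FAILED":
        problems.append("execution failed")
    return problems


def verify_agent(agent_json: dict, sample_inputs: dict, max_attempts: int = 5) -> bool:
    """create -> dry-run -> inspect -> fix, until the simulation passes."""
    agent = create_agent(agent_json)
    for attempt in range(max_attempts):
        run = run_agent(
            agent_id=agent["id"],
            inputs=sample_inputs,  # should exercise every path in the agent
            dry_run=True,
            wait_for_result=120,
        )
        trace = view_agent_output(
            execution_id=run["execution_id"],
            show_execution_details=True,
        )
        problems = find_problems(trace)
        if not problems:
            return True  # good output: COMPLETED, data flowing on every link
        # The real loop edits agent_json based on the trace; marked here only.
        agent_json = {**agent_json, "_fix_attempt": attempt + 1}
        agent = edit_agent(agent["id"], agent_json)
    return False  # no more progress: report remaining issues to the user
```

The cap on attempts is a stand-in for the guide's "until the simulation passes or the problems are clearly unfixable"; when it is hit, the model should stop and ask the user for guidance rather than keep looping.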
diff --git a/autogpt_platform/backend/backend/copilot/sdk/prompt_too_long_test.py b/autogpt_platform/backend/backend/copilot/sdk/prompt_too_long_test.py index 27e334e9bd..a9783c4079 100644 --- a/autogpt_platform/backend/backend/copilot/sdk/prompt_too_long_test.py +++ b/autogpt_platform/backend/backend/copilot/sdk/prompt_too_long_test.py @@ -8,20 +8,19 @@ from uuid import uuid4 import pytest -from backend.util import json -from backend.util.prompt import CompressResult - -from .conftest import build_test_transcript as _build_transcript -from .service import _friendly_error_text, _is_prompt_too_long -from .transcript import ( +from backend.copilot.transcript import ( _flatten_assistant_content, _flatten_tool_result_content, _messages_to_transcript, _run_compression, _transcript_to_messages, - compact_transcript, - validate_transcript, ) +from backend.util import json +from backend.util.prompt import CompressResult + +from .conftest import build_test_transcript as _build_transcript +from .service import _friendly_error_text, _is_prompt_too_long +from .transcript import compact_transcript, validate_transcript # --------------------------------------------------------------------------- # _flatten_assistant_content @@ -403,7 +402,7 @@ class TestCompactTranscript: }, )() with patch( - "backend.copilot.sdk.transcript._run_compression", + "backend.copilot.transcript._run_compression", new_callable=AsyncMock, return_value=mock_result, ): @@ -438,7 +437,7 @@ class TestCompactTranscript: }, )() with patch( - "backend.copilot.sdk.transcript._run_compression", + "backend.copilot.transcript._run_compression", new_callable=AsyncMock, return_value=mock_result, ): @@ -462,7 +461,7 @@ class TestCompactTranscript: ] ) with patch( - "backend.copilot.sdk.transcript._run_compression", + "backend.copilot.transcript._run_compression", new_callable=AsyncMock, side_effect=RuntimeError("LLM unavailable"), ): @@ -568,11 +567,11 @@ class TestRunCompressionTimeout: with ( patch( - "backend.copilot.sdk.transcript.get_openai_client", + "backend.copilot.transcript.get_openai_client", return_value="fake-client", ), patch( - "backend.copilot.sdk.transcript.compress_context", + "backend.copilot.transcript.compress_context", side_effect=_mock_compress, ), ): @@ -602,11 +601,11 @@ class TestRunCompressionTimeout: with ( patch( - "backend.copilot.sdk.transcript.get_openai_client", + "backend.copilot.transcript.get_openai_client", return_value=None, ), patch( - "backend.copilot.sdk.transcript.compress_context", + "backend.copilot.transcript.compress_context", new_callable=AsyncMock, return_value=truncation_result, ) as mock_compress, diff --git a/autogpt_platform/backend/backend/copilot/sdk/retry_scenarios_test.py b/autogpt_platform/backend/backend/copilot/sdk/retry_scenarios_test.py index 9bacffb6a8..2873ee596d 100644 --- a/autogpt_platform/backend/backend/copilot/sdk/retry_scenarios_test.py +++ b/autogpt_platform/backend/backend/copilot/sdk/retry_scenarios_test.py @@ -26,18 +26,17 @@ from unittest.mock import AsyncMock, MagicMock, patch import pytest -from backend.util import json - -from .conftest import build_test_transcript as _build_transcript -from .service import _MAX_STREAM_ATTEMPTS, _reduce_context -from .transcript import ( +from backend.copilot.transcript import ( _flatten_assistant_content, _flatten_tool_result_content, _messages_to_transcript, _transcript_to_messages, - compact_transcript, - validate_transcript, ) +from backend.util import json + +from .conftest import build_test_transcript as _build_transcript 
+from .service import _MAX_STREAM_ATTEMPTS, _reduce_context +from .transcript import compact_transcript, validate_transcript from .transcript_builder import TranscriptBuilder # --------------------------------------------------------------------------- @@ -113,7 +112,7 @@ class TestScenarioCompactAndRetry: )(), ), patch( - "backend.copilot.sdk.transcript._run_compression", + "backend.copilot.transcript._run_compression", new_callable=AsyncMock, return_value=mock_result, ), @@ -170,7 +169,7 @@ class TestScenarioCompactFailsFallback: )(), ), patch( - "backend.copilot.sdk.transcript._run_compression", + "backend.copilot.transcript._run_compression", new_callable=AsyncMock, side_effect=RuntimeError("LLM unavailable"), ), @@ -261,7 +260,7 @@ class TestScenarioDoubleFailDBFallback: )(), ), patch( - "backend.copilot.sdk.transcript._run_compression", + "backend.copilot.transcript._run_compression", new_callable=AsyncMock, return_value=mock_result, ), @@ -337,7 +336,7 @@ class TestScenarioCompactionIdentical: )(), ), patch( - "backend.copilot.sdk.transcript._run_compression", + "backend.copilot.transcript._run_compression", new_callable=AsyncMock, return_value=mock_result, ), @@ -730,7 +729,7 @@ class TestRetryEdgeCases: )(), ), patch( - "backend.copilot.sdk.transcript._run_compression", + "backend.copilot.transcript._run_compression", new_callable=AsyncMock, return_value=mock_result, ), @@ -841,7 +840,7 @@ class TestRetryStateReset: )(), ), patch( - "backend.copilot.sdk.transcript._run_compression", + "backend.copilot.transcript._run_compression", new_callable=AsyncMock, side_effect=RuntimeError("boom"), ), @@ -1405,9 +1404,9 @@ class TestStreamChatCompletionRetryIntegration: events.append(event) # Should NOT retry — only 1 attempt for auth errors - assert attempt_count[0] == 1, ( - f"Expected 1 attempt (no retry for auth error), " f"got {attempt_count[0]}" - ) + assert ( + attempt_count[0] == 1 + ), f"Expected 1 attempt (no retry for auth error), got {attempt_count[0]}" errors = [e for e in events if isinstance(e, StreamError)] assert errors, "Expected StreamError" assert errors[0].code == "sdk_stream_error" diff --git a/autogpt_platform/backend/backend/copilot/sdk/service.py b/autogpt_platform/backend/backend/copilot/sdk/service.py index b4321d2520..8c670ea8b9 100644 --- a/autogpt_platform/backend/backend/copilot/sdk/service.py +++ b/autogpt_platform/backend/backend/copilot/sdk/service.py @@ -34,12 +34,23 @@ from pydantic import BaseModel from backend.copilot.context import get_workspace_manager from backend.copilot.permissions import apply_tool_permissions from backend.copilot.rate_limit import get_user_tier +from backend.copilot.transcript import ( + _run_compression, + cleanup_stale_project_dirs, + compact_transcript, + download_transcript, + read_compacted_entries, + upload_transcript, + validate_transcript, + write_transcript_to_tempfile, +) +from backend.copilot.transcript_builder import TranscriptBuilder from backend.data.redis_client import get_redis_async from backend.executor.cluster_lock import AsyncClusterLock from backend.util.exceptions import NotFoundError from backend.util.settings import Settings -from ..config import ChatConfig +from ..config import ChatConfig, CopilotMode from ..constants import ( COPILOT_ERROR_PREFIX, COPILOT_RETRYABLE_ERROR_PREFIX, @@ -52,6 +63,7 @@ from ..model import ( ChatMessage, ChatSession, get_chat_session, + maybe_append_user_message, update_session_title, upsert_chat_session, ) @@ -93,17 +105,6 @@ from .tool_adapter import ( set_execution_context, 
wait_for_stash, ) -from .transcript import ( - _run_compression, - cleanup_stale_project_dirs, - compact_transcript, - download_transcript, - read_compacted_entries, - upload_transcript, - validate_transcript, - write_transcript_to_tempfile, -) -from .transcript_builder import TranscriptBuilder logger = logging.getLogger(__name__) config = ChatConfig() @@ -130,6 +131,11 @@ _CIRCUIT_BREAKER_ERROR_MSG = ( "Try breaking your request into smaller parts." ) +# Idle timeout: abort the stream if no meaningful SDK message (only heartbeats) +# arrives for this many seconds. This catches hung tool calls (e.g. WebSearch +# hanging on a search provider that never responds). +_IDLE_TIMEOUT_SECONDS = 10 * 60 # 10 minutes + # Patterns that indicate the prompt/request exceeds the model's context limit. # Matched case-insensitively against the full exception chain. _PROMPT_TOO_LONG_PATTERNS: tuple[str, ...] = ( @@ -1272,6 +1278,8 @@ async def _run_stream_attempt( await client.query(state.query_message, session_id=ctx.session_id) state.transcript_builder.append_user(content=ctx.current_message) + _last_real_msg_time = time.monotonic() + async for sdk_msg in _iter_sdk_messages(client): # Heartbeat sentinel — refresh lock and keep SSE alive if sdk_msg is None: @@ -1279,8 +1287,34 @@ async def _run_stream_attempt( for ev in ctx.compaction.emit_start_if_ready(): yield ev yield StreamHeartbeat() + + # Idle timeout: if no real SDK message for too long, a tool + # call is likely hung (e.g. WebSearch provider not responding). + idle_seconds = time.monotonic() - _last_real_msg_time + if idle_seconds >= _IDLE_TIMEOUT_SECONDS: + logger.error( + "%s Idle timeout after %.0fs with no SDK message — " + "aborting stream (likely hung tool call)", + ctx.log_prefix, + idle_seconds, + ) + stream_error_msg = ( + "A tool call appears to be stuck " + "(no response for 10 minutes). " + "Please try again." + ) + stream_error_code = "idle_timeout" + _append_error_marker(ctx.session, stream_error_msg, retryable=True) + yield StreamError( + errorText=stream_error_msg, + code=stream_error_code, + ) + ended_with_stream_error = True + break continue + _last_real_msg_time = time.monotonic() + logger.info( "%s Received: %s %s (unresolved=%d, current=%d, resolved=%d)", ctx.log_prefix, @@ -1529,9 +1563,21 @@ async def _run_stream_attempt( # --- Intermediate persistence --- # Flush session messages to DB periodically so page reloads # show progress during long-running turns. + # + # IMPORTANT: Skip the flush while tool calls are pending + # (tool_calls set on assistant but results not yet received). + # The DB save is append-only (uses start_sequence), so if we + # flush the assistant message before tool_calls are set on it + # (text and tool_use arrive as separate SDK events), the + # tool_calls update is lost — the next flush starts past it. _msgs_since_flush += 1 now = time.monotonic() - if ( + has_pending_tools = ( + acc.has_appended_assistant + and acc.accumulated_tool_calls + and not acc.has_tool_results + ) + if not has_pending_tools and ( _msgs_since_flush >= _FLUSH_MESSAGE_THRESHOLD or (now - _last_flush_time) >= _FLUSH_INTERVAL_SECONDS ): @@ -1631,6 +1677,7 @@ async def stream_chat_completion_sdk( session: ChatSession | None = None, file_ids: list[str] | None = None, permissions: "CopilotPermissions | None" = None, + mode: CopilotMode | None = None, **_kwargs: Any, ) -> AsyncIterator[StreamBaseResponse]: """Stream chat completion using Claude Agent SDK. 
@@ -1639,7 +1686,10 @@ async def stream_chat_completion_sdk( file_ids: Optional workspace file IDs attached to the user's message. Images are embedded as vision content blocks; other files are saved to the SDK working directory for the Read tool. + mode: Accepted for signature compatibility with the baseline path. + The SDK path does not currently branch on this value. """ + _ = mode # SDK path ignores the requested mode. if session is None: session = await get_chat_session(session_id, user_id) @@ -1670,19 +1720,12 @@ async def stream_chat_completion_sdk( ) session.messages.pop() - # Append the new message to the session if it's not already there - new_message_role = "user" if is_user_message else "assistant" - if message and ( - len(session.messages) == 0 - or not ( - session.messages[-1].role == new_message_role - and session.messages[-1].content == message - ) - ): - session.messages.append(ChatMessage(role=new_message_role, content=message)) + if maybe_append_user_message(session, message, is_user_message): if is_user_message: track_user_message( - user_id=user_id, session_id=session_id, message_length=len(message) + user_id=user_id, + session_id=session_id, + message_length=len(message or ""), ) # Structured log prefix: [SDK][][T] diff --git a/autogpt_platform/backend/backend/copilot/sdk/thinking_blocks_test.py b/autogpt_platform/backend/backend/copilot/sdk/thinking_blocks_test.py index c734f07c89..48d38100b5 100644 --- a/autogpt_platform/backend/backend/copilot/sdk/thinking_blocks_test.py +++ b/autogpt_platform/backend/backend/copilot/sdk/thinking_blocks_test.py @@ -27,20 +27,19 @@ from backend.copilot.response_model import ( StreamTextDelta, StreamTextStart, ) -from backend.util import json - -from .conftest import build_structured_transcript -from .response_adapter import SDKResponseAdapter -from .service import _format_sdk_content_blocks -from .transcript import ( +from backend.copilot.transcript import ( _find_last_assistant_entry, _flatten_assistant_content, _messages_to_transcript, _rechain_tail, _transcript_to_messages, - compact_transcript, - validate_transcript, ) +from backend.util import json + +from .conftest import build_structured_transcript +from .response_adapter import SDKResponseAdapter +from .service import _format_sdk_content_blocks +from .transcript import compact_transcript, validate_transcript # --------------------------------------------------------------------------- # Fixtures: realistic thinking block content @@ -439,7 +438,7 @@ class TestCompactTranscriptThinkingBlocks: }, )() with patch( - "backend.copilot.sdk.transcript._run_compression", + "backend.copilot.transcript._run_compression", new_callable=AsyncMock, return_value=mock_result, ): @@ -498,7 +497,7 @@ class TestCompactTranscriptThinkingBlocks: )() with patch( - "backend.copilot.sdk.transcript._run_compression", + "backend.copilot.transcript._run_compression", side_effect=mock_compression, ): await compact_transcript(transcript, model="test-model") @@ -551,7 +550,7 @@ class TestCompactTranscriptThinkingBlocks: }, )() with patch( - "backend.copilot.sdk.transcript._run_compression", + "backend.copilot.transcript._run_compression", new_callable=AsyncMock, return_value=mock_result, ): @@ -601,7 +600,7 @@ class TestCompactTranscriptThinkingBlocks: }, )() with patch( - "backend.copilot.sdk.transcript._run_compression", + "backend.copilot.transcript._run_compression", new_callable=AsyncMock, return_value=mock_result, ): @@ -638,7 +637,7 @@ class TestCompactTranscriptThinkingBlocks: }, )() with patch( - 
"backend.copilot.sdk.transcript._run_compression", + "backend.copilot.transcript._run_compression", new_callable=AsyncMock, return_value=mock_result, ): @@ -699,7 +698,7 @@ class TestCompactTranscriptThinkingBlocks: }, )() with patch( - "backend.copilot.sdk.transcript._run_compression", + "backend.copilot.transcript._run_compression", new_callable=AsyncMock, return_value=mock_result, ): diff --git a/autogpt_platform/backend/backend/copilot/sdk/transcript.py b/autogpt_platform/backend/backend/copilot/sdk/transcript.py index 3aa1dddb37..a93bfbfe30 100644 --- a/autogpt_platform/backend/backend/copilot/sdk/transcript.py +++ b/autogpt_platform/backend/backend/copilot/sdk/transcript.py @@ -1,1099 +1,48 @@ -"""JSONL transcript management for stateless multi-turn resume. +"""Re-export public API from shared ``backend.copilot.transcript``. -The Claude Code CLI persists conversations as JSONL files (one JSON object per -line). When the SDK's ``Stop`` hook fires we read this file, strip bloat -(progress entries, metadata), and upload the result to bucket storage. On the -next turn we download the transcript, write it to a temp file, and pass -``--resume`` so the CLI can reconstruct the full conversation. - -Storage is handled via ``WorkspaceStorageBackend`` (GCS in prod, local -filesystem for self-hosted) — no DB column needed. +The canonical implementation now lives at ``backend.copilot.transcript`` +so both the SDK and baseline paths can import without cross-package +dependencies. Public symbols are re-exported here so existing ``from +.transcript import ...`` statements within the ``sdk`` package continue +to work without modification. """ -from __future__ import annotations - -import asyncio -import logging -import os -import re -import shutil -import time -from dataclasses import dataclass -from pathlib import Path -from uuid import uuid4 - -from backend.util import json -from backend.util.clients import get_openai_client -from backend.util.prompt import CompressResult, compress_context -from backend.util.workspace_storage import GCSWorkspaceStorage, get_workspace_storage - -logger = logging.getLogger(__name__) - -# UUIDs are hex + hyphens; strip everything else to prevent path injection. -_SAFE_ID_RE = re.compile(r"[^0-9a-fA-F-]") - -# Entry types that can be safely removed from the transcript without breaking -# the parentUuid conversation tree that ``--resume`` relies on. -# - progress: UI progress ticks, no message content (avg 97KB for agent_progress) -# - file-history-snapshot: undo tracking metadata -# - queue-operation: internal queue bookkeeping -# - summary: session summaries -# - pr-link: PR link metadata -STRIPPABLE_TYPES = frozenset( - {"progress", "file-history-snapshot", "queue-operation", "summary", "pr-link"} +from backend.copilot.transcript import ( + COMPACT_MSG_ID_PREFIX, + ENTRY_TYPE_MESSAGE, + STOP_REASON_END_TURN, + STRIPPABLE_TYPES, + TRANSCRIPT_STORAGE_PREFIX, + TranscriptDownload, + cleanup_stale_project_dirs, + compact_transcript, + delete_transcript, + download_transcript, + read_compacted_entries, + strip_for_upload, + strip_progress_entries, + strip_stale_thinking_blocks, + upload_transcript, + validate_transcript, + write_transcript_to_tempfile, ) -# Thinking block types that can be stripped from non-last assistant entries. -# The Anthropic API only requires these in the *last* assistant message. 
-_THINKING_BLOCK_TYPES = frozenset({"thinking", "redacted_thinking"}) - - -@dataclass -class TranscriptDownload: - """Result of downloading a transcript with its metadata.""" - - content: str - message_count: int = 0 # session.messages length when uploaded - uploaded_at: float = 0.0 # epoch timestamp of upload - - -# Workspace storage constants — deterministic path from session_id. -TRANSCRIPT_STORAGE_PREFIX = "chat-transcripts" - - -# --------------------------------------------------------------------------- -# Progress stripping -# --------------------------------------------------------------------------- - - -def strip_progress_entries(content: str) -> str: - """Remove progress/metadata entries from a JSONL transcript. - - Removes entries whose ``type`` is in ``STRIPPABLE_TYPES`` and reparents - any remaining child entries so the ``parentUuid`` chain stays intact. - Typically reduces transcript size by ~30%. - - Entries that are not stripped or reparented are kept as their original - raw JSON line to avoid unnecessary re-serialization that changes - whitespace or key ordering. - """ - lines = content.strip().split("\n") - - # Parse entries, keeping the original line alongside the parsed dict. - parsed: list[tuple[str, dict | None]] = [] - for line in lines: - parsed.append((line, json.loads(line, fallback=None))) - - # First pass: identify stripped UUIDs and build parent map. - stripped_uuids: set[str] = set() - uuid_to_parent: dict[str, str] = {} - - for _line, entry in parsed: - if not isinstance(entry, dict): - continue - uid = entry.get("uuid", "") - parent = entry.get("parentUuid", "") - if uid: - uuid_to_parent[uid] = parent - if ( - entry.get("type", "") in STRIPPABLE_TYPES - and uid - and not entry.get("isCompactSummary") - ): - stripped_uuids.add(uid) - - # Second pass: keep non-stripped entries, reparenting where needed. - # Preserve original line when no reparenting is required. - reparented: set[str] = set() - for _line, entry in parsed: - if not isinstance(entry, dict): - continue - parent = entry.get("parentUuid", "") - original_parent = parent - # seen_parents is local per-entry (not shared across iterations) so - # it can only detect cycles within a single ancestry walk, not across - # entries. This is intentional: each entry's parent chain is - # independent, and reusing a global set would incorrectly short-circuit - # valid re-use of the same UUID as a parent in different subtrees. - seen_parents: set[str] = set() - while parent in stripped_uuids and parent not in seen_parents: - seen_parents.add(parent) - parent = uuid_to_parent.get(parent, "") - if parent != original_parent: - entry["parentUuid"] = parent - uid = entry.get("uuid", "") - if uid: - reparented.add(uid) - - result_lines: list[str] = [] - for line, entry in parsed: - if not isinstance(entry, dict): - result_lines.append(line) - continue - if entry.get("type", "") in STRIPPABLE_TYPES and not entry.get( - "isCompactSummary" - ): - continue - uid = entry.get("uuid", "") - if uid in reparented: - # Re-serialize only entries whose parentUuid was changed. - result_lines.append(json.dumps(entry, separators=(",", ":"))) - else: - result_lines.append(line) - - return "\n".join(result_lines) + "\n" - - -# --------------------------------------------------------------------------- -# Local file I/O (write temp file for --resume) -# --------------------------------------------------------------------------- - - -def _sanitize_id(raw_id: str, max_len: int = 36) -> str: - """Sanitize an ID for safe use in file paths. 
- - Session/user IDs are expected to be UUIDs (hex + hyphens). Strip - everything else and truncate to *max_len* so the result cannot introduce - path separators or other special characters. - """ - cleaned = _SAFE_ID_RE.sub("", raw_id or "")[:max_len] - return cleaned or "unknown" - - -_SAFE_CWD_PREFIX = os.path.realpath("/tmp/copilot-") - - -def _projects_base() -> str: - """Return the resolved path to the CLI's projects directory.""" - config_dir = os.environ.get("CLAUDE_CONFIG_DIR") or os.path.expanduser("~/.claude") - return os.path.realpath(os.path.join(config_dir, "projects")) - - -_STALE_PROJECT_DIR_SECONDS = 12 * 3600 # 12 hours — matches max session lifetime -_MAX_PROJECT_DIRS_TO_SWEEP = 50 # limit per sweep to avoid long pauses - - -def cleanup_stale_project_dirs(encoded_cwd: str | None = None) -> int: - """Remove CLI project directories older than ``_STALE_PROJECT_DIR_SECONDS``. - - Each CoPilot SDK turn creates a unique ``~/.claude/projects//`` - directory. These are intentionally kept across turns so the model can read - tool-result files via ``--resume``. However, after a session ends they - become stale. This function sweeps old ones to prevent unbounded disk - growth. - - When *encoded_cwd* is provided the sweep is scoped to that single - directory, making the operation safe in multi-tenant environments where - multiple copilot sessions share the same host. Without it the function - falls back to sweeping all directories matching the copilot naming pattern - (``-tmp-copilot-``), which is only safe for single-tenant deployments. - - Returns the number of directories removed. - """ - projects_base = _projects_base() - if not os.path.isdir(projects_base): - return 0 - - now = time.time() - removed = 0 - - # Scoped mode: only clean up the one directory for the current session. - if encoded_cwd: - target = Path(projects_base) / encoded_cwd - if not target.is_dir(): - return 0 - # Guard: only sweep copilot-generated dirs. - if "-tmp-copilot-" not in target.name: - logger.warning( - "[Transcript] Refusing to sweep non-copilot dir: %s", target.name - ) - return 0 - try: - # st_mtime is used as a proxy for session activity. Claude CLI writes - # its JSONL transcript into this directory during each turn, so mtime - # advances on every turn. A directory whose mtime is older than - # _STALE_PROJECT_DIR_SECONDS has not had an active turn in that window - # and is safe to remove (the session cannot --resume after cleanup). - age = now - target.stat().st_mtime - except OSError: - return 0 - if age < _STALE_PROJECT_DIR_SECONDS: - return 0 - try: - shutil.rmtree(target, ignore_errors=True) - removed = 1 - except OSError: - pass - if removed: - logger.info( - "[Transcript] Swept stale CLI project dir %s (age %ds > %ds)", - target.name, - int(age), - _STALE_PROJECT_DIR_SECONDS, - ) - return removed - - # Unscoped fallback: sweep all copilot dirs across the projects base. - # Only safe for single-tenant deployments; callers should prefer the - # scoped variant by passing encoded_cwd. - try: - entries = Path(projects_base).iterdir() - except OSError as e: - logger.warning("[Transcript] Failed to list projects dir: %s", e) - return 0 - - for entry in entries: - if removed >= _MAX_PROJECT_DIRS_TO_SWEEP: - break - # Only sweep copilot-generated dirs (pattern: -tmp-copilot- or - # -private-tmp-copilot-). 
- if "-tmp-copilot-" not in entry.name: - continue - if not entry.is_dir(): - continue - try: - # See the scoped-mode comment above: st_mtime advances on every turn, - # so a stale mtime reliably indicates an inactive session. - age = now - entry.stat().st_mtime - except OSError: - continue - if age < _STALE_PROJECT_DIR_SECONDS: - continue - - try: - shutil.rmtree(entry, ignore_errors=True) - removed += 1 - except OSError: - pass - - if removed: - logger.info( - "[Transcript] Swept %d stale CLI project dirs (older than %ds)", - removed, - _STALE_PROJECT_DIR_SECONDS, - ) - return removed - - -def read_compacted_entries(transcript_path: str) -> list[dict] | None: - """Read compacted entries from the CLI session file after compaction. - - Parses the JSONL file line-by-line, finds the ``isCompactSummary: true`` - entry, and returns it plus all entries after it. - - The CLI writes the compaction summary BEFORE sending the next message, - so the file is guaranteed to be flushed by the time we read it. - - Returns a list of parsed dicts, or ``None`` if the file cannot be read - or no compaction summary is found. - """ - if not transcript_path: - return None - - projects_base = _projects_base() - real_path = os.path.realpath(transcript_path) - if not real_path.startswith(projects_base + os.sep): - logger.warning( - "[Transcript] transcript_path outside projects base: %s", transcript_path - ) - return None - - try: - content = Path(real_path).read_text() - except OSError as e: - logger.warning( - "[Transcript] Failed to read session file %s: %s", transcript_path, e - ) - return None - - lines = content.strip().split("\n") - compact_idx: int | None = None - - for idx, line in enumerate(lines): - if not line.strip(): - continue - entry = json.loads(line, fallback=None) - if not isinstance(entry, dict): - continue - if entry.get("isCompactSummary"): - compact_idx = idx # don't break — find the LAST summary - - if compact_idx is None: - logger.debug("[Transcript] No compaction summary found in %s", transcript_path) - return None - - entries: list[dict] = [] - for line in lines[compact_idx:]: - if not line.strip(): - continue - entry = json.loads(line, fallback=None) - if isinstance(entry, dict): - entries.append(entry) - - logger.info( - "[Transcript] Read %d compacted entries from %s (summary at line %d)", - len(entries), - transcript_path, - compact_idx + 1, - ) - return entries - - -def write_transcript_to_tempfile( - transcript_content: str, - session_id: str, - cwd: str, -) -> str | None: - """Write JSONL transcript to a temp file inside *cwd* for ``--resume``. - - The file lives in the session working directory so it is cleaned up - automatically when the session ends. - - Returns the absolute path to the file, or ``None`` on failure. - """ - # Validate cwd is under the expected sandbox prefix (CodeQL sanitizer). 
- real_cwd = os.path.realpath(cwd) - if not real_cwd.startswith(_SAFE_CWD_PREFIX): - logger.warning("[Transcript] cwd outside sandbox: %s", cwd) - return None - - try: - os.makedirs(real_cwd, exist_ok=True) - safe_id = _sanitize_id(session_id, max_len=8) - jsonl_path = os.path.realpath( - os.path.join(real_cwd, f"transcript-{safe_id}.jsonl") - ) - if not jsonl_path.startswith(real_cwd): - logger.warning("[Transcript] Path escaped cwd: %s", jsonl_path) - return None - - with open(jsonl_path, "w") as f: - f.write(transcript_content) - - logger.info("[Transcript] Wrote resume file: %s", jsonl_path) - return jsonl_path - - except OSError as e: - logger.warning("[Transcript] Failed to write resume file: %s", e) - return None - - -def validate_transcript(content: str | None) -> bool: - """Check that a transcript has actual conversation messages. - - A valid transcript needs at least one assistant message (not just - queue-operation / file-history-snapshot metadata). We do NOT require - a ``type: "user"`` entry because with ``--resume`` the user's message - is passed as a CLI query parameter and does not appear in the - transcript file. - """ - if not content or not content.strip(): - return False - - lines = content.strip().split("\n") - - has_assistant = False - - for line in lines: - if not line.strip(): - continue - entry = json.loads(line, fallback=None) - if not isinstance(entry, dict): - return False - if entry.get("type") == "assistant": - has_assistant = True - - return has_assistant - - -# --------------------------------------------------------------------------- -# Bucket storage (GCS / local via WorkspaceStorageBackend) -# --------------------------------------------------------------------------- - - -def _storage_path_parts(user_id: str, session_id: str) -> tuple[str, str, str]: - """Return (workspace_id, file_id, filename) for a session's transcript. - - Path structure: ``chat-transcripts/{user_id}/{session_id}.jsonl`` - IDs are sanitized to hex+hyphen to prevent path traversal. - """ - return ( - TRANSCRIPT_STORAGE_PREFIX, - _sanitize_id(user_id), - f"{_sanitize_id(session_id)}.jsonl", - ) - - -def _meta_storage_path_parts(user_id: str, session_id: str) -> tuple[str, str, str]: - """Return (workspace_id, file_id, filename) for a session's transcript metadata.""" - return ( - TRANSCRIPT_STORAGE_PREFIX, - _sanitize_id(user_id), - f"{_sanitize_id(session_id)}.meta.json", - ) - - -def _build_path_from_parts(parts: tuple[str, str, str], backend: object) -> str: - """Build a full storage path from (workspace_id, file_id, filename) parts.""" - wid, fid, fname = parts - if isinstance(backend, GCSWorkspaceStorage): - blob = f"workspaces/{wid}/{fid}/{fname}" - return f"gcs://{backend.bucket_name}/{blob}" - return f"local://{wid}/{fid}/{fname}" - - -def _build_storage_path(user_id: str, session_id: str, backend: object) -> str: - """Build the full storage path string that ``retrieve()`` expects.""" - return _build_path_from_parts(_storage_path_parts(user_id, session_id), backend) - - -def _build_meta_storage_path(user_id: str, session_id: str, backend: object) -> str: - """Build the full storage path for the companion .meta.json file.""" - return _build_path_from_parts( - _meta_storage_path_parts(user_id, session_id), backend - ) - - -def strip_stale_thinking_blocks(content: str) -> str: - """Remove thinking/redacted_thinking blocks from non-last assistant entries. 
- - The Anthropic API only requires thinking blocks in the **last** assistant - message to be value-identical to the original response. Older assistant - entries carry stale thinking blocks that consume significant tokens - (often 10-50K each) without providing useful context for ``--resume``. - - Stripping them before upload prevents the CLI from triggering compaction - every turn just to compress away the stale thinking bloat. - """ - lines = content.strip().split("\n") - if not lines: - return content - - parsed: list[tuple[str, dict | None]] = [] - for line in lines: - parsed.append((line, json.loads(line, fallback=None))) - - # Reverse scan to find the last assistant message ID and index. - last_asst_msg_id: str | None = None - last_asst_idx: int | None = None - for i in range(len(parsed) - 1, -1, -1): - _line, entry = parsed[i] - if not isinstance(entry, dict): - continue - msg = entry.get("message", {}) - if msg.get("role") == "assistant": - last_asst_msg_id = msg.get("id") - last_asst_idx = i - break - - if last_asst_idx is None: - return content - - result_lines: list[str] = [] - stripped_count = 0 - for i, (line, entry) in enumerate(parsed): - if not isinstance(entry, dict): - result_lines.append(line) - continue - - msg = entry.get("message", {}) - # Only strip from assistant entries that are NOT the last turn. - # Use msg_id matching when available; fall back to index for entries - # without an id field. - is_last_turn = ( - last_asst_msg_id is not None and msg.get("id") == last_asst_msg_id - ) or (last_asst_msg_id is None and i == last_asst_idx) - if ( - msg.get("role") == "assistant" - and not is_last_turn - and isinstance(msg.get("content"), list) - ): - content_blocks = msg["content"] - filtered = [ - b - for b in content_blocks - if not (isinstance(b, dict) and b.get("type") in _THINKING_BLOCK_TYPES) - ] - if len(filtered) < len(content_blocks): - stripped_count += len(content_blocks) - len(filtered) - entry = {**entry, "message": {**msg, "content": filtered}} - result_lines.append(json.dumps(entry, separators=(",", ":"))) - continue - - result_lines.append(line) - - if stripped_count: - logger.info( - "[Transcript] Stripped %d stale thinking block(s) from non-last entries", - stripped_count, - ) - - return "\n".join(result_lines) + "\n" - - -async def upload_transcript( - user_id: str, - session_id: str, - content: str, - message_count: int = 0, - log_prefix: str = "[Transcript]", -) -> None: - """Strip progress entries and upload complete transcript. - - The transcript represents the FULL active context (atomic). - Each upload REPLACES the previous transcript entirely. - - The executor holds a cluster lock per session, so concurrent uploads for - the same session cannot happen. - - Args: - content: Complete JSONL transcript (from TranscriptBuilder). - message_count: ``len(session.messages)`` at upload time. - """ - # Strip metadata entries (progress, file-history-snapshot, etc.) - # Note: SDK-built transcripts shouldn't have these, but strip for safety - stripped = strip_progress_entries(content) - # Strip stale thinking blocks from older assistant entries — these consume - # significant tokens and trigger unnecessary CLI compaction every turn. 
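A worked example of the stripping rule, using a hypothetical two-turn transcript (block shapes follow the Anthropic content-block format):

```python
import json

old_turn = {
    "type": "assistant",
    "uuid": "a1",
    "parentUuid": "",
    "message": {
        "role": "assistant",
        "id": "msg_old",
        "content": [
            {"type": "thinking", "thinking": "stale reasoning, often 10-50K tokens"},
            {"type": "text", "text": "Done."},
        ],
    },
}
last_turn = {
    "type": "assistant",
    "uuid": "a2",
    "parentUuid": "a1",
    "message": {
        "role": "assistant",
        "id": "msg_last",
        "content": [
            {"type": "thinking", "thinking": "signed block, must stay identical"},
            {"type": "text", "text": "Next step."},
        ],
    },
}
transcript = "\n".join(json.dumps(e) for e in (old_turn, last_turn)) + "\n"

# After strip_stale_thinking_blocks(transcript):
#   msg_old  -> content == [{"type": "text", "text": "Done."}]  (thinking dropped)
#   msg_last -> content unchanged (the last turn keeps its thinking block verbatim)
```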
- stripped = strip_stale_thinking_blocks(stripped) - if not validate_transcript(stripped): - # Log entry types for debugging — helps identify why validation failed - entry_types = [ - json.loads(line, fallback={"type": "INVALID_JSON"}).get("type", "?") - for line in stripped.strip().split("\n") - ] - logger.warning( - "%s Skipping upload — stripped content not valid " - "(types=%s, stripped_len=%d, raw_len=%d)", - log_prefix, - entry_types, - len(stripped), - len(content), - ) - logger.debug("%s Raw content preview: %s", log_prefix, content[:500]) - logger.debug("%s Stripped content: %s", log_prefix, stripped[:500]) - return - - storage = await get_workspace_storage() - wid, fid, fname = _storage_path_parts(user_id, session_id) - encoded = stripped.encode("utf-8") - - await storage.store( - workspace_id=wid, - file_id=fid, - filename=fname, - content=encoded, - ) - - # Update metadata so message_count stays current. The gap-fill logic - # in _build_query_message relies on it to avoid re-compressing messages. - try: - meta = {"message_count": message_count, "uploaded_at": time.time()} - mwid, mfid, mfname = _meta_storage_path_parts(user_id, session_id) - await storage.store( - workspace_id=mwid, - file_id=mfid, - filename=mfname, - content=json.dumps(meta).encode("utf-8"), - ) - except Exception as e: - logger.warning("%s Failed to write metadata: %s", log_prefix, e) - - logger.info( - "%s Uploaded %dB (stripped from %dB, msg_count=%d)", - log_prefix, - len(encoded), - len(content), - message_count, - ) - - -async def download_transcript( - user_id: str, - session_id: str, - log_prefix: str = "[Transcript]", -) -> TranscriptDownload | None: - """Download transcript and metadata from bucket storage. - - Returns a ``TranscriptDownload`` with the JSONL content and the - ``message_count`` watermark from the upload, or ``None`` if not found. - """ - storage = await get_workspace_storage() - path = _build_storage_path(user_id, session_id, storage) - - try: - data = await storage.retrieve(path) - content = data.decode("utf-8") - except FileNotFoundError: - logger.debug("%s No transcript in storage", log_prefix) - return None - except Exception as e: - logger.warning("%s Failed to download transcript: %s", log_prefix, e) - return None - - # Try to load metadata (best-effort — old transcripts won't have it) - message_count = 0 - uploaded_at = 0.0 - try: - meta_path = _build_meta_storage_path(user_id, session_id, storage) - meta_data = await storage.retrieve(meta_path) - meta = json.loads(meta_data.decode("utf-8"), fallback={}) - message_count = meta.get("message_count", 0) - uploaded_at = meta.get("uploaded_at", 0.0) - except FileNotFoundError: - pass # No metadata — treat as unknown (msg_count=0 → always fill gap) - except Exception as e: - logger.debug("%s Failed to load transcript metadata: %s", log_prefix, e) - - logger.info( - "%s Downloaded %dB (msg_count=%d)", log_prefix, len(content), message_count - ) - return TranscriptDownload( - content=content, - message_count=message_count, - uploaded_at=uploaded_at, - ) - - -async def delete_transcript(user_id: str, session_id: str) -> None: - """Delete transcript and its metadata from bucket storage. - - Removes both the ``.jsonl`` transcript and the companion ``.meta.json`` - so stale ``message_count`` watermarks cannot corrupt gap-fill logic. 
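A usage sketch of the upload/download pair, assuming the functions above are importable, workspace storage is configured, and the IDs are placeholders:

```python
import asyncio


async def demo_round_trip() -> None:
    user_id = "11111111-1111-1111-1111-111111111111"     # placeholder
    session_id = "22222222-2222-2222-2222-222222222222"  # placeholder

    jsonl = (
        '{"type":"assistant","uuid":"a1","parentUuid":"",'
        '"message":{"role":"assistant","content":[{"type":"text","text":"hi"}]}}\n'
    )

    # Each upload REPLACES the previous transcript for this session atomically.
    await upload_transcript(user_id, session_id, jsonl, message_count=2)

    # Next turn: fetch the content plus the message_count watermark.
    dl = await download_transcript(user_id, session_id)
    if dl is not None:
        print(dl.message_count, len(dl.content))


# asyncio.run(demo_round_trip())
```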
- """ - storage = await get_workspace_storage() - path = _build_storage_path(user_id, session_id, storage) - - try: - await storage.delete(path) - logger.info("[Transcript] Deleted transcript for session %s", session_id) - except Exception as e: - logger.warning("[Transcript] Failed to delete transcript: %s", e) - - # Also delete the companion .meta.json to avoid orphaned metadata. - try: - meta_path = _build_meta_storage_path(user_id, session_id, storage) - await storage.delete(meta_path) - logger.info("[Transcript] Deleted metadata for session %s", session_id) - except Exception as e: - logger.warning("[Transcript] Failed to delete metadata: %s", e) - - -# --------------------------------------------------------------------------- -# Transcript compaction — LLM summarization for prompt-too-long recovery -# --------------------------------------------------------------------------- - -# JSONL protocol values used in transcript serialization. -STOP_REASON_END_TURN = "end_turn" -COMPACT_MSG_ID_PREFIX = "msg_compact_" -ENTRY_TYPE_MESSAGE = "message" - - -def _flatten_assistant_content(blocks: list) -> str: - """Flatten assistant content blocks into a single plain-text string. - - Structured ``tool_use`` blocks are converted to ``[tool_use: name]`` - placeholders. ``thinking`` and ``redacted_thinking`` blocks are - silently dropped — they carry no useful context for compression - summaries and must not leak into compacted transcripts (the Anthropic - API requires thinking blocks in the last assistant message to be - value-identical to the original response; including stale thinking - text would violate that constraint). - - This is intentional: ``compress_context`` requires plain text for - token counting and LLM summarization. The structural loss is - acceptable because compaction only runs when the original transcript - was already too large for the model. - """ - parts: list[str] = [] - for block in blocks: - if isinstance(block, dict): - btype = block.get("type", "") - if btype in _THINKING_BLOCK_TYPES: - continue - if btype == "text": - parts.append(block.get("text", "")) - elif btype == "tool_use": - # Drop tool_use entirely — any text representation gets - # mimicked by the model as plain text instead of actual - # structured tool calls. The tool results (in the - # following user/tool_result entry) provide sufficient - # context about what happened. - continue - else: - continue - elif isinstance(block, str): - parts.append(block) - return "\n".join(parts) if parts else "" - - -def _flatten_tool_result_content(blocks: list) -> str: - """Flatten tool_result and other content blocks into plain text. - - Handles nested tool_result structures, text blocks, and raw strings. - Uses ``json.dumps`` as fallback for dict blocks without a ``text`` key - or where ``text`` is ``None``. - - Like ``_flatten_assistant_content``, structured blocks (images, nested - tool results) are reduced to text representations for compression. - """ - str_parts: list[str] = [] - for block in blocks: - if isinstance(block, dict) and block.get("type") == "tool_result": - inner = block.get("content") or "" - if isinstance(inner, list): - for sub in inner: - if isinstance(sub, dict): - sub_type = sub.get("type") - if sub_type in ("image", "document"): - # Avoid serializing base64 binary data into - # the compaction input — use a placeholder. 
- str_parts.append(f"[__{sub_type}__]") - elif sub_type == "text" or sub.get("text") is not None: - str_parts.append(str(sub.get("text", ""))) - else: - str_parts.append(json.dumps(sub)) - else: - str_parts.append(str(sub)) - else: - str_parts.append(str(inner)) - elif isinstance(block, dict) and block.get("type") == "text": - str_parts.append(str(block.get("text", ""))) - elif isinstance(block, dict): - # Preserve non-text/non-tool_result blocks (e.g. image) as placeholders. - # Use __prefix__ to distinguish from literal user text. - btype = block.get("type", "unknown") - str_parts.append(f"[__{btype}__]") - elif isinstance(block, str): - str_parts.append(block) - return "\n".join(str_parts) if str_parts else "" - - -def _transcript_to_messages(content: str) -> list[dict]: - """Convert JSONL transcript entries to plain message dicts for compression. - - Parses each line of the JSONL *content*, skips strippable metadata entries - (progress, file-history-snapshot, etc.), and extracts the ``role`` and - flattened ``content`` from the ``message`` field of each remaining entry. - - Structured content blocks (``tool_use``, ``tool_result``, images) are - flattened to plain text via ``_flatten_assistant_content`` and - ``_flatten_tool_result_content`` so that ``compress_context`` can - perform token counting and LLM summarization on uniform strings. - - Returns: - A list of ``{"role": str, "content": str}`` dicts suitable for - ``compress_context``. - """ - messages: list[dict] = [] - for line in content.strip().split("\n"): - if not line.strip(): - continue - entry = json.loads(line, fallback=None) - if not isinstance(entry, dict): - continue - if entry.get("type", "") in STRIPPABLE_TYPES and not entry.get( - "isCompactSummary" - ): - continue - msg = entry.get("message", {}) - role = msg.get("role", "") - if not role: - continue - msg_dict: dict = {"role": role} - raw_content = msg.get("content") - if role == "assistant" and isinstance(raw_content, list): - msg_dict["content"] = _flatten_assistant_content(raw_content) - elif isinstance(raw_content, list): - msg_dict["content"] = _flatten_tool_result_content(raw_content) - else: - msg_dict["content"] = raw_content or "" - messages.append(msg_dict) - return messages - - -def _messages_to_transcript(messages: list[dict]) -> str: - """Convert compressed message dicts back to JSONL transcript format. - - Rebuilds a minimal JSONL transcript from the ``{"role", "content"}`` - dicts returned by ``compress_context``. Each message becomes one JSONL - line with a fresh ``uuid`` / ``parentUuid`` chain so the CLI's - ``--resume`` flag can reconstruct a valid conversation tree. - - Assistant messages are wrapped in the full ``message`` envelope - (``id``, ``model``, ``stop_reason``, structured ``content`` blocks) - that the CLI expects. User messages use the simpler ``{role, content}`` - form. - - Returns: - A newline-terminated JSONL string, or an empty string if *messages* - is empty. 
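And the corresponding sketch for `_flatten_tool_result_content` — base64 image payloads become placeholders instead of being serialized into the compaction input:

```python
blocks = [
    {
        "type": "tool_result",
        "tool_use_id": "tu_1",
        "content": [
            {"type": "text", "text": "3 matches"},
            {"type": "image", "source": {"type": "base64", "data": "AAAA..."}},
        ],
    }
]
# _flatten_tool_result_content(blocks) == "3 matches\n[__image__]"
```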
- """ - lines: list[str] = [] - last_uuid: str = "" # root entry uses empty string, not null - for msg in messages: - role = msg.get("role", "user") - entry_type = "assistant" if role == "assistant" else "user" - uid = str(uuid4()) - content = msg.get("content", "") - if role == "assistant": - message: dict = { - "role": "assistant", - "model": "", - "id": f"{COMPACT_MSG_ID_PREFIX}{uuid4().hex[:24]}", - "type": ENTRY_TYPE_MESSAGE, - "content": [{"type": "text", "text": content}] if content else [], - "stop_reason": STOP_REASON_END_TURN, - "stop_sequence": None, - } - else: - message = {"role": role, "content": content} - entry = { - "type": entry_type, - "uuid": uid, - "parentUuid": last_uuid, - "message": message, - } - lines.append(json.dumps(entry, separators=(",", ":"))) - last_uuid = uid - return "\n".join(lines) + "\n" if lines else "" - - -_COMPACTION_TIMEOUT_SECONDS = 60 -_TRUNCATION_TIMEOUT_SECONDS = 30 - - -async def _run_compression( - messages: list[dict], - model: str, - log_prefix: str, -) -> CompressResult: - """Run LLM-based compression with truncation fallback. - - Uses the shared OpenAI client from ``get_openai_client()``. - If no client is configured or the LLM call fails, falls back to - truncation-based compression which drops older messages without - summarization. - - A 60-second timeout prevents a hung LLM call from blocking the - retry path indefinitely. The truncation fallback also has a - 30-second timeout to guard against slow tokenization on very large - transcripts. - """ - client = get_openai_client() - if client is None: - logger.warning("%s No OpenAI client configured, using truncation", log_prefix) - return await asyncio.wait_for( - compress_context(messages=messages, model=model, client=None), - timeout=_TRUNCATION_TIMEOUT_SECONDS, - ) - try: - return await asyncio.wait_for( - compress_context(messages=messages, model=model, client=client), - timeout=_COMPACTION_TIMEOUT_SECONDS, - ) - except Exception as e: - logger.warning("%s LLM compaction failed, using truncation: %s", log_prefix, e) - return await asyncio.wait_for( - compress_context(messages=messages, model=model, client=None), - timeout=_TRUNCATION_TIMEOUT_SECONDS, - ) - - -def _find_last_assistant_entry( - content: str, -) -> tuple[list[str], list[str]]: - """Split JSONL lines into (compressible_prefix, preserved_tail). - - The tail starts at the **first** entry of the last assistant turn and - includes everything after it (typically trailing user messages). An - assistant turn can span multiple consecutive JSONL entries sharing the - same ``message.id`` (e.g., a thinking entry followed by a tool_use - entry). All entries of the turn are preserved verbatim. - - The Anthropic API requires that ``thinking`` and ``redacted_thinking`` - blocks in the **last** assistant message remain value-identical to the - original response (the API validates parsed signature values, not raw - JSON bytes). By excluding the entire turn from compression we - guarantee those blocks are never altered. - - Returns ``(all_lines, [])`` when no assistant entry is found. - """ - lines = [ln for ln in content.strip().split("\n") if ln.strip()] - - # Parse all lines once to avoid double JSON deserialization. - # json.loads with fallback=None returns Any; non-dict entries are - # safely skipped by the isinstance(entry, dict) guards below. - parsed: list = [json.loads(ln, fallback=None) for ln in lines] - - # Reverse scan: find the message.id and index of the last assistant entry. 
- last_asst_msg_id: str | None = None - last_asst_idx: int | None = None - for i in range(len(parsed) - 1, -1, -1): - entry = parsed[i] - if not isinstance(entry, dict): - continue - msg = entry.get("message", {}) - if msg.get("role") == "assistant": - last_asst_idx = i - last_asst_msg_id = msg.get("id") - break - - if last_asst_idx is None: - return lines, [] - - # If the assistant entry has no message.id, fall back to preserving - # from that single entry onward — safer than compressing everything. - if last_asst_msg_id is None: - return lines[:last_asst_idx], lines[last_asst_idx:] - - # Forward scan: find the first entry of this turn (same message.id). - first_turn_idx: int | None = None - for i, entry in enumerate(parsed): - if not isinstance(entry, dict): - continue - msg = entry.get("message", {}) - if msg.get("role") == "assistant" and msg.get("id") == last_asst_msg_id: - first_turn_idx = i - break - - if first_turn_idx is None: - return lines, [] - return lines[:first_turn_idx], lines[first_turn_idx:] - - -async def compact_transcript( - content: str, - *, - model: str, - log_prefix: str = "[Transcript]", -) -> str | None: - """Compact an oversized JSONL transcript using LLM summarization. - - Converts transcript entries to plain messages, runs ``compress_context`` - (the same compressor used for pre-query history), and rebuilds JSONL. - - The **last assistant entry** (and any entries after it) are preserved - verbatim — never flattened or compressed. The Anthropic API requires - ``thinking`` and ``redacted_thinking`` blocks in the latest assistant - message to be value-identical to the original response (the API - validates parsed signature values, not raw JSON bytes); compressing - them would destroy the cryptographic signatures and cause - ``invalid_request_error``. - - Structured content in *older* assistant entries (``tool_use`` blocks, - ``thinking`` blocks, ``tool_result`` nesting, images) is flattened to - plain text for compression. This matches the fidelity of the Plan C - (DB compression) fallback path. - - Returns the compacted JSONL string, or ``None`` on failure. - - See also: - ``_compress_messages`` in ``service.py`` — compresses ``ChatMessage`` - lists for pre-query DB history. 
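A concrete illustration of the prefix/tail split — one assistant turn spanning two JSONL entries that share a `message.id`:

```python
import json

entries = [
    {"type": "user", "uuid": "u1", "parentUuid": "",
     "message": {"role": "user", "content": "hi"}},
    {"type": "assistant", "uuid": "a1", "parentUuid": "u1",
     "message": {"role": "assistant", "id": "msg_X",
                 "content": [{"type": "thinking", "thinking": "..."}]}},
    {"type": "assistant", "uuid": "a2", "parentUuid": "a1",
     "message": {"role": "assistant", "id": "msg_X",
                 "content": [{"type": "tool_use", "id": "tu1",
                              "name": "grep", "input": {}}]}},
]
content = "\n".join(json.dumps(e) for e in entries) + "\n"

# _find_last_assistant_entry(content) returns:
#   prefix = [u1]      -> compressible
#   tail   = [a1, a2]  -> the whole msg_X turn, preserved verbatim
```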
- """ - prefix_lines, tail_lines = _find_last_assistant_entry(content) - - # Build the JSONL string for the compressible prefix - prefix_content = "\n".join(prefix_lines) + "\n" if prefix_lines else "" - messages = _transcript_to_messages(prefix_content) if prefix_content else [] - - if len(messages) + len(tail_lines) < 2: - total = len(messages) + len(tail_lines) - logger.warning("%s Too few messages to compact (%d)", log_prefix, total) - return None - if not messages: - logger.warning("%s Nothing to compress (only tail entries remain)", log_prefix) - return None - try: - result = await _run_compression(messages, model, log_prefix) - if not result.was_compacted: - logger.warning( - "%s Compressor reports within budget but SDK rejected — " - "signalling failure", - log_prefix, - ) - return None - if not result.messages: - logger.warning("%s Compressor returned empty messages", log_prefix) - return None - logger.info( - "%s Compacted transcript: %d->%d tokens (%d summarized, %d dropped)", - log_prefix, - result.original_token_count, - result.token_count, - result.messages_summarized, - result.messages_dropped, - ) - compressed_part = _messages_to_transcript(result.messages) - - # Re-append the preserved tail (last assistant + trailing entries) - # with parentUuid patched to chain onto the compressed prefix. - tail_part = _rechain_tail(compressed_part, tail_lines) - compacted = compressed_part + tail_part - - if len(compacted) >= len(content): - # Byte count can increase due to preserved tail entries - # (thinking blocks, JSON overhead) even when token count - # decreased. Log a warning but still return — the API - # validates tokens not bytes, and the caller falls through - # to DB fallback if the transcript is still too large. - logger.warning( - "%s Compacted transcript (%d bytes) is not smaller than " - "original (%d bytes) — may still reduce token count", - log_prefix, - len(compacted), - len(content), - ) - # Authoritative validation — the caller (_reduce_context) also - # validates, but this is the canonical check that guarantees we - # never return a malformed transcript from this function. - if not validate_transcript(compacted): - logger.warning("%s Compacted transcript failed validation", log_prefix) - return None - return compacted - except Exception as e: - logger.error( - "%s Transcript compaction failed: %s", log_prefix, e, exc_info=True - ) - return None - - -def _rechain_tail(compressed_prefix: str, tail_lines: list[str]) -> str: - """Patch tail entries so their parentUuid chain links to the compressed prefix. - - The first tail entry's ``parentUuid`` is set to the ``uuid`` of the - last entry in the compressed prefix. Subsequent tail entries are - rechained to point to their predecessor in the tail — their original - ``parentUuid`` values may reference entries that were compressed away. - """ - if not tail_lines: - return "" - # Find the last uuid in the compressed prefix - last_prefix_uuid = "" - for line in reversed(compressed_prefix.strip().split("\n")): - if not line.strip(): - continue - entry = json.loads(line, fallback=None) - if isinstance(entry, dict) and "uuid" in entry: - last_prefix_uuid = entry["uuid"] - break - - result_lines: list[str] = [] - prev_uuid: str | None = None - for i, line in enumerate(tail_lines): - entry = json.loads(line, fallback=None) - if not isinstance(entry, dict): - # Safety guard: _find_last_assistant_entry already filters empty - # lines, and well-formed JSONL always parses to dicts. 
Non-dict - # lines are passed through unchanged; prev_uuid is intentionally - # NOT updated so the next dict entry chains to the last known uuid. - result_lines.append(line) - continue - if i == 0: - entry["parentUuid"] = last_prefix_uuid - elif prev_uuid is not None: - entry["parentUuid"] = prev_uuid - prev_uuid = entry.get("uuid") - result_lines.append(json.dumps(entry, separators=(",", ":"))) - return "\n".join(result_lines) + "\n" +__all__ = [ + "COMPACT_MSG_ID_PREFIX", + "ENTRY_TYPE_MESSAGE", + "STOP_REASON_END_TURN", + "STRIPPABLE_TYPES", + "TRANSCRIPT_STORAGE_PREFIX", + "TranscriptDownload", + "cleanup_stale_project_dirs", + "compact_transcript", + "delete_transcript", + "download_transcript", + "read_compacted_entries", + "strip_for_upload", + "strip_progress_entries", + "strip_stale_thinking_blocks", + "upload_transcript", + "validate_transcript", + "write_transcript_to_tempfile", +] diff --git a/autogpt_platform/backend/backend/copilot/sdk/transcript_builder.py b/autogpt_platform/backend/backend/copilot/sdk/transcript_builder.py index b0b7fa5502..5e971bf395 100644 --- a/autogpt_platform/backend/backend/copilot/sdk/transcript_builder.py +++ b/autogpt_platform/backend/backend/copilot/sdk/transcript_builder.py @@ -1,235 +1,10 @@ -"""Build complete JSONL transcript from SDK messages. +"""Re-export from shared ``backend.copilot.transcript_builder`` for backward compat. -The transcript represents the FULL active context at any point in time. -Each upload REPLACES the previous transcript atomically. - -Flow: - Turn 1: Upload [msg1, msg2] - Turn 2: Download [msg1, msg2] → Upload [msg1, msg2, msg3, msg4] (REPLACE) - Turn 3: Download [msg1, msg2, msg3, msg4] → Upload [all messages] (REPLACE) - -The transcript is never incremental - always the complete atomic state. +The canonical implementation now lives at ``backend.copilot.transcript_builder`` +so both the SDK and baseline paths can import without cross-package +dependencies. """ -import logging -from typing import Any -from uuid import uuid4 +from backend.copilot.transcript_builder import TranscriptBuilder, TranscriptEntry -from pydantic import BaseModel - -from backend.util import json - -from .transcript import STRIPPABLE_TYPES - -logger = logging.getLogger(__name__) - - -class TranscriptEntry(BaseModel): - """Single transcript entry (user or assistant turn).""" - - type: str - uuid: str - parentUuid: str | None - isCompactSummary: bool | None = None - message: dict[str, Any] - - -class TranscriptBuilder: - """Build complete JSONL transcript from SDK messages. - - This builder maintains the FULL conversation state, not incremental changes. - The output is always the complete active context. - """ - - def __init__(self) -> None: - self._entries: list[TranscriptEntry] = [] - self._last_uuid: str | None = None - - def _last_is_assistant(self) -> bool: - return bool(self._entries) and self._entries[-1].type == "assistant" - - def _last_message_id(self) -> str: - """Return the message.id of the last entry, or '' if none.""" - if self._entries: - return self._entries[-1].message.get("id", "") - return "" - - @staticmethod - def _parse_entry(data: dict) -> TranscriptEntry | None: - """Parse a single transcript entry, filtering strippable types. - - Returns ``None`` for entries that should be skipped (strippable types - that are not compaction summaries). 
- """ - entry_type = data.get("type", "") - if entry_type in STRIPPABLE_TYPES and not data.get("isCompactSummary"): - return None - return TranscriptEntry( - type=entry_type, - uuid=data.get("uuid") or str(uuid4()), - parentUuid=data.get("parentUuid"), - isCompactSummary=data.get("isCompactSummary"), - message=data.get("message", {}), - ) - - def load_previous(self, content: str, log_prefix: str = "[Transcript]") -> None: - """Load complete previous transcript. - - This loads the FULL previous context. As new messages come in, - we append to this state. The final output is the complete context - (previous + new), not just the delta. - """ - if not content or not content.strip(): - return - - lines = content.strip().split("\n") - for line_num, line in enumerate(lines, 1): - if not line.strip(): - continue - - data = json.loads(line, fallback=None) - if data is None: - logger.warning( - "%s Failed to parse transcript line %d/%d", - log_prefix, - line_num, - len(lines), - ) - continue - - entry = self._parse_entry(data) - if entry is None: - continue - self._entries.append(entry) - self._last_uuid = entry.uuid - - logger.info( - "%s Loaded %d entries from previous transcript (last_uuid=%s)", - log_prefix, - len(self._entries), - self._last_uuid[:12] if self._last_uuid else None, - ) - - def append_user(self, content: str | list[dict], uuid: str | None = None) -> None: - """Append a user entry.""" - msg_uuid = uuid or str(uuid4()) - - self._entries.append( - TranscriptEntry( - type="user", - uuid=msg_uuid, - parentUuid=self._last_uuid, - message={"role": "user", "content": content}, - ) - ) - self._last_uuid = msg_uuid - - def append_tool_result(self, tool_use_id: str, content: str) -> None: - """Append a tool result as a user entry (one per tool call).""" - self.append_user( - content=[ - {"type": "tool_result", "tool_use_id": tool_use_id, "content": content} - ] - ) - - def append_assistant( - self, - content_blocks: list[dict], - model: str = "", - stop_reason: str | None = None, - ) -> None: - """Append an assistant entry. - - Consecutive assistant entries automatically share the same message ID - so the CLI can merge them (thinking → text → tool_use) into a single - API message on ``--resume``. A new ID is assigned whenever an - assistant entry follows a non-assistant entry (user message or tool - result), because that marks the start of a new API response. - """ - message_id = ( - self._last_message_id() - if self._last_is_assistant() - else f"msg_sdk_{uuid4().hex[:24]}" - ) - - msg_uuid = str(uuid4()) - - self._entries.append( - TranscriptEntry( - type="assistant", - uuid=msg_uuid, - parentUuid=self._last_uuid, - message={ - "role": "assistant", - "model": model, - "id": message_id, - "type": "message", - "content": content_blocks, - "stop_reason": stop_reason, - "stop_sequence": None, - }, - ) - ) - self._last_uuid = msg_uuid - - def replace_entries( - self, compacted_entries: list[dict], log_prefix: str = "[Transcript]" - ) -> None: - """Replace all entries with compacted entries from the CLI session file. - - Called after mid-stream compaction so TranscriptBuilder mirrors the - CLI's active context (compaction summary + post-compaction entries). - - Builds the new list first and validates it's non-empty before swapping, - so corrupt input cannot wipe the conversation history. 
- """ - new_entries: list[TranscriptEntry] = [] - for data in compacted_entries: - entry = self._parse_entry(data) - if entry is not None: - new_entries.append(entry) - - if not new_entries: - logger.warning( - "%s replace_entries produced 0 entries from %d inputs, keeping old (%d entries)", - log_prefix, - len(compacted_entries), - len(self._entries), - ) - return - - old_count = len(self._entries) - self._entries = new_entries - self._last_uuid = new_entries[-1].uuid - - logger.info( - "%s TranscriptBuilder compacted: %d entries -> %d entries", - log_prefix, - old_count, - len(self._entries), - ) - - def to_jsonl(self) -> str: - """Export complete context as JSONL. - - Consecutive assistant entries are kept separate to match the - native CLI format — the SDK merges them internally on resume. - - Returns the FULL conversation state (all entries), not incremental. - This output REPLACES any previous transcript. - """ - if not self._entries: - return "" - - lines = [entry.model_dump_json(exclude_none=True) for entry in self._entries] - return "\n".join(lines) + "\n" - - @property - def entry_count(self) -> int: - """Total number of entries in the complete context.""" - return len(self._entries) - - @property - def is_empty(self) -> bool: - """Whether this builder has any entries.""" - return len(self._entries) == 0 +__all__ = ["TranscriptBuilder", "TranscriptEntry"] diff --git a/autogpt_platform/backend/backend/copilot/sdk/transcript_test.py b/autogpt_platform/backend/backend/copilot/sdk/transcript_test.py index e70b3cedd9..cdc80d467d 100644 --- a/autogpt_platform/backend/backend/copilot/sdk/transcript_test.py +++ b/autogpt_platform/backend/backend/copilot/sdk/transcript_test.py @@ -303,7 +303,7 @@ class TestDeleteTranscript: mock_storage.delete = AsyncMock() with patch( - "backend.copilot.sdk.transcript.get_workspace_storage", + "backend.copilot.transcript.get_workspace_storage", new_callable=AsyncMock, return_value=mock_storage, ): @@ -323,7 +323,7 @@ class TestDeleteTranscript: ) with patch( - "backend.copilot.sdk.transcript.get_workspace_storage", + "backend.copilot.transcript.get_workspace_storage", new_callable=AsyncMock, return_value=mock_storage, ): @@ -341,7 +341,7 @@ class TestDeleteTranscript: ) with patch( - "backend.copilot.sdk.transcript.get_workspace_storage", + "backend.copilot.transcript.get_workspace_storage", new_callable=AsyncMock, return_value=mock_storage, ): @@ -850,7 +850,7 @@ class TestRunCompression: @pytest.mark.asyncio async def test_no_client_uses_truncation(self): """Path (a): ``get_openai_client()`` returns None → truncation only.""" - from .transcript import _run_compression + from backend.copilot.transcript import _run_compression truncation_result = self._make_compress_result( True, [{"role": "user", "content": "truncated"}] @@ -858,11 +858,11 @@ class TestRunCompression: with ( patch( - "backend.copilot.sdk.transcript.get_openai_client", + "backend.copilot.transcript.get_openai_client", return_value=None, ), patch( - "backend.copilot.sdk.transcript.compress_context", + "backend.copilot.transcript.compress_context", new_callable=AsyncMock, return_value=truncation_result, ) as mock_compress, @@ -885,7 +885,7 @@ class TestRunCompression: @pytest.mark.asyncio async def test_llm_success_returns_llm_result(self): """Path (b): ``get_openai_client()`` returns a client → LLM compresses.""" - from .transcript import _run_compression + from backend.copilot.transcript import _run_compression llm_result = self._make_compress_result( True, [{"role": "user", "content": 
"LLM summary"}] @@ -894,11 +894,11 @@ class TestRunCompression: with ( patch( - "backend.copilot.sdk.transcript.get_openai_client", + "backend.copilot.transcript.get_openai_client", return_value=mock_client, ), patch( - "backend.copilot.sdk.transcript.compress_context", + "backend.copilot.transcript.compress_context", new_callable=AsyncMock, return_value=llm_result, ) as mock_compress, @@ -916,7 +916,7 @@ class TestRunCompression: @pytest.mark.asyncio async def test_llm_failure_falls_back_to_truncation(self): """Path (c): LLM call raises → truncation fallback used instead.""" - from .transcript import _run_compression + from backend.copilot.transcript import _run_compression truncation_result = self._make_compress_result( True, [{"role": "user", "content": "truncated fallback"}] @@ -932,11 +932,11 @@ class TestRunCompression: with ( patch( - "backend.copilot.sdk.transcript.get_openai_client", + "backend.copilot.transcript.get_openai_client", return_value=mock_client, ), patch( - "backend.copilot.sdk.transcript.compress_context", + "backend.copilot.transcript.compress_context", side_effect=_compress_side_effect, ), ): @@ -953,7 +953,7 @@ class TestRunCompression: @pytest.mark.asyncio async def test_llm_timeout_falls_back_to_truncation(self): """Path (d): LLM call exceeds timeout → truncation fallback used.""" - from .transcript import _run_compression + from backend.copilot.transcript import _run_compression truncation_result = self._make_compress_result( True, [{"role": "user", "content": "truncated after timeout"}] @@ -970,19 +970,19 @@ class TestRunCompression: fake_client = MagicMock() with ( patch( - "backend.copilot.sdk.transcript.get_openai_client", + "backend.copilot.transcript.get_openai_client", return_value=fake_client, ), patch( - "backend.copilot.sdk.transcript.compress_context", + "backend.copilot.transcript.compress_context", side_effect=_compress_side_effect, ), patch( - "backend.copilot.sdk.transcript._COMPACTION_TIMEOUT_SECONDS", + "backend.copilot.transcript._COMPACTION_TIMEOUT_SECONDS", 0.05, ), patch( - "backend.copilot.sdk.transcript._TRUNCATION_TIMEOUT_SECONDS", + "backend.copilot.transcript._TRUNCATION_TIMEOUT_SECONDS", 5, ), ): @@ -1007,7 +1007,7 @@ class TestCleanupStaleProjectDirs: def test_removes_old_copilot_dirs(self, tmp_path, monkeypatch): """Directories matching copilot pattern older than threshold are removed.""" - from backend.copilot.sdk.transcript import ( + from backend.copilot.transcript import ( _STALE_PROJECT_DIR_SECONDS, cleanup_stale_project_dirs, ) @@ -1015,7 +1015,7 @@ class TestCleanupStaleProjectDirs: projects_dir = tmp_path / "projects" projects_dir.mkdir() monkeypatch.setattr( - "backend.copilot.sdk.transcript._projects_base", + "backend.copilot.transcript._projects_base", lambda: str(projects_dir), ) @@ -1039,12 +1039,12 @@ class TestCleanupStaleProjectDirs: def test_ignores_non_copilot_dirs(self, tmp_path, monkeypatch): """Directories not matching copilot pattern are left alone.""" - from backend.copilot.sdk.transcript import cleanup_stale_project_dirs + from backend.copilot.transcript import cleanup_stale_project_dirs projects_dir = tmp_path / "projects" projects_dir.mkdir() monkeypatch.setattr( - "backend.copilot.sdk.transcript._projects_base", + "backend.copilot.transcript._projects_base", lambda: str(projects_dir), ) @@ -1062,7 +1062,7 @@ class TestCleanupStaleProjectDirs: def test_ttl_boundary_not_removed(self, tmp_path, monkeypatch): """A directory exactly at the TTL boundary should NOT be removed.""" - from 
backend.copilot.sdk.transcript import ( + from backend.copilot.transcript import ( _STALE_PROJECT_DIR_SECONDS, cleanup_stale_project_dirs, ) @@ -1070,7 +1070,7 @@ class TestCleanupStaleProjectDirs: projects_dir = tmp_path / "projects" projects_dir.mkdir() monkeypatch.setattr( - "backend.copilot.sdk.transcript._projects_base", + "backend.copilot.transcript._projects_base", lambda: str(projects_dir), ) @@ -1088,7 +1088,7 @@ class TestCleanupStaleProjectDirs: def test_skips_non_directory_entries(self, tmp_path, monkeypatch): """Regular files matching the copilot pattern are not removed.""" - from backend.copilot.sdk.transcript import ( + from backend.copilot.transcript import ( _STALE_PROJECT_DIR_SECONDS, cleanup_stale_project_dirs, ) @@ -1096,7 +1096,7 @@ class TestCleanupStaleProjectDirs: projects_dir = tmp_path / "projects" projects_dir.mkdir() monkeypatch.setattr( - "backend.copilot.sdk.transcript._projects_base", + "backend.copilot.transcript._projects_base", lambda: str(projects_dir), ) @@ -1114,11 +1114,11 @@ class TestCleanupStaleProjectDirs: def test_missing_base_dir_returns_zero(self, tmp_path, monkeypatch): """If the projects base directory doesn't exist, return 0 gracefully.""" - from backend.copilot.sdk.transcript import cleanup_stale_project_dirs + from backend.copilot.transcript import cleanup_stale_project_dirs nonexistent = str(tmp_path / "does-not-exist" / "projects") monkeypatch.setattr( - "backend.copilot.sdk.transcript._projects_base", + "backend.copilot.transcript._projects_base", lambda: nonexistent, ) @@ -1129,7 +1129,7 @@ class TestCleanupStaleProjectDirs: """When encoded_cwd is supplied only that directory is swept.""" import time - from backend.copilot.sdk.transcript import ( + from backend.copilot.transcript import ( _STALE_PROJECT_DIR_SECONDS, cleanup_stale_project_dirs, ) @@ -1137,7 +1137,7 @@ class TestCleanupStaleProjectDirs: projects_dir = tmp_path / "projects" projects_dir.mkdir() monkeypatch.setattr( - "backend.copilot.sdk.transcript._projects_base", + "backend.copilot.transcript._projects_base", lambda: str(projects_dir), ) @@ -1160,12 +1160,12 @@ class TestCleanupStaleProjectDirs: def test_scoped_fresh_dir_not_removed(self, tmp_path, monkeypatch): """Scoped sweep leaves a fresh directory alone.""" - from backend.copilot.sdk.transcript import cleanup_stale_project_dirs + from backend.copilot.transcript import cleanup_stale_project_dirs projects_dir = tmp_path / "projects" projects_dir.mkdir() monkeypatch.setattr( - "backend.copilot.sdk.transcript._projects_base", + "backend.copilot.transcript._projects_base", lambda: str(projects_dir), ) @@ -1181,7 +1181,7 @@ class TestCleanupStaleProjectDirs: """Scoped sweep refuses to remove a non-copilot directory.""" import time - from backend.copilot.sdk.transcript import ( + from backend.copilot.transcript import ( _STALE_PROJECT_DIR_SECONDS, cleanup_stale_project_dirs, ) @@ -1189,7 +1189,7 @@ class TestCleanupStaleProjectDirs: projects_dir = tmp_path / "projects" projects_dir.mkdir() monkeypatch.setattr( - "backend.copilot.sdk.transcript._projects_base", + "backend.copilot.transcript._projects_base", lambda: str(projects_dir), ) diff --git a/autogpt_platform/backend/backend/copilot/service_test.py b/autogpt_platform/backend/backend/copilot/service_test.py index d65b356f4a..c4b1c3182e 100644 --- a/autogpt_platform/backend/backend/copilot/service_test.py +++ b/autogpt_platform/backend/backend/copilot/service_test.py @@ -7,7 +7,7 @@ import pytest from .model import create_chat_session, get_chat_session, 
upsert_chat_session from .response_model import StreamError, StreamTextDelta from .sdk import service as sdk_service -from .sdk.transcript import download_transcript +from .transcript import download_transcript logger = logging.getLogger(__name__) diff --git a/autogpt_platform/backend/backend/copilot/tools/agent_generator/fixer.py b/autogpt_platform/backend/backend/copilot/tools/agent_generator/fixer.py index ce9c30dc3a..adebd89bf1 100644 --- a/autogpt_platform/backend/backend/copilot/tools/agent_generator/fixer.py +++ b/autogpt_platform/backend/backend/copilot/tools/agent_generator/fixer.py @@ -33,12 +33,23 @@ _GET_CURRENT_DATE_BLOCK_ID = "b29c1b50-5d0e-4d9f-8f9d-1b0e6fcbf0b1" _GMAIL_SEND_BLOCK_ID = "6c27abc2-e51d-499e-a85f-5a0041ba94f0" _TEXT_REPLACE_BLOCK_ID = "7e7c87ab-3469-4bcc-9abe-67705091b713" +# Default OrchestratorBlock model/mode — kept in sync with ChatConfig.model. +# ChatConfig uses the OpenRouter format ("anthropic/claude-opus-4.6"); +# OrchestratorBlock uses the native Anthropic model name. +ORCHESTRATOR_DEFAULT_MODEL = "claude-opus-4-6" +ORCHESTRATOR_DEFAULT_EXECUTION_MODE = "extended_thinking" + # Defaults applied to OrchestratorBlock nodes by the fixer. -_SDM_DEFAULTS: dict[str, int | bool] = { +# execution_mode and model match the copilot's default (extended thinking +# with Opus) so generated agents inherit the same reasoning capabilities. +# If the user explicitly sets these fields, the fixer won't override them. +_SDM_DEFAULTS: dict[str, int | bool | str] = { "agent_mode_max_iterations": 10, "conversation_compaction": True, "retry": 3, "multiple_tool_calls": False, + "execution_mode": ORCHESTRATOR_DEFAULT_EXECUTION_MODE, + "model": ORCHESTRATOR_DEFAULT_MODEL, } @@ -879,6 +890,12 @@ class AgentFixer: ) if is_ai_block: + # Skip AI blocks that don't expose a "model" input property + # (some AI-category blocks have no model selector at all). + input_properties = block.get("inputSchema", {}).get("properties", {}) + if "model" not in input_properties: + continue + node_id = node.get("id") input_default = node.get("input_default", {}) current_model = input_default.get("model") @@ -887,9 +904,7 @@ class AgentFixer: # Blocks with a block-specific enum on the model field (e.g. # PerplexityBlock) use their own enum values; others use the # generic set. - model_schema = ( - block.get("inputSchema", {}).get("properties", {}).get("model", {}) - ) + model_schema = input_properties.get("model", {}) block_model_enum = model_schema.get("enum") if block_model_enum: @@ -1649,6 +1664,8 @@ class AgentFixer: 2. ``conversation_compaction`` defaults to ``True`` 3. ``retry`` defaults to ``3`` 4. ``multiple_tool_calls`` defaults to ``False`` + 5. ``execution_mode`` defaults to ``"extended_thinking"`` + 6. ``model`` defaults to ``"claude-opus-4-6"`` Args: agent: The agent dictionary to fix @@ -1748,6 +1765,12 @@ class AgentFixer: agent = self.fix_node_x_coordinates(agent, node_lookup=node_lookup) agent = self.fix_getcurrentdate_offset(agent) + # Apply OrchestratorBlock defaults BEFORE fix_ai_model_parameter so that + # the orchestrator-specific model (claude-opus-4-6) is set first and + # fix_ai_model_parameter sees it as a valid allowed model instead of + # overwriting it with the generic default (gpt-4o). 
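The "don't override user-supplied values" rule is the usual `setdefault` merge; a hypothetical sketch of how `_SDM_DEFAULTS` is expected to apply per node (the real merge lives in `fix_orchestrator_blocks`, which is not shown in this hunk):

```python
def apply_defaults(input_default: dict, defaults: dict) -> dict:
    merged = dict(input_default)
    for key, value in defaults.items():
        merged.setdefault(key, value)  # explicit user values win
    return merged


node_input = {"retry": 5}  # user explicitly set retry
merged = apply_defaults(node_input, {"retry": 3, "model": "claude-opus-4-6"})
assert merged == {"retry": 5, "model": "claude-opus-4-6"}
```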
+ agent = self.fix_orchestrator_blocks(agent) + # Apply fixes that require blocks information if blocks: agent = self.fix_invalid_nested_sink_links( @@ -1765,9 +1788,6 @@ class AgentFixer: # Apply fixes for MCPToolBlock nodes agent = self.fix_mcp_tool_blocks(agent) - # Apply fixes for OrchestratorBlock nodes (agent-mode defaults) - agent = self.fix_orchestrator_blocks(agent) - # Apply fixes for AgentExecutorBlock nodes (sub-agents) if library_agents: agent = self.fix_agent_executor_blocks(agent, library_agents) diff --git a/autogpt_platform/backend/backend/copilot/tools/agent_generator/fixer_test.py b/autogpt_platform/backend/backend/copilot/tools/agent_generator/fixer_test.py index 07d71a941c..2319ad6760 100644 --- a/autogpt_platform/backend/backend/copilot/tools/agent_generator/fixer_test.py +++ b/autogpt_platform/backend/backend/copilot/tools/agent_generator/fixer_test.py @@ -580,6 +580,29 @@ class TestFixAiModelParameter: assert result["nodes"][0]["input_default"]["model"] == "perplexity/sonar" + def test_ai_block_without_model_property_is_skipped(self): + """AI-category blocks that have no 'model' input property should not + have a model injected — they simply don't expose a model selector.""" + fixer = AgentFixer() + block_id = generate_uuid() + node = _make_node(node_id="n1", block_id=block_id, input_default={}) + agent = _make_agent(nodes=[node]) + + blocks = [ + { + "id": block_id, + "name": "SomeAIBlock", + "categories": [{"category": "AI"}], + "inputSchema": { + "properties": {"prompt": {"type": "string"}}, + }, + } + ] + + result = fixer.fix_ai_model_parameter(agent, blocks) + + assert "model" not in result["nodes"][0]["input_default"] + class TestFixAgentExecutorBlocks: """Tests for fix_agent_executor_blocks.""" diff --git a/autogpt_platform/backend/backend/copilot/tools/get_agent_building_guide.py b/autogpt_platform/backend/backend/copilot/tools/get_agent_building_guide.py index 2fc733ceb2..0db8e0453c 100644 --- a/autogpt_platform/backend/backend/copilot/tools/get_agent_building_guide.py +++ b/autogpt_platform/backend/backend/copilot/tools/get_agent_building_guide.py @@ -42,7 +42,10 @@ class GetAgentBuildingGuideTool(BaseTool): @property def description(self) -> str: - return "Get the agent JSON building guide (nodes, links, AgentExecutorBlock, MCPToolBlock usage). Call before generating agent JSON." + return ( + "Get the agent JSON building guide (nodes, links, AgentExecutorBlock, MCPToolBlock usage, " + "and the create->dry-run->fix iterative workflow). Call before generating agent JSON." 
+ ) @property def parameters(self) -> dict[str, Any]: diff --git a/autogpt_platform/backend/backend/copilot/tools/get_agent_building_guide_test.py b/autogpt_platform/backend/backend/copilot/tools/get_agent_building_guide_test.py new file mode 100644 index 0000000000..261247ee72 --- /dev/null +++ b/autogpt_platform/backend/backend/copilot/tools/get_agent_building_guide_test.py @@ -0,0 +1,15 @@ +"""Tests for GetAgentBuildingGuideTool.""" + +from backend.copilot.tools.get_agent_building_guide import _load_guide + + +def test_load_guide_returns_string(): + guide = _load_guide() + assert isinstance(guide, str) + assert len(guide) > 100 + + +def test_load_guide_caches(): + guide1 = _load_guide() + guide2 = _load_guide() + assert guide1 is guide2 diff --git a/autogpt_platform/backend/backend/copilot/tools/helpers.py b/autogpt_platform/backend/backend/copilot/tools/helpers.py index 8ea7650b4a..cc45a3f63e 100644 --- a/autogpt_platform/backend/backend/copilot/tools/helpers.py +++ b/autogpt_platform/backend/backend/copilot/tools/helpers.py @@ -48,27 +48,41 @@ logger = logging.getLogger(__name__) def get_inputs_from_schema( input_schema: dict[str, Any], exclude_fields: set[str] | None = None, + input_data: dict[str, Any] | None = None, ) -> list[dict[str, Any]]: - """Extract input field info from JSON schema.""" + """Extract input field info from JSON schema. + + When *input_data* is provided, each field's ``value`` key is populated + with the value the CoPilot already supplied — so the frontend can + prefill the form instead of showing empty inputs. Fields marked + ``advanced`` in the schema are flagged so the frontend can hide them + by default (matching the builder behaviour). + """ if not isinstance(input_schema, dict): return [] exclude = exclude_fields or set() properties = input_schema.get("properties", {}) required = set(input_schema.get("required", [])) + provided = input_data or {} - return [ - { + results: list[dict[str, Any]] = [] + for name, schema in properties.items(): + if name in exclude: + continue + entry: dict[str, Any] = { "name": name, "title": schema.get("title", name), "type": schema.get("type", "string"), "description": schema.get("description", ""), "required": name in required, "default": schema.get("default"), + "advanced": schema.get("advanced", False), } - for name, schema in properties.items() - if name not in exclude - ] + if name in provided: + entry["value"] = provided[name] + results.append(entry) + return results async def execute_block( @@ -446,7 +460,9 @@ async def prepare_block_for_execution( requirements={ "credentials": missing_creds_list, "inputs": get_inputs_from_schema( - input_schema, exclude_fields=credentials_fields + input_schema, + exclude_fields=credentials_fields, + input_data=input_data, ), "execution_modes": ["immediate"], }, diff --git a/autogpt_platform/backend/backend/copilot/tools/run_agent.py b/autogpt_platform/backend/backend/copilot/tools/run_agent.py index 65ea76dd26..d056e1a5af 100644 --- a/autogpt_platform/backend/backend/copilot/tools/run_agent.py +++ b/autogpt_platform/backend/backend/copilot/tools/run_agent.py @@ -153,7 +153,11 @@ class RunAgentTool(BaseTool): }, "dry_run": { "type": "boolean", - "description": "Execute in preview mode.", + "description": ( + "When true, simulates execution using an LLM for each block " + "— no real API calls, credentials, or credits. " + "See agent_generation_guide for the full workflow." 
+ ), }, }, "required": ["dry_run"], diff --git a/autogpt_platform/backend/backend/copilot/tools/workspace_files.py b/autogpt_platform/backend/backend/copilot/tools/workspace_files.py index def2d4772a..a5fe549923 100644 --- a/autogpt_platform/backend/backend/copilot/tools/workspace_files.py +++ b/autogpt_platform/backend/backend/copilot/tools/workspace_files.py @@ -845,6 +845,7 @@ class WriteWorkspaceFileTool(BaseTool): path=path, mime_type=mime_type, overwrite=overwrite, + metadata={"origin": "agent-created"}, ) # Build informative source label and message. diff --git a/autogpt_platform/backend/backend/copilot/transcript.py b/autogpt_platform/backend/backend/copilot/transcript.py new file mode 100644 index 0000000000..7f961a116f --- /dev/null +++ b/autogpt_platform/backend/backend/copilot/transcript.py @@ -0,0 +1,1247 @@ +"""JSONL transcript management for stateless multi-turn resume. + +The Claude Code CLI persists conversations as JSONL files (one JSON object per +line). When the SDK's ``Stop`` hook fires we read this file, strip bloat +(progress entries, metadata), and upload the result to bucket storage. On the +next turn we download the transcript, write it to a temp file, and pass +``--resume`` so the CLI can reconstruct the full conversation. + +Storage is handled via ``WorkspaceStorageBackend`` (GCS in prod, local +filesystem for self-hosted) — no DB column needed. +""" + +from __future__ import annotations + +import asyncio +import logging +import os +import re +import shutil +import time +from dataclasses import dataclass +from pathlib import Path +from uuid import uuid4 + +from backend.util import json +from backend.util.clients import get_openai_client +from backend.util.prompt import CompressResult, compress_context +from backend.util.workspace_storage import GCSWorkspaceStorage, get_workspace_storage + +logger = logging.getLogger(__name__) + +# UUIDs are hex + hyphens; strip everything else to prevent path injection. +_SAFE_ID_RE = re.compile(r"[^0-9a-fA-F-]") + +# Entry types that can be safely removed from the transcript without breaking +# the parentUuid conversation tree that ``--resume`` relies on. +# - progress: UI progress ticks, no message content (avg 97KB for agent_progress) +# - file-history-snapshot: undo tracking metadata +# - queue-operation: internal queue bookkeeping +# - summary: session summaries +# - pr-link: PR link metadata +STRIPPABLE_TYPES = frozenset( + {"progress", "file-history-snapshot", "queue-operation", "summary", "pr-link"} +) + + +@dataclass +class TranscriptDownload: + """Result of downloading a transcript with its metadata.""" + + content: str + message_count: int = 0 # session.messages length when uploaded + uploaded_at: float = 0.0 # epoch timestamp of upload + + +# Workspace storage constants — deterministic path from session_id. +TRANSCRIPT_STORAGE_PREFIX = "chat-transcripts" + + +# --------------------------------------------------------------------------- +# Progress stripping +# --------------------------------------------------------------------------- + + +def strip_progress_entries(content: str) -> str: + """Remove progress/metadata entries from a JSONL transcript. + + Removes entries whose ``type`` is in ``STRIPPABLE_TYPES`` and reparents + any remaining child entries so the ``parentUuid`` chain stays intact. + Typically reduces transcript size by ~30%. + + Entries that are not stripped or reparented are kept as their original + raw JSON line to avoid unnecessary re-serialization that changes + whitespace or key ordering. 
+ """ + lines = content.strip().split("\n") + + # Parse entries, keeping the original line alongside the parsed dict. + parsed: list[tuple[str, dict | None]] = [] + for line in lines: + parsed.append((line, json.loads(line, fallback=None))) + + # First pass: identify stripped UUIDs and build parent map. + stripped_uuids: set[str] = set() + uuid_to_parent: dict[str, str] = {} + + for _line, entry in parsed: + if not isinstance(entry, dict): + continue + uid = entry.get("uuid", "") + parent = entry.get("parentUuid", "") + if uid: + uuid_to_parent[uid] = parent + if ( + entry.get("type", "") in STRIPPABLE_TYPES + and uid + and not entry.get("isCompactSummary") + ): + stripped_uuids.add(uid) + + # Second pass: keep non-stripped entries, reparenting where needed. + # Preserve original line when no reparenting is required. + reparented: set[str] = set() + for _line, entry in parsed: + if not isinstance(entry, dict): + continue + parent = entry.get("parentUuid", "") + original_parent = parent + # seen_parents is local per-entry (not shared across iterations) so + # it can only detect cycles within a single ancestry walk, not across + # entries. This is intentional: each entry's parent chain is + # independent, and reusing a global set would incorrectly short-circuit + # valid re-use of the same UUID as a parent in different subtrees. + seen_parents: set[str] = set() + while parent in stripped_uuids and parent not in seen_parents: + seen_parents.add(parent) + parent = uuid_to_parent.get(parent, "") + if parent != original_parent: + entry["parentUuid"] = parent + uid = entry.get("uuid", "") + if uid: + reparented.add(uid) + + result_lines: list[str] = [] + for line, entry in parsed: + if not isinstance(entry, dict): + result_lines.append(line) + continue + if entry.get("type", "") in STRIPPABLE_TYPES and not entry.get( + "isCompactSummary" + ): + continue + uid = entry.get("uuid", "") + if uid in reparented: + # Re-serialize only entries whose parentUuid was changed. + result_lines.append(json.dumps(entry, separators=(",", ":"))) + else: + result_lines.append(line) + + return "\n".join(result_lines) + "\n" + + +def strip_stale_thinking_blocks(content: str) -> str: + """Remove thinking/redacted_thinking blocks from non-last assistant entries. + + The Anthropic API only requires thinking blocks in the **last** assistant + message to be value-identical to the original response. Older assistant + entries carry stale thinking blocks that consume significant tokens + (often 10-50K each) without providing useful context for ``--resume``. + + Stripping them before upload prevents the CLI from triggering compaction + every turn just to compress away the stale thinking bloat. + """ + lines = content.strip().split("\n") + if not lines: + return content + + parsed: list[tuple[str, dict | None]] = [] + for line in lines: + parsed.append((line, json.loads(line, fallback=None))) + + # Reverse scan to find the last assistant message ID and index. 
+ last_asst_msg_id: str | None = None + last_asst_idx: int | None = None + for i in range(len(parsed) - 1, -1, -1): + _line, entry = parsed[i] + if not isinstance(entry, dict): + continue + msg = entry.get("message", {}) + if msg.get("role") == "assistant": + last_asst_msg_id = msg.get("id") + last_asst_idx = i + break + + if last_asst_idx is None: + return content + + result_lines: list[str] = [] + stripped_count = 0 + for i, (line, entry) in enumerate(parsed): + if not isinstance(entry, dict): + result_lines.append(line) + continue + + msg = entry.get("message", {}) + # Only strip from assistant entries that are NOT the last turn. + # Use msg_id matching when available; fall back to index for entries + # without an id field. + is_last_turn = ( + last_asst_msg_id is not None and msg.get("id") == last_asst_msg_id + ) or (last_asst_msg_id is None and i == last_asst_idx) + if ( + msg.get("role") == "assistant" + and not is_last_turn + and isinstance(msg.get("content"), list) + ): + content_blocks = msg["content"] + filtered = [ + b + for b in content_blocks + if not (isinstance(b, dict) and b.get("type") in _THINKING_BLOCK_TYPES) + ] + if len(filtered) < len(content_blocks): + stripped_count += len(content_blocks) - len(filtered) + entry = {**entry, "message": {**msg, "content": filtered}} + result_lines.append(json.dumps(entry, separators=(",", ":"))) + continue + + result_lines.append(line) + + if stripped_count: + logger.info( + "[Transcript] Stripped %d stale thinking block(s) from non-last entries", + stripped_count, + ) + + return "\n".join(result_lines) + "\n" + + +def strip_for_upload(content: str) -> str: + """Combined single-parse strip of progress entries and stale thinking blocks. + + Equivalent to ``strip_stale_thinking_blocks(strip_progress_entries(content))`` + but parses the JSONL only once, avoiding redundant ``split`` + ``json.loads`` + passes on every upload. 
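The intended contract of the combined pass, expressed as an illustrative property check (`check_strip_equivalence` is a hypothetical helper; `raw_jsonl` would be any transcript emitted by the CLI):

```python
def check_strip_equivalence(raw_jsonl: str) -> None:
    combined = strip_for_upload(raw_jsonl)
    two_pass = strip_stale_thinking_blocks(strip_progress_entries(raw_jsonl))
    assert combined == two_pass  # same output, one JSONL parse instead of two
```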
+ """ + lines = content.strip().split("\n") + if not lines: + return content + + parsed: list[tuple[str, dict | None]] = [] + for line in lines: + parsed.append((line, json.loads(line, fallback=None))) + + # --- Phase 1: progress stripping (reparent children) --- + stripped_uuids: set[str] = set() + uuid_to_parent: dict[str, str] = {} + + for _line, entry in parsed: + if not isinstance(entry, dict): + continue + uid = entry.get("uuid", "") + parent = entry.get("parentUuid", "") + if uid: + uuid_to_parent[uid] = parent + if ( + entry.get("type", "") in STRIPPABLE_TYPES + and uid + and not entry.get("isCompactSummary") + ): + stripped_uuids.add(uid) + + reparented: set[str] = set() + for _line, entry in parsed: + if not isinstance(entry, dict): + continue + parent = entry.get("parentUuid", "") + original_parent = parent + seen_parents: set[str] = set() + while parent in stripped_uuids and parent not in seen_parents: + seen_parents.add(parent) + parent = uuid_to_parent.get(parent, "") + if parent != original_parent: + entry["parentUuid"] = parent + uid = entry.get("uuid", "") + if uid: + reparented.add(uid) + + # --- Phase 2: identify last assistant for thinking-block stripping --- + last_asst_msg_id: str | None = None + last_asst_idx: int | None = None + for i in range(len(parsed) - 1, -1, -1): + _line, entry = parsed[i] + if not isinstance(entry, dict): + continue + if entry.get("type", "") in STRIPPABLE_TYPES and not entry.get( + "isCompactSummary" + ): + continue + msg = entry.get("message", {}) + if msg.get("role") == "assistant": + last_asst_msg_id = msg.get("id") + last_asst_idx = i + break + + # --- Phase 3: single output pass --- + result_lines: list[str] = [] + thinking_stripped = 0 + for i, (line, entry) in enumerate(parsed): + if not isinstance(entry, dict): + result_lines.append(line) + continue + + # Drop progress/metadata entries + if entry.get("type", "") in STRIPPABLE_TYPES and not entry.get( + "isCompactSummary" + ): + continue + + needs_reserialize = False + uid = entry.get("uuid", "") + + # Reparented entries need re-serialization + if uid in reparented: + needs_reserialize = True + + # Strip stale thinking blocks from non-last assistant entries + if last_asst_idx is not None: + msg = entry.get("message", {}) + is_last_turn = ( + last_asst_msg_id is not None and msg.get("id") == last_asst_msg_id + ) or (last_asst_msg_id is None and i == last_asst_idx) + if ( + msg.get("role") == "assistant" + and not is_last_turn + and isinstance(msg.get("content"), list) + ): + content_blocks = msg["content"] + filtered = [ + b + for b in content_blocks + if not ( + isinstance(b, dict) and b.get("type") in _THINKING_BLOCK_TYPES + ) + ] + if len(filtered) < len(content_blocks): + thinking_stripped += len(content_blocks) - len(filtered) + entry = {**entry, "message": {**msg, "content": filtered}} + needs_reserialize = True + + if needs_reserialize: + result_lines.append(json.dumps(entry, separators=(",", ":"))) + else: + result_lines.append(line) + + if thinking_stripped: + logger.info( + "[Transcript] Stripped %d stale thinking block(s) from non-last entries", + thinking_stripped, + ) + + return "\n".join(result_lines) + "\n" + + +# --------------------------------------------------------------------------- +# Local file I/O (write temp file for --resume) +# --------------------------------------------------------------------------- + + +def _sanitize_id(raw_id: str, max_len: int = 36) -> str: + """Sanitize an ID for safe use in file paths. 
+
+    Session/user IDs are expected to be UUIDs (hex + hyphens). Strip
+    everything else and truncate to *max_len* so the result cannot introduce
+    path separators or other special characters.
+    """
+    cleaned = _SAFE_ID_RE.sub("", raw_id or "")[:max_len]
+    return cleaned or "unknown"
+
+
+_SAFE_CWD_PREFIX = os.path.realpath("/tmp/copilot-")
+
+
+def _projects_base() -> str:
+    """Return the resolved path to the CLI's projects directory."""
+    config_dir = os.environ.get("CLAUDE_CONFIG_DIR") or os.path.expanduser("~/.claude")
+    return os.path.realpath(os.path.join(config_dir, "projects"))
+
+
+_STALE_PROJECT_DIR_SECONDS = 12 * 3600  # 12 hours — matches max session lifetime
+_MAX_PROJECT_DIRS_TO_SWEEP = 50  # limit per sweep to avoid long pauses
+
+
+def cleanup_stale_project_dirs(encoded_cwd: str | None = None) -> int:
+    """Remove CLI project directories older than ``_STALE_PROJECT_DIR_SECONDS``.
+
+    Each CoPilot SDK turn creates a unique ``~/.claude/projects/{encoded_cwd}/``
+    directory. These are intentionally kept across turns so the model can read
+    tool-result files via ``--resume``. However, after a session ends they
+    become stale. This function sweeps old ones to prevent unbounded disk
+    growth.
+
+    When *encoded_cwd* is provided the sweep is scoped to that single
+    directory, making the operation safe in multi-tenant environments where
+    multiple copilot sessions share the same host. Without it the function
+    falls back to sweeping all directories matching the copilot naming pattern
+    (``-tmp-copilot-``), which is only safe for single-tenant deployments.
+
+    Returns the number of directories removed.
+    """
+    projects_base = _projects_base()
+    if not os.path.isdir(projects_base):
+        return 0
+
+    now = time.time()
+    removed = 0
+
+    # Scoped mode: only clean up the one directory for the current session.
+    if encoded_cwd:
+        target = Path(projects_base) / encoded_cwd
+        if not target.is_dir():
+            return 0
+        # Guard: only sweep copilot-generated dirs.
+        if "-tmp-copilot-" not in target.name:
+            logger.warning(
+                "[Transcript] Refusing to sweep non-copilot dir: %s", target.name
+            )
+            return 0
+        try:
+            # st_mtime is used as a proxy for session activity. Claude CLI writes
+            # its JSONL transcript into this directory during each turn, so mtime
+            # advances on every turn. A directory whose mtime is older than
+            # _STALE_PROJECT_DIR_SECONDS has not had an active turn in that window
+            # and is safe to remove (the session cannot --resume after cleanup).
+            age = now - target.stat().st_mtime
+        except OSError:
+            return 0
+        if age < _STALE_PROJECT_DIR_SECONDS:
+            return 0
+        try:
+            shutil.rmtree(target, ignore_errors=True)
+            removed = 1
+        except OSError:
+            pass
+        if removed:
+            logger.info(
+                "[Transcript] Swept stale CLI project dir %s (age %ds > %ds)",
+                target.name,
+                int(age),
+                _STALE_PROJECT_DIR_SECONDS,
+            )
+        return removed
+
+    # Unscoped fallback: sweep all copilot dirs across the projects base.
+    # Only safe for single-tenant deployments; callers should prefer the
+    # scoped variant by passing encoded_cwd.
+    try:
+        entries = Path(projects_base).iterdir()
+    except OSError as e:
+        logger.warning("[Transcript] Failed to list projects dir: %s", e)
+        return 0
+
+    for entry in entries:
+        if removed >= _MAX_PROJECT_DIRS_TO_SWEEP:
+            break
+        # Only sweep copilot-generated dirs (pattern: -tmp-copilot- or
+        # -private-tmp-copilot-).
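+        # Typical encoded names (hypothetical example): "-tmp-copilot-abc123".
+        # The CLI appears to derive the directory name by replacing "/" with
+        # "-" in the session cwd, so "/tmp/copilot-abc123" would encode to
+        # "-tmp-copilot-abc123" (and "/private/tmp/..." on macOS to
+        # "-private-tmp-...").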
+ if "-tmp-copilot-" not in entry.name: + continue + if not entry.is_dir(): + continue + try: + # See the scoped-mode comment above: st_mtime advances on every turn, + # so a stale mtime reliably indicates an inactive session. + age = now - entry.stat().st_mtime + except OSError: + continue + if age < _STALE_PROJECT_DIR_SECONDS: + continue + + try: + shutil.rmtree(entry, ignore_errors=True) + removed += 1 + except OSError: + pass + + if removed: + logger.info( + "[Transcript] Swept %d stale CLI project dirs (older than %ds)", + removed, + _STALE_PROJECT_DIR_SECONDS, + ) + return removed + + +def read_compacted_entries(transcript_path: str) -> list[dict] | None: + """Read compacted entries from the CLI session file after compaction. + + Parses the JSONL file line-by-line, finds the ``isCompactSummary: true`` + entry, and returns it plus all entries after it. + + The CLI writes the compaction summary BEFORE sending the next message, + so the file is guaranteed to be flushed by the time we read it. + + Returns a list of parsed dicts, or ``None`` if the file cannot be read + or no compaction summary is found. + """ + if not transcript_path: + return None + + projects_base = _projects_base() + real_path = os.path.realpath(transcript_path) + if not real_path.startswith(projects_base + os.sep): + logger.warning( + "[Transcript] transcript_path outside projects base: %s", transcript_path + ) + return None + + try: + content = Path(real_path).read_text() + except OSError as e: + logger.warning( + "[Transcript] Failed to read session file %s: %s", transcript_path, e + ) + return None + + lines = content.strip().split("\n") + compact_idx: int | None = None + + for idx, line in enumerate(lines): + if not line.strip(): + continue + entry = json.loads(line, fallback=None) + if not isinstance(entry, dict): + continue + if entry.get("isCompactSummary"): + compact_idx = idx # don't break — find the LAST summary + + if compact_idx is None: + logger.debug("[Transcript] No compaction summary found in %s", transcript_path) + return None + + entries: list[dict] = [] + for line in lines[compact_idx:]: + if not line.strip(): + continue + entry = json.loads(line, fallback=None) + if isinstance(entry, dict): + entries.append(entry) + + logger.info( + "[Transcript] Read %d compacted entries from %s (summary at line %d)", + len(entries), + transcript_path, + compact_idx + 1, + ) + return entries + + +def write_transcript_to_tempfile( + transcript_content: str, + session_id: str, + cwd: str, +) -> str | None: + """Write JSONL transcript to a temp file inside *cwd* for ``--resume``. + + The file lives in the session working directory so it is cleaned up + automatically when the session ends. + + Returns the absolute path to the file, or ``None`` on failure. + """ + # Validate cwd is under the expected sandbox prefix (CodeQL sanitizer). 
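+    # realpath() collapses symlinks and ".." segments before the prefix
+    # check, so a cwd like "/tmp/copilot-x/../../etc" resolves to "/etc"
+    # and is rejected.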
+ real_cwd = os.path.realpath(cwd) + if not real_cwd.startswith(_SAFE_CWD_PREFIX): + logger.warning("[Transcript] cwd outside sandbox: %s", cwd) + return None + + try: + os.makedirs(real_cwd, exist_ok=True) + safe_id = _sanitize_id(session_id, max_len=8) + jsonl_path = os.path.realpath( + os.path.join(real_cwd, f"transcript-{safe_id}.jsonl") + ) + if not jsonl_path.startswith(real_cwd): + logger.warning("[Transcript] Path escaped cwd: %s", jsonl_path) + return None + + with open(jsonl_path, "w") as f: + f.write(transcript_content) + + logger.info("[Transcript] Wrote resume file: %s", jsonl_path) + return jsonl_path + + except OSError as e: + logger.warning("[Transcript] Failed to write resume file: %s", e) + return None + + +def validate_transcript(content: str | None) -> bool: + """Check that a transcript has actual conversation messages. + + A valid transcript needs at least one assistant message (not just + queue-operation / file-history-snapshot metadata). We do NOT require + a ``type: "user"`` entry because with ``--resume`` the user's message + is passed as a CLI query parameter and does not appear in the + transcript file. + """ + if not content or not content.strip(): + return False + + lines = content.strip().split("\n") + + has_assistant = False + + for line in lines: + if not line.strip(): + continue + entry = json.loads(line, fallback=None) + if not isinstance(entry, dict): + return False + if entry.get("type") == "assistant": + has_assistant = True + + return has_assistant + + +# --------------------------------------------------------------------------- +# Bucket storage (GCS / local via WorkspaceStorageBackend) +# --------------------------------------------------------------------------- + + +def _storage_path_parts(user_id: str, session_id: str) -> tuple[str, str, str]: + """Return (workspace_id, file_id, filename) for a session's transcript. + + Path structure: ``chat-transcripts/{user_id}/{session_id}.jsonl`` + IDs are sanitized to hex+hyphen to prevent path traversal. + """ + return ( + TRANSCRIPT_STORAGE_PREFIX, + _sanitize_id(user_id), + f"{_sanitize_id(session_id)}.jsonl", + ) + + +def _meta_storage_path_parts(user_id: str, session_id: str) -> tuple[str, str, str]: + """Return (workspace_id, file_id, filename) for a session's transcript metadata.""" + return ( + TRANSCRIPT_STORAGE_PREFIX, + _sanitize_id(user_id), + f"{_sanitize_id(session_id)}.meta.json", + ) + + +def _build_path_from_parts(parts: tuple[str, str, str], backend: object) -> str: + """Build a full storage path from (workspace_id, file_id, filename) parts.""" + wid, fid, fname = parts + if isinstance(backend, GCSWorkspaceStorage): + blob = f"workspaces/{wid}/{fid}/{fname}" + return f"gcs://{backend.bucket_name}/{blob}" + return f"local://{wid}/{fid}/{fname}" + + +def _build_storage_path(user_id: str, session_id: str, backend: object) -> str: + """Build the full storage path string that ``retrieve()`` expects.""" + return _build_path_from_parts(_storage_path_parts(user_id, session_id), backend) + + +def _build_meta_storage_path(user_id: str, session_id: str, backend: object) -> str: + """Build the full storage path for the companion .meta.json file.""" + return _build_path_from_parts( + _meta_storage_path_parts(user_id, session_id), backend + ) + + +async def upload_transcript( + user_id: str, + session_id: str, + content: str, + message_count: int = 0, + log_prefix: str = "[Transcript]", + skip_strip: bool = False, +) -> None: + """Strip progress entries and stale thinking blocks, then upload transcript. 
+ + The transcript represents the FULL active context (atomic). + Each upload REPLACES the previous transcript entirely. + + The executor holds a cluster lock per session, so concurrent uploads for + the same session cannot happen. + + Args: + content: Complete JSONL transcript (from TranscriptBuilder). + message_count: ``len(session.messages)`` at upload time. + skip_strip: When ``True``, skip the strip + re-validate pass. + Safe for builder-generated content (baseline path) which + never emits progress entries or stale thinking blocks. + """ + if skip_strip: + # Caller guarantees the content is already clean and valid. + stripped = content + else: + # Strip metadata entries and stale thinking blocks in a single parse. + # SDK-built transcripts may have progress entries; strip for safety. + stripped = strip_for_upload(content) + if not skip_strip and not validate_transcript(stripped): + # Log entry types for debugging — helps identify why validation failed + entry_types = [ + json.loads(line, fallback={"type": "INVALID_JSON"}).get("type", "?") + for line in stripped.strip().split("\n") + ] + logger.warning( + "%s Skipping upload — stripped content not valid " + "(types=%s, stripped_len=%d, raw_len=%d)", + log_prefix, + entry_types, + len(stripped), + len(content), + ) + logger.debug("%s Raw content preview: %s", log_prefix, content[:500]) + logger.debug("%s Stripped content: %s", log_prefix, stripped[:500]) + return + + storage = await get_workspace_storage() + wid, fid, fname = _storage_path_parts(user_id, session_id) + encoded = stripped.encode("utf-8") + meta = {"message_count": message_count, "uploaded_at": time.time()} + mwid, mfid, mfname = _meta_storage_path_parts(user_id, session_id) + meta_encoded = json.dumps(meta).encode("utf-8") + + # Transcript + metadata are independent objects at different keys, so + # write them concurrently. ``return_exceptions`` keeps a metadata + # failure from sinking the transcript write. + transcript_result, metadata_result = await asyncio.gather( + storage.store( + workspace_id=wid, + file_id=fid, + filename=fname, + content=encoded, + ), + storage.store( + workspace_id=mwid, + file_id=mfid, + filename=mfname, + content=meta_encoded, + ), + return_exceptions=True, + ) + if isinstance(transcript_result, BaseException): + raise transcript_result + if isinstance(metadata_result, BaseException): + # Metadata is best-effort — the gap-fill logic in + # _build_query_message tolerates a missing metadata file. + logger.warning("%s Failed to write metadata: %s", log_prefix, metadata_result) + + logger.info( + "%s Uploaded %dB (stripped from %dB, msg_count=%d)", + log_prefix, + len(encoded), + len(content), + message_count, + ) + + +async def download_transcript( + user_id: str, + session_id: str, + log_prefix: str = "[Transcript]", +) -> TranscriptDownload | None: + """Download transcript and metadata from bucket storage. + + Returns a ``TranscriptDownload`` with the JSONL content and the + ``message_count`` watermark from the upload, or ``None`` if not found. + + The content and metadata fetches run concurrently since they are + independent objects in the bucket. 
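+
+    Illustrative sketch of the resume flow (assumes a ``TranscriptBuilder``
+    named ``builder``)::
+
+        dl = await download_transcript(user_id, session_id)
+        if dl is not None:
+            builder.load_previous(dl.content)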
+ """ + storage = await get_workspace_storage() + path = _build_storage_path(user_id, session_id, storage) + meta_path = _build_meta_storage_path(user_id, session_id, storage) + + content_task = asyncio.create_task(storage.retrieve(path)) + meta_task = asyncio.create_task(storage.retrieve(meta_path)) + content_result, meta_result = await asyncio.gather( + content_task, meta_task, return_exceptions=True + ) + + if isinstance(content_result, FileNotFoundError): + logger.debug("%s No transcript in storage", log_prefix) + return None + if isinstance(content_result, BaseException): + logger.warning( + "%s Failed to download transcript: %s", log_prefix, content_result + ) + return None + + content = content_result.decode("utf-8") + + # Metadata is best-effort — old transcripts won't have it. + message_count = 0 + uploaded_at = 0.0 + if isinstance(meta_result, FileNotFoundError): + pass # No metadata — treat as unknown (msg_count=0 → always fill gap) + elif isinstance(meta_result, BaseException): + logger.debug( + "%s Failed to load transcript metadata: %s", log_prefix, meta_result + ) + else: + meta = json.loads(meta_result.decode("utf-8"), fallback={}) + message_count = meta.get("message_count", 0) + uploaded_at = meta.get("uploaded_at", 0.0) + + logger.info( + "%s Downloaded %dB (msg_count=%d)", log_prefix, len(content), message_count + ) + return TranscriptDownload( + content=content, + message_count=message_count, + uploaded_at=uploaded_at, + ) + + +async def delete_transcript(user_id: str, session_id: str) -> None: + """Delete transcript and its metadata from bucket storage. + + Removes both the ``.jsonl`` transcript and the companion ``.meta.json`` + so stale ``message_count`` watermarks cannot corrupt gap-fill logic. + """ + storage = await get_workspace_storage() + path = _build_storage_path(user_id, session_id, storage) + + try: + await storage.delete(path) + logger.info("[Transcript] Deleted transcript for session %s", session_id) + except Exception as e: + logger.warning("[Transcript] Failed to delete transcript: %s", e) + + # Also delete the companion .meta.json to avoid orphaned metadata. + try: + meta_path = _build_meta_storage_path(user_id, session_id, storage) + await storage.delete(meta_path) + logger.info("[Transcript] Deleted metadata for session %s", session_id) + except Exception as e: + logger.warning("[Transcript] Failed to delete metadata: %s", e) + + +# --------------------------------------------------------------------------- +# Transcript compaction — LLM summarization for prompt-too-long recovery +# --------------------------------------------------------------------------- + +# JSONL protocol values used in transcript serialization. +STOP_REASON_END_TURN = "end_turn" +STOP_REASON_TOOL_USE = "tool_use" +COMPACT_MSG_ID_PREFIX = "msg_compact_" +ENTRY_TYPE_MESSAGE = "message" + + +_THINKING_BLOCK_TYPES = frozenset({"thinking", "redacted_thinking"}) + + +def _flatten_assistant_content(blocks: list) -> str: + """Flatten assistant content blocks into a single plain-text string. + + Structured ``tool_use`` blocks are converted to ``[tool_use: name]`` + placeholders. ``thinking`` and ``redacted_thinking`` blocks are + silently dropped — they carry no useful context for compression + summaries and must not leak into compacted transcripts (the Anthropic + API requires thinking blocks in the last assistant message to be + value-identical to the original response; including stale thinking + text would violate that constraint). 
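+
+    Example (illustrative)::
+
+        >>> _flatten_assistant_content([
+        ...     {"type": "thinking", "thinking": "hmm"},
+        ...     {"type": "text", "text": "Done."},
+        ...     {"type": "tool_use", "name": "bash", "id": "t1", "input": {}},
+        ... ])
+        'Done.'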
+ + This is intentional: ``compress_context`` requires plain text for + token counting and LLM summarization. The structural loss is + acceptable because compaction only runs when the original transcript + was already too large for the model. + """ + parts: list[str] = [] + for block in blocks: + if isinstance(block, dict): + btype = block.get("type", "") + if btype in _THINKING_BLOCK_TYPES: + continue + if btype == "text": + parts.append(block.get("text", "")) + elif btype == "tool_use": + # Drop tool_use entirely — any text representation gets + # mimicked by the model as plain text instead of actual + # structured tool calls. The tool results (in the + # following user/tool_result entry) provide sufficient + # context about what happened. + continue + else: + continue + elif isinstance(block, str): + parts.append(block) + return "\n".join(parts) if parts else "" + + +def _flatten_tool_result_content(blocks: list) -> str: + """Flatten tool_result and other content blocks into plain text. + + Handles nested tool_result structures, text blocks, and raw strings. + Uses ``json.dumps`` as fallback for dict blocks without a ``text`` key + or where ``text`` is ``None``. + + Like ``_flatten_assistant_content``, structured blocks (images, nested + tool results) are reduced to text representations for compression. + """ + str_parts: list[str] = [] + for block in blocks: + if isinstance(block, dict) and block.get("type") == "tool_result": + inner = block.get("content") or "" + if isinstance(inner, list): + for sub in inner: + if isinstance(sub, dict): + sub_type = sub.get("type") + if sub_type in ("image", "document"): + # Avoid serializing base64 binary data into + # the compaction input — use a placeholder. + str_parts.append(f"[__{sub_type}__]") + elif sub_type == "text" or sub.get("text") is not None: + str_parts.append(str(sub.get("text", ""))) + else: + str_parts.append(json.dumps(sub)) + else: + str_parts.append(str(sub)) + else: + str_parts.append(str(inner)) + elif isinstance(block, dict) and block.get("type") == "text": + str_parts.append(str(block.get("text", ""))) + elif isinstance(block, dict): + # Preserve non-text/non-tool_result blocks (e.g. image) as placeholders. + # Use __prefix__ to distinguish from literal user text. + btype = block.get("type", "unknown") + str_parts.append(f"[__{btype}__]") + elif isinstance(block, str): + str_parts.append(block) + return "\n".join(str_parts) if str_parts else "" + + +def _transcript_to_messages(content: str) -> list[dict]: + """Convert JSONL transcript entries to plain message dicts for compression. + + Parses each line of the JSONL *content*, skips strippable metadata entries + (progress, file-history-snapshot, etc.), and extracts the ``role`` and + flattened ``content`` from the ``message`` field of each remaining entry. + + Structured content blocks (``tool_use``, ``tool_result``, images) are + flattened to plain text via ``_flatten_assistant_content`` and + ``_flatten_tool_result_content`` so that ``compress_context`` can + perform token counting and LLM summarization on uniform strings. + + Returns: + A list of ``{"role": str, "content": str}`` dicts suitable for + ``compress_context``. 
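+
+    Example (illustrative)::
+
+        >>> _transcript_to_messages(
+        ...     '{"type": "user", "uuid": "u1", '
+        ...     '"message": {"role": "user", "content": "hi"}}'
+        ... )
+        [{'role': 'user', 'content': 'hi'}]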
+ """ + messages: list[dict] = [] + for line in content.strip().split("\n"): + if not line.strip(): + continue + entry = json.loads(line, fallback=None) + if not isinstance(entry, dict): + continue + if entry.get("type", "") in STRIPPABLE_TYPES and not entry.get( + "isCompactSummary" + ): + continue + msg = entry.get("message", {}) + role = msg.get("role", "") + if not role: + continue + msg_dict: dict = {"role": role} + raw_content = msg.get("content") + if role == "assistant" and isinstance(raw_content, list): + msg_dict["content"] = _flatten_assistant_content(raw_content) + elif isinstance(raw_content, list): + msg_dict["content"] = _flatten_tool_result_content(raw_content) + else: + msg_dict["content"] = raw_content or "" + messages.append(msg_dict) + return messages + + +def _messages_to_transcript(messages: list[dict]) -> str: + """Convert compressed message dicts back to JSONL transcript format. + + Rebuilds a minimal JSONL transcript from the ``{"role", "content"}`` + dicts returned by ``compress_context``. Each message becomes one JSONL + line with a fresh ``uuid`` / ``parentUuid`` chain so the CLI's + ``--resume`` flag can reconstruct a valid conversation tree. + + Assistant messages are wrapped in the full ``message`` envelope + (``id``, ``model``, ``stop_reason``, structured ``content`` blocks) + that the CLI expects. User messages use the simpler ``{role, content}`` + form. + + Returns: + A newline-terminated JSONL string, or an empty string if *messages* + is empty. + """ + lines: list[str] = [] + last_uuid: str = "" # root entry uses empty string, not null + for msg in messages: + role = msg.get("role", "user") + entry_type = "assistant" if role == "assistant" else "user" + uid = str(uuid4()) + content = msg.get("content", "") + if role == "assistant": + message: dict = { + "role": "assistant", + "model": "", + "id": f"{COMPACT_MSG_ID_PREFIX}{uuid4().hex[:24]}", + "type": ENTRY_TYPE_MESSAGE, + "content": [{"type": "text", "text": content}] if content else [], + "stop_reason": STOP_REASON_END_TURN, + "stop_sequence": None, + } + else: + message = {"role": role, "content": content} + entry = { + "type": entry_type, + "uuid": uid, + "parentUuid": last_uuid, + "message": message, + } + lines.append(json.dumps(entry, separators=(",", ":"))) + last_uuid = uid + return "\n".join(lines) + "\n" if lines else "" + + +_COMPACTION_TIMEOUT_SECONDS = 60 +_TRUNCATION_TIMEOUT_SECONDS = 30 + + +async def _run_compression( + messages: list[dict], + model: str, + log_prefix: str, +) -> CompressResult: + """Run LLM-based compression with truncation fallback. + + Uses the shared OpenAI client from ``get_openai_client()``. + If no client is configured or the LLM call fails, falls back to + truncation-based compression which drops older messages without + summarization. + + A 60-second timeout prevents a hung LLM call from blocking the + retry path indefinitely. The truncation fallback also has a + 30-second timeout to guard against slow tokenization on very large + transcripts. 
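+
+    Degradation order, in short: configured client and healthy LLM call ->
+    summarization; missing client or a failed/timed-out LLM call ->
+    truncation with ``client=None``. A timeout in the truncation fallback
+    itself propagates to the caller.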
+ """ + client = get_openai_client() + if client is None: + logger.warning("%s No OpenAI client configured, using truncation", log_prefix) + return await asyncio.wait_for( + compress_context(messages=messages, model=model, client=None), + timeout=_TRUNCATION_TIMEOUT_SECONDS, + ) + try: + return await asyncio.wait_for( + compress_context(messages=messages, model=model, client=client), + timeout=_COMPACTION_TIMEOUT_SECONDS, + ) + except Exception as e: + logger.warning("%s LLM compaction failed, using truncation: %s", log_prefix, e) + return await asyncio.wait_for( + compress_context(messages=messages, model=model, client=None), + timeout=_TRUNCATION_TIMEOUT_SECONDS, + ) + + +def _find_last_assistant_entry( + content: str, +) -> tuple[list[str], list[str]]: + """Split JSONL lines into (compressible_prefix, preserved_tail). + + The tail starts at the **first** entry of the last assistant turn and + includes everything after it (typically trailing user messages). An + assistant turn can span multiple consecutive JSONL entries sharing the + same ``message.id`` (e.g., a thinking entry followed by a tool_use + entry). All entries of the turn are preserved verbatim. + + The Anthropic API requires that ``thinking`` and ``redacted_thinking`` + blocks in the **last** assistant message remain value-identical to the + original response (the API validates parsed signature values, not raw + JSON bytes). By excluding the entire turn from compression we + guarantee those blocks are never altered. + + Returns ``(all_lines, [])`` when no assistant entry is found. + """ + lines = [ln for ln in content.strip().split("\n") if ln.strip()] + + # Parse all lines once to avoid double JSON deserialization. + # json.loads with fallback=None returns Any; non-dict entries are + # safely skipped by the isinstance(entry, dict) guards below. + parsed: list = [json.loads(ln, fallback=None) for ln in lines] + + # Reverse scan: find the message.id and index of the last assistant entry. + last_asst_msg_id: str | None = None + last_asst_idx: int | None = None + for i in range(len(parsed) - 1, -1, -1): + entry = parsed[i] + if not isinstance(entry, dict): + continue + msg = entry.get("message", {}) + if msg.get("role") == "assistant": + last_asst_idx = i + last_asst_msg_id = msg.get("id") + break + + if last_asst_idx is None: + return lines, [] + + # If the assistant entry has no message.id, fall back to preserving + # from that single entry onward — safer than compressing everything. + if last_asst_msg_id is None: + return lines[:last_asst_idx], lines[last_asst_idx:] + + # Forward scan: find the first entry of this turn (same message.id). + first_turn_idx: int | None = None + for i, entry in enumerate(parsed): + if not isinstance(entry, dict): + continue + msg = entry.get("message", {}) + if msg.get("role") == "assistant" and msg.get("id") == last_asst_msg_id: + first_turn_idx = i + break + + if first_turn_idx is None: + return lines, [] + return lines[:first_turn_idx], lines[first_turn_idx:] + + +async def compact_transcript( + content: str, + *, + model: str, + log_prefix: str = "[Transcript]", +) -> str | None: + """Compact an oversized JSONL transcript using LLM summarization. + + Converts transcript entries to plain messages, runs ``compress_context`` + (the same compressor used for pre-query history), and rebuilds JSONL. + + The **last assistant entry** (and any entries after it) are preserved + verbatim — never flattened or compressed. 
The Anthropic API requires + ``thinking`` and ``redacted_thinking`` blocks in the latest assistant + message to be value-identical to the original response (the API + validates parsed signature values, not raw JSON bytes); compressing + them would destroy the cryptographic signatures and cause + ``invalid_request_error``. + + Structured content in *older* assistant entries (``tool_use`` blocks, + ``thinking`` blocks, ``tool_result`` nesting, images) is flattened to + plain text for compression. This matches the fidelity of the Plan C + (DB compression) fallback path. + + Returns the compacted JSONL string, or ``None`` on failure. + + See also: + ``_compress_messages`` in ``service.py`` — compresses ``ChatMessage`` + lists for pre-query DB history. + """ + prefix_lines, tail_lines = _find_last_assistant_entry(content) + + # Build the JSONL string for the compressible prefix + prefix_content = "\n".join(prefix_lines) + "\n" if prefix_lines else "" + messages = _transcript_to_messages(prefix_content) if prefix_content else [] + + if len(messages) + len(tail_lines) < 2: + total = len(messages) + len(tail_lines) + logger.warning("%s Too few messages to compact (%d)", log_prefix, total) + return None + if not messages: + logger.warning("%s Nothing to compress (only tail entries remain)", log_prefix) + return None + try: + result = await _run_compression(messages, model, log_prefix) + if not result.was_compacted: + logger.warning( + "%s Compressor reports within budget but SDK rejected — " + "signalling failure", + log_prefix, + ) + return None + if not result.messages: + logger.warning("%s Compressor returned empty messages", log_prefix) + return None + logger.info( + "%s Compacted transcript: %d->%d tokens (%d summarized, %d dropped)", + log_prefix, + result.original_token_count, + result.token_count, + result.messages_summarized, + result.messages_dropped, + ) + compressed_part = _messages_to_transcript(result.messages) + + # Re-append the preserved tail (last assistant + trailing entries) + # with parentUuid patched to chain onto the compressed prefix. + tail_part = _rechain_tail(compressed_part, tail_lines) + compacted = compressed_part + tail_part + + if len(compacted) >= len(content): + # Byte count can increase due to preserved tail entries + # (thinking blocks, JSON overhead) even when token count + # decreased. Log a warning but still return — the API + # validates tokens not bytes, and the caller falls through + # to DB fallback if the transcript is still too large. + logger.warning( + "%s Compacted transcript (%d bytes) is not smaller than " + "original (%d bytes) — may still reduce token count", + log_prefix, + len(compacted), + len(content), + ) + # Authoritative validation — the caller (_reduce_context) also + # validates, but this is the canonical check that guarantees we + # never return a malformed transcript from this function. + if not validate_transcript(compacted): + logger.warning("%s Compacted transcript failed validation", log_prefix) + return None + return compacted + except Exception as e: + logger.error( + "%s Transcript compaction failed: %s", log_prefix, e, exc_info=True + ) + return None + + +def _rechain_tail(compressed_prefix: str, tail_lines: list[str]) -> str: + """Patch tail entries so their parentUuid chain links to the compressed prefix. + + The first tail entry's ``parentUuid`` is set to the ``uuid`` of the + last entry in the compressed prefix. 
Subsequent tail entries are + rechained to point to their predecessor in the tail — their original + ``parentUuid`` values may reference entries that were compressed away. + """ + if not tail_lines: + return "" + # Find the last uuid in the compressed prefix + last_prefix_uuid = "" + for line in reversed(compressed_prefix.strip().split("\n")): + if not line.strip(): + continue + entry = json.loads(line, fallback=None) + if isinstance(entry, dict) and "uuid" in entry: + last_prefix_uuid = entry["uuid"] + break + + result_lines: list[str] = [] + prev_uuid: str | None = None + for i, line in enumerate(tail_lines): + entry = json.loads(line, fallback=None) + if not isinstance(entry, dict): + # Safety guard: _find_last_assistant_entry already filters empty + # lines, and well-formed JSONL always parses to dicts. Non-dict + # lines are passed through unchanged; prev_uuid is intentionally + # NOT updated so the next dict entry chains to the last known uuid. + result_lines.append(line) + continue + if i == 0: + entry["parentUuid"] = last_prefix_uuid + elif prev_uuid is not None: + entry["parentUuid"] = prev_uuid + prev_uuid = entry.get("uuid") + result_lines.append(json.dumps(entry, separators=(",", ":"))) + return "\n".join(result_lines) + "\n" diff --git a/autogpt_platform/backend/backend/copilot/transcript_builder.py b/autogpt_platform/backend/backend/copilot/transcript_builder.py new file mode 100644 index 0000000000..b5f086f802 --- /dev/null +++ b/autogpt_platform/backend/backend/copilot/transcript_builder.py @@ -0,0 +1,240 @@ +"""Build complete JSONL transcript from SDK messages. + +The transcript represents the FULL active context at any point in time. +Each upload REPLACES the previous transcript atomically. + +Flow: + Turn 1: Upload [msg1, msg2] + Turn 2: Download [msg1, msg2] → Upload [msg1, msg2, msg3, msg4] (REPLACE) + Turn 3: Download [msg1, msg2, msg3, msg4] → Upload [all messages] (REPLACE) + +The transcript is never incremental - always the complete atomic state. +""" + +import logging +from typing import Any +from uuid import uuid4 + +from pydantic import BaseModel + +from backend.util import json + +from .transcript import STRIPPABLE_TYPES + +logger = logging.getLogger(__name__) + + +class TranscriptEntry(BaseModel): + """Single transcript entry (user or assistant turn).""" + + type: str + uuid: str + parentUuid: str = "" + isCompactSummary: bool | None = None + message: dict[str, Any] + + +class TranscriptBuilder: + """Build complete JSONL transcript from SDK messages. + + This builder maintains the FULL conversation state, not incremental changes. + The output is always the complete active context. + """ + + def __init__(self) -> None: + self._entries: list[TranscriptEntry] = [] + self._last_uuid: str | None = None + + def _last_is_assistant(self) -> bool: + return bool(self._entries) and self._entries[-1].type == "assistant" + + def _last_message_id(self) -> str: + """Return the message.id of the last entry, or '' if none.""" + if self._entries: + return self._entries[-1].message.get("id", "") + return "" + + @staticmethod + def _parse_entry(data: dict) -> TranscriptEntry | None: + """Parse a single transcript entry, filtering strippable types. + + Returns ``None`` for entries that should be skipped (strippable types + that are not compaction summaries). 
+ """ + entry_type = data.get("type", "") + if entry_type in STRIPPABLE_TYPES and not data.get("isCompactSummary"): + return None + return TranscriptEntry( + type=entry_type, + uuid=data.get("uuid") or str(uuid4()), + parentUuid=data.get("parentUuid") or "", + isCompactSummary=data.get("isCompactSummary"), + message=data.get("message", {}), + ) + + def load_previous(self, content: str, log_prefix: str = "[Transcript]") -> None: + """Load complete previous transcript. + + This loads the FULL previous context. As new messages come in, + we append to this state. The final output is the complete context + (previous + new), not just the delta. + """ + if not content or not content.strip(): + return + + lines = content.strip().split("\n") + for line_num, line in enumerate(lines, 1): + if not line.strip(): + continue + + data = json.loads(line, fallback=None) + if data is None: + logger.warning( + "%s Failed to parse transcript line %d/%d", + log_prefix, + line_num, + len(lines), + ) + continue + + entry = self._parse_entry(data) + if entry is None: + continue + self._entries.append(entry) + self._last_uuid = entry.uuid + + logger.info( + "%s Loaded %d entries from previous transcript (last_uuid=%s)", + log_prefix, + len(self._entries), + self._last_uuid[:12] if self._last_uuid else None, + ) + + def append_user(self, content: str | list[dict], uuid: str | None = None) -> None: + """Append a user entry.""" + msg_uuid = uuid or str(uuid4()) + + self._entries.append( + TranscriptEntry( + type="user", + uuid=msg_uuid, + parentUuid=self._last_uuid or "", + message={"role": "user", "content": content}, + ) + ) + self._last_uuid = msg_uuid + + def append_tool_result(self, tool_use_id: str, content: str) -> None: + """Append a tool result as a user entry (one per tool call).""" + self.append_user( + content=[ + {"type": "tool_result", "tool_use_id": tool_use_id, "content": content} + ] + ) + + def append_assistant( + self, + content_blocks: list[dict], + model: str = "", + stop_reason: str | None = None, + ) -> None: + """Append an assistant entry. + + Consecutive assistant entries automatically share the same message ID + so the CLI can merge them (thinking → text → tool_use) into a single + API message on ``--resume``. A new ID is assigned whenever an + assistant entry follows a non-assistant entry (user message or tool + result), because that marks the start of a new API response. + """ + message_id = ( + self._last_message_id() + if self._last_is_assistant() + else f"msg_sdk_{uuid4().hex[:24]}" + ) + + msg_uuid = str(uuid4()) + + self._entries.append( + TranscriptEntry( + type="assistant", + uuid=msg_uuid, + parentUuid=self._last_uuid or "", + message={ + "role": "assistant", + "model": model, + "id": message_id, + "type": "message", + "content": content_blocks, + "stop_reason": stop_reason, + "stop_sequence": None, + }, + ) + ) + self._last_uuid = msg_uuid + + def replace_entries( + self, compacted_entries: list[dict], log_prefix: str = "[Transcript]" + ) -> None: + """Replace all entries with compacted entries from the CLI session file. + + Called after mid-stream compaction so TranscriptBuilder mirrors the + CLI's active context (compaction summary + post-compaction entries). + + Builds the new list first and validates it's non-empty before swapping, + so corrupt input cannot wipe the conversation history. 
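+
+        Illustrative pairing (``read_compacted_entries`` lives in
+        ``transcript.py``)::
+
+            entries = read_compacted_entries(transcript_path)
+            if entries:
+                builder.replace_entries(entries)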
+ """ + new_entries: list[TranscriptEntry] = [] + for data in compacted_entries: + entry = self._parse_entry(data) + if entry is not None: + new_entries.append(entry) + + if not new_entries: + logger.warning( + "%s replace_entries produced 0 entries from %d inputs, keeping old (%d entries)", + log_prefix, + len(compacted_entries), + len(self._entries), + ) + return + + old_count = len(self._entries) + self._entries = new_entries + self._last_uuid = new_entries[-1].uuid + + logger.info( + "%s TranscriptBuilder compacted: %d entries -> %d entries", + log_prefix, + old_count, + len(self._entries), + ) + + def to_jsonl(self) -> str: + """Export complete context as JSONL. + + Consecutive assistant entries are kept separate to match the + native CLI format — the SDK merges them internally on resume. + + Returns the FULL conversation state (all entries), not incremental. + This output REPLACES any previous transcript. + """ + if not self._entries: + return "" + + lines = [entry.model_dump_json(exclude_none=True) for entry in self._entries] + return "\n".join(lines) + "\n" + + @property + def entry_count(self) -> int: + """Total number of entries in the complete context.""" + return len(self._entries) + + @property + def is_empty(self) -> bool: + """Whether this builder has any entries.""" + return len(self._entries) == 0 + + @property + def last_entry_type(self) -> str | None: + """Type of the last entry, or None if empty.""" + return self._entries[-1].type if self._entries else None diff --git a/autogpt_platform/backend/backend/copilot/transcript_builder_test.py b/autogpt_platform/backend/backend/copilot/transcript_builder_test.py new file mode 100644 index 0000000000..c53bbc29a0 --- /dev/null +++ b/autogpt_platform/backend/backend/copilot/transcript_builder_test.py @@ -0,0 +1,260 @@ +"""Tests for canonical TranscriptBuilder (backend.copilot.transcript_builder). + +These tests directly import from the canonical module to ensure codecov +patch coverage for the new file. 
+""" + +from backend.copilot.transcript_builder import TranscriptBuilder, TranscriptEntry +from backend.util import json + + +def _make_jsonl(*entries: dict) -> str: + return "\n".join(json.dumps(e) for e in entries) + "\n" + + +USER_MSG = { + "type": "user", + "uuid": "u1", + "message": {"role": "user", "content": "hello"}, +} +ASST_MSG = { + "type": "assistant", + "uuid": "a1", + "parentUuid": "u1", + "message": { + "role": "assistant", + "id": "msg_1", + "type": "message", + "content": [{"type": "text", "text": "hi"}], + "stop_reason": "end_turn", + "stop_sequence": None, + }, +} + + +class TestTranscriptEntry: + def test_basic_construction(self): + entry = TranscriptEntry( + type="user", uuid="u1", message={"role": "user", "content": "hi"} + ) + assert entry.type == "user" + assert entry.uuid == "u1" + assert entry.parentUuid == "" + assert entry.isCompactSummary is None + + def test_optional_fields(self): + entry = TranscriptEntry( + type="summary", + uuid="s1", + parentUuid="p1", + isCompactSummary=True, + message={"role": "user", "content": "summary"}, + ) + assert entry.isCompactSummary is True + assert entry.parentUuid == "p1" + + +class TestTranscriptBuilderInit: + def test_starts_empty(self): + builder = TranscriptBuilder() + assert builder.is_empty + assert builder.entry_count == 0 + assert builder.last_entry_type is None + assert builder.to_jsonl() == "" + + +class TestAppendUser: + def test_appends_user_entry(self): + builder = TranscriptBuilder() + builder.append_user("hello") + assert builder.entry_count == 1 + assert builder.last_entry_type == "user" + + def test_chains_parent_uuid(self): + builder = TranscriptBuilder() + builder.append_user("first", uuid="u1") + builder.append_user("second", uuid="u2") + output = builder.to_jsonl() + entries = [json.loads(line) for line in output.strip().split("\n")] + assert entries[0]["parentUuid"] == "" + assert entries[1]["parentUuid"] == "u1" + + def test_custom_uuid(self): + builder = TranscriptBuilder() + builder.append_user("hello", uuid="custom-id") + output = builder.to_jsonl() + entry = json.loads(output.strip()) + assert entry["uuid"] == "custom-id" + + +class TestAppendToolResult: + def test_appends_as_user_entry(self): + builder = TranscriptBuilder() + builder.append_tool_result(tool_use_id="tc_1", content="result text") + assert builder.entry_count == 1 + assert builder.last_entry_type == "user" + output = builder.to_jsonl() + entry = json.loads(output.strip()) + content = entry["message"]["content"] + assert len(content) == 1 + assert content[0]["type"] == "tool_result" + assert content[0]["tool_use_id"] == "tc_1" + assert content[0]["content"] == "result text" + + +class TestAppendAssistant: + def test_appends_assistant_entry(self): + builder = TranscriptBuilder() + builder.append_user("hi") + builder.append_assistant( + content_blocks=[{"type": "text", "text": "hello"}], + model="test-model", + stop_reason="end_turn", + ) + assert builder.entry_count == 2 + assert builder.last_entry_type == "assistant" + + def test_consecutive_assistants_share_message_id(self): + builder = TranscriptBuilder() + builder.append_user("hi") + builder.append_assistant( + content_blocks=[{"type": "text", "text": "part 1"}], + model="m", + ) + builder.append_assistant( + content_blocks=[{"type": "text", "text": "part 2"}], + model="m", + ) + output = builder.to_jsonl() + entries = [json.loads(line) for line in output.strip().split("\n")] + # The two assistant entries share the same message ID + assert entries[1]["message"]["id"] == 
entries[2]["message"]["id"] + + def test_non_consecutive_assistants_get_different_ids(self): + builder = TranscriptBuilder() + builder.append_user("q1") + builder.append_assistant( + content_blocks=[{"type": "text", "text": "a1"}], + model="m", + ) + builder.append_user("q2") + builder.append_assistant( + content_blocks=[{"type": "text", "text": "a2"}], + model="m", + ) + output = builder.to_jsonl() + entries = [json.loads(line) for line in output.strip().split("\n")] + assert entries[1]["message"]["id"] != entries[3]["message"]["id"] + + +class TestLoadPrevious: + def test_loads_valid_entries(self): + content = _make_jsonl(USER_MSG, ASST_MSG) + builder = TranscriptBuilder() + builder.load_previous(content) + assert builder.entry_count == 2 + + def test_skips_empty_content(self): + builder = TranscriptBuilder() + builder.load_previous("") + assert builder.is_empty + builder.load_previous(" ") + assert builder.is_empty + + def test_skips_strippable_types(self): + progress = {"type": "progress", "uuid": "p1", "message": {}} + content = _make_jsonl(USER_MSG, progress, ASST_MSG) + builder = TranscriptBuilder() + builder.load_previous(content) + assert builder.entry_count == 2 # progress was skipped + + def test_preserves_compact_summary(self): + compact = { + "type": "summary", + "uuid": "cs1", + "isCompactSummary": True, + "message": {"role": "user", "content": "summary"}, + } + content = _make_jsonl(compact, ASST_MSG) + builder = TranscriptBuilder() + builder.load_previous(content) + assert builder.entry_count == 2 + + def test_skips_invalid_json_lines(self): + content = '{"type":"user","uuid":"u1","message":{}}\nnot-valid-json\n' + builder = TranscriptBuilder() + builder.load_previous(content) + assert builder.entry_count == 1 + + +class TestToJsonl: + def test_roundtrip(self): + builder = TranscriptBuilder() + builder.append_user("hello", uuid="u1") + builder.append_assistant( + content_blocks=[{"type": "text", "text": "world"}], + model="m", + ) + output = builder.to_jsonl() + assert output.endswith("\n") + lines = output.strip().split("\n") + assert len(lines) == 2 + for line in lines: + parsed = json.loads(line) + assert "type" in parsed + assert "uuid" in parsed + assert "message" in parsed + + +class TestReplaceEntries: + def test_replaces_all_entries(self): + builder = TranscriptBuilder() + builder.append_user("old") + builder.append_assistant( + content_blocks=[{"type": "text", "text": "old answer"}], model="m" + ) + assert builder.entry_count == 2 + + compacted = [ + { + "type": "summary", + "uuid": "cs1", + "isCompactSummary": True, + "message": {"role": "user", "content": "compacted"}, + } + ] + builder.replace_entries(compacted) + assert builder.entry_count == 1 + + def test_empty_replacement_keeps_existing(self): + builder = TranscriptBuilder() + builder.append_user("keep me") + builder.replace_entries([]) + assert builder.entry_count == 1 + + +class TestParseEntry: + def test_filters_strippable_non_compact(self): + result = TranscriptBuilder._parse_entry( + {"type": "progress", "uuid": "p1", "message": {}} + ) + assert result is None + + def test_keeps_compact_summary(self): + result = TranscriptBuilder._parse_entry( + { + "type": "summary", + "uuid": "cs1", + "isCompactSummary": True, + "message": {}, + } + ) + assert result is not None + assert result.isCompactSummary is True + + def test_generates_uuid_if_missing(self): + result = TranscriptBuilder._parse_entry( + {"type": "user", "message": {"role": "user", "content": "hi"}} + ) + assert result is not None + assert 
result.uuid # Should be a generated UUID diff --git a/autogpt_platform/backend/backend/copilot/transcript_test.py b/autogpt_platform/backend/backend/copilot/transcript_test.py new file mode 100644 index 0000000000..dd99fd5a85 --- /dev/null +++ b/autogpt_platform/backend/backend/copilot/transcript_test.py @@ -0,0 +1,726 @@ +"""Tests for canonical transcript module (backend.copilot.transcript). + +Covers pure helper functions that are not exercised by the SDK re-export tests. +""" + +from __future__ import annotations + +from unittest.mock import MagicMock + +from backend.util import json + +from .transcript import ( + TranscriptDownload, + _build_path_from_parts, + _find_last_assistant_entry, + _flatten_assistant_content, + _flatten_tool_result_content, + _messages_to_transcript, + _meta_storage_path_parts, + _rechain_tail, + _sanitize_id, + _storage_path_parts, + _transcript_to_messages, + strip_for_upload, + validate_transcript, +) + + +def _make_jsonl(*entries: dict) -> str: + return "\n".join(json.dumps(e) for e in entries) + "\n" + + +# --------------------------------------------------------------------------- +# _sanitize_id +# --------------------------------------------------------------------------- + + +class TestSanitizeId: + def test_uuid_passes_through(self): + assert _sanitize_id("abcdef12-3456-7890-abcd-ef1234567890") == ( + "abcdef12-3456-7890-abcd-ef1234567890" + ) + + def test_strips_non_hex_characters(self): + # Only hex chars (0-9, a-f, A-F) and hyphens are kept + result = _sanitize_id("abc/../../etc/passwd") + assert "/" not in result + assert "." not in result + # 'p', 's', 'w' are not hex chars, so they are stripped + assert all(c in "0123456789abcdefABCDEF-" for c in result) + + def test_truncates_to_max_len(self): + long_id = "a" * 100 + result = _sanitize_id(long_id, max_len=10) + assert len(result) == 10 + + def test_empty_returns_unknown(self): + assert _sanitize_id("") == "unknown" + + def test_none_returns_unknown(self): + assert _sanitize_id(None) == "unknown" # type: ignore[arg-type] + + def test_special_chars_only_returns_unknown(self): + assert _sanitize_id("!@#$%^&*()") == "unknown" + + +# --------------------------------------------------------------------------- +# _storage_path_parts / _meta_storage_path_parts +# --------------------------------------------------------------------------- + + +class TestStoragePathParts: + def test_returns_triple(self): + prefix, uid, fname = _storage_path_parts("user-1", "sess-2") + assert prefix == "chat-transcripts" + assert "e" in uid # hex chars from "user-1" sanitized + assert fname.endswith(".jsonl") + + def test_meta_returns_meta_json(self): + prefix, uid, fname = _meta_storage_path_parts("user-1", "sess-2") + assert prefix == "chat-transcripts" + assert fname.endswith(".meta.json") + + +# --------------------------------------------------------------------------- +# _build_path_from_parts +# --------------------------------------------------------------------------- + + +class TestBuildPathFromParts: + def test_gcs_backend(self): + from backend.util.workspace_storage import GCSWorkspaceStorage + + mock_gcs = MagicMock(spec=GCSWorkspaceStorage) + mock_gcs.bucket_name = "my-bucket" + path = _build_path_from_parts(("wid", "fid", "file.jsonl"), mock_gcs) + assert path == "gcs://my-bucket/workspaces/wid/fid/file.jsonl" + + def test_local_backend(self): + # Use a plain object (not MagicMock) so isinstance(GCSWorkspaceStorage) is False + local_backend = type("LocalBackend", (), {})() + path = 
_build_path_from_parts(("wid", "fid", "file.jsonl"), local_backend) + assert path == "local://wid/fid/file.jsonl" + + +# --------------------------------------------------------------------------- +# TranscriptDownload dataclass +# --------------------------------------------------------------------------- + + +class TestTranscriptDownload: + def test_defaults(self): + td = TranscriptDownload(content="hello") + assert td.content == "hello" + assert td.message_count == 0 + assert td.uploaded_at == 0.0 + + def test_custom_values(self): + td = TranscriptDownload(content="data", message_count=5, uploaded_at=123.45) + assert td.message_count == 5 + assert td.uploaded_at == 123.45 + + +# --------------------------------------------------------------------------- +# _flatten_assistant_content +# --------------------------------------------------------------------------- + + +class TestFlattenAssistantContent: + def test_text_blocks(self): + blocks = [ + {"type": "text", "text": "Hello"}, + {"type": "text", "text": "World"}, + ] + assert _flatten_assistant_content(blocks) == "Hello\nWorld" + + def test_thinking_blocks_stripped(self): + blocks = [ + {"type": "thinking", "thinking": "hmm..."}, + {"type": "text", "text": "answer"}, + {"type": "redacted_thinking", "data": "secret"}, + ] + assert _flatten_assistant_content(blocks) == "answer" + + def test_tool_use_blocks_stripped(self): + blocks = [ + {"type": "text", "text": "I'll run a tool"}, + {"type": "tool_use", "name": "bash", "id": "tc1", "input": {}}, + ] + assert _flatten_assistant_content(blocks) == "I'll run a tool" + + def test_string_blocks(self): + blocks = ["hello", "world"] + assert _flatten_assistant_content(blocks) == "hello\nworld" + + def test_empty_blocks(self): + assert _flatten_assistant_content([]) == "" + + def test_unknown_dict_blocks_skipped(self): + blocks = [{"type": "image", "data": "base64..."}] + assert _flatten_assistant_content(blocks) == "" + + +# --------------------------------------------------------------------------- +# _flatten_tool_result_content +# --------------------------------------------------------------------------- + + +class TestFlattenToolResultContent: + def test_tool_result_with_text_content(self): + blocks = [ + { + "type": "tool_result", + "tool_use_id": "tc1", + "content": [{"type": "text", "text": "output data"}], + } + ] + assert _flatten_tool_result_content(blocks) == "output data" + + def test_tool_result_with_string_content(self): + blocks = [ + {"type": "tool_result", "tool_use_id": "tc1", "content": "simple string"} + ] + assert _flatten_tool_result_content(blocks) == "simple string" + + def test_tool_result_with_image_placeholder(self): + blocks = [ + { + "type": "tool_result", + "tool_use_id": "tc1", + "content": [{"type": "image", "data": "base64..."}], + } + ] + assert _flatten_tool_result_content(blocks) == "[__image__]" + + def test_tool_result_with_document_placeholder(self): + blocks = [ + { + "type": "tool_result", + "tool_use_id": "tc1", + "content": [{"type": "document", "data": "base64..."}], + } + ] + assert _flatten_tool_result_content(blocks) == "[__document__]" + + def test_tool_result_with_none_content(self): + blocks = [{"type": "tool_result", "tool_use_id": "tc1", "content": None}] + assert _flatten_tool_result_content(blocks) == "" + + def test_text_block_outside_tool_result(self): + blocks = [{"type": "text", "text": "standalone"}] + assert _flatten_tool_result_content(blocks) == "standalone" + + def test_unknown_dict_block_placeholder(self): + blocks = [{"type": 
"custom_widget", "data": "x"}] + assert _flatten_tool_result_content(blocks) == "[__custom_widget__]" + + def test_string_blocks(self): + blocks = ["raw text"] + assert _flatten_tool_result_content(blocks) == "raw text" + + def test_empty_blocks(self): + assert _flatten_tool_result_content([]) == "" + + def test_mixed_content_in_tool_result(self): + blocks = [ + { + "type": "tool_result", + "tool_use_id": "tc1", + "content": [ + {"type": "text", "text": "line1"}, + {"type": "image", "data": "..."}, + "raw string", + ], + } + ] + result = _flatten_tool_result_content(blocks) + assert "line1" in result + assert "[__image__]" in result + assert "raw string" in result + + def test_tool_result_with_dict_without_text_key(self): + blocks = [ + { + "type": "tool_result", + "tool_use_id": "tc1", + "content": [{"count": 42}], + } + ] + result = _flatten_tool_result_content(blocks) + assert "42" in result + + def test_tool_result_content_list_with_list_content(self): + blocks = [ + { + "type": "tool_result", + "tool_use_id": "tc1", + "content": [{"type": "text", "text": None}], + } + ] + result = _flatten_tool_result_content(blocks) + assert result == "None" + + +# --------------------------------------------------------------------------- +# _transcript_to_messages +# --------------------------------------------------------------------------- + +USER_ENTRY = { + "type": "user", + "uuid": "u1", + "parentUuid": "", + "message": {"role": "user", "content": "hello"}, +} +ASST_ENTRY = { + "type": "assistant", + "uuid": "a1", + "parentUuid": "u1", + "message": { + "role": "assistant", + "id": "msg_1", + "content": [{"type": "text", "text": "hi there"}], + }, +} +PROGRESS_ENTRY = { + "type": "progress", + "uuid": "p1", + "parentUuid": "u1", + "data": {}, +} + + +class TestTranscriptToMessages: + def test_basic_conversion(self): + content = _make_jsonl(USER_ENTRY, ASST_ENTRY) + messages = _transcript_to_messages(content) + assert len(messages) == 2 + assert messages[0] == {"role": "user", "content": "hello"} + assert messages[1]["role"] == "assistant" + assert messages[1]["content"] == "hi there" + + def test_skips_strippable_types(self): + content = _make_jsonl(USER_ENTRY, PROGRESS_ENTRY, ASST_ENTRY) + messages = _transcript_to_messages(content) + assert len(messages) == 2 + + def test_skips_entries_without_role(self): + no_role = {"type": "user", "uuid": "x", "message": {"content": "no role"}} + content = _make_jsonl(no_role) + messages = _transcript_to_messages(content) + assert len(messages) == 0 + + def test_handles_string_content(self): + entry = { + "type": "user", + "uuid": "u1", + "message": {"role": "user", "content": "plain string"}, + } + content = _make_jsonl(entry) + messages = _transcript_to_messages(content) + assert messages[0]["content"] == "plain string" + + def test_handles_tool_result_content(self): + entry = { + "type": "user", + "uuid": "u1", + "message": { + "role": "user", + "content": [ + {"type": "tool_result", "tool_use_id": "tc1", "content": "output"} + ], + }, + } + content = _make_jsonl(entry) + messages = _transcript_to_messages(content) + assert messages[0]["content"] == "output" + + def test_handles_none_content(self): + entry = { + "type": "assistant", + "uuid": "a1", + "message": {"role": "assistant", "content": None}, + } + content = _make_jsonl(entry) + messages = _transcript_to_messages(content) + assert messages[0]["content"] == "" + + def test_skips_invalid_json(self): + content = "not valid json\n" + messages = _transcript_to_messages(content) + assert len(messages) 
== 0 + + def test_preserves_compact_summary(self): + compact = { + "type": "summary", + "uuid": "cs1", + "isCompactSummary": True, + "message": {"role": "user", "content": "summary of conversation"}, + } + content = _make_jsonl(compact) + messages = _transcript_to_messages(content) + assert len(messages) == 1 + + def test_strips_summary_without_compact_flag(self): + summary = { + "type": "summary", + "uuid": "s1", + "message": {"role": "user", "content": "summary"}, + } + content = _make_jsonl(summary) + messages = _transcript_to_messages(content) + assert len(messages) == 0 + + +# --------------------------------------------------------------------------- +# _messages_to_transcript +# --------------------------------------------------------------------------- + + +class TestMessagesToTranscript: + def test_basic_roundtrip(self): + messages = [ + {"role": "user", "content": "hello"}, + {"role": "assistant", "content": "world"}, + ] + result = _messages_to_transcript(messages) + assert result.endswith("\n") + lines = result.strip().split("\n") + assert len(lines) == 2 + + user_entry = json.loads(lines[0]) + assert user_entry["type"] == "user" + assert user_entry["message"]["role"] == "user" + assert user_entry["message"]["content"] == "hello" + assert user_entry["parentUuid"] == "" + + asst_entry = json.loads(lines[1]) + assert asst_entry["type"] == "assistant" + assert asst_entry["message"]["role"] == "assistant" + assert asst_entry["message"]["content"] == [{"type": "text", "text": "world"}] + assert asst_entry["parentUuid"] == user_entry["uuid"] + + def test_empty_messages(self): + assert _messages_to_transcript([]) == "" + + def test_assistant_has_message_envelope(self): + messages = [{"role": "assistant", "content": "test"}] + result = _messages_to_transcript(messages) + entry = json.loads(result.strip()) + msg = entry["message"] + assert "id" in msg + assert msg["id"].startswith("msg_compact_") + assert msg["type"] == "message" + assert msg["stop_reason"] == "end_turn" + assert msg["stop_sequence"] is None + + def test_uuid_chain(self): + messages = [ + {"role": "user", "content": "a"}, + {"role": "assistant", "content": "b"}, + {"role": "user", "content": "c"}, + ] + result = _messages_to_transcript(messages) + lines = result.strip().split("\n") + entries = [json.loads(line) for line in lines] + assert entries[0]["parentUuid"] == "" + assert entries[1]["parentUuid"] == entries[0]["uuid"] + assert entries[2]["parentUuid"] == entries[1]["uuid"] + + def test_assistant_with_empty_content(self): + messages = [{"role": "assistant", "content": ""}] + result = _messages_to_transcript(messages) + entry = json.loads(result.strip()) + assert entry["message"]["content"] == [] + + +# --------------------------------------------------------------------------- +# _find_last_assistant_entry +# --------------------------------------------------------------------------- + + +class TestFindLastAssistantEntry: + def test_splits_at_last_assistant(self): + user = { + "type": "user", + "uuid": "u1", + "message": {"role": "user", "content": "hi"}, + } + asst = { + "type": "assistant", + "uuid": "a1", + "message": {"role": "assistant", "id": "msg1", "content": "answer"}, + } + content = _make_jsonl(user, asst) + prefix, tail = _find_last_assistant_entry(content) + assert len(prefix) == 1 + assert len(tail) == 1 + + def test_no_assistant_returns_all_in_prefix(self): + user1 = { + "type": "user", + "uuid": "u1", + "message": {"role": "user", "content": "hi"}, + } + user2 = { + "type": "user", + "uuid": "u2", + 
"message": {"role": "user", "content": "hey"}, + } + content = _make_jsonl(user1, user2) + prefix, tail = _find_last_assistant_entry(content) + assert len(prefix) == 2 + assert len(tail) == 0 + + def test_multi_entry_turn_preserved(self): + user = { + "type": "user", + "uuid": "u1", + "message": {"role": "user", "content": "q"}, + } + asst1 = { + "type": "assistant", + "uuid": "a1", + "message": { + "role": "assistant", + "id": "msg_turn", + "content": [{"type": "thinking", "thinking": "hmm"}], + }, + } + asst2 = { + "type": "assistant", + "uuid": "a2", + "message": { + "role": "assistant", + "id": "msg_turn", + "content": [{"type": "text", "text": "answer"}], + }, + } + content = _make_jsonl(user, asst1, asst2) + prefix, tail = _find_last_assistant_entry(content) + assert len(prefix) == 1 # just the user + assert len(tail) == 2 # both assistant entries + + def test_assistant_without_id(self): + user = { + "type": "user", + "uuid": "u1", + "message": {"role": "user", "content": "q"}, + } + asst = { + "type": "assistant", + "uuid": "a1", + "message": {"role": "assistant", "content": "no id"}, + } + content = _make_jsonl(user, asst) + prefix, tail = _find_last_assistant_entry(content) + assert len(prefix) == 1 + assert len(tail) == 1 + + def test_trailing_user_after_assistant(self): + user1 = { + "type": "user", + "uuid": "u1", + "message": {"role": "user", "content": "q"}, + } + asst = { + "type": "assistant", + "uuid": "a1", + "message": {"role": "assistant", "id": "msg1", "content": "a"}, + } + user2 = { + "type": "user", + "uuid": "u2", + "message": {"role": "user", "content": "follow"}, + } + content = _make_jsonl(user1, asst, user2) + prefix, tail = _find_last_assistant_entry(content) + assert len(prefix) == 1 # user1 + assert len(tail) == 2 # asst + user2 + + +# --------------------------------------------------------------------------- +# _rechain_tail +# --------------------------------------------------------------------------- + + +class TestRechainTail: + def test_empty_tail(self): + assert _rechain_tail("some prefix\n", []) == "" + + def test_patches_first_entry_parent(self): + prefix_entry = {"uuid": "last-prefix-uuid", "type": "user", "message": {}} + prefix = json.dumps(prefix_entry) + "\n" + + tail_entry = { + "uuid": "t1", + "parentUuid": "old-parent", + "type": "assistant", + "message": {}, + } + tail_lines = [json.dumps(tail_entry)] + + result = _rechain_tail(prefix, tail_lines) + parsed = json.loads(result.strip()) + assert parsed["parentUuid"] == "last-prefix-uuid" + + def test_chains_consecutive_tail_entries(self): + prefix_entry = {"uuid": "p1", "type": "user", "message": {}} + prefix = json.dumps(prefix_entry) + "\n" + + t1 = {"uuid": "t1", "parentUuid": "old1", "type": "assistant", "message": {}} + t2 = {"uuid": "t2", "parentUuid": "old2", "type": "user", "message": {}} + tail_lines = [json.dumps(t1), json.dumps(t2)] + + result = _rechain_tail(prefix, tail_lines) + entries = [json.loads(line) for line in result.strip().split("\n")] + assert entries[0]["parentUuid"] == "p1" + assert entries[1]["parentUuid"] == "t1" + + def test_non_dict_lines_passed_through(self): + prefix_entry = {"uuid": "p1", "type": "user", "message": {}} + prefix = json.dumps(prefix_entry) + "\n" + + tail_lines = ["not-a-json-dict"] + result = _rechain_tail(prefix, tail_lines) + assert "not-a-json-dict" in result + + +# --------------------------------------------------------------------------- +# strip_for_upload (combined single-parse) +# 
--------------------------------------------------------------------------- + + +class TestStripForUpload: + def test_strips_progress_and_thinking(self): + user = { + "type": "user", + "uuid": "u1", + "parentUuid": "", + "message": {"role": "user", "content": "hi"}, + } + progress = {"type": "progress", "uuid": "p1", "parentUuid": "u1", "data": {}} + asst_old = { + "type": "assistant", + "uuid": "a1", + "parentUuid": "p1", + "message": { + "role": "assistant", + "id": "msg_old", + "content": [ + {"type": "thinking", "thinking": "stale thinking"}, + {"type": "text", "text": "old answer"}, + ], + }, + } + user2 = { + "type": "user", + "uuid": "u2", + "parentUuid": "a1", + "message": {"role": "user", "content": "next"}, + } + asst_new = { + "type": "assistant", + "uuid": "a2", + "parentUuid": "u2", + "message": { + "role": "assistant", + "id": "msg_new", + "content": [ + {"type": "thinking", "thinking": "fresh thinking"}, + {"type": "text", "text": "new answer"}, + ], + }, + } + content = _make_jsonl(user, progress, asst_old, user2, asst_new) + result = strip_for_upload(content) + + lines = result.strip().split("\n") + # Progress should be stripped -> 4 entries remain + assert len(lines) == 4 + + # First entry (user) should be reparented since its child (progress) was stripped + entries = [json.loads(line) for line in lines] + types = [e.get("type") for e in entries] + assert "progress" not in types + + # Old assistant thinking stripped, new assistant thinking preserved + old_asst = next( + e for e in entries if e.get("message", {}).get("id") == "msg_old" + ) + old_content = old_asst["message"]["content"] + old_types = [b["type"] for b in old_content if isinstance(b, dict)] + assert "thinking" not in old_types + assert "text" in old_types + + new_asst = next( + e for e in entries if e.get("message", {}).get("id") == "msg_new" + ) + new_content = new_asst["message"]["content"] + new_types = [b["type"] for b in new_content if isinstance(b, dict)] + assert "thinking" in new_types # last assistant preserved + + def test_empty_content(self): + result = strip_for_upload("") + # Empty string produces a single empty line after split, resulting in "\n" + assert result.strip() == "" + + def test_preserves_compact_summary(self): + compact = { + "type": "summary", + "uuid": "cs1", + "isCompactSummary": True, + "message": {"role": "user", "content": "summary"}, + } + asst = { + "type": "assistant", + "uuid": "a1", + "parentUuid": "cs1", + "message": {"role": "assistant", "id": "msg1", "content": "answer"}, + } + content = _make_jsonl(compact, asst) + result = strip_for_upload(content) + lines = result.strip().split("\n") + assert len(lines) == 2 + + def test_no_assistant_entries(self): + user = { + "type": "user", + "uuid": "u1", + "message": {"role": "user", "content": "hi"}, + } + content = _make_jsonl(user) + result = strip_for_upload(content) + lines = result.strip().split("\n") + assert len(lines) == 1 + + +# --------------------------------------------------------------------------- +# validate_transcript (additional edge cases) +# --------------------------------------------------------------------------- + + +class TestValidateTranscript: + def test_valid_with_assistant(self): + content = _make_jsonl( + USER_ENTRY, + ASST_ENTRY, + ) + assert validate_transcript(content) is True + + def test_none_returns_false(self): + assert validate_transcript(None) is False + + def test_whitespace_only_returns_false(self): + assert validate_transcript(" \n ") is False + + def 
test_no_assistant_returns_false(self):
+        content = _make_jsonl(USER_ENTRY)
+        assert validate_transcript(content) is False
+
+    def test_invalid_json_returns_false(self):
+        assert validate_transcript("not json\n") is False
+
+    def test_assistant_only_is_valid(self):
+        content = _make_jsonl(ASST_ENTRY)
+        assert validate_transcript(content) is True
diff --git a/autogpt_platform/backend/backend/data/block_cost_config.py b/autogpt_platform/backend/backend/data/block_cost_config.py
index f9e49efc95..1753d5e65e 100644
--- a/autogpt_platform/backend/backend/data/block_cost_config.py
+++ b/autogpt_platform/backend/backend/data/block_cost_config.py
@@ -147,6 +147,19 @@ MODEL_COST: dict[LlmModel, int] = {
     LlmModel.KIMI_K2: 1,
     LlmModel.QWEN3_235B_A22B_THINKING: 1,
     LlmModel.QWEN3_CODER: 9,
+    # Z.ai (Zhipu) models
+    LlmModel.ZAI_GLM_4_32B: 1,
+    LlmModel.ZAI_GLM_4_5: 2,
+    LlmModel.ZAI_GLM_4_5_AIR: 1,
+    LlmModel.ZAI_GLM_4_5_AIR_FREE: 1,
+    LlmModel.ZAI_GLM_4_5V: 2,
+    LlmModel.ZAI_GLM_4_6: 1,
+    LlmModel.ZAI_GLM_4_6V: 1,
+    LlmModel.ZAI_GLM_4_7: 1,
+    LlmModel.ZAI_GLM_4_7_FLASH: 1,
+    LlmModel.ZAI_GLM_5: 2,
+    LlmModel.ZAI_GLM_5_TURBO: 4,
+    LlmModel.ZAI_GLM_5V_TURBO: 4,
     # v0 by Vercel models
     LlmModel.V0_1_5_MD: 1,
     LlmModel.V0_1_5_LG: 2,
diff --git a/autogpt_platform/backend/backend/util/feature_flag.py b/autogpt_platform/backend/backend/util/feature_flag.py
index 2af9659011..47ad704fc3 100644
--- a/autogpt_platform/backend/backend/util/feature_flag.py
+++ b/autogpt_platform/backend/backend/util/feature_flag.py
@@ -1,5 +1,6 @@
 import contextlib
 import logging
+import os
 from enum import Enum
 from functools import wraps
 from typing import Any, Awaitable, Callable, TypeVar
@@ -38,6 +39,7 @@ class Flag(str, Enum):
     AGENT_ACTIVITY = "agent-activity"
     ENABLE_PLATFORM_PAYMENT = "enable-platform-payment"
     CHAT = "chat"
+    CHAT_MODE_OPTION = "chat-mode-option"
     COPILOT_SDK = "copilot-sdk"
     COPILOT_DAILY_TOKEN_LIMIT = "copilot-daily-token-limit"
     COPILOT_WEEKLY_TOKEN_LIMIT = "copilot-weekly-token-limit"
@@ -165,6 +167,30 @@ async def get_feature_flag_value(
     return default
 
 
+def _env_flag_override(flag_key: Flag) -> bool | None:
+    """Return a local override for ``flag_key`` from the environment.
+
+    Set ``FORCE_FLAG_<NAME>=true|false`` (``NAME`` = flag value with
+    ``-`` → ``_``, upper-cased) to bypass LaunchDarkly for a single
+    flag in local dev or tests. Returns ``None`` when no override
+    is configured so the caller falls through to LaunchDarkly.
+
+    The ``NEXT_PUBLIC_FORCE_FLAG_`` prefix is also accepted so a
+    single shared env var can toggle a flag across backend and
+    frontend (the frontend requires the ``NEXT_PUBLIC_`` prefix to
+    expose the value to the browser bundle).
+
+    Example: ``FORCE_FLAG_CHAT_MODE_OPTION=true`` forces
+    ``Flag.CHAT_MODE_OPTION`` on regardless of LaunchDarkly.
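+
+    ``NEXT_PUBLIC_FORCE_FLAG_CHAT=false`` likewise forces ``Flag.CHAT``
+    off while keeping the value visible to the frontend bundle. When
+    both prefixes are set for the same flag, the plain ``FORCE_FLAG_``
+    variable takes precedence.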
+ """ + suffix = flag_key.value.upper().replace("-", "_") + for prefix in ("FORCE_FLAG_", "NEXT_PUBLIC_FORCE_FLAG_"): + raw = os.environ.get(prefix + suffix) + if raw is not None: + return raw.strip().lower() in ("1", "true", "yes", "on") + return None + + async def is_feature_enabled( flag_key: Flag, user_id: str, @@ -181,6 +207,11 @@ async def is_feature_enabled( Returns: True if feature is enabled, False otherwise """ + override = _env_flag_override(flag_key) + if override is not None: + logger.debug(f"Feature flag {flag_key} overridden by env: {override}") + return override + result = await get_feature_flag_value(flag_key.value, user_id, default) # If the result is already a boolean, return it diff --git a/autogpt_platform/backend/backend/util/feature_flag_test.py b/autogpt_platform/backend/backend/util/feature_flag_test.py index 9bd99809ff..9a11256ef8 100644 --- a/autogpt_platform/backend/backend/util/feature_flag_test.py +++ b/autogpt_platform/backend/backend/util/feature_flag_test.py @@ -4,6 +4,7 @@ from ldclient import LDClient from backend.util.feature_flag import ( Flag, + _env_flag_override, feature_flag, is_feature_enabled, mock_flag_variation, @@ -111,3 +112,59 @@ async def test_is_feature_enabled_with_flag_enum(mocker): assert result is True # Should call with the flag's string value mock_get_feature_flag_value.assert_called_once() + + +class TestEnvFlagOverride: + def test_force_flag_true(self, monkeypatch: pytest.MonkeyPatch): + monkeypatch.setenv("FORCE_FLAG_CHAT", "true") + assert _env_flag_override(Flag.CHAT) is True + + def test_force_flag_false(self, monkeypatch: pytest.MonkeyPatch): + monkeypatch.setenv("FORCE_FLAG_CHAT", "false") + assert _env_flag_override(Flag.CHAT) is False + + def test_next_public_prefix_true(self, monkeypatch: pytest.MonkeyPatch): + monkeypatch.setenv("NEXT_PUBLIC_FORCE_FLAG_CHAT", "true") + assert _env_flag_override(Flag.CHAT) is True + + def test_unset_returns_none(self, monkeypatch: pytest.MonkeyPatch): + monkeypatch.delenv("FORCE_FLAG_CHAT", raising=False) + monkeypatch.delenv("NEXT_PUBLIC_FORCE_FLAG_CHAT", raising=False) + assert _env_flag_override(Flag.CHAT) is None + + def test_invalid_value_returns_false(self, monkeypatch: pytest.MonkeyPatch): + monkeypatch.setenv("FORCE_FLAG_CHAT", "notaboolean") + assert _env_flag_override(Flag.CHAT) is False + + def test_numeric_one_returns_true(self, monkeypatch: pytest.MonkeyPatch): + monkeypatch.setenv("FORCE_FLAG_CHAT", "1") + assert _env_flag_override(Flag.CHAT) is True + + def test_yes_returns_true(self, monkeypatch: pytest.MonkeyPatch): + monkeypatch.setenv("FORCE_FLAG_CHAT", "yes") + assert _env_flag_override(Flag.CHAT) is True + + def test_on_returns_true(self, monkeypatch: pytest.MonkeyPatch): + monkeypatch.setenv("FORCE_FLAG_CHAT", "on") + assert _env_flag_override(Flag.CHAT) is True + + def test_hyphenated_flag_converts_to_underscore( + self, monkeypatch: pytest.MonkeyPatch + ): + monkeypatch.setenv("FORCE_FLAG_CHAT_MODE_OPTION", "true") + assert _env_flag_override(Flag.CHAT_MODE_OPTION) is True + + def test_force_flag_takes_precedence_over_next_public( + self, monkeypatch: pytest.MonkeyPatch + ): + monkeypatch.setenv("FORCE_FLAG_CHAT", "false") + monkeypatch.setenv("NEXT_PUBLIC_FORCE_FLAG_CHAT", "true") + assert _env_flag_override(Flag.CHAT) is False + + def test_whitespace_is_stripped(self, monkeypatch: pytest.MonkeyPatch): + monkeypatch.setenv("FORCE_FLAG_CHAT", " true ") + assert _env_flag_override(Flag.CHAT) is True + + def test_case_insensitive_value(self, monkeypatch: 
pytest.MonkeyPatch): + monkeypatch.setenv("FORCE_FLAG_CHAT", "TRUE") + assert _env_flag_override(Flag.CHAT) is True diff --git a/autogpt_platform/backend/backend/util/workspace.py b/autogpt_platform/backend/backend/util/workspace.py index 34ab1e3582..5ec4a5b336 100644 --- a/autogpt_platform/backend/backend/util/workspace.py +++ b/autogpt_platform/backend/backend/util/workspace.py @@ -155,6 +155,7 @@ class WorkspaceManager: path: Optional[str] = None, mime_type: Optional[str] = None, overwrite: bool = False, + metadata: Optional[dict] = None, ) -> WorkspaceFile: """ Write file to workspace. @@ -168,6 +169,7 @@ class WorkspaceManager: path: Virtual path (defaults to "/{filename}", session-scoped if session_id set) mime_type: MIME type (auto-detected if not provided) overwrite: Whether to overwrite existing file at path + metadata: Optional metadata dict (e.g., origin tracking) Returns: Created WorkspaceFile instance @@ -246,6 +248,7 @@ class WorkspaceManager: mime_type=mime_type, size_bytes=len(content), checksum=checksum, + metadata=metadata, ) except UniqueViolationError: if retries > 0: diff --git a/autogpt_platform/backend/test/agent_generator/test_orchestrator.py b/autogpt_platform/backend/test/agent_generator/test_orchestrator.py index 557db8016b..0096b222ef 100644 --- a/autogpt_platform/backend/test/agent_generator/test_orchestrator.py +++ b/autogpt_platform/backend/test/agent_generator/test_orchestrator.py @@ -140,7 +140,9 @@ class TestFixOrchestratorBlocks: assert defaults["conversation_compaction"] is True assert defaults["retry"] == 3 assert defaults["multiple_tool_calls"] is False - assert len(fixer.fixes_applied) == 4 + assert defaults["execution_mode"] == "extended_thinking" + assert defaults["model"] == "claude-opus-4-6" + assert len(fixer.fixes_applied) == 6 def test_preserves_existing_values(self): """Existing user-set values are never overwritten.""" @@ -153,6 +155,8 @@ class TestFixOrchestratorBlocks: "conversation_compaction": False, "retry": 1, "multiple_tool_calls": True, + "execution_mode": "built_in", + "model": "gpt-4o", } ) ], @@ -166,6 +170,8 @@ class TestFixOrchestratorBlocks: assert defaults["conversation_compaction"] is False assert defaults["retry"] == 1 assert defaults["multiple_tool_calls"] is True + assert defaults["execution_mode"] == "built_in" + assert defaults["model"] == "gpt-4o" assert len(fixer.fixes_applied) == 0 def test_partial_defaults(self): @@ -189,7 +195,9 @@ class TestFixOrchestratorBlocks: assert defaults["conversation_compaction"] is True # filled assert defaults["retry"] == 3 # filled assert defaults["multiple_tool_calls"] is False # filled - assert len(fixer.fixes_applied) == 3 + assert defaults["execution_mode"] == "extended_thinking" # filled + assert defaults["model"] == "claude-opus-4-6" # filled + assert len(fixer.fixes_applied) == 5 def test_skips_non_sdm_nodes(self): """Non-Orchestrator nodes are untouched.""" @@ -258,11 +266,13 @@ class TestFixOrchestratorBlocks: result = fixer.fix_orchestrator_blocks(agent) defaults = result["nodes"][0]["input_default"] - assert defaults["agent_mode_max_iterations"] == 10 # None → default - assert defaults["conversation_compaction"] is True # None → default + assert defaults["agent_mode_max_iterations"] == 10 # None -> default + assert defaults["conversation_compaction"] is True # None -> default assert defaults["retry"] == 3 # kept assert defaults["multiple_tool_calls"] is False # kept - assert len(fixer.fixes_applied) == 2 + assert defaults["execution_mode"] == "extended_thinking" # filled + 
assert defaults["model"] == "claude-opus-4-6" # filled + assert len(fixer.fixes_applied) == 4 def test_multiple_sdm_nodes(self): """Multiple SDM nodes are all fixed independently.""" @@ -277,11 +287,11 @@ class TestFixOrchestratorBlocks: result = fixer.fix_orchestrator_blocks(agent) - # First node: 3 defaults filled (agent_mode was already set) + # First node: 5 defaults filled (agent_mode was already set) assert result["nodes"][0]["input_default"]["agent_mode_max_iterations"] == 3 - # Second node: all 4 defaults filled + # Second node: all 6 defaults filled assert result["nodes"][1]["input_default"]["agent_mode_max_iterations"] == 10 - assert len(fixer.fixes_applied) == 7 # 3 + 4 + assert len(fixer.fixes_applied) == 11 # 5 + 6 def test_registered_in_apply_all_fixes(self): """fix_orchestrator_blocks runs as part of apply_all_fixes.""" @@ -655,6 +665,7 @@ class TestOrchestratorE2EPipeline: "conversation_compaction": {"type": "boolean"}, "retry": {"type": "integer"}, "multiple_tool_calls": {"type": "boolean"}, + "execution_mode": {"type": "string"}, }, "required": ["prompt"], }, diff --git a/autogpt_platform/backend/test/copilot/__init__.py b/autogpt_platform/backend/test/copilot/__init__.py new file mode 100644 index 0000000000..e69de29bb2 diff --git a/autogpt_platform/backend/test/copilot/dry_run_loop_test.py b/autogpt_platform/backend/test/copilot/dry_run_loop_test.py new file mode 100644 index 0000000000..b55a050fd2 --- /dev/null +++ b/autogpt_platform/backend/test/copilot/dry_run_loop_test.py @@ -0,0 +1,394 @@ +"""Prompt regression tests AND functional tests for the dry-run verification loop. + +NOTE: This file lives in test/copilot/ rather than being colocated with a +single source module because it is a cross-cutting test spanning multiple +modules: prompting.py, service.py, agent_generation_guide.md, and run_agent.py. + +These tests verify that the create -> dry-run -> fix iterative workflow is +properly communicated through tool descriptions, the prompting supplement, +and the agent building guide. + +After deduplication, the full dry-run workflow lives in the +agent_generation_guide.md only. The system prompt and individual tool +descriptions no longer repeat it — they keep a minimal footprint. + +**Intentionally brittle**: the assertions check for specific substrings so +that accidental removal or rewording of key instructions is caught. If you +deliberately reword a prompt, update the corresponding assertion here. + +--- Functional tests (added separately) --- + +The dry-run loop is primarily a *prompt/guide* feature — the copilot reads +the guide and follows its instructions. There are no standalone Python +functions that implement "loop until passing" logic; the loop is driven by +the LLM. However, several pieces of real Python infrastructure make the +loop possible: + +1. The ``run_agent`` and ``run_block`` OpenAI tool schemas expose a + ``dry_run`` boolean parameter that the LLM must be able to set. +2. The ``RunAgentInput`` Pydantic model validates ``dry_run`` as a required + bool, so the executor can branch on it. +3. The ``_check_prerequisites`` method in ``RunAgentTool`` bypasses + credential and missing-input gates when ``dry_run=True``. +4. The guide documents the workflow steps in a specific order that the LLM + must follow: create/edit -> dry-run -> inspect -> fix -> repeat. + +The functional test classes below exercise items 1-4 directly. 
+""" + +import re +from pathlib import Path +from typing import Any, cast + +import pytest +from openai.types.chat import ChatCompletionToolParam +from pydantic import ValidationError + +from backend.copilot.prompting import get_sdk_supplement +from backend.copilot.service import DEFAULT_SYSTEM_PROMPT +from backend.copilot.tools import TOOL_REGISTRY +from backend.copilot.tools.run_agent import RunAgentInput + +# Resolved once for the whole module so individual tests stay fast. +_SDK_SUPPLEMENT = get_sdk_supplement(use_e2b=False, cwd="/tmp/test") + + +# --------------------------------------------------------------------------- +# Prompt regression tests (original) +# --------------------------------------------------------------------------- + + +class TestSystemPromptBasics: + """Verify the system prompt includes essential baseline content. + + After deduplication, the dry-run workflow lives only in the guide. + The system prompt carries tone and personality only. + """ + + def test_mentions_automations(self): + assert "automations" in DEFAULT_SYSTEM_PROMPT.lower() + + def test_mentions_action_oriented(self): + assert "action-oriented" in DEFAULT_SYSTEM_PROMPT.lower() + + +class TestToolDescriptionsDryRunLoop: + """Verify tool descriptions and parameters related to the dry-run loop.""" + + def test_get_agent_building_guide_mentions_workflow(self): + desc = TOOL_REGISTRY["get_agent_building_guide"].description + assert "dry-run" in desc.lower() + + def test_run_agent_dry_run_param_exists_and_is_boolean(self): + schema = TOOL_REGISTRY["run_agent"].as_openai_tool() + params = cast(dict[str, Any], schema["function"].get("parameters", {})) + assert "dry_run" in params["properties"] + assert params["properties"]["dry_run"]["type"] == "boolean" + + def test_run_agent_dry_run_param_mentions_simulation(self): + """After deduplication the dry_run param description mentions simulation.""" + schema = TOOL_REGISTRY["run_agent"].as_openai_tool() + params = cast(dict[str, Any], schema["function"].get("parameters", {})) + dry_run_desc = params["properties"]["dry_run"]["description"] + assert "simulat" in dry_run_desc.lower() + + +class TestPromptingSupplementContent: + """Verify the prompting supplement (via get_sdk_supplement) includes + essential shared tool notes. After deduplication, the dry-run workflow + lives only in the guide; the supplement carries storage, file-handling, + and tool-discovery notes. 
+ """ + + def test_includes_tool_discovery_priority(self): + assert "Tool Discovery Priority" in _SDK_SUPPLEMENT + + def test_includes_find_block_first(self): + assert "find_block first" in _SDK_SUPPLEMENT or "find_block" in _SDK_SUPPLEMENT + + def test_includes_send_authenticated_web_request(self): + assert "SendAuthenticatedWebRequestBlock" in _SDK_SUPPLEMENT + + +class TestAgentBuildingGuideDryRunLoop: + """Verify the agent building guide includes the dry-run loop.""" + + @pytest.fixture + def guide_content(self): + guide_path = ( + Path(__file__).resolve().parent.parent.parent + / "backend" + / "copilot" + / "sdk" + / "agent_generation_guide.md" + ) + return guide_path.read_text(encoding="utf-8") + + def test_has_dry_run_verification_section(self, guide_content): + assert "REQUIRED: Dry-Run Verification Loop" in guide_content + + def test_workflow_includes_dry_run_step(self, guide_content): + assert "dry_run=True" in guide_content + + def test_mentions_good_vs_bad_output(self, guide_content): + assert "**Good output**" in guide_content + assert "**Bad output**" in guide_content + + def test_mentions_repeat_until_pass(self, guide_content): + lower = guide_content.lower() + assert "repeat" in lower + assert "clearly unfixable" in lower + + def test_mentions_wait_for_result(self, guide_content): + assert "wait_for_result=120" in guide_content + + def test_mentions_view_agent_output(self, guide_content): + assert "view_agent_output" in guide_content + + def test_workflow_has_dry_run_and_inspect_steps(self, guide_content): + assert "**Dry-run**" in guide_content + assert "**Inspect & fix**" in guide_content + + +# --------------------------------------------------------------------------- +# Functional tests: tool schema validation +# --------------------------------------------------------------------------- + + +class TestRunAgentToolSchema: + """Validate the run_agent OpenAI tool schema exposes dry_run correctly. + + These go beyond substring checks — they verify the full schema structure + that the LLM receives, ensuring the parameter is well-formed and will be + parsed correctly by OpenAI function-calling. 
+ """ + + @pytest.fixture + def schema(self) -> ChatCompletionToolParam: + return TOOL_REGISTRY["run_agent"].as_openai_tool() + + def test_schema_is_valid_openai_tool(self, schema: ChatCompletionToolParam): + """The schema has the required top-level OpenAI structure.""" + assert schema["type"] == "function" + assert "function" in schema + func = schema["function"] + assert "name" in func + assert "description" in func + assert "parameters" in func + assert func["name"] == "run_agent" + + def test_dry_run_is_required(self, schema: ChatCompletionToolParam): + """dry_run must be in 'required' so the LLM always provides it explicitly.""" + params = cast(dict[str, Any], schema["function"].get("parameters", {})) + required = params.get("required", []) + assert "dry_run" in required + + def test_dry_run_is_boolean_type(self, schema: ChatCompletionToolParam): + """dry_run must be typed as boolean so the LLM generates true/false.""" + params = cast(dict[str, Any], schema["function"].get("parameters", {})) + assert params["properties"]["dry_run"]["type"] == "boolean" + + def test_dry_run_description_is_nonempty(self, schema: ChatCompletionToolParam): + """The description must be present and substantive for LLM guidance.""" + params = cast(dict[str, Any], schema["function"].get("parameters", {})) + desc = params["properties"]["dry_run"]["description"] + assert isinstance(desc, str) + assert len(desc) > 10, "Description too short to guide the LLM" + + def test_wait_for_result_coexists_with_dry_run( + self, schema: ChatCompletionToolParam + ): + """wait_for_result must also be present — the guide instructs the LLM + to pass both dry_run=True and wait_for_result=120 together.""" + params = cast(dict[str, Any], schema["function"].get("parameters", {})) + assert "wait_for_result" in params["properties"] + assert params["properties"]["wait_for_result"]["type"] == "integer" + + +class TestRunBlockToolSchema: + """Validate the run_block OpenAI tool schema exposes dry_run correctly.""" + + @pytest.fixture + def schema(self) -> ChatCompletionToolParam: + return TOOL_REGISTRY["run_block"].as_openai_tool() + + def test_schema_is_valid_openai_tool(self, schema: ChatCompletionToolParam): + assert schema["type"] == "function" + func = schema["function"] + assert func["name"] == "run_block" + assert "parameters" in func + + def test_dry_run_exists_and_is_boolean(self, schema: ChatCompletionToolParam): + params = cast(dict[str, Any], schema["function"].get("parameters", {})) + props = params["properties"] + assert "dry_run" in props + assert props["dry_run"]["type"] == "boolean" + + def test_dry_run_is_required(self, schema: ChatCompletionToolParam): + """dry_run must be required — along with block_id and input_data.""" + params = cast(dict[str, Any], schema["function"].get("parameters", {})) + required = params.get("required", []) + assert "dry_run" in required + assert "block_id" in required + assert "input_data" in required + + def test_dry_run_description_mentions_preview( + self, schema: ChatCompletionToolParam + ): + params = cast(dict[str, Any], schema["function"].get("parameters", {})) + desc = params["properties"]["dry_run"]["description"] + assert isinstance(desc, str) + assert ( + "preview mode" in desc.lower() + ), "run_block dry_run description should mention preview mode" + + +# --------------------------------------------------------------------------- +# Functional tests: RunAgentInput Pydantic model +# --------------------------------------------------------------------------- + + +class 
TestRunAgentInputModel: + """Validate RunAgentInput Pydantic model handles dry_run correctly. + + The executor reads dry_run from this model, so it must parse, default, + and validate properly. + """ + + def test_dry_run_accepts_true(self): + model = RunAgentInput(username_agent_slug="user/agent", dry_run=True) + assert model.dry_run is True + + def test_dry_run_accepts_false(self): + """dry_run=False must be accepted when provided explicitly.""" + model = RunAgentInput(username_agent_slug="user/agent", dry_run=False) + assert model.dry_run is False + + def test_dry_run_coerces_truthy_int(self): + """Pydantic bool fields coerce int 1 to True.""" + model = RunAgentInput(username_agent_slug="user/agent", dry_run=1) # type: ignore[arg-type] + assert model.dry_run is True + + def test_dry_run_coerces_falsy_int(self): + """Pydantic bool fields coerce int 0 to False.""" + model = RunAgentInput(username_agent_slug="user/agent", dry_run=0) # type: ignore[arg-type] + assert model.dry_run is False + + def test_dry_run_with_wait_for_result(self): + """The guide instructs passing both dry_run=True and wait_for_result=120. + The model must accept this combination.""" + model = RunAgentInput( + username_agent_slug="user/agent", + dry_run=True, + wait_for_result=120, + ) + assert model.dry_run is True + assert model.wait_for_result == 120 + + def test_wait_for_result_upper_bound(self): + """wait_for_result is bounded at 300 seconds (ge=0, le=300).""" + with pytest.raises(ValidationError): + RunAgentInput( + username_agent_slug="user/agent", + dry_run=True, + wait_for_result=301, + ) + + def test_string_fields_are_stripped(self): + """The strip_strings validator should strip whitespace from string fields.""" + model = RunAgentInput(username_agent_slug=" user/agent ", dry_run=True) + assert model.username_agent_slug == "user/agent" + + +# --------------------------------------------------------------------------- +# Functional tests: guide documents the correct workflow ordering +# --------------------------------------------------------------------------- + + +class TestGuideWorkflowOrdering: + """Verify the guide documents workflow steps in the correct order. + + The LLM must see: create/edit -> dry-run -> inspect -> fix -> repeat. + If these steps are reordered, the copilot would follow the wrong sequence. + These tests verify *ordering*, not just presence. 
+ """ + + @pytest.fixture + def guide_content(self) -> str: + guide_path = ( + Path(__file__).resolve().parent.parent.parent + / "backend" + / "copilot" + / "sdk" + / "agent_generation_guide.md" + ) + return guide_path.read_text(encoding="utf-8") + + def test_create_before_dry_run_in_workflow(self, guide_content: str): + """Step 7 (Save/create_agent) must appear before step 8 (Dry-run).""" + create_pos = guide_content.index("create_agent") + dry_run_pos = guide_content.index("dry_run=True") + assert ( + create_pos < dry_run_pos + ), "create_agent must appear before dry_run=True in the workflow" + + def test_dry_run_before_inspect_in_verification_section(self, guide_content: str): + """In the verification loop section, Dry-run step must come before + Inspect & fix step.""" + section_start = guide_content.index("REQUIRED: Dry-Run Verification Loop") + section = guide_content[section_start:] + dry_run_pos = section.index("**Dry-run**") + inspect_pos = section.index("**Inspect") + assert ( + dry_run_pos < inspect_pos + ), "Dry-run step must come before Inspect & fix in the verification loop" + + def test_fix_before_repeat_in_verification_section(self, guide_content: str): + """The Fix step must come before the Repeat step.""" + section_start = guide_content.index("REQUIRED: Dry-Run Verification Loop") + section = guide_content[section_start:] + fix_pos = section.index("**Fix**") + repeat_pos = section.index("**Repeat**") + assert fix_pos < repeat_pos + + def test_good_output_before_bad_output(self, guide_content: str): + """Good output examples should be listed before bad output examples, + so the LLM sees the success pattern first.""" + good_pos = guide_content.index("**Good output**") + bad_pos = guide_content.index("**Bad output**") + assert good_pos < bad_pos + + def test_numbered_steps_in_verification_section(self, guide_content: str): + """The step-by-step workflow should have numbered steps 1-5.""" + section_start = guide_content.index("Step-by-step workflow") + section = guide_content[section_start:] + # The section should contain numbered items 1 through 5 + for step_num in range(1, 6): + assert ( + f"{step_num}. 
" in section + ), f"Missing numbered step {step_num} in verification workflow" + + def test_workflow_steps_are_in_numbered_order(self, guide_content: str): + """The main workflow steps (1-9) must appear in ascending order.""" + # Extract the numbered workflow items from the top-level workflow section + workflow_start = guide_content.index("### Workflow for Creating/Editing Agents") + # End at the next ### section + next_section = guide_content.index("### Agent JSON Structure") + workflow_section = guide_content[workflow_start:next_section] + step_positions = [] + for step_num in range(1, 10): + pattern = rf"^{step_num}\.\s" + match = re.search(pattern, workflow_section, re.MULTILINE) + if match: + step_positions.append((step_num, match.start())) + # Verify at least steps 1-9 are present and in order + assert ( + len(step_positions) >= 9 + ), f"Expected 9 workflow steps, found {len(step_positions)}" + for i in range(1, len(step_positions)): + prev_num, prev_pos = step_positions[i - 1] + curr_num, curr_pos = step_positions[i] + assert prev_pos < curr_pos, ( + f"Step {prev_num} (pos {prev_pos}) should appear before " + f"step {curr_num} (pos {curr_pos})" + ) diff --git a/autogpt_platform/docker-compose.yml b/autogpt_platform/docker-compose.yml index 625761c0b5..0a8b412d57 100644 --- a/autogpt_platform/docker-compose.yml +++ b/autogpt_platform/docker-compose.yml @@ -98,6 +98,7 @@ services: - CLAMD_CONF_MaxScanSize=100M - CLAMD_CONF_MaxThreads=12 - CLAMD_CONF_ReadTimeout=300 + - CLAMD_CONF_TCPAddr=0.0.0.0 healthcheck: test: ["CMD-SHELL", "clamdscan --version || exit 1"] interval: 30s diff --git a/autogpt_platform/frontend/AGENTS.md b/autogpt_platform/frontend/AGENTS.md index e0accaadc1..152d0f239d 100644 --- a/autogpt_platform/frontend/AGENTS.md +++ b/autogpt_platform/frontend/AGENTS.md @@ -40,6 +40,8 @@ After making **any** code changes in the frontend, you MUST run the following co Do NOT skip these steps. If any command reports errors, fix them and re-run until clean. Only then may you consider the task complete. If typing keeps failing, stop and ask the user. +4. `pnpm test:unit` — run integration tests; fix any failures + ### Code Style - Fully capitalize acronyms in symbols, e.g. `graphID`, `useBackendAPI` @@ -62,7 +64,7 @@ Do NOT skip these steps. If any command reports errors, fix them and re-run unti - **Icons**: Phosphor Icons only - **Feature Flags**: LaunchDarkly integration - **Error Handling**: ErrorCard for render errors, toast for mutations, Sentry for exceptions -- **Testing**: Playwright for E2E, Storybook for component development +- **Testing**: Vitest + React Testing Library + MSW for integration tests (primary), Playwright for E2E, Storybook for visual ## Environment Configuration @@ -84,7 +86,12 @@ See @CONTRIBUTING.md for complete patterns. Quick reference: - Regenerate with `pnpm generate:api` - Pattern: `use{Method}{Version}{OperationName}` 4. **Styling**: Tailwind CSS only, use design tokens, Phosphor Icons only -5. **Testing**: Add Storybook stories for new components, Playwright for E2E. When fixing a bug, write a failing Playwright test first (use `.fixme` annotation), implement the fix, then remove the annotation. +5. **Testing**: Integration tests are the default (~90%). See `TESTING.md` for full details. 
+ - **New pages/features**: Write integration tests in `__tests__/` next to `page.tsx` using Vitest + RTL + MSW
+ - **API mocking**: Use Orval-generated MSW handlers from `@/app/api/__generated__/endpoints/{tag}/{tag}.msw.ts`
+ - **Run**: `pnpm test:unit` (integration/unit), `pnpm test` (Playwright E2E)
+ - **Storybook**: For design system components in `src/components/`
+ - **TDD**: Write a failing test first, implement, then verify
 6. **Code conventions**:
    - Use function declarations (not arrow functions) for components/handlers
    - Do not use `useCallback` or `useMemo` unless asked to optimise a given function
diff --git a/autogpt_platform/frontend/CONTRIBUTING.md b/autogpt_platform/frontend/CONTRIBUTING.md
index 649bb1ca92..bcb40f4430 100644
--- a/autogpt_platform/frontend/CONTRIBUTING.md
+++ b/autogpt_platform/frontend/CONTRIBUTING.md
@@ -747,9 +747,65 @@ export function CreateButton() {
 
 ---
 
-## 🧪 Testing & Storybook
+## 🧪 Testing
 
-- See `TESTING.md` for Playwright setup, E2E data seeding, and Storybook usage.
+See `TESTING.md` for full details. Key principles:
+
+### Integration tests are the default (~90% of tests)
+
+We test at the **page level**: render the page with React Testing Library, mock API requests with MSW (auto-generated by Orval), and assert with testing-library queries.
+
+```bash
+pnpm test:unit        # run integration/unit tests
+pnpm test:unit:watch  # watch mode
+```
+
+### Test file location
+
+Tests live in `__tests__/` next to the page or component:
+
+```
+app/(platform)/library/
+  __tests__/
+    main.test.tsx    # main page rendering & interactions
+    search.test.tsx  # search-specific behavior
+  components/
+  page.tsx
+  useLibraryPage.ts
+```
+
+### Writing a test
+
+1. Render the page using `render()` from `@/tests/integrations/test-utils`
+2. Mock API responses using Orval-generated MSW handlers from `@/app/api/__generated__/endpoints/{tag}/{tag}.msw.ts`
+3. Assert with `screen.findByText`, `screen.getByRole`, etc.
+
+```tsx
+import { render, screen } from "@/tests/integrations/test-utils";
+import { server } from "@/mocks/mock-server";
+import { getGetV2ListLibraryAgentsMockHandler200 } from "@/app/api/__generated__/endpoints/library/library.msw";
+import LibraryPage from "../page";
+
+test("renders agent list", async () => {
+  server.use(getGetV2ListLibraryAgentsMockHandler200());
+  render(<LibraryPage />);
+  expect(await screen.findByText("My Agents")).toBeDefined();
+});
+```
+
+### When to use each test type
+
+| Type | When |
+| ------------------------------------ | --------------------------------------------- |
+| **Integration (Vitest + RTL + MSW)** | Default for all new pages and features |
+| **E2E (Playwright)** | Auth flows, payments, cross-page navigation |
+| **Storybook** | Design system components in `src/components/` |
+
+### TDD workflow
+
+1. Write a failing test (integration test or Playwright with `.fixme`)
+2. Implement the fix/feature
+3. Remove annotations and run the full suite
 
 ---
 
@@ -763,8 +819,10 @@ Common scripts (see `package.json` for full list):
 - `pnpm lint` — ESLint + Prettier check
 - `pnpm format` — Format code
 - `pnpm types` — Type-check
+- `pnpm test:unit` — Run integration/unit tests (Vitest + RTL + MSW)
+- `pnpm test:unit:watch` — Watch mode for integration tests
+- `pnpm test` — Run Playwright E2E tests
 - `pnpm storybook` — Run Storybook
-- `pnpm test` — Run Playwright tests
 
 Generated API client:
 
@@ -780,6 +838,7 @@ Generated API client:
 - Logic is separated into `use*.ts` and `helpers.ts` when non-trivial
 - Reusable logic extracted to `src/services/` or `src/lib/utils.ts` when appropriate
 - Navigation uses the Next.js router
+- Integration tests added/updated for new pages and features (`pnpm test:unit`)
 - Lint, format, type-check, and tests pass locally
 - Stories updated/added if UI changed; verified in Storybook
diff --git a/autogpt_platform/frontend/Dockerfile b/autogpt_platform/frontend/Dockerfile
index 476a9a8ed3..58ce906cd4 100644
--- a/autogpt_platform/frontend/Dockerfile
+++ b/autogpt_platform/frontend/Dockerfile
@@ -12,6 +12,10 @@ COPY autogpt_platform/frontend/ .
 # Allow CI to opt-in to Playwright test build-time flags
 ARG NEXT_PUBLIC_PW_TEST="false"
 ENV NEXT_PUBLIC_PW_TEST=$NEXT_PUBLIC_PW_TEST
+# Allow CI to opt-in to browser sourcemaps for coverage path resolution.
+# Keep Docker builds defaulting to false to avoid the memory hit.
+ARG NEXT_PUBLIC_SOURCEMAPS="false"
+ENV NEXT_PUBLIC_SOURCEMAPS=$NEXT_PUBLIC_SOURCEMAPS
 ENV NODE_ENV="production"
 # Merge env files appropriately based on environment
 RUN if [ -f .env.production ]; then \
@@ -25,10 +29,6 @@ RUN if [ -f .env.production ]; then \
     cp .env.default .env; \
     fi
 RUN pnpm run generate:api
-# Disable source-map generation in Docker builds to halve webpack memory usage.
-# Source maps are only useful when SENTRY_AUTH_TOKEN is set (Vercel deploys);
-# the Docker image never uploads them, so generating them just wastes RAM.
-ENV NEXT_PUBLIC_SOURCEMAPS="false"
 # In CI, we want NEXT_PUBLIC_PW_TEST=true during build so Next.js inlines it
 RUN if [ "$NEXT_PUBLIC_PW_TEST" = "true" ]; then NEXT_PUBLIC_PW_TEST=true NODE_OPTIONS="--max-old-space-size=8192" pnpm build; else NODE_OPTIONS="--max-old-space-size=8192" pnpm build; fi
diff --git a/autogpt_platform/frontend/TESTING.md b/autogpt_platform/frontend/TESTING.md
index 2995295c96..0b95f8eaab 100644
--- a/autogpt_platform/frontend/TESTING.md
+++ b/autogpt_platform/frontend/TESTING.md
@@ -1,57 +1,168 @@
-# Frontend Testing 🧪
+# Frontend Testing
 
-## Quick Start (local) 🚀
+## Testing Strategy
+
+| Type | Tool | Speed | When to use |
+| ------------------------- | ------------------------------------ | ------------- | ----------------------------------------------------- |
+| **Integration (primary)** | Vitest + React Testing Library + MSW | Fast (~100ms) | ~90% of tests — page-level rendering with mocked API |
+| **E2E** | Playwright | Slow (~5s) | Critical flows: auth, payments, cross-page navigation |
+| **Visual** | Storybook + Chromatic | N/A | Design system components |
+
+**Integration tests are the default.** Since most of our code is client-only, we test at the page level: render the page with React Testing Library, mock API requests with MSW (handlers auto-generated by Orval), and assert with testing-library queries.
+
+## Integration Tests (Vitest + RTL + MSW)
+
+### Running
+
+```bash
+pnpm test:unit        # run all integration/unit tests with coverage
+pnpm test:unit:watch  # watch mode for development
+```
+
+### File location
+
+Tests live in a `__tests__/` folder next to the page or component they test:
+
+```
+app/(platform)/library/
+  __tests__/
+    main.test.tsx    # tests the main page rendering & interactions
+    search.test.tsx  # tests search-specific behavior
+  components/
+    AgentCard/
+      AgentCard.tsx
+      __tests__/
+        AgentCard.test.tsx  # only when testing the component in isolation
+  page.tsx
+  useLibraryPage.ts
+```
+
+**Naming**: use descriptive names like `main.test.tsx`, `search.test.tsx`, `filters.test.tsx` — not `page.test.tsx` or `index.test.tsx`.
+
+### Writing an integration test
+
+1. **Render the page** using the custom `render()` from `@/tests/integrations/test-utils` (wraps providers)
+2. **Mock API responses** using Orval-generated MSW handlers from `@/app/api/__generated__/endpoints/{tag}/{tag}.msw.ts`
+3. **Assert** with React Testing Library queries (`screen.findByText`, `screen.getByRole`, etc.)
+
+```tsx
+import { render, screen } from "@/tests/integrations/test-utils";
+import { server } from "@/mocks/mock-server";
+import {
+  getGetV2ListLibraryAgentsMockHandler200,
+  getGetV2ListLibraryAgentsMockHandler422,
+} from "@/app/api/__generated__/endpoints/library/library.msw";
+import LibraryPage from "../page";
+
+describe("LibraryPage", () => {
+  test("renders agent list from API", async () => {
+    server.use(getGetV2ListLibraryAgentsMockHandler200());
+
+    render(<LibraryPage />);
+
+    expect(await screen.findByText("My Agents")).toBeDefined();
+  });
+
+  test("shows error state on API failure", async () => {
+    server.use(getGetV2ListLibraryAgentsMockHandler422());
+
+    render(<LibraryPage />);
+
+    expect(await screen.findByText(/error/i)).toBeDefined();
+  });
+});
+```
+
+### MSW handlers
+
+Orval generates typed MSW handlers for every endpoint and HTTP status code:
+
+- `getGetV2ListLibraryAgentsMockHandler200()` — success response with faker data
+- `getGetV2ListLibraryAgentsMockHandler422()` — validation error response
+- `getGetV2ListLibraryAgentsMockHandler401()` — unauthorized response
+
+To override with custom data, pass a resolver:
+
+```tsx
+import { http, HttpResponse } from "msw";
+
+server.use(
+  http.get("http://localhost:3000/api/proxy/api/library/agents", () => {
+    return HttpResponse.json({
+      agents: [{ id: "1", name: "My Agent" }],
+      pagination: { total: 1 },
+    });
+  }),
+);
+```
+
+All handlers are aggregated in `src/mocks/mock-handlers.ts` and the MSW server is set up in `src/mocks/mock-server.ts`.
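+
+Mutations follow the same pattern: register a handler for the write endpoint, interact with the page, and wait for the call. A minimal sketch: the delete route, its URL, and the `fireEvent`/`waitFor` re-exports from test-utils are illustrative assumptions, not generated names:
+
+```tsx
+import { http, HttpResponse } from "msw";
+import { render, screen, fireEvent, waitFor } from "@/tests/integrations/test-utils";
+import { server } from "@/mocks/mock-server";
+import LibraryPage from "../page";
+
+test("deleting an agent calls the API", async () => {
+  let deleteCalled = false;
+  server.use(
+    // Hypothetical endpoint; substitute the real route for your page.
+    http.delete("http://localhost:3000/api/proxy/api/library/agents/1", () => {
+      deleteCalled = true;
+      return HttpResponse.json({});
+    }),
+  );
+
+  render(<LibraryPage />);
+  fireEvent.click(await screen.findByRole("button", { name: /delete/i }));
+
+  // The mutation fires asynchronously, so poll until the handler has run.
+  await waitFor(() => expect(deleteCalled).toBe(true));
+});
+```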
+ +### Test utilities + +- **`@/tests/integrations/test-utils`** — custom `render()` that wraps components with `QueryClientProvider`, `BackendAPIProvider`, `OnboardingProvider`, `NuqsTestingAdapter`, and `TooltipProvider`, so query-state hooks and tooltips work out of the box in page-level tests +- **`@/tests/integrations/setup-nextjs-mocks`** — mocks for `next/navigation`, `next/image`, `next/headers`, `next/link` +- **`@/tests/integrations/mock-supabase-request`** — mocks Supabase auth (returns null user by default) + +### What to test at page level + +- Page renders with API data (happy path) +- Loading and error states +- User interactions that trigger mutations (clicks, form submissions) +- Conditional rendering based on API responses +- Search, filtering, pagination behavior + +### When to test a component in isolation + +Only when the component has complex internal logic that is hard to exercise through the page test. Prefer page-level tests as the default. + +## E2E Tests (Playwright) + +### Running + +```bash +pnpm test # build + run all Playwright tests +pnpm test-ui # run with Playwright UI +pnpm test:no-build # run against a running dev server +``` + +### Setup 1. Start the backend + Supabase stack: - From `autogpt_platform`: `docker compose --profile local up deps_backend -d` - - Or run the full stack: `docker compose up -d` 2. Seed rich E2E data (creates `test123@gmail.com` with library agents): - From `autogpt_platform/backend`: `poetry run python test/e2e_test_data.py` -3. Run Playwright: - - From `autogpt_platform/frontend`: `pnpm test` or `pnpm test-ui` -## How Playwright setup works 🎭 +### How Playwright setup works -- Playwright runs from `frontend/playwright.config.ts` with a global setup step. -- The global setup creates a user pool via the real signup UI and stores it in `frontend/.auth/user-pool.json`. -- Most tests call `getTestUser()` (from `src/tests/utils/auth.ts`) which pulls a random user from that pool. - - these users do not contain library agents, it's user that just "signed up" on the platform, hence some tests to make use of users created via script (see below) with more data +- Playwright runs from `frontend/playwright.config.ts` with a global setup step +- Global setup creates a user pool via the real signup UI, stored in `frontend/.auth/user-pool.json` +- `getTestUser()` (from `src/tests/utils/auth.ts`) pulls a random user from the pool +- `getTestUserWithLibraryAgents()` uses the rich user created by the data script -## Test users 👤 +### Test users -- **User pool (basic users)** - Created automatically by the Playwright global setup through `/signup`. - Used by `getTestUser()` in `src/tests/utils/auth.ts`. +- **User pool (basic users)** — created automatically by Playwright global setup. Used by `getTestUser()` +- **Rich user with library agents** — created by `backend/test/e2e_test_data.py`. Used by `getTestUserWithLibraryAgents()` -- **Rich user with library agents** - Created by `backend/test/e2e_test_data.py`. - Accessed via `getTestUserWithLibraryAgents()` in `src/tests/credentials/index.ts`. - -Use the rich user when a test needs existing library agents (e.g. `library.spec.ts`). - -## Resetting or wiping the DB 🔁 +### Resetting the DB If you reset the Docker DB and logins start failing: -1. Delete `frontend/.auth/user-pool.json` so the pool is regenerated. -2. Re-run the E2E data script to recreate the rich user + library agents: - - `poetry run python test/e2e_test_data.py` +1. Delete `frontend/.auth/user-pool.json` +2. 
Re-run `poetry run python test/e2e_test_data.py` -## Storybook 📚 +## Storybook -## Flow diagram 🗺️ +- `pnpm storybook` — run locally +- `pnpm build-storybook` — build static +- `pnpm test-storybook` — CI runner +- When changing components in `src/components`, update or add stories and verify in Storybook/Chromatic -```mermaid -flowchart TD - A[Start Docker stack] --> B[Run e2e_test_data.py] - B --> C[Run Playwright tests] - C --> D[Global setup creates user pool] - D --> E{Test needs rich data?} - E -->|No| F[getTestUser from user pool] - E -->|Yes| G[getTestUserWithLibraryAgents] -``` +## TDD Workflow -- `pnpm storybook` – Run Storybook locally -- `pnpm build-storybook` – Build a static Storybook -- CI runner: `pnpm test-storybook` -- When changing components in `src/components`, update or add stories and verify in Storybook/Chromatic. +When fixing a bug or adding a feature: + +1. **Write a failing test first** — for integration tests, write the test and confirm it fails. For Playwright, use `.fixme` annotation +2. **Implement the fix/feature** — write the minimal code to make the test pass +3. **Remove annotations** — once passing, remove `.fixme` and run the full suite diff --git a/autogpt_platform/frontend/package.json b/autogpt_platform/frontend/package.json index bc172c1669..90c2645272 100644 --- a/autogpt_platform/frontend/package.json +++ b/autogpt_platform/frontend/package.json @@ -161,6 +161,7 @@ "eslint-plugin-storybook": "9.1.5", "happy-dom": "20.3.4", "import-in-the-middle": "2.0.2", + "monocart-reporter": "2.10.0", "msw": "2.11.6", "msw-storybook-addon": "2.0.6", "orval": "7.13.0", diff --git a/autogpt_platform/frontend/playwright.config.ts b/autogpt_platform/frontend/playwright.config.ts index 7604e8e88a..bf3c19845f 100644 --- a/autogpt_platform/frontend/playwright.config.ts +++ b/autogpt_platform/frontend/playwright.config.ts @@ -5,10 +5,57 @@ import { defineConfig, devices } from "@playwright/test"; * https://github.com/motdotla/dotenv */ import dotenv from "dotenv"; +import fs from "fs"; import path from "path"; dotenv.config({ path: path.resolve(__dirname, ".env") }); dotenv.config({ path: path.resolve(__dirname, "../backend/.env") }); +const frontendRoot = __dirname.replaceAll("\\", "/"); + +// Directory where CI copies .next/static from the Docker container +const staticCoverageDir = path.resolve(__dirname, ".next-static-coverage"); + +function normalizeCoverageSourcePath(filePath: string) { + const normalizedFilePath = filePath.replaceAll("\\", "/"); + const withoutWebpackPrefix = normalizedFilePath.replace( + /^webpack:\/\/_N_E\//, + "", + ); + + if (withoutWebpackPrefix.startsWith("./")) { + return withoutWebpackPrefix.slice(2); + } + + if (withoutWebpackPrefix.startsWith(frontendRoot)) { + return path.posix.relative(frontendRoot, withoutWebpackPrefix); + } + + return withoutWebpackPrefix; +} + +// Resolve source maps from the copied .next/static directory. +// Cache parsed results to avoid repeated disk reads during report generation. 
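+// Values are the parsed source-map objects, or undefined when a map file is missing or unreadable.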
+const sourceMapCache = new Map(); + +function resolveSourceMap(sourcePath: string) { + // sourcePath is the sourceMappingURL, e.g.: + // "http://localhost:3000/_next/static/chunks/abc123.js.map" + const match = sourcePath.match(/_next\/static\/(.+)$/); + if (!match) return undefined; + + const mapFile = path.join(staticCoverageDir, match[1]); + if (sourceMapCache.has(mapFile)) return sourceMapCache.get(mapFile); + + try { + const result = JSON.parse(fs.readFileSync(mapFile, "utf8")) as object; + sourceMapCache.set(mapFile, result); + return result; + } catch { + sourceMapCache.set(mapFile, undefined); + return undefined; + } +} + export default defineConfig({ testDir: "./src/tests", /* Global setup file that runs before all tests */ @@ -22,7 +69,30 @@ export default defineConfig({ /* use more workers on CI. */ workers: process.env.CI ? 4 : undefined, /* Reporter to use. See https://playwright.dev/docs/test-reporters */ - reporter: [["list"], ["html", { open: "never" }]], + reporter: [ + ["list"], + ["html", { open: "never" }], + [ + "monocart-reporter", + { + name: "E2E Coverage Report", + outputFile: "./coverage/e2e/report.html", + coverage: { + reports: ["cobertura"], + outputDir: "./coverage/e2e", + entryFilter: (entry: { url: string }) => + entry.url.includes("/_next/static/") && + !entry.url.includes("node_modules"), + sourceFilter: (sourcePath: string) => + sourcePath.includes("src/") && !sourcePath.includes("node_modules"), + sourcePath: (filePath: string) => + normalizeCoverageSourcePath(filePath), + sourceMapResolver: (sourcePath: string) => + resolveSourceMap(sourcePath), + }, + }, + ], + ], /* Shared settings for all the projects below. See https://playwright.dev/docs/api/class-testoptions. */ use: { /* Base URL to use in actions like `await page.goto('/')`. 
*/ diff --git a/autogpt_platform/frontend/pnpm-lock.yaml b/autogpt_platform/frontend/pnpm-lock.yaml index 5baa9a50f6..95b49e3a22 100644 --- a/autogpt_platform/frontend/pnpm-lock.yaml +++ b/autogpt_platform/frontend/pnpm-lock.yaml @@ -400,6 +400,9 @@ importers: import-in-the-middle: specifier: 2.0.2 version: 2.0.2 + monocart-reporter: + specifier: 2.10.0 + version: 2.10.0 msw: specifier: 2.11.6 version: 2.11.6(@types/node@24.10.0)(typescript@5.9.3) @@ -4064,6 +4067,10 @@ packages: resolution: {integrity: sha512-h8lQ8tacZYnR3vNQTgibj+tODHI5/+l06Au2Pcriv/Gmet0eaj4TwWH41sO9wnHDiQsEj19q0drzdWdeAHtweg==} engines: {node: '>=6.5'} + accepts@1.3.8: + resolution: {integrity: sha512-PYAthTa2m2VKxuvSD3DPC/Gy+U+sOA1LAuT8mkmRuvw+NACSaeXEQ+NHcVF7rONl6qcaxV3Uuemwawk+7+SJLw==} + engines: {node: '>= 0.6'} + acorn-import-attributes@1.9.5: resolution: {integrity: sha512-n02Vykv5uA3eHGM/Z2dQrcD56kL8TyDb2p1+0P83PClMnC/nc+anbQRhIOWnSq4Ke/KvDPrY3C9hDtC/A3eHnQ==} peerDependencies: @@ -4080,6 +4087,14 @@ packages: peerDependencies: acorn: ^6.0.0 || ^7.0.0 || ^8.0.0 + acorn-loose@8.5.2: + resolution: {integrity: sha512-PPvV6g8UGMGgjrMu+n/f9E/tCSkNQ2Y97eFvuVdJfG11+xdIeDcLyNdC8SHcrHbRqkfwLASdplyR6B6sKM1U4A==} + engines: {node: '>=0.4.0'} + + acorn-walk@8.3.5: + resolution: {integrity: sha512-HEHNfbars9v4pgpW6SO1KSPkfoS0xVOM/9UzkJltjlsHZmJasxg8aXkuZa7SMf8vKGIBhpUsPluQSqhJFCqebw==} + engines: {node: '>=0.4.0'} + acorn@8.15.0: resolution: {integrity: sha512-NZyJarBfL7nWwIq+FDL6Zp/yHEhePMNnnJ0y3qfieCrmNvYct8uvtiV41UvlSe6apAfk0fY1FbWx+NwfmpvtTg==} engines: {node: '>=0.4.0'} @@ -4610,9 +4625,20 @@ packages: console-browserify@1.2.0: resolution: {integrity: sha512-ZMkYO/LkF17QvCPqM0gxw8yUzigAOZOSWSHg91FH6orS7vcEj5dVZTidN2fQ14yBSdg97RqhSNwLUXInd52OTA==} + console-grid@2.2.3: + resolution: {integrity: sha512-+mecFacaFxGl+1G31IsCx41taUXuW2FxX+4xIE0TIPhgML+Jb9JFcBWGhhWerd1/vhScubdmHqTwOhB0KCUUAg==} + constants-browserify@1.0.0: resolution: {integrity: sha512-xFxOwqIzR/e1k1gLiWEophSCMqXcwVHIH7akf7b/vxcUeGunlj3hvZaaqxwHsTgn+IndtkQJgSztIDWeumWJDQ==} + content-disposition@1.0.1: + resolution: {integrity: sha512-oIXISMynqSqm241k6kcQ5UwttDILMK4BiurCfGEREw6+X9jkkpEe5T9FZaApyLGGOnFuyMWZpdolTXMtvEJ08Q==} + engines: {node: '>=18'} + + content-type@1.0.5: + resolution: {integrity: sha512-nTjqfcBFEipKdXCv4YDQWCfmcLZKm81ldF0pAopTvyrFGVbcR6P/VAAd5G7N+0tTr8QqiU0tFadD6FK4NtJwOA==} + engines: {node: '>= 0.6'} + convert-source-map@1.9.0: resolution: {integrity: sha512-ASFBup0Mz1uyiIjANan1jzLQami9z1PoYSZCiiYW2FczPbenXc45FZdBZLzOT+r6+iciuEModtmCti+hjaAk0A==} @@ -4623,6 +4649,10 @@ packages: resolution: {integrity: sha512-9Kr/j4O16ISv8zBBhJoi4bXOYNTkFLOqSL3UDB0njXxCXNezjeyVrJyGOWtgfs/q2km1gwBcfH8q1yEGoMYunA==} engines: {node: '>=18'} + cookies@0.9.1: + resolution: {integrity: sha512-TG2hpqe4ELx54QER/S3HQ9SRVnQnGBtKUz5bLQWtYAQ+o6GpgMs6sYUvaiJjVxb+UXwhRhAEP3m7LbsIZ77Hmw==} + engines: {node: '>= 0.8'} + core-js-compat@3.47.0: resolution: {integrity: sha512-IGfuznZ/n7Kp9+nypamBhvwdwLsW6KC8IOaURw2doAK5e98AG3acVLdh0woOnEqCfUtS+Vu882JE4k/DAm3ItQ==} @@ -4931,6 +4961,9 @@ packages: resolution: {integrity: sha512-h5k/5U50IJJFpzfL6nO9jaaumfjO/f2NjK/oYB2Djzm4p9L+3T9qWpZqZ2hAbLPuuYq9wrU08WQyBTL5GbPk5Q==} engines: {node: '>=6'} + deep-equal@1.0.1: + resolution: {integrity: sha512-bHtC0iYvWhyaTzvV3CZgPeZQqCOBGyGsVV7v4eevpdkLHfiSrXUdBG+qAuSz4RI70sszvjQ1QSZ98An1yNwpSw==} + deep-is@0.1.4: resolution: {integrity: sha512-oIPzksmTg4/MriiaYGO+okXDT7ztn/w3Eptv/+gSIdMdKsJo0u4CfYNFJPy+4SKMuCqGw2wxnA+URMg3t8a/bQ==} @@ -4957,6 +4990,17 @@ packages: delaunator@5.0.1: 
resolution: {integrity: sha512-8nvh+XBe96aCESrGOqMp/84b13H9cdKbG5P2ejQCh4d4sK9RL4371qou9drQjMhvnPmhWl5hnmqbEE0fXr9Xnw==} + delegates@1.0.0: + resolution: {integrity: sha512-bd2L678uiWATM6m5Z1VzNCErI3jiGzt6HGY8OVICs40JQq/HALfbyNJmp0UDakEY4pMMaN0Ly5om/B1VI/+xfQ==} + + depd@1.1.2: + resolution: {integrity: sha512-7emPTl6Dpo6JRXOXjLRxck+FlLRX5847cLKEn00PLAgc3g2hTZZgr+e4c2v6QpSmLeFP3n5yUo7ft6avBK/5jQ==} + engines: {node: '>= 0.6'} + + depd@2.0.0: + resolution: {integrity: sha512-g7nH6P6dyDioJogAAGprGpCtVImJhpPk/roCzdb3fIh61/s/nPsfR6onyMwkCAR/OlC3yBC0lESvUoQEAssIrw==} + engines: {node: '>= 0.8'} + dependency-graph@0.11.0: resolution: {integrity: sha512-JeMq7fEshyepOWDfcfHK06N3MhyPhz++vtqWhMT5O9A3K42rdsEDpfdVqjaqaAhsw6a+ZqeDvQVtD0hFHQWrzg==} engines: {node: '>= 0.6.0'} @@ -4968,6 +5012,10 @@ packages: des.js@1.1.0: resolution: {integrity: sha512-r17GxjhUCjSRy8aiJpr8/UadFIzMzJGexI3Nmz4ADi9LYSFx4gTBp80+NaX/YsXWWLhpZ7v/v/ubEc/bCNfKwg==} + destroy@1.2.0: + resolution: {integrity: sha512-2sJGJTaXIIaR1w4iJSNoN0hnMY7Gpc/n8D4qSCJw8QqFWXf7cuAgnEHxBpweaVcPevC2l3KpjYCx3NypQQgaJg==} + engines: {node: '>= 0.8', npm: 1.2.8000 || >= 1.4.16} + detect-libc@2.1.2: resolution: {integrity: sha512-Btj2BOOO83o3WyH59e8MgXsxEQVcarkUOpEYrubB0urwnN10yQ364rsiByU11nZlqWYZm05i/of7io4mzihBtQ==} engines: {node: '>=8'} @@ -5049,6 +5097,12 @@ packages: eastasianwidth@0.2.0: resolution: {integrity: sha512-I88TYZWc9XiYHRQ4/3c5rjjfgkjhLyW2luGIheGERbNQ6OY7yTybanSpDXZa8y7VUP9YmDcYa+eyq4ca7iLqWA==} + ee-first@1.1.1: + resolution: {integrity: sha512-WMwm9LhRUo+WUaRN+vRuETqG89IgZphVSNkdFgeb6sS/E4OrDIN7t48CAewSHXc6C8lefD8KKfr5vY61brQlow==} + + eight-colors@1.3.2: + resolution: {integrity: sha512-qo7BAEbNnadiWn3EgZFD8tk2DWpifEHJE7CVyp09I0FiUJZ6z0YSyCGFmmtopVMi32iaL4hEK6m+/pPkx1iMFA==} + electron-to-chromium@1.5.267: resolution: {integrity: sha512-0Drusm6MVRXSOJpGbaSVgcQsuB4hEkMpHXaVstcPmhu5LIedxs1xNK/nIxmQIU/RPC0+1/o0AVZfBTkTNJOdUw==} @@ -5081,6 +5135,10 @@ packages: resolution: {integrity: sha512-/kyM18EfinwXZbno9FyUGeFh87KC8HRQBQGildHZbEuRyWFOmv1U10o9BBp8XVZDVNNuQKyIGIu5ZYAAXJ0V2Q==} engines: {node: '>= 4'} + encodeurl@2.0.0: + resolution: {integrity: sha512-Q0n9HRi4m6JuGIV1eFlmvJB7ZEVxu93IrMyiMsGC0lrMJMWzRgx6WGquyfQgZVb31vhGgXnfmPNNXmxnOkRBrg==} + engines: {node: '>= 0.8'} + endent@2.1.0: resolution: {integrity: sha512-r8VyPX7XL8U01Xgnb1CjZ3XV+z90cXIJ9JPE/R9SEC9vpw2P6CfsRPJmp20DppC5N7ZAMCmjYkJIa744Iyg96w==} @@ -5180,6 +5238,9 @@ packages: resolution: {integrity: sha512-WUj2qlxaQtO4g6Pq5c29GTcWGDyd8itL8zTlipgECz3JesAiiOKotd8JU6otB3PACgG6xkJUyVhboMS+bje/jA==} engines: {node: '>=6'} + escape-html@1.0.3: + resolution: {integrity: sha512-NiSupZ4OeuGwr68lGIeym/ksIZMJodUGOSCZ/FSnTxcrekbvqrgdUxlJOMpijaKZVjAJrWrGs/6Jy8OMuyj9ow==} + escape-string-regexp@4.0.0: resolution: {integrity: sha512-TtpcNJ3XAzx3Gq8sWRzJaVajRs0uVxA2YAkdb1jm2YkPz4G6egUFAyA3n5vtEIZefPk5Wa4UXbKuS5fKkJWdgA==} engines: {node: '>=10'} @@ -5493,6 +5554,10 @@ packages: react-dom: optional: true + fresh@0.5.2: + resolution: {integrity: sha512-zJ2mQYM18rEFOudeV4GShTGIQ7RbzA7ozbU9I/XBpm7kqgMywgmylMwXHxZJmkVoYkna9d2pVXVXPdYTP9ej8Q==} + engines: {node: '>= 0.6'} + fs-extra@10.1.0: resolution: {integrity: sha512-oRXApq54ETRj4eMiFzGnHWGy+zo5raudjuxN0b8H7s/RU2oW0Wvsx9O0ACRN/kRq9E8Vu/ReskGB5o3ji+FzHQ==} engines: {node: '>=12'} @@ -5773,6 +5838,18 @@ packages: htmlparser2@6.1.0: resolution: {integrity: sha512-gyyPk6rgonLFEDGoeRgQNaEUvdJ4ktTmmUh/h2t7s+M8oPpIPxgNACWa+6ESR57kXstwqPiCut0V8NRpcwgU7A==} + http-assert@1.5.0: + resolution: {integrity: 
sha512-uPpH7OKX4H25hBmU6G1jWNaqJGpTXxey+YOUizJUAgu0AjLUeC8D73hTrhvDS5D+GJN1DN1+hhc/eF/wpxtp0w==} + engines: {node: '>= 0.8'} + + http-errors@1.8.1: + resolution: {integrity: sha512-Kpk9Sm7NmI+RHhnj6OIWDI1d6fIoFAtFt9RLaTMRlg/8w49juAStsrBgp0Dp4OdxdVbRIeKhtCUvoi/RuAhO4g==} + engines: {node: '>= 0.6'} + + http-errors@2.0.1: + resolution: {integrity: sha512-4FbRdAX+bSdmo4AUFuS0WNiPz8NgFt+r8ThgNWmlrjQjt1Q7ZR9+zTlce2859x4KSXrwIsaeTqDoKQmtP8pLmQ==} + engines: {node: '>= 0.8'} + http-proxy-agent@7.0.2: resolution: {integrity: sha512-T1gkAiYYDWYx3V5Bmyu7HcfcvL7mUrTWiM6yOfa3PIphViJ/gFPbvidQ+veqSOHci/PxBcDabeUNCzpOODJZig==} engines: {node: '>= 14'} @@ -6193,12 +6270,26 @@ packages: resolution: {integrity: sha512-YHzO7721WbmAL6Ov1uzN/l5mY5WWWhJBSW+jq4tkfZfsxmo1hu6frS0EOswvjBUnWE6NtjEs48SFn5CQESRLZg==} hasBin: true + keygrip@1.1.0: + resolution: {integrity: sha512-iYSchDJ+liQ8iwbSI2QqsQOvqv58eJCEanyJPJi+Khyu8smkcKSFUCbPwzFcL7YVtZ6eONjqRX/38caJ7QjRAQ==} + engines: {node: '>= 0.6'} + keyv@4.5.4: resolution: {integrity: sha512-oxVHkHR/EJf2CNXnWxRLW6mg7JyCCUcG0DtEGmL2ctUo1PNTin1PUil+r/+4r5MpVgC/fn1kjsx7mjSujKqIpw==} khroma@2.1.0: resolution: {integrity: sha512-Ls993zuzfayK269Svk9hzpeGUKob/sIgZzyHYdjQoAdQetRKpOLj+k/QQQ/6Qi0Yz65mlROrfd+Ev+1+7dz9Kw==} + koa-compose@4.1.0: + resolution: {integrity: sha512-8ODW8TrDuMYvXRwra/Kh7/rJo9BtOfPc6qO8eAfC80CnCvSjSl0bkRM24X6/XBBEyj0v1nRUQ1LyOy3dbqOWXw==} + + koa-static-resolver@1.0.6: + resolution: {integrity: sha512-ZX5RshSzH8nFn05/vUNQzqw32nEigsPa67AVUr6ZuQxuGdnCcTLcdgr4C81+YbJjpgqKHfacMBd7NmJIbj7fXw==} + + koa@3.2.0: + resolution: {integrity: sha512-TrM4/tnNY7uJ1aW55sIIa+dqBvc4V14WRIAlGcWat9wV5pRS9Wr5Zk2ZTjQP1jtfIHDoHiSbPuV08P0fUZo2pg==} + engines: {node: '>= 18'} + langium@3.3.1: resolution: {integrity: sha512-QJv/h939gDpvT+9SiLVlY7tZC3xB2qK57v0J04Sh9wpMb6MP1q8gB21L3WIo8T5P1MSMg3Ep14L7KkDCFG3y4w==} engines: {node: '>=16.0.0'} @@ -6351,6 +6442,9 @@ packages: resolution: {integrity: sha512-h5bgJWpxJNswbU7qCrV0tIKQCaS3blPDrqKWx+QxzuzL1zGUzij9XCWLrSLsJPu5t+eWA/ycetzYAO5IOMcWAQ==} hasBin: true + lz-utils@2.1.0: + resolution: {integrity: sha512-CMkfimAypidTtWjNDxY8a1bc1mJdyEh04V2FfEQ5Zh8Nx4v7k850EYa+dOWGn9hKG5xOyHP5MkuduAZCTHRvJw==} + magic-string@0.30.21: resolution: {integrity: sha512-vd2F4YUyEXKGcLHoq+TEyCjxueSeHnFxyyjNp80yg0XV4vUhnDer/lvvlqM/arB5bXQN5K2/3oinyCRyx8T2CQ==} @@ -6456,6 +6550,10 @@ packages: mdurl@2.0.0: resolution: {integrity: sha512-Lf+9+2r+Tdp5wXDXC4PcIBjTDtq4UKjCPMQhKIuzpJNW0b96kVqSwW0bT7FhRSfmAiFYgP+SCRvdrDozfh0U5w==} + media-typer@1.1.0: + resolution: {integrity: sha512-aisnrDP4GNe06UcKFnV5bfMNPBUw4jsLGaWwWfnH3v02GnBuXX2MCVn5RbrWo0j3pczUilYblq7fQ7Nw2t5XKw==} + engines: {node: '>= 0.8'} + memfs@3.5.3: resolution: {integrity: sha512-UERzLsxzllchadvbPs5aolHh65ISpKpM+ccLbOJ8/vvpBKmAWf+la7dXFy7Mr0ySHbdHrFv5kGFCUHHe6GFEmw==} engines: {node: '>= 4.0.0'} @@ -6598,10 +6696,18 @@ packages: resolution: {integrity: sha512-sPU4uV7dYlvtWJxwwxHD0PuihVNiE7TyAbQ5SWxDCB9mUYvOgroQOwYQQOKPJ8CIbE+1ETVlOoK1UC2nU3gYvg==} engines: {node: '>= 0.6'} + mime-db@1.54.0: + resolution: {integrity: sha512-aU5EJuIN2WDemCcAp2vFBfp/m4EAhWJnUNSSw0ixs7/kXbd6Pg64EmwJkNdFhB8aWt1sH2CTXrLxo/iAGV3oPQ==} + engines: {node: '>= 0.6'} + mime-types@2.1.35: resolution: {integrity: sha512-ZDY+bPm5zTTF+YpCrAU9nK0UgICYPT0QtT1NZWFv4s++TNkcgVaT0g6+4R2uI4MjQjzysHB1zxuWL50hzaeXiw==} engines: {node: '>= 0.6'} + mime-types@3.0.2: + resolution: {integrity: sha512-Lbgzdk0h4juoQ9fCKXW4by0UJqj+nOOrI9MJ1sSj4nI8aI2eo1qmvQEie4VD1glsS250n15LsWsYtCugiStS5A==} + engines: {node: '>=18'} + mimic-fn@2.1.0: resolution: 
{integrity: sha512-OqbOk5oEQeAZ8WXWydlu9HJjz9WVdEIvamMCcXmuqUYjTknH/sqsWvhQ3vgwKFRR1HpjvNBKQ37nbJgYzGqGcg==} engines: {node: '>=6'} @@ -6640,6 +6746,17 @@ packages: module-details-from-path@1.0.4: resolution: {integrity: sha512-EGWKgxALGMgzvxYF1UyGTy0HXX/2vHLkw6+NvDKW2jypWbHpjQuj4UMcqQWXHERJhVGKikolT06G3bcKe4fi7w==} + monocart-coverage-reports@2.12.9: + resolution: {integrity: sha512-vtFqbC3Egl4nVa1FSIrQvMPO6HZtb9lo+3IW7/crdvrLNW2IH8lUsxaK0TsKNmMO2mhFWwqQywLV2CZelqPgwA==} + hasBin: true + + monocart-locator@1.0.2: + resolution: {integrity: sha512-v8W5hJLcWMIxLCcSi/MHh+VeefI+ycFmGz23Froer9QzWjrbg4J3gFJBuI/T1VLNoYxF47bVPPxq8ZlNX4gVCw==} + + monocart-reporter@2.10.0: + resolution: {integrity: sha512-Q421HL8hCr024HMjQcQylEpOLy69FE6Zli2s/A0zptfFEPW/kaz6B1Ll3CYs8L1j67+egt1HeNC1LTHUsp6W+A==} + hasBin: true + motion-dom@12.24.8: resolution: {integrity: sha512-wX64WITk6gKOhaTqhsFqmIkayLAAx45SVFiMnJIxIrH5uqyrwrxjrfo8WX9Kh8CaUAixjeMn82iH0W0QT9wD5w==} @@ -6688,6 +6805,10 @@ packages: natural-compare@1.4.0: resolution: {integrity: sha512-OWND8ei3VtNC9h7V60qff3SVobHr996CTwgxubgyQYEpg290h9J0buyECNNJexkFm5sOajh5G116RYA1c8ZMSw==} + negotiator@0.6.3: + resolution: {integrity: sha512-+EUsqGPLsM+j/zdChZjsnX51g4XrHFOIXwfnCVPGlQk/k5giakcKsuxCObBRu6DSm9opw/O6slWbJdghQM4bBg==} + engines: {node: '>= 0.6'} + neo-async@2.6.2: resolution: {integrity: sha512-Yd3UES5mWCSqR+qNT93S3UoYUkqAZ9lLg8a7g9rimsWmYGK8cVToA4/sF3RrshdyV3sAGMXVUmpMYOw+dLpOuw==} @@ -6757,6 +6878,10 @@ packages: node-releases@2.0.27: resolution: {integrity: sha512-nmh3lCkYZ3grZvqcCH+fjmQ7X+H0OeZgP40OierEaAptX4XofMh5kwNbWh7lBduUzCcV/8kZ+NDLCwm2iorIlA==} + nodemailer@7.0.13: + resolution: {integrity: sha512-PNDFSJdP+KFgdsG3ZzMXCgquO7I6McjY2vlqILjtJd0hy8wEvtugS9xKRF2NWlPNGxvLCXlTNIae4serI7dinw==} + engines: {node: '>=6.0.0'} + normalize-path@3.0.0: resolution: {integrity: sha512-6eZs5Ls3WtCisHWp9S2GUy8dqkpGi4BVSz3GaqiE6ezub0512ESztXUwUB6C6IKbQkY2Pnb/mD4WYojCRwcwLA==} engines: {node: '>=0.10.0'} @@ -6851,6 +6976,10 @@ packages: obug@2.1.1: resolution: {integrity: sha512-uTqF9MuPraAQ+IsnPf366RG4cP9RtUi7MLO1N3KEc+wb0a6yKpeL0lmk2IB1jY5KHPAlTc6T/JRdC/YqxHNwkQ==} + on-finished@2.4.1: + resolution: {integrity: sha512-oVlzkg3ENAhCk2zdv7IJwd/QUD4z2RxRwpkcGY8psCVcCYZNq4wYnVWALHM+brtuJjePWiYF/ClmuDr8Ch5+kg==} + engines: {node: '>= 0.8'} + once@1.4.0: resolution: {integrity: sha512-lNaJgI+2Q5URQBkccEKHTQOPaXdUxnZZElQTZY0MFUAuaEqe1E+Nyvgdz/aIyNi6Z9MzO5dv1H8n58/GELp3+w==} @@ -6953,6 +7082,10 @@ packages: parse5@8.0.0: resolution: {integrity: sha512-9m4m5GSgXjL4AjumKzq1Fgfp3Z8rsvjRNbnkVwfu2ImRqE5D0LnY2QfDen18FSY9C573YU5XxSapdHZTZ2WolA==} + parseurl@1.3.3: + resolution: {integrity: sha512-CiyeOxFT/JZyN5m0z9PfXw4SCBJ6Sygz1Dpl0wqjlhDEGGBP1GnsUVEL0p63hoG1fcj3fHynXi9NYO4nWOL+qQ==} + engines: {node: '>= 0.8'} + pascal-case@3.1.2: resolution: {integrity: sha512-uWlGT3YSnK9x3BQJaOdcZwrnV6hPpd8jFH1/ucpiLRPh/2zCVJKS19E4GvYHvaCcACn3foXZ0cLB9Wrx1KGe5g==} @@ -7751,6 +7884,9 @@ packages: setimmediate@1.0.5: resolution: {integrity: sha512-MATJdZp8sLqDl/68LfQmbP8zKPLQNV6BIZoIgrscFDQ+RsvK/BxeDQOgyxKKoh0y/8h3BqVFnCqQ/gd+reiIXA==} + setprototypeof@1.2.0: + resolution: {integrity: sha512-E5LDX7Wrp85Kil5bhZv46j8jOeboKq5JMmYM3gVGdGH8xFpPWXUMsNrlODCrkoxMEeNi/XZIwuRvY4XNwYMJpw==} + sha.js@2.4.12: resolution: {integrity: sha512-8LzC5+bvI45BjpfXU8V5fdU2mfeKiQe1D1gIMn7XUlF3OTUrpdJpPPH4EMAnF0DsHHdSZqCdSss5qCmJKuiO3w==} engines: {node: '>= 0.10'} @@ -7872,6 +8008,10 @@ packages: resolution: {integrity: sha512-WjlahMgHmCJpqzU8bIBy4qtsZdU9lRlcZE3Lvyej6t4tuOuv1vk57OW3MBrj6hXBFx/nNoC9MPMTcr5YA7NQbg==} 
engines: {node: '>=6'} + statuses@1.5.0: + resolution: {integrity: sha512-OpZ3zP+jT1PI7I8nemJX4AKmAX070ZkYPVWV/AaKTJl+tXCTGyVdC1a4SL8RUQYEwk/f34ZX8UTykN68FwrqAA==} + engines: {node: '>= 0.6'} + statuses@2.0.2: resolution: {integrity: sha512-DvEy55V3DB7uknRo+4iOGT5fP1slR8wQohVdknigZPMpMstaKJQWhwiYBACJE3Ul2pTnATihhBYnRhZQHGBiRw==} engines: {node: '>= 0.8'} @@ -8157,6 +8297,10 @@ packages: resolution: {integrity: sha512-65P7iz6X5yEr1cwcgvQxbbIw7Uk3gOy5dIdtZ4rDveLqhrdJP+Li/Hx6tyK0NEb+2GCyneCMJiGqrADCSNk8sQ==} engines: {node: '>=8.0'} + toidentifier@1.0.1: + resolution: {integrity: sha512-o5sSPKEkg/DIQNmH43V0/uerLrpzVedkUh8tGNvaeXpfpuwjKenlSox/2O/BTlZUtEe+JG7s5YhEz608PlAHRA==} + engines: {node: '>=0.6'} + tough-cookie@6.0.0: resolution: {integrity: sha512-kXuRi1mtaKMrsLUxz3sQYvVl37B0Ns6MzfrtV5DvJceE9bPyspOqk9xxv7XbZWcfLWbFmm997vl83qUWVJA64w==} engines: {node: '>=16'} @@ -8228,6 +8372,10 @@ packages: tslib@2.8.1: resolution: {integrity: sha512-oJFu94HQb+KVduSUQL7wnpmqnfmLsOA/nAh6b6EH0wCEoK0/mPeXU6c3wKDV83MkOuHPRHtSXKKU99IBazS/2w==} + tsscmp@1.0.6: + resolution: {integrity: sha512-LxhtAkPDTkVCMQjt2h6eBVY28KCjikZqZfMcC15YBeNjkgUpdCfBu5HoiOTDu86v6smE8yOjyEktJ8hlbANHQA==} + engines: {node: '>=0.6.x'} + tty-browserify@0.0.1: resolution: {integrity: sha512-C3TaO7K81YvjCgQH9Q1S3R3P3BtN3RIM8n+OvX4il1K1zgE8ZhI0op7kClgkxtutIE8hQrcrHBXvIheqKUUCxw==} @@ -8257,6 +8405,10 @@ packages: resolution: {integrity: sha512-TeTSQ6H5YHvpqVwBRcnLDCBnDOHWYu7IvGbHT6N8AOymcr9PJGjc1GTtiWZTYg0NCgYwvnYWEkVChQAr9bjfwA==} engines: {node: '>=16'} + type-is@2.0.1: + resolution: {integrity: sha512-OZs6gsjF4vMp32qrCbiVSkrFmXtG/AZhY3t0iAMrMBiAZyV9oALtXO8hsrHbMXF9x6L3grlFuwW2oAz7cav+Gw==} + engines: {node: '>= 0.6'} + typed-array-buffer@1.0.3: resolution: {integrity: sha512-nAYYwfY3qnzX30IkA6AQZjVbtK6duGontcQm1WSG1MD94YLqK0515GNApXkoxKOWMusVssAHWLh9SeaoefYFGw==} engines: {node: '>= 0.4'} @@ -8457,6 +8609,10 @@ packages: resolution: {integrity: sha512-spH26xU080ydGggxRyR1Yhcbgx+j3y5jbNXk/8L+iRvdIEQ4uTRH2Sgf2dokud6Q4oAtsbNvJ1Ft+9xmm6IZcA==} engines: {node: '>= 0.10'} + vary@1.1.2: + resolution: {integrity: sha512-BNGbWLfd0eUPabhkXUVm0j8uuvREyTh5ovRa/dyow/BqAbZJyC+5fU+IzQOzmAKzYqYRAISoRhdQr3eIZ/PXqg==} + engines: {node: '>= 0.8'} + vaul@1.1.2: resolution: {integrity: sha512-ZFkClGpWyI2WUQjdLJ/BaGuV6AVQiJ3uELGk3OYtP+B6yCO7Cmn9vPFXVJkRaGkOJu3m8bQMgtyzNHixULceQA==} peerDependencies: @@ -12911,6 +13067,11 @@ snapshots: dependencies: event-target-shim: 5.0.1 + accepts@1.3.8: + dependencies: + mime-types: 2.1.35 + negotiator: 0.6.3 + acorn-import-attributes@1.9.5(acorn@8.15.0): dependencies: acorn: 8.15.0 @@ -12923,6 +13084,14 @@ snapshots: dependencies: acorn: 8.15.0 + acorn-loose@8.5.2: + dependencies: + acorn: 8.15.0 + + acorn-walk@8.3.5: + dependencies: + acorn: 8.15.0 + acorn@8.15.0: {} adjust-sourcemap-loader@4.0.0: @@ -13472,14 +13641,25 @@ snapshots: console-browserify@1.2.0: {} + console-grid@2.2.3: {} + constants-browserify@1.0.0: {} + content-disposition@1.0.1: {} + + content-type@1.0.5: {} + convert-source-map@1.9.0: {} convert-source-map@2.0.0: {} cookie@1.0.2: {} + cookies@0.9.1: + dependencies: + depd: 2.0.0 + keygrip: 1.1.0 + core-js-compat@3.47.0: dependencies: browserslist: 4.28.1 @@ -13843,6 +14023,8 @@ snapshots: deep-eql@5.0.2: {} + deep-equal@1.0.1: {} + deep-is@0.1.4: {} deepmerge-ts@7.1.5: {} @@ -13867,6 +14049,12 @@ snapshots: dependencies: robust-predicates: 3.0.2 + delegates@1.0.0: {} + + depd@1.1.2: {} + + depd@2.0.0: {} + dependency-graph@0.11.0: {} dequal@2.0.3: {} @@ -13876,6 +14064,8 @@ snapshots: inherits: 
2.0.4 minimalistic-assert: 1.0.1 + destroy@1.2.0: {} + detect-libc@2.1.2: optional: true @@ -13958,6 +14148,10 @@ snapshots: eastasianwidth@0.2.0: {} + ee-first@1.1.1: {} + + eight-colors@1.3.2: {} + electron-to-chromium@1.5.267: {} elliptic@6.6.1: @@ -13990,6 +14184,8 @@ snapshots: emojis-list@3.0.0: {} + encodeurl@2.0.0: {} + endent@2.1.0: dependencies: dedent: 0.7.0 @@ -14209,6 +14405,8 @@ snapshots: escalade@3.2.0: {} + escape-html@1.0.3: {} + escape-string-regexp@4.0.0: {} escape-string-regexp@5.0.0: {} @@ -14606,6 +14804,8 @@ snapshots: react: 18.3.1 react-dom: 18.3.1(react@18.3.1) + fresh@0.5.2: {} + fs-extra@10.1.0: dependencies: graceful-fs: 4.2.11 @@ -14994,6 +15194,27 @@ snapshots: domutils: 2.8.0 entities: 2.2.0 + http-assert@1.5.0: + dependencies: + deep-equal: 1.0.1 + http-errors: 1.8.1 + + http-errors@1.8.1: + dependencies: + depd: 1.1.2 + inherits: 2.0.4 + setprototypeof: 1.2.0 + statuses: 1.5.0 + toidentifier: 1.0.1 + + http-errors@2.0.1: + dependencies: + depd: 2.0.0 + inherits: 2.0.4 + setprototypeof: 1.2.0 + statuses: 2.0.2 + toidentifier: 1.0.1 + http-proxy-agent@7.0.2: dependencies: agent-base: 7.1.4 @@ -15409,12 +15630,41 @@ snapshots: dependencies: commander: 8.3.0 + keygrip@1.1.0: + dependencies: + tsscmp: 1.0.6 + keyv@4.5.4: dependencies: json-buffer: 3.0.1 khroma@2.1.0: {} + koa-compose@4.1.0: {} + + koa-static-resolver@1.0.6: {} + + koa@3.2.0: + dependencies: + accepts: 1.3.8 + content-disposition: 1.0.1 + content-type: 1.0.5 + cookies: 0.9.1 + delegates: 1.0.0 + destroy: 1.2.0 + encodeurl: 2.0.0 + escape-html: 1.0.3 + fresh: 0.5.2 + http-assert: 1.5.0 + http-errors: 2.0.1 + koa-compose: 4.1.0 + mime-types: 3.0.2 + on-finished: 2.4.1 + parseurl: 1.3.3 + statuses: 2.0.2 + type-is: 2.0.1 + vary: 1.1.2 + langium@3.3.1: dependencies: chevrotain: 11.0.3 @@ -15552,6 +15802,8 @@ snapshots: lz-string@1.5.0: {} + lz-utils@2.1.0: {} + magic-string@0.30.21: dependencies: '@jridgewell/sourcemap-codec': 1.5.5 @@ -15771,6 +16023,8 @@ snapshots: mdurl@2.0.0: {} + media-typer@1.1.0: {} + memfs@3.5.3: dependencies: fs-monkey: 1.1.0 @@ -16047,10 +16301,16 @@ snapshots: mime-db@1.52.0: {} + mime-db@1.54.0: {} + mime-types@2.1.35: dependencies: mime-db: 1.52.0 + mime-types@3.0.2: + dependencies: + mime-db: 1.54.0 + mimic-fn@2.1.0: {} min-indent@1.0.1: {} @@ -16084,6 +16344,34 @@ snapshots: module-details-from-path@1.0.4: {} + monocart-coverage-reports@2.12.9: + dependencies: + acorn: 8.15.0 + acorn-loose: 8.5.2 + acorn-walk: 8.3.5 + commander: 14.0.2 + console-grid: 2.2.3 + eight-colors: 1.3.2 + foreground-child: 3.3.1 + istanbul-lib-coverage: 3.2.2 + istanbul-lib-report: 3.0.1 + istanbul-reports: 3.2.0 + lz-utils: 2.1.0 + monocart-locator: 1.0.2 + + monocart-locator@1.0.2: {} + + monocart-reporter@2.10.0: + dependencies: + console-grid: 2.2.3 + eight-colors: 1.3.2 + koa: 3.2.0 + koa-static-resolver: 1.0.6 + lz-utils: 2.1.0 + monocart-coverage-reports: 2.12.9 + monocart-locator: 1.0.2 + nodemailer: 7.0.13 + motion-dom@12.24.8: dependencies: motion-utils: 12.23.28 @@ -16138,6 +16426,8 @@ snapshots: natural-compare@1.4.0: {} + negotiator@0.6.3: {} + neo-async@2.6.2: {} next-themes@0.4.6(react-dom@18.3.1(react@18.3.1))(react@18.3.1): @@ -16237,6 +16527,8 @@ snapshots: node-releases@2.0.27: {} + nodemailer@7.0.13: {} + normalize-path@3.0.0: {} npm-run-path@4.0.1: @@ -16338,6 +16630,10 @@ snapshots: obug@2.1.1: {} + on-finished@2.4.1: + dependencies: + ee-first: 1.1.1 + once@1.4.0: dependencies: wrappy: 1.0.2 @@ -16495,6 +16791,8 @@ snapshots: entities: 6.0.1 optional: true + 
parseurl@1.3.3: {} + pascal-case@3.1.2: dependencies: no-case: 3.0.4 @@ -17365,6 +17663,8 @@ snapshots: setimmediate@1.0.5: {} + setprototypeof@1.2.0: {} + sha.js@2.4.12: dependencies: inherits: 2.0.4 @@ -17526,6 +17826,8 @@ snapshots: dependencies: type-fest: 0.7.1 + statuses@1.5.0: {} + statuses@2.0.2: {} std-env@3.10.0: {} @@ -17873,6 +18175,8 @@ snapshots: dependencies: is-number: 7.0.0 + toidentifier@1.0.1: {} + tough-cookie@6.0.0: dependencies: tldts: 7.0.19 @@ -17930,6 +18234,8 @@ snapshots: tslib@2.8.1: {} + tsscmp@1.0.6: {} + tty-browserify@0.0.1: {} twemoji-parser@14.0.0: {} @@ -17953,6 +18259,12 @@ snapshots: type-fest@4.41.0: {} + type-is@2.0.1: + dependencies: + content-type: 1.0.5 + media-typer: 1.1.0 + mime-types: 3.0.2 + typed-array-buffer@1.0.3: dependencies: call-bound: 1.0.4 @@ -18182,6 +18494,8 @@ snapshots: validator@13.15.26: {} + vary@1.1.2: {} + vaul@1.1.2(@types/react-dom@18.3.5(@types/react@18.3.17))(@types/react@18.3.17)(react-dom@18.3.1(react@18.3.1))(react@18.3.1): dependencies: '@radix-ui/react-dialog': 1.1.15(@types/react-dom@18.3.5(@types/react@18.3.17))(@types/react@18.3.17)(react-dom@18.3.1(react@18.3.1))(react@18.3.1) diff --git a/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/__tests__/store.test.ts b/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/__tests__/store.test.ts new file mode 100644 index 0000000000..f28d1fc2cb --- /dev/null +++ b/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/__tests__/store.test.ts @@ -0,0 +1,156 @@ +import { describe, it, expect, beforeEach } from "vitest"; +import { useOnboardingWizardStore } from "../store"; + +beforeEach(() => { + useOnboardingWizardStore.getState().reset(); +}); + +describe("useOnboardingWizardStore", () => { + describe("initial state", () => { + it("starts at step 1 with empty fields", () => { + const state = useOnboardingWizardStore.getState(); + expect(state.currentStep).toBe(1); + expect(state.name).toBe(""); + expect(state.role).toBe(""); + expect(state.otherRole).toBe(""); + expect(state.painPoints).toEqual([]); + expect(state.otherPainPoint).toBe(""); + }); + }); + + describe("setName", () => { + it("updates the name", () => { + useOnboardingWizardStore.getState().setName("Alice"); + expect(useOnboardingWizardStore.getState().name).toBe("Alice"); + }); + }); + + describe("setRole", () => { + it("updates the role", () => { + useOnboardingWizardStore.getState().setRole("Engineer"); + expect(useOnboardingWizardStore.getState().role).toBe("Engineer"); + }); + }); + + describe("setOtherRole", () => { + it("updates the other role text", () => { + useOnboardingWizardStore.getState().setOtherRole("Designer"); + expect(useOnboardingWizardStore.getState().otherRole).toBe("Designer"); + }); + }); + + describe("togglePainPoint", () => { + it("adds a pain point", () => { + useOnboardingWizardStore.getState().togglePainPoint("slow builds"); + expect(useOnboardingWizardStore.getState().painPoints).toEqual([ + "slow builds", + ]); + }); + + it("removes a pain point when toggled again", () => { + useOnboardingWizardStore.getState().togglePainPoint("slow builds"); + useOnboardingWizardStore.getState().togglePainPoint("slow builds"); + expect(useOnboardingWizardStore.getState().painPoints).toEqual([]); + }); + + it("handles multiple pain points", () => { + useOnboardingWizardStore.getState().togglePainPoint("slow builds"); + useOnboardingWizardStore.getState().togglePainPoint("no tests"); + expect(useOnboardingWizardStore.getState().painPoints).toEqual([ + "slow builds", + "no tests", + 
]); + + useOnboardingWizardStore.getState().togglePainPoint("slow builds"); + expect(useOnboardingWizardStore.getState().painPoints).toEqual([ + "no tests", + ]); + }); + + it("ignores new selections when at the max limit", () => { + useOnboardingWizardStore.getState().togglePainPoint("a"); + useOnboardingWizardStore.getState().togglePainPoint("b"); + useOnboardingWizardStore.getState().togglePainPoint("c"); + useOnboardingWizardStore.getState().togglePainPoint("d"); + expect(useOnboardingWizardStore.getState().painPoints).toEqual([ + "a", + "b", + "c", + ]); + }); + + it("still allows deselecting when at the max limit", () => { + useOnboardingWizardStore.getState().togglePainPoint("a"); + useOnboardingWizardStore.getState().togglePainPoint("b"); + useOnboardingWizardStore.getState().togglePainPoint("c"); + useOnboardingWizardStore.getState().togglePainPoint("b"); + expect(useOnboardingWizardStore.getState().painPoints).toEqual([ + "a", + "c", + ]); + }); + }); + + describe("setOtherPainPoint", () => { + it("updates the other pain point text", () => { + useOnboardingWizardStore.getState().setOtherPainPoint("flaky CI"); + expect(useOnboardingWizardStore.getState().otherPainPoint).toBe( + "flaky CI", + ); + }); + }); + + describe("nextStep", () => { + it("increments the step", () => { + useOnboardingWizardStore.getState().nextStep(); + expect(useOnboardingWizardStore.getState().currentStep).toBe(2); + }); + + it("clamps at step 4", () => { + useOnboardingWizardStore.getState().goToStep(4); + useOnboardingWizardStore.getState().nextStep(); + expect(useOnboardingWizardStore.getState().currentStep).toBe(4); + }); + }); + + describe("prevStep", () => { + it("decrements the step", () => { + useOnboardingWizardStore.getState().goToStep(3); + useOnboardingWizardStore.getState().prevStep(); + expect(useOnboardingWizardStore.getState().currentStep).toBe(2); + }); + + it("clamps at step 1", () => { + useOnboardingWizardStore.getState().prevStep(); + expect(useOnboardingWizardStore.getState().currentStep).toBe(1); + }); + }); + + describe("goToStep", () => { + it("jumps to an arbitrary step", () => { + useOnboardingWizardStore.getState().goToStep(3); + expect(useOnboardingWizardStore.getState().currentStep).toBe(3); + }); + }); + + describe("reset", () => { + it("resets all fields to defaults", () => { + useOnboardingWizardStore.getState().setName("Alice"); + useOnboardingWizardStore.getState().setRole("Engineer"); + useOnboardingWizardStore.getState().setOtherRole("Other"); + useOnboardingWizardStore.getState().togglePainPoint("slow builds"); + useOnboardingWizardStore.getState().setOtherPainPoint("flaky CI"); + useOnboardingWizardStore.getState().goToStep(3); + + useOnboardingWizardStore.getState().reset(); + + const state = useOnboardingWizardStore.getState(); + expect(state.currentStep).toBe(1); + expect(state.name).toBe(""); + expect(state.role).toBe(""); + expect(state.otherRole).toBe(""); + expect(state.painPoints).toEqual([]); + expect(state.otherPainPoint).toBe(""); + }); + }); +}); diff --git a/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/components/ProgressBar.tsx b/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/components/ProgressBar.tsx index aee653d93f..71819d7d4c 100644 --- a/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/components/ProgressBar.tsx +++ b/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/components/ProgressBar.tsx @@ -7,9 +7,9 @@ export function ProgressBar({ currentStep, totalSteps }: Props) { const percent = (currentStep / 
totalSteps) * 100; return ( -
+
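One detail worth calling out in the store tests above: they re-read `getState()` after every action rather than holding one reference. That is deliberate; zustand's `set()` swaps in a fresh state object, so earlier snapshots keep stale values. A minimal sketch:

```ts
import { useOnboardingWizardStore } from "../store";

// set() replaces the state object, so a snapshot taken before an action is
// frozen in time. Always re-read getState() when asserting.
const before = useOnboardingWizardStore.getState();
useOnboardingWizardStore.getState().setName("Bob");

console.log(before.name); // "" (stale snapshot)
console.log(useOnboardingWizardStore.getState().name); // "Bob"
```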
diff --git a/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/components/SelectableCard.tsx b/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/components/SelectableCard.tsx index 7559ff3e21..574f02fd7b 100644 --- a/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/components/SelectableCard.tsx +++ b/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/components/SelectableCard.tsx @@ -2,6 +2,7 @@ import { Text } from "@/components/atoms/Text/Text"; import { cn } from "@/lib/utils"; +import { Check } from "@phosphor-icons/react"; interface Props { icon: React.ReactNode; @@ -24,13 +25,18 @@ export function SelectableCard({ onClick={onClick} aria-pressed={selected} className={cn( - "flex h-[9rem] w-[10.375rem] shrink-0 flex-col items-center justify-center gap-3 rounded-xl border-2 bg-white px-6 py-5 transition-all hover:shadow-sm md:shrink lg:gap-2 lg:px-10 lg:py-8", + "relative flex h-[9rem] w-[10.375rem] shrink-0 flex-col items-center justify-center gap-3 rounded-xl border-2 bg-white px-6 py-5 transition-all hover:shadow-sm md:shrink lg:gap-2 lg:px-10 lg:py-8", className, selected ? "border-purple-500 bg-purple-50 shadow-sm" : "border-transparent", )} > + {selected && ( + + + + )} - Pick the tasks you'd love to hand off to Autopilot + Pick the tasks you'd love to hand off to AutoPilot
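The selected-state badge markup from the SelectableCard change above did not survive in this rendering of the diff. The sketch below reconstructs the pattern only (an absolutely positioned check inside the now-relative card button); class names and sizes are assumed, not copied from the repo:

```tsx
import { Check } from "@phosphor-icons/react";

// Hypothetical sketch: renders nothing unless selected. The parent card
// button must be position: relative (added in the diff above) for the
// absolute positioning to pin to the card's corner.
export function SelectedBadge({ selected }: { selected: boolean }) {
  if (!selected) return null;
  return (
    <span className="absolute right-2 top-2 rounded-full bg-purple-500 p-1">
      <Check size={12} weight="bold" className="text-white" />
    </span>
  );
}
```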
@@ -107,11 +110,22 @@ export function PainPointsStep() { /> ))}
- {!hasSomethingElse ? ( - - Pick as many as you want — you can always change later - - ) : null} + + {shaking + ? "You've picked 3 — tap one to swap it out" + : atLimit && canContinue + ? "3 selected — you're all set!" + : atLimit && hasSomethingElse + ? "Tell us what else takes up your time" + : "Pick up to 3 to start — AutoPilot can help with anything else later"} + {hasSomethingElse && ( @@ -133,7 +147,7 @@ export function PainPointsStep() { disabled={!canContinue} className="w-full max-w-xs" > - Launch Autopilot + Launch AutoPilot diff --git a/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/steps/RoleStep.tsx b/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/steps/RoleStep.tsx index 79704e3e31..9bb6af42cd 100644 --- a/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/steps/RoleStep.tsx +++ b/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/steps/RoleStep.tsx @@ -8,6 +8,7 @@ import { FadeIn } from "@/components/atoms/FadeIn/FadeIn"; import { SelectableCard } from "../components/SelectableCard"; import { useOnboardingWizardStore } from "../store"; import { Emoji } from "@/components/atoms/Emoji/Emoji"; +import { useEffect, useRef } from "react"; const IMG_SIZE = 42; @@ -57,12 +58,26 @@ export function RoleStep() { const setRole = useOnboardingWizardStore((s) => s.setRole); const setOtherRole = useOnboardingWizardStore((s) => s.setOtherRole); const nextStep = useOnboardingWizardStore((s) => s.nextStep); + const autoAdvanceTimer = useRef | null>(null); const isOther = role === "Other"; - const canContinue = role && (!isOther || otherRole.trim()); - function handleContinue() { - if (canContinue) { + useEffect(() => { + return () => { + if (autoAdvanceTimer.current) clearTimeout(autoAdvanceTimer.current); + }; + }, []); + + function handleRoleSelect(id: string) { + if (autoAdvanceTimer.current) clearTimeout(autoAdvanceTimer.current); + setRole(id); + if (id !== "Other") { + autoAdvanceTimer.current = setTimeout(nextStep, 350); + } + } + + function handleOtherContinue() { + if (otherRole.trim()) { nextStep(); } } @@ -78,7 +93,7 @@ export function RoleStep() { What best describes you, {name}? - Autopilot will tailor automations to your world + So AutoPilot knows how to help you best @@ -89,33 +104,35 @@ export function RoleStep() { icon={r.icon} label={r.label} selected={role === r.id} - onClick={() => setRole(r.id)} + onClick={() => handleRoleSelect(r.id)} className="p-8" /> ))} {isOther && ( -
- setOtherRole(e.target.value)} - autoFocus - /> -
- )} + <> +
+ setOtherRole(e.target.value)} + autoFocus + /> +
- + + + )} ); diff --git a/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/steps/WelcomeStep.tsx b/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/steps/WelcomeStep.tsx index fa054161cc..06ce9b57b7 100644 --- a/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/steps/WelcomeStep.tsx +++ b/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/steps/WelcomeStep.tsx @@ -4,13 +4,6 @@ import { AutoGPTLogo } from "@/components/atoms/AutoGPTLogo/AutoGPTLogo"; import { Button } from "@/components/atoms/Button/Button"; import { Input } from "@/components/atoms/Input/Input"; import { Text } from "@/components/atoms/Text/Text"; -import { - Tooltip, - TooltipContent, - TooltipProvider, - TooltipTrigger, -} from "@/components/atoms/Tooltip/BaseTooltip"; -import { Question } from "@phosphor-icons/react"; import { FadeIn } from "@/components/atoms/FadeIn/FadeIn"; import { useOnboardingWizardStore } from "../store"; @@ -40,36 +33,16 @@ export function WelcomeStep() { Welcome to AutoGPT Let's personalize your experience so{" "} - - Autopilot - - - - - - - - Autopilot is AutoGPT's AI assistant that watches your - connected apps, spots repetitive tasks you do every day - and runs them for you automatically. - - - - + + AutoPilot {" "} - can start saving you time right away + can start saving you time setName(e.target.value)} diff --git a/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/steps/__tests__/PainPointsStep.test.tsx b/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/steps/__tests__/PainPointsStep.test.tsx new file mode 100644 index 0000000000..f6843f7998 --- /dev/null +++ b/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/steps/__tests__/PainPointsStep.test.tsx @@ -0,0 +1,154 @@ +import { + render, + screen, + fireEvent, + cleanup, +} from "@/tests/integrations/test-utils"; +import { afterEach, beforeEach, describe, expect, test, vi } from "vitest"; +import { useOnboardingWizardStore } from "../../store"; +import { PainPointsStep } from "../PainPointsStep"; + +vi.mock("@/components/atoms/Emoji/Emoji", () => ({ + Emoji: ({ text }: { text: string }) => {text}, +})); + +vi.mock("@/components/atoms/FadeIn/FadeIn", () => ({ + FadeIn: ({ children }: { children: React.ReactNode }) => ( +
{children}
+ ), +})); + +function getCard(name: RegExp) { + return screen.getByRole("button", { name }); +} + +function clickCard(name: RegExp) { + fireEvent.click(getCard(name)); +} + +function getLaunchButton() { + return screen.getByRole("button", { name: /launch autopilot/i }); +} + +afterEach(cleanup); + +beforeEach(() => { + useOnboardingWizardStore.getState().reset(); + useOnboardingWizardStore.getState().setName("Alice"); + useOnboardingWizardStore.getState().setRole("Founder/CEO"); + useOnboardingWizardStore.getState().goToStep(3); +}); + +describe("PainPointsStep", () => { + test("renders all pain point cards", () => { + render(); + + expect(getCard(/finding leads/i)).toBeDefined(); + expect(getCard(/email & outreach/i)).toBeDefined(); + expect(getCard(/reports & data/i)).toBeDefined(); + expect(getCard(/customer support/i)).toBeDefined(); + expect(getCard(/social media/i)).toBeDefined(); + expect(getCard(/something else/i)).toBeDefined(); + }); + + test("shows default helper text", () => { + render(); + + expect( + screen.getAllByText(/pick up to 3 to start/i).length, + ).toBeGreaterThan(0); + }); + + test("selecting a card marks it as pressed", () => { + render(); + + clickCard(/finding leads/i); + + expect(getCard(/finding leads/i).getAttribute("aria-pressed")).toBe("true"); + }); + + test("launch button is disabled when nothing is selected", () => { + render(); + + expect(getLaunchButton().hasAttribute("disabled")).toBe(true); + }); + + test("launch button is enabled after selecting a pain point", () => { + render(); + + clickCard(/finding leads/i); + + expect(getLaunchButton().hasAttribute("disabled")).toBe(false); + }); + + test("shows success text when 3 items are selected", () => { + render(); + + clickCard(/finding leads/i); + clickCard(/email & outreach/i); + clickCard(/reports & data/i); + + expect(screen.getAllByText(/3 selected/i).length).toBeGreaterThan(0); + }); + + test("does not select a 4th item when at the limit", () => { + render(); + + clickCard(/finding leads/i); + clickCard(/email & outreach/i); + clickCard(/reports & data/i); + clickCard(/customer support/i); + + expect(getCard(/customer support/i).getAttribute("aria-pressed")).toBe( + "false", + ); + }); + + test("can deselect when at the limit and select a different one", () => { + render(); + + clickCard(/finding leads/i); + clickCard(/email & outreach/i); + clickCard(/reports & data/i); + + clickCard(/finding leads/i); + expect(getCard(/finding leads/i).getAttribute("aria-pressed")).toBe( + "false", + ); + + clickCard(/customer support/i); + expect(getCard(/customer support/i).getAttribute("aria-pressed")).toBe( + "true", + ); + }); + + test("shows input when 'Something else' is selected", () => { + render(); + + clickCard(/something else/i); + + expect( + screen.getByPlaceholderText(/what else takes up your time/i), + ).toBeDefined(); + }); + + test("launch button is disabled when 'Something else' selected but input empty", () => { + render(); + + clickCard(/something else/i); + + expect(getLaunchButton().hasAttribute("disabled")).toBe(true); + }); + + test("launch button is enabled when 'Something else' selected and input filled", () => { + render(); + + clickCard(/something else/i); + fireEvent.change( + screen.getByPlaceholderText(/what else takes up your time/i), + { target: { value: "Manual invoicing" } }, + ); + + expect(getLaunchButton().hasAttribute("disabled")).toBe(false); + }); +}); diff --git a/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/steps/__tests__/RoleStep.test.tsx 
b/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/steps/__tests__/RoleStep.test.tsx new file mode 100644 index 0000000000..0cafccab98 --- /dev/null +++ b/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/steps/__tests__/RoleStep.test.tsx @@ -0,0 +1,123 @@ +import { + render, + screen, + fireEvent, + cleanup, +} from "@/tests/integrations/test-utils"; +import { afterEach, beforeEach, describe, expect, test, vi } from "vitest"; +import { useOnboardingWizardStore } from "../../store"; +import { RoleStep } from "../RoleStep"; + +vi.mock("@/components/atoms/Emoji/Emoji", () => ({ + Emoji: ({ text }: { text: string }) => {text}, +})); + +vi.mock("@/components/atoms/FadeIn/FadeIn", () => ({ + FadeIn: ({ children }: { children: React.ReactNode }) => ( +
{children}
+ ), +})); + +afterEach(() => { + cleanup(); + vi.useRealTimers(); +}); + +beforeEach(() => { + vi.useFakeTimers(); + useOnboardingWizardStore.getState().reset(); + useOnboardingWizardStore.getState().setName("Alice"); + useOnboardingWizardStore.getState().goToStep(2); +}); + +describe("RoleStep", () => { + test("renders all role cards", () => { + render(); + + expect(screen.getByText("Founder / CEO")).toBeDefined(); + expect(screen.getByText("Operations")).toBeDefined(); + expect(screen.getByText("Sales / BD")).toBeDefined(); + expect(screen.getByText("Marketing")).toBeDefined(); + expect(screen.getByText("Product / PM")).toBeDefined(); + expect(screen.getByText("Engineering")).toBeDefined(); + expect(screen.getByText("HR / People")).toBeDefined(); + expect(screen.getByText("Other")).toBeDefined(); + }); + + test("displays the user name in the heading", () => { + render(); + + expect( + screen.getAllByText(/what best describes you, alice/i).length, + ).toBeGreaterThan(0); + }); + + test("selecting a non-Other role auto-advances after delay", () => { + render(); + + fireEvent.click(screen.getByRole("button", { name: /engineering/i })); + + expect(useOnboardingWizardStore.getState().role).toBe("Engineering"); + expect(useOnboardingWizardStore.getState().currentStep).toBe(2); + + vi.advanceTimersByTime(350); + + expect(useOnboardingWizardStore.getState().currentStep).toBe(3); + }); + + test("selecting 'Other' does not auto-advance", () => { + render(); + + fireEvent.click(screen.getByRole("button", { name: /\bother\b/i })); + + vi.advanceTimersByTime(500); + + expect(useOnboardingWizardStore.getState().currentStep).toBe(2); + }); + + test("selecting 'Other' shows text input and Continue button", () => { + render(); + + fireEvent.click(screen.getByRole("button", { name: /\bother\b/i })); + + expect(screen.getByPlaceholderText(/describe your role/i)).toBeDefined(); + expect(screen.getByRole("button", { name: /continue/i })).toBeDefined(); + }); + + test("Continue button is disabled when Other input is empty", () => { + render(); + + fireEvent.click(screen.getByRole("button", { name: /\bother\b/i })); + + const continueBtn = screen.getByRole("button", { name: /continue/i }); + expect(continueBtn.hasAttribute("disabled")).toBe(true); + }); + + test("Continue button advances when Other role text is filled", () => { + render(); + + fireEvent.click(screen.getByRole("button", { name: /\bother\b/i })); + fireEvent.change(screen.getByPlaceholderText(/describe your role/i), { + target: { value: "Designer" }, + }); + + const continueBtn = screen.getByRole("button", { name: /continue/i }); + expect(continueBtn.hasAttribute("disabled")).toBe(false); + + fireEvent.click(continueBtn); + expect(useOnboardingWizardStore.getState().currentStep).toBe(3); + }); + + test("switching from Other to a regular role cancels Other and auto-advances", () => { + render(); + + fireEvent.click(screen.getByRole("button", { name: /\bother\b/i })); + expect(screen.getByPlaceholderText(/describe your role/i)).toBeDefined(); + + fireEvent.click(screen.getByRole("button", { name: /marketing/i })); + + expect(useOnboardingWizardStore.getState().role).toBe("Marketing"); + vi.advanceTimersByTime(350); + expect(useOnboardingWizardStore.getState().currentStep).toBe(3); + }); +}); diff --git a/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/steps/usePainPointsStep.ts b/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/steps/usePainPointsStep.ts index bf8f5e59cc..384a43e80c 100644 --- 
a/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/steps/usePainPointsStep.ts +++ b/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/steps/usePainPointsStep.ts @@ -1,4 +1,5 @@ -import { useOnboardingWizardStore } from "../store"; +import { useEffect, useRef, useState } from "react"; +import { MAX_PAIN_POINT_SELECTIONS, useOnboardingWizardStore } from "../store"; const ROLE_TOP_PICKS: Record = { "Founder/CEO": [ @@ -23,18 +24,38 @@ export function usePainPointsStep() { const role = useOnboardingWizardStore((s) => s.role); const painPoints = useOnboardingWizardStore((s) => s.painPoints); const otherPainPoint = useOnboardingWizardStore((s) => s.otherPainPoint); - const togglePainPoint = useOnboardingWizardStore((s) => s.togglePainPoint); + const storeToggle = useOnboardingWizardStore((s) => s.togglePainPoint); const setOtherPainPoint = useOnboardingWizardStore( (s) => s.setOtherPainPoint, ); const nextStep = useOnboardingWizardStore((s) => s.nextStep); + const [shaking, setShaking] = useState(false); + const shakeTimer = useRef | null>(null); + + useEffect(() => { + return () => { + if (shakeTimer.current) clearTimeout(shakeTimer.current); + }; + }, []); const topIDs = getTopPickIDs(role); const hasSomethingElse = painPoints.includes("Something else"); + const atLimit = painPoints.length >= MAX_PAIN_POINT_SELECTIONS; const canContinue = painPoints.length > 0 && (!hasSomethingElse || Boolean(otherPainPoint.trim())); + function togglePainPoint(id: string) { + const alreadySelected = painPoints.includes(id); + if (!alreadySelected && atLimit) { + if (shakeTimer.current) clearTimeout(shakeTimer.current); + setShaking(true); + shakeTimer.current = setTimeout(() => setShaking(false), 600); + return; + } + storeToggle(id); + } + function handleLaunch() { if (canContinue) { nextStep(); @@ -48,6 +69,8 @@ export function usePainPointsStep() { togglePainPoint, setOtherPainPoint, hasSomethingElse, + atLimit, + shaking, canContinue, handleLaunch, }; diff --git a/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/store.ts b/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/store.ts index edc5ffa020..fe5e52b8c1 100644 --- a/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/store.ts +++ b/autogpt_platform/frontend/src/app/(no-navbar)/onboarding/store.ts @@ -1,5 +1,6 @@ import { create } from "zustand"; +export const MAX_PAIN_POINT_SELECTIONS = 3; export type Step = 1 | 2 | 3 | 4; interface OnboardingWizardState { @@ -40,6 +41,8 @@ export const useOnboardingWizardStore = create( togglePainPoint(painPoint) { set((state) => { const exists = state.painPoints.includes(painPoint); + if (!exists && state.painPoints.length >= MAX_PAIN_POINT_SELECTIONS) + return state; return { painPoints: exists ? state.painPoints.filter((p) => p !== painPoint) diff --git a/autogpt_platform/frontend/src/app/(platform)/build/components/FlowEditor/nodes/CustomNode/components/NodeOutput/components/ContentRenderer.tsx b/autogpt_platform/frontend/src/app/(platform)/build/components/FlowEditor/nodes/CustomNode/components/NodeOutput/components/ContentRenderer.tsx index d7b2e11819..b25dba32a4 100644 --- a/autogpt_platform/frontend/src/app/(platform)/build/components/FlowEditor/nodes/CustomNode/components/NodeOutput/components/ContentRenderer.tsx +++ b/autogpt_platform/frontend/src/app/(platform)/build/components/FlowEditor/nodes/CustomNode/components/NodeOutput/components/ContentRenderer.tsx @@ -40,14 +40,14 @@ export const ContentRenderer: React.FC<{ !shortContent ) { return ( -
+
{renderer?.render(value, metadata)}
); } return ( -
+
); diff --git a/autogpt_platform/frontend/src/app/(platform)/copilot/CopilotPage.tsx b/autogpt_platform/frontend/src/app/(platform)/copilot/CopilotPage.tsx index 90084bc535..46fbe1ed6e 100644 --- a/autogpt_platform/frontend/src/app/(platform)/copilot/CopilotPage.tsx +++ b/autogpt_platform/frontend/src/app/(platform)/copilot/CopilotPage.tsx @@ -8,6 +8,7 @@ import { Flag, useGetFlag } from "@/services/feature-flags/use-get-flag"; import { SidebarProvider } from "@/components/ui/sidebar"; import { cn } from "@/lib/utils"; import { UploadSimple } from "@phosphor-icons/react"; +import dynamic from "next/dynamic"; import { useCallback, useEffect, useRef, useState } from "react"; import { ChatContainer } from "./components/ChatContainer/ChatContainer"; import { ChatSidebar } from "./components/ChatSidebar/ChatSidebar"; @@ -20,6 +21,14 @@ import { RateLimitResetDialog } from "./components/RateLimitResetDialog/RateLimi import { ScaleLoader } from "./components/ScaleLoader/ScaleLoader"; import { useCopilotPage } from "./useCopilotPage"; +const ArtifactPanel = dynamic( + () => + import("./components/ArtifactPanel/ArtifactPanel").then( + (m) => m.ArtifactPanel, + ), + { ssr: false }, +); + export function CopilotPage() { const [isDragging, setIsDragging] = useState(false); const [droppedFiles, setDroppedFiles] = useState([]); @@ -80,6 +89,10 @@ export function CopilotPage() { isUploadingFiles, isUserLoading, isLoggedIn, + // Pagination + hasMoreMessages, + isLoadingMore, + loadMore, // Mobile drawer isMobile, isDrawerOpen, @@ -116,6 +129,7 @@ export function CopilotPage() { const resetCost = usage?.reset_cost; const isBillingEnabled = useGetFlag(Flag.ENABLE_PLATFORM_PAYMENT); + const isArtifactsEnabled = useGetFlag(Flag.ARTIFACTS); const { credits, fetchCredits } = useCredits({ fetchInitialCredits: true }); const hasInsufficientCredits = credits !== null && resetCost != null && credits < resetCost; @@ -150,48 +164,55 @@ export function CopilotPage() { className="h-[calc(100vh-72px)] min-h-0" > {!isMobile && } -
- {isMobile && } - - {/* Drop overlay */} +
- - - Drop files here - -
-
- + {isMobile && } + + {/* Drop overlay */} +
+ + + Drop files here + +
+
+ +
+ {!isMobile && isArtifactsEnabled && }
+ {isMobile && isArtifactsEnabled && } {isMobile && ( ({ + captureException: vi.fn(), +})); + +vi.mock("@/services/environment", () => ({ + environment: { + isServerSide: vi.fn(() => false), + }, +})); + +describe("useCopilotUIStore", () => { + beforeEach(() => { + window.localStorage.clear(); + useCopilotUIStore.setState({ + initialPrompt: null, + sessionToDelete: null, + isDrawerOpen: false, + completedSessionIDs: new Set(), + isNotificationsEnabled: false, + isSoundEnabled: true, + showNotificationDialog: false, + copilotMode: "extended_thinking", + }); + }); + + describe("initialPrompt", () => { + it("starts as null", () => { + expect(useCopilotUIStore.getState().initialPrompt).toBeNull(); + }); + + it("sets and clears prompt", () => { + useCopilotUIStore.getState().setInitialPrompt("Hello"); + expect(useCopilotUIStore.getState().initialPrompt).toBe("Hello"); + + useCopilotUIStore.getState().setInitialPrompt(null); + expect(useCopilotUIStore.getState().initialPrompt).toBeNull(); + }); + }); + + describe("sessionToDelete", () => { + it("starts as null", () => { + expect(useCopilotUIStore.getState().sessionToDelete).toBeNull(); + }); + + it("sets and clears a delete target", () => { + useCopilotUIStore + .getState() + .setSessionToDelete({ id: "abc", title: "Test" }); + expect(useCopilotUIStore.getState().sessionToDelete).toEqual({ + id: "abc", + title: "Test", + }); + + useCopilotUIStore.getState().setSessionToDelete(null); + expect(useCopilotUIStore.getState().sessionToDelete).toBeNull(); + }); + }); + + describe("drawer", () => { + it("starts closed", () => { + expect(useCopilotUIStore.getState().isDrawerOpen).toBe(false); + }); + + it("opens and closes", () => { + useCopilotUIStore.getState().setDrawerOpen(true); + expect(useCopilotUIStore.getState().isDrawerOpen).toBe(true); + + useCopilotUIStore.getState().setDrawerOpen(false); + expect(useCopilotUIStore.getState().isDrawerOpen).toBe(false); + }); + }); + + describe("completedSessionIDs", () => { + it("starts empty", () => { + expect(useCopilotUIStore.getState().completedSessionIDs.size).toBe(0); + }); + + it("adds a completed session", () => { + useCopilotUIStore.getState().addCompletedSession("s1"); + expect(useCopilotUIStore.getState().completedSessionIDs.has("s1")).toBe( + true, + ); + }); + + it("persists added sessions to localStorage", () => { + useCopilotUIStore.getState().addCompletedSession("s1"); + useCopilotUIStore.getState().addCompletedSession("s2"); + const raw = window.localStorage.getItem("copilot-completed-sessions"); + expect(raw).not.toBeNull(); + const parsed = JSON.parse(raw!) as string[]; + expect(parsed).toContain("s1"); + expect(parsed).toContain("s2"); + }); + + it("clears a single completed session", () => { + useCopilotUIStore.getState().addCompletedSession("s1"); + useCopilotUIStore.getState().addCompletedSession("s2"); + useCopilotUIStore.getState().clearCompletedSession("s1"); + expect(useCopilotUIStore.getState().completedSessionIDs.has("s1")).toBe( + false, + ); + expect(useCopilotUIStore.getState().completedSessionIDs.has("s2")).toBe( + true, + ); + }); + + it("updates localStorage when a session is cleared", () => { + useCopilotUIStore.getState().addCompletedSession("s1"); + useCopilotUIStore.getState().addCompletedSession("s2"); + useCopilotUIStore.getState().clearCompletedSession("s1"); + const raw = window.localStorage.getItem("copilot-completed-sessions"); + const parsed = JSON.parse(raw!) 
as string[]; + expect(parsed).not.toContain("s1"); + expect(parsed).toContain("s2"); + }); + + it("clears all completed sessions", () => { + useCopilotUIStore.getState().addCompletedSession("s1"); + useCopilotUIStore.getState().addCompletedSession("s2"); + useCopilotUIStore.getState().clearAllCompletedSessions(); + expect(useCopilotUIStore.getState().completedSessionIDs.size).toBe(0); + }); + + it("removes localStorage key when all sessions are cleared", () => { + useCopilotUIStore.getState().addCompletedSession("s1"); + useCopilotUIStore.getState().clearAllCompletedSessions(); + expect( + window.localStorage.getItem("copilot-completed-sessions"), + ).toBeNull(); + }); + }); + + describe("sound toggle", () => { + it("starts enabled", () => { + expect(useCopilotUIStore.getState().isSoundEnabled).toBe(true); + }); + + it("toggles sound off and on", () => { + useCopilotUIStore.getState().toggleSound(); + expect(useCopilotUIStore.getState().isSoundEnabled).toBe(false); + + useCopilotUIStore.getState().toggleSound(); + expect(useCopilotUIStore.getState().isSoundEnabled).toBe(true); + }); + + it("persists to localStorage", () => { + useCopilotUIStore.getState().toggleSound(); + expect(window.localStorage.getItem("copilot-sound-enabled")).toBe( + "false", + ); + }); + }); + + describe("copilotMode", () => { + it("defaults to extended_thinking", () => { + expect(useCopilotUIStore.getState().copilotMode).toBe( + "extended_thinking", + ); + }); + + it("sets mode to fast", () => { + useCopilotUIStore.getState().setCopilotMode("fast"); + expect(useCopilotUIStore.getState().copilotMode).toBe("fast"); + expect(window.localStorage.getItem("copilot-mode")).toBe("fast"); + }); + + it("sets mode back to extended_thinking", () => { + useCopilotUIStore.getState().setCopilotMode("fast"); + useCopilotUIStore.getState().setCopilotMode("extended_thinking"); + expect(useCopilotUIStore.getState().copilotMode).toBe( + "extended_thinking", + ); + }); + }); + + describe("clearCopilotLocalData", () => { + it("resets state and clears localStorage keys", () => { + useCopilotUIStore.getState().setCopilotMode("fast"); + useCopilotUIStore.getState().setNotificationsEnabled(true); + useCopilotUIStore.getState().toggleSound(); + useCopilotUIStore.getState().addCompletedSession("s1"); + + useCopilotUIStore.getState().clearCopilotLocalData(); + + const state = useCopilotUIStore.getState(); + expect(state.copilotMode).toBe("extended_thinking"); + expect(state.isNotificationsEnabled).toBe(false); + expect(state.isSoundEnabled).toBe(true); + expect(state.completedSessionIDs.size).toBe(0); + expect(window.localStorage.getItem("copilot-mode")).toBeNull(); + expect( + window.localStorage.getItem("copilot-notifications-enabled"), + ).toBeNull(); + expect(window.localStorage.getItem("copilot-sound-enabled")).toBeNull(); + expect( + window.localStorage.getItem("copilot-completed-sessions"), + ).toBeNull(); + }); + }); + + describe("notifications", () => { + it("sets notification preference", () => { + useCopilotUIStore.getState().setNotificationsEnabled(true); + expect(useCopilotUIStore.getState().isNotificationsEnabled).toBe(true); + expect(window.localStorage.getItem("copilot-notifications-enabled")).toBe( + "true", + ); + }); + + it("shows and hides notification dialog", () => { + useCopilotUIStore.getState().setShowNotificationDialog(true); + expect(useCopilotUIStore.getState().showNotificationDialog).toBe(true); + + useCopilotUIStore.getState().setShowNotificationDialog(false); + 
expect(useCopilotUIStore.getState().showNotificationDialog).toBe(false); + }); + }); +}); diff --git a/autogpt_platform/frontend/src/app/(platform)/copilot/components/ArtifactCard/ArtifactCard.tsx b/autogpt_platform/frontend/src/app/(platform)/copilot/components/ArtifactCard/ArtifactCard.tsx new file mode 100644 index 0000000000..554d760215 --- /dev/null +++ b/autogpt_platform/frontend/src/app/(platform)/copilot/components/ArtifactCard/ArtifactCard.tsx @@ -0,0 +1,114 @@ +"use client"; + +import { toast } from "@/components/molecules/Toast/use-toast"; +import { cn } from "@/lib/utils"; +import { CaretRight, DownloadSimple } from "@phosphor-icons/react"; +import type { ArtifactRef } from "../../store"; +import { useCopilotUIStore } from "../../store"; +import { downloadArtifact } from "../ArtifactPanel/downloadArtifact"; +import { classifyArtifact } from "../ArtifactPanel/helpers"; + +interface Props { + artifact: ArtifactRef; +} + +function formatSize(bytes?: number): string { + if (!bytes) return ""; + if (bytes < 1024) return `${bytes} B`; + if (bytes < 1024 * 1024) return `${(bytes / 1024).toFixed(1)} KB`; + return `${(bytes / (1024 * 1024)).toFixed(1)} MB`; +} + +export function ArtifactCard({ artifact }: Props) { + const activeID = useCopilotUIStore((s) => s.artifactPanel.activeArtifact?.id); + const isOpen = useCopilotUIStore((s) => s.artifactPanel.isOpen); + const openArtifact = useCopilotUIStore((s) => s.openArtifact); + + const isActive = isOpen && activeID === artifact.id; + const classification = classifyArtifact( + artifact.mimeType, + artifact.title, + artifact.sizeBytes, + ); + const Icon = classification.icon; + + function handleDownloadOnly() { + downloadArtifact(artifact).catch(() => { + toast({ + title: "Download failed", + description: "Couldn't fetch the file.", + variant: "destructive", + }); + }); + } + + if (!classification.openable) { + return ( + + ); + } + + return ( + + ); +} diff --git a/autogpt_platform/frontend/src/app/(platform)/copilot/components/ArtifactPanel/ArtifactPanel.tsx b/autogpt_platform/frontend/src/app/(platform)/copilot/components/ArtifactPanel/ArtifactPanel.tsx new file mode 100644 index 0000000000..78e79e50e8 --- /dev/null +++ b/autogpt_platform/frontend/src/app/(platform)/copilot/components/ArtifactPanel/ArtifactPanel.tsx @@ -0,0 +1,125 @@ +"use client"; + +import { + Sheet, + SheetContent, + SheetHeader, + SheetTitle, +} from "@/components/ui/sheet"; +import { AnimatePresence, motion } from "framer-motion"; +import { ArtifactContent } from "./components/ArtifactContent"; +import { ArtifactDragHandle } from "./components/ArtifactDragHandle"; +import { ArtifactMinimizedStrip } from "./components/ArtifactMinimizedStrip"; +import { ArtifactPanelHeader } from "./components/ArtifactPanelHeader"; +import { useArtifactPanel } from "./useArtifactPanel"; + +interface Props { + mobile?: boolean; +} + +export function ArtifactPanel({ mobile }: Props) { + const { + isOpen, + isMinimized, + isMaximized, + activeArtifact, + history, + effectiveWidth, + isSourceView, + classification, + setIsSourceView, + closeArtifactPanel, + minimizeArtifactPanel, + maximizeArtifactPanel, + restoreArtifactPanel, + setArtifactPanelWidth, + goBackArtifact, + canCopy, + handleCopy, + handleDownload, + } = useArtifactPanel(); + + if (!activeArtifact || !classification) return null; + + const headerProps = { + artifact: activeArtifact, + classification, + canGoBack: history.length > 0, + isMaximized, + isSourceView, + hasSourceToggle: classification.hasSourceToggle, + mobile: 
!!mobile, + canCopy, + onBack: goBackArtifact, + onClose: closeArtifactPanel, + onMinimize: minimizeArtifactPanel, + onMaximize: maximizeArtifactPanel, + onRestore: restoreArtifactPanel, + onCopy: handleCopy, + onDownload: handleDownload, + onSourceToggle: setIsSourceView, + }; + + // Mobile: fullscreen Sheet overlay + if (mobile) { + return ( + !open && closeArtifactPanel()} + > + + + {activeArtifact.title} + + + + + + ); + } + + // Minimized strip + if (isOpen && isMinimized) { + return ( + + ); + } + + // Keep AnimatePresence mounted across the open→closed transition so the + // exit animation on the motion.div has a chance to run. + return ( + + {isOpen && ( + + + + + + )} + + ); +} diff --git a/autogpt_platform/frontend/src/app/(platform)/copilot/components/ArtifactPanel/components/ArtifactContent.tsx b/autogpt_platform/frontend/src/app/(platform)/copilot/components/ArtifactPanel/components/ArtifactContent.tsx new file mode 100644 index 0000000000..6e057293b5 --- /dev/null +++ b/autogpt_platform/frontend/src/app/(platform)/copilot/components/ArtifactPanel/components/ArtifactContent.tsx @@ -0,0 +1,198 @@ +"use client"; + +import { globalRegistry } from "@/components/contextual/OutputRenderers"; +import { codeRenderer } from "@/components/contextual/OutputRenderers/renderers/CodeRenderer"; +import { Suspense } from "react"; +import type { ArtifactRef } from "../../../store"; +import type { ArtifactClassification } from "../helpers"; +import { ArtifactReactPreview } from "./ArtifactReactPreview"; +import { ArtifactSkeleton } from "./ArtifactSkeleton"; +import { + TAILWIND_CDN_URL, + wrapWithHeadInjection, +} from "@/lib/iframe-sandbox-csp"; +import { useArtifactContent } from "./useArtifactContent"; + +interface Props { + artifact: ArtifactRef; + isSourceView: boolean; + classification: ArtifactClassification; +} + +function ArtifactContentLoader({ + artifact, + isSourceView, + classification, +}: Props) { + const { content, pdfUrl, isLoading, error, scrollRef, retry } = + useArtifactContent(artifact, classification); + + if (isLoading) { + return ; + } + + if (error) { + return ( +
+

Failed to load content

+

{error}

+ +
+ ); + } + + return ( +
+ +
+ ); +} + +function ArtifactRenderer({ + artifact, + content, + pdfUrl, + isSourceView, + classification, +}: { + artifact: ArtifactRef; + content: string | null; + pdfUrl: string | null; + isSourceView: boolean; + classification: ArtifactClassification; +}) { + // Image: render directly from URL (no content fetch) + if (classification.type === "image") { + return ( +
+ {/* eslint-disable-next-line @next/next/no-img-element */} + {artifact.title} +
+ ); + } + + if (classification.type === "pdf" && pdfUrl) { + // No sandbox — Chrome/Edge block PDF rendering in sandboxed iframes + // (Chromium bug #413851). The blob URL has a null origin so it can't + // access the parent page regardless. + return ( +
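To make the sandbox comment above concrete, here is a minimal sketch of the unsandboxed blob-URL approach. Component and prop names are illustrative, not the repo's:

```tsx
import { useEffect, useState } from "react";

// Chromium refuses to render PDFs inside sandboxed iframes, so no sandbox
// attribute is set. The blob URL has a null origin, which already keeps the
// embedded document from reaching into the parent page.
export function PdfFrame({ bytes, title }: { bytes: BlobPart; title: string }) {
  const [url, setUrl] = useState<string | null>(null);

  useEffect(() => {
    const blobUrl = URL.createObjectURL(
      new Blob([bytes], { type: "application/pdf" }),
    );
    setUrl(blobUrl);
    return () => URL.revokeObjectURL(blobUrl); // release the blob on unmount
  }, [bytes]);

  if (!url) return null;
  return <iframe src={url} title={title} className="h-full w-full border-0" />;
}
```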