fix(schema): rename migration dirs to 14-digit Prisma timestamps, add onUpdate: Cascade

- Rename 20260310_ → 20260310120000_ (schema) and 20260310130000_ (seed)
- Add onUpdate: Cascade to LlmModelMigration FK relations
Seed LLM model creators and link models
2026-04-08 03:00:28 -04:00 · 2026-04-04 20:47:49 +00:00 · 2026-03-25 13:57:41 +00:00 · 2026-03-23 13:43:15 +00:00 · 2026-03-23 13:20:48 +00:00 · 2026-03-23 13:03:02 +00:00
207 changed files with 13681 additions and 5828 deletions
--- a/.claude/skills/pr-address/SKILL.md
+++ b/.claude/skills/pr-address/SKILL.md
@@ -2,7 +2,7 @@
 name: pr-address
 description: Address PR review comments and loop until CI green and all comments resolved. TRIGGER when user asks to address comments, fix PR feedback, respond to reviewers, or babysit/monitor a PR.
 user-invocable: true
-args: "[PR number or URL] — if omitted, finds PR for current branch."
+argument-hint: "[PR number or URL] — if omitted, finds PR for current branch."
 metadata:
  author: autogpt-team
  version: "1.0.0"
@@ -19,16 +19,60 @@ gh pr view {N}

 ## Fetch comments (all sources)

+### 1. Inline review threads — GraphQL (primary source of actionable items)
+
+Use GraphQL to fetch inline threads. It natively exposes `isResolved`, returns threads already grouped with all replies, and paginates via cursor — no manual thread reconstruction needed.
+
 ```bash
-gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews       # top-level reviews
-gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments      # inline review comments
-gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments     # PR conversation comments
+gh api graphql -f query='
+{
+  repository(owner: "Significant-Gravitas", name: "AutoGPT") {
+    pullRequest(number: {N}) {
+      reviewThreads(first: 100) {
+        pageInfo { hasNextPage endCursor }
+        nodes {
+          id
+          isResolved
+          path
+          comments(last: 1) {
+            nodes { databaseId body author { login } createdAt }
+          }
+        }
+      }
+    }
+  }
+}'
 ```

-**Bots to watch for:**
- `autogpt-reviewer` — posts "Blockers", "Should Fix", "Nice to Have". Address ALL of them.
- `sentry[bot]` — bug predictions. Fix real bugs, explain false positives.
- `coderabbitai[bot]` — automated review. Address actionable items.
+If `pageInfo.hasNextPage` is true, fetch subsequent pages by adding `after: "<endCursor>"` to `reviewThreads(first: 100, after: "...")` and repeat until `hasNextPage` is false.
+
+**Filter to unresolved threads only** — skip any thread where `isResolved: true`. `comments(last: 1)` returns the most recent comment in the thread — act on that; it reflects the reviewer's final ask. Use the thread `id` (Relay global ID) to track threads across polls.
+
+### 2. Top-level reviews — REST (MUST paginate)
+
+```bash
+gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews --paginate
+```
+
+**CRITICAL — always `--paginate`.** Reviews default to 30 per page. PRs can have 80–170+ reviews (mostly empty resolution events). Without pagination you miss reviews past position 30 — including `autogpt-reviewer`'s structured review which is typically posted after several CI runs and sits well beyond the first page.
+
+Two things to extract:
+- **Overall state**: look for `CHANGES_REQUESTED` or `APPROVED` reviews.
+- **Actionable feedback**: non-empty bodies only. Empty-body reviews are thread-resolution events — they indicate progress but have no feedback to act on.
+
+**Where each reviewer posts:**
+- `autogpt-reviewer` — posts detailed structured reviews ("Blockers", "Should Fix", "Nice to Have") as **top-level reviews**. Not present on every PR. Address ALL items.
+- `sentry[bot]` — posts bug predictions as **inline threads**. Fix real bugs, explain false positives.
+- `coderabbitai[bot]` — posts summaries as **top-level reviews** AND actionable items as **inline threads**. Address actionable items.
+- Human reviewers — can post in any source. Address ALL non-empty feedback.
+
+### 3. PR conversation comments — REST
+
+```bash
+gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments --paginate
+```
+
+Mostly contains: bot summaries (`coderabbitai[bot]`), CI/conflict detection (`github-actions[bot]`), and author status updates. Scan for non-empty messages from non-bot human reviewers that aren't the PR author — those are the ones that need a response.

 ## For each unaddressed comment

@@ -40,8 +84,8 @@ Address comments **one at a time**: fix → commit → push → inline reply →

 | Comment type | How to reply |
 |---|---|
-| Inline review (`pulls/{N}/comments`) | `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments/{ID}/replies -f body="Fixed in <commit-sha>: <description>"` |
-| Conversation (`issues/{N}/comments`) | `gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments -f body="Fixed in <commit-sha>: <description>"` |
+| Inline review (`pulls/{N}/comments`) | `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments/{ID}/replies -f body="🤖 Fixed in <commit-sha>: <description>"` |
+| Conversation (`issues/{N}/comments`) | `gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments -f body="🤖 Fixed in <commit-sha>: <description>"` |

 ## Format and commit

@@ -74,21 +118,83 @@ address comments → format → commit → push
 → repeat until: all comments addressed AND CI green AND no new comments arriving
 ```

-### Waiting for CI
+### Polling for CI + new comments

-Use `gh pr checks --watch --fail-fast` to efficiently wait for CI. This blocks until all checks finish (or one fails early):
+After pushing, poll for **both** CI status and new comments in a single loop. Do not use `gh pr checks --watch` — it blocks the tool and prevents reacting to new comments while CI is running.

+> **Note:** `gh pr checks --watch --fail-fast` is tempting but it blocks the entire Bash tool call, meaning the agent cannot check for or address new comments until CI fully completes. Always poll manually instead.
+
+**Polling loop — repeat every 30 seconds:**
+
+1. Check CI status:
 ```bash
-gh pr checks --watch --fail-fast
+gh pr checks {N} --repo Significant-Gravitas/AutoGPT --json bucket,name,link
+```
+   Parse the results: if every check has `bucket` of `"pass"` or `"skipping"`, CI is green. If any has `"fail"`, CI has failed. Otherwise CI is still pending.
+
+2. Check for merge conflicts:
+```bash
+gh pr view {N} --repo Significant-Gravitas/AutoGPT --json mergeable --jq '.mergeable'
+```
+   If the result is `"CONFLICTING"`, the PR has a merge conflict — see "Resolving merge conflicts" below. If `"UNKNOWN"`, GitHub is still computing mergeability — wait and re-check next poll.
+
+3. Check for new/changed comments (all three sources):
+
+   **Inline threads** — re-run the GraphQL query from "Fetch comments". For each unresolved thread, record `{thread_id, last_comment_databaseId}` as your baseline. On each poll, action is needed if:
+   - A new thread `id` appears that wasn't in the baseline (new thread), OR
+   - An existing thread's `last_comment_databaseId` has changed (new reply on existing thread)
+
+   **Conversation comments:**
+   ```bash
+   gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments --paginate
+   ```
+   Compare total count and newest `id` against baseline. Filter to non-empty, non-bot, non-author-update messages.
+
+   **Top-level reviews:**
+   ```bash
+   gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews --paginate
+   ```
+   Watch for new non-empty reviews (`CHANGES_REQUESTED` or `COMMENTED` with body). Compare total count and newest `id` against baseline.
+
+4. **React in this precedence order (first match wins):**
+
+| What happened | Action |
+|---|---|
+| Merge conflict detected | See "Resolving merge conflicts" below. |
+| Mergeability is `UNKNOWN` | GitHub is still computing mergeability. Sleep 30 seconds, then restart polling from the top. |
+| New comments detected | Address them (fix → commit → push → reply). After pushing, re-fetch all comments to update your baseline, then restart this polling loop from the top (new commits invalidate CI status). |
+| CI failed (bucket == "fail") | Get failed check links: `gh pr checks {N} --repo Significant-Gravitas/AutoGPT --json bucket,link --jq '.[] \| select(.bucket == "fail") \| .link'`. Extract run ID from link (format: `.../actions/runs/<run-id>/job/...`), read logs with `gh run view <run-id> --repo Significant-Gravitas/AutoGPT --log-failed`. Fix → commit → push → restart polling. |
+| CI green + no new comments | **Do not exit immediately.** Bots (coderabbitai, sentry) often post reviews shortly after CI settles. Continue polling for **2 more cycles (60s)** after CI goes green. Only exit after 2 consecutive green+quiet polls. |
+| CI pending + no new comments | Sleep 30 seconds, then poll again. |
+
+**The loop ends when:** CI fully green + all comments addressed + **2 consecutive polls with no new comments after CI settled.**
+
+### Resolving merge conflicts
+
+1. Identify the PR's target branch and remote:
+```bash
+gh pr view {N} --repo Significant-Gravitas/AutoGPT --json baseRefName --jq '.baseRefName'
+git remote -v   # find the remote pointing to Significant-Gravitas/AutoGPT (typically 'upstream' in forks, 'origin' for direct contributors)
 ```

-If a check fails:
-1. Get the failed run ID: `gh pr checks --json name,state,link --jq '.[] | select(.state == "FAILURE")'`
-2. Read logs: `gh run view <run-id> --log-failed`
-3. Fix → commit → push → wait again with `gh pr checks --watch --fail-fast`
+2. Pull the latest base branch with a 3-way merge:
+```bash
+git pull {base-remote} {base-branch} --no-rebase
+```

-### Between CI waits
+3. Resolve conflicting files, then verify no conflict markers remain:
+```bash
+if grep -R -n -E '^(<<<<<<<|=======|>>>>>>>)' <conflicted-files>; then
+  echo "Unresolved conflict markers found — resolve before proceeding."
+  exit 1
+fi
+```

-After each push and while waiting for CI, re-fetch comments — bots like `coderabbitai` and `sentry` often post new comments on fresh commits. Address those while CI is still running, then wait again.
+4. Stage and push:
+```bash
+git add <conflicted-files>
+git commit -m "Resolve merge conflicts with {base-branch}"
+git push
+```

-**The loop ends when:** CI fully green + all comments addressed + no new comments since CI settled.
+5. Restart the polling loop from the top — new commits reset CI status.
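
The CI rule in step 1 of the polling loop (any `fail` means failed, all `pass`/`skipping` means green, anything else is pending) can be sketched as a small shell function. This is only an illustration: `classify_ci` is a hypothetical name that does not appear in the PR, and it assumes the bucket values arrive one per line, for example from `gh pr checks {N} --repo Significant-Gravitas/AutoGPT --json bucket --jq '.[].bucket'`.

```shell
# classify_ci (hypothetical helper): reads one bucket value per line on stdin
# and prints "fail", "pending", or "green" following the rule in step 1.
# Assumption: an empty check list counts as green.
classify_ci() {
  result="green"
  while IFS= read -r bucket; do
    case "$bucket" in
      fail) echo "fail"; return 0 ;;   # any failing check fails CI
      pass|skipping|"") ;;             # does not block a green result
      *) result="pending" ;;           # any other bucket means CI is not settled
    esac
  done
  echo "$result"
}
```

Keeping the classification separate from the `gh` call makes the precedence table easier to follow, since the loop can branch on a single word per poll.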
--- a/.claude/skills/pr-review/SKILL.md
+++ b/.claude/skills/pr-review/SKILL.md
@@ -28,7 +28,7 @@ gh pr diff {N}
 Before posting anything, fetch existing inline comments to avoid duplicates:

 ```bash
-gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments
+gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments --paginate
 gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews
 ```

--- a/.claude/skills/pr-test/SKILL.md
+++ b/.claude/skills/pr-test/SKILL.md
@@ -0,0 +1,534 @@
+---
+name: pr-test
+description: "E2E manual testing of PRs/branches using docker compose, agent-browser, and API calls. TRIGGER when user asks to manually test a PR, test a feature end-to-end, or run integration tests against a running system."
+user-invocable: true
+argument-hint: "[worktree path or PR number] — tests the PR in the given worktree. Optional flags: --fix (auto-fix issues found)"
+metadata:
+  author: autogpt-team
+  version: "1.0.0"
+---
+
+# Manual E2E Test
+
+Test a PR/branch end-to-end by building the full platform, interacting via browser and API, capturing screenshots, and reporting results.
+
+## Arguments
+
+- `$ARGUMENTS` — worktree path (e.g. `$REPO_ROOT`) or PR number
+- If `--fix` flag is present, auto-fix bugs found and push fixes (like pr-address loop)
+
+## Step 0: Resolve the target
+
+```bash
+# If argument is a PR number, find its worktree
+gh pr view {N} --json headRefName --jq '.headRefName'
+# If argument is a path, use it directly
+```
+
+Determine:
+- `REPO_ROOT` — the root repo directory: `git -C "$WORKTREE_PATH" worktree list | head -1 | awk '{print $1}'` (or `git rev-parse --show-toplevel` if not a worktree)
+- `WORKTREE_PATH` — the worktree directory
+- `PLATFORM_DIR` — `$WORKTREE_PATH/autogpt_platform`
+- `BACKEND_DIR` — `$PLATFORM_DIR/backend`
+- `FRONTEND_DIR` — `$PLATFORM_DIR/frontend`
+- `PR_NUMBER` — the PR number (from `gh pr list --head $(git branch --show-current)`)
+- `PR_TITLE` — the PR title, slugified (e.g. "Add copilot permissions" → "add-copilot-permissions")
+- `RESULTS_DIR` — `$REPO_ROOT/test-results/PR-{PR_NUMBER}-{slugified-title}`
+
+Create the results directory:
+```bash
+PR_NUMBER=$(cd $WORKTREE_PATH && gh pr list --head $(git branch --show-current) --repo Significant-Gravitas/AutoGPT --json number --jq '.[0].number')
+PR_TITLE=$(cd $WORKTREE_PATH && gh pr list --head $(git branch --show-current) --repo Significant-Gravitas/AutoGPT --json title --jq '.[0].title' | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9]/-/g' | sed 's/--*/-/g' | sed 's/^-//;s/-$//' | head -c 50)
+RESULTS_DIR="$REPO_ROOT/test-results/PR-${PR_NUMBER}-${PR_TITLE}"
+mkdir -p $RESULTS_DIR
+```
+
+**Test user credentials** (for logging into the UI or verifying results manually):
+- Email: `test@test.com`
+- Password: `testtest123`
+
+## Step 1: Understand the PR
+
+Before testing, understand what changed:
+
+```bash
+cd $WORKTREE_PATH
+git log --oneline dev..HEAD | head -20
+git diff dev --stat
+```
+
+Read the changed files to understand:
+1. What feature/fix does this PR implement?
+2. What components are affected? (backend, frontend, copilot, executor, etc.)
+3. What are the key user-facing behaviors to test?
+
+## Step 2: Write test scenarios
+
+Based on the PR analysis, write a test plan to `$RESULTS_DIR/test-plan.md`:
+
+```markdown
+# Test Plan: PR #{N} — {title}
+
+## Scenarios
+1. [Scenario name] — [what to verify]
+2. ...
+
+## API Tests (if applicable)
+1. [Endpoint] — [expected behavior]
+
+## UI Tests (if applicable)
+1. [Page/component] — [interaction to test]
+
+## Negative Tests
+1. [What should NOT happen]
+```
+
+**Be critical** — include edge cases, error paths, and security checks.
+
+## Step 3: Environment setup
+
+### 3a. Copy .env files from the root worktree
+
+The root worktree (`$REPO_ROOT`) has the canonical `.env` files with all API keys. Copy them to the target worktree:
+
+```bash
+# CRITICAL: .env files are NOT checked into git. They must be copied manually.
+cp $REPO_ROOT/autogpt_platform/.env $PLATFORM_DIR/.env
+cp $REPO_ROOT/autogpt_platform/backend/.env $BACKEND_DIR/.env
+cp $REPO_ROOT/autogpt_platform/frontend/.env $FRONTEND_DIR/.env
+```
+
+### 3b. Configure copilot authentication
+
+The copilot needs an LLM API to function. Two approaches (try subscription first):
+
+#### Option 1: Subscription mode (preferred — uses your Claude Max/Pro subscription)
+
+The `claude_agent_sdk` Python package **bundles its own Claude CLI binary** — no need to install `@anthropic-ai/claude-code` via npm. The backend auto-provisions credentials from environment variables on startup.
+
+Run the helper script to extract tokens from your host and auto-update `backend/.env` (works on macOS, Linux, and Windows/WSL):
+
+```bash
+# Extracts OAuth tokens and writes CLAUDE_CODE_OAUTH_TOKEN + CLAUDE_CODE_REFRESH_TOKEN into .env
+bash $BACKEND_DIR/scripts/refresh_claude_token.sh --env-file $BACKEND_DIR/.env
+```
+
+**How it works:** The script reads the OAuth token from:
+- **macOS**: system keychain (`"Claude Code-credentials"`)
+- **Linux/WSL**: `~/.claude/.credentials.json`
+- **Windows**: `%APPDATA%/claude/.credentials.json`
+
+It sets `CLAUDE_CODE_OAUTH_TOKEN`, `CLAUDE_CODE_REFRESH_TOKEN`, and `CHAT_USE_CLAUDE_CODE_SUBSCRIPTION=true` in the `.env` file. On container startup, the backend auto-provisions `~/.claude/.credentials.json` inside the container from these env vars. The SDK's bundled CLI then authenticates using that file. No `claude login`, no npm install needed.
+
+**Note:** The OAuth token expires (~24h). If copilot returns auth errors, re-run the script and restart: `$BACKEND_DIR/scripts/refresh_claude_token.sh --env-file $BACKEND_DIR/.env && docker compose up -d copilot_executor`
+
+#### Option 2: OpenRouter API key mode (fallback)
+
+If subscription mode doesn't work, switch to API key mode using OpenRouter:
+
+```bash
+# In $BACKEND_DIR/.env, ensure these are set:
+CHAT_USE_CLAUDE_CODE_SUBSCRIPTION=false
+CHAT_API_KEY=<value of OPEN_ROUTER_API_KEY from the same .env>
+CHAT_BASE_URL=https://openrouter.ai/api/v1
+CHAT_USE_CLAUDE_AGENT_SDK=true
+```
+
+Use `perl` to update these values in place:
+```bash
+ORKEY=$(grep "^OPEN_ROUTER_API_KEY=" $BACKEND_DIR/.env | cut -d= -f2-)
+[ -n "$ORKEY" ] || { echo "ERROR: OPEN_ROUTER_API_KEY is missing in $BACKEND_DIR/.env"; exit 1; }
+perl -i -pe 's/CHAT_USE_CLAUDE_CODE_SUBSCRIPTION=true/CHAT_USE_CLAUDE_CODE_SUBSCRIPTION=false/' $BACKEND_DIR/.env
+# Add or update CHAT_API_KEY and CHAT_BASE_URL
+grep -q "^CHAT_API_KEY=" $BACKEND_DIR/.env && perl -i -pe "s|^CHAT_API_KEY=.*|CHAT_API_KEY=$ORKEY|" $BACKEND_DIR/.env || echo "CHAT_API_KEY=$ORKEY" >> $BACKEND_DIR/.env
+grep -q "^CHAT_BASE_URL=" $BACKEND_DIR/.env && perl -i -pe 's|^CHAT_BASE_URL=.*|CHAT_BASE_URL=https://openrouter.ai/api/v1|' $BACKEND_DIR/.env || echo "CHAT_BASE_URL=https://openrouter.ai/api/v1" >> $BACKEND_DIR/.env
+```
+
+### 3c. Stop conflicting containers
+
+```bash
+# Stop any running app containers (keep infra: supabase, redis, rabbitmq, clamav)
+docker ps --format "{{.Names}}" | grep -E "rest_server|executor|copilot|websocket|database_manager|scheduler|notification|frontend|migrate" | while read name; do
+  docker stop "$name" 2>/dev/null
+done
+```
+
+### 3e. Build and start
+
+```bash
+cd $PLATFORM_DIR && docker compose build --no-cache 2>&1 | tail -20
+if [ ${PIPESTATUS[0]} -ne 0 ]; then echo "ERROR: Docker build failed"; exit 1; fi
+
+cd $PLATFORM_DIR && docker compose up -d 2>&1 | tail -20
+if [ ${PIPESTATUS[0]} -ne 0 ]; then echo "ERROR: Docker compose up failed"; exit 1; fi
+```
+
+**Note:** The build uses `--no-cache` deliberately. Docker BuildKit may reuse cached `COPY` layers from a previous build on a different branch, which leaves the container running old code (e.g. missing PR changes).
+
+**Expected time: 3-8 minutes** for build, 5-10 minutes with `--no-cache`.
+
+### 3f. Wait for services to be ready
+
+```bash
+# Poll until backend and frontend respond
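+READY=false
+for i in $(seq 1 60); do
+  BACKEND=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:8006/docs 2>/dev/null)
+  FRONTEND=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:3000 2>/dev/null)
+  if [ "$BACKEND" = "200" ] && [ "$FRONTEND" = "200" ]; then
+    echo "Services ready"
+    READY=true
+    break
+  fi
+  sleep 5
+done
+# Fail loudly instead of continuing against services that never came up (60 polls x 5s)
+[ "$READY" = true ] || { echo "ERROR: services not ready after ~5 minutes"; exit 1; }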
+```
+
+
+### 3h. Create test user and get auth token
+
+```bash
+ANON_KEY=$(grep "NEXT_PUBLIC_SUPABASE_ANON_KEY=" $FRONTEND_DIR/.env | sed 's/.*NEXT_PUBLIC_SUPABASE_ANON_KEY=//' | tr -d '[:space:]')
+
+# Signup (idempotent — returns "User already registered" if exists)
+RESULT=$(curl -s -X POST 'http://localhost:8000/auth/v1/signup' \
+  -H "apikey: $ANON_KEY" \
+  -H 'Content-Type: application/json' \
+  -d '{"email":"test@test.com","password":"testtest123"}')
+
+# If "Database error finding user", restart supabase-auth and retry
+if echo "$RESULT" | grep -q "Database error"; then
+  docker restart supabase-auth && sleep 5
+  curl -s -X POST 'http://localhost:8000/auth/v1/signup' \
+    -H "apikey: $ANON_KEY" \
+    -H 'Content-Type: application/json' \
+    -d '{"email":"test@test.com","password":"testtest123"}'
+fi
+
+# Get auth token
+TOKEN=$(curl -s -X POST 'http://localhost:8000/auth/v1/token?grant_type=password' \
+  -H "apikey: $ANON_KEY" \
+  -H 'Content-Type: application/json' \
+  -d '{"email":"test@test.com","password":"testtest123"}' | jq -r '.access_token // ""')
+```
+
+**Use this token for ALL API calls:**
+```bash
+curl -H "Authorization: Bearer $TOKEN" http://localhost:8006/api/...
+```
+
+## Step 4: Run tests
+
+### Service ports reference
+
+| Service | Port | URL |
+|---------|------|-----|
+| Frontend | 3000 | http://localhost:3000 |
+| Backend REST | 8006 | http://localhost:8006 |
+| Supabase Auth (via Kong) | 8000 | http://localhost:8000 |
+| Executor | 8002 | http://localhost:8002 |
+| Copilot Executor | 8008 | http://localhost:8008 |
+| WebSocket | 8001 | http://localhost:8001 |
+| Database Manager | 8005 | http://localhost:8005 |
+| Redis | 6379 | localhost:6379 |
+| RabbitMQ | 5672 | localhost:5672 |
+
+### API testing
+
+Use `curl` with the auth token for backend API tests:
+
+```bash
+# Example: List agents
+curl -s -H "Authorization: Bearer $TOKEN" http://localhost:8006/api/graphs | jq . | head -20
+
+# Example: Create an agent
+curl -s -X POST http://localhost:8006/api/graphs \
+  -H "Authorization: Bearer $TOKEN" \
+  -H 'Content-Type: application/json' \
+  -d '{...}' | jq .
+
+# Example: Run an agent
+curl -s -X POST "http://localhost:8006/api/graphs/{graph_id}/execute" \
+  -H "Authorization: Bearer $TOKEN" \
+  -H 'Content-Type: application/json' \
+  -d '{"data": {...}}'
+
+# Example: Get execution results
+curl -s -H "Authorization: Bearer $TOKEN" \
+  "http://localhost:8006/api/graphs/{graph_id}/executions/{exec_id}" | jq .
+```
+
+### Browser testing with agent-browser
+
+```bash
+# Close any existing session
+agent-browser close 2>/dev/null || true
+
+# Use --session-name to persist cookies across navigations
+# This means login only needs to happen once per test session
+agent-browser --session-name pr-test open 'http://localhost:3000/login' --timeout 15000
+
+# Get interactive elements
+agent-browser --session-name pr-test snapshot | grep "textbox\|button"
+
+# Login
+agent-browser --session-name pr-test fill {email_ref} "test@test.com"
+agent-browser --session-name pr-test fill {password_ref} "testtest123"
+agent-browser --session-name pr-test click {login_button_ref}
+sleep 5
+
+# Dismiss cookie banner if present
+agent-browser --session-name pr-test click 'text=Accept All' 2>/dev/null || true
+
+# Navigate — cookies are preserved so login persists
+agent-browser --session-name pr-test open 'http://localhost:3000/copilot' --timeout 10000
+
+# Take screenshot
+agent-browser --session-name pr-test screenshot $RESULTS_DIR/01-page.png
+
+# Interact with elements
+agent-browser --session-name pr-test fill {ref} "text"
+agent-browser --session-name pr-test press "Enter"
+agent-browser --session-name pr-test click {ref}
+agent-browser --session-name pr-test click 'text=Button Text'
+
+# Read page content
+agent-browser --session-name pr-test snapshot | grep "text:"
+```
+
+**Key pages:**
+- `/copilot` — CoPilot chat (for testing copilot features)
+- `/build` — Agent builder (for testing block/node features)
+- `/build?flowID={id}` — Specific agent in builder
+- `/library` — Agent library (for testing listing/import features)
+- `/library/agents/{id}` — Agent detail with run history
+- `/marketplace` — Marketplace
+
+### Checking logs
+
+```bash
+# Backend REST server
+docker logs autogpt_platform-rest_server-1 2>&1 | tail -30
+
+# Executor (runs agent graphs)
+docker logs autogpt_platform-executor-1 2>&1 | tail -30
+
+# Copilot executor (runs copilot chat sessions)
+docker logs autogpt_platform-copilot_executor-1 2>&1 | tail -30
+
+# Frontend
+docker logs autogpt_platform-frontend-1 2>&1 | tail -30
+
+# Filter for errors
+docker logs autogpt_platform-executor-1 2>&1 | grep -i "error\|exception\|traceback" | tail -20
+```
+
+### Copilot chat testing
+
+The copilot uses SSE streaming. To test via API:
+
+```bash
+# Create a session
+SESSION_ID=$(curl -s -X POST 'http://localhost:8006/api/chat/sessions' \
+  -H "Authorization: Bearer $TOKEN" \
+  -H 'Content-Type: application/json' \
+  -d '{}' | jq -r '.id // .session_id // ""')
+
+# Stream a message (SSE - will stream chunks)
+curl -N -X POST "http://localhost:8006/api/chat/sessions/$SESSION_ID/stream" \
+  -H "Authorization: Bearer $TOKEN" \
+  -H 'Content-Type: application/json' \
+  -d '{"message": "Hello, what can you help me with?"}' \
+  --max-time 60 2>/dev/null | head -50
+```
+
+Or test via browser (preferred for UI verification):
+```bash
+agent-browser --session-name pr-test open 'http://localhost:3000/copilot' --timeout 10000
+# ... fill chat input and press Enter, wait 20-30s for response
+```
+
+## Step 5: Record results
+
+For each test scenario, record in `$RESULTS_DIR/test-report.md`:
+
+```markdown
+# E2E Test Report: PR #{N} — {title}
+Date: {date}
+Branch: {branch}
+Worktree: {path}
+
+## Environment
+- Docker services: [list running containers]
+- API keys: OpenRouter={present/missing}, E2B={present/missing}
+
+## Test Results
+
+### Scenario 1: {name}
+**Steps:**
+1. ...
+2. ...
+**Expected:** ...
+**Actual:** ...
+**Result:** PASS / FAIL
+**Screenshot:** {filename}.png
+**Logs:** (if relevant)
+
+### Scenario 2: {name}
+...
+
+## Summary
+- Total: X scenarios
+- Passed: Y
+- Failed: Z
+- Bugs found: [list]
+```
+
+Take screenshots at each significant step:
+```bash
+agent-browser --session-name pr-test screenshot $RESULTS_DIR/{NN}-{description}.png
+```
+
+## Step 6: Report results
+
+After all tests complete, output a summary to the user:
+
+1. Table of all scenarios with PASS/FAIL
+2. Screenshots of failures (read the PNG files to show them)
+3. Any bugs found with details
+4. Recommendations
+
+### Post test results as PR comment with screenshots
+
+Upload screenshots to the PR using the GitHub Git API (no local git operations — safe for worktrees).
+
+```bash
+# Upload screenshots via GitHub Git API (creates blobs, tree, commit, and ref remotely)
+REPO="Significant-Gravitas/AutoGPT"
+SCREENSHOTS_BRANCH="test-screenshots/pr-${PR_NUMBER}"
+SCREENSHOTS_DIR="test-screenshots/PR-${PR_NUMBER}"
+
+# Step 1: Create a blob for each screenshot and collect the tree entries as JSON
+TREE_JSON='['
+FIRST=true
+for img in "$RESULTS_DIR"/*.png; do
+  BASENAME=$(basename "$img")
+  B64=$(base64 < "$img")
+  BLOB_SHA=$(gh api "repos/${REPO}/git/blobs" -f content="$B64" -f encoding="base64" --jq '.sha')
+  if [ "$FIRST" = true ]; then FIRST=false; else TREE_JSON+=','; fi
+  TREE_JSON+="{\"path\":\"${SCREENSHOTS_DIR}/${BASENAME}\",\"mode\":\"100644\",\"type\":\"blob\",\"sha\":\"${BLOB_SHA}\"}"
+done
+TREE_JSON+=']'
+
+# Step 2: Create a tree from the collected entries
+# The entries are wrapped as {"tree": [...]} and sent via --input, since gh api field flags don't handle arrays of objects well
+TREE_SHA=$(echo "$TREE_JSON" | jq -c '{tree: .}' | gh api "repos/${REPO}/git/trees" --input - --jq '.sha')
+
+# Step 3: Create a commit pointing to that tree
+COMMIT_SHA=$(gh api "repos/${REPO}/git/commits" \
+  -f message="test: add E2E test screenshots for PR #${PR_NUMBER}" \
+  -f tree="$TREE_SHA" \
+  --jq '.sha')
+
+# Step 4: Create or update the ref (branch) — no local checkout needed
+gh api "repos/${REPO}/git/refs" \
+  -f ref="refs/heads/${SCREENSHOTS_BRANCH}" \
+  -f sha="$COMMIT_SHA" 2>/dev/null \
+  || gh api "repos/${REPO}/git/refs/heads/${SCREENSHOTS_BRANCH}" \
+    -X PATCH -f sha="$COMMIT_SHA" -F force=true
+
+# Step 5: Build image markdown and post the comment
+REPO_URL="https://raw.githubusercontent.com/${REPO}/${SCREENSHOTS_BRANCH}"
+IMAGE_MARKDOWN=""
+for img in $RESULTS_DIR/*.png; do
+  BASENAME=$(basename "$img")
+  IMAGE_MARKDOWN="$IMAGE_MARKDOWN
+![${BASENAME}](${REPO_URL}/${SCREENSHOTS_DIR}/${BASENAME})"
+done
+
+gh api "repos/${REPO}/issues/$PR_NUMBER/comments" -f body="$(cat <<EOF
+## 🧪 E2E Test Report
+
+$(cat $RESULTS_DIR/test-report.md)
+
+### Screenshots
+${IMAGE_MARKDOWN}
+EOF
+)"
+```
+
+This approach uses the GitHub Git API to create blobs, trees, commits, and refs entirely server-side. No local `git checkout` or `git push` — safe for worktrees and won't interfere with the PR branch.
+
+## Fix mode (--fix flag)
+
+When `--fix` is present, after finding a bug:
+
+1. Identify the root cause in the code
+2. Fix it in the worktree
+3. Rebuild the affected service: `cd $PLATFORM_DIR && docker compose up --build -d {service_name}`
+4. Re-test the scenario
+5. If fix works, commit and push:
+   ```bash
+   cd $WORKTREE_PATH
+   git add -A
+   git commit -m "fix: {description of fix}"
+   git push
+   ```
+6. Continue testing remaining scenarios
+7. After all fixes, run the full test suite again to ensure no regressions
+
+### Fix loop (like pr-address)
+
+```text
+test scenario → find bug → fix code → rebuild service → re-test
+→ repeat until all scenarios pass
+→ commit + push all fixes
+→ run full re-test to verify
+```
+
+## Known issues and workarounds
+
+### Problem: "Database error finding user" on signup
+**Cause:** Supabase auth service schema cache is stale after migration.
+**Fix:** `docker restart supabase-auth && sleep 5` then retry signup.
+
+### Problem: Copilot returns auth errors in subscription mode
+**Cause:** `CHAT_USE_CLAUDE_CODE_SUBSCRIPTION=true` but `CLAUDE_CODE_OAUTH_TOKEN` is not set or expired.
+**Fix:** Re-run the token helper script (see step 3b, Option 1) to refresh the OAuth token in `backend/.env`, then recreate the container (`docker compose up -d copilot_executor`). The backend auto-provisions `~/.claude/.credentials.json` from the env var on startup. No `npm install` or `claude login` needed — the SDK bundles its own CLI binary.
+
+### Problem: agent-browser can't find chromium
+**Cause:** The Dockerfile auto-provisions system chromium on all architectures (including ARM64). If your branch is behind `dev`, this may not be present yet.
+**Fix:** Check if chromium exists: `which chromium || which chromium-browser`. If missing, install it: `apt-get install -y chromium` and set `AGENT_BROWSER_EXECUTABLE_PATH=/usr/bin/chromium` in the container environment.
+
+### Problem: agent-browser selector matches multiple elements
+**Cause:** `text=X` matches all elements containing that text.
+**Fix:** Use `agent-browser snapshot` to get specific `ref=eNN` references, then use those: `agent-browser click eNN`.
+
+### Problem: Frontend shows cookie banner blocking interaction
+**Fix:** `agent-browser click 'text=Accept All'` before other interactions.
+
+### Problem: Container loses npm packages after rebuild
+**Cause:** `docker compose up --build` rebuilds the image, losing runtime installs.
+**Fix:** Add packages to the Dockerfile instead of installing at runtime.
+
+### Problem: Services not starting after `docker compose up`
+**Fix:** Wait and check health: `docker compose ps`. Common cause: migration hasn't finished. Check: `docker logs autogpt_platform-migrate-1 2>&1 | tail -5`. If supabase-db isn't healthy: `docker restart supabase-db && sleep 10`.
+
+### Problem: Docker uses cached layers with old code (PR changes not visible)
+**Cause:** `docker compose up --build` reuses cached `COPY` layers from previous builds. If the PR branch changes Python files but the previous build already cached that layer from `dev`, the container runs `dev` code.
+**Fix:** Always use `docker compose build --no-cache` for the first build of a PR branch. Subsequent rebuilds within the same branch can use `--build`.
+
+### Problem: `agent-browser open` loses login session
+**Cause:** Without session persistence, `agent-browser open` starts fresh.
+**Fix:** Use `--session-name pr-test` on ALL agent-browser commands. This auto-saves/restores cookies and localStorage across navigations. Alternatively, use `agent-browser eval "window.location.href = '...'"` to navigate within the same context.
+
+### Problem: Supabase auth returns "Database error querying schema"
+**Cause:** The database schema changed (migration ran) but supabase-auth has a stale schema cache.
+**Fix:** `docker restart supabase-db && sleep 10 && docker restart supabase-auth && sleep 8`. If user data was lost, re-signup.
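
Several of the fixes above rely on fixed `sleep` durations after restarting a container. As a rough alternative, the sketch below polls until a health check passes. It assumes the `docker` CLI is on the PATH and that the container defines a healthcheck; `wait_for` and `container_is_healthy` are hypothetical helper names, not part of this repository.

```python
import subprocess
import time
from typing import Callable


def wait_for(check: Callable[[], bool], attempts: int = 30, delay: float = 2.0) -> bool:
    """Poll `check` until it returns True or the attempts run out."""
    for attempt in range(attempts):
        if check():
            return True
        # Do not sleep after the final failed attempt.
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


def container_is_healthy(name: str) -> bool:
    """Return True when Docker reports the container's health status as 'healthy'.

    Assumption: the container has a healthcheck configured; without one,
    `docker inspect` exits non-zero for this format string and we return False.
    """
    try:
        result = subprocess.run(
            ["docker", "inspect", "--format", "{{.State.Health.Status}}", name],
            capture_output=True,
            text=True,
        )
    except FileNotFoundError:
        # The docker CLI is not installed in this environment.
        return False
    return result.returncode == 0 and result.stdout.strip() == "healthy"
```

A call such as `wait_for(lambda: container_is_healthy("supabase-db"))` could stand in for `sleep 10` after `docker restart supabase-db`, returning as soon as the container reports healthy rather than after a fixed delay.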
--- a/.github/workflows/platform-backend-ci.yml
+++ b/.github/workflows/platform-backend-ci.yml
@@ -27,10 +27,91 @@ defaults:
    working-directory: autogpt_platform/backend

 jobs:
+  lint:
+    permissions:
+      contents: read
+    timeout-minutes: 10
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v6
+
+      - name: Set up Python 3.12
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+
+      - name: Set up Python dependency cache
+        uses: actions/cache@v5
+        with:
+          path: ~/.cache/pypoetry
+          key: poetry-${{ runner.os }}-py3.12-${{ hashFiles('autogpt_platform/backend/poetry.lock') }}
+
+      - name: Install Poetry
+        run: |
+          HEAD_POETRY_VERSION=$(python ../../.github/workflows/scripts/get_package_version_from_lockfile.py poetry)
+          echo "Using Poetry version ${HEAD_POETRY_VERSION}"
+          curl -sSL https://install.python-poetry.org | POETRY_VERSION=$HEAD_POETRY_VERSION python3 -
+
+      - name: Install Python dependencies
+        run: poetry install
+
+      - name: Run Linters
+        run: poetry run lint --skip-pyright
+
+    env:
+      CI: true
+      PLAIN_OUTPUT: True
+
+  type-check:
+    permissions:
+      contents: read
+    timeout-minutes: 10
+    strategy:
+      fail-fast: false
+      matrix:
+        python-version: ["3.11", "3.12", "3.13"]
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v6
+
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v5
+        with:
+          python-version: ${{ matrix.python-version }}
+
+      - name: Set up Python dependency cache
+        uses: actions/cache@v5
+        with:
+          path: ~/.cache/pypoetry
+          key: poetry-${{ runner.os }}-py${{ matrix.python-version }}-${{ hashFiles('autogpt_platform/backend/poetry.lock') }}
+
+      - name: Install Poetry
+        run: |
+          HEAD_POETRY_VERSION=$(python ../../.github/workflows/scripts/get_package_version_from_lockfile.py poetry)
+          echo "Using Poetry version ${HEAD_POETRY_VERSION}"
+          curl -sSL https://install.python-poetry.org | POETRY_VERSION=$HEAD_POETRY_VERSION python3 -
+
+      - name: Install Python dependencies
+        run: poetry install
+
+      - name: Generate Prisma Client
+        run: poetry run prisma generate && poetry run gen-prisma-stub
+
+      - name: Run Pyright
+        run: poetry run pyright --pythonversion ${{ matrix.python-version }}
+
+    env:
+      CI: true
+      PLAIN_OUTPUT: True
+
  test:
    permissions:
      contents: read
-    timeout-minutes: 30
+    timeout-minutes: 15
    strategy:
      fail-fast: false
      matrix:
@@ -98,9 +179,9 @@ jobs:
        uses: actions/cache@v5
        with:
          path: ~/.cache/pypoetry
-          key: poetry-${{ runner.os }}-${{ hashFiles('autogpt_platform/backend/poetry.lock') }}
+          key: poetry-${{ runner.os }}-py${{ matrix.python-version }}-${{ hashFiles('autogpt_platform/backend/poetry.lock') }}

-      - name: Install Poetry (Unix)
+      - name: Install Poetry
        run: |
          # Extract Poetry version from backend/poetry.lock
          HEAD_POETRY_VERSION=$(python ../../.github/workflows/scripts/get_package_version_from_lockfile.py poetry)
@@ -158,22 +239,22 @@ jobs:
          echo "Waiting for ClamAV daemon to start..."
          max_attempts=60
          attempt=0
-          
+
          until nc -z localhost 3310 || [ $attempt -eq $max_attempts ]; do
            echo "ClamAV is unavailable - sleeping (attempt $((attempt+1))/$max_attempts)"
            sleep 5
            attempt=$((attempt+1))
          done
-          
+
          if [ $attempt -eq $max_attempts ]; then
            echo "ClamAV failed to start after $((max_attempts*5)) seconds"
            echo "Checking ClamAV service logs..."
            docker logs $(docker ps -q --filter "ancestor=clamav/clamav-debian:latest") 2>&1 | tail -50 || echo "No ClamAV container found"
            exit 1
          fi
-          
+
          echo "ClamAV is ready!"
-          
+
          # Verify ClamAV is responsive
          echo "Testing ClamAV connection..."
          timeout 10 bash -c 'echo "PING" | nc localhost 3310' || {
@@ -188,18 +269,13 @@ jobs:
          DATABASE_URL: ${{ steps.supabase.outputs.DB_URL }}
          DIRECT_URL: ${{ steps.supabase.outputs.DB_URL }}

-      - id: lint
-        name: Run Linter
-        run: poetry run lint
-
-      - name: Run pytest with coverage
+      - name: Run pytest
        run: |
          if [[ "${{ runner.debug }}" == "1" ]]; then
            poetry run pytest -s -vv -o log_cli=true -o log_cli_level=DEBUG
          else
            poetry run pytest -s -vv
          fi
-        if: success() || (failure() && steps.lint.outcome == 'failure')
        env:
          LOG_LEVEL: ${{ runner.debug && 'DEBUG' || 'INFO' }}
          DATABASE_URL: ${{ steps.supabase.outputs.DB_URL }}
@@ -211,6 +287,12 @@ jobs:
          REDIS_PORT: "6379"
          ENCRYPTION_KEY: "dvziYgz0KSK8FENhju0ZYi8-fRTfAdlz6YLhdB_jhNw=" # DO NOT USE IN PRODUCTION!!

+      # - name: Upload coverage reports to Codecov
+      #   uses: codecov/codecov-action@v4
+      #   with:
+      #     token: ${{ secrets.CODECOV_TOKEN }}
+      #     flags: backend,${{ runner.os }}
+
    env:
      CI: true
      PLAIN_OUTPUT: True
@@ -224,9 +306,3 @@ jobs:
      # the backend service, docker composes, and examples
      RABBITMQ_DEFAULT_USER: "rabbitmq_user_default"
      RABBITMQ_DEFAULT_PASS: "k0VMxyIJF9S35f3x2uaw5IWAl6Y536O7"
-
-      # - name: Upload coverage reports to Codecov
-      #   uses: codecov/codecov-action@v4
-      #   with:
-      #     token: ${{ secrets.CODECOV_TOKEN }}
-      #     flags: backend,${{ runner.os }}
--- a/.github/workflows/platform-fullstack-ci.yml
+++ b/.github/workflows/platform-fullstack-ci.yml
@@ -294,7 +294,7 @@ jobs:
        uses: actions/upload-artifact@v4
        with:
          name: playwright-report
-          path: playwright-report
+          path: autogpt_platform/frontend/playwright-report
          if-no-files-found: ignore
          retention-days: 3

@@ -303,7 +303,7 @@ jobs:
        uses: actions/upload-artifact@v4
        with:
          name: playwright-test-results
-          path: test-results
+          path: autogpt_platform/frontend/test-results
          if-no-files-found: ignore
          retention-days: 3

--- a/autogpt_platform/CLAUDE.md
+++ b/autogpt_platform/CLAUDE.md
@@ -56,15 +56,35 @@ AutoGPT Platform is a monorepo containing:
 - Ensure the branch name is descriptive (e.g., `feature/add-new-block`)
 - Use conventional commit messages (see below)
 - Fill out the .github/PULL_REQUEST_TEMPLATE.md template as the PR description
+- Always pass the PR body with `--body-file`; this avoids shell interpretation of backticks and special characters:
+  ```bash
+  PR_BODY=$(mktemp)
+  cat > "$PR_BODY" << 'PREOF'
+  ## Summary
+  - use `backticks` freely here
+  PREOF
+  gh pr create --title "..." --body-file "$PR_BODY" --base dev
+  rm "$PR_BODY"
+  ```
 - Run the github pre-commit hooks to ensure code quality.

+### Test-Driven Development (TDD)
+
+When fixing a bug or adding a feature, follow a test-first approach:
+
+1. **Write a failing test first** — create a test that reproduces the bug or validates the new behavior, marked with `@pytest.mark.xfail` (backend) or `.fixme` (Playwright). Run it to confirm it fails for the right reason.
+2. **Implement the fix/feature** — write the minimal code to make the test pass.
+3. **Remove the xfail marker** — once the test passes, remove the `xfail`/`.fixme` annotation and run the full test suite to confirm nothing else broke.
+
+This ensures every change is covered by a test and that the test actually validates the intended behavior.
+
 ### Reviewing/Revising Pull Requests

 Use `/pr-review` to review a PR or `/pr-address` to address comments.

 When fetching comments manually:
- `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews` — top-level reviews
- `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments` — inline review comments
+- `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/reviews --paginate` — top-level reviews
+- `gh api repos/Significant-Gravitas/AutoGPT/pulls/{N}/comments --paginate` — inline review comments (always paginate to avoid missing comments beyond page 1)
 - `gh api repos/Significant-Gravitas/AutoGPT/issues/{N}/comments` — PR conversation comments

 ### Conventional Commits
--- a/autogpt_platform/backend/.env.default
+++ b/autogpt_platform/backend/.env.default
@@ -37,10 +37,6 @@ JWT_VERIFY_KEY=your-super-secret-jwt-token-with-at-least-32-characters-long
 ENCRYPTION_KEY=dvziYgz0KSK8FENhju0ZYi8-fRTfAdlz6YLhdB_jhNw=
 UNSUBSCRIBE_SECRET_KEY=HlP8ivStJjmbf6NKi78m_3FnOogut0t5ckzjsIqeaio=

-## ===== SIGNUP / INVITE GATE ===== ##
-# Set to true to require an invite before users can sign up
-ENABLE_INVITE_GATE=false
-
 ## ===== IMPORTANT OPTIONAL CONFIGURATION ===== ##
 # Platform URLs (set these for webhooks and OAuth to work)
 PLATFORM_BASE_URL=http://localhost:8000
--- a/autogpt_platform/backend/CLAUDE.md
+++ b/autogpt_platform/backend/CLAUDE.md
@@ -85,6 +85,30 @@ poetry run pytest path/to/test.py --snapshot-update
 - After refactoring, update mock targets to match new module paths
 - Use `AsyncMock` for async functions (`from unittest.mock import AsyncMock`)

+### Test-Driven Development (TDD)
+
+When fixing a bug or adding a feature, write the test **before** the implementation:
+
+```python
+# 1. Write a failing test marked xfail
+@pytest.mark.xfail(reason="Bug #1234: widget crashes on empty input")
+def test_widget_handles_empty_input():
+    result = widget.process("")
+    assert result == Widget.EMPTY_RESULT
+
+# 2. Run it — confirm it fails (XFAIL)
+# poetry run pytest path/to/test.py::test_widget_handles_empty_input -xvs
+
+# 3. Implement the fix
+
+# 4. Remove xfail, run again — confirm it passes
+def test_widget_handles_empty_input():
+    result = widget.process("")
+    assert result == Widget.EMPTY_RESULT
+```
+
+This catches regressions and proves the fix actually works. **Every bug fix should include a test that would have caught it.**
+
 ## Database Schema

 Key models (defined in `schema.prisma`):
--- a/autogpt_platform/backend/Dockerfile
+++ b/autogpt_platform/backend/Dockerfile
@@ -50,7 +50,7 @@ RUN poetry install --no-ansi --no-root
 # Generate Prisma client
 COPY autogpt_platform/backend/schema.prisma ./
 COPY autogpt_platform/backend/backend/data/partial_types.py ./backend/data/partial_types.py
-COPY autogpt_platform/backend/gen_prisma_types_stub.py ./
+COPY autogpt_platform/backend/scripts/gen_prisma_types_stub.py ./scripts/
 RUN poetry run prisma generate && poetry run gen-prisma-stub

 # =============================== DB MIGRATOR =============================== #
@@ -82,7 +82,7 @@ RUN pip3 install prisma>=0.15.0 --break-system-packages

 COPY autogpt_platform/backend/schema.prisma ./
 COPY autogpt_platform/backend/backend/data/partial_types.py ./backend/data/partial_types.py
-COPY autogpt_platform/backend/gen_prisma_types_stub.py ./
+COPY autogpt_platform/backend/scripts/gen_prisma_types_stub.py ./scripts/
 COPY autogpt_platform/backend/migrations ./migrations

 # ============================== BACKEND SERVER ============================== #
@@ -121,19 +121,37 @@ RUN ln -s ../lib/node_modules/npm/bin/npm-cli.js /usr/bin/npm \
    && ln -s ../lib/node_modules/npm/bin/npx-cli.js /usr/bin/npx
 COPY --from=builder /root/.cache/prisma-python/binaries /root/.cache/prisma-python/binaries

-# Install agent-browser (Copilot browser tool) + Chromium runtime dependencies.
-# These are the runtime libraries Chromium/Playwright needs on Debian 13 (trixie).
-RUN apt-get update && apt-get install -y --no-install-recommends \
-    libnss3 libnspr4 libatk1.0-0 libatk-bridge2.0-0 libcups2 libdrm2 \
-    libdbus-1-3 libxkbcommon0 libatspi2.0-0t64 libxcomposite1 libxdamage1 \
-    libxfixes3 libxrandr2 libgbm1 libasound2t64 libpango-1.0-0 libcairo2 \
-    libx11-6 libx11-xcb1 libxcb1 libxext6 libglib2.0-0t64 \
-    fonts-liberation libfontconfig1 \
+# Install agent-browser (Copilot browser tool) + Chromium.
+# On amd64: install runtime libs + run `agent-browser install` to download
+#   Chrome for Testing (pinned version, tested with Playwright).
+# On arm64: install system chromium package — Chrome for Testing has no ARM64
+#   binary. AGENT_BROWSER_EXECUTABLE_PATH is set at runtime by the entrypoint
+#   script (below) to redirect agent-browser to the system binary.
+ARG TARGETARCH
+RUN apt-get update \
+    && if [ "$TARGETARCH" = "arm64" ]; then \
+         apt-get install -y --no-install-recommends chromium fonts-liberation; \
+       else \
+         apt-get install -y --no-install-recommends \
+           libnss3 libnspr4 libatk1.0-0 libatk-bridge2.0-0 libcups2 libdrm2 \
+           libdbus-1-3 libxkbcommon0 libatspi2.0-0t64 libxcomposite1 libxdamage1 \
+           libxfixes3 libxrandr2 libgbm1 libasound2t64 libpango-1.0-0 libcairo2 \
+           libx11-6 libx11-xcb1 libxcb1 libxext6 libglib2.0-0t64 \
+           fonts-liberation libfontconfig1; \
+       fi \
    && rm -rf /var/lib/apt/lists/* \
    && npm install -g agent-browser \
-    && agent-browser install \
+    && ([ "$TARGETARCH" = "arm64" ] || agent-browser install) \
    && rm -rf /tmp/* /root/.npm

+# On arm64 the system chromium is at /usr/bin/chromium; set
+# AGENT_BROWSER_EXECUTABLE_PATH so agent-browser's daemon uses it instead of
+# Chrome for Testing (which has no ARM64 binary). On amd64 the variable is left
+# unset so agent-browser uses the Chrome for Testing binary it downloaded above.
+RUN printf '#!/bin/sh\n[ -x /usr/bin/chromium ] && export AGENT_BROWSER_EXECUTABLE_PATH=/usr/bin/chromium\nexec "$@"\n' \
+    > /usr/local/bin/entrypoint.sh \
+    && chmod +x /usr/local/bin/entrypoint.sh
+
 WORKDIR /app/autogpt_platform/backend

 # Copy only the .venv from builder (not the entire /app directory)
@@ -155,4 +173,5 @@ RUN POETRY_VIRTUALENVS_CREATE=true POETRY_VIRTUALENVS_IN_PROJECT=true \

 ENV PORT=8000

+ENTRYPOINT ["/usr/local/bin/entrypoint.sh"]
 CMD ["rest"]
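
The generated `entrypoint.sh` is compressed into a single `printf` string, so its decision logic is easy to misread. The sketch below restates that logic in Python purely as an illustration: if a system Chromium binary is present and executable, point `AGENT_BROWSER_EXECUTABLE_PATH` at it, otherwise leave the environment untouched. The function name `browser_env` is hypothetical, and the check is slightly stricter than the shell's `[ -x ... ]` because it also requires a regular file.

```python
import os
from typing import Mapping

# Path the Dockerfile comment gives for the arm64 system package.
SYSTEM_CHROMIUM = "/usr/bin/chromium"


def browser_env(
    env: Mapping[str, str], chromium_path: str = SYSTEM_CHROMIUM
) -> dict[str, str]:
    """Return a copy of `env`, adding the override only when Chromium is usable."""
    result = dict(env)
    if os.path.isfile(chromium_path) and os.access(chromium_path, os.X_OK):
        # System Chromium found: redirect agent-browser to it.
        result["AGENT_BROWSER_EXECUTABLE_PATH"] = chromium_path
    # Otherwise the variable is left as it was, so agent-browser falls back
    # to whatever browser it downloaded at build time.
    return result
```

Because the check depends on the file rather than on `TARGETARCH`, the same entrypoint works on both architectures: on amd64 the system package is not installed, so the override is never set.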
--- a/autogpt_platform/backend/backend/api/features/admin/model.py
+++ b/autogpt_platform/backend/backend/api/features/admin/model.py
@@ -1,17 +1,8 @@
-from __future__ import annotations
-
-from datetime import datetime
-from typing import TYPE_CHECKING, Any, Literal, Optional
-
-import prisma.enums
-from pydantic import BaseModel, EmailStr
+from pydantic import BaseModel

 from backend.data.model import UserTransaction
 from backend.util.models import Pagination

-if TYPE_CHECKING:
-    from backend.data.invited_user import BulkInvitedUsersResult, InvitedUserRecord
-

 class UserHistoryResponse(BaseModel):
    """Response model for listings with version history"""
@@ -23,70 +14,3 @@ class UserHistoryResponse(BaseModel):
 class AddUserCreditsResponse(BaseModel):
    new_balance: int
    transaction_key: str
-
-
-class CreateInvitedUserRequest(BaseModel):
-    email: EmailStr
-    name: Optional[str] = None
-
-
-class InvitedUserResponse(BaseModel):
-    id: str
-    email: str
-    status: prisma.enums.InvitedUserStatus
-    auth_user_id: Optional[str] = None
-    name: Optional[str] = None
-    tally_understanding: Optional[dict[str, Any]] = None
-    tally_status: prisma.enums.TallyComputationStatus
-    tally_computed_at: Optional[datetime] = None
-    tally_error: Optional[str] = None
-    created_at: datetime
-    updated_at: datetime
-
-    @classmethod
-    def from_record(cls, record: InvitedUserRecord) -> InvitedUserResponse:
-        return cls.model_validate(record.model_dump())
-
-
-class InvitedUsersResponse(BaseModel):
-    invited_users: list[InvitedUserResponse]
-    pagination: Pagination
-
-
-class BulkInvitedUserRowResponse(BaseModel):
-    row_number: int
-    email: Optional[str] = None
-    name: Optional[str] = None
-    status: Literal["CREATED", "SKIPPED", "ERROR"]
-    message: str
-    invited_user: Optional[InvitedUserResponse] = None
-
-
-class BulkInvitedUsersResponse(BaseModel):
-    created_count: int
-    skipped_count: int
-    error_count: int
-    results: list[BulkInvitedUserRowResponse]
-
-    @classmethod
-    def from_result(cls, result: BulkInvitedUsersResult) -> BulkInvitedUsersResponse:
-        return cls(
-            created_count=result.created_count,
-            skipped_count=result.skipped_count,
-            error_count=result.error_count,
-            results=[
-                BulkInvitedUserRowResponse(
-                    row_number=row.row_number,
-                    email=row.email,
-                    name=row.name,
-                    status=row.status,
-                    message=row.message,
-                    invited_user=(
-                        InvitedUserResponse.from_record(row.invited_user)
-                        if row.invited_user is not None
-                        else None
-                    ),
-                )
-                for row in result.results
-            ],
-        )
--- a/autogpt_platform/backend/backend/api/features/admin/user_admin_routes.py
+++ b/autogpt_platform/backend/backend/api/features/admin/user_admin_routes.py
@@ -1,137 +0,0 @@
-import logging
-import math
-
-from autogpt_libs.auth import get_user_id, requires_admin_user
-from fastapi import APIRouter, File, Query, Security, UploadFile
-
-from backend.data.invited_user import (
-    bulk_create_invited_users_from_file,
-    create_invited_user,
-    list_invited_users,
-    retry_invited_user_tally,
-    revoke_invited_user,
-)
-from backend.data.tally import mask_email
-from backend.util.models import Pagination
-
-from .model import (
-    BulkInvitedUsersResponse,
-    CreateInvitedUserRequest,
-    InvitedUserResponse,
-    InvitedUsersResponse,
-)
-
-logger = logging.getLogger(__name__)
-
-
-router = APIRouter(
-    prefix="/admin",
-    tags=["users", "admin"],
-    dependencies=[Security(requires_admin_user)],
-)
-
-
-@router.get(
-    "/invited-users",
-    response_model=InvitedUsersResponse,
-    summary="List Invited Users",
-)
-async def get_invited_users(
-    admin_user_id: str = Security(get_user_id),
-    page: int = Query(1, ge=1),
-    page_size: int = Query(50, ge=1, le=200),
-) -> InvitedUsersResponse:
-    logger.info("Admin user %s requested invited users", admin_user_id)
-    invited_users, total = await list_invited_users(page=page, page_size=page_size)
-    return InvitedUsersResponse(
-        invited_users=[InvitedUserResponse.from_record(iu) for iu in invited_users],
-        pagination=Pagination(
-            total_items=total,
-            total_pages=max(1, math.ceil(total / page_size)),
-            current_page=page,
-            page_size=page_size,
-        ),
-    )
-
-
-@router.post(
-    "/invited-users",
-    response_model=InvitedUserResponse,
-    summary="Create Invited User",
-)
-async def create_invited_user_route(
-    request: CreateInvitedUserRequest,
-    admin_user_id: str = Security(get_user_id),
-) -> InvitedUserResponse:
-    logger.info(
-        "Admin user %s creating invited user for %s",
-        admin_user_id,
-        mask_email(request.email),
-    )
-    invited_user = await create_invited_user(request.email, request.name)
-    logger.info(
-        "Admin user %s created invited user %s",
-        admin_user_id,
-        invited_user.id,
-    )
-    return InvitedUserResponse.from_record(invited_user)
-
-
-@router.post(
-    "/invited-users/bulk",
-    response_model=BulkInvitedUsersResponse,
-    summary="Bulk Create Invited Users",
-    operation_id="postV2BulkCreateInvitedUsers",
-)
-async def bulk_create_invited_users_route(
-    file: UploadFile = File(...),
-    admin_user_id: str = Security(get_user_id),
-) -> BulkInvitedUsersResponse:
-    logger.info(
-        "Admin user %s bulk invited users from %s",
-        admin_user_id,
-        file.filename or "<unnamed>",
-    )
-    content = await file.read()
-    result = await bulk_create_invited_users_from_file(file.filename, content)
-    return BulkInvitedUsersResponse.from_result(result)
-
-
-@router.post(
-    "/invited-users/{invited_user_id}/revoke",
-    response_model=InvitedUserResponse,
-    summary="Revoke Invited User",
-)
-async def revoke_invited_user_route(
-    invited_user_id: str,
-    admin_user_id: str = Security(get_user_id),
-) -> InvitedUserResponse:
-    logger.info(
-        "Admin user %s revoking invited user %s", admin_user_id, invited_user_id
-    )
-    invited_user = await revoke_invited_user(invited_user_id)
-    logger.info("Admin user %s revoked invited user %s", admin_user_id, invited_user_id)
-    return InvitedUserResponse.from_record(invited_user)
-
-
-@router.post(
-    "/invited-users/{invited_user_id}/retry-tally",
-    response_model=InvitedUserResponse,
-    summary="Retry Invited User Tally",
-)
-async def retry_invited_user_tally_route(
-    invited_user_id: str,
-    admin_user_id: str = Security(get_user_id),
-) -> InvitedUserResponse:
-    logger.info(
-        "Admin user %s retrying Tally seed for invited user %s",
-        admin_user_id,
-        invited_user_id,
-    )
-    invited_user = await retry_invited_user_tally(invited_user_id)
-    logger.info(
-        "Admin user %s retried Tally seed for invited user %s",
-        admin_user_id,
-        invited_user_id,
-    )
-    return InvitedUserResponse.from_record(invited_user)
--- a/autogpt_platform/backend/backend/api/features/admin/user_admin_routes_test.py
+++ b/autogpt_platform/backend/backend/api/features/admin/user_admin_routes_test.py
@@ -1,168 +0,0 @@
-from datetime import datetime, timezone
-from unittest.mock import AsyncMock
-
-import fastapi
-import fastapi.testclient
-import prisma.enums
-import pytest
-import pytest_mock
-from autogpt_libs.auth.jwt_utils import get_jwt_payload
-
-from backend.data.invited_user import (
-    BulkInvitedUserRowResult,
-    BulkInvitedUsersResult,
-    InvitedUserRecord,
-)
-
-from .user_admin_routes import router as user_admin_router
-
-app = fastapi.FastAPI()
-app.include_router(user_admin_router)
-
-client = fastapi.testclient.TestClient(app)
-
-
-@pytest.fixture(autouse=True)
-def setup_app_admin_auth(mock_jwt_admin):
-    app.dependency_overrides[get_jwt_payload] = mock_jwt_admin["get_jwt_payload"]
-    yield
-    app.dependency_overrides.clear()
-
-
-def _sample_invited_user() -> InvitedUserRecord:
-    now = datetime.now(timezone.utc)
-    return InvitedUserRecord(
-        id="invite-1",
-        email="invited@example.com",
-        status=prisma.enums.InvitedUserStatus.INVITED,
-        auth_user_id=None,
-        name="Invited User",
-        tally_understanding=None,
-        tally_status=prisma.enums.TallyComputationStatus.PENDING,
-        tally_computed_at=None,
-        tally_error=None,
-        created_at=now,
-        updated_at=now,
-    )
-
-
-def _sample_bulk_invited_users_result() -> BulkInvitedUsersResult:
-    return BulkInvitedUsersResult(
-        created_count=1,
-        skipped_count=1,
-        error_count=0,
-        results=[
-            BulkInvitedUserRowResult(
-                row_number=1,
-                email="invited@example.com",
-                name=None,
-                status="CREATED",
-                message="Invite created",
-                invited_user=_sample_invited_user(),
-            ),
-            BulkInvitedUserRowResult(
-                row_number=2,
-                email="duplicate@example.com",
-                name=None,
-                status="SKIPPED",
-                message="An invited user with this email already exists",
-                invited_user=None,
-            ),
-        ],
-    )
-
-
-def test_get_invited_users(
-    mocker: pytest_mock.MockerFixture,
-) -> None:
-    mocker.patch(
-        "backend.api.features.admin.user_admin_routes.list_invited_users",
-        AsyncMock(return_value=([_sample_invited_user()], 1)),
-    )
-
-    response = client.get("/admin/invited-users")
-
-    assert response.status_code == 200
-    data = response.json()
-    assert len(data["invited_users"]) == 1
-    assert data["invited_users"][0]["email"] == "invited@example.com"
-    assert data["invited_users"][0]["status"] == "INVITED"
-    assert data["pagination"]["total_items"] == 1
-    assert data["pagination"]["current_page"] == 1
-    assert data["pagination"]["page_size"] == 50
-
-
-def test_create_invited_user(
-    mocker: pytest_mock.MockerFixture,
-) -> None:
-    mocker.patch(
-        "backend.api.features.admin.user_admin_routes.create_invited_user",
-        AsyncMock(return_value=_sample_invited_user()),
-    )
-
-    response = client.post(
-        "/admin/invited-users",
-        json={"email": "invited@example.com", "name": "Invited User"},
-    )
-
-    assert response.status_code == 200
-    data = response.json()
-    assert data["email"] == "invited@example.com"
-    assert data["name"] == "Invited User"
-
-
-def test_bulk_create_invited_users(
-    mocker: pytest_mock.MockerFixture,
-) -> None:
-    mocker.patch(
-        "backend.api.features.admin.user_admin_routes.bulk_create_invited_users_from_file",
-        AsyncMock(return_value=_sample_bulk_invited_users_result()),
-    )
-
-    response = client.post(
-        "/admin/invited-users/bulk",
-        files={
-            "file": ("invites.txt", b"invited@example.com\nduplicate@example.com\n")
-        },
-    )
-
-    assert response.status_code == 200
-    data = response.json()
-    assert data["created_count"] == 1
-    assert data["skipped_count"] == 1
-    assert data["results"][0]["status"] == "CREATED"
-    assert data["results"][1]["status"] == "SKIPPED"
-
-
-def test_revoke_invited_user(
-    mocker: pytest_mock.MockerFixture,
-) -> None:
-    revoked = _sample_invited_user().model_copy(
-        update={"status": prisma.enums.InvitedUserStatus.REVOKED}
-    )
-    mocker.patch(
-        "backend.api.features.admin.user_admin_routes.revoke_invited_user",
-        AsyncMock(return_value=revoked),
-    )
-
-    response = client.post("/admin/invited-users/invite-1/revoke")
-
-    assert response.status_code == 200
-    assert response.json()["status"] == "REVOKED"
-
-
-def test_retry_invited_user_tally(
-    mocker: pytest_mock.MockerFixture,
-) -> None:
-    retried = _sample_invited_user().model_copy(
-        update={"tally_status": prisma.enums.TallyComputationStatus.RUNNING}
-    )
-    mocker.patch(
-        "backend.api.features.admin.user_admin_routes.retry_invited_user_tally",
-        AsyncMock(return_value=retried),
-    )
-
-    response = client.post("/admin/invited-users/invite-1/retry-tally")
-
-    assert response.status_code == 200
-    assert response.json()["tally_status"] == "RUNNING"
--- a/autogpt_platform/backend/backend/api/features/builder/db.py
+++ b/autogpt_platform/backend/backend/api/features/builder/db.py
@@ -4,14 +4,12 @@ from difflib import SequenceMatcher
 from typing import Any, Sequence, get_args, get_origin

 import prisma
-from prisma.enums import ContentType
 from prisma.models import mv_suggested_blocks

 import backend.api.features.library.db as library_db
 import backend.api.features.library.model as library_model
 import backend.api.features.store.db as store_db
 import backend.api.features.store.model as store_model
-from backend.api.features.store.hybrid_search import unified_hybrid_search
 from backend.blocks import load_all_blocks
 from backend.blocks._base import (
    AnyBlockSchema,
@@ -24,6 +22,7 @@ from backend.blocks.llm import LlmModel
 from backend.integrations.providers import ProviderName
 from backend.util.cache import cached
 from backend.util.models import Pagination
+from backend.util.text import split_camelcase

 from .model import (
    BlockCategoryResponse,
@@ -271,7 +270,7 @@ async def _build_cached_search_results(

    # Use hybrid search when query is present, otherwise list all blocks
    if (include_blocks or include_integrations) and normalized_query:
-        block_results, block_total, integration_total = await _hybrid_search_blocks(
+        block_results, block_total, integration_total = await _text_search_blocks(
            query=search_query,
            include_blocks=include_blocks,
            include_integrations=include_integrations,
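
The hunks in this file replace embedding-backed hybrid search with in-memory text matching over block names and descriptions. The sketch below illustrates the general idea under stated assumptions: `split_camelcase` and `score_fields` here are simplified stand-ins written for this example, and the real `backend.util.text.split_camelcase` and `_score_primary_fields` are not shown in this diff and may behave differently. The weights are illustrative only.

```python
import re


def split_camelcase(text: str) -> str:
    """Stand-in splitter: insert a space at each lower/digit-to-upper boundary."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", text)


def score_fields(name: str, description: str, query: str) -> float:
    """Toy relevance score over already-lowercased fields."""
    if not query:
        return 0.0
    if name == query:
        return 100.0
    if name.startswith(query):
        return 80.0
    if query in name:
        return 60.0
    if query in description:
        return 30.0
    return 0.0


def search(
    blocks: list[tuple[str, str]], query: str, min_score: float = 1.0
) -> list[tuple[str, float]]:
    """Score (name, description) pairs and return matches, best first."""
    normalized_query = query.strip().lower()
    scored = []
    for raw_name, description in blocks:
        # Splitting first lets a query like "send email" match "SendEmailBlock".
        name = split_camelcase(raw_name).lower()
        score = score_fields(name, description.lower(), normalized_query)
        if score >= min_score:
            scored.append((raw_name, score))
    # Highest score first; name as a deterministic tie-breaker.
    return sorted(scored, key=lambda item: (-item[1], item[0]))
```

Without the camel-case split, the lowercased name `sendemailblock` contains no space, so a two-word query such as `send email` would never match it as a substring.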
@@ -383,117 +382,75 @@ def _collect_block_results(
    return results, block_count, integration_count


-async def _hybrid_search_blocks(
+async def _text_search_blocks(
    *,
    query: str,
    include_blocks: bool,
    include_integrations: bool,
 ) -> tuple[list[_ScoredItem], int, int]:
    """
-    Search blocks using hybrid search with builder-specific filtering.
+    Search blocks using in-memory text matching over the block registry.

-    Uses unified_hybrid_search for semantic + lexical search, then applies
-    post-filtering for block/integration types and scoring adjustments.
+    All blocks are already loaded in memory, so this is fast and reliable
+    regardless of whether OpenAI embeddings are available.

    Scoring:
-        - Base: hybrid relevance score (0-1) scaled to 0-100, plus BLOCK_SCORE_BOOST
+        - Base: text relevance via _score_primary_fields, plus BLOCK_SCORE_BOOST
          to prioritize blocks over marketplace agents in combined results
-        - +30 for exact name match, +15 for prefix name match
        - +20 if the block has an LlmModel field and the query matches an LLM model name
-
-    Args:
-        query: The search query string
-        include_blocks: Whether to include regular blocks
-        include_integrations: Whether to include integration blocks
-
-    Returns:
-        Tuple of (scored_items, block_count, integration_count)
    """
    results: list[_ScoredItem] = []
-    block_count = 0
-    integration_count = 0

    if not include_blocks and not include_integrations:
-        return results, block_count, integration_count
+        return results, 0, 0

    normalized_query = query.strip().lower()

-    # Fetch more results to account for post-filtering
-    search_results, _ = await unified_hybrid_search(
-        query=query,
-        content_types=[ContentType.BLOCK],
-        page=1,
-        page_size=150,
-        min_score=0.10,
+    all_results, _, _ = _collect_block_results(
+        include_blocks=include_blocks,
+        include_integrations=include_integrations,
    )

-    # Load all blocks for getting BlockInfo
    all_blocks = load_all_blocks()

-    for result in search_results:
-        block_id = result["content_id"]
+    for item in all_results:
+        block_info = item.item
+        assert isinstance(block_info, BlockInfo)
+        name = split_camelcase(block_info.name).lower()

-        # Skip excluded blocks
-        if block_id in EXCLUDED_BLOCK_IDS:
-            continue
+        # Build rich description including input field descriptions,
+        # matching the searchable text that the embedding pipeline uses
+        desc_parts = [block_info.description or ""]
+        block_cls = all_blocks.get(block_info.id)
+        if block_cls is not None:
+            block: AnyBlockSchema = block_cls()
+            desc_parts += [
+                f"{f}: {info.description}"
+                for f, info in block.input_schema.model_fields.items()
+                if info.description
+            ]
+        description = " ".join(desc_parts).lower()

-        metadata = result.get("metadata", {})
-        hybrid_score = result.get("relevance", 0.0)
-
-        # Get the actual block class
-        if block_id not in all_blocks:
-            continue
-
-        block_cls = all_blocks[block_id]
-        block: AnyBlockSchema = block_cls()
-
-        if block.disabled:
-            continue
-
-        # Check block/integration filter using metadata
-        is_integration = metadata.get("is_integration", False)
-
-        if is_integration and not include_integrations:
-            continue
-        if not is_integration and not include_blocks:
-            continue
-
-        # Get block info
-        block_info = block.get_info()
-
-        # Calculate final score: scale hybrid score and add builder-specific bonuses
-        # Hybrid scores are 0-1, builder scores were 0-200+
-        # Add BLOCK_SCORE_BOOST to prioritize blocks over marketplace agents
-        final_score = hybrid_score * 100 + BLOCK_SCORE_BOOST
+        score = _score_primary_fields(name, description, normalized_query)

        # Add LLM model match bonus
-        has_llm_field = metadata.get("has_llm_model_field", False)
-        if has_llm_field and _matches_llm_model(block.input_schema, normalized_query):
-            final_score += 20
+        if block_cls is not None and _matches_llm_model(
+            block_cls().input_schema, normalized_query
+        ):
+            score += 20

-        # Add exact/prefix match bonus for deterministic tie-breaking
-        name = block_info.name.lower()
-        if name == normalized_query:
-            final_score += 30
-        elif name.startswith(normalized_query):
-            final_score += 15
-
-        # Track counts
-        filter_type: FilterType = "integrations" if is_integration else "blocks"
-        if is_integration:
-            integration_count += 1
-        else:
-            block_count += 1
-
-        results.append(
-            _ScoredItem(
-                item=block_info,
-                filter_type=filter_type,
-                score=final_score,
-                sort_key=name,
+        if score >= MIN_SCORE_FOR_FILTERED_RESULTS:
+            results.append(
+                _ScoredItem(
+                    item=block_info,
+                    filter_type=item.filter_type,
+                    score=score + BLOCK_SCORE_BOOST,
+                    sort_key=name,
+                )
            )
-        )

+    block_count = sum(1 for r in results if r.filter_type == "blocks")
+    integration_count = sum(1 for r in results if r.filter_type == "integrations")
    return results, block_count, integration_count


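Note on the hunk above: the rewritten loop scores each block against a "rich" description that joins the block's own description with its input field descriptions. A minimal sketch of that text-building step follows, assuming a hypothetical `FieldInfo` stand-in for the Pydantic field objects and a hypothetical helper name `build_searchable_description`; neither name appears in the change itself.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FieldInfo:
    """Hypothetical stand-in for a schema field; only the description is used."""

    description: Optional[str]


def build_searchable_description(
    block_description: Optional[str], fields: Dict[str, FieldInfo]
) -> str:
    """Join the block description with "<field>: <description>" entries.

    Fields without a description are skipped, mirroring the
    `if info.description` filter in the diff.
    """
    parts = [block_description or ""]
    parts += [
        f"{name}: {info.description}"
        for name, info in fields.items()
        if info.description
    ]
    return " ".join(parts).lower()


text = build_searchable_description(
    "Sends Email",
    {"to": FieldInfo("Recipient address"), "cc": FieldInfo(None)},
)
```

Because the result is lowercased once here, the scoring step can compare it directly against the already-normalized query.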
--- a/autogpt_platform/backend/backend/api/features/chat/routes.py
+++ b/autogpt_platform/backend/backend/api/features/chat/routes.py
@@ -60,7 +60,6 @@ from backend.copilot.tools.models import (
 )
 from backend.copilot.tracking import track_user_message
 from backend.data.redis_client import get_redis_async
-from backend.data.understanding import get_business_understanding
 from backend.data.workspace import get_or_create_workspace
 from backend.util.exceptions import NotFoundError

@@ -895,36 +894,6 @@ async def session_assign_user(
    return {"status": "ok"}


-# ========== Suggested Prompts ==========
-
-
-class SuggestedPromptsResponse(BaseModel):
-    """Response model for user-specific suggested prompts."""
-
-    prompts: list[str]
-
-
-@router.get(
-    "/suggested-prompts",
-    dependencies=[Security(auth.requires_user)],
-)
-async def get_suggested_prompts(
-    user_id: Annotated[str, Security(auth.get_user_id)],
-) -> SuggestedPromptsResponse:
-    """
-    Get LLM-generated suggested prompts for the authenticated user.
-
-    Returns personalized quick-action prompts based on the user's
-    business understanding. Returns an empty list if no custom prompts
-    are available.
-    """
-    understanding = await get_business_understanding(user_id)
-    if understanding is None:
-        return SuggestedPromptsResponse(prompts=[])
-
-    return SuggestedPromptsResponse(prompts=understanding.suggested_prompts)
-
-
 # ========== Configuration ==========


--- a/autogpt_platform/backend/backend/api/features/chat/routes_test.py
+++ b/autogpt_platform/backend/backend/api/features/chat/routes_test.py
@@ -1,7 +1,7 @@
-"""Tests for chat API routes: session title update, file attachment validation, usage, rate limiting, and suggested prompts."""
+"""Tests for chat API routes: session title update, file attachment validation, usage, and rate limiting."""

 from datetime import UTC, datetime, timedelta
-from unittest.mock import AsyncMock, MagicMock
+from unittest.mock import AsyncMock

 import fastapi
 import fastapi.testclient
@@ -400,62 +400,3 @@ def test_usage_rejects_unauthenticated_request() -> None:
    response = unauthenticated_client.get("/usage")

    assert response.status_code == 401
-
-
-# ─── Suggested prompts endpoint ──────────────────────────────────────
-
-
-def _mock_get_business_understanding(
-    mocker: pytest_mock.MockerFixture,
-    *,
-    return_value=None,
-):
-    """Mock get_business_understanding."""
-    return mocker.patch(
-        "backend.api.features.chat.routes.get_business_understanding",
-        new_callable=AsyncMock,
-        return_value=return_value,
-    )
-
-
-def test_suggested_prompts_returns_prompts(
-    mocker: pytest_mock.MockerFixture,
-    test_user_id: str,
-) -> None:
-    """User with understanding and prompts gets them back."""
-    mock_understanding = MagicMock()
-    mock_understanding.suggested_prompts = ["Do X", "Do Y", "Do Z"]
-    _mock_get_business_understanding(mocker, return_value=mock_understanding)
-
-    response = client.get("/suggested-prompts")
-
-    assert response.status_code == 200
-    assert response.json() == {"prompts": ["Do X", "Do Y", "Do Z"]}
-
-
-def test_suggested_prompts_no_understanding(
-    mocker: pytest_mock.MockerFixture,
-    test_user_id: str,
-) -> None:
-    """User with no understanding gets empty list."""
-    _mock_get_business_understanding(mocker, return_value=None)
-
-    response = client.get("/suggested-prompts")
-
-    assert response.status_code == 200
-    assert response.json() == {"prompts": []}
-
-
-def test_suggested_prompts_empty_prompts(
-    mocker: pytest_mock.MockerFixture,
-    test_user_id: str,
-) -> None:
-    """User with understanding but no prompts gets empty list."""
-    mock_understanding = MagicMock()
-    mock_understanding.suggested_prompts = []
-    _mock_get_business_understanding(mocker, return_value=mock_understanding)
-
-    response = client.get("/suggested-prompts")
-
-    assert response.status_code == 200
-    assert response.json() == {"prompts": []}
--- a/autogpt_platform/backend/backend/api/features/store/db.py
+++ b/autogpt_platform/backend/backend/api/features/store/db.py
@@ -9,7 +9,7 @@ import prisma.errors
 import prisma.models
 import prisma.types

-from backend.data.db import transaction
+from backend.data.db import query_raw_with_schema, transaction
 from backend.data.graph import (
    GraphModel,
    GraphModelWithoutNodes,
@@ -104,7 +104,8 @@ async def get_store_agents(
                # search_used_hybrid remains False, will use fallback path below

            # Convert hybrid search results (dict format) if hybrid succeeded
-            if search_used_hybrid:
+            # Fall through to direct DB search if hybrid returned nothing
+            if search_used_hybrid and agents:
                total_pages = (total + page_size - 1) // page_size
                store_agents: list[store_model.StoreAgent] = []
                for agent in agents:
@@ -130,52 +131,20 @@ async def get_store_agents(
                        )
                        continue

-        if not search_used_hybrid:
-            # Fallback path - use basic search or no search
-            where_clause: prisma.types.StoreAgentWhereInput = {"is_available": True}
-            if featured:
-                where_clause["featured"] = featured
-            if creators:
-                where_clause["creator_username"] = {"in": creators}
-            if category:
-                where_clause["categories"] = {"has": category}
-
-            # Add basic text search if search_query provided but hybrid failed
-            if search_query:
-                where_clause["OR"] = [
-                    {"agent_name": {"contains": search_query, "mode": "insensitive"}},
-                    {"sub_heading": {"contains": search_query, "mode": "insensitive"}},
-                    {"description": {"contains": search_query, "mode": "insensitive"}},
-                ]
-
-            order_by = []
-            if sorted_by == StoreAgentsSortOptions.RATING:
-                order_by.append({"rating": "desc"})
-            elif sorted_by == StoreAgentsSortOptions.RUNS:
-                order_by.append({"runs": "desc"})
-            elif sorted_by == StoreAgentsSortOptions.NAME:
-                order_by.append({"agent_name": "asc"})
-            elif sorted_by == StoreAgentsSortOptions.UPDATED_AT:
-                order_by.append({"updated_at": "desc"})
-
-            db_agents = await prisma.models.StoreAgent.prisma().find_many(
-                where=where_clause,
-                order=order_by,
-                skip=(page - 1) * page_size,
-                take=page_size,
+        if not search_used_hybrid or not agents:
+            # Fallback path: direct DB query with optional tsvector search.
+            # This mirrors the original pre-hybrid-search implementation.
+            store_agents, total = await _fallback_store_agent_search(
+                search_query=search_query,
+                featured=featured,
+                creators=creators,
+                category=category,
+                sorted_by=sorted_by,
+                page=page,
+                page_size=page_size,
            )
-
-            total = await prisma.models.StoreAgent.prisma().count(where=where_clause)
            total_pages = (total + page_size - 1) // page_size

-            store_agents: list[store_model.StoreAgent] = []
-            for agent in db_agents:
-                try:
-                    store_agents.append(store_model.StoreAgent.from_db(agent))
-                except Exception as e:
-                    logger.error(f"Error parsing StoreAgent from db: {e}")
-                    continue
-
        logger.debug(f"Found {len(store_agents)} agents")
        return store_model.StoreAgentsResponse(
            agents=store_agents,
@@ -195,6 +164,126 @@ async def get_store_agents(
    #         await log_search_term(search_query=search_term)


+async def _fallback_store_agent_search(
+    *,
+    search_query: str | None,
+    featured: bool,
+    creators: list[str] | None,
+    category: str | None,
+    sorted_by: StoreAgentsSortOptions | None,
+    page: int,
+    page_size: int,
+) -> tuple[list[store_model.StoreAgent], int]:
+    """Direct DB search fallback when hybrid search is unavailable or empty.
+
+    Uses ad-hoc to_tsvector/plainto_tsquery with ts_rank_cd for text search,
+    matching the quality of the original pre-hybrid-search implementation.
+    Falls back to simple listing when no search query is provided.
+    """
+    if not search_query:
+        # No search query — use Prisma for simple filtered listing
+        where_clause: prisma.types.StoreAgentWhereInput = {"is_available": True}
+        if featured:
+            where_clause["featured"] = featured
+        if creators:
+            where_clause["creator_username"] = {"in": creators}
+        if category:
+            where_clause["categories"] = {"has": category}
+
+        order_by = []
+        if sorted_by == StoreAgentsSortOptions.RATING:
+            order_by.append({"rating": "desc"})
+        elif sorted_by == StoreAgentsSortOptions.RUNS:
+            order_by.append({"runs": "desc"})
+        elif sorted_by == StoreAgentsSortOptions.NAME:
+            order_by.append({"agent_name": "asc"})
+        elif sorted_by == StoreAgentsSortOptions.UPDATED_AT:
+            order_by.append({"updated_at": "desc"})
+
+        db_agents = await prisma.models.StoreAgent.prisma().find_many(
+            where=where_clause,
+            order=order_by,
+            skip=(page - 1) * page_size,
+            take=page_size,
+        )
+        total = await prisma.models.StoreAgent.prisma().count(where=where_clause)
+        return [store_model.StoreAgent.from_db(a) for a in db_agents], total
+
+    # Text search using ad-hoc tsvector on StoreAgent view fields
+    params: list[Any] = [search_query]
+    filters = ["sa.is_available = true"]
+    param_idx = 2
+
+    if featured:
+        filters.append("sa.featured = true")
+    if creators:
+        params.append(creators)
+        filters.append(f"sa.creator_username = ANY(${param_idx})")
+        param_idx += 1
+    if category:
+        params.append(category)
+        filters.append(f"${param_idx} = ANY(sa.categories)")
+        param_idx += 1
+
+    where_sql = " AND ".join(filters)
+
+    params.extend([page_size, (page - 1) * page_size])
+    limit_param = f"${param_idx}"
+    param_idx += 1
+    offset_param = f"${param_idx}"
+
+    sql = f"""
+        WITH ranked AS (
+            SELECT sa.*,
+                ts_rank_cd(
+                    to_tsvector('english',
+                        COALESCE(sa.agent_name, '') || ' ' ||
+                        COALESCE(sa.sub_heading, '') || ' ' ||
+                        COALESCE(sa.description, '')
+                    ),
+                    plainto_tsquery('english', $1)
+                ) AS rank,
+                COUNT(*) OVER () AS total_count
+            FROM {{schema_prefix}}"StoreAgent" sa
+            WHERE {where_sql}
+            AND to_tsvector('english',
+                    COALESCE(sa.agent_name, '') || ' ' ||
+                    COALESCE(sa.sub_heading, '') || ' ' ||
+                    COALESCE(sa.description, '')
+                ) @@ plainto_tsquery('english', $1)
+        )
+        SELECT * FROM ranked
+        ORDER BY rank DESC
+        LIMIT {limit_param} OFFSET {offset_param}
+    """
+
+    results = await query_raw_with_schema(sql, *params)
+    total = results[0]["total_count"] if results else 0
+
+    store_agents = []
+    for row in results:
+        try:
+            store_agents.append(
+                store_model.StoreAgent(
+                    slug=row["slug"],
+                    agent_name=row["agent_name"],
+                    agent_image=row["agent_image"][0] if row["agent_image"] else "",
+                    creator=row["creator_username"] or "Needs Profile",
+                    creator_avatar=row["creator_avatar"] or "",
+                    sub_heading=row["sub_heading"],
+                    description=row["description"],
+                    runs=row["runs"],
+                    rating=row["rating"],
+                    agent_graph_id=row.get("graph_id", ""),
+                )
+            )
+        except Exception as e:
+            logger.error(f"Error parsing StoreAgent from fallback search: {e}")
+            continue
+
+    return store_agents, total
+
+
 async def log_search_term(search_query: str):
    """Log a search term to the database"""

@@ -1139,16 +1228,21 @@ async def review_store_submission(
                    },
                )

-                # Generate embedding for approved listing (blocking - admin operation)
-                # Inside transaction: if embedding fails, entire transaction rolls back
-                await ensure_embedding(
-                    version_id=store_listing_version_id,
-                    name=submission.name,
-                    description=submission.description,
-                    sub_heading=submission.subHeading,
-                    categories=submission.categories,
-                    tx=tx,
-                )
+                # Generate embedding for approved listing (best-effort)
+                try:
+                    await ensure_embedding(
+                        version_id=store_listing_version_id,
+                        name=submission.name,
+                        description=submission.description,
+                        sub_heading=submission.subHeading,
+                        categories=submission.categories,
+                        tx=tx,
+                    )
+                except Exception as emb_err:
+                    logger.warning(
+                        f"Could not generate embedding for listing "
+                        f"{store_listing_version_id}: {emb_err}"
+                    )

                await prisma.models.StoreListing.prisma(tx).update(
                    where={"id": submission.storeListingId},
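Note on `_fallback_store_agent_search` above: the raw SQL path reserves `$1` for the search text and numbers every later placeholder by its position in `params`. The sketch below shows the same bookkeeping in isolation. It is an illustrative variant rather than the code in the change: it derives each placeholder index from `len(params)` instead of a separate counter, and the helper names `build_fallback_filters` and `total_pages` are hypothetical.

```python
from typing import Any, List, Optional, Tuple


def build_fallback_filters(
    search_query: str,
    featured: bool,
    creators: Optional[List[str]],
    category: Optional[str],
    page: int,
    page_size: int,
) -> Tuple[str, str, str, List[Any]]:
    """Return (where_sql, limit_placeholder, offset_placeholder, params)."""
    params: List[Any] = [search_query]  # $1 is the search text
    filters = ["sa.is_available = true"]

    if featured:
        filters.append("sa.featured = true")  # literal, no parameter needed
    if creators:
        params.append(creators)
        filters.append(f"sa.creator_username = ANY(${len(params)})")
    if category:
        params.append(category)
        filters.append(f"${len(params)} = ANY(sa.categories)")

    # LIMIT and OFFSET always take the last two placeholder slots.
    params.extend([page_size, (page - 1) * page_size])
    limit_param = f"${len(params) - 1}"
    offset_param = f"${len(params)}"
    return " AND ".join(filters), limit_param, offset_param, params


def total_pages(total: int, page_size: int) -> int:
    """Ceiling division, as used for total_pages in get_store_agents."""
    return (total + page_size - 1) // page_size


where_sql, limit_param, offset_param, params = build_fallback_filters(
    "email", False, ["alice"], "ai", page=2, page_size=10
)
```

Only placeholder numbers are interpolated into the SQL string; user-supplied values stay in `params`, so the query remains parameterized.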
--- a/autogpt_platform/backend/backend/api/features/v1.py
+++ b/autogpt_platform/backend/backend/api/features/v1.py
@@ -55,7 +55,6 @@ from backend.data.credit import (
    set_auto_top_up,
 )
 from backend.data.graph import GraphSettings
-from backend.data.invited_user import get_or_activate_user
 from backend.data.model import CredentialsMetaInput, UserOnboarding
 from backend.data.notifications import NotificationPreference, NotificationPreferenceDTO
 from backend.data.onboarding import (
@@ -71,6 +70,7 @@ from backend.data.onboarding import (
    update_user_onboarding,
 )
 from backend.data.user import (
+    get_or_create_user,
    get_user_by_id,
    get_user_notification_preference,
    update_user_email,
@@ -136,10 +136,12 @@ _tally_background_tasks: set[asyncio.Task] = set()
    dependencies=[Security(requires_user)],
 )
 async def get_or_create_user_route(user_data: dict = Security(get_jwt_payload)):
-    user = await get_or_activate_user(user_data)
+    user = await get_or_create_user(user_data)

-    # Fire-and-forget: backfill Tally understanding when invite pre-seeding did
-    # not produce a stored result before first activation.
+    # Fire-and-forget: populate business understanding from Tally form.
+    # We use created_at proximity instead of an is_new flag because
+    # get_or_create_user is cached — a separate is_new return value would be
+    # unreliable on repeated calls within the cache TTL.
    age_seconds = (datetime.now(timezone.utc) - user.created_at).total_seconds()
    if age_seconds < 30:
        try:
@@ -163,8 +165,7 @@ async def get_or_create_user_route(user_data: dict = Security(get_jwt_payload)):
    dependencies=[Security(requires_user)],
 )
 async def update_user_email_route(
-    user_id: Annotated[str, Security(get_user_id)],
-    email: str = Body(...),
+    user_id: Annotated[str, Security(get_user_id)], email: str = Body(...)
 ) -> dict[str, str]:
    await update_user_email(user_id, email)

@@ -178,16 +179,10 @@ async def update_user_email_route(
    dependencies=[Security(requires_user)],
 )
 async def get_user_timezone_route(
-    user_id: Annotated[str, Security(get_user_id)],
+    user_data: dict = Security(get_jwt_payload),
 ) -> TimezoneResponse:
    """Get user timezone setting."""
-    try:
-        user = await get_user_by_id(user_id)
-    except ValueError:
-        raise HTTPException(
-            status_code=HTTP_404_NOT_FOUND,
-            detail="User not found. Please complete activation via /auth/user first.",
-        )
+    user = await get_or_create_user(user_data)
    return TimezoneResponse(timezone=user.timezone)


@@ -198,8 +193,7 @@ async def get_user_timezone_route(
    dependencies=[Security(requires_user)],
 )
 async def update_user_timezone_route(
-    user_id: Annotated[str, Security(get_user_id)],
-    request: UpdateTimezoneRequest,
+    user_id: Annotated[str, Security(get_user_id)], request: UpdateTimezoneRequest
 ) -> TimezoneResponse:
    """Update user timezone. The timezone should be a valid IANA timezone identifier."""
    user = await update_user_timezone(user_id, str(request.timezone))
@@ -598,6 +592,11 @@ async def fulfill_checkout(user_id: Annotated[str, Security(get_user_id)]):
 async def configure_user_auto_top_up(
    request: AutoTopUpConfig, user_id: Annotated[str, Security(get_user_id)]
 ) -> str:
+    """Configure auto top-up settings and perform an immediate top-up if needed.
+
+    Raises HTTPException(422) if the request parameters are invalid or if the
+    credit top-up fails with a known error; other ValueErrors are re-raised.
+    """
    if request.threshold < 0:
        raise HTTPException(status_code=422, detail="Threshold must be greater than 0")
    if request.amount < 500 and request.amount != 0:
@@ -612,10 +611,20 @@ async def configure_user_auto_top_up(
    user_credit_model = await get_user_credit_model(user_id)
    current_balance = await user_credit_model.get_credits(user_id)

-    if current_balance < request.threshold:
-        await user_credit_model.top_up_credits(user_id, request.amount)
-    else:
-        await user_credit_model.top_up_credits(user_id, 0)
+    try:
+        if current_balance < request.threshold:
+            await user_credit_model.top_up_credits(user_id, request.amount)
+        else:
+            await user_credit_model.top_up_credits(user_id, 0)
+    except ValueError as e:
+        known_messages = (
+            "must not be negative",
+            "already exists for user",
+            "No payment method found",
+        )
+        if any(msg in str(e) for msg in known_messages):
+            raise HTTPException(status_code=422, detail=str(e))
+        raise

    await set_auto_top_up(
        user_id, AutoTopUpConfig(threshold=request.threshold, amount=request.amount)
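Note on the v1.py hunks above: two small decisions carry most of the logic. A user counts as "new" when `created_at` is under 30 seconds old, and a `ValueError` from the top-up becomes a 422 only when its message contains a known substring. The sketch below isolates both checks, assuming hypothetical helper names (`is_newly_created`, `is_known_top_up_error`) and a hypothetical constant name; the message substrings and the 30-second window are taken from the diff.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Substrings copied from the diff; the constant name is hypothetical.
KNOWN_TOP_UP_MESSAGES = (
    "must not be negative",
    "already exists for user",
    "No payment method found",
)


def is_known_top_up_error(exc: ValueError) -> bool:
    """True when the error should be reported as a 422 instead of re-raised."""
    return any(msg in str(exc) for msg in KNOWN_TOP_UP_MESSAGES)


def is_newly_created(
    created_at: datetime,
    now: Optional[datetime] = None,
    window_seconds: float = 30,
) -> bool:
    """True when the account was created within the window before `now`."""
    now = now or datetime.now(timezone.utc)
    return (now - created_at).total_seconds() < window_seconds


reference = datetime(2026, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
fresh = is_newly_created(reference - timedelta(seconds=5), now=reference)
stale = is_newly_created(reference - timedelta(minutes=5), now=reference)
```

Matching on message substrings is brittle if the upstream wording changes; a dedicated exception type would be sturdier, but that is outside what this change does.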
--- a/autogpt_platform/backend/backend/api/features/v1_test.py
+++ b/autogpt_platform/backend/backend/api/features/v1_test.py
@@ -51,7 +51,7 @@ def test_get_or_create_user_route(
    }

    mocker.patch(
-        "backend.api.features.v1.get_or_activate_user",
+        "backend.api.features.v1.get_or_create_user",
        return_value=mock_user,
    )

--- a/autogpt_platform/backend/backend/api/features/workspace/routes.py
+++ b/autogpt_platform/backend/backend/api/features/workspace/routes.py
@@ -188,6 +188,7 @@ async def upload_file(
    user_id: Annotated[str, fastapi.Security(get_user_id)],
    file: UploadFile,
    session_id: str | None = Query(default=None),
+    overwrite: bool = Query(default=False),
 ) -> UploadFileResponse:
    """
    Upload a file to the user's workspace.
@@ -248,7 +249,9 @@ async def upload_file(
    # Write file via WorkspaceManager
    manager = WorkspaceManager(user_id, workspace.id, session_id)
    try:
-        workspace_file = await manager.write_file(content, filename)
+        workspace_file = await manager.write_file(
+            content, filename, overwrite=overwrite
+        )
    except ValueError as e:
        raise fastapi.HTTPException(status_code=409, detail=str(e)) from e

--- a/autogpt_platform/backend/backend/api/rest_api.py
+++ b/autogpt_platform/backend/backend/api/rest_api.py
@@ -19,7 +19,6 @@ from prisma.errors import PrismaError
 import backend.api.features.admin.credit_admin_routes
 import backend.api.features.admin.execution_analytics_routes
 import backend.api.features.admin.store_admin_routes
-import backend.api.features.admin.user_admin_routes
 import backend.api.features.builder
 import backend.api.features.builder.routes
 import backend.api.features.chat.routes as chat_routes
@@ -211,13 +210,22 @@ instrument_fastapi(
 def handle_internal_http_error(status_code: int = 500, log_error: bool = True):
    def handler(request: fastapi.Request, exc: Exception):
        if log_error:
-            logger.exception(
-                "%s %s failed. Investigate and resolve the underlying issue: %s",
-                request.method,
-                request.url.path,
-                exc,
-                exc_info=exc,
-            )
+            if status_code >= 500:
+                logger.exception(
+                    "%s %s failed. Investigate and resolve the underlying issue: %s",
+                    request.method,
+                    request.url.path,
+                    exc,
+                    exc_info=exc,
+                )
+            else:
+                logger.warning(
+                    "%s %s failed with %d: %s",
+                    request.method,
+                    request.url.path,
+                    status_code,
+                    exc,
+                )

        hint = (
            "Adjust the request and retry."
@@ -267,12 +275,10 @@ async def validation_error_handler(


 app.add_exception_handler(PrismaError, handle_internal_http_error(500))
-app.add_exception_handler(
-    FolderAlreadyExistsError, handle_internal_http_error(409, False)
-)
-app.add_exception_handler(FolderValidationError, handle_internal_http_error(400, False))
-app.add_exception_handler(NotFoundError, handle_internal_http_error(404, False))
-app.add_exception_handler(NotAuthorizedError, handle_internal_http_error(403, False))
+app.add_exception_handler(FolderAlreadyExistsError, handle_internal_http_error(409))
+app.add_exception_handler(FolderValidationError, handle_internal_http_error(400))
+app.add_exception_handler(NotFoundError, handle_internal_http_error(404))
+app.add_exception_handler(NotAuthorizedError, handle_internal_http_error(403))
 app.add_exception_handler(RequestValidationError, validation_error_handler)
 app.add_exception_handler(pydantic.ValidationError, validation_error_handler)
 app.add_exception_handler(MissingConfigError, handle_internal_http_error(503))
@@ -312,11 +318,6 @@ app.include_router(
    tags=["v2", "admin"],
    prefix="/api/executions",
 )
-app.include_router(
-    backend.api.features.admin.user_admin_routes.router,
-    tags=["v2", "admin"],
-    prefix="/api/users",
-)
 app.include_router(
    backend.api.features.executions.review.routes.router,
    tags=["v2", "executions", "review"],
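Note on `handle_internal_http_error` above: after this change, 5xx responses are logged with a traceback at error level, while 4xx responses get a single warning line. A minimal sketch of that branching follows, using only the standard `logging` module; the function names `pick_log_level` and `log_handled_error` and the logger name are hypothetical, and the sketch uses one message format for both branches where the diff uses two.

```python
import logging

logger = logging.getLogger("rest_api_sketch")  # hypothetical logger name


def pick_log_level(status_code: int) -> int:
    """Server errors log at ERROR; client errors log at WARNING."""
    return logging.ERROR if status_code >= 500 else logging.WARNING


def log_handled_error(
    method: str, path: str, status_code: int, exc: Exception
) -> int:
    """Log the failure and return the level that was used."""
    level = pick_log_level(status_code)
    logger.log(
        level,
        "%s %s failed with %d: %s",
        method,
        path,
        status_code,
        exc,
        # Attach the exception details only for server-side failures.
        exc_info=exc if level >= logging.ERROR else None,
    )
    return level


client_level = log_handled_error("GET", "/api/folders", 404, ValueError("missing"))
server_level = log_handled_error("GET", "/api/folders", 500, RuntimeError("boom"))
```

This keeps expected client errors visible in logs without the noise of a full traceback for each one.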
--- a/autogpt_platform/backend/backend/blocks/ai_image_customizer.py
+++ b/autogpt_platform/backend/backend/blocks/ai_image_customizer.py
@@ -27,6 +27,7 @@ from backend.util.file import MediaFileType, store_media_file
 class GeminiImageModel(str, Enum):
    NANO_BANANA = "google/nano-banana"
    NANO_BANANA_PRO = "google/nano-banana-pro"
+    NANO_BANANA_2 = "google/nano-banana-2"


 class AspectRatio(str, Enum):
@@ -77,7 +78,7 @@ class AIImageCustomizerBlock(Block):
        )
        model: GeminiImageModel = SchemaField(
            description="The AI model to use for image generation and editing",
-            default=GeminiImageModel.NANO_BANANA,
+            default=GeminiImageModel.NANO_BANANA_2,
            title="Model",
        )
        images: list[MediaFileType] = SchemaField(
@@ -103,7 +104,7 @@ class AIImageCustomizerBlock(Block):
        super().__init__(
            id="d76bbe4c-930e-4894-8469-b66775511f71",
            description=(
-                "Generate and edit custom images using Google's Nano-Banana model from Gemini 2.5. "
+                "Generate and edit custom images using Google's Nano-Banana models from Gemini. "
                "Provide a prompt and optional reference images to create or modify images."
            ),
            categories={BlockCategory.AI, BlockCategory.MULTIMEDIA},
@@ -111,7 +112,7 @@ class AIImageCustomizerBlock(Block):
            output_schema=AIImageCustomizerBlock.Output,
            test_input={
                "prompt": "Make the scene more vibrant and colorful",
-                "model": GeminiImageModel.NANO_BANANA,
+                "model": GeminiImageModel.NANO_BANANA_2,
                "images": [],
                "aspect_ratio": AspectRatio.MATCH_INPUT_IMAGE,
                "output_format": OutputFormat.JPG,
--- a/autogpt_platform/backend/backend/blocks/ai_image_generator_block.py
+++ b/autogpt_platform/backend/backend/blocks/ai_image_generator_block.py
@@ -115,6 +115,7 @@ class ImageGenModel(str, Enum):
    RECRAFT = "Recraft v3"
    SD3_5 = "Stable Diffusion 3.5 Medium"
    NANO_BANANA_PRO = "Nano Banana Pro"
+    NANO_BANANA_2 = "Nano Banana 2"


 class AIImageGeneratorBlock(Block):
@@ -131,7 +132,7 @@ class AIImageGeneratorBlock(Block):
        )
        model: ImageGenModel = SchemaField(
            description="The AI model to use for image generation",
-            default=ImageGenModel.SD3_5,
+            default=ImageGenModel.NANO_BANANA_2,
            title="Model",
        )
        size: ImageSize = SchemaField(
@@ -165,7 +166,7 @@ class AIImageGeneratorBlock(Block):
            test_input={
                "credentials": TEST_CREDENTIALS_INPUT,
                "prompt": "An octopus using a laptop in a snowy forest with 'AutoGPT' clearly visible on the screen",
-                "model": ImageGenModel.RECRAFT,
+                "model": ImageGenModel.NANO_BANANA_2,
                "size": ImageSize.SQUARE,
                "style": ImageStyle.REALISTIC,
            },
@@ -179,7 +180,9 @@ class AIImageGeneratorBlock(Block):
            ],
            test_mock={
                # Return a data URI directly so store_media_file doesn't need to download
-                "_run_client": lambda *args, **kwargs: "data:image/webp;base64,UklGRiQAAABXRUJQVlA4IBgAAAAwAQCdASoBAAEAAQAcJYgCdAEO"
+                "_run_client": lambda *args, **kwargs: (
+                    "data:image/webp;base64,UklGRiQAAABXRUJQVlA4IBgAAAAwAQCdASoBAAEAAQAcJYgCdAEO"
+                )
            },
        )

@@ -280,17 +283,24 @@ class AIImageGeneratorBlock(Block):
                )
                return output

-            elif input_data.model == ImageGenModel.NANO_BANANA_PRO:
-                # Use Nano Banana Pro (Google Gemini 3 Pro Image)
+            elif input_data.model in (
+                ImageGenModel.NANO_BANANA_PRO,
+                ImageGenModel.NANO_BANANA_2,
+            ):
+                # Use Nano Banana models (Google Gemini image variants)
+                model_map = {
+                    ImageGenModel.NANO_BANANA_PRO: "google/nano-banana-pro",
+                    ImageGenModel.NANO_BANANA_2: "google/nano-banana-2",
+                }
                input_params = {
                    "prompt": modified_prompt,
                    "aspect_ratio": SIZE_TO_NANO_BANANA_RATIO[input_data.size],
-                    "resolution": "2K",  # Default to 2K for good quality/cost balance
+                    "resolution": "2K",
                    "output_format": "jpg",
-                    "safety_filter_level": "block_only_high",  # Most permissive
+                    "safety_filter_level": "block_only_high",
                }
                output = await self._run_client(
-                    credentials, "google/nano-banana-pro", input_params
+                    credentials, model_map[input_data.model], input_params
                )
                return output

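Note on the Nano Banana branch above: both enum members share one request shape and differ only in the model identifier, which the change looks up in a dict. The sketch below shows that dispatch on its own. The enum values and model identifiers are copied from the diff, while the trimmed-down enum, the constant name, and the helper `resolve_nano_banana_model` are hypothetical.

```python
from enum import Enum
from typing import Optional


class ImageGenModel(str, Enum):
    """Trimmed copy of the enum, limited to members needed for the sketch."""

    SD3_5 = "Stable Diffusion 3.5 Medium"
    NANO_BANANA_PRO = "Nano Banana Pro"
    NANO_BANANA_2 = "Nano Banana 2"


# Display-name enum member -> provider model identifier (from the diff).
NANO_BANANA_MODEL_IDS = {
    ImageGenModel.NANO_BANANA_PRO: "google/nano-banana-pro",
    ImageGenModel.NANO_BANANA_2: "google/nano-banana-2",
}


def resolve_nano_banana_model(model: ImageGenModel) -> Optional[str]:
    """Return the provider model id, or None for non-Nano-Banana models."""
    return NANO_BANANA_MODEL_IDS.get(model)


model_id = resolve_nano_banana_model(ImageGenModel.NANO_BANANA_2)
```

Adding another variant then needs only a new enum member and one dict entry, with no further branching.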
--- a/autogpt_platform/backend/backend/blocks/autopilot.py
+++ b/autogpt_platform/backend/backend/blocks/autopilot.py
@@ -0,0 +1,376 @@
+from __future__ import annotations
+
+import asyncio
+import contextvars
+import json
+import logging
+from typing import TYPE_CHECKING, Any
+
+from typing_extensions import TypedDict  # Needed for Python <3.12 compatibility
+
+from backend.blocks._base import (
+    Block,
+    BlockCategory,
+    BlockOutput,
+    BlockSchemaInput,
+    BlockSchemaOutput,
+)
+from backend.data.model import SchemaField
+
+if TYPE_CHECKING:
+    from backend.data.execution import ExecutionContext
+
+logger = logging.getLogger(__name__)
+
+# Block ID shared between autopilot.py and copilot prompting.py.
+AUTOPILOT_BLOCK_ID = "c069dc6b-c3ed-4c12-b6e5-d47361e64ce6"
+
+
+class ToolCallEntry(TypedDict):
+    """A single tool invocation record from an autopilot execution."""
+
+    tool_call_id: str
+    tool_name: str
+    input: Any
+    output: Any | None
+    success: bool | None
+
+
+class TokenUsage(TypedDict):
+    """Aggregated token counts from the autopilot stream."""
+
+    prompt_tokens: int
+    completion_tokens: int
+    total_tokens: int
+
+
+class AutoPilotBlock(Block):
+    """Execute tasks using AutoGPT AutoPilot with full access to platform tools.
+
+    The autopilot can manage agents, access workspace files, fetch web content,
+    run blocks, and more. This block enables sub-agent patterns (autopilot calling
+    autopilot) and scheduled autopilot execution via the agent executor.
+    """
+
+    class Input(BlockSchemaInput):
+        """Input schema for the AutoPilot block."""
+
+        prompt: str = SchemaField(
+            description=(
+                "The task or instruction for the autopilot to execute. "
+                "The autopilot has access to platform tools like agent management, "
+                "workspace files, web fetch, block execution, and more."
+            ),
+            placeholder="Find my agents and list them",
+            advanced=False,
+        )
+
+        system_context: str = SchemaField(
+            description=(
+                "Optional additional context prepended to the prompt. "
+                "Use this to constrain autopilot behavior, provide domain "
+                "context, or set output format requirements."
+            ),
+            default="",
+            advanced=True,
+        )
+
+        session_id: str = SchemaField(
+            description=(
+                "Session ID to continue an existing autopilot conversation. "
+                "Leave empty to start a new session. "
+                "Use the session_id output from a previous run to continue."
+            ),
+            default="",
+            advanced=True,
+        )
+
+        max_recursion_depth: int = SchemaField(
+            description=(
+                "Maximum nesting depth when the autopilot calls this block "
+                "recursively (sub-agent pattern). Prevents infinite loops."
+            ),
+            default=3,
+            ge=1,
+            le=10,
+            advanced=True,
+        )
+
+        # timeout_seconds removed: the SDK manages its own heartbeat-based
+        # timeouts internally; wrapping with asyncio.timeout corrupts the
+        # SDK's internal stream (see service.py CRITICAL comment).
+
+    class Output(BlockSchemaOutput):
+        """Output schema for the AutoPilot block."""
+
+        response: str = SchemaField(
+            description="The final text response from the autopilot."
+        )
+        tool_calls: list[ToolCallEntry] = SchemaField(
+            description=(
+                "List of tools called during execution. Each entry has "
+                "tool_call_id, tool_name, input, output, and success fields."
+            ),
+        )
+        conversation_history: str = SchemaField(
+            description=(
+                "Current turn messages (user prompt + assistant reply) as JSON. "
+                "It can be used for logging or analysis."
+            ),
+        )
+        session_id: str = SchemaField(
+            description=(
+                "Session ID for this conversation. "
+                "Pass this back to continue the conversation in a future run."
+            ),
+        )
+        token_usage: TokenUsage = SchemaField(
+            description=(
+                "Token usage statistics: prompt_tokens, "
+                "completion_tokens, total_tokens."
+            ),
+        )
+
+    def __init__(self):
+        super().__init__(
+            id=AUTOPILOT_BLOCK_ID,
+            description=(
+                "Execute tasks using AutoGPT AutoPilot with full access to "
+                "platform tools (agent management, workspace files, web fetch, "
+                "block execution, and more). Enables sub-agent patterns and "
+                "scheduled autopilot execution."
+            ),
+            categories={BlockCategory.AI, BlockCategory.AGENT},
+            input_schema=AutoPilotBlock.Input,
+            output_schema=AutoPilotBlock.Output,
+            test_input={
+                "prompt": "List my agents",
+                "system_context": "",
+                "session_id": "",
+                "max_recursion_depth": 3,
+            },
+            test_output=[
+                ("response", "You have 2 agents: Agent A and Agent B."),
+                ("tool_calls", []),
+                (
+                    "conversation_history",
+                    '[{"role": "user", "content": "List my agents"}]',
+                ),
+                ("session_id", "test-session-id"),
+                (
+                    "token_usage",
+                    {
+                        "prompt_tokens": 100,
+                        "completion_tokens": 50,
+                        "total_tokens": 150,
+                    },
+                ),
+            ],
+            test_mock={
+                "create_session": lambda *args, **kwargs: "test-session-id",
+                "execute_copilot": lambda *args, **kwargs: (
+                    "You have 2 agents: Agent A and Agent B.",
+                    [],
+                    '[{"role": "user", "content": "List my agents"}]',
+                    "test-session-id",
+                    {
+                        "prompt_tokens": 100,
+                        "completion_tokens": 50,
+                        "total_tokens": 150,
+                    },
+                ),
+            },
+        )
+
+    async def create_session(self, user_id: str) -> str:
+        """Create a new chat session and return its ID (mockable for tests)."""
+        from backend.copilot.model import create_chat_session
+
+        session = await create_chat_session(user_id)
+        return session.session_id
+
+    async def execute_copilot(
+        self,
+        prompt: str,
+        system_context: str,
+        session_id: str,
+        max_recursion_depth: int,
+        user_id: str,
+    ) -> tuple[str, list[ToolCallEntry], str, str, TokenUsage]:
+        """Invoke the copilot and collect all stream results.
+
+        Delegates to :func:`collect_copilot_response` — the shared helper that
+        consumes ``stream_chat_completion_sdk`` without wrapping it in an
+        ``asyncio.timeout`` (the SDK manages its own heartbeat-based timeouts).
+
+        Args:
+            prompt: The user task/instruction.
+            system_context: Optional context prepended to the prompt.
+            session_id: Chat session to use.
+            max_recursion_depth: Maximum allowed recursion nesting.
+            user_id: Authenticated user ID.
+
+        Returns:
+            A tuple of (response_text, tool_calls, history_json, session_id, usage).
+        """
+        from backend.copilot.sdk.collect import collect_copilot_response
+
+        tokens = _check_recursion(max_recursion_depth)
+        try:
+            effective_prompt = prompt
+            if system_context:
+                effective_prompt = f"[System Context: {system_context}]\n\n{prompt}"
+
+            result = await collect_copilot_response(
+                session_id=session_id,
+                message=effective_prompt,
+                user_id=user_id,
+            )
+
+            # Build a lightweight conversation summary from streamed data.
+            turn_messages: list[dict[str, Any]] = [
+                {"role": "user", "content": effective_prompt},
+            ]
+            if result.tool_calls:
+                turn_messages.append(
+                    {
+                        "role": "assistant",
+                        "content": result.response_text,
+                        "tool_calls": result.tool_calls,
+                    }
+                )
+            else:
+                turn_messages.append(
+                    {"role": "assistant", "content": result.response_text}
+                )
+            history_json = json.dumps(turn_messages, default=str)
+
+            tool_calls: list[ToolCallEntry] = [
+                {
+                    "tool_call_id": tc["tool_call_id"],
+                    "tool_name": tc["tool_name"],
+                    "input": tc["input"],
+                    "output": tc["output"],
+                    "success": tc["success"],
+                }
+                for tc in result.tool_calls
+            ]
+
+            usage: TokenUsage = {
+                "prompt_tokens": result.prompt_tokens,
+                "completion_tokens": result.completion_tokens,
+                "total_tokens": result.total_tokens,
+            }
+
+            return (
+                result.response_text,
+                tool_calls,
+                history_json,
+                session_id,
+                usage,
+            )
+        finally:
+            _reset_recursion(tokens)
+
+    async def run(
+        self,
+        input_data: Input,
+        *,
+        execution_context: ExecutionContext,
+        **kwargs,
+    ) -> BlockOutput:
+        """Validate inputs, invoke the autopilot, and yield structured outputs.
+
+        After validation, yields session_id even on failure so callers can resume.
+        """
+        if not input_data.prompt.strip():
+            yield "error", "Prompt cannot be empty."
+            return
+
+        if not execution_context.user_id:
+            yield "error", "Cannot run autopilot without an authenticated user."
+            return
+
+        if input_data.max_recursion_depth < 1:
+            yield "error", "max_recursion_depth must be at least 1."
+            return
+
+        # Create session eagerly so the user always gets the session_id,
+        # even if the copilot stream fails (avoids orphaned sessions).
+        sid = input_data.session_id
+        if not sid:
+            sid = await self.create_session(execution_context.user_id)
+
+        # NOTE: No asyncio.timeout() here — the SDK manages its own
+        # heartbeat-based timeouts internally.  Wrapping with asyncio.timeout
+        # would cancel the task mid-flight, corrupting the SDK's internal
+        # anyio memory stream (see service.py CRITICAL comment).
+        try:
+            response, tool_calls, history, _, usage = await self.execute_copilot(
+                prompt=input_data.prompt,
+                system_context=input_data.system_context,
+                session_id=sid,
+                max_recursion_depth=input_data.max_recursion_depth,
+                user_id=execution_context.user_id,
+            )
+
+            yield "response", response
+            yield "tool_calls", tool_calls
+            yield "conversation_history", history
+            yield "session_id", sid
+            yield "token_usage", usage
+        except asyncio.CancelledError:
+            yield "session_id", sid
+            yield "error", "AutoPilot execution was cancelled."
+            raise
+        except Exception as exc:
+            yield "session_id", sid
+            yield "error", str(exc)
+
+
+# ---------------------------------------------------------------------------
+# Helpers – placed after the block class for top-down readability.
+# ---------------------------------------------------------------------------
+
+# Task-scoped recursion depth counter & chain-wide limit.
+# contextvars are scoped to the current asyncio task, so concurrent
+# graph executions each get independent counters.
+_autopilot_recursion_depth: contextvars.ContextVar[int] = contextvars.ContextVar(
+    "_autopilot_recursion_depth", default=0
+)
+_autopilot_recursion_limit: contextvars.ContextVar[int | None] = contextvars.ContextVar(
+    "_autopilot_recursion_limit", default=None
+)
+
+
+def _check_recursion(
+    max_depth: int,
+) -> tuple[contextvars.Token[int], contextvars.Token[int | None]]:
+    """Check and increment recursion depth.
+
+    Returns ContextVar tokens that must be passed to ``_reset_recursion``
+    when the caller exits to restore the previous depth.
+
+    Raises:
+        RuntimeError: If the current depth already meets or exceeds the limit.
+    """
+    current = _autopilot_recursion_depth.get()
+    inherited = _autopilot_recursion_limit.get()
+    limit = max_depth if inherited is None else min(inherited, max_depth)
+    if current >= limit:
+        raise RuntimeError(
+            f"AutoPilot recursion depth limit reached ({limit}). "
+            "The autopilot has called itself too many times."
+        )
+    return (
+        _autopilot_recursion_depth.set(current + 1),
+        _autopilot_recursion_limit.set(limit),
+    )
+
+
+def _reset_recursion(
+    tokens: tuple[contextvars.Token[int], contextvars.Token[int | None]],
+) -> None:
+    """Restore recursion depth and limit to their previous values."""
+    _autopilot_recursion_depth.reset(tokens[0])
+    _autopilot_recursion_limit.reset(tokens[1])
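
Reviewer note: the recursion guard above can be exercised in isolation. The sketch below is a minimal, self-contained approximation, not the module's code: `_depth`, `_limit`, `check_recursion`, `reset_recursion`, and `nested_call` are hypothetical stand-ins for the helpers in the diff, and the nested coroutine only simulates an autopilot calling itself.

```python
import asyncio
import contextvars

# Hypothetical stand-ins for the module-level ContextVars in the diff.
_depth = contextvars.ContextVar("_depth", default=0)
_limit = contextvars.ContextVar("_limit", default=None)


def check_recursion(max_depth):
    """Raise if the limit is reached, otherwise bump the depth and return tokens."""
    current = _depth.get()
    inherited = _limit.get()
    # A nested call can tighten the inherited limit but never loosen it.
    limit = max_depth if inherited is None else min(inherited, max_depth)
    if current >= limit:
        raise RuntimeError(f"recursion depth limit reached ({limit})")
    return _depth.set(current + 1), _limit.set(limit)


def reset_recursion(tokens):
    """Restore depth and limit to the values they had before check_recursion."""
    _depth.reset(tokens[0])
    _limit.reset(tokens[1])


async def nested_call(levels, max_depth):
    """Simulate `levels` nested invocations and return the deepest depth reached."""
    tokens = check_recursion(max_depth)
    try:
        reached = _depth.get()
        if levels > 1:
            reached = await nested_call(levels - 1, max_depth)
        return reached
    finally:
        reset_recursion(tokens)


async def main():
    ok = await nested_call(levels=2, max_depth=3)
    try:
        await nested_call(levels=4, max_depth=3)
        failure = ""
    except RuntimeError as exc:
        failure = str(exc)
    # The depth is restored to its default once every frame has unwound.
    return ok, failure, _depth.get()


result = asyncio.run(main())
```

Because each frame resets its tokens in `finally`, the counter returns to zero even when the innermost call raises, which is the property `execute_copilot` relies on.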
--- a/autogpt_platform/backend/backend/blocks/data_manipulation.py
+++ b/autogpt_platform/backend/backend/blocks/data_manipulation.py
@@ -472,7 +472,7 @@ class AddToListBlock(Block):

    async def run(self, input_data: Input, **kwargs) -> BlockOutput:
        entries_added = input_data.entries.copy()
-        if input_data.entry:
+        if input_data.entry is not None:
            entries_added.append(input_data.entry)

        updated_list = input_data.list.copy()
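
Reviewer note: the switch from a truthiness check to `is not None` matters for falsy entries. A minimal sketch, assuming a hypothetical `add_entry` helper that mirrors only the append step of `AddToListBlock.run`:

```python
def add_entry(entries, entry=None):
    """Hypothetical stand-in for the append step in AddToListBlock.run."""
    added = entries.copy()
    # `if entry:` would silently drop 0, "", False, and [] here.
    if entry is not None:
        added.append(entry)
    return added


falsy_results = [add_entry([], value) for value in (0, "", False, [])]
none_result = add_entry([1], None)
```

With the new check every falsy value is appended, and only an omitted (`None`) entry is skipped.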
--- a/autogpt_platform/backend/backend/blocks/flux_kontext.py
+++ b/autogpt_platform/backend/backend/blocks/flux_kontext.py
@@ -34,17 +34,29 @@ TEST_CREDENTIALS_INPUT = {
    "provider": TEST_CREDENTIALS.provider,
    "id": TEST_CREDENTIALS.id,
    "type": TEST_CREDENTIALS.type,
-    "title": TEST_CREDENTIALS.type,
+    "title": TEST_CREDENTIALS.title,
 }


-class FluxKontextModelName(str, Enum):
-    PRO = "Flux Kontext Pro"
-    MAX = "Flux Kontext Max"
+class ImageEditorModel(str, Enum):
+    FLUX_KONTEXT_PRO = "Flux Kontext Pro"
+    FLUX_KONTEXT_MAX = "Flux Kontext Max"
+    NANO_BANANA_PRO = "Nano Banana Pro"
+    NANO_BANANA_2 = "Nano Banana 2"

    @property
    def api_name(self) -> str:
-        return f"black-forest-labs/flux-kontext-{self.name.lower()}"
+        _map = {
+            "FLUX_KONTEXT_PRO": "black-forest-labs/flux-kontext-pro",
+            "FLUX_KONTEXT_MAX": "black-forest-labs/flux-kontext-max",
+            "NANO_BANANA_PRO": "google/nano-banana-pro",
+            "NANO_BANANA_2": "google/nano-banana-2",
+        }
+        return _map[self.name]
+
+
+# Keep old name as alias for backwards compatibility
+FluxKontextModelName = ImageEditorModel


 class AspectRatio(str, Enum):
@@ -69,7 +81,7 @@ class AIImageEditorBlock(Block):
        credentials: CredentialsMetaInput[
            Literal[ProviderName.REPLICATE], Literal["api_key"]
        ] = CredentialsField(
-            description="Replicate API key with permissions for Flux Kontext models",
+            description="Replicate API key with permissions for Flux Kontext and Nano Banana models",
        )
        prompt: str = SchemaField(
            description="Text instruction describing the desired edit",
@@ -87,14 +99,14 @@ class AIImageEditorBlock(Block):
            advanced=False,
        )
        seed: Optional[int] = SchemaField(
-            description="Random seed. Set for reproducible generation",
+            description="Random seed. Set for reproducible generation (Flux Kontext only; ignored by Nano Banana models)",
            default=None,
            title="Seed",
            advanced=True,
        )
-        model: FluxKontextModelName = SchemaField(
+        model: ImageEditorModel = SchemaField(
            description="Model variant to use",
-            default=FluxKontextModelName.PRO,
+            default=ImageEditorModel.NANO_BANANA_2,
            title="Model",
        )

@@ -107,7 +119,7 @@ class AIImageEditorBlock(Block):
        super().__init__(
            id="3fd9c73d-4370-4925-a1ff-1b86b99fabfa",
            description=(
-                "Edit images using BlackForest Labs' Flux Kontext models. Provide a prompt "
+                "Edit images using Flux Kontext or Google Nano Banana models. Provide a prompt "
                "and optional reference image to generate a modified image."
            ),
            categories={BlockCategory.AI, BlockCategory.MULTIMEDIA},
@@ -118,7 +130,7 @@ class AIImageEditorBlock(Block):
                "input_image": "data:image/png;base64,MQ==",
                "aspect_ratio": AspectRatio.MATCH_INPUT_IMAGE,
                "seed": None,
-                "model": FluxKontextModelName.PRO,
+                "model": ImageEditorModel.NANO_BANANA_2,
                "credentials": TEST_CREDENTIALS_INPUT,
            },
            test_output=[
@@ -127,7 +139,9 @@ class AIImageEditorBlock(Block):
            ],
            test_mock={
                # Use data URI to avoid HTTP requests during tests
-                "run_model": lambda *args, **kwargs: "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg==",
+                "run_model": lambda *args, **kwargs: (
+                    "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg=="
+                ),
            },
            test_credentials=TEST_CREDENTIALS,
        )
@@ -142,7 +156,7 @@ class AIImageEditorBlock(Block):
    ) -> BlockOutput:
        result = await self.run_model(
            api_key=credentials.api_key,
-            model_name=input_data.model.api_name,
+            model=input_data.model,
            prompt=input_data.prompt,
            input_image_b64=(
                await store_media_file(
@@ -169,7 +183,7 @@ class AIImageEditorBlock(Block):
    async def run_model(
        self,
        api_key: SecretStr,
-        model_name: str,
+        model: ImageEditorModel,
        prompt: str,
        input_image_b64: Optional[str],
        aspect_ratio: str,
@@ -178,12 +192,29 @@ class AIImageEditorBlock(Block):
        graph_exec_id: str,
    ) -> MediaFileType:
        client = ReplicateClient(api_token=api_key.get_secret_value())
-        input_params = {
-            "prompt": prompt,
-            "input_image": input_image_b64,
-            "aspect_ratio": aspect_ratio,
-            **({"seed": seed} if seed is not None else {}),
-        }
+        model_name = model.api_name
+
+        is_nano_banana = model in (
+            ImageEditorModel.NANO_BANANA_PRO,
+            ImageEditorModel.NANO_BANANA_2,
+        )
+        if is_nano_banana:
+            input_params: dict = {
+                "prompt": prompt,
+                "aspect_ratio": aspect_ratio,
+                "output_format": "jpg",
+                "safety_filter_level": "block_only_high",
+            }
+            # Nano Banana expects "image_input" as a list, unlike Flux's single "input_image"
+            if input_image_b64:
+                input_params["image_input"] = [input_image_b64]
+        else:
+            input_params = {
+                "prompt": prompt,
+                "input_image": input_image_b64,
+                "aspect_ratio": aspect_ratio,
+                **({"seed": seed} if seed is not None else {}),
+            }

        try:
            output: FileOutput | list[FileOutput] = await client.async_run(  # type: ignore
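
Reviewer note: the per-model payload branching in `run_model` can be checked without calling Replicate. The sketch below is an illustration under assumptions: `Model` is a trimmed, hypothetical stand-in for `ImageEditorModel`, and `build_input_params` is a hypothetical pure function that copies the dictionary shapes from the hunk above.

```python
from enum import Enum
from typing import Any, Optional


class Model(str, Enum):
    """Hypothetical, trimmed stand-in for ImageEditorModel."""

    FLUX_KONTEXT_PRO = "Flux Kontext Pro"
    NANO_BANANA_2 = "Nano Banana 2"


def build_input_params(
    model: Model,
    prompt: str,
    image_b64: Optional[str],
    aspect_ratio: str,
    seed: Optional[int],
) -> dict[str, Any]:
    """Build the request payload for one model family (shapes copied from the diff)."""
    if model is Model.NANO_BANANA_2:
        params: dict[str, Any] = {
            "prompt": prompt,
            "aspect_ratio": aspect_ratio,
            "output_format": "jpg",
            "safety_filter_level": "block_only_high",
        }
        # Nano Banana takes a list of images and has no seed parameter.
        if image_b64:
            params["image_input"] = [image_b64]
        return params
    # Flux Kontext takes a single image and an optional seed.
    return {
        "prompt": prompt,
        "input_image": image_b64,
        "aspect_ratio": aspect_ratio,
        **({"seed": seed} if seed is not None else {}),
    }


nano = build_input_params(Model.NANO_BANANA_2, "p", "img", "1:1", 7)
flux = build_input_params(Model.FLUX_KONTEXT_PRO, "p", "img", "1:1", 7)
```

Pulling the branching into a pure function like this would also make the seed-is-ignored behavior easy to unit test, though the diff keeps it inline.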
--- a/autogpt_platform/backend/backend/blocks/llm.py
+++ b/autogpt_platform/backend/backend/blocks/llm.py
@@ -33,6 +33,13 @@ from backend.integrations.providers import ProviderName
 from backend.util import json
 from backend.util.clients import OPENROUTER_BASE_URL
 from backend.util.logging import TruncatedLogger
+from backend.util.openai_responses import (
+    convert_tools_to_responses_format,
+    extract_responses_content,
+    extract_responses_reasoning,
+    extract_responses_tool_calls,
+    extract_responses_usage,
+)
 from backend.util.prompt import compress_context, estimate_token_count
 from backend.util.request import validate_url_host
 from backend.util.settings import Settings
@@ -111,7 +118,6 @@ class LlmModel(str, Enum, metaclass=LlmModelMeta):
    GPT4O_MINI = "gpt-4o-mini"
    GPT4O = "gpt-4o"
    GPT4_TURBO = "gpt-4-turbo"
-    GPT3_5_TURBO = "gpt-3.5-turbo"
    # Anthropic models
    CLAUDE_4_1_OPUS = "claude-opus-4-1-20250805"
    CLAUDE_4_OPUS = "claude-opus-4-20250514"
@@ -277,9 +283,6 @@ MODEL_METADATA = {
    LlmModel.GPT4_TURBO: ModelMetadata(
        "openai", 128000, 4096, "GPT-4 Turbo", "OpenAI", "OpenAI", 3
    ),  # gpt-4-turbo-2024-04-09
-    LlmModel.GPT3_5_TURBO: ModelMetadata(
-        "openai", 16385, 4096, "GPT-3.5 Turbo", "OpenAI", "OpenAI", 1
-    ),  # gpt-3.5-turbo-0125
    # https://docs.anthropic.com/en/docs/about-claude/models
    LlmModel.CLAUDE_4_1_OPUS: ModelMetadata(
        "anthropic", 200000, 32000, "Claude Opus 4.1", "Anthropic", "Anthropic", 3
@@ -793,6 +796,19 @@ async def llm_call(
            )
        prompt = result.messages

+    # Sanitize unpaired surrogates in message content to prevent
+    # UnicodeEncodeError when httpx encodes the JSON request body.
+    for msg in prompt:
+        content = msg.get("content")
+        if isinstance(content, str):
+            try:
+                content.encode("utf-8")
+            except UnicodeEncodeError:
+                logger.warning("Sanitized unpaired surrogates in LLM prompt content")
+                msg["content"] = content.encode("utf-8", errors="surrogatepass").decode(
+                    "utf-8", errors="replace"
+                )
+
    # Calculate available tokens based on context window and input length
    estimated_input_tokens = estimate_token_count(prompt)
    model_max_output = llm_model.max_output_tokens or int(2**15)
@@ -801,36 +817,53 @@ async def llm_call(
    max_tokens = max(min(available_tokens, model_max_output, user_max), 1)

    if provider == "openai":
-        tools_param = tools if tools else openai.NOT_GIVEN
        oai_client = openai.AsyncOpenAI(api_key=credentials.api_key.get_secret_value())
-        response_format = None

-        parallel_tool_calls = get_parallel_tool_calls_param(
-            llm_model, parallel_tool_calls
-        )
+        tools_param = convert_tools_to_responses_format(tools) if tools else openai.omit

+        text_config = openai.omit
        if force_json_output:
-            response_format = {"type": "json_object"}
+            text_config = {"format": {"type": "json_object"}}  # type: ignore

-        response = await oai_client.chat.completions.create(
+        response = await oai_client.responses.create(
            model=llm_model.value,
-            messages=prompt,  # type: ignore
-            response_format=response_format,  # type: ignore
-            max_completion_tokens=max_tokens,
-            tools=tools_param,  # type: ignore
-            parallel_tool_calls=parallel_tool_calls,
+            input=prompt,  # type: ignore[arg-type]
+            tools=tools_param,  # type: ignore[arg-type]
+            max_output_tokens=max_tokens,
+            parallel_tool_calls=get_parallel_tool_calls_param(
+                llm_model, parallel_tool_calls
+            ),
+            text=text_config,  # type: ignore[arg-type]
+            store=False,
        )

-        tool_calls = extract_openai_tool_calls(response)
-        reasoning = extract_openai_reasoning(response)
+        raw_tool_calls = extract_responses_tool_calls(response)
+        tool_calls = (
+            [
+                ToolContentBlock(
+                    id=tc["id"],
+                    type=tc["type"],
+                    function=ToolCall(
+                        name=tc["function"]["name"],
+                        arguments=tc["function"]["arguments"],
+                    ),
+                )
+                for tc in raw_tool_calls
+            ]
+            if raw_tool_calls
+            else None
+        )
+        reasoning = extract_responses_reasoning(response)
+        content = extract_responses_content(response)
+        prompt_tokens, completion_tokens = extract_responses_usage(response)

        return LLMResponse(
-            raw_response=response.choices[0].message,
+            raw_response=response,
            prompt=prompt,
-            response=response.choices[0].message.content or "",
+            response=content,
            tool_calls=tool_calls,
-            prompt_tokens=response.usage.prompt_tokens if response.usage else 0,
-            completion_tokens=response.usage.completion_tokens if response.usage else 0,
+            prompt_tokens=prompt_tokens,
+            completion_tokens=completion_tokens,
            reasoning=reasoning,
        )
    elif provider == "anthropic":
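
Reviewer note: the surrogate sanitization added to `llm_call` can be tried on its own. A minimal sketch, assuming a hypothetical `sanitize_surrogates` helper that wraps the same encode/decode round trip as the inline loop:

```python
def sanitize_surrogates(text: str) -> str:
    """Hypothetical helper mirroring the inline check in llm_call."""
    try:
        text.encode("utf-8")
        return text  # Already valid; leave untouched.
    except UnicodeEncodeError:
        # surrogatepass lets the lone surrogate through as raw bytes, and
        # decoding with errors="replace" turns those bytes into U+FFFD.
        return text.encode("utf-8", errors="surrogatepass").decode(
            "utf-8", errors="replace"
        )


clean = sanitize_surrogates("plain text")
sanitized = sanitize_surrogates("ok \ud800 end")
```

The sanitized string is safe to serialize as UTF-8; the text around the bad code point is preserved and only the surrogate itself is replaced.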
--- a/autogpt_platform/backend/backend/blocks/smart_decision_maker.py
+++ b/autogpt_platform/backend/backend/blocks/smart_decision_maker.py
@@ -61,20 +61,27 @@ class ExecutionParams(BaseModel):
 def _get_tool_requests(entry: dict[str, Any]) -> list[str]:
    """
    Return a list of tool_call_ids if the entry is a tool request.
-    Supports both OpenAI and Anthropics formats.
+    Supports OpenAI Chat Completions, Responses API, and Anthropic formats.
    """
    tool_call_ids = []
+
+    # OpenAI Responses API: function_call items have type="function_call"
+    if entry.get("type") == "function_call":
+        if call_id := entry.get("call_id"):
+            tool_call_ids.append(call_id)
+        return tool_call_ids
+
    if entry.get("role") != "assistant":
        return tool_call_ids

-    # OpenAI: check for tool_calls in the entry.
+    # OpenAI Chat Completions: check for tool_calls in the entry.
    calls = entry.get("tool_calls")
    if isinstance(calls, list):
        for call in calls:
            if tool_id := call.get("id"):
                tool_call_ids.append(tool_id)

-    # Anthropics: check content items for tool_use type.
+    # Anthropic: check content items for tool_use type.
    content = entry.get("content")
    if isinstance(content, list):
        for item in content:
@@ -89,16 +96,22 @@ def _get_tool_requests(entry: dict[str, Any]) -> list[str]:
 def _get_tool_responses(entry: dict[str, Any]) -> list[str]:
    """
    Return a list of tool_call_ids if the entry is a tool response.
-    Supports both OpenAI and Anthropics formats.
+    Supports OpenAI Chat Completions, Responses API, and Anthropic formats.
    """
    tool_call_ids: list[str] = []

-    # OpenAI: a tool response message with role "tool" and key "tool_call_id".
+    # OpenAI Responses API: function_call_output items
+    if entry.get("type") == "function_call_output":
+        if call_id := entry.get("call_id"):
+            tool_call_ids.append(str(call_id))
+        return tool_call_ids
+
+    # OpenAI Chat Completions: a tool response message with role "tool".
    if entry.get("role") == "tool":
        if tool_call_id := entry.get("tool_call_id"):
            tool_call_ids.append(str(tool_call_id))

-    # Anthropics: check content items for tool_result type.
+    # Anthropic: check content items for tool_result type.
    if entry.get("role") == "user":
        content = entry.get("content")
        if isinstance(content, list):
@@ -111,14 +124,16 @@ def _get_tool_responses(entry: dict[str, Any]) -> list[str]:
    return tool_call_ids


-def _create_tool_response(call_id: str, output: Any) -> dict[str, Any]:
+def _create_tool_response(
+    call_id: str, output: Any, *, responses_api: bool = False
+) -> dict[str, Any]:
    """
-    Create a tool response message for either OpenAI or Anthropics,
-    based on the tool_id format.
+    Create a tool response message in Anthropic, OpenAI Responses API, or OpenAI
+    Chat Completions format, chosen by the call_id prefix and the responses_api flag.
    """
    content = output if isinstance(output, str) else json.dumps(output)

-    # Anthropics format: tool IDs typically start with "toolu_"
+    # Anthropic format: tool IDs typically start with "toolu_"
    if call_id.startswith("toolu_"):
        return {
            "role": "user",
@@ -128,8 +143,11 @@ def _create_tool_response(call_id: str, output: Any) -> dict[str, Any]:
            ],
        }

-    # OpenAI format: tool IDs typically start with "call_".
-    # Or default fallback (if the tool_id doesn't match any known prefix)
+    # OpenAI Responses API format
+    if responses_api:
+        return {"type": "function_call_output", "call_id": call_id, "output": content}
+
+    # OpenAI Chat Completions format (default fallback)
    return {"role": "tool", "tool_call_id": call_id, "content": content}


@@ -177,10 +195,19 @@ def _combine_tool_responses(tool_outputs: list[dict[str, Any]]) -> list[dict[str
    return tool_outputs


-def _convert_raw_response_to_dict(raw_response: Any) -> dict[str, Any]:
+def _convert_raw_response_to_dict(
+    raw_response: Any,
+) -> dict[str, Any] | list[dict[str, Any]]:
    """
    Safely convert raw_response to dictionary format for conversation history.
    Handles different response types from different LLM providers.
+
+    For the OpenAI Responses API, the raw_response is the entire Response
+    object.  Its ``output`` items (messages, function_calls) are extracted
+    individually so they can be used as valid input items on the next call.
+    Returns a **list** of dicts in that case.
+
+    For Chat Completions / Anthropic / Ollama, returns a single dict.
    """
    if isinstance(raw_response, str):
        # Ollama returns a string, convert to dict format
@@ -188,11 +215,28 @@ def _convert_raw_response_to_dict(raw_response: Any) -> dict[str, Any]:
    elif isinstance(raw_response, dict):
        # Already a dict (from tests or some providers)
        return raw_response
+    elif _is_responses_api_object(raw_response):
+        # OpenAI Responses API: extract individual output items
+        items = [json.to_dict(item) for item in raw_response.output]
+        return items if items else [{"role": "assistant", "content": ""}]
    else:
-        # OpenAI/Anthropic return objects, convert with json.to_dict
+        # Chat Completions / Anthropic return message objects
        return json.to_dict(raw_response)


+def _is_responses_api_object(obj: Any) -> bool:
+    """Detect an OpenAI Responses API Response object.
+
+    These have ``object == "response"`` and an ``output`` list, but no
+    ``role`` attribute (unlike ChatCompletionMessage).
+    """
+    return (
+        getattr(obj, "object", None) == "response"
+        and hasattr(obj, "output")
+        and not hasattr(obj, "role")
+    )
+
+
 def get_pending_tool_calls(conversation_history: list[Any] | None) -> dict[str, int]:
    """
    All the tool calls entry in the conversation history requires a response.
@@ -754,19 +798,34 @@ class SmartDecisionMakerBlock(Block):
        self, prompt: list[dict], response, tool_outputs: list | None = None
    ):
        """Update conversation history with response and tool outputs."""
-        # Don't add separate reasoning message with tool calls (breaks Anthropic's tool_use->tool_result pairing)
-        assistant_message = _convert_raw_response_to_dict(response.raw_response)
-        has_tool_calls = isinstance(assistant_message.get("content"), list) and any(
-            item.get("type") == "tool_use"
-            for item in assistant_message.get("content", [])
-        )
+        converted = _convert_raw_response_to_dict(response.raw_response)

-        if response.reasoning and not has_tool_calls:
-            prompt.append(
-                {"role": "assistant", "content": f"[Reasoning]: {response.reasoning}"}
+        if isinstance(converted, list):
+            # Responses API: output items are already individual dicts
+            has_tool_calls = any(
+                item.get("type") == "function_call" for item in converted
            )
-
-        prompt.append(assistant_message)
+            if response.reasoning and not has_tool_calls:
+                prompt.append(
+                    {
+                        "role": "assistant",
+                        "content": f"[Reasoning]: {response.reasoning}",
+                    }
+                )
+            prompt.extend(converted)
+        else:
+            # Chat Completions / Anthropic: single assistant message dict
+            has_tool_calls = isinstance(converted.get("content"), list) and any(
+                item.get("type") == "tool_use" for item in converted.get("content", [])
+            )
+            if response.reasoning and not has_tool_calls:
+                prompt.append(
+                    {
+                        "role": "assistant",
+                        "content": f"[Reasoning]: {response.reasoning}",
+                    }
+                )
+            prompt.append(converted)

        if tool_outputs:
            prompt.extend(tool_outputs)
@@ -776,6 +835,8 @@ class SmartDecisionMakerBlock(Block):
        tool_info: ToolInfo,
        execution_params: ExecutionParams,
        execution_processor: "ExecutionProcessor",
+        *,
+        responses_api: bool = False,
    ) -> dict:
        """Execute a single tool using the execution manager for proper integration."""
        # Lazy imports to avoid circular dependencies
@@ -868,13 +929,17 @@ class SmartDecisionMakerBlock(Block):
                if node_outputs
                else "Tool executed successfully"
            )
-            return _create_tool_response(tool_call.id, tool_response_content)
+            return _create_tool_response(
+                tool_call.id, tool_response_content, responses_api=responses_api
+            )

        except Exception as e:
-            logger.error(f"Tool execution with manager failed: {e}")
+            logger.warning(f"Tool execution with manager failed: {e}")
            # Return error response
            return _create_tool_response(
-                tool_call.id, f"Tool execution failed: {str(e)}"
+                tool_call.id,
+                f"Tool execution failed: {str(e)}",
+                responses_api=responses_api,
            )

    async def _execute_tools_agent_mode(
@@ -895,6 +960,7 @@ class SmartDecisionMakerBlock(Block):
        """Execute tools in agent mode with a loop until finished."""
        max_iterations = input_data.agent_mode_max_iterations
        iteration = 0
+        use_responses_api = input_data.model.metadata.provider == "openai"

        # Execution parameters for tool execution
        execution_params = ExecutionParams(
@@ -951,14 +1017,19 @@ class SmartDecisionMakerBlock(Block):
            for tool_info in processed_tools:
                try:
                    tool_response = await self._execute_single_tool_with_manager(
-                        tool_info, execution_params, execution_processor
+                        tool_info,
+                        execution_params,
+                        execution_processor,
+                        responses_api=use_responses_api,
                    )
                    tool_outputs.append(tool_response)
                except Exception as e:
                    logger.error(f"Tool execution failed: {e}")
                    # Create error response for the tool
                    error_response = _create_tool_response(
-                        tool_info.tool_call.id, f"Error: {str(e)}"
+                        tool_info.tool_call.id,
+                        f"Error: {str(e)}",
+                        responses_api=use_responses_api,
                    )
                    tool_outputs.append(error_response)

@@ -1020,11 +1091,17 @@ class SmartDecisionMakerBlock(Block):
        if pending_tool_calls and input_data.last_tool_output is None:
            raise ValueError(f"Tool call requires an output for {pending_tool_calls}")

+        use_responses_api = input_data.model.metadata.provider == "openai"
+
        tool_output = []
        if pending_tool_calls and input_data.last_tool_output is not None:
            first_call_id = next(iter(pending_tool_calls.keys()))
            tool_output.append(
-                _create_tool_response(first_call_id, input_data.last_tool_output)
+                _create_tool_response(
+                    first_call_id,
+                    input_data.last_tool_output,
+                    responses_api=use_responses_api,
+                )
            )

            prompt.extend(tool_output)
@@ -1056,7 +1133,9 @@ class SmartDecisionMakerBlock(Block):
            )

        if input_data.sys_prompt and not any(
-            p["role"] == "system" and p["content"].startswith(MAIN_OBJECTIVE_PREFIX)
+            p.get("role") == "system"
+            and isinstance(p.get("content"), str)
+            and p["content"].startswith(MAIN_OBJECTIVE_PREFIX)
            for p in prompt
        ):
            prompt.append(
@@ -1067,7 +1146,9 @@ class SmartDecisionMakerBlock(Block):
            )

        if input_data.prompt and not any(
-            p["role"] == "user" and p["content"].startswith(MAIN_OBJECTIVE_PREFIX)
+            p.get("role") == "user"
+            and isinstance(p.get("content"), str)
+            and p["content"].startswith(MAIN_OBJECTIVE_PREFIX)
            for p in prompt
        ):
            prompt.append(
@@ -1175,11 +1256,26 @@ class SmartDecisionMakerBlock(Block):
                )
                yield emit_key, arg_value

-        if response.reasoning:
+        converted = _convert_raw_response_to_dict(response.raw_response)
+
+        # Check for tool calls to avoid inserting reasoning between tool pairs
+        if isinstance(converted, list):
+            has_tool_calls = any(
+                item.get("type") == "function_call" for item in converted
+            )
+        else:
+            has_tool_calls = isinstance(converted.get("content"), list) and any(
+                item.get("type") == "tool_use" for item in converted.get("content", [])
+            )
+
+        if response.reasoning and not has_tool_calls:
            prompt.append(
                {"role": "assistant", "content": f"[Reasoning]: {response.reasoning}"}
            )

-        prompt.append(_convert_raw_response_to_dict(response.raw_response))
+        if isinstance(converted, list):
+            prompt.extend(converted)
+        else:
+            prompt.append(converted)

        yield "conversations", prompt
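
Note on the hunks above: both code paths now run the same two steps. They detect whether the converted response contains a tool call, and only append a `[Reasoning]` message when it does not. A minimal sketch of that shared logic as a single helper follows. The name `append_response` and the plain-dict inputs are assumptions for illustration and are not part of this change.

```python
from __future__ import annotations

from typing import Any


def append_response(
    prompt: list[dict[str, Any]],
    converted: dict[str, Any] | list[dict[str, Any]],
    reasoning: str | None,
) -> None:
    """Append a converted LLM response to the prompt in place (hypothetical helper)."""
    if isinstance(converted, list):
        # Responses API shape: a flat list of output items.
        has_tool_calls = any(item.get("type") == "function_call" for item in converted)
    else:
        # Chat Completions / Anthropic shape: one assistant message dict.
        content = converted.get("content")
        has_tool_calls = isinstance(content, list) and any(
            item.get("type") == "tool_use" for item in content
        )

    # A reasoning message next to a tool call would split the
    # tool-call / tool-result pairing, so add it only when no tool is called.
    if reasoning and not has_tool_calls:
        prompt.append({"role": "assistant", "content": f"[Reasoning]: {reasoning}"})

    # Lists are spliced in item by item; a single message is appended whole.
    if isinstance(converted, list):
        prompt.extend(converted)
    else:
        prompt.append(converted)
```

Factoring the check out this way would keep the two call sites from drifting apart, at the cost of one more private function in the block.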
--- a/autogpt_platform/backend/backend/blocks/test/test_autopilot.py
+++ b/autogpt_platform/backend/backend/blocks/test/test_autopilot.py
@@ -0,0 +1,223 @@
+"""Tests for AutoPilotBlock: recursion guard, streaming, validation, and error paths."""
+
+import asyncio
+from unittest.mock import AsyncMock
+
+import pytest
+
+from backend.blocks.autopilot import (
+    AUTOPILOT_BLOCK_ID,
+    AutoPilotBlock,
+    _autopilot_recursion_depth,
+    _autopilot_recursion_limit,
+    _check_recursion,
+    _reset_recursion,
+)
+from backend.data.execution import ExecutionContext
+
+
+def _make_context(user_id: str = "test-user-123") -> ExecutionContext:
+    """Helper to build an ExecutionContext for tests."""
+    return ExecutionContext(
+        user_id=user_id,
+        graph_id="graph-1",
+        graph_exec_id="gexec-1",
+        graph_version=1,
+        node_id="node-1",
+        node_exec_id="nexec-1",
+    )
+
+
+# ---------------------------------------------------------------------------
+# Recursion guard unit tests
+# ---------------------------------------------------------------------------
+
+
+class TestCheckRecursion:
+    """Unit tests for _check_recursion / _reset_recursion."""
+
+    def test_first_call_increments_depth(self):
+        tokens = _check_recursion(3)
+        try:
+            assert _autopilot_recursion_depth.get() == 1
+            assert _autopilot_recursion_limit.get() == 3
+        finally:
+            _reset_recursion(tokens)
+
+    def test_reset_restores_previous_values(self):
+        assert _autopilot_recursion_depth.get() == 0
+        assert _autopilot_recursion_limit.get() is None
+        tokens = _check_recursion(5)
+        _reset_recursion(tokens)
+        assert _autopilot_recursion_depth.get() == 0
+        assert _autopilot_recursion_limit.get() is None
+
+    def test_exceeding_limit_raises(self):
+        t1 = _check_recursion(2)
+        try:
+            t2 = _check_recursion(2)
+            try:
+                with pytest.raises(RuntimeError, match="recursion depth limit"):
+                    _check_recursion(2)
+            finally:
+                _reset_recursion(t2)
+        finally:
+            _reset_recursion(t1)
+
+    def test_nested_calls_respect_inherited_limit(self):
+        """Inner call with higher max_depth still respects outer limit."""
+        t1 = _check_recursion(2)  # sets limit=2
+        try:
+            t2 = _check_recursion(10)  # inner wants 10, but inherited is 2
+            try:
+                # depth is now 2, limit is min(10, 2) = 2 → should raise
+                with pytest.raises(RuntimeError, match="recursion depth limit"):
+                    _check_recursion(10)
+            finally:
+                _reset_recursion(t2)
+        finally:
+            _reset_recursion(t1)
+
+    def test_limit_of_one_blocks_immediately_on_second_call(self):
+        t1 = _check_recursion(1)
+        try:
+            with pytest.raises(RuntimeError):
+                _check_recursion(1)
+        finally:
+            _reset_recursion(t1)
+
+
+# ---------------------------------------------------------------------------
+# AutoPilotBlock.run() validation tests
+# ---------------------------------------------------------------------------
+
+
+class TestRunValidation:
+    """Tests for input validation in AutoPilotBlock.run()."""
+
+    @pytest.fixture
+    def block(self):
+        return AutoPilotBlock()
+
+    @pytest.mark.asyncio
+    async def test_empty_prompt_yields_error(self, block):
+        block.Input  # ensure schema is accessible
+        input_data = block.Input(prompt="   ", max_recursion_depth=3)
+        ctx = _make_context()
+        outputs = {}
+        async for name, value in block.run(input_data, execution_context=ctx):
+            outputs[name] = value
+        assert outputs.get("error") == "Prompt cannot be empty."
+        assert "response" not in outputs
+
+    @pytest.mark.asyncio
+    async def test_missing_user_id_yields_error(self, block):
+        input_data = block.Input(prompt="hello", max_recursion_depth=3)
+        ctx = _make_context(user_id="")
+        outputs = {}
+        async for name, value in block.run(input_data, execution_context=ctx):
+            outputs[name] = value
+        assert "authenticated user" in outputs.get("error", "")
+
+    @pytest.mark.asyncio
+    async def test_successful_run_yields_all_outputs(self, block):
+        """With execute_copilot mocked, run() should yield all 5 success outputs."""
+        mock_result = (
+            "Hello world",
+            [],
+            '[{"role":"user","content":"hi"}]',
+            "sess-abc",
+            {"prompt_tokens": 10, "completion_tokens": 5, "total_tokens": 15},
+        )
+        block.execute_copilot = AsyncMock(return_value=mock_result)
+        block.create_session = AsyncMock(return_value="sess-abc")
+
+        input_data = block.Input(prompt="hi", max_recursion_depth=3)
+        ctx = _make_context()
+        outputs = {}
+        async for name, value in block.run(input_data, execution_context=ctx):
+            outputs[name] = value
+
+        assert outputs["response"] == "Hello world"
+        assert outputs["tool_calls"] == []
+        assert outputs["session_id"] == "sess-abc"
+        assert outputs["token_usage"]["total_tokens"] == 15
+        assert "error" not in outputs
+
+    @pytest.mark.asyncio
+    async def test_exception_yields_error(self, block):
+        """On unexpected failure, run() should yield an error output."""
+        block.execute_copilot = AsyncMock(side_effect=RuntimeError("boom"))
+        block.create_session = AsyncMock(return_value="sess-fail")
+
+        input_data = block.Input(prompt="do something", max_recursion_depth=3)
+        ctx = _make_context()
+        outputs = {}
+        async for name, value in block.run(input_data, execution_context=ctx):
+            outputs[name] = value
+
+        assert outputs["session_id"] == "sess-fail"
+        assert "boom" in outputs.get("error", "")
+
+    @pytest.mark.asyncio
+    async def test_cancelled_error_yields_error_and_reraises(self, block):
+        """CancelledError should yield error, then re-raise."""
+        block.execute_copilot = AsyncMock(side_effect=asyncio.CancelledError())
+        block.create_session = AsyncMock(return_value="sess-cancel")
+
+        input_data = block.Input(prompt="do something", max_recursion_depth=3)
+        ctx = _make_context()
+        outputs = {}
+        with pytest.raises(asyncio.CancelledError):
+            async for name, value in block.run(input_data, execution_context=ctx):
+                outputs[name] = value
+
+        assert outputs["session_id"] == "sess-cancel"
+        assert "cancelled" in outputs.get("error", "").lower()
+
+    @pytest.mark.asyncio
+    async def test_existing_session_id_skips_create(self, block):
+        """When session_id is provided, create_session should not be called."""
+        mock_result = (
+            "ok",
+            [],
+            "[]",
+            "existing-sid",
+            {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
+        )
+        block.execute_copilot = AsyncMock(return_value=mock_result)
+        block.create_session = AsyncMock()
+
+        input_data = block.Input(
+            prompt="test", session_id="existing-sid", max_recursion_depth=3
+        )
+        ctx = _make_context()
+        async for _ in block.run(input_data, execution_context=ctx):
+            pass
+
+        block.create_session.assert_not_called()
+
+
+# ---------------------------------------------------------------------------
+# Block registration / ID tests
+# ---------------------------------------------------------------------------
+
+
+class TestBlockRegistration:
+    def test_block_id_matches_constant(self):
+        block = AutoPilotBlock()
+        assert block.id == AUTOPILOT_BLOCK_ID
+
+    def test_max_recursion_depth_has_upper_bound(self):
+        """Schema should enforce le=10."""
+        schema = AutoPilotBlock.Input.model_json_schema()
+        max_rec = schema["properties"]["max_recursion_depth"]
+        assert (
+            max_rec.get("maximum") == 10 or max_rec.get("exclusiveMaximum", 999) <= 11
+        )
+
+    def test_output_schema_has_no_duplicate_error_field(self):
+        """Output should inherit error from BlockSchemaOutput, not redefine it."""
+        # The field should exist (inherited) but there should be no explicit
+        # redefinition. We verify by checking the class __annotations__ directly.
+        assert "error" not in AutoPilotBlock.Output.__annotations__
--- a/autogpt_platform/backend/backend/blocks/test/test_llm.py
+++ b/autogpt_platform/backend/backend/blocks/test/test_llm.py
@@ -13,18 +13,17 @@ class TestLLMStatsTracking:
        """Test that llm_call returns proper token counts in LLMResponse."""
        import backend.blocks.llm as llm

-        # Mock the OpenAI client
+        # Mock the OpenAI Responses API response
        mock_response = MagicMock()
-        mock_response.choices = [
-            MagicMock(message=MagicMock(content="Test response", tool_calls=None))
-        ]
-        mock_response.usage = MagicMock(prompt_tokens=10, completion_tokens=20)
+        mock_response.output_text = "Test response"
+        mock_response.output = []
+        mock_response.usage = MagicMock(input_tokens=10, output_tokens=20)

        # Test with mocked OpenAI response
        with patch("openai.AsyncOpenAI") as mock_openai:
            mock_client = AsyncMock()
            mock_openai.return_value = mock_client
-            mock_client.chat.completions.create = AsyncMock(return_value=mock_response)
+            mock_client.responses.create = AsyncMock(return_value=mock_response)

            response = await llm.llm_call(
                credentials=llm.TEST_CREDENTIALS,
@@ -271,30 +270,17 @@ class TestLLMStatsTracking:
            mock_response = MagicMock()
            # Return different responses for chunk summary vs final summary
            if call_count == 1:
-                mock_response.choices = [
-                    MagicMock(
-                        message=MagicMock(
-                            content='<json_output id="test123456">{"summary": "Test chunk summary"}</json_output>',
-                            tool_calls=None,
-                        )
-                    )
-                ]
+                mock_response.output_text = '<json_output id="test123456">{"summary": "Test chunk summary"}</json_output>'
            else:
-                mock_response.choices = [
-                    MagicMock(
-                        message=MagicMock(
-                            content='<json_output id="test123456">{"final_summary": "Test final summary"}</json_output>',
-                            tool_calls=None,
-                        )
-                    )
-                ]
-            mock_response.usage = MagicMock(prompt_tokens=50, completion_tokens=30)
+                mock_response.output_text = '<json_output id="test123456">{"final_summary": "Test final summary"}</json_output>'
+            mock_response.output = []
+            mock_response.usage = MagicMock(input_tokens=50, output_tokens=30)
            return mock_response

        with patch("openai.AsyncOpenAI") as mock_openai:
            mock_client = AsyncMock()
            mock_openai.return_value = mock_client
-            mock_client.chat.completions.create = mock_create
+            mock_client.responses.create = mock_create

            # Test with very short text (should only need 1 chunk + 1 final summary)
            input_data = llm.AITextSummarizerBlock.Input(
--- a/autogpt_platform/backend/backend/blocks/test/test_smart_decision_maker_responses_api.py
+++ b/autogpt_platform/backend/backend/blocks/test/test_smart_decision_maker_responses_api.py
--- a/autogpt_platform/backend/backend/copilot/config.py
+++ b/autogpt_platform/backend/backend/copilot/config.py
@@ -148,7 +148,7 @@ class ChatConfig(BaseSettings):
        description="E2B sandbox template to use for copilot sessions.",
    )
    e2b_sandbox_timeout: int = Field(
-        default=300,  # 5 min safety net — explicit per-turn pause is the primary mechanism
+        default=420,  # 7 min safety net — allows headroom for compaction retries
        description="E2B sandbox running-time timeout (seconds). "
        "E2B timeout is wall-clock (not idle). Explicit per-turn pause is the primary "
        "mechanism; this is the safety net.",
--- a/autogpt_platform/backend/backend/copilot/integration_creds.py
+++ b/autogpt_platform/backend/backend/copilot/integration_creds.py
@@ -0,0 +1,173 @@
+"""Integration credential lookup with per-process TTL cache.
+
+Provides token retrieval for connected integrations so that copilot tools
+(e.g. bash_exec) can inject auth tokens into the execution environment without
+hitting the database on every command.
+
+Cache semantics (handled automatically by TTLCache):
+- Token found → cached for _TOKEN_CACHE_TTL (5 min).  Avoids repeated DB hits
+  for users who have credentials and are running many bash commands.
+- No credentials found → cached for _NULL_CACHE_TTL (60 s).  Avoids a DB hit
+  on every E2B command for users who haven't connected an account yet, while
+  still picking up a newly-connected account within one minute.
+
+Both caches are bounded to _CACHE_MAX_SIZE entries; cachetools evicts the
+least-recently-used entry when the limit is reached.
+
+Multi-worker note: both caches are in-process only.  Each worker/replica
+maintains its own independent cache, so a credential fetch may be duplicated
+across processes.  This is acceptable for the current goal (reduce DB hits per
+session per-process), but if cache efficiency across replicas becomes important
+a shared cache (e.g. Redis) should be used instead.
+"""
+
+import logging
+from typing import cast
+
+from cachetools import TTLCache
+
+from backend.copilot.providers import SUPPORTED_PROVIDERS
+from backend.data.model import APIKeyCredentials, OAuth2Credentials
+from backend.integrations.creds_manager import (
+    IntegrationCredentialsManager,
+    register_creds_changed_hook,
+)
+
+logger = logging.getLogger(__name__)
+
+# Derived from the single SUPPORTED_PROVIDERS registry for backward compat.
+PROVIDER_ENV_VARS: dict[str, list[str]] = {
+    slug: entry["env_vars"] for slug, entry in SUPPORTED_PROVIDERS.items()
+}
+
+_TOKEN_CACHE_TTL = 300.0  # seconds — for found tokens
+_NULL_CACHE_TTL = 60.0  # seconds — for "not connected" results
+_CACHE_MAX_SIZE = 10_000
+
+# (user_id, provider) → token string.  TTLCache handles expiry + eviction.
+# Thread-safety note: TTLCache is NOT thread-safe, but that is acceptable here
+# because all callers (get_provider_token, invalidate_user_provider_cache) run
+# exclusively on the asyncio event loop.  There are no await points between a
+# cache read and its corresponding write within any function, so no concurrent
+# coroutine can interleave.  If ThreadPoolExecutor workers are ever added to
+# this path, a threading.RLock should be wrapped around these caches.
+_token_cache: TTLCache[tuple[str, str], str] = TTLCache(
+    maxsize=_CACHE_MAX_SIZE, ttl=_TOKEN_CACHE_TTL
+)
+# Separate cache for "no credentials" results with a shorter TTL.
+_null_cache: TTLCache[tuple[str, str], bool] = TTLCache(
+    maxsize=_CACHE_MAX_SIZE, ttl=_NULL_CACHE_TTL
+)
+
+
+def invalidate_user_provider_cache(user_id: str, provider: str) -> None:
+    """Remove the cached entry for *user_id*/*provider* from both caches.
+
+    Call this after storing new credentials so that the next
+    ``get_provider_token()`` call performs a fresh DB lookup instead of
+    serving a stale TTL-cached result.
+    """
+    key = (user_id, provider)
+    _token_cache.pop(key, None)
+    _null_cache.pop(key, None)
+
+
+# Register this module's cache-bust function with the credentials manager so
+# that any create/update/delete operation immediately evicts stale cache
+# entries.  This avoids a lazy import inside creds_manager and eliminates the
+# circular-import risk.
+try:
+    register_creds_changed_hook(invalidate_user_provider_cache)
+except RuntimeError:
+    # Hook already registered (e.g. module re-import in tests).
+    pass
+
+# Module-level singleton to avoid re-instantiating IntegrationCredentialsManager
+# on every cache-miss call to get_provider_token().
+_manager = IntegrationCredentialsManager()
+
+
+async def get_provider_token(user_id: str, provider: str) -> str | None:
+    """Return the user's access token for *provider*, or ``None`` if not connected.
+
+    OAuth2 tokens are preferred (refreshed if needed); API keys are the fallback.
+    Found tokens are cached for _TOKEN_CACHE_TTL (5 min).  "Not connected" results
+    are cached for _NULL_CACHE_TTL (60 s) to avoid a DB hit on every bash_exec
+    command for users who haven't connected yet, while still picking up a
+    newly-connected account within one minute.
+    """
+    cache_key = (user_id, provider)
+
+    if cache_key in _null_cache:
+        return None
+    if cached := _token_cache.get(cache_key):
+        return cached
+
+    manager = _manager
+    try:
+        creds_list = await manager.store.get_creds_by_provider(user_id, provider)
+    except Exception:
+        logger.warning(
+            "Failed to fetch %s credentials for user %s",
+            provider,
+            user_id,
+            exc_info=True,
+        )
+        return None
+
+    # Pass 1: prefer OAuth2 (carry scope info, refreshable via token endpoint).
+    # Sort so broader-scoped tokens come first: a token with "repo" scope covers
+    # full git access, while a public-data-only token lacks push/pull permission.
+    # lock=False — background injection; not worth a distributed lock acquisition.
+    oauth2_creds = sorted(
+        [c for c in creds_list if c.type == "oauth2"],
+        key=lambda c: 0 if "repo" in (cast(OAuth2Credentials, c).scopes or []) else 1,
+    )
+    for creds in oauth2_creds:
+        if creds.type == "oauth2":
+            try:
+                fresh = await manager.refresh_if_needed(
+                    user_id, cast(OAuth2Credentials, creds), lock=False
+                )
+                token = fresh.access_token.get_secret_value()
+            except Exception:
+                logger.warning(
+                    "Failed to refresh %s OAuth token for user %s; "
+                    "discarding stale token to force re-auth",
+                    provider,
+                    user_id,
+                    exc_info=True,
+                )
+                # Do NOT fall back to the stale token; it is likely expired
+                # or revoked.  Skip it and try the next credential; if none
+                # succeeds, None is returned and the caller must re-auth.
+                continue
+            _token_cache[cache_key] = token
+            return token
+
+    # Pass 2: fall back to API key (no expiry, no refresh needed).
+    for creds in creds_list:
+        if creds.type == "api_key":
+            token = cast(APIKeyCredentials, creds).api_key.get_secret_value()
+            _token_cache[cache_key] = token
+            return token
+
+    # No credentials found — cache to avoid repeated DB hits.
+    _null_cache[cache_key] = True
+    return None
+
+
+async def get_integration_env_vars(user_id: str) -> dict[str, str]:
+    """Return env vars for all providers the user has connected.
+
+    Iterates :data:`PROVIDER_ENV_VARS`, fetches each token, and builds a flat
+    ``{env_var: token}`` dict ready to pass to a subprocess or E2B sandbox.
+    Only providers with a stored credential contribute entries.
+    """
+    env: dict[str, str] = {}
+    for provider, var_names in PROVIDER_ENV_VARS.items():
+        token = await get_provider_token(user_id, provider)
+        if token:
+            for var in var_names:
+                env[var] = token
+    return env
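
Note on the cache semantics described in the module docstring above: found tokens and "not connected" results are cached with different lifetimes, so a missing credential is re-checked sooner than a present one. A minimal standard-library sketch of that idea follows. It is not the `cachetools.TTLCache` setup used in the file; the class `TwoTierTokenCache`, its single-dict layout and the injectable `clock` parameter are assumptions for illustration, and it omits the size bound.

```python
from __future__ import annotations

import time
from typing import Callable

Key = tuple[str, str]  # (user_id, provider)


class TwoTierTokenCache:
    """Positive and negative lookups cached with different lifetimes (illustrative)."""

    def __init__(
        self,
        token_ttl: float = 300.0,
        null_ttl: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._token_ttl = token_ttl
        self._null_ttl = null_ttl
        self._clock = clock
        # key -> (expiry timestamp, token, or None meaning "not connected")
        self._entries: dict[Key, tuple[float, str | None]] = {}

    def lookup(self, key: Key) -> tuple[bool, str | None]:
        """Return (hit, token); a hit with token None means 'known to be absent'."""
        entry = self._entries.get(key)
        if entry is None:
            return False, None
        expires_at, token = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry so the caller performs a fresh lookup.
            del self._entries[key]
            return False, None
        return True, token

    def store(self, key: Key, token: str | None) -> None:
        """Cache a result; negative results get the shorter lifetime."""
        ttl = self._token_ttl if token is not None else self._null_ttl
        self._entries[key] = (self._clock() + ttl, token)

    def invalidate(self, key: Key) -> None:
        """Forget a key, e.g. right after credentials change."""
        self._entries.pop(key, None)
```

Passing a fake `clock` makes expiry testable without sleeping, which is one reason to keep the time source injectable in a sketch like this.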
--- a/autogpt_platform/backend/backend/copilot/integration_creds_test.py
+++ b/autogpt_platform/backend/backend/copilot/integration_creds_test.py
@@ -0,0 +1,195 @@
+"""Tests for integration_creds — TTL cache and token lookup paths."""
+
+from unittest.mock import AsyncMock, MagicMock, patch
+
+import pytest
+from pydantic import SecretStr
+
+from backend.copilot.integration_creds import (
+    _NULL_CACHE_TTL,
+    _TOKEN_CACHE_TTL,
+    PROVIDER_ENV_VARS,
+    _null_cache,
+    _token_cache,
+    get_integration_env_vars,
+    get_provider_token,
+    invalidate_user_provider_cache,
+)
+from backend.data.model import APIKeyCredentials, OAuth2Credentials
+
+_USER = "user-integration-creds-test"
+_PROVIDER = "github"
+
+
+def _make_api_key_creds(key: str = "test-api-key") -> APIKeyCredentials:
+    return APIKeyCredentials(
+        id="creds-api-key",
+        provider=_PROVIDER,
+        api_key=SecretStr(key),
+        title="Test API Key",
+        expires_at=None,
+    )
+
+
+def _make_oauth2_creds(token: str = "test-oauth-token") -> OAuth2Credentials:
+    return OAuth2Credentials(
+        id="creds-oauth2",
+        provider=_PROVIDER,
+        title="Test OAuth",
+        access_token=SecretStr(token),
+        refresh_token=SecretStr("test-refresh"),
+        access_token_expires_at=None,
+        refresh_token_expires_at=None,
+        scopes=[],
+    )
+
+
+@pytest.fixture(autouse=True)
+def clear_caches():
+    """Ensure clean caches before and after every test."""
+    _token_cache.clear()
+    _null_cache.clear()
+    yield
+    _token_cache.clear()
+    _null_cache.clear()
+
+
+class TestInvalidateUserProviderCache:
+    def test_removes_token_entry(self):
+        key = (_USER, _PROVIDER)
+        _token_cache[key] = "tok"
+        invalidate_user_provider_cache(_USER, _PROVIDER)
+        assert key not in _token_cache
+
+    def test_removes_null_entry(self):
+        key = (_USER, _PROVIDER)
+        _null_cache[key] = True
+        invalidate_user_provider_cache(_USER, _PROVIDER)
+        assert key not in _null_cache
+
+    def test_noop_when_key_not_cached(self):
+        # Should not raise even when there is no cache entry.
+        invalidate_user_provider_cache("no-such-user", _PROVIDER)
+
+    def test_only_removes_targeted_key(self):
+        other_key = ("other-user", _PROVIDER)
+        _token_cache[other_key] = "other-tok"
+        invalidate_user_provider_cache(_USER, _PROVIDER)
+        assert other_key in _token_cache
+
+
+class TestGetProviderToken:
+    @pytest.mark.asyncio(loop_scope="session")
+    async def test_returns_cached_token_without_db_hit(self):
+        _token_cache[(_USER, _PROVIDER)] = "cached-tok"
+
+        mock_manager = MagicMock()
+        with patch("backend.copilot.integration_creds._manager", mock_manager):
+            result = await get_provider_token(_USER, _PROVIDER)
+
+        assert result == "cached-tok"
+        mock_manager.store.get_creds_by_provider.assert_not_called()
+
+    @pytest.mark.asyncio(loop_scope="session")
+    async def test_returns_none_for_null_cached_provider(self):
+        _null_cache[(_USER, _PROVIDER)] = True
+
+        mock_manager = MagicMock()
+        with patch("backend.copilot.integration_creds._manager", mock_manager):
+            result = await get_provider_token(_USER, _PROVIDER)
+
+        assert result is None
+        mock_manager.store.get_creds_by_provider.assert_not_called()
+
+    @pytest.mark.asyncio(loop_scope="session")
+    async def test_api_key_creds_returned_and_cached(self):
+        api_creds = _make_api_key_creds("my-api-key")
+        mock_manager = MagicMock()
+        mock_manager.store.get_creds_by_provider = AsyncMock(return_value=[api_creds])
+
+        with patch("backend.copilot.integration_creds._manager", mock_manager):
+            result = await get_provider_token(_USER, _PROVIDER)
+
+        assert result == "my-api-key"
+        assert _token_cache.get((_USER, _PROVIDER)) == "my-api-key"
+
+    @pytest.mark.asyncio(loop_scope="session")
+    async def test_oauth2_preferred_over_api_key(self):
+        oauth_creds = _make_oauth2_creds("oauth-tok")
+        api_creds = _make_api_key_creds("api-tok")
+        mock_manager = MagicMock()
+        mock_manager.store.get_creds_by_provider = AsyncMock(
+            return_value=[api_creds, oauth_creds]
+        )
+        mock_manager.refresh_if_needed = AsyncMock(return_value=oauth_creds)
+
+        with patch("backend.copilot.integration_creds._manager", mock_manager):
+            result = await get_provider_token(_USER, _PROVIDER)
+
+        assert result == "oauth-tok"
+
+    @pytest.mark.asyncio(loop_scope="session")
+    async def test_oauth2_refresh_failure_returns_none(self):
+        """On refresh failure, return None instead of caching a stale token."""
+        oauth_creds = _make_oauth2_creds("stale-oauth-tok")
+        mock_manager = MagicMock()
+        mock_manager.store.get_creds_by_provider = AsyncMock(return_value=[oauth_creds])
+        mock_manager.refresh_if_needed = AsyncMock(side_effect=RuntimeError("network"))
+
+        with patch("backend.copilot.integration_creds._manager", mock_manager):
+            result = await get_provider_token(_USER, _PROVIDER)
+
+        # Stale tokens must NOT be returned — forces re-auth.
+        assert result is None
+
+    @pytest.mark.asyncio(loop_scope="session")
+    async def test_no_credentials_caches_null_entry(self):
+        mock_manager = MagicMock()
+        mock_manager.store.get_creds_by_provider = AsyncMock(return_value=[])
+
+        with patch("backend.copilot.integration_creds._manager", mock_manager):
+            result = await get_provider_token(_USER, _PROVIDER)
+
+        assert result is None
+        assert _null_cache.get((_USER, _PROVIDER)) is True
+
+    @pytest.mark.asyncio(loop_scope="session")
+    async def test_db_exception_returns_none_without_caching(self):
+        mock_manager = MagicMock()
+        mock_manager.store.get_creds_by_provider = AsyncMock(
+            side_effect=RuntimeError("db down")
+        )
+
+        with patch("backend.copilot.integration_creds._manager", mock_manager):
+            result = await get_provider_token(_USER, _PROVIDER)
+
+        assert result is None
+        # DB errors are not cached — next call will retry
+        assert (_USER, _PROVIDER) not in _token_cache
+        assert (_USER, _PROVIDER) not in _null_cache
+
+    @pytest.mark.asyncio(loop_scope="session")
+    async def test_null_cache_has_shorter_ttl_than_token_cache(self):
+        """Verify the TTL constants are set correctly for each cache."""
+        assert _null_cache.ttl == _NULL_CACHE_TTL
+        assert _token_cache.ttl == _TOKEN_CACHE_TTL
+        assert _NULL_CACHE_TTL < _TOKEN_CACHE_TTL
+
+
+class TestGetIntegrationEnvVars:
+    @pytest.mark.asyncio(loop_scope="session")
+    async def test_injects_all_env_vars_for_provider(self):
+        _token_cache[(_USER, "github")] = "gh-tok"
+
+        result = await get_integration_env_vars(_USER)
+
+        for var in PROVIDER_ENV_VARS["github"]:
+            assert result[var] == "gh-tok"
+
+    @pytest.mark.asyncio(loop_scope="session")
+    async def test_empty_dict_when_no_credentials(self):
+        _null_cache[(_USER, "github")] = True
+
+        result = await get_integration_env_vars(_USER)
+
+        assert result == {}
--- a/autogpt_platform/backend/backend/copilot/prompting.py
+++ b/autogpt_platform/backend/backend/copilot/prompting.py
@@ -6,39 +6,24 @@ handling the distinction between:
 - Local mode vs E2B mode (storage/filesystem differences)
 """

+from backend.blocks.autopilot import AUTOPILOT_BLOCK_ID
 from backend.copilot.tools import TOOL_REGISTRY

 # Shared technical notes that apply to both SDK and baseline modes
-_SHARED_TOOL_NOTES = """\
+_SHARED_TOOL_NOTES = f"""\

-### Sharing files with the user
-After saving a file to the persistent workspace with `write_workspace_file`,
-share it with the user by embedding the `download_url` from the response in
-your message as a Markdown link or image:
+### Sharing files
+After `write_workspace_file`, embed the `download_url` in Markdown:
+- File: `[report.csv](workspace://file_id#text/csv)`
+- Image: `![chart](workspace://file_id#image/png)`
+- Video: `![recording](workspace://file_id#video/mp4)`

- **Any file** — shows as a clickable download link:
-  `[report.csv](workspace://file_id#text/csv)`
- **Image** — renders inline in chat:
-  `![chart](workspace://file_id#image/png)`
- **Video** — renders inline in chat with player controls:
-  `![recording](workspace://file_id#video/mp4)`
-
-The `download_url` field in the `write_workspace_file` response is already
-in the correct format — paste it directly after the `(` in the Markdown.
-
-### Passing file content to tools — @@agptfile: references
-Instead of copying large file contents into a tool argument, pass a file
-reference and the platform will load the content for you.
-
-Syntax: `@@agptfile:<uri>[<start>-<end>]`
-
- `<uri>` **must** start with `workspace://` or `/` (absolute path):
-  - `workspace://<file_id>` — workspace file by ID
-  - `workspace:///<path>` — workspace file by virtual path
-  - `/absolute/local/path` — ephemeral or sdk_cwd file
-  - E2B sandbox absolute path (e.g. `/home/user/script.py`)
- `[<start>-<end>]` is an optional 1-indexed inclusive line range.
- URIs that do not start with `workspace://` or `/` are **not** expanded.
+### File references — @@agptfile:
+Pass large file content to tools by reference: `@@agptfile:<uri>[<start>-<end>]`
+- `workspace://<file_id>` or `workspace:///<path>` — workspace files
+- `/absolute/path` — local/sandbox files
+- `[start-end]` — optional 1-indexed line range
+- Multiple refs per argument supported. Only `workspace://` and absolute paths are expanded.

 Examples:
 ```
@@ -49,21 +34,9 @@ Examples:
 @@agptfile:/home/user/script.py
 ```

-You can embed a reference inside any string argument, or use it as the entire
-value.  Multiple references in one argument are all expanded.
+**Structured data**: When the entire argument is a single file reference, the platform auto-parses by extension/MIME. Supported: JSON, JSONL, CSV, TSV, YAML, TOML, Parquet, Excel (.xlsx only; legacy `.xls` is NOT supported). Unrecognised formats return plain string.

-**Structured data**: When the **entire** argument value is a single file
-reference (no surrounding text), the platform automatically parses the file
-content based on its extension or MIME type.  Supported formats: JSON, JSONL,
-CSV, TSV, YAML, TOML, Parquet, and Excel (.xlsx — first sheet only).
-For example, pass `@@agptfile:workspace://<id>` where the file is a `.csv` and
-the rows will be parsed into `list[list[str]]` automatically.  If the format is
-unrecognised or parsing fails, the content is returned as a plain string.
-Legacy `.xls` files are **not** supported — only the modern `.xlsx` format.
-
-**Type coercion**: The platform also coerces expanded values to match the
-block's expected input types.  For example, if a block expects `list[list[str]]`
-and the expanded value is a JSON string, it will be parsed into the correct type.
+**Type coercion**: The platform auto-coerces expanded string values to match block input types (e.g. JSON string → `list[list[str]]`).

 ### Media file inputs (format: "file")
 Some block inputs accept media files — their schema shows `"format": "file"`.
@@ -81,18 +54,53 @@ that would be corrupted by text encoding.

 Example — committing an image file to GitHub:
 ```json
-{
-  "files": [{
+{{
+  "files": [{{
    "path": "docs/hero.png",
    "content": "workspace://abc123#image/png",
    "operation": "upsert"
-  }]
-}
+  }}]
+}}
 ```

 ### Sub-agent tasks
 - When using the Task tool, NEVER set `run_in_background` to true.
  All tasks must run in the foreground.
+
+### Delegating to another autopilot (sub-autopilot pattern)
+Use the **AutoPilotBlock** (`run_block` with block_id
+`{AUTOPILOT_BLOCK_ID}`) to delegate a task to a fresh
+autopilot instance.  The sub-autopilot has its own full tool set and can
+perform multi-step work autonomously.
+
+- **Input**: `prompt` (required) — the task description.
+  Optional: `system_context` to constrain behavior, `session_id` to
+  continue a previous conversation, `max_recursion_depth` (default 3).
+- **Output**: `response` (text), `tool_calls` (list), `session_id`
+  (for continuation), `conversation_history`, `token_usage`.
+
+Use this when a task is complex enough to benefit from a separate
+autopilot context, e.g. "research X and write a report" while the
+parent autopilot handles orchestration.
+"""
+
+# E2B-only notes — E2B has full internet access so gh CLI works there.
+# Not shown in local (bubblewrap) mode: --unshare-net blocks all network.
+_E2B_TOOL_NOTES = """
+### GitHub CLI (`gh`) and git
+- If the user has connected their GitHub account, both `gh` and `git` are
+  pre-authenticated — use them directly without any manual login step.
+  `git` HTTPS operations (clone, push, pull) work automatically.
+- If the token changes mid-session (e.g. user reconnects with a new token),
+  run `gh auth setup-git` to re-register the credential helper.
+- If `gh` or `git` fails with an authentication error (e.g. "authentication
+  required", "could not read Username", or exit code 128), call
+  `connect_integration(provider="github")` to surface the GitHub credentials
+  setup card so the user can connect their account. Once connected, retry
+  the operation.
+- For operations that need broader access (e.g. private org repos, GitHub
+  Actions), pass the required scopes: e.g.
+  `connect_integration(provider="github", scopes=["repo", "read:org"])`.
 """


@@ -105,6 +113,7 @@ def _build_storage_supplement(
    storage_system_1_persistence: list[str],
    file_move_name_1_to_2: str,
    file_move_name_2_to_1: str,
+    extra_notes: str = "",
 ) -> str:
    """Build storage/filesystem supplement for a specific environment.

@@ -119,6 +128,7 @@ def _build_storage_supplement(
        storage_system_1_persistence: List of persistence behavior descriptions
        file_move_name_1_to_2: Direction label for primary→persistent
        file_move_name_2_to_1: Direction label for persistent→primary
+        extra_notes: Environment-specific notes appended after shared notes
    """
    # Format lists as bullet points with proper indentation
    characteristics = "\n".join(f"   - {c}" for c in storage_system_1_characteristics)
@@ -128,17 +138,12 @@ def _build_storage_supplement(

 ## Tool notes

-### Shell commands
- The SDK built-in Bash tool is NOT available.  Use the `bash_exec` MCP tool
-  for shell commands — it runs {sandbox_type}.
-
-### Working directory
- Your working directory is: `{working_dir}`
- All SDK file tools AND `bash_exec` operate on the same filesystem
- Use relative paths or absolute paths under `{working_dir}` for all file operations
+### Shell & filesystem
+- The SDK built-in Bash tool is NOT available. Use `bash_exec` for shell commands ({sandbox_type}). Working dir: `{working_dir}`
+- SDK file tools (Read/Write/Edit/Glob/Grep) and `bash_exec` share one filesystem — use relative or absolute paths under this dir.
+- `read_workspace_file`/`write_workspace_file` operate on **persistent cloud workspace storage** (separate from the working dir).

 ### Two storage systems — CRITICAL to understand
-
 1. **{storage_system_1_name}** (`{working_dir}`):
 {characteristics}
 {persistence}
@@ -159,12 +164,16 @@ a local file under `~/.claude/projects/.../tool-results/`. To read these files,
 always use `read_file` or `Read` (NOT `read_workspace_file`).
 `read_workspace_file` reads from cloud workspace storage, where SDK
 tool-results are NOT stored.
-{_SHARED_TOOL_NOTES}"""
+{_SHARED_TOOL_NOTES}{extra_notes}"""


 # Pre-built supplements for common environments
 def _get_local_storage_supplement(cwd: str) -> str:
-    """Local ephemeral storage (files lost between turns)."""
+    """Local ephemeral storage (files lost between turns).
+
+    Network is isolated (bubblewrap --unshare-net), so internet-dependent CLIs
+    like gh will not work — no integration env-var notes are included.
+    """
    return _build_storage_supplement(
        working_dir=cwd,
        sandbox_type="in a network-isolated sandbox",
@@ -182,7 +191,11 @@ def _get_local_storage_supplement(cwd: str) -> str:


 def _get_cloud_sandbox_supplement() -> str:
-    """Cloud persistent sandbox (files survive across turns in session)."""
+    """Cloud persistent sandbox (files survive across turns in session).
+
+    E2B has full internet access, so integration tokens (GH_TOKEN etc.) are
+    injected per command in bash_exec — include the CLI guidance notes.
+    """
    return _build_storage_supplement(
        working_dir="/home/user",
        sandbox_type="in a cloud sandbox with full internet access",
@@ -197,6 +210,7 @@ def _get_cloud_sandbox_supplement() -> str:
        ],
        file_move_name_1_to_2="Sandbox → Persistent",
        file_move_name_2_to_1="Persistent → Sandbox",
+        extra_notes=_E2B_TOOL_NOTES,
    )


--- a/autogpt_platform/backend/backend/copilot/providers.py
+++ b/autogpt_platform/backend/backend/copilot/providers.py
@@ -0,0 +1,63 @@
+"""Single source of truth for copilot-supported integration providers.
+
+Both :mod:`~backend.copilot.integration_creds` (env-var injection) and
+:mod:`~backend.copilot.tools.connect_integration` (UI setup card) import from
+here, eliminating the risk of the two registries drifting out of sync.
+"""
+
+from typing import TypedDict
+
+
+class ProviderEntry(TypedDict):
+    """Metadata for a supported integration provider.
+
+    Attributes:
+        name: Human-readable display name (e.g. "GitHub").
+        env_vars: Environment variable names injected when the provider is
+            connected (e.g. ``["GH_TOKEN", "GITHUB_TOKEN"]``).
+        default_scopes: Default OAuth scopes requested when the agent does not
+            specify any.
+    """
+
+    name: str
+    env_vars: list[str]
+    default_scopes: list[str]
+
+
+def _is_github_oauth_configured() -> bool:
+    """Return True if GitHub OAuth env vars are set.
+
+    Uses a lazy import to avoid triggering ``Secrets()`` during module import,
+    which can fail in environments where secrets are not yet loaded (e.g. tests,
+    CLI tooling).
+    """
+    from backend.blocks.github._auth import GITHUB_OAUTH_IS_CONFIGURED
+
+    return GITHUB_OAUTH_IS_CONFIGURED
+
+
+# -- Registry ----------------------------------------------------------------
+# Add new providers here.  Both env-var injection and the setup-card tool read
+# from this single registry.
+
+SUPPORTED_PROVIDERS: dict[str, ProviderEntry] = {
+    "github": {
+        "name": "GitHub",
+        "env_vars": ["GH_TOKEN", "GITHUB_TOKEN"],
+        "default_scopes": ["repo"],
+    },
+}
+
+
+def get_provider_auth_types(provider: str) -> list[str]:
+    """Return the supported credential types for *provider* at runtime.
+
+    OAuth types are only offered when the corresponding OAuth client env vars
+    are configured.
+    """
+    if provider == "github":
+        if _is_github_oauth_configured():
+            return ["api_key", "oauth2"]
+        return ["api_key"]
+    # Default for unknown/future providers — API key only.
+    return ["api_key"]
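Reviewer sketch (not part of the commit): the tests earlier in this diff index a `PROVIDER_ENV_VARS` mapping by provider slug. One way such a mapping could be derived from the registry is shown here. The derivation and the `env_for_tokens` helper are assumptions made for illustration; the registry entry is copied from the file above.

```python
from typing import TypedDict


class ProviderEntry(TypedDict):
    name: str
    env_vars: list[str]
    default_scopes: list[str]


# Copied from the registry in the diff above.
SUPPORTED_PROVIDERS: dict[str, ProviderEntry] = {
    "github": {
        "name": "GitHub",
        "env_vars": ["GH_TOKEN", "GITHUB_TOKEN"],
        "default_scopes": ["repo"],
    },
}

# Assumption: a derived view keyed by provider slug, built once from the registry.
PROVIDER_ENV_VARS: dict[str, list[str]] = {
    slug: entry["env_vars"] for slug, entry in SUPPORTED_PROVIDERS.items()
}


def env_for_tokens(tokens: dict[str, str]) -> dict[str, str]:
    """Hypothetical helper: map each provider token onto all of its env vars."""
    env: dict[str, str] = {}
    for provider, token in tokens.items():
        # Unknown providers contribute nothing rather than raising.
        for var in PROVIDER_ENV_VARS.get(provider, []):
            env[var] = token
    return env
```

Deriving the mapping keeps a single place to edit when a provider is added, which is the stated goal of the registry module.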
--- a/autogpt_platform/backend/backend/copilot/response_model.py
+++ b/autogpt_platform/backend/backend/copilot/response_model.py
@@ -43,6 +43,7 @@ class ResponseType(str, Enum):
    ERROR = "error"
    USAGE = "usage"
    HEARTBEAT = "heartbeat"
+    STATUS = "status"


 class StreamBaseResponse(BaseModel):
@@ -263,3 +264,19 @@ class StreamHeartbeat(StreamBaseResponse):
    def to_sse(self) -> str:
        """Convert to SSE comment format to keep connection alive."""
        return ": heartbeat\n\n"
+
+
+class StreamStatus(StreamBaseResponse):
+    """Transient status notification shown to the user during long operations.
+
+    Used to provide feedback when the backend performs behind-the-scenes work
+    (e.g., compacting conversation context on a retry) that would otherwise
+    leave the user staring at an unexplained pause.
+
+    Sent as a proper ``data:`` event so the frontend can display it to the
+    user.  The AI SDK stream parser gracefully skips unknown chunk types
+    (logs a console warning), so this does not break the stream.
+    """
+
+    type: ResponseType = ResponseType.STATUS
+    message: str = Field(..., description="Human-readable status message")
--- a/autogpt_platform/backend/backend/copilot/sdk/__init__.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/__init__.py
@@ -24,10 +24,14 @@ from typing import TYPE_CHECKING, Any
 # Static imports for type checkers so they can resolve __all__ entries
 # without executing the lazy-import machinery at runtime.
 if TYPE_CHECKING:
+    from .collect import CopilotResult as CopilotResult
+    from .collect import collect_copilot_response as collect_copilot_response
    from .service import stream_chat_completion_sdk as stream_chat_completion_sdk
    from .tool_adapter import create_copilot_mcp_server as create_copilot_mcp_server

 __all__ = [
+    "CopilotResult",
+    "collect_copilot_response",
    "stream_chat_completion_sdk",
    "create_copilot_mcp_server",
 ]
@@ -35,6 +39,8 @@ __all__ = [
 # Dispatch table for PEP 562 lazy imports.  Each entry is a (module, attr)
 # pair so new exports can be added without touching __getattr__ itself.
 _LAZY_IMPORTS: dict[str, tuple[str, str]] = {
+    "CopilotResult": (".collect", "CopilotResult"),
+    "collect_copilot_response": (".collect", "collect_copilot_response"),
    "stream_chat_completion_sdk": (".service", "stream_chat_completion_sdk"),
    "create_copilot_mcp_server": (".tool_adapter", "create_copilot_mcp_server"),
 }
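Reviewer sketch (not part of the commit): the dispatch table above feeds a PEP 562 module-level `__getattr__`, which this diff does not show. The snippet below is a minimal, self-contained illustration of that pattern under stated assumptions: the module name `lazy_demo`, the table entries, and the caching step are hypothetical, and stdlib modules stand in for the package's relative imports.

```python
import importlib
import types

# Hypothetical dispatch table: exported name -> (module, attribute).
_LAZY_IMPORTS: dict[str, tuple[str, str]] = {
    "dumps": ("json", "dumps"),
    "sqrt": ("math", "sqrt"),
}

# A throwaway module object stands in for a real package __init__.
lazy = types.ModuleType("lazy_demo")


def _module_getattr(name: str):
    """Resolve an export on first access, then cache it on the module."""
    try:
        module_name, attr = _LAZY_IMPORTS[name]
    except KeyError:
        raise AttributeError(
            f"module 'lazy_demo' has no attribute {name!r}"
        ) from None
    value = getattr(importlib.import_module(module_name), attr)
    setattr(lazy, name, value)  # later lookups bypass __getattr__
    return value


# PEP 562: a __getattr__ stored in the module namespace handles missing names.
lazy.__getattr__ = _module_getattr
```

With a table like this, adding an export means adding one entry, and the import cost is paid only when the name is first used.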
--- a/autogpt_platform/backend/backend/copilot/sdk/collect.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/collect.py
@@ -0,0 +1,108 @@
+"""Public helpers for consuming a copilot stream as a simple request-response.
+
+This module exposes :class:`CopilotResult` and :func:`collect_copilot_response`
+so that callers (e.g. the AutoPilot block) can consume the copilot stream
+without implementing their own event loop.
+"""
+
+from __future__ import annotations
+
+from typing import Any
+
+
+class CopilotResult:
+    """Aggregated result from consuming a copilot stream.
+
+    Returned by :func:`collect_copilot_response` so callers don't need to
+    implement their own event-loop over the raw stream events.
+    """
+
+    __slots__ = (
+        "response_text",
+        "tool_calls",
+        "prompt_tokens",
+        "completion_tokens",
+        "total_tokens",
+    )
+
+    def __init__(self) -> None:
+        self.response_text: str = ""
+        self.tool_calls: list[dict[str, Any]] = []
+        self.prompt_tokens: int = 0
+        self.completion_tokens: int = 0
+        self.total_tokens: int = 0
+
+
+async def collect_copilot_response(
+    *,
+    session_id: str,
+    message: str,
+    user_id: str,
+    is_user_message: bool = True,
+) -> CopilotResult:
+    """Consume :func:`stream_chat_completion_sdk` and return aggregated results.
+
+    This is the recommended entry-point for callers that need a simple
+    request-response interface (e.g. the AutoPilot block) rather than
+    streaming individual events.  It avoids duplicating the event-collection
+    logic and does NOT wrap the stream in ``asyncio.timeout`` — the SDK
+    manages its own heartbeat-based timeouts internally.
+
+    Args:
+        session_id: Chat session to use.
+        message: The user message / prompt.
+        user_id: Authenticated user ID.
+        is_user_message: Whether this is a user-initiated message.
+
+    Returns:
+        A :class:`CopilotResult` with the aggregated response text,
+        tool calls, and token usage.
+
+    Raises:
+        RuntimeError: If the stream yields a ``StreamError`` event.
+    """
+    from backend.copilot.response_model import (
+        StreamError,
+        StreamTextDelta,
+        StreamToolInputAvailable,
+        StreamToolOutputAvailable,
+        StreamUsage,
+    )
+
+    from .service import stream_chat_completion_sdk
+
+    result = CopilotResult()
+    response_parts: list[str] = []
+    tool_calls_by_id: dict[str, dict[str, Any]] = {}
+
+    async for event in stream_chat_completion_sdk(
+        session_id=session_id,
+        message=message,
+        is_user_message=is_user_message,
+        user_id=user_id,
+    ):
+        if isinstance(event, StreamTextDelta):
+            response_parts.append(event.delta)
+        elif isinstance(event, StreamToolInputAvailable):
+            entry: dict[str, Any] = {
+                "tool_call_id": event.toolCallId,
+                "tool_name": event.toolName,
+                "input": event.input,
+                "output": None,
+                "success": None,
+            }
+            result.tool_calls.append(entry)
+            tool_calls_by_id[event.toolCallId] = entry
+        elif isinstance(event, StreamToolOutputAvailable):
+            if tc := tool_calls_by_id.get(event.toolCallId):
+                tc["output"] = event.output
+                tc["success"] = event.success
+        elif isinstance(event, StreamUsage):
+            result.prompt_tokens += event.prompt_tokens
+            result.completion_tokens += event.completion_tokens
+            result.total_tokens += event.total_tokens
+        elif isinstance(event, StreamError):
+            raise RuntimeError(f"Copilot error: {event.errorText}")
+
+    result.response_text = "".join(response_parts)
+    return result
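Reviewer sketch (not part of the commit): the aggregation loop in `collect_copilot_response` can be exercised in isolation with stand-in events. Everything below is hypothetical scaffolding written for illustration: the dataclasses are not the project's `Stream*` models, and `fake_stream` replaces the real SDK stream.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, AsyncIterator


# Hypothetical stand-ins for the real stream event models.
@dataclass
class TextDelta:
    delta: str


@dataclass
class ToolInput:
    tool_call_id: str
    tool_name: str
    input: dict[str, Any]


@dataclass
class ToolOutput:
    tool_call_id: str
    output: Any
    success: bool


async def fake_stream() -> AsyncIterator[Any]:
    """Yield a fixed event sequence in place of the real copilot stream."""
    yield TextDelta("Hello, ")
    yield ToolInput("call-1", "read_file", {"path": "/x"})
    yield ToolOutput("call-1", "file contents", True)
    yield TextDelta("world")


async def collect(stream: AsyncIterator[Any]) -> tuple[str, list[dict[str, Any]]]:
    """Join text deltas and attach each tool output to its earlier input."""
    parts: list[str] = []
    calls: list[dict[str, Any]] = []
    by_id: dict[str, dict[str, Any]] = {}
    async for event in stream:
        if isinstance(event, TextDelta):
            parts.append(event.delta)
        elif isinstance(event, ToolInput):
            entry: dict[str, Any] = {
                "tool_call_id": event.tool_call_id,
                "tool_name": event.tool_name,
                "input": event.input,
                "output": None,
                "success": None,
            }
            calls.append(entry)
            by_id[event.tool_call_id] = entry
        elif isinstance(event, ToolOutput):
            # Outputs with no matching input are ignored.
            if (entry := by_id.get(event.tool_call_id)) is not None:
                entry["output"] = event.output
                entry["success"] = event.success
    return "".join(parts), calls


text, calls = asyncio.run(collect(fake_stream()))
```

Indexing entries by tool call ID while also appending them to a list preserves call order and lets each output be matched to its input without scanning the list.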
--- a/autogpt_platform/backend/backend/copilot/sdk/compaction.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/compaction.py
@@ -12,6 +12,7 @@ import asyncio
 import logging
 import uuid
 from dataclasses import dataclass, field
+from typing import Any

 from ..constants import COMPACTION_DONE_MSG, COMPACTION_TOOL_NAME
 from ..model import ChatMessage, ChatSession
@@ -119,14 +120,12 @@ def filter_compaction_messages(
    filtered: list[ChatMessage] = []
    for msg in messages:
        if msg.role == "assistant" and msg.tool_calls:
+            real_calls: list[dict[str, Any]] = []
            for tc in msg.tool_calls:
                if tc.get("function", {}).get("name") == COMPACTION_TOOL_NAME:
                    compaction_ids.add(tc.get("id", ""))
-            real_calls = [
-                tc
-                for tc in msg.tool_calls
-                if tc.get("function", {}).get("name") != COMPACTION_TOOL_NAME
-            ]
+                else:
+                    real_calls.append(tc)
            if not real_calls and not msg.content:
                continue
        if msg.role == "tool" and msg.tool_call_id in compaction_ids:
@@ -222,6 +221,7 @@ class CompactionTracker:

    def reset_for_query(self) -> None:
        """Reset per-query state before a new SDK query."""
+        self._compact_start.clear()
        self._done = False
        self._start_emitted = False
        self._tool_call_id = ""
--- a/autogpt_platform/backend/backend/copilot/sdk/conftest.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/conftest.py
@@ -0,0 +1,54 @@
+"""Shared test fixtures for copilot SDK tests."""
+
+from __future__ import annotations
+
+from unittest.mock import patch
+from uuid import uuid4
+
+import pytest
+
+from backend.util import json
+
+
+@pytest.fixture()
+def mock_chat_config():
+    """Mock ChatConfig so compact_transcript tests skip real config lookup."""
+    with patch(
+        "backend.copilot.config.ChatConfig",
+        return_value=type("Cfg", (), {"model": "m", "api_key": "k", "base_url": "u"})(),
+    ):
+        yield
+
+
+def build_test_transcript(pairs: list[tuple[str, str]]) -> str:
+    """Build a minimal valid JSONL transcript from (role, content) pairs.
+
+    Use this helper in any copilot SDK test that needs a well-formed
+    transcript without hitting the real storage layer.
+    """
+    lines: list[str] = []
+    last_uuid: str | None = None
+    for role, content in pairs:
+        uid = str(uuid4())
+        entry_type = "assistant" if role == "assistant" else "user"
+        msg: dict = {"role": role, "content": content}
+        if role == "assistant":
+            msg.update(
+                {
+                    "model": "",
+                    "id": f"msg_{uid[:8]}",
+                    "type": "message",
+                    "content": [{"type": "text", "text": content}],
+                    "stop_reason": "end_turn",
+                    "stop_sequence": None,
+                }
+            )
+        entry = {
+            "type": entry_type,
+            "uuid": uid,
+            "parentUuid": last_uuid,
+            "message": msg,
+        }
+        lines.append(json.dumps(entry, separators=(",", ":")))
+        last_uuid = uid
+    return "\n".join(lines) + "\n"
--- a/autogpt_platform/backend/backend/copilot/sdk/prompt_too_long_test.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/prompt_too_long_test.py
@@ -0,0 +1,651 @@
+"""Tests for retry logic and transcript compaction helpers."""
+
+from __future__ import annotations
+
+import asyncio
+from unittest.mock import AsyncMock, patch
+from uuid import uuid4
+
+import pytest
+
+from backend.util import json
+from backend.util.prompt import CompressResult
+
+from .conftest import build_test_transcript as _build_transcript
+from .service import _friendly_error_text, _is_prompt_too_long
+from .transcript import (
+    _flatten_assistant_content,
+    _flatten_tool_result_content,
+    _messages_to_transcript,
+    _run_compression,
+    _transcript_to_messages,
+    compact_transcript,
+    validate_transcript,
+)
+
+# ---------------------------------------------------------------------------
+# _flatten_assistant_content
+# ---------------------------------------------------------------------------
+
+
+class TestFlattenAssistantContent:
+    def test_text_blocks(self):
+        blocks = [
+            {"type": "text", "text": "Hello"},
+            {"type": "text", "text": "World"},
+        ]
+        assert _flatten_assistant_content(blocks) == "Hello\nWorld"
+
+    def test_tool_use_blocks(self):
+        blocks = [{"type": "tool_use", "name": "read_file", "input": {}}]
+        assert _flatten_assistant_content(blocks) == "[tool_use: read_file]"
+
+    def test_mixed_blocks(self):
+        blocks = [
+            {"type": "text", "text": "Let me read that."},
+            {"type": "tool_use", "name": "Read", "input": {"path": "/foo"}},
+        ]
+        result = _flatten_assistant_content(blocks)
+        assert "Let me read that." in result
+        assert "[tool_use: Read]" in result
+
+    def test_raw_strings(self):
+        assert _flatten_assistant_content(["hello", "world"]) == "hello\nworld"
+
+    def test_unknown_block_type_preserved_as_placeholder(self):
+        blocks = [
+            {"type": "text", "text": "See this image:"},
+            {"type": "image", "source": {"type": "base64", "data": "..."}},
+        ]
+        result = _flatten_assistant_content(blocks)
+        assert "See this image:" in result
+        assert "[__image__]" in result
+
+    def test_empty(self):
+        assert _flatten_assistant_content([]) == ""
+
+
+# ---------------------------------------------------------------------------
+# _flatten_tool_result_content
+# ---------------------------------------------------------------------------
+
+
+class TestFlattenToolResultContent:
+    def test_tool_result_with_text(self):
+        blocks = [
+            {
+                "type": "tool_result",
+                "tool_use_id": "123",
+                "content": [{"type": "text", "text": "file contents here"}],
+            }
+        ]
+        assert _flatten_tool_result_content(blocks) == "file contents here"
+
+    def test_tool_result_with_string_content(self):
+        blocks = [{"type": "tool_result", "tool_use_id": "123", "content": "ok"}]
+        assert _flatten_tool_result_content(blocks) == "ok"
+
+    def test_text_block(self):
+        blocks = [{"type": "text", "text": "plain text"}]
+        assert _flatten_tool_result_content(blocks) == "plain text"
+
+    def test_raw_string(self):
+        assert _flatten_tool_result_content(["raw"]) == "raw"
+
+    def test_tool_result_with_none_content(self):
+        """tool_result with content=None should produce empty string."""
+        blocks = [{"type": "tool_result", "tool_use_id": "x", "content": None}]
+        assert _flatten_tool_result_content(blocks) == ""
+
+    def test_tool_result_with_empty_list_content(self):
+        """tool_result with content=[] should produce empty string."""
+        blocks = [{"type": "tool_result", "tool_use_id": "x", "content": []}]
+        assert _flatten_tool_result_content(blocks) == ""
+
+    def test_empty(self):
+        assert _flatten_tool_result_content([]) == ""
+
+    def test_nested_dict_without_text(self):
+        """Dict blocks without text key use json.dumps fallback."""
+        blocks = [
+            {
+                "type": "tool_result",
+                "tool_use_id": "x",
+                "content": [{"type": "image", "source": "data:..."}],
+            }
+        ]
+        result = _flatten_tool_result_content(blocks)
+        assert "image" in result  # json.dumps fallback
+
+    def test_unknown_block_type_preserved_as_placeholder(self):
+        blocks = [{"type": "image", "source": {"type": "base64", "data": "..."}}]
+        result = _flatten_tool_result_content(blocks)
+        assert "[__image__]" in result
+
+
+# ---------------------------------------------------------------------------
+# _transcript_to_messages
+# ---------------------------------------------------------------------------
+
+
+def _make_entry(entry_type: str, role: str, content: str | list, **kwargs) -> str:
+    """Build a JSONL line for testing."""
+    uid = str(uuid4())
+    msg: dict = {"role": role, "content": content}
+    msg.update(kwargs)
+    entry = {
+        "type": entry_type,
+        "uuid": uid,
+        "parentUuid": None,
+        "message": msg,
+    }
+    return json.dumps(entry, separators=(",", ":"))
+
+
+class TestTranscriptToMessages:
+    def test_basic_roundtrip(self):
+        lines = [
+            _make_entry("user", "user", "Hello"),
+            _make_entry("assistant", "assistant", [{"type": "text", "text": "Hi"}]),
+        ]
+        content = "\n".join(lines) + "\n"
+        messages = _transcript_to_messages(content)
+        assert len(messages) == 2
+        assert messages[0] == {"role": "user", "content": "Hello"}
+        assert messages[1] == {"role": "assistant", "content": "Hi"}
+
+    def test_skips_strippable_types(self):
+        """Progress and metadata entries are excluded."""
+        lines = [
+            _make_entry("user", "user", "Hello"),
+            json.dumps(
+                {
+                    "type": "progress",
+                    "uuid": str(uuid4()),
+                    "parentUuid": None,
+                    "message": {"role": "assistant", "content": "..."},
+                }
+            ),
+            _make_entry("assistant", "assistant", [{"type": "text", "text": "Hi"}]),
+        ]
+        content = "\n".join(lines) + "\n"
+        messages = _transcript_to_messages(content)
+        assert len(messages) == 2
+
+    def test_empty_content(self):
+        assert _transcript_to_messages("") == []
+
+    def test_tool_result_content(self):
+        """User entries with tool_result content blocks are flattened."""
+        lines = [
+            _make_entry(
+                "user",
+                "user",
+                [
+                    {
+                        "type": "tool_result",
+                        "tool_use_id": "123",
+                        "content": "tool output",
+                    }
+                ],
+            ),
+        ]
+        content = "\n".join(lines) + "\n"
+        messages = _transcript_to_messages(content)
+        assert len(messages) == 1
+        assert messages[0]["content"] == "tool output"
+
+    def test_malformed_json_lines_skipped(self):
+        """Malformed JSON lines in transcript are silently skipped."""
+        lines = [
+            _make_entry("user", "user", "Hello"),
+            "this is not valid json",
+            _make_entry("assistant", "assistant", [{"type": "text", "text": "Hi"}]),
+        ]
+        content = "\n".join(lines) + "\n"
+        messages = _transcript_to_messages(content)
+        assert len(messages) == 2
+
+    def test_empty_lines_skipped(self):
+        """Empty lines and whitespace-only lines are skipped."""
+        lines = [
+            _make_entry("user", "user", "Hello"),
+            "",
+            "   ",
+            _make_entry("assistant", "assistant", [{"type": "text", "text": "Hi"}]),
+        ]
+        content = "\n".join(lines) + "\n"
+        messages = _transcript_to_messages(content)
+        assert len(messages) == 2
+
+    def test_unicode_content_preserved(self):
+        """Unicode characters survive transcript roundtrip."""
+        lines = [
+            _make_entry("user", "user", "Hello 你好 🌍"),
+            _make_entry(
+                "assistant",
+                "assistant",
+                [{"type": "text", "text": "Bonjour 日本語 émojis 🎉"}],
+            ),
+        ]
+        content = "\n".join(lines) + "\n"
+        messages = _transcript_to_messages(content)
+        assert messages[0]["content"] == "Hello 你好 🌍"
+        assert messages[1]["content"] == "Bonjour 日本語 émojis 🎉"
+
+    def test_entry_without_role_skipped(self):
+        """Entries with missing role in message are skipped."""
+        entry_no_role = json.dumps(
+            {
+                "type": "user",
+                "uuid": str(uuid4()),
+                "parentUuid": None,
+                "message": {"content": "no role here"},
+            }
+        )
+        lines = [
+            entry_no_role,
+            _make_entry("user", "user", "Hello"),
+        ]
+        content = "\n".join(lines) + "\n"
+        messages = _transcript_to_messages(content)
+        assert len(messages) == 1
+        assert messages[0]["content"] == "Hello"
+
+    def test_tool_use_and_result_pairs(self):
+        """Tool use + tool result pairs are properly flattened."""
+        lines = [
+            _make_entry(
+                "assistant",
+                "assistant",
+                [
+                    {"type": "text", "text": "Let me check."},
+                    {"type": "tool_use", "name": "read_file", "input": {"path": "/x"}},
+                ],
+            ),
+            _make_entry(
+                "user",
+                "user",
+                [
+                    {
+                        "type": "tool_result",
+                        "tool_use_id": "abc",
+                        "content": [{"type": "text", "text": "file contents"}],
+                    }
+                ],
+            ),
+        ]
+        content = "\n".join(lines) + "\n"
+        messages = _transcript_to_messages(content)
+        assert len(messages) == 2
+        assert "Let me check." in messages[0]["content"]
+        assert "[tool_use: read_file]" in messages[0]["content"]
+        assert messages[1]["content"] == "file contents"
+
+
+# ---------------------------------------------------------------------------
+# _messages_to_transcript
+# ---------------------------------------------------------------------------
+
+
+class TestMessagesToTranscript:
+    def test_produces_valid_jsonl(self):
+        messages = [
+            {"role": "user", "content": "Hello"},
+            {"role": "assistant", "content": "Hi there"},
+        ]
+        result = _messages_to_transcript(messages)
+        lines = result.strip().split("\n")
+        assert len(lines) == 2
+        for line in lines:
+            parsed = json.loads(line)
+            assert "type" in parsed
+            assert "uuid" in parsed
+            assert "message" in parsed
+
+    def test_assistant_has_proper_structure(self):
+        messages = [{"role": "assistant", "content": "Hello"}]
+        result = _messages_to_transcript(messages)
+        entry = json.loads(result.strip())
+        assert entry["type"] == "assistant"
+        msg = entry["message"]
+        assert msg["role"] == "assistant"
+        assert msg["type"] == "message"
+        assert msg["stop_reason"] == "end_turn"
+        assert isinstance(msg["content"], list)
+        assert msg["content"][0]["type"] == "text"
+
+    def test_user_has_plain_content(self):
+        messages = [{"role": "user", "content": "Hi"}]
+        result = _messages_to_transcript(messages)
+        entry = json.loads(result.strip())
+        assert entry["type"] == "user"
+        assert entry["message"]["content"] == "Hi"
+
+    def test_parent_uuid_chain(self):
+        messages = [
+            {"role": "user", "content": "A"},
+            {"role": "assistant", "content": "B"},
+            {"role": "user", "content": "C"},
+        ]
+        result = _messages_to_transcript(messages)
+        lines = result.strip().split("\n")
+        entries = [json.loads(line) for line in lines]
+        assert entries[0]["parentUuid"] == ""
+        assert entries[1]["parentUuid"] == entries[0]["uuid"]
+        assert entries[2]["parentUuid"] == entries[1]["uuid"]
+
+    def test_empty_messages(self):
+        assert _messages_to_transcript([]) == ""
+
+    def test_output_is_valid_transcript(self):
+        """Output should pass validate_transcript if it has assistant entries."""
+        messages = [
+            {"role": "user", "content": "Hello"},
+            {"role": "assistant", "content": "Hi"},
+        ]
+        result = _messages_to_transcript(messages)
+        assert validate_transcript(result)
+
+    def test_roundtrip_to_messages(self):
+        """Messages → transcript → messages preserves structure."""
+        original = [
+            {"role": "user", "content": "Hello"},
+            {"role": "assistant", "content": "Hi there"},
+            {"role": "user", "content": "How are you?"},
+        ]
+        transcript = _messages_to_transcript(original)
+        restored = _transcript_to_messages(transcript)
+        assert len(restored) == len(original)
+        for orig, rest in zip(original, restored):
+            assert orig["role"] == rest["role"]
+            assert orig["content"] == rest["content"]
+
+
+# ---------------------------------------------------------------------------
+# compact_transcript
+# ---------------------------------------------------------------------------
+
+
+class TestCompactTranscript:
+    @pytest.mark.asyncio
+    async def test_too_few_messages_returns_none(self, mock_chat_config):
+        """compact_transcript returns None when transcript has < 2 messages."""
+        transcript = _build_transcript([("user", "Hello")])
+        result = await compact_transcript(transcript, model="test-model")
+        assert result is None
+
+    @pytest.mark.asyncio
+    async def test_returns_none_when_not_compacted(self, mock_chat_config):
+        """When compress_context says no compaction needed, returns None.
+        The compressor couldn't reduce it, so retrying with the same
+        content would fail identically."""
+        transcript = _build_transcript(
+            [
+                ("user", "Hello"),
+                ("assistant", "Hi there"),
+            ]
+        )
+        mock_result = type(
+            "CompressResult",
+            (),
+            {
+                "was_compacted": False,
+                "messages": [],
+                "original_token_count": 100,
+                "token_count": 100,
+                "messages_summarized": 0,
+                "messages_dropped": 0,
+            },
+        )()
+        with patch(
+            "backend.copilot.sdk.transcript._run_compression",
+            new_callable=AsyncMock,
+            return_value=mock_result,
+        ):
+            result = await compact_transcript(transcript, model="test-model")
+        assert result is None
+
+    @pytest.mark.asyncio
+    async def test_returns_compacted_transcript(self, mock_chat_config):
+        """When compaction succeeds, returns a valid compacted transcript."""
+        transcript = _build_transcript(
+            [
+                ("user", "Hello"),
+                ("assistant", "Hi"),
+                ("user", "More"),
+                ("assistant", "Details"),
+            ]
+        )
+        compacted_msgs = [
+            {"role": "user", "content": "[summary]"},
+            {"role": "assistant", "content": "Summarized response"},
+        ]
+        mock_result = type(
+            "CompressResult",
+            (),
+            {
+                "was_compacted": True,
+                "messages": compacted_msgs,
+                "original_token_count": 500,
+                "token_count": 100,
+                "messages_summarized": 2,
+                "messages_dropped": 0,
+            },
+        )()
+        with patch(
+            "backend.copilot.sdk.transcript._run_compression",
+            new_callable=AsyncMock,
+            return_value=mock_result,
+        ):
+            result = await compact_transcript(transcript, model="test-model")
+        assert result is not None
+        assert validate_transcript(result)
+        msgs = _transcript_to_messages(result)
+        assert len(msgs) == 2
+        assert msgs[1]["content"] == "Summarized response"
+
+    @pytest.mark.asyncio
+    async def test_returns_none_on_compression_failure(self, mock_chat_config):
+        """When _run_compression raises, returns None."""
+        transcript = _build_transcript(
+            [
+                ("user", "Hello"),
+                ("assistant", "Hi"),
+            ]
+        )
+        with patch(
+            "backend.copilot.sdk.transcript._run_compression",
+            new_callable=AsyncMock,
+            side_effect=RuntimeError("LLM unavailable"),
+        ):
+            result = await compact_transcript(transcript, model="test-model")
+        assert result is None
+
+
+# ---------------------------------------------------------------------------
+# _is_prompt_too_long
+# ---------------------------------------------------------------------------
+
+
+class TestIsPromptTooLong:
+    """Unit tests for _is_prompt_too_long pattern matching."""
+
+    def test_prompt_is_too_long(self):
+        err = RuntimeError("prompt is too long for model context")
+        assert _is_prompt_too_long(err) is True
+
+    def test_request_too_large(self):
+        err = Exception("request too large: 250000 tokens")
+        assert _is_prompt_too_long(err) is True
+
+    def test_maximum_context_length(self):
+        err = ValueError("maximum context length exceeded")
+        assert _is_prompt_too_long(err) is True
+
+    def test_context_length_exceeded(self):
+        err = Exception("context_length_exceeded")
+        assert _is_prompt_too_long(err) is True
+
+    def test_input_tokens_exceed(self):
+        err = Exception("input tokens exceed the max_tokens limit")
+        assert _is_prompt_too_long(err) is True
+
+    def test_input_is_too_long(self):
+        err = Exception("input is too long for the model")
+        assert _is_prompt_too_long(err) is True
+
+    def test_content_length_exceeds(self):
+        err = Exception("content length exceeds maximum")
+        assert _is_prompt_too_long(err) is True
+
+    def test_unrelated_error_returns_false(self):
+        err = RuntimeError("network timeout")
+        assert _is_prompt_too_long(err) is False
+
+    def test_auth_error_returns_false(self):
+        err = Exception("authentication failed: invalid API key")
+        assert _is_prompt_too_long(err) is False
+
+    def test_chained_exception_detected(self):
+        """Prompt-too-long error wrapped in another exception is detected."""
+        inner = RuntimeError("prompt is too long")
+        outer = Exception("SDK error")
+        outer.__cause__ = inner
+        assert _is_prompt_too_long(outer) is True
+
+    def test_case_insensitive(self):
+        err = Exception("PROMPT IS TOO LONG")
+        assert _is_prompt_too_long(err) is True
+
+    def test_old_max_tokens_exceeded_not_matched(self):
+        """The old broad 'max_tokens_exceeded' pattern was removed.
+        Only 'input tokens exceed' should match now."""
+        err = Exception("max_tokens_exceeded")
+        assert _is_prompt_too_long(err) is False
+
+
+# ---------------------------------------------------------------------------
+# _run_compression timeout fallback
+# ---------------------------------------------------------------------------
+
+
+class TestRunCompressionTimeout:
+    """Verify _run_compression falls back to truncation when LLM times out."""
+
+    @pytest.mark.asyncio
+    async def test_timeout_falls_back_to_truncation(self):
+        """When compress_context with LLM client times out,
+        _run_compression falls back to truncation (client=None)."""
+        messages = [
+            {"role": "user", "content": "Hello"},
+            {"role": "assistant", "content": "Hi there"},
+        ]
+        truncation_result = CompressResult(
+            messages=messages,
+            was_compacted=False,
+            original_token_count=50,
+            token_count=50,
+            messages_summarized=0,
+            messages_dropped=0,
+        )
+
+        call_args: list[dict] = []
+
+        async def _mock_compress(**kwargs):
+            call_args.append(kwargs)
+            if kwargs.get("client") is not None:
+                # Simulate timeout by raising asyncio.TimeoutError
+                raise asyncio.TimeoutError("LLM compaction timed out")
+            return truncation_result
+
+        with (
+            patch(
+                "backend.copilot.sdk.transcript.get_openai_client",
+                return_value="fake-client",
+            ),
+            patch(
+                "backend.copilot.sdk.transcript.compress_context",
+                side_effect=_mock_compress,
+            ),
+        ):
+            result = await _run_compression(messages, "test-model", "[test]")
+
+        assert result == truncation_result
+        # Should have been called twice: once with client, once without
+        assert len(call_args) == 2
+        assert call_args[0]["client"] is not None  # LLM attempt
+        assert call_args[1]["client"] is None  # truncation fallback
+
+    @pytest.mark.asyncio
+    async def test_no_client_uses_truncation_directly(self):
+        """When no OpenAI client is configured, goes straight to truncation."""
+        messages = [
+            {"role": "user", "content": "Hello"},
+            {"role": "assistant", "content": "Hi there"},
+        ]
+        truncation_result = CompressResult(
+            messages=messages,
+            was_compacted=False,
+            original_token_count=50,
+            token_count=50,
+            messages_summarized=0,
+            messages_dropped=0,
+        )
+
+        with (
+            patch(
+                "backend.copilot.sdk.transcript.get_openai_client",
+                return_value=None,
+            ),
+            patch(
+                "backend.copilot.sdk.transcript.compress_context",
+                new_callable=AsyncMock,
+                return_value=truncation_result,
+            ) as mock_compress,
+        ):
+            result = await _run_compression(messages, "test-model", "[test]")
+
+        assert result == truncation_result
+        mock_compress.assert_called_once()
+        # When no client, compress_context is called with client=None
+        assert mock_compress.call_args.kwargs.get("client") is None
+
+
+# ---------------------------------------------------------------------------
+# _friendly_error_text
+# ---------------------------------------------------------------------------
+
+
+class TestFriendlyErrorText:
+    """Verify user-friendly error message mapping."""
+
+    def test_authentication_error(self):
+        result = _friendly_error_text("authentication failed: invalid API key")
+        assert "Authentication" in result
+        assert "API key" in result
+
+    def test_rate_limit_error(self):
+        result = _friendly_error_text("rate limit exceeded")
+        assert "Rate limit" in result
+
+    def test_overloaded_error(self):
+        result = _friendly_error_text("API is overloaded")
+        assert "overloaded" in result
+
+    def test_timeout_error(self):
+        result = _friendly_error_text("Request timeout after 30s")
+        assert "timed out" in result
+
+    def test_connection_error(self):
+        result = _friendly_error_text("Connection refused")
+        assert "connection" in result.lower()
+
+    def test_unknown_error_passthrough(self):
+        result = _friendly_error_text("some unknown error XYZ")
+        assert "SDK stream error:" in result
+        assert "XYZ" in result
+
+    def test_unauthorized_error(self):
+        result = _friendly_error_text("401 Unauthorized")
+        assert "Authentication" in result
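
Review note: the `TestIsPromptTooLong` cases above pin down three behaviours: case-insensitive substring matching, walking `__cause__`/`__context__`, and termination on cyclic chains. The actual `_is_prompt_too_long` body is not part of this diff, so the following is only a minimal sketch of a matcher with those properties. The name `is_prompt_too_long_sketch` and the `_PATTERNS` tuple are hypothetical; the pattern strings are copied from the tests.

```python
from __future__ import annotations

# Hypothetical pattern list: the substrings exercised by the tests above.
_PATTERNS = (
    "prompt is too long",
    "request too large",
    "maximum context length",
    "context_length_exceeded",
    "input tokens exceed",
    "input is too long",
    "content length exceeds",
)


def is_prompt_too_long_sketch(err: BaseException) -> bool:
    """Return True if err, or any exception chained to it, matches a pattern."""
    seen: set[int] = set()  # ids of exceptions already visited (cycle guard)
    current: BaseException | None = err
    while current is not None and id(current) not in seen:
        seen.add(id(current))
        text = str(current).lower()  # case-insensitive comparison
        if any(pattern in text for pattern in _PATTERNS):
            return True
        # Prefer the explicit cause, fall back to the implicit context.
        current = current.__cause__ or current.__context__
    return False
```

Tracking visited exceptions by `id()` is what keeps the loop finite when `a.__cause__` and `b.__cause__` point at each other, as in `test_cycle_detection`.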
--- a/autogpt_platform/backend/backend/copilot/sdk/retry_scenarios_test.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/retry_scenarios_test.py
--- a/autogpt_platform/backend/backend/copilot/sdk/service.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/service.py
--- a/autogpt_platform/backend/backend/copilot/sdk/service_helpers_test.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/service_helpers_test.py
@@ -0,0 +1,283 @@
+"""Unit tests for extracted service helpers.
+
+Covers ``_is_prompt_too_long``, ``_reduce_context``, ``_iter_sdk_messages``,
+and the ``ReducedContext`` named tuple.
+"""
+
+from __future__ import annotations
+
+import asyncio
+from collections.abc import AsyncGenerator
+from unittest.mock import AsyncMock, patch
+
+import pytest
+
+from .conftest import build_test_transcript as _build_transcript
+from .service import (
+    ReducedContext,
+    _is_prompt_too_long,
+    _iter_sdk_messages,
+    _reduce_context,
+)
+
+# ---------------------------------------------------------------------------
+# _is_prompt_too_long
+# ---------------------------------------------------------------------------
+
+
+class TestIsPromptTooLong:
+    def test_direct_match(self) -> None:
+        assert _is_prompt_too_long(Exception("prompt is too long")) is True
+
+    def test_case_insensitive(self) -> None:
+        assert _is_prompt_too_long(Exception("PROMPT IS TOO LONG")) is True
+
+    def test_no_match(self) -> None:
+        assert _is_prompt_too_long(Exception("network timeout")) is False
+
+    def test_request_too_large(self) -> None:
+        assert _is_prompt_too_long(Exception("request too large for model")) is True
+
+    def test_context_length_exceeded(self) -> None:
+        assert _is_prompt_too_long(Exception("context_length_exceeded")) is True
+
+    def test_max_tokens_exceeded_not_matched(self) -> None:
+        """'max_tokens_exceeded' is intentionally excluded (too broad)."""
+        assert _is_prompt_too_long(Exception("max_tokens_exceeded")) is False
+
+    def test_max_tokens_config_error_no_match(self) -> None:
+        """'max_tokens must be at least 1' should NOT match."""
+        assert _is_prompt_too_long(Exception("max_tokens must be at least 1")) is False
+
+    def test_chained_cause(self) -> None:
+        inner = Exception("prompt is too long")
+        outer = RuntimeError("SDK error")
+        outer.__cause__ = inner
+        assert _is_prompt_too_long(outer) is True
+
+    def test_chained_context(self) -> None:
+        inner = Exception("request too large")
+        outer = RuntimeError("wrapped")
+        outer.__context__ = inner
+        assert _is_prompt_too_long(outer) is True
+
+    def test_deep_chain(self) -> None:
+        bottom = Exception("maximum context length")
+        middle = RuntimeError("middle")
+        middle.__cause__ = bottom
+        top = ValueError("top")
+        top.__cause__ = middle
+        assert _is_prompt_too_long(top) is True
+
+    def test_chain_no_match(self) -> None:
+        inner = Exception("rate limit exceeded")
+        outer = RuntimeError("wrapped")
+        outer.__cause__ = inner
+        assert _is_prompt_too_long(outer) is False
+
+    def test_cycle_detection(self) -> None:
+        """Exception chain with a cycle should not infinite-loop."""
+        a = Exception("error a")
+        b = Exception("error b")
+        a.__cause__ = b
+        b.__cause__ = a  # cycle
+        assert _is_prompt_too_long(a) is False
+
+    def test_all_patterns(self) -> None:
+        patterns = [
+            "prompt is too long",
+            "request too large",
+            "maximum context length",
+            "context_length_exceeded",
+            "input tokens exceed",
+            "input is too long",
+            "content length exceeds",
+        ]
+        for pattern in patterns:
+            assert _is_prompt_too_long(Exception(pattern)) is True, pattern
+
+
+# ---------------------------------------------------------------------------
+# _reduce_context
+# ---------------------------------------------------------------------------
+
+
+class TestReduceContext:
+    @pytest.mark.asyncio
+    async def test_first_retry_compaction_success(self) -> None:
+        transcript = _build_transcript([("user", "hi"), ("assistant", "hello")])
+        compacted = _build_transcript([("user", "hi"), ("assistant", "[summary]")])
+
+        with (
+            patch(
+                "backend.copilot.sdk.service.compact_transcript",
+                new_callable=AsyncMock,
+                return_value=compacted,
+            ),
+            patch(
+                "backend.copilot.sdk.service.validate_transcript",
+                return_value=True,
+            ),
+            patch(
+                "backend.copilot.sdk.service.write_transcript_to_tempfile",
+                return_value="/tmp/resume.jsonl",
+            ),
+        ):
+            ctx = await _reduce_context(
+                transcript, False, "sess-123", "/tmp/cwd", "[test]"
+            )
+
+        assert isinstance(ctx, ReducedContext)
+        assert ctx.use_resume is True
+        assert ctx.resume_file == "/tmp/resume.jsonl"
+        assert ctx.transcript_lost is False
+        assert ctx.tried_compaction is True
+
+    @pytest.mark.asyncio
+    async def test_compaction_fails_drops_transcript(self) -> None:
+        transcript = _build_transcript([("user", "hi"), ("assistant", "hello")])
+
+        with patch(
+            "backend.copilot.sdk.service.compact_transcript",
+            new_callable=AsyncMock,
+            return_value=None,
+        ):
+            ctx = await _reduce_context(
+                transcript, False, "sess-123", "/tmp/cwd", "[test]"
+            )
+
+        assert ctx.use_resume is False
+        assert ctx.resume_file is None
+        assert ctx.transcript_lost is True
+        assert ctx.tried_compaction is True
+
+    @pytest.mark.asyncio
+    async def test_already_tried_compaction_skips(self) -> None:
+        transcript = _build_transcript([("user", "hi"), ("assistant", "hello")])
+
+        ctx = await _reduce_context(transcript, True, "sess-123", "/tmp/cwd", "[test]")
+
+        assert ctx.use_resume is False
+        assert ctx.transcript_lost is True
+        assert ctx.tried_compaction is True
+
+    @pytest.mark.asyncio
+    async def test_empty_transcript_drops(self) -> None:
+        ctx = await _reduce_context("", False, "sess-123", "/tmp/cwd", "[test]")
+
+        assert ctx.use_resume is False
+        assert ctx.transcript_lost is True
+
+    @pytest.mark.asyncio
+    async def test_compaction_returns_same_content_drops(self) -> None:
+        transcript = _build_transcript([("user", "hi"), ("assistant", "hello")])
+
+        with patch(
+            "backend.copilot.sdk.service.compact_transcript",
+            new_callable=AsyncMock,
+            return_value=transcript,  # same content
+        ):
+            ctx = await _reduce_context(
+                transcript, False, "sess-123", "/tmp/cwd", "[test]"
+            )
+
+        assert ctx.transcript_lost is True
+
+    @pytest.mark.asyncio
+    async def test_write_tempfile_fails_drops(self) -> None:
+        transcript = _build_transcript([("user", "hi"), ("assistant", "hello")])
+        compacted = _build_transcript([("user", "hi"), ("assistant", "[summary]")])
+
+        with (
+            patch(
+                "backend.copilot.sdk.service.compact_transcript",
+                new_callable=AsyncMock,
+                return_value=compacted,
+            ),
+            patch(
+                "backend.copilot.sdk.service.validate_transcript",
+                return_value=True,
+            ),
+            patch(
+                "backend.copilot.sdk.service.write_transcript_to_tempfile",
+                return_value=None,
+            ),
+        ):
+            ctx = await _reduce_context(
+                transcript, False, "sess-123", "/tmp/cwd", "[test]"
+            )
+
+        assert ctx.transcript_lost is True
+
+
+# ---------------------------------------------------------------------------
+# _iter_sdk_messages
+# ---------------------------------------------------------------------------
+
+
+class TestIterSdkMessages:
+    @pytest.mark.asyncio
+    async def test_yields_messages(self) -> None:
+        messages = ["msg1", "msg2", "msg3"]
+        client = AsyncMock()
+
+        async def _fake_receive() -> AsyncGenerator[str]:
+            for m in messages:
+                yield m
+
+        client.receive_response = _fake_receive
+        result = [msg async for msg in _iter_sdk_messages(client)]
+        assert result == messages
+
+    @pytest.mark.asyncio
+    async def test_heartbeat_on_timeout(self) -> None:
+        """Yields None when asyncio.wait times out."""
+        client = AsyncMock()
+        received: list = []
+
+        async def _slow_receive() -> AsyncGenerator[str]:
+            await asyncio.sleep(100)  # never completes
+            yield "never"  # pragma: no cover — unreachable, yield makes this an async generator
+
+        client.receive_response = _slow_receive
+
+        with patch("backend.copilot.sdk.service._HEARTBEAT_INTERVAL", 0.01):
+            count = 0
+            async for msg in _iter_sdk_messages(client):
+                received.append(msg)
+                count += 1
+                if count >= 3:
+                    break
+
+        assert received == [None] * 3
+
+    @pytest.mark.asyncio
+    async def test_exception_propagates(self) -> None:
+        client = AsyncMock()
+
+        async def _error_receive() -> AsyncGenerator[str]:
+            raise RuntimeError("SDK crash")
+            yield  # pragma: no cover — unreachable, yield makes this an async generator
+
+        client.receive_response = _error_receive
+
+        with pytest.raises(RuntimeError, match="SDK crash"):
+            async for _ in _iter_sdk_messages(client):
+                pass
+
+    @pytest.mark.asyncio
+    async def test_task_cleanup_on_break(self) -> None:
+        """Pending task is cancelled when generator is closed."""
+        client = AsyncMock()
+
+        async def _slow_receive() -> AsyncGenerator[str]:
+            yield "first"
+            await asyncio.sleep(100)
+            yield "second"
+
+        client.receive_response = _slow_receive
+
+        gen = _iter_sdk_messages(client)
+        first = await gen.__anext__()
+        assert first == "first"
+        await gen.aclose()  # should cancel pending task cleanly
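
Review note: the `TestIterSdkMessages` cases describe a generator that forwards messages, yields `None` as a heartbeat when the source is slow, propagates source errors, and cancels its pending task on close. `_iter_sdk_messages` itself is not shown in this diff, so the sketch below is an assumption about how such a wrapper can be built with `asyncio.wait`. The names `iter_with_heartbeat`, `_next_or_done`, and `_DONE` are hypothetical, and the sketch takes a plain async iterator instead of an SDK client.

```python
from __future__ import annotations

import asyncio
from collections.abc import AsyncIterator
from typing import Any

_DONE = object()  # sentinel marking exhaustion of the source


async def _next_or_done(it: AsyncIterator[Any]) -> Any:
    """Fetch the next item, mapping exhaustion to the _DONE sentinel."""
    try:
        return await it.__anext__()
    except StopAsyncIteration:
        return _DONE


async def iter_with_heartbeat(
    source: AsyncIterator[Any], interval: float
) -> AsyncIterator[Any | None]:
    """Yield items from source, or None whenever interval elapses first."""
    it = source.__aiter__()
    pending: asyncio.Task[Any] | None = None
    try:
        while True:
            if pending is None:
                pending = asyncio.create_task(_next_or_done(it))
            done, _ = await asyncio.wait({pending}, timeout=interval)
            if not done:
                yield None  # heartbeat; keep waiting on the same task
                continue
            task, pending = pending, None
            item = task.result()  # re-raises any error from the source
            if item is _DONE:
                return
            yield item
    finally:
        # Runs on normal exit, on errors, and on aclose(): stop the pending read.
        if pending is not None:
            pending.cancel()
            await asyncio.gather(pending, return_exceptions=True)
```

Reusing the same task across timeouts matters: `asyncio.wait` does not cancel the awaited task when the timeout fires, so no message is lost between heartbeats.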
--- a/autogpt_platform/backend/backend/copilot/sdk/subscription.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/subscription.py
@@ -0,0 +1,144 @@
+"""Claude Code subscription auth helpers.
+
+Handles locating the SDK-bundled CLI binary, provisioning credentials from
+environment variables, and validating that subscription auth is functional.
+"""
+
+import functools
+import json
+import logging
+import os
+import shutil
+import subprocess
+
+logger = logging.getLogger(__name__)
+
+
+def find_bundled_cli() -> str:
+    """Locate the Claude CLI binary bundled inside ``claude_agent_sdk``.
+
+    Falls back to ``shutil.which("claude")`` if the SDK bundle is absent.
+    """
+    try:
+        from claude_agent_sdk._internal.transport.subprocess_cli import (
+            SubprocessCLITransport,
+        )
+
+        path = SubprocessCLITransport._find_bundled_cli(None)  # type: ignore[arg-type]
+        if path:
+            return str(path)
+    except Exception:
+        logger.debug("SDK-bundled Claude CLI lookup failed", exc_info=True)
+    system_path = shutil.which("claude")
+    if system_path:
+        return system_path
+    raise RuntimeError(
+        "Claude CLI not found — neither the SDK-bundled binary nor a "
+        "system-installed `claude` could be located."
+    )
+
+
+def provision_credentials_file() -> None:
+    """Write ``~/.claude/.credentials.json`` from env when running headless.
+
+    If ``CLAUDE_CODE_OAUTH_TOKEN`` is set (an OAuth *access* token obtained
+    from ``claude auth status`` or extracted from the macOS keychain), this
+    helper writes a minimal credentials file so the bundled CLI can
+    authenticate without an interactive ``claude login``.
+
+    A ``CLAUDE_CODE_REFRESH_TOKEN`` env var is optional but recommended —
+    it lets the CLI silently refresh an expired access token.
+    """
+    access_token = os.environ.get("CLAUDE_CODE_OAUTH_TOKEN", "").strip()
+    if not access_token:
+        return
+
+    creds_dir = os.path.expanduser("~/.claude")
+    creds_path = os.path.join(creds_dir, ".credentials.json")
+
+    # Don't overwrite an existing credentials file (e.g. from a volume mount).
+    if os.path.exists(creds_path):
+        logger.debug("Credentials file already exists at %s — skipping", creds_path)
+        return
+
+    os.makedirs(creds_dir, exist_ok=True)
+
+    creds = {
+        "claudeAiOauth": {
+            "accessToken": access_token,
+            "refreshToken": os.environ.get("CLAUDE_CODE_REFRESH_TOKEN", "").strip(),
+            "expiresAt": 0,
+            "scopes": [
+                "user:inference",
+                "user:profile",
+                "user:sessions:claude_code",
+            ],
+        }
+    }
+    with open(creds_path, "w", opener=lambda p, fl: os.open(p, fl, 0o600)) as f:
+        json.dump(creds, f)
+    logger.info("Provisioned Claude credentials file at %s", creds_path)
+
+
+@functools.cache
+def validate_subscription() -> None:
+    """Validate the bundled Claude CLI is reachable and authenticated.
+
+    Cached so the blocking subprocess check runs at most once per process
+    lifetime.  On first call, also provisions ``~/.claude/.credentials.json``
+    from the ``CLAUDE_CODE_OAUTH_TOKEN`` env var when available.
+    """
+    provision_credentials_file()
+
+    cli = find_bundled_cli()
+    result = subprocess.run(
+        [cli, "--version"],
+        capture_output=True,
+        text=True,
+        timeout=10,
+    )
+    if result.returncode != 0:
+        raise RuntimeError(
+            f"Claude CLI check failed (exit {result.returncode}): "
+            f"{result.stderr.strip()}"
+        )
+    logger.info(
+        "Claude Code subscription mode: CLI version %s",
+        result.stdout.strip(),
+    )
+
+    # Verify the CLI is actually authenticated.
+    auth_result = subprocess.run(
+        [cli, "auth", "status"],
+        capture_output=True,
+        text=True,
+        timeout=10,
+        env={
+            **os.environ,
+            "ANTHROPIC_API_KEY": "",
+            "ANTHROPIC_AUTH_TOKEN": "",
+            "ANTHROPIC_BASE_URL": "",
+        },
+    )
+    if auth_result.returncode != 0:
+        raise RuntimeError(
+            "Claude CLI is not authenticated. Either:\n"
+            "  • Set CLAUDE_CODE_OAUTH_TOKEN env var (from `claude auth status` "
+            "or macOS keychain), or\n"
+            "  • Mount ~/.claude/.credentials.json into the container, or\n"
+            "  • Run `claude login` inside the container."
+        )
+    try:
+        status = json.loads(auth_result.stdout)
+        if not status.get("loggedIn"):
+            raise RuntimeError(
+                "Claude CLI reports loggedIn=false. Set CLAUDE_CODE_OAUTH_TOKEN "
+                "or run `claude login`."
+            )
+        logger.info(
+            "Claude subscription auth: method=%s, email=%s",
+            status.get("authMethod"),
+            status.get("email"),
+        )
+    except json.JSONDecodeError:
+        logger.warning("Could not parse `claude auth status` output")
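
Review note: a file that holds an OAuth token is usually created owner-only, and a separate exists-check followed by a write leaves a small race window. The following is a minimal sketch of one way to combine both concerns, assuming a POSIX filesystem. `write_private_json` is a hypothetical helper and not part of this PR.

```python
import json
import os


def write_private_json(path: str, payload: dict) -> bool:
    """Create path with owner-only permissions; return False if it exists."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, mode=0o700, exist_ok=True)
    try:
        # O_EXCL makes "create only if absent" a single atomic step,
        # and the 0o600 mode applies from the moment the file exists.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        json.dump(payload, f)
    return True
```

Returning `False` when the file already exists mirrors the "don't overwrite a mounted credentials file" behaviour described in `provision_credentials_file`.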
--- a/autogpt_platform/backend/backend/copilot/sdk/transcript.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/transcript.py
@@ -10,6 +10,9 @@ Storage is handled via ``WorkspaceStorageBackend`` (GCS in prod, local
 filesystem for self-hosted) — no DB column needed.
 """

+from __future__ import annotations
+
+import asyncio
 import logging
 import os
 import re
@@ -17,8 +20,12 @@ import shutil
 import time
 from dataclasses import dataclass
 from pathlib import Path
+from uuid import uuid4

 from backend.util import json
+from backend.util.clients import get_openai_client
+from backend.util.prompt import CompressResult, compress_context
+from backend.util.workspace_storage import GCSWorkspaceStorage, get_workspace_storage

 logger = logging.getLogger(__name__)

@@ -99,7 +106,14 @@ def strip_progress_entries(content: str) -> str:
            continue
        parent = entry.get("parentUuid", "")
        original_parent = parent
-        while parent in stripped_uuids:
+        # seen_parents is local per-entry (not shared across iterations) so
+        # it can only detect cycles within a single ancestry walk, not across
+        # entries.  This is intentional: each entry's parent chain is
+        # independent, and reusing a global set would incorrectly short-circuit
+        # valid re-use of the same UUID as a parent in different subtrees.
+        seen_parents: set[str] = set()
+        while parent in stripped_uuids and parent not in seen_parents:
+            seen_parents.add(parent)
            parent = uuid_to_parent.get(parent, "")
        if parent != original_parent:
            entry["parentUuid"] = parent
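
Reviewer sketch of the cycle guard in the hunk above, under stated assumptions: `resolve_parent` is a hypothetical helper (the real code inlines this loop in `strip_progress_entries`), and the UUIDs are made up. It suggests that a per-walk `seen` set bounds the loop on a malformed, self-referential chain while leaving the normal case unchanged.

```python
def resolve_parent(parent: str, stripped: set[str], uuid_to_parent: dict[str, str]) -> str:
    """Walk up the ancestry until a non-stripped parent is found (hypothetical helper)."""
    seen: set[str] = set()  # local to this walk, mirroring seen_parents in the diff
    while parent in stripped and parent not in seen:
        seen.add(parent)
        parent = uuid_to_parent.get(parent, "")
    return parent


# Normal chain: "a" <- "p1" <- "p2", where p1 and p2 were stripped.
chain = {"p2": "p1", "p1": "a", "a": ""}
normal = resolve_parent("p2", {"p1", "p2"}, chain)

# Malformed input: p1 and p2 name each other as parent.
cyclic = {"p1": "p2", "p2": "p1"}
looped = resolve_parent("p1", {"p1", "p2"}, cyclic)
```

In this sketch the cyclic walk stops once it revisits a UUID, but the value it returns is still one of the stripped UUIDs. The guard bounds the loop; it does not repair the chain.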
@@ -336,7 +350,7 @@ def write_transcript_to_tempfile(
    # Validate cwd is under the expected sandbox prefix (CodeQL sanitizer).
    real_cwd = os.path.realpath(cwd)
    if not real_cwd.startswith(_SAFE_CWD_PREFIX):
-        logger.warning(f"[Transcript] cwd outside sandbox: {cwd}")
+        logger.warning("[Transcript] cwd outside sandbox: %s", cwd)
        return None

    try:
@@ -346,17 +360,17 @@ def write_transcript_to_tempfile(
            os.path.join(real_cwd, f"transcript-{safe_id}.jsonl")
        )
        if not jsonl_path.startswith(real_cwd):
-            logger.warning(f"[Transcript] Path escaped cwd: {jsonl_path}")
+            logger.warning("[Transcript] Path escaped cwd: %s", jsonl_path)
            return None

        with open(jsonl_path, "w") as f:
            f.write(transcript_content)

-        logger.info(f"[Transcript] Wrote resume file: {jsonl_path}")
+        logger.info("[Transcript] Wrote resume file: %s", jsonl_path)
        return jsonl_path

    except OSError as e:
-        logger.warning(f"[Transcript] Failed to write resume file: {e}")
+        logger.warning("[Transcript] Failed to write resume file: %s", e)
        return None


@@ -417,8 +431,6 @@ def _meta_storage_path_parts(user_id: str, session_id: str) -> tuple[str, str, s

 def _build_path_from_parts(parts: tuple[str, str, str], backend: object) -> str:
    """Build a full storage path from (workspace_id, file_id, filename) parts."""
-    from backend.util.workspace_storage import GCSWorkspaceStorage
-
    wid, fid, fname = parts
    if isinstance(backend, GCSWorkspaceStorage):
        blob = f"workspaces/{wid}/{fid}/{fname}"
@@ -457,17 +469,15 @@ async def upload_transcript(
        content: Complete JSONL transcript (from TranscriptBuilder).
        message_count: ``len(session.messages)`` at upload time.
    """
-    from backend.util.workspace_storage import get_workspace_storage
-
    # Strip metadata entries (progress, file-history-snapshot, etc.)
    # Note: SDK-built transcripts shouldn't have these, but strip for safety
    stripped = strip_progress_entries(content)
    if not validate_transcript(stripped):
        # Log entry types for debugging — helps identify why validation failed
-        entry_types: list[str] = []
-        for line in stripped.strip().split("\n"):
-            entry = json.loads(line, fallback={"type": "INVALID_JSON"})
-            entry_types.append(entry.get("type", "?"))
+        entry_types = [
+            json.loads(line, fallback={"type": "INVALID_JSON"}).get("type", "?")
+            for line in stripped.strip().split("\n")
+        ]
        logger.warning(
            "%s Skipping upload — stripped content not valid "
            "(types=%s, stripped_len=%d, raw_len=%d)",
@@ -503,11 +513,14 @@ async def upload_transcript(
            content=json.dumps(meta).encode("utf-8"),
        )
    except Exception as e:
-        logger.warning(f"{log_prefix} Failed to write metadata: {e}")
+        logger.warning("%s Failed to write metadata: %s", log_prefix, e)

    logger.info(
-        f"{log_prefix} Uploaded {len(encoded)}B "
-        f"(stripped from {len(content)}B, msg_count={message_count})"
+        "%s Uploaded %dB (stripped from %dB, msg_count=%d)",
+        log_prefix,
+        len(encoded),
+        len(content),
+        message_count,
    )


@@ -521,8 +534,6 @@ async def download_transcript(
    Returns a ``TranscriptDownload`` with the JSONL content and the
    ``message_count`` watermark from the upload, or ``None`` if not found.
    """
-    from backend.util.workspace_storage import get_workspace_storage
-
    storage = await get_workspace_storage()
    path = _build_storage_path(user_id, session_id, storage)

@@ -530,10 +541,10 @@ async def download_transcript(
        data = await storage.retrieve(path)
        content = data.decode("utf-8")
    except FileNotFoundError:
-        logger.debug(f"{log_prefix} No transcript in storage")
+        logger.debug("%s No transcript in storage", log_prefix)
        return None
    except Exception as e:
-        logger.warning(f"{log_prefix} Failed to download transcript: {e}")
+        logger.warning("%s Failed to download transcript: %s", log_prefix, e)
        return None

    # Try to load metadata (best-effort — old transcripts won't have it)
@@ -545,10 +556,14 @@ async def download_transcript(
        meta = json.loads(meta_data.decode("utf-8"), fallback={})
        message_count = meta.get("message_count", 0)
        uploaded_at = meta.get("uploaded_at", 0.0)
-    except (FileNotFoundError, Exception):
+    except FileNotFoundError:
        pass  # No metadata — treat as unknown (msg_count=0 → always fill gap)
+    except Exception as e:
+        logger.debug("%s Failed to load transcript metadata: %s", log_prefix, e)

-    logger.info(f"{log_prefix} Downloaded {len(content)}B (msg_count={message_count})")
+    logger.info(
+        "%s Downloaded %dB (msg_count=%d)", log_prefix, len(content), message_count
+    )
    return TranscriptDownload(
        content=content,
        message_count=message_count,
@@ -562,8 +577,6 @@ async def delete_transcript(user_id: str, session_id: str) -> None:
    Removes both the ``.jsonl`` transcript and the companion ``.meta.json``
    so stale ``message_count`` watermarks cannot corrupt gap-fill logic.
    """
-    from backend.util.workspace_storage import get_workspace_storage
-
    storage = await get_workspace_storage()
    path = _build_storage_path(user_id, session_id, storage)

@@ -580,3 +593,280 @@ async def delete_transcript(user_id: str, session_id: str) -> None:
        logger.info("[Transcript] Deleted metadata for session %s", session_id)
    except Exception as e:
        logger.warning("[Transcript] Failed to delete metadata: %s", e)
+
+
+# ---------------------------------------------------------------------------
+# Transcript compaction — LLM summarization for prompt-too-long recovery
+# ---------------------------------------------------------------------------
+
+# JSONL protocol values used in transcript serialization.
+STOP_REASON_END_TURN = "end_turn"
+COMPACT_MSG_ID_PREFIX = "msg_compact_"
+ENTRY_TYPE_MESSAGE = "message"
+
+
+def _flatten_assistant_content(blocks: list) -> str:
+    """Flatten assistant content blocks into a single plain-text string.
+
+    Structured ``tool_use`` blocks are converted to ``[tool_use: name]``
+    placeholders.  This is intentional: ``compress_context`` requires plain
+    text for token counting and LLM summarization.  The structural loss is
+    acceptable because compaction only runs when the original transcript was
+    already too large for the model — a summarized plain-text version is
+    better than no context at all.
+    """
+    parts: list[str] = []
+    for block in blocks:
+        if isinstance(block, dict):
+            btype = block.get("type", "")
+            if btype == "text":
+                parts.append(block.get("text", ""))
+            elif btype == "tool_use":
+                parts.append(f"[tool_use: {block.get('name', '?')}]")
+            else:
+                # Preserve non-text blocks (e.g. image) as placeholders.
+                # Use __prefix__ to distinguish from literal user text.
+                parts.append(f"[__{btype}__]")
+        elif isinstance(block, str):
+            parts.append(block)
+    return "\n".join(parts) if parts else ""
+
+
+def _flatten_tool_result_content(blocks: list) -> str:
+    """Flatten tool_result and other content blocks into plain text.
+
+    Handles nested tool_result structures, text blocks, and raw strings.
+    Uses ``json.dumps`` as fallback for dict blocks without a ``text`` key
+    or where ``text`` is ``None``.
+
+    Like ``_flatten_assistant_content``, structured blocks (images, nested
+    tool results) are reduced to text representations for compression.
+    """
+    str_parts: list[str] = []
+    for block in blocks:
+        if isinstance(block, dict) and block.get("type") == "tool_result":
+            inner = block.get("content") or ""
+            if isinstance(inner, list):
+                for sub in inner:
+                    if isinstance(sub, dict):
+                        sub_type = sub.get("type")
+                        if sub_type in ("image", "document"):
+                            # Avoid serializing base64 binary data into
+                            # the compaction input — use a placeholder.
+                            str_parts.append(f"[__{sub_type}__]")
+                        elif sub_type == "text" or sub.get("text") is not None:
+                            str_parts.append(str(sub.get("text", "")))
+                        else:
+                            str_parts.append(json.dumps(sub))
+                    else:
+                        str_parts.append(str(sub))
+            else:
+                str_parts.append(str(inner))
+        elif isinstance(block, dict) and block.get("type") == "text":
+            str_parts.append(str(block.get("text", "")))
+        elif isinstance(block, dict):
+            # Preserve non-text/non-tool_result blocks (e.g. image) as placeholders.
+            # Use __prefix__ to distinguish from literal user text.
+            btype = block.get("type", "unknown")
+            str_parts.append(f"[__{btype}__]")
+        elif isinstance(block, str):
+            str_parts.append(block)
+    return "\n".join(str_parts) if str_parts else ""
+
+
+def _transcript_to_messages(content: str) -> list[dict]:
+    """Convert JSONL transcript entries to plain message dicts for compression.
+
+    Parses each line of the JSONL *content*, skips strippable metadata entries
+    (progress, file-history-snapshot, etc.), and extracts the ``role`` and
+    flattened ``content`` from the ``message`` field of each remaining entry.
+
+    Structured content blocks (``tool_use``, ``tool_result``, images) are
+    flattened to plain text via ``_flatten_assistant_content`` and
+    ``_flatten_tool_result_content`` so that ``compress_context`` can
+    perform token counting and LLM summarization on uniform strings.
+
+    Returns:
+        A list of ``{"role": str, "content": str}`` dicts suitable for
+        ``compress_context``.
+    """
+    messages: list[dict] = []
+    for line in content.strip().split("\n"):
+        if not line.strip():
+            continue
+        entry = json.loads(line, fallback=None)
+        if not isinstance(entry, dict):
+            continue
+        if entry.get("type", "") in STRIPPABLE_TYPES and not entry.get(
+            "isCompactSummary"
+        ):
+            continue
+        msg = entry.get("message") or {}
+        role = msg.get("role", "")
+        if not role:
+            continue
+        msg_dict: dict = {"role": role}
+        raw_content = msg.get("content")
+        if role == "assistant" and isinstance(raw_content, list):
+            msg_dict["content"] = _flatten_assistant_content(raw_content)
+        elif isinstance(raw_content, list):
+            msg_dict["content"] = _flatten_tool_result_content(raw_content)
+        else:
+            msg_dict["content"] = raw_content or ""
+        messages.append(msg_dict)
+    return messages
+
+
+def _messages_to_transcript(messages: list[dict]) -> str:
+    """Convert compressed message dicts back to JSONL transcript format.
+
+    Rebuilds a minimal JSONL transcript from the ``{"role", "content"}``
+    dicts returned by ``compress_context``.  Each message becomes one JSONL
+    line with a fresh ``uuid`` / ``parentUuid`` chain so the CLI's
+    ``--resume`` flag can reconstruct a valid conversation tree.
+
+    Assistant messages are wrapped in the full ``message`` envelope
+    (``id``, ``model``, ``stop_reason``, structured ``content`` blocks)
+    that the CLI expects.  User messages use the simpler ``{role, content}``
+    form.
+
+    Returns:
+        A newline-terminated JSONL string, or an empty string if *messages*
+        is empty.
+    """
+    lines: list[str] = []
+    last_uuid: str = ""  # root entry uses empty string, not null
+    for msg in messages:
+        role = msg.get("role", "user")
+        entry_type = "assistant" if role == "assistant" else "user"
+        uid = str(uuid4())
+        content = msg.get("content", "")
+        if role == "assistant":
+            message: dict = {
+                "role": "assistant",
+                "model": "",
+                "id": f"{COMPACT_MSG_ID_PREFIX}{uuid4().hex[:24]}",
+                "type": ENTRY_TYPE_MESSAGE,
+                "content": [{"type": "text", "text": content}] if content else [],
+                "stop_reason": STOP_REASON_END_TURN,
+                "stop_sequence": None,
+            }
+        else:
+            message = {"role": role, "content": content}
+        entry = {
+            "type": entry_type,
+            "uuid": uid,
+            "parentUuid": last_uuid,
+            "message": message,
+        }
+        lines.append(json.dumps(entry, separators=(",", ":")))
+        last_uuid = uid
+    return "\n".join(lines) + "\n" if lines else ""
+
+
+_COMPACTION_TIMEOUT_SECONDS = 60
+_TRUNCATION_TIMEOUT_SECONDS = 30
+
+
+async def _run_compression(
+    messages: list[dict],
+    model: str,
+    log_prefix: str,
+) -> CompressResult:
+    """Run LLM-based compression with truncation fallback.
+
+    Uses the shared OpenAI client from ``get_openai_client()``.
+    If no client is configured or the LLM call fails, falls back to
+    truncation-based compression which drops older messages without
+    summarization.
+
+    A 60-second timeout prevents a hung LLM call from blocking the
+    retry path indefinitely.  The truncation fallback also has a
+    30-second timeout to guard against slow tokenization on very large
+    transcripts.
+    """
+    client = get_openai_client()
+    if client is None:
+        logger.warning("%s No OpenAI client configured, using truncation", log_prefix)
+        return await asyncio.wait_for(
+            compress_context(messages=messages, model=model, client=None),
+            timeout=_TRUNCATION_TIMEOUT_SECONDS,
+        )
+    try:
+        return await asyncio.wait_for(
+            compress_context(messages=messages, model=model, client=client),
+            timeout=_COMPACTION_TIMEOUT_SECONDS,
+        )
+    except Exception as e:
+        logger.warning("%s LLM compaction failed, using truncation: %r", log_prefix, e)
+        return await asyncio.wait_for(
+            compress_context(messages=messages, model=model, client=None),
+            timeout=_TRUNCATION_TIMEOUT_SECONDS,
+        )
+
+
+async def compact_transcript(
+    content: str,
+    *,
+    model: str,
+    log_prefix: str = "[Transcript]",
+) -> str | None:
+    """Compact an oversized JSONL transcript using LLM summarization.
+
+    Converts transcript entries to plain messages, runs ``compress_context``
+    (the same compressor used for pre-query history), and rebuilds JSONL.
+
+    Structured content (``tool_use`` blocks, ``tool_result`` nesting, images)
+    is flattened to plain text for compression.  This matches the fidelity of
+    the Plan C (DB compression) fallback path, where
+    ``_format_conversation_context`` similarly renders tool calls as
+    ``You called tool: name(args)`` and results as ``Tool result: ...``.
+    Neither path preserves structured API content blocks — the compacted
+    context serves as text history for the LLM, which creates proper
+    structured tool calls going forward.
+
+    Images are per-turn attachments loaded from workspace storage by file ID
+    (via ``_prepare_file_attachments``), not part of the conversation history.
+    They are re-attached each turn and are unaffected by compaction.
+
+    Returns the compacted JSONL string, or ``None`` on failure.
+
+    See also:
+        ``_compress_messages`` in ``service.py`` — compresses ``ChatMessage``
+        lists for pre-query DB history.  Both share ``compress_context()``
+        but operate on different input formats (JSONL transcript entries
+        here vs. ChatMessage dicts there).
+    """
+    messages = _transcript_to_messages(content)
+    if len(messages) < 2:
+        logger.warning("%s Too few messages to compact (%d)", log_prefix, len(messages))
+        return None
+    try:
+        result = await _run_compression(messages, model, log_prefix)
+        if not result.was_compacted:
+            # Compressor says it's within budget, but the SDK rejected it.
+            # Return None so the caller falls through to DB fallback.
+            logger.warning(
+                "%s Compressor reports within budget but SDK rejected — "
+                "signalling failure",
+                log_prefix,
+            )
+            return None
+        logger.info(
+            "%s Compacted transcript: %d->%d tokens (%d summarized, %d dropped)",
+            log_prefix,
+            result.original_token_count,
+            result.token_count,
+            result.messages_summarized,
+            result.messages_dropped,
+        )
+        compacted = _messages_to_transcript(result.messages)
+        if not validate_transcript(compacted):
+            logger.warning("%s Compacted transcript failed validation", log_prefix)
+            return None
+        return compacted
+    except Exception as e:
+        logger.error(
+            "%s Transcript compaction failed: %s", log_prefix, e, exc_info=True
+        )
+        return None
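
Reviewer sketch of the timeout-then-fallback shape that `_run_compression` describes, under stated assumptions: `summarize`, `truncate`, and `compress_with_fallback` are hypothetical stand-ins, not the real `compress_context` API, and the delays are chosen only to make the two outcomes visible. The only library call relied on is `asyncio.wait_for`.

```python
import asyncio


async def summarize(messages: list[dict], *, delay: float) -> dict:
    """Hypothetical stand-in for a (possibly slow) LLM summarization call."""
    await asyncio.sleep(delay)
    return {"mode": "summary", "messages": messages[-1:]}


async def truncate(messages: list[dict], *, keep: int) -> dict:
    """Hypothetical stand-in for truncation: keep only the newest messages."""
    return {"mode": "truncation", "messages": messages[-keep:]}


async def compress_with_fallback(
    messages: list[dict], *, llm_delay: float, llm_timeout: float, keep: int = 2
) -> dict:
    try:
        return await asyncio.wait_for(
            summarize(messages, delay=llm_delay), timeout=llm_timeout
        )
    except Exception:
        # The timeout error raised by wait_for is an Exception subclass,
        # so a hung call takes the same fallback path as a failed one.
        return await truncate(messages, keep=keep)


msgs = [{"role": "user", "content": str(i)} for i in range(5)]
fast_mode = asyncio.run(
    compress_with_fallback(msgs, llm_delay=0, llm_timeout=1)
)["mode"]
slow_mode = asyncio.run(
    compress_with_fallback(msgs, llm_delay=1, llm_timeout=0.01)
)["mode"]
```

One design point this makes visible: the exception raised on timeout usually carries an empty message, so a log line that formats it with `%s` will show nothing after the colon.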
--- a/autogpt_platform/backend/backend/copilot/sdk/transcript_builder.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/transcript_builder.py
@@ -68,7 +68,7 @@ class TranscriptBuilder:
            type=entry_type,
            uuid=data.get("uuid") or str(uuid4()),
            parentUuid=data.get("parentUuid"),
-            isCompactSummary=data.get("isCompactSummary") or None,
+            isCompactSummary=data.get("isCompactSummary"),
            message=data.get("message", {}),
        )

--- a/autogpt_platform/backend/backend/copilot/sdk/transcript_test.py
+++ b/autogpt_platform/backend/backend/copilot/sdk/transcript_test.py
@@ -1,7 +1,8 @@
 """Unit tests for JSONL transcript management utilities."""

+import asyncio
 import os
-from unittest.mock import AsyncMock, patch
+from unittest.mock import AsyncMock, MagicMock, patch

 import pytest

@@ -301,7 +302,7 @@ class TestDeleteTranscript:
        mock_storage.delete = AsyncMock()

        with patch(
-            "backend.util.workspace_storage.get_workspace_storage",
+            "backend.copilot.sdk.transcript.get_workspace_storage",
            new_callable=AsyncMock,
            return_value=mock_storage,
        ):
@@ -321,7 +322,7 @@ class TestDeleteTranscript:
        )

        with patch(
-            "backend.util.workspace_storage.get_workspace_storage",
+            "backend.copilot.sdk.transcript.get_workspace_storage",
            new_callable=AsyncMock,
            return_value=mock_storage,
        ):
@@ -339,7 +340,7 @@ class TestDeleteTranscript:
        )

        with patch(
-            "backend.util.workspace_storage.get_workspace_storage",
+            "backend.copilot.sdk.transcript.get_workspace_storage",
            new_callable=AsyncMock,
            return_value=mock_storage,
        ):
@@ -818,6 +819,183 @@ class TestCompactionFlowIntegration:
        assert lines2[-1]["parentUuid"] == "a2"


+# ---------------------------------------------------------------------------
+# _run_compression (direct tests for each code path)
+# ---------------------------------------------------------------------------
+
+
+class TestRunCompression:
+    """Direct tests for ``_run_compression`` covering each code path.
+
+    Paths:
+    (a) No OpenAI client configured → truncation fallback immediately.
+    (b) LLM success → returns LLM-compressed result.
+    (c) LLM call raises, or (d) exceeds its timeout → truncation fallback.
+    """
+
+    def _make_compress_result(self, was_compacted: bool, msgs=None):
+        """Build a minimal CompressResult-like object."""
+        from types import SimpleNamespace
+
+        return SimpleNamespace(
+            was_compacted=was_compacted,
+            messages=msgs or [{"role": "user", "content": "summary"}],
+            original_token_count=500,
+            token_count=100 if was_compacted else 500,
+            messages_summarized=2 if was_compacted else 0,
+            messages_dropped=0,
+        )
+
+    @pytest.mark.asyncio
+    async def test_no_client_uses_truncation(self):
+        """Path (a): ``get_openai_client()`` returns None → truncation only."""
+        from .transcript import _run_compression
+
+        truncation_result = self._make_compress_result(
+            True, [{"role": "user", "content": "truncated"}]
+        )
+
+        with (
+            patch(
+                "backend.copilot.sdk.transcript.get_openai_client",
+                return_value=None,
+            ),
+            patch(
+                "backend.copilot.sdk.transcript.compress_context",
+                new_callable=AsyncMock,
+                return_value=truncation_result,
+            ) as mock_compress,
+        ):
+            result = await _run_compression(
+                [{"role": "user", "content": "hello"}],
+                model="test-model",
+                log_prefix="[test]",
+            )
+
+        # compress_context is invoked with keyword arguments only, so the
+        # client passed to it can be read directly from call_args.kwargs.
+        # client=None selects truncation mode.
+        mock_compress.assert_awaited_once()
+        call_kwargs = mock_compress.call_args.kwargs
+        assert "client" in call_kwargs
+        assert call_kwargs["client"] is None
+        assert result is truncation_result
+
+    @pytest.mark.asyncio
+    async def test_llm_success_returns_llm_result(self):
+        """Path (b): ``get_openai_client()`` returns a client → LLM compresses."""
+        from .transcript import _run_compression
+
+        llm_result = self._make_compress_result(
+            True, [{"role": "user", "content": "LLM summary"}]
+        )
+        mock_client = MagicMock()
+
+        with (
+            patch(
+                "backend.copilot.sdk.transcript.get_openai_client",
+                return_value=mock_client,
+            ),
+            patch(
+                "backend.copilot.sdk.transcript.compress_context",
+                new_callable=AsyncMock,
+                return_value=llm_result,
+            ) as mock_compress,
+        ):
+            result = await _run_compression(
+                [{"role": "user", "content": "long conversation"}],
+                model="test-model",
+                log_prefix="[test]",
+            )
+
+        # compress_context called with the real client
+        assert mock_compress.call_args.kwargs["client"] is mock_client
+        assert result is llm_result
+
+    @pytest.mark.asyncio
+    async def test_llm_failure_falls_back_to_truncation(self):
+        """Path (c): LLM call raises → truncation fallback used instead."""
+        from .transcript import _run_compression
+
+        truncation_result = self._make_compress_result(
+            True, [{"role": "user", "content": "truncated fallback"}]
+        )
+        mock_client = MagicMock()
+        call_count = [0]
+
+        async def _compress_side_effect(**kwargs):
+            call_count[0] += 1
+            if kwargs.get("client") is not None:
+                raise RuntimeError("LLM timeout")
+            return truncation_result
+
+        with (
+            patch(
+                "backend.copilot.sdk.transcript.get_openai_client",
+                return_value=mock_client,
+            ),
+            patch(
+                "backend.copilot.sdk.transcript.compress_context",
+                side_effect=_compress_side_effect,
+            ),
+        ):
+            result = await _run_compression(
+                [{"role": "user", "content": "long conversation"}],
+                model="test-model",
+                log_prefix="[test]",
+            )
+
+        # compress_context called twice: once for LLM (raises), once for truncation
+        assert call_count[0] == 2
+        assert result is truncation_result
+
+    @pytest.mark.asyncio
+    async def test_llm_timeout_falls_back_to_truncation(self):
+        """Path (d): LLM call exceeds timeout → truncation fallback used."""
+        from .transcript import _run_compression
+
+        truncation_result = self._make_compress_result(
+            True, [{"role": "user", "content": "truncated after timeout"}]
+        )
+        call_count = [0]
+
+        async def _compress_side_effect(*, messages, model, client):
+            call_count[0] += 1
+            if client is not None:
+                # Simulate a hang that exceeds the timeout
+                await asyncio.sleep(9999)
+            return truncation_result
+
+        fake_client = MagicMock()
+        with (
+            patch(
+                "backend.copilot.sdk.transcript.get_openai_client",
+                return_value=fake_client,
+            ),
+            patch(
+                "backend.copilot.sdk.transcript.compress_context",
+                side_effect=_compress_side_effect,
+            ),
+            patch(
+                "backend.copilot.sdk.transcript._COMPACTION_TIMEOUT_SECONDS",
+                0.05,
+            ),
+            patch(
+                "backend.copilot.sdk.transcript._TRUNCATION_TIMEOUT_SECONDS",
+                5,
+            ),
+        ):
+            result = await _run_compression(
+                [{"role": "user", "content": "long conversation"}],
+                model="test-model",
+                log_prefix="[test]",
+            )
+
+        # compress_context called twice: once for LLM (times out), once truncation
+        assert call_count[0] == 2
+        assert result is truncation_result
+
+
 # ---------------------------------------------------------------------------
 # cleanup_stale_project_dirs
 # ---------------------------------------------------------------------------
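
Reviewer note on asserting mock keyword arguments, as a minimal sketch using only `unittest.mock`. The mock below is a hypothetical stand-in for a patched `compress_context`; it is called without any `client` keyword to show that a `.get("client") is None` check cannot tell "passed as None" apart from "never passed".

```python
from unittest.mock import MagicMock

mock_compress = MagicMock()
# Deliberately omit the client keyword altogether.
mock_compress(messages=[], model="test-model")

kwargs = mock_compress.call_args.kwargs

# Loose check: dict.get returns None for a missing key, so this passes
# even though client was never supplied.
loose_check = kwargs.get("client") is None

# Strict check: requires the keyword to be present and explicitly None.
strict_check = "client" in kwargs and kwargs["client"] is None
```

When a test is meant to prove that truncation mode was requested, the strict form (or `assert_called_once_with(...)` with the full keyword set) is the one that fails if the call site stops passing `client`.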
--- a/autogpt_platform/backend/backend/copilot/tools/__init__.py
+++ b/autogpt_platform/backend/backend/copilot/tools/__init__.py
@@ -12,6 +12,7 @@ from .agent_browser import BrowserActTool, BrowserNavigateTool, BrowserScreensho
 from .agent_output import AgentOutputTool
 from .base import BaseTool
 from .bash_exec import BashExecTool
+from .connect_integration import ConnectIntegrationTool
 from .continue_run_block import ContinueRunBlockTool
 from .create_agent import CreateAgentTool
 from .customize_agent import CustomizeAgentTool
@@ -84,6 +85,7 @@ TOOL_REGISTRY: dict[str, BaseTool] = {
    "browser_screenshot": BrowserScreenshotTool(),
    # Sandboxed code execution (bubblewrap)
    "bash_exec": BashExecTool(),
+    "connect_integration": ConnectIntegrationTool(),
    # Persistent workspace tools (cloud storage, survives across sessions)
    # Feature request tools
    "search_feature_requests": SearchFeatureRequestsTool(),
--- a/autogpt_platform/backend/backend/copilot/tools/add_understanding.py
+++ b/autogpt_platform/backend/backend/copilot/tools/add_understanding.py
@@ -22,13 +22,12 @@ class AddUnderstandingTool(BaseTool):

    @property
    def description(self) -> str:
-        return """Capture and store information about the user's business context,
-workflows, pain points, and automation goals. Call this tool whenever the user
-shares information about their business. Each call incrementally adds to the
-existing understanding - you don't need to provide all fields at once.
-
-Use this to build a comprehensive profile that helps recommend better agents
-and automations for the user's specific needs."""
+        return (
+            "Store user's business context, workflows, pain points, and automation goals. "
+            "Call whenever the user shares business info. Each call incrementally merges "
+            "with existing data — provide only the fields you have. "
+            "Builds a profile that helps recommend better agents for the user's needs."
+        )

    @property
    def parameters(self) -> dict[str, Any]:
--- a/autogpt_platform/backend/backend/copilot/tools/agent_browser.py
+++ b/autogpt_platform/backend/backend/copilot/tools/agent_browser.py
@@ -20,7 +20,9 @@ SSRF protection:

 Requires:
  npm install -g agent-browser
-  agent-browser install   (downloads Chromium, one-time per machine)
+  agent-browser install   (downloads Chromium, one-time — skipped in Docker
+                           where system chromium is pre-installed and
+                           AGENT_BROWSER_EXECUTABLE_PATH is set)
 """

 import asyncio
@@ -408,18 +410,11 @@ class BrowserNavigateTool(BaseTool):
    @property
    def description(self) -> str:
        return (
-            "Navigate to a URL using a real browser. Returns an accessibility "
-            "tree snapshot listing the page's interactive elements with @ref IDs "
-            "(e.g. @e3) that can be used with browser_act. "
-            "Session persists — cookies and login state carry over between calls. "
-            "Use this (with browser_act) for multi-step interaction: login flows, "
-            "form filling, button clicks, or anything requiring page interaction. "
-            "For plain static pages, prefer web_fetch — no browser overhead. "
-            "For authenticated pages: navigate to the login page first, use browser_act "
-            "to fill credentials and submit, then navigate to the target page. "
-            "Note: for slow SPAs, the returned snapshot may reflect a partially-loaded "
-            "state. If elements seem missing, use browser_act with action='wait' and a "
-            "CSS selector or millisecond delay, then take a browser_screenshot to verify."
+            "Navigate to a URL in a real browser. Returns accessibility tree with @ref IDs "
+            "for browser_act. Session persists (cookies/auth carry over). "
+            "For static pages, prefer web_fetch. "
+            "For SPAs, elements may load late — use browser_act with wait + browser_screenshot to verify. "
+            "For auth: navigate to login, fill creds and submit with browser_act, then navigate to target."
        )

    @property
@@ -429,13 +424,13 @@ class BrowserNavigateTool(BaseTool):
            "properties": {
                "url": {
                    "type": "string",
-                    "description": "The HTTP/HTTPS URL to navigate to.",
+                    "description": "HTTP/HTTPS URL to navigate to.",
                },
                "wait_for": {
                    "type": "string",
                    "enum": ["networkidle", "load", "domcontentloaded"],
                    "default": "networkidle",
-                    "description": "When to consider navigation complete. Use 'networkidle' for SPAs (default).",
+                    "description": "Navigation completion strategy (default: networkidle).",
                },
            },
            "required": ["url"],
@@ -554,14 +549,12 @@ class BrowserActTool(BaseTool):
    @property
    def description(self) -> str:
        return (
-            "Interact with the current browser page. Use @ref IDs from the "
-            "snapshot (e.g. '@e3') to target elements. Returns an updated snapshot. "
-            "Supported actions: click, dblclick, fill, type, scroll, hover, press, "
+            "Interact with the current browser page using @ref IDs from the snapshot. "
+            "Actions: click, dblclick, fill, type, scroll, hover, press, "
            "check, uncheck, select, wait, back, forward, reload. "
-            "fill clears the field before typing; type appends without clearing. "
-            "wait accepts a CSS selector (waits for element) or milliseconds string (e.g. '1000'). "
-            "Example login flow: fill @e1 with email → fill @e2 with password → "
-            "click @e3 (submit) → browser_navigate to the target page."
+            "fill clears field first; type appends. "
+            "wait accepts CSS selector or milliseconds (e.g. '1000'). "
+            "Returns updated snapshot."
        )

    @property
@@ -587,30 +580,21 @@ class BrowserActTool(BaseTool):
                        "forward",
                        "reload",
                    ],
-                    "description": "The action to perform.",
+                    "description": "Action to perform.",
                },
                "target": {
                    "type": "string",
-                    "description": (
-                        "Element to target. Use @ref from snapshot (e.g. '@e3'), "
-                        "a CSS selector, or a text description. "
-                        "Required for: click, dblclick, fill, type, hover, check, uncheck, select. "
-                        "For wait: a CSS selector to wait for, or milliseconds as a string (e.g. '1000')."
-                    ),
+                    "description": "@ref ID (e.g. '@e3'), CSS selector, or text. Required for: click, dblclick, fill, type, hover, check, uncheck, select. For wait: CSS selector or milliseconds string (e.g. '1000').",
                },
                "value": {
                    "type": "string",
-                    "description": (
-                        "For fill/type: the text to enter. "
-                        "For press: key name (e.g. 'Enter', 'Tab', 'Control+a'). "
-                        "For select: the option value to select."
-                    ),
+                    "description": "Text for fill/type, key for press (e.g. 'Enter'), option for select.",
                },
                "direction": {
                    "type": "string",
                    "enum": ["up", "down", "left", "right"],
                    "default": "down",
-                    "description": "For scroll: direction to scroll.",
+                    "description": "Scroll direction (default: down).",
                },
            },
            "required": ["action"],
@@ -757,12 +741,10 @@ class BrowserScreenshotTool(BaseTool):
    @property
    def description(self) -> str:
        return (
-            "Take a screenshot of the current browser page and save it to the workspace. "
-            "IMPORTANT: After calling this tool, immediately call read_workspace_file "
-            "with the returned file_id to display the image inline to the user — "
-            "the screenshot is not visible until you do this. "
-            "With annotate=true (default), @ref labels are overlaid on interactive "
-            "elements, making it easy to see which @ref ID maps to which element on screen."
+            "Screenshot the current browser page and save to workspace. "
+            "annotate=true overlays @ref labels on elements. "
+            "IMPORTANT: After calling, you MUST immediately call read_workspace_file with the "
+            "returned file_id to display the image inline."
        )

    @property
@@ -773,12 +755,12 @@ class BrowserScreenshotTool(BaseTool):
                "annotate": {
                    "type": "boolean",
                    "default": True,
-                    "description": "Overlay @ref labels on interactive elements (default: true).",
+                    "description": "Overlay @ref labels (default: true).",
                },
                "filename": {
                    "type": "string",
                    "default": "screenshot.png",
-                    "description": "Filename to save in the workspace.",
+                    "description": "Workspace filename (default: screenshot.png).",
                },
            },
        }
--- a/autogpt_platform/backend/backend/copilot/tools/agent_output.py
+++ b/autogpt_platform/backend/backend/copilot/tools/agent_output.py
@@ -108,22 +108,12 @@ class AgentOutputTool(BaseTool):

    @property
    def description(self) -> str:
-        return """Retrieve execution outputs from agents in the user's library.
-
-        Identify the agent using one of:
-        - agent_name: Fuzzy search in user's library
-        - library_agent_id: Exact library agent ID
-        - store_slug: Marketplace format 'username/agent-name'
-
-        Select which run to retrieve using:
-        - execution_id: Specific execution ID
-        - run_time: 'latest' (default), 'yesterday', 'last week', or ISO date 'YYYY-MM-DD'
-
-        Wait for completion (optional):
-        - wait_if_running: Max seconds to wait if execution is still running (0-300).
-          If the execution is running/queued, waits up to this many seconds for completion.
-          Returns current status on timeout. If already finished, returns immediately.
-        """
+        return (
+            "Retrieve execution outputs from a library agent. "
+            "Identify by agent_name, library_agent_id, or store_slug. "
+            "Filter by execution_id or run_time. "
+            "Optionally wait for running executions."
+        )

    @property
    def parameters(self) -> dict[str, Any]:
@@ -132,32 +122,29 @@ class AgentOutputTool(BaseTool):
            "properties": {
                "agent_name": {
                    "type": "string",
-                    "description": "Agent name to search for in user's library (fuzzy match)",
+                    "description": "Agent name (fuzzy match).",
                },
                "library_agent_id": {
                    "type": "string",
-                    "description": "Exact library agent ID",
+                    "description": "Library agent ID.",
                },
                "store_slug": {
                    "type": "string",
-                    "description": "Marketplace identifier: 'username/agent-slug'",
+                    "description": "Marketplace 'username/agent-name'.",
                },
                "execution_id": {
                    "type": "string",
-                    "description": "Specific execution ID to retrieve",
+                    "description": "Specific execution ID.",
                },
                "run_time": {
                    "type": "string",
-                    "description": (
-                        "Time filter: 'latest', 'yesterday', 'last week', or 'YYYY-MM-DD'"
-                    ),
+                    "description": "Time filter: 'latest', 'today', 'yesterday', 'last week', 'last 7 days', 'last month', 'last 30 days', 'YYYY-MM-DD', or ISO datetime.",
                },
                "wait_if_running": {
                    "type": "integer",
-                    "description": (
-                        "Max seconds to wait if execution is still running (0-300). "
-                        "If running, waits for completion. Returns current state on timeout."
-                    ),
+                    "description": "Max seconds to wait if still running (0-300). Returns current state on timeout.",
+                    "minimum": 0,
+                    "maximum": 300,
                },
            },
            "required": [],
--- a/autogpt_platform/backend/backend/copilot/tools/base.py
+++ b/autogpt_platform/backend/backend/copilot/tools/base.py
@@ -164,8 +164,9 @@ class BaseTool:

        """
        if self.requires_auth and not user_id:
-            logger.error(
-                f"Attempted tool call for {self.name} but user not authenticated"
+            logger.warning(
+                "Attempted tool call for %s but user not authenticated",
+                self.name,
            )
            return StreamToolOutputAvailable(
                toolCallId=tool_call_id,
@@ -196,7 +197,7 @@ class BaseTool:
                output=raw_output,
            )
        except Exception as e:
-            logger.error(f"Error in {self.name}: {e}", exc_info=True)
+            logger.warning("Error in %s", self.name, exc_info=True)
            return StreamToolOutputAvailable(
                toolCallId=tool_call_id,
                toolName=self.name,
--- a/autogpt_platform/backend/backend/copilot/tools/bash_exec.py
+++ b/autogpt_platform/backend/backend/copilot/tools/bash_exec.py
@@ -22,6 +22,7 @@ from e2b import AsyncSandbox
 from e2b.exceptions import TimeoutException

 from backend.copilot.context import E2B_WORKDIR, get_current_sandbox
+from backend.copilot.integration_creds import get_integration_env_vars
 from backend.copilot.model import ChatSession

 from .base import BaseTool
@@ -41,15 +42,9 @@ class BashExecTool(BaseTool):
    @property
    def description(self) -> str:
        return (
-            "Execute a Bash command or script. "
-            "Full Bash scripting is supported (loops, conditionals, pipes, "
-            "functions, etc.). "
-            "The working directory is shared with the SDK Read/Write/Edit/Glob/Grep "
-            "tools — files created by either are immediately visible to both. "
-            "Execution is killed after the timeout (default 30s, max 120s). "
-            "Returns stdout and stderr. "
-            "Useful for file manipulation, data processing, running scripts, "
-            "and installing packages."
+            "Execute a Bash command or script. Shares filesystem with SDK file tools. "
+            "Useful for scripts, data processing, and package installation. "
+            "Killed after timeout (default 30s, max 120s)."
        )

    @property
@@ -59,13 +54,11 @@ class BashExecTool(BaseTool):
            "properties": {
                "command": {
                    "type": "string",
-                    "description": "Bash command or script to execute.",
+                    "description": "Bash command or script.",
                },
                "timeout": {
                    "type": "integer",
-                    "description": (
-                        "Max execution time in seconds (default 30, max 120)."
-                    ),
+                    "description": "Max seconds (default 30, max 120).",
                    "default": 30,
                },
            },
@@ -74,7 +67,10 @@ class BashExecTool(BaseTool):

    @property
    def requires_auth(self) -> bool:
-        return False
+        # True because _execute_on_e2b injects user tokens (GH_TOKEN etc.)
+        # when user_id is present.  Defense-in-depth: ensures only authenticated
+        # users reach the token injection path.
+        return True

    async def _execute(
        self,
@@ -82,6 +78,14 @@ class BashExecTool(BaseTool):
        session: ChatSession,
        **kwargs: Any,
    ) -> ToolResponseBase:
+        """Run a bash command on E2B (if available) or in a bubblewrap sandbox.
+
+        Dispatches to :meth:`_execute_on_e2b` when a sandbox is present in the
+        current execution context, otherwise falls back to the local bubblewrap
+        sandbox.  Returns a :class:`BashExecResponse` on success or an
+        :class:`ErrorResponse` when the sandbox is unavailable or the command
+        is empty.
+        """
        session_id = session.session_id if session else None

        command: str = (kwargs.get("command") or "").strip()
@@ -96,7 +100,9 @@ class BashExecTool(BaseTool):

        sandbox = get_current_sandbox()
        if sandbox is not None:
-            return await self._execute_on_e2b(sandbox, command, timeout, session_id)
+            return await self._execute_on_e2b(
+                sandbox, command, timeout, session_id, user_id
+            )

        # Bubblewrap fallback: local isolated execution.
        if not has_full_sandbox():
@@ -133,19 +139,42 @@ class BashExecTool(BaseTool):
        command: str,
        timeout: int,
        session_id: str | None,
+        user_id: str | None = None,
    ) -> ToolResponseBase:
-        """Execute *command* on the E2B sandbox via commands.run()."""
+        """Execute *command* on the E2B sandbox via commands.run().
+
+        Integration tokens (e.g. GH_TOKEN) are injected into the sandbox env
+        for any user with connected accounts. E2B has full internet access, so
+        CLI tools like ``gh`` work without manual authentication.
+        """
+        envs: dict[str, str] = {
+            "PATH": "/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin",
+        }
+        # Collect injected secret values so we can scrub them from output.
+        secret_values: list[str] = []
+        if user_id is not None:
+            integration_env = await get_integration_env_vars(user_id)
+            secret_values = [v for v in integration_env.values() if v]
+            envs.update(integration_env)
+
        try:
            result = await sandbox.commands.run(
                f"bash -c {shlex.quote(command)}",
                cwd=E2B_WORKDIR,
                timeout=timeout,
-                envs={"PATH": "/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin"},
+                envs=envs,
            )
+            stdout = result.stdout or ""
+            stderr = result.stderr or ""
+            # Scrub injected tokens from command output so verbatim leaks via
+            # `echo $GH_TOKEN`, `env`, `printenv`, etc. are redacted.
+            for secret in secret_values:
+                stdout = stdout.replace(secret, "[REDACTED]")
+                stderr = stderr.replace(secret, "[REDACTED]")
            return BashExecResponse(
                message=f"Command executed on E2B (exit {result.exit_code})",
-                stdout=result.stdout or "",
-                stderr=result.stderr or "",
+                stdout=stdout,
+                stderr=stderr,
                exit_code=result.exit_code,
                timed_out=False,
                session_id=session_id,
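
Reviewer sketch (not part of this change): the scrubbing step in `_execute_on_e2b` can be read as a small pure function. The helper name `redact_secrets` and its `placeholder` parameter are hypothetical. The longest-first ordering is an assumption added here for the case where one secret contains another; the diff above iterates in dictionary order. Like the original, it only catches verbatim occurrences, so encoded output (for example base64) is not covered.

```python
def redact_secrets(
    text: str, secrets: list[str], placeholder: str = "[REDACTED]"
) -> str:
    """Replace every verbatim occurrence of each non-empty secret in *text*."""
    # Deduplicate (GH_TOKEN and GITHUB_TOKEN may share one value) and drop
    # empty strings, since str.replace("", ...) would match everywhere.
    unique = {s for s in secrets if s}
    # Longest first, so a secret that contains a shorter one is replaced whole
    # instead of leaving a partial remainder behind.
    for secret in sorted(unique, key=len, reverse=True):
        text = text.replace(secret, placeholder)
    return text


if __name__ == "__main__":
    print(redact_secrets("token=gh-secret", ["gh-secret", "gh-secret", ""]))
```

Keeping the redaction in one function would let stdout and stderr share the same code path and makes the behaviour easy to unit-test without a sandbox mock.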
--- a/autogpt_platform/backend/backend/copilot/tools/bash_exec_test.py
+++ b/autogpt_platform/backend/backend/copilot/tools/bash_exec_test.py
@@ -0,0 +1,78 @@
+"""Tests for BashExecTool — E2B path with token injection."""
+
+from unittest.mock import AsyncMock, MagicMock, patch
+
+import pytest
+
+from ._test_data import make_session
+from .bash_exec import BashExecTool
+from .models import BashExecResponse
+
+_USER = "user-bash-exec-test"
+
+
+def _make_tool() -> BashExecTool:
+    return BashExecTool()
+
+
+def _make_sandbox(exit_code: int = 0, stdout: str = "", stderr: str = "") -> MagicMock:
+    result = MagicMock()
+    result.exit_code = exit_code
+    result.stdout = stdout
+    result.stderr = stderr
+
+    sandbox = MagicMock()
+    sandbox.commands.run = AsyncMock(return_value=result)
+    return sandbox
+
+
+class TestBashExecE2BTokenInjection:
+    @pytest.mark.asyncio(loop_scope="session")
+    async def test_token_injected_when_user_id_set(self):
+        """When user_id is provided, integration env vars are merged into sandbox envs."""
+        tool = _make_tool()
+        session = make_session(user_id=_USER)
+        sandbox = _make_sandbox(stdout="ok")
+        env_vars = {"GH_TOKEN": "gh-secret", "GITHUB_TOKEN": "gh-secret"}
+
+        with patch(
+            "backend.copilot.tools.bash_exec.get_integration_env_vars",
+            new=AsyncMock(return_value=env_vars),
+        ) as mock_get_env:
+            result = await tool._execute_on_e2b(
+                sandbox=sandbox,
+                command="echo hi",
+                timeout=10,
+                session_id=session.session_id,
+                user_id=_USER,
+            )
+
+        mock_get_env.assert_awaited_once_with(_USER)
+        call_kwargs = sandbox.commands.run.call_args[1]
+        assert call_kwargs["envs"]["GH_TOKEN"] == "gh-secret"
+        assert call_kwargs["envs"]["GITHUB_TOKEN"] == "gh-secret"
+        assert isinstance(result, BashExecResponse)
+
+    @pytest.mark.asyncio(loop_scope="session")
+    async def test_no_token_injection_when_user_id_is_none(self):
+        """When user_id is None, get_integration_env_vars must NOT be called."""
+        tool = _make_tool()
+        session = make_session(user_id=_USER)
+        sandbox = _make_sandbox(stdout="ok")
+
+        with patch(
+            "backend.copilot.tools.bash_exec.get_integration_env_vars",
+            new=AsyncMock(return_value={"GH_TOKEN": "should-not-appear"}),
+        ) as mock_get_env:
+            result = await tool._execute_on_e2b(
+                sandbox=sandbox,
+                command="echo hi",
+                timeout=10,
+                session_id=session.session_id,
+                user_id=None,
+            )
+
+        mock_get_env.assert_not_called()
+        call_kwargs = sandbox.commands.run.call_args[1]
+        assert "GH_TOKEN" not in call_kwargs["envs"]
+        assert isinstance(result, BashExecResponse)
--- a/autogpt_platform/backend/backend/copilot/tools/connect_integration.py
+++ b/autogpt_platform/backend/backend/copilot/tools/connect_integration.py
@@ -0,0 +1,196 @@
+"""Tool for prompting the user to connect a required integration.
+
+When the copilot encounters an authentication failure (e.g. `gh` CLI returns
+"authentication required"), it calls this tool to surface the credentials
+setup card in the chat — the same UI that appears when a GitHub block runs
+without configured credentials.
+"""
+
+from typing import Any, TypedDict
+
+from backend.copilot.model import ChatSession
+from backend.copilot.providers import SUPPORTED_PROVIDERS, get_provider_auth_types
+from backend.copilot.tools.models import (
+    ErrorResponse,
+    ResponseType,
+    SetupInfo,
+    SetupRequirementsResponse,
+    ToolResponseBase,
+    UserReadiness,
+)
+
+from .base import BaseTool
+
+
+class _CredentialEntry(TypedDict):
+    """Shape of each entry inside SetupRequirementsResponse.user_readiness.missing_credentials.
+
+    Partially overlaps with :class:`~backend.data.model.CredentialsMetaInput`
+    (``id``, ``title``, ``provider``) but carries extra UI-facing fields
+    (``types``, ``scopes``) that the frontend ``SetupRequirementsCard`` needs
+    to render the inline credential setup card.
+
+    Display name is derived from :data:`SUPPORTED_PROVIDERS` at build time
+    rather than stored here, which eliminates the old ``provider_name`` field.
+    ``types`` replaces the old singular ``type`` field; the frontend already
+    prefers ``types`` and only fell back to ``type`` for compatibility.
+    """
+
+    id: str
+    title: str
+    # Slug used as the credential key (e.g. "github").
+    provider: str
+    # All supported credential types the user can choose from (e.g. ["api_key", "oauth2"]).
+    # The first element is the default/primary type.
+    types: list[str]
+    scopes: list[str]
+
+
+class ConnectIntegrationTool(BaseTool):
+    """Surface the credentials setup UI when an integration is not connected."""
+
+    @property
+    def name(self) -> str:
+        return "connect_integration"
+
+    @property
+    def description(self) -> str:
+        return (
+            "Prompt the user to connect a required integration (e.g. GitHub). "
+            "Call this when an external CLI or API call fails because the user "
+            "has not connected the relevant account. "
+            "The tool surfaces a credentials setup card in the chat so the user "
+            "can authenticate without leaving the page. "
+            "After the user connects the account, retry the operation. "
+            "In E2B/cloud sandbox mode the token (GH_TOKEN/GITHUB_TOKEN) is "
+            "automatically injected per-command in bash_exec — no manual export needed. "
+            "In local bubblewrap mode network is isolated so GitHub CLI commands "
+            "will still fail after connecting; inform the user of this limitation."
+        )
+
+    @property
+    def parameters(self) -> dict[str, Any]:
+        return {
+            "type": "object",
+            "properties": {
+                "provider": {
+                    "type": "string",
+                    "description": (
+                        "Integration provider slug, e.g. 'github'. "
+                        "Must be one of the supported providers."
+                    ),
+                    "enum": list(SUPPORTED_PROVIDERS.keys()),
+                },
+                "reason": {
+                    "type": "string",
+                    "description": (
+                        "Brief explanation of why the integration is needed, "
+                        "shown to the user in the setup card."
+                    ),
+                    "maxLength": 500,
+                },
+                "scopes": {
+                    "type": "array",
+                    "items": {"type": "string"},
+                    "description": (
+                        "OAuth scopes to request. Omit to use the provider default. "
+                        "Add extra scopes when you need more access — e.g. for GitHub: "
+                        "'repo' (clone/push/pull), 'read:org' (org membership), "
+                        "'workflow' (GitHub Actions). "
+                        "Requesting only the scopes you actually need is best practice."
+                    ),
+                },
+            },
+            "required": ["provider"],
+        }
+
+    @property
+    def requires_auth(self) -> bool:
+        # Require auth so only authenticated users can trigger the setup card.
+        # The card itself is user-agnostic (no per-user data needed), so
+        # user_id is intentionally unused in _execute.
+        return True
+
+    async def _execute(
+        self,
+        user_id: str | None,
+        session: ChatSession,
+        **kwargs: Any,
+    ) -> ToolResponseBase:
+        """Build and return a :class:`SetupRequirementsResponse` for the requested provider.
+
+        Validates the *provider* slug against the known registry, merges any
+        agent-requested OAuth *scopes* with the provider defaults, and constructs
+        the credential setup card payload that the frontend renders as an inline
+        authentication prompt.
+
+        Returns an :class:`ErrorResponse` if *provider* is unknown.
+        """
+        _ = user_id  # setup card is user-agnostic; auth is enforced via requires_auth
+        session_id = session.session_id if session else None
+        provider: str = (kwargs.get("provider") or "").strip().lower()
+        reason: str = (kwargs.get("reason") or "").strip()[
+            :500
+        ]  # cap LLM-controlled text
+        extra_scopes: list[str] = [
+            str(s).strip() for s in (kwargs.get("scopes") or []) if str(s).strip()
+        ]
+
+        entry = SUPPORTED_PROVIDERS.get(provider)
+        if not entry:
+            supported = ", ".join(f"'{p}'" for p in SUPPORTED_PROVIDERS)
+            return ErrorResponse(
+                message=(
+                    f"Unknown provider '{provider}'. "
+                    f"Supported providers: {supported}."
+                ),
+                error="unknown_provider",
+                session_id=session_id,
+            )
+
+        display_name: str = entry["name"]
+        supported_types: list[str] = get_provider_auth_types(provider)
+        # Merge agent-requested scopes with provider defaults (deduplicated, order preserved).
+        default_scopes: list[str] = entry["default_scopes"]
+        seen: set[str] = set()
+        scopes: list[str] = []
+        for s in default_scopes + extra_scopes:
+            if s not in seen:
+                seen.add(s)
+                scopes.append(s)
+        field_key = f"{provider}_credentials"
+
+        message_parts = [
+            f"To continue, please connect your {display_name} account.",
+        ]
+        if reason:
+            message_parts.append(reason)
+
+        credential_entry: _CredentialEntry = {
+            "id": field_key,
+            "title": f"{display_name} Credentials",
+            "provider": provider,
+            "types": supported_types,
+            "scopes": scopes,
+        }
+        missing_credentials: dict[str, _CredentialEntry] = {field_key: credential_entry}
+
+        return SetupRequirementsResponse(
+            type=ResponseType.SETUP_REQUIREMENTS,
+            message=" ".join(message_parts),
+            session_id=session_id,
+            setup_info=SetupInfo(
+                agent_id=f"connect_{provider}",
+                agent_name=display_name,
+                user_readiness=UserReadiness(
+                    has_all_credentials=False,
+                    missing_credentials=missing_credentials,
+                    ready_to_run=False,
+                ),
+                requirements={
+                    "credentials": [missing_credentials[field_key]],
+                    "inputs": [],
+                    "execution_modes": [],
+                },
+            ),
+        )
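
Reviewer sketch (not part of this change): the scope merge in `ConnectIntegrationTool._execute` is an order-preserving deduplication of the provider defaults followed by the agent-requested scopes. A minimal equivalent, assuming a hypothetical helper name `merge_scopes`, relies on `dict` preserving insertion order. The scope values in the usage line are illustrative only and are not the provider's actual defaults.

```python
def merge_scopes(default_scopes: list[str], extra_scopes: list[str]) -> list[str]:
    """Return defaults followed by extras, dropping repeats, order preserved."""
    # dict.fromkeys keeps the first occurrence of each key in insertion order,
    # which matches the explicit seen-set loop used in the diff above.
    return list(dict.fromkeys(default_scopes + extra_scopes))


if __name__ == "__main__":
    # Illustrative values only.
    print(merge_scopes(["repo"], ["read:org", "repo", "workflow"]))
```

The explicit loop in the diff and this one-liner should behave the same for string scopes; which one to keep is a readability choice.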
--- a/autogpt_platform/backend/backend/copilot/tools/connect_integration_test.py
+++ b/autogpt_platform/backend/backend/copilot/tools/connect_integration_test.py
@@ -0,0 +1,135 @@
+"""Tests for ConnectIntegrationTool."""
+
+import pytest
+
+from ._test_data import make_session
+from .connect_integration import ConnectIntegrationTool
+from .models import ErrorResponse, SetupRequirementsResponse
+
+_TEST_USER_ID = "test-user-connect-integration"
+
+
+class TestConnectIntegrationTool:
+    def _make_tool(self) -> ConnectIntegrationTool:
+        return ConnectIntegrationTool()
+
+    @pytest.mark.asyncio(loop_scope="session")
+    async def test_unknown_provider_returns_error(self):
+        tool = self._make_tool()
+        session = make_session(user_id=_TEST_USER_ID)
+        result = await tool._execute(
+            user_id=_TEST_USER_ID, session=session, provider="nonexistent"
+        )
+        assert isinstance(result, ErrorResponse)
+        assert result.error == "unknown_provider"
+        assert "nonexistent" in result.message
+        assert "github" in result.message
+
+    @pytest.mark.asyncio(loop_scope="session")
+    async def test_empty_provider_returns_error(self):
+        tool = self._make_tool()
+        session = make_session(user_id=_TEST_USER_ID)
+        result = await tool._execute(
+            user_id=_TEST_USER_ID, session=session, provider=""
+        )
+        assert isinstance(result, ErrorResponse)
+        assert result.error == "unknown_provider"
+
+    @pytest.mark.asyncio(loop_scope="session")
+    async def test_github_provider_returns_setup_response(self):
+        tool = self._make_tool()
+        session = make_session(user_id=_TEST_USER_ID)
+        result = await tool._execute(
+            user_id=_TEST_USER_ID, session=session, provider="github"
+        )
+        assert isinstance(result, SetupRequirementsResponse)
+        assert result.setup_info.agent_name == "GitHub"
+        assert result.setup_info.agent_id == "connect_github"
+
+    @pytest.mark.asyncio(loop_scope="session")
+    async def test_github_has_missing_credentials_in_readiness(self):
+        tool = self._make_tool()
+        session = make_session(user_id=_TEST_USER_ID)
+        result = await tool._execute(
+            user_id=_TEST_USER_ID, session=session, provider="github"
+        )
+        assert isinstance(result, SetupRequirementsResponse)
+        readiness = result.setup_info.user_readiness
+        assert readiness.has_all_credentials is False
+        assert readiness.ready_to_run is False
+        assert "github_credentials" in readiness.missing_credentials
+
+    @pytest.mark.asyncio(loop_scope="session")
+    async def test_github_requirements_include_credential_entry(self):
+        tool = self._make_tool()
+        session = make_session(user_id=_TEST_USER_ID)
+        result = await tool._execute(
+            user_id=_TEST_USER_ID, session=session, provider="github"
+        )
+        assert isinstance(result, SetupRequirementsResponse)
+        creds = result.setup_info.requirements["credentials"]
+        assert len(creds) == 1
+        assert creds[0]["provider"] == "github"
+        assert creds[0]["id"] == "github_credentials"
+
+    @pytest.mark.asyncio(loop_scope="session")
+    async def test_reason_appears_in_message(self):
+        tool = self._make_tool()
+        session = make_session(user_id=_TEST_USER_ID)
+        reason = "Needed to create a pull request."
+        result = await tool._execute(
+            user_id=_TEST_USER_ID, session=session, provider="github", reason=reason
+        )
+        assert isinstance(result, SetupRequirementsResponse)
+        assert reason in result.message
+
+    @pytest.mark.asyncio(loop_scope="session")
+    async def test_session_id_propagated(self):
+        tool = self._make_tool()
+        session = make_session(user_id=_TEST_USER_ID)
+        result = await tool._execute(
+            user_id=_TEST_USER_ID, session=session, provider="github"
+        )
+        assert isinstance(result, SetupRequirementsResponse)
+        assert result.session_id == session.session_id
+
+    @pytest.mark.asyncio(loop_scope="session")
+    async def test_provider_case_insensitive(self):
+        """Provider slug is normalised to lowercase before lookup."""
+        tool = self._make_tool()
+        session = make_session(user_id=_TEST_USER_ID)
+        result = await tool._execute(
+            user_id=_TEST_USER_ID, session=session, provider="GitHub"
+        )
+        assert isinstance(result, SetupRequirementsResponse)
+
+    def test_tool_name(self):
+        assert ConnectIntegrationTool().name == "connect_integration"
+
+    def test_requires_auth(self):
+        assert ConnectIntegrationTool().requires_auth is True
+
+    @pytest.mark.asyncio(loop_scope="session")
+    async def test_unauthenticated_user_gets_need_login_response(self):
+        """execute() with user_id=None must return NeedLoginResponse, not the setup card.
+
+        This verifies that the requires_auth guard in BaseTool.execute() fires
+        before _execute() is called, so unauthenticated callers cannot probe
+        which integrations are configured.
+        """
+        import json
+
+        tool = self._make_tool()
+        # Session still needs a user_id string; the None is passed to execute()
+        # to simulate an unauthenticated call.
+        session = make_session(user_id=_TEST_USER_ID)
+        result = await tool.execute(
+            user_id=None,
+            session=session,
+            tool_call_id="test-call-id",
+            provider="github",
+        )
+        raw = result.output
+        output = json.loads(raw) if isinstance(raw, str) else raw
+        assert output.get("type") == "need_login"
+        assert result.success is False
--- a/autogpt_platform/backend/backend/copilot/tools/continue_run_block.py
+++ b/autogpt_platform/backend/backend/copilot/tools/continue_run_block.py
@@ -30,12 +30,7 @@ class ContinueRunBlockTool(BaseTool):

    @property
    def description(self) -> str:
-        return (
-            "Continue executing a block after human review approval. "
-            "Use this after a run_block call returned review_required. "
-            "Pass the review_id from the review_required response. "
-            "The block will execute with the original pre-approved input data."
-        )
+        return "Resume block execution after a run_block call returned review_required. Pass the review_id."

    @property
    def parameters(self) -> dict[str, Any]:
@@ -44,10 +39,7 @@ class ContinueRunBlockTool(BaseTool):
            "properties": {
                "review_id": {
                    "type": "string",
-                    "description": (
-                        "The review_id from a previous review_required response. "
-                        "This resumes execution with the pre-approved input data."
-                    ),
+                    "description": "review_id from the review_required response.",
                },
            },
            "required": ["review_id"],
--- a/autogpt_platform/backend/backend/copilot/tools/create_agent.py
+++ b/autogpt_platform/backend/backend/copilot/tools/create_agent.py
@@ -23,12 +23,8 @@ class CreateAgentTool(BaseTool):
    @property
    def description(self) -> str:
        return (
-            "Create a new agent workflow. Pass `agent_json` with the complete "
-            "agent graph JSON you generated using block schemas from find_block. "
-            "The tool validates, auto-fixes, and saves.\n\n"
-            "IMPORTANT: Before calling this tool, search for relevant existing agents "
-            "using find_library_agent that could be used as building blocks. "
-            "Pass their IDs in the library_agent_ids parameter."
+            "Create a new agent from JSON (nodes + links). Validates, auto-fixes, and saves. "
+            "Before calling, search for existing agents with find_library_agent."
        )

    @property
@@ -42,34 +38,21 @@ class CreateAgentTool(BaseTool):
            "properties": {
                "agent_json": {
                    "type": "object",
-                    "description": (
-                        "The agent JSON to validate and save. "
-                        "Must contain 'nodes' and 'links' arrays, and optionally "
-                        "'name' and 'description'."
-                    ),
+                    "description": "Agent graph with 'nodes' and 'links' arrays.",
                },
                "library_agent_ids": {
                    "type": "array",
                    "items": {"type": "string"},
-                    "description": (
-                        "List of library agent IDs to use as building blocks."
-                    ),
+                    "description": "Library agent IDs as building blocks.",
                },
                "save": {
                    "type": "boolean",
-                    "description": (
-                        "Whether to save the agent. Default is true. "
-                        "Set to false for preview only."
-                    ),
+                    "description": "Save the agent (default: true). False for preview.",
                    "default": True,
                },
                "folder_id": {
                    "type": "string",
-                    "description": (
-                        "Optional folder ID to save the agent into. "
-                        "If not provided, the agent is saved at root level. "
-                        "Use list_folders to find available folders."
-                    ),
+                    "description": "Folder ID to save into (default: root).",
                },
            },
            "required": ["agent_json"],
--- a/autogpt_platform/backend/backend/copilot/tools/customize_agent.py
+++ b/autogpt_platform/backend/backend/copilot/tools/customize_agent.py
@@ -23,9 +23,7 @@ class CustomizeAgentTool(BaseTool):
    @property
    def description(self) -> str:
        return (
-            "Customize a marketplace or template agent. Pass `agent_json` "
-            "with the complete customized agent JSON. The tool validates, "
-            "auto-fixes, and saves."
+            "Customize a marketplace/template agent. Validates, auto-fixes, and saves."
        )

    @property
@@ -39,32 +37,21 @@ class CustomizeAgentTool(BaseTool):
            "properties": {
                "agent_json": {
                    "type": "object",
-                    "description": (
-                        "Complete customized agent JSON to validate and save. "
-                        "Optionally include 'name' and 'description'."
-                    ),
+                    "description": "Customized agent JSON with nodes and links.",
                },
                "library_agent_ids": {
                    "type": "array",
                    "items": {"type": "string"},
-                    "description": (
-                        "List of library agent IDs to use as building blocks."
-                    ),
+                    "description": "Library agent IDs as building blocks.",
                },
                "save": {
                    "type": "boolean",
-                    "description": (
-                        "Whether to save the customized agent. Default is true."
-                    ),
+                    "description": "Save the agent (default: true). False for preview.",
                    "default": True,
                },
                "folder_id": {
                    "type": "string",
-                    "description": (
-                        "Optional folder ID to save the agent into. "
-                        "If not provided, the agent is saved at root level. "
-                        "Use list_folders to find available folders."
-                    ),
+                    "description": "Folder ID to save into (default: root).",
                },
            },
            "required": ["agent_json"],
--- a/autogpt_platform/backend/backend/copilot/tools/e2b_sandbox.py
+++ b/autogpt_platform/backend/backend/copilot/tools/e2b_sandbox.py
@@ -41,8 +41,7 @@ import contextlib
 import logging
 from typing import Any, Awaitable, Callable, Literal

-from e2b import AsyncSandbox
-from e2b.sandbox.sandbox_api import SandboxLifecycle
+from e2b import AsyncSandbox, SandboxLifecycle

 from backend.data.redis_client import get_redis_async

--- a/autogpt_platform/backend/backend/copilot/tools/edit_agent.py
+++ b/autogpt_platform/backend/backend/copilot/tools/edit_agent.py
@@ -23,12 +23,8 @@ class EditAgentTool(BaseTool):
    @property
    def description(self) -> str:
        return (
-            "Edit an existing agent. Pass `agent_json` with the complete "
-            "updated agent JSON you generated. The tool validates, auto-fixes, "
-            "and saves.\n\n"
-            "IMPORTANT: Before calling this tool, if the changes involve adding new "
-            "functionality, search for relevant existing agents using find_library_agent "
-            "that could be used as building blocks."
+            "Edit an existing agent. Validates, auto-fixes, and saves. "
+            "Before calling, search for existing agents with find_library_agent."
        )

    @property
@@ -42,33 +38,20 @@ class EditAgentTool(BaseTool):
            "properties": {
                "agent_id": {
                    "type": "string",
-                    "description": (
-                        "The ID of the agent to edit. "
-                        "Can be a graph ID or library agent ID."
-                    ),
+                    "description": "Graph ID or library agent ID to edit.",
                },
                "agent_json": {
                    "type": "object",
-                    "description": (
-                        "Complete updated agent JSON to validate and save. "
-                        "Must contain 'nodes' and 'links'. "
-                        "Include 'name' and/or 'description' if they need "
-                        "to be updated."
-                    ),
+                    "description": "Updated agent JSON with nodes and links.",
                },
                "library_agent_ids": {
                    "type": "array",
                    "items": {"type": "string"},
-                    "description": (
-                        "List of library agent IDs to use as building blocks for the changes."
-                    ),
+                    "description": "Library agent IDs as building blocks.",
                },
                "save": {
                    "type": "boolean",
-                    "description": (
-                        "Whether to save the changes. "
-                        "Default is true. Set to false for preview only."
-                    ),
+                    "description": "Save changes (default: true). False for preview.",
                    "default": True,
                },
            },
--- a/autogpt_platform/backend/backend/copilot/tools/feature_requests.py
+++ b/autogpt_platform/backend/backend/copilot/tools/feature_requests.py
@@ -134,11 +134,7 @@ class SearchFeatureRequestsTool(BaseTool):

    @property
    def description(self) -> str:
-        return (
-            "Search existing feature requests to check if a similar request "
-            "already exists before creating a new one. Returns matching feature "
-            "requests with their ID, title, and description."
-        )
+        return "Search existing feature requests. Check before creating a new one."

    @property
    def parameters(self) -> dict[str, Any]:
@@ -234,14 +230,9 @@ class CreateFeatureRequestTool(BaseTool):
    @property
    def description(self) -> str:
        return (
-            "Create a new feature request or add a customer need to an existing one. "
-            "Always search first with search_feature_requests to avoid duplicates. "
-            "If a matching request exists, pass its ID as existing_issue_id to add "
-            "the user's need to it instead of creating a duplicate. "
-            "IMPORTANT: Never include personally identifiable information (PII) in "
-            "the title or description — no names, emails, phone numbers, company "
-            "names, or other identifying details. Write titles and descriptions in "
-            "generic, feature-focused language."
+            "Create a feature request, or add a need to an existing one. "
+            "Search first to avoid duplicates. Pass existing_issue_id to add to an existing request. "
+            "Never include PII (names, emails, phone numbers, company names) in title/description."
        )

    @property
@@ -251,28 +242,15 @@ class CreateFeatureRequestTool(BaseTool):
            "properties": {
                "title": {
                    "type": "string",
-                    "description": (
-                        "Title for the feature request. Must be generic and "
-                        "feature-focused — do not include any user names, emails, "
-                        "company names, or other PII."
-                    ),
+                    "description": "Feature request title. No names, emails, or company info.",
                },
                "description": {
                    "type": "string",
-                    "description": (
-                        "Detailed description of what the user wants and why. "
-                        "Must not contain any personally identifiable information "
-                        "(PII) — describe the feature need generically without "
-                        "referencing specific users, companies, or contact details."
-                    ),
+                    "description": "What the user wants and why. No names, emails, or company info.",
                },
                "existing_issue_id": {
                    "type": "string",
-                    "description": (
-                        "If adding a need to an existing feature request, "
-                        "provide its Linear issue ID (from search results). "
-                        "Omit to create a new feature request."
-                    ),
+                    "description": "Linear issue ID to add the need to (from search results). Omit to create a new request.",
                },
            },
            "required": ["title", "description"],
--- a/autogpt_platform/backend/backend/copilot/tools/find_agent.py
+++ b/autogpt_platform/backend/backend/copilot/tools/find_agent.py
@@ -18,10 +18,7 @@ class FindAgentTool(BaseTool):

    @property
    def description(self) -> str:
-        return (
-            "Discover agents from the marketplace based on capabilities and "
-            "user needs, or look up a specific agent by its creator/slug ID."
-        )
+        return "Search marketplace agents by capability, or look up by slug ('username/agent-name')."

    @property
    def parameters(self) -> dict[str, Any]:
@@ -30,7 +27,7 @@ class FindAgentTool(BaseTool):
            "properties": {
                "query": {
                    "type": "string",
-                    "description": "Search query describing what the user wants to accomplish, or a creator/slug ID (e.g. 'username/agent-name') for direct lookup. Use single keywords for best results.",
+                    "description": "Search keywords, or 'username/agent-name' for direct slug lookup.",
                },
            },
            "required": ["query"],
--- a/autogpt_platform/backend/backend/copilot/tools/find_block.py
+++ b/autogpt_platform/backend/backend/copilot/tools/find_block.py
@@ -54,13 +54,9 @@ class FindBlockTool(BaseTool):
    @property
    def description(self) -> str:
        return (
-            "Search for available blocks by name or description, or look up a "
-            "specific block by its ID. "
-            "Blocks are reusable components that perform specific tasks like "
-            "sending emails, making API calls, processing text, etc. "
-            "IMPORTANT: Use this tool FIRST to get the block's 'id' before calling run_block. "
-            "The response includes each block's id, name, and description. "
-            "Call run_block with the block's id **with no inputs** to see detailed inputs/outputs and execute it."
+            "Search blocks by name or description. Returns block IDs for run_block. "
+            "Always call this FIRST to get block IDs before using run_block. "
+            "Then call run_block with the block's id and empty input_data to see its detailed schema."
        )

    @property
@@ -70,19 +66,11 @@ class FindBlockTool(BaseTool):
            "properties": {
                "query": {
                    "type": "string",
-                    "description": (
-                        "Search query to find blocks by name or description, "
-                        "or a block ID (UUID) for direct lookup. "
-                        "Use keywords like 'email', 'http', 'text', 'ai', etc."
-                    ),
+                    "description": "Search keywords (e.g. 'email', 'http', 'ai').",
                },
                "include_schemas": {
                    "type": "boolean",
-                    "description": (
-                        "If true, include full input_schema and output_schema "
-                        "for each block. Use when generating agent JSON that "
-                        "needs block schemas. Default is false."
-                    ),
+                    "description": "Include full input/output schemas (for agent JSON generation).",
                    "default": False,
                },
            },
--- a/autogpt_platform/backend/backend/copilot/tools/find_library_agent.py
+++ b/autogpt_platform/backend/backend/copilot/tools/find_library_agent.py
@@ -19,13 +19,8 @@ class FindLibraryAgentTool(BaseTool):
    @property
    def description(self) -> str:
        return (
-            "Search for or list agents in the user's library. Use this to find "
-            "agents the user has already added to their library, including agents "
-            "they created or added from the marketplace. "
-            "When creating agents with sub-agent composition, use this to get "
-            "the agent's graph_id, graph_version, input_schema, and output_schema "
-            "needed for AgentExecutorBlock nodes. "
-            "Omit the query to list all agents."
+            "Search the user's library agents. Returns graph_id and schemas for sub-agent composition. "
+            "Omit query to list all."
        )

    @property
@@ -35,10 +30,7 @@ class FindLibraryAgentTool(BaseTool):
            "properties": {
                "query": {
                    "type": "string",
-                    "description": (
-                        "Search query to find agents by name or description. "
-                        "Omit to list all agents in the library."
-                    ),
+                    "description": "Search by name/description. Omit to list all.",
                },
            },
            "required": [],
--- a/autogpt_platform/backend/backend/copilot/tools/fix_agent.py
+++ b/autogpt_platform/backend/backend/copilot/tools/fix_agent.py
@@ -22,20 +22,10 @@ class FixAgentGraphTool(BaseTool):
    @property
    def description(self) -> str:
        return (
-            "Auto-fix common issues in an agent JSON graph. Applies fixes for:\n"
-            "- Missing or invalid UUIDs on nodes and links\n"
-            "- StoreValueBlock prerequisites for ConditionBlock\n"
-            "- Double curly brace escaping in prompt templates\n"
-            "- AddToList/AddToDictionary prerequisite blocks\n"
-            "- CodeExecutionBlock output field naming\n"
-            "- Missing credentials configuration\n"
-            "- Node X coordinate spacing (800+ units apart)\n"
-            "- AI model default parameters\n"
-            "- Link static properties based on input schema\n"
-            "- Type mismatches (inserts conversion blocks)\n\n"
-            "Returns the fixed agent JSON plus a list of fixes applied. "
-            "After fixing, the agent is re-validated. If still invalid, "
-            "the remaining errors are included in the response."
+            "Auto-fix common agent JSON issues: missing/invalid UUIDs, StoreValueBlock prerequisites, "
+            "double curly brace escaping, AddToList/AddToDictionary prerequisites, credentials, "
+            "node spacing, AI model defaults, link static properties, and type mismatches. "
+            "Returns fixed JSON and list of fixes applied."
        )

    @property
--- a/autogpt_platform/backend/backend/copilot/tools/get_agent_building_guide.py
+++ b/autogpt_platform/backend/backend/copilot/tools/get_agent_building_guide.py
@@ -42,12 +42,7 @@ class GetAgentBuildingGuideTool(BaseTool):

    @property
    def description(self) -> str:
-        return (
-            "Returns the complete guide for building agent JSON graphs, including "
-            "block IDs, link structure, AgentInputBlock, AgentOutputBlock, "
-            "AgentExecutorBlock (for sub-agent composition), and MCPToolBlock usage. "
-            "Call this before generating agent JSON to ensure correct structure."
-        )
+        return "Get the agent JSON building guide (nodes, links, AgentExecutorBlock, MCPToolBlock usage). Call before generating agent JSON."

    @property
    def parameters(self) -> dict[str, Any]:
--- a/autogpt_platform/backend/backend/copilot/tools/get_doc_page.py
+++ b/autogpt_platform/backend/backend/copilot/tools/get_doc_page.py
@@ -25,8 +25,7 @@ class GetDocPageTool(BaseTool):
    @property
    def description(self) -> str:
        return (
-            "Get the full content of a documentation page by its path. "
-            "Use this after search_docs to read the complete content of a relevant page."
+            "Read full documentation page content by path (from search_docs results)."
        )

    @property
@@ -36,10 +35,7 @@ class GetDocPageTool(BaseTool):
            "properties": {
                "path": {
                    "type": "string",
-                    "description": (
-                        "The path to the documentation file, as returned by search_docs. "
-                        "Example: 'platform/block-sdk-guide.md'"
-                    ),
+                    "description": "Doc file path (e.g. 'platform/block-sdk-guide.md').",
                },
            },
            "required": ["path"],
--- a/autogpt_platform/backend/backend/copilot/tools/get_mcp_guide.py
+++ b/autogpt_platform/backend/backend/copilot/tools/get_mcp_guide.py
@@ -38,11 +38,7 @@ class GetMCPGuideTool(BaseTool):

    @property
    def description(self) -> str:
-        return (
-            "Returns the MCP tool guide: known hosted server URLs (Notion, Linear, "
-            "Stripe, Intercom, Cloudflare, Atlassian) and authentication workflow. "
-            "Call before using run_mcp_tool if you need a server URL or auth info."
-        )
+        return "Get MCP server URLs and auth guide. Call before run_mcp_tool if you need a server URL or auth info."

    @property
    def parameters(self) -> dict[str, Any]:
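One way to quantify how much shorter a rewritten description is: compare character counts of the old and new strings. This is a rough sketch that uses the two `get_mcp_guide.py` strings from the hunk above; `chars_saved` is a hypothetical helper, and character count is only an approximate proxy for prompt size.

```python
# Strings copied from the get_mcp_guide.py hunk above.
OLD_DESCRIPTION = (
    "Returns the MCP tool guide: known hosted server URLs (Notion, Linear, "
    "Stripe, Intercom, Cloudflare, Atlassian) and authentication workflow. "
    "Call before using run_mcp_tool if you need a server URL or auth info."
)
NEW_DESCRIPTION = (
    "Get MCP server URLs and auth guide. "
    "Call before run_mcp_tool if you need a server URL or auth info."
)


def chars_saved(old: str, new: str) -> int:
    """Number of characters removed by the rewrite (negative if it grew)."""
    return len(old) - len(new)


if __name__ == "__main__":
    print(
        f"old={len(OLD_DESCRIPTION)} new={len(NEW_DESCRIPTION)} "
        f"saved={chars_saved(OLD_DESCRIPTION, NEW_DESCRIPTION)}"
    )
```

The same comparison could be run across every tool description touched by this change to see where the reduction is largest.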
--- a/autogpt_platform/backend/backend/copilot/tools/manage_folders.py
+++ b/autogpt_platform/backend/backend/copilot/tools/manage_folders.py
@@ -88,10 +88,7 @@ class CreateFolderTool(BaseTool):

    @property
    def description(self) -> str:
-        return (
-            "Create a new folder in the user's library to organize agents. "
-            "Optionally nest it inside an existing folder using parent_id."
-        )
+        return "Create a library folder. Use parent_id to nest inside another folder."

    @property
    def requires_auth(self) -> bool:
@@ -104,22 +101,19 @@ class CreateFolderTool(BaseTool):
            "properties": {
                "name": {
                    "type": "string",
-                    "description": "Name for the new folder (max 100 chars).",
+                    "description": "Folder name (max 100 chars).",
                },
                "parent_id": {
                    "type": "string",
-                    "description": (
-                        "ID of the parent folder to nest inside. "
-                        "Omit to create at root level."
-                    ),
+                    "description": "Parent folder ID (omit for root).",
                },
                "icon": {
                    "type": "string",
-                    "description": "Optional icon identifier for the folder.",
+                    "description": "Icon identifier.",
                },
                "color": {
                    "type": "string",
-                    "description": "Optional hex color code (#RRGGBB).",
+                    "description": "Hex color (#RRGGBB).",
                },
            },
            "required": ["name"],
@@ -175,13 +169,9 @@ class ListFoldersTool(BaseTool):
    @property
    def description(self) -> str:
        return (
-            "List the user's library folders. "
-            "Omit parent_id to get the full folder tree. "
-            "Provide parent_id to list only direct children of that folder. "
-            "Set include_agents=true to also return the agents inside each folder "
-            "and root-level agents not in any folder. Always set include_agents=true "
-            "when the user asks about agents, wants to see what's in their folders, "
-            "or mentions agents alongside folders."
+            "List library folders. Omit parent_id for full tree. "
+            "Set include_agents=true when user asks about agents, wants to see "
+            "what's in their folders, or mentions agents alongside folders."
        )

    @property
@@ -195,17 +185,11 @@ class ListFoldersTool(BaseTool):
            "properties": {
                "parent_id": {
                    "type": "string",
-                    "description": (
-                        "List children of this folder. "
-                        "Omit to get the full folder tree."
-                    ),
+                    "description": "List children of this folder (omit for full tree).",
                },
                "include_agents": {
                    "type": "boolean",
-                    "description": (
-                        "Whether to include the list of agents inside each folder. "
-                        "Defaults to false."
-                    ),
+                    "description": "Include agents in each folder (default: false).",
                },
            },
            "required": [],
@@ -357,10 +341,7 @@ class MoveFolderTool(BaseTool):

    @property
    def description(self) -> str:
-        return (
-            "Move a folder to a different parent folder. "
-            "Set target_parent_id to null to move to root level."
-        )
+        return "Move a folder. Set target_parent_id to null for root."

    @property
    def requires_auth(self) -> bool:
@@ -373,14 +354,11 @@ class MoveFolderTool(BaseTool):
            "properties": {
                "folder_id": {
                    "type": "string",
-                    "description": "ID of the folder to move.",
+                    "description": "Folder ID.",
                },
                "target_parent_id": {
                    "type": ["string", "null"],
-                    "description": (
-                        "ID of the new parent folder. "
-                        "Use null to move to root level."
-                    ),
+                    "description": "New parent folder ID (null for root).",
                },
            },
            "required": ["folder_id"],
@@ -433,10 +411,7 @@ class DeleteFolderTool(BaseTool):

    @property
    def description(self) -> str:
-        return (
-            "Delete a folder from the user's library. "
-            "Agents inside the folder are moved to root level (not deleted)."
-        )
+        return "Delete a folder. Agents inside move to root (not deleted)."

    @property
    def requires_auth(self) -> bool:
@@ -499,10 +474,7 @@ class MoveAgentsToFolderTool(BaseTool):

    @property
    def description(self) -> str:
-        return (
-            "Move one or more agents to a folder. "
-            "Set folder_id to null to move agents to root level."
-        )
+        return "Move agents to a folder. Set folder_id to null for root."

    @property
    def requires_auth(self) -> bool:
@@ -516,13 +488,11 @@ class MoveAgentsToFolderTool(BaseTool):
                "agent_ids": {
                    "type": "array",
                    "items": {"type": "string"},
-                    "description": "List of library agent IDs to move.",
+                    "description": "Library agent IDs to move.",
                },
                "folder_id": {
                    "type": ["string", "null"],
-                    "description": (
-                        "Target folder ID. Use null to move to root level."
-                    ),
+                    "description": "Target folder ID (null for root).",
                },
            },
            "required": ["agent_ids"],
--- a/autogpt_platform/backend/backend/copilot/tools/run_agent.py
+++ b/autogpt_platform/backend/backend/copilot/tools/run_agent.py
@@ -104,19 +104,11 @@ class RunAgentTool(BaseTool):

    @property
    def description(self) -> str:
-        return """Run or schedule an agent from the marketplace or user's library.
-
-        The tool automatically handles the setup flow:
-        - Returns missing inputs if required fields are not provided
-        - Returns missing credentials if user needs to configure them
-        - Executes immediately if all requirements are met
-        - Schedules execution if cron expression is provided
-
-        Identify the agent using either:
-        - username_agent_slug: Marketplace format 'username/agent-name'
-        - library_agent_id: ID of an agent in the user's library
-
-        For scheduled execution, provide: schedule_name, cron, and optionally timezone."""
+        return (
+            "Run or schedule an agent. Automatically checks inputs and credentials. "
+            "Identify by username_agent_slug ('user/agent') or library_agent_id. "
+            "For scheduling, provide schedule_name + cron."
+        )

    @property
    def parameters(self) -> dict[str, Any]:
@@ -125,40 +117,38 @@ class RunAgentTool(BaseTool):
            "properties": {
                "username_agent_slug": {
                    "type": "string",
-                    "description": "Agent identifier in format 'username/agent-name'",
+                    "description": "Marketplace format 'username/agent-name'.",
                },
                "library_agent_id": {
                    "type": "string",
-                    "description": "Library agent ID from user's library",
+                    "description": "Library agent ID.",
                },
                "inputs": {
                    "type": "object",
-                    "description": "Input values for the agent",
+                    "description": "Input values for the agent.",
                    "additionalProperties": True,
                },
                "use_defaults": {
                    "type": "boolean",
-                    "description": "Set to true to run with default values (user must confirm)",
+                    "description": "Run with default values (confirm with user first).",
                },
                "schedule_name": {
                    "type": "string",
-                    "description": "Name for scheduled execution (triggers scheduling mode)",
+                    "description": "Name for scheduled execution. Providing this triggers scheduling mode (also requires cron).",
                },
                "cron": {
                    "type": "string",
-                    "description": "Cron expression (5 fields: min hour day month weekday)",
+                    "description": "Cron expression (min hour day month weekday).",
                },
                "timezone": {
                    "type": "string",
-                    "description": "IANA timezone for schedule (default: UTC)",
+                    "description": "IANA timezone (default: UTC).",
                },
                "wait_for_result": {
                    "type": "integer",
-                    "description": (
-                        "Max seconds to wait for execution to complete (0-300). "
-                        "If >0, blocks until the execution finishes or times out. "
-                        "Returns execution outputs when complete."
-                    ),
+                    "description": "Max seconds to wait for completion (0-300).",
+                    "minimum": 0,
+                    "maximum": 300,
                },
            },
            "required": [],
--- a/autogpt_platform/backend/backend/copilot/tools/run_block.py
+++ b/autogpt_platform/backend/backend/copilot/tools/run_block.py
@@ -45,13 +45,10 @@ class RunBlockTool(BaseTool):
    @property
    def description(self) -> str:
        return (
-            "Execute a specific block with the provided input data. "
-            "IMPORTANT: You MUST call find_block first to get the block's 'id' - "
-            "do NOT guess or make up block IDs. "
-            "On first attempt (without input_data), returns detailed schema showing "
-            "required inputs and outputs. Then call again with proper input_data to execute. "
-            "If a block requires human review, use continue_run_block with the "
-            "review_id after the user approves."
+            "Execute a block. IMPORTANT: Always get block_id from find_block first "
+            "— do NOT guess or fabricate IDs. "
+            "Call with empty input_data to see schema, then with data to execute. "
+            "If the response is review_required, use continue_run_block."
        )

    @property
@@ -61,28 +58,14 @@ class RunBlockTool(BaseTool):
            "properties": {
                "block_id": {
                    "type": "string",
-                    "description": (
-                        "The block's 'id' field from find_block results. "
-                        "NEVER guess this - always get it from find_block first."
-                    ),
-                },
-                "block_name": {
-                    "type": "string",
-                    "description": (
-                        "The block's human-readable name from find_block results. "
-                        "Used for display purposes in the UI."
-                    ),
+                    "description": "Block ID from find_block results.",
                },
                "input_data": {
                    "type": "object",
-                    "description": (
-                        "Input values for the block. "
-                        "First call with empty {} to see the block's schema, "
-                        "then call again with proper values to execute."
-                    ),
+                    "description": "Input values. Use {} first to see schema.",
                },
            },
-            "required": ["block_id", "block_name", "input_data"],
+            "required": ["block_id", "input_data"],
        }

    @property
--- a/autogpt_platform/backend/backend/copilot/tools/run_mcp_tool.py
+++ b/autogpt_platform/backend/backend/copilot/tools/run_mcp_tool.py
@@ -57,10 +57,9 @@ class RunMCPToolTool(BaseTool):
    @property
    def description(self) -> str:
        return (
-            "Connect to an MCP (Model Context Protocol) server to discover and execute its tools. "
-            "Two-step: (1) call with server_url to list available tools, "
-            "(2) call again with server_url + tool_name + tool_arguments to execute. "
-            "Call get_mcp_guide for known server URLs and auth details."
+            "Discover and execute MCP server tools. "
+            "Call with server_url only to list tools, then with tool_name + tool_arguments to execute. "
+            "Call get_mcp_guide first for server URLs and auth."
        )

    @property
@@ -70,24 +69,15 @@ class RunMCPToolTool(BaseTool):
            "properties": {
                "server_url": {
                    "type": "string",
-                    "description": (
-                        "URL of the MCP server (Streamable HTTP endpoint), "
-                        "e.g. https://mcp.example.com/mcp"
-                    ),
+                    "description": "MCP server URL (Streamable HTTP endpoint).",
                },
                "tool_name": {
                    "type": "string",
-                    "description": (
-                        "Name of the MCP tool to execute. "
-                        "Omit on first call to discover available tools."
-                    ),
+                    "description": "Tool to execute. Omit to discover available tools.",
                },
                "tool_arguments": {
                    "type": "object",
-                    "description": (
-                        "Arguments to pass to the selected tool. "
-                        "Must match the tool's input schema returned during discovery."
-                    ),
+                    "description": "Arguments matching the tool's input schema.",
                },
            },
            "required": ["server_url"],
--- a/autogpt_platform/backend/backend/copilot/tools/search_docs.py
+++ b/autogpt_platform/backend/backend/copilot/tools/search_docs.py
@@ -38,11 +38,7 @@ class SearchDocsTool(BaseTool):

    @property
    def description(self) -> str:
-        return (
-            "Search the AutoGPT platform documentation for information about "
-            "how to use the platform, build agents, configure blocks, and more. "
-            "Returns relevant documentation sections. Use get_doc_page to read full content."
-        )
+        return "Search platform documentation by keyword. Use get_doc_page to read full results."

    @property
    def parameters(self) -> dict[str, Any]:
@@ -51,10 +47,7 @@ class SearchDocsTool(BaseTool):
            "properties": {
                "query": {
                    "type": "string",
-                    "description": (
-                        "Search query to find relevant documentation. "
-                        "Use natural language to describe what you're looking for."
-                    ),
+                    "description": "Documentation search query.",
                },
            },
            "required": ["query"],
--- a/autogpt_platform/backend/backend/copilot/tools/tool_schema_test.py
+++ b/autogpt_platform/backend/backend/copilot/tools/tool_schema_test.py
@@ -0,0 +1,119 @@
+"""Schema regression tests for all registered CoPilot tools.
+
+Validates that every tool in TOOL_REGISTRY produces a well-formed schema:
+- description is non-empty
+- all `required` fields exist in `properties`
+- every property has a `type` and `description`
+- total schema character budget does not regress past threshold
+"""
+
+import json
+from typing import Any, cast
+
+import pytest
+
+from backend.copilot.tools import TOOL_REGISTRY
+
+# Character budget (~4 chars/token heuristic, targeting ~8000 tokens)
+_CHAR_BUDGET = 32_000
+
+
+@pytest.fixture(scope="module")
+def all_tool_schemas() -> list[tuple[str, Any]]:
+    """Return (tool_name, openai_schema) pairs for every registered tool."""
+    return [(name, tool.as_openai_tool()) for name, tool in TOOL_REGISTRY.items()]
+
+
+def _get_parametrize_data() -> list[tuple[str, Any]]:
+    """Build parametrize data at collection time."""
+    return [(name, tool.as_openai_tool()) for name, tool in TOOL_REGISTRY.items()]
+
+
+@pytest.mark.parametrize(
+    "tool_name,schema",
+    _get_parametrize_data(),
+    ids=[name for name, _ in _get_parametrize_data()],
+)
+class TestToolSchema:
+    """Validate schema invariants for every registered tool."""
+
+    def test_description_non_empty(self, tool_name: str, schema: dict) -> None:
+        desc = schema["function"].get("description", "")
+        assert desc, f"Tool '{tool_name}' has an empty description"
+
+    def test_required_fields_exist_in_properties(
+        self, tool_name: str, schema: dict
+    ) -> None:
+        params = schema["function"].get("parameters", {})
+        properties = params.get("properties", {})
+        required = params.get("required", [])
+        for field in required:
+            assert field in properties, (
+                f"Tool '{tool_name}': required field '{field}' "
+                f"not found in properties {list(properties.keys())}"
+            )
+
+    def test_every_property_has_type_and_description(
+        self, tool_name: str, schema: dict
+    ) -> None:
+        params = schema["function"].get("parameters", {})
+        properties = params.get("properties", {})
+        for prop_name, prop_def in properties.items():
+            assert (
+                "type" in prop_def
+            ), f"Tool '{tool_name}', property '{prop_name}' is missing 'type'"
+            assert (
+                "description" in prop_def
+            ), f"Tool '{tool_name}', property '{prop_name}' is missing 'description'"
+
+
+def test_browser_act_action_enum_complete() -> None:
+    """Assert browser_act action enum still contains all 14 supported actions.
+
+    This prevents future PRs from accidentally dropping actions during description
+    trimming. The enum is the authoritative list — this locks it at 14 values.
+    """
+    tool = TOOL_REGISTRY["browser_act"]
+    schema = tool.as_openai_tool()
+    fn_def = schema["function"]
+    params = cast(dict[str, Any], fn_def.get("parameters", {}))
+    actions = params["properties"]["action"]["enum"]
+    expected = {
+        "click",
+        "dblclick",
+        "fill",
+        "type",
+        "scroll",
+        "hover",
+        "press",
+        "check",
+        "uncheck",
+        "select",
+        "wait",
+        "back",
+        "forward",
+        "reload",
+    }
+    assert set(actions) == expected, (
+        f"browser_act action enum changed. Got {set(actions)}, expected {expected}. "
+        "If you added/removed an action, update this test intentionally."
+    )
+
+
+def test_total_schema_char_budget() -> None:
+    """Assert total tool schema size stays under the character budget.
+
+    This locks in the 34% token reduction from #12398 and prevents future
+    description bloat from eroding the gains. Uses character count with a
+    ~4 chars/token heuristic (budget of 32000 chars ≈ 8000 tokens).
+    Character count is tokenizer-agnostic — no dependency on GPT or Claude
+    tokenizers — while still providing a stable regression gate.
+    """
+    schemas = [tool.as_openai_tool() for tool in TOOL_REGISTRY.values()]
+    serialized = json.dumps(schemas)
+    total_chars = len(serialized)
+    assert total_chars < _CHAR_BUDGET, (
+        f"Tool schemas use {total_chars} chars (~{total_chars // 4} tokens), "
+        f"exceeding budget of {_CHAR_BUDGET} chars (~{_CHAR_BUDGET // 4} tokens). "
+        f"Description bloat detected — trim descriptions or raise the budget intentionally."
+    )
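Review note: the two structural checks in this test file (required fields present in `properties`, and the serialized-size budget) can be tried outside the repo. The sketch below is a minimal stand-alone version; `toy_schemas` is a hypothetical stand-in for what `TOOL_REGISTRY` would produce, and the helper names `schema_chars` and `missing_required` are illustrative, not part of the codebase.

```python
import json
from typing import Any

# Hypothetical stand-in for the schemas TOOL_REGISTRY would yield (not the real registry).
toy_schemas: list[dict[str, Any]] = [
    {
        "type": "function",
        "function": {
            "name": "search_docs",
            "description": "Search platform documentation by keyword.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "Documentation search query.",
                    },
                },
                "required": ["query"],
            },
        },
    }
]

CHAR_BUDGET = 32_000  # same threshold as _CHAR_BUDGET in the test file


def schema_chars(schemas: list[dict[str, Any]]) -> int:
    """Length of the JSON serialization, the quantity the budget test gates on."""
    return len(json.dumps(schemas))


def missing_required(schema: dict[str, Any]) -> list[str]:
    """Required field names that do not appear in the schema's properties."""
    params = schema["function"].get("parameters", {})
    properties = params.get("properties", {})
    return [name for name in params.get("required", []) if name not in properties]


total = schema_chars(toy_schemas)
print(f"{total} chars (~{total // 4} tokens), under budget: {total < CHAR_BUDGET}")
print("missing required fields:", missing_required(toy_schemas[0]))
```

The `// 4` conversion is only the rough chars-per-token heuristic the test docstring describes; actual token counts depend on the tokenizer.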
--- a/autogpt_platform/backend/backend/copilot/tools/validate_agent.py
+++ b/autogpt_platform/backend/backend/copilot/tools/validate_agent.py
@@ -22,17 +22,9 @@ class ValidateAgentGraphTool(BaseTool):
    @property
    def description(self) -> str:
        return (
-            "Validate an agent JSON graph for correctness. Checks:\n"
-            "- All block_ids reference real blocks\n"
-            "- All links reference valid source/sink nodes and fields\n"
-            "- Required input fields are wired or have defaults\n"
-            "- Data types are compatible across links\n"
-            "- Nested sink links use correct notation\n"
-            "- Prompt templates use proper curly brace escaping\n"
-            "- AgentExecutorBlock configurations are valid\n\n"
-            "Call this after generating agent JSON to verify correctness. "
-            "If validation fails, either fix issues manually based on the error "
-            "descriptions, or call fix_agent_graph to auto-fix common problems."
+            "Validate agent JSON for correctness: block_ids, links, required fields, "
+            "type compatibility, nested sink notation, prompt brace escaping, "
+            "and AgentExecutorBlock configs. On failure, use fix_agent_graph to auto-fix."
        )

    @property
@@ -46,11 +38,7 @@ class ValidateAgentGraphTool(BaseTool):
            "properties": {
                "agent_json": {
                    "type": "object",
-                    "description": (
-                        "The agent JSON to validate. Must contain 'nodes' and 'links' arrays. "
-                        "Each node needs: id (UUID), block_id, input_default, metadata. "
-                        "Each link needs: id (UUID), source_id, source_name, sink_id, sink_name."
-                    ),
+                    "description": "Agent JSON with 'nodes' and 'links' arrays.",
                },
            },
            "required": ["agent_json"],
--- a/autogpt_platform/backend/backend/copilot/tools/web_fetch.py
+++ b/autogpt_platform/backend/backend/copilot/tools/web_fetch.py
@@ -59,13 +59,7 @@ class WebFetchTool(BaseTool):

    @property
    def description(self) -> str:
-        return (
-            "Fetch the content of a public web page by URL. "
-            "Returns readable text extracted from HTML by default. "
-            "Useful for reading documentation, articles, and API responses. "
-            "Only supports HTTP/HTTPS GET requests to public URLs "
-            "(private/internal network addresses are blocked)."
-        )
+        return "Fetch a public web page. Public URLs only — internal addresses blocked. Returns readable text from HTML by default."

    @property
    def parameters(self) -> dict[str, Any]:
@@ -74,14 +68,11 @@ class WebFetchTool(BaseTool):
            "properties": {
                "url": {
                    "type": "string",
-                    "description": "The public HTTP/HTTPS URL to fetch.",
+                    "description": "Public HTTP/HTTPS URL.",
                },
                "extract_text": {
                    "type": "boolean",
-                    "description": (
-                        "If true (default), extract readable text from HTML. "
-                        "If false, return raw content."
-                    ),
+                    "description": "Extract text from HTML (default: true).",
                    "default": True,
                },
            },
--- a/autogpt_platform/backend/backend/copilot/tools/workspace_files.py
+++ b/autogpt_platform/backend/backend/copilot/tools/workspace_files.py
@@ -27,6 +27,8 @@ from .models import ErrorResponse, ResponseType, ToolResponseBase

 logger = logging.getLogger(__name__)

+_MAX_FILE_SIZE_MB = Config().max_file_size_mb
+
 # Sentinel file_id used when a tool-result file is read directly from the local
 # host filesystem (rather than from workspace storage).
 _LOCAL_TOOL_RESULT_FILE_ID = "local"
@@ -415,13 +417,7 @@ class ListWorkspaceFilesTool(BaseTool):

    @property
    def description(self) -> str:
-        return (
-            "List files in the user's persistent workspace (cloud storage). "
-            "These files survive across sessions. "
-            "For ephemeral session files, use the SDK Read/Glob tools instead. "
-            "Returns file names, paths, sizes, and metadata. "
-            "Optionally filter by path prefix."
-        )
+        return "List persistent workspace files. For ephemeral session files, use SDK Glob/Read instead. Optionally filter by path prefix."

    @property
    def parameters(self) -> dict[str, Any]:
@@ -430,24 +426,17 @@ class ListWorkspaceFilesTool(BaseTool):
            "properties": {
                "path_prefix": {
                    "type": "string",
-                    "description": (
-                        "Optional path prefix to filter files "
-                        "(e.g., '/documents/' to list only files in documents folder). "
-                        "By default, only files from the current session are listed."
-                    ),
+                    "description": "Filter by path prefix (e.g. '/documents/').",
                },
                "limit": {
                    "type": "integer",
-                    "description": "Maximum number of files to return (default 50, max 100)",
+                    "description": "Max files to return (default 50, max 100).",
                    "minimum": 1,
                    "maximum": 100,
                },
                "include_all_sessions": {
                    "type": "boolean",
-                    "description": (
-                        "If true, list files from all sessions. "
-                        "Default is false (only current session's files)."
-                    ),
+                    "description": "Include files from all sessions (default: false).",
                },
            },
            "required": [],
@@ -530,18 +519,11 @@ class ReadWorkspaceFileTool(BaseTool):
    @property
    def description(self) -> str:
        return (
-            "Read a file from the user's persistent workspace (cloud storage). "
-            "These files survive across sessions. "
-            "For ephemeral session files, use the SDK Read tool instead. "
-            "Specify either file_id or path to identify the file. "
-            "For small text files, returns content directly. "
-            "For large or binary files, returns metadata and a download URL. "
-            "Use 'save_to_path' to copy the file to the working directory "
-            "(sandbox or ephemeral) for processing with bash_exec or file tools. "
-            "Use 'offset' and 'length' for paginated reads of large files "
-            "(e.g., persisted tool outputs). "
-            "Paths are scoped to the current session by default. "
-            "Use /sessions/<session_id>/... for cross-session access."
+            "Read a file from persistent workspace. Specify file_id or path. "
+            "Small text/image files return inline; large/binary return metadata+URL. "
+            "Use save_to_path to copy to working dir for processing. "
+            "Use offset/length for paginated reads. "
+            "Paths scoped to current session; use /sessions/<id>/... for cross-session access."
        )

    @property
@@ -551,48 +533,30 @@ class ReadWorkspaceFileTool(BaseTool):
            "properties": {
                "file_id": {
                    "type": "string",
-                    "description": "The file's unique ID (from list_workspace_files)",
+                    "description": "File ID from list_workspace_files.",
                },
                "path": {
                    "type": "string",
-                    "description": (
-                        "The virtual file path (e.g., '/documents/report.pdf'). "
-                        "Scoped to current session by default."
-                    ),
+                    "description": "Virtual file path (e.g. '/documents/report.pdf').",
                },
                "save_to_path": {
                    "type": "string",
-                    "description": (
-                        "If provided, save the file to this path in the working "
-                        "directory (cloud sandbox when E2B is active, or "
-                        "ephemeral dir otherwise) so it can be processed with "
-                        "bash_exec or file tools. "
-                        "The file content is still returned in the response."
-                    ),
+                    "description": "Copy file to this working directory path for processing.",
                },
                "force_download_url": {
                    "type": "boolean",
-                    "description": (
-                        "If true, always return metadata+URL instead of inline content. "
-                        "Default is false (auto-selects based on file size/type)."
-                    ),
+                    "description": "Always return metadata+URL instead of inline content.",
                },
                "offset": {
                    "type": "integer",
-                    "description": (
-                        "Character offset to start reading from (0-based). "
-                        "Use with 'length' for paginated reads of large files."
-                    ),
+                    "description": "Character offset for paginated reads (0-based).",
                },
                "length": {
                    "type": "integer",
-                    "description": (
-                        "Maximum number of characters to return. "
-                        "Defaults to full file. Use with 'offset' for paginated reads."
-                    ),
+                    "description": "Max characters to return for paginated reads.",
                },
            },
-            "required": [],  # At least one must be provided
+            "required": [],  # At least one of file_id or path must be provided
        }

    @property
@@ -755,15 +719,10 @@ class WriteWorkspaceFileTool(BaseTool):
    @property
    def description(self) -> str:
        return (
-            "Write or create a file in the user's persistent workspace (cloud storage). "
-            "These files survive across sessions. "
-            "For ephemeral session files, use the SDK Write tool instead. "
-            "Provide content as plain text via 'content', OR base64-encoded via "
-            "'content_base64', OR copy a file from the ephemeral working directory "
-            "via 'source_path'. Exactly one of these three is required. "
-            f"Maximum file size is {Config().max_file_size_mb}MB. "
-            "Files are saved to the current session's folder by default. "
-            "Use /sessions/<session_id>/... for cross-session access."
+            "Write a file to persistent workspace (survives across sessions). "
+            "Provide exactly one of: content (text), content_base64 (binary), "
+            f"or source_path (copy from working dir). Max {_MAX_FILE_SIZE_MB}MB. "
+            "Paths scoped to current session; use /sessions/<id>/... for cross-session access."
        )

    @property
@@ -773,51 +732,31 @@ class WriteWorkspaceFileTool(BaseTool):
            "properties": {
                "filename": {
                    "type": "string",
-                    "description": "Name for the file (e.g., 'report.pdf')",
+                    "description": "Filename (e.g. 'report.pdf').",
                },
                "content": {
                    "type": "string",
-                    "description": (
-                        "Plain text content to write. Use this for text files "
-                        "(code, configs, documents, etc.). "
-                        "Mutually exclusive with content_base64 and source_path."
-                    ),
+                    "description": "Plain text content. Mutually exclusive with content_base64/source_path.",
                },
                "content_base64": {
                    "type": "string",
-                    "description": (
-                        "Base64-encoded file content. Use this for binary files "
-                        "(images, PDFs, etc.). "
-                        "Mutually exclusive with content and source_path."
-                    ),
+                    "description": "Base64-encoded binary content. Mutually exclusive with content/source_path.",
                },
                "source_path": {
                    "type": "string",
-                    "description": (
-                        "Path to a file in the ephemeral working directory to "
-                        "copy to workspace (e.g., '/tmp/copilot-.../output.csv'). "
-                        "Use this to persist files created by bash_exec or SDK Write. "
-                        "Mutually exclusive with content and content_base64."
-                    ),
+                    "description": "Working directory path to copy to workspace. Mutually exclusive with content/content_base64.",
                },
                "path": {
                    "type": "string",
-                    "description": (
-                        "Optional virtual path where to save the file "
-                        "(e.g., '/documents/report.pdf'). "
-                        "Defaults to '/{filename}'. Scoped to current session."
-                    ),
+                    "description": "Virtual path (e.g. '/documents/report.pdf'). Defaults to '/{filename}'.",
                },
                "mime_type": {
                    "type": "string",
-                    "description": (
-                        "Optional MIME type of the file. "
-                        "Auto-detected from filename if not provided."
-                    ),
+                    "description": "MIME type. Auto-detected from filename if omitted.",
                },
                "overwrite": {
                    "type": "boolean",
-                    "description": "Whether to overwrite if file exists at path (default: false)",
+                    "description": "Overwrite if file exists (default: false).",
                },
            },
            "required": ["filename"],
@@ -859,10 +798,10 @@ class WriteWorkspaceFileTool(BaseTool):
            return resolved
        content: bytes = resolved

-        max_size = Config().max_file_size_mb * 1024 * 1024
+        max_size = _MAX_FILE_SIZE_MB * 1024 * 1024
        if len(content) > max_size:
            return ErrorResponse(
-                message=f"File too large. Maximum size is {Config().max_file_size_mb}MB",
+                message=f"File too large. Maximum size is {_MAX_FILE_SIZE_MB}MB",
                session_id=session_id,
            )

@@ -944,12 +883,7 @@ class DeleteWorkspaceFileTool(BaseTool):

    @property
    def description(self) -> str:
-        return (
-            "Delete a file from the user's persistent workspace (cloud storage). "
-            "Specify either file_id or path to identify the file. "
-            "Paths are scoped to the current session by default. "
-            "Use /sessions/<session_id>/... for cross-session access."
-        )
+        return "Delete a file from persistent workspace. Specify file_id or path. Paths scoped to current session; use /sessions/<id>/... for cross-session access."

    @property
    def parameters(self) -> dict[str, Any]:
@@ -958,17 +892,14 @@ class DeleteWorkspaceFileTool(BaseTool):
            "properties": {
                "file_id": {
                    "type": "string",
-                    "description": "The file's unique ID (from list_workspace_files)",
+                    "description": "File ID from list_workspace_files.",
                },
                "path": {
                    "type": "string",
-                    "description": (
-                        "The virtual file path (e.g., '/documents/report.pdf'). "
-                        "Scoped to current session by default."
-                    ),
+                    "description": "Virtual file path.",
                },
            },
-            "required": [],  # At least one must be provided
+            "required": [],  # At least one of file_id or path must be provided
        }

    @property
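Review note: the size check in `WriteWorkspaceFileTool` now reads the module-level `_MAX_FILE_SIZE_MB` instead of calling `Config()` each time. A minimal sketch of the same comparison follows; the value `100` and the function name `exceeds_limit` are assumptions for illustration, not values taken from the repo's config.

```python
# Hypothetical limit for illustration; the real value comes from Config().max_file_size_mb.
MAX_FILE_SIZE_MB = 100


def exceeds_limit(content: bytes, max_mb: int = MAX_FILE_SIZE_MB) -> bool:
    """True when the payload is strictly larger than max_mb mebibytes."""
    max_size = max_mb * 1024 * 1024  # MB to bytes, as in the diff
    return len(content) > max_size


one_mib = b"x" * (1024 * 1024)
print(exceeds_limit(one_mib, max_mb=1))         # exactly at the limit is allowed
print(exceeds_limit(one_mib + b"x", max_mb=1))  # one byte over is rejected
```

One trade-off worth keeping in mind: a module-level constant is evaluated once at import, so a config value changed after import would not be picked up without reloading the module.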
--- a/autogpt_platform/backend/backend/data/block_cost_config.py
+++ b/autogpt_platform/backend/backend/data/block_cost_config.py
@@ -76,7 +76,6 @@ MODEL_COST: dict[LlmModel, int] = {
    LlmModel.GPT4O_MINI: 1,
    LlmModel.GPT4O: 3,
    LlmModel.GPT4_TURBO: 10,
-    LlmModel.GPT3_5_TURBO: 1,
    LlmModel.CLAUDE_4_1_OPUS: 21,
    LlmModel.CLAUDE_4_OPUS: 21,
    LlmModel.CLAUDE_4_SONNET: 5,
@@ -423,7 +422,7 @@ BLOCK_COSTS: dict[Type[Block], list[BlockCost]] = {
        BlockCost(
            cost_amount=10,
            cost_filter={
-                "model": FluxKontextModelName.PRO.api_name,
+                "model": FluxKontextModelName.FLUX_KONTEXT_PRO,
                "credentials": {
                    "id": replicate_credentials.id,
                    "provider": replicate_credentials.provider,
@@ -434,7 +433,29 @@ BLOCK_COSTS: dict[Type[Block], list[BlockCost]] = {
        BlockCost(
            cost_amount=20,
            cost_filter={
-                "model": FluxKontextModelName.MAX.api_name,
+                "model": FluxKontextModelName.FLUX_KONTEXT_MAX,
+                "credentials": {
+                    "id": replicate_credentials.id,
+                    "provider": replicate_credentials.provider,
+                    "type": replicate_credentials.type,
+                },
+            },
+        ),
+        BlockCost(
+            cost_amount=14,  # Nano Banana Pro
+            cost_filter={
+                "model": FluxKontextModelName.NANO_BANANA_PRO,
+                "credentials": {
+                    "id": replicate_credentials.id,
+                    "provider": replicate_credentials.provider,
+                    "type": replicate_credentials.type,
+                },
+            },
+        ),
+        BlockCost(
+            cost_amount=14,  # Nano Banana 2
+            cost_filter={
+                "model": FluxKontextModelName.NANO_BANANA_2,
                "credentials": {
                    "id": replicate_credentials.id,
                    "provider": replicate_credentials.provider,
@@ -632,6 +653,17 @@ BLOCK_COSTS: dict[Type[Block], list[BlockCost]] = {
                },
            },
        ),
+        BlockCost(
+            cost_amount=14,  # Nano Banana 2: same pricing tier as Pro
+            cost_filter={
+                "model": ImageGenModel.NANO_BANANA_2,
+                "credentials": {
+                    "id": replicate_credentials.id,
+                    "provider": replicate_credentials.provider,
+                    "type": replicate_credentials.type,
+                },
+            },
+        ),
    ],
    AIImageCustomizerBlock: [
        BlockCost(
@@ -656,6 +688,17 @@ BLOCK_COSTS: dict[Type[Block], list[BlockCost]] = {
                },
            },
        ),
+        BlockCost(
+            cost_amount=14,  # Nano Banana 2: same pricing tier as Pro
+            cost_filter={
+                "model": GeminiImageModel.NANO_BANANA_2,
+                "credentials": {
+                    "id": replicate_credentials.id,
+                    "provider": replicate_credentials.provider,
+                    "type": replicate_credentials.type,
+                },
+            },
+        ),
    ],
    VideoNarrationBlock: [
        BlockCost(
--- a/autogpt_platform/backend/backend/data/execution.py
+++ b/autogpt_platform/backend/backend/data/execution.py
@@ -877,12 +877,12 @@ async def get_execution_outputs_by_node_exec_id(
        where={"referencedByOutputExecId": node_exec_id}
    )

-    result = {}
+    result: CompletedBlockOutput = defaultdict(list)
    for output in outputs:
        if output.data is not None:
-            result[output.name] = type_utils.convert(output.data, JsonValue)
+            result[output.name].append(type_utils.convert(output.data, JsonValue))

-    return result
+    return dict(result)


 async def update_graph_execution_start_time(
--- a/autogpt_platform/backend/backend/data/execution_outputs_test.py
+++ b/autogpt_platform/backend/backend/data/execution_outputs_test.py
@@ -0,0 +1,102 @@
+"""Test that get_execution_outputs_by_node_exec_id returns CompletedBlockOutput.
+
+CompletedBlockOutput is dict[str, list[Any]] — values must be lists.
+The RPC service layer validates return types via TypeAdapter, so if
+the function returns plain values instead of lists, it causes:
+
+    1 validation error for dict[str,list[any]] response
+    Input should be a valid list [type=list_type, input_value='', input_type=str]
+
+This breaks SmartDecisionMakerBlock agent mode tool execution.
+"""
+
+from unittest.mock import AsyncMock, MagicMock, patch
+
+import pytest
+from pydantic import TypeAdapter
+
+from backend.data.block import CompletedBlockOutput
+
+
+@pytest.mark.asyncio
+async def test_outputs_are_lists():
+    """Each value in the returned dict must be a list, matching CompletedBlockOutput."""
+    from backend.data.execution import get_execution_outputs_by_node_exec_id
+
+    mock_output = MagicMock()
+    mock_output.name = "response"
+    mock_output.data = "some text output"
+
+    with patch(
+        "backend.data.execution.AgentNodeExecutionInputOutput.prisma"
+    ) as mock_prisma:
+        mock_prisma.return_value.find_many = AsyncMock(return_value=[mock_output])
+        result = await get_execution_outputs_by_node_exec_id("test-exec-id")
+
+    # The result must conform to CompletedBlockOutput = dict[str, list[Any]]
+    assert "response" in result
+    assert isinstance(
+        result["response"], list
+    ), f"Expected list, got {type(result['response']).__name__}: {result['response']!r}"
+
+    # Must also pass TypeAdapter validation (this is what the RPC layer does)
+    adapter = TypeAdapter(CompletedBlockOutput)
+    validated = adapter.validate_python(result)  # Before the fix, this validation failed in prod
+    assert validated == {"response": ["some text output"]}
+
+
+@pytest.mark.asyncio
+async def test_multiple_outputs_same_name_are_collected():
+    """Multiple outputs with the same name should all appear in the list."""
+    from backend.data.execution import get_execution_outputs_by_node_exec_id
+
+    mock_out1 = MagicMock()
+    mock_out1.name = "result"
+    mock_out1.data = "first"
+
+    mock_out2 = MagicMock()
+    mock_out2.name = "result"
+    mock_out2.data = "second"
+
+    with patch(
+        "backend.data.execution.AgentNodeExecutionInputOutput.prisma"
+    ) as mock_prisma:
+        mock_prisma.return_value.find_many = AsyncMock(
+            return_value=[mock_out1, mock_out2]
+        )
+        result = await get_execution_outputs_by_node_exec_id("test-exec-id")
+
+    assert isinstance(result["result"], list)
+    assert len(result["result"]) == 2
+
+
+@pytest.mark.asyncio
+async def test_empty_outputs_returns_empty_dict():
+    """No outputs → empty dict."""
+    from backend.data.execution import get_execution_outputs_by_node_exec_id
+
+    with patch(
+        "backend.data.execution.AgentNodeExecutionInputOutput.prisma"
+    ) as mock_prisma:
+        mock_prisma.return_value.find_many = AsyncMock(return_value=[])
+        result = await get_execution_outputs_by_node_exec_id("test-exec-id")
+
+    assert result == {}
+
+
+@pytest.mark.asyncio
+async def test_none_data_skipped():
+    """Outputs with data=None should be skipped."""
+    from backend.data.execution import get_execution_outputs_by_node_exec_id
+
+    mock_output = MagicMock()
+    mock_output.name = "response"
+    mock_output.data = None
+
+    with patch(
+        "backend.data.execution.AgentNodeExecutionInputOutput.prisma"
+    ) as mock_prisma:
+        mock_prisma.return_value.find_many = AsyncMock(return_value=[mock_output])
+        result = await get_execution_outputs_by_node_exec_id("test-exec-id")
+
+    assert result == {}
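Reviewer note: the four tests above pin down one contract for the output mapping: values are always lists, repeated names accumulate, `None` data is dropped, and no rows yields `{}`. A minimal sketch of that grouping rule follows. It is not the actual body of `get_execution_outputs_by_node_exec_id`; `OutputRow` and `group_outputs` are hypothetical names used only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Iterable


@dataclass
class OutputRow:
    """Hypothetical stand-in for a stored output row (name + data)."""

    name: str
    data: Any


def group_outputs(rows: Iterable[OutputRow]) -> dict:
    """Group rows by name into a dict of lists, skipping rows with no data."""
    grouped = defaultdict(list)
    for row in rows:
        if row.data is None:
            # Rows without data contribute nothing, not even an empty list.
            continue
        grouped[row.name].append(row.data)
    # Return a plain dict so callers never trigger defaultdict auto-creation.
    return dict(grouped)


rows = [OutputRow("result", "first"), OutputRow("result", "second")]
grouped = group_outputs(rows)
```

Returning a plain `dict` rather than the `defaultdict` keeps the shape compatible with a `dict[str, list[Any]]` type such as the `CompletedBlockOutput` alias named in the test comment.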
--- a/autogpt_platform/backend/backend/data/graph.py
+++ b/autogpt_platform/backend/backend/data/graph.py
@@ -36,7 +36,7 @@ from backend.util.models import Pagination
 from backend.util.request import parse_url

 from .block import BlockInput
-from .db import BaseDbModel
+from .db import BaseDbModel, execute_raw_with_schema
 from .db import prisma as db
 from .db import query_raw_with_schema, transaction
 from .dynamic_fields import is_tool_pin, sanitize_pin_name
@@ -1670,15 +1670,15 @@ async def migrate_llm_models(migrate_to: LlmModel):
    # Update each block
    for id, path in llm_model_fields.items():
        query = f"""
-            UPDATE platform."AgentNode"
+            UPDATE {{schema_prefix}}"AgentNode"
            SET "constantInput" = jsonb_set("constantInput", $1, to_jsonb($2), true)
            WHERE "agentBlockId" = $3
            AND "constantInput" ? ($4)::text
            AND "constantInput"->>($4)::text NOT IN {escaped_enum_values}
            """

-        await db.execute_raw(
-            query,  # type: ignore - is supposed to be LiteralString
+        await execute_raw_with_schema(
+            query,
            [path],
            migrate_to.value,
            id,
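Reviewer note: the hunk above relies on f-string brace escaping. Inside an f-string, `{{schema_prefix}}` is emitted as the literal text `{schema_prefix}`, while `{escaped_enum_values}` is interpolated immediately. The sketch below shows that two-pass behaviour. `render_with_schema` is a hypothetical stand-in for whatever formatting `execute_raw_with_schema` performs; its real implementation is not shown in this diff, and the enum values are made up.

```python
from typing import Optional


def render_with_schema(query_template: str, schema: Optional[str]) -> str:
    """Hypothetical second pass: fill the {schema_prefix} placeholder."""
    schema_prefix = f'"{schema}".' if schema else ""
    return query_template.format(schema_prefix=schema_prefix)


# Illustrative values only; the real list is built elsewhere in graph.py.
escaped_enum_values = "('model-a', 'model-b')"

# First pass (f-string): {{ and }} collapse to literal braces, so the
# placeholder survives, while {escaped_enum_values} is substituted now.
query = f"""
    UPDATE {{schema_prefix}}"AgentNode"
    SET "constantInput" = jsonb_set("constantInput", $1, to_jsonb($2), true)
    WHERE "constantInput"->>($4)::text NOT IN {escaped_enum_values}
"""

with_schema = render_with_schema(query, "platform")
without_schema = render_with_schema(query, None)
```

One caveat under this assumption: any literal brace that survives the first pass (for example inside an interpolated value) would be treated as a placeholder by the second `str.format` call, so interpolated fragments should be brace-free.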
--- a/autogpt_platform/backend/backend/data/invited_user.py
+++ b/autogpt_platform/backend/backend/data/invited_user.py
@@ -1,750 +0,0 @@
-import asyncio
-import csv
-import io
-import logging
-import os
-import re
-import socket
-from dataclasses import dataclass
-from datetime import datetime, timezone
-from typing import Any, Literal, Optional
-from uuid import uuid4
-
-import prisma.enums
-import prisma.models
-import prisma.types
-from prisma.errors import UniqueViolationError
-from pydantic import BaseModel, EmailStr, TypeAdapter, ValidationError
-
-from backend.data.db import transaction
-from backend.data.model import User
-from backend.data.redis_client import get_redis_async
-from backend.data.tally import get_business_understanding_input_from_tally, mask_email
-from backend.data.understanding import (
-    BusinessUnderstandingInput,
-    merge_business_understanding_data,
-)
-from backend.data.user import get_user_by_email, get_user_by_id
-from backend.executor.cluster_lock import AsyncClusterLock
-from backend.util.exceptions import (
-    NotAuthorizedError,
-    NotFoundError,
-    PreconditionFailed,
-)
-from backend.util.json import SafeJson
-from backend.util.settings import Settings
-
-logger = logging.getLogger(__name__)
-_settings = Settings()
-
-_WORKER_ID = f"{socket.gethostname()}:{os.getpid()}"
-
-_tally_seed_tasks: dict[str, asyncio.Task] = {}
-_TALLY_STALE_SECONDS = 300
-_MAX_TALLY_ERROR_LENGTH = 200
-_email_adapter = TypeAdapter(EmailStr)
-
-MAX_BULK_INVITE_FILE_BYTES = 1024 * 1024
-MAX_BULK_INVITE_ROWS = 500
-
-
-class InvitedUserRecord(BaseModel):
-    id: str
-    email: str
-    status: prisma.enums.InvitedUserStatus
-    auth_user_id: Optional[str] = None
-    name: Optional[str] = None
-    tally_understanding: Optional[dict[str, Any]] = None
-    tally_status: prisma.enums.TallyComputationStatus
-    tally_computed_at: Optional[datetime] = None
-    tally_error: Optional[str] = None
-    created_at: datetime
-    updated_at: datetime
-
-    @classmethod
-    def from_db(cls, invited_user: "prisma.models.InvitedUser") -> "InvitedUserRecord":
-        payload = (
-            invited_user.tallyUnderstanding
-            if isinstance(invited_user.tallyUnderstanding, dict)
-            else None
-        )
-        return cls(
-            id=invited_user.id,
-            email=invited_user.email,
-            status=invited_user.status,
-            auth_user_id=invited_user.authUserId,
-            name=invited_user.name,
-            tally_understanding=payload,
-            tally_status=invited_user.tallyStatus,
-            tally_computed_at=invited_user.tallyComputedAt,
-            tally_error=invited_user.tallyError,
-            created_at=invited_user.createdAt,
-            updated_at=invited_user.updatedAt,
-        )
-
-
-class BulkInvitedUserRowResult(BaseModel):
-    row_number: int
-    email: Optional[str] = None
-    name: Optional[str] = None
-    status: Literal["CREATED", "SKIPPED", "ERROR"]
-    message: str
-    invited_user: Optional[InvitedUserRecord] = None
-
-
-class BulkInvitedUsersResult(BaseModel):
-    created_count: int
-    skipped_count: int
-    error_count: int
-    results: list[BulkInvitedUserRowResult]
-
-
-@dataclass(frozen=True)
-class _ParsedInviteRow:
-    row_number: int
-    email: str
-    name: Optional[str]
-
-
-def normalize_email(email: str) -> str:
-    return email.strip().lower()
-
-
-def _normalize_name(name: Optional[str]) -> Optional[str]:
-    if name is None:
-        return None
-    normalized = name.strip()
-    return normalized or None
-
-
-def _default_profile_name(email: str, preferred_name: Optional[str]) -> str:
-    if preferred_name:
-        return preferred_name
-    local_part = email.split("@", 1)[0].strip()
-    return local_part or "user"
-
-
-def _sanitize_username_base(email: str) -> str:
-    local_part = email.split("@", 1)[0].lower()
-    sanitized = re.sub(r"[^a-z0-9-]", "", local_part)
-    sanitized = sanitized.strip("-")
-    return sanitized[:40] or "user"
-
-
-async def _generate_unique_profile_username(email: str, tx) -> str:
-    base = _sanitize_username_base(email)
-
-    for _ in range(2):
-        candidate = f"{base}-{uuid4().hex[:6]}"
-        existing = await prisma.models.Profile.prisma(tx).find_unique(
-            where={"username": candidate}
-        )
-        if existing is None:
-            return candidate
-
-    raise RuntimeError(f"Unable to generate unique username for {email}")
-
-
-async def _ensure_default_profile(
-    user_id: str,
-    email: str,
-    preferred_name: Optional[str],
-    tx,
-) -> None:
-    existing_profile = await prisma.models.Profile.prisma(tx).find_unique(
-        where={"userId": user_id}
-    )
-    if existing_profile is not None:
-        return
-
-    username = await _generate_unique_profile_username(email, tx)
-    await prisma.models.Profile.prisma(tx).create(
-        data=prisma.types.ProfileCreateInput(
-            userId=user_id,
-            name=_default_profile_name(email, preferred_name),
-            username=username,
-            description="I'm new here",
-            links=[],
-            avatarUrl="",
-        )
-    )
-
-
-async def _ensure_default_onboarding(user_id: str, tx) -> None:
-    await prisma.models.UserOnboarding.prisma(tx).upsert(
-        where={"userId": user_id},
-        data={
-            "create": prisma.types.UserOnboardingCreateInput(userId=user_id),
-            "update": {},
-        },
-    )
-
-
-async def _apply_tally_understanding(
-    user_id: str,
-    invited_user: "prisma.models.InvitedUser",
-    tx,
-) -> None:
-    if not isinstance(invited_user.tallyUnderstanding, dict):
-        return
-
-    try:
-        input_data = BusinessUnderstandingInput.model_validate(
-            invited_user.tallyUnderstanding
-        )
-    except Exception:
-        logger.warning(
-            "Malformed tallyUnderstanding for invited user %s; skipping",
-            invited_user.id,
-            exc_info=True,
-        )
-        return
-
-    payload = merge_business_understanding_data({}, input_data)
-    await prisma.models.CoPilotUnderstanding.prisma(tx).upsert(
-        where={"userId": user_id},
-        data={
-            "create": {"userId": user_id, "data": SafeJson(payload)},
-            "update": {"data": SafeJson(payload)},
-        },
-    )
-
-
-async def list_invited_users(
-    page: int = 1,
-    page_size: int = 50,
-) -> tuple[list[InvitedUserRecord], int]:
-    total = await prisma.models.InvitedUser.prisma().count()
-    invited_users = await prisma.models.InvitedUser.prisma().find_many(
-        order={"createdAt": "desc"},
-        skip=(page - 1) * page_size,
-        take=page_size,
-    )
-    return [InvitedUserRecord.from_db(iu) for iu in invited_users], total
-
-
-async def create_invited_user(
-    email: str, name: Optional[str] = None
-) -> InvitedUserRecord:
-    normalized_email = normalize_email(email)
-    normalized_name = _normalize_name(name)
-
-    existing_user = await prisma.models.User.prisma().find_unique(
-        where={"email": normalized_email}
-    )
-    if existing_user is not None:
-        raise PreconditionFailed("An active user with this email already exists")
-
-    existing_invited_user = await prisma.models.InvitedUser.prisma().find_unique(
-        where={"email": normalized_email}
-    )
-    if existing_invited_user is not None:
-        raise PreconditionFailed("An invited user with this email already exists")
-
-    try:
-        invited_user = await prisma.models.InvitedUser.prisma().create(
-            data={
-                "email": normalized_email,
-                "name": normalized_name,
-                "status": prisma.enums.InvitedUserStatus.INVITED,
-                "tallyStatus": prisma.enums.TallyComputationStatus.PENDING,
-            }
-        )
-    except UniqueViolationError:
-        raise PreconditionFailed("An invited user with this email already exists")
-    schedule_invited_user_tally_precompute(invited_user.id)
-    return InvitedUserRecord.from_db(invited_user)
-
-
-async def revoke_invited_user(invited_user_id: str) -> InvitedUserRecord:
-    invited_user = await prisma.models.InvitedUser.prisma().find_unique(
-        where={"id": invited_user_id}
-    )
-    if invited_user is None:
-        raise NotFoundError(f"Invited user {invited_user_id} not found")
-
-    if invited_user.status == prisma.enums.InvitedUserStatus.CLAIMED:
-        raise PreconditionFailed("Claimed invited users cannot be revoked")
-
-    if invited_user.status == prisma.enums.InvitedUserStatus.REVOKED:
-        return InvitedUserRecord.from_db(invited_user)
-
-    revoked_user = await prisma.models.InvitedUser.prisma().update(
-        where={"id": invited_user_id},
-        data={"status": prisma.enums.InvitedUserStatus.REVOKED},
-    )
-    if revoked_user is None:
-        raise NotFoundError(f"Invited user {invited_user_id} not found")
-    return InvitedUserRecord.from_db(revoked_user)
-
-
-async def retry_invited_user_tally(invited_user_id: str) -> InvitedUserRecord:
-    invited_user = await prisma.models.InvitedUser.prisma().find_unique(
-        where={"id": invited_user_id}
-    )
-    if invited_user is None:
-        raise NotFoundError(f"Invited user {invited_user_id} not found")
-
-    if invited_user.status == prisma.enums.InvitedUserStatus.REVOKED:
-        raise PreconditionFailed("Revoked invited users cannot retry Tally seeding")
-
-    refreshed_user = await prisma.models.InvitedUser.prisma().update(
-        where={"id": invited_user_id},
-        data={
-            "tallyUnderstanding": None,
-            "tallyStatus": prisma.enums.TallyComputationStatus.PENDING,
-            "tallyComputedAt": None,
-            "tallyError": None,
-        },
-    )
-    if refreshed_user is None:
-        raise NotFoundError(f"Invited user {invited_user_id} not found")
-    schedule_invited_user_tally_precompute(invited_user_id)
-    return InvitedUserRecord.from_db(refreshed_user)
-
-
-def _decode_bulk_invite_file(content: bytes) -> str:
-    if len(content) > MAX_BULK_INVITE_FILE_BYTES:
-        raise ValueError("Invite file exceeds the maximum size of 1 MB")
-
-    try:
-        return content.decode("utf-8-sig")
-    except UnicodeDecodeError as exc:
-        raise ValueError("Invite file must be UTF-8 encoded") from exc
-
-
-def _parse_bulk_invite_csv(text: str) -> list[_ParsedInviteRow]:
-    indexed_rows: list[tuple[int, list[str]]] = []
-
-    for row_number, row in enumerate(csv.reader(io.StringIO(text)), start=1):
-        normalized_row = [cell.strip() for cell in row]
-        if any(normalized_row):
-            indexed_rows.append((row_number, normalized_row))
-
-    if not indexed_rows:
-        return []
-
-    header = [cell.lower() for cell in indexed_rows[0][1]]
-    has_header = "email" in header
-    email_index = header.index("email") if has_header else 0
-    name_index: Optional[int] = (
-        header.index("name")
-        if has_header and "name" in header
-        else (1 if not has_header else None)
-    )
-    data_rows = indexed_rows[1:] if has_header else indexed_rows
-
-    parsed_rows: list[_ParsedInviteRow] = []
-    for row_number, row in data_rows:
-        if len(parsed_rows) >= MAX_BULK_INVITE_ROWS:
-            break
-        email = row[email_index].strip() if len(row) > email_index else ""
-        name = (
-            row[name_index].strip()
-            if name_index is not None and len(row) > name_index
-            else ""
-        )
-        parsed_rows.append(
-            _ParsedInviteRow(
-                row_number=row_number,
-                email=email,
-                name=name or None,
-            )
-        )
-
-    return parsed_rows
-
-
-def _parse_bulk_invite_text(text: str) -> list[_ParsedInviteRow]:
-    parsed_rows: list[_ParsedInviteRow] = []
-
-    for row_number, raw_line in enumerate(text.splitlines(), start=1):
-        if len(parsed_rows) >= MAX_BULK_INVITE_ROWS:
-            break
-        line = raw_line.strip()
-        if not line or line.startswith("#"):
-            continue
-
-        parsed_rows.append(
-            _ParsedInviteRow(
-                row_number=row_number,
-                email=line,
-                name=None,
-            )
-        )
-
-    return parsed_rows
-
-
-def _parse_bulk_invite_file(
-    filename: Optional[str],
-    content: bytes,
-) -> list[_ParsedInviteRow]:
-    text = _decode_bulk_invite_file(content)
-    file_name = filename.lower() if filename else ""
-    parsed_rows = (
-        _parse_bulk_invite_csv(text)
-        if file_name.endswith(".csv")
-        else _parse_bulk_invite_text(text)
-    )
-
-    if not parsed_rows:
-        raise ValueError("Invite file did not contain any emails")
-
-    return parsed_rows
-
-
-async def bulk_create_invited_users_from_file(
-    filename: Optional[str],
-    content: bytes,
-) -> BulkInvitedUsersResult:
-    parsed_rows = _parse_bulk_invite_file(filename, content)
-
-    created_count = 0
-    skipped_count = 0
-    error_count = 0
-    results: list[BulkInvitedUserRowResult] = []
-    seen_emails: set[str] = set()
-
-    for row in parsed_rows:
-        row_name = _normalize_name(row.name)
-
-        try:
-            validated_email = _email_adapter.validate_python(row.email)
-        except ValidationError:
-            error_count += 1
-            results.append(
-                BulkInvitedUserRowResult(
-                    row_number=row.row_number,
-                    email=row.email or None,
-                    name=row_name,
-                    status="ERROR",
-                    message="Invalid email address",
-                )
-            )
-            continue
-
-        normalized_email = normalize_email(str(validated_email))
-        if normalized_email in seen_emails:
-            skipped_count += 1
-            results.append(
-                BulkInvitedUserRowResult(
-                    row_number=row.row_number,
-                    email=normalized_email,
-                    name=row_name,
-                    status="SKIPPED",
-                    message="Duplicate email in upload file",
-                )
-            )
-            continue
-
-        seen_emails.add(normalized_email)
-
-        try:
-            invited_user = await create_invited_user(normalized_email, row_name)
-        except PreconditionFailed as exc:
-            skipped_count += 1
-            results.append(
-                BulkInvitedUserRowResult(
-                    row_number=row.row_number,
-                    email=normalized_email,
-                    name=row_name,
-                    status="SKIPPED",
-                    message=str(exc),
-                )
-            )
-        except Exception:
-            masked = mask_email(normalized_email)
-            logger.exception(
-                "Failed to create bulk invite for row %s (%s)",
-                row.row_number,
-                masked,
-            )
-            error_count += 1
-            results.append(
-                BulkInvitedUserRowResult(
-                    row_number=row.row_number,
-                    email=normalized_email,
-                    name=row_name,
-                    status="ERROR",
-                    message="Unexpected error creating invite",
-                )
-            )
-        else:
-            created_count += 1
-            results.append(
-                BulkInvitedUserRowResult(
-                    row_number=row.row_number,
-                    email=normalized_email,
-                    name=row_name,
-                    status="CREATED",
-                    message="Invite created",
-                    invited_user=invited_user,
-                )
-            )
-
-    return BulkInvitedUsersResult(
-        created_count=created_count,
-        skipped_count=skipped_count,
-        error_count=error_count,
-        results=results,
-    )
-
-
-async def _compute_invited_user_tally_seed(invited_user_id: str) -> None:
-    invited_user = await prisma.models.InvitedUser.prisma().find_unique(
-        where={"id": invited_user_id}
-    )
-    if invited_user is None:
-        return
-
-    if invited_user.status == prisma.enums.InvitedUserStatus.REVOKED:
-        return
-
-    try:
-        r = await get_redis_async()
-    except Exception:
-        r = None
-
-    lock: AsyncClusterLock | None = None
-
-    if r is not None:
-        lock = AsyncClusterLock(
-            redis=r,
-            key=f"tally_seed:{invited_user_id}",
-            owner_id=_WORKER_ID,
-            timeout=_TALLY_STALE_SECONDS,
-        )
-        current_owner = await lock.try_acquire()
-
-        if current_owner is None:
-            logger.warning("Redis unavailable for tally lock - skipping tally enrichment")
-            return
-        elif current_owner != _WORKER_ID:
-            logger.debug(
-                "Tally seed for %s already locked by %s, skipping",
-                invited_user_id,
-                current_owner,
-            )
-            return
-    if (
-        invited_user.tallyStatus == prisma.enums.TallyComputationStatus.RUNNING
-        and invited_user.updatedAt is not None
-    ):
-        age = (datetime.now(timezone.utc) - invited_user.updatedAt).total_seconds()
-        if age < _TALLY_STALE_SECONDS:
-            logger.debug(
-                "Tally task for %s still RUNNING (age=%ds), skipping",
-                invited_user_id,
-                int(age),
-            )
-            return
-        logger.info(
-            "Tally task for %s is stale (age=%ds), re-running",
-            invited_user_id,
-            int(age),
-        )
-
-    await prisma.models.InvitedUser.prisma().update(
-        where={"id": invited_user_id},
-        data={
-            "tallyStatus": prisma.enums.TallyComputationStatus.RUNNING,
-            "tallyError": None,
-        },
-    )
-
-    try:
-        input_data = await get_business_understanding_input_from_tally(
-            invited_user.email,
-            require_api_key=True,
-        )
-        payload = (
-            SafeJson(input_data.model_dump(exclude_none=True))
-            if input_data is not None
-            else None
-        )
-        await prisma.models.InvitedUser.prisma().update(
-            where={"id": invited_user_id},
-            data={
-                "tallyUnderstanding": payload,
-                "tallyStatus": prisma.enums.TallyComputationStatus.READY,
-                "tallyComputedAt": datetime.now(timezone.utc),
-                "tallyError": None,
-            },
-        )
-    except Exception as exc:
-        logger.exception(
-            "Failed to compute Tally understanding for invited user %s",
-            invited_user_id,
-        )
-        sanitized_error = re.sub(
-            r"https?://\S+", "<url>", f"{type(exc).__name__}: {exc}"
-        )[:_MAX_TALLY_ERROR_LENGTH]
-        await prisma.models.InvitedUser.prisma().update(
-            where={"id": invited_user_id},
-            data={
-                "tallyStatus": prisma.enums.TallyComputationStatus.FAILED,
-                "tallyError": sanitized_error,
-            },
-        )
-
-
-def schedule_invited_user_tally_precompute(invited_user_id: str) -> None:
-    existing = _tally_seed_tasks.get(invited_user_id)
-    if existing is not None and not existing.done():
-        logger.debug("Tally task already running for %s, skipping", invited_user_id)
-        return
-
-    task = asyncio.create_task(_compute_invited_user_tally_seed(invited_user_id))
-    _tally_seed_tasks[invited_user_id] = task
-
-    def _on_done(t: asyncio.Task, _id: str = invited_user_id) -> None:
-        if _tally_seed_tasks.get(_id) is t:
-            del _tally_seed_tasks[_id]
-
-    task.add_done_callback(_on_done)
-
-
-async def _open_signup_create_user(
-    auth_user_id: str,
-    normalized_email: str,
-    metadata_name: Optional[str],
-) -> User:
-    """Create a user without requiring an invite (open signup mode)."""
-    preferred_name = _normalize_name(metadata_name)
-    try:
-        async with transaction() as tx:
-            user = await prisma.models.User.prisma(tx).create(
-                data=prisma.types.UserCreateInput(
-                    id=auth_user_id,
-                    email=normalized_email,
-                    name=preferred_name,
-                )
-            )
-            await _ensure_default_profile(
-                auth_user_id, normalized_email, preferred_name, tx
-            )
-            await _ensure_default_onboarding(auth_user_id, tx)
-    except UniqueViolationError:
-        existing = await prisma.models.User.prisma().find_unique(
-            where={"id": auth_user_id}
-        )
-        if existing is not None:
-            return User.from_db(existing)
-        raise
-
-    return User.from_db(user)
-
-
-# TODO: We need to change this function's logic before going live
-async def get_or_activate_user(user_data: dict) -> User:
-    auth_user_id = user_data.get("sub")
-    if not auth_user_id:
-        raise NotAuthorizedError("User ID not found in token")
-
-    auth_email = user_data.get("email")
-    if not auth_email:
-        raise NotAuthorizedError("Email not found in token")
-
-    normalized_email = normalize_email(auth_email)
-    user_metadata = user_data.get("user_metadata")
-    metadata_name = (
-        user_metadata.get("name") if isinstance(user_metadata, dict) else None
-    )
-
-    existing_user = None
-    try:
-        existing_user = await get_user_by_id(auth_user_id)
-    except ValueError:
-        existing_user = None
-    except Exception:
-        logger.exception("Error on get user by id during tally enrichment process")
-        raise
-
-    if existing_user is not None:
-        return existing_user
-
-    if not _settings.config.enable_invite_gate or normalized_email.endswith("@agpt.co"):
-        return await _open_signup_create_user(
-            auth_user_id, normalized_email, metadata_name
-        )
-
-    invited_user = await prisma.models.InvitedUser.prisma().find_unique(
-        where={"email": normalized_email}
-    )
-    if invited_user is None:
-        raise NotAuthorizedError("Your email is not allowed to access the platform")
-
-    if invited_user.status != prisma.enums.InvitedUserStatus.INVITED:
-        raise NotAuthorizedError("Your invitation is no longer active")
-
-    try:
-        async with transaction() as tx:
-            current_user = await prisma.models.User.prisma(tx).find_unique(
-                where={"id": auth_user_id}
-            )
-            if current_user is not None:
-                return User.from_db(current_user)
-
-            current_invited_user = await prisma.models.InvitedUser.prisma(
-                tx
-            ).find_unique(where={"email": normalized_email})
-            if current_invited_user is None:
-                raise NotAuthorizedError(
-                    "Your email is not allowed to access the platform"
-                )
-
-            if current_invited_user.status != prisma.enums.InvitedUserStatus.INVITED:
-                raise NotAuthorizedError("Your invitation is no longer active")
-
-            if current_invited_user.authUserId not in (None, auth_user_id):
-                raise NotAuthorizedError("Your invitation has already been claimed")
-
-            preferred_name = current_invited_user.name or _normalize_name(metadata_name)
-            await prisma.models.User.prisma(tx).create(
-                data=prisma.types.UserCreateInput(
-                    id=auth_user_id,
-                    email=normalized_email,
-                    name=preferred_name,
-                )
-            )
-
-            await prisma.models.InvitedUser.prisma(tx).update(
-                where={"id": current_invited_user.id},
-                data={
-                    "status": prisma.enums.InvitedUserStatus.CLAIMED,
-                    "authUserId": auth_user_id,
-                },
-            )
-
-            await _ensure_default_profile(
-                auth_user_id,
-                normalized_email,
-                preferred_name,
-                tx,
-            )
-            await _ensure_default_onboarding(auth_user_id, tx)
-            await _apply_tally_understanding(auth_user_id, current_invited_user, tx)
-    except UniqueViolationError:
-        logger.info("Concurrent activation for user %s; re-fetching", auth_user_id)
-        already_created = await prisma.models.User.prisma().find_unique(
-            where={"id": auth_user_id}
-        )
-        if already_created is not None:
-            return User.from_db(already_created)
-        raise RuntimeError(
-            f"UniqueViolationError during activation but user {auth_user_id} not found"
-        )
-
-    get_user_by_id.cache_delete(auth_user_id)
-    get_user_by_email.cache_delete(normalized_email)
-
-    activated_user = await prisma.models.User.prisma().find_unique(
-        where={"id": auth_user_id}
-    )
-    if activated_user is None:
-        raise RuntimeError(
-            f"Activated user {auth_user_id} was not found after creation"
-        )
-
-    return User.from_db(activated_user)
--- a/autogpt_platform/backend/backend/data/invited_user_test.py
+++ b/autogpt_platform/backend/backend/data/invited_user_test.py
@@ -1,335 +0,0 @@
-from contextlib import asynccontextmanager
-from datetime import datetime, timezone
-from types import SimpleNamespace
-from typing import cast
-from unittest.mock import AsyncMock, Mock
-
-import prisma.enums
-import prisma.models
-import pytest
-import pytest_mock
-
-from backend.util.exceptions import NotAuthorizedError, PreconditionFailed
-
-from .invited_user import (
-    InvitedUserRecord,
-    bulk_create_invited_users_from_file,
-    create_invited_user,
-    get_or_activate_user,
-    retry_invited_user_tally,
-)
-
-
-def _invited_user_db_record(
-    *,
-    status: prisma.enums.InvitedUserStatus = prisma.enums.InvitedUserStatus.INVITED,
-    tally_understanding: dict | None = None,
-):
-    now = datetime.now(timezone.utc)
-    return SimpleNamespace(
-        id="invite-1",
-        email="invited@example.com",
-        status=status,
-        authUserId=None,
-        name="Invited User",
-        tallyUnderstanding=tally_understanding,
-        tallyStatus=prisma.enums.TallyComputationStatus.PENDING,
-        tallyComputedAt=None,
-        tallyError=None,
-        createdAt=now,
-        updatedAt=now,
-    )
-
-
-def _invited_user_record(
-    *,
-    status: prisma.enums.InvitedUserStatus = prisma.enums.InvitedUserStatus.INVITED,
-    tally_understanding: dict | None = None,
-):
-    return InvitedUserRecord.from_db(
-        cast(
-            prisma.models.InvitedUser,
-            _invited_user_db_record(
-                status=status,
-                tally_understanding=tally_understanding,
-            ),
-        )
-    )
-
-
-def _user_db_record():
-    now = datetime.now(timezone.utc)
-    return SimpleNamespace(
-        id="auth-user-1",
-        email="invited@example.com",
-        emailVerified=True,
-        name="Invited User",
-        createdAt=now,
-        updatedAt=now,
-        metadata={},
-        integrations="",
-        stripeCustomerId=None,
-        topUpConfig=None,
-        maxEmailsPerDay=3,
-        notifyOnAgentRun=True,
-        notifyOnZeroBalance=True,
-        notifyOnLowBalance=True,
-        notifyOnBlockExecutionFailed=True,
-        notifyOnContinuousAgentError=True,
-        notifyOnDailySummary=True,
-        notifyOnWeeklySummary=True,
-        notifyOnMonthlySummary=True,
-        notifyOnAgentApproved=True,
-        notifyOnAgentRejected=True,
-        timezone="not-set",
-    )
-
-
-@pytest.mark.asyncio
-async def test_create_invited_user_rejects_existing_active_user(
-    mocker: pytest_mock.MockerFixture,
-) -> None:
-    user_repo = Mock()
-    user_repo.find_unique = AsyncMock(return_value=_user_db_record())
-    invited_user_repo = Mock()
-    invited_user_repo.find_unique = AsyncMock()
-
-    mocker.patch(
-        "backend.data.invited_user.prisma.models.User.prisma", return_value=user_repo
-    )
-    mocker.patch(
-        "backend.data.invited_user.prisma.models.InvitedUser.prisma",
-        return_value=invited_user_repo,
-    )
-
-    with pytest.raises(PreconditionFailed):
-        await create_invited_user("Invited@example.com")
-
-
-@pytest.mark.asyncio
-async def test_create_invited_user_schedules_tally_seed(
-    mocker: pytest_mock.MockerFixture,
-) -> None:
-    user_repo = Mock()
-    user_repo.find_unique = AsyncMock(return_value=None)
-    invited_user_repo = Mock()
-    invited_user_repo.find_unique = AsyncMock(return_value=None)
-    invited_user_repo.create = AsyncMock(return_value=_invited_user_db_record())
-    schedule = mocker.patch(
-        "backend.data.invited_user.schedule_invited_user_tally_precompute"
-    )
-
-    mocker.patch(
-        "backend.data.invited_user.prisma.models.User.prisma", return_value=user_repo
-    )
-    mocker.patch(
-        "backend.data.invited_user.prisma.models.InvitedUser.prisma",
-        return_value=invited_user_repo,
-    )
-
-    invited_user = await create_invited_user("Invited@example.com", "Invited User")
-
-    assert invited_user.email == "invited@example.com"
-    invited_user_repo.create.assert_awaited_once()
-    schedule.assert_called_once_with("invite-1")
-
-
-@pytest.mark.asyncio
-async def test_retry_invited_user_tally_resets_state_and_schedules(
-    mocker: pytest_mock.MockerFixture,
-) -> None:
-    invited_user_repo = Mock()
-    invited_user_repo.find_unique = AsyncMock(return_value=_invited_user_db_record())
-    invited_user_repo.update = AsyncMock(return_value=_invited_user_db_record())
-    schedule = mocker.patch(
-        "backend.data.invited_user.schedule_invited_user_tally_precompute"
-    )
-
-    mocker.patch(
-        "backend.data.invited_user.prisma.models.InvitedUser.prisma",
-        return_value=invited_user_repo,
-    )
-
-    invited_user = await retry_invited_user_tally("invite-1")
-
-    assert invited_user.id == "invite-1"
-    invited_user_repo.update.assert_awaited_once()
-    schedule.assert_called_once_with("invite-1")
-
-
-@pytest.mark.asyncio
-async def test_get_or_activate_user_requires_invite(
-    mocker: pytest_mock.MockerFixture,
-) -> None:
-    invited_user_repo = Mock()
-    invited_user_repo.find_unique = AsyncMock(return_value=None)
-
-    mock_get_user_by_id = AsyncMock(side_effect=ValueError("User not found"))
-    mock_get_user_by_id.cache_delete = Mock()
-    mocker.patch(
-        "backend.data.invited_user.get_user_by_id",
-        mock_get_user_by_id,
-    )
-    mocker.patch(
-        "backend.data.invited_user._settings.config.enable_invite_gate",
-        True,
-    )
-    mocker.patch(
-        "backend.data.invited_user.prisma.models.InvitedUser.prisma",
-        return_value=invited_user_repo,
-    )
-
-    with pytest.raises(NotAuthorizedError):
-        await get_or_activate_user(
-            {"sub": "auth-user-1", "email": "invited@example.com"}
-        )
-
-
-@pytest.mark.asyncio
-async def test_get_or_activate_user_creates_user_from_invite(
-    mocker: pytest_mock.MockerFixture,
-) -> None:
-    tx = object()
-    invited_user = _invited_user_db_record(
-        tally_understanding={"user_name": "Invited User", "industry": "Automation"}
-    )
-    created_user = _user_db_record()
-
-    outside_user_repo = Mock()
-    # Only called once at post-transaction verification (line 741);
-    # get_user_by_id (line 657) uses prisma.user.find_unique, not this mock.
-    outside_user_repo.find_unique = AsyncMock(return_value=created_user)
-
-    inside_user_repo = Mock()
-    inside_user_repo.find_unique = AsyncMock(return_value=None)
-    inside_user_repo.create = AsyncMock(return_value=created_user)
-
-    outside_invited_repo = Mock()
-    outside_invited_repo.find_unique = AsyncMock(return_value=invited_user)
-
-    inside_invited_repo = Mock()
-    inside_invited_repo.find_unique = AsyncMock(return_value=invited_user)
-    inside_invited_repo.update = AsyncMock(return_value=invited_user)
-
-    def user_prisma(client=None):
-        return inside_user_repo if client is tx else outside_user_repo
-
-    def invited_user_prisma(client=None):
-        return inside_invited_repo if client is tx else outside_invited_repo
-
-    @asynccontextmanager
-    async def fake_transaction():
-        yield tx
-
-    # Mock get_user_by_id since it uses prisma.user.find_unique (global client),
-    # not prisma.models.User.prisma().find_unique which we mock above.
-    mock_get_user_by_id = AsyncMock(side_effect=ValueError("User not found"))
-    mock_get_user_by_id.cache_delete = Mock()
-    mocker.patch(
-        "backend.data.invited_user.get_user_by_id",
-        mock_get_user_by_id,
-    )
-    mock_get_user_by_email = AsyncMock()
-    mock_get_user_by_email.cache_delete = Mock()
-    mocker.patch(
-        "backend.data.invited_user.get_user_by_email",
-        mock_get_user_by_email,
-    )
-    ensure_profile = mocker.patch(
-        "backend.data.invited_user._ensure_default_profile",
-        AsyncMock(),
-    )
-    ensure_onboarding = mocker.patch(
-        "backend.data.invited_user._ensure_default_onboarding",
-        AsyncMock(),
-    )
-    apply_tally = mocker.patch(
-        "backend.data.invited_user._apply_tally_understanding",
-        AsyncMock(),
-    )
-    mocker.patch("backend.data.invited_user.transaction", fake_transaction)
-    mocker.patch(
-        "backend.data.invited_user.prisma.models.User.prisma", side_effect=user_prisma
-    )
-    mocker.patch(
-        "backend.data.invited_user.prisma.models.InvitedUser.prisma",
-        side_effect=invited_user_prisma,
-    )
-
-    user = await get_or_activate_user(
-        {
-            "sub": "auth-user-1",
-            "email": "Invited@example.com",
-            "user_metadata": {"name": "Invited User"},
-        }
-    )
-
-    assert user.id == "auth-user-1"
-    inside_user_repo.create.assert_awaited_once()
-    inside_invited_repo.update.assert_awaited_once()
-    ensure_profile.assert_awaited_once()
-    ensure_onboarding.assert_awaited_once_with("auth-user-1", tx)
-    apply_tally.assert_awaited_once_with("auth-user-1", invited_user, tx)
-
-
-@pytest.mark.asyncio
-async def test_bulk_create_invited_users_from_text_file(
-    mocker: pytest_mock.MockerFixture,
-) -> None:
-    create_invited = mocker.patch(
-        "backend.data.invited_user.create_invited_user",
-        AsyncMock(
-            side_effect=[
-                _invited_user_record(),
-                _invited_user_record(),
-            ]
-        ),
-    )
-
-    result = await bulk_create_invited_users_from_file(
-        "invites.txt",
-        b"Invited@example.com\nsecond@example.com\n",
-    )
-
-    assert result.created_count == 2
-    assert result.skipped_count == 0
-    assert result.error_count == 0
-    assert [row.status for row in result.results] == ["CREATED", "CREATED"]
-    assert create_invited.await_count == 2
-
-
-@pytest.mark.asyncio
-async def test_bulk_create_invited_users_handles_csv_duplicates_and_invalid_rows(
-    mocker: pytest_mock.MockerFixture,
-) -> None:
-    create_invited = mocker.patch(
-        "backend.data.invited_user.create_invited_user",
-        AsyncMock(
-            side_effect=[
-                _invited_user_record(),
-                PreconditionFailed("An invited user with this email already exists"),
-            ]
-        ),
-    )
-
-    result = await bulk_create_invited_users_from_file(
-        "invites.csv",
-        (
-            "email,name\n"
-            "valid@example.com,Valid User\n"
-            "not-an-email,Bad Row\n"
-            "valid@example.com,Duplicate In File\n"
-            "existing@example.com,Existing User\n"
-        ).encode("utf-8"),
-    )
-
-    assert result.created_count == 1
-    assert result.skipped_count == 2
-    assert result.error_count == 1
-    assert [row.status for row in result.results] == [
-        "CREATED",
-        "ERROR",
-        "SKIPPED",
-        "SKIPPED",
-    ]
-    assert create_invited.await_count == 2
--- a/autogpt_platform/backend/backend/data/tally.py
+++ b/autogpt_platform/backend/backend/data/tally.py
@@ -41,7 +41,7 @@ _MAX_PAGES = 100
 _LLM_TIMEOUT = 30


-def mask_email(email: str) -> str:
+def _mask_email(email: str) -> str:
    """Mask an email for safe logging: 'alice@example.com' -> 'a***e@example.com'."""
    try:
        local, domain = email.rsplit("@", 1)
@@ -196,7 +196,8 @@ async def _refresh_cache(form_id: str) -> tuple[dict, list]:

    Returns (email_index, questions).
    """
-    client = _make_tally_client(_settings.secrets.tally_api_key)
+    settings = Settings()
+    client = _make_tally_client(settings.secrets.tally_api_key)

    redis = await get_redis_async()
    last_fetch_key = _LAST_FETCH_KEY.format(form_id=form_id)
@@ -331,9 +332,6 @@ Fields:
 - current_software (list of strings): software/tools currently used
 - existing_automation (list of strings): existing automations
 - additional_notes (string): any additional context
- suggested_prompts (list of 5 strings): short action prompts (each under 20 words) that would help \
-this person get started with automating their work. Should be specific to their industry, role, and \
-pain points; actionable and conversational in tone; focused on automation opportunities.

 Form data:
 """
@@ -341,21 +339,21 @@ Form data:
 _EXTRACTION_SUFFIX = "\n\nReturn ONLY valid JSON."


-async def extract_business_understanding_from_tally(
+async def extract_business_understanding(
    formatted_text: str,
 ) -> BusinessUnderstandingInput:
-    """
-    Use an LLM to extract structured business understanding from form text.
+    """Use an LLM to extract structured business understanding from form text.

    Raises on timeout or unparseable response so the caller can handle it.
    """
-    api_key = _settings.secrets.open_router_api_key
+    settings = Settings()
+    api_key = settings.secrets.open_router_api_key
    client = AsyncOpenAI(api_key=api_key, base_url=OPENROUTER_BASE_URL)

    try:
        response = await asyncio.wait_for(
            client.chat.completions.create(
-                model=_settings.config.tally_extraction_llm_model,
+                model="openai/gpt-4o-mini",
                messages=[
                    {
                        "role": "user",
@@ -380,57 +378,9 @@ async def extract_business_understanding_from_tally(

    # Filter out null values before constructing
    cleaned = {k: v for k, v in data.items() if v is not None}
-
-    # Validate suggested_prompts: filter >20 words, keep top 3
-    raw_prompts = cleaned.get("suggested_prompts", [])
-    if isinstance(raw_prompts, list):
-        valid = [
-            p.strip()
-            for p in raw_prompts
-            if isinstance(p, str) and len(p.strip().split()) <= 20
-        ]
-        # This will keep up to 3 suggestions
-        short_prompts = valid[:3] if valid else None
-        if short_prompts:
-            cleaned["suggested_prompts"] = short_prompts
-        else:
-            # We dont want to add a None value suggested_prompts field
-            cleaned.pop("suggested_prompts", None)
-    else:
-        # suggested_prompts must be a list - removing it as its not here
-        cleaned.pop("suggested_prompts", None)
-
    return BusinessUnderstandingInput(**cleaned)


-async def get_business_understanding_input_from_tally(
-    email: str,
-    *,
-    require_api_key: bool = False,
-) -> Optional[BusinessUnderstandingInput]:
-    if not _settings.secrets.tally_api_key:
-        if require_api_key:
-            raise RuntimeError("Tally API key is not configured")
-        logger.debug("Tally: no API key configured, skipping")
-        return None
-
-    masked = mask_email(email)
-    result = await find_submission_by_email(TALLY_FORM_ID, email)
-    if result is None:
-        logger.debug(f"Tally: no submission found for {masked}")
-        return None
-
-    submission, questions = result
-    logger.info(f"Tally: found submission for {masked}, extracting understanding")
-
-    formatted = format_submission_for_llm(submission, questions)
-    if not formatted.strip():
-        logger.warning("Tally: formatted submission was empty, skipping")
-        return None
-
-    return await extract_business_understanding_from_tally(formatted)
-
-
 async def populate_understanding_from_tally(user_id: str, email: str) -> None:
    """Main orchestrator: check Tally for a matching submission and populate understanding.

@@ -445,9 +395,32 @@ async def populate_understanding_from_tally(user_id: str, email: str) -> None:
            )
            return

-        understanding_input = await get_business_understanding_input_from_tally(email)
-        if understanding_input is None:
+        # Check required config is present
+        settings = Settings()
+        if not settings.secrets.tally_api_key or not settings.secrets.tally_form_id:
+            logger.debug("Tally: Tally config incomplete, skipping")
            return
+        if not settings.secrets.open_router_api_key:
+            logger.debug("Tally: no OpenRouter API key configured, skipping")
+            return
+
+        # Look up submission by email
+        masked = _mask_email(email)
+        result = await find_submission_by_email(settings.secrets.tally_form_id, email)
+        if result is None:
+            logger.debug(f"Tally: no submission found for {masked}")
+            return
+
+        submission, questions = result
+        logger.info(f"Tally: found submission for {masked}, extracting understanding")
+
+        # Format and extract
+        formatted = format_submission_for_llm(submission, questions)
+        if not formatted.strip():
+            logger.warning("Tally: formatted submission was empty, skipping")
+            return
+
+        understanding_input = await extract_business_understanding(formatted)

        # Upsert into database
        await upsert_business_understanding(user_id, understanding_input)
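
Note on `_mask_email`: the masking rule described in its docstring can be sketched as below. This is a minimal illustration, not the function from `tally.py`. The name `mask_email_sketch` is hypothetical, and the handling of one- and two-character local parts is an assumption inferred from the assertions in `tally_test.py`.

```python
def mask_email_sketch(email: str) -> str:
    """Mask an email for logging: 'alice@example.com' -> 'a***e@example.com'."""
    try:
        # rsplit yields a single element when there is no "@",
        # so the two-name unpacking raises ValueError.
        local, domain = email.rsplit("@", 1)
    except ValueError:
        return "***"

    if len(local) <= 2:
        # Assumption: short local parts keep only their first character.
        masked = local[:1] + "***"
    else:
        # Keep the first and last characters, hide everything in between.
        masked = local[0] + "***" + local[-1]
    return f"{masked}@{domain}"
```

Only the local part is masked; the domain stays readable so log lines remain useful for debugging.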
--- a/autogpt_platform/backend/backend/data/tally_test.py
+++ b/autogpt_platform/backend/backend/data/tally_test.py
@@ -12,11 +12,11 @@ from backend.data.tally import (
    _build_email_index,
    _format_answer,
    _make_tally_client,
+    _mask_email,
    _refresh_cache,
-    extract_business_understanding_from_tally,
+    extract_business_understanding,
    find_submission_by_email,
    format_submission_for_llm,
-    mask_email,
    populate_understanding_from_tally,
 )

@@ -248,7 +248,7 @@ async def test_populate_understanding_skips_no_api_key():
            new_callable=AsyncMock,
            return_value=None,
        ),
-        patch("backend.data.tally._settings", mock_settings),
+        patch("backend.data.tally.Settings", return_value=mock_settings),
        patch(
            "backend.data.tally.find_submission_by_email",
            new_callable=AsyncMock,
@@ -284,7 +284,6 @@ async def test_populate_understanding_full_flow():
        ],
    }
    mock_input = MagicMock()
-    mock_input.suggested_prompts = ["Prompt 1", "Prompt 2", "Prompt 3"]

    with (
        patch(
@@ -292,14 +291,14 @@ async def test_populate_understanding_full_flow():
            new_callable=AsyncMock,
            return_value=None,
        ),
-        patch("backend.data.tally._settings", mock_settings),
+        patch("backend.data.tally.Settings", return_value=mock_settings),
        patch(
            "backend.data.tally.find_submission_by_email",
            new_callable=AsyncMock,
            return_value=(submission, SAMPLE_QUESTIONS),
        ),
        patch(
-            "backend.data.tally.extract_business_understanding_from_tally",
+            "backend.data.tally.extract_business_understanding",
            new_callable=AsyncMock,
            return_value=mock_input,
        ) as mock_extract,
@@ -332,14 +331,14 @@ async def test_populate_understanding_handles_llm_timeout():
            new_callable=AsyncMock,
            return_value=None,
        ),
-        patch("backend.data.tally._settings", mock_settings),
+        patch("backend.data.tally.Settings", return_value=mock_settings),
        patch(
            "backend.data.tally.find_submission_by_email",
            new_callable=AsyncMock,
            return_value=(submission, SAMPLE_QUESTIONS),
        ),
        patch(
-            "backend.data.tally.extract_business_understanding_from_tally",
+            "backend.data.tally.extract_business_understanding",
            new_callable=AsyncMock,
            side_effect=asyncio.TimeoutError(),
        ),
@@ -357,13 +356,13 @@ async def test_populate_understanding_handles_llm_timeout():


 def test_mask_email():
-    assert mask_email("alice@example.com") == "a***e@example.com"
-    assert mask_email("ab@example.com") == "a***@example.com"
-    assert mask_email("a@example.com") == "a***@example.com"
+    assert _mask_email("alice@example.com") == "a***e@example.com"
+    assert _mask_email("ab@example.com") == "a***@example.com"
+    assert _mask_email("a@example.com") == "a***@example.com"


 def test_mask_email_invalid():
-    assert mask_email("no-at-sign") == "***"
+    assert _mask_email("no-at-sign") == "***"


 # ── Prompt construction (curly-brace safety) ─────────────────────────────────
@@ -394,11 +393,11 @@ def test_extraction_prompt_no_format_placeholders():
    assert single_braces == [], f"Found format placeholders: {single_braces}"


-# ── extract_business_understanding_from_tally ────────────────────────────────────────────
+# ── extract_business_understanding ────────────────────────────────────────────


 @pytest.mark.asyncio
-async def test_extract_business_understanding_from_tally_success():
+async def test_extract_business_understanding_success():
    """Happy path: LLM returns valid JSON that maps to BusinessUnderstandingInput."""
    mock_choice = MagicMock()
    mock_choice.message.content = json.dumps(
@@ -407,13 +406,6 @@ async def test_extract_business_understanding_from_tally_success():
            "business_name": "Acme Corp",
            "industry": "Technology",
            "pain_points": ["manual reporting"],
-            "suggested_prompts": [
-                "Automate weekly reports",
-                "Set up invoice processing",
-                "Create a customer onboarding flow",
-                "Track project deadlines automatically",
-                "Send follow-up emails after meetings",
-            ],
        }
    )
    mock_response = MagicMock()
@@ -423,56 +415,16 @@ async def test_extract_business_understanding_from_tally_success():
    mock_client.chat.completions.create.return_value = mock_response

    with patch("backend.data.tally.AsyncOpenAI", return_value=mock_client):
-        result = await extract_business_understanding_from_tally("Q: Name?\nA: Alice")
+        result = await extract_business_understanding("Q: Name?\nA: Alice")

    assert result.user_name == "Alice"
    assert result.business_name == "Acme Corp"
    assert result.industry == "Technology"
    assert result.pain_points == ["manual reporting"]
-    # suggested_prompts validated and sliced to top 3
-    assert result.suggested_prompts == [
-        "Automate weekly reports",
-        "Set up invoice processing",
-        "Create a customer onboarding flow",
-    ]


 @pytest.mark.asyncio
-async def test_extract_business_understanding_from_tally_filters_long_prompts():
-    """Prompts exceeding 20 words are excluded and only top 3 are kept."""
-    long_prompt = " ".join(["word"] * 21)
-    mock_choice = MagicMock()
-    mock_choice.message.content = json.dumps(
-        {
-            "user_name": "Alice",
-            "suggested_prompts": [
-                long_prompt,
-                "Short prompt one",
-                long_prompt,
-                "Short prompt two",
-                "Short prompt three",
-                "Short prompt four",
-            ],
-        }
-    )
-    mock_response = MagicMock()
-    mock_response.choices = [mock_choice]
-
-    mock_client = AsyncMock()
-    mock_client.chat.completions.create.return_value = mock_response
-
-    with patch("backend.data.tally.AsyncOpenAI", return_value=mock_client):
-        result = await extract_business_understanding_from_tally("Q: Name?\nA: Alice")
-
-    assert result.suggested_prompts == [
-        "Short prompt one",
-        "Short prompt two",
-        "Short prompt three",
-    ]
-
-
-@pytest.mark.asyncio
-async def test_extract_business_understanding_from_tally_filters_nulls():
+async def test_extract_business_understanding_filters_nulls():
    """Null values from LLM should be excluded from the result."""
    mock_choice = MagicMock()
    mock_choice.message.content = json.dumps(
@@ -485,7 +437,7 @@ async def test_extract_business_understanding_from_tally_filters_nulls():
    mock_client.chat.completions.create.return_value = mock_response

    with patch("backend.data.tally.AsyncOpenAI", return_value=mock_client):
-        result = await extract_business_understanding_from_tally("Q: Name?\nA: Alice")
+        result = await extract_business_understanding("Q: Name?\nA: Alice")

    assert result.user_name == "Alice"
    assert result.business_name is None
@@ -493,7 +445,7 @@ async def test_extract_business_understanding_from_tally_filters_nulls():


 @pytest.mark.asyncio
-async def test_extract_business_understanding_from_tally_invalid_json():
+async def test_extract_business_understanding_invalid_json():
    """Invalid JSON from LLM should raise JSONDecodeError."""
    mock_choice = MagicMock()
    mock_choice.message.content = "not valid json {"
@@ -507,11 +459,11 @@ async def test_extract_business_understanding_from_tally_invalid_json():
        patch("backend.data.tally.AsyncOpenAI", return_value=mock_client),
        pytest.raises(json.JSONDecodeError),
    ):
-        await extract_business_understanding_from_tally("Q: Name?\nA: Alice")
+        await extract_business_understanding("Q: Name?\nA: Alice")


 @pytest.mark.asyncio
-async def test_extract_business_understanding_from_tally_timeout():
+async def test_extract_business_understanding_timeout():
    """LLM timeout should propagate as asyncio.TimeoutError."""
    mock_client = AsyncMock()
    mock_client.chat.completions.create.side_effect = asyncio.TimeoutError()
@@ -521,7 +473,7 @@ async def test_extract_business_understanding_from_tally_timeout():
        patch("backend.data.tally._LLM_TIMEOUT", 0.001),
        pytest.raises(asyncio.TimeoutError),
    ):
-        await extract_business_understanding_from_tally("Q: Name?\nA: Alice")
+        await extract_business_understanding("Q: Name?\nA: Alice")


 # ── _refresh_cache ───────────────────────────────────────────────────────────
@@ -540,7 +492,7 @@ async def test_refresh_cache_full_fetch():
    submissions = SAMPLE_SUBMISSIONS

    with (
-        patch("backend.data.tally._settings", mock_settings),
+        patch("backend.data.tally.Settings", return_value=mock_settings),
        patch(
            "backend.data.tally.get_redis_async",
            new_callable=AsyncMock,
@@ -588,7 +540,7 @@ async def test_refresh_cache_incremental_fetch():
    new_submissions = [SAMPLE_SUBMISSIONS[0]]  # Just Alice

    with (
-        patch("backend.data.tally._settings", mock_settings),
+        patch("backend.data.tally.Settings", return_value=mock_settings),
        patch(
            "backend.data.tally.get_redis_async",
            new_callable=AsyncMock,
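
Note on the test changes: the tests in this file switch from patching a module-level `_settings` object to patching the `Settings` class with `return_value=mock_settings`. A rough sketch of why that works is shown below. Everything in it (`fake_module`, `read_api_key`, the key values) is a hypothetical stand-in, not code from `backend.data.tally`, and it uses `patch.object` on a namespace instead of a dotted import path so that it runs on its own.

```python
from types import SimpleNamespace
from unittest.mock import MagicMock, patch


class Settings:
    """Hypothetical settings class that is instantiated at call time."""

    def __init__(self) -> None:
        self.secrets = SimpleNamespace(tally_api_key="real-key")


# Hypothetical stand-in for the module under test.
fake_module = SimpleNamespace(Settings=Settings)


def read_api_key() -> str:
    # The class is looked up on the "module" on every call,
    # so a patch applied in a test is picked up here.
    settings = fake_module.Settings()
    return settings.secrets.tally_api_key


def read_api_key_with_mock() -> str:
    mock_settings = MagicMock()
    mock_settings.secrets.tally_api_key = "test-key"
    # return_value makes the patched Settings() call return mock_settings.
    with patch.object(fake_module, "Settings", return_value=mock_settings):
        return read_api_key()
```

Because the function constructs `Settings()` itself, the test has to replace the class (the callable), not an already-built instance.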
--- a/autogpt_platform/backend/backend/data/understanding.py
+++ b/autogpt_platform/backend/backend/data/understanding.py
@@ -86,11 +86,6 @@ class BusinessUnderstandingInput(pydantic.BaseModel):
        None, description="Any additional context"
    )

-    # Suggested prompts (UI-only, not included in system prompt)
-    suggested_prompts: Optional[list[str]] = pydantic.Field(
-        None, description="LLM-generated suggested prompts based on business context"
-    )
-

 class BusinessUnderstanding(pydantic.BaseModel):
    """Full business understanding model returned from database."""
@@ -127,9 +122,6 @@ class BusinessUnderstanding(pydantic.BaseModel):
    # Additional context
    additional_notes: Optional[str] = None

-    # Suggested prompts (UI-only, not included in system prompt)
-    suggested_prompts: list[str] = pydantic.Field(default_factory=list)
-
    @classmethod
    def from_db(cls, db_record: CoPilotUnderstanding) -> "BusinessUnderstanding":
        """Convert database record to Pydantic model."""
@@ -157,7 +149,6 @@ class BusinessUnderstanding(pydantic.BaseModel):
            current_software=_json_to_list(business.get("current_software")),
            existing_automation=_json_to_list(business.get("existing_automation")),
            additional_notes=business.get("additional_notes"),
-            suggested_prompts=_json_to_list(data.get("suggested_prompts")),
        )


@@ -175,62 +166,6 @@ def _merge_lists(existing: list | None, new: list | None) -> list | None:
    return merged


-def merge_business_understanding_data(
-    existing_data: dict[str, Any],
-    input_data: BusinessUnderstandingInput,
-) -> dict[str, Any]:
-    merged_data = dict(existing_data)
-
-    merged_business: dict[str, Any] = {}
-    if isinstance(merged_data.get("business"), dict):
-        merged_business = dict(merged_data["business"])
-
-    business_string_fields = [
-        "job_title",
-        "business_name",
-        "industry",
-        "business_size",
-        "user_role",
-        "additional_notes",
-    ]
-    business_list_fields = [
-        "key_workflows",
-        "daily_activities",
-        "pain_points",
-        "bottlenecks",
-        "manual_tasks",
-        "automation_goals",
-        "current_software",
-        "existing_automation",
-    ]
-
-    if input_data.user_name is not None:
-        merged_data["name"] = input_data.user_name
-
-    for field in business_string_fields:
-        value = getattr(input_data, field)
-        if value is not None:
-            merged_business[field] = value
-
-    for field in business_list_fields:
-        value = getattr(input_data, field)
-        if value is not None:
-            existing_list = _json_to_list(merged_business.get(field))
-            merged_list = _merge_lists(existing_list, value)
-            merged_business[field] = merged_list
-
-    merged_business["version"] = 1
-    merged_data["business"] = merged_business
-
-    # suggested_prompts lives at the top level (not under `business`) because
-    # it's a UI-only artifact consumed by the frontend, not business understanding
-    # data. The `business` sub-dict feeds the system prompt.
-    if input_data.suggested_prompts is not None:
-        merged_data["suggested_prompts"] = input_data.suggested_prompts
-
-    return merged_data
-
-
 async def _get_from_cache(user_id: str) -> Optional[BusinessUnderstanding]:
    """Get business understanding from Redis cache."""
    try:
@@ -310,18 +245,63 @@ async def upsert_business_understanding(
        where={"userId": user_id}
    )

+    # Get existing data structure or start fresh
    existing_data: dict[str, Any] = {}
    if existing and isinstance(existing.data, dict):
        existing_data = dict(existing.data)

-    merged_data = merge_business_understanding_data(existing_data, input_data)
+    existing_business: dict[str, Any] = {}
+    if isinstance(existing_data.get("business"), dict):
+        existing_business = dict(existing_data["business"])
+
+    # Business fields (stored inside business object)
+    business_string_fields = [
+        "job_title",
+        "business_name",
+        "industry",
+        "business_size",
+        "user_role",
+        "additional_notes",
+    ]
+    business_list_fields = [
+        "key_workflows",
+        "daily_activities",
+        "pain_points",
+        "bottlenecks",
+        "manual_tasks",
+        "automation_goals",
+        "current_software",
+        "existing_automation",
+    ]
+
+    # Handle top-level name field
+    if input_data.user_name is not None:
+        existing_data["name"] = input_data.user_name
+
+    # Business string fields - overwrite if provided
+    for field in business_string_fields:
+        value = getattr(input_data, field)
+        if value is not None:
+            existing_business[field] = value
+
+    # Business list fields - merge with existing
+    for field in business_list_fields:
+        value = getattr(input_data, field)
+        if value is not None:
+            existing_list = _json_to_list(existing_business.get(field))
+            merged = _merge_lists(existing_list, value)
+            existing_business[field] = merged
+
+    # Set version and nest business data
+    existing_business["version"] = 1
+    existing_data["business"] = existing_business

    # Upsert with the merged data
    record = await CoPilotUnderstanding.prisma().upsert(
        where={"userId": user_id},
        data={
-            "create": {"userId": user_id, "data": SafeJson(merged_data)},
-            "update": {"data": SafeJson(merged_data)},
+            "create": {"userId": user_id, "data": SafeJson(existing_data)},
+            "update": {"data": SafeJson(existing_data)},
        },
    )

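
Note on the inlined merge: the logic now living in `upsert_business_understanding` overwrites string fields when a new value is provided and merges list fields with what is already stored. The sketch below illustrates that shape under stated assumptions. `merge_lists_sketch` and `merge_business_sketch` are hypothetical names, plain dicts stand in for the Pydantic input model, and the de-duplicating, order-preserving list merge is an assumption, since the body of `_merge_lists` is not shown in this diff.

```python
from __future__ import annotations

from typing import Any


def merge_lists_sketch(existing: list | None, new: list | None) -> list | None:
    """Assumed behavior: keep existing order, append items not seen before."""
    if new is None:
        return existing
    merged = list(existing or [])
    for item in new:
        if item not in merged:
            merged.append(item)
    return merged


def merge_business_sketch(
    existing: dict[str, Any],
    updates: dict[str, Any],
    string_fields: list[str],
    list_fields: list[str],
) -> dict[str, Any]:
    merged = dict(existing)  # copy so the caller's dict is not mutated
    for field in string_fields:
        value = updates.get(field)
        if value is not None:
            merged[field] = value  # strings: overwrite if provided
    for field in list_fields:
        value = updates.get(field)
        if value is not None:
            merged[field] = merge_lists_sketch(merged.get(field), value)
    merged["version"] = 1
    return merged
```

A `None` value means "not provided" and leaves the stored value untouched, which matches the `if value is not None` guards in the diff.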
--- a/autogpt_platform/backend/backend/data/understanding_test.py
+++ b/autogpt_platform/backend/backend/data/understanding_test.py
@@ -1,102 +0,0 @@
-"""Tests for business understanding merge and format logic."""
-
-from datetime import datetime, timezone
-from typing import Any
-
-from backend.data.understanding import (
-    BusinessUnderstanding,
-    BusinessUnderstandingInput,
-    format_understanding_for_prompt,
-    merge_business_understanding_data,
-)
-
-
-def _make_input(**kwargs: Any) -> BusinessUnderstandingInput:
-    """Create a BusinessUnderstandingInput with only the specified fields."""
-    return BusinessUnderstandingInput.model_validate(kwargs)
-
-
-# ─── merge_business_understanding_data: suggested_prompts ─────────────
-
-
-def test_merge_suggested_prompts_overwrites_existing():
-    """New suggested_prompts should fully replace existing ones (not append)."""
-    existing = {
-        "name": "Alice",
-        "business": {"industry": "Tech", "version": 1},
-        "suggested_prompts": ["Old prompt 1", "Old prompt 2"],
-    }
-    input_data = _make_input(
-        suggested_prompts=["New prompt A", "New prompt B", "New prompt C"],
-    )
-
-    result = merge_business_understanding_data(existing, input_data)
-
-    assert result["suggested_prompts"] == [
-        "New prompt A",
-        "New prompt B",
-        "New prompt C",
-    ]
-
-
-def test_merge_suggested_prompts_none_preserves_existing():
-    """When input has suggested_prompts=None, existing prompts are preserved."""
-    existing = {
-        "name": "Alice",
-        "business": {"industry": "Tech", "version": 1},
-        "suggested_prompts": ["Keep me"],
-    }
-    input_data = _make_input(industry="Finance")
-
-    result = merge_business_understanding_data(existing, input_data)
-
-    assert result["suggested_prompts"] == ["Keep me"]
-    assert result["business"]["industry"] == "Finance"
-
-
-def test_merge_suggested_prompts_added_to_empty_data():
-    """Suggested prompts are set at top level even when starting from empty data."""
-    existing: dict[str, Any] = {}
-    input_data = _make_input(suggested_prompts=["Prompt 1"])
-
-    result = merge_business_understanding_data(existing, input_data)
-
-    assert result["suggested_prompts"] == ["Prompt 1"]
-
-
-def test_merge_suggested_prompts_empty_list_overwrites():
-    """An explicit empty list should overwrite existing prompts."""
-    existing: dict[str, Any] = {
-        "suggested_prompts": ["Old prompt"],
-        "business": {"version": 1},
-    }
-    input_data = _make_input(suggested_prompts=[])
-
-    result = merge_business_understanding_data(existing, input_data)
-
-    assert result["suggested_prompts"] == []
-
-
-# ─── format_understanding_for_prompt: excludes suggested_prompts ──────
-
-
-def test_format_understanding_excludes_suggested_prompts():
-    """suggested_prompts is UI-only and must NOT appear in the system prompt."""
-    understanding = BusinessUnderstanding(
-        id="test-id",
-        user_id="user-1",
-        created_at=datetime.now(tz=timezone.utc),
-        updated_at=datetime.now(tz=timezone.utc),
-        user_name="Alice",
-        industry="Technology",
-        suggested_prompts=["Automate reports", "Set up alerts", "Track KPIs"],
-    )
-
-    formatted = format_understanding_for_prompt(understanding)
-
-    assert "Alice" in formatted
-    assert "Technology" in formatted
-    assert "suggested_prompts" not in formatted
-    assert "Automate reports" not in formatted
-    assert "Set up alerts" not in formatted
-    assert "Track KPIs" not in formatted
--- a/autogpt_platform/backend/backend/executor/manager.py
+++ b/autogpt_platform/backend/backend/executor/manager.py
@@ -224,7 +224,7 @@ async def execute_node(
    # Sanity check: validate the execution input.
    input_data, error = validate_exec(node, data.inputs, resolve_input=False)
    if input_data is None:
-        log_metadata.error(f"Skip execution, input validation error: {error}")
+        log_metadata.warning(f"Skip execution, input validation error: {error}")
        yield "error", error
        return

--- a/autogpt_platform/backend/backend/integrations/creds_manager.py
+++ b/autogpt_platform/backend/backend/integrations/creds_manager.py
@@ -25,6 +25,53 @@ logger = logging.getLogger(__name__)
 settings = Settings()


+_on_creds_changed: Callable[[str, str], None] | None = None
+
+
+def register_creds_changed_hook(hook: Callable[[str, str], None]) -> None:
+    """Register a callback invoked after any credential is created/updated/deleted.
+
+    The callback receives ``(user_id, provider)`` and should be idempotent.
+    Only one hook can be registered at a time.  Intended to be called once at
+    application startup (e.g. by the copilot module) without creating an
+    import cycle.
+
+    Raises:
+        RuntimeError: If a hook is already registered.  Call
+            :func:`unregister_creds_changed_hook` first if replacement is needed.
+    """
+    global _on_creds_changed
+    if _on_creds_changed is not None:
+        raise RuntimeError(
+            "A creds_changed hook is already registered. "
+            "Call unregister_creds_changed_hook() before registering a new one."
+        )
+    _on_creds_changed = hook
+
+
+def unregister_creds_changed_hook() -> None:
+    """Remove the currently registered creds-changed hook (if any).
+
+    Primarily useful in tests to reset global state between test cases.
+    """
+    global _on_creds_changed
+    _on_creds_changed = None
+
+
+def _invoke_creds_changed_hook(user_id: str, provider: str) -> None:
+    """Invoke the registered creds-changed hook (if any)."""
+    if _on_creds_changed is not None:
+        try:
+            _on_creds_changed(user_id, provider)
+        except Exception:
+            logger.warning(
+                "Credential-change hook failed for user=%s provider=%s",
+                user_id,
+                provider,
+                exc_info=True,
+            )
+
+
 class IntegrationCredentialsManager:
    """
    Handles the lifecycle of integration credentials.
@@ -69,7 +116,10 @@ class IntegrationCredentialsManager:
        return self._locks

    async def create(self, user_id: str, credentials: Credentials) -> None:
-        return await self.store.add_creds(user_id, credentials)
+        result = await self.store.add_creds(user_id, credentials)
+        # Notify listeners so downstream caches are invalidated immediately.
+        _invoke_creds_changed_hook(user_id, credentials.provider)
+        return result

    async def exists(self, user_id: str, credentials_id: str) -> bool:
        return (await self.store.get_creds_by_id(user_id, credentials_id)) is not None
@@ -146,8 +196,7 @@ class IntegrationCredentialsManager:
                oauth_handler = await _get_provider_oauth_handler(credentials.provider)
            if oauth_handler.needs_refresh(credentials):
                logger.debug(
-                    f"Refreshing '{credentials.provider}' "
-                    f"credentials #{credentials.id}"
+                    f"Refreshing '{credentials.provider}' credentials #{credentials.id}"
                )
                _lock = None
                if lock:
@@ -156,11 +205,16 @@ class IntegrationCredentialsManager:

                fresh_credentials = await oauth_handler.refresh_tokens(credentials)
                await self.store.update_creds(user_id, fresh_credentials)
+                # Notify listeners so the refreshed token is picked up immediately.
+                _invoke_creds_changed_hook(user_id, fresh_credentials.provider)
                if _lock and (await _lock.locked()) and (await _lock.owned()):
                    try:
                        await _lock.release()
-                    except Exception as e:
-                        logger.warning(f"Failed to release OAuth refresh lock: {e}")
+                    except Exception:
+                        logger.warning(
+                            "Failed to release OAuth refresh lock",
+                            exc_info=True,
+                        )

                credentials = fresh_credentials
        return credentials
@@ -168,10 +222,17 @@ class IntegrationCredentialsManager:
    async def update(self, user_id: str, updated: Credentials) -> None:
        async with self._locked(user_id, updated.id):
            await self.store.update_creds(user_id, updated)
+        # Notify listeners so the updated credential is picked up immediately.
+        _invoke_creds_changed_hook(user_id, updated.provider)

    async def delete(self, user_id: str, credentials_id: str) -> None:
        async with self._locked(user_id, credentials_id):
+            # Read inside the lock to avoid TOCTOU — another coroutine could
+            # delete the same credential between the read and the delete.
+            creds = await self.store.get_creds_by_id(user_id, credentials_id)
            await self.store.delete_creds_by_id(user_id, credentials_id)
+        if creds:
+            _invoke_creds_changed_hook(user_id, creds.provider)

    # -- Locking utilities -- #

@@ -195,8 +256,11 @@ class IntegrationCredentialsManager:
            if (await lock.locked()) and (await lock.owned()):
                try:
                    await lock.release()
-                except Exception as e:
-                    logger.warning(f"Failed to release credentials lock: {e}")
+                except Exception:
+                    logger.warning(
+                        "Failed to release credentials lock",
+                        exc_info=True,
+                    )

    async def release_all_locks(self):
        """Call this on process termination to ensure all locks are released"""
--- a/autogpt_platform/backend/backend/integrations/creds_manager_test.py
+++ b/autogpt_platform/backend/backend/integrations/creds_manager_test.py
@@ -0,0 +1,60 @@
+"""Tests for the creds_manager hook system: register, unregister, and invoke."""
+
+import pytest
+
+from backend.integrations.creds_manager import (
+    _invoke_creds_changed_hook,
+    register_creds_changed_hook,
+    unregister_creds_changed_hook,
+)
+
+
+@pytest.fixture(autouse=True)
+def _reset_hook():
+    """Ensure global hook state is clean before and after every test."""
+    unregister_creds_changed_hook()
+    yield
+    unregister_creds_changed_hook()
+
+
+class TestRegisterCredsChangedHook:
+    def test_register_and_invoke(self):
+        calls: list[tuple[str, str]] = []
+        register_creds_changed_hook(lambda u, p: calls.append((u, p)))
+
+        _invoke_creds_changed_hook("user-1", "github")
+        assert calls == [("user-1", "github")]
+
+    def test_double_register_raises(self):
+        register_creds_changed_hook(lambda u, p: None)
+        with pytest.raises(RuntimeError, match="already registered"):
+            register_creds_changed_hook(lambda u, p: None)
+
+    def test_unregister_then_reregister(self):
+        register_creds_changed_hook(lambda u, p: None)
+        unregister_creds_changed_hook()
+        # Should not raise after unregister.
+        register_creds_changed_hook(lambda u, p: None)
+
+
+class TestInvokeCredsChangedHook:
+    def test_noop_when_no_hook_registered(self):
+        # Must not raise even when no hook is registered.
+        _invoke_creds_changed_hook("user-1", "github")
+
+    def test_hook_exception_is_swallowed(self):
+        def bad_hook(user_id: str, provider: str) -> None:
+            raise ValueError("boom")
+
+        register_creds_changed_hook(bad_hook)
+        # Must not propagate the exception.
+        _invoke_creds_changed_hook("user-1", "github")
+
+    def test_hook_receives_correct_args(self):
+        calls: list[tuple[str, str]] = []
+        register_creds_changed_hook(lambda u, p: calls.append((u, p)))
+
+        _invoke_creds_changed_hook("user-a", "github")
+        _invoke_creds_changed_hook("user-b", "slack")
+
+        assert calls == [("user-a", "github"), ("user-b", "slack")]
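Reviewer note: the registration contract these tests cover (one slot, double registration rejected, hook failures logged rather than raised) can be reproduced in isolation. The sketch below is a minimal stand-in, not the code in `creds_manager.py`. The names `register`, `unregister`, `invoke`, and `_hook` are hypothetical, and it assumes a single module-level slot.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)

# Hypothetical stand-in for the module-level hook slot.
Hook = Callable[[str, str], None]
_hook: Optional[Hook] = None


def register(hook: Hook) -> None:
    """Store the hook; refuse to overwrite an existing one."""
    global _hook
    if _hook is not None:
        raise RuntimeError("A hook is already registered.")
    _hook = hook


def unregister() -> None:
    """Clear the slot so a new hook can be registered."""
    global _hook
    _hook = None


def invoke(user_id: str, provider: str) -> None:
    """Call the hook if present; log and swallow any exception it raises."""
    if _hook is None:
        return
    try:
        _hook(user_id, provider)
    except Exception:
        logger.warning(
            "hook failed for user=%s provider=%s", user_id, provider, exc_info=True
        )


# Usage: a listener that records which (user, provider) pairs were invalidated.
invalidated: list[tuple[str, str]] = []
register(lambda user_id, provider: invalidated.append((user_id, provider)))
invoke("user-1", "github")
print(invalidated)
```

Swallowing the exception keeps a failing listener from turning a successful credential write into an error for the caller. The cost is that failures only appear in the logs.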
--- a/autogpt_platform/backend/backend/notifications/test_notifications.py
+++ b/autogpt_platform/backend/backend/notifications/test_notifications.py
@@ -19,6 +19,7 @@ class TestNotificationErrorHandling:
        with patch("backend.notifications.notifications.AppService.__init__"):
            manager = NotificationManager()
            manager.email_sender = MagicMock()
+            manager.email_sender.send_templated = AsyncMock()
            # Mock the _get_template method used by _process_batch
            template_mock = Mock()
            template_mock.base_template = "base"
@@ -27,9 +28,10 @@ class TestNotificationErrorHandling:
            manager.email_sender._get_template = Mock(return_value=template_mock)
            # Mock the formatter
            manager.email_sender.formatter = Mock()
-            manager.email_sender.formatter.format_email = Mock(
+            manager.email_sender.formatter.format_email = AsyncMock(
                return_value=("subject", "body content")
            )
+            manager.email_sender.send_templated = AsyncMock()
            manager.email_sender.formatter.env = Mock()
            manager.email_sender.formatter.env.globals = {
                "base_url": "http://example.com"
@@ -331,7 +333,7 @@ class TestNotificationErrorHandling:
                        return ("subject", "x" * 5_000_000)  # Over 4.5MB limit
                return ("subject", "normal sized content")

-            notification_manager.email_sender.formatter.format_email = Mock(
+            notification_manager.email_sender.formatter.format_email = AsyncMock(
                side_effect=format_side_effect
            )

--- a/autogpt_platform/backend/backend/util/conftest.py
+++ b/autogpt_platform/backend/backend/util/conftest.py
@@ -0,0 +1,14 @@
+"""Override session-scoped fixtures from parent conftest.py so unit tests
+in this directory can run without the full server stack."""
+
+import pytest
+
+
+@pytest.fixture(scope="session")
+def server():
+    yield None
+
+
+@pytest.fixture(scope="session", autouse=True)
+def graph_cleanup():
+    yield
--- a/autogpt_platform/backend/backend/util/file.py
+++ b/autogpt_platform/backend/backend/util/file.py
@@ -71,11 +71,15 @@ def sanitize_filename(filename: str) -> str:

    # Truncate if too long
    if len(sanitized) > MAX_FILENAME_LENGTH:
-        # Keep the extension if possible
+        # Keep the extension if possible, but only if it is of reasonable length
        if "." in sanitized:
            name, ext = sanitized.rsplit(".", 1)
-            max_name_length = MAX_FILENAME_LENGTH - len(ext) - 1
-            sanitized = name[:max_name_length] + "." + ext
+            # If extension is too long, it's likely not a file extension but just text
+            if len(ext) <= 20:
+                max_name_length = MAX_FILENAME_LENGTH - len(ext) - 1
+                sanitized = name[:max_name_length] + "." + ext
+            else:
+                sanitized = sanitized[:MAX_FILENAME_LENGTH]
        else:
            sanitized = sanitized[:MAX_FILENAME_LENGTH]

@@ -129,7 +133,7 @@ async def store_media_file(

    Return format options:
    - "for_local_processing": Returns local file path - use with ffmpeg, MoviePy, PIL, etc.
-    - "for_external_api": Returns data URI (base64) - use when sending to external APIs
+    - "for_external_api": Returns data URI (base64) - use when sending content to external APIs
    - "for_block_output": Returns best format for output - workspace:// in CoPilot, data URI in graphs

    :param file:               Data URI, URL, workspace://, or local (relative) path.
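Reviewer note: the truncation change in `sanitize_filename` can be exercised on its own. The sketch below is a simplified stand-in that covers only the truncation branch. `MAX_LEN = 50` is an illustrative value (the real `MAX_FILENAME_LENGTH` is not shown in this diff), and `truncate_filename` and `MAX_EXT_LEN` are hypothetical names. The 20-character cutoff mirrors the literal in the hunk.

```python
MAX_LEN = 50      # illustrative only; the real limit is defined in file.py
MAX_EXT_LEN = 20  # mirrors the literal 20 used in the hunk


def truncate_filename(name: str) -> str:
    """Truncate to MAX_LEN, keeping the extension only when it is short."""
    if len(name) <= MAX_LEN:
        return name
    if "." in name:
        stem, ext = name.rsplit(".", 1)
        if len(ext) <= MAX_EXT_LEN:
            # Reserve room for the dot and the extension.
            return stem[: MAX_LEN - len(ext) - 1] + "." + ext
    # No dot, or the text after the last dot is too long to be an extension.
    return name[:MAX_LEN]


short_ext = truncate_filename("a" * 100 + ".txt")
long_tail = truncate_filename("v1." + "x" * 100)
print(len(short_ext), short_ext.endswith(".txt"))
print(len(long_tail))
```

Without the length check, a long run of text after the last dot makes `MAX_LEN - len(ext) - 1` negative, so the slice no longer bounds the result by the limit. That appears to be the case the new `else` branch guards against.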
--- a/autogpt_platform/backend/backend/util/openai_responses.py
+++ b/autogpt_platform/backend/backend/util/openai_responses.py
@@ -0,0 +1,150 @@
+"""Helpers for OpenAI Responses API.
+
+This module provides utilities for using OpenAI's Responses API, which is the
+default for all OpenAI models supported by the platform.
+"""
+
+from typing import Any
+
+
+def convert_tools_to_responses_format(tools: list[dict] | None) -> list[dict]:
+    """Convert Chat Completions tool format to Responses API format.
+
+    The Responses API uses internally-tagged polymorphism (flatter structure)
+    and functions are strict by default.
+
+    Chat Completions format:
+        {"type": "function", "function": {"name": "...", "parameters": {...}}}
+
+    Responses API format:
+        {"type": "function", "name": "...", "parameters": {...}}
+
+    Args:
+        tools: List of tools in Chat Completions format
+
+    Returns:
+        List of tools in Responses API format
+    """
+    if not tools:
+        return []
+
+    converted = []
+    for tool in tools:
+        if tool.get("type") == "function":
+            func = tool.get("function", {})
+            name = func.get("name")
+            if not name:
+                raise ValueError(
+                    f"Function tool is missing required 'name' field: {tool}"
+                )
+            entry: dict[str, Any] = {
+                "type": "function",
+                "name": name,
+                # Note: strict=True is default in Responses API
+            }
+            if func.get("description") is not None:
+                entry["description"] = func["description"]
+            if func.get("parameters") is not None:
+                entry["parameters"] = func["parameters"]
+            converted.append(entry)
+        else:
+            # Pass through non-function tools as-is
+            converted.append(tool)
+    return converted
+
+
+def extract_responses_tool_calls(response: Any) -> list[dict] | None:
+    """Extract tool calls from Responses API response.
+
+    The Responses API returns tool calls as separate items in the output array
+    with type="function_call".
+
+    Args:
+        response: The Responses API response object
+
+    Returns:
+        List of tool calls in a normalized format, or None if no tool calls
+    """
+    tool_calls = []
+    for item in response.output:
+        if getattr(item, "type", None) == "function_call":
+            tool_calls.append(
+                {
+                    "id": item.call_id,
+                    "type": "function",
+                    "function": {
+                        "name": item.name,
+                        "arguments": item.arguments,
+                    },
+                }
+            )
+    return tool_calls if tool_calls else None
+
+
+def extract_responses_usage(response: Any) -> tuple[int, int]:
+    """Extract token usage from Responses API response.
+
+    The Responses API uses input_tokens/output_tokens (not prompt_tokens/completion_tokens).
+
+    Args:
+        response: The Responses API response object
+
+    Returns:
+        Tuple of (input_tokens, output_tokens)
+    """
+    if not getattr(response, "usage", None):
+        return 0, 0
+
+    return (
+        getattr(response.usage, "input_tokens", 0),
+        getattr(response.usage, "output_tokens", 0),
+    )
+
+
+def extract_responses_content(response: Any) -> str:
+    """Extract text content from Responses API response.
+
+    Args:
+        response: The Responses API response object
+
+    Returns:
+        The text content from the response, or empty string if none
+    """
+    # The SDK provides a helper property
+    if hasattr(response, "output_text"):
+        return response.output_text or ""
+
+    # Fallback: manually extract from output items
+    for item in response.output:
+        if getattr(item, "type", None) == "message":
+            for content in getattr(item, "content", []):
+                if getattr(content, "type", None) == "output_text":
+                    return getattr(content, "text", "")
+    return ""
+
+
+def extract_responses_reasoning(response: Any) -> str | None:
+    """Extract reasoning content from Responses API response.
+
+    Reasoning models return their reasoning process in the response,
+    which can be useful for debugging or display.
+
+    Args:
+        response: The Responses API response object
+
+    Returns:
+        The reasoning text, or None if not present
+    """
+    for item in response.output:
+        if getattr(item, "type", None) == "reasoning":
+            # Reasoning items may carry a summary; only summary text is extracted here
+            summary = getattr(item, "summary", [])
+            if summary:
+                # Join summary items if present
+                texts = []
+                for s in summary:
+                    if hasattr(s, "text"):
+                        texts.append(s.text)
+                if texts:
+                    return "\n".join(texts)
+    return None
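Reviewer note: the shape change performed by `convert_tools_to_responses_format` is easiest to see on a single tool. The sketch below is a condensed, hypothetical re-statement (`flatten_function_tool` is not part of the module). It assumes the input is a well-formed Chat Completions function tool, so it skips the missing-name check and the pass-through of non-function tools.

```python
from typing import Any


def flatten_function_tool(tool: dict[str, Any]) -> dict[str, Any]:
    """Lift the nested "function" payload to the top level of the tool."""
    func = tool["function"]
    flat: dict[str, Any] = {"type": "function", "name": func["name"]}
    # Optional keys are copied only when present and not None.
    for key in ("description", "parameters"):
        if func.get(key) is not None:
            flat[key] = func[key]
    return flat


chat_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the weather in a location",
        "parameters": {"type": "object", "properties": {}},
    },
}
responses_tool = flatten_function_tool(chat_tool)
print(responses_tool)
```

The flattened entry no longer has a `"function"` key. A caller that builds tools once in Chat Completions format can therefore convert them at the request boundary instead of maintaining two definitions.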
--- a/autogpt_platform/backend/backend/util/openai_responses_test.py
+++ b/autogpt_platform/backend/backend/util/openai_responses_test.py
@@ -0,0 +1,312 @@
+"""Tests for OpenAI Responses API helpers."""
+
+from unittest.mock import MagicMock
+
+from backend.util.openai_responses import (
+    convert_tools_to_responses_format,
+    extract_responses_content,
+    extract_responses_reasoning,
+    extract_responses_tool_calls,
+    extract_responses_usage,
+)
+
+
+class TestConvertToolsToResponsesFormat:
+    """Tests for the convert_tools_to_responses_format function."""
+
+    def test_empty_tools_returns_empty_list(self):
+        """Empty or None tools should return empty list."""
+        assert convert_tools_to_responses_format(None) == []
+        assert convert_tools_to_responses_format([]) == []
+
+    def test_converts_function_tool_format(self):
+        """Should convert Chat Completions function format to Responses format."""
+        chat_completions_tools = [
+            {
+                "type": "function",
+                "function": {
+                    "name": "get_weather",
+                    "description": "Get the weather in a location",
+                    "parameters": {
+                        "type": "object",
+                        "properties": {
+                            "location": {"type": "string"},
+                        },
+                        "required": ["location"],
+                    },
+                },
+            }
+        ]
+
+        result = convert_tools_to_responses_format(chat_completions_tools)
+
+        assert len(result) == 1
+        assert result[0]["type"] == "function"
+        assert result[0]["name"] == "get_weather"
+        assert result[0]["description"] == "Get the weather in a location"
+        assert result[0]["parameters"] == {
+            "type": "object",
+            "properties": {
+                "location": {"type": "string"},
+            },
+            "required": ["location"],
+        }
+        # Should not have nested "function" key
+        assert "function" not in result[0]
+
+    def test_handles_multiple_tools(self):
+        """Should handle multiple tools."""
+        chat_completions_tools = [
+            {
+                "type": "function",
+                "function": {
+                    "name": "tool_1",
+                    "description": "First tool",
+                    "parameters": {"type": "object", "properties": {}},
+                },
+            },
+            {
+                "type": "function",
+                "function": {
+                    "name": "tool_2",
+                    "description": "Second tool",
+                    "parameters": {"type": "object", "properties": {}},
+                },
+            },
+        ]
+
+        result = convert_tools_to_responses_format(chat_completions_tools)
+
+        assert len(result) == 2
+        assert result[0]["name"] == "tool_1"
+        assert result[1]["name"] == "tool_2"
+
+    def test_passes_through_non_function_tools(self):
+        """Non-function tools should be passed through as-is."""
+        tools = [{"type": "web_search", "config": {"enabled": True}}]
+
+        result = convert_tools_to_responses_format(tools)
+
+        assert result == tools
+
+    def test_omits_none_description_and_parameters(self):
+        """Should omit description and parameters when they are None."""
+        tools = [
+            {
+                "type": "function",
+                "function": {
+                    "name": "simple_tool",
+                },
+            }
+        ]
+
+        result = convert_tools_to_responses_format(tools)
+
+        assert len(result) == 1
+        assert result[0]["type"] == "function"
+        assert result[0]["name"] == "simple_tool"
+        assert "description" not in result[0]
+        assert "parameters" not in result[0]
+
+    def test_raises_on_missing_name(self):
+        """Should raise ValueError when function tool has no name."""
+        import pytest
+
+        tools = [{"type": "function", "function": {}}]
+        with pytest.raises(ValueError, match="missing required 'name' field"):
+            convert_tools_to_responses_format(tools)
+
+
+class TestExtractResponsesToolCalls:
+    """Tests for the extract_responses_tool_calls function."""
+
+    def test_extracts_function_call_items(self):
+        """Should extract function_call items from response output."""
+        item = MagicMock()
+        item.type = "function_call"
+        item.call_id = "call_123"
+        item.name = "get_weather"
+        item.arguments = '{"location": "NYC"}'
+
+        response = MagicMock()
+        response.output = [item]
+
+        result = extract_responses_tool_calls(response)
+
+        assert result == [
+            {
+                "id": "call_123",
+                "type": "function",
+                "function": {
+                    "name": "get_weather",
+                    "arguments": '{"location": "NYC"}',
+                },
+            }
+        ]
+
+    def test_returns_none_when_no_tool_calls(self):
+        """Should return None when no function_call items exist."""
+        message_item = MagicMock()
+        message_item.type = "message"
+
+        response = MagicMock()
+        response.output = [message_item]
+
+        assert extract_responses_tool_calls(response) is None
+
+    def test_returns_none_for_empty_output(self):
+        """Should return None when output is empty."""
+        response = MagicMock()
+        response.output = []
+
+        assert extract_responses_tool_calls(response) is None
+
+    def test_extracts_multiple_tool_calls(self):
+        """Should extract multiple function_call items."""
+        item1 = MagicMock()
+        item1.type = "function_call"
+        item1.call_id = "call_1"
+        item1.name = "tool_a"
+        item1.arguments = "{}"
+
+        item2 = MagicMock()
+        item2.type = "function_call"
+        item2.call_id = "call_2"
+        item2.name = "tool_b"
+        item2.arguments = '{"x": 1}'
+
+        response = MagicMock()
+        response.output = [item1, item2]
+
+        result = extract_responses_tool_calls(response)
+
+        assert result is not None
+        assert len(result) == 2
+        assert result[0]["function"]["name"] == "tool_a"
+        assert result[1]["function"]["name"] == "tool_b"
+
+
+class TestExtractResponsesUsage:
+    """Tests for the extract_responses_usage function."""
+
+    def test_extracts_token_counts(self):
+        """Should extract input_tokens and output_tokens."""
+        response = MagicMock()
+        response.usage.input_tokens = 42
+        response.usage.output_tokens = 17
+
+        result = extract_responses_usage(response)
+
+        assert result == (42, 17)
+
+    def test_returns_zeros_when_usage_is_none(self):
+        """Should return (0, 0) when usage is None."""
+        response = MagicMock()
+        response.usage = None
+
+        result = extract_responses_usage(response)
+
+        assert result == (0, 0)
+
+
+class TestExtractResponsesContent:
+    """Tests for the extract_responses_content function."""
+
+    def test_extracts_from_output_text(self):
+        """Should use output_text property when available."""
+        response = MagicMock()
+        response.output_text = "Hello world"
+
+        assert extract_responses_content(response) == "Hello world"
+
+    def test_returns_empty_string_when_output_text_is_none(self):
+        """Should return empty string when output_text is None."""
+        response = MagicMock()
+        response.output_text = None
+        response.output = []
+
+        assert extract_responses_content(response) == ""
+
+    def test_fallback_to_output_items(self):
+        """Should fall back to extracting from output items."""
+        text_content = MagicMock()
+        text_content.type = "output_text"
+        text_content.text = "Fallback content"
+
+        message_item = MagicMock()
+        message_item.type = "message"
+        message_item.content = [text_content]
+
+        response = MagicMock(spec=[])  # no output_text attribute
+        response.output = [message_item]
+
+        assert extract_responses_content(response) == "Fallback content"
+
+    def test_returns_empty_string_for_empty_output(self):
+        """Should return empty string when no content found."""
+        response = MagicMock(spec=[])  # no output_text attribute
+        response.output = []
+
+        assert extract_responses_content(response) == ""
+
+
+class TestExtractResponsesReasoning:
+    """Tests for the extract_responses_reasoning function."""
+
+    def test_extracts_reasoning_summary(self):
+        """Should extract reasoning text from summary items."""
+        summary_item = MagicMock()
+        summary_item.text = "Step 1: Think about it"
+
+        reasoning_item = MagicMock()
+        reasoning_item.type = "reasoning"
+        reasoning_item.summary = [summary_item]
+
+        response = MagicMock()
+        response.output = [reasoning_item]
+
+        assert extract_responses_reasoning(response) == "Step 1: Think about it"
+
+    def test_joins_multiple_summary_items(self):
+        """Should join multiple summary text items with newlines."""
+        s1 = MagicMock()
+        s1.text = "First thought"
+        s2 = MagicMock()
+        s2.text = "Second thought"
+
+        reasoning_item = MagicMock()
+        reasoning_item.type = "reasoning"
+        reasoning_item.summary = [s1, s2]
+
+        response = MagicMock()
+        response.output = [reasoning_item]
+
+        assert extract_responses_reasoning(response) == "First thought\nSecond thought"
+
+    def test_returns_none_when_no_reasoning(self):
+        """Should return None when no reasoning items exist."""
+        message_item = MagicMock()
+        message_item.type = "message"
+
+        response = MagicMock()
+        response.output = [message_item]
+
+        assert extract_responses_reasoning(response) is None
+
+    def test_returns_none_for_empty_output(self):
+        """Should return None when output is empty."""
+        response = MagicMock()
+        response.output = []
+
+        assert extract_responses_reasoning(response) is None
+
+    def test_returns_none_when_summary_is_empty(self):
+        """Should return None when reasoning item has empty summary."""
+        reasoning_item = MagicMock()
+        reasoning_item.type = "reasoning"
+        reasoning_item.summary = []
+
+        response = MagicMock()
+        response.output = [reasoning_item]
+
+        assert extract_responses_reasoning(response) is None
--- a/autogpt_platform/backend/backend/util/prompt.py
+++ b/autogpt_platform/backend/backend/util/prompt.py
@@ -36,16 +36,34 @@ def _msg_tokens(msg: dict, enc) -> int:
    OpenAI counts ≈3 wrapper tokens per chat message, plus 1 if "name"
    is present, plus the tokenised content length.
    For tool calls, we need to count tokens in tool_calls and content fields.
+    Supports Chat Completions, Anthropic, and Responses API formats.
    """
    WRAPPER = 3 + (1 if "name" in msg else 0)

+    # Responses API: function_call items have arguments + name
+    if msg.get("type") == "function_call":
+        return (
+            WRAPPER
+            + _tok_len(msg.get("name", ""), enc)
+            + _tok_len(msg.get("arguments", ""), enc)
+            + _tok_len(msg.get("call_id", ""), enc)
+        )
+
+    # Responses API: function_call_output items have output
+    if msg.get("type") == "function_call_output":
+        return (
+            WRAPPER
+            + _tok_len(msg.get("output", ""), enc)
+            + _tok_len(msg.get("call_id", ""), enc)
+        )
+
    # Count content tokens
    content_tokens = _tok_len(msg.get("content") or "", enc)

    # Count tool call tokens for both OpenAI and Anthropic formats
    tool_call_tokens = 0

-    # OpenAI format: tool_calls array at message level
+    # OpenAI Chat Completions format: tool_calls array at message level
    if "tool_calls" in msg and isinstance(msg["tool_calls"], list):
        for tool_call in msg["tool_calls"]:
            # Count the tool call structure tokens
@@ -70,6 +88,10 @@ def _msg_tokens(msg: dict, enc) -> int:
                # Count tool result tokens
                tool_call_tokens += _tok_len(item.get("tool_use_id", ""), enc)
                tool_call_tokens += _tok_len(item.get("content", ""), enc)
+            elif isinstance(item, dict) and item.get("type") == "text":
+                # Count text block tokens (standard: "text" key, fallback: "content")
+                text_val = item.get("text") or item.get("content", "")
+                tool_call_tokens += _tok_len(text_val, enc)
            elif isinstance(item, dict) and "content" in item:
                # Other content types with content field
                tool_call_tokens += _tok_len(item.get("content", ""), enc)
@@ -81,6 +103,10 @@ def _msg_tokens(msg: dict, enc) -> int:

 def _is_tool_message(msg: dict) -> bool:
    """Check if a message contains tool calls or results that should be protected."""
+    # Responses API: standalone function_call / function_call_output items
+    if msg.get("type") in ("function_call", "function_call_output"):
+        return True
+
    content = msg.get("content")

    # Check for Anthropic-style tool messages
@@ -90,7 +116,7 @@ def _is_tool_message(msg: dict) -> bool:
    ):
        return True

-    # Check for OpenAI-style tool calls in the message
+    # Check for OpenAI Chat Completions-style tool calls in the message
    if "tool_calls" in msg or msg.get("role") == "tool":
        return True

@@ -109,11 +135,18 @@ def _is_objective_message(msg: dict) -> bool:
 def _truncate_tool_message_content(msg: dict, enc, max_tokens: int) -> None:
    """
    Carefully truncate tool message content while preserving tool structure.
-    Handles both Anthropic-style (list content) and OpenAI-style (string content) tool messages.
+    Handles Anthropic, Chat Completions, and Responses API tool messages.
    """
+    # Responses API: function_call_output has "output" field
+    if msg.get("type") == "function_call_output":
+        output = msg.get("output", "")
+        if isinstance(output, str) and _tok_len(output, enc) > max_tokens:
+            msg["output"] = _truncate_middle_tokens(output, enc, max_tokens)
+        return
+
    content = msg.get("content")

-    # OpenAI-style tool message: role="tool" with string content
+    # OpenAI Chat Completions tool message: role="tool" with string content
    if msg.get("role") == "tool" and isinstance(content, str):
        if _tok_len(content, enc) > max_tokens:
            msg["content"] = _truncate_middle_tokens(content, enc, max_tokens)
@@ -145,10 +178,16 @@ def _truncate_middle_tokens(text: str, enc, max_tok: int) -> str:
    if len(ids) <= max_tok:
        return text  # nothing to do

+    # Need at least 3 tokens (head + ellipsis + tail) for meaningful truncation
+    if max_tok < 1:
+        return ""
+    mid = enc.encode(" … ")
+    if max_tok < 3:
+        return enc.decode(ids[:max_tok])
+
    # Split the allowance between the two ends:
    head = max_tok // 2 - 1  # -1 for the ellipsis
    tail = max_tok - head - 1
-    mid = enc.encode(" … ")
    return enc.decode(ids[:head] + mid + ids[-tail:])


@@ -241,18 +280,26 @@ def _extract_tool_call_ids_from_message(msg: dict) -> set[str]:
    """
    Extract tool_call IDs from an assistant message.

-    Supports both formats:
-    - OpenAI: {"role": "assistant", "tool_calls": [{"id": "..."}]}
+    Supports three formats:
+    - OpenAI Chat Completions: {"role": "assistant", "tool_calls": [{"id": "..."}]}
    - Anthropic: {"role": "assistant", "content": [{"type": "tool_use", "id": "..."}]}
+    - OpenAI Responses API: {"type": "function_call", "call_id": "..."}

    Returns:
        Set of tool_call IDs found in the message.
    """
    ids: set[str] = set()
+
+    # Responses API: standalone function_call item
+    if msg.get("type") == "function_call":
+        if call_id := msg.get("call_id"):
+            ids.add(call_id)
+        return ids
+
    if msg.get("role") != "assistant":
        return ids

-    # OpenAI format: tool_calls array
+    # OpenAI Chat Completions format: tool_calls array
    if msg.get("tool_calls"):
        for tc in msg["tool_calls"]:
            tc_id = tc.get("id")
@@ -275,16 +322,23 @@ def _extract_tool_response_ids_from_message(msg: dict) -> set[str]:
    """
    Extract tool_call IDs that this message is responding to.

-    Supports both formats:
-    - OpenAI: {"role": "tool", "tool_call_id": "..."}
+    Supports all formats:
+    - OpenAI Chat Completions: {"role": "tool", "tool_call_id": "..."}
    - Anthropic: {"role": "user", "content": [{"type": "tool_result", "tool_use_id": "..."}]}
+    - OpenAI Responses API: {"type": "function_call_output", "call_id": "..."}

    Returns:
        Set of tool_call IDs this message responds to.
    """
    ids: set[str] = set()

-    # OpenAI format: role=tool with tool_call_id
+    # Responses API: standalone function_call_output item
+    if msg.get("type") == "function_call_output":
+        if call_id := msg.get("call_id"):
+            ids.add(call_id)
+        return ids
+
+    # OpenAI Chat Completions format: role=tool with tool_call_id
    if msg.get("role") == "tool":
        tc_id = msg.get("tool_call_id")
        if tc_id:
@@ -303,8 +357,11 @@ def _extract_tool_response_ids_from_message(msg: dict) -> set[str]:


 def _is_tool_response_message(msg: dict) -> bool:
-    """Check if message is a tool response (OpenAI or Anthropic format)."""
-    # OpenAI format
+    """Check if message is a tool response (Chat Completions, Anthropic, or Responses API)."""
+    # Responses API format
+    if msg.get("type") == "function_call_output":
+        return True
+    # OpenAI Chat Completions format
    if msg.get("role") == "tool":
        return True
    # Anthropic format
@@ -322,13 +379,20 @@ def _remove_orphan_tool_responses(
    """
    Remove tool response messages/blocks that reference orphan tool_call IDs.

-    Supports both OpenAI and Anthropic formats.
+    Supports OpenAI Chat Completions, Anthropic, and Responses API formats.
    For Anthropic messages with mixed valid/orphan tool_result blocks,
    filters out only the orphan blocks instead of dropping the entire message.
    """
    result = []
    for msg in messages:
-        # OpenAI format: role=tool - drop entire message if orphan
+        # Responses API: function_call_output - drop if orphan
+        if msg.get("type") == "function_call_output":
+            if msg.get("call_id") in orphan_ids:
+                continue
+            result.append(msg)
+            continue
+
+        # OpenAI Chat Completions: role=tool - drop entire message if orphan
        if msg.get("role") == "tool":
            tc_id = msg.get("tool_call_id")
            if tc_id and tc_id in orphan_ids:
@@ -514,6 +578,18 @@ async def _summarize_messages_llm(
    """Summarize messages using an LLM."""
    conversation = []
    for msg in messages:
+        # Responses API: function_call items
+        if msg.get("type") == "function_call":
+            name = msg.get("name", "unknown_tool")
+            args = msg.get("arguments", "")
+            conversation.append(f"TOOL CALL ({name}): {args}")
+            continue
+        # Responses API: function_call_output items
+        if msg.get("type") == "function_call_output":
+            output = msg.get("output", "")
+            conversation.append(f"TOOL OUTPUT: {output}")
+            continue
+
        role = msg.get("role", "")
        content = msg.get("content", "")
        if content and role in ("user", "assistant", "tool"):
@@ -545,6 +621,14 @@ async def _summarize_messages_llm(
                    "- Actions taken and key decisions made\n"
                    "- Technical specifics (file names, tool outputs, function signatures)\n"
                    "- Errors encountered and resolutions applied\n\n"
+                    "IMPORTANT: Preserve all concrete references verbatim — these are small but "
+                    "critical for continuing the conversation:\n"
+                    "- File paths and directory paths (e.g. /src/app/page.tsx, ./output/result.csv)\n"
+                    "- Image/media file paths from tool outputs\n"
+                    "- URLs, API endpoints, and webhook addresses\n"
+                    "- Resource IDs, session IDs, and identifiers\n"
+                    "- Tool names that were called and their key parameters\n"
+                    "- Environment variables, config keys, and credentials names (not values)\n\n"
                    "Include ONLY the sections below that have relevant content "
                    "(skip sections with nothing to report):\n\n"
                    "## 1. Primary Request and Intent\n"
@@ -552,7 +636,8 @@ async def _summarize_messages_llm(
                    "## 2. Key Technical Concepts\n"
                    "Technologies, frameworks, tools, and patterns being used or discussed.\n\n"
                    "## 3. Files and Resources Involved\n"
-                    "Specific files examined or modified, with relevant snippets and identifiers.\n\n"
+                    "Specific files examined or modified, with relevant snippets and identifiers. "
+                    "Include exact file paths, image paths from tool outputs, and resource URLs.\n\n"
                    "## 4. Errors and Fixes\n"
                    "Problems encountered, error messages, and their resolutions.\n\n"
                    "## 5. All User Messages\n"
@@ -566,7 +651,7 @@ async def _summarize_messages_llm(
            },
            {"role": "user", "content": f"Summarize:\n\n{conversation_text}"},
        ],
-        max_tokens=1500,
+        max_tokens=2000,
        temperature=0.3,
    )

@@ -686,11 +771,15 @@ async def compress_context(
                    msgs = [summary_msg] + recent_msgs

                logger.info(
-                    f"Context summarized: {original_count} -> {total_tokens()} tokens, "
-                    f"summarized {messages_summarized} messages"
+                    "Context summarized: %d -> %d tokens, summarized %d messages",
+                    original_count,
+                    total_tokens(),
+                    messages_summarized,
                )
            except Exception as e:
-                logger.warning(f"Summarization failed, continuing with truncation: {e}")
+                logger.warning(
+                    "Summarization failed, continuing with truncation: %s", e
+                )
                # Fall through to content truncation

    # ---- STEP 2: Normalize content ----------------------------------------
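
The small-budget guards added to `_truncate_middle_tokens` above can be exercised without tiktoken. The following is a minimal sketch, not the module's code: `ToyEncoding` is a hypothetical stand-in that maps one character to one token, and `truncate_middle_tokens` is a local copy of the logic shown in the hunk.

```python
class ToyEncoding:
    """Hypothetical stand-in for a tiktoken encoding: one token per character."""

    def encode(self, text: str) -> list[int]:
        return [ord(ch) for ch in text]

    def decode(self, ids: list[int]) -> str:
        return "".join(chr(i) for i in ids)


def truncate_middle_tokens(text: str, enc, max_tok: int) -> str:
    """Keep the head and tail of `text`, replacing the middle with an ellipsis."""
    ids = enc.encode(text)
    if len(ids) <= max_tok:
        return text  # already within budget
    if max_tok < 1:
        return ""  # nothing fits
    if max_tok < 3:
        return enc.decode(ids[:max_tok])  # no room for head + ellipsis + tail
    mid = enc.encode(" … ")
    head = max_tok // 2 - 1  # one slot of the budget is reserved for the ellipsis
    tail = max_tok - head - 1
    return enc.decode(ids[:head] + mid + ids[-tail:])


enc = ToyEncoding()
print(repr(truncate_middle_tokens("abcdefghij", enc, 6)))
print(repr(truncate_middle_tokens("abcdefghij", enc, 2)))
print(repr(truncate_middle_tokens("abcdefghij", enc, 0)))
```

Only one token of the budget is reserved for the marker, so the result can exceed `max_tok` whenever the marker encodes to more than one token. In this toy encoding the marker is three tokens, which is why a budget of 6 yields 8 characters. The tests further down allow for the same effect by asserting `<= 55` against a budget of 50.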
--- a/autogpt_platform/backend/backend/util/prompt_responses_api_test.py
+++ b/autogpt_platform/backend/backend/util/prompt_responses_api_test.py
@@ -0,0 +1,603 @@
+"""Tests for prompt.py compatibility with the OpenAI Responses API.
+
+The Responses API uses a different conversation format:
+- Tool calls are standalone items with ``type: "function_call"`` and ``call_id``
+- Tool results are items with ``type: "function_call_output"`` and ``call_id``
+- These items do NOT have ``role`` at the top level
+
+These tests validate that prompt utilities correctly handle Responses API items
+alongside Chat Completions and Anthropic formats.
+"""
+
+import pytest
+from tiktoken import encoding_for_model
+
+from backend.util.prompt import (
+    _ensure_tool_pairs_intact,
+    _extract_tool_call_ids_from_message,
+    _extract_tool_response_ids_from_message,
+    _is_tool_message,
+    _is_tool_response_message,
+    _msg_tokens,
+    _remove_orphan_tool_responses,
+    _truncate_tool_message_content,
+    compress_context,
+    validate_and_remove_orphan_tool_responses,
+)
+
+# ── Fixtures ──────────────────────────────────────────────────────────────
+
+
+@pytest.fixture
+def enc():
+    return encoding_for_model("gpt-4o")
+
+
+# ── Sample items ──────────────────────────────────────────────────────────
+
+FUNCTION_CALL_ITEM = {
+    "type": "function_call",
+    "id": "fc_abc",
+    "call_id": "call_abc",
+    "name": "search_tool",
+    "arguments": '{"query": "python asyncio tutorial"}',
+    "status": "completed",
+}
+
+FUNCTION_CALL_OUTPUT_ITEM = {
+    "type": "function_call_output",
+    "call_id": "call_abc",
+    "output": '{"results": ["result1", "result2", "result3"]}',
+}
+
+
+# ═══════════════════════════════════════════════════════════════════════════
+# _msg_tokens
+# ═══════════════════════════════════════════════════════════════════════════
+
+
+class TestMsgTokensResponsesApi:
+    """_msg_tokens should count tokens in function_call / function_call_output
+    items, not just role-based messages."""
+
+    def test_chat_completions_tool_call_counted(self, enc):
+        """Baseline: Chat Completions tool_calls are counted correctly."""
+        msg = {
+            "role": "assistant",
+            "content": None,
+            "tool_calls": [
+                {
+                    "id": "call_abc",
+                    "type": "function",
+                    "function": {
+                        "name": "search_tool",
+                        "arguments": '{"query": "python asyncio tutorial"}',
+                    },
+                }
+            ],
+        }
+        tokens = _msg_tokens(msg, enc)
+        assert tokens > 10  # Should count the tool call content
+
+    def test_chat_completions_tool_response_counted(self, enc):
+        """Baseline: Chat Completions tool responses are counted correctly."""
+        msg = {
+            "role": "tool",
+            "tool_call_id": "call_abc",
+            "content": '{"results": ["result1", "result2"]}',
+        }
+        tokens = _msg_tokens(msg, enc)
+        assert tokens > 5
+
+    def test_function_call_minimal_fields(self, enc):
+        """function_call with missing optional fields still counts."""
+        msg = {"type": "function_call"}
+        tokens = _msg_tokens(msg, enc)
+        assert tokens >= 3  # At least the wrapper
+
+    def test_function_call_output_minimal_fields(self, enc):
+        """function_call_output with missing output field still counts."""
+        msg = {"type": "function_call_output"}
+        tokens = _msg_tokens(msg, enc)
+        assert tokens >= 3
+
+    def test_function_call_arguments_counted(self, enc):
+        """function_call items have 'arguments' not 'content' — tokens must
+        include the arguments string and the function name."""
+        tokens = _msg_tokens(FUNCTION_CALL_ITEM, enc)
+        # Must count at least the arguments and name tokens
+        name_tokens = len(enc.encode(FUNCTION_CALL_ITEM["name"]))
+        args_tokens = len(enc.encode(FUNCTION_CALL_ITEM["arguments"]))
+        assert tokens >= name_tokens + args_tokens
+
+    def test_function_call_output_content_counted(self, enc):
+        """function_call_output items have 'output' not 'content' — tokens must
+        include the output string."""
+        tokens = _msg_tokens(FUNCTION_CALL_OUTPUT_ITEM, enc)
+        output_tokens = len(enc.encode(FUNCTION_CALL_OUTPUT_ITEM["output"]))
+        assert tokens >= output_tokens
+
+
+# ═══════════════════════════════════════════════════════════════════════════
+# _is_tool_message
+# ═══════════════════════════════════════════════════════════════════════════
+
+
+class TestIsToolMessageResponsesApi:
+    """_is_tool_message should recognise Responses API items as tool messages
+    so they are protected from deletion during compaction."""
+
+    def test_chat_completions_tool_call_detected(self):
+        """Baseline: Chat Completions tool_calls are detected."""
+        msg = {
+            "role": "assistant",
+            "tool_calls": [{"id": "call_1", "type": "function"}],
+        }
+        assert _is_tool_message(msg) is True
+
+    def test_chat_completions_tool_response_detected(self):
+        """Baseline: Chat Completions role=tool is detected."""
+        msg = {"role": "tool", "tool_call_id": "call_1", "content": "result"}
+        assert _is_tool_message(msg) is True
+
+    def test_anthropic_tool_use_detected(self):
+        """Baseline: Anthropic tool_use is detected."""
+        msg = {
+            "role": "assistant",
+            "content": [
+                {"type": "tool_use", "id": "toolu_1", "name": "t", "input": {}}
+            ],
+        }
+        assert _is_tool_message(msg) is True
+
+    def test_anthropic_tool_result_detected(self):
+        """Baseline: Anthropic tool_result is detected."""
+        msg = {
+            "role": "user",
+            "content": [
+                {"type": "tool_result", "tool_use_id": "toolu_1", "content": "ok"}
+            ],
+        }
+        assert _is_tool_message(msg) is True
+
+    def test_function_call_detected(self):
+        """type=function_call should be recognised as a tool message."""
+        assert _is_tool_message(FUNCTION_CALL_ITEM) is True
+
+    def test_function_call_output_detected(self):
+        """type=function_call_output should be recognised as a tool message."""
+        assert _is_tool_message(FUNCTION_CALL_OUTPUT_ITEM) is True
+
+    def test_regular_user_message_not_tool(self):
+        """Plain user message → not a tool message."""
+        assert _is_tool_message({"role": "user", "content": "hello"}) is False
+
+    def test_regular_assistant_message_not_tool(self):
+        """Plain assistant message without tool_calls → not a tool message."""
+        assert _is_tool_message({"role": "assistant", "content": "hi"}) is False
+
+
+# ═══════════════════════════════════════════════════════════════════════════
+# _extract_tool_call_ids_from_message
+# ═══════════════════════════════════════════════════════════════════════════
+
+
+class TestExtractToolCallIdsResponsesApi:
+    """_extract_tool_call_ids_from_message should extract call_ids from
+    Responses API function_call items."""
+
+    def test_chat_completions_extracted(self):
+        """Baseline: Chat Completions tool_calls IDs are extracted."""
+        msg = {
+            "role": "assistant",
+            "tool_calls": [
+                {"id": "call_1", "type": "function"},
+                {"id": "call_2", "type": "function"},
+            ],
+        }
+        assert _extract_tool_call_ids_from_message(msg) == {"call_1", "call_2"}
+
+    def test_anthropic_extracted(self):
+        """Baseline: Anthropic tool_use IDs are extracted."""
+        msg = {
+            "role": "assistant",
+            "content": [{"type": "tool_use", "id": "toolu_1"}],
+        }
+        assert _extract_tool_call_ids_from_message(msg) == {"toolu_1"}
+
+    def test_function_call_extracted(self):
+        """type=function_call with call_id should be extracted."""
+        assert _extract_tool_call_ids_from_message(FUNCTION_CALL_ITEM) == {"call_abc"}
+
+    def test_function_call_missing_call_id(self):
+        """function_call without call_id → empty set."""
+        msg = {"type": "function_call", "name": "tool"}
+        assert _extract_tool_call_ids_from_message(msg) == set()
+
+    def test_non_assistant_non_function_call(self):
+        """Messages with neither role=assistant nor type=function_call → empty."""
+        msg = {"role": "user", "content": "hello"}
+        assert _extract_tool_call_ids_from_message(msg) == set()
+
+
+# ═══════════════════════════════════════════════════════════════════════════
+# _extract_tool_response_ids_from_message
+# ═══════════════════════════════════════════════════════════════════════════
+
+
+class TestExtractToolResponseIdsResponsesApi:
+    """_extract_tool_response_ids_from_message should extract call_ids from
+    Responses API function_call_output items."""
+
+    def test_chat_completions_extracted(self):
+        """Baseline: Chat Completions tool_call_id is extracted."""
+        msg = {"role": "tool", "tool_call_id": "call_1", "content": "result"}
+        assert _extract_tool_response_ids_from_message(msg) == {"call_1"}
+
+    def test_anthropic_extracted(self):
+        """Baseline: Anthropic tool_use_id is extracted."""
+        msg = {
+            "role": "user",
+            "content": [
+                {"type": "tool_result", "tool_use_id": "toolu_1", "content": "ok"}
+            ],
+        }
+        assert _extract_tool_response_ids_from_message(msg) == {"toolu_1"}
+
+    def test_function_call_output_extracted(self):
+        """type=function_call_output with call_id should be extracted."""
+        assert _extract_tool_response_ids_from_message(FUNCTION_CALL_OUTPUT_ITEM) == {
+            "call_abc"
+        }
+
+    def test_function_call_output_missing_call_id(self):
+        """function_call_output without call_id → empty set."""
+        msg = {"type": "function_call_output", "output": "result"}
+        assert _extract_tool_response_ids_from_message(msg) == set()
+
+    def test_non_tool_non_function_call_output(self):
+        """Regular user message → empty set."""
+        msg = {"role": "user", "content": "hello"}
+        assert _extract_tool_response_ids_from_message(msg) == set()
+
+
+# ═══════════════════════════════════════════════════════════════════════════
+# _is_tool_response_message
+# ═══════════════════════════════════════════════════════════════════════════
+
+
+class TestIsToolResponseMessageResponsesApi:
+    def test_chat_completions_detected(self):
+        msg = {"role": "tool", "tool_call_id": "call_1", "content": "r"}
+        assert _is_tool_response_message(msg) is True
+
+    def test_anthropic_detected(self):
+        msg = {
+            "role": "user",
+            "content": [
+                {"type": "tool_result", "tool_use_id": "toolu_1", "content": "ok"}
+            ],
+        }
+        assert _is_tool_response_message(msg) is True
+
+    def test_function_call_output_detected(self):
+        """type=function_call_output should be recognised as a tool response."""
+        assert _is_tool_response_message(FUNCTION_CALL_OUTPUT_ITEM) is True
+
+    def test_function_call_is_not_response(self):
+        """function_call is a tool REQUEST, not a response."""
+        assert _is_tool_response_message(FUNCTION_CALL_ITEM) is False
+
+    def test_regular_message_not_response(self):
+        """Plain message → not a tool response."""
+        assert _is_tool_response_message({"role": "user", "content": "hi"}) is False
+
+
+# ═══════════════════════════════════════════════════════════════════════════
+# _truncate_tool_message_content
+# ═══════════════════════════════════════════════════════════════════════════
+
+
+class TestTruncateToolMessageContentResponsesApi:
+    def test_chat_completions_truncated(self, enc):
+        """Baseline: role=tool content is truncated."""
+        msg = {"role": "tool", "tool_call_id": "call_1", "content": "x" * 10000}
+        _truncate_tool_message_content(msg, enc, max_tokens=50)
+        assert len(enc.encode(msg["content"])) <= 55  # budget + ellipsis slack
+
+    def test_function_call_output_truncated(self, enc):
+        """function_call_output 'output' field should be truncated."""
+        msg = {
+            "type": "function_call_output",
+            "call_id": "call_1",
+            "output": "x" * 10000,
+        }
+        _truncate_tool_message_content(msg, enc, max_tokens=50)
+        assert len(enc.encode(msg["output"])) <= 55
+
+    def test_function_call_output_short_not_truncated(self, enc):
+        """Short function_call_output output is left unchanged."""
+        msg = {
+            "type": "function_call_output",
+            "call_id": "call_1",
+            "output": "short",
+        }
+        _truncate_tool_message_content(msg, enc, max_tokens=1000)
+        assert msg["output"] == "short"
+
+    def test_function_call_not_truncated(self, enc):
+        """function_call items (requests) should not be truncated."""
+        msg = dict(FUNCTION_CALL_ITEM)  # copy
+        original_args = msg["arguments"]
+        _truncate_tool_message_content(msg, enc, max_tokens=5)
+        assert msg["arguments"] == original_args  # unchanged
+
+
+# ═══════════════════════════════════════════════════════════════════════════
+# _remove_orphan_tool_responses
+# ═══════════════════════════════════════════════════════════════════════════
+
+
+class TestRemoveOrphanToolResponsesResponsesApi:
+    def test_chat_completions_orphan_removed(self):
+        """Baseline: orphan role=tool messages are removed."""
+        messages = [
+            {"role": "tool", "tool_call_id": "call_orphan", "content": "result"},
+            {"role": "user", "content": "Hello"},
+        ]
+        result = _remove_orphan_tool_responses(messages, {"call_orphan"})
+        assert len(result) == 1
+        assert result[0]["role"] == "user"
+
+    def test_function_call_output_orphan_removed(self):
+        """Orphan function_call_output items should be removed."""
+        messages = [
+            {
+                "type": "function_call_output",
+                "call_id": "call_orphan",
+                "output": "result",
+            },
+            {"role": "user", "content": "Hello"},
+        ]
+        result = _remove_orphan_tool_responses(messages, {"call_orphan"})
+        assert len(result) == 1
+        assert result[0]["role"] == "user"
+
+    def test_function_call_output_non_orphan_kept(self):
+        """Non-orphan function_call_output items should be kept."""
+        messages = [
+            {
+                "type": "function_call_output",
+                "call_id": "call_valid",
+                "output": "result",
+            },
+            {"role": "user", "content": "Hello"},
+        ]
+        result = _remove_orphan_tool_responses(messages, {"call_other"})
+        assert len(result) == 2
+
+
+# ═══════════════════════════════════════════════════════════════════════════
+# validate_and_remove_orphan_tool_responses
+# ═══════════════════════════════════════════════════════════════════════════
+
+
+class TestValidateOrphansResponsesApi:
+    def test_chat_completions_paired_kept(self):
+        """Baseline: matched Chat Completions pairs are kept."""
+        messages = [
+            {
+                "role": "assistant",
+                "tool_calls": [{"id": "call_1", "type": "function"}],
+            },
+            {"role": "tool", "tool_call_id": "call_1", "content": "done"},
+        ]
+        result = validate_and_remove_orphan_tool_responses(messages, log_warning=False)
+        assert len(result) == 2
+
+    def test_responses_api_paired_kept(self):
+        """Matched Responses API pairs are kept because the validator
+        properly recognizes function_call and function_call_output items."""
+        messages = [
+            {"role": "user", "content": "Do something."},
+            FUNCTION_CALL_ITEM,
+            FUNCTION_CALL_OUTPUT_ITEM,
+        ]
+        result = validate_and_remove_orphan_tool_responses(messages, log_warning=False)
+        assert len(result) == 3
+
+    def test_responses_api_orphan_output_removed(self):
+        """Orphan function_call_output (no matching function_call) should be removed."""
+        messages = [
+            {"role": "user", "content": "Do something."},
+            # No function_call — output is orphaned
+            FUNCTION_CALL_OUTPUT_ITEM,
+        ]
+        result = validate_and_remove_orphan_tool_responses(messages, log_warning=False)
+        assert len(result) == 1
+        assert result[0]["role"] == "user"
+
+
+# ═══════════════════════════════════════════════════════════════════════════
+# _ensure_tool_pairs_intact
+# ═══════════════════════════════════════════════════════════════════════════
+
+
+class TestEnsureToolPairsIntactResponsesApi:
+    def test_chat_completions_pair_preserved(self):
+        """Baseline: sliced Chat Completions tool responses get their assistant prepended."""
+        all_msgs = [
+            {"role": "system", "content": "sys"},
+            {
+                "role": "assistant",
+                "tool_calls": [{"id": "call_1", "type": "function"}],
+            },
+            {"role": "tool", "tool_call_id": "call_1", "content": "result"},
+            {"role": "user", "content": "thanks"},
+        ]
+        # Slice starts at index 2 (tool response) — orphan
+        recent = [all_msgs[2], all_msgs[3]]
+        result = _ensure_tool_pairs_intact(recent, all_msgs, start_index=2)
+        # Should prepend the assistant message
+        assert len(result) == 3
+        assert "tool_calls" in result[0]
+
+    def test_responses_api_pair_preserved(self):
+        """Sliced function_call_output should get its function_call prepended."""
+        all_msgs = [
+            {"role": "system", "content": "sys"},
+            {"role": "user", "content": "search for X"},
+            FUNCTION_CALL_ITEM,
+            FUNCTION_CALL_OUTPUT_ITEM,
+            {"role": "user", "content": "thanks"},
+        ]
+        # Slice starts at index 3 (function_call_output) — orphan
+        recent = [all_msgs[3], all_msgs[4]]
+        result = _ensure_tool_pairs_intact(recent, all_msgs, start_index=3)
+        # Should prepend the function_call item
+        assert len(result) == 3
+        assert result[0].get("type") == "function_call"
+
+
+# ═══════════════════════════════════════════════════════════════════════════
+# _summarize_messages_llm (minor)
+# ═══════════════════════════════════════════════════════════════════════════
+
+
+class TestSummarizeMessagesResponsesApi:
+    """_summarize_messages_llm builds its transcript from msg.get("role") and
+    msg.get("content").  Responses API function_call / function_call_output
+    items have neither, so they need dedicated handling; without it they
+    would be silently dropped from the summary input."""
+
+    @pytest.mark.asyncio
+    async def test_function_call_included_in_summary_text(self):
+        """function_call items should contribute to the summary text."""
+        from backend.util.prompt import _summarize_messages_llm
+
+        messages = [
+            {"role": "user", "content": "Search for X"},
+            FUNCTION_CALL_ITEM,
+            FUNCTION_CALL_OUTPUT_ITEM,
+            {"role": "user", "content": "Thanks"},
+        ]
+
+        # We only need to check the conversation text building, not the LLM call.
+        # The function builds conversation_text before calling the client.
+        # We mock the client to capture what it receives.
+        from unittest.mock import AsyncMock, MagicMock
+
+        mock_client = MagicMock()
+        mock_resp = MagicMock()
+        mock_resp.choices = [MagicMock()]
+        mock_resp.choices[0].message.content = "Summary"
+        mock_client.with_options.return_value.chat.completions.create = AsyncMock(
+            return_value=mock_resp
+        )
+
+        await _summarize_messages_llm(messages, mock_client, "gpt-4o")
+
+        # Check the prompt sent to the LLM contains tool info
+        call_args = (
+            mock_client.with_options.return_value.chat.completions.create.call_args
+        )
+        user_msg = call_args.kwargs["messages"][1]["content"]
+        # The tool name or arguments should appear in the summary text
+        assert "search_tool" in user_msg or "python asyncio" in user_msg
+
+
+# ═══════════════════════════════════════════════════════════════════════════
+# compress_context end-to-end
+# ═══════════════════════════════════════════════════════════════════════════
+
+
+class TestCompressContextResponsesApi:
+    @pytest.mark.asyncio
+    async def test_chat_completions_tool_pairs_preserved(self):
+        """Baseline: Chat Completions tool pairs survive compaction."""
+        messages: list[dict] = [
+            {"role": "system", "content": "You are helpful."},
+        ]
+        # Add enough messages to trigger compaction
+        for i in range(20):
+            messages.append({"role": "user", "content": f"Question {i} " * 200})
+            messages.append({"role": "assistant", "content": f"Answer {i} " * 200})
+        # Add a tool pair at the end
+        messages.append(
+            {
+                "role": "assistant",
+                "tool_calls": [
+                    {"id": "call_final", "type": "function", "function": {"name": "f"}}
+                ],
+            }
+        )
+        messages.append(
+            {"role": "tool", "tool_call_id": "call_final", "content": "result"}
+        )
+        messages.append({"role": "assistant", "content": "Done!"})
+
+        result = await compress_context(messages, target_tokens=2000, client=None)
+
+        # If tool response exists, its call must exist too
+        call_ids = set()
+        resp_ids = set()
+        for msg in result.messages:
+            if "tool_calls" in msg:
+                for tc in msg["tool_calls"]:
+                    call_ids.add(tc["id"])
+            if msg.get("role") == "tool":
+                resp_ids.add(msg.get("tool_call_id"))
+        assert resp_ids <= call_ids
+
+    @pytest.mark.asyncio
+    async def test_responses_api_tool_pairs_preserved(self):
+        """Responses API function_call / function_call_output pairs must
+        survive compaction intact.  Regression test: they could be silently
+        deleted when _is_tool_message did not recognise them."""
+        messages = [
+            {"role": "system", "content": "You are helpful."},
+        ]
+        # Add enough messages to trigger compaction
+        for i in range(20):
+            messages.append({"role": "user", "content": f"Question {i} " * 200})
+            messages.append({"role": "assistant", "content": f"Answer {i} " * 200})
+        # Add a Responses API tool pair at the end
+        messages.append(
+            {
+                "type": "function_call",
+                "id": "fc_final",
+                "call_id": "call_final",
+                "name": "search_tool",
+                "arguments": '{"q": "test"}',
+                "status": "completed",
+            }
+        )
+        messages.append(
+            {
+                "type": "function_call_output",
+                "call_id": "call_final",
+                "output": '{"results": ["a", "b"]}',
+            }
+        )
+        messages.append({"role": "user", "content": "Thanks!"})
+
+        result = await compress_context(messages, target_tokens=2000, client=None)
+
+        # The function_call and function_call_output must both survive
+        fc_items = [m for m in result.messages if m.get("type") == "function_call"]
+        fco_items = [
+            m for m in result.messages if m.get("type") == "function_call_output"
+        ]
+
+        # If either exists, the other must exist too (pair integrity)
+        if fc_items or fco_items:
+            fc_call_ids = {m["call_id"] for m in fc_items}
+            fco_call_ids = {m["call_id"] for m in fco_items}
+            assert (
+                fco_call_ids <= fc_call_ids
+            ), "function_call_output exists without matching function_call"
+
+        # At minimum, neither should have been silently deleted if the
+        # conversation was short enough to keep them
+        assert len(fc_items) >= 1, "function_call was deleted during compaction"
+        assert len(fco_items) >= 1, "function_call_output was deleted during compaction"
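
The pair-integrity property these tests assert can be stated compactly: every tool response ID must also appear as a tool call ID. The sketch below covers the three formats named in the docstrings of `prompt.py`. It is an illustration only: `call_ids`, `response_ids`, and `orphan_response_ids` are hypothetical helpers, and unlike a real validator it ignores message ordering.

```python
def call_ids(msg: dict) -> set[str]:
    """IDs of tool calls issued by this message (hypothetical helper)."""
    if msg.get("type") == "function_call":  # Responses API item
        return {msg["call_id"]} if msg.get("call_id") else set()
    if msg.get("role") != "assistant":
        return set()
    # Chat Completions: tool_calls array on the assistant message
    ids = {tc["id"] for tc in msg.get("tool_calls") or [] if tc.get("id")}
    # Anthropic: tool_use blocks inside the content list
    content = msg.get("content")
    if isinstance(content, list):
        ids |= {
            b["id"]
            for b in content
            if isinstance(b, dict) and b.get("type") == "tool_use" and b.get("id")
        }
    return ids


def response_ids(msg: dict) -> set[str]:
    """IDs of tool calls this message responds to (hypothetical helper)."""
    if msg.get("type") == "function_call_output":  # Responses API item
        return {msg["call_id"]} if msg.get("call_id") else set()
    if msg.get("role") == "tool":  # Chat Completions
        return {msg["tool_call_id"]} if msg.get("tool_call_id") else set()
    content = msg.get("content")
    if msg.get("role") == "user" and isinstance(content, list):  # Anthropic
        return {
            b["tool_use_id"]
            for b in content
            if isinstance(b, dict)
            and b.get("type") == "tool_result"
            and b.get("tool_use_id")
        }
    return set()


def orphan_response_ids(messages: list[dict]) -> set[str]:
    """Response IDs that have no matching call anywhere in `messages`."""
    calls: set[str] = set()
    responses: set[str] = set()
    for msg in messages:
        calls |= call_ids(msg)
        responses |= response_ids(msg)
    return responses - calls


history = [
    {"type": "function_call", "call_id": "call_abc", "name": "search_tool"},
    {"type": "function_call_output", "call_id": "call_abc", "output": "ok"},
    {"type": "function_call_output", "call_id": "call_lost", "output": "?"},
]
print(orphan_response_ids(history))
```

An empty result is equivalent to the `resp_ids <= call_ids` and `fco_call_ids <= fc_call_ids` assertions used in the end-to-end tests.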
--- a/autogpt_platform/backend/backend/util/service.py
+++ b/autogpt_platform/backend/backend/util/service.py
@@ -704,8 +704,19 @@ def get_service_client(
            return kwargs

        def _get_return(self, expected_return: TypeAdapter | None, result: Any) -> Any:
+            """Validate and coerce the RPC result to the expected return type.
+
+            Falls back to the raw result with a warning if validation fails.
+            """
            if expected_return:
-                return expected_return.validate_python(result)
+                try:
+                    return expected_return.validate_python(result)
+                except Exception as e:
+                    logger.warning(
+                        "RPC return type validation failed, using raw result: %s",
+                        type(e).__name__,
+                    )
+                    return result
            return result

        def __getattr__(self, name: str) -> Callable[..., Any]:
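
The validate-then-fall-back behaviour added to `_get_return` can be tried in isolation. This is a sketch under stated assumptions: `IntListAdapter` is a hypothetical stand-in that exposes only a `validate_python` method (the real code receives a pydantic `TypeAdapter`), and `get_return` is a free-function copy of the method body.

```python
import logging
from typing import Any

logger = logging.getLogger("rpc_sketch")


class IntListAdapter:
    """Hypothetical stand-in for a TypeAdapter that validates list[int]."""

    def validate_python(self, value: Any) -> list[int]:
        if not isinstance(value, list):
            raise TypeError("expected a list")
        return [int(v) for v in value]  # int() raises ValueError on bad items


def get_return(expected_return: Any, result: Any) -> Any:
    """Coerce `result` if an adapter is given; fall back to the raw value."""
    if expected_return is None:
        return result
    try:
        return expected_return.validate_python(result)
    except Exception as e:
        # Only the exception type is logged, mirroring the hunk above.
        logger.warning(
            "RPC return type validation failed, using raw result: %s",
            type(e).__name__,
        )
        return result


adapter = IntListAdapter()
print(get_return(adapter, ["1", 2]))
print(get_return(adapter, "oops"))
```

One consequence of this design is that a caller may receive a value that does not match the declared return type, so a mismatch shows up as a log warning instead of an exception at the RPC boundary.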